Notes on string theory #3: Nambu-Goto action

1. Introduction

I haven’t been keeping up with this as much as I would like, mainly because I have been busy. But I am committed to continuing to reupload many of my notes on Polchinski’s textbooks. It is fun for me to go through it all again in my spare time, and I’ve noticed that since the time of first working through the textbooks there is more I can add to many topics.

It is worth remembering that, in the last note, we reviewed the classical worldline and polynomial action for the relativistic point particle. We also discussed reparameterisation invariance and calculated the equations of motion. In this note, the focus is to construct the first-principle Nambu-Goto action for the relativistic string as given in equations (1.2.9a-1.2.9b) in Polchinski’s textbook.

Often in popular literature and discourse I read descriptions of the string that almost shroud it in mystery. How could the fundamental constituents of matter be described by a bunch of strings? Other times, caricatures of string theory can leave the impression that viewing all elementary particles as vibrating strings is somewhat arbitrary. Why not some other type of object? But the development of string theory is, in fact, well motivated. Ultimately, all that we’re doing is extending the concept of point particles that we all know and love, and this is first and foremost evidenced in the Nambu-Goto action. In terms of the bigger picture, what we see is that in studying the string and its dynamics an entire universe of implications emerges. It reminds me of a great line in David Tong’s lecture notes that is worth paraphrasing: we find that the requirements demanded by the tiny string are so stringent that we are led naturally to a description of how the entire universe moves. On many occasions it is, indeed, like “the tail is wagging the dog”.

As we’re following Polchinski’s textbook, which only covers the Nambu-Goto action in a few words, if the interested reader would like to spend more time studying this action I would recommend ‘String theory and M-theory’ by Katrin Becker, Melanie Becker, and John H. Schwarz, especially the exercises, or for an even more gentle introduction see Barton Zwiebach’s textbook ‘A first course in string theory’.

2. Area functional

To arrive at the Nambu-Goto action, let us first recall from the last note that a p-brane may be described as a p-dimensional object moving through D-dimensional flat spacetime with {D \geq p}. If a 0-dimensional point particle (0-brane) traces out a (0+1)-dimensional worldline, it follows that a 1-dimensional string (1-brane) sweeps out a (1+1)-dimensional surface that we call the string worldsheet. And just as we can parameterise the relativistic point particle’s (0+1)-dimensional worldline, we can parameterise the (1+1)-dimensional worldsheet traced by the string. Coming to grips with this idea is the first task.

The main idea is that the worldline of a particle is replaced by the worldsheet {\Sigma}, which is a surface embedded into D-dimensional Minkowski spacetime. The path of a point particle can be described by a single parameter, the proper time {\tau}, which multiplied by c gives the Lorentz invariant proper length of the worldline. For strings, we will define the Lorentz invariant proper area of the worldsheet in a completely analogous way. As we’ll see, the first-principle string action is indeed proportional to this proper area.

To start, we see that because the string worldsheet is a (1+1)-dimensional surface, it requires two parameters, which we will denote as {\xi^{1}} and {\xi^{2}}. We will also limit our present considerations to the case of an open string (we will talk about closed strings in a later note). In order to define the appropriate area functional, we want to sketch a grid on the spatial surface of the string worldsheet with lines of constant {\xi^{1}} and {\xi^{2}}; then we want to embed this spatial surface in the background target space.

The target space is the world where the 2-dimensional surface lives. Ultimately, we want to distinguish between the area we parameterise and the actual physical string worldsheet. In order to accomplish this, we define a one-to-one map, which we may call the string map. The purpose of the string map is therefore to take us from the parameter space that we have constructed to the target space in which the physical surface propagates. Indeed, as we’ll see, the string action is in this precise sense defined as a functional of smooth maps.

To construct the string map, we first formalise the notion of area in parameter space, with this parameter space itself defined by the range of the parameters {\xi^{1}} and {\xi^{2}}. One can, in principle, view the parameters we have selected as local coordinates on the surface. And so, as emphasised above, we can think of the worldsheet as a physical surface, which is in fact the image of the parameter space under the one-to-one string map written as {\vec{x}(\xi^{1}, \xi^{2})}. The parameterised surface can therefore be described by the coordinate functions

\displaystyle  \vec{x}(\xi^1 , \xi^2) = \left( x^1 (\xi^1 , \xi^2), \ x^2 (\xi^1 , \xi^2), \ x^3 (\xi^1 , \xi^2) \right). \ \ \ \ \ (1)

The area to which we want to give mathematical description is more accurately an infinitesimal area element. Since we begin working in a parameter space, and since our very small square is mapped onto the surface in target space, when we map this very small area from the parameter space to the surface we obtain a parallelogram, the sides of which may be denoted as {d\vec{v}_1} and {d\vec{v}_2}. We can express this as follows:

\displaystyle  d \vec{v}_1 = \frac{\partial \vec{x}}{\partial \xi^1} d\xi^1

\displaystyle  d \vec{v}_2 = \frac{\partial \vec{x}}{\partial \xi^2}d\xi^2, \ \ \ \ \ (2)

in which the partial derivatives give the rate of variation of the coordinates with respect to the parameters {\xi^1} and {\xi^2}. If we multiply the first of these rates by the length {d\xi^1} of the horizontal side of the infinitesimal square in parameter space, we get the vector {d \vec{v}_1} that represents this side in the target space.

The main objective is to now compute the area {dA} of this parallelogram.


Since we have already labelled the sides of the infinitesimal area in the parameter space, we simply need to invoke the formula for the area of a parallelogram:

\displaystyle  dA = \mid d\vec{v}_1 \mid \mid d\vec{v}_2 \mid \mid \sin \theta \mid

\displaystyle  = \mid d\vec{v}_1 \mid \mid d\vec{v}_2 \mid \sqrt{1 - \cos^2 \theta}

\displaystyle  = \sqrt{\mid d\vec{v}_1 \mid^2 \mid d\vec{v}_2 \mid^2 - \mid d\vec{v}_1 \mid^2 \mid d\vec{v}_2 \mid^2 \cos^2 \theta}. \ \ \ \ \ (3)

Here {\theta} denotes the angle between the vectors {d\vec{v}_1} and {d\vec{v}_2}. This can be written in terms of dot products using the identity {(\vec{A} \times \vec{B}) \cdot (\vec{A} \times \vec{B}) = \mid \vec{A} \mid^2 \mid \vec{B} \mid^2 - (\vec{A} \cdot \vec{B})^2}, such that

\displaystyle  (d\vec{v}_1 \times d\vec{v}_2) \cdot (d\vec{v}_1 \times d\vec{v}_2) = (d\vec{v}_1)^2 (d\vec{v}_2)^2 - (d\vec{v}_1 \cdot d\vec{v}_2)^2.

We therefore have

\displaystyle  dA = \sqrt{(d\vec{v}_1 \cdot d\vec{v}_1) (d\vec{v}_2 \cdot d\vec{v}_2) - (d\vec{v}_1 \cdot d\vec{v}_2)^2}. \ \ \ \ \ (4)

From this result, notice that we can now substitute for {d\vec{v}_1} and {d\vec{v}_2} using (2). Doing so gives

\displaystyle  dA = \sqrt{(\frac{\partial \vec{x}}{\partial \xi^1} \cdot \frac{\partial \vec{x}}{\partial \xi^1})(\frac{\partial \vec{x}}{\partial \xi^2} \cdot \frac{\partial \vec{x}}{\partial \xi^2}) - (\frac{\partial \vec{x}}{\partial \xi^1} \cdot \frac{\partial \vec{x}}{\partial \xi^2})^2} d\xi^1 d\xi^2. \ \ \ \ \ (5)

We have now obtained a general expression for the area element of the parameterised spatial surface. Written as the full area functional we have

\displaystyle  A = \int d\xi^1 d\xi^2 \ \sqrt{(\frac{\partial \vec{x}}{\partial \xi^1} \cdot \frac{\partial \vec{x}}{\partial \xi^1})(\frac{\partial \vec{x}}{\partial \xi^2} \cdot \frac{\partial \vec{x}}{\partial \xi^2}) - (\frac{\partial \vec{x}}{\partial \xi^1} \cdot \frac{\partial \vec{x}}{\partial \xi^2})^2}, \ \ \ \ \ (6)

where the integral extends over the ranges of the parameters {\xi^{1}} and {\xi^{2}}. This functional is reparameterisation invariant, which can be quickly verified by reparameterising the surface with tilde parameters {(\tilde{\xi}^1, \tilde{\xi}^2)} that then gives back (6) when {\tilde{\xi}^1 = \tilde{\xi}^1 (\xi^1)} and {\tilde{\xi}^2 = \tilde{\xi}^2 (\xi^2)}.

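The main issue is that the area functional (6) is not very nice, and the check just described only covers reparameterisations in which each new parameter depends on a single old one, so it is not completely general. We want reparameterisation invariance to be manifest for arbitrary reparameterisations.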
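As a sanity check on (6), here is a minimal numerical sketch, assuming a unit sphere parameterised by {(\xi^1, \xi^2) = (\theta, \phi)} as the test surface. The function names, the finite-difference step, and the grid sizes are my own choices for illustration and are not part of the derivation; the tangent vectors are estimated by central differences and the integral by a midpoint rule.

```python
import math

def sphere(theta, phi):
    """Unit sphere in R^3, with (xi1, xi2) = (theta, phi). Test surface only."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def dot(a, b):
    """Euclidean dot product of two vectors."""
    return sum(p * q for p, q in zip(a, b))

def tangent_vectors(x, xi1, xi2, eps=1e-6):
    """Central-difference estimates of dx/dxi1 and dx/dxi2."""
    d1 = [(p - m) / (2 * eps)
          for p, m in zip(x(xi1 + eps, xi2), x(xi1 - eps, xi2))]
    d2 = [(p - m) / (2 * eps)
          for p, m in zip(x(xi1, xi2 + eps), x(xi1, xi2 - eps))]
    return d1, d2

def area(x, range1, range2, n1=200, n2=50):
    """Midpoint-rule approximation of the area functional (6)."""
    (a1, b1), (a2, b2) = range1, range2
    h1, h2 = (b1 - a1) / n1, (b2 - a2) / n2
    total = 0.0
    for i in range(n1):
        for j in range(n2):
            xi1 = a1 + (i + 0.5) * h1
            xi2 = a2 + (j + 0.5) * h2
            d1, d2 = tangent_vectors(x, xi1, xi2)
            # Quantity under the square root in (5) and (6).
            radicand = dot(d1, d1) * dot(d2, d2) - dot(d1, d2) ** 2
            total += math.sqrt(max(radicand, 0.0)) * h1 * h2
    return total

sphere_area = area(sphere, (0.0, math.pi), (0.0, 2.0 * math.pi))
print(sphere_area, 4.0 * math.pi)
```

The two printed numbers should agree up to the discretisation error of the midpoint rule, which is what we expect if (6) really measures area.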

3. Induced Metric

Suppose we have some vector {d\vec{x}} on the surface {\Sigma} that we have so far pencilled into the target space. We know that we can describe this surface through the string mapping functions {\vec{x}(\xi^{1}, \xi^{2})}. What if we then consider {d\vec{x}} tangent to the surface {\Sigma}? We could then let {ds} denote the length of this tangent vector, and hence we could invoke some early idea of a metric on {\Sigma}.

Given the vector tangent to the surface, with {ds} its length, we can write

\displaystyle  ds^2 = d\vec{x} \cdot d\vec{x}. \ \ \ \ \ (7)

But what is {d\vec{x}} in terms of the parameter space coordinates that we constructed? In other words, can we relate {d\vec{x}} with {\xi^{1}, \xi^{2}}? This is precisely what our mapping accomplishes, such that we can express {d\vec{x}} in terms of partial derivatives of {\vec{x}} and the differentials {d\xi^{1}, d\xi^{2}}:

\displaystyle  d\vec{x} = \frac{\partial \vec{x}}{\partial \xi^1} d\xi^1 + \frac{\partial \vec{x}}{\partial \xi^2} d\xi^2 = \frac{\partial \vec{x}}{\partial \xi^i} d\xi^i, \ \ \ \ \ (8)

with the summation convention assumed for the repeated indices over possible values 1 and 2. If we now return to (7) and plug {d\vec{x}} back into our equation for {ds^2} we see that we can now write

\displaystyle  ds^2 = \frac{\partial \vec{x}}{\partial \xi^i} d\xi^i \cdot \frac{\partial \vec{x}}{\partial \xi^j} d\xi^j. \ \ \ \ \ (9)

But notice something interesting. If we set {h_{ij}(\xi) = \frac{\partial \vec{x}}{\partial \xi^i} \cdot \frac{\partial \vec{x}}{\partial \xi^j}}, this means we can write a more simplified equation of the form

\displaystyle  ds^{2} = h_{ij}(\xi) d\xi^i d\xi^j, \ \ \ \ \ (10)

in which the quantity {h_{ij}(\xi)} is called the induced metric. It is a metric on the target space surface precisely in the sense that, as {\xi^i} play the role of coordinates on {\Sigma}, we see that (10) determines distances on this surface. It is said to be induced because it uses the metric on the ambient space in which {\Sigma} lives to determine distances on {\Sigma}. More technically, we say that the induced metric is the pullback of the target space metric onto the worldsheet.

A question we can now ask is whether constructing a metric on the target space surface leads us to an equivalent expression for (6). Observe that, in matrix form, we have for the induced metric

\displaystyle  h_{ij} = \begin{pmatrix} \frac{\partial \vec{x}}{\partial \xi^1} \cdot \frac{\partial \vec{x}}{\partial \xi^1} & \frac{\partial \vec{x}}{\partial \xi^1} \cdot \frac{\partial \vec{x}}{\partial \xi^2} \\ \frac{\partial \vec{x}}{\partial \xi^2} \cdot \frac{\partial \vec{x}}{\partial \xi^1} & \frac{\partial \vec{x}}{\partial \xi^2} \cdot \frac{\partial \vec{x}}{\partial \xi^2} \\ \end{pmatrix}. \ \ (11)

What is this telling us? Notice that if you compute the determinant of the matrix {h_{ij}}, you find the same quantity that resides under the square root in (6). This is a massive hint that the construction is on the right track. So, let’s substitute the appropriate matrix elements into our earlier expression for the infinitesimal area. This is what we find,

\displaystyle  dA = \sqrt{(h_{11})h_{22} - h_{12}^2} \ d\xi^1 d\xi^2

\displaystyle  = \sqrt{\det h} \ d\xi^1 d\xi^2

\displaystyle  \therefore dA = \sqrt{h} \ d\xi^1 d\xi^2, \ \ \ \ \ (12)

where {h \equiv \det h_{ij} (\xi)}. This implies,

\displaystyle  A = \int d\xi^1 d\xi^2 \sqrt{h}. \ \ \ \ \ (13)

This new way to express the area, {A}, is now given in terms of the determinant of the induced metric. And although we are not yet done constructing the Nambu-Goto action, we see quite clearly from (13) why Polchinski says that the action {S_{NG}} in equations (1.2.9a-1.2.9b) in his textbook is proportional to the area of the worldsheet.

4. Reparameterisation invariance

The wonderful thing about this last result (13) is that we can now show manifest reparameterisation invariance in a much simpler way, as it may now be described by way of how the induced metric transforms.

To do this, we invoke a different set of parameters and therefore also a different metric, and then we show that the length {ds} of the vector {d\vec{x}} does not depend on our choice of parameterisation.

We begin with

\displaystyle  ds^2 = h_{ij}(\xi) d\xi^i d\xi^j = \tilde{h}_{pq}(\tilde{\xi}) d\tilde{\xi}^p d\tilde{\xi}^q. \ \ \ \ \ (14)

We then use the chain rule

\displaystyle  ds^{2} = \tilde{h}_{pq}(\tilde{\xi}) \frac{\partial \tilde{\xi}^p}{\partial \xi^i} \frac{\partial \tilde{\xi}^q}{\partial \xi^j} d\xi^i d\xi^j

\displaystyle  h_{ij}(\xi) = \tilde{h}_{pq} (\tilde{\xi}) \frac{\partial \tilde{\xi}^p}{\partial \xi^i} \frac{\partial \tilde{\xi}^q}{\partial \xi^j}. \ \ \ \ \ (15)

Next, recall that the change of variable theorem tells us how the integration measure transforms

\displaystyle  d\xi^{1} d \xi^2 = \mid \det \frac{\partial \xi^i}{\partial \tilde{\xi}^j} \mid d \tilde{\xi}^1 d \tilde{\xi}^2 = \mid \det M \mid d \tilde{\xi}^1 d \tilde{\xi}^2, \ \ \ \ \ (16)

where {M} is the matrix defined by {M_{ij} = \partial \xi^i / \partial \tilde{\xi}^j} and similarly

\displaystyle  d\tilde{\xi}^{1} d \tilde{\xi}^2 = \mid \det \frac{\partial \tilde{\xi}^i}{\partial \xi^j} \mid d \xi^1 d \xi^2 = \mid \det \tilde{M} \mid d \xi^1 d \xi^2, \ \ \ \ \ (17)

where {\tilde{M}} is defined by {\tilde{M}_{ij} = \partial \tilde{\xi}^i / \partial \xi^j}. Using this and returning to (15) we can rewrite this equation for {h} and {\tilde{h}} such that

\displaystyle  h_{ij}(\xi) = \tilde{h}_{pq} \tilde{M}_{pi}\tilde{M}_{qj} = (\tilde{M}^T)_{ip}\tilde{h}_{pq} \tilde{M}_{qj}. \ \ \ \ \ (18)

If we denote {h \equiv \det h_{ij}} and {\tilde{h} \equiv \det \tilde{h}_{ij}}, and if we take the determinant of both sides of (18), we find

\displaystyle  h = (\det \tilde{M}^T) \tilde{h} (\det \tilde{M}) = \tilde{h}(\det \tilde{M})^2. \ \ \ \ \ (19)

Clearly, then, if we take the square root we obtain

\displaystyle  \sqrt{h} = \sqrt{\tilde{h}} \mid \det \tilde{M} \mid, \ \ \ \ \ (20)

which is the transformation property for the square root of the determinant of the metric.

Finally, using (16) and (20) together with the fact that {\mid \det M \mid \mid \det \tilde{M} \mid = 1}, which holds because {M} and {\tilde{M}} are inverse matrices, we can show that (13) is reparameterisation invariant

\displaystyle  \int d\xi^1 d\xi^2 \sqrt{h} = \int d\tilde{\xi}^1 d\tilde{\xi}^2 \mid \det M \mid \sqrt{\tilde{h}} \mid \det \tilde{M} \mid = \int d\tilde{\xi}^1 d\tilde{\xi}^2 \sqrt{\tilde{h}}. \ \ \ \ \ (21)

There is perhaps a much more elegant way to show this proof. But for now, one should focus on how (21) is just a standard metric transformation, inasmuch as {\int d\xi^1 d\xi^2 \sqrt{h}} transforms via a Jacobian determinant of {\xi} with respect to {\tilde{\xi}} into {\int d\tilde{\xi}^1 d\tilde{\xi}^2 \sqrt{\tilde{h}}}.
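The invariance in (21) can also be checked numerically. The sketch below is only an illustration under my own assumptions: the test surface {\vec{x} = (u, v, uv)} over the unit square, and a reparameterisation {(s, t) \mapsto (u, v)} that mixes the two parameters while keeping the boundary of the square fixed (its Jacobian determinant is {1 + 0.2\pi \sin \pi(s+t) > 0}). Neither choice comes from the text above; the induced metric is estimated by central differences.

```python
import math

def surface(u, v):
    """Test surface in R^3: the saddle x = (u, v, u*v) on the unit square."""
    return (u, v, u * v)

def reparameterised(s, t):
    """Same surface, described by new parameters (s, t) that mix u and v.

    The bump vanishes on the boundary of the unit square, so the square is
    mapped onto itself (an illustrative choice of reparameterisation).
    """
    bump = 0.2 * math.sin(math.pi * s) * math.sin(math.pi * t)
    return surface(s + bump, t + bump)

def induced_metric(x, a, b, eps=1e-6):
    """Components (h11, h12, h22) of the induced metric by central differences."""
    d1 = [(p - m) / (2 * eps) for p, m in zip(x(a + eps, b), x(a - eps, b))]
    d2 = [(p - m) / (2 * eps) for p, m in zip(x(a, b + eps), x(a, b - eps))]
    h11 = sum(p * p for p in d1)
    h12 = sum(p * q for p, q in zip(d1, d2))
    h22 = sum(q * q for q in d2)
    return h11, h12, h22

def area(x, n=150):
    """Midpoint-rule approximation of (13) over the unit square."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            h11, h12, h22 = induced_metric(x, (i + 0.5) * step, (j + 0.5) * step)
            total += math.sqrt(h11 * h22 - h12 ** 2) * step * step
    return total

area_uv = area(surface)
area_st = area(reparameterised)
print(area_uv, area_st)
```

Both parameterisations should return the same area up to discretisation error, even though the induced metric components themselves differ point by point.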

5. String propagating in spacetime

Let us now work toward constructing the Nambu-Goto action as it appears in equations (1.2.9a-1.2.9b). Up to this point we have taken the approach of mapping from a parameter space to a target space in which the surface {\Sigma} lives. But we are interested in the case of surfaces in spacetime. These surfaces are obtained by representing in spacetime the history of the string as it propagates, in the same way the worldline of the point particle is described by representing its history.

Spacetime surfaces, such as string worldsheets, are not all that different from the spatial surfaces we considered in the previous sections. Instead of the coordinates {\xi^{1}} and {\xi^{2}}, for a relativistic string we should parameterise the string worldsheet in such a way that we account for both the proper time and the string’s spatial extension. Another way to put this is that, if our interest is to consider surfaces in spacetime (the worldsheet traced by the string), we now use {\tau} to denote the proper time and {\sigma} to denote the spatial extension of the surface. Given usual spacetime coordinates, which we write following string theory conventions {X^{\mu} = (X^0, X^1, ..., X^d)}, the surface is then described by the mapping functions

\displaystyle  X^{\mu}(\tau, \sigma). \ \ \ \ \ (22)

Hence, we come to the point emphasised at the outset of this note. The string worldsheet action is formally a functional of the map {\Sigma : (\tau, \sigma) \mapsto X^{\mu}(\tau, \sigma) \in \mathbb{R}^{1, d}}. If it is still not clear, remember that what we’re working toward is a description of the string worldsheet {\Sigma} as a curved surface embedded in spacetime. This embedding is given by the fields {X^{\mu}(\tau, \sigma)}, in which the parameters {\tau} and {\sigma} can be viewed (locally) as coordinates on the worldsheet. So the string map tells us that, given some fixed point {(\tau, \sigma)} in the parameter space, we are performing a direct mapping to a fixed point {X^{\mu}(\tau, \sigma)} in spacetime coordinates. Typically we drop the arguments {(\tau, \sigma)} and leave them implicit, with the inverse of the map {X^{\mu}} taking the worldsheet to the parameter space.

It is also worth noting that the functions {X^{\mu}} describe how the string propagates and oscillates through spacetime, while the endpoints of the string are parameterised by {\tau} such that {\frac{\partial X^{\mu}}{\partial \tau} (\tau, \sigma) \neq 0}. In our present case, we are considering an open string; but if {\sigma} is periodic then we’d be working with a closed string embedded in the background spacetime.

Getting back to the task at hand: to find the area element we proceed in similar fashion as before, except now we must use relativistic notation. So for the area element we have {d\tau} and {d\sigma} describing the sides of an infinitesimal rectangle in parameter space. In spacetime, this becomes a quadrilateral area element. We therefore set up a direct analogue with our expression for {dA} in (4) where we consider the vectors {dv^{\mu}_{1}} and {dv_{2}^{\mu}} spanning the quadrilateral,

\displaystyle  dv^{\mu}_{1} = \frac{\partial X^{\mu}}{\partial \tau} d\tau, \ \ dv^{\mu}_{2} = \frac{\partial X^{\mu}}{\partial \sigma} d\sigma. \ \ \ \ \ (23)

Notice that we may substitute for {dv^{\mu}_{1}} and {dv_{2}^{\mu}} into (4),

\displaystyle  dA = d\tau d\sigma \sqrt{(\frac{\partial X^{\mu}}{\partial \tau} \frac{\partial X_{\mu}}{\partial \tau})(\frac{\partial X^{\nu}}{\partial \sigma} \frac{\partial X_{\nu}}{\partial \sigma}) - (\frac{\partial X^{\mu}}{\partial \tau} \frac{\partial X_{\mu}}{\partial \sigma})^2}. \ \ \ \ \ (24)

We now invoke relativistic dot product notation so that we ensure that what we are working with is the proper area. The object under the square root in (24) turns out to be negative, so we must switch its sign in order to obtain a real area. The basic idea is that, for a surface with a timelike tangent vector and a spacelike tangent vector, the Cauchy-Schwarz inequality flips. This means {(\frac{\partial X}{\partial \tau} \cdot \frac{\partial X}{\partial \sigma})^2 - (\frac{\partial X}{\partial \tau})^2 (\frac{\partial X}{\partial \sigma})^2 > 0}. We also want to integrate (24). So, putting everything together, we have

\displaystyle  A = \int_{\Sigma} d\tau d\sigma \sqrt{(\frac{\partial X}{\partial \tau} \cdot \frac{\partial X}{\partial \sigma})^2 - (\frac{\partial X}{\partial \tau} \cdot \frac{\partial X}{\partial \tau})(\frac{\partial X}{\partial \sigma} \cdot \frac{\partial X}{\partial \sigma})}. \ \ \ \ \ (25)

We can still simplify our expression for the area using the more compact notation, {\dot{X}^{\mu} \equiv \frac{\partial X^{\mu}}{\partial \tau}} and {X^{\prime \mu} \equiv \frac{\partial X^{\mu}}{\partial \sigma}}. This means we can write,

\displaystyle  A = \int_{\Sigma} d\tau d\sigma \sqrt{(\dot{X} \cdot X^{\prime})^2 - (\dot{X})^2 (X^{\prime})^2}. \ \ \ \ \ (26)

Now comes the important part. From (26) there are a few ways we can approach the Nambu-Goto action. The most direct approach is to remember that, inasmuch as we are generalising the point particle action, we may anticipate the existence of some constant of proportionality. Indeed, it is completely reasonable to anticipate an action of the general form {S = -T \int dA}. And this proves to be the case, because it follows that we may write the Nambu-Goto action for the string as

\displaystyle  S_{NG} = -\frac{T_0}{c} \int_{\tau_i}^{\tau_f} d\tau \int_{0}^{\sigma_1} d\sigma \sqrt{(\dot{X} \cdot X^{\prime})^2 - (\dot{X})^2 (X^{\prime})^2}, \ \ \ \ \ (27)

where {\frac{T_0}{c}} is a constant of proportionality to ensure units of action. To explain this, consider the following. Given that the string action is proportional to the proper area of the worldsheet, the area functional has units of length squared. We see this because {X^{\mu}} has units of length, i.e., {[X] = L}, and there are four factors of {X} in each term under the square root. Each term under the square root also has two {\sigma} derivatives and two {\tau} derivatives, with their units cancelling against those of the measure {d\tau d\sigma}. Since {S_{NG}} must have the units of action {[S] = [\hbar] = ML^2/T} with {A} having units {L^2}, the total proper area must be multiplied by a quantity with units {M/T}. We know that the string will have a tension, {T_0}, which has units of force. We also know that force divided by velocity has the units {M/T}; so to ensure units of action the proper area is multiplied by {T_0 / c}.

6. Manifest Reparameterisation Invariance of the Nambu-Goto Action

We still shouldn’t be completely satisfied with this early form of the Nambu-Goto action (27). How do we know, for instance, that what we have ended up with is manifestly reparameterisation invariant? It is crucial that the {S_{NG}} action be dependent only on the embedding in spacetime and not the choice of parameterisation.

To explore the action (27) in a deeper way, we first need to invoke the target space Minkowski metric, {\eta_{\mu \nu}}, and we should consider a differential line element of the form

\displaystyle  -ds^{2} = dX^{\mu} dX_{\mu} = \eta_{\mu \nu} dX^{\mu} dX^{\nu}. \ \ \ \ \ (28)

We may now expand the differentials {dX^{\mu}} in terms of the worldsheet parameters,

\displaystyle  -ds^{2} = \eta_{\mu \nu} \frac{\partial X^{\mu}}{\partial \xi^{\alpha}} \frac{\partial X^{\nu}}{\partial \xi^{\beta}} \ d\xi^{\alpha} d\xi^{\beta}, \ \ \ \ \ (29)

where {\alpha} and {\beta} take the values 1 and 2, with {\xi^{1} = \tau} and {\xi^{2} = \sigma}. As before for the spatial surface, we can define an induced metric. In this case, the induced metric on the string worldsheet is given as {h_{\alpha \beta}}. It is simply the pullback of the target space Minkowski metric, {\eta_{\mu \nu}}. This allows us to define the induced metric as,

\displaystyle  h_{\alpha \beta} = \eta_{\mu \nu} \frac{\partial X^{\mu}}{\partial \xi^{\alpha}} \frac{\partial X^{\nu}}{\partial \xi^{\beta}}. \ \ \ \ \ (30)

This means we can write the more compact equation for the line element

\displaystyle  -ds^{2} = h_{\alpha \beta} d\xi^{\alpha}d\xi^{\beta}, \ \ \ \ \ (31)

because, while the induced metric describes distances on the string worldsheet, it also includes the metric of the background spacetime in its definition. But, to ensure clarity of knowledge, let’s think about this induced metric a bit more. In matrix form, it is a {2 \times 2} matrix with components

\displaystyle  h_{\tau \tau} = \eta_{\mu \nu} \frac{\partial X^{\mu}}{\partial \tau} \frac{\partial X^{\nu}}{\partial \tau} = \dot{X}^{2},

\displaystyle h_{\sigma \tau} = \eta_{\mu \nu} \frac{\partial X^{\mu}}{\partial \sigma} \frac{\partial X^{\nu}}{\partial \tau} = \dot{X} \cdot X^{\prime} = h_{\tau \sigma},

\displaystyle  h_{\sigma \sigma} = \eta_{\mu \nu} \frac{\partial X^{\mu}}{\partial \sigma} \frac{\partial X^{\nu}}{\partial \sigma} = X^{\prime 2}. \ \ \ \ \ (32)

And so, the induced metric may be written in matrix form as

\displaystyle  h_{\alpha \beta} = \begin{pmatrix} \dot{X}^2 & \dot{X} \cdot X^{\prime} \\ \dot{X} \cdot X^{\prime} & X^{\prime 2} \\ \end{pmatrix}. \ \ \ \ \ (33)

Therefore, extending the logic of the spatial surface, manifest reparameterisation invariance follows with the help of the induced metric. Since {\det h_{\alpha \beta} = \dot{X}^2 X^{\prime 2} - (\dot{X} \cdot X^{\prime})^2}, the quantity under the square root in the action is precisely {-\det h_{\alpha \beta}}, and we can write

\displaystyle  S_{NG} = -\frac{T_{0}}{c} \int_{\Sigma} d\tau d\sigma \sqrt{-h}, \ \ \ \ \ (34)

where {h = \det h_{\alpha \beta}}.
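To make the induced metric a little more concrete, here is a small numerical sketch for one specific embedding. I am assuming, purely for illustration, a rigidly rotating straight string {X^{\mu} = (\tau, \sigma \cos \omega\tau, \sigma \sin \omega\tau)} in 2+1 dimensions with {c = 1} and the mostly-plus signature; the value of {\omega} and the sample point are arbitrary choices of mine. For this embedding one expects {h_{\tau\tau} = -1 + \omega^2\sigma^2}, {h_{\tau\sigma} = 0} and {h_{\sigma\sigma} = 1}, so that {-h = 1 - \omega^2 \sigma^2}.

```python
import math

ETA = (-1.0, 1.0, 1.0)  # diagonal of the Minkowski metric (mostly plus), c = 1
OMEGA = 0.5             # angular velocity of the string (assumed value)

def minkowski_dot(a, b):
    """Relativistic dot product a.b = eta_{mu nu} a^mu b^nu."""
    return sum(e * p * q for e, p, q in zip(ETA, a, b))

def embedding(tau, sigma):
    """Rigidly rotating straight string, X^mu(tau, sigma). Illustrative only."""
    return (tau,
            sigma * math.cos(OMEGA * tau),
            sigma * math.sin(OMEGA * tau))

def induced_metric(X, tau, sigma, eps=1e-6):
    """Components (h_tt, h_ts, h_ss) of the induced metric (30)."""
    x_dot = [(p - m) / (2 * eps)
             for p, m in zip(X(tau + eps, sigma), X(tau - eps, sigma))]
    x_prime = [(p - m) / (2 * eps)
               for p, m in zip(X(tau, sigma + eps), X(tau, sigma - eps))]
    return (minkowski_dot(x_dot, x_dot),
            minkowski_dot(x_dot, x_prime),
            minkowski_dot(x_prime, x_prime))

tau, sigma = 0.7, 1.2
h_tt, h_ts, h_ss = induced_metric(embedding, tau, sigma)

# -det h is the quantity under the square root in the Nambu-Goto action.
minus_det_h = h_ts ** 2 - h_tt * h_ss
expected = 1.0 - (OMEGA * sigma) ** 2
print(minus_det_h, expected)
```

Note that {-h} stays positive only while {\omega \sigma < 1}, that is, while no point of the string moves faster than light, which is the statement that {\dot{X}} remains timelike.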

The final observation is that, as Polchinski notes (p.11), for the string tension an alternative parameter is {\alpha^{\prime}}. This proportionality constant {\alpha^{\prime}} has been used since the early days of string theory; one may recognise it as the Regge slope, which has to do with the relation between the angular momentum, {J}, of a rotating string and the square of the energy {E}. In that {\alpha^{\prime}} has units of spacetime-length-squared, we therefore observe

\displaystyle  T = \frac{1}{2 \pi \alpha^{\prime}}, \ \ \ \ \ (35)

where we’ve set {\hbar = c = 1}. This is equation (1.2.10) in Polchinski. Hence, we may now rewrite {S_{NG}} in its more conventional form as read in equations (1.2.9a-1.2.9b):

\displaystyle  S_{NG} = - \frac{1}{2 \pi \alpha^{\prime}} \int_{\Sigma} d\tau d\sigma \ (- \det h_{\alpha \beta})^{1/2}. \ \ \ \ \ (36)

The answer in Exercise 1.1b reveals more explicitly how the string tension is related to the Regge slope. It just requires that we write things in terms of the transverse velocity. To keep these notes focused and organised, at the conclusion of each chapter we’ll go over the solutions to the exercises and so we’ll return to this question then.

6.1. Equations of motion

Before we get to the symmetries of the action (36), let’s quickly look at its equations of motion. To simplify matters, we can write the Lagrangian as {\mathcal{L} = \sqrt{-h}} with the Euler-Lagrange equations reading as

\displaystyle  \partial_\alpha\left(\frac{\partial\mathcal{L}}{\partial(\partial_\alpha X^\mu)}\right)-\frac{\partial\mathcal{L}}{\partial X^\mu}=0. \ \ \ \ \ (37)

Since {\mathcal{L}} depends on {X^{\mu}} only through its derivatives, the second term vanishes. From the chain rule, it is therefore clear that we need to calculate

\displaystyle  \partial_\alpha\left(\frac{\partial\mathcal{L}}{\partial(\partial_\alpha X^\mu)}\right)=\partial_\alpha\left(\frac{\partial\mathcal{L}}{\partial h_{\beta\gamma}}\frac{\partial h_{\beta\gamma}}{\partial(\partial_\alpha X^\mu)}\right). \ \ \ \ \ (38)

For the first factor in brackets, we use the identity for the variation of the determinant {\delta\sqrt{-g}=-\frac{1}{2}\sqrt{-g}g_{\alpha\beta}\delta g^{\alpha\beta}=\frac{1}{2}\sqrt{-g}g^{\alpha\beta}\delta g_{\alpha\beta}}, which can be easily verified; the second form follows from {\delta g^{\alpha\beta} = -g^{\alpha\rho}g^{\beta\kappa}\delta g_{\rho\kappa}}. Hence,

\displaystyle  \frac{\partial\mathcal{L}}{\partial h_{\beta\gamma}}=\frac{\partial\sqrt{-h}}{\partial h_{\beta\gamma}}=-\frac{1}{2}\sqrt{-h} \ h_{\rho\kappa}\frac{\partial h^{\rho\kappa}}{\partial h_{\beta\gamma}}=\frac{1}{2}\sqrt{-h}h^{\beta\gamma}. \ \ \ \ \ (39)

For the next factor we find

\displaystyle  \frac{\partial h_{\beta\gamma}}{\partial(\partial_\alpha X^\mu)} =\eta_{\mu\nu}\delta^{\alpha}_\beta\partial_\gamma X^\nu +\eta_{\mu\nu}\delta^\alpha_\gamma\partial_\beta X^\nu =\delta^\alpha_\beta\partial_\gamma X_\mu +\delta^\alpha_\gamma\partial_\beta X_\mu. \ \ \ \ \ (40)

Putting everything together

\displaystyle  \partial_\alpha\left(\frac{\partial\mathcal{L}}{\partial h_{\beta\gamma}}\frac{\partial h_{\beta\gamma}}{\partial(\partial_\alpha X^\mu)}\right)=\partial_\alpha\left(\frac{1}{2}\sqrt{-h}h^{\beta\gamma}(\delta^{\alpha}_\beta\partial_\gamma X_\mu+\delta^\alpha_\gamma\partial_\beta X_\mu)\right)=0 \ \ \ \ \ (41)

\displaystyle  \frac{1}{2}\partial_\alpha(\sqrt{-h}h^{\alpha\gamma}\partial_\gamma X_\mu+\sqrt{-h}h^{\beta\alpha}\partial_\beta X_\mu)=0 \ \ \ \ \ (42)

and, using the symmetry of {h^{\alpha\beta}} and raising the index {\mu} with {\eta^{\mu\nu}},

\displaystyle  \partial_\alpha(\sqrt{-h}h^{\alpha\beta}\partial_\beta X^\mu)=0. \ \ \ \ \ (43)

As the metric {h} contains the embedding {X^{\mu}}, these equations are highly non-linear. But this is not unexpected given the fact that the action (36) is non-linear. One way to interpret these equations is that a minimal surface area is being demanded by the stationary action; in Zwiebach’s textbook one is motivated to think analogously of a static soap film in some Lorentz frame. In this case, we think of the film as a spatial surface in which every point is a saddle point.

6.2. Symmetries

Finally, the last topic covered concerns the symmetries of the action (36).

Poincare group: As the Nambu-Goto action is completely and directly analogous to the action for a relativistic point particle, one might rightly anticipate that the action for a string is invariant under the isometry group of flat spacetime, which is the D-dimensional Poincare group

\displaystyle  X^{\prime \mu}(\tau, \sigma) = \Lambda^{\mu}_{\nu}X^{\nu}(\tau, \sigma) + a^{\mu}. \ \ \ \ \ (44)

This symmetry group consists of Lorentz transformations {\Lambda^{\mu}_{\nu}} belonging to {SO(D-1, 1)} and spacetime translations {a^{\mu}}. This symmetry is manifest and can be read off from (36) since the Lorentz indices are contracted in the correct way to obtain a Lorentz scalar. But to see it explicitly just note that {X^{\mu}} are flat spacetime vectors. Under the transformation {X^{\mu} \rightarrow X^{\prime \mu} = \Lambda^{\mu}_{\nu}X^{\nu}(\tau, \sigma) + a^{\mu}} we see that {\partial_{\alpha}X^{\prime \mu} = \Lambda^{\mu}_{\nu}\partial_{\alpha}X^{\nu}}. Hence

\displaystyle  \eta_{\mu \nu}\partial_{\alpha} X^{\prime \mu}\partial_{\beta}X^{\prime \nu} = {\Lambda^{\mu}}_{\gamma} \eta_{\mu \nu} {\Lambda^{\nu}}_{\sigma}\partial_{\alpha} X^{\gamma} \partial_{\beta} X^{\sigma} = \eta_{\gamma \sigma} \partial_{\alpha} X^{\gamma} \partial_{\beta} X^{\sigma}, \ \ \ \ \ (45)

where we used {\Lambda^{\mu}_{\gamma} \eta_{\mu \nu} \Lambda^{\nu}_{\sigma} = \eta_{\gamma \sigma}}.
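The statement in (45) is easy to check numerically. The following sketch assumes a boost along {X^1} with velocity 0.6 (in units with {c = 1}) in 2+1 dimensions, together with two sample tangent vectors that I have chosen arbitrarily; the constant translation {a^{\mu}} drops out of the derivatives {\partial_{\alpha} X^{\mu}}, so only {\Lambda} acts on them.

```python
import math

ETA = (-1.0, 1.0, 1.0)  # diagonal of the Minkowski metric (mostly plus), c = 1

def minkowski_dot(a, b):
    """Relativistic dot product a.b = eta_{mu nu} a^mu b^nu."""
    return sum(e * p * q for e, p, q in zip(ETA, a, b))

def boost_x(v):
    """Lorentz boost along X^1 with velocity v, as a 3x3 matrix."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0.0],
            [-gamma * v, gamma, 0.0],
            [0.0, 0.0, 1.0]]

def transform(matrix, vector):
    """Matrix acting on a vector: (Lambda X)^mu = Lambda^mu_nu X^nu."""
    return [sum(matrix[m][n] * vector[n] for n in range(3)) for m in range(3)]

def induced_components(x_dot, x_prime):
    """(h_tt, h_ts, h_ss) built from the two tangent vectors."""
    return (minkowski_dot(x_dot, x_dot),
            minkowski_dot(x_dot, x_prime),
            minkowski_dot(x_prime, x_prime))

# Sample tangent vectors at one worldsheet point (arbitrary illustrative values).
x_dot = [1.0, 0.3, 0.1]    # timelike
x_prime = [0.2, 0.0, 1.0]  # spacelike

boost = boost_x(0.6)
h_before = induced_components(x_dot, x_prime)
h_after = induced_components(transform(boost, x_dot), transform(boost, x_prime))
print(h_before)
print(h_after)
```

The two sets of components should agree to floating-point precision, so {\sqrt{-h}} and hence the action are unchanged by the boost.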

From the perspective of the worldsheet theory, the Nambu-Goto action is a 2-dimensional field theory of scalar fields {X^{\mu}(\tau, \sigma)}, and Poincare invariance is in fact an internal symmetry.

Diffeomorphism invariance: The Nambu-Goto action is also invariant under diffeomorphisms, or reparameterisations of the worldsheet coordinates, which we’ve already observed by the very nature of how we constructed (36), such that {X^{\prime \mu} (\tau^{\prime}, \sigma^{\prime}) = X^{\mu}(\tau, \sigma)}.

6.3. Concluding remarks

The main issue with the action (36) is the presence of the square root, which complicates matters when we attempt to quantise the theory. That is why, analogous to the case of the relativistic point particle, where the square root also obstructs the massless limit {m \rightarrow 0}, we’ll want to get rid of this square root and construct a classically equivalent action. This is known as the Polyakov action and, following the progression in Polchinski, it is the topic of the next note.

In the meantime, I want to point out that there is still much more to be learned about the Nambu-Goto action and its dynamics. There are some quite famous and important results, which are not covered in Polchinski’s textbook. It is notable, for instance, that from an analysis of the worldsheet momentum densities

\displaystyle  P^{\alpha}_{\mu} = \frac{\partial \mathcal{L}}{\partial \partial_{\alpha}X^{\mu}} \ \ \ \ \ (46)

we can evaluate the components of the canonical momenta explicitly

\displaystyle  \Pi = P_{\mu}^{\sigma} = \frac{\partial L}{\partial X^{\prime \mu}} = \frac{\partial}{\partial X^{\prime \mu}} (-T \sqrt{(\dot{X} \cdot X^{\prime})^{2} - (\dot{X}^2)(X^{\prime})^2})

\displaystyle  = -\frac{T}{2}[(\dot{X} \cdot X^{\prime})^{2} - (\dot{X}^2)(X^{\prime})^2]^{-1/2} [2(\dot{X} \cdot X^{\prime})\dot{X}_{\mu} - 2 \dot{X}^{2}X_{\mu}^{\prime}]

\displaystyle  = -T\frac{(\dot{X} \cdot X^{\prime})\dot{X}_{\mu} - \dot{X}^{2}X_{\mu}^{\prime}}{\sqrt{(\dot{X} \cdot X^{\prime})^{2} - (\dot{X}^2)(X^{\prime})^2}}, \ \ \ \ \ (47)


\displaystyle P_{\mu}^{\tau} = \frac{\partial L}{\partial \dot{X}^{\mu}} = \frac{\partial}{\partial \dot{X}^{\mu}}(-T \sqrt{(\dot{X} \cdot X^{\prime})^{2} - (\dot{X}^2)(X^{\prime})^2})

\displaystyle  = -\frac{T}{2}[(\dot{X} \cdot X^{\prime})^{2} - (\dot{X}^2)(X^{\prime})^2]^{-1/2} [2(\dot{X} \cdot X^{\prime})X_{\mu}^{\prime} - 2 (X^{\prime})^{2} \dot{X}_{\mu}]

\displaystyle  = -T\frac{(\dot{X} \cdot X^{\prime})X_{\mu}^{\prime} - (X^{\prime})^{2} \dot{X}_{\mu}}{\sqrt{(\dot{X} \cdot X^{\prime})^{2} - (\dot{X}^2)(X^{\prime})^2}}. \ \ \ \ \ (48)

From this analysis, we can obtain an equation that we can interpret as the generalised momentum flow of the particle worldline. This helps give a bit more insight and intuition into the direct analogue we’ve established between point particle theory and the theory of strings. Furthermore, by imposing the appropriate boundary conditions we can show for the equations of motion that

\displaystyle  \partial_{\alpha}P^{\alpha}_{\mu} = 0, \ \ \ \ \ (49)

which, given the expressions for the worldsheet momentum densities, reduces to the 2-dimensional wave equation for the choice of coordinates {\dot{X} \cdot X^{\prime} = 0}, {\dot{X}^2 = -1}, and {X^{\prime 2} = 1}. In the same analysis, we can find very important conditions such as the Virasoro constraints that govern the dynamics of the string.
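As a quick check of the wave equation statement, here is a sketch of the reduction. It assumes that the overall factor of {-T} from the Lagrangian is kept in the momentum densities, and that the conditions {\dot{X} \cdot X^{\prime} = 0}, {\dot{X}^2 = -1}, {X^{\prime 2} = 1} hold everywhere on the worldsheet so that they may be substituted directly. The square root then evaluates to one and the momentum densities become

\displaystyle  P^{\tau}_{\mu} = T \dot{X}_{\mu}, \qquad P^{\sigma}_{\mu} = -T X^{\prime}_{\mu},

so that the equations of motion read

\displaystyle  \partial_{\tau}P^{\tau}_{\mu} + \partial_{\sigma}P^{\sigma}_{\mu} = T(\ddot{X}_{\mu} - X^{\prime \prime}_{\mu}) = 0,

which is the 2-dimensional wave equation for each component {X^{\mu}(\tau, \sigma)}.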

There is also much more that can be studied: boundary conditions and motion of the string endpoints, which provides a first introduction to D-branes; tension and energy of the stretched string; transverse velocity; among other interesting topics. All of this of course comes up also in our study of the Polyakov action; but for the interested reader, Zwiebach’s textbook referenced at the outset covers all of these topics in detail in the context of the Nambu-Goto action.

Cosmological constant, the duality symmetric string, and Atkin-Lehner symmetry

I was going through one of my notebooks and I came across a page with several comments on old papers by Arkady Tseytlin [1] and Gregory Moore [3], respectively. The notes must have been written last autumn at the start of the academic year, because it was around this time my supervisor and I were talking about the cosmological constant problem. In the referenced papers, two interesting approaches to the CC in string theory are presented.

Let’s start with Tseytlin. We’ve discussed in the past Tseytlin’s formulation of the duality symmetric string for interacting chiral bosons, so I direct the reader to that entry for a background introduction. Jumping straight to the point, what we find in the final sections of [2] is that, upon computing the 3-graviton amplitudes, the following 3-graviton interaction is obtained

\displaystyle S_3 = \int d^D x_{+} d^D x_{-} [h_{\alpha \beta} (h_{\lambda \rho}\partial_{+ \alpha} \partial_{- \beta} h_{\lambda \rho} + 2\partial_{+ \alpha} h_{\lambda \rho}\partial_{- \rho}h_{\beta \lambda})], \  \ (1)

where \partial_{\pm \mu} \equiv \partial / \partial x^{\mu}_{\pm} and h_{\mu \nu} \equiv H_{(\mu \nu)} (x_{+}, x_{-}). When (1) is written in terms of doubled coordinates (x, \tilde{x}) the low-energy effective theory takes the form

\displaystyle S_3 = \int d^D x d^D \tilde{x} [R_3 (\partial) - R_3 (\tilde{\partial})], \  \ (2)

where \partial_{\mu} = 1/\sqrt{2} (\partial_{+ \mu} + \partial_{-\mu}) = \partial / \partial x^{\mu} and \tilde{\partial}_{\mu} = 1 / \sqrt{2} (\partial_{+ \mu} - \partial_{-\mu}) = \partial / \partial \tilde{x}_{\mu}. The 3-graviton term R_3 (\partial) (respectively R_3(\tilde{\partial})) in the expansion of the scalar curvature for the metric G_{\mu \nu} = \delta_{\mu \nu} + h_{\mu \nu} with h_{\mu \nu}(x, \tilde{x}) can be written

\displaystyle R_3 (\partial) = 1/4 h_{\mu \nu} \partial^2 h_{\mu \nu} - 1/4 h_{\alpha \beta}(h_{\lambda \rho} \partial_{\alpha} \partial_{\beta} h_{\lambda \rho} + 2\partial_{\alpha} h_{\lambda \rho}\partial_{\rho}h_{\beta \lambda}) + ..., \  \ (3)

\displaystyle  \equiv R_2 + R_3 + ..., \  \ (4)

with \partial_{\mu} h_{\mu \nu}= 0 and h^{\mu}_{\mu} = 0.

As we then see in [2], in the case \tilde{\partial}_{\lambda} h_{\mu \nu} = 0 it follows that (2) reduces to the standard Einstein vertex. But as Tseytlin also notes, there is a contradiction in the structure of (2) owing to the presence of the minus sign. What happens is that, if R_3(\partial) and R_3 (\tilde{\partial}) are replaced by the full Einstein scalars, the corresponding linearised equations for h_{\mu \nu} contain the difference of \partial^2 and \tilde{\partial}^2, which does not match the mass-shell condition (\partial^2 + \tilde{\partial}^2)H_{\mu \nu} = 0. To remedy this, the full off-shell generalisation of (2) is considered

\displaystyle S_{Eff} = \int d^D x d^D \tilde{x} \sqrt{g(x,\tilde{x})} \sqrt{\tilde{g}(x, \tilde{x})} [R(g, \partial) + R(\tilde{g}, \tilde{\partial}) + ...], \  \ (5)

which I think is fair to say is a quite famous result. Take particular notice of the structure of this effective action. For me, I could stare at it for lengths of time; it is one of my current favourite results in the context of duality symmetric string theory and I have several thoughts about it. In fact, some of my ongoing research is focused on thinking more broadly about the geometric structure of the full 2D-dimensional space, and I think there is still quite a bit left to be said about potential insight offered in (5).

But for the interests of the present post, we want to focus on an altogether different matter: the cosmological constant. To share something else that is interesting, in [1] a perhaps lesser-known ansatz is presented for the large distance effective gravitational action based on the effective theory (5). It takes the form

\displaystyle \bar{S} = \frac{S}{V} = \frac{\int d^D x \sqrt{g} (R + L_M)}{\int d^D x \sqrt{g}}. \  \ (6)

What we have here is a gravity plus matter system \bar{S} that is given by the standard action S divided by the volume V of spacetime. How to make sense of it? Much of [1] is spent arriving at (6), and so I’ll spare the details as they are quite clear in that paper. The main idea, in summary, is that from (5) in which the coordinates are doubled at the Planck scale, one can essentially integrate out the dual coordinates \tilde{x} (really, the dual coordinates are treated in Kaluza-Klein fashion and as such one sees that the integral over the dual coordinates decouples) so that, as a step to arriving at (6), an action is obtained for the standard curvature scalar R that includes the dual volume \tilde{V} that is the inverse of the usual volume. It looks like this

\displaystyle \hat{S} \simeq \tilde{V} \int d^D x \sqrt{g} R + ..., \  \ (7)


\displaystyle \tilde{V} = \int d^D \tilde{x} \sqrt{\tilde{g}(\tilde{x})}. \ \ (8)

What was really clever by Tseytlin resides in how, motivated by an earlier proposal by Linde, he saw that although some mechanism to solve the CC problem at the level of the Planck scale looked unlikely, one might be able to explain why the CC looked small through some modification of the low-energy effective gravitational action using a sort of nonlocality. He saw, quite rightly, such a possibility naturally emerges within the structure of duality symmetric string theory. However, as it stands, there are issues with radiative stability in this set-up, despite some claims in the literature. This was most recently explored in relation to vacuum energy sequestering. But despite these issues, among a number of other questions, I think there could still be something in the general line of thought; hence my interests in the target space of this theory.


The other paper [3] I started taking notes on was by another legend, Gregory Moore. One of the issues with the CC in string theory is the contribution to it by the massless sector. One can easily see this from an analysis of the standard string. But what Moore observes is how this contribution may be cancelled by a tower of massive states, such as by using the Atkin-Lehner symmetry for instance.

Atkin-Lehner (AL) symmetry is really quite neat. It originates from number theory and the study of modular forms, but there are some suggestions and deep hints that AL symmetry is present in string theory. Admittedly, I am not deeply familiar with this topic and have merely flagged this paper as interesting for when I have some time to go back and think about the CC. But my understanding is that, given the fact that the string path integral can be viewed as an inner product of modular forms over some moduli space, then in the case of certain backgrounds the moduli space can be seen to exhibit AL symmetry.

In short, the motivation for Moore is to look for any kind of enhanced albeit hidden symmetry (for instance, in parameter space). In the expansion of the trace for a complete set of stringy states, the one-loop path integral can be interpreted as an inner product of left and right-moving wave-functions Z = \langle \Psi_R \vert \Psi_L \rangle. From a stringy point of view, it is argued that the vanishing of the cosmological constant in our universe could then be interpreted from understanding why \Psi_R and \Psi_L are orthogonal. Naturally, Moore turns to heterotic theory. He finds that the one-loop string cosmological constant vanishes in non-trivial non-supersymmetric backgrounds when viewing the path integral as an inner product of orthogonal wave-functions.

But from what I understand, there are issues with the construction in [3], for example when applied in the case of four-dimensional spacetime. There is also another paper that I am aware of on twisted modular forms, but I have not read it. That said, I would like to understand AL better and also the issues faced in [3]. It is a very interesting paper. Given time with a return to thinking about the CC, it would be fun to work through properly. For that reason I share it here.


[1] A. A. Tseytlin. Duality-Symmetric String Theory and the Cosmological-Constant Problem. Phys. Rev. Lett. 66 (1991), 545-548. doi:10.1103/PhysRevLett.66.545.

[2] A. A. Tseytlin. Duality symmetric closed string theory and interacting chiral scalars. Nucl. Phys. B 350 (1991), 395-440. doi:10.1016/0550-3213(91)90266-Z.

[3] G. Moore. Atkin-Lehner Symmetry. Nucl. Phys. B293 (1987) 139.

Double Field Theory as the double copy of Yang-Mills

1. Introduction

A few weeks ago I came across this paper [DHP] on Double Field Theory and the double copy of Yang-Mills. Its result is most curious.

As a matter of introduction, recall how fundamental interactions in nature are governed by two kinds of theories: On the one hand, Einstein’s theory of relativity. On the other hand we have Yang-Mills theory, which provides a description of the gauge bosons of the standard model of particle physics. Yang-Mills is one example of gauge theory; however, not all gauge theories must necessarily be of Yang-Mills form. In a very broad picture view, gravity is also a gauge theory. This can be most easily seen in the diffeomorphism group symmetry.

Of course, Yang-Mills is the best quantum field theory that we have; it yields remarkable simplicity and is at the heart of the unification of the electromagnetic force and weak forces as well as the theory of the strong force, i.e., quantum chromodynamics. Similarly one might think that, given gravity is an incredibly symmetric theory, it should also yield a beautiful QFT. It doesn’t. When doing perturbation theory, even at quadratic order things already start to get hairy; but then at cubic and quartic order the theory is so complicated that attempting to do calculations with the interaction vertices becomes nightmarish. So instead of a beautiful QFT, what we actually find is incredibly complicated.

In this precise sense, on a quantum level there is quite an old juxtaposition between gauge theory in the sense of Yang-Mills (nice and simple) versus gravity (a hot mess). In other parts, the two can be seen to be quite close (at least we have a lot of hints that they are close). Indeed, putting aside gauge formulations of gravity, even simply under the gauge theory of Lorentz symmetries we can start to draw a comparison between gravity and Yang-Mills, and this has been the case since at least the 1970s. Around a similar time gauge theory of super Poincare symmetries produced another collection of hints. And, one of the most important examples without question is the holographic principle and the AdS/CFT correspondence.

Yet another highly fruitful way to drill down into gauge-gravity, especially over the last decade, has followed the important work of Bern-Carrasco-Johansson in [BCJ1] and [BCJ2]. Here, a remarkable observation is made: gravity scattering amplitudes can be seen as the exact double copy of Yang-Mills amplitudes, suggesting even further a deeply formal and profoundly intimate relationship between gauge theory and gravity.

Schematically put, following the double copy technique it is observed that gravity = gauge x gauge. This leads to the somewhat misleading statement that gravity is gauge theory squared.

A lot goes back to the KLT relations of string theory. The general idea of the double copy method is that, from within perturbation theory, Yang-Mills (and gauge theories in general) can be appropriately constructed so that their building blocks obey a property known as color-kinematics duality. (This is, in itself, a fascinating property worthy of more discussion in the future. To somewhat foreshadow what is to come, there were already suspicions in the early 1990s that it may relate to T-duality, which one will recall is a fundamental symmetry of the string). Simply put, this is a duality between color and kinematics for gauge theories leaving the amplitudes unaltered.

For instance, to understand the relation between gravity and gauge theory amplitudes at tree-level, we can consider a gauge theory amplitude where all particles are in the adjoint color representation. So if we take pure Yang-Mills

\displaystyle  S_{YM} = \frac{1}{g^2} \int \text{Tr} F \wedge \star F \ \ (1)

there is an organisation of the n-point L-loop gluon amplitude in terms of only cubic diagrams

\displaystyle  \mathcal{A}_{YM}^{n,L} = \sum \limits_i \frac{c_i n_i}{S_i d_i}, \ \ (2)

where {c_i} are the colour factors, {n_i} the kinematic numerators, {S_i} the symmetry factors, and {d_i} the propagators. The color-kinematics duality then states that there exists a transformation from any valid representation to one in which the numerators satisfy equations in one-to-one correspondence with the Jacobi identities of the color factors,

\displaystyle  c_i + c_j + c_k = 0 \Rightarrow n_i + n_j + n_k = 0

\displaystyle  c_i \rightarrow -c_i \Rightarrow n_i \rightarrow -n_i. \ \ (3)

So, as the kinematic numerators satisfy the same Jacobi identities as the structure constants do, for some choice of numerators (from what I understand the choice is not unique), we can obtain the gravity amplitude. For example, given double copy {c_i \rightarrow n_i} it is possible to obtain an amplitude of {\mathcal{N} = 0} supergravity

\displaystyle  \mathcal{A}_{\mathcal{N}=0}^{n,L} = \sum \limits_i \frac{n_i n_i}{S_i d_i}, \ \ (4)

where one will notice in the numerator that we’ve stripped off the colour and replaced it with kinematics, and where the supergravity action is

\displaystyle  S_{\mathcal{N}=0} = \frac{1}{2\kappa^2} \int \star R - \frac{1}{d-2} d\psi \wedge \star d\psi - \frac{1}{2} \exp(- \frac{4}{d-2}\psi) dB \wedge \star dB. \ \ (5)

In summary, the colour factors that contribute in the gauge theory appear on equal footing as the purely kinematical numerator factors (functions of momenta and polarizations), and all the while the Jacobi identities are satisfied. When all is said and done, the hot mess of a QFT in the gravity theory can be related to the nicest QFT in terms of Yang-Mills.

But notice that none of what has been said has anything to do with a description of physics at the level of the Lagrangian. For a long time, some attempts were made but there was no reason to think the double copy method should work at the level of an action. As stated in [Nico]: ‘no amount of fiddling with the Einstein-Hilbert action will reduce it to a square of a Yang-Mills action.’ Although many attempts have been made, with some notable results, this question of applying the double copy method on the level of the action takes us to [DHP].

In this paper, the authors use the double copy techniques to replace colour factors with a second set of kinematic factors, which come with their own momenta, and it ultimately leads to a double field theory (see past posts for discussion on DFT) with doubled momenta or, in position space, a doubled set of coordinates. In other words, the double copy of Yang-Mills theory (at the level of the action) yields at quadratic and cubic order double field theory upon integrating out the duality invariant dilaton.

When I first read this paper, the result of obtaining the background independent DFT action was astounding to me. In what follows, I want to quickly review the calculation (we’ll only consider the quadratic action, where the Lagrangian remains gauge invariant).

2. Yang-Mills / DFT – Quadratic theory

Start with a gauge theory of non-abelian vector fields in D-dimensions

\displaystyle  S_{YM} = -\frac{1}{4} \int \ d^Dx \ \kappa_{ab} F^{\mu \nu a} F_{\mu \nu}^{b}, \ \ (6)

with the field strength for the gauge bosons {A_{\mu}^{a}} defined as

\displaystyle  F_{\mu \nu}^{a} = \partial_{\mu} A^{a}_{\nu} - \partial_{\nu}A^{a}_{\mu} + g_{YM} f^{a}_{bc}A_{\mu}^{b} A_{\nu}^{c}. \ \ (7)

Here {g_{YM}} is the usual gauge coupling. The {f^{a}_{bc}} term denotes the structure constants of a compact Lie group (i.e., in this case a non-Abelian gauge group). This group represents the color gauge group, and we define {a,b,...} as adjoint indices. The invariant Cartan-Killing form {\kappa_{ab}} lowers the adjoint indices such that {f_{abc} \equiv \kappa_{ad}f^d_{bc}} is antisymmetric.

Expanding the action (6) to quadratic order in {A^{\mu}} and then integrating by parts we find

\displaystyle  -\frac{1}{4} \int d^{D}x \ \kappa_{ab} \ (-2 \Box A^{\mu a} A_{\mu}^{b} + 2\partial_{\mu}\partial^{\nu} A^{\mu a}A_{\nu}^b). \ \ (8)

Pulling out {A^{\mu a}} and the factor of 2, we obtain the second-order action as given in [DHP]

\displaystyle  S_{YM}^{(2)} = \frac{1}{2} \int d^{D}x \ \kappa_{ab} \ A^{\mu a}(\Box A^{b}_{\mu} - \partial_{\mu} \partial^{\nu} A^b_{\nu}). \ \ (9)

To make contact with the double copy formalism, we next move to momentum space with momenta {k}. Define {A^{a}_{\mu}(k) = 1/(2\pi)^{D/2} \int d^D x \ A_{\mu}^{a}(x) \exp(ikx)}. In these notes we use the shorthand {\int_k := \int d^{D} k}. In [DHP], the convention is used where {k^2} is scaled out, which then allows us to define the following projector

\displaystyle  \Pi^{\mu \nu}(k) \equiv \eta^{\mu \nu} - \frac{k^{\mu} k^{\nu}}{k^2}, \ \ (10)

where we have the Minkowski metric {\eta_{\mu \nu} = (-,+,+,+)}.

Proposition 1 The projector defined in (10) satisfies the identities

\displaystyle  \Pi^{\mu \nu}(k)k_{\nu} \equiv 0, \ \text{and} \ \Pi^{\mu \nu}\Pi_{\nu \rho} = \Pi^{\mu}_{\rho}. \ \ (11)

Proof: The second identity is trivial, while the first identity can be found substituting (10) in (11) and recalling we’ve scaled out {k^2}. \Box
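To spell the proof out, here is a short sketch that uses only the definition (10) and {k^{\nu}k_{\nu} = k^2}:

\displaystyle  \Pi^{\mu \nu}(k)k_{\nu} = k^{\mu} - \frac{k^{\mu} k^{2}}{k^2} = 0,

\displaystyle  \Pi^{\mu \nu}\Pi_{\nu \rho} = \delta^{\mu}_{\rho} - 2\frac{k^{\mu}k_{\rho}}{k^2} + \frac{k^{\mu} k^{2} k_{\rho}}{k^4} = \delta^{\mu}_{\rho} - \frac{k^{\mu}k_{\rho}}{k^2} = \Pi^{\mu}_{\rho}.

Both lines assume {k^2 \neq 0}, so that the projector is well defined.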

The first identity in (11) implies gauge invariance under the transformation

\displaystyle  \delta A^{a}_{\mu}(k) = k_{\mu}\lambda^a(k), \ \ (12)

where the gauge parameter {\lambda^a(k)} is defined as an arbitrary function.

3. Double copy of gravity theory

Proposition 2 The double copy prescription applied to Yang-Mills theory leads to double field theory.

Proof: Begin by replacing the color indices {a} by a second set of spacetime indices {a \rightarrow \bar{\mu}}. This second set of spacetime indices then corresponds to a second set of spacetime momenta {\bar{k}^{\bar{\mu}}}. For the fields {A^a_{\mu}(k)} in momentum space, we define a new doubled field

\displaystyle  A^a_{\mu}(k) \rightarrow e_{\mu \bar{\mu}}(k, \bar{k}). \ \ (13)

Next, following the double copy formalism, a substitution rule for the Cartan-Killing metric {\kappa_{ab}} needs to be defined. In [DHP], the authors propose that we replace this metric with a projector carrying barred indices such that

\displaystyle  \kappa_{ab} \rightarrow \frac{1}{2} \bar{\Pi}^{\bar{\mu} \bar{\nu}}(\bar{k}). \ \ (14)

Notice, this expression exists entirely in the barred space.

Remark 1 (Argument for why (14) is correct) It is argued that the replacement (14) is derived from the double copy rule at the level of amplitudes. Schematically, one can consider a gauge theory amplitude of the form {\mathcal{A} = \sum_i n_i c_i / D_i}, where {n_i} are kinematic factors, {c_i} are colour factors, and {D_i} denote inverse propagators. Then, in the double copy, replace {c_i} by {n_i} with {D_i \sim k^2}. This means that {k^2} may be scaled out as before, leaving only the projector to be doubled.

Making the appropriate substitutions, we obtain a double copy action for gravity of the form

\displaystyle  S_{grav}^{(2)} = - \frac{1}{4} \int_{k, \bar{k}} \ k^2 \ \Pi^{\mu \nu}(k) \bar{\Pi}^{\bar{\mu}\bar{\nu}}(\bar{k}) \ e_{\mu \bar{\mu}}(-k, -\bar{k})e_{\nu \bar{\nu}}(k, \bar{k}). \ \ (15)

The structure of this action is really quite nice; in some ways, it is what one might expect as it is very reminiscent of the structure of the duality symmetric string.

To make the doubled nature of the action (15) more explicit, define doubled momenta {K = (k, \bar{k})}, and, just as for the duality symmetric string, treat {k, \bar{k}} on equal footing. It now seems arbitrary whether there is {k^2} or {\bar{k}^2} at the front of the integrand. In any case, unlike the measure factor for the duality symmetric string which, in momentum space, takes the form {k, \tilde{k}}, the asymmetry of (15) is resolved by imposing

\displaystyle  k^2 = \bar{k}^2, \ \ (16)

which one might notice is just the level-matching condition. To obtain DFT, the imposition of this constraint is necessary (indeed, just like it is in pure DFT).

Remark 2 (More general solutions) The solution {k = \bar{k}} should be familiar from studying the linearised theory. However, there exist more general solutions and it might be interesting to think more about this matter.

It is fairly straightforward to see that under

\displaystyle  \delta e_{\mu \bar{\nu}} = k_{\mu}\bar{\lambda}_{\bar{\nu}} + \bar{k}_{\bar{\nu}}\lambda_{\mu} \ \ (17)

the action (15) is invariant. Now we have two gauge parameters dependent on doubled momenta.
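The invariance follows because each term of the variation is annihilated by one of the two projectors. A minimal numerical sketch in Python with NumPy is given below. It assumes {D = 4}, the mostly-plus metric, and illustrative momenta chosen so that {k^2 = \bar{k}^2}. The function name `projector` and all numerical values are my own choices, not taken from [DHP].

```python
import numpy as np

# Assumption: D = 4 Minkowski metric with mostly-plus signature.
eta = np.diag([-1.0, 1.0, 1.0, 1.0])
eta_inv = np.linalg.inv(eta)

def projector(k_up):
    """Pi^{mu nu}(k) = eta^{mu nu} - k^mu k^nu / k^2 for an upper-index momentum."""
    k_sq = k_up @ eta @ k_up
    return eta_inv - np.outer(k_up, k_up) / k_sq

# Illustrative momenta with k^2 = kbar^2 = 4.25 (level matching).
k_up = np.array([1.0, 2.0, 0.5, -1.0])
kbar_up = np.array([1.0, -1.0, 2.0, 0.5])
k_dn, kbar_dn = eta @ k_up, eta @ kbar_up

Pi, Pi_bar = projector(k_up), projector(kbar_up)

# Arbitrary gauge parameters lambda_mu and lambdabar_nubar (lower indices).
rng = np.random.default_rng(0)
lam, lam_bar = rng.normal(size=4), rng.normal(size=4)

# delta e_{mu nubar} = k_mu lambdabar_nubar + kbar_nubar lambda_mu
delta_e = np.outer(k_dn, lam_bar) + np.outer(lam, kbar_dn)

# Pi^{mu nu} Pibar^{mubar nubar} delta e_{nu nubar}
projected_variation = np.einsum("mn,ab,nb->ma", Pi, Pi_bar, delta_e)

transverse = np.allclose(Pi @ k_dn, 0.0)
idempotent = np.allclose(Pi @ eta @ Pi, Pi)
gauge_invariant = np.allclose(projected_variation, 0.0)
print(transverse, idempotent, gauge_invariant)
```

Because both projectors are symmetric, the vanishing of this contraction is enough for the variation of the quadratic form in (15) to vanish.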

Upon writing out the projectors (10) and then imposing the level-matching condition (16), we can use the metric to lower indices. Then taking the product with the {e} fields, we find the action (15) to take the following form:

\displaystyle  S_{grav}^{(2)} = -\frac{1}{4} \int_{k, \bar{k}} (k^{2}e^{\mu \bar{\nu}}e_{\mu \bar{\nu}} - k^{\mu}k^{\rho}e_{\mu \bar{\nu}}e^{\bar{\nu}}_{\rho} - \bar{k}^{\bar{\nu}}\bar{k}^{\bar{\sigma}}e_{\mu \bar{\nu}}e^{\mu}_{\bar{\sigma}} + \frac{1}{k^2}k^{\mu}k^{\rho}\bar{k}^{\bar{\nu}}\bar{k}^{\bar{\sigma}}e_{\mu \bar{\nu}}e_{\rho \bar{\sigma}}). \ \ (18)

Already one can see this looks very similar to the background independent quadratic action of DFT. To get a better comparison, we can Fourier transform to doubled position space. In doing so, it is observed that every term transforms without a problem except the last term which results in a non-local piece. The trick, as noted in [DHP], is to introduce an auxiliary scalar field {\phi(k, \bar{k})} (i.e., the dilaton).

Doing these steps means we can first rewrite (18) as follows

\displaystyle  S_{grav}^{(2)} = -\frac{1}{4} \int_{k, \bar{k}} (k^{2}e^{\mu \bar{\nu}}e_{\mu \bar{\nu}} - k^{\mu}k^{\rho}e_{\mu \bar{\nu}}e^{\bar{\nu}}_{\rho} - \bar{k}^{\bar{\nu}}\bar{k}^{\bar{\sigma}}e_{\mu \bar{\nu}}e^{\mu}_{\bar{\sigma}} - k^2 \phi^2 + 2\phi k^{\mu}\bar{k}^{\bar{\nu}}e_{\mu \bar{\nu}}). \ \ (19)

By using the field equations for {\phi}

\displaystyle  \phi = \frac{1}{k^2} k^{\mu}\bar{k}^{\bar{\nu}}e_{\mu \bar{\nu}} \ \ (20)

or, alternatively, using the redefinition

\displaystyle  \phi \rightarrow \phi^{\prime} = \phi - \frac{1}{k^2} k^{\mu}\bar{k}^{\bar{\nu}}e_{\mu \bar{\nu}} \ \ (21)

we then get back the non-local action (18).
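Explicitly, here is a sketch of this step. The momentum arguments {(\pm k, \pm \bar{k})} of the fields are suppressed and the fields are treated schematically as commuting functions, which is my shorthand rather than the notation of [DHP]. Varying the {\phi}-dependent terms of (19) gives

\displaystyle  \frac{\delta}{\delta \phi}(-k^2\phi^2 + 2\phi k^{\mu}\bar{k}^{\bar{\nu}}e_{\mu \bar{\nu}}) = -2k^2 \phi + 2 k^{\mu}\bar{k}^{\bar{\nu}}e_{\mu \bar{\nu}} = 0,

which is (20). Substituting this solution back in,

\displaystyle  -k^2\phi^2 + 2\phi k^{\mu}\bar{k}^{\bar{\nu}}e_{\mu \bar{\nu}} = \frac{1}{k^2}k^{\mu}k^{\rho}\bar{k}^{\bar{\nu}}\bar{k}^{\bar{\sigma}}e_{\mu \bar{\nu}}e_{\rho \bar{\sigma}},

which reproduces the last, non-local term of (18), while the remaining terms of (19) are untouched.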

Remark 3 (Maintaining gauge invariance) What’s nice is that (19) is still gauge invariant, which can be checked using also the gauge transformation for the dilaton {\delta \phi = k_{\mu}\lambda^{\mu} + \bar{k}_{\bar{\mu}}\bar{\lambda}^{\bar{\mu}}}.

Now Fourier transforming (19) to doubled position space, we define in the standard way {\partial_{\mu} = \partial / \partial x^{\mu}} and {\bar{\partial}_{\bar{\mu}} = \partial / \partial \bar{x}^{\bar{\mu}}}. We also of course obtain the usual duality invariant measure. The resulting action takes the form

\displaystyle  S_{grav}^{(2)} = \frac{1}{4} \int d^D x \ d^D \bar{x} \ (e^{\mu \bar{\nu}}\Box e_{\mu \bar{\nu}} + \partial^{\mu}e_{\mu \bar{\nu}}\partial^{\rho}e_{\rho}^{\bar{\nu}}

\displaystyle  + \bar{\partial}^{\bar{\nu}}e_{\mu \bar{\nu}}\bar{\partial}^{\bar{\sigma}}e^{\mu}_{\bar{\sigma}} - \phi \Box \phi + 2\phi \partial^{\mu}\bar{\partial}^{\bar{\nu}}e_{\mu \bar{\nu}}). \ \ (22)


This is the standard quadratic double field theory action. As such, it maintains gauge invariance – notice, we haven’t had to impose a gauge condition and the only extra field introduced was the dilaton.

Very cool.


[BCJ1] Z. Bern, J.J. M. Carrasco, and H. Johansson, New Relations for Gauge-Theory Amplitudes. [0805.3993 [hep-ph]].

[BCJ2] Z. Bern, J.J. M. Carrasco, and H. Johansson, Perturbative Quantum Gravity as a Double Copy of Gauge Theory. [1004.0476 [hep-th]].

[BJH] R. Bonezzi, F. Diaz-Jaramillo, O. Hohm, The Gauge Structure of Double Field Theory follows from Yang-Mills Theory. [2203.07397 [hep-th]]

[DHP] F. Díaz-Jaramillo, O. Hohm, and J. Plefka, Double Field Theory as the Double Copy of Yang-Mills. [2109.01153 [hep-th]].

[Nico] H. Nicolai, “From Grassmann to maximal (N=8) supergravity,” Annalen Phys. 19, 150–160 (2010).

*Cover image: Z. Bern lecture notes, Gravity as a Double Copy of Gauge Theory.

Notes on string theory #2: The relativistic point particle (pp. 9-11)

1. Introduction

In Chapter 1 of Polchinski’s textbook, we start with a discussion on the relativistic point particle (pp. 9-11).

String theory proposes that elementary particles are not pointlike, but rather 1-dimensional extended objects (i.e., strings). In fact, string theory (both the bosonic string in Volume 1 of Polchinski and the superstring that comprises much of Volume 2) can be seen as a special generalisation of point particle theory. But the deeper and more modern view is not one that necessarily begins with point particles and then strings; instead, the story begins with branes. In that a number of features of string theory are shared by the point particle – as we’ll see in a later note, the point particle can be obtained in the limit the string collapses to a point – the bigger picture is that both of these objects can be considered as special cases of a p-brane.

We refer to p-branes as p-dimensional dynamical objects that have mass and can have other familiar attributes such as charge. As a p-brane moves through spacetime, it sweeps out a (p+1)-dimensional volume called its worldvolume. In this notation, a 0-brane corresponds to the case where p = 0. It simply describes a point particle that, as we’ll discuss in this note, traces out a worldline as it propagates through spacetime. A string (whether fundamental or solitonic) corresponds to the p = 1 case, and this turns out to be a very special case of p-branes (for many reasons we’ll learn in following notes). Without getting too bogged down in technical details that extend well beyond the current level of discussion, it is also possible to consider higher-dimensional branes. An important case is p = 2, which gives 2-dimensional branes called membranes. In fact, the word ‘brane’ is derived from ‘membrane’. As a physical object, a p-brane is actually a generalisation of a membrane such that we may assign arbitrary spatial dimensions. So, for the case {p \geq 2} , these are p-branes that appear in string theory as solitons in the corresponding low energy effective actions of various string theories (in addition to 0-branes and 1-branes).

In Type IIA and Type IIB string theories, which again is a subject of Volume 2, we see that there is an entire family of p-brane solutions. From the viewpoint of perturbative string theory, which is the primary focus of Volume 1, solitons as p-branes are strictly non-perturbative objects. (There are also other classes of branes, such as Dp-branes that we’ll come across soon when studying the open string. The more complete picture of D/M-brane physics, including brane dynamics, is anticipated to be captured by M-theory. This is a higher dimensional theory that governs branes and, with good reason, is suspected to represent the non-perturbative completion of string theory).

In some sense, one can think of there being two equivalent ways to approach the idea of p-branes: a top-down higher dimensional view, or from the bottom-up as physical objects that generalise the notion of a point particle to higher dimensions. But given an introductory view of p-branes, perhaps it becomes slightly more intuitive why in approaching the concept of a string in string theory we may start (as Polchinski does) with a review of point particle theory. Indeed, it may at first seem odd to model the fundamental constituents of matter as strings. Indeed, it could seem completely arbitrary and therefore natural to ask, why not something else? But what is often missed, especially in popular and non-technical physics literature, is the natural generalising logic that leads us to study strings in particular. These are remarkable objects with remarkable properties, and what Polchinski does so well in Volume 1 is allow this generalising logic to come out naturally in the study of the simplest string theory: bosonic string theory.

In this note, we will construct the relativistic point particle action as given on p. 10 (eqn. 1.2.2) and then work through the subsequent discussion on pages 10-11. The quantisation of the point particle is mentioned several pages later in the textbook, so we’ll address that topic then. In what follows, I originally also wanted to include notes on the superparticle and its superspace formulation (i.e., the inclusion of fermions to the point particle theory of bosons), as well as introduce other advanced topics; but I reasoned it is best to try to keep as close to the textbook as possible. The only exception to this rule is that, at the end of this note, we’ll finish by quickly looking at the p-brane action.

2. Relativistic point particle

Explanation of the action for a relativistic point particle as given in Polchinski (eqn. 1.2.2) is best achieved through its first-principle construction. So let us consider the basics of constructing the theory for a relativistic free point particle.

2.1. Minkowski space

We start with a discussion about the space in which we’ll build our theory [Moh08].

As one may recall from studying Einstein’s theory of relativity, spacetime may be modelled by D-dimensional Minkowski space {\mathbb{M}^D} . In the abstract, the basic idea is to consider two (distinct) sets E and {\vec{E}} , where E is a set of points (with no given structure) and {\vec{E}} is a vector space (of free vectors) acting on the set E. We view the elements of {\vec{E}} as forces acting on points in E, which we in turn think of as physical particles. Applying a force (free vector) {X \in \vec{E}} to a point {P \in E} results in a translation. In other words, the action of a force X is to move every point P to the point {P + X \in E} by translation that corresponds to X viewed as a vector.

In physics, the set E is viewed as the D-dimensional affine space {\mathbb{M}^D} , and then {\vec{E}} is the associated D-dimensional vector space {\mathbb{R}^{1,D-1}} defined over the field of real numbers. The choice to model spacetime as an affine space is quite natural, given that an affine space has no preferred or distinguished origin and, of course, the spacetime of special relativity possesses no preferred origin.

As the vectors {X \in \mathbb{R}^{1,D-1}} do not naturally correspond to points {P \in \mathbb{M}^D} , but rather to displacements relating a point P to another point Q, we write {X = \vec{PQ}} . The points can be put in one-to-one correspondence with position vectors such that {\vec{X}_P = \vec{OP}} , with displacements then defined by the difference {\vec{PQ} = \vec{OQ} - \vec{OP}} . The associated vector space possesses a zero vector {\vec{0} \in \mathbb{R}^{1,D-1}} , which represents the neutral element of vector addition. We can also use the vector space {\mathbb{R}^{1,D-1}} to introduce linear coordinates on {\mathbb{M}^{D}} by making an arbitrary choice of origin as the point {O \in \mathbb{M}^D} .

The elements or points {P,Q,\ldots \in \mathbb{M}^D} are events, and they combine a moment of time with a specified position. With the arbitrary choice of origin made, we can refer to these points in Minkowski space in terms of their position vectors such that the components {X^{\mu} = (X^0, X^i) = (t, \vec{X})} , with {\mu = 0,..., D-1, i = 1,...,D-1} , of vectors {X \in \mathbb{R}^{1,D-1}} correspond to linear coordinates on {\mathbb{M}^D} . The coordinate {X^{0}} is related to the time t, which is measured by an inertial or freely falling observer, by {X^0 =ct} , with c the fundamental velocity (we set {c = 1} whenever we write {X^0 = t} ). The {X^i} coordinates, which are combined into a (D-1)-component vector, parameterise space (from the perspective of the inertial observer).

It is notable that a vector {X} has contravariant coordinates {X^{\mu}} and covariant coordinates {X_{\mu}} which are related by raising and lowering indices such that {X_{\mu} = \eta_{\mu \nu}X^{\nu}} and {X^{\mu} = \eta^{\mu \nu}X_{\nu}} .

We still need to equip the vector space with a Lorentzian scalar product. In the spacetime of special relativity, the vector space {\mathbb{R}^{1,D-1}} is furnished with the scalar product (relativistic distance between events)

\displaystyle  X^2 = \eta_{\mu \nu}X^{\mu}X^{\nu} = X^{\mu}X_{\mu} = -t^2 + \vec{X}^2 \begin{cases} <0 \ \text{for timelike distance} \\ =0 \ \text{for lightlike distance} \\ >0 \ \text{for spacelike distance} \end{cases} \ \ (1)

with matrix

\displaystyle  \eta = (\eta_{\mu \nu}) = \begin{pmatrix} - 1 & 0 \\  0 & 1_{D-1} \end{pmatrix}, \ \ (2)

where we have chosen the mostly plus convention. To make sense of (1), since the Minkowski metric (2) defines an indefinite scalar product, the distance-squared between events can be positive, zero or negative. This carries information about the causal structure of spacetime. If {X = \vec{PQ}} is the displacement between two events, then these events are called time-like, light-like or space-like relative to each other, depending on the sign of {X^2} . The zeroth component of X then carries information about the time ordering of the events relative to a given Lorentz frame: with {X = \vec{OQ} - \vec{OP}} , Q is after P ({X^0 > 0} ), simultaneous with P ({X^0 = 0} ), or earlier than P ({X^0 < 0} ).
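As a quick sanity check of (1), here is a minimal Python sketch that evaluates the scalar product in the mostly plus convention and classifies a displacement. The function names are my own (hypothetical), and I assume units with {c = 1} unless a value of c is passed in.

```python
def minkowski_square(x, c=1.0):
    """Return X.X = -(X^0)^2 + sum_i (X^i)^2 in the mostly plus convention.

    x is a sequence (t, x1, ..., x_{D-1}); the time component is X^0 = c * t.
    """
    x0 = c * x[0]
    return -x0 * x0 + sum(xi * xi for xi in x[1:])


def classify(x, c=1.0, tol=1e-12):
    """Classify a displacement as 'timelike', 'lightlike' or 'spacelike'."""
    s2 = minkowski_square(x, c)
    if abs(s2) <= tol:
        return "lightlike"
    return "timelike" if s2 < 0 else "spacelike"


print(classify((2.0, 1.0, 0.0, 0.0)))  # timelike
print(classify((1.0, 1.0, 0.0, 0.0)))  # lightlike
print(classify((1.0, 2.0, 0.0, 0.0)))  # spacelike
```

The small tolerance is only there because floating point arithmetic rarely returns an exact zero for a null vector.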

2.2. Lorentz invariance and the Poincaré group

Let’s talk more about Lorentz invariance and the Poincaré group. As inertial observers are required to use linear coordinates which are orthonormal with respect to the scalar product (1), these orthonormal coordinates are distinguished by the above standard form of the metric. It is of course possible to use other curvilinear coordinate systems, such as spherical or cylindrical coordinates. Given the standard form of the metric (2), the most general class of transformations which preserve its form are the Poincaré group, which represents the group of Minkowski spacetime isometries.

The Poincaré group is a Lie group of dimension {D(D+1)/2} , which in {D = 4} gives the familiar 10 dimensions: 4 translations along with the Lorentz group of 3 rotations and 3 boosts. As a general review, let’s start with the Lorentz group. This is the set of linear transformations of spacetime that leave the Lorentz interval unchanged.

From the definitions in the previous section, the line element takes the form

\displaystyle  ds^2 = \eta_{\mu \nu}dX^{\mu}dX^{\nu} = - dt^2 + d\vec{X}^2. \ \ (3)

For spacetime coordinates defined in the previous section, the Lorentz group is then defined to be the group of transformations {X^{\mu} \rightarrow X^{\prime \mu}} leaving the relativistic interval invariant. Assuming linearity (we will not prove linearity here, with many proofs easily accessible), define a Lorentz transformation as any real linear transformation {\Lambda} such that

\displaystyle  X^{\mu} \rightarrow X^{\prime \mu} = \Lambda^{\mu}_{\nu}X^{\nu} \ \ (4)


\displaystyle  \eta_{\mu \nu} dX^{\prime \mu} dX^{\prime \nu} = \eta_{\mu \nu} dX^{ \mu} dX^{\nu}, \ \ (5)

ensuring from (1) that

\displaystyle  X^{\prime 2} = X^{2}, \ \ (6)

which, for arbitrary X, requires

\displaystyle  \eta_{\mu \nu} = \eta_{\alpha \beta} \Lambda^{\alpha}_{\mu} \Lambda^{\beta}_{\nu}. \ \ (7)

Note that {\Lambda = (\Lambda^{\mu}_{\nu})} is an invertible {D \times D} matrix. In matrix notation (7) can be expressed as

\displaystyle  \Lambda^T \eta \Lambda = \eta. \ \ (8)

Matrices satisfying (8) contain rotations together with Lorentz boosts, which relate inertial frames travelling at a constant velocity relative to each other. The Lorentz transformations form a Lie group of dimension {D(D-1)/2} (six-dimensional for {D = 4} ), which is the Lorentz group O(1,D-1).

For elements {\Lambda \in O(1, D-1)} taking the determinant of (8) gives

\displaystyle  (\det \Lambda)^2 = 1 \implies \det \Lambda = \pm 1. \ \ (9)

By considering the {\Lambda^0_0} component we also find

\displaystyle  (\Lambda^0_0)^2 = 1 + \sum_i (\Lambda^0_i)^2 \geq 1 \Rightarrow \Lambda^0_0 \geq 1 \ \text{or} \ \Lambda^0_0 \leq -1. \ \ (10)

So, the Lorentz group has four components according to the signs of {\det \Lambda} and {\Lambda^0_0} . The matrices with {\det \Lambda = 1} form a subgroup SO(1,D-1) with two connected components, distinguished by the two options on the right-hand side of (10). The component containing the unit matrix {1 \in O(1,D-1)} is connected and is denoted {SO_0(1,D-1)} .
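To make the defining condition (8) a bit more concrete, the following sketch builds a boost along {X^1} and checks {\Lambda^T \eta \Lambda = \eta} numerically, together with {\Lambda^0_0 \geq 1} . It is only an illustration in plain Python; the helper names (eta, boost_x, is_lorentz) are hypothetical and not taken from any reference.

```python
import math


def eta(D):
    """Mostly plus Minkowski metric as a nested list."""
    return [[(-1.0 if i == 0 else 1.0) if i == j else 0.0 for j in range(D)]
            for i in range(D)]


def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def boost_x(beta, D=4):
    """Boost along X^1 with velocity beta = v/c, assuming |beta| < 1."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    L = [[1.0 if i == j else 0.0 for j in range(D)] for i in range(D)]
    L[0][0] = g
    L[1][1] = g
    L[0][1] = -g * beta
    L[1][0] = -g * beta
    return L


def is_lorentz(L, tol=1e-12):
    """Check Lambda^T eta Lambda = eta entry by entry, as in (8)."""
    D = len(L)
    lhs = matmul(transpose(L), matmul(eta(D), L))
    rhs = eta(D)
    return all(abs(lhs[i][j] - rhs[i][j]) <= tol for i in range(D) for j in range(D))


L = boost_x(0.6)
print(is_lorentz(L))   # True
print(L[0][0] >= 1.0)  # True
```

A matrix that merely rescales time, such as diag(2, 1), fails the same check, which is a handy way to see that (8) is a genuine restriction.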

We may also briefly consider translations of the form

\displaystyle  X^{\mu} \rightarrow X^{\prime \mu} = X^{\mu} + a^{\mu}, \ \ (11)

where {a = (a^{\mu}) \in \mathbb{R}^{1, D-1}} . Translations form a group that can be parametrised by the components of the translation vector {a^{\mu}} .

As mentioned, the Poincaré group is then the complete spacetime symmetry group that combines translations with Lorentz transformations. For a Lorentz transformation {\Lambda} and a translation {a} the combined transformation {(\Lambda, a)} gives

\displaystyle X^{\mu} \rightarrow X^{\prime \mu} = \Lambda^{\mu}_{\nu} X^{\nu} + a^{\mu}. \ \ (12)

These combined transformations form a group since

\displaystyle (\Lambda_2, a_2)(\Lambda_1, a_1) = (\Lambda_2 \Lambda_1, \Lambda_2 a_1 + a_2), \ (\Lambda, a)^{-1} = (\Lambda^{-1}, -\Lambda^{-1}a). \ \ (13)

Since Lorentz transformations and translations do not commute, the Poincaré group is not a direct product. More precisely, the Poincaré group is the semi-direct product of the Lorentz and translation groups, {IO(1,D-1) = O(1,D-1) \ltimes \mathbb{R}^D} .
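The composition law can be checked numerically as well. The sketch below works in {D = 2} for brevity and represents a Poincaré element as a pair (matrix, vector); all names are hypothetical. It confirms that acting step by step agrees with the composed element {(\Lambda_2 \Lambda_1, \Lambda_2 a_1 + a_2)} , and that a boost and a translation do not commute.

```python
import math


def boost(beta):
    """2x2 Lorentz boost (D = 2) with velocity beta = v/c, assuming |beta| < 1."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -g * beta], [-g * beta, g]]


def apply(L, a, X):
    """Act with (Lambda, a) on X: X -> Lambda X + a, as in (12)."""
    n = len(X)
    return [sum(L[i][j] * X[j] for j in range(n)) + a[i] for i in range(n)]


def compose(L2, a2, L1, a1):
    """(Lambda_2, a_2)(Lambda_1, a_1) = (Lambda_2 Lambda_1, Lambda_2 a_1 + a_2)."""
    n = len(a1)
    L = [[sum(L2[i][k] * L1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    a = [sum(L2[i][k] * a1[k] for k in range(n)) + a2[i] for i in range(n)]
    return L, a


def close(u, v, tol=1e-12):
    return all(abs(p - q) <= tol for p, q in zip(u, v))


L1, a1 = boost(0.3), [1.0, -2.0]
L2, a2 = boost(-0.5), [0.5, 4.0]
X = [2.0, 3.0]

# Acting with (L1, a1) and then (L2, a2) agrees with the composed element.
step_by_step = apply(L2, a2, apply(L1, a1, X))
L21, a21 = compose(L2, a2, L1, a1)
print(close(step_by_step, apply(L21, a21, X)))  # True

# A boost followed by a translation differs from the reverse order.
zero = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
boost_then_shift = apply(identity, a1, apply(L1, zero, X))
shift_then_boost = apply(L1, zero, apply(identity, a1, X))
print(close(boost_then_shift, shift_then_boost))  # False
```

The second check is the concrete content of the statement that the group is a semi-direct rather than a direct product: the translation vector gets rotated by the Lorentz part.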

2.3. Action principle

We now look to construct an action for the relativistic point particle (initially following the discussion in [Zwie09] as motivation).

The classical motion of a point particle as it propagates through spacetime is described by a geodesic on the spacetime. As Polchinski first notes, we can of course describe the motion of this particle by giving its position in terms of functions of time {X(t) = (X^{\mu}(t)) = (t, \vec{X}(t))} . For now, we may also consider some arbitrary origin and endpoint {(ct_f, \vec{X}_{f})} for the particle’s path or what is also called its worldline. We also know from the principle of least action that there are many possible paths between these points.

[Figure: Particle worldline]

It should be true that for any worldline all Lorentz observers compute the same value for the action. Let {\mathcal{P}} denote one such worldline. Then we may use the proper time as a Lorentz invariant quantity to describe this path; recall from special relativity that the proper time is a Lorentz invariant measure of time. Since different Lorentz observers will record different values for the time interval between two events along {\mathcal{P}} , we instead imagine that a clock is attached to the particle. The proper time is then the time elapsed between the two events on that clock, and all Lorentz observers must agree on this amount of elapsed time. This is the basic idea, and it means we want an action of the worldline {\mathcal{P}} that is proportional to the proper time.

To achieve this, we first recall the invariant interval for the motion of a particle

\displaystyle  - ds^2 = -c^2 dt^2 + (dX^1)^2 + (dX^2)^2 + (dX^3)^2, \ \ (14)

in which, from special relativity, the proper time interval {dt_p} (measured in the particle’s rest frame, where {dX^i = 0} ) satisfies

\displaystyle  -ds^2 = -c^2 dt_p^2 \rightarrow ds = c \, dt_p. \ \ (15)

This tells us that for timelike intervals ds/c is the proper time interval. It follows that the integral of (ds/c) over the worldline {\mathcal{P}} gives the proper time elapsed on {\mathcal{P}} . But, if the proper time gives units of time, we still need units of energy, or units of mass times velocity-squared, to ensure we have the full units of action (recall that for any dynamical system the action has units of energy times time, with the Lagrangian possessing units of energy). We also need to ensure that we preserve Lorentz invariance in the process of building our theory. One obvious choice is m for the rest mass of the particle, with c for the fundamental velocity in relativity. Then we have an overall multiplicative factor {mc^2} that represents the rest energy of the particle. As a result, the action takes the tentative form {mc^2 (ds/c) = mc \, ds} . This should make some sense in that {ds} is just a Lorentz scalar, and the prefactor {mc} is built only from Lorentz invariant constants. We also include an overall minus sign, which ensures that the non-relativistic limit returns a kinetic term with the correct sign (the line element {ds} itself is already real for timelike paths):

\displaystyle  S = -mc \int_{\mathcal{P}} ds. \ \ (16)

A good strategy now is to express the action as an integral of a Lagrangian over time, between {t_i} and {t_f} , which are the times of the world-events that we’ll take to define our interval. This will enable us to establish a more satisfactory expression that includes the values of time at the initial and final points of our particle’s path. If we fix a frame, which is to say if we choose the frame of a particular Lorentz observer, we may express the action (16) as the integral of the Lagrangian over time. To achieve this end, we must first return to our interval (14) and relate {ds} to {dt} ,

\displaystyle  -ds^2 = -c^2 dt^2 + (dX^1)^2 + (dX^2)^2 + (dX^3)^2

\displaystyle  ds^2 = c^2 dt^2 - (dX^1)^2 - (dX^2)^2 - (dX^3)^2

\displaystyle  ds^2 = \left[ c^2 - \frac{(dX^1)^2}{dt^2} - \frac{(dX^2)^2}{dt^2} - \frac{(dX^3)^2}{dt^2} \right] dt^2

\displaystyle  \implies ds^2 = (c^2 - v^2) dt^2

\displaystyle  \therefore ds = \sqrt{c^2 - v^2} dt. \ \ (17)

With this relation between {ds} and {dt} , in the fixed frame the point particle action becomes

\displaystyle  S = -mc^{2} \int_{t_{i}}^{t_{f}} dt \sqrt{1 - \frac{v^{2}}{c^{2}}}, \ \ (18)

with the Lagrangian taking the form

\displaystyle  L = -mc^{2} \sqrt{1 - \frac{v^{2}}{c^{2}}}. \ \ (19)

This Lagrangian gives us a hint that it is correct, as it ceases to be real when the velocity exceeds the speed of light, {v > c} . This is consistent with the definition of the proper time from special relativity (i.e., the velocity should not exceed the speed of light for the proper time to be a valid concept). In the small velocity limit {v \ll c} , on the other hand, when we expand the square root (just use the binomial theorem to approximate) we see that it gives

\displaystyle L \simeq -mc^2 (1 - \frac{1}{2}\frac{v^2}{c^2}) = - mc^2 + \frac{1}{2}m v^2, \ \ (20)

returning the familiar kinetic term of the free non-relativistic particle, with ({-mc^2} ) just a constant.
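If one wants to see how good the small velocity approximation is, a few lines of Python suffice. This is just a sketch with hypothetical function names, in units where {m = c = 1} by default; it compares the exact Lagrangian (19) against the truncated expansion (20).

```python
import math


def lagrangian(v, m=1.0, c=1.0):
    """Exact relativistic Lagrangian, equation (19)."""
    return -m * c**2 * math.sqrt(1.0 - (v / c) ** 2)


def lagrangian_nonrel(v, m=1.0, c=1.0):
    """Leading terms of the small velocity expansion, equation (20)."""
    return -m * c**2 + 0.5 * m * v**2


for v in (0.3, 0.1, 0.01):
    error = abs(lagrangian(v) - lagrangian_nonrel(v))
    # The first neglected term in the binomial series is m v^4 / (8 c^2).
    print(v, error, v**4 / 8.0)
```

The discrepancy shrinks like {v^4} , as expected from the next term in the binomial series.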

2.4. Canonical momentum and Hamiltonian

We will discuss the canonical momentum of the point particle again in a future note on quantisation; but for the present form of the action it is worth highlighting that we can also see the Lagrangian (19) is correct by computing the momentum {\vec{p}} and the Hamiltonian.

For the canonical momentum, we take the derivative of the Lagrangian with respect to the velocity

\displaystyle  \vec{p} = \frac{\partial L}{\partial \vec{v}} = -mc^{2}(-\frac{\vec{v}}{c^{2}})\frac{1}{\sqrt{1 - \frac{v^{2}}{c^{2}}}} = \frac{m\vec{v}}{\sqrt{1 - \frac{v^{2}}{c^{2}}}}. \ \ (21)

Now that we have an expression for the relativistic momentum of the particle, let us consider the Hamiltonian. The Hamiltonian may be written schematically as {H = \vec{p} \cdot \vec{v} - L} . All we need to do is make the appropriate substitutions,

\displaystyle  H = \frac{m\vec{v}^{2}}{\sqrt{1 - \frac{v^{2}}{c^{2}}}} + mc^{2}\sqrt{1 - \frac{v^{2}}{c^{2}}} = \frac{mc^{2}}{\sqrt{1 - \frac{v^{2}}{c^{2}}}}. \ \ (22)

The Hamiltonian should make sense. Notice, if we instead write the result in terms of the particle’s momentum (rather than velocity) by inverting (21), we find {H = \sqrt{\vec{p}^{\,2}c^{2} + m^{2}c^{4}}} , which is the relativistic energy E satisfying {\frac{E^{2}}{c^{2}} - \vec{p} \cdot \vec{p} = m^{2}c^{2}} . This is a deep hint that we’re on the right track, as it suggests quite clearly that we’ve recovered basic relativistic physics for a point-like object.
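These relations are easy to verify numerically. The sketch below restricts to motion along a single axis, uses illustrative values for m, c and v (my own choices), and checks three things: that the momentum (21) agrees with a finite-difference derivative of (19), that {H = pv - L} reproduces (22), and that the mass-shell relation holds.

```python
import math


def lagrangian(v, m, c):
    """L = -m c^2 sqrt(1 - v^2/c^2), equation (19), for motion along one axis."""
    return -m * c**2 * math.sqrt(1.0 - (v / c) ** 2)


def momentum(v, m, c):
    """Relativistic momentum, equation (21)."""
    return m * v / math.sqrt(1.0 - (v / c) ** 2)


def energy(v, m, c):
    """Hamiltonian evaluated on a given velocity, equation (22)."""
    return m * c**2 / math.sqrt(1.0 - (v / c) ** 2)


m, c, v = 2.0, 3.0, 1.2  # illustrative values only
h = 1e-6                 # finite-difference step

p_numeric = (lagrangian(v + h, m, c) - lagrangian(v - h, m, c)) / (2.0 * h)
p = momentum(v, m, c)
E = energy(v, m, c)

print(abs(p_numeric - p) < 1e-6)                     # True
print(abs(p * v - lagrangian(v, m, c) - E) < 1e-9)   # True
print(abs(E**2 / c**2 - p**2 - (m * c) ** 2) < 1e-9) # True
```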

3. Reparameterisation invariance

An important property of the action (16) is that it is independent of whatever parameterisation we might choose. This makes sense because the invariant length ds between two points on the particle’s worldline does not depend on any parameterisation. We’ve only insisted on integrating the line element, which, if you think about it, is really just a matter of adding up all of the infinitesimal segments along the worldline. But, typically, a particle moving in spacetime is described by a parameterised curve. As Polchinski notes, it is generally best to introduce some parameter and then describe the motion in spacetime by functions of that parameter.

Furthermore, how we parameterise the particle’s path will govern whether, for the classical motion, the path is one that extremises the invariant distance ds as a minimum or maximum. Our choice of {\tau}-parameterisation is such that the invariant length ds is given by

\displaystyle ds^2 = -\eta_{\mu \nu}(X) dX^{\mu} dX^{\nu}, \ \ (23)

where the worldline parameter {\tau} is taken to be increasing between some initial point {X^{\mu} (\tau_i)} and some final point {X^{\mu}(\tau_f)} . So the classical paths are those which maximise the proper time. It also means that the trajectory of the particle worldline is now described by the coordinates {X^{\mu} = X^{\mu}(\tau)} . As a result, the space of the theory can now be updated such that {X^{\mu}(\tau) \in \mathbb{R}^{1, D-1}} with {\mu, \nu = 0,...,D-1} .

In the use of {\tau} parameterisation, an important idea is that time is in a sense being promoted to a dynamical degree of freedom without it actually being a dynamical degree of freedom. We are in many ways leveraging the power of gauge symmetry, with our choice of parameterisation enabling us to treat space and time coordinates on equal footing. The cost of trading a less symmetric description for a more symmetric one is that we pick up redundancies.

Given the previous preference of background spacetime geometry to be Minkowski, recall the metric

\displaystyle  \eta_{\mu \nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \ \ (24)

such that for the integrand ds we now use

\displaystyle -\eta_{\mu \nu}(X) dX^{\mu} dX^{\nu} = -\eta_{\mu \nu}(X) \frac{dX^{\mu}(\tau)}{d\tau} \frac{dX^{\nu}(\tau)}{d\tau} d\tau^2. \ \ (25)

Therefore, the action (16) may be updated to take the form

\displaystyle  S_{pp} = -mc \int_{\tau_i}^{\tau_f} d\tau \ \sqrt{-\eta_{\mu \nu} \dot{X}^{\mu} \dot{X}^{\nu}} \ \ (26)

with {\dot{X}^{\mu} \equiv dX^{\mu}(\tau) / d\tau} .

Setting {c = 1} , notice (26) is precisely the action (eqn. 1.2.2) in Polchinski. This is the simplest action for a relativistic point particle with manifest Poincaré invariance that does not depend on the choice of parameterisation.

How do we interpret this form of the action? In the exercise to obtain (26) we have essentially played the role of a fixed observer, who has calculated the action using some parameter {\tau} . The important question is whether the value of the action depends on this choice of parameter. Polchinski comments that, in fact, it is a completely arbitrary choice of parameterisation. This should make sense because, again, the invariant length ds on the particle worldline {\mathcal{P}} should not depend on how the path is parameterised.

Proposition 1 The action (26) is reparameterisation invariant such that if we replace {\tau} with the parameter {\tau^{\prime} = f(\tau)} , where f is monotonic, we obtain the same value for the action.

Proof: Consider the following reparameterisation of the particle’s worldline {\tau \rightarrow \tau^{\prime} = f(\tau)} . Then we have

\displaystyle d\tau \rightarrow d\tau^{\prime} = \frac{\partial f}{\partial \tau}d\tau, \ \ (27)


\displaystyle  \frac{dX^{\mu}(\tau^{\prime})}{d\tau} = \frac{dX^{\mu}(\tau^{\prime})}{d\tau^{\prime}}\frac{d\tau^{\prime}}{d\tau} = \frac{dX^{\mu}(\tau^{\prime})}{d\tau^{\prime}} \frac{\partial f(\tau)}{\partial \tau}. \ \ (28)

Plugging this into the action (26) we get

\displaystyle S^{\prime} = -mc \int_{\tau^{\prime}_i}^{\tau^{\prime}_f} d\tau^{\prime} \ \sqrt{-\frac{dX^{\mu}(\tau^{\prime})}{d\tau^{\prime}} \frac{dX_{\mu}(\tau^{\prime})}{d\tau^{\prime}}}

\displaystyle  = -mc \int_{\tau_i}^{\tau_f} \frac{\partial f}{\partial \tau} \ d\tau \ \sqrt{-\frac{dX^{\mu}}{d\tau} \frac{dX_{\mu}}{d\tau} (\frac{\partial f}{\partial \tau})^{-2}}

\displaystyle  = -mc \int_{\tau_i}^{\tau_f} (\frac{\partial f}{\partial \tau})(\frac{\partial f}{\partial \tau})^{-1} \ d\tau \ \sqrt{-\frac{dX^{\mu}}{d\tau} \frac{dX_{\mu}}{d\tau}}

\displaystyle  = -mc \int_{\tau_i}^{\tau_f} d\tau \ \sqrt{-\frac{dX^{\mu}(\tau)}{d\tau} \frac{dX_{\mu}(\tau)}{d\tau}}. \ \ (29)


This ends the proof. So we see the value of the action does not depend on the choice of parameter; indeed, the choice is arbitrary.
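The same statement can be checked numerically. In the sketch below I pick a sample timelike worldline in {D = 2} with {c = 1} (the specific path and both parameterisations are my own illustrative choices), approximate the action (26) with a midpoint rule and finite-difference derivatives, and compare a linear parameterisation with a non-linear, monotonically increasing one covering the same segment.

```python
import math


def path(t):
    """A sample timelike worldline X^mu(t) = (t, x(t)) in D = 2, units c = 1."""
    return (t, 0.3 * math.sin(t))


def action(time_of, m=1.0, n=20000, h=1e-6):
    """Approximate S_pp = -m * integral of sqrt(-Xdot.Xdot) over lam in [0, 1].

    time_of maps the worldline parameter lam to coordinate time t.
    """
    total = 0.0
    dlam = 1.0 / n
    for k in range(n):
        lam = (k + 0.5) * dlam  # midpoint rule
        t_plus, x_plus = path(time_of(lam + h))
        t_minus, x_minus = path(time_of(lam - h))
        tdot = (t_plus - t_minus) / (2.0 * h)
        xdot = (x_plus - x_minus) / (2.0 * h)
        # -Xdot.Xdot = tdot^2 - xdot^2 in the mostly plus convention.
        total += math.sqrt(tdot * tdot - xdot * xdot) * dlam
    return -m * total


# Two parameterisations of the same segment, t running from 0 to 2.
S_linear = action(lambda lam: 2.0 * lam)
S_curved = action(lambda lam: lam + lam * lam)

print(abs(S_linear - S_curved) < 1e-6)  # True
```

Both values agree up to discretisation error, even though the integrands look quite different as functions of the parameter.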

As alluded to earlier in this section, reparameterisation invariance is a gauge symmetry. In some sense, this is not even an honest symmetry, because it means that we’ve introduced a redundancy in our description, as not all degrees of freedom {X^{\mu}} are physically meaningful. We’ll discuss this more in the context of the string (an example of such a redundancy appears in the study of the momenta).

4. Equation of motion for {S_{pp}}

To obtain (eqn. 1.2.3), Polchinski varies the action (26) and then integrates by parts. For simplicity, let us temporarily maintain {c = 1} . Varying (26)

\displaystyle  \delta S_{pp} = -m \int d\tau \delta (\sqrt{-\dot{X}^{\mu}\dot{X}_{\mu}}) \ \ (30)

\displaystyle  = -m \int d\tau \frac{1}{2}(-\dot{X}^{\nu}\dot{X}_{\nu})^{-1/2} \, \delta(-\dot{X}^{\mu}\dot{X}_{\mu}), \ \ (31)

then from the variation in the last factor we pick up a factor of 2, leaving

\displaystyle  = -m \int d\tau (-\dot{X}^{\nu}\dot{X}_{\nu})^{-1/2} (-\dot{X}_{\mu}\delta \dot{X}^{\mu}). \ \ (32)

Next, we make the substitution {u^{\mu} = \dot{X}^{\mu}(-\dot{X}^{\nu}\dot{X}_{\nu})^{-1/2}} such that

\displaystyle  \delta S_{pp} = -m \int d\tau (-u_{\mu})\delta \dot{X}^{\mu}. \ \ (33)

And now we integrate by parts, which shifts a derivative onto u using the fact we can commute the variation and the derivative, {\delta \dot{X}^{\mu} = \delta \frac{d}{d\tau} X^{\mu} = \frac{d}{d\tau} \delta X^{\mu}} . We also drop the total derivative term that we obtain in the process (the variation {\delta X^{\mu}} vanishes at the endpoints)

\displaystyle  \delta S_{pp} = -m \int d\tau \frac{d}{d\tau} (-u_{\mu}\delta X^{\mu}) - m \int d\tau \dot{u}_{\mu} \delta X^{\mu},

which gives the correct result

\displaystyle  \delta S_{pp} = -m \int d\tau \dot{u}_{\mu}\delta X^{\mu}. \ \ (34)

As Polchinski notes, the equation of motion {\dot{u}^{\mu} = 0} describes the free motion of the particle.

With the particle mass m being the normalisation constant, we can also take the non-relativistic limit (exercise 1.1). Returning to (26), one way to do this is to take {\tau} to be the proper time; then, as before (reinstating c for the purpose of example),

\displaystyle  \dot{X}^{\mu}(\tau) = \left( c \frac{dt}{d\tau}, \frac{d\vec{X}(\tau)}{d\tau} \right) = \frac{dt}{d\tau} (c, \vec{v}), \ \ (35)

where {\vec{v} = d\vec{X}/dt} , and we may define the quantity {\gamma = (1 - v^2/c^2)^{-1/2}} . Then, in the non-relativistic limit where {v \ll c} we have {dt/d\tau = \gamma = 1 + \mathcal{O}(v^2/c^2)} , so that the proper time agrees with the coordinate time t at leading order. Using reparameterisation invariance to trade {\tau} for t, it follows

\displaystyle  \frac{dX^{\mu}}{dt}\frac{dX_{\mu}}{dt} = -c^2 + \mid \vec{v} \mid^2, \ \ (36)

with {\vec{v}} a spatial vector and where we define the norm {\mid \vec{v} \mid \equiv v} . This is equivalent to the choice of static gauge {\tau = t} , and the action takes the form

\displaystyle S_{pp} \approx -mc \int dt \sqrt{c^2 -\mid \vec{v} \mid^2}, \ \ (37)

where we now Taylor expand to give

\displaystyle  S_{pp} \approx -mc^2 \int dt \ (1 - \frac{1}{2}\frac{\mid \vec{v} \mid^2}{c^2}). \ \ (38)

Observe that we now have a time integral of a term with classical kinetic structure minus a potential-like term (actually a total time derivative) that is an artefact of the relativistic rest energy

\displaystyle  S_{pp} \approx \int dt \ (\frac{1}{2}m\mid \vec{v} \mid^2 - mc^2). \ \ (39)

5. Deriving {S_{pp}^{\prime}} (eqn. 1.2.5)

The main problem with the action (18) and equivalently (26) is that, when we go to quantise this theory, the square root in the integrand makes the action non-polynomial in {\dot{X}^{\mu}} , which is awkward to handle. Analogously, we will find a similar issue upon constructing the first-principle string action, namely the Nambu-Goto action. Additionally, in our study of the bosonic string, we will be interested firstly in studying massless particles. But notice that the action (26) vanishes identically for a massless particle.

What we want to do is rewrite {S_{pp}} in yet another equivalent form. To do this, we add an auxiliary field so that our new action takes the form

\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d \tau (\eta^{-1} \dot{X}^{\mu} \dot{X}_{\mu} - \eta m^2), \ \ (40)

where we define the tetrad {\eta (\tau) = (- \gamma_{\tau \tau} (\tau))^{\frac{1}{2}}} . The independent worldline metric {\gamma_{\tau \tau}(\tau)} that we’ve introduced as an additional field is, in a sense, a generalised Lagrange multiplier. For simplicity (and to avoid confusion with the Minkowski metric {\eta_{\mu \nu}} ) we can denote this additional field {e(\tau)} so that we get the action

\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d\tau (e^{-1} \dot{X}^{2} - em^{2}), \ \ (41)

where we have simplified the notation by setting {\dot{X}^{2} = \eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu}} and completely eliminated the square root. This is equivalent to what Polchinski writes in (eqn. 1.2.5). The structure of (41) may look familiar, as it reads like a worldline theory coupled to 1-dimensional gravity (worth checking and playing with).

To see that {S_{pp}^{\prime}} is classically equivalent (on-shell) to {S_{pp}} , we first consider its variation with respect to {e(\tau)}

\displaystyle  \delta S_{pp}^{\prime} = \frac{1}{2}\delta \int d\tau (e^{-1} \dot{X}^{2} - m^2 e)

\displaystyle  = \frac{1}{2} \int d\tau (\delta (\frac{1}{e})\dot{X}^{2} - \delta (m^{2} e))

\displaystyle  = \frac{1}{2} \int d\tau (- \frac{1}{e^{2}}\dot{X}^{2} - m^{2}) \delta e, \ \ (42)

which, for arbitrary {\delta e} , results in the following field equation

\displaystyle  e^{2} = -\frac{\dot{X}^{2}}{m^{2}}

\displaystyle  \implies e = \sqrt{\frac{-\dot{X}^{2}}{m^{2}}}. \ \ (43)

This again aligns with Polchinski’s result (eqn. 1.2.7).

Proposition 2 If we substitute (43) back into (41), we recover the original {S_{pp}} action (26).


Proof: Substituting (43) into (41) gives

\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d\tau [(-\frac{\dot{X}^2}{m^{2}})^{-1/2} \dot{X}^{2} - m^{2}(-\frac{\dot{X}^{2}}{m^{2}})^{1/2}]

\displaystyle  = \frac{1}{2} \int d\tau [(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} - m^{2}(-\frac{\dot{X}^{2}}{m^{2}})^{1/2}]

\displaystyle  = \frac{1}{2} \int d\tau [(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} - m (- \dot{X}^{2})^{1/2}]. \ \ (44)

Recalling {\dot{X}^{2} = \eta_{\mu \nu} \dot{X}^{\mu}\dot{X}^{\nu}} , substitute for {\dot{X}^{2}} in the square root of the second term

\displaystyle  = \frac{1}{2} \int d\tau [(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} - m (- \eta_{\mu \nu} \dot{X}^{\mu}\dot{X}^{\nu})^{1/2}]. \ \ (45)

For the first term we clean up with a bit of algebra. For a timelike worldline {\dot{X}^{2} < 0} , so {-\dot{X}^{2}} is positive and we may write {\dot{X}^{2} = -(-\dot{X}^{2})} . Then, taking {m > 0} ,

\displaystyle  (-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} = m (-\dot{X}^{2})^{-1/2} \dot{X}^{2}

\displaystyle  = -m (-\dot{X}^{2})^{-1/2} (-\dot{X}^{2})

\displaystyle  = -m (-\dot{X}^{2})^{1/2}. \ \ (46)

Now, substitute for {\dot{X}^{2}} and we find {-m (-\eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu})^{1/2}} giving

\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d\tau [-m(- \eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu})^{1/2} - m (- \eta_{\mu \nu} \dot{X}^{\mu}\dot{X}^{\nu})^{1/2}]

\displaystyle  = -m \int d\tau (- \eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu})^{1/2} = S_{pp}. \ \ (47)


This ends the proof, demonstrating that {S_{pp}} and {S_{pp}^{\prime}} are classically equivalent.
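A small numerical check makes the on-shell equivalence tangible. In this sketch (hypothetical function names, {c = 1} , and an arbitrary timelike value of {\dot{X}^{2}} chosen for illustration) we evaluate the integrand of (41) on the solution (43), compare it with the integrand of (26), and confirm by finite differences that (43) is a stationary point in e.

```python
import math


def lagrangian_sqrt(xdot2, m):
    """Integrand of S_pp in (26): -m sqrt(-Xdot^2), with c = 1."""
    return -m * math.sqrt(-xdot2)


def lagrangian_einbein(xdot2, m, e):
    """Integrand of S'_pp in (41): (Xdot^2 / e - e m^2) / 2."""
    return 0.5 * (xdot2 / e - e * m * m)


def einbein_on_shell(xdot2, m):
    """Solution (43) of the e field equation: e = sqrt(-Xdot^2) / m."""
    return math.sqrt(-xdot2) / m


xdot2, m = -2.5, 1.7  # a timelike velocity squared and a mass, illustrative only
e_star = einbein_on_shell(xdot2, m)

# The two integrands agree once e is put on-shell.
on_shell_gap = abs(lagrangian_einbein(xdot2, m, e_star) - lagrangian_sqrt(xdot2, m))
print(on_shell_gap < 1e-12)  # True

# e_star is a stationary point of the einbein Lagrangian with respect to e.
h = 1e-5
slope = (lagrangian_einbein(xdot2, m, e_star + h)
         - lagrangian_einbein(xdot2, m, e_star - h)) / (2.0 * h)
print(abs(slope) < 1e-6)  # True
```

Away from {e = e_\ast} the two integrands differ, which is the sense in which the equivalence is only on-shell.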

It is also possible to show that, like with {S_{pp}} , the action {S_{pp}^{\prime}} is both Poincaré invariant and reparameterisation invariant.

6. Generalising to Dp-branes

As an aside, and to conclude this note, we can generalise the action for a point particle (0-brane) to an action for a p-brane. A p-brane in a D-dimensional background spacetime, with {D > p} , can be described by the action

\displaystyle  S_{pb}= -T_p \int d\mu_p \ \ (48).

The term {T_p} is one that will become more familiar moving forward, especially when we begin to discuss the concept of string tension. However, in the above action it denotes the p-brane tension, which has units of mass/volume. The {d\mu_p} term is the {(p + 1)} -dimensional volume measure,

\displaystyle  d\mu_p = \sqrt{- \det G_{ab}} \ d^{p+1} \sigma, \ \ (49)

where {G_{ab}} is the induced metric, which, in the {p = 1} case, we will understand as the worldsheet metric. The induced metric is the pullback of the background spacetime metric {h_{\mu \nu}} to the worldvolume,

\displaystyle  G_{ab} (X) = \frac{\partial X^{\mu}}{\partial \sigma^{a}} \frac{\partial X^{\nu}}{\partial \sigma^{b}} h_{\mu \nu}(X), \ \ \ a, b = 0, 1, \ldots, p. \ \ (50)

A few additional comments may follow. As {\sigma^{0} \equiv \tau} , spacelike coordinates in this theory run as {\sigma^{1}, \sigma^{2}, ... \sigma^{p}} for the surface traced out by the p-brane. Under {\tau} reparameterisation, the above action may also be shown to be invariant.
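To see how (49) and (50) reduce to familiar cases, here is a short sketch that computes the induced metric from a list of tangent vectors {\partial X^{\mu} / \partial \sigma^{a}} . I assume a flat background, {h_{\mu \nu} = \eta_{\mu \nu}} in {D = 4} , and only handle {p = 0} and {p = 1} ; the function names and the two sample embeddings are my own illustrative choices.

```python
import math

# Diagonal of the flat background metric, mostly plus (assumption: h = eta, D = 4).
ETA = (-1.0, 1.0, 1.0, 1.0)


def induced_metric(tangents):
    """G_ab = (dX^mu/dsigma^a)(dX^nu/dsigma^b) eta_{mu nu} from tangent vectors."""
    return [[sum(e * ta[mu] * tb[mu] for mu, e in enumerate(ETA)) for tb in tangents]
            for ta in tangents]


def volume_density(G):
    """sqrt(-det G) for p = 0 (1x1 matrix) or p = 1 (2x2 matrix)."""
    if len(G) == 1:
        det = G[0][0]
    elif len(G) == 2:
        det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    else:
        raise ValueError("only p = 0 and p = 1 are handled in this sketch")
    return math.sqrt(-det)


v = 0.6
# p = 0: worldline X = (tau, v * tau, 0, 0), with a single tangent dX/dtau.
particle = induced_metric([(1.0, v, 0.0, 0.0)])
# p = 1: straight string along X^1 moving along X^2, X = (tau, sigma, v * tau, 0).
string = induced_metric([(1.0, 0.0, v, 0.0), (0.0, 1.0, 0.0, 0.0)])

print(volume_density(particle))  # sqrt(1 - v^2), the point particle integrand
print(volume_density(string))    # sqrt(1 - v^2) for this transversely moving string
```

For {p = 0} the measure is just {\sqrt{-\dot{X}^{2}} \, d\tau} , so with {T_0 = m} the action (48) reduces to the point particle action (26) with {c = 1} .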

7. Summary

To summarise, one may recall how in classical (non-relativistic) theory the evolution of a system is described by its field equations. One can generalise many of the concepts of the classical non-relativistic theory of a point particle to the case of the relativistic point particle. Indeed, one will likely be familiar with how in the non-relativistic case the path of the particle may be characterised as a path through space. This path is then parameterised by time. On the other hand, in the case of the relativistic point particle, we have briefly reviewed how the path may instead be characterised by a worldline through spacetime. This worldline is parameterised not by time, but by the proper time. And, in relativity, we learn in very succinct terms how freely falling relativistic particles move along geodesics.

It should be understood that the equations of motion for the relativistic point particle are given by the geodesics on the spacetime. This means that one must remain cognisant that whichever path the particle takes also has many possibilities, as noted in an earlier section. That is, there are many possible worldlines between some beginning point and end point. This useful fact will be explicated more thoroughly later on, where, in the case of the string, we will discuss the requirement to sum over all possible worldsheets. Other lessons related to the point particle will also be extended to the string, and will help guide how we construct the elementary string action.


[Moh08] T. Mohaupt, Liverpool lectures on string theory [lecture notes].

[Pol07] J. Polchinski, An introduction to the bosonic string. Cambridge, Cambridge University Press. (2007).

[Wray11] K. Wray, An introduction to string theory [lecture notes].

[Zwie09] B. Zwiebach, A first course in string theory. Cambridge, Cambridge University Press. (2009).

Learning M-theory: Gauge theory of membranes, brane intersections, and the self-dual string

I’ve been learning a lot about M-theory. It’s such a broad topic that, when people ask me ‘what is M-theory?’, I continue to struggle to know where to start. Right now, much of my learning is textbook and I have more questions than answers. I naturally take the approach of first wanting as broad and general of a picture as possible. In some sense, it is like starting with the general and working toward the particular. Or, in another way, it’s like when being introduced to a new landscape and wanting, at the outset, a broad orientation to its general geographical features, except in this case we are speaking in conceptual and quantitative terms. I may not ever be smart enough to grasp M-theory in its entirety, but what is certain is that I am working my hardest.

In surveying its geographical features and charting my own map, if I may continue the analogy, obtaining a better sense of the fundamental objects of M-theory is a particular task; but my main research interest has increasingly narrowed to the study and application of gauge theory and higher gauge theory. This can be sliced down further in that I am very interested in the relationship between string and gauge theory, and furthermore in studying the higher dimensional generalisation of gauge theory. This interest naturally follows from the importance of gauge theory in contemporary physics, and then how we may understand it from the generalisation of point particle theory to string theory and then to other higher dimensional extended objects (i.e., branes). We’ve talked a bit in the past about how the dynamics on the D-brane worldvolume is described by a gauge theory. We’ve also touched on categorical descriptions, and how in p-brane language when we study the quantum theory the resemblance of the photon can be seen as a p-dimensional version of the electromagnetic field (by the way, we’re going to start talking about p-branes in my next string note). That is to say, we obtain a p-dimensional analogue of Maxwell’s equations. More advanced perspectives from the gauge theory view, or in this case higher gauge theory view in M-theory, illuminate the existence of new objects like self-dual strings.

There is so much here to write about and explore, I look forward to sharing more as I progress through my own studies and thinking. In this post, though, I want to share some notebook reflections on things I’ve been learning more generally in the context of M-theory: some stuff about membranes, 11-dimensional supergravity, and the self-dual string. This post is not very technical; it’s just me thinking out loud.

11-dimensional supergravity

The field content of 11-dimensional supergravity consists of the metric g_{\mu \nu}  , with 44 degrees of freedom; a rank 3 anti-symmetric tensor field C_{\mu \nu \rho}  , with 84 degrees of freedom; and these are paired off with a 32 component Majorana gravitino \Psi_{\alpha \mu}  , with 128 degrees of freedom. Although much has progressed since the theory was first conceived, the action for the bosonic sector is essentially the same as it was originally written [3]
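As a quick sanity check on the counting above, here is a minimal sketch that reproduces the numbers 44, 84 and 128 from representations of the massless little group SO(9). The function names are my own, and the real dimension of the SO(9) spinor (16) is put in by hand as an assumption rather than derived.

```python
from math import comb

def graviton_dof(D):
    """On-shell graviton: symmetric traceless tensor of the little group SO(D-2)."""
    n = D - 2
    return n * (n + 1) // 2 - 1

def p_form_dof(D, p):
    """On-shell p-form potential: rank-p antisymmetric tensor of SO(D-2)."""
    return comb(D - 2, p)

def gravitino_dof(D, spinor_dim):
    """On-shell gravitino: gamma-traceless vector-spinor of SO(D-2).

    spinor_dim is the real dimension of the SO(D-2) spinor,
    supplied by hand (assumed to be 16 for SO(9)).
    """
    n = D - 2
    return (n - 1) * spinor_dim

bosons = graviton_dof(11) + p_form_dof(11, 3)
fermions = gravitino_dof(11, 16)

print(graviton_dof(11), p_form_dof(11, 3), fermions)
assert bosons == fermions
```

The bosonic and fermionic counts agree, as they must for a supersymmetric multiplet.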

S_{SUGRA} = \frac{1}{2k_{11}^2} \int_{M_{11}} \sqrt{g} \ (R - \frac{1}{48}F^{2}_{4}) - \frac{1}{6} F_{4} \wedge F_{4} \wedge C_3. \ \ (1)

The field strength is F_4 = dC_3  and k_{11}  is the 11-dimensional coupling constant. The norm of an n-form field strength is defined conventionally,

\mid F_n \mid^2 = \frac{1}{n !} G^{M_1 N_1} G^{M_2 N_2} ... G^{M_n N_n}F_{M_{1}M_{2} ... M_{n}}F_{N_1 N_2 ... N_n}. \ \ (2)

The 11-dimensional metric is built from the frame fields as G_{MN} = \eta_{AB}E^{A}_{M}E^{B}_{N}  , where E^{B}_{N}  are the elfbeins, M,N  are indices for curved base-space vectors, and A,B  are indices for tangent space vectors. The last term in (1) is the Chern-Simons term. This is a topological term, independent of the metric. We see this structure in a lot of different contexts.

Although, from what I presently understand, the total degrees of freedom of M-theory are not yet completely nailed down, we can of course begin to trace a picture in parameter space. As we’ve discussed before on this blog, 10-dimensional type IIA theory in the strong coupling regime behaves as an 11-dimensional theory whose low-energy limit is captured by 11-dimensional supergravity. Conversely, if we compactify 11-dimensional supergravity on a circle of fixed radius in the x^{10} = z  direction, then from the 11-dimensional metric we obtain the 10-dimensional metric, a vector field, and the dilaton. The 3-form potential leads to both a 3-form and a 2-form in 10 dimensions. The mysterious 11-dimensional theory gives a further clue to its parental status in that supergravity compactified on the unit interval {\mathbb{I} = [0,1]}  , for example, leads to the low-energy limit of E_8 \times E_8  heterotic theory.
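The circle reduction described above can be checked at the level of on-shell degrees of freedom. The following is only a rough sketch under the assumption that we count little-group representations, SO(9) in 11 dimensions and SO(8) in 10 dimensions; the helper names are hypothetical.

```python
from math import comb

def sym_traceless(n):
    """Dimension of the symmetric traceless tensor of SO(n)."""
    return n * (n + 1) // 2 - 1

def antisym(n, p):
    """Dimension of the rank-p antisymmetric tensor of SO(n)."""
    return comb(n, p)

n11, n10 = 9, 8  # little groups SO(9) and SO(8)

# 11d metric -> 10d metric + vector field + dilaton
metric_pieces = {
    "metric": sym_traceless(n10),
    "vector": antisym(n10, 1),
    "dilaton": 1,
}

# 11d 3-form -> 10d 3-form + 2-form
three_form_pieces = {
    "3-form": antisym(n10, 3),
    "2-form": antisym(n10, 2),
}

assert sum(metric_pieces.values()) == sym_traceless(n11)
assert sum(three_form_pieces.values()) == antisym(n11, 3)
print(metric_pieces, three_form_pieces)
```

Nothing is lost or gained in the reduction: the pieces of the metric add back up to 44 and the pieces of the 3-form add back up to 84.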

Non-renormalisability of 11-dimensional SUGRA

One thing that I’ve known about for some time but have not yet studied in significant detail concerns precisely how 11-dimensional supergravity is non-renormalisable [4,5,6]. Looking at the maths, what I understand is that at two loops and beyond the graviton-graviton scattering is divergent. Moreover, and I still have some questions about this, what I find curious is that in the derivative expansion in 11-dimensional flat spacetime (using a 1PI/quantum effective Lagrangian approach) the generating functional for the graviton S-matrix is non-local. But due to supersymmetry, low order terms in the derivative expansion can be separated into local terms, such as t_8 t_8 R^4  , and non-local (or global) terms that correspond to loop amplitudes. What happens is that, at two loops, a logarithmic divergence that is cut off at the Planck scale mixes with a local term of the schematic form D^{12}R^4  , where R^4  is the supersymmetrised vertex. In the literature, one will find a lot of discussion about this R^4  vertex. But like I said, I really need more time looking at this.

In short, the important mechanism in string theory that allows us to avoid UV divergences is absent, or appears absent, in maximal supergravity. What could the UV regulator be? As in any supergravity, from what I understand, it is not clear that a Lagrangian description is sufficient at the Planck scale.

Membranes, D-branes, and AdS/CFT

The facts of 11-dimensional supergravity and how it relates to 10-dimensional string theory are textbook and well known. Going beyond dualities relating different string theories, an obvious question concerns what M-theory actually constitutes. One thing that is known is that M-theory reduces to 11-dimensional SUGRA at low energies, as we touched on, and it is known that the fundamental degrees of freedom are 2-dimensional and 5-dimensional objects, known as M2-branes and M5-branes. Study of these non-perturbative states offers several intriguing hints. There are also solutions to classical supergravity known as F1, the fundamental string, and its magnetic dual, the NS5-brane. As it relates to the story of the five string theories, the M-branes realize all D-branes, and this is why D-branes are considered consistent objects in quantum gravity.

The way that M-theory sees D-branes is via the net of dualities. All of the D-branes and the NS5 brane are solutions to type II theories, both A and B. So, when you reduce M-theory on a circle, in that you get back to Type IIA, the M2-branes and M5-branes reduce to the various D-branes such that under S-duality from the D5-brane you get the NS5.

The worldvolume theory of the M5-brane is always strongly coupled, which can be seen in moduli space (its parameters are simply a point). So there is no Lagrangian for this theory, and it suggests something deep is needed or is missing. It is expected that its worldvolume theory will be a 6-dimensional superconformal field theory, typically known as the 6d (2,0) theory. The worldvolume theory for M2-branes (on an orbifold) has been found to be a 3-dimensional superconformal Chern-Simons theory with classical \mathcal{N} = 6  supersymmetry.

If one considers a single M5-brane, a theory can be formulated in terms of an Abelian (2,0)-tensor multiplet, consisting of a self-dual 2-form gauge field, 5 scalars, and 8 fermions, but it is not known how to generalise the construction to describe multiple M5-branes. To give an example, using AdS/CFT [7] it is described how the worldvolume theory for a stack of N  M5-branes is dual to M-theory on AdS_7 \times S^4  with N  units of flux through the 4-sphere, which reduces to 11-dimensional SUGRA on this background in the large N  limit.

Brane intersections and stacks

The existence of branes is one of the most fascinating things about quantum gravity. There is a lot to unpack when learning about D1-branes, D3-branes, D5-branes, M2-branes, and M5-branes, as well as how they may intersect and what sort of consistent solutions have already been found [8,9, 10, 11, 12].

For example, an M2-brane, or a stack of coincident M2-branes, can end on an M5-brane. This is similar to the simpler story of how D-branes, or stacks of coincident D-branes, can intersect in string theory. Typically, D1-D3 systems in Type IIB string theory are studied because this system relates to the M2-M5 system by dimensional reduction and T-duality.

Self-dual string

For a membrane to end on an M5-brane, the membrane boundary must carry the charge of the self-dual field B on the five-brane worldvolume. There are different solutions to the field equations of B. For instance, a BPS solution was found [10] by looking at the supersymmetry transformation.

The linearised supersymmetry equation is

\delta_{\epsilon} \Omega^{j}_{\beta} = \epsilon^{\alpha i}(\frac{1}{2} (\gamma^{a})_{\alpha \beta}(\gamma_{b^{\prime}})^{j}_{i}\partial_a X^{b^{\prime}} - \frac{1}{6}(\gamma^{abc})_{\alpha \beta}\delta^{j}_i h_{abc}) = 0. \ \ (3)

Here b^{\prime}  labels transverse scalars, the indices a, b, c label worldvolume directions, \alpha, \beta  denote spinor indices of Spin(1,5), and i,j are spinor indices of USp(4)  . The solution balances the contribution of the 3-form field strength h with a contribution from the scalars. Additionally, the worldvolume of the string soliton can be taken to be in the 0,1 directions with all fields independent of x^0  and x^1  . An illustration of the solution is given below, showing an M2-brane ending on an M5-brane with a cross section S^3 \times \mathbb{R}  .

M2-branes ending on an M5-brane. The endpoint is a string. Courtesy of N. Copland, Aspects of M-Theory Brane Interactions and String Theory Symmetries [14].

As I am still trying to understand the calculation, I am currently looking at the following string solution

H_{01m} = \pm \frac{1}{4} \partial_m \phi,

H_{mnp} = \pm \frac{1}{4} \epsilon_{mnpq}\delta^{qr}\partial_r \phi,

\phi = \phi_0 + \frac{2Q}{\mid x - x_0 \mid^2}, \ \ (4)

where \phi  may be replaced by a more general superposition of solutions. We denote \pm Q as the magnetic and electric charge. There is a conformal factor in the full equations of motion which guarantees that they are satisfied even at x = x_0  , which means the solution is solitonic. This string soliton is said to possess its own anomalies that require cancellation (I assume Weyl, Lorentz). What is neat is that this string can be dimensionally reduced to get various T-duality configurations, which is something that would be fun to look into at some point down the road.
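One way to see why superpositions are allowed is that, away from x = x_0  , the profile in (4) is a harmonic function. The sketch below checks this numerically, assuming (this is my reading, not something spelled out above) that x ranges over the four worldvolume directions transverse to the string. The function names, the sample points, and the values \phi_0 = Q = 1  are arbitrary choices for illustration.

```python
def phi(x, x0=None, phi0=1.0, Q=1.0):
    """Profile phi = phi0 + 2Q / |x - x0|^2 from equation (4)."""
    if x0 is None:
        x0 = [0.0] * len(x)
    r2 = sum((a - b) ** 2 for a, b in zip(x, x0))
    return phi0 + 2.0 * Q / r2

def laplacian(f, x, h=1e-3):
    """Flat-space Laplacian of f at x by central finite differences."""
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2.0 * f(x) + f(xm)) / h ** 2
    return total

# Four transverse directions: 1/r^2 is harmonic away from the origin.
point4 = (0.7, -0.4, 1.1, 0.3)
lap4 = laplacian(phi, point4)

# Contrast: in three dimensions the same profile is not harmonic.
point3 = (0.7, -0.4, 1.1)
lap3 = laplacian(phi, point3)

print(lap4, lap3)
assert abs(lap4) < 1e-3
assert lap3 > 1.0
```

Because the equation satisfied by \phi  away from the source is linear, a sum of such profiles centred at different points is again a solution, which is the sense in which \phi  may be replaced by a more general superposition.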


[1] D. Fiorenza, H. Sati, and U. Schreiber, The rational higher structure of M-theory. Fortschritte der Physik, 67(8-9):1910017, May 2019. [arXiv:1903.02834 [hep-th]].

[2] E. Witten, String theory dynamics in various dimensions. Nuclear Physics B, 443(1):85 – 126, 1995.

[3] E. Cremmer, B. Julia, and J. Scherk, Supergravity Theory in 11-dimensions. Phys. Lett. B76, No. 4, (409-412) 19 June 1978.

[4] S. Chester, S. Pufu, and X. Yin, The M-Theory S-Matrix from ABJM: Beyond 11D supergravity. (2019). [arXiv:1804.00949v3 [hep-th]].

[5] A. Tseytlin, R4 terms in 11 dimensions and conformal anomaly of (2,0) theory. (2005). [arXiv:hep-th/0005072v4 [hep-th]].

[6] G. Russo, and A. Tseytlin, One-loop four-graviton amplitude in eleven-dimensional supergravity. (1997). [arXiv:hep-th/9707134v3 [hep-th]].

[7] P. Heslop, and A. Lipstein, M-theory Beyond The Supergravity Approximation. (2017). [arXiv:1712.08570 [hep-th]].

[8] P.K. Townsend, D-branes from M-branes. (1995). [arXiv:hep-th/9512062 [hep-th]].

[9] A. Strominger, Open p-branes. Phys. Lett. B 383 (1996) 44. [arXiv:hep-th/9512059 [hep-th]].

[10] P.S. Howe, N.D. Lambert, and P.C. West, The self-dual string soliton. Nucl. Phys. B 515 (1998) 203. [arXiv:hep-th/9709014 [hep-th]].

[11] M. Perry and J.H. Schwarz, Interacting chiral gauge fields in six dimensions and Born-Infeld theory. Nucl. Phys. B 489 (1997) 47. [arXiv:hep-th/9611065 [hep-th]].

[12] D.S. Berman, Aspects of M-5 brane world volume dynamics. Phys. Lett. B 572 (2003) 101. [arXiv:hep-th/0307040 [hep-th]].

[13] J. Huerta, H. Sati, and U. Schreiber, Real ADE-equivariant (co)homotopy and Super M-branes. (2018). [arXiv:1805.05987 [hep-th]].

[14] N. Copland, Aspects of M-Theory Brane Interactions and String Theory Symmetries. [].

[15] S. Palmer, Higher gauge theory and M-theory. [].