Notes on string theory #2: The relativistic point particle (pp. 9-11)

1. Introduction

In Chapter 1 of Polchinski’s textbook, we start with a discussion on the relativistic point particle (pp. 9-11).

String theory proposes that elementary particles are not pointlike, but rather 1-dimensional extended objects (i.e., strings). In fact, string theory (both the bosonic string in Volume 1 of Polchinski and the superstring that comprises much of Volume 2) can be seen as a special generalisation of point particle theory. But the deeper and more modern view does not necessarily begin with point particles and then move on to strings; instead, the story begins with branes. A number of features of string theory are shared by the point particle (as we’ll see in a later note, the point particle can be obtained in the limit where the string collapses to a point), and the bigger picture is that both of these objects can be considered special cases of a p-brane.

We refer to p-branes as p-dimensional dynamical objects that have mass and can have other familiar attributes such as charge. As a p-brane moves through spacetime, it sweeps out a (p+1)-dimensional volume called its worldvolume. In this notation, a 0-brane corresponds to the case where p = 0. It simply describes a point particle that, as we’ll discuss in this note, traces out a worldline as it propagates through spacetime. A string (whether fundamental or solitonic) corresponds to the p = 1 case, and this turns out to be a very special case of p-branes (for many reasons we’ll learn in following notes). Without getting too bogged down in technical details that extend well beyond the current level of discussion, it is also possible to consider higher-dimensional branes. An important case is p = 2, which gives 2-dimensional branes called membranes. In fact, the word ‘brane’ derives from ‘membrane’. As a physical object, a p-brane is a generalisation of a membrane to an arbitrary number of spatial dimensions. For {p \geq 2} , these p-branes appear in string theory as solitons of the corresponding low energy effective actions of various string theories (in addition to 0-branes and 1-branes).

In Type IIA and Type IIB string theories, which again are a subject of Volume 2, we see that there is an entire family of p-brane solutions. From the viewpoint of perturbative string theory, which is the primary focus of Volume 1, solitons as p-branes are strictly non-perturbative objects. (There are also other classes of branes, such as Dp-branes that we’ll come across soon when studying the open string. The more complete picture of D/M-brane physics, including brane dynamics, is anticipated to be captured by M-theory. This is a higher dimensional theory that governs branes and, with good reason, is suspected to represent the non-perturbative completion of string theory.)

In some sense, one can think of there being two equivalent ways to approach the idea of p-branes: a top-down higher dimensional view, or from the bottom-up as physical objects that generalise the notion of a point particle to higher dimensions. Given this introductory view of p-branes, perhaps it becomes slightly more intuitive why, in approaching the concept of a string in string theory, we may start (as Polchinski does) with a review of point particle theory. Indeed, it may at first seem odd to model the fundamental constituents of matter as strings. It could even seem completely arbitrary, and it is therefore natural to ask: why not something else? But what is often missed, especially in popular and non-technical physics literature, is the natural generalising logic that leads us to study strings in particular. These are remarkable objects with remarkable properties, and what Polchinski does so well in Volume 1 is allow this generalising logic to come out naturally in the study of the simplest string theory: bosonic string theory.

In this note, we will construct the relativistic point particle action as given on p. 10 (eqn. 1.2.2) and then work through the subsequent discussion on pages 10-11. The quantisation of the point particle is mentioned several pages later in the textbook, so we’ll address that topic then. In what follows, I originally also wanted to include notes on the superparticle and its superspace formulation (i.e., the inclusion of fermions in the point particle theory of bosons), as well as introduce other advanced topics; but I reasoned it is best to try to keep as close to the textbook as possible. The only exception to this rule is that, at the end of this note, we’ll finish by quickly looking at the p-brane action.

2. Relativistic point particle

Explanation of the action for a relativistic point particle as given in Polchinski (eqn. 1.2.2) is best achieved through its first-principle construction. So let us consider the basics of constructing the theory for a relativistic free point particle.

2.1. Minkowski space

We start with a discussion about the space in which we’ll build our theory [Moh08].

As one may recall from studying Einstein’s theory of relativity, spacetime may be modelled by D-dimensional Minkowski space {\mathbb{M}^D} . In the abstract, the basic idea is to consider two (distinct) sets E and {\vec{E}} , where E is a set of points (with no given structure) and {\vec{E}} is a vector space (of free vectors) acting on the set E. We view the elements of {\vec{E}} as forces acting on points in E, which we in turn think of as physical particles. Applying a force (free vector) {X \in \vec{E}} to a point {P \in E} results in a translation. In other words, the action of a force X is to move every point P to the point {P + X \in E} by translation that corresponds to X viewed as a vector.

In physics, the set E is viewed as the D-dimensional affine space {\mathbb{M}^D} , and then {\vec{E}} is the associated D-dimensional vector space {\mathbb{R}^{1,D-1}} defined over the field of real numbers. The choice to model spacetime as an affine space is quite natural, given that an affine space has no preferred or distinguished origin and, of course, the spacetime of special relativity possesses no preferred origin.

As the vectors {X \in \mathbb{R}^{1,D-1}} do not naturally correspond to points {P \in \mathbb{M}^D} , but rather to displacements relating a point P to another point Q, we write {X = \vec{PQ}} . The points can be defined to be in one-to-one correspondence with position vectors such that {\vec{X}_P = \vec{OP}} , with displacements then defined by the difference {\vec{PQ} = \vec{OQ} - \vec{OP}} . The associated vector space possesses a zero vector {\vec{0} \in \mathbb{R}^{1,D-1}} , which represents the neutral element of vector addition. We can also use the vector space {\mathbb{R}^{1,D-1}} to introduce linear coordinates on {\mathbb{M}^{D}} by making an arbitrary choice of origin as the point {O \in \mathbb{M}^D} .

The elements or points {P,Q,... \in \mathbb{M}^D} are events, and they combine a moment of time with a specified position. With the arbitrary choice of origin made, we can refer to these points in Minkowski space in terms of their position vectors, such that the components {X^{\mu} = (X^0, X^i) = (t, \vec{X})} , with {\mu = 0,..., D-1} and {i = 1,...,D-1} , of vectors {X \in \mathbb{R}^{1,D-1}} correspond to linear coordinates on {\mathbb{M}^D} . The coordinate {X^{0}} is related to the time t measured by an inertial (freely falling) observer by {X^0 = ct} , with c the fundamental velocity (so that {X^0 = t} in units where {c = 1} ). The {X^i} coordinates, which are combined into a (D-1)-component vector, parameterise space (from the perspective of the inertial observer).

It is notable that a vector {X} has contravariant coordinates {X^{\mu}} and covariant coordinates {X_{\mu}} which are related by raising and lowering indices such that {X_{\mu} = \eta_{\mu \nu}X^{\nu}} and {X^{\mu} = \eta^{\mu \nu}X_{\nu}} .

We still need to equip this space with a Lorentzian scalar product. In the spacetime of special relativity, the vector space {\mathbb{R}^{1,D-1}} is furnished with the scalar product (relativistic distance between events)

\displaystyle  X^2 \equiv \eta_{\mu \nu} X^{\mu}X^{\nu} = X^{\mu}X_{\mu} = -t^2 + \vec{X}^2 \begin{cases} <0 \ \text{for timelike distance} \\ =0 \ \text{for lightlike distance} \\ >0 \ \text{for spacelike distance} \end{cases} \ \ (1)

with matrix

\displaystyle  \eta = (\eta_{\mu \nu}) = \begin{pmatrix} - 1 & 0 \\  0 & 1_{D-1} \end{pmatrix}, \ \ (2)

where we have chosen the mostly plus convention. To make sense of (1), since the Minkowski metric (2) defines an indefinite scalar product, the distance-squared between events can be positive, zero or negative. This carries information about the causal structure of spacetime. If {X = \vec{PQ}} is the displacement between two events, then these events are called time-like, light-like or space-like relative to each other, depending on whether X is time-like, light-like or space-like. The zeroth component of {X = \vec{OQ} - \vec{OP}} then carries information about the time ordering of the events relative to a given Lorentz frame: Q is later than P ({X^0 > 0} ), or simultaneous with P ({X^0 = 0} ), or earlier than P ({X^0 < 0} ).
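As a quick numerical illustration of (1), the following Python sketch evaluates the scalar product in the mostly plus convention and labels a displacement accordingly. It assumes units with c = 1, and the function names `minkowski_product` and `classify` are our own illustrative choices rather than anything from the textbook.

```python
def minkowski_product(x, y):
    """Return eta_{mu nu} x^mu y^nu in the mostly plus convention (c = 1)."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def classify(x, tol=1e-12):
    """Label a displacement by the sign of its Minkowski norm-squared."""
    norm_sq = minkowski_product(x, x)
    if norm_sq < -tol:
        return "timelike"
    if norm_sq > tol:
        return "spacelike"
    return "lightlike"


# Displacements X = (X^0, X^1, X^2, X^3) in D = 4.
for x in [(2.0, 1.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0), (1.0, 2.0, 0.0, 0.0)]:
    print(x, classify(x))
```

The tolerance `tol` is only there to absorb floating-point rounding when deciding whether a displacement is lightlike.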

2.2. Lorentz invariance and the Poincaré group

Let’s talk more about Lorentz invariance and the Poincaré group. As inertial observers are required to use linear coordinates which are orthonormal with respect to the scalar product (1), these orthonormal coordinates are distinguished by the above standard form of the metric. It is of course also possible to use curvilinear coordinate systems, such as spherical or cylindrical coordinates. Given the standard form of the metric (2), the most general transformations which preserve its form make up the Poincaré group, which is the group of isometries of Minkowski spacetime.

In D = 4, the Poincaré group is a 10-dimensional Lie group. It consists of 4 translations along with the Lorentz group of 3 rotations and 3 boosts (in general D, there are D translations and D(D-1)/2 Lorentz transformations). As a general review, let’s start with the Lorentz group. This is the set of linear transformations of spacetime that leave the Lorentz interval unchanged.

From the definitions in the previous section, the line element takes the form

\displaystyle  ds^2 = \eta_{\mu \nu}dX^{\mu}dX^{\nu} = - dt^2 + d\vec{X}^2. \ \ (3)

For spacetime coordinates defined in the previous section, the Lorentz group is then defined to be the group of transformations {X^{\mu} \rightarrow X^{\prime \mu}} leaving the relativistic interval invariant. Assuming linearity (we will not prove linearity here, with many proofs easily accessible), define a Lorentz transformation as any real linear transformation {\Lambda} such that

\displaystyle  X^{\mu} \rightarrow X^{\prime \mu} = \Lambda^{\mu}_{\nu}X^{\nu} \ \ (4)


\displaystyle  \eta_{\mu \nu} dX^{\prime \mu} dX^{\prime \nu} = \eta_{\mu \nu} dX^{ \mu} dX^{\nu}, \ \ (5)

ensuring from (1) that

\displaystyle  X^{\prime 2} = X^{2}, \ \ (6)

which, for arbitrary X, requires

\displaystyle  \eta_{\mu \nu} = \eta_{\alpha \beta} \Lambda^{\alpha}_{\mu} \Lambda^{\beta}_{\nu}. \ \ (7)

Note that {\Lambda = (\Lambda^{\mu}_{\nu})} is an invertible {D \times D} matrix. In matrix notation (7) can be expressed as

\displaystyle  \Lambda^T \eta \Lambda = \eta. \ \ (8)

Matrices satisfying (8) contain rotations together with Lorentz boosts, which relate inertial frames travelling at constant velocity relative to each other. The Lorentz transformations form a Lie group of dimension D(D-1)/2 (six-dimensional for D = 4), which is the Lorentz group O(1,D-1).

For elements {\Lambda \in O(1, D-1)} taking the determinant of (8) gives

\displaystyle  (\det \Lambda)^2 = 1 \implies \det \Lambda = \pm 1. \ \ (9)

By considering the {\Lambda^0_0} component we also find

\displaystyle  (\Lambda^0_0)^2 = 1 + \Sigma_i (\Lambda^0_i)^2 \geq 1 \Rightarrow \Lambda^0_0 \geq 1 \ \text{or} \ \Lambda^0_0 \leq -1. \ \ (10)

So, the Lorentz group has four components according to the signs of {\det \Lambda} and {\Lambda^0_0} . The matrices with {\det \Lambda = 1} form a subgroup SO(1,D-1) with two connected components, distinguished by the two possibilities on the right-hand side of (10). The component containing the unit matrix {1 \in O(1,D-1)} is connected and is denoted {SO_0(1,D-1)} .
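To make conditions (8) to (10) concrete, here is a minimal Python sketch that checks them for a boost and for time reversal. For simplicity it assumes D = 2 (one time and one space direction) and parametrises the boost by its rapidity; the helper names are hypothetical and chosen only for this illustration.

```python
import math

ETA = [[-1.0, 0.0], [0.0, 1.0]]  # Minkowski metric for D = 2, mostly plus


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    """Transpose a matrix given as nested lists."""
    return [list(col) for col in zip(*a)]


def boost(rapidity):
    """Boost along the single spatial direction, parametrised by rapidity."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, sh], [sh, ch]]


def preserves_metric(lam, tol=1e-9):
    """Check condition (8): Lambda^T eta Lambda = eta."""
    lhs = matmul(matmul(transpose(lam), ETA), lam)
    return all(abs(lhs[i][j] - ETA[i][j]) < tol
               for i in range(2) for j in range(2))


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


TIME_REVERSAL = [[-1.0, 0.0], [0.0, 1.0]]  # det = -1 and Lambda^0_0 = -1

for name, lam in [("boost", boost(0.7)), ("time reversal", TIME_REVERSAL)]:
    print(name, preserves_metric(lam), round(det2(lam), 9), lam[0][0] >= 1.0)
```

Both matrices satisfy (8), but they differ in the signs of {\det \Lambda} and {\Lambda^0_0} , so they sit in different components of O(1,1).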

We may also briefly consider translations of the form

\displaystyle  X^{\mu} \rightarrow X^{\prime \mu} = X^{\mu} + a^{\mu}, \ \ (11)

where {a = (a^{\mu}) \in \mathbb{R}^{1, D-1}} . Translations form a group that can be parametrised by the components of the translation vector {a^{\mu}} .

As mentioned, the Poincaré group is then the complete spacetime symmetry group that combines translations with Lorentz transformations. For a Lorentz transformation {\Lambda} and a translation {a} the combined transformation {(\Lambda, a)} gives

\displaystyle X^{\mu} \rightarrow X^{\prime \mu} = \Lambda^{\mu}_{\nu} X^{\nu} + a^{\mu}. \ \ (12)

These combined transformations form a group since

\displaystyle (\Lambda_2, a_2)(\Lambda_1, a_1) = (\Lambda_2 \Lambda_1, \Lambda_2 a_1 + a_2), \ (\Lambda, a)^{-1} = (\Lambda^{-1}, -\Lambda^{-1}a). \ \ (13)

Since Lorentz transformations and translations do not commute, the Poincaré group is not a direct product. More precisely, the Poincaré group is the semi-direct product of the Lorentz and translation groups, {IO(1,D-1) = O(1,D-1) \ltimes \mathbb{R}^D} .
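The composition and inverse laws in (13) can be checked numerically. The sketch below assumes D = 2 and represents a Poincaré transformation as a pair `(lam, a)`; the function names `act`, `compose` and `inverse` are our own labels for (12) and (13), not standard library calls.

```python
import math


def boost(rapidity):
    """Lorentz boost in D = 2, parametrised by rapidity."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, sh], [sh, ch]]


def mat_vec(lam, x):
    return [sum(lam[i][j] * x[j] for j in range(len(x))) for i in range(len(lam))]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def act(g, x):
    """Apply (Lambda, a) to x as in (12)."""
    lam, a = g
    return [y + s for y, s in zip(mat_vec(lam, x), a)]


def compose(g2, g1):
    """Composition law (13): apply g1 first, then g2."""
    (lam2, a2), (lam1, a1) = g2, g1
    return mat_mul(lam2, lam1), [y + s for y, s in zip(mat_vec(lam2, a1), a2)]


def inverse(g):
    """Inverse law (13), using the explicit inverse of a 2 x 2 matrix."""
    lam, a = g
    det = lam[0][0] * lam[1][1] - lam[0][1] * lam[1][0]
    lam_inv = [[lam[1][1] / det, -lam[0][1] / det],
               [-lam[1][0] / det, lam[0][0] / det]]
    return lam_inv, [-y for y in mat_vec(lam_inv, a)]


def close(u, v, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(u, v))


g1 = (boost(0.3), [1.0, 0.0])
g2 = (boost(-0.5), [0.0, 2.0])
x = [0.3, -1.2]

print(close(act(compose(g2, g1), x), act(g2, act(g1, x))))  # composition law
print(close(act(inverse(g1), act(g1, x)), x))               # inverse law
print(close(act(compose(g2, g1), x), act(compose(g1, g2), x)))  # order matters
```

In D = 2 the two boosts commute with each other, so the order dependence seen in the last line comes entirely from the term {\Lambda_2 a_1} , which is exactly the semi-direct product structure.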

2.3. Action principle

We now look to construct an action for the relativistic point particle (initially following the discussion in [Zwie09] as motivation).

The classical motion of a point particle as it propagates through spacetime is described by a geodesic on the spacetime. As Polchinski first notes, we can of course describe the motion of this particle by giving its position in terms of functions of time {X(t) = (X^{\mu}(t)) = (t, \vec{X}(t))} . For now, we may also consider some arbitrary origin and endpoint {(ct_f, \vec{X}_{f})} for the particle’s path or what is also called its worldline. We also know from the principle of least action that there are many possible paths between these points.

[Figure: Particle worldline]

It should be true that for any worldline all Lorentz observers compute the same value for the action. Let {\mathcal{P}} denote one such worldline. Then we may use the proper time as a Lorentz-invariant quantity to describe this path. Recall from special relativity that different Lorentz observers will in general record different values for the time interval between two events along {\mathcal{P}} , so we instead imagine that a clock is attached to the particle. The proper time is the time elapsed between the two events on that clock, and all Lorentz observers must agree on its value. This is the basic idea, and it means we want an action for the worldline {\mathcal{P}} that is proportional to the proper time.

To achieve this, we first recall the invariant interval for the motion of a particle

\displaystyle  - ds^2 = -c^2 dt^2 + (dX^1)^2 + (dX^2)^2 + (dX^3)^2, \ \ (14)

in which, from special relativity, the proper time {t_p} (the time read on a clock carried by the particle, for which {dX^i = 0} )

\displaystyle  -ds^2 = -c^2 dt_p^2 \rightarrow ds = c \, dt_p \ \ (15)

tells us that for timelike intervals ds/c is the proper time interval. It follows that the integral of (ds/c) over the worldline {\mathcal{P}} gives the proper time elapsed on {\mathcal{P}} . But, if the proper time gives units of time, we still need units of energy (or units of mass times velocity-squared) to ensure we have the full units of action (recall that for any dynamical system the action has units of energy times time, with the Lagrangian possessing units of energy). We also need to ensure that we preserve Lorentz invariance in the process of building our theory. One obvious choice is m for the rest mass of the particle, with c for the fundamental velocity in relativity. Then we have an overall multiplicative factor {mc^2} that represents the rest energy of the particle. As a result, the action takes the tentative form {mc^2 (ds/c) = mc \, ds} . This should make some sense in that {ds} is just a Lorentz scalar, and m and c are the same for all Lorentz observers. We also include an overall minus sign, which ensures that the kinetic term has the correct (positive) sign in the non-relativistic limit:

\displaystyle  S = -mc \int_{\mathcal{P}} ds. \ \ (16)

A good strategy now is to write the action as an integral of a Lagrangian over time, between the times {t_i} and {t_f} of the initial and final world-events on our particle’s path. If we fix a frame (which is to say, if we choose the frame of a particular Lorentz observer), we may express the action (16) as the integral of the Lagrangian over time. To achieve this end, we must first return to our interval (14) and relate {ds} to {dt} ,

\displaystyle  -ds^2 = -c^2 dt^2 + (dX^1)^2 + (dX^2)^2 + (dX^3)^2

\displaystyle  ds^2 = c^2 dt^2 - (dX^1)^2 - (dX^2)^2 - (dX^3)^2

\displaystyle  ds^2 = [c^2 - \frac{(dX^1)^2}{dt^2} - \frac{(dX^2)^2}{dt^2} - \frac{(dX^3)^2}{dt^2}] dt^2

\displaystyle  \implies ds^2 = (c^2 - v^2) dt^2

\displaystyle  \therefore ds = \sqrt{c^2 - v^2} dt. \ \ (17)

With this relation between {ds} and {dt} , in the fixed frame the point particle action becomes

\displaystyle  S = -mc^{2} \int_{t_{i}}^{t_{f}} dt \sqrt{1 - \frac{v^{2}}{c^{2}}}, \ \ (18)

with the Lagrangian taking the form

\displaystyle  L = -mc^{2} \sqrt{1 - \frac{v^{2}}{c^{2}}}. \ \ (19)

A hint that this Lagrangian is correct is that it ceases to make sense when the velocity exceeds the speed of light, {v > c} , as the square root then becomes imaginary. This is consistent with the definition of the proper time from special relativity (i.e., the velocity should not exceed the speed of light for the proper time to be a valid concept). In the small velocity limit {v \ll c} , on the other hand, when we expand the square root (using the binomial theorem) we see that it gives

\displaystyle L \simeq -mc^2 (1 - \frac{1}{2}\frac{v^2}{c^2}) = - mc^2 + \frac{1}{2}m v^2, \ \ (20)

returning the familiar kinetic term of the free non-relativistic particle, with ({-mc^2} ) just a constant.
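The quality of the approximation (20) is easy to probe numerically. The following Python sketch compares the exact Lagrangian (19) with its small-velocity form for a few speeds; it assumes units with m = c = 1 by default, and the function names are illustrative only.

```python
import math


def lagrangian(v, m=1.0, c=1.0):
    """Relativistic Lagrangian (19) for speed v."""
    return -m * c**2 * math.sqrt(1.0 - v**2 / c**2)


def lagrangian_nonrel(v, m=1.0, c=1.0):
    """Leading terms (20): rest-energy constant plus Newtonian kinetic term."""
    return -m * c**2 + 0.5 * m * v**2


for v in (0.001, 0.01, 0.1, 0.5):
    exact, approx = lagrangian(v), lagrangian_nonrel(v)
    print(f"v/c = {v}: exact = {exact:.10f}, approx = {approx:.10f}, "
          f"difference = {exact - approx:.3e}")
```

The difference grows roughly like {v^4/c^2} , which is the first term dropped in the binomial expansion. Calling `lagrangian` with v > c raises a `ValueError` from `math.sqrt`, mirroring the statement that the Lagrangian stops making sense beyond the speed of light.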

2.4. Canonical momentum and Hamiltonian

We will discuss the canonical momentum of the point particle again in a future note on quantisation; but for the present form of the action it is worth highlighting that we can also see the Lagrangian (19) is correct by computing the momentum {\vec{p}} and the Hamiltonian.

For the canonical momentum, we take the derivative of the Lagrangian with respect to the velocity

\displaystyle  \vec{p} = \frac{\partial L}{\partial \vec{v}} = -mc^{2}(-\frac{\vec{v}}{c^{2}})\frac{1}{\sqrt{1 - \frac{v^{2}}{c^{2}}}} = \frac{m\vec{v}}{\sqrt{1 - \frac{v^{2}}{c^{2}}}}. \ \ (21)

Now that we have an expression for the relativistic momentum of the particle, let us consider the Hamiltonian. The Hamiltonian may be written schematically as {H = \vec{p} \cdot \vec{v} - L} . All we need to do is make the appropriate substitutions,

\displaystyle  H = \frac{m\vec{v}^{2}}{\sqrt{1 - \frac{v^{2}}{c^{2}}}} + mc^{2}\sqrt{1 - \frac{v^{2}}{c^{2}}} = \frac{mc^{2}}{\sqrt{1 - \frac{v^{2}}{c^{2}}}}. \ \ (22)

The Hamiltonian should make sense. Notice, if we instead write the result in terms of the particle’s momentum (rather than velocity) by inverting (21), we find an expression in terms of the relativistic energy {\frac{E^{2}}{c^{2}} - \vec{p} \cdot \vec{p} = m^{2}c^{2}} . This is a deep hint that we’re on the right track, as it suggests quite clearly that we’ve recovered basic relativistic physics for a point-like object.
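As a sanity check on (21), (22) and the energy-momentum relation, here is a small Python sketch for motion along a single axis. The numerical values of m, c and v are arbitrary assumptions, and the momentum is cross-checked against a central finite difference of the Lagrangian rather than derived symbolically.

```python
import math


def lagrangian(v, m, c):
    """Relativistic Lagrangian (19)."""
    return -m * c**2 * math.sqrt(1.0 - v**2 / c**2)


def momentum(v, m, c):
    """Canonical momentum (21) for motion along one axis."""
    return m * v / math.sqrt(1.0 - v**2 / c**2)


def hamiltonian(v, m, c):
    """H = p v - L, which should reduce to (22)."""
    return momentum(v, m, c) * v - lagrangian(v, m, c)


m, c, v, h = 2.0, 1.0, 0.6, 1e-6

# Central finite difference for dL/dv, to compare against (21).
dL_dv = (lagrangian(v + h, m, c) - lagrangian(v - h, m, c)) / (2.0 * h)
p = momentum(v, m, c)
energy = hamiltonian(v, m, c)

print("dL/dv (numerical):", dL_dv)
print("p from (21):      ", p)
print("H from p v - L:   ", energy)
print("E^2/c^2 - p^2:    ", energy**2 / c**2 - p**2, "vs m^2 c^2 =", m**2 * c**2)
```

With these values the momentum and energy come out close to 1.5 and 2.5 respectively, so that {E^2/c^2 - p^2} reproduces {m^2 c^2 = 4} up to rounding.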

3. Reparameterisation invariance

An important property of the action (16) is that it is invariant under whatever choice of parameterisation we might choose. This makes sense because the invariant length ds between two points on the particle’s worldline does not depend on any parameterisation. We’ve only insisted on integrating the line element, which, if you think about it, is really just a matter of adding up all of the infinitesimal segments along the worldline. But, typically, a particle moving in spacetime is described by a parameterised curve. As Polchinski notes, it is generally best to introduce some parameter and then describe the motion in spacetime by functions of that parameter.

Furthermore, the sign convention we adopt for the line element determines whether, for the classical motion, the path extremises the invariant length as a minimum or a maximum. If we parameterise the worldline by {\tau} and take the invariant length ds to be given by

\displaystyle ds^2 = -\eta_{\mu \nu}(X) dX^{\mu} dX^{\nu}, \ \ (23)

then the worldline parameter {\tau} is taken to increase from some initial point {X^{\mu} (\tau_i)} to some final point {X^{\mu}(\tau_f)} , and the classical paths are those which maximise the proper time. It also means that the trajectory of the particle worldline is now described by the coordinates {X^{\mu} = X^{\mu}(\tau)} . As a result, the space of the theory can now be updated such that {X^{\mu}(\tau) \in \mathbb{R}^{1, D-1}} with {\mu, \nu = 0,...,D-1} .

In the use of {\tau} parameterisation, an important idea is that time is in a sense being promoted to a dynamical degree of freedom without it actually being a dynamical degree of freedom. We are in many ways leveraging the power of gauge symmetry, with our choice of parameterisation enabling us to treat space and time coordinates on equal footing. The cost of trading a less symmetric description for a more symmetric one is that we pick up redundancies.

Since we have taken the background spacetime geometry to be Minkowski, recall the metric (written here for D = 4)

\displaystyle  \eta_{\mu \nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \ \ (24)

such that for {ds^2} we now use

\displaystyle -\eta_{\mu \nu}(X) dX^{\mu} dX^{\nu} = -\eta_{\mu \nu}(X) \frac{dX^{\mu}(\tau)}{d\tau} \frac{dX^{\nu}(\tau)}{d\tau} d\tau^2. \ \ (25)

Therefore, the action (16) may be updated to take the form

\displaystyle  S_{pp} = -mc \int_{\tau_i}^{\tau_f} d\tau \ \sqrt{-\eta_{\mu \nu} \dot{X}^{\mu} \dot{X}^{\nu}} \ \ (26)

with {\dot{X}^{\mu} \equiv dX^{\mu}(\tau) / d\tau} .

Setting {c = 1} , notice (26) is precisely the action (eqn. 1.2.2) in Polchinski. This is the simplest action for a relativistic point particle with manifest Poincaré invariance that does not depend on the choice of parameterisation.

How do we interpret this form of the action? In the exercise to obtain (26) we have essentially played the role of a fixed observer, who has calculated the action using some parameter {\tau} . The important question is whether the value of the action depends on this choice of parameter. Polchinski comments that, in fact, it is a completely arbitrary choice of parameterisation. This should make sense because, again, the invariant length ds on the particle worldline {\mathcal{P}} should not depend on how the path is parameterised.

Proposition 1 The action (26) is reparameterisation invariant such that if we replace {\tau} with the parameter {\tau^{\prime} = f(\tau)} , where f is monotonic, we obtain the same value for the action.

Proof: Consider the following reparameterisation of the particle’s worldline {\tau \rightarrow \tau^{\prime} = f(\tau)} . Then we have

\displaystyle d\tau \rightarrow d\tau^{\prime} = \frac{\partial f}{\partial \tau}d\tau, \ \ (27)


\displaystyle  \frac{dX^{\mu}(\tau^{\prime})}{d\tau} = \frac{dX^{\mu}(\tau^{\prime})}{d\tau^{\prime}}\frac{d\tau^{\prime}}{d\tau} = \frac{dX^{\mu}(\tau^{\prime})}{d\tau^{\prime}} \frac{\partial f(\tau)}{\partial \tau}. \ \ (28)

Plugging this into the action (26) we get

\displaystyle S^{\prime} = -mc \int_{\tau^{\prime}_i}^{\tau^{\prime}_f} d\tau^{\prime} \ \sqrt{-\frac{dX^{\mu}(\tau^{\prime})}{d\tau^{\prime}} \frac{dX_{\mu}(\tau^{\prime})}{d\tau^{\prime}}}

\displaystyle  = -mc \int_{\tau_i}^{\tau_f} \frac{\partial f}{\partial \tau} \ d\tau \ \sqrt{-\frac{dX^{\mu}}{d\tau} \frac{dX_{\mu}}{d\tau} (\frac{\partial f}{\partial \tau})^{-2}}

\displaystyle  = -mc \int_{\tau_i}^{\tau_f} (\frac{\partial f}{\partial \tau})(\frac{\partial f}{\partial \tau})^{-1} \ d\tau \ \sqrt{-\frac{dX^{\mu}}{d\tau} \frac{dX_{\mu}}{d\tau}}

\displaystyle  = -mc \int_{\tau_i}^{\tau_f} d\tau \ \sqrt{-\frac{dX^{\mu}(\tau)}{d\tau} \frac{dX_{\mu}(\tau)}{d\tau}}. \ \ (29)


This ends the proof. So we see the value of the action does not depend on the choice of parameter; indeed, the choice is arbitrary.
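Proposition 1 can also be illustrated numerically. The sketch below evaluates (26) with c = 1 in D = 2 for one timelike worldline written in two different parameterisations. The particular worldline, the monotonic map `tau_of_sigma` and the midpoint-rule integrator are all assumptions made for the sake of the example.

```python
import math


def action(path, s_start, s_end, m=1.0, steps=20000):
    """Midpoint-rule estimate of (26) with c = 1 in D = 2.

    `path(s)` returns (t, x); derivatives are central finite differences.
    """
    h = (s_end - s_start) / steps
    total = 0.0
    for k in range(steps):
        s_mid = s_start + (k + 0.5) * h
        t_hi, x_hi = path(s_mid + 0.5 * h)
        t_lo, x_lo = path(s_mid - 0.5 * h)
        t_dot, x_dot = (t_hi - t_lo) / h, (x_hi - x_lo) / h
        total += math.sqrt(t_dot**2 - x_dot**2) * h
    return -m * total


def worldline(tau):
    """A timelike worldline with |dx/dt| <= 0.5."""
    return tau, 0.5 * math.sin(tau)


def tau_of_sigma(sigma):
    """Monotonic map sigma -> tau with tau(0) = 0 and tau(1) = 1."""
    return 0.5 * (sigma + sigma**3)


def reparameterised(sigma):
    """The same worldline, described by the new parameter sigma."""
    return worldline(tau_of_sigma(sigma))


s_tau = action(worldline, 0.0, 1.0)
s_sigma = action(reparameterised, 0.0, 1.0)
print(s_tau, s_sigma, abs(s_tau - s_sigma))
```

The two values agree up to discretisation error, which shrinks as `steps` is increased.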

As alluded to earlier in this section, reparameterisation invariance is a gauge symmetry. In some sense, this is not even an honest symmetry, because it means that we’ve introduced a redundancy in our description, as not all degrees of freedom {X^{\mu}} are physically meaningful. We’ll discuss this more in the context of the string (an example of such a redundancy appears in the study of the momenta).

4. Equation of motion for {S_{pp}}

To obtain (eqn. 1.2.3), Polchinski varies the action (26) and then integrates by parts. For simplicity, let us temporarily maintain {c = 1} . Varying (26)

\displaystyle  \delta S_{pp} = -m \int d\tau \ \delta (\sqrt{-\dot{X}^{\mu}\dot{X}_{\mu}}) \ \ (30)

\displaystyle  = -m \int d\tau \ \frac{1}{2}(-\dot{X}^{\mu}\dot{X}_{\mu})^{-1/2} \ \delta(-\dot{X}^{\nu}\dot{X}_{\nu}), \ \ (31)

then, since {\delta(\dot{X}^{\nu}\dot{X}_{\nu}) = 2\dot{X}_{\nu}\delta \dot{X}^{\nu}} , from the last term we pick up a factor of 2 leaving

\displaystyle  = -m \int d\tau \ (-\dot{X}^{\mu}\dot{X}_{\mu})^{-1/2} (-\dot{X}_{\nu}\delta \dot{X}^{\nu}). \ \ (32)

Next, we make the substitution {u^{\mu} = \dot{X}^{\mu}(-\dot{X}^{\nu}\dot{X}_{\nu})^{-1/2}} such that

\displaystyle  \delta S_{pp} = -m \int d\tau (-u_{\mu})\delta \dot{X}^{\mu}. \ \ (33)

And now we integrate by parts, which shifts a derivative onto u, using the fact that the variation and the derivative commute, {\delta \dot{X}^{\mu} = \delta \frac{d}{d\tau} X^{\mu} = \frac{d}{d\tau} \delta X^{\mu}} . We also drop the total derivative term that we obtain in the process, as the variation {\delta X^{\mu}} vanishes at the endpoints of the worldline,

\displaystyle  \delta S_{pp} = -m \int d\tau \frac{d}{d\tau} (-u_{\mu}\delta X^{\mu}) - m \int d\tau \dot{u}_{\mu} \delta X^{\mu}, \ \ (34a)

which gives the correct result

\displaystyle  \delta S_{pp} = -m \int d\tau \dot{u}_{\mu}\delta X^{\mu}. \ \ (34b)

As Polchinski notes, the equation of motion {\dot{u}^{\mu} = 0} describes the free motion of the particle.
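The statement that free motion extremises the action can be seen in a simple numerical experiment. The Python sketch below compares the proper time along a straight worldline with that along a path that shares the same endpoints but takes a detour; it assumes c = 1 and D = 2, and the path `wiggly` and its amplitude are arbitrary choices made for illustration.

```python
import math


def proper_time(path, steps=20000):
    """Sum sqrt(dt^2 - dx^2) over short chords of path(s), s in [0, 1] (c = 1)."""
    total = 0.0
    t_prev, x_prev = path(0.0)
    for k in range(1, steps + 1):
        t_next, x_next = path(k / steps)
        total += math.sqrt((t_next - t_prev)**2 - (x_next - x_prev)**2)
        t_prev, x_prev = t_next, x_next
    return total


def straight(s):
    """Free motion: the particle sits at x = 0."""
    return s, 0.0


def wiggly(s, amplitude=0.1):
    """Same endpoints as `straight`, but with a detour in x."""
    return s, amplitude * math.sin(math.pi * s)


m = 1.0
for name, path in [("straight", straight), ("wiggly", wiggly)]:
    tau = proper_time(path)
    print(name, "proper time =", tau, "action =", -m * tau)
```

The straight worldline has the larger proper time and hence the smaller action, in line with the free equation of motion {\dot{u}^{\mu} = 0} .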

With the particle mass m being the normalisation constant, we can also take the non-relativistic limit (exercise 1.1). Returning to (26), one way to do this is to take {\tau} to be the proper time. Then, as before (reinstating c for the purpose of example),

\displaystyle  \dot{X}^{\mu}(\tau) = \left( c \frac{dt}{d\tau}, \frac{d\vec{X}(\tau)}{d\tau} \right) = \frac{dt}{d\tau} (c, \vec{v}), \ \ (35)

where {\vec{v} = d\vec{X}/dt} , and we may define the quantity {\gamma = (1 - v^2/c^2)^{-1/2}} . Then, in the non-relativistic limit where {v \ll c} , we have {dt/d\tau = \gamma = 1 + \mathcal{O}(v^2/c^2)} . It follows

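\displaystyle  \dot{X}^{\mu}\dot{X}_{\mu} = \left(\frac{dt}{d\tau}\right)^2 (-c^2 + \mid \vec{v} \mid^2), \ \ (36)

with {\vec{v}} the spatial velocity, for which we define the norm {\mid \vec{v} \mid \equiv v} . Changing the integration variable from {\tau} to t, the factor {dt/d\tau} drops out (this is reparameterisation invariance at work, and is equivalent to choosing static gauge {\tau = t} ), so the action takes the form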

\displaystyle S_{pp} \approx -mc \int dt \sqrt{c^2 -\mid \vec{v} \mid^2}, \ \ (37)

where we now Taylor expand in {v/c} to give

\displaystyle  S_{pp} \approx -mc^2 \int dt \ (1 - \frac{1}{2}\frac{\mid \vec{v} \mid^2}{c^2}). \ \ (38)

Observe that we now have a time integral of a term with classical kinetic structure minus a potential-like term (actually a total time derivative) that is an artefact of the relativistic rest energy

\displaystyle  S_{pp} \approx \int dt \ (\frac{1}{2}m\mid \vec{v} \mid^2 - mc^2). \ \ (39)

5. Deriving {S_{pp}^{\prime}} (eqn. 1.2.5)

The main problem with the action (18) and equivalently (26) is that, when we go to quantise this theory, the square root function in the integrand is non-linear. Analogously, we will find a similar issue upon constructing the first-principle string action, namely the Nambu-Goto action. Additionally, in our study of the bosonic string, we will be interested firstly in studying massless particles. But notice that, according to (26), the action for a massless particle would vanish identically.

What we want to do is rewrite {S_{pp}} in yet another equivalent form. To do this, we add an auxiliary field so that our new action takes the form

\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d \tau (\eta^{-1} \dot{X}^{\mu} \dot{X}_{\mu} - \eta m^2), \ \ (40)

where we define the tetrad {\eta (\tau) = (- \gamma_{\tau \tau} (\tau))^{\frac{1}{2}}} (not to be confused with the Minkowski metric {\eta_{\mu \nu}} ). The independent worldline metric {\gamma_{\tau \tau}(\tau)} that we’ve introduced as an additional field is, in a sense, a generalised Lagrange multiplier. For simplicity we can denote the tetrad by {e(\tau)} so that we get the action

\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d\tau (e^{-1} \dot{X}^{2} - em^{2}), \ \ (41)

where we have simplified the notation by setting {\dot{X}^{2} = \eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu}} and completely eliminated the square root. This is equivalent to what Polchinski writes in (eqn. 1.2.5). The structure of (41) may look familiar, as it reads like a worldline theory coupled to 1-dimensional gravity (worth checking and playing with).

To see that {S_{pp}^{\prime}} is classically equivalent (on-shell) to {S_{pp}} , we first consider its variation with respect to {e(\tau)}

\displaystyle  \delta S_{pp}^{\prime} = \frac{1}{2}\delta \int d\tau (e^{-1} \dot{X}^{2} - m^2 e)

\displaystyle  = \frac{1}{2} \int d\tau (\delta (\frac{1}{e})\dot{X}^{2} - m^{2} \delta e)

\displaystyle  = \frac{1}{2} \int d\tau (- \frac{1}{e^{2}}\dot{X}^{2} - m^{2}) \delta e, \ \ (42)

which results in the following field equation

\displaystyle  e^{2} = -\frac{\dot{X}^{2}}{m^{2}}

\displaystyle  \implies e = \sqrt{\frac{-\dot{X}^{2}}{m^{2}}}. \ \ (43)

This again aligns with Polchinski’s result (eqn. 1.2.7).

Proposition 2 If we substitute (43) back into (41), we recover the original {S_{pp}} action (26).


\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d\tau [(-\frac{\dot{X}^2}{m^{2}})^{-1/2} \dot{X}^{2} - m^{2}(-\frac{\dot{X}^{2}}{m^{2}})^{1/2}]

\displaystyle  = \frac{1}{2} \int d\tau [(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} - m^{2}(-\frac{\dot{X}^{2}}{m^{2}})^{1/2}]

\displaystyle  = \frac{1}{2} \int d\tau [(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} - m (- \dot{X}^{2})^{1/2}]. \ \ (44)

Recalling {\dot{X}^{2} = \eta_{\mu \nu} \dot{X}^{\mu}\dot{X}^{\nu}} , substitute for {\dot{X}^{2}} in the square root on the right-hand side

\displaystyle  = \frac{1}{2} \int d\tau [(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} - m (- \eta_{\mu \nu} \dot{X}^{\mu}\dot{X}^{\nu})^{1/2}]. \ \ (45)

For the first term we clean up with a bit of algebra, noting that {-\dot{X}^{2} > 0} for a timelike worldline. From complex variables recall {i^{2} = -1} .

\displaystyle  (-\frac{m^{2}}{\dot{X}^{2}})^{1/2} \dot{X}^{2} = -(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} (-\dot{X}^{2})

\displaystyle  = -(-\frac{m^{2}}{\dot{X}^{2}})^{1/2} i^{2} \dot{X}^{2}

\displaystyle = -(-\frac{m^{2}}{\dot{X}^{2}} i^{4} \dot{X}^{4})^{1/2}

\displaystyle  = -(-m^{2}i^{4}\dot{X}^{2})^{1/2} = -m (-i^{4}\dot{X}^{2})^{1/2}. \ \ (46)

As {i^{4} = 1} , it follows {-m(-i^{4}\dot{X}^{2})^{1/2} = -m (-\dot{X}^{2})^{1/2}} . Now, substitute for {\dot{X}^{2}} and we find {-m (-\eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu})^{1/2}} giving

\displaystyle  S_{pp}^{\prime} = \frac{1}{2} \int d\tau [-m(- \eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu})^{1/2} - m (- \eta_{\mu \nu} \dot{X}^{\mu}\dot{X}^{\nu})^{1/2}]

\displaystyle  = -m \int d\tau (- \eta_{\mu \nu}\dot{X}^{\mu}\dot{X}^{\nu})^{1/2} = S_{pp}. \ \ (47)


This ends the proof, demonstrating that {S_{pp}} and {S_{pp}^{\prime}} are classically equivalent.
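The classical equivalence can be spot-checked at the level of the integrands. The following Python sketch takes a sample value of {\dot{X}^{2} < 0} and a mass m (both arbitrary assumptions), evaluates the integrand of (41) at the on-shell value (43) of e, and compares it with the integrand of (26) at c = 1. The function names are illustrative.

```python
import math


def sqrt_lagrangian(xdot_sq, m):
    """Integrand of (26) with c = 1; xdot_sq = eta_{mu nu} Xdot^mu Xdot^nu < 0."""
    return -m * math.sqrt(-xdot_sq)


def auxiliary_lagrangian(e, xdot_sq, m):
    """Integrand of (41)."""
    return 0.5 * (xdot_sq / e - e * m**2)


def on_shell_e(xdot_sq, m):
    """Solution (43) of the e equation of motion."""
    return math.sqrt(-xdot_sq) / m


xdot_sq, m, h = -0.75, 2.0, 1e-6
e_star = on_shell_e(xdot_sq, m)

# The e-derivative of the auxiliary Lagrangian should vanish at e_star.
slope = (auxiliary_lagrangian(e_star + h, xdot_sq, m)
         - auxiliary_lagrangian(e_star - h, xdot_sq, m)) / (2.0 * h)

print("e on shell:            ", e_star)
print("dL'/de at e on shell:  ", slope)
print("L' on shell:           ", auxiliary_lagrangian(e_star, xdot_sq, m))
print("square-root Lagrangian:", sqrt_lagrangian(xdot_sq, m))
```

Note that `auxiliary_lagrangian` remains well defined at m = 0 for any non-zero e, whereas `sqrt_lagrangian` vanishes there, which is one motivation for preferring (41).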

It is also possible to show that, like with {S_{pp}} , the action {S_{pp}^{\prime}} is both Poincaré invariant and reparameterisation invariant.

6. Generalising to p-branes

As an aside, and to conclude this note, we can generalise the action for a point particle (0-brane) to an action for a p-brane. It follows that a p-brane in a {D \geq p + 1} dimensional background spacetime can be described in such a way that the action becomes

\displaystyle  S_{pb}= -T_p \int d\mu_p \ \ (48).

The term {T_p} is one that will become more familiar moving forward, especially when we begin to discuss the concept of string tension. However, in the above action it denotes the p-brane tension, which has units of mass/volume. The {d\mu_p} term is the {(p + 1)} -dimensional volume measure,

\displaystyle  d\mu_p = \sqrt{- \det G_{ab}} \ d^{p+1} \sigma, \ \ (49)

where {G_{ab}} is the induced metric, which, in the {p = 1} case, we will understand as the worldsheet metric. The induced metric is given by

\displaystyle  G_{ab} (X) = \frac{\partial X^{\mu}}{\partial \sigma^{a}} \frac{\partial X^{\nu}}{\partial \sigma^{b}} h_{\mu \nu}(X), \ \ \ a, b = 0, 1, ..., p. \ \ (50)

A few additional comments may follow. As {\sigma^{0} \equiv \tau} , spacelike coordinates in this theory run as {\sigma^{1}, \sigma^{2}, ... \sigma^{p}} for the surface traced out by the p-brane. Under {\tau} reparameterisation, the above action may also be shown to be invariant.
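To make (49) and (50) concrete, here is a small sketch that builds the induced metric from the tangent vectors {\partial X^{\mu} / \partial \sigma^{a}} and evaluates {\sqrt{- \det G_{ab}}} . It assumes a flat background {h_{\mu \nu} = \eta_{\mu \nu}} with mostly-plus signature; the helper names and the sample embeddings are illustrative choices, not anything fixed by the references.

```python
import math


def eta(mu, nu):
    """Flat background metric h_{mu nu} = diag(-1, +1, ..., +1) (assumed)."""
    if mu != nu:
        return 0.0
    return -1.0 if mu == 0 else 1.0


def induced_metric(tangents):
    """G_ab = d_a X^mu d_b X^nu h_{mu nu}, with tangents[a][mu] = dX^mu/dsigma^a."""
    n = len(tangents)
    dim = len(tangents[0])
    return [[sum(tangents[a][mu] * tangents[b][nu] * eta(mu, nu)
                 for mu in range(dim) for nu in range(dim))
             for b in range(n)] for a in range(n)]


def det(matrix):
    """Determinant by Laplace expansion (fine for the small matrices used here)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def volume_density(tangents):
    """sqrt(-det G_ab): the measure d mu_p per unit d^{p+1} sigma."""
    return math.sqrt(-det(induced_metric(tangents)))


# p = 0: a single tangent vector Xdot^mu, so the density is sqrt(-Xdot^2).
xdot = [3.0, 1.0, 0.5, -0.25]
particle_density = volume_density([xdot])

# p = 1: a static string stretched along x^1, X^mu = (tau, sigma, 0, 0).
string_density = volume_density([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])

print(particle_density, string_density)
```

For {p = 0} the density reduces to {\sqrt{-\dot{X}^{2}}} , so (48) takes the form of the point particle action with {T_0} playing the role of the mass. Rescaling {\tau} by a constant rescales the tangent vector and hence the density by the same factor, which is compensated by the change in {d\tau} .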

7. Summary

To summarise, one may recall how in classical (non-relativistic) theory the evolution of a system is described by its field equations. One can generalise many of the concepts of the classical non-relativistic theory of a point particle to the case of the relativistic point particle. Indeed, one will likely be familiar with how in the non-relativistic case the path of the particle may be characterised as a path through space. This path is then parameterised by time. On the other hand, in the case of the relativistic point particle, we have briefly reviewed how the path may instead be characterised by a worldline through spacetime. This worldline is parameterised not by time, but by the proper time. And, in relativity, we learn in very succinct terms how freely falling relativistic particles move along geodesics.

It should be understood that the equations of motion for the relativistic point particle are given by the geodesics on the spacetime. This means that one must remain cognisant that whichever path the particle takes also has many possibilities, as noted in an earlier section. That is, there are many possible worldlines between some beginning point and end point. This useful fact will be explicated more thoroughly later on, where, in the case of the string, we will discuss the requirement to sum over all possible worldsheets. Other lessons related to the point particle will also be extended to the string, and will help guide how we construct the elementary string action.


[Moh08] T. Mohaupt, Liverpool lectures on string theory [lecture notes].

[Pol07] J. Polchinski, An introduction to the bosonic string. Cambridge, Cambridge University Press. (2007).

[Wray11] K. Wray, An introduction to string theory [lecture notes].

[Zwie09] B. Zwiebach, A first course in string theory. Cambridge, Cambridge University Press. (2009).

(n-1)-thoughts, n=4: Covid, Twitter news, and Douglas Adams

Covid days

This is my first post in some weeks. Admittedly, I am one that can easily lose track of time as I get absorbed in one calculation or another. But that is not the reason for my lack of writing.

It was my turn to experience Covid for the first time. For the first 7-10 days, it hit me quite hard, relatively speaking, for a person who is vaccinated. The second week I mostly struggled with a persistent dry cough, fatigue, and weakness. I remember reading at the height of the pandemic people with Covid saying that it felt like ‘being hit by a bus’. I struggled to understand what this actually meant, because I imagine being hit by a bus to be a rather gruesome event. But, in the peak of my own Covid days, I think I finally realised the true meaning of the words.

Thankfully, I am feeling better now. Although super busy catching up with work and developing some new calculations for potential papers, I look forward to getting back to posting on my blog. When I don’t write for a while, it starts to impact my mental health. It’s just something that I greatly enjoy. It helps me process, sometimes even indirectly, and can even help stimulate new thoughts, much the same as reading a good book. There are some very nice physics papers that I would like to write about. One in particular offers what I thought was a rather astounding result. So I’ll probably start there, and then also continue uploading my old string notes based on Polchinski’s textbooks.

Twitter news

I prefer a world where Twitter is not so important. Don’t get me wrong, I enjoy Twitter. It has some fantastic communities. When I use the app, it’s primarily to check for fun maths posts, science news, history papers, or cool new archaeological finds. I also enjoy some of the technical F1 discussions, or the odd bit of literary discussion. When it works – that is, when my feed is adequately filtered so that I don’t have to skim through pages of minutiae, such as when people feel the urge to share every passing thought or post what they had for lunch – I find that Twitter can be an enjoyable and certainly also positive experience.

Granted, my experience is limited and intentionally curated: I tend to keep to small communities in which discussion is generally reasonable, where the flow of information is well-assessed and knowledgeable. Sometimes there is constructive debate, sometimes deep disagreement, and other times challenging questions are asked or interesting perspectives offered. I’ve found that Twitter can also provide opportunities to interact positively with others in ways that may have not been otherwise possible. For instance, Steve Brusatte, a very well-known palaeontologist, wrote to me to say that he hopes I enjoy his book. (The book was fantastic, by the way). I thought that was awesome. I also enjoy interactions with other scientists, reading about what they are working on; and, during the pandemic, it allowed me to follow some leading virologists and epidemiologists – including some at my university – to track the latest studies and policy discussions.

Twitter may not be worth $44bn for these reasons – I imagine instead it is because of its power to decide elections that makes it so valuable to the right people – but, at least for me, it is within such a limited context that I’ve found it useful and at times a valuable information feed.

But I also know that Twitter can be a cesspool. I am keenly aware that it faces many problems and challenges. The way in which the social media platform is structured seems to often nullify the very goal it was supposed to realise, assuming a priori the goal was social and democratic in the first place. In much of the literature, introductions to social media ecologies often speak praisingly about the promise of social media without addressing critically its many obvious social and political complexities. Managing the preservation of free speech, combating hate speech, ensuring inclusivity, and encouraging greater representation (to name a few) are all hot-topic issues. As an extension of the social world, the platform is also plagued by toxicity, the constant drone of stupidity, identity politics, inconsequential opinionizing, and petty bickering.

(The last I saw, only 15-20% of the UK population use Twitter, and it is reasonable to think that the number of people that actively post and engage on the platform is much smaller. I think it is similar in the US. So, perhaps one explanation is that, at least in general, it is much more an abode for extremists and activists willing to battle over their ideological worldview than an honest reflection of the average citizen. I’m not entirely sure).

One of the fundamental problems with Twitter, and really social media in general, is its lack of accountability. There is a fine line between banning users without reasonable explanation and justification – or banning users because of political speech that may not be deemed agreeable according to whatever metric – and banning users whose speech incites violence or hate. I’m sure most readers of this blog would agree that speech that incites violence is unacceptable. But what about speech that – whether in bad faith, out of ignorance, or otherwise – spreads misinformation? We live in what is generally a post-fact, post-truth society; communicative reason, or its absence, within this context is where misinformation for one group is what another group believes. What people think is true is much more important and meaningful than what is actually true. Hence, the polarisation of political viewpoints and attitudes – the underlying trend that gives authority to ideology over objective methodology – is legitimated in so many different ways.

As Twitter continues to grow increasingly powerful, the lack of accountability becomes increasingly magnified. I am not just speaking of bans due to political speech, which I think is quite troubling (even when I might not agree with that speech); the subtleties between information and misinformation; issues with corporate agents and social media influencers, who are invested in propagating a certain message or viewpoint to sell their products; or the way in which powerful people with investment in certain political and economic outcomes may use the platform to shape important historical events (however honestly or dishonestly). I think what is perhaps most troubling is that Twitter, as a social ecosystem in itself, has become an echo chamber for groups that, instead of promoting healthy engagement, strengthens the legitimisation of communication driven toward confirmation bias and cognitive prejudice.

Social bias and prejudice, as rampant as it is actively enabled, generally forms a massive part of what shapes people’s opinions and political viewpoints today. Perhaps it has always been this way. But in the digital world, paradoxically, the manifestation of a type of behaviour that seeks only information that reinforces opinion – certainly an artifact of a deeply human impulse – seems to easily become extremised. One manifestation of this is of course the ‘click bait’ phenomenon. ‘The headline confirms what I think’, and then move on. But even in the best case toy example in which two rational actors may be discussing a topic in a mutually encircling way, social media platforms like Twitter function on the basis of the reduction of information to mere snippets and at the cost of substantive analysis. In my opinion, it is not the abundance of thought that is the problem in our historical present; it is the lack of slow, substantive and meaningful thought that ails us. Twitter is the perfect example.

We can extend the discussion down more philosophical paths and also consider the way in which this has impacted the role that “facts” and “truth” (and their absence) play in a society increasingly lacking in its support of rational faculty. If people weren’t so easily influenced by everything they read, then maybe the problem of social media and its simplistic information streams (i.e., read a meme and decide it’s representative of one’s worldview) would be mitigated to some degree. There is a reason why current and past US Presidents, and likewise why current and past British Prime Ministers, speak at the level of GCSE English (or lower) when delivering public speeches. But, given the emerging patterns of simplistic information streams and what drives this emergence (which is certainly a complicated array of forces, including increasing public demands for transparency), the prevailing trend is one of reductionist and equally ideologically-form-fitting narratives (typically confirmation bias consumed with greatest ease and with least friction against one’s established worldview).

This brings me to the news this week that Elon Musk has bought Twitter. Inasmuch as the news sparked anger and consternation on one side, there is another group that has celebrated the announcement, entrusting Musk with the responsibility to manage accountability. He is, after all, a self-proclaimed ‘free speech absolutist’ (for consistency’s sake, let’s ignore the obvious contradictions in behaviour). It’s like any of the buzzwords that characterise the strange identity wars going on. The concepts are largely dumbed down on either side so that everyone has an opinion, made accessible in such a way that the framing of issues requires no expertise or well-defined knowledge. Going back to Musk, in my opinion, it is a fool’s game to be absolutist in anything; but the sentiment obviously sticks in a time when freedom of speech is perceived by many to be under threat. And yet, as Twitter has taught me, hyperbole about one’s principles can easily become an idiot’s fable.

I have a hard time believing that, under Musk’s ownership, anything fundamental with Twitter will change. If he indeed perceives Twitter as the online town hall where people can debate important issues, then I suppose what’s left is to lament the loss of the principle of debate; because Twitter is certainly not the venue. It’s like, with so much energised attention on freedom of speech within social media ecologies, where a person may without thought say anything, we are quick to preserve the sacred right that a human being is completely entitled to an opinion without evidence and then just as fast forget that as soon as such an opinion is expressed as fact it becomes a lie. Indeed, in my most pessimistic moments I would say that social media, in helping manifest the much deeper separation of reason from social discourse, has eagerly retained the idea of free speech at the loss of the rationality that makes it meaningful in the first place.

A quote by Douglas Adams

The major problem—one of the major problems, for there are several—one of the many major problems with governing people is that of whom you get to do it; or rather of who manages to get people to let them do it to them.

To summarize: it is a well-known fact that those people who must want to rule people are, ipso facto, those least suited to do it.

To summarize the summary: anyone who is capable of getting themselves made President should on no account be allowed to do the job.

– Douglas Adams

Learning M-theory: Gauge theory of membranes, brane intersections, and the self-dual string

I’ve been learning a lot about M-theory. It’s such a broad topic that, when people ask me ‘what is M-theory?’, I continue to struggle to know where to start. Right now, much of my learning is textbook and I have more questions than answers. I naturally take the approach of first wanting as broad and general of a picture as possible. In some sense, it is like starting with the general and working toward the particular. Or, in another way, it’s like when being introduced to a new landscape and wanting, at the outset, a broad orientation to its general geographical features, except in this case we are speaking in conceptual and quantitative terms. I may not ever be smart enough to grasp M-theory in its entirety, but what is certain is that I am working my hardest.

In surveying its geographical features and charting my own map, if I may continue the analogy, obtaining a better sense of the fundamental objects of M-theory is a particular task; but my main research interest has increasingly narrowed to the study and application of gauge theory and higher gauge theory. This can be sliced down further in that I am very interested in the relationship between string and gauge theory, and furthermore in studying the higher dimensional generalisation of gauge theory. This interest naturally follows from the importance of gauge theory in contemporary physics, and then how we may understand it from the generalisation of point particle theory to string theory and then to other higher dimensional extended objects (i.e., branes). We’ve talked a bit in the past about how the dynamics on the D-brane worldvolume is described by a gauge theory. We’ve also touched on categorical descriptions, and how in p-brane language when we study the quantum theory the resemblance of the photon can be seen as a p-dimensional version of the electromagnetic field (by the way, we’re going to start talking about p-branes in my next string note). That is to say, we obtain a p-dimensional analogue of Maxwell’s equations. More advanced perspectives from the gauge theory view, or in this case higher gauge theory view in M-theory, illuminate the existence of new objects like self-dual strings.

There is so much here to write about and explore, and I look forward to sharing more as I progress through my own studies and thinking. In this post, though, I want to share some notebook reflections on things I’ve been learning more generally in the context of M-theory: some stuff about membranes, 11-dimensional supergravity, and the self-dual string. This post is not very technical; it’s just me thinking out loud.

11-dimensional supergravity

The field content of 11-dimensional supergravity consists of the metric g_{\mu \nu}  , with 44 degrees of freedom; a rank 3 anti-symmetric tensor field C_{\mu \nu \rho}  , with 84 degrees of freedom; and these are paired off with a 32 component Majorana gravitino \Psi_{\alpha \mu}  , with 128 degrees of freedom. Although much has progressed since it was originally conceived, the Lagrangian for the bosonic sector is much as it was originally written [3]

S_{SUGRA} = \frac{1}{2k_{11}^2} \int_{M_{11}} \sqrt{g} \ (R - \frac{1}{48}F^{2}_{4}) - \frac{1}{6} F_{4} \wedge F_{4} \wedge C_3. \ \ (1)

The field strength is F_4 = dC_3  and k_{11}  is the 11-dimensional coupling constant. The norm of an n-form field strength is defined conventionally,

\mid F_n \mid^2 = \frac{1}{n !} G^{M_1 N_1} G^{M_2 N_2} ... G^{M_n N_n}F_{M_{1}M_{2} ... M_{n}}F_{N_1 N_2 ... N_n}. \ \ (2)

The 11-dimensional metric is built from the frame field as G_{MN} = \eta_{AB}E^{A}_{M}E^{B}_{N}  , where E^{B}_{N}  are the elfbeins, M,N  are indices for curved base-space vectors, and A,B  are indices for tangent space vectors. The last term in (1) is the Chern-Simons term. This is a topological term, independent of the metric. We see this structure in a lot of different contexts.
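The degree-of-freedom counts quoted for the field content can be reproduced with standard little-group counting. The following is a minimal sketch, assuming the usual on-shell rules for massless fields in D dimensions (representations of SO(D-2)); the function names are my own.

```python
from math import comb


def graviton_dof(d):
    """Symmetric traceless tensor of the little group SO(d-2)."""
    n = d - 2
    return n * (n + 1) // 2 - 1


def form_dof(d, rank):
    """Antisymmetric tensor of the given rank under SO(d-2)."""
    return comb(d - 2, rank)


def gravitino_dof(d, spinor_components):
    """Gamma-traceless vector-spinor: (d-3) times half the spinor components."""
    return (d - 3) * (spinor_components // 2)


D = 11
metric = graviton_dof(D)
three_form = form_dof(D, 3)
gravitino = gravitino_dof(D, 32)
print(metric, three_form, gravitino)  # 44 84 128
```

The bosonic total 44 + 84 = 128 matches the fermionic count, as supersymmetry requires.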

Although, from what I presently understand, the total degrees of freedom of M-theory are not yet completely nailed down, we can of course begin to trace a picture in parameter space. As we’ve discussed before on this blog, it can be seen how 10-dimensional type IIA theory in the strong coupling regime behaves as an 11-dimensional theory whose low-energy limit is captured by 11-dimensional supergravity. Conversely, if we compactify 11-dimensional supergravity on a circle of fixed radius in the x^{10} = z  direction, then from the 11-dimensional metric we obtain the 10-dimensional metric, a vector field and the dilaton. The 3-form potential leads to both a 3-form and a 2-form in 10 dimensions. The mysterious 11-dimensional theory can also be seen to give a further clue to its parental status in that supergravity compactified on the unit interval {\mathbb{I} = [0,1]}  , for example, leads to the low-energy limit of E_8 \times E_8  heterotic theory.
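The way the 11-dimensional fields split under the circle reduction can be checked by counting on-shell degrees of freedom on both sides. This is only a bookkeeping sketch under the same little-group counting assumptions as before (massless fields, SO(D-2) representations), with hypothetical helper names.

```python
from math import comb


def graviton_dof(d):
    """Symmetric traceless tensor of SO(d-2)."""
    n = d - 2
    return n * (n + 1) // 2 - 1


def form_dof(d, rank):
    """Antisymmetric tensor of the given rank under SO(d-2)."""
    return comb(d - 2, rank)


# 11D metric -> 10D metric + vector + dilaton: 35 + 8 + 1
metric_11 = graviton_dof(11)
metric_split = graviton_dof(10) + form_dof(10, 1) + 1

# 11D 3-form -> 10D 3-form + 2-form: 56 + 28
c3_11 = form_dof(11, 3)
c3_split = form_dof(10, 3) + form_dof(10, 2)

print(metric_11, metric_split)  # 44 44
print(c3_11, c3_split)          # 84 84
```

Nothing is lost or gained in the reduction: the 10-dimensional fields simply repackage the 44 and 84 components.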

Non-renormalisability of 11-dimensional SUGRA

One thing that I’ve known about for some time but I have not yet studied in significant detail concerns precisely how 11-dimensional supergravity is non-renormalisable [4,5,6]. Looking at the maths, what I understand is that above two-loops the graviton-graviton scattering is divergent. Moreover, as I still have some questions about this, what I find curious is that in the derivative expansion in 11-dimensional flat spacetime (using a 1PI/quantum effective Lagrangian approach) the generating functional for the graviton S-matrix is non-local. But due to supersymmetry, low order terms in the derivative expansion can be separated into local terms, such as t_8 t_8 R^4  , and non-local (or global) terms that correspond to loop amplitudes. But what happens is that, at 2-loops, a logarithmic divergence that is cut off at the Planck scale mixes with a local term of the schematic form D^{12}R^4  , where R^4  is the supersymmetrised vertex. In the literature, one will find a lot of discussion about this R^4  vertex. But like I said, I really need more time looking at this.

In short, the important mechanism in string theory that allows us to avoid UV divergences is absent, or appears absent, in maximal supergravity. What could the UV regulator be? As in any supergravity, from what I understand, it is not clear that a Lagrangian description is sufficient at the Planck scale.

Membranes, D-branes, and AdS/CFT

The facts of 11-dimensional supergravity and how it relates to 10-dimensional string theory are textbook and well-known. Going beyond dualities relating different string theories, an obvious question concerns what M-theory actually constitutes. One thing that is known is that M-theory reduces to 11-dimensional SUGRA at low-energies, as we touched on, and it is known that fundamental degrees of freedom are 2-dimensional and 5-dimensional objects, known as M2-branes and M5-branes. Study of these non-perturbative states offers several intriguing hints. There are also solutions to classical supergravity known as F1 – the fundamental string – and its magnetic dual, the NS5-brane. As it relates to the story of the five string theories, the M-branes realize all D-branes, and this is why D-branes are considered consistent objects in quantum gravity.

The way that M-theory sees D-branes is via the net of dualities. All of the D-branes and the NS5 brane are solutions to type II theories, both A and B. So, when you reduce M-theory on a circle, in that you get back to Type IIA, the M2-branes and M5-branes reduce to the various D-branes such that under S-duality from the D5-brane you get the NS5.

The worldvolume theory of the M5-brane is always strongly coupled, which can be seen in moduli space (its parameters are simply a point). So there is no Lagrangian for this theory, and it suggests something deep is needed or is missing. It is expected that its worldvolume theory will be a 6-dimensional superconformal field theory, typically known as the 6d(2,0) theory. The worldvolume theory for M2-branes (on an orbifold) has been found to be a 3-dimensional superconformal Chern-Simons theory with classical \mathcal{N} = 6 supersymmetry.

If one considers a single M5-brane, a theory can be formulated in terms of an Abelian (2,0)-tensor multiplet, consisting of a self-dual 2-form gauge field, 5 scalars, and 8 fermions, but it is not known how to generalise the construction to describe multiple M5-branes. To give an example, using AdS/CFT [7] it is described how the worldvolume theory for a stack of N  M5-branes is dual to M-theory on AdS_7 \times S^4  with N  units of flux through the 4-sphere, which reduces to 11-dimensional SUGRA on this background in the large N  limit.

Brane intersections and stacks

The existence of branes is one of the most fascinating things about quantum gravity. There is a lot to unpack when learning about D1-branes, D3-branes, D5-branes, M2-branes, and M5-branes, as well as how they may intersect and what sort of consistent solutions have already been found [8,9, 10, 11, 12].

For example, an M2-brane, or a stack of coincident M2-branes, can end on an M5-brane. This is similar to the simpler story of how D-branes, or coincident D-branes, can intersect in string theory. Typically, D1-D3 systems in Type IIB string theory are studied because this system relates to the M2-M5 system by dimensional reduction and T-duality.

Self-dual string

For a membrane to end on an M5-brane, the membrane boundary must carry the charge of the self-dual field B on the five-brane worldvolume. There are different solutions to the field equations of B. For instance, a BPS solution was found [10] by looking at the supersymmetry transformation.

The linearised supersymmetry equation is

\delta_{\epsilon} \Omega^{j}_{\beta} = \epsilon^{\alpha i}(\frac{1}{2} (\gamma^{a})_{\alpha \beta}(\gamma_{b^{\prime}})^{j}_{i}\partial_a X^{b^{\prime}} - \frac{1}{6}(\gamma^{abc})_{\alpha \beta}\delta^{j}_i h_{abc}) = 0. \ \ (3)

Here b^{\prime}  labels transverse scalars, a labels worldvolume directions, \alpha, \beta  denote spinor indices of Spin(1,5), and i,j are spinor indices of USp(4)  . The solution balances the contribution of the 3-form field strength h with a contribution from the scalars. Additionally, the worldvolume of the string soliton can be taken to be in the 0,1 directions with all fields independent of x^0  and x^1  . An illustration of the solution is given below, showing an M2-brane ending on an M5-brane with a cross section S^3 \times \mathbb{R}  .

M2-branes ending on an M5-brane. The endpoint is a string. Courtesy of N. Copland, Aspects of M-Theory Brane Interactions and String Theory Symmetries [14].

As I am still trying to understand the calculation, I am currently looking at the following string solution

H_{01m} = \pm \frac{1}{4} \partial_m \phi,

H_{mnp} = \pm \frac{1}{4} \epsilon_{mnpq}\delta^{qr}\partial_r \phi,

\phi = \phi_0 + \frac{2Q}{\mid x - x_0 \mid^2}, \ \ (4)

where \phi  may be replaced by a more general superposition of solutions. We denote \pm Q as the magnetic and electric charge. There is a conformal factor in the full equations of motion which guarantees that they are satisfied even at x = x_0  , which means the solution is solitonic. This string soliton is said to possess its own anomalies that require cancellation (I assume Weyl, Lorentz). What is neat is that this string can be dimensionally reduced to get various T-duality configurations, which is something that would be fun to look into at some point down the road.
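One property of (4) that is easy to check for oneself is that a function of the form \phi_0 + 2Q / \mid x - x_0 \mid^2  is harmonic away from x = x_0  in four flat dimensions, which is what one would expect if the indices m, n, p, q run over the four worldvolume directions transverse to the string. The sketch below is only a finite-difference check under that assumption; the values of \phi_0  and Q, the sample point and the step size are arbitrary choices of mine.

```python
def phi(x, x0=(0.0, 0.0, 0.0, 0.0), phi0=1.0, q=0.5):
    """phi = phi0 + 2Q / |x - x0|^2; only the first len(x) entries of x0 are used."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, x0))
    return phi0 + 2.0 * q / r2


def laplacian(f, x, h=1e-3):
    """Central finite-difference Laplacian in flat Euclidean space."""
    total = 0.0
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        total += (f(up) - 2.0 * f(x) + f(down)) / h ** 2
    return total


point = [0.7, -0.4, 1.1, 0.3]          # any point away from x0
lap_4d = laplacian(phi, point)         # four transverse directions
lap_3d = laplacian(phi, point[:3])     # same profile in three dimensions

print(lap_4d)  # numerically close to zero
print(lap_3d)  # clearly non-zero
```

The comparison with three dimensions is there to show that the 1/r^2  profile is special to four transverse directions; in other dimensions the same function is not harmonic.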


[1] D. Fiorenza, H. Sati, and U. Schreiber, The rational higher structure of M-theory. Fortschritte der Physik, 67(8-9):1910017, May 2019. [arXiv:1903.02834 [hep-th]].

[2] E. Witten, String theory dynamics in various dimensions. Nuclear Physics B, 443(1):85 – 126, 1995.

[3] E. Cremmer, B. Julia, and J. Scherk, Supergravity Theory in 11-dimensions. Phys. Lett. B76, No. 4, (409-412) 19 June 1978.

[4] S. Chester, S. Pufu, and X. Yin, The M-Theory S-Matrix from ABJM: Beyond 11D supergravity. (2019). [arXiv:1804.00949v3 [hep-th]].

[5] A. Tseytlin, R4 terms in 11 dimensions and conformal anomaly of (2,0) theory. (2005). [arXiv:hep-th/0005072v4 [hep-th]].

[6] G. Russo, and A. Tseytlin, One-loop four-graviton amplitude in eleven-dimensional supergravity. (1997). [arXiv:hep-th/9707134v3 [hep-th]].

[7] P. Heslop, and A. Lipstein, M-theory Beyond The Supergravity Approximation. (2017). [arXiv:1712.08570 [hep-th]].

[8] P.K. Townsend, D-branes from M-branes. (1995). [arXiv:hep-th/9512062 [hep-th]].

[9] A. Strominger, Open p-branes. Phys. Lett. B 383 (1996) 44. [arXiv:hep-th/9512059 [hep-th]].

[10] P.S. Howe, N.D. Lambert, and P.C. West, The self-dual string soliton. Nucl. Phys. B 515 (1998) 203. [arXiv:hep-th/9709014 [hep-th]].

[11] M. Perry and J.H. Schwarz, Interacting chiral gauge fields in six dimensions and Born-Infeld theory. Nucl. Phys. B 489 (1997) 47. [arXiv:hep-th/9611065 [hep-th]].

[12] D.S. Berman, Aspects of M-5 brane world volume dynamics. Phys. Lett. B 572 (2003) 101. [arXiv:hep-th/0307040 [hep-th]].

[13] J. Huerta, H. Sati, and U. Schreiber, Real ADE-equivariant (co)homotopy and Super M-branes. (2018). [arXiv:1805.05987 [hep-th]].

[14] N. Copland, Aspects of M-Theory Brane Interactions and String Theory Symmetries.

[15] S. Palmer, Higher gauge theory and M-theory.

Notes on string theory #1: The non-relativistic string

The study of waves has far reaching applications throughout physics (and across the sciences). Fundamental physics is no exception. Essential for understanding waves is understanding oscillation, and as one will recall from classical mechanics a concept of fundamental importance in this regard is the harmonic oscillator. A simple harmonic oscillator can then also be generalised to quantum mechanical systems. Of course, coupled oscillators is another important generalisation (i.e., oscillators interacting with each other). Think, for example, of atoms in a crystal and the eventual discovery of the Debye theory of heat capacity.

The importance of understanding oscillation is true for the study of many other phenomena in nature: for sound waves, water waves, and even gravitational waves. In sound waves air molecules oscillate in the longitudinal direction (i.e., the coordinate direction the sound is travelling). In gravitational waves, the sort of waves predicted by General Relativity and other theories of gravity, we may think of oscillations in the very fabric of spacetime, similar to how electromagnetic waves are oscillations of the electromagnetic field that propagate through spacetime. In string theory, we will similarly study waves and their oscillations (for instance, we will find a general solution for the fields describing the centre of mass motion and oscillations of the string as it propagates through spacetime), and from this analysis we will discover some incredibly exciting results.

But in this note we begin with a much simpler story: the study of transverse waves travelling along a classical, non-relativistic string. The value in such a review is, in my opinion, to refresh the connection with classical concepts in what is a lengthy journey of generalisation toward higher concepts. Or, at least this is how I like to approach and motivate string theory. (The same emphasis will become apparent in the next note on the relativistic point particle and p-branes). It will also help lay some groundwork intuition before studying the first-principle action of the bosonic string: namely the Nambu-Goto action, which of course will be relativistic in nature. Much of what is reviewed below will be generalised in the relativistic setting. What is nice is how, upon constructing the Nambu-Goto action for the relativistic string (as studied in the first pages of Polchinski’s textbook), we will show that in the non-relativistic limit we can recover (up to a total derivative) the action for the ordinary, classical vibrating string discussed in this note.

As this is a lightning review of a small selection of topics from classical and wave mechanics, for further reading see Taylor [1] – or, in the context of an introduction to string theory, see Chapter 4 in Zwiebach [2]. David Morin’s lecture notes as well as these course notes by Matt Jarvis are particularly good, to name a few.

A stretched string with transverse oscillations

Consider a string stretched with fixed endpoints. For simplicity, we may use the (x, y)  plane and pin the string endpoints at (0, 0)  and (a, 0)  such that the string is stretched along the x-axis. The direction along the string is longitudinal (i.e., the x-coordinate direction), while the direction orthogonal to the string is transverse (i.e., the y-coordinate direction). When describing a transverse oscillation, the x-coordinate of any point on the string will not vary in time. Rather, the transverse displacements at any point will be described by the y-coordinate.

Two pieces of information are required in order to describe the classical mechanics of the homogeneous string: tension and mass per unit length.

Tension has units of force. This means that we can first write the tension in terms of [Energy / Length]. But energy has units of mass times velocity squared, so we can write the following equation

[T_0] = [\text{Force}] = [\text{Energy / Length}] = \frac{M}{L}[v^2]. \ \ (1)

Denote the Mass / Length as \mu_0  such that (1) may be updated as T_0 \sim \mu_0 v^2  , which suggests the natural velocity v = \sqrt{T_0 / \mu_0}  . The tension T_0  and the mass per unit length \mu_0  are dynamical parameters. The velocity will eventually prove to be the velocity of transverse waves.
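To get a feel for the numbers, here is a tiny sketch that evaluates the natural velocity for a made-up laboratory string. The tension and mass density are hypothetical values chosen only for illustration, in SI units.

```python
import math


def wave_speed(tension, mass_per_length):
    """Natural velocity v = sqrt(T0 / mu0); metres per second for SI inputs."""
    return math.sqrt(tension / mass_per_length)


# Hypothetical values: 100 N of tension and 10 g per metre of string.
T0 = 100.0    # N = kg m / s^2
mu0 = 0.010   # kg / m
v = wave_speed(T0, mu0)
print(v)  # roughly 100 m/s
```

Quadrupling the tension at fixed mass density doubles the velocity, as the square root in the formula implies.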

As a continuation of the above reasoning, consider the next simplifying assumptions. Assume that the string is infinitesimally thin and completely flexible. Since we have not yet considered boundary conditions, let us change our previous assumption and now consider that this stretched homogeneous string extends infinitely in both directions. Next, consider exciting the string such that, for two nearby points separated by a displacement dx in the longitudinal direction, there is a small transverse displacement dy.

Figure 1.

It is worth noting that, in the non-relativistic case, if the string is stretched an infinitesimal amount dx, its tension will remain approximately constant. Whatever change in energy occurs will be equal to the work done T_0 dx. For the total mass, there will be no change. But in the relativistic case, any increase in energy will of course correspond to a larger rest mass. Furthermore, equation (1) suggests that for a relativistic string T_0 and \mu_0 may be related by T_0 = \mu_0 c^2, where c is the canonical velocity in relativity. When we eventually study the relativistic string in detail, we find this relation to indeed be correct.

For a small dy (which is to say, if the slope dy / dx is small), then we can approximate that all points in the string move only in the transverse direction. This means we consider there to be no longitudinal motion. To make sense of this statement, consider a point on the string in transverse displacement.

Figure 2.

Notice that the length of the hypotenuse equals

\sqrt{dx^2 + dy^2} = dx \sqrt{1 + (\frac{dy}{dx})^2}

\approx dx(1 + \frac{1}{2}(\frac{dy}{dx})^2) = dx + dy\frac{1}{2}(\frac{dy}{dx}). \ \ (2)

The hypotenuse is longer than the long side dx by dy(dy/dx)/2. This excess length, which bounds how far a given point can move longitudinally, is only (dy/dx)/2 times as large as the transverse displacement dy. With the assumption that dy/dx is small, which is to say \mid dy/dx \mid \ll 1, the longitudinal motion can therefore be neglected. In fact, all points along this segment of string move only in the transverse direction. Hence, each point is considered to be labelled with a unique value of x. Again, given that the string will stretch only slightly, we can safely assume that the amount of mass in any given horizontal span stays essentially constant.
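To get a feel for how good approximation (2) is, the following Python sketch compares the exact excess length of the hypotenuse with dy(dy/dx)/2 for a few slopes. The function names and the sample slopes are assumptions made for illustration.

```python
import math


def stretch_exact(dx, dy):
    """Exact excess of the hypotenuse over the long side dx."""
    return math.hypot(dx, dy) - dx


def stretch_approx(dx, dy):
    """Leading-order excess dy * (dy/dx) / 2 from equation (2)."""
    return 0.5 * dy * (dy / dx)


dx = 1.0
for slope in (0.1, 0.01, 0.001):  # assumed sample slopes
    dy = slope * dx
    # The last column is the ratio of the excess length to dy, i.e. (dy/dx)/2.
    print(slope, stretch_exact(dx, dy), stretch_approx(dx, dy), stretch_approx(dx, dy) / dy)
```

The two estimates agree ever more closely as the slope shrinks, and the excess length stays a small fraction of the transverse displacement.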

Equations of motion

We want to calculate the equations of motion. The strategy is to invoke Newton’s law and write down the transverse F = ma  equation for the little piece of string in the span from x to x + dx (ignoring gravity for simplicity). To do this, we must first consider the forces acting on the string.

Figure 3.

Let T_1  and T_2  be the tensions in the string at the end points. We define the angle \theta_1  at x and \theta_2  at x + dx. As shown in Figure 3, this infinitesimal segment of string is subject to opposing tension forces at its two ends. These tension forces act along the local tangent line to the string. As the working assumption is that the string displacement remains sufficiently small such that the tension does not vary in magnitude along the string, we may suppose the local tangent line to the string subtends angles \theta_1  and \theta_2  with the x-axis. Note that these angles are written as infinitesimal quantities because the string displacement is assumed to be infinitesimally small, which implies that the string is everywhere almost parallel with the x-axis (i.e., the string displacement is greatly exaggerated in Figure 3 for purpose of illustration). Precisely because the slope dy/dx is small, it is essentially equal to the \theta  angles.

The slope of the string is different at the points x and x + dx. A consequence of this difference in slope (however small) is that the tension changes direction, so there is a net force F_{net}  .

Consider the net longitudinal or x-component of the tension force. We don’t need calculus here, just a simple reading of the force diagram shows

F_{x} = T_2 \cos \theta_2 - T_1 \cos \theta_1. \ \ (3)

Again, assuming the change in the angle is small at x and x + dx, then F_x \approx 0  . Another way to look at this is to use the small angle approximation \cos \theta \approx 1 - \theta^2 / 2  that tells us that the longitudinal components of the tensions are equal to the tensions themselves (up to small corrections of order \theta^2 \approx (dy / dx)^2  ). And because there is no longitudinal motion (or because it is so small we can neglect it), the acceleration component in the longitudinal direction is zero and, as a consequence of this reasoning, the longitudinal forces must cancel. Therefore, we find T_1 = T_2  .

Analysis of the transverse or y-components is obviously very different. The transverse components differ by a quantity that is first order in dy/dx. This difference cannot simply be neglected, because it causes the transverse acceleration.

For the y-component of the tension force, first recognise that the transverse force at x + dx is T \sin \theta_2, which is approximately equal to T times the slope. So the upward force can be written as Ty^{\prime}(x + dx). The downward force at x is then simply -Ty^{\prime}(x). After a bit of calculus the net transverse force is shown to be

F_{y} = T(y^{\prime}(x + dx) - y^{\prime}(x)) = T dx \frac{y^{\prime}(x + dx) - y^{\prime}(x)}{dx} \approx T dx \frac{d^2 y(x)}{dx^2}, \ \ (4)

where the definition of the derivative is used to obtain the equivalence on the right-hand side. Indeed, the difference in the first derivatives yields the second derivative.

The mass dm of this piece of string, originally stretched from x to x + dx, is given by the mass density \mu_0 times dx. By Newton’s law, the net vertical force equals mass times vertical acceleration. So we can simply write

F_y = ma \implies T \frac{d^2y(x)}{dx^2} dx = (\mu_0 dx)\frac{d^2y(x)}{dt^2}. \ \ (5)

Cancel dx on both sides and rearrange terms to give

\frac{d^2y(x)}{dx^2} - \frac{\mu_0}{T}\frac{d^2 y(x)}{dt^2} = 0. \ \ (6)

Since y is a function of x and t, we can explicitly include this dependence and write y as y(x, t). Then the standard derivatives become partial derivatives and we arrive at the expected wave equation.

\frac{\partial^2 y(x,t)}{\partial x^2} - \frac{\mu_0}{T}\frac{\partial^2 y(x,t)}{\partial t^2} = 0. \ \ (7)

Now, compare (7) with the standard wave equation below

\frac{\partial^2 y}{\partial x^2} - \frac{1}{c^{2}} \frac{\partial^2 y}{\partial t^2} = 0, \ \ (8)

with c the parameter for the velocity of the waves. In the case for transverse waves on the classical stretched string, we find the velocity of the waves is

c = \sqrt{T_0 / \mu_0}. \ \ (9)

Physically, we see that the lighter the string or the higher the tension, the faster the wave moves. This makes a lot of sense.

General solution

The wave equation provides a general equation for the propagation of waves, linking the displacements of a wave in the y-coordinate direction with the time and also the displacement along the perpendicular x-axis. We therefore require solutions that link together x and t dependences.

One approach to deriving such a solution follows d’Alembert. In this approach, the displacement in y is defined as a function of two new variables, u and v, such that

u = x -ct, \ \ v = x + ct. \ \ (10)

To relate these variables to the wave equation, we first differentiate y with respect to x and t. Using the chain rule,

\frac{\partial y}{\partial x} = \frac{\partial y}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial y}{\partial v}\frac{\partial v}{\partial x} = \frac{\partial y}{\partial u} + \frac{\partial y}{\partial v},

\frac{\partial y}{\partial t} = \frac{\partial y}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial y}{\partial v}\frac{\partial v}{\partial t} = -c \frac{\partial y}{\partial u} + c \frac{\partial y}{\partial v}. \ \ (11)

For the second derivatives we obtain,

\frac{\partial^2 y}{\partial x^2} = \frac{\partial}{\partial x}(\frac{\partial y}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial y}{\partial v}\frac{\partial v}{\partial x})

= (\frac{\partial u}{\partial x}\frac{\partial}{\partial u} + \frac{\partial v}{\partial x}\frac{\partial}{\partial v}) \frac{\partial y}{\partial x},


\frac{\partial^2 y}{\partial t^2} = \frac{\partial}{\partial t}(\frac{\partial y}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial y}{\partial v}\frac{\partial v}{\partial t})

= (\frac{\partial u}{\partial t}\frac{\partial}{\partial u} + \frac{\partial v}{\partial t}\frac{\partial}{\partial v}) \frac{\partial y}{\partial t}. \ \ (12)

Using the equation for the respective first derivative in (11) we find,

\frac{\partial^2 y}{\partial x^2} = (\frac{\partial u}{\partial x}\frac{\partial}{\partial u} + \frac{\partial v}{\partial x}\frac{\partial}{\partial v})(\frac{\partial y}{\partial u} + \frac{\partial y}{\partial v}), \ \ (13)


\frac{\partial^2 y}{\partial t^2} = (\frac{\partial u}{\partial t}\frac{\partial}{\partial u} + \frac{\partial v}{\partial t}\frac{\partial}{\partial v})(-c (\frac{\partial y}{\partial u} -  \frac{\partial y}{\partial v})). \ \ (14)

Rearranging terms and substituting the fact from (10)

\frac{\partial u}{\partial x} = \frac{\partial v}{\partial x} = 1, \ \ -\frac{\partial u}{\partial t} = \frac{\partial v}{\partial t} = c, \ \ (15)

we find

\frac{\partial^2 y}{\partial x^2} = \frac{\partial^2 y}{\partial u^2} + 2 \frac{\partial^2 y}{\partial u \partial v} + \frac{\partial^2 y}{\partial v^2}


\frac{\partial^2 y}{\partial t^2} = c^2 (\frac{\partial^2 y}{\partial u^2} - 2 \frac{\partial^2 y}{\partial u \partial v} + \frac{\partial^2 y}{\partial v^2}). \ \ (16)

Substituting this result into the wave equation (8), the \partial^2 y / \partial u^2 and \partial^2 y / \partial v^2 terms cancel and we obtain

\frac{\partial^2 y}{\partial u \partial v} = 0. \ \ (17)

Integrating (17) once with respect to v and once with respect to u shows that y is the sum of a function of u alone and a function of v alone,

y(u,v) = g(u) + h(v). \ \ (18)

Therefore, the general solution to the wave equation takes the form

y(x,t) = g(x - ct) + h(x + ct), \ \ (19)

where g and h are arbitrary functions of a single variable: g describes a profile travelling to the right with velocity c, and h a profile travelling to the left. These functions are to be determined by the initial conditions of the system, namely the string’s initial shape and velocity.
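The general solution (19) can be checked numerically. The sketch below builds y(x, t) from two assumed profiles (a Gaussian pulse for g and a sine wave for h, neither taken from the text) and estimates the left-hand side of the wave equation (8) with central finite differences. The helper names `dalembert` and `wave_residual` are hypothetical.

```python
import math


def dalembert(g, h, c):
    """Build y(x, t) = g(x - c t) + h(x + c t) from two profile functions."""
    return lambda x, t: g(x - c * t) + h(x + c * t)


def wave_residual(y, x, t, c, eps=1e-3):
    """Central-difference estimate of y_xx - y_tt / c^2 at the point (x, t)."""
    y_xx = (y(x + eps, t) - 2 * y(x, t) + y(x - eps, t)) / eps ** 2
    y_tt = (y(x, t + eps) - 2 * y(x, t) + y(x, t - eps)) / eps ** 2
    return y_xx - y_tt / c ** 2


# Assumed profiles: a Gaussian pulse moving right and a sine wave moving left.
g = lambda s: math.exp(-s * s)
h = lambda s: 0.3 * math.sin(2.0 * s)
c = 2.0

y = dalembert(g, h, c)
print(wave_residual(y, 0.4, 0.7, c))  # small, limited by the finite-difference error
```

The residual is not exactly zero because the derivatives are approximated, but it shrinks as eps is reduced (until round-off takes over), whereas a function that is not of the form (19) generally leaves a residual of order one.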

Boundary conditions and initial conditions

In order to find unique solutions to the partial differential equation (7) (equivalently (8)), which involves both space and time derivatives, we are required to apply both initial conditions and boundary conditions. The initial conditions constrain the solution at some initial time, while the boundary conditions constrain the solution at the boundary of the system.

There are primarily two types of boundary conditions common when discussing the dynamics of a string: Dirichlet and Neumann boundary conditions. This is also true in string theory, and we will see in the context of the relativistic string that the application of boundary conditions implies the existence of some interesting new objects. We will therefore return to this topic in a future note. But for now, there is an intuitive way to think about what happens at the endpoints of our stretched classical string.

Dirichlet boundary conditions specify the positions of the string endpoints. For example, imagine screwing each endpoint of a string into a wall or a wooden block. The endpoint of the string is fastened such that it cannot move up and down the wall in what we previously described as the y-coordinate direction. Therefore, when imposing Dirichlet boundary conditions we have

y(t, x = 0) = y(t,x=a) = 0. \ \ (20)

On the other hand, Neumann boundary conditions are akin to attaching a massless loop to each end of the string, with the loops allowed to slide along two frictionless poles. In this case, the Neumann boundary conditions mean that the endpoints are free to move along the y-coordinate direction. We specify the values of \partial y / \partial x  at the endpoints.

\frac{\partial y}{\partial x}(t,x = 0) = \frac{\partial y}{\partial x}(t, x = a) = 0. \ \ (21)

Finite string with fixed endpoints

There are a number of particular solutions to the wave equation that we may derive for various boundary configurations and initial conditions. For instance, we can have a fixed end at x = 0  and a free end at x = a  , or two free ends, or both ends fixed, and so on. First, we consider the case of an infinite string with one fixed endpoint. Then we’ll consider the case of a finite string with both endpoints fixed.

Consider a leftward-moving single sinusoidal wave that is incident on a wall located at x = 0  . The most general form of a leftward-moving sinusoidal wave is given by

y_i (x,t) = A \cos (\omega t + kx + \phi), \ \ (22)

where \omega / k = c = \sqrt{T_0 / \mu_0}, and \phi is arbitrary and depends on the location of the wave at t = 0. We model the wall as equivalent to a system of infinite impedance with reflection coefficient r = -1. This can, again, be reviewed in most classical mechanics texts. This means that at the wall we obtain a reflected wave y_r with amplitude of the same magnitude as the incident wave but with the opposite sign and travelling in the opposite direction, such that

y_r (x,t) = -A \cos (\omega t - kx + \phi). \ \ (23)

If we were to observe this system, we would see the sum of these two waves

y(x,t) =  A \cos (\omega t + kx + \phi) - A\cos (\omega t - kx + \phi), \ \ (24)

which, using the identity \cos \alpha - \cos \beta = -2 \sin (\frac{\alpha + \beta}{2}) \sin (\frac{\alpha - \beta}{2}), can be written in terms of the \sin function as

y(x,t) = -2A \sin (\omega t + \phi) \sin kx. \ \ (25)
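A numerical sanity check of (25) is straightforward. The sketch below assumes the leftward-moving incident wave is written as A \cos(\omega t + kx + \phi) and the reflected wave as -A \cos(\omega t - kx + \phi), and confirms that their sum equals the standing-wave form and vanishes at the wall. The function names and parameter values are illustrative assumptions.

```python
import math


def incident(x, t, A, k, omega, phi):
    """Leftward-moving wave, A cos(omega t + k x + phi)."""
    return A * math.cos(omega * t + k * x + phi)


def reflected(x, t, A, k, omega, phi):
    """Inverted rightward-moving wave produced by the wall (r = -1)."""
    return -A * math.cos(omega * t - k * x + phi)


def standing(x, t, A, k, omega, phi):
    """Standing-wave form (25): -2A sin(omega t + phi) sin(k x)."""
    return -2.0 * A * math.sin(omega * t + phi) * math.sin(k * x)


params = dict(A=0.5, k=3.0, omega=6.0, phi=0.4)  # assumed sample values
for x, t in [(0.0, 0.3), (0.7, 0.1), (1.9, 2.5)]:
    total = incident(x, t, **params) + reflected(x, t, **params)
    print(x, t, total, standing(x, t, **params))
```

At x = 0 both columns vanish for every t, which is the statement that the wall is a node.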

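Alternatively, in terms of the wavelength \lambda = 2\pi / k and the period \tau = 2\pi / \omega (written \tau to avoid confusion with the tension), we can write

y(x,t) = -2A \sin (\frac{2\pi x}{\lambda})\sin(\frac{2\pi t}{\tau} + \phi). \ \ (26)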

Of course, rather than assigning r =-1  for a wall, we could instead derive the change in sign given that at the wall where x = 0  it follows y = 0  . Then for all t and working from the general solution of the wave equation we find

y(x,t) = A_1 \sin(kx - \omega t) + A_2 \cos(kx - \omega t) + A_3 \sin (kx + \omega t) + A_4 \cos (kx + \omega t)

\implies y(x,t) = B_1 \cos kx \cos \omega t + B_2 \sin kx \sin \omega t + B_3 \sin kx \cos \omega t + B_4 \cos kx \sin \omega t. \ \ (27)

Imposing y(0,t) = 0 for all t requires B_1 = B_4 = 0. Therefore, we should only have \sin kx terms

y(x,t) = B_2 \sin kx \sin \omega t + B_3 \sin kx \cos \omega t

= (B_2 \sin \omega t + B_3 \cos \omega t) \sin kx

= B \sin (\omega t + \phi) \sin kx. \ \ (28)

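Here B = \sqrt{B_2^2 + B_3^2} and \tan \phi = B_3 / B_2, so that (28) has the same form as (25) with the amplitude B playing the role of -2A (a sign can always be absorbed into \phi).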

For a system in which the string is fixed at both ends, i.e. at x = 0 and x = a, the boundary conditions we impose require both y(0, t) = 0 and y(a, t) = 0. Therefore, we see from the previous example that the only way to have y(a, t) = 0 for all t is to ensure that \sin ka = 0. This means that ka must be an integer multiple of \pi such that

k_n =  \frac{n\pi}{a}, \ \ (29)

where n \in \mathbb{Z}  describes which string mode is excited.

In the present configuration, each endpoint represents a node. This implies that we can only have wavelengths which are related to the length of the string by n such that

\lambda_n = \frac{2\pi}{k_n} = \frac{2a}{n}. \ \ (30)

Therefore, we can now write a solution of the form

y(x,t) = -2A \sin (\omega t + \phi)\sin (k_n x) = -2A \sin (\omega t + \phi) \sin (\frac{n\pi x}{a}). \ \ (31)

Observe that the allowed wavelengths on the string are all integer divisors of twice the length of the string. For the mode n = 0  , the string can be seen to be at rest and in equilibrium.
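The allowed wavenumbers (29) and wavelengths (30) are easy to tabulate. In the sketch below, the function name `string_modes` and the sample length and wave speed are assumptions; the angular frequency is obtained from \omega / k = c.

```python
import math


def string_modes(a, c, n_max):
    """Return k_n, lambda_n and omega_n for modes n = 1..n_max of a string of length a."""
    modes = []
    for n in range(1, n_max + 1):
        k_n = n * math.pi / a              # equation (29)
        modes.append({
            "n": n,
            "k": k_n,
            "wavelength": 2 * math.pi / k_n,  # equation (30), equal to 2a / n
            "omega": c * k_n,                 # from omega / k = c
        })
    return modes


# Assumed sample values: a = 2 (length), c = 3 (wave speed).
for mode in string_modes(a=2.0, c=3.0, n_max=4):
    print(mode)
```

Each angular frequency is n times that of the n = 1 mode, which is the numerical counterpart of the statement that the frequencies are integer multiples of the fundamental.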

From what has been derived above, one can proceed with an analysis of the angular frequency \omega to find that the frequencies of oscillation of the string are in fact all integer multiples of the fundamental frequency. But, most importantly here, since the wave equation (7) is linear, the most general motion of a string with both endpoints fixed is simply a linear combination of solutions of the form (25), where k can only take the values k = n\pi/a and \omega / k = c. Therefore, the general expression for y(x, t) may be written as a summation over all n,

y(x,t) = \sum\limits^{\infty}_{n = 0} F_n \sin (\omega_n t +  \phi_n) \sin(k_n x). \ \ (32)

We have obtained a sum of all possible solutions, with the coefficients F_n and phases \phi_n fixed by the initial displacement and velocity of the string.
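As an illustration of the mode sum (32), the following sketch evaluates a truncated version of the series for a handful of assumed coefficients and phases. The function name and the sample numbers are hypothetical, and the sum starts at n = 1 because the n = 0 term contributes nothing.

```python
import math


def string_displacement(x, t, a, c, coeffs, phases):
    """Evaluate sum_n F_n sin(omega_n t + phi_n) sin(k_n x) for n = 1, 2, ..."""
    total = 0.0
    for n, (F_n, phi_n) in enumerate(zip(coeffs, phases), start=1):
        k_n = n * math.pi / a
        omega_n = c * k_n
        total += F_n * math.sin(omega_n * t + phi_n) * math.sin(k_n * x)
    return total


# Assumed sample data: three modes on a string of length a = 1 with c = 2.
coeffs = [1.0, 0.5, 0.25]
phases = [0.0, 1.0, 2.0]
print(string_displacement(0.3, 0.37, 1.0, 2.0, coeffs, phases))
print(string_displacement(1.0, 0.37, 1.0, 2.0, coeffs, phases))  # endpoint, zero up to round-off
```

Whatever coefficients are chosen, the displacement vanishes at both endpoints and repeats after the fundamental period 2a/c.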

Constructing the Lagrangian

The remaining space will be focused on a review of how to construct the Lagrangian for the classical non-relativistic string, and then we will calculate its equations of motion within the Lagrangian formalism. It should be familiar that, in general, an action will take the form S = \int \ L \ dt  , where L is the Lagrangian. The Lagrangian will have both kinetic (T) and potential energy (V) such that schematically we have something of the form L = T - V  .

Returning to the picture of the string with constant mass density \mu_0  , constant tension T_0  , and with its endpoints located at x = 0  and x = a  , the first step is to see that we should integrate the kinetic energy over the infinitesimal dx pieces along the string. In other words, the kinetic energy will be the sum of the kinetic energies of all the infinitesimal segments along the string. We also consider the rate at which each segment along the string is moving. Following this reasoning we obtain for the kinetic part of the Lagrangian

T = \int_{0}^{a} \frac{1}{2}\mu_{0}(\frac{\partial y}{\partial t})^{2} dx. \ \ (33)

The potential energy relates to the work done when each segment along the string is stretched. Similar to the picture earlier, when a single infinitesimal segment of string is stretched from (x, y) to (x + dx, y + dy), the length of this individual segment will change by some \Delta \ell. To derive an expression for the potential V, we assume small oscillations such that \mid \frac{dy}{dx} \mid \ll 1 and take \Delta \ell = \sqrt{dx^2 + dy^2} - dx.

As an expression of the work done by deformation, the potential energy dV of a single segment may be written as

dV = T_0 (ds - dx), \ \ (34)

where

(ds)^2 = (dx)^2 + (dy)^2 = (dx)^2 (1 + (\frac{\partial y}{\partial x})^2). \ \ (35)

Again, as found earlier, using the binomial series expansion (and because we are invoking the small angle approximation, we ignore the higher order terms in the expansion),

ds \approx dx (1 + \frac{1}{2}(\frac{\partial y}{\partial x})^2 + ...). \ \ (36)

Substituting (36) into (34), the work done in stretching each infinitesimal segment is T_0 (ds - dx) \approx \frac{1}{2} T_0 (\frac{\partial y}{\partial x})^2 dx. Summing over all segments along the string gives

V = \int_{0}^{a} \frac{1}{2}T_{0}(\frac{\partial y}{\partial x})^{2} dx. \ \ (37)

Bringing together our expressions for T and V gives

L = \int_{0}^{a} [\frac{1}{2} \mu_0 (\frac{\partial y}{\partial t})^2 - \frac{1}{2} T_0 (\frac{\partial y}{\partial x})^2] dx \\ \equiv \int_{0}^{a} \mathcal{L} dx, \ \ (38)

where \mathcal{L}  is the Lagrangian density

\mathcal{L}(\frac{\partial y}{\partial t}, \frac{\partial y}{\partial x}) = \frac{1}{2}\mu_0 (\frac{\partial y}{\partial t})^2 - \frac{1}{2}T_0 (\frac{\partial y}{\partial x})^2. \ \ (39)

We may simplify notation with \partial y / \partial t = \dot{y}  and \partial y / \partial x = y^{\prime}.  The action for the non-relativistic string therefore takes the form,

S_{NR} = \int_{t_{i}}^{t_{f}} L(t) dt = \int_{t_{i}}^{t_{f}} dt \int_{0}^{a} dx [\frac{1}{2} \mu_0 \dot{y}^2 - \frac{1}{2} T_0 y^{\prime 2}]. \ \ (40)
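To make the Lagrangian (38) concrete, here is a minimal sketch that approximates the kinetic and potential integrals on a uniform grid. The function name `string_lagrangian`, the trapezoid rule for the kinetic term and the finite-difference slope for the potential term are all choices made for this illustration, as are the sample values.

```python
import math


def string_lagrangian(y, y_dot, dx, mu0, T0):
    """Return (kinetic, potential, L) for samples y[i], y_dot[i] on a uniform grid."""
    kinetic = 0.0
    potential = 0.0
    for i in range(len(y) - 1):
        # Trapezoid rule for (1/2) mu0 (dy/dt)^2 on the cell [x_i, x_{i+1}].
        kinetic += 0.5 * mu0 * 0.5 * (y_dot[i] ** 2 + y_dot[i + 1] ** 2) * dx
        # Finite-difference slope for (1/2) T0 (dy/dx)^2 on the same cell.
        slope = (y[i + 1] - y[i]) / dx
        potential += 0.5 * T0 * slope ** 2 * dx
    return kinetic, potential, kinetic - potential


# Assumed sample configuration: the n = 1 mode shape at rest on a string of length a = 1.
a, n_cells = 1.0, 1000
dx = a / n_cells
shape = [math.sin(math.pi * i * dx / a) for i in range(n_cells + 1)]
at_rest = [0.0] * (n_cells + 1)
print(string_lagrangian(shape, at_rest, dx, mu0=3.0, T0=2.0))
```

For this shape the exact potential energy is T_0 \pi^2 / (4a), which the discretised sum approaches as the grid is refined.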

Recall from a previous section that we need to include a time component in addition to a spatial component, because in the action the path is the function y(t, x). So we see, again, that we are integrating from some initial time to some final time, and from an initial point to some final point in a region of (t,x) space.

Equations of motion

To find the equations of motion, we need to vary the action (40). But before that, let’s introduce some notation that will allow us to achieve a more general derivation. The simplification has to do with the momentum density of the string. The momentum densities will be denoted as \mathcal{P}^t and \mathcal{P}^x. We compute these terms as follows,

\mathcal{P}^t = \frac{\partial \mathcal{L}}{\partial \dot{y}} = \mu_0 \frac{\partial y}{\partial t},

\mathcal{P}^x = \frac{\partial \mathcal{L}}{\partial y^{\prime}} = - T_0 \frac{\partial y}{\partial x}. \ \ (41)

As the function y(t, x) represents the path of our string, in the variation y (t, x) \rightarrow y(t, x) + \delta y (t, x)  we may compute,

\delta S_{NR} = \int_{t_i}^{t_f} dt \int_{0}^{a} dx (\frac{\partial \mathcal{L}}{\partial \dot{y}} \delta \dot{y} + \frac{\partial \mathcal{L}}{\partial y^{\prime}} \delta y^{\prime}). \ \ (42)

Substituting for \mathcal{P}^t and \mathcal{P}^x in (42) we obtain,

= \int_{t_i}^{t_f} dt \int_{0}^{a} dx [\mathcal{P}^t \delta \dot{y} + \mathcal{P}^x \delta y^{\prime}]

= \int_{t_i}^{t_f} dt \int_{0}^{a} dx (\mathcal{P}^t \frac{\partial}{\partial t} \delta y + \mathcal{P}^x \frac{\partial}{\partial x} \delta y)

= \int dt dx [\frac{\partial}{\partial t} (\mathcal{P}^t \delta y) + \frac{\partial}{\partial x} (\mathcal{P}^x \delta y) - \delta y (\frac{\partial \mathcal{P}^t}{\partial t} + \frac{\partial \mathcal{P}^x}{\partial x})]. \ \ (43)

Following conventional procedure in the calculus of variations, we integrate by parts. Doing so gives

\delta S = \int_{0}^{a} dx [\mathcal{P}^t \delta y]_{t_i}^{t_f} + \int_{t_i}^{t_f } dt [\mathcal{P}^x \delta y]_{0}^{a} - \int_{t_i}^{t_f} dt \int_{0}^{a} dx \ \delta y (\frac{\partial \mathcal{P}^t}{\partial t} + \frac{\partial \mathcal{P}^x}{\partial x}). \ \ (44)

This final expression for the varied action contains three terms, and each must vanish independently. Notice that the first term is determined by the string’s configuration at t_{i} and t_{f}. In the present case, our interest is not with these initial and final conditions (unless we were interested in Hamilton-Jacobi theory), so we can just specify the initial and final configurations, which sets the variation \delta y to zero at those times. We could even just set the times to infinity and forget about it entirely; the choice is ours.

Instead, our interest begins with the second term. Notice that it describes the motion of the string endpoints at x = 0 and x = a during the time interval from t_i to t_f. We may expand this term as follows,

\int_{t_i}^{t_f } dt [\mathcal{P}^x \delta y]_{0}^{a} = \int_{t_i}^{t_f } dt [\mathcal{P}^x(t, x=a) \delta y(t, x=a) - \mathcal{P}^x (t, x=0) \delta y (t, x=0)]. \ \ (45)

What we require are boundary conditions for both terms.

Once again, putting the choice of mixed conditions to the side, we can invoke either Dirichlet or Neumann boundary conditions. As discussed in a previous section, there are physical implications dependent on either choice. Dirichlet boundary conditions mean that the string endpoints are fixed in time, so the variation \delta y(t, x) must vanish at x = 0 and x = a and both terms in (45) drop out. In this case, the momentum carried by the string is not conserved, since the walls can exert a force on the endpoints. Neumann boundary conditions, on the other hand, mean that the endpoints are free to move, and thus under this choice \delta y(t, x) is unconstrained at the endpoints. For (45) to vanish we must then require \mathcal{P}^x = -T_0 \partial y / \partial x = 0 at x = 0 and x = a, which is precisely (21). If we were to do the calculations, we would see that the momentum carried by the string is then conserved.

Let us now focus on the remaining term in (44). This term is determined by the motion of the string for x \in (0, a) and t \in (t_{i}, t_{f}). Since \delta y is arbitrary in this region, it gives us the equation of motion,

\frac{\partial \mathcal{P}^{t}}{\partial t} + \frac{\partial \mathcal{P}^{x}}{\partial x} = 0. \ \ (46)

Now let us ask: what is \frac{\partial \mathcal{P}^t}{\partial t}  and \frac{\partial \mathcal{P}^x}{\partial x}  ? Let’s ask another question: what is \mathcal{P}^t  and \mathcal{P}^x  ? We already computed these above. All we need to do is take their partial derivative respectively.

\frac{\partial \mathcal{P}^t}{\partial t} + \frac{\partial \mathcal{P}^x}{\partial x} = 0

\implies \mu_0 \frac{\partial^2 y}{\partial t^2} - T_0 \frac{\partial^2 y}{\partial x^2} = 0. \ \ (47)

Rearranging and remembering that v^2 = \frac{T_0}{\mu_0}  we find,

\frac{\partial^2 y}{\partial x^2} - \frac{1}{\frac{T_0}{\mu_0}}\frac{\partial^2 y}{\partial t^2} = 0. \ \ (48)

Once again, we have found the wave equation.
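The equation of motion (47) can also be integrated numerically. The sketch below is one possible approach, not the method of the references: a standard leapfrog (central-difference) scheme with Dirichlet endpoints, started from the n = 1 mode shape at rest. The function name, the grid size, the parameter values and the choice of time step c dt = dx are all assumptions made for illustration.

```python
import math


def evolve_fundamental_mode(a=1.0, T0=4.0, mu0=1.0, n_cells=50, n_steps=100):
    """Leapfrog integration of mu0 * y_tt = T0 * y_xx with y = 0 at x = 0 and x = a.

    Starts from y(x, 0) = sin(pi x / a) at rest and returns the (initial, final)
    lists of displacements on the grid after n_steps time steps.
    """
    c = math.sqrt(T0 / mu0)          # wave speed
    dx = a / n_cells
    dt = dx / c                      # assumed choice: c * dt / dx = 1
    r2 = (c * dt / dx) ** 2

    y0 = [math.sin(math.pi * i * dx / a) for i in range(n_cells + 1)]
    y0[0] = y0[-1] = 0.0             # Dirichlet conditions (20)
    if n_steps == 0:
        return y0, list(y0)

    # First step uses the zero initial velocity: y(dt) = y(0) + (dt^2 / 2) y_tt(0).
    y_prev = list(y0)
    y_curr = [0.0] * (n_cells + 1)
    for i in range(1, n_cells):
        y_curr[i] = y_prev[i] + 0.5 * r2 * (y_prev[i + 1] - 2 * y_prev[i] + y_prev[i - 1])

    for _ in range(n_steps - 1):
        y_next = [0.0] * (n_cells + 1)   # endpoints stay pinned at zero
        for i in range(1, n_cells):
            y_next[i] = (2 * y_curr[i] - y_prev[i]
                         + r2 * (y_curr[i + 1] - 2 * y_curr[i] + y_curr[i - 1]))
        y_prev, y_curr = y_curr, y_next
    return y0, y_curr


initial, final = evolve_fundamental_mode()
print(max(abs(f - i) for f, i in zip(final, initial)))
```

With the default values the run covers one full period 2a/c of the fundamental mode (100 steps of size dt = dx/c), so the string should return to its starting shape; after half that number of steps it should be inverted.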


In a final note, a vibrating string will of course carry energy. A natural question has to do with the quantity of energy. To answer this we may look at the relation between kinetic energy and potential energy as the string oscillates.

As before, we consider a small segment of string with linear density \mu between x and x + dx, displaced in the y-coordinate direction. Once again, assuming the displacement is small, we can calculate the kinetic energy density (i.e., the K.E. per unit length) and the potential energy density.

We know the kinetic energy (33) and the potential energy (37). We also know that the solutions to the wave equation take the form

y(x,t) = f(x \pm ct). \ \ (49)

Substituting this form into the integrands of (33) and (37) gives the kinetic and potential energy densities

\frac{dK}{dx} = \frac{1}{2}\mu c^2 [f^{\prime}(x \pm ct)]^2, \ \frac{dV}{dx} = \frac{1}{2}T[f^{\prime}(x \pm ct)]^2. \ \ (50)

Recall c = \sqrt{T / \mu}  , and so we find the kinetic energy density to be equal to the potential energy density. Another way to observe this equality is by substituting a solution y = A \sin (kx - \omega t)  for the wave equation into these equations for the kinetic energy and potential energy density. We can then evaluate the energy over n wavelengths.

For the kinetic energy,

K = \frac{1}{2}\mu \int_{x}^{x + n\lambda} A^2 \omega^2 \cos^2 (kx - \omega t) dx

= \frac{1}{2} \mu A^2 \omega^2 \int_{x}^{x + n \lambda} \frac{1}{2}(1 + \cos[2(kx - \omega t)]) dx. \ \ (51)

And for the potential energy we find,

V = \frac{1}{2}T \int_{x}^{x + n\lambda} A^2 k^2 \cos^2 (kx - \omega t) dx

= \frac{1}{2} T A^2 k^2 \int_{x}^{x + n\lambda} \frac{1}{2} (1 + \cos [2(kx - \omega t)]) dx. \ \ (52)

And so we see that

K = \frac{1}{2} \mu A^2 \omega^2 \frac{n\lambda}{2}, \ V = \frac{1}{2} T A^2 k^2 \frac{n\lambda}{2}. \ \ (53)

But as c = \sqrt{T / \mu} = \omega / k \rightarrow \mu \omega^2 = T k^2  , these expressions for K and V are once again equal. We can also deduce therefore that the total energy per unit length is 1/2 \mu A^2 \omega^2  . From this analysis it is possible to go on and study the energy flow per unit time and the power to generate the wave.
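The equality of K and V over whole wavelengths can be confirmed by direct numerical integration. The sketch below uses a midpoint rule for y = A \sin(kx - \omega t); the function name, the number of sample points and the parameter values are assumptions chosen for illustration.

```python
import math


def wave_energies(A, k, omega, mu, T, t=0.0, x0=0.0, n_wavelengths=1, samples=2000):
    """Midpoint-rule estimate of (K, V) for y = A sin(k x - omega t) over n wavelengths."""
    wavelength = 2 * math.pi / k
    dx = n_wavelengths * wavelength / samples
    K = 0.0
    V = 0.0
    for i in range(samples):
        x = x0 + (i + 0.5) * dx
        phase = k * x - omega * t
        y_t = -A * omega * math.cos(phase)   # dy/dt
        y_x = A * k * math.cos(phase)        # dy/dx
        K += 0.5 * mu * y_t ** 2 * dx
        V += 0.5 * T * y_x ** 2 * dx
    return K, V


# Assumed sample values, chosen so that omega / k = sqrt(T / mu) = 2.
K, V = wave_energies(A=0.1, k=3.0, omega=6.0, mu=2.0, T=8.0, t=0.4)
print(K, V)
```

The two numbers agree, and each equals \frac{1}{4} \mu A^2 \omega^2 \lambda for a single wavelength, consistent with (53).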


[1] Taylor, J., Classical mechanics. University Science Books, 2004.

[2] Zwiebach, B., A first course in string theory (2nd edition). Cambridge University Press, 2009.

Mathematical language of duality

As we’ve discussed at various times on this blog, many of the most important recent developments in string / M-theory are based on duality relations. Physical insight is quite ahead of mathematics in this regard. But, in the last decade or two, mathematics has started to properly formulate a language of duality that, on first look, seems incredibly simple but is ultimately very powerful: namely, the language of categories. In foundational mathematical terms, category theory provides tools to express structures – often very general structures – and their duals in a way that comes out naturally through the concept of a categorical product and coproduct. Below is a very brief summary.

Definition of a category

Let us quickly recall the definition of a category \mathcal{C}  . As mentioned in a past post, a category can be constructed for essentially any mathematical object. We can think of a category as a quintessential representation of structure.

Definition 1. A category \mathcal{C}  consists of a class of objects, and, for every pair of objects A,B \in \mathcal{C}  , a class of morphisms hom(A,B)  satisfying the properties:

  • Each morphism has specified domain and codomain objects. If f is a morphism with domain A and codomain B we write f: A \rightarrow B  .
  • For each A \in \mathcal{C}  , there is an identity morphism id_A \in \text{hom}(A,A)  such that for every B \in \mathcal{C}  we have left-right unit laws:

f \circ id_A = f \ \text{for all} \ f \in \text{hom}(A,B),

id_A \circ f = f \ \text{for all} \ f \in \text{hom}(B,A).

  • For any pair of morphisms f,g with codomain of f equal to domain of g, there exists a composite morphism g \circ f  . The domain of the composite morphism is equal to the domain of f and the codomain is equal to the codomain of g.

In simple terms, a category is just a collection of objects (metric spaces, topological spaces, or whatever) and structure preserving maps between those objects. It is, in a sense, like a deeper generalisation of set theory, except that we can have categories of sets. A simple illustration of a category is as follows

There are two axioms that must be satisfied in defining a category:

  • For any f: A \rightarrow B  , the composites 1_B f  and f1_A  are equal to f.
  • Composition is associative. For all A, B,C,D \in \mathcal{C}  , f \in \text{hom}(A,B)  , g \in \text{hom}(B,C)  , and h \in \text{hom}(C, D)  , we have h \circ (g \circ f) = (h \circ g) \circ f  .
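As an informal illustration of the two axioms, we can treat Python types as objects and ordinary functions as morphisms. This is only a toy model, and the names `identity` and `compose` as well as the sample functions are assumptions made for the sketch. Equality of functions is checked pointwise on a few inputs, which is a sanity check rather than a proof.

```python
def identity(x):
    """Identity morphism on any object."""
    return x


def compose(g, f):
    """Return the composite g after f, defined when cod(f) = dom(g)."""
    return lambda x: g(f(x))


# Assumed sample morphisms: f, g map int -> int and h maps int -> str.
f = lambda n: n + 1
g = lambda n: 2 * n
h = lambda n: str(n)

for x in range(5):
    # Unit laws: f composed with an identity on either side is f.
    assert compose(f, identity)(x) == f(x) == compose(identity, f)(x)
    # Associativity: h after (g after f) equals (h after g) after f.
    assert compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x)
print("unit laws and associativity hold on the sample inputs")
```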


We can also define a functor, which maps between categories. We define the notion of a functor as corresponding to a mapping that sends the objects and arrows of one category to the objects and arrows in another category in a structure preserving way.

Definition 2. A functor F  from C  to D  is a structure preserving map between categories such that for each object A  of C  , we have F(A)  in D  .

For each arrow (morphism) f: A \rightarrow B  in C  , we have F(f): F(A) \rightarrow F(B)  such that F(g) \circ F(f) = F(g \circ f)  and F(Id_A) = Id_{F(A)}  .
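A familiar example of the two functor conditions is the "list" construction, sketched here in Python under the same informal reading of types as objects and functions as morphisms. It sends a type X to lists of X and a function f to the function that applies f to every element. The name `fmap` is borrowed from functional programming and, like the sample functions, is an assumption of this sketch.

```python
def identity(x):
    return x


def compose(g, f):
    """Return the composite g after f."""
    return lambda x: g(f(x))


def fmap(f):
    """Action of the list functor on a morphism: apply f to every element."""
    return lambda xs: [f(x) for x in xs]


# Assumed sample morphisms and a sample list.
f = lambda n: n + 1
g = lambda n: 2 * n
xs = [1, 2, 3]

# F(g after f) = F(g) after F(f), checked on the sample list.
print(fmap(compose(g, f))(xs), compose(fmap(g), fmap(f))(xs))
# F(Id_A) = Id_{F(A)}, checked on the sample list.
print(fmap(identity)(xs), identity(xs))
```

Both printed pairs agree, which is what the conditions F(g) \circ F(f) = F(g \circ f) and F(Id_A) = Id_{F(A)} require.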

Suppose f: A \rightarrow C  is a functor between categories A  and C  . For purposes of illustration, we’ll call A  an indexing category, and let’s suppose it’s a simple one with objects a_1, a_2, \  \text{and} \ a_3  :

A functor f out of this category A  is simply the choice of three objects and three arrows in the category C  such that

where f(a_1) = c_1  , f(a_2) = c_2  , and f(a_3) = c_3  . The image of the arrows in A  are the arrows g, k, and h in C  where g = h \circ k  .

Categorical products

What is very neat and exciting is that we can also define the notion of a categorical product (e.g., a product of two categories). For a long time, it was thought that taking a product between two sets was one of the most fundamental operations in mathematics. But, it turns out, from the definition of a categorical product we can still drill deeper and therefore also capture the essence behind the Cartesian product of sets, the direct product of groups or rings, and the product of topological spaces.

This topic is again quite technical but, in short, a simple definition of a categorical product is as follows:

Definition 3. For any categories C  and D  , there is a category C \times D  , their product, whose

  • objects are ordered pairs (c,d)  , where c is an object of C  and d is an object of D  ,
  • morphisms are ordered pairs (f, g): (c, d) \rightarrow (c^{\prime}, d^{\prime})  , where f: c \rightarrow c^{\prime}  is a morphism of C  and g: d \rightarrow d^{\prime}  is a morphism of D  ,
  • and in which composition and identities are defined componentwise.

The product comes with projections \pi_1 : C \times D \rightarrow C  and \pi_2 : C \times D \rightarrow D  such that, for any other candidate X  with maps f: X \rightarrow C  and g: X \rightarrow D  , there exists a unique h: X \rightarrow C \times D  with \pi_1 \circ h = f  and \pi_2 \circ h = g  .
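The conditions \pi_1 \circ h = f and \pi_2 \circ h = g are easiest to see for the Cartesian product of sets, where pairs are tuples. The following Python sketch is only an illustration of that special case; the names `pi_1`, `pi_2` and `pairing` and the sample maps are assumptions.

```python
def pi_1(pair):
    """First projection C x D -> C."""
    return pair[0]


def pi_2(pair):
    """Second projection C x D -> D."""
    return pair[1]


def pairing(f, g):
    """The map h: X -> C x D induced by f: X -> C and g: X -> D."""
    return lambda x: (f(x), g(x))


# Assumed sample maps out of X = int.
f = lambda n: n % 3          # f: X -> C
g = lambda n: str(n)         # g: X -> D
h = pairing(f, g)

for x in range(4):
    print(x, h(x), pi_1(h(x)) == f(x), pi_2(h(x)) == g(x))
```

Any h satisfying both conditions must send x to (f(x), g(x)), which is why h is unique.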

A first glimpse at duality

Now, what is absolutely amazing is how, from the notion of a product of categories (which is like a generalisation of the Cartesian product of ordered sets), the first glimpse of a fundamental mathematical description of duality naturally emerges in the definition of a categorical coproduct.

Let us return to the definition of a categorical product and its diagram in the previous section. We want to think of its coproduct (i.e., the product in the opposite category). We will have the same picture, except all of the arrows will be reversed which is the same as exchanging domain and codomain.

Definition 4. The co-product C + D  , p_1 : C \rightarrow C + D  , p_2 : D \rightarrow C +D  is such that for each X  , f: C \rightarrow X  , g: D \rightarrow X  there exists a unique h: C + D \rightarrow X  that makes the diagram commute: h \circ p_1 = f  and h \circ p_2 = g  .
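For sets, the coproduct is the disjoint union, which can be modelled in Python with tagged values. The sketch below illustrates the conditions h \circ p_1 = f and h \circ p_2 = g in that special case only; the tags "left" and "right", the name `copairing` and the sample maps are assumptions.

```python
def p_1(c):
    """Injection C -> C + D, tagging the value as coming from C."""
    return ("left", c)


def p_2(d):
    """Injection D -> C + D, tagging the value as coming from D."""
    return ("right", d)


def copairing(f, g):
    """The map h: C + D -> X induced by f: C -> X and g: D -> X."""
    def h(tagged):
        tag, value = tagged
        return f(value) if tag == "left" else g(value)
    return h


# Assumed sample maps into X = int, with C = int and D = str.
f = lambda n: n + 1
g = lambda s: len(s)
h = copairing(f, g)

print(h(p_1(10)) == f(10), h(p_2("abc")) == g("abc"))
```

Compared with the product, every arrow has been reversed: the projections out of C \times D become injections into C + D, and the map into the product becomes a map out of the coproduct.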

The coproduct naturally takes the form of the category-theoretic dual notion to the categorical product. We can think of this in terms of a mapping from C  to C^{\text{op}}  .

Definition 5. Let C  be any category. The opposite category C^{\text{op}}  has

  • the same objects as in C  , and
  • a morphism f^{\text{op}}  in C^{\text{op}}  for each morphism f  in C  , so that the domain of f^{\text{op}}  is defined to be the codomain of f and the codomain of f^{\text{op}}  is defined to be the domain of f: i.e., f^{\text{op}}: X \rightarrow Y \in C^{\text{op}} \leftrightarrow f: Y \rightarrow X \in C  .

What this means is that, given C^{\text{op}}  has the same objects and morphisms as C  , the notion of duality in category theory is defined by a reversal of arrows: i.e., each morphism in C^{\text{op}}  is pointing in the opposite direction.

Statement \Sigma | Dual statement \Sigma^{\star}
f : a \rightarrow b | f : b \rightarrow a
a = \text{dom} f | a = \text{cod} f
i = 1_a | i = 1_a
h = g \circ f | h = f \circ g
f is monic | f is epi
u is a right inverse of h | u is a left inverse of h
f is invertible | f is invertible
t is a terminal object | t is an initial object

The dual of each of the axioms for a category is also an axiom, while the dual of the dual returns the original statement. This is the duality principle in a nutshell.


[1] E. Riehl, Category theory in context. Dover Publications, 2016. [online].

[2] S. Mac Lane, Categories for the working mathematician. Springer, 1978. [online].

[3] P. Smith, Category theory: A gentle introduction. [online].

[4] J. Baez, Category theory course. [online].