# Consistent Histories and Density Operator Formalism

I’ve been busy recently, mostly trying to sell an ethanol plant, and I realize that the plan was to talk about how a particle interaction could cause a force like gravity. However, I also made the New Year’s Resolution to be more professional in my physics and that would be a rather scary post. So instead, partly due to a posting of Lubos Motl, I’m going to rewrite the formalism of the Consistent Histories interpretation of quantum mechanics into my favorite formalism, that of pure density operators. This will be added to the first chapter of my book on density operator applications to elementary particles using Clifford / geometric algebra / calculus.

We will begin with a quick review of the consistent histories formalism, largely lifted from the Wikipedia article. We will then detail how the components of that formalism (which uses projection operators and a mixed density matrix) can be rewritten in terms of pure density matrices. Finally, time and space allowing, we will discuss what these things have to do with Margaret Hawton’s photon position operator in this language.

Legal stuff: Following Wikipedia’s copyright requirements for copies of their work, this blog post is licensed under a GNU Free Documentation License. It uses material from the Wikipedia article Consistent_histories. I’ve made a few minor changes intended to make the article easier to understand and I’ve clipped off the cosmology and quantum decoherence.

In quantum mechanics, the consistent histories approach is intended to give a modern interpretation of quantum mechanics, generalising the conventional Copenhagen interpretation and providing a natural interpretation of quantum cosmology. The theory is based on a consistency criterion that then allows the history of a system to be described so that the probabilities for each history obey the rules of classical probability while being consistent with the Schroedinger equation.

According to this interpretation of quantum mechanics, the purpose of a quantum mechanical theory is to predict probabilities of various alternative histories. A history $H_i$ is defined as a sequence (product) of projection operators $P_{i,j}$ at different moments of time $t_{i,j}$:
$H_i = T \prod_{j=1}^{n_i} P_{i,j}(t_{i,j})$
The symbol T indicates that the factors in the product are ordered chronologically according to their values of $t_{i,j}$: The “past” operators with smaller values of t appear towards the right side, and the “future” operators with greater values of t appear to the left.

These projection operators can correspond to any set of questions that include all possibilities. Examples might be the three projections meaning “the electron went through the left slit,” “the electron went through the right slit” and “the electron didn’t go through either slit.” One of the aims of the theory is to show that classical questions such as, “where are my keys?” are consistent. In this case one might use a very large set of projections each one specifying the location of the keys in some small region of space.

A history is a sequence of such questions, or — mathematically — the product of the corresponding projection operators. The role of quantum mechanics is to predict the probabilities of individual histories, given the known initial conditions.

Finally, the histories are required to be consistent, i.e.
$Tr(H_i\rho H_{i'}^{\dag} ) = 0$
for $i,i'$ different. Here $\rho$ represents the initial density matrix, and the operators are expressed in the Heisenberg picture. The consistency requirement allows us to postulate that the probability of the history $H_i$ is simply
$Pr(H_i) = Tr(H_i\rho H_i^{\dag})$
which guarantees that the probability of “A or B” equals the probability of “A” plus the probability of “B” minus the probability of “A and B” and so forth. …
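The consistency condition can be tried out on the smallest possible system. The following is only a sketch under assumptions that are not in the article above: a single spin-1/2, trivial time evolution (so the Heisenberg-picture projectors are just the ordinary ones), and an initial state of spin +z. The helper names (`matmul`, `dagger`, `overlap`) are made up for the illustration.

```python
# Minimal sketch: evaluate Tr(H_i rho H_j^dagger) for spin-1/2 histories.
# Assumptions (not from the text): no time evolution, initial state spin +z.

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Hermitian conjugate (conjugate transpose)."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def overlap(h_i, rho, h_j):
    """Tr(H_i rho H_j^dagger); must vanish for i != j in a consistent family."""
    return trace(matmul(matmul(h_i, rho), dagger(h_j)))

# Projectors onto spin +z, +x and -x.
P_ZP = [[1, 0], [0, 0]]
P_XP = [[0.5, 0.5], [0.5, 0.5]]
P_XM = [[0.5, -0.5], [-0.5, 0.5]]

RHO = P_ZP  # initial density matrix: spin +z

# Family 1: one question at one time ("spin +x or spin -x?").
single_cross = overlap(P_XP, RHO, P_XM)   # off-diagonal term
single_prob = overlap(P_XP, RHO, P_XP)    # probability of the +x history

# Family 2: ask about x first, then about z (later operator on the left).
H_A = matmul(P_ZP, P_XP)
H_B = matmul(P_ZP, P_XM)
double_cross = overlap(H_A, RHO, H_B)

print("single-time cross term:", single_cross)
print("single-time probability:", single_prob)
print("two-time cross term:", double_cross)
```

In this toy case the single-time family has a vanishing cross term, so its probabilities add classically, while the two-time family has a nonzero cross term and is therefore not a consistent set of histories.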


Now the first thing to notice about the above interpretation of quantum mechanics is that the fundamental quantum states are not the usual state vectors (which are almost universally assumed to be the correct description of a quantum state), but instead are density matrices. To rewrite a density matrix into pure density operator language, we need only note that arbitrary density matrices are built from statistical combinations of pure density matrices. The difference between “matrix” and “operator” is one of language only. The operator language is simply an elegant way of avoiding having to make a choice of representation of the matrices; instead of using matrices, one leaves calculations in terms of abstract generators such as $\gamma^0, \gamma^1, \gamma^2, \gamma^3$. In the interest of not losing my readers, I will use the ugly, but more familiar, matrix notation.

The elements of a history, $P_{i,j}$, are projection operators and therefore are idempotent, i.e. $(P_{i,j})^2 = P_{i,j}$. This is convenient because the pure density matrices are also idempotent. In the bra ket language, this follows from the normalization of the state vector. If $\langle x|x\rangle = 1$, then the pure density matrix built from the state x, i.e. $\rho_x = |x\rangle\langle x|$, satisfies $(\rho_x)^2 = |x\rangle\langle x|x\rangle\langle x| = \rho_x$.

In addition to the idempotency requirement, a pure density matrix, at least in the usual state vector oriented theory, must also be Hermitian and have trace 1. The Hermitian requirement comes from the origin of density matrices from state vectors, and amounts to the fact that $(\rho_x)^{\dag} = (|x\rangle\langle x|)^{\dag} = (\langle x|)^{\dag} (|x\rangle)^{\dag} = |x\rangle \langle x| = \rho_x$. The requirement that the trace be 1 amounts to a normalization requirement and arises from the way one calculates probabilities using the trace.

## Primitive Idempotents

Suppose that we have a matrix that is a projection operator and happens to have a trace of 1. What does such a matrix look like? Let’s choose a representation that diagonalizes this projection operator. Since it’s diagonal, it’s really easy to compute the square of the matrix; it’s just the matrix with the diagonal elements squared. For the matrix to be idempotent, that is, for it to be a projection operator, each of the complex numbers on the diagonal must also be idempotent. That is, they have to satisfy $(\rho_{kk})^2 = \rho_{kk}$. But there are only two solutions to this equation, zero and one. To get a trace of 1, there must be exactly one 1 on the diagonal, and the rest of the elements must be zero. For example, either of these two matrices would be okay:
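A concrete pair, taking 4×4 matrices as an assumed example size and an arbitrary choice of which diagonal slot holds the 1:

$\left(\begin{array}{cccc}1&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{array}\right), \;\;\; \left(\begin{array}{cccc}0&0&0&0\\0&0&0&0\\0&0&1&0\\0&0&0&0\end{array}\right)$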

The above two projection operators “annihilate” each other, that is, they multiply to zero. When one has two projection operators that do this, for example A and B, then their sum, A+B, is also a projection operator because $(A+B)^2 = A^2 + B^2 + AB + BA = A + B$. Furthermore, the trace function is linear so the trace of A+B is the sum of the traces: $tr(A+B) = tr(A) + tr(B)$. Consequently, if one has two pure density matrices that annihilate each other, then their sum is not a pure density matrix because its trace is 2 instead of 1.
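The algebra in the previous paragraph is easy to check numerically. This is a small sketch, assuming two diagonal 4×4 primitive idempotents chosen purely for illustration; the helper names are hypothetical.

```python
# Sketch: two annihilating primitive idempotents and their sum.
# The 4x4 size and the positions of the 1s are illustrative assumptions.

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    """Elementwise sum of two matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def unit_projector(n, k):
    """n x n diagonal matrix with a single 1 at position (k, k)."""
    return [[1 if i == j == k else 0 for j in range(n)] for i in range(n)]

A = unit_projector(4, 0)
B = unit_projector(4, 2)
ZERO = [[0] * 4 for _ in range(4)]
SUM = add(A, B)

print("AB = 0:", matmul(A, B) == ZERO)
print("(A+B)^2 = A+B:", matmul(SUM, SUM) == SUM)
print("tr(A), tr(B), tr(A+B):", trace(A), trace(B), trace(SUM))
```

The sum passes the idempotency test but has trace 2, so it is a projection operator without being a pure density matrix.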

The above fact suggests an alternative definition of the pure density matrices. They are just the projection operators that cannot be written as the sum of two non zero projection operators. In the mathematical language (as opposed to the words physicists prefer to use), such projection operators are called “primitive idempotents.” The “primitive” means they cannot be written as the sum of two nonzero idempotents. Maybe they chose “primitive” because mathematicians like prime numbers. (In his papers on the “measurement algebra,” Julian Schwinger calls the things that he uses to represent quantum states “elementary measurements”. These are also primitive idempotents.)

## Projection Operators as Sums of Quantum States

Sticking to the example of 4×4 matrices, one might ask what are the possible values of the trace for a 4×4 matrix that is a projection operator? Another fact about projection operators is that their eigenvalues are always 0 and 1. Diagonalizing such a thing, one finds that its trace is the number of 1s on the diagonal, that is, 0, 1, 2, 3, or 4. Each of those 1s, taken alone, is a primitive idempotent. Therefore, at least for the 4×4 matrices, any projection operator can be written as a sum of primitive idempotents.

More generally, for any sort of finite quantum spin states (like spin-3/2 or whatever), arbitrary projection operators can be written as sums over primitive idempotents. Is this true for continuous states? I’m not sure, but it doesn’t matter much. We can always use “box normalization” and do quantum mechanics in finite spaces. Maybe a reader will correct this in the comments.

When we split a projection operator into a sum over primitive idempotents we are taking advantage of the very common assumption that a quantum state is completely characterized by the values of a complete set of commuting observables. Each choice of possible quantum numbers for those observables defines a quantum state and has a state vector associated with it. We can take these state vectors and use them to define pure density matrices and these pure density matrices will annihilate each other. If we add two of these annihilating primitive idempotents together, the result will no longer be a quantum state (at least in the usual interpretation), but it will still be a projection operator.

Thus we can think of the projection operators in the consistent histories interpretation as sums over pure density matrices. This puts the consistent histories interpretation completely in pure density matrix form. More generally, if $P_{i,j}$ can be written as the sum of n primitive idempotents, $P_{i,j} = \rho_{i1,j} + \rho_{i2,j} + ... + \rho_{in,j}$, then we have broken the “histories” part of the consistent histories interpretation into pure density matrices. In this case, we end up with n times as many histories to compare as before, but each factor is now a primitive idempotent, which is still a projection operator.

In short, we can recast the consistent histories interpretation into a pure density matrix or primitive idempotent interpretation. The histories built from pure density matrices each satisfy the requirements of the histories of the usual consistent histories interpretation, and the usual histories of the consistent histories interpretation can be split quite naturally into pure density matrix form.

So let’s define a history to be a sequence of pure density matrices. And to make our notation less complex, let’s look at just one history and drop the “i”. Define our history of interest as $H = \rho_n\rho_{n-1}...\rho_1\rho_0$. These pure density matrices are Hermitian and idempotent so $\rho_j^{\dag} = \rho_j$ and they square to themselves. Writing this as the trace of a product of primitive idempotents we can manipulate the product into a form with the quantum state at the beginning and end of the product:
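A sketch of that manipulation, under the assumption (not fixed by the text) that the initial density matrix $\rho$ is itself the pure state $\rho_0$: Hermiticity reverses the order of the factors in $H^{\dag}$, and idempotency collapses the three adjacent copies of $\rho_0$ into one.

$Tr(H\rho_0 H^{\dag}) = Tr(\rho_n ... \rho_1\rho_0\;\rho_0\;\rho_0\rho_1 ... \rho_n) = Tr(\rho_n\rho_{n-1} ... \rho_1\rho_0\rho_1 ... \rho_{n-1}\rho_n)$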

The above consists of a product of primitive idempotents, with the same primitive idempotent at the beginning and end.

Let’s compute this product and then calculate the trace with an example from the Pauli algebra. We’ll choose the primitive idempotent to be, oh, maybe spin in the +x direction, and we’ll abbreviate the stuff in the middle (the history) as a 2×2 matrix $H$ with entries $h_{jk}$:
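Written out, assuming the standard Pauli representation in which spin +x is $\rho_{+x} = (1+\sigma_x)/2$ and labelling the entries of the middle matrix $h_{jk}$:

$\rho_{+x} H \rho_{+x} = \frac{1}{2}\left(\begin{array}{cc}1&1\\1&1\end{array}\right)\left(\begin{array}{cc}h_{11}&h_{12}\\h_{21}&h_{22}\end{array}\right)\frac{1}{2}\left(\begin{array}{cc}1&1\\1&1\end{array}\right) = \frac{h_{11}+h_{12}+h_{21}+h_{22}}{2}\;\frac{1}{2}\left(\begin{array}{cc}1&1\\1&1\end{array}\right)$

Taking the trace of the right hand side gives $(h_{11}+h_{12}+h_{21}+h_{22})/2$, since the trace of $\rho_{+x}$ is 1.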

The above gives what happens if we take the end state to be spin in the +x direction. We end up with a number multiplied by the primitive idempotent for spin in the +x direction.

This is a general fact about products of operators that begin and end with the same primitive idempotent; in general, the product will be a complex multiple of that primitive idempotent. In the above example, if for the end state we’d used spin in the +z direction, our answer would have been $h_{11}$ times that state. Spin in the -z direction would yield $h_{22}$ times the primitive idempotent for spin in the -z direction.
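The general fact can be spot-checked numerically. The sketch below sandwiches an arbitrary, made-up complex matrix `M` (standing in for the history) between copies of the same primitive idempotent, for spin +x, +z and -z, and compares the result with a multiple of that idempotent. The function name `sandwich` and the entries of `M` are illustrative assumptions.

```python
# Sketch: rho M rho is a complex multiple of rho when rho is a primitive
# idempotent. M is an arbitrary made-up matrix, not taken from the text.

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def scale(c, a):
    """Multiply every entry of the matrix a by the number c."""
    return [[c * x for x in row] for row in a]

def close(a, b, tol=1e-12):
    """True when two matrices agree entry by entry within tol."""
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def sandwich(rho, m):
    """Return (c, rho m rho) with c = Tr(rho m rho), the candidate multiple."""
    product = matmul(matmul(rho, m), rho)
    return trace(product), product

RHO_XP = [[0.5, 0.5], [0.5, 0.5]]   # spin +x
RHO_ZP = [[1, 0], [0, 0]]           # spin +z
RHO_ZM = [[0, 0], [0, 1]]           # spin -z
M = [[1 + 2j, 3], [-1j, 0.5]]       # entries h11, h12, h21, h22

c_xp, prod_xp = sandwich(RHO_XP, M)
c_zp, prod_zp = sandwich(RHO_ZP, M)
c_zm, prod_zm = sandwich(RHO_ZM, M)

print("+x multiple:", c_xp, "matches:", close(prod_xp, scale(c_xp, RHO_XP)))
print("+z multiple:", c_zp, "matches:", close(prod_zp, scale(c_zp, RHO_ZP)))
print("-z multiple:", c_zm, "matches:", close(prod_zm, scale(c_zm, RHO_ZM)))
```

For the +z and -z cases the multiple is just the corresponding diagonal entry of `M`, and for +x it is half the sum of all four entries.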

This fact raises a question. Why do we really need the trace? Since the pure density matrix at either end of the calculation has trace 1, we can instead take the answer to be the multiple of the state. We will get the same answer, but by making the calculation this way, our formalism is entirely written in operator (matrix) form. Not only are the quantum states density matrices, the histories are products of density matrices, and even the probabilities are multiples of density matrices. Thus getting rid of the trace reduces the number of mathematical objects we need; we only need operators. Now let’s discuss photons.

## Hawton’s Photon Position Operator

In introductory quantum mechanics, one works with wave functions in the position space. With such a creature, one can compute a probability density, that is, a function of space that gives the probability of finding the particle at the various points in space. For example, if the wave function is $\psi(x,t)$, then the probability of finding the particle at the point $x_0$ is proportional to $|\psi(x_0,t)|^2 = \psi(x_0,t)^{\dag}\psi(x_0,t)$.

If we wish to compute the expected value of some operator Q that depends on position, we would compute the expectation value by the integral:
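In the usual notation, assuming a normalized wave function and an integral over all of space, this reads:

$\langle Q \rangle = \int \psi^{\dag}(x,t)\; Q(x)\; \psi(x,t)\; d^3x$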

To find the average position for a scalar valued wave function, one takes Q to be the position operator $\vec{x}$. In Cartesian coordinates, one could make the calculation simply with (x,y,z). It turns out that this simple position operator doesn’t work for photons.

For a spin state, things get more complicated. The usual technique in QM is to choose a direction, usually z, and to split the spin states according to their spin measured in that direction. This works beautifully in the case of spin-1/2. Let $\psi_{\pm z}(x,t)$ be the wave functions for the spin up and spin down portions of a spin-1/2 state. We then define the wave function by the state vector:
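In the standard two component form (the column layout is the usual textbook convention, assumed here), the state vector is:

$\psi(x,t) = \left(\begin{array}{c}\psi_{+z}(x,t)\\ \psi_{-z}(x,t)\end{array}\right)$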

To compute, for example, the average position for the wave function, we use the fact that “x” commutes with the way we have written our states. To see what is going on, let’s write it out explicitly:
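A sketch of the explicit calculation, assuming the two component spinor form of the state and suppressing the arguments $(x,t)$ of the wave functions:

$\langle x \rangle = \int \left(\begin{array}{cc}\psi_{+z}^* & \psi_{-z}^*\end{array}\right) x \left(\begin{array}{c}\psi_{+z}\\ \psi_{-z}\end{array}\right) d^3x = \int x \left(|\psi_{+z}|^2 + |\psi_{-z}|^2\right) d^3x$

The position x acts as a multiple of the unit matrix on the spin components, which is why it can be pulled through the column vector.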

In order to do the above calculation, we used the fact that we could commute the position operator “x” with our states. This happened because we were able to split the spin-1/2 states up into spin up and spin down, and these commute with position. The problem with the traditional proof that photon wave states do not exist is well explained in Hawton and Baylis’s article, “Photon Position Operators and Localized Bases” (2001), page 27:

To ensure a rotationally invariant linear manifold of localized states for a system with total angular momentum quantum number $j$, they assumed a complete set of $2j+1$ wave functions $\psi_{jm}, -j \leq m \leq j$, where $m$ is a component referenced to an external direction. While the existence of a complete set is sufficient to give a rotationally invariant manifold, it is not necessary for massless particles of spin $S > 1/2$. Massless particles with spin have only two spin states, namely those corresponding to the helicities $\pm S$. For a system of states at the coordinate origin, the orbital angular momentum vanishes and $j=S$. The states in the linear manifold are characterized by components of j not along a space-fixed direction but along the momentum direction p. For $S>1/2$, the manifold is not complete and consequently it cannot describe a state with spin quantized along an arbitrary direction. However, it can describe the allowed states with either helicity. Furthermore, since the helicity operator commutes with the generator J of rotations, the two helicity subspaces are separately rotationally invariant. Because the helicity eigenstates form a complete rotational set only for $S \leq 1/2$, it is clear why Newton’s and Wigner’s insistence on a complete rotational manifold is stronger than necessary for massless particles with $S \geq 1$.


The above explanation is sufficient to show how it came to be that Hawton found the photon position operator that had previously been overlooked. Complaints about her mathematics have tended to be that the resulting wave function is not as simple as desired, or that it uses matrices. If we use the matrix formalism described above for the consistent histories, our probabilities will end up not as real numbers, but instead as real multiples of matrices. In such a formalism, ending up with matrix wave mechanics is quite natural.

The central problem of the photon wave function amounts to the fact that the spin states available to a massless spin-1 particle depend on the direction in which it is travelling. Hawton’s method of dealing with this is to eliminate the momentum dependency by rotating it out, then doing the state split, and finally rotating back to the original orientation. I will use her 2004 paper as a reference here.

Quantum mechanics is generally easier in the momentum space. For a scalar particle, the position operator in momentum space is simply an imaginary multiple of the gradient taken with respect to momentum: $i\hbar\nabla$. The problem with massless spin-1 is that the available quantum states (and therefore the quantum numbers of spin) depend on the direction of the momentum. If momentum is directed in the +z direction, then the available choices for spin are just +z and -z.

A similar problem occurs in the left and right handed chiral halves of the standard model particles. Rather than talk about electrons with spin in the +z and -z direction, the standard model prefers right handed and left handed electrons. Thus instead of spin, what is more natural (in both photons and the elementary fermions) is helicity. And helicity, unlike spin, is preserved under rotation.

So Hawton’s method of writing a position operator in momentum space for the photon is to begin with the usual position operator, $i\hbar \nabla$, but fix the orientation problem by first adjusting the state with a rotation matrix D. After computing the position (with the gradient operator), the rotation is eliminated by using its inverse. The resulting position operator is (equation 3 in the above):

where D is a rotation matrix that “rotates the lab z axis into p,” and I’ve slightly simplified the paper’s formula by restricting to the number density normalization case.

So, can rotation matrices be written as products of pure density matrices? Of course they can! The sequence of projection operators $\rho_n \rho_{n-1}...\rho_2\rho_1$, when considered as an operator on quantum states, takes the ket $| 1\rangle$ and converts it into a complex multiple of the state $|n\rangle$. Perhaps a calculation will assist.
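To see why, assume each factor is built from a normalized ket, $\rho_k = |k\rangle\langle k|$ with $\langle k|k\rangle = 1$, and let the product act on $|1\rangle$. The inner bras and kets pair off into complex numbers, leaving only the leftmost ket:

$\rho_n\rho_{n-1}...\rho_2\rho_1 |1\rangle = |n\rangle\langle n|n-1\rangle\langle n-1| ... |2\rangle\langle 2|1\rangle\langle 1|1\rangle = \left(\langle n|n-1\rangle ... \langle 3|2\rangle\langle 2|1\rangle\right) |n\rangle$

The complex multiple is the product of the overlaps between successive states, so it vanishes if any two neighbouring states in the chain are orthogonal.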

This property of products of pure density matrices is general, but let’s do the example using spin-1 as used for photons. To make things quick, we’ll use n=2 so there are only two pure density matrices to deal with. (Just as in the case of products of pure density matrices that begin and end with the same state, the effect of the intermediate states in a product of states is to adjust the amplitude and phase only.)

If one has an operator S that squares to unity, SS = 1, then one can quickly turn it into a projection operator by writing (1+S)/2. We can make these sorts of operators out of the usual spin-1 operators:

if we modify them by setting one of the diagonal zeros to +1 or -1. For example:

each square to +1. What we’ve done here amounts to changing the eigenvalues of the matrices from {-1, 0, +1} to {-1, -1, +1}. By doing this, we now have that (1+S’)/2 are primitive idempotents that pick out the +1 state. The resulting pure density matrix states are:

It is clear that the above have trace 1 and are projection operators. They each have two independent eigenvectors with eigenvalue 0 and one eigenvector with eigenvalue +1. The two eigenvectors with eigenvalue +1, one for each matrix, are:

In the spinor language, the above vectors are the ket states that generate the two pure density matrices. It is easy to verify that the product of the above two pure density matrices:

is a matrix that takes kets of the second type and turns them into kets of the first type. Of course this is not a general rotation matrix, however, it is a matrix that is sufficient to rotate the given states from one to the other and should be equal to the task of assembling a photon position operator in the pure density matrix formalism.
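The construction in this section can be sketched numerically. The matrices below are an assumption, not a copy of the ones originally shown: they use the Cartesian representation of spin-1, $(S_k)_{ab} = -i\epsilon_{abk}$, in which each $S_k$ has one empty row and column, and they set that empty diagonal entry to -1. The x and z directions are an arbitrary choice of example, and all names are hypothetical.

```python
# Sketch of the spin-1 example. ASSUMPTIONS: Cartesian representation
# (S_k)_ab = -i * epsilon_abk, with the empty diagonal slot set to -1.
# The article's own matrices may differ from these.

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(a, v):
    """Matrix acting on a column vector stored as a list."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def inner(u, v):
    """Bra-ket <u|v>: conjugate the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def close(a, b, tol=1e-12):
    """True when two matrices agree entry by entry within tol."""
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Modified spin operators: eigenvalues changed from {-1, 0, +1} to {-1, -1, +1}.
SX_MOD = [[-1, 0, 0], [0, 0, -1j], [0, 1j, 0]]
SZ_MOD = [[0, -1j, 0], [1j, 0, 0], [0, 0, -1]]

def projector(s):
    """(1 + S')/2, the idempotent that picks out the +1 eigenstate of S'."""
    return [[(IDENTITY[i][j] + s[i][j]) / 2 for j in range(3)] for i in range(3)]

RHO_X = projector(SX_MOD)
RHO_Z = projector(SZ_MOD)

# Normalized +1 eigenvectors, so that RHO = |ket><ket| in each case.
ROOT_HALF = 0.5 ** 0.5
KET_X = [0, ROOT_HALF, 1j * ROOT_HALF]
KET_Z = [ROOT_HALF, 1j * ROOT_HALF, 0]

PRODUCT = matmul(RHO_X, RHO_Z)
image = matvec(PRODUCT, KET_Z)    # what the product does to the z ket
amplitude = inner(KET_X, KET_Z)   # the overlap <x|z>

print("S'^2 = 1:", close(matmul(SX_MOD, SX_MOD), IDENTITY),
      close(matmul(SZ_MOD, SZ_MOD), IDENTITY))
print("traces:", trace(RHO_X), trace(RHO_Z))
print("image of z ket:", image)
print("overlap times x ket:", [amplitude * c for c in KET_X])
```

Under these assumptions the product of the two idempotents sends the second ket to the overlap $\langle x|z\rangle$ times the first ket, which is the amplitude-and-phase behaviour described above.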

While having “rotation matrices” that annihilate everything but one spin state is kind of ugly for photons, the notion is natural in the standard model where the left and right handed particles act completely differently and so projection operators are used to pick them apart when needed.


### 3 responses to “Consistent Histories and Density Operator Formalism”

1. William

Sidenote: You don’t have to put your entire post under GFDL if you merely quote an article, just the parts that are actually from the article. (i.e. your changed/clipped article)

Anyways, back to reading the article…

2. nigel cook

“…. I realize that the plan was to talk about how a particle interaction could cause a force like gravity. However, I also made the New Year’s Resolution to be more professional in my physics and that would be a rather scary post.”

If it’s true that quantum gravity is a (relatively) simple physical interaction process (requiring simple maths and concepts to extract predictions), then in the end you don’t have much of a choice. It may turn out that there is only one way to deal with quantum gravity. You’re right that the big problem is tackling any such subject in a way that looks professional. I’m ploughing (or plowing as spelt in USA) through QFT textbooks so I can summarize the key mainstream QFT mathematics. I don’t think it is correct.

If you have a particle that is accelerated by a series of randomly occurring interactions with gravitons, the acceleration occurs as a result of a sequence of discrete impulses, like quantum leaps. Not continuous, uniform acceleration like “curvature”. So I really think that the entire mathematical formulation of GR and much of QFT is bunk: it works as a good approximation on large scales (but not too large, or the gravitons are seriously redshifted in being exchanged between receding masses in the universe). It doesn’t work on small scales, where chaotic graviton interactions cause particles to jump around more randomly. It takes a lot of graviton interactions to smooth out the chaos of quantum interactions on small scales. All of this is just ignored by GR. QFT is nearly as bad because it also uses calculus to approximate a lot of discrete events: path integrals.

If you consider a fragment of a pollen grain in a high wind, its motion will not be a smooth acceleration but will depend on impacts of air molecules. However, a ship’s sail will average out a large number of impacts and appear to accelerate uniformly in the breeze. It’s a case that one mathematical model works on one scale, but it is only a probability formula or statistical approximation, not a one-to-one direct physical model of the situation.