Compatible operators and uncertainty

Last time, we ended with the idea of compatible operators. Two operators \( \hat{A}, \hat{B} \) are compatible if they commute, i.e. if \( [\hat{A}, \hat{B}] = 0 \). This has important implications for order of measurement: basically, we can measure \( \hat{A} \) and \( \hat{B} \) in either order and we will get the same outcome.

Now let's see this more rigorously. If the eigenvalues of \( \hat{A} \) are non-degenerate, then in the basis of eigenkets \( \{ \ket{a} \} \), a compatible operator \( \hat{B} \) is represented by a diagonal matrix. (Obviously, so is \( \hat{A} \).) This is straightforward to see:

\[ \begin{aligned} \bra{a_j} [\hat{A}, \hat{B}] \ket{a_i} = \bra{a_j} (\hat{A} \hat{B} - \hat{B} \hat{A}) \ket{a_i} \\ = \sum_{k} \bra{a_j} \hat{A} \ket{a_k} \bra{a_k} \hat{B} \ket{a_i} - a_i \bra{a_j} \hat{B} \ket{a_i} \\ = \sum_{k} a_k \delta_{jk} \bra{a_k} \hat{B} \ket{a_i} - a_i \bra{a_j} \hat{B} \ket{a_i} \\ = (a_j - a_i) \bra{a_j} \hat{B} \ket{a_i} = 0. \end{aligned} \]

Since we assumed the eigenvalues of \( \hat{A} \) are non-degenerate, then if \( i \neq j \) the matrix element of \( \hat{B} \) is zero. So \( \hat{B} \) is diagonal. Although we had to assume non-degeneracy in this simple proof, it is generally true that if \( [\hat{A}, \hat{B}] = 0 \), there exists a basis in which both operators are diagonal (so compatible operators are simultaneously diagonalizable.) This also goes the other way: simultaneously diagonalizable operators must always commute with each other (you'll prove this direction on the homework.)
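
If you'd like to see this concretely, here's a minimal NumPy sketch (my own illustration: the compatible \( \hat{B} \) is built as a polynomial in \( \hat{A} \), which guarantees that the two commute). Changing to the eigenbasis of \( \hat{A} \), the matrix for \( \hat{B} \) indeed comes out diagonal.

```python
import numpy as np

# Build a random Hermitian A (non-degenerate with probability 1) and a
# compatible B = A^2 + 2A, which commutes with A by construction.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2
B = A @ A + 2 * A

print(np.allclose(A @ B - B @ A, 0))     # [A, B] = 0

# Diagonalize A; in its eigenbasis, B should come out diagonal too.
evals, V = np.linalg.eigh(A)             # columns of V are the eigenkets |a_i>
B_in_a_basis = V.conj().T @ B @ V
off_diag = B_in_a_basis - np.diag(np.diag(B_in_a_basis))
print(np.allclose(off_diag, 0))          # True: B is diagonal in the |a> basis
```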

Although none of the three spin-component operators we've looked at commute with each other, there is another operator we can construct in the Stern-Gerlach experiment,

\[ \begin{aligned} \hat{\mathbf{S}}{}^2 \equiv \hat{S_x}{}^2 + \hat{S_y}{}^2 + \hat{S_z}{}^2 = \frac{3}{4} \hbar^2 \hat{1}, \end{aligned} \]

where we've used the properties of the Pauli matrices to show that this operator is proportional to the identity operator. Clearly, this implies that

\[ \begin{aligned} [\hat{\mathbf{S}}{}^2, \hat{S_i}] = 0 \end{aligned} \]

and the eigenstates of any of the \( \hat{S_i} \) are also eigenstates of \( \hat{\mathbf{S}}^2 \). (This is sort of trivial here; since \( \hat{\mathbf{S}}^2 \) is proportional to the identity, any state is an eigenstate. For higher-spin objects, however, this operator will no longer be proportional to the identity, but it will still be compatible with each of the spin components.)
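
A quick numerical check (a sketch in units where \( \hbar = 1 \)): build the spin operators from the Pauli matrices and confirm both the identity above and the vanishing commutators.

```python
import numpy as np

hbar = 1.0   # working in units where hbar = 1

# Spin-1/2 operators built from the Pauli matrices: S_i = (hbar/2) sigma_i
sx = (hbar / 2) * np.array([[0, 1], [1, 0]], dtype=complex)
sy = (hbar / 2) * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = (hbar / 2) * np.array([[1, 0], [0, -1]], dtype=complex)

S2 = sx @ sx + sy @ sy + sz @ sz
print(np.allclose(S2, 0.75 * hbar**2 * np.eye(2)))    # S^2 = (3/4) hbar^2 * identity

for s in (sx, sy, sz):
    print(np.allclose(S2 @ s - s @ S2, 0))            # [S^2, S_i] = 0 for each i
```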

Compatible observables are nice, but the incompatible ones are really interesting. Of course, we saw in the Stern-Gerlach experiment that the order in which we took measurements mattered, which stems from the fact that the operators corresponding to those measurements don't commute. There are plenty of examples of non-commutative operations in classical physics too: rotations of an object about its axes in three dimensions generally don't commute, for example. In quantum mechanics, however, non-commuting operators have much deeper consequences.

Consider a generic sequential experiment, consisting of what Sakurai calls "selective measurements". We have three operators \( \hat{A}, \hat{B}, \hat{C} \), represented by boxes. Our state comes into the box, is acted on by the operator, and then we block all of the results except for a chosen eigenstate. (The operator + blocking can be written simply as the projection operator onto the eigenstate, e.g. \( \ket{a} \bra{a} \).)

![][ABC]

[ABC]: images/selection-ABC.png width=600px

We can calculate the probability of observing \( \ket{c} \) at the last measurement, having passed through a particular intermediate state \( \ket{b} \); the probability of passing both filters is the product of the probabilities for the two steps, so we just apply Born's rule twice, finding

\[ \begin{aligned} \textrm{pr}(c|b) = |\sprod{c}{b}|^2 |\sprod{b}{a}|^2. \end{aligned} \]

We can compute the total probability for finding state \( \ket{c} \) by summing over all of the possible intermediate states \( \ket{b} \); we can use our apparatus to measure these probabilities one by one. The resulting sum is

\[ \begin{aligned} \textrm{pr}(c|\hat{B}) = \sum_b |\sprod{c}{b}|^2 |\sprod{b}{a}|^2 = \sum_b \sprod{c}{b} \sprod{b}{a} \sprod{a}{b} \sprod{b}{c}. \end{aligned} \]

I've written this out as a series of inner products so we can compare it with another way to sum over the intermediate states, which is simply to remove the \( \hat{B} \) measurement completely. Then the probability of observing final state \( \ket{c} \) becomes

\[ \begin{aligned} \textrm{pr}(c|\textrm{no}\ \hat{B}) = |\sprod{c}{a}|^2 = \left| \sum_b \sprod{c}{b} \sprod{b}{a} \right|^2. \end{aligned} \]

These expressions are, in general, not equal! Even if we try to "put the pieces back together" by summing over all possible intermediate \( \ket{b} \) states - and remember, we can think of the initial state \( \ket{a} \) as a linear combination of \( \ket{b} \) states! - we get a different result just based on the fact that we've recorded the probabilities of each of the \( \ket{b} \) at the intermediate step. This is a truly quantum phenomenon.

What does this have to do with incompatible observables? Well, it can be shown that the two expressions for the probability of \( \ket{c} \) become equal whenever either \( [\hat{A}, \hat{B}] = 0 \) or \( [\hat{B}, \hat{C}] = 0 \). The effect we've seen is therefore a consequence of incompatible observables.
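
Here's a minimal NumPy sketch of this effect for spin-1/2 (my own choice of states: start in \( \ket{+} \) of \( \hat{S_z} \), take \( \hat{S_x} \) for the intermediate \( \hat{B} \) measurement, and ask for the final state \( \ket{-} \) of \( \hat{S_z} \)). With the intermediate measurement the probability is 1/2; without it, it's zero.

```python
import numpy as np

zp = np.array([1, 0], dtype=complex)                 # |+z> = initial state |a>
zm = np.array([0, 1], dtype=complex)                 # |-z> = final state |c>
xp = np.array([1, 1], dtype=complex) / np.sqrt(2)    # S_x eigenstates: the |b> states
xm = np.array([1, -1], dtype=complex) / np.sqrt(2)

# With the intermediate measurement: sum over b of |<c|b>|^2 |<b|a>|^2
pr_with_B = sum(abs(np.vdot(zm, b))**2 * abs(np.vdot(b, zp))**2 for b in (xp, xm))

# Without it: |<c|a>|^2 = |sum_b <c|b><b|a>|^2
pr_without_B = abs(np.vdot(zm, zp))**2

print(pr_with_B, pr_without_B)   # 0.5 vs 0.0 -- measuring S_x in between matters!
```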

The uncertainty relation

We have established that if two observables are not compatible, \( [\hat{A}, \hat{B}] \neq 0 \), then they do not have simultaneous eigenstates: we can't precisely know the values of both observables simultaneously. What we can do is make statistical statements; given an ensemble in some state \( \ket{\psi} \) and many repeated trials, we can determine both \( \ev{\hat{A}} \) and \( \ev{\hat{B}} \).

But it turns out that even when we're willing to back off to statistical measurements of an ensemble, there is still a fundamental limit on how precisely we can predict the outcomes of incompatible observables. You have seen this before in the form of the Heisenberg uncertainty principle for momentum and position of a particle. Here we consider a more general inequality known as the uncertainty relation.

Following Sakurai, we define an operator

\[ \begin{aligned} \Delta \hat{A} \equiv \hat{A} - \ev{\hat{A}} \end{aligned} \]

which measures the residual difference between the outcome of \( \hat{A} \) and the expected mean. The expectation value of the square of this difference is called the variance or the dispersion:

\[ \begin{aligned} (\Delta A)^2 \equiv \ev{(\Delta \hat{A})^2} = \ev{\left(\hat{A}^2 - 2 \hat{A} \ev{\hat{A}} + \ev{\hat{A}}^2 \right)} = \ev{\hat{A}^2} - \ev{\hat{A}}^2. \end{aligned} \]

(Note that Sakurai doesn't use the notation "\( (\Delta A)^2 \)" for the variance, to avoid confusion with the operator \( \Delta \hat{A} \); since I'm using hat notation for operators, we can use the more conventional notation for the variance.)

This dispersion measures how widely spread the results of measuring \( \hat{A} \) will be over many trials; \( \Delta A \) is just the standard deviation of the distribution of outcomes. (By the central limit theorem, the average of a very large number of such measurements will be Gaussian-distributed about \( \ev{\hat{A}} \); for a Gaussian distribution, about 68% of observations fall within one standard deviation of the mean.)

![][gauss]

[gauss]: images/gaussian.png width=500px

We know that there are some experiments for which the dispersion will be zero; if we prepare a pure eigenstate \( \ket{a} \) of \( \hat{A} \), then we will always measure the same eigenvalue \( a \). However, for a more general state there will be some dispersion. In particular, even if our system is in an eigenstate of \( \hat{A} \), we might expect to find some dispersion if we measure an incompatible observable \( \hat{B} \). For example, if we go back to our Stern-Gerlach system and prepare an eigenstate \( \ket{+} \) of \( \hat{S_z} \), then \( \Delta S_z = 0 \), but if we try to measure \( \hat{S_x} \), we find that

\[ \begin{aligned} \Delta S_x = \sqrt{\ev{\hat{S_x}^2} - \ev{\hat{S_x}}^2} = \frac{\hbar}{2}. \end{aligned} \]

If we choose a different state, the dispersions of \( \hat{S_x} \) and \( \hat{S_z} \) will be different, but it won't surprise you to learn that they can never both be made arbitrarily small; there is a limit on the combination of the two. For any state and any two observables (i.e. Hermitian operators) \( \hat{A} \) and \( \hat{B} \), we must have the uncertainty relation,

\[ \begin{aligned} (\Delta A) (\Delta B) \geq \frac{1}{2} \left| \ev{[\hat{A}, \hat{B}]}\right|. \end{aligned} \]
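
Before proving this, here's a quick numeric sanity check (a sketch in units where \( \hbar = 1 \), using the \( \hat{S_z} \) eigenstate \( \ket{+} \) and the pair \( \hat{S_x}, \hat{S_y} \)); in this particular case the inequality is actually saturated.

```python
import numpy as np

hbar = 1.0
sx = (hbar / 2) * np.array([[0, 1], [1, 0]], dtype=complex)
sy = (hbar / 2) * np.array([[0, -1j], [1j, 0]], dtype=complex)

psi = np.array([1, 0], dtype=complex)        # the S_z eigenstate |+>

def expval(op, state):
    # <state| op |state> for a normalized state vector
    return np.vdot(state, op @ state)

dSx = np.sqrt((expval(sx @ sx, psi) - expval(sx, psi)**2).real)
dSy = np.sqrt((expval(sy @ sy, psi) - expval(sy, psi)**2).real)

comm = sx @ sy - sy @ sx                     # = i hbar S_z
bound = 0.5 * abs(expval(comm, psi))

print(dSx * dSy, bound)                      # both hbar^2 / 4: the bound is saturated
```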

Let's prove this inequality. Our starting point is the Cauchy-Schwarz inequality which we proved before: if we let

\[ \begin{aligned} \ket{\alpha} = \Delta \hat{A} \ket{\psi} \\ \ket{\beta} = \Delta \hat{B} \ket{\psi} \end{aligned} \]

where \( \ket{\psi} \) is arbitrary, then the inequality gives us

\[ \begin{aligned} \sprod{\alpha}{\alpha} \sprod{\beta}{\beta} \geq |\sprod{\alpha}{\beta}|^2 \\ \Rightarrow \bra{\psi}(\Delta \hat{A})^2 \ket{\psi} \bra{\psi} (\Delta \hat{B})^2 \ket{\psi} \geq |\bra{\psi} (\Delta \hat{A}) (\Delta \hat{B}) \ket{\psi}|^2. \end{aligned} \]

Now we'll use a useful trick: we can rewrite the product of two operators like so,

\[ \begin{aligned} \Delta \hat{A} \Delta \hat{B} = \frac{1}{2} [\Delta \hat{A}, \Delta \hat{B}] + \frac{1}{2} \{ \Delta \hat{A}, \Delta \hat{B}\}. \end{aligned} \]

We know that the observables themselves are Hermitian, but what about the commutator and anti-commutator? Well, first notice that

\[ \begin{aligned} [\Delta \hat{A}, \Delta \hat{B}] = [\hat{A} - \ev{\hat{A}}, \hat{B} - \ev{\hat{B}}] = [\hat{A}, \hat{B}] \end{aligned} \]

since the commutator vanishes if either argument is a scalar. This commutator is anti-Hermitian:

\[ \begin{aligned} ([\hat{A}, \hat{B}])^\dagger = (\hat{A} \hat{B} - \hat{B} \hat{A})^\dagger \\ = \hat{B}^\dagger \hat{A}^\dagger - \hat{A}^\dagger \hat{B}^\dagger \\ = \hat{B} \hat{A} - \hat{A} \hat{B} \\ = - [\hat{A}, \hat{B}]. \end{aligned} \]

Just as the expectation value of any Hermitian operator is real, the expectation value of an anti-Hermitian operator is purely imaginary. On the other hand, it's easy to show that the anticommutator \( \{\Delta \hat{A}, \Delta \hat{B}\} \) is in fact Hermitian. So we've split the object inside the absolute value above into a real part and an imaginary part, which means that its squared modulus is just the sum of the squares of the two pieces:

\[ \begin{aligned} \left|\ev{\Delta \hat{A} \Delta \hat{B}}\right|^2 = \frac{1}{4} \left| \ev{[\hat{A}, \hat{B}]} \right|^2 + \frac{1}{4} \left| \ev{\{\Delta \hat{A}, \Delta \hat{B}\}} \right|^2 \end{aligned} \]

Since the anticommutator term is never negative, the right-hand side is always greater than or equal to \( \frac{1}{4} \left| \ev{[\hat{A}, \hat{B}]} \right|^2 \), so our proof is complete; taking the square root of both sides gives the form of the relation written above.

Change of basis

As we've seen, for finite-dimensional Hilbert spaces it's often very convenient to work in terms of explicit matrices and vectors to represent our operators and states. This requires a choice of basis. But what if we want to change from one basis to another, say from \( \hat{S_z} \) to \( \hat{S_x} \) eigenstates for our Stern-Gerlach example? Since we already know the complete basis of eigenstates for both \( \hat{S_x} \) and \( \hat{S_z} \), we can easily construct an operator to map from one to the other:

\[ \begin{aligned} \hat{U}_{z \to x} = \ket{S_{x,+}} \bra{S_{z,+}} + \ket{S_{x,-}} \bra{S_{z,-}} = \sum_i \ket{S_{x,i}} \bra{S_{z,i}}. \end{aligned} \]

This operator transforms \( \ket{S_{z,+}} \) to \( \ket{S_{x,+}} \), and likewise for the minus states. I've written it as \( \hat{U} \) because it is easily seen to be a unitary operator:

\[ \begin{aligned} \hat{U}^\dagger_{z \to x} \hat{U}_{z \to x} = \sum_i \sum_j \ket{S_{z,j}} \sprod{S_{x,j}}{S_{x,i}} \bra{S_{z,i}} = \sum_i \ket{S_{z,i}} \bra{S_{z,i}} = \hat{1}. \end{aligned} \]

In fact, for any two complete, orthonormal basis sets \( \{ \ket{a} \} \) and \( \{ \ket{b} \} \), we can always find such a unitary operator which changes from one basis to the other; indeed, we can construct it as a sum over outer products in the same way.
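
As a concrete check, we can build \( \hat{U}_{z \to x} \) as a matrix in the \( \hat{S_z} \) eigenbasis from the outer products above and verify that it is unitary and does what we claimed (a NumPy sketch; the column vectors are the familiar \( \hat{S_x} \) eigenstates written in the \( \hat{S_z} \) basis):

```python
import numpy as np

# S_z eigenkets as column vectors, and the S_x eigenkets written in that basis
zp, zm = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
xp = np.array([1, 1], dtype=complex) / np.sqrt(2)
xm = np.array([1, -1], dtype=complex) / np.sqrt(2)

# U_{z->x} = |S_x,+><S_z,+| + |S_x,-><S_z,-| as a sum of outer products
U = np.outer(xp, zp.conj()) + np.outer(xm, zm.conj())

print(np.allclose(U.conj().T @ U, np.eye(2)))             # unitary
print(np.allclose(U @ zp, xp), np.allclose(U @ zm, xm))   # maps the S_z kets to the S_x kets
```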

For an arbitrary ket \( \ket{\psi} \), it's useful to see how its expansion changes as we change basis: if we have

\[ \begin{aligned} \ket{\psi} = \sum_{i} \ket{a_i} \sprod{a_i}{\psi} \end{aligned} \]

then in the new basis, we need the coefficients \( \sprod{b_i}{\psi} \):

\[ \begin{aligned} \sprod{b_i}{\psi} = \sum_j \sprod{b_i}{a_j} \sprod{a_j}{\psi} = \sum_j \bra{a_i} \hat{U}{}^\dagger \ket{a_j} \sprod{a_j}{\psi}. \end{aligned} \]

Working in matrix notation, going from basis \( \{\ket{a}\} \) to basis \( \{\ket{b}\} \) just requires multiplication on the left by the matrix \( \hat{U}^\dagger \) (also expanded in basis \( \{\ket{a}\} \)),

\[ \begin{aligned} \psi \rightarrow U^\dagger \psi. \end{aligned} \]

A similar derivation for the matrix elements of an operator \( \hat{X} \) allows us to show that the new matrix representing \( \hat{X} \) is given by a similarity transformation of the old matrix,

\[ \begin{aligned} \hat{X} \rightarrow \hat{U}{}^\dagger \hat{X} \hat{U}. \end{aligned} \]

Notice that for both matrices and vectors, \( \hat{U}^\dagger \) is always on the left. The reason that we multiply our state vectors by \( \hat{U}^\dagger \) and not by \( \hat{U} \) is that we're thinking of this as a passive transformation; if the basis rotates according to \( \hat{U} \), then the components of a state vector are rotated the opposite way.
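
Continuing the sketch above, we can check both transformation rules at once: take \( \ket{\psi} = \ket{S_{x,+}} \), whose components in the new basis should just be \( (1, 0) \), and confirm that \( \hat{S_x} \) becomes diagonal under the similarity transformation (again in units where \( \hbar = 1 \)).

```python
import numpy as np

hbar = 1.0
sx = (hbar / 2) * np.array([[0, 1], [1, 0]], dtype=complex)

# The change-of-basis matrix U_{z->x} from the sketch above
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# A state written in the S_z basis: |psi> = |S_x,+>
psi_z = np.array([1, 1], dtype=complex) / np.sqrt(2)
print(U.conj().T @ psi_z)                  # [1, 0]: the components in the new basis

# The operator S_x after the similarity transformation: diagonal in the new basis
print(np.round(U.conj().T @ sx @ U, 12))   # diag(+hbar/2, -hbar/2)
```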

A unitary transformation preserves the norm of all of our kets; we can think of it as a rotation in the Hilbert space. In fact, operators have a property which is also invariant under unitary transformations, called the trace \( \tr(\hat{X}) \),

\[ \begin{aligned} \tr(\hat{X}) \equiv \sum_n \bra{a_n} \hat{X} \ket{a_n}. \end{aligned} \]

In other words, the trace of an operator is just the trace (sum of the diagonal components) of its matrix representation, in any orthonormal basis. The trace has some useful properties:

\[ \begin{aligned} \tr(\hat{X} \hat{Y} \hat{Z}) = \tr(\hat{Z} \hat{X} \hat{Y}) = \tr(\hat{Y} \hat{Z} \hat{X})\ \textit{(cyclic permutation)} \\ \tr(\ket{a_i}\bra{a_j}) = \delta_{ij} \\ \tr(\ket{b_i}\bra{a_i}) = \sprod{a_i}{b_i}. \end{aligned} \]
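
These properties are easy to verify numerically; here's a short NumPy sketch with random matrices (a QR factorization is used simply as a convenient way to manufacture a unitary matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
def rand_op(n=3):
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

X, Y, Z = rand_op(), rand_op(), rand_op()

# Cyclic property of the trace
print(np.allclose(np.trace(X @ Y @ Z), np.trace(Z @ X @ Y)))        # True

# Invariance under a unitary change of basis: tr(U^dag X U) = tr(X)
U = np.linalg.qr(rand_op())[0]       # Q from a QR factorization is unitary
print(np.allclose(np.trace(U.conj().T @ X @ U), np.trace(X)))       # True
```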

Diagonalization

In the Stern-Gerlach example, it was easy to construct \( \hat{U}_{z \to x} \) since we already knew the form of the \( \hat{S_x} \) eigenvectors in the \( \hat{S_z} \) eigenbasis. More generally, we can use the standard procedure for finding matrix eigenvalues and eigenvectors in order to construct a \( \hat{U} \) which will diagonalize any given Hermitian operator. Suppose that we're working in a basis \( \{\ket{a}\} \), in which operator \( \hat{B} \) is not diagonal. The eigenvectors of \( \hat{B} \) will satisfy the equation

\[ \begin{aligned} \hat{B} \ket{b} = b \ket{b} \\ \sum_i \hat{B} \ket{a_i} \sprod{a_i}{b} = b \sum_i \ket{a_i} \sprod{a_i}{b} \\ \sum_i \bra{a_j} \hat{B} \ket{a_i} \sprod{a_i}{b} = b \sum_i \sprod{a_j}{a_i} \sprod{a_i}{b} \\ \sum_i \left( \bra{a_j} \hat{B} \ket{a_i} - b \delta_{ij} \right) \sprod{a_i}{b} = 0. \end{aligned} \]

This is exactly the matrix eigenvalue equation; it has a solution only if

\[ \begin{aligned} \det(\hat{B} - b \hat{1}) = 0. \end{aligned} \]

For an \( N \)-dimensional Hilbert space, this is an \( N \)-th order polynomial equation with \( N \) (not necessarily distinct) solutions for \( b \), which are the eigenvalues. Solving for the eigenvalues and then eigenvectors in the standard way, the change of basis matrix is

\[ \begin{aligned} \hat{U} = \left( \begin{array}{cccc} | & | & & | \\ \ket{b_1} & \ket{b_2} & \cdots & \ket{b_N} \\ | & | & & | \end{array}\right) \end{aligned} \]

i.e. it is constructed by stacking the eigenvectors as columns. You should already know how to do all of this; I'm just reminding you of the details and telling you that the procedure with operators is completely standard.
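
For example, here's the whole procedure in NumPy for \( \hat{S_x} \) written in the \( \hat{S_z} \) basis (units where \( \hbar = 1 \)): `np.linalg.eigh` hands back the eigenvalues together with the eigenvectors already stacked as the columns of \( \hat{U} \), and we can also confirm, anticipating the remark below, that a unitary-equivalent operator has the same spectrum.

```python
import numpy as np

hbar = 1.0
sx = (hbar / 2) * np.array([[0, 1], [1, 0]], dtype=complex)   # S_x in the S_z basis

evals, U = np.linalg.eigh(sx)      # eigenvectors come back as the columns of U
print(evals)                       # [-hbar/2, +hbar/2]

# Stacking the eigenvectors as columns is exactly the change-of-basis matrix:
print(np.allclose(U.conj().T @ sx @ U, np.diag(evals)))       # True: diagonalized

# Unitary-equivalent operators share the same spectrum
rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
V = np.linalg.qr(M)[0]             # an arbitrary unitary matrix
print(np.allclose(np.linalg.eigvalsh(V.conj().T @ sx @ V), evals))   # True
```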

One final word: although the eigenvectors of an operator obviously depend on our basis, the eigenvalues do not, i.e. the operators \( \hat{A} \) and \( \hat{U}^\dagger \hat{A} \hat{U} \) have the same eigenvalues. Two operators related by such a transformation are known as unitary equivalent; the proof that their spectrum (set of eigenvalues) is identical is in Sakurai. Many symmetries can be stated in terms of unitary operators; for example, we will see that spatial rotations can be expressed as unitary operators, from which we could have predicted that the various spin-component operators in the Stern-Gerlach experiment all have exactly the same eigenvalues \( \pm \hbar/2 \).

Continuous operators

We've been working extensively with an example of a finite-dimensional Hilbert space, in fact a two-dimensional space, which is the smallest interesting example. But remember that our motivation for setting up this formalism in the first place was to be able to deal with both finite and infinite-dimensional spaces. We know from the postulates that the outcome of any measurement is an eigenvalue of the corresponding operator; so any observable with a continuous spectrum, like momentum or position, must have an infinite number of eigenvalues, and therefore must act on an infinite-dimensional Hilbert space.

By design, the mathematics of continuous observables doesn't look very different! If \( \hat{\xi} \) is an observable in an infinite-dimensional space, then its eigenvalue equation looks familiar:

\[ \begin{aligned} \hat{\xi} \ket{\xi} = \xi \ket{\xi} \end{aligned} \]

There are some small differences when we go from discrete to continuous:

| | Discrete | Continuous |
| --- | --- | --- |
| Completeness | \( \hat{1} = \sum_a \ket{a} \bra{a} \) | \( \hat{1} = \int d\xi\ \ket{\xi} \bra{\xi} \) |
| Orthogonality | \( \sprod{a_i}{a_j} = \delta_{ij} \) | \( \sprod{\xi}{\xi'} = \delta(\xi - \xi') \) |
| Normalization | \( \sum_i \lvert \sprod{a_i}{\psi} \rvert^2 = 1 \) | \( \int d\xi\ \lvert \sprod{\xi}{\psi} \rvert^2 = 1 \) |
| Matrix elements | \( \bra{a_i}\hat{A}\ket{a_j} = a_j \delta_{ij} \) | \( \bra{\xi}\hat{\xi}\ket{\xi'} = \xi' \delta(\xi - \xi') \) |

For the most part, the only difference in going to continuous observables is that we replace sums with integrals, and Kronecker delta symbols with the Dirac delta function, \( \delta(x) \). Next time, we'll dive in deeper.
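
As a tiny numerical illustration of the sum-to-integral replacement (my own example, using a Gaussian wavefunction on a finite grid), the continuous normalization condition from the table becomes an ordinary Riemann sum:

```python
import numpy as np

# A normalized Gaussian "wavefunction" psi(xi) = <xi|psi> sampled on a fine grid
xi = np.linspace(-10, 10, 2001)
dxi = xi[1] - xi[0]
psi = (1 / np.pi) ** 0.25 * np.exp(-xi**2 / 2)

# Discrete:   sum_i |<a_i|psi>|^2 = 1
# Continuous: int dxi |<xi|psi>|^2 = 1, approximated here by a Riemann sum
print(np.sum(np.abs(psi)**2) * dxi)     # ~ 1.0
```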