Postulates of QM and two-state systems

Last time, we started introducing the postulates of quantum mechanics. These are the foundational principles that will finally let us talk about physics, using the mathematical formalism of Hilbert spaces that we've developed. In fact, the first postulate that we introduced is exactly that we will be working in Hilbert spaces:

Postulate 1: (States) The state of a physical system is represented by a normalized ket in a Hilbert space \( \mathcal{H} \).

Today we'll continue with the remaining postulates, using the Stern-Gerlach experiment as our guide. The Stern-Gerlach experiment is an example of a two-state system, which is very simple but still exhibits rich and interesting behavior.

We begin by labeling the two discrete outputs of a single \( \hat{z} \)-direction SG analyzer as \( \ket{+} \) for \( S_z = +\hbar/2 \), and \( \ket{-} \) for \( S_z = -\hbar/2 \). Any input beam gives one of these two outputs, which means that we can write any state as a sum over them:

\[ \begin{aligned} \ket{\psi} = \alpha \ket{+} + \beta \ket{-} \end{aligned} \]

By the first postulate this state must be normalized, which means \( \sprod{\psi}{\psi} = |\alpha|^2 + |\beta|^2 = 1 \). We can also write \( \ket{\psi} \) as a column vector, which will be convenient:

\[ \begin{aligned} \ket{\psi} = \left( \begin{array}{c} \alpha \\ \beta \end{array} \right). \end{aligned} \]
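As a small numerical illustration, here is a minimal Python sketch that stores a state as its two complex amplitudes and enforces the normalization condition. The helper name `normalize` and the sample amplitudes are assumptions made for this example only.

```python
import math

def normalize(alpha, beta):
    """Rescale (alpha, beta) so that |alpha|^2 + |beta|^2 = 1."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    if norm == 0:
        raise ValueError("the zero vector is not a physical state")
    return alpha / norm, beta / norm

# Hypothetical unnormalized amplitudes, chosen only for illustration.
alpha, beta = normalize(3 + 0j, 4j)

# <psi|psi> for the column vector (alpha, beta).
norm_squared = abs(alpha) ** 2 + abs(beta) ** 2
print(alpha, beta, norm_squared)
```

Note that the rescaling fixes only the overall length of the vector; the relative size and phase of the two amplitudes are untouched.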

Alright, now that we know how to describe a physical state, what can we do with it?

Postulate 2: (Observables) All physical observables correspond to Hermitian operators.
Postulate 3: (Measurement) When an observable \( \hat{A} \) is measured, the outcome is always an eigenvalue \( a \) of \( \hat{A} \). After measurement, the state collapses into the corresponding eigenstate, \( \ket{\psi} \rightarrow \ket{a} \).

We require Hermitian operators because, as we proved last time, they have real eigenvalues, which (by postulate 3) correspond to the real numbers that come out of our experiments.

The Stern-Gerlach device oriented in the \( \hat{z} \) direction is an example of an observable; it measures the \( \hat{z} \)-component of the electron spin \( S_z \). We observed that there were two possible outcomes of the experiment, \( S_z = \pm \hbar/2 \), corresponding to the two eigenstates we're using as our basis. In matrix form, we can write the corresponding operator as

\[ \begin{aligned} \hat{S_z} = \frac{\hbar}{2} \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right). \end{aligned} \]

Our chosen basis states \( \ket{+} \) and \( \ket{-} \) are the eigenvectors of this Hermitian operator; as a result, they form an orthonormal basis. Another way to see the structure of this matrix is to write it explicitly as a set of matrix elements:

\[ \begin{aligned} \hat{S_z} = \left( \begin{array}{cc} \bra{+} \hat{S_z} \ket{+} & \bra{+} \hat{S_z} \ket{-} \\ \bra{-} \hat{S_z} \ket{+} & \bra{-} \hat{S_z} \ket{-} \end{array} \right) \end{aligned} \]

We know from experiment that if we begin with a pure output from a \( \hat{z} \)-direction analyzer, the next time we measure we're guaranteed to get the same outcome, either \( \pm \hbar/2 \). Thus, the "collapse" of postulate 3 does nothing, and we have

\[ \begin{aligned} \hat{S_z} \ket{\pm} = \pm \frac{\hbar}{2} \ket{\pm}. \end{aligned} \]

That gives the diagonal entries, and the orthogonality of eigenvectors \( \ket{+} \) and \( \ket{-} \) with each other gives the off-diagonal entries.

One important point was buried in that last argument: under certain circumstances, we were guaranteed to get a certain output. However, we know that quantum mechanics is probabilistic; for each individual atom passing through the \( \hat{S_z} \) measurement apparatus, we don't generally know if we will record spin-up or spin-down. (Implicit in postulate 3 is the fact that we can't predict which eigenvalue of \( \hat{A} \) will be measured as the outcome.) However, we can predict the probability of each measurement:

Postulate 4: (Born's Rule) If a system is in state \( \ket{\psi} \), then the probability of obtaining the outcome \( a \), with corresponding eigenstate \( \ket{a} \), from a measurement is \( \text{pr}(a) = |\sprod{a}{\psi}|^2 \).

We should certainly insist that the total probability of all possible outcomes is equal to 1, or

\[ \begin{aligned} 1 = \sum_i |\sprod{a_i}{\psi}|^2 = \bra{\psi} \left( \sum_i \ket{a_i} \sprod{a_i}{\psi} \right) = \sprod{\psi}{\psi}. \end{aligned} \]

This is why we required that all physical states should be normalized above!
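To make Born's rule concrete, the following sketch computes the two outcome probabilities for an \( \hat{S_z} \) measurement and checks that they sum to 1. The function names and the example state are assumptions chosen for illustration.

```python
import math

def inner(a, b):
    """Inner product <a|b>; the bra components are complex-conjugated."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def born_probability(outcome, psi):
    """Probability |<outcome|psi>|^2 of finding psi in the state `outcome`."""
    return abs(inner(outcome, psi)) ** 2

plus = (1 + 0j, 0j)    # |+>
minus = (0j, 1 + 0j)   # |->

# Hypothetical example state: alpha = 1/2, beta = i*sqrt(3)/2.
psi = (0.5 + 0j, 1j * math.sqrt(3) / 2)

p_plus = born_probability(plus, psi)
p_minus = born_probability(minus, psi)
print(p_plus, p_minus, p_plus + p_minus)
```

The phase of \( \beta \) drops out of both probabilities here, since each one depends only on the magnitude of a single amplitude.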

If we repeat a measurement many times, the average value of the measurement is given by the expectation value:

\[ \begin{aligned} \ev{\hat{A}} = \sum_i \text{pr}(a_i) a_i \\ = \sum_i |\sprod{a_i}{\psi}|^2 a_i \\ = \sum_i \sprod{\psi}{a_i} a_i \sprod{a_i}{\psi} \\ = \sum_i \bra{\psi} \hat{A} \ket{a_i} \sprod{a_i}{\psi} \\ = \bra{\psi} \hat{A} \ket{\psi}. \end{aligned} \]

The spread in these measurements (the width of the distribution) is also interesting, and is determined by the variance \( \ev{\hat{A}^2} - \ev{\hat{A}}^2 \), which we can evaluate similarly by sandwiching with the state vector \( \ket{\psi} \).
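Here is a short sketch of both formulas for \( \hat{S_z} \), comparing the sandwich expression with the probability-weighted sum. It assumes units with \( \hbar = 1 \) and uses a hypothetical example state; the helper names are not from the lecture.

```python
import math

HBAR = 1.0  # assumption: units in which hbar = 1

def inner(a, b):
    """Inner product <a|b>; the bra components are complex-conjugated."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def apply(op, ket):
    """Act with a 2x2 matrix (nested tuples) on a two-component ket."""
    return tuple(sum(op[i][j] * ket[j] for j in range(2)) for i in range(2))

S_z = ((HBAR / 2, 0.0), (0.0, -HBAR / 2))

# Hypothetical example state: alpha = 1/2, beta = i*sqrt(3)/2.
psi = (0.5 + 0j, 1j * math.sqrt(3) / 2)

# Sandwich formulas <psi|A|psi> and <psi|A^2|psi>.
mean = inner(psi, apply(S_z, psi)).real
mean_square = inner(psi, apply(S_z, apply(S_z, psi))).real
variance = mean_square - mean ** 2

# The same mean from the probability-weighted sum over eigenvalues.
mean_from_probs = (HBAR / 2) * abs(psi[0]) ** 2 + (-HBAR / 2) * abs(psi[1]) ** 2

print(mean, mean_from_probs, variance)
```

For this state the mean is \( -\hbar/4 \) and the variance is \( 3\hbar^2/16 \); the variance would vanish only if the state were one of the two \( \hat{S_z} \) eigenstates.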

Not all operators we will encounter are Hermitian, of course. In the context of the two-state system, there is a pair of particularly useful operators

\[ \begin{aligned} S_+ = \ket{+} \bra{-} = \left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right) \\ S_- = \ket{-} \bra{+} = \left( \begin{array}{cc} 0 & 0 \\ 1 & 0 \end{array} \right). \end{aligned} \]

These are called raising and lowering operators, respectively; the raising operator converts the state \( \ket{-} \) into \( \ket{+} \), and vice-versa for the lowering operator. They are not Hermitian, and hence do not correspond to any physical observable; but they happen to be useful mathematical objects when working with the two-state system, or especially when we return to spin and angular momentum in general.
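The action of these two matrices can be checked directly. The sketch below is a minimal illustration with hypothetical helper names (`apply`, `dagger`); it uses the dimensionless matrices exactly as written above.

```python
def apply(op, ket):
    """Act with a 2x2 matrix (nested tuples) on a two-component ket."""
    return tuple(sum(op[i][j] * ket[j] for j in range(2)) for i in range(2))

def dagger(op):
    """Hermitian conjugate: transpose and complex-conjugate."""
    return tuple(
        tuple(complex(op[j][i]).conjugate() for j in range(2)) for i in range(2)
    )

S_plus = ((0, 1), (0, 0))    # |+><-|
S_minus = ((0, 0), (1, 0))   # |-><+|
plus, minus = (1, 0), (0, 1)

raised = apply(S_plus, minus)       # S_+ |-> = |+>
annihilated = apply(S_plus, plus)   # S_+ |+> = 0
lowered = apply(S_minus, plus)      # S_- |+> = |->

is_hermitian = dagger(S_plus) == S_plus       # False: S_+ is not Hermitian
is_adjoint_pair = dagger(S_plus) == S_minus   # True: S_- is the adjoint of S_+
print(raised, annihilated, lowered, is_hermitian, is_adjoint_pair)
```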

There will be one final postulate related to time evolution, but for now, let's work with these four and try to see some implications and important properties of the two-state system.

Two-state systems

We'll continue now with the Stern-Gerlach experiment, but I should note first that we're not doing this because we're really interested in the spin of silver atoms! The two-state system is actually extremely general and interesting. A few applications include:

Lasers are a very important application of the two-state system (even if the laser actually has more states, we can approximate it by a two-state system to study individual processes, e.g. pumping, spontaneous and stimulated emission.)
Atomic transitions, either in isolation or interacting with an electromagnetic field, can likewise often be approximated by a two-state system if one transition is dominant. Nuclear magnetic resonance imaging is one example of many here.
Neutrinos are elementary particles which, even in vacuum, can oscillate from one species into another. (Once again, there are three neutrino species that we know of, but we can generally describe the oscillations in terms of a set of two-state systems.)
Quantum computers are built from a variety of two-state systems, which in that context are also known as qubits.

Basically everything we're about to go through can be applied to any of these systems. For concreteness, I'll stick to Stern-Gerlach as the motivating example. Our starting point is the general state

\[ \begin{aligned} \ket{\psi} = \alpha \ket{+} + \beta \ket{-} \end{aligned} \]

which again, must be normalized, giving the restriction \( |\alpha|^2 + |\beta|^2 = 1 \). Applying normalization leaves three real numbers as free parameters. Let's parameterize them in the following form:

\[ \begin{aligned} \ket{\psi} = e^{i\zeta} \left[\cos(\theta/2) \ket{+} + \sin(\theta/2) e^{i \phi} \ket{-} \right] \end{aligned} \]

The number \( \zeta \) here is a global phase: we can simply absorb it without changing any of the physics. Basically, this is because only squared amplitudes are physical. Suppose we have a second arbitrary state \( \ket{\psi'} \). Then the inner product of the two states is

\[ \begin{aligned} \sprod{\psi'}{\psi} = e^{i(\zeta - \zeta')} \left[ \cos(\theta'/2) \cos(\theta/2) + \sin(\theta'/2) \sin(\theta/2) e^{i(\phi-\phi')} \right]. \end{aligned} \]

But the state overlap itself is not observable - only the square of the overlap \( |\sprod{\psi'}{\psi}|^2 \) is, through Born's rule (postulate 4.) So the phases \( \phi \) and \( \phi' \) matter because they will survive in the square, but \( \zeta \) and \( \zeta' \) vanish from the probability no matter what. Another way to state the irrelevance of the global phase is that we have a symmetry: the states \( \ket{\psi} \) and \( e^{i\zeta} \ket{\psi} \) both describe the same physics, because they'll always give the same probabilities via Born's rule.
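The difference between the global phase \( \zeta \) and the relative phase \( \phi \) is easy to see numerically. The sketch below builds states from the parameterization above and compares overlap probabilities; the function names and the particular angles are arbitrary choices made for illustration.

```python
import cmath
import math

def bloch_state(theta, phi, zeta=0.0):
    """e^{i zeta} [cos(theta/2)|+> + sin(theta/2) e^{i phi}|->] as (alpha, beta)."""
    global_phase = cmath.exp(1j * zeta)
    return (
        global_phase * math.cos(theta / 2),
        global_phase * math.sin(theta / 2) * cmath.exp(1j * phi),
    )

def overlap_probability(a, b):
    """|<a|b>|^2 for two-component states."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2

other = bloch_state(2.0, 0.7, zeta=-0.4)

psi = bloch_state(1.0, 0.3)
psi_global = bloch_state(1.0, 0.3, zeta=2.1)   # only zeta changed
psi_relative = bloch_state(1.0, 1.5)           # only phi changed

p = overlap_probability(other, psi)
p_global = overlap_probability(other, psi_global)
p_relative = overlap_probability(other, psi_relative)
print(p, p_global, p_relative)
```

Changing \( \zeta \) leaves the probability unchanged to floating-point precision, while changing \( \phi \) gives a visibly different value.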

Thus, we actually only need two real numbers to fully describe an arbitrary state. As suggested by the use of angles, we can visualize this nicely as a point on the surface of a sphere (called the Bloch sphere after the physicist of the same name). The choice above corresponds to putting the states \( \ket{+} \) and \( \ket{-} \) at the north and south poles respectively:

\[ \begin{aligned} \ket{\psi} = \cos(\theta/2) \ket{+} + \sin(\theta/2) e^{i \phi} \ket{-} \end{aligned} \]

I should note in passing that only needing two real numbers is special to the two-state system; for a three-state system, for example, we would need four real parameters (three complex numbers, minus one normalization condition, minus one global phase).
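Going in the other direction, from amplitudes to Bloch-sphere angles, can be sketched as follows. The function `bloch_angles` is a hypothetical helper; it assumes a normalized input state and, as a convention adopted only for this example, sets \( \phi = 0 \) at the poles, where the relative phase is undefined.

```python
import cmath
import math

def bloch_angles(alpha, beta):
    """Return (theta, phi) for a normalized state alpha|+> + beta|->.

    The global phase is discarded: only |alpha|, |beta| and the relative
    phase arg(beta) - arg(alpha) enter.
    """
    theta = 2 * math.atan2(abs(beta), abs(alpha))
    if abs(alpha) < 1e-12 or abs(beta) < 1e-12:
        phi = 0.0  # convention at the poles, where phi is undefined
    else:
        phi = (cmath.phase(beta) - cmath.phase(alpha)) % (2 * math.pi)
    return theta, phi

r = 1 / math.sqrt(2)
north = bloch_angles(1 + 0j, 0j)         # |+>
south = bloch_angles(0j, 1 + 0j)         # |->
equator = bloch_angles(r + 0j, 1j * r)   # theta = pi/2, phi = pi/2
print(north, south, equator)
```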

What about the other spin directions? Recall one of the experimental facts that we observed about the sequential Stern-Gerlach experiment, in the following configuration:

Sequential z-x Stern-Gerlach experiment.

On the final screen to the right, we find two spots of equal intensity, corresponding to \( S_x = \pm \hbar/2 \). That means that whatever the operator \( \hat{S_x} \) is, it should give us equal probability of finding either eigenvalue for a system prepared in the state \( \ket{+} \). Let's denote the \( x \)-direction eigenstates as \( \ket{S_{x,+}} \) and \( \ket{S_{x,-}} \).

By Born's rule, we therefore know that

\[ \begin{aligned} |\sprod{+}{S_{x,+}}| = |\sprod{+}{S_{x,-}}| = \frac{1}{\sqrt{2}}, \end{aligned} \]

which tells us what the eigenkets of \( \hat{S_x} \) look like:

\[ \begin{aligned} \ket{S_{x,+}} = \frac{1}{\sqrt{2}} \ket{+} + \frac{1}{\sqrt{2}} e^{i \delta_1} \ket{-}, \\ \ket{S_{x,-}} = \frac{1}{\sqrt{2}} \ket{+} - \frac{1}{\sqrt{2}} e^{i \delta_1} \ket{-}. \end{aligned} \]

I've used another piece of information to construct these kets: we expect also that \( \sprod{S_{x,+}}{S_{x,-}} = 0 \), since the Stern-Gerlach device splits the beam into two distinct components. Thinking of these states geometrically on the Bloch sphere, from Born's rule we find that both states exist on the equator \( \theta = \pi/2 \), and the orthogonality condition tells us that if \( \phi_+ = \delta_1 \), then \( \phi_- = \delta_1 + \pi \). Without further information, we can't fix the phase \( \delta_1 \).

Given the eigenstates, we can reconstruct the operator \( \hat{S_x} \), which we know is diagonal in the basis of its own eigenstates, with entries equal to the eigenvalues:

\[ \begin{aligned} \hat{S_x} = \frac{\hbar}{2} \left( \ket{S_{x,+}} \bra{S_{x,+}} - \ket{S_{x,-}} \bra{S_{x,-}} \right) \\ = \frac{\hbar}{4} \left( 2 e^{-i\delta_1} \ket{+} \bra{-} + 2e^{i\delta_1} \ket{-} \bra{+} \right) \\ = \frac{\hbar}{2} \left( \begin{array}{cc} 0 & e^{-i \delta_1} \\ e^{i \delta_1} & 0 \end{array} \right). \end{aligned} \]

You can double-check the algebra here by computing the eigenvectors and eigenvalues of the matrix in the usual way and making sure we reconstruct the vectors that we started with.
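One way to carry out this check numerically is sketched below: rather than diagonalizing, it applies the matrix to each proposed eigenket and confirms that the result is \( \pm \hbar/2 \) times the same ket. It assumes \( \hbar = 1 \) and an arbitrary sample value of \( \delta_1 \); all function names are hypothetical.

```python
import cmath
import math

HBAR = 1.0  # assumption: units in which hbar = 1

def sx_operator(delta):
    """S_x matrix in the S_z basis for an undetermined phase delta."""
    return (
        (0.0, HBAR / 2 * cmath.exp(-1j * delta)),
        (HBAR / 2 * cmath.exp(1j * delta), 0.0),
    )

def sx_eigenket(sign, delta):
    """(|+> + sign * e^{i delta} |->) / sqrt(2), with sign = +1 or -1."""
    return (1 / math.sqrt(2), sign * cmath.exp(1j * delta) / math.sqrt(2))

def apply(op, ket):
    """Act with a 2x2 matrix (nested tuples) on a two-component ket."""
    return tuple(sum(op[i][j] * ket[j] for j in range(2)) for i in range(2))

def is_eigenket(op, ket, eigenvalue, tol=1e-12):
    """True if op|ket> equals eigenvalue * |ket> within the tolerance."""
    image = apply(op, ket)
    return all(abs(image[i] - eigenvalue * ket[i]) < tol for i in range(2))

delta = 0.7  # arbitrary sample value; the check should pass for any delta
checks = [
    is_eigenket(sx_operator(delta), sx_eigenket(sign, delta), sign * HBAR / 2)
    for sign in (+1, -1)
]
print(checks)
```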

Is there any more information we can use to fix the extra phase factor? Remember that we can orient the S-G device in a third independent direction, along the \( y \) axis. In fact, we know that there is no difference between performing the sequential S-G experiment in the order z-x-z or z-y-z, which means that the construction of the \( \hat{S_y} \) eigenstates is basically identical, i.e.

\[ \begin{aligned} \ket{S_{y,+}} = \frac{1}{\sqrt{2}} \ket{+} + \frac{1}{\sqrt{2}} e^{i \delta_2} \ket{-}, \\ \ket{S_{y,-}} = \frac{1}{\sqrt{2}} \ket{+} - \frac{1}{\sqrt{2}} e^{i \delta_2} \ket{-}. \end{aligned} \]

Geometrically, the \( \hat{S_y} \) eigenstates are still on the equator of the Bloch sphere, and are diametrically opposed to each other around the sphere (their \( \phi \) values again differ by \( \pi \).) The operator \( \hat{S_y} \) is the same as \( \hat{S_x} \) above, but with \( \delta_1 \) replaced by \( \delta_2 \).

We have one more experimental input: we know there is nothing special about the \( z \)-direction, so performing the sequential experiment in, say, the order x-y-x has to give exactly the same outcome, which tells us that

\[ \begin{aligned} |\sprod{S_{y,\pm}}{S_{x,+}}| = |\sprod{S_{y,\pm}}{S_{x,-}}| = \frac{1}{\sqrt{2}}. \end{aligned} \]

Now we multiply out the kets we found above. For example, the product between both "+" components gives

\[ \begin{aligned} \sprod{S_{y,+}}{S_{x,+}} = \frac{1}{2} \left( 1 + e^{i(\delta_1 - \delta_2)} \right). \end{aligned} \]

Looking at all four products, we find two distinct equations,

\[ \begin{aligned} \left|1 \pm e^{i(\delta_1 - \delta_2)}\right| = \sqrt{2} \end{aligned} \]

the only solutions to which are

\[ \begin{aligned} \delta_1 - \delta_2 = \pm \frac{\pi}{2}. \end{aligned} \]
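To see why, here is the short intermediate step, using the shorthand \( \Delta = \delta_1 - \delta_2 \) (introduced only for this calculation). Squaring the magnitude gives

\[ \begin{aligned} \left|1 \pm e^{i\Delta}\right|^2 = \left(1 \pm e^{i\Delta}\right) \left(1 \pm e^{-i\Delta}\right) = 2 \pm 2 \cos \Delta = 2, \end{aligned} \]

so both signs require \( \cos \Delta = 0 \), which means \( \Delta = \pm \pi/2 \) up to multiples of \( 2\pi \).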

This finally makes rigorous a point I made in passing in the second lecture, namely that all of the kets and/or matrix elements cannot be simultaneously real for spin in the three spatial directions. If we set \( \delta_1 = 0 \) to make the \( \hat{S_x} \) eigenkets real, then the \( \hat{S_y} \) kets are complex.

There are still, apparently, two ambiguities left in our total specification of the system: the value of \( \delta_1 \), and the sign of \( \delta_1 - \delta_2 \). We haven't developed the machinery to consider symmetries, in particular changes in coordinate systems, just yet, but both of these choices amount to a choice of coordinates. It's easiest to see this geometrically from the Bloch sphere: our \( \hat{S_z} \) eigenstates are at the poles, and \( \hat{S_x} \) and \( \hat{S_y} \) exist on the equator, equally spaced and separated by \( \phi = \pi/2 \).

It's clear from the picture that choosing \( \delta_1 \) just amounts to choosing the position of the second coordinate axis (we've already forced one axis through the \( \hat{S_z} \) states on the poles.) The choice of sign for \( \delta_1 - \delta_2 \) is a convention, namely whether we are working in a left-handed or right-handed coordinate system. We'll use standard right-handed coordinates in this class, which if we take \( \delta_1 = 0 \) gives \( \delta_2 = \pi/2 \) as the correct choice.
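A quick numerical check of this choice is sketched below: with \( \delta_1 = 0 \) and \( \delta_2 = \pi/2 \), all four overlap probabilities between \( \hat{S_x} \) and \( \hat{S_y} \) eigenstates come out to \( 1/2 \), while a generic value of \( \delta_2 \) fails. The helper names and the "bad" sample value 0.3 are assumptions for illustration.

```python
import cmath
import math

def equatorial_ket(sign, delta):
    """(|+> + sign * e^{i delta} |->) / sqrt(2), a state on the Bloch equator."""
    return (1 / math.sqrt(2), sign * cmath.exp(1j * delta) / math.sqrt(2))

def probability(outcome, state):
    """Born's rule: |<outcome|state>|^2."""
    return abs(sum(x.conjugate() * y for x, y in zip(outcome, state))) ** 2

def xy_probabilities(delta_1, delta_2):
    """All four |<S_y,+-|S_x,+->|^2 for the given phases."""
    return [
        probability(equatorial_ket(sy, delta_2), equatorial_ket(sx, delta_1))
        for sy in (+1, -1)
        for sx in (+1, -1)
    ]

good = xy_probabilities(0.0, math.pi / 2)   # the right-handed choice
bad = xy_probabilities(0.0, 0.3)            # arbitrary value that should fail
print(good, bad)
```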

The Pauli matrices

With the phases all fixed, we have for our final set of three operators

\[ \begin{aligned} \hat{S_x} = \frac{\hbar}{2} \left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right) = \frac{\hbar}{2} \hat{\sigma_1} \\ \hat{S_y} = \frac{\hbar}{2} \left( \begin{array}{cc} 0 & -i \\ i & 0 \end{array} \right) = \frac{\hbar}{2} \hat{\sigma_2} \\ \hat{S_z} = \frac{\hbar}{2} \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) = \frac{\hbar}{2} \hat{\sigma_3}, \end{aligned} \]

where the dimensionless \( \hat{\sigma_i} \) are a special set of matrices known as the Pauli matrices. The Pauli matrices appear whenever we study the interactions of a spin-\( 1/2 \) particle like the electron, so it's useful to know some algebraic identities. They all square to the identity,

\[ \begin{aligned} \hat{\sigma_i}^2 = \hat{1}. \end{aligned} \]

They obey the commutation relations

\[ \begin{aligned} [\hat{\sigma_1}, \hat{\sigma_2}] = 2i \hat{\sigma_3} \\ [\hat{\sigma_1}, \hat{\sigma_3}] = -2i \hat{\sigma_2} \\ [\hat{\sigma_2}, \hat{\sigma_3}] = 2i \hat{\sigma_1} \end{aligned} \]

or more compactly,

\[ \begin{aligned} [\hat{\sigma_i}, \hat{\sigma_j}] = 2i \epsilon_{ijk} \hat{\sigma_k} \end{aligned} \]

where \( \epsilon_{ijk} \) is the Levi-Civita symbol, an object which is equal to 0 if any of \( i,j,k \) are equal, \( \epsilon_{123} = 1 \), and which changes sign if we permute any two indices (so \( \epsilon_{ijk} = -\epsilon_{ikj} = \epsilon_{kij} \).) Finally, they also satisfy an anti-commutation relation,

\[ \begin{aligned} \{\hat{\sigma_i}, \hat{\sigma_j}\} = 2 \delta_{ij} \hat{1} \end{aligned} \]

where the anti-commutator is a sort of complementary symbol to the commutator, defined as

\[ \begin{aligned} \{\hat{A}, \hat{B}\} \equiv \hat{A} \hat{B} + \hat{B} \hat{A}. \end{aligned} \]
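All three identities can be verified by brute force. The sketch below is one minimal way to do it in plain Python; the helper names are hypothetical, and indices run over 0, 1, 2 rather than 1, 2, 3. Because every matrix entry is one of \( 0, \pm 1, \pm i \) (times small integers), exact equality comparisons are safe here.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def linear(ca, a, cb, b):
    """Entrywise linear combination ca*a + cb*b."""
    return tuple(
        tuple(ca * a[i][j] + cb * b[i][j] for j in range(2)) for i in range(2)
    )

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

IDENTITY = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))
SIGMA = (
    ((0, 1), (1, 0)),      # sigma_1
    ((0, -1j), (1j, 0)),   # sigma_2
    ((1, 0), (0, -1)),     # sigma_3
)

squares_ok = all(matmul(s, s) == IDENTITY for s in SIGMA)

commutators_ok = True
anticommutators_ok = True
for i in range(3):
    for j in range(3):
        ab = matmul(SIGMA[i], SIGMA[j])
        ba = matmul(SIGMA[j], SIGMA[i])
        # Right-hand side of the commutation relation: 2i * sum_k eps_ijk sigma_k.
        expected = ZERO
        for k in range(3):
            expected = linear(1, expected, 2j * levi_civita(i, j, k), SIGMA[k])
        commutators_ok &= linear(1, ab, -1, ba) == expected
        # Right-hand side of the anti-commutation relation: 2 * delta_ij * identity.
        delta_ij = 1 if i == j else 0
        anticommutators_ok &= linear(1, ab, 1, ba) == linear(2 * delta_ij, IDENTITY, 0, ZERO)

print(squares_ok, commutators_ok, anticommutators_ok)
```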

Commutation and compatible observables

As I hinted before, the commutator is a particularly interesting property to look at between operators. We've seen in the Stern-Gerlach experiment that the order in which we apply measurements matters: for example, projecting upper components in the order \( S_{z,+}, S_{x,+}, S_{z,-} \) gives a non-zero signal, but changing the order to \( S_{z,+}, S_{z,-},S_{x,+} \) does not.

Two different measurement orders for a Stern-Gerlach experiment.

This non-commutation of measurements is easy to see in the operator formalism. We know that when we make a measurement, the state of the system collapses into one of the eigenstates of whatever operator we observe, for example

\[ \begin{aligned} \hat{S_z} \ket{\psi} \rightarrow \pm \frac{\hbar}{2} \ket{S_{z,\pm}}. \end{aligned} \]

Clearly if we perform the same measurement again (ignoring time evolution of the state itself, which we'll get to), Born's rule tells us that we find the same eigenvalue with probability 1. However, if we perform a different measurement, we in general throw the system into a different eigenstate, for example measuring \( \hat{S_x} \) after \( \hat{S_z} \) gives us either of the \( \hat{S_x} \) eigenstates with probability \( 1/2 \).
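The two measurement orders from the Stern-Gerlach example can be compared with a few lines of code. In this sketch, each stage keeps one eigenstate, and Born's rule gives the fraction of the beam that survives from one stage to the next. The function names are hypothetical, and \( \ket{S_{x,+}} \) is taken with the phase convention \( \delta_1 = 0 \) adopted above.

```python
import math

def probability(outcome, state):
    """Born's rule: |<outcome|state>|^2."""
    return abs(sum(x.conjugate() * y for x, y in zip(outcome, state))) ** 2

def chain_probability(states):
    """Fraction of a beam prepared in states[0] that passes every later stage.

    After each stage the state has collapsed to the selected eigenstate, so
    the total probability is a product of single-step Born probabilities.
    """
    total = 1.0
    for previous, selected in zip(states, states[1:]):
        total *= probability(selected, previous)
    return total

z_plus = (1.0, 0.0)
z_minus = (0.0, 1.0)
x_plus = (1 / math.sqrt(2), 1 / math.sqrt(2))

p_zxz = chain_probability([z_plus, x_plus, z_minus])   # order S_z+, S_x+, S_z-
p_zzx = chain_probability([z_plus, z_minus, x_plus])   # order S_z+, S_z-, S_x+
print(p_zxz, p_zzx)
```

A quarter of the \( S_{z,+} \) beam survives the first ordering, while the second ordering is blocked completely at the \( S_{z,-} \) stage.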

An obvious exception to this is when two operators share a common eigenstate, i.e. if both

\[ \begin{aligned} \hat{A} \ket{\chi} = a \ket{\chi}, \\ \hat{B} \ket{\chi} = b \ket{\chi}. \end{aligned} \]

Whenever it's clear, we'll label such a state by both eigenvalues, i.e. \( \ket{\chi} = \ket{a,b} \). When acting on such a state, it's easy to see that the order of operations doesn't matter:

\[ \begin{aligned} \hat{A} \hat{B} \ket{a,b} = \hat{A} b \ket{a,b} = ab \ket{a,b} \\ \hat{B} \hat{A} \ket{a,b} = \hat{B} a \ket{a,b} = ba \ket{a,b}. \end{aligned} \]

A more powerful statement is that the operators themselves commute, i.e. \( [\hat{A}, \hat{B}] = 0 \). If \( \hat{A} \) and \( \hat{B} \) are observables, then they are said to be compatible if they commute. As we will see next time, this implies the existence of states like \( \ket{a,b} \), so this isn't just a restricted special case!
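As a quick illustration using operators we already have, the Pauli matrix commutation relations give directly

\[ \begin{aligned} [\hat{S_x}, \hat{S_z}] = \frac{\hbar^2}{4} [\hat{\sigma_1}, \hat{\sigma_3}] = -\frac{i \hbar^2}{2} \hat{\sigma_2} = -i \hbar \hat{S_y} \neq 0, \end{aligned} \]

so \( \hat{S_x} \) and \( \hat{S_z} \) are not compatible, consistent with the order-dependence we saw in the sequential Stern-Gerlach measurements.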