
Why are D-Branes Non-Perturbative?

It’s frequently said that D-branes are non-perturbative objects. In other words, you can’t learn about them by doing a series expansion in the string coupling g. That’s because the DBI action which encodes the dynamics of D-branes couples to the dilaton field via a term e^{-\phi}. Now recall that the string coupling is set by the dilaton VEV, g = e^{\langle\phi\rangle}, and Bob’s your uncle!
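
Schematically, and suppressing the worldvolume gauge field and B-field for clarity, the DBI action reads

\displaystyle S_{\textrm{DBI}} \sim -T_p \int d^{p+1}\xi \, e^{-\phi}\sqrt{-\det (g_{ab})}

so the effective tension of the brane carries a factor of e^{-\langle\phi\rangle} = 1/g. A mass scaling like 1/g blows up at weak coupling and has no expansion in non-negative powers of g, which is the hallmark of a non-perturbative object.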

But there’s a more subtle point at work here. What determines the coupling term e^{-\phi}? For this, we must remember that D-brane dynamics may equivalently be viewed from an open string viewpoint. To get a feel for the DBI action, we can look at the low energy effective action of open strings. Lo and behold we find our promised factor of e^{-\phi}.

Yet erudite readers will know that the story doesn’t end there. Recall that in theories of gravity we are free to redefine the metric by a field-dependent Weyl rescaling. In particular we can choose the rescaling to eliminate the pesky e^{-\phi}. From our new perspective, D-branes aren’t non-perturbative any more!

There’s a price to pay for this, and it’s a steep one. It turns out that this dual description turns strings into non-perturbative objects. This is quite unhelpful, since we know a fair amount about the perturbative behaviour of fundamental strings. So most people stick with the “string frame” in which D-branes are immune to the charms of perturbation theory.

Thanks to Felix Rudolph for an enlightening discussion and for bringing to my attention Tomas Ortin’s excellent volume on Gravity and Strings.

Thanks to David Berman, for pointing out my ridiculous and erroneous claim that D-branes are conformally invariant. My mistake is utterly obvious when you recall the Polyakov action for D-branes, namely

\displaystyle S[X,\gamma]= \int d^{p+1}\xi \sqrt{|\gamma|}\left(\gamma^{ij}g_{ij}+(1-p)\right)

where \xi are worldsheet coordinates, \gamma is the worldsheet metric, and g is the pull-back of the spacetime metric. Clearly conformal invariance is violated unless p=1.
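
As a quick sanity check, consider a Weyl rescaling \gamma_{ij} \to \Omega^2(\xi)\gamma_{ij} of the worldsheet metric. Then \sqrt{|\gamma|} \to \Omega^{p+1}\sqrt{|\gamma|} while \gamma^{ij} \to \Omega^{-2}\gamma^{ij}, so

\displaystyle \sqrt{|\gamma|}\,\gamma^{ij}g_{ij} \to \Omega^{p-1}\sqrt{|\gamma|}\,\gamma^{ij}g_{ij} \qquad \textrm{and} \qquad \sqrt{|\gamma|}\,(1-p) \to \Omega^{p+1}\sqrt{|\gamma|}\,(1-p)

The kinetic term is invariant precisely when p=1, in which case the offending second term also vanishes identically.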

From this perspective it’s even more remarkable that one gets conformally invariant \mathcal{N}=4 SYM in 4D from the low energy action of a stack of D-branes. This dimensional conspiracy is lucidly unravelled on page 192 of David Tong’s notes. But even for a 3-brane, the higher derivative operators in the \alpha' expansion ruin conformal invariance.

Incidentally, the lack of conformal invariance is a key reason why D-branes remain so mysterious. When we quantize strings, conformal invariance is enormously helpful. Without this crutch, quantization of D-branes becomes unpleasantly technical, hence our lack of knowledge!

Renormalization and Super Yang Mills Theory

It’s well known that \mathcal{N}=4 super Yang-Mills theory is perturbatively finite. This means that there’s no need to introduce a regulating cutoff to get sensible answers for scattering amplitude computations. In particular the \beta and \gamma functions for the theory vanish.

Recall that the \gamma function tells us about the anomalous dimensions of elementary fields. More specifically, if \phi is some field appearing in the Lagrangian, it must be rescaled by Z^{1/2} during renormalization. The \gamma function then satisfies

\displaystyle \gamma(g)=\frac{1}{2}\mu\frac{d}{d\mu}\log Z(g,\mu)

where g is the coupling and \mu the renormalization scale. At a fixed point g_* of the renormalization group flow, it can be shown that \gamma(g_*) exactly encodes the difference between the classical dimension of \phi and its quantum scaling dimension.
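
Concretely, at a fixed point the two-point function of \phi takes the scale-invariant form

\displaystyle \langle \phi(x)\phi(0)\rangle \sim \frac{1}{|x|^{2(\Delta_0+\gamma(g_*))}}

where \Delta_0 is the classical (engineering) dimension. The shift \gamma(g_*) is exactly the anomalous dimension advertised above.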

Thankfully we can replace all that dense technical detail with the simple picture of a river. This represents the space of all possible theories, and the mass scale \mu takes the place of usual time evolution. An elementary field operator travelling downstream will experience a change in scaling dimension. If it happens to get drawn into the fixed point in the middle of the whirlpool(!) its anomalous dimension will be exactly encoded by the \gamma function.

For our beloved \mathcal{N}=4 though the river doesn’t flow at all. The theory just lives in one spot all the time, so the elementary field operators just keep their simple, classical dimensions forever!

But there’s a subtle twist in the tale, when you start considering composite operators. These are built up as products of known objects. Naively you might expect that these don’t get renormalized either, but there you would be wrong!

So what’s the problem? Well, we know that propagators have short distance singularities when their separation becomes small. To get sensible answers for the expectation value of composite operators we must regulate these. And that brings back the pesky problem of renormalization with a vengeance.
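
To see the problem in its simplest guise, consider even a free massless scalar in 4D, where

\displaystyle \langle \phi(x)\phi(y)\rangle \sim \frac{1}{|x-y|^2} \to \infty \quad \textrm{as}\quad y \to x

so the naive composite operator \phi^2(x) has divergent correlators and must be defined with a subtraction. In an interacting theory such subtractions generically induce anomalous dimensions for composite operators. In \mathcal{N}=4 the classic example is the Konishi operator \textrm{Tr}\,\Phi^I\Phi^I, which is famously unprotected.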

The punchline is that although \mathcal{N}=4 is finite, the full spectrum of primary operators does contain some with non-trivial scaling dimensions. And that’s just as well really, because otherwise the AdS/CFT correspondence wouldn’t be quite as interesting!

A Tale of Two Calculations

This post is mathematically advanced, but may be worth a skim if you’re a layman who’s curious how physicists do real calculations!

Recently I’ve been talking about the generalized unitarity method, extolling its virtues for 1-loop calculations. Despite all this hoodoo, I have failed to provide a single example of a successful application. Now it’s time for that to change. I’m about to show you just how useful generalized unitarity can be, borrowing examples from \mathcal{N}=4 super-Yang-Mills (SYM) and SU(3) Yang-Mills (YM).

We’ll begin by revising the general form of the generalized unitarity method. In picture form

[Figure: the generalized unitarity cut, equating the residue of a 1-loop integrand with all four loop propagators l_1,\dots,l_4 on-shell to a product of four tree-level blobs.]
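
In equation form, the pictured relation says (schematically, with conventions to be fixed below)

\displaystyle \textrm{Res}\left.\mathcal{A}_n^{1\textrm{-loop}}\right|_{l_1^2=l_2^2=l_3^2=l_4^2=0} = \sum_{\textrm{states}} \mathcal{A}^{\textrm{tree}}_{(1)}\mathcal{A}^{\textrm{tree}}_{(2)}\mathcal{A}^{\textrm{tree}}_{(3)}\mathcal{A}^{\textrm{tree}}_{(4)}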

What exactly does all the notation mean? On the left hand side, I’m referring to the residue of the integrand when all the loop momenta l_i for i = 1,2,3,4 are taken on-shell. On the right hand side, I take a product of tree level diagrams with external lines as shown, and sum over the possible particle content of the l_i lines. Implicit in each of the blobs in the equation is a sum over tree level diagrams.

We’d like to use this formula to calculate 1-loop amplitudes. But hang on, doesn’t it only tell us about residues of integrands? Naively, it seems like that’s too little information to reconstruct the full result.

Fear not, however – help is at hand! Back in 1965, Don Melrose published his first paper. He presciently observed that 1-loop diagrams in four dimensions could be expressed as linear combinations of scalar loop integrals with \leq 4 sides. Later Bern, Dixon and Kosower generalized this result to take account of dimensional regularization.

Let’s express those words mathematically. We have

\displaystyle \mathcal{A}_n^{1\textrm{-loop}} = \sum_i D_i I_4(K^i) + \sum_j C_j I_3 (K^j) + \sum_m B_m I_2 (K^m) + R_n + O(\epsilon)\qquad (*)

where the I_a are integrals corresponding to particular scalar theory diagrams, the K^i indicate the distribution of external momenta among their legs, R_n is a rational function and \epsilon a regulator.

The integrals I_4, I_3 and I_2 are referred to as box, triangle and bubble integrals respectively. This is an obvious homage to their structure as Feynman diagrams. For example a triangle diagram looks like

[Figure: the scalar triangle diagram, with sums of external momenta K_1, K_2, K_3 entering its three corners.]

where K_1, K_2, K_3 label the sums of external momenta at each of the vertices. The Feynman rules give (in dimensional regularization)

\displaystyle I_3(K_1, K_2, K_3) = \mu^{2 \epsilon}\int \frac{d^{4-2\epsilon}l}{(2\pi)^{4-2\epsilon}}\frac{1}{l^2 (l-K_1)^2 (l+K_3)^2}
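
For later convenience, in the same conventions the corresponding box integral is

\displaystyle I_4(K_1, K_2, K_3, K_4) = \mu^{2 \epsilon}\int \frac{d^{4-2\epsilon}l}{(2\pi)^{4-2\epsilon}}\frac{1}{l^2 (l-K_1)^2 (l-K_1-K_2)^2 (l+K_4)^2}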

We call result (*) above an integral basis expansion. It’s useful because the integrands of box, triangle and bubble diagrams have different pole structures. Thus we can reconstruct their coefficients by taking generalized unitarity cuts. Of course, the rational term cannot be determined this way. Theoretically we have reduced our problem to a simpler case, but not completely solved it.

Before we jump into a calculation, it’s worth taking a moment to consider the origin of the rational term. In Melrose’s original analysis, this term was absent. It appears in regularized versions, precisely because the act of regularization gives rise to extra rational terms at O(\epsilon^0). Such terms will be familiar if you’ve studied anomalies.

We can therefore loosely say that rational terms are associated with theories requiring renormalization. (This is not quite true; see page 44 of this review). In particular we know that \mathcal{N}=4 SYM theory is UV finite, so no rational terms appear. In theory, all 1-loop amplitudes are constructible from unitarity cuts alone!

Ignoring the subtleties of IR divergences, let’s press on and calculate an \mathcal{N}=4 SYM amplitude using unitarity. More precisely we’ll tackle the 4-point 1-loop superamplitude. It’s convenient to be conservative and cut only two propagators. To get the full result we need to sum over all channels in which we could make the cut, denoted s = (12), t = (13) and u=(14).

To make our lives somewhat easier, we’ll work in the planar limit of \mathcal{N}=4 SYM. This means we can ignore any diagrams which would be impossible to draw in the plane, in particular the u-channel ones. We make this assumption since it simplifies our analysis of the color structure of the theory. In particular it’s possible to factor out all the gauge group data as a single trace of generators in the planar limit.

Assuming this has been done, we’ll ignore color factors and calculate only the color-ordered amplitudes. We’ve got two channels to consider: s and t. But since the trace is cyclic we can cyclically permute the external lines to equate the s and t channel cuts. Draw a picture if you are skeptical.

So we’re down to considering the s-channel unitarity cut. Explicitly the relevant formula is

[Figure: the s-channel two-particle cut of the 4-point 1-loop superamplitude, with cut lines l_1 and l_2 joining two tree-level blobs.]
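
In equation form the cut reads

\displaystyle \textrm{Res}_s = \sum_{\textrm{states}} \mathcal{A}_4(-l_1, 1, 2, l_2)\, \mathcal{A}_4(-l_2, 3, 4, l_1)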

where \mathcal{A}_4 is the tree level 4-particle superamplitude. Now observe that by necessity \mathcal{A}_4 must be an MHV amplitude. Indeed it is only nonvanishing if exactly two external particles have +ve helicity. Leaving the momentum conservation delta function implicit we quote the standard result

\displaystyle \mathcal{A}_4(-l_1, 1, 2, l_2) = \frac{\delta^{(8)}(L)}{\langle l_1 1\rangle\langle 1 2\rangle\langle 2l_2\rangle\langle l_2 l_1 \rangle}

where \delta^{(8)}(L) is a supermomentum conservation delta function. We get a similar result for the other tree level amplitude, involving a delta function \delta^{(8)}(R). Now by definition of the superamplitude, the sum over states can be effected as an integral over the Grassmann variables \eta_{l_1} and \eta_{l_2}. Under the integral signs we may write

\displaystyle \delta^{(8)}(L) \delta^{(8)}(R) = \delta^{(8)}(L+R)\delta^{(8)}(R) = \delta^{(8)}(\tilde{Q})\delta^{(8)}(R)

where \delta^{(8)}(\tilde{Q}) is the overall supermomentum conservation delta function, which one can always factor out of a superamplitude in a supersymmetric theory. The remaining delta function gives a nonzero contribution in the integral. To evaluate this recall that the Grassmann delta function for a process with n external particles has the form

\displaystyle \delta^{(8)}(R) = \prod_{A=1}^4 \sum_{i<j}^n \langle ij \rangle \eta_{iA}\eta_{jA}

We know that Grassmann integration is the same as differentiation, so

\displaystyle \int d^4 \eta_{l_1} d^4 \eta_{l_2} \delta^{(8)}(R) = \langle l_1 l_2 \rangle ^4

Now plugging this in to the pictured formula we find the s-channel residue to be

\displaystyle \textrm{Res}_s = \frac{\delta^{(8)}(\tilde{Q})\langle l_1 l_2 \rangle^2}{\langle 12 \rangle\langle 34 \rangle \langle l_1 1 \rangle \langle 2 l_2 \rangle \langle l_2 4 \rangle \langle 3 l_1 \rangle} \qquad (\dagger)

Now for the second half of our strategy. We must compare this to the residues from scalar box, triangle and bubble integrands. We aim to pull out a kinematic factor depending on the external momenta, letting the basis integrand residue absorb all factors of loop momenta l_1 and l_2. But which basis integrands contribute to the residue from our unitarity cut?

This is quite easy to spot. Suppose we consider the residue of a loop integrand after a generic unitarity cut. Any remaining dependence on loop momentum l appears as factors of (l-K)^{-2}. These may be immediately matched with uncut loop propagators in the basis diagrams. Simple counting then establishes which basis diagram we want. As an example

\displaystyle \textrm{factor of }(l-K_1)^{-2}(l-K_2)^{-2}\Rightarrow 2 \textrm{ uncut propagators} \Rightarrow \textrm{box diagram}

We’ll momentarily see that this example is exactly the case for our calculation of \mathcal{A}_4^{1\textrm{-loop}}. To accomplish this, we must express the residue (\dagger) in more familiar momentum space variables. Our tools are the trusty identities

\displaystyle \langle ij \rangle [ij] =(p_i + p_j)^2

\displaystyle \sum_i \langle ri \rangle [ik] = 0

The first follows from the definition of the spinor-helicity formalism. Think of it as a consequence of the Weyl equation if you like. The second encodes momentum conservation. We’ve in fact got three sets of momentum conservation relations to play with. There’s one each for the left and right hand tree diagrams, plus the overall (1234) relation.

To start with we can deal with that pesky supermomentum conservation delta function by extracting a factor of the tree level amplitude \mathcal{A}_4^{\textrm{tree}}. This leaves us with

\displaystyle \textrm{Res}_s = \mathcal{A}_4^{\textrm{tree}} \frac{\langle 23 \rangle \langle 41 \rangle \langle l_1 l_2 \rangle^2}{ \langle l_1 1 \rangle \langle 2 l_2 \rangle \langle l_2 4 \rangle \langle 3 l_1 \rangle}

Those factors of loop momenta in the numerator are annoying, because we know there shouldn’t be any in the momentum space result. We can start to get rid of them by multiplying top and bottom by [l_2 2]. A quick round of momentum conservation leaves us with

\displaystyle \textrm{Res}_s = \mathcal{A}_4^{\textrm{tree}} \frac{\langle 23 \rangle \langle 41 \rangle [12] \langle l_1 l_2 \rangle}{(l_2 + p_2)^2\langle l_2 4 \rangle \langle 3 l_1 \rangle}

That seemed to be a success, so let’s try it again! This time the natural choice is [3l_1]. Again momentum conservation leaves us with

\displaystyle \textrm{Res}_s = \mathcal{A}_4^{\textrm{tree}} \frac{\langle 23 \rangle \langle 41 \rangle [12] [34]}{(l_2 + p_2)^2 (l_1+p_3)^2}

Overall momentum conservation in the numerator finally leaves us with

\displaystyle \textrm{Res}_s = -\mathcal{A}_4^{\textrm{tree}} \frac{\langle 12 \rangle [12] \langle 23 \rangle [23]}{(l_2 + p_2)^2 (l_1+p_3)^2} = -\mathcal{A}_4^{\textrm{tree}} \frac{st}{(l_2 + p_2)^2 (l_1+p_3)^2}

where s and t are the standard Mandelstam variables. Phew! That was a bit messy. Unfortunately it’s the price you pay for the beauty of spinor-helicity notation. And it’s a piece of cake compared with the Feynman diagram approach.

Now we can immediately read off the dependence of the residue on loop momenta. We have two factors of the form (l-K)^{-2} so our result matches only the box integral. Therefore the 4-point 1-loop amplitude in \mathcal{N}=4 SYM takes the form

\displaystyle \mathcal{A}_4^{1\textrm{-loop}} = DI_4(p_1,p_2,p_3,p_4)

We determine the kinematic constant D by explicitly computing the I_4 integrand residue on our unitarity cut. This computation quickly yields

\displaystyle \mathcal{A}_4^{1\textrm{-loop}} = st \mathcal{A}_4^{\textrm{tree}}I_4(p_1,p_2,p_3,p_4)

Hooray – we are finally done. Although this looks like a fair amount of work, each step was mathematically elementary. The entire calculation fits on much less paper than the equivalent Feynman diagram approach. Naively you’d need to draw 1-loop diagrams for all the different particle scattering processes in \mathcal{N}=4 SYM, including possible ghost states in the loops. This itself would take a long time, and that’s before you’ve evaluated a single integral! In fact the first computation of this result didn’t come from classical Feynman diagrams, but rather as a limit of string theory.

A quick caveat is in order here. The eagle-eyed amongst you may have spotted that my final answer is wrong by a minus sign. Indeed, we’ve been very casual with our factors of i throughout this post. Recall that Feynman rules usually assign a factor of i to each propagator in a diagram. But we’ve completely ignored this prescription!

Sign errors and theorists are the best of enemies. So we’d better confront our nemesis and find that missing minus sign. In fact it’s not hard to see where it comes from. The only place in our calculation where extra factors of i wouldn’t simply cancel is the cut propagators. Look back at the very first figure and observe that the left hand side has four more factors of i than the right.

Of course we’ve only cut two propagators to obtain the amplitude. This means that we should pick up an extra factor of (1/i)^2 = -1. This precisely corrects the sign error that pedants (or experimentalists) would find irritating!

I promised an SU(3) YM calculation, and I won’t disappoint. This will also provide a chance to show off generalized unitarity in all its glory. Explicitly we’re going to show that the NMHV gluon four-mass box coefficients vanish.

To start with, let’s disentangle some of that jargon. Remember that an n-particle NMHV gluon amplitude has 3 negative helicity external gluons and n-3 positive helicity ones. The four-mass condition means that each corner of the box has at least two external legs, so that the total outgoing momentum at each corner is a massive 4-vector.

The coefficient of the box diagram will be given by a generalized unitarity cut of four loop propagators. Indeed triangle and bubble diagrams don’t even have four propagators available to cut, which mathematically translates into a zero contribution to the residue. The usual rules to compute residues tell us that we’ll always have a zero numerator factor left over for bubble and triangle integrands.

Now the generalized unitarity method tells us to compute the product of four tree diagrams. By our four-mass assumption, each of these has at least 4 external gluons. We must have exactly 4 negative helicity and 4 positive helicity gluons from the cut propagators since all lines are assumed outgoing. We have exactly 3 further negative helicity particles by our NMHV assumption, so 7 negative helicity gluons to go round.

But tree level diagrams with \geq 4 legs must have at least 2 negative helicity gluons to be non-vanishing. This is not possible with our setup, since 7 < 8. We conclude that the NMHV gluon four-mass box coefficients vanish.

Our result here is probably a little disappointing compared with the \mathcal{N}=4 SYM example above. There we were able to completely compute a 4 point function at 1-loop. But for ordinary YM there are many more subcases to consider. Heuristically we lack enough symmetry to constrain the amplitude fully, so we have to do more work ourselves! A full analysis would consider all box cases, then move on to nonzero contributions from triangle and bubble integrals. Finally we’d need to determine the rational part of the amplitude, perhaps using BCFW recursion at loop level.

Don’t worry – I don’t propose to go into any further detail now. Hopefully I’ve sketched the mathematical landscape of amplitudes clearly enough already. I leave you with the thought-provoking claim that the simplest QFTs are those with the most symmetry. As Arkani-Hamed, Cachazo and Kaplan explain, this is at odds with our childhood desire for simple Lagrangians!

What Can Unitarity Tell Us About Amplitudes?

Let’s start by analysing the discontinuities in amplitudes, viewed as a function of external momenta. The basic Feynman rules tell us that 1-loop processes yield amplitudes of the form

\displaystyle \int d^4 l \frac{A}{l^2(p+q-l)^2}

where A is some term independent of l. This yields a complex logarithm term, which thus gives a branch cut as a function of a Mandelstam variable (p+q)^2.
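
To see the cut explicitly, note that this integral evaluates to something proportional to \log(-s - i\epsilon) with s = (p+q)^2, the i\epsilon descending from the propagators. The logarithm then jumps across the positive real s axis:

\displaystyle \log(-s-i\epsilon) - \log(-s+i\epsilon) = -2\pi i\, \theta(s)

so the discontinuity is purely imaginary, in line with what we’re about to derive on general grounds.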

It’s easy to get a formula for the discontinuity across such a cut. Observe first that amplitudes are real unless some internal propagator goes on shell. Indeed when an internal line goes on shell the i\epsilon prescription yields an imaginary contribution.

Now suppose we are considering some process as a function of an external momentum invariant s, like a Mandelstam variable. Consider the internal line whose energy is encoded by s. If s is lower than the threshold for producing a multiparticle state, then the internal line cannot go on shell. In that case the amplitude and s are both real so we may write

\displaystyle \mathcal{A}(s) = \mathcal{A}(s^*)^*

Now we analytically continue s to the whole complex plane. This equation must still hold, since each side is an analytic function of s. Fix s at some real value greater than the threshold for multiparticle state production, so that the internal line can go on shell. In this situation of course we expect a branch cut.

Our formula above enforces the relations

\displaystyle \textrm{Re}\mathcal{A}(s+i\epsilon) = \textrm{Re}\mathcal{A}(s-i\epsilon)

\displaystyle \textrm{Im}\mathcal{A}(s+i\epsilon) = -\textrm{Im}\mathcal{A}(s-i\epsilon)

Thus we must indeed have a branch cut for s in this region, with discontinuity given by

\displaystyle \textrm{Disc}\mathcal{A}(s) = 2\textrm{Im}\mathcal{A}(s) \qquad (*)

Now we’ve got a formula for the discontinuity across a general amplitude branch cut, we’re in a position to answer our original question. What can unitarity tell us about discontinuities?

When I say unitarity, I specifically mean the unitarity of the S-matrix. Remember that we compute amplitudes by sandwiching the S-matrix between incoming and outgoing states defined at a common reference time in the far past. In fact we usually discard non-interacting terms by considering instead the T-matrix defined by

\displaystyle S = \mathbf{1}+iT

The unitarity of the S-matrix, namely S^\dagger S = \mathbf{1} yields for the T-matrix the relation

\displaystyle 2\textrm{Im}(T) = T^\dagger T

Okay, I haven’t quite been fair with that final line. In fact it should make little sense to you straight off! What on earth is the imaginary part of a matrix, after all? Before you think too deeply about any mathematical or philosophical issues, let me explain that the previous equation is simply notation. We understand it to hold when evaluated between any incoming and outgoing states. In other words

\displaystyle 2 \textrm{Im} \langle \mathbf{p}_1 \dots \mathbf{p}_n | T | \mathbf{k}_1 \dots \mathbf{k}_m\rangle = \langle \mathbf{p}_1 \dots \mathbf{p}_n | T^\dagger T | \mathbf{k}_1 \dots \mathbf{k}_m\rangle
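
With that understanding, the relation follows in one line from unitarity:

\displaystyle \mathbf{1} = S^\dagger S = (\mathbf{1} - iT^\dagger)(\mathbf{1} + iT) = \mathbf{1} + i(T - T^\dagger) + T^\dagger T

so -i(T - T^\dagger) = T^\dagger T, and the left hand side is precisely what we mean by 2\textrm{Im}(T).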

But there’s still a problem: how do you go about evaluating the T^\dagger T term? Thinking back to the heady days of elementary quantum mechanics, perhaps you’re inspired to try inserting a completeness relation in the middle. That way you obtain a product of amplitudes, which are things we know how to compute. The final result looks like

\displaystyle 2 \textrm{Im} \langle \mathbf{p}_1 \dots \mathbf{p}_n | T | \mathbf{k}_1 \dots \mathbf{k}_m\rangle = \sum_l \left(\prod_{i=1}^l \int\frac{d^3 \mathbf{q}_i}{(2\pi)^3 2E_i}\right) \langle \mathbf{p}_1 \dots \mathbf{p}_n | T^\dagger | \{\mathbf{q_i}\} \rangle \langle \{\mathbf{q_i}\} | T | \mathbf{k}_1 \dots \mathbf{k}_m\rangle

Now we are in business. All the matrix elements in this formula correspond to amplitudes we can calculate. Using equation (*) above we can then relate the left hand side to a discontinuity across a branch cut. Heuristically we have the equation

\displaystyle \textrm{Disc}\mathcal{A}(1,\dots m \to 1,\dots n) = \sum_{\textrm{states}} \mathcal{A}(1,\dots m \to \textrm{state})\mathcal{A}(1,\dots n \to \textrm{state})^* \qquad (\dagger)

Finally, after a fair amount of work, we can pull out some useful information! In particular we can make deductions based on a loop expansion in powers of \hbar viz.

\displaystyle \mathcal{A}(m,n) = \sum_{L=0}^\infty \hbar^L \mathcal{A}^{(L)}(m,n)

where \mathcal{A}^{(L)}(m,n) is the L-loop amplitude with m incoming and n outgoing particles. Expanding equation (\dagger) order by order in \hbar we obtain

\displaystyle \textrm{Disc}\mathcal{A}^{(0)}(m,n) = 0

\displaystyle \textrm{Disc}\mathcal{A}^{(1)}(m,n) = \sum_{\textrm{states}} \mathcal{A}^{(0)}(m,\textrm{state})\mathcal{A}^{(0)}(n,\textrm{state})^*

and so forth. The first equation says that tree amplitudes have no branch cuts, which is immediately obvious from the Feynman rules. The second equation is more interesting. It tells us that the discontinuities of 1-loop amplitudes are given by products of tree level amplitudes! We can write this pictorially as

[Figure: the discontinuity of a 1-loop 2 \to 3 amplitude written as a product of two tree-level blobs, summed over intermediate states.]
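
In equation form (making explicit the state sum that the picture leaves implicit) it says

\displaystyle \textrm{Disc}\,\mathcal{A}^{(1)}(1,2 \to 3,4,5) = \sum_{\textrm{states}} \mathcal{A}^{(0)}(1,2 \to \textrm{state})\, \mathcal{A}^{(0)}(3,4,5 \to \textrm{state})^*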

Here we have specialized to m=2, n=3 and have left implicit a sum over the possible intermediate states. This result is certainly curious, but it’s hard to see how it can be useful in its current form. In particular, the sum we left implicit involves an arbitrary number of states. We’d really like a simpler relation which involves a well-defined, finite number of Feynman diagrams.

It turns out that this can be done, provided we consider particular channels in which the loop discontinuities occur. For each channel, the associated discontinuity is computed as a product of tree level diagrams obtained by cutting two of the loop propagators. By momentum conservation, each channel is uniquely determined by a subset of external momenta. Thus we label channels by their external particle content.

How exactly does this simplification come about mathematically? To see this we must take a more detailed look at Feynman diagrams, and particularly at the on-shell poles of loop integrands. This approach yields a pragmatic method, at the expense of obscuring the overarching role of unitarity. The results we’ve seen here will serve as both motivation and inspiration for the pedestrian perturbative approach.

We leave those treats in store for a future post. Until then, take care, and please don’t violate unitarity.

Twistor Transform

Long time no speak! I hope to be back blogging more regularly this summer, now that I’ve completed Part III. I’ll hopefully be talking about Wilson loops, lattice QCD and SUSY. I’ll try to throw in a few things for the layman too, however.

In the meantime, here’s a copy of my Part III essay on the Twistor Transform. The intended audience is beginning graduate students in theoretical physics, with no assumed knowledge of twistor theory. I tried to do things rigorously, so I hope it isn’t too dry for your tastes!

The Theorem of The Existence of Zeroes

It’s time to prove the central result of elementary algebraic geometry. Mostly it’s referred to as Hilbert’s Nullstellensatz. This German term roughly translates to the title of this post. Indeed ‘Null’ means ‘zero’, ‘Stellen’ means ‘places’ and ‘Satz’ means ‘theorem’, so it is literally the ‘zero-places theorem’. But referring to it merely as an existence theorem for zeroes is inadequate. Its real power is in setting up a correspondence between algebra and geometry.

Are you sitting comfortably? Grab a glass of water (or wine if you prefer). Settle back and have a peruse of these theorems. This is your first glance into the heart of a magical subject.

(In many texts these theorems are all referred to as the Nullstellensatz. I think this is both pointless and confusing, so have renamed them! If you have any comments or suggestions about these names please let me know).

Theorem 4.1 (Hilbert’s Nullstellensatz) Let J\subsetneq k[\mathbb{A}^n] be a proper ideal of the polynomial ring. Then V(J)\neq \emptyset. In other words, for every nontrivial ideal there exists a point which simultaneously zeroes all of its elements.

Theorem 4.2 (Maximal Ideal Theorem) Every maximal ideal \mathfrak{m}\subset k[\mathbb{A}^n] is of the form (x_1-a_1,\dots,x_n-a_n) for some (a_1,\dots,a_n)\in \mathbb{A}^n. In other words every maximal ideal is the ideal of some single point in affine space.

Theorem 4.3 (Correspondence Theorem) For every ideal J\subset k[\mathbb{A}^n] we have I(V(J))=\sqrt{J}.

We’ll prove all of these shortly. Before that let’s have a look at some particular consequences. First note that 4.1 is manifestly false if k is not algebraically closed. Consider for example k=\mathbb{R} and n=1. Then certainly V(x^2+1)=\emptyset. Right then. From here on in we really must stick just to algebraically closed fields.

Despite having the famous name, 4.1 is not really immediately useful. In fact we’ll see its main role is as a convenient stopping point in the proof of 4.3 from 4.2. The maximal ideal theorem is much more important. It precisely provides the converse to Theorem 3.10. But it is the correspondence theorem that is of greatest merit. As an immediate corollary of 4.3, 3.8 and 3.10 (recalling that prime and maximal ideals are radical) we have

Corollary 4.4 The maps V,I as defined in 1.2 and 2.4 give rise to the following bijections

\{\textrm{affine varieties in }\mathbb{A}^n\} \leftrightarrow \{\textrm{radical ideals in } k[\mathbb{A}^n]\}
\{\textrm{irreducible varieties in }\mathbb{A}^n\} \leftrightarrow \{\textrm{prime ideals in } k[\mathbb{A}^n]\}
\{\textrm{points in }\mathbb{A}^n\} \leftrightarrow \{\textrm{maximal ideals in } k[\mathbb{A}^n]\}

Proof We’ll prove the first bijection explicitly, for it is so rarely done in the literature. The second and third bijections follow from the argument for the first and 3.8, 3.10. Let J be a radical ideal in k[\mathbb{A}^n]. Then V(J) certainly an affine variety so V well defined. Moreover V is injective. For suppose \exists J' radical with V(J')=V(J). Then I(V(J'))=I(V(J)) and thus by 4.3 J = J'. It remains to prove that V surjective. Take X an affine variety. Then J'=I(X) an ideal with V(J')=X by Lemma 2.5. But J' not necessarily radical. Let J=\sqrt{J'} a radical ideal. Then by 4.3 I(V(J'))=J. So V(J) = V(I(V(J'))) = V(J') = X by 2.5. This completes the proof. \blacksquare

We’ll see in the next post that we need not restrict our attention to \mathbb{A}^n. In fact using the coordinate ring we can gain a similar correspondence for the subvarieties of any given variety. This will lead to an advanced introduction to the language of schemes. With these promising results on the horizon, let’s get down to business. We’ll begin by recalling a definition and a theorem.

Definition 4.5 A finitely generated k-algebra is a ring R s.t. R \cong k[a_1,\dots,a_n] for some a_i \in R. A finite k-algebra is a ring R s.t. R\cong ka_1 + \dots + ka_n.
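
For example, the polynomial ring k[x] is a finitely generated k-algebra, generated by the single element x, but is not a finite k-algebra, since 1, x, x^2, \dots are linearly independent over k. By contrast the quotient k[x]/(x^2) is a finite k-algebra, spanned by 1 and x.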

Observe how this definition might be confusing when compared to a finitely generated k-module. But applying a broader notion of ‘finitely generated’ to both algebras and modules clears up the issue. You can check that the following definition is equivalent to those we’ve seen for algebras and modules. A finitely generated algebra is richer than a finitely generated module because an algebra has an extra operation – multiplication.

Definition 4.6 We say an algebra (module) A is finitely generated if there exists a finite set of generators F s.t. A is the smallest algebra (module) containing F. We then say that A is generated by F.

Theorem 4.7 Let k be a general field and A a finitely generated k-algebra. If A is a field then A is algebraic over k.

Okay I cheated a bit saying ‘recall’ Theorem 4.7. You probably haven’t seen it anywhere before. And you might think that it’s a teensy bit abstract! Nevertheless we shall see that it has immediate practical consequences. If you are itching for a proof, don’t worry. We’ll in fact present two. The first will be due to Zariski, and the second an idea of Noether. But before we come to those we must deduce 4.1 – 4.3 from 4.7.

Proof of 4.2 Let m \subset k[\mathbb{A}^n] be a maximal ideal. Then F = k[\mathbb{A}^n]/m a field. Define the natural homomorphism \pi: k[\mathbb{A}^n] \ni x \mapsto x+m \in F. Note F is a finitely generated k-algebra, generated by the x_i+m certainly. Thus by 4.7 F/k is an algebraic extension. But k was algebraically closed. Hence k is isomorphic to F via \phi : k \rightarrowtail k[\mathbb{A}^n] \xrightarrow{\pi} F.

Let a_i = \phi^{-1}(x_i+m). Then \pi(x_i - a_i) = 0 so x_i - a_i \in \textrm{ker}\pi = m. Hence (x_1-a_1, \dots, x_n-a_n) \subset m. But (x_1-a_1, \dots, x_n-a_n) is itself maximal by 3.10. Hence m = (x_1-a_1, \dots, x_n-a_n) as required. \blacksquare

That was really quite easy! We just worked through the definitions, making good use of our stipulation that k is algebraically closed. We’ll soon see that all the algebraic content is squeezed into the proof of 4.7.

Proof of 4.1 Let J be a proper ideal in the polynomial ring. Since k[\mathbb{A}^n] Noetherian J\subset m some maximal ideal. From 4.2 we know that m=I(P) some point P\in \mathbb{A}^n. Recall from 2.5 that V(I(P)) = \{P\} \subset V(J) so V(J) \neq \emptyset. \blacksquare

The following proof is lengthier but still not difficult. Our argument uses a method known as the Rabinowitsch trick.

Proof of 4.3 Let J\triangleleft k[\mathbb{A}^n] and f\in I(V(J)). We want to prove that \exists N s.t. f^N \in J. We start by introducing a new variable t. Define an ideal J_f \supset J by J_f = (J, ft - 1) \subset k[x_1,\dots,x_n,t]. By definition V(J_f) = \{(P,b) \in \mathbb{A}^{n+1} : P\in V(J), \ f(P)b = 1\}. Note that f \in I(V(J)) so V(J_f) = \emptyset.

Now by 4.1 we must have that J_f improper. In other words J_f = k[x_1,\dots, x_n, t]. In particular 1 \in J_f. Since k[x_1,\dots, x_n] is Noetherian we know that J finitely generated by some \{f_1,\dots,f_r\} say. Thus we can write 1 = \sum_{i=1}^r g_i f_i + g_0 (ft - 1) where g_i\in k[x_1,\dots , x_n, t] (*).

Let N be such that t^N is the highest power of t appearing among the g_i for 1\leq i \leq r. Now multiplying (*) above by f^N yields f^N = \sum_{i=1}^r G_i(x_1,\dots, x_n, ft) f_i + G_0(x_1,\dots,x_n,ft)(ft-1) where we define G_i = f^N g_i. This equation is valid in k[x_1,\dots,x_n, t]. Consider its reduction in the ring k[x_1,\dots,x_n,t]/(ft - 1). We have the congruence f^N\equiv \sum_{i=1}^r h_i (x_1,\dots,x_n) f_i \ \textrm{mod}\ (ft-1) where h_i = G_i(x_1,\dots,x_n,1).

Now consider the map \phi:k[x_1,\dots, x_n]\rightarrowtail k[x_1,\dots, x_n,t]\xrightarrow{\pi} k[x_1,\dots, x_n,t]/(ft-1). Certainly nothing in the image of the injection can possibly be in the ideal (ft - 1), not having any t dependence. Hence \phi must be injective. But then we see that f^N = \sum_{i=1}^r h_i(x_1,\dots, x_n) f_i holds in the ring k[\mathbb{A}^n]. Recalling that the f_i generate J gives the result. \blacksquare
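
A baby example makes 4.3 vivid. Take J = (x^2) \subset k[x]. Then V(J) = \{0\} and I(V(J)) = (x) = \sqrt{J}. Indeed f = x vanishes on V(J), and sure enough f^2 \in J, just as the theorem predicts.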

We shall devote the rest of this post to establishing 4.7. To do so we’ll need a number of lemmas. You might be unable to see the wood for the trees! If so, you can safely skim over much of this. The important exception is Noether normalisation, which we’ll come to later. I’ll link the ideas of our lemmas to geometrical concepts at our next meeting.

Definition 4.8 Let A,B be rings with B \subset A. Let a\in A. We say that a is integral over B if a is the root of some monic polynomial with coefficients in B. That is to say \exists b_i \in B s.t. a^n + b_{n-1}a^{n-1} + \dots + b_0 = 0. If every a \in A is integral over B we say that A is integral over B or A is an integral extension of B.
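
For example \sqrt{2}\in\mathbb{R} is integral over \mathbb{Z}, being a root of the monic polynomial x^2 - 2. On the other hand 1/2 is not integral over \mathbb{Z}: by the rational root theorem, any rational number satisfying a monic polynomial over \mathbb{Z} must be an integer.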

Let’s note some obvious facts. Firstly we can immediately talk about A being integral over B when A,B are algebras with B a subalgebra of A. Remember an algebra is still a ring! It’s rather pedantic to stress this now, but hopefully it’ll prevent confusion if I mix my terminology later. Secondly observe that when A and B are fields “integral over” means exactly the same as “algebraic over”.

We’ll begin by proving some results that will be of use in both our approaches. We’ll see that there’s a subtle interplay between finite k-algebras, integral extensions and fields.

Lemma 4.9 Let F be a field and R\subset F a subring. Suppose F is an integral extension of R. Then R is itself a field.

Proof Let r \in R. Then certainly r \in F so r^{-1} \in F since F a field. Now r^{-1} integral over R so satisfies an equation r^{-n} + b_{n-1} r^{-(n-1)} + \dots + b_0 = 0 with b_i \in R. But now multiplying through by r^{n-1} yields r^{-1} = -(b_{n-1} + \dots + b_0 r^{n-1}) \in R. \blacksquare

Note that this isn’t obvious a priori. The property that an extension is integral contains sufficient information to percolate the property of inverses down to the base ring.

Lemma 4.10 If A is a finite B algebra then A is integral over B.

Proof Write A = Ba_1 + \dots +Ba_n. Let x \in A. We want to prove that x satisfies some equation x^n + b_{n-1}x^{n-1} + \dots + b_0 = 0. We’ll do so by appealing to our knowledge about determinants. For each a_i we may clearly write xa_i = \sum_{j=1}^{n} b_{ij}a_j for some b_{ij} \in B.

Writing \vec{a} = (a_1, \dots, a_n) and defining the matrix (\beta)_{ij} = b_{ij} we can express our equation as \beta \vec{a} = x\vec{a}. We recognise this as an eigenvalue problem. In particular x satisfies the characteristic polynomial of \beta, a polynomial of degree n with coefficients in B. (Strictly, since B is only a ring, one sees this by multiplying x\mathbf{1} - \beta by its adjugate matrix.) But this is precisely what we wanted to show. \blacksquare

Corollary 4.11 Let A be a field and B\subset A a subring. If A is a finite B-algebra then B is itself a field.

Proof Immediate from 4.9 and 4.10. \blacksquare

We now focus our attention on Zariski’s proof of the Nullstellensatz. I take as a source Daniel Grayson’s excellent exposition.

Lemma 4.12 Let R be a ring and F an R-algebra generated by x \in F. Suppose further that F a field. Then \exists s \in R s.t. S = R[s^{-1}] a field. Moreover x is algebraic over S.

Proof Let R' be the fraction field of R. Now recall that x is algebraic over R' iff R'[x] \supset R'(x). Thus x is algebraic over R' iff R'[x] is a field. So certainly our x is algebraic over R' for we are given that F a field. Let x^n + f_{n-1}x^{n-1} + \dots + f_0 be the minimal polynomial of x.

Now define s\in R to be the common denominator of the f_i, so that f_0,\dots, f_{n-1} \in R[s^{-1}] = S. Now x is integral over S so F/S an integral extension. But then by 4.9 S a field, and x algebraic over it. \blacksquare

Observe that this result is extremely close to 4.7. Indeed if we take R to be a field we have S = R in 4.12. The lemma then says that R[x] is algebraic as a field extension of R. Morally this proof mostly just used definitions. The only nontrivial fact was the relationship between R'(x) and R'[x]. Even this is not hard to show rigorously from first principles, and I leave it as an exercise for the reader.

We’ll now attempt to generalise 4.12 to R[x_1,\dots,x_n]. The argument is essentially inductive, though quite laborious. 4.7 will be immediate once we have succeeded.

Lemma 4.13 Let R = F[x] be a polynomial ring over a field F. Let u\in R. Then R[u^{-1}] is not a field.

Proof By Euclid, R has infinitely many prime elements. Let p be a prime not dividing u. Suppose \exists q \in R[u^{-1}] s.t. qp = 1. Then q = f(u^{-1}) where f a polynomial of degree n with coefficients in R. Hence in particular u^n = u^n f(u^{-1}) p holds in R for u^n f(u^{-1}) \in R. Thus p | u^n but p prime so p | u. This is a contradiction. \blacksquare
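
For a concrete instance take u = x, so that R[u^{-1}] = F[x, x^{-1}] is the ring of Laurent polynomials. The prime p = x+1 is not invertible there: if (x+1)q = 1 with q \in F[x,x^{-1}], clearing denominators would give (x+1)\tilde{q} = x^n for a genuine polynomial \tilde{q}, contradicting the fact that the prime x+1 does not divide x^n.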

Corollary 4.14 Let K be a field, F\subset K a subfield, and x \in K. Let R = F[x]. Suppose \exists u\in R s.t. R[u^{-1}] = K. Then x is algebraic over F. Moreover R = K.

Proof Suppose x were transcendental over F. Then R=F[x] would be a polynomial ring, so by 4.13 R[u^{-1}] couldn’t be a field. Hence x is algebraic over F so R is a field. Hence R=R[u^{-1}]=K. \blacksquare

The following fairly abstract theorem is the key to unlocking the Nullstellensatz. It’s essentially a slight extension of 4.14, applying 4.12 in the process. I’d recommend skipping the proof first time, focussing instead on how it’s useful for the induction of 4.16.

Theorem 4.15 Take K a field, F \subset K a subring, x \in K. Let R = F[x]. Suppose \exists u\in R s.t. R[u^{-1}] = K. Then \exists 0\neq s \in F s.t. F[s^{-1}] is a field. Moreover F[s^{-1}][x] = K and x is algebraic over F[s^{-1}].

Proof Let L=\textrm{Frac}(F). Now by 4.14 we can immediately say that L[x]=K, with x algebraic over L. Now we seek our element s with the desired properties. Looking back at 4.12, we might expect it to be useful. But to use 4.12 for our purposes we’ll need to apply it to some F' = F[t^{-1}] with F'[x] = K, where t \in F.

Suppose we’ve found such a t. Then 4.12 gives us s' \in F' s.t. F'[s'^{-1}] a field with x algebraic over it. But now s' = qt^{-m} some q \in F, \ m \in \mathbb{N}. Now F'[s'^{-1}]=F[t^{-1}][s'^{-1}]=F[(qt)^{-1}], so setting s=qt completes the proof. (You might want to think about that last equality for a second. It’s perhaps not immediately obvious).

So all we need to do is find t. We do this using our first observation in the proof. Observe that u^{-1}\in K=L[x] so we can write u^{-1}=l_0+\dots +l_{n-1}x^{n-1}, l_i \in L. Now let t \in F be a common denominator for all the l_i. Then u^{-1} \in F'=F[t^{-1}] so F'[x]=K as required. \blacksquare

Corollary 4.16 Let k a ring, A a field, finitely generated as a k-algebra by x_1,\dots,x_n. Then \exists 0\neq s\in k s.t. k[s^{-1}] a field, with A a finite algebraic extension of k[s^{-1}]. Trivially if k a field, then A is algebraic over k, establishing 4.7.

Proof Apply Lemma 4.15 with F=k[x_1,\dots,x_{n-1}], x=x_n, u=1 to get s'\in F s.t. A' = k[x_1,\dots,x_{n-1}][s'^{-1}] is a field with x_n algebraic over it. But now apply 4.15 again with F=k[x_1,\dots,x_{n-2}], u = s' to deduce that A''=k[x_1,\dots, x_{n-2}][s''^{-1}] is a field, with A' algebraic over A'', for some s'' \in F. Applying the lemma a further (n-2) times gives the result. \blacksquare

This proof of the Nullstellensatz is pleasingly direct and algebraic. However it has taken us a long way away from the geometric content of the subject. Moreover 4.13-4.15 are pretty arcane in the current setting. (I’m not sure whether they become more meaningful with a better knowledge of the subject. Do comment if you happen to know)!

Our second proof sticks closer to the geometric roots. We’ll introduce an important idea called Noether Normalisation along the way. For that you’ll have to come back next time!

Invariant Theory and David Hilbert

Health warning: this post is part of a more advanced series on commutative algebra. It may be a little tricky for the layman to understand!

David Hilbert was perhaps the greatest mathematician of the late 19th century. Much of his work laid the foundations for our modern study of commutative algebra. In doing so, he was sometimes said to have killed the study of invariants by solving the central problem in the field. In this post I’ll give a sketch of how he did so.

Motivated by Galois Theory we ask the following question. Given a polynomial ring S = k[x_1,\dots,x_n] and a group G acting on S as a k-automorphism, what are the elements of S that are invariant under the action of G? Following familiar notation we denote this set S^G and note that it certainly forms a subalgebra of S.

In the late 19th century it was found that S^G could be described fully by a finite set of generators for several suggestive special cases of G. It soon became clear that the fundamental problem of invariant theory was to find necessary and sufficient conditions for S^G to be finitely generated. Hilbert’s contribution was an incredibly general sufficient condition, as we shall soon see.

To begin with we shall recall the alternative definition of a Noetherian ring. It is a standard proof that this definition is equivalent to that which invokes the ascending chain condition on ideals. As an aside, also recall that the ascending chain condition can be restated by saying that every nonempty collection of ideals has a maximal element.

Definition A.1 A ring R is Noetherian if every ideal of R is finitely generated.

We shall also recall without proof Hilbert’s Basis Theorem, and draw an easy corollary.

Theorem A.2 If R Noetherian then R[x] Noetherian.

Corollary A.3 If S is a finitely generated algebra over R, with R Noetherian, then S Noetherian.

Proof We’ll first show that any homomorphic image of R is Noetherian. Let I be an ideal in the image under the homomorphism f. Then f^{-1}(I) an ideal in R. Indeed if k\in f^{-1}(I) and r\in R then f(rk)=f(r)f(k)\in I so rk \in f^{-1}(I). Hence f^{-1}(I) finitely generated, so certainly I finitely generated, by the images of the generators of f^{-1}(I).

Now we’ll prove the corollary. Since S is a finitely generated algebra over R, S is a homomorphic image of R[x_1,\dots,x_n] for some n, by the obvious homomorphism that takes each x_i to a generator of S. By Theorem A.2 and induction we know that R[x_1,\dots,x_n] is Noetherian. But then by the above, S is Noetherian. \blacksquare

Since we’re on the subject of Noetherian things, it’s probably worthwhile introducing the concept of a Noetherian module. The subsequent theorem is analogous to A.3 for general modules. This question ensures that the theorem has content.

Definition A.4 An R-module M is Noetherian if every submodule N is finitely generated, that is, if every element of N can be written as an R-linear combination of some generators \{f_1,\dots,f_n\}\subset N.

Theorem A.5 If R Noetherian and M a finitely generated R-module then M Noetherian.

Proof Suppose M generated by f_1,\dots,f_t, and let N be a submodule. We show N finitely generated by induction on t.

If t=1 then clearly the map h:R\rightarrow M defined by 1\mapsto f_1 is surjective. Then the preimage of N is an ideal, just as in A.3, so is finitely generated. Hence N is finitely generated by the images of the generators of h^{-1}(N).  (*)

Now suppose t>1. Consider the quotient map h:M \to M/Rf_1. Let \tilde{N} be the image of N under this map. Then by the induction hypothesis \tilde{N} is finitely generated as it is a submodule of M/Rf_1. Let g_1,\dots,g_s be elements of N whose images generate \tilde{N}. Since Rf_1 is a submodule of M generated by a single element, we have by (*) that its submodule Rf_1\cap N is finitely generated, by h_1,\dots,h_r say.

We claim that \{g_1,\dots,g_s,h_1,\dots,h_r\} generate N. Indeed given n \in N the image of n \in N is a linear combination of the images of the g_i. Hence subtracting the relevant linear combination of the g_i from n produces an element of N \cap Rf_1 which is precisely a linear combination of the h_i by construction. This completes the induction. \blacksquare

We’re now ready to talk about the concrete problem that Hilbert solved using these ideas, namely the existence of finite bases for invariants. We’ll take k to be a field of characteristic 0 and G to be a finite group, or one of the linear groups \textrm{ GL}_n(k),\ \textrm{SL}_n(k). As in our notation above, we take S=k[x_1,\dots,x_n].

Suppose also we are given a group homomorphism \phi:G \to \textrm{GL}_r(k), which of course can naturally be seen as the group of invertible linear transformations of the vector space V over k with basis x_1,\dots,x_r. This is in fact the definition of a representation of G on the vector space V. As is common practice in representation theory, we view G as acting on V via (g,v)\mapsto \phi(g)v.

If G is \textrm{SL}_n(k) or \textrm{GL}_n(k) we shall further suppose that our representation of G is rational. That is, the matrices g \in G act on V as matrices whose entries are rational functions in the entries of g. (If you’re new to representation theory like me, you might want to read that sentence twice)!

We now extend the action of g\in G from V to the whole of S by defining (g,f)\mapsto f(g^{-1}(x_1),\dots,g^{-1}(x_r),x_{r+1},\dots,x_n). Thus we may view G as an automorphism group of S. The invariants under G are those polynomials left unchanged by the action of every g \in G, and these form a subring of S which we’ll denote S^G.

Enough set up. To proceed to more interesting territory we’ll need to make another definition.

Definition A.6 A polynomial is called homogeneous, homogeneous form, or merely a form, if each of its monomials with nonzero coefficient has the same total degree.
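
For example x^3 + xy^2 is a form of degree 3 in k[x,y], whereas x^2 + y is not homogeneous, its monomials having total degrees 2 and 1.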

Hilbert noticed that the following totally obvious fact about S^G was critically important to the theory of invariants. We may write S^G as a direct sum of the vector spaces R_i of homogeneous forms of degree i that are invariant under G. We say that S^G may be graded by degree and use this to motivate our next definition.

Definition A.7 A graded ring is a ring R together with a direct sum decomposition as abelian groups R = R_0 \oplus R_1 \oplus \dots, such that R_i R_j \subset R_{i+j}.

This allows us to generalise our notion of homogeneous also.

Definition A.8 A homogeneous element of a graded ring R is an element of one of the groups R_i. A homogeneous ideal of R is an ideal generated by homogeneous elements.

Be warned that clearly homogeneous ideals may contain many inhomogeneous elements! It’s worth mentioning that there was no special reason for taking \mathbb{N} as our indexing set for the R_i. We can generalise this easily to \mathbb{Z}, and such graded rings are often called \mathbb{Z}-graded rings. We won’t need this today, however.

Note that if f \in R we have a unique expression for f of the form f = f_0 + f_1 + \dots + f_n with f_i \in R_i. (I have yet to convince myself why this terminates generally, any thoughts? I’ve also asked here.) We call the f_i the homogeneous components of f.

The next definition is motivated by algebraic geometry, specifically the study of projective varieties. When we arrive at these in the main blog (probably towards the end of this month) it shall make a good deal more sense!

Definition A.9 The ideal in a graded ring R generated by all forms of degree greater than 0 is called the irrelevant ideal and notated R_+.

Now we return to our earlier example. We may grade the polynomial ring S=k[x_1,\dots,x_n] by degree. In other words we write S=S_0\oplus S_1 \oplus \dots with S_i containing all the forms (homogeneous polynomials) of degree i.
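
Concretely S_0 = k, S_1 is spanned by x_1,\dots,x_n, S_2 by the monomials x_i x_j, and so on, with S_i S_j \subset S_{i+j} simply because total degrees add under multiplication.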

To see how graded rings are subtly useful, we’ll prove a surprisingly powerful lemma.

Lemma A.10 Let I be a homogeneous ideal of a graded ring R, with I generated by f_1,\dots,f_r. Let f\in I be a homogeneous element. Then we may write f = \sum f_i g_i with g_i homogeneous of degree \textrm{deg}(f)-\textrm{deg}(f_i).

Proof We can certainly write f = \sum f_i G_i with G_i \in R. Take g_i to be the homogeneous components of G_i of degree \textrm{deg}(f)-\textrm{deg}(f_i). Then all other terms in the sum must cancel, for f is homogeneous by assumption. \blacksquare

Now we return to our attempt to emulate Hilbert. We saw earlier that he spotted that grading S^G by degree may be useful. His second observation was this. There exists a map \phi:S\to S^G of S^G-modules s.t. (1) \phi preserves degrees and (2) \phi fixes every element of S^G. It is easy to see that this abstract concept corresponds intuitively to the condition that S^G be a summand of the graded ring S.

This is trivial to see in the case that G is a finite group. Indeed let \phi (f) = \frac{1}{|G|}\sum_{g\in G} g(f). Note that we have implicitly used that k has characteristic zero to ensure that the multiplicative inverse to |G| exists. In the case that G is a linear group acting rationally, then the technique is to replace the sum by an integral. The particulars of this are well beyond the scope of this post however!
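
For a toy example take G = \mathbb{Z}/2 acting on k[x,y] by swapping x and y. Then \phi(f)(x,y) = \frac{1}{2}\left(f(x,y) + f(y,x)\right), so for instance \phi(x) = \frac{1}{2}(x+y) while \phi(x+y) = x+y. Evidently \phi preserves degrees and fixes the symmetric polynomials S^G pointwise, as required.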

We finally prove the following general theorem. We immediately get Hilbert’s result on the finite generation of classes of invariants by taking R=S^G.

Theorem A.11 Take k a field and S=k[x_1,\dots,x_n] a polynomial ring graded by degree. Let R be a k-subalgebra of S. Suppose R is a summand of S, in the sense described above. Then R is finitely generated as a k-algebra.

Proof Let I\subset R be the ideal of R generated by all homogeneous elements of degree > 0. By the Basis Theorem S is Noetherian, and IS an ideal of S, so finitely generated. By splitting each generator into its homogeneous components we may assume that IS is generated by some homogeneous elements f_1,\dots,f_s which we may wlog assume lie in I. We’ll prove that these elements precisely generate R as a k-algebra.

Now let R' be the k-subalgebra of S generated by f_1,\dots,f_s and take f\in R a general homogeneous polynomial. Suppose we have shown f\in R'. Let g be a general element of R. Then certainly g\in S a sum of homogeneous components. But R a summand of S, so applying the given map \phi we have that the homogeneous components are wlog in R. Thus g\in R' also, and we are done.

It only remains to prove f \in R' which we’ll do by induction on the degree of f. If \textrm{deg}(f)=0 then f\in k\subset R'. Suppose \textrm{deg}(f)>0 so f\in I. Since the f_i generate IS as a homogeneous ideal of S we may write f = \sum f_i g_i with g_i homogeneous of degree \textrm{deg}(f)-\textrm{deg}(f_i)<\textrm{deg}(f) by Lemma A.10. But again we may use the map \phi obtained from our observation that R a summand of S. Indeed then f=\sum \phi(g_i)f_i for f,\ f_i \in R. But \phi preserves degrees so \phi(g_i) of lower degree than f. Thus by the induction hypothesis \phi(g_i) \in R' and hence f\in R' as required. \blacksquare

It’s worth noting that such an indirect proof caused quite a furore when it was originally published in the late 19th century. However the passage of time has provided us with a broader view of commutative algebra, and techniques such as this are much more acceptable to modern tastes! Nevertheless I shall finish by making explicit two facts that help to explain the success of our argument. We’ll first remind ourselves of a useful definition of an algebra.

Definition A.12 An R-algebra S is a ring S which has the compatible structure of a module over R in such a way that ring multiplication is R-bilinear.

It’s worth checking that this intuitive definition completely agrees with that we provided in the Background section, as is clearly outlined on the Wikipedia page. The following provide an extension and converse to Corollary A.3 (that finitely generated algebras over fields are Noetherian) in the special case that S is a graded ring.

Lemma A.13 S=R_0\oplus R_1 \oplus \dots a Noetherian graded ring iff R_0 Noetherian and S a finitely generated R_0 algebra.

Lemma A.14 Let S be a Noetherian graded ring, R a summand of S. Then R Noetherian.

We’ll prove these both next time. Note that they certainly aren’t true in general when S isn’t graded!

Algebra, Geometry and Topology: An Excellent Cocktail

Yes and I’ll have another one of those please waiter. One shot Geometry, topped up with Algebra and then a squeeze of Topology. Shaken, not stirred.

Okay, I admit that was both clichéd and contrived. But nonetheless it does accurately sum up the content of this post. We’ll shortly see that studying affine varieties on their own is like having a straight shot of gin – a little unpleasant, somewhat wasteful, and not an experience you’d be keen to repeat.

Part of the problem is the large number of affine varieties out there! We took a look at some last time, but it’s useful to have just a couple more examples. An affine plane curve is the zero set of any polynomial in \mathbb{A}^2. These crop up all the time in maths and there’s a lot of them. Go onto Wolfram Alpha and type plot f(x,y) = 0 replacing the term f(x,y) with any polynomial you wish. Here are a few that look nice

f(x,y) = y^2 - x^2 - x^3
f(x,y) = y^3 - x - x^2 y
f(x,y) = x^2 y + x y^2 - x^4 - y^4

There’s a more general notion than an affine plane curve that works in \mathbb{A}^n. We say a hypersurface is the zero set of a single polynomial in \mathbb{A}^n. The cone in \mathbb{R}^3 that we saw last time is a good example of a hypersurface. Finally we say a hyperplane is the zero set of a single polynomial of degree 1 in \mathbb{A}^n.

Hopefully all that blathering has convinced you that there really are a lot of varieties, and so it looks like it’s going to be hard to say anything general about them. Indeed we could look at each one individually, study it hard and come up with some conclusions. But to do this for every single variety would be madness!

We could also try to group them into different types, then analyse them geometrically. This way is a bit more efficient, and indeed was the method of the Ancients when they learnt about conic sections. But it is predictably difficult to generalise this to higher dimensions. Moreover, most geometrical groupings are just the tip of the iceberg!

What with all this negativity, I imagine that a shot of gin sounds quite appealing now. But bear with me one second, and I’ll turn it into a Long Island Iced Tea! By broadening our horizons a bit with algebraic and topological ideas, we’ll see that all is not lost. In fact there are deep connections that make our (mathematical) life much easier and richer, thank goodness.

First though, I must come good on my promise to tell you about some subsets of \mathbb{C}^n that aren’t algebraic varieties. A simple observation allows us to come up with a huge class of such subsets. Recall that polynomials are continuous functions from \mathbb{C}^n to \mathbb{C}, so their zero sets must be closed in the Euclidean topology; an affine variety, being an intersection of such zero sets, is therefore closed too. Hence in particular, no open ball in \mathbb{C}^n can be thought of as a variety. (If you didn’t understand this, it’s probably time to brush up on your topology).

There are two further ‘obvious’ classes. Firstly graphs of transcendental functions are not algebraic varieties. For example the zero set of the function f(x,y) = e^{xy}-x^2 is not an affine variety. Secondly the closed square \{(x,y)\in \mathbb{C}^2:|x|,|y|\leq 1\} is an example of a closed set which is not an affine variety. This is because it clearly contains interior points, while no affine variety strictly smaller than \mathbb{C}^2 can contain such points. The key fact is that a nonzero polynomial cannot vanish on a nonempty open subset of \mathbb{C}^2; I wasn’t entirely sure why at first, so I’ve asked on math.stackexchange for a clarification!

How does algebra come into the mix then? To see that, we’ll need to recall a definition about a particular type of ring.

Definition 2.1 A Noetherian ring is a ring which satisfies the ascending chain condition on ideals. In other words given any chain I_1 \subseteq I_2 \subseteq \dots \ \exists n s.t. I_{n+k}=I_n for all k\in\mathbb{N}.
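For contrast, here’s a standard non-example (an illustration of my own): the ring k[x_1, x_2, \dots] of polynomials in infinitely many variables is not Noetherian, since the chain (x_1) \subsetneq (x_1,x_2) \subsetneq (x_1,x_2,x_3) \subsetneq \dots never stabilises.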

It’s easy to see that all fields are trivially Noetherian, for the only ideals in k are (0) and k itself. Moreover we have the following theorem due to Hilbert, which I won’t prove. You can find the (quite nifty) proof here.

Theorem 2.2 (Hilbert Basis) Let N be Noetherian. Then N[x_1] is Noetherian also, and by induction so is N[x_1,\dots,x_n] for any positive integer n.

This means that our polynomial rings k[\mathbb{A}^n] will always be Noetherian. In particular, we can write any ideal I\subset k[\mathbb{A}^n] as I=(f_1, \dots, f_r) for some finite r, using the ascending chain condition. Why is this useful? For that we’ll need a lemma.
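To make “finitely generated” feel concrete, here’s a minimal computational sketch. I’m assuming the Python library sympy, whose groebner routine produces a Gröbner basis: a particularly well-behaved finite generating set for the ideal, whose existence is underwritten by the Hilbert Basis Theorem.

    # A minimal sketch (assuming sympy is installed): compute a finite
    # canonical generating set (a Groebner basis) for an ideal in C[x, y].
    from sympy import groebner, symbols

    x, y = symbols('x y')
    # Start from two generators of the ideal I = (y^2 - x^3, x*y - x^2).
    G = groebner([y**2 - x**3, x*y - x**2], x, y, order='lex')
    print(G)  # a finite list of generators for the same ideal

    # Ideal membership testing comes for free: this product visibly lies in I.
    print(G.contains((y**2 - x**3) * (y**2 + x**3)))  # True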

Lemma 2.3 Let Y be an affine variety, so Y=V(T) for some T\subset k[\mathbb{A}^n]. Let J=(T), the ideal generated by T. Then Y=V(J).

Proof By definition T\subset J so V(J)\subset V(T). We now need to show the reverse inclusion. Any g\in J can be written g=\sum q_i t_i for some t_1,\dots, t_m in T and q_1,\dots,q_m in k[\mathbb{A}^n]. Hence if p\in V(T) then t_i(p)=0 \ \forall i, so g(p)=0 and thus p\in V(J). \blacksquare

Let’s put all these ideas together. After a bit of thought, we see that every affine variety Y can be written as the zero set of a finite number of polynomials t_1, \dots,t_n. If you don’t get this straight away look back carefully at the theorem and the lemma. Can you see how to marry their conclusions to get this fact?

This is an important and already somewhat surprising result. If you give me any subset of \mathbb{A}^n obtained as the common solutions of a (possibly infinite) collection of polynomial equations, I can always find a finite number of equations whose solutions give your geometrical shape! (At least in theory I can – doing so in practice is not always easy).

You can already see that a tiny bit of algebra has sweetened the cocktail! We’ve been able to deduce a fact about every affine variety with relative ease. Let’s pursue this link with algebra and see where it takes us.

Definition 2.4 For any subset X \subset \mathbb{A}^n we say the ideal of X is the set I(X):=\{f \in k[\mathbb{A}^n] : f(x)=0 \ \forall x\in X\}.

In other words the ideal of X is all the polynomials which vanish on the set X. A trivial example is of course I(\mathbb{A}^n)=(0). Try to think of some other obvious examples before we move on.
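Here’s one to get you started, a standard worked example: for a single point (a_1,\dots,a_n)\in\mathbb{A}^n we get I(\{(a_1,\dots,a_n)\}) = (x_1-a_1,\dots,x_n-a_n), as you can check by expanding any polynomial around the point.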

Let’s recap. We’ve now defined two maps V: \{\textrm{ideals in }k[\mathbb{A}^n]\}\rightarrow \{\textrm{affine varieties in }\mathbb{A}^n\} and I:\{\textrm{subsets of }\mathbb{A}^n\}\rightarrow \{\textrm{ideals in }k[\mathbb{A}^n]\}. Intuitively these maps are somehow ‘opposite’ to each other. We’d like to be able to formalise that mathematically. More specifically we want to find certain classes of affine varieties and ideals where V and I are mutually inverse bijections.

Why did I say certain classes? Well, clearly it’s not the case that V and I are bijections on their domains of definition. Indeed V(x^n)=V(x), but (x)\neq(x^n), so V isn’t injective. Furthermore working in \mathbb{A}^1_{\mathbb{C}} we see that I(\mathbb{Z})=(0)=I(\mathbb{A}^1) so I is not injective. Finally for n\geq 2 \ (x^n)\notin \textrm{Im}(I) so I is not surjective.
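As a quick computational sanity check of the first failure (again assuming sympy), x^3 and x really do cut out the same points:

    # Sanity check (assuming sympy): x^3 and x have the same zero set.
    from sympy import symbols, solveset

    x = symbols('x')
    print(solveset(x**3, x))  # {0}
    print(solveset(x, x))     # {0}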

It’ll turn out in the next post that a special type of ideal called a radical ideal will play an important role. To help motivate its definition, think of some more examples where V fails to be injective. Can you spot a pattern? We’ll return to this next time. 

Now that we’ve got our maps V and I it’s instructive to examine their properties. This will give us a feeling for the basic manipulations of algebraic geometry. No need to read it very thoroughly, just skim it to pick up some of the ideas.

Lemma 2.5 The maps I and V satisfy the following, where the J_i are ideals and the X_i subsets of \mathbb{A}^n:
(1) V((0))=\mathbb{A}^n,\ V(k[\mathbb{A}^n])=\emptyset
(2) V(J_1)\cup V(J_2)=V(J_1\cap J_2)
(3) \bigcap_{\lambda\in\Lambda}V(J_{\lambda})=V(\sum_{\lambda\in\Lambda}J_{\lambda})
(4) J_1\subset J_2 \Rightarrow V(J_2) \subset V(J_1)
(5) X_1\subset X_2 \Rightarrow I(X_2)\subset I(X_1)
(6) J_1 \subset I(V(J_1))
(7) X_1 \subset V(I(X_1)) with equality iff X_1 is an affine variety

Proof We prove each in turn.
(1) Trivial.
(2) We first prove “\subset“. Let q\in V(J_1)\cup V(J_2). Wlog assume q \in V(J_1). Then f(q)=0 \ \forall f \in J_1. So certainly f(q)=0 \ \forall f\in J_1\cap J_2, which is what we needed to prove. Now we show “\supset“. Let q\not\in {V(J_1)\cup V(J_2)}. Then q \not\in V(J_1) and q \not\in V(J_2). So there exists f \in J_1, \ g\in J_2 s.t. f(q) \neq 0,\ g(q)\neq 0. Hence fg(q)\neq 0. But fg\in J_1\cap J_2 so q \not\in {V(J_1\cap J_2)}.
(3) “\subset” holds because any element of \sum_{\lambda\in\Lambda}J_{\lambda} is a finite sum of elements of the J_{\lambda}, each of which vanishes on \bigcap_{\lambda\in\Lambda}V(J_{\lambda}). For “\supset” note that J_{\mu} \subset \sum_{\lambda\in\Lambda}J_{\lambda} for each \mu (pad with 0 \in J_{\lambda}), so the claim follows from (4).
(4) Trivial.
(5) Trivial.
(6) If p \in J_1 then p(q)=0\ \forall q \in V(J_1) by definition, so p \in I(V(J_1)).
(7) The relation X_1 \subset V(I(X_1)) follows from the definitions exactly as (6) did. For the “if” direction, suppose X_1=V(J_1) for some ideal J_1. Then by (6) J_1 \subset I(V(J_1)), so by (4) V(I(X_1))=V(I(V(J_1))) \subset V(J_1)=X_1. Conversely, if V(I(X_1))=X_1 then X_1 is the zero set of the ideal I(X_1), so is an affine variety by definition. \blacksquare
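If you’d like to see part (2) in the wild, here’s a tiny numerical sanity check of my own for the ideals J_1=(x),\ J_2=(y) in k[x,y], where J_1\cap J_2=(xy): a point lies in V(J_1)\cup V(J_2) exactly when it lies in V(xy). The sample grid is an arbitrary choice.

    # A tiny sanity check of Lemma 2.5 (2) for J1 = (x), J2 = (y),
    # where J1 ∩ J2 = (x*y). Uses only the Python standard library.
    import itertools

    for x, y in itertools.product([-1, 0, 1, 2], repeat=2):
        in_union = (x == 0) or (y == 0)   # p in V(J1) ∪ V(J2)
        in_product = (x * y == 0)         # p in V(J1 ∩ J2) = V(x*y)
        assert in_union == in_product
    print("Lemma 2.5 (2) checks out on the sample grid")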

That was rather a lot of tedious set up! If you’re starting to get weary with this formalism, I can’t blame you. You may be losing sight of the purpose of all of this. What are these maps V and I and why do we care how they behave? A fair question indeed.

The answer is simple. Our V,\ I bijections will give us a dictionary between algebra and geometry. With minimal effort we can translate problems into an easier language. In particular, we’ll be allowed to use a generous dose of algebra to sweeten the geometric cocktail! You’ll have to wait until next time to see that in all its glory.

Finally, how does topology fit into all of this? Well, Lemma 2.5 (1)-(3) should give you an inkling. Indeed it instantly shows that the following definition makes sense.

Definition 2.6 We define the Zariski topology on \mathbb{A}^n by taking as closed sets all the affine varieties.
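For instance, in \mathbb{A}^1 the Zariski closed sets are precisely the finite subsets together with \mathbb{A}^1 itself: a nonzero polynomial in one variable has only finitely many roots, and conversely any finite set \{a_1,\dots,a_m\} is the zero set of (x-a_1)\cdots(x-a_m).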

In some sense this is the natural topology on \mathbb{A}^n when we are concerned with solving equations. Letting k=\mathbb{C} we can make some comparisons with the usual Euclidean topology.

First note that since every affine variety is closed in the Euclidean topology, every Zariski closed set is Euclidean closed. However we saw earlier that not all Euclidean closed sets are affine varieties. In fact there are many more Euclidean closed sets than Zariski ones. We say that the Euclidean topology is finer than the Zariski topology. Indeed the Euclidean topology has open balls of arbitrarily small radius, while a typical nonempty Zariski open set is huge, being the complement of something small like a curve or surface in \mathbb{A}^n.

Next time we’ll prove that for algebraically closed k every nonempty Zariski open set is dense in the Zariski topology, and hence (if k =\mathbb{C}) in the Euclidean topology. In particular, no nonempty Zariski open set is bounded in the Euclidean topology. Density also shows that the intersection of two nonempty Zariski open sets of \mathbb{A}^n is never empty, and this immediately tells us that the Zariski topology is not Hausdorff. We really are working with a very strange topological space!

And how is this useful? You know what I am going to say. It gives us yet another perspective on the world of affine varieties! Rather than just viewing them as geometrical objects sitting in \mathbb{A}^n, we can treat them as topological spaces in their own right. We’ll now be able to use the tools of topology to help us learn things about geometry. And there’s the slice of lemon to garnish the perfect cocktail.

I leave you with this enlightening question I recently stumbled upon. Both the question, and the proposed solutions struck me as extremely elegant.