The Theorem of The Existence of Zeroes

It’s time to prove the central result of elementary algebraic geometry. Mostly it’s referred to as Hilbert’s Nullstellensatz. This German term translates roughly to the title of this post. Indeed ‘Null’ means ‘zero’, ‘Stellen’ means ‘places’ and ‘Satz’ means ‘theorem’, so it is literally the theorem of the places of zeroes. But referring to it merely as an existence theorem for zeroes is inadequate. Its real power is in setting up a correspondence between algebra and geometry.

Are you sitting comfortably? Grab a glass of water (or wine if you prefer). Settle back and have a peruse of these theorems. This is your first glance into the heart of a magical subject.

(In many texts these theorems are all referred to as the Nullstellensatz. I think this is both pointless and confusing, so have renamed them! If you have any comments or suggestions about these names please let me know).

Theorem 4.1 (Hilbert’s Nullstellensatz) Let J\subsetneq k[\mathbb{A}^n] be a proper ideal of the polynomial ring. Then V(J)\neq \emptyset. In other words, for every proper ideal there exists a point which simultaneously zeroes all of its elements.

Theorem 4.2 (Maximal Ideal Theorem) Every maximal ideal \mathfrak{m}\subset k[\mathbb{A}^n] is of the form (x_1-a_1,\dots,x_n-a_n) for some (a_1,\dots,a_n)\in \mathbb{A}^n. In other words every maximal ideal is the ideal of some single point in affine space.

Theorem 4.3 (Correspondence Theorem) For every ideal J\subset k[\mathbb{A}^n] we have I(V(J))=\sqrt{J}.

We’ll prove all of these shortly. Before that let’s have a look at some particular consequences. First note that 4.1 is manifestly false if k is not algebraically closed. Consider for example k=\mathbb{R} and n=1. Then certainly V(x^2+1)=\emptyset. Right then. From here on in we really must stick just to algebraically closed fields.

Despite having the famous name, 4.1 is not really immediately useful. In fact we’ll see its main role is as a convenient stopping point in the proof of 4.3 from 4.2. The maximal ideal theorem is much more important. It precisely provides the converse to Theorem 3.10. But it is the correspondence theorem that is of greatest merit. As an immediate corollary of 4.3, 3.8 and 3.10 (recalling that prime and maximal ideals are radical) we have

Corollary 4.4 The maps V,I as defined in 1.2 and 2.4 give rise to the following bijections

\{\textrm{affine varieties in }\mathbb{A}^n\} \leftrightarrow \{\textrm{radical ideals in } k[\mathbb{A}^n]\}
\{\textrm{irreducible varieties in }\mathbb{A}^n\} \leftrightarrow \{\textrm{prime ideals in } k[\mathbb{A}^n]\}
\{\textrm{points in }\mathbb{A}^n\} \leftrightarrow \{\textrm{maximal ideals in } k[\mathbb{A}^n]\}

Proof We’ll prove the first bijection explicitly, for it is so rarely done in the literature. The second and third bijections follow from the argument for the first together with 3.8 and 3.10. Let J be a radical ideal in k[\mathbb{A}^n]. Then V(J) is certainly an affine variety, so V is well defined. Moreover V is injective. For suppose \exists J' radical with V(J')=V(J). Then I(V(J'))=I(V(J)) and thus by 4.3 J = J'. It remains to prove that V is surjective. Take X an affine variety. Then J'=I(X) is an ideal with V(J')=X by Lemma 2.5. But J' is not necessarily radical. Let J=\sqrt{J'}, a radical ideal. Then by 4.3 I(V(J'))=J. So V(J) = V(I(V(J'))) = V(J') = X by 2.5. This completes the proof. \blacksquare
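To make the first bijection concrete, here’s a tiny example in \mathbb{A}^1. Take the non-radical ideal J'=(x^2). Then V(J')=\{0\} and I(V(J'))=(x)=\sqrt{(x^2)}, exactly as 4.3 predicts. The radical ideal (x) and the variety \{0\} correspond to one another under V and I, while the non-radical ideal (x^2) gets collapsed onto (x). This is precisely why we must restrict to radical ideals to obtain a bijection.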

We’ll see in the next post that we need not restrict our attention to \mathbb{A}^n. In fact using the coordinate ring we can gain a similar correspondence for the subvarieties of any given variety. This will lead to an advanced introduction to the language of schemes. With these promising results on the horizon, let’s get down to business. We’ll begin by recalling a definition and a theorem.

Definition 4.5 A finitely generated k-algebra is a ring R s.t. R = k[a_1,\dots,a_n] for some a_i \in R. A finite k-algebra is a ring R s.t. R = ka_1 + \dots + ka_n for some a_i \in R.

This terminology can be confusing when set beside that for k-modules: a finite k-algebra is precisely a k-algebra which is finitely generated as a k-module. The uniform notion of ‘finitely generated’ below applies to both algebras and modules, and you can check it is equivalent to the definitions we’ve already seen. Generation as an algebra is a weaker condition than generation as a module, because an algebra has an extra operation – multiplication – so we may also multiply generators together.

Definition 4.6 We say an algebra (module) A is finitely generated if there exists a finite set of generators F s.t. A is the smallest algebra (module) containing F. We then say that A is generated by F.
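To see the difference concretely, here’s a quick example. The polynomial ring k[x] is a finitely generated k-algebra, generated by the single element x, but it is not a finite k-algebra: the powers 1, x, x^2,\dots are linearly independent over k, so no sum ka_1+\dots +ka_n can exhaust it. By contrast k[x]/(x^2) = k\cdot\bar{1} + k\cdot\bar{x} is a finite k-algebra. Finiteness is the genuinely stronger condition.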

Theorem 4.7 Let k be a general field and A a finitely generated k-algebra. If A is a field then A is algebraic over k.

Okay I cheated a bit saying ‘recall’ Theorem 4.7. You probably haven’t seen it anywhere before. And you might think that it’s a teensy bit abstract! Nevertheless we shall see that it has immediate practical consequences. If you are itching for a proof, don’t worry. We’ll in fact present two. The first will be due to Zariski, and the second an idea of Noether. But before we come to those we must deduce 4.1 – 4.3 from 4.7.

Proof of 4.2 Let m \subset k[\mathbb{A}^n] be a maximal ideal. Then F = k[\mathbb{A}^n]/m is a field. Define the natural homomorphism \pi: k[\mathbb{A}^n] \ni x \mapsto x+m \in F. Note F is a finitely generated k-algebra, generated by the x_i+m certainly. Thus by 4.7 F/k is an algebraic extension. But k is algebraically closed, so this extension must be trivial. Hence k is isomorphic to F via \phi : k \rightarrowtail k[\mathbb{A}^n] \xrightarrow{\pi} F.

Let a_i = \phi^{-1}(x_i+m). Then \pi(x_i - a_i) = 0 so x_i - a_i \in \textrm{ker}\pi = m. Hence (x_1-a_1, \dots, x_n-a_n) \subset m. But (x_1-a_1, \dots, x_n-a_n) is itself maximal by 3.10. Hence m = (x_1-a_1, \dots, x_n-a_n) as required. \blacksquare

That was really quite easy! We just worked through the definitions, making good use of our stipulation that k is algebraically closed. We’ll soon see that all the algebraic content is squeezed into the proof of 4.7.

Proof of 4.1 Let J be a proper ideal in the polynomial ring. Since k[\mathbb{A}^n] is Noetherian, J\subset m for some maximal ideal m. From 4.2 we know that m=I(P) for some point P\in \mathbb{A}^n. Now J\subset I(P) gives V(I(P))\subset V(J), and by 2.5 V(I(P)) = \{P\}. Hence P\in V(J), so V(J) \neq \emptyset. \blacksquare

The following proof is lengthier but still not difficult. Our argument uses a method known as the Rabinowitsch trick.

Proof of 4.3 Let J\triangleleft k[\mathbb{A}^n] and f\in I(V(J)). We want to prove that \exists N s.t. f^N \in J. We start by introducing a new variable t. Define an ideal J_f \supset J by J_f = (J, ft - 1) \subset k[x_1,\dots,x_n,t]. By definition V(J_f) = \{(P,b) \in \mathbb{A}^{n+1} : P\in V(J), \ f(P)b = 1\}. But f \in I(V(J)), so f(P)=0 for every P\in V(J), making f(P)b = 1 impossible. Hence V(J_f) = \emptyset.

Now by 4.1 we must have that J_f is improper. In other words J_f = k[x_1,\dots, x_n, t]. In particular 1 \in J_f. Since k[x_1,\dots, x_n] is Noetherian we know that J is finitely generated, by some \{f_1,\dots,f_r\} say. Thus we can write 1 = \sum_{i=1}^r g_i f_i + g_0 (ft - 1) where g_i\in k[x_1,\dots , x_n, t] (*).

Let N be such that t^N is the highest power of t appearing among the g_i for 0\leq i \leq r. Now multiplying (*) above by f^N yields f^N = \sum_{i=1}^r G_i(x_1,\dots, x_n, ft) f_i + G_0(x_1,\dots,x_n,ft)(ft-1) where we define G_i = f^N g_i. This equation is valid in k[x_1,\dots,x_n, t]. Consider its reduction in the ring k[x_1,\dots,x_n,t]/(ft - 1). We have the congruence f^N\equiv \sum_{i=1}^r h_i (x_1,\dots,x_n) f_i \ \textrm{mod}\ (ft-1) where h_i = G_i(x_1,\dots,x_n,1).

Now consider the map \phi:k[x_1,\dots, x_n]\rightarrowtail k[x_1,\dots, x_n,t]\xrightarrow{\pi} k[x_1,\dots, x_n,t]/(ft-1). Certainly no nonzero element in the image of the injection can possibly lie in the ideal (ft - 1), not having any t dependence. Hence \phi must be injective. But then we see that f^N = \sum_{i=1}^r h_i(x_1,\dots, x_n) f_i holds in the ring k[\mathbb{A}^n]. Recalling that the f_i generate J, we get f^N\in J. The reverse inclusion \sqrt{J}\subset I(V(J)) is easy: if f^N\in J then f^N vanishes on V(J), and hence so does f. \blacksquare
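It’s worth running the trick on a baby example to watch the machinery turn over (this is just a sanity check, not part of the proof). Take n=1, J=(x^2) and f=x\in I(V(J))=I(\{0\}). Then J_f = (x^2, xt-1)\subset k[x,t], and indeed 1 = t^2\cdot x^2 - (xt+1)(xt-1), so J_f is the whole ring. Here g_1 = t^2, so N=2, and multiplying by f^2 gives x^2 = (xt)^2\cdot x^2 - x^2(xt+1)(xt-1). Reducing modulo (ft-1), i.e. setting xt=1, leaves f^2 = x^2 = 1\cdot x^2 \in J, so f\in\sqrt{J} as the theorem predicts.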

We shall devote the rest of this post to establishing 4.7. To do so we’ll need a number of lemmas. You might be unable to see the wood for the trees! If so, you can safely skim over much of this. The important exception is Noether normalisation, which we’ll come to later. I’ll link the ideas of our lemmas to geometrical concepts at our next meeting.

Definition 4.8 Let A,B be rings with B \subset A. Let a\in A. We say that a is integral over B if a is the root of some monic polynomial with coefficients in B. That is to say \exists b_i \in B s.t. a^n + b_{n-1}a^{n-1} + \dots + b_0 = 0. If every a \in A is integral over B we say that A is integral over B or A is an integral extension of B.

Let’s note some obvious facts. Firstly we can immediately talk about A being integral over B when A,B are algebras with B a subalgebra of A. Remember an algebra is still a ring! It’s rather pedantic to stress this now, but hopefully it’ll prevent confusion if I mix my terminology later. Secondly observe that when A and B are fields “integral over” means exactly the same as “algebraic over”.
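A couple of standard examples to fix ideas: \sqrt{2}\in\mathbb{R} is integral over \mathbb{Z}, being a root of the monic polynomial x^2 - 2. On the other hand \frac{1}{2}\in\mathbb{Q} is not integral over \mathbb{Z}. Indeed if (\frac{1}{2})^n + b_{n-1}(\frac{1}{2})^{n-1} + \dots + b_0 = 0 with b_i\in\mathbb{Z}, then multiplying by 2^n gives 1 = -2(b_{n-1} + 2b_{n-2} + \dots + 2^{n-1}b_0), which is absurd. So ‘integral’ is genuinely more restrictive than ‘satisfies some polynomial equation’.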

We’ll begin by proving some results that will be of use in both our approaches. We’ll see that there’s a subtle interplay between finite k-algebras, integral extensions and fields.

Lemma 4.9 Let F be a field and R\subset F a subring. Suppose F is an integral extension of R. Then R is itself a field.

Proof Let 0\neq r \in R. Then certainly r \in F so r^{-1} \in F since F is a field. Now r^{-1} is integral over R so satisfies an equation r^{-n} + b_{n-1} r^{-(n-1)} +\dots + b_0 = 0 with b_i \in R. But now multiplying through by r^{n-1} yields r^{-1} = -(b_{n-1} + b_{n-2}r + \dots + b_0 r^{n-1}) \in R. \blacksquare

Note that this isn’t obvious a priori. The property that an extension is integral contains sufficient information to percolate the property of inverses down to the base ring.

Lemma 4.10 If A is a finite B-algebra then A is integral over B.

Proof Write A = Ba_1 + \dots +Ba_n. Let x \in A. We want to prove that x satisfies some equation x^n + b_{n-1}x^{n-1} + \dots + b_0 = 0. We’ll do so by appealing to our knowledge about determinants. For each a_i we may clearly write xa_i = \sum_{j=1}^{n} b_{ij}a_j for some b_{ij} \in B.

Writing \vec{a} = (a_1, \dots, a_n) and defining the matrix (\beta)_{ij} = b_{ij} we can express our equations as \beta \vec{a} = x\vec{a}. We recognise this as an eigenvalue problem. In particular x satisfies the characteristic polynomial of \beta, a monic polynomial of degree n with coefficients in B. But this is precisely what we wanted to show. \blacksquare
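If you want the ‘eigenvalue’ step spelled out, here is the standard determinant trick, sketched briefly. The relations above say (xI - \beta)\vec{a} = 0. Multiplying on the left by the adjugate matrix \textrm{adj}(xI-\beta) gives \det(xI-\beta)\, a_i = 0 for every i. Since 1 \in A = Ba_1 + \dots + Ba_n we may write 1 = \sum c_i a_i with c_i \in B, whence \det(xI-\beta) = \det(xI-\beta)\cdot 1 = 0. Expanding the determinant exhibits exactly a monic degree n polynomial in x with coefficients in B.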

Corollary 4.11 Let A be a field and B\subset A a subring. If A is a finite B-algebra then B is itself a field.

Proof Immediate from 4.9 and 4.10. \blacksquare

We now focus our attention on Zariski’s proof of the Nullstellensatz. I take as a source Daniel Grayson’s excellent exposition.

Lemma 4.12 Let F be a field, R\subset F a subring, and suppose F is generated as an R-algebra by some x \in F. Then \exists s \in R s.t. S = R[s^{-1}] is a field. Moreover x is algebraic over S.

Proof Let R' be the fraction field of R. Now recall that x is algebraic over R' iff R'[x] \supset R'(x). Thus x is algebraic over R' iff R'[x] is a field. So certainly our x is algebraic over R' for we are given that F a field. Let x^n + f_{n-1}x^{n-1} + \dots + f_0 be the minimal polynomial of x.

Now define s\in R to be a common denominator of the f_i, so that f_0,\dots, f_{n-1} \in R[s^{-1}] = S. Then x is integral over S, so F = S[x] is a finite S-algebra and hence F/S is an integral extension by 4.10. But then by 4.9 S is a field, and x is algebraic over it. \blacksquare

Observe that this result is extremely close to 4.7. Indeed if we take R to be a field we have S = R in 4.12. The lemma then says that R[x] is algebraic as a field extension of R. Morally this proof mostly just used definitions. The only nontrivial fact was the relationship between R'(x) and R'[x]. Even this is not hard to show rigorously from first principles, and I leave it as an exercise for the reader.

We’ll now attempt to generalise 4.12 to R[x_1,\dots,x_n]. The argument is essentially inductive, though quite laborious. 4.7 will be immediate once we have succeeded.

Lemma 4.13 Let R = F[x] be a polynomial ring over a field F. Let 0\neq u\in R. Then R[u^{-1}] is not a field.

Proof By Euclid, R has infinitely many prime elements. Let p be a prime not dividing u. Suppose \exists q \in R[u^{-1}] s.t. qp = 1. Then q = f(u^{-1}) where f a polynomial of degree n with coefficients in R. Hence in particular u^n = u^n f(u^{-1}) p holds in R for u^n f(u^{-1}) \in R. Thus p | u^n but p prime so p | u. This is a contradiction. \blacksquare
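For a concrete instance of the lemma, take R = F[x] and u = x. Then R[u^{-1}] = F[x, x^{-1}] is the ring of Laurent polynomials, whose units are exactly the monomials cx^m with 0\neq c\in F. The prime x+1 does not divide x, and sure enough it has no inverse in F[x,x^{-1}], so the localisation is not a field.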

Corollary 4.14 Let K be a field, F\subset K a subfield, and x \in K. Let R = F[x]. Suppose \exists u\in R s.t. R[u^{-1}] = K. Then x is algebraic over F. Moreover R = K.

Proof Suppose x were transcendental over F. Then R=F[x] would be a polynomial ring, so by 4.13 R[u^{-1}] couldn’t be a field. Hence x is algebraic over F so R is a field. Hence R=R[u^{-1}]=K. \blacksquare

The following fairly abstract theorem is the key to unlocking the Nullstellensatz. It’s essentially a slight extension of 4.14, applying 4.12 in the process. I’d recommend skipping the proof first time, focussing instead on how it’s useful for the induction of 4.16.

Theorem 4.15 Take K a field, F \subset K a subring, x \in K. Let R = F[x]. Suppose \exists u\in R s.t. R[u^{-1}] = K. Then \exists 0\neq s \in F s.t. F[s^{-1}] is a field. Moreover F[s^{-1}][x] = K and x is algebraic over F[s^{-1}].

Proof Let L=\textrm{Frac}(F). Now by 4.14 we can immediately say that L[x]=K, with x algebraic over L. Now we seek our element s with the desired properties. Looking back at 4.12, we might expect it to be useful. But to use 4.12 for our purposes we’ll need to apply it to some F' = F[t^{-1}] with F'[x] = K, where t \in F.

Suppose we’ve found such a t. Then 4.12 gives us s' \in F' s.t. F'[s'^{-1}] is a field with x algebraic over it. But now s' = qt^{-m} for some q \in F, \ m \in \mathbb{N}. Now F'[s'^{-1}]=F[t^{-1}][s'^{-1}]=F[(qt)^{-1}], so setting s=qt completes the proof. (You might want to think about that last equality for a second. It’s perhaps not immediately obvious).

So all we need to do is find t. We do this using our first observation in the proof. Observe that u^{-1}\in K=L[x] so we can write u^{-1}=l_0+\dots +l_{n-1}x^{n-1}, l_i \in L. Now let t \in F be a common denominator for all the l_i, so that each l_i \in F'=F[t^{-1}]. Then u^{-1} \in F'[x], so F'[x]\supset F[x][u^{-1}]=K and hence F'[x]=K as required. \blacksquare

Corollary 4.16 Let k be a ring and A a field, finitely generated as a k-algebra by x_1,\dots,x_n. Then \exists 0\neq s\in k s.t. k[s^{-1}] is a field, with A a finite algebraic extension of k[s^{-1}]. Trivially if k is a field, then A is algebraic over k, establishing 4.7.

Proof Apply Theorem 4.15 with F=k[x_1,\dots,x_{n-1}], x=x_n, u=1 to get s'\in F s.t. A' = k[x_1,\dots,x_{n-1}][s'^{-1}] is a field with x_n algebraic over it. But now apply 4.15 again with F=k[x_1,\dots, x_{n-2}], u = s' to deduce that A''=k[x_1,\dots, x_{n-2}][s''^{-1}] is a field, with A' algebraic over A'', for some s'' \in F. Applying 4.15 a further (n-2) times gives the result. \blacksquare

This proof of the Nullstellensatz is pleasingly direct and algebraic. However it has taken us a long way away from the geometric content of the subject. Moreover 4.13-4.15 are pretty arcane in the current setting. (I’m not sure whether they become more meaningful with a better knowledge of the subject. Do comment if you happen to know)!

Our second proof sticks closer to the geometric roots. We’ll introduce an important idea called Noether Normalisation along the way. For that you’ll have to come back next time!

Gravity for Beginners

Everyone knows about gravity. Every time you drop a plate or trip on the stairs you’re painfully aware of it. It’s responsible for the thrills and spills at theme parks and the heady hysteria of a plummeting toboggan. But gravity is not merely restricted to us small fry on Earth. It is a truly universal force, clumping mass together to form stars and planets, keeping worlds in orbit around their Suns. Physicists call gravity a fundamental force – it cannot be explained in terms of any other interaction.

The most accurate theory of gravity is Einstein’s General Theory of Relativity. The presence of mass warps space and time, creating the physical effects we observe. The larger the mass, the more curved space and time become, so the greater the gravitational pull. Space and time are a rubber sheet, which a large body like the Sun distorts.

Gravity is very weak compared to other fundamental forces. This might be a bit of a surprise – after all it takes a very powerful rocket to leave Earth’s orbit. But this is just because Earth is so very huge. Things on an everyday scale don’t seem to be pulled together by gravity. But small magnets certainly are attracted to each other by magnetism. So magnetism is stronger than gravity.

The fact that gravity acts by changing the geometry of space and time sets it apart from all other forces. In fact our best theories of particles assume that there is some cosmic blank canvas on which interactions happen. This dichotomy and the weakness of gravity give rise to the conflicts at the heart of physics.

It’s All Relative, After All

“You can’t just let nature run wild.”

Walt Disney

Throughout history, scientists have employed principles of relativity to understand nature. In broad terms these say that different people doing the same scientific experiment should get the same answer. This stands to reason – we travel the world and experience the same laws of nature everywhere. Any differences we might think we observe can be explained by a change in our experimental conditions. Looking for polar bears in the Sahara is obvious madness.

In the early 17th century Galileo formulated a specific version of relativity that applied to physics. He said that the laws of physics are the same for everyone moving at constant speed, regardless of the speed they are going. We notice this in everyday life. If you do an egg and spoon race walking at a steady pace the egg will stay on the spoon just as if you were standing still.

If you try to speed up or slow down, though, the egg will likely go crashing to the ground. This shows that for an accelerating observer the laws of physics are not the same. Newton noticed this and posited that an accelerating object feels a force that grows with its mass and acceleration. Sitting in a plane on takeoff this force pushes you backwards into your seat. Physicists call this Newton’s Second Law.

Newton’s Second Law can be used the opposite way round too. Let’s take an example. Suppose we drop an orange in the kitchen. As it travels through the air its mass certainly stays the same and the only significant force it feels is gravity. Using Newton’s Second Law we can calculate the acceleration of the orange. Now acceleration is just change in speed over time. So given any time in the future we can predict the speed of the orange.

But speed is change in distance over time. So we know the distance the orange has travelled towards the ground after any given time. In particular we can say exactly how much time it’ll be before the orange goes splat on the kitchen floor. One of the remarkable features of the human brain is that it can do approximations to these calculations very quickly, enabling us to catch the orange before disaster strikes.

The power of the principle of relativity is now apparent. Suppose we drop the orange in a lift, while steadily travelling upwards. We can instantly calculate how long it’ll take to hit the lift floor. Indeed by the principle of relativity it must take exactly the same amount of time as it did when we were in the kitchen.

There’s a hidden subtlety here. We’ve secretly assumed that there is some kind of universal clock, ticking away behind the scenes. In other words, everyone measures time the same, no matter how they’re moving. There’s also a mysterious cosmic tape measure somewhere offstage. That is, everyone agrees on distances, regardless of their motion. These hypotheses are seemingly valid for everyday life.

But somehow these notions of absolute space and time are a little unsettling. It would seem that Galileo’s relativity principle applies not only to physics, but also to all of space and time. Newton’s ideas force the universe to exist against the fixed backdrop of graph paper. Quite why the clock ticks and the ruler measures precisely as they do is not up for discussion. And the mysteries only deepen with Newton’s theory of gravity.

A Tale of Two Forces

“That one body may act upon another at a distance through a vacuum without the mediation of anything else […] is to me so great an absurdity that […] no man who has […] a competent faculty of thinking could ever fall into it.”

Sir Isaac Newton

Newton was arguably the first man to formulate a consistent theory of gravitation. He claimed that two masses attract each other with a force related to their masses and separation. The heavier the objects and the closer they are, the bigger the force of gravity.

This description of gravity was astoundingly successful.  It successively accounted for the curvature of the Earth, explained the motion of the planets around the Sun, and predicted the precise time of appearance of comets. Contemporary tests of Newton’s theory returned a single verdict – it was right.

Nevertheless Newton was troubled by his theory. According to his calculations, changes in the gravitational force must be propagated instantaneously throughout the universe. Naturally he sought a mechanism for such a phenomenon. Surely something must carry this force and effect its changes.

Cause and effect is a ubiquitous feature of everyday physics. Indeed Newton’s Second Law says motion and force are inextricably linked. Consider the force which opens a door. It has a cause – our hand pushing against the wood – and an effect – the door swinging open. But Newton couldn’t come up with an analogous explanation for gravity. He had solved the “what”, but the “why” and “how” evaded him entirely.

For more than a century Newton’s theories reigned supreme. It was not until the early 1800s that physicists turned their attention firmly towards another mystery – electricity. Michael Faraday led the charge with his 1821 invention of an electric motor. By placing a coil of wire in a magnetic field and connecting a battery he could make it rotate. The race was on to explain this curious phenomenon.

Faraday’s work implied that electricity and magnetism were two sides of the same coin. The strange force he had observed became known, appropriately, as electromagnetism. The scientific community quickly settled on an idea. Electricity and magnetism were examples of a force field – at every point in space surrounding a magnet or current, there were invisible lines of force which affected the motion of nearby objects.

All they needed now were some equations. This would allow them to predict the behaviour of currents near magnets, and verify that this revolutionary force field idea was correct. Without a firm mathematical footing the theory was worthless. Several preeminent figures tried their hand at deriving a complete description, but in 1861 the problem remained open.

In that year a young Scotsman named James Clerk Maxwell finally cracked the issue. By modifying the findings of those before him he arrived at a set of equations which completely described electromagnetism. He even went one better than Newton had with gravity. He found a mechanism for the transmission of electromagnetic energy.

Surprisingly, Maxwell’s equations suggested that light was an electromagnetic wave. To wit, solutions showed that electricity and magnetism could spread out in a wavelike manner. Moreover the speed of these waves was determined by a constant in his equations. This constant turned out to be very close to the expected speed of light in a vacuum. It wasn’t a giant leap to suppose this wave was light itself.

At first glance it seems like these theories solve all of physics. To learn about gravity and the motion of uncharged objects we use Newton’s theory. To predict electromagnetic phenomena we use Maxwell’s. Presumably to understand the motion of magnets under gravity we need a bit of both. But trying to get Maxwell’s equations to play nice with Galileo’s relativity throws a big spanner in the works.

A Breath of Fresh Air

“So the darkness shall be the light, and the stillness the dancing.”

T.S. Eliot

Galileo’s relativity provides a solid bedrock for Newtonian physics. It renders relative speeds completely irrelevant, allowing us to concentrate on the effects of acceleration and force. Newton’s mechanics and Galileo’s philosophy reinforce each other. In fact Newton’s equations look the same no matter what speed you are travelling at. Physicists say they are invariant under Galileo’s relativity.

We might naively hope that Maxwell’s equations are invariant. Indeed a magnet behaves the same whether you are sitting still at home or running around looking for buried treasure. Currents seem unaffected by how fast you are travelling – an iPod still works on a train. It would be convenient if electromagnetism had the same equations everywhere.

Physicists initially shared this hope. But it became immediately apparent that Maxwell’s equations were not invariant. A change in the speed you were travelling caused a change in the equations. Plus there was only one speed at which solutions to the equations gave the right answer! Things weren’t looking good for Galileo’s ideology.

To solve this paradox, physicists made clever use of a simple concept. Since antiquity we’ve had a sense of perspective – the world looks different from another point of view. Here’s a simple example in physics. Imagine you’re running away from a statue. From your viewpoint the statue is moving backwards. From the statue’s viewpoint you are moving forwards. Both descriptions are right. They merely describe the same motion in contrary ways.

In physics we give perspective a special name. Any observer has a frame of reference from which they see the world. In your frame of reference the statue moves backwards. In the statue’s frame of reference you move forwards. We’ve seen that Newton’s physics is the same in every frame of reference which moves with a constant speed.

We can now rephrase our discovery about Maxwell’s equations. Physicists found that there was one frame of reference in which they were correct. Maxwell’s equations somehow prefer this frame of reference over all other ones! Any calculations in electromagnetism must be done relative to this fixed frame. But why is this frame singled out?

Faced with this question, physicists spotted the chance to kill two birds with one stone. The discovery that light is a wave of electromagnetism raises an immediate question. What medium carries the light waves? We’re used to waves travelling through some definite substance. Water waves need a sea or ocean, sound waves require air, and earthquakes move through rock. Light waves must travel through a fixed mysterious fog pervading all of space. It was called aether.

The aether naturally has its own frame of reference, just as you and I do. When we measure the speed of light in vacuum, we’re really measuring it relative to the aether. So the aether is a special reference frame for light. But light is an electromagnetic wave. It’s quite sensible to suggest that Maxwell’s preferred reference frame is precisely the aether!

Phew, we’ve sorted it. Newtonian mechanics works with Galilean relativity because there’s nothing to specify a particular reference frame. Maxwell’s equations don’t follow relativity because light waves exist in the aether, which is a special frame of reference. Once we’ve found good evidence for the aether we’ll be home and dry.

So thought physicists in the late 19th century. The gauntlet was down. Provide reliable experimental proof and win instant fame. Two ambitious men took up the challenge – Albert Michelson and Edward Morley.

They reasoned that the Earth must be moving relative to the aether. Therefore from the perspective of Earth the aether must move. Just as sound moves faster when carried by a gust of wind, light must move faster when carried by a gust of aether. This prediction became known as the aether wind.

By measuring the speed of light in different directions, Michelson and Morley could determine the direction and strength of the aether wind. The speed of light would be greatest when they aligned their measuring apparatus with the wind. Indeed the gusts of aether would carry light more swiftly. The speed of light would be slowest when they aligned their apparatus at right angles to the wind.

The expected changes in speed were so minute that it required great ingenuity to measure them. Nevertheless by 1887 they had perfected a cunning technique. With physicists waiting eagerly for confirmation, Michelson and Morley’s experiment failed spectacularly. They measured no aether wind at all.

The experiment was repeated hundreds of times in the subsequent years. The results were conclusive. It didn’t matter which orientation you chose, the speed of light was the same. This devastating realisation blew aether theory to smithereens. Try as they might, no man nor woman could paper over the cracks. Physics needed a revolution.

How Time Flies

“Put your hand on a hot stove for a minute, and it seems like an hour. Sit with a pretty girl for an hour, and it seems like a minute. That’s relativity.”

Albert Einstein

It is the most romanticised of fables. The task of setting physics right again fell to a little known patent clerk in Bern. The life and work of Albert Einstein has become a paradigm for genius. But yet the idea that sparked his reworking of reality was beautifully simple.

For a generation, physicists had been struggling to reconcile Galileo and Maxwell. Einstein claimed that they had missed the point. Galileo and Maxwell are both right. We’ve just misunderstood the very nature of space and time.

This seems a ridiculously bold assumption. It’s best appreciated by way of an analogy. Suppose you live in a rickety old house and buy a nice new chair. Placing it in your lounge you notice that it wobbles slightly. You wedge a newspaper under one leg. At first glance you’ve solved the problem. But when you sit on the chair it starts wobbling again.

After a few more abortive attempts with newspapers, rags and other household items you decide to put the chair in another room. But the wobble won’t go away. Even when you sand down the legs you can’t make it stand firm. The chair and the house seem completely incompatible.

One day a rogue builder turns up. He promises to fix your problem. You don’t believe he can. Neither changing the house nor the chair has made the slightest difference to you. When you return from work you are aghast to see he’s knocked the whole house down. He shouts up to you from a hole in the ground, “just fixing your foundations”!

More concretely Einstein said that there’s no problem with Galileo’s relativity. The laws of physics really are the same in every frame moving at constant speed. Physicists often rename this idea Einstein’s special principle of relativity even though it is Galileo’s invention! Einstein also had no beef with Maxwell’s equations. In particular they are the same in every frame.

To get around the fact that Maxwell’s equations are not invariant under Galileo’s relativity, Einstein claimed that we don’t understand space and time. He claimed that there is no universal tape measure, or cosmic timepiece. Everybody is responsible for their own measurements of space and time. This clears up some of the issues that annoyed Newton.

In order to get Maxwell’s equations to be invariant when we change perspective, Einstein had to alter the foundations of physics. He postulated that each person’s measurements of space and time were different, depending on how fast they were going. This correction magically made Maxwell’s equations work with his principle of special relativity.

There’s an easy way to understand how Einstein modified space and time. We’re used to thinking that we move at a constant speed through time. Clocks, timers and watches all attest to our obsession with measuring time consistently. The feeling of time dragging on or whizzing by is merely a psychological curiosity. Before Einstein space didn’t have this privilege. We only move through space as we choose. And moving through space has no effect on moving through time.

Einstein made everything much more symmetrical. He said that we are always moving through both space and time. We can’t do anything about it. Everyone always moves through space and time at the same speed – the speed of light. To move faster through space you must move slower through time to compensate. The slower you trudge through space, the faster you whizz through time. Simple as.

Einstein simply put space and time on a more even footing. We call the whole construct spacetime. Intuitively spacetime is four dimensional. That is, you can move in four independent directions. Three of these are in space, up-down, left-right, forward-backward. One of these is in time.

You probably haven’t noticed it yet, but special relativity has some weird effects. First and foremost, the speed of light is the same however fast you are going. This is because it is a constant in Maxwell’s equations, which are the same in all frames. You can never catch up with a beam of light! This is not something we are used to from everyday life.

Nevertheless it can be explained using Einstein’s special relativity. Suppose you measure the speed of light when you are stationary. You do this by measuring the amount of time it takes for light to go a certain distance. For the sake of argument assume you get the answer 10 mph.

Now imagine speeding up to 5 mph. You measure the speed of light again. Without Einstein you’d expect the result to be 5 mph. But because you’re moving faster through space you must be moving slower through time. That means it’ll take the light less time to go the same distance. In fact the warping of spacetime precisely accounts for the speed you’ve reached. The answer is again 10 mph.

Einstein’s spacetime also forces us to forget our usual notions of simultaneity. In everyday physics we can say with precision whether two events happen at the same time. But this concept relies precisely on Newton’s absolute time. Without his convenient divine timepiece we can’t talk about exact time. We can only work with relative perspectives.

Let’s take an example. Imagine watching a Harrison Ford thriller. He’s standing on the top of a train as it rushes through the station. He positions himself precisely in the center of the train. At either end is a rapier wielding bad guy eager to kill him. Ford is equipped with two guns which he fires simultaneously at his two nemeses. These guns fire beams of light that kill the men instantly.

In Ford’s frame both men die at the same time. The speed of light is the same in both directions and he’s equidistant from the men when he shoots. Therefore the beams hit at the same moment according to Ford. But for Ford’s sweetheart on the platform the story is different.

Let’s assume she is aligned with Ford at the moment he shoots. She sees the back of the train catching up with the point where Ford took the shot. Moreover she sees the front of the train moving away from the point of firing. Now the speed of light is constant in all directions for her. Therefore she’ll see the man at the back of the train get hit before the man at the front!

Remarkably all of these strange effects have been experimentally confirmed. Special relativity and spacetime really do describe our universe. But with our present understanding it seems that making genuine measurements would be nigh on impossible. We’ve only seen examples of things we can’t measure with certainty!

Thankfully there is a spacetime measurement all observers can agree on. This quantity is known as proper time. It’s very easy to calculate the proper time between events A and B. Take a clock and put it on a spaceship. Set your spaceship moving at a constant speed so that it goes through the spacetime points corresponding to A and B. The proper time between A and B is the time that elapses on the spaceship clock between A and B.

Everyone is forced to agree on the proper time between two events. After all it only depends on the motion of the spaceship. Observers moving at different speeds will all see the same time elapsed according to the spaceship clock. The speed that they are moving has absolutely no effect on the speed of your spaceship!

You might have spotted a potential flaw. What if somebody else sets off a spaceship which takes a different route from A to B? Wouldn’t it measure a different proper time? Indeed it would. But it turns out this situation is impossible. Remember that we had to set our spaceship off at a constant speed. This means it is going in a straight line through spacetime. It’s easy to see that there’s only one possible straight line route joining any two points. (Draw a picture)!

That’s it. You now understand special relativity. In just a few paragraphs we’ve made a huge conceptual leap. Forget about absolute space and time – it’s plain wrong. Instead use Einstein’s new magic measurement of proper time. Proper time doesn’t determine conventional time or length. Rather it tells us about distances in spacetime.

Einstein had cracked the biggest problem in physics. But he wasn’t done yet. Armed with his ideas about relativity and proper time, he turned to the Holy Grail. Could he go one better than Newton? It was time to explain gravity.

The Shape Of Space

“[…] the great questions about gravitation. Does it require time? […] Has it any reference to electricity? Or does it stand on the very foundation of matter – mass or inertia?”

James Clerk Maxwell

Physicists are always happy when they are going at constant speed. Thanks to Galileo and Einstein they don’t need to worry exactly how fast they are going. They can just observe the world and be sure that their observations are true interpretations of physics.

We can all empathise with this. It’s much more pleasant going on a calm cruise across the Aegean than a rocky boat crossing the English Channel. This is because the choppy waters cause the boat to accelerate from side to side. We’re no longer travelling at a constant speed, so the laws of physics appear unusual. This can have unpleasant consequences if we don’t find our sea legs.

Given this overwhelming evidence it seems madness to alter Einstein and Galileo’s relativity. But this is exactly what Einstein did. He postulated a general principle of relativity – the laws of physics are the same in any frame of reference. That is, no matter what your speed or acceleration.

To explain this brazen statement, we’ll take an example. Imagine you go to Disneyland and queue for the Freefall Ride. In this terrifying experience you are hoisted 60 metres in the air and then dropped. As you plunge towards the ground you notice that you can’t feel your weight. For a few moments you are completely weightless! In other words there is no gravity in your accelerating frame.

Let’s try a similar thought experiment. Suppose you wake up and find yourself in a bare room with no windows or doors. You stick to the floor as you usually would on Earth. You might be forgiven for thinking that you were under the influence of terrestrial gravity. In fact you could equally be trapped in a spaceship, accelerating at precisely the correct rate to emulate the force of gravity. Just think back to Newton’s Second Law and this is obviously true.

Hopefully you’ve spotted a pattern. Acceleration and gravity are bound together. In fact there seems to be no way of unpicking the knots. Immediately Einstein’s general relativity becomes more credible. An accelerating frame is precisely a constant speed frame with gravity around it.

So far so good. But we haven’t said anything about gravity that couldn’t be said about other forces. If we carry on thinking of gravity as a conventional interaction the argument becomes quite circular. We need a description of gravity that is independent of frames.

Here special relativity comes to our rescue. Remember that proper time gives us a universally agreed distance between two points. It does so by finding the length of a straight line in spacetime from one event to the other. This property characterises spacetime as flat.

To see this imagine you are a millipede, living on a piece of string. You can only move forwards or backwards. In other words you have one dimension of space. You also move through time. Therefore you exist in a two dimensional spacetime.

One day you decide to make a map of all spacetime. You travel to every point in spacetime and write down the proper time it took to get there. Naturally you are very efficient, and always travel by the shortest possible route. When you return you try to make a scale drawing of all your findings on a flat sheet of paper.

Suppose your spacetime is exactly as we’d expect from special relativity. Then you have no problem making your map. Indeed you always travelled by the shortest route to each point, which is a straight line in the spacetime of special relativity. On your flat sheet of paper, the shortest distance between two points is also a straight line. So your map will definitely work out.

You can now see why special relativistic spacetime is flat. Just for fun, let’s assume that you had trouble making your map. Just as you’ve drawn a few points on the map you realise there’s a point which doesn’t fit. Thinking you must be mistaken you try again. The same thing happens. You go out for another trek round spacetime. Exhausted you return with the same results. How immensely puzzling!

Eventually you start to realise that something fundamental is wrong. What if spacetime isn’t flat? You grab a nearby bowling ball and start drawing your map on its surface. Suddenly everything adds up. The distances you measured between each point work out perfectly. Your spacetime isn’t flat, it’s spherical.

Let’s look a bit closer at why this works. When you went out measuring your spacetime you took the shortest route to every point. On the surface of a sphere this is not a straight line. Rather it is an arc of a great circle, like a line of longitude or the equator. If you aren’t convinced, find a globe and measure it yourself on the Earth’s surface.

When you try to make a flat map, it doesn’t work. This is because the shortest distance between two points on the flat paper is a straight line. The two systems of measurement are just incompatible. We’re used to seeing this every time we look at a flat map of the world. The distances on it are all wrong because it has to be stretched and squished to fit the flat page.

Gallivanting millipedes aside, what has this got to do with reality? If we truly live in the universe of special relativity then we don’t need to worry about such complications. Everything is always flat! But Einstein realised that these curiosities were exactly the key to the treasure chest of gravity.

In his landmark paper of 1915, Einstein told us to forget gravity as a force. Instead, he claimed, gravity modifies the proper time between events. In doing so, it changes the geometry of spacetime. The flat, featureless landscape of special relativity was instantly replaced by cliffs and ravines.

More precisely Einstein came up with a series of equations which described how the presence of mass changes the calculation of proper time. Large masses can warp spacetime, causing planets to orbit their suns and stars to form galaxies. Nothing is immune to the change in geometry. Even light bends around massive stars. This has been verified countless times during solar eclipses.

Einstein had devised a frame independent theory of gravity which elegantly explained all gravitational phenomena. His general principle of relativity was more than vindicated; it became the title of a momentous new perspective on the cosmos. The quest to decipher Newton’s gravity was complete.

In the past century, despite huge quantities of research, we have made little progress beyond Einstein’s insights. The experimental evidence for general relativity is overwhelming. But yet it resists all efforts to express it as a quantum theory. We stand at an impasse, looking desperately for a way across.

Radical Progress

I’ll start this post by tying up some loose ends from last time. Before we get going there’s no better recommendation for uplifting listening than this marvellous recording. Hopefully it’ll help motivate and inspire you (and me) as we journey deeper into the weird and wonderful world of algebra and geometry.

I promised a proof that for algebraically closed fields k every nonempty Zariski open set is dense in the Zariski topology. Quite a mouthful at this stage of a post, I admit. Basically what I’m showing is that Zariski open sets are really damn big, only in a mathematically precise way. But what of this ‘algebraically closed’ nonsense? Time for a definition.

Definition 3.1 A field k is algebraically closed if every nonconstant polynomial in k[x] has a root in k.

Let’s look at a few examples. Certainly \mathbb{R} isn’t algebraically closed. Indeed the polynomial x^2 + 1 has no root in \mathbb{R}. By contrast \mathbb{C} is algebraically closed, by virtue of the Fundamental Theorem of Algebra. Clearly no finite field is algebraically closed. Indeed suppose k=\{p_1,\dots ,p_n\} then (x-p_1)\dots (x-p_n) +1 has no root in k. We’ll take a short detour to exhibit another large class of algebraically closed fields.

Definition 3.2 Let k,\ l be fields with k\subset l. We say that l is a field extension of k and write l/k for this situation. If every element of l is the root of a nonzero polynomial in k[x] we call l/k an algebraic extension. Finally we say that the algebraic closure of k is the algebraic extension \bar{k} of k which is itself algebraically closed.

(For those with a more technical background, recall that the algebraic closure is unique up to k-isomorphisms, provided one is willing to apply Zorn’s Lemma).

The idea of algebraic closure gives us a pleasant way to construct algebraically closed fields. However it gives us little intuition about what these fields ‘look like’. An illustrative example is provided by the algebraic closure of the finite field of order p^d for p prime. We’ll write \mathbb{F}_{p^d} for this field, as is common practice. It’s not too hard to prove the following

Theorem 3.3 \overline{\mathbb{F}_{p^d}}=\bigcup_{n=1}^{\infty}\mathbb{F}_{p^{n!}}

Proof Read this PlanetMath article for details.
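As a quick sanity check that the right hand side even makes sense, recall that \mathbb{F}_{p^a}\subset\mathbb{F}_{p^b} precisely when a \mid b. Since n! \mid (n+1)! the fields \mathbb{F}_{p^{n!}} form an increasing chain, so their union really is a field, and it contains every finite field of characteristic p because m \mid m! for all m. The genuine content of the theorem is that this union is algebraically closed.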

Now we’ve got a little bit of an idea what algebraically closed fields might look like! In particular we’ve constructed such fields with characteristic p for all p. From now on we shall boldly assume that for our purposes

every field k is algebraically closed

I imagine that you may have an immediate objection. After all, I’ve been recommending that you use \mathbb{R}^n to gain an intuition about \mathbb{A}^n. But we’ve just seen that \mathbb{R} is not algebraically closed. Seems like we have an issue.

At this point I have to wave my hands a bit. Since \mathbb{R}^n is a subset of \mathbb{C}^n we can recover many (all?) of the geometrical properties we want to study in \mathbb{R}^n by examining them in \mathbb{C}^n and projecting appropriately. Moreover since \mathbb{C}^n can be identified with \mathbb{R}^{2n} in the Euclidean topology, our knowledge of \mathbb{R}^n is still a useful intuitive guide.

However we should be aware that when we are examining affine plane curves with k=\mathbb{C} they are in some sense 4 dimensional objects – subsets of \mathbb{C}^2. If you can imagine 4 dimensional space then you are a better person than I! That’s not to say that these basic varieties are completely intractable though. By looking at projections in \mathbb{R}^3 and \mathbb{R}^2 we can gain a pretty complete geometric insight. And this will soon be complemented by our burgeoning algebraic understanding.

Now that I’ve finished rambling, here’s the promised proof!

Lemma 3.4 Every nonempty Zariski open subset of \mathbb{A}^1 is dense.

Proof Recall that k[x] is a principal ideal domain. Thus any ideal I\subset k[x] may be written I=(f). But k is algebraically closed so f splits into linear factors. In other words I = ((x-a_1)\dots (x-a_n)). Hence the nontrivial Zariski closed subsets of \mathbb{A}^1 are finite. Since k, and hence \mathbb{A}^1, is infinite, the closure of a nonempty Zariski open subset must be the whole of \mathbb{A}^1, so certainly the nonempty Zariski open subsets of \mathbb{A}^1 are dense. \blacksquare

I believe that the general case is true for the complement of an irreducible variety, a concept which will be introduced next. However I haven’t been able to find a proof, so have asked here.

How do varieties split apart? This is a perfectly natural question. Indeed many objects, both in mathematics and the everyday world, are made of some fundamental building block. Understanding this ‘irreducible atom’  gives us an insight into the properties of the object itself. We’ll thus need a notion for what constitutes an ‘irreducible’ or ‘atomic’ variety.

Definition 3.5 An affine variety X is called reducible if one can write X=Y\cup Z with Y,\ Z proper subvarieties of X. If X is not reducible, we call it irreducible.

This seems like a good and intuitive way of defining irreducibility. But we don’t yet know that every variety can be constructed from irreducible building blocks. We’ll use the next few minutes to pursue such a theorem.

As an aside, I’d better tell you about some notational confusion that sometimes creeps in. Some authors use the term algebraic set for  my term affine variety. Such books will often use the term affine variety to mean irreducible algebraic set. For the time being I’ll stick to my guns, and use the word irreducible when it’s necessary!

Before we go theorem hunting, let’s get an idea about what irreducible varieties look like by examining some examples. The ‘preschool’ example is that V(x_1 x_2)\subset \mathbb{A}^2 is reducible, for indeed V(x_1 x_2) = V(x_1)\cup V(x_2). This is neither very interesting nor really very informative, however.

A better example is the fact that \mathbb{A}^1 is irreducible. To see this, recall that earlier we found that the only proper subvarieties of \mathbb{A}^1 are finite. But k is algebraically closed, so infinite. Hence we cannot write \mathbb{A}^1 as the union of two proper subvarieties!

What about the obvious generalization of this to \mathbb{A}^n? Turns out that it is indeed true, as we might expect. For the sake of formality I’ll write it up as a lemma.

Lemma 3.6 \mathbb{A}^n is irreducible

Proof Suppose for contradiction that we could write \mathbb{A}^n=X_1\cup X_2 with X_1,\ X_2 proper subvarieties. Since the X_i are proper we may choose nonzero polynomials f\in I(X_1) and g\in I(X_2). Then \mathbb{A}^n=X_1\cup X_2\subset V(f)\cup V(g), so \mathbb{A}^n=V(f)\cup V(g). Moreover V(f)\cup V(g)=V(fg). Indeed by Lemma 2.5 V(f)\cup V(g) = V((f)\cap (g)), and since (fg)\subset (f)\cap(g) we get V((f)\cap(g))\subset V(fg). Conversely if x\in V(fg) then either f(x) = 0 or g(x) = 0, so x \in V(f)\cup V(g).

Now V(fg)=\mathbb{A}^n immediately tells us fg(x) = 0 \ \forall x\in \mathbb{A}^n. We’ll prove by induction on n that this forces f or g to be the zero polynomial, contradicting our choice of f and g.

We first note that since k is algebraically closed, k is infinite. For n=1 suppose f,\ g \neq 0. Then f and g each vanish at only finitely many points. Thus since k is infinite, fg cannot vanish everywhere, a contradiction.

Now let n>1. Consider f,\ g nonzero polynomials in k[\mathbb{A}^n], regarded as polynomials in x_n with coefficients in k[\mathbb{A}^{n-1}]. For all but finitely many values a\in k of x_n, both f and g remain nonzero as polynomials in k[\mathbb{A}^{n-1}] after substituting x_n=a; fix such an a. By the induction hypothesis their product cannot vanish everywhere, so fg does not vanish everywhere. This completes the induction. \blacksquare

I’ll quickly demonstrate that \mathbb{A}^n is quite strange, when considered as a topological space with the Zariski topology! Indeed let U and V be two nonempty open subsets. Then U\cap V\neq \emptyset. Otherwise \mathbb{A}^n\setminus U,\ \mathbb{A}^n\setminus V would be proper closed subsets (affine subvarieties) which covered \mathbb{A}^n, violating irreducibility. This is very much not what happens in the Euclidean topology! Similarly we now have a rigorous proof that a nonempty open subset U of \mathbb{A}^n is dense. Otherwise \bar{U} and \mathbb{A}^n\setminus U would be proper subvarieties covering \mathbb{A}^n.

It’s all very well looking for direct examples of irreducible varieties, but in doing so we’ve forgotten about algebra! In fact algebra gives us a big helping hand, as the following theorem shows. For completeness we first recall the definition of a prime ideal.

Definition 3.7 \mathfrak{p} is a prime ideal in R iff \mathfrak{p}\neq R and whenever fg \in \mathfrak{p} we have f\in \mathfrak{p} or g \in \mathfrak{p}. Equivalently \mathfrak{p} is prime iff R/\mathfrak{p} is an integral domain.

Theorem 3.8 Let X be a nonempty affine variety. Then X irreducible iff I(X) a prime ideal.

Proof [“\Rightarrow“] Suppose I(X) is not prime. Then \exists f,g \in k[\mathbb{A}^n] with fg \in I(X) but f,\ g \notin I(X). Let J_1 = (I(X),f) and J_2 = (I(X),g). Further define X_1 = V(J_1), \ X_2 = V(J_2). Then X_1,\ X_2 \subset X, and each is a proper subset of X: since f\notin I(X) there is some P\in X with f(P)\neq 0, so P\notin X_1, and similarly for X_2. On the other hand X\subset X_1 \cup X_2. Indeed if P\in X then fg(P)=0 so f(P)=0 or g(P)=0 so P \in X_1\cup X_2. Hence X=X_1\cup X_2 is reducible.

 [“\Leftarrow“] Suppose X is reducible, that is \exists X_1,\ X_2 proper subvarieties of X with X=X_1\cup X_2. Since X_1 a proper subvariety of X there must exist some element f \in I(X_1)\setminus I(X). Similarly we find g\in I(X_2)\setminus I(X). Hence fg(P) = 0 for all P in X_1\cup X_2 = X, so certainly fg \in I(X). But this means that I(X) is not prime. \blacksquare

This easy theorem is our first real taste of the power that abstract algebra lends to the study of geometry. Let’s see it in action.

Recall that a nonzero principal ideal of the ring k[\mathbb{A}^n] is prime iff it is generated by an irreducible polynomial. This is an easy consequence of the fact that k[\mathbb{A}^n] is a UFD. Indeed a nonzero principal ideal is prime iff it is generated by a prime element. But in a UFD every prime is irreducible, and every irreducible is prime!

Using the theorem we can say that every irreducible polynomial f gives rise to an irreducible affine hypersurface X s.t. I(X)=(f). Note that we cannot get a converse to this – there’s nothing to say that I(X) must be principal in general.

Does this generalise to ideals generated by several irreducible polynomials? We quickly see the answer is no. Indeed take f = x,\ g = x^2 + y^2 -1 in k[\mathbb{A}^2]. These are both clearly irreducible, but (f,g) is not prime. We can see this in two ways. Algebraically, note that (f,g)=(x,\ y^2-1), so (y-1)(y+1) \in (f,g), yet y-1 \notin (f,g) and y+1 \notin (f,g) (evaluate at the points (0,\pm 1), where every element of (f,g) vanishes). Geometrically, recall Lemma 2.5 (3). Also note that by definition (f,g) = (f)+(g). Hence V(f,g) = V(f)\cap V(g). But V(f) \cap V(g) is clearly just two distinct points (the intersection of the line with the circle). Hence it is reducible, and by our theorem (f,g) cannot be prime.

We can also use the theorem to exhibit a more informative example of a reducible variety. Consider X = V(X^2Y - Y^2). Clearly \mathfrak{a}=(X^2Y-Y^2) is not prime for Y(X^2 - Y) \in \mathfrak{a} but Y\notin \mathfrak{a}, \ X^2 - Y \notin \mathfrak{a}. Noting that \mathfrak{a}=(X^2-Y)\cap (Y) we see that geometrically X is the union of the X-axis and the parabola Y=X^2, by Lemma 2.5.

Having had such success with prime ideals and irreducible varieties, we might think – what about maximal ideals? Turns out that they have a role to play too. Note that maximal ideals are automatically prime, so any varieties they generate will certainly be irreducible.

Definition 3.9 An ideal \mathfrak{m}\neq R of R is said to be maximal if whenever \mathfrak{a} is an ideal with \mathfrak{m}\subset\mathfrak{a}\subset R either \mathfrak{a} = \mathfrak{m} or \mathfrak{a} = R. Equivalently \mathfrak{m} is maximal iff R/\mathfrak{m} is a field.

Theorem 3.10 An affine variety X in \mathbb{A}^n is a point iff I(X) is a maximal ideal.

Proof [“\Rightarrow“] Let X = \{(a_1, \dots , a_n)\} be a single point. Then I(X) = (X_1-a_1,\dots ,X_n-a_n). Indeed the inclusion \supset is clear, and conversely any f vanishing at the point may be written f = \sum (X_i-a_i)g_i by expanding f about (a_1,\dots,a_n). But k[\mathbb{A}^n]/I(X) is a field. Indeed k[\mathbb{A}^n]/I(X) is isomorphic to k itself, via the map induced by X_i \mapsto a_i. Hence I(X) is maximal.

[“\Leftarrow“] We’ll see this next time. In fact all we need to show is that (X_1-a_1,\dots,X_n-a_n) are the only maximal ideals. \blacksquare

Theorems 3.8 and 3.10 are a promising start to our search for a dictionary between algebra and geometry. But they are unsatisfying in two ways. Firstly they tell us nothing about the behaviour of reducible affine varieties – a very large class! Secondly it is not obvious how to use 3.8 to construct irreducible varieties in general. Indeed there is an inherent asymmetry in our knowledge at present, as I shall now demonstrate.

Given an irreducible variety X we can construct its ideal I(X) and be sure it is prime, by Theorem 3.8. Moreover we know by Lemma 2.5 that V(I(X))=X, a pleasing correspondence. However, given a prime ideal J we cannot immediately say that V(J) is irreducible. For in Lemma 2.5 there was nothing to say that I(V(J))=J, so Theorem 3.8 is useless. We clearly need to find a set of ideals for which I(V(J))=J holds, and hope that prime ideals are a subset of this.

It turns out that such a condition is satisfied by a class called radical ideals. Next time we shall prove this, and demonstrate that radical ideals correspond exactly to algebraic varieties. This will provide us with the basic dictionary of algebraic geometry, allowing us to proceed to deeper results. The remainder of this post shall be devoted to radical ideals, and the promised proof of an irreducible decomposition.

Definition 3.11 Let J be an ideal in a ring R. We define the radical of J to be the ideal \sqrt{J}=\{f\in R : f^m\in J \ \textrm{for some} \ m\in \mathbb{N}\}. We say that J is a radical ideal if J=\sqrt{J}.

(That \sqrt{J} is a genuine ideal needs proof, but this is merely a trivial check of the axioms).
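If you want to compute with radicals in examples, one standard computational test (often called the Rabinowitsch trick) is that f\in\sqrt{J} exactly when 1 lies in the ideal generated by J together with 1-tf, working in a polynomial ring with one extra variable t. A sketch in SymPy over \mathbb{Q}, with the illustrative ideal J=(y^2,xy) whose radical is (y):

    from sympy import symbols, groebner

    x, y, t = symbols('x y t')

    # J = (y^2, x*y); its radical is (y)
    J = [y**2, x*y]

    G1 = groebner(J + [1 - t*y], x, y, t, order='lex')
    G2 = groebner(J + [1 - t*x], x, y, t, order='lex')

    print(G1.contains(1))   # True:  y lies in sqrt(J), since y^2 is in J
    print(G2.contains(1))   # False: x does not lie in sqrt(J)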

At first glance this appears to be a somewhat arbitrary definition, though the nomenclature should seem sensible enough. To get a more rounded perspective let’s introduce some other concepts that will become important later.

Definition 3.12 A polynomial function or regular function on an affine variety X is a map X\rightarrow k which is defined by the restriction of a polynomial in k[\mathbb{A}^n] to X. More explicitly it is a map f:X\rightarrow k with f(P)=F(P) for all P\in X, where F\in k[\mathbb{A}^n] is some polynomial.

These are eminently reasonable quantities to be interested in. In many ways they are the most obvious functions to define on affine varieties. Regular functions are the analogues of smooth functions in differential geometry, or continuous functions in topology. They are the canonical maps.

It is obvious that a regular function f does not in general uniquely determine the polynomial F giving rise to it. Indeed suppose f(P)=F(P)=G(P) \ \forall P \in X. Then F-G = 0 on X, so F-G\in I(X). This simple observation explains the implicit claim in the following definition.

Definition 3.13 Let X be an affine variety. The coordinate ring k[X] is the ring k[\mathbb{A}^n]|_X=k[\mathbb{A}^n]/I(X). In other words the coordinate ring is the ring of all regular functions on X.

This definition should also appear logical. Indeed we define the space of continuous functions in topology and the space of smooth functions in differential geometry. The coordinate ring is merely the same notion in algebraic geometry.  The name  ‘coordinate ring’ arises since clearly k[X] is generated by the coordinate functions x_1,\dots ,x_n restricted to X. The reason for our notation k[x_1,\dots ,x_n]=k[\mathbb{A}^n] should now be obvious. Note that the coordinate ring is trivially a finitely generated k-algebra.
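To make the identification of regular functions concrete, take X=V(x^2+y^2-1)\subset\mathbb{A}^2. The polynomials y^3 and y(1-x^2) are different elements of k[\mathbb{A}^2], but they restrict to the same regular function on X, because their difference is y(x^2+y^2-1)\in (x^2+y^2-1)\subset I(X). A quick SymPy check over \mathbb{Q}:

    from sympy import symbols, groebner

    x, y = symbols('x y')

    # The difference of the two polynomials lies in (x^2 + y^2 - 1), hence in I(X)
    G = groebner([x**2 + y**2 - 1], x, y, order='lex')
    print(G.contains(y**3 - y*(1 - x**2)))   # True: both give the same element of k[X]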

The coordinate ring might seem a little useless at present. We’ll see in a later post that it has a vital role in allowing us to apply our dictionary of algebra and geometry to subvarieties. To avoid confusion we’ll stick to k[\mathbb{A}^n] for the near future. The reason for introducing coordinate rings was to link them to radical ideals. We’ll do this via two final definitions.

Definition 3.14 An element x of a ring R is called nilpotent if \exists some positive integer n s.t. x^n=0.

Definition 3.15 A ring R is reduced if 0 is its only nilpotent element.

Lemma 3.16 R/I is reduced iff I is radical.

Proof Suppose first that I is radical, and let x+I be a nilpotent element of R/I, i.e. x^n + I = 0 + I for some n. Then x^n \in I, so by definition x\in \sqrt{I}=I, i.e. x+I = 0+I. Hence R/I is reduced. Conversely suppose R/I is reduced, and let x\in R with x^m \in I. Then x^m + I = 0 + I in R/I, so x+I = 0+I, i.e. x \in I. Hence I=\sqrt{I}. \blacksquare
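For a quick example: in k[x]/(x^2) the coset x + (x^2) is a nonzero nilpotent, so the quotient is not reduced; correspondingly (x^2) is not a radical ideal, since x\in\sqrt{(x^2)}\setminus (x^2). Indeed \sqrt{(x^2)}=(x).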

Putting this all together we immediately see that the coordinate ring k[X] is a reduced, finitely generated k-algebra. That is, provided we assume that for an affine variety X, I(X) is radical, which we’ll prove next time. It’s useful to quickly see that these properties characterise coordinate rings of varieties. In fact given any reduced, finitely generated k-algebra A we can construct a variety X with k[X]=A as follows.

Write A=k[a_1,\dots ,a_n] and define a surjective homomorphism \pi:k[\mathbb{A}^n]\rightarrow A, \ x_i\mapsto a_i. Let I=\textrm{ker}(\pi) and X=V(I). By the isomorphism theorem A \cong k[\mathbb{A}^n]/I, so I is radical since A is reduced. But then by our theorem next time I(X)=I, so X is an affine variety with coordinate ring k[X]=k[\mathbb{A}^n]/I(X)\cong A.
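For a concrete instance, take A = k[t^2,t^3]\subset k[t], a reduced finitely generated k-algebra. The map \pi:k[x,y]\rightarrow A with x\mapsto t^2, \ y\mapsto t^3 is surjective with kernel (x^3-y^2), so the construction above produces the cuspidal cubic X=V(x^3-y^2), whose coordinate ring is k[x,y]/(x^3-y^2)\cong A.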

We’ve come a long way in this post, and congratulations if you’ve stayed with me through all of it! Let’s survey the landscape. In the background we have abstract algebra – systems of equations whose solutions we want to study. In the foreground are our geometrical ideas – affine varieties which represent solutions to the equations. These varieties are built out of irreducible blocks, like Lego. We can match up ideals and varieties according to various criteria. We can also study maps from geometrical varieties down to the ground field using the coordinate ring.

Before I go here’s the promised proof that irreducible varieties really are the building blocks we’ve been talking about.

Theorem 3.17 Every affine variety X has a unique decomposition as X_1\cup\dots\cup X_n up to ordering, where the X_i are irreducible components and X_i\not\subset X_j for i\neq j.

Proof (Existence) An affine variety X is either irreducible or X=Y\cup Z with Y,\ Z proper subvarieties of X. We may similarly decompose Y and Z if they are reducible, and so on. We claim that this process stops after finitely many steps. Suppose otherwise; then X contains an infinite strictly decreasing chain of subvarieties X\supsetneq X_1 \supsetneq X_2 \supsetneq \dots. By Lemma 2.5 (5) & (7) we have I(X)\subsetneq I(X_1) \subsetneq I(X_2) \subsetneq \dots. But k[\mathbb{A}^n] is a Noetherian ring by Hilbert’s Basis Theorem, and this contradicts the ascending chain condition! To satisfy the X_i \not\subset X_j condition we simply discard from the decomposition any X_i contained in some other X_j.

(Uniqueness) Suppose we have another decomposition X=Y_1\cup\dots\cup Y_m with Y_i\not\subset Y_j for i\neq j. Then X_i = X_i\cap X = \bigcup_{j=1}^{m}( X_i\cap Y_j). Since X_i is irreducible we must have X_i\cap Y_j = X_i for some j, in particular X_i \subset Y_j. By the same argument with the X and Y reversed we find some X_k with Y_j \subset X_k, so X_i \subset Y_j \subset X_k. But this forces i=k and hence Y_j = X_i. Since i was arbitrary, every X_i appears among the Y_j, and by symmetry every Y_j appears among the X_i, so we are done. \blacksquare

If you’re interested in calculating some specific examples of ideals and their associated varieties have a read about Groebner Bases. This will probably become a topic for a post at some point, loosely based on the ideas in Hassett’s excellent book. This question is also worth a skim.
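As a taster, here is what a Groebner basis computation looks like in SymPy for the line–circle example from earlier. With respect to the lex order the basis is 'triangular', so the points of the variety can be read off directly (or extracted with solve):

    from sympy import symbols, groebner, solve

    x, y = symbols('x y')

    G = groebner([x, x**2 + y**2 - 1], x, y, order='lex')
    print(G)   # the reduced basis is [x, y**2 - 1]

    print(solve([x, x**2 + y**2 - 1], [x, y]))   # [(0, -1), (0, 1)], in some order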

I leave you with this enlightening MathOverflow discussion, tackling the irreducibility of polynomials in two variables. Although some of the material is a tad dense, it’s nevertheless interesting, and may be a useful future reference!

Invariant Theory and David Hilbert

Health warning: this post is part of a more advanced series on commutative algebra. It may be a little tricky for the layman to understand!

David Hilbert was perhaps the greatest mathematician of the late 19th century. Much of his work laid the foundations for our modern study of commutative algebra. In doing so, he is sometimes said to have killed the study of invariants by solving the central problem in the field. In this post I’ll give a sketch of how he did so.

Motivated by Galois Theory we ask the following question. Given a polynomial ring S = k[x_1,\dots,x_n] and a group G acting on S by k-algebra automorphisms, what are the elements of S that are invariant under the action of G? Following familiar notation we denote this set S^G and note that it certainly forms a subalgebra of S.

In the late 19th century it was found that S^G could be described fully by a finite set of generators for several suggestive special cases of G. It soon became clear that the fundamental problem of invariant theory was to find necessary and sufficient conditions for S^G to be finitely generated. Hilbert’s contribution was an incredibly general sufficient condition, as we shall soon see.
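The classical motivating example is the symmetric group permuting the variables, where S^G is generated by the elementary symmetric polynomials. For two variables we can watch this in SymPy, which rewrites a symmetric polynomial in terms of s_1=x+y and s_2=xy and reports a nonzero remainder otherwise (the specific polynomials below are just illustrations):

    from sympy import symbols
    from sympy.polys.polyfuncs import symmetrize

    x, y = symbols('x y')

    # x^2 + y^2 is invariant under swapping x and y: it equals s1^2 - 2*s2
    print(symmetrize(x**2 + y**2, x, y, formal=True))
    # (s1**2 - 2*s2, 0, [(s1, x + y), (s2, x*y)])

    # x^2 + y is not invariant: the second entry (the remainder) is nonzero
    print(symmetrize(x**2 + y, x, y, formal=True))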

To begin with we shall recall the alternative definition of a Noetherian ring. It is a standard proof that this definition is equivalent to that which invokes the ascending chain condition on ideals. As an aside, also recall that the ascending chain condition can be restated by saying that every nonempty collection of ideals has a maximal element.

Definition A.1 A ring R is Noetherian if every ideal of R is finitely generated.

We shall also recall without proof Hilbert’s Basis Theorem, and draw an easy corollary.

Theorem A.2 If R Noetherian then R[x] Noetherian.

Corollary A.3 If S is a finitely generated algebra over R, with R Noetherian, then S Noetherian.

Proof We’ll first show that any homomorphic image of R is Noetherian. Let I be an ideal in the image of R under a homomorphism f. Then f^{-1}(I) is an ideal in R. Indeed if a\in f^{-1}(I) and r\in R then f(ra)=f(r)f(a)\in I, so ra \in f^{-1}(I); closure under addition is immediate. Hence f^{-1}(I) is finitely generated, so certainly I is finitely generated, by the images of the generators of f^{-1}(I).

Now we’ll prove the corollary. Since S is a finitely generated algebra over R, S is a homomorphic image of R[x_1,\dots,x_n] for some n, by the obvious homomorphism that takes each x_i to a generator of S. By Theorem A.2 and induction we know that R[x_1,\dots,x_n] is Noetherian. But then by the above, S is Noetherian. \blacksquare

Since we’re on the subject of Noetherian things, it’s probably worthwhile introducing the concept of a Noetherian module. The subsequent theorem is analogous to A.3 for general modules. This question ensures that the theorem has content.

Definition A.4 An R-module M is Noetherian if every submodule N is finitely generated, that is, if every element of N can be written as an R-linear combination of some fixed generators \{f_1,\dots,f_n\}\subset N.

Theorem A.5 If R Noetherian and M a finitely generated R-module then M Noetherian.

Proof Suppose M generated by f_1,\dots,f_t, and let N be a submodule. We show N finitely generated by induction on t.

If t=1 then clearly the map h:R\rightarrow M defined by 1\mapsto f_1 is surjective. Then the preimage of N is an ideal, just as in A.3, so is finitely generated. Hence N is finitely generated by the images of the generators of h^{-1}(N).  (*)

Now suppose t>1. Consider the quotient map h:M \to M/Rf_1. Let \tilde{N} be the image of N under this map. Then \tilde{N} is a submodule of M/Rf_1, which is generated by the images of f_2,\dots,f_t, so by the induction hypothesis \tilde{N} is finitely generated. Let g_1,\dots,g_s be elements of N whose images generate \tilde{N}. Since Rf_1 is a submodule of M generated by a single element, we have by (*) that its submodule Rf_1\cap N is finitely generated, by h_1,\dots,h_r say.

We claim that \{g_1,\dots,g_s,h_1,\dots,h_r\} generate N. Indeed, given n \in N its image in \tilde{N} is a linear combination of the images of the g_i. Hence subtracting the corresponding linear combination of the g_i from n produces an element of N \cap Rf_1, which is a linear combination of the h_i by construction. This completes the induction. \blacksquare

We’re now ready to talk about the concrete problem that Hilbert solved using these ideas, namely the existence of finite bases for invariants. We’ll take k to be a field of characteristic 0 and G to be a finite group, or one of the linear groups \textrm{ GL}_n(k),\ \textrm{SL}_n(k). As in our notation above, we take S=k[x_1,\dots,x_n].

Suppose also we are given a group homomorphism \phi:G \to \textrm{GL}_r(k), which of course can naturally be seen as the group of invertible linear transformations of the vector space V over k with basis x_1,\dots,x_r. This is in fact the definition of a representation of G on the vector space V. As is common practice in representation theory, we view G as acting on V via (g,v)\mapsto \phi(g)v.

If G is \textrm{SL}_n(k) or \textrm{GL}_n(k) we shall further suppose that our representation of G is rational. That is, the matrices g \in G act on V as matrices whose entries are rational functions in the entries of g. (If you’re new to representation theory like me, you might want to read that sentence twice)!

We now extend the action of g\in G from V to the whole of S by defining (g,f)\mapsto f(g^{-1}(x_1),\dots,g^{-1}(x_r),x_{r+1},\dots,x_n). Thus we may view G as an automorphism group of S. The invariants under G are those polynomials left unchanged by the action of every g \in G, and these form a subring of S which we’ll denote S^G.

Enough set up. To proceed to more interesting territory we’ll need to make another definition.

Definition A.6 A polynomial is called homogeneous, a homogeneous form, or merely a form, if each of its monomials with nonzero coefficient has the same total degree.

Hilbert noticed that the following totally obvious fact about S^G was critically important to the theory of invariants. We may write S^G as a direct sum of the vector spaces R_i of homogeneous forms of degree i that are invariant under G. We say that S^G may be graded by degree and use this to motivate our next definition.

Definition A.7 A graded ring is a ring R together with a direct sum decomposition as abelian groups R = R_0 \oplus R_1 \oplus \dots, such that R_i R_j \subset R_{i+j}.

This allows us to generalise our notion of homogeneous also.

Definition A.8 A homogeneous element of a graded ring R is an element of one of the groups R_i. A homogeneous ideal of R is an ideal generated by homogeneous elements.

Be warned that clearly homogeneous ideals may contain many inhomogeneous elements! It’s worth mentioning that there was no special reason for taking \mathbb{N} as our indexing set for the R_i. We can generalise this easily to \mathbb{Z}, and such graded rings are often called \mathbb{Z}-graded rings. We won’t need this today, however.

Note that if f \in R we have a unique expression for f of the form f = f_0 + f_1 + \dots + f_n with f_i \in R_i. The expression is finite precisely because R is the direct sum (rather than the direct product) of the R_i: by definition an element of a direct sum has only finitely many nonzero components. We call the f_i the homogeneous components of f.
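Computationally, splitting a polynomial into its homogeneous components is just a matter of grouping monomials by total degree. A small SymPy sketch (the polynomial is an arbitrary illustration):

    from collections import defaultdict
    from sympy import symbols, Poly

    x, y = symbols('x y')

    f = Poly(x**3 + 2*x*y + y**2 + 5, x, y)

    # group the terms of f by total degree to get its homogeneous components
    components = defaultdict(int)
    for monom, coeff in f.terms():
        components[sum(monom)] += coeff * x**monom[0] * y**monom[1]

    print(dict(components))   # {3: x**3, 2: 2*x*y + y**2, 0: 5}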

The next definition is motivated by algebraic geometry, specifically the study of projective varieties. When we arrive at these in the main blog (probably towards the end of this month) it shall make a good deal more sense!

Definition A.9 The ideal in a graded ring R generated by all forms of degree greater than 0 is called the irrelevant ideal and notated R_+.

Now we return to our earlier example. We may grade the polynomial ring S=k[x_1,\dots,x_n] by degree. In other words we write S=S_0\oplus S_1 \oplus \dots with S_i containing all the forms (homogeneous polynomials) of degree i.

To see how graded rings are subtly useful, we’ll draw a surprisingly powerful lemma.

Lemma A.10 Let I be a homogeneous ideal of a graded ring R, with I generated by homogeneous elements f_1,\dots,f_r. Let f\in I be a homogeneous element. Then we may write f = \sum f_i g_i with g_i homogeneous of degree \textrm{deg}(f)-\textrm{deg}(f_i).

Proof We can certainly write f = \sum f_i G_i with G_i \in R. Take g_i to be the homogeneous component of G_i of degree \textrm{deg}(f)-\textrm{deg}(f_i). Then all other terms in the sum must cancel, for f is homogeneous by assumption. \blacksquare

Now we return to our attempt to emulate Hilbert. We saw earlier that he spotted that grading S^G by degree may be useful. His second observation was this: there exists a map \phi:S\to S^G of S^G-modules s.t. (1) \phi preserves degrees and (2) \phi fixes every element of S^G. It is easy to see that this abstract concept corresponds intuitively to the condition that S^G be a summand of the graded ring S.

This is trivial to see in the case that G is a finite group. Indeed let \phi (f) = \frac{1}{|G|}\sum_{g\in G} g(f). Note that we have implicitly used that k has characteristic zero to ensure that the multiplicative inverse to |G| exists. In the case that G is a linear group acting rationally, then the technique is to replace the sum by an integral. The particulars of this are well beyond the scope of this post however!
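For the two-element group swapping x and y, the averaging map is easy to write down explicitly. Here is a sketch in SymPy over \mathbb{Q} (a little helper I'll call reynolds; the division by |G|=2 is exactly where characteristic zero gets used):

    from sympy import symbols, expand

    x, y = symbols('x y')

    # Reynolds operator phi(f) = (f(x, y) + f(y, x)) / 2 for G = {id, swap}
    def reynolds(f):
        return expand((f + f.subs({x: y, y: x}, simultaneous=True)) / 2)

    print(reynolds(x**2*y + 3*x))   # x**2*y/2 + x*y**2/2 + 3*x/2 + 3*y/2, an invariant
    print(reynolds(x*y + x + y))    # x*y + x + y: invariants are fixed by phi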

We finally prove the following general theorem. We immediately get Hilbert’s result on the finite generation of classes of invariants by taking R=S^G.

Theorem A.11 Take k a field and S=k[x_1,\dots,x_n] a polynomial ring graded by degree. Let R be a k-subalgebra of S. Suppose R is a summand of S, in the sense described above. Then R is finitely generated as a k-algebra.

Proof Let I\subset R be the ideal of R generated by all homogeneous elements of degree > 0. By the Basis Theorem S is Noetherian, so the ideal IS of S generated by I is finitely generated. By splitting each generator into its homogeneous components we may assume that IS is generated by homogeneous elements f_1,\dots,f_s, which we may wlog assume lie in I. We’ll prove that these elements generate R as a k-algebra.

Now let R' be the k-subalgebra of S generated by f_1,\dots,f_s. We claim it suffices to show that every homogeneous f\in R lies in R'. Indeed let g be a general element of R. Certainly g\in S is a sum of homogeneous components. But R is a summand of S, so applying the given map \phi (which fixes g and preserves degrees) shows that each homogeneous component of g already lies in R. Thus g is a sum of homogeneous elements of R, so g\in R' also, and we are done.

It only remains to prove f \in R' for homogeneous f\in R, which we’ll do by induction on the degree of f. If \textrm{deg}(f)=0 then f\in k\subset R'. Suppose \textrm{deg}(f)>0, so f\in I. Since the f_i generate IS as a homogeneous ideal of S we may write f = \sum f_i g_i with g_i homogeneous of degree \textrm{deg}(f)-\textrm{deg}(f_i)<\textrm{deg}(f), by Lemma A.10. Now apply the map \phi obtained from our observation that R is a summand of S: since f,\ f_i \in R are fixed by \phi, and \phi is a map of R-modules, we get f=\phi(f)=\sum \phi(g_i)f_i. But \phi preserves degrees, so \phi(g_i) is of lower degree than f. Thus by the induction hypothesis \phi(g_i) \in R' and hence f\in R' as required. \blacksquare

It’s worth noting that such an indirect proof caused quite a furore when it was originally published in the late 19th century. However the passage of time has provided us with a broader view of commutative algebra, and techniques such as this are much more acceptable to modern tastes! Nevertheless I shall finish by making explicit two facts that help to explain the success of our argument. We’ll first remind ourselves of a useful definition of an algebra.

Definition A.12 An R-algebra S is a ring S which has the compatible structure of a module over R in such a way that ring multiplication is R-bilinear.

It’s worth checking that this intuitive definition completely agrees with the one we provided in the Background section, as is clearly outlined on the Wikipedia page. The following provide an extension and a converse to Corollary A.3 (that a finitely generated algebra over a Noetherian ring is Noetherian) in the special case of graded rings.

Lemma A.13 A graded ring S=S_0\oplus S_1 \oplus \dots is Noetherian iff S_0 is Noetherian and S is a finitely generated S_0-algebra.

Lemma A.14 Let S be a Noetherian graded ring, R a summand of S. Then R Noetherian.

We’ll prove these both next time. Note that they certainly aren’t true in general when S isn’t graded!