Feb 2026 · 6–8 mins

Symmetry First


In 1915, Emmy Noether proved a theorem that physicists still consider one of the most beautiful results in all of mathematics. Every continuous symmetry of a physical system corresponds to a conserved quantity. Translational symmetry in space implies conservation of momentum. Rotational symmetry implies conservation of angular momentum. Symmetry in time implies conservation of energy. The theorem works in both directions: if you know a system conserves energy, you know it has a time-translation symmetry. The mathematics of symmetry and the mathematics of conservation are the same mathematics.

I think about this theorem constantly when I build machine learning models for molecular systems. Not because it is philosophically satisfying, which it is, but because violating it costs data and accuracy in ways that are precisely measurable.

What a molecule knows about orientation

A molecule has a geometry. Its atoms occupy positions in three-dimensional space. But the intrinsic physical properties of a molecule, its energy, its bond lengths, its stability, do not depend on where the molecule is in space or how it is oriented, and vector quantities like the forces on its atoms or its dipole moment simply rotate along with it. Rotate a water molecule by thirty degrees and its bond lengths do not change. Translate it across the room and its electron density does not change. This is not an approximation. It is an exact symmetry of the laws governing molecular behavior.

A machine learning model that predicts molecular energy from atomic positions will, if trained naively on Cartesian coordinates, need to learn this symmetry from data. It will see the same molecule at different orientations and gradually figure out, over many training examples, that the orientation does not matter. This is data augmentation: you show the model many versions of the same thing until it stops caring about the irrelevant variable. It works, eventually. But it is a slow and wasteful way to teach a fact that you already know is true.
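A minimal sketch of that augmentation loop in NumPy. The uniform-rotation sampler and the copies-per-configuration scheme are illustrative choices, not a specific pipeline from this project:

```python
import numpy as np

def random_rotation(rng):
    """Draw a uniformly random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))          # standardize the factorization
    if np.linalg.det(q) < 0:          # flip a column if we got a reflection
        q[:, 0] *= -1
    return q

def augment(positions, energy, n_copies, rng):
    """Emit rotated copies of one configuration, all sharing the same label."""
    return [(positions @ random_rotation(rng).T, energy)
            for _ in range(n_copies)]

rng = np.random.default_rng(0)
water = np.array([[0.00, 0.00, 0.0],   # toy three-atom geometry
                  [0.96, 0.00, 0.0],
                  [-0.24, 0.93, 0.0]])
batch = augment(water, -76.4, n_copies=8, rng=rng)
```

Every copy carries an identical energy label, which is precisely the redundancy the model must burn training examples to discover.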

Equivariance as a design principle

The alternative is to build the symmetry into the architecture. An equivariant neural network transforms its outputs in a predictable way when its inputs are transformed. For a rotation-equivariant network, rotating the input molecule by a rotation matrix R produces an output that is also rotated by R in the appropriate way. For a model predicting scalar properties like energy, the output should be completely invariant to rotation: rotate the input, get the same number. For a model predicting vector properties like forces, the output should rotate along with the input.
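Both cases can be checked numerically with toy stand-ins: a scalar built from pairwise distances (invariant) and a charge-weighted position sum, a toy dipole (equivariant). These functions are illustrations of the transformation rules, not a real model:

```python
import numpy as np

def toy_energy(pos):
    """Scalar output: sum of pairwise distances, unchanged by rotation."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).sum() / 2.0

def toy_dipole(pos, charges):
    """Vector output: charge-weighted position sum, rotates with the input."""
    return charges @ pos

rng = np.random.default_rng(0)
pos = rng.standard_normal((5, 3))
charges = rng.standard_normal(5)

theta = 0.7                      # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
rotated = pos @ R.T

# Invariance: rotating the input leaves the scalar unchanged
assert np.isclose(toy_energy(rotated), toy_energy(pos))
# Equivariance: the vector output rotates by the same R
assert np.allclose(toy_dipole(rotated, charges),
                   R @ toy_dipole(pos, charges))
```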

Constructing equivariant networks requires replacing standard linear layers with tensor products that respect the rotation group. The mathematics comes from group representation theory, specifically the decomposition of functions over SO(3) into spherical harmonics. This sounds complicated, and the implementation is, but the principle is straightforward: every operation in the network is constrained to produce outputs that transform correctly when the inputs are rotated. The symmetry is guaranteed by construction, not learned from data.
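Full tensor-product networks are beyond a short sketch, but the guaranteed-by-construction idea can be shown with a much simpler scheme: feed the network only rotation-invariant features, such as interatomic distances, so no operation ever sees an orientation. This is closer in spirit to classic descriptor-based potentials than to spherical-harmonic architectures, and the tiny untrained MLP here is purely illustrative:

```python
import numpy as np

def invariant_features(pos):
    """Upper-triangle pairwise distances: unchanged by rotation or translation."""
    diff = pos[:, None, :] - pos[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    i, j = np.triu_indices(len(pos), k=1)
    return d[i, j]

def tiny_mlp(feats, w1, w2):
    """One hidden layer; invariance comes from the features, not the weights."""
    return np.tanh(feats @ w1) @ w2

rng = np.random.default_rng(0)
pos = rng.standard_normal((4, 3))
w1 = rng.standard_normal((6, 16))    # 4 atoms -> 6 pair distances
w2 = rng.standard_normal(16)

q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
pred_original = tiny_mlp(invariant_features(pos), w1, w2)
pred_rotated = tiny_mlp(invariant_features(pos @ q.T), w1, w2)
assert np.isclose(pred_original, pred_rotated)     # exact, even untrained
```

The invariance holds for arbitrary weights, before any training: it is a property of the architecture, not of what the model has learned.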

The practical consequence is dramatic. NequIP, a graph neural network architecture built on E(3)-equivariance, achieves state-of-the-art force field accuracy on benchmark datasets using one to two orders of magnitude less training data than non-equivariant models trained on the same systems. The equivariant model does not waste any of its representational capacity learning that energy is rotation-invariant. All of that capacity goes toward learning the actual physics.

The cost of ignoring symmetry

When I was building the neural network potential trainer for the Reformix project, I tested both approaches. A non-equivariant model trained on Cartesian coordinates required roughly 5,000 training configurations to reach acceptable accuracy on the validation set. An equivariant model using the same data but respecting E(3) symmetry reached the same accuracy with 400 configurations. The training set for the equivariant model was collected in about two weeks of AIMD simulation. The non-equivariant model would have required months.

This gap matters enormously in practice. Running AIMD simulations to generate training data is expensive. Every configuration requires a DFT calculation, which takes anywhere from minutes to hours depending on system size. Reducing the data requirement by a factor of ten is not a minor optimization. It is the difference between a research project that is feasible with the resources available and one that is not.
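A back-of-the-envelope version of that gap, using the configuration counts from the experiment above and a hypothetical half-hour average DFT cost per configuration (real costs vary widely with system size):

```python
# Hypothetical average DFT cost per configuration, for illustration only
hours_per_config = 0.5

configs_non_equivariant = 5000
configs_equivariant = 400

cost_non_eq = configs_non_equivariant * hours_per_config   # 2500.0 hours
cost_eq = configs_equivariant * hours_per_config           # 200.0 hours
reduction = configs_non_equivariant / configs_equivariant  # 12.5x less data

print(f"{cost_non_eq / 24:.0f} days vs {cost_eq / 24:.1f} days of compute")
```

At these assumed numbers, 200 hours of serial compute is consistent with the roughly two weeks of AIMD quoted above; the non-equivariant budget is more than a factor of twelve larger.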

Noether's theorem as an inductive bias

The phrase used in machine learning for prior knowledge built into model architecture is inductive bias. Every architectural choice is an inductive bias: convolutional networks assume translational invariance in images, recurrent networks assume sequential structure in text, attention mechanisms assume pairwise interactions. The choice of inductive bias determines how much the model has to learn from data versus how much it already knows from the architecture.
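The convolutional case is easy to verify directly: a circular convolution commutes with shifts, which is exactly the translation symmetry the architecture hard-codes. A small NumPy check (circular rather than zero-padded convolution, so the symmetry is exact rather than approximate at the edges):

```python
import numpy as np

def circular_conv(x, k):
    """Circular convolution via FFT: translation-equivariant by construction."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, n=len(x))))

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
k = rng.standard_normal(5)
shift = 7

# Shifting the input then convolving equals convolving then shifting the output
lhs = circular_conv(np.roll(x, shift), k)
rhs = np.roll(circular_conv(x, k), shift)
assert np.allclose(lhs, rhs)
```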

Noether's theorem provides the strongest possible justification for symmetry as an inductive bias in physical systems: the symmetry is not a simplification or an approximation. It is an exact property of the underlying physics. An equivariant architecture is not making a convenient assumption. It is encoding a provably true fact about the universe.

The failure mode in modern AI for science is treating neural networks as universal function approximators and assuming that given enough data, they will learn whatever structure is present. This is technically true and practically catastrophic. Universal approximators are universal in the limit of infinite data. Real scientific datasets are small, expensive, and hard-won. Every structural property of the problem you fail to encode in the architecture is a property you have to pay for with data. In molecular science, that payment is DFT compute time. The currency is weeks or months of simulation.

What symmetry tells you about understanding

There is a deeper point here. Equivariant models are not just more data-efficient. They generalize better to genuinely novel systems. A non-equivariant model that learned rotational invariance empirically from a training set of small organic molecules will have absorbed that invariance through the specific distribution of orientations it saw. A novel molecule with a different geometry might break that learned approximation in subtle ways. An equivariant model has the invariance exactly, for any input, including inputs far outside the training distribution.

This is what it looks like when physical understanding gets built into a model. Not larger datasets. Not better optimization. The right constraints, derived from the right theory, imposed at the right level of the architecture. Noether knew this in 1915, working on a different problem entirely. The lesson transfers.