Apr 2026 · 7–9 mins

The Second Law Is the Only Law


There is one law in physics that runs in only one direction. Every other fundamental law is time-symmetric: run the equations backward and you get another valid solution. Newton's laws, Maxwell's equations, Schrödinger's equation, general relativity. Film a collision and play it in reverse; the reverse looks physically plausible. Film a sugar cube dissolving in water and play it in reverse; the reverse looks absurd. That absurdity is the second law of thermodynamics, and it is the only place in the fundamental equations of physics where the arrow of time appears.

I did not expect, when I started doing computational materials science, to keep running into entropy. It shows up everywhere. Not as a metaphor. As the actual mechanism.

Why you cannot unburn fuel

The second law says that in an isolated system, entropy never decreases. Entropy has a precise definition: it is proportional to the logarithm of the number of microscopic configurations consistent with the macroscopic state you observe. A gas with all its molecules in one corner of a room has very few configurations consistent with that description, so its entropy is low. A gas spread uniformly has an astronomical number of consistent configurations, so its entropy is high. The system evolves toward higher entropy not because of any force pushing it there, but because there are vastly more high-entropy states than low-entropy states. Probability does the work.
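The arithmetic behind "probability does the work" is worth seeing once. Here is a minimal sketch, using the crude model of molecules that are each either in the left or right half of a box; the point is the size of the numbers, not the realism of the model:

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def entropy(ln_w: float) -> float:
    """S = k_B * ln(W); we pass ln(W) directly because W itself overflows."""
    return K_B * ln_w

# Toy model: N molecules, each independently in the left or right half
# of a box. "All in one half" corresponds to a single half-assignment;
# "spread anywhere" corresponds to 2^N of them.
N = 6.022e23  # one mole

delta_s = entropy(N * math.log(2)) - entropy(0.0)  # ln(2^N) - ln(1)
print(f"Entropy gain on spreading out: {delta_s:.2f} J/K")  # ~5.76 J/K
```

The spread state outnumbers the confined one by a factor of 2^(6×10²³). Nothing in the dynamics forbids the gas from gathering back into the corner; it is simply never going to win a lottery with those odds.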

When you burn fuel, you take a highly ordered molecular structure, hydrocarbons with specific bonding geometry, and convert it to CO₂ and water vapor dispersed into the atmosphere. The number of accessible configurations after combustion is orders of magnitude larger than before. You cannot reverse this without doing work at least equal to the temperature times the entropy increase, and in practice you cannot reverse it at all because the dispersed CO₂ is irretrievably mixed into the atmosphere. The second law is not a statement about energy availability. It is a statement about the geometry of configuration space.
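You can put a floor under the reversal cost. A back-of-envelope sketch, assuming ideal-gas mixing and an atmospheric CO₂ mole fraction of roughly 420 ppm (both assumptions, chosen for round numbers):

```python
import math

R = 8.314       # gas constant, J/(mol K)
T = 298.0       # ambient temperature, K
x_co2 = 420e-6  # approximate atmospheric CO2 mole fraction

# Minimum work to pull 1 mol of CO2 out of air into a pure stream:
# W_min = -R * T * ln(x), i.e. T times the dilution entropy to be undone.
w_min = -R * T * math.log(x_co2)
print(f"Minimum work to re-concentrate 1 mol of CO2: {w_min / 1000:.1f} kJ")  # ~19 kJ
```

That is the second-law floor; any real capture process pays several times more.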

Why recycling is hard

The same logic applies to materials. A thermoset polymer, fully crosslinked, has its atoms locked into a three-dimensional covalent network. The network has a specific topology: every chain end is covalently bonded to a crosslink junction, the junctions are connected to each other, and the whole structure is one giant molecule. When you apply mechanical stress and fracture this material, you break bonds at random. The fragments are chemically active but spatially disordered. Reassembling them into a functional network requires not just that the right bonds reform, but that they reform in the right places with the right geometry. The entropy cost of this spatial reorganization is what makes recycling hard. It is not primarily a chemistry problem. It is a statistical one.
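To see how fast the statistics turn against you, just count pairings. When 2k reactive chain ends must re-pair, the number of distinct pairings is (2k−1)!!, and in this toy count, which ignores spatial constraints and any symmetry of the network, only one of them restores the original topology:

```python
def double_factorial(n: int) -> int:
    """(2k-1)!! counts the distinct ways to pair up 2k chain ends."""
    result = 1
    for i in range(n, 0, -2):
        result *= i
    return result

# Probability that a random re-pairing of 2k chain ends reproduces
# the original network topology, in the crudest possible model.
for k in (5, 10, 20):
    pairings = double_factorial(2 * k - 1)
    print(f"{2 * k:3d} chain ends: {pairings:.3e} pairings, P(correct) ~ {1 / pairings:.1e}")
```

Twenty broken bonds already put the odds worse than one in 10²³. Real fracture surfaces keep some spatial correlation between partners, which helps, but nothing helps against factorial growth.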

Vitrimers, the materials I work with at Reformix, solve this by making the network topology dynamic rather than static. The bonds can exchange partners at elevated temperature, which allows the network to reorganize without fully dissolving. But this is not free: you are using thermal energy to drive the system through a landscape of higher-entropy intermediate states before it settles into a new ordered configuration. The second law is not defeated. It is navigated.
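The temperature dependence is the whole trick. A minimal sketch of a thermally activated exchange rate; the 90 kJ/mol activation energy and 10¹⁰ s⁻¹ attempt frequency are illustrative values I picked for the example, not measurements of any particular exchange chemistry:

```python
import math

R = 8.314    # gas constant, J/(mol K)
E_A = 90e3   # assumed activation energy for bond exchange, J/mol
NU_0 = 1e10  # assumed attempt frequency, 1/s

def exchange_rate(t_kelvin: float) -> float:
    """Arrhenius rate of a thermally activated bond exchange."""
    return NU_0 * math.exp(-E_A / (R * t_kelvin))

for t in (300, 400, 500):
    rate = exchange_rate(t)
    print(f"T = {t} K: ~{rate:.1e} exchanges/s per bond (mean wait ~{1 / rate:.1e} s)")
```

At room temperature a given bond waits days between exchanges and the topology is effectively frozen; a couple of hundred kelvin higher it exchanges several times a second and the network can flow while staying covalently connected.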

Levinthal's paradox and protein folding

In 1969, Cyrus Levinthal pointed out something uncomfortable about protein folding. A protein of 100 amino acids has roughly 3¹⁰⁰ possible conformations if each residue can adopt about three distinct backbone conformations. If the protein sampled conformations at random, one per picosecond, it would take longer than the age of the universe to find the native folded state. Yet proteins fold in milliseconds. This is Levinthal's paradox.
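The arithmetic, for the record, with the universe's age at about 1.38 × 10¹⁰ years:

```python
conformations = 3 ** 100      # ~5.2e47 states for a 100-residue chain
rate = 1e12                   # one conformation sampled per picosecond
seconds_per_year = 3.156e7
age_of_universe_years = 1.38e10

search_years = conformations / rate / seconds_per_year
print(f"Random search time: {search_years:.1e} years")               # ~1.6e28
print(f"Universe ages: {search_years / age_of_universe_years:.1e}")  # ~1.2e18
```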

The resolution is that protein folding is not a random search. The energy landscape is funneled: there are many pathways downhill toward the native state and very few traps. The entropy of the unfolded ensemble is high, but the conformational entropy decreases smoothly as the protein folds, trading entropy for enthalpy in a coordinated way that guides the search. The second law is not violated. The landscape is shaped by evolution to make the entropy decrease happen along a productive path.
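A cartoon makes the funnel's effect concrete. Treat folding as a walk along a line of states with the native state at one end: an unbiased walk is Levinthal's random search, and a modest downhill bias stands in for the funnel. This is a toy of my own construction, not a folding model:

```python
import random

def steps_to_native(n_states: int, p_downhill: float, trials: int = 200) -> float:
    """Mean steps for a 1-D walk from the top state down to state 0.

    p_downhill is the chance each move heads toward the native state:
    0.5 is an unbiased search, anything above it is a funnel.
    """
    total = 0
    for _ in range(trials):
        pos, steps = n_states - 1, 0
        while pos > 0:
            pos += -1 if random.random() < p_downhill else 1
            pos = min(pos, n_states - 1)  # reflecting wall at the unfolded end
            steps += 1
        total += steps
    return total / trials

random.seed(0)
print(f"Unbiased (p = 0.5): {steps_to_native(60, 0.5):.0f} steps on average")
print(f"Funneled (p = 0.6): {steps_to_native(60, 0.6):.0f} steps on average")
```

Even a 60-state line shows an order-of-magnitude gap, and the gap is the point: an unbiased search scales quadratically here and exponentially in a real high-dimensional conformation space, while a biased one stays essentially linear in the distance to the native state.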

AlphaFold2 learned to predict the endpoint of this process with remarkable accuracy. But it does not simulate the folding dynamics. It predicts the final structure from sequence. This is a fundamental difference, and one worth keeping clear: predicting where a system ends up is not the same as understanding the path it takes or the thermodynamic logic that gets it there.

Why stochastic gradient descent finds anything

Neural network training is an optimization problem over a loss surface with millions or billions of dimensions. The exact geometry of that surface is not known in closed form and is far too high-dimensional to analyze directly. Plain gradient descent, the obvious approach, can get stuck in local minima. Adding stochastic noise, which is what mini-batch stochastic gradient descent does, helps escape those traps. The gradients computed on random subsets of the data are noisy estimates of the true gradient, and this noise acts as a source of thermal fluctuation that allows the optimizer to explore the loss landscape more broadly.
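You can watch the mechanism on a one-dimensional caricature. The function below is a tilted double well of my own choosing, nothing to do with any real loss surface: deterministic descent started in the shallow well stays there, while the same descent with Gaussian noise added to the gradient, a stand-in for mini-batch noise, crosses the barrier and settles into the deeper well.

```python
import random

def grad(x: float) -> float:
    """Gradient of f(x) = x^4 - 2x^2 + 0.5x: a tilted double well with
    a shallow local minimum near x = +0.9 and the global one near x = -1.1."""
    return 4 * x**3 - 4 * x + 0.5

def descend(noise_scale: float, steps: int = 20000, lr: float = 0.01) -> float:
    x = 1.0  # start inside the shallow well
    for _ in range(steps):
        x -= lr * (grad(x) + random.gauss(0.0, noise_scale))  # noisy gradient step
    for _ in range(2000):  # final quench: settle into the nearest minimum
        x -= lr * grad(x)
    return x

random.seed(1)
print(f"Deterministic descent ends at x = {descend(0.0):+.3f}")  # trapped near +0.9
runs = [descend(6.0) for _ in range(20)]
print(f"Noisy descent: {sum(r < 0 for r in runs)}/20 runs reach the deep well")
```

The noise scale plays the role of temperature: too little and the optimizer never escapes, too much and it never settles. Learning-rate and batch-size schedules are, in this picture, cooling schedules.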

This is not an accident. It is the same mechanism that drives physical systems: thermal fluctuations allow systems to escape local energy minima and find lower-energy configurations. Simulated annealing formalizes this analogy directly, starting with high noise and cooling gradually to allow the system to settle into a deep minimum. The success of stochastic optimization in deep learning is, at a structural level, a restatement of statistical mechanics. The mathematics of partition functions and free energy minimization runs through both.
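Simulated annealing, on the same tilted double well, makes the schedule explicit. A minimal sketch with Metropolis moves and geometric cooling, not a tuned implementation:

```python
import math
import random

def f(x: float) -> float:
    """Tilted double well: shallow minimum near x = +0.9, global near x = -1.1."""
    return x**4 - 2 * x**2 + 0.5 * x

def anneal(t_start: float = 2.0, t_end: float = 0.01,
           steps: int = 20000, move: float = 0.2) -> float:
    x, fx = 1.0, f(1.0)  # start in the shallow well
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / steps)  # geometric cooling
        x_new = x + random.uniform(-move, move)
        f_new = f(x_new)
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with Boltzmann probability exp(-delta / T).
        if f_new <= fx or random.random() < math.exp(-(f_new - fx) / t):
            x, fx = x_new, f_new
    return x

random.seed(2)
print(f"Annealed solution: x = {anneal():+.3f}")  # settles near the global minimum
```

Early on, the high temperature lets the walker cross the barrier freely; as the temperature drops, the Boltzmann factor pins it in whichever well is deeper. That is free energy minimization run as an algorithm.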

The information connection

In 1948, Claude Shannon defined the entropy of a probability distribution as H = −Σᵢ pᵢ log pᵢ, the sum running over all outcomes. In 1877, Ludwig Boltzmann defined thermodynamic entropy as S = k log W, where W is the number of microstates. These two definitions are not merely analogous. They are the same mathematical object in different units with different physical grounding.
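The identity is easy to check numerically. For a uniform distribution over W outcomes, Shannon's H in nats is exactly ln W, and multiplying by Boltzmann's constant turns it into the thermodynamic entropy of a system with W equally likely microstates:

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def shannon_nats(w: int) -> float:
    """H = -sum(p * ln p) for a uniform distribution over w outcomes."""
    p = 1.0 / w
    return -sum(p * math.log(p) for _ in range(w))

w = 1024
print(f"H    = {shannon_nats(w):.4f} nats")   # ln(1024) = 6.9315
print(f"ln W = {math.log(w):.4f}")
print(f"S    = {K_B * math.log(w):.3e} J/K")  # Boltzmann: S = k_B ln W
```

The only difference is units: divide H by ln 2 and you are counting bits; multiply it by k_B and you are measuring joules per kelvin.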

This connection matters for machine learning in a concrete way. A model's capacity to represent a dataset is bounded by the information content of that dataset, which is measured by Shannon entropy. Compressing a training dataset into a model's weights is thermodynamically irreversible in a precise sense: Landauer's principle says that erasing a bit of information dissipates at least kT ln 2 of work, and the bits that get dropped at each compression step are not recoverable. This is why a model trained on finite data cannot generalize perfectly to the full distribution: the compression introduced irreversibility, irreversibility means information loss, and information loss is the second law.
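Landauer's bound puts a number on the erasure cost. It is minuscule per bit at room temperature, which is why the thermodynamic floor is invisible next to a GPU's electrical draw, but it is strictly positive, and that is what "irreversible" means physically:

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K
T = 300.0           # room temperature, K

bit_cost = K_B * T * math.log(2)  # Landauer bound: minimum dissipation per erased bit
bits = 8 * 10**9                  # one gigabyte, as an example

print(f"Landauer bound: {bit_cost:.2e} J per bit")             # ~2.9e-21 J
print(f"Erasing 1 GB costs at least {bit_cost * bits:.2e} J")  # ~2.3e-11 J
```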

Every time I run a simulation and watch a structure evolve from order to disorder, or design a material to reverse that process, I am working inside the same framework that governs protein folding, neural network training, and the arrow of time. One law. Running through everything.