What Feynman Meant
When Richard Feynman died in 1988, his Caltech blackboard had two sentences written on it: "What I cannot create, I do not understand" and "Know how to solve every problem that has been solved." The first sentence has been reproduced on T-shirts, conference slides, and institutional websites. The second, which is harder and less comfortable, gets quoted less.
Both sentences are about the same thing. Feynman's standard for understanding was not whether you could recognize the right answer. It was whether you could produce the answer from scratch, starting from first principles, without looking anything up.
Discriminative versus generative
The modern machine learning framing offers a useful language for this distinction. A discriminative model takes an input and predicts a label: given this molecule, predict its toxicity. A generative model takes a label and produces an input: given this toxicity profile, produce a molecule that satisfies it. Most of the early progress in molecular ML was discriminative: property predictors, binding affinity models, ADMET classifiers. You give the model a structure and it tells you a number.
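The distinction can be made concrete as two function signatures pointing in opposite directions. The sketch below is purely illustrative: the names, the "property," and the carbon-counting score are all hypothetical stand-ins for what would, in practice, be trained models.

```python
# Toy illustration of the discriminative/generative distinction.
# `predict_logp` and `generate_for_logp` are hypothetical stand-ins;
# a real property predictor would be a trained model, not a count.

def predict_logp(smiles: str) -> float:
    """Discriminative direction: structure -> property."""
    # Hypothetical scoring: count carbons as a crude lipophilicity proxy.
    return smiles.upper().count("C") * 0.5

def generate_for_logp(target: float, candidates: list[str]) -> str:
    """Generative direction: property target -> structure.
    Here, trivially: pick the candidate closest to the target."""
    return min(candidates, key=lambda s: abs(predict_logp(s) - target))

candidates = ["CCO", "CCCCCC", "O"]       # ethanol, hexane, water
print(predict_logp("CCCCCC"))             # discriminative query: 3.0
print(generate_for_logp(2.5, candidates)) # generative query: CCCCCC
```

The asymmetry the essay describes is visible even here: the generative direction is defined only by searching against the discriminative one, not by any independent account of the property.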
Feynman's standard, translated: a model that can only discriminate does not understand in the sense he meant. Understanding, by his criterion, requires the ability to generate. If you truly understand the relationship between molecular structure and a property, you should be able to design a molecule with that property, not just evaluate any given molecule against it.
Generative molecular design models are attempting to meet this standard. They are not meeting it yet, for reasons that become clear when you press on what "generate" means.
Generation without mechanism
A VAE can generate molecules with predicted high binding affinity for a target protein. If you ask it why this molecule should bind well, it has no answer. The generation is real: the molecule exists, it can be synthesized, it can be tested. But the causal chain from molecular structure to binding affinity passes through quantum mechanics, protein dynamics, solvation thermodynamics, and conformational entropy, none of which is represented in the latent space. The model is following a gradient in a learned landscape, not reasoning about mechanism.
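The "following a gradient in a learned landscape" pattern can be sketched in a few lines. Everything here is a hypothetical surrogate: the quadratic score stands in for a trained affinity predictor over a VAE latent space, and the loop is plain gradient ascent with no physics anywhere inside it.

```python
import numpy as np

# Minimal sketch of latent-space optimization. `predicted_affinity`
# is a hypothetical smooth surrogate, not a physical model: the
# optimizer climbs its gradient without representing any mechanism.

rng = np.random.default_rng(0)
optimum = rng.normal(size=8)  # peak of the learned landscape

def predicted_affinity(z: np.ndarray) -> float:
    # Surrogate score: highest near `optimum`; no physics inside.
    return float(-np.sum((z - optimum) ** 2))

def grad(z: np.ndarray) -> np.ndarray:
    return -2.0 * (z - optimum)

z = np.zeros(8)               # start somewhere in latent space
for _ in range(200):          # plain gradient ascent
    z += 0.05 * grad(z)

# The optimizer arrives at a point with near-maximal predicted
# affinity, but can offer no account of why: it only followed slope.
print(predicted_affinity(z))
```

Asking this loop "why does the optimum bind well?" has no answer by construction: the only information it ever consults is the local gradient of a learned score.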
Feynman could derive the equations of electrodynamics from scratch. He did not merely know which answers were correct. He understood the structure well enough to reconstruct the whole thing. A generative model that produces a molecule matching a property target has done something analogous to looking up the answer in a table: it has found a point in its learned distribution where the target property is satisfied. That is not derivation from first principles.
What first-principles generation would look like
A system that truly understood molecular design in Feynman's sense would reason from quantum mechanics through molecular structure through binding geometry through thermodynamics to produce a designed molecule, with each step grounded in the underlying theory. The current generation of models skips all of this. They learn end-to-end mappings from structure to property and back, compressing the intervening physics into weights that nobody can interpret.
This is not a criticism of the current approach. The current approach produces useful results. But it is important to be clear about what it has achieved and what it has not. A model that generates a molecule with a desired property is not doing what Feynman meant. It is doing something more like sophisticated memorization with interpolation: finding a molecule that lies in the right region of the learned distribution, which happens to produce the right outputs on the trained property predictors.
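"Interpolation" here has a literal geometric reading: a candidate produced this way lies between points the model has already seen. A toy version, with hypothetical latent codes standing in for encoded training molecules:

```python
import numpy as np

# Hypothetical sketch of interpolation in a learned latent space:
# the new candidate is a blend of known points, not a derivation.

rng = np.random.default_rng(1)
train_latents = rng.normal(size=(100, 8))  # stand-in for encoded training molecules

def interpolate(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    return (1 - t) * a + t * b

candidate = interpolate(train_latents[0], train_latents[1], 0.5)

# The candidate lies exactly on the segment between two known points:
dist_sum = (np.linalg.norm(candidate - train_latents[0])
            + np.linalg.norm(candidate - train_latents[1]))
seg_len = np.linalg.norm(train_latents[0] - train_latents[1])
print(np.isclose(dist_sum, seg_len))  # True: strictly "between" knowns
```

A real decoder warps this geometry nonlinearly, but the point stands: the novelty is relative to the training distribution, not to the underlying physics.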
The second sentence
"Know how to solve every problem that has been solved." This is the complement of the first. Not just produce answers, but be able to trace through the logic of every known solution. A physicist who meets this standard could, in principle, derive all of classical mechanics, electrodynamics, and quantum mechanics from scratch, knowing only the postulates. They would be reconstructing the understanding, not retrieving it.
Science built on machine learning models that cannot be interpreted, that cannot trace their reasoning, and that cannot explain their outputs in terms of underlying mechanism is science that meets neither of Feynman's standards. It produces results. It does not produce understanding.
The goal worth working toward is models that meet both. That generate answers and explain them. That learn mechanism, not just correlation. That can, when asked why the molecule works, describe the physics responsible. This is a harder problem than building accurate predictors, and the field would benefit from being clear that it is a different problem.