AI and Mathematics

Filed under Artificial Intelligence, Mathematics on August 27, 2024. Last updated on August 27, 2024.

Table of Contents

Quotes

My talk is on AI and more generally machine assistance in mathematics.

It's all very exciting and it's beginning to be transformative, but on the other hand there's also a sense of continuity.

This I think is a way in which machine learning is being increasingly used in mathematics. It doesn't directly solve the problem for you but it gives you all these really useful hints as to where the connections are and where to look at them, but you still need the human to actually make the connections.

The video link and a write-up of the content follow below.


Introduction by Gregor Dolinar, IMO President

Hello everyone. Some of you dear contestants are very young, so perhaps you do not know who Professor Terence Tao is. Just a few words of introduction: he participated in the IMO for the first time when he was 11 years old and received a bronze medal. The next year he came back and received a silver medal. After that, at the age of 13, he received a gold medal, and he was the youngest participant ever to receive a gold medal. Then he, I don't know, got bored, went to university, and didn't participate in the IMO anymore. Now he is a professor at the University of California, Los Angeles, and I can say that he is definitely the biggest IMO star and of course one of the most influential mathematicians of our time. Especially for you: Professor Terence Tao.

History of Machines and Mathematics

Thank you. I'm very happy to be back here at the IMO. The time I had at the IMO was one of the most fun times of my life, and I still look back on it fondly. I hope you all had fun too - not just in the competition, whether or not you got a good score, but also in the social activities. They always host a really good event here.

My talk is on AI and more generally machine assistance in mathematics. So you've all heard about AI and how it's changing everything. I think earlier today there was a talk by DeepMind about a new system, AlphaGeometry, that can now answer some IMO geometry questions. There will also be a presentation on the AI Mathematical Olympiad right after my talk, so please stay for that.

I'll be talking more about how these tools are beginning to change research mathematics, which is different from competition mathematics. Instead of having three hours to solve a problem, you take months, and sometimes you don't solve the problem and have to change the problem. It is definitely not the same as math competitions, although there's some overlap in skills.

It's all very exciting and it's beginning to be transformative, but on the other hand there's also a sense of continuity. We've actually been using computers and machines to do mathematics for a long time and it's just the nature of the way in which we're doing it is changing but it actually is following on a long tradition of machine assistance.

So here's a question: How long have we been using machines to do mathematics? The answer is thousands of years. Here's a machine that the Romans used to do mathematics. The abacus is one of the early machines. There are even some earlier ones. That's kind of boring. That's not a really smart machine.

Computers

What about computers? How long have we been using computers to do mathematics? The answer is about 300 to 400 years. You might think that's a bit weird, because we didn't have electronic computers until the 1930s and 1940s. But computers weren't always electronic: before that they were mechanical, and before that they were human.

The computer was actually a job title - someone who computes. Here is a cluster of computers during World War II, computing ballistics and other things. They had a whole cluster of computers, mostly girls because the men were fighting the war, working with adding machines, and there were programmers who basically just told the girls what to do. The basic unit of computational power at the time was not the CPU; it was the kilogirl-hour: how much computation you can do with 1,000 girls working like this for one hour.

Tables

We've been using computers even before that, since the 1700s or earlier. The most basic use of computers in those times was to build tables. You may have heard of Napier's logarithm tables. If you wanted to compute sines and cosines and so forth, you used a computer to generate the tables. When I was in high school our curriculum still taught how to use these tables, which were just then being phased out. Of course now we have calculators and computers.

We still use tables today. In mathematical research we rely on tables - we call them databases now, but they're still the same thing. There are many important results in mathematics that were first discovered through tables. In number theory one of the most fundamental results is called the prime number theorem. It tells you roughly how many primes there are up to a large number X, and it was discovered by Legendre and Gauss. They couldn't prove it, but they conjectured it to be true, because Gauss and others basically had computers - in fact Gauss himself was a bit of a computer - compute tables of the first million primes and try to find patterns.
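For reference, writing π(X) for the number of primes up to X, the prime number theorem states that

$$\pi(X) \sim \frac{X}{\log X} \qquad \text{as } X \to \infty,$$

i.e. the density of primes near X is roughly 1/log X.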

A couple of centuries later, there's another really important conjecture. (The prime number theorem itself was eventually proven around 1896.) This other really central problem in number theory is called the Birch and Swinnerton-Dyer conjecture; I won't talk about it here, but it was also first discovered by looking at lots of tables - this time tables of data about elliptic curves.

Online Encyclopedia of Integer Sequences

A table that lots of mathematicians use now, including myself, is something called the Online Encyclopedia of Integer Sequences (OEIS). Maybe you've encountered it yourself. You may recognize many integer sequences just from memory: if I tell you the sequence 1, 1, 2, 3, 5, 8, 13, you know that is the Fibonacci sequence. The OEIS is a database of hundreds of thousands of sequences like this. Many times when a mathematician is working on a research problem, there is some natural sequence of numbers associated with it. Maybe there's a sequence of spaces depending on n and you compute the dimension, or the cardinality of a set, or something. You can compute the first five or six or ten of these numbers, put them in, and compare against the OEIS. If you're lucky, the sequence has already been put there by somebody else who discovered it from a completely different source, coming from studying some other mathematical problem. That gives you a big clue that there's a connection between the two problems, and many promising, productive research projects have come up that way.
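As an illustration of the kind of lookup described above, here is a small Python sketch that searches the OEIS for a few computed terms. It assumes the public JSON search endpoint at oeis.org; the exact URL format and response shape are my assumptions about the current API, not something from the talk.

```python
import json
from urllib.request import urlopen

# Look up a few computed terms in the OEIS. The fmt=json endpoint and the shape
# of the response are assumptions about the current OEIS web API.
terms = "1,1,2,3,5,8,13"
with urlopen(f"https://oeis.org/search?q={terms}&fmt=json") as response:
    data = json.load(response)

# Depending on the API version the result is either a dict with a "results" key
# or a bare list of entries; handle both.
entries = data["results"] if isinstance(data, dict) else data
for entry in entries[:3]:
    print(f"A{entry['number']:06d}: {entry['name']}")  # e.g. A000045: Fibonacci numbers
```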

Tables are one of the earliest ways we've been using computers. But the most famous thing you think of when you think of using computers to do mathematics is number crunching. The formal name for this is scientific computation: you want to do a very big calculation, lots and lots of arithmetic, so you outsource it to a computer. We've been doing that since the 1920s. Maybe the first person to really do scientific computation was Hendrik Lorentz. He was tasked by the Dutch government to figure out what would happen to the water flow if they built a really giant dike, and so he had to model some fluid equations. He used a whole bunch of human computers to work this out. He had to invent floating point arithmetic to do it: he realized that if you want a lot of people to do a lot of calculations very quickly, you should represent numbers of very different magnitudes in floating point.

We now use computers to model all kinds of things: solving lots of linear equations or partial differential equations, doing combinatorial calculations, and you can also solve algebra problems. In principle, many of the geometry questions you see at olympiads can be solved by scientific computation. There are computer algebra packages that can do this: you can turn any geometry problem, say involving 10 points and some lines and circles, into a system of polynomial equations in some 20 real unknowns and just whack it into Sage or Maple or something. Unfortunately, once it gets beyond a certain size the complexity becomes exponential or even double exponential, so until recently it was not really feasible to just brute-force these problems with standard computer algebra packages. But now, with AI assistance, maybe it's more promising - you heard a talk about that this morning.
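To make the "geometry becomes algebra" point concrete, here is a tiny Python/SymPy sketch (my own toy illustration, not an olympiad problem): it verifies Thales' theorem - an angle inscribed in a semicircle is a right angle - by turning the geometric statement into a symbolic identity.

```python
import sympy as sp

# Coordinates turn the geometry into algebra: A and B are the ends of a diameter
# of the unit circle, and P = (cos t, sin t) is an arbitrary point on the circle.
t = sp.symbols('t', real=True)
A = sp.Matrix([-1, 0])
B = sp.Matrix([1, 0])
P = sp.Matrix([sp.cos(t), sp.sin(t)])

# The angle APB is a right angle exactly when the vectors PA and PB are orthogonal,
# i.e. their dot product vanishes identically in t.
dot = (A - P).dot(B - P)
print(sp.simplify(dot))  # prints 0, so the inscribed angle is always 90 degrees
```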

SAT Solvers

Another type of scientific computation that has become quite powerful is what are called SAT solvers - satisfiability solvers. These are meant to solve kind of logic puzzles. Like if you have 10 statements that are true or false - maybe a thousand statements that are true or false - and you know that maybe if the third statement is true and the sixth statement is true then the seventh statement must be false. If you're given a whole bunch of constraints like that, a SAT solver will try to take all this information and conclude - can you prove a certain combination of these sentences or not?

There's a more fancy version of a SAT solver called an SMT solver - satisfiability modulo theories - where you also have some variables x, y and z and you assume some laws like maybe there's an addition operation and addition is commutative and associative. You plug in these laws as well as some other facts and you try to just brute force - can you deduce some conclusion out of some finite set of hypotheses?
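Here is a small sketch of the kind of query such a solver handles, written in Python with the z3 solver; the particular statements and constraints are made up for illustration.

```python
from z3 import Bools, Ints, Solver, Implies, And, Not, sat

# Propositional part (SAT-style): four true/false statements with constraints,
# including "if statements 3 and 6 are both true, then statement 7 is false".
s1, s3, s6, s7 = Bools('s1 s3 s6 s7')
# SMT part: integer variables obeying arithmetic laws as well.
x, y = Ints('x y')

solver = Solver()
solver.add(Implies(And(s3, s6), Not(s7)))
solver.add(s1 == s7)
solver.add(s3, s6)
solver.add(x + y == 10, x - y == 4)

if solver.check() == sat:        # "sat" means the constraints are consistent
    print(solver.model())        # one satisfying assignment, e.g. s7 = False, x = 7, y = 3
```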

These solvers are quite powerful, but unfortunately they don't scale well at all. Again, the time complexity grows exponentially, so once you get past a thousand or so propositions it becomes really hard for them to run in any reasonable amount of time. But they can actually solve some problems. Here is one recent success, a problem that will probably only ever be solved by computer - I don't think an unassisted human will ever be able to solve it.

Pythagorean triple problem

It concerns what's called the Boolean Pythagorean triples problem, which was unsolved until this big SAT-solver computation. The question is: you take the natural numbers and you color each of them one of two colors, red or blue. Is it true that no matter how you color the natural numbers, one of the colors must contain a Pythagorean triple - three numbers which form the sides of a right triangle, like 3, 4, 5?

This was not known to be true; we have no human proof of it. But we have a computer proof. It is now known that in fact you don't need all the natural numbers - you just need to go up to 7,825. No matter how you split the numbers from 1 to 7,825 into two color classes, one of them will contain a Pythagorean triple. Now, there are 2 to the power 7,825 such colorings, so you can't do it by brute force; you have to be somewhat clever. But it is possible. Once you go up to 7,825 you must have a monochromatic Pythagorean triple, whereas there is an example coloring of the numbers up to 7,824 with no Pythagorean triple in either class.
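To show how this becomes a satisfiability question, here is a hedged Python/z3 sketch of the encoding - a toy version only; the actual 2016 result needed heavily engineered SAT solvers and years of CPU time, and a check like this is only feasible for small N.

```python
from math import isqrt
from z3 import Bool, Or, Not, Solver, sat

def two_colorable_without_triple(N: int) -> bool:
    """Is there a red/blue coloring of 1..N with no monochromatic Pythagorean triple?"""
    red = {n: Bool(f"red_{n}") for n in range(1, N + 1)}  # True = red, False = blue
    solver = Solver()
    for a in range(1, N + 1):
        for b in range(a, N + 1):
            c = isqrt(a * a + b * b)
            if c <= N and c * c == a * a + b * b:
                # The triple (a, b, c) must not be entirely red nor entirely blue.
                solver.add(Or(Not(red[a]), Not(red[b]), Not(red[c])))
                solver.add(Or(red[a], red[b], red[c]))
    return solver.check() == sat

print(two_colorable_without_triple(100))  # True for small N; it first fails at N = 7,825
```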

So that statement is provable by computer. I think it was actually the world's longest proof ever at the time; I think now it's the second longest. The proof required a few CPU-years of computation and it generated a proof certificate. The actual proof is 200 terabytes long, although it has since been compressed to a mere 86 gigabytes.

This is one way in which we can use computers just to do enormous case analysis. That's sort of a fairly obvious way to use computers. But in recent years we've begun to use computers in more creative ways. There are three ways in which computers are being used to do mathematics which I think are really exciting, particularly when they are combined with each other and with more classical databases, tables, and symbolic computation/scientific computation.

First of all, we're using machine learning and neural networks to discover new connections - to find ways in which different types of mathematics are correlated that you would not see, or would be unlikely to see, as a human. The most splashy are the large language models - ChatGPT and Claude and so forth - which are in some sense very large versions of machine learning algorithms that work with natural language, and which can sometimes generate possible approaches to problems; sometimes they work, sometimes they don't. You'll see more examples of this in the talk after mine.

There's also another technology which is just becoming usable by the everyday mathematician, called formal proof assistants. You know computer languages that you use to write executable code - programs that do things; formal proof assistants are languages that you use to check things - to check whether a certain argument is actually valid and really does give you the conclusion from the hypotheses. These have been fairly annoying to use until very recently; they're now becoming somewhat easier to use, and they are facilitating a lot of interesting math projects that wouldn't have been possible without them. They will combine very well in the future with the other tools I described here.

Proof Assistants

So I want to talk about these more modern ways to use machines and computers to do mathematics. I think I'll start with proof assistants.

Four Color Theorem

The first really computer-assisted proof, maybe in history, was the proof of the Four Color Theorem: every planar map can be colored using only four colors, so that no two adjacent countries get the same color. That was proven in 1976. This was before proof assistants, and it wouldn't really be called a computer-verified proof nowadays. It was a proof involving a massive computation, roughly half of which was done by computer and half by humans. The way they proved the Four Color Theorem is that you basically induct on the number of countries. You show that any big map must contain one of a list of special subgraphs of countries - they produced a list of about 1,000-2,000 of these. That was one thing they had to check. Then they had to check that every time you had such a subgraph, you could replace it with something simpler, and if you could four-color the simpler thing you could four-color the original map. They had to check these properties - I think they're called dischargeability and reducibility - for each of the subgraphs on the list.

I think one of these tasks they could do by computer, although this was an early computer - I think they had to enter each graph by hand into this one program and check it. The other task was actually done by a human computer: one of the daughters of one of the authors had to spend hours and hours manually checking this reducibility property. It was very tedious. The process was not perfect - there were lots of little mistakes and they had to keep updating the list. So it was not, by modern standards, a computer-verifiable proof. That only came much later, in the '90s, when there was a simpler proof using a mere 700 or so graphs. Now all the things that need to be checked form a very precise, well-defined list of properties, and you could write a couple hundred lines of code in your favorite computer language, C or Python or something, and check them in a few minutes on a modern computer. To actually write a proof that goes all the way down to the axioms of mathematics - that was done in 2005 using a proof assistant language called Coq (which I think is now being renamed to Rocq).

That was one of the first proofs and you see there's a huge gap between sort of when the proof first appeared and then when we actually could completely verify it by computer.

Kepler conjecture

Another famous example is the Kepler conjecture for sphere packing. This is a really old conjecture - Kepler, from the 17th century. It's very easy to state: you take a whole bunch of unit spheres and you want to pack them into three-dimensional space as efficiently as possible. There's an obvious way to pack spheres, the hexagonal close packing - like the way you stack oranges at a grocery store - and there's also a closely related face-centered cubic packing with the same density. That obvious packing has a density of about 74%, and the question is: is that the best possible? This turns out to be a surprisingly hard problem. In two dimensions it's not too hard to show that the hexagonal packing is the best. Only very recently do we know the answer in 8 and 24 dimensions - great work of Viazovska; maybe she talked about it yesterday. But three is the only other dimension where we know the answer, apart from dimension one, which is trivial. It is surprisingly difficult to prove. Again, there was no completely human-readable proof of this conjecture. There is a strategy - of course the problem is that there are infinitely many of these spheres and the density is an asymptotic thing, so it's not a priori a finite problem that you can just throw at a computer.

But you can try to reduce it to a finite problem. There's a strategy proposed by Fejes Tóth in the '50s. Every time you have a packing, it subdivides space into polyhedra called Voronoi regions. The Voronoi polytope of a sphere is just the set of all points which are closer to the center of that sphere than to the centers of all the other spheres. So you can split up space into all these polytopes, and the polytopes have certain volumes; you can also count their faces, measure their surface areas and so forth, so they all have these statistics. The packing density is very closely related to the average volume of these regions. So if you can say something about how the volumes of these polytopes behave on average, then you could get at least some upper bound on how efficient these packings can be.
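As a small numerical illustration of these Voronoi cells (my own sketch, not part of the talk), the following Python/SciPy snippet computes the Voronoi cell of one sphere center in the face-centered cubic packing and recovers the packing density π/(3√2) ≈ 0.7405.

```python
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

# Face-centered cubic lattice, scaled so the nearest-neighbor distance is 2
# (unit spheres just touch).
d = np.sqrt(2.0)
basis = np.array([[0, d, d], [d, 0, d], [d, d, 0]], dtype=float)
pts = np.array([i * basis[0] + j * basis[1] + k * basis[2]
                for i in range(-2, 3) for j in range(-2, 3) for k in range(-2, 3)])

vor = Voronoi(pts)
center = int(np.argmin(np.linalg.norm(pts, axis=1)))      # the point at the origin
region = vor.regions[vor.point_region[center]]
assert -1 not in region                                    # its cell is bounded (a rhombic dodecahedron)

cell_volume = ConvexHull(vor.vertices[region]).volume      # about 4*sqrt(2) ~ 5.657
print((4.0 / 3.0) * np.pi / cell_volume)                   # about 0.74048 = pi/(3*sqrt(2))
```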

You can try to find relations between these polytopes - if one polytope is very big, maybe it forces the nearby ones to be small - so maybe you can find some inequalities connecting the volume of one polytope to another. Then maybe you should just collect lots and lots of these inequalities, do some linear programming or something, and hopefully you can derive the right bound - this magic number π/(3√2), which is the right density - from all these inequalities. People tried this. There were many attempts - some even claimed success. But none were accepted as actual proofs.

The problem was eventually solved by Thomas Hales and his co-authors. He followed basically the same strategy, but with lots of technical tweaks. He changed the cells from Voronoi cells to slightly fancier cells. Instead of taking the volume, he invented something called the score that he assigned to each cell - a volume plus or minus lots of little ad hoc adjustments - but again with the aim of creating all these linear inequalities between the scores, eventually getting upper bounds on the density, and hopefully hitting exactly the optimal density. It's a very flexible method - actually too flexible, because there are too many things you can try; there are so many ways you can set up the score. So there's a quote here: Sam Ferguson realized that every time he encountered problems in trying to minimize his functional, they could just change the score and try again. But then all the things they had already checked had to be redone, and the scoring function became more and more complicated. They worked on this for almost a decade, I think. As Hales describes it, the function became more and more complicated, but each change cut months or years from the work. This incessant fiddling was unpopular with his colleagues: every time he presented his work in progress at a conference he was minimizing a different function. Even worse, the function was mildly incompatible with what he did in earlier papers, and this required going back and patching the earlier papers.

But eventually they did it. In 1998 they announced that they had finally found a score which obeyed a whole bunch of linear inequalities in about 150 variables, which they could optimize to get the bound they wanted. Initially they did not plan to make this a computer-assisted proof, but as the project became more and more complicated it was inevitable that they had to use more and more computers. The proof was enormous by the standards of 1998: 250 pages of notes and three gigabytes of computer programs and data. It had a very tough time getting refereed. It was sent to the top journal, the Annals of Mathematics, and it took four years to referee, with a panel of 12 referees. At the end they said they were 99% certain of the correctness of the proof, but they could not certify the correctness of the computer calculations. They did a very unusual thing: they published the paper with a little caveat from the editors saying this. (They have since removed that caveat.) At the time there was a lot more controversy about whether a computer-assisted proof qualified as an actual proof; now I think we are much more comfortable with it. But even after it was published there was doubt about whether it was really a proof. So this was maybe the first major high-profile problem where there was a really big incentive to formalize it completely, all the way down to first principles, in a formal proof language.

So Hales in fact created a language - well, a modification of existing languages - to do this, and he called the effort the Flyspeck project. He estimated it would take 20 years to formalize his proof, but with the help of 21 collaborators he finished in a mere 12 years; it finally appeared in 2014. So we now have complete confidence in this particular result, but it was quite a painful thing to do. Moving to the last few years, we've now figured out a better workflow for how to formalize. It's still tedious, but it is getting better.

Condensed mathematics

Peter Scholze, who is a very prominent young mathematician - a Fields Medalist, for instance - is famous for many, many things, but he created this amazingly promising area of mathematics called condensed mathematics. It deploys the power of algebra and category theory - all the tools from algebra - to functional analysis, the theory of function spaces like Banach spaces and so forth, which in analysis has really been resistant to the methods of algebra. This area of mathematics could in principle allow one to solve certain types of questions in functional analysis with algebraic methods. So he set up this whole category of things called condensed abelian groups and condensed vector spaces. I won't try to explain what condensed means.

His thesis is that the categories of function spaces we learn in our graduate classes are incorrect, or at least not the natural ones - there are categories with better properties. So he set up this theory, but there was one very important vanishing theorem which he needed to prove. I've stated it here, but I'm not going to explain what any of these words or symbols mean. There was a very technical vanishing of a certain category-theoretic group that he needed to establish; without it, the whole theory doesn't have any interesting consequences. This was the foundation of the theory. So he wrote a blog post about this result. He said he spent a whole year getting obsessed with the proof of this theorem, going almost crazy over it. In the end, he said, we were able to get an argument pinned down on paper, but no one else has dared to look at the details, so he still had some lingering doubts.

"With this theorem the hope that this condensed formalism can be fruitfully applied to functional analysis stands or falls. This theorem is of the utmost foundational importance so being 99.9% sure is not enough." He said he was happy to see many study groups on condensed mathematics throughout the world but they all stopped short of the proof of this theorem. "This proof is not much fun." So he says "This may be my most important result to date - better be sure it's correct."

Lean programming language

So he was also very incentivized to formalize this theorem, now in a more modern proof assistant language called Lean. Lean is a language that has been developed quite a lot in recent years. It comes with a crowdsourced effort to develop a massive math library. Rather than deriving everything from the axioms of mathematics, which becomes very tedious the more advanced you go - and this type of mathematics is very advanced - this central math library in Lean has already proved lots of intermediate results, the type of things you would see in, say, undergraduate math courses: calculus, basic theorems of group theory or topology, and so forth. These have already been formalized, so you have a base to start from - you're not starting from the axioms, you're starting from roughly a graduate-level math education. There's still a big gap to where you need to go, but it helps. In order to formalize this result they had to add many extra things; the math library was not complete - it's still not complete - and there are lots of areas of mathematics, like homological algebra, sheaf theory, and topos theory, that needed to be added to the library. But in a mere 18 months they were able to formalize this theorem.

The proof was basically correct. There were some minor technical issues but nothing really major was discovered. They found some nice simplifications. There were some technical steps that were just too hard to formalize and so they were forced to find some shortcuts.

But actually the value of this project was more indirect.

Lean Blueprint

For example, one tool that was set up in the course of this project is what's called a blueprint. Taking a huge 50-page proof and trying to formalize it directly is really painful. You have to keep the whole proof in your head.

But what we've realized is that the right workflow is to take a big proof and first write what's called a blueprint, which breaks the proof up into hundreds of tiny little steps. Each step can be formalized separately, and then you put them all together. So you break a huge argument into lots of little pieces, you write that first, and then different people on your team can formalize different steps of the argument. As a byproduct of this formalization they also produced a very nice blueprint; if you actually want to read the proof as a human, the blueprint is probably the best place to go now.

Another spin-off: there is now also the formal proof, which is tens of thousands of lines long, and there are efforts to convert that back into a human-readable proof. Tools have now been developed where you can take a proof written in a language like Lean - here's an example where a proof of a topological problem was converted back to a human-readable proof. All this text here is a proof that was computer-generated from a formal proof. It looks like a human proof and uses the same sort of mathematical language, but it's much more interactive. You can click on any location - I can't do it here because this is a static PDF - and it will tell you, wherever you are, what the hypotheses are, what you're trying to prove, and what the variables are. If a step is too terse, you can expand it and it will explain where it came from, and you can go all the way down to the axioms if you want. I think this is great. I think in the future textbooks will be written in this interactive style: you formalize them first, and then you can have much more interactive textbooks than we have currently.

Inspired by this, I started a formalization project myself. Last year I solved a problem in combinatorics with several people, including Tim Gowers, who's here in the audience. It's not super important what the problem is: there's a subset of (Z/2Z)^n - what's called the Hamming cube - that obeys a property called small doubling, and then there's a certain limit to how big it can be. But it doesn't really matter what the statement is. We proved it; the proof is about 33 pages. We formalized it in relatively record time - probably still the fastest formalization of an actual research paper: three weeks, in a group project of about 20 people, using all the blueprint machinery that had been developed for Scholze's project. It makes the task of proving things much more open and collaborative, and you get all these nice visualizations. As I said, the first thing you do is take your big theorem and break it up into lots of little pieces. The theorem we proved we call PFR - I won't explain why. It corresponds to this little bubble at the bottom of the graph here, and then we introduce all these other statements. The proof of PFR depends on several other previous statements, and those depend on previous statements as well. So there's this dependency graph, and the bubbles have different colors depending on whether they've been formalized or not.

A green bubble is a statement that you've already formally proven in your language. A blue bubble is one that hasn't yet been formalized but it's ready to be formalized - like all the definitions are in place, someone needs to actually go ahead and do it. A white bubble - even the statement hasn't been formalized yet, someone has to write in the statement.

So you get this tree of tasks. The beauty of this project is that you can get all these people to collaborate on different pieces of the graph independently. Every little bubble corresponds to some statement, and you don't need to understand the whole proof in order to work on your little piece. This was a problem in combinatorics, but among the people who contributed there were people from probability, and there were people who were not even mathematicians - they were computer programmers, but they were very good at doing these little mini puzzle-type things. Everyone just picked one bubble they thought they could do, and they did it. In three weeks we did the whole thing. It was a really exciting project.

In mathematics we don't normally collaborate with this many people. Maybe five people is the most I've normally seen, because when you collaborate on a big project you have to trust that everyone's math is correct, and past a certain size this is just not feasible. But with a project like this, the Lean compiler automatically checks everything - you cannot upload anything that doesn't compile; it will get rejected. So you can collaborate with people you have never met before. I met a lot of people through this project, and actually wrote a lot of letters of recommendation coming out of it.

Here is an example - one little piece of the proof. This is what a proof looks like in Lean. If you know the language it's human-readable, but it looks a little unusual. It really decouples the task of proving things into many different, disjoint skills. You can have some people who see the big picture and organize things into little pieces, and then you have people who don't necessarily know all the mathematics but can work on one little piece at a time. I think this will be a more and more common way of doing mathematics going forward.

It is still painful to do. The tools are getting better and more user-friendly, but you still need some expertise in programming, and I would say it takes maybe ten times longer to formalize a proof than to write it by hand. On the other hand, if you want to change a proof, it can be much easier. For example, there was a 12 that showed up in this theorem, and we later improved this 12 to an 11 - a slightly stronger theorem. Normally if you do that you have to rewrite the whole proof, or maybe cut and paste 12 to 11, but then you have to check you didn't make any mistakes when you did that. But since we had formalized the proof, when we got the improvement it only took a few days to change the theorem to the 11: we changed the 12 to an 11 in one place, the compiler complained that in about five specific places things no longer worked, and we could just do targeted fixes. So for some specific kinds of mathematical work, the formal approach is actually already faster.

There are now quite a few big proof formalization projects going on. The biggest is Kevin Buzzard's project - he has just got a big grant to formalize Fermat's Last Theorem in Lean. He says it will take five years to do the most important parts of the proof. He doesn't claim to do the whole thing in five years, but the most interesting part is already under way.
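Since the slide of Lean code doesn't reproduce here, the following is a separate toy snippet (my own, not a piece of the PFR formalization) just to give a flavor of what the language looks like: a one-line lemma proved once by a Mathlib tactic and once by naming a library lemma.

```lean
import Mathlib.Tactic

-- Commutativity of addition on the natural numbers, proved by the `ring` tactic...
example (a b : ℕ) : a + b = b + a := by ring

-- ...and again by pointing at the existing library lemma.
example (a b : ℕ) : a + b = b + a := Nat.add_comm a b
```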

So that is formal proof assistants. I'll talk about machine learning.

Machine Learning

Machine learning here means using neural networks to predict answers to various questions, and you can use it in many ways. I'll skip the first application I was going to discuss, which is using neural networks to guess solutions to differential equations - a very exciting new tool in PDEs - and instead talk about an application of machine learning to knot theory.

Knot theory

Knot theory is quite a fun area of mathematics. It's an interesting area that brings together many different fields of mathematics that don't otherwise talk to each other much. A knot is just a loop of string - really a closed curve in space. Two knots are equivalent if there's some way to continuously deform one knot into the other without the string ever crossing itself. The basic questions in knot theory are: when are two knots equivalent? If I give you two knots, is there some way to turn one into the other?

The way you normally approach this question is to develop things called knot invariants. These are various numbers, sometimes also polynomials, that you can attach to a knot, and they don't change no matter how you continuously deform the knot. So if two knots have different invariants, they cannot be equivalent. There are many, many types of knot invariants. There's something called the signature: you flatten the knot, you look at the crossings - whether they go over or under - you create a certain matrix and so forth, and you get a certain integer called the signature. That is one type of knot invariant. There are some famous polynomials called the Jones polynomial and the Alexander polynomial, which are connected to many areas of mathematics, but I won't talk about those. Then there are things called hyperbolic invariants, which come from geometry. You can take the complement of the knot, and that is actually what's called a hyperbolic space: it comes with a certain geometric structure and a notion of distance, and you can compute its volume and some other invariants. These invariants are real or complex numbers.

So every knot comes with combinatorial invariants like the signature, and it comes with these geometric invariants, the hyperbolic invariants. Here is a whole list of knots with various hyperbolic invariants - something called the hyperbolic volume, the cusp shape, and so forth. These are real or complex numbers, but no one knew of any link between the two kinds of invariants. There were these two separate ways to create statistics of knots, and there was no known connection between them.

It was only very recently that people started using machine learning to attack this problem. They created databases of millions of knots - which was already a slightly nontrivial task - and they trained a neural network on them. They found that after training, you could give the network all the hyperbolic geometry invariants of a knot and about 90% of the time it would guess the right signature. So this created a black box which tells you that the signature is somehow hidden somewhere in these geometric invariants, but it doesn't tell you how.

But that's still useful, because once you have this black box you can play with it. What they did next is actually very simple analysis - what's called saliency analysis. The black box takes about 20 different inputs, one for each hyperbolic invariant, and produces one output, the signature. So you can tweak each input: if I change this one input, how much does it change the output? Of the roughly 20 inputs, they found that only three actually played a really major role in the output. The other 17 were barely relevant, and it wasn't the three that they expected.
They expected the volume, for example, to be very important, and the volume turns out to be almost irrelevant. The three invariants that mattered most were something called the longitudinal translation and the real and imaginary parts of the meridional translation. Once they identified the inputs that were most important, they could plot the signature directly against those three particular inputs, and then they could eyeball it - rather than using the neural network they used the human network - and see that there are some obvious patterns. By staring at these graphs they could make conjectures about what was actually going on. They made a conjecture based on this which turned out to be wrong, but they used the neural network to show it was wrong, and from the way it failed they could correct it and find a corrected version of the conjecture, which did explain the phenomenon. Once they found the right statement, they were able to prove it. So they now have a theoretical explanation of why the signature is so closely related to these particular statistics.
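Here is a hedged sketch of that saliency idea in Python with scikit-learn - synthetic data and a generic network, not the actual DeepMind setup: train a black-box model on 20 inputs where only three secretly matter, then perturb each input and see how much the prediction moves.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data: 20 inputs, but the target only depends on inputs 0, 3 and 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
y = 2 * X[:, 0] - 3 * X[:, 3] + 4 * X[:, 7]

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0).fit(X, y)

# Saliency by finite differences: nudge one input at a time and measure how much
# the black box's output changes on average.
base = model.predict(X)
sensitivity = []
for i in range(X.shape[1]):
    X_perturbed = X.copy()
    X_perturbed[:, i] += 0.1
    sensitivity.append(np.mean(np.abs(model.predict(X_perturbed) - base)))

print(np.argsort(sensitivity)[::-1][:3])  # the three genuinely relevant inputs stand out
```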

This I think is a way in which machine learning is being increasingly used in mathematics. It doesn't directly solve the problem for you but it gives you all these really useful hints as to where the connections are and where to look at them, but you still need the human to actually make the connections.

Large Language Models

And then finally we have the large language models, which are the most splashy and have made the most news. Neural networks have been around for 20 years, and large language models have been around for 5 or so years, but they've only become sort of human-level in output very recently. You've all probably heard of GPT-4, which is ChatGPT's current model. Very famously, when GPT-4 came out there was a paper describing its capabilities, and they fed it basically a question from the 2022 IMO - a slightly simplified version; if you studied the 2022 IMO you'll probably notice it's not in exactly the same form. For this particular question, you give it the problem and it gives a complete, correct solution: it actually solved an IMO question. Unfortunately this is an extremely cherry-picked example. I think out of the hundreds of IMO-level questions they tested it on, the success rate was about 1%. So they were able to solve this particular problem, and they had to format the problem in the right way to get the solution, but still, this is quite amazing.

On the other hand the funny thing about these tools is that things that humans find difficult, AI can do very easily sometimes, but things that humans find easy AI often struggles with. It is a very orthogonal way of solving problems.

In the same paper, or a related presentation, they asked the same model to do a basic arithmetic computation: 7 × 4 + 8 × 8. The model, which is just guessing the most likely output based on the input, basically guessed that the answer is 120. Then it paused and said, okay, maybe I should give an explanation of why it's 120. But when it worked through the computation step by step, it arrived at the actual answer, 7 × 4 + 8 × 8 = 28 + 64 = 92, not the answer it started with. Then if you ask, "Wait, but you said it was 120," it says, "Oh, that was a typo, sorry, the correct answer is 92."

So they're not solving the problem from first principles; they're just guessing, at each step of the output, what is the most natural thing to say next. The amazing thing is that sometimes this works, but often it doesn't. How to make these models more accurate is still an ongoing question.

People are trying all kinds of things. You can connect these models to other, more reliable software. In fact, in the presentation after mine you will see a large language model that doesn't do the computation itself - it outsources it, in that case to Python. Another thing you can do is force the language model to only produce correct answers by making it output in one of these proof assistant languages; if the output doesn't compile, you send it back and the AI has to try again. Or you can try to teach it directly the same problem-solving techniques we use to solve IMO problems: try simple examples, prove by contradiction, argue step by step, and so forth. It's still nowhere near able to solve a large majority of, say, math olympiad problems, let alone math research problems, but we're making progress.

Besides being able to solve problems directly, it's also useful just as a muse. I've used these models myself. I've experimented with various problems - I had a combinatorics problem where I was trying a few things and they weren't working, so as an experiment I just asked GPT, "What other techniques would you suggest to solve this question?" It gave me a list of ten techniques, of which about five I'd already tried or were obviously not helpful. But there was one technique I had not tried, which was to use generating functions for this particular question, and once it was suggested I realized it was the right approach; I had missed it. So just as someone to converse with, it is somewhat useful. It is not great right now, but it is not completely useless either.

There's another type of AI assistance that has become very useful for proof assistants. As I said, writing formal proofs is a very tedious task - like any really fussy computer language, you have to get the syntax exactly right, and if you miss a step it doesn't compile. But there are tools for this. I use something called GitHub Copilot, where you write down half of a proof and it tries to guess what the next line is. About 20% of the time it guesses something close to correct, and then you can just accept it. In this case I was trying to prove the statement here, and the lines in gray are the ones that Copilot suggested. It turns out the first line is useless, but the second line, which you can't quite see, actually did solve this particular problem. You can't just blindly accept the output, because it won't necessarily compile, but if you already know roughly how the proof should go, it saves you a lot of time.

These tools are getting better. Right now, if a proof is one or two lines long they can often fill it in automatically. There are now experiments that iterate: an AI suggests a proof, you feed it to the compiler, and if it doesn't compile you send the error message back. With this method we're beginning to prove things that are four or five lines long. Of course a big proof is tens of thousands of lines, so it's nowhere near the point where you can instantly get your whole proof formalized, but it is already a useful tool.

Okay, so where are we now? There are people hoping that in a few years we can use computers to solve math problems directly. I think we are still a long way from that.
For very narrowly focused problems you can set up specialized AI to handle just a very narrow band of problems, but even then they're not fully reliable - though they can be useful. For the next few years at least, they're basically going to be really useful assistants, beyond the brute-force computational assistance that we're already familiar with.

People are trying all kinds of creative things. One direction I find particularly exciting, which hasn't really been successful yet, is the hope that AI will become very good at generating good conjectures. We already saw a little example of this with the knots, where the network could already suggest connections between two different sets of statistics. So there's the hope that you could create enormous datasets, feed them into an AI, and it would automatically generate lots of nice conjectured connections between different mathematical objects. We don't really know how to do this yet, partly because we don't have these massive datasets, but I think it is something that will eventually be possible.

Another thing I'm excited about is a type of mathematics that just doesn't exist yet. Right now, because proving theorems is such a painful, painstaking process, we prove one theorem at a time - maybe two or three if you're efficient. But with AI you could imagine, in the future, instead of trying to solve one problem, taking a class of 1,000 similar problems and telling your AI, "Try to solve these 1,000 problems with this technique," and it reports back, "I could solve 35% of them with this technique." What about this other technique? "I can solve this percentage." What if I combine them? "I can do this." You could start exploring the space of problems rather than each problem separately. This is something you either cannot do right now, or you do over a process of decades, with dozens and dozens of papers slowly figuring out what you can and can't do with various techniques. But with these tools you could really start doing mathematics at a scale that is unprecedented.

So the future is going to be really exciting, I think. We will still be proving theorems the old-fashioned way - in fact we'll have to, because we won't be able to guide these AIs unless we also know how to do the things ourselves. But we'll be able to do lots of things that we can't do right now. Okay, I think I will stop there. Thank you very much. Any questions? We are on a tight schedule, but I was told we have time for maybe three or so questions. So if people would raise hands - there's someone over there.

Q&A

Voevodsky

Attending university at a young age

Choosing fields of mathematics, Erdős number


References