
Open access | Published: 14 December 2023

Mathematical discoveries from program search with large language models

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli & Alhussein Fawzi

Nature 625, 468–475 (2024)


Subjects: Computer science, Pure mathematics

Large language models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations), which can result in them making plausible but incorrect statements 1 , 2 . This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pretrained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best-known results in important problems, pushing the boundary of existing LLM-based approaches 3 . Applying FunSearch to a central problem in extremal combinatorics—the cap set problem—we discover new constructions of large cap sets going beyond the best-known ones, both in finite-dimensional and asymptotic cases. This shows that it is possible to make discoveries for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve on widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.


Many problems in mathematical sciences are ‘easy to evaluate’, despite being typically ‘hard to solve’. For example, in computer science, NP-complete optimization problems admit a polynomial-time evaluation procedure (measuring the quality of the solution), despite the widespread belief that no polynomial-time algorithms to solve such problems exist. We focus in this paper on problems admitting an efficient ‘evaluate’ function, which measures the quality of a candidate solution. Prominent examples include the maximum independent set problem and maximum constraint satisfaction problems (such as finding the ground state energy of a Hamiltonian). Our goal is to generate a ‘solve’ program, such that its outputs receive high scores from the ‘evaluate’ function (when executed on inputs of interest), and ultimately improve on the best-known solutions.

Whereas large language models (LLMs) have recently seen notable improvements in their coding capabilities 4 , 5 , 6 , 7 , 8 , with applications including debugging 9 , 10 , solving code competitions 11 , 12 and improving code performance 13 , synthesizing ‘solve’ programs for open problems requires finding new ideas that are verifiably correct. This is very hard for LLMs, as they tend to confabulate or ultimately fall short of going beyond existing results. To surpass the ‘nominal’ capabilities of LLMs, recent studies 3 have combined them with evolutionary algorithms 14 , 15 , leading to important improvements on diverse synthetic problems 16 , searching for neural network architectures 17 , 18 , 19 and solving puzzles 20 . Our proposed method, FunSearch, pushes the boundary of LLM-guided evolutionary procedures to a new level: the discovery of new scientific results for established open problems and the discovery of new algorithms. Surpassing state-of-the-art results on established open problems provides a clear indication that the discoveries are truly new, as opposed to being retrieved from the LLM’s training data.

FunSearch (short for searching in the function space) combines a pretrained (frozen) LLM, whose goal is to provide creative solutions, with an evaluator, which guards against confabulations and incorrect ideas. FunSearch iterates over these two components, evolving initial low-scoring programs into high-scoring ones, discovering new knowledge. Key to the success of this simple procedure is a combination of several essential ingredients. First, we sample best-performing programs and feed them back into prompts for the LLM to improve on; we refer to this as best-shot prompting. Second, we start with a program in the form of a skeleton (containing boilerplate code and potentially known structure about the problem), and only evolve the part governing the critical program logic. For example, by setting a greedy program skeleton, we evolve a priority function used to make decisions at every step. Third, we maintain a large pool of diverse programs by using an island-based evolutionary method that encourages exploration and avoids local optima. Finally, leveraging the highly parallel nature of FunSearch, we scale it asynchronously, considerably broadening the scope of this approach to find new results, while keeping the overall cost of experiments low.

We show the surprising effectiveness of FunSearch on several use cases. We consider a fundamental problem in extremal combinatorics, namely, the cap set problem 21 , 22 . FunSearch demonstrates the existence of hitherto unknown constructions that go beyond existing ones, including the largest improvement in 20 years to the asymptotic lower bound. This demonstrates that it is possible to make a scientific discovery—a new piece of verifiable knowledge about a notorious scientific problem—using an LLM. Using FunSearch, we also find new algorithms for the online bin packing problem that improve on traditional ones on well-studied distributions of interest 23 , 24 , with potential applications to improving job scheduling algorithms.

Whereas most computer search techniques output directly what the solution is (for example, a list of vectors forming a cap set), FunSearch produces programs generating the solution. For structured problems, such programs tend to be more interpretable—facilitating interactions with domain experts—and concise—making it possible to scale to large instances—compared to a mere enumeration of the solution. In addition, decision procedures (such as for bin packing) described by code in a standard programming language are crucially easier to deploy compared to other types of descriptions (for example, neural networks), which typically require specialized hardware and for which verifying design specifications is notoriously hard.

An overview of FunSearch is shown in Fig. 1 , and its components are described in more detail below. For more details and ablations showing the importance of each component, see Methods and Supplementary Information Appendix  A .

Figure 1: The input to FunSearch is a specification of the problem in the form of an ‘evaluate’ function, an initial implementation of the function to evolve, which can be trivial, and potentially a skeleton. At each iteration, FunSearch builds a prompt by combining several programs sampled from the programs database (favouring high-scoring ones). The prompt is then fed to the pretrained LLM and new programs are created. Newly created programs are then scored and stored in the programs database (if correct), thus closing the loop. The user can at any point retrieve the highest-scoring programs discovered so far.

Specification

The input to FunSearch is a specification of the problem in the form of an ‘evaluate’ function, which scores candidate solutions. In addition, we provide an initial program (which can be trivial) to evolve. Although in principle these are the minimum requirements, we found that performance tends to improve significantly if we write the initial ‘solve’ program in the form of a skeleton (containing boilerplate code and previous knowledge of the problem in the form of a program structure), and only use FunSearch to evolve the critical part that governs its logic. Fig. 2a shows an example in which the skeleton takes the form of a simple greedy algorithm, and the crucial part to evolve by FunSearch is the priority function that is used to make the greedy decision at every step. This delegates to FunSearch precisely the part that is usually the hardest to come up with. Whereas a fixed skeleton may constrain the space of programs that can be discovered, we find it improves overall results because it focuses the LLM resources on only evolving the critical part, instead of also using the LLM to recreate already known program structures (with more opportunities for mistakes that would render the entire program incorrect). If available, the user can optionally provide extra known information about the problem at hand, in the form of docstrings, relevant primitive functions or import packages, which FunSearch may use.
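As a concrete illustration, a minimal cap set specification in the spirit of Fig. 2a might look as follows. This is a sketch of ours rather than the paper's verbatim code (which is given in the Supplementary Information): the function names mirror the figure, and ‘priority’ starts as the trivial constant function that FunSearch then evolves.

```python
import itertools


def priority(el: tuple[int, ...], n: int) -> float:
    """Trivial initial implementation; this is the part FunSearch evolves."""
    return 0.0


def solve(n: int) -> list[tuple[int, ...]]:
    """Greedy skeleton: consider all vectors of Z_3^n in order of decreasing
    priority, adding each one that does not complete a line (three distinct
    points summing to the zero vector mod 3)."""
    candidates = sorted(itertools.product(range(3), repeat=n),
                        key=lambda el: priority(el, n), reverse=True)
    cap_set: list[tuple[int, ...]] = []
    forbidden: set[tuple[int, ...]] = set()
    for el in candidates:
        if el in forbidden:
            continue
        for other in cap_set:
            # The unique third point on the line through `el` and `other`.
            forbidden.add(tuple((-a - b) % 3 for a, b in zip(el, other)))
        cap_set.append(el)
    return cap_set


def evaluate(n: int) -> int:
    """Scores a candidate ‘priority’ by the size of the cap set it yields."""
    return len(solve(n))
```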

Figure 2: The ‘evaluate’ function takes as input a candidate solution to the problem and returns a score assessing it. The ‘solve’ function contains the algorithm skeleton, which calls the function to evolve that contains the crucial logic. a, Cap set. The function to evolve is called ‘priority’. b, Online bin packing. The function to evolve is called ‘heuristic’. The ‘main’ function implements the evaluation procedure by connecting the pieces together. Specifically, it uses the ‘solve’ function to solve the problem and then scores the resulting solutions using the ‘evaluate’ function. In the simplest cases, ‘main’ just executes ‘solve’ once and uses ‘evaluate’ to score the output (as in a). In specific settings, such as online algorithms, the ‘main’ function implements some more logic (as in b).

Pretrained LLM

The LLM is the creative core of FunSearch, in charge of coming up with improvements to the functions presented in the prompt and sending these for evaluation. We obtain our results with a pretrained model, that is, without any fine-tuning on our problems. We use Codey, an LLM built on top of the PaLM 2 model family 25 , which has been fine-tuned on a large corpus of code and is publicly accessible through its API 26 . Because FunSearch relies on sampling from an LLM extensively, an important performance-defining tradeoff is between the quality of the samples and the inference speed of the LLM. In practice, we have chosen to work with a fast-inference model (rather than a slower-inference, higher-quality one), and the results in the paper are obtained using a total number of samples on the order of \({10}^{6}\). Beyond this tradeoff, we have empirically observed that the results obtained in this paper are not too sensitive to the exact choice of LLM, as long as it has been trained on a large enough corpus of code. See Supplementary Information Appendix A for a comparison to StarCoder 6 , a state-of-the-art open-source LLM for code.

Programs generated by the LLM are evaluated and scored on a set of inputs. For example, in the cap set problem (‘Extremal combinatorics’ section) the inputs are the values of the dimensionality n that we are interested in, and in combinatorial optimization (‘Bin packing’ section), the inputs correspond to different bin packing instances. The scores across different inputs are then combined into an overall score of the program using an aggregation function, such as the mean. Programs that are incorrect (that did not execute within the imposed time and memory limits, or produced invalid outputs) are discarded, and the remaining scored programs are then sent to the programs database.
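A simplified sketch of this evaluation step follows; the function names are ours, the aggregation by the mean is one example choice, and the paper's actual evaluator additionally sandboxes the untrusted generated code.

```python
import multiprocessing
import statistics
from typing import Callable, Optional, Sequence


def run_with_limit(program: Callable, inp, timeout_s: float) -> Optional[float]:
    """Executes `program(inp)` in a separate process under a time limit.
    Returns None on timeout or crash (`program` must be picklable)."""
    with multiprocessing.Pool(processes=1) as pool:
        result = pool.apply_async(program, (inp,))
        try:
            return result.get(timeout=timeout_s)
        except Exception:  # includes multiprocessing.TimeoutError
            return None


def score_program(program: Callable, inputs: Sequence,
                  timeout_s: float = 60.0) -> Optional[float]:
    """Runs `program` on every input of interest and aggregates the scores
    with the mean; any failure marks the whole program as incorrect."""
    scores = []
    for inp in inputs:
        score = run_with_limit(program, inp, timeout_s)
        if score is None:
            return None  # discarded: timed out, crashed or invalid output
        scores.append(score)
    return statistics.mean(scores)
```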

Programs database

The programs database keeps a population of correct programs, which are then sampled to create prompts. Preserving and encouraging diversity of programs in the database is crucial to enable exploration and avoid being stuck in local optima. To encourage diversity, we adopt an islands model, also known as a multiple population and multiple-deme model 27 , 28 , which is a genetic algorithm approach. Several islands, or subpopulations, are created and evolved independently. To sample from the programs database, we first sample an island and then sample a program within that island, favouring higher-scoring and shorter programs (see Methods for the exact mechanism). Crucially, we let information flow between the islands by periodically discarding the programs in the worst half of the islands (corresponding to the ones whose best individuals have the lowest scores). We replace the programs in those islands with a new population, initialized by cloning one of the best individuals from the surviving islands.

New prompts are created by ‘best-shot prompting’ from the programs database, and are then fed to the LLM to generate a new program. We first sample k programs from a single island in the programs database, according to the procedure described above. Sampled programs are then sorted according to their score, and a version is assigned to each (‘v0’ for the lowest scoring program, ‘v1’ for the second lowest scoring and so on). These programs are then combined into a single prompt—with the version appended as a suffix to the function name; for example, in the case of Fig. 2a , this would be ‘priority_v0’, ‘priority_v1’, ...—and the header of the function we wish to generate (for example, ‘priority_vk’) is added to the end of the prompt. In practice, we set k = 2, as two functions lead to better results compared to just one, with diminishing returns beyond that. Constructing a prompt by combining several programs (as opposed to only one) enables the LLM to spot patterns across the different programs and generalize those. Related approaches to prompt building have been recently considered, for example ref. 16 , and were shown to perform well on different domains.
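In code, this renaming-and-concatenation step can be as simple as the sketch below; the signature ‘(el, n)’ and the docstring are assumptions modelled on Fig. 2a rather than the paper's exact prompt format.

```python
def build_prompt(sampled_programs: list[str], name: str = "priority") -> str:
    """Builds a best-shot prompt from k programs sorted by ascending score.
    Each element of `sampled_programs` is the full source of one function."""
    parts = []
    for version, source in enumerate(sampled_programs):
        # Naive rename: 'def priority(' becomes 'def priority_v0(', and so on.
        parts.append(source.replace(f"def {name}(", f"def {name}_v{version}("))
    k = len(sampled_programs)
    # Header of the function the LLM is asked to complete.
    parts.append(f"def {name}_v{k}(el, n):\n"
                 f'    """Improved version of {name}_v{k - 1}."""\n')
    return "\n\n".join(parts)
```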

Distributed approach

We implement FunSearch as a distributed system that has three types of workers—a programs database, samplers and evaluators—which communicate asynchronously. The programs database stores and serves programs, samplers generate new functions using the pretrained LLM and evaluators assess programs, as shown in Supplementary Fig. F.26 . In the example shown in Fig. 2a , the programs database stores priority functions, samplers generate new implementations of ‘priority’ and evaluators score the proposals by executing the ‘main’ function on user-specified inputs. Our distributed system offers several advantages. First, it naturally leverages parallelism across different tasks: for example, LLM sampling and evaluation are performed concurrently. Second, it enables scaling beyond a single sampler and evaluator, which would otherwise be a very limiting setup, considering that evaluation can take minutes for many problems of interest. Running evaluators in parallel considerably broadens the scope of this approach to such problems. The distributed setting enables the running of many evaluator nodes on inexpensive CPU hardware, whereas a few samplers run on machines with accelerators for fast LLM inference; this keeps the overall cost and energy usage of experiments low. In our experiments, we typically use 15 samplers and 150 CPU evaluators (which can be served on five CPU servers, each running 32 evaluators in parallel). See Supplementary Information Appendix A for more details. Also, because of the randomness of LLM sampling and the evolutionary procedure, for some problems we run several experiments to get the best reported results. See Methods and Supplementary Information Appendix A.3 for a full statistical analysis.
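The asynchronous pipeline can be sketched with a shared queue, as below. This is a single-machine caricature: the `llm` and `database` interfaces and `score_program_from_text` are hypothetical stand-ins, and the real system communicates over RPC and sandboxes all execution.

```python
import queue
import threading

candidate_programs: "queue.Queue[str]" = queue.Queue()


def sampler_loop(llm, database, samples_per_prompt: int = 4) -> None:
    """Runs on accelerator machines: turns prompts into candidate programs."""
    while True:
        prompt = database.build_prompt()  # best-shot prompt from one island
        for _ in range(samples_per_prompt):
            candidate_programs.put(llm.sample(prompt))


def evaluator_loop(database, inputs) -> None:
    """Runs on cheap CPU workers: scores candidates, keeps the correct ones."""
    while True:
        text = candidate_programs.get()
        score = score_program_from_text(text, inputs)  # compile + run with limits
        if score is not None:  # incorrect programs are simply discarded
            database.register(text, score)


# For example: one sampler thread and many evaluator threads.
# threading.Thread(target=sampler_loop, args=(llm, db), daemon=True).start()
# for _ in range(150):
#     threading.Thread(target=evaluator_loop, args=(db, inputs), daemon=True).start()
```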

We now describe some of the new discoveries made by FunSearch in two different fields: pure mathematics and applied computer science. Further discoveries on other problems (namely, the corners problem and Shannon capacity of cycle graphs) are presented in Supplementary Information Appendix  B . The full discovered programs are available in Supplementary Information Appendix  C .

Extremal combinatorics

We apply FunSearch to two related problems in extremal combinatorics, a branch of mathematics that studies the maximal (or minimal) possible sizes of sets satisfying certain properties.

The cap set problem 21 , once described by Terence Tao as ‘perhaps my favourite open question’ 29 , refers to the task of finding the largest possible set of vectors in \({{\mathbb{Z}}}_{3}^{n}\) (known as a cap set) such that no three vectors sum to zero. Geometrically, no three points of a cap set are in a line (see Fig. 3 for an example with n  = 2).

Figure 3: The circles are the elements of \({{\mathbb{Z}}}_{3}^{2}\), with the ones belonging to the cap set shown in blue. The possible lines in \({{\mathbb{Z}}}_{3}^{2}\) are also shown (with colours indicating lines that wrap around in arithmetic modulo 3). No three elements of the cap set are in a line.

The problem has drawn much interest for a variety of reasons. For one, it is an analogue of the classical number theory problem of finding large subsets of primes in which no three are in arithmetic progression. For another, it differs from many problems in combinatorics in that there is no consensus among mathematicians about what the right answer should be. Finally, the problem serves as a model for many other problems involving ‘three-way interactions’. For instance, progress towards improved upper bounds for the cap set problem 30 , 31 immediately led to a series of other combinatorial results, for example, on the Erdős–Rado sunflower problem 32 .

The exact size of the largest possible cap set in n dimensions is known only for n ≤ 6. A brute force approach is not practical as the search space quickly becomes enormous with growing n, for example, around \({3}^{1,600}\) for n = 8. Previous methods impose potentially suboptimal restrictions on the search space 33 , 34 . By contrast, we search the full space by means of an algorithm skeleton that uses a function ‘priority’: \({{\mathbb{Z}}}_{3}^{n}\to {\mathbb{R}}\). Intuitively, this function provides a priority with which each \(x\in {{\mathbb{Z}}}_{3}^{n}\) should be included in the cap set. Our algorithm starts with an empty set and iteratively adds the vector \(x\in {{\mathbb{Z}}}_{3}^{n}\) with the highest priority that does not violate the cap set constraint (Fig. 2a). Starting from a trivial constant function, we evolve the crucial ‘priority’ component of our approach to result in large cap sets.
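Note that verifying a claimed cap set is cheap, which is what makes the problem ‘easy to evaluate’; a brute-force checker (our own sketch, cubic in the set size) is:

```python
import itertools

import numpy as np


def is_cap_set(vectors: np.ndarray) -> bool:
    """Checks that no three distinct rows of `vectors` (an m x n array over
    Z_3) sum to the zero vector mod 3, that is, no three points are in a line."""
    for a, b, c in itertools.combinations(vectors, 3):
        if not np.any((a + b + c) % 3):
            return False
    return True
```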

Using this approach, we discovered cap sets of the sizes shown in Fig. 4a . Notably, in dimension n = 8, FunSearch found a larger cap set than what was previously known, thus illustrating the power of FunSearch to discover new constructions. This also shows the scalability of FunSearch to larger dimensions, in which the previously best-known construction relied on a complex combination of cap sets in lower dimensions 33 , 34 . By contrast, FunSearch discovered a larger cap set from scratch, without having to be explicitly taught any way of combining cap sets. Moreover, we do not just discover the set of 512 eight-dimensional vectors itself, but a program that generates it: we show this program in Fig. 4b . Through inspecting the code, we obtain a degree of understanding of what this set is: specifically, manual simplification of Fig. 4b provides the construction in Fig. 4c . Some properties of this construction are similar to the construction of the Hill cap 35 , 36 , which results in the optimal 112-cap in \({{\mathbb{Z}}}_{3}^{6}\) .

Figure 4: a, Size of the largest cap set in \({{\mathbb{Z}}}_{3}^{n}\) for different dimensions n. b, The function ‘priority’: \({{\mathbb{Z}}}_{3}^{n}\to {\mathbb{R}}\) discovered by FunSearch that results in a cap set of size 512 in n = 8 dimensions. One feature to note is that the priority is affected by whether the same entry appears in positions i and −i (−i denotes the ith position counting from the end). This motivates the notion of reflections, used in c. c, An explicit construction of this new 512-cap, which we were able to manually construct thanks to having discovered the cap set by searching in function space. See Supplementary Information Appendix E.2 for more details and for the relation to the Hill cap.

Admissible sets

Beyond finding the size of the largest cap set c n in dimension n , a fundamental problem in additive combinatorics 22 is determining the capacity \(C=\mathop{\sup }\limits_{n}\,{c}_{n}^{1/n}\) . The breakthrough result from ref. 31 established an upper bound of C  ≤ 2.756. In this work, we are interested in lower bounds on C . To this end, we use the framework of constant weight admissible sets (or admissible sets for short) 34 , 37 , which has established the current state-of-the-art.

Formally, admissible sets \({\mathcal{A}}(n,w)\) are collections of vectors in \({\{0,1,2\}}^{n}\) satisfying two properties: (1) each vector has the same number w of non-zero elements but a unique support (therefore \(|{\mathcal{A}}|\le \binom{n}{w}\)); (2) for any three distinct vectors there is a coordinate in which their three respective values are {0, 1, 2}, {0, 0, 1} or {0, 0, 2}. Informally, an admissible set describes how to combine cap sets in smaller dimensions into large cap sets in higher dimensions 34 . We denote the set of full-size admissible sets (with \(|{\mathcal{A}}|=\binom{n}{w}\)) as \({\mathcal{I}}(n,w)\). The current state-of-the-art 38 has relied on SAT solvers to construct large admissible sets.
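The definition translates directly into a checker; the following unoptimized sketch (ours, not the paper's) tests both properties for a candidate collection of vectors:

```python
import itertools


def is_admissible(vectors: list[tuple[int, ...]], w: int) -> bool:
    """Checks the two defining properties of an admissible set A(n, w)."""
    # (1) Every vector has exactly w non-zero entries and a unique support.
    supports = set()
    for v in vectors:
        support = tuple(i for i, x in enumerate(v) if x != 0)
        if len(support) != w or support in supports:
            return False
        supports.add(support)
    # (2) Every triple of distinct vectors has a coordinate whose three
    # values form one of the multisets {0, 1, 2}, {0, 0, 1} or {0, 0, 2}.
    allowed = {(0, 1, 2), (0, 0, 1), (0, 0, 2)}
    for a, b, c in itertools.combinations(vectors, 3):
        if not any(tuple(sorted(triple)) in allowed for triple in zip(a, b, c)):
            return False
    return True
```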

As before, we evolve a function ‘priority’ : \({\{0,1,2\}}^{n}\to {\mathbb{R}}\) , which is used to iteratively grow admissible sets. Starting from a trivial constant function, we discover one that provides us with an \({\mathcal{I}}(12,7)\) admissible set; the discovered program is shown in Fig. 5b . This discovery alone already improves the lower bound on the cap set capacity from 2.2180 (ref. 38 ) to 2.2184. Yet, interpreting the program found by FunSearch (Fig. 5b ) helps us significantly push the boundaries of what admissible sets we can construct. Specifically, we notice that the discovered ‘priority’ function treats the n coordinates in a highly symmetric way, and indeed it turns out that the admissible set it constructs is preserved under independent cyclic permutations of coordinates within four disjoint groups of coordinate triples. Hereinafter we call such admissible sets symmetric (see Supplementary Information Appendix  D for a formal definition).

Figure 5: a, Summary of lower bounds on the cap set capacity C. b, The ‘priority’ function \({\{0,1,2\}}^{n}\to {\mathbb{R}}\) discovered by FunSearch that results in an \({\mathcal{I}}(12,7)\) admissible set. The source code shows that when n = 12, the function treats the four triples of coordinates {0, 4, 8}, {1, 5, 9}, {2, 6, 10} and {3, 7, 11} together. We then checked that the admissible set is in fact symmetric under independent cyclic permutations of coordinates within each of these four triples. See Supplementary Information Appendices D and E.3 for more details.

We now use FunSearch to directly search for symmetric admissible sets. Note that this is a more restricted and also much smaller search space, which allows for significantly higher dimensions and weights than were previously possible. This led us to discover a full-size \({\mathcal{I}}(15,10)\) admissible set (indicating C ≥ 2.219486) and a partial admissible set in \({\mathcal{A}}(24,17)\) of size 237,984, which implies a new lower bound on the cap set capacity of 2.2202 (Fig. 5a ). Although this is the largest improvement to the lower bound in the last 20 years, we note it is still far from the upper bound, and we hope our results inspire future work on this problem.

Not only does FunSearch scale to much larger instances than traditional combinatorial solvers (Supplementary Information Appendix A.4 ), but it is also a unique feature of searching in function space that we were able to inspect the code discovered by FunSearch and infer a new insight into the problem, in the form of a new symmetry. The procedure we followed in this section is a concrete example of how LLM-based approaches can be used in mathematical sciences: FunSearch suggests a solution, which is examined by researchers, who may note features of interest. These features are used to refine the search, leading to better solutions. This process can be iterated, with both humans and the search method consistently in the loop.

Bin packing

Combinatorial optimization is a subfield of mathematics that plays an important role across a wide range of areas, from theoretical computer science to practical problems in logistics and scheduling. Whereas many combinatorial optimization problems are provably hard to solve for large instances, it is typically possible to achieve strong performance using heuristics to guide the search algorithm. The choice of a heuristic is crucial for obtaining strong performance, but designing a good heuristic is difficult in practice. In this section, we show that FunSearch can be used to discover effective heuristics for one of the central problems in combinatorial optimization: bin packing 39 .

The goal of bin packing is to pack a set of items of various sizes into the smallest number of fixed-sized bins. Bin packing finds applications in many areas, from cutting materials to scheduling jobs on compute clusters. We focus on the online setting in which we pack an item as soon as it is received (as opposed to the offline setting in which we have access to all items in advance). Solving online bin packing problems then requires designing a heuristic for deciding which bin to assign an incoming item to.

Heuristics for online bin packing are well studied and several variants exist with strong worst-case performance 40 , 41 , 42 , 43 , 44 , 45 . However, they often show poor performance in practice 39 . Instead, the most commonly used heuristics for bin packing are first fit and best fit. First fit places the incoming item in the first bin with enough available space, whereas best fit places the item in the bin with the least available space in which the item still fits. Here, we show that FunSearch discovers better heuristics than first fit and best fit on simulated data.

To achieve this, we define a heuristic as a program that takes as input an item and an array of bins (containing the remaining capacity of each bin) and returns a priority score for each bin. The ‘solve’ function picks the bin with the highest score according to the heuristic (Fig. 2b ). FunSearch is then used to evolve this heuristic, starting from best fit.
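A sketch of this setup in the spirit of Fig. 2b follows (a simplification of ours; the paper's exact code is in the Supplementary Information), with best fit as the initial heuristic:

```python
import numpy as np


def heuristic(item: float, bins: np.ndarray) -> np.ndarray:
    """Initial program, best fit: prefer the bin with the least space left
    after the item is placed. This is the function FunSearch evolves."""
    return -(bins - item)


def solve(items: list[float], capacity: float) -> list[int]:
    """Online packing skeleton: each item goes to the feasible bin with the
    highest heuristic score; a new bin is opened if no open bin fits it."""
    bins: list[float] = []  # remaining capacities of the open bins
    assignment: list[int] = []
    for item in items:
        feasible = [i for i, cap in enumerate(bins) if cap >= item]
        if not feasible:
            bins.append(capacity)
            feasible = [len(bins) - 1]
        scores = heuristic(item, np.array([bins[i] for i in feasible]))
        best = feasible[int(np.argmax(scores))]
        bins[best] -= item
        assignment.append(best)
    return assignment
```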

We first evaluate FunSearch on the well-known OR-Library bin packing benchmarks 23 , consisting of four datasets, OR1 to OR4, containing bin packing instances with an increasing number of items (see Supplementary Information Appendix E.4 for details). We evolve our heuristic on a training set of generated bin packing instances with the same number of items as those in OR1 and, after the evolutionary process is concluded, test it on the OR1 to OR4 datasets. We measure performance as the fraction of excess bins used over the \({L}_{2}\) lower bound 46 of the optimal offline packing solution (which is generally not achievable in the online setting).
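The metric itself is straightforward to compute; in the sketch below we substitute the simple volume-based \({L}_{1}\) bound for the tighter \({L}_{2}\) bound of ref. 46 used in the paper.

```python
import math


def l1_lower_bound(items: list[float], capacity: float) -> int:
    """Volume-based lower bound on the optimal number of bins; the paper
    uses the tighter L2 bound of Martello & Toth (ref. 46)."""
    return math.ceil(sum(items) / capacity)


def excess_fraction(bins_used: int, lower_bound: int) -> float:
    """Fraction of excess bins over a lower bound on the optimal packing."""
    return (bins_used - lower_bound) / lower_bound
```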

As can be seen in Table 1 , FunSearch outperforms both first fit and best fit across all datasets. Further, the learned heuristic generalizes: even though it has only seen instances of the same size as OR1 during training, it generalizes across problem sizes, performing even better on large instances and widening the gap to best fit. In addition to the OR benchmarks, we also use FunSearch to evolve heuristics on bin packing instances sampled from a Weibull distribution, as these closely follow many real-world scheduling problems 24 , 47 (see Supplementary Information Appendix  E.4 for details). As shown in Table 1 , the performance of FunSearch is very strong on this dataset, significantly outperforming first fit and best fit across instances, as well as scaling gracefully to large instances (being only 0.03% off the lower bound on the optimum for 100,000 items). In addition, FunSearch is robust and consistently outperforms these baselines as shown in the statistical analysis in the Supplementary Information Appendix  A.3 .

We observed that several heuristics discovered by FunSearch use the same general strategy for bin packing (see Fig. 6 for an example). Instead of packing items into bins with the least capacity (such as best fit), the FunSearch heuristics assign items to least capacity bins only if the fit is very tight after placing the item. Otherwise, the item is typically placed in another bin, which would leave more space after the item is placed. This strategy avoids leaving small gaps in bins that are unlikely to ever be filled (see Supplementary Information Appendix  E.5 for example visualizations of such packings).
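In the shape of the skeleton above, such a strategy might be written as follows. This is our illustration of the described behaviour, with an arbitrary tightness threshold; it is not the discovered program itself, which is shown in Fig. 6.

```python
import numpy as np


def heuristic(item: float, bins: np.ndarray) -> np.ndarray:
    """Illustrative tight-fit strategy: strongly reward near-perfect fits,
    and otherwise prefer bins that keep more usable space after packing."""
    space_left = bins - item  # non-negative: infeasible bins were filtered out
    scores = space_left.copy()                # default: leave more usable room
    scores[space_left < 0.05 * item] += 1e3   # override: the fit is very tight
    return scores
```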

Figure 6: This example illustrates frequently observed behaviour: instead of always packing items into the best-fit bin, the heuristic encourages packing the item only if the fit is tight. Comments in the code were manually added. See Supplementary Information Appendix C for more discovered heuristics.

As this example demonstrates, the benefits of FunSearch extend beyond theoretical and mathematical results to practical problems such as bin packing. Indeed, bin packing, and related combinatorial optimization problems, are ubiquitous and find applications across a range of industries. We are optimistic that FunSearch could be applied to several such use cases with potential for real-world impact.

The effectiveness of FunSearch in discovering new knowledge for hard problems might seem intriguing. We believe that the LLM used within FunSearch does not use much context about the problem; the LLM should instead be seen as a source of diverse (syntactically correct) programs with occasionally interesting ideas. When further constrained to operate on the crucial part of the algorithm with a program skeleton, the LLM provides suggestions that marginally improve over existing ones in the population, which ultimately results in discovering new knowledge on open problems when combined with the evolutionary algorithm.

Another crucial component of the effectiveness of FunSearch is that it operates in the space of programs: rather than directly searching for constructions (which is typically an enormous list of numbers), FunSearch searches for programs generating those constructions. Because most problems we care about are structured (highly non-random), we believe that solutions are described more concisely with a computer program, compared to other representations. For example, the trivial representation of the admissible set \({\mathcal{A}}(24,17)\) consists of more than 200,000 vectors, but the program generating this set consists of only a few lines of code. Because FunSearch implicitly encourages concise programs, it scales to much larger instances compared to traditional search approaches in structured problems. In a loose sense, FunSearch attempts to find solutions that have low Kolmogorov complexity 48 , 49 , 50 (which is the length of the shortest computer program that produces a given object as output), whereas traditional search procedures have a very different inductive bias. We believe that such Kolmogorov-compressed inductive bias is key to FunSearch scaling up to the large instances in our use cases.

In addition to scale, we have empirically observed that FunSearch outputs programs that tend to be interpretable: that is, they are clearly easier to read and understand compared to a list of numbers. For example, by scrutinizing FunSearch’s output for the admissible set problem, we found a new symmetry, which was subsequently used to improve the results even further. Despite the rarity of symmetric solutions, we observe that FunSearch preferred symmetric ones, as these are more parsimonious (that is, they require less information to specify), in addition to the natural bias of LLMs (trained on human-produced code) in outputting code with similar traits to human code. This is in contrast to traditional genetic programming that does not have this bias (and in addition requires hand-tuning the mutation operators 51 ).

We note that FunSearch, at present, works best for problems having the following characteristics: (1) availability of an efficient evaluator; (2) a ‘rich’ scoring feedback quantifying the improvements (as opposed to a binary signal) and (3) ability to provide a skeleton with an isolated part to be evolved. For example, the problem of generating proofs for theorems 52 , 53 , 54 falls outside this scope, because it is unclear how to provide a rich enough scoring signal. By contrast, for MAX-SAT, the number of satisfied clauses can be used as a scoring signal. In this paper, we have explicitly striven for simplicity and we are confident that FunSearch can be further extended to improve its performance and be applicable to more classes of problems. In addition, the rapid development of LLMs is likely to result in samples of far superior quality at a fraction of the cost, making FunSearch more effective at tackling a broad range of problems. As a result, we foresee that automatically tailored algorithms will soon become common practice and deployed in real-world applications.

Implementation details of FunSearch

Distributed system.

We implement FunSearch as a distributed system that has three types of workers: a programs database, samplers and evaluators. The programs database stores the initial user-provided program, as well as all programs received from the evaluators. The samplers are in charge of performing the LLM inference step; to do so they repeatedly query the programs database for prompts. To achieve higher sampling throughput, samplers generate several samples from each prompt. The samples from the LLM (that is, the generated programs) are sent to the evaluators, which score programs by executing them on inputs of interest and assessing the outputs using ‘evaluate’. Programs that are correct are sent to the programs database to be stored. Each of the three FunSearch components is provided as both Python code and pseudocode (Supplementary Information Appendix  F ).

Prompt building

When queried for a prompt, the programs database samples k programs to encourage the LLM to merge ideas from them (we typically set k  = 2; Supplementary Information  Appendix  E.1 ). Programs are sorted according to their score in increasing order, starting from version 0 (‘v0’). Using these k programs, the prompt is built as explained next.

For the sake of clarity, we use here the problem specification from Fig. 2a to precisely describe the prompting mechanism. The overall structure of the prompt mimics the structure of the program skeleton, with the following differences: (1) the ‘priority’ function is stripped out and replaced with the k  = 2 programs sampled, first ‘priority_v0’ and then ‘priority_v1’. (2) After that, a ‘priority_v2’ function with no body is appended: the LLM will be in charge of completing the body of that function. (3) All other functions that appear before ‘priority_v0’ are removed. See Extended Data Fig. 1 for an example of the structure of a prompt.

Evolutionary method and program selection

Another key feature of FunSearch is the method used for evolution of the population of programs from the programs database, as well as for program selection: that is, how the programs database samples programs when queried for a prompt. For this, we use the islands model, a parallel genetic algorithm 27 , 28 . Specifically, we split the population into m separate groups or islands. Each island is initialized with a copy of the user-provided initial program and is evolved separately. That is, whenever a prompt is required, we first uniformly sample an island and then sample k  = 2 programs from that island to build the prompt. The programs generated from the LLM on the basis of that prompt will later be stored in the same island. Every 4 h, we discard all the programs from the m /2 islands whose best instances have the lowest score. Each of these islands is then seeded with a single program, obtained by first choosing one of the surviving m /2 islands uniformly at random and then retrieving the highest-scoring program from that island (breaking ties in favour of older programs). The evolutionary process is then restarted from this state, in which the reset islands contain one high-performing program each (Extended Data Fig. 2 ).
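A sketch of the reset step (a simplification of ours that represents each island as a dict from program to score, ignoring the clustering described below and the tie-breaking in favour of older programs):

```python
import random


def reset_worst_islands(islands: list[dict], rng: random.Random) -> None:
    """Discards all programs from the worst half of the islands and reseeds
    each with a single best program cloned from a surviving island."""
    islands.sort(key=lambda island: max(island.values()))
    half = len(islands) // 2
    losers, survivors = islands[:half], islands[half:]
    for island in losers:
        donor = rng.choice(survivors)
        best_program = max(donor, key=donor.get)
        island.clear()
        island[best_program] = donor[best_program]
```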

This method has several advantages. First, drawing the analogy in which an island corresponds to an experiment, this approach effectively allows us to run several smaller experiments in parallel instead of a single large experiment. This is beneficial because single experiments can get stuck in local minima, in which most programs in the population are not easily mutated and combined into stronger programs. The multiple island approach allows us to bypass this and effectively kill off such experiments to make space for new ones starting from more promising programs. Second, promising experiments are run for longer, as the islands that survive a reset are the ones with higher scores.

Within each island, we further cluster programs according to their signature. We define the signature of a program as the tuple containing the program’s scores on each of the inputs (for example, the cap set size for each input n ). Programs with the same signature are clustered together. When sampling a program within an island, we first sample an island’s cluster and then a program within that cluster (Extended Data Fig. 3 ). This approach, which aims to preserve diversity 55 , 56 , is related to Lexicase 57 in that both approaches consider a set of test cases for scoring an individual, and it is related to fitness uniform optimization 58 , which also clusters individuals on the basis of their fitness value; however, we sample the clusters on the basis of their score instead of uniformly, as detailed next.

When sampling a cluster, we favour those with larger score values. Specifically, let \({s}_{i}\) denote the score of the \(i\)th cluster, defined as an aggregation (for example, mean) of all the scores in the signature that characterizes that cluster. The probability \({P}_{i}\) of choosing cluster \(i\) is

\({P}_{i}=\frac{\exp ({s}_{i}/{T}_{{\rm{cluster}}})}{{\sum }_{{i}^{{\prime}}}\exp ({s}_{{i}^{{\prime}}}/{T}_{{\rm{cluster}}})},\qquad {T}_{{\rm{cluster}}}={T}_{0}\left(1-\frac{n\,{\rm{mod}}\,N}{N}\right),\)

where \({T}_{{\rm{cluster}}}\) is the temperature parameter, \(n\) is the current number of programs in the island, and \({T}_{0}\) and \(N\) are hyperparameters (given in Supplementary Information Appendix E.1). This approach is sometimes referred to as the Boltzmann selection procedure 59 .

When sampling a program within a cluster, we favour shorter programs. In particular, let \({\ell }_{i}\) denote the negative length of the \(i\)th program within the chosen cluster (measured as the number of characters), and let \({\widetilde{\ell }}_{i}=\frac{{\ell }_{i}-{\min }_{{i}^{{\prime}}}{\ell }_{{i}^{{\prime}}}}{{\max }_{{i}^{{\prime}}}{\ell }_{{i}^{{\prime}}}+{10}^{-6}}\). We set the probability of each program proportional to \(\exp ({\widetilde{\ell }}_{i}/{T}_{{\rm{program}}})\), where \({T}_{{\rm{program}}}\) is a temperature hyperparameter.
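Both sampling steps are a softmax with a temperature. A sketch of ours follows, with the length normalization written in terms of positive character counts, which matches the intent of the formula above:

```python
import numpy as np


def sample_cluster(scores: np.ndarray, n: int, t0: float, big_n: int,
                   rng: np.random.Generator) -> int:
    """Boltzmann selection: `scores[i]` is the aggregated score of cluster i
    and `n` is the current number of programs in the island."""
    t_cluster = t0 * (1 - (n % big_n) / big_n)
    logits = scores / t_cluster
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    return int(rng.choice(len(scores), p=probs / probs.sum()))


def sample_program(lengths: np.ndarray, t_program: float,
                   rng: np.random.Generator) -> int:
    """Within the chosen cluster, favour shorter programs; `lengths[i]` is
    the character count of program i."""
    norm = (lengths - lengths.min()) / (lengths.max() + 1e-6)
    probs = np.exp(-norm / t_program)
    return int(rng.choice(len(lengths), p=probs / probs.sum()))
```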

Owing to randomness in LLM sampling and in the evolutionary procedure, repeating an experiment can lead to different results. For some problems (for example, cap set through the admissible set problem and online bin packing) every single run of FunSearch surpasses the baseline, with only some variation in the magnitude of the difference. For example, all experiments on admissible sets improve on the previous best capacity lower bound, with 60% of experiments on \({\mathcal{I}}(12,7)\) finding a full-size admissible set. For other problems, many independent repetitions of an experiment may be necessary to improve on previous best results. In particular, the case of cap set by direct construction in n  = 8 dimensions is particularly challenging, with only four out of 140 experiments discovering a cap set of size 512. See Supplementary Information Appendix  A.3 for more details.

Related work

The rise of powerful LLMs such as that in ref. 60 has been followed by systems in which an LLM core has been enveloped by a ‘programmatic scaffold’ 61 , and several LLM calls were connected in some way to accomplish larger and more intricate tasks beyond what would be possible using a single prompt and the raw LLM, possibly by using external tools or external memory streams 62 , 63 , 64 , 65 , 66 . LLMs have also been paired with evaluators; for example, refs. 20 , 67 fine-tuned an LLM on data that had been previously generated by the LLM itself (respectively on puzzle problems and solutions, and on justifications and/or explanations for answers to questions), and they used an evaluator to assess the correctness of this data, ensuring that the fine-tuning dataset contained only correct solutions and/or explanations. More related to our approach is the use of LLMs as mutation operators on code, and ref. 3 was the first study to show that coupling an LLM with a programmatic way of scoring a solution can lead to a self-improvement loop. In refs. 16 , 17 , 18 , 19 , the LLM was used as a crossover operator rather than a mutation one, that is, the LLM prompts are composed of several functions, similarly to FunSearch. In refs. 3 , 16 , the task was to improve code that generated bidimensional virtual robots that could move as far as possible in a given simulated terrain (ref. 16 also considered the tasks of symbolic regression, natural language sentences and image generation). In refs. 17 , 18 , 19 the task was to find neural network architectures (described with Python code), and in ref. 68 the task was continuous exploration in the game of Minecraft. By contrast, in this paper, we tackle open problems in mathematics and algorithm design, and we surpass human-designed constructions. We achieve that by combining several ingredients: a distributed system with many samplers and evaluators that communicate asynchronously, a user-provided program specification and skeleton, as well as an evolutionary mechanism based on islands that preserves the diversity of programs. FunSearch achieves that using an off-the-shelf LLM without fine-tuning.

More broadly, program synthesis is one of the main applications of LLMs 4 , 5 , 6 , 7 , 8 . There are many use cases being explored, such as automatically editing code to improve performance 13 , automatically debugging code 9 , 10 , generating code from natural language descriptions 69 , 70 , 71 and doing so to solve problems in code competitions 11 , 12 . Unlike the above approaches that provide tools to increase the productivity of software engineers, in this paper we combine the creativity of LLMs with the power of evolutionary procedures to push the boundaries of human knowledge through solving hard open problems. Another line of research uses LLMs to guide the search for formal proofs for automatic theorem proving 52 , 53 , 54 . Although this approach has the potential to eventually find new knowledge, the achievements of these methods still lag behind the frontier of human knowledge.

Genetic programming

Genetic programming is a subfield of computer science concerned with automatically generating or discovering computer programs using evolutionary methods 15 , 72 , 73 and is used for symbolic regression applications 74 , 75 and discovery of optimization algorithms 76 among others. In this broad sense, combining LLMs with evolution can be seen as an instance of genetic programming with the LLM acting as a mutation and crossover operator. However, using an LLM mitigates several issues in traditional genetic programming 51 , as shown in Supplementary Information Appendix  A and discussed in ref. 3 . Indeed, genetic programming methods require defining several parameters, chief among them the set of allowed mutation operations (or primitives) 15 . Designing such a set of operations is non-trivial and problem specific, requiring domain knowledge about the problem at hand or its plausible solution 51 . Although research has been done to mitigate this limitation, through, for example, the reuse of subprograms 77 or modelling the distribution of high-performing programs 78 , designing effective and general code mutation operators remains difficult. By contrast, LLMs have been trained on vast amounts of code and as such have learned about common patterns and routines from human-designed code. The LLM can leverage this, as well as the context given in the prompt, to generate more effective suggestions than the random ones typically used in genetic programming.

Related to genetic programming, the field of hyper-heuristics 79 , 80 seeks to design learning methods for generating heuristics applied to combinatorial optimization problems. In practice, these heuristics are often programs discovered through genetic programming, typically by evolving a heuristic on a set of instances of a given combinatorial optimization problem, such as bin packing 81 . Indeed, like FunSearch, hyper-heuristics have also been applied to online bin packing, with the learned heuristics able to match the performance of first fit 82 and best fit 83 on a set of generated bin packing instances. Augmenting the heuristics with memory of previously seen items can even lead to heuristics outperforming best fit 84 . In addition, these evolved heuristics can sometimes generalize to larger instances than the ones they were trained on 85 , similar to the learned FunSearch heuristics. However, as is the case with genetic programming, one of the fundamental limitations of hyper-heuristics is that the components of the evolved heuristic must be manually defined by the user and often need to be tailored to a specific problem to be effective. The LLM in FunSearch allows us to bypass this limitation and learn heuristics for bin packing and job scheduling as well as discovering new mathematical constructions, all within a single pipeline without problem-specific tuning.

Program superoptimization and software engineering

Searching for the best way of modifying source code is a task that appears in several branches of computer science and software development. These occurrences can be broadly classified into two groups: first, in which the goal is to find semantic-preserving modifications (this arises in program optimization and superoptimization, in which the aim is to modify the program so that it executes faster while maintaining its input–output behaviour), and second, in which the goal is to find programs with different semantics (this arises, for example, in automatic program repair and mutation testing). With some exceptions discussed below, most of these areas use relatively simple and hard-coded mutation operators on either the source code directly (such as deleting or swapping lines) or on the abstract syntax tree.

Machine learning approaches have been used for program superoptimization. For example, ref. 86 used reinforcement learning to learn the sampling probabilities used within a hierarchical probabilistic model of simple program edits introduced by STOKE 87 . Neural networks have also been proposed as a mutation operator for program optimization in ref. 88 . These studies operated on code written in Assembly (perhaps because designing meaningful and rich edit distributions on programs in higher-level languages is challenging). More recently, ref. 13 used LLMs to find performance-improving edits to code written in C++ or Python. We also note that reinforcement learning has recently been applied to discover new faster algorithms for fundamental operations such as matrix multiplication 89 and sorting 90 .

In this paper, we have not explicitly explored semantic-preserving applications such as discovering performance-improving code edits, but we believe that FunSearch could be an effective method for that setting too. In both use cases presented in the main text, the goal is to evolve programs with new semantics, but the application is different from program repair or mutation testing: in the ‘Extremal combinatorics’ section, we used FunSearch to discover a program that constructs a previously unknown mathematical object, and in the ‘Bin packing’ section, we used FunSearch to discover a program that corresponds to a more efficient heuristic for online bin packing.

Data availability

The experiments carried out in this paper do not require any data corpus other than the publicly available OR-Library bin packing benchmarks 23 . The output functions of interest produced by FunSearch are shown across the main paper and in text files in the Supplementary Information .

Code availability

The discovered functions as well as the evolutionary algorithm, code manipulation routines and a single-threaded implementation of the FunSearch pipeline are available as Python code in the Supplementary Information and at https://github.com/google-deepmind/funsearch . Furthermore, the software library launchpad 91 and a sandbox for safely executing generated code on our internal distributed system were used. No training or fine-tuning of an LLM is required; API access for inference is sufficient. We used Codey 26 , which is available through its API, and StarCoder 6 , which is open source.

Bang, Y. et al. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. Preprint at https://arxiv.org/abs/2302.04023 (2023).

Borji, A. A categorical archive of ChatGPT failures. Preprint at https://arxiv.org/abs/2302.03494 (2023).

Lehman, J. et al. in Handbook of Evolutionary Machine Learning (eds Banzhaf, W. et al.) 331–366 (Springer, 2023).

Chen, M. et al. Evaluating large language models trained on code. Preprint at https://arxiv.org/abs/2107.03374 (2021).

Austin, J. et al. Program synthesis with large language models. Preprint at https://arxiv.org/abs/2108.07732 (2021).

Li, R. et al. StarCoder: may the source be with you! Preprint at https://arxiv.org/abs/2305.06161 (2023).

Fried, D. et al. Incoder: a generative model for code infilling and synthesis. In Proc. International Conference on Learning Representations (2022).

Nijkamp, E. et al. CodeGen: an open large language model for code with multi-turn program synthesis. In Proc. International Conference on Learning Representations (2022).

Chen, X., Lin, M., Schärli, N. & Zhou, D. Teaching large language models to self-debug. Preprint at https://arxiv.org/abs/2304.05128 (2023).

Liventsev, V., Grishina, A., Härmä, A. & Moonen, L. Fully autonomous programming with large language models. Preprint at https://arxiv.org/abs/2304.10423 (2023).

Li, Y. et al. Competition-level code generation with AlphaCode. Science 378 , 1092–1097 (2022).


Zelikman, E., Huang, Q., Poesia, G., Goodman, N. D. & Haber, N. Parsel: a (de-) compositional framework for algorithmic reasoning with language models. Preprint at https://arxiv.org/abs/2212.10561 (2023).

Madaan, A. et al. Learning performance-improving code edits. Preprint at https://arxiv.org/abs/2302.07867 (2023).

Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989).

Koza, J. R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4 , 87–112 (1994).


Meyerson, E. et al. Language model crossover: variation through few-shot prompting. Preprint at https://arxiv.org/abs/2302.12170 (2023).

Chen, A., Dohan, D. M. & So, D. R. EvoPrompting: language models for code-level neural architecture search. Preprint at https://arxiv.org/abs/2302.14838 (2023).

Zheng, M. et al. Can GPT-4 perform neural architecture search? Preprint at https://arxiv.org/abs/2304.10970 (2023).

Nasir, M. U., Earle, S., Togelius, J., James, S. & Cleghorn, C. LLMatic: neural architecture search via large language models and quality-diversity optimization. Preprint at https://arxiv.org/abs/2306.01102 (2023).

Haluptzok, P., Bowers, M. & Kalai, A. T. Language models can teach themselves to program better. In International Conference on Learning Representations (2023).

Grochow, J. New applications of the polynomial method: the cap set conjecture and beyond. Bull. Am. Math. Soc. 56 , 29–64 (2019).


Tao, T. & Vu, V. H. Additive Combinatorics Vol. 105 (Cambridge Univ. Press, 2006).

Beasley, J. E. OR-library: distributing test problems by electronic mail. J. Oper. Res. Soc. 41 , 1069–1072 (1990).

Castiñeiras, I., De Cauwer, M. & O’Sullivan, B. Weibull-based benchmarks for bin packing. In Proc. International Conference on Principles and Practice of Constraint Programming 207–222 (Springer, 2012).

Anil, R. et al. PaLM 2 technical report. Preprint at https://arxiv.org/abs/2305.10403 (2023).

Code models overview. Vertex AI, Google Cloud https://cloud.google.com/vertex-ai/docs/generative-ai/code/code-models-overview (2023).

Tanese, R. Distributed Genetic Algorithms for Function Optimization. PhD thesis, Univ. Michigan (1989).

Cantú-Paz, E. A survey of parallel genetic algorithms. Calculateurs Paralleles, Reseaux et Systemes Repartis 10 , 141–171 (1998).


Tao, T. Open question: best bounds for cap sets. WordPress Blog https://terrytao.wordpress.com/2007/02/23/open-question-best-bounds-for-cap-sets/ (2009).

Croot, E., Lev, V. F. & Pach, P. P. Progression-free sets in \({{\mathbb{Z}}}_{4}^{n}\) are exponentially small. Ann. Math. 185 , 331–337 (2017).

Ellenberg, J. S. & Gijswijt, D. On large subsets of \({F}_{q}^{n}\) with no three-term arithmetic progression. Ann. Math. 185 , 339–343 (2017).

Naslund, E. & Sawin, W. Upper bounds for sunflower-free sets. Forum Math. Sigma 5 , e15 (2017).

Edel, Y. & Bierbrauer, J. Large caps in small spaces. Des. Codes Cryptogr. 23 , 197–212 (2001).


Edel, Y. Extensions of generalized product caps. Des. Codes Cryptogr. 31 , 5–14 (2004).

Hill, R. On the largest size of cap in S_{5,3}. Rend. Lincei Sci. Fis. Mat. Nat. 54, 378–384 (1973).


Cameron, P. J. & Van Lint, J. H. Designs, Graphs, Codes and Their Links Vol. 3 (Cambridge Univ. Press, 1991).

Calderbank, A. R. & Fishburn, P. C. Maximal three-independent subsets of {0, 1, 2}^n. Des. Codes Cryptogr. 4, 203–211 (1994).

Tyrrell, F. New lower bounds for cap sets. Discrete Analysis https://doi.org/10.19086/da.91076 (2023).

Coffman, E. G., Garey, M. R. & Johnson, D. S. in Algorithm Design for Computer System Design (eds Ausiello, G. et al.) 49–106 (Springer, 1984).

Lee, C. C. & Lee, D. T. A simple on-line bin-packing algorithm. J. ACM 32 , 562–572 (1985).

Ramanan, P., Brown, D. J., Lee, C.-C. & Lee, D.-T. On-line bin packing in linear time. J. Algorithm. 10 , 305–326 (1989).

Seiden, S. S. On the online bin packing problem. J. ACM 49 , 640–671 (2002).

Balogh, J., Békési, J., Dósa, G., Sgall, J. & Stee, R. V. The optimal absolute ratio for online bin packing. In Proc. Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (ed. Chekuri, C.) 1425–1438 (SIAM, 2014).

Balogh, J., Békési, J., Dósa, G., Epstein, L. & Levin, A. A new and improved algorithm for online bin packing. In Proc. 26th Annual European Symposium on Algorithms (ESA 2018) 5:1–5:14 (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018).

Coffman, E. G., Csirik, J., Galambos, G., Martello, S. & Vigo, D. in Handbook of Combinatorial Optimization (eds Pardalos, P. M. et al.) 455–531 (Springer, 2013).

Martello, S. & Toth, P. Lower bounds and reduction procedures for the bin packing problem. Discrete Appl. Math. 28 , 59–70 (1990).

Angelopoulos, S., Kamali, S. & Shadkami, K. Online bin packing with predictions. J. Artif. Intell. Res. 36 , 4574–4580 (2022).

Chaitin, G. J. On the length of programs for computing finite binary sequences. J. ACM 13 , 547–569 (1966).

Li, M. et al. An Introduction to Kolmogorov Complexity and its Applications Vol. 3 (Springer, 2008).

Solomonoff, R. J. A formal theory of inductive inference. Part I. Inf. Control 7 , 1–22 (1964).

O’Neill, M., Vanneschi, L., Gustafson, S. & Banzhaf, W. Open issues in genetic programming. Genet. Program. Evolvable Mach. 11 , 339–363 (2010).

Polu, S. & Sutskever, I. Generative language modeling for automated theorem proving. Preprint at https://arxiv.org/abs/2009.03393 (2020).

Polu, S. et al. Formal mathematics statement curriculum learning. In International Conference on Learning Representations (2023).

Jiang, A. Q. et al. THOR: wielding hammers to integrate language models and automated theorem provers. Adv. Neural Info. Process. Syst. 35 , 8360–8373 (2022).

Mouret, J.-B. & Doncieux, S. Overcoming the bootstrap problem in evolutionary robotics using behavioral diversity. In Proc. 2009 IEEE Congress on Evolutionary Computation 1161–1168 (IEEE, 2009).

Pugh, J. K., Soros, L. B. & Stanley, K. O. Quality diversity: a new frontier for evolutionary computation. Front. Robotics AI 3 , 40 (2016).

Helmuth, T., Spector, L. & Matheson, J. Solving uncompromising problems with lexicase selection. IEEE Trans. Evol. Comput. 19 , 630–643 (2015).

Hutter, M. & Legg, S. Fitness uniform optimization. IEEE Trans. Evol. Comput. 10 , 568–589 (2006).

de la Maza, M. An analysis of selection procedures with particular attention paid to proportional and Boltzmann selection. In Proc. Fifth International Conference on Genetic Algorithms (Morgan Kaufmann, 1993).

OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Millidge, B. Scaffolded LLMs as natural language computers. Beren’s Blog https://www.beren.io/2023-04-11-Scaffolded-LLMs-natural-language-computers (2023).

Schick, T. et al. Toolformer: language models can teach themselves to use tools. Preprint at https://arxiv.org/abs/2302.04761 (2023).

Park, J. S. et al. Generative agents: interactive simulacra of human behavior. In Proc. 36th Annual ACM Symposium on User Interface Software and Technology 1–22 (ACM, 2023).

Wu, J. et al. Recursively summarizing books with human feedback. Preprint at https://arxiv.org/abs/2109.10862 (2021).

Nye, M. et al. Show your work: scratchpads for intermediate computation with language models. In Deep Learning for Code Workshop, International Conference on Learning Representations (2022).

Yao, S. et al. ReAct: synergizing reasoning and acting in language models. In Proc. International Conference on Learning Representations (2023).

Zelikman, E., Wu, Y., Mu, J. & Goodman, N. STaR: bootstrapping reasoning with reasoning. Adv. Neural Info. Process. Syst. 35, 15476–15488 (2022).

Wang, G. et al. Voyager: an open-ended embodied agent with large language models. Preprint at https://arxiv.org/abs/2305.16291 (2023).

Yin, P. et al. Natural language to code generation in interactive data science notebooks. Preprint at https://arxiv.org/abs/2212.09248 (2022).

Ni, A. et al. Lever: learning to verify language-to-code generation with execution. In Proc. International Conference on Machine Learning 26106–26128 (PMLR, 2023).

Zhou, S., Alon, U., Xu, F. F., Jiang, Z. & Neubig, G. Docprompting: generating code by retrieving the docs. In Proc. International Conference on Learning Representations (2022).

Banzhaf, W., Nordin, P., Keller, R. E. & Francone, F. D. Genetic Programming: An Introduction: On The Automatic Evolution of Computer Programs and its Applications (Morgan Kaufmann, 1998).

Langdon, W. B. & Poli, R. Foundations of Genetic Programming (Springer Science & Business Media, 2013).

Ma, H., Narayanaswamy, A., Riley, P. & Li, L. Evolving symbolic density functionals. Sci. Adv. 8 , eabq0279 (2022).


Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324 , 81–85 (2009).

Chen, X. et al. Symbolic discovery of optimization algorithms. Preprint at https://arxiv.org/abs/2302.06675 (2023).

Koza, J. R. Genetic Programming II: Automatic Discovery of Reusable Programs (MIT, 1994).

Salustowicz, R. & Schmidhuber, J. Probabilistic incremental program evolution. Evol. Comput. 5 , 123–141 (1997).


Burke, E. et al. in Handbook of Metaheuristics (eds Glover, F. & Kochenberger, G. A.) 457–474 (Springer, 2003).

Ross, P. in Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques (eds Burke, E. K. & Kendall, G.) 529–556 (Springer, 2005).

Burke, E. K. et al. Hyper-heuristics: a survey of the state of the art. J. Oper. Res. Soc. 64 , 1695–1724 (2013).

Burke, E. K., Hyde, M. R. & Kendall, G. Evolving bin packing heuristics with genetic programming. In Proc. International Conference on Parallel Problem Solving from Nature 860–869 (Springer, 2006).

Burke, E. K., Hyde, M. R., Kendall, G. & Woodward, J. Automatic heuristic generation with genetic programming: evolving a jack-of-all-trades or a master of one. In Proc. 9th Annual Conference on Genetic and Evolutionary Computation 1559–1565 (ACM, 2007).

Burke, E. K., Hyde, M. R. & Kendall, G. Providing a memory mechanism to enhance the evolutionary design of heuristics. In Proc. IEEE Congress on Evolutionary Computation 1–8 (IEEE, 2010).

Burke, E. K., Hyde, M., Kendall, G. & Woodward, J. R. The scalability of evolved on line bin packing heuristics. In Proc. 2007 IEEE Congress on Evolutionary Computation 2530–2537 (IEEE, 2007).

Bunel, R., Desmaison, A., Kohli, P., Torr, P. H. & Kumar, M. P. Learning to superoptimize programs. In Proc. International Conference on Learning Representations (2017).

Schkufza, E., Sharma, R. & Aiken, A. Stochastic superoptimization. ACM SIGARCH Comp. Archit. News 41 , 305–316 (2013).

Shypula, A. et al. Learning to superoptimize real-world programs. In Proc. Deep Learning for Code Workshop (ICLR 2022 Workshop) (2022).

Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610 , 47–53 (2022).


Mankowitz, D. J. et al. Faster sorting algorithms discovered using deep reinforcement learning. Nature 618 , 257–263 (2023).

Yang, F. et al. Launchpad: a programming model for distributed machine learning research. Preprint at https://arxiv.org/abs/2106.04516 (2021).


Acknowledgements

We thank R. Anil, V. Feinberg, E. Taropa, T. Hubert, J. Schrittwieser and S. Nowozin for their LLM support; T. Schaul, C. Fernando, A. Barreto and P. Gupta for discussions on evolutionary algorithms; M. Figurnov and T. Cemgil for reviewing the paper; F. Piccinini and S. Kenjeyev for their support on job scheduling; S. Blackwell for technical support; O. Ronneberger, F. Gimeno, B. Huergo, A. Mehrabian and A. Anand for useful advice; and G. Holland for program management support.

Author information

These authors contributed equally: Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Alhussein Fawzi

Authors and Affiliations

Google DeepMind, London, UK

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Pengming Wang, Pushmeet Kohli & Alhussein Fawzi

Department of Mathematics, University of Wisconsin-Madison, Madison, WI, USA

Jordan S. Ellenberg

Laboratoire de l’Informatique du Parallélisme, University of Lyon (Inria, ENS Lyon, UCBL, LIP), Lyon, France

Omar Fawzi

Contributions

B.R.-P. conceived the project with help from A.F. and P.K. A.F. scoped problems and developed project vision. B.R.-P. and A.N. developed the initial FunSearch codebase. A.N., B.R.-P., M. Balog, F.J.R.R., M. Barekatain, E.D. and A.F. implemented and refined the different components of the system. M. Barekatain and A.N. imported and experimented with LLMs. M. Barekatain, A.N. and M. Balog worked on evaluating, debugging and improving the efficiency of experiments. M. Balog, M. Barekatain, B.R.-P., A.N., A.F., O.F. and J.S.E. contributed to the cap set problem. M.P.K., M. Balog and J.S.E. researched and analysed results from the admissible sets problem. E.D., M. Barekatain and P.W. contributed to the online bin packing problem. F.J.R.R. and O.F. researched and did experiments on other problems (Shannon capacity and corners problems). P.K. contributed technical advice and ideas. A.F., B.R.-P., E.D., F.J.R.R., M.P.K., M. Balog, A.N., J.S.E. and M. Barekatain wrote the paper.

Corresponding authors

Correspondence to Bernardino Romera-Paredes , Pushmeet Kohli or Alhussein Fawzi .

Ethics declarations

Competing interests.

The authors of the paper are planning to file a patent application relating to subject matter contained in this paper in the name of Google DeepMind.

Peer review

Peer review information.

Nature thanks Josh Grochow, Andrea Lodi, Jean-Baptiste Mouret, Talia Ringer and Tao Yu for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Example of best-shot prompting, based on the skeleton from Fig. 2a.

The prompt includes k = 2 implementations sampled from the programs database, with higher-scoring implementations being more likely to be included.
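The sampling step lends itself to a short sketch. The Python fragment below is a minimal illustration rather than the paper's implementation: the Program container, the softmax weighting and the prompt format are all assumptions made for the example.

import math
import random
from dataclasses import dataclass

@dataclass
class Program:  # hypothetical container for one stored program
    code: str
    score: float

def build_prompt(database, k=2, temperature=1.0):
    """Sample k programs, with higher-scoring ones more likely to be picked,
    then concatenate them so the best version appears last in the prompt."""
    weights = [math.exp(p.score / temperature) for p in database]
    chosen = random.choices(database, weights=weights, k=k)
    # Order by score so the best version sits closest to where the LLM continues.
    chosen.sort(key=lambda p: p.score)
    return "\n\n".join(f"# Version {i}, score = {p.score}\n{p.code}"
                       for i, p in enumerate(chosen))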

Extended Data Fig. 2 Evolutionary method.

The initial programs are separated into islands and each of them is evolved separately. After a number of iterations, the islands with the worst scores are wiped and the best programs from the islands with the best scores are placed in the empty islands. Evolution then proceeds separately again until the next reset. This process is repeated until termination.
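A minimal sketch of this reset step, assuming a simple Island container and a fifty-fifty split between wiped and surviving islands (both assumptions made for illustration; programs are objects with .score attributes):

import copy
from dataclasses import dataclass, field

@dataclass
class Island:  # hypothetical container for one island
    programs: list = field(default_factory=list)

    def best(self):
        return max(self.programs, key=lambda p: p.score)

def reset_worst_islands(islands, fraction=0.5):
    """Wipe the worst-scoring islands and reseed each with a copy of the
    best program from one of the surviving islands."""
    islands.sort(key=lambda isl: isl.best().score)
    n_reset = int(len(islands) * fraction)
    survivors = islands[n_reset:]
    for i, island in enumerate(islands[:n_reset]):
        seed = survivors[i % len(survivors)].best()
        island.programs = [copy.deepcopy(seed)]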

Extended Data Fig. 3 Program clusters within islands.

Within each island, programs are grouped into clusters based on their signature (i.e., their scores on several inputs). We first sample clusters, favoring the ones with higher score. Within the chosen clusters, we sample a program, favoring shorter programs. The sampled programs are used to prompt the LLM which generates a new program. If the new program is correct, it is added to the island, either in an existing cluster or a new one if its signature was not yet present.
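The two-stage sampling can be sketched as follows; the exact temperatures and length normalisation used by FunSearch are not reproduced here, only the favour-high-scores-then-favour-short-programs idea:

import math
import random

def sample_program(clusters, temperature=1.0):
    """clusters: dict mapping a signature (tuple of per-input scores) to a
    list of programs with .code and .score attributes. First pick a cluster,
    favoring high scores; then pick a program, favoring short code."""
    groups = list(clusters.values())
    cluster_w = [math.exp(max(p.score for p in g) / temperature) for g in groups]
    group = random.choices(groups, weights=cluster_w, k=1)[0]
    max_len = max(len(p.code) for p in group)
    length_w = [math.exp(-len(p.code) / (max_len + 1)) for p in group]
    return random.choices(group, weights=length_w, k=1)[0]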

Supplementary information

Supplementary information.

Further details about the method and extra results.

Supplementary Data 1

This zipped code file contains: (a) the evolutionary algorithm, code manipulation routines and a single-threaded implementation of the FunSearch pipeline; and (b) output functions of interest produced by FunSearch.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Romera-Paredes, B., Barekatain, M., Novikov, A. et al. Mathematical discoveries from program search with large language models. Nature 625 , 468–475 (2024). https://doi.org/10.1038/s41586-023-06924-6

Download citation

Received : 12 August 2023

Accepted : 30 November 2023

Published : 14 December 2023

Issue Date : 18 January 2024

DOI : https://doi.org/10.1038/s41586-023-06924-6


This article is cited by

Large language models help computer programs to evolve.

  • Jean-Baptiste Mouret

Nature (2024)

Automated discovery of algorithms from data

  • Paul J. Blazek
  • Kesavan Venkatesh
  • Milo M. Lin

Nature Computational Science (2024)

Automated quantum software engineering

  • Aritra Sarkar

Automated Software Engineering (2024)




Exploring the beauty of pure mathematics in novel ways

Alex Davies, Pushmeet Kohli, Demis Hassabis



More than a century ago, Srinivasa Ramanujan shocked the mathematical world with his extraordinary ability to see remarkable patterns in numbers that no one else could see. The self-taught mathematician from India described his insights as deeply intuitive and spiritual, and patterns often came to him in vivid dreams. These observations captured the tremendous beauty and sheer possibility of the abstract world of pure mathematics. In recent years, we have begun to see AI make breakthroughs in areas involving deep human intuition, and more recently on some of the hardest problems across the sciences, yet until now, the latest AI techniques have not assisted in significant results in pure maths research.

As part of DeepMind's mission to solve intelligence, we explored the potential of machine learning (ML) to recognize mathematical structures and patterns, and help guide mathematicians toward discoveries they may otherwise never have found — demonstrating for the first time that AI can help at the forefront of pure mathematics.

Our research paper, published today in the journal Nature, details our collaboration with top mathematicians to apply AI toward discovering new insights in two areas of pure mathematics: topology and representation theory. With Professor Geordie Williamson at the University of Sydney, we discovered a new formula for a conjecture about permutations that has remained unsolved for decades. With Professor Marc Lackenby and Professor András Juhász at the University of Oxford, we have discovered an unexpected connection between different areas of mathematics by studying the structure of knots. These are the first significant mathematical discoveries made with machine learning, according to the top mathematicians who reviewed the work. We’re also releasing full companion papers on arXiv for each result that will be submitted to appropriate mathematical journals (permutations paper; knots paper). Through these examples, we propose a model for how these tools could be used by other mathematicians to achieve new results.

An animated knot made with one strand of unbroken string.

A knot is one of the fundamental objects in low-dimensional topology. It is a twisted loop embedded in 3-dimensional space.

A list of letters: A, B, C, D, E. These letters are rearranged with permutation 32415, resulting in a second list of letters: D, B, A, C, E.

A permutation is a re-arrangement of an ordered list of objects. The permutation “32415” puts the 1st element in the 3rd location, the 2nd element in the 2nd location and so on.

The two fundamental objects we investigated were knots and permutations.
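For readers who want to check the caption's arithmetic, here is a small Python snippet applying the permutation "32415" in exactly that convention (the helper name is invented for illustration):

def permute(items, sigma):
    """Apply a permutation in one-line notation: sigma[i] is the
    (1-indexed) destination of the i-th element."""
    out = [None] * len(items)
    for i, dest in enumerate(sigma):
        out[dest - 1] = items[i]
    return out

print(permute(["A", "B", "C", "D", "E"], [3, 2, 4, 1, 5]))
# -> ['D', 'B', 'A', 'C', 'E']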

For many years, computers have been used by mathematicians to generate data to help in the search for patterns. Known as experimental mathematics, this kind of research has resulted in well-known conjectures, such as the Birch and Swinnerton-Dyer conjecture — one of the six unsolved Millennium Prize Problems, the most well-known open problems in mathematics (with a US$1 million prize attached to each). While this approach has been successful and is fairly common, the identification and discovery of patterns from this data has still relied mainly on mathematicians.

Finding patterns has become even more important in pure maths because it’s now possible to generate more data than any mathematician can reasonably expect to study in a lifetime. Some objects of interest — such as those with thousands of dimensions — can also simply be too unfathomable to reason about directly. With these constraints in mind, we believed that AI would be capable of augmenting mathematicians’ insights in entirely new ways.

It feels like Galileo picking up a telescope and being able to gaze deep into the universe of data and see things never detected before.

Marcus Du Sautoy, Simonyi Professor for the Public Understanding of Science and Professor of Mathematics, University of Oxford

Our results suggest that ML can complement maths research to guide intuition about a problem by detecting the existence of hypothesised patterns with supervised learning and giving insight into these patterns with attribution techniques from machine learning:

With Professor Williamson, we used AI to help discover a new approach to a long-standing conjecture in representation theory. Defying progress for nearly 40 years, the combinatorial invariance conjecture states that a relationship should exist between certain directed graphs and polynomials. Using ML techniques, we were able to gain confidence that such a relationship does indeed exist and to identify that it might be related to structures known as broken dihedral intervals and extremal reflections. With this knowledge, Professor Williamson was able to conjecture a surprising and beautiful algorithm that would solve the combinatorial invariance conjecture. We have computationally verified the new algorithm across more than 3 million examples.

With Professor Lackenby and Professor Juhász, we explored knots, one of the fundamental objects of study in topology. Knots not only tell us about the many ways a rope can be tangled but also have surprising connections with quantum field theory and non-Euclidean geometry. Algebra, geometry, and quantum theory all share unique perspectives on these objects, and a long-standing mystery is how these different branches relate: for example, what does the geometry of the knot tell us about the algebra? We trained an ML model to discover such a pattern, and surprisingly this revealed that a particular algebraic quantity — the signature — was directly related to the geometry of the knot, which was not previously known or suggested by existing theory. By using attribution techniques from machine learning, we guided Professor Lackenby to discover a new quantity, which we call the natural slope, that hints at an important aspect of structure overlooked until now. Together we were then able to prove the exact nature of the relationship, establishing some of the first connections between these different branches of mathematics.
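The pipeline described in these two examples is: fit a supervised model from one set of invariants to another, then ask an attribution method which inputs the model relies on. A hedged, self-contained stand-in follows; it uses permutation importance in place of the saliency methods used in the actual work, and the synthetic data and column names are invented for illustration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a table of knots: geometric invariants as inputs,
# the (algebraic) signature as the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = np.round(2.0 * X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=2000))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Attribution step: which invariants does the trained model actually rely on?
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, imp in zip(["volume", "injectivity_radius", "slope_re", "slope_im"],
                     result.importances_mean):
    print(f"{name:20s} {imp:.3f}")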


We investigated whether ML could shed light on relationships between different mathematical objects. Shown here are two “Bruhat intervals” and their associated “Kazhdan-Lusztig polynomials” - two fundamental objects in representation theory. A Bruhat interval is a diagram that represents all the different ways you could reverse the order of a collection of objects by only swapping two of them at a time. The KL polynomials tell mathematicians something deep and subtle about the different ways that this graph can exist in high dimensional space. Interesting structure only starts to emerge when the Bruhat intervals have 100s or 1000s of vertices.


Our models highlight previously undiscovered structure that guided us to surprising new mathematical results. Shown here is a striking relationship between the geometry and signature of a knot. The geometry of a knot has to do with its shape (e.g. its volume) when measured in a canonical way. The signature is an algebraic invariant which can be calculated by looking at the way the knot crosses itself and twists.

The use of learning techniques and AI systems holds great promise for the identification and discovery of patterns in mathematics. Even if certain kinds of patterns continue to elude modern ML, we hope our Nature paper can inspire other researchers to consider the potential for AI as a useful tool in pure maths. To replicate the results, anybody can access our interactive notebooks. Reflecting on the incredible mind of Ramanujan, George Frederick James Temple wrote, “The great advances in mathematics have not been made by logic but by creative imagination.” Working with mathematicians, we look forward to seeing how AI can further elevate the beauty of human intuition to new levels of creativity.


Pure Mathematics Fields

  • Algebra & Algebraic Geometry
  • Algebraic Topology
  • Analysis & PDEs
  • Geometry & Topology
  • Mathematical Logic & Foundations
  • Number Theory
  • Probability & Statistics
  • Representation Theory




Pure Mathematics

  • Faculty of Science
  • School of Mathematics
  • Phone +44 (0)117 928 8451
  • Website http://www.bristol.ac.uk/maths/research/pure/

United Kingdom

Research output

  • 1103 Article (Academic Journal)
  • 58 Conference Contribution (Conference Proceeding)
  • 43 Chapter in a book
  • 12 Working paper
  • 8 Authored book
  • 7 Other contribution
  • 5 Web publication/site
  • 4 Edited book
  • 2 Book/Film/Article review (Academic Journal)
  • 1 Other chapter contribution
  • 1 Conference Paper
  • 1 Letter (Academic Journal)
  • 1 Special issue (Academic Journal)
  • 1 Article (Specialist Publication)
  • 1 PhD thesis (not Bristol)


Search results: 1–50 of 1,103 results, sorted by publication year and title (ascending)

  • A finite dimensional algebra with infinite delooping level
  • On some isoperimetric inequalities for the Newtonian capacity
  • Poissonian pair correlation for directions in multi-dimensional affine lattices, and escape of mass estimates for embedded horospheres
  • Smallest denominators
  • The limit point in the Jante's law process has an absolutely continuous distribution
  • Topological generation of simple algebraic groups
  • Asymmetric cut and choose games
  • Bounds for theta sums in higher rank II
  • Fine-scale distribution of roots of quadratic congruences
  • Maps between relatively hyperbolic spaces and between their boundaries
  • Normalisers of maximal tori and a conjecture of Vdovin
  • On base sizes for primitive groups of product type
  • On soluble subgroups of sporadic groups
  • On some variational problems involving capacity, torsional rigidity, perimeter and measure
  • On the commuting probability of p-elements in a finite group
  • On the soluble graph of a finite group
  • On the topological generation of exceptional groups by unipotent elements
  • Strongly base-two groups
  • Almost elusive permutation groups
  • Arithmetic of hyperelliptic curves over local fields
  • A user's guide to the local arithmetic of hyperelliptic curves
  • Closed and unbounded classes and the Härtig quantifier model
  • Decision times of infinite computations
  • Enumerating 3-generated axial algebras of Monster type
  • Explicit coverings of families of elliptic surfaces by squares of elliptic curves
  • Fixed point ratios for finite primitive groups and applications
  • Interacting particle systems and Jacobi-style identities
  • L-series of harmonic Maass forms and a summation formula for harmonic lifts
  • Non-vanishing of symmetric cube L-functions
  • On efficiency and localisation for the torsion function
  • On the classification of extremely primitive affine groups
  • On the Saxl graphs of primitive groups with soluble stabilisers
  • Poincaré profiles of Lie groups and a coarse geometric dichotomy
  • Quantum-jump vs stochastic Schrödinger dynamics for Gaussian states with quadratic Hamiltonians and linear Lindbladians
  • Split spin factor algebras
  • Stably measurable cardinals
  • The mean square of the error term in the prime number theorem
  • 2-parity conjecture for elliptic curves with isomorphic 2-torsion
  • Accurate estimation of sums over zeros of the Riemann zeta-function
  • A five distance theorem for Kronecker sequences
  • Age evolution in the mean field forest fire model via multitype branching processes
  • Base sizes for primitive groups with soluble stabilisers
  • Conformal dimension of hyperbolic groups that split over elementary subgroups
  • Effective joint equidistribution of primitive rational points on expanding horospheres
  • Efficiency and localisation for the first Dirichlet eigenfunction
  • First moments of Rankin–Selberg convolutions of automorphic forms on GL(2)
  • Intrinsic ultracontractivity for domains in negatively curved manifold
  • Localisation for the torsion function and the strong Hardy inequality
  • Models of curves over DVRs
  • On capacity and torsional rigidity

Stephen Wolfram


Computational Knowledge and the Future of Pure Mathematics

Every four years for more than a century there’s been an International Congress of Mathematicians (ICM) held somewhere in the world. In 1900 it was where David Hilbert announced his famous collection of math problems—and it’s remained the top single periodic gathering for the world’s research mathematicians.

This year the ICM is in Seoul, and I’m going to it today. I went to the ICM once before—in Kyoto in 1990. Mathematica was only two years old then, and mathematicians were just getting used to it. Plenty already used it extensively—but at the ICM there were also quite a few who said, “I do pure mathematics. How can Mathematica possibly help me?”


Twenty-four years later, the vast majority of the world’s pure mathematicians do in fact use Mathematica in one way or another. But there’s nevertheless a substantial core of pure mathematics that still gets done pretty much the same way it’s been done for centuries—by hand, on paper.

Ever since the 1990 ICM I’ve been wondering how one could successfully inject technology into this. And I’m excited to say that I think I’ve recently begun to figure it out. There are plenty of details that I don’t yet know. And to make what I’m imagining real will require the support and involvement of a substantial set of the world’s pure mathematicians. But if it’s done, I think the results will be spectacular—and will surely change the face of pure mathematics at least as much as Mathematica (and for a younger generation, Wolfram|Alpha ) have changed the face of calculational mathematics, and potentially usher in a new golden age for pure mathematics.

Workflow of pure math

How can we usefully insert technology into this workflow? Here’s one simple way. Think about Wolfram|Alpha. If you enter 2+2, Wolfram|Alpha—like Mathematica—will compute 4. But if you enter new york—or, for that matter, 2.3363636 or cos(x) log(x)—there’s no single “answer” for it to compute. And instead what it does is to generate a report that gives you a whole sequence of “interesting facts” about what you entered.

Part of Wolfram|Alpha's output for cos(x) log(x)

And this kind of thing fits right into the workflow for pure mathematics. You enter some mathematical object, result or structure, and then the system tries to tell you interesting things about it—just like some extremely wise mathematical colleague might. You can guide the system if you want to, by telling it what kinds of things you want to know about, or even by giving it a candidate statement that might be true. But the workflow is always the Wolfram|Alpha-like “what can you tell me about that?” rather than the Mathematica -like “what’s the answer to that?”

Wolfram|Alpha already does quite a lot of this kind of thing with mathematical objects. Enter a number, or a mathematical expression, or a graph, or a probability distribution, or whatever, and Wolfram|Alpha will use often-quite-sophisticated methods to try to tell you a collection of interesting things about it.

Wolfram|Alpha tells you interesting things about mathematical objects—here "petersen graph", "stellated dodecahedron", "pareto distribution", and "42424"

But to really be useful in pure mathematics, there’s something else that’s needed. In addition to being able to deal with concrete mathematical objects, one also has to be able to deal with abstract mathematical structures.

Countless pure mathematical papers start with things like, “Let F be a field with such-and-such properties.” We need to be able to enter something like this—then have our system automatically give us interesting facts and theorems about F , in effect creating a whole automatically generated paper that tells the story of F .

So what would be involved in creating a system to do this? Is it even possible? There are several different components, all quite difficult and time consuming to build. But based on my experiences with Mathematica , Wolfram|Alpha, and A New Kind of Science , I am quite certain that with the right leadership and enough effort, all of them can in fact be built.

A key part is to have a precise symbolic description of mathematical concepts and constructs. Lots of this now already exists—after more than a quarter century of work—in Mathematica. Because built right into the Wolfram Language are very general ways to represent geometries, or equations, or stochastic processes or quantifiers. But what’s not built in are representations of pure mathematical concepts like bijections or abstract semigroups or pullbacks.

Mathematica Pura

I’ve been doing language design now for 35 years—and it’s the hardest intellectual activity I know. It requires a curious mixture of clear thinking, aesthetics and pragmatic judgement. And it involves always seeking the deepest possible understanding, and trying to do the broadest unification—to come up in the end with the cleanest and “most obvious” primitives to represent things.

Today the main way pure mathematics is described—say in papers—is through a mixture of mathematical notation and natural language, together with a few diagrams. And in designing a precise symbolic language for pure mathematics, this has to be the starting point.

One might think that somehow mathematical notation would already have solved the whole problem. But there’s actually only a quite small set of constructs and concepts that can be represented with any degree of standardization in mathematical notation—and indeed many of these are already in the Wolfram Language.

So how should one go further? The first step is to understand what the appropriate primitives are. The whole Wolfram Language today has about 5000 built-in functions—together with many millions of built-in standardized entities. My guess is that to broadly support pure mathematics there would need to be something less than a thousand other well-designed functions that in effect define frameworks—together with maybe a few tens of thousands of new entities or their analogs.

Wolfram Language function and entity categories

Take something like function spaces. Maybe there’ll be a FunctionSpace function to represent a function space. Then there’ll be various operations on function spaces, like PushForward or MetrizableQ. Then there’ll be lots of named function spaces, like “CInfinity”, with various kinds of parameterizations.
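In Python rather than Wolfram Language, the underlying idea can be sketched in a few lines: an expression is just a head applied to arguments, and heads such as FunctionSpace are ordinary symbols. Everything here, including the metrizable_q placeholder, is invented for illustration.

from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Expr:
    """A symbolic expression: a head applied to argument expressions."""
    head: str
    args: Tuple["Node", ...] = ()

Node = Union["Expr", str, int]

# A hypothetical representation of the C-infinity functions on a manifold M:
c_inf = Expr("FunctionSpace", (Expr("Manifold", ("M",)), "CInfinity"))

# "Operations on function spaces" are then just functions on expressions:
def metrizable_q(space):
    # Placeholder predicate; a real system would encode actual theorems here.
    return space.head == "FunctionSpace"

print(c_inf, metrizable_q(c_inf))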

Underneath, everything’s just a symbolic expression. But in the Wolfram Language there end up being three immediate ways to input things, all of which are critical to having a convenient and readable language. The first is to use short notations—like + or ∀ —as in standard mathematical notation. The second is to use carefully chosen function names—like MatrixRank or Simplex . And the third is to use free-form natural language—like trefoil knot or aleph0 .

One wants to have short notations for some of the most common structural or connective elements. But one needs the right number: not too few, like in LISP, nor too many, like in APL. Then one wants to have function names made of ordinary words, arranged so that if one’s given something written in the language one can effectively just “read the words” to know at least roughly what’s going on in it.

Computers & humans

Ultimately every named construct or concept in pure mathematics needs to have a place in our symbolic language. Most of the 13,000+ entries in MathWorld. Material from the 5600 or so entries in the MSC2010 classification scheme. All the many things that mathematicians in any given field would readily recognize when told their names.

But, OK, so let’s say we manage to create a precise symbolic language that captures the concepts and constructs of pure mathematics. What can we do with it?

One thing is to use it “Wolfram|Alpha style”: you give free-form input, which is then interpreted into the language, and then computations are done, and a report is generated.

But there’s something else too. If we have a sufficiently well-designed symbolic language, it’ll be useful not only to computers but also to humans. In fact, if it’s good enough, people should prefer to write out their math in this language than in their current standard mixture of natural language and mathematical notation.

When I write programs in the Wolfram Language, I pretty much think directly in the language. I’m not coming up with a description in English of what I’m trying to do and then translating it into the Wolfram Language. I’m forming my thoughts from the beginning in the Wolfram Language—and making use of its structure to help me define those thoughts.

If we can develop a sufficiently good symbolic language for pure mathematics, then it’ll provide something for pure mathematicians to think in too. And the great thing is that if you can describe what you’re thinking in a precise symbolic language, there’s never any ambiguity about what anything means: there’s a precise definition that you can just go to the documentation for the language to find.

And once pure math is represented in a precise symbolic language, it becomes in effect something on which computation can be done. Proofs can be generated or checked. Searches for theorems can be done. Connections can automatically be made. Chains of prerequisites can automatically be found.

But, OK, so let’s say we have the raw computational substrate we need for pure mathematics. How can we use this to actually implement a Wolfram|Alpha-like workflow where we enter descriptions of things, and then in effect automatically get mathematical wisdom about them?

There are two seemingly different directions one can go. The first is to imagine abstractly enumerating possible theorems about what has been entered, and then using heuristics to decide which of them are interesting. The second is to start from computable versions of the millions of theorems that have actually been published in the literature of mathematics, and then figure out how to connect these to whatever has been entered.

Each of these directions in effect reflects a slightly different view of what doing mathematics is about. And there’s quite a bit to say about each direction.

Math by enumeration

It’s easy to do either of these things for something like Boolean algebra. And the result is that one gets a sequence of true theorems. But if a human looks at them, many of them seem trivial or uninteresting. So then the question is how to know which of the possible theorems should actually be considered “interesting enough” to be included in a report that’s generated.

My first assumption was that there would be no automatic approach to this—and that “interestingness” would inevitably depend on the historical development of the relevant area of mathematics. But when I was working on A New Kind of Science , I did a simple experiment for the case of Boolean algebra.

Partial list of Boolean algebra theorems, from p. 817 of "A New Kind of Science"

There are 14 theorems of Boolean algebra that are usually considered “interesting enough” to be given names in textbooks. I took all possible theorems and listed them in order of complexity (number of variables, number of operators, etc). And the surprising thing I found is that the set of named theorems corresponds almost exactly to the set of theorems that can’t be proved just from ones that precede them in the list. In other words, the theorems which have been given names are in a sense exactly the minimal statements of new information about Boolean algebra.

Boolean algebra is of course a very simple case. And in the kind of enumeration I just described, once one’s got the theorems corresponding to all the axioms, one would conclude that there aren’t any more “interesting theorems” to find—which for many mathematical theories would be quite silly. But I think this example is a good indication of how one can start to use automated heuristics to figure out which theorems are “worth reporting on”, and which are, for example, just “uninteresting embellishments”.
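The enumeration half of this experiment is easy to reproduce in miniature. The sketch below generates two-variable and/or/not expressions by depth and groups them by truth table, so that every pair inside a group is an identity of Boolean algebra; the harder filtering step, deciding which identities follow from earlier ones, is deliberately omitted.

import itertools

VARS = ("p", "q")

def evaluate(e, env):
    if isinstance(e, str):
        return env[e]
    op, *args = e
    vals = [evaluate(a, env) for a in args]
    return {"and": all(vals), "or": any(vals), "not": not vals[0]}[op]

def expressions(depth):
    """All and/or/not expressions over p, q, up to the given depth."""
    if depth == 0:
        yield from VARS
        return
    smaller = list(expressions(depth - 1))
    yield from smaller
    for a in smaller:
        yield ("not", a)
        for b in smaller:
            yield ("and", a, b)
            yield ("or", a, b)

def truth_table(e):
    return tuple(evaluate(e, dict(zip(VARS, bits)))
                 for bits in itertools.product([False, True], repeat=2))

groups, seen = {}, set()
for e in expressions(2):
    if e not in seen:
        seen.add(e)
        groups.setdefault(truth_table(e), []).append(e)

# Each group collects equivalent expressions; the two simplest members of a
# group form a candidate "theorem", e.g. idempotence: ('and', 'p', 'p') == 'p'.
for table, exprs in groups.items():
    if len(exprs) > 1:
        exprs.sort(key=lambda e: len(str(e)))  # crude complexity ordering
        print(exprs[1], "==", exprs[0])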

Interestingness

So in principle one can imagine having a system that takes input and generates “interesting” theorems about it. Notice that while in a standard Mathematica -like calculational workflow, one would be taking input and “computing an answer” from it, here one’s just “finding interesting things to say about it”.

The character of the input is different too. In the calculational case, one’s typically dealing with an operation to be performed. In the Wolfram|Alpha-like pure mathematical case, one’s typically just giving a description of something. In some cases that description will be explicit. A specific number. A particular equation. A specific graph. But more often it will be implicit. It will be a set of constraints. One will say (to use the example from above), “Let F be a field,” and then one will give constraints that the field must satisfy.

In a sense an axiom system is a way of giving constraints too: it doesn’t say that such-and-such an operator “is Nand”; it just says that the operator must satisfy certain constraints. And even for something like standard Peano arithmetic, we know from Gödel’s Theorem that we can never ultimately resolve the constraints–we can never nail down that the thing we denote by “+” in the axioms is the particular operation of ordinary integer addition. Of course, we can still prove plenty of theorems about “+”, and those are what we choose from for our report.

So given a particular input, we can imagine representing it as a set of constraints in our precise symbolic language. Then we would generate theorems based on these constraints, and heuristically pick the “most interesting” of them.

One day I’m sure doing this will be an important part of pure mathematical work. But as of now it will seem quite alien to most pure mathematicians—because they are not used to “disembodied theorems”; they are used to theorems that occur in papers, written by actual mathematicians.

And this brings us to the second approach to the automatic generation of “mathematical wisdom”: start from the historical corpus of actual mathematical papers, and then make connections to whatever specific input is given. So one is able to say for example, “The following theorem from paper X applies in such-and-such a way to the input you have given”, and so on.

Curating the math corpus

So what can be done with this corpus of mathematical papers? First, of course, there’s simple search and retrieval. Often the words in the papers will make for better search targets than the more notational material in the actual theorems. But with the kind of linguistic-understanding technology for math that we have in Wolfram|Alpha, it should not be too difficult to build what’s needed to do good statistical retrieval on the corpus of mathematical papers.
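As a concrete baseline for what such retrieval involves, here is a minimal sketch using TF-IDF and cosine similarity over a toy three-title corpus; a real system would index full texts and handle mathematical notation far more carefully.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the text of mathematics papers.
papers = [
    "On large subsets of F_q^n with no three-term arithmetic progression",
    "A simple on-line bin-packing algorithm",
    "Upper bounds for sunflower-free sets",
]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(papers)

query = vectorizer.transform(["best bounds for cap sets and progressions"])
scores = cosine_similarity(query, doc_matrix)[0]
for score, title in sorted(zip(scores, papers), reverse=True):
    print(f"{score:.2f}  {title}")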

But can one go further? One might think about tagging the source documents to improve retrieval. But my guess is that most kinds of static tagging won’t be worth the trouble; just as one’s seen for the web in general, it’ll be much easier and better to make the search system more sophisticated and content-aware than to add tags document by document.

What would unquestionably be worthwhile, however, is to put the theorems into a genuine computable form: to actually take theorems from papers and rewrite them in a precise symbolic language.

Will it be possible to do this automatically? Eventually I suspect large parts of it will. Today we can take small fragments of theorems from papers and use the linguistic understanding system built for Wolfram|Alpha to turn them into pieces of Wolfram Language code. But it should gradually be possible to extend this to larger fragments—and eventually get to the point where it takes, at most, modest human effort to convert a typical theorem to precise symbolic form.

So let’s imagine we curate all the theorems from the literature of mathematics, and get them in computable form. What would we do then? We could certainly build a Wolfram|Alpha-like system that would be quite spectacular—and very useful in practice for doing lots of pure mathematics.

Undecidability bites

And what this suggests is a kind of combination of the two basic approaches we’ve discussed—where in effect one takes the complete corpus of published mathematics, and views it as defining a giant 5-million-axiom formal system, and then follows the kind of automated theorem-enumeration procedure we discussed to find “interesting things to say”.

Math: science or art?

I think it depends on what one sees the nature of the pure mathematical enterprise as being. Is it science, or is it art? If it’s science, then being able to make more theorems faster is surely good. But if it’s art, that’s really not the point. If doing pure mathematics is like creating a painting, automation is going to be largely counterproductive—because the core of the activity is in a sense a form of human expression.

This is not unrelated to the role of proof. To some mathematicians, what matters is just the theorem: knowing what’s true. The proof is essentially backup to ensure one isn’t making a mistake. But to other mathematicians, proof is a core part of the content of the mathematics. For them, it’s the story that brings mathematical concepts to light, and communicates them.

Automatically generated proof of a Boolean algebra theorem, written in terms of the ∘ operator

It has 343 steps, and in ordinary-size type would be perhaps 40 pages long. And to me as a human, it’s completely incomprehensible. One might have thought it would help that the theorem prover broke the proof into 81 lemmas. But try as I might, I couldn’t really find a way to turn this automated proof into something I or other people could understand. It’s nice that the proof exists, but the actual proof itself doesn’t tell me anything.

Proof as story

So how can we do better? If we generate lots of similar proofs, then maybe we’ll start seeing similar lemmas a lot, and through being familiar they will seem more meaningful and comprehensible. And there are probably some visualizations that could help us quickly get a good picture of the overall structure of the proof. And of course, if we manage to curate all known theorems in the mathematics literature, then we can potentially connect automatically generated lemmas to those theorems.

It’s not immediately clear how often that will be possible—and indeed in existing examples of computer-assisted proofs, like for the Four Color Theorem, the Kepler Conjecture, or the simplest universal Turing machine, my impression is that the often-computer-generated lemmas that appear rarely correspond to known theorems from the literature.

But despite all this, I know at least one example showing that with enough effort, one can generate proofs that tell stories that people can understand: the step-by-step solutions system in Wolfram|Alpha Pro. Millions of times a day students and others compute things like integrals with Wolfram|Alpha—then ask to see the steps.

Wolfram|Alpha's step-by-step solution for an indefinite integral

It’s notable that actually computing the integral is much easier than figuring out good steps to show; in fact, it takes some fairly elaborate algorithms and heuristics to generate steps that successfully communicate to a human how the integral can be done. But the example of step-by-step in Wolfram|Alpha suggests that it’s at least conceivable that with enough effort, it would be possible to generate proofs that are readable as “stories”—perhaps even selected to be as short and simple as possible (“proofs from The Book”, as Erdős would say).

Of course, while these kinds of automated methods may eventually be good at communicating the details of something like a proof, they won’t realistically be able to communicate—or even identify—overarching ideas and motivations. Needless to say, present-day pure mathematics papers are often quite deficient in communicating these too. Because in an effort to ensure rigor and precision, many papers tend to be written in a very formal way that cannot successfully represent the underlying ideas and motivations in the mind of the author—with the result that some of the most important ideas in mathematics are transmitted through an essentially oral tradition.

It would certainly help the progress of pure mathematics if there were better ways to communicate its content. And perhaps having a precise symbolic language for pure mathematics would make it easier to express concretely some of those important points that are currently left unwritten. But one thing is for sure: having such a language would make it possible to take a theorem from anywhere, and—like with a typical Wolfram Language code fragment—immediately be able to plug it in anywhere else, and use it.

But back to the question of whether automation in pure mathematics can ultimately make sense. I consider it fairly clear that a Wolfram|Alpha-like “pure math assistant” would be useful to human mathematicians. I also consider it fairly clear that having a good, precise, symbolic language—a kind of Mathematica Pura that’s a well-designed follow-on to standard mathematical notation—would be immensely helpful in formulating, checking and communicating math.

Automated discovery

But I think the real question is whether the computer can build up new conceptual frameworks and structures—in effect new mathematical theories. Certainly some theorems found by enumeration will be surprising and indicative of something fundamentally new. And it will surely be impressive when a computer can take a large collection of theorems—whether generated or from the literature—and discover correlations among them that indicate some new unifying principle. But I would expect that in time the computer will be able not only to identify new structures, but also name them, and start building stories about them. Of course, it is for humans to decide whether they care about where the computer is going, but the basic character of what it does will, I suspect, be largely indistinguishable from many forms of human pure mathematics.

All of this is still fairly far in the future, but there’s already a great way to discover math-like things today—that’s not practiced nearly as much as it should be: experimental mathematics. The term has slightly different meanings to different people. For me it’s about going out and studying what mathematical systems do by running experiments on them. And so, for example, if we want to find out about some class of cellular automata, or nonlinear PDEs, or number sequences, or whatever, we just enumerate possible cases and then run them and see what they do.

There’s a lot to discover like this. And certainly it’s a rich way to generate observations and hypotheses that can be explored using the traditional methodologies of pure mathematics. But the real thrust of what can be done does not fit into what pure mathematicians typically think of as math. It’s about exploring the “flora and fauna”—and principles—of the universe of possible systems, not about building up math-like structures that can be studied and explained using theorems and proofs. Which is why—to quote the title of my book—I think one should best consider this a new kind of science, rather than something connected to existing mathematics.

In discussing experimental mathematics and A New Kind of Science, it’s worth mentioning that in some sense it’s surprising that pure mathematics is doable at all—because if one just starts asking absolutely arbitrary questions about mathematical systems, many of them will end up being undecidable.

This is particularly obvious when one’s out in the computational universe of possible programs, but it’s also true for programs that represent typical mathematical systems. So why isn’t undecidability more of a problem for typical pure mathematics? The answer is that pure mathematics implicitly tends to select what it studies so as to avoid undecidability. In a sense this seems to be a reflection of history: pure mathematics follows what it has historically been successful in doing, and in that way ends up navigating around undecidability—and producing the millions of theorems that make up the corpus of existing pure mathematics.

OK, so those are some issues and directions. But where are we at in practice in bringing computational knowledge to pure mathematics?

Getting it done

One feature of essentially all efforts in this direction so far is that they were conceived as defining a kind of “low-level language” for mathematics. Like most of today’s computer languages, they include a modest number of primitives, then imagine that essentially any actual content must be built externally, by individual users or in libraries.

But the new idea in the Wolfram Language is to have a knowledge-based language, in which as much actual knowledge as possible is carefully designed into the language itself. And I think that just like in general computing, the idea of a knowledge-based language is going to be crucial for injecting computation into pure mathematics in the most effective and broadly useful way.

So what’s involved in creating our Mathematica Pura —an extension to the Wolfram Language that builds in the actual structure and content of pure math? At the lowest level, the Wolfram Language deals with arbitrary symbolic expressions , which can represent absolutely anything. But then the language uses these expressions for many specific purposes. For example, it can use a symbol x to represent an algebraic variable. And given this, it has many functions for handling symbolic expressions—interpreted as mathematical or algebraic expressions—and doing various forms of math with them.

The emphasis of the math in Mathematica and the Wolfram Language today is on practical, calculational math. And by now it certainly covers essentially all the math that has survived from the 19th century and before. But what about more recent math? Historically, math itself went through a transition about a century ago. Just around the time modernism swept through areas like the arts, math had its own version: it started to consider systems that emerged purely from its own formalism, without regard for obvious connections to the outside world.

And this is the kind of math—through developments like Bourbaki and beyond—that came to dominate pure mathematics in the 20th century. And inevitably, a lot of this math is about defining abstract structures to study. In simple cases, it seems like one might represent these structures using some hierarchy of types. But the types need to be parametrized, and quite quickly one ends up with a whole algebra or calculus of types—and it’s just as well that in the Wolfram Language one can use general symbolic expressions, with arbitrary heads, rather than just simple type descriptions.
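As a concrete taste of this (CyclicGroup and GroupOrder are actual built-ins; PolynomialRing is a hypothetical head used purely for illustration), a parametrized structure can be nothing more than a symbolic expression:

  GroupOrder[CyclicGroup[5]]   (* 5 -- an algebraic structure that is already built in *)

  (* A hypothetical parametrized structure: an undefined head simply stays *)
  (* symbolic, so "types" can be ordinary expressions for other functions  *)
  (* to inspect, transform, and compute with.                              *)
  ring = PolynomialRing[Rationals, {x, y}];
  Head[ring]                   (* PolynomialRing *)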

As I mentioned early in this blog post, it’s going to take all sorts of new built-in functions to capture the frameworks needed to represent modern pure mathematics—together with lots of entity-like objects. And it’ll certainly take years of careful design to make a broad system for pure mathematics that’s really clean and usable. But there’s nothing fundamentally difficult about having symbolic constructs that represent differentiability or moduli spaces or whatever. It’s just language design, like designing ways to represent 3D images or remote computation processes or unique external entity references.

So what about curating theorems from the literature? Through Wolfram|Alpha and the Wolfram Language, not to mention for example the Wolfram Functions Site and the Wolfram Connected Devices Project, we’ve now had plenty of experience at the process of curation, and in making potentially complex things computable.

The eCF example

We chose about 2000 documents, then set about extracting theorems and other mathematical information from them. The result was about 600 theorems, 1500 basic formulas, and about 10,000 derived formulas. The formulas were directly in computable form—and were in effect immediately able to join the 300,000+ on the Wolfram Functions Site, which are all now included in Wolfram|Alpha. But with the theorems, our first step was just to treat them as entities themselves, with properties such as where they were first published, who discovered them, etc. And even at this level, we were able to insert some nice functionality into Wolfram|Alpha.

Some of the output from entering "Worpitzky theorem" into Wolfram|Alpha

But we also started trying to actually encode the content of the theorems in computable form. It took introducing some new constructs like LebesgueMeasure, ConvergenceSet, and LyapunovExponent. But there was no fundamental problem in creating precise symbolic representations of the theorems. And just from these representations, it became possible to do computations like this in Wolfram|Alpha:

Wolfram|Alpha results for "continued fraction theorems for sqrt(7)"
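Two layers are involved here. The raw computation behind such a query is a built-in one-liner, while theorem content sits on top as inert symbolic expressions. The sketch below shows the real built-in first, then a purely hypothetical encoding (Theorem, EventuallyPeriodic, and QuadraticIrrational are invented heads for illustration, not the project’s actual representation):

  ContinuedFraction[Sqrt[7]]   (* {2, {1, 1, 1, 4}} -- the inner list is the repeating block *)

  (* A hypothetical symbolic encoding of Lagrange's theorem that the     *)
  (* continued fraction of an irrational is eventually periodic exactly  *)
  (* when the number is a quadratic irrational. Unknown heads simply     *)
  (* stay symbolic in the Wolfram Language, which is what makes this     *)
  (* style of encoding cheap.                                            *)
  Theorem["LagrangeContinuedFraction",
    ForAll[x, Element[x, Reals] && !Element[x, Rationals],
      Equivalent[EventuallyPeriodic[ContinuedFraction[x]],
                 QuadraticIrrational[x]]]]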

An interesting feature of the continued fraction project (dubbed “eCF”) was how the process of curation actually led to the discovery of some new mathematics. Once we had curated 50+ papers about the Rogers–Ramanujan continued fraction, it became clear that there were missing cases that could now be computed. And the result was the filling of a gap left by Ramanujan for 100 years.

Ramanujan's missing cases are now computable

There’s always a tradeoff between curating knowledge and creating it afresh. And so, for example, in the Wolfram Functions Site, there was a core of relations between functions that came from reference books and the literature. But it was vastly more efficient to generate other relations than to scour the literature to find them.

The Wolfram Function Site, and Wolfram|Alpha, generate relations between functions
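As a small illustration of what “generating relations” means in practice (standard built-ins; the comments show the outputs Mathematica returns):

  FunctionExpand[Gamma[1/2]]          (* Sqrt[Pi] *)
  TrigExpand[Sin[2 x]]                (* 2 Cos[x] Sin[x] *)
  FullSimplify[Sin[x]^2 + Cos[x]^2]   (* 1 *)

Systematically sweeping such transformations over whole families of functions yields vastly more identities per unit of effort than extracting the same identities from the literature.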

But if the goal is curation, then what would it take to curate the complete literature of mathematics? In the eCF project, it took about 3 hours of mathematician time to encode each theorem in computable form. But all this work was done by hand, and in a larger-scale project, I am certain that an increasing fraction of it could be done automatically, not least using extensions of our Wolfram|Alpha natural language understanding system.
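To get a sense of scale, here is a back-of-envelope estimate, assuming the figure of roughly 5 million extant theorems (which comes up later in this discussion) and the eCF rate of 3 hours per theorem with no automation at all:

  5,000,000 theorems × 3 hours ≈ 15,000,000 mathematician-hours ≈ 7,500 working years

which is exactly why the fraction that can be automated matters so much.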

Of course, there are all sorts of practical issues. Newer papers are predominantly in TeX, so it’s not too difficult to pull out theorems with all their mathematical notation. But older papers need to be scanned, and that requires math OCR, a capability that has yet to be properly developed.

Then there are issues like whether theorems stated in papers are actually valid. And even whether theorems that were considered valid, say, 100 years ago are still considered valid today. For example, for continued fractions, there are lots of pre-1950 theorems that were successfully proved in their time, but which ignore branch cuts, and so wouldn’t be considered correct today.
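A tiny illustration of the general principle (this is standard Mathematica behavior; the continued fraction cases are subtler but fail for the same underlying reason):

  Simplify[Sqrt[x^2]]           (* stays Sqrt[x^2]: it is not x in general *)
  Simplify[Sqrt[x^2], x >= 0]   (* x -- valid once the branch is pinned down *)
  Sqrt[(-2)^2]                  (* 2, not -2 *)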

And in the end of course it requires lots of actual, skilled mathematicians to guide the curation process, and to encode theorems. But in a sense this kind of mobilization of mathematicians is not completely unfamiliar; it’s something like what was needed when Zentralblatt was started in 1931, or Mathematical Reviews in 1940. (As a curious footnote, the founding editor of both these publications was Otto Neugebauer, who worked just down the hall from me at the Institute for Advanced Study in the early 1980s, but who I had no idea was involved in anything other than decoding Babylonian mathematics until I was doing research for this blog post.)

When it comes to actually constructing a system for encoding pure mathematics, there’s an interesting example: Theorema, started by Bruno Buchberger in 1995, and recently updated to version 2. Theorema is written in the Wolfram Language, and provides both a document-based environment for representing mathematical statements and proofs, and actual computation capabilities for automated theorem proving and so on.

A proof in Theorema

No doubt it’ll be an element of what’s ultimately built. But the whole project is necessarily quite large—perhaps the world’s first example of “big math”. So can the project get done in the world today? A crucial part is that we now have the technical capability to design the language and build the infrastructure that’s needed. But beyond that, the project also needs a strong commitment from the world’s mathematics community—as well as lots of contributions from individual mathematicians from every possible field. And realistically it’s not a project that can be justified on commercial grounds—so the likely $100+ million that it will need will have to come from non-commercial sources.

But it’s a great and important project, one that promises to be pivotal for pure mathematics. In almost every field there are golden ages when dramatic progress is made. And more often than not, such golden ages are initiated by new methodology and the arrival of new technology. And this is exactly what I think will happen in pure mathematics. If we can mobilize the effort to curate known mathematics and build the system to use and generate computational knowledge around it, then we will not only succeed in preserving and spreading the great heritage of pure mathematics, but we will also thrust pure mathematics into a period of dramatic growth.

Large projects like this rely on strong leadership. And I stand ready to do my part, and to contribute the core technology that is needed. Now to move this forward, what it takes is commitment from the worldwide mathematics community. We have the opportunity to make the second decade of the 21st century really count in the multi-millennium history of pure mathematics. Let’s actually make it happen!

Posted in: Future Perspectives, Mathematics


16 comments

Great post.

I look forward to using Mathematica Pura sometime “soon”.

Great essay. Also see the following for relevant discussion:

http://monasandnomos.org/2012/12/05/the-idea-of-a-characteristica-universalis-between-leibniz-and-russell-and-its-relevancy-today/

http://vanemden.wordpress.com/2012/04/08/flowcharts-the-once-and-future-programming-language/

And the HN thread:

https://news.ycombinator.com/item?id=8168028

This is a very interesting post, and I hope it leads to something cool and useful. The exact problem of computerizing pure mathematics, in the most general sense, has actually been solved in the past, but only in one way. Said solution is implemented in several languages, most popularly Coq and Agda. You mentioned Coq, but I’m not sure you understand exactly what the system is about. It’s not merely a proof assistant.

The solution is basically to just find a programming language that is both usable by programmers, and expressive enough to be used as a convenient foundation for mathematics. The language in question is Martin-Löf type theory, and the exact languages of Coq and Agda are implementations of mere extensions of that.

Since they are just extensions of a foundation for mathematics, you can directly construct definitions for whatever pure structure you want, and start programming. Theorems are just types, and all programs are proofs, as per the Curry-Howard isomorphism.

The utility of this approach is the ability to actually utilize theorems to prove the correctness of algorithms. For instance, you can define an inductive datatype “SortedList” in parallel to “List”. The difference is that “SortedList” carries with each element a proof that it should be sorted after its predecessor. This ability to express raw proof information is important, since, if you define a function “sort : List A -> SortedList A”, the type-checker will refuse to move on to compilation if the algorithm doesn’t show its work by constructing valid proofs with each sorted element. Even if the algorithm is technically correct, unless it shows its work, it won’t compile. This unifies specification with implementation, allowing you to, at the same time, specify what a program is intended to do, and implement it. If the implementation doesn’t match the specification the type-checker will tell you. This has vast applications to the formal verification of software, as well as to pure mathematics normally. This sort of technique is well known for having been used to prove the Four Color Theorem, as mentioned here, in the language Coq.
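A minimal sketch of the SortedList idea in Lean 4 (a close cousin of the Coq and Agda systems being described; this particular encoding is just one illustration):

  -- Lists of naturals that are sorted by construction:
  -- `SortedList b` holds ascending lists whose elements are all ≥ b.
  inductive SortedList : Nat → Type where
    | nil  : (b : Nat) → SortedList b
    | cons : (x b : Nat) → b ≤ x → SortedList x → SortedList b

  -- Every `cons` must supply a proof of `b ≤ x`, so an out-of-order
  -- list is not merely wrong: it is a type error and will not compile.
  def oneThreeFive : SortedList 0 :=
    .cons 1 0 (by decide) (.cons 3 1 (by decide) (.cons 5 3 (by decide) (.nil 5)))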

These systems can also have the ability to do automatic proof searches. Coq has a language subset dedicated to designing proof-search algorithms. Agda has an auto feature, which can use its unification algorithm to instantiate likely solutions to program holes.

There are also a few variations on this approach. All but one of the implementations have been functional languages. The exception is an experimental language called Caledon, which is a logical language, similar to Prolog, and more similar to λProlog. Interestingly, that language can be used to create a really elegant (~60 line) implementation of a small computer-algebra program that can solve rational derivatives. But, of course, none of these variations are available to Mathematica.

Other than the ability to prove the correctness of programs, there are other applications to unifying pure mathematics with computation. For instance, let’s say I design a domain-specific language on some monad. I should be able to use category theory proofs to generate correct algorithms that can be used to make statements within said DSL. Before Homotopy Type Theory, I’m not even sure I’d be able to do this systematically in Agda or Coq. This seems like the place where an approach like the one hinted at here can shine. The symbolic geometry language can actually be used to design algorithms that utilize geometry. I would expect the same from any extension to a pure domain.

“so the likely $100+ million that it will need will have to come from non-commercial sources.”

I find this idea as unacceptable as trying to do basic research with Mathematica altogether. The idea of Mathematica is ingenious, and although I find the Wolfram Language far too convoluted, its documentation somewhat lacking, and the user interface horrible, the actual breaking point for Mathematica and your possible Mathematica Pura programme is that it is opaque to me as a researcher. Mathematica would not need to be free software (in the definition of the Free Software Foundation), but it necessarily needs to be open, with the whole of its source code inspectable, so that we can, if not prove, at least show that the implementation of the Wolfram Language core functions is free of bugs. Without the complete implementational details you cannot expect anyone to invest in this programme: no development time, no money, and, most importantly, no usage time, since we will never be completely sure that the results thus obtained are correct rather than simply artifacts of Mathematica.

The only tenable way forward is to completely open source Mathematica (preferably through a FOSS license, under which you could still charge for exemption licenses or your add-on services for the wonderful data you provide) and to generally improve the documentation of Mathematica with implementation details and links to background material.

The precise symbolic language, a.k.a. Mathematica Pura, would be immensely helpful in formulating and communicating math, and would thus lower the cost of access to pure mathematics. In the long run this will presumably generate far greater economic benefit than the $100+ million cost of this first big-math project. The required transformation might be like that from the 1960s Apollo Moon Shot to current NewSpace, where the promise of dramatically lower cost of interplanetary space access has begun to attract enormous financial investment. It’s just that here we’re talking about exploring the universe of possible mathematical systems. PS: This big-math project applies automated theorem-enumeration to implement a Wolfram|Alpha-like workflow. And the “seemingly different directions” both end up doing enumeration. A convenient way to distinguish these might be bottom-up enumeration in the “Math by enumeration” case, which handles inputs such as the field F with constraints, and top-down enumeration in the “Curating the math corpus” case, which starts with the 5-million-axiom formal system of the math corpus.

“If we can mobilize the effort to curate known mathematics and build the system to use and generate computational knowledge around it, then we will not only succeed in preserving and spreading the great heritage of pure mathematics, but we will also thrust pure mathematics into a period of dramatic growth.” Thank you!! Stephen, you again open the doors of new possibilities, this time in pure mathematics. Your enthusiast, Zlatko Bosanac, Croatia

so the likely $100+ million that it will need will have to come from non-commercial sources

I would say this is crucial, in order not to tie this “digitalization” process to proprietary software. The data should be collected and made computable in open and accessible formats, so that it is usable not only by Mathematica or other specific products.

@Jörg Behrmann I agree with the idea that the source code should be peer-reviewed – and in fact I’m a little concerned by the current trend of Wolfram’s products. While I personally like the Wolfram Language (though I’m baffled by Stephen’s need to name *everything* after himself), I’m worried by things like Wolfram|Alpha Pro. It’s entirely reasonable to charge for large-scale government or corporate services, as those take incredible amounts of resources to create and maintain, but taking services that were free, such as (if I recall correctly) the step-by-step solutions for certain math problems, and placing them behind a paywall is the opposite of the direction Stephen says he wants to go. I’m all for making money, but this move away from public access (or at least free public access) does not align with what Stephen claims to want to do.

Then again, making Mathematica available on the Raspberry Pi for free is a step in the right direction despite the extreme limitations of the hardware, so there’s probably still hope.

Fine, Stephen, it’s great that you will develop Mathematica further. I am a pure mathematician using modern tools like Wolfram Mathematica in my passions (sic:) Your WM and WM/Alpha are brilliant aids in special and everyday usage. So I will wait, excited, for Mathematica Pura. Good luck Stephen.

Łukasz Surzycki, private researcher

Great article. Please keep students in mind when you design Mathematica Pura; it would be awesome to start “small” by making a system able to be used on problems at the high school, bachelor’s, and master’s level.

The public will be there, and it would definitely change the way students perceive mathematics. And if students have understood pure mathematics with Mathematica, they’ll want to continue to use it for the rest. This would be a good entry point and solve a real problem: it’s quite difficult to reason in pure mathematics, and usually students only begin to be comfortable with abstract mathematical thinking at the beginning of their graduate studies.

Mathematica Pura can be a game changer, and I think it would be a good idea to begin by targeting this audience.

@JörgBehrmann I don’t think it’s fair to ask them to publish the source code. Matlab is far from being open source but is widely used in industry.

Mathematica has proved that it’s a robust and reliable engine. If you work on nuclear stuff or other highly sensitive, high-risk projects, I’m sure they’ll be happy to receive you and let you check the parts of the source code relevant to your concern if you contact them. But, honestly, Mathematica with closed source is fine for 99% of the projects in the world. It’s clearly reliable enough.

They have worked for decades to build this product, and they have the right to reap the benefits of these efforts and keep their core technology secret.

Stephen, this is an exciting adventure. As you say, merely the act of encoding the 5 million extant theorems will reveal that thousands of them are false or unproven. The process will also discover many useful consequent theorems by making previously unnoticed connections.

The Wolfram|Alpha Step-by-Step Solution example of the indefinite integral doesn’t go far enough for me. It produces a solution matching the form of the second step, the Cos(2x) equation. The initial equation is in the form Sin²(x), so I expect a solution with terms in Sin²(x). Can the Solver be told to match the terms in the original expression more closely?
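For reference, the two forms being contrasted here are equal by the standard double-angle identity sin(2x) = 2 sin(x) cos(x):

  ∫ sin²(x) dx = x/2 − sin(2x)/4 + C = x/2 − sin(x) cos(x)/2 + C

so the request is purely about which of the equivalent forms the solver presents.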

I wonder if there is something to learn from Doug Lenat’s Automated Mathematician (Stanford Ph.D. 1976 — http://en.wikipedia.org/wiki/Automated_Mathematician ).

I can think of three areas: 1) working within an existing area, 2) venturing into an out-of-the-box construction with never-before-seen definitions and axioms, and 3) spin-offs or “branching” from an existing area. One and Three would benefit from curating. Just having natural language connected to symbols, etc. would be a good first step, provided it could also speak for the visually impaired. There should also be a feature allowing for the input of one’s eBooks. These sources would be integrated with the curating function, and searchable as references. The difficulty encountered by the pure mathematician is that they typically know the objectives of a proof within the context of an existing area, but not the proof. Whereas the out-of-the-box persons struggle with precise definitions and the necessity of certain axioms, and do not always know what the next theorem will be or whether their new construct will get them to their goals.

Wow, Steve, you sure are a prolific writer. Did you do this in one sitting?

I have one idea I’d like to get to you before I go back and actually read the post fully. There is a huge gap between the pedestrian chalkboard, TeX, LaTeX, Oo, and friends, and the formal syntax of Mathematica, MatLab, R, Maxima, Curvus, and the like. Mathematicians, even occasional student ones like me, have to straddle three different spaces: interactive thinking (pencil on paper or markers on whiteboard), computer algebra, and publication and code (Py, C++, Java..) generation. There is no non-left-to-right or non-right-to-left whitespace one can just ‘scribble’ formulas into, pushing symbols about like on a jigsaw puzzle. The supple, fluid chalk-on-blackboard UI has been replaced by three worlds of layers of syntactic sugar needed to get things acceptable to automata. No ‘DWIM’; no canonical in-formal-ula cum symbolic algebra cum publication-ready ‘objects’ which with minor tweaks could also represent fragments of library calls for OO & procedural languages. It’s almost a handwriting-recognition problem, but solving it would most enable ramblings among thought, teaching, exploration, computer-assisted reasoning, numerical evaluation, and publication. My bet is that whatever enables more minds to participate, contribute, consume, and interact with symbolic systems more easily has the potential for widespread positive impact.

Mathematica and wolframalpha.com are awesome tools for students as well as for researchers and pro mathematicians. Thank you and good luck.


Mathematics at MIT is administratively divided into two categories: Pure Mathematics and Applied Mathematics. They comprise the following research areas:

Pure Mathematics

  • Algebra & Algebraic Geometry
  • Algebraic Topology
  • Analysis & PDEs
  • Mathematical Logic & Foundations
  • Number Theory
  • Probability & Statistics
  • Representation Theory

Applied Mathematics

In applied mathematics, we look for important connections with other disciplines that may inspire interesting and useful mathematics, and where innovative mathematical reasoning may lead to new insights and applications.

  • Combinatorics
  • Computational Biology
  • Physical Applied Mathematics
  • Computational Science & Numerical Analysis
  • Theoretical Computer Science
  • Mathematics of Data


Multimatroids and Rational Curves with Cyclic Action


Emily Clader, Chiara Damiolini, Christopher Eur, Daoji Huang, Shiyue Li, Multimatroids and Rational Curves with Cyclic Action, International Mathematics Research Notices, 2024, rnae069, https://doi.org/10.1093/imrn/rnae069


We study the connection between multimatroids and moduli spaces of rational curves with cyclic action. Multimatroids are generalizations of matroids and delta-matroids that naturally arise in topological graph theory. The perspective of moduli of curves provides a tropical framework for studying multimatroids, generalizing the previous connection between type-A permutohedral varieties (Losev–Manin moduli spaces) and matroids, and the connection between type-B permutohedral varieties and delta-matroids. Specifically, we equate a combinatorial nef cone of the moduli space with the space of ℝ-multimatroids, a generalization of multimatroids, and we introduce the independence polytopal complex of a multimatroid, whose volume is identified with an intersection number on the moduli space. As an application, we give a combinatorial formula for a natural class of intersection numbers on the moduli space by relating to the volumes of independence polytopal complexes of multimatroids.
