Matbench Discovery

| Model | CPS | Acc | F1 | DAF | Prec | MAE | R2 | κSRME | RMSD | Training Set | Params | Targets | Date Added | rcut | Org |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| eSEN-30M-OAM | 0.888 | 0.977 | 0.925 | 6.069 | 0.928 | 0.018 | 0.866 | 0.170 | 0.061 | 6.6M (113M) OMat24+MPtrj+sAlex | 30.2M | EFSG | 2025-03-17 | 6 Å | |
| EquFlash | 0.888 | 0.975 | 0.919 | 5.983 | 0.915 | 0.019 | 0.871 | 0.158 | 0.060 | 6.6M (113M) OMat24+MPtrj+sAlex | 28.7M | EFSG | 2025-06-23 | 6 Å | Materials AI Lab at Samsung Electronics |
| Nequip-OAM-XL | 0.886 | 0.971 | 0.906 | 5.869 | 0.897 | 0.020 | 0.872 | 0.125 | 0.063 | 6.6M (113M) OMat24+sAlex+MPtrj | 32.1M | EFSG | 2025-11-30 | 6 Å | University of Cambridge |
| Nequip-OAM-L | 0.870 | 0.967 | 0.893 | 5.823 | 0.890 | 0.022 | 0.865 | 0.166 | 0.065 | 6.6M (113M) OMat24+sAlex+MPtrj | 9.6M | EFSG | 2025-09-08 | 6 Å | MIR Group, Harvard University |
| GRACE-2L-OAM-L | 0.865 | 0.964 | 0.883 | 5.840 | 0.893 | 0.022 | 0.862 | 0.169 | 0.064 | 6.6M (113M) OMat24+sAlex+MPtrj | 26.4M | EFSG | 2025-09-09 | 6 Å | ICAMS, Ruhr University Bochum |
| ORB v3 | 0.860 | 0.971 | 0.905 | 5.912 | 0.904 | 0.024 | 0.821 | 0.210 | 0.075 | 6.47M (133M) MPtrj+Alex+OMat24 | 25.5M | EFSG | 2025-04-05 | 6 Å | Orbital Materials |
| SevenNet-MF-ompa | 0.844 | 0.969 | 0.901 | 5.825 | 0.890 | 0.021 | 0.867 | 0.317 | 0.064 | 6.6M (113M) OMat24+sAlex+MPtrj | 25.7M | EFSG | 2025-03-13 | 6 Å | Seoul National University |
| Allegro-OAM-L | 0.840 | 0.966 | 0.895 | 5.674 | 0.867 | 0.022 | 0.868 | 0.319 | 0.065 | 6.6M (113M) OMat24+sAlex+MPtrj | 9.7M | EFSG | 2025-09-08 | 7 Å | MIR Group, Harvard University |
| GRACE-2L-OAM | 0.837 | 0.963 | 0.880 | 5.774 | 0.883 | 0.023 | 0.862 | 0.294 | 0.067 | 6.6M (113M) OMat24+sAlex+MPtrj | 12.6M | EFSG | 2025-02-06 | 6 Å | ICAMS, Ruhr University Bochum |
| DPA-3.1-3M-FT | 0.802 | 0.963 | 0.884 | 5.667 | 0.866 | 0.023 | 0.869 | 0.469 | 0.069 | 163M OpenLAM | 3.27M | EFSG | 2025-06-05 | 6 Å | AI for Science Institute, Beijing |
| eSEN-30M-MP | 0.797 | 0.946 | 0.831 | 5.260 | 0.804 | 0.033 | 0.822 | 0.340 | 0.075 | 146k (1.58M) MPtrj | 30.1M | EFSG | 2025-03-17 | 6 Å | |
| MACE-MPA-0 | 0.795 | 0.954 | 0.852 | 5.582 | 0.853 | 0.028 | 0.842 | 0.412 | 0.073 | 3.37M (12M) MPtrj+sAlex | 9.06M | EFSG | 2024-12-09 | 6 Å | University of Cambridge |
| AlphaNet-v1-OMA | 0.769 | 0.968 | 0.901 | 5.747 | 0.879 | 0.024 | 0.831 | 0.643 | 0.079 | 6.6M (113M) OMat24+sAlex+MPtrj | 4.65M | EFSG | 2025-05-12 | 5 Å | Tsinghua University |
| MatterSim v1 5M | 0.767 | 0.959 | 0.862 | 5.852 | 0.895 | 0.024 | 0.863 | 0.575 | 0.073 | 17M MatterSim | 4.55M | EFSG | 2024-12-16 | 5 Å | |
| GRACE-1L-OAM | 0.761 | 0.944 | 0.824 | 5.255 | 0.803 | 0.031 | 0.842 | 0.517 | 0.072 | 6.6M (113M) OMat24+sAlex+MPtrj | 3.45M | EFSG | 2025-02-06 | 6 Å | ICAMS, Ruhr University Bochum |
| Eqnorm MPtrj | 0.756 | 0.929 | 0.786 | 4.844 | 0.741 | 0.040 | 0.799 | 0.408 | 0.084 | 146k (1.58M) MPtrj | 1.31M | EFSG | 2025-05-26 | 6 Å | Zhejiang Lab |
| Nequip-MP-L | 0.733 | 0.921 | 0.761 | 4.704 | 0.719 | 0.043 | 0.791 | 0.452 | 0.086 | 146k (1.58M) MPtrj | 9.6M | EFSG | 2025-09-08 | 6 Å | MIR Group, Harvard University |
| Nequix MP | 0.729 | 0.914 | 0.751 | 4.455 | 0.681 | 0.044 | 0.782 | 0.446 | 0.085 | 146k (1.58M) MPtrj | 708k | EFSG | 2025-08-17 | 6 Å | Massachusetts Institute of Technology |
| Allegro-MP-L | 0.720 | 0.915 | 0.751 | 4.516 | 0.690 | 0.044 | 0.778 | 0.504 | 0.082 | 146k (1.58M) MPtrj | 18.7M | EFSG | 2025-09-08 | 6 Å | MIR Group, Harvard University |
| DPA-3.1-MPtrj | 0.718 | 0.936 | 0.803 | 5.024 | 0.768 | 0.037 | 0.812 | 0.650 | 0.080 | 146k (1.58M) MPtrj | 4.81M | EFSG | 2025-06-05 | 6 Å | AI for Science Institute, Beijing |
| SevenNet-l3i5 | 0.714 | 0.920 | 0.760 | 4.629 | 0.708 | 0.044 | 0.776 | 0.550 | 0.085 | 146k (1.58M) MPtrj | 1.17M | EFSG | 2024-12-10 | 5 Å | Seoul National University |
| HIENet | 0.707 | 0.929 | 0.777 | 4.932 | 0.754 | 0.041 | 0.793 | 0.642 | 0.080 | 146k (1.58M) MPtrj | 7.51M | EFSG | 2025-07-01 | 5 Å | Texas A&M University |
| GRACE-2L-MPtrj | 0.681 | 0.896 | 0.691 | 4.163 | 0.636 | 0.052 | 0.741 | 0.525 | 0.090 | 146k (1.58M) MPtrj | 15.3M | EFSG | 2024-11-21 | 6 Å | ICAMS, Ruhr University Bochum |
| MatRIS v0.5.0 MPtrj | 0.680 | 0.938 | 0.809 | 5.049 | 0.772 | 0.037 | 0.803 | 0.865 | 0.077 | 146k (1.58M) MPtrj | 5.83M | EFSGM | 2025-03-13 | 6 Å | Chinese Academy of Sciences |
| MACE-MP-0 | 0.637 | 0.878 | 0.669 | 3.777 | 0.577 | 0.057 | 0.697 | 0.682 | 0.091 | 146k (1.58M) MPtrj | 4.69M | EFSG | 2023-07-14 | 6 Å | University of Cambridge |
| eqV2 M | 0.558 | 0.975 | 0.917 | 6.047 | 0.924 | 0.020 | 0.848 | 1.771 | 0.069 | 3.37M (102M) OMat24+MPtrj | 86.6M | EFSD | 2024-10-18 | 12 Å | |
| ORB v2 | 0.528 | 0.965 | 0.880 | 6.041 | 0.924 | 0.028 | 0.824 | 1.734 | 0.097 | 3.25M (32.1M) MPtrj+Alex | 25.2M | EFSD | 2024-10-11 | 10 Å | Orbital Materials |
| eqV2 S DeNS | 0.522 | 0.941 | 0.815 | 5.042 | 0.771 | 0.036 | 0.788 | 1.676 | 0.076 | 146k (1.58M) MPtrj | 31.2M | EFSD | 2024-10-18 | 12 Å | |
| ORB v2 MPtrj | 0.470 | 0.922 | 0.765 | 4.702 | 0.719 | 0.045 | 0.756 | 1.726 | 0.101 | 146k (1.58M) MPtrj | 25.2M | EFSD | 2024-10-14 | 10 Å | Orbital Materials |
| CHGNet | 0.343 | 0.851 | 0.613 | 3.361 | 0.514 | 0.063 | 0.689 | 2.000 | 0.095 | 146k (1.58M) MPtrj | 413k | EFSGM | 2023-03-03 | 5 Å | UC Berkeley |
| M3GNet | 0.310 | 0.813 | 0.569 | 2.882 | 0.441 | 0.075 | 0.585 | 2.000 | 0.112 | 62.8k (188k) MPF | 228k | EFSG | 2022-09-20 | 5 Å | UC San Diego |
| GNoME | n/a | 0.955 | 0.829 | 5.523 | 0.844 | 0.035 | 0.785 | n/a | n/a | 6M (89M) GNoME | 16.2M | EFG | 2024-02-03 | 5 Å | Google DeepMind |
The CPS (Combined Performance Score) is a single metric that weights discovery performance (F1), geometry optimization quality (RMSD), and thermal conductivity prediction accuracy (κSRME).

The Training Set column shows the number of materials used to train each model; for models trained on DFT relaxations, the number of distinct frames appears in parentheses. Where only the frame count is known, we report it as the training set size. (N=x) in the Params column gives the number of estimators if an ensemble was used. DAF (Discovery Acceleration Factor) measures how many more stable materials a model finds compared to random selection from the test set. Since 32.9k of the 215k unique structure prototypes in the WBM test set (15.3%) are stable, the maximum possible DAF is (32.9k / 215k)⁻¹ ≈ 6.54.
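The DAF arithmetic above can be sanity-checked in a few lines (the precision value is taken from the eSEN-30M-OAM row of the table; small differences from the published DAF come from rounding):

```python
# Stable fraction of the WBM test set: 32.9k stable out of 215k prototypes
stable_fraction = 32_900 / 215_000  # ≈ 0.153

# A perfect model (precision = 1) achieves the maximum DAF
max_daf = 1 / stable_fraction  # ≈ 6.54

# A real model's DAF is its precision divided by the stable fraction,
# e.g. precision = 0.928 for eSEN-30M-OAM
daf = 0.928 / stable_fraction  # ≈ 6.07

print(round(max_daf, 2), round(daf, 2))
```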
Default CPS weights: F1 = 50%, κSRME = 40%, RMSD = 10%.


Matbench Discovery is an interactive leaderboard that ranks ML models on multiple tasks designed to simulate the high-throughput discovery of new stable inorganic crystals: predicting thermodynamic stability, finding ground-state atomic positions, and predicting thermal conductivity.

We rank 41 models covering multiple methodologies including graph neural network (GNN) interatomic potentials, GNN one-shot predictors, iterative Bayesian optimizers and random forests with shallow-learning structure fingerprints.

eSEN-30M-OAM achieves the highest F1 score of 0.925, an R2 of 0.866 and a discovery acceleration factor (DAF) of 6.069 (i.e. a ~6.1x higher rate of stable structures compared to random selection from the already enriched test set, which contains ~16% stable materials).

Our results show that ML models have become robust enough to deploy them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.

📖 Important: In Matbench Discovery, the convex hull used to evaluate stability is constructed from DFT reference energies, not from model predictions. This differs from some other benchmarking approaches and has important implications for metric interpretation. See /tasks/discovery for more information.
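A minimal sketch of this convention, with made-up energies: both the ground-truth labels and the model's candidate selection are thresholded at the DFT convex hull (e_above_hull ≤ 0 counts as stable), and the discovery metrics follow from the resulting confusion counts:

```python
# Toy example; energies in eV/atom are made up for illustration.
dft_e_hull = [-0.05, 0.02, -0.01, 0.30, 0.01, -0.10]   # DFT reference (defines the hull)
pred_e_hull = [-0.04, -0.01, 0.03, 0.25, 0.02, -0.08]  # model predictions

is_stable = [e <= 0 for e in dft_e_hull]      # ground truth from DFT energies
pred_stable = [e <= 0 for e in pred_e_hull]   # model's selected candidates

tp = sum(p and t for p, t in zip(pred_stable, is_stable))
fp = sum(p and not t for p, t in zip(pred_stable, is_stable))
fn = sum(not p and t for p, t in zip(pred_stable, is_stable))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
daf = precision / (sum(is_stable) / len(is_stable))  # enrichment over random picks
```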

To cite Matbench Discovery, use:

Riebesell, J., Goodall, R.E.A., Benner, P. et al. A framework to evaluate machine learning crystal stability predictions. Nat Mach Intell 7, 836–847 (2025). https://doi.org/10.1038/s42256-025-01055-1

We welcome new model additions to the leaderboard through GitHub PRs. See the contributing guide for details and ask support questions via GitHub discussions.

For detailed results and analysis, check out https://nature.com/articles/s42256-025-01055-1.

Disclaimer: We evaluate how accurately ML models predict several material properties like thermodynamic stability, thermal conductivity, and atomic positions, in all cases using PBE DFT as reference data. Although these properties are important for high-throughput materials discovery, the ranking cannot give a complete picture of a model’s overall ability to drive materials research. A high ranking does not constitute endorsement by the Materials Project.

For details on the κSRME modeling task and evaluation method, refer to arXiv:2408.00755. The only difference between the procedure presented by Póta, Ahlawat, Csányi, and Simoncelli and the results shown here is that the relaxation protocol has been simplified and unified for all models (a single simultaneous cell and site relaxation). See matbench_discovery/phonons/thermal_conductivity.py for code to predict 2nd- and 3rd-order force constants and matbench_discovery/metrics/phonons.py for code to compute κSRME.

GitHub Activity

Development activity and community engagement of MLIP GitHub repos. Points are sized by number of contributors and colored by number of commits over the last year.