Matbench Discovery
Columns
| Model | CPS ↑ | Acc ↑ | F1 ↑ | DAF ↑ | Prec ↑ | MAE ↓ | R² ↑ | κSRME ↓ | RMSD ↓ | Training Set | Params | Targets | Date Added | rcut |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| eSEN-30M-OAM | 0.888 | 0.977 | 0.925 | 6.069 | 0.928 | 0.018 | 0.866 | 0.170 | 0.061 | 6.6M (113M) OMat24+MPtrj+sAlex | 30.2M | EFSG | 2025-03-17 | 6 Å |
| EquFlash | 0.888 | 0.975 | 0.919 | 5.983 | 0.915 | 0.019 | 0.871 | 0.158 | 0.060 | 6.6M (113M) OMat24+MPtrj+sAlex | 28.7M | EFSG | 2025-06-23 | 6 Å |
| Nequip-OAM-XL | 0.886 | 0.971 | 0.906 | 5.869 | 0.897 | 0.020 | 0.872 | 0.125 | 0.063 | 6.6M (113M) OMat24+sAlex+MPtrj | 32.1M | EFSG | 2025-11-30 | 6 Å |
| Nequip-OAM-L | 0.870 | 0.967 | 0.893 | 5.823 | 0.890 | 0.022 | 0.865 | 0.166 | 0.065 | 6.6M (113M) OMat24+sAlex+MPtrj | 9.6M | EFSG | 2025-09-08 | 6 Å |
| GRACE-2L-OAM-L | 0.865 | 0.964 | 0.883 | 5.840 | 0.893 | 0.022 | 0.862 | 0.169 | 0.064 | 6.6M (113M) OMat24+sAlex+MPtrj | 26.4M | EFSG | 2025-09-09 | 6 Å |
| ORB v3 | 0.860 | 0.971 | 0.905 | 5.912 | 0.904 | 0.024 | 0.821 | 0.210 | 0.075 | 6.47M (133M) MPtrj+Alex+OMat24 | 25.5M | EFSG | 2025-04-05 | 6 Å |
| SevenNet-MF-ompa | 0.844 | 0.969 | 0.901 | 5.825 | 0.890 | 0.021 | 0.867 | 0.317 | 0.064 | 6.6M (113M) OMat24+sAlex+MPtrj | 25.7M | EFSG | 2025-03-13 | 6 Å |
| Allegro-OAM-L | 0.840 | 0.966 | 0.895 | 5.674 | 0.867 | 0.022 | 0.868 | 0.319 | 0.065 | 6.6M (113M) OMat24+sAlex+MPtrj | 9.7M | EFSG | 2025-09-08 | 7 Å |
| GRACE-2L-OAM | 0.837 | 0.963 | 0.880 | 5.774 | 0.883 | 0.023 | 0.862 | 0.294 | 0.067 | 6.6M (113M) OMat24+sAlex+MPtrj | 12.6M | EFSG | 2025-02-06 | 6 Å |
| DPA-3.1-3M-FT | 0.802 | 0.963 | 0.884 | 5.667 | 0.866 | 0.023 | 0.869 | 0.469 | 0.069 | 163M OpenLAM | 3.27M | EFSG | 2025-06-05 | 6 Å |
| eSEN-30M-MP | 0.797 | 0.946 | 0.831 | 5.260 | 0.804 | 0.033 | 0.822 | 0.340 | 0.075 | 146k (1.58M) MPtrj | 30.1M | EFSG | 2025-03-17 | 6 Å |
| MACE-MPA-0 | 0.795 | 0.954 | 0.852 | 5.582 | 0.853 | 0.028 | 0.842 | 0.412 | 0.073 | 3.37M (12M) MPtrj+sAlex | 9.06M | EFSG | 2024-12-09 | 6 Å |
| AlphaNet-v1-OMA | 0.769 | 0.968 | 0.901 | 5.747 | 0.879 | 0.024 | 0.831 | 0.643 | 0.079 | 6.6M (113M) OMat24+sAlex+MPtrj | 4.65M | EFSG | 2025-05-12 | 5 Å |
| MatterSim v1 5M | 0.767 | 0.959 | 0.862 | 5.852 | 0.895 | 0.024 | 0.863 | 0.575 | 0.073 | 17M MatterSim | 4.55M | EFSG | 2024-12-16 | 5 Å |
| GRACE-1L-OAM | 0.761 | 0.944 | 0.824 | 5.255 | 0.803 | 0.031 | 0.842 | 0.517 | 0.072 | 6.6M (113M) OMat24+sAlex+MPtrj | 3.45M | EFSG | 2025-02-06 | 6 Å |
| Eqnorm MPtrj | 0.756 | 0.929 | 0.786 | 4.844 | 0.741 | 0.040 | 0.799 | 0.408 | 0.084 | 146k (1.58M) MPtrj | 1.31M | EFSG | 2025-05-26 | 6 Å |
| Nequip-MP-L | 0.733 | 0.921 | 0.761 | 4.704 | 0.719 | 0.043 | 0.791 | 0.452 | 0.086 | 146k (1.58M) MPtrj | 9.6M | EFSG | 2025-09-08 | 6 Å |
| Nequix MP | 0.729 | 0.914 | 0.751 | 4.455 | 0.681 | 0.044 | 0.782 | 0.446 | 0.085 | 146k (1.58M) MPtrj | 708k | EFSG | 2025-08-17 | 6 Å |
| Allegro-MP-L | 0.720 | 0.915 | 0.751 | 4.516 | 0.690 | 0.044 | 0.778 | 0.504 | 0.082 | 146k (1.58M) MPtrj | 18.7M | EFSG | 2025-09-08 | 6 Å |
| DPA-3.1-MPtrj | 0.718 | 0.936 | 0.803 | 5.024 | 0.768 | 0.037 | 0.812 | 0.650 | 0.080 | 146k (1.58M) MPtrj | 4.81M | EFSG | 2025-06-05 | 6 Å |
| SevenNet-l3i5 | 0.714 | 0.920 | 0.760 | 4.629 | 0.708 | 0.044 | 0.776 | 0.550 | 0.085 | 146k (1.58M) MPtrj | 1.17M | EFSG | 2024-12-10 | 5 Å |
| HIENet | 0.707 | 0.929 | 0.777 | 4.932 | 0.754 | 0.041 | 0.793 | 0.642 | 0.080 | 146k (1.58M) MPtrj | 7.51M | EFSG | 2025-07-01 | 5 Å |
| GRACE-2L-MPtrj | 0.681 | 0.896 | 0.691 | 4.163 | 0.636 | 0.052 | 0.741 | 0.525 | 0.090 | 146k (1.58M) MPtrj | 15.3M | EFSG | 2024-11-21 | 6 Å |
| MatRIS v0.5.0 MPtrj | 0.680 | 0.938 | 0.809 | 5.049 | 0.772 | 0.037 | 0.803 | 0.865 | 0.077 | 146k (1.58M) MPtrj | 5.83M | EFSGM | 2025-03-13 | 6 Å |
| MACE-MP-0 | 0.637 | 0.878 | 0.669 | 3.777 | 0.577 | 0.057 | 0.697 | 0.682 | 0.091 | 146k (1.58M) MPtrj | 4.69M | EFSG | 2023-07-14 | 6 Å |
| eqV2 M | 0.558 | 0.975 | 0.917 | 6.047 | 0.924 | 0.020 | 0.848 | 1.771 | 0.069 | 3.37M (102M) OMat24+MPtrj | 86.6M | EFSD | 2024-10-18 | 12 Å |
| ORB v2 | 0.528 | 0.965 | 0.880 | 6.041 | 0.924 | 0.028 | 0.824 | 1.734 | 0.097 | 3.25M (32.1M) MPtrj+Alex | 25.2M | EFSD | 2024-10-11 | 10 Å |
| eqV2 S DeNS | 0.522 | 0.941 | 0.815 | 5.042 | 0.771 | 0.036 | 0.788 | 1.676 | 0.076 | 146k (1.58M) MPtrj | 31.2M | EFSD | 2024-10-18 | 12 Å |
| ORB v2 MPtrj | 0.470 | 0.922 | 0.765 | 4.702 | 0.719 | 0.045 | 0.756 | 1.726 | 0.101 | 146k (1.58M) MPtrj | 25.2M | EFSD | 2024-10-14 | 10 Å |
| CHGNet | 0.343 | 0.851 | 0.613 | 3.361 | 0.514 | 0.063 | 0.689 | 2.000 | 0.095 | 146k (1.58M) MPtrj | 413k | EFSGM | 2023-03-03 | 5 Å |
| M3GNet | 0.310 | 0.813 | 0.569 | 2.882 | 0.441 | 0.075 | 0.585 | 2.000 | 0.112 | 62.8k (188k) MPF | 228k | EFSG | 2022-09-20 | 5 Å |
| GNoME | n/a | 0.955 | 0.829 | 5.523 | 0.844 | 0.035 | 0.785 | n/a | n/a | 6M (89M) GNoME | 16.2M | EFG | 2024-02-03 | 5 Å |
The training set column shows the number of materials used to train the model. For models trained on DFT relaxations, we show the number of distinct frames in parentheses. In cases where only the number of frames is known, we report the number of frames as the training set size.
An (N=x) suffix in the Params column shows the number of estimators if an ensemble was used. DAF = Discovery Acceleration Factor: it measures how many more stable materials a model finds compared to random selection from the test set. The unique structure prototypes in the WBM test set have a 15.3% rate of stable crystals, so the maximum possible DAF is (32.9k / 215k)⁻¹ ≈ 6.54.

Compare models across different metrics and parameters:
- Phonons > κSRME (31 models)
- Combined Performance Score (41 models)
- Discovery > Unique Prototypes > F1 Score (41 models)
- Number of model parameters (41 models)
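The DAF arithmetic above can be sketched in a few lines. The prevalence counts (32.9k stable out of 215k unique prototypes) are the ones quoted above, and the 0.928 precision figure is eSEN-30M-OAM's from the table:

```python
# Sketch of the Discovery Acceleration Factor (DAF) and its upper bound,
# using the stable-prevalence numbers quoted in the text above.
n_total = 215_000   # unique structure prototypes in the WBM test set
n_stable = 32_900   # of which are stable w.r.t. the DFT convex hull

prevalence = n_stable / n_total  # ≈ 0.153, i.e. the 15.3% quoted above
max_daf = 1 / prevalence         # ≈ 6.54: a perfect model's DAF


def daf(precision: float, prevalence: float) -> float:
    """DAF = model hit rate (precision) over the random-selection hit rate."""
    return precision / prevalence


# eSEN-30M-OAM's precision of 0.928 yields a DAF close to its table value.
esen_daf = daf(0.928, prevalence)
```

A model that only ever proposes stable materials (precision = 1) hits the 1/prevalence ceiling, which is why the maximum DAF depends solely on the test set's stable fraction.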
Matbench Discovery is an interactive leaderboard which ranks ML models on multiple tasks designed to simulate high-throughput discovery of new stable inorganic crystals, finding their ground state atomic positions and predicting their thermal conductivity.
We rank 41 models covering multiple methodologies including graph neural network (GNN) interatomic potentials, GNN one-shot predictors, iterative Bayesian optimizers and random forests with shallow-learning structure fingerprints.
eSEN-30M-OAM (paper, code) achieves the highest F1 score of 0.925, an R² of 0.866 and a discovery acceleration factor (DAF) of 6.069, i.e. a ~6.1× higher rate of stable structures compared to dummy discovery in the already enriched test set containing 16% stable materials. Our results show that ML models have become robust enough to deploy as a triage step that more effectively allocates compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.
📖 Important: In Matbench Discovery, the convex hull used to evaluate stability is constructed from DFT reference energies, not from model predictions. This differs from some other benchmarking approaches and has important implications for metric interpretation. See
`/tasks/discovery` for more information.
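The note above can be illustrated with a toy sketch (all numbers invented): because both the ground-truth label and the model's score are energies above the same DFT-built hull, stability classification reduces to a sign test on the predicted hull distance, and classification metrics follow directly:

```python
# Toy illustration of DFT-hull-referenced stability classification.
# All hull distances below are made up for the example (units: eV/atom).
dft_e_above_hull = [-0.02, 0.01, 0.15, -0.05]   # ground truth from DFT
model_e_above_hull = [-0.01, -0.03, 0.12, 0.02]  # model predictions

# A material counts as stable when its energy above the DFT hull is <= 0.
true_stable = [e <= 0 for e in dft_e_above_hull]
pred_stable = [e <= 0 for e in model_e_above_hull]

# Confusion-matrix counts, from which precision, F1 and DAF all derive.
tp = sum(t and p for t, p in zip(true_stable, pred_stable))
fp = sum((not t) and p for t, p in zip(true_stable, pred_stable))
fn = sum(t and (not p) for t, p in zip(true_stable, pred_stable))
precision = tp / (tp + fp)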
To cite Matbench Discovery, use:
Riebesell, J., Goodall, R.E.A., Benner, P. et al. A framework to evaluate machine learning crystal stability predictions. Nat Mach Intell 7, 836–847 (2025). https://doi.org/10.1038/s42256-025-01055-1
We welcome new models additions to the leaderboard through GitHub PRs. See the contributing guide for details and ask support questions via GitHub discussion.
For detailed results and analysis, check out https://nature.com/articles/s42256-025-01055-1.
Disclaimer: We evaluate how accurately ML models predict several material properties like thermodynamic stability, thermal conductivity, and atomic positions, in all cases using PBE DFT as reference data. Although these properties are important for high-throughput materials discovery, the ranking cannot give a complete picture of a model’s overall ability to drive materials research. A high ranking does not constitute endorsement by the Materials Project.
For details on the κSRME modeling task and evaluation method, refer to arXiv:2408.00755. The only difference between the procedure presented by Póta, Ahlawat, Csányi, and Simoncelli, and the results shown here is the relaxation protocol has been simplified and unified for all models (just a single simultaneous cell and site relaxation). See
matbench_discovery/phonons/thermal_conductivity.pyfor code to predict 2nd and 3rd order force constants andmatbench_discovery/metrics/phonons.pyfor code to compute κSRME.
GitHub Activity
Development activity and community engagement of MLIP GitHub repos. Points are sized by number of contributors and colored by number of commits over the last year.