
Matbench Discovery
TL;DR: We benchmark ML models on crystal stability prediction from unrelaxed structures and find universal interatomic potentials (UIPs) like M3GNet and CHGNet to be highly accurate, robust across chemistries, and ready for production use in high-throughput discovery pipelines.
Matbench Discovery is an interactive leaderboard and associated PyPI package which together make it easy to rank ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals.
So far, we’ve tested 9 models covering multiple methodologies, ranging from random forests with structure fingerprints to graph neural networks, and from one-shot predictors to iterative Bayesian optimizers and interatomic potential relaxers. We find CHGNet (paper) to achieve the highest F1 score of 0.59, an R² of 0.61 and a discovery acceleration factor (DAF) of 3.06 (meaning a 3x higher rate of stable structures compared to dummy selection in our already enriched search space). We believe our results show that ML models have become robust enough to deploy as triaging steps that more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.
Model | F1 | DAF | Precision | Accuracy | TPR | TNR | MAE (eV/atom) | RMSE (eV/atom) | R² | Model Class |
---|---|---|---|---|---|---|---|---|---|---|
CHGNet | 0.59 | 3.06 | 0.52 | 0.84 | 0.67 | 0.87 | 0.07 | 0.11 | 0.61 | UIP-GNN |
M3GNet | 0.58 | 2.66 | 0.45 | 0.80 | 0.79 | 0.80 | 0.07 | 0.12 | 0.59 | UIP-GNN |
ALIGNN | 0.57 | 2.87 | 0.49 | 0.82 | 0.66 | 0.86 | 0.09 | 0.15 | 0.27 | GNN |
MEGNet | 0.52 | 2.70 | 0.46 | 0.81 | 0.59 | 0.86 | 0.13 | 0.20 | -0.27 | GNN |
CGCNN | 0.52 | 2.62 | 0.45 | 0.81 | 0.60 | 0.85 | 0.14 | 0.23 | -0.61 | GNN |
CGCNN+P | 0.51 | 2.38 | 0.41 | 0.78 | 0.69 | 0.79 | 0.11 | 0.18 | 0.02 | GNN |
Wrenformer | 0.48 | 2.13 | 0.36 | 0.74 | 0.71 | 0.74 | 0.10 | 0.18 | -0.04 | Transformer |
BOWSR + MEGNet | 0.44 | 1.90 | 0.32 | 0.68 | 0.74 | 0.67 | 0.11 | 0.16 | 0.15 | BO+GNN |
Voronoi RF | 0.34 | 1.51 | 0.26 | 0.66 | 0.52 | 0.69 | 0.14 | 0.21 | -0.32 | Fingerprint+RF |
Dummy | 0.19 | 1.00 | 0.17 | 0.68 | 0.23 | 0.77 | 0.12 | 0.18 | 0.00 | scikit-learn |
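The DAF in the table above is a model's precision on the stable/unstable classification task divided by the prevalence of stable materials in the test set (i.e. the dummy precision of 0.17); e.g. CHGNet's 0.52 / 0.17 ≈ 3.06. A minimal sketch of how these metrics can be computed from predicted and DFT energies above the convex hull — the function name and default threshold here are illustrative, not the `matbench-discovery` package's API:

```python
import numpy as np

def stability_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Classify materials as stable if their energy above the convex hull
    (eV/atom) is <= threshold, then compute precision, recall (TPR), F1
    and the discovery acceleration factor (DAF = precision / prevalence)."""
    y_true = np.asarray(e_hull_true) <= threshold
    y_pred = np.asarray(e_hull_pred) <= threshold

    tp = np.sum(y_true & y_pred)   # correctly predicted stable
    fp = np.sum(~y_true & y_pred)  # predicted stable, actually unstable
    fn = np.sum(y_true & ~y_pred)  # missed stable materials

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # a.k.a. TPR
    f1 = 2 * precision * recall / (precision + recall)
    prevalence = y_true.mean()  # dummy precision: fraction of stable materials
    daf = precision / prevalence
    return {"precision": precision, "recall": recall, "F1": f1, "DAF": daf}
```

By construction, randomly selecting candidates from the test set gives a DAF of 1; values above 1 quantify the enrichment a model-guided selection provides over random screening.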
We welcome contributions that add new models to the leaderboard via GitHub PRs; see the contributing guide for details.
If you're interested in joining this effort, please open a GitHub discussion or reach out privately.
For detailed results and analysis, check out our preprint and SI.