Crystal Stability Prediction Metrics
| Model | Acc ↑ | F1 ↑ | DAF ↑ | Prec ↑ | TNR ↑ | TPR ↑ | MAE ↓ | R2 ↑ | RMSE ↓ | Date Added | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|
| eSEN-30M-OAM | 0.977 | 0.925 | 6.069 | 0.928 | 0.987 | 0.923 | 0.018 | 0.866 | 0.067 | 2025-03-17 | |
| EquFlash | 0.975 | 0.919 | 5.983 | 0.915 | 0.984 | 0.922 | 0.019 | 0.871 | 0.066 | 2025-06-23 | |
| Nequip-OAM-XL | 0.971 | 0.906 | 5.869 | 0.897 | 0.981 | 0.915 | 0.020 | 0.872 | 0.066 | 2025-11-30 | |
| Nequip-OAM-L | 0.967 | 0.893 | 5.823 | 0.890 | 0.980 | 0.895 | 0.022 | 0.865 | 0.068 | 2025-09-08 | |
| GRACE-2L-OAM-L | 0.964 | 0.883 | 5.840 | 0.893 | 0.981 | 0.874 | 0.022 | 0.862 | 0.068 | 2025-09-09 | |
| ORB v3 | 0.971 | 0.905 | 5.912 | 0.904 | 0.982 | 0.907 | 0.024 | 0.821 | 0.078 | 2025-04-05 | |
| SevenNet-MF-ompa | 0.969 | 0.901 | 5.825 | 0.890 | 0.979 | 0.911 | 0.021 | 0.867 | 0.067 | 2025-03-13 | |
| Allegro-OAM-L | 0.966 | 0.895 | 5.674 | 0.867 | 0.974 | 0.923 | 0.022 | 0.868 | 0.067 | 2025-09-08 | |
| GRACE-2L-OAM | 0.963 | 0.880 | 5.774 | 0.883 | 0.979 | 0.878 | 0.023 | 0.862 | 0.068 | 2025-02-06 | |
| DPA-3.1-3M-FT | 0.963 | 0.884 | 5.667 | 0.866 | 0.974 | 0.903 | 0.023 | 0.869 | 0.067 | 2025-06-05 | |
| eSEN-30M-MP | 0.946 | 0.831 | 5.260 | 0.804 | 0.962 | 0.861 | 0.033 | 0.822 | 0.078 | 2025-03-17 | |
| MACE-MPA-0 | 0.954 | 0.852 | 5.582 | 0.853 | 0.973 | 0.851 | 0.028 | 0.842 | 0.073 | 2024-12-09 | |
| AlphaNet-v1-OMA | 0.968 | 0.901 | 5.747 | 0.879 | 0.977 | 0.924 | 0.024 | 0.831 | 0.076 | 2025-05-12 | |
| MatterSim v1 5M | 0.959 | 0.862 | 5.852 | 0.895 | 0.982 | 0.831 | 0.024 | 0.863 | 0.068 | 2024-12-16 | |
| GRACE-1L-OAM | 0.944 | 0.824 | 5.255 | 0.803 | 0.962 | 0.846 | 0.031 | 0.842 | 0.073 | 2025-02-06 | |
| Eqnorm MPtrj | 0.929 | 0.786 | 4.844 | 0.741 | 0.946 | 0.838 | 0.040 | 0.799 | 0.083 | 2025-05-26 | |
| Nequip-MP-L | 0.921 | 0.761 | 4.704 | 0.719 | 0.942 | 0.809 | 0.043 | 0.791 | 0.084 | 2025-09-08 | |
| Nequix MP | 0.914 | 0.751 | 4.455 | 0.681 | 0.928 | 0.836 | 0.044 | 0.782 | 0.086 | 2025-08-17 | |
| Allegro-MP-L | 0.915 | 0.751 | 4.516 | 0.690 | 0.932 | 0.823 | 0.044 | 0.778 | 0.087 | 2025-09-08 | |
| DPA-3.1-MPtrj | 0.936 | 0.803 | 5.024 | 0.768 | 0.953 | 0.841 | 0.037 | 0.812 | 0.080 | 2025-06-05 | |
| SevenNet-l3i5 | 0.920 | 0.760 | 4.629 | 0.708 | 0.938 | 0.821 | 0.044 | 0.776 | 0.087 | 2024-12-10 | |
| HIENet | 0.929 | 0.777 | 4.932 | 0.754 | 0.952 | 0.801 | 0.041 | 0.793 | 0.084 | 2025-07-01 | |
| GRACE-2L-MPtrj | 0.896 | 0.691 | 4.163 | 0.636 | 0.921 | 0.757 | 0.052 | 0.741 | 0.094 | 2024-11-21 | |
| MatRIS v0.5.0 MPtrj | 0.938 | 0.809 | 5.049 | 0.772 | 0.954 | 0.850 | 0.037 | 0.803 | 0.082 | 2025-03-13 | |
| MACE-MP-0 | 0.878 | 0.669 | 3.777 | 0.577 | 0.893 | 0.796 | 0.057 | 0.697 | 0.101 | 2023-07-14 | |
| eqV2 M | 0.975 | 0.917 | 6.047 | 0.924 | 0.986 | 0.910 | 0.020 | 0.848 | 0.072 | 2024-10-18 | |
| ORB v2 | 0.965 | 0.880 | 6.041 | 0.924 | 0.987 | 0.841 | 0.028 | 0.824 | 0.077 | 2024-10-11 | |
| eqV2 S DeNS | 0.941 | 0.815 | 5.042 | 0.771 | 0.953 | 0.864 | 0.036 | 0.788 | 0.085 | 2024-10-18 | |
| ORB v2 MPtrj | 0.922 | 0.765 | 4.702 | 0.719 | 0.941 | 0.817 | 0.045 | 0.756 | 0.091 | 2024-10-14 | |
| CHGNet | 0.851 | 0.613 | 3.361 | 0.514 | 0.868 | 0.758 | 0.063 | 0.689 | 0.103 | 2023-03-03 | |
| M3GNet | 0.813 | 0.569 | 2.882 | 0.441 | 0.813 | 0.803 | 0.075 | 0.585 | 0.118 | 2022-09-20 | |
| GNoME | 0.955 | 0.829 | 5.523 | 0.844 | 0.972 | 0.814 | 0.035 | 0.785 | 0.085 | 2024-02-03 | |
Convex Hull Construction in Matbench Discovery
In Matbench Discovery, the convex hull is always constructed from DFT reference energies, not from the ML model’s predicted energies. This methodological choice differs from some other benchmarking approaches, and it determines how the energy metrics in Matbench Discovery should be interpreted.
What This Means
- DFT-based hull: When we calculate the distance to the convex hull (Ehull dist) for a material, we compare the model’s predicted formation energy against the DFT-computed convex hull built from Materials Project reference structures.
- Fixed reference: The hull does not change based on the model’s predictions. All models are evaluated against the same DFT reference hull.
- Discovery criterion: A material is counted as a “discovery” if the model correctly predicts it to be lower in energy than all known DFT-computed competing phases with the same (reduced) composition in Materials Project. The reference data was pulled on 2023-03-16 (14 GB), database release v2022.10.28.
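The points above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's implementation: the arrays are made-up toy values in eV/atom, `stability_metrics` is a hypothetical helper, and both the zero stability threshold and the definition of DAF as precision divided by the prevalence of stable materials are assumptions that this section does not state.

```python
import numpy as np

# Toy values in eV/atom, invented for illustration (not benchmark data).
e_form_dft = np.array([-1.20, -0.85, -0.40, -0.95, -0.10, -0.60])
e_form_pred = np.array([-1.18, -0.80, -0.47, -0.99, -0.02, -0.66])
# Energy of the fixed DFT reference hull at each material's composition.
e_hull_dft = np.array([-1.15, -0.90, -0.45, -0.90, -0.30, -0.62])

# Both distances use the SAME DFT hull; the model never rebuilds it.
d_true = e_form_dft - e_hull_dft
d_pred = e_form_pred - e_hull_dft


def stability_metrics(d_true, d_pred, threshold=0.0):
    """Classify materials as stable if hull distance <= threshold (assumed)."""
    true_stable = d_true <= threshold
    pred_stable = d_pred <= threshold
    tp = int(np.sum(pred_stable & true_stable))
    fp = int(np.sum(pred_stable & ~true_stable))
    tn = int(np.sum(~pred_stable & ~true_stable))
    fn = int(np.sum(~pred_stable & true_stable))
    precision = tp / (tp + fp)
    prevalence = (tp + fn) / len(d_true)
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / len(d_true),
        "precision": precision,
        "tpr": tp / (tp + fn),
        "tnr": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        # Assumed definition: precision relative to random selection.
        "daf": precision / prevalence,
    }


metrics = stability_metrics(d_true, d_pred)
print(metrics)
```

In this toy set the model flags four materials as stable, of which two are truly below the DFT hull, so the two false positives are the candidates that a follow-up DFT calculation would reject.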
Why This Matters
This approach means that:
- Formation energy MAE = Hull distance MAE: Both the model’s prediction and the DFT reference are measured on the same energy scale (relative to the same elemental references), and the hull distance is the formation energy minus the energy of the fixed DFT hull at that composition. The same value is therefore subtracted from both the prediction and the reference, so each per-structure error, and hence the MAE, is unchanged.
- Systematic errors are not canceled: If a model has systematic errors (e.g., consistently over- or underpredicting certain elements), these errors appear in both the formation energy and hull distance metrics. Because the hull is fixed, such errors cannot cancel between the test structures and the reference hull.
  - Advantage: Tests absolute accuracy of model predictions against ground truth.
  - Use case: Pre-screening candidates for DFT calculations and evaluating a model’s ability to identify materials below the DFT reference hull.
- Different from some literature: Papers like Nature Communications 11:3793 (2020) construct hulls from model predictions, allowing systematic model errors to partially cancel. In that approach, the hull distance MAE can differ from the formation energy MAE.
  - Advantage: Systematic model errors can partially cancel.
  - Use case: Using the model as a complete replacement for DFT.
This distinction is subtle but important for correctly interpreting model performance and making fair comparisons between different benchmarks.
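The difference between the two hull conventions can be checked numerically. The sketch below uses invented toy values in eV/atom and a deliberately simplified stand-in for a model-built hull: it assumes the model's hull carries exactly the same constant offset as its predictions, which a real model-predicted hull would only approximate. The names `offset`, `noise`, and `mae` are illustrative.

```python
import numpy as np

# Toy values in eV/atom, invented for illustration (not benchmark data).
e_form_dft = np.array([-1.20, -0.85, -0.40, -0.95])
e_hull_dft = np.array([-1.15, -0.90, -0.45, -0.90])

offset = 0.05  # hypothetical systematic over-prediction shared by all outputs
noise = np.array([0.01, -0.02, 0.02, -0.01])  # hypothetical random error
e_form_pred = e_form_dft + offset + noise


def mae(pred, ref):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(pred - ref)))


# Error on formation energies.
mae_form = mae(e_form_pred, e_form_dft)

# Fixed DFT hull: the same hull energy is subtracted from prediction and
# reference, so every per-structure error is unchanged.
mae_hull_fixed = mae(e_form_pred - e_hull_dft, e_form_dft - e_hull_dft)

# Simplified model-built hull (assumption): the hull inherits the model's
# constant offset, so the offset cancels and only the noise remains.
e_hull_model = e_hull_dft + offset
mae_hull_model = mae(e_form_pred - e_hull_model, e_form_dft - e_hull_dft)

print(f"{mae_form:.3f} {mae_hull_fixed:.3f} {mae_hull_model:.3f}")
```

With a fixed hull the two MAE values agree to floating-point precision, while the model-built hull reports a smaller error because the shared offset drops out. That smaller number reflects error cancellation rather than better agreement with DFT energies.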