COSMOSDataset

Created: Oct 21, 2025 | 243M structures | Open | CC-BY-4.0

Description

The COSMOS Dataset (Combined Organic, Surface and Materials Open Source Dataset) aggregates 15 publicly available ab initio databases spanning molecules, inorganic crystals, metal and oxide surfaces, and metal-organic frameworks, with levels of theory ranging from the generalized gradient approximation to hybrid functionals. It also includes a domain-bridging set, sampled from the OC20, OC22, MatPES, ODAC23, OMol25, and QCML datasets and recomputed with MPtrj-consistent computational settings, which facilitates cross-domain knowledge transfer in a multi-task training framework. For details on the dataset composition, see Table 1 of the paper.
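One way to picture the role of the domain-bridging set in a multi-task setup is sketched below. It is an illustrative assumption, not the authors' pipeline: the record layout (`source`, `task`, `bridging`), the task label names, and both helper functions are hypothetical. The only idea taken from the description is that bridging samples, although drawn from other databases, are recomputed with MPtrj-consistent settings and can therefore share a task with MPtrj.

```python
from collections import defaultdict

# Hypothetical label for the task whose reference data uses MPtrj-consistent settings.
BRIDGE_TASK = "task_mptrj"


def assign_task(record):
    """Return the training-task label for one structure record.

    `record` is a hypothetical dict with keys "source", "task", and "bridging".
    Bridging samples are routed to the MPtrj task regardless of their origin,
    because they were recomputed with MPtrj-consistent settings.
    """
    if record["bridging"]:
        return BRIDGE_TASK
    return record["task"]


def group_by_task(records):
    """Group records by the task label returned by assign_task."""
    groups = defaultdict(list)
    for record in records:
        groups[assign_task(record)].append(record)
    return dict(groups)


# Toy records; the labels are placeholders, not the real dataset schema.
records = [
    {"source": "MPtrj", "task": "task_mptrj", "bridging": False},
    {"source": "OC20", "task": "task_oc20", "bridging": False},
    {"source": "OC20", "task": "task_oc20", "bridging": True},
    {"source": "OMol25", "task": "task_omol25", "bridging": True},
]

groups = group_by_task(records)
for task, members in groups.items():
    print(task, [m["source"] for m in members])
```

In this toy grouping, the MPtrj task ends up containing structures from several domains, which is the sense in which a bridging set could link otherwise separate tasks.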

Derived From

  1. MPtrj
  2. OMat24
  3. Subsampled Alexandria
  4. Open Catalyst 2020
  5. MatPES v1.0 PBE
  6. MatPES v1.0 r2SCAN

Methodology

  • Method: DFT
  • Code: Various
  • Functional: Various
  • Pseudopotentials: Various

Authors

  1. Jaesun Kim (Seoul National University)
  2. Jinmu You (Seoul National University)
  3. Seungwu Han (Seoul National University, Korea Institute for Advanced Study)
