You can run this notebook online in a Binder session or view it on GitHub.

First Steps

MolSSI

The Molecular Sciences Software Institute (MolSSI) hosts the Quantum Chemistry Archive (QCArchive) and makes its data available to the entire Computational Molecular Sciences community free of charge. The QCArchive is both a database for viewing, analyzing, and exploring existing data and a live instance that continuously generates new data as directed by the community.

QCArchive

The primary Python interface to this database is a FractalClient from the qcportal package, which can be installed via pip (pip install qcportal) or conda (conda install qcportal -c conda-forge). A new FractalClient automatically connects to MolSSI’s central server and has access to all data contained within the QCArchive.

[1]:
import qcportal as ptl
client = ptl.FractalClient()
client
[1]:

FractalClient

  • Server:   The MolSSI QCArchive Server
  • Address:   https://api.qcarchive.molssi.org:443/
  • Username:   None
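
The connection step can fail if qcportal is not installed or the server cannot be reached. The following is a minimal sketch of a guarded connection. The helper name connect_to_qcarchive is hypothetical and not part of qcportal, and catching every exception is an assumption made to keep the example short. Only the ptl.FractalClient() call comes from the cell above. Some qcportal releases may not provide FractalClient at all, in which case the helper also returns None.

```python
def connect_to_qcarchive():
    """Return a FractalClient for the public server, or None if that fails.

    Hypothetical helper for illustration; it is not part of qcportal.
    """
    try:
        import qcportal as ptl
    except ImportError:
        print("qcportal is not installed; try `pip install qcportal`.")
        return None

    try:
        # With no arguments the client targets MolSSI's central server.
        return ptl.FractalClient()
    except Exception as exc:  # assumption: treat any failure as "unavailable"
        print(f"Could not create a FractalClient: {exc}")
        return None


client = connect_to_qcarchive()
if client is not None:
    print(client)
```

Returning None rather than raising keeps the rest of a notebook runnable offline, at the cost of hiding the exact error type.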

Finding Collections

One of the main ways to explore the QCArchive is to examine Collections, which are structures that allow easy manipulation of data in preset ways. Several examples of Collections contained within the QCArchive are as follows:

  • Dataset - A dataset where each record corresponds to a single molecule, with one or more QM methods applied to that molecule.

  • ReactionDataset - A dataset where each record is a combination of molecules (e.g. interaction and reaction energies). Each record contains data from one or more QM methods.

  • OptimizationDataset - A dataset where each record represents geometry optimization of a molecule.

  • TorsionDriveDataset - A dataset which organizes many molecular torsion scans together for data exploration, analysis, and methodology comparison (see the TorsionDrive Dataset example for more details).

[2]:
client.list_collections().head()
[2]:
collection  name                            tagline
Dataset     GDB13-T                         In progress
            OpenFF Discrepancy Benchmark 1  None
            OpenFF NCI250K Boron 1          None
            OpenFF Optimization Set 1       None
            OpenFF VEHICLe Set 1            None

Specific Collection types can be queried to limit the number of collections to browse through:

[3]:
client.list_collections("reactiondataset").head()
[3]:
collection       name       tagline
ReactionDataset  A21        Equilibrium complexes from A24 database of sma...
                 A24        Interaction energies for small bimolecular com...
                 ACONF      Conformation energies for alkanes
                 AlkBind12  Binding energies of saturated and unsaturated ...
                 AlkIsod14  Isodesmic reaction energies for alkanes N=3--8
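
The query above was written in lowercase ("reactiondataset") yet returned rows labeled ReactionDataset, which suggests the type filter ignores case. The sketch below reproduces that behavior locally on a few rows copied from the two listings shown above. The function filter_by_type is a hypothetical stand-in for illustration. The real filtering is done by client.list_collections, whose internals are not shown here.

```python
# Rows copied from the listings above: (collection type, name, tagline).
listing = [
    ("Dataset", "OpenFF Optimization Set 1", None),
    ("Dataset", "OpenFF VEHICLe Set 1", None),
    ("ReactionDataset", "A24", "Interaction energies for small bimolecular com..."),
    ("ReactionDataset", "ACONF", "Conformation energies for alkanes"),
    ("ReactionDataset", "AlkIsod14", "Isodesmic reaction energies for alkanes N=3--8"),
]


def filter_by_type(rows, collection_type):
    """Keep rows whose collection type matches, ignoring case."""
    wanted = collection_type.lower()
    return [row for row in rows if row[0].lower() == wanted]


for _, name, tagline in filter_by_type(listing, "reactiondataset"):
    print(f"{name}: {tagline}")
```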

Exploring Collections

Collections can be obtained by pulling their data from the central server. A collection is primarily metadata, so even extremely large collections can be pulled in a few seconds. For this example, we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:

[4]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)
ReactionDataset(name=`S22`, id='184', client='https://api.qcarchive.molssi.org:443/')

Statistics and Visualization

Visual statistics and plots can be generated with the visualize command:

[5]:
ds.visualize(method="B2PLYP", basis=["def2-svp", "def2-tzvp"], bench="S220", kind="violin")
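
To make the role of the bench argument concrete, the sketch below computes the kind of error statistic such a plot would summarize, assuming the plot compares each method's values against the benchmark column. The dimer labels and energies are invented for illustration and are not QCArchive data, and the calculation is a plain-Python stand-in rather than what visualize does internally.

```python
from statistics import mean

# Hypothetical interaction energies; these numbers are NOT from the QCArchive.
benchmark = {"dimer A": -5.0, "dimer B": -3.0, "dimer C": -1.5}
computed = {"dimer A": -5.4, "dimer B": -2.8, "dimer C": -1.5}

# Signed error of the computed value relative to the benchmark, per record.
errors = {name: computed[name] - benchmark[name] for name in benchmark}

# Mean unsigned error: one number summarizing the error distribution.
mue = mean(abs(err) for err in errors.values())

for name, err in errors.items():
    print(f"{name}: {err:+.2f}")
print(f"Mean unsigned error: {mue:.2f}")
```

A violin plot shows the full distribution of such per-record errors rather than a single summary value, which makes outliers easier to spot.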

Next steps

Congratulations! You have taken the first steps toward exploring the data within the QCArchive. Please consider viewing the Reaction Dataset and the TorsionDrive Dataset examples for a more in-depth look at these Collections and what you can do with them.

Feel free to explore the data you access through these examples in detail. When you connect a FractalClient to the server without a username and password, the data is open to explore, but you cannot alter what is saved on the server itself. So if you change your local data, the server data remains untouched!