You can run this notebook online in a Binder session or view it on GitHub.
First Steps
The Molecular Sciences Software Institute (MolSSI) hosts the Quantum Chemistry Archive (QCArchive) and makes this data available to the entire Computational Molecular Sciences community free of charge. The QCArchive is both a database for viewing, analyzing, and exploring existing data and a live instance that continuously generates new data as directed by the community.
The primary interface to this database in Python is through a FractalClient from the qcportal package, which can be installed via pip (pip install qcportal) or conda (conda install qcportal -c conda-forge). A new FractalClient automatically connects to MolSSI’s central server and has access to all data contained within the QCArchive.
[1]:
import qcportal as ptl
client = ptl.FractalClient()
client
[1]:
FractalClient
- Server: The MolSSI QCArchive Server
- Address: https://api.qcarchive.molssi.org:443/
- Username: None
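If the import in the cell above fails, the package is probably not installed in the active environment. A minimal sketch of a guard around client creation follows; the helper names is_installed and connect are hypothetical and not part of qcportal, and only the FractalClient() call is taken from the cell above.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package) is not None


def connect():
    """Create a FractalClient, or raise a readable error if qcportal is missing.

    Hypothetical convenience wrapper; it only calls ptl.FractalClient(),
    exactly as in the notebook cell above.
    """
    if not is_installed("qcportal"):
        raise ImportError(
            "qcportal not found; install it with 'pip install qcportal' "
            "or 'conda install qcportal -c conda-forge'."
        )
    import qcportal as ptl  # imported lazily so the check above runs first

    return ptl.FractalClient()
```

Calling connect() needs network access to the MolSSI server, so the sketch defines the function without invoking it.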
Finding Collections
One of the main ways to explore the QCArchive is to examine Collections, which are structures that allow easy manipulation of data in preset ways. Several examples of Collections contained within the QCArchive are as follows:
- Dataset: A dataset where each record corresponds to a single molecule, with one or more QM methods applied to that molecule.
- ReactionDataset: A dataset where each record is a combination of molecules (e.g. interaction and reaction energies). Each record contains data from one or more QM methods.
- OptimizationDataset: A dataset where each record represents geometry optimization of a molecule.
- TorsionDriveDataset: A dataset which organizes many molecular torsion scans together for data exploration, analysis, and methodology comparison (see the TorsionDrive Dataset example for more details).
[2]:
client.list_collections().head()
[2]:
collection | name | tagline
---|---|---
Dataset | GDB13-T | In progress
Dataset | OpenFF Discrepancy Benchmark 1 | None
Dataset | OpenFF NCI250K Boron 1 | None
Dataset | OpenFF Optimization Set 1 | None
Dataset | OpenFF VEHICLe Set 1 | None
Specific Collection types can be queried to limit the number of collections to browse through:
[3]:
client.list_collections("reactiondataset").head()
[3]:
collection | name | tagline
---|---|---
ReactionDataset | A21 | Equilibrium complexes from A24 database of sma...
ReactionDataset | A24 | Interaction energies for small bimolecular com...
ReactionDataset | ACONF | Conformation energies for alkanes
ReactionDataset | AlkBind12 | Binding energies of saturated and unsaturated ...
ReactionDataset | AlkIsod14 | Isodesmic reaction energies for alkanes N=3--8
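To make the effect of the type filter concrete, here is a small pure-Python sketch that narrows a listing of (collection type, name) pairs to a single type. It does not show how the server or qcportal implements the filter; filter_by_type is a hypothetical helper, and the case-insensitive match is an assumption inferred from the lowercase "reactiondataset" argument used above. The names are copied from the two listings shown in this section.

```python
# (collection type, name) pairs copied from the listings shown in this section.
LISTING = [
    ("Dataset", "GDB13-T"),
    ("Dataset", "OpenFF Discrepancy Benchmark 1"),
    ("ReactionDataset", "A21"),
    ("ReactionDataset", "A24"),
    ("ReactionDataset", "ACONF"),
]


def filter_by_type(listing, collection_type):
    """Return the names whose collection type matches, ignoring case.

    Hypothetical helper for illustration only; case-insensitivity is an
    assumption, not documented behaviour of list_collections.
    """
    wanted = collection_type.lower()
    return [name for ctype, name in listing if ctype.lower() == wanted]


reaction_names = filter_by_type(LISTING, "reactiondataset")
print(reaction_names)
```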
Exploring Collections
Collections can be obtained by pulling their data from the central server. A collection is primarily metadata, so even extremely large collections can be pulled in a few seconds. For this example, we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:
[4]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)
ReactionDataset(name=`S22`, id='184', client='https://api.qcarchive.molssi.org:443/')
Statistics and Visualization
Visual statistics and plots can be generated with the visualize command:
[5]:
ds.visualize(method="B2PLYP", basis=["def2-svp", "def2-tzvp"], bench="S220", kind="violin")
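The bench argument names the reference values against which the chosen method and basis combinations are compared. Assuming the plot summarizes per-reaction deviations from that benchmark, the underlying quantity can be sketched as an unsigned error per record plus its mean. The function names and all numbers below are placeholders for illustration, not QCArchive data or the qcportal implementation.

```python
from statistics import mean


def unsigned_errors(computed, reference):
    """Absolute deviation of each computed value from its reference value."""
    if len(computed) != len(reference):
        raise ValueError("computed and reference must have the same length")
    return [abs(c - r) for c, r in zip(computed, reference)]


def mean_unsigned_error(computed, reference):
    """Mean of the per-record unsigned errors."""
    return mean(unsigned_errors(computed, reference))


# Placeholder interaction energies (arbitrary units), NOT real S22 values.
reference = [-3.0, -5.0, -1.5]
computed = [-2.5, -5.5, -1.0]

mue = mean_unsigned_error(computed, reference)
print(mue)
```

A violin plot of such per-record errors would show their full distribution for each method and basis, rather than only the mean.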
Next steps
Congratulations! You have taken the first steps toward exploring the data within the QCArchive. Please consider viewing the Reaction Dataset and the TorsionDrive Dataset examples for a more in-depth look at these Collections and what you can do with them.
Feel free to explore the data you access through these examples in detail. When you connect a FractalClient to the server without a username and password, your access is read-only: you can explore the data but cannot alter what is saved on the server itself. So if you change your local data, the server data remains untouched!