You can run this notebook online in a Binder session or view it on GitHub.

Reaction Datasets

ReactionDatasets are datasets where the primary index is made up of linear combinations of individual computations. For example, each entry in an interaction energy dataset is the energy of the complex minus the energies of the individual monomers, which gives the final interaction energy. This idea can be extended to standard reaction energies, conformational defect energies, and more.
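As a minimal sketch of this idea (not the QCArchive implementation), a reaction entry can be pictured as a mapping from individual computations to coefficients. The species labels and energies below are hypothetical placeholders chosen only to make the arithmetic visible.

```python
# Hypothetical total energies for three individual computations
# (placeholder numbers, not real results).
energies = {
    "dimer": -10.50,
    "monomer_a": -5.20,
    "monomer_b": -5.25,
}

# A reaction is a linear combination: coefficient * energy, summed.
# Interaction energy = E(dimer) - E(monomer_a) - E(monomer_b)
interaction = {"dimer": 1.0, "monomer_a": -1.0, "monomer_b": -1.0}


def reaction_energy(coefficients, energies):
    """Sum coefficient * energy over every species in the reaction."""
    return sum(coef * energies[name] for name, coef in coefficients.items())


ie = reaction_energy(interaction, energies)
print(f"{ie:.2f}")
```

The same function would cover a general reaction energy by changing only the coefficient mapping, for example using product and reactant stoichiometric coefficients.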

This dataset type has been developed by the QCArchive Team in conjunction with:

To begin, we can connect to the MolSSI QCArchive server:

[2]:
import qcportal as ptl
client = ptl.FractalClient()
print(client)
FractalClient(server_name='The MolSSI QCArchive Server', address='https://api.qcarchive.molssi.org:443/', username='None')

The current ReactionDatasets can be explored below:

[2]:
client.list_collections("ReactionDataset").head()
[2]:
collection       name       tagline
ReactionDataset  A21        Equilibrium complexes from A24 database of sma...
                 A24        Interaction energies for small bimolecular com...
                 ACONF      Conformation energies for alkanes
                 AlkBind12  Binding energies of saturated and unsaturated ...
                 AlkIsod14  Isodesmic reaction energies for alkanes N=3--8

Exploring a Dataset

For this example, we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:

[3]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)
ReactionDataset(name=`S22`, id='5c8159a4b6a2de3bd1e74306', client='https://api.qcarchive.molssi.org:443/')

This dataset automatically comes with some Contributed Value data, that is, data that has been provided rather than explicitly computed through QCArchive. Such data often come from experiments or from very costly benchmarks taken from the literature.

Datasets are built on pandas DataFrames; we can directly access the underlying DataFrame to see the data provided:

[4]:
ds.df.head()
[4]:
                     S220    S22a     S22b
Ammonia Dimer       -3.17   -3.15   -3.133
Water Dimer         -5.02   -5.07   -4.989
Formic Acid Dimer  -18.61  -18.81  -18.753
Formamide Dimer    -15.96  -16.11  -16.062
Uracil Dimer HB    -20.65  -20.69  -20.641

Here we used .head() to access the first five records in the ReactionDataset.

All Collections that have Dataset in the name (including ReactionDataset) keep a history that lists the data that has been computed. In this case, we filter the history by the DFT method B2PLYP and the basis set def2-SVP:

[5]:
ds.list_history(method="B2PLYP", basis="def2-SVP")
[5]:
driver  program  method  basis     keywords     stoichiometry
energy  psi4     b2plyp  def2-svp  scf_default  cp
                                   scf_default  default

Here we can see that there are five primary keys in the computation:

  • driver - The type of computation: energy, gradient, Hessian, or properties.

  • program - The program used in the computation.

  • method - The quantum chemistry method, semiempirical method, AI model, or force field used in the computation.

  • basis - The basis set used in the computation.

  • keywords - A keywords alias used in the computation, specific to the details of the program or procedure.
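To illustrate how these keys act as filters, the sketch below matches a query against a small list of records. It is an illustration only and not how `list_history` is implemented; the `history` records and the `filter_history` helper are hypothetical, and the case-insensitive comparison is an assumption suggested by the lowercase values in the output above.

```python
# Hypothetical history records; each carries the five primary keys.
history = [
    {"driver": "energy", "program": "psi4", "method": "b2plyp",
     "basis": "def2-svp", "keywords": "scf_default"},
    {"driver": "energy", "program": "psi4", "method": "b3lyp",
     "basis": "def2-tzvp", "keywords": "scf_default"},
]


def filter_history(records, **query):
    """Keep records whose keys match every query value, ignoring case."""
    return [
        rec for rec in records
        if all(rec[key].lower() == value.lower() for key, value in query.items())
    ]


matches = filter_history(history, method="B2PLYP", basis="def2-SVP")
print(matches)
```

Any key left out of the query places no restriction on the result, so an empty query would return every record.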

In addition, there is the stoichiometry field, which is unique to ReactionDatasets. There are several ways to compute the interaction energy: counterpoise-corrected (cp), non-counterpoise-corrected (default), and Valiron–Mayer function counterpoise (vmfc). The stoichiometry field selects which of these forms is used.
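The difference between the cp and default forms can be sketched for a dimer AB: the counterpoise-corrected form evaluates each monomer in the full dimer basis, while the uncorrected form evaluates each monomer in its own basis. All energies below are hypothetical placeholders, and the variable names are chosen for this illustration only.

```python
# Hypothetical energies for a dimer AB and its monomers A and B.
# "own basis":   monomer computed with only its own basis functions.
# "dimer basis": monomer computed with the full dimer basis set.
e_dimer = -20.000
e_monomer_own_basis = {"A": -9.990, "B": -9.995}
e_monomer_dimer_basis = {"A": -9.992, "B": -9.998}

# Non-counterpoise-corrected ("default") interaction energy.
ie_default = e_dimer - sum(e_monomer_own_basis.values())

# Counterpoise-corrected ("cp") interaction energy.
ie_cp = e_dimer - sum(e_monomer_dimer_basis.values())

print(f"default: {ie_default:.3f}")
print(f"cp:      {ie_cp:.3f}")
```

Because the monomers are lower in energy when computed in the larger dimer basis, the cp value in this sketch is less negative than the default value.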

Querying Data

To obtain the data for the various historical computations we must query them from the server. Here we will automatically pull all relevant computations that match our query:

[6]:
ds.get_history(method="B3LYP-D3M")
ds.df.head()
[6]:
                     S220    S22a     S22b  B3LYP-D3M/def2-svp  B3LYP-D3M/def2-tzvp
Ammonia Dimer       -3.17   -3.15   -3.133           -6.248386            -4.049052
Water Dimer         -5.02   -5.07   -4.989           -9.002674            -6.427460
Formic Acid Dimer  -18.61  -18.81  -18.753          -25.933297           -20.668411
Formamide Dimer    -15.96  -16.11  -16.062          -21.689185           -17.436781
Uracil Dimer HB    -20.65  -20.69  -20.641          -25.623412           -21.922461
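Once computed columns sit next to the contributed values, they can be compared directly. The sketch below computes a mean unsigned error for the five rows shown above using plain Python; choosing S22b as the reference column is an assumption made for illustration, and `mean_unsigned_error` is a hypothetical helper rather than part of the library.

```python
# Values copied from the first five rows of the table above.
benchmark_s22b = [-3.133, -4.989, -18.753, -16.062, -20.641]
b3lyp_d3m = {
    "def2-svp": [-6.248386, -9.002674, -25.933297, -21.689185, -25.623412],
    "def2-tzvp": [-4.049052, -6.427460, -20.668411, -17.436781, -21.922461],
}


def mean_unsigned_error(computed, reference):
    """Average of |computed - reference| over paired entries."""
    return sum(abs(c - r) for c, r in zip(computed, reference)) / len(reference)


mue = {basis: mean_unsigned_error(values, benchmark_s22b)
       for basis, values in b3lyp_d3m.items()}
for basis, value in mue.items():
    print(f"B3LYP-D3M/{basis}: MUE = {value:.3f}")
```

For these five rows, the def2-tzvp column lies closer to the reference than the def2-svp column.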

Statistics and Visualization

Visual statistics and plotting can be generated by the visualize command:

[7]:
ds.visualize(method=["B3LYP", "B3LYP-D3", "B3LYP-D3M"], basis=["def2-tzvp"], groupby="D3")
[8]:
ds.visualize(method=["B3LYP", "B3LYP-D3", "B2PLYP", "B2PLYP-D3"], basis="def2-tzvp", groupby="D3", kind="violin")
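One plausible reading of groupby="D3" is that results are grouped by base functional, with the dispersion correction distinguishing the members of each group. The sketch below shows that grouping on the method names alone; `split_dispersion` is a hypothetical helper, and this is an assumption about the intent of the option, not a reproduction of the library's code.

```python
from collections import defaultdict

methods = ["B3LYP", "B3LYP-D3", "B2PLYP", "B2PLYP-D3"]


def split_dispersion(method):
    """Split 'B3LYP-D3' into ('B3LYP', 'D3'); no suffix gives 'none'."""
    base, sep, suffix = method.partition("-")
    return base, (suffix if sep else "none")


# Collect the dispersion variants available for each base functional.
groups = defaultdict(list)
for name in methods:
    base, dispersion = split_dispersion(name)
    groups[base].append(dispersion)

print(dict(groups))
```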