
Reaction Datasets

ReactionDatasets are datasets where the primary index represents a chemical reaction, made up of stoichiometrically weighted linear combinations of individual computations. For example, in an interaction energy dataset each index entry is the energy of the complex minus the energies of the individual monomers, which yields the interaction energy. This idea can be extended to standard reaction energies, conformational defect energies, and more.
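The snippet below is a minimal sketch of the "weighted linear combination" idea. It is not QCPortal's internal representation. A reaction is written as a list of (component, coefficient) pairs, and its value is the coefficient-weighted sum of the component energies. The component labels, the energies (in hartree), and the reaction_value helper are all hypothetical placeholders, not real computations. The hartree-to-kcal/mol factor is the CODATA 2018 value.

```python
# Hypothetical sketch: a reaction value as a stoichiometrically weighted sum.
# Component labels and energies (hartree) are made-up placeholders.
HARTREE_TO_KCAL_PER_MOL = 627.5094740631

component_energies = {
    "dimer": -152.500,
    "monomer_a": -76.245,
    "monomer_b": -76.245,
}

# Interaction energy: +1 x complex, -1 x each monomer.
interaction_stoichiometry = [("dimer", 1.0), ("monomer_a", -1.0), ("monomer_b", -1.0)]


def reaction_value(stoichiometry, energies):
    """Return the sum of coefficient * energy over the reaction's components."""
    return sum(coeff * energies[label] for label, coeff in stoichiometry)


e_int = reaction_value(interaction_stoichiometry, component_energies)
print(f"{e_int * HARTREE_TO_KCAL_PER_MOL:.3f} kcal/mol")
```

Other reaction types only change the coefficient list. For example, a conformational energy would be +1 times one conformer and -1 times the other.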

This dataset type has been developed by the QCArchive Team in conjunction with:

To begin, we can connect to the MolSSI QCArchive server:

[1]:
import qcportal as ptl
client = ptl.FractalClient()
client
[1]:

FractalClient

  • Server:   The MolSSI QCArchive Server
  • Address:   https://api.qcarchive.molssi.org:443/
  • Username:   None

The current ReactionDatasets can be explored below:

[2]:
client.list_collections("ReactionDataset").head()
[2]:
collection       name       tagline
ReactionDataset  A21        Equilibrium complexes from A24 database of sma...
                 A24        Interaction energies for small bimolecular com...
                 ACONF      Conformation energies for alkanes
                 AlkBind12  Binding energies of saturated and unsaturated ...
                 AlkIsod14  Isodesmic reaction energies for alkanes N=3--8

Exploring a Dataset

For this example, we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:

[3]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)
ReactionDataset(name=`S22`, id='184', client='https://api.qcarchive.molssi.org:443/')

The reactions in the dataset (dimerization reactions, in the case of S22) can be listed with get_index:

[4]:
ds.get_index()
[4]:
['2-Pyridone-2-Aminopyridine Complex',
 'Adenine-Thymine Complex Stack',
 'Adenine-Thymine Complex WC',
 'Ammonia Dimer',
 'Benzene-Ammonia Complex',
 'Benzene Dimer PD',
 'Benzene Dimer T-Shape',
 'Benzene-HCN Complex',
 'Benzene-Methane Complex',
 'Benzene-Water Complex',
 'Ethene Dimer',
 'Ethene-Ethine Complex',
 'Formamide Dimer',
 'Formic Acid Dimer',
 'Indole-Benzene Complex Stack',
 'Indole-Benzene Complex T-Shape',
 'Methane Dimer',
 'Phenol Dimer',
 'Pyrazine Dimer',
 'Uracil Dimer HB',
 'Uracil Dimer Stack',
 'Water Dimer']

Datasets contain two types of data: data computed through QCArchive (“native”) and data provided by external sources (“contributed”). Contributed data often come from experiments or from very costly benchmark computations taken from the literature.

For both Datasets and ReactionDatasets, the list_values method lists all data that has been computed or contributed.

[5]:
ds.list_values().head()
[5]:
native  driver   program  method   basis        keywords     stoichiometry  name
False   Unknown  Unknown  Unknown  Unknown      Unknown      default        S220
                                                Unknown      default        S22a
                                                Unknown      default        S22b
True    energy   psi4     b2plyp   aug-cc-pvdz  scf_default  cp             cp-B2PLYP/aug-cc-pvdz
                                                scf_default  default        B2PLYP/aug-cc-pvdz

Here, we have listed the first five available data sources. The first three are contributed data (native=False) and correspond to benchmark values. The last two are computed data (native=True).

There are six primary keys to describe data:

  • native - Whether a computation was done using QCArchive.

  • driver - The type of computation: energy, gradient, Hessian, or properties.

  • program - The program used in the computation.

  • method - The quantum chemistry method, semiempirical method, AI model, or force field used in the computation.

  • basis - The basis set used in the computation.

  • keywords - A keywords alias used in the computation, specific to the details of the program or procedure.

In addition, the stoichiometry field is unique to ReactionDatasets. There are several ways to compute an interaction energy: counterpoise-corrected (cp), non-counterpoise-corrected (default), and Valiron–Mayer function counterpoise (vmfc). The stoichiometry field selects which of these forms is used.
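For a dimer AB, the two simplest forms are shown below using the standard textbook definitions, for orientation only. QCPortal's exact internal bookkeeping is not shown here. In this notation, E_X^Y is the energy of fragment X at its geometry in the complex, computed in the basis set of Y. In the cp form, the monomers are computed in the full dimer basis, with ghost atoms supplying the partner's basis functions. VMFC extends the counterpoise idea to larger clusters and is not written out here.

```latex
\begin{aligned}
\Delta E_{\text{int}}^{\text{default}} &= E_{AB}^{AB} - E_{A}^{A} - E_{B}^{B} \\
\Delta E_{\text{int}}^{\text{cp}} &= E_{AB}^{AB} - E_{A}^{AB} - E_{B}^{AB}
\end{aligned}
```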

Searches in list_values may be narrowed by specifying some or all of the keys. Here, we filter the available data by the DFT method B2PLYP and the basis set def2-SVP.

[6]:
ds.list_values(method="B2PLYP", basis="def2-SVP")
[6]:
native  driver  program  method  basis     keywords     stoichiometry  name
True    energy  psi4     b2plyp  def2-svp  scf_default  cp             cp-B2PLYP/def2-svp
                                           scf_default  default        B2PLYP/def2-svp
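The outputs above suggest that key matching ignores case: the query used B2PLYP and def2-SVP, but the stored values are b2plyp and def2-svp. The following is a minimal standard-library sketch of that kind of filtering. The ValueRecord class and the filter_values function are hypothetical and are not part of QCPortal. The records are copied from the list_values outputs above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ValueRecord:
    """Hypothetical record: the six keys plus stoichiometry and display name."""
    native: bool
    driver: str
    program: str
    method: str
    basis: str
    keywords: str
    stoichiometry: str
    name: str


RECORDS = [
    ValueRecord(False, "Unknown", "Unknown", "Unknown", "Unknown", "Unknown", "default", "S22a"),
    ValueRecord(True, "energy", "psi4", "b2plyp", "aug-cc-pvdz", "scf_default", "cp", "cp-B2PLYP/aug-cc-pvdz"),
    ValueRecord(True, "energy", "psi4", "b2plyp", "def2-svp", "scf_default", "cp", "cp-B2PLYP/def2-svp"),
    ValueRecord(True, "energy", "psi4", "b2plyp", "def2-svp", "scf_default", "default", "B2PLYP/def2-svp"),
]


def filter_values(records, **query):
    """Keep records whose fields match every given key, ignoring string case."""
    def matches(record):
        fields = asdict(record)
        for key, wanted in query.items():
            value = fields[key]
            if isinstance(value, str) and isinstance(wanted, str):
                if value.lower() != wanted.lower():
                    return False
            elif value != wanted:
                return False
        return True

    return [r for r in records if matches(r)]


for record in filter_values(RECORDS, method="B2PLYP", basis="def2-SVP"):
    print(record.stoichiometry, record.name)
```

Keys that are omitted from the query are unconstrained. This mirrors the idea of narrowing a search by "some or all of the keys".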

Querying Data

To obtain the data for these computations, we must query it from the server. For example, we can pull all B3LYP-D3M interaction energies:

[7]:
ds.get_values(method="B3LYP-D3M")
[7]:
B3LYP-D3M/def2-tzvp B3LYP-D3M/aug-cc-pvdz B3LYP-D3M/def2-svp B3LYP-D3M/aug-cc-pvtz
2-Pyridone-2-Aminopyridine Complex -18.536530 -19.005121 -22.831506 -18.238308
Adenine-Thymine Complex Stack -12.149707 -12.897930 -15.577143 -11.778090
Adenine-Thymine Complex WC -17.833451 -18.449484 -22.574701 -17.687043
Ammonia Dimer -4.049052 -3.509980 -6.248386 -3.328184
Benzene Dimer PD -2.556100 -3.058981 -3.459984 -2.467563
Benzene Dimer T-Shape -3.072012 -3.617173 -3.597379 -3.016720
Benzene-Ammonia Complex -2.934200 -2.833251 -3.251346 -2.572470
Benzene-HCN Complex -5.279021 -5.479076 -5.480155 -5.221790
Benzene-Methane Complex -1.555573 -1.830850 -1.917835 -1.552191
Benzene-Water Complex -4.613285 -3.924570 -4.926573 -3.727725
Ethene Dimer -1.668464 -2.050798 -2.294543 -1.678959
Ethene-Ethine Complex -1.878851 -2.114814 -2.330609 -1.823828
Formamide Dimer -17.436781 -17.546706 -21.689185 -17.104115
Formic Acid Dimer -20.668411 -20.536286 -25.933297 -20.385421
Indole-Benzene Complex Stack -4.398316 -5.056658 -5.736506 -4.213569
Indole-Benzene Complex T-Shape -6.363817 -6.796603 -7.081938 -6.083396
Methane Dimer -0.522247 -0.813825 -0.672244 -0.511469
Phenol Dimer -8.032781 -7.974372 -10.977429 -7.523356
Pyrazine Dimer -4.096590 -4.664210 -5.443984 -4.036813
Uracil Dimer HB -21.922461 -22.497729 -25.623412 -21.904878
Uracil Dimer Stack -10.781041 -11.125815 -13.223797 -10.486322
Water Dimer -6.427460 -5.539915 -9.002674 -5.417591

The units of these energies are stored in ds.units:

[8]:
ds.units
[8]:
'kcal / mol'
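If values are needed in other units, a hand-rolled conversion is simple. The sketch below is only an illustration and does not use any QCPortal unit handling. It relies on the thermochemical calorie (1 kcal = 4.184 kJ) and the CODATA 2018 hartree-to-kcal/mol factor. The KCAL_PER_MOL_TO table and the convert_from_kcal_per_mol function are hypothetical names. The input is the B3LYP-D3M/aug-cc-pvtz water dimer value from the table above.

```python
# Hypothetical conversion table: multiply a kcal/mol value by these factors.
KCAL_PER_MOL_TO = {
    "kcal / mol": 1.0,
    "kJ / mol": 4.184,                 # thermochemical calorie: 1 cal = 4.184 J
    "hartree": 1.0 / 627.5094740631,   # CODATA 2018: 1 Eh = 627.5094740631 kcal/mol
}


def convert_from_kcal_per_mol(value, unit):
    """Convert an energy given in kcal/mol into one of the units listed above."""
    return value * KCAL_PER_MOL_TO[unit]


water_dimer = -5.417591  # B3LYP-D3M/aug-cc-pvtz, kcal/mol, from the table above
print(f"{convert_from_kcal_per_mol(water_dimer, 'kJ / mol'):.3f} kJ/mol")
```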

Statistics and Visualization

Statistical plots can be generated with the visualize method:

[9]:
ds.visualize(method=["B3LYP", "B3LYP-D3", "B3LYP-D3M"], basis=["def2-tzvp"], groupby="D3")
[10]:
ds.visualize(method=["B3LYP", "B3LYP-D3", "B2PLYP", "B2PLYP-D3"], basis="def2-tzvp", groupby="D3", kind="violin")
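The plots condense many numbers into summary statistics. The snippet below is a rough standard-library sketch of that kind of summary, not a reproduction of what visualize computes. It takes three B3LYP-D3M rows from the get_values table above and reports how far the def2-svp results lie from the aug-cc-pvtz ones. The larger basis is treated as a reference purely for illustration, and the error_summary helper is hypothetical.

```python
from statistics import mean

# B3LYP-D3M interaction energies (kcal/mol) copied from the get_values table above.
values = {
    "Ammonia Dimer": {"def2-svp": -6.248386, "aug-cc-pvtz": -3.328184},
    "Methane Dimer": {"def2-svp": -0.672244, "aug-cc-pvtz": -0.511469},
    "Water Dimer": {"def2-svp": -9.002674, "aug-cc-pvtz": -5.417591},
}


def error_summary(table, basis, reference):
    """Return (mean signed, mean unsigned, max unsigned) of basis - reference."""
    diffs = [row[basis] - row[reference] for row in table.values()]
    unsigned = [abs(d) for d in diffs]
    return mean(diffs), mean(unsigned), max(unsigned)


mse, mue, max_ue = error_summary(values, "def2-svp", "aug-cc-pvtz")
print(f"MSE {mse:.3f}, MUE {mue:.3f}, max UE {max_ue:.3f} kcal/mol")
```

In these three rows, every def2-svp value is more negative than its aug-cc-pvtz counterpart, so the mean signed and mean unsigned differences have equal magnitude.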

Next steps

The next sections cover other collections that are used for organizing workflows, such as geometry optimization. There are more examples using Dataset and ReactionDataset in the Cookbook. Full documentation of Dataset and ReactionDataset is available in the QCPortal documentation.