ReactionDatasets are datasets whose primary index is made up of linear combinations of individual computations. For example, an interaction energy dataset indexes each entry as the energy of the complex minus the energies of the individual monomers, which gives the final interaction energy. This idea can be extended to standard reaction energies, conformational defect energies, and more.
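To make the idea of a linear combination concrete, here is a minimal sketch in plain Python. It is not how QCPortal stores reactions internally: the molecule labels, the `reaction_energy` helper, and the energies are all hypothetical placeholders chosen only for illustration.

```python
# Hypothetical total energies in hartree; placeholder numbers, not real results.
energies = {
    "dimer": -152.50,
    "monomer_A": -76.24,
    "monomer_B": -76.25,
}

# A reaction is a mapping from each computation to its coefficient.
# Interaction energy = E(complex) - E(monomer A) - E(monomer B).
interaction = {"dimer": 1.0, "monomer_A": -1.0, "monomer_B": -1.0}

HARTREE_TO_KCAL_PER_MOL = 627.509


def reaction_energy(stoichiometry, energies):
    """Return the coefficient-weighted sum of the individual energies."""
    return sum(coef * energies[name] for name, coef in stoichiometry.items())


ie_hartree = reaction_energy(interaction, energies)
print(f"{ie_hartree * HARTREE_TO_KCAL_PER_MOL:.2f} kcal/mol")
```

The same helper would cover a reaction energy or a conformational energy difference; only the coefficients in the mapping change.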
This dataset type has been developed by the QCArchive Team in conjunction with:
To begin, we can connect to the MolSSI QCArchive server:
import qcportal as ptl
client = ptl.FractalClient()
print(client)
FractalClient(server_name='The MolSSI QCArchive Server', address='https://api.qcarchive.molssi.org:443/', username='None')
ReactionDatasets can be explored below:
|ReactionDataset||A21||Equilibrium complexes from A24 database of sma...|
|ReactionDataset||A24||Interaction energies for small bimolecular com...|
|ReactionDataset||ACONF||Conformation energies for alkanes|
|ReactionDataset||AlkBind12||Binding energies of saturated and unsaturated ...|
|ReactionDataset||AlkIsod14||Isodesmic reaction energies for alkanes N=3--8|
Exploring a Dataset
For this example, we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)
ReactionDataset(name=`S22`, id='5c8159a4b6a2de3bd1e74306', client='https://api.qcarchive.molssi.org:443/')
This dataset automatically comes with some Contributed Value data, or data that has been provided rather than explicitly computed through QCArchive. Such data often come from experiments or very costly benchmarks taken from literature.
Datasets are based off of Pandas DataFrames; we can directly access the underlying DataFrame to see the data provided:
|Formic Acid Dimer||-18.61||-18.81||-18.753|
|Uracil Dimer HB||-20.65||-20.69||-20.641|
Here we used .head() to access the first five records in the DataFrame.
Collections that have Dataset in the name (including ReactionDataset) keep a history that lists the data that has been computed. In this case we will filter the history by the DFT method B2PLYP and by basis set:
Here we can see that there are five primary keys in the computation:
driver - The type of computation: energy, gradient, Hessian, or properties.
program - The program used in the computation.
method - The quantum chemistry method, semiempirical method, AI model, or force field used in the computation.
basis - The basis set used in the computation.
keywords - A keywords alias used in the computation, specific to the details of the program or procedure.
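As a rough illustration of how these five keys identify a computation and how a history can be filtered on them, consider the sketch below. The `SpecKey` tuple, the `filter_history` helper, and the records themselves are hypothetical stand-ins rather than QCPortal objects, and the case-insensitive matching is a choice made for this sketch only.

```python
from typing import NamedTuple, Optional


class SpecKey(NamedTuple):
    """Hypothetical stand-in for one row of a dataset history."""
    driver: str
    program: str
    method: str
    basis: Optional[str]
    keywords: Optional[str]


# Illustrative records only; nothing here is fetched from a server.
history = [
    SpecKey("energy", "psi4", "b2plyp", "def2-svp", None),
    SpecKey("energy", "psi4", "b2plyp", "def2-tzvp", None),
    SpecKey("energy", "psi4", "b3lyp", "def2-tzvp", None),
]


def filter_history(records, **criteria):
    """Keep records whose fields match every criterion, ignoring case."""
    def is_match(record):
        return all(
            (getattr(record, field) or "").lower() == value.lower()
            for field, value in criteria.items()
        )
    return [record for record in records if is_match(record)]


selected = filter_history(history, method="B2PLYP", basis="def2-tzvp")
for record in selected:
    print(record)
```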
In addition, there is also the stoichiometry field, which is unique to ReactionDatasets. There are several ways to compute the interaction energy: counterpoise-corrected (cp), non-counterpoise-corrected (default), and Valiron–Mayer function counterpoise (VMFC). The stoichiometry field selects which of these forms is used.
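The difference between the default and cp forms can be sketched as two different linear combinations for the same complex. This is a hypothetical illustration: the labels and energies are placeholders, and the dictionary layout is an assumption, not QCPortal's internal schema. In the cp form each monomer is evaluated with the ghost basis functions of its partner, so it uses different monomer energies than the default form.

```python
# Hypothetical energies in hartree (placeholders, not computed results).
energies = {
    "AB": -152.500,        # complex in the full dimer basis
    "A": -76.240,          # monomer A in its own basis
    "B": -76.250,          # monomer B in its own basis
    "A_ghost_B": -76.242,  # monomer A with ghost functions of B
    "B_ghost_A": -76.251,  # monomer B with ghost functions of A
}

# Each stoichiometry names the computations and coefficients to combine.
stoichiometries = {
    "default": {"AB": 1, "A": -1, "B": -1},
    "cp": {"AB": 1, "A_ghost_B": -1, "B_ghost_A": -1},
}


def interaction_energy(stoich_name):
    """Combine the energies according to the chosen stoichiometry."""
    combination = stoichiometries[stoich_name]
    return sum(coef * energies[label] for label, coef in combination.items())


results = {name: interaction_energy(name) for name in stoichiometries}
for name, value in results.items():
    print(f"{name}: {value:.3f} hartree")
```

With these placeholder numbers the cp interaction energy is less negative than the default one, which is the usual direction of the counterpoise correction.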
To obtain the data for the various historical computations we must query them from the server. Here we will automatically pull all relevant computations that match our query:
|Formic Acid Dimer||-18.61||-18.81||-18.753||-25.933297||-20.668411|
|Uracil Dimer HB||-20.65||-20.69||-20.641||-25.623412||-21.922461|
Statistics and Visualization
Visual statistics and plotting can be generated by the visualize method:
ds.visualize(method=["B3LYP", "B3LYP-D3", "B3LYP-D3M"], basis=["def2-tzvp"], groupby="D3")
ds.visualize(method=["B3LYP", "B3LYP-D3", "B2PLYP", "B2PLYP-D3"], basis="def2-tzvp", groupby="D3", kind="violin")
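To give a feel for the kind of summary statistic that sits behind such plots, the sketch below computes a mean unsigned error by hand for the two rows shown in the tables above. The column roles are an assumption: we treat the first numeric column as the reference and the last two columns as computed values, and the names `reference`, `computed_1`, and `computed_2` are hypothetical labels because the table headers are not shown. The `mean_unsigned_error` helper is our own and is not a QCPortal function.

```python
# Rows copied from the tables above; the column roles are an assumption.
rows = {
    "Formic Acid Dimer": {
        "reference": -18.61,
        "computed_1": -25.933297,
        "computed_2": -20.668411,
    },
    "Uracil Dimer HB": {
        "reference": -20.65,
        "computed_1": -25.623412,
        "computed_2": -21.922461,
    },
}


def mean_unsigned_error(rows, column, reference="reference"):
    """Average absolute deviation of one column from the reference column."""
    errors = [abs(values[column] - values[reference]) for values in rows.values()]
    return sum(errors) / len(errors)


mue = {
    column: mean_unsigned_error(rows, column)
    for column in ("computed_1", "computed_2")
}
for column, value in mue.items():
    print(f"{column}: MUE = {value:.3f}")
```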