You can run this notebook online in a Binder session or view it on GitHub.

Getting Molecules

This example shows how to get a molecule from QCArchive in a number of contexts.

From an ID

Every molecule computed with QCArchive is assigned a unique ID. If a molecule’s ID is known, it can be queried from the Molecules table.

[1]:
import qcportal as ptl
client = ptl.FractalClient()

For example, molecule 1234 is 1,2,3-trimethylbenzene.

[2]:
mol = client.query_molecules(1234)[0]
mol

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

[2]:
<Molecule(name='C9H12' formula='C9H12' hash='572b510')>
[3]:
print(mol)
    Geometry (in Angstrom), charge = 0.0, multiplicity = 1:

       Center              X                  Y                   Z
    ------------   -----------------  -----------------  -----------------
    C                 0.776479871994     1.156134463385     0.121542591228
    C                 0.438429690334     0.679567908122    -1.141595091975
    C                 0.439577078821     0.423533055514     1.255585387764
    C                -0.363723536834    -0.465178778108    -1.279725991730
    C                -0.415502828385    -0.685937227907     1.160631416613
    C                -0.792912983429    -1.170236644458    -0.121804279943
    C                -0.744392084678    -0.917923156500    -2.666766549983
    C                -0.856925058179    -1.374181477949     2.427060703777
    C                -1.703936690413    -2.374380900784    -0.246989621254
    H                 1.380610203168     2.049406423411     0.216714048921
    H                 0.770290662964     1.232461941773    -2.011963177510
    H                 0.769502950936     0.784464203584     2.222141623291
    H                -0.238962510978    -1.878436765084    -2.898916777516
    H                -0.447809351101    -0.177691478927    -3.439954373507
    H                -1.844638825192    -1.050455805875    -2.735084841327
    H                -1.962016543060    -1.480103641644     2.438815782834
    H                -0.562925111565    -0.802128403465     3.332572326307
    H                -0.383242656300    -2.377541231755     2.485353500027
    H                -2.761425129123    -2.038610393380    -0.229251405356
    H                -1.542976842368    -3.097214459361     0.578338572599
    H                -1.519884697209    -2.938478658464    -1.182927991461
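
As a quick sanity check on the printed geometry, the distance between two neighbouring ring carbons can be computed by hand. The sketch below copies the first two carbon rows from the output above; the helper name `distance` is illustrative and not part of qcportal.

```python
import math

# First two carbon atoms from the geometry printed above (in Angstrom).
c1 = (0.776479871994, 1.156134463385, 0.121542591228)
c2 = (0.438429690334, 0.679567908122, -1.141595091975)


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


bond = distance(c1, c2)
print(f"C-C distance: {bond:.3f} Angstrom")
```

The result is close to 1.39 Angstrom, which is in the range expected for an aromatic C-C bond.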

The following sections show how to retrieve molecules from Collections.

From a Dataset

Load a Dataset:

[5]:
import qcportal as ptl
client = ptl.FractalClient()

ds = client.get_collection("Dataset", "SMIRNOFF Coverage Set 1")

get_molecules returns the molecules corresponding to each row of the Dataset:

[6]:
molecules = ds.get_molecules()
molecules
[6]:
index                 molecule
C(CBr)c1n[nH]nn1-1    Geometry (in Angstrom), charge = 0.0, mult...
C(CBr)c1n[nH]nn1-2    Geometry (in Angstrom), charge = 0.0, mult...
C(CBr)c1n[nH]nn1-3    Geometry (in Angstrom), charge = 0.0, mult...
C(CBr)c1n[n-]nn1-0    Geometry (in Angstrom), charge = -1.0, mul...
C(CBr)c1n[n-]nn1-1    Geometry (in Angstrom), charge = -1.0, mul...
...                   ...
CSSCCN=C=S-7          Geometry (in Angstrom), charge = 0.0, mult...
CSSCCN=C=S-8          Geometry (in Angstrom), charge = 0.0, mult...
CSSCCN=C=S-9          Geometry (in Angstrom), charge = 0.0, mult...
CSSCCN=C=S-10         Geometry (in Angstrom), charge = 0.0, mult...
CSSCCN=C=S-11         Geometry (in Angstrom), charge = 0.0, mult...

1109 rows × 1 columns

Individual Molecule objects may be picked out of the dataframe:

[8]:
molecules.loc["C(CBr)c1n[n-]nn1-0", "molecule"]

[8]:
<Molecule(name='BrC3H4N4' formula='BrC3H4N4' hash='9fd48c6')>

For large datasets, you may not want to query all molecules at once. get_molecules accepts a subset option for selecting specific molecules:

[9]:
ds.get_molecules(subset=['C(CBr)c1n[n-]nn1-0','CSSCCN=C=S-10'])
[9]:
index                 molecule
C(CBr)c1n[n-]nn1-0    Geometry (in Angstrom), charge = -1.0, mul...
CSSCCN=C=S-10         Geometry (in Angstrom), charge = 0.0, mult...

If a single string is provided for subset, the Molecule object is returned directly.

[10]:
ds.get_molecules(subset='CSSCCN=C=S-10')

[10]:
<Molecule(name='C4H7NS3' formula='C4H7NS3' hash='fc1a1d6')>
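
When many entries are needed, the list form of subset can also be used to fetch them a few at a time. The following is a minimal sketch rather than a qcportal feature: the helper names `chunked` and `iter_molecule_batches` and the `batch_size` parameter are hypothetical, and the only assumption about `ds` is that `get_molecules(subset=[...])` behaves as shown above.

```python
from typing import Any, Iterator, List, Sequence


def chunked(names: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(names), size):
        yield list(names[start:start + size])


def iter_molecule_batches(ds: Any, names: Sequence[str], batch_size: int = 100) -> Iterator[Any]:
    """Call ds.get_molecules once per batch and yield each result.

    `ds` is expected to be a Dataset loaded as in the cells above; each
    yielded value is whatever get_molecules returns for a list subset.
    """
    for batch in chunked(names, batch_size):
        yield ds.get_molecules(subset=batch)
```

With the Dataset loaded earlier and a list of entry names, `for frame in iter_molecule_batches(ds, names, batch_size=50): ...` would handle one batch of molecules per iteration instead of querying everything in a single call.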

From a ReactionDataset

Load a ReactionDataset:

[11]:
import qcportal as ptl
client = ptl.FractalClient()

ds = client.get_collection("ReactionDataset", "S22")

get_molecules returns the molecules involved in each reaction. By default, only the final molecule of each reaction (for S22, the complex) is returned:

[12]:
dimers = ds.get_molecules()
dimers
[12]:
name                                stoichiometry  idx  molecule
2-Pyridone-2-Aminopyridine Complex  default        0    Geometry (in Angstrom), charge = 0.0, mult...
Adenine-Thymine Complex Stack       default        0    Geometry (in Angstrom), charge = 0.0, mult...
Adenine-Thymine Complex WC          default        0    Geometry (in Angstrom), charge = 0.0, mult...
Ammonia Dimer                       default        0    Geometry (in Angstrom), charge = 0.0, mult...
Benzene Dimer PD                    default        0    Geometry (in Angstrom), charge = 0.0, mult...
Benzene Dimer T-Shape               default        0    Geometry (in Angstrom), charge = 0.0, mult...
Benzene-Ammonia Complex             default        0    Geometry (in Angstrom), charge = 0.0, mult...
Benzene-HCN Complex                 default        0    Geometry (in Angstrom), charge = 0.0, mult...
Benzene-Methane Complex             default        0    Geometry (in Angstrom), charge = 0.0, mult...
Benzene-Water Complex               default        0    Geometry (in Angstrom), charge = 0.0, mult...
Ethene Dimer                        default        0    Geometry (in Angstrom), charge = 0.0, mult...
Ethene-Ethine Complex               default        0    Geometry (in Angstrom), charge = 0.0, mult...
Formamide Dimer                     default        0    Geometry (in Angstrom), charge = 0.0, mult...
Formic Acid Dimer                   default        0    Geometry (in Angstrom), charge = 0.0, mult...
Indole-Benzene Complex Stack        default        0    Geometry (in Angstrom), charge = 0.0, mult...
Indole-Benzene Complex T-Shape      default        0    Geometry (in Angstrom), charge = 0.0, mult...
Methane Dimer                       default        0    Geometry (in Angstrom), charge = 0.0, mult...
Phenol Dimer                        default        0    Geometry (in Angstrom), charge = 0.0, mult...
Pyrazine Dimer                      default        0    Geometry (in Angstrom), charge = 0.0, mult...
Uracil Dimer HB                     default        0    Geometry (in Angstrom), charge = 0.0, mult...
Uracil Dimer Stack                  default        0    Geometry (in Angstrom), charge = 0.0, mult...
Water Dimer                         default        0    Geometry (in Angstrom), charge = 0.0, mult...

Individual Molecule objects may be picked out of the dataframe:

[13]:
dimers.loc['Adenine-Thymine Complex WC', 'molecule'][0]

[13]:
<Molecule(name='C10H11N7O2' formula='C10H11N7O2' hash='5357c2c')>

Reactants and products (or monomers and complexes) may be picked out with the stoich keyword. For the case of an interaction energy dataset like S22, stoich="default" corresponds to complexes and stoich="default1" corresponds to the monomers without counterpoise corrections.

[14]:
monomers = ds.get_molecules(stoich="default1")
monomers.head(10)
[14]:
name                                stoichiometry  idx  molecule
2-Pyridone-2-Aminopyridine Complex  default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Adenine-Thymine Complex Stack       default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Adenine-Thymine Complex WC          default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Ammonia Dimer                       default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Benzene Dimer PD                    default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...

As before, the individual Molecule objects for the monomers may be extracted from the DataFrame:

[15]:
monomers.loc['Adenine-Thymine Complex WC', 'molecule'][0]

[15]:
<Molecule(name='C10H11N7O2 ((0,),[])' formula='C5H5N5' hash='c0e7ed3')>
[16]:
monomers.loc['Adenine-Thymine Complex WC', 'molecule'][1]

[16]:
<Molecule(name='C10H11N7O2 ((1,),[])' formula='C5H6N2O2' hash='a4f9749')>

Note that it is possible to get all molecules involved in a reaction by specifying a list for stoich:

[17]:
ds.get_molecules(stoich=['default', 'default1']).head(15)
[17]:
name                                stoichiometry  idx  molecule
2-Pyridone-2-Aminopyridine Complex  default        0    Geometry (in Angstrom), charge = 0.0, mult...
                                    default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Adenine-Thymine Complex Stack       default        0    Geometry (in Angstrom), charge = 0.0, mult...
                                    default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Adenine-Thymine Complex WC          default        0    Geometry (in Angstrom), charge = 0.0, mult...
                                    default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Ammonia Dimer                       default        0    Geometry (in Angstrom), charge = 0.0, mult...
                                    default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...
Benzene Dimer PD                    default        0    Geometry (in Angstrom), charge = 0.0, mult...
                                    default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                                   1    Geometry (in Angstrom), charge = 0.0, mult...

Counterpoise-corrected calculations are available through stoich="cp" and stoich="cp1". Counterpoise-corrected monomers contain ghost atoms:

[18]:
ds.get_molecules(stoich="cp1").loc['Adenine-Thymine Complex WC', 'molecule'][0]

[18]:
<Molecule(name='C10H11N7O2 ((0,),[1])' formula='C10H11N7O2' hash='d3955aa')>
[19]:
ds.get_molecules(stoich="cp1").loc['Adenine-Thymine Complex WC', 'molecule'][1]

[19]:
<Molecule(name='C10H11N7O2 ((1,),[0])' formula='C10H11N7O2' hash='e63c41f')>
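
The suffix in the names above, such as ((0,),[1]), appears to list the real fragment indices followed by the ghost fragment indices. This reading is an assumption based on the outputs shown: the default1 monomers carry an empty second list, while the cp1 monomers name the other fragment there. Under that assumption, a small helper (hypothetical, not part of qcportal) could split such a name into its parts. It also assumes the base name itself contains no spaces.

```python
import ast
from typing import Tuple


def split_fragment_label(name: str) -> Tuple[str, Tuple[int, ...], Tuple[int, ...]]:
    """Split 'FORMULA ((real,),[ghost])' into (formula, real, ghost)."""
    base, _, suffix = name.partition(" ")
    if not suffix:
        # No fragment suffix: report the name with empty fragment lists.
        return base, (), ()
    # The suffix is a Python literal: a tuple holding a tuple and a list.
    real, ghost = ast.literal_eval(suffix)
    return base, tuple(real), tuple(ghost)


# Names copied from the outputs above.
for label in (
    "C10H11N7O2 ((0,),[])",
    "C10H11N7O2 ((0,),[1])",
    "C10H11N7O2 ((1,),[0])",
):
    print(split_fragment_label(label))
```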

For large datasets, you may not want to query all molecules at once. get_molecules accepts a subset option for selecting specific reactions:

[20]:
ds.get_molecules(subset='Adenine-Thymine Complex WC')
[20]:
name                        stoichiometry  idx  molecule
Adenine-Thymine Complex WC  default        0    Geometry (in Angstrom), charge = 0.0, mult...
[21]:
ds.get_molecules(subset=['Adenine-Thymine Complex WC', 'Ammonia Dimer', 'Water Dimer'], stoich=['default', 'default1'])
[21]:
name                        stoichiometry  idx  molecule
Adenine-Thymine Complex WC  default        0    Geometry (in Angstrom), charge = 0.0, mult...
                            default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                           1    Geometry (in Angstrom), charge = 0.0, mult...
Ammonia Dimer               default        0    Geometry (in Angstrom), charge = 0.0, mult...
                            default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                           1    Geometry (in Angstrom), charge = 0.0, mult...
Water Dimer                 default        0    Geometry (in Angstrom), charge = 0.0, mult...
                            default1       0    Geometry (in Angstrom), charge = 0.0, mult...
                                           1    Geometry (in Angstrom), charge = 0.0, mult...

From an OptimizationDataset

Load an OptimizationDataset:

[22]:
import qcportal as ptl
client = ptl.FractalClient()

client.list_collections()
ds = client.get_collection("OptimizationDataset", "SMIRNOFF Coverage Set 1")

Show some available molecules:

[23]:
ds.df.head()
[23]:
COC(O)OC-0
C[S-]-0
CS-0
CO-0
CCO-0

Show available specifications:

[24]:
ds.list_specifications()
[24]:
Name     Description
default  Standard OpenFF optimization quantum chemistry...

Obtain a specific record from a molecule and specification:

[25]:
r = ds.get_record("CCO-0","default")

Get the optimized molecule:

[26]:
r.get_final_molecule()

[26]:
<Molecule(name='C2H6O' formula='C2H6O' hash='422ad57')>
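
The two calls above can be combined to collect optimized molecules for several entries. This is a minimal sketch: the function name `final_molecules_for` is hypothetical, it relies only on `get_record` and `get_final_molecule` as used above, and it does not handle records whose optimization has not finished.

```python
from typing import Any, Dict, Iterable


def final_molecules_for(ds: Any, names: Iterable[str], spec: str = "default") -> Dict[str, Any]:
    """Map each entry name to its optimized molecule for one specification.

    `ds` is expected to be an OptimizationDataset loaded as in the cells
    above. One record is looked up per entry name.
    """
    result = {}
    for name in names:
        record = ds.get_record(name, spec)
        result[name] = record.get_final_molecule()
    return result
```

With the dataset loaded above, `final_molecules_for(ds, ["CCO-0", "CO-0"])` would return a dictionary keyed by entry name.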

Get the optimization trajectory:

[27]:
r.get_molecular_trajectory()
[27]:
[<Molecule(name='C2H6O' formula='C2H6O' hash='29df3ae')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='93989e4')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='14261f7')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='3b6db86')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='b35d632')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='c900f12')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='a1e9d7a')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='422ad57')>]
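
In the output above, the last trajectory entry has the same hash (422ad57) as the final molecule. A check of that relationship might look like the sketch below. The helper name is hypothetical; it assumes the record exposes `get_final_molecule()` and `get_molecular_trajectory()` as used above, and that the returned Molecule objects provide a `get_hash()` method.

```python
from typing import Any


def trajectory_ends_at_final(record: Any) -> bool:
    """Return True if the last trajectory frame is the final molecule.

    Molecules are compared by hash, so two frames count as equal only
    if their hashes are identical.
    """
    trajectory = record.get_molecular_trajectory()
    if not trajectory:
        # An empty trajectory cannot end at the final molecule.
        return False
    return trajectory[-1].get_hash() == record.get_final_molecule().get_hash()
```

For the record `r` retrieved above, `trajectory_ends_at_final(r)` would be expected to return True, given the matching hashes in the printed output.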

From a TorsionDriveDataset

Load a TorsionDriveDataset:

[28]:
import qcportal as ptl
client = ptl.FractalClient()

ds = client.get_collection("TorsionDriveDataset", "SMIRNOFF Coverage Torsion Set 1")

Show some available torsions:

[29]:
ds.df.head()
[29]:
[CH3:1][O:2][CH:3]([OH:4])OC
[CH3:1][O:2][CH:3](O)[O:4]C
CO[CH:3]([OH:4])[O:2][CH3:1]
C[O:4][CH:3](O)[O:2][CH3:1]
[H:4][C:3](O)([O:2][CH3:1])OC

Show available specifications:

[30]:
ds.list_specifications()
[30]:
Name     Description
default  Standard OpenFF torsiondrive specification.

Get a specific torsiondrive:

[31]:
td = ds.get_record("CO[CH:3]([OH:4])[O:2][CH3:1]", "default")

Get molecules for each angle along the torsion scan:

[32]:
td.get_final_molecules()
[32]:
{(-75,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='60e16ca')>,
 (-90,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c337c03')>,
 (-60,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='b4ff4d4')>,
 (-105,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='5b05d3a')>,
 (-30,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='8737c8f')>,
 (-45,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='240c817')>,
 (-120,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='399d214')>,
 (0,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='f1b0dd1')>,
 (-15,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='05c30a0')>,
 (15,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='f329f87')>,
 (-150,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='1c56b54')>,
 (180,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='d299528')>,
 (-165,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c81a1fc')>,
 (-135,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='530c77d')>,
 (30,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='99156ab')>,
 (150,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='810b759')>,
 (45,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='e1f13fa')>,
 (165,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='df216e3')>,
 (60,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='e69654b')>,
 (75,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='5c12648')>,
 (135,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='35f87a2')>,
 (90,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='cdbfa17')>,
 (120,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='5271be0')>,
 (105,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c0f46d7')>}
[33]:
td.get_final_molecules()[(30,)]

[33]:
<Molecule(name='C3H8O3' formula='C3H8O3' hash='99156ab')>
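
The dictionary returned by get_final_molecules() is keyed by one-element tuples of grid angles and, as the output above shows, is not ordered by angle. A minimal sketch for walking the scan in order follows. The helper name `sorted_scan` is hypothetical, and it assumes a one-dimensional scan whose keys are 1-tuples as shown above.

```python
from typing import Any, Dict, List, Tuple


def sorted_scan(final_molecules: Dict[Tuple[int, ...], Any]) -> List[Tuple[int, Any]]:
    """Return (angle, molecule) pairs ordered by increasing angle."""
    ordered = sorted(final_molecules.items(), key=lambda item: item[0])
    return [(key[0], mol) for key, mol in ordered]


# Stand-in values (hashes copied from the output above); with a real
# record, pass td.get_final_molecules() instead.
example = {
    (-75,): "60e16ca",
    (180,): "d299528",
    (0,): "f1b0dd1",
    (30,): "99156ab",
}
for angle, mol in sorted_scan(example):
    print(angle, mol)
```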