Getting Molecules
This example shows how to get a molecule from QCArchive in a number of contexts.
From an ID
Every molecule computed with QCArchive is assigned a unique ID. If a molecule’s ID is known, it can be queried from the Molecules table.
[1]:
import qcportal as ptl
client = ptl.FractalClient()
For example, molecule 1234 is 1,2,3-trimethylbenzene.
[2]:
mol = client.query_molecules(1234)[0]
mol
[2]:
<Molecule(name='C9H12' formula='C9H12' hash='572b510')>
[3]:
print(mol)
Geometry (in Angstrom), charge = 0.0, multiplicity = 1:
Center X Y Z
------------ ----------------- ----------------- -----------------
C 0.776479871994 1.156134463385 0.121542591228
C 0.438429690334 0.679567908122 -1.141595091975
C 0.439577078821 0.423533055514 1.255585387764
C -0.363723536834 -0.465178778108 -1.279725991730
C -0.415502828385 -0.685937227907 1.160631416613
C -0.792912983429 -1.170236644458 -0.121804279943
C -0.744392084678 -0.917923156500 -2.666766549983
C -0.856925058179 -1.374181477949 2.427060703777
C -1.703936690413 -2.374380900784 -0.246989621254
H 1.380610203168 2.049406423411 0.216714048921
H 0.770290662964 1.232461941773 -2.011963177510
H 0.769502950936 0.784464203584 2.222141623291
H -0.238962510978 -1.878436765084 -2.898916777516
H -0.447809351101 -0.177691478927 -3.439954373507
H -1.844638825192 -1.050455805875 -2.735084841327
H -1.962016543060 -1.480103641644 2.438815782834
H -0.562925111565 -0.802128403465 3.332572326307
H -0.383242656300 -2.377541231755 2.485353500027
H -2.761425129123 -2.038610393380 -0.229251405356
H -1.542976842368 -3.097214459361 0.578338572599
H -1.519884697209 -2.938478658464 -1.182927991461
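Because the printed coordinates are in Angstrom, simple geometric checks can be done on them directly. As a minimal sketch, the snippet below copies the first two carbon positions from the output above by hand (it does not query the server) and measures their separation. The variable names are illustrative and not part of QCPortal.

```python
import math

# First two carbon positions, copied by hand from the printed geometry (Angstrom).
c1 = (0.776479871994, 1.156134463385, 0.121542591228)
c2 = (0.438429690334, 0.679567908122, -1.141595091975)

# Euclidean distance between the two atoms (math.dist requires Python 3.8+).
bond_length = math.dist(c1, c2)
print(f"C-C distance: {bond_length:.3f} Angstrom")
```

The result is close to 1.39 Angstrom, which is typical of a carbon-carbon bond in a benzene ring.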
The following sections show how to find molecule IDs from Collections.
From a Dataset
Load a Dataset:
[5]:
import qcportal as ptl
client = ptl.FractalClient()
ds = client.get_collection("Dataset", "SMIRNOFF Coverage Set 1")
get_molecules returns the molecules corresponding to each row of the Dataset:
[6]:
molecules = ds.get_molecules()
molecules
[6]:
index | molecule |
---|---|
C(CBr)c1n[nH]nn1-1 | Geometry (in Angstrom), charge = 0.0, mult... |
C(CBr)c1n[nH]nn1-2 | Geometry (in Angstrom), charge = 0.0, mult... |
C(CBr)c1n[nH]nn1-3 | Geometry (in Angstrom), charge = 0.0, mult... |
C(CBr)c1n[n-]nn1-0 | Geometry (in Angstrom), charge = -1.0, mul... |
C(CBr)c1n[n-]nn1-1 | Geometry (in Angstrom), charge = -1.0, mul... |
... | ... |
CSSCCN=C=S-7 | Geometry (in Angstrom), charge = 0.0, mult... |
CSSCCN=C=S-8 | Geometry (in Angstrom), charge = 0.0, mult... |
CSSCCN=C=S-9 | Geometry (in Angstrom), charge = 0.0, mult... |
CSSCCN=C=S-10 | Geometry (in Angstrom), charge = 0.0, mult... |
CSSCCN=C=S-11 | Geometry (in Angstrom), charge = 0.0, mult... |
1109 rows × 1 columns
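The index labels above look like a SMILES string followed by a hyphen and an integer. Assuming that the trailing integer distinguishes several entries for the same molecule (an inference from the output, not something stated here), the labels can be grouped with plain string handling. SMILES may itself contain hyphens, as in `[n-]`, so the split has to be taken from the right. The labels below are copied from the table, and `groups` is an illustrative name.

```python
from collections import defaultdict

# Index labels copied from the table above (a small subset of the 1109 rows).
labels = [
    "C(CBr)c1n[nH]nn1-1",
    "C(CBr)c1n[nH]nn1-2",
    "C(CBr)c1n[nH]nn1-3",
    "C(CBr)c1n[n-]nn1-0",
    "C(CBr)c1n[n-]nn1-1",
    "CSSCCN=C=S-10",
    "CSSCCN=C=S-11",
]

# Group the trailing integers by the text before the LAST hyphen.
groups = defaultdict(list)
for label in labels:
    smiles, _, suffix = label.rpartition("-")
    groups[smiles].append(int(suffix))

for smiles, suffixes in groups.items():
    print(smiles, suffixes)
```

Splitting on the first hyphen instead would cut `C(CBr)c1n[n-]nn1-0` inside the charged nitrogen and produce a meaningless prefix.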
Individual Molecule objects may be picked out of the DataFrame:
[8]:
molecules.loc["C(CBr)c1n[n-]nn1-0", "molecule"]
[8]:
<Molecule(name='BrC3H4N4' formula='BrC3H4N4' hash='9fd48c6')>
For large datasets, you may not want to query all molecules at once. get_molecules accepts a subset option for selecting specific molecules:
[9]:
ds.get_molecules(subset=['C(CBr)c1n[n-]nn1-0','CSSCCN=C=S-10'])
[9]:
index | molecule |
---|---|
C(CBr)c1n[n-]nn1-0 | Geometry (in Angstrom), charge = -1.0, mul... |
CSSCCN=C=S-10 | Geometry (in Angstrom), charge = 0.0, mult... |
If a single string is provided for subset, the Molecule object is returned directly:
[10]:
ds.get_molecules(subset='CSSCCN=C=S-10')
[10]:
<Molecule(name='C4H7NS3' formula='C4H7NS3' hash='fc1a1d6')>
From a ReactionDataset
Load a ReactionDataset:
[11]:
import qcportal as ptl
client = ptl.FractalClient()
ds = client.get_collection("ReactionDataset", "S22")
get_molecules returns the molecules corresponding to each reaction. By default, only the stoich="default" entry (the complex, for an interaction energy dataset like S22) is returned for every reaction:
[12]:
dimers = ds.get_molecules()
dimers
[12]:
name | stoichiometry | idx | molecule |
---|---|---|---|
2-Pyridone-2-Aminopyridine Complex | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Adenine-Thymine Complex Stack | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Adenine-Thymine Complex WC | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Ammonia Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene Dimer PD | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene Dimer T-Shape | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene-Ammonia Complex | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene-HCN Complex | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene-Methane Complex | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene-Water Complex | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Ethene Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Ethene-Ethine Complex | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Formamide Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Formic Acid Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Indole-Benzene Complex Stack | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Indole-Benzene Complex T-Shape | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Methane Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Phenol Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Pyrazine Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Uracil Dimer HB | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Uracil Dimer Stack | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Water Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
Individual Molecule objects may be picked out of the DataFrame:
[13]:
dimers.loc['Adenine-Thymine Complex WC', 'molecule'][0]
[13]:
<Molecule(name='C10H11N7O2' formula='C10H11N7O2' hash='5357c2c')>
Reactants and products (or monomers and complexes) may be picked out with the stoich keyword. For an interaction energy dataset like S22, stoich="default" corresponds to the complexes and stoich="default1" corresponds to the monomers without counterpoise corrections:
[14]:
monomers = ds.get_molecules(stoich="default1")
monomers.head(10)
[14]:
name | stoichiometry | idx | molecule |
---|---|---|---|
2-Pyridone-2-Aminopyridine Complex | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Adenine-Thymine Complex Stack | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Adenine-Thymine Complex WC | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Ammonia Dimer | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene Dimer PD | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
As before, the individual Molecule objects for the monomers may be extracted from the DataFrame:
[15]:
monomers.loc['Adenine-Thymine Complex WC', 'molecule'][0]
[15]:
<Molecule(name='C10H11N7O2 ((0,),[])' formula='C5H5N5' hash='c0e7ed3')>
[16]:
monomers.loc['Adenine-Thymine Complex WC', 'molecule'][1]
[16]:
<Molecule(name='C10H11N7O2 ((1,),[])' formula='C5H6N2O2' hash='a4f9749')>
Note that it is possible to get all molecules involved in a reaction by specifying a list for stoich:
[17]:
ds.get_molecules(stoich=['default', 'default1']).head(15)
[17]:
name | stoichiometry | idx | molecule |
---|---|---|---|
2-Pyridone-2-Aminopyridine Complex | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Adenine-Thymine Complex Stack | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Adenine-Thymine Complex WC | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Ammonia Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Benzene Dimer PD | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Counterpoise-corrected calculations are available through stoich="cp" and stoich="cp1". Counterpoise-corrected monomers contain ghost atoms:
[18]:
ds.get_molecules(stoich="cp1").loc['Adenine-Thymine Complex WC', 'molecule'][0]
[18]:
<Molecule(name='C10H11N7O2 ((0,),[1])' formula='C10H11N7O2' hash='d3955aa')>
[19]:
ds.get_molecules(stoich="cp1").loc['Adenine-Thymine Complex WC', 'molecule'][1]
[19]:
<Molecule(name='C10H11N7O2 ((1,),[0])' formula='C10H11N7O2' hash='e63c41f')>
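The monomer names in these outputs carry a suffix such as `((0,),[1])`. Comparing the default1 and cp1 results suggests that the tuple lists the real fragments and the list names the ghost fragments, but that reading is an assumption drawn from the outputs rather than a documented format. Under that assumption, a minimal sketch for splitting such a name follows. The helper `split_fragment_tag` is hypothetical and not part of QCPortal.

```python
import ast

def split_fragment_tag(name):
    """Split 'FORMULA ((real,),[ghost])' into (formula, real, ghost).

    Hypothetical helper: the suffix format is inferred from the outputs above.
    Names without a suffix yield (name, None, None).
    """
    base, sep, tag = name.partition(" ")
    if not sep:
        return base, None, None
    # The suffix is a valid Python literal, e.g. ((0,), [1]).
    real, ghost = ast.literal_eval(tag)
    return base, tuple(real), list(ghost)

# Names copied from the outputs above.
for name in ["C10H11N7O2", "C10H11N7O2 ((0,),[])", "C10H11N7O2 ((0,),[1])"]:
    print(split_fragment_tag(name))
```

With this reading, an empty list marks a plain monomer and a non-empty list marks a monomer computed in the presence of the partner's ghost atoms, which would explain why the cp1 molecules report the formula of the full complex.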
For large datasets, you may not want to query all molecules at once. get_molecules accepts a subset option for selecting specific reactions:
[20]:
ds.get_molecules(subset='Adenine-Thymine Complex WC')
[20]:
name | stoichiometry | idx | molecule |
---|---|---|---|
Adenine-Thymine Complex WC | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
[21]:
ds.get_molecules(subset=['Adenine-Thymine Complex WC', 'Ammonia Dimer', 'Water Dimer'], stoich=['default', 'default1'])
[21]:
name | stoichiometry | idx | molecule |
---|---|---|---|
Adenine-Thymine Complex WC | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Ammonia Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
Water Dimer | default | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | default1 | 0 | Geometry (in Angstrom), charge = 0.0, mult... |
 | | 1 | Geometry (in Angstrom), charge = 0.0, mult... |
From an OptimizationDataset
Load an OptimizationDataset:
[22]:
import qcportal as ptl
client = ptl.FractalClient()
client.list_collections()
ds = client.get_collection("OptimizationDataset", "SMIRNOFF Coverage Set 1")
Show some available molecules:
[23]:
ds.df.head()
[23]:
COC(O)OC-0 |
C[S-]-0 |
CS-0 |
CO-0 |
CCO-0 |
Show available specifications:
[24]:
ds.list_specifications()
[24]:
Name | Description |
---|---|
default | Standard OpenFF optimization quantum chemistry... |
Obtain a specific record from a molecule and specification:
[25]:
r = ds.get_record("CCO-0","default")
Get the optimized molecule:
[26]:
r.get_final_molecule()
[26]:
<Molecule(name='C2H6O' formula='C2H6O' hash='422ad57')>
Get the optimization trajectory:
[27]:
r.get_molecular_trajectory()
[27]:
[<Molecule(name='C2H6O' formula='C2H6O' hash='29df3ae')>,
<Molecule(name='C2H6O' formula='C2H6O' hash='93989e4')>,
<Molecule(name='C2H6O' formula='C2H6O' hash='14261f7')>,
<Molecule(name='C2H6O' formula='C2H6O' hash='3b6db86')>,
<Molecule(name='C2H6O' formula='C2H6O' hash='b35d632')>,
<Molecule(name='C2H6O' formula='C2H6O' hash='c900f12')>,
<Molecule(name='C2H6O' formula='C2H6O' hash='a1e9d7a')>,
<Molecule(name='C2H6O' formula='C2H6O' hash='422ad57')>]
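In the output above, the last entry of the trajectory carries the same hash as the molecule returned by get_final_molecule. A small sketch of that check follows, using the short hashes copied from the two outputs as stand-ins for live Molecule objects. The variable names are illustrative.

```python
# Short hashes copied from the get_molecular_trajectory() output, in order.
trajectory_hashes = [
    "29df3ae", "93989e4", "14261f7", "3b6db86",
    "b35d632", "c900f12", "a1e9d7a", "422ad57",
]
# Short hash copied from the get_final_molecule() output.
final_hash = "422ad57"

n_geometries = len(trajectory_hashes)
last_matches_final = trajectory_hashes[-1] == final_hash

print(f"{n_geometries} geometries in the trajectory")
print(f"last entry is the final molecule: {last_matches_final}")
```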
From a TorsionDriveDataset
[28]:
import qcportal as ptl
client = ptl.FractalClient()
ds = client.get_collection("TorsionDriveDataset", "SMIRNOFF Coverage Torsion Set 1")
Show some available torsions:
[29]:
ds.df.head()
[29]:
[CH3:1][O:2][CH:3]([OH:4])OC |
[CH3:1][O:2][CH:3](O)[O:4]C |
CO[CH:3]([OH:4])[O:2][CH3:1] |
C[O:4][CH:3](O)[O:2][CH3:1] |
[H:4][C:3](O)([O:2][CH3:1])OC |
Show available specifications:
[30]:
ds.list_specifications()
[30]:
Name | Description |
---|---|
default | Standard OpenFF torsiondrive specification. |
Get a specific torsiondrive:
[31]:
td = ds.get_record("CO[CH:3]([OH:4])[O:2][CH3:1]", "default")
Get molecules for each angle along the torsion scan:
[32]:
td.get_final_molecules()
[32]:
{(-75,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='60e16ca')>,
(-90,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c337c03')>,
(-60,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='b4ff4d4')>,
(-105,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='5b05d3a')>,
(-30,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='8737c8f')>,
(-45,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='240c817')>,
(-120,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='399d214')>,
(0,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='f1b0dd1')>,
(-15,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='05c30a0')>,
(15,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='f329f87')>,
(-150,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='1c56b54')>,
(180,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='d299528')>,
(-165,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c81a1fc')>,
(-135,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='530c77d')>,
(30,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='99156ab')>,
(150,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='810b759')>,
(45,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='e1f13fa')>,
(165,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='df216e3')>,
(60,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='e69654b')>,
(75,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='5c12648')>,
(135,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='35f87a2')>,
(90,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='cdbfa17')>,
(120,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='5271be0')>,
(105,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c0f46d7')>}
[33]:
td.get_final_molecules()[(30,)]
[33]:
<Molecule(name='C3H8O3' formula='C3H8O3' hash='99156ab')>
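The dictionary returned by get_final_molecules is keyed by tuples of angles (1-tuples in this scan, presumably one element per scanned dihedral), and the keys in the output above are not in angular order. A minimal sketch of putting a scan in order follows, using a few key and hash pairs copied from that output as stand-ins for the Molecule objects. The names `final_hashes` and `ordered` are illustrative.

```python
# A few (angle,) -> short hash pairs copied from the output above.
final_hashes = {
    (-75,): "60e16ca",
    (-90,): "c337c03",
    (0,): "f1b0dd1",
    (180,): "d299528",
    (30,): "99156ab",
}

# Tuples sort element-wise, so sorting the items orders the scan by angle.
ordered = sorted(final_hashes.items())
angles = [key[0] for key, _ in ordered]

for (angle,), short_hash in ordered:
    print(f"{angle:>5d}  {short_hash}")
```

Note that a lookup needs the tuple form of the key, for example `(30,)` rather than `30`.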