You can run this notebook online in a Binder session or view it on Github.

TorsionDrive Datasets

An individual TorsionDrive is specific workflow in the QCArchive ecosystem that computes the energy profile of rotatable dihedral (torsion) for use in Force Field fitting. This workflow has been developed by the QCArchive Team in conjunction with:

The top-level TorsionDrive code can be found here.

The TorsionDrive Dataset organizes many TorsionDrives together for data exploration, analysis, and methodology comparison. To begin, we can connect to the MolSSI QCArchive server and query all known TorsionDrive Datasets:

[1]:
import qcportal as ptl
client = ptl.FractalClient()
client
[1]:

FractalClient

  • Server:   The MolSSI QCArchive Server
  • Address:   https://api.qcarchive.molssi.org:443/
  • Username:   None
[2]:
client.list_collections("TorsionDriveDataset")
[2]:
tagline
collection name
TorsionDriveDataset OpenFF Fragmenter Phenyl Benchmark Phenyl substituent torsional barrier heights.
OpenFF Group1 Torsions None

One of the datasets can be obtained in the canonical manner:

[3]:
ds = client.get_collection("TorsionDriveDataset", "OpenFF Fragmenter Phenyl Benchmark")

Exploring the Dataset

Each row of the dataset is comprised of a Entry which contains a list of starting molecules for the TorsionDrive, the dihedral of interest (in this case zero-indexed), and the scan resolution of the dihedral. For the purposes of the underlying DataFrame each row is the name of this Entry.

[4]:
ds.df.head()
[4]:
c1c[cH:1][c:2](cc1)[C:3](=[O:4])O
c1[cH:1][c:2](cnc1)[C:3](=[O:4])O
[cH:1]1cncc[c:2]1[C:3](=[O:4])O
[cH:1]1cc(nc[c:2]1[C:3](=[O:4])O)[O-]
Cc1c[cH:1][c:2](cn1)[C:3](=[O:4])O

New computations are based off specifications which contain many additional parameters to tune the torsiondrive as well as the underlying computational method. Here, we can list all specifications that are attributed to this dataset.

[5]:
ds.list_specifications()
[5]:
Description
Name
UFF UFF gradient evaluation with RDKit
B3LYP-D3 B3LYP-D3 evaluation with Psi4

As we can see, we have several specifications at different levels of theory. It is important to recall that these Collections are “live”: new specifications can be added and individual TorsionDrives can be under computation. To see the current status of each specification the status function is provided:

[6]:
ds.status(["uff", "b3lyp-d3"])
[6]:
UFF B3LYP-D3
COMPLETE 165 226
INCOMPLETE 62 1
[7]:
ds.status(["b3lyp-d3"], collapse=False, status="COMPLETE").head()
[7]:
B3LYP-D3
c1c[cH:1][c:2](cc1)[C:3](=[O:4])O COMPLETE
c1[cH:1][c:2](cnc1)[C:3](=[O:4])O COMPLETE
[cH:1]1cncc[c:2]1[C:3](=[O:4])O COMPLETE
[cH:1]1cc(nc[c:2]1[C:3](=[O:4])O)[O-] COMPLETE
Cc1c[cH:1][c:2](cn1)[C:3](=[O:4])O COMPLETE

Visualizing the TorsionDrives

TorsionDrives can be visualized via the visualize command. Multiple torsiondrives can be plotted on the same graph for comparison.

[8]:
ds.visualize(["[CH3:4][O:3][c:2]1[cH:1]cccc1", "[CH3:4][O:3][c:2]1[cH:1]ccnc1"], "B3LYP-D3", units="kJ / mol")

Computational Requirements

We can check the number of optimizations and individual gradient evaluations by using the count method. By default, the count method shows the number of individual geometry optimizations:

[9]:
ds.counts(["[CH3:4][O:3][c:2]1[cH:1]cccc1", "[CH3:4][O:3][c:2]1[cH:1]ccnc1"])
[9]:
UFF B3LYP-D3
[CH3:4][O:3][c:2]1[cH:1]cccc1 49 49
[CH3:4][O:3][c:2]1[cH:1]ccnc1 49 53

In addition to individual optimization runs, we can also find a sum of how many gradient evaluations were performed in the course of the run.

[10]:
ds.counts(["[CH3:4][O:3][c:2]1[cH:1]cccc1", "[CH3:4][O:3][c:2]1[cH:1]ccnc1"], count_gradients=True)
[10]:
UFF B3LYP-D3
[CH3:4][O:3][c:2]1[cH:1]cccc1 606 601
[CH3:4][O:3][c:2]1[cH:1]ccnc1 538 680

Deep data inspection

The TorsionDriveDataset also allows exploration of every single computation and molecule created during the individual TorsionDrive executions. Examples below this point will only be for those who wish to explore all of the individual computations in a TorsionDrive Dataset.

These TorsionDrive Datasets are different from canonical ReactionDataset and Dataset collections as each item in the underlying Pandas DataFrame is a TorsionDriveRecord object which contains links to all data used in the TorsionDrive. We can observe the these in the underlying DataFrame:

[11]:
ds.df.head()
[11]:
UFF B3LYP-D3
c1c[cH:1][c:2](cc1)[C:3](=[O:4])O TorsionDriveRecord(id='5c951108cc2095305535f47... TorsionDriveRecord(id='5c955144cc209530555d9af...
c1[cH:1][c:2](cnc1)[C:3](=[O:4])O TorsionDriveRecord(id='5c951108cc2095305535f48... TorsionDriveRecord(id='5c955144cc209530555d9b1...
[cH:1]1cncc[c:2]1[C:3](=[O:4])O TorsionDriveRecord(id='5c951108cc2095305535f48... TorsionDriveRecord(id='5c955144cc209530555d9b2...
[cH:1]1cc(nc[c:2]1[C:3](=[O:4])O)[O-] TorsionDriveRecord(id='5c951108cc2095305535f49... TorsionDriveRecord(id='5c955144cc209530555d9b3...
Cc1c[cH:1][c:2](cn1)[C:3](=[O:4])O TorsionDriveRecord(id='5c951108cc2095305535f4a... TorsionDriveRecord(id='5c955144cc209530555d9b5...

We can then begin to explore an individual TorsionDriveRecord by first pulling it from the DataFrame:

[12]:
td = ds.df.loc["[CH3:4][O:3][c:2]1[cH:1]cccc1", "B3LYP-D3"]

We can then request a variety of attributes off this TorsionDriveRecord:

[13]:
print("Torsion of interest                : {}".format(td.keywords.dihedrals))
print("Final optimization energy in hartree: {}".format(td.get_final_energies(180)))
Torsion of interest                : [(3, 5, 7, 6)]
Final optimization energy in hartree: -346.5319986074462
[14]:
td.get_final_molecules(90)

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

[14]:
<Molecule(name='C7H8O' formula='C7H8O' hash='a6601a8')>

The Molecule object has a measure attribute so that we check the dihedral angle is in fact 180 degrees.

[15]:
"3-5-7-6 dihedral in degrees: {}".format(td.get_final_molecules(180).measure([3, 5, 7, 6]))
[15]:
'3-5-7-6 dihedral in degrees: 179.99999995886662'

Exploring connection optimizations

If desired, we can pull each geometry optimization belonging to the torsiondrive with the get_history function. In this case, we will pull the lowest energy optimization for the 180 degree dihedral:

[16]:
opt = td.get_history(180, minimum=True)
opt
[16]:
<OptimizationRecord(id='5c988a95cc20953055316237' status='COMPLETE')>
[17]:
opt.energies
[17]:
[-346.53147622614085,
 -346.53177915374135,
 -346.5318873591642,
 -346.53194287295247,
 -346.53197743434515,
 -346.5319973422999,
 -346.53199831291954,
 -346.53199826976714,
 -346.5319986074462]

Exploring individual gradient evaluations

We can go even deeper in the calculations to look at each gradient calculation of the Optimization calculation if we so choose and see even more details about how the TorsionDrive object was constructed.

[18]:
result = opt.get_trajectory()[-1]
[19]:
print("Program:                   {}".format(result.program))
print("Number of Basis Functions: {}".format(result.properties.calcinfo_nbasis))
print("Total execution time:      {:.2f}s".format(result.provenance.wall_time))
Program:                   psi4
Number of Basis Functions: 152
Total execution time:      12.15s

This example also contains the Wiberg-Lowdin indices. As this is specific to Psi4, this data resides inside the extras tag rather than general properties. This data is not yet well curated and currently exists as a 1D list. We will first pull this data and transform it to a 2D array:

[20]:
import numpy as np
wiberg = np.array(result.extras["qcvars"]["WIBERG_LOWDIN_INDICES"]).reshape(-1, 16)

As this particular example is exploring the 3-5-7-6 dihedral, we would find the most use in the 5-7 bond. This can be acquired as follows:

[21]:
wiberg[5, 7]
[21]:
1.2700940280530457

We can continue to explore these results and even obtain the standard Psi4 logging information!

[22]:
print(result.get_stdout()[:1000])

  Memory set to  60.800 GiB by Python driver.
gradient() will perform analytic gradient computation.

*** tstart() called on dt039
*** at Mon Mar 25 04:02:23 2019

   => Loading Basis Set <=

    Name: DEF2-SVP
    Role: ORBITAL
    Keyword: BASIS
    atoms 1-7  entry C          line    90 file /home/lnaden/miniconda3/envs/qca/share/psi4/basis/def2-svp.gbs
    atoms 8    entry O          line   130 file /home/lnaden/miniconda3/envs/qca/share/psi4/basis/def2-svp.gbs
    atoms 9-16 entry H          line    15 file /home/lnaden/miniconda3/envs/qca/share/psi4/basis/def2-svp.gbs


         ---------------------------------------------------------
                                   SCF
               by Justin Turney, Rob Parrish, Andy Simmonett
                          and Daniel G. A. Smith
                              RKS Reference
                        6 Threads,  62259 MiB Core
         ---------------------------------------------------------

  ==> Geometry <==

    Molecular