You can run this notebook online in a Binder session or view it on GitHub.

Optimization Datasets

An OptimizationDataset represents geometry optimizations performed on a set of molecules.

[1]:
import qcportal as ptl
client = ptl.FractalClient()
client
[1]:

FractalClient

  • Server:   The MolSSI QCArchive Server
  • Address:   https://api.qcarchive.molssi.org:443/
  • Username:   None
[2]:
client.list_collections("OptimizationDataset")
[2]:
collection           name                                      tagline
OptimizationDataset  FDA Optimization Dataset 1                None
                     JGI Metabolite Set 1                      None
                     OpenFF Discrepancy Benchmark 1            None
                     OpenFF Full Optimization Benchmark 1      None
                     OpenFF NCI250K Boron 1                    None
                     OpenFF Optimization Set 1                 None
                     OpenFF Primary Optimization Benchmark 1   None
                     OpenFF VEHICLe Set 1                      None
                     PEI                                       None
                     Pfizer Discrepancy Optimization Dataset 1 None
                     QM8-T                                     None
                     SMIRNOFF Coverage Set 1                   None
[3]:
ds = client.get_collection("OptimizationDataset", "SMIRNOFF Coverage Set 1")

Exploring the Dataset

Each row of the dataset is an Entry, which corresponds to a single molecule.

[4]:
ds.df.head()
[4]:
COC(O)OC-0
C[S-]-0
CS-0
CO-0
CCO-0

New computations are based on specifications, which contain the parameters that tune the geometry optimization as well as the underlying computational method. Here, we list all specifications attached to this dataset.

[5]:
ds.list_specifications()
[5]:
Name     Description
default  Standard OpenFF optimization quantum chemistry...

In this case, there is one specification, corresponding to a single level of theory. Keep in mind that these Collections are “live”: new specifications can be added at any time, and individual optimizations may still be running. The status function reports the current state of each specification:

[6]:
ds.status(["default"])
[6]:
default
COMPLETE 1118
INCOMPLETE 7
ERROR 7
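
The status counts can be reduced to a single completion fraction. The following is a minimal sketch that copies the numbers shown above into a plain dictionary rather than querying the server; the helper name `completion_fraction` is hypothetical and not part of qcportal.

```python
def completion_fraction(status_counts):
    """Return the share of optimizations marked COMPLETE (0.0 if there are none)."""
    total = sum(status_counts.values())
    return status_counts.get("COMPLETE", 0) / total if total else 0.0


# Counts copied by hand from the status output above ("default" specification).
default_status = {"COMPLETE": 1118, "INCOMPLETE": 7, "ERROR": 7}

total = sum(default_status.values())
fraction = completion_fraction(default_status)
print(f"{default_status['COMPLETE']} of {total} optimizations complete ({fraction:.1%})")
```

Because the collection is live, these numbers will likely differ when the notebook is rerun.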

The number of geometry optimization steps taken for each molecule can be shown:

[7]:
ds.counts()
[7]:
default
COC(O)OC-0 11.0
C[S-]-0 6.0
CS-0 5.0
CO-0 4.0
CCO-0 8.0
... ...
CSSCCN=C=S-7 26.0
CSSCCN=C=S-8 45.0
CSSCCN=C=S-9 48.0
CSSCCN=C=S-10 60.0
CSSCCN=C=S-11 38.0

1118 rows × 1 columns
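
Step counts are a quick way to spot optimizations that converged slowly. As a rough sketch, the snippet below copies only the ten rows displayed above into a dictionary (it does not query the server) and reports their mean and the entry with the most steps. `summarize_steps` is a hypothetical helper, not a qcportal function.

```python
from statistics import mean


def summarize_steps(step_counts):
    """Return (mean number of steps, name of the entry with the most steps)."""
    slowest = max(step_counts, key=step_counts.get)
    return mean(step_counts.values()), slowest


# Only the ten rows visible in the output above; the full table has 1118 rows.
shown_counts = {
    "COC(O)OC-0": 11,
    "C[S-]-0": 6,
    "CS-0": 5,
    "CO-0": 4,
    "CCO-0": 8,
    "CSSCCN=C=S-7": 26,
    "CSSCCN=C=S-8": 45,
    "CSSCCN=C=S-9": 48,
    "CSSCCN=C=S-10": 60,
    "CSSCCN=C=S-11": 38,
}

avg_steps, slowest_entry = summarize_steps(shown_counts)
print(f"mean steps over shown rows: {avg_steps}, slowest: {slowest_entry}")
```

The same summary over the whole dataset would presumably be computed on the table returned by `ds.counts()` directly.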

Individual records can be pulled for molecules:

[8]:
optrec = ds.get_record(name="CCO-0", specification="default")

These records contain the geometries and energies of the optimization trajectory. Below are a few examples of data that can be pulled from an OptimizationRecord. The initial and final molecules can be extracted:

[9]:
optrec.get_initial_molecule()
[10]:
optrec.get_final_molecule()

The energy trajectory of the optimization can also be plotted:

[11]:
optrec.show_history()
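
When comparing trajectories it is often convenient to express each step's energy relative to the final, optimized geometry. The sketch below assumes the trajectory energies are available as a list of floats in hartree (how they are obtained from the record depends on the installed qcportal version and is not shown here). The input values are placeholders for illustration only and are not taken from the CCO-0 record; `relative_energies` is a hypothetical helper.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion factor


def relative_energies(energies_hartree):
    """Return each energy relative to the last point, converted to kcal/mol."""
    final = energies_hartree[-1]
    return [(e - final) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


# Placeholder trajectory (hartree); NOT real data from the dataset.
example_energies = [-155.000, -155.002, -155.003]

rel = relative_energies(example_energies)
for step, value in enumerate(rel):
    print(f"step {step}: {value:.3f} kcal/mol above the final geometry")
```

For a well-behaved optimization these relative energies should generally decrease toward zero at the last step.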