Optimization Datasets
An OptimizationDataset represents geometry optimizations performed on a set of molecules.
[1]:
import qcportal as ptl
client = ptl.FractalClient()
client
[1]:
FractalClient
- Server: The MolSSI QCArchive Server
- Address: https://api.qcarchive.molssi.org:443/
- Username: None
[2]:
client.list_collections("OptimizationDataset")
[2]:
collection | name | tagline |
---|---|---|
OptimizationDataset | FDA Optimization Dataset 1 | None |
OptimizationDataset | JGI Metabolite Set 1 | None |
OptimizationDataset | OpenFF Discrepancy Benchmark 1 | None |
OptimizationDataset | OpenFF Full Optimization Benchmark 1 | None |
OptimizationDataset | OpenFF NCI250K Boron 1 | None |
OptimizationDataset | OpenFF Optimization Set 1 | None |
OptimizationDataset | OpenFF Primary Optimization Benchmark 1 | None |
OptimizationDataset | OpenFF VEHICLe Set 1 | None |
OptimizationDataset | PEI | None |
OptimizationDataset | Pfizer Discrepancy Optimization Dataset 1 | None |
OptimizationDataset | QM8-T | None |
OptimizationDataset | SMIRNOFF Coverage Set 1 | None |
[3]:
ds = client.get_collection("OptimizationDataset", "SMIRNOFF Coverage Set 1")
Exploring the Dataset
Each row of the dataset is an Entry, which corresponds to a molecule.
[4]:
ds.df.head()
[4]:
COC(O)OC-0 |
C[S-]-0 |
CS-0 |
CO-0 |
CCO-0 |
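The entry names displayed above appear to follow a `<SMILES>-<integer>` pattern. As a minimal sketch, assuming that pattern holds for every entry (this notebook does not state it, and the meaning of the integer is not given here), the hypothetical helper `split_entry_name` separates the two parts. It is not part of qcportal; it only illustrates how names could be grouped by their SMILES string.

```python
def split_entry_name(name):
    """Split an entry name such as 'CCO-0' into (smiles, index).

    Assumes the name ends in '-<integer>'. This pattern is inferred from
    the names displayed above, not taken from the qcportal documentation.
    """
    # Split on the LAST hyphen only, because a SMILES string can itself
    # contain hyphens (for example the negative charge in 'C[S-]').
    smiles, _, index = name.rpartition("-")
    return smiles, int(index)


# Entry names copied from the output of ds.df.head() above.
names = ["COC(O)OC-0", "C[S-]-0", "CS-0", "CO-0", "CCO-0"]
parsed = [split_entry_name(n) for n in names]
print(parsed)
```

Splitting from the right keeps charged species such as `C[S-]` intact, which a plain `name.split("-")` would break apart.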
New computations are based on specifications, which contain many additional parameters for tuning the geometry optimization as well as the underlying computational method. Here, we list all specifications attached to this dataset.
[5]:
ds.list_specifications()
[5]:
Name | Description |
---|---|
default | Standard OpenFF optimization quantum chemistry... |
In this case, there is one specification, corresponding to a single level of theory. Keep in mind that these Collections are “live”: new specifications can be added, and individual optimizations may still be running. The status function reports the current state of each specification:
[6]:
ds.status(["default"])
[6]:
Status | default |
---|---|
COMPLETE | 1118 |
INCOMPLETE | 7 |
ERROR | 7 |
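To put the status counts in perspective, the short sketch below converts them into fractions of the whole dataset. The numbers are copied by hand from the table above rather than fetched from the server, so they are only a snapshot; on a live collection they will differ.

```python
# Status counts copied from the table above. On a live server these
# numbers change as calculations finish or fail.
status_counts = {"COMPLETE": 1118, "INCOMPLETE": 7, "ERROR": 7}

total = sum(status_counts.values())
fractions = {state: count / total for state, count in status_counts.items()}

# Print one line per status: name, raw count, and share of all entries.
for state, frac in fractions.items():
    print(f"{state:<10} {status_counts[state]:>5}  {frac:6.1%}")
```

With these counts, just under 99% of the optimizations for the default specification had completed when the snapshot was taken.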
The number of geometry steps for each molecule can be shown:
[7]:
ds.counts()
[7]:
Entry | default |
---|---|
COC(O)OC-0 | 11.0 |
C[S-]-0 | 6.0 |
CS-0 | 5.0 |
CO-0 | 4.0 |
CCO-0 | 8.0 |
... | ... |
CSSCCN=C=S-7 | 26.0 |
CSSCCN=C=S-8 | 45.0 |
CSSCCN=C=S-9 | 48.0 |
CSSCCN=C=S-10 | 60.0 |
CSSCCN=C=S-11 | 38.0 |

1118 rows × 1 columns
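Step counts like these are often summarized to spot optimizations that converge slowly. The sketch below does so with the standard library, using only the ten rows displayed above (typed in by hand), so the statistics describe that small sample and not the full 1118-row table.

```python
from statistics import mean, median

# Step counts for the ten rows displayed above; the full table has 1118 rows.
step_counts = {
    "COC(O)OC-0": 11, "C[S-]-0": 6, "CS-0": 5, "CO-0": 4, "CCO-0": 8,
    "CSSCCN=C=S-7": 26, "CSSCCN=C=S-8": 45, "CSSCCN=C=S-9": 48,
    "CSSCCN=C=S-10": 60, "CSSCCN=C=S-11": 38,
}

mean_steps = mean(step_counts.values())
median_steps = median(step_counts.values())
# Entry that needed the most geometry steps among the displayed rows.
slowest = max(step_counts, key=step_counts.get)

print("mean steps:  ", mean_steps)
print("median steps:", median_steps)
print("most steps:  ", slowest, step_counts[slowest])
```

The same summary could be computed directly on the DataFrame returned by the counts call, but that requires a live connection to the server.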
Individual records can be pulled for molecules:
[8]:
optrec = ds.get_record(name="CCO-0", specification="default")
These records contain the geometries and energies of the optimization trajectory. Below are some examples of data that can be pulled from an OptimizationRecord. The initial and final molecules can be extracted:
[9]:
optrec.get_initial_molecule()
[10]:
optrec.get_final_molecule()
And the energy trajectory of the optimization can be plotted:
[11]:
optrec.show_history()
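When comparing trajectories numerically, it is often convenient to express each step's energy relative to the final geometry. The sketch below assumes the per-step energies are available as a plain list of floats in hartree (in qcportal versions of this era this is probably the record's `energies` attribute, but check your installed version). The helper `relative_energies` is hypothetical, and the example values are placeholders, not data from this dataset.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion factor


def relative_energies(energies_hartree):
    """Return each energy relative to the last (final) step, in kcal/mol."""
    final = energies_hartree[-1]
    return [(e - final) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


# Placeholder energies for illustration only; NOT taken from the dataset.
example_energies = [-155.000, -155.010, -155.012]
rel = relative_energies(example_energies)

for step, value in enumerate(rel):
    print(f"step {step}: {value:8.3f} kcal/mol above the final geometry")
```

By construction the final step sits at zero, so a well-behaved optimization shows values that shrink toward zero as the steps proceed.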