├── .gitattributes
├── .gitignore
├── README.md
├── binary_class_pred.py
├── bpe_tokenizer.py
├── create_hparam_set.py
├── create_selfies_alphabet.py
├── data
│   ├── BPETokenizer
│   │   ├── bpe.json
│   │   ├── merges.txt
│   │   └── vocab.json
│   ├── RobertaFastTokenizer
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── finetuning_datasets
│   │   ├── classification
│   │   │   ├── bace
│   │   │   │   └── bace.csv
│   │   │   ├── bbbp
│   │   │   │   ├── bbbp.csv
│   │   │   │   └── bbbp_mock.csv
│   │   │   ├── hiv
│   │   │   │   └── hiv.csv
│   │   │   ├── sider
│   │   │   │   ├── sider.csv
│   │   │   │   └── sider_mock.csv
│   │   │   └── tox21
│   │   │       └── tox21.csv
│   │   └── regression
│   │       ├── esol
│   │       │   ├── esol.csv
│   │       │   └── esol_mock.csv
│   │       ├── freesolv
│   │       │   └── freesolv.csv
│   │       ├── lipo
│   │       │   └── lipo.csv
│   │       └── pdbbind_full
│   │           └── pdbbind_full.csv
│   ├── molecule_dataset_selfies.zip
│   ├── molecule_dataset_smiles.zip
│   ├── pretraining_hyperparameters.yml
│   └── requirements.yml
├── figures
│   └── selformer_architecture.png
├── generate_selfies.py
├── get_embeddings.py
├── get_moleculenet_embeddings.py
├── multilabel_class_pred.py
├── prepare_finetuning_data.py
├── prepare_pretraining_data.py
├── produce_embeddings.py
├── regression_pred.py
├── roberta_model.py
├── roberta_tokenizer.py
├── to_selfies.py
├── train_classification_model.py
├── train_classification_multilabel_model.py
├── train_pretraining_model.py
└── train_regression_model.py
/.gitattributes:
--------------------------------------------------------------------------------
1 | *.bin filter=lfs diff=lfs merge=lfs -text
2 | chembl_29_selfies.csv filter=lfs diff=lfs merge=lfs -text
3 | chembl_29_chemreps.txt filter=lfs diff=lfs merge=lfs -text
4 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | .vscode
3 |
4 | finetuned_models/
5 | pretrained_models/
6 | chembl_29_selfies.csv
7 | chembl_29_selfies.zip
8 | chembl_29_selfies_subset.txt
9 |
10 | *.ipynb
11 | *.out
12 | *.toml
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SELFormer: Molecular Representation Learning via SELFIES Language Models
2 |
3 |
4 |
5 | [DOI: 10.1088/2632-2153/acdb30](https://doi.org/10.1088/2632-2153/acdb30) [License: GPL v3](http://www.gnu.org/licenses/)
6 |
7 | Automated computational analysis of the vast chemical space is critical for numerous fields of research such as drug discovery and materials science. Representation learning techniques have recently been employed with the primary objective of generating compact and informative numerical expressions of complex data. One approach to efficiently learn molecular representations is processing string-based notations of chemicals via natural language processing (NLP) algorithms. The majority of the methods proposed so far utilize SMILES notations for this purpose; however, SMILES is associated with numerous problems related to validity and robustness, which may prevent the model from effectively uncovering the knowledge hidden in the data. In this study, we propose SELFormer, a transformer architecture-based chemical language model that utilizes a 100% valid, compact and expressive notation, SELFIES, as input, in order to learn flexible and high-quality molecular representations. SELFormer is pre-trained on two million drug-like compounds and fine-tuned for diverse molecular property prediction tasks. Our performance evaluation has revealed that SELFormer outperforms all competing methods, including graph learning-based approaches and SMILES-based chemical language models, on predicting aqueous solubility of molecules and adverse drug reactions. We also visualized molecular representations learned by SELFormer via dimensionality reduction, which indicated that even the pre-trained model can discriminate molecules with differing structural properties. We shared SELFormer as a programmatic tool, together with its datasets and pre-trained models. Overall, our research demonstrates the benefit of using the SELFIES notations in the context of chemical language modeling and opens up new possibilities for the design and discovery of novel drug candidates with desired features.
8 |
9 |
10 |
11 | **Figure.** The schematic representation of the SELFormer architecture and the experiments conducted. **Left:** the self-supervised pre-training utilizes the transformer encoder module via masked language modeling for learning concise and informative representations of small molecules encoded by their SELFIES notation. **Right:** the pre-trained model has been fine-tuned independently on numerous molecular property-based classification and regression tasks.
12 |
13 |
14 |
15 |
16 | ## The Architecture of SELFormer
17 |
18 | SELFormer is built on the RoBERTa transformer architecture, which uses the same architecture as BERT with certain modifications that have been found to improve model performance or provide other benefits. One such modification is the use of byte-level Byte-Pair Encoding (BPE) for tokenization instead of character-level BPE. Another is that RoBERTa is pre-trained exclusively on the masked language modeling (MLM) objective, disregarding the next sentence prediction (NSP) task. SELFormer has (i) self-supervised pre-trained models that utilize the transformer encoder module for learning concise and informative representations of small molecules encoded by their SELFIES notation, and (ii) supervised classification/regression models which use the pre-trained model as base and are fine-tuned on numerous classification- and regression-based molecular property prediction tasks.
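To give a feel for how BPE merges assemble SELFIES tokens, here is a minimal pure-Python sketch. The merge rules are copied from the first entries of `data/BPETokenizer/bpe.json`, and the bracket splitting mirrors the `Split` pre-tokenizer configured there. The helper names `pre_tokenize`, `apply_bpe`, and `tokenize` are hypothetical, and this greedy loop only approximates what the `tokenizers` library does internally.

```python
import re

# First merge rules from data/BPETokenizer/bpe.json (a small subset, for illustration).
MERGES = [
    "B r", "a n", "c h", "Br an", "Bran ch", "Branch 1",
    "= C", "R i", "n g", "Ri ng", "Ring 1", "= Branch1",
]


def pre_tokenize(selfies):
    """Split on '[' and ']' and drop the brackets, like the Split pre-tokenizer."""
    return [piece for piece in re.split(r"\[|\]", selfies) if piece]


def apply_bpe(word, merges):
    """Greedily apply the earliest-learned merge rule until none applies."""
    ranks = {tuple(rule.split(" ")): i for i, rule in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda pair: ranks.get(pair, float("inf")))
        if best not in ranks:
            break  # no learned merge applies to any adjacent pair
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def tokenize(selfies, merges=MERGES):
    """Pre-tokenize a SELFIES string, then run BPE on every piece."""
    tokens = []
    for piece in pre_tokenize(selfies):
        tokens.extend(apply_bpe(piece, merges))
    return tokens


print(tokenize("[C][=Branch1][Ring1]"))
```

With only this subset of rules, frequent SELFIES symbols such as `=Branch1` and `Ring1` collapse into single tokens, while a symbol without a learned merge (for example `Cl` here) stays split into characters.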
19 |
20 | Our pre-trained encoder models are implemented as "RobertaMaskedLM" and fine-tuning models as "RobertaForSequenceClassification". For the fine-tuning process, the SELFormer architecture includes the pre-trained RoBERTa model as its base, and "RobertaClassificationHead" class as the following layers (for classification and regression). "RobertaClassificationHead" class consists of a dropout layer, a dense layer, tanh activation function, a dropout layer, and a final linear layer. We forward the sequence output of the pre-trained RoBERTa base model to the classifier during the fine-tuning process.
21 |
22 |
23 |
24 | ## Getting Started
25 |
26 | We highly recommend the Conda platform for installing dependencies. Following the installation of Conda, please create and activate an environment with dependencies as defined below:
27 |
28 | ```
29 | conda create -n SELFormer_env
30 | conda activate SELFormer_env
31 | conda env update --file data/requirements.yml
32 | ```
33 |
34 |
35 | ## Generating Molecule Embeddings Using Pre-trained Models
36 |
37 | Pre-trained SELFormer models are available for download [here](https://drive.google.com/drive/folders/1c3Mwc3j4M0PHk_iORrKU_V5cuxkD9aM6?usp=share_link). Embeddings of all molecules from CHEMBL30 and CHEMBL33 that are generated by our best performing model are available [here](https://drive.google.com/drive/folders/1Ii44Z6HonzJv5B5VYFujVaSThf802e2M?usp=sharing).
38 |
39 | You can also generate embeddings for your own dataset using the pre-trained models. To do so, you will need SELFIES notations of your molecules. You can use the command below to generate SELFIES notations for your SMILES dataset.
40 |
41 | If you want to reproduce our code for generating embeddings of CHEMBL30 dataset, you can unzip __molecule_dataset_smiles.zip__ and/or __molecule_dataset_selfies.zip__ files in the __data__ directory and use them as input SMILES and SELFIES datasets, respectively.
42 |
43 | ```
44 | python3 generate_selfies.py --smiles_dataset=data/molecule_dataset_smiles.txt --selfies_dataset=data/molecule_dataset_selfies.csv
45 | ```
46 |
47 | * __--smiles_dataset__: Path of the input SMILES dataset.
48 | * __--selfies_dataset__: Path of the output SELFIES dataset.
49 |
50 |
51 |
52 | To generate embeddings for the SELFIES molecule dataset using a pre-trained model, please run the following command:
53 |
54 | ```
55 | python3 produce_embeddings.py --selfies_dataset=data/molecule_dataset_selfies.csv --model_file=data/pretrained_models/SELFormer --embed_file=data/embeddings.csv
56 | ```
57 |
58 | * __--selfies_dataset__: Path of the input SELFIES dataset.
59 | * __--model_file__: Path of the pretrained model to be used.
60 | * __--embed_file__: Path of the output embeddings file.
61 |
62 |
63 |
64 | ### Generating Embeddings Using Pre-trained Models for MoleculeNet Dataset Molecules
65 |
66 | The embeddings generated by our best performing pre-trained model for MoleculeNet data can be directly downloaded [here](https://drive.google.com/drive/folders/1Xu3Q1T-KwXb67MF3Uw63pFm2IzoxeNNY?usp=share_link).
67 |
68 | You can also re-generate these embeddings using the command below.
69 |
70 | ```
71 | python3 get_moleculenet_embeddings.py --dataset_path=data/finetuning_datasets --model_file=data/pretrained_models/SELFormer
72 | ```
73 | * __--dataset_path__: Path of the directory containing the MoleculeNet datasets.
74 | * __--model_file__: Path of the pretrained model to be used.
75 |
76 |
77 |
78 | ## Training and Evaluating Models
79 |
80 | ### Pre-Training
81 | To pre-train a model, please run the command below. If you have a SELFIES dataset, you can use it directly by giving the path of the dataset to __--selfies_dataset__. If you have a SMILES dataset, you can give the path of the dataset to __--smiles_dataset__ and the SELFIES representations will be created at the path given to __--selfies_dataset__.
82 |
83 |
84 |
85 | ```
86 | python3 train_pretraining_model.py --smiles_dataset=data/molecule_dataset_smiles.txt --selfies_dataset=data/molecule_dataset_selfies.csv --prepared_data_path=data/selfies_data.txt --bpe_path=data/BPETokenizer --roberta_fast_tokenizer_path=data/RobertaFastTokenizer --hyperparameters_path=data/pretraining_hyperparameters.yml --subset_size=100000
87 | ```
88 |
89 | * __--smiles_dataset__: Path of the SMILES dataset. It is required if __--selfies_dataset__ does not exist (optional).
90 | * __--selfies_dataset__: Path of the SELFIES dataset. If a SELFIES dataset does not exist, it will be created at the given path using the __--smiles_dataset__. If it exists, SELFIES dataset will be used directly (required).
91 | * __--prepared_data_path__: Path of the intermediate file that will be created during pre-training. It will be used for tokenization. If it does not exist, it will be created at the given path (required).
92 | * __--bpe_path__: Path of the BPE tokenizer. If it does not exist, it will be created at the given path (required).
93 | * __--roberta_fast_tokenizer_path__: Path of the RobertaTokenizerFast tokenizer. If it does not exist, it will be created at the given path (required).
94 | * __--hyperparameters_path__: Path of the yaml file that contains the hyperparameter sets to be tested. Note that these sets will be tested one by one and not in parallel. Example file is available at /data/pretraining_hyperparameters.yml (required).
95 | * __--subset_size__: The size of the subset of the dataset that will be used for pre-training. By default, the whole dataset will be used (optional).
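If you want to build your own list of hyperparameter sets, the sketch below enumerates a grid with `itertools.product`. The key names and the fixed values are taken from `create_hparam_set.py`; the function name `build_hparam_sets` is hypothetical, and writing the result to a YAML file (for example with `yaml.dump`) is left out so that the example only needs the standard library.

```python
from itertools import product


def build_hparam_sets(batch_sizes, epochs, learning_rates, weight_decays, num_heads, num_layers):
    """Return a dict mapping 'set_<n>' to one hyperparameter combination each."""
    grid = product(batch_sizes, epochs, learning_rates, weight_decays, num_heads, num_layers)
    hparams = {}
    for set_no, (batch_size, num_epoch, lr, wd, heads, layers) in enumerate(grid):
        hparams["set_" + str(set_no)] = {
            "TRAIN_BATCH_SIZE": batch_size,
            "VALID_BATCH_SIZE": 8,
            "TRAIN_EPOCHS": num_epoch,
            "LEARNING_RATE": lr,
            "WEIGHT_DECAY": wd,
            "MAX_LEN": 128,
            "VOCAB_SIZE": 800,
            "MAX_POSITION_EMBEDDINGS": 514,
            "NUM_ATTENTION_HEADS": heads,
            "NUM_HIDDEN_LAYERS": layers,
            "TYPE_VOCAB_SIZE": 1,
            "HIDDEN_SIZE": 768,
        }
    return hparams


sets = build_hparam_sets(
    batch_sizes=[16, 32, 64],
    epochs=[5, 10],
    learning_rates=[1e-5],
    weight_decays=[0.001],
    num_heads=[4, 8],
    num_layers=[8, 12],
)
print(len(sets), "hyperparameter sets")
```

Because the sets are tested one by one, the grid size directly multiplies the total pre-training time, so it is worth keeping the value lists short.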
96 |
97 |
98 |
99 | ### Fine-tuning on Molecular Property Prediction
100 |
101 | You can use the commands below to fine-tune a pre-trained model for various molecular property prediction tasks. These commands expect datasets containing SMILES representations of molecules. SMILES representations should be stored in a column with a header named "smiles". You can see the example datasets in the __data/finetuning_datasets__ directory.
102 |
103 |
104 |
105 | **Binary Classification Tasks**
106 |
107 | To fine-tune a pre-trained model on a binary classification dataset, please run the command below.
108 |
109 | ```
110 | python3 train_classification_model.py --model=data/saved_models/SELFormer --tokenizer=data/RobertaFastTokenizer --dataset=data/finetuning_datasets/classification/bbbp/bbbp.csv --save_to=data/finetuned_models/SELFormer_bbbp_classification --target_column_id=1 --use_scaffold=1 --train_batch_size=16 --validation_batch_size=8 --num_epochs=25 --lr=5e-5 --wd=0
111 | ```
112 |
113 | * __--model__: Directory of the pre-trained model (required).
114 | * __--tokenizer__: Directory of the RobertaFastTokenizer (required).
115 | * __--dataset__: Path of the fine-tuning dataset (required).
116 | * __--save_to__: Directory where the fine-tuned model will be saved (required).
117 | * __--target_column_id__: Default: 1. The column id of the target column in the fine-tuning dataset (optional).
118 | * __--use_scaffold__: Default: 0. Determines whether to use scaffold splitting (1) or random splitting (0) (optional).
119 | * __--train_batch_size__: Default: 8 (optional).
120 | * __--validation_batch_size__: Default: 8 (optional).
121 | * __--num_epochs__: Default: 50. Number of epochs (optional).
122 | * __--lr__: Default: 1e-5. Learning rate (optional).
123 | * __--wd__: Default: 0.1. Weight decay (optional).
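To illustrate what scaffold splitting means in contrast to random splitting, here is a minimal sketch. It assumes each molecule already has a scaffold string (in practice these are usually computed with a cheminformatics toolkit) and it keeps all molecules that share a scaffold in the same split. The function name `scaffold_split`, the 80/10/10 fractions, and the largest-group-first ordering are assumptions for illustration, not necessarily what the training scripts do.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold is shared between splits."""
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold groups first; ties are broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda group: (-len(group), group[0]))

    n_total = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n_total:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n_total:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical scaffold labels for ten molecules.
example_scaffolds = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
train_idx, valid_idx, test_idx = scaffold_split(example_scaffolds)
print(train_idx, valid_idx, test_idx)
```

Since structurally related molecules never straddle the train/test boundary, a scaffold split is typically a harder and more realistic evaluation than a random split.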
124 |
125 |
126 |
127 | **Multi-Label Classification Tasks**
128 |
129 | To fine-tune a pre-trained model on a multi-label classification dataset, please run the command below. The RobertaFastTokenizer files should be stored in the same directory as the pre-trained model.
130 |
131 | ```
132 | python3 train_classification_multilabel_model.py --model=data/saved_models/SELFormer --dataset=data/finetuning_datasets/classification/tox21/tox21.csv --save_to=data/finetuned_models/SELFormer_tox21_classification --use_scaffold=1 --batch_size=16 --num_epochs=25 --lr=5e-5 --wd=0
133 | ```
134 |
135 | * __--model__: Directory of the pre-trained model (required).
136 | * __--dataset__: Path of the fine-tuning dataset (required).
137 | * __--save_to__: Directory where the fine-tuned model will be saved (required).
138 | * __--use_scaffold__: Default: 0. Determines whether to use scaffold splitting (1) or random splitting (0) (optional).
139 | * __--batch_size__: Default: 8. Train batch size (optional).
140 | * __--num_epochs__: Default: 50. Number of epochs (optional).
141 | * __--lr__: Default: 1e-5. Learning rate (optional).
142 | * __--wd__: Default: 0.1. Weight decay (optional).
143 |
144 |
145 |
146 | **Regression Tasks**
147 |
148 | To fine-tune a pre-trained model on a regression dataset, please run the command below.
149 |
150 | ```
151 | python3 train_regression_model.py --model=data/saved_models/SELFormer --tokenizer=data/RobertaFastTokenizer --dataset=data/finetuning_datasets/regression/esol/esol.csv --save_to=data/finetuned_models/SELFormer_esol_regression --target_column_id=-1 --scaler=2 --use_scaffold=1 --train_batch_size=16 --validation_batch_size=8 --num_epochs=25 --lr=5e-5 --wd=0
152 | ```
153 |
154 | * __--model__: Directory of the pre-trained model (required).
155 | * __--tokenizer__: Directory of the RobertaFastTokenizer (required).
156 | * __--dataset__: Path of the fine-tuning dataset (required).
157 | * __--save_to__: Directory where the fine-tuned model will be saved (required).
158 | * __--target_column_id__: Default: 1. The column id of the target column in the fine-tuning dataset (optional).
159 | * __--scaler__: Default: 0. Method to be used for scaling the target values. 0 for no scaling, 1 for min-max scaling, 2 for standard scaling (optional).
160 | * __--use_scaffold__: Default: 0. Determines whether to use scaffold splitting (1) or random splitting (0) (optional).
161 | * __--train_batch_size__: Default: 8 (optional).
162 | * __--validation_batch_size__: Default: 8 (optional).
163 | * __--num_epochs__: Default: 50. Number of epochs (optional).
164 | * __--lr__: Default: 1e-5. Learning rate (optional).
165 | * __--wd__: Default: 0.1. Weight decay (optional).
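The three `--scaler` options can be sketched as follows. This is a standard-library illustration of no scaling (0), min-max scaling (1), and standard scaling (2); the function name `scale_targets` is hypothetical, and the use of the population standard deviation for option 2 is an assumption about the convention.

```python
from statistics import mean, pstdev


def scale_targets(values, scaler=0):
    """Scale regression targets: 0 = none, 1 = min-max to [0, 1], 2 = zero mean / unit variance."""
    values = list(values)
    if scaler == 0:
        return values
    if scaler == 1:
        lowest, highest = min(values), max(values)
        return [(v - lowest) / (highest - lowest) for v in values]
    if scaler == 2:
        mu, sigma = mean(values), pstdev(values)  # population std (assumed convention)
        return [(v - mu) / sigma for v in values]
    raise ValueError("scaler must be 0, 1, or 2")


targets = [1.0, 2.0, 3.0]
print(scale_targets(targets, scaler=1))
print(scale_targets(targets, scaler=2))
```

When targets are scaled for training, the same statistics have to be reused to map predictions back to the original units.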
166 |
167 |
168 |
169 | ## Producing Molecular Property Predictions with Fine-tuned Models
170 |
171 | Fine-tuned SELFormer models are available for download [here](https://drive.google.com/drive/folders/1LVw1YZBL1AUAGCxIkavz0KMJNVyzxAXG?usp=share_link). To make predictions with these models, please follow the instructions below.
172 |
173 |
174 |
175 | ### Binary Classification
176 |
177 | To make predictions for the BACE, BBBP, or HIV datasets, please run the command below. Change the indicated arguments for different tasks. The default parameters load the model fine-tuned on BBBP.
178 |
179 | ```
180 | python3 binary_class_pred.py --task=bbbp --model_name=data/finetuned_models/SELFormer_bbbp_scaffold_optimized --tokenizer=data/RobertaFastTokenizer --pred_set=data/finetuning_datasets/classification/bbbp/bbbp_mock.csv --training_args=data/finetuned_models/SELFormer_bbbp_scaffold_optimized/training_args.bin
181 | ```
182 |
183 | * __--task__: Binary classification task to choose. (bace, bbbp, hiv) (required).
184 | * __--model_name__: Path of the fine-tuned model (required).
185 | * __--tokenizer__: Tokenizer selection (required).
186 | * __--pred_set__: Molecules to make predictions. Should be a CSV file with a single column. Header should be smiles (required).
187 | * __--training_args__: Initialize the model arguments (required).
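`binary_class_pred.py` writes the predicted class for each molecule by taking the argmax over the two output logits. If you also want a confidence value, a softmax over the same logits gives class probabilities. The sketch below shows this in plain Python; the function names `softmax` and `predict_labels` are hypothetical and the logits are made-up numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_labels(raw_pred):
    """Return (predicted class, probability of that class) for each row of logits."""
    results = []
    for logits in raw_pred:
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)  # first index wins ties
        results.append((label, probs[label]))
    return results


# Made-up logits for two molecules.
example_logits = [[-1.0, 2.0], [0.5, -0.5]]
for label, prob in predict_labels(example_logits):
    print(label, round(prob, 3))
```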
188 |
189 |
190 |
191 | ### Multi-Label Classification
192 |
193 | To make predictions for the Tox21 or SIDER datasets, please run the command below. Change the indicated arguments for different tasks. The default parameters load the model fine-tuned on SIDER.
194 |
195 | ```
196 | python3 multilabel_class_pred.py --task=sider --model_name=data/finetuned_models/SELFormer_sider_scaffold_optimized --pred_set=data/finetuning_datasets/classification/sider/sider_mock.csv --training_args=data/finetuned_models/SELFormer_sider_scaffold_optimized/training_args.bin --num_labels=27
197 | ```
198 |
199 | * __--task__: Multi-label classification task to choose. (tox21, sider) (required).
200 | * __--model_name__: Path of the fine-tuned model (required).
201 | * __--pred_set__: Molecules to make predictions. Should be a CSV file with a single column containing SMILES. Header should be 'smiles' (required).
202 | * __--training_args__: Initialize the model arguments (required).
203 | * __--num_labels__: Number of labels (required).
204 |
205 |
206 |
207 | ### Regression
208 |
209 | To make predictions for the ESOL, FreeSolv, Lipophilicity, or PDBBind datasets, please run the command below. Change the indicated arguments for different tasks. The default parameters load the model fine-tuned on ESOL.
210 |
211 | ```
212 | python3 regression_pred.py --task=esol --model_name=data/finetuned_models/esol_regression --tokenizer=data/RobertaFastTokenizer --pred_set=data/finetuning_datasets/regression/esol/esol_mock.csv --training_args=data/finetuned_models/esol_regression/training_args.bin
213 | ```
214 |
215 | * __--task__: Regression task to choose. (esol, freesolv, lipo, pdbbind_full) (required).
216 | * __--model_name__: Path of the fine-tuned model (required).
217 | * __--tokenizer__: Tokenizer selection (required).
218 | * __--pred_set__: Molecules to make predictions. Should be a CSV file with a single column. Header should be smiles (required).
219 | * __--training_args__: Initialize the model arguments (required).
220 |
221 |
222 |
223 | ## License
224 | Copyright (C) 2023 HUBioDataLab
225 |
226 | This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
227 |
228 | This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
229 |
230 | You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
231 |
232 |
--------------------------------------------------------------------------------
/binary_class_pred.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import pandas as pd
4 | import torch
5 | from torch.nn import CrossEntropyLoss
6 | from torch.utils.data import Dataset
7 | from transformers import BertPreTrainedModel, RobertaTokenizerFast
8 | from transformers.models.roberta.modeling_roberta import (
9 |     RobertaClassificationHead,
10 |     RobertaConfig,
11 |     RobertaModel,
12 | )
13 | from transformers import Trainer
14 | from prepare_finetuning_data import smiles_to_selfies
15 | import argparse
16 |
17 | parser = argparse.ArgumentParser()
18 | parser.add_argument("--task", default="bbbp", help="task selection.")
19 | parser.add_argument("--tokenizer_name", default="data/RobertaFastTokenizer", metavar="/path/to/dataset/", help="Tokenizer selection.")
20 | parser.add_argument("--pred_set", default="data/finetuning_datasets/classification/bbbp/bbbp_mock.csv", metavar="/path/to/dataset/", help="Test set for predictions.")
21 | parser.add_argument("--training_args", default= "data/finetuned_models/SELFormer_bbbp_scaffold_optimized/training_args.bin", metavar="/path/to/dataset/", help="Trained model arguments.")
22 | parser.add_argument("--model_name", default="data/finetuned_models/SELFormer_bbbp_scaffold_optimized", metavar="/path/to/dataset/", help="Path to the model.")
23 | args = parser.parse_args()
24 |
25 | class SELFIESTransformers_For_Classification(BertPreTrainedModel):
26 |     def __init__(self, config):
27 |         super(SELFIESTransformers_For_Classification, self).__init__(config)
28 |         self.num_labels = config.num_labels
29 |         self.roberta = RobertaModel(config)
30 |         self.classifier = RobertaClassificationHead(config)
31 |
32 |     def forward(self, input_ids, attention_mask, labels=None):
33 |         outputs = self.roberta(input_ids, attention_mask=attention_mask)
34 |         sequence_output = outputs[0]
35 |         logits = self.classifier(sequence_output)
36 |
37 |         outputs = (logits,) + outputs[2:]
38 |
39 |         if labels is not None:
40 |             loss_fct = CrossEntropyLoss()
41 |             loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
42 |             outputs = (loss,) + outputs
43 |         return outputs  # (loss), logits, (hidden_states), (attentions)
44 |
45 | model_class = SELFIESTransformers_For_Classification
46 | config_class = RobertaConfig
47 | tokenizer_name = args.tokenizer_name
48 |
49 | tokenizer_class = RobertaTokenizerFast
50 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False)
51 |
52 | # Prepare and Get Data
53 | class SELFIESTransfomers_Dataset(Dataset):
54 |     def __init__(self, data, tokenizer, MAX_LEN):
55 |         text = data
56 |         self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt")
57 |
58 |
59 |     def __len__(self):
60 |         return len(self.examples["input_ids"])
61 |
62 |     def __getitem__(self, index):
63 |         item = {key: self.examples[key][index] for key in self.examples}
64 |
65 |         return item
66 |
67 | pred_set = pd.read_csv(args.pred_set)
68 | pred_df_selfies = smiles_to_selfies(pred_set)
69 |
70 | MAX_LEN = 128
71 |
72 | pred_examples = (pred_df_selfies.iloc[:, 0].astype(str).tolist())
73 | pred_dataset = SELFIESTransfomers_Dataset(pred_examples, tokenizer, MAX_LEN)
74 |
75 | training_args = torch.load(args.training_args)
76 |
77 | model_name = args.model_name
78 | config = config_class.from_pretrained(model_name, num_labels=2)
79 | bbbp_model = model_class.from_pretrained(model_name, config=config)
80 |
81 | trainer = Trainer(model=bbbp_model, args=training_args)  # model: the fine-tuned 🤗 Transformers model; args: the training arguments loaded above
82 | raw_pred, label_ids, metrics = trainer.predict(pred_dataset)
83 | print(raw_pred)
84 | y_pred = np.argmax(raw_pred, axis=1).astype(int)
85 | res = pd.concat([pred_df_selfies, pd.DataFrame(y_pred, columns=["prediction"])], axis = 1)
86 |
87 | if not os.path.exists("data/predictions"):
88 |     os.makedirs("data/predictions")
89 |
90 | res.to_csv("data/predictions/{}_predictions.csv".format(args.task), index=False)
--------------------------------------------------------------------------------
/bpe_tokenizer.py:
--------------------------------------------------------------------------------
1 | from tokenizers import Tokenizer
2 | from tokenizers.models import BPE
3 | from tokenizers.pre_tokenizers import Split
4 | from tokenizers import Regex
5 | from tokenizers.processors import TemplateProcessing
6 | from tokenizers.trainers import BpeTrainer
7 |
8 | from os import mkdir
9 |
10 |
11 | def bpe_tokenizer(path="./data/selfies_subset.txt", save_to="./data/bpe/"):
12 |     try:
13 |         mkdir(save_to)
14 |     except FileExistsError:
15 |         pass
16 |
17 |     tokenizer = Tokenizer(BPE(unk_token="<unk>"))
18 |
19 |     tokenizer.pre_tokenizer = Split(pattern=Regex(r"\[|\]"), behavior="removed")
20 |
21 |     tokenizer.post_processor = TemplateProcessing(single="<s> $A </s>", pair="<s> $A </s> $B:1 </s>:1", special_tokens=[("<s>", 1), ("</s>", 2)],)
22 |
23 |     trainer = BpeTrainer(special_tokens=["<pad>", "<s>", "</s>", "<unk>", "<mask>"])  # order assumed; ids 1 and 2 must match the template above
24 |     tokenizer.train(files=[path], trainer=trainer)
25 |
26 |     tokenizer.save(save_to + "/bpe.json", pretty=True)
27 |     tokenizer.model.save(save_to)
--------------------------------------------------------------------------------
/create_hparam_set.py:
--------------------------------------------------------------------------------
1 | import yaml
2 |
3 |
4 | def create_hparam_yml(TRAIN_BATCH_SIZE, TRAIN_EPOCHS, LEARNING_RATE, WEIGHT_DECAY, NUM_ATTENTION_HEADS, NUM_HIDDEN_LAYERS, save_to="hparams.yml"):
5 |     # Hyperparameters
6 |     hparams = {}
7 |     set_no = 0
8 |     for batch_size in TRAIN_BATCH_SIZE:
9 |         for num_epoch in TRAIN_EPOCHS:
10 |             for lr in LEARNING_RATE:
11 |                 for wd in WEIGHT_DECAY:
12 |                     for num_heads in NUM_ATTENTION_HEADS:
13 |                         for num_layers in NUM_HIDDEN_LAYERS:
14 |                             hparams["set_" + str(set_no)] = {
15 |                                 "TRAIN_BATCH_SIZE": batch_size,
16 |                                 "VALID_BATCH_SIZE": 8,
17 |                                 "TRAIN_EPOCHS": num_epoch,
18 |                                 "LEARNING_RATE": lr,
19 |                                 "WEIGHT_DECAY": wd,
20 |                                 "MAX_LEN": 128,
21 |                                 "VOCAB_SIZE": 800,
22 |                                 "MAX_POSITION_EMBEDDINGS": 514,
23 |                                 "NUM_ATTENTION_HEADS": num_heads,
24 |                                 "NUM_HIDDEN_LAYERS": num_layers,
25 |                                 "TYPE_VOCAB_SIZE": 1,
26 |                                 "HIDDEN_SIZE": 768,
27 |                             }
28 |                             set_no += 1
29 |                         set_no += 1  # the counter also advances when each outer loop iteration ends, so set ids are unique but not contiguous
30 |                     set_no += 1
31 |                 set_no += 1
32 |             set_no += 1
33 |         set_no += 1
34 |
35 |     # Write to yaml file
36 |     with open(save_to, "w") as f:
37 |         yaml.dump(hparams, f)
38 |
39 |
40 | create_hparam_yml(TRAIN_BATCH_SIZE=[16, 32, 64], TRAIN_EPOCHS=[5, 10], LEARNING_RATE=[1e-5], WEIGHT_DECAY=[0.001], NUM_ATTENTION_HEADS=[4, 8], NUM_HIDDEN_LAYERS=[8, 12])
41 |
--------------------------------------------------------------------------------
/create_selfies_alphabet.py:
--------------------------------------------------------------------------------
1 | import selfies as sf
2 |
3 |
4 | def get_selfies_alphabet(chembl_df, path="./data/chembl_29_selfies_alphabet.txt"):
5 |     selfies_array = chembl_df.selfies.to_numpy(copy=True)
6 |     selfies_alphabet = sf.get_alphabet_from_selfies(selfies_array)
7 |
8 |     with open(path, "w") as f:
9 |         f.write(",".join(list(selfies_alphabet)))
10 |
--------------------------------------------------------------------------------
/data/BPETokenizer/bpe.json:
--------------------------------------------------------------------------------
1 | {
2 | "version": "1.0",
3 | "truncation": null,
4 | "padding": null,
5 | "added_tokens": [
6 | {
7 | "id": 0,
8 | "special": true,
9 | "content": "<pad>",
10 | "single_word": false,
11 | "lstrip": false,
12 | "rstrip": false,
13 | "normalized": false
14 | },
15 | {
16 | "id": 1,
17 | "special": true,
18 | "content": "<s>",
19 | "single_word": false,
20 | "lstrip": false,
21 | "rstrip": false,
22 | "normalized": false
23 | },
24 | {
25 | "id": 2,
26 | "special": true,
27 | "content": "</s>",
28 | "single_word": false,
29 | "lstrip": false,
30 | "rstrip": false,
31 | "normalized": false
32 | },
33 | {
34 | "id": 3,
35 | "special": true,
36 | "content": "<unk>",
37 | "single_word": false,
38 | "lstrip": false,
39 | "rstrip": false,
40 | "normalized": false
41 | },
42 | {
43 | "id": 4,
44 | "special": true,
45 | "content": "<mask>",
46 | "single_word": false,
47 | "lstrip": false,
48 | "rstrip": false,
49 | "normalized": false
50 | }
51 | ],
52 | "normalizer": null,
53 | "pre_tokenizer": {
54 | "type": "Split",
55 | "pattern": {
56 | "Regex": "\\[|\\]"
57 | },
58 | "behavior": "Removed",
59 | "invert": false
60 | },
61 | "post_processor": {
62 | "type": "TemplateProcessing",
63 | "single": [
64 | {
65 | "SpecialToken": {
66 | "id": "<s>",
67 | "type_id": 0
68 | }
69 | },
70 | {
71 | "Sequence": {
72 | "id": "A",
73 | "type_id": 0
74 | }
75 | },
76 | {
77 | "SpecialToken": {
78 | "id": "</s>",
79 | "type_id": 0
80 | }
81 | }
82 | ],
83 | "pair": [
84 | {
85 | "SpecialToken": {
86 | "id": "<s>",
87 | "type_id": 0
88 | }
89 | },
90 | {
91 | "Sequence": {
92 | "id": "A",
93 | "type_id": 0
94 | }
95 | },
96 | {
97 | "SpecialToken": {
98 | "id": "</s>",
99 | "type_id": 0
100 | }
101 | },
102 | {
103 | "Sequence": {
104 | "id": "B",
105 | "type_id": 1
106 | }
107 | },
108 | {
109 | "SpecialToken": {
110 | "id": "</s>",
111 | "type_id": 1
112 | }
113 | }
114 | ],
115 | "special_tokens": {
116 | "</s>": {
117 | "id": "</s>",
118 | "ids": [
119 | 2
120 | ],
121 | "tokens": [
122 | "</s>"
123 | ]
124 | },
125 | "<s>": {
126 | "id": "<s>",
127 | "ids": [
128 | 1
129 | ],
130 | "tokens": [
131 | "<s>"
132 | ]
133 | }
134 | }
135 | },
136 | "decoder": null,
137 | "model": {
138 | "type": "BPE",
139 | "dropout": null,
140 | "unk_token": "<unk>",
141 | "continuing_subword_prefix": null,
142 | "end_of_word_suffix": null,
143 | "fuse_unk": false,
144 | "vocab": {
145 | "<pad>": 0,
146 | "<s>": 1,
147 | "</s>": 2,
148 | "<unk>": 3,
149 | "<mask>": 4,
150 | "\n": 5,
151 | "#": 6,
152 | "+": 7,
153 | "-": 8,
154 | ".": 9,
155 | "/": 10,
156 | "0": 11,
157 | "1": 12,
158 | "2": 13,
159 | "3": 14,
160 | "4": 15,
161 | "5": 16,
162 | "8": 17,
163 | "=": 18,
164 | "@": 19,
165 | "A": 20,
166 | "B": 21,
167 | "C": 22,
168 | "F": 23,
169 | "H": 24,
170 | "I": 25,
171 | "K": 26,
172 | "L": 27,
173 | "M": 28,
174 | "N": 29,
175 | "O": 30,
176 | "P": 31,
177 | "R": 32,
178 | "S": 33,
179 | "T": 34,
180 | "Z": 35,
181 | "\\": 36,
182 | "a": 37,
183 | "c": 38,
184 | "e": 39,
185 | "g": 40,
186 | "h": 41,
187 | "i": 42,
188 | "l": 43,
189 | "n": 44,
190 | "r": 45,
191 | "s": 46,
192 | "Br": 47,
193 | "an": 48,
194 | "ch": 49,
195 | "Bran": 50,
196 | "Branch": 51,
197 | "Branch1": 52,
198 | "=C": 53,
199 | "Ri": 54,
200 | "ng": 55,
201 | "Ring": 56,
202 | "Ring1": 57,
203 | "=Branch1": 58,
204 | "Branch2": 59,
205 | "=O": 60,
206 | "Ring2": 61,
207 | "H1": 62,
208 | "C@": 63,
209 | "=N": 64,
210 | "#Branch1": 65,
211 | "C@@": 66,
212 | "=Branch2": 67,
213 | "C@H1": 68,
214 | "C@@H1": 69,
215 | "#Branch2": 70,
216 | "Cl": 71,
217 | "#C": 72,
218 | "/C": 73,
219 | "NH1": 74,
220 | "+1": 75,
221 | "-1": 76,
222 | "=Ring1": 77,
223 | "O-1": 78,
224 | "N+1": 79,
225 | "\\C": 80,
226 | "/N": 81,
227 | "#N": 82,
228 | "=Ring2": 83,
229 | "=S": 84,
230 | "=N+1": 85,
231 | "Na": 86,
232 | "Na+1": 87,
233 | "\\N": 88,
234 | "S+1": 89,
235 | "/O": 90,
236 | "\\S": 91,
237 | "\\O": 92,
238 | "Br-1": 93,
239 | "I-1": 94,
240 | "Cl-1": 95,
241 | "/C@H1": 96,
242 | "Branch3": 97,
243 | "/C@@H1": 98,
244 | "=P": 99,
245 | "/S": 100,
246 | "=N-1": 101,
247 | "Si": 102,
248 | "K+1": 103,
249 | "N-1": 104,
250 | "Se": 105,
251 | "Li": 106,
252 | "Li+1": 107,
253 | "+3": 108,
254 | "Cl+3": 109,
255 | "\\C@H1": 110,
256 | "Ring3": 111,
257 | "\\C@@H1": 112,
258 | "/N+1": 113,
259 | "/P": 114,
260 | "\\F": 115,
261 | "P@": 116,
262 | "2H": 117,
263 | "PH1": 118,
264 | "/Br": 119,
265 | "N@": 120,
266 | "P+1": 121,
267 | "/Cl": 122,
268 | "\\NH1": 123,
269 | "\\Br": 124,
270 | "@+1": 125,
271 | "/I": 126,
272 | "/C@": 127,
273 | "Te": 128,
274 | "\\N+1": 129,
275 | "P@@": 130,
276 | "12": 131,
277 | "5I": 132,
278 | "\\O-1": 133,
279 | "125I": 134,
280 | "/F": 135,
281 | "#N+1": 136,
282 | "\\Cl": 137,
283 | "N@+1": 138,
284 | "\\I": 139,
285 | "-/": 140,
286 | "/C@@": 141,
287 | "N@@": 142,
288 | "N@@+1": 143,
289 | "-/Ring2": 144,
290 | "-\\": 145,
291 | "14": 146,
292 | "B-1": 147,
293 | "C-1": 148,
294 | "S@+1": 149,
295 | "14C": 150,
296 | "H2": 151,
297 | "H4": 152,
298 | "I+1": 153,
299 | "S-1": 154,
300 | "\\P": 155,
301 | "=S+1": 156,
302 | "=P@": 157,
303 | "SiH4": 158,
304 | "+2": 159,
305 | "3H": 160,
306 | "@@+1": 161,
307 | "Ag": 162,
308 | "C+1": 163,
309 | "S@@+1": 164,
310 | "Cl+1": 165,
311 | "=Se": 166,
312 | "-\\Ring1": 167,
313 | "H0": 168,
314 | "OH0": 169,
315 | "11": 170,
316 | "=Branch3": 171,
317 | "=Te": 172,
318 | "Mg": 173,
319 | "O+1": 174,
320 | "Zn": 175,
321 | "\\C@": 176,
322 | "\\S+1": 177,
323 | "H1-1": 178,
324 | "SeH1": 179,
325 | "P@+1": 180,
326 | "-\\Ring2": 181,
327 | "11C": 182,
328 | "=Te+1": 183,
329 | "Zn+2": 184,
330 | "/NH1": 185,
331 | "18": 186,
332 | "As": 187,
333 | "BH2": 188,
334 | "BH1-1": 189,
335 | "Ca": 190,
336 | "H3": 191,
337 | "OH1-1": 192,
338 | "SH2": 193,
339 | "=O+1": 194,
340 | "Se+1": 195,
341 | "TeH2": 196,
342 | "125IH1": 197,
343 | "-/Ring1": 198,
344 | "14CH2": 199,
345 | "Ag+1": 200,
346 | "=Se+1": 201,
347 | "MgH2": 202,
348 | "Mg+2": 203,
349 | "11CH3": 204,
350 | "18F": 205,
351 | "BH2-1": 206,
352 | "Ca+2": 207
353 | },
354 | "merges": [
355 | "B r",
356 | "a n",
357 | "c h",
358 | "Br an",
359 | "Bran ch",
360 | "Branch 1",
361 | "= C",
362 | "R i",
363 | "n g",
364 | "Ri ng",
365 | "Ring 1",
366 | "= Branch1",
367 | "Branch 2",
368 | "= O",
369 | "Ring 2",
370 | "H 1",
371 | "C @",
372 | "= N",
373 | "# Branch1",
374 | "C@ @",
375 | "= Branch2",
376 | "C@ H1",
377 | "C@@ H1",
378 | "# Branch2",
379 | "C l",
380 | "# C",
381 | "/ C",
382 | "N H1",
383 | "+ 1",
384 | "- 1",
385 | "= Ring1",
386 | "O -1",
387 | "N +1",
388 | "\\ C",
389 | "/ N",
390 | "# N",
391 | "= Ring2",
392 | "= S",
393 | "=N +1",
394 | "N a",
395 | "Na +1",
396 | "\\ N",
397 | "S +1",
398 | "/ O",
399 | "\\ S",
400 | "\\ O",
401 | "Br -1",
402 | "I -1",
403 | "Cl -1",
404 | "/ C@H1",
405 | "Branch 3",
406 | "/ C@@H1",
407 | "= P",
408 | "/ S",
409 | "=N -1",
410 | "S i",
411 | "K +1",
412 | "N -1",
413 | "S e",
414 | "L i",
415 | "Li +1",
416 | "+ 3",
417 | "Cl +3",
418 | "\\ C@H1",
419 | "Ring 3",
420 | "\\ C@@H1",
421 | "/ N+1",
422 | "/ P",
423 | "\\ F",
424 | "P @",
425 | "2 H",
426 | "P H1",
427 | "/ Br",
428 | "N @",
429 | "P +1",
430 | "/ Cl",
431 | "\\ NH1",
432 | "\\ Br",
433 | "@ +1",
434 | "/ I",
435 | "/ C@",
436 | "T e",
437 | "\\ N+1",
438 | "P@ @",
439 | "1 2",
440 | "5 I",
441 | "\\ O-1",
442 | "12 5I",
443 | "/ F",
444 | "# N+1",
445 | "\\ Cl",
446 | "N@ +1",
447 | "\\ I",
448 | "- /",
449 | "/ C@@",
450 | "N@ @",
451 | "N@ @+1",
452 | "-/ Ring2",
453 | "- \\",
454 | "1 4",
455 | "B -1",
456 | "C -1",
457 | "S @+1",
458 | "14 C",
459 | "H 2",
460 | "H 4",
461 | "I +1",
462 | "S -1",
463 | "\\ P",
464 | "=S +1",
465 | "=P @",
466 | "Si H4",
467 | "+ 2",
468 | "3 H",
469 | "@ @+1",
470 | "A g",
471 | "C +1",
472 | "S @@+1",
473 | "Cl +1",
474 | "=S e",
475 | "-\\ Ring1",
476 | "H 0",
477 | "O H0",
478 | "1 1",
479 | "= Branch3",
480 | "= Te",
481 | "M g",
482 | "O +1",
483 | "Z n",
484 | "\\ C@",
485 | "\\ S+1",
486 | "H1 -1",
487 | "Se H1",
488 | "P@ +1",
489 | "-\\ Ring2",
490 | "11 C",
491 | "=Te +1",
492 | "Zn +2",
493 | "/ NH1",
494 | "1 8",
495 | "A s",
496 | "B H2",
497 | "B H1-1",
498 | "C a",
499 | "H 3",
500 | "O H1-1",
501 | "S H2",
502 | "=O +1",
503 | "Se +1",
504 | "Te H2",
505 | "125I H1",
506 | "-/ Ring1",
507 | "14C H2",
508 | "Ag +1",
509 | "=Se +1",
510 | "Mg H2",
511 | "Mg +2",
512 | "11C H3",
513 | "18 F",
514 | "BH2 -1",
515 | "Ca +2"
516 | ]
517 | }
518 | }
--------------------------------------------------------------------------------
/data/BPETokenizer/merges.txt:
--------------------------------------------------------------------------------
1 | #version: 0.2 - Trained by `huggingface/tokenizers`
2 | B r
3 | a n
4 | c h
5 | Br an
6 | Bran ch
7 | Branch 1
8 | = C
9 | R i
10 | n g
11 | Ri ng
12 | Ring 1
13 | = Branch1
14 | Branch 2
15 | = O
16 | Ring 2
17 | H 1
18 | C @
19 | = N
20 | # Branch1
21 | C@ @
22 | = Branch2
23 | C@ H1
24 | C@@ H1
25 | # Branch2
26 | C l
27 | # C
28 | / C
29 | N H1
30 | + 1
31 | - 1
32 | = Ring1
33 | O -1
34 | N +1
35 | \ C
36 | / N
37 | # N
38 | = Ring2
39 | = S
40 | =N +1
41 | N a
42 | Na +1
43 | \ N
44 | S +1
45 | / O
46 | \ S
47 | \ O
48 | Br -1
49 | I -1
50 | Cl -1
51 | / C@H1
52 | Branch 3
53 | / C@@H1
54 | = P
55 | / S
56 | =N -1
57 | S i
58 | K +1
59 | N -1
60 | S e
61 | L i
62 | Li +1
63 | + 3
64 | Cl +3
65 | \ C@H1
66 | Ring 3
67 | \ C@@H1
68 | / N+1
69 | / P
70 | \ F
71 | P @
72 | 2 H
73 | P H1
74 | / Br
75 | N @
76 | P +1
77 | / Cl
78 | \ NH1
79 | \ Br
80 | @ +1
81 | / I
82 | / C@
83 | T e
84 | \ N+1
85 | P@ @
86 | 1 2
87 | 5 I
88 | \ O-1
89 | 12 5I
90 | / F
91 | # N+1
92 | \ Cl
93 | N@ +1
94 | \ I
95 | - /
96 | / C@@
97 | N@ @
98 | N@ @+1
99 | -/ Ring2
100 | - \
101 | 1 4
102 | B -1
103 | C -1
104 | S @+1
105 | 14 C
106 | H 2
107 | H 4
108 | I +1
109 | S -1
110 | \ P
111 | =S +1
112 | =P @
113 | Si H4
114 | + 2
115 | 3 H
116 | @ @+1
117 | A g
118 | C +1
119 | S @@+1
120 | Cl +1
121 | =S e
122 | -\ Ring1
123 | H 0
124 | O H0
125 | 1 1
126 | = Branch3
127 | = Te
128 | M g
129 | O +1
130 | Z n
131 | \ C@
132 | \ S+1
133 | H1 -1
134 | Se H1
135 | P@ +1
136 | -\ Ring2
137 | 11 C
138 | =Te +1
139 | Zn +2
140 | / NH1
141 | 1 8
142 | A s
143 | B H2
144 | B H1-1
145 | C a
146 | H 3
147 | O H1-1
148 | S H2
149 | =O +1
150 | Se +1
151 | Te H2
152 | 125I H1
153 | -/ Ring1
154 | 14C H2
155 | Ag +1
156 | =Se +1
157 | Mg H2
158 | Mg +2
159 | 11C H3
160 | 18 F
161 | BH2 -1
162 | Ca +2
163 |
--------------------------------------------------------------------------------
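Note on reading merges.txt: each line after the version header is one BPE merge rule, and earlier lines take priority over later ones. The sketch below is a simplified illustration of how such rules combine characters into a SELFIES token. It is not the `huggingface/tokenizers` implementation; the helper name `apply_merges` is hypothetical, and the byte-level pre-tokenization step is ignored (an assumption that holds only for plain ASCII input without spaces).

```python
def apply_merges(symbols, merges):
    """Greedy BPE: repeatedly merge the adjacent pair with the lowest rank."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(symbols)
    while len(symbols) > 1:
        # Collect every adjacent pair that has a merge rule, with its position.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)  # best-ranked rule, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# First six rules copied from merges.txt (lines 2 to 7).
MERGES = [
    ("B", "r"), ("a", "n"), ("c", "h"),
    ("Br", "an"), ("Bran", "ch"), ("Branch", "1"),
]

print(apply_merges("Branch1", MERGES))  # ['Branch1']
```

With only these six rules, a string such as "Ring1" stays split into single characters, because the rules that build "Ring" appear later in the file.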
/data/BPETokenizer/vocab.json:
--------------------------------------------------------------------------------
1 | {"":0,"":1,"":2,"":3,"":4,"\n":5,"#":6,"+":7,"-":8,".":9,"/":10,"0":11,"1":12,"2":13,"3":14,"4":15,"5":16,"8":17,"=":18,"@":19,"A":20,"B":21,"C":22,"F":23,"H":24,"I":25,"K":26,"L":27,"M":28,"N":29,"O":30,"P":31,"R":32,"S":33,"T":34,"Z":35,"\\":36,"a":37,"c":38,"e":39,"g":40,"h":41,"i":42,"l":43,"n":44,"r":45,"s":46,"Br":47,"an":48,"ch":49,"Bran":50,"Branch":51,"Branch1":52,"=C":53,"Ri":54,"ng":55,"Ring":56,"Ring1":57,"=Branch1":58,"Branch2":59,"=O":60,"Ring2":61,"H1":62,"C@":63,"=N":64,"#Branch1":65,"C@@":66,"=Branch2":67,"C@H1":68,"C@@H1":69,"#Branch2":70,"Cl":71,"#C":72,"/C":73,"NH1":74,"+1":75,"-1":76,"=Ring1":77,"O-1":78,"N+1":79,"\\C":80,"/N":81,"#N":82,"=Ring2":83,"=S":84,"=N+1":85,"Na":86,"Na+1":87,"\\N":88,"S+1":89,"/O":90,"\\S":91,"\\O":92,"Br-1":93,"I-1":94,"Cl-1":95,"/C@H1":96,"Branch3":97,"/C@@H1":98,"=P":99,"/S":100,"=N-1":101,"Si":102,"K+1":103,"N-1":104,"Se":105,"Li":106,"Li+1":107,"+3":108,"Cl+3":109,"\\C@H1":110,"Ring3":111,"\\C@@H1":112,"/N+1":113,"/P":114,"\\F":115,"P@":116,"2H":117,"PH1":118,"/Br":119,"N@":120,"P+1":121,"/Cl":122,"\\NH1":123,"\\Br":124,"@+1":125,"/I":126,"/C@":127,"Te":128,"\\N+1":129,"P@@":130,"12":131,"5I":132,"\\O-1":133,"125I":134,"/F":135,"#N+1":136,"\\Cl":137,"N@+1":138,"\\I":139,"-/":140,"/C@@":141,"N@@":142,"N@@+1":143,"-/Ring2":144,"-\\":145,"14":146,"B-1":147,"C-1":148,"S@+1":149,"14C":150,"H2":151,"H4":152,"I+1":153,"S-1":154,"\\P":155,"=S+1":156,"=P@":157,"SiH4":158,"+2":159,"3H":160,"@@+1":161,"Ag":162,"C+1":163,"S@@+1":164,"Cl+1":165,"=Se":166,"-\\Ring1":167,"H0":168,"OH0":169,"11":170,"=Branch3":171,"=Te":172,"Mg":173,"O+1":174,"Zn":175,"\\C@":176,"\\S+1":177,"H1-1":178,"SeH1":179,"P@+1":180,"-\\Ring2":181,"11C":182,"=Te+1":183,"Zn+2":184,"/NH1":185,"18":186,"As":187,"BH2":188,"BH1-1":189,"Ca":190,"H3":191,"OH1-1":192,"SH2":193,"=O+1":194,"Se+1":195,"TeH2":196,"125IH1":197,"-/Ring1":198,"14CH2":199,"Ag+1":200,"=Se+1":201,"MgH2":202,"Mg+2":203,"11CH3":204,"18F":205,"BH2-1":206,"Ca+2":207}
--------------------------------------------------------------------------------
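Note on how vocab.json relates to merges.txt: ids 0 to 46 cover the special tokens and single characters, and every later id belongs to a merged token, assigned in the same order as the merge rules ("B r" gives "Br": 47, "a n" gives "an": 48, and so on). The following sketch checks that relationship on a short excerpt copied from the two files. The function name `merged_tokens` and the constant `FIRST_MERGE_ID` are illustrative, not part of the repository.

```python
FIRST_MERGE_ID = 47  # id of "Br", the first merged token in vocab.json

# Excerpts copied from merges.txt and vocab.json.
merge_lines = ["B r", "a n", "c h", "Br an", "Bran ch", "Branch 1"]
vocab_excerpt = {
    "Br": 47, "an": 48, "ch": 49,
    "Bran": 50, "Branch": 51, "Branch1": 52,
}


def merged_tokens(lines, first_id):
    """Map each merge rule 'left right' to the token 'leftright' and its id."""
    tokens = {}
    for offset, line in enumerate(lines):
        left, right = line.split(" ")
        tokens[left + right] = first_id + offset
    return tokens


derived = merged_tokens(merge_lines, FIRST_MERGE_ID)
print(derived == vocab_excerpt)  # True
```

The same check extends to the full files: 161 merge rules starting at id 47 end at id 207, which matches the last vocabulary entry, "Ca+2".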
/data/RobertaFastTokenizer/merges.txt:
--------------------------------------------------------------------------------
1 | #version: 0.2 - Trained by `huggingface/tokenizers`
2 | B r
3 | a n
4 | c h
5 | Br an
6 | Bran ch
7 | Branch 1
8 | = C
9 | R i
10 | n g
11 | Ri ng
12 | Ring 1
13 | = Branch1
14 | Branch 2
15 | = O
16 | Ring 2
17 | H 1
18 | C @
19 | = N
20 | # Branch1
21 | C@ @
22 | = Branch2
23 | C@ H1
24 | C@@ H1
25 | # Branch2
26 | C l
27 | # C
28 | / C
29 | N H1
30 | + 1
31 | - 1
32 | = Ring1
33 | O -1
34 | N +1
35 | \ C
36 | / N
37 | # N
38 | = Ring2
39 | = S
40 | =N +1
41 | N a
42 | Na +1
43 | \ N
44 | S +1
45 | / O
46 | \ S
47 | \ O
48 | Br -1
49 | I -1
50 | Cl -1
51 | / C@H1
52 | Branch 3
53 | / C@@H1
54 | = P
55 | / S
56 | =N -1
57 | S i
58 | K +1
59 | N -1
60 | S e
61 | L i
62 | Li +1
63 | + 3
64 | Cl +3
65 | \ C@H1
66 | Ring 3
67 | \ C@@H1
68 | / N+1
69 | / P
70 | \ F
71 | P @
72 | 2 H
73 | P H1
74 | / Br
75 | N @
76 | P +1
77 | / Cl
78 | \ NH1
79 | \ Br
80 | @ +1
81 | / I
82 | / C@
83 | T e
84 | \ N+1
85 | P@ @
86 | 1 2
87 | 5 I
88 | \ O-1
89 | 12 5I
90 | / F
91 | # N+1
92 | \ Cl
93 | N@ +1
94 | \ I
95 | - /
96 | / C@@
97 | N@ @
98 | N@ @+1
99 | -/ Ring2
100 | - \
101 | 1 4
102 | B -1
103 | C -1
104 | S @+1
105 | 14 C
106 | H 2
107 | H 4
108 | I +1
109 | S -1
110 | \ P
111 | =S +1
112 | =P @
113 | Si H4
114 | + 2
115 | 3 H
116 | @ @+1
117 | A g
118 | C +1
119 | S @@+1
120 | Cl +1
121 | =S e
122 | -\ Ring1
123 | H 0
124 | O H0
125 | 1 1
126 | = Branch3
127 | = Te
128 | M g
129 | O +1
130 | Z n
131 | \ C@
132 | \ S+1
133 | H1 -1
134 | Se H1
135 | P@ +1
136 | -\ Ring2
137 | 11 C
138 | =Te +1
139 | Zn +2
140 | / NH1
141 | 1 8
142 | A s
143 | B H2
144 | B H1-1
145 | C a
146 | H 3
147 | O H1-1
148 | S H2
149 | =O +1
150 | Se +1
151 | Te H2
152 | 125I H1
153 | -/ Ring1
154 | 14C H2
155 | Ag +1
156 | =Se +1
157 | Mg H2
158 | Mg +2
159 | 11C H3
160 | 18 F
161 | BH2 -1
162 | Ca +2
163 |
--------------------------------------------------------------------------------
/data/RobertaFastTokenizer/special_tokens_map.json:
--------------------------------------------------------------------------------
1 | {"bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true}}
--------------------------------------------------------------------------------
/data/RobertaFastTokenizer/tokenizer.json:
--------------------------------------------------------------------------------
1 | {"version":"1.0","truncation":null,"padding":null,"added_tokens":[{"id":0,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":1,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":2,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":3,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":4,"special":true,"content":"","single_word":false,"lstrip":true,"rstrip":false,"normalized":true}],"normalizer":null,"pre_tokenizer":{"type":"ByteLevel","add_prefix_space":false,"trim_offsets":true},"post_processor":{"type":"RobertaProcessing","sep":["",2],"cls":["",1],"trim_offsets":true,"add_prefix_space":false},"decoder":{"type":"ByteLevel","add_prefix_space":true,"trim_offsets":true},"model":{"type":"BPE","dropout":null,"unk_token":null,"continuing_subword_prefix":"","end_of_word_suffix":"","fuse_unk":false,"vocab":{"":0,"":1,"":2,"":3,"":4,"\n":5,"#":6,"+":7,"-":8,".":9,"/":10,"0":11,"1":12,"2":13,"3":14,"4":15,"5":16,"8":17,"=":18,"@":19,"A":20,"B":21,"C":22,"F":23,"H":24,"I":25,"K":26,"L":27,"M":28,"N":29,"O":30,"P":31,"R":32,"S":33,"T":34,"Z":35,"\\":36,"a":37,"c":38,"e":39,"g":40,"h":41,"i":42,"l":43,"n":44,"r":45,"s":46,"Br":47,"an":48,"ch":49,"Bran":50,"Branch":51,"Branch1":52,"=C":53,"Ri":54,"ng":55,"Ring":56,"Ring1":57,"=Branch1":58,"Branch2":59,"=O":60,"Ring2":61,"H1":62,"C@":63,"=N":64,"#Branch1":65,"C@@":66,"=Branch2":67,"C@H1":68,"C@@H1":69,"#Branch2":70,"Cl":71,"#C":72,"/C":73,"NH1":74,"+1":75,"-1":76,"=Ring1":77,"O-1":78,"N+1":79,"\\C":80,"/N":81,"#N":82,"=Ring2":83,"=S":84,"=N+1":85,"Na":86,"Na+1":87,"\\N":88,"S+1":89,"/O":90,"\\S":91,"\\O":92,"Br-1":93,"I-1":94,"Cl-1":95,"/C@H1":96,"Branch3":97,"/C@@H1":98,"=P":99,"/S":100,"=N-1":101,"Si":102,"K+1":103,"N-1":104,"Se":105,"Li":106,"Li+1":107,"+3":108,"Cl+3":109,"\\C@H1":110,"Ring3":111,"\\C@@H1":112,"/N+1":113,"/P":114,"\\F":115,"P@":116,"2H":117,"PH1":118,"/Br":119,"N@":120,"P+1":121,"/Cl":122,"\\NH1":123,"\\Br":124,"@+1":125,"/I":126,"/C@":127,"Te":128,"\\N+1":129,"P@@":130,"12":131,"5I":132,"\\O-1":133,"125I":134,"/F":135,"#N+1":136,"\\Cl":137,"N@+1":138,"\\I":139,"-/":140,"/C@@":141,"N@@":142,"N@@+1":143,"-/Ring2":144,"-\\":145,"14":146,"B-1":147,"C-1":148,"S@+1":149,"14C":150,"H2":151,"H4":152,"I+1":153,"S-1":154,"\\P":155,"=S+1":156,"=P@":157,"SiH4":158,"+2":159,"3H":160,"@@+1":161,"Ag":162,"C+1":163,"S@@+1":164,"Cl+1":165,"=Se":166,"-\\Ring1":167,"H0":168,"OH0":169,"11":170,"=Branch3":171,"=Te":172,"Mg":173,"O+1":174,"Zn":175,"\\C@":176,"\\S+1":177,"H1-1":178,"SeH1":179,"P@+1":180,"-\\Ring2":181,"11C":182,"=Te+1":183,"Zn+2":184,"/NH1":185,"18":186,"As":187,"BH2":188,"BH1-1":189,"Ca":190,"H3":191,"OH1-1":192,"SH2":193,"=O+1":194,"Se+1":195,"TeH2":196,"125IH1":197,"-/Ring1":198,"14CH2":199,"Ag+1":200,"=Se+1":201,"MgH2":202,"Mg+2":203,"11CH3":204,"18F":205,"BH2-1":206,"Ca+2":207},"merges":["B r","a n","c h","Br an","Bran ch","Branch 1","= C","R i","n g","Ri ng","Ring 1","= Branch1","Branch 2","= O","Ring 2","H 1","C @","= N","# Branch1","C@ @","= Branch2","C@ H1","C@@ H1","# Branch2","C l","# C","/ C","N H1","+ 1","- 1","= Ring1","O -1","N +1","\\ C","/ N","# N","= Ring2","= S","=N +1","N a","Na +1","\\ N","S +1","/ O","\\ S","\\ O","Br -1","I -1","Cl -1","/ C@H1","Branch 3","/ C@@H1","= P","/ S","=N -1","S i","K +1","N -1","S e","L i","Li +1","+ 3","Cl +3","\\ C@H1","Ring 3","\\ C@@H1","/ N+1","/ P","\\ F","P @","2 H","P H1","/ Br","N @","P +1","/ Cl","\\ NH1","\\ Br","@ +1","/ I","/ C@","T e","\\ N+1","P@ @","1 2","5 I","\\ O-1","12 5I","/ F","# N+1","\\ Cl","N@ +1","\\ I","- /","/ C@@","N@ @","N@ @+1","-/ Ring2","- \\","1 4","B -1","C -1","S @+1","14 C","H 2","H 4","I +1","S -1","\\ P","=S +1","=P @","Si H4","+ 2","3 H","@ @+1","A g","C +1","S @@+1","Cl +1","=S e","-\\ Ring1","H 0","O H0","1 1","= Branch3","= Te","M g","O +1","Z n","\\ C@","\\ S+1","H1 -1","Se H1","P@ +1","-\\ Ring2","11 C","=Te +1","Zn +2","/ NH1","1 8","A s","B H2","B H1-1","C a","H 3","O H1-1","S H2","=O +1","Se +1","Te H2","125I H1","-/ Ring1","14C H2","Ag +1","=Se +1","Mg H2","Mg +2","11C H3","18 F","BH2 -1","Ca +2"]}}
--------------------------------------------------------------------------------
/data/RobertaFastTokenizer/tokenizer_config.json:
--------------------------------------------------------------------------------
1 | {"unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "errors": "replace", "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "special_tokens_map_file": null, "name_or_path": "./data/bpe/", "tokenizer_class": "RobertaTokenizer"}
--------------------------------------------------------------------------------
/data/RobertaFastTokenizer/vocab.json:
--------------------------------------------------------------------------------
1 | {"":0,"":1,"":2,"":3,"":4,"\n":5,"#":6,"+":7,"-":8,".":9,"/":10,"0":11,"1":12,"2":13,"3":14,"4":15,"5":16,"8":17,"=":18,"@":19,"A":20,"B":21,"C":22,"F":23,"H":24,"I":25,"K":26,"L":27,"M":28,"N":29,"O":30,"P":31,"R":32,"S":33,"T":34,"Z":35,"\\":36,"a":37,"c":38,"e":39,"g":40,"h":41,"i":42,"l":43,"n":44,"r":45,"s":46,"Br":47,"an":48,"ch":49,"Bran":50,"Branch":51,"Branch1":52,"=C":53,"Ri":54,"ng":55,"Ring":56,"Ring1":57,"=Branch1":58,"Branch2":59,"=O":60,"Ring2":61,"H1":62,"C@":63,"=N":64,"#Branch1":65,"C@@":66,"=Branch2":67,"C@H1":68,"C@@H1":69,"#Branch2":70,"Cl":71,"#C":72,"/C":73,"NH1":74,"+1":75,"-1":76,"=Ring1":77,"O-1":78,"N+1":79,"\\C":80,"/N":81,"#N":82,"=Ring2":83,"=S":84,"=N+1":85,"Na":86,"Na+1":87,"\\N":88,"S+1":89,"/O":90,"\\S":91,"\\O":92,"Br-1":93,"I-1":94,"Cl-1":95,"/C@H1":96,"Branch3":97,"/C@@H1":98,"=P":99,"/S":100,"=N-1":101,"Si":102,"K+1":103,"N-1":104,"Se":105,"Li":106,"Li+1":107,"+3":108,"Cl+3":109,"\\C@H1":110,"Ring3":111,"\\C@@H1":112,"/N+1":113,"/P":114,"\\F":115,"P@":116,"2H":117,"PH1":118,"/Br":119,"N@":120,"P+1":121,"/Cl":122,"\\NH1":123,"\\Br":124,"@+1":125,"/I":126,"/C@":127,"Te":128,"\\N+1":129,"P@@":130,"12":131,"5I":132,"\\O-1":133,"125I":134,"/F":135,"#N+1":136,"\\Cl":137,"N@+1":138,"\\I":139,"-/":140,"/C@@":141,"N@@":142,"N@@+1":143,"-/Ring2":144,"-\\":145,"14":146,"B-1":147,"C-1":148,"S@+1":149,"14C":150,"H2":151,"H4":152,"I+1":153,"S-1":154,"\\P":155,"=S+1":156,"=P@":157,"SiH4":158,"+2":159,"3H":160,"@@+1":161,"Ag":162,"C+1":163,"S@@+1":164,"Cl+1":165,"=Se":166,"-\\Ring1":167,"H0":168,"OH0":169,"11":170,"=Branch3":171,"=Te":172,"Mg":173,"O+1":174,"Zn":175,"\\C@":176,"\\S+1":177,"H1-1":178,"SeH1":179,"P@+1":180,"-\\Ring2":181,"11C":182,"=Te+1":183,"Zn+2":184,"/NH1":185,"18":186,"As":187,"BH2":188,"BH1-1":189,"Ca":190,"H3":191,"OH1-1":192,"SH2":193,"=O+1":194,"Se+1":195,"TeH2":196,"125IH1":197,"-/Ring1":198,"14CH2":199,"Ag+1":200,"=Se+1":201,"MgH2":202,"Mg+2":203,"11CH3":204,"18F":205,"BH2-1":206,"Ca+2":207}
--------------------------------------------------------------------------------
/data/finetuning_datasets/classification/bbbp/bbbp_mock.csv:
--------------------------------------------------------------------------------
1 | smiles,p_np
2 | [Cl].CC(C)NCC(O)COc1cccc2ccccc12,1
3 | C(=O)(OC(C)(C)C)CCCc1ccc(cc1)N(CCCl)CCCl,1
4 | c12c3c(N4CCN(C)CC4)c(F)cc1c(c(C(O)=O)cn2C(C)CO3)=O,1
5 | C1CCN(CC1)Cc1cccc(c1)OCCCNC(=O)C,1
6 | Cc1onc(c2ccccc2Cl)c1C(=O)N[C@H]3[C@H]4SC(C)(C)[C@@H](N4C3=O)C(O)=O,1
7 | CCN1CCN(C(=O)N[C@@H](C(=O)N[C@H]2[C@H]3SCC(=C(N3C2=O)C(O)=O)CSc4nnnn4C)c5ccc(O)cc5)C(=O)C1=O,1
8 | CN(C)[C@H]1[C@@H]2C[C@H]3C(=C(O)c4c(O)cccc4[C@@]3(C)O)C(=O)[C@]2(O)C(=O)\C(=C(/O)NCN5CCCC5)C1=O,1
9 | Cn1c2CCC(Cn3ccnc3C)C(=O)c2c4ccccc14,1
10 | COc1ccc(cc1)[C@@H]2Sc3ccccc3N(CCN(C)C)C(=O)[C@@H]2OC(C)=O,1
11 |
--------------------------------------------------------------------------------
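Note on the layout of bbbp_mock.csv: each row holds a SMILES string in the `smiles` column and a binary label in `p_np`. The sketch below parses two rows copied from the file with the standard `csv` module. The function name `read_binary_dataset` is hypothetical and is not taken from the repository's own scripts, which may load the data differently.

```python
import csv
import io

# Header and two rows copied from bbbp_mock.csv.
SAMPLE = (
    "smiles,p_np\n"
    "[Cl].CC(C)NCC(O)COc1cccc2ccccc12,1\n"
    "C1CCN(CC1)Cc1cccc(c1)OCCCNC(=O)C,1\n"
)


def read_binary_dataset(text):
    """Return (smiles, label) pairs from CSV text with smiles and p_np columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["smiles"], int(row["p_np"])) for row in reader]


pairs = read_binary_dataset(SAMPLE)
print(len(pairs), pairs[0][1])  # 2 1
```

SMILES strings contain parentheses, brackets, and `=` but no commas in these rows, so the default comma delimiter is sufficient here.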
/data/finetuning_datasets/classification/sider/sider_mock.csv:
--------------------------------------------------------------------------------
1 | smiles,Hepatobiliary disorders,Metabolism and nutrition disorders,Product issues,Eye disorders,Investigations,Musculoskeletal and connective tissue disorders,Gastrointestinal disorders,Social circumstances,Immune system disorders,Reproductive system and breast disorders,"Neoplasms benign, malignant and unspecified (incl cysts and polyps)",General disorders and administration site conditions,Endocrine disorders,Surgical and medical procedures,Vascular disorders,Blood and lymphatic system disorders,Skin and subcutaneous tissue disorders,"Congenital, familial and genetic disorders",Infections and infestations,"Respiratory, thoracic and mediastinal disorders",Psychiatric disorders,Renal and urinary disorders,"Pregnancy, puerperium and perinatal conditions",Ear and labyrinth disorders,Cardiac disorders,Nervous system disorders,"Injury, poisoning and procedural complications"
2 | C(CNCCNCCNCCN)N,1,1,0,0,1,1,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,1,1,1,0
3 | CC(C)(C)C1=CC(=C(C=C1NC(=O)C2=CNC3=CC=CC=C3C2=O)O)C(C)(C)C,0,1,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,1,1,0,0,0,1,0,1,0
4 | CC[C@]12CC(=C)[C@H]3[C@H]([C@@H]1CC[C@]2(C#C)O)CCC4=CCCC[C@H]34,0,1,0,1,1,0,1,0,1,1,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1,0
5 | CCC12CC(=C)C3C(C1CC[C@]2(C#C)O)CCC4=CC(=O)CCC34,1,1,0,1,1,1,1,0,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,0,0,1,1
6 | C1C(C2=CC=CC=C2N(C3=CC=CC=C31)C(=O)N)O,1,1,0,1,1,1,1,0,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,1,0,1,0
7 | CC[C@H](C)[C@H]1C(=O)N[C@H]2CSSC[C@@H](C(=O)N[C@@H](CSSC[C@@H](C(=O)NCC(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@@H](CSSC[C@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC2=O)CO)CC(C)C)CC3=CC=C(C=C3)O)CCC(=O)N)CC(C)C)CCC(=O)O)CC(=O)N)CC4=CC=C(C=C4)O)C(=O)NCC(=O)O)C(=O)NCC(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)NCC(=O)N[C@@H](CC5=CC=CC=C5)C(=O)N[C@@H](CC6=CC=CC=C6)C(=O)N[C@@H](CC7=CC=C(C=C7)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N8CCC[C@H]8C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CCCNC(=N)N)C(=O)O)C(C)C)CC(C)C)CC9=CC=C(C=C9)O)CC(C)C)C)CCC(=O)O)C(C)C)CC(C)C)CC2=CN=CN2)CO)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CC2=CC=CC=C2)N)C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)CO)[C@@H](C)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](C(C)C)NC(=O)[C@H]([C@@H](C)CC)NC(=O)CN,0,1,0,1,1,1,1,0,1,0,1,1,0,0,1,0,1,0,1,1,1,1,0,0,1,1,1
8 | CC1CC2C3CCC4=CC(=O)C=CC4([C@]3(C(CC2([C@]1(C(=O)CCl)O)C)O)F)C,0,1,0,1,0,0,1,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1
9 | CCCCCCOC(=O)N=C(C1=CC=C(C=C1)NCC2=NC3=C(N2C)C=CC(=C3)C(=O)N(CCC(=O)OCC)C4=CC=CC=N4)N,1,1,0,0,1,1,1,0,1,0,0,1,0,1,1,1,1,0,1,1,0,1,0,0,1,1,1
10 | CSCCC(C(=O)NCC(=O)NC(CC1=CNC2=CC=CC=C21)C(=O)NC(CCSC)C(=O)NC(CC(=O)O)C(=O)NC(CC3=CC=CC=C3)C(=O)N)NC(=O)C(CC4=CC=C(C=C4)OS(=O)(=O)O)NC(=O)C(CC(=O)O)N,0,0,0,0,1,1,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0
11 |
--------------------------------------------------------------------------------
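Note on the sider_mock.csv header: several of the 27 label names contain commas and are therefore quoted, for example "Neoplasms benign, malignant and unspecified (incl cysts and polyps)". Splitting the header on commas would produce too many columns, so a CSV-aware parser is needed. The sketch below uses a trimmed excerpt (the `smiles` column plus two of the label columns, with the values from the first data row); `read_multilabel` is a hypothetical helper name.

```python
import csv
import io

# Trimmed excerpt of sider_mock.csv: smiles plus two of the 27 label columns.
EXCERPT = (
    'smiles,Product issues,'
    '"Neoplasms benign, malignant and unspecified (incl cysts and polyps)"\n'
    'C(CNCCNCCNCCN)N,0,0\n'
)


def read_multilabel(text):
    """Return (label_names, rows); each row is (smiles, [int labels])."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    label_names = header[1:]
    rows = [(r[0], [int(v) for v in r[1:]]) for r in reader if r]
    return label_names, rows


label_names, rows = read_multilabel(EXCERPT)
naive_fields = EXCERPT.splitlines()[0].split(",")

print(len(label_names))   # 2
print(len(naive_fields))  # 4 (the quoted name is wrongly split in two)
```

The same function applies to the full file, where the header yields 27 label names and each row carries 27 binary values.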
/data/finetuning_datasets/regression/esol/esol.csv:
--------------------------------------------------------------------------------
1 | smiles,Compound ID,ESOL predicted log solubility in mols per litre,Minimum Degree,Molecular Weight,Number of H-Bond Donors,Number of Rings,Number of Rotatable Bonds,Polar Surface Area,measured log solubility in mols per litre
2 | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O ,Amigdalin,-0.974,1,457.4320000000001,7,3,7,202.32,-0.77
3 | Cc1occc1C(=O)Nc2ccccc2,Fenfuram,-2.885,1,201.225,1,2,2,42.24,-3.3
4 | CC(C)=CCCC(C)=CC(=O),citral,-2.579,1,152.237,0,0,4,17.07,-2.06
5 | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43,Picene,-6.617999999999999,2,278.354,0,5,0,0.0,-7.87
6 | c1ccsc1,Thiophene,-2.232,2,84.14299999999999,0,1,0,0.0,-1.33
7 | c2ccc1scnc1c2 ,benzothiazole,-2.733,2,135.191,0,2,0,12.89,-1.5
8 | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl,"2,2,4,6,6'-PCB",-6.545,1,326.437,0,2,1,0.0,-7.32
9 | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O,Estradiol,-4.138,1,272.388,2,4,0,40.46,-5.03
10 | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl,Dieldrin,-4.533,1,380.913,0,5,0,12.53,-6.29
11 | COc5cc4OCC3Oc2c1CC(Oc1ccc2C(=O)C3c4cc5OC)C(C)=C ,Rotenone,-5.246,1,394.4230000000002,0,5,3,63.22,-4.42
12 | O=C1CCCN1,2-pyrrolidone,0.243,1,85.10600000000001,1,1,0,29.1,1.07
13 | Clc1ccc2ccccc2c1,2-Chloronapthalene,-4.063,1,162.61899999999997,0,2,0,0.0,-4.14
14 | CCCC=C,1-Pentene ,-2.01,1,70.135,0,0,2,0.0,-2.68
15 | CCC1(C(=O)NCNC1=O)c2ccccc2,Primidone,-1.897,1,218.256,2,2,2,58.2,-2.64
16 | CCCCCCCCCCCCCC,Tetradecane,-5.45,1,198.39399999999995,0,0,11,0.0,-7.96
17 | CC(C)Cl,2-Chloropropane,-1.585,1,78.542,0,0,0,0.0,-1.41
18 | CCC(C)CO,2-Methylbutanol,-1.027,1,88.14999999999999,1,0,2,20.23,-0.47
19 | N#Cc1ccccc1,Benzonitrile,-2.03,1,103.12399999999997,0,1,0,23.79,-1.0
20 | CCOP(=S)(OCC)Oc1cc(C)nc(n1)C(C)C,Diazinon,-3.989,1,304.35200000000003,0,1,7,53.47,-3.64
21 | CCCCCCCCCC(C)O,2-Undecanol,-3.096,1,172.312,1,0,8,20.23,-2.94
22 | Clc1ccc(c(Cl)c1)c2c(Cl)ccc(Cl)c2Cl ,"2,2',3,4,6-PCB",-6.627000000000001,1,326.437,0,2,1,0.0,-7.43
23 | O=c2[nH]c1CCCc1c(=O)n2C3CCCCC3,Lenacil,-3.355,1,234.29899999999995,1,3,1,54.86,-4.593999999999999
24 | CCOP(=S)(OCC)SCSCC,Phorate,-3.747,1,260.386,0,0,8,18.46,-4.11
25 | CCOc1ccc(NC(=O)C)cc1,Phenacetin,-2.342,1,179.219,1,1,3,38.33,-2.35
26 | CCN(CC)c1c(cc(c(N)c1N(=O)=O)C(F)(F)F)N(=O)=O,Dinitramine,-4.479,1,322.243,1,1,5,115.54000000000002,-5.47
27 | CCCCCCCO,1-Heptanol,-1.751,1,116.204,1,0,5,20.23,-1.81
28 | Cn1c(=O)n(C)c2nc[nH]c2c1=O,Theophylline,-1.452,1,180.167,1,2,0,72.68,-1.39
29 | CCCCC1(CC)C(=O)NC(=O)NC1=O,Butethal,-1.974,1,212.249,2,1,4,75.27000000000001,-1.661
30 | ClC(Cl)=C(c1ccc(Cl)cc1)c2ccc(Cl)cc2,"P,P'-DDE",-6.553,1,318.0300000000001,0,2,2,0.0,-6.9
31 | CCCCCCCC(=O)OC,Methyl octanoate,-2.608,1,158.241,0,0,6,26.3,-3.17
32 | CCc1ccc(CC)cc1,"1,4-Diethylbenzene ",-3.633,1,134.22199999999998,0,1,2,0.0,-3.75
33 | CCOP(=S)(OCC)SCSC(C)(C)C,Terbufos,-4.367,1,288.44,0,0,7,18.46,-4.755
34 | COC(=O)Nc1cccc(OC(=O)Nc2cccc(C)c2)c1,Phenmedipham,-4.229,1,300.314,2,2,3,76.66,-4.805
35 | ClC(=C)Cl,"1,1-Dichloroethylene",-1.939,1,96.944,0,0,0,0.0,-1.64
36 | Cc1cccc2c1Cc3ccccc32,1-Methylfluorene,-4.478,1,180.25000000000003,0,3,0,0.0,-5.22
37 | CCCCC=O,Valeraldehyde,-1.103,1,86.13399999999999,0,0,3,17.07,-0.85
38 | N(c1ccccc1)c2ccccc2,Diphenylamine,-3.857,2,169.227,1,2,2,12.03,-3.504
39 | CN(C)C(=O)SCCCCOc1ccccc1,Fenothiocarb,-3.297,1,253.367,0,1,6,29.540000000000003,-3.927
40 | CCCOP(=S)(OCCC)SCC(=O)N1CCCCC1C,Piperophos,-4.637,1,353.4900000000001,0,1,9,38.77,-4.15
41 | CCCCCCCI,1-Iodoheptane,-3.904,1,226.101,0,0,5,0.0,-4.81
42 | c1c(Cl)cccc1c2ccccc2,3-Chlorobiphenyl,-4.685,1,188.657,0,2,1,0.0,-4.88
43 | OCCCC=C,4-Pentene-1-ol,-0.7909999999999999,1,86.134,1,0,3,20.23,-0.15
44 | O=C2NC(=O)C1(CCC1)C(=O)N2,Cyclobutyl-5-spirobarbituric acid,-0.527,1,168.15200000000002,2,2,0,75.27,-1.655
45 | CC(C)C1CCC(C)CC1O ,menthol,-2.782,1,156.269,1,1,1,20.23,-2.53
46 | CC(C)OC=O,Isopropyl formate,-0.684,1,88.106,0,0,2,26.3,-0.63
47 | CCCCCC(C)O,2-Heptanol ,-1.6780000000000002,1,116.20399999999998,1,0,4,20.23,-1.55
48 | CC(=O)Nc1ccc(Br)cc1,p-Bromoacetanilide,-3.012,1,214.062,1,1,1,29.1,-3.083
49 | c1ccccc1n2ncc(N)c(Br)c2(=O),brompyrazone,-3.005,1,266.098,1,2,1,60.91,-3.127
50 | COC(=O)C1=C(C)NC(=C(C1c2ccccc2N(=O)=O)C(=O)OC)C ,nifedipine,-4.248,1,346.33900000000017,1,2,4,107.77,-4.76
51 | c2c(C)cc1nc(C)ccc1c2 ,"2,7-dimethylquinoline",-3.342,1,157.216,0,2,0,12.89,-1.94
52 | CCCCCCC#C,1-Octyne ,-2.509,1,110.2,0,0,4,0.0,-3.66
53 | CCC1(C(=O)NC(=O)NC1=O)C2=CCCCC2 ,cyclobarbital,-2.421,1,236.271,2,2,2,75.27000000000001,-2.17
54 | c1ccc2c(c1)ccc3c4ccccc4ccc23,Chrysene,-5.568,2,228.294,0,4,0,0.0,-8.057
55 | CCC(C)n1c(=O)[nH]c(C)c(Br)c1=O ,Bromacil,-3.419,1,261.119,1,1,2,54.86,-2.523
56 | Clc1cccc(c1Cl)c2c(Cl)c(Cl)cc(Cl)c2Cl ,"2,2',3,3',5,6-PCB",-7.185,1,360.88200000000006,0,2,1,0.0,-8.6
57 | Cc1ccccc1O,2-Methylphenol,-2.281,1,108.14,1,1,0,20.23,-0.62
58 | CC(C)CCC(C)(C)C,"2,2,5-Trimethylhexane",-3.631,1,128.259,0,0,2,0.0,-5.05
59 | Cc1ccc(C)c2ccccc12,"1,4-Dimethylnaphthalene ",-4.147,1,156.228,0,2,0,0.0,-4.14
60 | Cc1cc2c3ccccc3ccc2c4ccccc14,6-Methylchrysene,-5.931,1,242.321,0,4,0,0.0,-6.57
61 | CCCC(=O)C,2-Pentanone,-0.846,1,86.13399999999999,0,0,2,17.07,-0.19
62 | Clc1cc(Cl)c(Cl)c(c1Cl)c2c(Cl)c(Cl)cc(Cl)c2Cl ,"2,2',3,3',5,5',6,6'-PCB",-8.304,1,429.77200000000016,0,2,1,0.0,-9.15
63 | CCCOC(=O)CC,Methyl butyrate,-1.545,1,116.15999999999998,0,0,3,26.3,-0.82
64 | CC34CC(O)C1(F)C(CCC2=CC(=O)C=CC12C)C3CC(O)C4(O)C(=O)CO,Triamcinolone,-2.734,1,394.43900000000014,4,4,2,115.06000000000002,-3.68
65 | Nc1ccc(O)cc1,p-Aminophenol,-1.231,1,109.128,2,1,0,46.25,-0.8
66 | O=C(Cn1ccnc1N(=O)=O)NCc2ccccc2,Benznidazole,-2.321,1,260.253,1,2,5,90.06,-2.81
67 | OC4=C(C1CCC(CC1)c2ccc(Cl)cc2)C(=O)c3ccccc3C4=O,"Atovaquone(0,430mg/ml) - neutral",-6.269,1,366.84400000000016,1,4,2,54.37,-5.931
68 | CCNc1nc(Cl)nc(n1)N(CC)CC,Trietazine,-3.233,1,229.715,1,1,5,53.940000000000005,-4.06
69 | NC(=O)c1cnccn1,Pyrazinamide,-0.674,1,123.11499999999998,1,1,1,68.87,-0.667
70 | CCC(Br)(CC)C(=O)NC(N)=O,Carbromal,-2.198,1,237.097,2,0,3,72.19,-2.68
71 | Clc1ccccc1c2ccccc2Cl ,"2,2'-PCB",-4.984,1,223.102,0,2,1,0.0,-5.27
72 | O=C2CN(N=Cc1ccc(o1)N(=O)=O)C(=O)N2 ,nitrofurantoin,-1.243,1,238.159,1,2,3,118.04999999999998,-3.38
73 | Clc2ccc(Oc1ccc(cc1)N(=O)=O)c(Cl)c2,Nitrofen,-5.361000000000001,1,284.098,0,2,3,52.37,-5.46
74 | CC1(C)C2CCC1(C)C(=O)C2,Camphor,-2.158,1,152.237,0,2,0,17.07,-1.96
75 | O=C1NC(=O)NC(=O)C1(CC=C)c1ccccc1,5-Allyl-5-phenylbarbital,-2.36,1,244.25,2,2,3,75.27000000000001,-2.369
76 | CCCCC(=O)OCC,Pentyl propanoate,-1.899,1,130.18699999999998,0,0,4,26.3,-2.25
77 | CC(C)CCOC(=O)C,Isopentyl acetate,-1.817,1,130.18699999999998,0,0,3,26.3,-1.92
78 | O=C1N(COC(=O)CCCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Hexanoyloxymethylphenyltoin,-4.1530000000000005,1,380.444,1,3,8,75.71,-5.886
79 | Clc1cccc(c1)c2cc(Cl)ccc2Cl ,"2,3',5-PCB",-5.7620000000000005,1,257.547,0,2,1,0.0,-6.01
80 | CCCBr,1-Bromopropane,-1.949,1,122.993,0,0,1,0.0,-1.73
81 | CCCC1COC(Cn2cncn2)(O1)c3ccc(Cl)cc3Cl,Propiconazole,-4.603,1,342.2260000000001,0,3,5,49.17,-3.4930000000000003
82 | COP(=S)(OC)SCC(=O)N(C)C=O,Formothion,-2.087,1,257.273,0,0,6,55.84,-1.995
83 | Cc1ncnc2nccnc12,4-methylpteridine,-1.24,1,146.15299999999996,0,2,0,51.56,-0.466
84 | NC(=S)N,Thiourea,0.3289999999999999,1,76.12400000000001,2,0,0,52.04,0.32
85 | Cc1ccc(C)cc1,p-Xylene ,-3.035,1,106.168,0,1,0,0.0,-2.77
86 | CCc1ccccc1CC,"1,2-Diethylbenzene",-3.6010000000000004,1,134.22199999999998,0,1,2,0.0,-3.28
87 | ClC(Cl)(Cl)C(Cl)(Cl)Cl,Hexachloroethane,-4.215,1,236.74,0,0,0,0.0,-3.67
88 | CC(C)C(C(=O)OC(C#N)c1cccc(Oc2ccccc2)c1)c3ccc(OC(F)F)cc3,Flucythrinate,-6.877999999999999,1,451.4690000000001,0,3,9,68.55000000000001,-6.876
89 | CCCN(=O)=O,1-Nitropropane,-0.816,1,89.09399999999998,0,0,2,43.14,-0.8
90 | CC(C)C1CCC(C)CC1=O,Menthone,-2.516,1,154.253,0,1,1,17.07,-2.35
91 | CCN2c1cc(Cl)ccc1NC(=O)c3cccnc23 ,RTI 24,-4.423,1,273.723,1,3,1,45.23,-5.36
92 | O=N(=O)c1c(Cl)c(Cl)ccc1,"2,3-Dichloronitrobenzene",-3.322,1,192.00100000000003,0,1,1,43.14,-3.48
93 | CCCC(C)C1(CC=C)C(=O)NC(=S)NC1=O ,thiamylal,-3.063,1,254.355,2,1,5,58.2,-3.46
94 | c1ccc2c(c1)c3cccc4cccc2c34,Fluoranthene,-4.957,2,202.256,0,4,0,0.0,-6.0
95 | CCCOC(C)C,Propylisopropylether,-1.354,1,102.17699999999998,0,0,3,9.23,-1.34
96 | Cc1cc(C)c2ccccc2c1,"1,3-Dimethylnaphthalene",-4.147,1,156.22799999999998,0,2,0,0.0,-4.29
97 | CCC(=C(CC)c1ccc(O)cc1)c2ccc(O)cc2 ,diethylstilbestrol,-5.074,1,268.356,2,2,4,40.46,-4.07
98 | c1(C#N)c(Cl)c(C#N)c(Cl)c(Cl)c(Cl)1,Chlorothalonil,-3.995,1,265.914,0,1,0,47.58,-5.64
99 | Clc1ccc(Cl)c(c1)c2ccc(Cl)c(Cl)c2,"2,3',4',5-PCB",-6.312,1,291.992,0,2,1,0.0,-7.25
100 | C1OC1c2ccccc2 ,styrene oxide,-1.826,2,120.15099999999995,0,2,1,12.53,-1.6
101 | CC(C)c1ccccc1,Isopropylbenzene ,-3.265,1,120.19499999999996,0,1,1,0.0,-3.27
102 | CC12CCC3C(CCC4=CC(=O)CCC34C)C2CCC1C(=O)CO,Deoxycorticosterone,-3.939,1,330.4680000000001,1,4,2,54.370000000000005,-3.45
103 | c2(Cl)c(Cl)c(Cl)c1nccnc1c2(Cl) ,chlorquinox,-4.438,1,267.93,0,2,0,25.78,-5.43
104 | C1OC(O)C(O)C(O)C1O,L-arabinose,0.601,1,150.13,4,1,0,90.15,0.39
105 | ClCCl,Dichloromethane,-1.156,1,84.93299999999999,0,0,0,0.0,-0.63
106 | CCc1cccc2ccccc12,1-Ethylnaphthalene ,-4.1,1,156.22799999999998,0,2,1,0.0,-4.17
107 | COC=O,Methyl formate,-0.048,1,60.05200000000001,0,0,1,26.3,0.58
108 | Oc1ccccc1N(=O)=O,o-Nitrophenol,-2.318,1,139.11,1,1,1,63.37,-1.74
109 | Cc1c[nH]c(=O)[nH]c1=O ,thymine,-0.78,1,126.115,2,1,0,65.72,-1.506
110 | CC(C)C,2-Methylpropane,-1.891,1,58.124,0,0,0,0.0,-2.55
111 | OCC1OC(C(O)C1O)n2cnc3c(O)ncnc23,Inosine,-0.8340000000000001,1,268.22900000000004,4,3,2,133.75,-1.23
112 | Oc1c(I)cc(C#N)cc1I,Ioxynil,-4.615,1,370.915,1,1,0,44.02,-3.61
113 | Oc1ccc(Cl)cc1C(=O)Nc2ccc(cc2Cl)N(=O)=O,Niclosamide,-5.032,1,327.1230000000001,2,2,3,92.47,-4.7
114 | CCCCC,Pentane,-2.261,1,72.151,0,0,2,0.0,-3.18
115 | c1ccccc1O,Phenol,-1.991,1,94.113,1,1,0,20.23,0.0
116 | Nc3ccc2cc1ccccc1cc2c3 ,2-aminoanthracene,-3.789,1,193.249,1,3,0,26.02,-5.17
117 | Cn1cnc2n(C)c(=O)[nH]c(=O)c12 ,theobromine,-1.05,1,180.167,1,2,0,72.68,-2.523
118 | c1ccc2cnccc2c1,Isoquinoline,-2.531,2,129.16199999999998,0,2,0,12.89,-1.45
119 | COP(=S)(OC)SCC(=O)N(C(C)C)c1ccc(Cl)cc1,Anilofos,-5.106,1,367.86,0,1,7,38.77,-4.432
120 | CCCCCCc1ccccc1,Hexylbenzene ,-4.22,1,162.276,0,1,5,0.0,-5.21
121 | Clc1ccccc1c2ccccc2,2-Chlorobiphenyl,-4.5280000000000005,1,188.657,0,2,1,0.0,-4.54
122 | CCCC(=C)C,2-Methyl-1-Pentene,-2.3480000000000003,1,84.16199999999999,0,0,2,0.0,-3.03
123 | CC(C)C(C)C(C)C,"2,3,4-Trimethylpentane",-3.276,1,114.232,0,0,2,0.0,-4.8
124 | Clc1cc(Cl)c(Cl)c(Cl)c1Cl,Pentachlorobenzene,-5.167999999999999,1,250.339,0,1,0,0.0,-5.65
125 | Oc1cccc(c1)N(=O)=O,m-Nitrophenol,-2.318,1,139.11,1,1,1,63.37,-1.01
126 | CCCCCCCCC=C,1-Decene,-3.781,1,140.26999999999998,0,0,7,0.0,-5.51
127 | CC(=O)OCC(COC(=O)C)OC(=O)C,Glyceryl triacetate,-1.285,1,218.205,0,0,5,78.9,-0.6
128 | CCCCc1c(C)nc(nc1O)N(C)C ,dimethirimol,-3.57,1,209.293,1,1,4,49.25000000000001,-2.24
129 | CC1(C)C(C=C(Cl)Cl)C1C(=O)OC(C#N)c2ccc(F)c(Oc3ccccc3)c2,Cyfluthrin,-6.84,1,434.29400000000015,0,3,6,59.32000000000001,-7.337000000000001
130 | c1ccncc1,Pyridine,-1.481,2,79.10199999999998,0,1,0,12.89,0.76
131 | CCCCCCCBr,1-Bromoheptane,-3.366,1,179.101,0,0,5,0.0,-4.43
132 | Cc1ccncc1C,"3,4-Dimethylpyridine",-2.067,1,107.156,0,1,0,12.89,0.36
133 | CC34CC(O)C1(F)C(CCC2=CC(=O)CCC12C)C3CCC4(O)C(=O)CO ,Fludrocortisone,-3.172,1,380.4560000000001,3,4,2,94.83,-3.43
134 | CCSCc1ccccc1OC(=O)NC ,ethiofencarb,-2.855,1,225.313,1,1,4,38.33,-2.09
135 | CCOC(=O)CC(=O)OCC,Malonic acid diethylester,-1.413,1,160.16899999999998,0,0,4,52.60000000000001,-0.82
136 | CC1=CCC(CC1)C(C)=C,d-Limonene,-3.429,1,136.238,0,1,1,0.0,-4.26
137 | C1Cc2ccccc2C1,Indan,-3.057,2,118.17899999999996,0,2,0,0.0,-3.04
138 | CC(C)(C)c1ccc(O)cc1,p-t-Butylphenol,-3.192,1,150.22099999999998,1,1,0,20.23,-2.41
139 | O=C2NC(=O)C1(CC1)C(=O)N2 ,Cyclopropyl-5-spirobarbituric acid,-0.088,1,154.125,2,2,0,75.27,-1.886
140 | Clc1cccc(I)c1,m-Chloroiodobenzene,-4.384,1,238.455,0,1,0,0.0,-3.55
141 | Brc1cccc2ccccc12,1-Bromonapthalene,-4.434,1,207.07,0,2,0,0.0,-4.35
142 | CC/C=C/C,trans-2-Pentene ,-2.076,1,70.135,0,0,1,0.0,-2.54
143 | Cc1cccc(C)n1,"2,6-Dimethylpyridine",-2.0980000000000003,1,107.156,0,1,0,12.89,0.45
144 | ClC=C(Cl)Cl,Trichloroethylene,-2.312,1,131.389,0,0,0,0.0,-1.96
145 | Nc1cccc2ccccc12,1-Napthylamine,-2.721,1,143.189,1,2,0,26.02,-1.92
146 | Cc1cccc(C)c1,m-Xylene ,-3.035,1,106.168,0,1,0,0.0,-2.82
147 | Oc2ncc1nccnc1n2,2-hydroxypteridine,-1.404,1,148.125,1,2,0,71.79,-1.947
148 | CO,Methanol,0.441,1,32.042,1,0,0,20.23,1.57
149 | CCC1(CCC(C)C)C(=O)NC(=O)NC1=O,Amobarbital,-2.312,1,226.276,2,1,4,75.27000000000001,-2.468
150 | CCC(=O)C,2-Butanone,-0.491,1,72.107,0,0,1,17.07,0.52
151 | Fc1c[nH]c(=O)[nH]c1=O ,5-fluorouracil,-0.792,1,130.078,2,1,0,65.72,-1.077
152 | Nc1ncnc2n(ccc12)C3OC(CO)C(O)C3O ,tubercidin,-0.892,1,266.257,4,3,2,126.65,-1.95
153 | Oc1cccc(O)c1,"1,3-Benzenediol",-1.59,1,110.11199999999998,2,1,0,40.46,0.81
154 | CCCCCCO,1-Hexanol,-1.3969999999999998,1,102.177,1,0,4,20.23,-1.24
155 | CCCCCCl,1-Chloropentane,-2.294,1,106.596,0,0,3,0.0,-2.73
156 | C=CC=C,"1,3-Butadiene",-1.376,1,54.09199999999999,0,0,1,0.0,-1.87
157 | CCCOC(=O)C,Propyl acetate,-1.125,1,102.13299999999998,0,0,2,26.3,-0.72
158 | Oc2ccc1CCCCc1c2 ,"5,6,7,8-tetrahydro-2-naphthol",-3.0860000000000003,1,148.205,1,2,0,20.23,-1.99
159 | NC(=O)CCl ,chloroacetamide,-0.106,1,93.513,1,0,1,43.09,-0.02
160 | COP(=S)(OC)Oc1cc(Cl)c(I)cc1Cl,Iodofenphos,-6.148,1,413.0,0,1,4,27.69,-6.62
161 | Cc1ccc(Cl)cc1,4-Chlorotoluene,-3.297,1,126.586,0,1,0,0.0,-3.08
162 | CSc1nnc(c(=O)n1N)C(C)(C)C,Metribuzin,-2.324,1,214.294,1,1,1,73.8,-2.253
163 | Cc1ccc(OP(=O)(Oc2cccc(C)c2)Oc3ccccc3C)cc1,Tricresyl phosphate,-6.39,1,368.3690000000001,0,3,6,44.760000000000005,-6.01
164 | CCCCCC=O,Caproaldehyde,-1.457,1,100.161,0,0,4,17.07,-1.3
165 | CCCCOC(=O)c1ccc(N)cc1,Butamben,-3.039,1,193.246,1,1,4,52.32,-3.082
166 | O2c1cc(C)ccc1N(C)C(=O)c3cc(N)cnc23 ,RTI 3,-3.049,1,255.277,1,3,0,68.45,-3.043
167 | CC(C)=CCC/C(C)=C\CO,Nerol,-2.603,1,154.253,1,0,4,20.23,-2.46
168 | Clc1ccc(cc1)c2ccccc2Cl ,"2,4'-PCB",-5.142,1,223.102,0,2,1,0.0,-5.28
169 | O=C1N(COC(=O)CCCCCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Octanoyloxymethylphenytoin,-4.84,1,408.498,1,3,10,75.71,-6.523
170 | CCN(=O)=O,Nitroethane,-0.462,1,75.067,0,0,1,43.14,-0.22
171 | CCN(CC(C)=C)c1c(cc(cc1N(=O)=O)C(F)(F)F)N(=O)=O,Ethalfluralin,-5.063,1,333.266,0,1,6,89.51999999999998,-6.124
172 | Clc1ccc(Cl)c(Cl)c1Cl,"1,2,3,4-Tetrachlorobenzene",-4.546,1,215.894,0,1,0,0.0,-4.57
173 | CCCC(C)(COC(N)=O)COC(N)=O ,Meprobamate,-1.376,1,218.253,2,0,6,104.64,-1.807
174 | CC(=O)C3CCC4C2CC=C1CC(O)CCC1(C)C2CCC34C ,pregnenolone,-4.342,1,316.48500000000007,1,4,1,37.3,-4.65
175 | CI,Iodomethane,-1.646,1,141.939,0,0,0,0.0,-1.0
176 | CC1CC(C)C(=O)C(C1)C(O)CC2CC(=O)NC(=O)C2 ,cycloheximide,-1.5319999999999998,1,281.35200000000003,2,2,3,83.47,-1.13
177 | O=C1N(COC(=O)CCCCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Heptanoyloxymethylphenytoin,-4.496,1,394.471,1,3,9,75.71,-6.301
178 | CC1=CC(=O)CC(C)(C)C1 ,isophorone,-2.015,1,138.20999999999998,0,1,0,17.07,-1.06
179 | O=C1NC(=O)NC(=O)C1(CC)C(C)CC,Butabarbital,-1.958,1,212.249,2,1,3,75.27000000000001,-2.39
180 | CCCCC(=O)CCCC,5-Nonanone,-2.329,1,142.242,0,0,6,17.07,-2.58
181 | CCC1(CCC(=O)NC1=O)c2ccccc2 ,Glutethimide,-2.591,1,217.268,1,2,2,46.17,-2.337
182 | CCC(C)CC,3-Methylpentane,-2.6,1,86.178,0,0,2,0.0,-3.68
183 | CCOc1ccc(cc1)C(C)(C)COCc3cccc(Oc2ccccc2)c3,Etofenprox,-6.896,1,376.496,0,3,9,27.69,-8.6
184 | Cc1ccccc1n3c(C)nc2ccccc2c3=O,Methaqualone,-3.881,1,250.301,0,3,1,34.89,-2.925
185 | ClCC#N,Chloroacetonitrile,-0.4479999999999999,1,75.498,0,0,0,23.79,-0.092
186 | CCOP(=S)(CC)Oc1cc(Cl)c(Cl)cc1Cl,Trichloronate,-5.225,1,333.60400000000004,0,1,5,18.46,-5.752000000000001
187 | CC12CCC(=O)C=C1CCC3C2CCC4(C)C3CCC4(O)C#C ,Ethisterone,-3.858,1,312.45300000000003,1,4,0,37.3,-5.66
188 | c1ccnnc1,Pyridazine,-0.619,2,80.08999999999999,0,1,0,25.78,1.1
189 | Clc1cc(Cl)c(Cl)c(Cl)c1,"1,2,3,5-Tetrachlorobenzene",-4.621,1,215.894,0,1,0,0.0,-4.63
190 | C1C(O)CCC2(C)CC3CCC4(C)C5(C)CC6OCC(C)CC6OC5CC4C3C=C21,Diosgenin,-5.681,1,414.63000000000017,1,6,0,38.69,-7.32
191 | Nc1ccccc1O,o-Aminophenol,-1.465,1,109.128,2,1,0,46.25,-0.72
192 | CCCCCCCCC(=O)OCC,Ethyl nonanoate,-3.3160000000000003,1,186.295,0,0,8,26.3,-3.8
193 | COCC(=O)N(C(C)C(=O)OC)c1c(C)cccc1C ,metalaxyl,-2.87,1,279.336,0,1,5,55.84,-1.601
194 | CNC(=O)Oc1ccccc1OC(C)C,Propoxur,-2.4090000000000003,1,209.245,1,1,3,47.56,-2.05
195 | CCC(C)Cl,2-Chlorobutane,-1.94,1,92.569,0,0,1,0.0,-1.96
196 | Oc1ccc2ccccc2c1,2-Napthol,-3.08,1,144.17299999999997,1,2,0,20.23,-2.28
197 | CC(C)Oc1cc(c(Cl)cc1Cl)n2nc(oc2=O)C(C)(C)C,Oxadiazon,-5.265,1,345.22600000000017,0,2,3,57.26,-5.696000000000001
198 | CCCCC#C,1-Hexyne ,-1.801,1,82.14599999999999,0,0,2,0.0,-2.36
199 | CCCCCCCC#C,1-Nonyne ,-2.864,1,124.227,0,0,5,0.0,-4.24
200 | Cc1ccccc1Cl,2-Chlorotoluene,-3.297,1,126.586,0,1,0,0.0,-3.52
201 | CC(C)OC(C)C,Diisopropyl ether ,-1.281,1,102.177,0,0,2,9.23,-1.1
202 | Nc1ccc(cc1)S(=O)(=O)c2ccc(N)cc2,Dapsone,-2.464,1,248.307,2,2,2,86.18,-3.094
203 | CNN,Methyl hydrazine,0.5429999999999999,1,46.073,2,0,0,38.05,1.34
204 | CC#C,Propyne,-0.672,1,40.065000000000005,0,0,0,0.0,-0.41
205 | CCOP(=S)(OCC)ON=C(C#N)c1ccccc1,Phoxim,-4.557,1,298.304,0,1,7,63.84,-4.862
206 | CCNP(=S)(OC)OC(=CC(=O)OC(C)C)C,Propetamphos,-2.826,1,281.314,1,0,7,56.790000000000006,-3.408
207 | C=CC=O,Acrolein,-0.184,1,56.064,0,0,1,17.07,0.57
208 | O=c1[nH]cnc2nc[nH]c12 ,Hypoxanthine,-0.6559999999999999,1,136.114,2,2,0,74.43,-2.296
209 | Oc2ccc1ncccc1c2 ,6-hydroxyquinoline,-2.725,1,145.161,1,2,0,33.120000000000005,-2.16
210 | Fc1ccccc1,Fluorobenzene,-2.514,1,96.10399999999998,0,1,0,0.0,-1.8
211 | CCCCl,1-Chloropropane,-1.585,1,78.542,0,0,1,0.0,-1.47
212 | CCOC(=O)C,Ethyl acetate,-0.77,1,88.106,0,0,1,26.3,-0.04
213 | CCCC(C)(C)C,"2,2-Dimethylpentane",-2.938,1,100.20499999999998,0,0,1,0.0,-4.36
214 | Cc1cc(C)c(C)c(C)c1C,Pentamethylbenzene,-3.993,1,148.249,0,1,0,0.0,-4.0
215 | CC12CCC(CC1)C(C)(C)O2 ,eucalyptol,-2.579,1,154.253,0,3,0,9.23,-1.64
216 | CCCCOC(=O)CCCCCCCCC(=O)OCCCC,dibutyl sebacate,-4.726,1,314.46600000000007,0,0,15,52.60000000000001,-3.896
217 | Clc1ccc(cc1)c2ccc(Cl)cc2 ,"4,4'-PCB",-5.299,1,223.102,0,2,1,0.0,-6.56
218 | Cc1cccnc1C,"2,3-Dimethylpyridine",-2.067,1,107.156,0,1,0,12.89,0.38
219 | CC(=C)C1CC=C(C)C(=O)C1,Carvone,-2.042,1,150.22099999999998,0,1,1,17.07,-2.06
220 | CCOP(=S)(OCC)SCSc1ccc(Cl)cc1,Carbophenthion,-5.827999999999999,1,342.875,0,1,8,18.46,-5.736000000000001
221 | COc1cc(cc(OC)c1O)C6C2C(COC2=O)C(OC4OC3COC(C)OC3C(O)C4O)c7cc5OCOc5cc67,"Etoposide (148-167,25mg/ml)",-3.292,1,588.5620000000001,3,7,5,160.83,-3.571
222 | c1cc2cccc3c4cccc5cccc(c(c1)c23)c54,Perylene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.804
223 | Cc1ccc(cc1N(=O)=O)N(=O)=O,"2,4-Dinitrotoluene",-2.604,1,182.135,0,1,2,86.28,-2.82
224 | c1c(Br)ccc2ccccc12 ,2-bromonaphthalene,-4.434,1,207.07,0,2,0,0.0,-4.4
225 | CNC(=O)Oc1cccc(N=CN(C)C)c1,Formetanate,-1.846,1,221.26,1,1,3,53.93,-2.34
226 | COc2cnc1ncncc1n2,6-methoxypteridine,-1.589,1,162.15200000000002,0,2,1,60.790000000000006,-1.139
227 | Cc3ccnc4N(C1CC1)c2ncccc2C(=O)Nc34 ,nevirapine,-3.397,1,266.30400000000003,1,4,1,58.120000000000005,-3.19
228 | CCOP(=S)(OCC)Oc1nc(Cl)n(n1)C(C)C,Isazofos,-3.76,1,313.747,0,1,7,58.4,-3.658
229 | CC(=C)C=C,"2-Methyl-1,3-Butadiene ",-1.714,1,68.11900000000001,0,0,1,0.0,-2.03
230 | CC(C)=CCCC(O)(C)C=C,linalool,-2.399,1,154.253,1,0,4,20.23,-1.99
231 | COP(=S)(OC)Oc1ccc(SC)c(C)c1,Fenthion,-4.265,1,278.335,0,1,5,27.69,-4.57
232 | OC1CCCCC1,Cyclohexanol ,-1.261,1,100.161,1,1,0,20.23,-0.44
233 | O=C1NC(=O)NC(=O)C1(C)CC=C,5-Allyl-5-methylbarbital,-1.013,1,182.179,2,1,2,75.27000000000001,-1.16
234 | CC34CCC1C(CCC2CC(O)CCC12C)C3CCC4=O,Epiandrosterone,-3.882,1,290.447,1,4,0,37.3,-4.16
235 | OCC(O)C(O)C(O)C(O)CO ,mannitol,0.647,1,182.172,6,0,5,121.38,0.06
236 | Cc1ccc(cc1)c2ccccc2,4-Methylbiphenyl,-4.424,1,168.239,0,2,1,0.0,-4.62
237 | CCNc1nc(Cl)nc(NC(C)C)n1,Atrazine,-3.069,1,215.688,2,1,4,62.73,-3.85
238 | NC(=S)Nc1ccccc1,Phenylthiourea,-1.7009999999999998,1,152.22199999999998,2,1,1,38.05,-1.77
239 | CCCC(=O)CCC,4-Heptanone,-1.62,1,114.188,0,0,4,17.07,-1.3
240 | CC(=O)C(C)(C)C,"3,3-Dimethyl-2-butanone",-1.25,1,100.161,0,0,0,17.07,-0.72
241 | Oc1ccc(Cl)cc1,4-Chlorophenol ,-2.761,1,128.558,1,1,0,20.23,-0.7
242 | O=C1CCCCC1,Cyclohexanone,-0.996,1,98.145,0,1,0,17.07,-0.6
243 | Cc1cccc(N)c1,m-Methylaniline,-1.954,1,107.156,1,1,0,26.02,-0.85
244 | ClC(Cl)(Cl)C#N,Trichloroacetonitrile,-2.019,1,144.388,0,0,0,23.79,-2.168
245 | CNc2cnn(c1cccc(c1)C(F)(F)F)c(=O)c2Cl,norflurazon,-4.029,1,303.67100000000005,1,2,2,46.92,-4.046
246 | CCCCCCCCC(=O)C,2-Decanone,-2.617,1,156.269,0,0,7,17.07,-3.3
247 | CCN(CC)c1nc(Cl)nc(NC(C)C)n1,Ipazine,-3.497,1,243.742,1,1,5,53.940000000000005,-3.785
248 | CCOC(=O)c1ccc(N)cc1,Benzocaine,-2.383,1,165.19199999999998,1,1,2,52.32,-2.616
249 | Clc1ccc(Cl)c(Cl)c1,"1,2,4-Trichlorobenzene",-4.083,1,181.449,0,1,0,0.0,-3.59
250 | Cc3nnc4CN=C(c1ccccc1Cl)c2cc(Cl)ccc2n34,Triazolam,-3.948,1,343.2170000000001,0,4,1,43.07,-4.09
251 | Oc1ccccc1O,"1,2-Benzenediol",-1.635,1,110.11199999999998,2,1,0,40.46,0.62
252 | CCN2c1ncccc1N(C)C(=O)c3cccnc23 ,Reverse Transcriptase inhibitor 1,-2.794,1,254.293,0,3,1,49.330000000000005,-2.62
253 | CSC,Dimethyl sulfide,-0.758,1,62.137,0,0,0,0.0,-0.45
254 | Cc1ccccc1Br,2-Bromotoluene,-3.667,1,171.03699999999998,0,1,0,0.0,-2.23
255 | CCOC(=O)N,O-Ethyl carbamate,-0.218,1,89.09400000000001,1,0,1,52.32,0.85
256 | CC(=O)OC3(CCC4C2C=C(C)C1=CC(=O)CCC1(C)C2CCC34C)C(C)=O ,megestrol acetate,-4.417,1,384.5160000000002,0,4,2,60.440000000000005,-5.35
257 | CC(C)C(O)C(C)C,"2,4-Dimethyl-3-pentanol",-1.6469999999999998,1,116.20399999999998,1,0,2,20.23,-1.22
258 | c1ccc2ccccc2c1,Napthalene,-3.468,2,128.17399999999995,0,2,0,0.0,-3.6
259 | CCNc1ccccc1,N-Ethylaniline,-2.389,1,121.18299999999996,1,1,2,12.03,-1.7
260 | O=C1NC(=O)C(N1)(c2ccccc2)c3ccccc3,Phenytoin,-3.057,1,252.273,2,3,2,58.2,-4.097
261 | Cc1c2ccccc2c(C)c3ccc4ccccc4c13,"7,12-Dimethylbenz(a)anthracene",-6.297000000000001,1,256.348,0,4,0,0.0,-7.02
262 | CCOP(=S)(OCC)SC(CCl)N1C(=O)c2ccccc2C1=O,Dialifor,-5.026,1,393.85400000000016,0,2,8,55.84,-6.34
263 | COc1ccc(cc1)C(c2ccc(OC)cc2)C(Cl)(Cl)Cl,Methoxychlor,-5.537999999999999,1,345.6529999999999,0,2,4,18.46,-6.89
264 | Fc1cccc(F)c1C(=O)NC(=O)Nc2cc(Cl)c(F)c(Cl)c2F ,TEFLUBENZURON,-5.462000000000001,1,381.1120000000001,2,2,2,58.2,-7.28
265 | O=C1N(COC(=O)CCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Pentanoyloxymethylphenytoin,-3.81,1,366.417,1,3,7,75.71,-4.678
266 | CN(C)C(=O)Nc1ccc(Cl)cc1,Monuron,-2.6710000000000003,1,198.653,1,1,1,32.34,-2.89
267 | OC(Cn1cncn1)(c2ccc(F)cc2)c3ccccc3F,Flutriafol,-3.569,1,301.296,1,3,4,50.94,-3.37
268 | CC(=O)OCC(=O)C3(O)C(CC4C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC34C)OC(C)=O ,triamcinolone diacetate,-3.876,1,478.51300000000026,2,4,4,127.20000000000002,-4.13
269 | CCCCBr,1-Bromobutane,-2.303,1,137.01999999999998,0,0,2,0.0,-2.37
270 | Brc1cc(Br)c(Br)cc1Br,"1,2,4,5-Tetrabromobenzene",-6.001,1,393.698,0,1,0,0.0,-6.98
271 | CC(C)CC(=O)C,4-Methyl-2-pentanone,-1.1840000000000002,1,100.161,0,0,2,17.07,-0.74
272 | CCSC(=O)N(CC)C1CCCCC1 ,cycloate,-3.35,1,215.362,0,1,3,20.31,-3.4
273 | COc1ccc(Cl)cc1,4-Chloroanisole,-3.057,1,142.585,0,1,1,9.23,-2.78
274 | CC1(C)C(C=C(Br)Br)C1C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Deltamethrin,-7.44,1,505.2060000000002,0,3,6,59.32000000000001,-8.402000000000001
275 | CCC(C)C1(CC=C)C(=O)NC(=O)NC1=O,Talbutal,-2.06,1,224.26,2,1,4,75.27000000000001,-2.016
276 | COP(=S)(OC)Oc1ccc(N(=O)=O)c(C)c1,Fenitrothion,-3.845,1,277.238,0,1,5,70.83000000000001,-4.04
277 | Ic1cccc2ccccc12,1-Iodonapthalene,-4.888999999999999,1,254.07,0,2,0,0.0,-4.55
278 | OCC(O)C(O)C(O)C(O)CO,Sorbitol,0.647,1,182.172,6,0,5,121.38,1.09
279 | CCS,Ethanethiol,-0.968,1,62.137,1,0,0,0.0,-0.6
280 | ClCC(Cl)Cl,"1,1,2-Trichloroethane",-1.961,1,133.405,0,0,1,0.0,-1.48
281 | CN(C)C(=O)Oc1cc(C)nn1c2ccccc2,Pyrolan,-3.141,1,245.282,0,2,2,47.36000000000001,-2.09
282 | NC(=O)c1ccccc1O,o-Hydroxybenzamide,-1.942,1,137.13799999999998,2,1,1,63.32000000000001,-1.82
283 | Cc1ccccc1N(=O)=O,o-Nitrotoluene,-2.589,1,137.138,0,1,1,43.14,-2.33
284 | O=C1NC(=O)NC(=O)C1(C(C)C)C(C)C,"5,5-Diisopropylbarbital",-1.942,1,212.249,2,1,2,75.27000000000001,-2.766
285 | CCc1ccccc1C,2-Ethyltoluene,-3.2960000000000003,1,120.19499999999996,0,1,1,0.0,-3.21
286 | CCCCCCCCl,1-Chloroheptane,-3.003,1,134.65,0,0,5,0.0,-4.0
287 | O=C1NC(=O)NC(=O)C1(CC)CC,Barbital,-1.265,1,184.195,2,1,2,75.27000000000001,-2.4
288 | C(Cc1ccccc1)c2ccccc2,Bibenzyl ,-4.301,2,182.266,0,2,3,0.0,-4.62
289 | ClC(Cl)C(Cl)Cl,"1,1,2,2-Tetrachloroethane",-2.549,1,167.85,0,0,1,0.0,-1.74
290 | CCN2c1cc(OC)cc(C)c1NC(=O)c3cccnc23 ,RTI 23,-4.228,1,283.331,1,3,2,54.46,-5.153
291 | Cc1ccc2c(ccc3ccccc32)c1,2-Methylphenanthrene,-4.87,1,192.261,0,3,0,0.0,-5.84
292 | CCCCOC(=O)c1ccccc1C(=O)OCCCC ,dibutylphthalate,-4.378,1,278.348,0,1,8,52.60000000000001,-4.4
293 | COc1c(O)c(Cl)c(Cl)c(Cl)c1Cl ,tetrachloroguaiacol,-4.299,1,261.919,1,1,1,29.46,-4.02
294 | CCN(CC)C(=O)C(=CCOP(=O)(OC)OC)Cl,Dimecron,-2.426,1,299.6909999999999,0,0,8,65.07000000000001,0.523
295 | CC34CCC1C(=CCc2cc(O)ccc12)C3CCC4=O,Equilin,-3.555,1,268.356,1,4,0,37.3,-5.282
296 | CCOC(=O)c1ccccc1S(=O)(=O)NN(C=O)c2nc(Cl)cc(OC)n2,Chlorimuron-ethyl (ph 7),-3.719,1,414.82700000000017,1,2,8,127.79,-4.5760000000000005
297 | COc1ccc(cc1)N(=O)=O,p-Nitroanisole,-2.522,1,153.13699999999997,0,1,2,52.37,-2.41
298 | CCCCCCCl,1-Chlorohexane,-2.648,1,120.623,0,0,4,0.0,-3.12
299 | Clc1cc(c(Cl)c(Cl)c1Cl)c2cc(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,3',4,4',5,5'-PCB",-8.468,1,429.77200000000016,0,2,1,0.0,-9.16
300 | OCC1OC(CO)(OC2OC(COC3OC(CO)C(O)C(O)C3O)C(O)C(O)C2O)C(O)C1O,Raffinose,0.496,1,504.4380000000001,11,3,8,268.67999999999995,-0.41
301 | CCCCCCCCCCCCCCCCCCCCCCCCCC,hexacosane,-9.702,1,366.7180000000002,0,0,23,0.0,-8.334
302 | CCN2c1ccccc1N(C)C(=O)c3cccnc23 ,RTI 5,-3.471,1,253.30499999999995,0,3,1,36.44,-3.324
303 | CC(Cl)Cl,"1,1-Dichloroethane",-1.5759999999999998,1,98.96,0,0,0,0.0,-1.29
304 | Nc1ccc(cc1)S(N)(=O)=O,Sulfanilamide,-0.954,1,172.20899999999995,2,1,1,86.18,-1.34
305 | CCCN(CCC)c1c(cc(cc1N(=O)=O)C(C)C)N(=O)=O,Isopropalin,-5.306,1,309.36600000000004,0,1,8,89.51999999999998,-6.49
306 | ClC1C(Cl)C(Cl)C(Cl)C(Cl)C1Cl,Lindane,-4.009,1,290.832,0,1,0,0.0,-4.64
307 | CCOP(=S)(NC(C)C)Oc1ccccc1C(=O)OC(C)C,Isofenphos,-4.538,1,345.4010000000002,1,1,8,56.790000000000006,-4.194
308 | Clc1cccc(Cl)c1Cl,"1,2,3-Trichlorobenzene",-4.008,1,181.449,0,1,0,0.0,-4.0
309 | ClC(Cl)(Cl)Cl,Tetrachloromethane,-2.607,1,153.823,0,0,0,0.0,-2.31
310 | O=N(=O)c1cc(Cl)c(Cl)cc1,"3,4-Dichloronitrobenzene",-3.448,1,192.001,0,1,1,43.14,-3.2
311 | OC1CCCCCCC1,Cyclooctanol,-2.14,1,128.215,1,1,0,20.23,-1.29
312 | CC1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3CCC21C,17a-Methyltestosterone,-4.073,1,302.4580000000001,1,4,0,37.3,-3.999
313 | CCOc1ccc(NC(N)=O)cc1,Dulcin,-2.167,1,180.207,2,1,3,64.35,-2.17
314 | C/C1CCC(\C)CC1,"trans-1,4-Dimethylcyclohexane",-3.305,1,112.216,0,1,0,0.0,-4.47
315 | c1cnc2c(c1)ccc3ncccc23,"1,7-phenantroline",-2.994,2,180.21,0,3,0,25.78,-2.68
316 | COC(C)(C)C,Methyl t-butyl ether ,-0.984,1,88.14999999999999,0,0,0,9.23,-0.24
317 | COc1ccc(C=CC)cc1,Anethole,-3.254,1,148.20499999999998,0,1,2,9.23,-3.13
318 | CCCCCCCCCCCCCCCCO,1-Hexadecanol,-4.94,1,242.44699999999992,1,0,14,20.23,-7.0
319 | O=c1cc[nH]c(=O)[nH]1 ,uracil,-0.441,1,112.088,2,1,0,65.72,-1.4880000000000002
320 | Nc1ncnc2nc[nH]c12 ,adenine,-1.255,1,135.13,2,2,0,80.47999999999999,-2.12
321 | Clc1cc(Cl)c(cc1Cl)c2cccc(Cl)c2Cl ,"2,2',3,4,5-PCB",-6.709,1,326.437,0,2,1,0.0,-7.21
322 | COc1ccc(cc1)C(O)(C2CC2)c3cncnc3 ,Ancymidol,-2.181,1,256.30499999999995,1,3,4,55.24,-2.596
323 | c1ccc2c(c1)c3cccc4c3c2cc5ccccc54,Benzo(b)fluoranthene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.23
324 | O=C(Nc1ccccc1)Nc2ccccc2,Carbanilide,-3.611,1,212.252,2,2,2,41.13,-3.15
325 | CCC1(C(=O)NC(=O)NC1=O)c2ccccc2 ,phenobarbital,-2.272,1,232.239,2,2,2,75.27000000000001,-2.322
326 | Clc1ccc(cc1)c2cccc(Cl)c2Cl ,"2',3,4-PCB",-5.686,1,257.547,0,2,1,0.0,-6.29
327 | CC(C)c1ccc(NC(=O)N(C)C)cc1,Isoproturon,-2.867,1,206.289,1,1,2,32.34,-3.536
328 | CCN(CC)C(=O)CSc1ccc(Cl)nn1,Azintamide,-2.231,1,259.762,0,1,5,46.09,-1.716
329 | CCC(C)(C)CO,"2,2-Dimethyl-1-butanol",-1.365,1,102.17699999999998,1,0,2,20.23,-1.04
330 | CCCOC(=O)CCC,Ethyl pentanoate,-1.899,1,130.18699999999998,0,0,4,26.3,-1.75
331 | Cc1c(cc(cc1N(=O)=O)N(=O)=O)N(=O)=O,"2,4,6-Trinitrotoluene",-2.6060000000000003,1,227.132,0,1,3,129.42000000000002,-3.22
332 | CC(C)OP(=S)(OC(C)C)SCCNS(=O)(=O)c1ccccc1,Bensulide,-4.99,1,397.52400000000006,1,1,10,64.63,-4.2
333 | C1CCCCCC1,Cycloheptane,-2.9160000000000004,2,98.189,0,1,0,0.0,-3.51
334 | CCCOC=O,Propyl formate,-0.757,1,88.10599999999998,0,0,3,26.3,-0.49
335 | CC(C)c1ccccc1C,2-Isopropyltoluene,-3.585,1,134.22199999999995,0,1,1,0.0,-3.76
336 | Nc1cccc(Cl)c1,m-Chloroaniline,-2.392,1,127.574,1,1,0,26.02,-1.37
337 | CC(C)CC(C)C,"2,4-Dimethylpentane",-2.938,1,100.20499999999998,0,0,2,0.0,-4.26
338 | o1c2ccccc2c3ccccc13,Dibenzofurane,-4.2010000000000005,2,168.195,0,3,0,13.14,-4.6
339 | CCOC2Oc1ccc(OS(C)(=O)=O)cc1C2(C)C,ethofumesate,-3.184,1,286.34900000000005,0,2,4,61.830000000000005,-3.42
340 | CN(C)C(=O)Nc1cccc(c1)C(F)(F)F,Fluometuron,-3.065,1,232.205,1,1,1,32.34,-3.43
341 | c3ccc2nc1ccccc1cc2c3,Acridine,-3.846,2,179.22199999999998,0,3,0,12.89,-3.67
342 | CC12CC(=O)C3C(CCC4=CC(=O)CCC34C)C2CCC1(O)C(=O)CO,Cortisone,-2.893,1,360.45000000000016,2,4,2,91.67,-3.11
343 | OCC1OC(O)C(O)C(O)C1O,glucose,0.501,1,180.156,5,1,1,110.38,0.74
344 | Cc1cccc(O)c1,3-Methylphenol,-2.313,1,108.14,1,1,0,20.23,-0.68
345 | CC2Cc1ccccc1N2NC(=O)c3ccc(Cl)c(c3)S(N)(=O)=O ,Indapamide,-4.345,1,365.842,2,3,3,92.5,-3.5860000000000003
346 | CCC(C)C(=O)OC2CC(C)C=C3C=CC(C)C(CCC1CC(O)CC(=O)O1)C23 ,Lovastatin,-4.731,1,404.54700000000025,1,3,6,72.83,-6.005
347 | O=N(=O)c1ccc(cc1)N(=O)=O,"1,4-Dinitrobenzene",-2.281,1,168.10799999999995,0,1,2,86.28,-3.39
348 | CCC1(C(=O)NC(=O)NC1=O)C2=CCC3CCC2C3,Reposal,-2.781,1,262.309,2,3,2,75.27000000000001,-2.696
349 | CCCCCCCCCC(=O)OCC,Ethyl decanoate,-3.671,1,200.322,0,0,9,26.3,-4.1
350 | CN(C)C(=O)Nc1ccccc1,Fenuron,-1.847,1,164.208,1,1,1,32.34,-1.6
351 | CCCOCC,Ethyl propyl ether,-1.072,1,88.14999999999999,0,0,3,9.23,-0.66
352 | CC(C)O,2-Propanol,-0.261,1,60.096,1,0,0,20.23,0.43
353 | Cc1ccc2ccccc2c1,2-Methylnapthalene,-3.802,1,142.201,0,2,0,0.0,-3.77
354 | ClC(Br)Br,Chlorodibromethane,-2.54,1,208.28,0,0,0,0.0,-1.9
355 | CCC(C(CC)c1ccc(O)cc1)c2ccc(O)cc2,Hexestrol,-4.854,1,270.372,2,2,5,40.46,-4.43
356 | CCOC(=O)CC(SP(=S)(OC)OC)C(=O)OCC,Malathion,-3.391,1,330.3640000000001,0,0,9,71.06,-3.37
357 | ClCc1ccccc1,Benzylchloride,-2.887,1,126.58599999999996,0,1,1,0.0,-2.39
358 | C/C=C/C=O,t-Crotonaldehyde,-0.604,1,70.09100000000001,0,0,1,17.07,0.32
359 | CON(C)C(=O)Nc1ccc(Br)c(Cl)c1,Chlorbromuron,-3.938,1,293.548,1,1,2,41.57,-3.924
360 | Cc1c2ccccc2c(C)c3ccccc13,"9,10-Dimethylanthracene",-5.228,1,206.288,0,3,0,0.0,-6.57
361 | CCCCCC(=O)OC,Methyl hexanoate,-1.899,1,130.18699999999998,0,0,4,26.3,-1.87
362 | CN(C)C(=O)Nc1ccc(c(Cl)c1)n2nc(oc2=O)C(C)(C)C,Dimefuron,-3.831,1,338.79500000000013,1,2,2,80.37,-4.328
363 | CC(=O)Nc1ccc(F)cc1,p-Fluoroacetanilide,-2.181,1,153.156,1,1,1,29.1,-1.78
364 | CCc1cccc(CC)c1N(COC)C(=O)CCl ,alachlor,-3.319,1,269.77199999999993,0,1,6,29.54,-3.26
365 | C1CCC=CC1,Cyclohexene,-2.16,2,82.146,0,1,0,0.0,-2.59
366 | CC12CC(O)C3C(CCC4=CC(=O)CCC34C)C2CCC1(O)C(=O)CO,Hydrocortisone ,-3.159,1,362.4660000000002,3,4,2,94.83,-3.09
367 | c1cncnc1,Pyrimidine,-0.884,2,80.08999999999999,0,1,0,25.78,1.1
368 | Clc1ccc(cc1)N(=O)=O,p-Chloronitrobenzene,-2.901,1,157.55599999999998,0,1,1,43.14,-2.92
369 | CCC(=O)OC,Methyl propionate,-0.836,1,88.106,0,0,1,26.3,-0.14
370 | Clc1ccccc1N(=O)=O,o-Chloronitrobenzene,-2.775,1,157.55599999999998,0,1,1,43.14,-2.55
371 | CCCCN(C)C(=O)Nc1ccc(Cl)c(Cl)c1,Neburon,-4.157,1,275.179,1,1,4,32.34,-4.77
372 | CN1CC(O)N(C1=O)c2nnc(s2)C(C)(C)C,Buthidazole,-2.398,1,256.331,1,2,1,69.56,-1.877
373 | O=N(=O)c1ccccc1,Nitrobenzene,-2.2880000000000003,1,123.11099999999996,0,1,1,43.14,-1.8
374 | Ic1ccccc1,Iodobenzene,-3.8,1,204.01,0,1,0,0.0,-3.01
375 | CC2Nc1cc(Cl)c(cc1C(=O)N2c3ccccc3C)S(N)(=O)=O ,Metolazone,-3.777,1,365.8420000000001,2,3,2,92.5,-3.78
376 | COc1ccccc1OCC(O)COC(N)=O,Methocarbamol,-1.4280000000000002,1,241.243,2,1,6,91.01,-0.985
377 | CCCCOCN(C(=O)CCl)c1c(CC)cccc1CC ,butachlor,-4.347,1,311.85300000000007,0,1,9,29.54,-4.19
378 | Oc1cccc(Cl)c1Cl,"2,3-Dichlorophenol",-3.144,1,163.003,1,1,0,20.23,-1.3
379 | CCCC(=O)OC,Propyl butyrate,-1.1909999999999998,1,102.13299999999998,0,0,2,26.3,-1.92
380 | CCC(=O)Nc1ccc(Cl)c(Cl)c1,Propanil,-3.644,1,218.083,1,1,2,29.1,-3.0
381 | Nc3nc(N)c2nc(c1ccccc1)c(N)nc2n3,Triamterene,-3.051,1,253.26900000000003,3,3,1,129.62,-2.404
382 | CCCCCC(=O)OCC,Ethyl hexanoate,-2.254,1,144.21399999999997,0,0,5,26.3,-2.35
383 | OCC(O)C2OC1OC(OC1C2O)C(Cl)(Cl)Cl ,chloralose,-1.887,1,309.529,3,2,2,88.38000000000001,-1.84
384 | CN(C=Nc1ccc(C)cc1C)C=Nc2ccc(C)cc2C,Amitraz,-5.533,1,293.41400000000004,0,2,4,27.96,-5.47
385 | COc1nc(NC(C)C)nc(NC(C)C)n1,Prometon,-3.448,1,225.296,2,1,5,71.96000000000001,-2.478
386 | CCCCCCC=C,1-Octene ,-3.073,1,112.216,0,0,5,0.0,-4.44
387 | Cc1ccc(N)cc1,p-Methylaniline ,-1.954,1,107.156,1,1,0,26.02,-1.21
388 | Nc1nccs1 ,aminothiazole,-1.226,1,100.146,1,1,0,38.91,-0.36
389 | c1ccccc1(OC(=O)NC),Metolcarb,-1.947,1,151.165,1,1,1,38.33,-1.803
390 | CCCC(O)CC,3-Hexanol,-1.324,1,102.177,1,0,3,20.23,-0.8
391 | c3ccc2c(O)c1ccccc1cc2c3 ,9-anthrol,-4.148,1,194.233,1,3,0,20.23,-4.73
392 | Cc1ccc2cc3ccccc3cc2c1,2-Methylanthracene,-4.87,1,192.261,0,3,0,0.0,-6.96
393 | Cc1cccc(C)c1C,"1,2,3-Trimethylbenzene ",-3.312,1,120.195,0,1,0,0.0,-3.2
394 | CNC(=O)Oc1ccc(N(C)C)c(C)c1,Aminocarb,-2.677,1,208.261,1,1,2,41.57,-2.36
395 | CCCCCCCC(C)O,2-Nonanol,-2.387,1,144.258,1,0,6,20.23,-2.74
396 | CN(C(=O)NC(C)(C)c1ccccc1)c2ccccc2,Methyldymron,-3.863,1,268.36,1,2,3,32.34,-3.35
397 | CCCC(=O)CC,3-Hexanone,-1.266,1,100.161,0,0,3,17.07,-0.83
398 | Oc1c(Br)cc(C#N)cc1Br ,bromoxynil,-3.793,1,276.915,1,1,0,44.02,-3.33
399 | Clc1ccc(cc1Cl)c2ccccc2 ,"3,4-PCB",-5.223,1,223.102,0,2,1,0.0,-6.39
400 | CN(C(=O)COc1nc2ccccc2s1)c3ccccc3,Mefenacet,-4.504,1,298.367,0,3,4,42.43000000000001,-4.873
401 | Oc1cccc2ncccc12 ,5-hydroxyquinoline,-2.725,1,145.161,1,2,0,33.120000000000005,-2.54
402 | CC1=C(SCCO1)C(=O)Nc2ccccc2,Carboxin,-2.927,1,235.308,1,2,2,38.33,-3.14
403 | CCOc2ccc1nc(sc1c2)S(N)(=O)=O ,Ethoxyzolamide,-3.085,1,258.324,1,2,3,82.28,-3.81
404 | Oc1c(Cl)c(Cl)c(Cl)c(Cl)c1Cl,Pentachlorophenol,-4.835,1,266.338,1,1,0,20.23,-4.28
405 | ClCBr,Bromochloromethane,-1.519,1,129.384,0,0,0,0.0,-0.89
406 | CCC1(CC)C(=O)NC(=O)N(C)C1=O ,metharbital,-1.658,1,198.222,1,1,2,66.48,-2.23
407 | CC(=O)OCC(=O)C3CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,deoxycorticosterone acetate,-4.472,1,372.5050000000002,0,4,3,60.440000000000005,-4.63
408 | NC(=O)NCc1ccccc1 ,benzylurea,-1.509,1,150.18099999999998,2,1,2,55.120000000000005,-0.95
409 | CN(C)C(=O)Nc1ccc(C)c(Cl)c1,Chlortoluron,-3.048,1,212.68,1,1,1,32.34,-3.483
410 | CON(C)C(=O)Nc1ccc(Cl)c(Cl)c1,Linuron,-3.5810000000000004,1,249.097,1,1,2,41.57,-3.592
411 | OC1CCCCCC1,Cycloheptanol,-1.7,1,114.188,1,1,0,20.23,-0.88
412 | CS(=O)(=O)c1ccc(cc1)C(O)C(CO)NC(=O)C(Cl)Cl ,Thiamphenicol,-1.936,1,356.2270000000001,3,1,6,103.70000000000002,-2.154
413 | CCCC(C)C1(CC)C(=O)NC(=S)NC1=O ,thiopental,-2.96,1,242.344,2,1,4,58.2,-3.36
414 | CC(=O)Nc1nnc(s1)S(N)(=O)=O ,acetazolamide,-0.7929999999999999,1,222.251,2,1,2,115.04,-2.36
415 | Oc1ccc(cc1)N(=O)=O,p-Nitrophenol,-2.318,1,139.11,1,1,1,63.37,-0.74
416 | ClC1=C(Cl)C2(Cl)C3C4CC(C=C4)C3C1(Cl)C2(Cl)Cl,Aldrin,-5.511,1,364.914,0,4,0,0.0,-6.307
417 | C1CCOC1,Tetrahydrofurane ,-0.62,2,72.107,0,1,0,9.23,0.49
418 | Nc1ccccc1N(=O)=O,o-Nitroaniline,-2.277,1,138.126,1,1,1,69.16,-1.96
419 | Clc1cccc(c1Cl)c2cccc(Cl)c2Cl,"2,2',3,3'-PCB",-6.079,1,291.992,0,2,1,0.0,-7.28
420 | CCCCC1C(=O)N(N(C1=O)c2ccccc2)c3ccccc3 ,phenylbutazone,-4.0760000000000005,1,308.38100000000003,0,3,5,40.620000000000005,-3.81
421 | Cc1c(cccc1N(=O)=O)N(=O)=O,"2,6-Dinitrotoluene",-2.553,1,182.135,0,1,2,86.28,-3.0
422 | CC(=O)C1CCC2C3CCC4=CC(=O)CCC4(C)C3CCC12C,Progesterone,-4.17,1,314.46900000000005,0,4,1,34.14,-4.42
423 | CCN(CC)c1nc(Cl)nc(n1)N(CC)CC,Chlorazine,-3.663,1,257.76899999999995,0,1,6,45.150000000000006,-4.4110000000000005
424 | ClC(Cl)C(Cl)(Cl)SN2C(=O)C1CC=CCC1C2=O ,captafol,-4.365,1,349.06600000000014,0,2,3,37.38,-5.4
425 | c1(Br)c(Br)cc(Br)cc1,"1,2,4-tribromobenzene",-5.144,1,314.802,0,1,0,0.0,-4.5
426 | OC3N=C(c1ccccc1)c2cc(Cl)ccc2NC3=O ,Oxazepam,-3.517,1,286.718,2,3,1,61.690000000000005,-3.952
427 | O=C1NC(=O)NC(=O)C1(C(C)CCC)CC=C,Secobarbital,-2.415,1,238.28699999999995,2,1,5,75.27000000000001,-2.356
428 | c1(O)c(C)ccc(C(C)C)c1,Carvacrol,-3.224,1,150.22099999999998,1,1,1,20.23,-2.08
429 | C1SC(=S)NC1(=O),rhodanine,-0.396,1,133.197,1,1,0,29.1,-1.77
430 | Oc1ccc(c(O)c1)c3oc2cc(O)cc(O)c2c(=O)c3O ,Morin,-2.7310000000000003,1,302.23800000000006,5,3,1,131.36,-3.083
431 | ClC1(C(=O)C2(Cl)C3(Cl)C14Cl)C5(Cl)C2(Cl)C3(Cl)C(Cl)(Cl)C45Cl,Kepone,-5.112,1,490.6390000000001,0,6,0,17.07,-5.259
432 | CCN(CC)C(=S)SSC(=S)N(CC)CC,Disulfiram,-3.862,1,296.5520000000001,0,0,4,6.48,-4.86
433 | C1CCCCC1,Cyclohexane,-2.477,2,84.162,0,1,0,0.0,-3.1
434 | ClC1=C(Cl)C(Cl)(C(=C1Cl)Cl)C2(Cl)C(=C(Cl)C(=C2Cl)Cl)Cl,Dienochlor,-7.848,1,474.64,0,2,1,0.0,-7.278
435 | CN(C)C=Nc1ccc(Cl)cc1C,chlordimeform,-3.164,1,196.681,0,1,2,15.6,-2.86
436 | CC34CCc1c(ccc2cc(O)ccc12)C3CCC4=O,Equilenin,-3.927,1,266.34,1,4,0,37.3,-5.24
437 | CCCCCCCCO,1-Octanol,-2.105,1,130.23100000000002,1,0,6,20.23,-2.39
438 | CCSCC,Diethyl sulfide,-1.598,1,90.191,0,0,2,0.0,-1.34
439 | ClCCCl,"1,2-Dichloroethane",-1.374,1,98.96,0,0,1,0.0,-1.06
440 | CCC(C)(C)Cl,2-Chloro-2-methylbutane,-2.278,1,106.596,0,0,1,0.0,-2.51
441 | ClCCBr,1-Chloro-2-bromoethane,-1.7380000000000002,1,143.411,0,0,1,0.0,-1.32
442 | Nc1ccc(cc1)N(=O)=O,p-Nitroaniline,-1.936,1,138.126,1,1,1,69.16,-2.37
443 | OCC1OC(OC2C(O)C(O)C(O)OC2CO)C(O)C(O)C1O,Lactose,1.071,1,342.297,8,2,4,189.53,-0.244
444 | CCN2c1ncccc1N(CC)C(=O)c3cccnc23 ,RTI 2,-3.125,1,268.32,0,3,2,49.330000000000005,-2.86
445 | Clc1ccccc1,Chlorobenzene,-2.975,1,112.55899999999995,0,1,0,0.0,-2.38
446 | CCCCCCCC=C,1-Nonene ,-3.427,1,126.243,0,0,6,0.0,-5.05
447 | Brc1ccc(I)cc1,p-Bromoiodobenzene,-4.754,1,282.90599999999995,0,1,0,0.0,-4.56
448 | CCC(C)(O)CC,3-Methyl-3-pentanol,-1.308,1,102.17699999999998,1,0,2,20.23,-0.36
449 | CCCCCc1ccccc1,Pentylbenzene,-3.899,1,148.249,0,1,4,0.0,-4.64
450 | NC(=O)NC1NC(=O)NC1=O ,allantoin,0.652,1,158.117,4,1,1,113.32,-1.6
451 | OCC(O)COC(=O)c1ccccc1Nc2ccnc3cc(Cl)ccc23,Glafenine,-5.052,1,372.80800000000016,3,3,6,91.68,-4.571000000000001
452 | ClC(Cl)C(c1ccc(Cl)cc1)c2ccc(Cl)cc2,DDD,-6.007999999999999,1,320.04600000000005,0,2,3,0.0,-7.2
453 | CC(=O)OC3CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,testosterone acetate,-4.449,1,330.4680000000001,0,4,1,43.370000000000005,-5.184
454 | Clc1cccc2ccccc12,1-Chloronapthalene,-4.063,1,162.61899999999997,0,2,0,0.0,-3.93
455 | CCN2c1ccccc1N(C)C(=O)c3ccccc23 ,RTI 19,-4.007,1,252.31699999999995,0,3,1,23.55,-4.749
456 | CCCCC(C)O,2-Hexanol,-1.324,1,102.17699999999998,1,0,3,20.23,-0.89
457 | CCCC1CCCC1,Propylcyclopentane,-3.16,1,112.216,0,1,2,0.0,-4.74
458 | CCOC(=O)c1cncn1C(C)c2ccccc2,Etomidate,-3.359,1,244.294,0,2,4,44.12,-4.735
459 | Oc1ccc(Cl)c(Cl)c1,"3,4-Dichlorophenol",-3.352,1,163.003,1,1,0,20.23,-1.25
460 | CC1(C)C(C=C(Cl)Cl)C1C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Cypermethrin,-6.775,1,416.30400000000014,0,3,6,59.32000000000001,-8.017000000000001
461 | c2ccc1ocnc1c2,Benzoxazole,-2.214,2,119.12299999999998,0,2,0,26.03,-1.16
462 | CCCCCO,1-Pentanol,-1.042,1,88.14999999999999,1,0,3,20.23,-0.6
463 | CCN(CC)c1ccccc1,"N,N-Diethylaniline",-3.16,1,149.237,0,1,3,3.24,-3.03
464 | Fc1cccc(F)c1,"1,3-Difluorobenzene",-2.636,1,114.094,0,1,0,0.0,-2.0
465 | ClCCC#N ,3-chloropropionitrile,-0.522,1,89.525,0,0,1,23.79,-0.29
466 | CC(C)(C)Cc1ccccc1,t-Pentylbenzene,-3.867,1,148.249,0,1,1,0.0,-4.15
467 | O=C1NC(=O)NC(=O)C1(CC)c1ccccc1,5-Ethyl-5-phenylbarbital,-2.272,1,232.239,2,2,2,75.27000000000001,-2.322
468 | Clc1ccccc1I,o-Chloroiodobenzene,-4.384,1,238.455,0,1,0,0.0,-3.54
469 | c2ccc1[nH]nnc1c2,Benzotriazole,-2.21,2,119.127,1,2,0,41.57,-0.78
470 | CNC(=O)Oc1cccc2CC(C)(C)Oc12,Carbofuran,-3.05,1,221.256,1,2,1,47.56,-2.8
471 | Cc1cccc(C)c1O,"2,6-Dimethylphenol",-2.589,1,122.167,1,1,0,20.23,-1.29
472 | CC(C)C(C)O,3-Methyl-2-butanol,-0.954,1,88.14999999999999,1,0,1,20.23,-0.18
473 | c1ccccc1C(O)c2ccccc2,benzhydrol,-3.033,1,184.238,1,2,2,20.23,-2.55
474 | CCCCCCCCCC(=O)OC,Methyl decanoate,-3.3160000000000003,1,186.295,0,0,8,26.3,-4.69
475 | COP(=S)(OC)Oc1ccc(cc1Cl)N(=O)=O,Dicapthon,-4.188,1,297.656,0,1,5,70.83000000000001,-4.31
476 | CC(C)CBr,1-Bromo-2-methylpropane,-2.2880000000000003,1,137.01999999999998,0,0,1,0.0,-2.43
477 | CCI,Iodoethane,-2.066,1,155.966,0,0,0,0.0,-1.6
478 | CN(C)C(=O)Oc1nc(nc(C)c1C)N(C)C,Pirimicarb,-2.34,1,238.291,0,1,2,58.56000000000001,-1.95
479 | CCCCCCBr,1-Bromohexane,-3.012,1,165.074,0,0,4,0.0,-3.81
480 | CCCC(C)C,2-Methylpentane,-2.6,1,86.178,0,0,2,0.0,-3.74
481 | Cc1c(F)c(F)c(COC(=O)C2C(C=C(Cl)C(F)(F)F)C2(C)C)c(F)c1F,Tetrafluthrin,-6.339,1,418.7360000000001,0,2,4,26.3,-7.321000000000001
482 | CCc1cccc(C)c1N(C(C)COC)C(=O)CCl ,Metolachlor,-3.431,1,283.7989999999999,0,1,6,29.54,-2.73
483 | ON=Cc1ccc(o1)N(=O)=O ,nifuroxime,-1.843,1,156.09699999999998,1,1,2,88.87,-2.19
484 | CC(C)C(Nc1ccc(cc1Cl)C(F)(F)F)C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Fluvalinate,-8.057,1,502.9200000000002,1,3,8,71.35,-8.003
485 | Nc1nc[nH]n1,Amitrole,-0.674,1,84.082,2,1,0,67.59,0.522
486 | BrC(Br)Br,Tribromomethane,-2.904,1,252.731,0,0,0,0.0,-1.91
487 | COP(=O)(OC)C(O)C(Cl)(Cl)Cl,Trichlorfon,-1.866,1,257.437,1,0,3,55.760000000000005,-0.22
488 | CCOP(=S)(OCC)SCn1c(=O)oc2cc(Cl)ccc12,Phosalone,-5.024,1,367.8160000000001,0,2,7,53.6,-5.233
489 | OCc1ccccc1,Phenylmethanol,-1.699,1,108.13999999999996,1,1,1,20.23,-0.4
490 | O=c2c(C3CCCc4ccccc43)c(O)c1ccccc1o2 ,Coumatetralyl,-5.194,1,292.33400000000006,1,4,1,50.44,-2.84
491 | Oc1ccc(Br)cc1,4-Bromophenol,-3.132,1,173.009,1,1,0,20.23,-1.09
492 | CC(C)Br,2-Bromopropane,-1.949,1,122.993,0,0,0,0.0,-1.59
493 | CC(C)CC(C)(C)C,"2,2,4-Trimethylpentane",-3.276,1,114.232,0,0,1,0.0,-4.74
494 | O=N(=O)c1cc(cc(c1)N(=O)=O)N(=O)=O,"1,3,5-Trinitrobenzene",-2.324,1,213.105,0,1,3,129.42000000000002,-2.89
495 | CN2C(=O)CN=C(c1ccccc1)c3cc(ccc23)N(=O)=O,Nimetazepam,-3.557,1,295.29800000000006,0,3,2,75.81,-3.796
496 | CCC,Propane,-1.5530000000000002,1,44.097,0,0,0,0.0,-1.94
497 | Nc1cc(nc(N)n1=O)N2CCCCC2 ,Minoxidil,-1.809,1,209.253,2,2,1,95.11,-1.989
498 | Nc2cccc3nc1ccccc1cc23 ,1-aminoacridine,-3.542,1,194.237,1,3,0,38.91,-4.22
499 | c1ccc2cc3c4cccc5cccc(c3cc2c1)c45,Benzo(k)fluoranthene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.49
500 | OC(c1ccc(Cl)cc1)(c2ccc(Cl)cc2)C(Cl)(Cl)Cl,Dicofol,-6.268,1,370.49,1,2,2,20.23,-5.666
501 | C1Cc2cccc3cccc1c23,Acenapthene,-3.792,2,154.21199999999996,0,3,0,0.0,-4.63
502 | CCOP(=S)(OCC)SC(CCl)N2C(=O)c1ccccc1C2=O,Dialifos,-5.026,1,393.85400000000016,0,2,8,55.84,-6.34
503 | Brc1ccc(Br)cc1,"1,4-Dibromobenzene",-4.298,1,235.906,0,1,0,0.0,-4.07
504 | Cn2c(=O)on(c1ccc(Cl)c(Cl)c1)c2=O,Methazole,-3.6010000000000004,1,261.064,0,2,1,57.14,-2.82
505 | Oc1ccc(cc1)c2ccccc2,p-Phenylphenol,-3.701,1,170.211,1,2,1,20.23,-3.48
506 | CC1=C(CCCO1)C(=O)Nc2ccccc2 ,pyracarbolid,-2.83,1,217.26800000000003,1,2,2,38.33,-2.56
507 | CCOC=C,Ethyl vinyl ether,-0.857,1,72.10700000000001,0,0,2,9.23,-0.85
508 | CCC#C,1-Butyne,-1.092,1,54.09199999999999,0,0,0,0.0,-1.24
509 | COc1ncnc2nccnc12 ,4-methoxypteridine,-1.589,1,162.15200000000002,0,2,1,60.790000000000006,-1.11
510 | CCCCC(C)(O)CC,3-Methyl-3-heptanol,-2.017,1,130.23099999999997,1,0,4,20.23,-1.6
511 | Clc1ccc(Cl)cc1,"1,4-Dichlorobenzene",-3.5580000000000003,1,147.00400000000002,0,1,0,0.0,-3.27
512 | O=C1N(COC(=O)C)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Ethanoyloxymethylphenytoin,-2.7230000000000003,1,324.33600000000007,1,3,4,75.71,-4.47
513 | CSCS(=O)CC(CO)NC(=O)C=Cc1c(C)[nH]c(=O)[nH]c1=O,"Sparsomycin (3,8mg/ml)",-1.57,1,361.4450000000001,4,1,8,132.11999999999998,-1.981
514 | Cc1c[nH]c2ccccc12 ,3-methylindole,-2.9810000000000003,1,131.17799999999997,1,2,0,15.79,-2.42
515 | COc2ncc1nccnc1n2,2-methoxypteridine,-1.589,1,162.152,0,2,1,60.790000000000006,-1.11
516 | CNC(=O)Oc1ccccc1C2OCCO2,Dioxacarb,-1.614,1,223.22799999999995,1,2,2,56.790000000000006,-1.57
517 | C1N(C(=O)NCC(C)C)C(=O)NC1,isocarbamid,-1.508,1,185.227,2,1,2,61.440000000000005,-2.15
518 | CC#N,Acetonitrile,0.152,1,41.053,0,0,0,23.79,0.26
519 | CCOC(=O)NCCOc2ccc(Oc1ccccc1)cc2,Fenoxycarb,-4.662,1,301.34200000000004,1,2,7,56.790000000000006,-4.7
520 | CC(=O)N(S(=O)c1ccc(N)cc1)c2onc(C)c2C ,acetyl sulfisoxazole,-2.024,1,293.34800000000007,1,2,3,89.43,-3.59
521 | ClCC(Cl)(Cl)Cl,"1,1,1,2-Tetrachloroethane",-2.794,1,167.85,0,0,0,0.0,-2.18
522 | CCCCO,1-Butanol,-0.688,1,74.12299999999999,1,0,2,20.23,0.0
523 | CC1CCCCC1NC(=O)Nc2ccccc2,Siduron,-3.779,1,232.32700000000003,2,2,2,41.13,-4.11
524 | Clc1cc(Cl)cc(Cl)c1,"1,3,5-Trichlorobenzene",-4.159,1,181.449,0,1,0,0.0,-4.48
525 | O=Cc1ccco1,Furfural,-1.391,1,96.08499999999998,0,1,1,30.21,-0.1
526 | CC(C)CCO,3-Methylbutan-1-ol,-1.027,1,88.14999999999999,1,0,2,20.23,-0.51
527 | O=Cc2ccc1OCOc1c2 ,piperonal,-2.033,1,150.13299999999998,0,2,1,35.53,-1.63
528 | CC(=C)C,2-Methylpropene,-1.5730000000000002,1,56.108,0,0,0,0.0,-2.33
529 | O=Cc1ccccc1,Benzaldehyde,-1.999,1,106.12399999999997,0,1,1,17.07,-1.19
530 | CC(=C)C(=C)C,"2,3-Dimethyl-1,3-Butadiene",-2.052,1,82.146,0,0,1,0.0,-2.4
531 | CCOC(=O)CCN(SN(C)C(=O)Oc1cccc2CC(C)(C)Oc21)C(C)C,Benfuracarb,-5.132999999999999,1,410.5360000000002,0,2,8,68.31,-4.71
532 | O2c1ccccc1N(C)C(=O)c3cccnc23 ,RTI 10,-2.7710000000000004,1,226.235,0,3,0,42.43,-3.672
533 | C1c2ccccc2c3ccccc13,Fluorene ,-4.125,2,166.22299999999998,0,3,0,0.0,-5.0
534 | CC1CCCCC1,Methylcyclohexane ,-2.891,1,98.189,0,1,0,0.0,-3.85
535 | NC(=N)NS(=O)(=O)c1ccc(N)cc1 ,sulfaguanidine,-0.706,1,214.25,4,1,2,122.06,-1.99
536 | COC(=O)c1ccc(O)cc1,Methylparaben,-2.441,1,152.149,1,1,1,46.53,-1.827
537 | CC1CCCO1,2-Methyltetrahydrofurane,-1.034,1,86.134,0,1,0,9.23,0.11
538 | CC3C2CCC1(C)C=CC(=O)C(=C1C2OC3=O)C,Santonin,-2.43,1,246.30599999999995,0,3,0,43.370000000000005,-3.09
539 | OCC2OC(Oc1ccccc1CO)C(O)C(O)C2O,Salicin,-0.975,1,286.28,5,2,4,119.61,-0.85
540 | CCCI,1-Iodopropane,-2.486,1,169.993,0,0,1,0.0,-2.29
541 | CCNc1nc(NC(C)C)nc(SC)n1,Ametryn,-3.43,1,227.337,2,1,5,62.73,-3.04
542 | CCCO,1-Propanol,-0.3339999999999999,1,60.096,1,0,1,20.23,0.62
543 | CC(=O)C1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3CCC21C,Hydroxyprogesterone-17a,-3.876,1,330.4680000000001,1,4,1,54.37,-3.817
544 | CCCC(C)O,2-Pentanol,-0.97,1,88.14999999999999,1,0,2,20.23,-0.29
545 | OC(C(=O)c1ccccc1)c2ccccc2,benzoin,-3.148,1,212.248,1,2,3,37.3,-2.85
546 | Cc1ccc(O)c(C)c1,"2,4-Dimethylphenol",-2.6210000000000004,1,122.167,1,1,0,20.23,-1.19
547 | Clc1cccc(c1)N(=O)=O,m-Chloronitrobenzene ,-2.901,1,157.55599999999998,0,1,1,43.14,-2.77
548 | Cc2c(N)c(=O)n(c1ccccc1)n2C,ampyrone,-1.192,1,203.245,1,2,1,52.95,-0.624
549 | Clc1ccc(c(Cl)c1)c2cc(Cl)ccc2Cl ,"2,2',4,5'-PCB",-6.23,1,291.992,0,2,1,0.0,-6.57
550 | ClC(=C(Cl)C(=C(Cl)Cl)Cl)Cl,"Hexachloro-1,3-butadiene",-4.546,1,260.762,0,0,1,0.0,-4.92
551 | CCNc1nc(NC(C)(C)C)nc(SC)n1,Terbutryn,-3.75,1,241.364,2,1,4,62.73,-4.0
552 | CCC(C)CCO,3-Methyl-2-pentanol,-1.308,1,102.177,1,0,3,20.23,-0.71
553 | Cc2ncc1nccnc1n2,2-methylpteridine,-1.24,1,146.153,0,2,0,51.56,-0.12
554 | CC23Cc1cnoc1C=C2CCC4C3CCC5(C)C4CCC5(O)C#C,Danazol,-4.557,1,337.4630000000001,1,5,0,46.260000000000005,-5.507000000000001
555 | CCCCI,1-Iodobutane,-2.841,1,184.02,0,0,2,0.0,-2.96
556 | Brc1ccc2ccccc2c1,2-Bromonapthalene,-4.434,1,207.07,0,2,0,0.0,-4.4
557 | CC1OC(CC(O)C1O)OC2C(O)CC(OC2C)OC8C(O)CC(OC7CCC3(C)C(CCC4C3CC(O)C5(C)C(CCC45O)C6=CC(=O)OC6)C7)OC8C ,"Digoxin (L1=41,8mg/mL, L2=68,2mg/mL, Z=40,1mg/mL)",-5.312,1,780.9490000000001,6,8,7,203.06,-4.081
558 | FC(F)(F)c1ccccc1,Benzyltrifluoride,-3.099,1,146.111,0,1,0,0.0,-2.51
559 | CCCCCCOC(=O)c1ccccc1C(=O)OCCCCCC,Dihexyl phthalate,-5.757999999999999,1,334.45600000000024,0,1,12,52.60000000000001,-6.144
560 | c1ccc2c(c1)sc3ccccc23,Dibenzothiophene,-4.597,2,184.263,0,3,0,0.0,-4.38
561 | Clc1ccc(c(Cl)c1)c2ccc(Cl)c(Cl)c2Cl ,"2,3',4,4'-PCB",-6.709,1,326.437,0,2,1,0.0,-7.8
562 | Clc1ccc(c(Cl)c1Cl)c2ccc(Cl)c(Cl)c2Cl ,"2,2',3,3',4,4'-PCB",-7.192,1,360.88200000000006,0,2,1,0.0,-8.01
563 | CC(=O)CC(c1ccccc1)c3c(O)c2ccccc2oc3=O ,Warfarin,-3.913,1,308.3330000000001,1,3,4,67.50999999999999,-3.893
564 | c1ccccc1C(O)C(O)c2ccccc2,hydrobenzoin,-2.645,1,214.264,2,2,3,40.46,-1.93
565 | COC(=O)c1ccccc1C(=O)OC,Dimethyl phthalate,-2.347,1,194.18599999999995,0,1,2,52.60000000000001,-1.66
566 | CCCCCCCC(=O)OCC,Ethyl octanoate,-2.962,1,172.26799999999997,0,0,7,26.3,-3.39
567 | CCSSCC,Diethyldisulfide,-2.364,1,122.258,0,0,3,0.0,-2.42
568 | CCOCCOCC,"1,2-Diethoxyethane ",-0.833,1,118.176,0,0,5,18.46,-0.77
569 | Clc1cc(Cl)c(Cl)cc1Cl,"1,2,4,5-Tetrachlorobenzene",-4.621,1,215.894,0,1,0,0.0,-5.56
570 | Nc1ccc(cc1)c2ccc(N)cc2,p-benzidine,-2.613,1,184.242,2,2,1,52.04,-2.7
571 | CCCCCC=C,1-Heptene,-2.718,1,98.189,0,0,4,0.0,-3.73
572 | CCCCc1c(C)nc(NCC)[nH]c1=O,Ethirimol,-2.732,1,209.293,2,1,5,57.78,-3.028
573 | O=C1NC(=O)NC(=O)C1(CC)C(C)CCC,Pentobarbital,-2.312,1,226.27599999999995,2,1,4,75.27000000000001,-2.39
574 | Nc1ccccc1Cl,o-Chloroaniline,-2.392,1,127.574,1,1,0,26.02,-1.52
575 | COc1cccc(Cl)c1,3-Chloroanisole,-3.057,1,142.58499999999998,0,1,1,9.23,-2.78
576 | CCCCN(CC)C(=O)SCCC,Pebulate,-3.131,1,203.351,0,0,6,20.31,-3.53
577 | CCCCOC=O,Butyl acetate,-1.111,1,102.13299999999998,0,0,4,26.3,-1.37
578 | CC12CC(O)C3C(CCC4=CC(=O)C=CC34C)C2CCC1(O)C(=O)CO,Prednisolone,-2.974,1,360.4500000000002,3,4,2,94.83,-3.18
579 | BrC(Cl)Cl,Bromodichloromethane,-2.176,1,163.82899999999998,0,0,0,0.0,-1.54
580 | CC34CC(=O)C1C(CCC2=CC(=O)CCC12C)C3CCC4(=O) ,adrenosterone,-2.99,1,300.3980000000001,0,4,0,51.21,-3.48
581 | c1ccc(cc1)c2ccc(cc2)c3ccccc3,p-terphenyl,-5.7410000000000005,2,230.31,0,3,2,0.0,-7.11
582 | Oc1ccc(C=O)cc1,p-Hydroxybenzaldehyde ,-2.003,1,122.12299999999998,1,1,1,37.3,-0.96
583 | CBr,Bromomethane,-1.109,1,94.939,0,0,0,0.0,-0.79
584 | Cc1cc(ccc1NS(=O)(=O)C(F)(F)F)S(=O)(=O)c2ccccc2,Perfluidone,-4.945,1,379.381,1,2,4,80.31,-3.8
585 | CC(=O)CC(c1ccc(Cl)cc1)c2c(O)c3ccccc3oc2=O,Coumachlor,-4.553999999999999,1,342.7780000000001,1,3,4,67.50999999999999,-5.839
586 | CCc1ccc2ccccc2c1,2-Ethylnaphthalene,-4.1,1,156.22799999999998,0,2,1,0.0,-4.29
587 | Nc1c(C)c[nH]c(=O)n1 ,5-methylcytosine,-0.257,1,125.131,2,1,0,71.77000000000001,-1.4580000000000002
588 | Clc2c(Cl)c(Cl)c(c1ccccc1)c(Cl)c2Cl ,"2,3,4,5,6-PCB",-6.785,1,326.437,0,2,1,0.0,-7.92
589 | c1c(NC(=O)c2ccccc2(I))cccc1,benodanil,-4.245,1,323.133,1,2,2,29.1,-4.21
590 | Cc3cc2nc1c(=O)[nH]c(=O)nc1n(CC(O)C(O)C(O)CO)c2cc3C,Riboflavin,-1.865,1,376.36900000000014,5,3,5,161.56,-3.685
591 | Fc1ccccc1Br,o-Fluorobromobenzene,-3.467,1,175.0,0,1,0,0.0,-2.7
592 | Oc1ccc(Cl)cc1Cl,"2,4-Dichlorophenol ",-3.22,1,163.003,1,1,0,20.23,-1.55
593 | CC1(C)C(C=C(Cl)Cl)C1C(=O)OCc2cccc(Oc3ccccc3)c2,Permethrin,-7.129,1,391.2940000000001,0,3,6,35.53,-6.291
594 | CN2C(=C(O)c1ccccc1S2(=O)=O)C(=O)Nc3ccccn3 ,piroxicam,-3.4730000000000003,1,331.353,2,3,2,99.6,-4.16
595 | O=C1N(COC(=O)CC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Propanoyloxymethylphenytoin,-3.128,1,338.36300000000006,1,3,5,75.71,-4.907
596 | C1CCCC1,Cyclopentane ,-2.0380000000000003,2,70.135,0,1,0,0.0,-2.64
597 | Cc1ccccc1N,o-Toluidine,-1.922,1,107.156,1,1,0,26.02,-2.21
598 | c1(OC)ccc(CC=C)cc1,Estragole,-3.074,1,148.205,0,1,3,9.23,-2.92
599 | CN(C)C(=O)Nc1cccc(OC(=O)NC(C)(C)C)c1 ,karbutilate,-2.655,1,279.34,2,1,2,70.67,-2.93
600 | CC(C)C=C,3-Methyl-1-Butene,-1.994,1,70.135,0,0,1,0.0,-2.73
601 | Oc1ccccn1,2-Hydroxypyridine,-1.655,1,95.101,1,1,0,33.120000000000005,1.02
602 | CC,Ethane,-1.132,1,30.07,0,0,0,0.0,-1.36
603 | Clc1ccccc1Cl,"1,2-Dichlorobenzene",-3.482,1,147.00399999999996,0,1,0,0.0,-3.05
604 | Sc2nc1ccccc1s2 ,mercaptobenzothiazole,-3.411,1,167.25799999999998,1,2,0,12.89,-3.18
605 | Clc1c(Cl)c(Cl)c(c(Cl)c1Cl)c2c(Cl)c(Cl)c(Cl)c(Cl)c2Cl ,"2,2',3,3',4,4',5,5',6,6'-PCB",-9.589,1,498.66200000000026,0,2,1,0.0,-11.6
606 | COc2c1occc1cc3ccc(=O)oc23 ,Methoxsalen,-3.25,1,216.19199999999995,0,3,1,52.58,-3.664
607 | CC(=O)N,Acetamide,0.494,1,59.068,1,0,0,43.09,1.58
608 | Cc1cccc2ccccc12,1-Methylnaphthalene,-3.802,1,142.201,0,2,0,0.0,-3.7
609 | CCN(CC)C(=O)C(C)Oc1cccc2ccccc12 ,Napropamide,-4.088,1,271.36,0,2,5,29.540000000000003,-3.57
610 | CC(O)C(C)(C)C,"3,3-Dimethyl-2-butanol",-1.2919999999999998,1,102.177,1,0,0,20.23,-0.62
611 | CCCC(=O)OCC,Methyl pentanoate,-1.545,1,116.15999999999998,0,0,3,26.3,-1.36
612 | CC2=CC(=O)c1ccccc1C2=O ,Menadione,-2.667,1,172.18299999999996,0,2,0,34.14,-3.03
613 | c1ccc2c(c1)ccc3ccccc32,Phenanthrene,-4.518,2,178.23399999999998,0,3,0,0.0,-5.26
614 | Cc1ccnc(C)c1,"2,4-Dimethylpyridine",-2.0980000000000003,1,107.156,0,1,0,12.89,0.38
615 | CCCCCCCCCO,1-Nonanol,-2.46,1,144.258,1,0,7,20.23,-3.01
616 | BrCBr,Dibromomethane,-1.883,1,173.83499999999998,0,0,0,0.0,-1.17
617 | CC1CC2C3CCC4=CC(=O)C=CC4(C)C3(F)C(O)CC2(C)C1(O)C(=O)CO,Dexamethasone,-3.4,1,392.4670000000002,3,4,2,94.83,-3.59
618 | Cc1ccc2cc(C)ccc2c1,"2,6-Dimethylnaphthalene ",-4.147,1,156.228,0,2,0,0.0,-4.89
619 | CCSC(=O)N(CC(C)C)CC(C)C,Butylate,-3.4530000000000003,1,217.378,0,0,5,20.31,-3.68
620 | O=N(=O)OCC(CON(=O)=O)ON(=O)=O ,nitroglycerin,-2.029,1,227.085,0,0,8,157.11,-2.22
621 | Nc1cccc(c1)N(=O)=O,m-Nitroaniline,-1.936,1,138.126,1,1,1,69.16,-2.19
622 | CCCCCl,1-Chlorobutane,-1.94,1,92.569,0,0,2,0.0,-2.03
623 | ClC(Cl)(Cl)C(NC=O)N1C=CN(C=C1)C(NC=O)C(Cl)(Cl)Cl ,triforine,-3.715,1,430.9340000000001,2,1,6,64.68,-4.19
624 | Cn2cc(c1ccccc1)c(=O)c(c2)c3cccc(c3)C(F)(F)F,Fluridone,-4.249,1,329.321,0,3,2,22.0,-4.445
625 | Nc3cc2c1ccccc1ccc2c4ccccc34 ,6-aminochrysene,-4.849,1,243.309,1,4,0,26.02,-6.2
626 | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1=O,Estrone,-3.872,1,270.372,1,4,0,37.3,-3.955
627 | CCN2c1ccccc1N(C)C(=S)c3cccnc23 ,RTI 17,-4.227,1,269.373,0,3,1,19.37,-4.706
628 | CC1CO1,"1,2-Propylene oxide",-0.358,1,58.08,0,1,0,12.53,-0.59
629 | O=C3CN=C(c1ccccc1)c2cc(ccc2N3)N(=O)=O,Nitrazepam,-3.4730000000000003,1,281.271,1,3,2,84.6,-3.796
630 | CCNC(=S)NCC,"1,3-diethylthiourea",-1.028,1,132.232,2,0,2,24.06,-1.46
631 | Oc1cc(Cl)cc(Cl)c1Cl,"2,3,5-Trichlorophenol",-3.78,1,197.448,1,1,0,20.23,-2.67
632 | CCCCC(=O)OC,Propyl propanoate,-1.545,1,116.15999999999998,0,0,3,26.3,-1.34
633 | Nc1ccccc1,Aniline ,-1.632,1,93.129,1,1,0,26.02,-0.41
634 | Cc1cccc2c(C)cccc12,"1,5-Dimethlnapthalene",-4.147,1,156.228,0,2,0,0.0,-4.678999999999999
635 | NS(=O)(=O)c2cc1c(NCNS1(=O)=O)cc2Cl ,hydrochlorothiazide,-1.72,1,297.745,3,2,1,118.36,-2.63
636 | C1=Cc2cccc3cccc1c23,Acenapthylene,-3.682,2,152.19599999999994,0,3,0,0.0,-3.96
637 | CCCCCOC(=O)CC,Ethyl butyrate,-2.254,1,144.21399999999997,0,0,5,26.3,-1.28
638 | CCNc1nc(NC(C)C)nc(OC)n1,Atratone,-3.185,1,211.269,2,1,5,71.96000000000001,-2.084
639 | c1ccc2c(c1)cc3ccc4cccc5ccc2c3c45,Benzo(a)pyrene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.699
640 | CCBr,Bromoethane,-1.529,1,108.966,0,0,0,0.0,-1.09
641 | CCC#CCC,3-Hexyne,-1.933,1,82.14599999999999,0,0,0,0.0,-1.99
642 | CC1OC(CC(O)C1O)OC2C(O)CC(OC2C)OC8C(O)CC(OC7CCC3(C)C(CCC4C3CCC5(C)C(CCC45O)C6=CC(=O)OC6)C7)OC8C ,Digitoxin,-6.114,1,764.9499999999999,5,8,7,182.83,-5.292999999999999
643 | CCC(=C)C,2-Methyl-1-Butene,-1.994,1,70.13499999999999,0,0,1,0.0,-2.73
644 | Oc1cccc2cccnc12 ,8-quinolinol,-2.725,1,145.16099999999997,1,2,0,33.120000000000005,-2.42
645 | C1CCc2ccccc2C1,"1,2,3,4-Tetrahydronapthalene",-3.447,2,132.20599999999996,0,2,0,0.0,-4.37
646 | Oc1ccc(cc1)C2(OC(=O)c3ccccc23)c4ccc(O)cc4 ,phenolphthalein,-4.59,1,318.32800000000003,2,4,2,66.76,-2.9
647 | Brc1cc(Br)cc(Br)c1,"1,3,5-Tribromobenzene",-5.27,1,314.802,0,1,0,0.0,-5.6
648 | COP(=S)(OC)Oc1cc(Cl)c(Cl)cc1Cl,Ronnel,-5.247000000000001,1,321.549,0,1,4,27.69,-5.72
649 | Cc1cc(=O)[nH]c(=S)[nH]1,methylthiouracil,-0.547,1,142.18300000000002,2,1,0,48.65,-2.436
650 | COc1cc(CC=C)ccc1O,Eugenol,-2.675,1,164.204,1,1,3,29.46,-1.56
651 | O=C1NC(=O)NC(=O)C1(C(C)C)CC=C,5-Allyl-5-isopropylbarbital,-1.706,1,210.233,2,1,3,75.27000000000001,-1.7080000000000002
652 | c1cc2ccc3cccc4ccc(c1)c2c34,Pyrene,-4.957,2,202.256,0,4,0,0.0,-6.176
653 | CCOC(C)OCC,"1,1-Diethoxyethane ",-0.899,1,118.176,0,0,4,18.46,-0.43
654 | CC1(C)CON(Cc2ccccc2Cl)C1=O,Clomazone,-3.077,1,239.702,0,2,2,29.54,-2.338
655 | CCCCOCCO,2-Butoxyethanol,-0.775,1,118.176,1,0,5,29.46,-0.42
656 | Clc1c(Cl)c(Cl)c(N(=O)=O)c(Cl)c1Cl,Quintozene,-5.098,1,295.336,0,1,1,43.14,-5.82
657 | CC12CCC(O)CC1CCC3C2CCC4(C)C3CCC4=O,Androsterone,-3.882,1,290.447,1,4,0,37.3,-4.402
658 | FC(F)(F)c1cccc(c1)N2CC(CCl)C(Cl)C2=O,Flurochloridone,-4.749,1,312.118,0,2,2,20.31,-4.047
659 | c1ccc2ncccc2c1,Quinoline,-2.6630000000000003,2,129.16199999999998,0,2,0,12.89,-1.3
660 | COC(=O)c1cc(O)c(O)c(O)c1 ,methyl gallate,-1.913,1,184.147,3,1,1,86.99000000000001,-1.24
661 | OC(Cn1cncn1)(Cn2cncn2)c3ccc(F)cc3F ,fluconazole,-2.418,1,306.276,1,3,5,81.64999999999999,-1.8
662 | Clc2ccc1oc(=O)[nH]c1c2,Chlorzoxazone,-2.679,1,169.567,1,2,0,46.0,-2.8310000000000004
663 | Clc1ccc(c(Cl)c1)c2c(Cl)c(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,4,4',5',6-PCB",-7.898,1,395.3270000000001,0,2,1,0.0,-7.92
664 | O=C1NC(=O)C(=O)C(=O)N1 ,alloxan,0.436,1,142.07,2,1,0,92.34,-1.25
665 | ClCCCCl,"1,3-Dichloropropane",-1.618,1,112.987,0,0,2,0.0,-1.62
666 | Fc1cccc(Br)c1,m-Fluorobromobenzene,-3.467,1,175.0,0,1,0,0.0,-2.67
667 | Clc1ccc(Br)cc1,p-Chlorobromobenzene,-3.928,1,191.455,0,1,0,0.0,-3.63
668 | CC(C)C(C)C,"2,3-Dimethylbutane",-2.584,1,86.178,0,0,1,0.0,-3.65
669 | CCC=C,1-Butene,-1.655,1,56.108,0,0,1,0.0,-1.94
670 | Clc1ccc(Cl)c(c1)c2cc(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,4,5,5'-PCB",-7.343,1,360.88200000000006,0,2,1,0.0,-7.68
671 | Nc1cc[nH]c(=O)n1 ,cytosine,0.051,1,111.104,2,1,0,71.77000000000001,-1.155
672 | FC(F)(Cl)C(F)(Cl)Cl,"1,1,2-Trichlorotrifluoroethane",-3.077,1,187.37500000000003,0,0,1,0.0,-3.04
673 | CCC#N,Propionitrile,-0.2689999999999999,1,55.07999999999999,0,0,0,23.79,0.28
674 | ClC(Cl)C(c1ccc(Cl)cc1)c2ccccc2Cl ,"O,P'-DDD",-6.007999999999999,1,320.04600000000005,0,2,3,0.0,-6.51
675 | COc1ccccc1N(=O)=O,o-Nitroanisole,-2.346,1,153.13699999999997,0,1,2,52.37,-1.96
676 | CC34CCC1C(CC=C2CC(O)CCC12C)C3CCC4=O,Prasterone,-3.564,1,288.43100000000004,1,4,0,37.3,-4.12
677 | CC12CC2(C)C(=O)N(C1=O)c3cc(Cl)cc(Cl)c3,Procymidone,-3.464,1,284.142,0,3,1,37.38,-4.8
678 | c1cc2ccc3ccc4ccc5cccc6c(c1)c2c3c4c56,Benzo[ghi]perylene,-6.446000000000001,2,276.338,0,6,0,0.0,-9.018
679 | CCC(C)c1cc(cc(N(=O)=O)c1O)N(=O)=O ,Dinoseb,-3.715,1,240.21499999999995,1,1,4,106.51000000000002,-3.38
680 | c1c(OC)c(OC)C2C(=O)OCC2c1,meconin,-0.825,1,196.202,0,2,2,44.760000000000005,-1.899
681 | OCC(O)CO,Glycerol,0.688,1,92.094,3,0,2,60.69,1.12
682 | COc1ccccc1O ,Guaiacol,-1.941,1,124.13899999999995,1,1,1,29.46,-1.96
683 | CCOP(=S)(OCC)Oc1nc(Cl)c(Cl)cc1Cl ,chlorpyrifos,-4.972,1,350.591,0,1,6,40.58,-5.67
684 | Cc1c2ccccc2cc3ccccc13,9-Methylanthracene,-4.87,1,192.261,0,3,0,0.0,-5.89
685 | Cc1cc(=O)n(c2ccccc2)n1C,Antipyrene,-1.733,1,188.23,0,2,1,26.93,0.715
686 | CCCCOC,Methyl butyl ether ,-1.072,1,88.14999999999999,0,0,3,9.23,-0.99
687 | Cc2cnc1cncnc1n2,7-methylpteridine,-1.24,1,146.153,0,2,0,51.56,-0.8540000000000001
688 | CCNc1nc(Cl)nc(NCC)n1 ,simazine,-2.8110000000000004,1,201.661,2,1,4,62.73,-4.55
689 | CN(C)C(=O)C,"N,N-Dimethylacetamide",0.123,1,87.12199999999999,0,0,0,20.31,1.11
690 | CSc1nc(nc(n1)N(C)C)N(C)C,Simetryn,-2.689,1,213.31,0,1,3,45.150000000000006,-2.676
691 | C=C,Ethylene,-0.815,1,28.053999999999995,0,0,0,0.0,-0.4
692 | CC(C)(C)CCO,"3,3-Dimethyl-1-butanol",-1.365,1,102.177,1,0,1,20.23,-0.5
693 | O=C1NC(=O)NC(=O)C1(CC)CC=C,5-Allyl-5-ethylbarbital,-1.368,1,196.206,2,1,3,75.27000000000001,-1.614
694 | Oc1ccc(Cl)c(Cl)c1Cl,"2,3,4-Trichlorophenol",-3.705,1,197.448,1,1,0,20.23,-2.67
695 | COc1ccccc1,Anisole,-2.3680000000000003,1,108.13999999999996,0,1,1,9.23,-1.85
696 | c1ccc(Cl)cc1C(c2ccc(Cl)cc2)(O)C(=O)OC(C)C,chloropropylate,-5.093,1,339.21800000000013,1,2,4,46.53,-4.53
697 | CC13CCC(=O)C=C1CCC4C2CCC(C(=O)CO)C2(CC(O)C34)C=O ,aldosterone,-3.0660000000000003,1,360.4500000000001,2,4,3,91.67000000000002,-3.85
698 | COc2ccc(Oc1ccc(NC(=O)N(C)C)cc1)cc2,Difenoxuron,-3.928,1,286.331,1,2,4,50.8,-4.16
699 | CCc1ccc(C)cc1,4-Ethyltoluene,-3.3280000000000003,1,120.19499999999996,0,1,1,0.0,-3.11
700 | CC(C)SC(C)C,Diisopropylsulfide,-2.162,1,118.245,0,0,2,0.0,-2.24
701 | O=N(=O)c1cccc(c1)N(=O)=O,"1,3-Dinitrobenzene",-2.281,1,168.10799999999995,0,1,2,86.28,-2.29
702 | CCOP(=S)(OCC)SCSP(=S)(OCC)OCC,Ethion,-5.471,1,384.4870000000002,0,0,12,36.92,-5.54
703 | CCC1(C(C)C)C(=O)NC(=O)NC1=O ,probarbital,-1.6030000000000002,1,198.222,2,1,2,75.27000000000001,-2.21
704 | CC(=O)OCC(=O)C3(O)CCC4C2CCC1=CC(=O)CCC1(C)C2C(=O)CC34C ,cortisone acetate,-3.426,1,402.48700000000025,1,4,3,97.74,-4.21
705 | Cc1ncc(N(=O)=O)n1CCO,Metronidazole,-0.8590000000000001,1,171.15599999999998,1,1,3,81.19,-1.22
706 | Nc1ccc(Cl)cc1,p-Chloroaniline,-2.392,1,127.574,1,1,0,26.02,-1.66
707 | CCCC(C)(C)CO,"2,2-Dimethylpentanol",-1.719,1,116.20399999999998,1,0,3,20.23,-1.52
708 | c1ccoc1,Furane,-1.837,2,68.07499999999999,0,1,0,13.14,-0.82
709 | COCCCNc1nc(NC(C)C)nc(SC)n1,Methoproptryne,-3.259,1,271.39,2,1,8,71.96000000000001,-2.928
710 | CN(C)C(=O)NC1CC2CC1C3CCCC23,Norea,-2.47,1,222.332,1,3,1,32.34,-3.1710000000000003
711 | CC(C)(C)c1ccccc1,t-Butylbenzene ,-3.554,1,134.22199999999998,0,1,0,0.0,-3.66
712 | CC(=O)CCC1C(=O)N(N(C1=O)c2ccccc2)c3ccccc3,kebuzone,-2.645,1,322.36400000000003,0,3,5,57.690000000000005,-3.27
713 | CC(=O)OCC(=O)C3(O)CCC4C2CCC1=CC(=O)C=CC1(C)C2C(O)CC34C ,prednisolone acetate,-3.507,1,402.48700000000014,2,4,3,100.90000000000002,-4.37
714 | CCCOC,Methyl propyl ether ,-0.718,1,74.12299999999999,0,0,2,9.23,-0.39
715 | CC(C)OC(=O)C,Isopropyl acetate,-1.1909999999999998,1,102.133,0,0,1,26.3,-0.55
716 | Brc1ccccc1,Bromobenzene,-3.345,1,157.01,0,1,0,0.0,-2.55
717 | CCOC(=O)c1ccc(O)cc1,Ethyl-p-hydroxybenzoate ,-2.761,1,166.176,1,1,2,46.53,-2.35
718 | O=C1N(COC(=O)CCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Butanoyloxymethylphenytoin,-3.469,1,352.39000000000004,1,3,6,75.71,-5.071000000000001
719 | CCC(=O)OC3CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,testosterone propionate,-4.87,1,344.4950000000001,0,4,2,43.370000000000005,-5.37
720 | c1cc2ccc3ccc4ccc5ccc6ccc1c7c2c3c4c5c67,Coronene,-6.885,2,300.36000000000007,0,7,0,0.0,-9.332
721 | O=c1[nH]cnc2[nH]ncc12 ,allopurinol,-0.84,1,136.114,2,2,0,74.43,-2.266
722 | ClC=C,Chloroethylene,-1.188,1,62.499,0,0,0,0.0,-1.75
723 | CN(C)C(=O)C(c1ccccc1)c2ccccc2 ,diphenamid,-3.147,1,239.318,0,2,3,20.31,-2.98
724 | BrC(Br)(Br)Br,Tetrabromomethane,-4.063,1,331.62699999999995,0,0,0,0.0,-3.14
725 | CCN2c1cc(N(C)C)cc(C)c1NC(=O)c3cccnc23 ,RTI 22,-4.408,1,296.374,1,3,2,48.47,-4.871
726 | O=C1NC(=O)c2ccccc12 ,phthalimide,-1.882,1,147.13299999999998,1,2,0,46.17,-2.61
727 | OC(c1ccc(Cl)cc1)(c2cncnc2)c3ccccc3Cl,Fenarimol,-4.1080000000000005,1,331.202,1,3,3,46.010000000000005,-4.38
728 | COC(=O)c1ccccc1,Methyl benzoate ,-2.462,1,136.14999999999998,0,1,1,26.3,-1.85
729 | Cn1ccc(=O)[nH]c1=O,1-methyluracil,-0.375,1,126.115,1,1,0,54.86,-0.807
730 | CCCCC1C(=O)N(N(C1=O)c2ccc(O)cc2)c3ccccc3 ,oxyphenbutazone,-3.739,1,324.38000000000005,1,3,5,60.85000000000001,-3.73
731 | Clc1ccc(Cl)c(c1)c2cccc(Cl)c2Cl ,"2,2',3,5'-PCB",-6.155,1,291.9920000000001,0,2,1,0.0,-6.47
732 | CCC2NC(=O)c1cc(c(Cl)cc1N2)S(N)(=O)=O,Quinethazone,-2.184,1,289.7440000000001,3,2,2,101.29,-3.29
733 | CN(C)C(=O)Nc1ccc(Cl)c(Cl)c1,Diuron,-3.301,1,233.098,1,1,1,32.34,-3.8
734 | C1CC=CC1,Cyclopentene ,-1.72,2,68.11900000000001,0,1,0,0.0,-2.1
735 | C1(=O)NC(=O)NC(=O)C1(O)C2(O)C(=O)NC(=O)NC2(=O),alloxantin,0.919,1,286.156,6,2,1,191.0,-1.99
736 | CCCCCCCCC,Nonane,-3.678,1,128.259,0,0,6,0.0,-5.88
737 | Oc1ccccc1Cl,2-Chlorophenol,-2.553,1,128.558,1,1,0,20.23,-1.06
738 | c1cccc2c3c(C)cc4ccccc4c3ccc12,5-Methylchrysene,-5.931,1,242.321,0,4,0,0.0,-6.59
739 | CCOc1ccccc1,Phenetole,-2.66,1,122.16699999999996,0,1,2,9.23,-2.33
740 | CCOC(=O)C=Cc1ccccc1,ethyl cinnamate,-3.0980000000000003,1,176.215,0,1,3,26.3,-3.0
741 | Cc1[nH]c(=O)n(c(=O)c1Cl)C(C)(C)C,Terbacil,-3.033,1,216.668,1,1,0,54.86,-2.484
742 | Clc1ccccc1C2=NCC(=O)Nc3ccc(cc23)N(=O)=O,Clonazepam,-3.707,1,315.716,1,3,2,84.6,-3.499
743 | Cc1ccc(cc1)S(=O)(=O)N,p-Toluenesulfonamide ,-1.815,1,171.22099999999998,1,1,1,60.16,-1.74
744 | CC(OC(=O)Nc1cccc(Cl)c1)C#C,Chlorbufam,-3.629,1,223.659,1,1,2,38.33,-2.617
745 | CCCCCC(C)C,2-Methylheptane,-3.3080000000000003,1,114.232,0,0,4,0.0,-5.08
746 | CC1(C)C(C=C(Cl)C(F)(F)F)C1C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Cyhalothrin,-6.905,1,449.8560000000001,0,3,6,59.32000000000001,-8.176
747 | CCCC1C(=O)N3N(C1=O)c2cc(C)ccc2N=C3N(C)C ,Apazone,-2.9,1,300.3620000000001,0,3,2,56.220000000000006,-3.5380000000000003
748 | CN2C(=O)CN=C(c1ccccc1)c3cc(Cl)ccc23,Diazepam,-4.05,1,284.74600000000004,0,3,1,32.67,-3.754
749 | CCC(O)C(C)C,2-Methyl-3-pentanol,-1.308,1,102.177,1,0,2,20.23,-0.7
750 | CCOP(=S)(OCC)Oc1ccc(cc1)S(C)=O ,fensulfothion,-3.283,1,308.36100000000005,0,1,7,44.760000000000005,-2.3
751 | CC1(C)C2CCC1(C)C(O)C2,borneol,-2.423,1,154.253,1,2,0,20.23,-2.32
752 | CC12CCC3C(CCC4=CC(=O)CCC34C)C2CCC1O,Testosterone,-3.659,1,288.431,1,4,0,37.3,-4.02
753 | CCCCCCC,Heptane,-2.97,1,100.205,0,0,4,0.0,-4.53
754 | Oc1cccc2ccccc12,1-Napthol,-3.08,1,144.17299999999997,1,2,0,20.23,-2.22
755 | C/C1CCCCC1\C,"cis-1,2-Dimethylcyclohexane",-3.305,1,112.216,0,1,0,0.0,-4.3
756 | COc2cc1c(N)nc(nc1c(OC)c2OC)N3CCN(CC3)C(=O)OCC(C)(C)O ,Trimazosin,-3.958,1,435.48100000000034,2,3,6,132.5,-3.638
757 | C1Cc2c3c1cccc3cc4c2ccc5ccccc54,Cholanthrene,-5.942,2,254.332,0,5,0,0.0,-7.85
758 | CC(=O)C3(C)CCC4C2C=C(C)C1=CC(=O)CCC1(C)C2CCC34C,Medrogestone,-4.593,1,340.5070000000001,0,4,1,34.14,-5.27
759 | CCCCCC(=O)C,2-Heptanone,-1.554,1,114.188,0,0,4,17.07,-1.45
760 | COP(=O)(NC(C)=O)SC ,Acephate,-0.416,1,183.169,1,0,3,55.4,0.54
761 | CCCCSP(=O)(SCCCC)SCCCC,DEF,-4.074,1,314.5220000000001,0,0,12,17.07,-5.14
762 | c1cC2C(=O)NC(=O)C2cc1,phthalamide,-0.636,1,149.149,1,2,0,46.17,-2.932
763 | NS(=O)(=O)c2cc1c(NC(NS1(=O)=O)C(Cl)Cl)cc2Cl ,Trichlomethiazide,-2.98,1,380.662,3,2,2,118.36,-2.68
764 | CC=C(C)C,2-Methy-2-Butene,-1.994,1,70.13499999999999,0,0,0,0.0,-2.56
765 | Cc1ccc(C)c(C)c1,"1,2,4-Trimethylbenzene",-3.343,1,120.195,0,1,0,0.0,-3.31
766 | Oc1cc(Cl)c(Cl)cc1Cl,"2,4,5-Trichlorophenol ",-3.78,1,197.448,1,1,0,20.23,-2.21
767 | c1ccc2c(c1)cnc3ccccc23 ,phenanthridine,-3.713,2,179.22199999999998,0,3,0,12.89,-2.78
768 | CCCC(C)(O)CC,3-Methyl-3-hexanol,-1.663,1,116.20399999999998,1,0,3,20.23,-0.98
769 | CCCCCCCC,Octane,-3.324,1,114.232,0,0,5,0.0,-5.24
770 | c1ccc2cc3ccccc3cc2c1,Anthracene,-4.518,2,178.23399999999995,0,3,0,0.0,-6.35
771 | NNc1ccccc1,Phenylhydrazine,-1.866,1,108.14399999999998,2,1,1,38.05,0.07
772 | CCC=O,Propionaldehyde,-0.3939999999999999,1,58.08,0,0,1,17.07,0.58
773 | C1CCCCCCC1,Cyclooctane,-3.355,2,112.216,0,1,0,0.0,-4.15
774 | O=C1NC(=O)NC(=O)C1(CC=C)CC=C,"5,5-Diallylbarbital",-1.471,1,208.217,2,1,4,75.27000000000001,-2.077
775 | ClC(Cl)Cl,Trichloromethane,-1.812,1,119.378,0,0,0,0.0,-1.17
776 | Sc1nccc(=O)[nH]1 ,thiouracil,-0.992,1,128.15599999999998,2,1,0,45.75,-2.273
777 | Clc1ccc(CN(C2CCCC2)C(=O)Nc3ccccc3)cc1,Pencycuron,-5.126,1,328.843,1,3,4,32.34,-5.915
778 | CC1=CCCCC1,1-Methylcyclohexene ,-2.574,1,96.17300000000002,0,1,0,0.0,-3.27
779 | CCCCC(CC)C=O,2-Ethylhexanal,-2.232,1,128.21499999999995,0,0,5,17.07,-2.13
780 | COc2c1occc1c(OC)c3c(=O)cc(C)oc23 ,Khellin,-3.603,1,260.24499999999995,0,3,2,61.81,-3.0210000000000004
781 | O=C1NC(=O)NC(=O)C1(CC)CCC(C)C,5-Ethyl-5-(3-methylbutyl)barbital,-2.312,1,226.27599999999995,2,1,4,75.27000000000001,-2.658
782 | c1ccc2c3c(ccc2c1)c4cccc5cccc3c45,Benzo(j)fluoranthene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.0
783 | CCC(CC)C=O,2-Ethylbutanal,-1.523,1,100.161,0,0,3,17.07,-1.52
784 | CCCOCCC,Dipropyl ether,-1.426,1,102.177,0,0,4,9.23,-1.62
785 | CCCCCCCCCCCCCCO,1-Tetradecanol,-4.231,1,214.393,1,0,12,20.23,-5.84
786 | Oc1c(Cl)ccc(Cl)c1Cl,"2,3,6-Trichlorophenol",-3.572,1,197.448,1,1,0,20.23,-2.64
787 | NC(=O)N,Urea,0.8320000000000001,1,60.056,2,0,0,69.11,0.96
788 | CCCC#C,1-Pentyne,-1.446,1,68.11899999999999,0,0,1,0.0,-1.64
789 | Brc1cccc(Br)c1,"1,3-Dibromobenzene",-4.298,1,235.906,0,1,0,0.0,-3.54
790 | CCCCCCCCCCCCCCCCCCO,1-Octadecanol,-5.649,1,270.50099999999986,1,0,16,20.23,-8.4
791 | CC(=O)Nc1ccccc1,Acetanilide,-1.857,1,135.16599999999997,1,1,1,29.1,-1.33
792 | c1cc(O)c(O)c2OCC3(O)CC4=CC(=O)C(O)=CC4=C3c21,hematein,-1.795,1,300.266,4,4,0,107.22,-2.7
793 | c1nccc(C(=O)NN)c1,Isonazid,-0.7170000000000001,1,137.14200000000002,2,1,1,68.01,0.009
794 | OC1C=CC2C1C3(Cl)C(=C(Cl)C2(Cl)C3(Cl)Cl)Cl ,hydroxychlordene,-4.156000000000001,1,354.8749999999999,1,3,0,20.23,-5.46
795 | CC(C)CCOC=O,Isopentyl formate,-1.449,1,116.15999999999998,0,0,4,26.3,-1.52
796 | CC(=O)c1ccccc1,Acetophenone,-2.0780000000000003,1,120.15099999999995,0,1,1,17.07,-1.28
797 | c2ccc1nc(ccc1c2)c4ccc3ccccc3n4 ,biquinoline,-4.9030000000000005,2,256.308,0,4,1,25.78,-5.4
798 | CCOP(=O)(OCC)OCC,Triethyl phosphate,-0.953,1,182.156,0,0,6,44.760000000000005,0.43
799 | CC2(C)C1CCC(C)(C1)C2=O,D-fenchone,-2.158,1,152.237,0,2,0,17.07,-1.85
800 | COc2cnc1cncnc1n2,7-methoxypteridine,-1.589,1,162.152,0,2,1,60.790000000000006,-0.91
801 | ClC2=C(Cl)C3(Cl)C1C=CCC1C2(Cl)C3(Cl)Cl ,Chlordene,-5.152,1,338.876,0,3,0,0.0,-5.64
802 | CC(C)N(=O)=O,2-Nitropropane,-0.743,1,89.094,0,0,1,43.14,-0.62
803 | c1ccc2c(c1)[nH]c3ccccc32,Carbazole,-3.836,2,167.21099999999998,1,3,0,15.79,-5.27
804 | OCC(O)C(O)CO,Erythritol,0.675,1,122.12,4,0,3,80.92,0.7
805 | CCCOC(=O)c1ccc(N)cc1,Risocaine,-2.709,1,179.21899999999997,1,1,3,52.32,-2.452
806 | CNC(=O)C=C(C)OP(=O)(OC)OC,Azodrin,-0.949,1,223.165,1,0,5,73.86,0.6509999999999999
807 | O=C1CCC(=O)N1,Succinimide,0.282,1,99.089,1,1,0,46.17,0.3
808 | CCC(C)C(C)C,"2,3-Dimethylpentane",-2.938,1,100.20499999999998,0,0,2,0.0,-4.28
809 | CCCCc1c(C)nc(NCC)nc1OS(=O)(=O)N(C)C ,bupirimate,-3.4930000000000003,1,316.4270000000001,1,1,8,84.42,-4.16
810 | CCN2c1ncccc1N(C)C(=S)c3cccnc23 ,RTI 16,-3.411,1,270.361,0,3,1,32.260000000000005,-4.634
811 | O2c1ccccc1N(CC)C(=O)c3ccccc23 ,RTI 9,-3.784,1,239.274,0,3,1,29.54,-3.68
812 | C1CCOCC1,Tetrahydropyran ,-0.978,2,86.134,0,1,0,9.23,-0.03
813 | CCCCCC#C,1-Heptyne,-2.155,1,96.173,0,0,3,0.0,-3.01
814 | c1cc2ccc(OC)c(CC=C(C)(C))c2oc1=O ,osthole,-4.0760000000000005,1,244.29,0,2,3,39.44,-4.314
815 | c1cc(C)cc2c1c3cc4cccc5CCc(c45)c3cc2,3-Methylcholanthrene,-6.311,1,268.3589999999999,0,5,0,0.0,-7.92
816 | CCOC(=O)c1ccccc1,Ethyl benzoate ,-2.775,1,150.177,0,1,2,26.3,-2.32
817 | ClCC(C)C,1-Chloro-2-methylpropane,-1.924,1,92.569,0,0,1,0.0,-2.0
818 | CC34CCC1C(CCc2cc(O)ccc12)C3CCC4(O)C#C ,Ethinyl estradiol,-4.317,1,296.41,2,4,0,40.46,-4.3
819 | CCCCCCCCCCCC(=O)OC,methyl laurate,-4.025,1,214.349,0,0,10,26.3,-4.69
820 | CCCSCCC,Di-n-propylsulfide,-2.307,1,118.245,0,0,4,0.0,-2.58
821 | c1ccc2cc3cc4ccccc4cc3cc2c1,Napthacene,-5.568,2,228.294,0,4,0,0.0,-8.6
822 | CCCCCBr,1-Bromopentane,-2.658,1,151.047,0,0,3,0.0,-3.08
823 | CCCC/C=C/C,trans-2-Heptene ,-2.784,1,98.18899999999998,0,0,3,0.0,-3.82
824 | Cc1ncc(N(=O)=O)n1CCO ,Metranidazole,-0.8590000000000001,1,171.15599999999998,1,1,3,81.19,-1.26
825 | CCCCCC1CCCC1,Pentylcyclopentane,-3.869,1,140.26999999999998,0,1,4,0.0,-6.08
826 | Clc1ccc(Cl)c(c1)c2c(Cl)c(Cl)cc(Cl)c2Cl ,"2,2',3,5,5',6-PCB",-7.261,1,360.88200000000006,0,2,1,0.0,-7.42
827 | O=C1NC(=O)NC(=O)C1(CC)C(C)C,5-Ethyl-5-isopropylbarbituric acid,-1.6030000000000002,1,198.222,2,1,2,75.27000000000001,-2.148
828 | CC(Cl)(Cl)Cl,"1,1,1-Trichloroethane",-2.232,1,133.405,0,0,0,0.0,-2.0
829 | CON(C)C(=O)Nc1ccc(Cl)cc1,Monolinuron,-2.948,1,214.652,1,1,2,41.57,-2.57
830 | O=C2NC(=O)C1(CCCCC1)C(=O)N2,Cyclohexyl-5-spirobarbituric acid,-1.405,1,196.206,2,2,0,75.27,-3.06
831 | CN(C)C(=O)OC1=CC(=O)CC(C)(C)C1 ,dimetan,-2.3040000000000003,1,211.261,0,1,1,46.61,-0.85
832 | Cc1ccc(Br)cc1,4-Bromotoluene,-3.667,1,171.03700000000003,0,1,0,0.0,-3.19
833 | CCOCC,Diethyl ether ,-0.718,1,74.123,0,0,2,9.23,-0.09
834 | CC(C)NC(=O)N1CC(=O)N(C1=O)c2cc(Cl)cc(Cl)c2,Rovral,-4.004,1,330.17100000000005,1,2,2,69.72,-4.376
835 | CCCCN(CC)c1c(cc(cc1N(=O)=O)C(F)(F)F)N(=O)=O,Benfluralin,-5.205,1,335.28200000000004,0,1,7,89.51999999999998,-5.53
836 | Cc1cc(C)c(O)c(C)c1,"2,4,6-Trimethylphenol",-2.9410000000000003,1,136.194,1,1,0,20.23,-2.05
837 | c1ccccc1,Benzene ,-2.418,2,78.11399999999999,0,1,0,0.0,-1.64
838 | Clc1ccc(I)cc1,p-Chloroiodobenzene,-4.384,1,238.455,0,1,0,0.0,-4.03
839 | COc1ccc(NC(=O)N(C)C)cc1Cl,Metoxuron,-2.6830000000000003,1,228.679,1,1,2,41.57,-2.564
840 | CC(C)N(C(=O)CCl)c1ccccc1 ,propachlor,-3.018,1,211.69200000000004,0,1,3,20.31,-2.48
841 | C=Cc1ccccc1,Styrene,-2.85,1,104.15199999999996,0,1,1,0.0,-2.82
842 | COCOC,Dimethoxymethane,0.092,1,76.095,0,0,2,18.46,0.48
843 | Cc1ccccc1C,o-Xylene ,-3.004,1,106.168,0,1,0,0.0,-2.8
844 | CCC(C)O,Butan-2-ol,-0.616,1,74.12299999999999,1,0,1,20.23,0.47
845 | Oc1ccc(O)cc1,"1,4-Benzenediol",-1.59,1,110.11199999999998,2,1,0,40.46,-0.17
846 | CC34CCC1C(CCc2cc(O)ccc12)C3CC(O)C4O ,estriol,-3.858,1,288.387,3,4,0,60.69,-4.955
847 | C1c2ccccc2c3cc4ccccc4cc13,Benzo(b)fluorene,-5.189,2,216.283,0,4,0,0.0,-8.04
848 | O=C1CNC(=O)N1 ,hydantoin,0.603,1,100.077,2,1,0,58.2,-0.4
849 | c1(O)cc(O)ccc1CCCCCC,4-hexylresorcinol,-3.4930000000000003,1,194.27399999999992,2,1,5,40.46,-2.59
850 | C=CCS(=O)SCC=C,allicin,-2.045,1,162.27899999999997,0,0,5,17.07,-0.83
851 | CCOP(=S)(OCC)Oc2ccc1oc(=O)c(Cl)c(C)c1c2,Coumaphos,-5.04,1,362.7710000000001,0,2,6,57.9,-5.382000000000001
852 | Cc1c(C)c2c3ccccc3ccc2c4ccccc14,"5,6-Dimethylchrysene",-6.265,1,256.348,0,4,0,0.0,-7.01
853 | CCCCC(=O)OC3(C(C)CC4C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC34C)C(=O)CO,Betamethasone-17-valerate,-5.062,1,476.5850000000002,2,4,6,100.90000000000002,-4.71
854 | O=c2[nH]c(=O)c1[nH]c(=O)[nH]c1[nH]2 ,uric acid,-0.541,1,168.112,4,2,0,114.36999999999998,-3.93
855 | Oc1c(Cl)cc(Cl)c(Cl)c1Cl,"2,3,4,6-Tetrachlorophenol",-4.203,1,231.893,1,1,0,20.23,-3.1
856 | Clc1cccc(Cl)c1,"1,3-Dichlorobenzene",-3.5580000000000003,1,147.004,0,1,0,0.0,-3.04
857 | Clc1ccc(cc1)C(c2ccc(Cl)cc2)C(Cl)(Cl)Cl,DDT,-6.638,1,354.491,0,2,2,0.0,-7.15
858 | CC(C)COC=O,Isobutyl formate,-1.095,1,102.13299999999998,0,0,3,26.3,-1.01
859 | c1ccccc1SC,thioanisole,-2.87,1,124.208,0,1,1,0.0,-2.39
860 | CCN2c1nc(C)cc(C(F)(F)F)c1NC(=O)c3cccnc23 ,RTI 13,-4.45,1,322.29,1,3,1,58.120000000000005,-4.207
861 | CCCCCC,Hexane ,-2.615,1,86.178,0,0,3,0.0,-3.84
862 | COC(=O)c1cccnc1 ,methyl nicotinate,-1.621,1,137.138,0,1,1,39.19,-0.46
863 | NS(=O)(=O)c3cc2c(NC(Cc1ccccc1)NS2(=O)=O)cc3C(F)(F)F,Bendroflumethiazide,-3.741,1,421.4220000000001,3,3,3,118.36,-3.59
864 | Clc1ccc(cc1Cl)c2cc(Cl)c(Cl)c(Cl)c2Cl ,"2,3,3',4,4',5-PCB",-7.425,1,360.88200000000006,0,2,1,0.0,-7.82
865 | CC1(OC(=O)N(C1=O)c2cc(Cl)cc(Cl)c2)C=C,Vinclozolin,-4.377,1,286.11400000000003,0,2,2,46.61,-4.925
866 | CCNc1nc(Cl)nc(NC(C)(C)C#N)n1,Cyanazine,-2.49,1,240.698,2,1,4,86.52,-3.15
867 | c1ccc2c(c1)c3ccccc3c4ccccc24,Triphenylene,-5.568,2,228.294,0,4,0,0.0,-6.726
868 | CC=C(C(=CC)c1ccc(O)cc1)c2ccc(O)cc2,Dienestrol,-4.775,1,266.34,2,2,3,40.46,-4.95
869 | CCCCC(CC)COC(=O)c1ccccc1C(=O)OCC(CC)CCCC,Di(2-ethylhexyl)-phthalate,-7.117000000000001,1,390.5640000000003,0,1,14,52.60000000000001,-6.96
870 | CCc1ccccn1,2-Ethyl pyridine,-2.051,1,107.15599999999998,0,1,1,12.89,0.51
871 | COP(=O)(OC)OC(Br)C(Cl)(Cl)Br,Naled,-3.548,1,380.784,0,0,5,44.760000000000005,-2.28
872 | c1ccc(cc1)c2ccccc2,Biphenyl,-4.079,2,154.21199999999996,0,2,1,0.0,-4.345
873 | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cc(Cl)cc2Cl ,"2,2',4,4',6,6'-PCB",-7.178999999999999,1,360.88200000000006,0,2,1,0.0,-8.71
874 | CN(C)c1nc(nc(n1)N(C)C)N(C)C,Altretamine,-2.492,1,210.285,0,1,3,48.39000000000001,-3.364
875 | CC(C)CC(C)(C)O,"2,4-Dimethyl-2-pentanol ",-1.6469999999999998,1,116.20399999999998,1,0,2,20.23,-0.92
876 | O=C2NC(=O)C1(CCCCCC1)C(=O)N2 ,Cycloheptyl-5-spirobarbituric acid,-1.844,1,210.233,2,2,0,75.27,-3.168
877 | OCC1OC(O)(CO)C(O)C1O,Fructose,0.471,1,180.156,5,1,2,110.38,0.64
878 | Cc1cc(C)cc(O)c1,"3,5-Dimethylphenol",-2.652,1,122.16699999999996,1,1,0,20.23,-1.4
879 | ClCC#CCOC(=O)Nc1cccc(Cl)c1,Barban,-4.16,1,258.104,1,1,2,38.33,-4.37
880 | CC(=O)Nc1ccc(Cl)cc1,p-Chloroacetanilide,-2.642,1,169.611,1,1,1,29.1,-2.843
881 | Clc1ccc(Cl)c(c1)c2c(Cl)c(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,4,5,5',6-PCB",-7.898,1,395.3270000000001,0,2,1,0.0,-8.94
882 | CCC(C)(C)C,"2,2-Dimethylbutane",-2.584,1,86.17799999999998,0,0,0,0.0,-3.55
883 | CNc1ccccc1,N-Methylaniline ,-2.097,1,107.15599999999998,1,1,1,12.03,-1.28
884 | C=CCC=C,"1,4-Pentadiene ",-1.758,1,68.119,0,0,2,0.0,-2.09
885 | CC(=O)OCC(=O)C1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3C(O)CC21C,Hydrocortisone 21-acetate,-3.692,1,404.5030000000002,2,4,3,100.90000000000002,-4.88
886 | Cc1cc(cc(N(=O)=O)c1O)N(=O)=O,DNOC,-2.818,1,198.134,1,1,2,106.51000000000002,-1.456
887 | OC3N=C(c1ccccc1Cl)c2cc(Cl)ccc2NC3=O,Lorazepam,-3.75,1,321.163,2,3,1,61.690000000000005,-3.604
888 | Oc1cccc(Cl)c1,3-Chlorophenol,-2.761,1,128.558,1,1,0,20.23,-0.7
889 | Clc1cccc(Br)c1,m-Chlorobromobenzene,-3.928,1,191.455,0,1,0,0.0,-3.21
890 | NS(=O)(=O)c2cc1c(N=CNS1(=O)=O)cc2Cl ,chlorothiazide,-1.7519999999999998,1,295.72900000000004,2,2,1,118.69,-3.05
891 | O=C1NC(=O)NC(=O)C1(C)CC,5-Methyl-5-ethylbarbituric acid,-0.911,1,170.16799999999998,2,1,1,75.27000000000001,-1.228
892 | OCCOc1ccccc1,2-Phenoxyethanol,-1.761,1,138.16599999999997,1,1,3,29.46,-0.7
893 | C(c1ccccc1)c2ccccc2,Diphenylmethane,-4.09,2,168.239,0,2,2,0.0,-4.08
894 | CCCCCC(O)CC,3-Octanol,-2.033,1,130.23099999999997,1,0,5,20.23,-1.98
895 | CCN(Cc1c(F)cccc1Cl)c2c(cc(cc2N(=O)=O)C(F)(F)F)N(=O)=O,Flumetralin,-6.584,1,421.7340000000001,0,2,6,89.51999999999998,-6.78
896 | CC(C)Nc1nc(Cl)nc(NC(C)C)n1,Propazine,-3.329,1,229.71500000000003,2,1,4,62.73,-4.43
897 | CCCC(C)CO,2-Methylpentanol,-1.381,1,102.177,1,0,3,20.23,-1.11
898 | CCCCC(C)(C)O,2-Methyl-2-hexanol,-1.663,1,116.20399999999998,1,0,3,20.23,-1.08
899 | CCc1ccccc1,Ethylbenzene,-2.988,1,106.16799999999996,0,1,1,0.0,-2.77
900 | O=C1NC(=O)NC(=O)C1(CC)CC=C(C)C,5-(3-Methyl-2-butenyl)-5-ethylbarbital,-2.126,1,224.26,2,1,3,75.27000000000001,-2.253
901 | ClC1C=CC2C1C3(Cl)C(=C(Cl)C2(Cl)C3(Cl)Cl)Cl,Heptachlor,-5.26,1,373.3209999999999,0,3,0,0.0,-6.317
902 | CCC(C)C1(CC(Br)=C)C(=O)NC(=O)NC1=O ,butallylonal,-2.766,1,303.156,2,1,4,75.27000000000001,-2.647
903 | CC1(C)C(C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2)C1(C)C,Fenpropathrin,-6.15,1,349.43000000000006,0,3,5,59.32000000000001,-6.025
904 | COC(C)(C)CCCC(C)CC=CC(C)=CC(=O)OC(C)C,Methoprene,-4.795,1,310.47800000000007,0,0,10,35.53,-5.19
905 | CCOC(=O)CC,Ethyl propionate,-1.1909999999999998,1,102.133,0,0,2,26.3,-0.66
906 | CSc1nc(NC(C)C)nc(NC(C)C)n1,Prometryn,-3.693,1,241.364,2,1,5,62.73,-4.1
907 | CC(C#C)N(C)C(=O)Nc1ccc(Cl)cc1,Buturon,-3.199,1,236.702,1,1,2,32.34,-3.9
908 | Cc1cc2ccccc2cc1C,"2,3-Dimethylnaphthalene",-4.1160000000000005,1,156.22799999999998,0,2,0,0.0,-4.72
909 | Clc1ccc(cc1)c2cc(Cl)ccc2Cl ,"2,4',5-PCB",-5.7620000000000005,1,257.547,0,2,1,0.0,-6.25
910 | Clc1ccc(c(Cl)c1)c2cc(Cl)c(Cl)c(Cl)c2Cl ,"2,3',4,4',5-PCB",-7.343,1,360.88200000000006,0,2,1,0.0,-7.39
911 | NC(N)=NC#N ,2-cyanoguanidine,0.361,1,84.082,2,0,0,88.19,-0.31
912 | ClC(Cl)(Cl)N(=O)=O,Chloropicrin,-1.866,1,164.375,0,0,0,43.14,-2.0
913 | Clc1cccc(Cl)c1c2ccccc2 ,"2,6-PCB",-4.984,1,223.102,0,2,1,0.0,-5.21
914 | COc1ccc(C=O)cc1,p-Methoxybenzaldehyde,-2.252,1,136.14999999999998,0,1,2,26.3,-1.49
915 | CC(=O)Nc1ccc(cc1)N(=O)=O,4-Nitroacetanilide,-2.219,1,180.163,1,1,2,72.24000000000001,-2.692
916 | CCCCCCC(=O)OCC,Ethyl heptanoate,-2.608,1,158.241,0,0,6,26.3,-2.74
917 | CC(=O)Nc1ccc(O)cc1,p-Hydroxyacetanilide,-1.495,1,151.165,2,1,1,49.33,-1.03
918 | c2ccc1[nH]ncc1c2 ,indazole,-2.34,2,118.13899999999998,1,2,0,28.68,-2.16
919 | CC5(C)OC4CC3C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC3(C)C4(O5)C(=O)CO ,triamcinolone acetonide,-3.928,1,434.50400000000025,2,5,2,93.06000000000002,-4.31
920 | Nc2nc1[nH]cnc1c(=O)[nH]2 ,guanine,-0.67,1,151.129,3,2,0,100.45,-3.583
921 | COC(=O)C,Methyl acetate,-0.416,1,74.07900000000001,0,0,0,26.3,0.46
922 | CC34CCC1C(CCC2CC(=O)CCC12C)C3CCC4O ,Stanolone,-3.882,1,290.44699999999995,1,4,0,37.3,-4.743
923 | CCCC(O)C=C,1-Hexene-3-ol,-1.199,1,100.161,1,0,3,20.23,-0.59
924 | OC(C1=CC2C5C(C1C2=C(c3ccccc3)c4ccccn4)C(=O)NC5=O)(c6ccccc6)c7ccccn7 ,norbormide,-4.238,1,511.5810000000002,2,7,5,92.18,-3.931
925 | CCCCOCCCC,Dibutyl ether ,-2.135,1,130.231,0,0,6,9.23,-1.85
926 | CCCCCCCCCCCCO,1-Dodecanol,-3.523,1,186.339,1,0,10,20.23,-4.8
927 | CCN2c1nc(N(C)(CCO))ccc1NC(=O)c3cccnc23 ,RTI 6,-3.335,1,313.36100000000005,2,3,4,81.59000000000002,-3.36
928 | CCCC(C)(C)O,2-Methyl-2-pentanol,-1.308,1,102.17699999999998,1,0,2,20.23,-0.49
929 | Nc1nc(=O)[nH]cc1F,Flucytosine,-0.132,1,129.09399999999997,2,1,0,71.77,-0.972
930 | CCCCOc1ccc(C(=O)OCC)c(c1)N(CC)CC ,stadacaine,-5.127999999999999,1,293.40700000000004,0,1,9,38.77,-3.84
931 | CCCCCC(C)(C)O,2-Methyl-2-heptanol,-2.017,1,130.231,1,0,4,20.23,-1.72
932 | Cc1c(C)c(C)c(C)c(C)c1C,Hexamethylbenzene,-4.361000000000001,1,162.27599999999998,0,1,0,0.0,-5.23
933 | CC(C)c1ccc(C)cc1O,Thymol,-3.129,1,150.22099999999998,1,1,1,20.23,-2.22
934 | c2cnc1ncncc1n2,Pteridine,-0.906,2,132.12599999999998,0,2,0,51.56,0.02
935 | CCOP(=S)(OCC)Oc1ccc(cc1)N(=O)=O,Parathion,-3.949,1,291.26500000000004,0,1,7,70.83000000000001,-4.66
936 | C,Methane,-0.636,0,16.043,0,0,0,0.0,-0.9
937 | c2ccc1NCCc1c2 ,indoline,-2.195,2,119.167,1,2,0,12.03,-1.04
938 | O=N(=O)c1cccc2ccccc12,1-Nitronapthalene,-3.414,1,173.171,0,2,1,43.14,-3.54
939 | CCC(C)C(=O)C,3-Methyl-2-pentanone,-1.266,1,100.161,0,0,2,17.07,-0.67
940 | Nc1nc(O)nc2nc[nH]c12 ,isoguanine,-1.74,1,151.129,3,2,0,100.71,-3.401
941 | OC(CC(c1ccccc1)c3c(O)c2ccccc2oc3=O)c4ccc(cc4)c5ccc(Br)cc5 ,bromadiolone,-7.877000000000001,1,527.4140000000002,2,5,6,70.67,-4.445
942 | CN(=O)=O,Nitromethane,-0.042,1,61.040000000000006,0,0,0,43.14,0.26
943 | CC(C)N(C(C)C)C(=O)SCC(Cl)=C(Cl)Cl,Triallate,-4.578,1,304.67,0,0,4,20.31,-4.88
944 | C=CCCC=C,"1,5-Hexadiene ",-2.112,1,82.14599999999999,0,0,3,0.0,-2.68
945 | c2ccc1[nH]ccc1c2,Indole,-2.654,2,117.15099999999995,1,2,0,15.79,-1.52
946 | CC34CCC1C(CCC2=CC(=O)CCC12C)C3CCC4=O,Androstenedione,-3.393,1,286.415,0,4,0,34.14,-3.69
947 | CCCCC=C,1-Hexene,-2.364,1,84.16199999999999,0,0,3,0.0,-3.23
948 | Cc1cccc(C)c1NC(=O)c2cc(c(Cl)cc2O)S(N)(=O)=O,Xipamide,-3.642,1,354.8150000000001,3,2,3,109.48999999999998,-3.79
949 | CCC1CCCCC1,Ethylcyclohexane,-3.245,1,112.216,0,1,1,0.0,-4.25
950 | CCCCCCCC(=O)C,2-Nonanone,-2.263,1,142.242,0,0,6,17.07,-2.58
951 | COC(=O)Nc2nc1ccc(cc1[nH]2)C(=O)c3ccccc3,Mebendazole,-4.118,1,295.298,2,3,3,84.07999999999998,-3.88
952 | CC(C)OC(=O)Nc1cccc(Cl)c1,Chloropham,-3.544,1,213.664,1,1,2,38.33,-3.38
953 | CCN2c1nc(Cl)ccc1N(C)C(=O)c3cccnc23 ,RTI 12,-3.446,1,288.73800000000006,0,3,1,49.330000000000005,-4.114
954 | CNC(=O)Oc1cccc2ccccc12,Carbaryl,-3.087,1,201.225,1,2,1,38.33,-3.224
955 | C#C,Ethyne,-0.252,1,26.038,0,0,0,0.0,0.29
956 | Cc1cncc(C)c1,"3,5-Dimethylpyridine",-2.0980000000000003,1,107.15599999999998,0,1,0,12.89,0.38
957 | C1C=CCC=C1,"1,4-Cyclohexadiene",-1.842,2,80.12999999999998,0,1,0,0.0,-2.06
958 | CCOC(=O)N(C)C(=O)CSP(=S)(OCC)OCC,Mecarbam,-3.738,1,329.3800000000001,0,0,8,65.07000000000001,-2.518
959 | CC(O)c1ccccc1,1-Phenylethanol,-1.919,1,122.16699999999996,1,1,1,20.23,-0.92
960 | CC(Cl)CCl,"1,2-Dichloropropane",-1.794,1,112.987,0,0,1,0.0,-1.6
961 | CCCC=C(CC)C=O,2-Ethyl-2-hexanal,-2.081,1,126.19899999999998,0,0,4,17.07,-2.46
962 | CCOP(=S)(OCC)SCCSCC,Disulfoton,-3.975,1,274.413,0,0,9,18.46,-4.23
963 | CC(=O)OC3(C)CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,methyltestosterone acetate,-4.863,1,344.4950000000001,0,4,1,43.370000000000005,-5.284
964 | Clc1ccc(cc1)c2c(Cl)cccc2Cl ,"2,4,6-PCB",-5.604,1,257.547,0,2,1,0.0,-6.14
965 | Fc1cccc(F)c1C(=O)NC(=O)Nc2ccc(Cl)cc2,difluron,-4.692,1,310.687,2,2,2,58.2,-6.02
966 | Oc1cc(Cl)ccc1Oc2ccc(Cl)cc2Cl,Triclosan,-5.645,1,289.545,1,2,2,29.46,-4.46
967 | c1(C(=O)OCCCCCC(C)(C))c(C(=O)OCCCCCC(C)(C))cccc1,diisooctyl phthalate,-7.117000000000001,1,390.5640000000002,0,1,14,52.60000000000001,-6.6370000000000005
968 | CC12CC(O)C3C(CCC4=CC(=O)CCC34C)C2CCC1C(=O)CO,Corticosterone,-3.454,1,346.46700000000016,2,4,2,74.6,-3.24
969 | Cc1cc(C)cc(C)c1,"1,3,5-Trimethylbenzene ",-3.375,1,120.19499999999998,0,1,0,0.0,-3.4
970 | CCCCCCCCOC(=O)c1ccccc1C(=O)OCCCCCCCC ,dioctyl phthalate,-7.148,1,390.5640000000004,0,1,16,52.60000000000001,-5.115
971 | CCCCCCCCCCCCCCCO,1-Pentadecanol,-4.586,1,228.42,1,0,13,20.23,-6.35
972 | Clc1cccc(Cl)c1c2c(Cl)cccc2Cl ,"2,2',6,6'-PCB",-5.915,1,291.992,0,2,1,0.0,-7.39
973 | O=C1NC(=O)NC(=O)C1(C)C,"5,5-Dimethylbarbituric acid",-0.556,1,156.141,2,1,0,75.27000000000001,-1.742
974 | CC(C)I,2-Iodopropane,-2.486,1,169.993,0,0,0,0.0,-2.09
975 | O=N(=O)c1ccccc1N(=O)=O,"1,2-Dinitrobenzene",-2.281,1,168.10799999999995,0,1,2,86.28,-3.1
976 | CC(C)C(=O)C,3-Methyl-2-butanone,-0.912,1,86.13399999999999,0,0,1,17.07,-0.12
977 | CCCCCCCCCCCCCCCC,Hexadecane,-6.159,1,226.44799999999992,0,0,13,0.0,-8.4
978 | CC12CCC(CC1)C(C)(C)O2,"1,8-Cineole",-2.579,1,154.253,0,3,0,9.23,-1.74
979 | Cc2cccc3sc1nncn1c23 ,Tricyclazole,-2.8680000000000003,1,189.243,0,3,0,30.19,-2.07
980 | CCCCCCC(=O)C,2-Octanone,-1.909,1,128.21499999999995,0,0,5,17.07,-2.05
981 | CCCCCCCCC(=O)OC,Methyl nonanoate,-2.962,1,172.268,0,0,7,26.3,-3.38
982 | Fc1ccc(F)cc1,"1,4-Difluorobenzene",-2.636,1,114.094,0,1,0,0.0,-1.97
983 | O=C1N(C2CCC(=O)NC2=O)C(=O)c3ccccc13,Thalidomide,-1.944,1,258.233,1,3,1,83.55000000000001,-2.676
984 | CCCN(CCC)c1c(cc(cc1N(=O)=O)C(F)(F)F)N(=O)=O,Trifluralin,-5.205,1,335.28200000000004,0,1,7,89.51999999999998,-5.68
985 | CCO,Ethanol,0.02,1,46.069,1,0,0,20.23,1.1
986 | O=C2NC(=O)C1(CCCC1)C(=O)N2,Cyclopentyl-5-spirobarbituric acid,-0.966,1,182.179,2,2,0,75.27,-2.349
987 | c1c(NC(=O)OC(C)C(=O)NCC)cccc1,Carbetamide,-2.29,1,236.271,2,1,4,67.42999999999999,-1.83
988 | CC(C)=CC3C(C(=O)OCc2cccc(Oc1ccccc1)c2)C3(C)C ,phenothrin,-6.763,1,350.4580000000001,0,3,6,35.53,-5.24
989 | CN(C)C(=O)NC1CCCCCCC1,Cycluron,-2.629,1,198.30999999999992,1,1,1,32.34,-2.218
990 | ClC1(C2(Cl)C3(Cl)C4(Cl)C5(Cl)C1(Cl)C3(Cl)Cl)C5(Cl)C(Cl)(Cl)C24Cl,Mirex,-6.155,1,545.5460000000002,0,6,0,0.0,-6.8
991 | CCCCCCCCBr,1-Bromooctane,-3.721,1,193.128,0,0,6,0.0,-5.06
992 | CCCCNC(=O)n1c(NC(=O)OC)nc2ccccc12,Benomyl,-2.902,1,290.323,2,2,4,85.25,-4.883
993 | CN(C)c2c(C)n(C)n(c1ccccc1)c2=O ,aminopyrine,-2.129,1,231.299,0,2,2,30.17,-0.364
994 | CCC(O)CC,3-Pentanol,-0.97,1,88.15,1,0,2,20.23,-0.24
995 | Cc1ccc(cc1)N(=O)=O,p-Nitrotoluene,-2.64,1,137.138,0,1,1,43.14,-2.49
996 | CC(C)CCCO,4-Methylpentanol,-1.381,1,102.177,1,0,3,20.23,-1.14
997 | CC34CCC1C(CCC2=CC(=O)CCC12O)C3CCC4(O)C#C,Norethisterone,-2.669,1,314.42500000000007,2,4,0,57.53,-4.57
998 | CC(C)OC(=O)C(O)(c1ccc(Br)cc1)c2ccc(Br)cc2 ,bromopropylate,-5.832999999999999,1,428.12000000000006,1,2,4,46.53,-4.93
999 | Nc2cnn(c1ccccc1)c(=O)c2Cl,Pyrazon,-2.603,1,221.647,1,2,1,60.91,-2.878
1000 | CCC(C)(C)O,2-Methylbutan-2-ol,-0.954,1,88.14999999999998,1,0,1,20.23,0.15
1001 | Cc1ccc(O)cc1,p-Cresol,-2.313,1,108.14,1,1,0,20.23,-0.73
1002 | CCOC=O,Ethyl formate,-0.402,1,74.07900000000001,0,0,2,26.3,0.15
1003 | CN(C)c1ccccc1,"N,N-Dimethylaniline",-2.542,1,121.18299999999996,0,1,1,3.24,-1.92
1004 | C1CCC2CCCCC2C1,Decalin,-3.715,2,138.254,0,2,0,0.0,-5.19
1005 | CCCCS,Butanethiol ,-1.676,1,90.191,1,0,2,0.0,-2.18
1006 | c1ccc2c(c1)c3cccc4ccc5cccc2c5c43,Benzo(e)pyrene,-6.007000000000001,2,252.316,0,5,0,0.0,-7.8
1007 | ClC(=C(Cl)Cl)Cl,Tetrachloroethylene,-3.063,1,165.834,0,0,0,0.0,-2.54
1008 | CCC(=O)CC,3-Pentanone,-0.912,1,86.134,0,0,2,17.07,-0.28
1009 | C=CC#N,Acrylonitrile,-0.354,1,53.06399999999999,0,0,0,23.79,0.15
1010 | CC1CC2C3CC(F)C4=CC(=O)C=CC4(C)C3(F)C(O)CC2(C)C1(O)C(=O)CO,Flumethasone,-3.539,1,410.4570000000002,3,4,2,94.83,-5.613
1011 | CCCCC(=O)C,2-Hexanone,-1.2,1,100.161,0,0,3,17.07,-0.8
1012 | CCNc1nc(NC(C)(C)C)nc(OC)n1,Terbumeton,-3.505,1,225.296,2,1,4,71.96000000000001,-3.239
1013 | CCCCC(C)CC,3-Methylheptane,-3.3080000000000003,1,114.232,0,0,4,0.0,-5.16
1014 | BrCCBr,"1,2-Dibromoethane",-2.102,1,187.862,0,0,1,0.0,-1.68
1015 | CNC(=O)Oc1ccccc1C(C)C,Isoprocarb,-2.734,1,193.246,1,1,2,38.33,-2.863
1016 | O=C1NCCN1c2ncc(s2)N(=O)=O,Niridazole,-1.948,1,214.206,1,2,2,88.37,-3.22
1017 | C1c2ccccc2c3ccc4ccccc4c13,Benzo(a)fluorene,-5.189,2,216.283,0,4,0,0.0,-6.68
1018 | COc1ccccc1Cl,2-Chloroanisole,-2.912,1,142.58499999999998,0,1,1,9.23,-2.46
1019 | COP(=S)(OC)Oc1cc(Cl)c(Br)cc1Cl,Bromophos,-5.604,1,366.0,0,1,4,27.69,-6.09
1020 | ClC(Cl)CC(=O)NC2=C(Cl)C(=O)c1ccccc1C2=O,Quinonamid,-3.988,1,332.57000000000005,1,2,3,63.24,-5.03
1021 | ClC(Cl)C(c1ccc(Cl)cc1)c2ccc(Cl)cc2 ,"P,P'-DDD",-6.007999999999999,1,320.04600000000005,0,2,3,0.0,-7.2
1022 | COC(=O)C=C,Methyl acrylate,-0.878,1,86.09,0,0,1,26.3,-0.22
1023 | CN(C)C(=O)Nc2ccc(Oc1ccc(Cl)cc1)cc2,Chloroxuron,-4.477,1,290.75,1,2,3,41.57000000000001,-4.89
1024 | N(=Nc1ccccc1)c2ccccc2,Azobenzene,-4.034,2,182.226,0,2,2,24.72,-4.45
1025 | CC(C)c1ccc(C)cc1,4-Isopropyltoluene,-3.617,1,134.22199999999998,0,1,1,0.0,-3.77
1026 | Oc1c(Cl)cccc1Cl,"2,6-Dichlorophenol",-3.012,1,163.003,1,1,0,20.23,-1.79
1027 | OCC2OC(OC1(CO)OC(CO)C(O)C1O)C(O)C(O)C2O ,Sucrose,0.31,1,342.297,8,2,5,189.53,0.79
1028 | OC1C(O)C(O)C(O)C(O)C1O,d-inositol,-0.887,1,180.156,6,1,0,121.38,0.35
1029 | Cn2c(=O)n(C)c1ncn(CC(O)CO)c1c2=O,Dyphylline,-0.847,1,254.24599999999995,2,2,3,102.28,-0.17
1030 | OCC(NC(=O)C(Cl)Cl)C(O)c1ccc(cc1)N(=O)=O,Chloramphenicol,-2.613,1,323.13200000000006,3,1,6,112.70000000000002,-2.111
1031 | CCC(O)(CC)CC,3-Ethyl-3-pentanol,-1.663,1,116.204,1,0,3,20.23,-0.85
1032 | CC45CCC2C(CCC3CC1SC1CC23C)C4CCC5O,Epitostanol,-4.545,1,306.51500000000004,1,5,0,20.23,-5.41
1033 | Brc1ccccc1Br,"1,2-Dibromobenzene",-4.172,1,235.906,0,1,0,0.0,-3.5
1034 | Oc1c(Cl)cc(Cl)cc1Cl,"2,4,6-Trichlorophenol",-3.648,1,197.448,1,1,0,20.23,-2.34
1035 | CCCN(CCC)c1c(cc(cc1N(=O)=O)S(N)(=O)=O)N(=O)=O,oryzalin,-3.784,1,346.3650000000001,1,1,8,149.67999999999998,-5.16
1036 | C2c1ccccc1N(CCF)C(=O)c3ccccc23 ,RTI 20,-3.663,1,255.292,0,3,2,20.31,-4.799
1037 | CC(C)C(=O)C(C)C,"2,4-Dimethyl-3-pentanone",-1.7519999999999998,1,114.18799999999996,0,0,2,17.07,-1.3
1038 | O=C1NC(=O)NC(=O)C1(C(C)C)CC=C(C)C,5-(3-Methyl-2-butenyl)-5-isoPrbarbital,-2.465,1,238.287,2,1,3,75.27000000000001,-2.593
1039 | c1c(O)C2C(=O)C3cc(O)ccC3OC2cc1(OC),gentisin,-1.2919999999999998,1,262.261,2,3,1,75.99000000000001,-2.943
1040 | Cn1cnc2n(C)c(=O)n(C)c(=O)c12,Caffeine,-1.4980000000000002,1,194.194,0,2,0,61.82,-0.8759999999999999
1041 | CC(=O)SC4CC1=CC(=O)CCC1(C)C5CCC2(C)C(CCC23CCC(=O)O3)C45,Spironolactone,-3.842,1,416.58300000000025,0,5,1,60.44,-4.173
1042 | Cc1ccc(O)cc1C,"3,4-Dimethylphenol",-2.6210000000000004,1,122.167,1,1,0,20.23,-1.38
1043 | O(c1ccccc1)c2ccccc2,Diphenyl ether ,-4.254,2,170.211,0,2,2,9.23,-3.96
1044 | Clc1cc(Cl)c(cc1Cl)c2cc(Cl)c(Cl)cc2Cl ,"2,2',4,4',5,5'-PCB",-7.343,1,360.88200000000006,0,2,1,0.0,-8.56
1045 | NC(=O)c1cccnc1 ,nicotinamide,-0.964,1,122.12699999999997,1,1,1,55.98,0.61
1046 | Sc1ccccc1,Thiophenol ,-2.758,1,110.18099999999995,1,1,0,0.0,-2.12
1047 | CNC(=O)Oc1cc(C)cc(C)c1,XMC,-2.688,1,179.219,1,1,1,38.33,-2.5810000000000004
1048 | ClC1CC2C(C1Cl)C3(Cl)C(=C(Cl)C2(Cl)C3(Cl)Cl)Cl,Chlordane,-6.039,1,409.7819999999999,0,3,0,0.0,-6.86
1049 | CSSC,Dimethyldisulfide,-1.524,1,94.204,0,0,1,0.0,-1.44
1050 | NC(=O)c1ccccc1,Benzamide,-1.501,1,121.13899999999995,1,1,1,43.09,-0.96
1051 | Clc1ccccc1Br,o-Chlorobromobenzene,-3.84,1,191.455,0,1,0,0.0,-3.19
1052 | COC(=O)c1ccccc1OC2OC(COC3OCC(O)C(O)C3O)C(O)C(O)C2O,Monotropitoside,-1.493,1,446.405,6,3,6,184.6,-0.742
1053 | CCCCC(O)CC,3-Heptanol ,-1.6780000000000002,1,116.20399999999998,1,0,4,20.23,-1.47
1054 | CCN2c1nc(C)cc(C)c1NC(=O)c3cccnc23 ,RTI 15,-3.891,1,268.32,1,3,1,58.120000000000005,-4.553999999999999
1055 | Oc1cc(Cl)cc(Cl)c1,"3,5-Dichlorophenol",-3.428,1,163.003,1,1,0,20.23,-1.34
1056 | Cc1cccc2c1ccc3ccccc32,1-Methylphenanthrene,-4.87,1,192.261,0,3,0,0.0,-5.85
1057 | CCCCC(CC)CO,2-Ethyl-1-hexanol,-2.089,1,130.231,1,0,5,20.23,-2.11
1058 | CC(C)N(C(C)C)C(=O)SCC(=CCl)Cl,Diallate,-3.827,1,270.225,0,0,4,20.31,-4.2860000000000005
1059 | Cc1ccccc1,Toluene ,-2.713,1,92.141,0,1,0,0.0,-2.21
1060 | Clc1cccc(n1)C(Cl)(Cl)Cl,Nitrapyrin,-3.833,1,230.909,0,1,0,12.89,-3.76
1061 | C1CCC=CCC1,Cycloheptene,-2.599,2,96.173,0,1,0,0.0,-3.18
1062 | CN(C)C(=S)SSC(=S)N(C)C ,Thiram,-2.444,1,240.444,0,0,0,6.48,-3.9
1063 | COC1=CC(=O)CC(C)C13Oc2c(Cl)c(OC)cc(OC)c2C3=O,Griseofulvin,-3.3280000000000003,1,352.7700000000001,0,3,3,71.06,-3.2460000000000004
1064 | CCCCCCCCCCO,1-Decanol,-2.814,1,158.285,1,0,8,20.23,-3.63
1065 | CCC(C)(C)CC,"3,3-Dimethylpentane",-2.938,1,100.20499999999998,0,0,2,0.0,-4.23
1066 | CNC(=O)C(C)SCCSP(=O)(OC)(OC),vamidothion,-1.446,1,287.343,1,0,8,64.63000000000001,1.144
1067 | Oc1cc(Cl)c(Cl)c(Cl)c1Cl,"2,3,4,5-Tetrachlorophenol",-4.335,1,231.893,1,1,0,20.23,-3.15
1068 | CCCC=O,Butyraldehyde,-0.7490000000000001,1,72.107,0,0,2,17.07,-0.01
1069 | CC4CC3C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC3(C)C4(O)C(=O)COC(C)=O ,dexamethasone acetate,-3.933,1,434.5040000000003,2,4,3,100.9,-4.9
1070 | CCCC,Butane,-1.907,1,58.124,0,0,1,0.0,-2.57
1071 | COc1ccccc1O,o-Methoxyphenol,-1.941,1,124.13899999999995,1,1,1,29.46,-1.96
1072 | CC1CC2C3CCC(O)(C(=O)C)C3(C)CC(O)C2(F)C4(C)C=CC(=O)C=C14,Fluoromethalone,-3.507,1,376.4680000000001,2,4,1,74.6,-4.099
1073 | ClC(Cl)C(Cl)(Cl)Cl,Pentachloroethane,-3.382,1,202.295,0,0,0,0.0,-2.6
1074 | CCOC(=O)c1ccccc1C(=O)OCC,Diethyl phthalate ,-3.016,1,222.23999999999995,0,1,4,52.60000000000001,-2.35
1075 | CC(C)CO,2-Methylpropan-1-ol,-0.672,1,74.12299999999999,1,0,1,20.23,0.1
1076 | CC(C)Cc1ccccc1,Isobutylbenzene,-3.57,1,134.22199999999998,0,1,2,0.0,-4.12
1077 | ICI,Diiodomethane,-2.958,1,267.835,0,0,0,0.0,-2.34
1078 | CCCC(O)CCC,4-Heptanol,-1.6780000000000002,1,116.204,1,0,4,20.23,-1.4
1079 | CCCCCOC(=O)C,Pentyl acetate,-1.833,1,130.18699999999998,0,0,4,26.3,-1.89
1080 | Oc1c(Cl)c(Cl)cc(Cl)c1Cl,"2,3,5,6-Tetrachlorophenol",-4.203,1,231.893,1,1,0,20.23,-3.37
1081 | CCCc1ccccc1,Propylbenzene ,-3.281,1,120.19499999999996,0,1,2,0.0,-3.37
1082 | FC(F)(Cl)C(F)(F)Cl,"1,2-Dichlorotetrafluoroethane",-2.697,1,170.92000000000002,0,0,1,0.0,-2.74
1083 | CC=CC=O,2-butenal,-0.604,1,70.09100000000001,0,0,1,17.07,0.32
1084 | CN(C)C(=O)N(C)C ,tetramethylurea,-0.495,1,116.164,0,0,0,23.550000000000004,0.94
1085 | Cc1cc(C)c(C)cc1C,"1,2,4,5-Tetramethylbenzene",-3.664,1,134.22199999999998,0,1,0,0.0,-4.59
1086 | CC(=O)OC3(CCC4C2CCC1=CC(=O)CCC1C2CCC34C)C#C,norethindrone acetate,-4.2410000000000005,1,340.4630000000001,0,4,1,43.370000000000005,-4.8
1087 | CCOP(=S)(OCC)N2C(=O)c1ccccc1C2=O,Ditalimfos,-3.992,1,299.28800000000007,0,2,5,55.84,-3.35
1088 | c1ccccc1NC(=O)c2c(O)cccc2,salicylanilide,-3.782,1,213.236,2,2,2,49.33,-3.59
1089 | CCN(CC)C(=S)SCC(Cl)=C,Sulfallate,-3.254,1,223.794,0,0,4,3.24,-3.39
1090 | ClCC,Chloroethane,-1.165,1,64.515,0,0,0,0.0,-1.06
1091 | CC(=O)Nc1cc(NS(=O)(=O)C(F)(F)F)c(C)cc1C,Mefluidide,-3.165,1,310.297,2,1,3,75.27000000000001,-3.24
1092 | O=C(C=CC=Cc2ccc1OCOc1c2)N3CCCCC3,Piperine,-3.659,1,285.343,0,3,3,38.77,-3.46
1093 | CC/C=C\C,cis-2-Pentene,-2.076,1,70.135,0,0,1,0.0,-2.54
1094 | CNC(=O)ON=C(CSC)C(C)(C)C ,thiofanox,-2.7,1,218.322,1,0,3,50.69,-1.62
1095 | O=C2NC(=O)C1(CCCCCCC1)C(=O)N2,Cyclooctyl-5-spirobarbituric acid,-2.2840000000000003,1,224.26,2,2,0,75.27,-2.982
1096 | c1(C(C)(C)C)cc(C(C)(C)C)cc(OC(=O)NC)c1,butacarb,-4.642,1,263.381,1,1,1,38.33,-4.24
1097 | Oc2cc(O)c1C(=O)CC(Oc1c2)c3ccc(O)c(O)c3,Eriodictyol,-3.152,1,288.255,4,3,1,107.22,-3.62
1098 | O=C(c1ccccc1)c2ccccc2,Benzophenone,-3.612,1,182.222,0,2,2,17.07,-3.12
1099 | CCCCCCCCCCCCCCCCCCCC,Eicosane,-7.576,1,282.5559999999999,0,0,17,0.0,-8.172
1100 | N(Nc1ccccc1)c2ccccc2 ,hydrazobenzene,-3.492,2,184.242,2,2,3,24.06,-2.92
1101 | CCC(CC)CO,2-Ethyl-1-butanol,-1.381,1,102.177,1,0,3,20.23,-1.17
1102 | Oc1ccncc1,4-hydroxypyridine,-1.655,1,95.10099999999998,1,1,0,33.120000000000005,1.02
1103 | Cl\C=C/Cl,"cis 1,2-Dichloroethylene",-1.561,1,96.94400000000002,0,0,0,0.0,-1.3
1104 | CC1CCCC1,Methylcyclopentane,-2.452,1,84.162,0,1,0,0.0,-3.3
1105 | CC(C)CC(C)O,4-Methyl-2-pentanol,-1.308,1,102.17699999999998,1,0,2,20.23,-0.8
1106 | O2c1ccc(N)cc1N(C)C(=O)c3cc(C)ccc23 ,RTI 11,-3.125,1,254.289,1,3,0,55.56,-3.928
1107 | CC(C)(C)CO,"2,2-Dimethylpropanol",-1.011,1,88.14999999999999,1,0,0,20.23,-0.4
1108 | CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n2cncn2,Triadimefon,-4.132,1,293.754,0,2,4,57.010000000000005,-3.61
1109 | Cc1cc(no1)C(=O)NNCc2ccccc2,Isocarboxazid,-2.251,1,231.255,2,2,4,67.16,-2.461
1110 | CC=C,Propylene,-1.235,1,42.081,0,0,0,0.0,-1.08
1111 | Oc1ccc(Cl)cc1Cc2cc(Cl)ccc2O,Dichlorophen,-4.924,1,269.127,2,2,2,40.46,-3.953
1112 | CCOC(=O)Nc2cccc(OC(=O)Nc1ccccc1)c2 ,Desmedipham,-4.182,1,300.314,2,2,4,76.66,-4.632
1113 | O=C1c2ccccc2C(=O)c3ccccc13,Anthraquinone,-3.34,1,208.216,0,3,0,34.14,-5.19
1114 | CCCCCCC(C)O,2-Octanol,-2.033,1,130.231,1,0,5,20.23,-2.09
1115 | CC1=C(C(=O)Nc2ccccc2)S(=O)(=O)CCO1,Oxycarboxin,-2.169,1,267.306,1,2,2,72.47,-2.281
1116 | CCCCc1ccccc1,Butylbenzene,-3.585,1,134.22199999999998,0,1,3,0.0,-4.06
1117 | O=C1NC(=O)C(=O)N1 ,parabanic acid,1.091,1,114.06,2,1,0,75.27,-0.4
1118 | COP(=S)(OC)Oc1ccc(Sc2ccc(OP(=S)(OC)OC)cc2)cc1,Abate,-6.678,1,466.47900000000016,0,2,10,55.38000000000001,-6.237
1119 | NS(=O)(=O)c1cc(ccc1Cl)C2(O)NC(=O)c3ccccc23,Chlorthalidone,-2.564,1,338.7720000000001,3,3,2,109.49,-3.451
1120 | CC(C)COC(=O)C,Isobutyl acetate,-1.463,1,116.15999999999998,0,0,2,26.3,-1.21
1121 | CC(C)C(C)(C)C,"2,2,3-Trimethylbutane",-2.922,1,100.20499999999998,0,0,0,0.0,-4.36
1122 | Clc1ccc(c(Cl)c1Cl)c2c(Cl)cc(Cl)c(Cl)c2Cl ,"2,3,3',4,4'6-PCB",-7.746,1,395.3270000000001,0,2,1,0.0,-7.66
1123 | N#Cc1ccccc1C#N,Phthalonitrile,-1.717,1,128.13399999999996,0,1,0,47.58,-2.38
1124 | Cc1cccc(c1)N(=O)=O,m-Nitrotoluene,-2.64,1,137.138,0,1,1,43.14,-2.44
1125 | FC(F)(F)C(Cl)Br ,halothane,-2.608,1,197.381,0,0,0,0.0,-1.71
1126 | CNC(=O)ON=C(SC)C(=O)N(C)C,Oxamyl,-0.908,1,219.266,1,0,1,70.99999999999999,0.106
1127 | CCSCCSP(=S)(OC)OC,Thiometon,-3.323,1,246.359,0,0,7,18.46,-3.091
1128 | CCC(C)C,2-Methylbutane,-2.245,1,72.151,0,0,1,0.0,-3.18
1129 | COP(=O)(OC)OC(=CCl)c1cc(Cl)c(Cl)cc1Cl,Stirofos,-4.32,1,365.96400000000006,0,1,5,44.760000000000005,-4.522
1130 |
--------------------------------------------------------------------------------
/data/finetuning_datasets/regression/esol/esol_mock.csv:
--------------------------------------------------------------------------------
1 | smiles,Compound ID,ESOL predicted log solubility in mols per litre,Minimum Degree,Molecular Weight,Number of H-Bond Donors,Number of Rings,Number of Rotatable Bonds,Polar Surface Area,measured log solubility in mols per litre
2 | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O ,Amigdalin,-0.974,1,457.4320000000001,7,3,7,202.32,-0.77
3 | Cc1occc1C(=O)Nc2ccccc2,Fenfuram,-2.885,1,201.225,1,2,2,42.24,-3.3
4 | CC(C)=CCCC(C)=CC(=O),citral,-2.579,1,152.237,0,0,4,17.07,-2.06
5 | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43,Picene,-6.617999999999999,2,278.354,0,5,0,0.0,-7.87
6 | c1ccsc1,Thiophene,-2.232,2,84.14299999999999,0,1,0,0.0,-1.33
7 | c2ccc1scnc1c2 ,benzothiazole,-2.733,2,135.191,0,2,0,12.89,-1.5
8 | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl,"2,2,4,6,6'-PCB",-6.545,1,326.437,0,2,1,0.0,-7.32
9 | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O,Estradiol,-4.138,1,272.388,2,4,0,40.46,-5.03
10 | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl,Dieldrin,-4.533,1,380.913,0,5,0,12.53,-6.29
11 |
--------------------------------------------------------------------------------
/data/finetuning_datasets/regression/freesolv/freesolv.csv:
--------------------------------------------------------------------------------
1 | smiles,freesolv
2 | CN(C)C(=O)c1ccc(cc1)OC,-11.01
3 | CS(=O)(=O)Cl,-4.87
4 | CC(C)C=C,1.83
5 | CCc1cnccn1,-5.45
6 | CCCCCCCO,-4.21
7 | Cc1cc(cc(c1)O)C,-6.27
8 | CC(C)C(C)C,2.34
9 | CCCC(C)(C)O,-3.92
10 | C[C@@H]1CCCC[C@@H]1C,1.58
11 | CC[C@H](C)O,-4.62
12 | C(Br)Br,-1.96
13 | CC[C@H](C(C)C)O,-3.88
14 | CCc1ccccn1,-4.33
15 | CCCCC(=O)OCC,-2.49
16 | c1ccc(cc1)S,-2.55
17 | CC(=CCC/C(=C\CO)/C)C,-4.78
18 | c1ccc2c(c1)CCC2,-1.46
19 | CCOc1ccccc1,-2.22
20 | c1cc(ccc1O)Br,-5.85
21 | CCCC(C)(C)C,2.88
22 | CC(=O)OCCOC(=O)C,-6.34
23 | CCOP(=S)(OCC)SCSP(=S)(OCC)OCC,-6.1
24 | C1CCCC(CC1)O,-5.48
25 | COC(=O)C1CC1,-4.1
26 | c1ccc(cc1)C#N,-4.1
27 | CCCCC#N,-3.52
28 | CC(C)(C)O,-4.47
29 | CC(C)C(=O)C(C)C,-2.74
30 | CCC=O,-3.43
31 | CN(C)C=O,-7.81
32 | Cc1ccc(cc1)C,-0.8
33 | C=CCC=C,0.93
34 | Cc1cccc(c1C)Nc2ccccc2C(=O)O,-6.78
35 | CN(C)C(=O)c1ccccc1,-9.29
36 | CCNCC,-4.07
37 | CC(C)(C)c1ccc(cc1)O,-5.91
38 | CC(C)CCOC=O,-2.13
39 | CCCCCCCCCCO,-3.64
40 | CCC(=O)OCC,-2.68
41 | CCCCCCCCC,3.13
42 | CC(=O)NC,-10
43 | CCCCCCCC=C,2.06
44 | c1ccc2cc(ccc2c1)O,-8.11
45 | c1cc(c(cc1Cl)Cl)Cl,-1.12
46 | C([C@H]([C@H]([C@@H]([C@@H](CO)O)O)O)O)O,-23.62
47 | CCCC(=O)OC,-2.83
48 | c1ccc(c(c1)C=O)O,-4.68
49 | C1CNC1,-5.56
50 | CCCNCCC,-3.65
51 | c1ccc(cc1)N,-5.49
52 | C(F)(F)(F)F,3.12
53 | CC[C@@H](C)CO,-4.42
54 | c1ccc(c(c1)O)I,-6.2
55 | COc1cccc(c1O)OC,-6.96
56 | CCC#C,-0.16
57 | c1ccc(cc1)C(F)(F)F,-0.25
58 | NN,-9.3
59 | Cc1ccccn1,-4.63
60 | CCNc1nc(nc(n1)Cl)NCC,-10.22
61 | c1ccc2c(c1)Oc3cc(c(cc3O2)Cl)Cl,-3.56
62 | CCCCCCCCN,-3.65
63 | N,-4.29
64 | c1ccc(c(c1)C(F)(F)F)C(F)(F)F,1.07
65 | COC(=O)c1ccc(cc1)O,-9.51
66 | CCCCCc1ccccc1,-0.23
67 | CC(F)F,-0.11
68 | c1ccc(cc1)n2c(=O)c(c(cn2)N)Cl,-16.43
69 | C=CC=C,0.56
70 | CN(C)C,-3.2
71 | CCCCCC(=O)N,-9.31
72 | CC(C)CO[N+](=O)[O-],-1.88
73 | c1ccc2c(c1)C(=O)c3cccc(c3C2=O)NCCO,-14.21
74 | C(CO[N+](=O)[O-])O,-8.18
75 | CCCCCCC(=O)C,-2.88
76 | CN1CCNCC1,-7.77
77 | CCN,-4.5
78 | C1C=CC=CC=C1,-0.99
79 | c1ccc2c(c1)Cc3ccccc3C2,-3.78
80 | CC(Cl)Cl,-0.84
81 | COc1cccc(c1)O,-7.66
82 | c1cc2cccc3c2c(c1)CC3,-3.15
83 | CCCCCCCCBr,0.52
84 | c1ccc(cc1)CO,-6.62
85 | c1c(c(=O)[nH]c(=O)[nH]1)Br,-18.17
86 | CCCC,2.1
87 | CCl,-0.55
88 | CC(C)CBr,-0.03
89 | CC(C)SC(C)C,-1.21
90 | CCCCCCC,2.67
91 | c1cnc[nH]1,-9.63
92 | c1cc2c(cc1Cl)Oc3cc(c(c(c3O2)Cl)Cl)Cl,-3.84
93 | CC[C@H](C)n1c(=O)c(c([nH]c1=O)C)Br,-9.73
94 | C(I)I,-2.49
95 | CCCN(CCC)C(=O)SCCC,-4.13
96 | C[N+](=O)[O-],-4.02
97 | CCOC,-2.1
98 | COC(CCl)(OC)OC,-4.59
99 | CC(C)C,2.3
100 | CC(C)CC(=O)O,-6.09
101 | CCOP(=O)(OCC)O/C(=C/Cl)/c1ccc(cc1Cl)Cl,-7.07
102 | CCCCl,-0.33
103 | CCCSCCC,-1.28
104 | CCC[C@H](CC)O,-4.06
105 | CC#N,-3.88
106 | CN(CC(F)(F)F)c1ccccc1,-1.92
107 | [C@@H](C(F)(F)F)(OC(F)F)Cl,0.1
108 | C=CCCC=C,1.01
109 | Cc1cccc(c1)C,-0.83
110 | CC(=O)OC,-3.13
111 | COC(c1ccccc1)(OC)OC,-4.04
112 | CCOC(=O)c1ccccc1,-3.64
113 | CCCS,-1.1
114 | CCCCCC(=O)C,-3.04
115 | CC1(Cc2cccc(c2O1)OC(=O)NC)C,-9.61
116 | c1ccc(cc1)CBr,-2.38
117 | CCCCCC(=O)OCC,-2.23
118 | CCCOC,-1.66
119 | CN1CCOCC1,-6.32
120 | c1cc(cc(c1)O)C#N,-9.65
121 | c1cc(c(cc1c2c(c(cc(c2Cl)Cl)Cl)Cl)Cl)Cl,-4.38
122 | CCCc1ccccc1,-0.53
123 | Cn1cnc2c1c(=O)n(c(=O)n2C)C,-12.64
124 | CNC,-4.29
125 | C(=C(F)F)(C(F)(F)F)F,2.93
126 | c1cc(ccc1O)Cl,-7.03
127 | C1CCNCC1,-5.11
128 | c1ccc2c(c1)ccc3c2cccc3,-3.88
129 | CI,-0.89
130 | COc1c(cc(c(c1O)OC)Cl)Cl,-6.44
131 | C(=C/Cl)\Cl,-0.78
132 | CCCCC,2.3
133 | CCCC#N,-3.64
134 | [C@@H](C(F)(F)F)(F)Br,0.5
135 | CC(C)Cc1cnccn1,-5.04
136 | CC[C@H](C)O[N+](=O)[O-],-1.82
137 | c1ccc(cc1)c2cc(ccc2Cl)Cl,-2.46
138 | c1ccc(cc1)c2cc(c(c(c2Cl)Cl)Cl)Cl,-3.48
139 | CC[C@@H](C)C(C)C,2.52
140 | C[C@H](CC(C)C)O,-3.73
141 | C1CCOCC1,-3.12
142 | C1CC1,0.75
143 | c1c(cc(c(c1Cl)Cl)Cl)c2cc(c(c(c2Cl)Cl)Cl)Cl,-3.17
144 | C=C(Cl)Cl,0.25
145 | CC(C)CO,-4.5
146 | CCCOC(=O)CC,-2.44
147 | C(C(Cl)(Cl)Cl)(Cl)(Cl)Cl,-0.64
148 | CSc1ccccc1,-2.73
149 | CCc1ccccc1O,-5.66
150 | CC(C)(C)Cl,1.09
151 | CC(=C)C=C,0.68
152 | Cc1ccc(cc1)C(C)C,-0.68
153 | Cn1ccnc1,-8.41
154 | C(CO)O,-9.3
155 | c1ccc(c(c1)Cl)Cl,-1.36
156 | c1c(=O)[nH]c(=O)[nH]c1Cl,-15.83
157 | CCCOC=O,-2.48
158 | c1ccc2c(c1)Oc3ccc(cc3O2)Cl,-3.1
159 | CCCCCC(=O)O,-6.21
160 | CCOC(=O)CCC(=O)OCC,-5.71
161 | Cc1ccnc(c1)C,-4.86
162 | C1CCC=CC1,0.14
163 | CN1CCN(CC1)C,-7.58
164 | c1cc(c(cc1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl,-3.04
165 | C1=CC(=O)C=CC1=O,-6.5
166 | COC(=O)CCl,-4
167 | CCCC=O,-3.18
168 | CCc1ccccc1,-0.79
169 | C(=C(Cl)Cl)Cl,-0.44
170 | CCN(CC)CC,-3.22
171 | c1cc2c(cc1Cl)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-4.15
172 | Cc1ccncc1C,-5.22
173 | c1(=O)[nH]c(=O)[nH]c(=O)[nH]1,-18.06
174 | c1ccc(cc1)C=O,-4.02
175 | c1ccnc(c1)Cl,-4.39
176 | C=CCCl,-0.57
177 | Cc1ccc(cc1)C(=O)C,-4.7
178 | C=O,-2.75
179 | Cc1ccccc1Cl,-1.14
180 | CC(=O)N1CCCC1,-9.8
181 | CC(OC)(OC)OC,-4.42
182 | CCCCc1ccccc1,-0.4
183 | CN(C)c1ccccc1,-3.45
184 | CC(C)OC,-2.01
185 | c12c(c(c(c(c1Cl)Cl)Cl)Cl)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-4.53
186 | c1(c(c(c(c(c1Cl)Cl)Cl)Cl)Cl)c2c(c(c(c(c2Cl)Cl)Cl)Cl)Cl,-2.98
187 | C(C(Cl)Cl)Cl,-1.99
188 | CNc1ccccc1,-4.69
189 | CC(C)OC(=O)C,-2.64
190 | c1ccccc1,-0.9
191 | c1cc(c(c(c1)Cl)Cl)Cl,-1.24
192 | CCOP(=S)(OCC)SCSc1ccc(cc1)Cl,-6.5
193 | COP(=S)(OC)SCn1c(=O)c2ccccc2nn1,-10.03
194 | c1ccc2c(c1)Oc3c(cc(c(c3O2)Cl)Cl)Cl,-4.05
195 | CC(=C)C(=C)C,0.4
196 | CCCCC=C,1.58
197 | S,-0.7
198 | CCOCC,-1.59
199 | CCNc1nc(nc(n1)SC)NC(C)C,-7.65
200 | CCCCOC(=O)c1ccc(cc1)O,-8.72
201 | CCCCCCOC(=O)C,-2.26
202 | C1CCC(=O)C1,-4.7
203 | CCCCC(=O)O,-6.16
204 | CCBr,-0.74
205 | Cc1ccc2cc(ccc2c1)C,-2.63
206 | CCCCCCO,-4.4
207 | c1ccc(cc1)c2ccccc2Cl,-2.69
208 | CC1=CCCCC1,0.67
209 | CCCCCCO[N+](=O)[O-],-1.66
210 | C(Br)(Br)Br,-2.13
211 | CCc1ccc(cc1)O,-6.13
212 | CCCOCCO,-6.4
213 | c1ccc(cc1)OC=O,-3.82
214 | c1c(c(=O)[nH]c(=O)[nH]1)I,-18.72
215 | CCCC(=O)O,-6.35
216 | COC(C(F)(F)F)(OC)OC,-0.8
217 | C1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O,-20.52
218 | C(F)(F)(F)Br,1.79
219 | CCCCO,-4.72
220 | c1ccc(cc1)F,-0.8
221 | CCOC(=O)C,-2.94
222 | CC(C)COC(=O)C(C)C,-1.69
223 | CC(C)(C)OC,-2.21
224 | C1=C[C@@H]([C@@H]2[C@H]1[C@@]3(C(=C([C@]2(C3(Cl)Cl)Cl)Cl)Cl)Cl)Cl,-2.55
225 | CCC(=O)CC,-3.41
226 | COC(=O)C(F)(F)F,-1.1
227 | c1ccc2ccccc2c1,-2.4
228 | c1cc(c(c(c1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl)Cl,-4.4
229 | CC(=O)Oc1ccccc1C(=O)O,-9.94
230 | CC(=O)C(C)(C)C,-3.11
231 | COS(=O)(=O)C,-4.87
232 | CCc1ccncc1,-4.73
233 | CC(C)NC(C)C,-3.22
234 | c1cc2c(cc1Cl)Oc3ccc(cc3O2)Cl,-3.67
235 | CCCCCCCN,-3.79
236 | CC1CCCC1,1.59
237 | CCC,2
238 | C[C@H]1CCCO1,-3.3
239 | CNC(=O)Oc1cccc2c1cccc2,-9.45
240 | c1cc(cc(c1)O)C=O,-9.52
241 | c1ccc2cc3ccccc3cc2c1,-3.95
242 | C(Cl)Cl,-1.31
243 | CC(C)(C)C(=O)OC,-2.4
244 | C([N+](=O)[O-])(Cl)(Cl)Cl,-1.45
245 | C1CC[S+2](C1)([O-])[O-],-8.61
246 | Cc1cccc(c1O)C,-5.26
247 | Cc1cccc(c1)O,-5.49
248 | c1ccc2c(c1)C(=O)c3c(ccc(c3C2=O)O)N,-9.53
249 | c1ccc2c(c1)C(=O)c3c(ccc(c3C2=O)N)N,-11.85
250 | CCCCCCCC(=O)C,-2.49
251 | CCCCN,-4.24
252 | CCCC(=O)OCC,-2.49
253 | Cc1ccc(cc1)N,-5.57
254 | CCCCCCI,0.08
255 | C(C(F)(Cl)Cl)(F)(F)Cl,1.77
256 | COP(=O)(OC)OC,-8.7
257 | c1cc(cc(c1)Cl)Cl,-0.98
258 | Cc1cc(c2ccccc2c1)C,-2.47
259 | CCCC(C)C,2.51
260 | CCOP(=S)(OCC)Oc1c(cc(c(n1)Cl)Cl)Cl,-5.04
261 | C(C(F)(F)F)Cl,0.06
262 | C=C,1.28
263 | CCCCCI,-0.14
264 | COC(OC)OC,-4.42
265 | CCCCCCCCCC,3.16
266 | C[C@@H](CO[N+](=O)[O-])O[N+](=O)[O-],-4.95
267 | CC=C,1.32
268 | Cc1c[nH]c2c1cccc2,-5.88
269 | COP(=O)([C@H](C(Cl)(Cl)Cl)O)OC,-12.74
270 | C1CCCCC1,1.23
271 | CC(=CCC/C(=C/CO)/C)C,-4.45
272 | CC(C)c1ccccc1,-0.3
273 | CC(C)C(C)C(C)C,2.56
274 | CC(C)C(=O)C,-3.24
275 | CCCCNCCCC,-3.24
276 | CCCCS,-0.99
277 | c1ccc2c(c1)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-3.81
278 | COc1c(c(c(c(c1Cl)C=O)Cl)OC)O,-8.68
279 | C1CCC(CC1)N,-4.59
280 | C(F)(F)Cl,-0.5
281 | COC(=O)c1ccc(cc1)[N+](=O)[O-],-6.88
282 | CC(=O)c1cccnc1,-8.26
283 | CC#C,-0.48
284 | CCCCCCCCC=O,-2.07
285 | CCC(=O)O,-6.46
286 | C(Cl)(Cl)Cl,-1.08
287 | Cc1cccc(c1C)C,-1.21
288 | C,2
289 | c1ccc(cc1)CCl,-1.93
290 | CC1CCCCC1,1.7
291 | Cc1cccs1,-1.38
292 | c1ccncc1,-4.69
293 | CCCCCl,-0.16
294 | C[C@H]1CC[C@@H](O1)C,-2.92
295 | Cc1ccc(c(c1)OC)O,-5.8
296 | C1[C@H]([C@@H]2[C@H]([C@H]1Cl)[C@]3(C(=C([C@@]2(C3(Cl)Cl)Cl)Cl)Cl)Cl)Cl,-3.44
297 | Cc1ccccc1,-0.9
298 | CC(C)COC=O,-2.22
299 | CCOC(=O)c1ccc(cc1)O,-9.2
300 | CCOCCOCC,-3.54
301 | CCCCCOC(=O)CC,-2.11
302 | CCCc1ccc(cc1)O,-5.21
303 | CC=C(C)C,1.31
304 | C(CCl)Cl,-1.79
305 | CCC(C)(C)CC,2.56
306 | Cc1cc2ccccc2cc1C,-2.78
307 | Cc1cccc(n1)C,-4.59
308 | COC(C(Cl)Cl)(F)F,-1.12
309 | CCOCCOC(=O)C,-5.31
310 | COc1cccc(c1)N,-7.29
311 | c1cc(cnc1)C=O,-7.1
312 | CCC(C)(C)O,-4.43
313 | CCc1cccc(c1N(COC)C(=O)CCl)CC,-8.21
314 | Cn1cccc1,-2.89
315 | COCOC,-2.93
316 | CCC(CC)O,-4.35
317 | CCCCCCCCCC(=O)C,-2.15
318 | C(CBr)Cl,-1.95
319 | c1ccc(cc1)I,-1.74
320 | CC1=CC(=O)CC(C1)(C)C,-5.18
321 | CCI,-0.74
322 | CCCc1ccc(c(c1)OC)O,-5.26
323 | CC(C)Br,-0.48
324 | Cc1ccc(cc1)Br,-1.39
325 | c1cc(ccc1C#N)O,-10.17
326 | CS(=O)(=O)C,-10.08
327 | CCc1cccc(c1)O,-6.25
328 | CC1=CC[C@H](C[C@@H]1O)C(=C)C,-4.44
329 | c1cc(ccc1Br)Br,-2.3
330 | COc1c(ccc(c1C(=O)O)Cl)Cl,-9.86
331 | CC/C=C\C,1.31
332 | CC,1.83
333 | COc1ccccc1OC,-5.33
334 | CCSCC,-1.46
335 | c1cc(cnc1)C#N,-6.75
336 | c1cc(c(cc1O)Cl)Cl,-7.29
337 | COc1ccccc1,-2.45
338 | Cc1ccc(c(c1)O)C,-5.91
339 | c1cc(ccc1Cl)Cl,-1.01
340 | C(F)Cl,-0.77
341 | CCCC=C,1.68
342 | c1cc(c(c(c1Cl)Cl)Cl)Cl,-1.34
343 | CCCCCC#C,0.6
344 | CCCCCCCCC(=O)C,-2.34
345 | c1ccc(cc1)Cl,-1.12
346 | CN(C)CCOC(c1ccccc1)c2ccccc2,-9.34
347 | CCCCC=O,-3.03
348 | c1ccc(cc1)Oc2ccccc2,-2.87
349 | C1CCC(=O)CC1,-4.91
350 | CCCC[N+](=O)[O-],-3.09
351 | c1cnccc1C=O,-7
352 | C(CCl)OCCCl,-4.23
353 | CC[N+](=O)[O-],-3.71
354 | c1cc(cnc1)Cl,-4.01
355 | CBr,-0.82
356 | CO,-5.1
357 | CCCCCCC=O,-2.67
358 | c1cc(c(c(c1)Cl)c2c(cccc2Cl)Cl)Cl,-2.28
359 | c1ccc(c(c1)N)[N+](=O)[O-],-7.37
360 | CN1CCCCC1,-3.88
361 | CCCCCCCC=O,-2.29
362 | c1ccc(cc1)[N+](=O)[O-],-4.12
363 | C[C@@H]1CC[C@H](C(=O)C1)C(C)C,-2.53
364 | C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O,-25.47
365 | CF,-0.22
366 | CS(=O)C,-9.28
367 | c1ccc2c(c1)Oc3ccccc3O2,-3.15
368 | Cc1ccccc1N,-5.53
369 | CCCCBr,-0.4
370 | CCCCCCCCCO,-3.88
371 | Cc1ccncc1,-4.93
372 | C(=C(Cl)Cl)(Cl)Cl,0.1
373 | CC(C)(C)Br,0.84
374 | C=C(c1ccccc1)c2ccccc2,-2.78
375 | CCc1ccc(cc1)C,-0.95
376 | Cc1cccnc1,-4.77
377 | COCC(OC)(OC)OC,-5.73
378 | c1ccc-2c(c1)Cc3c2cccc3,-3.35
379 | CC(=O)N,-9.71
380 | COS(=O)(=O)OC,-5.1
381 | C(C(Cl)Cl)(Cl)Cl,-2.37
382 | COC(=O)C1CCCCC1,-3.3
383 | CCCCCCBr,0.18
384 | CCCCCCCBr,0.34
385 | c1ccc2c(c1)Oc3cccc(c3O2)Cl,-3.52
386 | COC(CC#N)(OC)OC,-6.4
387 | CC[C@H](C)Cl,0
388 | CCCCCCc1ccccc1,-0.04
389 | COc1cc(c(c(c1O)OC)Cl)C=O,-7.78
390 | c1cc(cc(c1)C(F)(F)F)C(F)(F)F,1.07
391 | c1ccc(cc1)Cn2ccnc2,-7.63
392 | c1ccc2c(c1)cccc2N,-7.28
393 | CCOC(=O)CC(=O)OCC,-6
394 | CC(=O)C1CC1,-4.61
395 | c1cc[nH]c1,-4.78
396 | c1cc(c(cc1c2ccc(cc2F)F)C(=O)O)O,-9.4
397 | CC1CCC(CC1)C,2.11
398 | C1CCC(CC1)O,-5.46
399 | CN(C)CCC=C1c2ccccc2CCc3c1cccc3,-7.43
400 | c1cc(ccc1O)F,-6.19
401 | c1ccc(c(c1)N)Cl,-4.91
402 | Cc1ccc(c(c1)C)C,-0.86
403 | CCc1ccccc1C,-0.85
404 | C[C@@H]1CC[C@H](CC1=O)C(=C)C,-3.75
405 | c1ccc(cc1)c2ccccc2,-2.7
406 | Cc1cccc(c1C)O,-6.16
407 | COP(=S)(OC)Oc1ccc(cc1)[N+](=O)[O-],-7.19
408 | CCOP(=S)(OCC)Oc1ccc(cc1)[N+](=O)[O-],-6.74
409 | CCN(CC)c1c(cc(c(c1[N+](=O)[O-])N)C(F)(F)F)[N+](=O)[O-],-5.66
410 | CSC,-1.61
411 | C[C@@H](c1cccc(c1)C(=O)c2ccccc2)C(=O)O,-10.78
412 | C1CCC(C1)O,-5.49
413 | CCCCC(=O)OC,-2.56
414 | CCCC(=C)C,1.47
415 | C[C@@H](c1ccc(c(c1)F)c2ccccc2)C(=O)O,-8.42
416 | CCCN(CCC)c1c(cc(cc1[N+](=O)[O-])S(=O)(=O)C)[N+](=O)[O-],-7.98
417 | C=CCl,-0.59
418 | Cc1ccc(cc1)C(=O)N(C)C,-9.76
419 | CCCC(=O)CCC,-2.92
420 | COC(=O)c1ccccc1,-3.92
421 | Cc1ccc(cc1)C=O,-4.27
422 | CCCC(=O)OCCC,-2.28
423 | C1CNCCN1,-7.4
424 | CCOP(=S)(OCC)S[C@@H](CCl)N1C(=O)c2ccccc2C1=O,-5.74
425 | CCOCCO,-6.69
426 | CCC(C)CC,2.51
427 | Cc1cnccn1,-5.51
428 | CCC[N+](=O)[O-],-3.34
429 | Cc1cc(cc(c1)C)C,-0.9
430 | c1c(c(=O)[nH]c(=O)[nH]1)F,-16.92
431 | CCO,-5
432 | Cc1ccc(c2c1cccc2)C,-2.82
433 | c1c2c(cc(c1Cl)Cl)Oc3cc(c(cc3O2)Cl)Cl,-3.37
434 | c1cc(c(c(c1)Cl)C#N)Cl,-4.71
435 | CCOC=O,-2.56
436 | c1c(c(cc(c1Cl)Cl)Cl)Cl,-1.34
437 | CCOC(OCC)Oc1ccccc1,-5.23
438 | c1cc(cc(c1)O)[N+](=O)[O-],-9.62
439 | CCCCCCCCO,-4.09
440 | CCC=C,1.38
441 | C(Cl)(Cl)(Cl)Cl,0.08
442 | c1ccc(cc1)CCO,-6.79
443 | CN(C)C(=O)Nc1ccccc1,-9.13
444 | CSSC,-1.83
445 | C1C=CC[C@@H]2[C@@H]1C(=O)N(C2=O)SC(Cl)(Cl)Cl,-9.01
446 | CC(=O)OCC(COC(=O)C)OC(=O)C,-8.84
447 | COC,-1.91
448 | CCCCCC,2.48
449 | C(CBr)Br,-2.33
450 | C(C(Cl)(Cl)Cl)(Cl)Cl,-1.23
451 | c1c(c(=O)[nH]c(=O)[nH]1)C(F)(F)F,-15.46
452 | Cc1cccc(c1N)C,-5.21
453 | CCCOC(=O)C,-2.79
454 | c1ccc2c(c1)cccn2,-5.72
455 | CCS,-1.14
456 | CCSSCC,-1.64
457 | c1ccsc1,-1.4
458 | CCc1cccc2c1cccc2,-2.4
459 | CCCC(=O)C,-3.52
460 | c1c(c(c(c(c1Cl)Cl)Cl)Cl)c2c(cc(c(c2Cl)Cl)Cl)Cl,-4.61
461 | CCC[N@@](CC1CC1)c2c(cc(cc2[N+](=O)[O-])C(F)(F)F)[N+](=O)[O-],-2.45
462 | CC(=O)O,-6.69
463 | CC=O,-3.5
464 | c1cc(cc(c1)[N+](=O)[O-])N,-8.84
465 | CCCCC#C,0.29
466 | COc1ccccc1N,-6.12
467 | c1ccc(cc1)O,-6.6
468 | CCC#N,-3.84
469 | c1ccc2c(c1)cccc2O,-7.67
470 | CCCCOC(=O)C,-2.64
471 | CC(C)(/C=N\OC(=O)NC)SC,-9.84
472 | Cc1ccccc1O,-5.9
473 | CC(C)C=O,-2.86
474 | CCC(=O)N,-9.4
475 | CCCBr,-0.56
476 | CC(C)Cl,-0.25
477 | C(CCl)CCl,-1.89
478 | c1cc(ccc1[N+](=O)[O-])O,-10.64
479 | C[C@@H](CCl)Cl,-1.27
480 | c1cc(ccc1N)Cl,-5.9
481 | c1ccc2c(c1)C(=O)c3cccc(c3C2=O)N,-9.44
482 | Cc1cccnc1C,-4.82
483 | c1cnccc1C#N,-6.02
484 | CCOP(=S)(OCC)SCSCC,-4.37
485 | CC(=O)C1CCCCC1,-3.9
486 | Cc1ccccc1C=O,-3.93
487 | CC(=O)c1ccncc1,-7.62
488 | c1c2c(cc(c1Cl)Cl)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-3.71
489 | CC(=O)C,-3.8
490 | CC(=C)C,1.16
491 | c1cc(c(cc1Cl)c2cc(c(c(c2)Cl)Cl)Cl)Cl,-3.61
492 | CCCCC[N+](=O)[O-],-2.82
493 | CCC/C=C/C=O,-3.68
494 | CN(C)C(=O)c1ccc(cc1)[N+](=O)[O-],-11.95
495 | C1CCOC1,-3.47
496 | CCCCCCCC,2.88
497 | CCCN(CCC)c1c(cc(cc1[N+](=O)[O-])C(F)(F)F)[N+](=O)[O-],-3.25
498 | CC(=CCC[C@](C)(C=C)OC(=O)C)C,-2.49
499 | C[C@@H](CCO[N+](=O)[O-])O[N+](=O)[O-],-4.29
500 | CC(C)OC(C)C,-0.53
501 | CCCCC(C)C,2.93
502 | c1(c(c(c(c(c1Cl)Cl)Cl)Cl)Cl)N(=O)=O,-5.22
503 | [C@@H](C(F)(F)F)(Cl)Br,-0.11
504 | CCCCOCCCC,-0.83
505 | CCCCCC1CCCC1,2.55
506 | CC(C)CC(C)C,2.83
507 | Cc1ccc(nc1)C,-4.72
508 | C/C=C/C=O,-4.22
509 | CCC[C@H](C)CC,2.71
510 | c1cc(c(c(c1)Cl)c2c(cc(cc2Cl)Cl)Cl)Cl,-1.96
511 | c1ccc(cc1)O[C@@H](C(F)F)F,-1.29
512 | COCCOC,-4.84
513 | CC[C@H](C)c1ccccc1,-0.45
514 | c1ccc(cc1)CCCO,-6.92
515 | CC[C@@H](C)c1cc(cc(c1O)[N+](=O)[O-])[N+](=O)[O-],-6.23
516 | COc1ccc(cc1)C(=O)OC,-5.33
517 | CCC(=O)Nc1ccc(c(c1)Cl)Cl,-7.78
518 | C[C@@H](c1ccc2cc(ccc2c1)OC)C(=O)O,-10.21
519 | C1(C(C(C1(F)F)(F)F)(F)F)(F)F,3.43
520 | CC(C)CCOC(=O)C,-2.21
521 | CCCCCCCl,0
522 | CC(C)CC(=O)C,-3.05
523 | CCCCCC=O,-2.81
524 | c1cc(cc(c1)Cl)N,-5.82
525 | C1COCCN1,-7.17
526 | CCOC(C)OCC,-3.28
527 | CCCC[N@](CC)c1c(cc(cc1[N+](=O)[O-])C(F)(F)F)[N+](=O)[O-],-3.51
528 | CS,-1.2
529 | C1[C@@H]2[C@H](COS(=O)O1)[C@@]3(C(=C([C@]2(C3(Cl)Cl)Cl)Cl)Cl)Cl,-4.23
530 | CC(=O)c1ccc(cc1)OC,-4.4
531 | C=CCO,-5.03
532 | CCSC,-1.5
533 | CCCCCOC(=O)C,-2.51
534 | c1c(cc(c(c1Cl)Cl)Cl)Cl,-1.62
535 | CC(=O)c1ccccc1,-4.58
536 | CCCl,-0.63
537 | CCCC1CCCC1,2.13
538 | c1c(cc(cc1Cl)Cl)Cl,-0.78
539 | CCCOC(=O)c1ccc(cc1)O,-9.37
540 | c1cc(cc(c1)Cl)O,-6.62
541 | CC(C)CCO,-4.42
542 | CCCCCN,-4.09
543 | Cc1c(c(=O)n(c(=O)[nH]1)C(C)(C)C)Cl,-11.14
544 | CC(C)CCC(C)(C)C,2.93
545 | CCCCOCCO,-6.25
546 | C1[C@@H]2[C@H]3[C@@H]([C@H]1[C@H]4[C@@H]2O4)[C@@]5(C(=C([C@]3(C5(Cl)Cl)Cl)Cl)Cl)Cl,-4.82
547 | c1ccc(cc1)C(=O)N,-11
548 | CC(C)[N+](=O)[O-],-3.13
549 | C(C(CO)O)O,-13.43
550 | CCCI,-0.53
551 | COCCN,-6.55
552 | C(C(Cl)(Cl)Cl)Cl,-1.43
553 | CCC(=O)OC,-2.93
554 | C1CCCC1,1.2
555 | CCc1cccnc1,-4.59
556 | Cc1cc(cnc1)C,-4.84
557 | COCCO,-6.62
558 | COC=O,-2.78
559 | c1ccc2cc(ccc2c1)N,-7.47
560 | Cc1c[nH]cn1,-10.27
561 | Cc1cccc(c1)[N+](=O)[O-],-3.45
562 | C(CCCl)CCl,-2.32
563 | CC(=O)CO[N+](=O)[O-],-5.99
564 | CC(C)(C)c1ccccc1,-0.44
565 | CCCCCC(=O)OC,-2.49
566 | C[C@@H](C(F)(F)F)O,-4.16
567 | CCCCCBr,-0.1
568 | CCCCCCC=C,1.92
569 | CC1=CC(=O)[C@@H](CC1)C(C)C,-4.51
570 | CC(C)O,-4.74
571 | CCCCCCN,-3.95
572 | C(CO[N+](=O)[O-])CO[N+](=O)[O-],-4.8
573 | Cc1ccc(c(c1)C)O,-6.01
574 | CCCCCO,-4.57
575 | CCC[C@@H](C)O,-4.39
576 | CCCC[C@@H](C)CC,2.97
577 | C[C@@H](c1ccc(cc1)CC(C)C)C(=O)O,-7
578 | CCOC(=O)C[C@H](C(=O)OCC)SP(=S)(OC)OC,-8.15
579 | Cc1ccc(cc1C)O,-6.5
580 | Cc1cc(ccc1Cl)O,-6.79
581 | CCCC/C=C/C,1.68
582 | CCCOCCC,-1.16
583 | C[C@@H]1CC[C@H]([C@@H](C1)O)C(C)C,-3.2
584 | CCNc1nc(nc(n1)SC)NC(C)(C)C,-6.68
585 | CC(C)CC(C)(C)C,2.89
586 | CCCCC(=O)CCCC,-2.64
587 | CCCCN(CC)C(=O)SCCC,-3.64
588 | CCCCCC=C,1.66
589 | CC(C)OC=O,-2.02
590 | CC(OC(=O)C)OC(=O)C,-4.97
591 | c1c(c(=O)[nH]c(=O)[nH]1)Cl,-17.74
592 | CC(=C)c1ccccc1,-1.24
593 | CCC(C)C,2.38
594 | CCCCO[N+](=O)[O-],-2.09
595 | c1ccc(cc1)Br,-1.46
596 | CC(Cl)(Cl)Cl,-0.19
597 | CC(=C)[C@H]1CCC(=CC1)C=O,-4.09
598 | Cc1ccccc1[N+](=O)[O-],-3.58
599 | CCCCCCCI,0.27
600 | c1cc2ccc3cccc4c3c2c(c1)cc4,-4.52
601 | CCCCCCl,-0.1
602 | CC(C)COC(=O)C,-2.36
603 | CCC(C)(C)C,2.51
604 | c1cc(ccc1N)N(=O)=O,-9.82
605 | COC(=O)CC#N,-6.72
606 | COc1ccc(cc1)N,-7.48
607 | CC(C)Cc1ccccc1,0.16
608 | c1ccc(cc1)c2c(cc(cc2Cl)Cl)Cl,-2.16
609 | CN,-4.55
610 | c1ccc(c(c1)O)Cl,-4.55
611 | c1ccc2c(c1)C(=O)c3ccc(cc3C2=O)N,-11.53
612 | C(=C\Cl)\Cl,-1.17
613 | CCCCC(=O)C,-3.28
614 | C(CO[N+](=O)[O-])O[N+](=O)[O-],-5.73
615 | c1ccc(c(c1)O)F,-5.29
616 | Cc1c(nc(nc1OC(=O)N(C)C)N(C)C)C,-9.41
617 | C=Cc1ccccc1,-1.24
618 | CCOP(=O)(OCC)OCC,-7.5
619 | C(C(F)(F)F)O,-4.31
620 | CCCCOC[C@H](C)O,-5.73
621 | CCCO,-4.85
622 | Cc1ccccc1C,-0.9
623 | CC(C)(C)C,2.51
624 | CCCC#C,0.01
625 | c1ccc2c(c1)C(=O)NC2=O,-9.61
626 | CCCCI,-0.25
627 | Cc1ccc(cc1)O,-6.13
628 | CC(C)I,-0.46
629 | COc1ccccc1O,-5.94
630 | C1CC=CC1,0.56
631 | C[C@H](C(F)(F)F)O,-4.2
632 | CCCN,-4.39
633 | c1ccc(c(c1)[N+](=O)[O-])O,-4.58
634 | Cc1cccc2c1cccc2,-2.44
635 | c1(c(c(c(c(c1Cl)Cl)Cl)Cl)Cl)Cl,-2.33
636 | CCCCC/C=C/C=O,-3.43
637 | CCCCCCC#C,0.71
638 | CCOP(=S)(OCC)Oc1cc(nc(n1)C(C)C)C,-6.48
639 | CCCCCCCC(=O)OC,-2.04
640 | C1CCNC1,-5.48
641 | c1cc(ccc1C=O)O,-8.83
642 | CCCCCCCCl,0.29
643 | C1COCCO1,-5.06
644 |
--------------------------------------------------------------------------------
/data/molecule_dataset_selfies.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HUBioDataLab/SELFormer/65e686feb72185cc95f5b81176444e20586848e1/data/molecule_dataset_selfies.zip
--------------------------------------------------------------------------------
/data/molecule_dataset_smiles.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HUBioDataLab/SELFormer/65e686feb72185cc95f5b81176444e20586848e1/data/molecule_dataset_smiles.zip
--------------------------------------------------------------------------------
/data/pretraining_hyperparameters.yml:
--------------------------------------------------------------------------------
1 | my_set0: &molbert
2 |   HIDDEN_SIZE: 768
3 |   TRAIN_BATCH_SIZE: 16
4 |   VALID_BATCH_SIZE: 8
5 |   TRAIN_EPOCHS: 100
6 |   LEARNING_RATE: 0.00005
7 |   WEIGHT_DECAY: 0.01
8 |   MAX_LEN: 128
9 |   VOCAB_SIZE: 800
10 |   MAX_POSITION_EMBEDDINGS: 514
11 |   NUM_ATTENTION_HEADS: 12
12 |   NUM_HIDDEN_LAYERS: 8
13 |   TYPE_VOCAB_SIZE: 1
14 | my_set1: &chemberta
15 |   HIDDEN_SIZE: 768
16 |   TRAIN_BATCH_SIZE: 16
17 |   VALID_BATCH_SIZE: 8
18 |   TRAIN_EPOCHS: 100
19 |   LEARNING_RATE: 0.00005
20 |   WEIGHT_DECAY: 0.01
21 |   MAX_LEN: 128
22 |   VOCAB_SIZE: 800
23 |   MAX_POSITION_EMBEDDINGS: 514
24 |   NUM_ATTENTION_HEADS: 6
25 |   NUM_HIDDEN_LAYERS: 6
26 |   TYPE_VOCAB_SIZE: 1
27 | my_set2: &run0-set30
28 |   HIDDEN_SIZE: 768
29 |   TRAIN_BATCH_SIZE: 16
30 |   VALID_BATCH_SIZE: 8
31 |   TRAIN_EPOCHS: 100
32 |   LEARNING_RATE: 0.00005
33 |   WEIGHT_DECAY: 0.01
34 |   MAX_LEN: 128
35 |   VOCAB_SIZE: 800
36 |   MAX_POSITION_EMBEDDINGS: 514
37 |   NUM_ATTENTION_HEADS: 4
38 |   NUM_HIDDEN_LAYERS: 12
39 |   TYPE_VOCAB_SIZE: 1
40 |
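The three hyperparameter sets share every value except the number of attention heads and hidden layers. As a quick sanity check, the sketch below computes the per-head width for each set, since a RoBERTa-style encoder needs the hidden size to divide evenly by the head count. The dictionary simply re-types the values listed above, and `head_dim` is a hypothetical helper, not part of the repository.

```python
# Values re-typed from the hyperparameter sets above; only HIDDEN_SIZE and the
# two fields that differ between the sets are kept.
HPARAM_SETS = {
    "my_set0": {"HIDDEN_SIZE": 768, "NUM_ATTENTION_HEADS": 12, "NUM_HIDDEN_LAYERS": 8},
    "my_set1": {"HIDDEN_SIZE": 768, "NUM_ATTENTION_HEADS": 6, "NUM_HIDDEN_LAYERS": 6},
    "my_set2": {"HIDDEN_SIZE": 768, "NUM_ATTENTION_HEADS": 4, "NUM_HIDDEN_LAYERS": 12},
}


def head_dim(hparams):
    """Hypothetical helper: width of one attention head for a hyperparameter set."""
    hidden = hparams["HIDDEN_SIZE"]
    heads = hparams["NUM_ATTENTION_HEADS"]
    if hidden % heads != 0:
        raise ValueError(f"HIDDEN_SIZE {hidden} is not divisible by {heads} heads")
    return hidden // heads


HEAD_DIMS = {name: head_dim(hparams) for name, hparams in HPARAM_SETS.items()}

for name, dim in HEAD_DIMS.items():
    layers = HPARAM_SETS[name]["NUM_HIDDEN_LAYERS"]
    print(f"{name}: {layers} layers, {dim} dimensions per head")
```

All three sets pass the check; fewer heads at the same hidden size simply means wider heads.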
--------------------------------------------------------------------------------
/data/requirements.yml:
--------------------------------------------------------------------------------
1 | name: SELFormer_env
2 | channels:
3 |   - anaconda
4 |   - defaults
5 |   - conda-forge
6 | dependencies:
7 |   - pytorch=1.13.1
8 |   - transformers=4.26.1
9 |   - pyyaml=6.0
10 |   - yaml=0.2.5
11 |   - scikit-learn=1.2.1
12 |   - datasets=2.9.0
13 |   - chemprop=1.5.2
14 |   - tokenizers=0.13.2
15 |   - pip=22.3.1
16 |   - pip:
17 |     - simpletransformers==0.63.9
18 |     - pandarallel==1.6.4
19 |     - wandb==0.13.10
20 |     - selfies==2.1.1
--------------------------------------------------------------------------------
/figures/selformer_architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HUBioDataLab/SELFormer/65e686feb72185cc95f5b81176444e20586848e1/figures/selformer_architecture.png
--------------------------------------------------------------------------------
/generate_selfies.py:
--------------------------------------------------------------------------------
1 |
2 | import argparse
3 |
4 | parser = argparse.ArgumentParser()
5 | parser.add_argument("--smiles_dataset", required=True, metavar="/path/to/dataset/", help="Path of the input SMILES dataset.")
6 | parser.add_argument("--selfies_dataset", required=True, metavar="/path/to/dataset/", help="Path of the output SELFIES dataset.")
7 | args = parser.parse_args()
8 |
9 | import pandas as pd
10 | from prepare_pretraining_data import prepare_data
11 |
12 | prepare_data(path=args.smiles_dataset, save_to=args.selfies_dataset)
13 | chembl_df = pd.read_csv(args.selfies_dataset)
14 | print("SELFIES representation file is ready.")
--------------------------------------------------------------------------------
/get_embeddings.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
4 | os.environ["WANDB_DISABLED"] = "true"
5 | os.environ["CUDA_VISIBLE_DEVICES"] = "0"
6 |
7 | import pandas as pd
8 | from pandarallel import pandarallel
9 |
10 | from transformers import RobertaTokenizer, RobertaModel, RobertaConfig
11 | import torch
12 |
13 | df = pd.read_csv("./data/molecule_dataset_selfies.csv") # path of the selfies data
14 |
15 | model_name = "./data/pretrained_models/SELFormer" # path of the pre-trained model
16 | config = RobertaConfig.from_pretrained(model_name)
17 | config.output_hidden_states = True
18 | tokenizer = RobertaTokenizer.from_pretrained("./data/RobertaFastTokenizer")
19 | model = RobertaModel.from_pretrained(model_name, config=config)
20 |
21 |
22 | def get_sequence_embeddings(selfies):
23 |     token = torch.tensor([tokenizer.encode(selfies, add_special_tokens=True, max_length=512, padding=True, truncation=True)])
24 |     output = model(token)
25 |
26 |     sequence_out = output[0]
27 |     return torch.mean(sequence_out[0], dim=0).tolist()
28 |
29 | print("Starting")
30 | df = df[:100000] # how many molecules should be processed
31 | pandarallel.initialize(nb_workers=5) # number of worker processes
32 | df["sequence_embeddings"] = df.selfies.parallel_apply(get_sequence_embeddings)
33 |
34 | df.drop(columns=["selfies"], inplace=True) # not interested in selfies data anymore, only chembl_id and the embedding
35 | df.to_csv("./data/embeddings.csv", index=False) # save embeddings here
36 | print("Finished")
37 |
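`get_sequence_embeddings` reduces the last-layer hidden states to a single vector per molecule by averaging over the token axis. Because the tokenizer is called with `add_special_tokens=True`, the special tokens are part of that average. The sketch below shows the same reduction on plain Python lists instead of tensors; `mean_pool` and the toy numbers are illustrative assumptions, not model output.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n_tokens = len(token_vectors)
    # zip(*rows) iterates column by column, i.e. over the hidden dimensions.
    return [sum(column) / n_tokens for column in zip(*token_vectors)]


# Three tokens with a hidden size of 2 (toy numbers).
hidden_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(hidden_states)
print(pooled)
```

With the real model the list of vectors has one row per token and 768 columns, so every molecule maps to a 768-dimensional embedding regardless of its SELFIES length.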
--------------------------------------------------------------------------------
/get_moleculenet_embeddings.py:
--------------------------------------------------------------------------------
1 | import os
2 | from time import time
3 | from fnmatch import fnmatch
4 |
5 | import pandas as pd
6 | from pandarallel import pandarallel
7 | import to_selfies
8 | import torch
9 | from transformers import RobertaTokenizer, RobertaModel, RobertaConfig
10 |
11 | import argparse
12 |
13 | parser = argparse.ArgumentParser()
14 | parser.add_argument("--dataset_path", required=True, metavar="/path/to/dataset/", help="Path of the input MoleculeNet datasets.")
15 | parser.add_argument("--model_file", required=True, metavar="", type=str, help="Path of the pretrained model.")
16 |
17 | args = parser.parse_args()
18 |
19 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
20 | os.environ["WANDB_DISABLED"] = "true"
21 | os.environ["CUDA_VISIBLE_DEVICES"] = "0"
22 |
23 | model_file = args.model_file # path of the pre-trained model
24 | config = RobertaConfig.from_pretrained(model_file)
25 | config.output_hidden_states = True
26 | tokenizer = RobertaTokenizer.from_pretrained("./data/RobertaFastTokenizer")
27 | model = RobertaModel.from_pretrained(model_file, config=config)
28 |
29 |
30 | def generate_moleculenet_selfies(dataset_file):
31 |     """
32 |     Generates SELFIES for a given dataset and saves it to a file.
33 |     :param dataset_file: path to the dataset file
34 |     """
35 |
36 |     dataset_name = dataset_file.split("/")[-1].split(".")[0]
37 |
38 |     print(f'\nGenerating SELFIES for {dataset_name}')
39 |
40 |     if dataset_name == 'bace':
41 |         smiles_column = 'mol'
42 |     else:
43 |         smiles_column = 'smiles'
44 |
45 |     # read dataset
46 |     dataset_df = pd.read_csv(os.path.join(dataset_file))
47 |     dataset_df["selfies"] = dataset_df[smiles_column] # creating a new column "selfies" that is a copy of smiles_column
48 |
49 |     # generate selfies
50 |     pandarallel.initialize()
51 |     dataset_df.selfies = dataset_df.selfies.parallel_apply(to_selfies.to_selfies)
52 |
53 |     dataset_df.drop(dataset_df[dataset_df[smiles_column] == dataset_df.selfies].index, inplace=True)
54 |     dataset_df.drop(columns=[smiles_column], inplace=True)
55 |     out_name = dataset_name + "_selfies.csv"
56 |
57 |     # save selfies to file
58 |     path = os.path.dirname(dataset_file)
59 |
60 |     dataset_df.to_csv(os.path.join(path, out_name), index=False)
61 |     print(f'Saved to {os.path.join(path, out_name)}')
62 |
63 |
64 | def get_sequence_embeddings(selfies, tokenizer, model):
65 |
66 |     torch.set_num_threads(1)
67 |     token = torch.tensor([tokenizer.encode(selfies, add_special_tokens=True, max_length=512, padding=True, truncation=True)])
68 |     output = model(token)
69 |
70 |     sequence_out = output[0]
71 |     return torch.mean(sequence_out[0], dim=0).tolist()
72 |
73 |
74 | def generate_embeddings(model_file, args):
75 |
76 |     root = args.dataset_path
77 |     model_name = model_file.split("/")[-1]
78 |
79 |     prepare_data_pattern = "*.csv"
80 |
81 |     print(f"\nGenerating embeddings using pre-trained model {model_name}")
82 |     for path, subdirs, files in os.walk(root):
83 |         for name in files:
84 |             if fnmatch(name, prepare_data_pattern) and not any(substring in name for substring in ['selfies', 'embeddings', 'results']):
85 |                 dataset_file = os.path.join(path, name)
86 |                 generate_moleculenet_selfies(dataset_file)
87 |
88 |                 selfies_file = os.path.join(path, name.split(".")[0] + "_selfies.csv")
89 |
90 |                 dataset_name = selfies_file.split("/")[-1].split(".")[0].split('_selfies')[0]
91 |                 print(f'\nGenerating embeddings for {dataset_name}')
92 |                 t0 = time()
93 |
94 |                 dataset_df = pd.read_csv(selfies_file)
95 |                 pandarallel.initialize(nb_workers=10, progress_bar=True) # number of worker processes
96 |                 dataset_df["sequence_embeddings"] = dataset_df.selfies.parallel_apply(get_sequence_embeddings, args=(tokenizer, model))
97 |
98 |                 dataset_df.drop(columns=["selfies"], inplace=True) # not interested in selfies data anymore, only class and the embedding
99 |                 file_name = f"{dataset_name}_{model_name}_embeddings.csv"
100 |
101 |                 # save embeddings to file
102 |                 path = os.path.dirname(selfies_file)
103 |                 dataset_df.to_csv(os.path.join(path, file_name), index=False)
104 |                 t1 = time()
105 |
106 |                 print(f'Finished in {round((t1-t0) / 60, 2)} mins')
107 |                 print(f'Saved to {os.path.join(path, file_name)}\n')
108 |
109 | generate_embeddings(model_file, args)
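`generate_embeddings` walks the dataset directory and processes only the raw CSV files, skipping anything the script itself wrote on an earlier run. The file-selection rule can be tried in isolation with the sketch below; `is_raw_dataset` and the example file names are hypothetical, while the pattern and the skip list are the ones used in the script.

```python
from fnmatch import fnmatch

# Substrings that mark files produced by earlier runs of the script.
SKIP_SUBSTRINGS = ["selfies", "embeddings", "results"]


def is_raw_dataset(name, pattern="*.csv"):
    """Hypothetical helper: True for CSV files that are not generated outputs."""
    return fnmatch(name, pattern) and not any(s in name for s in SKIP_SUBSTRINGS)


names = [
    "bbbp.csv",
    "bbbp_mock.csv",
    "bbbp_selfies.csv",
    "bbbp_SELFormer_embeddings.csv",
    "README.md",
]
raw = [name for name in names if is_raw_dataset(name)]
print(raw)
```

Note that the rule keeps `*_mock.csv` files, so mock datasets under the same root are embedded as well.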
--------------------------------------------------------------------------------
/multilabel_class_pred.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import pandas as pd
4 | import torch
5 | from simpletransformers.classification import MultiLabelClassificationModel
6 | from prepare_finetuning_data import smiles_to_selfies
7 | import argparse
8 |
9 | parser = argparse.ArgumentParser()
10 | parser.add_argument("--task", default="sider", help="task selection.")
11 | parser.add_argument("--pred_set", default="data/finetuning_datasets/classification/sider/sider_mock.csv", metavar="/path/to/dataset/", help="Test set for predictions.")
12 | parser.add_argument("--training_args", default= "data/finetuned_models/SELFormer_sider_scaffold_optimized/training_args.bin", metavar="/path/to/dataset/", help="Trained model arguments.")
13 | parser.add_argument("--model_name", default="data/finetuned_models/SELFormer_sider_scaffold_optimized", metavar="/path/to/dataset/", help="Path to the model.")
14 | parser.add_argument("--num_labels", default=27, type=int, help="Number of labels.")
15 | args = parser.parse_args()
16 |
17 | print("Loading test set...")
18 | pred_set = pd.read_csv(args.pred_set)
19 | pred_df_selfies = smiles_to_selfies(pred_set)
20 |
21 | print("Loading model...")
22 | training_args = torch.load(args.training_args)
23 | num_labels = args.num_labels
24 | model = MultiLabelClassificationModel("roberta", args.model_name, num_labels=num_labels, use_cuda=True, args=training_args) # pass the loaded arguments, not the file path
25 |
26 | print("Predicting...")
27 | preds, _ = model.predict(pred_df_selfies["selfies"].tolist())
28 |
29 | # create a dataframe with the selfies and the predictions, each in a separate column named feature_0, feature_1, etc.
30 | res = pd.DataFrame(preds, columns=["feature_{}".format(i) for i in range(num_labels)])
31 | res.insert(0, "selfies", pred_df_selfies["selfies"].tolist())
32 |
33 | if not os.path.exists("data/predictions"):
34 |     os.makedirs("data/predictions")
35 |
36 | res.to_csv("data/predictions/{}_predictions.csv".format(args.task), index=False)
37 | print("Predictions saved to data/predictions/{}_predictions.csv".format(args.task))
--------------------------------------------------------------------------------
/prepare_finetuning_data.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import chemprop
3 |
4 | from pandarallel import pandarallel
5 | import to_selfies
6 |
7 |
8 | def smiles_to_selfies(df):
9 |     df.insert(0, "selfies", df["smiles"])
10 |     pandarallel.initialize()
11 |     df.selfies = df.selfies.parallel_apply(to_selfies.to_selfies)
12 |
13 |     df.drop(df[df.smiles == df.selfies].index, inplace=True)
14 |     df.drop(columns=["smiles"], inplace=True)
15 |
16 |     return df
17 |
18 |
19 | def train_val_test_split_multilabel(path, scaffold_split):
20 |     main_df = pd.read_csv(path)
21 |     main_df = main_df.sample(frac=1, random_state=42).reset_index(drop=True) # shuffling; sample() returns a new frame, seeded like the splits below
22 |     main_df.rename(columns={main_df.columns[0]: "smiles"}, inplace=True)
23 |     main_df.fillna(0, inplace=True)
24 |     main_df.reset_index(drop=True, inplace=True)
25 |
26 |     if scaffold_split:
27 |         molecule_list = []
28 |         for _, row in main_df.iterrows():
29 |             molecule_list.append(chemprop.data.data.MoleculeDatapoint(smiles=[row["smiles"]], targets=row[1:].values))
30 |         molecule_dataset = chemprop.data.data.MoleculeDataset(molecule_list)
31 |         (train, val, test) = chemprop.data.scaffold.scaffold_split(data=molecule_dataset, sizes=(0.8, 0.1, 0.1), seed=42, balanced=True)
32 |         return (train, val, test)
33 |
34 |     else: # random split
35 |         from sklearn.model_selection import train_test_split
36 |
37 |         train, val = train_test_split(main_df, test_size=0.2, random_state=42)
38 |         val, test = train_test_split(val, test_size=0.5, random_state=42)
39 |         return (train, val, test)
40 |
41 |
42 | def train_val_test_split(path, target_column_number=1, scaffold_split=False):
43 |     main_df = pd.read_csv(path)
44 |     main_df = main_df.sample(frac=1, random_state=42).reset_index(drop=True) # shuffling; sample() returns a new frame, seeded like the splits below
45 |     main_df.rename(columns={main_df.columns[0]: "smiles", main_df.columns[target_column_number]: "target"}, inplace=True)
46 |     main_df = main_df[["smiles", "target"]]
47 |     # main_df.dropna(subset=["target"], inplace=True)
48 |     main_df.fillna(0, inplace=True)
49 |     main_df.reset_index(drop=True, inplace=True)
50 |
51 |     if scaffold_split:
52 |         molecule_list = []
53 |         for _, row in main_df.iterrows():
54 |             molecule_list.append(chemprop.data.data.MoleculeDatapoint(smiles=[row["smiles"]], targets=row[1:].values))
55 |         molecule_dataset = chemprop.data.data.MoleculeDataset(molecule_list)
56 |         (train, val, test) = chemprop.data.scaffold.scaffold_split(data=molecule_dataset, sizes=(0.8, 0.1, 0.1), seed=42, balanced=True)
57 |         return (train, val, test)
58 |
59 |     else: # random split
60 |         from sklearn.model_selection import train_test_split
61 |
62 |         train, val = train_test_split(main_df, test_size=0.2, random_state=42)
63 |         val, test = train_test_split(val, test_size=0.5, random_state=42)
64 |         return (train, val, test)
65 |
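The random-split branch reaches an 80/10/10 partition in two steps: it first holds out 20% of the rows, then cuts that hold-out in half to obtain the validation and test sets. The sketch below reproduces only the size arithmetic with the standard library. `split_80_10_10` is a hypothetical stand-in, not scikit-learn's `train_test_split`, so the row assignment will differ from the real function.

```python
import math
import random


def split_80_10_10(items, seed=42):
    """Hypothetical stand-in for the two-stage split: 20% hold-out, then halve it."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_holdout = math.ceil(len(shuffled) / 5)  # first stage: 20% held out
    n_test = math.ceil(n_holdout / 2)         # second stage: half of the hold-out

    train = shuffled[: len(shuffled) - n_holdout]
    holdout = shuffled[len(shuffled) - n_holdout:]
    val = holdout[: n_holdout - n_test]
    test = holdout[n_holdout - n_test:]
    return train, val, test


train, val, test = split_80_10_10(range(1000))
print(len(train), len(val), len(test))
```

Each row lands in exactly one of the three parts, which is the property the fine-tuning scripts rely on when they report test-set metrics.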
--------------------------------------------------------------------------------
/prepare_pretraining_data.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | from pandarallel import pandarallel
3 |
4 | import to_selfies
5 |
6 |
7 | def prepare_data(path="data/molecule_dataset_smiles.txt", save_to="./data/molecule_dataset_selfies.csv"):
8 |     chembl_df = pd.read_csv(path, sep="\t") # data is TAB separated.
9 |     # chembl_df.drop(columns=["standard_inchi", "standard_inchi_key"], inplace=True) # we are not interested in "standard_inchi" and "standard_inchi_key" columns.
10 |     chembl_df["selfies"] = chembl_df["canonical_smiles"] # creating a new column "selfies" that is a copy of "canonical_smiles"
11 |
12 |     pandarallel.initialize()
13 |     chembl_df.selfies = chembl_df.selfies.parallel_apply(to_selfies.to_selfies)
14 |
15 |     chembl_df.drop(chembl_df[chembl_df.canonical_smiles == chembl_df.selfies].index, inplace=True)
16 |     chembl_df.drop(columns=["canonical_smiles"], inplace=True)
17 |     chembl_df.to_csv(save_to, index=False)
18 |
19 |
20 | def create_selfies_file(selfies_df, save_to="./data/selfies_subset.txt", subset_size=100000, do_subset=True):
21 |     selfies_df = selfies_df.sample(frac=1).reset_index(drop=True) # shuffling; sample() returns a new frame, so keep the result
22 |
23 |     if do_subset:
24 |         selfies_subset = selfies_df.selfies[:subset_size]
25 |     else:
26 |         selfies_subset = selfies_df.selfies
27 |     selfies_subset = selfies_subset.to_frame()
28 |     selfies_subset["selfies"].to_csv(save_to, index=False, header=False)
29 |
--------------------------------------------------------------------------------
/produce_embeddings.py:
--------------------------------------------------------------------------------
1 |
2 | import argparse
3 |
4 | parser = argparse.ArgumentParser()
5 | parser.add_argument("--selfies_dataset", required=True, metavar="/path/to/dataset/", help="Path of the input SELFIES dataset.")
6 | parser.add_argument("--model_file", required=True, metavar="/path/to/dataset/", help="Path of the pretrained model file.")
7 | parser.add_argument("--embed_file", required=True, metavar="/path/to/dataset/", help="Path of the output embeddings file.")
8 | args = parser.parse_args()
9 |
10 | import os
11 |
12 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
13 | os.environ["WANDB_DISABLED"] = "true"
14 | os.environ["CUDA_VISIBLE_DEVICES"] = "0"
15 |
16 | import pandas as pd
17 | from pandarallel import pandarallel
18 |
19 | from transformers import RobertaTokenizer, RobertaModel, RobertaConfig
20 | import torch
21 |
22 | df = pd.read_csv(args.selfies_dataset) # path of the selfies data
23 |
24 | model_name = args.model_file # path of the pre-trained model
25 | config = RobertaConfig.from_pretrained(model_name)
26 | config.output_hidden_states = True
27 | tokenizer = RobertaTokenizer.from_pretrained("./data/RobertaFastTokenizer")
28 | model = RobertaModel.from_pretrained(model_name, config=config)
29 |
30 |
31 | def get_sequence_embeddings(selfies):
32 |     token = torch.tensor([tokenizer.encode(selfies, add_special_tokens=True, max_length=512, padding=True, truncation=True)])
33 |     output = model(token)
34 |
35 |     sequence_out = output[0]
36 |     return torch.mean(sequence_out[0], dim=0).tolist()
37 |
38 | print("Starting")
39 | # df = df[:100000] # how many molecules should be processed
40 | pandarallel.initialize(nb_workers=5, progress_bar=True) # number of worker processes
41 | df["sequence_embeddings"] = df.selfies.parallel_apply(get_sequence_embeddings)
42 |
43 | df.drop(columns=["selfies"], inplace=True) # not interested in selfies data anymore, only chembl_id and the embedding
44 | df.to_csv(args.embed_file, index=False) # save embeddings here
45 | print("Finished")
46 |
47 | print("Molecule embeddings are ready.")
48 |
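The script stores each embedding as a Python list inside a single CSV cell, so the `sequence_embeddings` column comes back as text (for example `"[0.5, -1.25, 2.0]"`) when the file is read again. The sketch below shows one way to turn those cells back into lists of floats using only the standard library. It assumes the identifier column is called `chembl_id`, as the script's comment suggests; `load_embeddings` and the sample rows are made up for illustration.

```python
import csv
import io
import json

# Stand-in for a file written by the script; ids and numbers are made up.
csv_text = (
    "chembl_id,sequence_embeddings\n"
    'CHEMBL_A,"[0.5, -1.25, 2.0]"\n'
    'CHEMBL_B,"[1.0, 0.0, -3.5]"\n'
)


def load_embeddings(handle, id_column="chembl_id"):
    """Hypothetical helper: map each id to its embedding as a list of floats."""
    reader = csv.DictReader(handle)
    # A stringified list of finite floats is also valid JSON.
    return {row[id_column]: json.loads(row["sequence_embeddings"]) for row in reader}


embeddings = load_embeddings(io.StringIO(csv_text))
print(embeddings["CHEMBL_A"])
```

Cells containing non-finite values such as `nan` are not valid JSON and would need separate handling.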
--------------------------------------------------------------------------------
/regression_pred.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import pandas as pd
4 | import torch
5 | from torch.nn import MSELoss
6 | from torch.utils.data import Dataset
7 | from transformers import BertPreTrainedModel, RobertaConfig, RobertaTokenizerFast
8 | from transformers.models.roberta.modeling_roberta import (
9 |     RobertaClassificationHead,
10 |     RobertaConfig,
11 |     RobertaModel,
12 | )
13 | from transformers import Trainer, TrainingArguments
14 | from prepare_finetuning_data import smiles_to_selfies
15 | import argparse
16 |
17 | parser = argparse.ArgumentParser()
18 | parser.add_argument("--task", default="esol", help="task selection.")
19 | parser.add_argument("--tokenizer_name", default="data/RobertaFastTokenizer", metavar="/path/to/dataset/", help="Tokenizer selection.")
20 | parser.add_argument("--pred_set", default='data/finetuning_datasets/regression/esol/esol_mock.csv', metavar="/path/to/dataset/", help="Test set for predictions.")
21 | parser.add_argument("--training_args", default= "data/finetuned_models/esol_regression/training_args.bin", metavar="/path/to/dataset/", help="Trained model arguments.")
22 | parser.add_argument("--model_name", default='data/finetuned_models/esol_regression', metavar="/path/to/dataset/", help="Path to the model.")
23 | args = parser.parse_args()
24 |
25 | class SELFIESTransformers_For_Regression(BertPreTrainedModel):
26 |     def __init__(self, config):
27 |         super(SELFIESTransformers_For_Regression, self).__init__(config)
28 |         self.num_labels = config.num_labels
29 |         self.roberta = RobertaModel(config)
30 |         self.classifier = RobertaClassificationHead(config)
31 |
32 |     def forward(self, input_ids, attention_mask, labels=None):
33 |         outputs = self.roberta(input_ids, attention_mask=attention_mask)
34 |         sequence_output = outputs[0]
35 |         logits = self.classifier(sequence_output)
36 |
37 |         outputs = (logits,) + outputs[2:]
38 |
39 |         if labels is not None:
40 |             if self.num_labels == 1: # regression
41 |                 loss_fct = MSELoss()
42 |                 loss = loss_fct(logits.squeeze(), labels.squeeze())
43 |             outputs = (loss,) + outputs
44 |         return outputs # (loss), logits, (hidden_states), (attentions)
45 |
46 | model_class = SELFIESTransformers_For_Regression
47 |
48 | model_name = args.model_name
49 | tokenizer_name = args.tokenizer_name
50 | num_labels = 1
51 | config_class = RobertaConfig
52 | config = config_class.from_pretrained(model_name, num_labels=num_labels)
53 |
54 | model_class = SELFIESTransformers_For_Regression
55 | model = model_class.from_pretrained(model_name, config=config)
56 |
57 | tokenizer_class = RobertaTokenizerFast
58 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False)
59 |
60 | class SELFIESTransfomers_Dataset(Dataset):
61 |     def __init__(self, data, tokenizer, MAX_LEN):
62 |         text = data
63 |         self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt")
64 |
65 |
66 |     def __len__(self):
67 |         return len(self.examples["input_ids"])
68 |
69 |     def __getitem__(self, index):
70 |         item = {key: self.examples[key][index] for key in self.examples}
71 |
72 |         return item
73 |
74 | pred_set = pd.read_csv(args.pred_set)
75 | pred_df_selfies = smiles_to_selfies(pred_set)
76 |
77 | MAX_LEN = 128
78 |
79 | pred_examples = (pred_df_selfies.iloc[:, 0].astype(str).tolist())
80 | pred_dataset = SELFIESTransfomers_Dataset(pred_examples, tokenizer, MAX_LEN)
81 |
82 | training_args = torch.load(args.training_args)
83 |
84 | trainer = Trainer(model=model, args=training_args) # prediction only, so no training or evaluation dataset is passed
85 |
86 | raw_pred, label_ids, metrics = trainer.predict(pred_dataset)
87 | y_pred = [i[0] for i in raw_pred]
88 |
89 | res = pd.concat([pred_df_selfies, pd.DataFrame(y_pred, columns=["prediction"])], axis = 1)
90 |
91 | if not os.path.exists("data/predictions"):
92 |     os.makedirs("data/predictions")
93 |
94 | res.to_csv("data/predictions/{}_predictions.csv".format(args.task), index=False)
95 |
--------------------------------------------------------------------------------
/roberta_model.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.utils.data.dataset import Dataset
3 |
4 | import os
5 |
6 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
7 | os.environ["WANDB_DISABLED"] = "true"
8 |
9 |
10 | class CustomDataset(Dataset):
11 |     def __init__(self, df, tokenizer, MAX_LEN):
12 |         self.examples = []
13 |
14 |         for example in df.values:
15 |             x = tokenizer.encode_plus(example, max_length=MAX_LEN, truncation=True, padding="max_length")
16 |             self.examples += [x.input_ids]
17 |
18 |     def __len__(self):
19 |         return len(self.examples)
20 |
21 |     def __getitem__(self, i):
22 |         return torch.tensor(self.examples[i])
23 |
24 |
25 | import pandas as pd
26 | from sklearn.model_selection import train_test_split
27 |
28 | from transformers import RobertaConfig
29 | from transformers import RobertaForMaskedLM
30 | from transformers import RobertaTokenizerFast
31 |
32 | from transformers import DataCollatorForLanguageModeling
33 | from transformers import Trainer, TrainingArguments
34 |
35 | import math
36 |
37 |
38 | def train_and_save_roberta_model(hyperparameters_dict, selfies_path="./data/selfies_subset.txt", robertatokenizer_path="./data/robertatokenizer/", save_to="./saved_model/"):
39 |     TRAIN_BATCH_SIZE = hyperparameters_dict["TRAIN_BATCH_SIZE"]
40 |     VALID_BATCH_SIZE = hyperparameters_dict["VALID_BATCH_SIZE"]
41 |     TRAIN_EPOCHS = hyperparameters_dict["TRAIN_EPOCHS"]
42 |     LEARNING_RATE = hyperparameters_dict["LEARNING_RATE"]
43 |     WEIGHT_DECAY = hyperparameters_dict["WEIGHT_DECAY"]
44 |     MAX_LEN = hyperparameters_dict["MAX_LEN"]
45 |
46 |     config = RobertaConfig(vocab_size=hyperparameters_dict["VOCAB_SIZE"], max_position_embeddings=hyperparameters_dict["MAX_POSITION_EMBEDDINGS"], num_attention_heads=hyperparameters_dict["NUM_ATTENTION_HEADS"], num_hidden_layers=hyperparameters_dict["NUM_HIDDEN_LAYERS"], type_vocab_size=hyperparameters_dict["TYPE_VOCAB_SIZE"], hidden_size=hyperparameters_dict["HIDDEN_SIZE"])
47 |
48 |     # model = RobertaForMaskedLM(config=config)
49 |     def _model_init():
50 |         return RobertaForMaskedLM(config=config)
51 |
52 |     df = pd.read_csv(selfies_path, header=None)
53 |
54 |     tokenizer = RobertaTokenizerFast.from_pretrained(robertatokenizer_path)
55 |
56 |     train_df, eval_df = train_test_split(df, test_size=0.2, random_state=42)
57 |     train_dataset = CustomDataset(train_df[0], tokenizer, MAX_LEN) # column name is 0.
58 |     eval_dataset = CustomDataset(eval_df[0], tokenizer, MAX_LEN)
59 |
60 |     data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
61 |
62 |     training_args = TrainingArguments(
63 |         output_dir=save_to,
64 |         overwrite_output_dir=True,
65 |         evaluation_strategy="epoch",
66 |         save_strategy="epoch",
67 |         num_train_epochs=TRAIN_EPOCHS,
68 |         learning_rate=LEARNING_RATE,
69 |         weight_decay=WEIGHT_DECAY,
70 |         per_device_train_batch_size=TRAIN_BATCH_SIZE,
71 |         per_device_eval_batch_size=VALID_BATCH_SIZE,
72 |         save_total_limit=1,
73 |         disable_tqdm=True,
74 |         # fp16=True
75 |     )
76 |
77 |     trainer = Trainer(
78 |         model_init=_model_init,
79 |         args=training_args,
80 |         data_collator=data_collator,
81 |         train_dataset=train_dataset,
82 |         eval_dataset=eval_dataset,
83 |         # prediction_loss_only=True,
84 |     )
85 |
86 |     print("Built trainer on device:", training_args.device, "with n gpus:", training_args.n_gpu)
87 |     trainer.train()
88 |     print("training finished.")
89 |
90 |     eval_results = trainer.evaluate()
91 |     print(f">>> Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
92 |
93 |     trainer.save_model(save_to)
94 |
--------------------------------------------------------------------------------
/roberta_tokenizer.py:
--------------------------------------------------------------------------------
1 | from transformers import RobertaTokenizerFast
2 |
3 |
4 | def save_roberta_tokenizer(path="./data/bpe/", save_to="./data/robertatokenizer/"):
5 |     tokenizer = RobertaTokenizerFast.from_pretrained(path)
6 |     print("Loaded BPE Tokenizer from: " + path)
7 |
8 |     tokenizer.save_pretrained(save_to)
9 |     print("Saved RobertaTokenizerFast to: " + save_to)
10 |
--------------------------------------------------------------------------------
/to_selfies.py:
--------------------------------------------------------------------------------
1 | import selfies as sf
2 |
3 |
4 | def to_selfies(smiles): # Returns the SELFIES representation of a SMILES string. If encoding fails, returns the SMILES string unchanged.
5 |     try:
6 |         return sf.encoder(smiles)
7 |     except sf.EncoderError:
8 |         print("EncoderError in to_selfies()")
9 |         return smiles
10 |
--------------------------------------------------------------------------------
/train_classification_model.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
4 | os.environ["WANDB_DISABLED"] = "true"
5 |
6 | import numpy as np
7 | import pandas as pd
8 |
9 | import torch
10 | from torch.nn import CrossEntropyLoss
11 | from torch.utils.data import Dataset
12 |
13 | from transformers import BertPreTrainedModel, RobertaConfig, RobertaTokenizerFast
14 |
15 | from transformers.models.roberta.modeling_roberta import (
16 |     RobertaClassificationHead,
17 |     RobertaConfig,
18 |     RobertaModel,
19 | )
20 |
21 |
22 | # Parse command line arguments
23 | import argparse
24 |
25 | parser = argparse.ArgumentParser()
26 | parser.add_argument("--model", required=True, metavar="/path/to/model", help="Directory of the pre-trained model")
27 | parser.add_argument("--tokenizer", required=True, metavar="/path/to/tokenizer/", help="Directory of the RobertaFastTokenizer")
28 | parser.add_argument("--dataset", required=True, metavar="/path/to/dataset/", help="Path of the fine-tuning dataset")
29 | parser.add_argument("--save_to", required=True, metavar="/path/to/save/to/", help="Directory to save the model")
30 | parser.add_argument("--target_column_id", required=False, default=1, metavar="", type=int, help="Index of the target column in the dataframe. Default: 1")
31 | parser.add_argument(
32 |     "--use_scaffold", required=False, metavar="", type=int, default=0, help="Split to use. 0 for random, 1 for scaffold. Default: 0",
33 | )
34 | parser.add_argument("--train_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for training. Default: 8")
35 | parser.add_argument("--validation_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for validation. Default: 8")
36 | parser.add_argument("--num_epochs", required=False, metavar="", type=int, default=50, help="Number of epochs. Default: 50")
37 | parser.add_argument("--lr", required=False, metavar="", type=float, default=1e-5, help="Learning rate. Default: 1e-5")
38 | parser.add_argument("--wd", required=False, metavar="", type=float, default=0.1, help="Weight decay. Default: 0.1")
39 | args = parser.parse_args()
40 |
41 |
42 | # Model
43 | class SELFIESTransformers_For_Classification(BertPreTrainedModel):
44 |     def __init__(self, config):
45 |         super(SELFIESTransformers_For_Classification, self).__init__(config)
46 |         self.num_labels = config.num_labels
47 |         self.roberta = RobertaModel(config)
48 |         self.classifier = RobertaClassificationHead(config)
49 |
50 |     def forward(self, input_ids, attention_mask, labels=None):
51 |         outputs = self.roberta(input_ids, attention_mask=attention_mask)
52 |         sequence_output = outputs[0]
53 |         logits = self.classifier(sequence_output)
54 |
55 |         outputs = (logits,) + outputs[2:]
56 |
57 |         if labels is not None:
58 |             loss_fct = CrossEntropyLoss()
59 |             loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
60 |             outputs = (loss,) + outputs
61 |         return outputs # (loss), logits, (hidden_states), (attentions)
62 |
63 |
64 | model_name = args.model
65 | tokenizer_name = args.tokenizer
66 |
67 |
68 | # Configs
69 | num_labels = 2
70 | config_class = RobertaConfig
71 | config = config_class.from_pretrained(model_name, num_labels=num_labels)
72 |
73 | model_class = SELFIESTransformers_For_Classification
74 | model = model_class.from_pretrained(model_name, config=config)
75 |
76 | tokenizer_class = RobertaTokenizerFast
77 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False)
78 |
79 |
80 | # Prepare and Get Data
81 | class SELFIESTransfomers_Dataset(Dataset):
82 |     def __init__(self, data, tokenizer, MAX_LEN):
83 |         text, labels = data
84 |         self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt")
85 |         self.labels = torch.tensor(labels, dtype=torch.long)
86 |
87 |     def __len__(self):
88 |         return len(self.examples["input_ids"])
89 |
90 |     def __getitem__(self, index):
91 |         item = {key: self.examples[key][index] for key in self.examples}
92 |         item["label"] = self.labels[index]
93 |         return item
94 |
95 |
96 | DATASET_PATH = args.dataset
97 | from prepare_finetuning_data import smiles_to_selfies
98 | from prepare_finetuning_data import train_val_test_split
99 |
100 | if args.use_scaffold == 0: # random split
101 |     print("Using random split")
102 |     (train_df, validation_df, test_df) = train_val_test_split(DATASET_PATH, args.target_column_id, scaffold_split=False)
103 | else: # scaffold split
104 |     print("Using scaffold split")
105 |     (train, val, test) = train_val_test_split(DATASET_PATH, args.target_column_id, scaffold_split=True)
106 |
107 |     train_smiles = [item[0] for item in train.smiles()]
108 |     validation_smiles = [item[0] for item in val.smiles()]
109 |     test_smiles = [item[0] for item in test.smiles()]
110 |
111 |     train_df = pd.DataFrame(np.column_stack([train_smiles, train.targets()]), columns=["smiles", "target"])
112 |     validation_df = pd.DataFrame(np.column_stack([validation_smiles, val.targets()]), columns=["smiles", "target"])
113 |     test_df = pd.DataFrame(np.column_stack([test_smiles, test.targets()]), columns=["smiles", "target"])
114 |
115 | train_df = smiles_to_selfies(train_df)
116 | validation_df = smiles_to_selfies(validation_df)
117 | test_df = smiles_to_selfies(test_df)
118 | test_y = pd.DataFrame(test_df.target, columns=["target"])
119 |
120 | MAX_LEN = 128
121 | train_examples = (train_df.iloc[:, 0].astype(str).tolist(), train_df.iloc[:, 1].tolist())
122 | train_dataset = SELFIESTransfomers_Dataset(train_examples, tokenizer, MAX_LEN)
123 |
124 | validation_examples = (validation_df.iloc[:, 0].astype(str).tolist(), validation_df.iloc[:, 1].tolist())
125 | validation_dataset = SELFIESTransfomers_Dataset(validation_examples, tokenizer, MAX_LEN)
126 |
127 | test_examples = (test_df.iloc[:, 0].astype(str).tolist(), test_df.iloc[:, 1].tolist())
128 | test_dataset = SELFIESTransfomers_Dataset(test_examples, tokenizer, MAX_LEN)
129 |
130 |
131 | from sklearn.metrics import roc_auc_score
132 | from sklearn.metrics import precision_recall_curve
133 | from sklearn.metrics import auc
134 | from datasets import load_metric
135 |
136 | acc = load_metric("accuracy")
137 | precision = load_metric("precision")
138 | recall = load_metric("recall")
139 | f1 = load_metric("f1")
140 |
141 |
142 | def compute_metrics(eval_pred):
143 |     predictions, labels = eval_pred
144 |     predictions = np.argmax(predictions, axis=1)
145 |
146 |     acc_result = acc.compute(predictions=predictions, references=labels)
147 |     precision_result = precision.compute(predictions=predictions, references=labels)
148 |     recall_result = recall.compute(predictions=predictions, references=labels)
149 |     f1_result = f1.compute(predictions=predictions, references=labels)
150 |     roc_auc_result = {"roc-auc": roc_auc_score(y_true=labels, y_score=predictions)}
151 |     precision_from_curve, recall_from_curve, thresholds_from_curve = precision_recall_curve(labels, predictions)
152 |     prc_auc_result = {"prc-auc": auc(recall_from_curve, precision_from_curve)}
153 |
154 |     result = {**acc_result, **precision_result, **recall_result, **f1_result, **roc_auc_result, **prc_auc_result}
155 |     return result
156 |
157 |
158 | # Train and Evaluate
159 | from transformers import TrainingArguments, Trainer
160 |
161 | TRAIN_BATCH_SIZE = args.train_batch_size
162 | VALID_BATCH_SIZE = args.validation_batch_size
163 | TRAIN_EPOCHS = args.num_epochs
164 | LEARNING_RATE = args.lr
165 | WEIGHT_DECAY = args.wd
166 | MAX_LEN = MAX_LEN
167 |
168 | training_args = TrainingArguments(
169 |     output_dir=args.save_to,
170 |     overwrite_output_dir=True,
171 |     evaluation_strategy="epoch",
172 |     save_strategy="epoch",
173 |     num_train_epochs=TRAIN_EPOCHS,
174 |     learning_rate=LEARNING_RATE,
175 |     weight_decay=WEIGHT_DECAY,
176 |     per_device_train_batch_size=TRAIN_BATCH_SIZE,
177 |     per_device_eval_batch_size=VALID_BATCH_SIZE,
178 |     disable_tqdm=True,
179 |     # load_best_model_at_end=True,
180 |     # metric_for_best_model="roc-auc",
181 |     # greater_is_better=True,
182 |     save_total_limit=1,
183 | )
184 |
185 | trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=validation_dataset, compute_metrics=compute_metrics)
186 |
187 | metrics = trainer.train()
188 | print("Metrics")
189 | print(metrics)
190 | trainer.save_model(args.save_to)
191 |
192 | # Testing
193 | # Make prediction
194 | raw_pred, label_ids, metrics = trainer.predict(test_dataset)
195 |
196 | # Preprocess raw predictions
197 | y_pred = np.argmax(raw_pred, axis=1)
198 |
199 | # ROC-AUC
200 | roc_auc_score_result = roc_auc_score(y_true=test_y, y_score=y_pred)
201 | # PRC-AUC
202 | precision_from_curve, recall_from_curve, thresholds_from_curve = precision_recall_curve(test_y, y_pred)
203 | auc_score_result = auc(recall_from_curve, precision_from_curve)
204 |
205 | print("\nROC-AUC: ", roc_auc_score_result, "\nPRC-AUC: ", auc_score_result)
206 |
--------------------------------------------------------------------------------
/train_classification_multilabel_model.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
4 | os.environ["WANDB_DISABLED"] = "true"
5 |
6 | from simpletransformers.classification import MultiLabelClassificationModel
7 |
8 | import pandas as pd
9 | import numpy as np
10 |
11 | import argparse
12 |
13 | parser = argparse.ArgumentParser()
14 | parser.add_argument("--model", required=True, metavar="/path/to/model", help="Path to model")
15 | parser.add_argument("--dataset", required=True, metavar="/path/to/dataset/", help="Directory of the dataset")
16 | parser.add_argument("--save_to", required=True, metavar="/path/to/save/to/", help="Directory to save the model")
17 | parser.add_argument("--use_scaffold", required=False, metavar="", type=int, default=0, help="Split to use. 0 for random, 1 for scaffold. Default: 0")
18 | parser.add_argument("--num_epochs", required=False, metavar="", type=int, default=50, help="Number of epochs. Default: 50")
19 | parser.add_argument("--lr", required=False, metavar="", type=float, default=1e-5, help="Learning rate. Default: 1e-5")
20 | parser.add_argument("--wd", required=False, metavar="", type=float, default=0.1, help="Weight decay. Default: 0.1")
21 | parser.add_argument("--batch_size", required=False, metavar="", type=int, default=8, help="Batch size. Default: 8")
22 | args = parser.parse_args()
23 |
24 |
25 | num_labels = len(pd.read_csv(args.dataset).columns) - 1
26 | model_args = {
27 |     "num_train_epochs": args.num_epochs,
28 |     "learning_rate": args.lr,
29 |     "weight_decay": args.wd,
30 |     "train_batch_size": args.batch_size,
31 |     "output_dir": args.save_to,
32 | }
33 |
34 | model = MultiLabelClassificationModel("roberta", args.model, num_labels=num_labels, use_cuda=True, args=model_args)
35 |
36 | from prepare_finetuning_data import train_val_test_split_multilabel
37 |
38 | if args.use_scaffold == 0: # random split
39 |     print("Using random split")
40 |     (train_df, eval_df, test_df) = train_val_test_split_multilabel(args.dataset, scaffold_split=False)
41 |
42 |     train_df.columns = ["smiles"] + ["Feature_" + str(i) for i in range(num_labels)]
43 |     eval_df.columns = ["smiles"] + ["Feature_" + str(i) for i in range(num_labels)]
44 |     test_df.columns = ["smiles"] + ["Feature_" + str(i) for i in range(num_labels)]
45 | else: # scaffold split
46 |     print("Using scaffold split")
47 |     (train, val, test) = train_val_test_split_multilabel(args.dataset, scaffold_split=True)
48 |
49 |     train_smiles = [item[0] for item in train.smiles()]
50 |     validation_smiles = [item[0] for item in val.smiles()]
51 |     test_smiles = [item[0] for item in test.smiles()]
52 |
53 |     train_df = pd.DataFrame(np.column_stack([train_smiles, train.targets()]), columns=["smiles"] + ["Feature_" + str(i) for i in range(len(train.targets()[0]))])
54 |     eval_df = pd.DataFrame(np.column_stack([validation_smiles, val.targets()]), columns=["smiles"] + ["Feature_" + str(i) for i in range(len(val.targets()[0]))])
55 |     test_df = pd.DataFrame(np.column_stack([test_smiles, test.targets()]), columns=["smiles"] + ["Feature_" + str(i) for i in range(len(test.targets()[0]))])
56 |
57 | from prepare_finetuning_data import smiles_to_selfies
58 |
59 | train_df = smiles_to_selfies(train_df)
60 | eval_df = smiles_to_selfies(eval_df)
61 | test_df = smiles_to_selfies(test_df)
62 |
63 | train_df.insert(1, "labels", np.array([train_df["Feature_" + str(i)].to_numpy() for i in range(len(train_df.columns[1:]))], dtype=np.float32).T.tolist())
64 | eval_df.insert(1, "labels", np.array([eval_df["Feature_" + str(i)].to_numpy() for i in range(len(eval_df.columns[1:]))], dtype=np.float32).T.tolist())
65 | test_df.insert(1, "labels", np.array([test_df["Feature_" + str(i)].to_numpy() for i in range(len(test_df.columns[1:]))], dtype=np.float32).T.tolist())
66 |
67 | from sklearn.metrics import roc_auc_score
68 | from sklearn.metrics import precision_recall_curve
69 | from sklearn.metrics import auc
70 |
71 | from datasets import load_metric
72 |
73 | acc = load_metric("accuracy")
74 | precision = load_metric("precision")
75 | recall = load_metric("recall")
76 | f1 = load_metric("f1")
77 |
78 |
79 | def compute_metrics(y_true, y_pred):
80 |     acc_result = acc.compute(predictions=y_pred, references=y_true)
81 |     precision_result = precision.compute(predictions=y_pred, references=y_true)
82 |     recall_result = recall.compute(predictions=y_pred, references=y_true)
83 |     f1_result = f1.compute(predictions=y_pred, references=y_true)
84 |     roc_auc_result = {"roc-auc": roc_auc_score(y_true=y_true, y_score=y_pred)}
85 |     precision_from_curve, recall_from_curve, thresholds_from_curve = precision_recall_curve(y_true, y_pred)
86 |     prc_auc_result = {"prc-auc": auc(recall_from_curve, precision_from_curve)}
87 |
88 |     result = {**acc_result, **precision_result, **recall_result, **f1_result, **roc_auc_result, **prc_auc_result}
89 |     return result
90 |
91 |
92 | model.train_model(train_df)
93 |
94 | print("Evaluation Scores")
95 | preds, _ = model.predict(eval_df["selfies"].tolist())
96 | print(compute_metrics(np.ravel(eval_df["labels"].tolist()), np.ravel(preds)))
97 |
98 | print("Test Scores")
99 | preds, _ = model.predict(test_df["selfies"].tolist())
100 | print(compute_metrics(np.ravel(test_df["labels"].tolist()), np.ravel(preds)))
101 |
--------------------------------------------------------------------------------
/train_pretraining_model.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | parser = argparse.ArgumentParser()
4 | parser.add_argument("--smiles_dataset", required=False, metavar="/path/to/dataset/", help="Path of the SMILES dataset. Only needed if the SELFIES dataset does not exist yet and has to be created.")
5 | parser.add_argument("--selfies_dataset", required=True, metavar="/path/to/dataset/", help="Path of the SELFIES dataset. If it does not exist, it will be created at the given path.")
6 | parser.add_argument("--subset_size", required=False, metavar="", type=int, default=0, help="By default the program will use the whole data. If you want to instead use a subset of the data, set this parameter to the size of the subset.")
7 | parser.add_argument("--prepared_data_path", required=True, metavar="/path/to/dataset/", help="Path of the .txt prepared data. If it does not exist, it will be created at the given path.")
8 | parser.add_argument("--bpe_path", required=True, metavar="/path/to/bpetokenizer/", default="", help="Path of the BPE tokenizer. If it does not exist, it will be created at the given path.")
9 | parser.add_argument("--roberta_fast_tokenizer_path", required=True, metavar="/path/to/robertafasttokenizer/", help="Directory of the RobertaTokenizerFast tokenizer. RobertaFastTokenizer only depends on the BPE Tokenizer and will be created regardless of whether it exists or not.")
10 | parser.add_argument("--hyperparameters_path", required=True, metavar="/path/to/hyperparameters/", help="Path of the hyperparameters that will be used for pre-training. Hyperparameters should be stored in a yaml file.")
11 | args = parser.parse_args()
12 |
13 | import pandas as pd
14 |
15 | try:
16 |     chembl_df = pd.read_csv(args.selfies_dataset)
17 | except FileNotFoundError:
18 |     print("SELFIES dataset was not found. SMILES dataset provided. Converting SMILES to SELFIES.")
19 |     from prepare_pretraining_data import prepare_data
20 |
21 |     prepare_data(path=args.smiles_dataset, save_to=args.selfies_dataset)
22 |     chembl_df = pd.read_csv(args.selfies_dataset)
23 | print("SELFIES .csv is ready.")
24 |
25 | print("Creating SELFIES .txt for tokenization.")
26 | from os.path import isfile # returns True if the file exists else False.
27 |
28 | if not isfile(args.prepared_data_path):
29 |     from prepare_pretraining_data import create_selfies_file
30 |
31 |     if args.subset_size != 0:
32 |         create_selfies_file(chembl_df, subset_size=args.subset_size, do_subset=True, save_to=args.prepared_data_path)
33 |     else:
34 |         create_selfies_file(chembl_df, do_subset=False, save_to=args.prepared_data_path)
35 | print("SELFIES .txt is ready for tokenization.")
36 |
37 | print("Creating BPE tokenizer.")
38 | if not isfile(args.bpe_path+"/merges.txt"):
39 |     import bpe_tokenizer
40 |
41 |     bpe_tokenizer.bpe_tokenizer(path=args.prepared_data_path, save_to=args.bpe_path)
42 | print("BPE Tokenizer is ready.")
43 |
44 | print("Creating RobertaTokenizerFast.")
45 | if not isfile(args.roberta_fast_tokenizer_path+"/merges.txt"):
46 |     import roberta_tokenizer
47 |
48 |     roberta_tokenizer.save_roberta_tokenizer(path=args.bpe_path, save_to=args.roberta_fast_tokenizer_path)
49 | print("RobertaFastTokenizer is ready.")
50 |
51 | import yaml
52 | import roberta_model
53 |
54 | with open(args.hyperparameters_path) as file:
55 |     hyperparameters = yaml.safe_load(file)
56 |     for key in hyperparameters.keys():
57 |         print("Starting pretraining with {} parameter set.".format(key))
58 |         roberta_model.train_and_save_roberta_model(hyperparameters_dict=hyperparameters[key], selfies_path=args.prepared_data_path, robertatokenizer_path=args.roberta_fast_tokenizer_path, save_to="./saved_models/" + key + "_saved_model/")
59 |         print("Finished pretraining with {} parameter set.\n---------------\n".format(key))
60 |
--------------------------------------------------------------------------------
/train_regression_model.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
4 | os.environ["WANDB_DISABLED"] = "true"
5 |
6 | import numpy as np
7 | import pandas as pd
8 |
9 | import torch
10 | from torch.nn import MSELoss
11 | from torch.utils.data import Dataset
12 |
13 | from transformers import BertPreTrainedModel, RobertaConfig, RobertaTokenizerFast
14 |
15 | from transformers.models.roberta.modeling_roberta import (
16 |     RobertaClassificationHead,
17 |     RobertaConfig,
18 |     RobertaModel,
19 | )
20 |
21 | import argparse
22 |
23 | parser = argparse.ArgumentParser()
24 | parser.add_argument("--model", required=True, metavar="/path/to/model", help="Directory of the model")
25 | parser.add_argument("--tokenizer", required=True, metavar="/path/to/tokenizer/", help="Directory of the tokenizer")
26 | parser.add_argument("--dataset", required=True, metavar="/path/to/dataset/", help="Directory of the dataset")
27 | parser.add_argument("--save_to", required=True, metavar="/path/to/save/to/", help="Directory to save the model")
28 | parser.add_argument("--target_column_id", required=False, default=1, metavar="", type=int, help="Index of the target column in the dataframe. Default: 1")
29 | parser.add_argument("--scaler", required=False, default=0, metavar="", type=int, help="Scaler to use for regression. 0 for no scaling, 1 for min-max scaling, 2 for standard scaling. Default: 0")
30 | parser.add_argument("--use_scaffold", required=False, metavar="", type=int, default=0, help="Split to use. 0 for random, 1 for scaffold. Default: 0")
31 | parser.add_argument("--train_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for training. Default: 8")
32 | parser.add_argument("--validation_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for validation. Default: 8")
33 | parser.add_argument("--num_epochs", required=False, metavar="", type=int, default=50, help="Number of epochs. Default: 50")
34 | parser.add_argument("--lr", required=False, metavar="", type=float, default=1e-5, help="Learning rate. Default: 1e-5")
35 | parser.add_argument("--wd", required=False, metavar="", type=float, default=0.1, help="Weight decay. Default: 0.1")
36 | args = parser.parse_args()
37 |
38 |
39 | # Model
40 | class SELFIESTransformers_For_Regression(BertPreTrainedModel):
41 |     def __init__(self, config):
42 |         super(SELFIESTransformers_For_Regression, self).__init__(config)
43 |         self.num_labels = config.num_labels
44 |         self.roberta = RobertaModel(config)
45 |         self.classifier = RobertaClassificationHead(config)
46 |
47 |     def forward(self, input_ids, attention_mask, labels=None):
48 |         outputs = self.roberta(input_ids, attention_mask=attention_mask)
49 |         sequence_output = outputs[0]
50 |         logits = self.classifier(sequence_output)
51 |
52 |         outputs = (logits,) + outputs[2:]
53 |
54 |         if labels is not None:
55 |             if self.num_labels == 1: # regression
56 |                 loss_fct = MSELoss()
57 |                 loss = loss_fct(logits.squeeze(), labels.squeeze())
58 |             outputs = (loss,) + outputs
59 |         return outputs # (loss), logits, (hidden_states), (attentions)
60 |
61 |
62 | model_name = args.model
63 | tokenizer_name = args.tokenizer
64 |
65 | # Configs
66 |
67 | num_labels = 1
68 | config_class = RobertaConfig
69 | config = config_class.from_pretrained(model_name, num_labels=num_labels)
70 |
71 | model_class = SELFIESTransformers_For_Regression
72 | model = model_class.from_pretrained(model_name, config=config)
73 |
74 | tokenizer_class = RobertaTokenizerFast
75 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False)
76 |
77 |
78 | # Prepare and Get Data
79 | class SELFIESTransfomers_Dataset(Dataset):
80 |     def __init__(self, data, tokenizer, MAX_LEN):
81 |         text, labels = data
82 |         self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt")
83 |         self.labels = torch.tensor(labels, dtype=torch.float)
84 |
85 |     def __len__(self):
86 |         return len(self.examples["input_ids"])
87 |
88 |     def __getitem__(self, index):
89 |         item = {key: self.examples[key][index] for key in self.examples}
90 |         item["label"] = self.labels[index]
91 |         return item
92 |
93 |
94 | DATASET_PATH = args.dataset
95 | from prepare_finetuning_data import smiles_to_selfies
96 | from prepare_finetuning_data import train_val_test_split
97 |
98 | if args.use_scaffold == 0: # random
99 |     print("Using random split")
100 |     (train_df, validation_df, test_df) = train_val_test_split(DATASET_PATH, args.target_column_id, scaffold_split=False)
101 | else: # scaffold
102 |     print("Using scaffold split")
103 |     (train, val, test) = train_val_test_split(DATASET_PATH, args.target_column_id, scaffold_split=True)
104 |
105 |     train_smiles = [item[0] for item in train.smiles()]
106 |     validation_smiles = [item[0] for item in val.smiles()]
107 |     test_smiles = [item[0] for item in test.smiles()]
108 |
109 |     train_df = pd.DataFrame(np.column_stack([train_smiles, train.targets()]), columns=["smiles", "target"])
110 |     validation_df = pd.DataFrame(np.column_stack([validation_smiles, val.targets()]), columns=["smiles", "target"])
111 |     test_df = pd.DataFrame(np.column_stack([test_smiles, test.targets()]), columns=["smiles", "target"])
112 |
113 | train_df = smiles_to_selfies(train_df)
114 | validation_df = smiles_to_selfies(validation_df)
115 | test_df = smiles_to_selfies(test_df)
116 |
117 | from sklearn.preprocessing import StandardScaler, MinMaxScaler
118 |
119 | if args.scaler == 0:
120 |     print("Not using a scaler.")
121 | elif args.scaler == 1:
122 |     print("Using MinMaxScaler.")
123 |     train_df["target"] = MinMaxScaler().fit_transform(np.array(train_df["target"]).reshape(-1, 1))
124 |     validation_df["target"] = MinMaxScaler().fit_transform(np.array(validation_df["target"]).reshape(-1, 1))
125 |     test_df["target"] = MinMaxScaler().fit_transform(np.array(test_df["target"]).reshape(-1, 1))
126 | elif args.scaler == 2:
127 |     print("Using StandardScaler.")
128 |     train_df["target"] = StandardScaler().fit_transform(np.array(train_df["target"]).reshape(-1, 1))
129 |     validation_df["target"] = StandardScaler().fit_transform(np.array(validation_df["target"]).reshape(-1, 1))
130 |     test_df["target"] = StandardScaler().fit_transform(np.array(test_df["target"]).reshape(-1, 1))
131 | else:
132 |     print("Invalid scaler. Not using a scaler.")
133 |
134 | test_y = pd.DataFrame(test_df.target, columns=["target"])
135 |
136 | MAX_LEN = 128
137 | train_examples = (train_df.iloc[:, 0].astype(str).tolist(), train_df.iloc[:, 1].astype(float).tolist())  # cast targets to float: np.column_stack with SMILES strings yields string targets
138 | train_dataset = SELFIESTransfomers_Dataset(train_examples, tokenizer, MAX_LEN)
139 |
140 | validation_examples = (validation_df.iloc[:, 0].astype(str).tolist(), validation_df.iloc[:, 1].astype(float).tolist())
141 | validation_dataset = SELFIESTransfomers_Dataset(validation_examples, tokenizer, MAX_LEN)
142 |
143 | test_examples = (test_df.iloc[:, 0].astype(str).tolist(), test_df.iloc[:, 1].astype(float).tolist())
144 | test_dataset = SELFIESTransfomers_Dataset(test_examples, tokenizer, MAX_LEN)
145 |
146 |
147 | from sklearn.metrics import mean_absolute_error, mean_squared_error
148 |
149 |
150 |
151 | def compute_metrics(eval_pred):
152 |     preds, labels = eval_pred
153 |     predictions = [i[0] for i in preds]
154 |
155 |     mse = {"mse": mean_squared_error(y_pred=predictions, y_true=labels)}  # mean squared error (the default behaviour)
156 |     rmse = {"rmse": np.sqrt(mse["mse"])}  # root of the MSE; avoids the `squared` keyword, which recent scikit-learn releases deprecate
157 |     mae = {"mae": mean_absolute_error(y_pred=predictions, y_true=labels)}
158 |
159 |     result = {**mse, **rmse, **mae}
160 |     return result
161 |
162 |
163 | # Train and Evaluate
164 | from transformers import TrainingArguments, Trainer
165 |
166 | TRAIN_BATCH_SIZE = args.train_batch_size
167 | VALID_BATCH_SIZE = args.validation_batch_size
168 | TRAIN_EPOCHS = args.num_epochs
169 | LEARNING_RATE = args.lr
170 | WEIGHT_DECAY = args.wd
171 | # MAX_LEN is already defined above, where the datasets are built
172 |
173 | training_args = TrainingArguments(
174 |     output_dir=args.save_to,
175 |     overwrite_output_dir=True,
176 |     evaluation_strategy="epoch",
177 |     save_strategy="epoch",
178 |     num_train_epochs=TRAIN_EPOCHS,
179 |     learning_rate=LEARNING_RATE,
180 |     weight_decay=WEIGHT_DECAY,
181 |     per_device_train_batch_size=TRAIN_BATCH_SIZE,
182 |     per_device_eval_batch_size=VALID_BATCH_SIZE,
183 |     disable_tqdm=True,
184 |     # load_best_model_at_end=True,
185 |     # metric_for_best_model="roc-auc",
186 |     # greater_is_better=True,
187 |     save_total_limit=1,
188 | )
189 |
190 | trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=validation_dataset, compute_metrics=compute_metrics)  # model to train, training arguments, training and evaluation datasets, metric function
191 |
192 | metrics = trainer.train()
193 | print("Metrics")
194 | print(metrics)
195 | trainer.save_model(args.save_to)
196 |
197 | # Testing
198 | # Make prediction
199 | raw_pred, label_ids, metrics = trainer.predict(test_dataset)
200 |
201 | # Preprocess raw predictions
202 | y_pred = [i[0] for i in raw_pred]
203 |
204 | MSE = mean_squared_error(y_true=test_y, y_pred=y_pred)  # mean squared error (the default behaviour)
205 | RMSE = np.sqrt(MSE)  # root of the MSE; avoids the `squared` keyword, which recent scikit-learn releases deprecate
206 | MAE = mean_absolute_error(y_true=test_y, y_pred=y_pred)
207 |
208 | print("\nMean Squared Error (MSE):", MSE)
209 | print("Root Mean Square Error (RMSE):", RMSE)
210 | print("Mean Absolute Error (MAE):", MAE)
211 |
--------------------------------------------------------------------------------