├── .gitattributes ├── .gitignore ├── README.md ├── binary_class_pred.py ├── bpe_tokenizer.py ├── create_hparam_set.py ├── create_selfies_alphabet.py ├── data ├── BPETokenizer │ ├── bpe.json │ ├── merges.txt │ └── vocab.json ├── RobertaFastTokenizer │ ├── merges.txt │ ├── special_tokens_map.json │ ├── tokenizer.json │ ├── tokenizer_config.json │ └── vocab.json ├── finetuning_datasets │ ├── classification │ │ ├── bace │ │ │ └── bace.csv │ │ ├── bbbp │ │ │ ├── bbbp.csv │ │ │ └── bbbp_mock.csv │ │ ├── hiv │ │ │ └── hiv.csv │ │ ├── sider │ │ │ ├── sider.csv │ │ │ └── sider_mock.csv │ │ └── tox21 │ │ │ └── tox21.csv │ └── regression │ │ ├── esol │ │ ├── esol.csv │ │ └── esol_mock.csv │ │ ├── freesolv │ │ └── freesolv.csv │ │ ├── lipo │ │ └── lipo.csv │ │ └── pdbbind_full │ │ └── pdbbind_full.csv ├── molecule_dataset_selfies.zip ├── molecule_dataset_smiles.zip ├── pretraining_hyperparameters.yml └── requirements.yml ├── figures └── selformer_architecture.png ├── generate_selfies.py ├── get_embeddings.py ├── get_moleculenet_embeddings.py ├── multilabel_class_pred.py ├── prepare_finetuning_data.py ├── prepare_pretraining_data.py ├── produce_embeddings.py ├── regression_pred.py ├── roberta_model.py ├── roberta_tokenizer.py ├── to_selfies.py ├── train_classification_model.py ├── train_classification_multilabel_model.py ├── train_pretraining_model.py └── train_regression_model.py /.gitattributes: -------------------------------------------------------------------------------- 1 | *.bin filter=lfs diff=lfs merge=lfs -text 2 | chembl_29_selfies.csv filter=lfs diff=lfs merge=lfs -text 3 | chembl_29_chemreps.txt filter=lfs diff=lfs merge=lfs -text 4 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | .vscode 3 | 4 | finetuned_models/ 5 | pretrained_models/ 6 | chembl_29_selfies.csv 7 | chembl_29_selfies.zip 8 | 
chembl_29_selfies_subset.txt 9 | 10 | *.ipynb 11 | *.out 12 | *.toml -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SELFormer: Molecular Representation Learning via SELFIES Language Models 2 | 3 | 4 | 5 | [![publication](https://img.shields.io/badge/Article-%40MLST%20journal-d30000.svg)](https://doi.org/10.1088/2632-2153/acdb30) [![license](https://img.shields.io/badge/license-GPLv3-blue.svg)](http://www.gnu.org/licenses/) 6 | 7 | Automated computational analysis of the vast chemical space is critical for numerous fields of research such as drug discovery and material science. Representation learning techniques have recently been employed with the primary objective of generating compact and informative numerical expressions of complex data. One approach to efficiently learn molecular representations is processing string-based notations of chemicals via natural language processing (NLP) algorithms. The majority of the methods proposed so far utilize SMILES notations for this purpose; however, SMILES is associated with numerous problems related to validity and robustness, which may prevent the model from effectively uncovering the knowledge hidden in the data. In this study, we propose SELFormer, a transformer architecture-based chemical language model that utilizes a 100% valid, compact and expressive notation, SELFIES, as input, in order to learn flexible and high-quality molecular representations. SELFormer is pre-trained on two million drug-like compounds and fine-tuned for diverse molecular property prediction tasks. Our performance evaluation revealed that SELFormer outperforms all competing methods, including graph learning-based approaches and SMILES-based chemical language models, in predicting the aqueous solubility of molecules and adverse drug reactions. 
We also visualized molecular representations learned by SELFormer via dimensionality reduction, which indicated that even the pre-trained model can discriminate molecules with differing structural properties. We shared SELFormer as a programmatic tool, together with its datasets and pre-trained models. Overall, our research demonstrates the benefit of using the SELFIES notations in the context of chemical language modeling and opens up new possibilities for the design and discovery of novel drug candidates with desired features. 8 | 9 | Figure1_selformer_architecture 10 | 11 | **Figure.** The schematic representation of the SELFormer architecture and the experiments conducted. **Left:** the self-supervised pre-training utilizes the transformer encoder module via masked language modeling for learning concise and informative representations of small molecules encoded by their SELFIES notation. **Right:** the pre-trained model has been fine-tuned independently on numerous molecular property-based classification and regression tasks. 12 | 13 | 14 |
15 | 16 | ## The Architecture of SELFormer 17 | 18 | SELFormer is built on the RoBERTa transformer architecture, which shares the architecture of BERT but adds certain modifications that have been found to improve model performance or provide other benefits. One such modification is the use of byte-level Byte-Pair Encoding (BPE) for tokenization instead of character-level BPE. Another is that RoBERTa is pre-trained exclusively on the masked language modeling (MLM) objective and disregards the next sentence prediction (NSP) task. SELFormer has (i) self-supervised pre-trained models that utilize the transformer encoder module for learning concise and informative representations of small molecules encoded by their SELFIES notation, and (ii) supervised classification/regression models which use the pre-trained model as their base and are fine-tuned on numerous classification- and regression-based molecular property prediction tasks. 19 | 20 | Our pre-trained encoder models are implemented as "RobertaForMaskedLM" and fine-tuning models as "RobertaForSequenceClassification". For the fine-tuning process, the SELFormer architecture includes the pre-trained RoBERTa model as its base, followed by the "RobertaClassificationHead" class (for both classification and regression). The "RobertaClassificationHead" class consists of a dropout layer, a dense layer, a tanh activation function, a second dropout layer, and a final linear layer. We forward the sequence output of the pre-trained RoBERTa base model to the classifier during the fine-tuning process. 21 | 22 |
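The layer order of the classification head can be illustrated without any deep-learning dependency. The snippet below is a minimal sketch of the forward pass (dense layer, tanh, final linear layer) in plain Python; dropout is left out because it acts as the identity at inference time. The function names, the toy weights and the use of a single feature vector as input are assumptions made for illustration only; the repository itself relies on `RobertaClassificationHead` from the transformers library.

```python
import math


def linear(x, weight, bias):
    """Affine map: one output value per row of `weight`."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def classification_head(features, dense_w, dense_b, out_w, out_b):
    """dense -> tanh -> linear; dropout is skipped (identity at inference)."""
    hidden = [math.tanh(v) for v in linear(features, dense_w, dense_b)]
    return linear(hidden, out_w, out_b)


# Toy example with hidden size 2 and two output labels (hypothetical weights).
features = [0.5, -1.0]
dense_w = [[1.0, 0.0], [0.0, 1.0]]
dense_b = [0.0, 0.0]
out_w = [[1.0, 1.0], [-1.0, 1.0]]
out_b = [0.0, 0.0]
logits = classification_head(features, dense_w, dense_b, out_w, out_b)
```

For a regression task the final linear layer would produce a single value instead of one logit per class.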
23 | 24 | ## Getting Started 25 | 26 | We highly recommend the Conda platform for installing dependencies. Following the installation of Conda, please create and activate an environment with dependencies as defined below: 27 | 28 | ``` 29 | conda create -n SELFormer_env 30 | conda activate SELFormer_env 31 | conda env update --file data/requirements.yml 32 | ``` 33 |
34 | 35 | ## Generating Molecule Embeddings Using Pre-trained Models 36 | 37 | Pre-trained SELFormer models are available for download [here](https://drive.google.com/drive/folders/1c3Mwc3j4M0PHk_iORrKU_V5cuxkD9aM6?usp=share_link). Embeddings of all molecules from CHEMBL30 and CHEMBL33 that are generated by our best performing model are available [here](https://drive.google.com/drive/folders/1Ii44Z6HonzJv5B5VYFujVaSThf802e2M?usp=sharing). 38 | 39 | You can also generate embeddings for your own dataset using the pre-trained models. To do so, you will need SELFIES notations of your molecules. You can use the command below to generate SELFIES notations for your SMILES dataset. 40 | 41 | If you want to reproduce our code for generating embeddings of CHEMBL30 dataset, you can unzip __molecule_dataset_smiles.zip__ and/or __molecule_dataset_selfies.zip__ files in the __data__ directory and use them as input SMILES and SELFIES datasets, respectively. 42 | 43 | ``` 44 | python3 generate_selfies.py --smiles_dataset=data/molecule_dataset_smiles.txt --selfies_dataset=data/molecule_dataset_selfies.csv 45 | ``` 46 | 47 | * __--smiles_dataset__: Path of the input SMILES dataset. 48 | * __--selfies_dataset__: Path of the output SELFIES dataset. 49 | 50 |
51 | 52 | To generate embeddings for the SELFIES molecule dataset using a pre-trained model, please run the following command: 53 | 54 | ``` 55 | python3 produce_embeddings.py --selfies_dataset=data/molecule_dataset_selfies.csv --model_file=data/pretrained_models/SELFormer --embed_file=data/embeddings.csv 56 | ``` 57 | 58 | * __--selfies_dataset__: Path of the input SELFIES dataset. 59 | * __--model_file__: Path of the pretrained model to be used. 60 | * __--embed_file__: Path of the output embeddings file. 61 | 62 |
63 | 64 | ### Generating Embeddings Using Pre-trained Models for MoleculeNet Dataset Molecules 65 | 66 | The embeddings generated by our best performing pre-trained model for MoleculeNet data can be directly downloaded [here](https://drive.google.com/drive/folders/1Xu3Q1T-KwXb67MF3Uw63pFm2IzoxeNNY?usp=share_link). 67 | 68 | You can also re-generate these embeddings using the command below. 69 | 70 | ``` 71 | python3 get_moleculenet_embeddings.py --dataset_path=data/finetuning_datasets --model_file=data/pretrained_models/SELFormer 72 | ``` 73 | * __--dataset_path__: Path of the directory containing the MoleculeNet datasets. 74 | * __--model_file__: Path of the pretrained model to be used. 75 | 76 |
77 | 78 | ## Training and Evaluating Models 79 | 80 | ### Pre-Training 81 | To pre-train a model, please run the command below. If you have a SELFIES dataset, you can use it directly by giving the path of the dataset to __--selfies_dataset__. If you have a SMILES dataset, you can give the path of the dataset to __--smiles_dataset__ and the SELFIES representations will be created at the path given to __--selfies_dataset__. 82 | 83 |
84 | 85 | ``` 86 | python3 train_pretraining_model.py --smiles_dataset=data/molecule_dataset_smiles.txt --selfies_dataset=data/molecule_dataset_selfies.csv --prepared_data_path=data/selfies_data.txt --bpe_path=data/BPETokenizer --roberta_fast_tokenizer_path=data/RobertaFastTokenizer --hyperparameters_path=data/pretraining_hyperparameters.yml --subset_size=100000 87 | ``` 88 | 89 | * __--smiles_dataset__: Path of the SMILES dataset. It is required if __--selfies_dataset__ does not exist (optional). 90 | * __--selfies_dataset__: Path of the SELFIES dataset. If a SELFIES dataset does not exist, it will be created at the given path using the __--smiles_dataset__. If it exists, SELFIES dataset will be used directly (required). 91 | * __--prepared_data_path__: Path of the intermediate file that will be created during pre-training. It will be used for tokenization. If it does not exist, it will be created at the given path (required). 92 | * __--bpe_path__: Path of the BPE tokenizer. If it does not exist, it will be created at the given path (required). 93 | * __--roberta_fast_tokenizer_path__: Path of the RobertaTokenizerFast tokenizer. If it does not exist, it will be created at the given path (required). 94 | * __--hyperparameters_path__: Path of the yaml file that contains the hyperparameter sets to be tested. Note that these sets will be tested one by one and not in parallel. Example file is available at /data/pretraining_hyperparameters.yml (required). 95 | * __--subset_size__: The size of the subset of the dataset that will be used for pre-training. By default, the whole dataset will be used (optional). 96 | 97 |
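The hyperparameter file holds one named set per combination of values, and the sets are then tried one after another. The snippet below is a minimal sketch of how such a grid could be enumerated with `itertools.product`; the function name `build_hparam_sets` is hypothetical, only four of the keys used by `create_hparam_set.py` are shown, and the consecutive `set_<n>` numbering is a choice of this sketch.

```python
from itertools import product


def build_hparam_sets(batch_sizes, epochs, learning_rates, weight_decays):
    """Return one named hyperparameter set per combination of the inputs."""
    sets = {}
    grid = product(batch_sizes, epochs, learning_rates, weight_decays)
    for i, (batch_size, num_epochs, lr, wd) in enumerate(grid):
        sets[f"set_{i}"] = {
            "TRAIN_BATCH_SIZE": batch_size,
            "TRAIN_EPOCHS": num_epochs,
            "LEARNING_RATE": lr,
            "WEIGHT_DECAY": wd,
        }
    return sets


# 3 batch sizes x 2 epoch counts x 1 learning rate x 1 weight decay = 6 sets.
hparam_sets = build_hparam_sets([16, 32, 64], [5, 10], [1e-5], [0.001])
```

The resulting dictionary can be written to a YAML file and passed to `--hyperparameters_path`; since the sets are tested sequentially, the total training time grows with the number of combinations.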
98 | 99 | ### Fine-tuning on Molecular Property Prediction 100 | 101 | You can use the commands below to fine-tune a pre-trained model for various molecular property prediction tasks. The commands expect datasets that contain SMILES representations of molecules, stored in a column with the header "smiles". Example datasets are available in the __data/finetuning_datasets__ directory. 102 | 103 |
104 | 105 | **Binary Classification Tasks** 106 | 107 | To fine-tune a pre-trained model on a binary classification dataset, please run the command below. 108 | 109 | ``` 110 | python3 train_classification_model.py --model=data/saved_models/SELFormer --tokenizer=data/RobertaFastTokenizer --dataset=data/finetuning_datasets/classification/bbbp/bbbp.csv --save_to=data/finetuned_models/SELFormer_bbbp_classification --target_column_id=1 --use_scaffold=1 --train_batch_size=16 --validation_batch_size=8 --num_epochs=25 --lr=5e-5 --wd=0 111 | ``` 112 | 113 | * __--model__: Directory of the pre-trained model (required). 114 | * __--tokenizer__: Directory of the RobertaFastTokenizer (required). 115 | * __--dataset__: Path of the fine-tuning dataset (required). 116 | * __--save_to__: Directory where the fine-tuned model will be saved (required). 117 | * __--target_column_id__: Default: 1. The column id of the target column in the fine-tuning dataset (optional). 118 | * __--use_scaffold__: Default: 0. Determines whether to use scaffold splitting (1) or random splitting (0) (optional). 119 | * __--train_batch_size__: Default: 8. Train batch size (optional). 120 | * __--validation_batch_size__: Default: 8. Validation batch size (optional). 121 | * __--num_epochs__: Default: 50. Number of epochs (optional). 122 | * __--lr__: Default: 1e-5. Learning rate (optional). 123 | * __--wd__: Default: 0.1. Weight decay (optional). 124 | 125 |
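The `--use_scaffold` flag switches between random and scaffold splitting. The splitting code itself is not part of this README, so the snippet below is only a generic sketch of the idea: molecules that share a scaffold are kept in the same split, with the largest scaffold groups assigned first. The function name, the 80/10/10 fractions and the assumption that a scaffold string has already been computed for each molecule (for example with a cheminformatics toolkit) are all illustrative.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold spans two splits."""
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties are broken by first occurrence.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical scaffold labels for ten molecules.
scaffolds = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
train, valid, test = scaffold_split(scaffolds)
```

Compared with random splitting, this makes the evaluation harder and usually more realistic, because the model is tested on scaffolds it has not seen during training.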
126 | 127 | **Multi-Label Classification Tasks** 128 | 129 | To fine-tune a pre-trained model on a multi-label classification dataset, please run the command below. The RobertaFastTokenizer files should be stored in the same directory as the pre-trained model. 130 | 131 | ``` 132 | python3 train_classification_multilabel_model.py --model=data/saved_models/SELFormer --dataset=data/finetuning_datasets/classification/tox21/tox21.csv --save_to=data/finetuned_models/SELFormer_tox21_classification --use_scaffold=1 --batch_size=16 --num_epochs=25 --lr=5e-5 --wd=0 133 | ``` 134 | 135 | * __--model__: Directory of the pre-trained model (required). 136 | * __--dataset__: Path of the fine-tuning dataset (required). 137 | * __--save_to__: Directory where the fine-tuned model will be saved (required). 138 | * __--use_scaffold__: Default: 0. Determines whether to use scaffold splitting (1) or random splitting (0) (optional). 139 | * __--batch_size__: Default: 8. Train batch size (optional). 140 | * __--num_epochs__: Default: 50. Number of epochs (optional). 141 | * __--lr__: Default: 1e-5. Learning rate (optional). 142 | * __--wd__: Default: 0.1. Weight decay (optional). 143 | 144 |
145 | 146 | **Regression Tasks** 147 | 148 | To fine-tune a pre-trained model on a regression dataset, please run the command below. 149 | 150 | ``` 151 | python3 train_regression_model.py --model=data/saved_models/SELFormer --tokenizer=data/RobertaFastTokenizer --dataset=data/finetuning_datasets/regression/esol/esol.csv --save_to=data/finetuned_models/SELFormer_esol_regression --target_column_id=-1 --scaler=2 --use_scaffold=1 --train_batch_size=16 --validation_batch_size=8 --num_epochs=25 --lr=5e-5 --wd=0 152 | ``` 153 | 154 | * __--model__: Directory of the pre-trained model (required). 155 | * __--tokenizer__: Directory of the RobertaFastTokenizer (required). 156 | * __--dataset__: Path of the fine-tuning dataset (required). 157 | * __--save_to__: Directory where the fine-tuned model will be saved (required). 158 | * __--target_column_id__: Default: 1. The column id of the target column in the fine-tuning dataset (optional). 159 | * __--scaler__: Default: 0. Method to be used for scaling the target values. 0 for no scaling, 1 for min-max scaling, 2 for standard scaling (optional). 160 | * __--use_scaffold__: Default: 0. Determines whether to use scaffold splitting (1) or random splitting (0) (optional). 161 | * __--train_batch_size__: Default: 8. Train batch size (optional). 162 | * __--validation_batch_size__: Default: 8. Validation batch size (optional). 163 | * __--num_epochs__: Default: 50. Number of epochs (optional). 164 | * __--lr__: Default: 1e-5. Learning rate (optional). 165 | * __--wd__: Default: 0.1. Weight decay (optional). 166 | 167 |
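The three `--scaler` options correspond to standard formulas. The snippet below is a minimal sketch of them in plain Python, written only to make the options concrete; the function name `scale_targets` is hypothetical, the population standard deviation is an assumption of this sketch, and the training script may well use a library scaler instead.

```python
from statistics import fmean, pstdev


def scale_targets(values, scaler=0):
    """Scale regression targets: 0 = none, 1 = min-max, 2 = standard."""
    if scaler == 0:
        return list(values)
    if scaler == 1:
        # Min-max scaling maps the targets onto [0, 1].
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if scaler == 2:
        # Standard scaling gives zero mean and unit (population) variance.
        mean, std = fmean(values), pstdev(values)
        return [(v - mean) / std for v in values]
    raise ValueError("scaler must be 0, 1 or 2")


targets = [1.0, 2.0, 3.0]
minmax_scaled = scale_targets(targets, scaler=1)
standard_scaled = scale_targets(targets, scaler=2)
```

Both scaled variants divide by zero when all targets are identical, so a constant target column needs to be handled separately. Predictions made on scaled targets also have to be mapped back with the inverse transform before they are reported in the original units.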
168 | 169 | ## Producing Molecular Property Predictions with Fine-tuned Models 170 | 171 | Fine-tuned SELFormer models are available for download [here](https://drive.google.com/drive/folders/1LVw1YZBL1AUAGCxIkavz0KMJNVyzxAXG?usp=share_link). To make predictions with these models, please follow the instructions below. 172 | 173 |
174 | 175 | ### Binary Classification 176 | 177 | To make predictions for the BACE, BBBP, or HIV datasets, please run the command below and change the indicated arguments for the task at hand. The default parameters load the model fine-tuned on BBBP. 178 | 179 | ``` 180 | python3 binary_class_pred.py --task=bbbp --model_name=data/finetuned_models/SELFormer_bbbp_scaffold_optimized --tokenizer_name=data/RobertaFastTokenizer --pred_set=data/finetuning_datasets/classification/bbbp/bbbp_mock.csv --training_args=data/finetuned_models/SELFormer_bbbp_scaffold_optimized/training_args.bin 181 | ``` 182 | 183 | * __--task__: Binary classification task to choose (bace, bbbp, hiv). Defaults to the value shown above (optional). 184 | * __--model_name__: Path of the fine-tuned model. Defaults to the value shown above (optional). 185 | * __--tokenizer_name__: Path of the tokenizer. Defaults to the value shown above (optional). 186 | * __--pred_set__: Molecules to make predictions for. Should be a CSV file with a single column whose header is 'smiles'. Defaults to the value shown above (optional). 187 | * __--training_args__: Path of the training arguments file used to initialize the model arguments. Defaults to the value shown above (optional). 188 | 189 |
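`binary_class_pred.py` turns the raw model outputs into class labels by taking the argmax over the two logits of each molecule. The snippet below sketches that step in plain Python; the function name and the logit values are hypothetical and only serve as an illustration.

```python
def logits_to_labels(logits):
    """Return the index of the largest logit in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Hypothetical raw predictions for two molecules (two logits each).
raw_pred = [[2.1, -0.3], [-1.0, 0.4]]
labels = logits_to_labels(raw_pred)
```

The labels are written next to the input molecules in `data/predictions/<task>_predictions.csv`, in a column named `prediction`.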
190 | 191 | ### Multi-Label Classification 192 | 193 | To make predictions for the Tox21 or SIDER datasets, please run the command below and change the indicated arguments for the task at hand. The default parameters load the model fine-tuned on SIDER. 194 | 195 | ``` 196 | python3 multilabel_class_pred.py --task=sider --model_name=data/finetuned_models/SELFormer_sider_scaffold_optimized --pred_set=data/finetuning_datasets/classification/sider/sider_mock.csv --training_args=data/finetuned_models/SELFormer_sider_scaffold_optimized/training_args.bin --num_labels=27 197 | ``` 198 | 199 | * __--task__: Multi-label classification task to choose (tox21, sider) (required). 200 | * __--model_name__: Path of the fine-tuned model (required). 201 | * __--pred_set__: Molecules to make predictions for. Should be a CSV file with a single column containing SMILES. Header should be 'smiles' (required). 202 | * __--training_args__: Path of the training arguments file used to initialize the model arguments (required). 203 | * __--num_labels__: Number of labels (required). 204 | 205 |
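In a multi-label task every label is predicted independently, so a molecule can receive several positive labels at once. The snippet below is a generic sketch of one common decoding rule (a sigmoid per logit followed by a fixed threshold). The function name and the 0.5 threshold are assumptions of this sketch; `multilabel_class_pred.py` is not shown in this README and may decode its outputs differently.

```python
import math


def multilabel_predictions(logits, threshold=0.5):
    """Apply a sigmoid to every logit and threshold each label separately."""
    return [
        [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in row]
        for row in logits
    ]


# Hypothetical logits for one molecule and three labels.
predictions = multilabel_predictions([[2.0, -1.0, 0.0]])
```

With `--num_labels=27`, as in the SIDER command above, each row would contain 27 such binary decisions.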
206 | 207 | ### Regression 208 | 209 | To make predictions for the ESOL, FreeSolv, Lipophilicity, or PDBBind datasets, please run the command below and change the indicated arguments for the task at hand. The default parameters load the model fine-tuned on ESOL. 210 | 211 | ``` 212 | python3 regression_pred.py --task=esol --model_name=data/finetuned_models/esol_regression --tokenizer=data/RobertaFastTokenizer --pred_set=data/finetuning_datasets/regression/esol/esol_mock.csv --training_args=data/finetuned_models/esol_regression/training_args.bin 213 | ``` 214 | 215 | * __--task__: Regression task to choose (esol, freesolv, lipo, pdbbind_full) (required). 216 | * __--model_name__: Path of the fine-tuned model (required). 217 | * __--tokenizer__: Tokenizer selection (required). 218 | * __--pred_set__: Molecules to make predictions for. Should be a CSV file with a single column whose header is 'smiles' (required). 219 | * __--training_args__: Path of the training arguments file used to initialize the model arguments (required). 220 | 221 |
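The prediction scripts described above all read their input molecules from a CSV file with a single `smiles` column. The snippet below is a small sketch of how such a file could be produced with the standard library; the helper name `build_pred_set` and the two example molecules (ethanol and benzene) are illustrative choices.

```python
import csv
import io


def build_pred_set(smiles_list):
    """Return CSV text with a `smiles` header and one molecule per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["smiles"])
    for smiles in smiles_list:
        writer.writerow([smiles])
    return buffer.getvalue()


csv_text = build_pred_set(["CCO", "c1ccccc1"])
```

Saving `csv_text` to a file and passing its path to `--pred_set` gives the scripts the layout they expect.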
222 | 223 | ## License 224 | Copyright (C) 2023 HUBioDataLab 225 | 226 | This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. 227 | 228 | This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 229 | 230 | You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/. 231 | 232 | -------------------------------------------------------------------------------- /binary_class_pred.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import pandas as pd 4 | import torch 5 | from torch.nn import CrossEntropyLoss 6 | from torch.utils.data import Dataset 7 | from transformers import BertPreTrainedModel, RobertaConfig, RobertaTokenizerFast 8 | from transformers.models.roberta.modeling_roberta import ( 9 | RobertaClassificationHead, 10 | RobertaConfig, 11 | RobertaModel, 12 | ) 13 | from transformers import Trainer 14 | from prepare_finetuning_data import smiles_to_selfies 15 | import argparse 16 | 17 | parser = argparse.ArgumentParser() 18 | parser.add_argument("--task", default="bbbp", help="task selection.") 19 | parser.add_argument("--tokenizer_name", default="data/RobertaFastTokenizer", metavar="/path/to/dataset/", help="Tokenizer selection.") 20 | parser.add_argument("--pred_set", default="data/finetuning_datasets/classification/bbbp/bbbp_mock.csv", metavar="/path/to/dataset/", help="Test set for predictions.") 21 | parser.add_argument("--training_args", default= "data/finetuned_models/SELFormer_bbbp_scaffold_optimized/training_args.bin", metavar="/path/to/dataset/", 
help="Trained model arguments.") 22 | parser.add_argument("--model_name", default="data/finetuned_models/SELFormer_bbbp_scaffold_optimized", metavar="/path/to/dataset/", help="Path to the model.") 23 | args = parser.parse_args() 24 | 25 | class SELFIESTransformers_For_Classification(BertPreTrainedModel): 26 | def __init__(self, config): 27 | super(SELFIESTransformers_For_Classification, self).__init__(config) 28 | self.num_labels = config.num_labels 29 | self.roberta = RobertaModel(config) 30 | self.classifier = RobertaClassificationHead(config) 31 | 32 | def forward(self, input_ids, attention_mask, labels=None): 33 | outputs = self.roberta(input_ids, attention_mask=attention_mask) 34 | sequence_output = outputs[0] 35 | logits = self.classifier(sequence_output) 36 | 37 | outputs = (logits,) + outputs[2:] 38 | 39 | if labels is not None: 40 | loss_fct = CrossEntropyLoss() 41 | loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) 42 | outputs = (loss,) + outputs 43 | return outputs # (loss), logits, (hidden_states), (attentions) 44 | 45 | model_class = SELFIESTransformers_For_Classification 46 | config_class = RobertaConfig 47 | tokenizer_name = args.tokenizer_name 48 | 49 | tokenizer_class = RobertaTokenizerFast 50 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False) 51 | 52 | # Prepare and Get Data 53 | class SELFIESTransfomers_Dataset(Dataset): 54 | def __init__(self, data, tokenizer, MAX_LEN): 55 | text = data 56 | self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt") 57 | 58 | 59 | def __len__(self): 60 | return len(self.examples["input_ids"]) 61 | 62 | def __getitem__(self, index): 63 | item = {key: self.examples[key][index] for key in self.examples} 64 | 65 | return item 66 | 67 | pred_set = pd.read_csv(args.pred_set) 68 | pred_df_selfies = smiles_to_selfies(pred_set) 69 | 70 | MAX_LEN = 128 71 | 72 | pred_examples = 
(pred_df_selfies.iloc[:, 0].astype(str).tolist()) 73 | pred_dataset = SELFIESTransfomers_Dataset(pred_examples, tokenizer, MAX_LEN) 74 | 75 | training_args = torch.load(args.training_args) 76 | 77 | model_name = args.model_name 78 | config = config_class.from_pretrained(model_name, num_labels=2) 79 | bbbp_model = model_class.from_pretrained(model_name, config=config) 80 | 81 | trainer = Trainer(model=bbbp_model, args=training_args) # the instantiated 🤗 Transformers model to be trained # training arguments, defined above # training dataset # evaluation dataset 82 | raw_pred, label_ids, metrics = trainer.predict(pred_dataset) 83 | print(raw_pred) 84 | y_pred = np.argmax(raw_pred, axis=1).astype(int) 85 | res = pd.concat([pred_df_selfies, pd.DataFrame(y_pred, columns=["prediction"])], axis = 1) 86 | 87 | if not os.path.exists("data/predictions"): 88 | os.makedirs("data/predictions") 89 | 90 | res.to_csv("data/predictions/{}_predictions.csv".format(args.task), index=False) -------------------------------------------------------------------------------- /bpe_tokenizer.py: -------------------------------------------------------------------------------- 1 | from tokenizers import Tokenizer 2 | from tokenizers.models import BPE 3 | from tokenizers.pre_tokenizers import Split 4 | from tokenizers import Regex 5 | from tokenizers.processors import TemplateProcessing 6 | from tokenizers.trainers import BpeTrainer 7 | 8 | from os import mkdir 9 | 10 | 11 | def bpe_tokenizer(path="./data/selfies_subset.txt", save_to="./data/bpe/"): 12 | try: 13 | mkdir(save_to) 14 | except FileExistsError: 15 | pass 16 | 17 | tokenizer = Tokenizer(BPE(unk_token="")) 18 | 19 | tokenizer.pre_tokenizer = Split(pattern=Regex("\[|\]"), behavior="removed") 20 | 21 | tokenizer.post_processor = TemplateProcessing(single=" $A ", pair=" $A $B:1 :1", special_tokens=[("", 1), ("", 2)],) 22 | 23 | trainer = BpeTrainer(special_tokens=["", "", "", "", ""]) 24 | tokenizer.train(files=[path], trainer=trainer) 25 
| 26 | tokenizer.save(save_to + "/bpe.json", pretty=True) 27 | tokenizer.model.save(save_to) -------------------------------------------------------------------------------- /create_hparam_set.py: -------------------------------------------------------------------------------- 1 | import yaml 2 | 3 | 4 | def create_hparam_yml(TRAIN_BATCH_SIZE, TRAIN_EPOCHS, LEARNING_RATE, WEIGHT_DECAY, NUM_ATTENTION_HEADS, NUM_HIDDEN_LAYERS, save_to="hparams.yml"): 5 | # Hyperparameters 6 | hparams = {} 7 | set_no = 0 8 | for batch_size in TRAIN_BATCH_SIZE: 9 | for num_epoch in TRAIN_EPOCHS: 10 | for lr in LEARNING_RATE: 11 | for wd in WEIGHT_DECAY: 12 | for num_heads in NUM_ATTENTION_HEADS: 13 | for num_layers in NUM_HIDDEN_LAYERS: 14 | hparams["set_" + str(set_no)] = { 15 | "TRAIN_BATCH_SIZE": batch_size, 16 | "VALID_BATCH_SIZE": 8, 17 | "TRAIN_EPOCHS": num_epoch, 18 | "LEARNING_RATE": lr, 19 | "WEIGHT_DECAY": wd, 20 | "MAX_LEN": 128, 21 | "VOCAB_SIZE": 800, 22 | "MAX_POSITION_EMBEDDINGS": 514, 23 | "NUM_ATTENTION_HEADS": num_heads, 24 | "NUM_HIDDEN_LAYERS": num_layers, 25 | "TYPE_VOCAB_SIZE": 1, 26 | "HIDDEN_SIZE": 768, 27 | } 28 | set_no += 1 29 | set_no += 1 30 | set_no += 1 31 | set_no += 1 32 | set_no += 1 33 | set_no += 1 34 | 35 | # Write to yaml file 36 | with open(save_to, "w") as f: 37 | yaml.dump(hparams, f) 38 | 39 | 40 | create_hparam_yml(TRAIN_BATCH_SIZE=[16, 32, 64], TRAIN_EPOCHS=[5, 10], LEARNING_RATE=[1e-5], WEIGHT_DECAY=[0.001], NUM_ATTENTION_HEADS=[4, 8], NUM_HIDDEN_LAYERS=[8, 12]) 41 | -------------------------------------------------------------------------------- /create_selfies_alphabet.py: -------------------------------------------------------------------------------- 1 | import selfies as sf 2 | 3 | 4 | def get_selfies_alphabet(chembl_df, path="./data/chembl_29_selfies_alphabet.txt"): 5 | selfies_array = chembl_df.selfies.to_numpy(copy=True) 6 | selfies_alphabet = sf.get_alphabet_from_selfies(selfies_array) 7 | 8 | with open(path, "w") as f: 9 | 
f.write(",".join(list(selfies_alphabet))) 10 | -------------------------------------------------------------------------------- /data/BPETokenizer/bpe.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": "1.0", 3 | "truncation": null, 4 | "padding": null, 5 | "added_tokens": [ 6 | { 7 | "id": 0, 8 | "special": true, 9 | "content": "", 10 | "single_word": false, 11 | "lstrip": false, 12 | "rstrip": false, 13 | "normalized": false 14 | }, 15 | { 16 | "id": 1, 17 | "special": true, 18 | "content": "", 19 | "single_word": false, 20 | "lstrip": false, 21 | "rstrip": false, 22 | "normalized": false 23 | }, 24 | { 25 | "id": 2, 26 | "special": true, 27 | "content": "", 28 | "single_word": false, 29 | "lstrip": false, 30 | "rstrip": false, 31 | "normalized": false 32 | }, 33 | { 34 | "id": 3, 35 | "special": true, 36 | "content": "", 37 | "single_word": false, 38 | "lstrip": false, 39 | "rstrip": false, 40 | "normalized": false 41 | }, 42 | { 43 | "id": 4, 44 | "special": true, 45 | "content": "", 46 | "single_word": false, 47 | "lstrip": false, 48 | "rstrip": false, 49 | "normalized": false 50 | } 51 | ], 52 | "normalizer": null, 53 | "pre_tokenizer": { 54 | "type": "Split", 55 | "pattern": { 56 | "Regex": "\\[|\\]" 57 | }, 58 | "behavior": "Removed", 59 | "invert": false 60 | }, 61 | "post_processor": { 62 | "type": "TemplateProcessing", 63 | "single": [ 64 | { 65 | "SpecialToken": { 66 | "id": "", 67 | "type_id": 0 68 | } 69 | }, 70 | { 71 | "Sequence": { 72 | "id": "A", 73 | "type_id": 0 74 | } 75 | }, 76 | { 77 | "SpecialToken": { 78 | "id": "", 79 | "type_id": 0 80 | } 81 | } 82 | ], 83 | "pair": [ 84 | { 85 | "SpecialToken": { 86 | "id": "", 87 | "type_id": 0 88 | } 89 | }, 90 | { 91 | "Sequence": { 92 | "id": "A", 93 | "type_id": 0 94 | } 95 | }, 96 | { 97 | "SpecialToken": { 98 | "id": "", 99 | "type_id": 0 100 | } 101 | }, 102 | { 103 | "Sequence": { 104 | "id": "B", 105 | "type_id": 1 106 | } 107 | }, 108 | 
{ 109 | "SpecialToken": { 110 | "id": "", 111 | "type_id": 1 112 | } 113 | } 114 | ], 115 | "special_tokens": { 116 | "": { 117 | "id": "", 118 | "ids": [ 119 | 2 120 | ], 121 | "tokens": [ 122 | "" 123 | ] 124 | }, 125 | "": { 126 | "id": "", 127 | "ids": [ 128 | 1 129 | ], 130 | "tokens": [ 131 | "" 132 | ] 133 | } 134 | } 135 | }, 136 | "decoder": null, 137 | "model": { 138 | "type": "BPE", 139 | "dropout": null, 140 | "unk_token": "", 141 | "continuing_subword_prefix": null, 142 | "end_of_word_suffix": null, 143 | "fuse_unk": false, 144 | "vocab": { 145 | "": 0, 146 | "": 1, 147 | "": 2, 148 | "": 3, 149 | "": 4, 150 | "\n": 5, 151 | "#": 6, 152 | "+": 7, 153 | "-": 8, 154 | ".": 9, 155 | "/": 10, 156 | "0": 11, 157 | "1": 12, 158 | "2": 13, 159 | "3": 14, 160 | "4": 15, 161 | "5": 16, 162 | "8": 17, 163 | "=": 18, 164 | "@": 19, 165 | "A": 20, 166 | "B": 21, 167 | "C": 22, 168 | "F": 23, 169 | "H": 24, 170 | "I": 25, 171 | "K": 26, 172 | "L": 27, 173 | "M": 28, 174 | "N": 29, 175 | "O": 30, 176 | "P": 31, 177 | "R": 32, 178 | "S": 33, 179 | "T": 34, 180 | "Z": 35, 181 | "\\": 36, 182 | "a": 37, 183 | "c": 38, 184 | "e": 39, 185 | "g": 40, 186 | "h": 41, 187 | "i": 42, 188 | "l": 43, 189 | "n": 44, 190 | "r": 45, 191 | "s": 46, 192 | "Br": 47, 193 | "an": 48, 194 | "ch": 49, 195 | "Bran": 50, 196 | "Branch": 51, 197 | "Branch1": 52, 198 | "=C": 53, 199 | "Ri": 54, 200 | "ng": 55, 201 | "Ring": 56, 202 | "Ring1": 57, 203 | "=Branch1": 58, 204 | "Branch2": 59, 205 | "=O": 60, 206 | "Ring2": 61, 207 | "H1": 62, 208 | "C@": 63, 209 | "=N": 64, 210 | "#Branch1": 65, 211 | "C@@": 66, 212 | "=Branch2": 67, 213 | "C@H1": 68, 214 | "C@@H1": 69, 215 | "#Branch2": 70, 216 | "Cl": 71, 217 | "#C": 72, 218 | "/C": 73, 219 | "NH1": 74, 220 | "+1": 75, 221 | "-1": 76, 222 | "=Ring1": 77, 223 | "O-1": 78, 224 | "N+1": 79, 225 | "\\C": 80, 226 | "/N": 81, 227 | "#N": 82, 228 | "=Ring2": 83, 229 | "=S": 84, 230 | "=N+1": 85, 231 | "Na": 86, 232 | "Na+1": 87, 233 | "\\N": 88, 234 
| "S+1": 89, 235 | "/O": 90, 236 | "\\S": 91, 237 | "\\O": 92, 238 | "Br-1": 93, 239 | "I-1": 94, 240 | "Cl-1": 95, 241 | "/C@H1": 96, 242 | "Branch3": 97, 243 | "/C@@H1": 98, 244 | "=P": 99, 245 | "/S": 100, 246 | "=N-1": 101, 247 | "Si": 102, 248 | "K+1": 103, 249 | "N-1": 104, 250 | "Se": 105, 251 | "Li": 106, 252 | "Li+1": 107, 253 | "+3": 108, 254 | "Cl+3": 109, 255 | "\\C@H1": 110, 256 | "Ring3": 111, 257 | "\\C@@H1": 112, 258 | "/N+1": 113, 259 | "/P": 114, 260 | "\\F": 115, 261 | "P@": 116, 262 | "2H": 117, 263 | "PH1": 118, 264 | "/Br": 119, 265 | "N@": 120, 266 | "P+1": 121, 267 | "/Cl": 122, 268 | "\\NH1": 123, 269 | "\\Br": 124, 270 | "@+1": 125, 271 | "/I": 126, 272 | "/C@": 127, 273 | "Te": 128, 274 | "\\N+1": 129, 275 | "P@@": 130, 276 | "12": 131, 277 | "5I": 132, 278 | "\\O-1": 133, 279 | "125I": 134, 280 | "/F": 135, 281 | "#N+1": 136, 282 | "\\Cl": 137, 283 | "N@+1": 138, 284 | "\\I": 139, 285 | "-/": 140, 286 | "/C@@": 141, 287 | "N@@": 142, 288 | "N@@+1": 143, 289 | "-/Ring2": 144, 290 | "-\\": 145, 291 | "14": 146, 292 | "B-1": 147, 293 | "C-1": 148, 294 | "S@+1": 149, 295 | "14C": 150, 296 | "H2": 151, 297 | "H4": 152, 298 | "I+1": 153, 299 | "S-1": 154, 300 | "\\P": 155, 301 | "=S+1": 156, 302 | "=P@": 157, 303 | "SiH4": 158, 304 | "+2": 159, 305 | "3H": 160, 306 | "@@+1": 161, 307 | "Ag": 162, 308 | "C+1": 163, 309 | "S@@+1": 164, 310 | "Cl+1": 165, 311 | "=Se": 166, 312 | "-\\Ring1": 167, 313 | "H0": 168, 314 | "OH0": 169, 315 | "11": 170, 316 | "=Branch3": 171, 317 | "=Te": 172, 318 | "Mg": 173, 319 | "O+1": 174, 320 | "Zn": 175, 321 | "\\C@": 176, 322 | "\\S+1": 177, 323 | "H1-1": 178, 324 | "SeH1": 179, 325 | "P@+1": 180, 326 | "-\\Ring2": 181, 327 | "11C": 182, 328 | "=Te+1": 183, 329 | "Zn+2": 184, 330 | "/NH1": 185, 331 | "18": 186, 332 | "As": 187, 333 | "BH2": 188, 334 | "BH1-1": 189, 335 | "Ca": 190, 336 | "H3": 191, 337 | "OH1-1": 192, 338 | "SH2": 193, 339 | "=O+1": 194, 340 | "Se+1": 195, 341 | "TeH2": 196, 342 | "125IH1": 197, 
343 | "-/Ring1": 198, 344 | "14CH2": 199, 345 | "Ag+1": 200, 346 | "=Se+1": 201, 347 | "MgH2": 202, 348 | "Mg+2": 203, 349 | "11CH3": 204, 350 | "18F": 205, 351 | "BH2-1": 206, 352 | "Ca+2": 207 353 | }, 354 | "merges": [ 355 | "B r", 356 | "a n", 357 | "c h", 358 | "Br an", 359 | "Bran ch", 360 | "Branch 1", 361 | "= C", 362 | "R i", 363 | "n g", 364 | "Ri ng", 365 | "Ring 1", 366 | "= Branch1", 367 | "Branch 2", 368 | "= O", 369 | "Ring 2", 370 | "H 1", 371 | "C @", 372 | "= N", 373 | "# Branch1", 374 | "C@ @", 375 | "= Branch2", 376 | "C@ H1", 377 | "C@@ H1", 378 | "# Branch2", 379 | "C l", 380 | "# C", 381 | "/ C", 382 | "N H1", 383 | "+ 1", 384 | "- 1", 385 | "= Ring1", 386 | "O -1", 387 | "N +1", 388 | "\\ C", 389 | "/ N", 390 | "# N", 391 | "= Ring2", 392 | "= S", 393 | "=N +1", 394 | "N a", 395 | "Na +1", 396 | "\\ N", 397 | "S +1", 398 | "/ O", 399 | "\\ S", 400 | "\\ O", 401 | "Br -1", 402 | "I -1", 403 | "Cl -1", 404 | "/ C@H1", 405 | "Branch 3", 406 | "/ C@@H1", 407 | "= P", 408 | "/ S", 409 | "=N -1", 410 | "S i", 411 | "K +1", 412 | "N -1", 413 | "S e", 414 | "L i", 415 | "Li +1", 416 | "+ 3", 417 | "Cl +3", 418 | "\\ C@H1", 419 | "Ring 3", 420 | "\\ C@@H1", 421 | "/ N+1", 422 | "/ P", 423 | "\\ F", 424 | "P @", 425 | "2 H", 426 | "P H1", 427 | "/ Br", 428 | "N @", 429 | "P +1", 430 | "/ Cl", 431 | "\\ NH1", 432 | "\\ Br", 433 | "@ +1", 434 | "/ I", 435 | "/ C@", 436 | "T e", 437 | "\\ N+1", 438 | "P@ @", 439 | "1 2", 440 | "5 I", 441 | "\\ O-1", 442 | "12 5I", 443 | "/ F", 444 | "# N+1", 445 | "\\ Cl", 446 | "N@ +1", 447 | "\\ I", 448 | "- /", 449 | "/ C@@", 450 | "N@ @", 451 | "N@ @+1", 452 | "-/ Ring2", 453 | "- \\", 454 | "1 4", 455 | "B -1", 456 | "C -1", 457 | "S @+1", 458 | "14 C", 459 | "H 2", 460 | "H 4", 461 | "I +1", 462 | "S -1", 463 | "\\ P", 464 | "=S +1", 465 | "=P @", 466 | "Si H4", 467 | "+ 2", 468 | "3 H", 469 | "@ @+1", 470 | "A g", 471 | "C +1", 472 | "S @@+1", 473 | "Cl +1", 474 | "=S e", 475 | "-\\ Ring1", 476 | "H 0", 477 | "O 
H0", 478 | "1 1", 479 | "= Branch3", 480 | "= Te", 481 | "M g", 482 | "O +1", 483 | "Z n", 484 | "\\ C@", 485 | "\\ S+1", 486 | "H1 -1", 487 | "Se H1", 488 | "P@ +1", 489 | "-\\ Ring2", 490 | "11 C", 491 | "=Te +1", 492 | "Zn +2", 493 | "/ NH1", 494 | "1 8", 495 | "A s", 496 | "B H2", 497 | "B H1-1", 498 | "C a", 499 | "H 3", 500 | "O H1-1", 501 | "S H2", 502 | "=O +1", 503 | "Se +1", 504 | "Te H2", 505 | "125I H1", 506 | "-/ Ring1", 507 | "14C H2", 508 | "Ag +1", 509 | "=Se +1", 510 | "Mg H2", 511 | "Mg +2", 512 | "11C H3", 513 | "18 F", 514 | "BH2 -1", 515 | "Ca +2" 516 | ] 517 | } 518 | } -------------------------------------------------------------------------------- /data/BPETokenizer/merges.txt: -------------------------------------------------------------------------------- 1 | #version: 0.2 - Trained by `huggingface/tokenizers` 2 | B r 3 | a n 4 | c h 5 | Br an 6 | Bran ch 7 | Branch 1 8 | = C 9 | R i 10 | n g 11 | Ri ng 12 | Ring 1 13 | = Branch1 14 | Branch 2 15 | = O 16 | Ring 2 17 | H 1 18 | C @ 19 | = N 20 | # Branch1 21 | C@ @ 22 | = Branch2 23 | C@ H1 24 | C@@ H1 25 | # Branch2 26 | C l 27 | # C 28 | / C 29 | N H1 30 | + 1 31 | - 1 32 | = Ring1 33 | O -1 34 | N +1 35 | \ C 36 | / N 37 | # N 38 | = Ring2 39 | = S 40 | =N +1 41 | N a 42 | Na +1 43 | \ N 44 | S +1 45 | / O 46 | \ S 47 | \ O 48 | Br -1 49 | I -1 50 | Cl -1 51 | / C@H1 52 | Branch 3 53 | / C@@H1 54 | = P 55 | / S 56 | =N -1 57 | S i 58 | K +1 59 | N -1 60 | S e 61 | L i 62 | Li +1 63 | + 3 64 | Cl +3 65 | \ C@H1 66 | Ring 3 67 | \ C@@H1 68 | / N+1 69 | / P 70 | \ F 71 | P @ 72 | 2 H 73 | P H1 74 | / Br 75 | N @ 76 | P +1 77 | / Cl 78 | \ NH1 79 | \ Br 80 | @ +1 81 | / I 82 | / C@ 83 | T e 84 | \ N+1 85 | P@ @ 86 | 1 2 87 | 5 I 88 | \ O-1 89 | 12 5I 90 | / F 91 | # N+1 92 | \ Cl 93 | N@ +1 94 | \ I 95 | - / 96 | / C@@ 97 | N@ @ 98 | N@ @+1 99 | -/ Ring2 100 | - \ 101 | 1 4 102 | B -1 103 | C -1 104 | S @+1 105 | 14 C 106 | H 2 107 | H 4 108 | I +1 109 | S -1 110 | \ P 111 | =S +1 112 | =P 
@ 113 | Si H4 114 | + 2 115 | 3 H 116 | @ @+1 117 | A g 118 | C +1 119 | S @@+1 120 | Cl +1 121 | =S e 122 | -\ Ring1 123 | H 0 124 | O H0 125 | 1 1 126 | = Branch3 127 | = Te 128 | M g 129 | O +1 130 | Z n 131 | \ C@ 132 | \ S+1 133 | H1 -1 134 | Se H1 135 | P@ +1 136 | -\ Ring2 137 | 11 C 138 | =Te +1 139 | Zn +2 140 | / NH1 141 | 1 8 142 | A s 143 | B H2 144 | B H1-1 145 | C a 146 | H 3 147 | O H1-1 148 | S H2 149 | =O +1 150 | Se +1 151 | Te H2 152 | 125I H1 153 | -/ Ring1 154 | 14C H2 155 | Ag +1 156 | =Se +1 157 | Mg H2 158 | Mg +2 159 | 11C H3 160 | 18 F 161 | BH2 -1 162 | Ca +2 163 | -------------------------------------------------------------------------------- /data/BPETokenizer/vocab.json: -------------------------------------------------------------------------------- 1 | {"":0,"":1,"":2,"":3,"":4,"\n":5,"#":6,"+":7,"-":8,".":9,"/":10,"0":11,"1":12,"2":13,"3":14,"4":15,"5":16,"8":17,"=":18,"@":19,"A":20,"B":21,"C":22,"F":23,"H":24,"I":25,"K":26,"L":27,"M":28,"N":29,"O":30,"P":31,"R":32,"S":33,"T":34,"Z":35,"\\":36,"a":37,"c":38,"e":39,"g":40,"h":41,"i":42,"l":43,"n":44,"r":45,"s":46,"Br":47,"an":48,"ch":49,"Bran":50,"Branch":51,"Branch1":52,"=C":53,"Ri":54,"ng":55,"Ring":56,"Ring1":57,"=Branch1":58,"Branch2":59,"=O":60,"Ring2":61,"H1":62,"C@":63,"=N":64,"#Branch1":65,"C@@":66,"=Branch2":67,"C@H1":68,"C@@H1":69,"#Branch2":70,"Cl":71,"#C":72,"/C":73,"NH1":74,"+1":75,"-1":76,"=Ring1":77,"O-1":78,"N+1":79,"\\C":80,"/N":81,"#N":82,"=Ring2":83,"=S":84,"=N+1":85,"Na":86,"Na+1":87,"\\N":88,"S+1":89,"/O":90,"\\S":91,"\\O":92,"Br-1":93,"I-1":94,"Cl-1":95,"/C@H1":96,"Branch3":97,"/C@@H1":98,"=P":99,"/S":100,"=N-1":101,"Si":102,"K+1":103,"N-1":104,"Se":105,"Li":106,"Li+1":107,"+3":108,"Cl+3":109,"\\C@H1":110,"Ring3":111,"\\C@@H1":112,"/N+1":113,"/P":114,"\\F":115,"P@":116,"2H":117,"PH1":118,"/Br":119,"N@":120,"P+1":121,"/Cl":122,"\\NH1":123,"\\Br":124,"@+1":125,"/I":126,"/C@":127,"Te":128,"\\N+1":129,"P@@":130,"12":131,"5I":132,"\\O-1":133,"125I":134,"/F":135,"#N+1
":136,"\\Cl":137,"N@+1":138,"\\I":139,"-/":140,"/C@@":141,"N@@":142,"N@@+1":143,"-/Ring2":144,"-\\":145,"14":146,"B-1":147,"C-1":148,"S@+1":149,"14C":150,"H2":151,"H4":152,"I+1":153,"S-1":154,"\\P":155,"=S+1":156,"=P@":157,"SiH4":158,"+2":159,"3H":160,"@@+1":161,"Ag":162,"C+1":163,"S@@+1":164,"Cl+1":165,"=Se":166,"-\\Ring1":167,"H0":168,"OH0":169,"11":170,"=Branch3":171,"=Te":172,"Mg":173,"O+1":174,"Zn":175,"\\C@":176,"\\S+1":177,"H1-1":178,"SeH1":179,"P@+1":180,"-\\Ring2":181,"11C":182,"=Te+1":183,"Zn+2":184,"/NH1":185,"18":186,"As":187,"BH2":188,"BH1-1":189,"Ca":190,"H3":191,"OH1-1":192,"SH2":193,"=O+1":194,"Se+1":195,"TeH2":196,"125IH1":197,"-/Ring1":198,"14CH2":199,"Ag+1":200,"=Se+1":201,"MgH2":202,"Mg+2":203,"11CH3":204,"18F":205,"BH2-1":206,"Ca+2":207} -------------------------------------------------------------------------------- /data/RobertaFastTokenizer/merges.txt: -------------------------------------------------------------------------------- 1 | #version: 0.2 - Trained by `huggingface/tokenizers` 2 | B r 3 | a n 4 | c h 5 | Br an 6 | Bran ch 7 | Branch 1 8 | = C 9 | R i 10 | n g 11 | Ri ng 12 | Ring 1 13 | = Branch1 14 | Branch 2 15 | = O 16 | Ring 2 17 | H 1 18 | C @ 19 | = N 20 | # Branch1 21 | C@ @ 22 | = Branch2 23 | C@ H1 24 | C@@ H1 25 | # Branch2 26 | C l 27 | # C 28 | / C 29 | N H1 30 | + 1 31 | - 1 32 | = Ring1 33 | O -1 34 | N +1 35 | \ C 36 | / N 37 | # N 38 | = Ring2 39 | = S 40 | =N +1 41 | N a 42 | Na +1 43 | \ N 44 | S +1 45 | / O 46 | \ S 47 | \ O 48 | Br -1 49 | I -1 50 | Cl -1 51 | / C@H1 52 | Branch 3 53 | / C@@H1 54 | = P 55 | / S 56 | =N -1 57 | S i 58 | K +1 59 | N -1 60 | S e 61 | L i 62 | Li +1 63 | + 3 64 | Cl +3 65 | \ C@H1 66 | Ring 3 67 | \ C@@H1 68 | / N+1 69 | / P 70 | \ F 71 | P @ 72 | 2 H 73 | P H1 74 | / Br 75 | N @ 76 | P +1 77 | / Cl 78 | \ NH1 79 | \ Br 80 | @ +1 81 | / I 82 | / C@ 83 | T e 84 | \ N+1 85 | P@ @ 86 | 1 2 87 | 5 I 88 | \ O-1 89 | 12 5I 90 | / F 91 | # N+1 92 | \ Cl 93 | N@ +1 94 | \ I 95 | - / 96 | / 
C@@ 97 | N@ @ 98 | N@ @+1 99 | -/ Ring2 100 | - \ 101 | 1 4 102 | B -1 103 | C -1 104 | S @+1 105 | 14 C 106 | H 2 107 | H 4 108 | I +1 109 | S -1 110 | \ P 111 | =S +1 112 | =P @ 113 | Si H4 114 | + 2 115 | 3 H 116 | @ @+1 117 | A g 118 | C +1 119 | S @@+1 120 | Cl +1 121 | =S e 122 | -\ Ring1 123 | H 0 124 | O H0 125 | 1 1 126 | = Branch3 127 | = Te 128 | M g 129 | O +1 130 | Z n 131 | \ C@ 132 | \ S+1 133 | H1 -1 134 | Se H1 135 | P@ +1 136 | -\ Ring2 137 | 11 C 138 | =Te +1 139 | Zn +2 140 | / NH1 141 | 1 8 142 | A s 143 | B H2 144 | B H1-1 145 | C a 146 | H 3 147 | O H1-1 148 | S H2 149 | =O +1 150 | Se +1 151 | Te H2 152 | 125I H1 153 | -/ Ring1 154 | 14C H2 155 | Ag +1 156 | =Se +1 157 | Mg H2 158 | Mg +2 159 | 11C H3 160 | 18 F 161 | BH2 -1 162 | Ca +2 163 | -------------------------------------------------------------------------------- /data/RobertaFastTokenizer/special_tokens_map.json: -------------------------------------------------------------------------------- 1 | {"bos_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "sep_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "cls_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "mask_token": {"content": "", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true}} -------------------------------------------------------------------------------- /data/RobertaFastTokenizer/tokenizer.json: -------------------------------------------------------------------------------- 1 | 
{"version":"1.0","truncation":null,"padding":null,"added_tokens":[{"id":0,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":1,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":2,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":3,"special":true,"content":"","single_word":false,"lstrip":false,"rstrip":false,"normalized":true},{"id":4,"special":true,"content":"","single_word":false,"lstrip":true,"rstrip":false,"normalized":true}],"normalizer":null,"pre_tokenizer":{"type":"ByteLevel","add_prefix_space":false,"trim_offsets":true},"post_processor":{"type":"RobertaProcessing","sep":["",2],"cls":["",1],"trim_offsets":true,"add_prefix_space":false},"decoder":{"type":"ByteLevel","add_prefix_space":true,"trim_offsets":true},"model":{"type":"BPE","dropout":null,"unk_token":null,"continuing_subword_prefix":"","end_of_word_suffix":"","fuse_unk":false,"vocab":{"":0,"":1,"":2,"":3,"":4,"\n":5,"#":6,"+":7,"-":8,".":9,"/":10,"0":11,"1":12,"2":13,"3":14,"4":15,"5":16,"8":17,"=":18,"@":19,"A":20,"B":21,"C":22,"F":23,"H":24,"I":25,"K":26,"L":27,"M":28,"N":29,"O":30,"P":31,"R":32,"S":33,"T":34,"Z":35,"\\":36,"a":37,"c":38,"e":39,"g":40,"h":41,"i":42,"l":43,"n":44,"r":45,"s":46,"Br":47,"an":48,"ch":49,"Bran":50,"Branch":51,"Branch1":52,"=C":53,"Ri":54,"ng":55,"Ring":56,"Ring1":57,"=Branch1":58,"Branch2":59,"=O":60,"Ring2":61,"H1":62,"C@":63,"=N":64,"#Branch1":65,"C@@":66,"=Branch2":67,"C@H1":68,"C@@H1":69,"#Branch2":70,"Cl":71,"#C":72,"/C":73,"NH1":74,"+1":75,"-1":76,"=Ring1":77,"O-1":78,"N+1":79,"\\C":80,"/N":81,"#N":82,"=Ring2":83,"=S":84,"=N+1":85,"Na":86,"Na+1":87,"\\N":88,"S+1":89,"/O":90,"\\S":91,"\\O":92,"Br-1":93,"I-1":94,"Cl-1":95,"/C@H1":96,"Branch3":97,"/C@@H1":98,"=P":99,"/S":100,"=N-1":101,"Si":102,"K+1":103,"N-1":104,"Se":105,"Li":106,"Li+1":107,"+3":108,"Cl+3":109,"\\C@H1":110,"Ring3":111,"\\C@@H1":112,"/N+1":113,"/P
":114,"\\F":115,"P@":116,"2H":117,"PH1":118,"/Br":119,"N@":120,"P+1":121,"/Cl":122,"\\NH1":123,"\\Br":124,"@+1":125,"/I":126,"/C@":127,"Te":128,"\\N+1":129,"P@@":130,"12":131,"5I":132,"\\O-1":133,"125I":134,"/F":135,"#N+1":136,"\\Cl":137,"N@+1":138,"\\I":139,"-/":140,"/C@@":141,"N@@":142,"N@@+1":143,"-/Ring2":144,"-\\":145,"14":146,"B-1":147,"C-1":148,"S@+1":149,"14C":150,"H2":151,"H4":152,"I+1":153,"S-1":154,"\\P":155,"=S+1":156,"=P@":157,"SiH4":158,"+2":159,"3H":160,"@@+1":161,"Ag":162,"C+1":163,"S@@+1":164,"Cl+1":165,"=Se":166,"-\\Ring1":167,"H0":168,"OH0":169,"11":170,"=Branch3":171,"=Te":172,"Mg":173,"O+1":174,"Zn":175,"\\C@":176,"\\S+1":177,"H1-1":178,"SeH1":179,"P@+1":180,"-\\Ring2":181,"11C":182,"=Te+1":183,"Zn+2":184,"/NH1":185,"18":186,"As":187,"BH2":188,"BH1-1":189,"Ca":190,"H3":191,"OH1-1":192,"SH2":193,"=O+1":194,"Se+1":195,"TeH2":196,"125IH1":197,"-/Ring1":198,"14CH2":199,"Ag+1":200,"=Se+1":201,"MgH2":202,"Mg+2":203,"11CH3":204,"18F":205,"BH2-1":206,"Ca+2":207},"merges":["B r","a n","c h","Br an","Bran ch","Branch 1","= C","R i","n g","Ri ng","Ring 1","= Branch1","Branch 2","= O","Ring 2","H 1","C @","= N","# Branch1","C@ @","= Branch2","C@ H1","C@@ H1","# Branch2","C l","# C","/ C","N H1","+ 1","- 1","= Ring1","O -1","N +1","\\ C","/ N","# N","= Ring2","= S","=N +1","N a","Na +1","\\ N","S +1","/ O","\\ S","\\ O","Br -1","I -1","Cl -1","/ C@H1","Branch 3","/ C@@H1","= P","/ S","=N -1","S i","K +1","N -1","S e","L i","Li +1","+ 3","Cl +3","\\ C@H1","Ring 3","\\ C@@H1","/ N+1","/ P","\\ F","P @","2 H","P H1","/ Br","N @","P +1","/ Cl","\\ NH1","\\ Br","@ +1","/ I","/ C@","T e","\\ N+1","P@ @","1 2","5 I","\\ O-1","12 5I","/ F","# N+1","\\ Cl","N@ +1","\\ I","- /","/ C@@","N@ @","N@ @+1","-/ Ring2","- \\","1 4","B -1","C -1","S @+1","14 C","H 2","H 4","I +1","S -1","\\ P","=S +1","=P @","Si H4","+ 2","3 H","@ @+1","A g","C +1","S @@+1","Cl +1","=S e","-\\ Ring1","H 0","O H0","1 1","= Branch3","= Te","M g","O +1","Z n","\\ C@","\\ S+1","H1 -1","Se 
H1","P@ +1","-\\ Ring2","11 C","=Te +1","Zn +2","/ NH1","1 8","A s","B H2","B H1-1","C a","H 3","O H1-1","S H2","=O +1","Se +1","Te H2","125I H1","-/ Ring1","14C H2","Ag +1","=Se +1","Mg H2","Mg +2","11C H3","18 F","BH2 -1","Ca +2"]}} -------------------------------------------------------------------------------- /data/RobertaFastTokenizer/tokenizer_config.json: -------------------------------------------------------------------------------- 1 | {"unk_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "errors": "replace", "sep_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "special_tokens_map_file": null, "name_or_path": "./data/bpe/", "tokenizer_class": "RobertaTokenizer"} -------------------------------------------------------------------------------- /data/RobertaFastTokenizer/vocab.json: -------------------------------------------------------------------------------- 1 | 
{"":0,"":1,"":2,"":3,"":4,"\n":5,"#":6,"+":7,"-":8,".":9,"/":10,"0":11,"1":12,"2":13,"3":14,"4":15,"5":16,"8":17,"=":18,"@":19,"A":20,"B":21,"C":22,"F":23,"H":24,"I":25,"K":26,"L":27,"M":28,"N":29,"O":30,"P":31,"R":32,"S":33,"T":34,"Z":35,"\\":36,"a":37,"c":38,"e":39,"g":40,"h":41,"i":42,"l":43,"n":44,"r":45,"s":46,"Br":47,"an":48,"ch":49,"Bran":50,"Branch":51,"Branch1":52,"=C":53,"Ri":54,"ng":55,"Ring":56,"Ring1":57,"=Branch1":58,"Branch2":59,"=O":60,"Ring2":61,"H1":62,"C@":63,"=N":64,"#Branch1":65,"C@@":66,"=Branch2":67,"C@H1":68,"C@@H1":69,"#Branch2":70,"Cl":71,"#C":72,"/C":73,"NH1":74,"+1":75,"-1":76,"=Ring1":77,"O-1":78,"N+1":79,"\\C":80,"/N":81,"#N":82,"=Ring2":83,"=S":84,"=N+1":85,"Na":86,"Na+1":87,"\\N":88,"S+1":89,"/O":90,"\\S":91,"\\O":92,"Br-1":93,"I-1":94,"Cl-1":95,"/C@H1":96,"Branch3":97,"/C@@H1":98,"=P":99,"/S":100,"=N-1":101,"Si":102,"K+1":103,"N-1":104,"Se":105,"Li":106,"Li+1":107,"+3":108,"Cl+3":109,"\\C@H1":110,"Ring3":111,"\\C@@H1":112,"/N+1":113,"/P":114,"\\F":115,"P@":116,"2H":117,"PH1":118,"/Br":119,"N@":120,"P+1":121,"/Cl":122,"\\NH1":123,"\\Br":124,"@+1":125,"/I":126,"/C@":127,"Te":128,"\\N+1":129,"P@@":130,"12":131,"5I":132,"\\O-1":133,"125I":134,"/F":135,"#N+1":136,"\\Cl":137,"N@+1":138,"\\I":139,"-/":140,"/C@@":141,"N@@":142,"N@@+1":143,"-/Ring2":144,"-\\":145,"14":146,"B-1":147,"C-1":148,"S@+1":149,"14C":150,"H2":151,"H4":152,"I+1":153,"S-1":154,"\\P":155,"=S+1":156,"=P@":157,"SiH4":158,"+2":159,"3H":160,"@@+1":161,"Ag":162,"C+1":163,"S@@+1":164,"Cl+1":165,"=Se":166,"-\\Ring1":167,"H0":168,"OH0":169,"11":170,"=Branch3":171,"=Te":172,"Mg":173,"O+1":174,"Zn":175,"\\C@":176,"\\S+1":177,"H1-1":178,"SeH1":179,"P@+1":180,"-\\Ring2":181,"11C":182,"=Te+1":183,"Zn+2":184,"/NH1":185,"18":186,"As":187,"BH2":188,"BH1-1":189,"Ca":190,"H3":191,"OH1-1":192,"SH2":193,"=O+1":194,"Se+1":195,"TeH2":196,"125IH1":197,"-/Ring1":198,"14CH2":199,"Ag+1":200,"=Se+1":201,"MgH2":202,"Mg+2":203,"11CH3":204,"18F":205,"BH2-1":206,"Ca+2":207} 
-------------------------------------------------------------------------------- /data/finetuning_datasets/classification/bbbp/bbbp_mock.csv: -------------------------------------------------------------------------------- 1 | smiles,p_np 2 | [Cl].CC(C)NCC(O)COc1cccc2ccccc12,1 3 | C(=O)(OC(C)(C)C)CCCc1ccc(cc1)N(CCCl)CCCl,1 4 | c12c3c(N4CCN(C)CC4)c(F)cc1c(c(C(O)=O)cn2C(C)CO3)=O,1 5 | C1CCN(CC1)Cc1cccc(c1)OCCCNC(=O)C,1 6 | Cc1onc(c2ccccc2Cl)c1C(=O)N[C@H]3[C@H]4SC(C)(C)[C@@H](N4C3=O)C(O)=O,1 7 | CCN1CCN(C(=O)N[C@@H](C(=O)N[C@H]2[C@H]3SCC(=C(N3C2=O)C(O)=O)CSc4nnnn4C)c5ccc(O)cc5)C(=O)C1=O,1 8 | CN(C)[C@H]1[C@@H]2C[C@H]3C(=C(O)c4c(O)cccc4[C@@]3(C)O)C(=O)[C@]2(O)C(=O)\C(=C(/O)NCN5CCCC5)C1=O,1 9 | Cn1c2CCC(Cn3ccnc3C)C(=O)c2c4ccccc14,1 10 | COc1ccc(cc1)[C@@H]2Sc3ccccc3N(CCN(C)C)C(=O)[C@@H]2OC(C)=O,1 11 | -------------------------------------------------------------------------------- /data/finetuning_datasets/classification/sider/sider_mock.csv: -------------------------------------------------------------------------------- 1 | smiles,Hepatobiliary disorders,Metabolism and nutrition disorders,Product issues,Eye disorders,Investigations,Musculoskeletal and connective tissue disorders,Gastrointestinal disorders,Social circumstances,Immune system disorders,Reproductive system and breast disorders,"Neoplasms benign, malignant and unspecified (incl cysts and polyps)",General disorders and administration site conditions,Endocrine disorders,Surgical and medical procedures,Vascular disorders,Blood and lymphatic system disorders,Skin and subcutaneous tissue disorders,"Congenital, familial and genetic disorders",Infections and infestations,"Respiratory, thoracic and mediastinal disorders",Psychiatric disorders,Renal and urinary disorders,"Pregnancy, puerperium and perinatal conditions",Ear and labyrinth disorders,Cardiac disorders,Nervous system disorders,"Injury, poisoning and procedural complications" 2 | C(CNCCNCCNCCN)N,1,1,0,0,1,1,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,1,1,1,0 3 | 
CC(C)(C)C1=CC(=C(C=C1NC(=O)C2=CNC3=CC=CC=C3C2=O)O)C(C)(C)C,0,1,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,1,1,0,0,0,1,0,1,0 4 | CC[C@]12CC(=C)[C@H]3[C@H]([C@@H]1CC[C@]2(C#C)O)CCC4=CCCC[C@H]34,0,1,0,1,1,0,1,0,1,1,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1,0 5 | CCC12CC(=C)C3C(C1CC[C@]2(C#C)O)CCC4=CC(=O)CCC34,1,1,0,1,1,1,1,0,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,0,0,1,1 6 | C1C(C2=CC=CC=C2N(C3=CC=CC=C31)C(=O)N)O,1,1,0,1,1,1,1,0,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,1,0,1,0 7 | CC[C@H](C)[C@H]1C(=O)N[C@H]2CSSC[C@@H](C(=O)N[C@@H](CSSC[C@@H](C(=O)NCC(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@@H](CSSC[C@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC2=O)CO)CC(C)C)CC3=CC=C(C=C3)O)CCC(=O)N)CC(C)C)CCC(=O)O)CC(=O)N)CC4=CC=C(C=C4)O)C(=O)NCC(=O)O)C(=O)NCC(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)NCC(=O)N[C@@H](CC5=CC=CC=C5)C(=O)N[C@@H](CC6=CC=CC=C6)C(=O)N[C@@H](CC7=CC=C(C=C7)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N8CCC[C@H]8C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CCCNC(=N)N)C(=O)O)C(C)C)CC(C)C)CC9=CC=C(C=C9)O)CC(C)C)C)CCC(=O)O)C(C)C)CC(C)C)CC2=CN=CN2)CO)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CC2=CC=CC=C2)N)C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)CO)[C@@H](C)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](C(C)C)NC(=O)[C@H]([C@@H](C)CC)NC(=O)CN,0,1,0,1,1,1,1,0,1,0,1,1,0,0,1,0,1,0,1,1,1,1,0,0,1,1,1 8 | CC1CC2C3CCC4=CC(=O)C=CC4([C@]3(C(CC2([C@]1(C(=O)CCl)O)C)O)F)C,0,1,0,1,0,0,1,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1 9 | CCCCCCOC(=O)N=C(C1=CC=C(C=C1)NCC2=NC3=C(N2C)C=CC(=C3)C(=O)N(CCC(=O)OCC)C4=CC=CC=N4)N,1,1,0,0,1,1,1,0,1,0,0,1,0,1,1,1,1,0,1,1,0,1,0,0,1,1,1 10 | CSCCC(C(=O)NCC(=O)NC(CC1=CNC2=CC=CC=C21)C(=O)NC(CCSC)C(=O)NC(CC(=O)O)C(=O)NC(CC3=CC=CC=C3)C(=O)N)NC(=O)C(CC4=CC=C(C=C4)OS(=O)(=O)O)NC(=O)C(CC(=O)O)N,0,0,0,0,1,1,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0 
11 | -------------------------------------------------------------------------------- /data/finetuning_datasets/regression/esol/esol.csv: -------------------------------------------------------------------------------- 1 | smiles,Compound ID,ESOL predicted log solubility in mols per litre,Minimum Degree,Molecular Weight,Number of H-Bond Donors,Number of Rings,Number of Rotatable Bonds,Polar Surface Area,measured log solubility in mols per litre 2 | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O ,Amigdalin,-0.974,1,457.4320000000001,7,3,7,202.32,-0.77 3 | Cc1occc1C(=O)Nc2ccccc2,Fenfuram,-2.885,1,201.225,1,2,2,42.24,-3.3 4 | CC(C)=CCCC(C)=CC(=O),citral,-2.579,1,152.237,0,0,4,17.07,-2.06 5 | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43,Picene,-6.617999999999999,2,278.354,0,5,0,0.0,-7.87 6 | c1ccsc1,Thiophene,-2.232,2,84.14299999999999,0,1,0,0.0,-1.33 7 | c2ccc1scnc1c2 ,benzothiazole,-2.733,2,135.191,0,2,0,12.89,-1.5 8 | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl,"2,2,4,6,6'-PCB",-6.545,1,326.437,0,2,1,0.0,-7.32 9 | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O,Estradiol,-4.138,1,272.388,2,4,0,40.46,-5.03 10 | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl,Dieldrin,-4.533,1,380.913,0,5,0,12.53,-6.29 11 | COc5cc4OCC3Oc2c1CC(Oc1ccc2C(=O)C3c4cc5OC)C(C)=C ,Rotenone,-5.246,1,394.4230000000002,0,5,3,63.22,-4.42 12 | O=C1CCCN1,2-pyrrolidone,0.243,1,85.10600000000001,1,1,0,29.1,1.07 13 | Clc1ccc2ccccc2c1,2-Chloronapthalene,-4.063,1,162.61899999999997,0,2,0,0.0,-4.14 14 | CCCC=C,1-Pentene ,-2.01,1,70.135,0,0,2,0.0,-2.68 15 | CCC1(C(=O)NCNC1=O)c2ccccc2,Primidone,-1.897,1,218.256,2,2,2,58.2,-2.64 16 | CCCCCCCCCCCCCC,Tetradecane,-5.45,1,198.39399999999995,0,0,11,0.0,-7.96 17 | CC(C)Cl,2-Chloropropane,-1.585,1,78.542,0,0,0,0.0,-1.41 18 | CCC(C)CO,2-Methylbutanol,-1.027,1,88.14999999999999,1,0,2,20.23,-0.47 19 | N#Cc1ccccc1,Benzonitrile,-2.03,1,103.12399999999997,0,1,0,23.79,-1.0 20 | CCOP(=S)(OCC)Oc1cc(C)nc(n1)C(C)C,Diazinon,-3.989,1,304.35200000000003,0,1,7,53.47,-3.64 21 | 
CCCCCCCCCC(C)O,2-Undecanol,-3.096,1,172.312,1,0,8,20.23,-2.94 22 | Clc1ccc(c(Cl)c1)c2c(Cl)ccc(Cl)c2Cl ,"2,2',3,4,6-PCB",-6.627000000000001,1,326.437,0,2,1,0.0,-7.43 23 | O=c2[nH]c1CCCc1c(=O)n2C3CCCCC3,Lenacil,-3.355,1,234.29899999999995,1,3,1,54.86,-4.593999999999999 24 | CCOP(=S)(OCC)SCSCC,Phorate,-3.747,1,260.386,0,0,8,18.46,-4.11 25 | CCOc1ccc(NC(=O)C)cc1,Phenacetin,-2.342,1,179.219,1,1,3,38.33,-2.35 26 | CCN(CC)c1c(cc(c(N)c1N(=O)=O)C(F)(F)F)N(=O)=O,Dinitramine,-4.479,1,322.243,1,1,5,115.54000000000002,-5.47 27 | CCCCCCCO,1-Heptanol,-1.751,1,116.204,1,0,5,20.23,-1.81 28 | Cn1c(=O)n(C)c2nc[nH]c2c1=O,Theophylline,-1.452,1,180.167,1,2,0,72.68,-1.39 29 | CCCCC1(CC)C(=O)NC(=O)NC1=O,Butethal,-1.974,1,212.249,2,1,4,75.27000000000001,-1.661 30 | ClC(Cl)=C(c1ccc(Cl)cc1)c2ccc(Cl)cc2,"P,P'-DDE",-6.553,1,318.0300000000001,0,2,2,0.0,-6.9 31 | CCCCCCCC(=O)OC,Methyl octanoate,-2.608,1,158.241,0,0,6,26.3,-3.17 32 | CCc1ccc(CC)cc1,"1,4-Diethylbenzene ",-3.633,1,134.22199999999998,0,1,2,0.0,-3.75 33 | CCOP(=S)(OCC)SCSC(C)(C)C,Terbufos,-4.367,1,288.44,0,0,7,18.46,-4.755 34 | COC(=O)Nc1cccc(OC(=O)Nc2cccc(C)c2)c1,Phenmedipham,-4.229,1,300.314,2,2,3,76.66,-4.805 35 | ClC(=C)Cl,"1,1-Dichloroethylene",-1.939,1,96.944,0,0,0,0.0,-1.64 36 | Cc1cccc2c1Cc3ccccc32,1-Methylfluorene,-4.478,1,180.25000000000003,0,3,0,0.0,-5.22 37 | CCCCC=O,Valeraldehyde,-1.103,1,86.13399999999999,0,0,3,17.07,-0.85 38 | N(c1ccccc1)c2ccccc2,Diphenylamine,-3.857,2,169.227,1,2,2,12.03,-3.504 39 | CN(C)C(=O)SCCCCOc1ccccc1,Fenothiocarb,-3.297,1,253.367,0,1,6,29.540000000000003,-3.927 40 | CCCOP(=S)(OCCC)SCC(=O)N1CCCCC1C,Piperophos,-4.637,1,353.4900000000001,0,1,9,38.77,-4.15 41 | CCCCCCCI,1-Iodoheptane,-3.904,1,226.101,0,0,5,0.0,-4.81 42 | c1c(Cl)cccc1c2ccccc2,3-Chlorobiphenyl,-4.685,1,188.657,0,2,1,0.0,-4.88 43 | OCCCC=C,4-Pentene-1-ol,-0.7909999999999999,1,86.134,1,0,3,20.23,-0.15 44 | O=C2NC(=O)C1(CCC1)C(=O)N2,Cyclobutyl-5-spirobarbituric acid,-0.527,1,168.15200000000002,2,2,0,75.27,-1.655 45 | CC(C)C1CCC(C)CC1O 
,menthol,-2.782,1,156.269,1,1,1,20.23,-2.53 46 | CC(C)OC=O,Isopropyl formate,-0.684,1,88.106,0,0,2,26.3,-0.63 47 | CCCCCC(C)O,2-Heptanol ,-1.6780000000000002,1,116.20399999999998,1,0,4,20.23,-1.55 48 | CC(=O)Nc1ccc(Br)cc1,p-Bromoacetanilide,-3.012,1,214.062,1,1,1,29.1,-3.083 49 | c1ccccc1n2ncc(N)c(Br)c2(=O),brompyrazone,-3.005,1,266.098,1,2,1,60.91,-3.127 50 | COC(=O)C1=C(C)NC(=C(C1c2ccccc2N(=O)=O)C(=O)OC)C ,nifedipine,-4.248,1,346.33900000000017,1,2,4,107.77,-4.76 51 | c2c(C)cc1nc(C)ccc1c2 ,"2,7-dimethylquinoline",-3.342,1,157.216,0,2,0,12.89,-1.94 52 | CCCCCCC#C,1-Octyne ,-2.509,1,110.2,0,0,4,0.0,-3.66 53 | CCC1(C(=O)NC(=O)NC1=O)C2=CCCCC2 ,cyclobarbital,-2.421,1,236.271,2,2,2,75.27000000000001,-2.17 54 | c1ccc2c(c1)ccc3c4ccccc4ccc23,Chrysene,-5.568,2,228.294,0,4,0,0.0,-8.057 55 | CCC(C)n1c(=O)[nH]c(C)c(Br)c1=O ,Bromacil,-3.419,1,261.119,1,1,2,54.86,-2.523 56 | Clc1cccc(c1Cl)c2c(Cl)c(Cl)cc(Cl)c2Cl ,"2,2',3,3',5,6-PCB",-7.185,1,360.88200000000006,0,2,1,0.0,-8.6 57 | Cc1ccccc1O,2-Methylphenol,-2.281,1,108.14,1,1,0,20.23,-0.62 58 | CC(C)CCC(C)(C)C,"2,2,5-Trimethylhexane",-3.631,1,128.259,0,0,2,0.0,-5.05 59 | Cc1ccc(C)c2ccccc12,"1,4-Dimethylnaphthalene ",-4.147,1,156.228,0,2,0,0.0,-4.14 60 | Cc1cc2c3ccccc3ccc2c4ccccc14,6-Methylchrysene,-5.931,1,242.321,0,4,0,0.0,-6.57 61 | CCCC(=O)C,2-Pentanone,-0.846,1,86.13399999999999,0,0,2,17.07,-0.19 62 | Clc1cc(Cl)c(Cl)c(c1Cl)c2c(Cl)c(Cl)cc(Cl)c2Cl ,"2,2',3,3',5,5',6,6'-PCB",-8.304,1,429.77200000000016,0,2,1,0.0,-9.15 63 | CCCOC(=O)CC,Methyl butyrate,-1.545,1,116.15999999999998,0,0,3,26.3,-0.82 64 | CC34CC(O)C1(F)C(CCC2=CC(=O)C=CC12C)C3CC(O)C4(O)C(=O)CO,Triamcinolone,-2.734,1,394.43900000000014,4,4,2,115.06000000000002,-3.68 65 | Nc1ccc(O)cc1,p-Aminophenol,-1.231,1,109.128,2,1,0,46.25,-0.8 66 | O=C(Cn1ccnc1N(=O)=O)NCc2ccccc2,Benznidazole,-2.321,1,260.253,1,2,5,90.06,-2.81 67 | OC4=C(C1CCC(CC1)c2ccc(Cl)cc2)C(=O)c3ccccc3C4=O,"Atovaquone(0,430mg/ml) - neutral",-6.269,1,366.84400000000016,1,4,2,54.37,-5.931 68 | 
CCNc1nc(Cl)nc(n1)N(CC)CC,Trietazine,-3.233,1,229.715,1,1,5,53.940000000000005,-4.06 69 | NC(=O)c1cnccn1,Pyrazinamide,-0.674,1,123.11499999999998,1,1,1,68.87,-0.667 70 | CCC(Br)(CC)C(=O)NC(N)=O,Carbromal,-2.198,1,237.097,2,0,3,72.19,-2.68 71 | Clc1ccccc1c2ccccc2Cl ,"2,2'-PCB",-4.984,1,223.102,0,2,1,0.0,-5.27 72 | O=C2CN(N=Cc1ccc(o1)N(=O)=O)C(=O)N2 ,nitrofurantoin,-1.243,1,238.159,1,2,3,118.04999999999998,-3.38 73 | Clc2ccc(Oc1ccc(cc1)N(=O)=O)c(Cl)c2,Nitrofen,-5.361000000000001,1,284.098,0,2,3,52.37,-5.46 74 | CC1(C)C2CCC1(C)C(=O)C2,Camphor,-2.158,1,152.237,0,2,0,17.07,-1.96 75 | O=C1NC(=O)NC(=O)C1(CC=C)c1ccccc1,5-Allyl-5-phenylbarbital,-2.36,1,244.25,2,2,3,75.27000000000001,-2.369 76 | CCCCC(=O)OCC,Pentyl propanoate,-1.899,1,130.18699999999998,0,0,4,26.3,-2.25 77 | CC(C)CCOC(=O)C,Isopentyl acetate,-1.817,1,130.18699999999998,0,0,3,26.3,-1.92 78 | O=C1N(COC(=O)CCCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Hexanoyloxymethylphenyltoin,-4.1530000000000005,1,380.444,1,3,8,75.71,-5.886 79 | Clc1cccc(c1)c2cc(Cl)ccc2Cl ,"2,3',5-PCB",-5.7620000000000005,1,257.547,0,2,1,0.0,-6.01 80 | CCCBr,1-Bromopropane,-1.949,1,122.993,0,0,1,0.0,-1.73 81 | CCCC1COC(Cn2cncn2)(O1)c3ccc(Cl)cc3Cl,Propiconazole,-4.603,1,342.2260000000001,0,3,5,49.17,-3.4930000000000003 82 | COP(=S)(OC)SCC(=O)N(C)C=O,Formothion,-2.087,1,257.273,0,0,6,55.84,-1.995 83 | Cc1ncnc2nccnc12,4-methylpteridine,-1.24,1,146.15299999999996,0,2,0,51.56,-0.466 84 | NC(=S)N,Thiourea,0.3289999999999999,1,76.12400000000001,2,0,0,52.04,0.32 85 | Cc1ccc(C)cc1,p-Xylene ,-3.035,1,106.168,0,1,0,0.0,-2.77 86 | CCc1ccccc1CC,"1,2-Diethylbenzene",-3.6010000000000004,1,134.22199999999998,0,1,2,0.0,-3.28 87 | ClC(Cl)(Cl)C(Cl)(Cl)Cl,Hexachloroethane,-4.215,1,236.74,0,0,0,0.0,-3.67 88 | CC(C)C(C(=O)OC(C#N)c1cccc(Oc2ccccc2)c1)c3ccc(OC(F)F)cc3,Flucythrinate,-6.877999999999999,1,451.4690000000001,0,3,9,68.55000000000001,-6.876 89 | CCCN(=O)=O,1-Nitropropane,-0.816,1,89.09399999999998,0,0,2,43.14,-0.8 90 | 
CC(C)C1CCC(C)CC1=O,Menthone,-2.516,1,154.253,0,1,1,17.07,-2.35 91 | CCN2c1cc(Cl)ccc1NC(=O)c3cccnc23 ,RTI 24,-4.423,1,273.723,1,3,1,45.23,-5.36 92 | O=N(=O)c1c(Cl)c(Cl)ccc1,"2,3-Dichloronitrobenzene",-3.322,1,192.00100000000003,0,1,1,43.14,-3.48 93 | CCCC(C)C1(CC=C)C(=O)NC(=S)NC1=O ,thiamylal,-3.063,1,254.355,2,1,5,58.2,-3.46 94 | c1ccc2c(c1)c3cccc4cccc2c34,Fluoranthene,-4.957,2,202.256,0,4,0,0.0,-6.0 95 | CCCOC(C)C,Propylisopropylether,-1.354,1,102.17699999999998,0,0,3,9.23,-1.34 96 | Cc1cc(C)c2ccccc2c1,"1,3-Dimethylnaphthalene",-4.147,1,156.22799999999998,0,2,0,0.0,-4.29 97 | CCC(=C(CC)c1ccc(O)cc1)c2ccc(O)cc2 ,diethylstilbestrol,-5.074,1,268.356,2,2,4,40.46,-4.07 98 | c1(C#N)c(Cl)c(C#N)c(Cl)c(Cl)c(Cl)1,Chlorothalonil,-3.995,1,265.914,0,1,0,47.58,-5.64 99 | Clc1ccc(Cl)c(c1)c2ccc(Cl)c(Cl)c2,"2,3',4',5-PCB",-6.312,1,291.992,0,2,1,0.0,-7.25 100 | C1OC1c2ccccc2 ,styrene oxide,-1.826,2,120.15099999999995,0,2,1,12.53,-1.6 101 | CC(C)c1ccccc1,Isopropylbenzene ,-3.265,1,120.19499999999996,0,1,1,0.0,-3.27 102 | CC12CCC3C(CCC4=CC(=O)CCC34C)C2CCC1C(=O)CO,Deoxycorticosterone,-3.939,1,330.4680000000001,1,4,2,54.370000000000005,-3.45 103 | c2(Cl)c(Cl)c(Cl)c1nccnc1c2(Cl) ,chlorquinox,-4.438,1,267.93,0,2,0,25.78,-5.43 104 | C1OC(O)C(O)C(O)C1O,L-arabinose,0.601,1,150.13,4,1,0,90.15,0.39 105 | ClCCl,Dichloromethane,-1.156,1,84.93299999999999,0,0,0,0.0,-0.63 106 | CCc1cccc2ccccc12,1-Ethylnaphthalene ,-4.1,1,156.22799999999998,0,2,1,0.0,-4.17 107 | COC=O,Methyl formate,-0.048,1,60.05200000000001,0,0,1,26.3,0.58 108 | Oc1ccccc1N(=O)=O,o-Nitrophenol,-2.318,1,139.11,1,1,1,63.37,-1.74 109 | Cc1c[nH]c(=O)[nH]c1=O ,thymine,-0.78,1,126.115,2,1,0,65.72,-1.506 110 | CC(C)C,2-Methylpropane,-1.891,1,58.124,0,0,0,0.0,-2.55 111 | OCC1OC(C(O)C1O)n2cnc3c(O)ncnc23,Inosine,-0.8340000000000001,1,268.22900000000004,4,3,2,133.75,-1.23 112 | Oc1c(I)cc(C#N)cc1I,Ioxynil,-4.615,1,370.915,1,1,0,44.02,-3.61 113 | Oc1ccc(Cl)cc1C(=O)Nc2ccc(cc2Cl)N(=O)=O,Niclosamide,-5.032,1,327.1230000000001,2,2,3,92.47,-4.7 114 
| CCCCC,Pentane,-2.261,1,72.151,0,0,2,0.0,-3.18 115 | c1ccccc1O,Phenol,-1.991,1,94.113,1,1,0,20.23,0.0 116 | Nc3ccc2cc1ccccc1cc2c3 ,2-aminoanthracene,-3.789,1,193.249,1,3,0,26.02,-5.17 117 | Cn1cnc2n(C)c(=O)[nH]c(=O)c12 ,theobromine,-1.05,1,180.167,1,2,0,72.68,-2.523 118 | c1ccc2cnccc2c1,Isoquinoline,-2.531,2,129.16199999999998,0,2,0,12.89,-1.45 119 | COP(=S)(OC)SCC(=O)N(C(C)C)c1ccc(Cl)cc1,Anilofos,-5.106,1,367.86,0,1,7,38.77,-4.432 120 | CCCCCCc1ccccc1,Hexylbenzene ,-4.22,1,162.276,0,1,5,0.0,-5.21 121 | Clc1ccccc1c2ccccc2,2-Chlorobiphenyl,-4.5280000000000005,1,188.657,0,2,1,0.0,-4.54 122 | CCCC(=C)C,2-Methyl-1-Pentene,-2.3480000000000003,1,84.16199999999999,0,0,2,0.0,-3.03 123 | CC(C)C(C)C(C)C,"2,3,4-Trimethylpentane",-3.276,1,114.232,0,0,2,0.0,-4.8 124 | Clc1cc(Cl)c(Cl)c(Cl)c1Cl,Pentachlorobenzene,-5.167999999999999,1,250.339,0,1,0,0.0,-5.65 125 | Oc1cccc(c1)N(=O)=O,m-Nitrophenol,-2.318,1,139.11,1,1,1,63.37,-1.01 126 | CCCCCCCCC=C,1-Decene,-3.781,1,140.26999999999998,0,0,7,0.0,-5.51 127 | CC(=O)OCC(COC(=O)C)OC(=O)C,Glyceryl triacetate,-1.285,1,218.205,0,0,5,78.9,-0.6 128 | CCCCc1c(C)nc(nc1O)N(C)C ,dimethirimol,-3.57,1,209.293,1,1,4,49.25000000000001,-2.24 129 | CC1(C)C(C=C(Cl)Cl)C1C(=O)OC(C#N)c2ccc(F)c(Oc3ccccc3)c2,Cyfluthrin,-6.84,1,434.29400000000015,0,3,6,59.32000000000001,-7.337000000000001 130 | c1ccncc1,Pyridine,-1.481,2,79.10199999999998,0,1,0,12.89,0.76 131 | CCCCCCCBr,1-Bromoheptane,-3.366,1,179.101,0,0,5,0.0,-4.43 132 | Cc1ccncc1C,"3,4-Dimethylpyridine",-2.067,1,107.156,0,1,0,12.89,0.36 133 | CC34CC(O)C1(F)C(CCC2=CC(=O)CCC12C)C3CCC4(O)C(=O)CO ,Fludrocortisone,-3.172,1,380.4560000000001,3,4,2,94.83,-3.43 134 | CCSCc1ccccc1OC(=O)NC ,ethiofencarb,-2.855,1,225.313,1,1,4,38.33,-2.09 135 | CCOC(=O)CC(=O)OCC,Malonic acid diethylester,-1.413,1,160.16899999999998,0,0,4,52.60000000000001,-0.82 136 | CC1=CCC(CC1)C(C)=C,d-Limonene,-3.429,1,136.238,0,1,1,0.0,-4.26 137 | C1Cc2ccccc2C1,Indan,-3.057,2,118.17899999999996,0,2,0,0.0,-3.04 138 | 
CC(C)(C)c1ccc(O)cc1,p-t-Butylphenol,-3.192,1,150.22099999999998,1,1,0,20.23,-2.41 139 | O=C2NC(=O)C1(CC1)C(=O)N2 ,Cyclopropyl-5-spirobarbituric acid,-0.088,1,154.125,2,2,0,75.27,-1.886 140 | Clc1cccc(I)c1,m-Chloroiodobenzene,-4.384,1,238.455,0,1,0,0.0,-3.55 141 | Brc1cccc2ccccc12,1-Bromonapthalene,-4.434,1,207.07,0,2,0,0.0,-4.35 142 | CC/C=C/C,trans-2-Pentene ,-2.076,1,70.135,0,0,1,0.0,-2.54 143 | Cc1cccc(C)n1,"2,6-Dimethylpyridine",-2.0980000000000003,1,107.156,0,1,0,12.89,0.45 144 | ClC=C(Cl)Cl,Trichloroethylene,-2.312,1,131.389,0,0,0,0.0,-1.96 145 | Nc1cccc2ccccc12,1-Napthylamine,-2.721,1,143.189,1,2,0,26.02,-1.92 146 | Cc1cccc(C)c1,m-Xylene ,-3.035,1,106.168,0,1,0,0.0,-2.82 147 | Oc2ncc1nccnc1n2,2-hydroxypteridine,-1.404,1,148.125,1,2,0,71.79,-1.947 148 | CO,Methanol,0.441,1,32.042,1,0,0,20.23,1.57 149 | CCC1(CCC(C)C)C(=O)NC(=O)NC1=O,Amobarbital,-2.312,1,226.276,2,1,4,75.27000000000001,-2.468 150 | CCC(=O)C,2-Butanone,-0.491,1,72.107,0,0,1,17.07,0.52 151 | Fc1c[nH]c(=O)[nH]c1=O ,5-fluorouracil,-0.792,1,130.078,2,1,0,65.72,-1.077 152 | Nc1ncnc2n(ccc12)C3OC(CO)C(O)C3O ,tubercidin,-0.892,1,266.257,4,3,2,126.65,-1.95 153 | Oc1cccc(O)c1,"1,3-Benzenediol",-1.59,1,110.11199999999998,2,1,0,40.46,0.81 154 | CCCCCCO,1-Hexanol,-1.3969999999999998,1,102.177,1,0,4,20.23,-1.24 155 | CCCCCCl,1-Chloropentane,-2.294,1,106.596,0,0,3,0.0,-2.73 156 | C=CC=C,"1,3-Butadiene",-1.376,1,54.09199999999999,0,0,1,0.0,-1.87 157 | CCCOC(=O)C,Propyl acetate,-1.125,1,102.13299999999998,0,0,2,26.3,-0.72 158 | Oc2ccc1CCCCc1c2 ,"5,6,7,8-tetrahydro-2-naphthol",-3.0860000000000003,1,148.205,1,2,0,20.23,-1.99 159 | NC(=O)CCl ,chloroacetamide,-0.106,1,93.513,1,0,1,43.09,-0.02 160 | COP(=S)(OC)Oc1cc(Cl)c(I)cc1Cl,Iodofenphos,-6.148,1,413.0,0,1,4,27.69,-6.62 161 | Cc1ccc(Cl)cc1,4-Chlorotoluene,-3.297,1,126.586,0,1,0,0.0,-3.08 162 | CSc1nnc(c(=O)n1N)C(C)(C)C,Metribuzin,-2.324,1,214.294,1,1,1,73.8,-2.253 163 | Cc1ccc(OP(=O)(Oc2cccc(C)c2)Oc3ccccc3C)cc1,Tricresyl 
phosphate,-6.39,1,368.3690000000001,0,3,6,44.760000000000005,-6.01 164 | CCCCCC=O,Caproaldehyde,-1.457,1,100.161,0,0,4,17.07,-1.3 165 | CCCCOC(=O)c1ccc(N)cc1,Butamben,-3.039,1,193.246,1,1,4,52.32,-3.082 166 | O2c1cc(C)ccc1N(C)C(=O)c3cc(N)cnc23 ,RTI 3,-3.049,1,255.277,1,3,0,68.45,-3.043 167 | CC(C)=CCC/C(C)=C\CO,Nerol,-2.603,1,154.253,1,0,4,20.23,-2.46 168 | Clc1ccc(cc1)c2ccccc2Cl ,"2,4'-PCB",-5.142,1,223.102,0,2,1,0.0,-5.28 169 | O=C1N(COC(=O)CCCCCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Octanoyloxymethylphenytoin,-4.84,1,408.498,1,3,10,75.71,-6.523 170 | CCN(=O)=O,Nitroethane,-0.462,1,75.067,0,0,1,43.14,-0.22 171 | CCN(CC(C)=C)c1c(cc(cc1N(=O)=O)C(F)(F)F)N(=O)=O,Ethalfluralin,-5.063,1,333.266,0,1,6,89.51999999999998,-6.124 172 | Clc1ccc(Cl)c(Cl)c1Cl,"1,2,3,4-Tetrachlorobenzene",-4.546,1,215.894,0,1,0,0.0,-4.57 173 | CCCC(C)(COC(N)=O)COC(N)=O ,Meprobamate,-1.376,1,218.253,2,0,6,104.64,-1.807 174 | CC(=O)C3CCC4C2CC=C1CC(O)CCC1(C)C2CCC34C ,pregnenolone,-4.342,1,316.48500000000007,1,4,1,37.3,-4.65 175 | CI,Iodomethane,-1.646,1,141.939,0,0,0,0.0,-1.0 176 | CC1CC(C)C(=O)C(C1)C(O)CC2CC(=O)NC(=O)C2 ,cycloheximide,-1.5319999999999998,1,281.35200000000003,2,2,3,83.47,-1.13 177 | O=C1N(COC(=O)CCCCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Heptanoyloxymethylphenytoin,-4.496,1,394.471,1,3,9,75.71,-6.301 178 | CC1=CC(=O)CC(C)(C)C1 ,isophorone,-2.015,1,138.20999999999998,0,1,0,17.07,-1.06 179 | O=C1NC(=O)NC(=O)C1(CC)C(C)CC,Butabarbital,-1.958,1,212.249,2,1,3,75.27000000000001,-2.39 180 | CCCCC(=O)CCCC,5-Nonanone,-2.329,1,142.242,0,0,6,17.07,-2.58 181 | CCC1(CCC(=O)NC1=O)c2ccccc2 ,Glutethimide,-2.591,1,217.268,1,2,2,46.17,-2.337 182 | CCC(C)CC,3-Methylpentane,-2.6,1,86.178,0,0,2,0.0,-3.68 183 | CCOc1ccc(cc1)C(C)(C)COCc3cccc(Oc2ccccc2)c3,Etofenprox,-6.896,1,376.496,0,3,9,27.69,-8.6 184 | Cc1ccccc1n3c(C)nc2ccccc2c3=O,Methaqualone,-3.881,1,250.301,0,3,1,34.89,-2.925 185 | ClCC#N,Chloroacetonitrile,-0.4479999999999999,1,75.498,0,0,0,23.79,-0.092 186 | 
CCOP(=S)(CC)Oc1cc(Cl)c(Cl)cc1Cl,Trichloronate,-5.225,1,333.60400000000004,0,1,5,18.46,-5.752000000000001 187 | CC12CCC(=O)C=C1CCC3C2CCC4(C)C3CCC4(O)C#C ,Ethisterone,-3.858,1,312.45300000000003,1,4,0,37.3,-5.66 188 | c1ccnnc1,Pyridazine,-0.619,2,80.08999999999999,0,1,0,25.78,1.1 189 | Clc1cc(Cl)c(Cl)c(Cl)c1,"1,2,3,5-Tetrachlorobenzene",-4.621,1,215.894,0,1,0,0.0,-4.63 190 | C1C(O)CCC2(C)CC3CCC4(C)C5(C)CC6OCC(C)CC6OC5CC4C3C=C21,Diosgenin,-5.681,1,414.63000000000017,1,6,0,38.69,-7.32 191 | Nc1ccccc1O,o-Aminophenol,-1.465,1,109.128,2,1,0,46.25,-0.72 192 | CCCCCCCCC(=O)OCC,Ethyl nonanoate,-3.3160000000000003,1,186.295,0,0,8,26.3,-3.8 193 | COCC(=O)N(C(C)C(=O)OC)c1c(C)cccc1C ,metalaxyl,-2.87,1,279.336,0,1,5,55.84,-1.601 194 | CNC(=O)Oc1ccccc1OC(C)C,Propoxur,-2.4090000000000003,1,209.245,1,1,3,47.56,-2.05 195 | CCC(C)Cl,2-Chlorobutane,-1.94,1,92.569,0,0,1,0.0,-1.96 196 | Oc1ccc2ccccc2c1,2-Napthol,-3.08,1,144.17299999999997,1,2,0,20.23,-2.28 197 | CC(C)Oc1cc(c(Cl)cc1Cl)n2nc(oc2=O)C(C)(C)C,Oxadiazon,-5.265,1,345.22600000000017,0,2,3,57.26,-5.696000000000001 198 | CCCCC#C,1-Hexyne ,-1.801,1,82.14599999999999,0,0,2,0.0,-2.36 199 | CCCCCCCC#C,1-Nonyne ,-2.864,1,124.227,0,0,5,0.0,-4.24 200 | Cc1ccccc1Cl,2-Chlorotoluene,-3.297,1,126.586,0,1,0,0.0,-3.52 201 | CC(C)OC(C)C,Diisopropyl ether ,-1.281,1,102.177,0,0,2,9.23,-1.1 202 | Nc1ccc(cc1)S(=O)(=O)c2ccc(N)cc2,Dapsone,-2.464,1,248.307,2,2,2,86.18,-3.094 203 | CNN,Methyl hydrazine,0.5429999999999999,1,46.073,2,0,0,38.05,1.34 204 | CC#C,Propyne,-0.672,1,40.065000000000005,0,0,0,0.0,-0.41 205 | CCOP(=S)(OCC)ON=C(C#N)c1ccccc1,Phoxim,-4.557,1,298.304,0,1,7,63.84,-4.862 206 | CCNP(=S)(OC)OC(=CC(=O)OC(C)C)C,Propetamphos,-2.826,1,281.314,1,0,7,56.790000000000006,-3.408 207 | C=CC=O,Acrolein,-0.184,1,56.064,0,0,1,17.07,0.57 208 | O=c1[nH]cnc2nc[nH]c12 ,Hypoxanthine,-0.6559999999999999,1,136.114,2,2,0,74.43,-2.296 209 | Oc2ccc1ncccc1c2 ,6-hydroxyquinoline,-2.725,1,145.161,1,2,0,33.120000000000005,-2.16 210 | 
Fc1ccccc1,Fluorobenzene,-2.514,1,96.10399999999998,0,1,0,0.0,-1.8 211 | CCCCl,1-Chloropropane,-1.585,1,78.542,0,0,1,0.0,-1.47 212 | CCOC(=O)C,Ethyl acetate,-0.77,1,88.106,0,0,1,26.3,-0.04 213 | CCCC(C)(C)C,"2,2-Dimethylpentane",-2.938,1,100.20499999999998,0,0,1,0.0,-4.36 214 | Cc1cc(C)c(C)c(C)c1C,Pentamethylbenzene,-3.993,1,148.249,0,1,0,0.0,-4.0 215 | CC12CCC(CC1)C(C)(C)O2 ,eucalyptol,-2.579,1,154.253,0,3,0,9.23,-1.64 216 | CCCCOC(=O)CCCCCCCCC(=O)OCCCC,dibutyl sebacate,-4.726,1,314.46600000000007,0,0,15,52.60000000000001,-3.896 217 | Clc1ccc(cc1)c2ccc(Cl)cc2 ,"4,4'-PCB",-5.299,1,223.102,0,2,1,0.0,-6.56 218 | Cc1cccnc1C,"2,3-Dimethylpyridine",-2.067,1,107.156,0,1,0,12.89,0.38 219 | CC(=C)C1CC=C(C)C(=O)C1,Carvone,-2.042,1,150.22099999999998,0,1,1,17.07,-2.06 220 | CCOP(=S)(OCC)SCSc1ccc(Cl)cc1,Carbophenthion,-5.827999999999999,1,342.875,0,1,8,18.46,-5.736000000000001 221 | COc1cc(cc(OC)c1O)C6C2C(COC2=O)C(OC4OC3COC(C)OC3C(O)C4O)c7cc5OCOc5cc67,"Etoposide (148-167,25mg/ml)",-3.292,1,588.5620000000001,3,7,5,160.83,-3.571 222 | c1cc2cccc3c4cccc5cccc(c(c1)c23)c54,Perylene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.804 223 | Cc1ccc(cc1N(=O)=O)N(=O)=O,"2,4-Dinitrotoluene",-2.604,1,182.135,0,1,2,86.28,-2.82 224 | c1c(Br)ccc2ccccc12 ,2-bromonaphthalene,-4.434,1,207.07,0,2,0,0.0,-4.4 225 | CNC(=O)Oc1cccc(N=CN(C)C)c1,Formetanate,-1.846,1,221.26,1,1,3,53.93,-2.34 226 | COc2cnc1ncncc1n2,6-methoxypteridine,-1.589,1,162.15200000000002,0,2,1,60.790000000000006,-1.139 227 | Cc3ccnc4N(C1CC1)c2ncccc2C(=O)Nc34 ,nevirapine,-3.397,1,266.30400000000003,1,4,1,58.120000000000005,-3.19 228 | CCOP(=S)(OCC)Oc1nc(Cl)n(n1)C(C)C,Isazofos,-3.76,1,313.747,0,1,7,58.4,-3.658 229 | CC(=C)C=C,"2-Methyl-1,3-Butadiene ",-1.714,1,68.11900000000001,0,0,1,0.0,-2.03 230 | CC(C)=CCCC(O)(C)C=C,linalool,-2.399,1,154.253,1,0,4,20.23,-1.99 231 | COP(=S)(OC)Oc1ccc(SC)c(C)c1,Fenthion,-4.265,1,278.335,0,1,5,27.69,-4.57 232 | OC1CCCCC1,Cyclohexanol ,-1.261,1,100.161,1,1,0,20.23,-0.44 233 | 
O=C1NC(=O)NC(=O)C1(C)CC=C,5-Allyl-5-methylbarbital,-1.013,1,182.179,2,1,2,75.27000000000001,-1.16 234 | CC34CCC1C(CCC2CC(O)CCC12C)C3CCC4=O,Epiandrosterone,-3.882,1,290.447,1,4,0,37.3,-4.16 235 | OCC(O)C(O)C(O)C(O)CO ,mannitol,0.647,1,182.172,6,0,5,121.38,0.06 236 | Cc1ccc(cc1)c2ccccc2,4-Methylbiphenyl,-4.424,1,168.239,0,2,1,0.0,-4.62 237 | CCNc1nc(Cl)nc(NC(C)C)n1,Atrazine,-3.069,1,215.688,2,1,4,62.73,-3.85 238 | NC(=S)Nc1ccccc1,Phenylthiourea,-1.7009999999999998,1,152.22199999999998,2,1,1,38.05,-1.77 239 | CCCC(=O)CCC,4-Heptanone,-1.62,1,114.188,0,0,4,17.07,-1.3 240 | CC(=O)C(C)(C)C,"3,3-Dimethyl-2-butanone",-1.25,1,100.161,0,0,0,17.07,-0.72 241 | Oc1ccc(Cl)cc1,4-Chlorophenol ,-2.761,1,128.558,1,1,0,20.23,-0.7 242 | O=C1CCCCC1,Cyclohexanone,-0.996,1,98.145,0,1,0,17.07,-0.6 243 | Cc1cccc(N)c1,m-Methylaniline,-1.954,1,107.156,1,1,0,26.02,-0.85 244 | ClC(Cl)(Cl)C#N,Trichloroacetonitrile,-2.019,1,144.388,0,0,0,23.79,-2.168 245 | CNc2cnn(c1cccc(c1)C(F)(F)F)c(=O)c2Cl,norflurazon,-4.029,1,303.67100000000005,1,2,2,46.92,-4.046 246 | CCCCCCCCC(=O)C,2-Decanone,-2.617,1,156.269,0,0,7,17.07,-3.3 247 | CCN(CC)c1nc(Cl)nc(NC(C)C)n1,Ipazine,-3.497,1,243.742,1,1,5,53.940000000000005,-3.785 248 | CCOC(=O)c1ccc(N)cc1,Benzocaine,-2.383,1,165.19199999999998,1,1,2,52.32,-2.616 249 | Clc1ccc(Cl)c(Cl)c1,"1,2,4-Trichlorobenzene",-4.083,1,181.449,0,1,0,0.0,-3.59 250 | Cc3nnc4CN=C(c1ccccc1Cl)c2cc(Cl)ccc2n34,Triazolam,-3.948,1,343.2170000000001,0,4,1,43.07,-4.09 251 | Oc1ccccc1O,"1,2-Benzenediol",-1.635,1,110.11199999999998,2,1,0,40.46,0.62 252 | CCN2c1ncccc1N(C)C(=O)c3cccnc23 ,Reverse Transcriptase inhibitor 1,-2.794,1,254.293,0,3,1,49.330000000000005,-2.62 253 | CSC,Dimethyl sulfide,-0.758,1,62.137,0,0,0,0.0,-0.45 254 | Cc1ccccc1Br,2-Bromotoluene,-3.667,1,171.03699999999998,0,1,0,0.0,-2.23 255 | CCOC(=O)N,O-Ethyl carbamate,-0.218,1,89.09400000000001,1,0,1,52.32,0.85 256 | CC(=O)OC3(CCC4C2C=C(C)C1=CC(=O)CCC1(C)C2CCC34C)C(C)=O ,megestrol 
acetate,-4.417,1,384.5160000000002,0,4,2,60.440000000000005,-5.35 257 | CC(C)C(O)C(C)C,"2,4-Dimethyl-3-pentanol",-1.6469999999999998,1,116.20399999999998,1,0,2,20.23,-1.22 258 | c1ccc2ccccc2c1,Napthalene,-3.468,2,128.17399999999995,0,2,0,0.0,-3.6 259 | CCNc1ccccc1,N-Ethylaniline,-2.389,1,121.18299999999996,1,1,2,12.03,-1.7 260 | O=C1NC(=O)C(N1)(c2ccccc2)c3ccccc3,Phenytoin,-3.057,1,252.273,2,3,2,58.2,-4.097 261 | Cc1c2ccccc2c(C)c3ccc4ccccc4c13,"7,12-Dimethylbenz(a)anthracene",-6.297000000000001,1,256.348,0,4,0,0.0,-7.02 262 | CCOP(=S)(OCC)SC(CCl)N1C(=O)c2ccccc2C1=O,Dialifor,-5.026,1,393.85400000000016,0,2,8,55.84,-6.34 263 | COc1ccc(cc1)C(c2ccc(OC)cc2)C(Cl)(Cl)Cl,Methoxychlor,-5.537999999999999,1,345.6529999999999,0,2,4,18.46,-6.89 264 | Fc1cccc(F)c1C(=O)NC(=O)Nc2cc(Cl)c(F)c(Cl)c2F ,TEFLUBENZURON,-5.462000000000001,1,381.1120000000001,2,2,2,58.2,-7.28 265 | O=C1N(COC(=O)CCCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Pentanoyloxymethylphenytoin,-3.81,1,366.417,1,3,7,75.71,-4.678 266 | CN(C)C(=O)Nc1ccc(Cl)cc1,Monuron,-2.6710000000000003,1,198.653,1,1,1,32.34,-2.89 267 | OC(Cn1cncn1)(c2ccc(F)cc2)c3ccccc3F,Flutriafol,-3.569,1,301.296,1,3,4,50.94,-3.37 268 | CC(=O)OCC(=O)C3(O)C(CC4C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC34C)OC(C)=O ,triamcinolone diacetate,-3.876,1,478.51300000000026,2,4,4,127.20000000000002,-4.13 269 | CCCCBr,1-Bromobutane,-2.303,1,137.01999999999998,0,0,2,0.0,-2.37 270 | Brc1cc(Br)c(Br)cc1Br,"1,2,4,5-Tetrabromobenzene",-6.001,1,393.698,0,1,0,0.0,-6.98 271 | CC(C)CC(=O)C,4-Methyl-2-pentanone,-1.1840000000000002,1,100.161,0,0,2,17.07,-0.74 272 | CCSC(=O)N(CC)C1CCCCC1 ,cycloate,-3.35,1,215.362,0,1,3,20.31,-3.4 273 | COc1ccc(Cl)cc1,4-Chloroanisole,-3.057,1,142.585,0,1,1,9.23,-2.78 274 | CC1(C)C(C=C(Br)Br)C1C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Deltamethrin,-7.44,1,505.2060000000002,0,3,6,59.32000000000001,-8.402000000000001 275 | CCC(C)C1(CC=C)C(=O)NC(=O)NC1=O,Talbutal,-2.06,1,224.26,2,1,4,75.27000000000001,-2.016 276 | 
COP(=S)(OC)Oc1ccc(N(=O)=O)c(C)c1,Fenitrothion,-3.845,1,277.238,0,1,5,70.83000000000001,-4.04 277 | Ic1cccc2ccccc12,1-Iodonapthalene,-4.888999999999999,1,254.07,0,2,0,0.0,-4.55 278 | OCC(O)C(O)C(O)C(O)CO,Sorbitol,0.647,1,182.172,6,0,5,121.38,1.09 279 | CCS,Ethanethiol,-0.968,1,62.137,1,0,0,0.0,-0.6 280 | ClCC(Cl)Cl,"1,1,2-Trichloroethane",-1.961,1,133.405,0,0,1,0.0,-1.48 281 | CN(C)C(=O)Oc1cc(C)nn1c2ccccc2,Pyrolan,-3.141,1,245.282,0,2,2,47.36000000000001,-2.09 282 | NC(=O)c1ccccc1O,o-Hydroxybenzamide,-1.942,1,137.13799999999998,2,1,1,63.32000000000001,-1.82 283 | Cc1ccccc1N(=O)=O,o-Nitrotoluene,-2.589,1,137.138,0,1,1,43.14,-2.33 284 | O=C1NC(=O)NC(=O)C1(C(C)C)C(C)C,"5,5-Diisopropylbarbital",-1.942,1,212.249,2,1,2,75.27000000000001,-2.766 285 | CCc1ccccc1C,2-Ethyltoluene,-3.2960000000000003,1,120.19499999999996,0,1,1,0.0,-3.21 286 | CCCCCCCCl,1-Chloroheptane,-3.003,1,134.65,0,0,5,0.0,-4.0 287 | O=C1NC(=O)NC(=O)C1(CC)CC,Barbital,-1.265,1,184.195,2,1,2,75.27000000000001,-2.4 288 | C(Cc1ccccc1)c2ccccc2,Bibenzyl ,-4.301,2,182.266,0,2,3,0.0,-4.62 289 | ClC(Cl)C(Cl)Cl,"1,1,2,2-Tetrachloroethane",-2.549,1,167.85,0,0,1,0.0,-1.74 290 | CCN2c1cc(OC)cc(C)c1NC(=O)c3cccnc23 ,RTI 23,-4.228,1,283.331,1,3,2,54.46,-5.153 291 | Cc1ccc2c(ccc3ccccc32)c1,2-Methylphenanthrene,-4.87,1,192.261,0,3,0,0.0,-5.84 292 | CCCCOC(=O)c1ccccc1C(=O)OCCCC ,dibutylphthalate,-4.378,1,278.348,0,1,8,52.60000000000001,-4.4 293 | COc1c(O)c(Cl)c(Cl)c(Cl)c1Cl ,tetrachloroguaiacol,-4.299,1,261.919,1,1,1,29.46,-4.02 294 | CCN(CC)C(=O)C(=CCOP(=O)(OC)OC)Cl,Dimecron,-2.426,1,299.6909999999999,0,0,8,65.07000000000001,0.523 295 | CC34CCC1C(=CCc2cc(O)ccc12)C3CCC4=O,Equilin,-3.555,1,268.356,1,4,0,37.3,-5.282 296 | CCOC(=O)c1ccccc1S(=O)(=O)NN(C=O)c2nc(Cl)cc(OC)n2,Chlorimuron-ethyl (ph 7),-3.719,1,414.82700000000017,1,2,8,127.79,-4.5760000000000005 297 | COc1ccc(cc1)N(=O)=O,p-Nitroanisole,-2.522,1,153.13699999999997,0,1,2,52.37,-2.41 298 | CCCCCCCl,1-Chlorohexane,-2.648,1,120.623,0,0,4,0.0,-3.12 299 | 
Clc1cc(c(Cl)c(Cl)c1Cl)c2cc(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,3',4,4',5,5'-PCB",-8.468,1,429.77200000000016,0,2,1,0.0,-9.16 300 | OCC1OC(CO)(OC2OC(COC3OC(CO)C(O)C(O)C3O)C(O)C(O)C2O)C(O)C1O,Raffinose,0.496,1,504.4380000000001,11,3,8,268.67999999999995,-0.41 301 | CCCCCCCCCCCCCCCCCCCCCCCCCC,hexacosane,-9.702,1,366.7180000000002,0,0,23,0.0,-8.334 302 | CCN2c1ccccc1N(C)C(=O)c3cccnc23 ,RTI 5,-3.471,1,253.30499999999995,0,3,1,36.44,-3.324 303 | CC(Cl)Cl,"1,1-Dichloroethane",-1.5759999999999998,1,98.96,0,0,0,0.0,-1.29 304 | Nc1ccc(cc1)S(N)(=O)=O,Sulfanilamide,-0.954,1,172.20899999999995,2,1,1,86.18,-1.34 305 | CCCN(CCC)c1c(cc(cc1N(=O)=O)C(C)C)N(=O)=O,Isopropalin,-5.306,1,309.36600000000004,0,1,8,89.51999999999998,-6.49 306 | ClC1C(Cl)C(Cl)C(Cl)C(Cl)C1Cl,Lindane,-4.009,1,290.832,0,1,0,0.0,-4.64 307 | CCOP(=S)(NC(C)C)Oc1ccccc1C(=O)OC(C)C,Isofenphos,-4.538,1,345.4010000000002,1,1,8,56.790000000000006,-4.194 308 | Clc1cccc(Cl)c1Cl,"1,2,3-Trichlorobenzene",-4.008,1,181.449,0,1,0,0.0,-4.0 309 | ClC(Cl)(Cl)Cl,Tetrachloromethane,-2.607,1,153.823,0,0,0,0.0,-2.31 310 | O=N(=O)c1cc(Cl)c(Cl)cc1,"3,4-Dichloronitrobenzene",-3.448,1,192.001,0,1,1,43.14,-3.2 311 | OC1CCCCCCC1,Cyclooctanol,-2.14,1,128.215,1,1,0,20.23,-1.29 312 | CC1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3CCC21C,17a-Methyltestosterone,-4.073,1,302.4580000000001,1,4,0,37.3,-3.999 313 | CCOc1ccc(NC(N)=O)cc1,Dulcin,-2.167,1,180.207,2,1,3,64.35,-2.17 314 | C/C1CCC(\C)CC1,"trans-1,4-Dimethylcyclohexane",-3.305,1,112.216,0,1,0,0.0,-4.47 315 | c1cnc2c(c1)ccc3ncccc23,"1,7-phenantroline",-2.994,2,180.21,0,3,0,25.78,-2.68 316 | COC(C)(C)C,Methyl t-butyl ether ,-0.984,1,88.14999999999999,0,0,0,9.23,-0.24 317 | COc1ccc(C=CC)cc1,Anethole,-3.254,1,148.20499999999998,0,1,2,9.23,-3.13 318 | CCCCCCCCCCCCCCCCO,1-Hexadecanol,-4.94,1,242.44699999999992,1,0,14,20.23,-7.0 319 | O=c1cc[nH]c(=O)[nH]1 ,uracil,-0.441,1,112.088,2,1,0,65.72,-1.4880000000000002 320 | Nc1ncnc2nc[nH]c12 ,adenine,-1.255,1,135.13,2,2,0,80.47999999999999,-2.12 321 | 
Clc1cc(Cl)c(cc1Cl)c2cccc(Cl)c2Cl ,"2,2',3,4,5-PCB",-6.709,1,326.437,0,2,1,0.0,-7.21 322 | COc1ccc(cc1)C(O)(C2CC2)c3cncnc3 ,Ancymidol,-2.181,1,256.30499999999995,1,3,4,55.24,-2.596 323 | c1ccc2c(c1)c3cccc4c3c2cc5ccccc54,Benzo(b)fluoranthene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.23 324 | O=C(Nc1ccccc1)Nc2ccccc2,Carbanilide,-3.611,1,212.252,2,2,2,41.13,-3.15 325 | CCC1(C(=O)NC(=O)NC1=O)c2ccccc2 ,phenobarbital,-2.272,1,232.239,2,2,2,75.27000000000001,-2.322 326 | Clc1ccc(cc1)c2cccc(Cl)c2Cl ,"2',3,4-PCB",-5.686,1,257.547,0,2,1,0.0,-6.29 327 | CC(C)c1ccc(NC(=O)N(C)C)cc1,Isoproturon,-2.867,1,206.289,1,1,2,32.34,-3.536 328 | CCN(CC)C(=O)CSc1ccc(Cl)nn1,Azintamide,-2.231,1,259.762,0,1,5,46.09,-1.716 329 | CCC(C)(C)CO,"2,2-Dimethyl-1-butanol",-1.365,1,102.17699999999998,1,0,2,20.23,-1.04 330 | CCCOC(=O)CCC,Ethyl pentanoate,-1.899,1,130.18699999999998,0,0,4,26.3,-1.75 331 | Cc1c(cc(cc1N(=O)=O)N(=O)=O)N(=O)=O,"2,4,6-Trinitrotoluene",-2.6060000000000003,1,227.132,0,1,3,129.42000000000002,-3.22 332 | CC(C)OP(=S)(OC(C)C)SCCNS(=O)(=O)c1ccccc1,Bensulide,-4.99,1,397.52400000000006,1,1,10,64.63,-4.2 333 | C1CCCCCC1,Cycloheptane,-2.9160000000000004,2,98.189,0,1,0,0.0,-3.51 334 | CCCOC=O,Propyl formate,-0.757,1,88.10599999999998,0,0,3,26.3,-0.49 335 | CC(C)c1ccccc1C,2-Isopropyltoluene,-3.585,1,134.22199999999995,0,1,1,0.0,-3.76 336 | Nc1cccc(Cl)c1,m-Chloroaniline,-2.392,1,127.574,1,1,0,26.02,-1.37 337 | CC(C)CC(C)C,"2,4-Dimethylpentane",-2.938,1,100.20499999999998,0,0,2,0.0,-4.26 338 | o1c2ccccc2c3ccccc13,Dibenzofurane,-4.2010000000000005,2,168.195,0,3,0,13.14,-4.6 339 | CCOC2Oc1ccc(OS(C)(=O)=O)cc1C2(C)C,ethofumesate,-3.184,1,286.34900000000005,0,2,4,61.830000000000005,-3.42 340 | CN(C)C(=O)Nc1cccc(c1)C(F)(F)F,Fluometuron,-3.065,1,232.205,1,1,1,32.34,-3.43 341 | c3ccc2nc1ccccc1cc2c3,Acridine,-3.846,2,179.22199999999998,0,3,0,12.89,-3.67 342 | CC12CC(=O)C3C(CCC4=CC(=O)CCC34C)C2CCC1(O)C(=O)CO,Cortisone,-2.893,1,360.45000000000016,2,4,2,91.67,-3.11 343 | 
OCC1OC(O)C(O)C(O)C1O,glucose,0.501,1,180.156,5,1,1,110.38,0.74 344 | Cc1cccc(O)c1,3-Methylphenol,-2.313,1,108.14,1,1,0,20.23,-0.68 345 | CC2Cc1ccccc1N2NC(=O)c3ccc(Cl)c(c3)S(N)(=O)=O ,Indapamide,-4.345,1,365.842,2,3,3,92.5,-3.5860000000000003 346 | CCC(C)C(=O)OC2CC(C)C=C3C=CC(C)C(CCC1CC(O)CC(=O)O1)C23 ,Lovastatin,-4.731,1,404.54700000000025,1,3,6,72.83,-6.005 347 | O=N(=O)c1ccc(cc1)N(=O)=O,"1,4-Dinitrobenzene",-2.281,1,168.10799999999995,0,1,2,86.28,-3.39 348 | CCC1(C(=O)NC(=O)NC1=O)C2=CCC3CCC2C3,Reposal,-2.781,1,262.309,2,3,2,75.27000000000001,-2.696 349 | CCCCCCCCCC(=O)OCC,Ethyl decanoate,-3.671,1,200.322,0,0,9,26.3,-4.1 350 | CN(C)C(=O)Nc1ccccc1,Fenuron,-1.847,1,164.208,1,1,1,32.34,-1.6 351 | CCCOCC,Ethyl propyl ether,-1.072,1,88.14999999999999,0,0,3,9.23,-0.66 352 | CC(C)O,2-Propanol,-0.261,1,60.096,1,0,0,20.23,0.43 353 | Cc1ccc2ccccc2c1,2-Methylnapthalene,-3.802,1,142.201,0,2,0,0.0,-3.77 354 | ClC(Br)Br,Chlorodibromethane,-2.54,1,208.28,0,0,0,0.0,-1.9 355 | CCC(C(CC)c1ccc(O)cc1)c2ccc(O)cc2,Hexestrol,-4.854,1,270.372,2,2,5,40.46,-4.43 356 | CCOC(=O)CC(SP(=S)(OC)OC)C(=O)OCC,Malathion,-3.391,1,330.3640000000001,0,0,9,71.06,-3.37 357 | ClCc1ccccc1,Benzylchloride,-2.887,1,126.58599999999996,0,1,1,0.0,-2.39 358 | C/C=C/C=O,t-Crotonaldehyde,-0.604,1,70.09100000000001,0,0,1,17.07,0.32 359 | CON(C)C(=O)Nc1ccc(Br)c(Cl)c1,Chlorbromuron,-3.938,1,293.548,1,1,2,41.57,-3.924 360 | Cc1c2ccccc2c(C)c3ccccc13,"9,10-Dimethylanthracene",-5.228,1,206.288,0,3,0,0.0,-6.57 361 | CCCCCC(=O)OC,Methyl hexanoate,-1.899,1,130.18699999999998,0,0,4,26.3,-1.87 362 | CN(C)C(=O)Nc1ccc(c(Cl)c1)n2nc(oc2=O)C(C)(C)C,Dimefuron,-3.831,1,338.79500000000013,1,2,2,80.37,-4.328 363 | CC(=O)Nc1ccc(F)cc1,p-Fluoroacetanilide,-2.181,1,153.156,1,1,1,29.1,-1.78 364 | CCc1cccc(CC)c1N(COC)C(=O)CCl ,alachlor,-3.319,1,269.77199999999993,0,1,6,29.54,-3.26 365 | C1CCC=CC1,Cyclohexene,-2.16,2,82.146,0,1,0,0.0,-2.59 366 | CC12CC(O)C3C(CCC4=CC(=O)CCC34C)C2CCC1(O)C(=O)CO,Hydrocortisone 
,-3.159,1,362.4660000000002,3,4,2,94.83,-3.09 367 | c1cncnc1,Pyrimidine,-0.884,2,80.08999999999999,0,1,0,25.78,1.1 368 | Clc1ccc(cc1)N(=O)=O,p-Chloronitrobenzene,-2.901,1,157.55599999999998,0,1,1,43.14,-2.92 369 | CCC(=O)OC,Methyl propionate,-0.836,1,88.106,0,0,1,26.3,-0.14 370 | Clc1ccccc1N(=O)=O,o-Chloronitrobenzene,-2.775,1,157.55599999999998,0,1,1,43.14,-2.55 371 | CCCCN(C)C(=O)Nc1ccc(Cl)c(Cl)c1,Neburon,-4.157,1,275.179,1,1,4,32.34,-4.77 372 | CN1CC(O)N(C1=O)c2nnc(s2)C(C)(C)C,Buthidazole,-2.398,1,256.331,1,2,1,69.56,-1.877 373 | O=N(=O)c1ccccc1,Nitrobenzene,-2.2880000000000003,1,123.11099999999996,0,1,1,43.14,-1.8 374 | Ic1ccccc1,Iodobenzene,-3.8,1,204.01,0,1,0,0.0,-3.01 375 | CC2Nc1cc(Cl)c(cc1C(=O)N2c3ccccc3C)S(N)(=O)=O ,Metolazone,-3.777,1,365.8420000000001,2,3,2,92.5,-3.78 376 | COc1ccccc1OCC(O)COC(N)=O,Methocarbamol,-1.4280000000000002,1,241.243,2,1,6,91.01,-0.985 377 | CCCCOCN(C(=O)CCl)c1c(CC)cccc1CC ,butachlor,-4.347,1,311.85300000000007,0,1,9,29.54,-4.19 378 | Oc1cccc(Cl)c1Cl,"2,3-Dichlorophenol",-3.144,1,163.003,1,1,0,20.23,-1.3 379 | CCCC(=O)OC,Propyl butyrate,-1.1909999999999998,1,102.13299999999998,0,0,2,26.3,-1.92 380 | CCC(=O)Nc1ccc(Cl)c(Cl)c1,Propanil,-3.644,1,218.083,1,1,2,29.1,-3.0 381 | Nc3nc(N)c2nc(c1ccccc1)c(N)nc2n3,Triamterene,-3.051,1,253.26900000000003,3,3,1,129.62,-2.404 382 | CCCCCC(=O)OCC,Ethyl hexanoate,-2.254,1,144.21399999999997,0,0,5,26.3,-2.35 383 | OCC(O)C2OC1OC(OC1C2O)C(Cl)(Cl)Cl ,chloralose,-1.887,1,309.529,3,2,2,88.38000000000001,-1.84 384 | CN(C=Nc1ccc(C)cc1C)C=Nc2ccc(C)cc2C,Amitraz,-5.533,1,293.41400000000004,0,2,4,27.96,-5.47 385 | COc1nc(NC(C)C)nc(NC(C)C)n1,Prometon,-3.448,1,225.296,2,1,5,71.96000000000001,-2.478 386 | CCCCCCC=C,1-Octene ,-3.073,1,112.216,0,0,5,0.0,-4.44 387 | Cc1ccc(N)cc1,p-Methylaniline ,-1.954,1,107.156,1,1,0,26.02,-1.21 388 | Nc1nccs1 ,aminothiazole,-1.226,1,100.146,1,1,0,38.91,-0.36 389 | c1ccccc1(OC(=O)NC),Metolcarb,-1.947,1,151.165,1,1,1,38.33,-1.803 390 | 
CCCC(O)CC,3-Hexanol,-1.324,1,102.177,1,0,3,20.23,-0.8 391 | c3ccc2c(O)c1ccccc1cc2c3 ,9-anthrol,-4.148,1,194.233,1,3,0,20.23,-4.73 392 | Cc1ccc2cc3ccccc3cc2c1,2-Methylanthracene,-4.87,1,192.261,0,3,0,0.0,-6.96 393 | Cc1cccc(C)c1C,"1,2,3-Trimethylbenzene ",-3.312,1,120.195,0,1,0,0.0,-3.2 394 | CNC(=O)Oc1ccc(N(C)C)c(C)c1,Aminocarb,-2.677,1,208.261,1,1,2,41.57,-2.36 395 | CCCCCCCC(C)O,2-Nonanol,-2.387,1,144.258,1,0,6,20.23,-2.74 396 | CN(C(=O)NC(C)(C)c1ccccc1)c2ccccc2,Methyldymron,-3.863,1,268.36,1,2,3,32.34,-3.35 397 | CCCC(=O)CC,3-Hexanone,-1.266,1,100.161,0,0,3,17.07,-0.83 398 | Oc1c(Br)cc(C#N)cc1Br ,bromoxynil,-3.793,1,276.915,1,1,0,44.02,-3.33 399 | Clc1ccc(cc1Cl)c2ccccc2 ,"3,4-PCB",-5.223,1,223.102,0,2,1,0.0,-6.39 400 | CN(C(=O)COc1nc2ccccc2s1)c3ccccc3,Mefenacet,-4.504,1,298.367,0,3,4,42.43000000000001,-4.873 401 | Oc1cccc2ncccc12 ,5-hydroxyquinoline,-2.725,1,145.161,1,2,0,33.120000000000005,-2.54 402 | CC1=C(SCCO1)C(=O)Nc2ccccc2,Carboxin,-2.927,1,235.308,1,2,2,38.33,-3.14 403 | CCOc2ccc1nc(sc1c2)S(N)(=O)=O ,Ethoxyzolamide,-3.085,1,258.324,1,2,3,82.28,-3.81 404 | Oc1c(Cl)c(Cl)c(Cl)c(Cl)c1Cl,Pentachlorophenol,-4.835,1,266.338,1,1,0,20.23,-4.28 405 | ClCBr,Bromochloromethane,-1.519,1,129.384,0,0,0,0.0,-0.89 406 | CCC1(CC)C(=O)NC(=O)N(C)C1=O ,metharbital,-1.658,1,198.222,1,1,2,66.48,-2.23 407 | CC(=O)OCC(=O)C3CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,deoxycorticosterone acetate,-4.472,1,372.5050000000002,0,4,3,60.440000000000005,-4.63 408 | NC(=O)NCc1ccccc1 ,benzylurea,-1.509,1,150.18099999999998,2,1,2,55.120000000000005,-0.95 409 | CN(C)C(=O)Nc1ccc(C)c(Cl)c1,Chlortoluron,-3.048,1,212.68,1,1,1,32.34,-3.483 410 | CON(C)C(=O)Nc1ccc(Cl)c(Cl)c1,Linuron,-3.5810000000000004,1,249.097,1,1,2,41.57,-3.592 411 | OC1CCCCCC1,Cycloheptanol,-1.7,1,114.188,1,1,0,20.23,-0.88 412 | CS(=O)(=O)c1ccc(cc1)C(O)C(CO)NC(=O)C(Cl)Cl ,Thiamphenicol,-1.936,1,356.2270000000001,3,1,6,103.70000000000002,-2.154 413 | CCCC(C)C1(CC)C(=O)NC(=S)NC1=O ,thiopental,-2.96,1,242.344,2,1,4,58.2,-3.36 414 | 
CC(=O)Nc1nnc(s1)S(N)(=O)=O ,acetazolamide,-0.7929999999999999,1,222.251,2,1,2,115.04,-2.36 415 | Oc1ccc(cc1)N(=O)=O,p-Nitrophenol,-2.318,1,139.11,1,1,1,63.37,-0.74 416 | ClC1=C(Cl)C2(Cl)C3C4CC(C=C4)C3C1(Cl)C2(Cl)Cl,Aldrin,-5.511,1,364.914,0,4,0,0.0,-6.307 417 | C1CCOC1,Tetrahydrofurane ,-0.62,2,72.107,0,1,0,9.23,0.49 418 | Nc1ccccc1N(=O)=O,o-Nitroaniline,-2.277,1,138.126,1,1,1,69.16,-1.96 419 | Clc1cccc(c1Cl)c2cccc(Cl)c2Cl,"2,2',3,3'-PCB",-6.079,1,291.992,0,2,1,0.0,-7.28 420 | CCCCC1C(=O)N(N(C1=O)c2ccccc2)c3ccccc3 ,phenylbutazone,-4.0760000000000005,1,308.38100000000003,0,3,5,40.620000000000005,-3.81 421 | Cc1c(cccc1N(=O)=O)N(=O)=O,"2,6-Dinitrotoluene",-2.553,1,182.135,0,1,2,86.28,-3.0 422 | CC(=O)C1CCC2C3CCC4=CC(=O)CCC4(C)C3CCC12C,Progesterone,-4.17,1,314.46900000000005,0,4,1,34.14,-4.42 423 | CCN(CC)c1nc(Cl)nc(n1)N(CC)CC,Chlorazine,-3.663,1,257.76899999999995,0,1,6,45.150000000000006,-4.4110000000000005 424 | ClC(Cl)C(Cl)(Cl)SN2C(=O)C1CC=CCC1C2=O ,captafol,-4.365,1,349.06600000000014,0,2,3,37.38,-5.4 425 | c1(Br)c(Br)cc(Br)cc1,"1,2,4-tribromobenzene",-5.144,1,314.802,0,1,0,0.0,-4.5 426 | OC3N=C(c1ccccc1)c2cc(Cl)ccc2NC3=O ,Oxazepam,-3.517,1,286.718,2,3,1,61.690000000000005,-3.952 427 | O=C1NC(=O)NC(=O)C1(C(C)CCC)CC=C,Secobarbital,-2.415,1,238.28699999999995,2,1,5,75.27000000000001,-2.356 428 | c1(O)c(C)ccc(C(C)C)c1,Carvacrol,-3.224,1,150.22099999999998,1,1,1,20.23,-2.08 429 | C1SC(=S)NC1(=O),rhodanine,-0.396,1,133.197,1,1,0,29.1,-1.77 430 | Oc1ccc(c(O)c1)c3oc2cc(O)cc(O)c2c(=O)c3O ,Morin,-2.7310000000000003,1,302.23800000000006,5,3,1,131.36,-3.083 431 | ClC1(C(=O)C2(Cl)C3(Cl)C14Cl)C5(Cl)C2(Cl)C3(Cl)C(Cl)(Cl)C45Cl,Kepone,-5.112,1,490.6390000000001,0,6,0,17.07,-5.259 432 | CCN(CC)C(=S)SSC(=S)N(CC)CC,Disulfiram,-3.862,1,296.5520000000001,0,0,4,6.48,-4.86 433 | C1CCCCC1,Cyclohexane,-2.477,2,84.162,0,1,0,0.0,-3.1 434 | ClC1=C(Cl)C(Cl)(C(=C1Cl)Cl)C2(Cl)C(=C(Cl)C(=C2Cl)Cl)Cl,Dienochlor,-7.848,1,474.64,0,2,1,0.0,-7.278 435 | 
CN(C)C=Nc1ccc(Cl)cc1C,chlordimeform,-3.164,1,196.681,0,1,2,15.6,-2.86 436 | CC34CCc1c(ccc2cc(O)ccc12)C3CCC4=O,Equilenin,-3.927,1,266.34,1,4,0,37.3,-5.24 437 | CCCCCCCCO,1-Octanol,-2.105,1,130.23100000000002,1,0,6,20.23,-2.39 438 | CCSCC,Diethyl sulfide,-1.598,1,90.191,0,0,2,0.0,-1.34 439 | ClCCCl,"1,2-Dichloroethane",-1.374,1,98.96,0,0,1,0.0,-1.06 440 | CCC(C)(C)Cl,2-Chloro-2-methylbutane,-2.278,1,106.596,0,0,1,0.0,-2.51 441 | ClCCBr,1-Chloro-2-bromoethane,-1.7380000000000002,1,143.411,0,0,1,0.0,-1.32 442 | Nc1ccc(cc1)N(=O)=O,p-Nitroaniline,-1.936,1,138.126,1,1,1,69.16,-2.37 443 | OCC1OC(OC2C(O)C(O)C(O)OC2CO)C(O)C(O)C1O,Lactose,1.071,1,342.297,8,2,4,189.53,-0.244 444 | CCN2c1ncccc1N(CC)C(=O)c3cccnc23 ,RTI 2,-3.125,1,268.32,0,3,2,49.330000000000005,-2.86 445 | Clc1ccccc1,Chlorobenzene,-2.975,1,112.55899999999995,0,1,0,0.0,-2.38 446 | CCCCCCCC=C,1-Nonene ,-3.427,1,126.243,0,0,6,0.0,-5.05 447 | Brc1ccc(I)cc1,p-Bromoiodobenzene,-4.754,1,282.90599999999995,0,1,0,0.0,-4.56 448 | CCC(C)(O)CC,3-Methyl-3-pentanol,-1.308,1,102.17699999999998,1,0,2,20.23,-0.36 449 | CCCCCc1ccccc1,Pentylbenzene,-3.899,1,148.249,0,1,4,0.0,-4.64 450 | NC(=O)NC1NC(=O)NC1=O ,allantoin,0.652,1,158.117,4,1,1,113.32,-1.6 451 | OCC(O)COC(=O)c1ccccc1Nc2ccnc3cc(Cl)ccc23,Glafenine,-5.052,1,372.80800000000016,3,3,6,91.68,-4.571000000000001 452 | ClC(Cl)C(c1ccc(Cl)cc1)c2ccc(Cl)cc2,DDD,-6.007999999999999,1,320.04600000000005,0,2,3,0.0,-7.2 453 | CC(=O)OC3CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,testosterone acetate,-4.449,1,330.4680000000001,0,4,1,43.370000000000005,-5.184 454 | Clc1cccc2ccccc12,1-Chloronapthalene,-4.063,1,162.61899999999997,0,2,0,0.0,-3.93 455 | CCN2c1ccccc1N(C)C(=O)c3ccccc23 ,RTI 19,-4.007,1,252.31699999999995,0,3,1,23.55,-4.749 456 | CCCCC(C)O,2-Hexanol,-1.324,1,102.17699999999998,1,0,3,20.23,-0.89 457 | CCCC1CCCC1,Propylcyclopentane,-3.16,1,112.216,0,1,2,0.0,-4.74 458 | CCOC(=O)c1cncn1C(C)c2ccccc2,Etomidate,-3.359,1,244.294,0,2,4,44.12,-4.735 459 | 
Oc1ccc(Cl)c(Cl)c1,"3,4-Dichlorophenol",-3.352,1,163.003,1,1,0,20.23,-1.25 460 | CC1(C)C(C=C(Cl)Cl)C1C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Cypermethrin,-6.775,1,416.30400000000014,0,3,6,59.32000000000001,-8.017000000000001 461 | c2ccc1ocnc1c2,Benzoxazole,-2.214,2,119.12299999999998,0,2,0,26.03,-1.16 462 | CCCCCO,1-Pentanol,-1.042,1,88.14999999999999,1,0,3,20.23,-0.6 463 | CCN(CC)c1ccccc1,"N,N-Diethylaniline",-3.16,1,149.237,0,1,3,3.24,-3.03 464 | Fc1cccc(F)c1,"1,3-Difluorobenzene",-2.636,1,114.094,0,1,0,0.0,-2.0 465 | ClCCC#N ,3-chloropropionitrile,-0.522,1,89.525,0,0,1,23.79,-0.29 466 | CC(C)(C)Cc1ccccc1,t-Pentylbenzene,-3.867,1,148.249,0,1,1,0.0,-4.15 467 | O=C1NC(=O)NC(=O)C1(CC)c1ccccc1,5-Ethyl-5-phenylbarbital,-2.272,1,232.239,2,2,2,75.27000000000001,-2.322 468 | Clc1ccccc1I,o-Chloroiodobenzene,-4.384,1,238.455,0,1,0,0.0,-3.54 469 | c2ccc1[nH]nnc1c2,Benzotriazole,-2.21,2,119.127,1,2,0,41.57,-0.78 470 | CNC(=O)Oc1cccc2CC(C)(C)Oc12,Carbofuran,-3.05,1,221.256,1,2,1,47.56,-2.8 471 | Cc1cccc(C)c1O,"2,6-Dimethylphenol",-2.589,1,122.167,1,1,0,20.23,-1.29 472 | CC(C)C(C)O,3-Methyl-2-butanol,-0.954,1,88.14999999999999,1,0,1,20.23,-0.18 473 | c1ccccc1C(O)c2ccccc2,benzhydrol,-3.033,1,184.238,1,2,2,20.23,-2.55 474 | CCCCCCCCCC(=O)OC,Methyl decanoate,-3.3160000000000003,1,186.295,0,0,8,26.3,-4.69 475 | COP(=S)(OC)Oc1ccc(cc1Cl)N(=O)=O,Dicapthon,-4.188,1,297.656,0,1,5,70.83000000000001,-4.31 476 | CC(C)CBr,1-Bromo-2-methylpropane,-2.2880000000000003,1,137.01999999999998,0,0,1,0.0,-2.43 477 | CCI,Iodoethane,-2.066,1,155.966,0,0,0,0.0,-1.6 478 | CN(C)C(=O)Oc1nc(nc(C)c1C)N(C)C,Pirimicarb,-2.34,1,238.291,0,1,2,58.56000000000001,-1.95 479 | CCCCCCBr,1-Bromohexane,-3.012,1,165.074,0,0,4,0.0,-3.81 480 | CCCC(C)C,2-Methylpentane,-2.6,1,86.178,0,0,2,0.0,-3.74 481 | Cc1c(F)c(F)c(COC(=O)C2C(C=C(Cl)C(F)(F)F)C2(C)C)c(F)c1F,Tetrafluthrin,-6.339,1,418.7360000000001,0,2,4,26.3,-7.321000000000001 482 | CCc1cccc(C)c1N(C(C)COC)C(=O)CCl ,Metolachlor,-3.431,1,283.7989999999999,0,1,6,29.54,-2.73 483 | 
ON=Cc1ccc(o1)N(=O)=O ,nifuroxime,-1.843,1,156.09699999999998,1,1,2,88.87,-2.19 484 | CC(C)C(Nc1ccc(cc1Cl)C(F)(F)F)C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Fluvalinate,-8.057,1,502.9200000000002,1,3,8,71.35,-8.003 485 | Nc1nc[nH]n1,Amitrole,-0.674,1,84.082,2,1,0,67.59,0.522 486 | BrC(Br)Br,Tribromomethane,-2.904,1,252.731,0,0,0,0.0,-1.91 487 | COP(=O)(OC)C(O)C(Cl)(Cl)Cl,Trichlorfon,-1.866,1,257.437,1,0,3,55.760000000000005,-0.22 488 | CCOP(=S)(OCC)SCn1c(=O)oc2cc(Cl)ccc12,Phosalone,-5.024,1,367.8160000000001,0,2,7,53.6,-5.233 489 | OCc1ccccc1,Phenylmethanol,-1.699,1,108.13999999999996,1,1,1,20.23,-0.4 490 | O=c2c(C3CCCc4ccccc43)c(O)c1ccccc1o2 ,Coumatetralyl,-5.194,1,292.33400000000006,1,4,1,50.44,-2.84 491 | Oc1ccc(Br)cc1,4-Bromophenol,-3.132,1,173.009,1,1,0,20.23,-1.09 492 | CC(C)Br,2-Bromopropane,-1.949,1,122.993,0,0,0,0.0,-1.59 493 | CC(C)CC(C)(C)C,"2,2,4-Trimethylpentane",-3.276,1,114.232,0,0,1,0.0,-4.74 494 | O=N(=O)c1cc(cc(c1)N(=O)=O)N(=O)=O,"1,3,5-Trinitrobenzene",-2.324,1,213.105,0,1,3,129.42000000000002,-2.89 495 | CN2C(=O)CN=C(c1ccccc1)c3cc(ccc23)N(=O)=O,Nimetazepam,-3.557,1,295.29800000000006,0,3,2,75.81,-3.796 496 | CCC,Propane,-1.5530000000000002,1,44.097,0,0,0,0.0,-1.94 497 | Nc1cc(nc(N)n1=O)N2CCCCC2 ,Minoxidil,-1.809,1,209.253,2,2,1,95.11,-1.989 498 | Nc2cccc3nc1ccccc1cc23 ,1-aminoacridine,-3.542,1,194.237,1,3,0,38.91,-4.22 499 | c1ccc2cc3c4cccc5cccc(c3cc2c1)c45,Benzo(k)fluoranthene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.49 500 | OC(c1ccc(Cl)cc1)(c2ccc(Cl)cc2)C(Cl)(Cl)Cl,Dicofol,-6.268,1,370.49,1,2,2,20.23,-5.666 501 | C1Cc2cccc3cccc1c23,Acenapthene,-3.792,2,154.21199999999996,0,3,0,0.0,-4.63 502 | CCOP(=S)(OCC)SC(CCl)N2C(=O)c1ccccc1C2=O,Dialifos,-5.026,1,393.85400000000016,0,2,8,55.84,-6.34 503 | Brc1ccc(Br)cc1,"1,4-Dibromobenzene",-4.298,1,235.906,0,1,0,0.0,-4.07 504 | Cn2c(=O)on(c1ccc(Cl)c(Cl)c1)c2=O,Methazole,-3.6010000000000004,1,261.064,0,2,1,57.14,-2.82 505 | Oc1ccc(cc1)c2ccccc2,p-Phenylphenol,-3.701,1,170.211,1,2,1,20.23,-3.48 506 | 
CC1=C(CCCO1)C(=O)Nc2ccccc2 ,pyracarbolid,-2.83,1,217.26800000000003,1,2,2,38.33,-2.56 507 | CCOC=C,Ethyl vinyl ether,-0.857,1,72.10700000000001,0,0,2,9.23,-0.85 508 | CCC#C,1-Butyne,-1.092,1,54.09199999999999,0,0,0,0.0,-1.24 509 | COc1ncnc2nccnc12 ,4-methoxypteridine,-1.589,1,162.15200000000002,0,2,1,60.790000000000006,-1.11 510 | CCCCC(C)(O)CC,3-Methyl-3-heptanol,-2.017,1,130.23099999999997,1,0,4,20.23,-1.6 511 | Clc1ccc(Cl)cc1,"1,4-Dichlorobenzene",-3.5580000000000003,1,147.00400000000002,0,1,0,0.0,-3.27 512 | O=C1N(COC(=O)C)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Ethanoyloxymethylphenytoin,-2.7230000000000003,1,324.33600000000007,1,3,4,75.71,-4.47 513 | CSCS(=O)CC(CO)NC(=O)C=Cc1c(C)[nH]c(=O)[nH]c1=O,"Sparsomycin (3,8mg/ml)",-1.57,1,361.4450000000001,4,1,8,132.11999999999998,-1.981 514 | Cc1c[nH]c2ccccc12 ,3-methylindole,-2.9810000000000003,1,131.17799999999997,1,2,0,15.79,-2.42 515 | COc2ncc1nccnc1n2,2-methoxypteridine,-1.589,1,162.152,0,2,1,60.790000000000006,-1.11 516 | CNC(=O)Oc1ccccc1C2OCCO2,Dioxacarb,-1.614,1,223.22799999999995,1,2,2,56.790000000000006,-1.57 517 | C1N(C(=O)NCC(C)C)C(=O)NC1,isocarbamid,-1.508,1,185.227,2,1,2,61.440000000000005,-2.15 518 | CC#N,Acetonitrile,0.152,1,41.053,0,0,0,23.79,0.26 519 | CCOC(=O)NCCOc2ccc(Oc1ccccc1)cc2,Fenoxycarb,-4.662,1,301.34200000000004,1,2,7,56.790000000000006,-4.7 520 | CC(=O)N(S(=O)c1ccc(N)cc1)c2onc(C)c2C ,acetyl sulfisoxazole,-2.024,1,293.34800000000007,1,2,3,89.43,-3.59 521 | ClCC(Cl)(Cl)Cl,"1,1,1,2-Tetrachloroethane",-2.794,1,167.85,0,0,0,0.0,-2.18 522 | CCCCO,1-Butanol,-0.688,1,74.12299999999999,1,0,2,20.23,0.0 523 | CC1CCCCC1NC(=O)Nc2ccccc2,Siduron,-3.779,1,232.32700000000003,2,2,2,41.13,-4.11 524 | Clc1cc(Cl)cc(Cl)c1,"1,3,5-Trichlorobenzene",-4.159,1,181.449,0,1,0,0.0,-4.48 525 | O=Cc1ccco1,Furfural,-1.391,1,96.08499999999998,0,1,1,30.21,-0.1 526 | CC(C)CCO,3-Methylbutan-1-ol,-1.027,1,88.14999999999999,1,0,2,20.23,-0.51 527 | O=Cc2ccc1OCOc1c2 ,piperonal,-2.033,1,150.13299999999998,0,2,1,35.53,-1.63 528 | 
CC(=C)C,2-Methylpropene,-1.5730000000000002,1,56.108,0,0,0,0.0,-2.33 529 | O=Cc1ccccc1,Benzaldehyde,-1.999,1,106.12399999999997,0,1,1,17.07,-1.19 530 | CC(=C)C(=C)C,"2,3-Dimethyl-1,3-Butadiene",-2.052,1,82.146,0,0,1,0.0,-2.4 531 | CCOC(=O)CCN(SN(C)C(=O)Oc1cccc2CC(C)(C)Oc21)C(C)C,Benfuracarb,-5.132999999999999,1,410.5360000000002,0,2,8,68.31,-4.71 532 | O2c1ccccc1N(C)C(=O)c3cccnc23 ,RTI 10,-2.7710000000000004,1,226.235,0,3,0,42.43,-3.672 533 | C1c2ccccc2c3ccccc13,Fluorene ,-4.125,2,166.22299999999998,0,3,0,0.0,-5.0 534 | CC1CCCCC1,Methylcyclohexane ,-2.891,1,98.189,0,1,0,0.0,-3.85 535 | NC(=N)NS(=O)(=O)c1ccc(N)cc1 ,sulfaguanidine,-0.706,1,214.25,4,1,2,122.06,-1.99 536 | COC(=O)c1ccc(O)cc1,Methylparaben,-2.441,1,152.149,1,1,1,46.53,-1.827 537 | CC1CCCO1,2-Methyltetrahydrofurane,-1.034,1,86.134,0,1,0,9.23,0.11 538 | CC3C2CCC1(C)C=CC(=O)C(=C1C2OC3=O)C,Santonin,-2.43,1,246.30599999999995,0,3,0,43.370000000000005,-3.09 539 | OCC2OC(Oc1ccccc1CO)C(O)C(O)C2O,Salicin,-0.975,1,286.28,5,2,4,119.61,-0.85 540 | CCCI,1-Iodopropane,-2.486,1,169.993,0,0,1,0.0,-2.29 541 | CCNc1nc(NC(C)C)nc(SC)n1,Ametryn,-3.43,1,227.337,2,1,5,62.73,-3.04 542 | CCCO,1-Propanol,-0.3339999999999999,1,60.096,1,0,1,20.23,0.62 543 | CC(=O)C1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3CCC21C,Hydroxyprogesterone-17a,-3.876,1,330.4680000000001,1,4,1,54.37,-3.817 544 | CCCC(C)O,2-Pentanol,-0.97,1,88.14999999999999,1,0,2,20.23,-0.29 545 | OC(C(=O)c1ccccc1)c2ccccc2,benzoin,-3.148,1,212.248,1,2,3,37.3,-2.85 546 | Cc1ccc(O)c(C)c1,"2,4-Dimethylphenol",-2.6210000000000004,1,122.167,1,1,0,20.23,-1.19 547 | Clc1cccc(c1)N(=O)=O,m-Chloronitrobenzene ,-2.901,1,157.55599999999998,0,1,1,43.14,-2.77 548 | Cc2c(N)c(=O)n(c1ccccc1)n2C,ampyrone,-1.192,1,203.245,1,2,1,52.95,-0.624 549 | Clc1ccc(c(Cl)c1)c2cc(Cl)ccc2Cl ,"2,2',4,5'-PCB",-6.23,1,291.992,0,2,1,0.0,-6.57 550 | ClC(=C(Cl)C(=C(Cl)Cl)Cl)Cl,"Hexachloro-1,3-butadiene",-4.546,1,260.762,0,0,1,0.0,-4.92 551 | CCNc1nc(NC(C)(C)C)nc(SC)n1,Terbutryn,-3.75,1,241.364,2,1,4,62.73,-4.0 552 | 
CCC(C)CCO,3-Methyl-2-pentanol,-1.308,1,102.177,1,0,3,20.23,-0.71 553 | Cc2ncc1nccnc1n2,2-methylpteridine,-1.24,1,146.153,0,2,0,51.56,-0.12 554 | CC23Cc1cnoc1C=C2CCC4C3CCC5(C)C4CCC5(O)C#C,Danazol,-4.557,1,337.4630000000001,1,5,0,46.260000000000005,-5.507000000000001 555 | CCCCI,1-Iodobutane,-2.841,1,184.02,0,0,2,0.0,-2.96 556 | Brc1ccc2ccccc2c1,2-Bromonapthalene,-4.434,1,207.07,0,2,0,0.0,-4.4 557 | CC1OC(CC(O)C1O)OC2C(O)CC(OC2C)OC8C(O)CC(OC7CCC3(C)C(CCC4C3CC(O)C5(C)C(CCC45O)C6=CC(=O)OC6)C7)OC8C ,"Digoxin (L1=41,8mg/mL, L2=68,2mg/mL, Z=40,1mg/mL)",-5.312,1,780.9490000000001,6,8,7,203.06,-4.081 558 | FC(F)(F)c1ccccc1,Benzyltrifluoride,-3.099,1,146.111,0,1,0,0.0,-2.51 559 | CCCCCCOC(=O)c1ccccc1C(=O)OCCCCCC,Dihexyl phthalate,-5.757999999999999,1,334.45600000000024,0,1,12,52.60000000000001,-6.144 560 | c1ccc2c(c1)sc3ccccc23,Dibenzothiophene,-4.597,2,184.263,0,3,0,0.0,-4.38 561 | Clc1ccc(c(Cl)c1)c2ccc(Cl)c(Cl)c2Cl ,"2,3',4,4'-PCB",-6.709,1,326.437,0,2,1,0.0,-7.8 562 | Clc1ccc(c(Cl)c1Cl)c2ccc(Cl)c(Cl)c2Cl ,"2,2',3,3',4,4'-PCB",-7.192,1,360.88200000000006,0,2,1,0.0,-8.01 563 | CC(=O)CC(c1ccccc1)c3c(O)c2ccccc2oc3=O ,Warfarin,-3.913,1,308.3330000000001,1,3,4,67.50999999999999,-3.893 564 | c1ccccc1C(O)C(O)c2ccccc2,hydrobenzoin,-2.645,1,214.264,2,2,3,40.46,-1.93 565 | COC(=O)c1ccccc1C(=O)OC,Dimethyl phthalate,-2.347,1,194.18599999999995,0,1,2,52.60000000000001,-1.66 566 | CCCCCCCC(=O)OCC,Ethyl octanoate,-2.962,1,172.26799999999997,0,0,7,26.3,-3.39 567 | CCSSCC,Diethyldisulfide,-2.364,1,122.258,0,0,3,0.0,-2.42 568 | CCOCCOCC,"1,2-Diethoxyethane ",-0.833,1,118.176,0,0,5,18.46,-0.77 569 | Clc1cc(Cl)c(Cl)cc1Cl,"1,2,4,5-Tetrachlorobenzene",-4.621,1,215.894,0,1,0,0.0,-5.56 570 | Nc1ccc(cc1)c2ccc(N)cc2,p-benzidine,-2.613,1,184.242,2,2,1,52.04,-2.7 571 | CCCCCC=C,1-Heptene,-2.718,1,98.189,0,0,4,0.0,-3.73 572 | CCCCc1c(C)nc(NCC)[nH]c1=O,Ethirimol,-2.732,1,209.293,2,1,5,57.78,-3.028 573 | 
O=C1NC(=O)NC(=O)C1(CC)C(C)CCC,Pentobarbital,-2.312,1,226.27599999999995,2,1,4,75.27000000000001,-2.39 574 | Nc1ccccc1Cl,o-Chloroaniline,-2.392,1,127.574,1,1,0,26.02,-1.52 575 | COc1cccc(Cl)c1,3-Chloroanisole,-3.057,1,142.58499999999998,0,1,1,9.23,-2.78 576 | CCCCN(CC)C(=O)SCCC,Pebulate,-3.131,1,203.351,0,0,6,20.31,-3.53 577 | CCCCOC=O,Butyl acetate,-1.111,1,102.13299999999998,0,0,4,26.3,-1.37 578 | CC12CC(O)C3C(CCC4=CC(=O)C=CC34C)C2CCC1(O)C(=O)CO,Prednisolone,-2.974,1,360.4500000000002,3,4,2,94.83,-3.18 579 | BrC(Cl)Cl,Bromodichloromethane,-2.176,1,163.82899999999998,0,0,0,0.0,-1.54 580 | CC34CC(=O)C1C(CCC2=CC(=O)CCC12C)C3CCC4(=O) ,adrenosterone,-2.99,1,300.3980000000001,0,4,0,51.21,-3.48 581 | c1ccc(cc1)c2ccc(cc2)c3ccccc3,p-terphenyl,-5.7410000000000005,2,230.31,0,3,2,0.0,-7.11 582 | Oc1ccc(C=O)cc1,p-Hydroxybenzaldehyde ,-2.003,1,122.12299999999998,1,1,1,37.3,-0.96 583 | CBr,Bromomethane,-1.109,1,94.939,0,0,0,0.0,-0.79 584 | Cc1cc(ccc1NS(=O)(=O)C(F)(F)F)S(=O)(=O)c2ccccc2,Perfluidone,-4.945,1,379.381,1,2,4,80.31,-3.8 585 | CC(=O)CC(c1ccc(Cl)cc1)c2c(O)c3ccccc3oc2=O,Coumachlor,-4.553999999999999,1,342.7780000000001,1,3,4,67.50999999999999,-5.839 586 | CCc1ccc2ccccc2c1,2-Ethylnaphthalene,-4.1,1,156.22799999999998,0,2,1,0.0,-4.29 587 | Nc1c(C)c[nH]c(=O)n1 ,5-methylcytosine,-0.257,1,125.131,2,1,0,71.77000000000001,-1.4580000000000002 588 | Clc2c(Cl)c(Cl)c(c1ccccc1)c(Cl)c2Cl ,"2,3,4,5,6-PCB",-6.785,1,326.437,0,2,1,0.0,-7.92 589 | c1c(NC(=O)c2ccccc2(I))cccc1,benodanil,-4.245,1,323.133,1,2,2,29.1,-4.21 590 | Cc3cc2nc1c(=O)[nH]c(=O)nc1n(CC(O)C(O)C(O)CO)c2cc3C,Riboflavin,-1.865,1,376.36900000000014,5,3,5,161.56,-3.685 591 | Fc1ccccc1Br,o-Fluorobromobenzene,-3.467,1,175.0,0,1,0,0.0,-2.7 592 | Oc1ccc(Cl)cc1Cl,"2,4-Dichlorophenol ",-3.22,1,163.003,1,1,0,20.23,-1.55 593 | CC1(C)C(C=C(Cl)Cl)C1C(=O)OCc2cccc(Oc3ccccc3)c2,Permethrin,-7.129,1,391.2940000000001,0,3,6,35.53,-6.291 594 | CN2C(=C(O)c1ccccc1S2(=O)=O)C(=O)Nc3ccccn3 ,piroxicam,-3.4730000000000003,1,331.353,2,3,2,99.6,-4.16 
595 | O=C1N(COC(=O)CC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Propanoyloxymethylphenytoin,-3.128,1,338.36300000000006,1,3,5,75.71,-4.907 596 | C1CCCC1,Cyclopentane ,-2.0380000000000003,2,70.135,0,1,0,0.0,-2.64 597 | Cc1ccccc1N,o-Toluidine,-1.922,1,107.156,1,1,0,26.02,-2.21 598 | c1(OC)ccc(CC=C)cc1,Estragole,-3.074,1,148.205,0,1,3,9.23,-2.92 599 | CN(C)C(=O)Nc1cccc(OC(=O)NC(C)(C)C)c1 ,karbutilate,-2.655,1,279.34,2,1,2,70.67,-2.93 600 | CC(C)C=C,3-Methyl-1-Butene,-1.994,1,70.135,0,0,1,0.0,-2.73 601 | Oc1ccccn1,2-Hydroxypyridine,-1.655,1,95.101,1,1,0,33.120000000000005,1.02 602 | CC,Ethane,-1.132,1,30.07,0,0,0,0.0,-1.36 603 | Clc1ccccc1Cl,"1,2-Dichlorobenzene",-3.482,1,147.00399999999996,0,1,0,0.0,-3.05 604 | Sc2nc1ccccc1s2 ,mercaptobenzothiazole,-3.411,1,167.25799999999998,1,2,0,12.89,-3.18 605 | Clc1c(Cl)c(Cl)c(c(Cl)c1Cl)c2c(Cl)c(Cl)c(Cl)c(Cl)c2Cl ,"2,2',3,3',4,4',5,5',6,6'-PCB",-9.589,1,498.66200000000026,0,2,1,0.0,-11.6 606 | COc2c1occc1cc3ccc(=O)oc23 ,Methoxsalen,-3.25,1,216.19199999999995,0,3,1,52.58,-3.664 607 | CC(=O)N,Acetamide,0.494,1,59.068,1,0,0,43.09,1.58 608 | Cc1cccc2ccccc12,1-Methylnaphthalene,-3.802,1,142.201,0,2,0,0.0,-3.7 609 | CCN(CC)C(=O)C(C)Oc1cccc2ccccc12 ,Napropamide,-4.088,1,271.36,0,2,5,29.540000000000003,-3.57 610 | CC(O)C(C)(C)C,"3,3-Dimethyl-2-butanol",-1.2919999999999998,1,102.177,1,0,0,20.23,-0.62 611 | CCCC(=O)OCC,Methyl pentanoate,-1.545,1,116.15999999999998,0,0,3,26.3,-1.36 612 | CC2=CC(=O)c1ccccc1C2=O ,Menadione,-2.667,1,172.18299999999996,0,2,0,34.14,-3.03 613 | c1ccc2c(c1)ccc3ccccc32,Phenanthrene,-4.518,2,178.23399999999998,0,3,0,0.0,-5.26 614 | Cc1ccnc(C)c1,"2,4-Dimethylpyridine",-2.0980000000000003,1,107.156,0,1,0,12.89,0.38 615 | CCCCCCCCCO,1-Nonanol,-2.46,1,144.258,1,0,7,20.23,-3.01 616 | BrCBr,Dibromomethane,-1.883,1,173.83499999999998,0,0,0,0.0,-1.17 617 | CC1CC2C3CCC4=CC(=O)C=CC4(C)C3(F)C(O)CC2(C)C1(O)C(=O)CO,Dexamethasone,-3.4,1,392.4670000000002,3,4,2,94.83,-3.59 618 | Cc1ccc2cc(C)ccc2c1,"2,6-Dimethylnaphthalene 
",-4.147,1,156.228,0,2,0,0.0,-4.89 619 | CCSC(=O)N(CC(C)C)CC(C)C,Butylate,-3.4530000000000003,1,217.378,0,0,5,20.31,-3.68 620 | O=N(=O)OCC(CON(=O)=O)ON(=O)=O ,nitroglycerin,-2.029,1,227.085,0,0,8,157.11,-2.22 621 | Nc1cccc(c1)N(=O)=O,m-Nitroaniline,-1.936,1,138.126,1,1,1,69.16,-2.19 622 | CCCCCl,1-Chlorobutane,-1.94,1,92.569,0,0,2,0.0,-2.03 623 | ClC(Cl)(Cl)C(NC=O)N1C=CN(C=C1)C(NC=O)C(Cl)(Cl)Cl ,triforine,-3.715,1,430.9340000000001,2,1,6,64.68,-4.19 624 | Cn2cc(c1ccccc1)c(=O)c(c2)c3cccc(c3)C(F)(F)F,Fluridone,-4.249,1,329.321,0,3,2,22.0,-4.445 625 | Nc3cc2c1ccccc1ccc2c4ccccc34 ,6-aminochrysene,-4.849,1,243.309,1,4,0,26.02,-6.2 626 | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1=O,Estrone,-3.872,1,270.372,1,4,0,37.3,-3.955 627 | CCN2c1ccccc1N(C)C(=S)c3cccnc23 ,RTI 17,-4.227,1,269.373,0,3,1,19.37,-4.706 628 | CC1CO1,"1,2-Propylene oxide",-0.358,1,58.08,0,1,0,12.53,-0.59 629 | O=C3CN=C(c1ccccc1)c2cc(ccc2N3)N(=O)=O,Nitrazepam,-3.4730000000000003,1,281.271,1,3,2,84.6,-3.796 630 | CCNC(=S)NCC,"1,3-diethylthiourea",-1.028,1,132.232,2,0,2,24.06,-1.46 631 | Oc1cc(Cl)cc(Cl)c1Cl,"2,3,5-Trichlorophenol",-3.78,1,197.448,1,1,0,20.23,-2.67 632 | CCCCC(=O)OC,Propyl propanoate,-1.545,1,116.15999999999998,0,0,3,26.3,-1.34 633 | Nc1ccccc1,Aniline ,-1.632,1,93.129,1,1,0,26.02,-0.41 634 | Cc1cccc2c(C)cccc12,"1,5-Dimethlnapthalene",-4.147,1,156.228,0,2,0,0.0,-4.678999999999999 635 | NS(=O)(=O)c2cc1c(NCNS1(=O)=O)cc2Cl ,hydrochlorothiazide,-1.72,1,297.745,3,2,1,118.36,-2.63 636 | C1=Cc2cccc3cccc1c23,Acenapthylene,-3.682,2,152.19599999999994,0,3,0,0.0,-3.96 637 | CCCCCOC(=O)CC,Ethyl butyrate,-2.254,1,144.21399999999997,0,0,5,26.3,-1.28 638 | CCNc1nc(NC(C)C)nc(OC)n1,Atratone,-3.185,1,211.269,2,1,5,71.96000000000001,-2.084 639 | c1ccc2c(c1)cc3ccc4cccc5ccc2c3c45,Benzo(a)pyrene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.699 640 | CCBr,Bromoethane,-1.529,1,108.966,0,0,0,0.0,-1.09 641 | CCC#CCC,3-Hexyne,-1.933,1,82.14599999999999,0,0,0,0.0,-1.99 642 | 
CC1OC(CC(O)C1O)OC2C(O)CC(OC2C)OC8C(O)CC(OC7CCC3(C)C(CCC4C3CCC5(C)C(CCC45O)C6=CC(=O)OC6)C7)OC8C ,Digitoxin,-6.114,1,764.9499999999999,5,8,7,182.83,-5.292999999999999 643 | CCC(=C)C,2-Methyl-1-Butene,-1.994,1,70.13499999999999,0,0,1,0.0,-2.73 644 | Oc1cccc2cccnc12 ,8-quinolinol,-2.725,1,145.16099999999997,1,2,0,33.120000000000005,-2.42 645 | C1CCc2ccccc2C1,"1,2,3,4-Tetrahydronapthalene",-3.447,2,132.20599999999996,0,2,0,0.0,-4.37 646 | Oc1ccc(cc1)C2(OC(=O)c3ccccc23)c4ccc(O)cc4 ,phenolphthalein,-4.59,1,318.32800000000003,2,4,2,66.76,-2.9 647 | Brc1cc(Br)cc(Br)c1,"1,3,5-Tribromobenzene",-5.27,1,314.802,0,1,0,0.0,-5.6 648 | COP(=S)(OC)Oc1cc(Cl)c(Cl)cc1Cl,Ronnel,-5.247000000000001,1,321.549,0,1,4,27.69,-5.72 649 | Cc1cc(=O)[nH]c(=S)[nH]1,methylthiouracil,-0.547,1,142.18300000000002,2,1,0,48.65,-2.436 650 | COc1cc(CC=C)ccc1O,Eugenol,-2.675,1,164.204,1,1,3,29.46,-1.56 651 | O=C1NC(=O)NC(=O)C1(C(C)C)CC=C,5-Allyl-5-isopropylbarbital,-1.706,1,210.233,2,1,3,75.27000000000001,-1.7080000000000002 652 | c1cc2ccc3cccc4ccc(c1)c2c34,Pyrene,-4.957,2,202.256,0,4,0,0.0,-6.176 653 | CCOC(C)OCC,"1,1-Diethoxyethane ",-0.899,1,118.176,0,0,4,18.46,-0.43 654 | CC1(C)CON(Cc2ccccc2Cl)C1=O,Clomazone,-3.077,1,239.702,0,2,2,29.54,-2.338 655 | CCCCOCCO,2-Butoxyethanol,-0.775,1,118.176,1,0,5,29.46,-0.42 656 | Clc1c(Cl)c(Cl)c(N(=O)=O)c(Cl)c1Cl,Quintozene,-5.098,1,295.336,0,1,1,43.14,-5.82 657 | CC12CCC(O)CC1CCC3C2CCC4(C)C3CCC4=O,Androsterone,-3.882,1,290.447,1,4,0,37.3,-4.402 658 | FC(F)(F)c1cccc(c1)N2CC(CCl)C(Cl)C2=O,Flurochloridone,-4.749,1,312.118,0,2,2,20.31,-4.047 659 | c1ccc2ncccc2c1,Quinoline,-2.6630000000000003,2,129.16199999999998,0,2,0,12.89,-1.3 660 | COC(=O)c1cc(O)c(O)c(O)c1 ,methyl gallate,-1.913,1,184.147,3,1,1,86.99000000000001,-1.24 661 | OC(Cn1cncn1)(Cn2cncn2)c3ccc(F)cc3F ,fluconazole,-2.418,1,306.276,1,3,5,81.64999999999999,-1.8 662 | Clc2ccc1oc(=O)[nH]c1c2,Chlorzoxazone,-2.679,1,169.567,1,2,0,46.0,-2.8310000000000004 663 | 
Clc1ccc(c(Cl)c1)c2c(Cl)c(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,4,4',5',6-PCB",-7.898,1,395.3270000000001,0,2,1,0.0,-7.92 664 | O=C1NC(=O)C(=O)C(=O)N1 ,alloxan,0.436,1,142.07,2,1,0,92.34,-1.25 665 | ClCCCCl,"1,3-Dichloropropane",-1.618,1,112.987,0,0,2,0.0,-1.62 666 | Fc1cccc(Br)c1,m-Fluorobromobenzene,-3.467,1,175.0,0,1,0,0.0,-2.67 667 | Clc1ccc(Br)cc1,p-Chlorobromobenzene,-3.928,1,191.455,0,1,0,0.0,-3.63 668 | CC(C)C(C)C,"2,3-Dimethylbutane",-2.584,1,86.178,0,0,1,0.0,-3.65 669 | CCC=C,1-Butene,-1.655,1,56.108,0,0,1,0.0,-1.94 670 | Clc1ccc(Cl)c(c1)c2cc(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,4,5,5'-PCB",-7.343,1,360.88200000000006,0,2,1,0.0,-7.68 671 | Nc1cc[nH]c(=O)n1 ,cytosine,0.051,1,111.104,2,1,0,71.77000000000001,-1.155 672 | FC(F)(Cl)C(F)(Cl)Cl,"1,1,2-Trichlorotrifluoroethane",-3.077,1,187.37500000000003,0,0,1,0.0,-3.04 673 | CCC#N,Propionitrile,-0.2689999999999999,1,55.07999999999999,0,0,0,23.79,0.28 674 | ClC(Cl)C(c1ccc(Cl)cc1)c2ccccc2Cl ,"O,P'-DDD",-6.007999999999999,1,320.04600000000005,0,2,3,0.0,-6.51 675 | COc1ccccc1N(=O)=O,o-Nitroanisole,-2.346,1,153.13699999999997,0,1,2,52.37,-1.96 676 | CC34CCC1C(CC=C2CC(O)CCC12C)C3CCC4=O,Prasterone,-3.564,1,288.43100000000004,1,4,0,37.3,-4.12 677 | CC12CC2(C)C(=O)N(C1=O)c3cc(Cl)cc(Cl)c3,Procymidone,-3.464,1,284.142,0,3,1,37.38,-4.8 678 | c1cc2ccc3ccc4ccc5cccc6c(c1)c2c3c4c56,Benzo[ghi]perylene,-6.446000000000001,2,276.338,0,6,0,0.0,-9.018 679 | CCC(C)c1cc(cc(N(=O)=O)c1O)N(=O)=O ,Dinoseb,-3.715,1,240.21499999999995,1,1,4,106.51000000000002,-3.38 680 | c1c(OC)c(OC)C2C(=O)OCC2c1,meconin,-0.825,1,196.202,0,2,2,44.760000000000005,-1.899 681 | OCC(O)CO,Glycerol,0.688,1,92.094,3,0,2,60.69,1.12 682 | COc1ccccc1O ,Guaiacol,-1.941,1,124.13899999999995,1,1,1,29.46,-1.96 683 | CCOP(=S)(OCC)Oc1nc(Cl)c(Cl)cc1Cl ,chlorpyrifos,-4.972,1,350.591,0,1,6,40.58,-5.67 684 | Cc1c2ccccc2cc3ccccc13,9-Methylanthracene,-4.87,1,192.261,0,3,0,0.0,-5.89 685 | Cc1cc(=O)n(c2ccccc2)n1C,Antipyrene,-1.733,1,188.23,0,2,1,26.93,0.715 686 | CCCCOC,Methyl butyl ether 
,-1.072,1,88.14999999999999,0,0,3,9.23,-0.99 687 | Cc2cnc1cncnc1n2,7-methylpteridine,-1.24,1,146.153,0,2,0,51.56,-0.8540000000000001 688 | CCNc1nc(Cl)nc(NCC)n1 ,simazine,-2.8110000000000004,1,201.661,2,1,4,62.73,-4.55 689 | CN(C)C(=O)C,"N,N-Dimethylacetamide",0.123,1,87.12199999999999,0,0,0,20.31,1.11 690 | CSc1nc(nc(n1)N(C)C)N(C)C,Simetryn,-2.689,1,213.31,0,1,3,45.150000000000006,-2.676 691 | C=C,Ethylene,-0.815,1,28.053999999999995,0,0,0,0.0,-0.4 692 | CC(C)(C)CCO,"3,3-Dimethyl-1-butanol",-1.365,1,102.177,1,0,1,20.23,-0.5 693 | O=C1NC(=O)NC(=O)C1(CC)CC=C,5-Allyl-5-ethylbarbital,-1.368,1,196.206,2,1,3,75.27000000000001,-1.614 694 | Oc1ccc(Cl)c(Cl)c1Cl,"2,3,4-Trichlorophenol",-3.705,1,197.448,1,1,0,20.23,-2.67 695 | COc1ccccc1,Anisole,-2.3680000000000003,1,108.13999999999996,0,1,1,9.23,-1.85 696 | c1ccc(Cl)cc1C(c2ccc(Cl)cc2)(O)C(=O)OC(C)C,chloropropylate,-5.093,1,339.21800000000013,1,2,4,46.53,-4.53 697 | CC13CCC(=O)C=C1CCC4C2CCC(C(=O)CO)C2(CC(O)C34)C=O ,aldosterone,-3.0660000000000003,1,360.4500000000001,2,4,3,91.67000000000002,-3.85 698 | COc2ccc(Oc1ccc(NC(=O)N(C)C)cc1)cc2,Difenoxuron,-3.928,1,286.331,1,2,4,50.8,-4.16 699 | CCc1ccc(C)cc1,4-Ethyltoluene,-3.3280000000000003,1,120.19499999999996,0,1,1,0.0,-3.11 700 | CC(C)SC(C)C,Diisopropylsulfide,-2.162,1,118.245,0,0,2,0.0,-2.24 701 | O=N(=O)c1cccc(c1)N(=O)=O,"1,3-Dinitrobenzene",-2.281,1,168.10799999999995,0,1,2,86.28,-2.29 702 | CCOP(=S)(OCC)SCSP(=S)(OCC)OCC,Ethion,-5.471,1,384.4870000000002,0,0,12,36.92,-5.54 703 | CCC1(C(C)C)C(=O)NC(=O)NC1=O ,probarbital,-1.6030000000000002,1,198.222,2,1,2,75.27000000000001,-2.21 704 | CC(=O)OCC(=O)C3(O)CCC4C2CCC1=CC(=O)CCC1(C)C2C(=O)CC34C ,cortisone acetate,-3.426,1,402.48700000000025,1,4,3,97.74,-4.21 705 | Cc1ncc(N(=O)=O)n1CCO,Metronidazole,-0.8590000000000001,1,171.15599999999998,1,1,3,81.19,-1.22 706 | Nc1ccc(Cl)cc1,p-Chloroaniline,-2.392,1,127.574,1,1,0,26.02,-1.66 707 | CCCC(C)(C)CO,"2,2-Dimethylpentanol",-1.719,1,116.20399999999998,1,0,3,20.23,-1.52 708 | 
c1ccoc1,Furane,-1.837,2,68.07499999999999,0,1,0,13.14,-0.82 709 | COCCCNc1nc(NC(C)C)nc(SC)n1,Methoproptryne,-3.259,1,271.39,2,1,8,71.96000000000001,-2.928 710 | CN(C)C(=O)NC1CC2CC1C3CCCC23,Norea,-2.47,1,222.332,1,3,1,32.34,-3.1710000000000003 711 | CC(C)(C)c1ccccc1,t-Butylbenzene ,-3.554,1,134.22199999999998,0,1,0,0.0,-3.66 712 | CC(=O)CCC1C(=O)N(N(C1=O)c2ccccc2)c3ccccc3,kebuzone,-2.645,1,322.36400000000003,0,3,5,57.690000000000005,-3.27 713 | CC(=O)OCC(=O)C3(O)CCC4C2CCC1=CC(=O)C=CC1(C)C2C(O)CC34C ,prednisolone acetate,-3.507,1,402.48700000000014,2,4,3,100.90000000000002,-4.37 714 | CCCOC,Methyl propyl ether ,-0.718,1,74.12299999999999,0,0,2,9.23,-0.39 715 | CC(C)OC(=O)C,Isopropyl acetate,-1.1909999999999998,1,102.133,0,0,1,26.3,-0.55 716 | Brc1ccccc1,Bromobenzene,-3.345,1,157.01,0,1,0,0.0,-2.55 717 | CCOC(=O)c1ccc(O)cc1,Ethyl-p-hydroxybenzoate ,-2.761,1,166.176,1,1,2,46.53,-2.35 718 | O=C1N(COC(=O)CCC)C(=O)C(N1)(c2ccccc2)c3ccccc3,3-Butanoyloxymethylphenytoin,-3.469,1,352.39000000000004,1,3,6,75.71,-5.071000000000001 719 | CCC(=O)OC3CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,testosterone propionate,-4.87,1,344.4950000000001,0,4,2,43.370000000000005,-5.37 720 | c1cc2ccc3ccc4ccc5ccc6ccc1c7c2c3c4c5c67,Coronene,-6.885,2,300.36000000000007,0,7,0,0.0,-9.332 721 | O=c1[nH]cnc2[nH]ncc12 ,allopurinol,-0.84,1,136.114,2,2,0,74.43,-2.266 722 | ClC=C,Chloroethylene,-1.188,1,62.499,0,0,0,0.0,-1.75 723 | CN(C)C(=O)C(c1ccccc1)c2ccccc2 ,diphenamid,-3.147,1,239.318,0,2,3,20.31,-2.98 724 | BrC(Br)(Br)Br,Tetrabromomethane,-4.063,1,331.62699999999995,0,0,0,0.0,-3.14 725 | CCN2c1cc(N(C)C)cc(C)c1NC(=O)c3cccnc23 ,RTI 22,-4.408,1,296.374,1,3,2,48.47,-4.871 726 | O=C1NC(=O)c2ccccc12 ,phthalimide,-1.882,1,147.13299999999998,1,2,0,46.17,-2.61 727 | OC(c1ccc(Cl)cc1)(c2cncnc2)c3ccccc3Cl,Fenarimol,-4.1080000000000005,1,331.202,1,3,3,46.010000000000005,-4.38 728 | COC(=O)c1ccccc1,Methyl benzoate ,-2.462,1,136.14999999999998,0,1,1,26.3,-1.85 729 | 
Cn1ccc(=O)[nH]c1=O,1-methyluracil,-0.375,1,126.115,1,1,0,54.86,-0.807 730 | CCCCC1C(=O)N(N(C1=O)c2ccc(O)cc2)c3ccccc3 ,oxyphenbutazone,-3.739,1,324.38000000000005,1,3,5,60.85000000000001,-3.73 731 | Clc1ccc(Cl)c(c1)c2cccc(Cl)c2Cl ,"2,2',3,5'-PCB",-6.155,1,291.9920000000001,0,2,1,0.0,-6.47 732 | CCC2NC(=O)c1cc(c(Cl)cc1N2)S(N)(=O)=O,Quinethazone,-2.184,1,289.7440000000001,3,2,2,101.29,-3.29 733 | CN(C)C(=O)Nc1ccc(Cl)c(Cl)c1,Diuron,-3.301,1,233.098,1,1,1,32.34,-3.8 734 | C1CC=CC1,Cyclopentene ,-1.72,2,68.11900000000001,0,1,0,0.0,-2.1 735 | C1(=O)NC(=O)NC(=O)C1(O)C2(O)C(=O)NC(=O)NC2(=O),alloxantin,0.919,1,286.156,6,2,1,191.0,-1.99 736 | CCCCCCCCC,Nonane,-3.678,1,128.259,0,0,6,0.0,-5.88 737 | Oc1ccccc1Cl,2-Chlorophenol,-2.553,1,128.558,1,1,0,20.23,-1.06 738 | c1cccc2c3c(C)cc4ccccc4c3ccc12,5-Methylchrysene,-5.931,1,242.321,0,4,0,0.0,-6.59 739 | CCOc1ccccc1,Phenetole,-2.66,1,122.16699999999996,0,1,2,9.23,-2.33 740 | CCOC(=O)C=Cc1ccccc1,ethyl cinnamate,-3.0980000000000003,1,176.215,0,1,3,26.3,-3.0 741 | Cc1[nH]c(=O)n(c(=O)c1Cl)C(C)(C)C,Terbacil,-3.033,1,216.668,1,1,0,54.86,-2.484 742 | Clc1ccccc1C2=NCC(=O)Nc3ccc(cc23)N(=O)=O,Clonazepam,-3.707,1,315.716,1,3,2,84.6,-3.499 743 | Cc1ccc(cc1)S(=O)(=O)N,p-Toluenesulfonamide ,-1.815,1,171.22099999999998,1,1,1,60.16,-1.74 744 | CC(OC(=O)Nc1cccc(Cl)c1)C#C,Chlorbufam,-3.629,1,223.659,1,1,2,38.33,-2.617 745 | CCCCCC(C)C,2-Methylheptane,-3.3080000000000003,1,114.232,0,0,4,0.0,-5.08 746 | CC1(C)C(C=C(Cl)C(F)(F)F)C1C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2,Cyhalothrin,-6.905,1,449.8560000000001,0,3,6,59.32000000000001,-8.176 747 | CCCC1C(=O)N3N(C1=O)c2cc(C)ccc2N=C3N(C)C ,Apazone,-2.9,1,300.3620000000001,0,3,2,56.220000000000006,-3.5380000000000003 748 | CN2C(=O)CN=C(c1ccccc1)c3cc(Cl)ccc23,Diazepam,-4.05,1,284.74600000000004,0,3,1,32.67,-3.754 749 | CCC(O)C(C)C,2-Methyl-3-pentanol,-1.308,1,102.177,1,0,2,20.23,-0.7 750 | CCOP(=S)(OCC)Oc1ccc(cc1)S(C)=O ,fensulfothion,-3.283,1,308.36100000000005,0,1,7,44.760000000000005,-2.3 751 | 
CC1(C)C2CCC1(C)C(O)C2,borneol,-2.423,1,154.253,1,2,0,20.23,-2.32 752 | CC12CCC3C(CCC4=CC(=O)CCC34C)C2CCC1O,Testosterone,-3.659,1,288.431,1,4,0,37.3,-4.02 753 | CCCCCCC,Heptane,-2.97,1,100.205,0,0,4,0.0,-4.53 754 | Oc1cccc2ccccc12,1-Napthol,-3.08,1,144.17299999999997,1,2,0,20.23,-2.22 755 | C/C1CCCCC1\C,"cis-1,2-Dimethylcyclohexane",-3.305,1,112.216,0,1,0,0.0,-4.3 756 | COc2cc1c(N)nc(nc1c(OC)c2OC)N3CCN(CC3)C(=O)OCC(C)(C)O ,Trimazosin,-3.958,1,435.48100000000034,2,3,6,132.5,-3.638 757 | C1Cc2c3c1cccc3cc4c2ccc5ccccc54,Cholanthrene,-5.942,2,254.332,0,5,0,0.0,-7.85 758 | CC(=O)C3(C)CCC4C2C=C(C)C1=CC(=O)CCC1(C)C2CCC34C,Medrogestone,-4.593,1,340.5070000000001,0,4,1,34.14,-5.27 759 | CCCCCC(=O)C,2-Heptanone,-1.554,1,114.188,0,0,4,17.07,-1.45 760 | COP(=O)(NC(C)=O)SC ,Acephate,-0.416,1,183.169,1,0,3,55.4,0.54 761 | CCCCSP(=O)(SCCCC)SCCCC,DEF,-4.074,1,314.5220000000001,0,0,12,17.07,-5.14 762 | c1cC2C(=O)NC(=O)C2cc1,phthalamide,-0.636,1,149.149,1,2,0,46.17,-2.932 763 | NS(=O)(=O)c2cc1c(NC(NS1(=O)=O)C(Cl)Cl)cc2Cl ,Trichlomethiazide,-2.98,1,380.662,3,2,2,118.36,-2.68 764 | CC=C(C)C,2-Methy-2-Butene,-1.994,1,70.13499999999999,0,0,0,0.0,-2.56 765 | Cc1ccc(C)c(C)c1,"1,2,4-Trimethylbenzene",-3.343,1,120.195,0,1,0,0.0,-3.31 766 | Oc1cc(Cl)c(Cl)cc1Cl,"2,4,5-Trichlorophenol ",-3.78,1,197.448,1,1,0,20.23,-2.21 767 | c1ccc2c(c1)cnc3ccccc23 ,phenanthridine,-3.713,2,179.22199999999998,0,3,0,12.89,-2.78 768 | CCCC(C)(O)CC,3-Methyl-3-hexanol,-1.663,1,116.20399999999998,1,0,3,20.23,-0.98 769 | CCCCCCCC,Octane,-3.324,1,114.232,0,0,5,0.0,-5.24 770 | c1ccc2cc3ccccc3cc2c1,Anthracene,-4.518,2,178.23399999999995,0,3,0,0.0,-6.35 771 | NNc1ccccc1,Phenylhydrazine,-1.866,1,108.14399999999998,2,1,1,38.05,0.07 772 | CCC=O,Propionaldehyde,-0.3939999999999999,1,58.08,0,0,1,17.07,0.58 773 | C1CCCCCCC1,Cyclooctane,-3.355,2,112.216,0,1,0,0.0,-4.15 774 | O=C1NC(=O)NC(=O)C1(CC=C)CC=C,"5,5-Diallylbarbital",-1.471,1,208.217,2,1,4,75.27000000000001,-2.077 775 | 
ClC(Cl)Cl,Trichloromethane,-1.812,1,119.378,0,0,0,0.0,-1.17 776 | Sc1nccc(=O)[nH]1 ,thiouracil,-0.992,1,128.15599999999998,2,1,0,45.75,-2.273 777 | Clc1ccc(CN(C2CCCC2)C(=O)Nc3ccccc3)cc1,Pencycuron,-5.126,1,328.843,1,3,4,32.34,-5.915 778 | CC1=CCCCC1,1-Methylcyclohexene ,-2.574,1,96.17300000000002,0,1,0,0.0,-3.27 779 | CCCCC(CC)C=O,2-Ethylhexanal,-2.232,1,128.21499999999995,0,0,5,17.07,-2.13 780 | COc2c1occc1c(OC)c3c(=O)cc(C)oc23 ,Khellin,-3.603,1,260.24499999999995,0,3,2,61.81,-3.0210000000000004 781 | O=C1NC(=O)NC(=O)C1(CC)CCC(C)C,5-Ethyl-5-(3-methylbutyl)barbital,-2.312,1,226.27599999999995,2,1,4,75.27000000000001,-2.658 782 | c1ccc2c3c(ccc2c1)c4cccc5cccc3c45,Benzo(j)fluoranthene,-6.007000000000001,2,252.316,0,5,0,0.0,-8.0 783 | CCC(CC)C=O,2-Ethylbutanal,-1.523,1,100.161,0,0,3,17.07,-1.52 784 | CCCOCCC,Dipropyl ether,-1.426,1,102.177,0,0,4,9.23,-1.62 785 | CCCCCCCCCCCCCCO,1-Tetradecanol,-4.231,1,214.393,1,0,12,20.23,-5.84 786 | Oc1c(Cl)ccc(Cl)c1Cl,"2,3,6-Trichlorophenol",-3.572,1,197.448,1,1,0,20.23,-2.64 787 | NC(=O)N,Urea,0.8320000000000001,1,60.056,2,0,0,69.11,0.96 788 | CCCC#C,1-Pentyne,-1.446,1,68.11899999999999,0,0,1,0.0,-1.64 789 | Brc1cccc(Br)c1,"1,3-Dibromobenzene",-4.298,1,235.906,0,1,0,0.0,-3.54 790 | CCCCCCCCCCCCCCCCCCO,1-Octadecanol,-5.649,1,270.50099999999986,1,0,16,20.23,-8.4 791 | CC(=O)Nc1ccccc1,Acetanilide,-1.857,1,135.16599999999997,1,1,1,29.1,-1.33 792 | c1cc(O)c(O)c2OCC3(O)CC4=CC(=O)C(O)=CC4=C3c21,hematein,-1.795,1,300.266,4,4,0,107.22,-2.7 793 | c1nccc(C(=O)NN)c1,Isonazid,-0.7170000000000001,1,137.14200000000002,2,1,1,68.01,0.009 794 | OC1C=CC2C1C3(Cl)C(=C(Cl)C2(Cl)C3(Cl)Cl)Cl ,hydroxychlordene,-4.156000000000001,1,354.8749999999999,1,3,0,20.23,-5.46 795 | CC(C)CCOC=O,Isopentyl formate,-1.449,1,116.15999999999998,0,0,4,26.3,-1.52 796 | CC(=O)c1ccccc1,Acetophenone,-2.0780000000000003,1,120.15099999999995,0,1,1,17.07,-1.28 797 | c2ccc1nc(ccc1c2)c4ccc3ccccc3n4 ,biquinoline,-4.9030000000000005,2,256.308,0,4,1,25.78,-5.4 798 | 
CCOP(=O)(OCC)OCC,Triethyl phosphate,-0.953,1,182.156,0,0,6,44.760000000000005,0.43 799 | CC2(C)C1CCC(C)(C1)C2=O,D-fenchone,-2.158,1,152.237,0,2,0,17.07,-1.85 800 | COc2cnc1cncnc1n2,7-methoxypteridine,-1.589,1,162.152,0,2,1,60.790000000000006,-0.91 801 | ClC2=C(Cl)C3(Cl)C1C=CCC1C2(Cl)C3(Cl)Cl ,Chlordene,-5.152,1,338.876,0,3,0,0.0,-5.64 802 | CC(C)N(=O)=O,2-Nitropropane,-0.743,1,89.094,0,0,1,43.14,-0.62 803 | c1ccc2c(c1)[nH]c3ccccc32,Carbazole,-3.836,2,167.21099999999998,1,3,0,15.79,-5.27 804 | OCC(O)C(O)CO,Erythritol,0.675,1,122.12,4,0,3,80.92,0.7 805 | CCCOC(=O)c1ccc(N)cc1,Risocaine,-2.709,1,179.21899999999997,1,1,3,52.32,-2.452 806 | CNC(=O)C=C(C)OP(=O)(OC)OC,Azodrin,-0.949,1,223.165,1,0,5,73.86,0.6509999999999999 807 | O=C1CCC(=O)N1,Succinimide,0.282,1,99.089,1,1,0,46.17,0.3 808 | CCC(C)C(C)C,"2,3-Dimethylpentane",-2.938,1,100.20499999999998,0,0,2,0.0,-4.28 809 | CCCCc1c(C)nc(NCC)nc1OS(=O)(=O)N(C)C ,bupirimate,-3.4930000000000003,1,316.4270000000001,1,1,8,84.42,-4.16 810 | CCN2c1ncccc1N(C)C(=S)c3cccnc23 ,RTI 16,-3.411,1,270.361,0,3,1,32.260000000000005,-4.634 811 | O2c1ccccc1N(CC)C(=O)c3ccccc23 ,RTI 9,-3.784,1,239.274,0,3,1,29.54,-3.68 812 | C1CCOCC1,Tetrahydropyran ,-0.978,2,86.134,0,1,0,9.23,-0.03 813 | CCCCCC#C,1-Heptyne,-2.155,1,96.173,0,0,3,0.0,-3.01 814 | c1cc2ccc(OC)c(CC=C(C)(C))c2oc1=O ,osthole,-4.0760000000000005,1,244.29,0,2,3,39.44,-4.314 815 | c1cc(C)cc2c1c3cc4cccc5CCc(c45)c3cc2,3-Methylcholanthrene,-6.311,1,268.3589999999999,0,5,0,0.0,-7.92 816 | CCOC(=O)c1ccccc1,Ethyl benzoate ,-2.775,1,150.177,0,1,2,26.3,-2.32 817 | ClCC(C)C,1-Chloro-2-methylpropane,-1.924,1,92.569,0,0,1,0.0,-2.0 818 | CC34CCC1C(CCc2cc(O)ccc12)C3CCC4(O)C#C ,Ethinyl estradiol,-4.317,1,296.41,2,4,0,40.46,-4.3 819 | CCCCCCCCCCCC(=O)OC,methyl laurate,-4.025,1,214.349,0,0,10,26.3,-4.69 820 | CCCSCCC,Di-n-propylsulfide,-2.307,1,118.245,0,0,4,0.0,-2.58 821 | c1ccc2cc3cc4ccccc4cc3cc2c1,Napthacene,-5.568,2,228.294,0,4,0,0.0,-8.6 822 | CCCCCBr,1-Bromopentane,-2.658,1,151.047,0,0,3,0.0,-3.08 
823 | CCCC/C=C/C,trans-2-Heptene ,-2.784,1,98.18899999999998,0,0,3,0.0,-3.82 824 | Cc1ncc(N(=O)=O)n1CCO ,Metranidazole,-0.8590000000000001,1,171.15599999999998,1,1,3,81.19,-1.26 825 | CCCCCC1CCCC1,Pentylcyclopentane,-3.869,1,140.26999999999998,0,1,4,0.0,-6.08 826 | Clc1ccc(Cl)c(c1)c2c(Cl)c(Cl)cc(Cl)c2Cl ,"2,2',3,5,5',6-PCB",-7.261,1,360.88200000000006,0,2,1,0.0,-7.42 827 | O=C1NC(=O)NC(=O)C1(CC)C(C)C,5-Ethyl-5-isopropylbarbituric acid,-1.6030000000000002,1,198.222,2,1,2,75.27000000000001,-2.148 828 | CC(Cl)(Cl)Cl,"1,1,1-Trichloroethane",-2.232,1,133.405,0,0,0,0.0,-2.0 829 | CON(C)C(=O)Nc1ccc(Cl)cc1,Monolinuron,-2.948,1,214.652,1,1,2,41.57,-2.57 830 | O=C2NC(=O)C1(CCCCC1)C(=O)N2,Cyclohexyl-5-spirobarbituric acid,-1.405,1,196.206,2,2,0,75.27,-3.06 831 | CN(C)C(=O)OC1=CC(=O)CC(C)(C)C1 ,dimetan,-2.3040000000000003,1,211.261,0,1,1,46.61,-0.85 832 | Cc1ccc(Br)cc1,4-Bromotoluene,-3.667,1,171.03700000000003,0,1,0,0.0,-3.19 833 | CCOCC,Diethyl ether ,-0.718,1,74.123,0,0,2,9.23,-0.09 834 | CC(C)NC(=O)N1CC(=O)N(C1=O)c2cc(Cl)cc(Cl)c2,Rovral,-4.004,1,330.17100000000005,1,2,2,69.72,-4.376 835 | CCCCN(CC)c1c(cc(cc1N(=O)=O)C(F)(F)F)N(=O)=O,Benfluralin,-5.205,1,335.28200000000004,0,1,7,89.51999999999998,-5.53 836 | Cc1cc(C)c(O)c(C)c1,"2,4,6-Trimethylphenol",-2.9410000000000003,1,136.194,1,1,0,20.23,-2.05 837 | c1ccccc1,Benzene ,-2.418,2,78.11399999999999,0,1,0,0.0,-1.64 838 | Clc1ccc(I)cc1,p-Chloroiodobenzene,-4.384,1,238.455,0,1,0,0.0,-4.03 839 | COc1ccc(NC(=O)N(C)C)cc1Cl,Metoxuron,-2.6830000000000003,1,228.679,1,1,2,41.57,-2.564 840 | CC(C)N(C(=O)CCl)c1ccccc1 ,propachlor,-3.018,1,211.69200000000004,0,1,3,20.31,-2.48 841 | C=Cc1ccccc1,Styrene,-2.85,1,104.15199999999996,0,1,1,0.0,-2.82 842 | COCOC,Dimethoxymethane,0.092,1,76.095,0,0,2,18.46,0.48 843 | Cc1ccccc1C,o-Xylene ,-3.004,1,106.168,0,1,0,0.0,-2.8 844 | CCC(C)O,Butan-2-ol,-0.616,1,74.12299999999999,1,0,1,20.23,0.47 845 | Oc1ccc(O)cc1,"1,4-Benzenediol",-1.59,1,110.11199999999998,2,1,0,40.46,-0.17 846 | 
CC34CCC1C(CCc2cc(O)ccc12)C3CC(O)C4O ,estriol,-3.858,1,288.387,3,4,0,60.69,-4.955 847 | C1c2ccccc2c3cc4ccccc4cc13,Benzo(b)fluorene,-5.189,2,216.283,0,4,0,0.0,-8.04 848 | O=C1CNC(=O)N1 ,hydantoin,0.603,1,100.077,2,1,0,58.2,-0.4 849 | c1(O)cc(O)ccc1CCCCCC,4-hexylresorcinol,-3.4930000000000003,1,194.27399999999992,2,1,5,40.46,-2.59 850 | C=CCS(=O)SCC=C,allicin,-2.045,1,162.27899999999997,0,0,5,17.07,-0.83 851 | CCOP(=S)(OCC)Oc2ccc1oc(=O)c(Cl)c(C)c1c2,Coumaphos,-5.04,1,362.7710000000001,0,2,6,57.9,-5.382000000000001 852 | Cc1c(C)c2c3ccccc3ccc2c4ccccc14,"5,6-Dimethylchrysene",-6.265,1,256.348,0,4,0,0.0,-7.01 853 | CCCCC(=O)OC3(C(C)CC4C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC34C)C(=O)CO,Betamethasone-17-valerate,-5.062,1,476.5850000000002,2,4,6,100.90000000000002,-4.71 854 | O=c2[nH]c(=O)c1[nH]c(=O)[nH]c1[nH]2 ,uric acid,-0.541,1,168.112,4,2,0,114.36999999999998,-3.93 855 | Oc1c(Cl)cc(Cl)c(Cl)c1Cl,"2,3,4,6-Tetrachlorophenol",-4.203,1,231.893,1,1,0,20.23,-3.1 856 | Clc1cccc(Cl)c1,"1,3-Dichlorobenzene",-3.5580000000000003,1,147.004,0,1,0,0.0,-3.04 857 | Clc1ccc(cc1)C(c2ccc(Cl)cc2)C(Cl)(Cl)Cl,DDT,-6.638,1,354.491,0,2,2,0.0,-7.15 858 | CC(C)COC=O,Isobutyl formate,-1.095,1,102.13299999999998,0,0,3,26.3,-1.01 859 | c1ccccc1SC,thioanisole,-2.87,1,124.208,0,1,1,0.0,-2.39 860 | CCN2c1nc(C)cc(C(F)(F)F)c1NC(=O)c3cccnc23 ,RTI 13,-4.45,1,322.29,1,3,1,58.120000000000005,-4.207 861 | CCCCCC,Hexane ,-2.615,1,86.178,0,0,3,0.0,-3.84 862 | COC(=O)c1cccnc1 ,methyl nicotinate,-1.621,1,137.138,0,1,1,39.19,-0.46 863 | NS(=O)(=O)c3cc2c(NC(Cc1ccccc1)NS2(=O)=O)cc3C(F)(F)F,Bendroflumethiazide,-3.741,1,421.4220000000001,3,3,3,118.36,-3.59 864 | Clc1ccc(cc1Cl)c2cc(Cl)c(Cl)c(Cl)c2Cl ,"2,3,3',4,4',5-PCB",-7.425,1,360.88200000000006,0,2,1,0.0,-7.82 865 | CC1(OC(=O)N(C1=O)c2cc(Cl)cc(Cl)c2)C=C,Vinclozolin,-4.377,1,286.11400000000003,0,2,2,46.61,-4.925 866 | CCNc1nc(Cl)nc(NC(C)(C)C#N)n1,Cyanazine,-2.49,1,240.698,2,1,4,86.52,-3.15 867 | c1ccc2c(c1)c3ccccc3c4ccccc24,Triphenylene,-5.568,2,228.294,0,4,0,0.0,-6.726 868 
| CC=C(C(=CC)c1ccc(O)cc1)c2ccc(O)cc2,Dienestrol,-4.775,1,266.34,2,2,3,40.46,-4.95 869 | CCCCC(CC)COC(=O)c1ccccc1C(=O)OCC(CC)CCCC,Di(2-ethylhexyl)-phthalate,-7.117000000000001,1,390.5640000000003,0,1,14,52.60000000000001,-6.96 870 | CCc1ccccn1,2-Ethyl pyridine,-2.051,1,107.15599999999998,0,1,1,12.89,0.51 871 | COP(=O)(OC)OC(Br)C(Cl)(Cl)Br,Naled,-3.548,1,380.784,0,0,5,44.760000000000005,-2.28 872 | c1ccc(cc1)c2ccccc2,Biphenyl,-4.079,2,154.21199999999996,0,2,1,0.0,-4.345 873 | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cc(Cl)cc2Cl ,"2,2',4,4',6,6'-PCB",-7.178999999999999,1,360.88200000000006,0,2,1,0.0,-8.71 874 | CN(C)c1nc(nc(n1)N(C)C)N(C)C,Altretamine,-2.492,1,210.285,0,1,3,48.39000000000001,-3.364 875 | CC(C)CC(C)(C)O,"2,4-Dimethyl-2-pentanol ",-1.6469999999999998,1,116.20399999999998,1,0,2,20.23,-0.92 876 | O=C2NC(=O)C1(CCCCCC1)C(=O)N2 ,Cycloheptyl-5-spirobarbituric acid,-1.844,1,210.233,2,2,0,75.27,-3.168 877 | OCC1OC(O)(CO)C(O)C1O,Fructose,0.471,1,180.156,5,1,2,110.38,0.64 878 | Cc1cc(C)cc(O)c1,"3,5-Dimethylphenol",-2.652,1,122.16699999999996,1,1,0,20.23,-1.4 879 | ClCC#CCOC(=O)Nc1cccc(Cl)c1,Barban,-4.16,1,258.104,1,1,2,38.33,-4.37 880 | CC(=O)Nc1ccc(Cl)cc1,p-Chloroacetanilide,-2.642,1,169.611,1,1,1,29.1,-2.843 881 | Clc1ccc(Cl)c(c1)c2c(Cl)c(Cl)c(Cl)c(Cl)c2Cl,"2,2',3,4,5,5',6-PCB",-7.898,1,395.3270000000001,0,2,1,0.0,-8.94 882 | CCC(C)(C)C,"2,2-Dimethylbutane",-2.584,1,86.17799999999998,0,0,0,0.0,-3.55 883 | CNc1ccccc1,N-Methylaniline ,-2.097,1,107.15599999999998,1,1,1,12.03,-1.28 884 | C=CCC=C,"1,4-Pentadiene ",-1.758,1,68.119,0,0,2,0.0,-2.09 885 | CC(=O)OCC(=O)C1(O)CCC2C3CCC4=CC(=O)CCC4(C)C3C(O)CC21C,Hydrocortisone 21-acetate,-3.692,1,404.5030000000002,2,4,3,100.90000000000002,-4.88 886 | Cc1cc(cc(N(=O)=O)c1O)N(=O)=O,DNOC,-2.818,1,198.134,1,1,2,106.51000000000002,-1.456 887 | OC3N=C(c1ccccc1Cl)c2cc(Cl)ccc2NC3=O,Lorazepam,-3.75,1,321.163,2,3,1,61.690000000000005,-3.604 888 | Oc1cccc(Cl)c1,3-Chlorophenol,-2.761,1,128.558,1,1,0,20.23,-0.7 889 | 
Clc1cccc(Br)c1,m-Chlorobromobenzene,-3.928,1,191.455,0,1,0,0.0,-3.21 890 | NS(=O)(=O)c2cc1c(N=CNS1(=O)=O)cc2Cl ,chlorothiazide,-1.7519999999999998,1,295.72900000000004,2,2,1,118.69,-3.05 891 | O=C1NC(=O)NC(=O)C1(C)CC,5-Methyl-5-ethylbarbituric acid,-0.911,1,170.16799999999998,2,1,1,75.27000000000001,-1.228 892 | OCCOc1ccccc1,2-Phenoxyethanol,-1.761,1,138.16599999999997,1,1,3,29.46,-0.7 893 | C(c1ccccc1)c2ccccc2,Diphenylmethane,-4.09,2,168.239,0,2,2,0.0,-4.08 894 | CCCCCC(O)CC,3-Octanol,-2.033,1,130.23099999999997,1,0,5,20.23,-1.98 895 | CCN(Cc1c(F)cccc1Cl)c2c(cc(cc2N(=O)=O)C(F)(F)F)N(=O)=O,Flumetralin,-6.584,1,421.7340000000001,0,2,6,89.51999999999998,-6.78 896 | CC(C)Nc1nc(Cl)nc(NC(C)C)n1,Propazine,-3.329,1,229.71500000000003,2,1,4,62.73,-4.43 897 | CCCC(C)CO,2-Methylpentanol,-1.381,1,102.177,1,0,3,20.23,-1.11 898 | CCCCC(C)(C)O,2-Methyl-2-hexanol,-1.663,1,116.20399999999998,1,0,3,20.23,-1.08 899 | CCc1ccccc1,Ethylbenzene,-2.988,1,106.16799999999996,0,1,1,0.0,-2.77 900 | O=C1NC(=O)NC(=O)C1(CC)CC=C(C)C,5-(3-Methyl-2-butenyl)-5-ethylbarbital,-2.126,1,224.26,2,1,3,75.27000000000001,-2.253 901 | ClC1C=CC2C1C3(Cl)C(=C(Cl)C2(Cl)C3(Cl)Cl)Cl,Heptachlor,-5.26,1,373.3209999999999,0,3,0,0.0,-6.317 902 | CCC(C)C1(CC(Br)=C)C(=O)NC(=O)NC1=O ,butallylonal,-2.766,1,303.156,2,1,4,75.27000000000001,-2.647 903 | CC1(C)C(C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2)C1(C)C,Fenpropathrin,-6.15,1,349.43000000000006,0,3,5,59.32000000000001,-6.025 904 | COC(C)(C)CCCC(C)CC=CC(C)=CC(=O)OC(C)C,Methoprene,-4.795,1,310.47800000000007,0,0,10,35.53,-5.19 905 | CCOC(=O)CC,Ethyl propionate,-1.1909999999999998,1,102.133,0,0,2,26.3,-0.66 906 | CSc1nc(NC(C)C)nc(NC(C)C)n1,Prometryn,-3.693,1,241.364,2,1,5,62.73,-4.1 907 | CC(C#C)N(C)C(=O)Nc1ccc(Cl)cc1,Buturon,-3.199,1,236.702,1,1,2,32.34,-3.9 908 | Cc1cc2ccccc2cc1C,"2,3-Dimethylnaphthalene",-4.1160000000000005,1,156.22799999999998,0,2,0,0.0,-4.72 909 | Clc1ccc(cc1)c2cc(Cl)ccc2Cl ,"2,4',5-PCB",-5.7620000000000005,1,257.547,0,2,1,0.0,-6.25 910 | 
Clc1ccc(c(Cl)c1)c2cc(Cl)c(Cl)c(Cl)c2Cl ,"2,3',4,4',5-PCB",-7.343,1,360.88200000000006,0,2,1,0.0,-7.39 911 | NC(N)=NC#N ,2-cyanoguanidine,0.361,1,84.082,2,0,0,88.19,-0.31 912 | ClC(Cl)(Cl)N(=O)=O,Chloropicrin,-1.866,1,164.375,0,0,0,43.14,-2.0 913 | Clc1cccc(Cl)c1c2ccccc2 ,"2,6-PCB",-4.984,1,223.102,0,2,1,0.0,-5.21 914 | COc1ccc(C=O)cc1,p-Methoxybenzaldehyde,-2.252,1,136.14999999999998,0,1,2,26.3,-1.49 915 | CC(=O)Nc1ccc(cc1)N(=O)=O,4-Nitroacetanilide,-2.219,1,180.163,1,1,2,72.24000000000001,-2.692 916 | CCCCCCC(=O)OCC,Ethyl heptanoate,-2.608,1,158.241,0,0,6,26.3,-2.74 917 | CC(=O)Nc1ccc(O)cc1,p-Hydroxyacetanilide,-1.495,1,151.165,2,1,1,49.33,-1.03 918 | c2ccc1[nH]ncc1c2 ,indazole,-2.34,2,118.13899999999998,1,2,0,28.68,-2.16 919 | CC5(C)OC4CC3C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC3(C)C4(O5)C(=O)CO ,triamcinolone acetonide,-3.928,1,434.50400000000025,2,5,2,93.06000000000002,-4.31 920 | Nc2nc1[nH]cnc1c(=O)[nH]2 ,guanine,-0.67,1,151.129,3,2,0,100.45,-3.583 921 | COC(=O)C,Methyl acetate,-0.416,1,74.07900000000001,0,0,0,26.3,0.46 922 | CC34CCC1C(CCC2CC(=O)CCC12C)C3CCC4O ,Stanolone,-3.882,1,290.44699999999995,1,4,0,37.3,-4.743 923 | CCCC(O)C=C,1-Hexene-3-ol,-1.199,1,100.161,1,0,3,20.23,-0.59 924 | OC(C1=CC2C5C(C1C2=C(c3ccccc3)c4ccccn4)C(=O)NC5=O)(c6ccccc6)c7ccccn7 ,norbormide,-4.238,1,511.5810000000002,2,7,5,92.18,-3.931 925 | CCCCOCCCC,Dibutyl ether ,-2.135,1,130.231,0,0,6,9.23,-1.85 926 | CCCCCCCCCCCCO,1-Dodecanol,-3.523,1,186.339,1,0,10,20.23,-4.8 927 | CCN2c1nc(N(C)(CCO))ccc1NC(=O)c3cccnc23 ,RTI 6,-3.335,1,313.36100000000005,2,3,4,81.59000000000002,-3.36 928 | CCCC(C)(C)O,2-Methyl-2-pentanol,-1.308,1,102.17699999999998,1,0,2,20.23,-0.49 929 | Nc1nc(=O)[nH]cc1F,Flucytosine,-0.132,1,129.09399999999997,2,1,0,71.77,-0.972 930 | CCCCOc1ccc(C(=O)OCC)c(c1)N(CC)CC ,stadacaine,-5.127999999999999,1,293.40700000000004,0,1,9,38.77,-3.84 931 | CCCCCC(C)(C)O,2-Methyl-2-heptanol,-2.017,1,130.231,1,0,4,20.23,-1.72 932 | 
Cc1c(C)c(C)c(C)c(C)c1C,Hexamethylbenzene,-4.361000000000001,1,162.27599999999998,0,1,0,0.0,-5.23 933 | CC(C)c1ccc(C)cc1O,Thymol,-3.129,1,150.22099999999998,1,1,1,20.23,-2.22 934 | c2cnc1ncncc1n2,Pteridine,-0.906,2,132.12599999999998,0,2,0,51.56,0.02 935 | CCOP(=S)(OCC)Oc1ccc(cc1)N(=O)=O,Parathion,-3.949,1,291.26500000000004,0,1,7,70.83000000000001,-4.66 936 | C,Methane,-0.636,0,16.043,0,0,0,0.0,-0.9 937 | c2ccc1NCCc1c2 ,indoline,-2.195,2,119.167,1,2,0,12.03,-1.04 938 | O=N(=O)c1cccc2ccccc12,1-Nitronapthalene,-3.414,1,173.171,0,2,1,43.14,-3.54 939 | CCC(C)C(=O)C,3-Methyl-2-pentanone,-1.266,1,100.161,0,0,2,17.07,-0.67 940 | Nc1nc(O)nc2nc[nH]c12 ,isoguanine,-1.74,1,151.129,3,2,0,100.71,-3.401 941 | OC(CC(c1ccccc1)c3c(O)c2ccccc2oc3=O)c4ccc(cc4)c5ccc(Br)cc5 ,bromadiolone,-7.877000000000001,1,527.4140000000002,2,5,6,70.67,-4.445 942 | CN(=O)=O,Nitromethane,-0.042,1,61.040000000000006,0,0,0,43.14,0.26 943 | CC(C)N(C(C)C)C(=O)SCC(Cl)=C(Cl)Cl,Triallate,-4.578,1,304.67,0,0,4,20.31,-4.88 944 | C=CCCC=C,"1,5-Hexadiene ",-2.112,1,82.14599999999999,0,0,3,0.0,-2.68 945 | c2ccc1[nH]ccc1c2,Indole,-2.654,2,117.15099999999995,1,2,0,15.79,-1.52 946 | CC34CCC1C(CCC2=CC(=O)CCC12C)C3CCC4=O,Androstenedione,-3.393,1,286.415,0,4,0,34.14,-3.69 947 | CCCCC=C,1-Hexene,-2.364,1,84.16199999999999,0,0,3,0.0,-3.23 948 | Cc1cccc(C)c1NC(=O)c2cc(c(Cl)cc2O)S(N)(=O)=O,Xipamide,-3.642,1,354.8150000000001,3,2,3,109.48999999999998,-3.79 949 | CCC1CCCCC1,Ethylcyclohexane,-3.245,1,112.216,0,1,1,0.0,-4.25 950 | CCCCCCCC(=O)C,2-Nonanone,-2.263,1,142.242,0,0,6,17.07,-2.58 951 | COC(=O)Nc2nc1ccc(cc1[nH]2)C(=O)c3ccccc3,Mebendazole,-4.118,1,295.298,2,3,3,84.07999999999998,-3.88 952 | CC(C)OC(=O)Nc1cccc(Cl)c1,Chloropham,-3.544,1,213.664,1,1,2,38.33,-3.38 953 | CCN2c1nc(Cl)ccc1N(C)C(=O)c3cccnc23 ,RTI 12,-3.446,1,288.73800000000006,0,3,1,49.330000000000005,-4.114 954 | CNC(=O)Oc1cccc2ccccc12,Carbaryl,-3.087,1,201.225,1,2,1,38.33,-3.224 955 | C#C,Ethyne,-0.252,1,26.038,0,0,0,0.0,0.29 956 | 
Cc1cncc(C)c1,"3,5-Dimethylpyridine",-2.0980000000000003,1,107.15599999999998,0,1,0,12.89,0.38 957 | C1C=CCC=C1,"1,4-Cyclohexadiene",-1.842,2,80.12999999999998,0,1,0,0.0,-2.06 958 | CCOC(=O)N(C)C(=O)CSP(=S)(OCC)OCC,Mecarbam,-3.738,1,329.3800000000001,0,0,8,65.07000000000001,-2.518 959 | CC(O)c1ccccc1,1-Phenylethanol,-1.919,1,122.16699999999996,1,1,1,20.23,-0.92 960 | CC(Cl)CCl,"1,2-Dichloropropane",-1.794,1,112.987,0,0,1,0.0,-1.6 961 | CCCC=C(CC)C=O,2-Ethyl-2-hexanal,-2.081,1,126.19899999999998,0,0,4,17.07,-2.46 962 | CCOP(=S)(OCC)SCCSCC,Disulfoton,-3.975,1,274.413,0,0,9,18.46,-4.23 963 | CC(=O)OC3(C)CCC4C2CCC1=CC(=O)CCC1(C)C2CCC34C ,methyltestosterone acetate,-4.863,1,344.4950000000001,0,4,1,43.370000000000005,-5.284 964 | Clc1ccc(cc1)c2c(Cl)cccc2Cl ,"2,4,6-PCB",-5.604,1,257.547,0,2,1,0.0,-6.14 965 | Fc1cccc(F)c1C(=O)NC(=O)Nc2ccc(Cl)cc2,difluron,-4.692,1,310.687,2,2,2,58.2,-6.02 966 | Oc1cc(Cl)ccc1Oc2ccc(Cl)cc2Cl,Triclosan,-5.645,1,289.545,1,2,2,29.46,-4.46 967 | c1(C(=O)OCCCCCC(C)(C))c(C(=O)OCCCCCC(C)(C))cccc1,diisooctyl phthalate,-7.117000000000001,1,390.5640000000002,0,1,14,52.60000000000001,-6.6370000000000005 968 | CC12CC(O)C3C(CCC4=CC(=O)CCC34C)C2CCC1C(=O)CO,Corticosterone,-3.454,1,346.46700000000016,2,4,2,74.6,-3.24 969 | Cc1cc(C)cc(C)c1,"1,3,5-Trimethylbenzene ",-3.375,1,120.19499999999998,0,1,0,0.0,-3.4 970 | CCCCCCCCOC(=O)c1ccccc1C(=O)OCCCCCCCC ,dioctyl phthalate,-7.148,1,390.5640000000004,0,1,16,52.60000000000001,-5.115 971 | CCCCCCCCCCCCCCCO,1-Pentadecanol,-4.586,1,228.42,1,0,13,20.23,-6.35 972 | Clc1cccc(Cl)c1c2c(Cl)cccc2Cl ,"2,2',6,6'-PCB",-5.915,1,291.992,0,2,1,0.0,-7.39 973 | O=C1NC(=O)NC(=O)C1(C)C,"5,5-Dimethylbarbituric acid",-0.556,1,156.141,2,1,0,75.27000000000001,-1.742 974 | CC(C)I,2-Iodopropane,-2.486,1,169.993,0,0,0,0.0,-2.09 975 | O=N(=O)c1ccccc1N(=O)=O,"1,2-Dinitrobenzene",-2.281,1,168.10799999999995,0,1,2,86.28,-3.1 976 | CC(C)C(=O)C,3-Methyl-2-butanone,-0.912,1,86.13399999999999,0,0,1,17.07,-0.12 977 | 
CCCCCCCCCCCCCCCC,Hexadecane,-6.159,1,226.44799999999992,0,0,13,0.0,-8.4 978 | CC12CCC(CC1)C(C)(C)O2,"1,8-Cineole",-2.579,1,154.253,0,3,0,9.23,-1.74 979 | Cc2cccc3sc1nncn1c23 ,Tricyclazole,-2.8680000000000003,1,189.243,0,3,0,30.19,-2.07 980 | CCCCCCC(=O)C,2-Octanone,-1.909,1,128.21499999999995,0,0,5,17.07,-2.05 981 | CCCCCCCCC(=O)OC,Methyl nonanoate,-2.962,1,172.268,0,0,7,26.3,-3.38 982 | Fc1ccc(F)cc1,"1,4-Difluorobenzene",-2.636,1,114.094,0,1,0,0.0,-1.97 983 | O=C1N(C2CCC(=O)NC2=O)C(=O)c3ccccc13,Thalidomide,-1.944,1,258.233,1,3,1,83.55000000000001,-2.676 984 | CCCN(CCC)c1c(cc(cc1N(=O)=O)C(F)(F)F)N(=O)=O,Trifluralin,-5.205,1,335.28200000000004,0,1,7,89.51999999999998,-5.68 985 | CCO,Ethanol,0.02,1,46.069,1,0,0,20.23,1.1 986 | O=C2NC(=O)C1(CCCC1)C(=O)N2,Cyclopentyl-5-spirobarbituric acid,-0.966,1,182.179,2,2,0,75.27,-2.349 987 | c1c(NC(=O)OC(C)C(=O)NCC)cccc1,Carbetamide,-2.29,1,236.271,2,1,4,67.42999999999999,-1.83 988 | CC(C)=CC3C(C(=O)OCc2cccc(Oc1ccccc1)c2)C3(C)C ,phenothrin,-6.763,1,350.4580000000001,0,3,6,35.53,-5.24 989 | CN(C)C(=O)NC1CCCCCCC1,Cycluron,-2.629,1,198.30999999999992,1,1,1,32.34,-2.218 990 | ClC1(C2(Cl)C3(Cl)C4(Cl)C5(Cl)C1(Cl)C3(Cl)Cl)C5(Cl)C(Cl)(Cl)C24Cl,Mirex,-6.155,1,545.5460000000002,0,6,0,0.0,-6.8 991 | CCCCCCCCBr,1-Bromooctane,-3.721,1,193.128,0,0,6,0.0,-5.06 992 | CCCCNC(=O)n1c(NC(=O)OC)nc2ccccc12,Benomyl,-2.902,1,290.323,2,2,4,85.25,-4.883 993 | CN(C)c2c(C)n(C)n(c1ccccc1)c2=O ,aminopyrine,-2.129,1,231.299,0,2,2,30.17,-0.364 994 | CCC(O)CC,3-Pentanol,-0.97,1,88.15,1,0,2,20.23,-0.24 995 | Cc1ccc(cc1)N(=O)=O,p-Nitrotoluene,-2.64,1,137.138,0,1,1,43.14,-2.49 996 | CC(C)CCCO,4-Methylpentanol,-1.381,1,102.177,1,0,3,20.23,-1.14 997 | CC34CCC1C(CCC2=CC(=O)CCC12O)C3CCC4(O)C#C,Norethisterone,-2.669,1,314.42500000000007,2,4,0,57.53,-4.57 998 | CC(C)OC(=O)C(O)(c1ccc(Br)cc1)c2ccc(Br)cc2 ,bromopropylate,-5.832999999999999,1,428.12000000000006,1,2,4,46.53,-4.93 999 | Nc2cnn(c1ccccc1)c(=O)c2Cl,Pyrazon,-2.603,1,221.647,1,2,1,60.91,-2.878 1000 | 
CCC(C)(C)O,2-Methylbutan-2-ol,-0.954,1,88.14999999999998,1,0,1,20.23,0.15 1001 | Cc1ccc(O)cc1,p-Cresol,-2.313,1,108.14,1,1,0,20.23,-0.73 1002 | CCOC=O,Ethyl formate,-0.402,1,74.07900000000001,0,0,2,26.3,0.15 1003 | CN(C)c1ccccc1,"N,N-Dimethylaniline",-2.542,1,121.18299999999996,0,1,1,3.24,-1.92 1004 | C1CCC2CCCCC2C1,Decalin,-3.715,2,138.254,0,2,0,0.0,-5.19 1005 | CCCCS,Butanethiol ,-1.676,1,90.191,1,0,2,0.0,-2.18 1006 | c1ccc2c(c1)c3cccc4ccc5cccc2c5c43,Benzo(e)pyrene,-6.007000000000001,2,252.316,0,5,0,0.0,-7.8 1007 | ClC(=C(Cl)Cl)Cl,Tetrachloroethylene,-3.063,1,165.834,0,0,0,0.0,-2.54 1008 | CCC(=O)CC,3-Pentanone,-0.912,1,86.134,0,0,2,17.07,-0.28 1009 | C=CC#N,Acrylonitrile,-0.354,1,53.06399999999999,0,0,0,23.79,0.15 1010 | CC1CC2C3CC(F)C4=CC(=O)C=CC4(C)C3(F)C(O)CC2(C)C1(O)C(=O)CO,Flumethasone,-3.539,1,410.4570000000002,3,4,2,94.83,-5.613 1011 | CCCCC(=O)C,2-Hexanone,-1.2,1,100.161,0,0,3,17.07,-0.8 1012 | CCNc1nc(NC(C)(C)C)nc(OC)n1,Terbumeton,-3.505,1,225.296,2,1,4,71.96000000000001,-3.239 1013 | CCCCC(C)CC,3-Methylheptane,-3.3080000000000003,1,114.232,0,0,4,0.0,-5.16 1014 | BrCCBr,"1,2-Dibromoethane",-2.102,1,187.862,0,0,1,0.0,-1.68 1015 | CNC(=O)Oc1ccccc1C(C)C,Isoprocarb,-2.734,1,193.246,1,1,2,38.33,-2.863 1016 | O=C1NCCN1c2ncc(s2)N(=O)=O,Niridazole,-1.948,1,214.206,1,2,2,88.37,-3.22 1017 | C1c2ccccc2c3ccc4ccccc4c13,Benzo(a)fluorene,-5.189,2,216.283,0,4,0,0.0,-6.68 1018 | COc1ccccc1Cl,2-Chloroanisole,-2.912,1,142.58499999999998,0,1,1,9.23,-2.46 1019 | COP(=S)(OC)Oc1cc(Cl)c(Br)cc1Cl,Bromophos,-5.604,1,366.0,0,1,4,27.69,-6.09 1020 | ClC(Cl)CC(=O)NC2=C(Cl)C(=O)c1ccccc1C2=O,Quinonamid,-3.988,1,332.57000000000005,1,2,3,63.24,-5.03 1021 | ClC(Cl)C(c1ccc(Cl)cc1)c2ccc(Cl)cc2 ,"P,P'-DDD",-6.007999999999999,1,320.04600000000005,0,2,3,0.0,-7.2 1022 | COC(=O)C=C,Methyl acrylate,-0.878,1,86.09,0,0,1,26.3,-0.22 1023 | CN(C)C(=O)Nc2ccc(Oc1ccc(Cl)cc1)cc2,Chloroxuron,-4.477,1,290.75,1,2,3,41.57000000000001,-4.89 1024 | 
N(=Nc1ccccc1)c2ccccc2,Azobenzene,-4.034,2,182.226,0,2,2,24.72,-4.45 1025 | CC(C)c1ccc(C)cc1,4-Isopropyltoluene,-3.617,1,134.22199999999998,0,1,1,0.0,-3.77 1026 | Oc1c(Cl)cccc1Cl,"2,6-Dichlorophenol",-3.012,1,163.003,1,1,0,20.23,-1.79 1027 | OCC2OC(OC1(CO)OC(CO)C(O)C1O)C(O)C(O)C2O ,Sucrose,0.31,1,342.297,8,2,5,189.53,0.79 1028 | OC1C(O)C(O)C(O)C(O)C1O,d-inositol,-0.887,1,180.156,6,1,0,121.38,0.35 1029 | Cn2c(=O)n(C)c1ncn(CC(O)CO)c1c2=O,Dyphylline,-0.847,1,254.24599999999995,2,2,3,102.28,-0.17 1030 | OCC(NC(=O)C(Cl)Cl)C(O)c1ccc(cc1)N(=O)=O,Chloramphenicol,-2.613,1,323.13200000000006,3,1,6,112.70000000000002,-2.111 1031 | CCC(O)(CC)CC,3-Ethyl-3-pentanol,-1.663,1,116.204,1,0,3,20.23,-0.85 1032 | CC45CCC2C(CCC3CC1SC1CC23C)C4CCC5O,Epitostanol,-4.545,1,306.51500000000004,1,5,0,20.23,-5.41 1033 | Brc1ccccc1Br,"1,2-Dibromobenzene",-4.172,1,235.906,0,1,0,0.0,-3.5 1034 | Oc1c(Cl)cc(Cl)cc1Cl,"2,4,6-Trichlorophenol",-3.648,1,197.448,1,1,0,20.23,-2.34 1035 | CCCN(CCC)c1c(cc(cc1N(=O)=O)S(N)(=O)=O)N(=O)=O,oryzalin,-3.784,1,346.3650000000001,1,1,8,149.67999999999998,-5.16 1036 | C2c1ccccc1N(CCF)C(=O)c3ccccc23 ,RTI 20,-3.663,1,255.292,0,3,2,20.31,-4.799 1037 | CC(C)C(=O)C(C)C,"2,4-Dimethyl-3-pentanone",-1.7519999999999998,1,114.18799999999996,0,0,2,17.07,-1.3 1038 | O=C1NC(=O)NC(=O)C1(C(C)C)CC=C(C)C,5-(3-Methyl-2-butenyl)-5-isoPrbarbital,-2.465,1,238.287,2,1,3,75.27000000000001,-2.593 1039 | c1c(O)C2C(=O)C3cc(O)ccC3OC2cc1(OC),gentisin,-1.2919999999999998,1,262.261,2,3,1,75.99000000000001,-2.943 1040 | Cn1cnc2n(C)c(=O)n(C)c(=O)c12,Caffeine,-1.4980000000000002,1,194.194,0,2,0,61.82,-0.8759999999999999 1041 | CC(=O)SC4CC1=CC(=O)CCC1(C)C5CCC2(C)C(CCC23CCC(=O)O3)C45,Spironolactone,-3.842,1,416.58300000000025,0,5,1,60.44,-4.173 1042 | Cc1ccc(O)cc1C,"3,4-Dimethylphenol",-2.6210000000000004,1,122.167,1,1,0,20.23,-1.38 1043 | O(c1ccccc1)c2ccccc2,Diphenyl ether ,-4.254,2,170.211,0,2,2,9.23,-3.96 1044 | Clc1cc(Cl)c(cc1Cl)c2cc(Cl)c(Cl)cc2Cl 
,"2,2',4,4',5,5'-PCB",-7.343,1,360.88200000000006,0,2,1,0.0,-8.56 1045 | NC(=O)c1cccnc1 ,nicotinamide,-0.964,1,122.12699999999997,1,1,1,55.98,0.61 1046 | Sc1ccccc1,Thiophenol ,-2.758,1,110.18099999999995,1,1,0,0.0,-2.12 1047 | CNC(=O)Oc1cc(C)cc(C)c1,XMC,-2.688,1,179.219,1,1,1,38.33,-2.5810000000000004 1048 | ClC1CC2C(C1Cl)C3(Cl)C(=C(Cl)C2(Cl)C3(Cl)Cl)Cl,Chlordane,-6.039,1,409.7819999999999,0,3,0,0.0,-6.86 1049 | CSSC,Dimethyldisulfide,-1.524,1,94.204,0,0,1,0.0,-1.44 1050 | NC(=O)c1ccccc1,Benzamide,-1.501,1,121.13899999999995,1,1,1,43.09,-0.96 1051 | Clc1ccccc1Br,o-Chlorobromobenzene,-3.84,1,191.455,0,1,0,0.0,-3.19 1052 | COC(=O)c1ccccc1OC2OC(COC3OCC(O)C(O)C3O)C(O)C(O)C2O,Monotropitoside,-1.493,1,446.405,6,3,6,184.6,-0.742 1053 | CCCCC(O)CC,3-Heptanol ,-1.6780000000000002,1,116.20399999999998,1,0,4,20.23,-1.47 1054 | CCN2c1nc(C)cc(C)c1NC(=O)c3cccnc23 ,RTI 15,-3.891,1,268.32,1,3,1,58.120000000000005,-4.553999999999999 1055 | Oc1cc(Cl)cc(Cl)c1,"3,5-Dichlorophenol",-3.428,1,163.003,1,1,0,20.23,-1.34 1056 | Cc1cccc2c1ccc3ccccc32,1-Methylphenanthrene,-4.87,1,192.261,0,3,0,0.0,-5.85 1057 | CCCCC(CC)CO,2-Ethyl-1-hexanol,-2.089,1,130.231,1,0,5,20.23,-2.11 1058 | CC(C)N(C(C)C)C(=O)SCC(=CCl)Cl,Diallate,-3.827,1,270.225,0,0,4,20.31,-4.2860000000000005 1059 | Cc1ccccc1,Toluene ,-2.713,1,92.141,0,1,0,0.0,-2.21 1060 | Clc1cccc(n1)C(Cl)(Cl)Cl,Nitrapyrin,-3.833,1,230.909,0,1,0,12.89,-3.76 1061 | C1CCC=CCC1,Cycloheptene,-2.599,2,96.173,0,1,0,0.0,-3.18 1062 | CN(C)C(=S)SSC(=S)N(C)C ,Thiram,-2.444,1,240.444,0,0,0,6.48,-3.9 1063 | COC1=CC(=O)CC(C)C13Oc2c(Cl)c(OC)cc(OC)c2C3=O,Griseofulvin,-3.3280000000000003,1,352.7700000000001,0,3,3,71.06,-3.2460000000000004 1064 | CCCCCCCCCCO,1-Decanol,-2.814,1,158.285,1,0,8,20.23,-3.63 1065 | CCC(C)(C)CC,"3,3-Dimethylpentane",-2.938,1,100.20499999999998,0,0,2,0.0,-4.23 1066 | CNC(=O)C(C)SCCSP(=O)(OC)(OC),vamidothion,-1.446,1,287.343,1,0,8,64.63000000000001,1.144 1067 | 
Oc1cc(Cl)c(Cl)c(Cl)c1Cl,"2,3,4,5-Tetrachlorophenol",-4.335,1,231.893,1,1,0,20.23,-3.15 1068 | CCCC=O,Butyraldehyde,-0.7490000000000001,1,72.107,0,0,2,17.07,-0.01 1069 | CC4CC3C2CCC1=CC(=O)C=CC1(C)C2(F)C(O)CC3(C)C4(O)C(=O)COC(C)=O ,dexamethasone acetate,-3.933,1,434.5040000000003,2,4,3,100.9,-4.9 1070 | CCCC,Butane,-1.907,1,58.124,0,0,1,0.0,-2.57 1071 | COc1ccccc1O,o-Methoxyphenol,-1.941,1,124.13899999999995,1,1,1,29.46,-1.96 1072 | CC1CC2C3CCC(O)(C(=O)C)C3(C)CC(O)C2(F)C4(C)C=CC(=O)C=C14,Fluoromethalone,-3.507,1,376.4680000000001,2,4,1,74.6,-4.099 1073 | ClC(Cl)C(Cl)(Cl)Cl,Pentachloroethane,-3.382,1,202.295,0,0,0,0.0,-2.6 1074 | CCOC(=O)c1ccccc1C(=O)OCC,Diethyl phthalate ,-3.016,1,222.23999999999995,0,1,4,52.60000000000001,-2.35 1075 | CC(C)CO,2-Methylpropan-1-ol,-0.672,1,74.12299999999999,1,0,1,20.23,0.1 1076 | CC(C)Cc1ccccc1,Isobutylbenzene,-3.57,1,134.22199999999998,0,1,2,0.0,-4.12 1077 | ICI,Diiodomethane,-2.958,1,267.835,0,0,0,0.0,-2.34 1078 | CCCC(O)CCC,4-Heptanol,-1.6780000000000002,1,116.204,1,0,4,20.23,-1.4 1079 | CCCCCOC(=O)C,Pentyl acetate,-1.833,1,130.18699999999998,0,0,4,26.3,-1.89 1080 | Oc1c(Cl)c(Cl)cc(Cl)c1Cl,"2,3,5,6-Tetrachlorophenol",-4.203,1,231.893,1,1,0,20.23,-3.37 1081 | CCCc1ccccc1,Propylbenzene ,-3.281,1,120.19499999999996,0,1,2,0.0,-3.37 1082 | FC(F)(Cl)C(F)(F)Cl,"1,2-Dichlorotetrafluoroethane",-2.697,1,170.92000000000002,0,0,1,0.0,-2.74 1083 | CC=CC=O,2-butenal,-0.604,1,70.09100000000001,0,0,1,17.07,0.32 1084 | CN(C)C(=O)N(C)C ,tetramethylurea,-0.495,1,116.164,0,0,0,23.550000000000004,0.94 1085 | Cc1cc(C)c(C)cc1C,"1,2,4,5-Tetramethylbenzene",-3.664,1,134.22199999999998,0,1,0,0.0,-4.59 1086 | CC(=O)OC3(CCC4C2CCC1=CC(=O)CCC1C2CCC34C)C#C,norethindrone acetate,-4.2410000000000005,1,340.4630000000001,0,4,1,43.370000000000005,-4.8 1087 | CCOP(=S)(OCC)N2C(=O)c1ccccc1C2=O,Ditalimfos,-3.992,1,299.28800000000007,0,2,5,55.84,-3.35 1088 | c1ccccc1NC(=O)c2c(O)cccc2,salicylanilide,-3.782,1,213.236,2,2,2,49.33,-3.59 1089 | 
CCN(CC)C(=S)SCC(Cl)=C,Sulfallate,-3.254,1,223.794,0,0,4,3.24,-3.39 1090 | ClCC,Chloroethane,-1.165,1,64.515,0,0,0,0.0,-1.06 1091 | CC(=O)Nc1cc(NS(=O)(=O)C(F)(F)F)c(C)cc1C,Mefluidide,-3.165,1,310.297,2,1,3,75.27000000000001,-3.24 1092 | O=C(C=CC=Cc2ccc1OCOc1c2)N3CCCCC3,Piperine,-3.659,1,285.343,0,3,3,38.77,-3.46 1093 | CC/C=C\C,cis-2-Pentene,-2.076,1,70.135,0,0,1,0.0,-2.54 1094 | CNC(=O)ON=C(CSC)C(C)(C)C ,thiofanox,-2.7,1,218.322,1,0,3,50.69,-1.62 1095 | O=C2NC(=O)C1(CCCCCCC1)C(=O)N2,Cyclooctyl-5-spirobarbituric acid,-2.2840000000000003,1,224.26,2,2,0,75.27,-2.982 1096 | c1(C(C)(C)C)cc(C(C)(C)C)cc(OC(=O)NC)c1,butacarb,-4.642,1,263.381,1,1,1,38.33,-4.24 1097 | Oc2cc(O)c1C(=O)CC(Oc1c2)c3ccc(O)c(O)c3,Eriodictyol,-3.152,1,288.255,4,3,1,107.22,-3.62 1098 | O=C(c1ccccc1)c2ccccc2,Benzophenone,-3.612,1,182.222,0,2,2,17.07,-3.12 1099 | CCCCCCCCCCCCCCCCCCCC,Eicosane,-7.576,1,282.5559999999999,0,0,17,0.0,-8.172 1100 | N(Nc1ccccc1)c2ccccc2 ,hydrazobenzene,-3.492,2,184.242,2,2,3,24.06,-2.92 1101 | CCC(CC)CO,2-Ethyl-1-butanol,-1.381,1,102.177,1,0,3,20.23,-1.17 1102 | Oc1ccncc1,4-hydroxypyridine,-1.655,1,95.10099999999998,1,1,0,33.120000000000005,1.02 1103 | Cl\C=C/Cl,"cis 1,2-Dichloroethylene",-1.561,1,96.94400000000002,0,0,0,0.0,-1.3 1104 | CC1CCCC1,Methylcyclopentane,-2.452,1,84.162,0,1,0,0.0,-3.3 1105 | CC(C)CC(C)O,4-Methyl-2-pentanol,-1.308,1,102.17699999999998,1,0,2,20.23,-0.8 1106 | O2c1ccc(N)cc1N(C)C(=O)c3cc(C)ccc23 ,RTI 11,-3.125,1,254.289,1,3,0,55.56,-3.928 1107 | CC(C)(C)CO,"2,2-Dimethylpropanol",-1.011,1,88.14999999999999,1,0,0,20.23,-0.4 1108 | CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n2cncn2,Triadimefon,-4.132,1,293.754,0,2,4,57.010000000000005,-3.61 1109 | Cc1cc(no1)C(=O)NNCc2ccccc2,Isocarboxazid,-2.251,1,231.255,2,2,4,67.16,-2.461 1110 | CC=C,Propylene,-1.235,1,42.081,0,0,0,0.0,-1.08 1111 | Oc1ccc(Cl)cc1Cc2cc(Cl)ccc2O,Dichlorophen,-4.924,1,269.127,2,2,2,40.46,-3.953 1112 | CCOC(=O)Nc2cccc(OC(=O)Nc1ccccc1)c2 ,Desmedipham,-4.182,1,300.314,2,2,4,76.66,-4.632 1113 | 
O=C1c2ccccc2C(=O)c3ccccc13,Anthraquinone,-3.34,1,208.216,0,3,0,34.14,-5.19 1114 | CCCCCCC(C)O,2-Octanol,-2.033,1,130.231,1,0,5,20.23,-2.09 1115 | CC1=C(C(=O)Nc2ccccc2)S(=O)(=O)CCO1,Oxycarboxin,-2.169,1,267.306,1,2,2,72.47,-2.281 1116 | CCCCc1ccccc1,Butylbenzene,-3.585,1,134.22199999999998,0,1,3,0.0,-4.06 1117 | O=C1NC(=O)C(=O)N1 ,parabanic acid,1.091,1,114.06,2,1,0,75.27,-0.4 1118 | COP(=S)(OC)Oc1ccc(Sc2ccc(OP(=S)(OC)OC)cc2)cc1,Abate,-6.678,1,466.47900000000016,0,2,10,55.38000000000001,-6.237 1119 | NS(=O)(=O)c1cc(ccc1Cl)C2(O)NC(=O)c3ccccc23,Chlorthalidone,-2.564,1,338.7720000000001,3,3,2,109.49,-3.451 1120 | CC(C)COC(=O)C,Isobutyl acetate,-1.463,1,116.15999999999998,0,0,2,26.3,-1.21 1121 | CC(C)C(C)(C)C,"2,2,3-Trimethylbutane",-2.922,1,100.20499999999998,0,0,0,0.0,-4.36 1122 | Clc1ccc(c(Cl)c1Cl)c2c(Cl)cc(Cl)c(Cl)c2Cl ,"2,3,3',4,4'6-PCB",-7.746,1,395.3270000000001,0,2,1,0.0,-7.66 1123 | N#Cc1ccccc1C#N,Phthalonitrile,-1.717,1,128.13399999999996,0,1,0,47.58,-2.38 1124 | Cc1cccc(c1)N(=O)=O,m-Nitrotoluene,-2.64,1,137.138,0,1,1,43.14,-2.44 1125 | FC(F)(F)C(Cl)Br ,halothane,-2.608,1,197.381,0,0,0,0.0,-1.71 1126 | CNC(=O)ON=C(SC)C(=O)N(C)C,Oxamyl,-0.908,1,219.266,1,0,1,70.99999999999999,0.106 1127 | CCSCCSP(=S)(OC)OC,Thiometon,-3.323,1,246.359,0,0,7,18.46,-3.091 1128 | CCC(C)C,2-Methylbutane,-2.245,1,72.151,0,0,1,0.0,-3.18 1129 | COP(=O)(OC)OC(=CCl)c1cc(Cl)c(Cl)cc1Cl,Stirofos,-4.32,1,365.96400000000006,0,1,5,44.760000000000005,-4.522 1130 | -------------------------------------------------------------------------------- /data/finetuning_datasets/regression/esol/esol_mock.csv: -------------------------------------------------------------------------------- 1 | smiles,Compound ID,ESOL predicted log solubility in mols per litre,Minimum Degree,Molecular Weight,Number of H-Bond Donors,Number of Rings,Number of Rotatable Bonds,Polar Surface Area,measured log solubility in mols per litre 2 | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O 
,Amigdalin,-0.974,1,457.4320000000001,7,3,7,202.32,-0.77 3 | Cc1occc1C(=O)Nc2ccccc2,Fenfuram,-2.885,1,201.225,1,2,2,42.24,-3.3 4 | CC(C)=CCCC(C)=CC(=O),citral,-2.579,1,152.237,0,0,4,17.07,-2.06 5 | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43,Picene,-6.617999999999999,2,278.354,0,5,0,0.0,-7.87 6 | c1ccsc1,Thiophene,-2.232,2,84.14299999999999,0,1,0,0.0,-1.33 7 | c2ccc1scnc1c2 ,benzothiazole,-2.733,2,135.191,0,2,0,12.89,-1.5 8 | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl,"2,2,4,6,6'-PCB",-6.545,1,326.437,0,2,1,0.0,-7.32 9 | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O,Estradiol,-4.138,1,272.388,2,4,0,40.46,-5.03 10 | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl,Dieldrin,-4.533,1,380.913,0,5,0,12.53,-6.29 11 | -------------------------------------------------------------------------------- /data/finetuning_datasets/regression/freesolv/freesolv.csv: -------------------------------------------------------------------------------- 1 | smiles,freesolv 2 | CN(C)C(=O)c1ccc(cc1)OC,-11.01 3 | CS(=O)(=O)Cl,-4.87 4 | CC(C)C=C,1.83 5 | CCc1cnccn1,-5.45 6 | CCCCCCCO,-4.21 7 | Cc1cc(cc(c1)O)C,-6.27 8 | CC(C)C(C)C,2.34 9 | CCCC(C)(C)O,-3.92 10 | C[C@@H]1CCCC[C@@H]1C,1.58 11 | CC[C@H](C)O,-4.62 12 | C(Br)Br,-1.96 13 | CC[C@H](C(C)C)O,-3.88 14 | CCc1ccccn1,-4.33 15 | CCCCC(=O)OCC,-2.49 16 | c1ccc(cc1)S,-2.55 17 | CC(=CCC/C(=C\CO)/C)C,-4.78 18 | c1ccc2c(c1)CCC2,-1.46 19 | CCOc1ccccc1,-2.22 20 | c1cc(ccc1O)Br,-5.85 21 | CCCC(C)(C)C,2.88 22 | CC(=O)OCCOC(=O)C,-6.34 23 | CCOP(=S)(OCC)SCSP(=S)(OCC)OCC,-6.1 24 | C1CCCC(CC1)O,-5.48 25 | COC(=O)C1CC1,-4.1 26 | c1ccc(cc1)C#N,-4.1 27 | CCCCC#N,-3.52 28 | CC(C)(C)O,-4.47 29 | CC(C)C(=O)C(C)C,-2.74 30 | CCC=O,-3.43 31 | CN(C)C=O,-7.81 32 | Cc1ccc(cc1)C,-0.8 33 | C=CCC=C,0.93 34 | Cc1cccc(c1C)Nc2ccccc2C(=O)O,-6.78 35 | CN(C)C(=O)c1ccccc1,-9.29 36 | CCNCC,-4.07 37 | CC(C)(C)c1ccc(cc1)O,-5.91 38 | CC(C)CCOC=O,-2.13 39 | CCCCCCCCCCO,-3.64 40 | CCC(=O)OCC,-2.68 41 | CCCCCCCCC,3.13 42 | CC(=O)NC,-10 43 | CCCCCCCC=C,2.06 44 | c1ccc2cc(ccc2c1)O,-8.11 45 | c1cc(c(cc1Cl)Cl)Cl,-1.12 
46 | C([C@H]([C@H]([C@@H]([C@@H](CO)O)O)O)O)O,-23.62 47 | CCCC(=O)OC,-2.83 48 | c1ccc(c(c1)C=O)O,-4.68 49 | C1CNC1,-5.56 50 | CCCNCCC,-3.65 51 | c1ccc(cc1)N,-5.49 52 | C(F)(F)(F)F,3.12 53 | CC[C@@H](C)CO,-4.42 54 | c1ccc(c(c1)O)I,-6.2 55 | COc1cccc(c1O)OC,-6.96 56 | CCC#C,-0.16 57 | c1ccc(cc1)C(F)(F)F,-0.25 58 | NN,-9.3 59 | Cc1ccccn1,-4.63 60 | CCNc1nc(nc(n1)Cl)NCC,-10.22 61 | c1ccc2c(c1)Oc3cc(c(cc3O2)Cl)Cl,-3.56 62 | CCCCCCCCN,-3.65 63 | N,-4.29 64 | c1ccc(c(c1)C(F)(F)F)C(F)(F)F,1.07 65 | COC(=O)c1ccc(cc1)O,-9.51 66 | CCCCCc1ccccc1,-0.23 67 | CC(F)F,-0.11 68 | c1ccc(cc1)n2c(=O)c(c(cn2)N)Cl,-16.43 69 | C=CC=C,0.56 70 | CN(C)C,-3.2 71 | CCCCCC(=O)N,-9.31 72 | CC(C)CO[N+](=O)[O-],-1.88 73 | c1ccc2c(c1)C(=O)c3cccc(c3C2=O)NCCO,-14.21 74 | C(CO[N+](=O)[O-])O,-8.18 75 | CCCCCCC(=O)C,-2.88 76 | CN1CCNCC1,-7.77 77 | CCN,-4.5 78 | C1C=CC=CC=C1,-0.99 79 | c1ccc2c(c1)Cc3ccccc3C2,-3.78 80 | CC(Cl)Cl,-0.84 81 | COc1cccc(c1)O,-7.66 82 | c1cc2cccc3c2c(c1)CC3,-3.15 83 | CCCCCCCCBr,0.52 84 | c1ccc(cc1)CO,-6.62 85 | c1c(c(=O)[nH]c(=O)[nH]1)Br,-18.17 86 | CCCC,2.1 87 | CCl,-0.55 88 | CC(C)CBr,-0.03 89 | CC(C)SC(C)C,-1.21 90 | CCCCCCC,2.67 91 | c1cnc[nH]1,-9.63 92 | c1cc2c(cc1Cl)Oc3cc(c(c(c3O2)Cl)Cl)Cl,-3.84 93 | CC[C@H](C)n1c(=O)c(c([nH]c1=O)C)Br,-9.73 94 | C(I)I,-2.49 95 | CCCN(CCC)C(=O)SCCC,-4.13 96 | C[N+](=O)[O-],-4.02 97 | CCOC,-2.1 98 | COC(CCl)(OC)OC,-4.59 99 | CC(C)C,2.3 100 | CC(C)CC(=O)O,-6.09 101 | CCOP(=O)(OCC)O/C(=C/Cl)/c1ccc(cc1Cl)Cl,-7.07 102 | CCCCl,-0.33 103 | CCCSCCC,-1.28 104 | CCC[C@H](CC)O,-4.06 105 | CC#N,-3.88 106 | CN(CC(F)(F)F)c1ccccc1,-1.92 107 | [C@@H](C(F)(F)F)(OC(F)F)Cl,0.1 108 | C=CCCC=C,1.01 109 | Cc1cccc(c1)C,-0.83 110 | CC(=O)OC,-3.13 111 | COC(c1ccccc1)(OC)OC,-4.04 112 | CCOC(=O)c1ccccc1,-3.64 113 | CCCS,-1.1 114 | CCCCCC(=O)C,-3.04 115 | CC1(Cc2cccc(c2O1)OC(=O)NC)C,-9.61 116 | c1ccc(cc1)CBr,-2.38 117 | CCCCCC(=O)OCC,-2.23 118 | CCCOC,-1.66 119 | CN1CCOCC1,-6.32 120 | c1cc(cc(c1)O)C#N,-9.65 121 | c1cc(c(cc1c2c(c(cc(c2Cl)Cl)Cl)Cl)Cl)Cl,-4.38 122 | 
CCCc1ccccc1,-0.53 123 | Cn1cnc2c1c(=O)n(c(=O)n2C)C,-12.64 124 | CNC,-4.29 125 | C(=C(F)F)(C(F)(F)F)F,2.93 126 | c1cc(ccc1O)Cl,-7.03 127 | C1CCNCC1,-5.11 128 | c1ccc2c(c1)ccc3c2cccc3,-3.88 129 | CI,-0.89 130 | COc1c(cc(c(c1O)OC)Cl)Cl,-6.44 131 | C(=C/Cl)\Cl,-0.78 132 | CCCCC,2.3 133 | CCCC#N,-3.64 134 | [C@@H](C(F)(F)F)(F)Br,0.5 135 | CC(C)Cc1cnccn1,-5.04 136 | CC[C@H](C)O[N+](=O)[O-],-1.82 137 | c1ccc(cc1)c2cc(ccc2Cl)Cl,-2.46 138 | c1ccc(cc1)c2cc(c(c(c2Cl)Cl)Cl)Cl,-3.48 139 | CC[C@@H](C)C(C)C,2.52 140 | C[C@H](CC(C)C)O,-3.73 141 | C1CCOCC1,-3.12 142 | C1CC1,0.75 143 | c1c(cc(c(c1Cl)Cl)Cl)c2cc(c(c(c2Cl)Cl)Cl)Cl,-3.17 144 | C=C(Cl)Cl,0.25 145 | CC(C)CO,-4.5 146 | CCCOC(=O)CC,-2.44 147 | C(C(Cl)(Cl)Cl)(Cl)(Cl)Cl,-0.64 148 | CSc1ccccc1,-2.73 149 | CCc1ccccc1O,-5.66 150 | CC(C)(C)Cl,1.09 151 | CC(=C)C=C,0.68 152 | Cc1ccc(cc1)C(C)C,-0.68 153 | Cn1ccnc1,-8.41 154 | C(CO)O,-9.3 155 | c1ccc(c(c1)Cl)Cl,-1.36 156 | c1c(=O)[nH]c(=O)[nH]c1Cl,-15.83 157 | CCCOC=O,-2.48 158 | c1ccc2c(c1)Oc3ccc(cc3O2)Cl,-3.1 159 | CCCCCC(=O)O,-6.21 160 | CCOC(=O)CCC(=O)OCC,-5.71 161 | Cc1ccnc(c1)C,-4.86 162 | C1CCC=CC1,0.14 163 | CN1CCN(CC1)C,-7.58 164 | c1cc(c(cc1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl,-3.04 165 | C1=CC(=O)C=CC1=O,-6.5 166 | COC(=O)CCl,-4 167 | CCCC=O,-3.18 168 | CCc1ccccc1,-0.79 169 | C(=C(Cl)Cl)Cl,-0.44 170 | CCN(CC)CC,-3.22 171 | c1cc2c(cc1Cl)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-4.15 172 | Cc1ccncc1C,-5.22 173 | c1(=O)[nH]c(=O)[nH]c(=O)[nH]1,-18.06 174 | c1ccc(cc1)C=O,-4.02 175 | c1ccnc(c1)Cl,-4.39 176 | C=CCCl,-0.57 177 | Cc1ccc(cc1)C(=O)C,-4.7 178 | C=O,-2.75 179 | Cc1ccccc1Cl,-1.14 180 | CC(=O)N1CCCC1,-9.8 181 | CC(OC)(OC)OC,-4.42 182 | CCCCc1ccccc1,-0.4 183 | CN(C)c1ccccc1,-3.45 184 | CC(C)OC,-2.01 185 | c12c(c(c(c(c1Cl)Cl)Cl)Cl)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-4.53 186 | c1(c(c(c(c(c1Cl)Cl)Cl)Cl)Cl)c2c(c(c(c(c2Cl)Cl)Cl)Cl)Cl,-2.98 187 | C(C(Cl)Cl)Cl,-1.99 188 | CNc1ccccc1,-4.69 189 | CC(C)OC(=O)C,-2.64 190 | c1ccccc1,-0.9 191 | c1cc(c(c(c1)Cl)Cl)Cl,-1.24 192 | CCOP(=S)(OCC)SCSc1ccc(cc1)Cl,-6.5 
193 | COP(=S)(OC)SCn1c(=O)c2ccccc2nn1,-10.03 194 | c1ccc2c(c1)Oc3c(cc(c(c3O2)Cl)Cl)Cl,-4.05 195 | CC(=C)C(=C)C,0.4 196 | CCCCC=C,1.58 197 | S,-0.7 198 | CCOCC,-1.59 199 | CCNc1nc(nc(n1)SC)NC(C)C,-7.65 200 | CCCCOC(=O)c1ccc(cc1)O,-8.72 201 | CCCCCCOC(=O)C,-2.26 202 | C1CCC(=O)C1,-4.7 203 | CCCCC(=O)O,-6.16 204 | CCBr,-0.74 205 | Cc1ccc2cc(ccc2c1)C,-2.63 206 | CCCCCCO,-4.4 207 | c1ccc(cc1)c2ccccc2Cl,-2.69 208 | CC1=CCCCC1,0.67 209 | CCCCCCO[N+](=O)[O-],-1.66 210 | C(Br)(Br)Br,-2.13 211 | CCc1ccc(cc1)O,-6.13 212 | CCCOCCO,-6.4 213 | c1ccc(cc1)OC=O,-3.82 214 | c1c(c(=O)[nH]c(=O)[nH]1)I,-18.72 215 | CCCC(=O)O,-6.35 216 | COC(C(F)(F)F)(OC)OC,-0.8 217 | C1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O,-20.52 218 | C(F)(F)(F)Br,1.79 219 | CCCCO,-4.72 220 | c1ccc(cc1)F,-0.8 221 | CCOC(=O)C,-2.94 222 | CC(C)COC(=O)C(C)C,-1.69 223 | CC(C)(C)OC,-2.21 224 | C1=C[C@@H]([C@@H]2[C@H]1[C@@]3(C(=C([C@]2(C3(Cl)Cl)Cl)Cl)Cl)Cl)Cl,-2.55 225 | CCC(=O)CC,-3.41 226 | COC(=O)C(F)(F)F,-1.1 227 | c1ccc2ccccc2c1,-2.4 228 | c1cc(c(c(c1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl)Cl,-4.4 229 | CC(=O)Oc1ccccc1C(=O)O,-9.94 230 | CC(=O)C(C)(C)C,-3.11 231 | COS(=O)(=O)C,-4.87 232 | CCc1ccncc1,-4.73 233 | CC(C)NC(C)C,-3.22 234 | c1cc2c(cc1Cl)Oc3ccc(cc3O2)Cl,-3.67 235 | CCCCCCCN,-3.79 236 | CC1CCCC1,1.59 237 | CCC,2 238 | C[C@H]1CCCO1,-3.3 239 | CNC(=O)Oc1cccc2c1cccc2,-9.45 240 | c1cc(cc(c1)O)C=O,-9.52 241 | c1ccc2cc3ccccc3cc2c1,-3.95 242 | C(Cl)Cl,-1.31 243 | CC(C)(C)C(=O)OC,-2.4 244 | C([N+](=O)[O-])(Cl)(Cl)Cl,-1.45 245 | C1CC[S+2](C1)([O-])[O-],-8.61 246 | Cc1cccc(c1O)C,-5.26 247 | Cc1cccc(c1)O,-5.49 248 | c1ccc2c(c1)C(=O)c3c(ccc(c3C2=O)O)N,-9.53 249 | c1ccc2c(c1)C(=O)c3c(ccc(c3C2=O)N)N,-11.85 250 | CCCCCCCC(=O)C,-2.49 251 | CCCCN,-4.24 252 | CCCC(=O)OCC,-2.49 253 | Cc1ccc(cc1)N,-5.57 254 | CCCCCCI,0.08 255 | C(C(F)(Cl)Cl)(F)(F)Cl,1.77 256 | COP(=O)(OC)OC,-8.7 257 | c1cc(cc(c1)Cl)Cl,-0.98 258 | Cc1cc(c2ccccc2c1)C,-2.47 259 | CCCC(C)C,2.51 260 | CCOP(=S)(OCC)Oc1c(cc(c(n1)Cl)Cl)Cl,-5.04 261 | C(C(F)(F)F)Cl,0.06 262 | C=C,1.28 
263 | CCCCCI,-0.14 264 | COC(OC)OC,-4.42 265 | CCCCCCCCCC,3.16 266 | C[C@@H](CO[N+](=O)[O-])O[N+](=O)[O-],-4.95 267 | CC=C,1.32 268 | Cc1c[nH]c2c1cccc2,-5.88 269 | COP(=O)([C@H](C(Cl)(Cl)Cl)O)OC,-12.74 270 | C1CCCCC1,1.23 271 | CC(=CCC/C(=C/CO)/C)C,-4.45 272 | CC(C)c1ccccc1,-0.3 273 | CC(C)C(C)C(C)C,2.56 274 | CC(C)C(=O)C,-3.24 275 | CCCCNCCCC,-3.24 276 | CCCCS,-0.99 277 | c1ccc2c(c1)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-3.81 278 | COc1c(c(c(c(c1Cl)C=O)Cl)OC)O,-8.68 279 | C1CCC(CC1)N,-4.59 280 | C(F)(F)Cl,-0.5 281 | COC(=O)c1ccc(cc1)[N+](=O)[O-],-6.88 282 | CC(=O)c1cccnc1,-8.26 283 | CC#C,-0.48 284 | CCCCCCCCC=O,-2.07 285 | CCC(=O)O,-6.46 286 | C(Cl)(Cl)Cl,-1.08 287 | Cc1cccc(c1C)C,-1.21 288 | C,2 289 | c1ccc(cc1)CCl,-1.93 290 | CC1CCCCC1,1.7 291 | Cc1cccs1,-1.38 292 | c1ccncc1,-4.69 293 | CCCCCl,-0.16 294 | C[C@H]1CC[C@@H](O1)C,-2.92 295 | Cc1ccc(c(c1)OC)O,-5.8 296 | C1[C@H]([C@@H]2[C@H]([C@H]1Cl)[C@]3(C(=C([C@@]2(C3(Cl)Cl)Cl)Cl)Cl)Cl)Cl,-3.44 297 | Cc1ccccc1,-0.9 298 | CC(C)COC=O,-2.22 299 | CCOC(=O)c1ccc(cc1)O,-9.2 300 | CCOCCOCC,-3.54 301 | CCCCCOC(=O)CC,-2.11 302 | CCCc1ccc(cc1)O,-5.21 303 | CC=C(C)C,1.31 304 | C(CCl)Cl,-1.79 305 | CCC(C)(C)CC,2.56 306 | Cc1cc2ccccc2cc1C,-2.78 307 | Cc1cccc(n1)C,-4.59 308 | COC(C(Cl)Cl)(F)F,-1.12 309 | CCOCCOC(=O)C,-5.31 310 | COc1cccc(c1)N,-7.29 311 | c1cc(cnc1)C=O,-7.1 312 | CCC(C)(C)O,-4.43 313 | CCc1cccc(c1N(COC)C(=O)CCl)CC,-8.21 314 | Cn1cccc1,-2.89 315 | COCOC,-2.93 316 | CCC(CC)O,-4.35 317 | CCCCCCCCCC(=O)C,-2.15 318 | C(CBr)Cl,-1.95 319 | c1ccc(cc1)I,-1.74 320 | CC1=CC(=O)CC(C1)(C)C,-5.18 321 | CCI,-0.74 322 | CCCc1ccc(c(c1)OC)O,-5.26 323 | CC(C)Br,-0.48 324 | Cc1ccc(cc1)Br,-1.39 325 | c1cc(ccc1C#N)O,-10.17 326 | CS(=O)(=O)C,-10.08 327 | CCc1cccc(c1)O,-6.25 328 | CC1=CC[C@H](C[C@@H]1O)C(=C)C,-4.44 329 | c1cc(ccc1Br)Br,-2.3 330 | COc1c(ccc(c1C(=O)O)Cl)Cl,-9.86 331 | CC/C=C\C,1.31 332 | CC,1.83 333 | COc1ccccc1OC,-5.33 334 | CCSCC,-1.46 335 | c1cc(cnc1)C#N,-6.75 336 | c1cc(c(cc1O)Cl)Cl,-7.29 337 | COc1ccccc1,-2.45 338 | 
Cc1ccc(c(c1)O)C,-5.91 339 | c1cc(ccc1Cl)Cl,-1.01 340 | C(F)Cl,-0.77 341 | CCCC=C,1.68 342 | c1cc(c(c(c1Cl)Cl)Cl)Cl,-1.34 343 | CCCCCC#C,0.6 344 | CCCCCCCCC(=O)C,-2.34 345 | c1ccc(cc1)Cl,-1.12 346 | CN(C)CCOC(c1ccccc1)c2ccccc2,-9.34 347 | CCCCC=O,-3.03 348 | c1ccc(cc1)Oc2ccccc2,-2.87 349 | C1CCC(=O)CC1,-4.91 350 | CCCC[N+](=O)[O-],-3.09 351 | c1cnccc1C=O,-7 352 | C(CCl)OCCCl,-4.23 353 | CC[N+](=O)[O-],-3.71 354 | c1cc(cnc1)Cl,-4.01 355 | CBr,-0.82 356 | CO,-5.1 357 | CCCCCCC=O,-2.67 358 | c1cc(c(c(c1)Cl)c2c(cccc2Cl)Cl)Cl,-2.28 359 | c1ccc(c(c1)N)[N+](=O)[O-],-7.37 360 | CN1CCCCC1,-3.88 361 | CCCCCCCC=O,-2.29 362 | c1ccc(cc1)[N+](=O)[O-],-4.12 363 | C[C@@H]1CC[C@H](C(=O)C1)C(C)C,-2.53 364 | C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O,-25.47 365 | CF,-0.22 366 | CS(=O)C,-9.280000000000001 367 | c1ccc2c(c1)Oc3ccccc3O2,-3.15 368 | Cc1ccccc1N,-5.53 369 | CCCCBr,-0.4 370 | CCCCCCCCCO,-3.88 371 | Cc1ccncc1,-4.93 372 | C(=C(Cl)Cl)(Cl)Cl,0.1 373 | CC(C)(C)Br,0.84 374 | C=C(c1ccccc1)c2ccccc2,-2.78 375 | CCc1ccc(cc1)C,-0.95 376 | Cc1cccnc1,-4.77 377 | COCC(OC)(OC)OC,-5.73 378 | c1ccc-2c(c1)Cc3c2cccc3,-3.35 379 | CC(=O)N,-9.71 380 | COS(=O)(=O)OC,-5.1 381 | C(C(Cl)Cl)(Cl)Cl,-2.37 382 | COC(=O)C1CCCCC1,-3.3 383 | CCCCCCBr,0.18 384 | CCCCCCCBr,0.34 385 | c1ccc2c(c1)Oc3cccc(c3O2)Cl,-3.52 386 | COC(CC#N)(OC)OC,-6.4 387 | CC[C@H](C)Cl,0 388 | CCCCCCc1ccccc1,-0.04 389 | COc1cc(c(c(c1O)OC)Cl)C=O,-7.78 390 | c1cc(cc(c1)C(F)(F)F)C(F)(F)F,1.07 391 | c1ccc(cc1)Cn2ccnc2,-7.63 392 | c1ccc2c(c1)cccc2N,-7.28 393 | CCOC(=O)CC(=O)OCC,-6 394 | CC(=O)C1CC1,-4.61 395 | c1cc[nH]c1,-4.78 396 | c1cc(c(cc1c2ccc(cc2F)F)C(=O)O)O,-9.4 397 | CC1CCC(CC1)C,2.11 398 | C1CCC(CC1)O,-5.46 399 | CN(C)CCC=C1c2ccccc2CCc3c1cccc3,-7.43 400 | c1cc(ccc1O)F,-6.19 401 | c1ccc(c(c1)N)Cl,-4.91 402 | Cc1ccc(c(c1)C)C,-0.86 403 | CCc1ccccc1C,-0.85 404 | C[C@@H]1CC[C@H](CC1=O)C(=C)C,-3.75 405 | c1ccc(cc1)c2ccccc2,-2.7 406 | Cc1cccc(c1C)O,-6.16 407 | COP(=S)(OC)Oc1ccc(cc1)[N+](=O)[O-],-7.19 408 | 
CCOP(=S)(OCC)Oc1ccc(cc1)[N+](=O)[O-],-6.74 409 | CCN(CC)c1c(cc(c(c1[N+](=O)[O-])N)C(F)(F)F)[N+](=O)[O-],-5.66 410 | CSC,-1.61 411 | C[C@@H](c1cccc(c1)C(=O)c2ccccc2)C(=O)O,-10.78 412 | C1CCC(C1)O,-5.49 413 | CCCCC(=O)OC,-2.56 414 | CCCC(=C)C,1.47 415 | C[C@@H](c1ccc(c(c1)F)c2ccccc2)C(=O)O,-8.42 416 | CCCN(CCC)c1c(cc(cc1[N+](=O)[O-])S(=O)(=O)C)[N+](=O)[O-],-7.98 417 | C=CCl,-0.59 418 | Cc1ccc(cc1)C(=O)N(C)C,-9.76 419 | CCCC(=O)CCC,-2.92 420 | COC(=O)c1ccccc1,-3.92 421 | Cc1ccc(cc1)C=O,-4.27 422 | CCCC(=O)OCCC,-2.28 423 | C1CNCCN1,-7.4 424 | CCOP(=S)(OCC)S[C@@H](CCl)N1C(=O)c2ccccc2C1=O,-5.74 425 | CCOCCO,-6.69 426 | CCC(C)CC,2.51 427 | Cc1cnccn1,-5.51 428 | CCC[N+](=O)[O-],-3.34 429 | Cc1cc(cc(c1)C)C,-0.9 430 | c1c(c(=O)[nH]c(=O)[nH]1)F,-16.92 431 | CCO,-5 432 | Cc1ccc(c2c1cccc2)C,-2.82 433 | c1c2c(cc(c1Cl)Cl)Oc3cc(c(cc3O2)Cl)Cl,-3.37 434 | c1cc(c(c(c1)Cl)C#N)Cl,-4.71 435 | CCOC=O,-2.56 436 | c1c(c(cc(c1Cl)Cl)Cl)Cl,-1.34 437 | CCOC(OCC)Oc1ccccc1,-5.23 438 | c1cc(cc(c1)O)[N+](=O)[O-],-9.62 439 | CCCCCCCCO,-4.09 440 | CCC=C,1.38 441 | C(Cl)(Cl)(Cl)Cl,0.08 442 | c1ccc(cc1)CCO,-6.79 443 | CN(C)C(=O)Nc1ccccc1,-9.13 444 | CSSC,-1.83 445 | C1C=CC[C@@H]2[C@@H]1C(=O)N(C2=O)SC(Cl)(Cl)Cl,-9.01 446 | CC(=O)OCC(COC(=O)C)OC(=O)C,-8.84 447 | COC,-1.91 448 | CCCCCC,2.48 449 | C(CBr)Br,-2.33 450 | C(C(Cl)(Cl)Cl)(Cl)Cl,-1.23 451 | c1c(c(=O)[nH]c(=O)[nH]1)C(F)(F)F,-15.46 452 | Cc1cccc(c1N)C,-5.21 453 | CCCOC(=O)C,-2.79 454 | c1ccc2c(c1)cccn2,-5.72 455 | CCS,-1.14 456 | CCSSCC,-1.64 457 | c1ccsc1,-1.4 458 | CCc1cccc2c1cccc2,-2.4 459 | CCCC(=O)C,-3.52 460 | c1c(c(c(c(c1Cl)Cl)Cl)Cl)c2c(cc(c(c2Cl)Cl)Cl)Cl,-4.61 461 | CCC[N@@](CC1CC1)c2c(cc(cc2[N+](=O)[O-])C(F)(F)F)[N+](=O)[O-],-2.45 462 | CC(=O)O,-6.69 463 | CC=O,-3.5 464 | c1cc(cc(c1)[N+](=O)[O-])N,-8.84 465 | CCCCC#C,0.29 466 | COc1ccccc1N,-6.12 467 | c1ccc(cc1)O,-6.6 468 | CCC#N,-3.84 469 | c1ccc2c(c1)cccc2O,-7.67 470 | CCCCOC(=O)C,-2.64 471 | CC(C)(/C=N\OC(=O)NC)SC,-9.84 472 | Cc1ccccc1O,-5.9 473 | CC(C)C=O,-2.86 474 | CCC(=O)N,-9.4 475 
| CCCBr,-0.56 476 | CC(C)Cl,-0.25 477 | C(CCl)CCl,-1.89 478 | c1cc(ccc1[N+](=O)[O-])O,-10.64 479 | C[C@@H](CCl)Cl,-1.27 480 | c1cc(ccc1N)Cl,-5.9 481 | c1ccc2c(c1)C(=O)c3cccc(c3C2=O)N,-9.44 482 | Cc1cccnc1C,-4.82 483 | c1cnccc1C#N,-6.02 484 | CCOP(=S)(OCC)SCSCC,-4.37 485 | CC(=O)C1CCCCC1,-3.9 486 | Cc1ccccc1C=O,-3.93 487 | CC(=O)c1ccncc1,-7.62 488 | c1c2c(cc(c1Cl)Cl)Oc3c(c(c(c(c3Cl)Cl)Cl)Cl)O2,-3.71 489 | CC(=O)C,-3.8 490 | CC(=C)C,1.16 491 | c1cc(c(cc1Cl)c2cc(c(c(c2)Cl)Cl)Cl)Cl,-3.61 492 | CCCCC[N+](=O)[O-],-2.82 493 | CCC/C=C/C=O,-3.68 494 | CN(C)C(=O)c1ccc(cc1)[N+](=O)[O-],-11.95 495 | C1CCOC1,-3.47 496 | CCCCCCCC,2.88 497 | CCCN(CCC)c1c(cc(cc1[N+](=O)[O-])C(F)(F)F)[N+](=O)[O-],-3.25 498 | CC(=CCC[C@](C)(C=C)OC(=O)C)C,-2.49 499 | C[C@@H](CCO[N+](=O)[O-])O[N+](=O)[O-],-4.29 500 | CC(C)OC(C)C,-0.53 501 | CCCCC(C)C,2.93 502 | c1(c(c(c(c(c1Cl)Cl)Cl)Cl)Cl)N(=O)=O,-5.22 503 | [C@@H](C(F)(F)F)(Cl)Br,-0.11 504 | CCCCOCCCC,-0.83 505 | CCCCCC1CCCC1,2.55 506 | CC(C)CC(C)C,2.83 507 | Cc1ccc(nc1)C,-4.72 508 | C/C=C/C=O,-4.22 509 | CCC[C@H](C)CC,2.71 510 | c1cc(c(c(c1)Cl)c2c(cc(cc2Cl)Cl)Cl)Cl,-1.96 511 | c1ccc(cc1)O[C@@H](C(F)F)F,-1.29 512 | COCCOC,-4.84 513 | CC[C@H](C)c1ccccc1,-0.45 514 | c1ccc(cc1)CCCO,-6.92 515 | CC[C@@H](C)c1cc(cc(c1O)[N+](=O)[O-])[N+](=O)[O-],-6.23 516 | COc1ccc(cc1)C(=O)OC,-5.33 517 | CCC(=O)Nc1ccc(c(c1)Cl)Cl,-7.78 518 | C[C@@H](c1ccc2cc(ccc2c1)OC)C(=O)O,-10.21 519 | C1(C(C(C1(F)F)(F)F)(F)F)(F)F,3.43 520 | CC(C)CCOC(=O)C,-2.21 521 | CCCCCCCl,0 522 | CC(C)CC(=O)C,-3.05 523 | CCCCCC=O,-2.81 524 | c1cc(cc(c1)Cl)N,-5.82 525 | C1COCCN1,-7.17 526 | CCOC(C)OCC,-3.28 527 | CCCC[N@](CC)c1c(cc(cc1[N+](=O)[O-])C(F)(F)F)[N+](=O)[O-],-3.51 528 | CS,-1.2 529 | C1[C@@H]2[C@H](COS(=O)O1)[C@@]3(C(=C([C@]2(C3(Cl)Cl)Cl)Cl)Cl)Cl,-4.23 530 | CC(=O)c1ccc(cc1)OC,-4.4 531 | C=CCO,-5.03 532 | CCSC,-1.5 533 | CCCCCOC(=O)C,-2.51 534 | c1c(cc(c(c1Cl)Cl)Cl)Cl,-1.62 535 | CC(=O)c1ccccc1,-4.58 536 | CCCl,-0.63 537 | CCCC1CCCC1,2.13 538 | c1c(cc(cc1Cl)Cl)Cl,-0.78 539 | 
CCCOC(=O)c1ccc(cc1)O,-9.37 540 | c1cc(cc(c1)Cl)O,-6.62 541 | CC(C)CCO,-4.42 542 | CCCCCN,-4.09 543 | Cc1c(c(=O)n(c(=O)[nH]1)C(C)(C)C)Cl,-11.14 544 | CC(C)CCC(C)(C)C,2.93 545 | CCCCOCCO,-6.25 546 | C1[C@@H]2[C@H]3[C@@H]([C@H]1[C@H]4[C@@H]2O4)[C@@]5(C(=C([C@]3(C5(Cl)Cl)Cl)Cl)Cl)Cl,-4.82 547 | c1ccc(cc1)C(=O)N,-11 548 | CC(C)[N+](=O)[O-],-3.13 549 | C(C(CO)O)O,-13.43 550 | CCCI,-0.53 551 | COCCN,-6.55 552 | C(C(Cl)(Cl)Cl)Cl,-1.43 553 | CCC(=O)OC,-2.93 554 | C1CCCC1,1.2 555 | CCc1cccnc1,-4.59 556 | Cc1cc(cnc1)C,-4.84 557 | COCCO,-6.619999999999999 558 | COC=O,-2.78 559 | c1ccc2cc(ccc2c1)N,-7.47 560 | Cc1c[nH]cn1,-10.27 561 | Cc1cccc(c1)[N+](=O)[O-],-3.45 562 | C(CCCl)CCl,-2.32 563 | CC(=O)CO[N+](=O)[O-],-5.99 564 | CC(C)(C)c1ccccc1,-0.44 565 | CCCCCC(=O)OC,-2.49 566 | C[C@@H](C(F)(F)F)O,-4.16 567 | CCCCCBr,-0.1 568 | CCCCCCC=C,1.92 569 | CC1=CC(=O)[C@@H](CC1)C(C)C,-4.51 570 | CC(C)O,-4.74 571 | CCCCCCN,-3.95 572 | C(CO[N+](=O)[O-])CO[N+](=O)[O-],-4.8 573 | Cc1ccc(c(c1)C)O,-6.01 574 | CCCCCO,-4.57 575 | CCC[C@@H](C)O,-4.39 576 | CCCC[C@@H](C)CC,2.97 577 | C[C@@H](c1ccc(cc1)CC(C)C)C(=O)O,-7 578 | CCOC(=O)C[C@H](C(=O)OCC)SP(=S)(OC)OC,-8.15 579 | Cc1ccc(cc1C)O,-6.5 580 | Cc1cc(ccc1Cl)O,-6.79 581 | CCCC/C=C/C,1.68 582 | CCCOCCC,-1.16 583 | C[C@@H]1CC[C@H]([C@@H](C1)O)C(C)C,-3.2 584 | CCNc1nc(nc(n1)SC)NC(C)(C)C,-6.68 585 | CC(C)CC(C)(C)C,2.89 586 | CCCCC(=O)CCCC,-2.64 587 | CCCCN(CC)C(=O)SCCC,-3.64 588 | CCCCCC=C,1.66 589 | CC(C)OC=O,-2.02 590 | CC(OC(=O)C)OC(=O)C,-4.97 591 | c1c(c(=O)[nH]c(=O)[nH]1)Cl,-17.74 592 | CC(=C)c1ccccc1,-1.24 593 | CCC(C)C,2.38 594 | CCCCO[N+](=O)[O-],-2.09 595 | c1ccc(cc1)Br,-1.46 596 | CC(Cl)(Cl)Cl,-0.19 597 | CC(=C)[C@H]1CCC(=CC1)C=O,-4.09 598 | Cc1ccccc1[N+](=O)[O-],-3.58 599 | CCCCCCCI,0.27 600 | c1cc2ccc3cccc4c3c2c(c1)cc4,-4.52 601 | CCCCCCl,-0.1 602 | CC(C)COC(=O)C,-2.36 603 | CCC(C)(C)C,2.51 604 | c1cc(ccc1N)N(=O)=O,-9.82 605 | COC(=O)CC#N,-6.72 606 | COc1ccc(cc1)N,-7.48 607 | CC(C)Cc1ccccc1,0.16 608 | c1ccc(cc1)c2c(cc(cc2Cl)Cl)Cl,-2.16 609 
| CN,-4.55 610 | c1ccc(c(c1)O)Cl,-4.55 611 | c1ccc2c(c1)C(=O)c3ccc(cc3C2=O)N,-11.53 612 | C(=C\Cl)\Cl,-1.17 613 | CCCCC(=O)C,-3.28 614 | C(CO[N+](=O)[O-])O[N+](=O)[O-],-5.73 615 | c1ccc(c(c1)O)F,-5.29 616 | Cc1c(nc(nc1OC(=O)N(C)C)N(C)C)C,-9.41 617 | C=Cc1ccccc1,-1.24 618 | CCOP(=O)(OCC)OCC,-7.5 619 | C(C(F)(F)F)O,-4.31 620 | CCCCOC[C@H](C)O,-5.73 621 | CCCO,-4.85 622 | Cc1ccccc1C,-0.9 623 | CC(C)(C)C,2.51 624 | CCCC#C,0.01 625 | c1ccc2c(c1)C(=O)NC2=O,-9.61 626 | CCCCI,-0.25 627 | Cc1ccc(cc1)O,-6.13 628 | CC(C)I,-0.46 629 | COc1ccccc1O,-5.94 630 | C1CC=CC1,0.56 631 | C[C@H](C(F)(F)F)O,-4.2 632 | CCCN,-4.39 633 | c1ccc(c(c1)[N+](=O)[O-])O,-4.58 634 | Cc1cccc2c1cccc2,-2.44 635 | c1(c(c(c(c(c1Cl)Cl)Cl)Cl)Cl)Cl,-2.33 636 | CCCCC/C=C/C=O,-3.43 637 | CCCCCCC#C,0.71 638 | CCOP(=S)(OCC)Oc1cc(nc(n1)C(C)C)C,-6.48 639 | CCCCCCCC(=O)OC,-2.04 640 | C1CCNC1,-5.48 641 | c1cc(ccc1C=O)O,-8.83 642 | CCCCCCCCl,0.29 643 | C1COCCO1,-5.06 644 | -------------------------------------------------------------------------------- /data/molecule_dataset_selfies.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HUBioDataLab/SELFormer/65e686feb72185cc95f5b81176444e20586848e1/data/molecule_dataset_selfies.zip -------------------------------------------------------------------------------- /data/molecule_dataset_smiles.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HUBioDataLab/SELFormer/65e686feb72185cc95f5b81176444e20586848e1/data/molecule_dataset_smiles.zip -------------------------------------------------------------------------------- /data/pretraining_hyperparameters.yml: -------------------------------------------------------------------------------- 1 | my_set0: &molbert 2 | HIDDEN_SIZE: 768 3 | TRAIN_BATCH_SIZE: 16 4 | VALID_BATCH_SIZE: 8 5 | TRAIN_EPOCHS: 100 6 | LEARNING_RATE: 0.00005 7 | WEIGHT_DECAY: 0.01 8 | MAX_LEN: 128 9 | VOCAB_SIZE: 800 10 
| MAX_POSITION_EMBEDDINGS: 514 11 | NUM_ATTENTION_HEADS: 12 12 | NUM_HIDDEN_LAYERS: 8 13 | TYPE_VOCAB_SIZE: 1 14 | my_set1: &chemberta 15 | HIDDEN_SIZE: 768 16 | TRAIN_BATCH_SIZE: 16 17 | VALID_BATCH_SIZE: 8 18 | TRAIN_EPOCHS: 100 19 | LEARNING_RATE: 0.00005 20 | WEIGHT_DECAY: 0.01 21 | MAX_LEN: 128 22 | VOCAB_SIZE: 800 23 | MAX_POSITION_EMBEDDINGS: 514 24 | NUM_ATTENTION_HEADS: 6 25 | NUM_HIDDEN_LAYERS: 6 26 | TYPE_VOCAB_SIZE: 1 27 | my_set2: &run0-set30 28 | HIDDEN_SIZE: 768 29 | TRAIN_BATCH_SIZE: 16 30 | VALID_BATCH_SIZE: 8 31 | TRAIN_EPOCHS: 100 32 | LEARNING_RATE: 0.00005 33 | WEIGHT_DECAY: 0.01 34 | MAX_LEN: 128 35 | VOCAB_SIZE: 800 36 | MAX_POSITION_EMBEDDINGS: 514 37 | NUM_ATTENTION_HEADS: 4 38 | NUM_HIDDEN_LAYERS: 12 39 | TYPE_VOCAB_SIZE: 1 40 | -------------------------------------------------------------------------------- /data/requirements.yml: -------------------------------------------------------------------------------- 1 | name: SELFormer_env 2 | channels: 3 | - anaconda 4 | - defaults 5 | - conda-forge 6 | dependencies: 7 | - pytorch=1.13.1 8 | - transformers=4.26.1 9 | - pyyaml=6.0 10 | - yaml=0.2.5 11 | - scikit-learn=1.2.1 12 | - datasets=2.9.0 13 | - chemprop=1.5.2 14 | - tokenizers=0.13.2 15 | - pip=22.3.1 16 | - pip: 17 | - simpletransformers==0.63.9 18 | - pandarallel==1.6.4 19 | - wandb==0.13.10 20 | - selfies==2.1.1 -------------------------------------------------------------------------------- /figures/selformer_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HUBioDataLab/SELFormer/65e686feb72185cc95f5b81176444e20586848e1/figures/selformer_architecture.png -------------------------------------------------------------------------------- /generate_selfies.py: -------------------------------------------------------------------------------- 1 | 2 | import argparse 3 | 4 | parser = argparse.ArgumentParser() 5 | parser.add_argument("--smiles_dataset", 
required=True, metavar="/path/to/dataset/", help="Path of the input SMILES dataset.") 6 | parser.add_argument("--selfies_dataset", required=True, metavar="/path/to/dataset/", help="Path of the output SELFIES dataset.") 7 | args = parser.parse_args() 8 | 9 | import pandas as pd 10 | from prepare_pretraining_data import prepare_data 11 | 12 | prepare_data(path=args.smiles_dataset, save_to=args.selfies_dataset) 13 | chembl_df = pd.read_csv(args.selfies_dataset) 14 | print("SELFIES representation file is ready.") -------------------------------------------------------------------------------- /get_embeddings.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 4 | os.environ["WANDB_DISABLED"] = "true" 5 | os.environ["CUDA_VISIBLE_DEVICES"] = "0" 6 | 7 | import pandas as pd 8 | from pandarallel import pandarallel 9 | 10 | from transformers import RobertaTokenizer, RobertaModel, RobertaConfig 11 | import torch 12 | 13 | df = pd.read_csv("./data/molecule_dataset_selfies.csv") # path of the selfies data 14 | 15 | model_name = "./data/pretrained_models/SELFormer" # path of the pre-trained model 16 | config = RobertaConfig.from_pretrained(model_name) 17 | config.output_hidden_states = True 18 | tokenizer = RobertaTokenizer.from_pretrained("./data/RobertaFastTokenizer") 19 | model = RobertaModel.from_pretrained(model_name, config=config) 20 | 21 | 22 | def get_sequence_embeddings(selfies): 23 | token = torch.tensor([tokenizer.encode(selfies, add_special_tokens=True, max_length=512, padding=True, truncation=True)]) 24 | output = model(token) 25 | 26 | sequence_out = output[0] 27 | return torch.mean(sequence_out[0], dim=0).tolist() 28 | 29 | print("Starting") 30 | df = df[:100000] # how many molecules should be processed 31 | pandarallel.initialize(nb_workers=5) # number of worker processes 32 | df["sequence_embeddings"] = df.selfies.parallel_apply(get_sequence_embeddings) 33 | 34 |
df.drop(columns=["selfies"], inplace=True) # not interested in selfies data anymore, only chembl_id and the embedding 35 | df.to_csv("./data/embeddings.csv", index=False) # save embeddings here 36 | print("Finished") 37 | -------------------------------------------------------------------------------- /get_moleculenet_embeddings.py: -------------------------------------------------------------------------------- 1 | import os 2 | from time import time 3 | from fnmatch import fnmatch 4 | 5 | import pandas as pd 6 | from pandarallel import pandarallel 7 | import to_selfies 8 | import torch 9 | from transformers import RobertaTokenizer, RobertaModel, RobertaConfig 10 | 11 | import argparse 12 | 13 | parser = argparse.ArgumentParser() 14 | parser.add_argument("--dataset_path", required=True, metavar="/path/to/dataset/", help="Path of the input MoleculeNet datasets.") 15 | parser.add_argument("--model_file", required=True, metavar="", type=str, help="Name of the pretrained model.") 16 | 17 | args = parser.parse_args() 18 | 19 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 20 | os.environ["WANDB_DISABLED"] = "true" 21 | os.environ["CUDA_VISIBLE_DEVICES"] = "0" 22 | 23 | model_file = args.model_file # path of the pre-trained model 24 | config = RobertaConfig.from_pretrained(model_file) 25 | config.output_hidden_states = True 26 | tokenizer = RobertaTokenizer.from_pretrained("./data/RobertaFastTokenizer") 27 | model = RobertaModel.from_pretrained(model_file, config=config) 28 | 29 | 30 | def generate_moleculenet_selfies(dataset_file): 31 | """ 32 | Generates SELFIES for a given dataset and saves it to a file. 
33 | :param dataset_file: path to the dataset file 34 | """ 35 | 36 | dataset_name = dataset_file.split("/")[-1].split(".")[0] 37 | 38 | print(f'\nGenerating SELFIES for {dataset_name}') 39 | 40 | if dataset_name == 'bace': 41 | smiles_column = 'mol' 42 | else: 43 | smiles_column = 'smiles' 44 | 45 | # read dataset 46 | dataset_df = pd.read_csv(os.path.join(dataset_file)) 47 | dataset_df["selfies"] = dataset_df[smiles_column] # creating a new column "selfies" that is a copy of smiles_column 48 | 49 | # generate selfies 50 | pandarallel.initialize() 51 | dataset_df.selfies = dataset_df.selfies.parallel_apply(to_selfies.to_selfies) 52 | 53 | dataset_df.drop(dataset_df[dataset_df[smiles_column] == dataset_df.selfies].index, inplace=True) 54 | dataset_df.drop(columns=[smiles_column], inplace=True) 55 | out_name = dataset_name + "_selfies.csv" 56 | 57 | # save selfies to file 58 | path = os.path.dirname(dataset_file) 59 | 60 | dataset_df.to_csv(os.path.join(path, out_name), index=False) 61 | print(f'Saved to {os.path.join(path, out_name)}') 62 | 63 | 64 | def get_sequence_embeddings(selfies, tokenizer, model): 65 | 66 | torch.set_num_threads(1) 67 | token = torch.tensor([tokenizer.encode(selfies, add_special_tokens=True, max_length=512, padding=True, truncation=True)]) 68 | output = model(token) 69 | 70 | sequence_out = output[0] 71 | return torch.mean(sequence_out[0], dim=0).tolist() 72 | 73 | 74 | def generate_embeddings(model_file, args): 75 | 76 | root = args.dataset_path 77 | model_name = model_file.split("/")[-1] 78 | 79 | prepare_data_pattern = "*.csv" 80 | 81 | print(f"\nGenerating embeddings using pre-trained model {model_name}") 82 | for path, subdirs, files in os.walk(root): 83 | for name in files: 84 | if fnmatch(name, prepare_data_pattern) and not any(substring in name for substring in ['selfies', 'embeddings', 'results']): 85 | dataset_file = os.path.join(path, name) 86 | generate_moleculenet_selfies(dataset_file) 87 | 88 | selfies_file = 
os.path.join(path, name.split(".")[0] + "_selfies.csv") 89 | 90 | dataset_name = selfies_file.split("/")[-1].split(".")[0].split('_selfies')[0] 91 | print(f'\nGenerating embeddings for {dataset_name}') 92 | t0 = time() 93 | 94 | dataset_df = pd.read_csv(selfies_file) 95 | pandarallel.initialize(nb_workers=10, progress_bar=True) # number of threads 96 | dataset_df["sequence_embeddings"] = dataset_df.selfies.parallel_apply(get_sequence_embeddings, args=(tokenizer, model)) 97 | 98 | dataset_df.drop(columns=["selfies"], inplace=True) # not interested in selfies data anymore, only class and the embedding 99 | file_name = f"{dataset_name}_{model_name}_embeddings.csv" 100 | 101 | # save embeddings to file 102 | path = os.path.dirname(selfies_file) 103 | dataset_df.to_csv(os.path.join(path, file_name), index=False) 104 | t1 = time() 105 | 106 | print(f'Finished in {round((t1-t0) / 60, 2)} mins') 107 | print(f'Saved to {os.path.join(path, file_name)}\n') 108 | 109 | generate_embeddings(model_file, args) -------------------------------------------------------------------------------- /multilabel_class_pred.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import pandas as pd 4 | import torch 5 | from simpletransformers.classification import MultiLabelClassificationModel 6 | from prepare_finetuning_data import smiles_to_selfies 7 | import argparse 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument("--task", default="sider", help="task selection.") 11 | parser.add_argument("--pred_set", default="data/finetuning_datasets/classification/sider/sider_mock.csv", metavar="/path/to/dataset/", help="Test set for predictions.") 12 | parser.add_argument("--training_args", default= "data/finetuned_models/SELFormer_sider_scaffold_optimized/training_args.bin", metavar="/path/to/dataset/", help="Trained model arguments.") 13 | parser.add_argument("--model_name", 
default="data/finetuned_models/SELFormer_sider_scaffold_optimized", metavar="/path/to/dataset/", help="Path to the model.") 14 | parser.add_argument("--num_labels", default=27, type=int, help="Number of labels.") 15 | args = parser.parse_args() 16 | 17 | print("Loading test set...") 18 | pred_set = pd.read_csv(args.pred_set) 19 | pred_df_selfies = smiles_to_selfies(pred_set) 20 | 21 | print("Loading model...") 22 | training_args = torch.load(args.training_args) 23 | num_labels = args.num_labels 24 | model = MultiLabelClassificationModel("roberta", args.model_name, num_labels=num_labels, use_cuda=True, args=training_args) # pass the loaded arguments, not the path string 25 | 26 | print("Predicting...") 27 | preds, _ = model.predict(pred_df_selfies["selfies"].tolist()) 28 | 29 | # create a dataframe with the selfies and the predictions, each in a separate column named feature_0, feature_1, etc. 30 | res = pd.DataFrame(preds, columns=["feature_{}".format(i) for i in range(num_labels)]) 31 | res.insert(0, "selfies", pred_df_selfies["selfies"].tolist()) 32 | 33 | if not os.path.exists("data/predictions"): 34 | os.makedirs("data/predictions") 35 | 36 | res.to_csv("data/predictions/{}_predictions.csv".format(args.task), index=False) 37 | print("Predictions saved to data/predictions/{}_predictions.csv".format(args.task)) -------------------------------------------------------------------------------- /prepare_finetuning_data.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import chemprop 3 | 4 | from pandarallel import pandarallel 5 | import to_selfies 6 | 7 | 8 | def smiles_to_selfies(df): 9 | df.insert(0, "selfies", df["smiles"]) 10 | pandarallel.initialize() 11 | df.selfies = df.selfies.parallel_apply(to_selfies.to_selfies) 12 | 13 | df.drop(df[df.smiles == df.selfies].index, inplace=True) 14 | df.drop(columns=["smiles"], inplace=True) 15 | 16 | return df 17 | 18 | 19 | def train_val_test_split_multilabel(path, scaffold_split): 20 | main_df =
pd.read_csv(path) 21 | main_df = main_df.sample(frac=1, random_state=42).reset_index(drop=True) # shuffling; sample() returns a new frame, so the result must be kept 22 | main_df.rename(columns={main_df.columns[0]: "smiles"}, inplace=True) 23 | main_df.fillna(0, inplace=True) 24 | main_df.reset_index(drop=True, inplace=True) 25 | 26 | if scaffold_split: 27 | molecule_list = [] 28 | for _, row in main_df.iterrows(): 29 | molecule_list.append(chemprop.data.data.MoleculeDatapoint(smiles=[row["smiles"]], targets=row[1:].values)) 30 | molecule_dataset = chemprop.data.data.MoleculeDataset(molecule_list) 31 | (train, val, test) = chemprop.data.scaffold.scaffold_split(data=molecule_dataset, sizes=(0.8, 0.1, 0.1), seed=42, balanced=True) 32 | return (train, val, test) 33 | 34 | else: # random split 35 | from sklearn.model_selection import train_test_split 36 | 37 | train, val = train_test_split(main_df, test_size=0.2, random_state=42) 38 | val, test = train_test_split(val, test_size=0.5, random_state=42) 39 | return (train, val, test) 40 | 41 | 42 | def train_val_test_split(path, target_column_number=1, scaffold_split=False): 43 | main_df = pd.read_csv(path) 44 | main_df = main_df.sample(frac=1, random_state=42).reset_index(drop=True) # shuffling; sample() returns a new frame, so the result must be kept 45 | main_df.rename(columns={main_df.columns[0]: "smiles", main_df.columns[target_column_number]: "target"}, inplace=True) 46 | main_df = main_df[["smiles", "target"]] 47 | # main_df.dropna(subset=["target"], inplace=True) 48 | main_df.fillna(0, inplace=True) 49 | main_df.reset_index(drop=True, inplace=True) 50 | 51 | if scaffold_split: 52 | molecule_list = [] 53 | for _, row in main_df.iterrows(): 54 | molecule_list.append(chemprop.data.data.MoleculeDatapoint(smiles=[row["smiles"]], targets=row[1:].values)) 55 | molecule_dataset = chemprop.data.data.MoleculeDataset(molecule_list) 56 | (train, val, test) = chemprop.data.scaffold.scaffold_split(data=molecule_dataset, sizes=(0.8, 0.1, 0.1), seed=42, balanced=True) 57 | return (train, val, test) 58 | 59 | else: # random split 60 | from sklearn.model_selection import train_test_split 61 | 62 |
train, val = train_test_split(main_df, test_size=0.2, random_state=42) 63 | val, test = train_test_split(val, test_size=0.5, random_state=42) 64 | return (train, val, test) 65 | -------------------------------------------------------------------------------- /prepare_pretraining_data.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from pandarallel import pandarallel 3 | 4 | import to_selfies 5 | 6 | 7 | def prepare_data(path="data/molecule_dataset_smiles.txt", save_to="./data/molecule_dataset_selfies.csv"): 8 | chembl_df = pd.read_csv(path, sep="\t") # data is TAB separated. 9 | # chembl_df.drop(columns=["standard_inchi", "standard_inchi_key"], inplace=True) # we are not interested in "standard_inchi" and "standard_inchi_key" columns. 10 | chembl_df["selfies"] = chembl_df["canonical_smiles"] # creating a new column "selfies" that is a copy of "canonical_smiles" 11 | 12 | pandarallel.initialize() 13 | chembl_df.selfies = chembl_df.selfies.parallel_apply(to_selfies.to_selfies) 14 | 15 | chembl_df.drop(chembl_df[chembl_df.canonical_smiles == chembl_df.selfies].index, inplace=True) 16 | chembl_df.drop(columns=["canonical_smiles"], inplace=True) 17 | chembl_df.to_csv(save_to, index=False) 18 | 19 | 20 | def create_selfies_file(selfies_df, save_to="./data/selfies_subset.txt", subset_size=100000, do_subset=True): 21 | selfies_df = selfies_df.sample(frac=1, random_state=42).reset_index(drop=True) # shuffling; sample() returns a new frame, so the result must be kept 22 | 23 | if do_subset: 24 | selfies_subset = selfies_df.selfies[:subset_size] 25 | else: 26 | selfies_subset = selfies_df.selfies 27 | selfies_subset = selfies_subset.to_frame() 28 | selfies_subset["selfies"].to_csv(save_to, index=False, header=False) 29 | -------------------------------------------------------------------------------- /produce_embeddings.py: -------------------------------------------------------------------------------- 1 | 2 | import argparse 3 | 4 | parser = argparse.ArgumentParser() 5 |
parser.add_argument("--selfies_dataset", required=True, metavar="/path/to/dataset/", help="Path of the input SELFIES dataset.") 6 | parser.add_argument("--model_file", required=True, metavar="/path/to/dataset/", help="Path of the pretrained model file.") 7 | parser.add_argument("--embed_file", required=True, metavar="/path/to/dataset/", help="Path of the output embeddings file.") 8 | args = parser.parse_args() 9 | 10 | import os 11 | 12 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 13 | os.environ["WANDB_DISABLED"] = "true" 14 | os.environ["CUDA_VISIBLE_DEVICES"] = "0" 15 | 16 | import pandas as pd 17 | from pandarallel import pandarallel 18 | 19 | from transformers import RobertaTokenizer, RobertaModel, RobertaConfig 20 | import torch 21 | 22 | df = pd.read_csv(args.selfies_dataset) # path of the selfies data 23 | 24 | model_name = args.model_file # path of the pre-trained model 25 | config = RobertaConfig.from_pretrained(model_name) 26 | config.output_hidden_states = True 27 | tokenizer = RobertaTokenizer.from_pretrained("./data/RobertaFastTokenizer") 28 | model = RobertaModel.from_pretrained(model_name, config=config) 29 | 30 | 31 | def get_sequence_embeddings(selfies): 32 | token = torch.tensor([tokenizer.encode(selfies, add_special_tokens=True, max_length=512, padding=True, truncation=True)]) 33 | output = model(token) 34 | 35 | sequence_out = output[0] 36 | return torch.mean(sequence_out[0], dim=0).tolist() 37 | 38 | print("Starting") 39 | # df = df[:100000] # how many molecules should be processed 40 | pandarallel.initialize(nb_workers=5, progress_bar=True) # number of worker processes 41 | df["sequence_embeddings"] = df.selfies.parallel_apply(get_sequence_embeddings) 42 | 43 | df.drop(columns=["selfies"], inplace=True) # not interested in selfies data anymore, only chembl_id and the embedding 44 | df.to_csv(args.embed_file, index=False) # save embeddings here 45 | print("Finished") 46 | 47 | print("Molecule embeddings are ready.") 48 |
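The script above stores each embedding as a Python list inside a single CSV cell, so the `sequence_embeddings` column comes back as text when the file is reloaded. Below is a minimal sketch of parsing it back with the standard library only. It assumes the `chembl_id` and `sequence_embeddings` column names mentioned in the comments above; the `load_embeddings` helper and the two sample rows are hypothetical and only stand in for a real embeddings file.

```python
import ast
import csv
import io


def load_embeddings(handle):
    """Map each chembl_id to its embedding, parsed from the stringified list."""
    embeddings = {}
    for row in csv.DictReader(handle):
        # A list cell is written as its text form, e.g. "[0.1, -0.2]";
        # ast.literal_eval turns that text back into a list of floats.
        embeddings[row["chembl_id"]] = ast.literal_eval(row["sequence_embeddings"])
    return embeddings


# Hypothetical two-row file standing in for the real embeddings CSV.
sample = io.StringIO(
    'chembl_id,sequence_embeddings\n'
    'CHEMBL1,"[0.1, -0.2, 0.3]"\n'
    'CHEMBL2,"[1.0, 2.0, 3.0]"\n'
)
vectors = load_embeddings(sample)
print(len(vectors["CHEMBL1"]))
```

`ast.literal_eval` is used rather than `eval` because it only accepts Python literals, which is safer when the CSV comes from elsewhere.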
-------------------------------------------------------------------------------- /regression_pred.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import pandas as pd 4 | import torch 5 | from torch.nn import MSELoss 6 | from torch.utils.data import Dataset 7 | from transformers import BertPreTrainedModel, RobertaConfig, RobertaTokenizerFast 8 | from transformers.models.roberta.modeling_roberta import ( 9 | RobertaClassificationHead, 10 | RobertaConfig, 11 | RobertaModel, 12 | ) 13 | from transformers import Trainer, TrainingArguments 14 | from prepare_finetuning_data import smiles_to_selfies 15 | import argparse 16 | 17 | parser = argparse.ArgumentParser() 18 | parser.add_argument("--task", default="esol", help="task selection.") 19 | parser.add_argument("--tokenizer_name", default="data/RobertaFastTokenizer", metavar="/path/to/dataset/", help="Tokenizer selection.") 20 | parser.add_argument("--pred_set", default='data/finetuning_datasets/regression/esol/esol_mock.csv', metavar="/path/to/dataset/", help="Test set for predictions.") 21 | parser.add_argument("--training_args", default= "data/finetuned_models/esol_regression/training_args.bin", metavar="/path/to/dataset/", help="Trained model arguments.") 22 | parser.add_argument("--model_name", default='data/finetuned_models/esol_regression', metavar="/path/to/dataset/", help="Path to the model.") 23 | args = parser.parse_args() 24 | 25 | class SELFIESTransformers_For_Regression(BertPreTrainedModel): 26 | def __init__(self, config): 27 | super(SELFIESTransformers_For_Regression, self).__init__(config) 28 | self.num_labels = config.num_labels 29 | self.roberta = RobertaModel(config) 30 | self.classifier = RobertaClassificationHead(config) 31 | 32 | def forward(self, input_ids, attention_mask, labels=None): 33 | outputs = self.roberta(input_ids, attention_mask=attention_mask) 34 | sequence_output = outputs[0] 35 | logits = 
self.classifier(sequence_output) 36 | 37 | outputs = (logits,) + outputs[2:] 38 | 39 | if labels is not None: 40 | if self.num_labels == 1: # regression 41 | loss_fct = MSELoss() 42 | loss = loss_fct(logits.squeeze(), labels.squeeze()) 43 | outputs = (loss,) + outputs 44 | return outputs # (loss), logits, (hidden_states), (attentions) 45 | 46 | model_class = SELFIESTransformers_For_Regression 47 | 48 | model_name = args.model_name 49 | tokenizer_name = args.tokenizer_name 50 | num_labels = 1 51 | config_class = RobertaConfig 52 | config = config_class.from_pretrained(model_name, num_labels=num_labels) 53 | 54 | model_class = SELFIESTransformers_For_Regression 55 | model = model_class.from_pretrained(model_name, config=config) 56 | 57 | tokenizer_class = RobertaTokenizerFast 58 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False) 59 | 60 | class SELFIESTransfomers_Dataset(Dataset): 61 | def __init__(self, data, tokenizer, MAX_LEN): 62 | text = data 63 | self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt") 64 | 65 | 66 | def __len__(self): 67 | return len(self.examples["input_ids"]) 68 | 69 | def __getitem__(self, index): 70 | item = {key: self.examples[key][index] for key in self.examples} 71 | 72 | return item 73 | 74 | pred_set = pd.read_csv(args.pred_set) 75 | pred_df_selfies = smiles_to_selfies(pred_set) 76 | 77 | MAX_LEN = 128 78 | 79 | pred_examples = (pred_df_selfies.iloc[:, 0].astype(str).tolist()) 80 | pred_dataset = SELFIESTransfomers_Dataset(pred_examples, tokenizer, MAX_LEN) 81 | 82 | training_args = torch.load(args.training_args) 83 | 84 | trainer = Trainer(model=model, args=training_args) # the instantiated 🤗 Transformers model to be trained # training arguments, defined above # training dataset # evaluation dataset 85 | 86 | raw_pred, label_ids, metrics = trainer.predict(pred_dataset) 87 | y_pred = [i[0] for i in raw_pred] 88 | 89 | res = 
pd.concat([pred_df_selfies, pd.DataFrame(y_pred, columns=["prediction"])], axis = 1) 90 | 91 | if not os.path.exists("data/predictions"): 92 | os.makedirs("data/predictions") 93 | 94 | res.to_csv("data/predictions/{}_predictions.csv".format(args.task), index=False) 95 | -------------------------------------------------------------------------------- /roberta_model.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.utils.data.dataset import Dataset 3 | 4 | import os 5 | 6 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 7 | os.environ["WANDB_DISABLED"] = "true" 8 | 9 | 10 | class CustomDataset(Dataset): 11 | def __init__(self, df, tokenizer, MAX_LEN): 12 | self.examples = [] 13 | 14 | for example in df.values: 15 | x = tokenizer.encode_plus(example, max_length=MAX_LEN, truncation=True, padding="max_length") 16 | self.examples += [x.input_ids] 17 | 18 | def __len__(self): 19 | return len(self.examples) 20 | 21 | def __getitem__(self, i): 22 | return torch.tensor(self.examples[i]) 23 | 24 | 25 | import pandas as pd 26 | from sklearn.model_selection import train_test_split 27 | 28 | from transformers import RobertaConfig 29 | from transformers import RobertaForMaskedLM 30 | from transformers import RobertaTokenizerFast 31 | 32 | from transformers import DataCollatorForLanguageModeling 33 | from transformers import Trainer, TrainingArguments 34 | 35 | import math 36 | 37 | 38 | def train_and_save_roberta_model(hyperparameters_dict, selfies_path="./data/selfies_subset.txt", robertatokenizer_path="./data/robertatokenizer/", save_to="./saved_model/"): 39 | TRAIN_BATCH_SIZE = hyperparameters_dict["TRAIN_BATCH_SIZE"] 40 | VALID_BATCH_SIZE = hyperparameters_dict["VALID_BATCH_SIZE"] 41 | TRAIN_EPOCHS = hyperparameters_dict["TRAIN_EPOCHS"] 42 | LEARNING_RATE = hyperparameters_dict["LEARNING_RATE"] 43 | WEIGHT_DECAY = hyperparameters_dict["WEIGHT_DECAY"] 44 | MAX_LEN = hyperparameters_dict["MAX_LEN"] 45 | 46 | 
config = RobertaConfig(vocab_size=hyperparameters_dict["VOCAB_SIZE"], max_position_embeddings=hyperparameters_dict["MAX_POSITION_EMBEDDINGS"], num_attention_heads=hyperparameters_dict["NUM_ATTENTION_HEADS"], num_hidden_layers=hyperparameters_dict["NUM_HIDDEN_LAYERS"], type_vocab_size=hyperparameters_dict["TYPE_VOCAB_SIZE"], hidden_size=hyperparameters_dict["HIDDEN_SIZE"]) 47 | 48 | # model = RobertaForMaskedLM(config=config) 49 | def _model_init(): 50 | return RobertaForMaskedLM(config=config) 51 | 52 | df = pd.read_csv(selfies_path, header=None) 53 | 54 | tokenizer = RobertaTokenizerFast.from_pretrained(robertatokenizer_path) 55 | 56 | train_df, eval_df = train_test_split(df, test_size=0.2, random_state=42) 57 | train_dataset = CustomDataset(train_df[0], tokenizer, MAX_LEN) # column name is 0. 58 | eval_dataset = CustomDataset(eval_df[0], tokenizer, MAX_LEN) 59 | 60 | data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15) 61 | 62 | training_args = TrainingArguments( 63 | output_dir=save_to, 64 | overwrite_output_dir=True, 65 | evaluation_strategy="epoch", 66 | save_strategy="epoch", 67 | num_train_epochs=TRAIN_EPOCHS, 68 | learning_rate=LEARNING_RATE, 69 | weight_decay=WEIGHT_DECAY, 70 | per_device_train_batch_size=TRAIN_BATCH_SIZE, 71 | per_device_eval_batch_size=VALID_BATCH_SIZE, 72 | save_total_limit=1, 73 | disable_tqdm=True, 74 | # fp16=True 75 | ) 76 | 77 | trainer = Trainer( 78 | model_init=_model_init, 79 | args=training_args, 80 | data_collator=data_collator, 81 | train_dataset=train_dataset, 82 | eval_dataset=eval_dataset, 83 | # prediction_loss_only=True, 84 | ) 85 | 86 | print("build trainer with on device:", training_args.device, "with n gpus:", training_args.n_gpu) 87 | trainer.train() 88 | print("training finished.") 89 | 90 | eval_results = trainer.evaluate() 91 | print(f">>> Perplexity: {math.exp(eval_results['eval_loss']):.2f}") 92 | 93 | trainer.save_model(save_to) 94 | 
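The perplexity printed at the end of training is the exponential of the mean evaluation cross-entropy loss, measured in nats. A small sketch of that conversion in isolation may help when comparing runs; `perplexity_from_loss` is a hypothetical helper, and the loss values are made up for illustration rather than taken from any SELFormer run.

```python
import math


def perplexity_from_loss(eval_loss):
    """Convert a mean cross-entropy loss (natural log) into perplexity."""
    return math.exp(eval_loss)


# A loss of 0 means every masked token was predicted with probability 1.
print(perplexity_from_loss(0.0))
# A loss of ln(4) corresponds to guessing uniformly among four tokens.
print(round(perplexity_from_loss(math.log(4)), 6))
```

Read this way, a perplexity of k roughly means the model is as uncertain as if it were choosing uniformly among k tokens at each masked position.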
-------------------------------------------------------------------------------- /roberta_tokenizer.py: -------------------------------------------------------------------------------- 1 | from transformers import RobertaTokenizerFast 2 | 3 | 4 | def save_roberta_tokenizer(path="./data/bpe/", save_to="./data/robertatokenizer/"): 5 | tokenizer = RobertaTokenizerFast.from_pretrained(path) 6 | print("Loaded BPE Tokenizer from: " + path) 7 | 8 | tokenizer.save_pretrained(save_to) 9 | print("Saved RobertaTokenizerFast to: " + save_to) 10 | -------------------------------------------------------------------------------- /to_selfies.py: -------------------------------------------------------------------------------- 1 | import selfies as sf 2 | 3 | 4 | def to_selfies(smiles): # Returns the SELFIES representation of a SMILES string; if encoding fails, returns the SMILES string unchanged. 5 | try: 6 | return sf.encoder(smiles) 7 | except sf.EncoderError: 8 | print("EncoderError in to_selfies()") 9 | return smiles 10 | -------------------------------------------------------------------------------- /train_classification_model.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 4 | os.environ["WANDB_DISABLED"] = "true" 5 | 6 | import numpy as np 7 | import pandas as pd 8 | 9 | import torch 10 | from torch.nn import CrossEntropyLoss 11 | from torch.utils.data import Dataset 12 | 13 | from transformers import BertPreTrainedModel, RobertaConfig, RobertaTokenizerFast 14 | 15 | from transformers.models.roberta.modeling_roberta import ( 16 | RobertaClassificationHead, 17 | RobertaConfig, 18 | RobertaModel, 19 | ) 20 | 21 | 22 | # Parse command line arguments 23 | import argparse 24 | 25 | parser = argparse.ArgumentParser() 26 | parser.add_argument("--model", required=True, metavar="/path/to/model", help="Directory of the pre-trained model") 27 | parser.add_argument("--tokenizer",
required=True, metavar="/path/to/tokenizer/", help="Directory of the RobertaFastTokenizer") 28 | parser.add_argument("--dataset", required=True, metavar="/path/to/dataset/", help="Path of the fine-tuning dataset") 29 | parser.add_argument("--save_to", required=True, metavar="/path/to/save/to/", help="Directory to save the model") 30 | parser.add_argument("--target_column_id", required=False, default="1", metavar="", type=int, help="Column's ID in the dataframe") 31 | parser.add_argument( 32 | "--use_scaffold", required=False, metavar="", type=int, default=0, help="Split to use. 0 for random, 1 for scaffold. Default: 0", 33 | ) 34 | parser.add_argument("--train_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for training. Default: 8") 35 | parser.add_argument("--validation_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for validation. Default: 8") 36 | parser.add_argument("--num_epochs", required=False, metavar="", type=int, default=50, help="Number of epochs. Default: 50") 37 | parser.add_argument("--lr", required=False, metavar="", type=float, default=1e-5, help="Learning rate. Default: 1e-5") 38 | parser.add_argument("--wd", required=False, metavar="", type=float, default=0.1, help="Weight decay. 
Default: 0.1") 39 | args = parser.parse_args() 40 | 41 | 42 | # Model 43 | class SELFIESTransformers_For_Classification(BertPreTrainedModel): 44 | def __init__(self, config): 45 | super(SELFIESTransformers_For_Classification, self).__init__(config) 46 | self.num_labels = config.num_labels 47 | self.roberta = RobertaModel(config) 48 | self.classifier = RobertaClassificationHead(config) 49 | 50 | def forward(self, input_ids, attention_mask, labels): 51 | outputs = self.roberta(input_ids, attention_mask=attention_mask) 52 | sequence_output = outputs[0] 53 | logits = self.classifier(sequence_output) 54 | 55 | outputs = (logits,) + outputs[2:] 56 | 57 | if labels is not None: 58 | loss_fct = CrossEntropyLoss() 59 | loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) 60 | outputs = (loss,) + outputs 61 | return outputs # (loss), logits, (hidden_states), (attentions) 62 | 63 | 64 | model_name = args.model 65 | tokenizer_name = args.tokenizer 66 | 67 | 68 | # Configs 69 | num_labels = 2 70 | config_class = RobertaConfig 71 | config = config_class.from_pretrained(model_name, num_labels=num_labels) 72 | 73 | model_class = SELFIESTransformers_For_Classification 74 | model = model_class.from_pretrained(model_name, config=config) 75 | 76 | tokenizer_class = RobertaTokenizerFast 77 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False) 78 | 79 | 80 | # Prepare and Get Data 81 | class SELFIESTransfomers_Dataset(Dataset): 82 | def __init__(self, data, tokenizer, MAX_LEN): 83 | text, labels = data 84 | self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt") 85 | self.labels = torch.tensor(labels, dtype=torch.long) 86 | 87 | def __len__(self): 88 | return len(self.examples["input_ids"]) 89 | 90 | def __getitem__(self, index): 91 | item = {key: self.examples[key][index] for key in self.examples} 92 | item["label"] = self.labels[index] 93 | return item 94 | 95 | 96 | 
DATASET_PATH = args.dataset 97 | from prepare_finetuning_data import smiles_to_selfies 98 | from prepare_finetuning_data import train_val_test_split 99 | 100 | if args.use_scaffold == 0: # random split 101 | print("Using random split") 102 | (train_df, validation_df, test_df) = train_val_test_split(DATASET_PATH, args.target_column_id, scaffold_split=False) 103 | else: # scaffold split 104 | print("Using scaffold split") 105 | (train, val, test) = train_val_test_split(DATASET_PATH, args.target_column_id, scaffold_split=True) 106 | 107 | train_smiles = [item[0] for item in train.smiles()] 108 | validation_smiles = [item[0] for item in val.smiles()] 109 | test_smiles = [item[0] for item in test.smiles()] 110 | 111 | train_df = pd.DataFrame(np.column_stack([train_smiles, train.targets()]), columns=["smiles", "target"]) 112 | validation_df = pd.DataFrame(np.column_stack([validation_smiles, val.targets()]), columns=["smiles", "target"]) 113 | test_df = pd.DataFrame(np.column_stack([test_smiles, test.targets()]), columns=["smiles", "target"]) 114 | 115 | train_df = smiles_to_selfies(train_df) 116 | validation_df = smiles_to_selfies(validation_df) 117 | test_df = smiles_to_selfies(test_df) 118 | test_y = pd.DataFrame(test_df.target, columns=["target"]) 119 | 120 | MAX_LEN = 128 121 | train_examples = (train_df.iloc[:, 0].astype(str).tolist(), train_df.iloc[:, 1].tolist()) 122 | train_dataset = SELFIESTransfomers_Dataset(train_examples, tokenizer, MAX_LEN) 123 | 124 | validation_examples = (validation_df.iloc[:, 0].astype(str).tolist(), validation_df.iloc[:, 1].tolist()) 125 | validation_dataset = SELFIESTransfomers_Dataset(validation_examples, tokenizer, MAX_LEN) 126 | 127 | test_examples = (test_df.iloc[:, 0].astype(str).tolist(), test_df.iloc[:, 1].tolist()) 128 | test_dataset = SELFIESTransfomers_Dataset(test_examples, tokenizer, MAX_LEN) 129 | 130 | 131 | from sklearn.metrics import roc_auc_score 132 | from sklearn.metrics import precision_recall_curve 133 | from 
sklearn.metrics import auc 134 | from datasets import load_metric 135 | 136 | acc = load_metric("accuracy") 137 | precision = load_metric("precision") 138 | recall = load_metric("recall") 139 | f1 = load_metric("f1") 140 | 141 | 142 | def compute_metrics(eval_pred): 143 | predictions, labels = eval_pred 144 | predictions = np.argmax(predictions, axis=1) 145 | 146 | acc_result = acc.compute(predictions=predictions, references=labels) 147 | precision_result = precision.compute(predictions=predictions, references=labels) 148 | recall_result = recall.compute(predictions=predictions, references=labels) 149 | f1_result = f1.compute(predictions=predictions, references=labels) 150 | roc_auc_result = {"roc-auc": roc_auc_score(y_true=labels, y_score=predictions)} 151 | precision_from_curve, recall_from_curve, thresholds_from_curve = precision_recall_curve(labels, predictions) 152 | prc_auc_result = {"prc-auc": auc(recall_from_curve, precision_from_curve)} 153 | 154 | result = {**acc_result, **precision_result, **recall_result, **f1_result, **roc_auc_result, **prc_auc_result} 155 | return result 156 | 157 | 158 | # Train and Evaluate 159 | from transformers import TrainingArguments, Trainer 160 | 161 | TRAIN_BATCH_SIZE = args.train_batch_size 162 | VALID_BATCH_SIZE = args.validation_batch_size 163 | TRAIN_EPOCHS = args.num_epochs 164 | LEARNING_RATE = args.lr 165 | WEIGHT_DECAY = args.wd 166 | MAX_LEN = MAX_LEN 167 | 168 | training_args = TrainingArguments( 169 | output_dir=args.save_to, 170 | overwrite_output_dir=True, 171 | evaluation_strategy="epoch", 172 | save_strategy="epoch", 173 | num_train_epochs=TRAIN_EPOCHS, 174 | learning_rate=LEARNING_RATE, 175 | weight_decay=WEIGHT_DECAY, 176 | per_device_train_batch_size=TRAIN_BATCH_SIZE, 177 | per_device_eval_batch_size=VALID_BATCH_SIZE, 178 | disable_tqdm=True, 179 | # load_best_model_at_end=True, 180 | # metric_for_best_model="roc-auc", 181 | # greater_is_better=True, 182 | save_total_limit=1, 183 | ) 184 | 185 | trainer = 
Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=validation_dataset, compute_metrics=compute_metrics) # model, training arguments, training dataset, evaluation dataset, and metric function 186 | 187 | metrics = trainer.train() 188 | print("Metrics") 189 | print(metrics) 190 | trainer.save_model(args.save_to) 191 | 192 | # Testing 193 | # Make prediction 194 | raw_pred, label_ids, metrics = trainer.predict(test_dataset) 195 | 196 | # Preprocess raw predictions 197 | y_pred = np.argmax(raw_pred, axis=1) 198 | 199 | # ROC-AUC 200 | roc_auc_score_result = roc_auc_score(y_true=test_y, y_score=y_pred) 201 | # PRC-AUC 202 | precision_from_curve, recall_from_curve, thresholds_from_curve = precision_recall_curve(test_y, y_pred) 203 | auc_score_result = auc(recall_from_curve, precision_from_curve) 204 | 205 | print("\nROC-AUC: ", roc_auc_score_result, "\nPRC-AUC: ", auc_score_result) 206 | -------------------------------------------------------------------------------- /train_classification_multilabel_model.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 4 | os.environ["WANDB_DISABLED"] = "true" 5 | 6 | from simpletransformers.classification import MultiLabelClassificationModel 7 | 8 | import pandas as pd 9 | import numpy as np 10 | 11 | import argparse 12 | 13 | parser = argparse.ArgumentParser() 14 | parser.add_argument("--model", required=True, metavar="/path/to/model", help="Path to model") 15 | parser.add_argument("--dataset", required=True, metavar="/path/to/dataset/", help="Path of the fine-tuning dataset (.csv)") 16 | parser.add_argument("--save_to", required=True, metavar="/path/to/save/to/", help="Directory to save the model") 17 | parser.add_argument("--use_scaffold", required=False, metavar="", type=int, default=0, help="Split to use. 0 for random, 1 for scaffold. 
Default: 0") 18 | parser.add_argument("--num_epochs", required=False, metavar="", type=int, default=50, help="Number of epochs. Default: 50") 19 | parser.add_argument("--lr", required=False, metavar="", type=float, default=1e-5, help="Learning rate. Default: 1e-5") 20 | parser.add_argument("--wd", required=False, metavar="", type=float, default=0.1, help="Weight decay. Default: 0.1") 21 | parser.add_argument("--batch_size", required=False, metavar="", type=int, default=8, help="Batch size. Default: 8") 22 | args = parser.parse_args() 23 | 24 | 25 | num_labels = len(pd.read_csv(args.dataset).columns) - 1 26 | model_args = { 27 | "num_train_epochs": args.num_epochs, 28 | "learning_rate": args.lr, 29 | "weight_decay": args.wd, 30 | "train_batch_size": args.batch_size, 31 | "output_dir": args.save_to, 32 | } 33 | 34 | model = MultiLabelClassificationModel("roberta", args.model, num_labels=num_labels, use_cuda=True, args=model_args) 35 | 36 | from prepare_finetuning_data import train_val_test_split_multilabel 37 | 38 | if args.use_scaffold == 0: # random split 39 | print("Using random split") 40 | (train_df, eval_df, test_df) = train_val_test_split_multilabel(args.dataset, scaffold_split=False) 41 | 42 | train_df.columns = ["smiles"] + ["Feature_" + str(i) for i in range(num_labels)] 43 | eval_df.columns = ["smiles"] + ["Feature_" + str(i) for i in range(num_labels)] 44 | test_df.columns = ["smiles"] + ["Feature_" + str(i) for i in range(num_labels)] 45 | else: # scaffold split 46 | print("Using scaffold split") 47 | (train, val, test) = train_val_test_split_multilabel(args.dataset, scaffold_split=True) 48 | 49 | train_smiles = [item[0] for item in train.smiles()] 50 | validation_smiles = [item[0] for item in val.smiles()] 51 | test_smiles = [item[0] for item in test.smiles()] 52 | 53 | train_df = pd.DataFrame(np.column_stack([train_smiles, train.targets()]), columns=["smiles"] + ["Feature_" + str(i) for i in range(len(train.targets()[0]))]) 54 | eval_df = 
pd.DataFrame(np.column_stack([validation_smiles, val.targets()]), columns=["smiles"] + ["Feature_" + str(i) for i in range(len(val.targets()[0]))]) 55 | test_df = pd.DataFrame(np.column_stack([test_smiles, test.targets()]), columns=["smiles"] + ["Feature_" + str(i) for i in range(len(test.targets()[0]))]) 56 | 57 | from prepare_finetuning_data import smiles_to_selfies 58 | 59 | train_df = smiles_to_selfies(train_df) 60 | eval_df = smiles_to_selfies(eval_df) 61 | test_df = smiles_to_selfies(test_df) 62 | 63 | train_df.insert(1, "labels", np.array([train_df["Feature_" + str(i)].to_numpy() for i in range(len(train_df.columns[1:]))], dtype=np.float32).T.tolist()) 64 | eval_df.insert(1, "labels", np.array([eval_df["Feature_" + str(i)].to_numpy() for i in range(len(eval_df.columns[1:]))], dtype=np.float32).T.tolist()) 65 | test_df.insert(1, "labels", np.array([test_df["Feature_" + str(i)].to_numpy() for i in range(len(test_df.columns[1:]))], dtype=np.float32).T.tolist()) 66 | 67 | from sklearn.metrics import roc_auc_score 68 | from sklearn.metrics import precision_recall_curve 69 | from sklearn.metrics import auc 70 | 71 | from datasets import load_metric 72 | 73 | acc = load_metric("accuracy") 74 | precision = load_metric("precision") 75 | recall = load_metric("recall") 76 | f1 = load_metric("f1") 77 | 78 | 79 | def compute_metrics(y_true, y_pred): 80 | acc_result = acc.compute(predictions=y_pred, references=y_true) 81 | precision_result = precision.compute(predictions=y_pred, references=y_true) 82 | recall_result = recall.compute(predictions=y_pred, references=y_true) 83 | f1_result = f1.compute(predictions=y_pred, references=y_true) 84 | roc_auc_result = {"roc-auc": roc_auc_score(y_true=y_true, y_score=y_pred)} 85 | precision_from_curve, recall_from_curve, thresholds_from_curve = precision_recall_curve(y_true, y_pred) 86 | prc_auc_result = {"prc-auc": auc(recall_from_curve, precision_from_curve)} 87 | 88 | result = {**acc_result, **precision_result, **recall_result, 
**f1_result, **roc_auc_result, **prc_auc_result} 89 | return result 90 | 91 | 92 | model.train_model(train_df) 93 | 94 | print("Evaluation Scores") 95 | preds, _ = model.predict(eval_df["selfies"].tolist()) 96 | print(compute_metrics(np.ravel(eval_df["labels"].tolist()), np.ravel(preds))) 97 | 98 | print("Test Scores") 99 | preds, _ = model.predict(test_df["selfies"].tolist()) 100 | print(compute_metrics(np.ravel(test_df["labels"].tolist()), np.ravel(preds))) 101 | -------------------------------------------------------------------------------- /train_pretraining_model.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | parser = argparse.ArgumentParser() 4 | parser.add_argument("--smiles_dataset", required=False, metavar="/path/to/dataset/", help="Path of the SMILES dataset. Only needed if the file given by --selfies_dataset does not exist yet.") 5 | parser.add_argument("--selfies_dataset", required=True, metavar="/path/to/dataset/", help="Path of the SELFIES dataset. If it does not exist, it will be created at the given path.") 6 | parser.add_argument("--subset_size", required=False, metavar="", type=int, default=0, help="By default the program will use the whole data. If you want to instead use a subset of the data, set this parameter to the size of the subset.") 7 | parser.add_argument("--prepared_data_path", required=True, metavar="/path/to/dataset/", help="Path of the .txt prepared data. If it does not exist, it will be created at the given path.") 8 | parser.add_argument("--bpe_path", required=True, metavar="/path/to/bpetokenizer/", default="", help="Path of the BPE tokenizer. If it does not exist, it will be created at the given path.") 9 | parser.add_argument("--roberta_fast_tokenizer_path", required=True, metavar="/path/to/robertafasttokenizer/", help="Directory of the RobertaTokenizerFast tokenizer. 
RobertaFastTokenizer only depends on the BPE Tokenizer and will be created regardless of whether it exists or not.") 10 | parser.add_argument("--hyperparameters_path", required=True, metavar="/path/to/hyperparameters/", help="Path of the hyperparameters that will be used for pre-training. Hyperparameters should be stored in a yaml file.") 11 | args = parser.parse_args() 12 | 13 | import pandas as pd 14 | 15 | try: 16 | chembl_df = pd.read_csv(args.selfies_dataset) 17 | except FileNotFoundError: 18 | print("SELFIES dataset was not found. SMILES dataset provided. Converting SMILES to SELFIES.") 19 | from prepare_pretraining_data import prepare_data 20 | 21 | prepare_data(path=args.smiles_dataset, save_to=args.selfies_dataset) 22 | chembl_df = pd.read_csv(args.selfies_dataset) 23 | print("SELFIES .csv is ready.") 24 | 25 | print("Creating SELFIES .txt for tokenization.") 26 | from os.path import isfile # returns True if the file exists else False. 27 | 28 | if not isfile(args.prepared_data_path): 29 | from prepare_pretraining_data import create_selfies_file 30 | 31 | if args.subset_size != 0: 32 | create_selfies_file(chembl_df, subset_size=args.subset_size, do_subset=True, save_to=args.prepared_data_path) 33 | else: 34 | create_selfies_file(chembl_df, do_subset=False, save_to=args.prepared_data_path) 35 | print("SELFIES .txt is ready for tokenization.") 36 | 37 | print("Creating BPE tokenizer.") 38 | if not isfile(args.bpe_path+"/merges.txt"): 39 | import bpe_tokenizer 40 | 41 | bpe_tokenizer.bpe_tokenizer(path=args.prepared_data_path, save_to=args.bpe_path) 42 | print("BPE Tokenizer is ready.") 43 | 44 | print("Creating RobertaTokenizerFast.") 45 | if not isfile(args.roberta_fast_tokenizer_path+"/merges.txt"): 46 | import roberta_tokenizer 47 | 48 | roberta_tokenizer.save_roberta_tokenizer(path=args.bpe_path, save_to=args.roberta_fast_tokenizer_path) 49 | print("RobertaFastTokenizer is ready.") 50 | 51 | import yaml 52 | import roberta_model 53 | 54 | with 
open(args.hyperparameters_path) as file: 55 | hyperparameters = yaml.safe_load(file) 56 | for key in hyperparameters.keys(): 57 | print("Starting pretraining with {} parameter set.".format(key)) 58 | roberta_model.train_and_save_roberta_model(hyperparameters_dict=hyperparameters[key], selfies_path=args.prepared_data_path, robertatokenizer_path=args.roberta_fast_tokenizer_path, save_to="./saved_models/" + key + "_saved_model/") 59 | print("Finished pretraining with {} parameter set.\n---------------\n".format(key)) 60 | -------------------------------------------------------------------------------- /train_regression_model.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | os.environ["TOKENIZERS_PARALLELISM"] = "false" 4 | os.environ["WANDB_DISABLED"] = "true" 5 | 6 | import numpy as np 7 | import pandas as pd 8 | 9 | import torch 10 | from torch.nn import MSELoss 11 | from torch.utils.data import Dataset 12 | 13 | from transformers import BertPreTrainedModel, RobertaConfig, RobertaTokenizerFast 14 | 15 | from transformers.models.roberta.modeling_roberta import ( 16 | RobertaClassificationHead, 17 | RobertaConfig, 18 | RobertaModel, 19 | ) 20 | 21 | import argparse 22 | 23 | parser = argparse.ArgumentParser() 24 | parser.add_argument("--model", required=True, metavar="/path/to/model", help="Directory of the model") 25 | parser.add_argument("--tokenizer", required=True, metavar="/path/to/tokenizer/", help="Directory of the tokenizer") 26 | parser.add_argument("--dataset", required=True, metavar="/path/to/dataset/", help="Path of the fine-tuning dataset") 27 | parser.add_argument("--save_to", required=True, metavar="/path/to/save/to/", help="Directory to save the model") 28 | parser.add_argument("--target_column_id", required=False, default="1", metavar="", type=int, help="Column's ID in the dataframe") 29 | parser.add_argument("--scaler", required=False, default=0, metavar="", type=int, help="Scaler to use for 
regression. 0 for no scaling, 1 for min-max scaling, 2 for standard scaling. Default: 0") 30 | parser.add_argument("--use_scaffold", required=False, metavar="", type=int, default=0, help="Split to use. 0 for random, 1 for scaffold. Default: 0") 31 | parser.add_argument("--train_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for training. Default: 8") 32 | parser.add_argument("--validation_batch_size", required=False, metavar="", type=int, default=8, help="Batch size for validation. Default: 8") 33 | parser.add_argument("--num_epochs", required=False, metavar="", type=int, default=50, help="Number of epochs. Default: 50") 34 | parser.add_argument("--lr", required=False, metavar="", type=float, default=1e-5, help="Learning rate. Default: 1e-5") 35 | parser.add_argument("--wd", required=False, metavar="", type=float, default=0.1, help="Weight decay. Default: 0.1") 36 | args = parser.parse_args() 37 | 38 | 39 | # Model 40 | class SELFIESTransformers_For_Regression(BertPreTrainedModel): 41 | def __init__(self, config): 42 | super(SELFIESTransformers_For_Regression, self).__init__(config) 43 | self.num_labels = config.num_labels 44 | self.roberta = RobertaModel(config) 45 | self.classifier = RobertaClassificationHead(config) 46 | 47 | def forward(self, input_ids, attention_mask, labels): 48 | outputs = self.roberta(input_ids, attention_mask=attention_mask) 49 | sequence_output = outputs[0] 50 | logits = self.classifier(sequence_output) 51 | 52 | outputs = (logits,) + outputs[2:] 53 | 54 | if labels is not None: 55 | if self.num_labels == 1: # regression 56 | loss_fct = MSELoss() 57 | loss = loss_fct(logits.squeeze(), labels.squeeze()) 58 | outputs = (loss,) + outputs 59 | return outputs # (loss), logits, (hidden_states), (attentions) 60 | 61 | 62 | model_name = args.model 63 | tokenizer_name = args.tokenizer 64 | 65 | # Configs 66 | 67 | num_labels = 1 68 | config_class = RobertaConfig 69 | config = config_class.from_pretrained(model_name, 
num_labels=num_labels) 70 | 71 | model_class = SELFIESTransformers_For_Regression 72 | model = model_class.from_pretrained(model_name, config=config) 73 | 74 | tokenizer_class = RobertaTokenizerFast 75 | tokenizer = tokenizer_class.from_pretrained(tokenizer_name, do_lower_case=False) 76 | 77 | 78 | # Prepare and Get Data 79 | class SELFIESTransfomers_Dataset(Dataset): 80 | def __init__(self, data, tokenizer, MAX_LEN): 81 | text, labels = data 82 | self.examples = tokenizer(text=text, text_pair=None, truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt") 83 | self.labels = torch.tensor(labels, dtype=torch.float) 84 | 85 | def __len__(self): 86 | return len(self.examples["input_ids"]) 87 | 88 | def __getitem__(self, index): 89 | item = {key: self.examples[key][index] for key in self.examples} 90 | item["label"] = self.labels[index] 91 | return item 92 | 93 | 94 | DATASET_PATH = args.dataset 95 | from prepare_finetuning_data import smiles_to_selfies 96 | from prepare_finetuning_data import train_val_test_split 97 | 98 | if args.use_scaffold == 0: # random 99 | print("Using random split") 100 | (train_df, validation_df, test_df) = train_val_test_split(DATASET_PATH, args.target_column_id, scaffold_split=False) 101 | else: # scaffold 102 | print("Using scaffold split") 103 | (train, val, test) = train_val_test_split(DATASET_PATH, args.target_column_id) 104 | 105 | train_smiles = [item[0] for item in train.smiles()] 106 | validation_smiles = [item[0] for item in val.smiles()] 107 | test_smiles = [item[0] for item in test.smiles()] 108 | 109 | train_df = pd.DataFrame(np.column_stack([train_smiles, train.targets()]), columns=["smiles", "target"]) 110 | validation_df = pd.DataFrame(np.column_stack([validation_smiles, val.targets()]), columns=["smiles", "target"]) 111 | test_df = pd.DataFrame(np.column_stack([test_smiles, test.targets()]), columns=["smiles", "target"]) 112 | 113 | train_df = smiles_to_selfies(train_df) 114 | validation_df = 
smiles_to_selfies(validation_df) 115 | test_df = smiles_to_selfies(test_df) 116 | 117 | from sklearn.preprocessing import StandardScaler, MinMaxScaler 118 | 119 | if args.scaler == 0: 120 | print("Not using a scaler.") 121 | elif args.scaler == 1: 122 | print("Using MinMaxScaler.") 123 | train_df["target"] = MinMaxScaler().fit_transform(np.array(train_df["target"]).reshape(-1, 1)) 124 | validation_df["target"] = MinMaxScaler().fit_transform(np.array(validation_df["target"]).reshape(-1, 1)) 125 | test_df["target"] = MinMaxScaler().fit_transform(np.array(test_df["target"]).reshape(-1, 1)) 126 | elif args.scaler == 2: 127 | print("Using StandardScaler.") 128 | train_df["target"] = StandardScaler().fit_transform(np.array(train_df["target"]).reshape(-1, 1)) 129 | validation_df["target"] = StandardScaler().fit_transform(np.array(validation_df["target"]).reshape(-1, 1)) 130 | test_df["target"] = StandardScaler().fit_transform(np.array(test_df["target"]).reshape(-1, 1)) 131 | else: 132 | print("Invalid scaler. 
Not using a scaler.") 133 | 134 | test_y = pd.DataFrame(test_df.target, columns=["target"]) 135 | 136 | MAX_LEN = 128 137 | train_examples = (train_df.iloc[:, 0].astype(str).tolist(), train_df.iloc[:, 1].tolist()) 138 | train_dataset = SELFIESTransfomers_Dataset(train_examples, tokenizer, MAX_LEN) 139 | 140 | validation_examples = (validation_df.iloc[:, 0].astype(str).tolist(), validation_df.iloc[:, 1].tolist()) 141 | validation_dataset = SELFIESTransfomers_Dataset(validation_examples, tokenizer, MAX_LEN) 142 | 143 | test_examples = (test_df.iloc[:, 0].astype(str).tolist(), test_df.iloc[:, 1].tolist()) 144 | test_dataset = SELFIESTransfomers_Dataset(test_examples, tokenizer, MAX_LEN) 145 | 146 | 147 | from sklearn.metrics import mean_absolute_error, mean_squared_error 148 | 149 | 150 | 151 | def compute_metrics(eval_pred): 152 | preds, labels = eval_pred 153 | predictions = [i[0] for i in preds] 154 | 155 | mse = {"mse": mean_squared_error(y_pred=predictions, y_true=labels, squared=True)} # squared=True returns the MSE, see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html 156 | rmse = {"rmse": mean_squared_error(y_pred=predictions, y_true=labels, squared=False)} # squared=False returns the RMSE, see the link above 157 | mae = {"mae": mean_absolute_error(y_pred=predictions, y_true=labels)} 158 | 159 | result = {**mse, **rmse, **mae} 160 | return result 161 | 162 | 163 | # Train and Evaluate 164 | from transformers import TrainingArguments, Trainer 165 | 166 | TRAIN_BATCH_SIZE = args.train_batch_size 167 | VALID_BATCH_SIZE = args.validation_batch_size 168 | TRAIN_EPOCHS = args.num_epochs 169 | LEARNING_RATE = args.lr 170 | WEIGHT_DECAY = args.wd 171 | MAX_LEN = MAX_LEN 172 | 173 | training_args = TrainingArguments( 174 | output_dir=args.save_to, 175 | overwrite_output_dir=True, 176 | evaluation_strategy="epoch", 177 | save_strategy="epoch", 178 | num_train_epochs=TRAIN_EPOCHS, 179 | 
learning_rate=LEARNING_RATE, 180 | weight_decay=WEIGHT_DECAY, 181 | per_device_train_batch_size=TRAIN_BATCH_SIZE, 182 | per_device_eval_batch_size=VALID_BATCH_SIZE, 183 | disable_tqdm=True, 184 | # load_best_model_at_end=True, 185 | # metric_for_best_model="roc-auc", 186 | # greater_is_better=True, 187 | save_total_limit=1, 188 | ) 189 | 190 | trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=validation_dataset, compute_metrics=compute_metrics) # model, training arguments, training dataset, evaluation dataset, and metric function 191 | 192 | metrics = trainer.train() 193 | print("Metrics") 194 | print(metrics) 195 | trainer.save_model(args.save_to) 196 | 197 | # Testing 198 | # Make prediction 199 | raw_pred, label_ids, metrics = trainer.predict(test_dataset) 200 | 201 | # Preprocess raw predictions 202 | y_pred = [i[0] for i in raw_pred] 203 | 204 | MSE = mean_squared_error(y_true=test_y, y_pred=y_pred, squared=True) # squared=True returns the MSE, see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html 205 | RMSE = mean_squared_error(y_true=test_y, y_pred=y_pred, squared=False) # squared=False returns the RMSE, see the link above 206 | MAE = mean_absolute_error(y_true=test_y, y_pred=y_pred) 207 | 208 | print("\nMean Squared Error (MSE):", MSE) 209 | print("Root Mean Square Error (RMSE):", RMSE) 210 | print("Mean Absolute Error (MAE):", MAE) 211 | --------------------------------------------------------------------------------