.
675 |
--------------------------------------------------------------------------------
/MANIFEST:
--------------------------------------------------------------------------------
1 | # file GENERATED by distutils, do NOT edit
2 | CHANGELOG.txt
3 | LICENSE.txt
4 | README.html
5 | README.md
6 | setup.py
7 | examples/README.md
8 | examples/comp_2_files.csv
9 | examples/comp_simple_smiles.csv
10 | examples/comp_smiles.csv
11 | examples/zid_smiles.csv
12 | examples/zid_smiles2.csv
13 | examples/zid_smiles3.csv
14 | examples/zinc_ids.csv
15 | scripts/cmd_line_online_query_scripts/lookup_smile_str.py
16 | scripts/cmd_line_online_query_scripts/lookup_zincid.py
17 | scripts/csv_scripts/comp_2_smile_files.py
18 | scripts/csv_scripts/comp_smile_strings.py
19 | scripts/csv_scripts/gen_zincid_smile_csv.py
20 | scripts/sqlite_scripts/add_to_sqlite.py
21 | scripts/sqlite_scripts/lookup_single_id.py
22 | scripts/sqlite_scripts/lookup_smile.py
23 | scripts/sqlite_scripts/sqlite_to_csv.py
24 | smilite/__init__.py
25 | smilite/smilite.py
26 | test/test_get_zinc_smile.py
27 | test/test_simplify_smile.py
28 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # smilite
2 |
3 | smilite is a Python module to download and analyze SMILES strings (Simplified Molecular-Input Line-entry System) of chemical compounds from ZINC (a free database of commercially-available compounds for virtual screening, [http://zinc.docking.org](http://zinc.docking.org)).
4 | Now supports both Python 3.x and Python 2.x.
5 |
6 |
7 | 
8 |
9 | #### Sections
10 |
11 | • [Installation](#installation)
12 | • [Simple command line online query scripts](#simple_cmd_scripts)
13 | - [lookup_zincid.py](#lookup_zincid)
14 | - [lookup_smile_str.py](#lookup_smile_str)
15 | • [CSV file command line scripts](#csv_scripts)
16 | - [gen_zincid_smile_csv.py (downloading SMILES)](#gen_zincid)
17 | - [comp_smile_strings.py (checking for duplicates within 1 file)](#comp_smile)
18 | - [comp_2_smile_files.py (checking for duplicates across 2 files)](#comp_2_smile)
19 | • [SQLite file command line scripts](#sqlite_scripts)
20 | - [lookup_single_id.py](#lookup1id)
21 | - [lookup_smile.py](#lookupsmile)
22 | - [add_to_sqlite.py](#add_to_sqlite)
23 | - [sqlite_to_csv.py](#sqlite_to_csv)
24 | • [Changelog](#changelog)
25 |
26 |
27 |
28 | # Installation
29 |
30 | You can use the following command to install smilite:
31 | `pip install smilite`
32 | or
33 | `easy_install smilite`
34 |
35 | Alternatively, you can download the package manually from the Python Package Index [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite), unzip it, navigate into the package, and use the command:
36 |
37 | `python3 setup.py install`
38 |
39 |
40 |
41 | # Simple command line online query scripts
42 |
43 | If you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/cmd_line_online_query_scripts` dir.
44 |
45 |
46 |
47 | ### lookup_zincid.py
48 |
49 | Retrieves the SMILES string and simplified SMILES string for a given ZINC ID
50 | from the online ZINC database. It uses [ZINC12](http://zinc.docking.org) as the default backend; pass the additional command line argument `zinc15` to use the [ZINC15](http://zinc15.docking.org) database instead.
51 |
52 | **Usage:**
53 | `[shell]>> python3 lookup_zincid.py ZINC_ID [zinc12/zinc15]`
54 |
55 | **Example (retrieve data from ZINC):**
56 | `[shell]>> python3 lookup_zincid.py ZINC01234567 zinc15`
57 |
58 | **Output example:**
59 |
60 |     ZINC01234567
61 |     C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
62 |     CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
63 |
64 |
65 | Where
66 | - 1st row: ZINC ID
67 | - 2nd row: SMILES string
68 | - 3rd row: simplified SMILES string
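
The relationship between the 2nd and 3rd output rows can be illustrated with a short sketch. This is a hypothetical re-implementation inferred only from the example outputs in this repository (the function name `simplify_smiles_sketch` and the set of removed characters are assumptions); it is not necessarily how `smilite.simplify_smile()` works internally.

```python
# Characters that disappear between the SMILES string and its simplified
# form in the examples: stereo markers, charges, brackets, explicit H,
# and the triple-bond symbol. This set is an assumption based on examples.
REMOVED_CHARS = set('@+-[]#H')


def simplify_smiles_sketch(smiles):
    """Drop the characters above and uppercase the remaining symbols."""
    kept = [ch for ch in smiles if ch not in REMOVED_CHARS]
    return ''.join(kept).upper()


print(simplify_smiles_sketch('C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O'))
# prints: CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
```

Note that a purely character-based approach like this one also uppercases two-letter element symbols (for example `Cl` becomes `CL`), so the result is only useful for string comparison, not as a valid SMILES string.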
69 |
70 |
71 |
72 | ### lookup_smile_str.py
73 |
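74 | Retrieves the corresponding ZINC_IDs for a given SMILES string
75 | from the online ZINC database. An optional second argument (`zinc12` or `zinc15`) selects the backend; this script defaults to `zinc15`.
76 |
77 | **Usage:**
78 | `[shell]>> python3 lookup_smile_str.py SMILE_str [zinc12/zinc15]`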
79 |
80 | **Example (retrieve data from ZINC):**
81 | `[shell]>> python3 lookup_smile_str.py "C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"`
82 |
83 | **Output example:**
84 |
85 |     ZINC01234567
86 |     ZINC01234568
87 |     ZINC01242053
88 |     ZINC01242055
89 |
90 |
91 |
92 | # CSV file command line scripts
93 |
94 | If you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/csv_scripts` dir.
95 |
96 |
97 |
98 | ### gen_zincid_smile_csv.py (downloading SMILES)
99 |
100 | Generates a ZINC_ID,SMILE_STR CSV file from an input file of ZINC IDs. The input file should consist of 1 column with 1 ZINC ID per row. [ZINC12](http://zinc.docking.org) is used as the default backend; pass the additional command line argument `zinc15` to use the [ZINC15](http://zinc15.docking.org) database instead.
101 |
102 | **Usage:**
103 | `[shell]>> python3 gen_zincid_smile_csv.py in.csv out.csv [zinc12/zinc15]`
104 |
105 | **Example:**
106 | `[shell]>> python3 gen_zincid_smile_csv.py ../examples/zinc_ids.csv ../examples/zid_smiles.csv zinc15`
107 |
108 | **Screen Output:**
109 |
110 |     Downloading SMILES
111 |     0%                          100%
112 |     [##########                    ] | ETA[sec]: 106.525
113 |
114 | **Input example file format:**
115 | 
116 | [zinc_ids.csv](https://raw.github.com/rasbt/smilite/master/examples/zinc_ids.csv)
117 |
118 | **Output example file format:**
119 | 
120 | [zid_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles.csv)
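
The input and output formats above can be sketched without network access. The helper below is a minimal illustration, not the library's implementation: `write_id_smiles_csv` and its `fetch_smiles` parameter are hypothetical names, and the lookup is stubbed with a dictionary instead of a ZINC query.

```python
import csv
import io


def write_id_smiles_csv(in_file, out_file, fetch_smiles):
    """Read one ZINC ID per row and write ZINC_ID,SMILE_STR rows.

    fetch_smiles is a hypothetical callable (ZINC ID -> SMILES string or
    None) that stands in for the online lookup.
    """
    writer = csv.writer(out_file, lineterminator='\n')
    for row in csv.reader(in_file):
        if not row or not row[0].strip():
            continue  # skip blank lines in the input file
        zinc_id = row[0].strip()
        writer.writerow([zinc_id, fetch_smiles(zinc_id) or ''])


# Offline demo: a stub lookup table replaces the network call.
stub_lookup = {'ZINC00029323': 'COc1cccc(c1)NC(=O)c2cccnc2'}
ids = io.StringIO('ZINC00029323\nZINC00029323\n\n')
out = io.StringIO()
write_id_smiles_csv(ids, out, stub_lookup.get)
print(out.getvalue())
```

With smilite installed, the stub could be swapped for something like `lambda zid: smilite.get_zinc_smile(zid, backend='zinc15')`, the call used in `lookup_zincid.py`.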
121 |
122 |
123 |
124 | ### comp_smile_strings.py (checking for duplicates within 1 file)
125 |
126 | Compares SMILES strings within a 2-column CSV file (ZINC_ID,SMILE_string) to identify duplicates. Generates a new CSV file with the number of duplicates in the 3rd column and the ZINC IDs of the identified duplicates listed in the 4th-nth column(s).
127 |
128 | **Usage:**
129 | `[shell]>> python3 comp_smile_strings.py in.csv out.csv [simplify]`
130 |
131 | **Example 1:**
132 | `[shell]>> python3 comp_smile_strings.py ../examples/zid_smiles.csv ../examples/comp_smiles.csv`
133 |
134 | **Input example file format:**
135 | 
136 | [zid_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles.csv)
137 |
138 | **Output example file format 1:**
139 | 
140 | [comp_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_smiles.csv)
141 |
142 | Where
143 | - 1st column: ZINC ID
144 | - 2nd column: SMILES string
145 | - 3rd column: number of duplicates
146 | - 4th-nth column: ZINC IDs of duplicates
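
The column layout above can be reproduced with a small grouping sketch. This is an illustration under assumptions, not the code behind `smilite.check_duplicate_smiles()`: the helper name `find_duplicate_smiles` is hypothetical, and the rows are kept in memory rather than read from a CSV file.

```python
from collections import OrderedDict


def find_duplicate_smiles(rows):
    """Group (zinc_id, smiles) pairs by SMILES string.

    Returns one row per distinct SMILES string in the form
    [first_zinc_id, smiles, number_of_duplicates, *duplicate_zinc_ids].
    """
    groups = OrderedDict()
    for zinc_id, smiles in rows:
        groups.setdefault(smiles, []).append(zinc_id)
    return [[ids[0], smiles, len(ids) - 1] + ids[1:]
            for smiles, ids in groups.items()]


rows = [
    ('ZINC00029323', 'COc1cccc(c1)NC(=O)c2cccnc2'),
    ('ZINC00029323', 'COc1cccc(c1)NC(=O)c2cccnc2'),
    ('ZINC01234567', 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O'),
]
for row in find_duplicate_smiles(rows):
    print(','.join(str(field) for field in row))
```

To compare simplified SMILES strings instead, the same grouping could be applied after simplifying each string, which corresponds to the `simplify` option of the script.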
147 |
148 | **Example 2:**
149 | `[shell]>> python3 comp_smile_strings.py ../examples/zid_smiles.csv ../examples/comp_simple_smiles.csv simplify`
150 |
151 | **Output example file format 2:** 
152 | [comp_simple_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_simple_smiles.csv)
153 |
154 |
155 |
156 | ### comp_2_smile_files.py (checking for duplicates across 2 files)
157 |
158 | Compares SMILES strings between 2 input CSV files, where each file consists of rows with 2 columns (ZINC_ID,SMILE_string), to identify duplicate SMILES strings across both files.
159 | Generates a new CSV file with ZINC IDs of identified duplicates listed in the 4th-nth column(s).
160 |
161 | **Usage:**
162 | `[shell]>> python3 comp_2_smile_files.py in1.csv in2.csv out.csv [simplify]`
163 |
164 | **Example:**
165 | `[shell]>> python3 comp_2_smile_files.py ../examples/zid_smiles2.csv ../examples/zid_smiles3.csv ../examples/comp_2_files.csv`
166 |
167 | **Input example file 1:**
168 | 
169 | [zid_smiles2.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles2.csv)
170 |
171 | **Input example file 2:**
172 | 
173 | [zid_smiles3.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles3.csv)
174 |
175 | **Output example file format:**
176 | 
177 | [comp_2_files.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_2_files.csv)
178 |
179 | Where:
180 | - 1st column: name of the origin file
181 | - 2nd column: ZINC ID
182 | - 3rd column: SMILES string
183 | - 4th-nth column: ZINC IDs of duplicates
184 |
185 |
186 |
187 | # SQLite file command line scripts
188 |
189 | If you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/sqlite_scripts` dir.
190 |
191 |
192 |
193 | ### lookup_single_id.py
194 |
195 | Retrieves the SMILES string and simplified SMILES string for a given ZINC ID
196 | from a previously built smilite SQLite database or from the online ZINC database.
197 |
198 | **Usage:**
199 | `[shell]>> python3 lookup_single_id.py ZINC_ID [sqlite_file]`
200 |
201 | **Example1 (retrieve data from a smilite SQLite database):**
202 | `[shell]>> python3 lookup_single_id.py ZINC01234567 ~/Desktop/smilite_db.sqlite`
203 |
204 | **Example2 (retrieve data from the ZINC online database):**
205 | `[shell]>> python3 lookup_single_id.py ZINC01234567`
206 |
207 | **Output example:**
208 |
209 |     ZINC01234567
210 |     C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
211 |     CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
212 |
213 |
214 | Where
215 | - 1st row: ZINC ID
216 | - 2nd row: SMILES string
217 | - 3rd row: simplified SMILES string
218 |
219 |
220 |
221 | ### lookup_smile.py
222 |
223 | Retrieves the ZINC ID(s) for a given SMILES string or simplified SMILES string from a previously built smilite SQLite database.
224 |
225 | **Usage:**
226 | `[shell]>> python3 lookup_smile.py sqlite_file SMILE_STRING [simplify]`
227 |
228 | **Example1 (search for SMILES string):**
229 | `[shell]>> python3 lookup_smile.py ~/Desktop/smilite.sqlite "C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"`
230 |
231 | **Example2 (search for simplified SMILES string):**
232 | `[shell]>> python3 lookup_smile.py ~/Desktop/smilite.sqlite "CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O" simplify`
233 |
234 | **Output example:**
235 |
236 |     ZINC01234567
237 |     C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
238 |     CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
239 |
240 |
241 | Where
242 | - 1st row: ZINC ID
243 | - 2nd row: SMILES string
244 | - 3rd row: simplified SMILES string
245 |
246 |
247 |
248 | ### add_to_sqlite.py
249 |
250 | Reads ZINC IDs from a CSV file and looks up SMILES strings and simplified SMILES strings from the ZINC online database. Writes those SMILES strings to a smilite SQLite database. A new database will be created if it doesn't exist yet.
251 |
252 | **Usage:**
253 | `[shell]>> python3 add_to_sqlite.py sqlite_file csv_file`
254 |
255 | **Example:**
256 | `[shell]>> python3 add_to_sqlite.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_ids.csv`
257 |
258 | **Input CSV file example format:**
259 |
260 |     ZINC01234567
261 |     ZINC01234568
262 |     ...
263 |
264 |
265 | An example of the smilite SQLite database contents after successful insertion is shown in the image below. 
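
The "create the database if needed, then insert" behavior described above can be sketched with Python's built-in `sqlite3` module. The table name `smilite`, the schema, and the helper `add_records` are assumptions for illustration; the column names follow the CSV header documented for `sqlite_to_csv.py`. An in-memory database is used so the example runs without touching the file system.

```python
import sqlite3


def add_records(conn, records):
    """Create the table on first use, then insert (id, smile, simple) rows.

    The table name and schema are assumptions, not smilite's actual layout.
    """
    conn.execute('CREATE TABLE IF NOT EXISTS smilite '
                 '(zinc_id TEXT PRIMARY KEY, smile TEXT, simple_smile TEXT)')
    # INSERT OR IGNORE skips ZINC IDs that are already stored.
    conn.executemany('INSERT OR IGNORE INTO smilite VALUES (?, ?, ?)',
                     records)
    conn.commit()


conn = sqlite3.connect(':memory:')
add_records(conn, [
    ('ZINC01234567', 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
     'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O'),
    ('ZINC01234568', 'C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
     'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O'),
])
# Adding the same ID a second time leaves the table unchanged.
add_records(conn, [('ZINC01234567', 'ignored', 'ignored')])
row_count = conn.execute('SELECT COUNT(*) FROM smilite').fetchone()[0]
print(row_count)
```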
266 |
267 |
268 |
269 | ### sqlite_to_csv.py
270 |
271 | Writes contents of an SQLite smilite database to a CSV file.
272 |
273 | **Usage:**
274 | `[shell]>> python3 sqlite_to_csv.py sqlite_file csv_file`
275 |
276 | **Example:**
277 | `[shell]>> python3 sqlite_to_csv.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_smiles.csv`
278 |
279 | **Output CSV file example format:**
280 |
281 |     ZINC_ID,SMILE,SIMPLE_SMILE
282 |     ZINC01234568,C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
283 |     ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
284 |
285 |
286 | An example of the CSV file contents opened in a spreadsheet program is shown in the image below. 
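
A minimal sketch of such an export, using only the standard library, might look as follows. The table name `smilite`, its schema, and the helper `dump_table_to_csv` are assumptions for illustration; only the CSV header (`ZINC_ID,SMILE,SIMPLE_SMILE`) is taken from the example above.

```python
import csv
import io
import sqlite3


def dump_table_to_csv(conn, out_file):
    """Write a header line followed by every row of the assumed table."""
    cursor = conn.execute(
        'SELECT zinc_id, smile, simple_smile FROM smilite ORDER BY zinc_id')
    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerow(['ZINC_ID', 'SMILE', 'SIMPLE_SMILE'])
    writer.writerows(cursor)


# Demo with an in-memory database and an in-memory output file.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE smilite '
             '(zinc_id TEXT PRIMARY KEY, smile TEXT, simple_smile TEXT)')
conn.execute('INSERT INTO smilite VALUES (?, ?, ?)',
             ('ZINC01234567',
              'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
              'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O'))
out = io.StringIO()
dump_table_to_csv(conn, out)
print(out.getvalue())
```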
287 |
288 |
289 |
290 |
291 |
292 | # Changelog
293 |
294 | **VERSION 2.3.1 (07/25/2020)**
295 |
296 | - Fix bug to allow `zinc15` option in gen_zincid_smile_csv.py script
297 |
298 | **VERSION 2.3.0 (06/10/2020)**
299 |
300 | - Fixes ZINC URL in `lookup_smile_str.py`
301 | - Adds an optional command line parameter (with arguments `zinc15` or `zinc12`) for `lookup_smile_str.py`
302 |
303 | **VERSION 2.2.0**
304 |
305 | * Provides an optional command line argument (zinc15) to use ZINC15 as a backend for downloading SMILES
306 |
307 | **VERSION 2.1.0**
308 |
309 | * Functions and scripts to fetch ZINC IDs corresponding to a SMILES string query
310 |
311 | **VERSION 2.0.1**
312 |
313 | * Progress bar for add_to_sqlite.py
314 |
315 | **VERSION 2.0.0**
316 |
317 | * added SQLite features
318 |
319 | **VERSION 1.3.0**
320 |
321 | * added script and module function to compare SMILES strings across 2 files.
322 |
323 | **VERSION 1.2.0**
324 |
325 | * added Python 2.x support
326 |
327 | **VERSION 1.1.1**
328 |
329 | * PyPrind dependency fix
330 |
331 | **VERSION 1.1.0**
332 |
333 | * added a progress bar (PyPrind) to `generate_zincid_smile_csv()` function
--------------------------------------------------------------------------------
/TO_DO.txt:
--------------------------------------------------------------------------------
1 | smilite `smile_to_zincid` extension
2 | ===================================
3 |
4 |
5 | []update version in changelog
6 | []update version in setup.py
7 | []update version in __init__
8 | [ ]create csv_scripts/gen_smile_zinc_id_csv.py (same output structure as from gen_zincid_smile_csv.py)
9 | [ ]modify sqlite_scripts/lookup_smile.py
10 | [ ]split sqlite_scripts/add_to_sqlite.py into
11 | [ ]sqlite_scripts/add_to_sqlite_from_id.py
12 | [ ]sqlite_scripts/add_to_sqlite_from_smile.py
13 | [ ]add/modify script paths in setup.py data dict
14 | [ ]update README.md
15 | [ ]convert to README.html
16 | [ ]upload smilite code to PyPi
17 | [ ]upload README.html to PyPi
18 | [ ]update documentation on website
19 |
--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------
1 | ## Example Files
2 |
3 | #### zinc_ids.csv
4 | CSV file consisting of one column with 1 ZINC ID per row.
5 | Used as input for `smilite.generate_zincid_smile_csv()`, which downloads SMILES strings
6 | from the ZINC online database ([http://zinc.docking.org](http://zinc.docking.org)) and generates a 2-column CSV file with
7 | ZINC_ID,SMILE_str pairs.
8 | This function is implemented in the command line script: [../scripts/csv_scripts/gen_zincid_smile_csv.py](../scripts/csv_scripts/gen_zincid_smile_csv.py).
9 |
10 | #### zid_smiles.csv
11 | A 2-column CSV file with ZINC_ID,SMILE_str pairs that is generated when the function `smilite.generate_zincid_smile_csv()` is invoked or via the command line script: [../scripts/csv_scripts/gen_zincid_smile_csv.py](../scripts/csv_scripts/gen_zincid_smile_csv.py).
12 |
13 | This CSV file can be used as the input CSV file to check for SMILES string duplicates via the `smilite.check_duplicate_smiles()` function or the command line script [../scripts/csv_scripts/comp_smile_strings.py](../scripts/csv_scripts/comp_smile_strings.py).
14 |
15 | #### comp_smiles.csv
16 | A multi-column CSV file where
17 | - 1st column: ZINC ID
18 | - 2nd column: SMILE string
19 | - 3rd column: number of duplicates
20 | - 4th-nth column: ZINC IDs of duplicates
21 |
22 | This CSV file is the output of invoking the `smilite.check_duplicate_smiles()` function
23 | or the command line script [../scripts/csv_scripts/comp_smile_strings.py](../scripts/csv_scripts/comp_smile_strings.py).
24 |
25 |
26 | #### comp_simple_smiles.csv
27 |
28 | A multi-column CSV file where
29 | - 1st column: ZINC ID
30 | - 2nd column: simplified SMILE string
31 | - 3rd column: number of duplicates
32 | - 4th-nth column: ZINC IDs of duplicates
33 |
34 | This CSV file is the output of invoking the `smilite.check_duplicate_smiles()` function with the additional argument `compare_simplified_smiles=True`
35 | or the command line script [../scripts/csv_scripts/comp_smile_strings.py](../scripts/csv_scripts/comp_smile_strings.py) with the command line argument `simplify`.
36 |
--------------------------------------------------------------------------------
/examples/comp_2_files.csv:
--------------------------------------------------------------------------------
1 | file_origin,zinc_id,smile_str,duplicates
2 | examples/zid_smiles2.csv,ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2,
3 | examples/zid_smiles2.csv,ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C,
4 | examples/zid_smiles2.csv,ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC,ZINC12345678,
5 | examples/zid_smiles2.csv,ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC,ZINC12345678,
6 | examples/zid_smiles3.csv,ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC,ZINC12345678,ZINC12345678,
7 | examples/zid_smiles3.csv,ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,
8 | examples/zid_smiles3.csv,ZINC03245324,c1ccc(c(c1)C#N)Sc2ccccc2C(=O)OCC(=O)Nc3ccc4c(c3)OCCO4,
--------------------------------------------------------------------------------
/examples/comp_simple_smiles.csv:
--------------------------------------------------------------------------------
1 | zinc_id,smile_str,duplicates
2 | ZINC03245324,C1CCC(C(C1)CN)SC2CCCCC2C(=O)OCC(=O)NC3CCC4C(C3)OCCO4,0,
3 | ZINC83310457,CC1CCCC(C1N2C(CC(C2C)/C=N\NC(=O)C(=O)NC3CCC(CC3)N(=O)O)C)C,0,
4 | ZINC01234567,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O,0,
5 | ZINC12345678,CC1CCC(CC1C)OCCOC2C(CC(CC2I)/C=N/N3CNNC3)OC,0,
6 | ZINC00029323,COC1CCCC(C1)NC(=O)C2CCCNC2,1,ZINC00029323,
--------------------------------------------------------------------------------
/examples/comp_smiles.csv:
--------------------------------------------------------------------------------
1 | zinc_id,smile_str,duplicates
2 | ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C,0,
3 | ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,0,
4 | ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC,0,
5 | ZINC03245324,c1ccc(c(c1)C#N)Sc2ccccc2C(=O)OCC(=O)Nc3ccc4c(c3)OCCO4,0,
6 | ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2,1,ZINC00029323,
--------------------------------------------------------------------------------
/examples/zid_smiles.csv:
--------------------------------------------------------------------------------
1 | ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2
2 | ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2
3 | ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C
4 | ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC
5 | ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
6 | ZINC03245324,c1ccc(c(c1)C#N)Sc2ccccc2C(=O)OCC(=O)Nc3ccc4c(c3)OCCO4
7 |
--------------------------------------------------------------------------------
/examples/zid_smiles2.csv:
--------------------------------------------------------------------------------
1 | ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2
2 | ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C
3 | ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC
4 |
5 |
--------------------------------------------------------------------------------
/examples/zid_smiles3.csv:
--------------------------------------------------------------------------------
1 | ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC
2 | ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
3 | ZINC03245324,c1ccc(c(c1)C#N)Sc2ccccc2C(=O)OCC(=O)Nc3ccc4c(c3)OCCO4
4 |
--------------------------------------------------------------------------------
/examples/zinc_ids.csv:
--------------------------------------------------------------------------------
1 | ZINC00029323
2 | ZINC00029323
3 | ZINC83310457
4 | ZINC12345678
5 | ZINC01234567
6 | ZINC03245324
7 |
--------------------------------------------------------------------------------
/images/add_to_sqlite_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/add_to_sqlite_1.png
--------------------------------------------------------------------------------
/images/comp_2_files.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/comp_2_files.png
--------------------------------------------------------------------------------
/images/comp_simple_smiles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/comp_simple_smiles.png
--------------------------------------------------------------------------------
/images/comp_smiles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/comp_smiles.png
--------------------------------------------------------------------------------
/images/insert_id_sqlite_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/insert_id_sqlite_1.png
--------------------------------------------------------------------------------
/images/smilite_overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/smilite_overview.png
--------------------------------------------------------------------------------
/images/sqlite_to_csv_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/sqlite_to_csv_1.png
--------------------------------------------------------------------------------
/images/sqlite_to_csv_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/sqlite_to_csv_2.png
--------------------------------------------------------------------------------
/images/zid_smiles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/zid_smiles.png
--------------------------------------------------------------------------------
/images/zid_smiles2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/zid_smiles2.png
--------------------------------------------------------------------------------
/images/zid_smiles3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/zid_smiles3.png
--------------------------------------------------------------------------------
/images/zinc_ids.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rasbt/smilite/48f61df2ccc48b73d30a66cabcbb37b85c189d6c/images/zinc_ids.png
--------------------------------------------------------------------------------
/scripts/cmd_line_online_query_scripts/lookup_smile_str.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Sebastian Raschka
2 | #
3 | # Retrieves the corresponding ZINC_IDs for a given SMILE string
4 | # from the online ZINC database.
5 | #
6 | #
7 | # Usage:
8 | # [shell]>> python3 lookup_smile_str.py SMILE_str [zinc15 (default) / zinc12]
9 | #
10 | # Example (retrieve data from the ZINC online database):
11 | # [shell]>> python3 lookup_smile_str.py \
12 | # 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O'
13 | #
14 | #
15 | # Output example:
16 | # ZINC01234567
17 | # ZINC01234568
18 | # ZINC01242053
19 | # ZINC01242055
20 | #
21 |
22 | import smilite
23 | import sys
24 |
25 |
26 | def print_usage():
27 |     print('\nUSAGE: python3 lookup_smile_str.py SMILE_str [backend]')
28 |     print('\n\nUses zinc15 as backend by default'
29 |           '\n\nEXAMPLE 1 (retrieve data from ZINC15):\n'
30 |           'python3 lookup_smile_str.py'
31 |           ' "C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"'
32 |           '\n\nEXAMPLE 2 (retrieve data from ZINC12):\n'
33 |           'python3 lookup_smile_str.py'
34 |           ' "CCOc1ccc(cc1)N([C@@H](C)C(=O)Nc2ccc(cc2C)Cl)S(=O)(=O)C" zinc12')
35 |
36 |
37 | zinc_id = [None]
38 |
39 | try:
40 |     smile_str = sys.argv[1]
41 |
42 |     if len(sys.argv) >= 3:
43 |         backend = sys.argv[2]
44 |     else:
45 |         backend = 'zinc15'
46 |
47 |     zinc_ids = smilite.get_zincid_from_smile(smile_str, backend=backend)
48 |     for zid in zinc_ids:
49 |         print(zid)
50 |
51 | except IOError as err:
52 |     print('\n\nERROR: {}'.format(err))
53 |     print_usage()
54 |
55 | except IndexError:
56 |     print('\n\nERROR: Invalid command line arguments.')
57 |     print_usage()
58 |
--------------------------------------------------------------------------------
/scripts/cmd_line_online_query_scripts/lookup_zincid.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Retrieves the SMILE string and simplified SMILE string for a given ZINC ID
4 | # from the online ZINC database.
5 | #
6 | #
7 | # Usage:
8 | # [shell]>> python3 lookup_zincid.py ZINC_ID [zinc12 (default) / zinc15]
9 | #
10 | # Example (retrieve data from the ZINC online database):
11 | # [shell]>> python3 lookup_zincid.py ZINC01234567
12 | #
13 | #
14 | #
15 | # Output example:
16 | # ZINC01234567
17 | # C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
18 | # CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
19 | #
20 | # Where
21 | # 1st row: ZINC ID
22 | # 2nd row: SMILE string
23 | # 3rd row: simplified SMILE string
24 | #
25 |
26 | import smilite
27 | import sys
28 |
29 |
30 | def print_usage():
31 |     print('\nUSAGE: python3 lookup_zincid.py ZINC_ID [zinc12 (def.) / zinc15]')
32 |     print('\n\nEXAMPLES (retrieve data from ZINC):\n'
33 |           '1) python3 lookup_zincid.py ZINC01234567 zinc12\n'
34 |           '2) python3 lookup_zincid.py ZINC01234567 zinc15')
35 |
36 |
37 | smile_str = ''
38 | simple_smile_str = ''
39 |
40 | try:
41 |     zinc_id = sys.argv[1]
42 |
43 |     if len(sys.argv) >= 3:
44 |         backend = sys.argv[2]
45 |     else:
46 |         backend = 'zinc12'
47 |
48 |     smile_str = smilite.get_zinc_smile(zinc_id, backend=backend)
49 |     if smile_str:
50 |         simple_smile_str = smilite.simplify_smile(smile_str)
51 |     print('{}\n{}\n{}'.format(zinc_id, smile_str, simple_smile_str))
52 |
53 | except IOError as err:
54 |     print('\n\nERROR: {}'.format(err))
55 |     print_usage()
56 |
57 | except IndexError:
58 |     print('\n\nERROR: Invalid command line arguments.')
59 |     print_usage()
60 |
--------------------------------------------------------------------------------
/scripts/csv_scripts/comp_2_smile_files.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Compares SMILES strings between 2 input CSV files, where each
4 | # file consists of rows with 2 columns ZINC_ID,SMILE_string, to
5 | # identify duplicate SMILES strings across both files.
6 | # Generates a new CSV file with ZINC IDs of identified
7 | # duplicates listed in the 4th-nth column(s).
8 | #
9 | #
10 | # Input example file format:
11 | # ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2
12 | # ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2
13 | # ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C
14 | # ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
15 | #
16 | # Output example file format:
17 | # file_origin,zinc_id,smile_str,duplicates
18 | # my_file1.csv,ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2,1,ZINC00029323,
19 | # my_file1.csv,ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C,0,
20 | # my_file1.csv,ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,0,
21 | # my_file2.csv,ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2,1,ZINC00029323,
22 | # my_file2.csv,ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C,0,
23 | # my_file2.csv,ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,0,
24 | #
25 | #
26 | #
27 | # Where
28 | # 1st column: name of the origin file
29 | # 2nd column: ZINC ID
30 | # 3rd column: SMILE string
31 | # 4th-nth column: ZINC IDs of duplicates
32 | #
33 | # Usage:
34 | # [shell]>> python3 comp_2_smile_files.py in1.csv in2.csv out.csv [simplify]
35 | #
36 | # Example 1:
37 | # [shell]>> python3 comp_2_smile_files.py \
38 | # ../examples/zid_smiles2.csv \
39 | # ../examples/zid_smiles3.csv \
40 | # ../examples/comp_2_files.csv
41 | #
42 | # Example 2:
43 | # [shell]>> python3 comp_2_smile_files.py \
44 | # ../examples/zid_smiles2.csv \
45 | # ../examples/zid_smiles3.csv \
46 | # ../examples/comp_2_files.csv simplify
47 |
48 |
49 | import smilite
50 | import sys
51 |
52 |
53 | def print_usage():
54 |     print('\nUSAGE: python3 comp_2_smile_files.py'
55 |           ' in1.csv in2.csv out.csv [simplify]')
56 |     print('\nEXAMPLE 1: python3 comp_2_smile_files.py'
57 |           ' ../examples/zid_smiles2.csv ../examples/zid_smiles3.csv'
58 |           ' ../examples/comp_2_files.csv\n')
59 |     print('\nEXAMPLE 2: python3 comp_2_smile_files.py'
60 |           ' ../examples/zid_smiles2.csv ../examples/zid_smiles3.csv'
61 |           ' ../examples/comp_2_files.csv simplify\n')
62 |
63 |
64 | try:
65 |     in_csv1 = sys.argv[1]
66 |     in_csv2 = sys.argv[2]
67 |     out_csv = sys.argv[3]
68 |     simplify = False
69 |
70 |     if len(sys.argv) > 4:
71 |         simplify = True
72 |
73 |     smilite.comp_two_csvfiles(in_csv1, in_csv2,
74 |                               out_csv, compare_simplified_smiles=simplify)
75 |
76 | except IOError as err:
77 |     print('\n\nERROR: {}'.format(err))
78 |     print_usage()
79 |
80 | except IndexError:
81 |     print('\n\nERROR: Invalid command line arguments.')
82 |     print_usage()
83 |
--------------------------------------------------------------------------------
/scripts/csv_scripts/comp_smile_strings.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Compares SMILE strings within a 2 column CSV file (ZINC_ID,SMILE_string) to
4 | # identify duplicates. Generates a new CSV file with ZINC IDs of identified
5 | # duplicates listed in a 3rd-nth column(s).
6 | #
7 | #
8 | # Input example file format:
9 | # ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2
10 | # ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2
11 | # ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C
12 | # ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
13 | #
14 | # Output example file format:
15 | # zinc_id,smile_str,duplicates
16 | # ZINC00029323,COc1cccc(c1)NC(=O)c2cccnc2,1,ZINC00029323,
17 | # ZINC83310457,Cc1cccc(c1n2c(cc(c2C)/C=N\NC(=O)C(=O)Nc3ccc(cc3)[N+](=O)[O-])C)C,0,
18 | # ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,0,
19 | #
20 | # Where
21 | # 1st column: ZINC ID
22 | # 2nd column: SMILE string
23 | # 3rd column: number of duplicates
24 | # 4th-nth column: ZINC IDs of duplicates
25 | #
26 | # Usage:
27 | # [shell]>> python3 comp_smile_strings.py in.csv out.csv [simplify]
28 | #
29 | # Example 1:
30 | # [shell]>> python3 comp_smile_strings.py \
31 | # ../examples/zid_smiles.csv \
32 | # ../examples/comp_smiles.csv
33 | #
34 | # Example 2:
35 | # [shell]>> python3 comp_smile_strings.py \
36 | # ../examples/zid_smiles.csv \
37 | # ../examples/comp_simple_smiles.csv simplify
38 |
39 |
40 | import smilite
41 | import sys
42 |
43 |
44 | def print_usage():
45 |     print('\nUSAGE: python3 comp_smile_strings.py in.csv out.csv [simplify]')
46 |     print('\nEXAMPLE 1: python3 comp_smile_strings.py'
47 |           ' ../examples/zid_smiles.csv ../examples/comp_smiles.csv\n')
48 |     print('\nEXAMPLE 2: python3 comp_smile_strings.py'
49 |           ' ../examples/zid_smiles.csv'
50 |           ' ../examples/comp_simple_smiles.csv simplify\n')
51 |
52 |
53 | try:
54 |     in_csv = sys.argv[1]
55 |     out_csv = sys.argv[2]
56 |     simplify = False
57 |
58 |     if len(sys.argv) > 3:
59 |         simplify = True
60 |
61 |     smilite.check_duplicate_smiles(in_csv, out_csv,
62 |                                    compare_simplified_smiles=simplify)
63 |
64 | except IOError as err:
65 |     print('\n\nERROR: {}'.format(err))
66 |     print_usage()
67 |
68 | except IndexError:
69 |     print('\n\nERROR: Invalid command line arguments.')
70 |     print_usage()
71 |
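The duplicate check described in the header comment above can be tried offline with a minimal sketch like the following. It is not the smilite implementation: `group_duplicates` and `format_rows` are hypothetical helper names, and the rows are taken from the example input in the comment. The sketch groups ZINC IDs by SMILE string and emits lines in the documented `zinc_id,smile_str,duplicates` layout.

```python
from collections import OrderedDict


def group_duplicates(rows):
    """Map each SMILE string to the ZINC IDs that carry it, in input order.

    Hypothetical helper for illustration; not part of smilite.
    """
    groups = OrderedDict()
    for zinc_id, smile_str in rows:
        groups.setdefault(smile_str, []).append(zinc_id)
    return groups


def format_rows(groups):
    """Render the groups as lines: first ID, SMILE, duplicate count, duplicate IDs."""
    lines = ['zinc_id,smile_str,duplicates']
    for smile_str, ids in groups.items():
        first, rest = ids[0], ids[1:]
        lines.append('{},{},{},{}'.format(
            first, smile_str, len(rest), ''.join(i + ',' for i in rest)))
    return lines


rows = [
    ('ZINC00029323', 'COc1cccc(c1)NC(=O)c2cccnc2'),
    ('ZINC00029323', 'COc1cccc(c1)NC(=O)c2cccnc2'),
    ('ZINC01234567', 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O'),
]
output_lines = format_rows(group_duplicates(rows))
for line in output_lines:
    print(line)
```

Keying a dictionary on the SMILE string keeps the scan linear in the number of rows, which is presumably why the module takes the same general approach.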
--------------------------------------------------------------------------------
/scripts/csv_scripts/gen_zincid_smile_csv.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Generates a ZINC_ID,SMILE_STR CSV file from an input file of
4 | # ZINC IDs. The input file should consist of 1 column with 1 ZINC ID per row.
5 | #
6 | # Input example file format:
7 | # ZINC0000123456
8 | # ZINC0000234567
9 | # ...
10 | #
11 | # Output example file format:
12 | # ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC
13 | # ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
14 | # ...
15 | #
16 | # Usage:
17 | # [shell]>> python3 gen_zincid_smile_csv.py \
18 | # in.csv out.csv [zinc12 (default) / zinc15]
19 | #
20 | # Example:
21 | # [shell]>> python3 gen_zincid_smile_csv.py \
22 | # ../examples/zinc_ids.csv ../examples/zid_smiles.csv zinc15
23 |
24 |
25 | import smilite
26 | import sys
27 |
28 |
29 | def print_usage():
30 | print('\nUSAGE: python3 gen_zincid_smile_csv.py'
31 | ' in.csv out.csv [zinc12 (default) / zinc15]')
32 | print('\nEXAMPLE: python3 gen_zincid_smile_csv.py '
33 | '../examples/zinc_ids.csv ../examples/zid_smiles.csv zinc12\n')
34 |
35 |
36 | try:
37 | in_csv = sys.argv[1]
38 | out_csv = sys.argv[2]
39 |
40 | if len(sys.argv) >= 4:
41 | backend = sys.argv[3]
42 | else:
43 | backend = 'zinc12'
44 |
45 | smilite.generate_zincid_smile_csv(in_csv, out_csv,
46 | print_progress_bar=True,
47 | backend=backend)
48 |
49 | except IOError as err:
50 | print('\n\nERROR: {}'.format(err))
51 | print_usage()
52 |
53 | except IndexError:
54 | print('\n\nERROR: Invalid command line arguments.')
55 | print_usage()
56 |
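The script above forwards any third argument to the library as the backend name. As an alternative, the same command line could be validated up front with `argparse` from the standard library. This is only a sketch of that idea, not how the script is written; `build_parser` is a hypothetical name, and the positional names mirror the usage string.

```python
import argparse


def build_parser():
    """Build a parser for: in.csv out.csv [zinc12 (default) / zinc15].

    Hypothetical alternative to the sys.argv handling in the script.
    """
    parser = argparse.ArgumentParser(prog='gen_zincid_smile_csv.py')
    parser.add_argument('in_csv', help='file with one ZINC ID per row')
    parser.add_argument('out_csv', help='output ZINC_ID,SMILE_STR CSV file')
    parser.add_argument('backend', nargs='?', default='zinc12',
                        choices=['zinc12', 'zinc15'],
                        help='ZINC backend to query (default: zinc12)')
    return parser


parser = build_parser()
default_args = parser.parse_args(['zinc_ids.csv', 'zid_smiles.csv'])
zinc15_args = parser.parse_args(['zinc_ids.csv', 'zid_smiles.csv', 'zinc15'])
print(default_args.backend, zinc15_args.backend)
```

With `choices` set, an unsupported backend name is rejected by the parser with a usage message before any network request is attempted.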
--------------------------------------------------------------------------------
/scripts/sqlite_scripts/add_to_sqlite.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Reads ZINC IDs from a CSV file and looks up SMILE strings
4 | # and simplified SMILE strings
5 | # from the ZINC online database. Writes those SMILE strings
6 | # to a smilite SQLite database.
7 | # A new database will be created if it doesn't exist yet.
8 | #
9 | # Usage:
10 | # [shell]>> python3 add_to_sqlite.py sqlite_file csv_file
11 | #
12 | # Example:
13 | # [shell]>> python3 add_to_sqlite.py \
14 | # ~/Desktop/smilite.sqlite ~/Desktop/zinc_ids.csv
15 | #
16 | #
17 | # Input CSV file example format:
18 | # ZINC01234567
19 | # ZINC01234568
20 | # ...
21 | #
22 | #
23 | #
24 |
25 | import smilite
26 | import sys
27 | import os
28 | import pyprind
29 |
30 |
31 | def print_usage():
32 | print('\nUSAGE: python3 add_to_sqlite.py sqlite_file csv_file')
33 | print('\nEXAMPLE:\n'
34 | 'python3 add_to_sqlite.py'
35 | ' ~/Desktop/smilite.sqlite ~/Desktop/zinc_ids.csv')
36 |
37 |
38 | try:
39 | sqlite_file = sys.argv[1]
40 | csv_file = sys.argv[2]
41 | if not os.path.exists(sqlite_file):
42 | smilite.create_sqlite(sqlite_file)
43 | with open(csv_file, 'r') as in_csv:
44 | all_lines = in_csv.readlines()
45 | pbar = pyprind.ProgBar(len(all_lines), title='Downloading SMILES')
46 | for line in all_lines:
47 | pbar.update()
48 | line = line.strip()
49 | if line:
50 | zinc_id = line.split(',')[0]
51 | smilite.insert_id_sqlite(sqlite_file, zinc_id)
52 |
53 | except IOError as err:
54 | print('\n\nERROR: {}'.format(err))
55 | print_usage()
56 |
57 |
58 | except IndexError:
59 | print('\n\nERROR: Invalid command line arguments.')
60 | print_usage()
61 |
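The storage side of this script can be explored without network access. The sketch below uses an in-memory SQLite database with the same table layout and `INSERT OR IGNORE` statement that appear in `smilite/smilite.py`. The row values are copied from the documented example entry, and inserting the row twice is an assumption made here to show that the primary key prevents duplicates.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(':memory:')
c = conn.cursor()

# Same three-column layout as the smilite table.
c.execute('CREATE TABLE smilite (zinc_id TEXT PRIMARY KEY,'
          ' smile TEXT, simple_smile TEXT)')

row = ('ZINC01234567',
       'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
       'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O')

# Inserting the same ZINC ID twice leaves a single row,
# because zinc_id is the primary key and conflicts are ignored.
for _ in range(2):
    c.execute('INSERT OR IGNORE INTO smilite (zinc_id, smile,'
              ' simple_smile) VALUES (?, ?, ?)', row)
conn.commit()

c.execute('SELECT COUNT(*) FROM smilite')
row_count = c.fetchone()[0]

# Look the entry up again by its simplified SMILE string.
c.execute('SELECT * FROM smilite WHERE simple_smile=?', (row[2],))
matches = c.fetchall()
conn.close()

print(row_count, matches)
```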
--------------------------------------------------------------------------------
/scripts/sqlite_scripts/lookup_single_id.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Retrieves the SMILE string and simplified SMILE string for a given ZINC ID
4 | # from a previously built smilite SQLite database
5 | # or from the online ZINC database.
6 | #
7 | #
8 | # Usage:
9 | # [shell]>> python3 lookup_single_id.py ZINC_ID [sqlite_file]
10 | #
11 | # Example 1 (retrieve data from a smilite SQLite database):
12 | # [shell]>> python3 lookup_single_id.py \
13 | # ZINC01234567 \
14 | # ~/Desktop/smilite_db.sqlite
15 | #
16 | # Example 2 (retrieve data from the ZINC online database):
17 | # [shell]>> python3 lookup_single_id.py ZINC01234567
18 | #
19 | #
20 | #
21 | # Output example:
22 | # ZINC01234567
23 | # C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
24 | # CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
25 | #
26 | # Where
27 | # 1st row: ZINC ID
28 | # 2nd row: SMILE string
29 | # 3rd row: simplified SMILE string
30 | #
31 |
32 | import smilite
33 | import sys
34 |
35 |
36 | def print_usage():
37 | print('\nUSAGE: python3 lookup_single_id.py ZINC_ID [sqlite_file]')
38 | print('\n\nEXAMPLE1 (retrieve data from a smilite SQLite database):\n'
39 | 'python3 lookup_single_id.py ZINC01234567 '
40 | '~/Desktop/smilite_db.sqlite')
41 | print('\n\nEXAMPLE2 (retrieve data from the ZINC online database):\n'
42 | 'python3 lookup_single_id.py ZINC01234567\n')
43 |
44 |
45 | smile_str = ''
46 | simple_smile_str = ''
47 |
48 | try:
49 | zinc_id = sys.argv[1]
50 |
51 | if len(sys.argv) > 2:
52 | sqlite_file = sys.argv[2]
53 | lookup_result = smilite.lookup_id_sqlite(sqlite_file, zinc_id)
54 | try:
55 | smile_str, simple_smile_str = lookup_result[1], lookup_result[2]
56 | except IndexError:
57 | pass
58 |
59 | else:
60 | smile_str = smilite.get_zinc_smile(zinc_id)
61 | if smile_str:
62 | simple_smile_str = smilite.simplify_smile(smile_str)
63 |
64 | print('{}\n{}\n{}'.format(zinc_id, smile_str, simple_smile_str))
65 |
66 | except IOError as err:
67 | print('\n\nERROR: {}'.format(err))
68 | print_usage()
69 |
70 | except IndexError:
71 | print('\n\nERROR: Invalid command line arguments.')
72 | print_usage()
73 |
--------------------------------------------------------------------------------
/scripts/sqlite_scripts/lookup_smile.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Retrieves the ZINC ID(s) for a given SMILE string or simplified SMILE string
4 | # from a previously built smilite SQLite database.
5 | #
6 | #
7 | # Usage:
8 | # [shell]>> python3 lookup_smile.py sqlite_file SMILE_STRING [simplify]
9 | #
10 | # Example 1 (search for SMILE string):
11 | # [shell]>> python3 lookup_smile.py ~/Desktop/smilite.sqlite \
12 | # "C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"
13 | #
14 | # Example 2 (search for simplified SMILE string):
15 | # [shell]>> python3 lookup_smile.py ~/Desktop/smilite.sqlite \
16 | # "CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O" simple
17 | #
18 | #
19 | #
20 | # Output example:
21 | # ZINC01234567
22 | # C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
23 | # CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
24 | #
25 | # Where
26 | # 1st row: ZINC ID
27 | # 2nd row: SMILE string
28 | # 3rd row: simplified SMILE string
29 | #
30 |
31 | import smilite
32 | import sys
33 |
34 |
35 | def print_usage():
36 | print('\nUSAGE: python3 lookup_smile.py '
37 | 'sqlite_file SMILE_STRING [simplify]')
38 | print('\n\nEXAMPLE1 (search for SMILE string):\n'
39 | 'python3 lookup_smile.py ~/Desktop/smilite.sqlite '
40 | '"C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"')
41 | print('\n\nEXAMPLE2 (search for simplified SMILE string):\n'
42 | 'python3 lookup_smile.py ~/Desktop/smilite.sqlite '
43 | '"CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O" simple\n')
44 |
45 |
46 | zinc_id = ''
47 | simple_smile = False
48 |
49 | try:
50 | sqlite_file = sys.argv[1]
51 | smile = sys.argv[2]
52 | if len(sys.argv) > 3:
53 | simple_smile = True
54 |
55 | result = smilite.lookup_smile_sqlite(sqlite_file, smile, simple_smile)
56 | for i in result:
57 | if isinstance(i, list):
58 | for j in i:
59 | print(j)
60 | else:
61 | print(i)
62 |
63 |
64 | except IOError as err:
65 | print('\n\nERROR: {}'.format(err))
66 | print_usage()
67 |
68 | except IndexError:
69 | print('\n\nERROR: Invalid command line arguments.')
70 | print_usage()
71 |
--------------------------------------------------------------------------------
/scripts/sqlite_scripts/sqlite_to_csv.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # Writes contents of an SQLite smilite database to a CSV file.
4 | #
5 | # Usage:
6 | # [shell]>> python3 sqlite_to_csv.py sqlite_file csv_file
7 | #
8 | # Example:
9 | # [shell]>> python3 sqlite_to_csv.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_smiles.csv
10 | #
11 | # Output CSV file example:
12 | # ZINC_ID,SMILE,SIMPLE_SMILE
13 | # ZINC01234568,C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
14 | # ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
15 | #
16 |
17 | import smilite
18 | import sys
19 | import os
20 |
21 | def print_usage():
22 | print('\nUSAGE: python3 sqlite_to_csv.py sqlite_file csv_file')
23 | print('\nEXAMPLE:\n'\
24 | 'python3 sqlite_to_csv.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_smiles.csv')
25 |
26 | try:
27 | sqlite_file = sys.argv[1]
28 | csv_file = sys.argv[2]
29 | smilite.sqlite_to_csv(sqlite_file, csv_file)
30 |
31 | except IOError as err:
32 | print('\n\nERROR: {}'.format(err))
33 | print_usage()
34 |
35 | except IndexError:
36 | print('\n\nERROR: Invalid command line arguments.')
37 | print_usage()
38 |
39 |
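The export format shown in the header comment can also be produced with the standard `csv` module. The sketch below is not the smilite implementation: `dict_to_csv_text` is a hypothetical helper, the dictionary mimics the documented return value of `sqlite_to_dict`, and sorting the IDs is an assumption made here so the output order is reproducible.

```python
import csv
import io


def dict_to_csv_text(zinc_dict):
    """Render {zinc_id: [smile, simple_smile]} as CSV text with a header row.

    Hypothetical helper for illustration; not part of smilite.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['ZINC_ID', 'SMILE', 'SIMPLE_SMILE'])
    for zinc_id in sorted(zinc_dict):
        writer.writerow([zinc_id] + list(zinc_dict[zinc_id]))
    return buf.getvalue()


zinc_dict = {
    'ZINC01234568': ['C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
                     'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O'],
    'ZINC01234567': ['C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
                     'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O'],
}
csv_text = dict_to_csv_text(zinc_dict)
print(csv_text)
```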
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 |
3 | setup(name='smilite',
4 | version='2.3.0',
5 | description='smilite is a Python module to download'
6 | ' and analyze SMILE strings',
7 | author='Sebastian Raschka',
8 | author_email='se.raschka@gmail.com',
9 | url='https://github.com/rasbt/smilite',
10 | packages=['smilite'],
11 | package_data={'': ['LICENSE.txt',
12 | 'README.md',
13 | 'CHANGELOG.txt']},
14 | install_requires=['PyPrind>=2.3.1'],
15 | license='GPLv3',
16 | platforms='any',
17 | classifiers=[
18 | 'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
19 | 'Development Status :: 5 - Production/Stable',
20 | 'Programming Language :: Python :: 3',
21 | 'Environment :: Console',
22 | ],
23 | long_description="""
24 |
25 | smilite is a Python module to download and analyze SMILE strings
26 | (Simplified Molecular-Input Line-entry System)
27 | of chemical compounds from ZINC
28 | (a free database of commercially-available compounds for virtual screening,
29 | http://zinc.docking.org).
30 |
31 | Source repository: https://github.com/rasbt/smilite
32 |
33 | """)
34 |
--------------------------------------------------------------------------------
/smilite/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # A small module to retrieve SMILE strings
4 | # (Simplified molecular-input line-entry system) from the ZINC online
5 | # database (http://zinc.docking.org)
6 |
7 | from .smilite import get_zinc_smile
8 | from .smilite import generate_zincid_smile_csv
9 | from .smilite import simplify_smile
10 | from .smilite import check_duplicate_smiles
11 | from .smilite import comp_two_csvfiles
12 | from .smilite import create_id_smile_list
13 | from .smilite import create_sqlite
14 | from .smilite import insert_id_sqlite
15 | from .smilite import lookup_id_sqlite
16 | from .smilite import lookup_smile_sqlite
17 | from .smilite import sqlite_to_dict
18 | from .smilite import sqlite_to_csv
19 | from .smilite import get_zincid_from_smile
20 |
21 | __version__ = '2.3.1'
22 |
23 |
--------------------------------------------------------------------------------
/smilite/smilite.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014-2020 Sebastian Raschka
2 | #
3 | # smilite is a Python module to download and analyze SMILE strings
4 | # (Simplified Molecular-Input Line-entry System) of chemical compounds
5 | # from ZINC (a free database of commercially-available compounds
6 | # for virtual screening:
7 | # http://zinc.docking.org).
8 | # Now supports both Python 3.x and Python 2.x.
9 |
10 | import sys
11 | import sqlite3
12 | import os
13 | import pyprind
14 |
15 | # Load Python version specific modules
16 | if sys.version_info[0] == 3:
17 | import urllib.request
18 | import urllib.parse
19 | else:
20 | import urllib
21 |
22 |
23 | def get_zinc_smile(zinc_id, backend='zinc12'):
24 | """
25 | Gets the corresponding SMILE string for a ZINC ID query from
26 | the ZINC online database. Requires an internet connection.
27 |
28 | Keyword arguments:
29 | zinc_id (str): A valid ZINC ID, e.g. 'ZINC00029323'
30 | backend (str): zinc12 or zinc15
31 |
32 | Returns the SMILE string for the corresponding ZINC ID.
33 | E.g., 'COc1cccc(c1)NC(=O)c2cccnc2'
34 |
35 | """
36 | if backend not in {'zinc12', 'zinc15'}:
37 | raise ValueError("backend must be 'zinc12' or 'zinc15'")
38 |
39 | stripped_id = zinc_id.strip('ZINC')
40 |
41 | if backend == 'zinc12':
42 | min_len = 8
43 | base_path = 'http://zinc.docking.org/substance/'
44 | line_lookup = ('Draw'
48 |
49 | else:
50 | min_len = 12
51 | base_path = 'http://zinc15.docking.org/substances/'
52 | line_lookup = 'id="substance-smiles-field" readonly value="'
53 | first_linesplit = line_lookup
54 | second_linesplit = '">'
55 |
56 | while len(stripped_id) < min_len:
57 | stripped_id = '0' + stripped_id
58 |
59 | smile_str = None
60 |
61 | try:
62 | if sys.version_info[0] == 3:
63 | response = urllib.request.urlopen(
64 | '{}{}'
65 | .format(base_path, stripped_id))
66 | else:
67 | response = urllib.urlopen('{}{}'
68 | .format(base_path, stripped_id))
69 | except urllib.error.HTTPError:
70 | print('Invalid ZINC ID {}'.format(zinc_id))
71 | response = []
72 | for line in response:
73 | line = line.decode(encoding='UTF-8').strip()
74 | if line_lookup in line:
75 | line = (line.split(first_linesplit)[-1]
76 | .split(second_linesplit)[0])
77 | if sys.version_info[0] == 3:
78 | smile_str = urllib.parse.unquote(line)
79 | else:
80 | smile_str = urllib.unquote(line)
81 | break
82 | return smile_str
83 |
84 |
85 | def get_zincid_from_smile(smile_str, backend='zinc15'):
86 | """
87 | Gets the corresponding ZINC ID(s) for a SMILE string query from
88 | the ZINC online database. Requires an internet connection.
89 |
90 | Keyword arguments:
91 | smile_str (str): A valid SMILE string, e.g.,
92 | 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O'
93 | backend (str): Specifies the database backend, "zinc12" or "zinc15"
94 |
95 | Returns the ZINC ID(s) for the corresponding SMILE string in a list.
96 | E.g., ['ZINC01234567', 'ZINC01234568', 'ZINC01242053', 'ZINC01242055']
97 |
98 | """
99 |
100 | if backend not in {'zinc12', 'zinc15'}:
101 | raise ValueError("backend must be 'zinc12' or 'zinc15'")
102 |
103 | stripped_smile = smile_str.strip()
104 | encoded_smile = urllib.parse.quote(stripped_smile)
105 |
106 | if backend == 'zinc12':
107 | url_part1 = 'http://zinc12.docking.org/results?structure.smiles='
108 | url_part3 = '&structure.similarity=1.0'
109 | elif backend == 'zinc15':
110 | url_part1 = 'http://zinc.docking.org/substances/search/?q='
111 | url_part3 = ''
112 | else:
113 | raise ValueError("Backend must be 'zinc12' or 'zinc15'. "
114 | "Got %s" % (backend))
115 |
116 | zinc_ids = []
117 |
118 | try:
119 | if sys.version_info[0] == 3:
120 | #smile_url = urllib.request.pathname2url(encoded_smile)
121 | response = urllib.request.urlopen('{}{}{}'
122 | .format(url_part1,
123 | encoded_smile,
124 | url_part3))
125 | else:
126 | #smile_url = urllib.pathname2url(encoded_smile)
127 | response = urllib.urlopen('{}{}{}'
128 | .format(url_part1,
129 | encoded_smile,
130 | url_part3))
131 | except urllib.error.HTTPError:
132 | print('Invalid SMILE string {}'.format(smile_str))
133 | response = []
134 | for line in response:
135 | line = line.decode(encoding='UTF-8').strip()
136 |
137 | if backend == 'zinc15':
138 | if line.startswith('')[-2].split('>')[-1]
148 | if sys.version_info[0] == 3:
149 | zinc_id = urllib.parse.unquote(line)
150 | else:
151 | zinc_id = urllib.unquote(line)
152 | zinc_id = 'ZINC' + (8-len(zinc_id)) * '0' + zinc_id
153 | zinc_ids.append(str(zinc_id))
154 | return zinc_ids
155 |
156 |
157 | def generate_zincid_smile_csv(zincid_list, out_file,
158 | print_progress_bar=True, backend='zinc12'):
159 | """
160 | Generates a CSV file of ZINC_ID,SMILE_string entries
161 | by querying the ZINC online database.
162 |
163 | Keyword arguments:
164 | zincid_list (str): Path to a UTF-8 or ASCII formatted file
165 | that contains 1 ZINC_ID per row. E.g.,
166 | ZINC0000123456
167 | ZINC0000234567
168 | [...]
169 | out_file (str): Path to a new output CSV file that will be written.
170 | print_progress_bar (bool): Prints a progress bar to the screen if True.
171 | backend (str): zinc12 or zinc15
172 | """
173 | id_smile_pairs = []
174 | with open(zincid_list, 'r') as infile:
175 | all_lines = infile.readlines()
176 | if print_progress_bar:
177 | pbar = pyprind.ProgBar(len(all_lines), title='Downloading SMILES')
178 | for line in all_lines:
179 | line = line.strip()
180 | id_smile_pairs.append((line, get_zinc_smile(line,
181 | backend=backend)))
182 | if print_progress_bar:
183 | pbar.update()
184 | with open(out_file, 'w') as out:
185 | for p in id_smile_pairs:
186 | out.write('{},{}\n'.format(p[0], p[1]))
187 |
188 |
189 | def check_duplicate_smiles(zincid_list, out_file,
190 | compare_simplified_smiles=False):
191 | """
192 | Scans a ZINC_ID,SMILE_string CSV file for duplicate SMILE strings.
193 |
194 | Keyword arguments:
195 | zincid_list (str): Path to a UTF-8 or ASCII formatted file that
196 | contains 1 ZINC_ID + 1 SMILE String per row.
197 | E.g.,
198 | ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC
199 | ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
200 | [...]
201 | out_file (str): Path to a new output CSV file that will be written.
202 | compare_simplified_smiles (bool): If true,
203 | SMILE strings will be simplified for the comparison.
204 |
205 | """
206 | smile_dict = dict()
207 | with open(zincid_list, 'r') as infile:
208 | all_lines = infile.readlines()
209 | for line in all_lines:
210 | line = line.strip().split(',')
211 | if len(line) == 2:
212 | zinc_id, smile_str = line
213 | if compare_simplified_smiles:
214 | smile_str = simplify_smile(smile_str)
215 | if smile_str not in smile_dict:
216 | smile_dict[smile_str] = [zinc_id]
217 | else:
218 | smile_dict[smile_str].append(zinc_id)
219 |
220 | with open(out_file, 'w') as out:
221 | out.write('zinc_id,smile_str,duplicates')
222 | for entry in smile_dict:
223 | out.write('\n{},{},{},'.format(
224 | smile_dict[entry][0], entry, len(smile_dict[entry]) - 1))
225 | for duplicate in smile_dict[entry][1:]:
226 | out.write(duplicate + ',')
227 | print('\nResults written to', out_file)
228 |
229 |
230 | def create_id_smile_list(id_smile_csv, simplify_smiles=False):
231 | """
232 | Reads in a CSV file and returns a list of [ZINC_ID,SMILE_STR] sublists.
233 |
234 | Keyword arguments:
235 | id_smile_csv (str): Path to a UTF-8 or ASCII formatted file that
236 | contains 1 ZINC_ID + 1 SMILE String per row.
237 | E.g.,
238 | ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC
239 | ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
240 | [...]
241 | simplify_smiles (bool): If true, SMILE strings will be simplified.
242 |
243 | """
244 | smile_list = []
245 | with open(id_smile_csv, 'r') as infile:
246 | all_lines = infile.readlines()
247 | for line in all_lines:
248 | line = line.strip().split(',')
249 | if len(line) == 2:
250 | zinc_id, smile_str = line
251 | if simplify_smiles:
252 | smile_str = simplify_smile(smile_str)
253 | smile_list.append([zinc_id, smile_str])
254 | return smile_list
255 |
256 |
257 | def comp_two_csvfiles(zincid_list1, zincid_list2, out_file,
258 | compare_simplified_smiles=False):
259 | """
260 | Compares SMILE strings across two ZINC_ID CSV files for duplicates
261 | (does not check for duplicates within each file).
262 |
263 | Keyword arguments:
264 | zincid_list1 (str): Path to a UTF-8 or ASCII formatted file that
265 | contains 1 ZINC_ID + 1 SMILE String per row.
266 | E.g.,
267 | ZINC12345678,Cc1ccc(cc1C)OCCOc2c(cc(cc2I)/C=N/n3cnnc3)OC
268 | ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
269 | [...]
270 | zincid_list2 (str): Second ZINC_ID list file, similarly
271 | out_file (str): Path to a new output CSV file that will be written.
272 | compare_simplified_smiles (bool): If true,
273 | SMILE strings will be simplified for the comparison.
274 |
275 | """
276 | smile_list1 = create_id_smile_list(zincid_list1, compare_simplified_smiles)
277 | smile_list2 = create_id_smile_list(zincid_list2, compare_simplified_smiles)
278 |
279 | with open(out_file, 'w') as out:
280 | out.write('file_origin,zinc_id,smile_str,duplicates')
281 | for entry in smile_list1:
282 | out.write('\n{},{},{},'.format(
283 | zincid_list1, entry[0], entry[1]))
284 | for comp in smile_list2:
285 | if comp[1] == entry[1]: # if SMILES are identical
286 | out.write(comp[0] + ',')
287 | for entry in smile_list2:
288 | out.write('\n{},{},{},'.format(
289 | zincid_list2, entry[0], entry[1]))
290 | for comp in smile_list1:
291 | if comp[1] == entry[1]: # if SMILES are identical
292 | out.write(comp[0] + ',')
293 | print('\nResults written to', out_file)
294 |
295 |
296 | def simplify_smile(smile_str):
297 | """
298 | Simplifies a SMILE string by removing hydrogen atoms (H),
299 | chiral specifications ('@'), charges (+ / -), '#'-characters,
300 | and square brackets ('[', ']'), and by uppercasing all letters.
301 |
302 | Keyword Arguments:
303 | smile_str (str): A smile string, e.g., C[C@H](CCC(=O)NCCS(=O)(=O)[O-])
304 |
305 | Returns a simplified SMILE string, e.g., CC(CCC(=O)NCCS(=O)(=O)O)
306 |
307 | """
308 | remove_chars = ['@', '-', '+', 'H', '[', ']', '#']
309 | stripped_smile = []
310 | for sym in smile_str:
311 | if sym.isalpha():
312 | sym = sym.upper()
313 | if sym not in remove_chars:
314 | stripped_smile.append(sym)
315 | return "".join(stripped_smile)
316 |
317 |
318 | def create_sqlite(sqlite_file):
319 | """
320 | Creates a new SQLite database file if it doesn't exist yet.
321 | The database created will consist of 3 columns:
322 | 1) 'zinc_id' (ZINC ID as Primary Key)
323 | 2) 'smile' (SMILE string obtained from the ZINC online db)
324 | 3) 'simple_smile' (simplified SMILE string,
325 | see smilite.simplify_smile())
326 |
327 | Keyword arguments:
328 | sqlite_file (str): Path to the new SQLite database file.
329 |
330 | """
331 | if not os.path.exists(sqlite_file):
332 | # open connection to a sqlite file object
333 | conn = sqlite3.connect(sqlite_file)
334 | c = conn.cursor()
335 |
336 | # creating a new SQLite table with 3 columns
337 | c.execute('CREATE TABLE smilite (zinc_id TEXT PRIMARY KEY,'
338 | ' smile TEXT, simple_smile TEXT)')
339 |
340 | # commit changes and close the connection to the sqlite file object.
341 | conn.commit()
342 | conn.close()
343 |
344 |
345 | def insert_id_sqlite(sqlite_file, zinc_id, backend='zinc12'):
346 | """
347 | Inserts a new ZINC ID into an existing SQLite database if the ZINC ID
348 | isn't contained in the database yet. Obtains the SMILE string from the
349 | ZINC online database and adds it to the new ZINC ID database entry together
350 | with a simplified SMILE string.
351 |
352 | Example database entry:
353 | zinc_id,smile,simple_smile
354 | "ZINC01234567","C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O","CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O"
355 |
356 | Keyword arguments:
357 | sqlite_file (str): Path to an existing SQLite database file
358 | zinc_id (str): A valid ZINC ID
359 |
360 | Returns True if insertion was successful, else returns False.
361 |
362 | """
363 | success = False
364 | if os.path.exists(sqlite_file):
365 | # open connection to a sqlite file object
366 | conn = sqlite3.connect(sqlite_file)
367 | c = conn.cursor()
368 |
369 | # get smile string and simplified smile string
370 | smile_str = get_zinc_smile(zinc_id, backend=backend)
371 | if smile_str:
372 | simple_smile = simplify_smile(smile_str)
373 |
374 | # insert data into database
375 | if smile_str and simple_smile:
376 | c.execute('INSERT OR IGNORE INTO smilite (zinc_id, smile,'
377 | ' simple_smile) VALUES (?, ?, ?)',
378 | (zinc_id, smile_str, simple_smile))
379 | success = True
380 |
381 | # commit changes and close the connection to the sqlite file object.
382 | conn.commit()
383 | conn.close()
384 | return success
385 |
386 | else:
387 | return success
388 |
389 |
390 | def lookup_id_sqlite(sqlite_file, zinc_id):
391 | """
392 | Looks up a ZINC ID in an existing SQLite database file.
393 |
394 | Keyword arguments:
395 | sqlite_file (str): Path to an existing SQLite database file
396 | zinc_id (str): A valid ZINC ID
397 |
398 | Returns a list with the ZINC ID, SMILE string, and simplified SMILE
399 | string or an empty list if ZINC ID could not be found.
400 | Example returned list:
401 | ['ZINC01234567', 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
402 | 'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O']
403 |
404 | """
405 | result = []
406 | if os.path.exists(sqlite_file):
407 |
408 | # open connection to a sqlite file object
409 | conn = sqlite3.connect(sqlite_file)
410 | c = conn.cursor()
411 |
412 | c.execute('SELECT * FROM smilite WHERE zinc_id=?', (zinc_id,))
413 | all_rows = c.fetchall()
414 | try:
415 | result = [i for i in all_rows[0]]
416 | except IndexError:
417 | pass
418 |
419 | # close the connection to the sqlite file object.
420 | conn.close()
421 |
422 | return result
423 |
424 |
425 | def lookup_smile_sqlite(sqlite_file, smile_str, simple_smile=False):
426 | """
427 | Looks up a ZINC ID for a given SMILE string in an existing
428 | SQLite database file.
429 |
430 | Keyword arguments:
431 | sqlite_file (str): Path to an existing SQLite database file
432 | smile_str (str): A SMILE string to query the database
433 | simple_smile (bool): Queries simplified smile strings in the
434 | database if true
435 |
436 | Returns a list of sublists, one per matching database entry, each with
437 | the ZINC ID, SMILE string, and simplified SMILE string, or an empty
438 | list if the SMILE string could not be found.
439 | Example returned list:
440 | [['ZINC01234567', 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
441 | 'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O']]
442 | If multiple ZINC IDs match the query, the list holds one sublist per ID.
443 |
444 | """
445 | result = []
446 | if os.path.exists(sqlite_file):
447 |
448 | # open connection to a sqlite file object
449 | conn = sqlite3.connect(sqlite_file)
450 | c = conn.cursor()
451 |
452 | if simple_smile:
453 | c.execute('SELECT * FROM smilite WHERE simple_smile=?',
454 | (smile_str,))
455 | else:
456 | c.execute('SELECT * FROM smilite WHERE smile=?', (smile_str,))
457 | all_rows = c.fetchall()
458 | try:
459 | for i in all_rows:
460 | result.append([j for j in i])
461 | except IndexError:
462 | pass
463 |
464 | # close the connection to the sqlite file object.
465 | conn.close()
466 |
467 | return result
468 |
469 |
470 | def sqlite_to_dict(sqlite_file):
471 | """
472 | Returns contents of an SQLite smilite database as Python dictionary object.
473 |
474 | Keyword arguments:
475 | sqlite_file (str): Path to an existing SQLite database file
476 |
477 | Returns an SQLite smilite database as Python dictionary object with
478 | ZINC IDs as keys and corresponding
479 | [SMILE_string, Simple_SMILE_string] lists as values.
480 |
481 | Example returned dictionary:
482 | {
483 | 'ZINC01234568': ['C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
484 | 'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O'],
485 | 'ZINC01234567': ['C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O',
486 | 'CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O']
487 | }
488 |
489 | """
490 | result = {}
491 | if os.path.exists(sqlite_file):
492 |
493 | # open connection to a sqlite file object
494 | conn = sqlite3.connect(sqlite_file)
495 | c = conn.cursor()
496 |
497 | c.execute('SELECT * FROM smilite')
498 | all_rows = c.fetchall()
499 | try:
500 | result = {i[0]: [i[1], i[2]] for i in all_rows}
501 | except IndexError:
502 | pass
503 |
504 | # close the connection to the sqlite file object.
505 | conn.close()
506 |
507 | return result
508 |
509 |
510 | def sqlite_to_csv(sqlite_file, csv_file):
511 | """
512 | Writes contents of an SQLite smilite database to a CSV file.
513 |
514 | Keyword arguments:
515 | sqlite_file (str): Path to an existing SQLite database file
516 | csv_file (str): Path to the output CSV file
517 |
518 | Example output CSV file contents:
519 |
520 | ZINC_ID,SMILE,SIMPLE_SMILE
521 | ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
522 | ZINC01234568,C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
523 | ...
524 |
525 | """
526 | zinc_dict = sqlite_to_dict(sqlite_file)
527 | with open(csv_file, 'w') as out_csv:
528 | out_csv.write('ZINC_ID,SMILE,SIMPLE_SMILE\n')
529 | for k in zinc_dict.keys():
530 | line = '{},{}\n'.format(k, ",".join(zinc_dict[k]))
531 | out_csv.write(line)
532 |
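The simplification step is pure string processing, so its effect is easy to check in isolation. The sketch below restates the logic of `simplify_smile` from the module above as a standalone function (named `simplify` here only to keep the example self-contained) and applies it to the two stereoisomers used in the docstring examples.

```python
# Characters dropped by the simplification, as listed in simplify_smile.
REMOVE_CHARS = {'@', '-', '+', 'H', '[', ']', '#'}


def simplify(smile_str):
    """Standalone restatement of simplify_smile for experimentation."""
    kept = []
    for sym in smile_str:
        if sym.isalpha():
            sym = sym.upper()  # aromatic lowercase atoms become uppercase
        if sym not in REMOVE_CHARS:
            kept.append(sym)
    return ''.join(kept)


isomer_a = 'C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O'
isomer_b = 'C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O'

simple_a = simplify(isomer_a)
simple_b = simplify(isomer_b)
print(simple_a)              # CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
print(simple_a == simple_b)  # True
```

Both isomers collapse to the same simplified string, which is what makes comparisons on simplified strings useful for spotting near-duplicates. The transformation is lossy: the result is a comparison key, not necessarily a valid SMILES string.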
--------------------------------------------------------------------------------
/test/test_get_zinc_smile.py:
--------------------------------------------------------------------------------
1 | # Sebastian Raschka, 02/2014
2 |
3 | import smilite
4 |
5 | def test_get_zinc_smile():
6 | out = smilite.get_zinc_smile('ZINC00029323')
7 | assert out == 'COc1cccc(c1)NC(=O)c2cccnc2'
8 |
9 |
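The test above needs a network connection, but the ID handling that precedes the request can be checked offline. The sketch below is an assumption-labeled variant, not the module's code: `normalize_zinc_number` is a hypothetical helper that removes the `ZINC` prefix explicitly and left-pads the digits with `str.zfill`. The widths 8 and 12 are the `min_len` values used for the zinc12 and zinc15 backends in `get_zinc_smile`.

```python
def normalize_zinc_number(zinc_id, min_len=8):
    """Return the numeric part of a ZINC ID, left-padded with zeros.

    Hypothetical helper for illustration; not part of smilite.
    """
    digits = zinc_id.strip()
    if digits.upper().startswith('ZINC'):
        digits = digits[4:]  # drop the literal 'ZINC' prefix
    return digits.zfill(min_len)


zinc12_number = normalize_zinc_number('ZINC00029323')
padded_number = normalize_zinc_number('ZINC29323')
zinc15_number = normalize_zinc_number('ZINC00029323', min_len=12)
print(zinc12_number, padded_number, zinc15_number)
```

Slicing off the prefix avoids relying on `str.strip('ZINC')`, which treats its argument as a set of characters rather than a literal prefix. For all-digit IDs the two approaches give the same result.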
--------------------------------------------------------------------------------
/test/test_simplify_smile.py:
--------------------------------------------------------------------------------
1 | # Sebastian Raschka, 02/2014
2 |
3 | import smilite
4 |
5 | def test_simplify_smile():
6 | out = smilite.simplify_smile('C[C@H](CCC(=O)NCCS(=O)(=O)[O-])')
7 | assert out == 'CC(CCC(=O)NCCS(=O)(=O)O)'
8 |
9 |
--------------------------------------------------------------------------------