├── requirements.txt ├── README.md ├── NC_005816.gb └── My_sample_notebook.ipynb /requirements.txt: -------------------------------------------------------------------------------- 1 | folium==0.6.0 2 | ipython==7.16.3 3 | matplotlib==2.1.0 4 | pandas==1.0.3 5 | biopython==1.76 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Example Jupyter notebook with Binder & Colab integration 2 | 3 | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jperkel/example_notebook/master) 4 | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jperkel/example_notebook/blob/master/My_sample_notebook.ipynb) 5 | [![Open in Code Ocean](https://codeocean.com/codeocean-assets/badge/open-in-code-ocean.svg)](https://codeocean.com/2018/11/09/why-jupyter-is-data-scientists-rsquo-computational-notebook-of-choice/code) 6 | 7 | _Updated 09 Nov 2018: This code can also be run in Code Ocean. Click [here](https://codeocean.com/2018/11/09/why-jupyter-is-data-scientists-rsquo-computational-notebook-of-choice/code) (or the button above) to launch the notebook on that platform. Thanks to Code Ocean's Simon Adar ([@SimonAdar](https://twitter.com/simonadar)) and [Seth Green](https://github.com/setgree) for the help!_ 8 | 9 | This simple [Jupyter](https://jupyter.org/) notebook -- written to accompany a [_Nature_ Toolbox](https://www.nature.com/articles/d41586-018-07196-1) feature published on 30 October 2018 -- demonstrates how the computational notebook format allows users to interleave text, code, and results in a single file. 10 | 11 | But, unless you have Jupyter notebook installed on your computer, all you can do is view the notebooks, not play with them. 
(See for yourself: If you click `My_sample_notebook.ipynb` in this GitHub repository, you will be able to read the notebook, but only as a static document.) This is where [Binder](https://mybinder.org), Google's [Colaboratory](https://research.google.com/colaboratory/) environment, and [Code Ocean](https://codeocean.com) come in. Binder is a free, open-source web service that packages Jupyter notebooks inside an executable container, which can be run in a web browser with no installation required. Colab allows users with Google accounts to execute Jupyter notebooks on the Google cloud. Code Ocean is a commercial code-execution and -sharing service. 12 | 13 | **To execute the notebook in Binder:** 14 | 1. Click the `launch binder` button above. Once the demo launches, click `My_sample_notebook.ipynb` in the file listing. 15 | 2. Run the notebook by selecting `Cell > Run All`. 16 | 3. Take a look at the graph below the fifth cell (labeled 'The First 25 Fibonacci Numbers'). 17 | 4. Uncomment the line in the fifth cell that reads `# ax.plot (range(25), ar)` by removing the leading hashtag (`#`). 18 | 5. Click `Cell > Run All` again. You should see a change in the graph below that cell. 19 | 20 | **To execute the notebook in Colab:** 21 | 1. Click the `Open in Colab` button above. It will launch the notebook directly. 22 | 2. Make the notebook live by clicking 'Connect' in the Colab toolbar. 23 | 3. You'll need to uncomment a few lines of code to make the notebook work. Navigate to the first code cell, which reads: 24 | `# !pip install biopython` 25 | `# !pip install folium` 26 | `# !curl -O https://raw.githubusercontent.com/jperkel/example_notebook/master/NC_005816.gb` 27 | 4. Uncomment all three lines by removing the leading `#` from each. 28 | 5. Select `Runtime > Run All` in the menu to execute the notebook. (You may get a warning that the page was not authored by Google.) 29 | 6. Take a look at the graph below the fifth cell (labeled 'The First 25 Fibonacci Numbers'). 30 | 7. 
Uncomment the line in the fifth cell that reads `# ax.plot (range(25), ar)` by removing the leading hashtag (`#`) 31 | 8. Click `Runtime > Run All` again. You should see a change in the graph below that cell. 32 | -------------------------------------------------------------------------------- /NC_005816.gb: -------------------------------------------------------------------------------- 1 | LOCUS NC_005816 9609 bp DNA circular BCT 21-JUL-2008 2 | DEFINITION Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete 3 | sequence. 4 | ACCESSION NC_005816 5 | VERSION NC_005816.1 GI:45478711 6 | DBLINK Project: 58037 7 | KEYWORDS . 8 | SOURCE Yersinia pestis biovar Microtus str. 91001 9 | ORGANISM Yersinia pestis biovar Microtus str. 91001 10 | Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; 11 | Enterobacteriaceae; Yersinia. 12 | REFERENCE 1 (bases 1 to 9609) 13 | AUTHORS Zhou,D., Tong,Z., Song,Y., Han,Y., Pei,D., Pang,X., Zhai,J., Li,M., 14 | Cui,B., Qi,Z., Jin,L., Dai,R., Du,Z., Wang,J., Guo,Z., Wang,J., 15 | Huang,P. and Yang,R. 16 | TITLE Genetics of metabolic variations between Yersinia pestis biovars 17 | and the proposal of a new biovar, microtus 18 | JOURNAL J. Bacteriol. 186 (15), 5147-5152 (2004) 19 | PUBMED 15262951 20 | REFERENCE 2 (bases 1 to 9609) 21 | AUTHORS Song,Y., Tong,Z., Wang,J., Wang,L., Guo,Z., Han,Y., Zhang,J., 22 | Pei,D., Zhou,D., Qin,H., Pang,X., Han,Y., Zhai,J., Li,M., Cui,B., 23 | Qi,Z., Jin,L., Dai,R., Chen,F., Li,S., Ye,C., Du,Z., Lin,W., 24 | Wang,J., Yu,J., Yang,H., Wang,J., Huang,P. and Yang,R. 25 | TITLE Complete genome sequence of Yersinia pestis strain 91001, an 26 | isolate avirulent to humans 27 | JOURNAL DNA Res. 
11 (3), 179-197 (2004) 28 | PUBMED 15368893 29 | REFERENCE 3 (bases 1 to 9609) 30 | CONSRTM NCBI Genome Project 31 | TITLE Direct Submission 32 | JOURNAL Submitted (16-MAR-2004) National Center for Biotechnology 33 | Information, NIH, Bethesda, MD 20894, USA 34 | REFERENCE 4 (bases 1 to 9609) 35 | AUTHORS Song,Y., Tong,Z., Wang,L., Han,Y., Zhang,J., Pei,D., Wang,J., 36 | Zhou,D., Han,Y., Pang,X., Zhai,J., Chen,F., Qin,H., Wang,J., Li,S., 37 | Guo,Z., Ye,C., Du,Z., Lin,W., Wang,J., Yu,J., Yang,H., Wang,J., 38 | Huang,P. and Yang,R. 39 | TITLE Direct Submission 40 | JOURNAL Submitted (24-APR-2003) The Institute of Microbiology and 41 | Epidemiology, Academy of Military Medical Sciences, No. 20, 42 | Dongdajie Street, Fengtai District, Beijing 100071, People's 43 | Republic of China 44 | COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final 45 | NCBI review. The reference sequence was derived from AE017046. 46 | COMPLETENESS: full length. 47 | FEATURES Location/Qualifiers 48 | source 1..9609 49 | /organism="Yersinia pestis biovar Microtus str. 91001" 50 | /mol_type="genomic DNA" 51 | /strain="91001" 52 | /db_xref="taxon:229193" 53 | /plasmid="pPCP1" 54 | /biovar="Microtus" 55 | repeat_region 1..1954 56 | gene 87..1109 57 | /locus_tag="YP_pPCP01" 58 | /db_xref="GeneID:2767718" 59 | CDS 87..1109 60 | /locus_tag="YP_pPCP01" 61 | /note="similar to corresponding CDS from previously 62 | sequenced pPCP plasmid of Yersinia pestis KIM (AF053945) 63 | and CO92 (AL109969), also many transposase entries for 64 | insertion sequence IS100 of Yersinia pestis. 
Contains 65 | IS21-like element transposase, HTH domain 66 | (Interpro|IPR007101)" 67 | /codon_start=1 68 | /transl_table=11 69 | /product="putative transposase" 70 | /protein_id="NP_995567.1" 71 | /db_xref="GI:45478712" 72 | /db_xref="GeneID:2767718" 73 | /translation="MVTFETVMEIKILHKQGMSSRAIARELGISRNTVKRYLQAKSEP 74 | PKYTPRPAVASLLDEYRDYIRQRIADAHPYKIPATVIAREIRDQGYRGGMTILRAFIR 75 | SLSVPQEQEPAVRFETEPGRQMQVDWGTMRNGRSPLHVFVAVLGYSRMLYIEFTDNMR 76 | YDTLETCHRNAFRFFGGVPREVLYDNMKTVVLQRDAYQTGQHRFHPSLWQFGKEMGFS 77 | PRLCRPFRAQTKGKVERMVQYTRNSFYIPLMTRLRPMGITVDVETANRHGLRWLHDVA 78 | NQRKHETIQARPCDRWLEEQQSMLALPPEKKEYDVHLDENLVNFDKHPLHHPLSIYDS 79 | FCRGVA" 80 | misc_feature 87..959 81 | /locus_tag="YP_pPCP01" 82 | /note="Transposase and inactivated derivatives [DNA 83 | replication, recombination, and repair]; Region: COG4584" 84 | /db_xref="CDD:34222" 85 | misc_feature <111..209 86 | /locus_tag="YP_pPCP01" 87 | /note="Helix-turn-helix domain of Hin and related 88 | proteins, a family of DNA-binding domains unique to 89 | bacteria and represented by the Hin protein of Salmonella. 90 | The basic HTH domain is a simple fold comprised of three 91 | core helices that form a right-handed...; Region: 92 | HTH_Hin_like; cl01116" 93 | /db_xref="CDD:186341" 94 | misc_feature 438..812 95 | /locus_tag="YP_pPCP01" 96 | /note="Integrase core domain; Region: rve; cl01316" 97 | /db_xref="CDD:194099" 98 | gene 1106..1888 99 | /locus_tag="YP_pPCP02" 100 | /db_xref="GeneID:2767716" 101 | CDS 1106..1888 102 | /locus_tag="YP_pPCP02" 103 | /note="similar to corresponding CDS form previously 104 | sequenced pPCP plasmid of Yersinia pestis KIM (AF053945) 105 | and CO92 (AL109969), also many ATP-binding protein entries 106 | for insertion sequence IS100 of Yersinia pestis. Contains 107 | Chaperonin clpA/B (Interpro|IPR001270). Contains 108 | ATP/GTP-binding site motif A (P-loop) (Interpro|IPR001687, 109 | Molecular Function: ATP binding (GO:0005524)). 
Contains 110 | Bacterial chromosomal replication initiator protein, DnaA 111 | (Interpro|IPR001957, Molecular Function: DNA binding 112 | (GO:0003677), Molecular Function: DNA replication origin 113 | binding (GO:0003688), Molecular Function: ATP binding 114 | (GO:0005524), Biological Process: DNA replication 115 | initiation (GO:0006270), Biological Process: regulation of 116 | DNA replication (GO:0006275)). Contains AAA ATPase 117 | (Interpro|IPR003593, Molecular Function: nucleotide 118 | binding (GO:0000166))" 119 | /codon_start=1 120 | /transl_table=11 121 | /product="transposase/IS protein" 122 | /protein_id="NP_995568.1" 123 | /db_xref="GI:45478713" 124 | /db_xref="GeneID:2767716" 125 | /translation="MMMELQHQRLMALAGQLQLESLISAAPALSQQAVDQEWSYMDFL 126 | EHLLHEEKLARHQRKQAMYTRMAAFPAVKTFEEYDFTFATGAPQKQLQSLRSLSFIER 127 | NENIVLLGPSGVGKTHLAIAMGYEAVRAGIKVRFTTAADLLLQLSTAQRQGRYKTTLQ 128 | RGVMAPRLLIIDEIGYLPFSQEEAKLFFQVIAKRYEKSAMILTSNLPFGQWDQTFAGD 129 | AALTSAMLDRILHHSHVVQIKGESYRLRQKRKAGVIAEANPE" 130 | misc_feature 1109..1885 131 | /locus_tag="YP_pPCP02" 132 | /note="transposase/IS protein; Provisional; Region: 133 | PRK09183" 134 | /db_xref="CDD:181681" 135 | misc_feature 1367..>1669 136 | /locus_tag="YP_pPCP02" 137 | /note="The AAA+ (ATPases Associated with a wide variety of 138 | cellular Activities) superfamily represents an ancient 139 | group of ATPases belonging to the ASCE (for additional 140 | strand, catalytic E) division of the P-loop NTPase fold. 
141 | The ASCE division also includes...; Region: AAA; cd00009" 142 | /db_xref="CDD:99707" 143 | misc_feature 1433..1456 144 | /locus_tag="YP_pPCP02" 145 | /note="Walker A motif; other site" 146 | /db_xref="CDD:99707" 147 | misc_feature order(1436..1459,1619..1621) 148 | /locus_tag="YP_pPCP02" 149 | /note="ATP binding site [chemical binding]; other site" 150 | /db_xref="CDD:99707" 151 | misc_feature 1607..1624 152 | /locus_tag="YP_pPCP02" 153 | /note="Walker B motif; other site" 154 | /db_xref="CDD:99707" 155 | gene 2925..3119 156 | /gene="rop" 157 | /locus_tag="YP_pPCP03" 158 | /gene_synonym="rom" 159 | /db_xref="GeneID:2767717" 160 | CDS 2925..3119 161 | /gene="rop" 162 | /locus_tag="YP_pPCP03" 163 | /gene_synonym="rom" 164 | /note="Best Blastp hit =gi|16082682|ref|NP_395229.1| 165 | (NC_003132) putative replication regulatory protein 166 | [Yersinia pestis], gi|5763813|emb|CAB531 66.1| (AL109969) 167 | putative replication regulatory protein [Yersinia pestis]; 168 | similar to gb|AAK91579.1| (AY048853), RNAI modulator 169 | protein Rom [Salmonella choleraesuis], Contains Regulatory 170 | protein Rop (Interpro|IPR000769)" 171 | /codon_start=1 172 | /transl_table=11 173 | /product="putative replication regulatory protein" 174 | /protein_id="NP_995569.1" 175 | /db_xref="GI:45478714" 176 | /db_xref="GeneID:2767717" 177 | /translation="MNKQQQTALNMARFIRSQSLILLEKLDALDADEQAAMCERLHEL 178 | AEELQNSIQARFEAESETGT" 179 | misc_feature 2925..3107 180 | /gene="rop" 181 | /locus_tag="YP_pPCP03" 182 | /gene_synonym="rom" 183 | /note="Rop protein; Region: Rop; pfam01815" 184 | /db_xref="CDD:145136" 185 | gene 3486..3857 186 | /locus_tag="YP_pPCP04" 187 | /db_xref="GeneID:2767720" 188 | CDS 3486..3857 189 | /locus_tag="YP_pPCP04" 190 | /note="Best Blastp hit = gi|321919|pir||JQ1541 191 | hypothetical 16.9K protein - Salmonella typhi murium 192 | plasmid NTP16." 
193 | /codon_start=1 194 | /transl_table=11 195 | /product="hypothetical protein" 196 | /protein_id="NP_995570.1" 197 | /db_xref="GI:45478715" 198 | /db_xref="GeneID:2767720" 199 | /translation="MSKKRRPQKRPRRRRFFHRLRPPDEHHKNRRSSQRWRNPTGLKD 200 | TRRFPPEAPSCALLFRPCRLPDTSPPFSLREAWRFLIAHAVGISVRCRSFAPSWAVCT 201 | NPPFSPTTAPYPVTIVLSPTR" 202 | misc_feature 3498..3626 203 | /locus_tag="YP_pPCP04" 204 | /note="ProfileScan match to entry PS50323 ARG_RICH, 205 | E-value 8.981" 206 | gene 4343..4780 207 | /gene="pim" 208 | /locus_tag="YP_pPCP05" 209 | /db_xref="GeneID:2767712" 210 | CDS 4343..4780 211 | /gene="pim" 212 | /locus_tag="YP_pPCP05" 213 | /note="similar to many previously sequenced pesticin 214 | immunity protein entries of Yersinia pestis plasmid pPCP, 215 | e.g. gi| 16082683|,ref|NP_395230.1| (NC_003132) , 216 | gi|1200166|emb|CAA90861.1| (Z54145 ) , gi|1488655| 217 | emb|CAA63439.1| (X92856) , gi|2996219|gb|AAC62543.1| 218 | (AF053945) , and gi|5763814|emb|CAB531 67.1| (AL109969)" 219 | /codon_start=1 220 | /transl_table=11 221 | /product="pesticin immunity protein" 222 | /protein_id="NP_995571.1" 223 | /db_xref="GI:45478716" 224 | /db_xref="GeneID:2767712" 225 | /translation="MGGGMISKLFCLALIFLSSSGLAEKNTYTAKDILQNLELNTFGN 226 | SLSHGIYGKQTTFKQTEFTNIKSNTKKHIALINKDNSWMISLKILGIKRDEYTVCFED 227 | FSLIRPPTYVAIHPLLIKKVKSGNFIVVKEIKKSIPGCTVYYH" 228 | gene complement(4815..5888) 229 | /gene="pst" 230 | /locus_tag="YP_pPCP06" 231 | /db_xref="GeneID:2767721" 232 | CDS complement(4815..5888) 233 | /gene="pst" 234 | /locus_tag="YP_pPCP06" 235 | /note="Best Blastp hit =|16082684|ref|NP_395231.1| 236 | (NC_003132) pesticin [Yersinia pestis], 237 | gi|984824|gb|AAA75369.1| (U31974) pesticin [Yersinia 238 | pestis], gi|1488654|emb|CAA63438.1| (X92856) pesticin 239 | [Yersinia pestis], gi|2996220|gb|AAC62544.1| (AF053945) 240 | pesticin [Yersinia pestis], gi|5763815|emb|CAB53168.1| 241 | (AL1099 69) pesticin [Yersinia pestis]" 242 | /codon_start=1 243 | /transl_table=11 244 | 
/product="pesticin" 245 | /protein_id="NP_995572.1" 246 | /db_xref="GI:45478717" 247 | /db_xref="GeneID:2767721" 248 | /translation="MSDTMVVNGSGGVPAFLFSGSTLSSYRPNFEANSITIALPHYVD 249 | LPGRSNFKLMYIMGFPIDTEMEKDSEYSNKIRQESKISKTEGTVSYEQKITVETGQEK 250 | DGVKVYRVMVLEGTIAESIEHLDKKENEDILNNNRNRIVLADNTVINFDNISQLKEFL 251 | RRSVNIVDHDIFSSNGFEGFNPTSHFPSNPSSDYFNSTGVTFGSGVDLGQRSKQDLLN 252 | DGVPQYIADRLDGYYMLRGKEAYDKVRTAPLTLSDNEAHLLSNIYIDKFSHKIEGLFN 253 | DANIGLRFSDLPLRTRTALVSIGYQKGFKLSRTAPTVWNKVIAKDWNGLVNAFNNIVD 254 | GMSDRRKREGALVQKDIDSGLLK" 255 | variation 5910..5911 256 | /note="compared to AF053945" 257 | /replace="" 258 | variation 5933^5934 259 | /note="compared to AL109969" 260 | /replace="a" 261 | variation 5933^5934 262 | /note="compared to AF053945" 263 | /replace="aa" 264 | variation 5948 265 | /note="compared to AL109969" 266 | /replace="c" 267 | gene 6005..6421 268 | /locus_tag="YP_pPCP07" 269 | /db_xref="GeneID:2767719" 270 | CDS 6005..6421 271 | /locus_tag="YP_pPCP07" 272 | /note="Best Blastp hit = gi|16082685|ref|NP_395232.1| 273 | (NC_003132) hypothetical protein [Yersinia pestis], 274 | gi|5763816|emb|CAB53169.1| (AL109969) hypothetical protein 275 | [Yersinia pestis]" 276 | /codon_start=1 277 | /transl_table=11 278 | /product="hypothetical protein" 279 | /protein_id="NP_995573.1" 280 | /db_xref="GI:45478718" 281 | /db_xref="GeneID:2767719" 282 | /translation="MKFHFCDLNHSYKNQEGKIRSRKTAPGNIRKKQKGDNVSKTKSG 283 | RHRLSKTDKRLLAALVVAGYEERTARDLIQKHVYTLTQADLRHLVSEISNGVGQSQAY 284 | DAIYQARRIRLARKYLSGKKPEGVEPREGQEREDLP" 285 | variation 6525 286 | /note="compared to AF053945 and AL109969" 287 | /replace="c" 288 | gene 6664..7602 289 | /gene="pla" 290 | /locus_tag="YP_pPCP08" 291 | /db_xref="GeneID:2767715" 292 | CDS 6664..7602 293 | /gene="pla" 294 | /locus_tag="YP_pPCP08" 295 | /EC_number="3.4.23.48" 296 | /note="outer membrane protease; involved in virulence in 297 | many organisms; OmpT; IcsP; SopA; Pla; PgtE; omptin; in 298 | Escherichia coli OmpT can degrade 
antimicrobial peptides; 299 | in Yersinia Pla activates plasminogen during infection; in 300 | Shigella flexneria SopA cleaves the autotransporter IcsA" 301 | /codon_start=1 302 | /transl_table=11 303 | /product="outer membrane protease" 304 | /protein_id="NP_995574.1" 305 | /db_xref="GI:45478719" 306 | /db_xref="GeneID:2767715" 307 | /translation="MKKSSIVATIITILSGSANAASSQLIPNISPDSFTVAASTGMLS 308 | GKSHEMLYDAETGRKISQLDWKIKNVAILKGDISWDPYSFLTLNARGWTSLASGSGNM 309 | DDYDWMNENQSEWTDHSSHPATNVNHANEYDLNVKGWLLQDENYKAGITAGYQETRFS 310 | WTATGGSYSYNNGAYTGNFPKGVRVIGYNQRFSMPYIGLAGQYRINDFELNALFKFSD 311 | WVRAHDNDEHYMRDLTFREKTSGSRYYGTVINAGYYVTPNAKVFAEFTYSKYDEGKGG 312 | TQIIDKNSGDSVSIGGDAAGISNKNYTVTAGLQYRF" 313 | misc_feature 6664..7599 314 | /gene="pla" 315 | /locus_tag="YP_pPCP08" 316 | /note="Omptin family; Region: Omptin; cl01886" 317 | /db_xref="CDD:186487" 318 | gene complement(7789..8088) 319 | /locus_tag="YP_pPCP09" 320 | /db_xref="GeneID:2767713" 321 | CDS complement(7789..8088) 322 | /locus_tag="YP_pPCP09" 323 | /note="Best Blastp hit = gi|16082687|ref|NP_395234.1| 324 | (NC_003132) putative transcriptional regulator [Yersinia 325 | pestis], gi|5763818|emb|CAB53171.1| (AL109969) putative 326 | transcriptional regulator [Yersinia pestis]." 327 | /codon_start=1 328 | /transl_table=11 329 | /product="putative transcriptional regulator" 330 | /protein_id="NP_995575.1" 331 | /db_xref="GI:45478720" 332 | /db_xref="GeneID:2767713" 333 | /translation="MRTLDEVIASRSPESQTRIKEMADEMILEVGLQMMREELQLSQK 334 | QVAEAMGISQPAVTKLEQRGNDLKLATLKRYVEAMGGKLSLDVELPTGRRVAFHV" 335 | misc_feature complement(7837..7995) 336 | /locus_tag="YP_pPCP09" 337 | /note="Helix-turn-helix XRE-family like proteins. 
338 | Prokaryotic DNA binding proteins belonging to the 339 | xenobiotic response element family of transcriptional 340 | regulators; Region: HTH_XRE; cl09100" 341 | /db_xref="CDD:195788" 342 | gene complement(8088..8360) 343 | /locus_tag="YP_pPCP10" 344 | /db_xref="GeneID:2767714" 345 | CDS complement(8088..8360) 346 | /locus_tag="YP_pPCP10" 347 | /note="Best Blastp hit = gi|16082688|ref|NP_395235.1| 348 | (NC_003132) hypothetical protein [ Yersinia pestis], 349 | gi|5763819|emb|CAB53172.1| (AL109969) hypothetical protein 350 | [Yersinia pestis]" 351 | /codon_start=1 352 | /transl_table=11 353 | /product="hypothetical protein" 354 | /protein_id="NP_995576.1" 355 | /db_xref="GI:45478721" 356 | /db_xref="GeneID:2767714" 357 | /translation="MADLKKLQVYGPELPRPYADTVKGSRYKNMKELRVQFSGRPIRA 358 | FYAFDPIRRAIVLCAGDKSNDKRFYEKLVRIAEDEFTAHLNTLESK" 359 | misc_feature complement(8091..>8357) 360 | /locus_tag="YP_pPCP10" 361 | /note="Phage derived protein Gp49-like (DUF891); Region: 362 | Gp49; cl01470" 363 | /db_xref="CDD:194142" 364 | variation 8529^8530 365 | /note="compared to AL109969" 366 | /replace="tt" 367 | ORIGIN 368 | 1 tgtaacgaac ggtgcaatag tgatccacac ccaacgcctg aaatcagatc cagggggtaa 369 | 61 tctgctctcc tgattcagga gagtttatgg tcacttttga gacagttatg gaaattaaaa 370 | 121 tcctgcacaa gcagggaatg agtagccggg cgattgccag agaactgggg atctcccgca 371 | 181 ataccgttaa acgttatttg caggcaaaat ctgagccgcc aaaatatacg ccgcgacctg 372 | 241 ctgttgcttc actcctggat gaataccggg attatattcg tcaacgcatc gccgatgctc 373 | 301 atccttacaa aatcccggca acggtaatcg ctcgcgagat cagagaccag ggatatcgtg 374 | 361 gcggaatgac cattctcagg gcattcattc gttctctctc ggttcctcag gagcaggagc 375 | 421 ctgccgttcg gttcgaaact gaacccggac gacagatgca ggttgactgg ggcactatgc 376 | 481 gtaatggtcg ctcaccgctt cacgtgttcg ttgctgttct cggatacagc cgaatgctgt 377 | 541 acatcgaatt cactgacaat atgcgttatg acacgctgga gacctgccat cgtaatgcgt 378 | 601 tccgcttctt tggtggtgtg ccgcgcgaag tgttgtatga caatatgaaa actgtggttc 379 | 661 tgcaacgtga cgcatatcag 
accggtcagc accggttcca tccttcgctg tggcagttcg 380 | 721 gcaaggagat gggcttctct ccccgactgt gtcgcccctt cagggcacag actaaaggta 381 | 781 aggtggaacg gatggtgcag tacacccgta acagttttta catcccacta atgactcgcc 382 | 841 tgcgcccgat ggggatcact gtcgatgttg aaacagccaa ccgccacggt ctgcgctggc 383 | 901 tgcacgatgt cgctaaccaa cgaaagcatg aaacaatcca ggcccgtccc tgcgatcgct 384 | 961 ggctcgaaga gcagcagtcc atgctggcac tgcctccgga gaaaaaagag tatgacgtgc 385 | 1021 atcttgatga aaatctggtg aacttcgaca aacaccccct gcatcatcca ctctccatct 386 | 1081 acgactcatt ctgcagagga gtggcgtgat gatggaactg caacatcaac gactgatggc 387 | 1141 gctcgccggg cagttgcaac tggaaagcct tataagcgca gcgcctgcgc tgtcacaaca 388 | 1201 ggcagtagac caggaatgga gttatatgga cttcctggag catctgcttc atgaagaaaa 389 | 1261 actggcacgt catcaacgta aacaggcgat gtatacccga atggcagcct tcccggcggt 390 | 1321 gaaaacgttc gaagagtatg acttcacatt cgccaccgga gcaccgcaga agcaactcca 391 | 1381 gtcgttacgc tcactcagct tcatagaacg taatgaaaat atcgtattac tggggccatc 392 | 1441 aggtgtgggg aaaacccatc tggcaatagc gatgggctat gaagcagtcc gtgcaggtat 393 | 1501 caaagttcgc ttcacaacag cagcagatct gttacttcag ttatctacgg cacaacgtca 394 | 1561 gggccgttat aaaacgacgc ttcagcgtgg agtaatggcc ccccgcctgc tcatcattga 395 | 1621 tgaaataggc tatctgccgt tcagtcagga agaagcaaag ctgttcttcc aggtcatcgc 396 | 1681 taaacgttac gaaaagagcg caatgatcct gacatccaat ctgccgttcg ggcagtggga 397 | 1741 tcaaacgttc gccggtgatg cagcactgac ctcagcgatg ctggaccgta tcttacacca 398 | 1801 ctcacatgtc gttcaaatca aaggagaaag ctatcgactc agacagaaac gaaaggccgg 399 | 1861 ggttatagca gaagctaatc ctgagtaaaa cggtggatca atattgggcc gttggtggag 400 | 1921 atataagtgg atcacttttc atccgtcgtt gacaccctga tgaattcacg tgttcacgcc 401 | 1981 tgaataacaa gaatgccgga gatacgcagt catatttttt acacaattct ctaatcccga 402 | 2041 caaggtcgta ggtcgttata ggaaaattct tagcaccatt ccggaacaat cagaacagca 403 | 2101 ggccatgaac gactgacaac attacgaata taaaaaacgc acccgggcca gacattcccc 404 | 2161 ctactgatta aaccagccgg acttgtccac ggaacggtct ttttaaaccg acacacagtc 405 | 2221 tgagtacaga tacatgtcac 
gatgatgcag gattagcgga agagtgtgag cacgtttccg 406 | 2281 ggaactgtgg tgaaccatag ctcaatattc gagtgagggc ataccggaaa cgcgctcaga 407 | 2341 ttcgttgtaa cgcgattttc cgtaccgggc aattttttca gttgtttttt cgtttcatgt 408 | 2401 cgtcagaaac gttctgagcg cgtttccggc atctgatgct acgcaaacca tccccatggt 409 | 2461 cagttgacag ccggaaacac gcgggtgtcg ttttagcgta tcgacgggac ggcgtcgaga 410 | 2521 agcacaaaaa acagatgttg tactcagtca gttgttttac agacagcact gcggcagatt 411 | 2581 gaaaaagtac cgtactttca ggaatgtcca gaaaccatgt gtcagacttc gttctccccc 412 | 2641 ttccgggtga atttttttgt catccgttca ggaatctctt tataacgatt actccatttc 413 | 2701 aggatttttt atgtggcgtt tactacaggc aggatattca aaggcaaaaa aatcccccgg 414 | 2761 aacaggcgga acccggacag ggggagaacg aatcgctaaa taattttcgt agttgtattt 415 | 2821 cccatcgttg ctactgcaac gggatgaatt tgccgcagtt tatcctgtaa aacaatcctg 416 | 2881 atttactcac actccacata tcactgacgg agcacaacgg aatagtgaac aaacaacaac 417 | 2941 aaactgcgct gaatatggcg cgatttatca gaagccagag cctgatactg cttgaaaaac 418 | 3001 tggatgctct ggatgccgac gagcaggcgg ccatgtgtga acgactgcac gaactcgcgg 419 | 3061 aagaactcca gaacagcatc caggctcgct ttgaagccga aagtgaaaca ggaacataac 420 | 3121 gaagctcccg gagacggtca cagcttgtct gtgaacggat gccgggagca gacaagcccg 421 | 3181 tcagggcgcg tcagcgggtt ttagcgggtg tcggggcgca gccatgaccc agtcacgtag 422 | 3241 cgatagcgga gtgtatactg gcttagtcat gcggcatcag tgcggattgt atgaaaagtg 423 | 3301 caccatgtac ggtgtgaaat gccgcacaga tgcgtaagga gaacatgcag atgccgatgc 424 | 3361 tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggtg 425 | 3421 tctgctcact caaaagcggt gatactgtta tccacacaat caggggataa cgccggaaag 426 | 3481 aacatgtgag caaaaaacga agaccccaga aaaggccgcg ccggaggcgc tttttccata 427 | 3541 ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg tggcgaaacc 428 | 3601 cgacaggact taaagatacc aggcgtttcc ccccggaagc tccctcgtgc gctctcctgt 429 | 3661 tccgaccctg ccgcttaccg gatacctctc cgcctttctc ccttcgggaa gcgtggcgct 430 | 3721 ttctcatagc tcacgctgtt ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg 431 | 3781 ctgtgtgcac 
gaaccccccg ttcagcccga ccactgcgcc ttatccggta actatcgtct 432 | 3841 tgagtccaac ccggtaagac acgactttac gccactggca gcagccattg gtaactgaaa 433 | 3901 agtggattta gatacgcaga actcttgaag ttgaagcctt atcgcggcta cactgaaagg 434 | 3961 acagcatttg gtatctgtgc tccacttaag ccagctacca caggttagaa agcctgagaa 435 | 4021 acttctaacc ttcgaaagaa cccacgcctg agaacgtggg ttttttcgtt tacaggcagc 436 | 4081 agattacgcg cagaaaaaaa ggatctcaag aagatccttt gatcttttct actgaattgc 437 | 4141 gctcccgatc agttcagcag aagattatga tggggttcta tgggtattgc tgcggtaaca 438 | 4201 cccatgttac ttgaggttgt atgtagtctg tgtagaatta tacacataag gcttaaactg 439 | 4261 ctcttttttt tcaatatgca attggaagtt cattgactac ataaatagat tattccaaat 440 | 4321 aatttattta tgtaagaaca ggatgggagg gggaatgatc tcaaagttat tttgcttggc 441 | 4381 tctcatattt ttatcatcaa gtggccttgc agaaaaaaac acatatacag caaaagacat 442 | 4441 cttgcaaaac ctagaattaa atacctttgg caattcattg tctcatggca tctatgggaa 443 | 4501 acagacaacc ttcaagcaaa ccgagtttac aaatattaaa agcaacacca aaaaacacat 444 | 4561 tgcacttatc aataaagaca actcatggat gatatcatta aaaatactag gaattaagag 445 | 4621 agatgagtat actgtctgtt ttgaagattt ctctctaata agaccgccaa catatgtagc 446 | 4681 catacatcct ctacttataa aaaaagtaaa atctggaaac tttatagtag tgaaagaaat 447 | 4741 aaagaaatct atccctggtt gcactgtata ttatcattaa tagcaagccc ctcattatta 448 | 4801 tgaggggctc atggttattt taacaatcca ctatcgatat ctttttgcac cagagcgccc 449 | 4861 tctcgtttac gtctgtcaga cattccatca acaatattat taaaagcatt tacaaggcca 450 | 4921 ttccagtctt ttgcgataac tttattccat actgtgggag cagttctgga taacttaaac 451 | 4981 cctttttgat atccaataga caccagtgct gtacgggttc tcaacggtaa atcgctgaac 452 | 5041 cgaagaccga tattagcgtc attgaaaaga ccttcaatct tatgtgagaa tttatcaata 453 | 5101 taaatattag ataagagatg agcttcatta tcagaaagcg tcagaggtgc tgttctcact 454 | 5161 ttatcataag cctccttccc tcgaagcata taatacccat caagtctatc tgcaatatac 455 | 5221 tgagggacac cgtcattcaa taaatcctgt ttgcttcgct gaccaaggtc aaccccggaa 456 | 5281 ccgaatgtaa caccggtact gttaaaataa tcgctactag gattagacgg aaaatgactt 457 | 5341 
gtcggattaa acccttcaaa accattactg gagaaaatat cgtggtcaac aatatttacc 458 | 5401 gaacgacgta aaaattcctt cagttgacta atattgtcaa agttaatgac agtgttgtcc 459 | 5461 gctaggacga tgcgatttcg gttattattc agaatgtctt cgttctcttt cttatcgaga 460 | 5521 tgttcaatag attcggcaat cgttccctca agaaccatga cacggtagac tttcacaccg 461 | 5581 tctttttcct gacctgtttc aacagttatt ttctgttcgt aagacacggt cccttcagtt 462 | 5641 tttgaaattt tactttcctg gcggatctta tttgaatatt cactgtcttt ctccatctcc 463 | 5701 gtatcaatcg gaaaccccat aatgtacatc agtttaaaat tactccggcc aggcagatcc 464 | 5761 acataatgtg gtaatgcaat tgtaatcgaa ttagcttcaa aatttggtct gtaactgctt 465 | 5821 aatgtacttc cggaaaagag aaaagccgga acaccacctg aaccattcac taccattgta 466 | 5881 tctgacataa aaattcctct ttaacacata aaaaaaacaa taagttaaaa aaatactgta 467 | 5941 cataaaagca ctgtttttat gtacagtaat aaaattacgc cgctttattt tctctgtcaa 468 | 6001 taatatgaaa tttcattttt gtgatctgaa tcactcttat aaaaatcagg aagggaagat 469 | 6061 tcgcagcaga aaaacagcac cgggtaacat cagaaaaaaa cagaaaggag ataacgtgag 470 | 6121 caaaacaaaa tctggtcgcc accgactgag caaaacagac aaacgcctgc tggctgcact 471 | 6181 tgtcgttgcc ggatacgaag aacggacagc ccgtgacctc atccagaaac acgtttacac 472 | 6241 actgacacag gccgacctgc gccatctggt cagtgaaatc agtaacggtg tgggacagtc 473 | 6301 acaggcctac gatgcgattt accaggcgag acgcattcgt ctcgcccgta aatacctgag 474 | 6361 cggaaaaaaa ccggaagggg tggaaccccg ggaagggcag gaacgggaag atttaccata 475 | 6421 actcccgtta tcagtaccat cggctcaacg ctcgttgtcg gatctgaaaa attcgctcaa 476 | 6481 aagatcatat ttccctggat attttccacc gtttcttatg tgagaaaagt cacataattc 477 | 6541 tgtcagacga cgagaaaacg gatatcgatt attgtttaat atttttacat tattaaaaat 478 | 6601 gaaattagat aatcagatac aaataatatg ttttcgttca tgcagagaga ttaagggtgt 479 | 6661 ctaatgaaga aaagttctat tgtggcaacc attataacta ttctgtccgg gagtgctaat 480 | 6721 gcagcatcat ctcagttaat accaaatata tcccctgaca gctttacagt tgcagcctcc 481 | 6781 accgggatgc tgagtggaaa gtctcatgaa atgctttatg acgcagaaac aggaagaaag 482 | 6841 atcagccagt tagactggaa gatcaaaaat gtcgctatcc tgaaaggtga tatatcctgg 483 | 
6901 gatccatact catttctgac cctgaatgcc agggggtgga cgtctctggc ttccgggtca 484 | 6961 ggtaatatgg atgactacga ctggatgaat gaaaatcaat ctgagtggac agatcactca 485 | 7021 tctcatcctg ctacaaatgt taatcatgcc aatgaatatg acctcaatgt gaaaggctgg 486 | 7081 ttactccagg atgagaatta taaagcaggt ataacagcag gatatcagga aacacgtttc 487 | 7141 agttggacag ctacaggtgg ttcatatagt tataataatg gagcttatac cggaaacttc 488 | 7201 ccgaaaggag tgcgggtaat aggttataac cagcgctttt ctatgccata tattggactt 489 | 7261 gcaggccagt atcgcattaa tgattttgag ttaaatgcat tatttaaatt cagcgactgg 490 | 7321 gttcgggcac atgataatga tgagcactat atgagagatc ttactttccg tgagaagaca 491 | 7381 tccggctcac gttattatgg taccgtaatt aacgctggat attatgtcac acctaatgcc 492 | 7441 aaagtctttg cggaatttac atacagtaaa tatgatgagg gcaaaggagg tactcagatc 493 | 7501 attgataaga atagtggaga ttctgtctct attggcggag atgctgccgg tatttccaat 494 | 7561 aaaaattata ctgtgacggc gggtctgcaa tatcgcttct gaaaaataca gatcatatct 495 | 7621 ctcttttcat cctcccctag cggggaggat gtctgtggaa aggaggttgg tgtttgacca 496 | 7681 accttcagat gtgtgaaaaa tcaccttttt caccataatg acggggcgct cattctgttg 497 | 7741 ttttgccttg acattctcca cgtctttcag ggcatggaga aggtcaaatt agacatggaa 498 | 7801 cgctactctc cttcctgtag gaagctcaac atccaagctt aatttgcctc ccattgcttc 499 | 7861 aacgtaacgc tttaacgtcg ccagctttaa atcatttccg cgctgctcca gctttgttac 500 | 7921 tgctggctgg cttataccca tcgcctcagc aacttgtttt tgtgataact ggagttcttc 501 | 7981 acgcatcatc tgcaagccga cctcaagaat catctcatct gccatttctt taattcgtgt 502 | 8041 ctggctttca ggtgaacgac tggcaatcac ctcatctaat gttctcatta cttgctctcc 503 | 8101 agtgtgttca gatgtgctgt aaattcatcc tcagctatac gcaccagttt ttcataaaac 504 | 8161 cgcttatcat tacttttatc tcctgcacaa agaacgatag cccgacgaat cggatcgaac 505 | 8221 gcataaaagg ctcttatcgg acggccagaa aactgaacgc gaagctcttt catatttttg 506 | 8281 taccgagaac ctttcacggt atcggcatat ggcctgggta actcaggtcc gtaaacctgt 507 | 8341 agctttttca aatcagccaa aaccttttcc tgaagagcgt cttcttgctc atttagccag 508 | 8401 tcgtcaaatc gctggctaaa aagtaccatc cacatgctca accctataac ctgtagctta 509 
| 8461 ccccactaac aatataacct acgagttata ttttcaagaa aagctggcta tttaacataa 510 | 8521 cggcaatttg tacgcaccac tgaaatgcgt tcagcgcgat cacggcaaca gacaggcaaa 511 | 8581 aatagcaaca aacctcccga aaaaccgccg cgatcgcgcc tgataaattt taaccttatg 512 | 8641 catatctatg cagccaggcg aatcacgaac gaattgcctg cctgatgtaa ctgaaacggg 513 | 8701 tgttttttcc tgatttggtg ggcgtggaag acggaacatg aacgggaaaa cagaattcat 514 | 8761 gccagatgag cgcgatctgg caattaaggc aaaacacagc aacaaagaca cgccagaatc 515 | 8821 gcgcccggat atgttttaac gcgattttca gactcagaca aattcagcag aatgctactc 516 | 8881 cattcaccgg gctgatggtg aatacatgcg tatccaggat gagtacattt ctggctctgc 517 | 8941 cacagctctg tctgttggca gctttcgcct gtccggaaac ctgcttaaaa cgctcccgaa 518 | 9001 aggcctctga accagaaagc aacaaaacac aggccattaa gtaaatcgcg ttaaaacacg 519 | 9061 tctgatggat tgctgcaaaa aaaagtccct aatggagcag ggactgttaa acccagtgaa 520 | 9121 tagcgtctaa attaaagtaa gaatacgacc aggtactctt cagaaaagag attaatccac 521 | 9181 cgcacagaat aatcaacagt aaaaacaaac aaccctgatt ttttattttt ctttttttcg 522 | 9241 ataaaaacaa aattaaagaa ataattaatc agaacattcc ttaacttcag ggcattgcct 523 | 9301 gtgttccatt ttgtgattag tctgaaactt ccgaaggtgg ataacacccg gtattttttt 524 | 9361 gctcacataa agcccctcct tcaggcagag gggctttttc tttgccacca cataaaaaag 525 | 9421 gccctcacag gaggtgttct gtgagggcgt atgataagga ctgaatcgat ggttaatatg 526 | 9481 tctagtcctg acttttgcat ctccgaatat aaaaccctgt ttaacggcat gcaaaaccaa 527 | 9541 aaaataaaaa tgtgacatcg caatgccaga taatattgac gcatgaggga atgcgtaccc 528 | 9601 cgacccctg 529 | // 530 | -------------------------------------------------------------------------------- /My_sample_notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**Uncomment the following lines (by deleting the leading `#`) if you are running this in Colab. 
(Thanks to Jake VanderPlas for the tip!)**" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# !pip install biopython\n", 17 | "# !pip install folium\n", 18 | "# !curl -O https://raw.githubusercontent.com/jperkel/example_notebook/master/NC_005816.gb" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "This simple notebook demonstrates how users can interleave text, code, and results in a single document. We start with a simple calculation -- computing the first 25 numbers in the Fibonacci sequence, where each value equals the sum of the two previous values. The Jupyter notebook allows us to express that mathematically, using the typesetting language $\\LaTeX{}$: $$F_n = F_{n-1} + F_{n-2}$$\n", 26 | "Thus, the sequence is: 0, 1, 1, 2, 3, 5, 8, ..." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "The first cell contains an IPython 'magic' code, '%matplotlib', which allows the notebook to display plots inline, in the body of the notebook." 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "%matplotlib inline" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 3, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "import matplotlib.pyplot as plt" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 4, 57 | "metadata": {}, 58 | "outputs": [ 59 | { 60 | "name": "stdout", 61 | "output_type": "stream", 62 | "text": [ 63 | "[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368]\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "# calculate the first 25 Fibonacci numbers\n", 69 | "f1 = 0\n", 70 | "f2 = 1\n", 71 | "ar = [f1, f2] # a list to hold the computed values. 
We know the first two numbers\n", 72 | "\n", 73 | "# we only need to run our calculation 23 times, because positions 1 and 2 are known\n", 74 | "for i in range (23):\n", 75 | " f3 = f1 + f2\n", 76 | " ar.append (f3)\n", 77 | " f1 = f2\n", 78 | " f2 = f3\n", 79 | " \n", 80 | "print (ar) # below, you see the output of the code itself." 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "Plot the data" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 5, 93 | "metadata": {}, 94 | "outputs": [ 95 | { 96 | "data": { 97 | "image/png": "\n", 98 | "text/plain": [ 99 | "
" 100 | ] 101 | }, 102 | "metadata": { 103 | "needs_background": "light" 104 | }, 105 | "output_type": "display_data" 106 | } 107 | ], 108 | "source": [ 109 | "fig, ax = plt.subplots()\n", 110 | "ax.plot (range(25), ar, \"ro\")\n", 111 | "## uncomment the following call to ax.plot() (by removing the leading '#') and select \n", 112 | "## 'Cell > Run All' (in Binder) or 'Run > Run All Cells' (Jupyter) to change the graph below\n", 113 | "# ax.plot (range(25), ar)\n", 114 | "ax.set (xlabel = \"Sequence No.\", ylabel = \"Fibonacci No.\", \n", 115 | " title = \"The First 25 Fibonacci Numbers\")\n", 116 | "plt.show()" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "Anything you can do programmatically, can be documented in a notebook. Here we'll do some simple sequence analysis with Biopython. The following example is adapted from the [Biopython tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html). " 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "First, we'll read in a Genbank-formatted file, which represents a circular DNA, called a plasmid, from the bacterium, *Yersinia pestis*. " 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 6, 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "data": { 140 | "text/plain": [ 141 | "Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', IUPACAmbiguousDNA())" 142 | ] 143 | }, 144 | "execution_count": 6, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "from Bio import SeqIO\n", 151 | "\n", 152 | "record = SeqIO.read(\"NC_005816.gb\", \"genbank\")\n", 153 | "record.seq" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "How long is the DNA, and how which genes does it encode?" 
161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 7, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "name": "stdout", 170 | "output_type": "stream", 171 | "text": [ 172 | "Length: 9609 bp\n", 173 | "No. features: 41 \n", 174 | "\n", 175 | "Feat. 3: putative transposase\n", 176 | "Feat. 8: transposase/IS protein\n", 177 | "Feat. 15: putative replication regulatory protein\n", 178 | "Feat. 18: hypothetical protein\n", 179 | "Feat. 21: pesticin immunity protein\n", 180 | "Feat. 23: pesticin\n", 181 | "Feat. 29: hypothetical protein\n", 182 | "Feat. 32: outer membrane protease\n", 183 | "Feat. 35: putative transcriptional regulator\n", 184 | "Feat. 38: hypothetical protein\n" 185 | ] 186 | } 187 | ], 188 | "source": [ 189 | "print(\"Length:\", len(record.seq), \"bp\")\n", 190 | "print(\"No. features:\", len(record.features),\"\\n\")\n", 191 | "\n", 192 | "for i,feat in enumerate (record.features):\n", 193 | " if (feat.type == \"CDS\"):\n", 194 | " product = feat.qualifiers['product'][0]\n", 195 | " print (\"Feat. %d: %s\" % (i, product))" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "This plasmid encodes 41 features, including 10 genes. Let's focus on feature #18, which encodes a 'hypothetical protein'." 
203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 8, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "Gene location\n", 215 | "Start: 3485 \n", 216 | "End: 3857\n", 217 | "GeneID: ['GI:45478715', 'GeneID:2767720']\n" 218 | ] 219 | } 220 | ], 221 | "source": [ 222 | "feat = record.features[18]\n", 223 | "print (\"Gene location\\nStart:\", feat.location.start, \"\\nEnd:\", feat.location.end)\n", 224 | "print (\"GeneID:\", feat.qualifiers.get(\"db_xref\"))" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "These data show us that the gene is located between bases 3485 and 3857. Let's retrieve that segment and translate it." 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 9, 237 | "metadata": {}, 238 | "outputs": [ 239 | { 240 | "name": "stdout", 241 | "output_type": "stream", 242 | "text": [ 243 | "VSKKRRPQKRPRRRRFFHRLRPPDEHHKNRRSSQRWRNPTGLKDTRRFPPEAPSCALLFRPCRLPDTSPPFSLREAWRFLIAHAVGISVRCRSFAPSWAVCTNPPFSPTTAPYPVTIVLSPTR*\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "my_gene = record[3485:3857]\n", 249 | "print (my_gene.seq.translate(table=\"Bacterial\"))" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "Now, let's find out more about this gene, using the NCBI Entrez database. We learned the accession number above: `GeneID: ['GI:45478715', 'GeneID:2767720']`.\n", 257 | "\n", 258 | "**20 May 2020**: Updated the following two cells to reflect changes in NCBI's Entrez database. 
" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 10, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [ 267 | "from Bio import Entrez" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 11, 273 | "metadata": {}, 274 | "outputs": [ 275 | { 276 | "name": "stdout", 277 | "output_type": "stream", 278 | "text": [ 279 | "LOCUS AAS58761 123 aa linear BCT 26-JUL-2016\n", 280 | "DEFINITION conserved hypothetical protein (plasmid) [Yersinia pestis biovar\n", 281 | " Microtus str. 91001].\n", 282 | "ACCESSION AAS58761\n", 283 | "VERSION AAS58761.1\n", 284 | "DBLINK BioProject: PRJNA10638\n", 285 | " BioSample: SAMN02602970\n", 286 | "DBSOURCE accession AE017046.1\n", 287 | "KEYWORDS .\n", 288 | "SOURCE Yersinia pestis biovar Microtus str. 91001\n", 289 | " ORGANISM Yersinia pestis biovar Microtus str. 91001\n", 290 | " Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;\n", 291 | " Yersiniaceae; Yersinia.\n", 292 | "REFERENCE 1 (residues 1 to 123)\n", 293 | " AUTHORS Song,Y., Tong,Z., Wang,J., Wang,L., Guo,Z., Han,Y., Zhang,J.,\n", 294 | " Pei,D., Zhou,D., Qin,H., Pang,X., Han,Y., Zhai,J., Li,M., Cui,B.,\n", 295 | " Qi,Z., Jin,L., Dai,R., Chen,F., Li,S., Ye,C., Du,Z., Lin,W.,\n", 296 | " Wang,J., Yu,J., Yang,H., Wang,J., Huang,P. and Yang,R.\n", 297 | " TITLE Complete genome sequence of Yersinia pestis strain 91001, an\n", 298 | " isolate avirulent to humans\n", 299 | " JOURNAL DNA Res. 11 (3), 179-197 (2004)\n", 300 | " PUBMED 15368893\n", 301 | "REFERENCE 2 (residues 1 to 123)\n", 302 | " AUTHORS Zhou,D., Tong,Z., Song,Y., Han,Y., Pei,D., Pang,X., Zhai,J., Li,M.,\n", 303 | " Cui,B., Qi,Z., Jin,L., Dai,R., Du,Z., Wang,J., Guo,Z., Wang,J.,\n", 304 | " Huang,P. and Yang,R.\n", 305 | " TITLE Genetics of metabolic variations between Yersinia pestis biovars\n", 306 | " and the proposal of a new biovar, microtus\n", 307 | " JOURNAL J. Bacteriol. 
186 (15), 5147-5152 (2004)\n", 308 | " PUBMED 15262951\n", 309 | "REFERENCE 3 (residues 1 to 123)\n", 310 | " AUTHORS Song,Y., Tong,Z., Wang,L., Han,Y., Zhang,J., Pei,D., Wang,J.,\n", 311 | " Zhou,D., Han,Y., Pang,X., Zhai,J., Chen,F., Qin,H., Wang,J., Li,S.,\n", 312 | " Guo,Z., Ye,C., Du,Z., Lin,W., Wang,J., Yu,J., Yang,H., Wang,J.,\n", 313 | " Huang,P. and Yang,R.\n", 314 | " TITLE Direct Submission\n", 315 | " JOURNAL Submitted (24-APR-2003) The Institute of Microbiology and\n", 316 | " Epidemiology, Academy of Military Medical Sciences, No. 20,\n", 317 | " Dongdajie Street, Fengtai District, Beijing 100071, People's\n", 318 | " Republic of China\n", 319 | "COMMENT Method: conceptual translation.\n", 320 | "FEATURES Location/Qualifiers\n", 321 | " source 1..123\n", 322 | " /organism=\"Yersinia pestis biovar Microtus str. 91001\"\n", 323 | " /strain=\"91001\"\n", 324 | " /sub_species=\"microtus\"\n", 325 | " /db_xref=\"taxon:229193\"\n", 326 | " /plasmid=\"pPCP1\"\n", 327 | " /note=\"biovar: Microtus\"\n", 328 | " Protein 1..123\n", 329 | " /product=\"conserved hypothetical protein\"\n", 330 | " CDS 1..123\n", 331 | " /locus_tag=\"YP_pPCP04\"\n", 332 | " /old_locus_tag=\"pPCP04\"\n", 333 | " /coded_by=\"AE017046.1:3486..3857\"\n", 334 | " /note=\"Best Blastp hit = gi|321919|pir||JQ1541\n", 335 | " hypothetical 16.9K protein - Salmonella typhi murium\n", 336 | " plasmid NTP16.\"\n", 337 | " /transl_table=11\n", 338 | "ORIGIN \n", 339 | " 1 mskkrrpqkr prrrrffhrl rppdehhknr rssqrwrnpt glkdtrrfpp eapscallfr\n", 340 | " 61 pcrlpdtspp fslreawrfl iahavgisvr crsfapswav ctnppfsptt apypvtivls\n", 341 | " 121 ptr\n", 342 | "//\n", 343 | "\n", 344 | "\n" 345 | ] 346 | } 347 | ], 348 | "source": [ 349 | "Entrez.email = \"A.N.Other@example.com\" # Always tell NCBI who you are\n", 350 | "with Entrez.efetch (db=\"protein\",rettype=\"gb\",retmode=\"text\",id=\"AAS58761\") as handle:\n", 351 | " print (handle.read())\n", 352 | "# handle = Entrez.efetch 
(db=\"nucleotide\",id=\"45478715\",rettype=\"gb\",retmode=\"text\")\n", 353 | "# print (handle.read())" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "Jupyter also supports interactive data exploration. Here we'll add an interactive map, just as in our June [mapping feature](https://www.nature.com/articles/d41586-018-05331-6) -- something you might do when working with geospatial data. This requires the Python Leaflet library, folium. " 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 12, 366 | "metadata": {}, 367 | "outputs": [], 368 | "source": [ 369 | "import folium" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "Now we create a simple map: a few points in London, Oxford and Cambridge, overlaid on either a street map or a map of geological data provided by the [Macrostrat Project](https://macrostrat.org/). " 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": 13, 382 | "metadata": {}, 383 | "outputs": [], 384 | "source": [ 385 | "coords = { \n", 386 | " 0: { \"name\": \"Nature\", \"lat\": 51.533925, \"long\": -0.121553 },\n", 387 | " 1: { \"name\": \"Francis Crick Institute\", \"lat\": 51.531877, \"long\": -0.128767 },\n", 388 | " 2: { \"name\": \"MRC Laboratory for Molecular Cell Biology\", \"lat\": 51.524435, \"long\": -0.132495 },\n", 389 | " 3: { \"name\": \"King's College London\", \"lat\": 51.511573, \"long\": -0.116083 },\n", 390 | " 4: { \"name\": \"Imperial College London\", \"lat\": 51.498780, \"long\": -0.174888 },\n", 391 | " 5: { \"name\": \"Cambridge University\", \"lat\": 52.206960, \"long\": 0.115034 },\n", 392 | " 6: { \"name\": \"Oxford University\", \"lat\": 51.754843, \"long\": -1.254302 },\n", 393 | " 7: { \"name\": \"Platform 9 3/4\", \"lat\": 51.532349, \"long\": -0.123806 }\n", 394 | "}" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": 14, 400 |
"metadata": {}, 401 | "outputs": [ 402 | { 403 | "data": { 404 | "text/plain": [ 405 | "" 406 | ] 407 | }, 408 | "execution_count": 14, 409 | "metadata": {}, 410 | "output_type": "execute_result" 411 | } 412 | ], 413 | "source": [ 414 | "m = folium.Map(location = [51.8561, -0.2966], tiles = 'CartoDB positron', zoom_start = 9)\n", 415 | "\n", 416 | "# add the locations to the map\n", 417 | "for key in coords.keys():\n", 418 | " folium.CircleMarker(\n", 419 | " location=[coords[key]['lat'], coords[key]['long']],\n", 420 | " popup=coords[key]['name'],\n", 421 | " color=('crimson' if coords[key]['name'] == 'Nature' else 'blue'),\n", 422 | " fill=False,\n", 423 | " ).add_to(m)\n", 424 | "\n", 425 | "# pull in the Macrostrat tile layer\n", 426 | "folium.TileLayer(tiles='https://tiles.macrostrat.org/carto/{z}/{x}/{y}.png', \n", 427 | " attr='Macrostrat', name='Macrostrat').add_to(m)\n", 428 | "folium.LayerControl().add_to(m) # allow user to switch between layers\n", 429 | "folium.LatLngPopup().add_to(m) # click on the map to get Lat/Long in a popup" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "Draw the map. **Note that this map is interactive**: you can zoom, pan, click the points of interest, and alternate between the two layers (by clicking on the tiles icon in the upper-right corner of the map). If you click anywhere on the map, a popup will appear showing the latitude and longitude of that position." 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 15, 442 | "metadata": {}, 443 | "outputs": [ 444 | { 445 | "data": { 446 | "text/html": [ 447 | "
Make this Notebook Trusted to load map: File -> Trust Notebook
" 448 | ], 449 | "text/plain": [ 450 | "" 451 | ] 452 | }, 453 | "execution_count": 15, 454 | "metadata": {}, 455 | "output_type": "execute_result" 456 | } 457 | ], 458 | "source": [ 459 | "m" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "Document our session, for [computational reproducibility](https://www.nature.com/articles/d41586-018-05990-5)!" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": 16, 472 | "metadata": {}, 473 | "outputs": [ 474 | { 475 | "name": "stdout", 476 | "output_type": "stream", 477 | "text": [ 478 | "{'commit_hash': '8bda98619',\n", 479 | " 'commit_source': 'installation',\n", 480 | " 'default_encoding': 'UTF-8',\n", 481 | " 'ipython_path': '/opt/anaconda3/lib/python3.7/site-packages/IPython',\n", 482 | " 'ipython_version': '7.12.0',\n", 483 | " 'os_name': 'posix',\n", 484 | " 'platform': 'Darwin-18.7.0-x86_64-i386-64bit',\n", 485 | " 'sys_executable': '/opt/anaconda3/bin/python',\n", 486 | " 'sys_platform': 'darwin',\n", 487 | " 'sys_version': '3.7.6 (default, Jan 8 2020, 13:42:34) \\n'\n", 488 | " '[Clang 4.0.1 (tags/RELEASE_401/final)]'}\n" 489 | ] 490 | } 491 | ], 492 | "source": [ 493 | "import IPython\n", 494 | "print(IPython.sys_info())" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 17, 500 | "metadata": {}, 501 | "outputs": [ 502 | { 503 | "name": "stdout", 504 | "output_type": "stream", 505 | "text": [ 506 | "biopython==1.76\n", 507 | "folium==0.11.0\n", 508 | "matplotlib==3.1.3\n" 509 | ] 510 | } 511 | ], 512 | "source": [ 513 | "!pip freeze | grep -E 'folium|matplotlib|biopython'" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": null, 519 | "metadata": {}, 520 | "outputs": [], 521 | "source": [] 522 | } 523 | ], 524 | "metadata": { 525 | "kernelspec": { 526 | "display_name": "Python 3", 527 | "language": "python", 528 | "name": "python3" 529 | }, 530 | "language_info": { 531 | 
"codemirror_mode": { 532 | "name": "ipython", 533 | "version": 3 534 | }, 535 | "file_extension": ".py", 536 | "mimetype": "text/x-python", 537 | "name": "python", 538 | "nbconvert_exporter": "python", 539 | "pygments_lexer": "ipython3", 540 | "version": "3.7.6" 541 | } 542 | }, 543 | "nbformat": 4, 544 | "nbformat_minor": 4 545 | } 546 | --------------------------------------------------------------------------------