├── DNA Translation
    ├── DNA Translation.ipynb
    ├── DNA.txt
    └── protein.txt
├── LICENSE.txt
└── README.md

/DNA Translation/DNA Translation.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# DNA TRANSLATION"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {},
 13 |    "source": [
 14 |     "* DNA is a discrete code physically present in almost every cell of an organism.\n",
 15 |     "\n",
 16 |     "* We can think of DNA as a one-dimensional string of characters, with four characters to choose from.\n",
 17 |     "* These characters are A, C, G, and T.\n",
 18 |     "* They stand for the first letters of the four nucleotides used to construct DNA.\n",
 19 |     "* The full names of these nucleotides are adenine, cytosine, guanine, and thymine.\n",
 20 |     "* Each unique three-character sequence of nucleotides, sometimes called a nucleotide triplet, corresponds to one amino acid or to a stop signal.\n",
 21 |     "* The sequence of amino acids is unique for each type of protein, and all proteins are built from the same set of just 20 amino acids for all living things.\n",
 22 |     "* Protein molecules dominate the behavior of the cell, serving as structural supports, chemical catalysts, molecular motors, and so on.\n",
 23 |     "* The so-called central dogma of molecular biology describes the flow of genetic information in a biological system.\n",
 24 |     "* Instructions in the DNA are first transcribed into RNA, and the RNA is then translated into proteins.\n",
 25 |     "* We can think of DNA, when read as sequences of three letters, as a dictionary of life."
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "markdown",
 30 |    "metadata": {},
 31 |    "source": [
 32 |     "##### In this task we convert DNA into protein.\n",
 33 |     "##### Dataset link: https://www.ncbi.nlm.nih.gov/nuccore/NM_207618.2\n",
 34 |     "##### I have already downloaded the dataset and placed it in the DNA Translation folder; the link is just for reference."
 35 |    ]
 36 |   },
 37 |   {
 38 |    "cell_type": "code",
 39 |    "execution_count": 1,
 40 |    "metadata": {},
 41 |    "outputs": [],
 42 |    "source": [
 43 |     "# Define a function to read a sequence from a file\n",
 44 |     "def read_seq(inputfile):\n",
 45 |     "    \"\"\"Reads and returns the input sequence with special characters removed.\"\"\"\n",
 46 |     "    with open(inputfile, \"r\") as f:\n",
 47 |     "        seq = f.read()\n",
 48 |     "    seq = seq.replace(\"\\n\", \"\")\n",
 49 |     "    seq = seq.replace(\"\\r\", \"\")\n",
 50 |     "    return seq"
 51 |    ]
 52 |   },
 53 |   {
 54 |    "cell_type": "code",
 55 |    "execution_count": 2,
 56 |    "metadata": {},
 57 |    "outputs": [],
 58 |    "source": [
 59 |     "# Load the protein and DNA data\n",
 60 |     "prt = read_seq(\"protein.txt\")\n",
 61 |     "dna = read_seq(\"DNA.txt\")"
 62 |    ]
 63 |   },
 64 |   {
 65 |    "cell_type": "code",
 66 |    "execution_count": 3,
 67 |    "metadata": {},
 68 |    "outputs": [],
 69 |    "source": [
 70 |     "# Define a function to translate codons into amino acids\n",
 71 |     "def translate(seq):\n",
 72 |     "    \n",
 73 |     "    \"\"\"Translate a string containing a nucleotide sequence into a string containing the \n",
 74 |     "    corresponding sequence of amino acids. 
Nucleotides are translated in triplets using\n",
 75 |     "    the table dictionary; each amino acid is encoded with a string of length 1.\"\"\"  # docstring\n",
 76 |     "    \n",
 77 |     "    table = {\n",
 78 |     "        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',\n",
 79 |     "        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',\n",
 80 |     "        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',\n",
 81 |     "        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',\n",
 82 |     "        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',\n",
 83 |     "        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',\n",
 84 |     "        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',\n",
 85 |     "        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',\n",
 86 |     "        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',\n",
 87 |     "        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',\n",
 88 |     "        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',\n",
 89 |     "        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',\n",
 90 |     "        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',\n",
 91 |     "        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',\n",
 92 |     "        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',\n",
 93 |     "        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}\n",
 94 |     "\n",
 95 |     "    protein = \"\"\n",
 96 |     "    if len(seq) % 3 == 0:  # otherwise an empty string is returned\n",
 97 |     "        for i in range(0, len(seq), 3):\n",
 98 |     "            codon = seq[i:i+3]\n",
 99 |     "            protein += table[codon]\n",
100 |     "    \n",
101 |     "    return protein"
102 |    ]
103 |   },
104 |   {
105 |    "cell_type": "code",
106 |    "execution_count": 4,
107 |    "metadata": {
108 |     "scrolled": true
109 |    },
110 |    "outputs": [
111 |     {
112 |      "data": {
113 |       "text/plain": [
114 |        "'K'"
115 |       ]
116 |      },
117 |      "execution_count": 4,
118 |      "metadata": {},
119 |      "output_type": "execute_result"
120 |     }
121 |    ],
122 |    "source": [
123 |     "# Check the translate function on a single codon\n",
124 |     "translate(\"AAA\")"
125 |    ]
126 |   },
127 |   {
128 |    "cell_type": "code",
129 |    "execution_count": 5,
130 |    "metadata": {},
131 |    "outputs": [
132 |     {
133 |      "data": {
134 |       "text/plain": [
135 |        "'R'"
136 |       ]
137 |      },
138 |      "execution_count": 5,
139 |      "metadata": {},
140 |      "output_type": "execute_result"
141 |     }
142 |    ],
143 |    "source": [
144 |     "translate(\"AGA\")"
145 |    ]
146 |   },
147 |   {
148 |    "cell_type": "code",
149 |    "execution_count": 6,
150 |    "metadata": {},
151 |    "outputs": [
152 |     {
153 |      "data": {
154 |       "text/plain": [
155 |        "''"
156 |       ]
157 |      },
158 |      "execution_count": 6,
159 |      "metadata": {},
160 |      "output_type": "execute_result"
161 |     }
162 |    ],
163 |    "source": [
164 |     "\n",
165 |     "# Check the output on the full sequence\n",
166 |     "translate(dna)\n"
167 |    ]
168 |   },
169 |   {
170 |    "cell_type": "markdown",
171 |    "metadata": {},
172 |    "source": [
173 |     "#### translate(dna) returns an empty string because the full sequence includes non-coding regions and its length is not a multiple of 3. To solve this, look up the CDS (coding sequence) on the dataset page, which is 21..938; the CDS gives the positions where the coding sequence starts and ends.\n",
174 |     "#### Those positions are 1-based and inclusive, while Python slices are 0-based with an exclusive end, so the slice is dna[20:938].\n"
175 |    ]
176 |   },
177 |   {
178 |    "cell_type": "code",
179 |    "execution_count": 7,
180 |    "metadata": {},
181 |    "outputs": [
182 |     {
183 |      "data": {
184 |       "text/plain": [
185 |        "'MSTHDTSLKTTEEVAFQIILLCQFGVGTFANVFLFVYNFSPISTGSKQRPRQVILRHMAVANALTLFLTIFPNNMMTFAPIIPQTDLKCKLEFFTRLVARSTNLCSTCVLSIHQFVTLVPVNSGKGILRASVTNMASYSCYSCWFFSVLNNIYIPIKVTGPQLTDNNNNSKSKLFCSTSDFSVGIVFLRFAHDATFMSIMVWTSVSMVLLLHRHCQRMQYIFTLNQDPRGQAETTATHTILMLVVTFVGFYLLSLICIIFYTYFIYSHHSLRHCNDILVSGFPTISPLLLTFRDPKGPCSVFFNC_'"
186 |       ]
187 |      },
188 |      "execution_count": 7,
189 |      "metadata": {},
190 |      "output_type": "execute_result"
191 |     }
192 |    ],
193 |    "source": [
194 |     "translate(dna[20:938])"
195 |    ]
196 |   },
197 |   {
198 |    "cell_type": "code",
199 |    "execution_count": 8,
200 |    "metadata": {
201 |     "scrolled": false
202 |    },
203 |    "outputs": [
204 |     {
205 |      "data": {
206 |       "text/plain": [
207 |        "'MSTHDTSLKTTEEVAFQIILLCQFGVGTFANVFLFVYNFSPISTGSKQRPRQVILRHMAVANALTLFLTIFPNNMMTFAPIIPQTDLKCKLEFFTRLVARSTNLCSTCVLSIHQFVTLVPVNSGKGILRASVTNMASYSCYSCWFFSVLNNIYIPIKVTGPQLTDNNNNSKSKLFCSTSDFSVGIVFLRFAHDATFMSIMVWTSVSMVLLLHRHCQRMQYIFTLNQDPRGQAETTATHTILMLVVTFVGFYLLSLICIIFYTYFIYSHHSLRHCNDILVSGFPTISPLLLTFRDPKGPCSVFFNC'"
208 |       ]
209 |      },
210 |      "execution_count": 8,
211 |      "metadata": 
{},
212 |      "output_type": "execute_result"
213 |     }
214 |    ],
215 |    "source": [
216 |     "prt"
217 |    ]
218 |   },
219 |   {
220 |    "cell_type": "markdown",
221 |    "metadata": {},
222 |    "source": [
223 |     "#### As you can see, the two strings are almost identical; the only difference is the underscore character that appears at the end of our translated sequence.\n"
224 |    ]
225 |   },
226 |   {
227 |    "cell_type": "markdown",
228 |    "metadata": {},
229 |    "source": [
230 |     "#### Skip the stop codon, because the downloaded protein sequence does not include a stop symbol."
231 |    ]
232 |   },
233 |   {
234 |    "cell_type": "code",
235 |    "execution_count": 9,
236 |    "metadata": {},
237 |    "outputs": [
238 |     {
239 |      "data": {
240 |       "text/plain": [
241 |        "'MSTHDTSLKTTEEVAFQIILLCQFGVGTFANVFLFVYNFSPISTGSKQRPRQVILRHMAVANALTLFLTIFPNNMMTFAPIIPQTDLKCKLEFFTRLVARSTNLCSTCVLSIHQFVTLVPVNSGKGILRASVTNMASYSCYSCWFFSVLNNIYIPIKVTGPQLTDNNNNSKSKLFCSTSDFSVGIVFLRFAHDATFMSIMVWTSVSMVLLLHRHCQRMQYIFTLNQDPRGQAETTATHTILMLVVTFVGFYLLSLICIIFYTYFIYSHHSLRHCNDILVSGFPTISPLLLTFRDPKGPCSVFFNC'"
242 |       ]
243 |      },
244 |      "execution_count": 9,
245 |      "metadata": {},
246 |      "output_type": "execute_result"
247 |     }
248 |    ],
249 |    "source": [
250 |     "translate(dna[20:935])"
251 |    ]
252 |   },
253 |   {
254 |    "cell_type": "code",
255 |    "execution_count": 10,
256 |    "metadata": {},
257 |    "outputs": [
258 |     {
259 |      "data": {
260 |       "text/plain": [
261 |        "True"
262 |       ]
263 |      },
264 |      "execution_count": 10,
265 |      "metadata": {},
266 |      "output_type": "execute_result"
267 |     }
268 |    ],
269 |    "source": [
270 |     "# Check the translation against the downloaded protein\n",
271 |     "\n",
272 |     "prt == translate(dna[20:935])"
273 |    ]
274 |   },
275 |   {
276 |    "cell_type": "markdown",
277 |    "metadata": {},
278 |    "source": [
279 |     "##### From this we conclude that the translated DNA sequence and the downloaded protein sequence are the same."
280 |    ]
281 |   }
282 |  ],
283 |  "metadata": {
284 |   "kernelspec": {
285 |    "display_name": "Python 3",
286 |    "language": "python",
287 |    "name": "python3"
288 |   },
289 |   "language_info": {
290 |    "codemirror_mode": {
291 |     "name": "ipython",
292 |     "version": 3
293 |    },
294 |    "file_extension": ".py",
295 |    "mimetype": "text/x-python",
296 |    "name": "python",
297 |    "nbconvert_exporter": "python",
298 |    "pygments_lexer": "ipython3",
299 |    "version": "3.8.5"
300 |   }
301 |  },
302 |  "nbformat": 4,
303 |  "nbformat_minor": 4
304 | }
305 | 
--------------------------------------------------------------------------------

/DNA Translation/DNA.txt:
--------------------------------------------------------------------------------
 1 | GGTCAGAAAAAGCCCTCTCCATGTCTACTCACGATACATCCCTGAAAACCACTGAGGAAGTGGCTTTTCA
 2 | GATCATCTTGCTTTGCCAGTTTGGGGTTGGGACTTTTGCCAATGTATTTCTCTTTGTCTATAATTTCTCT
 3 | CCAATCTCGACTGGTTCTAAACAGAGGCCCAGACAAGTGATTTTAAGACACATGGCTGTGGCCAATGCCT
 4 | TAACTCTCTTCCTCACTATATTTCCAAACAACATGATGACTTTTGCTCCAATTATTCCTCAAACTGACCT
 5 | CAAATGTAAATTAGAATTCTTCACTCGCCTCGTGGCAAGAAGCACAAACTTGTGTTCAACTTGTGTTCTG
 6 | AGTATCCATCAGTTTGTCACACTTGTTCCTGTTAATTCAGGTAAAGGAATACTCAGAGCAAGTGTCACAA
 7 | ACATGGCAAGTTATTCTTGTTACAGTTGTTGGTTCTTCAGTGTCTTAAATAACATCTACATTCCAATTAA
 8 | GGTCACTGGTCCACAGTTAACAGACAATAACAATAACTCTAAAAGCAAGTTGTTCTGTTCCACTTCTGAT
 9 | TTCAGTGTAGGCATTGTCTTCTTGAGGTTTGCCCATGATGCCACATTCATGAGCATCATGGTCTGGACCA
10 | GTGTCTCCATGGTACTTCTCCTCCATAGACATTGTCAGAGAATGCAGTACATATTCACTCTCAATCAGGA
11 | CCCCAGGGGCCAAGCAGAGACCACAGCAACCCATACTATCCTGATGCTGGTAGTCACATTTGTTGGCTTT
12 | TATCTTCTAAGTCTTATTTGTATCATCTTTTACACCTATTTTATATATTCTCATCATTCCCTGAGGCATT
13 | GCAATGACATTTTGGTTTCGGGTTTCCCTACAATTTCTCCTTTACTGTTGACCTTCAGAGACCCTAAGGG
14 | TCCTTGTTCTGTGTTCTTCAACTGTTGAAAGCCAGAGTCACTAAAAATGCCAAACACAGAAGACAGCTTT
15 | GCTAATACCATTAAATACTTTATTCCATAAATATGTTTTTAAAAGCTTGTATGAACAAGGTATGGTGCTC
16 | ACTGCTATACTTATAAAAGAGTAAGGTTATAATCACTTGTTGATATGAAAAGATTTCTGGTTGGAATCTG
17 | ATTGAAACAGTGAGTTATTCACCACCCTCCATTCTCT
18 | 
--------------------------------------------------------------------------------

/DNA Translation/protein.txt:
--------------------------------------------------------------------------------
1 | MSTHDTSLKTTEEVAFQIILLCQFGVGTFANVFLFVYNFSPIST
2 | 
GSKQRPRQVILRHMAVANALTLFLTIFPNNMMTFAPIIPQTDLKCKLEFFTRLVARST
3 | NLCSTCVLSIHQFVTLVPVNSGKGILRASVTNMASYSCYSCWFFSVLNNIYIPIKVTG
4 | PQLTDNNNNSKSKLFCSTSDFSVGIVFLRFAHDATFMSIMVWTSVSMVLLLHRHCQRM
5 | QYIFTLNQDPRGQAETTATHTILMLVVTFVGFYLLSLICIIFYTYFIYSHHSLRHCND
6 | ILVSGFPTISPLLLTFRDPKGPCSVFFNC
--------------------------------------------------------------------------------

/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022 saravanavel
 4 | Permission is hereby granted, free of charge, to any person obtaining a copy
 5 | of this software and associated documentation files (the "Software"), to deal
 6 | in the Software without restriction, including without limitation the rights
 7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 8 | copies of the Software, and to permit persons to whom the Software is
 9 | furnished to do so, subject to the following conditions:
10 | 
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 | 
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | 
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
1 | # IN PROGRESS
2 | 
--------------------------------------------------------------------------------
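
The notebook's translation step (slice the coding sequence out of the DNA, translate it codon by codon, and drop the stop codon) can be sketched as a standalone script. This is a minimal sketch and not the notebook's own code: the names `CODON_TABLE`, `cds_slice`, and `translate_cds` are hypothetical, the codon table is generated from the standard genetic code rather than typed out by hand, and the function raises an error on a length that is not a multiple of 3, where the notebook's `translate` returns an empty string.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order.
# "_" marks a stop codon, matching the symbol used in the notebook's table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cds_slice(seq, start, end):
    """Return the region given by 1-based, inclusive coordinates.

    (Hypothetical helper.) Annotation coordinates such as "start..end"
    count from 1 and include both ends, so the Python slice is
    seq[start - 1:end].
    """
    return seq[start - 1:end]


def translate_cds(seq, stop_symbol="_"):
    """Translate a coding sequence, stopping at the first stop codon.

    (Hypothetical helper.) Raises ValueError when the length is not a
    multiple of 3 instead of silently returning an empty string.
    """
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == stop_symbol:
            break  # the stop codon ends translation and is not emitted
        protein.append(amino_acid)
    return "".join(protein)


# Toy example: two flanking bases on each side of a 9-base coding region.
toy = "GGATGAAATGACC"
print(translate_cds(cds_slice(toy, 3, 11)))  # prints "MK"
```

Stopping at the stop codon inside the function means the caller can pass the full coding region without shortening the slice by three bases first.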