├── nbjekyll
    ├── nb_git
    │   ├── tests
    │   │   ├── __init__.py
    │   │   ├── test_repo.py
    │   │   ├── base.py
    │   │   └── files
    │   │   │   └── Tutorial.md
    │   ├── __init__.py
    │   └── nb_git.py
    ├── jekyllconvert
    │   ├── __init__.py
    │   ├── templates
    │   │   └── Jekyll_template.tpl
    │   └── jekyll_export.py
    ├── __init__.py
    └── convert_nbs.py
├── setup.cfg
├── MANIFEST.in
├── testenv.yml
├── .gitignore
├── .travis.yml
├── LICENSE
├── License.txt
├── setup.py
└── README.md


/nbjekyll/nb_git/tests/__init__.py:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/nbjekyll/jekyllconvert/__init__.py:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | description_file = README.md


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include nbjekyll/jekyllconvert/templates/*
2 | 


--------------------------------------------------------------------------------
/nbjekyll/nb_git/__init__.py:
--------------------------------------------------------------------------------
1 | from .nb_git import nb_repo


--------------------------------------------------------------------------------
/nbjekyll/__init__.py:
--------------------------------------------------------------------------------
1 | from . import jekyllconvert
2 | from . import nb_git


--------------------------------------------------------------------------------
/testenv.yml:
--------------------------------------------------------------------------------
 1 | name: testenv
 2 | dependencies:
 3 |   - python=3.6
 4 |   - pytest
 5 |   - pytz
 6 |   - pip
 7 |   - pip:
 8 |     - nbval
 9 |     - nbconvert
10 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | 
 2 | \.DS_Store
 3 | 
 4 | \.idea/
 5 | 
 6 | __pycache__/
 7 | 
 8 | 
 9 | 
10 | # Compiled python modules.
11 | *.pyc
12 | 
13 | # Setuptools distribution folder.
14 | /dist/
15 | 
16 | # Python egg metadata, regenerated from source files by setuptools.
17 | /*.egg-info
18 | /*.egg
19 | 
20 | \.cache/v/cache/
21 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | language: python
 2 | 
 3 | python:
 4 |   - 3.6
 5 | 
 6 | env:
 7 |   global:
 8 |     - PACKAGENAME="nbjekyll"
 9 | 
10 | before_install:
11 |   # Here we download miniconda and create our env
12 |   - export MINICONDA=$HOME/miniconda
13 |   - export PATH="$MINICONDA/bin:$PATH"
14 |   - hash -r
15 |   - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
16 |   - bash miniconda.sh -b -f -p $MINICONDA
17 |   - conda config --set always_yes yes
18 |   - conda update conda
19 |   - conda info -a
20 |   - conda env create -f testenv.yml -v
21 |   - source activate testenv
22 |   - conda install -c conda-forge pygit2
23 | 
24 | install:
25 |   - python setup.py install
26 | 
27 | script:
28 |   - pytest $PACKAGENAME
29 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 Tania Allard
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/License.txt:
--------------------------------------------------------------------------------
 1 | The MIT License
 2 | Copyright (c) 2018 Tania Allard <tania.sanchezmonroy@gmail.com>
 3 | Permission is hereby granted, free of charge, to any person obtaining
 4 | a copy of this software and associated documentation files (the
 5 | "Software"), to deal in the Software without restriction, including
 6 | without limitation the rights to use, copy, modify, merge, publish,
 7 | distribute, sublicense, and/or sell copies of the Software, and to
 8 | permit persons to whom the Software is furnished to do so, subject to
 9 | the following conditions:
10 | The above copyright notice and this permission notice shall be
11 | included in all copies or substantial portions of the Software.
12 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
13 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
14 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
15 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
16 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
17 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
18 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
19 | 


--------------------------------------------------------------------------------
/nbjekyll/nb_git/tests/test_repo.py:
--------------------------------------------------------------------------------
 1 | """Tests the nb_git functions """
 2 | import os
 3 | import unittest
 4 | import pygit2
 5 | import pytest
 6 | import time
 7 | import datetime
 8 | 
 9 | from . import base
10 | 
11 | from ..nb_git import nb_repo
12 | 
13 | here = os.getcwd()
14 | 
15 | class RepoTest(base.NoRepoTestCase):
16 | 
17 |     def test_discover_repo(self):
18 |         repo = pygit2.init_repository(self._temp_dir, False)
19 |         subdir = os.path.join(self._temp_dir, "test1", "test2")
20 |         os.makedirs(subdir)
21 |         self.assertEqual(repo.path, pygit2.discover_repository(subdir))
22 | 
23 | 
24 | def test_nb_repo():
25 |     """ Checks a repo is found """
26 |     repo = nb_repo(here)
27 |     assert repo.repo.path == pygit2.discover_repository(here)
28 | 
29 | def test_find_notebooks():
30 |     """Checks that the method finds the notebooks
31 |     in this repository """
32 |     notebooks = nb_repo(os.getcwd()).find_notebooks()
33 |     assert len(notebooks) == 1
34 |     assert notebooks[0] == 'Tutorial.ipynb'
35 | 
36 | def test_get_commit():
37 |     """Tests the get commit function, does it return a commit?
38 |     """
39 |     last_commit = nb_repo(os.getcwd()).get_commit()
40 |     repository = pygit2.Repository(pygit2.discover_repository(here))
41 |     assert last_commit['sha1'] == repository.revparse_single('HEAD').hex[0:7]
42 | 
43 | def test_convert_time():
44 |     now = time.time()
45 |     conv_time = nb_repo(os.getcwd()).convert_time(now)
46 |     now_full = datetime.datetime.fromtimestamp(now, datetime.timezone.utc).strftime("%d-%m-%Y")
47 |     assert conv_time == now_full
48 | 
49 | 
50 | if __name__ == '__main__':
51 |     unittest.main()


--------------------------------------------------------------------------------
/nbjekyll/nb_git/tests/base.py:
--------------------------------------------------------------------------------
 1 | """ Base TestCase for testing nb_git"""
 2 | import gc
 3 | import os
 4 | import shutil
 5 | import unittest
 6 | import tempfile
 7 | 
 8 | class NoRepoTestCase(unittest.TestCase):
 9 | 
10 |     def setUp(self):
11 |         self._temp_dir = tempfile.mkdtemp()
12 |         self.repo = None
13 | 
14 |     def tearDown(self):
15 |         del self.repo
16 |         gc.collect()
17 |         rmtree(self._temp_dir)
18 | 
19 |     def assertRaisesAssign(self, exc_class, instance, name, value):
20 |         try:
21 |             setattr(instance, name, value)
22 |         except Exception as exc:
23 |             self.assertEqual(exc_class, type(exc))
24 | 
25 |     def assertAll(self, func, entries):
26 |         return self.assertTrue(all(func(x) for x in entries))
27 | 
28 |     def assertAny(self, func, entries):
29 |         return self.assertTrue(any(func(x) for x in entries))
30 | 
31 |     def assertRaisesWithArg(self, exc_class, arg, func, *args, **kwargs):
32 |         try:
33 |             func(*args, **kwargs)
34 |         except exc_class as exc_value:
35 |             self.assertEqual((arg,), exc_value.args)
36 |         else:
37 |             self.fail('%s(%r) not raised' % (exc_class.__name__, arg))
38 | 
39 |     def assertEqualSignature(self, a, b):
40 |         # XXX Remove this once equality test is supported by Signature
41 |         self.assertEqual(a.name, b.name)
42 |         self.assertEqual(a.email, b.email)
43 |         self.assertEqual(a.time, b.time)
44 |         self.assertEqual(a.offset, b.offset)
45 | 
46 | 
47 | def force_rm_handle(remove_path, path, excinfo):
48 |     """Make `path` writable, then retry the removal that failed."""
49 |     os.chmod(path, os.stat(path).st_mode | 0o222)
50 |     remove_path(path)
51 | 
52 | 
53 | def rmtree(path):
54 |     """In Windows a read-only file cannot be removed, and shutil.rmtree fails.
55 |     So we implement our own version of rmtree to address this issue.
56 |     """
57 |     if os.path.exists(path):
58 |         shutil.rmtree(path, onerror=force_rm_handle)
59 | 


--------------------------------------------------------------------------------
/nbjekyll/jekyllconvert/templates/Jekyll_template.tpl:
--------------------------------------------------------------------------------
 1 | {% extends 'markdown.tpl' %}
 2 | 
 3 | {# custom header for jekyll post #}
 4 | {%- block header -%}
 5 | ---
 6 | layout: notebook
 7 | title: "{{resources['metadata']['name']}}"
 8 | tags:
 9 | update_date: [-date-]
10 | code_version: [-sha1-]
11 | author: [-author-]
12 | validation_pass: '[-validated-]'
13 | badge: "https://img.shields.io/badge/notebook-[-badge-]"
14 | ---
15 | <br />
16 | 
17 | 
18 | {%- if "widgets" in nb.metadata -%}
19 | <script src="https://unpkg.com/jupyter-js-widgets@2.0.*/dist/embed.js"></script>
20 | {%- endif-%}
21 | {%- endblock header -%}
22 | 
23 | 
24 | {% block in_prompt -%}
25 | {%- if cell.execution_count is defined -%}
26 | {%- if resources.global_content_filter.include_input_prompt-%}
27 | <font color='#808080'> In&nbsp;[{{ cell.execution_count|replace(None, "&nbsp;") }}]:</font>
28 | {%- else -%}
29 | <font color='#808080'> In&nbsp;[&nbsp;]: </font>
30 | {%- endif -%}
31 | {%- endif -%}
32 | {%- endblock in_prompt %}
33 | 
34 | 
35 | {# Images will be saved in the custom path #}
36 | {% block data_svg %}
37 | <img src="{{ output.metadata.filenames['image/svg+xml'] | jekyllpath }}" alt="svg" />
38 | {% endblock data_svg %}
39 | 
40 | {% block data_png %}
41 | <img src="{{ output.metadata.filenames['image/png'] | jekyllpath }}" alt="png"/>
42 | {% endblock data_png %}
43 | 
44 | {% block data_jpg %}
45 | <img src="{{ output.metadata.filenames['image/jpeg'] | jekyllpath }}" alt="jpeg" />
46 | {% endblock data_jpg %}
47 | 
48 | {# cells containing markdown text only #}
49 | {% block markdowncell scoped %}
50 | {{ cell.source | wrap_text(80) }}
51 | {% endblock markdowncell %}
52 | 
53 | {# headings #}
54 | {% block headingcell scoped %}
55 | {{ '#' * cell.level }} {{ cell.source | replace('\n', ' ') }}
56 | {% endblock headingcell %}
57 | 
58 | {% block stream -%}
59 | {% endblock stream %}
60 | 
61 | {# latex data block#}
62 | {% block data_latex %}
63 | {{ output.data['text/latex'] }}
64 | {% endblock data_latex %}
65 | 
66 | {% block data_text scoped %}
67 | {{ output.data['text/plain'] | indent }}
68 | {% endblock data_text %}
69 | 
70 | {% block data_html scoped -%}
71 | 
72 | {{ output.data['text/html'] }}
73 | 
74 | {%- endblock data_html %}
75 | 
76 | {% block data_markdown scoped -%}
77 | {{ output.data['text/markdown'] | markdown2html }}
78 | 
79 | {%- endblock data_markdown %}
80 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | 
 3 | from setuptools import setup, find_packages
 4 | 
 5 | name = 'nbjekyll'
 6 | 
 7 | pkg_root = os.path.join(os.path.dirname(__file__), name)
 8 | here = os.path.dirname(__file__)
 9 | 
10 | setup_args = dict(name = name,
11 |                   version = '0.1.1',
12 | 
13 |                   description = 'Package used for easy conversion from Jupyter notebook to Jekyll posts',
14 |                   long_description = open('README.md').read(),
15 | 
16 |                   url = 'https://github.com/trallard/nbconvert-jekyllconvert.git',
17 |                   download_url = 'https://github.com/trallard/nbjekyll/archive/v0.1.1.tar.gz',
18 | 
19 |                   # Author details
20 |                   author = 'Tania Allard',
21 |                   author_email = 'taniar.allard@gmail.com',
22 | 
23 |                   license = 'MIT',
24 |                   include_package_data = True,
25 | 
26 |                   # You can just specify the packages manually here if your project is
27 |                   # simple. Or you can use find_packages().
28 |                   packages = find_packages(),
29 |                   zip_safe = False,
30 | 
31 |                   install_requires = ['pygit2', 'nbval', 'nbconvert >= 5.0','pytz'],
32 | 
33 |                   keywords =  'jupyter, jekyll, teaching, dissemination, open science',
34 | 
35 |                   classifiers = [
36 |                     # Specify the Python versions you support here. In particular, ensure
37 |                     # that you indicate whether you support Python 2, Python 3 or both.
38 |                     'Programming Language :: Python :: 2',
39 |                     'Programming Language :: Python :: 2.6',
40 |                     'Programming Language :: Python :: 2.7',
41 |                     'Programming Language :: Python :: 3',
42 |                     'Programming Language :: Python :: 3.2',
43 |                     'Programming Language :: Python :: 3.3',
44 |                     'Programming Language :: Python :: 3.4',
45 |                     'Programming Language :: Python :: 3.5',
46 |                     'Programming Language :: Python :: 3.6',
47 |                     "Development Status :: 3 - Alpha",
48 |                     "Intended Audience :: Education",
49 |                     "License :: OSI Approved :: MIT License" ] )
50 | 
51 | if __name__ == '__main__':
52 |     setup(**setup_args)
53 | 


--------------------------------------------------------------------------------
/nbjekyll/convert_nbs.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | """
  3 | Script for conversion of .ipynb files into a format suitable
  4 | for Jekyll blog posts
  5 | """
  6 | 
  7 | import os
  8 | 
  9 | from pathlib import Path
 10 | from string import Template
 11 | import pytest
 12 | import argparse
 13 | 
 14 | from .nb_git.nb_git import nb_repo
 15 | from .jekyllconvert import jekyll_export
 16 | 
 17 | 
 18 | #-----------------------------------------------------------------------------
 19 | #Classes and functions
 20 | #-----------------------------------------------------------------------------
 21 | 
 22 | def validate_nb(nb):
 23 |     """
 24 |     Run pytest with nbval on the notebooks
 25 |     :param nb: path to the notebook to validate
 26 |     :return: validation status and badge for the pytest exit code,
 27 |     see https://docs.pytest.org/en/latest/usage.html?%20main
 28 |     """
 29 |     print("[nbjekyll] Running test on {}".format(os.path.split(nb)[1]))
 30 |     return validation_code(pytest.main([str(nb), '--nbval-lax']))
 31 | 
 32 | 
 33 | def validation_code(exit_code):
 34 |     """
 35 |     Check the exit code and pass the value to
 36 |     the dictionary containing the commit information
 37 |     :param exit_code:
 38 |     :return: validation status
 39 |     """
 40 |     if exit_code == 0:
 41 |         validated = 'yes'
 42 |         badge = 'validated-brightgreen.svg'
 43 |     elif exit_code == 1:
 44 |         validated = 'no'
 45 |         badge = 'validation%20failed-red.svg'
 46 |     else:
 47 |         validated = 'unknown'
 48 |         badge = 'unknown%20status-yellow.svg'
 49 |     return [validated, badge]
 50 | 
 51 | def format_template(commit_info, nb):
 52 |     """
 53 |     Replace the template data with the information
 54 |     collected from the commit before
 55 |     :param commit_info:
 56 |     :param nb:
 57 |     :return: modified .md for the notebook previously
 58 |     converted
 59 |     """
 60 | 
 61 |     nb_path = os.path.splitext(os.path.abspath(nb))[0] + '.md'
 62 |     with open(nb_path, 'r+') as file:
 63 |         template = NbTemplate(file.read())
 64 |         updated = template.substitute(commit_info)
 65 |         file.seek(0)
 66 |         file.write(updated)
 67 |         file.truncate()
 68 | 
 69 | 
 70 | class NbTemplate(Template):
 71 |     """
 72 |     Subclass of Template, this uses [- -] as the delimiter sequence
 73 |     to replace the template variables instead of the default $, ${}, $$
 74 |     as this causes problems when the notebooks use the R kernel
 75 |     """
 76 |     delimiter = '[-'
 77 |     pattern = r'''
 78 |         \[-(?:
 79 |            (?P<escaped>-) |            # Expression [-- will become [-
 80 |            (?P<named>[^\[\]\n-]+)-\] | # -, [, ], and \n can't be used in names
 81 |            \b\B(?P<braced>) |          # Braced names disabled
 82 |            (?P<invalid>)               #
 83 |         )
 84 |         '''
 85 | 
 86 | def parse_path():
 87 |     arg_parser = argparse.ArgumentParser(description="Convert Jupyter notebooks to Jekyll posts")
 88 |     arg_parser.add_argument('-p', '--path',
 89 |                             help="Custom path to save the Notebook images. The path in the"
 90 |                             " output markdown will be modified accordingly")
 91 | 
 92 |     return arg_parser.parse_args()
 93 | 
 94 | if __name__ == '__main__':
 95 |     args = parse_path()
 96 |     if args.path:
 97 |         img_path = args.path
 98 |     else:
 99 |         img_path = './images/notebook_images'
100 | 
101 |     print('[nbjekyll] Images will be saved in [{}]'.format(img_path))
102 | 
103 |     here = os.getcwd()
104 | 
105 |     # Step one: find if this is a repository
106 |     repository = nb_repo(here)
107 | 
108 |     # Find the notebooks that have been added to the repo
109 |     # or that have been updated in the last commit
110 |     notebooks = repository.check_log()
111 |     # Convert each of the notebooks using nbconvert
112 |     # then add repo specific information
113 |     for nb in notebooks['notebooks']:
114 |         nb_path = Path(nb).resolve()
115 |         if os.path.exists(nb_path):
116 |             # convert the notebook in a .md
117 |             print('[nbjekyll] Converting [{}]'.format(nb))
118 |             jekyll_export.convert_single_nb(nb_path, img_path)
119 |             # use nbval for the notebook
120 |             test = validate_nb(nb_path)
121 |             notebooks['validated'] = test[0]
122 |             notebooks['badge'] = test[1]
123 | 
124 |             # substitute header
125 |             format_template(notebooks, nb)
126 |             print('[nbjekyll] Finalising conversion of [{}]'.format(nb))
127 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # nbjekyll
 2 | 
 3 | | Release                                                                                                                                                                                              | Usage                                                                                                       | Development                                                                                                           |
 4 | |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
 5 | | [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active) | [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) | [![Build Status](https://travis-ci.org/trallard/nbjekyll.svg?branch=master)](https://travis-ci.org/trallard/nbjekyll) |
 6 | | [![PyPI](https://img.shields.io/pypi/v/nbjekyll.svg)](https://pypi.python.org/pypi/nbjekyll)                                                                                                         | [![PyPI](https://img.shields.io/pypi/pyversions/nbjekyll.svg)]()                                            |                                                                                                                       |
 7 | 
 8 | An experimental tool to convert Jupyter notebooks to .md files that could be immediately passed into Jekyll for publishing.
 9 | 
10 | Jupyter comes with support for generating .md files through its own exporters and templates. This is a very robust approach, but far from ideal for .md conversion for Jekyll static blogs.
11 | 
12 | nbjekyll uses the nbconvert markdown exporter but ensures that the plots generated in the notebooks are saved in a separate directory and that the paths to such plots can be easily interpreted by Jekyll.
13 | The path for the plots is by default specified as `./images/notebook_images/{Notebook_name}` but it can be modified if needed. For more details see [Usage](#usage).
14 | 
15 | 
16 | nbjekyll uses [nbval](https://github.com/computationalmodelling/nbval) to test the notebooks. Depending on the status code (see [pytest exit codes](https://docs.pytest.org/en/latest/usage.html)) the validation status and the appropriate badge are added:
17 | 
18 | ![](https://img.shields.io/badge/notebook-validated-brightgreen.svg)
19 | 
20 | ![](https://img.shields.io/badge/notebook-validation%20failed-red.svg)
21 | 
22 | ![](https://img.shields.io/badge/notebook-unknown%20status-yellow.svg)
23 | 
24 | nbjekyll returns a .md file with the YAML header needed for Jekyll. It contains the mandatory fields of `title` and `layout` (by default set to notebook). It also adds other fields such as the sha1 and author of the last commit associated with the notebook, the last update date, and a badge indicating if the notebook passed the validation with nbval.
25 | 
26 | ```yaml
27 | ---
28 | layout: notebook
29 | title: "Classify_demo"
30 | tags:
31 | update_date: 17-01-2018
32 | code_version: 19e3e29
33 | author: Tania Allard
34 | validation_pass: 'yes'
35 | badge: "https://img.shields.io/badge/notebook-validated-brightgreen.svg"
36 | ---
37 | ```
38 | 
39 | You can see a Jekyll site using the converted notebooks [here](http://bitsandchips.me/Modules-template/) ✨⚡️
40 | 
41 | ## Install
42 | nbjekyll is available from [PyPI](https://pypi.python.org/pypi/nbjekyll), so you can install it with pip:
43 | ```bash
44 | pip install nbjekyll
45 | ```
46 | 
47 | ## Usage
48 | Once the package is installed you can start using it directly from
49 | your Jekyll site directory.
50 | 
51 | 1. Add the Jupyter notebook you want to add to your blog
52 | 2. Commit the notebook or notebooks to Git
53 | 3. Run the Jekyll converter from the terminal. Make sure to run it from the
54 | main directory of your Jekyll blog:
55 | 
56 | ```bash
57 | python -m nbjekyll.convert_nbs
58 | ```
59 | If you want your output images to be saved in a different path, use the `-p` (or `--path`) flag like so:
60 | 
61 | ```bash
62 | python -m nbjekyll.convert_nbs -p ./site_images
63 | ```
64 | 4. Make sure to modify the layout in your .md YAML header!
65 | 
66 | ## Important things to consider
67 | - **You need to commit your notebooks to Git _right_ before using nbjekyll**
68 | 
69 | At the moment nbjekyll checks the last commit in your repository and converts the notebooks associated with that commit.
70 | 
71 | We are looking into changing this to allow for more flexibility in the near future.
72 | 
73 | - **What are the prerequisites?**
74 |   - Python > 3.4
75 |   - pytest
76 |   - nbval
77 |   - nbconvert > 5.0
78 |   - pygit2 (if you use conda the easiest way to get this installed is by doing `conda install -c conda-forge pygit2`)
79 | 
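As a rough illustration of how the `[-date-]`, `[-sha1-]` and similar placeholders in the YAML header get filled in, the sketch below reuses the `[- -]` delimiter idea from `convert_nbs.py` on an in-memory string. The `header` text and the `commit_info` values are made up for the example; the real tool reads them from the converted .md file and from the last Git commit.

```python
from string import Template


class NbTemplate(Template):
    """Template that uses [-name-] placeholders instead of $name."""
    delimiter = '[-'
    pattern = r'''
        \[-(?:
           (?P<escaped>-) |            # [-- is an escaped delimiter
           (?P<named>[^\[\]\n-]+)-\] | # [-name-] is a placeholder
           \b\B(?P<braced>) |          # braced names disabled (never matches)
           (?P<invalid>)               # anything else after [- is invalid
        )
        '''


# Hypothetical header and values, for illustration only.
header = "update_date: [-date-]\ncode_version: [-sha1-]\nprice: $5\n"
commit_info = {'date': '17-01-2018', 'sha1': '19e3e29'}

filled = NbTemplate(header).substitute(commit_info)
print(filled)
```

Because the delimiter is `[-` rather than `$`, text such as `$5` (or R code like `df$column`) passes through untouched, which is the reason the custom delimiter is used.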


--------------------------------------------------------------------------------
/nbjekyll/nb_git/nb_git.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Functions used to get details on the git repository
  3 | and its commits.
  4 | It is used to find which notebooks were modified in a specific commit.
  5 | """
  6 | 
  7 | import pygit2
  8 | import os
  9 | import fnmatch
 10 | import glob
 11 | from pathlib import Path
 12 | from datetime import datetime
 13 | import pytz
 14 | 
 15 | class nb_repo(object):
 16 |     """ Class containing methods used to
 17 |     identify the notebooks committed to the
 18 |     repository and add the SHA to the Jinja template"""
 19 | 
 20 |     def __init__(self, here):
 21 |         """ Find if the current location is a
 22 |         Git repository, if so it will return a
 23 |         repository object """
 24 |         try:
 25 |             repo_path = pygit2.discover_repository(here)
 26 |             repo = pygit2.Repository(repo_path)
 27 |             self.repo = repo
 28 |             self.here = here
 29 |         except Exception:
 30 |             raise RuntimeError("[nbjekyll] This does not seem to be a repository, "
 31 |                                "make sure you are in an initialized repo")
 32 | 
 33 | 
 34 |     def check_log(self):
 35 |         """ Check the number of commits in the repository,
 36 |         if there is only one commit it will find all the notebooks
 37 |         inside the repository.
 38 |         Otherwise, it will find the notebooks in the latest commit only
 39 |         Returns
 40 |             -------
 41 |             notebooks: dictionary containing the sha1 for the commit,
 42 |             the list of found notebooks, author, and the date when
 43 |             the notebooks were last updated
 44 |         """
 45 |         all_commits = [commit for commit in self.repo.head.log()]
 46 |         if len(all_commits) <= 1:
 47 |             print('[nbjekyll] Only one commit: converting all the notebooks in the repo')
 48 |             # calls function find_notebooks
 49 |             notebooks = self.find_notebooks()
 50 |             commit_info = self.get_commit()
 51 |             commit_info['notebooks'] = notebooks
 52 | 
 53 |             return commit_info
 54 |         else:
 55 |             print(("[nbjekyll] There are notebooks already in version control,"
 56 |                    " finding the notebooks passed in the last commit"))
 57 |             # calls function last_commit
 58 |             notebooks = self.last_commit()
 59 |             return notebooks
 60 | 
 61 |     def find_notebooks(self):
 62 |         """ Find all the notebooks in the repo, but excludes those
 63 |         in the _site folder.
 64 |         Returns
 65 |             -------
 66 |             notebooks: list with the file names of the notebooks found,
 67 |             empty if there are none
 68 |         """
 69 |         basePath = os.getcwd()
 70 |         notebooks = list()
 71 |         for root, dirs, files in os.walk(basePath):
 72 |             # prune _site in place so that os.walk never descends into
 73 |             # the Jekyll build output
 74 |             dirs[:] = [d for d in dirs if d != '_site']
 75 |             for file in files:
 76 |                 if file.endswith('.ipynb'):
 77 |                     notebooks.append(file)
 78 | 
 79 |         if not notebooks:
 80 |             print('[nbjekyll] There were no notebooks found')
 81 | 
 82 |         # always return a list so that callers can iterate over the
 83 |         # result even when no notebooks were found
 84 |         return notebooks
 85 | 
 86 |     def last_commit(self):
 87 |         """ Find the notebooks modified in the last commit, but excludes those
 88 |         in the _site folder.
 89 |         Returns
 90 |             -------
 91 |             notebooks: dictionary containing the sha1, author, notebook
 92 |             names, and date of the commit
 93 |             if no notebooks were modified in the last commit then
 94 |             the list notebooks is an empty list
 95 |         """
 96 | 
 97 |         commit_info = self.get_commit()
 98 |         parent_commit = self.repo.get(commit_info['parent'])
 99 |         #notebooks = [nb.name for nb in self.repo.revparse_single('HEAD').tree if '.ipynb' in nb.name]
100 |         diff = self.repo.revparse_single('HEAD').tree.diff_to_tree(parent_commit.tree)
101 |         patches = [p for p in diff]
102 |         notebooks = [patch.delta.new_file.path for patch in patches if patch.delta.new_file.path.endswith('.ipynb')]
103 |         commit_info['notebooks'] = notebooks
104 |         del commit_info['parent']
105 | 
106 |         return commit_info
107 | 
108 | 
109 |     def convert_time(self, epoch):
110 |         """
111 |         Pass on the epoch date from the last commit and
112 |         returns it in a human readable format
113 |         :param epoch:
114 |         :return: commit date in a dd-mm-YYYY format
115 |         """
116 |         time_zone = pytz.timezone('GMT')
117 |         dt = datetime.fromtimestamp(epoch, time_zone)
118 | 
119 |         return dt.strftime('%d-%m-%Y')
120 | 
121 |     def get_commit(self):
122 |         """
123 |         Get the information for the last commit in the repository
124 |         :return: dictionary with the sha1, date, and author
125 |         of the last commit.
126 |         """
127 |         last = self.repo.revparse_single('HEAD')
128 |         sha1 = last.hex[0:7]
129 |         author = last.author.name
130 | 
131 |         parent_commit_id = last.parents[0].id if last.parents else None
132 | 
133 |         date = self.convert_time(last.author.time)
134 |         commit_info = {'sha1': sha1,
135 |                        'date': date,
136 |                        'author': author,
137 |                        'parent': parent_commit_id}
138 | 
139 |         return commit_info
140 | 
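The date written to `update_date` comes from the commit's epoch timestamp. A minimal, standard-library-only sketch of that conversion follows. It assumes UTC is an acceptable stand-in for the `GMT` zone that `convert_time` obtains from `pytz`, and `epoch_to_date` is a hypothetical helper name that is not part of nbjekyll.

```python
from datetime import datetime, timezone


def epoch_to_date(epoch):
    """Format seconds since the Unix epoch as dd-mm-YYYY in UTC."""
    return datetime.fromtimestamp(epoch, timezone.utc).strftime('%d-%m-%Y')


# Two sample timestamps: the epoch itself and midnight UTC on 17 January 2018.
print(epoch_to_date(0))
print(epoch_to_date(1516147200))
```

Pinning the time zone makes the result independent of the machine running the conversion, which matters when the same commit is converted on a laptop and on a CI server.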


--------------------------------------------------------------------------------
/nbjekyll/jekyllconvert/jekyll_export.py:
--------------------------------------------------------------------------------
  1 | """
  2 | This uses the Jekyll markdown exporter to convert .ipynb files
  3 | to .md files.
  4 | It ensures that the images are not saved as base64 but as separate
  5 | .png, .jpg, or .svg files in an images directory and that the
  6 | path to this is accurately updated using a custom jekyllpath filter.
  7 | 
  8 | It also uses Beautiful Soup to do some basic HTML parsing.
  9 | """
 10 | import os
 11 | import io
 12 | 
 13 | from traitlets.config import Config
 14 | from nbconvert import MarkdownExporter
 15 | from ipython_genutils.path import ensure_dir_exists
 16 | 
 17 | from bs4 import BeautifulSoup
 18 | 
 19 | def init_nb_resources(notebook_filename, img_path):
 20 |     """Step 1: Initialize resources
 21 |             This initializes the resources dictionary for a single notebook.
 22 |             Returns
 23 |             -------
 24 |             resources dictionary for a single notebook that MUST include:
 25 |                 - unique_key: notebook name
 26 |     """
 27 |     resources = {}
 28 |     basename = os.path.basename(notebook_filename)
 29 |     notebook_name = os.path.splitext(basename)[0]
 30 |     resources['unique_key'] = notebook_name
 31 |     #resources['output_files_dir'] = './images/notebook_images/{}'.format(notebook_name)
 32 |     resources['output_files_dir'] = img_path + '/' + notebook_name
 33 |     return resources
 34 | 
 35 | def export_notebook(notebook_filename, resources):
 36 |     """Step 2: Export the notebook
 37 |         Exports the notebook to a particular format according to the specified
 38 |         exporter. This function returns the output and (possibly modified)
 39 |         resources from the exporter.
 40 |         Parameters
 41 |         ----------
 42 |         notebook_filename : str
 43 |             name of notebook file.
 44 |         resources : dict
 45 |         Returns
 46 |         -------
 47 |         output
 48 |         dict
 49 |             resources (possibly modified)
 50 |         """
 51 |     config = Config()
 52 |     basePath = os.path.dirname(__file__)
 53 |     exporter = MarkdownExporter(config = config,
 54 |                                 template_path = [os.path.join(basePath,'templates/')],
 55 |                                 template_file = 'Jekyll_template.tpl',
 56 |                                 filters = {'jekyllpath': jekyllpath})
 57 |     content, resources = exporter.from_filename(notebook_filename, resources = resources)
 58 |     content = parse_html(content)
 59 |     return content, resources
 60 | 
 61 | def parse_html(content):
 62 |     """ This step is included in Step 2: this will use beautiful soup to
 63 |     modify certain tags of the returned content
 64 |     Parameters
 65 |     ----------
 66 |     content : returned from the notebook export
 67 |     Returns
 68 |     -------
 69 |     soup (parsed html content)
 70 |     """
 71 |     soup = BeautifulSoup(content, 'html.parser')
 72 |     if soup.table:
 73 |         for tag in soup.find_all('table'):
 74 |             tag['class'] = 'table-responsive table-striped'
 75 |             tag['border'] = '0'
 76 |     return soup
 77 | 
 78 | 
 79 | def jekyllpath(path):
 80 |     """ Take the filepath of an image output by the ExportOutputProcessor
 81 |     and convert it into a URL we can use with Jekyll. This is passed to the
 82 |     exporter as a filter.
 83 |     Note that site.url and site.baseurl are taken from the Jekyll _config.yml file
 84 |     """
 85 |     return path.replace("./", "{{site.url}}{{site.baseurl}}/")
 86 | 
 87 | def write_outputs(content, resources):
 88 |     """Step 3: Write the notebook to file
 89 |             This writes the exported content to file and saves any
 90 |             extracted images. It does not return a value.
 91 |             Parameters
 92 |             ----------
 93 |             content : BeautifulSoup object returned by export_notebook
 94 |             resources : dict
 95 |                 resources for a single notebook including name, config directory
 96 |                 and directory to save output
 97 |             Returns
 98 |             -------
 99 |             None
100 |                 the output file and images are written to disk
101 |             """
102 | 
103 |     # various paths and variables needed for the module
104 |     notebook_namefull = resources['metadata']['name'] + resources.get('output_extension')
105 |     outdir_nb = resources['metadata']['path']
106 |     outfile = os.path.join(outdir_nb, notebook_namefull)
107 |     imgs_outdir = resources.get('output_files_dir')
108 |     ensure_dir_exists(imgs_outdir)
109 | 
110 |     # write file in the appropriate format
111 |     with io.open(outfile, 'w', encoding = "utf-8") as fout:
112 |         body = content.prettify(formatter='html')
113 |         fout.write(body)
114 | 
115 |     # if the content has images then they are returned and saved
116 |     if resources['outputs']:
117 |         save_imgs(resources, imgs_outdir)
118 | 
119 | def save_imgs(resources, imgs_outdir):
120 |     """ If the notebook had plots or figures, then they are saved in the appropriate
121 |     directory"""
122 |     items = resources.get('outputs', {}).items()
123 |     if not os.path.exists(imgs_outdir):
124 |         os.mkdir(imgs_outdir)
125 |     for filename, data in items:
126 |         dest = filename
127 |         with io.open(dest, 'wb+') as f:
128 |             f.write(data)
129 | 
130 | def convert_single_nb(notebook_filename, img_path):
131 |     """Convert a single notebook.
132 |             Performs the following steps:
133 |                 1. Initialize notebook resources
134 |                 2. Export the notebook to a particular format
135 |                 3. Write the exported notebook to file as well as complementary images
136 |             Parameters
137 |             ----------
138 |             notebook_filename : str
139 |             img_path : str
140 |             """
141 |     resources = init_nb_resources(notebook_filename, img_path)
142 |     content, resources = export_notebook(notebook_filename, resources)
143 |     write_outputs(content, resources)
144 | 
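
The first step and the `jekyllpath` filter can be tried on their own. This is a minimal sketch that copies the two pure-Python helpers from the module so it runs without nbconvert; the notebook path, image directory, and image filename are hypothetical values chosen for illustration.

```python
import os


def init_nb_resources(notebook_filename, img_path):
    """Build the resources dictionary for a single notebook."""
    basename = os.path.basename(notebook_filename)
    notebook_name = os.path.splitext(basename)[0]
    return {
        'unique_key': notebook_name,
        'output_files_dir': img_path + '/' + notebook_name,
    }


def jekyllpath(path):
    """Turn a relative image path into a Jekyll (Liquid) URL."""
    return path.replace("./", "{{site.url}}{{site.baseurl}}/")


# Hypothetical inputs, for illustration only
resources = init_nb_resources('notebooks/Tutorial.ipynb',
                              './images/notebook_images')
print(resources['unique_key'])        # Tutorial
print(resources['output_files_dir'])  # ./images/notebook_images/Tutorial
print(jekyllpath('./images/notebook_images/Tutorial/plot.png'))
# {{site.url}}{{site.baseurl}}/images/notebook_images/Tutorial/plot.png
```

Because `str.replace` rewrites every occurrence of `./`, a path containing `../` would also be altered, so the filter appears to assume image paths that start with a single `./`.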


--------------------------------------------------------------------------------
/nbjekyll/nb_git/tests/files/Tutorial.md:
--------------------------------------------------------------------------------
   1 | ---
   2 | layout: notebook
   3 | title: "Tutorial"
   4 | tags:
   5 | update_date: 12-01-2018
   6 | code_version: d2de4b0
   7 | author: Tania Allard
   8 | ---
   9 | <br/>
  10 | <img src="https://img.shields.io/badge/notebook-validation failed-red.svg">
  11 |  <br/>
  12 |  # BAD Day 1: Tutorial
  13 | 
  14 | # 0. Source/install the needed packages
  15 |  <font color="#808080">
  16 |   In&nbsp;[1]:
  17 |  </font>
  18 |  ```R
  19 | # In case you need to install the packages
  20 |  install.packages("xlsx")
  21 |  install.packages("gdata")
  22 |  install.packages("ape")
  23 | ```
  24 |  <font color="#808080">
  25 |   In&nbsp;[2]:
  26 |  </font>
  27 |  ```R
  28 | source("http://bioconductor.org/biocLite.R");
  29 | biocLite("multtest");
  30 | ```
  31 | 
  32 | 
  33 | # 1. Exploratory data analysis
  34 | 
  35 | We will be using the Gene Expression dataset from **Golub et al. (1999)**. The
  36 | gene expression data collected by Golub et al. (1999) are among the most
  37 | classical in bioinformatics. A selection of the set is called `golub` which is
  38 | contained in the `multtest` package loaded before.
  39 | 
  40 | 
  41 | The data consist of gene expression values of 3051 genes (rows) from 38 leukemia
  42 | patients. Pre-processing was done as described in Dudoit et al. (2002). The R
  43 | code for pre-processing is available in the file ../doc/golub.R.
  44 | 
  45 | **Source**:
  46 | Golub et al. (1999). Molecular classification of cancer: class discovery and
  47 | class prediction by gene expression monitoring, Science, Vol. 286:531-537.
  48 | (http://www-genome.wi.mit.edu/MPR/).
  49 |  <font color="#808080">
  50 |   In&nbsp;[3]:
  51 |  </font>
  52 |  ```R
  53 | require(multtest);
  54 | 
  55 | # Usage
  56 | data(golub);
  57 | 
  58 | # If you need more information on the data set just
  59 | # uncomment the line below
  60 | # ?golub
  61 | ```
  62 | 
  63 | Data set values:
  64 | - `golub`: matrix of gene expression levels for the 38 tumor mRNA samples, rows
  65 | correspond to genes (3051 genes) and columns to mRNA samples.
  66 | - `golub.cl`: numeric vector indicating the tumor class, 27 acute lymphoblastic
  67 | leukemia (ALL) cases (code 0) and 11 acute myeloid leukemia (AML) cases (code
  68 | 1).
  69 | - `golub.gnames`: a matrix containing the names of the 3051 genes for the
  70 | expression matrix golub. The three columns correspond to the gene index, ID, and
  71 | Name, respectively.
  72 |  <font color="#808080">
  73 |   In&nbsp;[4]:
  74 |  </font>
  75 |  ```R
  76 | # Checking the dimension of the data
  77 | dim(golub)
  78 | ```
  79 |  <ol class="list-inline">
  80 |   <li>
  81 |    3051
  82 |   </li>
  83 |   <li>
  84 |    38
  85 |   </li>
  86 |  </ol>
  87 |  <font color="#808080">
  88 |   In&nbsp;[5]:
  89 |  </font>
  90 |  ```R
  91 | # we will have a look at the first rows contained in the data set
  92 | head(golub)
  93 | ```
  94 |  <table class="table-responsive table-striped">
  95 |   <tbody>
  96 |    <tr>
  97 |     <td>
  98 |      -1.45769
  99 |     </td>
 100 |     <td>
 101 |      -1.39420
 102 |     </td>
 103 |     <td>
 104 |      -1.42779
 105 |     </td>
 106 |     <td>
 107 |      -1.40715
 108 |     </td>
 109 |     <td>
 110 |      -1.42668
 111 |     </td>
 112 |     <td>
 113 |      -1.21719
 114 |     </td>
 115 |     <td>
 116 |      -1.37386
 117 |     </td>
 118 |     <td>
 119 |      -1.36832
 120 |     </td>
 121 |     <td>
 122 |      -1.47649
 123 |     </td>
 124 |     <td>
 125 |      -1.21583
 126 |     </td>
 127 |     <td>
 128 |      ⋯
 129 |     </td>
 130 |     <td>
 131 |      -1.08902
 132 |     </td>
 133 |     <td>
 134 |      -1.29865
 135 |     </td>
 136 |     <td>
 137 |      -1.26183
 138 |     </td>
 139 |     <td>
 140 |      -1.44434
 141 |     </td>
 142 |     <td>
 143 |      1.10147
 144 |     </td>
 145 |     <td>
 146 |      -1.34158
 147 |     </td>
 148 |     <td>
 149 |      -1.22961
 150 |     </td>
 151 |     <td>
 152 |      -0.75919
 153 |     </td>
 154 |     <td>
 155 |      0.84905
 156 |     </td>
 157 |     <td>
 158 |      -0.66465
 159 |     </td>
 160 |    </tr>
 161 |    <tr>
 162 |     <td>
 163 |      -0.75161
 164 |     </td>
 165 |     <td>
 166 |      -1.26278
 167 |     </td>
 168 |     <td>
 169 |      -0.09052
 170 |     </td>
 171 |     <td>
 172 |      -0.99596
 173 |     </td>
 174 |     <td>
 175 |      -1.24245
 176 |     </td>
 177 |     <td>
 178 |      -0.69242
 179 |     </td>
 180 |     <td>
 181 |      -1.37386
 182 |     </td>
 183 |     <td>
 184 |      -0.50803
 185 |     </td>
 186 |     <td>
 187 |      -1.04533
 188 |     </td>
 189 |     <td>
 190 |      -0.81257
 191 |     </td>
 192 |     <td>
 193 |      ⋯
 194 |     </td>
 195 |     <td>
 196 |      -1.08902
 197 |     </td>
 198 |     <td>
 199 |      -1.05094
 200 |     </td>
 201 |     <td>
 202 |      -1.26183
 203 |     </td>
 204 |     <td>
 205 |      -1.25918
 206 |     </td>
 207 |     <td>
 208 |      0.97813
 209 |     </td>
 210 |     <td>
 211 |      -0.79357
 212 |     </td>
 213 |     <td>
 214 |      -1.22961
 215 |     </td>
 216 |     <td>
 217 |      -0.71792
 218 |     </td>
 219 |     <td>
 220 |      0.45127
 221 |     </td>
 222 |     <td>
 223 |      -0.45804
 224 |     </td>
 225 |    </tr>
 226 |    <tr>
 227 |     <td>
 228 |      0.45695
 229 |     </td>
 230 |     <td>
 231 |      -0.09654
 232 |     </td>
 233 |     <td>
 234 |      0.90325
 235 |     </td>
 236 |     <td>
 237 |      -0.07194
 238 |     </td>
 239 |     <td>
 240 |      0.03232
 241 |     </td>
 242 |     <td>
 243 |      0.09713
 244 |     </td>
 245 |     <td>
 246 |      -0.11978
 247 |     </td>
 248 |     <td>
 249 |      0.23381
 250 |     </td>
 251 |     <td>
 252 |      0.23987
 253 |     </td>
 254 |     <td>
 255 |      0.44201
 256 |     </td>
 257 |     <td>
 258 |      ⋯
 259 |     </td>
 260 |     <td>
 261 |      -0.43377
 262 |     </td>
 263 |     <td>
 264 |      -0.10823
 265 |     </td>
 266 |     <td>
 267 |      -0.29385
 268 |     </td>
 269 |     <td>
 270 |      0.05067
 271 |     </td>
 272 |     <td>
 273 |      1.69430
 274 |     </td>
 275 |     <td>
 276 |      -0.12472
 277 |     </td>
 278 |     <td>
 279 |      0.04609
 280 |     </td>
 281 |     <td>
 282 |      0.24347
 283 |     </td>
 284 |     <td>
 285 |      0.90774
 286 |     </td>
 287 |     <td>
 288 |      0.46509
 289 |     </td>
 290 |    </tr>
 291 |    <tr>
 292 |     <td>
 293 |      3.13533
 294 |     </td>
 295 |     <td>
 296 |      0.21415
 297 |     </td>
 298 |     <td>
 299 |      2.08754
 300 |     </td>
 301 |     <td>
 302 |      2.23467
 303 |     </td>
 304 |     <td>
 305 |      0.93811
 306 |     </td>
 307 |     <td>
 308 |      2.24089
 309 |     </td>
 310 |     <td>
 311 |      3.36576
 312 |     </td>
 313 |     <td>
 314 |      1.97859
 315 |     </td>
 316 |     <td>
 317 |      2.66468
 318 |     </td>
 319 |     <td>
 320 |      -1.21583
 321 |     </td>
 322 |     <td>
 323 |      ⋯
 324 |     </td>
 325 |     <td>
 326 |      0.29598
 327 |     </td>
 328 |     <td>
 329 |      -1.29865
 330 |     </td>
 331 |     <td>
 332 |      2.76869
 333 |     </td>
 334 |     <td>
 335 |      2.08960
 336 |     </td>
 337 |     <td>
 338 |      0.70003
 339 |     </td>
 340 |     <td>
 341 |      0.13854
 342 |     </td>
 343 |     <td>
 344 |      1.75908
 345 |     </td>
 346 |     <td>
 347 |      0.06151
 348 |     </td>
 349 |     <td>
 350 |      1.30297
 351 |     </td>
 352 |     <td>
 353 |      0.58186
 354 |     </td>
 355 |    </tr>
 356 |    <tr>
 357 |     <td>
 358 |      2.76569
 359 |     </td>
 360 |     <td>
 361 |      -1.27045
 362 |     </td>
 363 |     <td>
 364 |      1.60433
 365 |     </td>
 366 |     <td>
 367 |      1.53182
 368 |     </td>
 369 |     <td>
 370 |      1.63728
 371 |     </td>
 372 |     <td>
 373 |      1.85697
 374 |     </td>
 375 |     <td>
 376 |      3.01847
 377 |     </td>
 378 |     <td>
 379 |      1.12853
 380 |     </td>
 381 |     <td>
 382 |      2.17016
 383 |     </td>
 384 |     <td>
 385 |      -1.21583
 386 |     </td>
 387 |     <td>
 388 |      ⋯
 389 |     </td>
 390 |     <td>
 391 |      -1.08902
 392 |     </td>
 393 |     <td>
 394 |      -1.29865
 395 |     </td>
 396 |     <td>
 397 |      2.00518
 398 |     </td>
 399 |     <td>
 400 |      1.17454
 401 |     </td>
 402 |     <td>
 403 |      -1.47218
 404 |     </td>
 405 |     <td>
 406 |      -1.34158
 407 |     </td>
 408 |     <td>
 409 |      1.55086
 410 |     </td>
 411 |     <td>
 412 |      -1.18107
 413 |     </td>
 414 |     <td>
 415 |      1.01596
 416 |     </td>
 417 |     <td>
 418 |      0.15788
 419 |     </td>
 420 |    </tr>
 421 |    <tr>
 422 |     <td>
 423 |      2.64342
 424 |     </td>
 425 |     <td>
 426 |      1.01416
 427 |     </td>
 428 |     <td>
 429 |      1.70477
 430 |     </td>
 431 |     <td>
 432 |      1.63845
 433 |     </td>
 434 |     <td>
 435 |      -0.36075
 436 |     </td>
 437 |     <td>
 438 |      1.73451
 439 |     </td>
 440 |     <td>
 441 |      3.36576
 442 |     </td>
 443 |     <td>
 444 |      0.96870
 445 |     </td>
 446 |     <td>
 447 |      2.72368
 448 |     </td>
 449 |     <td>
 450 |      -1.21583
 451 |     </td>
 452 |     <td>
 453 |      ⋯
 454 |     </td>
 455 |     <td>
 456 |      -1.08902
 457 |     </td>
 458 |     <td>
 459 |      -1.29865
 460 |     </td>
 461 |     <td>
 462 |      1.73780
 463 |     </td>
 464 |     <td>
 465 |      0.89347
 466 |     </td>
 467 |     <td>
 468 |      -0.52883
 469 |     </td>
 470 |     <td>
 471 |      -1.22168
 472 |     </td>
 473 |     <td>
 474 |      0.90832
 475 |     </td>
 476 |     <td>
 477 |      -1.39906
 478 |     </td>
 479 |     <td>
 480 |      0.51266
 481 |     </td>
 482 |     <td>
 483 |      1.36249
 484 |     </td>
 485 |    </tr>
 486 |   </tbody>
 487 |  </table>
 488 |  The gene names are collected in the matrix `golub.gnames` of which the columns
 489 | correspond to the gene index, ID, and Name, respectively.
 490 |  <font color="#808080">
 491 |   In&nbsp;[6]:
 492 |  </font>
 493 |  ```R
 494 | # Adding 3051 gene names
 495 | row.names(golub) = golub.gnames[,3]
 496 | 
 497 | head(golub)
 498 | ```
 499 |  <table class="table-responsive table-striped">
 500 |   <tbody>
 501 |    <tr>
 502 |     <th scope="row">
 503 |      AFFX-HUMISGF3A/M97935_MA_at
 504 |     </th>
 505 |     <td>
 506 |      -1.45769
 507 |     </td>
 508 |     <td>
 509 |      -1.39420
 510 |     </td>
 511 |     <td>
 512 |      -1.42779
 513 |     </td>
 514 |     <td>
 515 |      -1.40715
 516 |     </td>
 517 |     <td>
 518 |      -1.42668
 519 |     </td>
 520 |     <td>
 521 |      -1.21719
 522 |     </td>
 523 |     <td>
 524 |      -1.37386
 525 |     </td>
 526 |     <td>
 527 |      -1.36832
 528 |     </td>
 529 |     <td>
 530 |      -1.47649
 531 |     </td>
 532 |     <td>
 533 |      -1.21583
 534 |     </td>
 535 |     <td>
 536 |      ⋯
 537 |     </td>
 538 |     <td>
 539 |      -1.08902
 540 |     </td>
 541 |     <td>
 542 |      -1.29865
 543 |     </td>
 544 |     <td>
 545 |      -1.26183
 546 |     </td>
 547 |     <td>
 548 |      -1.44434
 549 |     </td>
 550 |     <td>
 551 |      1.10147
 552 |     </td>
 553 |     <td>
 554 |      -1.34158
 555 |     </td>
 556 |     <td>
 557 |      -1.22961
 558 |     </td>
 559 |     <td>
 560 |      -0.75919
 561 |     </td>
 562 |     <td>
 563 |      0.84905
 564 |     </td>
 565 |     <td>
 566 |      -0.66465
 567 |     </td>
 568 |    </tr>
 569 |    <tr>
 570 |     <th scope="row">
 571 |      AFFX-HUMISGF3A/M97935_MB_at
 572 |     </th>
 573 |     <td>
 574 |      -0.75161
 575 |     </td>
 576 |     <td>
 577 |      -1.26278
 578 |     </td>
 579 |     <td>
 580 |      -0.09052
 581 |     </td>
 582 |     <td>
 583 |      -0.99596
 584 |     </td>
 585 |     <td>
 586 |      -1.24245
 587 |     </td>
 588 |     <td>
 589 |      -0.69242
 590 |     </td>
 591 |     <td>
 592 |      -1.37386
 593 |     </td>
 594 |     <td>
 595 |      -0.50803
 596 |     </td>
 597 |     <td>
 598 |      -1.04533
 599 |     </td>
 600 |     <td>
 601 |      -0.81257
 602 |     </td>
 603 |     <td>
 604 |      ⋯
 605 |     </td>
 606 |     <td>
 607 |      -1.08902
 608 |     </td>
 609 |     <td>
 610 |      -1.05094
 611 |     </td>
 612 |     <td>
 613 |      -1.26183
 614 |     </td>
 615 |     <td>
 616 |      -1.25918
 617 |     </td>
 618 |     <td>
 619 |      0.97813
 620 |     </td>
 621 |     <td>
 622 |      -0.79357
 623 |     </td>
 624 |     <td>
 625 |      -1.22961
 626 |     </td>
 627 |     <td>
 628 |      -0.71792
 629 |     </td>
 630 |     <td>
 631 |      0.45127
 632 |     </td>
 633 |     <td>
 634 |      -0.45804
 635 |     </td>
 636 |    </tr>
 637 |    <tr>
 638 |     <th scope="row">
 639 |      AFFX-HUMISGF3A/M97935_3_at
 640 |     </th>
 641 |     <td>
 642 |      0.45695
 643 |     </td>
 644 |     <td>
 645 |      -0.09654
 646 |     </td>
 647 |     <td>
 648 |      0.90325
 649 |     </td>
 650 |     <td>
 651 |      -0.07194
 652 |     </td>
 653 |     <td>
 654 |      0.03232
 655 |     </td>
 656 |     <td>
 657 |      0.09713
 658 |     </td>
 659 |     <td>
 660 |      -0.11978
 661 |     </td>
 662 |     <td>
 663 |      0.23381
 664 |     </td>
 665 |     <td>
 666 |      0.23987
 667 |     </td>
 668 |     <td>
 669 |      0.44201
 670 |     </td>
 671 |     <td>
 672 |      ⋯
 673 |     </td>
 674 |     <td>
 675 |      -0.43377
 676 |     </td>
 677 |     <td>
 678 |      -0.10823
 679 |     </td>
 680 |     <td>
 681 |      -0.29385
 682 |     </td>
 683 |     <td>
 684 |      0.05067
 685 |     </td>
 686 |     <td>
 687 |      1.69430
 688 |     </td>
 689 |     <td>
 690 |      -0.12472
 691 |     </td>
 692 |     <td>
 693 |      0.04609
 694 |     </td>
 695 |     <td>
 696 |      0.24347
 697 |     </td>
 698 |     <td>
 699 |      0.90774
 700 |     </td>
 701 |     <td>
 702 |      0.46509
 703 |     </td>
 704 |    </tr>
 705 |    <tr>
 706 |     <th scope="row">
 707 |      AFFX-HUMRGE/M10098_5_at
 708 |     </th>
 709 |     <td>
 710 |      3.13533
 711 |     </td>
 712 |     <td>
 713 |      0.21415
 714 |     </td>
 715 |     <td>
 716 |      2.08754
 717 |     </td>
 718 |     <td>
 719 |      2.23467
 720 |     </td>
 721 |     <td>
 722 |      0.93811
 723 |     </td>
 724 |     <td>
 725 |      2.24089
 726 |     </td>
 727 |     <td>
 728 |      3.36576
 729 |     </td>
 730 |     <td>
 731 |      1.97859
 732 |     </td>
 733 |     <td>
 734 |      2.66468
 735 |     </td>
 736 |     <td>
 737 |      -1.21583
 738 |     </td>
 739 |     <td>
 740 |      ⋯
 741 |     </td>
 742 |     <td>
 743 |      0.29598
 744 |     </td>
 745 |     <td>
 746 |      -1.29865
 747 |     </td>
 748 |     <td>
 749 |      2.76869
 750 |     </td>
 751 |     <td>
 752 |      2.08960
 753 |     </td>
 754 |     <td>
 755 |      0.70003
 756 |     </td>
 757 |     <td>
 758 |      0.13854
 759 |     </td>
 760 |     <td>
 761 |      1.75908
 762 |     </td>
 763 |     <td>
 764 |      0.06151
 765 |     </td>
 766 |     <td>
 767 |      1.30297
 768 |     </td>
 769 |     <td>
 770 |      0.58186
 771 |     </td>
 772 |    </tr>
 773 |    <tr>
 774 |     <th scope="row">
 775 |      AFFX-HUMRGE/M10098_M_at
 776 |     </th>
 777 |     <td>
 778 |      2.76569
 779 |     </td>
 780 |     <td>
 781 |      -1.27045
 782 |     </td>
 783 |     <td>
 784 |      1.60433
 785 |     </td>
 786 |     <td>
 787 |      1.53182
 788 |     </td>
 789 |     <td>
 790 |      1.63728
 791 |     </td>
 792 |     <td>
 793 |      1.85697
 794 |     </td>
 795 |     <td>
 796 |      3.01847
 797 |     </td>
 798 |     <td>
 799 |      1.12853
 800 |     </td>
 801 |     <td>
 802 |      2.17016
 803 |     </td>
 804 |     <td>
 805 |      -1.21583
 806 |     </td>
 807 |     <td>
 808 |      ⋯
 809 |     </td>
 810 |     <td>
 811 |      -1.08902
 812 |     </td>
 813 |     <td>
 814 |      -1.29865
 815 |     </td>
 816 |     <td>
 817 |      2.00518
 818 |     </td>
 819 |     <td>
 820 |      1.17454
 821 |     </td>
 822 |     <td>
 823 |      -1.47218
 824 |     </td>
 825 |     <td>
 826 |      -1.34158
 827 |     </td>
 828 |     <td>
 829 |      1.55086
 830 |     </td>
 831 |     <td>
 832 |      -1.18107
 833 |     </td>
 834 |     <td>
 835 |      1.01596
 836 |     </td>
 837 |     <td>
 838 |      0.15788
 839 |     </td>
 840 |    </tr>
 841 |    <tr>
 842 |     <th scope="row">
 843 |      AFFX-HUMRGE/M10098_3_at
 844 |     </th>
 845 |     <td>
 846 |      2.64342
 847 |     </td>
 848 |     <td>
 849 |      1.01416
 850 |     </td>
 851 |     <td>
 852 |      1.70477
 853 |     </td>
 854 |     <td>
 855 |      1.63845
 856 |     </td>
 857 |     <td>
 858 |      -0.36075
 859 |     </td>
 860 |     <td>
 861 |      1.73451
 862 |     </td>
 863 |     <td>
 864 |      3.36576
 865 |     </td>
 866 |     <td>
 867 |      0.96870
 868 |     </td>
 869 |     <td>
 870 |      2.72368
 871 |     </td>
 872 |     <td>
 873 |      -1.21583
 874 |     </td>
 875 |     <td>
 876 |      ⋯
 877 |     </td>
 878 |     <td>
 879 |      -1.08902
 880 |     </td>
 881 |     <td>
 882 |      -1.29865
 883 |     </td>
 884 |     <td>
 885 |      1.73780
 886 |     </td>
 887 |     <td>
 888 |      0.89347
 889 |     </td>
 890 |     <td>
 891 |      -0.52883
 892 |     </td>
 893 |     <td>
 894 |      -1.22168
 895 |     </td>
 896 |     <td>
 897 |      0.90832
 898 |     </td>
 899 |     <td>
 900 |      -1.39906
 901 |     </td>
 902 |     <td>
 903 |      0.51266
 904 |     </td>
 905 |     <td>
 906 |      1.36249
 907 |     </td>
 908 |    </tr>
 909 |   </tbody>
 910 |  </table>
 911 |  <font color="#808080">
 912 |   In&nbsp;[7]:
 913 |  </font>
 914 |  ```R
 915 | # Let's just have a look at the first 20 gene IDs contained in golub.gnames
 916 | head(golub.gnames[,2], n = 20)
 917 | ```
 918 |  <ol class="list-inline">
 919 |   <li>
 920 |    'AFFX-HUMISGF3A/M97935_MA_at (endogenous control)'
 921 |   </li>
 922 |   <li>
 923 |    'AFFX-HUMISGF3A/M97935_MB_at (endogenous control)'
 924 |   </li>
 925 |   <li>
 926 |    'AFFX-HUMISGF3A/M97935_3_at (endogenous control)'
 927 |   </li>
 928 |   <li>
 929 |    'AFFX-HUMRGE/M10098_5_at (endogenous control)'
 930 |   </li>
 931 |   <li>
 932 |    'AFFX-HUMRGE/M10098_M_at (endogenous control)'
 933 |   </li>
 934 |   <li>
 935 |    'AFFX-HUMRGE/M10098_3_at (endogenous control)'
 936 |   </li>
 937 |   <li>
 938 |    'AFFX-HUMGAPDH/M33197_5_at (endogenous control)'
 939 |   </li>
 940 |   <li>
 941 |    'AFFX-HUMGAPDH/M33197_M_at (endogenous control)'
 942 |   </li>
 943 |   <li>
 944 |    'AFFX-HSAC07/X00351_5_at (endogenous control)'
 945 |   </li>
 946 |   <li>
 947 |    'AFFX-HSAC07/X00351_M_at (endogenous control)'
 948 |   </li>
 949 |   <li>
 950 |    'AFFX-HUMTFRR/M11507_5_at (endogenous control)'
 951 |   </li>
 952 |   <li>
 953 |    'AFFX-HUMTFRR/M11507_M_at (endogenous control)'
 954 |   </li>
 955 |   <li>
 956 |    'AFFX-HUMTFRR/M11507_3_at (endogenous control)'
 957 |   </li>
 958 |   <li>
 959 |    'AFFX-M27830_5_at (endogenous control)'
 960 |   </li>
 961 |   <li>
 962 |    'AFFX-M27830_M_at (endogenous control)'
 963 |   </li>
 964 |   <li>
 965 |    'AFFX-M27830_3_at (endogenous control)'
 966 |   </li>
 967 |   <li>
 968 |    'AFFX-HSAC07/X00351_3_st (endogenous control)'
 969 |   </li>
 970 |   <li>
 971 |    'AFFX-HUMGAPDH/M33197_M_st (endogenous control)'
 972 |   </li>
 973 |   <li>
 974 |    'AFFX-HUMGAPDH/M33197_3_st (endogenous control)'
 975 |   </li>
 976 |   <li>
 977 |    'AFFX-HSAC07/X00351_M_st (endogenous control)'
 978 |   </li>
 979 |  </ol>
 980 |  Twenty-seven patients are diagnosed with acute lymphoblastic leukemia (ALL) and
 981 | eleven as acute myeloid leukemia (AML). The tumor class is given by the numeric
 982 | vector golub.cl, where ALL is indicated by 0 and AML by 1.
 983 |  <font color="#808080">
 984 |   In&nbsp;[8]:
 985 |  </font>
 986 |  ```R
 987 | colnames(golub) = golub.cl
 988 | 
 989 | head(golub)
 990 | ```
 991 |  <table class="table-responsive table-striped">
 992 |   <thead>
 993 |    <tr>
 994 |     <th>
 995 |     </th>
 996 |     <th scope="col">
 997 |      0
 998 |     </th>
 999 |     <th scope="col">
1000 |      0
1001 |     </th>
1002 |     <th scope="col">
1003 |      0
1004 |     </th>
1005 |     <th scope="col">
1006 |      0
1007 |     </th>
1008 |     <th scope="col">
1009 |      0
1010 |     </th>
1011 |     <th scope="col">
1012 |      0
1013 |     </th>
1014 |     <th scope="col">
1015 |      0
1016 |     </th>
1017 |     <th scope="col">
1018 |      0
1019 |     </th>
1020 |     <th scope="col">
1021 |      0
1022 |     </th>
1023 |     <th scope="col">
1024 |      0
1025 |     </th>
1026 |     <th scope="col">
1027 |      ⋯
1028 |     </th>
1029 |     <th scope="col">
1030 |      1
1031 |     </th>
1032 |     <th scope="col">
1033 |      1
1034 |     </th>
1035 |     <th scope="col">
1036 |      1
1037 |     </th>
1038 |     <th scope="col">
1039 |      1
1040 |     </th>
1041 |     <th scope="col">
1042 |      1
1043 |     </th>
1044 |     <th scope="col">
1045 |      1
1046 |     </th>
1047 |     <th scope="col">
1048 |      1
1049 |     </th>
1050 |     <th scope="col">
1051 |      1
1052 |     </th>
1053 |     <th scope="col">
1054 |      1
1055 |     </th>
1056 |     <th scope="col">
1057 |      1
1058 |     </th>
1059 |    </tr>
1060 |   </thead>
1061 |   <tbody>
1062 |    <tr>
1063 |     <th scope="row">
1064 |      AFFX-HUMISGF3A/M97935_MA_at
1065 |     </th>
1066 |     <td>
1067 |      -1.45769
1068 |     </td>
1069 |     <td>
1070 |      -1.39420
1071 |     </td>
1072 |     <td>
1073 |      -1.42779
1074 |     </td>
1075 |     <td>
1076 |      -1.40715
1077 |     </td>
1078 |     <td>
1079 |      -1.42668
1080 |     </td>
1081 |     <td>
1082 |      -1.21719
1083 |     </td>
1084 |     <td>
1085 |      -1.37386
1086 |     </td>
1087 |     <td>
1088 |      -1.36832
1089 |     </td>
1090 |     <td>
1091 |      -1.47649
1092 |     </td>
1093 |     <td>
1094 |      -1.21583
1095 |     </td>
1096 |     <td>
1097 |      ⋯
1098 |     </td>
1099 |     <td>
1100 |      -1.08902
1101 |     </td>
1102 |     <td>
1103 |      -1.29865
1104 |     </td>
1105 |     <td>
1106 |      -1.26183
1107 |     </td>
1108 |     <td>
1109 |      -1.44434
1110 |     </td>
1111 |     <td>
1112 |      1.10147
1113 |     </td>
1114 |     <td>
1115 |      -1.34158
1116 |     </td>
1117 |     <td>
1118 |      -1.22961
1119 |     </td>
1120 |     <td>
1121 |      -0.75919
1122 |     </td>
1123 |     <td>
1124 |      0.84905
1125 |     </td>
1126 |     <td>
1127 |      -0.66465
1128 |     </td>
1129 |    </tr>
1130 |    <tr>
1131 |     <th scope="row">
1132 |      AFFX-HUMISGF3A/M97935_MB_at
1133 |     </th>
1134 |     <td>
1135 |      -0.75161
1136 |     </td>
1137 |     <td>
1138 |      -1.26278
1139 |     </td>
1140 |     <td>
1141 |      -0.09052
1142 |     </td>
1143 |     <td>
1144 |      -0.99596
1145 |     </td>
1146 |     <td>
1147 |      -1.24245
1148 |     </td>
1149 |     <td>
1150 |      -0.69242
1151 |     </td>
1152 |     <td>
1153 |      -1.37386
1154 |     </td>
1155 |     <td>
1156 |      -0.50803
1157 |     </td>
1158 |     <td>
1159 |      -1.04533
1160 |     </td>
1161 |     <td>
1162 |      -0.81257
1163 |     </td>
1164 |     <td>
1165 |      ⋯
1166 |     </td>
1167 |     <td>
1168 |      -1.08902
1169 |     </td>
1170 |     <td>
1171 |      -1.05094
1172 |     </td>
1173 |     <td>
1174 |      -1.26183
1175 |     </td>
1176 |     <td>
1177 |      -1.25918
1178 |     </td>
1179 |     <td>
1180 |      0.97813
1181 |     </td>
1182 |     <td>
1183 |      -0.79357
1184 |     </td>
1185 |     <td>
1186 |      -1.22961
1187 |     </td>
1188 |     <td>
1189 |      -0.71792
1190 |     </td>
1191 |     <td>
1192 |      0.45127
1193 |     </td>
1194 |     <td>
1195 |      -0.45804
1196 |     </td>
1197 |    </tr>
1198 |    <tr>
1199 |     <th scope="row">
1200 |      AFFX-HUMISGF3A/M97935_3_at
1201 |     </th>
1202 |     <td>
1203 |      0.45695
1204 |     </td>
1205 |     <td>
1206 |      -0.09654
1207 |     </td>
1208 |     <td>
1209 |      0.90325
1210 |     </td>
1211 |     <td>
1212 |      -0.07194
1213 |     </td>
1214 |     <td>
1215 |      0.03232
1216 |     </td>
1217 |     <td>
1218 |      0.09713
1219 |     </td>
1220 |     <td>
1221 |      -0.11978
1222 |     </td>
1223 |     <td>
1224 |      0.23381
1225 |     </td>
1226 |     <td>
1227 |      0.23987
1228 |     </td>
1229 |     <td>
1230 |      0.44201
1231 |     </td>
1232 |     <td>
1233 |      ⋯
1234 |     </td>
1235 |     <td>
1236 |      -0.43377
1237 |     </td>
1238 |     <td>
1239 |      -0.10823
1240 |     </td>
1241 |     <td>
1242 |      -0.29385
1243 |     </td>
1244 |     <td>
1245 |      0.05067
1246 |     </td>
1247 |     <td>
1248 |      1.69430
1249 |     </td>
1250 |     <td>
1251 |      -0.12472
1252 |     </td>
1253 |     <td>
1254 |      0.04609
1255 |     </td>
1256 |     <td>
1257 |      0.24347
1258 |     </td>
1259 |     <td>
1260 |      0.90774
1261 |     </td>
1262 |     <td>
1263 |      0.46509
1264 |     </td>
1265 |    </tr>
1266 |    <tr>
1267 |     <th scope="row">
1268 |      AFFX-HUMRGE/M10098_5_at
1269 |     </th>
1270 |     <td>
1271 |      3.13533
1272 |     </td>
1273 |     <td>
1274 |      0.21415
1275 |     </td>
1276 |     <td>
1277 |      2.08754
1278 |     </td>
1279 |     <td>
1280 |      2.23467
1281 |     </td>
1282 |     <td>
1283 |      0.93811
1284 |     </td>
1285 |     <td>
1286 |      2.24089
1287 |     </td>
1288 |     <td>
1289 |      3.36576
1290 |     </td>
1291 |     <td>
1292 |      1.97859
1293 |     </td>
1294 |     <td>
1295 |      2.66468
1296 |     </td>
1297 |     <td>
1298 |      -1.21583
1299 |     </td>
1300 |     <td>
1301 |      ⋯
1302 |     </td>
1303 |     <td>
1304 |      0.29598
1305 |     </td>
1306 |     <td>
1307 |      -1.29865
1308 |     </td>
1309 |     <td>
1310 |      2.76869
1311 |     </td>
1312 |     <td>
1313 |      2.08960
1314 |     </td>
1315 |     <td>
1316 |      0.70003
1317 |     </td>
1318 |     <td>
1319 |      0.13854
1320 |     </td>
1321 |     <td>
1322 |      1.75908
1323 |     </td>
1324 |     <td>
1325 |      0.06151
1326 |     </td>
1327 |     <td>
1328 |      1.30297
1329 |     </td>
1330 |     <td>
1331 |      0.58186
1332 |     </td>
1333 |    </tr>
1334 |    <tr>
1335 |     <th scope="row">
1336 |      AFFX-HUMRGE/M10098_M_at
1337 |     </th>
1338 |     <td>
1339 |      2.76569
1340 |     </td>
1341 |     <td>
1342 |      -1.27045
1343 |     </td>
1344 |     <td>
1345 |      1.60433
1346 |     </td>
1347 |     <td>
1348 |      1.53182
1349 |     </td>
1350 |     <td>
1351 |      1.63728
1352 |     </td>
1353 |     <td>
1354 |      1.85697
1355 |     </td>
1356 |     <td>
1357 |      3.01847
1358 |     </td>
1359 |     <td>
1360 |      1.12853
1361 |     </td>
1362 |     <td>
1363 |      2.17016
1364 |     </td>
1365 |     <td>
1366 |      -1.21583
1367 |     </td>
1368 |     <td>
1369 |      ⋯
1370 |     </td>
1371 |     <td>
1372 |      -1.08902
1373 |     </td>
1374 |     <td>
1375 |      -1.29865
1376 |     </td>
1377 |     <td>
1378 |      2.00518
1379 |     </td>
1380 |     <td>
1381 |      1.17454
1382 |     </td>
1383 |     <td>
1384 |      -1.47218
1385 |     </td>
1386 |     <td>
1387 |      -1.34158
1388 |     </td>
1389 |     <td>
1390 |      1.55086
1391 |     </td>
1392 |     <td>
1393 |      -1.18107
1394 |     </td>
1395 |     <td>
1396 |      1.01596
1397 |     </td>
1398 |     <td>
1399 |      0.15788
1400 |     </td>
1401 |    </tr>
1402 |    <tr>
1403 |     <th scope="row">
1404 |      AFFX-HUMRGE/M10098_3_at
1405 |     </th>
1406 |     <td>
1407 |      2.64342
1408 |     </td>
1409 |     <td>
1410 |      1.01416
1411 |     </td>
1412 |     <td>
1413 |      1.70477
1414 |     </td>
1415 |     <td>
1416 |      1.63845
1417 |     </td>
1418 |     <td>
1419 |      -0.36075
1420 |     </td>
1421 |     <td>
1422 |      1.73451
1423 |     </td>
1424 |     <td>
1425 |      3.36576
1426 |     </td>
1427 |     <td>
1428 |      0.96870
1429 |     </td>
1430 |     <td>
1431 |      2.72368
1432 |     </td>
1433 |     <td>
1434 |      -1.21583
1435 |     </td>
1436 |     <td>
1437 |      ⋯
1438 |     </td>
1439 |     <td>
1440 |      -1.08902
1441 |     </td>
1442 |     <td>
1443 |      -1.29865
1444 |     </td>
1445 |     <td>
1446 |      1.73780
1447 |     </td>
1448 |     <td>
1449 |      0.89347
1450 |     </td>
1451 |     <td>
1452 |      -0.52883
1453 |     </td>
1454 |     <td>
1455 |      -1.22168
1456 |     </td>
1457 |     <td>
1458 |      0.90832
1459 |     </td>
1460 |     <td>
1461 |      -1.39906
1462 |     </td>
1463 |     <td>
1464 |      0.51266
1465 |     </td>
1466 |     <td>
1467 |      1.36249
1468 |     </td>
1469 |    </tr>
1470 |   </tbody>
1471 |  </table>
1472 |  Note that sometimes it is better to construct a factor which indicates the tumor
1473 | class of the patients. Such a factor could be used for instance to separate the
1474 | tumor groups for plotting purposes.  The factor (`gol.fac`) can be constructed as
1475 | follows.
1476 |  <font color="#808080">
1477 |   In&nbsp;[9]:
1478 |  </font>
1479 |  ```R
1480 | gol.fac <- factor(golub.cl, levels = 0:1, labels = c("ALL", "AML"))
1481 | ```
1482 | 
1483 | The labels correspond to the two tumor classes. The evaluation of gol.fac=="ALL"
1484 | returns
1485 | TRUE for the first twenty-seven values and FALSE for the remaining eleven,
1486 | which is useful as a column index for selecting the expression values of the
1487 | ALL patients. The expression values of gene CCND3 Cyclin D3 from the
1488 | ALL patients can now be printed to the screen, as follows.
1489 |  <font color="#808080">
1490 |   In&nbsp;[10]:
1491 |  </font>
1492 |  ```R
1493 | golub[1042, gol.fac == "ALL"]
1494 | ```
1495 |  <dl class="dl-horizontal">
1496 |   <dt>
1497 |    1
1498 |   </dt>
1499 |   <dd>
1500 |    0.88941
1501 |   </dd>
1502 |   <dt>
1503 |    1
1504 |   </dt>
1505 |   <dd>
1506 |    1.45014
1507 |   </dd>
1508 |   <dt>
1509 |    1
1510 |   </dt>
1511 |   <dd>
1512 |    0.42904
1513 |   </dd>
1514 |   <dt>
1515 |    1
1516 |   </dt>
1517 |   <dd>
1518 |    0.82667
1519 |   </dd>
1520 |   <dt>
1521 |    1
1522 |   </dt>
1523 |   <dd>
1524 |    0.63637
1525 |   </dd>
1526 |   <dt>
1527 |    1
1528 |   </dt>
1529 |   <dd>
1530 |    1.0225
1531 |   </dd>
1532 |   <dt>
1533 |    1
1534 |   </dt>
1535 |   <dd>
1536 |    0.12758
1537 |   </dd>
1538 |   <dt>
1539 |    1
1540 |   </dt>
1541 |   <dd>
1542 |    -0.74333
1543 |   </dd>
1544 |   <dt>
1545 |    1
1546 |   </dt>
1547 |   <dd>
1548 |    0.73784
1549 |   </dd>
1550 |   <dt>
1551 |    1
1552 |   </dt>
1553 |   <dd>
1554 |    0.4947
1555 |   </dd>
1556 |   <dt>
1557 |    1
1558 |   </dt>
1559 |   <dd>
1560 |    1.12058
1561 |   </dd>
1562 |  </dl>
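The same indexing idea can be tried on a small made-up example. The sketch below uses a hypothetical matrix `toy` and class vector `toy.cl` (neither is part of the Golub data, and the 0/1 coding is assumed to mirror `golub.cl`) to show how comparing a factor with a label yields a logical column index.

```R
# Hypothetical toy data: 2 genes (rows) measured on 5 patients (columns)
toy <- matrix(1:10, nrow = 2)
toy.cl <- c(0, 0, 0, 1, 1)   # assumed coding: 0 = ALL, 1 = AML

# Turn the numeric class codes into a factor with readable labels
toy.fac <- factor(toy.cl, levels = 0:1, labels = c("ALL", "AML"))

# The comparison returns one TRUE/FALSE value per patient
is.all <- toy.fac == "ALL"
print(is.all)

# Used as a column index, the logical vector keeps only the matching patients
all.values <- toy[1, is.all]
print(all.values)
```

Because the index is built from the factor rather than typed in by hand, the same line keeps working if the order of the patients changes.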
1563 |  ## 1. Creating the exploratory plots
1564 | 
1565 | ### 1.1\. Plotting the value of gene (CCND3) in all mRNA samples (M92287_at)
1566 | 
1567 | We shall first have a look at the expression values of a gene with manufacturer
1568 | name `M92287_at`, which is known in biology as "CCND3 Cyclin D3".
1569 | 
1570 | The expression values of this gene are collected in row 1042 of golub. To
1571 | extract the expression values from row 1042 into a vector called `mygene`,
1572 | use the following:
1573 |  <font color="#808080">
1574 |   In&nbsp;[11]:
1575 |  </font>
1576 |  ```R
1577 | mygene <- golub[1042, ]
1578 | ```
1579 | 
1580 | The expression values are now stored in the `mygene` vector. We will now plot the
1581 | expression values of the gene CCND3 Cyclin D3.
1582 |  <font color="#808080">
1583 |   In&nbsp;[12]:
1584 |  </font>
1585 |  ```R
1586 | plot(mygene)
1587 | ```
1588 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_21_0.png"/>
1589 |  In the previous plot we just used the default plotting preferences within R base
1590 | plotting. We can make some improvements so that the plot is easier to understand.
1591 |  <font color="#808080">
1592 |   In&nbsp;[13]:
1593 |  </font>
1594 |  ```R
1595 | plot(mygene, pch = 15, col = 'slateblue', ylab = 'Expression value of gene: CCND3', 
1596 |     main = ' Gene expression values of CCND3 Cyclin D3')
1597 | ```
1598 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_23_0.png"/>
1599 |  In this plot the vertical axis corresponds to the size of the expression values
1600 | and the horizontal axis to the index of the patients.
1601 | 
1602 | ### 1.2\. Gene expression between patient 1 (ALL) and patient 38 (AML)
1603 |  <font color="#808080">
1604 |   In&nbsp;[14]:
1605 |  </font>
1606 |  ```R
1607 | plot(golub[,1], golub[,38])
1608 | ```
1609 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_26_0.png"/>
1610 |  Adding a diagonal line to the plot and changing the axis labels:
1611 |  <font color="#808080">
1612 |   In&nbsp;[15]:
1613 |  </font>
1614 |  ```R
1615 | plot(golub[,1], golub[,38], xlab = 'Patient 1 (ALL)', ylab = 'Patient 38 (AML)') 
1616 | abline(a = 0, b = 1, col = 'mediumpurple', lwd =3)
1617 | ```
1618 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_28_0.png"/>
1619 |  ### 1.3\. Scatter plots to detect independence
1620 |  <font color="#808080">
1621 |   In&nbsp;[16]:
1622 |  </font>
1623 |  ```R
1624 | mysamplist <- golub[, c(1:15)]
1625 | colnames(mysamplist) = c(1:15)
1626 | ```
1627 |  <font color="#808080">
1628 |   In&nbsp;[17]:
1629 |  </font>
1630 |  ```R
1631 | plot(as.data.frame(mysamplist), pch='.')
1632 | ```
1633 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_31_0.png"/>
1634 |  ### 1.4\. Bar plot of 4 cyclin genes expression values in 3 ALL and AML patients
1635 | 
1636 | We will analyse the expression values of the `D13639_at, M92287_at, U11791_at,
1637 | Z36714_at` genes in three chosen ALL and three chosen AML patients.
1638 |  <font color="#808080">
1639 |   In&nbsp;[18]:
1640 |  </font>
1641 |  ```R
1642 | mygenelist <- golub[c(85, 1042, 1212, 2240), c(1:3, 36:38)]
1643 | 
1644 | # having a look at the data set chosen
1645 | mygenelist
1646 | ```
1647 |  <table class="table-responsive table-striped">
1648 |   <thead>
1649 |    <tr>
1650 |     <th>
1651 |     </th>
1652 |     <th scope="col">
1653 |      0
1654 |     </th>
1655 |     <th scope="col">
1656 |      0
1657 |     </th>
1658 |     <th scope="col">
1659 |      0
1660 |     </th>
1661 |     <th scope="col">
1662 |      1
1663 |     </th>
1664 |     <th scope="col">
1665 |      1
1666 |     </th>
1667 |     <th scope="col">
1668 |      1
1669 |     </th>
1670 |    </tr>
1671 |   </thead>
1672 |   <tbody>
1673 |    <tr>
1674 |     <th scope="row">
1675 |      D13639_at
1676 |     </th>
1677 |     <td>
1678 |      2.09511
1679 |     </td>
1680 |     <td>
1681 |      1.71953
1682 |     </td>
1683 |     <td>
1684 |      -1.46227
1685 |     </td>
1686 |     <td>
1687 |      -0.92935
1688 |     </td>
1689 |     <td>
1690 |      -0.11091
1691 |     </td>
1692 |     <td>
1693 |      1.15591
1694 |     </td>
1695 |    </tr>
1696 |    <tr>
1697 |     <th scope="row">
1698 |      M92287_at
1699 |     </th>
1700 |     <td>
1701 |      2.10892
1702 |     </td>
1703 |     <td>
1704 |      1.52405
1705 |     </td>
1706 |     <td>
1707 |      1.96403
1708 |     </td>
1709 |     <td>
1710 |      0.73784
1711 |     </td>
1712 |     <td>
1713 |      0.49470
1714 |     </td>
1715 |     <td>
1716 |      1.12058
1717 |     </td>
1718 |    </tr>
1719 |    <tr>
1720 |     <th scope="row">
1721 |      U11791_at
1722 |     </th>
1723 |     <td>
1724 |      -0.11439
1725 |     </td>
1726 |     <td>
1727 |      -0.72887
1728 |     </td>
1729 |     <td>
1730 |      -0.39674
1731 |     </td>
1732 |     <td>
1733 |      -0.94364
1734 |     </td>
1735 |     <td>
1736 |      0.05047
1737 |     </td>
1738 |     <td>
1739 |      0.05905
1740 |     </td>
1741 |    </tr>
1742 |    <tr>
1743 |     <th scope="row">
1744 |      Z36714_at
1745 |     </th>
1746 |     <td>
1747 |      -1.45769
1748 |     </td>
1749 |     <td>
1750 |      -1.39420
1751 |     </td>
1752 |     <td>
1753 |      -1.46227
1754 |     </td>
1755 |     <td>
1756 |      -1.39906
1757 |     </td>
1758 |     <td>
1759 |      -1.34579
1760 |     </td>
1761 |     <td>
1762 |      -1.32403
1763 |     </td>
1764 |    </tr>
1765 |   </tbody>
1766 |  </table>
1767 |  <font color="#808080">
1768 |   In&nbsp;[19]:
1769 |  </font>
1770 |  ```R
1771 | barplot(mygenelist)
1772 | box()
1773 | ```
1774 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_34_0.png"/>
1775 |  The plot is not very easy to read, so we will add some colours and a legend so
1776 | that we know which gene each bar segment corresponds to.
1777 |  <font color="#808080">
1778 |   In&nbsp;[20]:
1779 |  </font>
1780 |  ```R
1781 | # custom colours 
1782 | colours = c('lightblue2',   'slateblue', '#BD7BB8', '#2B377A')
1783 | 
1784 | barplot(mygenelist, col = colours, legend = TRUE, border = 'white')
1785 | box()
1786 | ```
1787 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_36_0.png"/>
1788 |  In this case the patients are indicated on the `X` axis (labelled 0 for ALL and 1 for AML)
1789 | while the gene expression level is indicated on the `Y` axis.
1790 | 
1791 | We can make some improvements to the plots.
1792 | Let's have a look at the `barplot` arguments:
1793 |  <font color="#808080">
1794 |   In&nbsp;[21]:
1795 |  </font>
1796 |  ```R
1797 | ?barplot
1798 | ```
1799 | 
1800 | We are going to focus on only a few of the `barplot` arguments:
1801 | - `beside`: `TRUE` for the bars to be displayed as juxtaposed bars, `FALSE` for
1802 | stacked bars
1803 | - `horiz` : `FALSE` bars displayed vertically with the first bar to the left,
1804 | `TRUE` bars are displayed horizontally with the first at the bottom.
1805 | - `ylim`, `xlim` :  limits for the y and x axes
1806 | - `col`: colour choices
1807 |  <font color="#808080">
1808 |   In&nbsp;[22]:
1809 |  </font>
1810 |  ```R
1811 | barplot(mygenelist, horiz = TRUE, col = colours, legend = TRUE,
1812 |        ylab = 'Patient', border = 'white', 
1813 |         xlab = 'Gene expression level', main  = 'Cyclin genes expression')
1814 | box()
1815 | ```
1816 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_40_0.png"/>
1817 |  In the plot above we presented the barplots horizontally and added some colours,
1818 | which makes it easier to understand the data presented.
1819 | You can also use the barplots to represent the mean and standard error which we
1820 | will be doing in the following sections.
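To see the effect of `beside` in isolation, here is a minimal sketch on a hypothetical 2 x 3 matrix (`toy`, with made-up values that are not taken from the Golub data). It draws the stacked and the juxtaposed version side by side and keeps the bar positions that `barplot` returns.

```R
# Hypothetical toy matrix: 2 genes (rows) by 3 patients (columns)
toy <- matrix(c(1, 2, 3, 1, 2, 2), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("P1", "P2", "P3")))

# Two panels in one row
par(mfrow = c(1, 2))

# Stacked bars: one bar per column, so one midpoint per patient
stacked <- barplot(toy, beside = FALSE, legend.text = TRUE,
                   main = "beside = FALSE")

# Juxtaposed bars: one bar per cell, so a matrix of midpoints
grouped <- barplot(toy, beside = TRUE, main = "beside = TRUE")

print(stacked)
print(grouped)
```

The returned midpoints are what we need later to place error bars on top of the right bars.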
1821 | 
1822 | ### 1.5\. Plotting the mean
1823 | 
1824 | In the following we will compute the mean for the expression values of both the
1825 | ALL and AML patients. We will be using the same 4 cyclin genes used in the
1826 | example above.
1827 | 
1828 | First we will compute the ALL and AML means over all the patients. Once the means
1829 | are computed they are combined into a single matrix.
1830 | 
1831 | Finally, the means are plotted using the `barplot` function.
1832 |  <font color="#808080">
1833 |   In&nbsp;[23]:
1834 |  </font>
1835 |  ```R
1836 | # Calculating the mean of the chosen genes from patient 1 to 27 and 28 to 38
1837 | ALLmean <- rowMeans(golub[c(85,1042,1212,2240),c(1:27)])
1838 | AMLmean <- rowMeans(golub[c(85,1042,1212,2240),c(28:38)])
1839 | 
1840 | # Combining the mean vectors previously calculated into a matrix
1841 | dataheight <- cbind(ALLmean, AMLmean)
1842 | 
1843 | # Plotting 
1844 | barx <- barplot(dataheight, beside=TRUE, horiz=FALSE, col= colours, ylim=c(-2,2.5),
1845 |                 legend = TRUE,border = 'white' ,
1846 |                 ylab = 'Gene expression level', main = 'Cyclin genes mean expression
1847 | in AML and ALL patients')
1848 | box()
1849 | ```
1850 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_43_0.png"/>
1851 |  ### 1.6\. Adding error bars to the previous plot
1852 | 
1853 | 
1854 | In the previous section we computed the mean expression level for 4 cyclin
1855 | genes between the AML and ALL patients. Sometimes it is useful to add error bars
1856 | to the plots (as the one above) to convey the uncertainty in the data presented.
1857 | 
1858 | For such a purpose we often use the **Standard Deviation**:
1859 | 
1860 | 
1861 | $$ \sigma = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i -\mu \right)^2}{n}}$$
1862 | 
1863 | 
1864 | which in turn tells us how much the values in a certain group tend to deviate
1865 | from their mean value.
1866 | 
1867 | Let's start by calculating the Standard Deviation of the data.
1868 |  <font color="#808080">
1869 |   In&nbsp;[24]:
1870 |  </font>
1871 |  ```R
1872 | # Calculating the SD
1873 | ALLsd <- apply(golub[c(85,1042,1212,2240),c(1:27)], 1, sd)
1874 | nALL <- length(c(1:27))
1875 | AMLsd <- apply(golub[c(85,1042,1212,2240),c(28:38)], 1, sd)
1876 | nAML <- length(c(28:38))
1877 | 
1878 | # Combining the data
1879 | datasd <- cbind(ALLsd, AMLsd)
1880 | 
1881 | 
1882 | ```
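One detail worth knowing: the formula above divides by the number of observations, whereas R's `sd` function divides by the number of observations minus one (the sample standard deviation). The sketch below checks both versions by hand on a made-up vector `x` (hypothetical values, not Golub data).

```R
# Hypothetical data: 8 made-up measurements with mean 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared deviations from the mean
ss <- sum((x - mean(x))^2)

# Version from the formula above: divide by the number of observations
pop.sd <- sqrt(ss / n)

# Version computed by sd(): divide by the number of observations minus 1
samp.sd <- sqrt(ss / (n - 1))

print(pop.sd)
print(samp.sd)
print(all.equal(samp.sd, sd(x)))
```

For groups as large as the 27 ALL patients the two versions differ only slightly, but for small groups the difference is noticeable.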
1883 | 
1884 | Another measure used to quantify the deviation is the **standard error**, which
1885 | quantifies the variability in the **_means_** of our groups instead of reporting
1886 | the variability among the data points.
1887 | 
1888 | A relatively straightforward way to compute this is by assuming that, if we were
1889 | to repeat a given experiment many many times, the group means would roughly
1890 | follow a normal distribution. **Note: this is a big assumption**. Hence, if we
1891 | assume that the means follow a normal distribution, then the standard error (_a.k.a.
1892 | variability of group means_) can be defined as:
1893 | 
1894 | $$ SE  = \frac{SD}{\sqrt{n}} $$
1895 | 
1896 | which in layman's terms can be read as "take the general variability of the
1897 | points around their group means (the standard deviation), and scale this number
1898 | down by the square root of the number of points that you've collected".
1899 | 
1900 | Since we have already computed the SD we can now compute the standard error
1901 | (SE).
1902 |  <font color="#808080">
1903 |   In&nbsp;[25]:
1904 |  </font>
1905 |  ```R
1906 | datase <- cbind(ALLsd/sqrt(nALL), AMLsd/sqrt(nAML))
1907 | ```
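The SE formula can also be wrapped in a small helper. This is only a sketch: `standard.error` is a hypothetical function name and `x` is made-up data. It also shows how the SE shrinks when the same spread of values is observed on more points.

```R
# Hypothetical helper: standard error of the mean of a numeric vector
standard.error <- function(x) sd(x) / sqrt(length(x))

# Hypothetical data: 8 made-up measurements
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# SE with 8 points
se.small <- standard.error(x)

# Same values repeated 4 times: similar spread, but 32 points
se.large <- standard.error(rep(x, 4))

print(se.small)
print(se.large)
```

The second value is smaller because the spread of the data barely changes while the square root of the sample size doubles.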
1908 | 
1909 | Now we can create a plot of the mean data as well as the SE and SD.
1910 |  <font color="#808080">
1911 |   In&nbsp;[26]:
1912 |  </font>
1913 |  ```R
1914 | # creating a panel of 2 plots displayed in 1 row
1915 | par(mfrow = c(1,2))
1916 | 
1917 | # Plot with the SD
1918 | datasdend <- abs(dataheight) + abs(datasd)
1919 | datasdend[c(3,4),] <- -datasdend[c(3,4),]
1920 | barx <- barplot(dataheight, beside=TRUE, horiz=FALSE, col = colours, ylim=c(-2,2.5),
1921 |                main = 'Data +  SD', border = 'white')
1922 | abline(a = 0 , b = 0, h = 0)
1923 | arrows(barx, dataheight, barx, datasdend, angle=90, lwd = 2, length = 0.15, 
1924 |        col = 'navyblue')
1925 | box()
1926 | 
1927 | # Plot with the se: error associated to the mean!
1928 | datasdend <- abs(dataheight) + abs(datase)
1929 | datasdend[c(3,4),] <- -datasdend[c(3,4),]
1930 | barx <- barplot(dataheight, beside=TRUE, horiz=FALSE, col = colours, ylim=c(-2,2.5),
1931 |                main = 'Data + SE', border = 'white')
1932 | abline(a = 0 , b = 0, h = 0)
1933 | arrows(barx, dataheight, barx, datasdend, angle=90, lwd = 2, length = 0.15,
1934 |        col = 'navyblue')
1935 | box()
1936 | ```
1937 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_49_0.png"/>
1938 |  Note that the error bars for the SE are smaller than those for the SD. This is
1939 | no coincidence!
1940 | 
1941 | As we increase n (in the SE equation), we will decrease the error. Hence the
1942 | standard error will **always** be smaller than the SD whenever n is larger than 1.
1943 | 
1944 | ## 2. Data representation
1945 | This section presents some essential ways to display and visualize data.
1946 | 
1947 | ### 2.1 Frequency table
1948 | Discrete data occur when the values naturally fall into categories. A frequency
1949 | table simply gives the number of occurrences within a category.
1950 | 
1951 | A gene consists of a sequence of nucleotides (A; C; G; T).
1952 | 
1953 | The number of each nucleotide can be displayed in a frequency table.
1954 | 
1955 | This will be illustrated by the Zyxin gene, which plays an important role in cell
1956 | adhesion. The accession number (X94991.1) of one of its variants can be found in
1957 | a database like NCBI (UniGene). The code below illustrates how to read the
1958 | sequence "X94991.1" of the species Homo sapiens from GenBank and to construct a
1959 | pie chart from a frequency table of the four nucleotides.
1960 |  <font color="#808080">
1961 |   In&nbsp;[27]:
1962 |  </font>
1963 |  ```R
1964 | library('ape')
1965 | ```
1966 |  <font color="#808080">
1967 |   In&nbsp;[29]:
1968 |  </font>
1969 |  ```R
1970 | v = read.GenBank(c("X94991.1"),as.character = TRUE)
1971 | 
1972 | pie(table(v$X94991.1), col = colours, border = 'white')
1973 | 
1974 | # prints the data as a table 
1975 | table(read.GenBank(c("X94991.1"),as.character=TRUE))
1976 | ```
1977 | 
1978 | 
1979 |     
1980 |       a   c   g   t 
1981 |     410 789 573 394
1982 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_53_1.png"/>
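If no network connection is available for `read.GenBank`, the frequency-table step can still be tried offline. The sketch below uses a short made-up sequence (`toy.seq` is hypothetical and has nothing to do with the Zyxin gene) and also converts the counts into proportions.

```R
# Hypothetical short sequence, one lowercase letter per nucleotide
toy.seq <- c("a", "c", "g", "t", "c", "c", "g", "a", "c", "t")

# Count how often each nucleotide occurs
freq <- table(toy.seq)
print(freq)

# Relative frequencies (they sum to 1)
prop <- prop.table(freq)
print(prop)
```

The `freq` object can be passed to `pie` or `barplot` in exactly the same way as the table built from the GenBank sequence.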
1983 |  ### 2.2 Stripcharts
1984 | 
1985 | An elementary method to visualize data is by using a so-called stripchart,
1986 | by which the values of the data are represented as e.g. small boxes.
1987 | It is useful in combination with a factor that distinguishes members from
1988 | different experimental conditions or patient groups.
1989 | 
1990 | Once again we use the CCND3 Cyclin D3 data to generate the plots.
1991 |  <font color="#808080">
1992 |   In&nbsp;[30]:
1993 |  </font>
1994 |  ```R
1995 | # data(golub, package = "multtest")
1996 | gol.fac <- factor(golub.cl,levels=0:1, labels= c("ALL","AML"))
1997 | 
1998 | stripchart(golub[1042,] ~ gol.fac, method = "jitter", 
1999 |            col = c('slateblue', 'darkgrey'), pch = 16)
2000 | 
2001 | ```
2002 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_55_0.png"/>
2003 |  From the above figure, it can be observed that the CCND3 Cyclin D3 expression
2004 | values of the ALL patients tend to be larger than those of the AML
2005 | patients.
2006 | 
2007 | 
2008 | ### 2.3 Histograms
2009 | 
2010 | Another method to visualize data is by dividing the range of data values into
2011 | a number of intervals and to plot the frequency per interval as a bar. Such
2012 | a plot is called a histogram.
2013 | 
2014 | We will now generate a histogram of the expression values of gene CCND3 Cyclin
2015 | D3 as well as all the genes for the AML and ALL patients contained in the Golub
2016 | dataset.
2017 |  <font color="#808080">
2018 |   In&nbsp;[31]:
2019 |  </font>
2020 |  ```R
2021 | par(mfrow=c(2,2))
2022 | 
2023 | hist(golub[1042, gol.fac == "ALL"], 
2024 |      col = 'slateblue', border = 'white',
2025 |     main = 'Golub[1042], ALL', xlab = 'ALL')
2026 | box()
2027 | 
2028 | hist(golub,breaks = 10, 
2029 |     col = 'slateblue', border = 'white',
2030 |     main =  'Golub')
2031 | box()
2032 | 
2033 | hist(golub[, gol.fac == "AML"],breaks = 10, 
2034 |      col = 'slateblue', border = 'white',
2035 |     main = 'Golub, AML', xlab = 'AML')
2036 | box()
2037 | 
2038 | hist(golub[, gol.fac == "ALL"],breaks = 10,
2039 |      col = 'slateblue', border = 'white',
2040 |     main = 'Golub, ALL', xlab = 'ALL')
2041 | box()
2042 | ```
2043 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_58_0.png"/>
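To see what `hist` computes before it draws anything, we can ask it to return the counts without plotting. This is a minimal sketch on made-up values (`x` and the chosen break points are hypothetical).

```R
# Hypothetical data: 8 made-up values between 0 and 2
x <- c(0.1, 0.4, 0.6, 0.7, 1.2, 1.3, 1.4, 1.9)

# Four intervals of width 0.5; plot = FALSE returns the histogram object only
h <- hist(x, breaks = c(0, 0.5, 1, 1.5, 2), plot = FALSE)

# Interval boundaries and the number of values falling into each interval
print(h$breaks)
print(h$counts)
```

The heights of the bars in a frequency histogram are exactly these counts, so changing `breaks` changes the shape of the plot.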
2044 |  ### 2.3 Boxplots
2045 | 
2046 | A popular method to display data is by
2047 | drawing a box around the 1st and the 3rd quartile (a bold line segment
2048 | for the median), and the smaller line segments (whiskers) for the smallest and
2049 | the largest data values.
2050 | 
2051 | Such a data display is known as a box-and-whisker plot.
2052 | 
2053 | We will start by creating a vector with gene expression values sorted in
2054 | ascending order (using the `sort` function).
2055 |  <font color="#808080">
2056 |   In&nbsp;[32]:
2057 |  </font>
2058 |  ```R
2059 | # Sort the values of one gene
2060 | x <- sort(golub[1042, gol.fac=="ALL"], decreasing = FALSE)
2061 | 
2062 | # printing the first five values
2063 | x[1:5]
2064 | ```
2065 |  <dl class="dl-horizontal">
2066 |   <dt>
2067 |    0
2068 |   </dt>
2069 |   <dd>
2070 |    0.45827
2071 |   </dd>
2072 |   <dt>
2073 |    0
2074 |   </dt>
2075 |   <dd>
2076 |    1.10546
2077 |   </dd>
2078 |   <dt>
2079 |    0
2080 |   </dt>
2081 |   <dd>
2082 |    1.27645
2083 |   </dd>
2084 |   <dt>
2085 |    0
2086 |   </dt>
2087 |   <dd>
2088 |    1.32551
2089 |   </dd>
2090 |   <dt>
2091 |    0
2092 |   </dt>
2093 |   <dd>
2094 |    1.36844
2095 |   </dd>
2096 |  </dl>
2097 |  A view on the distribution of the gene expression values of the `ALL` and `AML`
2098 | patients on gene CCND3 Cyclin D3 can be obtained by  generating two separate
2099 | boxplots adjacent to each other:
2100 |  <font color="#808080">
2101 |   In&nbsp;[41]:
2102 |  </font>
2103 |  ```R
2104 | # Even though we are creating two boxplots we only need one major graph
2105 | par(mfrow=c(1,1))
2106 | boxplot(golub[1042,] ~ gol.fac, col = c('lightblue2', 'mediumpurple'))
2107 | 
2108 | ```
2109 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_62_0.png"/>
2110 |  It can be observed that the gene expression values for ALL are larger than those
2111 | for AML. Furthermore, since the two sub-boxes around the median are more or less
2112 | equally wide, the data are quite symmetrically distributed around the median.
2113 | 
2114 | We can also create a histogram of the expression values of gene CCND3 Cyclin D3
2115 | across all patients, with a kernel density estimate drawn on top, e.g.
2116 |  <font color="#808080">
2117 |   In&nbsp;[110]:
2118 |  </font>
2119 |  ```R
2120 | hist(golub[1042,], col= 'lightblue', border= 'black', breaks= 6, freq= F,
2121 |      main = 'Expression values of gene CCND3 Cyclin D3')
2122 | lines(density(golub[1042,]), col= 'slateblue', lwd = 3)
2123 | box()
2124 | ```
2125 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_64_0.png"/>
2126 |  Now we can observe the distribution of all gene expression values in all 38
2127 | patients.
2128 |  <font color="#808080">
2129 |   In&nbsp;[113]:
2130 |  </font>
2131 |  ```R
2132 | boxplot(golub, col= 'lightblue2', lwd = 1, border="black", pch=18)
2133 | ```
2134 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_66_0.png"/>
2135 |  To compute the exact values for the quartiles we need a sequence running from 0
2136 | to 1 with increments in steps of 0.25
2137 |  <font color="#808080">
2138 |   In&nbsp;[114]:
2139 |  </font>
2140 |  ```R
2141 | pvec <- seq(0, 1, 0.25)
2142 | quantile(golub[1042, gol.fac=='ALL'], pvec)
2143 | ```
2144 |  <dl class="dl-horizontal">
2145 |   <dt>
2146 |    0%
2147 |   </dt>
2148 |   <dd>
2149 |    0.45827
2150 |   </dd>
2151 |   <dt>
2152 |    25%
2153 |   </dt>
2154 |   <dd>
2155 |    1.796065
2156 |   </dd>
2157 |   <dt>
2158 |    50%
2159 |   </dt>
2160 |   <dd>
2161 |    1.92776
2162 |   </dd>
2163 |   <dt>
2164 |    75%
2165 |   </dt>
2166 |   <dd>
2167 |    2.178705
2168 |   </dd>
2169 |   <dt>
2170 |    100%
2171 |   </dt>
2172 |   <dd>
2173 |    2.7661
2174 |   </dd>
2175 |  </dl>
2176 |  Outliers are data points lying far apart from the pattern set by the majority of
2177 | the data values. The implementation in R of the boxplot draws such outliers as
2178 | small circles.
2179 | 
2180 | A data point `x` is defined (graphically, not statistically) as an outlier point
2181 | if $$x < x_{0.25} - 1.5\left(x_{0.75} - x_{0.25}\right) \quad \text{or} \quad x > x_{0.75} + 1.5\left(x_{0.75} - x_{0.25}\right),$$
2182 | where $x_{0.25}$ and $x_{0.75}$ denote the first and the third quartile of the data.
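The outlier rule can be checked by hand. The sketch below applies it to a made-up vector (`x` is hypothetical) and compares the result with `boxplot.stats`, the function that `boxplot` relies on. Note that `boxplot.stats` works with hinges, which can differ slightly from the output of `quantile` for some sample sizes; for this example they coincide.

```R
# Hypothetical data: 9 made-up values, one of them far from the rest
x <- c(1.2, 1.5, 1.7, 1.8, 1.9, 2.0, 2.1, 2.3, 6.0)

# First and third quartile, and the interquartile range
q <- quantile(x, c(0.25, 0.75))
iqr <- q[2] - q[1]

# Fences: 1.5 interquartile ranges below Q1 and above Q3
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Points outside the fences are flagged as outliers
outliers <- x[x < lower | x > upper]
print(outliers)

# Same points as reported by the boxplot machinery
print(boxplot.stats(x)$out)
```

Only the value 6.0 falls outside the fences, so it is the only point that would be drawn as a separate circle.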
2183 | 
2184 | ### 2.4 Q-Q plots (Quantile-quantile plots)
2185 | 
2186 | A method to visualize the distribution of gene expression values is by the
2187 | so-called quantile-quantile (Q-Q) plots. In such a plot the quantiles of the gene
2188 | expression values are displayed against the corresponding quantiles of the
2189 | normal distribution (bell-shaped).
2190 | 
2191 | A straight line is added to represent the points which
2192 | correspond exactly to the quantiles of the normal distribution. By observing
2193 | the extent to which the points appear on the line, it can be evaluated to
2194 | what degree the data are normally distributed. That is, the closer the gene
2195 | expression values appear to the line, the more likely it is that the data are
2196 | normally distributed.
2197 | 
2198 | To produce a Q-Q plot of the ALL gene expression values of CCND3 Cyclin D3 one
2199 | may use the following.
2200 |  <font color="#808080">
2201 |   In&nbsp;[116]:
2202 |  </font>
2203 |  ```R
2204 | qqnorm(golub[1042, gol.fac == 'ALL'])
2205 | qqline(golub[1042, gol.fac == 'ALL'], col = 'slateblue', lwd = 2)
2206 | ```
2207 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_72_0.png"/>
2208 |  It can be seen that most of the data points are on or near the straight line,
2209 | while a few others are further away. The above example illustrates a case where
2210 | the degree of non-normality is moderate so that a clear conclusion cannot be
2211 | drawn.
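To make clear what a Q-Q plot displays, the points can also be constructed by hand. This is a sketch on simulated data (the vector `x` is generated with `rnorm` and is not Golub data): the sorted values are plotted against the normal quantiles at the probabilities given by `ppoints`, which is what `qqnorm` does internally.

```R
# Simulated data: 50 draws from a standard normal distribution
set.seed(1)
x <- rnorm(50)

# Theoretical normal quantiles and the sorted observed values
theoretical <- qnorm(ppoints(length(x)))
observed <- sort(x)

# Hand-made Q-Q plot
plot(theoretical, observed, pch = 16, col = 'slateblue',
     xlab = 'Theoretical quantiles', ylab = 'Sample quantiles')

# The coordinates agree with those computed by qqnorm
qq <- qqnorm(x, plot.it = FALSE)
print(all.equal(sort(qq$x), theoretical))
print(all.equal(sort(qq$y), observed))
```

Since the simulated data really are normal, the points lie close to a straight line; replacing `rnorm` with a skewed distribution shows how the points bend away from it.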
2212 | 
2213 | 
2214 | ## 3. Loading tab-delimited data
2215 |  <font color="#808080">
2216 |   In&nbsp;[117]:
2217 |  </font>
2218 |  ```R
2219 | mydata <- read.delim("./NeuralStemCellData.tab.txt", row.names=1, header=TRUE)
2220 | ```
2221 |  <font color="#808080">
2222 |   In&nbsp;[118]:
2223 |  </font>
2224 |  ```R
2225 | class(mydata)
2226 | ```
2227 | 
2228 | 'data.frame'
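If the file is not at hand, the behaviour of `read.delim` can be tried on a small in-memory example. The sketch below uses the `text` argument instead of a file name; the table content (`toy.txt`) is made up and only mimics the layout of a tab-delimited file with gene names in the first column.

```R
# Hypothetical tab-delimited content: a header line and two data rows
toy.txt <- "gene\ts1\ts2\ng1\t1.5\t2.0\ng2\t0.3\t0.7\n"

# row.names = 1 turns the first column into row names
toy <- read.delim(text = toy.txt, row.names = 1, header = TRUE)

print(toy)
print(class(toy))
print(dim(toy))
```

The result is a data frame with one row per gene and one numeric column per sample, which is the shape the plotting functions used earlier expect after conversion with `as.matrix`.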
2229 | 
2230 | ### Now try and do some exploratory analysis of your own on this data!
2231 | 
2232 | 
2233 | GvHD flow cytometry data
2234 | 
2235 | Only extract the CD3 positive cells
2236 |  <font color="#808080">
2237 |   In&nbsp;[119]:
2238 |  </font>
2239 |  ```R
2240 | cor(mydata[,1],mydata[,2])
2241 | plot(mydata[,1],mydata[,3])
2242 | ```
2243 | 
2244 | 0.956021382271511
2245 |  <img alt="png" src="{{site.url}}{{site.baseurl}}/site/Tutorial\Tutorial_79_1.png"/>
2246 | </img>


--------------------------------------------------------------------------------