├── .github
    ├── ISSUE_TEMPLATE
    │   ├── bug_report.md
    │   ├── feature_request.md
    │   └── question.md
    └── workflows
    │   ├── deploy.yml
    │   ├── pre-commit.yml
    │   └── run_tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── Rmagic
    ├── .Rbuildignore
    ├── .pre-commit-config.yaml
    ├── .pre-commit.r_requirements.txt
    ├── DESCRIPTION
    ├── LICENSE
    ├── NAMESPACE
    ├── R
    │   ├── magic.R
    │   ├── magic_testdata.R
    │   ├── preprocessing.R
    │   └── utils.R
    ├── README.Rmd
    ├── README.md
    ├── data-raw
    │   └── generate_test_data.R
    ├── data
    │   └── magic_testdata.rda
    ├── inst
    │   ├── CITATION
    │   └── examples
    │   │   ├── BMMSC_data_R_after_magic.png
    │   │   ├── BMMSC_data_R_before_magic.png
    │   │   ├── BMMSC_data_R_pca_colored_by_magic.png
    │   │   ├── BMMSC_data_R_phate_colored_by_magic.png
    │   │   ├── EMT_data_R_after_magic.png
    │   │   ├── EMT_data_R_before_magic.png
    │   │   ├── EMT_data_R_pca_colored_by_magic.png
    │   │   ├── EMT_data_R_phate_colored_by_magic.png
    │   │   ├── bonemarrow_tutorial.Rmd
    │   │   ├── bonemarrow_tutorial.html
    │   │   ├── emt_tutorial.Rmd
    │   │   └── emt_tutorial.html
    ├── man
    │   ├── as.data.frame.Rd
    │   ├── as.matrix.Rd
    │   ├── check_pymagic_version.Rd
    │   ├── figures
    │   │   ├── README-plot_magic-1.png
    │   │   ├── README-plot_raw-1.png
    │   │   ├── README-plot_reduced_t-1.png
    │   │   ├── README-run_pca-1.png
    │   │   ├── README-run_phate-1.png
    │   │   ├── README-unnamed-chunk-1-1.png
    │   │   └── README-unnamed-chunk-3-1.png
    │   ├── ggplot.Rd
    │   ├── install.magic.Rd
    │   ├── library.size.normalize.Rd
    │   ├── magic.Rd
    │   ├── magic_testdata.Rd
    │   ├── print.Rd
    │   ├── pymagic_is_available.Rd
    │   └── summary.Rd
    └── tests
    │   └── test_magic.R
├── data
    ├── HMLE_TGFb_day_8_10.csv.gz
    └── test_data.csv
├── magic.gif
├── matlab
    ├── .DS_Store
    ├── MAGIC Tutorial MATLAB-EMT.pptx
    ├── compute_kernel.m
    ├── compute_operator.m
    ├── compute_optimal_t.m
    ├── load_10x.m
    ├── mmread.m
    ├── project_genes.m
    ├── randPCA.m
    ├── run_magic.m
    ├── svdpca.m
    ├── svdpca_sparse.m
    └── test_magic.m
├── python
    ├── README.rst
    ├── doc
    │   ├── Makefile
    │   └── source
    │   │   ├── api.rst
    │   │   ├── conf.py
    │   │   ├── index.rst
    │   │   ├── installation.rst
    │   │   ├── requirements.txt
    │   │   └── tutorial.rst
    ├── magic
    │   ├── __init__.py
    │   ├── after_magic_example.png
    │   ├── before_magic_example.png
    │   ├── magic.py
    │   ├── plot.py
    │   ├── utils.py
    │   └── version.py
    ├── requirements.txt
    ├── setup.py
    ├── test
    │   └── test.py
    └── tutorial_notebooks
    │   ├── bonemarrow_tutorial.ipynb
    │   └── emt_tutorial.ipynb
└── setup.cfg

/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: bug
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Standalone code to reproduce the error
15 |
16 | **Expected behavior**
17 | A clear and concise description of what you expected to happen.
18 |
19 | **Actual behavior**
20 | Please include the full traceback of any errors
21 |
22 | **System information:**
23 |
24 | Output of `magic.__version__`:
25 |
26 | ```
27 | If you are running MAGIC in R or Python, please run magic.__version__ and paste the results here.
28 |
29 | You can do this with `python -c 'import magic; print(magic.__version__)'`
30 | ```
31 |
32 | Output of `pd.show_versions()`:
33 |
34 |
35 |
36 | ```
37 | If you are running MAGIC in R or Python, please run pd.show_versions() and paste the results here.
38 |
39 | You can do this with `python -c 'import pandas as pd; pd.show_versions()'`
40 | ```
41 |
42 |
43 |
44 | Output of `sessionInfo()`:
45 |
46 |
47 |
48 | ```
49 | If you are running MAGIC in R, please run sessionInfo() and paste the results here.
50 |
51 | You can do this with `R -e 'library(Rmagic); sessionInfo()'`
52 | ```
53 |
54 |
55 |
56 | Output of `reticulate::py_discover_config(required_module = "magic")`:
57 |
58 |
59 |
60 | ```
61 | If you are running MAGIC in R, please run `reticulate::py_discover_config(required_module = "magic")` and paste the results here.
62 |
63 | You can do this with `R -e 'reticulate::py_discover_config(required_module = "magic")'`
64 | ```
65 |
66 |
67 |
68 | Output of `Rmagic::check_pymagic_version()`:
69 |
70 |
71 |
72 | ```
73 | If you are running MAGIC in R, please run `Rmagic::check_pymagic_version()` and paste the results here.
74 |
75 | You can do this with `R -e 'Rmagic::check_pymagic_version()'`
76 | ```
77 |
78 |
79 |
80 | **Additional context**
81 | Add any other context about the problem here.
82 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: enhancement
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 |
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 |
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 |
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/question.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Question
3 | about: Ask questions about MAGIC
4 | title: ''
5 | labels: question
6 | assignees: ''
7 |
8 | ---
9 |
--------------------------------------------------------------------------------
/.github/workflows/deploy.yml:
--------------------------------------------------------------------------------
1 | name: Publish Python distributions to PyPI
2 |
3 | on:
4 |   push:
5 |     branches:
6 |       - 'master'
7 |       - 'test_deploy'
8 |     tags:
9 |       - '*'
10 |
11 | jobs:
12 |   build-n-publish:
13 |     name: Build and publish Python distributions to PyPI
14 |     runs-on: ubuntu-latest
15 |
16 |     steps:
17 |       - uses: actions/checkout@master
18 |
19 |       - name: Set up Python 3.7
20 |         uses: actions/setup-python@v2
21 |         with:
22 |           python-version: 3.7
23 |
24 |       - name: Install pypa/build
25 |         run: >-
26 |           cd python &&
27 |           python -m
28 |           pip install
29 |           build
30 |           --user &&
31 |           cd ..
32 |
33 |       - name: Build a binary wheel and a source tarball
34 |         run: >-
35 |           cd python &&
36 |           python -m
37 |           build
38 |           --sdist
39 |           --wheel
40 |           --outdir dist/
41 |           . &&
42 |           cd ..
43 |
44 |       - name: Publish distribution to Test PyPI
45 |         uses: pypa/gh-action-pypi-publish@master
46 |         with:
47 |           packages_dir: python/dist
48 |           skip_existing: true
49 |           password: ${{ secrets.test_pypi_password }}
50 |           repository_url: https://test.pypi.org/legacy/
51 |
52 |       - name: Publish distribution to PyPI
53 |         if: startsWith(github.ref, 'refs/tags')
54 |         uses: pypa/gh-action-pypi-publish@master
55 |         with:
56 |           packages_dir: python/dist
57 |           password: ${{ secrets.pypi_password }}
58 |
--------------------------------------------------------------------------------
/.github/workflows/pre-commit.yml:
--------------------------------------------------------------------------------
1 | name: pre-commit
2 | on:
3 |   push:
4 |     branches-ignore:
5 |       - 'master'
6 |
7 | jobs:
8 |   pre-commit:
9 |     runs-on: ubuntu-latest
10 |
11 |     steps:
12 |       - name: Cancel Previous Runs
13 |         uses: styfle/cancel-workflow-action@0.6.0
14 |         with:
15 |           access_token: ${{ github.token }}
16 |       - uses: actions/checkout@v2
17 |         with:
18 |           fetch-depth: 0
19 |
20 |       - name: Set up environment
21 |         run: |
22 |           echo "UBUNTU_VERSION=`grep DISTRIB_RELEASE /etc/lsb-release | sed 's/.*=//g'`" >> $GITHUB_ENV
23 |           mkdir -p .local/R/site-packages
24 |           echo "R_LIBS_USER=`pwd`/.local/R/site-packages" >> $GITHUB_ENV
25 |
26 |       - name: Install system dependencies
27 |         if: runner.os == 'Linux'
28 |         run: |
29 |           sudo apt-get update -qq
30 |           sudo apt-get install -y libcurl4-openssl-dev
31 |
32 |       - name: Set up Python
33 |         uses: actions/setup-python@v2
34 |         with:
35 |           python-version: "3.8"
36 |           architecture: "x64"
37 |
38 |       - name: Cache pre-commit
39 |         uses: actions/cache@v2
40 |         with:
41 |           path: ~/.cache/pre-commit
42 |           key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}-
43 |
44 |       - name: Run pre-commit
45 |         uses: pre-commit/action@v2.0.0
46 |
47 |       - name: Cache R packages
48 |         uses: actions/cache@v2
49 |         if: startsWith(runner.os, 'Linux')
50 |         with:
51 |           path: ${{env.R_LIBS_USER}}
52 |           key: precommit-${{env.UBUNTU_VERSION}}-renv-${{ hashFiles('Rmagic/.pre-commit.r_requirements.txt') }}-${{ hashFiles('Rmagic/DESCRIPTION') }}-
53 |           restore-keys: |
54 |             precommit-${{env.UBUNTU_VERSION}}-renv-${{ hashFiles('Rmagic/.pre-commit.r_requirements.txt') }}-
55 |             precommit-${{env.UBUNTU_VERSION}}-renv-
56 |
57 |       - name: Install R packages
58 |         run: |
59 |           if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
60 |           con = file("Rmagic/.pre-commit.r_requirements.txt", "r")
61 |           while ( length(pkg <- readLines(con, n = 1)) > 0 ) {
62 |             renv::install(pkg)
63 |           }
64 |           close(con)
65 |           if (!require("devtools")) install.packages("devtools", repos="http://cloud.r-project.org")
66 |           devtools::install_dev_deps("./Rmagic", upgrade=TRUE)
67 |           devtools::install("./Rmagic")
68 |         shell: Rscript {0}
69 |
70 |       - name: Run pre-commit for R
71 |         run: |
72 |           cd Rmagic
73 |           git init
74 |           git add *
75 |           pre-commit run --all-files
76 |           rm -rf .git
77 |           cd ..
78 |
79 |       - name: Commit files
80 |         if: failure()
81 |         run: |
82 |           git checkout -- .github/workflows
83 |           if [[ `git status --porcelain --untracked-files=no` ]]; then
84 |             git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
85 |             git config --local user.name "github-actions[bot]"
86 |             git commit -m "pre-commit" -a
87 |           fi
88 |
89 |       - name: Push changes
90 |         if: failure()
91 |         uses: ad-m/github-push-action@master
92 |         with:
93 |           github_token: ${{ secrets.GITHUB_TOKEN }}
94 |           branch: ${{ github.ref }}
95 |
--------------------------------------------------------------------------------
/.github/workflows/run_tests.yml:
--------------------------------------------------------------------------------
1 | name: Unit Tests
2 |
3 | on:
4 |   push:
5 |     branches-ignore:
6 |       - 'test_deploy'
7 |   pull_request:
8 |     branches:
9 |       - '*'
10 |
11 | jobs:
12 |
13 |   test_python:
14 |     runs-on: ${{ matrix.config.os }}
15 |     if: "!contains(github.event.head_commit.message, 'ci skip')"
16 |
17 |     strategy:
18 |       fail-fast: false
19 |       matrix:
20 |         config:
21 |           - {name: '3.9', os: ubuntu-latest, python: '3.9' }
22 |           - {name: '3.8', os: ubuntu-latest, python: '3.8' }
23 |           - {name: '3.7', os: ubuntu-latest, python: '3.7' }
24 |           - {name: '3.6', os: ubuntu-latest, python: '3.6' }
25 |
26 |     steps:
27 |       - name: Cancel Previous Runs
28 |         uses: styfle/cancel-workflow-action@0.6.0
29 |         with:
30 |           access_token: ${{ github.token }}
31 |
32 |       - name: Check Ubuntu version
33 |         run: |
34 |           echo "UBUNTU_VERSION=`grep DISTRIB_RELEASE /etc/lsb-release | sed 's/.*=//g'`" >> $GITHUB_ENV
35 |
36 |       - uses: actions/checkout@v2
37 |
38 |       - name: Set up Python
39 |         uses: actions/setup-python@v2
40 |         with:
41 |           python-version: ${{ matrix.config.python }}
42 |
43 |       - name: Cache Python packages
44 |         uses: actions/cache@v2
45 |         with:
46 |           path: ${{ env.pythonLocation }}
47 |           key: ${{runner.os}}-pip-${{ env.pythonLocation }}-${{ hashFiles('python/setup.py') }}
48 |           restore-keys: ${{runner.os}}-pip-${{ env.pythonLocation }}-
49 |
50 |       - name: Install package & dependencies
51 |         run: |
52 |           python -m pip install --upgrade pip
53 |           pip install -U wheel setuptools
54 |           pip install -U ./python[test]
55 |           python -c "import magic"
56 |
57 |       - name: Run Python tests
58 |         run: |
59 |           cd python
60 |           nose2 -vvv
61 |           cd ..
62 |
63 |       - name: Build docs
64 |         run: |
65 |           cd python
66 |           pip install .[doc]
67 |           cd doc
68 |           make html
69 |           cd ../..
70 |
71 |       - name: Coveralls
72 |         env:
73 |           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
74 |           COVERALLS_SERVICE_NAME: github
75 |         run: |
76 |           coveralls
77 |
78 |       - name: Upload check results on fail
79 |         if: failure()
80 |         uses: actions/upload-artifact@master
81 |         with:
82 |           name: ${{ matrix.config.name }}_results
83 |           path: check
84 |
85 |   test_r:
86 |     runs-on: ${{ matrix.config.os }}
87 |     if: "!contains(github.event.head_commit.message, 'ci skip')"
88 |
89 |     strategy:
90 |       fail-fast: false
91 |       matrix:
92 |         config:
93 |           - {name: 'devel', os: ubuntu-latest, r: 'devel' }
94 |           - {name: 'release', os: ubuntu-latest, r: 'release' }
95 |
96 |     steps:
97 |       - name: Cancel Previous Runs
98 |         uses: styfle/cancel-workflow-action@0.6.0
99 |         with:
100 |           access_token: ${{ github.token }}
101 |
102 |       - name: Set up environment
103 |         run: |
104 |           echo "UBUNTU_VERSION=`grep DISTRIB_RELEASE /etc/lsb-release | sed 's/.*=//g'`" >> $GITHUB_ENV
105 |           mkdir -p .local/R/site-packages
106 |           echo "R_LIBS_USER=`pwd`/.local/R/site-packages" >> $GITHUB_ENV
107 |
108 |       - uses: actions/checkout@v2
109 |
110 |       - name: Set up Python
111 |         uses: actions/setup-python@v2
112 |         with:
113 |           python-version: "3.8"
114 |
115 |       - name: Install system dependencies
116 |         if: runner.os == 'Linux'
117 |         run: |
118 |           sudo apt-get update -qq
119 |           sudo apt-get install -y libcurl4-openssl-dev pandoc
120 |
121 |       - name: Cache Python packages
122 |         uses: actions/cache@v2
123 |         with:
124 |           path: ${{ env.pythonLocation }}
125 |           key: ${{runner.os}}-pip-${{ env.pythonLocation }}-${{ hashFiles('python/setup.py') }}
126 |           restore-keys: ${{runner.os}}-pip-${{ env.pythonLocation }}-
127 |
128 |       - name: Install package & dependencies
129 |         run: |
130 |           python -m pip install --upgrade pip
131 |           pip install -U wheel setuptools
132 |           pip install -U ./python
133 |           python -c "import magic"
134 |
135 |       - name: Set up R
136 |         id: setup-r
137 |         uses: r-lib/actions/setup-r@v1
138 |         with:
139 |           r-version: ${{ matrix.config.r }}
140 |
141 |       - name: Cache R packages
142 |         uses: actions/cache@v2
143 |         if: startsWith(runner.os, 'Linux')
144 |         with:
145 |           path: ${{env.R_LIBS_USER}}
146 |           key: test-${{env.UBUNTU_VERSION}}-renv-${{ steps.setup-r.outputs.installed-r-version }}-${{ hashFiles('Rmagic/DESCRIPTION') }}-
147 |           restore-keys: |
148 |             test-${{env.UBUNTU_VERSION}}-renv-${{ steps.setup-r.outputs.installed-r-version }}-
149 |
150 |       - name: Install R packages
151 |         run: |
152 |           if (!require("devtools")) install.packages("devtools", repos="http://cloud.r-project.org")
153 |           devtools::install_dev_deps("./Rmagic", upgrade=TRUE)
154 |           devtools::install("./Rmagic")
155 |         shell: Rscript {0}
156 |
157 |       - name: Install tinytex
158 |         uses: r-lib/actions/setup-tinytex@v1
159 |
160 |       - name: Run R tests
161 |         run: |
162 |           cd Rmagic
163 |           R CMD build .
164 |           R CMD check --as-cran *.tar.gz
165 |           cd ..
166 |
167 |       - name: Upload check results on fail
168 |         if: failure()
169 |         uses: actions/upload-artifact@master
170 |         with:
171 |           name: ${{ matrix.config.name }}_results
172 |           path: check
173 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # R project files
2 |
3 | .Rproj.user
4 | *.Rproj
5 | .Rhistory
6 | .RData
7 | .Ruserdata
8 |
9 | # R installation files
10 |
11 | build
12 | dist
13 |
14 | # Python installation files
15 |
16 | python/*.o
17 | python/*.so
18 | python/*.dll
19 | python/*.egg-info
20 | python/magic/__pycache__
21 | python/magic/*.pyc
22 | python/tutorial_notebooks/.ipynb_checkpoints
23 | __pycache__
24 | .eggs
25 |
26 |
27 | matlab/EMT.csv
28 |
29 | # Mac detritus
30 |
31 | *~
32 | ~$*
33 | .DS_Store
34 |
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 |   - repo: https://github.com/pre-commit/pre-commit-hooks
3 |     rev: v3.3.0
4 |     hooks:
5 |       - id: check-yaml
6 |       - id: end-of-file-fixer
7 |       - id: trailing-whitespace
8 |         exclude: \.(ai|gz)$
9 |   - repo: https://github.com/timothycrosley/isort
10 |     rev: 5.6.4
11 |     hooks:
12 |       - id: isort
13 |   - repo: https://github.com/psf/black
14 |     rev: 20.8b1
15 |     hooks:
16 |       - id: black
17 |         args: ['--target-version', 'py36']
18 |   - repo: https://github.com/pre-commit/mirrors-autopep8
19 |     rev: v1.5.4
20 |     hooks:
21 |       - id: autopep8
22 |   - repo: https://gitlab.com/pycqa/flake8
23 |     rev: 3.8.4
24 |     hooks:
25 |       - id: flake8
26 |         additional_dependencies: ['hacking']
27 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 |
2 | Contributing to MAGIC
3 | ============================
4 |
5 | There are many ways to contribute to `MAGIC`, with the most common ones
6 | being contribution of code or documentation to the project. Improving the
7 | documentation is no less important than improving the library itself. If you
8 | find a typo in the documentation, or have made improvements, do not hesitate to
9 | submit a GitHub pull request.
10 |
11 | But there are many other ways to help. In particular, answering queries on the
12 | [issue tracker](https://github.com/KrishnaswamyLab/MAGIC/issues),
13 | investigating bugs, and [reviewing other developers' pull
14 | requests](https://github.com/KrishnaswamyLab/MAGIC/pulls)
15 | are very valuable contributions that decrease the burden on the project
16 | maintainers.
17 |
18 | Another way to contribute is to report issues you're facing, and give a "thumbs
19 | up" on issues that others reported and that are relevant to you. It also helps
20 | us if you spread the word: reference the project from your blog and articles,
21 | link to it from your website, or simply star it on GitHub to say "I use it".
22 |
23 | Code Style and Testing
24 | ----------------------
25 |
26 | Contributors are encouraged to write tests for their code, but if you do not know how to do so, please do not feel discouraged from contributing code! Others can always help you test your contribution.
27 |
28 | Code style is dictated by [`black`](https://pypi.org/project/black/#installation-and-usage) and [OpenStack](https://docs.openstack.org/hacking/latest/user/hacking.html#styleguide). Styling is automatically applied by [`pre-commit`](https://github.com/pre-commit/pre-commit).
29 |
30 | Code of Conduct
31 | ---------------
32 |
33 | We abide by the principles of openness, respect, and consideration of others
34 | of the Python Software Foundation: https://www.python.org/psf/codeofconduct/.
35 |
36 | Attribution
37 | ---------------
38 |
39 | This `CONTRIBUTING.md` was adapted from [scikit-learn](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md).
40 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 2, June 1991
3 |
4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
6 | Everyone is permitted to copy and distribute verbatim copies
7 | of this license document, but changing it is not allowed.
8 |
9 | Preamble
10 |
11 | The licenses for most software are designed to take away your
12 | freedom to share and change it. By contrast, the GNU General Public
13 | License is intended to guarantee your freedom to share and change free
14 | software--to make sure the software is free for all its users. This
15 | General Public License applies to most of the Free Software
16 | Foundation's software and to any other program whose authors commit to
17 | using it. (Some other Free Software Foundation software is covered by
18 | the GNU Lesser General Public License instead.) You can apply it to
19 | your programs, too.
20 |
21 | When we speak of free software, we are referring to freedom, not
22 | price. Our General Public Licenses are designed to make sure that you
23 | have the freedom to distribute copies of free software (and charge for
24 | this service if you wish), that you receive source code or can get it
25 | if you want it, that you can change the software or use pieces of it
26 | in new free programs; and that you know you can do these things.
27 |
28 | To protect your rights, we need to make restrictions that forbid
29 | anyone to deny you these rights or to ask you to surrender the rights.
30 | These restrictions translate to certain responsibilities for you if you
31 | distribute copies of the software, or if you modify it.
32 |
33 | For example, if you distribute copies of such a program, whether
34 | gratis or for a fee, you must give the recipients all the rights that
35 | you have. You must make sure that they, too, receive or can get the
36 | source code. And you must show them these terms so they know their
37 | rights.
38 |
39 | We protect your rights with two steps: (1) copyright the software, and
40 | (2) offer you this license which gives you legal permission to copy,
41 | distribute and/or modify the software.
42 |
43 | Also, for each author's protection and ours, we want to make certain
44 | that everyone understands that there is no warranty for this free
45 | software. If the software is modified by someone else and passed on, we
46 | want its recipients to know that what they have is not the original, so
47 | that any problems introduced by others will not reflect on the original
48 | authors' reputations.
49 |
50 | Finally, any free program is threatened constantly by software
51 | patents. We wish to avoid the danger that redistributors of a free
52 | program will individually obtain patent licenses, in effect making the
53 | program proprietary. To prevent this, we have made it clear that any
54 | patent must be licensed for everyone's free use or not licensed at all.
55 |
56 | The precise terms and conditions for copying, distribution and
57 | modification follow.
58 |
59 | GNU GENERAL PUBLIC LICENSE
60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
61 |
62 | 0. This License applies to any program or other work which contains
63 | a notice placed by the copyright holder saying it may be distributed
64 | under the terms of this General Public License. The "Program", below,
65 | refers to any such program or work, and a "work based on the Program"
66 | means either the Program or any derivative work under copyright law:
67 | that is to say, a work containing the Program or a portion of it,
68 | either verbatim or with modifications and/or translated into another
69 | language. (Hereinafter, translation is included without limitation in
70 | the term "modification".) Each licensee is addressed as "you".
71 |
72 | Activities other than copying, distribution and modification are not
73 | covered by this License; they are outside its scope. The act of
74 | running the Program is not restricted, and the output from the Program
75 | is covered only if its contents constitute a work based on the
76 | Program (independent of having been made by running the Program).
77 | Whether that is true depends on what the Program does.
78 |
79 | 1. You may copy and distribute verbatim copies of the Program's
80 | source code as you receive it, in any medium, provided that you
81 | conspicuously and appropriately publish on each copy an appropriate
82 | copyright notice and disclaimer of warranty; keep intact all the
83 | notices that refer to this License and to the absence of any warranty;
84 | and give any other recipients of the Program a copy of this License
85 | along with the Program.
86 |
87 | You may charge a fee for the physical act of transferring a copy, and
88 | you may at your option offer warranty protection in exchange for a fee.
89 |
90 | 2. You may modify your copy or copies of the Program or any portion
91 | of it, thus forming a work based on the Program, and copy and
92 | distribute such modifications or work under the terms of Section 1
93 | above, provided that you also meet all of these conditions:
94 |
95 | a) You must cause the modified files to carry prominent notices
96 | stating that you changed the files and the date of any change.
97 |
98 | b) You must cause any work that you distribute or publish, that in
99 | whole or in part contains or is derived from the Program or any
100 | part thereof, to be licensed as a whole at no charge to all third
101 | parties under the terms of this License.
102 |
103 | c) If the modified program normally reads commands interactively
104 | when run, you must cause it, when started running for such
105 | interactive use in the most ordinary way, to print or display an
106 | announcement including an appropriate copyright notice and a
107 | notice that there is no warranty (or else, saying that you provide
108 | a warranty) and that users may redistribute the program under
109 | these conditions, and telling the user how to view a copy of this
110 | License. (Exception: if the Program itself is interactive but
111 | does not normally print such an announcement, your work based on
112 | the Program is not required to print an announcement.)
113 |
114 | These requirements apply to the modified work as a whole. If
115 | identifiable sections of that work are not derived from the Program,
116 | and can be reasonably considered independent and separate works in
117 | themselves, then this License, and its terms, do not apply to those
118 | sections when you distribute them as separate works. But when you
119 | distribute the same sections as part of a whole which is a work based
120 | on the Program, the distribution of the whole must be on the terms of
121 | this License, whose permissions for other licensees extend to the
122 | entire whole, and thus to each and every part regardless of who wrote it.
123 |
124 | Thus, it is not the intent of this section to claim rights or contest
125 | your rights to work written entirely by you; rather, the intent is to
126 | exercise the right to control the distribution of derivative or
127 | collective works based on the Program.
128 |
129 | In addition, mere aggregation of another work not based on the Program
130 | with the Program (or with a work based on the Program) on a volume of
131 | a storage or distribution medium does not bring the other work under
132 | the scope of this License.
133 |
134 | 3. You may copy and distribute the Program (or a work based on it,
135 | under Section 2) in object code or executable form under the terms of
136 | Sections 1 and 2 above provided that you also do one of the following:
137 |
138 | a) Accompany it with the complete corresponding machine-readable
139 | source code, which must be distributed under the terms of Sections
140 | 1 and 2 above on a medium customarily used for software interchange; or,
141 |
142 | b) Accompany it with a written offer, valid for at least three
143 | years, to give any third party, for a charge no more than your
144 | cost of physically performing source distribution, a complete
145 | machine-readable copy of the corresponding source code, to be
146 | distributed under the terms of Sections 1 and 2 above on a medium
147 | customarily used for software interchange; or,
148 |
149 | c) Accompany it with the information you received as to the offer
150 | to distribute corresponding source code. (This alternative is
151 | allowed only for noncommercial distribution and only if you
152 | received the program in object code or executable form with such
153 | an offer, in accord with Subsection b above.)
154 |
155 | The source code for a work means the preferred form of the work for
156 | making modifications to it. For an executable work, complete source
157 | code means all the source code for all modules it contains, plus any
158 | associated interface definition files, plus the scripts used to
159 | control compilation and installation of the executable. However, as a
160 | special exception, the source code distributed need not include
161 | anything that is normally distributed (in either source or binary
162 | form) with the major components (compiler, kernel, and so on) of the
163 | operating system on which the executable runs, unless that component
164 | itself accompanies the executable.
165 |
166 | If distribution of executable or object code is made by offering
167 | access to copy from a designated place, then offering equivalent
168 | access to copy the source code from the same place counts as
169 | distribution of the source code, even though third parties are not
170 | compelled to copy the source along with the object code.
171 |
172 | 4. You may not copy, modify, sublicense, or distribute the Program
173 | except as expressly provided under this License. Any attempt
174 | otherwise to copy, modify, sublicense or distribute the Program is
175 | void, and will automatically terminate your rights under this License.
176 | However, parties who have received copies, or rights, from you under
177 | this License will not have their licenses terminated so long as such
178 | parties remain in full compliance.
179 |
180 | 5. You are not required to accept this License, since you have not
181 | signed it. However, nothing else grants you permission to modify or
182 | distribute the Program or its derivative works. These actions are
183 | prohibited by law if you do not accept this License. Therefore, by
184 | modifying or distributing the Program (or any work based on the
185 | Program), you indicate your acceptance of this License to do so, and
186 | all its terms and conditions for copying, distributing or modifying
187 | the Program or works based on it.
188 |
189 | 6. Each time you redistribute the Program (or any work based on the
190 | Program), the recipient automatically receives a license from the
191 | original licensor to copy, distribute or modify the Program subject to
192 | these terms and conditions. You may not impose any further
193 | restrictions on the recipients' exercise of the rights granted herein.
194 | You are not responsible for enforcing compliance by third parties to
195 | this License.
196 |
197 | 7. If, as a consequence of a court judgment or allegation of patent
198 | infringement or for any other reason (not limited to patent issues),
199 | conditions are imposed on you (whether by court order, agreement or
200 | otherwise) that contradict the conditions of this License, they do not
201 | excuse you from the conditions of this License. If you cannot
202 | distribute so as to satisfy simultaneously your obligations under this
203 | License and any other pertinent obligations, then as a consequence you
204 | may not distribute the Program at all. For example, if a patent
205 | license would not permit royalty-free redistribution of the Program by
206 | all those who receive copies directly or indirectly through you, then
207 | the only way you could satisfy both it and this License would be to
208 | refrain entirely from distribution of the Program.
209 |
210 | If any portion of this section is held invalid or unenforceable under
211 | any particular circumstance, the balance of the section is intended to
212 | apply and the section as a whole is intended to apply in other
213 | circumstances.
214 |
215 | It is not the purpose of this section to induce you to infringe any
216 | patents or other property right claims or to contest validity of any
217 | such claims; this section has the sole purpose of protecting the
218 | integrity of the free software distribution system, which is
219 | implemented by public license practices. Many people have made
220 | generous contributions to the wide range of software distributed
221 | through that system in reliance on consistent application of that
222 | system; it is up to the author/donor to decide if he or she is willing
223 | to distribute software through any other system and a licensee cannot
224 | impose that choice.
225 |
226 | This section is intended to make thoroughly clear what is believed to
227 | be a consequence of the rest of this License.
228 |
229 | 8. If the distribution and/or use of the Program is restricted in
230 | certain countries either by patents or by copyrighted interfaces, the
231 | original copyright holder who places the Program under this License
232 | may add an explicit geographical distribution limitation excluding
233 | those countries, so that distribution is permitted only in or among
234 | countries not thus excluded. In such case, this License incorporates
235 | the limitation as if written in the body of this License.
236 |
237 | 9. The Free Software Foundation may publish revised and/or new versions
238 | of the General Public License from time to time. Such new versions will
239 | be similar in spirit to the present version, but may differ in detail to
240 | address new problems or concerns.
241 |
242 | Each version is given a distinguishing version number. If the Program
243 | specifies a version number of this License which applies to it and "any
244 | later version", you have the option of following the terms and conditions
245 | either of that version or of any later version published by the Free
246 | Software Foundation. If the Program does not specify a version number of
247 | this License, you may choose any version ever published by the Free Software
248 | Foundation.
249 |
250 | 10. If you wish to incorporate parts of the Program into other free
251 | programs whose distribution conditions are different, write to the author
252 | to ask for permission. For software which is copyrighted by the Free
253 | Software Foundation, write to the Free Software Foundation; we sometimes
254 | make exceptions for this. Our decision will be guided by the two goals
255 | of preserving the free status of all derivatives of our free software and
256 | of promoting the sharing and reuse of software generally.
257 |
258 | NO WARRANTY
259 |
260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
268 | REPAIR OR CORRECTION.
269 |
270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
278 | POSSIBILITY OF SUCH DAMAGES.
279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | {description} 294 | Copyright (C) {year} {fullname} 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 
319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 331 | 332 | {signature of Ty Coon}, 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License. 
340 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Markov Affinity-based Graph Imputation of Cells (MAGIC) 2 | ------------------------------------------------------- 3 | 4 | [![Latest PyPI version](https://img.shields.io/pypi/v/magic-impute.svg)](https://pypi.org/project/magic-impute/) 5 | [![Latest CRAN version](https://img.shields.io/cran/v/Rmagic.svg)](https://cran.r-project.org/package=Rmagic) 6 | [![GitHub Actions Build](https://img.shields.io/github/workflow/status/KrishnaswamyLab/MAGIC/Unit%20Tests/master?label=Github%20Actions)](https://github.com/KrishnaswamyLab/MAGIC/actions) 7 | [![Read the Docs](https://img.shields.io/readthedocs/magic.svg)](https://magic.readthedocs.io/) 8 | [![Cell Publication DOI](https://zenodo.org/badge/DOI/10.1016/j.cell.2018.05.061.svg)](https://www.cell.com/cell/abstract/S0092-8674(18)30724-4) 9 | [![Twitter](https://img.shields.io/twitter/follow/KrishnaswamyLab.svg?style=social&label=Follow)](https://twitter.com/KrishnaswamyLab) 10 | [![Github Stars](https://img.shields.io/github/stars/KrishnaswamyLab/MAGIC.svg?style=social&label=Stars)](https://github.com/KrishnaswamyLab/MAGIC/) 11 | 12 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising high-dimensional data, most commonly applied to single-cell RNA sequencing data. MAGIC learns the manifold underlying the data and uses the resulting graph to smooth the features and restore the structure of the data. 13 | 14 | To see how MAGIC can be applied to single-cell RNA-seq, elucidating the epithelial-to-mesenchymal transition, read our [publication in Cell](https://www.cell.com/cell/abstract/S0092-8674(18)30724-4). 15 | 16 | [David van Dijk, et al. **Recovering Gene Interactions from Single-Cell Data Using Data Diffusion**. 2018.
*Cell*.](https://www.cell.com/cell/abstract/S0092-8674(18)30724-4) 17 | 18 | MAGIC has been implemented in Python, Matlab, and R. 19 | 20 | #### To get started immediately, check out our tutorials: 21 | ##### Python 22 | * [Epithelial-to-Mesenchymal Transition Tutorial](http://nbviewer.jupyter.org/github/KrishnaswamyLab/MAGIC/blob/master/python/tutorial_notebooks/emt_tutorial.ipynb) 23 | * [Bone Marrow Tutorial](http://nbviewer.jupyter.org/github/KrishnaswamyLab/MAGIC/blob/master/python/tutorial_notebooks/bonemarrow_tutorial.ipynb) 24 | ##### R 25 | * [Epithelial-to-Mesenchymal Transition Tutorial](http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/emt_tutorial.html) 26 | * [Bone Marrow Tutorial](http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/bonemarrow_tutorial.html) 27 | 28 | 29 |
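
The overview above describes MAGIC as building a graph from the data and using it to smooth the features. The idea can be illustrated with a minimal NumPy sketch. This is not the code used by the `magic-impute` package: the function name `diffusion_smooth`, the plain Gaussian kernel, and the dense distance matrix are all assumptions chosen for brevity, and the sketch is only practical for small inputs.

```python
import numpy as np


def diffusion_smooth(X, knn=5, t=3):
    """Toy graph-diffusion smoothing of a cells-by-genes matrix.

    Hypothetical helper for illustration only; it is not part of magic-impute.
    """
    X = np.asarray(X, dtype=float)
    n_cells = X.shape[0]

    # Pairwise Euclidean distances between cells (dense, O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Adaptive bandwidth: distance from each cell to its knn-th neighbour.
    # Column 0 of the sorted distances is the cell itself (distance 0).
    sigma = np.sort(dist, axis=1)[:, min(knn, n_cells - 1)]
    sigma = np.maximum(sigma, 1e-12)  # guard against division by zero

    # Gaussian affinities (a simplification), symmetrised.
    K = np.exp(-((dist / sigma[:, None]) ** 2))
    K = (K + K.T) / 2

    # Row-normalise the affinities into a Markov transition matrix.
    P = K / K.sum(axis=1, keepdims=True)

    # Diffuse for t steps and apply the operator to the data.
    return np.linalg.matrix_power(P, t) @ X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.normal(size=(50, 3))
    smoothed = diffusion_smooth(noisy, knn=5, t=3)
    print(noisy.std(axis=0), smoothed.std(axis=0))
```

Because every row of the transition matrix sums to one, each smoothed value is a weighted average of the original values across cells, so the output never leaves the range of the input. Larger `t` averages over longer walks on the graph and therefore smooths more strongly.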

30 | 31 |
32 | MAGIC reveals the interaction between Vimentin (VIM), Cadherin-1 (CDH1), and Zinc finger E-box-binding homeobox 1 (ZEB1, encoded by color). 33 | 34 |

35 | 36 | ### Table of Contents 37 | 38 | * [Python](#python) 39 | * [Installation](#installation) 40 | * [Installation with pip](#installation-with-pip) 41 | * [Installation from GitHub](#installation-from-github) 42 | * [Usage](#usage) 43 | * [Quick Start](#quick-start) 44 | * [Tutorials](#tutorials) 45 | * [Matlab](#matlab) 46 | * [Instructions for the Matlab version](#instructions-for-the-matlab-version) 47 | * [R](#r) 48 | * [Installation](#installation-1) 49 | * [Installation from CRAN](#installation-from-cran) 50 | * [Installation from GitHub](#installation-from-github-1) 51 | * [Usage](#usage-1) 52 | * [Quick Start](#quick-start-1) 53 | * [Tutorials](#tutorials-1) 54 | * [Help](#help) 55 | 56 | ## Python 57 | 58 | ### Installation 59 | 60 | #### Installation with pip 61 | 62 | To install with `pip`, run the following from a terminal: 63 | 64 | pip install --user magic-impute 65 | 66 | #### Installation from GitHub 67 | 68 | To clone the repository and install manually, run the following from a terminal: 69 | 70 | git clone https://github.com/KrishnaswamyLab/MAGIC.git 71 | cd MAGIC/python 72 | python setup.py install --user 73 | 74 | ### Usage 75 | 76 | #### Quick Start 77 | 78 | The following code runs MAGIC on test data located in the MAGIC repository. 79 | 80 | import magic 81 | import pandas as pd 82 | import matplotlib.pyplot as plt 83 | X = pd.read_csv("MAGIC/data/test_data.csv") 84 | magic_operator = magic.MAGIC() 85 | X_magic = magic_operator.fit_transform(X, genes=['VIM', 'CDH1', 'ZEB1']) 86 | plt.scatter(X_magic['VIM'], X_magic['CDH1'], c=X_magic['ZEB1'], s=1, cmap='inferno') 87 | plt.show() 88 | magic.plot.animate_magic(X, gene_x='VIM', gene_y='CDH1', gene_color='ZEB1', operator=magic_operator) 89 | 90 | #### Tutorials 91 | 92 | You can read the MAGIC documentation at https://magic.readthedocs.io/. We have included two tutorial notebooks on MAGIC usage and results visualization for single-cell RNA-seq data.
93 | 94 | EMT data notebook: http://nbviewer.jupyter.org/github/KrishnaswamyLab/MAGIC/blob/master/python/tutorial_notebooks/emt_tutorial.ipynb 95 | 96 | Bone Marrow data notebook: http://nbviewer.jupyter.org/github/KrishnaswamyLab/MAGIC/blob/master/python/tutorial_notebooks/bonemarrow_tutorial.ipynb 97 | 98 | ## Matlab 99 | 100 | ### Instructions for the Matlab version 101 | 1. run_magic.m -- MAGIC imputation function 102 | 2. test_magic.m -- Shows how to run MAGIC. Also included is a function for loading 10x format data (load_10x.m) 103 | 104 | ## R 105 | 106 | ### Installation 107 | 108 | To use MAGIC, you will need to install both the R and Python packages. 109 | 110 | If `python` or `pip` are not installed, you will need to install them. We recommend 111 | [Miniconda3](https://conda.io/miniconda.html) to install Python and `pip` together, 112 | or otherwise you can install `pip` from https://pip.pypa.io/en/stable/installing/. 113 | 114 | #### Installation from CRAN 115 | 116 | In R, run this command to install MAGIC and all dependencies: 117 | 118 | install.packages("Rmagic") 119 | 120 | In a terminal, run the following command to install the Python 121 | package. 122 | 123 | pip install --user magic-impute 124 | 125 | #### Installation from GitHub 126 | 127 | To clone the repository and install manually, run the following from a terminal: 128 | 129 | git clone https://github.com/KrishnaswamyLab/MAGIC.git 130 | cd MAGIC/python 131 | python setup.py install --user 132 | cd ../Rmagic 133 | R CMD INSTALL .
134 | 135 | ### Usage 136 | 137 | #### Quick Start 138 | 139 | After installing the package, MAGIC can be run by loading the library and calling `magic()`: 140 | 141 | library(Rmagic) 142 | library(ggplot2) 143 | data(magic_testdata) 144 | MAGIC_data <- magic(magic_testdata, genes=c("VIM", "CDH1", "ZEB1")) 145 | ggplot(MAGIC_data) + 146 | geom_point(aes(x=VIM, y=CDH1, color=ZEB1)) 147 | 148 | #### Tutorials 149 | 150 | You can read the MAGIC tutorial by running `help(Rmagic::magic)`. For a working example, see the Rmarkdown tutorials at and or in `Rmagic/inst/examples`. 151 | 152 | ## Help 153 | 154 | If you have any questions or require assistance using MAGIC, please contact us at . 155 | -------------------------------------------------------------------------------- /Rmagic/.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^data-raw$ 2 | ^tests$ 3 | ^README\.Rmd$ 4 | ^.pre\-commit.*$ 5 | -------------------------------------------------------------------------------- /Rmagic/.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v3.3.0 4 | hooks: 5 | - id: check-yaml 6 | - id: end-of-file-fixer 7 | - id: trailing-whitespace 8 | exclude: \.(ai|gz)$ 9 | - repo: https://github.com/lorenzwalthert/precommit 10 | rev: v0.1.3 11 | hooks: 12 | - id: parsable-R 13 | - id: no-browser-statement 14 | - id: readme-rmd-rendered 15 | - id: deps-in-desc 16 | exclude: data\-raw 17 | - id: use-tidy-description 18 | - id: style-files 19 | - id: lintr 20 | args: [--warn_only] 21 | verbose: true 22 | - id: roxygenize 23 | -------------------------------------------------------------------------------- /Rmagic/.pre-commit.r_requirements.txt: -------------------------------------------------------------------------------- 1 | docopt 2 | styler 3 | git2r 4 | lintr 5 | roxygen2 6 | precommit 7 | 
-------------------------------------------------------------------------------- /Rmagic/DESCRIPTION: -------------------------------------------------------------------------------- 1 | Type: Package 2 | Package: Rmagic 3 | Title: MAGIC - Markov Affinity-Based Graph Imputation of Cells 4 | Version: 2.0.3.999 5 | Authors@R: 6 | c(person(given = "David", 7 | family = "van Dijk", 8 | role = "aut", 9 | email = "davidvandijk@gmail.com"), 10 | person(given = "Scott", 11 | family = "Gigante", 12 | role = "cre", 13 | email = "scott.gigante@yale.edu", 14 | comment = c(ORCID = "0000-0002-4544-2764"))) 15 | Maintainer: Scott Gigante 16 | Description: MAGIC (Markov affinity-based graph imputation of cells) is a 17 | method for addressing technical noise in single-cell data, including 18 | under-sampling of mRNA molecules, often termed "dropout" which can 19 | severely obscure important gene-gene relationships. MAGIC shares 20 | information across similar cells, via data diffusion, to denoise the 21 | cell count matrix and fill in missing transcripts. Read more: van Dijk 22 | et al. (2018) . 23 | License: GPL-2 | file LICENSE 24 | Depends: 25 | Matrix (>= 1.2-0), 26 | R (>= 3.3) 27 | Imports: 28 | ggplot2, 29 | methods, 30 | reticulate (>= 1.4), 31 | stats 32 | Suggests: 33 | phateR, 34 | readr, 35 | Seurat (>= 3.0.0), 36 | viridis 37 | Encoding: UTF-8 38 | LazyData: true 39 | RoxygenNote: 7.1.1 40 | -------------------------------------------------------------------------------- /Rmagic/LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 2, June 1991 3 | 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc., 5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 
8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your 12 | freedom to share and change it. By contrast, the GNU General Public 13 | License is intended to guarantee your freedom to share and change free 14 | software--to make sure the software is free for all its users. This 15 | General Public License applies to most of the Free Software 16 | Foundation's software and to any other program whose authors commit to 17 | using it. (Some other Free Software Foundation software is covered by 18 | the GNU Lesser General Public License instead.) You can apply it to 19 | your programs, too. 20 | 21 | When we speak of free software, we are referring to freedom, not 22 | price. Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 
42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. 
The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 
102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. 
You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. (This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 
165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. 
If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. 
If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 
292 | 293 | {description} 294 | Copyright (C) {year} {fullname} 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 
331 | 332 | {signature of Ty Coon}, 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License. 340 | -------------------------------------------------------------------------------- /Rmagic/NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | S3method(as.data.frame,magic) 4 | S3method(as.matrix,magic) 5 | S3method(ggplot,magic) 6 | S3method(magic,Seurat) 7 | S3method(magic,default) 8 | S3method(magic,seurat) 9 | S3method(print,magic) 10 | S3method(summary,magic) 11 | export(check_pymagic_version) 12 | export(install.magic) 13 | export(library.size.normalize) 14 | export(magic) 15 | export(pymagic_is_available) 16 | import(Matrix) 17 | importFrom(ggplot2,ggplot) 18 | importFrom(utils,packageVersion) 19 | -------------------------------------------------------------------------------- /Rmagic/R/magic.R: -------------------------------------------------------------------------------- 1 | #' Perform MAGIC on a data matrix 2 | #' 3 | #' Markov Affinity-based Graph Imputation of Cells (MAGIC) is an 4 | #' algorithm for denoising and transcript recovery of single cells 5 | #' applied to single-cell RNA sequencing data, as described in 6 | #' van Dijk et al., 2018.
7 | #' 8 | #' @param data input data matrix or Seurat object 9 | #' @param genes character or integer vector, default: NULL 10 | #' vector of column names or column indices for which to return smoothed data. 11 | #' If 'all_genes' or NULL, the entire smoothed matrix is returned 12 | #' @param knn int, optional, default: 5 13 | #' number of nearest neighbors on which to compute bandwidth 14 | #' @param knn.max int, optional, default: NULL 15 | #' maximum number of neighbors for each point. If NULL, defaults to 3*knn 16 | #' @param decay int, optional, default: 1 17 | #' sets decay rate of kernel tails. 18 | #' If NULL, alpha decaying kernel is not used 19 | #' @param t int, optional, default: 3 20 | #' power to which the diffusion operator is raised; 21 | #' sets the level of diffusion. If 'auto', t is selected according to the 22 | #' Procrustes disparity of the diffused data. 23 | #' @param npca number of PCA components that should be used; default: 100. 24 | #' @param solver str, optional, default: 'exact' 25 | #' Which solver to use. "exact" uses the implementation described 26 | #' in van Dijk et al. (2018). "approximate" uses a faster implementation 27 | #' that performs imputation in the PCA space and then projects back to the 28 | #' gene space. Note, the "approximate" solver may return negative values. 29 | #' @param init magic object, optional 30 | #' object to use for initialization. Avoids recomputing 31 | #' intermediate steps if parameters are the same. 32 | #' @param t.max int, optional, default: 20 33 | #' Maximum value of t to test for automatic t selection. 34 | #' @param knn.dist.method string, optional, default: 'euclidean'. 35 | #' Distance metric for building the kNN graph. 36 | #' Recommended values: 'euclidean', 'cosine'. 37 | #' Any metric from `scipy.spatial.distance` can be used. 38 | #' @param verbose `int` or `boolean`, optional (default : 1) 39 | #' If `TRUE` or `> 0`, print verbose updates.
40 | #' @param n.jobs `int`, optional (default: 1) 41 | #' The number of jobs to use for the computation. 42 | #' If -1 all CPUs are used. If 1 is given, no parallel computing code is 43 | #' used at all, which is useful for debugging. 44 | #' For n.jobs below -1, (n.cpus + 1 + n.jobs) are used. Thus for 45 | #' n.jobs = -2, all CPUs but one are used 46 | #' @param seed int or `NULL`, random state (default: `NULL`) 47 | #' @param ... Arguments passed to and from other methods 48 | #' @param k Deprecated. Use `knn`. 49 | #' @param alpha Deprecated. Use `decay`. 50 | #' 51 | #' @return If a Seurat object is passed, a Seurat object is returned. Otherwise, a "magic" object containing: 52 | #' * **result**: data frame containing smoothed expression values 53 | #' * **operator**: The MAGIC operator (python magic.MAGIC object) 54 | #' * **params**: Parameters passed to magic 55 | #' 56 | #' @examples 57 | #' if (pymagic_is_available()) { 58 | #' data(magic_testdata) 59 | #' 60 | #' # Run MAGIC 61 | #' data_magic <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1")) 62 | #' summary(data_magic) 63 | #' ## CDH1 VIM ZEB1 64 | #' ## Min. :0.4303 Min. :3.854 Min. :0.01111 65 | #' ## 1st Qu.:0.4444 1st Qu.:3.947 1st Qu.:0.01145 66 | #' ## Median :0.4462 Median :3.964 Median :0.01153 67 | #' ## Mean :0.4461 Mean :3.965 Mean :0.01152 68 | #' ## 3rd Qu.:0.4478 3rd Qu.:3.982 3rd Qu.:0.01160 69 | #' ## Max. :0.4585 Max. :4.127 Max.
:0.01201 70 | #' 71 | #' # Plot the result with ggplot2 72 | #' if (require(ggplot2)) { 73 | #' ggplot(data_magic) + 74 | #' geom_point(aes(x = VIM, y = CDH1, color = ZEB1)) 75 | #' } 76 | #' 77 | #' # Run MAGIC again returning all genes 78 | #' # We use the last run as initialization 79 | #' data_magic <- magic(magic_testdata, genes = "all_genes", init = data_magic) 80 | #' # Extract the smoothed data matrix to use in downstream analysis 81 | #' data_smooth <- as.matrix(data_magic) 82 | #' } 83 | #' 84 | #' if (pymagic_is_available() && require(Seurat)) { 85 | #' data(magic_testdata) 86 | #' 87 | #' # Create a Seurat object 88 | #' seurat_object <- CreateSeuratObject(counts = t(magic_testdata), assay = "RNA") 89 | #' seurat_object <- NormalizeData(object = seurat_object) 90 | #' seurat_object <- ScaleData(object = seurat_object) 91 | #' 92 | #' # Run MAGIC and reset the active assay 93 | #' seurat_object <- magic(seurat_object) 94 | #' seurat_object@active.assay <- "MAGIC_RNA" 95 | #' 96 | #' # Analyze with Seurat 97 | #' VlnPlot(seurat_object, features = c("VIM", "ZEB1", "CDH1")) 98 | #' } 99 | #' @export 100 | #' 101 | magic <- function(data, ...) { 102 | UseMethod(generic = "magic", object = data) 103 | } 104 | 105 | #' @rdname magic 106 | #' @export 107 | #' 108 | magic.default <- function( 109 | data, 110 | genes = NULL, 111 | knn = 5, 112 | knn.max = NULL, 113 | decay = 1, 114 | t = 3, 115 | npca = 100, 116 | solver = "exact", 117 | init = NULL, 118 | t.max = 20, 119 | knn.dist.method = "euclidean", 120 | verbose = 1, 121 | n.jobs = 1, 122 | seed = NULL, 123 | # deprecated args 124 | k = NULL, alpha = NULL, 125 | ...) { 126 | # check installation 127 | if (!reticulate::py_module_available(module = "magic") || 128 | (is.null(pymagic))) { 129 | load_pymagic() 130 | } 131 | # check for deprecated arguments 132 | if (!is.null(k)) { 133 | message("Argument k is deprecated. 
Using knn instead.") 134 | knn <- k 135 | } 136 | if (!is.null(alpha)) { 137 | message("Argument alpha is deprecated. Using decay instead.") 138 | decay <- alpha 139 | } 140 | # validate parameters 141 | knn <- check.int(x = knn) 142 | t.max <- check.int(x = t.max) 143 | n.jobs <- check.int(x = n.jobs) 144 | npca <- check.int.or.null(npca) 145 | knn.max <- check.int.or.null(knn.max) 146 | seed <- check.int.or.null(seed) 147 | verbose <- check.int.or.null(verbose) 148 | decay <- check.double.or.null(decay) 149 | t <- check.int.or.string(t, "auto") 150 | if (!methods::is(object = data, "Matrix")) { 151 | data <- as.matrix(x = data) 152 | } 153 | if (length(genes) <= 1 && (is.null(x = genes) || is.na(x = genes))) { 154 | genes <- NULL 155 | gene_names <- colnames(x = data) 156 | } else if (is.numeric(x = genes)) { 157 | gene_names <- colnames(x = data)[genes] 158 | genes <- as.integer(x = genes - 1) 159 | } else if (length(x = genes) == 1 && genes == "all_genes") { 160 | gene_names <- colnames(x = data) 161 | } else if (length(x = genes) == 1 && genes == "pca_only") { 162 | gene_names <- paste0("PC", 1:npca) 163 | } else { 164 | # character vector 165 | if (!all(genes %in% colnames(x = data))) { 166 | warning(paste0( 167 | "Genes ", 168 | genes[!(genes %in% colnames(data))], 169 | " not found.", 170 | collapse = ", " 171 | )) 172 | } 173 | genes <- which(x = colnames(x = data) %in% genes) 174 | gene_names <- colnames(x = data)[genes] 175 | genes <- as.integer(x = genes - 1) 176 | } 177 | # store parameters 178 | params <- list( 179 | "data" = data, 180 | "knn" = knn, 181 | "knn.max" = knn.max, 182 | "decay" = decay, 183 | "t" = t, 184 | "npca" = npca, 185 | "solver" = solver, 186 | "knn.dist.method" = knn.dist.method 187 | ) 188 | # use pre-initialized values if given 189 | operator <- NULL 190 | if (!is.null(x = init)) { 191 | if (!methods::is(init, "magic")) { 192 | warning("object passed to init is not a magic object") 193 | } else { 194 | operator <- init$operator
195 | operator$set_params( 196 | knn = knn, 197 | knn_max = knn.max, 198 | decay = decay, 199 | t = t, 200 | n_pca = npca, 201 | solver = solver, 202 | knn_dist = knn.dist.method, 203 | n_jobs = n.jobs, 204 | random_state = seed, 205 | verbose = verbose, 206 | ... 207 | ) 208 | } 209 | } 210 | if (is.null(x = operator)) { 211 | operator <- pymagic$MAGIC( 212 | knn = knn, 213 | knn_max = knn.max, 214 | decay = decay, 215 | t = t, 216 | n_pca = npca, 217 | solver = solver, 218 | knn_dist = knn.dist.method, 219 | n_jobs = n.jobs, 220 | random_state = seed, 221 | verbose = verbose, 222 | ... 223 | ) 224 | } 225 | result <- operator$fit_transform( 226 | data, 227 | genes = genes, 228 | t_max = t.max 229 | ) 230 | colnames(x = result) <- gene_names 231 | rownames(x = result) <- rownames(data) 232 | result <- as.data.frame(x = result) 233 | result <- list( 234 | "result" = result, 235 | "operator" = operator, 236 | "params" = params 237 | ) 238 | class(x = result) <- c("magic", "list") 239 | return(result) 240 | } 241 | 242 | #' @rdname magic 243 | #' @export 244 | #' @method magic seurat 245 | #' 246 | magic.seurat <- function( 247 | data, 248 | genes = NULL, 249 | knn = 5, 250 | knn.max = NULL, 251 | decay = 1, 252 | t = 3, 253 | npca = 100, 254 | solver = "exact", 255 | init = NULL, 256 | t.max = 20, 257 | knn.dist.method = "euclidean", 258 | verbose = 1, 259 | n.jobs = 1, 260 | seed = NULL, 261 | ...) { 262 | if (requireNamespace("Seurat", quietly = TRUE)) { 263 | results <- magic( 264 | data = as.matrix(x = t(x = data@data)), 265 | genes = genes, 266 | knn = knn, 267 | knn.max = knn.max, 268 | decay = decay, 269 | t = t, 270 | npca = npca, 271 | solver = solver, 272 | init = init, 273 | t.max = t.max, 274 | knn.dist.method = knn.dist.method, 275 | verbose = verbose, 276 | n.jobs = n.jobs, 277 | seed = seed, 278 | ... 279 | ) 280 | data@data <- t(x = as.matrix(x = results$result)) 281 | return(data) 282 | } else { 283 | message("Seurat package not available. 
Running default MAGIC implementation.") 284 | return(magic( 285 | data, 286 | genes = genes, 287 | knn = knn, 288 | knn.max = knn.max, 289 | decay = decay, 290 | t = t, 291 | npca = npca, 292 | solver = solver, 293 | init = init, 294 | t.max = t.max, 295 | knn.dist.method = knn.dist.method, 296 | verbose = verbose, 297 | n.jobs = n.jobs, 298 | seed = seed, 299 | ... 300 | )) 301 | } 302 | } 303 | 304 | #' @param assay Assay to use for imputation, defaults to the default assay 305 | #' 306 | #' @rdname magic 307 | #' @export 308 | #' @method magic Seurat 309 | #' 310 | magic.Seurat <- function( 311 | data, 312 | assay = NULL, 313 | genes = NULL, 314 | knn = 5, 315 | knn.max = NULL, 316 | decay = 1, 317 | t = 3, 318 | npca = 100, 319 | solver = "exact", 320 | init = NULL, 321 | t.max = 20, 322 | knn.dist.method = "euclidean", 323 | verbose = 1, 324 | n.jobs = 1, 325 | seed = NULL, 326 | ...) { 327 | if (requireNamespace("Seurat", quietly = TRUE)) { 328 | if (is.null(x = assay)) { 329 | assay <- Seurat::DefaultAssay(object = data) 330 | } 331 | results <- magic( 332 | data = t(x = Seurat::GetAssayData(object = data, slot = "data", assay = assay)), 333 | genes = genes, 334 | knn = knn, 335 | knn.max = knn.max, 336 | decay = decay, 337 | t = t, 338 | npca = npca, 339 | solver = solver, 340 | init = init, 341 | t.max = t.max, 342 | knn.dist.method = knn.dist.method, 343 | verbose = verbose, 344 | n.jobs = n.jobs, 345 | seed = seed, 346 | ... 347 | ) 348 | assay_name <- paste0("MAGIC_", assay) 349 | data[[assay_name]] <- Seurat::CreateAssayObject( 350 | data = t(x = as.matrix(x = results$result)) 351 | ) 352 | print(paste0( 353 | "Added MAGIC output to ", 354 | assay_name, 355 | ". To use it, pass assay='", 356 | assay_name, 357 | "' to downstream methods or set DefaultAssay(seurat_object) <- '", 358 | assay_name, 359 | "'." 
360 | )) 361 | Seurat::Tool(object = data) <- results[c("operator", "params")] 362 | return(data) 363 | } else { 364 | message("Seurat package not available. Running default MAGIC implementation.") 365 | return(magic( 366 | data, 367 | genes = genes, 368 | knn = knn, 369 | knn.max = knn.max, 370 | decay = decay, 371 | t = t, 372 | npca = npca, 373 | init = init, 374 | t.max = t.max, 375 | knn.dist.method = knn.dist.method, 376 | verbose = verbose, 377 | n.jobs = n.jobs, 378 | seed = seed, 379 | ... 380 | )) 381 | } 382 | } 383 | 384 | #' Print a MAGIC object 385 | #' 386 | #' This avoids spamming the user's console with a list of many large matrices 387 | #' 388 | #' @param x A fitted MAGIC object 389 | #' @param ... Arguments for print() 390 | #' @examples 391 | #' if (pymagic_is_available()) { 392 | #' data(magic_testdata) 393 | #' data_magic <- magic(magic_testdata) 394 | #' print(data_magic) 395 | #' ## MAGIC with elements 396 | #' ## $result : (500, 197) 397 | #' ## $operator : Python MAGIC operator 398 | #' ## $params : list with elements (data, knn, decay, t, npca, knn.dist.method) 399 | #' } 400 | #' @rdname print 401 | #' @method print magic 402 | #' @export 403 | print.magic <- function(x, ...) { 404 | result <- paste0( 405 | "MAGIC with elements\n", 406 | " $result : (", nrow(x$result), ", ", 407 | ncol(x$result), ")\n", 408 | " $operator : Python MAGIC operator\n", 409 | " $params : list with elements (", 410 | paste(names(x$params), collapse = ", "), ")" 411 | ) 412 | cat(result) 413 | } 414 | 415 | #' Summarize a MAGIC object 416 | #' 417 | #' @param object A fitted MAGIC object 418 | #' @param ... Arguments for summary() 419 | #' @examples 420 | #' if (pymagic_is_available()) { 421 | #' data(magic_testdata) 422 | #' data_magic <- magic(magic_testdata) 423 | #' summary(data_magic) 424 | #' ## ZEB1 425 | #' ## Min. :0.01071 426 | #' ## 1st Qu.:0.01119 427 | #' ## Median :0.01130 428 | #' ## Mean :0.01129 429 | #' ## 3rd Qu.:0.01140 430 | #' ## Max. 
:0.01201 431 | #' } 432 | #' @rdname summary 433 | #' @method summary magic 434 | #' @export 435 | summary.magic <- function(object, ...) { 436 | summary(object$result) 437 | } 438 | 439 | #' Convert a MAGIC object to a matrix 440 | #' 441 | #' Returns the smoothed data matrix 442 | #' 443 | #' @param x A fitted MAGIC object 444 | #' @param ... Arguments for as.matrix() 445 | #' @rdname as.matrix 446 | #' @method as.matrix magic 447 | #' @export 448 | as.matrix.magic <- function(x, ...) { 449 | as.matrix(as.data.frame(x)) 450 | } 451 | #' Convert a MAGIC object to a data.frame 452 | #' 453 | #' Returns the smoothed data matrix 454 | #' 455 | #' @param x A fitted MAGIC object 456 | #' @param ... Arguments for as.data.frame() 457 | #' @rdname as.data.frame 458 | #' @method as.data.frame magic 459 | #' @export 460 | as.data.frame.magic <- function(x, ...) { 461 | x$result 462 | } 463 | 464 | 465 | #' Convert a MAGIC object to a data.frame for ggplot 466 | #' 467 | #' Passes the smoothed data matrix to ggplot 468 | #' @importFrom ggplot2 ggplot 469 | #' @param data A fitted MAGIC object 470 | #' @param ... Arguments for ggplot() 471 | #' @examples 472 | #' if (pymagic_is_available() && require(ggplot2)) { 473 | #' data(magic_testdata) 474 | #' data_magic <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1")) 475 | #' ggplot(data_magic, aes(VIM, CDH1, colour = ZEB1)) + 476 | #' geom_point() 477 | #' } 478 | #' @rdname ggplot 479 | #' @method ggplot magic 480 | #' @export 481 | ggplot.magic <- function(data, ...) { 482 | ggplot2::ggplot(as.data.frame(data), ...) 
483 | } 484 | -------------------------------------------------------------------------------- /Rmagic/R/magic_testdata.R: -------------------------------------------------------------------------------- 1 | #' Fake scRNAseq data for examples 2 | #' 3 | #' A subsampled dataset of epithelial to mesenchymal transition 4 | #' 5 | #' @format A matrix with 500 rows and 197 variables 6 | #' 7 | #' @source The authors 8 | "magic_testdata" 9 | -------------------------------------------------------------------------------- /Rmagic/R/preprocessing.R: -------------------------------------------------------------------------------- 1 | #' Performs L1 normalization on input data such that the sum of expression 2 | #' values for each cell sums to 1, then returns normalized matrix to the metric 3 | #' space using median UMI count per cell effectively scaling all cells as if 4 | #' they were sampled evenly. 5 | 6 | #' @param data matrix (n_samples, n_dimensions) 7 | #' 2 dimensional input data array with n cells and p dimensions 8 | #' @param verbose boolean, default=FALSE. 
If TRUE, print verbose output 9 | 10 | #' @return data_norm matrix (n_samples, n_dimensions) 11 | #' 2 dimensional array with normalized gene expression values 12 | #' @import Matrix 13 | #' 14 | #' @export 15 | library.size.normalize <- function(data, verbose = FALSE) { 16 | if (verbose) { 17 | message(paste0( 18 | "Normalizing library sizes for ", 19 | nrow(data), " cells" 20 | )) 21 | } 22 | library_size <- Matrix::rowSums(data) 23 | median_transcript_count <- stats::median(library_size) 24 | data_norm <- median_transcript_count * data / library_size 25 | data_norm 26 | } 27 | -------------------------------------------------------------------------------- /Rmagic/R/utils.R: -------------------------------------------------------------------------------- 1 | # Return TRUE if x and y are equal or both NULL 2 | null_equal <- function(x, y) { 3 | if (is.null(x) && is.null(y)) { 4 | return(TRUE) 5 | } else if (is.null(x) || is.null(y)) { 6 | return(FALSE) 7 | } else { 8 | return(x == y) 9 | } 10 | } 11 | 12 | #' Check that the current MAGIC version in Python is up to date. 13 | #' 14 | #' @importFrom utils packageVersion 15 | #' @export 16 | check_pymagic_version <- function() { 17 | pyversion <- strsplit(pymagic$`__version__`, "\\.")[[1]] 18 | rversion <- strsplit(as.character(packageVersion("Rmagic")), "\\.")[[1]] 19 | major_version <- as.integer(rversion[1]) 20 | minor_version <- as.integer(rversion[2]) 21 | if (as.integer(pyversion[1]) < major_version) { 22 | warning(paste0( 23 | "Python MAGIC version ", 24 | pymagic$`__version__`, 25 | " is out of date (recommended: ", 26 | major_version, 27 | ".", 28 | minor_version, 29 | "). Please update with pip ", 30 | "(e.g. ", 31 | reticulate::py_config()$python, 32 | " -m pip install --upgrade magic-impute) or Rmagic::install.magic()."
33 | )) 34 | return(FALSE) 35 | } else if (as.integer(pyversion[1]) == major_version && as.integer(pyversion[2]) < minor_version) { 36 | warning(paste0( 37 | "Python MAGIC version ", 38 | pymagic$`__version__`, 39 | " is out of date (recommended: ", 40 | major_version, 41 | ".", 42 | minor_version, 43 | "). Consider updating with pip ", 44 | "(e.g. ", 45 | reticulate::py_config()$python, 46 | " -m pip install --upgrade magic-impute) or Rmagic::install.magic()." 47 | )) 48 | return(FALSE) 49 | } 50 | return(TRUE) 51 | } 52 | 53 | failed_pymagic_import <- function(e) { 54 | message("Error loading Python module magic") 55 | message(e) 56 | result <- as.character(e) 57 | if (length(grep("ModuleNotFoundError: No module named 'magic'", result)) > 0 || 58 | length(grep("ImportError: No module named magic", result)) > 0) { 59 | # not installed 60 | if (utils::menu(c("Yes", "No"), title = "Install MAGIC Python package with reticulate?") == 1) { 61 | install.magic() 62 | } 63 | } else if (length(grep("r\\-reticulate", reticulate::py_config()$python)) > 0) { 64 | # installed, but envs sometimes give weird results 65 | message("Consider removing the 'r-reticulate' environment by running:") 66 | if (length(grep("virtualenvs", reticulate::py_config()$python)) > 0) { 67 | message("reticulate::virtualenv_remove('r-reticulate')") 68 | } else { 69 | message("reticulate::conda_remove('r-reticulate')") 70 | } 71 | } 72 | } 73 | 74 | load_pymagic <- function() { 75 | delay_load <- list(on_load = check_pymagic_version, on_error = failed_pymagic_import) 76 | # load 77 | if (is.null(pymagic)) { 78 | # first time load 79 | result <- try(pymagic <<- reticulate::import("magic", delay_load = delay_load)) 80 | } else { 81 | # already loaded 82 | result <- try(reticulate::import("magic", delay_load = delay_load)) 83 | } 84 | } 85 | 86 | #' Check whether MAGIC Python package is available and can be loaded 87 | #' 88 | #' This is used primarily to avoid running tests on CRAN 89 | #' and elsewhere where the Python package should not be 90 |
#' installed. 91 | #' 92 | #' @export 93 | pymagic_is_available <- function() { 94 | tryCatch( 95 | { 96 | reticulate::import("magic")$MAGIC 97 | check_pymagic_version() 98 | }, 99 | error = function(e) { 100 | FALSE 101 | } 102 | ) 103 | } 104 | 105 | #' Install MAGIC Python Package 106 | #' 107 | #' Install MAGIC Python package into a virtualenv or conda env. 108 | #' 109 | #' On Linux and OS X the "virtualenv" method will be used by default 110 | #' ("conda" will be used if virtualenv isn't available). On Windows, 111 | #' the "conda" method is always used. 112 | #' 113 | #' @param envname Name of environment to install packages into 114 | #' @param method Installation method. By default, "auto" automatically finds 115 | #' a method that will work in the local environment. Change the default to 116 | #' force a specific installation method. Note that the "virtualenv" method 117 | #' is not available on Windows. 118 | #' @param conda Path to conda executable (or "auto" to find conda using the PATH 119 | #' and other conventional install locations). 120 | #' @param pip Install from pip, if possible. 121 | #' @param ... Additional arguments passed to conda_install() or 122 | #' virtualenv_install(). 123 | #' 124 | #' @export 125 | install.magic <- function(envname = "r-reticulate", method = "auto", 126 | conda = "auto", pip = TRUE, ...) { 127 | message("Attempting to install MAGIC python package with reticulate") 128 | tryCatch( 129 | { 130 | reticulate::py_install("magic-impute", 131 | envname = envname, method = method, 132 | conda = conda, pip = pip, ... 133 | ) 134 | message("Install complete. Please restart R and try again.") 135 | }, 136 | error = function(e) { 137 | stop(paste0( 138 | "Cannot locate MAGIC Python package, please install through pip ", 139 | "(e.g. ", reticulate::py_config()$python, " -m pip install magic-impute) and then restart R." 
140 | )) 141 | } 142 | ) 143 | } 144 | 145 | pymagic <- NULL 146 | 147 | .onLoad <- function(libname, pkgname) { 148 | py_config <- reticulate::py_discover_config(required_module = "magic") 149 | load_pymagic() 150 | } 151 | 152 | ###### 153 | # Parameter validation 154 | ###### 155 | 156 | check.int <- function(x) { 157 | as.integer(x) 158 | } 159 | 160 | check.int.or.null <- function(x) { 161 | if (is.numeric(x = x)) { 162 | x <- as.integer(x = x) 163 | } else if (!is.null(x = x) && is.na(x = x)) { 164 | x <- NULL 165 | } 166 | x 167 | } 168 | 169 | check.double.or.null <- function(x) { 170 | if (is.numeric(x = x)) { 171 | x <- as.integer(x = x) 172 | } else if (!is.null(x = x) && is.na(x = x)) { 173 | x <- NULL 174 | } 175 | x 176 | } 177 | 178 | check.int.or.string <- function(x, str) { 179 | if (is.numeric(x = x)) { 180 | x <- as.integer(x = x) 181 | } else if (is.null(x = x) || is.na(x = x)) { 182 | x <- str 183 | } 184 | x 185 | } 186 | -------------------------------------------------------------------------------- /Rmagic/README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title : Rmagic 3 | output: github_document 4 | toc: true 5 | --- 6 | 7 | 8 | 9 | ```{r setup, include = FALSE} 10 | knitr::opts_chunk$set( 11 | collapse = TRUE, 12 | comment = "#>", 13 | fig.path = "man/figures/README-", 14 | out.width = "100%" 15 | ) 16 | ``` 17 | 18 | [![Latest PyPI version](https://img.shields.io/pypi/v/magic-impute.svg)](https://pypi.org/project/magic-impute/) 19 | [![Latest CRAN version](https://img.shields.io/cran/v/Rmagic.svg)](https://cran.r-project.org/package=Rmagic) 20 | [![GitHub Actions Build](https://img.shields.io/github/workflow/status/KrishnaswamyLab/MAGIC/Unit%20Tests/master?label=Github%20Actions)](https://github.com/KrishnaswamyLab/MAGIC/actions) 21 | [![Read the Docs](https://img.shields.io/readthedocs/magic.svg)](https://magic.readthedocs.io/) 22 | [![Cell Publication 
DOI](https://zenodo.org/badge/DOI/10.1016/j.cell.2018.05.061.svg)](https://www.cell.com/cell/abstract/S0092-8674(18)30724-4) 23 | [![Twitter](https://img.shields.io/twitter/follow/KrishnaswamyLab.svg?style=social&label=Follow)](https://twitter.com/KrishnaswamyLab) 24 | [![Github Stars](https://img.shields.io/github/stars/KrishnaswamyLab/MAGIC.svg?style=social&label=Stars)](https://github.com/KrishnaswamyLab/MAGIC/) 25 | 26 | 27 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising and imputation of single cells applied to single-cell RNA sequencing data, as described in Van Dijk D *et al.* (2018), *Recovering Gene Interactions from Single-Cell Data Using Data Diffusion*, Cell . 28 | 29 |

30 | 31 |
32 | MAGIC reveals the interaction between Vimentin (VIM), Cadherin-1 (CDH1), and Zinc finger E-box-binding homeobox 1 (ZEB1, encoded by colors). 33 | 34 |

35 | 36 | * MAGIC imputes missing data values on sparse data sets, restoring the structure of the data 37 | * It also provides dimensionality reduction and gene expression visualizations 38 | * MAGIC can be performed on a variety of datasets 39 | * Here, we show the usage of MAGIC on a toy dataset 40 | * You can view further examples of MAGIC on real data in our notebooks under `inst/examples`: 41 | * http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/EMT_tutorial.html 42 | * http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/bonemarrow_tutorial.html 43 | 44 | ## Table of Contents 45 | 46 | * [Installation](#installation) 47 | * [Installation from CRAN and PyPi](#installation-from-cran-and-pypi) 48 | * [Installation with devtools and reticulate](#installation-with-devtools-and-reticulate) 49 | * [Installation from source](#installation-from-source) 50 | * [Quick Start](#quick-start) 51 | * [Tutorial](#tutorial) 52 | * [Issues](#issues) 53 | * [FAQ](#faq) 54 | * [Help](#help) 55 | 56 | ## Installation 57 | 58 | To use MAGIC, you will need to install both the R and Python packages. 59 | 60 | If `python` or `pip` are not installed, you will need to install them. We recommend [Miniconda3](https://conda.io/miniconda.html) to install Python and `pip` together, or otherwise you can install `pip` from https://pip.pypa.io/en/stable/installing/. 61 | 62 | #### Installation from CRAN and PyPi 63 | 64 | In R, run this command to install MAGIC and all dependencies: 65 | 66 | ```{r install_Rmagic, eval=FALSE} 67 | install.packages("Rmagic") 68 | ``` 69 | 70 | In a terminal, run the following command to install the Python package.
71 | 72 | ```{bash install_python_magic, eval=FALSE} 73 | pip install --user magic-impute 74 | ``` 75 | 76 | #### Installation from source 77 | 78 | To install the very latest version of MAGIC, you can install from GitHub with the following commands run in a terminal. 79 | 80 | ```{bash install_magic_source, eval=FALSE} 81 | git clone https://github.com/KrishnaswamyLab/MAGIC 82 | cd MAGIC/python 83 | python setup.py install --user 84 | cd ../Rmagic 85 | R CMD INSTALL . 86 | ``` 87 | 88 | ## Quick Start 89 | 90 | If you have loaded a data matrix `data` in R (cells on rows, genes on columns) you can run MAGIC as follows: 91 | 92 | ```{r quick start, eval=FALSE} 93 | library(Rmagic) 94 | data_magic <- magic(data) 95 | ``` 96 | 97 | ## Tutorial 98 | 99 | #### Extra packages for the tutorial 100 | 101 | We'll install a couple more tools for this tutorial. 102 | 103 | ```{r install_extras, eval=FALSE} 104 | if (!require(viridis)) install.packages("viridis") 105 | if (!require(ggplot2)) install.packages("ggplot2") 106 | if (!require(phateR)) install.packages("phateR") 107 | ``` 108 | 109 | If you have never used PHATE, you should also install PHATE from the command line as follows: 110 | 111 | ```{bash install_python_phate, eval=FALSE} 112 | pip install --user phate 113 | ``` 114 | 115 | ### Loading packages 116 | 117 | We load the Rmagic package and a few others for convenience functions. 118 | 119 | ```{r load_packages} 120 | library(Rmagic) 121 | library(ggplot2) 122 | library(viridis) 123 | library(phateR) 124 | ``` 125 | 126 | ### Loading data 127 | 128 | The example data is located in the MAGIC R package. 129 | 130 | ```{r load_data} 131 | # load data 132 | data(magic_testdata) 133 | magic_testdata[1:5, 1:10] 134 | ``` 135 | 136 | ### Running MAGIC 137 | 138 | Running MAGIC is as simple as running the `magic` function.
139 | 140 | ```{r run_magic} 141 | # run MAGIC 142 | data_MAGIC <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1")) 143 | ``` 144 | 145 | We can plot the data before and after MAGIC to visualize the results. 146 | 147 | ```{r plot_raw} 148 | ggplot(magic_testdata) + 149 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 150 | scale_colour_viridis(option = "B") 151 | ``` 152 | 153 | The data suffers from dropout to the point that we cannot infer anything about the gene-gene relationships. 154 | 155 | ```{r plot_magic} 156 | ggplot(data_MAGIC) + 157 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 158 | scale_colour_viridis(option = "B") 159 | ``` 160 | 161 | As you can see, the gene-gene relationships are much clearer after MAGIC. 162 | 163 | The data is sometimes a little too smooth - we can decrease `t` from the automatic value to reduce the amount of diffusion. We pass the original result to the argument `init` to avoid recomputing intermediate steps. 164 | 165 | ```{r plot_reduced_t} 166 | data_MAGIC <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1"), t = 6, init = data_MAGIC) 167 | ggplot(data_MAGIC) + 168 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 169 | scale_colour_viridis(option = "B") 170 | ``` 171 | 172 | 173 | We can look at the entire smoothed matrix with `genes='all_genes'`, passing the original result to the argument `init` to avoid recomputing intermediate steps. Note that this matrix may be large and could take up a lot of memory. 174 | 175 | ```{r run_magic_full_matrix} 176 | data_MAGIC <- magic(magic_testdata, genes = "all_genes", t = 6, init = data_MAGIC) 177 | as.data.frame(data_MAGIC)[1:5, 1:10] 178 | ``` 179 | 180 | ### Visualizing MAGIC values on PCA 181 | 182 | We can visualize the results of MAGIC on PCA as follows. 
183 | 184 | ```{r run_pca} 185 | data_MAGIC_PCA <- as.data.frame(prcomp(data_MAGIC)$x) 186 | ggplot(data_MAGIC_PCA) + 187 | geom_point(aes(x = PC1, y = PC2, color = data_MAGIC$result$VIM)) + 188 | scale_color_viridis(option = "B") + 189 | labs(color = "VIM") 190 | ``` 191 | 192 | 193 | ### Visualizing MAGIC values on PHATE 194 | 195 | We can visualize the results of MAGIC on PHATE as follows. We set `t` and `k` manually, because this toy dataset is really too small to make sense with PHATE; however, the default values work well for single-cell genomic data. 196 | 197 | ```{r run_phate} 198 | data_PHATE <- phate(magic_testdata, k = 3, t = 15) 199 | ggplot(data_PHATE) + 200 | geom_point(aes(x = PHATE1, y = PHATE2, color = data_MAGIC$result$VIM)) + 201 | scale_color_viridis(option = "B") + 202 | labs(color = "VIM") 203 | ``` 204 | 205 | ## Issues 206 | 207 | ### FAQ 208 | 209 | - **Should genes (features) be rows or columns?** 210 | 211 | To be consistent with common functions such as PCA 212 | (`stats::prcomp`) and t-SNE (`Rtsne::Rtsne`), we require that cells 213 | (observations) be rows and genes (features) be columns of your input 214 | data. 215 | 216 | - **I have installed MAGIC in Python, but Rmagic says it is not 217 | installed!** 218 | 219 | Check your `reticulate::py_discover_config("magic")` and compare it to 220 | the version of Python in which you installed MAGIC (run `which python` 221 | and `which pip` in a terminal.) Chances are `reticulate` can’t find the 222 | right version of Python; you can fix this by adding the following line 223 | to your `~/.Renviron`: 224 | 225 | `PATH=/path/to/my/python` 226 | 227 | You can read more about `Renviron` at 228 | . 229 | 230 | ### Help 231 | 232 | Please let us know of any issues at the [GitHub repository](https://github.com/KrishnaswamyLab/MAGIC/issues).
If you 233 | have any questions or require assistance using MAGIC, please read the 234 | documentation by running `help(Rmagic::magic)` or contact us at 235 | . 236 | -------------------------------------------------------------------------------- /Rmagic/README.md: -------------------------------------------------------------------------------- 1 | --- 2 | title : Rmagic 3 | output: github_document 4 | toc: true 5 | --- 6 | 7 | 8 | 9 | 10 | 11 | [![Latest PyPI version](https://img.shields.io/pypi/v/magic-impute.svg)](https://pypi.org/project/magic-impute/) 12 | [![Latest CRAN version](https://img.shields.io/cran/v/Rmagic.svg)](https://cran.r-project.org/package=Rmagic) 13 | [![GitHub Actions Build](https://img.shields.io/github/workflow/status/KrishnaswamyLab/MAGIC/Unit%20Tests/master?label=Github%20Actions)](https://github.com/KrishnaswamyLab/MAGIC/actions) 14 | [![Read the Docs](https://img.shields.io/readthedocs/magic.svg)](https://magic.readthedocs.io/) 15 | [![Cell Publication DOI](https://zenodo.org/badge/DOI/10.1016/j.cell.2018.05.061.svg)](https://www.cell.com/cell/abstract/S0092-8674(18)30724-4) 16 | [![Twitter](https://img.shields.io/twitter/follow/KrishnaswamyLab.svg?style=social&label=Follow)](https://twitter.com/KrishnaswamyLab) 17 | [![Github Stars](https://img.shields.io/github/stars/KrishnaswamyLab/MAGIC.svg?style=social&label=Stars)](https://github.com/KrishnaswamyLab/MAGIC/) 18 | 19 | 20 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising and imputation of single cells applied to single-cell RNA sequencing data, as described in Van Dijk D *et al.* (2018), *Recovering Gene Interactions from Single-Cell Data Using Data Diffusion*, Cell . 21 | 22 |

23 | 24 |
25 | MAGIC reveals the interaction between Vimentin (VIM), Cadherin-1 (CDH1), and Zinc finger E-box-binding homeobox 1 (ZEB1, encoded by colors). 26 | 27 |

28 | 29 | * MAGIC imputes missing data values on sparse data sets, restoring the structure of the data 30 | * It also provides dimensionality reduction and gene expression visualizations 31 | * MAGIC can be performed on a variety of datasets 32 | * Here, we show the usage of MAGIC on a toy dataset 33 | * You can view further examples of MAGIC on real data in our notebooks under `inst/examples`: 34 | * http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/EMT_tutorial.html 35 | * http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/bonemarrow_tutorial.html 36 | 37 | ## Table of Contents 38 | 39 | * [Installation](#installation) 40 | * [Installation from CRAN and PyPi](#installation-from-cran-and-pypi) 41 | * [Installation with devtools and reticulate](#installation-with-devtools-and-reticulate) 42 | * [Installation from source](#installation-from-source) 43 | * [Quick Start](#quick-start) 44 | * [Tutorial](#tutorial) 45 | * [Issues](#issues) 46 | * [FAQ](#faq) 47 | * [Help](#help) 48 | 49 | ## Installation 50 | 51 | To use MAGIC, you will need to install both the R and Python packages. 52 | 53 | If `python` or `pip` is not installed, you will need to install it. We recommend [Miniconda3](https://conda.io/miniconda.html) to install Python and `pip` together, or otherwise you can install `pip` from https://pip.pypa.io/en/stable/installing/. 54 | 55 | #### Installation from CRAN 56 | 57 | In R, run this command to install MAGIC and all dependencies: 58 | 59 | 60 | ```r 61 | install.packages("Rmagic") 62 | ``` 63 | 64 | In a terminal, run the following command to install the Python package. 65 | 66 | 67 | ```bash 68 | pip install --user magic-impute 69 | ``` 70 | 71 | #### Installation from source 72 | 73 | To install the very latest version of MAGIC, you can install from GitHub with the following commands run in a terminal. 
74 | 75 | 76 | ```bash 77 | git clone https://github.com/KrishnaswamyLab/MAGIC 78 | cd MAGIC/python 79 | python setup.py install --user 80 | cd ../Rmagic 81 | R CMD INSTALL . 82 | ``` 83 | 84 | ## Quick Start 85 | 86 | If you have loaded a data matrix `data` in R (cells on rows, genes on columns) you can run MAGIC as follows: 87 | 88 | 89 | ```r 90 | library(Rmagic) 91 | data_MAGIC <- magic(data) 92 | ``` 93 | 94 | ## Tutorial 95 | 96 | #### Extra packages for the tutorial 97 | 98 | We'll install a couple more tools for this tutorial. 99 | 100 | 101 | ```r 102 | if (!require(viridis)) install.packages("viridis") 103 | if (!require(ggplot2)) install.packages("ggplot2") 104 | if (!require(phateR)) install.packages("phateR") 105 | ``` 106 | 107 | If you have never used PHATE, you should also install PHATE from the command line as follows: 108 | 109 | 110 | ```bash 111 | pip install --user phate 112 | ``` 113 | 114 | ### Loading packages 115 | 116 | We load the Rmagic package and a few others for convenience functions. 117 | 118 | 119 | ```r 120 | library(Rmagic) 121 | #> Loading required package: Matrix 122 | library(ggplot2) 123 | library(viridis) 124 | #> Loading required package: viridisLite 125 | library(phateR) 126 | #> 127 | #> Attaching package: 'phateR' 128 | #> The following object is masked from 'package:Rmagic': 129 | #> 130 | #> library.size.normalize 131 | ``` 132 | 133 | ### Loading data 134 | 135 | The example data is located in the MAGIC R package. 
136 | 137 | 138 | ```r 139 | # load data 140 | data(magic_testdata) 141 | magic_testdata[1:5, 1:10] 142 | #> A1BG-AS1 AAMDC AAMP AARSD1 ABCA12 ABCG2 ABHD13 AC007773.2 143 | #> 6564 0.0000000 0.0000000 0.0000000 0 0 0 0.0000000 0 144 | #> 3835 0.0000000 0.8714711 0.0000000 0 0 0 0.8714711 0 145 | #> 6318 0.7739207 0.0000000 0.7739207 0 0 0 0.0000000 0 146 | #> 3284 0.0000000 0.0000000 0.0000000 0 0 0 0.0000000 0 147 | #> 1171 0.0000000 0.0000000 0.0000000 0 0 0 0.0000000 0 148 | #> AC011998.4 AC013470.6 149 | #> 6564 0 0 150 | #> 3835 0 0 151 | #> 6318 0 0 152 | #> 3284 0 0 153 | #> 1171 0 0 154 | ``` 155 | 156 | ### Running MAGIC 157 | 158 | Running MAGIC is as simple as running the `magic` function. 159 | 160 | 161 | ```r 162 | # run MAGIC 163 | data_MAGIC <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1")) 164 | ``` 165 | 166 | We can plot the data before and after MAGIC to visualize the results. 167 | 168 | 169 | ```r 170 | ggplot(magic_testdata) + 171 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 172 | scale_colour_viridis(option = "B") 173 | ``` 174 | 175 | plot of chunk plot_raw 176 | 177 | The data suffers from dropout to the point that we cannot infer anything about the gene-gene relationships. 178 | 179 | 180 | ```r 181 | ggplot(data_MAGIC) + 182 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 183 | scale_colour_viridis(option = "B") 184 | ``` 185 | 186 | plot of chunk plot_magic 187 | 188 | As you can see, the gene-gene relationships are much clearer after MAGIC. 189 | 190 | The data is sometimes a little too smooth - we can decrease `t` from the automatic value to reduce the amount of diffusion. We pass the original result to the argument `init` to avoid recomputing intermediate steps. 
191 | 192 | 193 | ```r 194 | data_MAGIC <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1"), t = 6, init = data_MAGIC) 195 | ggplot(data_MAGIC) + 196 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 197 | scale_colour_viridis(option = "B") 198 | ``` 199 | 200 | plot of chunk plot_reduced_t 201 | 202 | 203 | We can look at the entire smoothed matrix with `genes='all_genes'`, passing the original result to the argument `init` to avoid recomputing intermediate steps. Note that this matrix may be large and could take up a lot of memory. 204 | 205 | 206 | ```r 207 | data_MAGIC <- magic(magic_testdata, genes = "all_genes", t = 6, init = data_MAGIC) 208 | as.data.frame(data_MAGIC)[1:5, 1:10] 209 | #> A1BG-AS1 AAMDC AAMP AARSD1 ABCA12 ABCG2 210 | #> 6564 0.03332336 0.06672377 0.1718769 0.01765440 0.03641116 0.01703004 211 | #> 3835 0.03142519 0.06720022 0.1568662 0.01619578 0.03338187 0.01729001 212 | #> 6318 0.03519781 0.06551774 0.1811869 0.01462556 0.03595934 0.02094741 213 | #> 3284 0.03130388 0.06374405 0.1621586 0.01686944 0.03288072 0.01786413 214 | #> 1171 0.03515109 0.06447265 0.1735847 0.01444976 0.03791399 0.01995593 215 | #> ABHD13 AC007773.2 AC011998.4 AC013470.6 216 | #> 6564 0.07692547 0.0007960324 0.001382103 0.002978190 217 | #> 3835 0.07578407 0.0007146892 0.001206586 0.002613474 218 | #> 6318 0.08120989 0.0011273292 0.001594218 0.005743911 219 | #> 3284 0.07568180 0.0007009115 0.001017284 0.002982551 220 | #> 1171 0.07975672 0.0010427596 0.001982926 0.005315534 221 | ``` 222 | 223 | ### Visualizing MAGIC values on PCA 224 | 225 | We can visualize the results of MAGIC on PCA as follows. 
226 | 227 | 228 | ```r 229 | data_MAGIC_PCA <- as.data.frame(prcomp(data_MAGIC)$x) 230 | ggplot(data_MAGIC_PCA) + 231 | geom_point(aes(x = PC1, y = PC2, color = data_MAGIC$result$VIM)) + 232 | scale_color_viridis(option = "B") + 233 | labs(color = "VIM") 234 | ``` 235 | 236 | plot of chunk run_pca 237 | 238 | 239 | ### Visualizing MAGIC values on PHATE 240 | 241 | We can visualize the results of MAGIC on PHATE as follows. We set `t` and `k` manually, because this toy dataset is really too small to make sense with PHATE; however, the default values work well for single-cell genomic data. 242 | 243 | 244 | ```r 245 | data_PHATE <- phate(magic_testdata, k = 3, t = 15) 246 | #> Argument k is deprecated. Using knn instead. 247 | ggplot(data_PHATE) + 248 | geom_point(aes(x = PHATE1, y = PHATE2, color = data_MAGIC$result$VIM)) + 249 | scale_color_viridis(option = "B") + 250 | labs(color = "VIM") 251 | ``` 252 | 253 | plot of chunk run_phate 254 | 255 | ## Issues 256 | 257 | ### FAQ 258 | 259 | - **Should genes (features) be rows or columns?** 260 | 261 | To be consistent with common functions such as PCA 262 | (`stats::prcomp`) and t-SNE (`Rtsne::Rtsne`), we require that cells 263 | (observations) be rows and genes (features) be columns of your input 264 | data. 265 | 266 | - **I have installed MAGIC in Python, but Rmagic says it is not 267 | installed!** 268 | 269 | Check the output of `reticulate::py_discover_config("magic")` and compare it to 270 | the version of Python in which you installed MAGIC (run `which python` 271 | and `which pip` in a terminal). Chances are `reticulate` can’t find the 272 | right version of Python; you can fix this by adding the following line 273 | to your `~/.Renviron`: 274 | 275 | `PATH=/path/to/my/python` 276 | 277 | You can read more about `Renviron` at 278 | . 279 | 280 | ### Help 281 | 282 | Please let us know of any issues at the [GitHub repository](https://github.com/KrishnaswamyLab/MAGIC/issues). 
If you 283 | have any questions or require assistance using MAGIC, please read the 284 | documentation by running `help(Rmagic::magic)` or contact us at 285 | . 286 | -------------------------------------------------------------------------------- /Rmagic/data-raw/generate_test_data.R: -------------------------------------------------------------------------------- 1 | library(readr) 2 | magic_testdata <- read_csv("../../data/HMLE_TGFb_day_8_10.csv.gz") 3 | set.seed(42) 4 | keep_cols <- colSums(magic_testdata > 0) > 10 5 | keep_rows <- rowSums(magic_testdata) > 2000 6 | magic_testdata <- magic_testdata[keep_rows, keep_cols] 7 | magic_testdata <- Rmagic::library.size.normalize(magic_testdata) 8 | magic_testdata <- sqrt(magic_testdata) 9 | select_cols <- c( 10 | colnames(magic_testdata)[ceiling(runif(200) * nrow(magic_testdata))], 11 | c("VIM", "CDH1", "ZEB1") 12 | ) 13 | magic_testdata <- magic_testdata[, colnames(magic_testdata) %in% select_cols] 14 | select_rows <- ceiling(runif(500) * nrow(magic_testdata)) 15 | magic_testdata <- magic_testdata[select_rows, ] 16 | write_csv(magic_testdata, "../../data/test_data.csv") 17 | usethis::use_data(magic_testdata) 18 | -------------------------------------------------------------------------------- /Rmagic/data/magic_testdata.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/data/magic_testdata.rda -------------------------------------------------------------------------------- /Rmagic/inst/CITATION: -------------------------------------------------------------------------------- 1 | bibentry( 2 | bibtype="Article", 3 | title="Recovering Gene Interactions from Single-Cell Data Using Data Diffusion", 4 | author = c( 5 | person("David", "van Dijk"), 6 | person("Roshan", "Sharma"), 7 | person("Juozas", "Nainys"), 8 | person("Kristina", "Yim"), 9 | person("Pooja", "Kathail"), 10 | 
person("Ambrose J.", "Carr"), 11 | person("Cassandra", "Burdziak"), 12 | person("Kevin R.", "Moon"), 13 | person("Christine L.", "Chaffer"), 14 | person("Diwakar", "Pattabiraman"), 15 | person("Brian", "Bierie"), 16 | person("Linas", "Mazutis"), 17 | person("Guy", "Wolf"), 18 | person("Smita", "Krishnaswamy"), 19 | person("Dana", "Pe'er")), 20 | year=2018, 21 | url="https://www.cell.com/cell/abstract/S0092-8674(18)30724-4", 22 | doi="10.1016/j.cell.2018.05.061", 23 | journal="Cell", 24 | publisher="Cell Press" 25 | ) 26 | -------------------------------------------------------------------------------- /Rmagic/inst/examples/BMMSC_data_R_after_magic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/BMMSC_data_R_after_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/BMMSC_data_R_before_magic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/BMMSC_data_R_before_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/BMMSC_data_R_pca_colored_by_magic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/BMMSC_data_R_pca_colored_by_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/BMMSC_data_R_phate_colored_by_magic.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/BMMSC_data_R_phate_colored_by_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/EMT_data_R_after_magic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/EMT_data_R_after_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/EMT_data_R_before_magic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/EMT_data_R_before_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/EMT_data_R_pca_colored_by_magic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/EMT_data_R_pca_colored_by_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/EMT_data_R_phate_colored_by_magic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/inst/examples/EMT_data_R_phate_colored_by_magic.png -------------------------------------------------------------------------------- /Rmagic/inst/examples/bonemarrow_tutorial.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Rmagic Bone Marrow Tutorial" 3 | output: 4 | html_document: 5 | df_print: paged 6 | toc: yes 7 | toc_depth: '3' 8 | --- 9 | 10 | 11 | 12 | 
```{r setup, include=FALSE} 13 | knitr::opts_chunk$set(echo = TRUE) 14 | ``` 15 | 16 | ## MAGIC (Markov Affinity-Based Graph Imputation of Cells) 17 | 18 | * MAGIC imputes missing data values on sparse data sets, restoring the structure of the data 19 | * It also provides dimensionality reduction and gene expression visualizations 20 | * MAGIC can be performed on a variety of datasets 21 | * Here, we show the effectiveness of MAGIC on erythroid and myeloid cells developing in mouse bone marrow. 22 | 23 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising and transcript recovery of single cells, applied to single-cell RNA sequencing data, as described in Van Dijk D *et al.* (2018), *Recovering Gene Interactions from Single-Cell Data Using Data Diffusion*, Cell . 24 | 25 | ### Installation 26 | 27 | If you haven't yet installed MAGIC, you can find installation instructions in our [GitHub README](https://github.com/KrishnaswamyLab/MAGIC/tree/master/Rmagic). 28 | 29 | We'll install a couple more tools for this tutorial. 30 | 31 | ```{r install_extras, eval=FALSE} 32 | if (!require(viridis)) install.packages("viridis") 33 | if (!require(ggplot2)) install.packages("ggplot2") 34 | if (!require(readr)) install.packages("readr") 35 | if (!require(phateR)) install.packages("phateR") 36 | ``` 37 | 38 | If you have never used PHATE, you should also install PHATE from the command line as follows: 39 | 40 | ```{bash install_python_phate, eval=FALSE} 41 | pip install --user phate 42 | ``` 43 | 44 | ### Loading packages 45 | 46 | We load the Rmagic package and a few others for convenience functions. 47 | 48 | ```{r load_packages} 49 | library(Rmagic) 50 | library(ggplot2) 51 | library(readr) 52 | library(viridis) 53 | library(phateR) 54 | ``` 55 | 56 | ### Loading data 57 | 58 | In this tutorial, we will analyse myeloid and erythroid cells in mouse bone marrow, as described in Paul et al., 2015. 
The example data is located in the PHATE Github repository and we can load it directly from the web. You can run this tutorial with your own data by downloading and opening it in RStudio. 59 | 60 | ```{r load_data} 61 | # load data 62 | bmmsc <- read_csv("https://github.com/KrishnaswamyLab/PHATE/raw/master/data/BMMC_myeloid.csv.gz") 63 | bmmsc <- bmmsc[, 2:ncol(bmmsc)] 64 | bmmsc[1:5, 1:10] 65 | ``` 66 | 67 | ### Filtering data 68 | 69 | First, we need to remove lowly expressed genes and cells with small library size. 70 | 71 | ```{r} 72 | # keep genes expressed in at least 10 cells 73 | keep_cols <- colSums(bmmsc > 0) > 10 74 | bmmsc <- bmmsc[, keep_cols] 75 | # look at the distribution of library sizes 76 | ggplot() + 77 | geom_histogram(aes(x = rowSums(bmmsc)), bins = 50) + 78 | geom_vline(xintercept = 1000, color = "red") 79 | ``` 80 | 81 | ```{r} 82 | # keep cells with at least 1000 UMIs 83 | keep_rows <- rowSums(bmmsc) > 1000 84 | bmmsc <- bmmsc[keep_rows, ] 85 | ``` 86 | 87 | ### Normalizing data 88 | 89 | We should library size normalize and transform the data prior to MAGIC. Many people use a log transform, which requires adding a "pseudocount" to avoid log(0). We square root instead, which has a similar form but doesn't suffer from instabilities at zero. 90 | 91 | ```{r normalize} 92 | bmmsc <- library.size.normalize(bmmsc) 93 | bmmsc <- sqrt(bmmsc) 94 | ``` 95 | 96 | ### Running MAGIC 97 | 98 | Running MAGIC is as simple as running the `magic` function. 99 | 100 | ```{r run_magic} 101 | # run MAGIC 102 | bmmsc_MAGIC <- magic(bmmsc, genes = c("Mpo", "Klf1", "Ifitm1")) 103 | ``` 104 | 105 | We can plot the data before and after MAGIC to visualize the results. 
106 | 107 | ```{r plot_raw} 108 | ggplot(bmmsc) + 109 | geom_point(aes(Mpo, Klf1, color = Ifitm1)) + 110 | scale_color_viridis(option = "B") 111 | ggsave("BMMSC_data_R_before_magic.png", width = 5, height = 5) 112 | ``` 113 | 114 | The data suffers from dropout to the point that we cannot infer anything about the gene-gene relationships. 115 | 116 | ```{r plot_magic} 117 | ggplot(bmmsc_MAGIC) + 118 | geom_point(aes(Mpo, Klf1, color = Ifitm1)) + 119 | scale_color_viridis(option = "B") 120 | ``` 121 | 122 | As you can see, the gene-gene relationships are much clearer after MAGIC. These relationships also match the biological progression we expect to see - Ifitm1 is a stem cell marker, Klf1 is an erythroid marker, and Mpo is a myeloid marker. 123 | 124 | ### Rerunning MAGIC with new parameters 125 | 126 | If the result is still noisier than we would like, we can increase `t` from the default value of 3 to increase the amount of diffusion; here we use `t = 4`. We pass the original result to the argument `init` to avoid recomputing intermediate steps. 127 | 128 | ```{r decrease_t} 129 | bmmsc_MAGIC <- magic(bmmsc, 130 | genes = c("Mpo", "Klf1", "Ifitm1"), 131 | t = 4, init = bmmsc_MAGIC 132 | ) 133 | ggplot(bmmsc_MAGIC) + 134 | geom_point(aes(Mpo, Klf1, color = Ifitm1)) + 135 | scale_color_viridis(option = "B") 136 | ggsave("BMMSC_data_R_after_magic.png", width = 5, height = 5) 137 | ``` 138 | 139 | ### Visualizing MAGIC values on PCA 140 | 141 | We can visualize the results of MAGIC on PCA with `genes="pca_only"`. 
142 | 143 | ```{r run_pca} 144 | bmmsc_MAGIC_PCA <- magic(bmmsc, 145 | genes = "pca_only", 146 | t = 4, init = bmmsc_MAGIC 147 | ) 148 | ggplot(bmmsc_MAGIC_PCA) + 149 | geom_point(aes(x = PC1, y = PC2, color = bmmsc_MAGIC$result$Klf1)) + 150 | scale_color_viridis(option = "B") + 151 | labs(color = "Klf1") 152 | ggsave("BMMSC_data_R_pca_colored_by_magic.png", width = 5, height = 5) 153 | ``` 154 | 155 | 156 | ### Visualizing MAGIC values on PHATE 157 | 158 | We can visualize the results of MAGIC on PHATE as follows. 159 | 160 | ```{r run_phate} 161 | bmmsc_PHATE <- phate(bmmsc) 162 | ggplot(bmmsc_PHATE) + 163 | geom_point(aes(x = PHATE1, y = PHATE2, color = bmmsc_MAGIC$result$Klf1)) + 164 | scale_color_viridis(option = "B") + 165 | labs(color = "Klf1") 166 | ggsave("BMMSC_data_R_phate_colored_by_magic.png", width = 5, height = 5) 167 | ``` 168 | 169 | ### Using MAGIC for downstream analysis 170 | 171 | We can look at the entire smoothed matrix with `genes='all_genes'`, passing the original result to the argument `init` to avoid recomputing intermediate steps. Note that this matrix may be large and could take up a lot of memory. 172 | 173 | ```{r run_magic_full_matrix} 174 | bmmsc_MAGIC <- magic(bmmsc, 175 | genes = "all_genes", 176 | t = 4, init = bmmsc_MAGIC 177 | ) 178 | as.data.frame(bmmsc_MAGIC)[1:5, 1:10] 179 | ``` 180 | 181 | ## Help 182 | 183 | If you have any questions or require assistance using MAGIC, please contact us at . 
184 | -------------------------------------------------------------------------------- /Rmagic/inst/examples/emt_tutorial.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Rmagic EMT Tutorial" 3 | output: 4 | html_document: 5 | df_print: paged 6 | toc: yes 7 | toc_depth: '3' 8 | --- 9 | 10 | 11 | 12 | ```{r setup, include=FALSE} 13 | knitr::opts_chunk$set(echo = TRUE) 14 | ``` 15 | 16 | ## MAGIC (Markov Affinity-Based Graph Imputation of Cells) 17 | 18 | * MAGIC imputes missing data values on sparse data sets, restoring the structure of the data 19 | * It also provides dimensionality reduction and gene expression visualizations 20 | * MAGIC can be performed on a variety of datasets 21 | * Here, we show the effectiveness of MAGIC on epithelial-to-mesenchymal transition (EMT) data 22 | 23 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising and transcript recovery of single cells, applied to single-cell RNA sequencing data, as described in Van Dijk D *et al.* (2018), *Recovering Gene Interactions from Single-Cell Data Using Data Diffusion*, Cell . 24 | 25 | ### Installation 26 | 27 | If you haven't yet installed MAGIC, you can find installation instructions in our [GitHub README](https://github.com/KrishnaswamyLab/MAGIC/tree/master/Rmagic). 28 | 29 | We'll install a couple more tools for this tutorial. 30 | 31 | ```{r install_extras, eval=FALSE} 32 | if (!require(viridis)) install.packages("viridis") 33 | if (!require(ggplot2)) install.packages("ggplot2") 34 | if (!require(readr)) install.packages("readr") 35 | if (!require(phateR)) install.packages("phateR") 36 | ``` 37 | 38 | If you have never used PHATE, you should also install PHATE from the command line as follows: 39 | 40 | ```{bash install_python_phate, eval=FALSE} 41 | pip install --user phate 42 | ``` 43 | 44 | ### Loading packages 45 | 46 | We load the Rmagic package and a few others for convenience functions. 
47 | 48 | ```{r load_packages} 49 | library(Rmagic) 50 | library(readr) 51 | library(ggplot2) 52 | library(viridis) 53 | library(phateR) 54 | ``` 55 | 56 | ### Loading data 57 | 58 | In this tutorial, we will analyze single-cell RNA sequencing of the epithelial to mesenchymal transition. The example data is located in the MAGIC Github repository. You can run this tutorial with your own data by downloading and opening it in RStudio. 59 | 60 | ```{r load_data} 61 | # load data 62 | data <- read_csv("../../../data/HMLE_TGFb_day_8_10.csv.gz") 63 | data[1:5, 1:10] 64 | ``` 65 | 66 | ### Filtering data 67 | 68 | First, we need to remove lowly expressed genes. 69 | 70 | ```{r remove_rare_genes} 71 | # keep genes expressed in at least 10 cells 72 | keep_cols <- colSums(data > 0) > 10 73 | data <- data[, keep_cols] 74 | ``` 75 | 76 | Ordinarily, we would remove cells with small library sizes. In this dataset, it has already been done; however, if you wanted to do that, you could do it with the code below. 77 | 78 | ```{r libsize_histogram} 79 | # look at the distribution of library sizes 80 | ggplot() + 81 | geom_histogram(aes(x = rowSums(data)), bins = 50) + 82 | geom_vline(xintercept = 1000, color = "red") 83 | ``` 84 | 85 | ```{r filter_libsize} 86 | if (FALSE) { 87 | # keep cells with at least 1000 UMIs and at most 15000 88 | keep_rows <- rowSums(data) > 1000 & rowSums(data) < 15000 89 | data <- data[keep_rows, ] 90 | } 91 | ``` 92 | 93 | ### Normalizing data 94 | 95 | We should library size normalize the data prior to MAGIC. Often we also transform the data with either log or square root. The log transform is commonly used, which requires adding a "pseudocount" to avoid log(0). We normally square root instead, which has a similar form but doesn't suffer from instabilities at zero. For this dataset, though, it is not necessary as the distribution of gene expression is not too extreme. 
96 | 97 | ```{r normalize} 98 | data <- library.size.normalize(data) 99 | if (FALSE) { 100 | data <- sqrt(data) 101 | } 102 | ``` 103 | 104 | ### Running MAGIC 105 | 106 | Running MAGIC is as simple as running the `magic` function. Because this dataset is rather small, we can decrease `knn` from the default of 5 down to 3. 107 | 108 | ```{r run_magic} 109 | # run MAGIC 110 | data_MAGIC <- magic(data, knn = 3, genes = c("VIM", "CDH1", "ZEB1")) 111 | ``` 112 | 113 | We can plot the data before and after MAGIC to visualize the results. 114 | 115 | ```{r plot_raw} 116 | ggplot(data) + 117 | geom_point(aes(VIM, CDH1, color = ZEB1)) + 118 | scale_color_viridis(option = "B") 119 | ggsave("EMT_data_R_before_magic.png", width = 5, height = 5) 120 | ``` 121 | 122 | ```{r plot_magic} 123 | ggplot(data_MAGIC) + 124 | geom_point(aes(VIM, CDH1, color = ZEB1)) + 125 | scale_color_viridis(option = "B") 126 | ggsave("EMT_data_R_after_magic.png", width = 5, height = 5) 127 | ``` 128 | 129 | As you can see, the gene-gene relationships are much clearer after MAGIC. 130 | 131 | ### Visualizing MAGIC values on PCA 132 | 133 | We can visualize the results of MAGIC on PCA with `genes="pca_only"`. 134 | 135 | ```{r run_pca} 136 | data_MAGIC_PCA <- magic(data, 137 | genes = "pca_only", 138 | knn = 15, init = data_MAGIC 139 | ) 140 | ggplot(data_MAGIC_PCA) + 141 | geom_point(aes(x = PC1, y = PC2, color = data_MAGIC$result$VIM)) + 142 | scale_color_viridis(option = "B") + 143 | labs(color = "VIM") 144 | ggsave("EMT_data_R_pca_colored_by_magic.png", width = 5, height = 5) 145 | ``` 146 | 147 | ### Using MAGIC for downstream analysis 148 | 149 | We can look at the entire smoothed matrix with `genes='all_genes'`, passing the original result to the argument `init` to avoid recomputing intermediate steps. Note that this matrix may be large and could take up a lot of memory. 
150 | 151 | ```{r run_magic_full_matrix} 152 | data_MAGIC <- magic(data, 153 | genes = "all_genes", 154 | knn = 15, init = data_MAGIC 155 | ) 156 | as.data.frame(data_MAGIC)[1:5, 1:10] 157 | ``` 158 | 159 | ### Help 160 | 161 | If you have any questions or require assistance using MAGIC, please contact us at . 162 | -------------------------------------------------------------------------------- /Rmagic/man/as.data.frame.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/magic.R 3 | \name{as.data.frame.magic} 4 | \alias{as.data.frame.magic} 5 | \title{Convert a MAGIC object to a data.frame} 6 | \usage{ 7 | \method{as.data.frame}{magic}(x, ...) 8 | } 9 | \arguments{ 10 | \item{x}{A fitted MAGIC object} 11 | 12 | \item{...}{Arguments for as.data.frame()} 13 | } 14 | \description{ 15 | Returns the smoothed data matrix 16 | } 17 | -------------------------------------------------------------------------------- /Rmagic/man/as.matrix.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/magic.R 3 | \name{as.matrix.magic} 4 | \alias{as.matrix.magic} 5 | \title{Convert a MAGIC object to a matrix} 6 | \usage{ 7 | \method{as.matrix}{magic}(x, ...) 
8 | } 9 | \arguments{ 10 | \item{x}{A fitted MAGIC object} 11 | 12 | \item{...}{Arguments for as.matrix()} 13 | } 14 | \description{ 15 | Returns the smoothed data matrix 16 | } 17 | -------------------------------------------------------------------------------- /Rmagic/man/check_pymagic_version.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{check_pymagic_version} 4 | \alias{check_pymagic_version} 5 | \title{Check that the current MAGIC version in Python is up to date.} 6 | \usage{ 7 | check_pymagic_version() 8 | } 9 | \description{ 10 | Check that the current MAGIC version in Python is up to date. 11 | } 12 | -------------------------------------------------------------------------------- /Rmagic/man/figures/README-plot_magic-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/man/figures/README-plot_magic-1.png -------------------------------------------------------------------------------- /Rmagic/man/figures/README-plot_raw-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/man/figures/README-plot_raw-1.png -------------------------------------------------------------------------------- /Rmagic/man/figures/README-plot_reduced_t-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/man/figures/README-plot_reduced_t-1.png -------------------------------------------------------------------------------- /Rmagic/man/figures/README-run_pca-1.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/man/figures/README-run_pca-1.png -------------------------------------------------------------------------------- /Rmagic/man/figures/README-run_phate-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/man/figures/README-run_phate-1.png -------------------------------------------------------------------------------- /Rmagic/man/figures/README-unnamed-chunk-1-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/man/figures/README-unnamed-chunk-1-1.png -------------------------------------------------------------------------------- /Rmagic/man/figures/README-unnamed-chunk-3-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/Rmagic/man/figures/README-unnamed-chunk-3-1.png -------------------------------------------------------------------------------- /Rmagic/man/ggplot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/magic.R 3 | \name{ggplot.magic} 4 | \alias{ggplot.magic} 5 | \title{Convert a MAGIC object to a data.frame for ggplot} 6 | \usage{ 7 | \method{ggplot}{magic}(data, ...) 
8 | } 9 | \arguments{ 10 | \item{data}{A fitted MAGIC object} 11 | 12 | \item{...}{Arguments for ggplot()} 13 | } 14 | \description{ 15 | Passes the smoothed data matrix to ggplot 16 | } 17 | \examples{ 18 | if (pymagic_is_available() && require(ggplot2)) { 19 | data(magic_testdata) 20 | data_magic <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1")) 21 | ggplot(data_magic, aes(VIM, CDH1, colour = ZEB1)) + 22 | geom_point() 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /Rmagic/man/install.magic.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{install.magic} 4 | \alias{install.magic} 5 | \title{Install MAGIC Python Package} 6 | \usage{ 7 | install.magic( 8 | envname = "r-reticulate", 9 | method = "auto", 10 | conda = "auto", 11 | pip = TRUE, 12 | ... 13 | ) 14 | } 15 | \arguments{ 16 | \item{envname}{Name of environment to install packages into} 17 | 18 | \item{method}{Installation method. By default, "auto" automatically finds 19 | a method that will work in the local environment. Change the default to 20 | force a specific installation method. Note that the "virtualenv" method 21 | is not available on Windows.} 22 | 23 | \item{conda}{Path to conda executable (or "auto" to find conda using the PATH 24 | and other conventional install locations).} 25 | 26 | \item{pip}{Install from pip, if possible.} 27 | 28 | \item{...}{Additional arguments passed to conda_install() or 29 | virtualenv_install().} 30 | } 31 | \description{ 32 | Install MAGIC Python package into a virtualenv or conda env. 33 | } 34 | \details{ 35 | On Linux and OS X the "virtualenv" method will be used by default 36 | ("conda" will be used if virtualenv isn't available). On Windows, 37 | the "conda" method is always used. 
38 | } 39 | -------------------------------------------------------------------------------- /Rmagic/man/library.size.normalize.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/preprocessing.R 3 | \name{library.size.normalize} 4 | \alias{library.size.normalize} 5 | \title{Performs L1 normalization on input data such that the sum of expression 6 | values for each cell sums to 1, then returns normalized matrix to the metric 7 | space using median UMI count per cell effectively scaling all cells as if 8 | they were sampled evenly.} 9 | \usage{ 10 | library.size.normalize(data, verbose = FALSE) 11 | } 12 | \arguments{ 13 | \item{data}{matrix (n_samples, n_dimensions) 14 | 2 dimensional input data array with n cells and p dimensions} 15 | 16 | \item{verbose}{boolean, default=FALSE. If true, print verbose output} 17 | } 18 | \value{ 19 | data_norm matrix (n_samples, n_dimensions) 20 | 2 dimensional array with normalized gene expression values 21 | } 22 | \description{ 23 | Performs L1 normalization on input data such that the sum of expression 24 | values for each cell sums to 1, then returns normalized matrix to the metric 25 | space using median UMI count per cell effectively scaling all cells as if 26 | they were sampled evenly. 27 | } 28 | -------------------------------------------------------------------------------- /Rmagic/man/magic.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/magic.R 3 | \name{magic} 4 | \alias{magic} 5 | \alias{magic.default} 6 | \alias{magic.seurat} 7 | \alias{magic.Seurat} 8 | \title{Perform MAGIC on a data matrix} 9 | \usage{ 10 | magic(data, ...) 
11 | 12 | \method{magic}{default}( 13 | data, 14 | genes = NULL, 15 | knn = 5, 16 | knn.max = NULL, 17 | decay = 1, 18 | t = 3, 19 | npca = 100, 20 | solver = "exact", 21 | init = NULL, 22 | t.max = 20, 23 | knn.dist.method = "euclidean", 24 | verbose = 1, 25 | n.jobs = 1, 26 | seed = NULL, 27 | k = NULL, 28 | alpha = NULL, 29 | ... 30 | ) 31 | 32 | \method{magic}{seurat}( 33 | data, 34 | genes = NULL, 35 | knn = 5, 36 | knn.max = NULL, 37 | decay = 1, 38 | t = 3, 39 | npca = 100, 40 | solver = "exact", 41 | init = NULL, 42 | t.max = 20, 43 | knn.dist.method = "euclidean", 44 | verbose = 1, 45 | n.jobs = 1, 46 | seed = NULL, 47 | ... 48 | ) 49 | 50 | \method{magic}{Seurat}( 51 | data, 52 | assay = NULL, 53 | genes = NULL, 54 | knn = 5, 55 | knn.max = NULL, 56 | decay = 1, 57 | t = 3, 58 | npca = 100, 59 | solver = "exact", 60 | init = NULL, 61 | t.max = 20, 62 | knn.dist.method = "euclidean", 63 | verbose = 1, 64 | n.jobs = 1, 65 | seed = NULL, 66 | ... 67 | ) 68 | } 69 | \arguments{ 70 | \item{data}{input data matrix or Seurat object} 71 | 72 | \item{...}{Arguments passed to and from other methods} 73 | 74 | \item{genes}{character or integer vector, default: NULL 75 | vector of column names or column indices for which to return smoothed data 76 | If 'all_genes' or NULL, the entire smoothed matrix is returned} 77 | 78 | \item{knn}{int, optional, default: 5 79 | number of nearest neighbors on which to compute bandwidth} 80 | 81 | \item{knn.max}{int, optional, default: NULL 82 | maximum number of neighbors for each point. If NULL, defaults to 3*knn} 83 | 84 | \item{decay}{int, optional, default: 1 85 | sets decay rate of kernel tails. 86 | If NULL, alpha decaying kernel is not used} 87 | 88 | \item{t}{int, optional, default: 3 89 | power to which the diffusion operator is powered 90 | sets the level of diffusion. 
If 'auto', t is selected according to the 91 | Procrustes disparity of the diffused data.} 92 | 93 | \item{npca}{number of PCA components that should be used; default: 100.} 94 | 95 | \item{solver}{str, optional, default: 'exact' 96 | Which solver to use. "exact" uses the implementation described 97 | in van Dijk et al. (2018). "approximate" uses a faster implementation 98 | that performs imputation in the PCA space and then projects back to the 99 | gene space. Note, the "approximate" solver may return negative values.} 100 | 101 | \item{init}{magic object, optional 102 | object to use for initialization. Avoids recomputing 103 | intermediate steps if parameters are the same.} 104 | 105 | \item{t.max}{int, optional, default: 20 106 | Maximum value of t to test for automatic t selection.} 107 | 108 | \item{knn.dist.method}{string, optional, default: 'euclidean'. 109 | recommended values: 'euclidean', 'cosine' 110 | Any metric from `scipy.spatial.distance` can be used 111 | as the distance metric for building the kNN graph.} 112 | 113 | \item{verbose}{`int` or `boolean`, optional (default : 1) 114 | If `TRUE` or `> 0`, print verbose updates.} 115 | 116 | \item{n.jobs}{`int`, optional (default: 1) 117 | The number of jobs to use for the computation. 118 | If -1 all CPUs are used. If 1 is given, no parallel computing code is 119 | used at all, which is useful for debugging. 120 | For n.jobs below -1, (n.cpus + 1 + n.jobs) are used. Thus for 121 | n.jobs = -2, all CPUs but one are used} 122 | 123 | \item{seed}{int or `NULL`, random state (default: `NULL`)} 124 | 125 | \item{k}{Deprecated. Use `knn`.} 126 | 127 | \item{alpha}{Deprecated. Use `decay`.} 128 | 129 | \item{assay}{Assay to use for imputation, defaults to the default assay} 130 | } 131 | \value{ 132 | If a Seurat object is passed, a Seurat object is returned. 
Otherwise, a "magic" object containing: 133 | * **result**: matrix containing smoothed expression values 134 | * **operator**: The MAGIC operator (python magic.MAGIC object) 135 | * **params**: Parameters passed to magic 136 | } 137 | \description{ 138 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an 139 | algorithm for denoising and transcript recovery of single cells 140 | applied to single-cell RNA sequencing data, as described in 141 | van Dijk et al., 2018. 142 | } 143 | \examples{ 144 | if (pymagic_is_available()) { 145 | data(magic_testdata) 146 | 147 | # Run MAGIC 148 | data_magic <- magic(magic_testdata, genes = c("VIM", "CDH1", "ZEB1")) 149 | summary(data_magic) 150 | ## CDH1 VIM ZEB1 151 | ## Min. :0.4303 Min. :3.854 Min. :0.01111 152 | ## 1st Qu.:0.4444 1st Qu.:3.947 1st Qu.:0.01145 153 | ## Median :0.4462 Median :3.964 Median :0.01153 154 | ## Mean :0.4461 Mean :3.965 Mean :0.01152 155 | ## 3rd Qu.:0.4478 3rd Qu.:3.982 3rd Qu.:0.01160 156 | ## Max. :0.4585 Max. :4.127 Max. 
:0.01201 157 | 158 | # Plot the result with ggplot2 159 | if (require(ggplot2)) { 160 | ggplot(data_magic) + 161 | geom_point(aes(x = VIM, y = CDH1, color = ZEB1)) 162 | } 163 | 164 | # Run MAGIC again returning all genes 165 | # We use the last run as initialization 166 | data_magic <- magic(magic_testdata, genes = "all_genes", init = data_magic) 167 | # Extract the smoothed data matrix to use in downstream analysis 168 | data_smooth <- as.matrix(data_magic) 169 | } 170 | 171 | if (pymagic_is_available() && require(Seurat)) { 172 | data(magic_testdata) 173 | 174 | # Create a Seurat object 175 | seurat_object <- CreateSeuratObject(counts = t(magic_testdata), assay = "RNA") 176 | seurat_object <- NormalizeData(object = seurat_object) 177 | seurat_object <- ScaleData(object = seurat_object) 178 | 179 | # Run MAGIC and reset the active assay 180 | seurat_object <- magic(seurat_object) 181 | seurat_object@active.assay <- "MAGIC_RNA" 182 | 183 | # Analyze with Seurat 184 | VlnPlot(seurat_object, features = c("VIM", "ZEB1", "CDH1")) 185 | } 186 | } 187 | -------------------------------------------------------------------------------- /Rmagic/man/magic_testdata.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/magic_testdata.R 3 | \docType{data} 4 | \name{magic_testdata} 5 | \alias{magic_testdata} 6 | \title{Fake scRNAseq data for examples} 7 | \format{ 8 | A matrix with 500 rows and 197 variables 9 | } 10 | \source{ 11 | The authors 12 | } 13 | \usage{ 14 | magic_testdata 15 | } 16 | \description{ 17 | A subsampled dataset of epithelial to mesenchymal transition 18 | } 19 | \keyword{datasets} 20 | -------------------------------------------------------------------------------- /Rmagic/man/print.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit 
documentation in R/magic.R 3 | \name{print.magic} 4 | \alias{print.magic} 5 | \title{Print a MAGIC object} 6 | \usage{ 7 | \method{print}{magic}(x, ...) 8 | } 9 | \arguments{ 10 | \item{x}{A fitted MAGIC object} 11 | 12 | \item{...}{Arguments for print()} 13 | } 14 | \description{ 15 | This avoids spamming the user's console with a list of many large matrices 16 | } 17 | \examples{ 18 | if (pymagic_is_available()) { 19 | data(magic_testdata) 20 | data_magic <- magic(magic_testdata) 21 | print(data_magic) 22 | ## MAGIC with elements 23 | ## $result : (500, 197) 24 | ## $operator : Python MAGIC operator 25 | ## $params : list with elements (data, knn, decay, t, npca, knn.dist.method) 26 | } 27 | } 28 | -------------------------------------------------------------------------------- /Rmagic/man/pymagic_is_available.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{pymagic_is_available} 4 | \alias{pymagic_is_available} 5 | \title{Check whether MAGIC Python package is available and can be loaded} 6 | \usage{ 7 | pymagic_is_available() 8 | } 9 | \description{ 10 | This is used primarily to avoid running tests on CRAN 11 | and elsewhere where the Python package should not be 12 | installed. 13 | } 14 | -------------------------------------------------------------------------------- /Rmagic/man/summary.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/magic.R 3 | \name{summary.magic} 4 | \alias{summary.magic} 5 | \title{Summarize a MAGIC object} 6 | \usage{ 7 | \method{summary}{magic}(object, ...) 
8 | } 9 | \arguments{ 10 | \item{object}{A fitted MAGIC object} 11 | 12 | \item{...}{Arguments for summary()} 13 | } 14 | \description{ 15 | Summarize a MAGIC object 16 | } 17 | \examples{ 18 | if (pymagic_is_available()) { 19 | data(magic_testdata) 20 | data_magic <- magic(magic_testdata) 21 | summary(data_magic) 22 | ## ZEB1 23 | ## Min. :0.01071 24 | ## 1st Qu.:0.01119 25 | ## Median :0.01130 26 | ## Mean :0.01129 27 | ## 3rd Qu.:0.01140 28 | ## Max. :0.01201 29 | } 30 | } 31 | -------------------------------------------------------------------------------- /Rmagic/tests/test_magic.R: -------------------------------------------------------------------------------- 1 | # To run this file: 2 | # - Set the working directory to 'Rmagic/tests'. 3 | 4 | library(Rmagic) 5 | library(ggplot2) 6 | library(readr) 7 | library(viridis) 8 | 9 | seurat_obj <- function() { 10 | # load data 11 | data <- read.csv("../../data/HMLE_TGFb_day_8_10.csv.gz") 12 | 13 | seurat_raw_data <- t(data) 14 | rownames(seurat_raw_data) <- colnames(data) 15 | colnames(seurat_raw_data) <- rownames(data) 16 | seurat_obj <- Seurat::CreateSeuratObject(raw.data = seurat_raw_data) 17 | 18 | # run MAGIC 19 | data_MAGIC <- magic(data) 20 | seurat_MAGIC <- magic(seurat_obj) 21 | stopifnot(all(data_MAGIC$result == t(seurat_MAGIC@data))) 22 | 23 | # plot 24 | p <- ggplot(data) + 25 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 26 | scale_colour_viridis(option = "B") 27 | ggsave("EMT_data_R_before_magic.png", plot = p, width = 5, height = 5) 28 | 29 | p_m <- ggplot(data_MAGIC) + 30 | geom_point(aes(VIM, CDH1, colour = ZEB1)) + 31 | scale_colour_viridis(option = "B") 32 | ggsave("EMT_data_R_after_magic.png", plot = p_m, width = 5, height = 5) 33 | } 34 | -------------------------------------------------------------------------------- /data/HMLE_TGFb_day_8_10.csv.gz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/data/HMLE_TGFb_day_8_10.csv.gz -------------------------------------------------------------------------------- /magic.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/magic.gif -------------------------------------------------------------------------------- /matlab/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/matlab/.DS_Store -------------------------------------------------------------------------------- /matlab/MAGIC Tutorial MATLAB-EMT.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/matlab/MAGIC Tutorial MATLAB-EMT.pptx -------------------------------------------------------------------------------- /matlab/compute_kernel.m: -------------------------------------------------------------------------------- 1 | function K = compute_alpha_kernel_sparse(data, varargin) 2 | % K = compute_alpha_kernel_sparse(data, varargin) 3 | % Computes sparse alpha-decay kernel 4 | % varargin: 5 | % 'npca' (default = [], no PCA) 6 | % Perform fast random PCA before computing distances 7 | % 'k' (default = 5) 8 | % k for the knn distances for the locally adaptive bandwidth 9 | % 'a' (default = 10) 10 | % The alpha exponent in the alpha-decaying kernel 11 | % 'distfun' (default = 'euclidean') 12 | % Input distance function 13 | k = 5; 14 | a = 10; 15 | npca = []; 16 | distfun = 'euclidean'; 17 | % get the input parameters 18 | if ~isempty(varargin) 19 | for j = 1:length(varargin) 20 | % k nearest neighbors 21 | if strcmp(varargin{j}, 'k') 22 | k = varargin{j+1}; 23 | end 24 | % 
alpha 25 | if strcmp(varargin{j}, 'a') 26 | a = varargin{j+1}; 27 | end 28 | % npca to project data 29 | if strcmp(varargin{j}, 'npca') 30 | npca = varargin{j+1}; 31 | end 32 | % distfun 33 | if strcmp(varargin{j}, 'distfun') 34 | distfun = varargin{j+1}; 35 | end 36 | end 37 | end 38 | 39 | th = 1e-4; 40 | 41 | k_knn = k * 20; 42 | 43 | bth=(-log(th))^(1/a); 44 | 45 | disp 'Computing alpha decay kernel:' 46 | 47 | N = size(data, 1); % number of cells 48 | 49 | if ~isempty(npca) 50 | disp ' PCA' 51 | data_centr = bsxfun(@minus, data, mean(data,1)); 52 | [U,~,~] = randPCA(data_centr', npca); % fast random svd 53 | %[U,~,~] = svds(data', npca); 54 | data_pc = data_centr * U; % PCA project 55 | else 56 | data_pc = data; 57 | end 58 | 59 | disp(['Number of samples = ' num2str(N)]) 60 | 61 | % Initial knn search and set the radius 62 | disp(['First iteration: k = ' num2str(k_knn)]) 63 | [idx, kdist]=knnsearch(data_pc,data_pc,'k',k_knn,'Distance',distfun); 64 | epsilon=kdist(:,k+1); 65 | 66 | % Find the points that have large enough distance to be below the kernel 67 | % threshold 68 | below_thresh=kdist(:,end)>=bth*epsilon; 69 | 70 | idx_thresh=find(below_thresh); 71 | 72 | if ~isempty(idx_thresh) 73 | K=exp(-(kdist(idx_thresh,:)./epsilon(idx_thresh)).^a); 74 | K(K<=th)=0; 75 | K=K(:); 76 | i = repmat(idx_thresh',1,size(idx,2)); 77 | i = i(:); 78 | idx_temp=idx(idx_thresh,:); 79 | j = idx_temp(:); 80 | end 81 | 82 | disp(['Number of samples below the threshold from 1st iter: ' num2str(length(idx_thresh))]) 83 | 84 | % Loop increasing k by factor of 20 until we cover 90% of the data 85 | while length(idx_thresh)<.9*N 86 | k_knn=min(20*k_knn,N); 87 | data_pc2=data_pc(~below_thresh,:); 88 | epsilon2=epsilon(~below_thresh); 89 | disp(['Next iteration: k= ' num2str(k_knn)]) 90 | [idx2, kdist2]=knnsearch(data_pc,data_pc2,'k',k_knn,'Distance',distfun); 91 | 92 | % Find the points that have large enough distance 93 | below_thresh2=kdist2(:,end)>=bth*epsilon2; 94 | 
idx_thresh2=find(below_thresh2); 95 | 96 | if ~isempty(idx_thresh2) 97 | K2=exp(-(kdist2(idx_thresh2,:)./epsilon2(idx_thresh2)).^a); 98 | K2(K2<=th)=0; 99 | idx_notthresh=find(~below_thresh); 100 | i2=repmat(idx_notthresh(idx_thresh2)',1,size(idx2,2)); 101 | i2=i2(:); 102 | idx_temp=idx2(idx_thresh2,:); 103 | j2=idx_temp(:); 104 | 105 | i=[i; i2]; 106 | j=[j; j2]; 107 | K=[K; K2(:)]; 108 | % Add the newly thresholded points to the old ones 109 | below_thresh(idx_notthresh(idx_thresh2))=1; 110 | idx_thresh=find(below_thresh); 111 | end 112 | disp(['Number of samples below the threshold from the next iter: ' num2str(length(idx_thresh))]) 113 | end 114 | 115 | % Radius search for the rest 116 | if length(idx_thresh) 0 91 | W = sparse(i, j, s); 92 | else 93 | W = sparse(i, j, ones(size(s))); % unweighted kNN graph 94 | end 95 | 96 | disp 'Symmetrize distances' 97 | W = W + W'; 98 | 99 | if epsilon > 0 100 | disp 'Computing kernel' 101 | [i,j,s] = find(W); 102 | i = [i; (1:N)']; 103 | j = [j; (1:N)']; 104 | s = [s./(epsilon^2); zeros(N,1)]; 105 | s = exp(-s); 106 | W = sparse(i,j,s); 107 | end 108 | 109 | disp 'Markov normalization' 110 | W = bsxfun(@rdivide, W, sum(W,2)); % Markov normalization 111 | 112 | disp 'done' 113 | -------------------------------------------------------------------------------- /matlab/compute_optimal_t.m: -------------------------------------------------------------------------------- 1 | function [data_opt_t, t_opt] = compute_optimal_t(data, DiffOp, varargin) 2 | 3 | t_max = 32; 4 | make_plot = true; 5 | th = 1e-3; 6 | data_opt_t = []; 7 | 8 | if ~isempty(varargin) 9 | for j = 1:length(varargin) 10 | if strcmp(varargin{j}, 't_max') 11 | t_max = varargin{j+1}; 12 | end 13 | if strcmp(varargin{j}, 'make_plot') 14 | make_plot = varargin{j+1}; 15 | end 16 | if strcmp(varargin{j}, 'th') 17 | th = varargin{j+1}; 18 | end 19 | end 20 | end 21 | 22 | data_prev = data; 23 | if make_plot 24 | error_vec = nan(t_max,1); 25 | for I=1:t_max 26 | disp(['t 
= ' num2str(I)]); 27 | data_curr = DiffOp * data_prev; 28 | error_vec(I) = procrustes(data_prev, data_curr); 29 | if error_vec(I) < th && isempty(data_opt_t) 30 | data_opt_t = data_curr; 31 | end 32 | data_prev = data_curr; 33 | end 34 | t_opt = find(error_vec < th, 1, 'first'); 35 | 36 | figure; 37 | hold all; 38 | plot(1:t_max, error_vec, '*-'); 39 | plot(t_opt, error_vec(t_opt), 'or', 'markersize', 10); 40 | xlabel 't' 41 | ylabel 'error' 42 | axis tight 43 | ylim([0 ceil(max(error_vec)*10)/10]); 44 | plot(xlim, [th th], '--k'); 45 | legend({'y' 'optimal t' ['y=' num2str(th)]}); 46 | set(gca,'xtick',1:t_max); 47 | set(gca,'ytick',0:0.1:1); 48 | else 49 | for I=1:t_max 50 | disp(['t = ' num2str(I)]); 51 | data_curr = DiffOp * data_prev; 52 | error = procrustes(data_prev, data_curr); 53 | if error < th 54 | t_opt = I; 55 | data_opt_t = data_curr; 56 | break 57 | end 58 | data_prev = data_curr; 59 | end 60 | end 61 | 62 | disp(['optimal t = ' num2str(t_opt)]); 63 | -------------------------------------------------------------------------------- /matlab/load_10x.m: -------------------------------------------------------------------------------- 1 | function [data, gene_names, gene_ids, cells] = load_10x(data_dir, varargin) 2 | % [data, gene_names, gene_ids, cells] = load_10x(data_dir, varargin) 3 | % loads 10x sparse format data 4 | % data_dir is dir that contains matrix.mtx, genes.tsv and barcodes.tsv 5 | % varargin 6 | % 'sparse', true -- returns data matrix in sparse format (default 'false') 7 | 8 | return_sparse = false; 9 | 10 | if isempty(data_dir) 11 | data_dir = './'; 12 | elseif data_dir(end) ~= '/' 13 | data_dir = [data_dir '/']; 14 | end 15 | 16 | for i=1:length(varargin)-1 17 | if (strcmp(varargin{i}, 'sparse')) 18 | return_sparse = varargin{i+1}; 19 | end 20 | end 21 | 22 | filename_dataMatrix = [data_dir 'matrix.mtx']; 23 | filename_genes = [data_dir 'genes.tsv']; 24 | filename_cells = [data_dir 'barcodes.tsv']; 25 | 26 | 27 | % Read in gene expression 
matrix (sparse matrix) 28 | % Rows = genes, columns = cells 29 | fprintf('LOADING\n') 30 | dataMatrix = mmread(filename_dataMatrix); 31 | fprintf(' Data matrix (%i cells x %i genes): %s\n', ... 32 | size(dataMatrix'), ['''' filename_dataMatrix '''' ]) 33 | 34 | % Read in row names (gene names / IDs) 35 | dataMatrix_genes = table2cell( ... 36 | readtable(filename_genes, ... 37 | 'FileType','text','ReadVariableNames',0)); 38 | dataMatrix_cells = table2cell( ... 39 | readtable(filename_cells, ... 40 | 'FileType','text','ReadVariableNames',0)); 41 | 42 | % Remove empty cells 43 | col_keep = any(dataMatrix,1); 44 | dataMatrix = dataMatrix(:,col_keep); 45 | dataMatrix_cells = dataMatrix_cells(col_keep,:); 46 | fprintf(' Removed %i empty cells\n', full(sum(~col_keep))) 47 | 48 | % Remove empty genes 49 | genes_keep = any(dataMatrix,2); 50 | dataMatrix = dataMatrix(genes_keep,:); 51 | dataMatrix_genes = dataMatrix_genes(genes_keep,:); 52 | fprintf(' Removed %i empty genes\n', full(sum(~genes_keep))) 53 | 54 | data = dataMatrix'; 55 | if ~return_sparse 56 | data = full(data); 57 | end 58 | gene_names = dataMatrix_genes(:,2); 59 | gene_ids = dataMatrix_genes(:,1); 60 | cells = dataMatrix_cells; 61 | -------------------------------------------------------------------------------- /matlab/mmread.m: -------------------------------------------------------------------------------- 1 | function [A,rows,cols,entries,rep,field,symm] = mmread(filename) 2 | % 3 | % function [A] = mmread(filename) 4 | % 5 | % function [A,rows,cols,entries,rep,field,symm] = mmread(filename) 6 | % 7 | % Reads the contents of the Matrix Market file 'filename' 8 | % into the matrix 'A'. 'A' will be either sparse or full, 9 | % depending on the Matrix Market format indicated by 10 | % 'coordinate' (coordinate sparse storage), or 11 | % 'array' (dense array storage). The data will be duplicated 12 | % as appropriate if symmetry is indicated in the header. 
13 | % 14 | % Optionally, size information about the matrix can be 15 | % obtained by using the return values rows, cols, and 16 | % entries, where entries is the number of nonzero entries 17 | % in the final matrix. Type information can also be retrieved 18 | % using the optional return values rep (representation), field, 19 | % and symm (symmetry). 20 | % 21 | 22 | mmfile = fopen(filename,'r'); 23 | if ( mmfile == -1 ) 24 | disp(filename); 25 | error('File not found'); 26 | end; 27 | 28 | header = fgets(mmfile); 29 | if (header == -1 ) 30 | error('Empty file.') 31 | end 32 | 33 | % NOTE: If using a version of Matlab for which strtok is not 34 | % defined, substitute 'gettok' for 'strtok' in the 35 | % following lines, and download gettok.m from the 36 | % Matrix Market site. 37 | [head0,header] = strtok(header); % see note above 38 | [head1,header] = strtok(header); 39 | [rep,header] = strtok(header); 40 | [field,header] = strtok(header); 41 | [symm,header] = strtok(header); 42 | head1 = lower(head1); 43 | rep = lower(rep); 44 | field = lower(field); 45 | symm = lower(symm); 46 | if ( length(symm) == 0 ) 47 | disp(['Not enough words in header line of file ',filename]) 48 | disp('Recognized format: ') 49 | disp('%%MatrixMarket matrix representation field symmetry') 50 | error('Check header line.') 51 | end 52 | if ( ~ strcmp(head0,'%%MatrixMarket') ) 53 | error('Not a valid MatrixMarket header.') 54 | end 55 | if ( ~ strcmp(head1,'matrix') ) 56 | disp(['This seems to be a MatrixMarket ',head1,' file.']); 57 | disp('This function only knows how to read MatrixMarket matrix files.'); 58 | disp(' '); 59 | error(' '); 60 | end 61 | 62 | % Read through comments, ignoring them 63 | 64 | commentline = fgets(mmfile); 65 | while length(commentline) > 0 & commentline(1) == '%', 66 | commentline = fgets(mmfile); 67 | end 68 | 69 | % Read size information, then branch according to 70 | % sparse or dense format 71 | 72 | if ( strcmp(rep,'coordinate')) % read matrix given in 
sparse 73 | % coordinate matrix format 74 | 75 | [sizeinfo,count] = sscanf(commentline,'%d%d%d'); 76 | while ( count == 0 ) 77 | commentline = fgets(mmfile); 78 | if (commentline == -1 ) 79 | error('End-of-file reached before size information was found.') 80 | end 81 | [sizeinfo,count] = sscanf(commentline,'%d%d%d'); 82 | if ( count > 0 & count ~= 3 ) 83 | error('Invalid size specification line.') 84 | end 85 | end 86 | rows = sizeinfo(1); 87 | cols = sizeinfo(2); 88 | entries = sizeinfo(3); 89 | 90 | if ( strcmp(field,'real') || strcmp(field,'integer') ) % real valued entries: 91 | 92 | [T,count] = fscanf(mmfile,'%f',3); 93 | T = [T; fscanf(mmfile,'%f')]; 94 | if ( size(T) ~= 3*entries ) 95 | message = ... 96 | str2mat('Data file does not contain expected amount of data.',... 97 | 'Check that number of data lines matches nonzero count.'); 98 | disp(message); 99 | error('Invalid data.'); 100 | end 101 | T = reshape(T,3,entries)'; 102 | A = sparse(T(:,1), T(:,2), T(:,3), rows , cols); 103 | 104 | elseif ( strcmp(field,'complex')) % complex valued entries: 105 | 106 | T = fscanf(mmfile,'%f',4); 107 | T = [T; fscanf(mmfile,'%f')]; 108 | if ( size(T) ~= 4*entries ) 109 | message = ... 110 | str2mat('Data file does not contain expected amount of data.',... 111 | 'Check that number of data lines matches nonzero count.'); 112 | disp(message); 113 | error('Invalid data.'); 114 | end 115 | T = reshape(T,4,entries)'; 116 | A = sparse(T(:,1), T(:,2), T(:,3) + T(:,4)*sqrt(-1), rows , cols); 117 | 118 | elseif ( strcmp(field,'pattern')) % pattern matrix (no values given): 119 | 120 | T = fscanf(mmfile,'%f',2); 121 | T = [T; fscanf(mmfile,'%f')]; 122 | if ( size(T) ~= 2*entries ) 123 | message = ... 124 | str2mat('Data file does not contain expected amount of data.',... 
125 | 'Check that number of data lines matches nonzero count.'); 126 | disp(message); 127 | error('Invalid data.'); 128 | end 129 | T = reshape(T,2,entries)'; 130 | A = sparse(T(:,1), T(:,2), ones(entries,1) , rows , cols); 131 | 132 | end 133 | 134 | elseif ( strcmp(rep,'array') ) % read matrix given in dense 135 | % array (column major) format 136 | 137 | [sizeinfo,count] = sscanf(commentline,'%d%d'); 138 | while ( count == 0 ) 139 | commentline = fgets(mmfile); 140 | if (commentline == -1 ) 141 | error('End-of-file reached before size information was found.') 142 | end 143 | [sizeinfo,count] = sscanf(commentline,'%d%d'); 144 | if ( count > 0 & count ~= 2 ) 145 | error('Invalid size specification line.') 146 | end 147 | end 148 | rows = sizeinfo(1); 149 | cols = sizeinfo(2); 150 | entries = rows*cols; 151 | if ( strcmp(field,'real') || strcmp(field,'integer') ) % real valued entries: 152 | A = fscanf(mmfile,'%f',1); 153 | A = [A; fscanf(mmfile,'%f')]; 154 | if ( strcmp(symm,'symmetric') | strcmp(symm,'hermitian') | strcmp(symm,'skew-symmetric') ) 155 | for j=1:cols-1, 156 | currenti = j*rows; 157 | A = [A(1:currenti); zeros(j,1);A(currenti+1:length(A))]; 158 | end 159 | elseif ( ~ strcmp(symm,'general') ) 160 | disp('Unrecognized symmetry') 161 | disp(symm) 162 | disp('Recognized choices:') 163 | disp(' symmetric') 164 | disp(' hermitian') 165 | disp(' skew-symmetric') 166 | disp(' general') 167 | error('Check symmetry specification in header.'); 168 | end 169 | A = reshape(A,rows,cols); 170 | elseif ( strcmp(field,'complex')) % complex valued entries: 171 | tmpr = fscanf(mmfile,'%f',1); 172 | tmpi = fscanf(mmfile,'%f',1); 173 | A = tmpr+tmpi*i; 174 | for j=1:entries-1 175 | tmpr = fscanf(mmfile,'%f',1); 176 | tmpi = fscanf(mmfile,'%f',1); 177 | A = [A; tmpr + tmpi*i]; 178 | end 179 | if ( strcmp(symm,'symmetric') | strcmp(symm,'hermitian') | strcmp(symm,'skew-symmetric') ) 180 | for j=1:cols-1, 181 | currenti = j*rows; 182 | A = [A(1:currenti); 
zeros(j,1);A(currenti+1:length(A))]; 183 | end 184 | elseif ( ~ strcmp(symm,'general') ) 185 | disp('Unrecognized symmetry') 186 | disp(symm) 187 | disp('Recognized choices:') 188 | disp(' symmetric') 189 | disp(' hermitian') 190 | disp(' skew-symmetric') 191 | disp(' general') 192 | error('Check symmetry specification in header.'); 193 | end 194 | A = reshape(A,rows,cols); 195 | elseif ( strcmp(field,'pattern')) % pattern (makes no sense for dense) 196 | disp(['Matrix type: ',field]) 197 | error('Pattern matrix type invalid for array storage format.'); 198 | else % Unknown matrix type 199 | disp(['Matrix type: ',field]) 200 | error('Invalid matrix type specification. Check header against MM documentation.'); 201 | end 202 | end 203 | 204 | % 205 | % If symmetric, skew-symmetric or Hermitian, duplicate lower 206 | % triangular part and modify entries as appropriate: 207 | % 208 | 209 | if ( strcmp(symm,'symmetric') ) 210 | A = A + A.' - diag(diag(A)); 211 | entries = nnz(A); 212 | elseif ( strcmp(symm,'hermitian') ) 213 | A = A + A' - diag(diag(A)); 214 | entries = nnz(A); 215 | elseif ( strcmp(symm,'skew-symmetric') ) 216 | A = A - A'; 217 | entries = nnz(A); 218 | end 219 | 220 | fclose(mmfile); 221 | % Done. 222 | -------------------------------------------------------------------------------- /matlab/project_genes.m: -------------------------------------------------------------------------------- 1 | function [M, genes_found, gene_idx] = project_genes(genes, genes_all, pc_imputed, U) 2 | % project_genes -- obtain gene values from compressed imputed data 3 | % [M, genes_found, gene_idx] = project_genes(genes, genes_all, pc_imputed, U) computes 4 | % gene values (M) for given gene names (genes) given all gene names (genes_all), loadings 5 | % (U), and imputed principal components (pc_imputed). 
6 | % 7 | % Since pc_imputed and U are both narrow matrices the imputed data can be 8 | % stored in a memory efficient way, without having to store the dense 9 | % matrix. 
10 | 11 | [gene_idx,locb] = ismember(lower(genes_all), lower(genes)); 12 | [~,sidx] = sort(locb(gene_idx)); 13 | idx = find(gene_idx); 14 | idx = idx(sidx); 15 | M = pc_imputed * U(idx,:)'; % project 16 | genes_found = genes_all(idx); 17 | gene_idx = find(gene_idx); 18 | -------------------------------------------------------------------------------- /matlab/randPCA.m: -------------------------------------------------------------------------------- 1 | function [U,S,V] = randPCA(A,k,its,l) 2 | 3 | %PCA Low-rank approximation in SVD form. 4 | % 5 | % 6 | % [U,S,V] = PCA(A) constructs a nearly optimal rank-6 approximation 7 | % USV' to A, using 2 full iterations of a block Lanczos method 8 | % of block size 6+2=8, started with an n x 8 random matrix, 9 | % when A is m x n; the ref. below explains "nearly optimal." 10 | % The smallest dimension of A must be >= 6 when A is 11 | % the only input to PCA. 12 | % 13 | % [U,S,V] = PCA(A,k) constructs a nearly optimal rank-k approximation 14 | % USV' to A, using 2 full iterations of a block Lanczos method 15 | % of block size k+2, started with an n x (k+2) random matrix, 16 | % when A is m x n; the ref. below explains "nearly optimal." 17 | % k must be a positive integer <= the smallest dimension of A. 18 | % 19 | % [U,S,V] = PCA(A,k,its) constructs a nearly optimal rank-k approx. USV' 20 | % to A, using its full iterations of a block Lanczos method 21 | % of block size k+2, started with an n x (k+2) random matrix, 22 | % when A is m x n; the ref. below explains "nearly optimal." 23 | % k must be a positive integer <= the smallest dimension of A, 24 | % and its must be a nonnegative integer. 25 | % 26 | % [U,S,V] = PCA(A,k,its,l) constructs a nearly optimal rank-k approx. 27 | % USV' to A, using its full iterates of a block Lanczos method 28 | % of block size l, started with an n x l random matrix, 29 | % when A is m x n; the ref. below explains "nearly optimal." 
30 | % k must be a positive integer <= the smallest dimension of A, 31 | % its must be a nonnegative integer, 32 | % and l must be a positive integer >= k. 33 | % 34 | % 35 | % The low-rank approximation USV' is in the form of an SVD in the sense 36 | % that the columns of U are orthonormal, as are the columns of V, 37 | % the entries of S are all nonnegative, and the only nonzero entries 38 | % of S appear in non-increasing order on its diagonal. 39 | % U is m x k, V is n x k, and S is k x k, when A is m x n. 40 | % 41 | % Increasing its or l improves the accuracy of the approximation USV' 42 | % to A; the ref. below describes how the accuracy depends on its and l. 43 | % 44 | % 45 | % Note: PCA invokes RAND. To obtain repeatable results, 46 | % invoke RAND('seed',j) with a fixed integer j before invoking PCA. 47 | % 48 | % Note: PCA currently requires the user to center and normalize the rows 49 | % or columns of the input matrix A before invoking PCA (if such 50 | % is desired). 51 | % 52 | % Note: The user may ascertain the accuracy of the approximation USV' 53 | % to A by invoking DIFFSNORM(A,U,S,V). 
54 | % 55 | % 56 | % inputs (the first is required): 57 | % A -- matrix being approximated 58 | % k -- rank of the approximation being constructed; 59 | % k must be a positive integer <= the smallest dimension of A, 60 | % and defaults to 6 61 | % its -- number of full iterations of a block Lanczos method to conduct; 62 | % its must be a nonnegative integer, and defaults to 2 63 | % l -- block size of the block Lanczos iterations; 64 | % l must be a positive integer >= k, and defaults to k+2 65 | % 66 | % outputs (all three are required): 67 | % U -- m x k matrix in the rank-k approximation USV' to A, 68 | % where A is m x n; the columns of U are orthonormal 69 | % S -- k x k matrix in the rank-k approximation USV' to A, 70 | % where A is m x n; the entries of S are all nonnegative, 71 | % and its only nonzero entries appear in nonincreasing order 72 | % on the diagonal 73 | % V -- n x k matrix in the rank-k approximation USV' to A, 74 | % where A is m x n; the columns of V are orthonormal 75 | % 76 | % 77 | % Example: 78 | % A = rand(1000,2)*rand(2,1000); 79 | % A = A/normest(A); 80 | % [U,S,V] = pca(A,2,0); 81 | % diffsnorm(A,U,S,V) 82 | % 83 | % This code snippet produces a rank-2 approximation USV' to A such that 84 | % the columns of U are orthonormal, as are the columns of V, and 85 | % the entries of S are all nonnegative and are zero off the diagonal. 86 | % diffsnorm(A,U,S,V) outputs an estimate of the spectral norm 87 | % of A-USV', which should be close to the machine precision. 88 | % 89 | % 90 | % Reference: 91 | % Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp, 92 | % Finding structure with randomness: Stochastic algorithms 93 | % for constructing approximate matrix decompositions, 94 | % arXiv:0909.4061 [math.NA; math.PR], 2009 95 | % (available at http://arxiv.org). 96 | % 97 | % 98 | % See also PCACOV, PRINCOMP, SVDS. 99 | % 100 | 101 | % Copyright 2009 Mark Tygert. 102 | 103 | warning off; 104 | 105 | % 106 | % Check the number of inputs. 
107 | % 108 | if(nargin < 1) 109 | error('MATLAB:pca:TooFewIn',... 110 | 'There must be at least 1 input.') 111 | end 112 | 113 | if(nargin > 4) 114 | error('MATLAB:pca:TooManyIn',... 115 | 'There must be at most 4 inputs.') 116 | end 117 | 118 | % 119 | % Check the number of outputs. 120 | % 121 | if(nargout ~= 3) 122 | error('MATLAB:pca:WrongNumOut',... 123 | 'There must be exactly 3 outputs.') 124 | end 125 | 126 | % 127 | % Set the inputs k, its, and l to default values, if necessary. 128 | % 129 | if(nargin == 1) 130 | k = 6; 131 | its = 2; 132 | l = k+2; 133 | end 134 | 135 | if(nargin == 2) 136 | its = 2; 137 | l = k+2; 138 | end 139 | 140 | if(nargin == 3) 141 | l = k+2; 142 | end 143 | 144 | % 145 | % Check the first input argument. 146 | % 147 | if(~isfloat(A)) 148 | error('MATLAB:pca:In1NotFloat',... 149 | 'Input 1 must be a floating-point matrix.') 150 | end 151 | 152 | if(isempty(A)) 153 | error('MATLAB:pca:In1Empty',... 154 | 'Input 1 must not be empty.') 155 | end 156 | 157 | % 158 | % Retrieve the dimensions of A. 159 | % 160 | [m n] = size(A); 161 | 162 | % 163 | % Check the remaining input arguments. 164 | % 165 | if(size(k,1) ~= 1 || size(k,2) ~= 1) 166 | error('MATLAB:pca:In2Not1x1',... 167 | 'Input 2 must be a scalar.') 168 | end 169 | 170 | if(size(its,1) ~= 1 || size(its,2) ~= 1) 171 | error('MATLAB:pca:In3Not1x1',... 172 | 'Input 3 must be a scalar.') 173 | end 174 | 175 | if(size(l,1) ~= 1 || size(l,2) ~= 1) 176 | error('MATLAB:pca:In4Not1x1',... 177 | 'Input 4 must be a scalar.') 178 | end 179 | 180 | if(k <= 0) 181 | error('MATLAB:pca:In2NonPos',... 182 | 'Input 2 must be > 0.') 183 | end 184 | 185 | if((k > m) || (k > n)) 186 | error('MATLAB:pca:In2TooBig',... 187 | 'Input 2 must be <= the smallest dimension of Input 1.') 188 | end 189 | 190 | if(its < 0) 191 | error('MATLAB:pca:In3Neg',... 192 | 'Input 3 must be >= 0.') 193 | end 194 | 195 | if(l < k) 196 | error('MATLAB:pca:In4ltIn2',... 
197 | 'Input 4 must be >= Input 2.') 198 | end 199 | 200 | % 201 | % SVD A directly if (its+1)*l >= m/1.25 or (its+1)*l >= n/1.25. 202 | % 203 | if(((its+1)*l >= m/1.25) || ((its+1)*l >= n/1.25)) 204 | 205 | if(~issparse(A)) 206 | [U,S,V] = svd(A,'econ'); 207 | end 208 | 209 | if(issparse(A)) 210 | [U,S,V] = svd(full(A),'econ'); 211 | end 212 | % 213 | % Retain only the leftmost k columns of U, the leftmost k columns of V, 214 | % and the uppermost leftmost k x k block of S. 215 | % 216 | U = U(:,1:k); 217 | V = V(:,1:k); 218 | S = S(1:k,1:k); 219 | 220 | return 221 | 222 | end 223 | 224 | 225 | if(m >= n) 226 | 227 | % 228 | % Apply A to a random matrix, obtaining H. 229 | % 230 | %rand('seed',rand('seed')); 231 | rng('default'); 232 | 233 | if(isreal(A)) 234 | H = A*(2*rand(n,l)-ones(n,l)); 235 | end 236 | 237 | if(~isreal(A)) 238 | H = A*( (2*rand(n,l)-ones(n,l)) + i*(2*rand(n,l)-ones(n,l)) ); 239 | end 240 | 241 | %rand('twister',rand('twister')); 242 | rng('default'); 243 | 244 | % 245 | % Initialize F to its final size and fill its leftmost block with H. 246 | % 247 | F = zeros(m,(its+1)*l); 248 | F(1:m, 1:l) = H; 249 | 250 | % 251 | % Apply A*A' to H a total of its times, 252 | % augmenting F with the new H each time. 253 | % 254 | for it = 1:its 255 | H = (H'*A)'; 256 | H = A*H; 257 | F(1:m, (1+it*l):((it+1)*l)) = H; 258 | end 259 | 260 | clear H; 261 | 262 | % 263 | % Form a matrix Q whose columns constitute an orthonormal basis 264 | % for the columns of F. 265 | % 266 | [Q,R,E] = qr(F,0); 267 | 268 | clear F R E; 269 | 270 | % 271 | % SVD Q'*A to obtain approximations to the singular values 272 | % and right singular vectors of A; adjust the left singular vectors 273 | % of Q'*A to approximate the left singular vectors of A. 274 | % 275 | [U2,S,V] = svd(Q'*A,'econ'); 276 | U = Q*U2; 277 | 278 | clear Q U2; 279 | 280 | % 281 | % Retain only the leftmost k columns of U, the leftmost k columns of V, 282 | % and the uppermost leftmost k x k block of S. 
283 | % 284 | U = U(:,1:k); 285 | V = V(:,1:k); 286 | S = S(1:k,1:k); 287 | 288 | end 289 | 290 | 291 | if(m < n) 292 | 293 | % 294 | % Apply A' to a random matrix, obtaining H. 295 | % 296 | %rand('seed',rand('seed')); 297 | rng('default'); 298 | 299 | if(isreal(A)) 300 | H = ((2*rand(l,m)-ones(l,m))*A)'; 301 | end 302 | 303 | if(~isreal(A)) 304 | H = (( (2*rand(l,m)-ones(l,m)) + i*(2*rand(l,m)-ones(l,m)) )*A)'; 305 | end 306 | 307 | %rand('twister',rand('twister')); 308 | rng('default'); 309 | 310 | % 311 | % Initialize F to its final size and fill its leftmost block with H. 312 | % 313 | F = zeros(n,(its+1)*l); 314 | F(1:n, 1:l) = H; 315 | 316 | % 317 | % Apply A'*A to H a total of its times, 318 | % augmenting F with the new H each time. 319 | % 320 | for it = 1:its 321 | H = A*H; 322 | H = (H'*A)'; 323 | F(1:n, (1+it*l):((it+1)*l)) = H; 324 | end 325 | 326 | clear H; 327 | 328 | % 329 | % Form a matrix Q whose columns constitute an orthonormal basis 330 | % for the columns of F. 331 | % 332 | [Q,R,E] = qr(F,0); 333 | 334 | clear F R E; 335 | 336 | % 337 | % SVD A*Q to obtain approximations to the singular values 338 | % and left singular vectors of A; adjust the right singular vectors 339 | % of A*Q to approximate the right singular vectors of A. 340 | % 341 | [U,S,V2] = svd(A*Q,'econ'); 342 | V = Q*V2; 343 | 344 | clear Q V2; 345 | 346 | % 347 | % Retain only the leftmost k columns of U, the leftmost k columns of V, 348 | % and the uppermost leftmost k x k block of S. 
    %
    U = U(:,1:k);
    V = V(:,1:k);
    S = S(1:k,1:k);

end

--------------------------------------------------------------------------------
/matlab/run_magic.m:
--------------------------------------------------------------------------------
function [pc_imputed, U, pc] = run_magic(data, varargin)
% run_magic  Run MAGIC for imputing and denoising of single-cell data
%   [pc_imputed, U, pc] = run_magic(data, varargin) runs MAGIC on data (rows:
%   cells, columns: genes) with default parameter settings and returns the
%   imputed data in a compressed format.
%
%   The compressed format consists of loadings (U) and imputed principal
%   components (pc_imputed). To obtain gene values from this compressed format
%   either run project_genes.m or manually project (pc_imputed * U') either all
%   genes or a subset (pc_imputed * U(idx,:)'). Also returned are the original
%   principal components (pc).
%
%   Since pc_imputed and U are both narrow matrices the imputed data can be
%   stored in a memory efficient way, without having to store the dense
%   matrix.
%
%   Supplied data can be a sparse matrix, in which case MAGIC will be more
%   memory efficient.
%
%   [...] = run_magic(data, 'PARAM1',val1, 'PARAM2',val2, ...) allows you to
%   specify optional parameter name/value pairs that control further details
%   of MAGIC. Parameters are:
%
%   'npca' - number of PCA components to do MAGIC on. Defaults to 100.
%
%   'k' - number of nearest neighbors for bandwidth of adaptive alpha
%   decaying kernel or, when a=[], number of nearest neighbors of the knn
%   graph. For the unweighted kernel we recommend k to be a bit larger,
%   e.g. 10 or 15. Defaults to 10.
%
%   'a' - alpha of alpha decaying kernel. When a=[] knn (unweighted) kernel
%   is used. Defaults to 15.
%
%   't' - number of diffusion steps.
%   Defaults to [] which automatically picks
%   the optimal t.
%
%   'distfun' - Distance function used to compute kernel. Defaults to
%   'euclidean'.
%
%   'make_plot_opt_t' - Boolean flag for plotting the optimal t analysis.
%   Defaults to true.

npca = 100;
k = 10;
a = 15;
t = [];
distfun = 'euclidean';
make_plot_opt_t = true;

% get input parameters
for i=1:length(varargin)
    % k for knn adaptive sigma
    if(strcmp(varargin{i},'k'))
       k = lower(varargin{i+1});
    end
    % a (alpha) for alpha decaying kernel
    if(strcmp(varargin{i},'a'))
       a = lower(varargin{i+1});
    end
    % diffusion time
    if(strcmp(varargin{i},'t'))
       t = lower(varargin{i+1});
    end
    % npca
    if(strcmp(varargin{i},'npca'))
       npca = lower(varargin{i+1});
    end
    % distance function used to compute the kernel
    if(strcmp(varargin{i},'distfun'))
       distfun = lower(varargin{i+1});
    end
    % make plot optimal t
    if(strcmp(varargin{i},'make_plot_opt_t'))
       make_plot_opt_t = lower(varargin{i+1});
    end
end

% PCA
disp 'doing PCA'
[U,~,~] = randPCA(data', npca); % this is svd
pc = data * U; % this is PCA without mean centering to be able to handle sparse data

% compute kernel
disp 'computing kernel'
K = compute_kernel(pc, 'k', k, 'a', a, 'distfun', distfun);

% row stochastic
P = bsxfun(@rdivide, K, sum(K,2));

% optimal t
if isempty(t)
    disp 'imputing using optimal t'
    pc_imputed = compute_optimal_t(pc, P, 'make_plot', make_plot_opt_t);
else
    disp 'imputing using provided t'
    pc_imputed = pc;
    for I=1:t
        disp(['t = ' num2str(I)]);
        pc_imputed = P * pc_imputed;
    end
end

disp 'done.'
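The core of run_magic.m is two steps: normalize the kernel K so that every row sums to one, then multiply the principal components by the resulting Markov operator P a total of t times. The Python sketch below mirrors only those two steps on small list-of-lists matrices. The function names (`row_stochastic`, `matmul`, `diffuse`) and the toy values are illustrative assumptions, not part of the MAGIC code base, and the kernel itself (compute_kernel.m) is taken as given.

```python
def row_stochastic(K):
    """Divide every row of the affinity matrix K by its row sum."""
    return [[v / sum(row) for v in row] for row in K]


def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def diffuse(pc, P, t):
    """Apply the Markov operator P to pc a total of t times (P^t * pc)."""
    pc_imputed = [row[:] for row in pc]  # copy so the input is left untouched
    for _ in range(t):
        pc_imputed = matmul(P, pc_imputed)
    return pc_imputed


if __name__ == "__main__":
    # Hypothetical 3-cell affinity matrix and 2 principal components per cell.
    K = [[1.0, 1.0, 0.0],
         [1.0, 1.0, 1.0],
         [0.0, 1.0, 1.0]]
    pc = [[0.0, 1.0],
          [3.0, 1.0],
          [6.0, 4.0]]
    P = row_stochastic(K)
    for t in range(3):
        print(t, diffuse(pc, P, t))
```

Each diffusion step replaces a cell's components with a weighted average over its neighbors, so a larger t smooths more strongly. Gene-level values would then be recovered by projecting the result through the loadings, as project_genes.m does with `pc_imputed * U'`.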

--------------------------------------------------------------------------------
/matlab/svdpca.m:
--------------------------------------------------------------------------------
function Y = svdpca(X, k, method)

if ~exist('method','var')
    method = 'svd';
end

X = bsxfun(@minus, X, mean(X));

switch lower(method)
    case 'svd'
        disp 'PCA using SVD'
        [U,~,~] = svds(X', k);
        Y = X * U;
    case 'random'
        disp 'PCA using random SVD'
        [U,~,~] = randPCA(X', k);
        Y = X * U;
    case 'lra'
        disp 'LRA using SVD'
        [U,S,V] = svds(X, k);
        Y = U*S*V';
    case 'lra_random'
        disp 'LRA using random SVD'
        [U,S,V] = randPCA(X, k);
        Y = U*S*V';
end

--------------------------------------------------------------------------------
/matlab/svdpca_sparse.m:
--------------------------------------------------------------------------------
function [pc,U,S] = svdpca_sparse(X, k, method)

if ~exist('method','var')
    method = 'svd';
end

switch method
    case 'svd'
        disp 'PCA using SVD'
        [U,S,~] = svds(X', k);
        pc = X * U;
    case 'random'
        disp 'PCA using random SVD'
        [U,S,~] = randPCA(X', k);
        pc = X * U;
        S = diag(S);
    case 'none'
        disp 'No PCA performed'
        pc = X;
end

--------------------------------------------------------------------------------
/matlab/test_magic.m:
--------------------------------------------------------------------------------
%% load data (e.g.
10x data) 2 | % data should be cells as rows and genes as columns 3 | %sample_dir = 'path_to_data/'; 4 | %[data, gene_names, gene_ids, cells] = load_10x(sample_dir); 5 | 6 | %% load EMT data 7 | file = '../data/HMLE_TGFb_day_8_10.csv'; %% gunzip ../data/HMLE_TGFb_day_8_10.csv.gz 8 | data = importdata(file); 9 | gene_names = data.colheaders; 10 | data = data.data; 11 | 12 | %% library size normalization 13 | libsize = sum(data,2); 14 | data = bsxfun(@rdivide, data, libsize) * median(libsize); 15 | 16 | %% log transform -- usually one would log transform the data. Here we don't do it. 17 | %data = log(data + 0.1); 18 | 19 | %% MAGIC 20 | [pc_imputed, U, pc] = run_magic(data, 'npca', 100, 'k', 15, 'a', 15, 'make_plot_opt_t', true); 21 | 22 | %% project genes 23 | plot_genes = {'Cdh1', 'Vim', 'Fn1', 'Zeb1'}; 24 | [M_imputed, genes_found] = project_genes(plot_genes, gene_names, pc_imputed, U); 25 | 26 | %% plot 27 | ms = 20; 28 | v = [-45 20]; 29 | % before MAGIC 30 | x = data(:, ismember(lower(gene_names), lower(plot_genes{1}))); 31 | y = data(:, ismember(lower(gene_names), lower(plot_genes{2}))); 32 | z = data(:, ismember(lower(gene_names), lower(plot_genes{3}))); 33 | c = data(:, ismember(lower(gene_names), lower(plot_genes{4}))); 34 | figure; 35 | subplot(2,2,1); 36 | scatter(y, x, ms, c, 'filled'); 37 | colormap(parula); 38 | axis tight 39 | xlabel(plot_genes{2}); 40 | ylabel(plot_genes{1}); 41 | h = colorbar; 42 | %ylabel(h,plot_genes{4}); 43 | title 'Before MAGIC' 44 | 45 | subplot(2,2,2); 46 | scatter3(x, y, z, ms, c, 'filled'); 47 | colormap(parula); 48 | axis tight 49 | xlabel(plot_genes{1}); 50 | ylabel(plot_genes{2}); 51 | zlabel(plot_genes{3}); 52 | %h = colorbar; 53 | ylabel(h,plot_genes{4}); 54 | view(v); 55 | title 'Before MAGIC' 56 | 57 | % plot after MAGIC 58 | x = M_imputed(:,1); 59 | y = M_imputed(:,2); 60 | z = M_imputed(:,3); 61 | c = M_imputed(:,4); 62 | subplot(2,2,3); 63 | scatter(y, x, ms, c, 'filled'); 64 | colormap(parula); 65 | axis tight 66 
| xlabel(plot_genes{2}); 67 | ylabel(plot_genes{1}); 68 | h = colorbar; 69 | %ylabel(h,plot_genes{4}); 70 | title 'After MAGIC' 71 | 72 | subplot(2,2,4); 73 | scatter3(x, y, z, ms, c, 'filled'); 74 | colormap(parula); 75 | axis tight 76 | xlabel(plot_genes{1}); 77 | ylabel(plot_genes{2}); 78 | zlabel(plot_genes{3}); 79 | %h = colorbar; 80 | ylabel(h,plot_genes{4}); 81 | view(v); 82 | title 'After MAGIC' 83 | 84 | %% plot PCA before MAGIC 85 | figure; 86 | c = data(:, ismember(lower(gene_names), lower(plot_genes{4}))); 87 | Y = svdpca(pc, 3, 'random'); % original PCs are not mean centered so doing proper PCA here 88 | % alternative is to do proper PCA on data: 89 | %Y = svdpca(data, 3, 'random'); 90 | scatter3(Y(:,1), Y(:,2), Y(:,3), ms, c, 'filled'); 91 | colormap(parula); 92 | axis tight 93 | xlabel 'PC1' 94 | ylabel 'PC2' 95 | zlabel 'PC3' 96 | h = colorbar; 97 | ylabel(h,plot_genes{4}); 98 | view([-50 22]); 99 | title 'Before MAGIC' 100 | 101 | %% plot PCA after MAGIC 102 | figure; 103 | c = M_imputed(:,4); 104 | Y = svdpca(pc_imputed, 3, 'random'); % original PCs are not mean centered so doing proper PCA here 105 | % alternative is to go to full imputed data and then do proper PCA: 106 | %data_imputed = pc_imputed * U'; % project full data 107 | %Y = svdpca(data_imputed, 3, 'random'); 108 | scatter3(Y(:,1), Y(:,2), Y(:,3), ms, c, 'filled'); 109 | colormap(parula); 110 | axis tight 111 | xlabel 'PC1' 112 | ylabel 'PC2' 113 | zlabel 'PC3' 114 | h = colorbar; 115 | ylabel(h,plot_genes{4}); 116 | view([-50 22]); 117 | title 'After MAGIC' 118 | -------------------------------------------------------------------------------- /python/README.rst: -------------------------------------------------------------------------------- 1 | ======================================================= 2 | Markov Affinity-based Graph Imputation of Cells (MAGIC) 3 | ======================================================= 4 | 5 | .. 
image:: https://img.shields.io/pypi/v/magic-impute.svg 6 | :target: https://pypi.org/project/magic-impute/ 7 | :alt: Latest PyPi version 8 | .. image:: https://img.shields.io/cran/v/Rmagic.svg 9 | :target: https://cran.r-project.org/package=Rmagic 10 | :alt: Latest CRAN version 11 | .. image:: https://img.shields.io/github/workflow/status/KrishnaswamyLab/MAGIC/Unit%20Tests/master?label=Github%20Actions 12 | :target: https://github.com/KrishnaswamyLab/MAGIC/actions 13 | :alt: GitHub Actions Build 14 | .. image:: https://img.shields.io/readthedocs/magic.svg 15 | :target: https://magic.readthedocs.io/ 16 | :alt: Read the Docs 17 | .. image:: https://zenodo.org/badge/DOI/10.1016/j.cell.2018.05.061.svg 18 | :target: https://www.cell.com/cell/abstract/S0092-8674(18)30724-4 19 | :alt: Cell Publication DOI 20 | .. image:: https://img.shields.io/twitter/follow/KrishnaswamyLab.svg?style=social&label=Follow 21 | :target: https://twitter.com/KrishnaswamyLab 22 | :alt: Twitter 23 | .. image:: https://img.shields.io/github/stars/KrishnaswamyLab/MAGIC.svg?style=social&label=Stars 24 | :target: https://github.com/KrishnaswamyLab/MAGIC/ 25 | :alt: GitHub stars 26 | 27 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising high-dimensional data most commonly applied to single-cell RNA sequencing data. MAGIC learns the manifold data, using the resultant graph to smooth the features and restore the structure of the data. 28 | 29 | To see how MAGIC can be applied to single-cell RNA-seq, elucidating the epithelial-to-mesenchymal transition, read our `publication in Cell`_. 30 | 31 | `David van Dijk, et al. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. 2018. Cell.`__ 32 | 33 | .. _`publication in Cell`: https://www.cell.com/cell/abstract/S0092-8674(18)30724-4 34 | 35 | __ `publication in Cell`_ 36 | 37 | For R and MATLAB implementations of MAGIC, see 38 | https://github.com/KrishnaswamyLab/MAGIC. 39 | 40 | .. 
image:: https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/master/magic.gif
    :align: center
    :alt: Magic reveals the interaction between Vimentin (VIM), Cadherin-1 (CDH1), and Zinc finger E-box-binding homeobox 1 (ZEB1, encoded by colors).

*Magic reveals the interaction between Vimentin (VIM), Cadherin-1
(CDH1), and Zinc finger E-box-binding homeobox 1 (ZEB1, encoded by
colors).*

Installation
~~~~~~~~~~~~

Installation with pip
---------------------

To install with ``pip``, run the following from a terminal::

    pip install --user magic-impute

Installation from GitHub
------------------------

To clone the repository and install manually, run the following from a
terminal::

    git clone https://github.com/KrishnaswamyLab/MAGIC.git
    cd MAGIC/python
    python setup.py install --user

Usage
~~~~~

Example data
------------

The following code runs MAGIC on test data located in the MAGIC
repository::

    import magic
    import pandas as pd
    import matplotlib.pyplot as plt
    X = pd.read_csv("MAGIC/data/test_data.csv")
    magic_operator = magic.MAGIC()
    X_magic = magic_operator.fit_transform(X, genes=['VIM', 'CDH1', 'ZEB1'])
    plt.scatter(X_magic['VIM'], X_magic['CDH1'], c=X_magic['ZEB1'], s=1, cmap='inferno')
    plt.show()
    magic.plot.animate_magic(X, gene_x='VIM', gene_y='CDH1', gene_color='ZEB1', operator=magic_operator)

Interactive command line
------------------------

We have included two tutorial notebooks on MAGIC usage and results
visualization for single cell RNA-seq data.
92 | 93 | EMT data notebook: 94 | http://nbviewer.jupyter.org/github/KrishnaswamyLab/magic/blob/master/python/tutorial_notebooks/emt_tutorial.ipynb 95 | 96 | Bone Marrow data notebook: 97 | http://nbviewer.jupyter.org/github/KrishnaswamyLab/magic/blob/master/python/tutorial_notebooks/bonemarrow_tutorial.ipynb 98 | 99 | Help 100 | ~~~~ 101 | 102 | If you have any questions or require assistance using MAGIC, please 103 | contact us at https://krishnaswamylab.org/get-help. 104 | -------------------------------------------------------------------------------- /python/doc/Makefile: -------------------------------------------------------------------------------- 1 | # Minimal makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | SPHINXPROJ = PHATE 8 | SOURCEDIR = source 9 | BUILDDIR = build 10 | 11 | # Put it first so that "make" without argument is like "make help". 12 | help: 13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 14 | 15 | .PHONY: help Makefile 16 | 17 | # Catch-all target: route all unknown targets to Sphinx using the new 18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 19 | %: Makefile 20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 21 | -------------------------------------------------------------------------------- /python/doc/source/api.rst: -------------------------------------------------------------------------------- 1 | API 2 | === 3 | 4 | MAGIC 5 | ----- 6 | 7 | .. automodule:: magic.magic 8 | :members: 9 | :inherited-members: 10 | :show-inheritance: 11 | 12 | Plotting 13 | -------- 14 | 15 | .. 
automodule:: magic.plot 16 | :members: 17 | :inherited-members: 18 | :show-inheritance: 19 | -------------------------------------------------------------------------------- /python/doc/source/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # 4 | # MAGIC documentation build configuration file, created by 5 | # sphinx-quickstart on Thu Mar 30 19:50:14 2017. 6 | # 7 | # This file is execfile()d with the current directory set to its 8 | # containing dir. 9 | # 10 | # Note that not all possible configuration values are present in this 11 | # autogenerated file. 12 | # 13 | # All configuration values have a default; values that are commented out 14 | # serve to show the default. 15 | 16 | # If extensions (or modules to document with autodoc) are in another directory, 17 | # add these directories to sys.path here. If the directory is relative to the 18 | # documentation root, use os.path.abspath to make it absolute, like shown here. 19 | # 20 | import os 21 | import sys 22 | 23 | root_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) 24 | sys.path.insert(0, root_dir) 25 | # print(sys.path) 26 | 27 | # -- General configuration ------------------------------------------------ 28 | 29 | # If your documentation needs a minimal Sphinx version, state it here. 30 | # 31 | # needs_sphinx = '1.0' 32 | 33 | # Add any Sphinx extension module names here, as strings. They can be 34 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 35 | # ones. 36 | extensions = [ 37 | "sphinx.ext.autodoc", 38 | "sphinx.ext.autosummary", 39 | "sphinx.ext.napoleon", 40 | "sphinx.ext.doctest", 41 | "sphinx.ext.coverage", 42 | "sphinx.ext.viewcode", 43 | ] 44 | 45 | # Add any paths that contain templates here, relative to this directory. 46 | templates_path = ["ytemplates"] 47 | 48 | # The suffix(es) of source filenames. 
49 | # You can specify multiple suffix as a list of string: 50 | # 51 | # source_suffix = ['.rst', '.md'] 52 | source_suffix = ".rst" 53 | 54 | # The master toctree document. 55 | master_doc = "index" 56 | 57 | # General information about the project. 58 | project = "MAGIC" 59 | copyright = "2017 Krishnaswamy Lab, Yale University" 60 | author = "Scott Gigante and Daniel Dager, Krishnaswamy Lab, Yale University" 61 | 62 | # The version info for the project you're documenting, acts as replacement for 63 | # |version| and |release|, also used in various other places throughout the 64 | # built documents. 65 | version_py = os.path.join(root_dir, "magic", "version.py") 66 | # The full version, including alpha/beta/rc tags. 67 | release = open(version_py).read().strip().split("=")[-1].replace('"', "").strip() 68 | # The short X.Y version. 69 | version = release.split("-")[0] 70 | 71 | # The language for content autogenerated by Sphinx. Refer to documentation 72 | # for a list of supported languages. 73 | # 74 | # This is also used if you do content translation via gettext catalogs. 75 | # Usually you set "language" from the command line for these cases. 76 | language = None 77 | 78 | # List of patterns, relative to source directory, that match files and 79 | # directories to ignore when looking for source files. 80 | # This patterns also effect to html_static_path and html_extra_path 81 | exclude_patterns = [] 82 | 83 | # The name of the Pygments (syntax highlighting) style to use. 84 | pygments_style = "sphinx" 85 | 86 | # If true, `todo` and `todoList` produce output, else they produce nothing. 87 | todo_include_todos = False 88 | 89 | 90 | # -- Options for HTML output ---------------------------------------------- 91 | 92 | # The theme to use for HTML and HTML Help pages. See the documentation for 93 | # a list of builtin themes. 94 | # 95 | html_theme = "default" 96 | 97 | # Theme options are theme-specific and customize the look and feel of a theme 98 | # further. 
For a list of options available for each theme, see the 99 | # documentation. 100 | # 101 | # html_theme_options = {} 102 | 103 | # Add any paths that contain custom static files (such as style sheets) here, 104 | # relative to this directory. They are copied after the builtin static files, 105 | # so a file named "default.css" will overwrite the builtin "default.css". 106 | html_static_path = ["ystatic"] 107 | 108 | 109 | # -- Options for HTMLHelp output ------------------------------------------ 110 | 111 | # Output file base name for HTML help builder. 112 | htmlhelp_basename = "MAGICdoc" 113 | 114 | 115 | # -- Options for LaTeX output --------------------------------------------- 116 | 117 | latex_elements = { 118 | # The paper size ('letterpaper' or 'a4paper'). 119 | # 120 | # 'papersize': 'letterpaper', 121 | # The font size ('10pt', '11pt' or '12pt'). 122 | # 123 | # 'pointsize': '10pt', 124 | # Additional stuff for the LaTeX preamble. 125 | # 126 | # 'preamble': '', 127 | # Latex figure (float) alignment 128 | # 129 | # 'figure_align': 'htbp', 130 | } 131 | 132 | # Grouping the document tree into LaTeX files. List of tuples 133 | # (source start file, target name, title, 134 | # author, documentclass [howto, manual, or own class]). 135 | latex_documents = [ 136 | ( 137 | master_doc, 138 | "MAGIC.tex", 139 | "MAGIC Documentation", 140 | "Scott Gigante and Daniel Dager, Krishnaswamy Lab, Yale University", 141 | "manual", 142 | ), 143 | ] 144 | 145 | 146 | # -- Options for manual page output --------------------------------------- 147 | 148 | # One entry per manual page. List of tuples 149 | # (source start file, name, description, authors, manual section). 150 | man_pages = [(master_doc, "magic", "MAGIC Documentation", [author], 1)] 151 | 152 | 153 | # -- Options for Texinfo output ------------------------------------------- 154 | 155 | # Grouping the document tree into Texinfo files. 
List of tuples 156 | # (source start file, target name, title, author, 157 | # dir menu entry, description, category) 158 | texinfo_documents = [ 159 | ( 160 | master_doc, 161 | "MAGIC", 162 | "MAGIC Documentation", 163 | author, 164 | "MAGIC", 165 | "One line description of project.", 166 | "Miscellaneous", 167 | ), 168 | ] 169 | -------------------------------------------------------------------------------- /python/doc/source/index.rst: -------------------------------------------------------------------------------- 1 | ======================================================= 2 | MAGIC - Markov Affinity-based Graph Imputation of Cells 3 | ======================================================= 4 | 5 | .. raw:: html 6 | 7 | Latest PyPI version 8 | 9 | .. raw:: html 10 | 11 | Latest CRAN version 12 | 13 | .. raw:: html 14 | 15 | GitHub Actions Build 16 | 17 | .. raw:: html 18 | 19 | Read the Docs 20 | 21 | .. raw:: html 22 | 23 | Cell Publication DOI 24 | 25 | .. raw:: html 26 | 27 | Twitter 28 | 29 | .. raw:: html 30 | 31 | GitHub stars 32 | 33 | Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising high-dimensional data most commonly applied to single-cell RNA sequencing data. MAGIC learns the manifold data, using the resultant graph to smooth the features and restore the structure of the data. 34 | 35 | To see how MAGIC can be applied to single-cell RNA-seq, elucidating the epithelial-to-mesenchymal transition, read our `publication in Cell`_. 36 | 37 | .. raw:: html 38 | 39 |

40 | 41 |
42 | MAGIC reveals the interaction between Vimentin (VIM), Cadherin-1 (CDH1), and Zinc finger E-box-binding homeobox 1 (ZEB1, encoded by color). 43 | 44 |

45 | 46 | `David van Dijk, et al. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. 2018. Cell.`__ 47 | 48 | .. _`publication in Cell`: https://www.cell.com/cell/abstract/S0092-8674(18)30724-4 49 | 50 | __ `publication in Cell`_ 51 | 52 | .. toctree:: 53 | :maxdepth: 2 54 | 55 | installation 56 | tutorial 57 | api 58 | 59 | Quick Start 60 | =========== 61 | 62 | To run MAGIC on your dataset, create a MAGIC operator and run `fit_transform`. Here we show an example with a small, artificial dataset located in the MAGIC repository:: 63 | 64 | import magic 65 | import pandas as pd 66 | import matplotlib.pyplot as plt 67 | X = pd.read_csv("MAGIC/data/test_data.csv") 68 | magic_operator = magic.MAGIC() 69 | X_magic = magic_operator.fit_transform(X, genes=['VIM', 'CDH1', 'ZEB1']) 70 | plt.scatter(X_magic['VIM'], X_magic['CDH1'], c=X_magic['ZEB1'], s=1, cmap='inferno') 71 | plt.show() 72 | magic.plot.animate_magic(X, gene_x='VIM', gene_y='CDH1', gene_color='ZEB1', operator=magic_operator) 73 | 74 | Help 75 | ==== 76 | 77 | If you have any questions or require assistance using MAGIC, please contact us at https://krishnaswamylab.org/get-help 78 | 79 | .. 
autoclass:: magic.MAGIC 80 | :members: 81 | :noindex: 82 | -------------------------------------------------------------------------------- /python/doc/source/installation.rst: -------------------------------------------------------------------------------- 1 | Installation 2 | ============ 3 | 4 | Python installation 5 | ------------------- 6 | 7 | Installation with `pip` 8 | ~~~~~~~~~~~~~~~~~~~~~~~ 9 | 10 | The Python version of MAGIC can be installed using:: 11 | 12 | pip install --user magic-impute 13 | 14 | Installation from source 15 | ~~~~~~~~~~~~~~~~~~~~~~~~ 16 | 17 | The Python version of MAGIC can be installed from GitHub by running the following from a terminal:: 18 | 19 | git clone --recursive https://github.com/KrishnaswamyLab/MAGIC.git 20 | cd MAGIC/python 21 | python setup.py install --user 22 | 23 | MATLAB installation 24 | ------------------- 25 | 26 | 1. The MATLAB version of MAGIC can be accessed using:: 27 | 28 | git clone https://github.com/KrishnaswamyLab/MAGIC.git 29 | cd MAGIC/matlab 30 | 31 | 2. Add the MAGIC/matlab directory to your MATLAB path and run any of our `run` or `test` scripts to get a feel for MAGIC. 32 | 33 | R installation 34 | -------------- 35 | 36 | In order to use MAGIC in R, you must also install the Python package. 37 | 38 | If `python` or `pip` are not installed, you will need to install them. We recommend Miniconda3_ to install Python and `pip` together, or otherwise you can install `pip` from https://pip.pypa.io/en/stable/installing/. 39 | 40 | Installation from CRAN 41 | ~~~~~~~~~~~~~~~~~~~~~~ 42 | 43 | In R, run this command to install MAGIC and all dependencies:: 44 | 45 | install.packages("Rmagic") 46 | 47 | In a terminal, run the following command to install the Python 48 | package:: 49 | 50 | pip install --user magic-impute 51 | 52 | ..
_Miniconda3: https://conda.io/miniconda.html 53 | 54 | Installation from source 55 | ~~~~~~~~~~~~~~~~~~~~~~~~ 56 | 57 | The latest source version of MAGIC can be accessed by running the following in a terminal:: 58 | 59 | git clone https://github.com/KrishnaswamyLab/MAGIC.git 60 | cd MAGIC/Rmagic 61 | R CMD INSTALL . 62 | cd ../python 63 | python setup.py install --user 64 | 65 | If the `Rmagic` folder is empty, you may have forgotten to use the `--recursive` option for `git clone`. You can rectify this by running the following in a terminal:: 66 | 67 | cd MAGIC 68 | git submodule init 69 | git submodule update 70 | cd Rmagic 71 | R CMD INSTALL . 72 | cd ../python 73 | python setup.py install --user 74 | -------------------------------------------------------------------------------- /python/doc/source/requirements.txt: -------------------------------------------------------------------------------- 1 | scikit-learn>=0.19.1 2 | numpy>=1.14.0 3 | pandas>=0.25 4 | scprep>=1.0 5 | scipy>=1.1.0 6 | matplotlib>=2.0.1 7 | future 8 | graphtools>=1.0.0 9 | sphinx 10 | sphinxcontrib-napoleon 11 | -------------------------------------------------------------------------------- /python/doc/source/tutorial.rst: -------------------------------------------------------------------------------- 1 | Tutorial 2 | -------- 3 | 4 | To run MAGIC on your dataset, create a MAGIC operator and run `fit_transform`.
Here we show an example with an artificial test dataset located in the MAGIC repository:: 5 | 6 | import magic 7 | import matplotlib.pyplot as plt 8 | import pandas as pd 9 | X = pd.read_csv("MAGIC/data/test_data.csv") 10 | magic_operator = magic.MAGIC() 11 | X_magic = magic_operator.fit_transform(X, genes=['VIM', 'CDH1', 'ZEB1']) 12 | plt.scatter(X_magic['VIM'], X_magic['CDH1'], c=X_magic['ZEB1'], s=1, cmap='inferno') 13 | plt.show() 14 | magic.plot.animate_magic(X, gene_x='VIM', gene_y='CDH1', gene_color='ZEB1', operator=magic_operator) 15 | 16 | A demo on MAGIC usage for single cell RNA-seq data can be found in this notebook_: `http://nbviewer.jupyter.org/github/KrishnaswamyLab/magic/blob/master/python/tutorial_notebooks/emt_tutorial.ipynb`__ 17 | 18 | .. _notebook: http://nbviewer.jupyter.org/github/KrishnaswamyLab/magic/blob/master/python/tutorial_notebooks/emt_tutorial.ipynb 19 | 20 | __ notebook_ 21 | 22 | A second tutorial analyzing myeloid and erythroid cells in mouse bone marrow is available here_: `http://nbviewer.jupyter.org/github/KrishnaswamyLab/magic/blob/master/python/tutorial_notebooks/bonemarrow_tutorial.ipynb`__ 23 | 24 | .. 
_here: http://nbviewer.jupyter.org/github/KrishnaswamyLab/magic/blob/master/python/tutorial_notebooks/bonemarrow_tutorial.ipynb 25 | 26 | __ here_ 27 | -------------------------------------------------------------------------------- /python/magic/__init__.py: -------------------------------------------------------------------------------- 1 | from .magic import MAGIC 2 | from .version import __version__ 3 | 4 | import magic.plot 5 | -------------------------------------------------------------------------------- /python/magic/after_magic_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/python/magic/after_magic_example.png -------------------------------------------------------------------------------- /python/magic/before_magic_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KrishnaswamyLab/MAGIC/3a4ffdbe435716bb3b2fbe78f434c6cdc8dd8d78/python/magic/before_magic_example.png -------------------------------------------------------------------------------- /python/magic/plot.py: -------------------------------------------------------------------------------- 1 | # (C) 2017 Krishnaswamy Lab GPLv2 2 | 3 | from .magic import MAGIC 4 | from .utils import in_ipynb 5 | from matplotlib import animation 6 | from matplotlib import rc 7 | 8 | import matplotlib.pyplot as plt 9 | import numbers 10 | import numpy as np 11 | import pandas as pd 12 | import scprep 13 | 14 | 15 | def _validate_gene(gene, data): 16 | if isinstance(gene, str): 17 | if not isinstance(data, pd.DataFrame): 18 | raise ValueError( 19 | "Non-integer gene names only valid with pd.DataFrame " 20 | "input. 
X is a {}, gene = {}".format(type(data).__name__, gene) 21 | ) 22 | if gene not in data.columns: 23 | raise ValueError("gene {} not found".format(gene)) 24 | elif gene is not None and not isinstance(gene, numbers.Integral): 25 | raise TypeError("Expected int or str. Got {}".format(type(gene).__name__)) 26 | return gene 27 | 28 | 29 | def animate_magic( 30 | data, 31 | gene_x, 32 | gene_y, 33 | gene_color=None, 34 | t_max=20, 35 | delay=2, 36 | operator=None, 37 | filename=None, 38 | ax=None, 39 | figsize=None, 40 | s=1, 41 | cmap="inferno", 42 | interval=200, 43 | dpi=100, 44 | ipython_html="jshtml", 45 | verbose=False, 46 | **kwargs, 47 | ): 48 | """Animate a gene-gene relationship with increased diffusion 49 | 50 | Parameters 51 | ---------- 52 | data : array-like 53 | Input data matrix 54 | gene_x : int or str 55 | Gene to put on the x axis 56 | gene_y : int or str 57 | Gene to put on the y axis 58 | gene_color : int or str, optional (default: None) 59 | Gene to color by. If None, no color vector is used 60 | t_max : int, optional (default: 20) 61 | maximum value of t to include in the animation 62 | delay : int, optional (default: 2) 63 | number of frames to dwell on the first frame before applying MAGIC 64 | operator : magic.MAGIC, optional (default: None) 65 | precomputed MAGIC operator. If None, one is created. 66 | filename : str, optional (default: None) 67 | If not None, saves a .gif or .mp4 with the output 68 | ax : `matplotlib.Axes` or None, optional (default: None) 69 | axis on which to plot. If None, an axis is created 70 | figsize : tuple, optional (default: None) 71 | Tuple of floats for creation of new `matplotlib` figure. Only used if 72 | `ax` is None.
73 | s : int, optional (default: 1) 74 | Point size 75 | cmap : str or callable, optional (default: 'inferno') 76 | Matplotlib colormap 77 | interval : float, optional (default: 200) 78 | Time in milliseconds between frames 79 | dpi : int, optional (default: 100) 80 | Dots per inch (image quality) in saved animation 81 | ipython_html : {'html5', 'jshtml'}, optional (default: 'jshtml') 82 | which html writer to use if using a Jupyter Notebook 83 | verbose : bool, optional (default: False) 84 | MAGIC operator verbosity 85 | **kwargs : arguments for MAGIC 86 | 87 | Returns 88 | ------- 89 | A Matplotlib animation showing diffusion of an edge with increased t 90 | """ 91 | if in_ipynb(): 92 | # credit to 93 | # http://tiao.io/posts/notebooks/save-matplotlib-animations-as-gifs/ 94 | rc("animation", html=ipython_html) 95 | 96 | if filename is not None: 97 | if filename.endswith(".gif"): 98 | writer = "imagemagick" 99 | elif filename.endswith(".mp4"): 100 | writer = "ffmpeg" 101 | else: 102 | raise ValueError( 103 | "filename must end in .gif or .mp4.
Got {}".format(filename) 104 | ) 105 | 106 | if operator is None: 107 | operator = MAGIC(verbose=verbose, **kwargs).fit(data) 108 | else: 109 | operator.set_params(verbose=verbose, **kwargs) 110 | gene_x = _validate_gene(gene_x, data) 111 | gene_y = _validate_gene(gene_y, data) 112 | gene_color = _validate_gene(gene_color, data) 113 | if gene_color is not None: 114 | genes = np.array([gene_x, gene_y, gene_color]) 115 | else: 116 | genes = np.array([gene_x, gene_y]) 117 | 118 | if isinstance(cmap, str): 119 | cmap = plt.get_cmap(cmap) # plt.cm.cmap_d was removed from recent Matplotlib 120 | 121 | if ax is None: 122 | fig, ax = plt.subplots(figsize=figsize) 123 | show = True 124 | else: 125 | fig = ax.get_figure() 126 | show = False 127 | 128 | data_magic = scprep.select.select_cols(data, idx=genes) 129 | data_magic = scprep.utils.toarray(data_magic) 130 | c = data_magic[gene_color] if gene_color is not None else None 131 | sc = ax.scatter(data_magic[gene_x], data_magic[gene_y], c=c, s=s, cmap=cmap) 132 | ax.set_title("t = 0") 133 | ax.set_xlabel(gene_x) 134 | ax.set_ylabel(gene_y) 135 | ax.set_xticks([]) 136 | ax.set_yticks([]) 137 | ax.set_xticklabels([]) 138 | ax.set_yticklabels([]) 139 | if gene_color is not None: 140 | plt.colorbar(sc, label=gene_color, ticks=[]) 141 | 142 | data_magic = [data] 143 | for t in range(t_max): 144 | operator.set_params(t=t + 1) 145 | data_magic.append(operator.transform(genes=genes)) 146 | 147 | def init(): 148 | return ax 149 | 150 | def animate(i): 151 | i = max(i - delay, 0) 152 | data_t = data_magic[i] 153 | data_t = data_t if isinstance(data, pd.DataFrame) else data_t.T 154 | sc.set_offsets(np.array([data_t[gene_x], data_t[gene_y]]).T) 155 | xlim = np.min(data_t[gene_x]), np.max(data_t[gene_x]) 156 | xrange = xlim[1] - xlim[0] 157 | ax.set_xlim(xlim[0] - xrange / 10, xlim[1] + xrange / 10) 158 | ylim = np.min(data_t[gene_y]), np.max(data_t[gene_y]) 159 | yrange = ylim[1] - ylim[0] 160 | ax.set_ylim(ylim[0] - yrange / 10, ylim[1] + yrange / 10) 161 | ax.set_title("t = {}".format(i))
162 | if gene_color is not None: 163 | color_t = data_t[gene_color] 164 | color_t -= np.min(color_t) 165 | color_t /= np.max(color_t) 166 | sc.set_facecolor(cmap(color_t)) 167 | return ax 168 | 169 | ani = animation.FuncAnimation( 170 | fig, 171 | animate, 172 | init_func=init, 173 | frames=range(t_max + delay + 1), 174 | interval=interval, 175 | blit=False, 176 | ) 177 | 178 | if filename is not None: 179 | ani.save(filename, writer=writer, dpi=dpi) 180 | 181 | if in_ipynb(): 182 | # credit to https://stackoverflow.com/a/45573903/3996580 183 | plt.close() 184 | elif show: 185 | plt.tight_layout() 186 | fig.show() 187 | 188 | return ani 189 | -------------------------------------------------------------------------------- /python/magic/utils.py: -------------------------------------------------------------------------------- 1 | from scipy import sparse 2 | 3 | import numbers 4 | import numpy as np 5 | import pandas as pd 6 | import scprep 7 | 8 | try: 9 | import anndata 10 | except (ImportError, SyntaxError): 11 | # anndata not installed 12 | pass 13 | 14 | 15 | def check_positive(**params): 16 | """Check that parameters are positive as expected. 17 | 18 | Raises 19 | ------ 20 | ValueError : unacceptable choice of parameters 21 | """ 22 | for p in params: 23 | if params[p] <= 0: 24 | raise ValueError("Expected {} > 0, got {}".format(p, params[p])) 25 | 26 | 27 | def check_int(**params): 28 | """Check that parameters are integers as expected. 29 | 30 | Raises 31 | ------ 32 | ValueError : unacceptable choice of parameters 33 | """ 34 | for p in params: 35 | if not isinstance(params[p], numbers.Integral): 36 | raise ValueError("Expected {} integer, got {}".format(p, params[p])) 37 | 38 | 39 | def check_if_not(x, *checks, **params): 40 | """Run checks only if parameters are not equal to a specified value. 
41 | 42 | Parameters 43 | ---------- 44 | 45 | x : excepted value 46 | Checks not run if parameters equal x 47 | 48 | checks : function 49 | Unnamed arguments, check functions to be run 50 | 51 | params : object 52 | Named arguments, parameters to be checked 53 | 54 | Raises 55 | ------ 56 | ValueError : unacceptable choice of parameters 57 | """ 58 | for p in params: 59 | if params[p] is not x and params[p] != x: 60 | [check(**{p: params[p]}) for check in checks] # pass the real parameter name so error messages report it 61 | 62 | 63 | def check_in(choices, **params): 64 | """Checks parameters are in a list of allowed parameters. 65 | 66 | Parameters 67 | ---------- 68 | 69 | choices : array-like, accepted values 70 | 71 | params : object 72 | Named arguments, parameters to be checked 73 | 74 | Raises 75 | ------ 76 | ValueError : unacceptable choice of parameters 77 | """ 78 | for p in params: 79 | if params[p] not in choices: 80 | raise ValueError( 81 | "{} value {} not recognized. Choose from {}".format( 82 | p, params[p], choices 83 | ) 84 | ) 85 | 86 | 87 | def check_between(v_min, v_max, **params): 88 | """Checks parameters are in a specified range.
89 | 90 | Parameters 91 | ---------- 92 | 93 | v_min : float, minimum allowed value (inclusive) 94 | 95 | v_max : float, maximum allowed value (inclusive) 96 | 97 | params : object 98 | Named arguments, parameters to be checked 99 | 100 | Raises 101 | ------ 102 | ValueError : unacceptable choice of parameters 103 | """ 104 | for p in params: 105 | if params[p] < v_min or params[p] > v_max: 106 | raise ValueError( 107 | "Expected {} between {} and {}, " 108 | "got {}".format(p, v_min, v_max, params[p]) 109 | ) 110 | 111 | 112 | def matrix_is_equivalent(X, Y): 113 | """Check matrix equivalence with numpy, scipy and pandas.""" 114 | if X is Y: 115 | return True 116 | elif X.shape == Y.shape: 117 | if sparse.issparse(X) or sparse.issparse(Y): 118 | X = scprep.utils.to_array_or_spmatrix(X) 119 | Y = scprep.utils.to_array_or_spmatrix(Y) 120 | elif isinstance(X, pd.DataFrame) and isinstance(Y, pd.DataFrame): 121 | return np.all(X == Y) 122 | elif not (sparse.issparse(X) and sparse.issparse(Y)): 123 | X = scprep.utils.toarray(X) 124 | Y = scprep.utils.toarray(Y) 125 | return np.allclose(X, Y) 126 | else: 127 | return np.allclose((X - Y).data, 0) 128 | else: 129 | return False 130 | 131 | 132 | def convert_to_same_format(data, target_data, columns=None, prevent_sparse=False): 133 | """Convert data to same format as target data.""" 134 | # create new data object 135 | if scprep.utils.is_sparse_dataframe(target_data): 136 | if prevent_sparse: 137 | data = pd.DataFrame(data) 138 | else: 139 | data = scprep.utils.SparseDataFrame(data) 140 | pandas = True 141 | elif isinstance(target_data, pd.DataFrame): 142 | data = pd.DataFrame(data) 143 | pandas = True 144 | elif is_anndata(target_data): 145 | data = anndata.AnnData(data) 146 | pandas = False 147 | else: 148 | # nothing to do 149 | return data 150 | # retrieve column names 151 | target_columns = target_data.columns if pandas else target_data.var 152 | # subset column names 153 | try: 154 | if columns is not None: 155 | if 
pandas: 156 | target_columns = target_columns[columns] 157 | else: 158 | target_columns = target_columns.iloc[columns] 159 | except (KeyError, IndexError, ValueError): 160 | # keep the original column names 161 | if pandas: 162 | target_columns = columns 163 | else: 164 | target_columns = pd.DataFrame(index=columns) 165 | # set column names on new data object 166 | if pandas: 167 | data.columns = target_columns 168 | data.index = target_data.index 169 | else: 170 | data.var = target_columns 171 | data.obs = target_data.obs 172 | return data 173 | 174 | 175 | def in_ipynb(): 176 | """Check if we are running in a Jupyter Notebook. 177 | 178 | Credit to https://stackoverflow.com/a/24937408/3996580 179 | """ 180 | __VALID_NOTEBOOKS = [ 181 | "", 182 | "", 183 | ] 184 | try: 185 | return str(type(get_ipython())) in __VALID_NOTEBOOKS 186 | except NameError: 187 | return False 188 | 189 | 190 | def is_anndata(data): 191 | """Check if an object is an AnnData object.""" 192 | try: 193 | return isinstance(data, anndata.AnnData) 194 | except NameError: 195 | # anndata not installed 196 | return False 197 | 198 | 199 | def has_empty_columns(data): 200 | """Check if an object has empty columns.""" 201 | try: 202 | return np.any(np.array(data.sum(0)) == 0) 203 | except AttributeError: 204 | if is_anndata(data): 205 | return np.any(np.array(data.X.sum(0)) == 0) 206 | else: 207 | raise 208 | -------------------------------------------------------------------------------- /python/magic/version.py: -------------------------------------------------------------------------------- 1 | __version__ = "3.0.0" 2 | -------------------------------------------------------------------------------- /python/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy>=1.14.0 2 | scipy>=1.1.0 3 | pandas>=0.25 4 | scprep>=1.0 5 | matplotlib 6 | scikit-learn>=0.19.1 7 | future 8 | tasklogger>=1.0.0 9 | graphtools>=1.4.0 10 | 
-------------------------------------------------------------------------------- /python/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import find_packages 2 | from setuptools import setup 3 | 4 | import os 5 | 6 | install_requires = [ 7 | "numpy>=1.14.0", 8 | "scipy>=1.1.0", 9 | "matplotlib", 10 | "scikit-learn>=0.19.1", 11 | "future", 12 | "tasklogger>=1.0.0", 13 | "graphtools>=1.4.0", 14 | "pandas>=0.25", 15 | "scprep>=1.0", 16 | ] 17 | 18 | test_requires = ["nose2", "anndata", "coverage", "coveralls"] 19 | 20 | doc_requires = [ 21 | "sphinx", 22 | "sphinxcontrib-napoleon", 23 | ] 24 | 25 | version_py = os.path.join(os.path.dirname(__file__), "magic", "version.py") 26 | version = open(version_py).read().strip().split("=")[-1].replace('"', "").strip() 27 | 28 | readme = open("README.rst").read() 29 | 30 | setup( 31 | name="magic-impute", 32 | version=version, 33 | description="MAGIC", 34 | author="", 35 | author_email="", 36 | packages=find_packages(), 37 | license="GNU General Public License Version 2", 38 | python_requires=">=3.6", 39 | install_requires=install_requires, 40 | extras_require={"test": test_requires, "doc": doc_requires}, 41 | test_suite="nose2.collector.collector", 42 | long_description=readme, 43 | url="https://github.com/KrishnaswamyLab/MAGIC", 44 | download_url="https://github.com/KrishnaswamyLab/MAGIC/archive/v{}.tar.gz".format( 45 | version 46 | ), 47 | keywords=[ 48 | "visualization", 49 | "big-data", 50 | "dimensionality-reduction", 51 | "embedding", 52 | "manifold-learning", 53 | "computational-biology", 54 | ], 55 | classifiers=[ 56 | "Development Status :: 5 - Production/Stable", 57 | "Environment :: Console", 58 | "Framework :: Jupyter", 59 | "Intended Audience :: Developers", 60 | "Intended Audience :: Science/Research", 61 | "Natural Language :: English", 62 | "Operating System :: MacOS :: MacOS X", 63 | "Operating System :: Microsoft :: Windows", 64 | "Operating System 
:: POSIX :: Linux", 65 | "Programming Language :: Python :: 2", 66 | "Programming Language :: Python :: 2.7", 67 | "Programming Language :: Python :: 3", 68 | "Programming Language :: Python :: 3.5", 69 | "Programming Language :: Python :: 3.6", 70 | "Topic :: Scientific/Engineering :: Bio-Informatics", 71 | ], 72 | ) 73 | -------------------------------------------------------------------------------- /python/test/test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | 4 | import magic 5 | import matplotlib as mpl 6 | import numpy as np 7 | import os 8 | import scprep 9 | 10 | mpl.use("agg") 11 | 12 | try: 13 | import anndata 14 | except (ImportError, SyntaxError): 15 | # anndata not installed 16 | pass 17 | 18 | 19 | data_path = os.path.join("..", "data", "test_data.csv") 20 | if not os.path.isfile(data_path): 21 | data_path = os.path.join("..", data_path) 22 | scdata = scprep.io.load_csv(data_path, cell_names=False) 23 | scdata = scprep.filter.filter_empty_cells(scdata) 24 | scdata = scprep.filter.filter_empty_genes(scdata) 25 | scdata = scprep.filter.filter_duplicates(scdata) 26 | scdata_norm = scprep.normalize.library_size_normalize(scdata) 27 | scdata_norm = scprep.transform.sqrt(scdata_norm) 28 | 29 | 30 | def test_genes_str_int(): 31 | magic_op = magic.MAGIC(t="auto", decay=20, knn=10, verbose=False) 32 | str_gene_magic = magic_op.fit_transform(scdata_norm, genes=["VIM", "ZEB1"]) 33 | int_gene_magic = magic_op.fit_transform( 34 | scdata_norm, graph=magic_op.graph, genes=[-2, -1] 35 | ) 36 | assert str_gene_magic.shape[0] == scdata_norm.shape[0] 37 | np.testing.assert_array_equal(str_gene_magic, int_gene_magic) 38 | 39 | 40 | def test_pca_only(): 41 | magic_op = magic.MAGIC(t="auto", decay=20, knn=10, verbose=False) 42 | pca_magic = magic_op.fit_transform(scdata_norm, genes="pca_only") 43 | assert pca_magic.shape[0] == scdata_norm.shape[0] 44 | assert pca_magic.shape[1] == magic_op.n_pca 45 
| 46 | 47 | def test_all_genes(): 48 | magic_op = magic.MAGIC(t="auto", decay=20, knn=10, verbose=False, random_state=42) 49 | int_gene_magic = magic_op.fit_transform(scdata_norm, genes=[-2, -1]) 50 | magic_all_genes = magic_op.fit_transform(scdata_norm, genes="all_genes") 51 | assert scdata_norm.shape == magic_all_genes.shape 52 | int_gene_magic2 = magic_op.transform(scdata_norm, genes=[-2, -1]) 53 | np.testing.assert_allclose(int_gene_magic, int_gene_magic2, rtol=0.015) 54 | 55 | 56 | def test_all_genes_approx(): 57 | magic_op = magic.MAGIC( 58 | t="auto", decay=20, knn=10, verbose=False, solver="approximate", random_state=42 59 | ) 60 | int_gene_magic = magic_op.fit_transform(scdata_norm, genes=[-2, -1]) 61 | magic_all_genes = magic_op.fit_transform(scdata_norm, genes="all_genes") 62 | assert scdata_norm.shape == magic_all_genes.shape 63 | int_gene_magic2 = magic_op.transform(scdata_norm, genes=[-2, -1]) 64 | np.testing.assert_allclose(int_gene_magic, int_gene_magic2, atol=0.003, rtol=0.008) 65 | 66 | 67 | def test_dremi(): 68 | magic_op = magic.MAGIC(t="auto", decay=20, knn=10, verbose=False) 69 | # test DREMI: need numerical precision here 70 | magic_op.set_params(random_state=42) 71 | magic_op.fit(scdata_norm) 72 | dremi = magic_op.knnDREMI("VIM", "ZEB1", plot=True) 73 | np.testing.assert_allclose(dremi, 1.466004, atol=0.0000005) 74 | 75 | 76 | def test_solver(): 77 | # Testing exact vs approximate solver 78 | magic_op = magic.MAGIC( 79 | t="auto", decay=20, knn=10, solver="exact", verbose=False, random_state=42 80 | ) 81 | data_imputed_exact = magic_op.fit_transform(scdata_norm) 82 | # should have exactly as many genes stored 83 | assert magic_op.X_magic.shape[1] == scdata_norm.shape[1] 84 | # should be nonzero 85 | assert np.all(data_imputed_exact >= 0) 86 | 87 | magic_op = magic.MAGIC( 88 | t="auto", 89 | decay=20, 90 | knn=10, 91 | n_pca=150, 92 | solver="approximate", 93 | verbose=False, 94 | random_state=42, 95 | ) 96 | # 
magic_op.set_params(solver='approximate') 97 | data_imputed_apprx = magic_op.fit_transform(scdata_norm) 98 | # should have n_pca genes stored 99 | assert magic_op.X_magic.shape[1] == 150 100 | # make sure they're close-ish 101 | np.testing.assert_allclose(data_imputed_apprx, data_imputed_exact, atol=0.15) 102 | # make sure they're not identical 103 | assert np.any(data_imputed_apprx != data_imputed_exact) 104 | 105 | 106 | def test_anndata(): 107 | try: 108 | anndata 109 | except NameError: 110 | # anndata not installed 111 | return 112 | scdata = anndata.read_csv(data_path) 113 | fast_magic_operator = magic.MAGIC( 114 | t="auto", solver="approximate", decay=None, knn=10, verbose=False 115 | ) 116 | sc_magic = fast_magic_operator.fit_transform(scdata, genes="all_genes") 117 | assert np.all(sc_magic.var_names == scdata.var_names) 118 | assert np.all(sc_magic.obs_names == scdata.obs_names) 119 | sc_magic = fast_magic_operator.fit_transform(scdata, genes=["VIM", "ZEB1"]) 120 | assert np.all(sc_magic.var_names.values == np.array(["VIM", "ZEB1"])) 121 | assert np.all(sc_magic.obs_names == scdata.obs_names) 122 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = 3 | # top-level module docstring 4 | D100, D104, 5 | # space before : conflicts with black 6 | E203 7 | per-file-ignores = 8 | # imported but unused 9 | __init__.py: F401 10 | # missing docstring in public function for methods, metrics, datasets 11 | openproblems/tasks/*/*/*.py: D103, E203 12 | openproblems/tasks/*/*/__init__.py: F401, D103 13 | max-line-length = 88 14 | exclude = 15 | .git, 16 | __pycache__, 17 | build, 18 | dist, 19 | Snakefile 20 | 21 | [isort] 22 | profile = black 23 | force_single_line = true 24 | force_alphabetical_sort = true 25 | --------------------------------------------------------------------------------