├── CONTRIBUTING.rst ├── LICENSE ├── README.md └── docs ├── README.md ├── date.txt └── source ├── .gitignore ├── Makefile ├── _static ├── create_client_id.png ├── create_git_project.png ├── custom.css ├── custom.js ├── igv_enable_google.png ├── log_into_google.png ├── open_rmarkdown.png ├── run_rmarkdown.png ├── set_working_directory.png └── upload_client_secrets.png ├── _templates └── page.html ├── add_message_read_on_rtd.py ├── api-client-python ├── getting-started-windows.rst ├── index.rst └── url-format.rst ├── api-client-r └── index.rst ├── auth_requirements.rst ├── common_api_flows.rst ├── conf.py ├── constants.rst ├── github_index.rst ├── includes ├── account_signup.rst ├── alpn_setup.rst ├── bioconductor_deployment_sidebar.rst ├── bioconductor_docker_details.rst ├── bioconductor_upload.rst ├── bioconductor_workshop_r_setup.rst ├── c2d_deployment_teardown.rst ├── collapsible_dataflow_setup.rst ├── collapsible_dataflow_setup_instructions.rst ├── collapsible_gcloud_setup.rst ├── collapsible_genomics_tools_setup.rst ├── collapsible_get_client_secrets.rst ├── collapsible_get_client_secrets_json.rst ├── collapsible_ld_dataflow_setup_instructions.rst ├── collapsible_spark_setup_instructions.rst ├── create_project.rst ├── dataflow_details.rst ├── dataflow_on_gce_run.rst ├── dataflow_on_gce_setup.rst ├── dataflow_setup.rst ├── gcloud_setup.rst ├── gcp_signup.rst ├── genomics_tools_setup.rst ├── get_client_secrets.rst ├── get_client_secrets_json.rst ├── get_client_secrets_steps.rst ├── getting-started-with-the-api.rst ├── grid-computing-tools-overview.rst ├── grid-computing-tools-run-your-own-overview.rst ├── grid-computing-tools-steps-check-logging.rst ├── grid-computing-tools-steps-create-cluster.rst ├── grid-computing-tools-steps-delete-cluster.rst ├── grid-computing-tools-steps-do-a-dry-run.rst ├── grid-computing-tools-steps-do-a-test-run.rst ├── grid-computing-tools-steps-download-grid-computing-repo.rst ├── grid-computing-tools-steps-monitoring-job-status.rst 
├── grid-computing-tools-steps-sizing-disks.rst ├── grid-computing-tools-steps-ssh-to-master.rst ├── grid-computing-tools-steps-upload-source.rst ├── grid-computing-tools-steps-upload-your-config.rst ├── grid-computing-tools-steps-viewing-log-files.rst ├── grid-computing-tools-steps-viewing-results.rst ├── grid-computing-tools-workstation-directory-structure.rst ├── igv_desktop_setup.rst ├── ld_dataflow_details.rst ├── spark_details.rst ├── spark_setup.rst └── tute_data.rst ├── index.rst ├── job_troubleshooting.rst ├── mailinglist.rst ├── make.bat ├── migrating_v1beta2_to_v1.rst ├── sections ├── access_data.rst ├── advanced_bigquery.rst ├── analyze_data.rst ├── learn_more.rst ├── process_data.rst └── select_genomic_data.rst ├── use_cases ├── analyze_reads │ ├── calculate_coverage.rst │ ├── count_reads.rst │ └── index.rst ├── analyze_variants │ ├── analyze_variants_with_bigquery.rst │ ├── analyze_variants_with_bigquery │ │ ├── FILTER_count.png │ │ ├── My_Project_left_hand_nav.png │ │ ├── My_Project_with_genomics_public_data.png │ │ ├── array_fields_example.png │ │ ├── call_count_for_call_set.png │ │ ├── calls_with_multiple_FILTER_values.png │ │ ├── count_high_quality_calls_per_sample.png │ │ ├── count_high_quality_variant_calls.png │ │ ├── count_true_variants_per_callset.png │ │ ├── count_true_variants_per_callset_2.png │ │ ├── true_variants_by_chromome_final.png │ │ ├── true_variants_by_chromome_pad_with_0.png │ │ ├── true_variants_by_chromosome_1.png │ │ ├── true_variants_by_chromosome_remove_chr.png │ │ ├── variants_table_details.png │ │ ├── variants_table_preview.png │ │ └── variants_table_schema.png │ ├── data_analysis_codelab.rst │ ├── gwas.rst │ ├── hardy_weinberg_equilibrium.rst │ ├── index.rst │ └── transition_transversion.rst ├── annotate_variants │ ├── TuteAnnotation.png │ ├── annovar.rst │ ├── bioconductor_annotation.rst │ ├── google_genomics_annotation.rst │ ├── index.rst │ ├── interval_joins.rst │ └── tute_annotation.rst ├── browse_genomic_data │ ├── 
beacon.rst │ ├── bioconductor.rst │ ├── gabrowse.rst │ └── igv.rst ├── build_your_own_api_client │ └── index.rst ├── compress_or_decompress_many_files │ └── index.rst ├── compute_identity_by_state │ └── index.rst ├── compute_principal_coordinate_analysis │ ├── 1-way-pca.rst │ ├── 2-way-pca.rst │ └── index.rst ├── discover_public_data │ ├── 1000_cannabis_genomes.rst │ ├── 1000_genomes.rst │ ├── annotations_toc.rst │ ├── clinvar_annotations.rst │ ├── cosmic_annotations.rst │ ├── dream_smc_dna.rst │ ├── genomic_data_toc.rst │ ├── index.rst │ ├── isb_cgc_data.rst │ ├── mssng_data.rst │ ├── pgp_public_data.rst │ ├── platinum_genomes.rst │ ├── reference_genomes.rst │ ├── simons_foundation.rst │ ├── supercentenarians.rst │ ├── tute_genomics_public_data.rst │ └── ucsc_annotations.rst ├── getting-started-with-the-api │ ├── go.rst │ ├── java.rst │ └── python.rst ├── linkage_disequilibrium │ ├── analyze_ld_results.rst │ ├── compute_linkage_disequilibrium.rst │ ├── index.rst │ ├── public_ld_datasets.rst │ └── transform_ld_results.rst ├── load_data │ ├── index.rst │ ├── load_variants.rst │ └── multi_sample_variants.rst ├── perform_quality_control_checks │ ├── index.rst │ ├── qc_codelab.rst │ └── verify_bam_id.rst ├── run_familiar_tools │ ├── bioconductor.rst │ ├── datalab.rst │ ├── galaxy.rst │ └── ncbiblast.rst ├── run_picard_and_gatk │ └── index.rst ├── run_pipelines_in_the_cloud │ ├── index.rst │ └── pipelines_api.rst ├── run_samtools_over_many_files │ └── index.rst └── setup_gridengine_cluster_on_compute_engine │ └── index.rst └── workshops └── bioc-2015.rst /CONTRIBUTING.rst: -------------------------------------------------------------------------------- 1 | Warning: Google Genomics is now Cloud Life Sciences. The Google Genomics Cookbook on Read the Docs is not actively maintained and may contain incorrect or outdated information. The cookbook is only available for historical reference. 
For the most up to date documentation, view the official Cloud Life Sciences documentation at https://cloud.google.com/life-sciences. 2 | 3 | Also note that much of the Genomics v1 API surface has been superseded by `Variant Transforms `_ and `htsget `_. 4 | 5 | How to contribute 6 | =================================== 7 | 8 | First of all, thank you for contributing! 9 | 10 | The mailing list 11 | ---------------- 12 | 13 | For general questions or if you are having trouble getting started, try the 14 | `Google Genomics Discuss mailing list `_. 15 | It's a good way to sync up with other people who use googlegenomics, including the core developers. You can subscribe 16 | by sending an email to ``google-genomics-discuss+subscribe@googlegroups.com`` or just post using 17 | the `web forum page `_. 18 | 19 | 20 | Submitting issues 21 | ----------------- 22 | 23 | If you are encountering a bug in the code or have a feature request in mind - file away! 24 | 25 | 26 | Submitting a pull request 27 | ------------------------- 28 | 29 | If you are ready to contribute code, GitHub provides a nice `overview on how to create a pull request 30 | `_. 31 | 32 | Some general rules to follow: 33 | 34 | * Do your work in `a fork `_ of this repo. 35 | * Create a branch for each update that you're working on. 36 | These branches are often called "feature" or "topic" branches. Any changes 37 | that you push to your feature branch will automatically be shown in the pull request. 38 | * Keep your pull requests as small as possible. Large pull requests are hard to review. 39 | Try to break up your changes into self-contained and incremental pull requests. 40 | * The first line of each commit message should be a short (<80 character) summary, 41 | followed by an empty line and then any details that you want to share about the commit. 42 | * Please try to follow the existing syntax style. 43 | 44 | When you submit or change your pull request, the Travis build system will automatically run tests. 
45 | If your pull request fails to pass tests, review the test log, make changes and 46 | then push them to your feature branch to be tested again. 47 | 48 | 49 | Contributor License Agreements 50 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 51 | 52 | All pull requests are welcome. Before we can submit them though, there is a legal hurdle we have to jump. 53 | You'll need to fill out either the individual or corporate Contributor License Agreement 54 | (CLA). 55 | 56 | * If you are an individual writing original source code and you're sure you 57 | own the intellectual property, then you'll need to sign an `individual CLA 58 | `_. 59 | * If you work for a company that wants to allow you to contribute your work, 60 | then you'll need to sign a `corporate CLA 61 | `_. 62 | 63 | Follow either of the two links above to access the appropriate CLA and 64 | instructions for how to sign and return it. Once we receive it, we'll be able to 65 | accept your pull requests. 66 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Warning: Google Genomics is now Cloud Life Sciences. The Google Genomics Cookbook on Read the Docs is not actively maintained and may contain incorrect or outdated information. The cookbook is only available for historical reference. For the most up to date documentation, view the official Cloud Life Sciences documentation at https://cloud.google.com/life-sciences. 2 | 3 | Also note that much of the Genomics v1 API surface has been superseded by [Variant Transforms](https://cloud.google.com/life-sciences/docs/how-tos/variant-transforms) and [htsget](https://cloud.google.com/life-sciences/docs/how-tos/reading-data-htsget). 4 | 5 | Welcome to the [Google Genomics](https://cloud.google.com/genomics) GitHub Organization! 6 | 7 | #### New to Google Genomics? 8 | 9 | 1. Watch this codelab walkthrough:
Google Genomics: Data Analysis Overview 12 | 2. [Try a few queries on public genomic data](https://github.com/googlegenomics/getting-started-bigquery) using Google BigQuery. 13 | 3. Browse the [Google Genomics Cookbook](http://googlegenomics.readthedocs.org/en/latest/index.html). 14 | * There are a *ton* of samples on GitHub for a variety of genomics use cases. 15 | * You'll find the samples described both by task (e.g., variant annotation) and technology (e.g., R, Python, BigQuery) in the cookbook. 16 | 17 | ##### This Repository 18 | 19 | **This** GitHub repository is source control for the content on http://googlegenomics.readthedocs.org. Read it there. Edit it here if you want to contribute! 20 | 21 | * See https://cloud.google.com/genomics to understand what Google Genomics is and how to get started. 22 | * See http://googlegenomics.readthedocs.org for an incrementally growing task-oriented cookbook. 23 | * See the README in each GitHub repository for specifics about the code. 24 | -------------------------------------------------------------------------------- /docs/README.md: -------------------------------------------------------------------------------- 1 | docs 2 | ==== 3 | 4 | The documentation for all of the repositories in googlegenomics is hosted on Read the Docs at http://googlegenomics.readthedocs.org. 5 | 6 | The text on that site is automatically pushed from this repo's 7 | [reStructuredText](http://sphinx-doc.org/rest.html) files. See [this quick tutorial](http://rest-sphinx-memo.readthedocs.org/en/latest/ReST.html) or the [Read the Docs documentation](https://docs.readthedocs.org/en/latest/index.html) for more info. 8 | 9 | Please help us improve these docs by [contributing](https://github.com/googlegenomics/docs/blob/master/CONTRIBUTING.rst)! 
10 | 11 | For documentation about the Google Genomics APIs themselves, see 12 | https://cloud.google.com/genomics/what-is-google-genomics and http://ga4gh.org 13 | 14 | ### Tips for local development 15 | 16 | If you want to render and view documentation on your local machine using the same theme 17 | as ReadTheDocs: 18 | 19 | (1) [Install the Sphinx RTD Theme](https://github.com/snide/sphinx_rtd_theme). 20 | 21 | (2) Locally modify conf.py. Do not check this in. 22 | ``` 23 | diff --git a/docs/source/conf.py b/docs/source/conf.py 24 | index c109c7f..61a0f27 100644 25 | --- a/docs/source/conf.py 26 | +++ b/docs/source/conf.py 27 | @@ -98,13 +98,13 @@ pygments_style = 'sphinx' 28 | 29 | # The theme to use for HTML and HTML Help pages. See the documentation for 30 | # a list of builtin themes. 31 | -html_theme = 'default' 32 | +#html_theme = 'default' 33 | 34 | #------------[ For Local Development ] ------------------------------------- 35 | # See https://github.com/snide/sphinx_rtd_theme for theme install instructions. 36 | -#import sphinx_rtd_theme 37 | -#html_theme = "sphinx_rtd_theme" 38 | -#html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] 39 | +import sphinx_rtd_theme 40 | +html_theme = "sphinx_rtd_theme" 41 | +html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] 42 | 43 | # Theme options are theme-specific and customize the look and feel of a theme 44 | # further. For a list of options available for each theme, see the 45 | ``` 46 | 47 | (3) Build the docs. 48 | 49 | ``` 50 | cd start-here/docs/source 51 | make html 52 | ``` 53 | 54 | (4) View the local files in your browser! 
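As an alternative to hand-editing `conf.py` in step (2), the theme choice could be made at build time. The following is only a sketch, not what this repo's `conf.py` does: it assumes Read the Docs exports `READTHEDOCS=True` in its build environment, and `select_html_theme` is a hypothetical helper name introduced here for illustration.

```python
import os


def select_html_theme(environ):
    """Return (html_theme, html_theme_path) for a Sphinx conf.py.

    Hypothetical helper, not part of this repo's conf.py.
    """
    # Assumption: Read the Docs sets READTHEDOCS=True for hosted builds,
    # where the 'default' theme setting is left untouched.
    if environ.get("READTHEDOCS") == "True":
        return "default", []
    try:
        # Local build: use the RTD theme only if it is installed.
        import sphinx_rtd_theme
    except ImportError:
        return "default", []
    return "sphinx_rtd_theme", [sphinx_rtd_theme.get_html_theme_path()]


# In conf.py these two names are the settings Sphinx reads.
html_theme, html_theme_path = select_html_theme(os.environ)
```

With something like this in place, a local `make html` would pick up the Read the Docs theme when it is installed and fall back to the default theme otherwise, so there would be no local diff to keep out of version control.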
55 | -------------------------------------------------------------------------------- /docs/date.txt: -------------------------------------------------------------------------------- 1 | Wed Mar 25 10:58:40 PDT 2015 2 | -------------------------------------------------------------------------------- /docs/source/.gitignore: -------------------------------------------------------------------------------- 1 | _build/ 2 | *~ -------------------------------------------------------------------------------- /docs/source/_static/create_client_id.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/create_client_id.png -------------------------------------------------------------------------------- /docs/source/_static/create_git_project.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/create_git_project.png -------------------------------------------------------------------------------- /docs/source/_static/custom.css: -------------------------------------------------------------------------------- 1 | /* NOTE: Add/update the date suffix to this file's filename if you want to ensure 2 | * folks pull in the most recent version instead of what is in their browser cache. 3 | * Then modify _templates/page.html to point to the updated files. 
4 | */ 5 | 6 | /* http://stackoverflow.com/questions/2454577/sphinx-restructuredtext-show-hide-code-snippets*/ 7 | .toggle .header { 8 | display: block; 9 | clear: both; 10 | } 11 | 12 | .toggle .container { 13 | border-top: 0px; 14 | margin-top: 0px; 15 | padding-top: 0px; 16 | /*background-color: red;*/ 17 | padding-bottom: 24px; 18 | } 19 | 20 | .toggle .header:after { 21 | content: " ▼"; 22 | } 23 | 24 | .toggle .header.open:after { 25 | content: " ▲"; 26 | } 27 | 28 | /* For http://sphinx-doc.org/config.html#confval-rst_epilog */ 29 | .ggfooter p { 30 | font-size: small; 31 | font-style: italic; 32 | } 33 | 34 | .visible-only-on-github { 35 | display: none; 36 | } 37 | 38 | /* 39 | * By default tables add horizontal scrollbars instead of wrapping. 40 | * It is hard to conceive of when you would want this. 41 | * Issue discussed and workaround provided here: 42 | * https://github.com/snide/sphinx_rtd_theme/issues/117 43 | */ 44 | .wy-table-responsive table td, .wy-table-responsive table th { 45 | white-space: normal !important; 46 | } 47 | -------------------------------------------------------------------------------- /docs/source/_static/custom.js: -------------------------------------------------------------------------------- 1 | // Open all links in another window. 2 | $(document).ready(function() { 3 | $("a[href^='http']").attr('target','_blank'); 4 | }); 5 | 6 | // implement collapsible containers for content. 
7 | jQuery($(document).ready(function() { 8 | $(".toggle > *").hide(); 9 | $(".toggle .header").show(); 10 | $(".toggle .header").click(function() { 11 | $(this).parent().children().not(".header").toggle(400); 12 | $(this).parent().children(".header").toggleClass("open"); 13 | }) 14 | })); 15 | -------------------------------------------------------------------------------- /docs/source/_static/igv_enable_google.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/igv_enable_google.png -------------------------------------------------------------------------------- /docs/source/_static/log_into_google.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/log_into_google.png -------------------------------------------------------------------------------- /docs/source/_static/open_rmarkdown.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/open_rmarkdown.png -------------------------------------------------------------------------------- /docs/source/_static/run_rmarkdown.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/run_rmarkdown.png -------------------------------------------------------------------------------- /docs/source/_static/set_working_directory.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/set_working_directory.png -------------------------------------------------------------------------------- /docs/source/_static/upload_client_secrets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/_static/upload_client_secrets.png -------------------------------------------------------------------------------- /docs/source/_templates/page.html: -------------------------------------------------------------------------------- 1 | {% extends "!page.html" %} 2 | 3 | {% set css_files = css_files + ["_static/custom.css"] %} 4 | 5 | {% set script_files = script_files + ["_static/custom.js"] %} 6 | -------------------------------------------------------------------------------- /docs/source/api-client-python/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | The Python client 14 | ------------------- 15 | 16 | The `api-client-python `_ project 17 | provides a simple genome browser that pulls data from the Genomics API. 18 | 19 | .. toctree:: 20 | :maxdepth: 2 21 | 22 | getting-started-windows 23 | url-format 24 | 25 | 26 | The Python client does not currently use 27 | `Google's Python client library `_. 28 | If you want to use the client library, the 29 | `method documentation `_ 30 | for genomics can be very useful. 31 | -------------------------------------------------------------------------------- /docs/source/api-client-python/url-format.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | GABrowse URL format 14 | ------------------- 15 | 16 | The genome browser code supports direct linking to specific backends, readsets, and genomic positions. 17 | 18 | These parameters are set `using the hash `_. 19 | The format is very simple with only 3 supported key value pairs separated by ``&`` and then ``=``: 20 | 21 | * backend 22 | 23 | The backend to use for API calls. 
example: ``GOOGLE`` or ``NCBI`` 24 | 25 | * readsetId 26 | 27 | The ID of the readset that should be loaded. See :doc:`/constants` for more information. 28 | 29 | * location 30 | 31 | The genomic position to display at. Takes the form of ``:``. example: ``14:25419886`` 32 | This can also be an `RS ID `_ 33 | or a string that will be searched on `snpedia `_. 34 | 35 | As you navigate in the browser (either locally or at http://gabrowse.appspot.com), 36 | the hash will automatically populate to include these parameters. 37 | But you can also manually create a direct link without having to go through the UI. 38 | 39 | Putting all the pieces together, here is what a valid URL looks like:: 40 | 41 | http://gabrowse.appspot.com/#backend=GOOGLE&readsetId=CPHG3MzoCRDY5IrcqZq8hMIB&location=14:25419886 42 | -------------------------------------------------------------------------------- /docs/source/api-client-r/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | The R client 14 | ------------ 15 | 16 | The `GoogleGenomics Bioconductor package`_ provides R methods to search for and retrieve Reads and Variants stored in the Google Genomics API. 
17 | 18 | Additionally it provides converters to `Bioconductor`_ datatypes such as: 19 | 20 | * `GAlignments `_ 21 | * `GRanges `_ 22 | * `VRanges `_ 23 | -------------------------------------------------------------------------------- /docs/source/auth_requirements.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | API authorization requirements 15 | ------------------------------ 16 | 17 | Calls to the Google Genomics API can be made with 18 | `OAuth `_ or with an 19 | `API key `_. 20 | 21 | * To access private data or to make any write calls, an API request needs to be authenticated with OAuth. 22 | * Read-only calls to public data only require an API key to identify the calling project. (OAuth will also work) 23 | 24 | Some APIs are still in the testing phase. 25 | The following lays out where each API call stands and also indicates whether a call 26 | supports requests without OAuth. 
27 | 28 | 29 | Available APIs 30 | ~~~~~~~~~~~~~~ 31 | 32 | ============================================= ============== 33 | API method OAuth required 34 | ============================================= ============== 35 | Get, List and Search methods (except on Jobs) False 36 | Create, Delete, Patch and Update methods True 37 | Import and Export methods True 38 | All Job methods True 39 | ============================================= ============== 40 | 41 | 42 | APIs in testing 43 | ~~~~~~~~~~~~~~~ 44 | 45 | ======================== ============== 46 | API method OAuth required 47 | ======================== ============== 48 | genomics.experimental.* True 49 | ======================== ============== 50 | -------------------------------------------------------------------------------- /docs/source/constants.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Important constants and links 14 | ----------------------------- 15 | 16 | Google's base API url is: 17 | ``https://genomics.googleapis.com/v1`` 18 | 19 | More information on the API can be found at: 20 | https://cloud.google.com/genomics and http://ga4gh.org 21 | 22 | To test Google's compliance with the GA4GH API, you can use the compliance tests: 23 | https://github.com/ga4gh/compliance 24 | 25 | To get a list of public datasets that can be used with Google's API calls, you can 26 | `use the APIs explorer `_ 27 | or :doc:`/use_cases/discover_public_data/index`. 28 | -------------------------------------------------------------------------------- /docs/source/includes/account_signup.rst: -------------------------------------------------------------------------------- 1 | If you don't already have one, `sign up for a Google Account `_ 2 | -------------------------------------------------------------------------------- /docs/source/includes/alpn_setup.rst: -------------------------------------------------------------------------------- 1 | If you want to run a small pipeline on your machine before running it in parallel on Compute Engine, you will need `ALPN`_ since many of these pipelines require it. When running locally, this must be provided on the boot classpath but when running on Compute Engine Dataflow workers this is already configured for you. You can download it from `here `__. For example:: 2 | 3 | wget -O alpn-boot.jar \ 4 | http://central.maven.org/maven2/org/mortbay/jetty/alpn/alpn-boot/8.1.8.v20160420/alpn-boot-8.1.8.v20160420.jar 5 | 6 | -------------------------------------------------------------------------------- /docs/source/includes/bioconductor_deployment_sidebar.rst: -------------------------------------------------------------------------------- 1 | .. 
sidebar:: Details 2 | 3 | This will create a virtual machine on Google Cloud Platform with a locked-down network (only SSH port 22 open). Your local machine will securely connect to the VM via an SSH tunnel. 4 | 5 | Within the Docker container the directory ``/home/rstudio/data`` will correspond to directory ``/mnt/data`` on the virtual machine. This is where the persistent data disk is attached to the VM. **Store important files there.** Docker containers are stateless, so if the container restarts for any reason, then files you created within the container will be lost. 6 | -------------------------------------------------------------------------------- /docs/source/includes/bioconductor_docker_details.rst: -------------------------------------------------------------------------------- 1 | 2 | To run the Docker container locally: 3 | 4 | 1. Install `Docker`_ for your platform. 5 | 2. Run command ``docker run gcr.io/bioc_2015/devel_sequencing`` 6 | 7 | See https://github.com/googlegenomics/gce-images for the Dockerfile. It depends upon http://www.bioconductor.org/help/docker/ which depends upon https://github.com/rocker-org/rocker/wiki. 8 | 9 | Note that the image is large, over ``4GB``, since it is derived from the `Bioconductor Sequencing view `_ and contains many annotation databases. 10 | -------------------------------------------------------------------------------- /docs/source/includes/bioconductor_upload.rst: -------------------------------------------------------------------------------- 1 | Upload ``client_secrets.json``. From the RStudio *Files Pane* click on the "Upload" button. 2 | 3 | .. image:: /_static/upload_client_secrets.png 4 | 5 | :alt: Upload Client Secrets 6 | -------------------------------------------------------------------------------- /docs/source/includes/bioconductor_workshop_r_setup.rst: -------------------------------------------------------------------------------- 1 | .. code:: 2 | 3 | # Install BiocInstaller. 
4 | source("http://bioconductor.org/biocLite.R") 5 | # See http://www.bioconductor.org/developers/how-to/useDevel/ 6 | useDevel() 7 | # Install devtools which is needed for the special use of biocLite() below. 8 | biocLite("devtools") 9 | # Install the workshop material. 10 | biocLite("googlegenomics/bioconductor-workshop-r", build_vignettes=TRUE, dependencies=TRUE) 11 | -------------------------------------------------------------------------------- /docs/source/includes/c2d_deployment_teardown.rst: -------------------------------------------------------------------------------- 1 | If you would like to pause your VM when not using it: 2 | 3 | 1. Go to the Google Cloud Platform Console and select your project: https://console.cloud.google.com/project/_/compute/instances 4 | 2. Click on the checkbox next to your VM. 5 | 3. Click on *Stop* to pause your VM. 6 | 4. When you are ready to use it again, *Start* your VM. For more detail, see: https://cloud.google.com/compute/docs/instances/stopping-or-deleting-an-instance 7 | 8 | If you want to delete your deployment: 9 | 10 | 1. First copy any data off of the data disk that you wish to keep. **The data disk will be deleted when the deployment is deleted.** 11 | 2. Click on `Deployments`_ to navigate to your deployment and delete it. 12 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_dataflow_setup.rst: -------------------------------------------------------------------------------- 1 | .. container:: toggle 2 | 3 | .. container:: header 4 | 5 | To *launch* the job from your local machine: **Show/Hide Instructions** 6 | 7 | .. container:: content 8 | 9 | .. include:: /includes/dataflow_setup.rst 10 | 11 | .. container:: toggle 12 | 13 | .. container:: header 14 | 15 | To *launch* the job from Google Cloud Shell: **Show/Hide Instructions** 16 | 17 | .. container:: content 18 | 19 | .. include:: /includes/dataflow_on_gce_setup.rst 20 | 21 | .. 
include:: /includes/alpn_setup.rst 22 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_dataflow_setup_instructions.rst: -------------------------------------------------------------------------------- 1 | .. include:: /includes/collapsible_dataflow_setup.rst 2 | 3 | Download the latest GoogleGenomics dataflow **runnable** jar from the `Maven Central Repository `_. For example:: 4 | 5 | wget -O google-genomics-dataflow-runnable.jar \ 6 | "https://search.maven.org/remotecontent?filepath=com/google/cloud/genomics/google-genomics-dataflow/v1-0.1/google-genomics-dataflow-v1-0.1-runnable.jar" 7 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_gcloud_setup.rst: -------------------------------------------------------------------------------- 1 | .. container:: toggle 2 | 3 | .. container:: header 4 | 5 | If you have not done so before, install the gcloud tool: **Show/Hide Instructions** 6 | 7 | .. container:: content 8 | 9 | .. include:: /includes/gcloud_setup.rst 10 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_genomics_tools_setup.rst: -------------------------------------------------------------------------------- 1 | .. container:: toggle 2 | 3 | .. container:: header 4 | 5 | Sign up and set up access to genomics data in Google Cloud: **Show/Hide Instructions** 6 | 7 | .. container:: content 8 | 9 | .. include:: /includes/genomics_tools_setup.rst 10 | 11 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_get_client_secrets.rst: -------------------------------------------------------------------------------- 1 | .. container:: toggle 2 | 3 | .. container:: header 4 | 5 | If you do not have it already, get your ``client ID`` and ``client secret``: **Show/Hide Instructions** 6 | 7 | .. 
container:: content 8 | 9 | .. include:: /includes/get_client_secrets.rst 10 | 11 | 12 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_get_client_secrets_json.rst: -------------------------------------------------------------------------------- 1 | .. container:: toggle 2 | 3 | .. container:: header 4 | 5 | If you do not have it already, get your ``client_secrets.json`` file: **Show/Hide Instructions** 6 | 7 | .. container:: content 8 | 9 | .. include:: /includes/get_client_secrets_json.rst 10 | 11 | 12 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_ld_dataflow_setup_instructions.rst: -------------------------------------------------------------------------------- 1 | .. include:: /includes/collapsible_dataflow_setup.rst 2 | 3 | Build the Linkage Disequilibrium jar:: 4 | 5 | git clone https://github.com/googlegenomics/linkage-disequilibrium.git 6 | cd linkage-disequilibrium 7 | mvn package 8 | -------------------------------------------------------------------------------- /docs/source/includes/collapsible_spark_setup_instructions.rst: -------------------------------------------------------------------------------- 1 | .. container:: toggle 2 | 3 | .. container:: header 4 | 5 | Deploy and configure the cluster: **Show/Hide Instructions** 6 | 7 | .. container:: content 8 | 9 | .. include:: /includes/spark_setup.rst 10 | -------------------------------------------------------------------------------- /docs/source/includes/create_project.rst: -------------------------------------------------------------------------------- 1 | If you do not yet have a cloud project, `create a Genomics and Cloud Storage enabled project via the Google Cloud Platform Console `_. 
2 | -------------------------------------------------------------------------------- /docs/source/includes/dataflow_details.rst: -------------------------------------------------------------------------------- 1 | |dataflowADC| 2 | 3 | Use ``--help`` to get more information about the command line options. Change 4 | the pipeline class name below to match the one you would like to run. 5 | 6 | .. code-block:: shell 7 | 8 | java -cp google-genomics-dataflow*runnable.jar \ 9 | com.google.cloud.genomics.dataflow.pipelines.VariantSimilarity --help 10 | 11 | See the source code for implementation details: https://github.com/googlegenomics/dataflow-java 12 | -------------------------------------------------------------------------------- /docs/source/includes/dataflow_on_gce_run.rst: -------------------------------------------------------------------------------- 1 | The above command line runs the pipeline locally over a small portion of the genome, only taking a few minutes. If modified to run over a larger portion of the genome or the entire genome, it may take a few hours depending upon how many virtual machines are configured to run concurrently via ``--numWorkers``. Add the following additional command line parameters to run the pipeline on Google Cloud instead of locally:: 2 | 3 | --runner=DataflowPipelineRunner \ 4 | --project=YOUR-GOOGLE-CLOUD-PLATFORM-PROJECT-ID \ 5 | --stagingLocation=gs://YOUR-BUCKET/dataflow-staging \ 6 | --numWorkers=# 7 | -------------------------------------------------------------------------------- /docs/source/includes/dataflow_on_gce_setup.rst: -------------------------------------------------------------------------------- 1 | If you do not have Java on your local machine, the following setup instructions will allow you to *launch* Dataflow jobs using the `Google Cloud Shell`_: 2 | 3 | #. If you have not already done so, follow the `Genomics Quickstart`_. 4 | 5 | #. If you have not already done so, follow the `Dataflow Quickstart`_. 
6 | 7 | #. Use the `Cloud Console`_ to activate the `Google Cloud Shell`_. 8 | 9 | #. Run the following commands in the Cloud Shell to install `Java 8`_. 10 | 11 | .. code-block:: shell 12 | 13 | sudo apt-get update 14 | sudo apt-get install --assume-yes openjdk-8-jdk maven 15 | sudo update-alternatives --config java 16 | sudo update-alternatives --config javac 17 | 18 | .. note:: 19 | 20 | Depending on the pipeline, Cloud Shell may not have sufficient memory to run the pipeline locally (i.e., without the ``--runner`` command line flag). If you get the error ``java.lang.OutOfMemoryError: Java heap space``, follow the instructions to run the pipeline using Compute Engine Dataflow workers instead of locally (e.g., use ``--runner=DataflowPipelineRunner``). 21 | -------------------------------------------------------------------------------- /docs/source/includes/dataflow_setup.rst: -------------------------------------------------------------------------------- 1 | Most users *launch* Dataflow jobs from their local machine. This is unrelated to where the job itself actually runs (which is controlled by the ``--runner`` parameter). Either way, `Java 8`_ is needed to run the jar that kicks off the job. 2 | 3 | #. If you have not already done so, follow the `Genomics Quickstart`_. 4 | 5 | #. If you have not already done so, follow the `Dataflow Quickstart`_, including `installing gcloud `_ and running ``gcloud init``. 6 | -------------------------------------------------------------------------------- /docs/source/includes/gcloud_setup.rst: -------------------------------------------------------------------------------- 1 | Follow the Windows, Mac OS X or Linux instructions to install gcloud on your local machine: https://cloud.google.com/sdk/ 2 | 3 | * Download and install the Google Cloud SDK by running this command in your shell or Terminal: 4 | 5 | .. 
code-block:: shell 6 | 7 | curl https://sdk.cloud.google.com | bash 8 | 9 | Or, you can download `google-cloud-sdk.zip `_ or `google-cloud-sdk.tar.gz `_, unpack it, and launch the *./google-cloud-sdk/install.sh* script. 10 | 11 | Restart your shell or Terminal. 12 | 13 | * Authenticate: 14 | 15 | .. code-block:: shell 16 | 17 | $ gcloud auth login 18 | 19 | * Configure the project: 20 | 21 | .. code-block:: shell 22 | 23 | $ gcloud config set project 24 | 25 | -------------------------------------------------------------------------------- /docs/source/includes/gcp_signup.rst: -------------------------------------------------------------------------------- 1 | .. sidebar:: Details 2 | 3 | If you already have a Google Cloud Platform project, this link will take you to your list of projects. 4 | 5 | Sign up for Google Cloud Platform by clicking on this link: https://console.cloud.google.com/billing/freetrial 6 | -------------------------------------------------------------------------------- /docs/source/includes/genomics_tools_setup.rst: -------------------------------------------------------------------------------- 1 | These instructions are based on `Genomics Tools tutorial `_. 2 | 3 | Set up your account and a cloud project 4 | --------------------------------------- 5 | 6 | .. include:: /includes/account_signup.rst 7 | 8 | .. include:: /includes/gcp_signup.rst 9 | 10 | .. include:: /includes/create_project.rst 11 | 12 | Install gcloud tool and validate access to genomics data 13 | -------------------------------------------------------- 14 | 15 | .. include:: /includes/collapsible_gcloud_setup.rst 16 | 17 | * Install the Genomics tools: 18 | 19 | .. code-block:: shell 20 | 21 | $ gcloud components update alpha 22 | 23 | * Confirm the access to Genomics data works: 24 | 25 | .. 
code-block:: shell 26 | 27 | $ gcloud alpha genomics readgroupsets list 10473108253681171589 --limit 10 28 | ID NAME REFERENCE_SET_ID 29 | CMvnhpKTFhDq9e2Yy9G-Bg HG02573 EOSt9JOVhp3jkwE 30 | CMvnhpKTFhCEmf_d_o_JCQ HG03894 EOSt9JOVhp3jkwE 31 | ... 32 | 33 | Set up credentials for programs accessing the genomics data 34 | ----------------------------------------------------------- 35 | 36 | .. include:: /includes/collapsible_get_client_secrets_json.rst 37 | 38 | Copy **client_secrets.json** to the directory where you installed the Genomics tools. 39 | 40 | The first time you query the API you will be authenticated using the values in the client_secrets file you downloaded. After this initial authentication, the Genomics tools save a token to use during subsequent API requests. 41 | -------------------------------------------------------------------------------- /docs/source/includes/get_client_secrets.rst: -------------------------------------------------------------------------------- 1 | Get your ``client_id`` and ``client_secret`` by visiting the following page: 2 | 3 | .. include:: /includes/get_client_secrets_steps.rst 4 | 5 | You can find these values at any time by returning to the Credentials tab 6 | and clicking on the name you specified in step 4. 7 | -------------------------------------------------------------------------------- /docs/source/includes/get_client_secrets_json.rst: -------------------------------------------------------------------------------- 1 | Get your ``client_secrets.json`` file by visiting the following page: 2 | 3 | .. include:: /includes/get_client_secrets_steps.rst 4 | 5 | To download the ``client_secrets.json`` file: 6 | 7 | 1. Select **OK** to close the dialog 8 | 2. Select the name of your new client ID (which you specified in step 4) 9 | 3. Select **Download JSON** 10 | 11 | Note that by convention the downloaded file is referred to as 12 | ``client_secrets.json`` though the actual file name is much longer. 
13 | -------------------------------------------------------------------------------- /docs/source/includes/get_client_secrets_steps.rst: -------------------------------------------------------------------------------- 1 | https://console.cloud.google.com/project/_/apiui/credential 2 | 3 | After you select your Google Cloud project, this link will 4 | automatically take you to the Credentials tab under the API Manager. 5 | 6 | 1. Select **New credentials** 7 | 2. Select **OAuth client ID** 8 | 9 | If prompted, select **Configure consent screen**, and follow the 10 | instructions to set a "product name" to identify your Cloud project in the 11 | consent screen. Choose "Save". 12 | 13 | 3. Under **Application Type** choose **Other** 14 | 4. Give your client ID a name, so you can remember why it was created (suggestion: |suggested_client_id_name|) 15 | 5. Select **Create** 16 | 17 | After successful creation, the interface should display your client ID 18 | and client secret. 19 | -------------------------------------------------------------------------------- /docs/source/includes/getting-started-with-the-api.rst: -------------------------------------------------------------------------------- 1 | Data stored in Google Genomics is accessible via the `Google Genomics API`_. 2 | This means that any programming language that can make network requests over 3 | HTTPS can be used to access it. 4 | 5 | We have `examples in github `_, which can help get you started. 6 | The code within each language-specific folder demonstrates the same things: 7 | 8 | * Getting the read bases for NA12872 at a specific position 9 | * Getting the variant overlapping that same position, and outputting the genotype 10 | 11 | If this is what you are looking for, then take a look at the 12 | |language-link| example. 
13 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-overview.rst: -------------------------------------------------------------------------------- 1 | This tool takes advantage of two key technologies to process 2 | a large number of files: 3 | 4 | * `Google Compute Engine`_ 5 | * `Grid Engine`_ (SGE) 6 | 7 | Google Compute Engine provides virtual machines in the cloud. With sufficient 8 | quota in your Google Cloud project, you can start dozens or hundreds of 9 | instances concurrently. The more instances you add to your cluster, the more 10 | quickly you can process your files. 11 | 12 | Grid Engine is used to distribute the file operation tasks across 13 | all of the instances such that each instance takes the responsibility 14 | to download a single file, run the operation, and upload it back to 15 | Cloud Storage. 16 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-run-your-own-overview.rst: -------------------------------------------------------------------------------- 1 | #. Create an ``input list file`` 2 | #. Create a ``job config file`` 3 | #. Create a gridengine cluster with sufficient disk space attached to each ``compute`` node 4 | #. Upload input list file, config file, and ``grid-computing-tools`` source to the gridengine cluster master 5 | #. Do a "dry run" (*optional*) 6 | #. Do a "test run" (*optional*) 7 | #. Launch the job 8 | 9 | The following instructions provide guidance on each of these steps. 10 | It is recommended, though not a requirement, that you save your 11 | ``input list file`` and ``job config file`` 12 | to a directory outside the ``grid-computing-tools`` directory. 13 | For example, you might create a directory 14 | ``${WS_ROOT}/my_jobs``. 
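The suggested layout can be sketched in shell. This is a minimal sketch, not part of ``grid-computing-tools``: the function name ``init_workspace`` is hypothetical, and only the ``my_jobs`` directory name comes from the example above.

```shell
# Hypothetical helper (not part of grid-computing-tools): create a my_jobs
# directory under the workspace root to hold the input list file and the
# job config file, outside the grid-computing-tools directory.
init_workspace() {
  ws_root="${1:?usage: init_workspace WS_ROOT}"
  mkdir -p "${ws_root}/my_jobs" || return 1
  # Print the directory so callers can capture it.
  printf '%s\n' "${ws_root}/my_jobs"
}
```

For example, ``init_workspace "${WS_ROOT}"`` prints the path of the new ``my_jobs`` directory.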
15 | 16 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-check-logging.rst: -------------------------------------------------------------------------------- 1 | **Checking the logging output of tasks** 2 | 3 | Each gridengine task will write to an "output" file and an "error" file. 4 | These files will be located in the directory the job was launched from (the ``HOME`` directory). 5 | The files will be named respectively: 6 | 7 | * *job_name*.\ **o**\ *job_id*.\ *task_id* (for example: ``my-job.o1.10``) 8 | * *job_name*.\ **e**\ *job_id*.\ *task_id* (for example: ``my-job.e1.10``) 9 | 10 | |br| 11 | The error file will contain any unexpected error output, and will also 12 | contain any download and upload logging from ``gsutil``. 13 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-create-cluster.rst: -------------------------------------------------------------------------------- 1 | **Create a cluster of Compute Engine instances running Grid Engine** 2 | 3 | In your current shell: 4 | 5 | a. ``cd ${WS_ROOT}`` 6 | b. Follow the instructions to 7 | :doc:`/use_cases/setup_gridengine_cluster_on_compute_engine/index` 8 | 9 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-delete-cluster.rst: -------------------------------------------------------------------------------- 1 | **Destroying the cluster** 2 | 3 | When you are finished running the samples, disconnect from the master instance 4 | and from your workstation shut down the gridengine cluster: 5 | 6 | .. 
code-block:: shell 7 | 8 | elasticluster stop gridengine 9 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-do-a-dry-run.rst: -------------------------------------------------------------------------------- 1 | **Do a "dry run"** (*optional*) 2 | 3 | The tool supports the ``DRYRUN`` environment variable. 4 | Setting this value to 1 when launching your job will cause the queued 5 | job to execute *without downloading or uploading* any files. 6 | 7 | The local output files, however, will be populated with useful information 8 | about what files *would* be copied. This can be useful for ensuring your 9 | file list is valid and that the output path is correct. 10 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-do-a-test-run.rst: -------------------------------------------------------------------------------- 1 | **Do a "test run"** (*optional*) 2 | 3 | The tool supports environment variables to indicate which lines in the input 4 | list to run: 5 | 6 | * LAUNCH_MIN - lowest line number to process 7 | * LAUNCH_MAX - highest line number to process 8 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-download-grid-computing-repo.rst: -------------------------------------------------------------------------------- 1 | **Download the** ``grid-computing-tools`` **repository (if you have not already done so)** 2 | 3 | .. 
code-block:: shell 4 | 5 | cd ${WS_ROOT} 6 | git clone https://github.com/googlegenomics/grid-computing-tools.git 7 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-monitoring-job-status.rst: -------------------------------------------------------------------------------- 1 | **Monitoring the status of your job** 2 | 3 | Grid Engine provides the ``qstat`` command to get the status of the execution queue. 4 | 5 | While the job is in the queue, the `state` column will indicate the status of each task. 6 | Tasks not yet allocated to a ``compute`` node will be collapsed into a single row as in the following output: 7 | 8 | .. code-block:: shell 9 | 10 | $ qstat 11 | job-ID prior name user state submit/start at queue slots ja-task-ID 12 | ------------------------------------------------------------------------------------------------ 13 | 1 0.00000 my-job janedoe qw 06/16/2015 18:03:32 1 1-6:1 14 | 15 | The above output indicates that tasks **1-6** of job **1** are all in a ``qw`` (queue waiting) state. 16 | 17 | When tasks get allocated, the output will look something like: 18 | 19 | .. code-block:: shell 20 | 21 | $ qstat 22 | job-ID prior name user state submit/start at queue slots ja-task-ID 23 | ------------------------------------------------------------------------------------------------ 24 | 1 0.50000 my-job janedoe r 06/16/2015 18:03:45 all.q@compute002 1 1 25 | 1 0.50000 my-job janedoe r 06/16/2015 18:03:45 all.q@compute001 1 2 26 | 1 0.50000 my-job janedoe r 06/16/2015 18:03:45 all.q@compute003 1 3 27 | 1 0.00000 my-job janedoe qw 06/16/2015 18:03:32 1 4-6:1 28 | 29 | which indicates tasks **1-3** are all in the ``r`` (running) state, while tasks **4-6** remain in a waiting state. 30 | 31 | When all tasks have completed ``qstat`` will produce no output. 
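The ``qstat`` checks above can be scripted. The following is a minimal sketch, assuming the column layout shown in the sample output (state in the fifth column, two header lines); the function names and the polling interval are illustrative and not part of Grid Engine or ``grid-computing-tools``.

```shell
# Count qstat rows in a given state (for example r or qw). Reads qstat output
# on stdin. Waiting tasks are collapsed into a single row, so for qw this
# counts rows rather than individual tasks.
count_rows_in_state() {
  awk -v state="$1" 'NR > 2 && $5 == state { n++ } END { print n + 0 }'
}

# Poll until qstat prints nothing, which is the signal described above that
# all tasks have completed. The interval in seconds is an arbitrary default.
wait_for_empty_queue() {
  interval="${1:-30}"
  while [ -n "$(qstat)" ]; do
    sleep "$interval"
  done
}
```

For example, ``qstat | count_rows_in_state r`` reports how many rows are currently running. Note that ``wait_for_empty_queue`` also waits for any other jobs that ``qstat`` lists, not only the one you just launched.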
32 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-sizing-disks.rst: -------------------------------------------------------------------------------- 1 | **Create a gridengine cluster with sufficient disk space attached to each** ``compute`` **node** 2 | 3 | a. Determine disk size requirements 4 | 5 | Each ``compute`` node will require sufficient disk space to hold the 6 | input and output files for its current task. Determine the largest file 7 | in your input list and estimate the total space you will need. 8 | It may be necessary to download the file and perform the operation 9 | manually to get a maximum combined input and output size. 10 | 11 | Persistent disk performance also scales with the size of the volume. 12 | Independent of storage requirements, for consistent throughput on long 13 | running jobs, use a standard persistent disk of at least 1TB, or use 14 | SSD persistent disk. More documentation is available for 15 | `selecting the right persistent disk`_. 16 | 17 | |br| 18 | 19 | b. Verify or increase quota 20 | 21 | Your choice for number of nodes and disk size must take into account your 22 | `Compute Engine resource quota`_ for the region of your cluster. 23 | 24 | Quota limits and current usage can be viewed with ``gcloud compute``: 25 | 26 | .. code-block:: shell 27 | 28 | gcloud compute regions describe *region* 29 | 30 | or in ``Cloud Platform Console``: 31 | 32 | https://console.cloud.google.com/project/_/compute/quotas 33 | 34 | Important quota limits include ``CPUs``, ``in-use IP addresses``, 35 | and ``disk size``. 36 | 37 | To request additional quota, submit the 38 | `Compute Engine quota request form`_. 39 | 40 | |br| 41 | 42 | c. Configure your cluster 43 | 44 | Instructions for setting the boot disk size for the compute nodes of your 45 | cluster can be found at :ref:`elasticluster-config-boot-disk`. 
46 | 47 | You will likely want to set the number of ``compute`` nodes for your 48 | cluster to a number higher than the **3** specified in the example cluster 49 | setup instructions. 50 | 51 | Once configured, start your cluster. 52 | 53 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-ssh-to-master.rst: -------------------------------------------------------------------------------- 1 | **SSH to the master instance** 2 | 3 | .. code-block:: shell 4 | 5 | elasticluster ssh gridengine 6 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-upload-source.rst: -------------------------------------------------------------------------------- 1 | **Upload the** ``src`` **and** ``samples`` **directories to the Grid Engine master instance:** 2 | 3 | .. code-block:: shell 4 | 5 | cd grid-computing-tools 6 | 7 | elasticluster sftp gridengine << 'EOF' 8 | mkdir src 9 | put -r src 10 | mkdir samples 11 | put -r samples 12 | EOF 13 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-upload-your-config.rst: -------------------------------------------------------------------------------- 1 | **Upload input list file, config file, and** ``grid-computing-tools`` **source to the gridengine cluster master** 2 | 3 | .. code-block:: shell 4 | 5 | elasticluster sftp gridengine << EOF 6 | put ../my_jobs/* 7 | mkdir src 8 | put -r src 9 | EOF 10 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-viewing-log-files.rst: -------------------------------------------------------------------------------- 1 | **Viewing log files** 2 | 3 | When tasks complete, the result log files are uploaded to GCS if 4 | ``OUTPUT_LOG_PATH`` was set in the job config file. 
The log files can be of 5 | value both to verify success/failure of all tasks, as well as to gather 6 | some performance statistics before starting a larger job. 7 | 8 | * Count number of successful tasks 9 | 10 | .. code-block:: shell 11 | 12 | gsutil cat OUTPUT_LOG_PATH/* | grep SUCCESS | wc -l 13 | 14 | Where ``OUTPUT_LOG_PATH`` is the value you specified in the job 15 | config file (step 6 above). 16 | 17 | * Count number of failed tasks 18 | 19 | .. code-block:: shell 20 | 21 | gsutil cat OUTPUT_LOG_PATH/* | grep FAILURE | wc -l 22 | 23 | Where ``OUTPUT_LOG_PATH`` is the value you specified in the job 24 | config file (step 6 above). 25 | 26 | * Compute total task time 27 | 28 | .. code-block:: shell 29 | 30 | gsutil cat OUTPUT_LOG_PATH/* | \ 31 | sed -n -e 's#^Task time.*: \([0-9]*\) seconds#\1#p' | \ 32 | awk '{ sum += $1; } END { print sum " seconds"}' 33 | 34 | * Compute average task time 35 | 36 | .. code-block:: shell 37 | 38 | gsutil cat OUTPUT_LOG_PATH/* | \ 39 | sed -n -e 's#^Task time.*: \([0-9]*\) seconds#\1#p' | \ 40 | awk '{ sum += $1; } END { print sum/NR " seconds"}' 41 | 42 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-steps-viewing-results.rst: -------------------------------------------------------------------------------- 1 | **Viewing the results of the jobs** 2 | 3 | When tasks complete, the result files are uploaded to GCS. 4 | You can view the list of output files with ``gsutil ls``, such as: 5 | 6 | .. code-block:: shell 7 | 8 | gsutil ls OUTPUT_PATH 9 | 10 | Where ``OUTPUT_PATH`` is the value you specified in the job config 11 | file (step 6 above). 
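The per-task log checks described under *Viewing log files* can also be combined into a single pass over the logs. This is a minimal sketch, assuming the log lines match the patterns used in those commands (``SUCCESS``, ``FAILURE``, and ``Task time ...: N seconds``); the function name ``summarize_task_logs`` and the output format are illustrative and not part of ``grid-computing-tools``.

```shell
# Summarize task logs read from stdin: count lines containing SUCCESS and
# FAILURE, and report total and average task time in seconds.
summarize_task_logs() {
  awk '
    /SUCCESS/ { succeeded++ }
    /FAILURE/ { failed++ }
    /^Task time.*: [0-9]+ seconds/ {
      seconds = $0
      sub(/^Task time.*: /, "", seconds)  # drop everything before the number
      sub(/ seconds.*$/, "", seconds)     # drop the trailing unit
      total += seconds
      timed++
    }
    END {
      average = timed ? total / timed : 0
      printf "succeeded=%d failed=%d total_seconds=%d average_seconds=%.1f\n",
             succeeded, failed, total, average
    }
  '
}
```

For example, ``gsutil cat OUTPUT_LOG_PATH/* | summarize_task_logs`` prints one summary line for the whole job.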
12 | -------------------------------------------------------------------------------- /docs/source/includes/grid-computing-tools-workstation-directory-structure.rst: -------------------------------------------------------------------------------- 1 | To use the tool, you will need to download both the 2 | `Grid Computing Tools github repo`_ and the `Elasticluster repo`_ 3 | to your local workstation or laptop. 4 | 5 | No specific relationship exists between these two repositories, but in the 6 | following instructions, it is assumed that the directories: 7 | 8 | * ``grid-computing-tools`` 9 | * ``elasticluster`` 10 | 11 | are siblings under a workspace root (``WS_ROOT``) directory. 12 | -------------------------------------------------------------------------------- /docs/source/includes/igv_desktop_setup.rst: -------------------------------------------------------------------------------- 1 | 1. Install or upgrade IGV Desktop to ensure you have a recent version. IGV Desktop can be obtained from http://www.broadinstitute.org/software/igv/download 2 | 3 | 2. Change preferences so that the Google menu is displayed. Choose menu item `View` -> `Preferences` and check the box next to `Enable Google access`. 4 | 5 | .. image:: /_static/igv_enable_google.png 6 | :alt: Enable Google Access 7 | 8 | 3. Log in to Google. Choose menu item `Google` -> `Login ...` and follow the OAuth prompts. 9 | 10 | .. image:: /_static/log_into_google.png 11 | :alt: Log into Google 12 | -------------------------------------------------------------------------------- /docs/source/includes/ld_dataflow_details.rst: -------------------------------------------------------------------------------- 1 | Use ``--help`` to get more information about the command line options. Change 2 | the pipeline class name below to match the one you would like to run. 3 | 4 | .. 
code-block:: shell 5 | 6 | java -cp target/linkage-disequilibrium*runnable.jar \ 7 | com.google.cloud.genomics.dataflow.pipelines.LinkageDisequilibrium \ 8 | --help=com.google.cloud.genomics.dataflow.pipelines.LinkageDisequilibrium\$LinkageDisequilibriumOptions 9 | 10 | See the source code for implementation details: https://github.com/googlegenomics/linkage-disequilibrium 11 | -------------------------------------------------------------------------------- /docs/source/includes/spark_details.rst: -------------------------------------------------------------------------------- 1 | |sparkADC| 2 | 3 | Use ``--help`` to get more information about the job-specific command line options. Change 4 | the job class name below to match the one you would like to run. 5 | 6 | .. code-block:: shell 7 | 8 | spark-submit --class com.google.cloud.genomics.spark.examples.VariantsPcaDriver \ 9 | googlegenomics-spark-examples-assembly-1.0.jar --help 10 | 11 | See the source code for implementation details: https://github.com/googlegenomics/spark-examples 12 | -------------------------------------------------------------------------------- /docs/source/includes/spark_setup.rst: -------------------------------------------------------------------------------- 1 | * Deploy your Spark cluster using `Google Cloud Dataproc`_. This can be done using the `Cloud Platform Console `__ or the following ``gcloud`` command: 2 | 3 | .. code-block:: shell 4 | 5 | gcloud beta dataproc clusters create example-cluster --scopes cloud-platform 6 | 7 | * ssh to the master. 8 | 9 | .. code-block:: shell 10 | 11 | gcloud compute ssh example-cluster-m 12 | 13 | * Compile and build the pipeline jar. You can `build locally `_ or build on the Spark master Google Compute Engine virtual machine. 14 | 15 | .. container:: toggle 16 | 17 | .. container:: header 18 | 19 | To compile and build on Compute Engine: **Show/Hide Instructions** 20 | 21 | .. container:: content 22 | 23 | (1) Install `sbt `_. 24 | 25 | .. 
code-block:: shell 26 | 27 | echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list 28 | sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823 29 | sudo apt-get install apt-transport-https 30 | sudo apt-get update 31 | sudo apt-get install sbt 32 | 33 | (2) Clone the github repository. 34 | 35 | .. code-block:: shell 36 | 37 | sudo apt-get install git 38 | git clone https://github.com/googlegenomics/spark-examples.git 39 | 40 | (3) Compile the Jar. 41 | 42 | .. code-block:: shell 43 | 44 | cd spark-examples 45 | sbt assembly 46 | cp target/scala-2.*/googlegenomics-spark-examples-assembly-*.jar ~/ 47 | cd ~/ 48 | 49 | -------------------------------------------------------------------------------- /docs/source/includes/tute_data.rst: -------------------------------------------------------------------------------- 1 | Tute Genomics has made available to the community annotations for all hg19 SNPs as a BigQuery table. 2 | 3 | * For more details about the annotation databases included, see `Tute's blog post `_. 4 | * For sample queries on public data, see https://github.com/googlegenomics/bigquery-examples/tree/master/platinumGenomes 5 | 6 | Google Cloud Platform data locations 7 | ------------------------------------ 8 | 9 | * Google Cloud Storage folder `gs://tute_db `_ 10 | * Google BigQuery Dataset ID `silver-wall-555:TuteTable.hg19 `_ 11 | -------------------------------------------------------------------------------- /docs/source/index.rst: -------------------------------------------------------------------------------- 1 | .. Google Genomics documentation master file, created by 2 | sphinx-quickstart on Wed Apr 30 15:58:16 2014. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 5 | 6 | Google Genomics Cookbook 7 | ======================== 8 | 9 | Welcome to the Google Genomics Cookbook on Read the Docs. 
10 | 11 | +--------------------------------------------------------------------------------------------------------------+ 12 | | Note: Google Genomics is now Cloud Life Sciences. | 13 | | The Google Genomics Cookbook on Read the Docs is not actively | 14 | | maintained and may contain incorrect or outdated information. | 15 | | The cookbook is only available for historical reference. For | 16 | | the most up to date documentation, view the official Cloud | 17 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 18 | | | 19 | | Also note that much of the Genomics v1 API surface has been | 20 | | superseded by `Variant Transforms `_ | 21 | | and `htsget `_. | 22 | +--------------------------------------------------------------------------------------------------------------+ 23 | 24 | Here on Read the Docs, you will find documentation and tutorials for 25 | common tasks including moving, transforming, and analyzing genomic data. 26 | 27 | .. toctree:: 28 | :maxdepth: 2 29 | 30 | sections/select_genomic_data 31 | sections/process_data 32 | sections/access_data 33 | sections/analyze_data 34 | sections/learn_more 35 | -------------------------------------------------------------------------------- /docs/source/job_troubleshooting.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. 
| 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Troubleshooting Job failures 14 | ------------------------------ 15 | 16 | .. toctree:: 17 | :maxdepth: 2 18 | 19 | 20 | If you were redirected to this page from a Job failure, that means 21 | your Job failed for an unknown reason. 22 | 23 | Either the failure was transient (which occasionally happens) and the 24 | Job should be retried, or there is a bug in our implementation which is 25 | causing an unexpected exception. 26 | 27 | Rest assured that we keep track of all failed Jobs, and will track 28 | down the bug if there is one. In a perfect world, you would never need 29 | to see this page. 30 | 31 | Because you are here though, please try the following: 32 | 33 | * Re-launch your Job once more. 34 | * If the Job fails a second time, please email google-genomics-discuss@googlegroups.com 35 | with both of your Job IDs. 36 | 37 | Sorry for the failure - we'll do better next time. 38 | -------------------------------------------------------------------------------- /docs/source/mailinglist.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences.
| 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Subscribe to the mailing list 14 | ----------------------------- 15 | 16 | .. toctree:: 17 | :maxdepth: 2 18 | 19 | 20 | The `Google Genomics Discuss mailing list `_ is a good 21 | way to sync up with other people who use genomics-tools including the core developers. You can subscribe 22 | by sending an email to ``google-genomics-discuss+subscribe@googlegroups.com`` or just post using 23 | the `web forum page `_. 24 | -------------------------------------------------------------------------------- /docs/source/sections/access_data.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Access Genomic Data using... 14 | ============================ 15 | 16 | .. comment: begin: goto-read-the-docs 17 | 18 | .. 
container:: visible-only-on-github 19 | 20 | +-----------------------------------------------------------------------------------+ 21 | | **The properly rendered version of this document can be found at Read The Docs.** | 22 | | | 23 | | **If you are reading this on github, you should instead click** `here`__. | 24 | +-----------------------------------------------------------------------------------+ 25 | 26 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/sections/access_data.html 27 | 28 | __ RenderedVersion_ 29 | 30 | .. comment: end: goto-read-the-docs 31 | 32 | .. toctree:: 33 | :maxdepth: 1 34 | 35 | /use_cases/browse_genomic_data/igv 36 | Picard and GATK tools 37 | Bioconductor 38 | /use_cases/browse_genomic_data/beacon 39 | AppEngine (GABrowse) 40 | R 41 | /use_cases/getting-started-with-the-api/python 42 | /use_cases/getting-started-with-the-api/java 43 | /use_cases/getting-started-with-the-api/go 44 | -------------------------------------------------------------------------------- /docs/source/sections/advanced_bigquery.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Advanced BigQuery Topics 14 | ======================== 15 | 16 | .. comment: begin: goto-read-the-docs 17 | 18 | .. container:: visible-only-on-github 19 | 20 | +-----------------------------------------------------------------------------------+ 21 | | **The properly rendered version of this document can be found at Read The Docs.** | 22 | | | 23 | | **If you are reading this on github, you should instead click** `here`__. | 24 | +-----------------------------------------------------------------------------------+ 25 | 26 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/sections/advanced_bigquery.html 27 | 28 | __ RenderedVersion_ 29 | 30 | .. comment: end: goto-read-the-docs 31 | 32 | .. toctree:: 33 | :maxdepth: 1 34 | 35 | /use_cases/annotate_variants/interval_joins 36 | /use_cases/annotate_variants/annovar 37 | /use_cases/analyze_variants/gwas 38 | /use_cases/load_data/multi_sample_variants 39 | .. /use_cases/load_data/reshape_bigquery_table 40 | -------------------------------------------------------------------------------- /docs/source/sections/analyze_data.rst: -------------------------------------------------------------------------------- 1 | Analyze Data in Google Genomics 2 | =============================== 3 | 4 | +--------------------------------------------------------------------------------------------------------------+ 5 | | Note: Google Genomics is now Cloud Life Sciences. | 6 | | The Google Genomics Cookbook on Read the Docs is not actively | 7 | | maintained and may contain incorrect or outdated information. | 8 | | The cookbook is only available for historical reference. For | 9 | | the most up to date documentation, view the official Cloud | 10 | | Life Sciences documentation at https://cloud.google.com/life-sciences. 
| 11 | | | 12 | | Also note that much of the Genomics v1 API surface has been | 13 | | superseded by `Variant Transforms `_ | 14 | | and `htsget `_. | 15 | +--------------------------------------------------------------------------------------------------------------+ 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/sections/analyze_data.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 1 35 | 36 | /use_cases/analyze_reads/index 37 | /use_cases/analyze_variants/index 38 | /use_cases/annotate_variants/index 39 | /use_cases/perform_quality_control_checks/index 40 | /use_cases/linkage_disequilibrium/index 41 | /sections/advanced_bigquery 42 | -------------------------------------------------------------------------------- /docs/source/sections/learn_more.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. 
| 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Learn More 14 | ========== 15 | 16 | .. comment: begin: goto-read-the-docs 17 | 18 | .. container:: visible-only-on-github 19 | 20 | +-----------------------------------------------------------------------------------+ 21 | | **The properly rendered version of this document can be found at Read The Docs.** | 22 | | | 23 | | **If you are reading this on github, you should instead click** `here`__. | 24 | +-----------------------------------------------------------------------------------+ 25 | 26 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/sections/learn_more.html 27 | 28 | __ RenderedVersion_ 29 | 30 | .. comment: end: goto-read-the-docs 31 | 32 | .. toctree:: 33 | :maxdepth: 1 34 | 35 | Read the API reference 36 | /migrating_v1beta2_to_v1.rst 37 | /use_cases/build_your_own_api_client/index 38 | Run through the BioC 2015 Workshop 39 | Browse the Google Genomics github repository 40 | /mailinglist 41 | -------------------------------------------------------------------------------- /docs/source/sections/process_data.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. 
| 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Process Data on Google Cloud 14 | ============================ 15 | 16 | .. comment: begin: goto-read-the-docs 17 | 18 | .. container:: visible-only-on-github 19 | 20 | +-----------------------------------------------------------------------------------+ 21 | | **The properly rendered version of this document can be found at Read The Docs.** | 22 | | | 23 | | **If you are reading this on github, you should instead click** `here`__. | 24 | +-----------------------------------------------------------------------------------+ 25 | 26 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/sections/process_data.html 27 | 28 | __ RenderedVersion_ 29 | 30 | .. comment: end: goto-read-the-docs 31 | 32 | .. toctree:: 33 | :maxdepth: 1 34 | 35 | /use_cases/run_pipelines_in_the_cloud/index 36 | /use_cases/run_familiar_tools/galaxy.rst 37 | /use_cases/run_familiar_tools/ncbiblast.rst 38 | /use_cases/run_familiar_tools/bioconductor.rst 39 | /use_cases/run_familiar_tools/datalab.rst 40 | -------------------------------------------------------------------------------- /docs/source/sections/select_genomic_data.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. 
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Select Genomic Data to work with 14 | ================================ 15 | 16 | .. comment: begin: goto-read-the-docs 17 | 18 | .. container:: visible-only-on-github 19 | 20 | +-----------------------------------------------------------------------------------+ 21 | | **The properly rendered version of this document can be found at Read The Docs.** | 22 | | | 23 | | **If you are reading this on github, you should instead click** `here`__. | 24 | +-----------------------------------------------------------------------------------+ 25 | 26 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/sections/select_genomic_data.html 27 | 28 | __ RenderedVersion_ 29 | 30 | .. comment: end: goto-read-the-docs 31 | 32 | .. toctree:: 33 | :maxdepth: 1 34 | 35 | /use_cases/discover_public_data/index 36 | Load Variant Data into Google Genomics 37 | 38 | -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_reads/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. 
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | Analyze Reads 14 | ================= 15 | 16 | .. comment: begin: goto-read-the-docs 17 | 18 | .. container:: visible-only-on-github 19 | 20 | +-----------------------------------------------------------------------------------+ 21 | | **The properly rendered version of this document can be found at Read The Docs.** | 22 | | | 23 | | **If you are reading this on github, you should instead click** `here`__. | 24 | +-----------------------------------------------------------------------------------+ 25 | 26 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/analyze_reads/index.html 27 | 28 | __ RenderedVersion_ 29 | 30 | .. comment: end: goto-read-the-docs 31 | 32 | Here are some analyses that operate on cloud-resident genomic reads. 33 | 34 | .. 
toctree:: 35 | :maxdepth: 1 36 | 37 | count_reads 38 | calculate_coverage 39 | -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/FILTER_count.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/FILTER_count.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/My_Project_left_hand_nav.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/My_Project_left_hand_nav.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/My_Project_with_genomics_public_data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/My_Project_with_genomics_public_data.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/array_fields_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/array_fields_example.png 
-------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/call_count_for_call_set.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/call_count_for_call_set.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/calls_with_multiple_FILTER_values.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/calls_with_multiple_FILTER_values.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_high_quality_calls_per_sample.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_high_quality_calls_per_sample.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_high_quality_variant_calls.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_high_quality_variant_calls.png -------------------------------------------------------------------------------- 
/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_true_variants_per_callset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_true_variants_per_callset.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_true_variants_per_callset_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/count_true_variants_per_callset_2.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromome_final.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromome_final.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromome_pad_with_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromome_pad_with_0.png -------------------------------------------------------------------------------- 
/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromosome_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromosome_1.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromosome_remove_chr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/true_variants_by_chromosome_remove_chr.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/variants_table_details.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/variants_table_details.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/variants_table_preview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/variants_table_preview.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/variants_table_schema.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/analyze_variants/analyze_variants_with_bigquery/variants_table_schema.png -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/data_analysis_codelab.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Data Analysis Codelab 15 | ===================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. 
_RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/analyze_variants/data_analysis_codelab.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | A collection of variant analyses is documented in the codelab `Data Analysis using Google Genomics`_. 34 | 35 | In this codelab, you will use `Google Genomics`_, `Google BigQuery`_, `Apache Spark`_, and `R`_ to explore the :doc:`/use_cases/discover_public_data/1000_genomes` dataset. Specifically, you will: 36 | 37 | * run a principal component analysis (either from scratch or using pre-computed results) 38 | * use BigQuery to explore population variation 39 | * zoom in to specific genome regions, including using the Genomics API to look all the way down to raw reads 40 | * run a GWAS over the variants within BRCA1 41 | * visualize and annotate results using various R packages, including `Bioconductor`_ 42 | 43 | To run these analyses on your own data: 44 | 45 | (1) First, load your data into Google Genomics and export your variants to BigQuery. See :doc:`/use_cases/load_data/index` for details on how to do this. 46 | (2) Update the BigQuery table name, variant set ID, and read group set in the example to match those of your data. 47 | 48 | 49 | -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/gwas.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference.
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Genome-Wide Association Study (GWAS) 15 | ==================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/analyze_variants/gwas.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | `Google BigQuery`_ can be used to perform a GWAS. 
Here are several examples: 34 | 35 | * Chi-squared test on the :doc:`/use_cases/discover_public_data/1000_genomes` dataset, with members of the EAS super population as cases and all other populations as controls: 36 | 37 | * IPython notebook `Genome-wide association study (GWAS).ipynb `_ 38 | * SQL `gwas-pattern-chi-squared-test.sql `_ 39 | 40 | * Two-proportion Z test on the :doc:`/use_cases/discover_public_data/1000_genomes` dataset, with members of the EAS super population as cases and all other populations as controls: 41 | 42 | * SQL `gwas-pattern-two-proportion-z-test.sql `_ 43 | 44 | * Chi-squared test on the :doc:`/use_cases/discover_public_data/1000_genomes` dataset, with cases and controls determined by clustering from a PCA: 45 | 46 | * R package vignette `AllModalitiesDemo.md `__ 47 | * written as a codelab `AllModalitiesDemo.md `__ 48 | 49 | To run this on your own data: 50 | 51 | (1) First, load your data into Google Genomics and export your variants to BigQuery. See `Load Genomic Variants`_ for details on how to do this. 52 | (2) For data with non-variant segments (e.g., `gVCF` or `Complete Genomics`_ data), reshape the data into multi-sample variants format via :doc:`/use_cases/load_data/multi_sample_variants`. 53 | 54 | -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/hardy_weinberg_equilibrium.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences.
| 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Hardy-Weinberg Equilibrium 15 | ========================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/analyze_variants/hardy_weinberg_equilibrium.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | There are several Hardy-Weinberg Equilibrium examples in GitHub: 34 | 35 | * Hardy-Weinberg Equilibrium `query `_ and `example `_. 36 | * A `comparison `_ of vcfstats Hardy-Weinberg Equilibrium results to results from BigQuery. 37 | -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. 
| 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Analyze Variants 15 | ================= 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/analyze_variants/index.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | Here are some analyses that operate on cloud-resident genomic variants. 34 | 35 | .. toctree:: 36 | :maxdepth: 1 37 | 38 | /use_cases/analyze_variants/analyze_variants_with_bigquery 39 | /use_cases/analyze_variants/data_analysis_codelab.rst 40 | /use_cases/analyze_variants/transition_transversion 41 | /use_cases/analyze_variants/hardy_weinberg_equilibrium 42 | /use_cases/compute_principal_coordinate_analysis/index 43 | /use_cases/compute_identity_by_state/index 44 | /use_cases/analyze_variants/gwas 45 | 46 | 47 | -------------------------------------------------------------------------------- /docs/source/use_cases/analyze_variants/transition_transversion.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. 
| 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Transition/Transversion Ratio 15 | ============================= 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/analyze_variants/transition_transversion.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | There are several transition/transversion ratio examples in GitHub: 34 | 35 | * Ti/Tv by Genomic Window `query `__ and `plot `__. 36 | * Ti/Tv by Alternate Allele Counts `query `__ and `plot `__. 37 | * Ti/Tv for an entire cohort `query `__. 38 | * A `comparison `__ of vcfstats Ti/Tv results to results from BigQuery. 
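As a rough illustration of the quantity the Ti/Tv queries listed above report, the following minimal Python sketch counts transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) over a list of reference/alternate base pairs. The function name ``titv_ratio`` and the input shape are assumptions made for illustration; they are not taken from the linked BigQuery queries, which operate on variant tables instead.

```python
# Purines (A, G) and pyrimidines (C, T); a transition stays within one class,
# a transversion crosses between the two classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def titv_ratio(snps):
    """Return (transitions, transversions, ratio) for an iterable of (ref, alt) pairs.

    Hypothetical helper for illustration only. Records that are not
    single-base substitutions between two distinct A/C/G/T bases are skipped.
    """
    transitions = 0
    transversions = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, multi-base alleles, and non-variant records.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        bases = {ref, alt}
        # Skip ambiguous bases such as N.
        if not bases <= (PURINES | PYRIMIDINES):
            continue
        if bases <= PURINES or bases <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    # The ratio is undefined when there are no transversions.
    ratio = transitions / transversions if transversions else None
    return transitions, transversions, ratio
```

For example, ``titv_ratio([("A", "G"), ("C", "T"), ("A", "C")])`` counts two transitions and one transversion.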
39 | -------------------------------------------------------------------------------- /docs/source/use_cases/annotate_variants/TuteAnnotation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/googlegenomics/readthedocs/d85c86630f133c77089371422111c56eb80bab29/docs/source/use_cases/annotate_variants/TuteAnnotation.png -------------------------------------------------------------------------------- /docs/source/use_cases/annotate_variants/annovar.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Annovar Annotation 15 | ================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. 
_RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/annotate_variants/annovar.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | 34 | If your source data is in single-sample VCF, `gVCF`_, or Complete Genomics masterVar format, this page outlines how to annotate all variants found within the cohort using `Annovar`_ or similar tools. 35 | 36 | (1) First, load your data into Google Genomics and export your variants to BigQuery. See `Load Genomic Variants`_ for details. 37 | 38 | (2) Note that merging has occurred during the import process, so each unique variant within the cohort will be a separate record within the variant set, with all calls for that variant nested within the record. For more information see `Variant Import merge logic details`_. 39 | 40 | (3) To create an export file similar to a VCF, run a query like https://github.com/StanfordBioinformatics/mvp_aaa_codelabs/blob/master/sql/multisample-vcf.sql and materialize the results to a new table. 41 | 42 | (4) Export the table to Cloud Storage and then download it to a Compute Engine instance with sufficient disk space. 43 | 44 | (5) Use ``sed`` or another file-editing tool to finish the transformation (see also https://github.com/StanfordBioinformatics/mvp_aaa_codelabs/blob/master/bin/bq-to-vcf.py). For example: 45 | 46 | * Add the ``#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT`` header line. 47 | * Convert commas to tabs. 48 | 49 | (6) Then run `Annovar`_ or similar tools on the file(s). 50 | 51 | (7) Lastly, import the result of the annotation back into BigQuery for use in your analyses.
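The two example edits in step (5) can be sketched in Python as an alternative to ``sed``. This is a minimal sketch under stated assumptions, not the ``bq-to-vcf.py`` script linked above: it assumes the table was exported as CSV without a header row, that its columns are already in VCF order, and that any per-sample columns follow ``FORMAT``. The function name ``csv_to_vcf_text`` and the ``sample_names`` parameter are hypothetical.

```python
import csv
import io

# Fixed VCF column names, matching the header line given in step (5).
VCF_FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def csv_to_vcf_text(csv_text, sample_names=()):
    """Prepend the column header line and convert comma-separated rows to tabs.

    Hypothetical helper for illustration. Using the csv module (rather than a
    plain string replace) keeps commas inside quoted fields, such as a
    multi-allelic ALT value, from being turned into column separators.
    """
    out = io.StringIO()
    out.write("\t".join(VCF_FIXED_COLUMNS + list(sample_names)) + "\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines in the export
            continue
        out.write("\t".join(row) + "\n")
    return out.getvalue()
```

Tools that validate their input strictly may also expect a ``##fileformat`` meta-information line before the column header; this sketch does not add one.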
52 | -------------------------------------------------------------------------------- /docs/source/use_cases/annotate_variants/bioconductor_annotation.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Bioconductor Annotation 15 | ======================= 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/annotate_variants/bioconductor_annotation.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 2 35 | 36 | `Bioconductor`_ provides a convenient way to annotate small regions of the genome. 37 | 38 | .. 
code-block:: r 39 | 40 | require(GoogleGenomics) 41 | require(VariantAnnotation) 42 | require(BSgenome.Hsapiens.UCSC.hg19) 43 | require(TxDb.Hsapiens.UCSC.hg19.knownGene) 44 | 45 | GoogleGenomics::authenticate("/PATH/TO/YOUR/client_secrets.json") 46 | 47 | variants <- getVariants(datasetId="10473108253681171589", chromosome="17", start=41196311, end=41277499) 48 | granges <- variantsToGRanges(variants) 49 | 50 | txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene 51 | codingVariants <- locateVariants(granges, txdb, CodingVariants()) 52 | codingVariants 53 | 54 | coding <- predictCoding(rep(granges, elementLengths(granges$ALT)), 55 | txdb, 56 | seqSource=Hsapiens, 57 | varAllele=unlist(granges$ALT, use.names=FALSE)) 58 | coding 59 | 60 | A more extensive example of variant annotation with `Bioconductor`_ is documented towards the end of the codelab `Data Analysis using Google Genomics `__. 61 | 62 | To use this with your own data: 63 | 64 | (1) First, load your data into Google Genomics. See :doc:`/use_cases/load_data/index` for details. 65 | 66 | (2) If you do not have them already, install the necessary Bioconductor packages. See `Using Bioconductor`_ for details. 67 | 68 | (3) Update the parameters of the ``getVariants`` call in the example above to match your data and the genomic region you want to annotate. 69 | -------------------------------------------------------------------------------- /docs/source/use_cases/annotate_variants/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference.
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Annotate Variants 15 | ================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/annotate_variants/index.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | There are many ways to annotate cloud-resident genomic variants. 34 | 35 | .. toctree:: 36 | :maxdepth: 1 37 | 38 | /use_cases/annotate_variants/tute_annotation 39 | /use_cases/annotate_variants/bioconductor_annotation 40 | /use_cases/annotate_variants/interval_joins 41 | /use_cases/annotate_variants/annovar 42 | google_genomics_annotation 43 | -------------------------------------------------------------------------------- /docs/source/use_cases/annotate_variants/tute_annotation.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. 
| 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Tute Genomics Annotation 15 | ======================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/annotate_variants/tute_annotation.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 2 35 | 36 | .. include:: /includes/tute_data.rst 37 | 38 | To make use of this on your own data: 39 | 40 | (1) First, load your data into Google Genomics and export your variants to BigQuery. See `Load Genomic Variants`_ for more detail as to how to do this. 41 | (2) Copy and modify one of the queries in https://github.com/googlegenomics/bigquery-examples/tree/master/platinumGenomes so that it will perform a `JOIN `_ command against your table. 42 | (3) Run the revised query with BigQuery to join the Tute table with your variants and materialize the result to a new table. 
Notice in the screenshot below that a destination table is specified and 'Allow Large Results' is checked. 43 | 44 | .. image:: TuteAnnotation.png 45 | 46 | 47 | -------------------------------------------------------------------------------- /docs/source/use_cases/browse_genomic_data/beacon.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Beacon 15 | ====== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/browse_genomic_data/beacon.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | What's a beacon?
[#beacon]_ 34 | 35 | A beacon is a simple web service that answers questions of the form, "Do you have any genomes with an 'A' at position 100,735 on chromosome 3?" (or similar data). It responds with either "Yes" or "No." This open web service is designed both to be technically simple (so it is easy to implement) and to mitigate risks associated with genomic data sharing. 36 | 37 | We call these applications "Beacons" because, like the SETI project, many dedicated people have been scanning the universe of human research for signs of willing participants in far-reaching data sharing efforts, but despite many assurances of interest, it has remained a dark and quiet place. Once your "Beacon" is lit, you can start to take the next steps to add functionality to it, and find other groups who may help by following their Beacons. 38 | 39 | There is an App Engine implementation, written in Go, of the Beacon API from the Global Alliance for Genomics and Health. Here is an example query that is running against a private copy (for demonstration purposes) of the :doc:`/use_cases/discover_public_data/platinum_genomes` variants: 40 | 41 | http://goapp-beacon.appspot.com/?chromosome=chr17&coordinate=41196407&allele=A 42 | 43 | To turn on a beacon for your own data: 44 | 45 | (1) First, load your data into Google Genomics. See `Load Genomic Variants`_ for details. 46 | (2) Then follow the instructions on https://github.com/googlegenomics/beacon-go to deploy the App Engine implementation of Beacon. 47 | 48 | .. rubric:: Footnotes 49 | 50 | ..
[#beacon] http://ga4gh.org/#/beacon 51 | 52 | -------------------------------------------------------------------------------- /docs/source/use_cases/browse_genomic_data/bioconductor.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Browse Reads with Bioconductor 15 | ============================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/browse_genomic_data/bioconductor.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | `Bioconductor`_ provides a convenient way to browse regions of the genome. |browse-reads| 34 | 35 | .. 
code-block:: r 36 | 37 | require(ggbio) 38 | require(GoogleGenomics) 39 | 40 | GoogleGenomics::authenticate("/PATH/TO/YOUR/client_secrets.json") 41 | 42 | galignments <- getReads(readGroupSetId="CMvnhpKTFhDnk4_9zcKO3_YB", chromosome="17", 43 | start=41218200, end=41218500, converter=readsToGAlignments) 44 | strand_plot <- autoplot(galignments, aes(color=strand, fill=strand)) 45 | coverage_plot <- ggplot(as(galignments, "GRanges")) + stat_coverage(color="gray40", 46 | fill="skyblue") 47 | tracks(strand_plot, coverage_plot, xlab="chr17") 48 | 49 | .. |browse-reads| image:: https://raw.githubusercontent.com/googlegenomics/codelabs/master/R/1000Genomes-BRCA1-analysis/figure/alignments-1.png 50 | 51 | A more extensive example of read browsing with `Bioconductor`_ is documented towards the end of the codelab `Data Analysis using Google Genomics `__. 52 | 53 | To use this with your own data: 54 | 55 | (1) First, load your data into `Google Genomics`_. See :doc:`/use_cases/load_data/index` for details. 56 | 57 | (2) If you do not have them already, install the necessary `Bioconductor`_ packages. See `Using Bioconductor`_ for details. Alternatively, you can :doc:`/use_cases/run_familiar_tools/bioconductor`. 58 | 59 | (3) Update the parameters of the ``getReads`` call in the example above to match your data and the genomic region you want to view. 60 | -------------------------------------------------------------------------------- /docs/source/use_cases/browse_genomic_data/gabrowse.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information.
| 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | GABrowse 15 | ======== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/browse_genomic_data/gabrowse.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | Try it now: https://gabrowse.appspot.com/ 34 | 35 | GABrowse is a sample application designed to demonstrate the capabilities of the 36 | `Global Alliance for Genomics and Health API`_ GA4GH v0.5.1. Currently, you can view data from Google and Ensembl. 37 | 38 | * Use the button on the left to select a Read group set or Call set. 39 | * Once loaded, choose a chromosome and zoom or drag the main graph to explore Read data. 40 | * Individual bases will appear once you zoom in far enough. 41 | 42 | To use this with your own data: 43 | 44 | (1) First, load your data into Google Genomics. See :doc:`/use_cases/load_data/index` for details. 45 | (2) Navigate to the auth-enabled endpoint http://gabrowse-with-auth.appspot.com/ and go through the OAuth flow.
46 | (3) View some data, for example http://gabrowse-with-auth.appspot.com/#=&readsetId=CMvnhpKTFhCJyLrAurGOnrAB&backend=GOOGLE&callsetId=10473108253681171589-538&cBackend=GOOGLE&location=5%3A90839366 47 | (4) Then modify the ReadGroupSetId and/or CallsetId in the URL to those of your data. 48 | 49 | The code for this sample application is on GitHub https://github.com/googlegenomics/api-client-python 50 | -------------------------------------------------------------------------------- /docs/source/use_cases/browse_genomic_data/igv.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Integrative Genomics Viewer (IGV) 15 | ================================= 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. 
_RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/browse_genomic_data/igv.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | IGV Web 34 | ------- 35 | 36 | igv.js is an embeddable interactive genome visualization component. 37 | The source code for igv.js is available on GitHub: 38 | 39 | https://github.com/igvteam/igv.js 40 | 41 | 42 | Documentation is available at: 43 | 44 | https://github.com/igvteam/igv.js/wiki 45 | 46 | 47 | Try it now with public data in Google Genomics: 48 | 49 | http://igv.org/web/examples/google-demo.html 50 | 51 | IGV Desktop 52 | ----------- 53 | 54 | IGV Desktop supports browsing of reads from the Google Genomics Reads API and also from BAM files in Google Cloud Storage. It implements an OAuth flow to facilitate access to private data in addition to public data. 55 | 56 | Setup 57 | ^^^^^^ 58 | 59 | .. include:: /includes/igv_desktop_setup.rst 60 | 61 | View a Google Genomics ReadGroupSet 62 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 63 | 64 | Choose menu item `Google` -> `Load Genomics ReadGroupSet` and enter the readGroupSet ID for the readGroupSet you wish to view. For example, a readGroupSet ID of ``CMvnhpKTFhD3he72j4KZuyc`` will display the reads for NA12877 from :doc:`/use_cases/discover_public_data/platinum_genomes`. 65 | 66 | View a BAM from Google Cloud Storage 67 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 68 | 69 | Choose menu item `File` -> `Load from URL` and enter the Google Cloud Storage path for the BAM you wish to view. For example, a path of ``gs://genomics-public-data/platinum-genomes/bam/NA12877_S1.bam`` will display the reads for NA12877 from :doc:`/use_cases/discover_public_data/platinum_genomes`. 70 | 71 | Be sure to have a ``.bai`` file stored alongside the ``.bam`` file you wish to view.
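When checking that an index is stored next to a BAM, it can help to know which object names to look for. The minimal Python sketch below derives the two index names that are in common use (``NAME.bam.bai`` and ``NAME.bai``) from a BAM path. The function name ``candidate_bai_paths`` is hypothetical, and which of the two names a given IGV release looks for is not stated on this page, so treat both as candidates.

```python
def candidate_bai_paths(bam_path):
    """Return the two commonly used index paths for a BAM file path.

    Hypothetical helper for illustration; it only manipulates strings and
    does not contact Google Cloud Storage.
    """
    suffix = ".bam"
    if not bam_path.endswith(suffix):
        raise ValueError("expected a path ending in .bam: %r" % bam_path)
    return [
        bam_path + ".bai",                   # e.g. NA12877_S1.bam.bai
        bam_path[: -len(suffix)] + ".bai",   # e.g. NA12877_S1.bai
    ]
```

For the public example above, this yields ``gs://genomics-public-data/platinum-genomes/bam/NA12877_S1.bam.bai`` and ``gs://genomics-public-data/platinum-genomes/bam/NA12877_S1.bai`` as the names to check for.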
72 | -------------------------------------------------------------------------------- /docs/source/use_cases/build_your_own_api_client/index.rst: -------------------------------------------------------------------------------- 1 | .. Google Genomics documentation master file, created by 2 | sphinx-quickstart on Wed Apr 30 15:58:16 2014. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 5 | 6 | +--------------------------------------------------------------------------------------------------------------+ 7 | | Note: Google Genomics is now Cloud Life Sciences. | 8 | | The Google Genomics Cookbook on Read the Docs is not actively | 9 | | maintained and may contain incorrect or outdated information. | 10 | | The cookbook is only available for historical reference. For | 11 | | the most up to date documentation, view the official Cloud | 12 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 13 | | | 14 | | Also note that much of the Genomics v1 API surface has been | 15 | | superseded by `Variant Transforms `_ | 16 | | and `htsget `_. | 17 | +--------------------------------------------------------------------------------------------------------------+ 18 | 19 | Build your own Google Genomics API Client 20 | =============================================== 21 | 22 | .. comment: begin: goto-read-the-docs 23 | 24 | .. container:: visible-only-on-github 25 | 26 | +-----------------------------------------------------------------------------------+ 27 | | **The properly rendered version of this document can be found at Read The Docs.** | 28 | | | 29 | | **If you are reading this on github, you should instead click** `here`__. | 30 | +-----------------------------------------------------------------------------------+ 31 | 32 | .. 
_RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/build_your_own_api_client/index.html 33 | 34 | __ RenderedVersion_ 35 | 36 | .. comment: end: goto-read-the-docs 37 | 38 | The tools for working with the `Google Genomics API`_ 39 | are all open source and available `on GitHub `_. 40 | 41 | This documentation covers how to get started with the available tools as well 42 | as how you might build your own code which uses the API. 43 | 44 | .. toctree:: 45 | :maxdepth: 2 46 | 47 | ../../constants 48 | ../../common_api_flows 49 | ../../auth_requirements 50 | ../../api-client-python/index 51 | ../../api-client-r/index 52 | -------------------------------------------------------------------------------- /docs/source/use_cases/compute_principal_coordinate_analysis/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Compute Principal Coordinate Analysis 15 | ======================================= 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/compute_principal_coordinate_analysis/index.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 1 35 | 36 | 1-way-pca 37 | 2-way-pca 38 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/1000_cannabis_genomes.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | 1000 Cannabis Genomes Project 15 | ============================= 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/1000_cannabis_genomes.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | A genomic open dataset of approximately 850 strains of cannabis via the `Open Cannabis Project `_ has been made available on Google Cloud Platform. See `the blog post `_ for more context and provenance details. 34 | 35 | Google Cloud Platform data locations 36 | ------------------------------------ 37 | 38 | See the `1000 Cannabis Genomes Project `_ for full details. 39 | 40 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/annotations_toc.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Annotations 15 | =========== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/annotations_toc.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 1 35 | 36 | tute_genomics_public_data 37 | /use_cases/linkage_disequilibrium/public_ld_datasets 38 | clinvar_annotations 39 | ucsc_annotations 40 | isb_cgc_data 41 | cosmic_annotations 42 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/clinvar_annotations.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | ClinVar Annotations 15 | =================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/clinvar_annotations.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | Annotations from `ClinVar`_ were loaded into Google Genomics for use in sample annotation pipelines. This data reflects the state of `ClinVar`_ at a particular point in time. 34 | 35 | Google Cloud Platform data locations 36 | ------------------------------------ 37 | 38 | * Google Cloud Storage folder `gs://genomics-public-data/clinvar/ `_ 39 | * Google Genomics `annotation sets `_ 40 | 41 | Provenance 42 | ---------- 43 | 44 | Each of the annotation sets listed below was imported into the API from the source files. The source files are also mirrored in Google Cloud Storage. 45 | 46 | `ClinVar`_ (downloaded 2/5/2015 10:18AM PST): 47 | 48 | * ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz 49 | * ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/disease_names 50 | 51 | Caveats 52 | ------- 53 | 54 | A number of ClinVar entries were omitted during ingestion due to data incompatibility with the Google Genomics API. 55 | 56 | * 14737 were aligned to NCBI36, which the Google Genomics API does not currently support. 57 | * 5952 did not specify a reference assembly. 
58 | * 1324 were labeled as insertions but did not specify the inserted bases. 59 | * 220 were labeled as SNPs, but did not specify an alternate base. 60 | * 148 were larger than 100 Mbp. 61 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/cosmic_annotations.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | COSMIC Annotations 15 | ================== 16 | 17 | The Institute for Systems Biology Cancer Genomics Cloud (ISB-CGC) has made 18 | `COSMIC `_ 19 | available as BigQuery tables to provide a new way to explore and understand 20 | the mutations driving cancer.
21 | 22 | See ISB-CGC's documentation for full details: 23 | http://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/COSMIC.html 24 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/dream_smc_dna.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | ICGC-TCGA DREAM Mutation Calling Challenge synthetic genomes 15 | ============================================================= 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/dream_smc_dna.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. 
comment: end: goto-read-the-docs 32 | 33 | This dataset comprises the three public synthetic tumor/normal pairs created for the `ICGC-TCGA DREAM Mutation Calling challenge `_. See the journal article for full details regarding how the synthetic data for challenge *in silico #1* was created: 34 | 35 | | `Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection `_ 36 | | Adam D Ewing, Kathleen E Houlahan, Yin Hu, Kyle Ellrott, Cristian Caloian, 37 | | Takafumi N Yamaguchi, J Christopher Bare, Christine P'ng, Daryl Waggott, 38 | | Veronica Y Sabelnykova, ICGC-TCGA DREAM Somatic Mutation Calling Challenge participants, 39 | | Michael R Kellen, Thea C Norman, David Haussler, Stephen H Friend, Gustavo Stolovitzky, 40 | | Adam A Margolin, Joshua M Stuart & Paul C Boutros 41 | | Published: May 18, 2015 42 | | DOI: 10.1038/nmeth.3407 43 | | 44 | 45 | Google Cloud Platform data locations 46 | ------------------------------------ 47 | * Google Cloud Storage folder `gs://public-dream-data/ `_ 48 | * Google Genomics dataset `337315832689 `_. 49 | 50 | Provenance 51 | ---------- 52 | 53 | * The authoritative data location is NCBI Sequence Read Archive: `SRP042948 `_. 54 | * The BAMs were uploaded to Google Cloud Storage and the reads were then imported to Google Genomics. 55 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/genomic_data_toc.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. 
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Genomic Data 15 | ============ 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/genomic_data_toc.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 1 35 | 36 | 1000_genomes 37 | platinum_genomes 38 | reference_genomes 39 | mssng_data 40 | isb_cgc_data 41 | supercentenarians 42 | pgp_public_data 43 | dream_smc_dna 44 | simons_foundation 45 | 1000_cannabis_genomes 46 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. 
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Discover Published Data 15 | ======================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/index.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | 34 | 35 | .. toctree:: 36 | :maxdepth: 2 37 | 38 | genomic_data_toc 39 | annotations_toc 40 | 41 | Please let us know if you have a dataset that you wish to share and have listed here for discovery. 42 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/isb_cgc_data.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. 
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | TCGA Cancer Genomics Data in the Cloud 15 | ====================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/isb_cgc_data.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | Use the power of BigQuery to analyze the wealth of data created by `The Cancer Genome Atlas `_ (TCGA) project! 34 | 35 | The Institute for Systems Biology (ISB) has created and made public a dataset based on the open-access TCGA data including somatic mutation calls, clinical data, mRNA and miRNA expression, DNA methylation and protein expression from 33 different tumor types. It's part of their `Cancer Genomics Cloud`_, funded by the National Cancer Institute. They've also created public github repositories so you can try out sample queries and analyses in R or `Google Cloud Datalab`_. 
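The BigQuery dataset IDs listed on this page use the legacy ``project:dataset`` notation. As a minimal sketch, assuming you want to reference one of these datasets from a standard SQL query, the helper below rewrites such an ID into a backtick-quoted ``project.dataset.table`` path. The function name ``to_standard_sql`` and the table name ``example_table`` are hypothetical placeholders for illustration; they are not part of any ISB-CGC or Google library, and ``example_table`` is not a real table in these datasets.

```python
def to_standard_sql(legacy_dataset_id: str, table: str) -> str:
    """Convert a legacy 'project:dataset' ID into a standard SQL table path.

    The result is wrapped in backticks because project IDs such as
    'isb-cgc' contain a hyphen, which must be quoted in standard SQL.
    """
    project, sep, dataset = legacy_dataset_id.partition(":")
    if not sep or not project or not dataset:
        raise ValueError(
            f"expected 'project:dataset', got {legacy_dataset_id!r}"
        )
    if not table:
        raise ValueError("table name must not be empty")
    return f"`{project}.{dataset}.{table}`"


if __name__ == "__main__":
    # 'example_table' is a placeholder; look up real table names in the
    # BigQuery web UI or the ISB-CGC documentation.
    path = to_standard_sql("isb-cgc:TCGA_bioclin_v0", "example_table")
    print(f"SELECT COUNT(*) FROM {path}")
```

The quoted path can then be pasted into a query in the BigQuery console or passed to a client library of your choice.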
36 | 37 | * `documentation `__ 38 | * `examples in R `_ 39 | * `examples in Python `_ 40 | 41 | Google Cloud Platform data locations 42 | ------------------------------------ 43 | 44 | * Google BigQuery Dataset IDs: 45 | + `isb-cgc:TCGA_bioclin_v0 `_ 46 | + `isb-cgc:TCGA_hg19_data_v0 `_ 47 | + `isb-cgc:TCGA_hg38_data_v0 `_ 48 | 49 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/mssng_data.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | MSSNG Database for Autism Researchers 15 | ===================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. 
_RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/mssng_data.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | This dataset comprises a growing collection of both Illumina and Complete Genomics genomes of families affected by autism. **Apply for data access at** `MSSNG Database for Autism Researchers `_. See the journal article for full details: 34 | 35 | | `Whole-genome sequencing of quartet families with autism spectrum disorder `_ 36 | | Ryan K C Yuen, Bhooma Thiruvahindrapuram, Daniele Merico, Susan Walker, Kristiina Tammimies, Ny Hoang, Christina Chrysler, Thomas Nalpathamkalam, Giovanna Pellecchia, Yi Liu, Matthew J Gazzellone, Lia D'Abate, Eric Deneault, Jennifer L Howe, Richard S C Liu, Ann Thompson, Mehdi Zarrei, Mohammed Uddin, Christian R Marshall, Robert H Ring, Lonnie Zwaigenbaum, Peter N Ray, Rosanna Weksberg, Melissa T Carter, Bridget A Fernandez, et al. 37 | | Published January 26, 2015 38 | | DOI: 10.1038/nm.3792 39 | | 40 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/platinum_genomes.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Illumina Platinum Genomes 15 | =========================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/platinum_genomes.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | This dataset comprises the `6-member CEPH pedigree 1463 `_. See http://www.illumina.com/platinumgenomes/ for full details. 34 | 35 | Google Cloud Platform data locations 36 | ------------------------------------ 37 | 38 | * Google Cloud Storage folder `gs://genomics-public-data/platinum-genomes `_ 39 | * Google Genomics Dataset ID `3049512673186936334 `_ 40 | 41 | * `ReadGroupSet IDs `_ 42 | * `Variant Reference Bounds `_ 43 | 44 | * Google BigQuery Dataset ID `genomics-public-data:platinum_genomes `_ 45 | 46 | Beacon 47 | ------ 48 | You can find a `Global Alliance for Genomics and Health Beacon`_ at http://webdev.dnastack.com/p/beacon/platinum?chromosome=1&coordinate=10177&allele=AC 49 | 50 | Provenance 51 | ---------- 52 | 53 | * The source files for this data include: 54 | * All of the BAM files listed at `the EBI FTP site `_. 55 | * All of the VCF files that were listed at `the Illumina FTP site `_ prior to the IlluminaPlatinumGenomes_v6.0 release; they have since been taken down.
56 | * These files were copied to Google Cloud Storage, uploaded to Google Genomics, and the variants were exported to Google BigQuery. 57 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/supercentenarians.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Supercentenarian Genomes 15 | ======================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/supercentenarians.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. 
comment: end: goto-read-the-docs 32 | 33 | This dataset comprises Complete Genomics genomes for 17 supercentenarians (110 years or older). **Apply for data access at** http://goo.gl/MGcYJ5. See the journal article for full details: 34 | 35 | | `Whole-Genome Sequencing of the World's Oldest People `_ 36 | | Hinco J. Gierman, Kristen Fortney, Jared C. Roach, Natalie S. Coles, Hong Li, Gustavo Glusman, Glenn J. Markov, Justin D. Smith, Leroy Hood, L. Stephen Coles, Stuart K. Kim 37 | | Published: November 12, 2014 38 | | DOI: 10.1371/journal.pone.0112430 39 | | 40 | 41 | Google Cloud Platform data locations 42 | ------------------------------------ 43 | 44 | * Google Genomics dataset `18254571932956699773 `_ once access has been granted. 45 | 46 | Provenance 47 | ---------- 48 | 49 | * The data are also available from http://supercentenarians.stanford.edu/. 50 | * The CGI masterVar files were uploaded to Google Cloud Storage and then imported to Google Genomics. 51 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/tute_genomics_public_data.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Tute Genomics Annotation 15 | ======================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/tute_genomics_public_data.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. include:: /includes/tute_data.rst 34 | -------------------------------------------------------------------------------- /docs/source/use_cases/discover_public_data/ucsc_annotations.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. 
| 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | UCSC Annotations 15 | ================ 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/ucsc_annotations.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | `UCSC Sequence and Annotation Data`_ were loaded into Google Genomics for use in sample annotation pipelines. This data reflects the state of `UCSC Sequence and Annotation Data`_ at a particular point in time. 34 | 35 | Google Cloud Platform data locations 36 | ------------------------------------ 37 | 38 | * Google Cloud Storage folder `gs://genomics-public-data/ucsc/ `_ 39 | * Google Genomics `annotation sets `_ 40 | 41 | Provenance 42 | ---------- 43 | 44 | Each of the annotation sets listed below was imported into the API from the source files. The source files are also mirrored in Google Cloud Storage. 
45 | 46 | UCSC GRCh38 (downloaded 12/29/2014 14:00 PST): 47 | 48 | * http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refFlat.txt.gz 49 | * http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz 50 | * http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/knownGene.txt.gz 51 | 52 | UCSC hg19 (downloaded 3/5/2015 17:00 PST): 53 | 54 | * http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz 55 | * http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz 56 | * http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz 57 | 58 | -------------------------------------------------------------------------------- /docs/source/use_cases/getting-started-with-the-api/go.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Go 15 | == 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/getting-started-with-the-api/go.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. |language-link| replace:: `Go getting started`_ 34 | 35 | .. include:: /includes/getting-started-with-the-api.rst 36 | -------------------------------------------------------------------------------- /docs/source/use_cases/getting-started-with-the-api/java.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Java 15 | ==== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/getting-started-with-the-api/java.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. |language-link| replace:: `Java getting started`_ 34 | 35 | .. include:: /includes/getting-started-with-the-api.rst 36 | -------------------------------------------------------------------------------- /docs/source/use_cases/getting-started-with-the-api/python.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Python 15 | ====== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/getting-started-with-the-api/python.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. |language-link| replace:: `Python getting started`_ 34 | 35 | .. include:: /includes/getting-started-with-the-api.rst 36 | -------------------------------------------------------------------------------- /docs/source/use_cases/linkage_disequilibrium/analyze_ld_results.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Analyze Linkage Disequilibrium Results 15 | ====================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/linkage_disequilibrium/analyze_ld_results.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 2 35 | 36 | Several examples on GitHub show how to interact with LD results stored in BigQuery using Datalab. All of them are part of the `linkage disequilibrium project `_. 37 | 38 | * Exploring `summary statistics of LD data `_. 39 | * `Visualizing LD patterns `_ in specific genomic regions. 40 | * Examining the rate of `LD decay `_ as a function of distance. 41 | * Selecting "tag variants" and visualizing `tag variant distributions `_. 42 | 43 | -------------------------------------------------------------------------------- /docs/source/use_cases/linkage_disequilibrium/compute_linkage_disequilibrium.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. 
| 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Compute Linkage Disequilibrium on a Variant Set 15 | =============================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/linkage_disequilibrium/compute_linkage_disequilibrium.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 2 35 | 36 | .. contents:: 37 | 38 | This pipeline calculates linkage disequilibrium between pairs of variants in a Global Alliance `VariantSet`_ (which you can `create from a VCF file `_). It takes as input a VariantSet for which the linkage disequilibrium values will be calculated and calculates the D' and allelic correlation measures of linkage disequilibrium, defined in 39 | Box 1 of: 40 | 41 | | `Linkage disequilibrium - understanding the evolutionary past and mapping the medical future `_ 42 | | Slatkin, Montgomery 43 | | Nature Reviews Genetics, Volume 9, Issue 6, 477 - 485 44 | | DOI: http://dx.doi.org/10.1038/nrg2361 45 | | 46 | 47 | The pipeline is implemented on `Google Cloud Dataflow`_. 48 | 49 | Setup Dataflow 50 | -------------- 51 | 52 | .. 
include:: /includes/collapsible_ld_dataflow_setup_instructions.rst 53 | 54 | Run the pipeline 55 | ---------------- 56 | 57 | The following command will calculate linkage disequilibrium between all pairs of variants within 50,000 base pairs of each other for a specific region in the :doc:`/use_cases/discover_public_data/1000_genomes` Phase 3 VariantSet, and retain results for all pairs that have an absolute value of their allelic correlation of at least 0.4. 58 | 59 | .. code-block:: shell 60 | 61 | java -Xbootclasspath/p:alpn-boot.jar \ 62 | -cp target/linkage-disequilibrium*runnable.jar \ 63 | com.google.cloud.genomics.dataflow.pipelines.LinkageDisequilibrium \ 64 | --variantSetId=11027761582969783635 \ 65 | --references=17:41196311:41277499 \ 66 | --window=50000 \ 67 | --ldCutoff=0.4 \ 68 | --output=gs://YOUR-BUCKET/dataflow-output/linkage-disequilibrium-1000G_Phase_3-BRCA1.txt 69 | 70 | .. include:: /includes/dataflow_on_gce_run.rst 71 | 72 | |dataflowSomeRefs| 73 | 74 | |dataflowAllRefs| 75 | 76 | To run the pipeline on a subset of individuals in a VariantSet: 77 | 78 | * Add a ``--callSetsToUse`` flag that has a comma-delimited list of call sets to include. 79 | 80 | Additional details 81 | ------------------ 82 | 83 | .. include:: /includes/ld_dataflow_details.rst 84 | -------------------------------------------------------------------------------- /docs/source/use_cases/linkage_disequilibrium/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. 
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Compute and Analyze Linkage Disequilibrium 15 | ========================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/linkage_disequilibrium/index.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 1 35 | 36 | compute_linkage_disequilibrium 37 | /use_cases/linkage_disequilibrium/public_ld_datasets 38 | transform_ld_results 39 | analyze_ld_results 40 | -------------------------------------------------------------------------------- /docs/source/use_cases/linkage_disequilibrium/public_ld_datasets.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. 
For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Linkage Disequilibrium Datasets 15 | =============================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/linkage_disequilibrium/public_ld_datasets.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | Linkage disequilibrium was computed separately for each `super population and sub population `_ within :doc:`/use_cases/discover_public_data/1000_genomes` phase 3 variants using the method defined in Box 1 of: 34 | 35 | | `Linkage disequilibrium - understanding the evolutionary past and mapping the medical future `_ 36 | | Slatkin, Montgomery 37 | | Nature Reviews Genetics, Volume 9, Issue 6, 477 - 485 38 | | DOI: http://dx.doi.org/10.1038/nrg2361 39 | | 40 | 41 | LD was computed for all pairs of variants within a window of 1,000,000 bp (1 megabase), and all pairs with an absolute allelic correlation of at least 0.4 were retained. See :doc:`/use_cases/linkage_disequilibrium/compute_linkage_disequilibrium` for more detail. 
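As a rough illustration of the measures and filter described above, the following Python sketch computes D' and the allelic correlation r for one pair of biallelic variants from haplotype and allele frequencies, then applies the window and cutoff. The function names ``ld_measures`` and ``keep_pair`` are hypothetical, and the inclusive comparisons are assumptions; this is not the Dataflow pipeline's implementation.

```python
import math


def ld_measures(p_ab, p_a, p_b):
    """Return (D', r) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    Hypothetical helper; the sign convention for D' is an assumption.
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b
    # Largest magnitude D can take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # Allelic correlation: D scaled by the allele-frequency variances.
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    r = d / denom if denom > 0 else 0.0
    return d_prime, r


def keep_pair(pos1, pos2, r, window=1_000_000, cutoff=0.4):
    """Keep a pair if it lies within the window and |r| meets the cutoff.

    Inclusive bounds are an assumption made for this sketch.
    """
    return abs(pos1 - pos2) <= window and abs(r) >= cutoff


if __name__ == "__main__":
    d_prime, r = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)
    print(d_prime, r, keep_pair(41_196_311, 41_277_499, r))
```

The defaults for ``window`` and ``cutoff`` mirror the values quoted in the paragraph above (1,000,000 bp and 0.4).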
42 | 43 | The `output files `_ were split by chromosome with `output columns `_ indicating the identity of each pair of values and the resulting LD value. The output files have also been `loaded into BigQuery `_ with the same columns. Examples of using BigQuery to analyze LD are `available as Datalab notebooks `_. 44 | 45 | Google Cloud Platform data locations 46 | ------------------------------------ 47 | 48 | * Google Cloud Storage folder `gs://genomics-public-data/linkage-disequilibrium `_ 49 | * Google BigQuery Dataset ID `genomics-public-data:linkage_disequilibrium_1000G_phase_3 `_ 50 | -------------------------------------------------------------------------------- /docs/source/use_cases/load_data/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Load Data into Google Genomics 15 | =============================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/load_data/index.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | Here you will find task-oriented documentation regarding how to load data into Google Genomics. 34 | 35 | .. toctree:: 36 | :maxdepth: 2 37 | 38 | load_variants 39 | ../../job_troubleshooting 40 | 41 | 42 | 43 | -------------------------------------------------------------------------------- /docs/source/use_cases/load_data/load_variants.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Load Genomic Variants 15 | ===================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/load_data/load_variants.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. toctree:: 34 | :maxdepth: 3 35 | 36 | This tutorial has moved. See `Load Genomic Variants`_. 37 | -------------------------------------------------------------------------------- /docs/source/use_cases/perform_quality_control_checks/index.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Perform Quality Control Checks 15 | ============================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/perform_quality_control_checks/index.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | Here are some quality control measures written in a cloud-native manner. 34 | 35 | .. toctree:: 36 | :maxdepth: 1 37 | 38 | verify_bam_id 39 | qc_codelab 40 | -------------------------------------------------------------------------------- /docs/source/use_cases/perform_quality_control_checks/qc_codelab.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Perform Quality Control on Variants 15 | =================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. 
container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/perform_quality_control_checks/qc_codelab.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | A collection of quality control checks for variants is documented in the codelab `Quality Control using Google Genomics`_. The methods include: 34 | 35 | * Sample Level 36 | 37 | * Genome Call Rate 38 | * Missingness Rate 39 | * Singleton Rate 40 | * Heterozygosity Rate 41 | * Homozygosity Rate 42 | * Inbreeding Coefficient 43 | * Sex Inference 44 | * Ethnicity Inference 45 | * Genome Similarity 46 | 47 | * Variant Level 48 | 49 | * Ti/Tv by Genomic Window 50 | * Ti/Tv by Alternate Allele Counts 51 | * Ti/Tv by Depth 52 | * Missingness Rate 53 | * Hardy-Weinberg Equilibrium 54 | * Heterozygous Haplotype 55 | 56 | These methods were co-developed with researchers working on the Million Veteran Program data. For more detail, please see `the paper `__ and the `diagram of their full pipeline `__ with some additional quality control checks on `GitHub `__. 57 | 58 | To use this codelab with your own data: 59 | 60 | (1) First, load your data into Google Genomics and export your variants to BigQuery. See `Load Genomic Variants`_ for details. 61 | (2) Each section of `the codelab `_ discusses how to run that part on your own data. 
For example, update the BigQuery table name in `Part 1: Data Overview `_ 62 | -------------------------------------------------------------------------------- /docs/source/use_cases/run_familiar_tools/bioconductor.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Run Bioconductor on Compute Engine 15 | ================================== 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/run_familiar_tools/bioconductor.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | .. 
include:: /includes/bioconductor_deployment_sidebar.rst 34 | 35 | Bioconductor maintains Docker containers with R, Bioconductor packages, and RStudio Server all ready to go! It's a great way to set up your R environment quickly and start working. The instructions to deploy it to Google Compute Engine are below; to learn more about these containers, see http://www.bioconductor.org/help/docker/. 36 | 37 | 1. Click on `click-to-deploy Bioconductor`_ to navigate to the launcher page on the Cloud Platform Console. 38 | 39 | 1. Optional: change the *Machine type* if you would like to deploy a machine with more CPU cores or RAM. 40 | 2. Optional: change the *Data disk size (GB)* if you would like to use a larger persistent disk for your own files. 41 | 3. Optional: change *Docker image* if you would like to run a container with additional Bioconductor packages preinstalled. 42 | 43 | 2. Click on the *Deploy Bioconductor* button. 44 | 3. Follow the post-deployment instructions to log in to RStudio Server via your browser! 45 | 46 | If you want to deploy a different Docker container, such as the one from :doc:`/workshops/bioc-2015` or from https://github.com/isb-cgc/examples-R: 47 | 48 | 1. In field *Docker Image* choose item ``custom``. 49 | 2. Click on *More* to display the additional form fields. 50 | 3. In field *Custom docker image* paste in the Docker image path, such as ``gcr.io/bioc_2015/devel_sequencing`` or ``b.gcr.io/isb-cgc-public-docker-images/r-examples``. 51 | 52 | Change your virtual machine type (number of cores, amount of memory) 53 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 54 | 55 | 1. First, make sure results from your current R session are saved to the data disk (underneath ``/home/rstudio/data``) or another location outside of the container. 56 | 2. 
Follow these instructions to stop, resize, and start your VM: https://cloud.google.com/compute/docs/instances/changing-machine-type-of-stopped-instance 57 | 58 | "Stop" or "Delete" your virtual machine 59 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 60 | 61 | .. include:: /includes/c2d_deployment_teardown.rst 62 | 63 | -------------------------------------------------------------------------------- /docs/source/use_cases/run_familiar_tools/galaxy.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. | 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Run Galaxy on Compute Engine 15 | ============================ 16 | 17 | The `DREAM Tumor Heterogeneity challenge `_ using `Galaxy`_ and `Docker`_ is in full swing now. See the `quick-start videos `_ for how to set up Galaxy on Google Compute Engine. 18 | -------------------------------------------------------------------------------- /docs/source/use_cases/run_familiar_tools/ncbiblast.rst: -------------------------------------------------------------------------------- 1 | +--------------------------------------------------------------------------------------------------------------+ 2 | | Note: Google Genomics is now Cloud Life Sciences. 
| 3 | | The Google Genomics Cookbook on Read the Docs is not actively | 4 | | maintained and may contain incorrect or outdated information. | 5 | | The cookbook is only available for historical reference. For | 6 | | the most up to date documentation, view the official Cloud | 7 | | Life Sciences documentation at https://cloud.google.com/life-sciences. | 8 | | | 9 | | Also note that much of the Genomics v1 API surface has been | 10 | | superseded by `Variant Transforms `_ | 11 | | and `htsget `_. | 12 | +--------------------------------------------------------------------------------------------------------------+ 13 | 14 | Run NCBI BLAST on Compute Engine 15 | ================================ 16 | 17 | .. comment: begin: goto-read-the-docs 18 | 19 | .. container:: visible-only-on-github 20 | 21 | +-----------------------------------------------------------------------------------+ 22 | | **The properly rendered version of this document can be found at Read The Docs.** | 23 | | | 24 | | **If you are reading this on github, you should instead click** `here`__. | 25 | +-----------------------------------------------------------------------------------+ 26 | 27 | .. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/run_familiar_tools/ncbiblast.html 28 | 29 | __ RenderedVersion_ 30 | 31 | .. comment: end: goto-read-the-docs 32 | 33 | 34 | The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families. 35 | 36 | * For more information, see `NCBI BLAST Cloud Documentation`_ and `NCBI BLAST`_. 37 | * To deploy BLAST to Google Compute Engine, you can `click-to-deploy NCBI BLAST`_. 
38 | --------------------------------------------------------------------------------