├── .gitignore
├── .travis.yml
├── CLI-workflow-executor
├── GC-lite-workflow.ga
├── README.md
├── galaxy_credentials.yml
├── galaxy_credentials.yml.sample
├── input_files.yaml
├── run_galaxy_workflow.py
├── sample_lite.fa
└── wf-parameters.json
├── Dockerfile
├── FAQ.md
├── LICENSE
├── README.md
├── assets
├── img
│ ├── GalaxyDocker.png
│ ├── figure-pipeline_zigzag.png
│ ├── figure-pipeline_zigzag.svg
│ ├── graphclust_pipeline.png
│ ├── graphclust_pipeline.svg
│ ├── kitematic-1.png
│ ├── kitematic-2.png
│ ├── kitematic-3.png
│ ├── kitematic-32.png
│ ├── kitematic-4.png
│ ├── kitematic-5.png
│ ├── video-thumbnail.png
│ └── workflow_early.png
├── library
│ └── library_data.yaml
├── tools
│ ├── graphclust_tools.yml
│ ├── graphclust_tools2.yml
│ └── graphclust_utils.yml
├── tours
│ ├── graphclust_step_by_step.yaml
│ ├── graphclust_tutorial.yaml
│ └── graphclust_very_short.yaml
└── welcome.html
├── data
├── CLIP-sites
│ ├── Roquin1-PARCLIP-sites.fasta
│ └── SLBP-Galaxy1-[peaks_l2fc4_sorted_merged_ext60_merged.fasta].fasta
├── README
├── Rfam-cliques-dataset
│ ├── cliques-high-representatives.fa
│ └── cliques-low-representatives.fa
└── SHAPE-data
│ ├── Probealign_labeled_10-10.fa
│ └── Probealign_labeled_10-10.react
├── kitematic.md
└── workflows
├── GraphClust-MotifFinder.ga
├── GraphClust_main_1r.ga
├── GraphClust_main_2r.ga
├── GraphClust_main_3r.ga
├── README.md
├── auxiliary-workflows
├── Cluster-conservation-filter.ga
├── Cluster-conservation-filter_and_align.ga
├── Galaxy-Workflow-compute-SP-reactivity.ga
├── MAF-to-FASTA-Collection.ga
├── MAF-to-FASTA.ga
└── README.md
└── extra-workflows
├── Orthology
├── Galaxy-Workflow-MotifFinder-orthlncRNA-conservation-metrics.ga
└── README.md
├── README.md
├── RNAshapes
├── GraphClust_1r_brnashapes.ga
├── GraphClust_2r_brnashapes.ga
└── README.md
├── SHAPE
├── GraphClust_1r_SHAPE.ga
├── GraphClust_2r_SHAPE.ga
├── GraphClust_3r_SHAPE.ga
└── README.md
└── with-subworkflow
├── Galaxy-Workflow-MultiRoundClustering.ga
├── Galaxy-Workflow-iterative_clustering.ga
├── Galaxy-Workflow-iterative_clustering_r1.ga
├── GraphClust-iterative_clustering.ga
├── README.md
└── superflow-motif-finder-lncRNA-clustal.ga
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 |
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 |
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *.cover
46 | .hypothesis/
47 |
48 | # Translations
49 | *.mo
50 | *.pot
51 |
52 | # Django stuff:
53 | *.log
54 | local_settings.py
55 |
56 | # Flask stuff:
57 | instance/
58 | .webassets-cache
59 |
60 | # Scrapy stuff:
61 | .scrapy
62 |
63 | # Sphinx documentation
64 | docs/_build/
65 |
66 | # PyBuilder
67 | target/
68 |
69 | # IPython Notebook
70 | .ipynb_checkpoints
71 |
72 | # pyenv
73 | .python-version
74 |
75 | # celery beat schedule file
76 | celerybeat-schedule
77 |
78 | # dotenv
79 | .env
80 |
81 | # virtualenv
82 | venv/
83 | ENV/
84 |
85 | # Spyder project settings
86 | .spyderproject
87 |
88 | # Rope project settings
89 | .ropeproject
90 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | sudo: required
2 |
3 | language: python
4 | python: 2.7
5 |
6 | services:
7 | - docker
8 |
9 | env:
10 | - TOX_ENV=py27
11 |
12 | git:
13 | submodules: false
14 |
15 | before_install:
16 | - wget https://raw.githubusercontent.com/bgruening/galaxy-flavor-testing/master/Makefile
17 | - make docker_install
18 | - travis_wait 50 make docker_build
19 | - make docker_run
20 | - sleep 80
21 |
22 | install:
23 | - make install
24 |
25 | script:
26 | - make test_api
27 | - make test_ftp
28 | - make test_bioblend
29 | #- make test_docker_in_docker
30 |
--------------------------------------------------------------------------------
/CLI-workflow-executor/README.md:
--------------------------------------------------------------------------------
1 | This is based on the EBI Gene Expression Group repository [https://github.com/ebi-gene-expression-group/galaxy-workflow-executor](https://github.com/ebi-gene-expression-group/galaxy-workflow-executor)
2 |
3 | A sample invocation would be:
4 |
5 | ```python run_galaxy_workflow.py -C galaxy_credentials.yml -i input_files.yaml -W GC-lite-workflow.ga -k -H testCLI -P wf-parameters.json -G usegalaxy_eu```
6 |
7 | For detailed instructions and technical issues regarding CLI execution, please refer to these repositories: https://github.com/ebi-gene-expression-group/galaxy-workflow-executor and https://github.com/ebi-gene-expression-group/scxa-workflows
8 |
9 |
10 | # Galaxy workflow executor
11 |
12 | This setup uses bioblend to run a Galaxy workflow through the CLI:
13 |
14 | - Inputs:
15 | - Galaxy workflow as a JSON file (from Share workflow -> Download).
16 | - Parameters dictionary as JSON
17 | - Input files defined in YAML
18 | - Steps with allowed errors in YAML (optional)
19 | - History name (optional)
20 |
21 | # Galaxy workflow
22 |
23 | The workflow should be annotated with labels, ideally for all steps, but at least
24 | for the steps where you want to be able to set parameters through the parameters
25 | dictionary. It should be the JSON file resulting from Workflows (upper menu) -> Share workflow
26 | (on the drop-down menu of the workflow, in the workflow list) -> Download
27 | (in the following screen).
28 |
29 | # Parameters JSON
30 |
31 | It should have the following structure:
32 |
33 | ```json
34 | {
35 | "step_label_x": {
36 | "param_name": "value",
37 | ....
38 | "nested_param_name": {
39 | "n_param_name": "n_value",
40 | ....
41 | "x_param_name": "x_value"
42 | }
43 |
44 | },
45 | "step_label_x2": {
46 | ....
47 | },
48 | ....
49 | "other_galaxy_setup_params": { ... }
50 | }
51 | ```
52 |
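Bioblend's `invoke_workflow` takes parameters keyed by step, so a label-based dictionary like the one above has to be resolved against the workflow's `.ga` JSON. A minimal sketch of that resolution (a hypothetical helper, not the executor's actual code; it assumes the standard `.ga` layout with a `steps` mapping whose entries carry a `label` field):

```python
def resolve_step_params(ga_steps, label_params):
    """Resolve {step_label: params} to {step_id: params} using the
    'steps' section of a Galaxy .ga workflow JSON, where steps are
    keyed by ID and may carry a 'label' field."""
    label_to_id = {s["label"]: sid for sid, s in ga_steps.items() if s.get("label")}
    resolved = {}
    for label, params in label_params.items():
        if label not in label_to_id:
            # An unlabelled or misspelled step cannot receive parameters.
            raise KeyError(f"no step labelled {label!r} in the workflow")
        resolved[label_to_id[label]] = params
    return resolved

# Tiny inline stand-in for the 'steps' section of a .ga file:
steps = {"0": {"label": "cliques_lite"}, "1": {"label": "preprocessing"}}
print(resolve_step_params(steps, {"preprocessing": {"max_length": "120"}}))
# {'1': {'max_length': '120'}}
```

This is also why annotating steps with labels matters: parameters can only be targeted at steps the resolver can find by label.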
53 | # Input files in YAML
54 |
55 | It should point to files on the file system and set a name (which needs to match
56 | a workflow input label) and a file type (among those recognized by Galaxy).
57 |
58 | The structure of the YAML file for inputs is:
59 |
60 | ```yaml
61 | matrix:
62 | path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx
63 | type: txt
64 | genes:
65 | path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx_rows
66 | type: tsv
67 | barcodes:
68 | path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx_cols
69 | type: tsv
70 | gtf:
71 | dataset_id: fe139k21xsak
72 | ```
73 |
74 | where in this example the Galaxy workflow should have input labels called `matrix`,
75 | `genes`, `barcodes` and `gtf`. Where `path` is set within an input, the path must exist on the local file system. Alternatively, if the file is already on the Galaxy instance, its `dataset_id` can be given instead, as shown for the `gtf` case here.
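Staging these inputs could be sketched as follows (a hypothetical helper, not the executor's actual code; `upload_file` stands in for a bioblend call such as `gi.tools.upload_file`, and the input mapping is the result of `yaml.safe_load` on the file above):

```python
def stage_inputs(inputs, upload_file):
    """Turn the parsed inputs mapping into {label: dataset_id}: local
    'path' entries are uploaded via the supplied callable, while
    pre-existing 'dataset_id' entries pass through unchanged."""
    staged = {}
    for label, spec in inputs.items():
        if "dataset_id" in spec:
            staged[label] = spec["dataset_id"]
        else:
            staged[label] = upload_file(spec["path"], spec.get("type", "auto"))
    return staged

# Example with a fake uploader instead of a live Galaxy connection:
inputs = {
    "matrix": {"path": "counts.mtx", "type": "txt"},
    "gtf": {"dataset_id": "fe139k21xsak"},
}
print(stage_inputs(inputs, lambda path, ftype: f"uploaded:{path}"))
# {'matrix': 'uploaded:counts.mtx', 'gtf': 'fe139k21xsak'}
```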
76 |
77 | # Steps with allowed errors
78 |
79 | This optional YAML file tells the executor which steps are allowed to fail without the overall execution being considered
80 | failed, in which case result files are retrieved anyway. This accommodates the fact that on a production setup there might
81 | be edge conditions on datasets that could produce acceptable failures.
82 |
83 | The structure of the file relies on the labels for steps used in the workflow and parameter files:
84 |
85 | ```yaml
86 | step_label_x:
87 | - any
88 | step_label_z:
89 | - 1
90 | - 43
91 | ```
92 |
93 | The above example means that the step with label `step_label_x` can fail with any error code, whereas the step with label
94 | `step_label_z` will only be allowed to fail with codes 1 or 43.
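The acceptance check this file drives can be sketched as (a hypothetical helper mirroring the semantics above, not the executor's actual code):

```python
def failure_allowed(allowed_steps, step_label, exit_code):
    """True if the step may fail: 'any' whitelists every exit code,
    otherwise the exit code must be listed explicitly; steps absent
    from the mapping are never allowed to fail."""
    codes = allowed_steps.get(step_label, [])
    return "any" in codes or exit_code in codes

# Parsed form of the YAML example above:
allowed_steps = {"step_label_x": ["any"], "step_label_z": [1, 43]}
print(failure_allowed(allowed_steps, "step_label_x", 137))  # True: any code allowed
print(failure_allowed(allowed_steps, "step_label_z", 2))    # False: only 1 or 43
```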
95 |
96 |
97 |
--------------------------------------------------------------------------------
/CLI-workflow-executor/galaxy_credentials.yml:
--------------------------------------------------------------------------------
1 | __default: usegalaxy_eu
2 |
3 | usegalaxy_eu:
4 | key: "paste your account API key from https://usegalaxy.eu/user/api_key here"
5 | url: "https://usegalaxy.eu"
6 | docker_cloud:
7 | key: "xx"
8 | url: "http://4.4.4.X:8088 (public/private IP docker instance here)"
9 |
--------------------------------------------------------------------------------
/CLI-workflow-executor/galaxy_credentials.yml.sample:
--------------------------------------------------------------------------------
1 | __default: embassy
2 |
3 | embassy:
4 | key: "xx"
5 | url: "http://193.62.52.166:30700"
6 | ebi_cluster:
7 | key: "xx"
8 | url: "http://galaxy-gxa-001:8088"
9 |
--------------------------------------------------------------------------------
/CLI-workflow-executor/input_files.yaml:
--------------------------------------------------------------------------------
1 | cliques_lite:
2 | class: File
3 | path: sample_lite.fa
4 | type: fasta
5 |
--------------------------------------------------------------------------------
/CLI-workflow-executor/sample_lite.fa:
--------------------------------------------------------------------------------
1 | >RF00001_rep.0_AL096764.11/46123-46004 RF00001
2 | GUCUAUGGCCAUACCACCCUGAAUGUGCUUGAUCUCAUCUGAUCUCGUGAAGCCAAGCAGGGUGGGGCCUAGUUAGUACUUGGAUGGGAGACUUCCUGGGAAUAUAAGCUGCUGUUGGCU
3 | >RF00001_rep.1_U89919.1/939-1056 RF00001
4 | CUUUACGGCCACACCACCCUGAACGCACCGGAUCUCGACUGACCUUGAAAGCUAAGCAGGAUCGGGCCUGGUUAGUAUUGGGAUGGCAGACCCCCUGGAAAUACAGGGUGCUGAAGGU
5 | >RF00001_rep.2_AJ508600.1/161-58 RF00001
6 | GUCUACAGCCAUACCAUCCUGAACAUGCCAGAUCUUGUCUGACCUCUGAAGCUAAGCAGGGUCAAGCCUGGUUAGUACUUGGGAGAAGCUGGUGUGGCUAGACC
7 | >RF00005_rep.0_M15347.1/1040-968 RF00005
8 | GGCUCCAUAGCUCAGGGGUUAGAGCACUGGUCUUGUAAACCAGGGGUCGCGAGUUCAAUUCUCGCUGGGGCUU
9 | >RF00005_rep.10_X58792.1/174-245 RF00005
10 | GGUCCCAUGGUGUAAUGGUUAGCACUCUGGACUUUGAAUCCAGCGAUCCGAGUUCAAAUCUCGGUGGGACCU
11 | >RF00005_rep.11_AF346992.1/15890-15955 RF00005
12 | GUCCUUGUAGUAUAAACUAAUACACCAGUCUUGUAAACCGGAGAUGAAAACCUUUUUCCAAGGACA
13 | >RF00005_rep.12_AC108081.2/59868-59786 RF00005
14 | GUCAGGAUGGCCGAGCGGUCUAAGGCGCUGCGUUCAGGUCGCAGUCUCCCCUGGAGGCGUGGGUUCGAAUCCCACUUCUGACA
15 | >RF00005_rep.13_AC067849.6/4771-4840 RF00005
16 | CACUGUAAAGCUAACUUAGCAUUAACCUUUUAAGUUAAAGAUUAAGAGAACCAACACCUCUUUACAGUGA
17 | >RF00005_rep.14_AL021808.2/65570-65498 RF00005
18 | GCUUCUGUAGUGUAGUGGUUAUCACGUUCGCCUCACACGCGAAAGGUCCCCGGUUCGAAACCGGGCAGAAGCA
19 | >RF00005_rep.15_AC008443.10/42590-42518 RF00005
20 | GCCCGGCUAGCUCAGUCGGUAGAGCAUGAGACUCUUAAUCUCAGGGUCGUGGGUUCGAGCCCCACGUUGGGCG
21 | >RF00005_rep.16_AL133551.13/12355-12436 RF00005
22 | GCAGCGAUGGCCGAGUGGUUAAGGCGUUGGACUUGAAAUCCAAUGGGGUCUCCCCGCGCAGGUUCGAACCCUGCUCGCUGCG
23 | >RF00005_rep.17_AL021918.1/54817-54736 RF00005
24 | GUAGUCGUGGCCGAGUGGUUAAGGCGAUGGACUUGAAAUCCAUUGGGGUUUCCCCGCGCAGGUUCGAAUCCUGUCGGCUACG
25 | >RF00005_rep.18_AL021918.1/81116-81197 RF00005
26 | GUAGUCGUGGCCGAGUGGUUAAGGCGAUGGACUAGAAAUCCAUUGGGGUUUCCCCACGCAGGUUCGAAUCCUGCCGACUACG
27 | >RF00005_rep.19_AF134583.1/1816-1744 RF00005
28 | UAGAUUGAAGCCAGUUGAUUAGGGUGCUUAGCUGUUAACUAAGUGUUUGUGGGUUUAAGUCCCAUUGGUCUAG
29 | >RF00005_rep.1_AC005329.1/7043-6971 RF00005
30 | GCCGAAAUAGCUCAGUUGGGAGAGCGUUAGACUGAAGAUCUAAAGGUCCCUGGUUCGAUCCCGGGUUUCGGCA
31 | >RF00005_rep.20_AL671879.2/100356-100285 RF00005
32 | GGGGAUGUAGCUCAGUGGUAGAGCGCAUGCUUCGCAUGUAUGAGGCCCCGGGUUCGAUCCCCGGCAUCUCCA
33 | >RF00005_rep.21_AL355149.13/15278-15208 RF00005
34 | GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCCUCCCACGCGGGAGACCCGGGUUCAAUUCCCGGCCAAUGCA
35 | >RF00005_rep.22_AL590385.23/26487-26416 RF00005
36 | GCGUUGGUGGUAUAGUGGUGAGCAUAGCUGCCUUCCAAGCAGUUGACCCGGGUUCGAUUCCCGGCCAACGCA
37 | >RF00005_rep.23_M16479.1/42-123 RF00005
38 | GGUGGGGUUCCCGAGCGGCCAAAGGGAGCAGACUCUAAAUCUGCCGUCAUCGACUUCGAAGGUUCGAAUCCUUCCCCCACCA
39 | >RF00005_rep.24_AC004941.2/32735-32806 RF00005
40 | GGGGGUAUAGCUCAGGGGUAGAGCAUUUGACUGCAGAUCAAGAGGUCCCUGGUUCAAAUCCAGGUGCCCCCU
41 | >RF00005_rep.25_AC006449.19/196857-196784 RF00005
42 | GUCUCUGUGGCGCAAUCGGUUAGCGCGUUCGGCUGUUAACCGAAAGGUUGGUGGUUCGAGCCCACCCAGGGACG
43 | >RF00005_rep.26_AF346999.1/4402-4331 RF00005
44 | UAGGAUGGGGUGUGAUAGGUGGCACGGAGAAUUUUGGAUUCUCAGGGAUGGGUUCGAUUCUCAUAGUCCUAG
45 | >RF00005_rep.27_AL352978.6/119697-119770 RF00005
46 | GGCCGGUUAGCUCAGUUGGUUAGAGCGUGGUGCUAAUAACGCCAAGGUCGCGGGUUCGAUCCCCGUACGGGCCA
47 | >RF00005_rep.28_X04779.1/1-73 RF00005
48 | CCUUCGAUAGCUCAGCUGGUAGAGCGGAGGACUGUAGAUCCUUAGGUCGCUGGUUCGAUUCCGGCUCGAAGGA
49 | >RF00005_rep.29_AF381996.2/4265-4333 RF00005
50 | AGAAAUAUGUCUGAUAAAAGAGUUACUUUGAUAGAGUAAAUAAUAGGAGCUUAAACCCCCUUAUUUCUA
51 | >RF00005_rep.2_AL662865.4/12206-12135 RF00005
52 | GGUUCCAUGGUGUAAUGGUUAGCACUCUGGACUCUGAAUCCAGCGAUCCGAGUUCAAAUCUCGGUGGAACCU
53 | >RF00005_rep.30_AL132988.4/95773-95841 RF00005
54 | AAGGGCUUAGCUUAAUUAAAGUGGCUGAUUUGCGUUCAGUUGAUGCAGAGUGGGGUUUUGCAGUCCUUA
55 | >RF00005_rep.31_AC092686.3/29631-29561 RF00005
56 | GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCCUGCCACGCGGGAGGCCCGGGUUCGAUUCCCGGCCAAUGCA
57 | >RF00005_rep.32_AF347015.1/5892-5827 RF00005
58 | GGUAAAAUGGCUGAGUGAAGCAUUGGACUGUAAAUCUAAAGACAGGGGUUAGGCCUCUUUUUACCA
59 | >RF00005_rep.33_AC018638.5/4694-4623 RF00005
60 | GGCUCGUUGGUCUAGGGGUAUGAUUCUCGCUUAGGGUGCGAGAGGUCCCGGGUUCAAAUCCCGGACGAGCCC
61 | >RF00005_rep.34_AC008443.10/43006-42934 RF00005
62 | GUUUCCGUAGUGUAGUGGUUAUCACGUUCGCCUCACACGCGAAAGGUCCCCGGUUCGAAACCGGGCGGAAACA
63 | >RF00005_rep.35_AC005783.1/27398-27326 RF00005
64 | GUUUCCGUAGUGUAGCGGUUAUCACAUUCGCCUCACACGCGAAAGGUCCCCGGUUCGAUCCCGGGCGGAAACA
65 | >RF00005_rep.36_AC007298.17/145366-145295 RF00005
66 | UCCUCGUUAGUAUAGUGGUGAGUAUCCCCGCCUGUCACGCGGGAGACCGGGGUUCGAUUCCCCGACGGGGAG
67 | >RF00005_rep.37_AF347001.1/16015-15948 RF00005
68 | CAGAGAAUAGUUUAAAUUAGAAUCUUAGCUUUGGGUGCUAAUGGUGGAGUUAAAGACUUUUUCUCUGA
69 | >RF00005_rep.38_J00309.1/356-427 RF00005
70 | UCCCUGGUGGUCUAGUGGCUAGGAUUCGGCGCUUUCACCGCCGCGCCCCGGGUUCGAUUCCCGGCCAGGAAU
71 | >RF00005_rep.39_AL031229.2/40502-40430 RF00005
72 | GUUUCCGUAGUGUAGUGGUUAUCACGUUCGCCUAACACGCGAAAGGUCCCUGGAUCAAAACCAGGCGGAAACA
73 | >RF00005_rep.3_Z54587.1/126-45 RF00005
74 | GGUAGCGUGGCCGAGCGGUCUAAGGCGCUGGAUUUAGGCUCCAGUCUCUUCGGAGGCGUGGGUUCGAAUCCCACCGCUGCCA
75 | >RF00005_rep.40_AF382013.1/10403-10467 RF00005
76 | UGGUAUAUAGUUUAAACAAAACGAAUGAUUUCGACUCAUUAAAUUAUGAUAAUCAUAUUUACCAA
77 | >RF00005_rep.41_AC093311.2/140036-139968 RF00005
78 | GUUCUUGUAGUUGAAAUACAACGAUGGUUUUUCAUAUCAUUGGUCGUGGUUGUAGUCCGUGCGAGAAUA
79 | >RF00005_rep.42_AF347015.1/5827-5762 RF00005
80 | AGCUCCGAGGUGAUUUUCAUAUUGAAUUGCAAAUUCGAAGAAGCAGCUUCAAACCUGCCGGGGCUU
81 | >RF00005_rep.43_L23320.1/77-10 RF00005
82 | ACUCUUUUAGUAUAAAUAGUACCGUUAACUUCCAAUUAACUAGUUUUGACAACAUUCAAAAAAGAGUA
83 | >RF00005_rep.44_AC008670.6/83597-83665 RF00005
84 | GUAAAUAUAGUUUAACCAAAACAUCAGAUUGUGAAUCUGACAACAGAGGCUCACGACCCCUUAUUUACC
85 | >RF00005_rep.45_AF382005.1/581-651 RF00005
86 | GUUUAUGUAGCUUACCUCCUCAAAGCAAUACACUGAAAAUGUUUAGACGGGCUCACAUCACCCCAUAAACA
87 | >RF00005_rep.46_AF347015.1/1604-1672 RF00005
88 | CAGAGUGUAGCUUAACACAAAGCACCCAACUUACACUUAGGAGAUUUCAACUUAACUUGACCGCUCUGA
89 | >RF00005_rep.4_Z98744.2/66305-66234 RF00005
90 | AGCAGAGUGGCGCAGCGGAAGCGUGCUGGGCCCAUAACCCAGAGGUCGAUGGAUCGAAACCAUCCUCUGCUA
91 | >RF00005_rep.5_AL590385.23/26129-26058 RF00005
92 | UCCCUGGUGGUCUAGUGGUUAGGAUUCGGCGCUCUCACCGCCGCGGCCCGGGUUCGAUUCCCGGUCAGGGAA
93 | >RF00005_rep.6_X93334.1/6942-7009 RF00005
94 | AAGGUAUUAGAAAAACCAUUUCAUAACUUUGUCAAAGUUAAAUUAUAGGCUAAAUCCUAUAUAUCUUA
95 | >RF00005_rep.7_AF347005.1/12268-12338 RF00005
96 | ACUUUUAAAGGAUAACAGCUAUCCAUUGGUCUUAGGCCCCAAAAAUUUUGGUGCAACUCCAAAUAAAAGUA
97 | >RF00005_rep.8_AF134583.1/1599-1666 RF00005
98 | AGAAAUUUAGGUUAAAUACAGACCAAGAGCCUUCAAAGCCCUCAGUAAGUUGCAAUACUUAAUUUCUG
99 | >RF00005_rep.9_AP000442.6/2022-1950 RF00005
100 | GCCCGGAUAGCUCAGUCGGUAGAGCAUCAGACUUUUAAUCUGAGGGUCCAGGGUUCAAGUCCCUGUUCGGGCG
101 | >RF00006_rep.0_AF045145.1/1-88 RF00006
102 | GGCUGGCUUUAGCUCAGCGGUUACUUCGCGUGUCAUCAAACCACCUCUCUGGGUUGUUCGAGACCCGCGGGCGCUCUCCAGCCCUCUU
103 | >RF00006_rep.1_AC005219.1/49914-50014 RF00006
104 | GGGUCGGAGUUAGCUCAAGCGGUUACCUCCUCAUGCCGGACUUUCUAUCUGUCCAUCUCUGUGCUGGGGUUCGAGACCCGCGGGUGCUUACUGACCCUUUU
105 | >RF00006_rep.2_AF045143.1/1-98 RF00006
106 | GGCUGGCUUUAGCUCAGCGGUUACUUCGACAGUUCUUUAAUUGAAACAAGCAACCUGUCUGGGUUGUUCGAGACCCGCGGGCGCUCUCCAGUCCUUUU
107 | >RF00006_rep.3_AF045144.1/1-88 RF00006
108 | GGCUGGCUUUAGCUCAGCGGUUACUUCGAGUACAUUGUAACCACCUCUCUGGGUGGUUCGAGACCCGCGGGUGCUUUCCAGCUCUUUU
109 | >RF00019_rep.0_V00584.1/39-151 RF00019
110 | GGCUGGUCCGAAGGUAGUGAGUUAUCUCAAUUGAUUGUUCACAGUCAGUUACAGAUCGAACUCCUUGUUCUACUCUUUCCCCCCUUCUCACUACUGCACUUGACUAGUCUUUU
111 | >RF00019_rep.1_L32608.1/283-377 RF00019
112 | GGCUGGUCCGAUGGUAGUGGGUUAUCAGAACUUAUUAACAUUAGUGUCACUAAAGUUGGUAUACAACCCCCCACUGCUAAAUUUGACUGGCUUUU
113 | >RF00019_rep.2_ABBA01033605.1/1707-1808 RF00019
114 | GGCUGGUCCGAGUGCAGUGGUGUUUACAACUAAUUGAUCACAACCAGUUACAGAUUUCUUUGUUCCUUCUCCACUCCCACUGCUUCACUUGACUAGCCUUUU
115 | >RF00019_rep.3_AADD01087475.1/2469-2552 RF00019
116 | AGUUGGUCCGAGUGUUGUGGGUUAUUGUUAAGUUGAUUUAACAUUGUCUCCCCCCACAACCGCGCUUGACUAGCUUGCUGUUUU
117 | >RF00027_rep.0_AF480570.1/1-79 RF00027
118 | GUGAGGUAGUAAGUUGUAUUGUUGUGGGGUAGGGAUAUUAGGCCCCAAUUAGAAGAUAACUAUACAACUUACUACUUUC
119 | >RF00027_rep.1_AC048341.22/3536-3622 RF00027
120 | CCUGGCUGAGGUAGUAGUUUGUGCUGUUGGUCGGGUUGUGACAUUGCCCGCUGUGGAGAUAACUGCGCAAGCUACUGCCUUGCUAGU
121 | >RF00027_rep.2_AC018755.3/119936-120011 RF00027
122 | CCGGGCUGAGGUAGGAGGUUGUAUAGUUGAGGAGGACACCCAAGGAGAUCACUAUACGGCCUCCUAGCUUUCCCCA
123 | >RF00031_rep.0_X71973.1/730-791 RF00031
124 | CCGGCACUCAUGACGGCCUGCCUGCAAACCUGCUGGUGGGGCAGACCCGAAAAUCCAGCGUG
125 | >RF00031_rep.1_U67171.1/375-442 RF00031
126 | GACGCUUCAUGAUAGGAAGGACUGAAAAGUCUUGUGGACACCUGGUCUUUCCCUGAUGUUCUCGUGGC
127 | >RF00031_rep.2_S79854.1/1605-1666 RF00031
128 | CACUGCUGAUGACGAACUAUCUCUAACUGGUCUUGACCACGAGCUAGUUCUGAAUUGCAGGG
129 | >RF00031_rep.3_X53463.1/847-903 RF00031
130 | UUCACAGAAUGAUGGCACCUUCCUAAACCCUCAUGGGUGGUGUCUGAGAGGCGUGAA
131 | >RF00031_rep.4_AF195141.1/689-759 RF00031
132 | GACUGACAUUAUGAAGGCCUGUACUGAAGACAGCAAGCUGUUAGUACAGACCAGAUGCUUUCUUGGCAGGC
133 | >RF00031_rep.5_AF093774.1/5851-5916 RF00031
134 | GUGUGCGGAUGAUAACUACUGACGAAAGAGUCAUCGACCUCAGUUAGUGGUUGGAUGUAGUCACAU
135 | >RF00031_rep.6_BC003127.1/865-928 RF00031
136 | GUCACUGCAUGAUCCGCUCUGGUCAAACCCUUCCAGGCCAGCCAGAGUGGGGAUGGUCUGUGAC
--------------------------------------------------------------------------------
/CLI-workflow-executor/wf-parameters.json:
--------------------------------------------------------------------------------
1 | {
2 | "preprocessing": {
3 | "max_length": "120"
4 | }
5 | }
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | # Galaxy - GraphClust
2 | FROM quay.io/bgruening/galaxy:19.01
3 |
4 | MAINTAINER Björn A. Grüning, bjoern.gruening@gmail.com
5 |
6 | ENV GALAXY_CONFIG_BRAND GraphClust
7 | ENV ENABLE_TTS_INSTALL True
8 |
9 | # Install tools
10 | # Split into multiple layers, it seems that there is a max-layer size.
11 | COPY ./assets/tools/graphclust_tools.yml $GALAXY_ROOT/tools.yaml
12 | RUN install-tools $GALAXY_ROOT/tools.yaml && \
13 | /tool_deps/_conda/bin/conda clean --tarballs --yes && \
14 | rm /export/galaxy-central/ -rf
15 |
16 | COPY ./assets/tools/graphclust_tools2.yml $GALAXY_ROOT/tools_2.yaml
17 | RUN install-tools $GALAXY_ROOT/tools_2.yaml && \
18 | /tool_deps/_conda/bin/conda clean --tarballs --yes && \
19 | rm /export/galaxy-central/ -rf
20 |
21 | COPY ./assets/tools/graphclust_utils.yml $GALAXY_ROOT/tools_3.yaml
22 | RUN install-tools $GALAXY_ROOT/tools_3.yaml && \
23 | /tool_deps/_conda/bin/conda clean --tarballs --yes && \
24 | rm /export/galaxy-central/ -rf
25 |
26 |
27 | # Add Galaxy interactive tours
28 | ADD ./assets/tours/* $GALAXY_ROOT/config/plugins/tours/
29 |
30 | # Data libraries
31 | ADD ./assets/library/library_data.yaml $GALAXY_ROOT/library_data.yaml
32 |
33 | # Add workflows to the Docker image
34 | ADD ./workflows/*.ga $GALAXY_ROOT/workflows/
35 |
36 | # Download training data and populate the data library
37 | RUN startup_lite && \
38 | sleep 30 && \
39 | . $GALAXY_VIRTUAL_ENV/bin/activate && \
40 | workflow-install --workflow_path $GALAXY_ROOT/workflows/ -g http://localhost:8080 -u $GALAXY_DEFAULT_ADMIN_USER -p $GALAXY_DEFAULT_ADMIN_PASSWORD
41 | # && \
42 | # setup-data-libraries -i $GALAXY_ROOT/library_data.yaml -g http://localhost:8080 -u $GALAXY_DEFAULT_ADMIN_USER -p $GALAXY_DEFAULT_ADMIN_PASSWORD
43 |
44 | # Container Style
45 | ADD ./assets/img/workflow_early.png $GALAXY_CONFIG_DIR/web/welcome_image.png
46 | ADD ./assets/welcome.html $GALAXY_CONFIG_DIR/web/welcome.html
47 |
--------------------------------------------------------------------------------
/FAQ.md:
--------------------------------------------------------------------------------
1 | # Questions regarding setup and usage:
2 |
3 | 1. Q: How can I stop a running docker instance of Galaxy-GraphClust?
4 |
5 | A: If you are running the container in interactive mode (i.e. `docker run -i`), use `Ctrl+C`. To stop ALL Docker instances on your computer you can run this command in a terminal: `sudo docker stop $(sudo docker ps -a -q)`
6 |
7 | 2. Q: After a few experiments and upgrades, lots of disk storage is occupied. How can I clean it up?
8 |
9 | A: Docker takes a conservative approach to cleaning up unused data objects. Below are some ways to reclaim disk space, ordered from most to least conservative:
10 |
11 | * Using `docker system prune` command, manual [here](https://docs.docker.com/config/pruning/)
12 | * Please make sure no unintended container instance is running in the background. You can get a list of all containers with `docker ps -a` and remove them if necessary with `docker rm ID-or-NAME`.
13 | * The above steps do not remove dangling and unneeded images, which usually take up most of the space. You can use `docker images` to get a list of them, and `docker rmi image-ID` to remove individual images.
14 | * To auto-remove a container after exiting, you can use `docker run --rm`.
15 | * A detailed tutorial about these and further ways can be found here: [https://www.tecmint.com/remove-docker-images-containers-and-volumes/](https://www.tecmint.com/remove-docker-images-containers-and-volumes/)
16 |
17 | 3. Q: I would like to customize the workflow settings, but there are so many parameters. What can I do?
18 |
19 | A: The GraphClust2 workflow is a collection of more than 15 tools, many of which invoke fairly complex methodologies. We have provided the pre-configurations that we think most users will need, according to our own experience and the feedback from GraphClust2 users and collaborators.
20 | Each tool wrapper is supplemented with brief help descriptions for the arguments and/or external links to the tool's documentation. We are extending the in-Galaxy help descriptions and Galaxy tutorials; your feedback is much appreciated. If you would like to customize the configurations and do not know where to start, we recommend beginning by adapting the first and last steps of the workflow.
21 |
22 | GraphClust2 takes a windowing approach to folding and clustering long input sequences. The window size and overlap ratio can be adapted to the expected structure features. Starting with shorter window lengths (50-100 nt) is a good idea, especially if it is not known whether the putative structured elements cover the entire sequence or are only local elements. With shorter windows, small elements such as stem-loops can be identified; afterwards you may re-run the pipeline with an increased window length to capture the complete structure.
23 |
24 | In the last step, `cluster_collection_report`, GraphClust2 assigns the elements to the best-matching cluster and also aligns the best (top) matching entries of each cluster. `results_top_num` defines how many of the top entries to align; increasing this number helps to find covariations and to identify a reliable conserved element. Usually aligning the top 10-30 entries or more helps to identify reliable structure conservation and covariation. The other parameter to consider is the covariance model hit criterion (E-value or bit score). The E-value works very well (and is designed) especially for structured non-coding RNAs with defined boundaries, such as sequences in the Rfam database. We have found that switching to the CM bit-score option (`Use CM score for cutoff`) works better for identifying structured elements embedded within a surrounding sequence context.
25 |
26 | 4. Q: The workflow runs forever on my computer. Isn't linear run-time one of the highlighted features?
27 |
28 | A: The apparent practical bottleneck of the workflow, especially for local instances, is the covariance model calibration, specifically the `cmcalibrate` step integrated into the `cmbuild` Infernal wrapper. This calibration is necessary to compute a reliable E-value for the significance of a CM hit, but generating around a million bases of background sequences is time-consuming.
29 |
30 | We suggest using the instance on our European Galaxy server, where the cmbuild step is pre-configured to use multiple processors and the server is backed by thousands of computing nodes. In the Docker instance the calibration is by default performed on a single core; we recommend asking your Galaxy admin to configure the wrapper according to the backend hardware. Alternatively, you can reduce the length of the random sequences (`-L` in the cmbuild-cmcalibrate wrapper). Please refer to the Infernal manual, and note that the E-values might then no longer be reliable.
31 |
32 | 5. Q: How can I do more rounds?
33 |
34 | A: To extend an existing GraphClust workflow with another round, you should run the workflow called Galaxy-Workflow-single_round_for_extension: [GraphClust_two](https://raw.githubusercontent.com/BackofenLab/docker-galaxy-graphclust/master/workflows/Galaxy-Workflow-single_round_for_extension.ga)
35 | The inputs for this workflow are the files generated by the GraphClust workflow. The name of each input corresponds to the name of the produced file, so you just need to choose the needed file from a drop-down selection. An important parameter for this workflow is the **round number**, which must be specified in the **NSPDK_candidateCluster**, **pgma_graphclust** and **cluster_collection_report** tools. Alternatively, it is recommended to use the *with-subworkflows* flavors.
36 |
37 | 6. Q: In my Ubuntu host system the container is running but constantly reports the error: `could not connect to server: Connection refused`
38 |
39 | A0: For Ubuntu users we recommend the 16.04 LTS version, which is delivered with kernel 4.2 or higher.
40 |
41 | A1: Docker manager is tightly coupled with the host Linux kernel. Under certain Linux kernel the docker storage system might fail.
42 | Please proceed with the following commands or contact you administrator ( __Warning__ please be careful of potential data loss with this procedure):
43 |
44 | ```
45 | sudo apt update; sudo apt upgrade;
46 | sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
47 | sudo modprobe aufs
48 | sudo service docker stop
49 | sudo rm -rf /var/lib/docker/overlay2
50 | sudo service docker restart
51 | ```
52 | For more information please check Docker documentation: https://docs.docker.com/engine/userguide/storagedriver/aufs-driver/
53 |
54 |
55 | # Login to the docker instance:
56 | To give users distinct histories and workflows, the Galaxy server requires each user to register on first access. **By default, anyone with access to the host network can register. No registration confirmation email will be sent to the given address.** So you can register with any custom (including non-existent) email address. There is also a default admin user [described here](https://bgruening.github.io/docker-galaxy-stable/users-passwords.html). To change the default authorization settings, please refer to the Galaxy Wiki section [Authentication](https://wiki.galaxyproject.org/Develop/Authentication)
57 |
58 | * To register (first time only):
59 | * On the top right of the panel go to **User→Register**
60 | * Provide a custom email address and password, confirm your password and enter a public name
61 |
62 | * To login:
63 | * On the top right of the panel go to **User→Login**
64 | * Provide your registered email address and password
65 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
8 |
9 | GraphClust2
10 | ========================
11 | GraphClust2 is a workflow for scalable clustering of RNAs based on sequence and secondary structure features. GraphClust2 is implemented within the Galaxy framework and consists of a set of integrated Galaxy tools and flavors of the linear-time clustering workflow.
12 |
13 |
14 | Table of Contents
15 | =================
16 | * [GraphClust2](#graphclust2)
17 | * [Table of Contents](#table-of-contents)
18 | * [Availability](#availability)
19 | * [GraphClust2 on European Galaxy Server](#graphclust2-on-european-galaxy-server)
20 | * [GraphClust2 Docker 🐳 Image](#graphclust2-docker-whale-image)
21 | * [Installation and Setup](#installation-and-setup)
22 | * [Requirements](#requirements)
23 | * [Running the docker instance](#running-the-docker-instance)
24 | * [Using graphic interface (Windows/MacOS)](#using-graphic-interface-windowsmacos)
25 | * [Installation on a Galaxy instance](#installation-on-a-galaxy-instance)
26 | * [Setup support](#setup-support)
27 | * [Demo instance](#demo-instance)
28 | * [Usage - How to run GraphClust2](#usage---how-to-run-graphclust2)
29 | * [Browser access to the server](#browser-access-to-the-server)
30 | * [Public server](#public-server)
31 | * [Docker instance](#docker-instance)
32 | * [Video tutorial](#video-tutorial)
33 | * [Interactive tours](#interactive-tours)
34 | * [Import additional workflows](#import-additional-workflows)
35 | * [Workflow flavors](#workflow-flavors)
36 | * [Workflows on the running server](#workflows-on-the-running-server)
37 | * [command line support (beta)](#command-line-support-beta)
38 | * [Frequently Asked Questions](#frequently-asked-questions)
39 | * [Workflow overview](#workflow-overview)
40 | * [Input](#input)
41 | * [Output](#output)
42 | * [Support & Bug Reports](#support--bug-reports)
43 | * [References](#references)
44 |
45 |
46 | # Availability
47 |
48 | ## GraphClust2 on European Galaxy Server
49 | GraphClust2 is accessible on European Galaxy server at:
50 | * [https://graphclust.usegalaxy.eu](https://graphclust.usegalaxy.eu)
51 |
52 | ## GraphClust2 Docker :whale: Image
53 | It is also possible to run GraphClust2 as a stand-alone solution using a Docker container, a pre-configured flavor of the official [Galaxy Docker image](https://github.com/bgruening/docker-galaxy-stable)
54 | customized with GraphClust2 tools, tutorial interactive tours and workflows.
55 |
56 | ### Installation and Setup
57 | #### Requirements
58 |
59 | For running GraphClust2 locally, the `Docker` client is required.
60 | Docker supports the three major desktop operating systems: Linux, Windows and Mac OSX. Please refer to the [Docker installation guideline](https://docs.docker.com/installation) for details.
61 |
62 | A GUI client can also be used on Windows and Mac operating systems.
63 | Please follow the graphical instructions for using Kitematic client [here](./kitematic.md).
64 |
65 | **Hardware requirements:**
66 | * Minimum 8 GB of memory
67 | * Minimum 20 GB of free disk space; 100 GB is recommended.
68 |
69 | **Supported operating systems**
70 |
71 | GraphClust2 has been tested on these operating systems:
72 | * *Windows*: 10, using [Kitematic](https://kitematic.com/)
73 | * *macOS*: 10.1x or higher, using [Kitematic](https://kitematic.com/)
74 | * *Linux*: kernel 4.2 or higher, preferably with aufs support (see [FAQ](FAQ.md))
75 |
76 |
77 | ### Running the Docker instance
78 | From the command line:
79 |
80 | ```bash
81 | docker run -i -t -p 8080:80 backofenlab/docker-galaxy-graphclust
82 | ```
83 |
84 | For details about the Docker commands, please check the official [run reference](https://docs.docker.com/engine/reference/run/). Galaxy-specific run options and configuration support for compute cluster systems are detailed in the Galaxy Docker [repository](https://github.com/bgruening/docker-galaxy-stable).
85 |
86 | ### Using the graphical interface (Windows/macOS)
87 | Please check this [step-by-step guide](./kitematic.md).
88 |
89 | ## Installation on a Galaxy instance
90 | GraphClust2 can be integrated into any available Galaxy server. All the tools and workflows needed to run the
91 | GraphClust2 pipeline are listed under [workflows](./workflows/) and [tools-list](./assets/tools/).
93 |
94 | #### Setup support
95 | If you encounter problems, please use the recommended settings, check the [FAQs](./FAQ.md), or contact us via the [*Issues*](https://github.com/BackofenLab/GraphClust-2/issues) section of the repository.
96 |
97 |
98 | ## Demo instance
99 | A running demo instance of GraphClust2 is available at http://192.52.32.222:8080/.
100 | Please note that this instance is simply a cloud deployment of the provided Docker container, intended for quick inspection and demonstration purposes. Its computation
101 | capacity is limited, and long-term availability is currently not planned. We recommend following the instructions above. Please contact us if you would prefer this service to stay available.
102 |
103 | # Usage - How to run GraphClust2
104 |
105 | ## Browser access to the server
106 | ### Public server
107 | Please register on our European Galaxy server [https://usegalaxy.eu](https://usegalaxy.eu) and use your credentials to access the customized sub-domain [https://graphclust.usegalaxy.eu](https://graphclust.usegalaxy.eu). Guides and tutorials are available on the server's welcome page.
108 |
109 | ### Docker instance
110 | After running the Galaxy Docker container, a web server is available under the host IP/URL and the designated port (default 8080).
111 | * In your browser, go to IP/URL:PORT (with the same settings as in the previous step):
112 |   * On the same (local) computer: [http://localhost:8080/](http://localhost:8080)
113 |   * From other systems on the network: http://HOSTIP:8080
115 |
116 | ### Video tutorial
117 | You might find this [YouTube tutorial](https://www.youtube.com/watch?v=fJ6tUt_6uas) helpful as a visual, comprehensive introduction to setting up and running GraphClust2.
118 |
119 |
120 | [](https://www.youtube.com/watch?v=fJ6tUt_6uas)
121 |
122 | ### Interactive tours
123 | Interactive tours are available for Galaxy and GraphClust2. To run a tour, go to **Help→Interactive Tours** in the top panel and click one of the tours prefixed *GraphClust*. You can check the other tours for a more general introduction to the Galaxy interface.
124 |
125 | ### Import additional workflows
126 |
127 | To import or upload additional workflow flavors (e.g. from the [extra-workflows directory](./workflows/extra-workflows/)), go to the *Workflow* menu in the top panel and click the "Upload or import workflow" button at the top right of the screen. You can either upload a workflow file from your local system or provide the URL of a workflow. Logging in is required to access the workflow menu. The Docker Galaxy instance has a pre-configured interactive tour that walks you through these steps. Workflow files can be downloaded from the links below.
128 |
129 | ### Workflow flavors
130 | The pre-configured flavors of GraphClust2 are provided and described inside the [workflows directory](./workflows/).
131 |
132 | #### Workflows on the running server
133 | The workflows below can be accessed directly on the public server:
134 | * MotifFinder: [GraphClust-MotifFinder](https://graphclust.usegalaxy.eu/u/graphclust2/w/graphclust2--motiffinder)
135 | * Workflow main: [GraphClust_1r](https://graphclust.usegalaxy.eu/u/graphclust2/w/graphclust2--main-1r)
136 | * Workflow main, pre-configured for two rounds: [GraphClust_2r](https://graphclust.usegalaxy.eu/u/graphclust2/w/graphclust2--main-2r)
137 |
138 | ## Command line support (beta)
139 | The Galaxy service is accessible via the Galaxy Project's `bioblend` API library. In the future we plan to provide full bioblend integration for GraphClust2; currently, beta support for running GraphClust2 via the CLI is available. The wrapper and a setup template are available inside the [CLI-workflow-executor](./CLI-workflow-executor) directory.
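As a rough illustration of what such a CLI wrapper does (the actual script lives in `run_galaxy_workflow.py` inside that directory), a `bioblend` invocation might be sketched as below. The URL, API key, workflow file, and dataset ID are placeholders of ours, not working values, and the live calls are skipped when `bioblend` is not installed:

```python
def build_invocation_inputs(step_to_dataset):
    """Map workflow step labels to Galaxy history dataset IDs in the
    mapping format expected by bioblend's invoke_workflow()."""
    return {step: {"id": dataset_id, "src": "hda"}
            for step, dataset_id in step_to_dataset.items()}

try:
    from bioblend.galaxy import GalaxyInstance
except ImportError:  # bioblend not installed; skip the live calls below
    GalaxyInstance = None

# Placeholder dataset ID: replace with a real history dataset ID.
inputs = build_invocation_inputs({"0": "dataset_id_of_input_fasta"})

if GalaxyInstance is not None:
    # Placeholder URL and API key.
    gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_API_KEY")
    workflow = gi.workflows.import_workflow_from_local_path("GraphClust_main_1r.ga")
    gi.workflows.invoke_workflow(workflow["id"], inputs=inputs,
                                 history_name="GraphClust2 run")
```
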
140 |
141 |
142 | ## [Frequently Asked Questions](FAQ.md)
143 |
144 | Workflow overview
145 | ===============================
146 |
147 | The pipeline for clustering RNA sequences and discovering structured motifs consists of three major phases: (a) sequence-based pre-clustering, (b) encoding of predicted RNA structures as graph features, and (c) iterative fast candidate clustering followed by refinement.
148 |
149 | 
150 |
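The control flow of phase (c) can be made concrete with a toy Python sketch. This is purely illustrative: the real pipeline uses NSPDK min-hash features and LocARNA alignments, whereas here a simple k-mer Jaccard similarity and greedy grouping stand in for both, and all function names are ours:

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_round(seqs, threshold=0.5, k=3):
    """One toy clustering round: greedily group sequences whose
    k-mer Jaccard similarity to a seed sequence meets the threshold."""
    remaining = dict(seqs)
    clusters = []
    while remaining:
        seed_id, seed = next(iter(remaining.items()))
        members = [sid for sid, s in remaining.items()
                   if jaccard(kmers(seed, k), kmers(s, k)) >= threshold]
        clusters.append(members)
        for sid in members:
            del remaining[sid]
    return clusters

def iterative_clustering(seqs, rounds=2, threshold=0.5):
    """Toy analogue of GraphClust's multi-round scheme: cluster,
    blacklist the clustered sequences, repeat on the remainder."""
    blacklist, found = set(), []
    for _ in range(rounds):
        active = {sid: s for sid, s in seqs.items() if sid not in blacklist}
        if not active:
            break
        for cluster in cluster_round(active, threshold):
            if len(cluster) > 1:  # singletons stay unclustered for later rounds
                blacklist.update(cluster)
                found.append(cluster)
    return found
```

The blacklist mirrors the pipeline's behavior of excluding already-clustered sequences from subsequent rounds.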
151 |
152 | Below is a coarse-grained correspondence between GraphClust2 tool names and the pipeline stages:
153 |
154 | | Stage | Galaxy Tool Name | Description|
155 | | :--------------------: | :--------------- | :----------------|
156 | |1 | [Preprocessing](https://graphclust.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_preprocessing/preproc/0.5) | Input preprocessing (fragmentation)|
157 | |2 | [fasta_to_gspan](https://graphclust.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_fasta_to_gspan/gspan/0.4) | Generation of structures via RNAshapes and conversion into graphs|
158 | |3 | [NSPDK_sparseVect](https://graphclust.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_nspdk/nspdk_sparse/9.2.3) | Generation of graph features via NSPDK |
159 | |4| [NSPDK_candidateClusters](https://graphclust.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_nspdk/NSPDK_candidateClust/9.2.3) | min-hash based clustering of all feature vectors, output top dense candidate clusters|
160 | |5| [PGMA_locarna](https://graphclust.usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_prepocessing_for_mlocarna/preMloc/0.4), [locarna](https://graphclust.usegalaxy.eu/tool_runner?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_mlocarna/locarna_best_subtree/0.4), [CMfinder](https://graphclust.usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_cmfinder/cmFinder/0.4) | LocARNA-based clustering of each candidate cluster: all-vs-all pairwise alignments, multiple alignments along a guide tree, best-subtree selection, and alignment refinement.|
161 | |6| [Build covariance models](https://graphclust.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/bgruening/infernal/infernal_cmbuild/1.1.0.2) | Create a candidate covariance model |
162 | |7| [Search covariance models](https://graphclust.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/bgruening/infernal/infernal_cmsearch/1.1.0.2) | Scan full input sequences with Infernal's cmsearch to find missing cluster members |
163 | |8,9| [Report results](https://graphclust.usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam/graphclust_postprocessing/glob_report/0.5) and [conservation evaluations](https://graphclust.usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu/repos/rnateam%2Fgraphclust_aggregate_alignments/graphclust_aggregate_alignments/0.1) | Collect final clusters and create example alignments of top cluster members|
164 |
165 |
166 | ### Input
167 | The input to the workflow is a set of putative RNA sequences in FASTA format. Inside the `data` directory you can find examples of the input format. The labeled datasets are based on Rfam annotations; each sequence is labeled with its associated RNA family.
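For illustration, the expected input is plain multi-FASTA; a minimal stdlib-only reader (a helper of ours, not part of the pipeline) that collects header→sequence pairs could look like:

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {header: sequence}.
    Headers keep everything after '>'; sequences may span lines."""
    records, header, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records
```
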
168 |
169 | ### Output
170 | The output contains the predicted clusters, where similar putative RNA input sequences form a cluster. Additionally, the overall status of the clusters and the membership of each cluster's elements are reported.
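The precise report layout is produced by the postprocessing tools; purely as an illustration, if you export the assignments as (sequence id, cluster id) pairs — a format we assume here, not the tool's actual output — cluster sizes can be tallied with the stdlib:

```python
from collections import Counter

def cluster_sizes(assignments):
    """Count members per cluster from (sequence_id, cluster_id) pairs."""
    return Counter(cluster_id for _seq_id, cluster_id in assignments)
```
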
171 |
172 |
173 |
179 |
180 |
181 | # Support & Bug Reports
182 |
183 | You can file a [github issue](https://github.com/BackofenLab/GraphClust-2/issues) or find our contact information in the [lab page](http://www.bioinf.uni-freiburg.de/team.html?en).
184 |
185 | # References
186 | The manuscript is currently under preparation/revision. If you find this resource useful, please cite the Zenodo DOI of the repository or contact us.
187 |
188 | * Miladi, Milad, Eteri Sokhoyan, Torsten Houwaart, Steffen Heyne, Fabrizio Costa, Bjoern Gruening, and Rolf Backofen. "GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering." GigaScience, Volume 8, Issue 12, December 2019, giz150. doi: [https://doi.org/10.1093/gigascience/giz150](https://doi.org/10.1093/gigascience/giz150)
189 | * Milad Miladi, Björn Grüning, & Eteri Sokhoyan. BackofenLab/GraphClust-2: Zenodo. http://doi.org/10.5281/zenodo.1135094
190 | * GraphClust-1 methodology: S. Heyne, F. Costa, D. Rose, R. Backofen. "GraphClust: alignment-free structural clustering of local RNA secondary structures." Bioinformatics, 2012. Available at http://www.bioinf.uni-freiburg.de/Software/GraphClust/
192 |
193 |
--------------------------------------------------------------------------------
/assets/img/GalaxyDocker.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/GalaxyDocker.png
--------------------------------------------------------------------------------
/assets/img/figure-pipeline_zigzag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/figure-pipeline_zigzag.png
--------------------------------------------------------------------------------
/assets/img/graphclust_pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/graphclust_pipeline.png
--------------------------------------------------------------------------------
/assets/img/kitematic-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/kitematic-1.png
--------------------------------------------------------------------------------
/assets/img/kitematic-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/kitematic-2.png
--------------------------------------------------------------------------------
/assets/img/kitematic-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/kitematic-3.png
--------------------------------------------------------------------------------
/assets/img/kitematic-32.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/kitematic-32.png
--------------------------------------------------------------------------------
/assets/img/kitematic-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/kitematic-4.png
--------------------------------------------------------------------------------
/assets/img/kitematic-5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/kitematic-5.png
--------------------------------------------------------------------------------
/assets/img/video-thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/video-thumbnail.png
--------------------------------------------------------------------------------
/assets/img/workflow_early.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BackofenLab/GraphClust-2/c6edda2b28371d1fa6aa2a9750b890982cb32461/assets/img/workflow_early.png
--------------------------------------------------------------------------------
/assets/library/library_data.yaml:
--------------------------------------------------------------------------------
1 | libraries:
2 | - name: "FASTA"
3 | files:
4 | - url: https://raw.githubusercontent.com/eteriSokhoyan/test-data/master/cliques-high-representatives.fa
5 | file_type: fasta
6 | - url: https://raw.githubusercontent.com/eteriSokhoyan/test-data/master/cliques-low-representatives.fa
7 | file_type: fasta
8 |
--------------------------------------------------------------------------------
/assets/tools/graphclust_tools.yml:
--------------------------------------------------------------------------------
1 | ---
2 | # This is a sample file to be used as a reference for populating a list of
3 | # tools that you wish to install into Galaxy from a Tool Shed via the
4 | # `install_tool_shed_tools.py` script.
5 | #
6 | # For each tool you want to install, you must provide the following keys:
7 | # * name: this is the name of the tool to install
8 | # * owner: owner of the Tool Shed repository from which the tool is being
9 | #          installed
10 | # Further, you need to provide **one** of the following two keys:
11 | # * tool_panel_section_id: ID of the tool panel section where you want the
12 | # tool to be installed. The section ID can be found
13 | # in Galaxy's `shed_tool_conf.xml` config file. Note
14 | # that the specified section must exist in this file.
15 | # Otherwise, the tool will be installed outside any
16 | # section.
17 | # * tool_panel_section_label: Display label of a tool panel section where
18 | # you want the tool to be installed. If it does not
19 | # exist, this section will be created on the target
20 | # Galaxy instance (note that this is different than
21 | # when using the ID).
22 | # Multi-word labels need to be placed in quotes.
23 | # Each label will have a corresponding ID created;
24 | # the ID will be an all lowercase version of the
25 | # label, with multiple words joined with
26 | # underscores (e.g., 'BED tools' -> 'bed_tools').
27 | #
28 | # You can also specify the following optional keys to further define the
29 | # installation properties:
30 | # * tool_shed_url: the URL of the Tool Shed from where the tool should be
31 | # installed. (default: https://toolshed.g2.bx.psu.edu)
32 | # * revisions: a list of revisions of the tool, all of which will attempt to
33 | # be installed. (default: latest)
34 | # * install_tool_dependencies: True or False - whether to install tool
35 | # dependencies or not. (default: True)
36 | # * install_repository_dependencies: True or False - whether to install repo
37 | # dependencies or not. (default: True)
38 |
39 | api_key: admin
40 | galaxy_instance: http://localhost:8080
41 | install_resolver_dependencies: True
42 | install_tool_dependencies: False
43 | tools:
44 | - name: graphclust_cmfinder
45 | owner: rnateam
46 | tool_panel_section_label: "GraphClust"
47 |
48 | - name: graphclust_postprocessing
49 | owner: rnateam
50 | tool_panel_section_label: "GraphClust"
51 |
52 | - name: graphclust_fasta_to_gspan
53 | owner: rnateam
54 | tool_panel_section_label: "GraphClust"
55 |
56 | - name: structure_to_gspan
57 | owner: rnateam
58 | tool_panel_section_label: "GraphClust"
59 |
60 | - name: graphclust_mlocarna
61 | owner: rnateam
62 | tool_panel_section_label: "GraphClust"
63 |
--------------------------------------------------------------------------------
/assets/tools/graphclust_tools2.yml:
--------------------------------------------------------------------------------
1 | ---
2 | # This is a sample file to be used as a reference for populating a list of
3 | # tools that you wish to install into Galaxy from a Tool Shed via the
4 | # `install_tool_shed_tools.py` script.
5 | #
6 | # For each tool you want to install, you must provide the following keys:
7 | # * name: this is the name of the tool to install
8 | # * owner: owner of the Tool Shed repository from which the tool is being
9 | #          installed
10 | # Further, you need to provide **one** of the following two keys:
11 | # * tool_panel_section_id: ID of the tool panel section where you want the
12 | # tool to be installed. The section ID can be found
13 | # in Galaxy's `shed_tool_conf.xml` config file. Note
14 | # that the specified section must exist in this file.
15 | # Otherwise, the tool will be installed outside any
16 | # section.
17 | # * tool_panel_section_label: Display label of a tool panel section where
18 | # you want the tool to be installed. If it does not
19 | # exist, this section will be created on the target
20 | # Galaxy instance (note that this is different than
21 | # when using the ID).
22 | # Multi-word labels need to be placed in quotes.
23 | # Each label will have a corresponding ID created;
24 | # the ID will be an all lowercase version of the
25 | # label, with multiple words joined with
26 | # underscores (e.g., 'BED tools' -> 'bed_tools').
27 | #
28 | # You can also specify the following optional keys to further define the
29 | # installation properties:
30 | # * tool_shed_url: the URL of the Tool Shed from where the tool should be
31 | # installed. (default: https://toolshed.g2.bx.psu.edu)
32 | # * revisions: a list of revisions of the tool, all of which will attempt to
33 | # be installed. (default: latest)
34 | # * install_tool_dependencies: True or False - whether to install tool
35 | # dependencies or not. (default: True)
36 | # * install_repository_dependencies: True or False - whether to install repo
37 | # dependencies or not. (default: True)
38 |
39 | api_key: admin
40 | galaxy_instance: http://localhost:8080
41 | install_resolver_dependencies: True
42 | install_tool_dependencies: False
43 | tools:
44 | - name: graphclust_nspdk
45 | owner: rnateam
46 | tool_panel_section_label: "GraphClust"
47 |
48 | - name: graphclust_prepocessing_for_mlocarna
49 | owner: rnateam
50 | tool_panel_section_label: "GraphClust"
51 |
52 | - name: graphclust_preprocessing
53 | owner: rnateam
54 | tool_panel_section_label: "GraphClust"
55 |
56 | - name: graphclust_motif_finder_plot
57 | owner: rnateam
58 | tool_panel_section_label: "GraphClust"
59 |
60 | - name: graphclust_postprocessing_no_align
61 | owner: rnateam
62 | tool_panel_section_label: "GraphClust"
63 |
64 | - name: graphclust_align_cluster
65 | owner: rnateam
66 | tool_panel_section_label: "GraphClust"
67 |
68 | - name: graphclust_aggregate_alignments
69 | owner: rnateam
70 | tool_panel_section_label: "GraphClust"
71 |
72 |
73 |
74 |
--------------------------------------------------------------------------------
/assets/tools/graphclust_utils.yml:
--------------------------------------------------------------------------------
1 | ---
2 | # This is a sample file to be used as a reference for populating a list of
3 | # tools that you wish to install into Galaxy from a Tool Shed via the
4 | # `install_tool_shed_tools.py` script.
5 | #
6 | # For each tool you want to install, you must provide the following keys:
7 | # * name: this is the name of the tool to install
8 | # * owner: owner of the Tool Shed repository from which the tool is being
9 | #          installed
10 | # Further, you need to provide **one** of the following two keys:
11 | # * tool_panel_section_id: ID of the tool panel section where you want the
12 | # tool to be installed. The section ID can be found
13 | # in Galaxy's `shed_tool_conf.xml` config file. Note
14 | # that the specified section must exist in this file.
15 | # Otherwise, the tool will be installed outside any
16 | # section.
17 | # * tool_panel_section_label: Display label of a tool panel section where
18 | # you want the tool to be installed. If it does not
19 | # exist, this section will be created on the target
20 | # Galaxy instance (note that this is different than
21 | # when using the ID).
22 | # Multi-word labels need to be placed in quotes.
23 | # Each label will have a corresponding ID created;
24 | # the ID will be an all lowercase version of the
25 | # label, with multiple words joined with
26 | # underscores (e.g., 'BED tools' -> 'bed_tools').
27 | #
28 | # You can also specify the following optional keys to further define the
29 | # installation properties:
30 | # * tool_shed_url: the URL of the Tool Shed from where the tool should be
31 | # installed. (default: https://toolshed.g2.bx.psu.edu)
32 | # * revisions: a list of revisions of the tool, all of which will attempt to
33 | # be installed. (default: latest)
34 | # * install_tool_dependencies: True or False - whether to install tool
35 | # dependencies or not. (default: True)
36 | # * install_repository_dependencies: True or False - whether to install repo
37 | # dependencies or not. (default: True)
38 |
39 | api_key: admin
40 | galaxy_instance: http://localhost:8080
41 | install_resolver_dependencies: True
42 | install_tool_dependencies: False
43 | tools:
44 | - name: text_processing
45 | owner: bgruening
46 | tool_panel_section_id: "textutil"
47 |
48 | - name: fasta_compute_length
49 | owner: devteam
50 | tool_panel_section_label: "FASTA manipulation tools"
51 |
52 | - name: fasta_to_tabular
53 | owner: devteam
54 | tool_panel_section_label: "FASTA manipulation tools"
55 |
56 | - name: tabular_to_fasta
57 | owner: devteam
58 | tool_panel_section_label: "FASTA manipulation tools"
59 |
60 | - name: seq_filter_by_id
61 | owner: peterjc
62 | tool_panel_section_label: "FASTA manipulation tools"
63 |
64 | - name: infernal
65 | owner: bgruening
66 | tool_panel_section_label: "GraphClust"
67 |
68 | - name: cdhit
69 | owner: bebatut
70 | tool_panel_section_label: "CD-HIT"
71 |
72 | - name: viennarna_rnafold
73 | owner: rnateam
74 | tool_panel_section_label: "GraphClust"
75 |
76 |
77 |
--------------------------------------------------------------------------------
/assets/tours/graphclust_step_by_step.yaml:
--------------------------------------------------------------------------------
1 | name: GraphClust workflow step by step
2 | description: Step by step instructions for using GraphClust for clustering RNA sequences
3 | title_default: "GraphClust step by step"
4 | steps:
5 | - title: "A tutorial on GraphClust (clustering RNA sequences)"
6 | content: "This tour will walk you through using GraphClust to cluster RNA sequences.
7 | Read and follow the instructions before clicking 'Next'.
8 | Click 'Prev' in case you missed any step."
9 | backdrop: true
10 |
11 | - title: "A tutorial on GraphClust"
12 | content: "Together we will go through the following steps:
13 | 1. Data Acquisition
14 | 2. Pre-processing
15 | 3. Creating a graph from FASTA
16 | 4. Creating a sparse vector
17 | 5. Computing candidate clusters
18 | 6. Preprocessing data for computing the best subtree
19 | 7. Computing the best subtree using LocARNA
20 | 8. Finding consensus motifs
21 | 9. Building a covariance model using cmbuild
22 | 10. Searching homologous sequences using cmsearch
23 | 11. Post-processing
24 | "
28 | backdrop: true
29 |
30 |
31 | - title: "Data Acquisition"
32 | content: "We will start with a simple, small FASTA file.
33 | You will get one FASTA file with RNA sequences that we want to cluster."
34 | backdrop: true
35 |
36 | - title: "Data Acquisition"
37 | element: ".upload-button"
38 | intro: "We will import the FASTA file into the history we just created.
39 | Click 'Next' and the tour will take you to the Upload screen."
40 | position: "right"
41 | postclick:
42 | - ".upload-button"
43 |
44 | - title: "Data Acquisition"
45 | element: "button#btn-new"
46 | intro: "The sample training data available on GitHub is a good place to start.
47 | Simply click 'Next' and the links to the training data will be automatically inserted, ready for upload.
48 | Later on, when you want to upload other data, you can do so by clicking the 'Paste/Fetch Data' button or
49 | 'Choose local file' to upload a locally stored file."
50 | position: "top"
51 | postclick:
52 | - "button#btn-new"
53 |
54 | - title: "Data Acquisition"
55 | element: ".upload-text-content:first"
56 | intro: "Links acquired!"
57 | position: "top"
58 | textinsert:
59 | https://github.com/BackofenLab/docker-galaxy-graphclust/raw/master/data/Rfam-cliques-dataset/cliques-low-representatives.fa
60 |
61 | - title: "Data Acquisition"
62 | element: "button#btn-start"
63 | intro: "Click on 'Start' to upload the data into your Galaxy history."
64 | position: "top"
65 |
66 | - title: "Data Acquisition"
67 | element: "button#btn-close"
68 | intro: "The upload may take a while.
69 | Hit the close button when you see that the files are uploaded into your history."
70 | position: "top"
71 |
72 | - title: "Data Acquisition"
73 | element: "#current-history-panel > div.controls"
74 | intro: "You've acquired your data. Now let's start using the GraphClust tools."
75 | position: "left"
76 |
77 | - title: "GraphClust"
78 | intro: "Once we have the data for analysis, we can start the process of clustering RNA sequences step by step.
79 | Navigate to the tool panel on the left side and click on GraphClust. This will open a section
80 | with the set of tools necessary for the GraphClust process.
81 | The first step of the process is Preprocessing."
82 | position: "right"
83 |
84 |
85 | - title: "Pre-processing"
86 | intro: "This tool takes as input a file of sequences in FASTA format
87 | and creates the final input for GraphClust based on the given parameters. The parameters allow us to
88 | split long sequences into smaller fragments to enable the detection of local signals."
89 | position: "right"
90 | postclick:
91 | - "#preproc > div.toolTitle > div > a"
92 |
93 |
94 | - title: "Pre-processing"
95 | element: "#s2id_uid-18_select > a"
96 | intro: "Here we should define our input file. "
97 | position: "top"
98 |
99 |
100 | - title: "Pre-processing"
101 | element: "#uid-11"
102 | intro: "'Window size' defines the length of the fragments into which the input sequences will be split.
103 | In the default settings it is set to a very high number so that the sequences are not split at all."
104 | position: "top"
105 |
106 |
107 | - title: "Pre-processing"
108 | element: "#uid-64 > div.ui-form-title > span"
109 | intro: "'Window shift in percent' defines the percentage of the shift for fragments of the input sequences.
110 | In the default settings it is 100% because we don't split the input sequences."
116 | position: "top"
117 |
118 |
119 | - title: "Pre-processing"
120 | element: "#execute"
121 | intro: "To run the tool press 'Execute' button. "
122 | position: "top"
123 |
124 |
125 | - title: "Understanding the Output"
126 | element: "#current-history-panel"
127 | intro: "After the tool is executed, several output files are created.
128 | By clicking on the 'eye' icon you can see the content of the files."
130 | position: "left"
131 |
132 | - title: "Preprocessing: DONE"
133 | intro: "Once preprocessing of the data is done, we can move on to the next step: creating a graph from the FASTA file."
134 | position: "right"
135 |
136 | - title: "Creating a graph from FASTA"
137 | intro: "To create a graph we need to use a tool called fasta_to_gspan, which you
138 | can find in the GraphClust section of the tool panel."
139 | position: "right"
140 | postclick:
141 | - "#gspan > div.toolTitle > div > a"
142 |
143 | - title: "Creating a graph from FASTA"
144 | element: "#uid-71 > div.ui-form-field > div.ui-select-content > div.ui-options > div.btn-group.ui-radiobutton"
145 | intro: "As input this tool takes the pre-processed FASTA file: 'data.fasta'."
146 | position: "right"
147 |
148 |
149 | - title: "Creating a graph from FASTA"
150 | intro: "A detailed description of each parameter can be found in the help section at the bottom of the page."
151 | position: "top"
152 |
153 | - title: "Creating a graph from FASTA"
154 | element: "#execute"
155 | intro: "To run the tool press 'Execute' button. "
156 | position: "top"
157 |
158 | - title: "Creating a graph from FASTA"
159 | intro: "Once the tool is executed it will produce a gspan.zip file."
160 | position: "top"
161 |
162 | - title: "Creating a graph from FASTA : DONE"
163 | intro: "Now we have the gspan.zip file and can move on to the next step.
164 | The next step is to create a sparse vector using NSPDK.
165 | From the GraphClust tools choose NSPDK_sparseVect to start the process."
166 | position: "top"
167 | postclick:
168 | - "#nspdk_sparse > div.toolTitle > div > a"
169 |
170 | - title: "Creating a sparse vector"
171 | intro: "This tool will create an explicit sparse feature encoding using NSPDK."
172 | position: "top"
173 |
174 |
175 | - title: "Creating a sparse vector"
176 | intro: "This tool requires 2 input files:
177 | data.fasta : from the pre-processing step
178 | gspan.zip : from the previous step
179 | "
184 | position: "top"
185 |
186 |
187 | - title: "Creating a sparse vector"
188 | intro: "More information about NSPDK can be found in the help section."
189 | position: "top"
190 |
191 | - title: "Creating a sparse vector"
192 | element: "#execute"
193 | intro: "Run the tool by pressing 'Execute' button. "
194 | position: "top"
195 |
196 | - title: "Creating a sparse vector"
197 | intro: "After the tool is executed it will produce data_svector, which we will use in the next step."
198 | position: "top"
199 |
200 | - title: "Creating a sparse vector: DONE"
201 | intro: "The created data_svector file will be used in the next step for computing
202 | candidate clusters. For that, click on the NSPDK_candidateClusters tool."
203 | position: "top"
204 | postclick:
205 | - "#NSPDK_candidateClust > div.toolTitle > div > a"
206 |
207 |
208 | - title: "Computing candidate clusters"
209 | intro: "During this step we will compute the global feature index and get the top dense sets.
210 | The candidate clusters are chosen as the top-ranking neighborhoods, provided that the
211 | size of their overlap is below a specified threshold."
212 | position: "top"
213 |
214 |
215 |
216 | - title: "Computing candidate clusters"
217 | intro: "Here we have to specify 3 input files:
218 | data_svector : from the previous step
219 | data_fasta and data_names : from the pre-processing step
220 | "
227 | position: "top"
228 |
229 | - title: "Computing candidate clusters"
230 | intro: "Another important parameter for this tool is 'Multiple iterations'.
231 | By default this parameter is set to 'no', which means we will do only a single iteration.
232 | By setting it to 'yes' we would have to define additional input files obtained from previous
233 | iterations. In the scope of this tutorial we will just go for a single iteration."
234 | position: "top"
235 |
236 | - title: "Computing candidate clusters"
237 | intro: "For more information about this tool, check the help section."
238 | position: "top"
239 |
240 | - title: "Computing candidate clusters"
241 | element: "#execute"
242 | intro: "Run the tool by pressing the 'Execute' button. "
243 | position: "top"
244 |
245 | - title: "Computing candidate clusters"
246 | intro: "This step will produce 3 output files:
247 | fast_cluster,
248 | fast_cluster_sim and
249 | blacklist.
250 | fast_cluster contains the ids of the candidate clusters.
251 | fast_cluster_sim contains the similarity scores.
252 | blacklist contains the ids of sequences that were already clustered, which is why it is
253 | empty in the case of a single iteration. "
259 | position: "top"
260 |
261 | - title: "Computing candidate clusters : DONE"
262 | intro: "Once the output files are ready we can move to the next step by clicking on the
263 | premlocarna tool. "
264 | position: "top"
265 | postclick:
266 | - "#preMloc > div.toolTitle > div > a"
267 |
268 | - title: "Preprocessing data for computing best subtree"
269 | intro: "This tool will do some pre-processing for computing best subtrees. "
270 | position: "top"
271 |
272 |
273 | - title: "Preprocessing data for computing best subtree"
274 | intro: "This step needs 4 input files:
275 | fast_cluster : from the previous step,
276 | fast_cluster_sim : from the previous step,
277 | data_fasta and
278 | data_names : from the pre-processing step. "
286 | position: "top"
287 |
288 | - title: "Preprocessing data for computing best subtree"
289 | element: "#execute"
290 | intro: "Run the tool by pressing the 'Execute' button. "
291 | position: "top"
292 |
293 | - title: "Preprocessing data for computing best subtree"
294 | intro: "Execution of this tool results in 5 datasets:
295 | centers,
296 | trees,
297 | tree_matrix,
298 | cmfinder_fa and
299 | model_tree_fa.
300 | These datasets will be used in the next steps. "
307 | position: "top"
308 |
309 | - title: "Preprocessing data for computing best subtree : DONE"
310 | intro: "Preprocessing for the best tree computation is done, so we can now do the actual computation
311 | of the best subtree. For that we need the tool called locarna_best_subtree. "
312 | position: "top"
313 | postclick:
314 | - "#locarna_best_subtree > div.toolTitle > div > a"
315 |
316 | - title: "Computing best subtree using LocARNA"
317 | intro: "This step computes a multiple sequence-structure alignment of RNA sequences using LocARNA.
318 | It uses the tree file (from the previous step) containing a guide tree in NEWICK format.
319 | The given tree is used as the guide tree for the progressive alignment, which saves the calculation
320 | of pairwise all-vs-all similarities and the construction of the guide tree. At the end it returns the best subtree. "
321 | position: "top"
322 |
323 |
324 | - title: "Computing best subtree using LocARNA"
325 | intro: "This step takes the following files as input:
326 | centers,
327 | trees and
328 | tree_matrix : from the previous steps, and
329 | data_map : from the pre-processing step. "
336 | position: "top"
337 |
338 | - title: "Computing best subtree using LocARNA"
339 | element: "#execute"
340 | intro: "Run the tool by pressing the 'Execute' button. "
341 | position: "top"
342 |
343 | - title: "Computing best subtree using LocARNA"
344 | intro: "The output of this tool is model.tree.stk, which will
345 | be used in the next step to find consensus motifs. "
346 | position: "top"
347 |
348 | - title: "Computing best subtree using LocARNA : DONE"
349 | intro: "Now that we have model.tree.stk we can find consensus motifs using the next tool -
350 | CMFinder_v0. "
351 | position: "top"
352 | postclick:
353 | - "#cmFinder > div.toolTitle > div > a"
354 |
355 | - title: "Finding consensus motifs"
356 | intro: "During this step a conversion from CLUSTAL format to STOCKHOLM format is done.
357 | Then, using CMFinder, we determine consensus motifs for the sequences. "
358 | position: "top"
359 |
360 |
361 | - title: "Finding consensus motifs"
362 | intro: "This tool takes the following files as input:
363 | model_tree_stk : from the previous step,
364 | cmfinder_fa : from the 'Preprocessing data for computing best subtree' step, and
365 | tree_matrix. "
371 | position: "top"
372 |
373 | - title: "Finding consensus motifs"
374 | element: "#execute"
375 | intro: "Run the tool by pressing the 'Execute' button. "
376 | position: "top"
377 |
378 | - title: "Finding consensus motifs"
379 | intro: "The output of the tool is in STOCKHOLM format and contains the consensus structure. "
380 | position: "top"
381 |
382 | - title: "Finding consensus motifs : DONE"
383 | intro: "Once we have the consensus structure we can build a covariance model with the cmbuild tool.
384 | For that, click on the tool named 'Build covariance models'. "
385 | position: "top"
386 | postclick:
387 | - "#infernal > div:nth-child(3) > div > a "
388 |
389 | - title: "Building Covariance Model using cmbuild"
390 | intro: "In this step cmbuild constructs a covariance model of an RNA multiple alignment.
391 | cmbuild uses the consensus structure to determine the architecture of the covariance model. "
392 | position: "top"
393 |
394 |
395 | - title: "Building Covariance Model using cmbuild"
396 | intro: "As input for this tool we give the 'model_cmfinder_stk' file containing the consensus
397 | structure from the previous step.
398 | For more information about this tool read the help section of the page. "
399 | position: "top"
400 |
401 | - title: "Building Covariance Model using cmbuild"
402 | element: "#execute"
403 | intro: "Run the tool by pressing the 'Execute' button. "
404 | position: "top"
405 |
406 | - title: "Building Covariance Model using cmbuild : DONE"
407 | intro: "After the covariance model is built we can move on to searching with it.
408 | Simply click on 'Search covariance model(s)'. "
409 | position: "top"
410 | postclick:
411 | - "#infernal > div:nth-child(4) > div > a"
412 |
413 | - title: "Searching homologous sequences using cmsearch"
414 | intro: "cmsearch allows you to make consensus RNA secondary structure profiles,
415 | and use them to search nucleic acid sequence databases for homologous RNAs. "
416 | position: "top"
417 |
418 | - title: "Searching homologous sequences using cmsearch"
419 | intro: "As the sequence database we choose the data_fasta_scan file generated during
420 | pre-processing.
421 | Then, to use the covariance model generated in the previous step, select the
422 | 'Covariance model from your history' option from the 'Subject covariance models' dropdown menu."
423 | position: "top"
424 |
425 | - title: "Searching homologous sequences using cmsearch"
426 | element: "#execute"
427 | intro: "Run the tool by pressing the 'Execute' button. "
428 | position: "top"
429 |
430 | - title: "Searching homologous sequences using cmsearch : DONE"
431 | intro: "Finally we have reached the last step of our workflow!
432 | Click on Report_Results to do the final step. "
433 | position: "top"
434 | postclick:
435 | - "#glob_report > div.toolTitle > div > a"
436 |
437 | - title: "Post-processing"
438 | intro: "The final step of our workflow is post-processing.
439 | In this step we will report the clusters and merge them if needed."
440 | position: "top"
441 |
442 |
443 | - title: "Post-processing"
444 | intro: "This tool takes the following files as input:
445 | FASTA.zip : from the pre-processing step,
446 | cmsearch_results : from the previous step, and
447 | model_tree_files : from the 'Preprocessing data for computing best subtree' step. "
454 | position: "top"
455 |
456 | - title: "Post-processing"
457 | intro: "The final output:
458 | 'cluster.final.stat' file contains general information about clusters,
459 | e.g. number of clusters, number of sequences in each cluster etc.
460 | By clicking on the 'eye' icon you can see the content of the file.
461 |
462 | 'CLUSTERS' dataset collection contains one file for each cluster.
463 | Each file contains information about sequences in that cluster. Each line in the file contains: cluster number,
464 | cm_score, sequence origin (whether it comes from model or from Infernal search) and sequence id.
465 | "
466 | position: "top"
467 |
468 | - title: "A tutorial on GraphClust workflow"
469 | intro: "Thank You for going through our tutorial."
470 | backdrop: true
471 |
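The per-cluster files described in the post-processing step above can also be inspected outside Galaxy. Below is a minimal Python sketch, assuming whitespace-separated columns in the order the tour lists them (cluster number, cm_score, origin, sequence id); the exact column layout may differ between GraphClust versions, so treat the field indices as an assumption to adjust.

```python
# Sketch: summarize one GraphClust per-cluster result file.
# ASSUMPTION: whitespace-separated columns in the order described by the tour:
# cluster number, cm_score, origin (MODEL/CMSEARCH), sequence id.
# Adjust the indices if your GraphClust version orders the columns differently.

def summarize_cluster(lines):
    """Return (cluster_id, sequence count, best cm_score, best sequence id)."""
    best_score, best_seq, cluster_id, n = float("-inf"), None, None, 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        cluster_id = fields[0]
        score = float(fields[1])
        n += 1
        if score > best_score:  # keep the sequence that matches the CM best
            best_score, best_seq = score, fields[3]
    return cluster_id, n, best_score, best_seq

# Example with made-up lines in the assumed format:
example = [
    "1 45.2 MODEL seqA",
    "1 38.7 CMSEARCH seqB",
    "1 51.0 CMSEARCH seqC",
]
print(summarize_cluster(example))  # ('1', 3, 51.0, 'seqC')
```

For real output, read the lines with `open(path).readlines()` and verify the column order against one of your own CLUSTERS files first.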
--------------------------------------------------------------------------------
/assets/tours/graphclust_tutorial.yaml:
--------------------------------------------------------------------------------
1 | name: GraphClust workflow
2 | description: Simple instructions for using GraphClust workflow for clustering RNA sequences
3 | title_default: "GraphClust"
4 | steps:
5 | - title: "A tutorial on Galaxy-GraphClust(Clustering RNA sequences)"
6 | content: "This tour will walk you through the process of GraphClust to cluster RNA sequences.
7 | In the forthcoming windows please read and follow the instructions before clicking 'Next'.
8 | Click 'Prev' in case you missed out on any step."
9 | backdrop: true
10 |
11 | - title: "A tutorial on GraphClust"
12 | content: "Together we will go through the following steps:
13 | Data Acquisition,
14 | Running the Workflow and
15 | Understanding the Output. "
20 | backdrop: true
21 |
22 | - title: "Log in"
23 | element: '#user > li > a'
24 | intro: " To be able to use workflows you should be logged in. If you already have an account,
25 | simply log in; otherwise register by clicking on 'User'.
26 | Within a Docker Galaxy-GraphClust everyone can register by default.
27 |
28 | For convenient access to the workflows you can log in with the pre-configured username and password:
29 |
30 | username : admin@galaxy.org
31 | password : admin
32 | "
33 | position: "left"
34 |
35 |
36 | - title: "GraphClust"
37 | intro: "Now that you are logged-in we can continue our tour"
38 | position: "left"
39 | backdrop: true
40 |
41 |
42 | - title: "Create a new history"
43 | element: '#history-options-button'
44 | intro: "Let's start by creating a new history:
45 | (History options :: Create New)"
46 | position: "left"
47 | preclick:
48 | - '#center-panel'
49 |
50 | - title: "Rename the history"
51 | element: "#current-history-panel > div.controls"
52 | intro: "Change the name of the new history to 'GraphClust'."
53 | position: "left"
54 |
55 | - title: "Data Acquisition"
56 | content: "We start with uploading a simple small set of sequences in FASTA format.
57 | You will get one FASTA file with RNA sequences that we want to cluster.
"
58 | backdrop: true
59 |
60 | - title: "Data Acquisition"
61 | element: ".upload-button"
62 | intro: "We will import the FASTA file into the history we just created.
63 | Click 'Next' and the tour will take you to the Upload screen."
64 | position: "right"
65 | postclick:
66 | - ".upload-button"
67 |
68 | - title: "Data Acquisition"
69 | element: "button#btn-new"
70 | intro: "The sample input data available on GitHub is a good place to start.
71 | Simply click 'Next' and the links to the input data will be automatically inserted and ready for upload.
72 | Later on, when you want to upload other data, you can do so by clicking the 'Paste/Fetch Data' button or
73 | 'Choose local file' to upload locally stored file."
74 | position: "top"
75 | postclick:
76 | - "button#btn-new"
77 |
78 | - title: "Data Acquisition"
79 | element: ".upload-text-content:first"
80 | intro: "Link acquired!
81 | This file contains annotated RNAs of human origin from the Rfam database, drawn from a mixture of RNA families."
82 | position: "top"
83 | textinsert:
84 | https://github.com/BackofenLab/docker-galaxy-graphclust/raw/master/data/Rfam-cliques-dataset/cliques-low-representatives.fa
85 |
86 | - title: "Data Acquisition"
87 | element: "button#btn-start"
88 | intro: "Click on 'Start' to upload the data into your Galaxy history."
89 | position: "top"
90 |
91 | - title: "Data Acquisition"
92 | element: "button#btn-close"
93 | intro: "The upload may take a while.
94 | Hit the close button when you see that the files are uploaded into your history."
95 | position: "top"
96 |
97 | - title: "Data Acquisition"
98 | element: "#current-history-panel > div.controls"
99 | intro: "You've now acquired the input data. Now let's launch a flavor of Galaxy-GraphClust Workflow.
"
100 | position: "left"
101 |
102 | - title: "Running a Workflow"
103 | element: 'a[href$="/workflow/list_for_run"]'
104 | intro: "Click on 'All Workflows' to access your saved and pre-configured Workflows.
105 | Alternatively you can click on 'Workflow' tab from the top panel."
106 | position: "right"
107 |
108 |
109 | - title: "Running a Workflow"
110 | element: 'a[href$="/workflow/run?id=1cd8e2f6b131e891"]'
111 | intro: "Inside your workflows list you should see variations of Galaxy-GraphClust.
112 | The round number specifies the number of iterative clusterings.
113 | Click on GraphClust_1_round and then 'Run'; it is the fastest variant and best fits
114 | the purpose of this tutorial. Then click 'Next'."
115 | position: "top"
116 |
117 |
118 | - title: "Running a Workflow"
119 | element: "#field-uid-1 > div.btn-group.ui-radiobutton"
120 | intro: "We skip the 'History Options' section because we have already created a new history, so there is no need to create a new one.
"
121 | position: "top"
122 |
123 | - title: "Running a Workflow"
124 | element: "#uid-23 > div.portlet-header > div.portlet-title > span > b"
125 | intro: "Step 1 is the first step of our workflow. Here an input dataset must be assigned.
126 | The input data is the set of putative RNA sequences that we want to cluster.
127 | Please ensure the FASTA file uploaded in the first step is selected."
128 | position: "right"
129 |
130 |
131 | - title: "Running a Workflow"
132 | element: 'button#uid-11'
133 | intro: "To run the workflow with the default settings simply click the blue 'Run workflow' button
134 | at the top right.
135 | For details about the pipeline settings you can check the 'step-by-step' tutorial and the Galaxy-GraphClust documentation. "
136 | position: "left"
137 |
138 | - title: "Understanding the Output"
139 | intro: "Running the workflow takes a few minutes. The workflow is finished
140 | when all the steps inside the History panel change from gray/yellow to green.
141 | After all the steps are done, the clustering output is ready.
142 | The results can be checked by navigating through the History panel."
143 | position: "top"
144 |
145 | - title: "Understanding the Output"
146 | element: "#current-history-panel"
147 | intro: "The 'cluster.final.stat' file contains overall information about the predicted clusters.
148 | By clicking on the 'eye' icon you can see the content of the file.
149 | The first columns specify the number of clusters, the cluster ids and the number of sequences in each cluster."
150 | position: "left"
151 |
152 | - title: "Understanding the Output"
153 | element: "#current-history-panel"
154 | intro: "Click the 'CLUSTERS' dataset collection to see the clustered sequences.
155 | There is one file for each cluster.
156 | Each file contains information about the sequences in that cluster. Each line in the file contains:
157 | CLUSTER: the cluster number;
158 | cm_score: the covariance model bit score, indicating how well the sequence matches the CM
159 | model of the cluster;
160 | the sequence origin (whether it originates from the dense center MODEL,
161 | from the Infernal CMSEARCH or from the CDHIT preclustering);
162 | the input fasta sequence id, separated into ORIGID and ORIGHEAD sections. "
167 |
168 | position: "left"
169 |
170 |
171 |
172 | - title: "A tutorial on GraphClust workflow"
173 | intro: "Thank You for going through our tutorial."
174 | backdrop: true
175 |
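The tour above uploads a FASTA file of RNA sequences as the workflow input. Before uploading, a quick local sanity check of the file can help catch truncated downloads; here is a small standard-library-only sketch (the example records are made up, not taken from the Rfam cliques dataset):

```python
# Sketch: count records and total residues in FASTA-formatted text before
# uploading it to Galaxy, e.g. for a file such as
# cliques-low-representatives.fa from data/Rfam-cliques-dataset.

def fasta_stats(text):
    """Return (number of records, total residue count) for FASTA text."""
    n_records, n_residues = 0, 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            n_records += 1                # header line starts a new record
        else:
            n_residues += len(line)       # sequence line: count residues
    return n_records, n_residues

# Example with two made-up RNA records:
example = """>RF00005_tRNA-sample
GCAUUGGUGGUUCAGUGGU
AGAAUUCUCGCCU
>RF00001_5S-sample
GCCUACGGCCAUACCACCCUGAA
"""
print(fasta_stats(example))  # (2, 55)
```

To check a local file, call `fasta_stats(open("cliques-low-representatives.fa").read())` and confirm the record count matches your expectation before starting the upload.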
--------------------------------------------------------------------------------
/assets/tours/graphclust_very_short.yaml:
--------------------------------------------------------------------------------
1 | name: GraphClust workflow fast tutorial
2 | description: Simple and short instructions for using GraphClust workflow for clustering RNA sequences
3 | title_default: "GraphClust_short_tour"
4 | steps:
5 | - title: "A tutorial on GraphClust(Clustering RNA sequences)"
6 | content: "This tour will walk you through the process of GraphClust to cluster RNA sequences.
7 | Read and Follow the instructions before clicking 'Next'.
8 | Click 'Prev' in case you missed out on any step."
9 | backdrop: true
10 |
11 | - title: "A tutorial on GraphClust"
12 | content: "Together we will go through the following steps:
13 | Data Acquisition,
14 | Calling the Workflow and
15 | Understanding the Output. "
20 | backdrop: true
21 |
22 | - title: "Log in"
23 | element: '#user > li > a'
24 | intro: " To be able to use workflows you should be logged in. So if you already have an account
25 | simply log in or otherwise register by clicking on 'User'."
26 | position: "left"
27 |
28 |
29 | - title: "GraphClust"
30 | intro: "Now that you are logged in we can continue our tour"
31 | position: "left"
32 | backdrop: true
33 |
34 |
35 |
36 | - title: "Data Acquisition"
37 | content: "We will start with a simple small FASTA file.
38 | You will get one FASTA file with RNA sequences that we want to cluster.
"
39 | backdrop: true
40 |
41 | - title: "Data Acquisition"
42 | element: ".upload-button"
43 | intro: "We will import the FASTA file into the history we just created.
44 | Click 'Next' and the tour will take you to the Upload screen."
45 | position: "right"
46 | postclick:
47 | - ".upload-button"
48 |
49 | - title: "Data Acquisition"
50 | element: "button#btn-new"
51 | intro: "The sample training data available on GitHub is a good place to start.
52 | Simply click 'Next' and the links to the training data will be automatically inserted and ready for upload.
53 | Later on, when you want to upload other data, you can do so by clicking the 'Paste/Fetch Data' button or
54 | 'Choose local file' to upload a locally stored file."
55 | position: "top"
56 | postclick:
57 | - "button#btn-new"
58 |
59 | - title: "Data Acquisition"
60 | element: ".upload-text-content:first"
61 | intro: "Link acquired!"
62 | position: "top"
63 | textinsert:
64 | https://github.com/BackofenLab/docker-galaxy-graphclust/raw/master/data/Rfam-cliques-dataset/cliques-low-representatives.fa
65 |
66 | - title: "Data Acquisition"
67 | element: "button#btn-start"
68 | intro: "Click on 'Start' to upload the data into your Galaxy history."
69 | position: "top"
70 |
71 | - title: "Data Acquisition"
72 | element: "button#btn-close"
73 | intro: "The upload may take a while.
74 | Hit the close button when you see that the files are uploaded into your history."
75 | position: "top"
76 |
77 | - title: "Data Acquisition"
78 | element: "#current-history-panel > div.controls"
79 | intro: "You've acquired your data. Now let's call the GraphClust Workflow.
"
80 | position: "left"
81 |
82 | - title: "Running a Workflow"
83 | element: 'a[href$="/workflow/list_for_run"]'
84 | intro: "Click on 'All Workflows' to access your saved workflows. "
85 | position: "right"
86 |
87 |
88 | - title: "Running a Workflow"
89 | element: 'a[href$="/workflow/run?id=1cd8e2f6b131e891"]'
90 | intro: "Select the simple one-round workflow GraphClust_1_round.
"
91 | position: "top"
92 |
93 |
94 | - title: "Running a Workflow"
95 | element: "#field-uid-1 > div.btn-group.ui-radiobutton"
96 | intro: "If you want the output to be in a new history click 'yes' in 'History Options' otherwise just move on.
"
97 | position: "top"
98 |
99 | - title: "Running a Workflow"
100 | element: "#uid-23 > div.portlet-header > div.portlet-title > span > b"
101 | intro: "Step 1 is the first step of our workflow. Here we should define our input dataset,
102 | which will be the uploaded FASTA file.
"
103 | position: "right"
104 |
105 |
106 | - title: "Running a Workflow"
107 | element: 'button#uid-11'
108 | intro: "To run the workflow with the default settings simply click the 'Run workflow' button
109 | at the top.
"
110 | position: "left"
111 |
112 | - title: "Understanding the Output"
113 | intro: "Running the workflow might take a while.
114 | After all the steps are done we will see the outputs in the History panel.
"
115 | position: "top"
116 |
117 | - title: "Understanding the Output"
118 | element: "#current-history-panel"
119 | intro: "The 'cluster.final.stat' file contains general information about the clusters,
120 | e.g. number of clusters, number of sequences in each cluster etc.
121 | By clicking on the 'eye' icon you can see the content of the file.
122 |
"
123 | position: "left"
124 |
125 | - title: "Understanding the Output"
126 | element: "#current-history-panel"
127 | intro: "'CLUSTERS' dataset collection contains one file for each cluster.
128 | Each file contains information about sequences in that cluster. Each line in the file contains:
129 | cluster number,
130 | cm_score and
131 | sequence origin (whether it comes from model or from Infernal search). "
132 | position: "left"

--------------------------------------------------------------------------------
/assets/welcome.html:
--------------------------------------------------------------------------------
Hello, your Galaxy GraphClust-2 Docker container is running!
13 | GraphClust-2 is a web-based workflow for structural clustering of RNA secondary structures, developed on top of the GraphClust methodology using the Galaxy framework. This is a running instance of the GraphClust-2 virtualized container.
14 |
How to start:
15 | A quick and easy way to get familiar with Galaxy and GraphClust-2 is to visit the interactive tours provided under Help → Interactive Tours, or simply click on the Guided Tour button below to start a tour.
16 |
17 | Guided Tour Tutorial»
18 |
21 |
Documentation:
22 | For more information about the GraphClust-2 pipeline please check the GraphClust-2 website:
23 | GraphClust-2 repository
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
References:
36 |
37 |
38 | M. Miladi, E. Sokhoyan, T. Houwaart, S. Heyne, F. Costa, R. Backofen and B. Gruening; Empowering the annotation and discovery of structured RNAs with scalable and accessible integrative clustering (under preparation/revision)
39 |
40 |
41 |
42 |
About Galaxy project:
43 |
44 |
45 |
46 |
47 | Galaxy is an open platform for supporting data-intensive
48 | research. Galaxy is developed by The Galaxy Team
49 | with the support of many contributors.
50 | The Galaxy Docker project is supported by the University of Freiburg, part of de.NBI.
51 |