├── .gitignore
├── avro2uml
│   ├── schema_urls
│   ├── type_header_comments
│   ├── make_uml.sh
│   ├── README.md
│   ├── url_converter.py
│   ├── avpr2uml.py
│   └── example_svgs
│       └── g2p_2016-02-26.svg
├── README.md
├── protobuf2uml
│   ├── schema_urls
│   ├── make_uml.sh
│   ├── type_header_comments
│   ├── README.md
│   ├── url_converter.py
│   ├── example_svgs
│   │   ├── bmeg_2016-04-14.svg
│   │   ├── bmeg_2016-04-05.svg
│   │   └── bmeg_2016-06-08.svg
│   ├── descriptor2uml.py
│   └── descriptor.proto
└── LICENSE.txt

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | protobuf2uml/__pycache__/*
2 | protobuf2uml/misc/*
3 | protobuf2uml/schemas_proto/*
4 | protobuf2uml/descriptor_pb2.py
5 | protobuf2uml/uml.svg
6 | protobuf2uml/uml.dot

--------------------------------------------------------------------------------
/avro2uml/schema_urls:
--------------------------------------------------------------------------------
1 | https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/common.avdl
2 | https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/metadata.avdl
3 | https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/reads.avdl
4 | https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/references.avdl
5 | https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/variants.avdl

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ### schema-uml motivation
2 | 
3 | Avro and Protocol Buffers are schema-description languages. Schema files (e.g. [here](https://en.wikipedia.org/wiki/Protocol_Buffers#Example)) can be difficult to understand when there are many data structures spread across multiple files, so it helps to visualize the relationships between the data structures in a [UML diagram](https://en.wikipedia.org/wiki/Unified_Modeling_Language). Example: [here](https://cdn.rawgit.com/malisas/schema-uml/master/avro2uml/example_svgs/everything_edited_2016-03-13.svg)
4 | 
5 | ### Contents of this directory
6 | 
7 | Adam Novak originally wrote code to visualize Avro schema files: [1](https://github.com/ga4gh/schemas/pull/297) and [2](https://github.com/adamnovak/schemas/tree/autouml2/scripts).
8 | 
9 | **avro2uml** builds on his original code, adding features such as data clusters, clickable clusters, and some automation.
10 | 
11 | **protobuf2uml** is the same idea, but for Protocol Buffers-described schemas instead of Avro.
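For orientation, this is roughly the kind of DOT the scripts emit (a hand-written sketch; the type and field names are illustrative, and the exact attributes come from avpr2uml.py / descriptor2uml.py):

```dot
digraph UML {
    node [shape=plaintext]

    // Each record becomes an HTML-like table node; every field row gets a port.
    Variant [label=<
        <table border="0" cellborder="1" cellspacing="0">
            <tr><td port="name" colspan="2"><b>Variant</b></td></tr>
            <tr><td port="id">- id</td><td port="variantSetId">- variantSetId</td></tr>
        </table>>]

    // An "...Id" field that matches another record's name gets a reference edge.
    Variant:variantSetId:w -> VariantSet
}
```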
12 | 

--------------------------------------------------------------------------------
/protobuf2uml/schema_urls:
--------------------------------------------------------------------------------
1 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/common.proto
2 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/metadata.proto
3 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/reads.proto
4 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/references.proto
5 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/variants.proto
6 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/variant_service.proto
7 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/read_service.proto
8 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/allele_annotations.proto
9 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/allele_annotation_service.proto
10 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/bio_metadata.proto
11 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/bio_metadata_service.proto
12 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/sequence_annotations.proto
13 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/sequence_annotation_service.proto
14 | https://github.com/ga4gh/schemas/blob/master/src/main/proto/ga4gh/assay_metadata.proto

--------------------------------------------------------------------------------
/protobuf2uml/make_uml.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | # Author: Malisa Smith
4 | 
5 | # Download all the proto schema files into the schemas_proto folder.
6 | # First clean up old files from previous runs:
7 | rm -rf schemas_proto/*
8 | # Obtain the raw github URLs if not raw already:
9 | raw_schema_urls=$(python url_converter.py --getrawfromfile schema_urls)
10 | for raw_url in ${raw_schema_urls};
11 | do
12 |     wget --timestamping --directory-prefix ./schemas_proto ${raw_url};
13 | done
14 | 
15 | # Strip the package path from user-defined imports so the proto files can find each other.
16 | # For example, import "ga4gh/common.proto"; becomes import "common.proto";
17 | for proto_file in schemas_proto/*; do
18 |     sed -e 's:ga4gh/::g' $proto_file > $proto_file.tmp && mv $proto_file.tmp $proto_file
19 |     # sed -i '' s/ga4gh\///g $proto_file
20 | done
21 | 
22 | # Remove any temporary files in the schemas_proto directory which have been created as a result of editing, etc.:
23 | #rm -rf schemas_proto/*~
24 | 
25 | # Generate descriptor_pb2.py with protoc:
26 | protoc descriptor.proto --python_out=.
27 | 
28 | # Convert the .proto files into a serialized FileDescriptorSet for input into descriptor2uml.py
29 | cd schemas_proto
30 | protoc --include_source_info -o MyFileDescriptorSet.pb *
31 | cd ../
32 | 
33 | # Make the dot file which describes the UML diagram.
# The type_header_comments file can be empty (or you can remove the --type_comments option altogether).
34 | python descriptor2uml.py --descriptor ./schemas_proto/MyFileDescriptorSet.pb --dot uml.dot --urls schema_urls #--type_comments type_header_comments
35 | 
36 | # Finally, draw the UML diagram
37 | dot uml.dot -T svg -o uml.svg
38 | 

--------------------------------------------------------------------------------
/avro2uml/type_header_comments:
--------------------------------------------------------------------------------
1 | ExpressionUnits e.g. FPKM or TPM
2 | RnaQuantification an analysis of reads, the result of running the described programs on the specified reads and assignment to the listed annotation
3 | Characterization Mostly mapping/alignment data, along with reference to an analysis
4 | FeatureGroup Used to identify a group of features(?) for use by ExpressionLevel
5 | ExpressionLevel Actual numerical quantification for each feature??
6 | Program Tracks how read data was generated
7 | ReadStats Summary statistics of read data
8 | ReadGroup Set of reads derived from one physical sequencing process
9 | ReadGroupSet logical collection of ReadGroups: e.g. all reads from one experimental sample
10 | LinearAlignment alignment of a read to a Reference, using a position and CIGAR array.
11 | ReadAlignment alignment with additional information about the fragment and the read, equivalent to line in SAM file
12 | ReferenceSet a reference assembly, e.g. GRCh38
13 | Reference an immutable contig
14 | Variant change in DNA sequence relative to some reference, e.g. SNP or insertion.
15 | Call determination (i.e. a probability) of genotype with respect to a particular variant
16 | VariantSet collection of variants and variant calls intended to be analyzed together
17 | CallSet collection of calls generated by analysis of same sample
18 | VariantSetMetaData Optional metadata associated with a variant set
19 | VariantAnnotation Result of comparing variant to a set of reference data
20 | TranscriptEffect Describes effect of allele on transcript. One record for each alternate allele.
21 | AnalysisResult Output of prediction package, e.g. SIFT
22 | AlleleLocation Location of allele relative to non-genomic coordinate system, e.g. CDS
23 | Impact Effect of allele on protein function, e.g. HIGH or MODERATE
24 | VariantAnnotationSet derived from VariantSet, contains VariantAnnotation records, describes software and reference data used in annotation
25 | HGVSAnnotation Human Genome Variation Society descriptions of the sequence change with respect to genomic, transcript and protein sequences. http://www.hgvs.org/mutnomen/recs.html
26 | Attributes Attributes/values associated with various protocol records
27 | Feature Node in annotation graph that annotates a contiguous region of a sequence

--------------------------------------------------------------------------------
/protobuf2uml/type_header_comments:
--------------------------------------------------------------------------------
1 | ExpressionUnits e.g. FPKM or TPM
2 | RnaQuantification an analysis of reads, the result of running the described programs on the specified reads and assignment to the listed annotation
3 | Characterization Mostly mapping/alignment data, along with reference to an analysis
4 | FeatureGroup Used to identify a group of features(?) for use by ExpressionLevel
5 | ExpressionLevel Actual numerical quantification for each feature??
6 | Program Tracks how read data was generated
7 | ReadStats Summary statistics of read data
8 | ReadGroup Set of reads derived from one physical sequencing process
9 | ReadGroupSet logical collection of ReadGroups: e.g. all reads from one experimental sample
10 | LinearAlignment alignment of a read to a Reference, using a position and CIGAR array.
11 | ReadAlignment alignment with additional information about the fragment and the read, equivalent to line in SAM file
12 | ReferenceSet a reference assembly, e.g. GRCh38
13 | Reference an immutable contig
14 | Variant change in DNA sequence relative to some reference, e.g. SNP or insertion.
15 | Call determination (i.e. a probability) of genotype with respect to a particular variant
16 | VariantSet collection of variants and variant calls intended to be analyzed together
17 | CallSet collection of calls generated by analysis of same sample
18 | VariantSetMetaData Optional metadata associated with a variant set
19 | VariantAnnotation Result of comparing variant to a set of reference data
20 | TranscriptEffect Describes effect of allele on transcript. One record for each alternate allele.
21 | AnalysisResult Output of prediction package, e.g. SIFT
22 | AlleleLocation Location of allele relative to non-genomic coordinate system, e.g. CDS
23 | Impact Effect of allele on protein function, e.g. HIGH or MODERATE
24 | VariantAnnotationSet derived from VariantSet, contains VariantAnnotation records, describes software and reference data used in annotation
25 | HGVSAnnotation Human Genome Variation Society descriptions of the sequence change with respect to genomic, transcript and protein sequences. http://www.hgvs.org/mutnomen/recs.html
26 | Attributes Attributes/values associated with various protocol records
27 | Feature Node in annotation graph that annotates a contiguous region of a sequence

--------------------------------------------------------------------------------
/avro2uml/make_uml.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | # Authors: Adam Novak and Malisa Smith
4 | 
5 | # Download all the avdl schema files into the schemas_avdl folder.
6 | # First clean up old files from previous runs:
7 | rm -rf schemas_avdl/*
8 | rm -rf schemas_avpr/*
9 | # Obtain the raw github URLs if not raw already:
10 | raw_schema_urls=$(python url_converter.py --getrawfromfile schema_urls)
11 | # Note: This wget command will overwrite old versions of files upon re-download, but it will not delete old unwanted files.
12 | for raw_url in ${raw_schema_urls};
13 | do
14 |     wget --timestamping --directory-prefix ./schemas_avdl ${raw_url};
15 | done
16 | 
17 | ######################################
18 | 
19 | if [ ! -f avro-tools.jar ]
20 | then
21 | 
22 |     # Download the Avro tools
23 |     curl -o avro-tools.jar http://www.us.apache.org/dist/avro/avro-1.7.7/java/avro-tools-1.7.7.jar
24 | fi
25 | 
26 | # Make a directory for all the .avpr files
27 | mkdir -p schemas_avpr
28 | 
29 | for AVDL_FILE in ./schemas_avdl/*.avdl
30 | do
31 |     # Make each AVDL file into a JSON AVPR file.
32 | 
33 |     # Get the name of the AVDL file without its extension or path
34 |     SCHEMA_NAME=$(basename "$AVDL_FILE" .avdl)
35 | 
36 |     # Decide what AVPR file it will become.
37 |     AVPR_FILE="./schemas_avpr/${SCHEMA_NAME}.avpr"
38 | 
39 |     # Compile the AVDL to the AVPR
40 |     java -jar avro-tools.jar idl "${AVDL_FILE}" "${AVPR_FILE}"
41 | 
42 | done
43 | 
44 | ######################################
45 | 
46 | # Now sort .avdl file names in order of referral (by imports).
47 | # This is done to help form clusters of records from each file.
48 | avpr_import_order=$(
49 | for f in ./schemas_avdl/*.avdl;
50 | do
51 |     doc_name=`echo -n $f | sed -r 's/.\/schemas_avdl\/([[:alnum:]]*).avdl/\1/g'`;
52 |     grep "import idl" $f | awk -v dn=$doc_name '{printf dn"\t"$3"\n"}' | sed -r 's/"([[:alnum:]]*).avdl";/\1/g';
53 |     printf ${doc_name}"\t"${doc_name}"\n"
54 | done | tsort | tac | awk -vORS=" " '{ print $1 }' | sed 's/ $//'
55 | )
56 | 
57 | ######################################
58 | 
59 | # You can still use the original function to make the DOT file using a list of the avprs.
60 | # Note: You now need to declare --avprs because it is no longer a positional argument.
61 | # ./avpr2uml.py --avprs `ls ./schemas_avpr/* | grep -v method` --dot uml.dot
62 | 
63 | # Or make the DOT file using clusters, urls, colors, and header comments:
64 | ./avpr2uml.py --clusters "${avpr_import_order}" --dot uml.dot --urls schema_urls --type_comments type_header_comments
65 | 
66 | dot uml.dot -T svg -o uml.svg
67 | 

--------------------------------------------------------------------------------
/protobuf2uml/README.md:
--------------------------------------------------------------------------------
1 | **Date of this version:** April 14, 2016
2 | 
3 | * * * * * * * * * *
4 | 
5 | ### Project Description
6 | 
7 | Visualize (.proto format) schema files as a UML diagram using Graphviz.
8 | 
9 | This project creates a schema UML diagram from a list of github URLs which end in .proto.
10 | It uses Python to construct a .dot file, which is read by Graphviz's dot program to make a .svg diagram.
11 | It is designed around ga4gh protobuf schema files, e.g. https://github.com/ga4gh/schemas/tree/protobuf/src/main/proto/ga4gh
12 | 
13 | ### To create the diagram:
14 | 
15 | **1)** Install graphviz, libprotoc, and Python 3 (2.7 might work if you edit descriptor2uml.py to use dict.iteritems() instead of dict.items())
16 | 
17 | To install graphviz, `sudo apt-get install graphviz` should work.
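A quick optional sanity check that the toolchain is installed and on your PATH (exact version output will vary):

    dot -V
    protoc --version
    python3 --version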
18 | 
19 | Notes for installing protobuf:
20 | 
21 | As of April 2016, on Ubuntu v14.04 the following command installed an old version of protobuf: `sudo apt-get install protobuf-compiler`, so instead I downloaded the [latest release](https://github.com/google/protobuf/releases/) and installed it manually:
22 | 
23 | `wget https://github.com/google/protobuf/releases/download/v3.0.0-beta-2/protobuf-python-3.0.0-beta-2.tar.gz`
24 | 
25 | `tar -xvzf protobuf-python-3.0.0-beta-2.tar.gz`
26 | 
27 | `cd protobuf-3.0.0-beta-2/`
28 | 
29 | `sudo ./configure`
30 | `sudo make`
31 | `sudo make check`
32 | `sudo make install`
33 | `sudo ldconfig`
34 | `protoc --version`
35 | 
36 | `cd python/`
37 | 
38 | `python setup.py build`
39 | `python setup.py test`
40 | `sudo python setup.py install`
41 | 
42 | (I referred to [here](http://www.confusedcoders.com/random/how-to-install-protocol-buffer-2-5-0-on-ubuntu-13-04) and [here](https://github.com/BVLC/caffe/issues/2092#issuecomment-98917616) for protobuf installation help)
43 | 
44 | **2)** Make sure you have the following files in the same directory:
45 | 
46 |     make_uml.sh
47 |     descriptor2uml.py
48 |     url_converter.py
49 |     descriptor.proto
50 | 
51 | **3)** Additionally, you should have two manually assembled input files in the directory:
52 | 
53 |     schema_urls (required for automatic download and if you want links in the svg)
54 |     type_header_comments (You can delete or modify the contents if you don't like the header comments, but the file should still exist as dummy input)
55 | 
56 | The schema_urls file contains a list of github .proto file URLs.
57 | The type_header_comments file contains lines of tab-delimited descriptions of data types, e.g.
58 | `ReferenceSet a reference assembly, e.g. GRCh38`
59 | 
60 | **4)** Finally, run:
61 | 
62 | `sh make_uml.sh`

--------------------------------------------------------------------------------
/avro2uml/README.md:
--------------------------------------------------------------------------------
1 | **Author(s):** A lot of the core code is from [Adam Novak's original version](https://github.com/ga4gh/schemas/pull/297). Malisa Smith added clusters, urls, colors, and header comments. The original (and derived) work is licensed under LICENSE.txt.
2 | **Date of this version:** February 24, 2016
3 | 
4 | * * * * * * * * * *
5 | 
6 | Visualize (.avdl format) schema files as a UML diagram using Graphviz.
7 | 
8 | This project creates a schema UML diagram from a list of github URLs which end in .avdl.
9 | It uses Python to construct a .dot file, which is read by Graphviz's dot program to make a .svg diagram.
10 | It is designed for use with ga4gh avro schema files, e.g. https://github.com/ga4gh/schemas/tree/master/src/main/resources/avro
11 | 
12 | ### To create the diagram:
13 | 
14 | **1)** Install graphviz and Python 2.7 (3.0 might also work)
15 | 
16 | http://www.graphviz.org/Download..php
17 | 
18 | **2)** Make sure you have the following files in the same directory:
19 | 
20 |     make_uml.sh
21 |     avpr2uml.py
22 |     url_converter.py
23 | 
24 | **3)** Additionally, you should have two manually assembled input files in the directory:
25 | 
26 |     schema_urls (required)
27 |     type_header_comments (You can delete or modify the contents if you don't like the header comments, but the file must still exist as dummy input)
28 | 
29 | The schema_urls file contains a list of github .avdl file URLs, e.g.
https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/reads.avdl
30 | Until the ga4gh schema is finalized and the entire schema can be stored in one local directory, files for inclusion in the diagram are to be listed in schema_urls.
31 | 
32 | The type_header_comments file contains lines of tab-delimited descriptions of data types, e.g. `ReferenceSet a reference assembly, e.g. GRCh38`
33 | 
34 | **4)** Finally, run:
35 | 
36 |     sh make_uml.sh
37 | 
38 | ### Example UML diagram
39 | 
40 | [Here](https://cdn.rawgit.com/malisas/schema-uml/master/avro2uml/example_svgs/master_uml_2016-03-07.svg)
41 | 
42 | Grey clusters in the image are clickable when viewing the raw SVG.
43 | 
44 | ### Limitations
45 | 
46 | Some edges and data structures may need to be manually modified or added in the dot file. You should check that all data structures are properly represented in the UML diagram.
47 | 
48 | Referential edge-finding between data-structure fields is based on "id" string matching, e.g. an "analysisId" field will point to the "Analysis" object. Non-id references will not be found. Containments of objects are based on complete string matches to field types.
49 | 
50 | If there is more than one instance of a data structure with the same name (e.g. "Evidence" might appear twice in the input avro files), it will only be drawn once. Technically this should be illegal anyway. If you want two objects with the same name, you must manually edit the dot file. Two objects with the same name will also cause edge-finding problems.
51 | 

--------------------------------------------------------------------------------
/avro2uml/url_converter.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python2.7
2 | import argparse, sys, os, re
3 | 
4 | """
5 | Author: Malisa Smith
6 | 
7 | This program converts github URLs to and from the raw file format URL, e.g.:
8 | 
9 | Raw url:
10 | https://raw.githubusercontent.com/ga4gh/schemas/master/src/main/resources/avro/reads.avdl
11 | 
12 | Not-raw url:
13 | https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/reads.avdl
14 | """
15 | 
16 | def parse_args(args):
17 |     """
18 |     Note: This function heavily borrows from Adam Novak's code: https://github.com/adamnovak/schemas/blob/autouml/contrib/avpr2uml.py
19 | 
20 | 
21 |     """
22 | 
23 |     # The command line arguments start with the program name, which we don't
24 |     # want to treat as an argument for argparse. So we remove it.
25 |     args = args[1:]
26 | 
27 |     # Construct the parser (which is stored in parser)
28 |     # See http://docs.python.org/library/argparse.html#formatter-class
29 |     parser = argparse.ArgumentParser(description=__doc__,
30 |         formatter_class=argparse.RawDescriptionHelpFormatter)
31 | 
32 |     # Now add all the options to it.
33 |     # Note: specify exactly one of the conversion options below per run.
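    # Example invocations (illustrative; the --getrawfromfile form is the one
    # make_uml.sh actually uses):
    #   python url_converter.py --getraw https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/reads.avdl
    #   python url_converter.py --getrawfromfile schema_urls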
34 | parser.add_argument("--getraw", type=str, 35 | help="Convert a github url to its raw form") 36 | parser.add_argument("--getcooked", type=str, 37 | help="Convert a github raw url to its not-raw/cooked form") 38 | parser.add_argument("--getrawfromfile", type=argparse.FileType("r"), 39 | help="Convert a list of urls in a file into their raw form") 40 | parser.add_argument("--getcookedfromfile", type=argparse.FileType("r"), 41 | help="Convert a list of urls in a file into their non-raw/cooked form") 42 | 43 | return parser.parse_args(args) 44 | 45 | def get_raw_url(url): 46 | if re.match("^https://raw\.githubusercontent\.com/", url): #the url is already raw 47 | return url 48 | else: #convert into raw format 49 | url_parts = re.match("^https://github\.com/(?P[a-zA-Z0-9_\-]+)/(?P[a-zA-Z0-9_\-]+)/blob/(?P.*$)", url) 50 | return "https://raw.githubusercontent.com/" + url_parts.group('user') + "/" + url_parts.group('repo') + "/" + url_parts.group('url_end') 51 | 52 | def get_cooked_url(url): 53 | if re.match("^https://github\.com/", url): #the url is already "cooked" 54 | return url 55 | else: #convert into "cooked" format 56 | url_parts = re.match("^https://raw\.githubusercontent\.com/(?P[a-zA-Z0-9_\-]+)/(?P[a-zA-Z0-9_\-]+)/(?P.*$)", url) 57 | return "https://github.com/" + url_parts.group('user') + "/" + url_parts.group('repo') + "/blob/" + url_parts.group('url_end') 58 | 59 | def get_raw_from_file(my_file): 60 | for url in my_file: 61 | print(get_raw_url(url.strip())) 62 | 63 | def get_cooked_from_file(my_file): 64 | for url in my_file: 65 | print(get_cooked_url(url.strip())) 66 | 67 | def main(args): 68 | """ 69 | Parses command line arguments, and does the work of the program. 70 | "args" specifies the program arguments, with args[0] being the executable 71 | name. The return value should be used as the program's exit code. 72 | """ 73 | 74 | options = parse_args(args) 75 | 76 | if options.getraw is not None: 77 | print(get_raw_url(options.getraw)) 78 | elif options.getcooked is not None: 79 | print(get_cooked_url(options.getcooked)) 80 | elif options.getrawfromfile is not None: 81 | get_raw_from_file(options.getrawfromfile) 82 | elif options.getcookedfromfile is not None: 83 | get_cooked_from_file(options.getcookedfromfile) 84 | 85 | if __name__ == "__main__" : 86 | sys.exit(main(sys.argv)) 87 | -------------------------------------------------------------------------------- /protobuf2uml/url_converter.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/python 2 | import argparse, sys, os, re 3 | 4 | """ 5 | Author: Malisa Smith 6 | 7 | This program converts github url's to and from the raw file format url, e.g.: 8 | 9 | Raw url: 10 | https://raw.githubusercontent.com/ga4gh/schemas/master/src/main/resources/avro/reads.avdl 11 | 12 | Not-raw url: 13 | https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/reads.avdl 14 | """ 15 | 16 | def parse_args(args): 17 | """ 18 | Note: This function heavily borrows from Adam Novak's code: https://github.com/adamnovak/schemas/blob/autouml/contrib/avpr2uml.py 19 | 20 | 21 | """ 22 | 23 | # The command line arguments start with the program name, which we don't 24 | # want to treat as an argument for argparse. So we remove it. 
25 |     args = args[1:]
26 | 
27 |     # Construct the parser (which is stored in parser)
28 |     # See http://docs.python.org/library/argparse.html#formatter-class
29 |     parser = argparse.ArgumentParser(description=__doc__,
30 |         formatter_class=argparse.RawDescriptionHelpFormatter)
31 | 
32 |     # Now add all the options to it.
33 |     # Note: specify exactly one of the conversion options below per run.
34 |     parser.add_argument("--getraw", type=str,
35 |         help="Convert a github url to its raw form")
36 |     parser.add_argument("--getcooked", type=str,
37 |         help="Convert a github raw url to its not-raw/cooked form")
38 |     parser.add_argument("--getrawfromfile", type=argparse.FileType("r"),
39 |         help="Convert a list of urls in a file into their raw form")
40 |     parser.add_argument("--getcookedfromfile", type=argparse.FileType("r"),
41 |         help="Convert a list of urls in a file into their non-raw/cooked form")
42 | 
43 |     return parser.parse_args(args)
44 | 
45 | def get_raw_url(url):
46 |     if re.match("^https://raw\.githubusercontent\.com/", url): # the url is already raw
47 |         return url
48 |     else: # convert into raw format
49 |         url_parts = re.match("^https://github\.com/(?P<user>[a-zA-Z0-9_\-]+)/(?P<repo>[a-zA-Z0-9_\-]+)/blob/(?P<url_end>.*$)", url)
50 |         return "https://raw.githubusercontent.com/" + url_parts.group('user') + "/" + url_parts.group('repo') + "/" + url_parts.group('url_end')
51 | 
52 | def get_cooked_url(url):
53 |     if re.match("^https://github\.com/", url): # the url is already "cooked"
54 |         return url
55 |     else: # convert into "cooked" format
56 |         url_parts = re.match("^https://raw\.githubusercontent\.com/(?P<user>[a-zA-Z0-9_\-]+)/(?P<repo>[a-zA-Z0-9_\-]+)/(?P<url_end>.*$)", url)
57 |         return "https://github.com/" + url_parts.group('user') + "/" + url_parts.group('repo') + "/blob/" + url_parts.group('url_end')
58 | 
59 | def get_raw_from_file(my_file):
60 |     for url in my_file:
61 |         print(get_raw_url(url.strip()))
62 | 
63 | def get_cooked_from_file(my_file):
64 |     for url in my_file:
65 |         print(get_cooked_url(url.strip()))
66 | 
67 | def main(args):
68 |     """
69 |     Parses command line arguments, and does the work of the program.
70 |     "args" specifies the program arguments, with args[0] being the executable
71 |     name. The return value should be used as the program's exit code.
72 |     """
73 | 
74 |     options = parse_args(args)
75 | 
76 |     if options.getraw is not None:
77 |         print(get_raw_url(options.getraw))
78 |     elif options.getcooked is not None:
79 |         print(get_cooked_url(options.getcooked))
80 |     elif options.getrawfromfile is not None:
81 |         get_raw_from_file(options.getrawfromfile)
82 |     elif options.getcookedfromfile is not None:
83 |         get_cooked_from_file(options.getcookedfromfile)
84 | 
85 | if __name__ == "__main__":
86 |     sys.exit(main(sys.argv))
87 | 

--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 |                                  Apache License
2 |                            Version 2.0, January 2004
3 |                         http://www.apache.org/licenses/
4 | 
5 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 | 
7 |    1. Definitions.
8 | 
9 |       "License" shall mean the terms and conditions for use, reproduction,
10 |       and distribution as defined by Sections 1 through 9 of this document.
11 | 
12 |       "Licensor" shall mean the copyright owner or entity authorized by
13 |       the copyright owner that is granting the License.
14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 
134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.

--------------------------------------------------------------------------------
/protobuf2uml/example_svgs/bmeg_2016-04-14.svg:
--------------------------------------------------------------------------------
[SVG markup omitted. Rendered UML diagram of the BMEG variant_proto cluster: IndividualList, Individual, BioSample, VariantCall, VariantCallEffect, Position, and Feature nodes, with containment edges such as IndividualList:individuals -> Individual, VariantCall:position -> Position, and Individual:bioSamples -> BioSample.]

--------------------------------------------------------------------------------
/protobuf2uml/descriptor2uml.py:
--------------------------------------------------------------------------------
1 | #! /usr/bin/python
2 | 
3 | """
4 | Author: Malisa Smith
5 | 
6 | Make UML diagrams based on Protocol Buffers-described schemas. Outputs a .dot file to be used with Graphviz's dot program.
7 | 
8 | Instead of parsing the original schema .proto files, this program takes a
9 | FileDescriptorSet as input. A FileDescriptorSet is itself a protobuf-serialized message which contains information about the original schema files within it (defined here:
10 | https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.proto). See README for how to generate the FileDescriptorSet.
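A FileDescriptorSet with source info can be generated with protoc, as
make_uml.sh does from inside the schemas_proto directory:

    protoc --include_source_info -o MyFileDescriptorSet.pb *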
11 | """ 12 | 13 | import argparse, sys, os, itertools, re, textwrap 14 | from descriptor_pb2 import FileDescriptorSet #note: uses proto2!! 15 | import url_converter 16 | 17 | def parse_args(args): 18 | 19 | args = args[1:] 20 | parser = argparse.ArgumentParser(description=__doc__, 21 | formatter_class=argparse.RawDescriptionHelpFormatter) 22 | 23 | parser.add_argument("--descriptor", type=argparse.FileType("rb"), 24 | help="File with FileDescriptorSet of original schema") 25 | parser.add_argument("--dot", type=argparse.FileType("w"), 26 | help="GraphViz file to write a UML diagram to") 27 | parser.add_argument("--type_comments", type=argparse.FileType("r"), 28 | help="tab-delimited file with type names and type header comments") 29 | parser.add_argument("--urls", type=argparse.FileType("r"), 30 | help="file with links to original schema files") 31 | 32 | 33 | return parser.parse_args(args) 34 | 35 | #check the "message" to see if it is a trivial map. 36 | def is_trivial_map(nested_type): 37 | #I am defining a trivial map to be a message with a nested_type.name that ends in "Entry". With two fields, "key" and "value". The "value" field has a type that is not 11 (and a list) or 14. 38 | if nested_type.name.endswith("Entry") and len(nested_type.field) == 2 and nested_type.field[0].name == "key" and nested_type.field[1].name == "value" and not ((nested_type.field[1].type == 11 and not nested_type.field[1].type_name == ".google.protobuf.ListValue") or nested_type.field[1] == 14): 39 | return True 40 | else: 41 | return False 42 | 43 | #parse a message. Pass in all the dictionaries to be updated, as well as the relevant message 44 | # For now just parse the name, field, nested_type, and enum_type fields in DescriptorProto: https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.proto#L92 45 | # Might later also want to parse oneof_decl, but assume for now I won't be dealing with that. 46 | def parse_message(cluster, fields, containments, nests, id_targets, id_references, clusters, message, message_index=None, edges_from=None): 47 | #track all the fields in the message 48 | fields[message.name] = [] 49 | 50 | # for field in message.field: 51 | for field_index in range(0, len(message.field)): 52 | field = message.field[field_index] 53 | fields[message.name].append((field.name, field.type)) 54 | #deal with containments, id_targets, and id_references, if applicable. 55 | #Containments will be signified by a field.type of 11 (for TYPE_MESSAGE) or 14 (for TYPE_ENUM). I can determine the type of containment by looking at field.type_name 56 | #Note: maps will also come up as type 11 and will have a field.type_name of something like .bmeg.Feature.AttributesEntry where the actual field name is attributes 57 | if field.type == 11 or field.type == 14: 58 | # We are likely adding containments of trivial maps, e.g. ('VariantCallEffect', 'InfoEntry', 'info'). 59 | # The edge is only drawn if the map/message itself is processed fully using parse_message(), however. And, it will only be processed 60 | # if it is not a trivial map. (see how nested_types are dealt with further down). When drawing containment edges, the program checks if the 61 | # field type_name is a key in the fields dictionary. 
62 | containments.add((message.name, field.type_name.split(".")[-1], field.name)) 63 | #id_targets are simply fields where field.name is "id" 64 | if field.name.lower() == "id": 65 | id_targets[message.name.lower()] = (message.name, field.name.lower().split(".")[-1]) 66 | #id_targets[field.name.lower().split(".")[-1]] = message.name#field.name 67 | #id_references are fields which end in id or ids 68 | elif field.name.lower().endswith("id") or field.name.lower().endswith("ids"): 69 | if field.name.lower().endswith("id"): 70 | destination = field.name.lower()[0:-2] 71 | elif field.name.lower().endswith("ids"): 72 | destination = field.name.lower()[0:-3] 73 | destination = destination.replace("_", "") 74 | id_references.add((message.name, destination, field.name)) 75 | if field.name.endswith("Edges"): 76 | edges_from[(cluster.name, 4, message_index, 2, field_index)] = [message.name, field.name] 77 | 78 | for nested_type in message.nested_type: 79 | #Note: it seems you can define a nested message without actually using it in a field in the outer message. So, a nested_type is not necessarily used in a field. 80 | #fields[message.name].append((nested_type.name, 11)) #a nested_type is a message. field types in DescriptorProto uses 11 for TYPE_MESSAGE 81 | 82 | # Note: according to https://developers.google.com/protocol-buffers/docs/proto#backwards-compatibility 83 | # maps are sent as messages (not map-types) "on the wire". We don't want to draw nodes for nested types that are trivial maps of string to string. 84 | # So, check if we want to process the nested_type further: 85 | if not is_trivial_map(nested_type): 86 | #the nested_type is nested within the message. So keep track of this edge in the nests variable 87 | #nests.add((message.name, nested_type.name)) #for now actually don't bother drawing edges for nests. 88 | 89 | #nested_type is itself a message, so recursively call this function. 90 | parse_message(cluster, fields, containments, nests, id_targets, id_references, clusters, nested_type) 91 | 92 | for enum_type in message.enum_type: #a nested Enum 93 | #we can consider the enum a nesting too 94 | #nests.add((message.name, enum_type.name)) #For now don't bother with drawing edges for xnests 95 | #And define it as a top-level type. So it has a fields entry. 96 | fields[enum_type.name] = [] 97 | for field in enum_type.value: 98 | fields[enum_type.name].append((field.name, 9)) 99 | #Finally, add it to the cluster 100 | clusters[cluster.name].append(enum_type.name) 101 | 102 | #Add the name of the message as a type in the current cluster 103 | clusters[cluster.name].append(message.name) 104 | 105 | def parse_cluster(cluster, fields, containments, nests, id_targets, id_references, edges_from, edges_targets, clusters): 106 | 107 | clusters[cluster.name] = [] 108 | 109 | #process all the enum-types in the cluster 110 | for enum in cluster.enum_type: 111 | #Track all the enum "fields" 112 | fields[enum.name] = [] 113 | for field in enum.value: 114 | fields[enum.name].append((field.name, 9)) #an Enum field is a string. 
field types in DescriptorProto uses 9 for TYPE_STRING
115 |         #Record the name of the enum as a type in the current cluster
116 |         clusters[cluster.name].append(enum.name)
117 | 
118 |     #track all the message-types in the cluster
119 |     #for message in cluster.message_type:
120 |     for message_index in range(0, len(cluster.message_type)):
121 |         message = cluster.message_type[message_index]
122 |         #recursively parse each message
123 |         parse_message(cluster, fields, containments, nests, id_targets, id_references, clusters, message, message_index, edges_from)
124 |         #Note: the message will add itself to the cluster
125 | 
126 |     # Parse source_code_info for edge targets.
127 |     for source_location_index in range(0, len(cluster.source_code_info.location)):
128 |         location = cluster.source_code_info.location[source_location_index]
129 |         path = tuple(location.path)
130 |         # Example when split: [' Target: VariantCall Biosample Individual Feature', '']
131 |         comments = location.leading_comments.split('\n')
132 |         if len(comments) > 1 and comments[-2].startswith(" Target:"):
133 |             targets = comments[-2].split(" ")[2:]
134 |             edges_targets_key = (cluster.name,) + path # e.g. (samples.proto, 4, 13, 2, 6)
135 |             edges_targets[edges_targets_key] = targets
136 | 
137 | def write_graph(fields, containments, nests, matched_references, matched_edges, clusters, type_comments_file, urls_file, dot_file):
138 | 
139 |     # Parse type_comments_file if applicable
140 |     type_comments = {}
141 |     if type_comments_file is not None:
142 |         for type_comment in type_comments_file:
143 |             type_comment_split = type_comment.split("\t")
144 |             type_comments[type_comment_split[0]] = type_comment_split[1].strip()
145 | 
146 |     # Breaks up a comment string so no more than ~57 characters are on each line
147 |     def break_up_comment(comment):
148 |         wrapper = textwrap.TextWrapper(break_long_words = False, width = 57)
149 |         return "<br/>".join(wrapper.wrap(comment))
150 | 
151 |     # Fill in the urls dictionary.
152 |     urls = {}
153 |     if urls_file is not None:
154 |         for url in urls_file:
155 |             cooked_url = url_converter.get_cooked_url(url.strip())
156 |             url_key = cooked_url.split("/")[-1]
157 |             urls[url_key] = cooked_url
158 | 
159 |     # Start a digraph
160 |     dot_file.write("digraph UML {\n")
161 | 
162 |     # Define node properties: shaped like UML items.
163 |     dot_file.write("node [\n")
164 |     dot_file.write("\tshape=plaintext\n")
165 |     dot_file.write("]\n\n")
166 | 
167 |     # Draw each node/type/record as a table
168 |     for type_name, field_list in fields.items(): #python 2.x uses dict.iteritems() but python 3.x uses dict.items()
169 | 
170 |         dot_file.write("{} [label=<\n".format(type_name))#type_to_display(type_name)))
171 |         dot_file.write("<table border=\"0\" cellborder=\"1\" cellspacing=\"0\">\n")
172 |         dot_file.write("\t<tr>\n")
173 |         dot_file.write("\t\t<td port=\"name\" colspan=\"2\"><b>{}</b>".format(type_name))
174 | 
175 |         # Add option to specify description for header:
176 |         if type_name in type_comments:
177 |             dot_file.write("<br/><font point-size=\"11\">{}</font>".format(break_up_comment(type_comments[type_name])))
178 | 
179 |         dot_file.write("</td>\n")
180 |         dot_file.write("\t</tr>\n")
181 | 
182 | 
183 |         # Now draw the rows of fields for the type. A field_list of [a, b, c, d, e, f, g] will have [a, e] in row 1, [b, f] in row 2, [c, g] in row 3, and just [d] in row 4
184 |         num_fields = len(field_list)
185 |         for i in range(0, num_fields//2 + num_fields%2):
186 |             # Draw one row.
187 |             dot_file.write("\t<tr>\n")
188 |             # Port number and displayed text will be the i'th field's name
189 |             dot_file.write("\t\t<td align=\"left\" port=\"{}\">- {}</td>\n".format(field_list[i][0], field_list[i][0]))
190 |             if (num_fields%2) == 1 and (i == num_fields//2 + num_fields%2 - 1):
191 |                 # Don't draw the second cell in the row if you have an odd number of fields and it is the last row
192 |                 pass
193 |             else:
194 |                 dot_file.write("\t\t<td align=\"left\" port=\"{}\">- {}</td>\n".format(field_list[num_fields//2 + num_fields%2 + i][0], field_list[num_fields//2 + num_fields%2 + i][0]))
195 |             dot_file.write("\t</tr>\n")
196 | 
197 |         # Finish the table
198 |         dot_file.write("</table>>];\n\n")
199 | 
200 |     # Now define the clusters/subgraphs
201 |     for cluster_name, cluster_types in clusters.items(): #python 2.x uses dict.iteritems() but python 3.x uses dict.items()
202 |         dot_file.write("subgraph cluster_{} {{\n".format(cluster_name.replace(".", "_")))
203 |         dot_file.write("\tstyle=\"rounded, filled\";\n")
204 |         dot_file.write("\tcolor=lightgrey;\n")
205 |         dot_file.write("\tnode [style=filled,color=white];\n")
206 |         dot_file.write("\tlabel = \"{}\";\n".format(cluster_name.replace(".", "_")))
207 | 
208 |         if cluster_name in urls:
209 |             dot_file.write("\tURL=\"{}\";\n".format(urls[cluster_name]))
210 | 
211 |         #After all the cluster formatting, define the cluster types
212 |         for cluster_type in cluster_types:
213 |             dot_file.write("\t{};\n".format(cluster_type)) #cluster_type should match up with a type_name from fields
214 |         dot_file.write("}\n\n")
215 | 
216 | 
217 |     dot_file.write("\n// Define containment edges\n")
218 |     # Define edge properties for containments
219 |     dot_file.write("edge [\n")
220 |     dot_file.write("\tdir=both\n")
221 |     dot_file.write("\tarrowtail=odiamond\n")
222 |     dot_file.write("\tarrowhead=none\n")
223 |     dot_file.write("\tcolor=\"#C55A11\"\n")
224 |     dot_file.write("\tpenwidth=2\n")
225 |     dot_file.write("]\n\n")
226 | 
227 |     for container, containee, container_field_name in containments:
228 |         # Now do the containment edges
229 |         # Only write the edge if the containee is a top-level field in fields.
230 |         if containee in fields:
231 |             dot_file.write("{}:{}:w -> {}\n".format(container,
232 |                 container_field_name, containee))
233 | 
234 |     dot_file.write("\n// Define references edges\n")
235 |     # Define edge properties for references
236 |     dot_file.write("\nedge [\n")
237 |     dot_file.write("\tdir=both\n")
238 |     dot_file.write("\tarrowtail=none\n")
239 |     dot_file.write("\tarrowhead=vee\n")
240 |     dot_file.write("\tstyle=dashed\n")
241 |     dot_file.write("\tcolor=\"darkgreen\"\n")
242 |     dot_file.write("\tpenwidth=2\n")
243 |     dot_file.write("]\n\n")
244 | 
245 |     for referencer, referencer_field, referencee in matched_references:
246 |         # Now do the reference edges
247 |         dot_file.write("{}:{}:w -> {}:id:w\n".format(referencer, referencer_field,
248 |             referencee))
249 | 
250 |     # Now make the edges which had targets encoded in leading comments
251 |     for outgoing, targets in matched_edges:
252 |         # Format is: [['PhenotypeAssociation', 'hasGenotypeEdges'], ['VariantCall', 'Biosample', 'Individual', 'Feature']]
253 |         for target in targets:
254 |             dot_file.write("{}:{}:w -> {}:name:w\n".format(outgoing[0], outgoing[1], target))
255 | 
256 | 
257 |     # Close the digraph off.
258 |     dot_file.write("}\n")
259 | 
260 | 
261 | #for now, returns fields, containments, and references (and clusters?), although in the future might want to also return type_comments and urls and clusters, etc...
262 | def parse_descriptor(descriptor_file):
263 |     descriptor = FileDescriptorSet()
264 |     descriptor.MergeFromString(descriptor_file.read())
265 | 
266 |     # Holds the fields for each type, as lists of tuples of (name, type),
267 |     # indexed by type. All types are fully qualified.
268 |     fields = {}
269 | 
270 |     # Holds edge tuples for containment from container to contained.
271 |     containments = set()
272 | 
273 |     # Holds edge tuples for nested type edges, from parent type to nested type.
274 |     nests = set()
275 | 
276 |     # Holds a dict from lower-case short name to fully-qualified name for
277 |     # everything with an "id" field. E.g.
if Variant has an id, then key is "variant" and value is "Variant" 278 | id_targets = {} 279 | 280 | # Holds a set of tuples of ID references, (fully qualified name of 281 | # referencer, lower-case target name) 282 | id_references = set() 283 | 284 | # Dictionary of fields which act as outgoing edges within messages. 285 | # key: [cluster.name, inferred path in FileDescriptorSet source_code_info...], value: [message.name, field.name] 286 | edges_from = {} 287 | 288 | # Dictionary of field name targets which are encoded in comments, found via FileDescriptorSet source_code_info. 289 | # key: [cluster.name, inferred path in FileDescriptorSet source_code_info...], value: [target field types] 290 | edges_targets = {} 291 | 292 | # Holds the field names from each original .proto file, in order to draw one cluster of fields for each file 293 | # Key: cluster/file name Value: tuple of field names 294 | clusters = {} 295 | 296 | for cluster in descriptor.file: 297 | #Note: you can pass a dictionary into a function and modify the original since it still refers to the same location in memory I think? (you don't need to pass it back) 298 | parse_cluster(cluster, fields, containments, nests, id_targets, id_references, edges_from, edges_targets, clusters) 299 | 300 | # Now match the id references to targets. 301 | matched_references = set() #will contain tuples of strings, i.e. (referencer, referencer_field, referencee) 302 | #id_targets_keys_lowercase = [key.lower() for key in id_targets.keys()] 303 | for id_reference in id_references: 304 | if id_reference[1] in id_targets: 305 | matched_references.add((id_reference[0], id_reference[2], id_targets[id_reference[1]][0])) 306 | 307 | # Now match outgoing edge fields (ending with "Edges") to their targets found in source_code_info leading comments. 308 | matched_edges = [] 309 | for key, value in edges_from.items(): 310 | if key in edges_targets: 311 | matched_edges.append([value, edges_targets[key]]) 312 | 313 | return (fields, containments, nests, matched_references, matched_edges, clusters) 314 | """ 315 | #printing. test! 
316 |     print("\n*********************\nPRINTING fields\n(parent-type-name: [(field-name, field-type)]\n*********************\n")
317 |     print(fields)
318 | 
319 |     print("\n*********************\nPRINTING containments\n(message name, field type name, field name)\n*********************\n")
320 |     print(containments)
321 | 
322 |     print("\n*********************\nPRINTING nests\n(parent type, nested type)\n*********************\n")
323 |     print(nests)
324 | 
325 |     print("\n*********************\nPRINTING id_targets\n(target-type-lower: (target-type, id-format))\n*********************\n")
326 |     print(id_targets)
327 | 
328 |     print("\n*********************\nPRINTING id_references\n(referer-name, referred-type-lower, referer-field)\n*********************\n")
329 |     print(id_references)
330 | 
331 |     print("\n*********************\nPRINTING clusters\n[list of types in one cluster/file]\n*********************\n")
332 |     print(clusters)
333 | 
334 |     print("\n*********************\nPRINTING matched_references\n(referencer, referencer_field, referencee)\n*********************\n")
335 |     print(matched_references)
336 | 
337 |     print("\n*********************\nPRINTING matched_edges\n*********************\n")
338 |     print(matched_edges)
339 |     """
340 | 
341 | def main(args):
342 |     options = parse_args(args) # This holds the nicely-parsed options object
343 | 
344 |     (fields, containments, nests, matched_references, matched_edges, clusters) = parse_descriptor(options.descriptor)
345 | 
346 |     if options.dot is not None:
347 |         # Now write the diagram to the dot file!
348 |         write_graph(fields, containments, nests, matched_references, matched_edges, clusters, options.type_comments, options.urls, options.dot)
349 | 
350 | if __name__ == "__main__" :
351 |     sys.exit(main(sys.argv))
352 | 
--------------------------------------------------------------------------------
/protobuf2uml/example_svgs/bmeg_2016-04-05.svg:
--------------------------------------------------------------------------------
[SVG omitted: rendered UML diagram of the simple_schema_proto cluster -- nodes for Feature, Position, Strand, VariantCall, VariantCallEffect, Individual, BioSample, and IndividualList, with their containment and ID-reference edges.]
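descriptor2uml.py above consumes a serialized FileDescriptorSet. As a quick sanity check, a minimal sketch like the following (not part of this repo; it assumes make_uml.sh has already produced schemas_proto/MyFileDescriptorSet.pb, and it uses the stock google.protobuf descriptor_pb2 instead of the locally generated module) lists every top-level message and its fields:

# inspect_fds.py -- illustrative only
from google.protobuf.descriptor_pb2 import FileDescriptorSet

with open("schemas_proto/MyFileDescriptorSet.pb", "rb") as pb_file:
    descriptor_set = FileDescriptorSet()
    descriptor_set.MergeFromString(pb_file.read())

for proto_file in descriptor_set.file:       # one entry per .proto "cluster"
    print(proto_file.name)
    for message in proto_file.message_type:  # top-level messages only
        print("  {}: {}".format(message.name,
                                [field.name for field in message.field]))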
--------------------------------------------------------------------------------
/avro2uml/avpr2uml.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python2.7
2 | """
3 | Authors: Adam Novak and Malisa Smith
4 | 
5 | avpr2uml.py: make UML diagrams from Avro AVPR files (which you can easily
6 | generate from AVDL files). Inclusion of other types will be detected and turned
7 | into the appropriate UML edges. ID references will be created if the referencee
8 | has an "id" field, and the referencer has a referenceeNameId(s) field. Some
9 | attempt is made to fuzzy-match referencers to referencees, but it is not perfect
10 | and may require manual adjustment of the resulting edges.
11 | 
12 | Re-uses sample code and documentation from
13 | 
14 | """
15 | 
16 | import argparse, sys, os, itertools, re, json, textwrap
17 | import url_converter
18 | 
19 | def parse_args(args):
20 |     """
21 |     Takes in the command-line arguments list (args), and returns a nice argparse
22 |     result with fields for all the options.
23 |     Borrows heavily from the argparse documentation examples:
24 | 
25 |     """
26 | 
27 |     # The command line arguments start with the program name, which we don't
28 |     # want to treat as an argument for argparse. So we remove it.
29 |     args = args[1:]
30 | 
31 |     # Construct the parser (which is stored in parser)
32 |     # Module docstring lives in __doc__
33 |     # See http://python-forum.com/pythonforum/viewtopic.php?f=3&t=36847
34 |     # And a formatter class so our examples in the docstring look good. Isn't it
35 |     # convenient how we already wrapped it to 80 characters?
36 |     # See http://docs.python.org/library/argparse.html#formatter-class
37 |     parser = argparse.ArgumentParser(description=__doc__,
38 |         formatter_class=argparse.RawDescriptionHelpFormatter)
39 | 
40 |     # Now add all the options to it
41 |     # Note: avprs is now an optional argument. One of --avprs or --clusters must be specified, however.
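    # Illustrative invocations (hypothetical paths, not checked against this repo):
    #   python avpr2uml.py --avprs schemas_avpr/*.avpr --dot uml.dot
    #   python avpr2uml.py --clusters "common metadata reads" --dot uml.dot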
42 |     group = parser.add_mutually_exclusive_group(required=True)
43 |     group.add_argument("--avprs", type=argparse.FileType("r"), default=None, nargs='*',
44 |         help="the AVPR file(s) to read")
45 |     group.add_argument("--clusters", type=str, default=None,
46 |         help="List of original clusters/avdl files as a space-separated string, in imported order")
47 |     parser.add_argument("--dot", type=argparse.FileType("w"),
48 |         help="GraphViz file to write a UML diagram to")
49 |     parser.add_argument("--urls", type=argparse.FileType("r"),
50 |         help="File with schema url's")
51 |     parser.add_argument("--type_comments", type=argparse.FileType("r"),
52 |         help="tab-delimited file with type names and type header comments")
53 | 
54 | 
55 |     return parser.parse_args(args)
56 | 
57 | def type_to_string(parsed_type, namespace=None, strip_namespace=False):
58 |     """
59 |     Given the JSON representation of a field type (a string naming an Avro
60 |     primitive type, a string naming a qualified user-defined type, a string
61 |     naming a non-qualified user-defined type, a list of types being unioned
62 |     together, or a dict with a "type" of "array" or "map" and an "items"
63 |     defining a type), produce a string defining the type relative to the given
64 |     namespace.
65 | 
66 |     If strip_namespace is specified, namespace info will be stripped out.
67 | 
68 |     """
69 | 
70 |     if isinstance(parsed_type, list):
71 |         # It's a union. Recurse on each unioned element.
72 |         return ("union<" + ",".join([type_to_string(x, namespace,
73 |             strip_namespace) for x in parsed_type]) + ">")
74 |     elif isinstance(parsed_type, dict):
75 |         # It's an array or map.
76 | 
77 |         if parsed_type["type"] == "array":
78 |             # For an array we recurse on items
79 |             recurse_on = parsed_type["items"]
80 |         elif parsed_type["type"] == "map":
81 |             # For a map, we recurse on values.
82 |             recurse_on = parsed_type["values"]
83 |         else:
84 |             # This is not allowed to be a template.
85 |             raise RuntimeError("Invalid template {}".format(
86 |                 parsed_type["type"]))
87 | 
88 |         return (parsed_type["type"] + "<" +
89 |             type_to_string(recurse_on, namespace, strip_namespace) + ">")
90 |     elif parsed_type in ["int", "long", "string", "boolean", "float", "double",
91 |         "null", "bytes"]:
92 |         # If it's a primitive type, return it.
93 |         return parsed_type
94 |     elif "." in parsed_type:
95 |         # It has a dot, so assume it's fully qualified. TODO: Handle partially
96 |         # qualified types, where we have to check if this type actually exists.
97 | 
98 |         parts = parsed_type.split(".")
99 | 
100 |         parsed_namespace = ".".join(parts[:-1])
101 | 
102 |         if strip_namespace or parsed_namespace == namespace:
103 |             # Pull out the namespace, since we don't want/don't need it
104 |             parsed_type = parts[-1]
105 | 
106 |         return parsed_type
107 |     else:
108 |         # Just interpret it in our namespace. Don't fully qualify it.
109 | 
110 |         # Then give back the type name
111 |         return parsed_type
112 | 
113 | def find_user_types(parsed_type, namespace=None):
114 |     """
115 |     Given the JSON representation of a field type (a string naming an Avro
116 |     primitive type, a string naming a qualified user-defined type, a string
117 |     naming a non-qualified user-defined type, a list of types being unioned
118 |     together, or a dict with a "type" of "array" or "map" and an "items"
119 |     defining a type), yield all of the user types it references.
120 | 
121 |     """
122 | 
123 |     if isinstance(parsed_type, list):
124 |         # It's a union.
125 |         for option in parsed_type:
126 |             # Recurse on each unioned element.
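            # (e.g. for a union like ["null", "Variant"], the primitive "null"
            # is skipped and only the user-defined "Variant" is yielded)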
127 |             for found in find_user_types(option, namespace):
128 |                 # And yield everything we find there.
129 |                 yield found
130 |     elif isinstance(parsed_type, dict):
131 |         # It's an array or map.
132 | 
133 |         if parsed_type["type"] == "array":
134 |             # For an array we recurse on items
135 |             recurse_on = parsed_type["items"]
136 |         elif parsed_type["type"] == "map":
137 |             # For a map, we recurse on values.
138 |             recurse_on = parsed_type["values"]
139 |         else:
140 |             # This is not allowed to be a template.
141 |             raise RuntimeError("Invalid template {}".format(
142 |                 parsed_type["type"]))
143 | 
144 |         for found in find_user_types(recurse_on, namespace):
145 |             # Yield everything we find in there.
146 |             yield found
147 |     elif parsed_type in ["int", "long", "string", "boolean", "float", "double",
148 |         "null", "bytes"]:
149 |         # If it's a primitive type, skip it.
150 |         pass
151 |     elif "." in parsed_type:
152 |         # It has a dot, so assume it's fully qualified. TODO: Handle partially
153 |         # qualified types, where we have to check if this type actually exists.
154 |         yield parsed_type
155 |     else:
156 |         # Just interpret it in our namespace.
157 | 
158 |         if namespace is not None:
159 |             # First attach the namespace if applicable.
160 |             parsed_type = "{}.{}".format(namespace, parsed_type)
161 | 
162 |         # Then give back the type name
163 |         yield parsed_type
164 | 
165 | def type_to_node(type_name):
166 |     """
167 |     Convert an Avro type name (with dots) to a GraphViz node identifier.
168 | 
169 |     """
170 | 
171 |     # First double up any existing underscores
172 |     type_name = type_name.replace("_", "__")
173 |     # Then turn dots into underscores
174 |     type_name = type_name.replace(".", "_")
175 | 
176 |     return type_name
177 | 
178 | def type_to_display(type_name):
179 |     """
180 |     Convert an Avro fully qualified type name (with dots) to a display name.
181 | 
182 |     """
183 | 
184 |     # Get the thing after the last dot, if any.
185 |     return type_name.split(".")[-1]
186 | 
187 | def dot_escape(label_content):
188 |     """
189 |     Escape the given string so it is safe inside a GraphViz record label. Only
190 |     actually handles the characters found in Avro type definitions, so not
191 |     general purpose.
192 | 
193 |     """
194 | 
195 |     return (label_content.replace("&", "&amp;").replace("<", "&lt;")
196 |         .replace(">", "&gt;").replace("\"", "&quot;"))
197 | 
198 | def parse_avprs(avpr_files, cluster_order, url_file, type_comments_file):
199 |     """
200 |     Given an iterator of AVPR file objects to read, return six things: a dict
201 |     from fully qualified type names to lists of (field name, field type) tuples,
202 |     a set of (container, containee, field name) containment tuples, a set of
203 |     (referencer, referencee, field name) ID reference tuples, plus the clusters, urls, and type_comments dicts.
204 | 
205 |     """
206 | 
207 |     # Holds a dict from cluster key to full url. The key corresponds to a key in clusters, e.g. Key: reads.avdl  Value: (the url)
208 |     urls = {}
209 | 
210 |     # Holds a dict from type name to manually entered comment. e.g. Key: ExpressionUnits  Value: e.g. FPKM or TPM
211 |     type_comments = {}
212 | 
213 |     # Holds the fields for each type, as lists of tuples of (name, type),
214 |     # indexed by type. All types are fully qualified.
215 |     fields = {}
216 | 
217 |     # Holds edge tuples for containment from container to contained.
218 |     containments = set()
219 | 
220 |     # Holds a dict from lower-case short name to fully-qualified name for
221 |     # everything with an "id" field.
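    # (e.g. a Variant type with an "id" field is recorded under key "variant",
    # mapping to its fully-qualified name)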
222 | id_targets = {} 223 | 224 | # Holds a set of tuples of ID references, (fully qualified name of 225 | # referencer, lower-case target name) 226 | id_references = set() 227 | 228 | # Holds the field names from each original .avdl file, in order to draw one cluster of fields for each file 229 | # Key: cluster/file name Value: tuple of field names 230 | clusters = {} 231 | 232 | # Fill in the urls dictionary. 233 | if url_file is not None: 234 | for url in url_file: 235 | cooked_url = url_converter.get_cooked_url(url.strip()) 236 | url_key = cooked_url.split("/")[-1] 237 | urls[url_key] = cooked_url 238 | 239 | # Fill in the type_comments dictionary 240 | if type_comments_file is not None: 241 | for type_comment in type_comments_file: 242 | type_comment_split = type_comment.split("\t") 243 | type_comments[type_comment_split[0]] = type_comment_split[1].strip() 244 | 245 | # Add types to clusters 246 | make_clusters = (cluster_order is not None) 247 | cluster_files = [] 248 | if make_clusters: 249 | cluster_order_list = cluster_order.split() 250 | for cluster in cluster_order_list: 251 | current_cluster = open(os.path.join(os.getcwd(), 'schemas_avpr', cluster + ".avpr"), 'r') 252 | cluster_files.append(current_cluster) 253 | 254 | files_for_iteration = None 255 | if make_clusters: 256 | files_for_iteration = cluster_files 257 | else: 258 | files_for_iteration = avpr_files 259 | 260 | # For avpr_file in avpr_files: 261 | for avpr_file in files_for_iteration: 262 | # Load each protocol that we want to look at. 263 | protocol = json.load(avpr_file) 264 | 265 | # Grab the namespace if set 266 | protocol_namespace = protocol.get("namespace", None) 267 | 268 | #Define cluster key if applicable 269 | cluster_key = None 270 | if make_clusters: 271 | cluster_key = avpr_file.name.split("/")[-1][:-5] + ".avdl" #e.g. path/to/common.avpr will become common.avdl 272 | 273 | for defined_type in protocol.get("types", []): 274 | # Get the name of the type 275 | type_name = defined_type["name"] 276 | 277 | type_namespace = defined_type.get("namespace", protocol_namespace) 278 | 279 | if type_namespace is not None: 280 | type_name = "{}.{}".format(type_namespace, type_name) 281 | 282 | #If make_clusters is set to True, then due to the order of files in cluster_files, a field should not get recorded in the wrong cluster because it is only recorded the first time it is seen. 283 | if fields.has_key(type_name): 284 | # Already saw this one. 285 | continue 286 | 287 | # Record this one as actually existing. 288 | fields[type_name] = [] 289 | 290 | # Record the field in the correct cluster if applicable 291 | if make_clusters: 292 | clusters.setdefault(cluster_key, []).append(type_name) 293 | 294 | # print("Type {}".format(type_name)) 295 | 296 | if defined_type["type"] == "record": 297 | # We can have fields. 298 | 299 | for field in defined_type["fields"]: 300 | # Parse out each field's name and type 301 | field_type = type_to_string(field["type"], type_namespace) 302 | field_name = field["name"] 303 | 304 | # Announce every field with its type 305 | # print("\t{} {}".format(field_type, field_name)) 306 | 307 | # Record the field for the UML. 
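                    # (so fields[type_name] grows into something like
                    #  [("id", "string"), ("names", "array<string>")])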
308 | fields[type_name].append((field_name, field_type)) 309 | 310 | for used in find_user_types(field["type"], type_namespace): 311 | # Announce all the user types it uses 312 | # print("\t\tContainment of {}".format(used)) 313 | 314 | # And record them 315 | containments.add((type_name, used, field_name)) 316 | 317 | if (field_name.lower() == "id" and 318 | u"string" in field_type): 319 | 320 | # This is a possible ID target. Decide what we would 321 | # expect to appear in an ID reference field name. 322 | target_name = type_to_display(type_name).lower() 323 | 324 | if id_targets.has_key(target_name): 325 | # This target is ambiguous. 326 | id_targets[target_name] = None 327 | # print("WARNING: ID target {} exists twice!") 328 | else: 329 | # Say it points here 330 | id_targets[target_name] = type_name 331 | 332 | # print("\t\tFound ID target {}".format(target_name)) 333 | 334 | elif (field_name.lower().endswith("id") or 335 | field_name.lower().endswith("ids")): 336 | # This is probably an ID reference 337 | 338 | if field_name.lower().endswith("id"): 339 | # Chop off exactly these characters 340 | destination = field_name.lower()[0:-2] 341 | elif field_name.lower().endswith("ids"): 342 | # Chop off these instead. TODO: this is super ugly 343 | # and regexes are better. 344 | destination = field_name.lower()[0:-3] 345 | 346 | # Announce and save the reference 347 | # print("\t\tFound ID reference to {}".format( 348 | # destination)) 349 | #Edit 2-23-16: id_references tuples now contains a third index to aid in constructing edges from specific cells in type_name 350 | id_references.add((type_name, destination, field_name)) 351 | 352 | # Now we have to match ID references to targets. This holds the actual 353 | # referencing edges, as (from, to) fully qualified name tuples. 354 | references = set() 355 | 356 | for from_name, to_target, from_field_name in id_references: 357 | # For each reference 358 | 359 | if id_targets.has_key(to_target): 360 | # We point to something, what is it? 361 | to_name = id_targets[to_target] 362 | 363 | if to_name is None: 364 | # We point to something that's ambiguous 365 | # print("WARNING: Ambiguous target {} used by {}!".format( 366 | # to_target, from_name)) 367 | pass 368 | else: 369 | # We point to a real thing. Add the edge. 370 | # print("Matched reference from {} to {} exactly".format( 371 | # from_name, to_name)) 372 | references.add((from_name, to_name, from_field_name)) 373 | 374 | else: 375 | # None of these targets matches exactly 376 | # print("WARNING: {} wanted target {} but it does not exist!".format( 377 | # from_name, to_target)) 378 | 379 | # We will find partial matches, and save them as target, full name 380 | # tuples. 381 | partial_matches = [] 382 | 383 | for actual_target, to_name in id_targets.iteritems(): 384 | # For each possible target, see if it is a partial match 385 | if (actual_target in to_target or 386 | to_target in actual_target): 387 | 388 | partial_matches.append((actual_target, to_name)) 389 | 390 | if len(partial_matches) == 1: 391 | # We found exactly one partial match. Unpack it! 
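                # (e.g. a "sampleId" field looking for target "sample" is a
                # partial match for an id target named "biosample")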
392 | actual_target, to_name = partial_matches[0] 393 | 394 | # Announce and record the match 395 | # print("WARNING: Matched reference from {} to {} on partial " 396 | # "match of {} and {}".format(from_name, to_name, to_target, 397 | # actual_target)) 398 | references.add((from_name, to_name, from_field_name)) 399 | elif len(partial_matches) > 1: 400 | # Complain we got no matches, or too many 401 | # print("WARNING: {} partial matches: {}".format( 402 | # len(partial_matches), 403 | # ", ".join([x[1] for x in partial_matches]))) 404 | pass 405 | 406 | 407 | 408 | return fields, containments, references, clusters, urls, type_comments 409 | 410 | def write_graph_ORIGINAL(dot_file, fields, containments, references): 411 | """ 412 | Given a file object to write to, a dict from type names to lists of (name, 413 | type) field tuples, a set of (container, containee) containment edges, and a 414 | set of (referencer, referencee) ID reference edges, and write a GraphViz 415 | UML. 416 | 417 | See 419 | 420 | """ 421 | 422 | # Start a digraph 423 | dot_file.write("digraph UML {\n") 424 | 425 | # Define node properties: shaped like UML items. 426 | dot_file.write("node [\n") 427 | dot_file.write("\tshape=record\n") 428 | dot_file.write("]\n") 429 | 430 | for type_name, field_list in fields.iteritems(): 431 | # Put a node for each type. 432 | dot_file.write("{} [\n".format(type_to_node(type_name))) 433 | 434 | # Start out the UML body bit with the class name 435 | dot_file.write("\tlabel=\"{{{}".format(type_to_display(type_name))) 436 | 437 | for field_name, field_type in field_list: 438 | # Put each field. Escape the field types. 439 | dot_file.write("|{} : {}".format(field_name, 440 | dot_escape(field_type))) 441 | 442 | # Close the label 443 | dot_file.write("}\"\n") 444 | 445 | # And the node 446 | dot_file.write("]\n") 447 | 448 | # Define edge properties for containments 449 | dot_file.write("edge [\n") 450 | dot_file.write("\tdir=both\n") 451 | dot_file.write("\tarrowtail=odiamond\n") 452 | dot_file.write("\tarrowhead=none\n") 453 | dot_file.write("]\n") 454 | 455 | for container, containee, container_field_name in containments: 456 | # Now do the containment edges 457 | dot_file.write("{} -> {}\n".format(type_to_node(container), 458 | type_to_node(containee))) 459 | 460 | # Define edge properties for references 461 | dot_file.write("edge [\n") 462 | dot_file.write("\tdir=both\n") 463 | dot_file.write("\tarrowtail=none\n") 464 | dot_file.write("\tarrowhead=vee\n") 465 | dot_file.write("\tstyle=dashed\n") 466 | dot_file.write("]\n") 467 | 468 | for referencer, referencee, local_referencee in references: 469 | # Now do the reference edges 470 | dot_file.write("{} -> {}\n".format(type_to_node(referencer), 471 | type_to_node(referencee))) 472 | 473 | # Close the digraph off. 474 | dot_file.write("}\n") 475 | 476 | def write_graph_with_clusters(dot_file, fields, containments, references, clusters, urls, type_comments): 477 | """ 478 | Given a file object to write to, a dict from type names to lists of (name, 479 | type) field tuples, a set of (container, containee) containment edges, and a 480 | set of (referencer, referencee) ID reference edges, and write a GraphViz 481 | UML. 482 | 483 | See 485 | 486 | """ 487 | 488 | # Breaks up a comment string so no more than ~57 characters are on each line 489 | def break_up_comment(comment): 490 | wrapper = textwrap.TextWrapper(break_long_words = False, width = 57) 491 | return "
".join(wrapper.wrap(comment)) 492 | 493 | # Start a digraph 494 | dot_file.write("digraph UML {\n") 495 | 496 | # Define node properties: shaped like UML items. 497 | dot_file.write("node [\n") 498 | dot_file.write("\tshape=plaintext\n") 499 | dot_file.write("]\n\n") 500 | 501 | # Draw each node/type/record as a table 502 | for type_name, field_list in fields.iteritems(): 503 | 504 | dot_file.write("{} [label=<\n".format(type_to_node(type_name)))#type_to_display(type_name))) 505 | dot_file.write("\n") 506 | dot_file.write("\t\n") 507 | dot_file.write("\t\t\n") 512 | dot_file.write("\t\n") 513 | 514 | 515 | # Now draw the rows of fields for the type. A field_list of [a, b, c, d, e, f, g] will have [a, e] in row 1, [b, f] in row 2, [c, g] in row 3, and just [d] in row 4 516 | num_fields = len(field_list) 517 | for i in range(0, num_fields//2 + num_fields%2): 518 | # Draw one row. 519 | dot_file.write("\t\n") 520 | # Port number and displayed text will be the i'th field's name 521 | dot_file.write("\t\t\n".format(field_list[i][0], field_list[i][0])) 522 | if (num_fields%2) == 1 and (i == num_fields//2 + num_fields%2 - 1): 523 | # Don't draw the second cell in the row if you have an odd number of fields and it is the last row 524 | pass 525 | else: 526 | dot_file.write("\t\t\n".format(field_list[num_fields//2 + num_fields%2 + i][0], field_list[num_fields//2 + num_fields%2 + i][0])) 527 | dot_file.write("\t\n") 528 | 529 | # Finish the table 530 | dot_file.write("
{}".format(type_to_display(type_name))) 508 | # Add option to specify description for header: 509 | if type_to_display(type_name) in type_comments: 510 | dot_file.write("
{}".format(break_up_comment(type_comments[type_to_display(type_name)]))) 511 | dot_file.write("
- {}- {}
>];\n\n") 531 | 532 | 533 | # Now define the clusters/subgraphs 534 | for cluster_name, cluster_types in clusters.iteritems(): 535 | # Use type_to_node to replace . with _ 536 | dot_file.write("subgraph cluster_{} {{\n".format(type_to_node(cluster_name))) 537 | dot_file.write("\tstyle=\"rounded, filled\";\n") 538 | dot_file.write("\tcolor=lightgrey;\n") 539 | dot_file.write("\tnode [style=filled,color=white];\n") 540 | dot_file.write("\tlabel = \"{}\";\n".format(cluster_name)) 541 | if cluster_name in urls: 542 | dot_file.write("\tURL=\"{}\";\n".format(urls[cluster_name])) 543 | #After all the cluster formatting, define the cluster types 544 | for cluster_type in cluster_types: 545 | dot_file.write("\t{};\n".format(type_to_node(cluster_type))) #cluster_type should match up with a type_name from fields 546 | dot_file.write("}\n\n") 547 | 548 | 549 | dot_file.write("\n// Define containment edges\n") 550 | # Define edge properties for containments 551 | dot_file.write("edge [\n") 552 | dot_file.write("\tdir=both\n") 553 | dot_file.write("\tarrowtail=odiamond\n") 554 | dot_file.write("\tarrowhead=none\n") 555 | dot_file.write("\tcolor=\"#C55A11\"\n") 556 | dot_file.write("\tpenwidth=2\n") 557 | dot_file.write("]\n\n") 558 | 559 | for container, containee, container_field_name in containments: 560 | # Now do the containment edges 561 | dot_file.write("{}:{}:w -> {}\n".format(type_to_node(container), 562 | container_field_name, type_to_node(containee))) 563 | 564 | dot_file.write("\n// Define references edges\n") 565 | # Define edge properties for references 566 | dot_file.write("\nedge [\n") 567 | dot_file.write("\tdir=both\n") 568 | dot_file.write("\tarrowtail=none\n") 569 | dot_file.write("\tarrowhead=vee\n") 570 | dot_file.write("\tstyle=dashed\n") 571 | dot_file.write("\tcolor=\"darkgreen\"\n") 572 | dot_file.write("\tpenwidth=2\n") 573 | dot_file.write("]\n\n") 574 | 575 | for referencer, referencee, local_referencee in references: 576 | # Now do the reference edges 577 | dot_file.write("{}:{}:w -> {}:id:w\n".format(type_to_node(referencer), local_referencee, 578 | type_to_node(referencee))) 579 | 580 | 581 | 582 | 583 | # Close the digraph off. 584 | dot_file.write("}\n") 585 | 586 | 587 | def main(args): 588 | """ 589 | Parses command line arguments, and does the work of the program. 590 | "args" specifies the program arguments, with args[0] being the executable 591 | name. The return value should be used as the program's exit code. 592 | """ 593 | 594 | options = parse_args(args) # This holds the nicely-parsed options object 595 | 596 | # Parse the AVPR files and get a dict of (field name, field type) tuple 597 | # lists for each user-defined type, a set of (container, containee) 598 | # containment relationships, an a similar set of reference relationships. 599 | fields, containments, references, clusters, urls, type_comments = parse_avprs(options.avprs, options.clusters, options.urls, options.type_comments) 600 | 601 | if options.dot is not None: 602 | # Now we do the output to GraphViz format. 
603 |         if bool(clusters): # check if the clusters dictionary is empty... if it isn't, draw the clusters
604 |             write_graph_with_clusters(options.dot, fields, containments, references, clusters, urls, type_comments)
605 |         else:
606 |             write_graph_ORIGINAL(options.dot, fields, containments, references)
607 | 
608 | 
609 | if __name__ == "__main__" :
610 |     sys.exit(main(sys.argv))
611 | 
--------------------------------------------------------------------------------
/protobuf2uml/example_svgs/bmeg_2016-06-08.svg:
--------------------------------------------------------------------------------
[SVG omitted: rendered UML diagram of the sample_proto cluster -- nodes for LinearSignature, Drug, Position, Evidence, Phenotype, OntologyTerm, Individual, PhenotypeAssociation, Biosample, VariantCall, VariantCallEffect, Feature, Domain, and GeneExpression, with dashed "*Edges" reference edges between them.]
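For reference, a hypothetical driver (not part of this repo) showing how avpr2uml.py's pieces fit together when imported as a module under Python 2.7, assuming a schemas_avpr/ directory populated with common.avpr, metadata.avpr, and reads.avpr:

# make_dot.py -- illustrative only
import avpr2uml

options = avpr2uml.parse_args(["make_dot.py",
    "--clusters", "common metadata reads",  # .avdl basenames, in import order
    "--dot", "uml.dot"])

fields, containments, references, clusters, urls, type_comments = \
    avpr2uml.parse_avprs(options.avprs, options.clusters,
                         options.urls, options.type_comments)

# clusters is non-empty here, so draw the clustered variant of the diagram.
avpr2uml.write_graph_with_clusters(options.dot, fields, containments,
    references, clusters, urls, type_comments)
options.dot.close()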
--------------------------------------------------------------------------------
/protobuf2uml/descriptor.proto:
--------------------------------------------------------------------------------
1 | // Protocol Buffers - Google's data interchange format
2 | // Copyright 2008 Google Inc. All rights reserved.
3 | // https://developers.google.com/protocol-buffers/
4 | //
5 | // Redistribution and use in source and binary forms, with or without
6 | // modification, are permitted provided that the following conditions are
7 | // met:
8 | //
9 | //     * Redistributions of source code must retain the above copyright
10 | // notice, this list of conditions and the following disclaimer.
11 | //     * Redistributions in binary form must reproduce the above
12 | // copyright notice, this list of conditions and the following disclaimer
13 | // in the documentation and/or other materials provided with the
14 | // distribution.
15 | //     * Neither the name of Google Inc. nor the names of its
16 | // contributors may be used to endorse or promote products derived from
17 | // this software without specific prior written permission.
18 | //
19 | // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20 | // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21 | // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22 | // A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23 | // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24 | // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25 | // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26 | // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27 | // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28 | // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 | 
31 | // Author: kenton@google.com (Kenton Varda)
32 | // Based on original Protocol Buffers design by
33 | // Sanjay Ghemawat, Jeff Dean, and others.
34 | //
35 | // The messages in this file describe the definitions found in .proto files.
36 | // A valid .proto file can be translated directly to a FileDescriptorProto
37 | // without any other information (e.g. without reading its imports).
38 | 39 | 40 | syntax = "proto2"; 41 | 42 | package google.protobuf; 43 | option go_package = "descriptor"; 44 | option java_package = "com.google.protobuf"; 45 | option java_outer_classname = "DescriptorProtos"; 46 | option csharp_namespace = "Google.Protobuf.Reflection"; 47 | option objc_class_prefix = "GPB"; 48 | 49 | // descriptor.proto must be optimized for speed because reflection-based 50 | // algorithms don't work during bootstrapping. 51 | option optimize_for = SPEED; 52 | 53 | // The protocol compiler can output a FileDescriptorSet containing the .proto 54 | // files it parses. 55 | message FileDescriptorSet { 56 | repeated FileDescriptorProto file = 1; 57 | } 58 | 59 | // Describes a complete .proto file. 60 | message FileDescriptorProto { 61 | optional string name = 1; // file name, relative to root of source tree 62 | optional string package = 2; // e.g. "foo", "foo.bar", etc. 63 | 64 | // Names of files imported by this file. 65 | repeated string dependency = 3; 66 | // Indexes of the public imported files in the dependency list above. 67 | repeated int32 public_dependency = 10; 68 | // Indexes of the weak imported files in the dependency list. 69 | // For Google-internal migration only. Do not use. 70 | repeated int32 weak_dependency = 11; 71 | 72 | // All top-level definitions in this file. 73 | repeated DescriptorProto message_type = 4; 74 | repeated EnumDescriptorProto enum_type = 5; 75 | repeated ServiceDescriptorProto service = 6; 76 | repeated FieldDescriptorProto extension = 7; 77 | 78 | optional FileOptions options = 8; 79 | 80 | // This field contains optional information about the original source code. 81 | // You may safely remove this entire field without harming runtime 82 | // functionality of the descriptors -- the information is needed only by 83 | // development tools. 84 | optional SourceCodeInfo source_code_info = 9; 85 | 86 | // The syntax of the proto file. 87 | // The supported values are "proto2" and "proto3". 88 | optional string syntax = 12; 89 | } 90 | 91 | // Describes a message type. 92 | message DescriptorProto { 93 | optional string name = 1; 94 | 95 | repeated FieldDescriptorProto field = 2; 96 | repeated FieldDescriptorProto extension = 6; 97 | 98 | repeated DescriptorProto nested_type = 3; 99 | repeated EnumDescriptorProto enum_type = 4; 100 | 101 | message ExtensionRange { 102 | optional int32 start = 1; 103 | optional int32 end = 2; 104 | } 105 | repeated ExtensionRange extension_range = 5; 106 | 107 | repeated OneofDescriptorProto oneof_decl = 8; 108 | 109 | optional MessageOptions options = 7; 110 | 111 | // Range of reserved tag numbers. Reserved tag numbers may not be used by 112 | // fields or extension ranges in the same message. Reserved ranges may 113 | // not overlap. 114 | message ReservedRange { 115 | optional int32 start = 1; // Inclusive. 116 | optional int32 end = 2; // Exclusive. 117 | } 118 | repeated ReservedRange reserved_range = 9; 119 | // Reserved field names, which may not be used by fields in the same message. 120 | // A given name may only be reserved once. 121 | repeated string reserved_name = 10; 122 | } 123 | 124 | // Describes a field within a message. 125 | message FieldDescriptorProto { 126 | enum Type { 127 | // 0 is reserved for errors. 128 | // Order is weird for historical reasons. 129 | TYPE_DOUBLE = 1; 130 | TYPE_FLOAT = 2; 131 | // Not ZigZag encoded. Negative numbers take 10 bytes. Use TYPE_SINT64 if 132 | // negative values are likely. 
133 | TYPE_INT64 = 3; 134 | TYPE_UINT64 = 4; 135 | // Not ZigZag encoded. Negative numbers take 10 bytes. Use TYPE_SINT32 if 136 | // negative values are likely. 137 | TYPE_INT32 = 5; 138 | TYPE_FIXED64 = 6; 139 | TYPE_FIXED32 = 7; 140 | TYPE_BOOL = 8; 141 | TYPE_STRING = 9; 142 | TYPE_GROUP = 10; // Tag-delimited aggregate. 143 | TYPE_MESSAGE = 11; // Length-delimited aggregate. 144 | 145 | // New in version 2. 146 | TYPE_BYTES = 12; 147 | TYPE_UINT32 = 13; 148 | TYPE_ENUM = 14; 149 | TYPE_SFIXED32 = 15; 150 | TYPE_SFIXED64 = 16; 151 | TYPE_SINT32 = 17; // Uses ZigZag encoding. 152 | TYPE_SINT64 = 18; // Uses ZigZag encoding. 153 | }; 154 | 155 | enum Label { 156 | // 0 is reserved for errors 157 | LABEL_OPTIONAL = 1; 158 | LABEL_REQUIRED = 2; 159 | LABEL_REPEATED = 3; 160 | // TODO(sanjay): Should we add LABEL_MAP? 161 | }; 162 | 163 | optional string name = 1; 164 | optional int32 number = 3; 165 | optional Label label = 4; 166 | 167 | // If type_name is set, this need not be set. If both this and type_name 168 | // are set, this must be one of TYPE_ENUM, TYPE_MESSAGE or TYPE_GROUP. 169 | optional Type type = 5; 170 | 171 | // For message and enum types, this is the name of the type. If the name 172 | // starts with a '.', it is fully-qualified. Otherwise, C++-like scoping 173 | // rules are used to find the type (i.e. first the nested types within this 174 | // message are searched, then within the parent, on up to the root 175 | // namespace). 176 | optional string type_name = 6; 177 | 178 | // For extensions, this is the name of the type being extended. It is 179 | // resolved in the same manner as type_name. 180 | optional string extendee = 2; 181 | 182 | // For numeric types, contains the original text representation of the value. 183 | // For booleans, "true" or "false". 184 | // For strings, contains the default text contents (not escaped in any way). 185 | // For bytes, contains the C escaped value. All bytes >= 128 are escaped. 186 | // TODO(kenton): Base-64 encode? 187 | optional string default_value = 7; 188 | 189 | // If set, gives the index of a oneof in the containing type's oneof_decl 190 | // list. This field is a member of that oneof. 191 | optional int32 oneof_index = 9; 192 | 193 | // JSON name of this field. The value is set by protocol compiler. If the 194 | // user has set a "json_name" option on this field, that option's value 195 | // will be used. Otherwise, it's deduced from the field's name by converting 196 | // it to camelCase. 197 | optional string json_name = 10; 198 | 199 | optional FieldOptions options = 8; 200 | } 201 | 202 | // Describes a oneof. 203 | message OneofDescriptorProto { 204 | optional string name = 1; 205 | } 206 | 207 | // Describes an enum type. 208 | message EnumDescriptorProto { 209 | optional string name = 1; 210 | 211 | repeated EnumValueDescriptorProto value = 2; 212 | 213 | optional EnumOptions options = 3; 214 | } 215 | 216 | // Describes a value within an enum. 217 | message EnumValueDescriptorProto { 218 | optional string name = 1; 219 | optional int32 number = 2; 220 | 221 | optional EnumValueOptions options = 3; 222 | } 223 | 224 | // Describes a service. 225 | message ServiceDescriptorProto { 226 | optional string name = 1; 227 | repeated MethodDescriptorProto method = 2; 228 | 229 | optional ServiceOptions options = 3; 230 | } 231 | 232 | // Describes a method of a service. 233 | message MethodDescriptorProto { 234 | optional string name = 1; 235 | 236 | // Input and output type names. 
These are resolved in the same way as 237 | // FieldDescriptorProto.type_name, but must refer to a message type. 238 | optional string input_type = 2; 239 | optional string output_type = 3; 240 | 241 | optional MethodOptions options = 4; 242 | 243 | // Identifies if client streams multiple client messages 244 | optional bool client_streaming = 5 [default=false]; 245 | // Identifies if server streams multiple server messages 246 | optional bool server_streaming = 6 [default=false]; 247 | } 248 | 249 | 250 | // =================================================================== 251 | // Options 252 | 253 | // Each of the definitions above may have "options" attached. These are 254 | // just annotations which may cause code to be generated slightly differently 255 | // or may contain hints for code that manipulates protocol messages. 256 | // 257 | // Clients may define custom options as extensions of the *Options messages. 258 | // These extensions may not yet be known at parsing time, so the parser cannot 259 | // store the values in them. Instead it stores them in a field in the *Options 260 | // message called uninterpreted_option. This field must have the same name 261 | // across all *Options messages. We then use this field to populate the 262 | // extensions when we build a descriptor, at which point all protos have been 263 | // parsed and so all extensions are known. 264 | // 265 | // Extension numbers for custom options may be chosen as follows: 266 | // * For options which will only be used within a single application or 267 | // organization, or for experimental options, use field numbers 50000 268 | // through 99999. It is up to you to ensure that you do not use the 269 | // same number for multiple options. 270 | // * For options which will be published and used publicly by multiple 271 | // independent entities, e-mail protobuf-global-extension-registry@google.com 272 | // to reserve extension numbers. Simply provide your project name (e.g. 273 | // Objective-C plugin) and your project website (if available) -- there's no 274 | // need to explain how you intend to use them. Usually you only need one 275 | // extension number. You can declare multiple options with only one extension 276 | // number by putting them in a sub-message. See the Custom Options section of 277 | // the docs for examples: 278 | // https://developers.google.com/protocol-buffers/docs/proto#options 279 | // If this turns out to be popular, a web service will be set up 280 | // to automatically assign option numbers. 281 | 282 | 283 | message FileOptions { 284 | 285 | // Sets the Java package where classes generated from this .proto will be 286 | // placed. By default, the proto package is used, but this is often 287 | // inappropriate because proto packages do not normally start with backwards 288 | // domain names. 289 | optional string java_package = 1; 290 | 291 | 292 | // If set, all the classes from the .proto file are wrapped in a single 293 | // outer class with the given name. This applies to both Proto1 294 | // (equivalent to the old "--one_java_file" option) and Proto2 (where 295 | // a .proto always translates to a single class, but you may want to 296 | // explicitly choose the class name). 297 | optional string java_outer_classname = 8; 298 | 299 | // If set true, then the Java code generator will generate a separate .java 300 | // file for each top-level message, enum, and service defined in the .proto 301 | // file. 
Thus, these types will *not* be nested inside the outer class 302 | // named by java_outer_classname. However, the outer class will still be 303 | // generated to contain the file's getDescriptor() method as well as any 304 | // top-level extensions defined in the file. 305 | optional bool java_multiple_files = 10 [default=false]; 306 | 307 | // If set true, then the Java code generator will generate equals() and 308 | // hashCode() methods for all messages defined in the .proto file. 309 | // This increases generated code size, potentially substantially for large 310 | // protos, which may harm a memory-constrained application. 311 | // - In the full runtime this is a speed optimization, as the 312 | // AbstractMessage base class includes reflection-based implementations of 313 | // these methods. 314 | // - In the lite runtime, setting this option changes the semantics of 315 | // equals() and hashCode() to more closely match those of the full runtime; 316 | // the generated methods compute their results based on field values rather 317 | // than object identity. (Implementations should not assume that hashcodes 318 | // will be consistent across runtimes or versions of the protocol compiler.) 319 | optional bool java_generate_equals_and_hash = 20 [default=false]; 320 | 321 | // If set true, then the Java2 code generator will generate code that 322 | // throws an exception whenever an attempt is made to assign a non-UTF-8 323 | // byte sequence to a string field. 324 | // Message reflection will do the same. 325 | // However, an extension field still accepts non-UTF-8 byte sequences. 326 | // This option has no effect on when used with the lite runtime. 327 | optional bool java_string_check_utf8 = 27 [default=false]; 328 | 329 | 330 | // Generated classes can be optimized for speed or code size. 331 | enum OptimizeMode { 332 | SPEED = 1; // Generate complete code for parsing, serialization, 333 | // etc. 334 | CODE_SIZE = 2; // Use ReflectionOps to implement these methods. 335 | LITE_RUNTIME = 3; // Generate code using MessageLite and the lite runtime. 336 | } 337 | optional OptimizeMode optimize_for = 9 [default=SPEED]; 338 | 339 | // Sets the Go package where structs generated from this .proto will be 340 | // placed. If omitted, the Go package will be derived from the following: 341 | // - The basename of the package import path, if provided. 342 | // - Otherwise, the package statement in the .proto file, if present. 343 | // - Otherwise, the basename of the .proto file, without extension. 344 | optional string go_package = 11; 345 | 346 | 347 | 348 | // Should generic services be generated in each language? "Generic" services 349 | // are not specific to any particular RPC system. They are generated by the 350 | // main code generators in each language (without additional plugins). 351 | // Generic services were the only kind of service generation supported by 352 | // early versions of google.protobuf. 353 | // 354 | // Generic services are now considered deprecated in favor of using plugins 355 | // that generate code specific to your particular RPC system. Therefore, 356 | // these default to false. Old code which depends on generic services should 357 | // explicitly set them to true. 358 | optional bool cc_generic_services = 16 [default=false]; 359 | optional bool java_generic_services = 17 [default=false]; 360 | optional bool py_generic_services = 18 [default=false]; 361 | 362 | // Is this file deprecated? 
363 | // Depending on the target platform, this can emit Deprecated annotations 364 | // for everything in the file, or it will be completely ignored; in the very 365 | // least, this is a formalization for deprecating files. 366 | optional bool deprecated = 23 [default=false]; 367 | 368 | // Enables the use of arenas for the proto messages in this file. This applies 369 | // only to generated classes for C++. 370 | optional bool cc_enable_arenas = 31 [default=false]; 371 | 372 | 373 | // Sets the objective c class prefix which is prepended to all objective c 374 | // generated classes from this .proto. There is no default. 375 | optional string objc_class_prefix = 36; 376 | 377 | // Namespace for generated classes; defaults to the package. 378 | optional string csharp_namespace = 37; 379 | 380 | // Whether the nano proto compiler should generate in the deprecated non-nano 381 | // suffixed package. 382 | optional bool javanano_use_deprecated_package = 38; 383 | 384 | // The parser stores options it doesn't recognize here. See above. 385 | repeated UninterpretedOption uninterpreted_option = 999; 386 | 387 | // Clients can define custom options in extensions of this message. See above. 388 | extensions 1000 to max; 389 | } 390 | 391 | message MessageOptions { 392 | // Set true to use the old proto1 MessageSet wire format for extensions. 393 | // This is provided for backwards-compatibility with the MessageSet wire 394 | // format. You should not use this for any other reason: It's less 395 | // efficient, has fewer features, and is more complicated. 396 | // 397 | // The message must be defined exactly as follows: 398 | // message Foo { 399 | // option message_set_wire_format = true; 400 | // extensions 4 to max; 401 | // } 402 | // Note that the message cannot have any defined fields; MessageSets only 403 | // have extensions. 404 | // 405 | // All extensions of your type must be singular messages; e.g. they cannot 406 | // be int32s, enums, or repeated messages. 407 | // 408 | // Because this is an option, the above two restrictions are not enforced by 409 | // the protocol compiler. 410 | optional bool message_set_wire_format = 1 [default=false]; 411 | 412 | // Disables the generation of the standard "descriptor()" accessor, which can 413 | // conflict with a field of the same name. This is meant to make migration 414 | // from proto1 easier; new code should avoid fields named "descriptor". 415 | optional bool no_standard_descriptor_accessor = 2 [default=false]; 416 | 417 | // Is this message deprecated? 418 | // Depending on the target platform, this can emit Deprecated annotations 419 | // for the message, or it will be completely ignored; in the very least, 420 | // this is a formalization for deprecating messages. 421 | optional bool deprecated = 3 [default=false]; 422 | 423 | // Whether the message is an automatically generated map entry type for the 424 | // maps field. 425 | // 426 | // For maps fields: 427 | // map map_field = 1; 428 | // The parsed descriptor looks like: 429 | // message MapFieldEntry { 430 | // option map_entry = true; 431 | // optional KeyType key = 1; 432 | // optional ValueType value = 2; 433 | // } 434 | // repeated MapFieldEntry map_field = 1; 435 | // 436 | // Implementations may choose not to generate the map_entry=true message, but 437 | // use a native map in the target language to hold the keys and values. 438 | // The reflection APIs in such implementions still need to work as 439 | // if the field is a repeated message field. 
440 | // 441 | // NOTE: Do not set the option in .proto files. Always use the maps syntax 442 | // instead. The option should only be implicitly set by the proto compiler 443 | // parser. 444 | optional bool map_entry = 7; 445 | 446 | // The parser stores options it doesn't recognize here. See above. 447 | repeated UninterpretedOption uninterpreted_option = 999; 448 | 449 | // Clients can define custom options in extensions of this message. See above. 450 | extensions 1000 to max; 451 | } 452 | 453 | message FieldOptions { 454 | // The ctype option instructs the C++ code generator to use a different 455 | // representation of the field than it normally would. See the specific 456 | // options below. This option is not yet implemented in the open source 457 | // release -- sorry, we'll try to include it in a future version! 458 | optional CType ctype = 1 [default = STRING]; 459 | enum CType { 460 | // Default mode. 461 | STRING = 0; 462 | 463 | CORD = 1; 464 | 465 | STRING_PIECE = 2; 466 | } 467 | // The packed option can be enabled for repeated primitive fields to enable 468 | // a more efficient representation on the wire. Rather than repeatedly 469 | // writing the tag and type for each element, the entire array is encoded as 470 | // a single length-delimited blob. In proto3, only explicit setting it to 471 | // false will avoid using packed encoding. 472 | optional bool packed = 2; 473 | 474 | 475 | // The jstype option determines the JavaScript type used for values of the 476 | // field. The option is permitted only for 64 bit integral and fixed types 477 | // (int64, uint64, sint64, fixed64, sfixed64). By default these types are 478 | // represented as JavaScript strings. This avoids loss of precision that can 479 | // happen when a large value is converted to a floating point JavaScript 480 | // numbers. Specifying JS_NUMBER for the jstype causes the generated 481 | // JavaScript code to use the JavaScript "number" type instead of strings. 482 | // This option is an enum to permit additional types to be added, 483 | // e.g. goog.math.Integer. 484 | optional JSType jstype = 6 [default = JS_NORMAL]; 485 | enum JSType { 486 | // Use the default type. 487 | JS_NORMAL = 0; 488 | 489 | // Use JavaScript strings. 490 | JS_STRING = 1; 491 | 492 | // Use JavaScript numbers. 493 | JS_NUMBER = 2; 494 | } 495 | 496 | // Should this field be parsed lazily? Lazy applies only to message-type 497 | // fields. It means that when the outer message is initially parsed, the 498 | // inner message's contents will not be parsed but instead stored in encoded 499 | // form. The inner message will actually be parsed when it is first accessed. 500 | // 501 | // This is only a hint. Implementations are free to choose whether to use 502 | // eager or lazy parsing regardless of the value of this option. However, 503 | // setting this option true suggests that the protocol author believes that 504 | // using lazy parsing on this field is worth the additional bookkeeping 505 | // overhead typically needed to implement it. 506 | // 507 | // This option does not affect the public interface of any generated code; 508 | // all method signatures remain the same. Furthermore, thread-safety of the 509 | // interface is not affected by this option; const methods remain safe to 510 | // call from multiple threads concurrently, while non-const methods continue 511 | // to require exclusive access. 
512 | // 513 | // 514 | // Note that implementations may choose not to check required fields within 515 | // a lazy sub-message. That is, calling IsInitialized() on the outher message 516 | // may return true even if the inner message has missing required fields. 517 | // This is necessary because otherwise the inner message would have to be 518 | // parsed in order to perform the check, defeating the purpose of lazy 519 | // parsing. An implementation which chooses not to check required fields 520 | // must be consistent about it. That is, for any particular sub-message, the 521 | // implementation must either *always* check its required fields, or *never* 522 | // check its required fields, regardless of whether or not the message has 523 | // been parsed. 524 | optional bool lazy = 5 [default=false]; 525 | 526 | // Is this field deprecated? 527 | // Depending on the target platform, this can emit Deprecated annotations 528 | // for accessors, or it will be completely ignored; in the very least, this 529 | // is a formalization for deprecating fields. 530 | optional bool deprecated = 3 [default=false]; 531 | 532 | // For Google-internal migration only. Do not use. 533 | optional bool weak = 10 [default=false]; 534 | 535 | 536 | // The parser stores options it doesn't recognize here. See above. 537 | repeated UninterpretedOption uninterpreted_option = 999; 538 | 539 | // Clients can define custom options in extensions of this message. See above. 540 | extensions 1000 to max; 541 | } 542 | 543 | message EnumOptions { 544 | 545 | // Set this option to true to allow mapping different tag names to the same 546 | // value. 547 | optional bool allow_alias = 2; 548 | 549 | // Is this enum deprecated? 550 | // Depending on the target platform, this can emit Deprecated annotations 551 | // for the enum, or it will be completely ignored; in the very least, this 552 | // is a formalization for deprecating enums. 553 | optional bool deprecated = 3 [default=false]; 554 | 555 | // The parser stores options it doesn't recognize here. See above. 556 | repeated UninterpretedOption uninterpreted_option = 999; 557 | 558 | // Clients can define custom options in extensions of this message. See above. 559 | extensions 1000 to max; 560 | } 561 | 562 | message EnumValueOptions { 563 | // Is this enum value deprecated? 564 | // Depending on the target platform, this can emit Deprecated annotations 565 | // for the enum value, or it will be completely ignored; in the very least, 566 | // this is a formalization for deprecating enum values. 567 | optional bool deprecated = 1 [default=false]; 568 | 569 | // The parser stores options it doesn't recognize here. See above. 570 | repeated UninterpretedOption uninterpreted_option = 999; 571 | 572 | // Clients can define custom options in extensions of this message. See above. 573 | extensions 1000 to max; 574 | } 575 | 576 | message ServiceOptions { 577 | 578 | // Note: Field numbers 1 through 32 are reserved for Google's internal RPC 579 | // framework. We apologize for hoarding these numbers to ourselves, but 580 | // we were already using them long before we decided to release Protocol 581 | // Buffers. 582 | 583 | // Is this service deprecated? 584 | // Depending on the target platform, this can emit Deprecated annotations 585 | // for the service, or it will be completely ignored; in the very least, 586 | // this is a formalization for deprecating services. 
494 |   optional JSType jstype = 6 [default = JS_NORMAL];
495 |   enum JSType {
496 |     // Use the default type.
497 |     JS_NORMAL = 0;
498 | 
499 |     // Use JavaScript strings.
500 |     JS_STRING = 1;
501 | 
502 |     // Use JavaScript numbers.
503 |     JS_NUMBER = 2;
504 |   }
505 | 
506 |   // Should this field be parsed lazily?  Lazy applies only to message-type
507 |   // fields.  It means that when the outer message is initially parsed, the
508 |   // inner message's contents will not be parsed but instead stored in encoded
509 |   // form.  The inner message will actually be parsed when it is first accessed.
510 |   //
511 |   // This is only a hint.  Implementations are free to choose whether to use
512 |   // eager or lazy parsing regardless of the value of this option.  However,
513 |   // setting this option true suggests that the protocol author believes that
514 |   // using lazy parsing on this field is worth the additional bookkeeping
515 |   // overhead typically needed to implement it.
516 |   //
517 |   // This option does not affect the public interface of any generated code;
518 |   // all method signatures remain the same.  Furthermore, thread-safety of the
519 |   // interface is not affected by this option; const methods remain safe to
520 |   // call from multiple threads concurrently, while non-const methods continue
521 |   // to require exclusive access.
522 |   //
523 |   //
524 |   // Note that implementations may choose not to check required fields within
525 |   // a lazy sub-message.  That is, calling IsInitialized() on the outer message
526 |   // may return true even if the inner message has missing required fields.
527 |   // This is necessary because otherwise the inner message would have to be
528 |   // parsed in order to perform the check, defeating the purpose of lazy
529 |   // parsing.  An implementation which chooses not to check required fields
530 |   // must be consistent about it.  That is, for any particular sub-message, the
531 |   // implementation must either *always* check its required fields, or *never*
532 |   // check its required fields, regardless of whether or not the message has
533 |   // been parsed.
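534 |   // As a sketch (the message and field names are illustrative only):
535 |   //   optional LargePayload payload = 8 [lazy=true];
536 |   // hints that the encoded sub-message should be kept unparsed until
537 |   // the first accessor call.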
538 |   optional bool lazy = 5 [default=false];
539 | 
540 |   // Is this field deprecated?
541 |   // Depending on the target platform, this can emit Deprecated annotations
542 |   // for accessors, or it will be completely ignored; at the very least, this
543 |   // is a formalization for deprecating fields.
544 |   optional bool deprecated = 3 [default=false];
545 | 
546 |   // For Google-internal migration only.  Do not use.
547 |   optional bool weak = 10 [default=false];
548 | 
549 | 
550 |   // The parser stores options it doesn't recognize here. See above.
551 |   repeated UninterpretedOption uninterpreted_option = 999;
552 | 
553 |   // Clients can define custom options in extensions of this message. See above.
554 |   extensions 1000 to max;
555 | }
556 | 
557 | message EnumOptions {
558 | 
559 |   // Set this option to true to allow mapping different tag names to the same
560 |   // value.
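561 |   // As a sketch (enum and value names are illustrative only):
562 |   //   enum Status {
563 |   //     option allow_alias = true;
564 |   //     STATUS_OK = 0;
565 |   //     STATUS_SUCCESS = 0;  // alias of STATUS_OK
566 |   //   }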
567 |   optional bool allow_alias = 2;
568 | 
569 |   // Is this enum deprecated?
570 |   // Depending on the target platform, this can emit Deprecated annotations
571 |   // for the enum, or it will be completely ignored; at the very least, this
572 |   // is a formalization for deprecating enums.
573 |   optional bool deprecated = 3 [default=false];
574 | 
575 |   // The parser stores options it doesn't recognize here. See above.
576 |   repeated UninterpretedOption uninterpreted_option = 999;
577 | 
578 |   // Clients can define custom options in extensions of this message. See above.
579 |   extensions 1000 to max;
580 | }
581 | 
582 | message EnumValueOptions {
583 |   // Is this enum value deprecated?
584 |   // Depending on the target platform, this can emit Deprecated annotations
585 |   // for the enum value, or it will be completely ignored; at the very least,
586 |   // this is a formalization for deprecating enum values.
587 |   optional bool deprecated = 1 [default=false];
588 | 
589 |   // The parser stores options it doesn't recognize here. See above.
590 |   repeated UninterpretedOption uninterpreted_option = 999;
591 | 
592 |   // Clients can define custom options in extensions of this message. See above.
593 |   extensions 1000 to max;
594 | }
595 | 
596 | message ServiceOptions {
597 | 
598 |   // Note:  Field numbers 1 through 32 are reserved for Google's internal RPC
599 |   // framework.  We apologize for hoarding these numbers to ourselves, but
600 |   // we were already using them long before we decided to release Protocol
601 |   // Buffers.
602 | 
603 |   // Is this service deprecated?
604 |   // Depending on the target platform, this can emit Deprecated annotations
605 |   // for the service, or it will be completely ignored; at the very least,
606 |   // this is a formalization for deprecating services.
607 |   optional bool deprecated = 33 [default=false];
608 | 
609 |   // The parser stores options it doesn't recognize here. See above.
610 |   repeated UninterpretedOption uninterpreted_option = 999;
611 | 
612 |   // Clients can define custom options in extensions of this message. See above.
613 |   extensions 1000 to max;
614 | }
615 | 
616 | message MethodOptions {
617 | 
618 |   // Note:  Field numbers 1 through 32 are reserved for Google's internal RPC
619 |   // framework.  We apologize for hoarding these numbers to ourselves, but
620 |   // we were already using them long before we decided to release Protocol
621 |   // Buffers.
622 | 
623 |   // Is this method deprecated?
624 |   // Depending on the target platform, this can emit Deprecated annotations
625 |   // for the method, or it will be completely ignored; at the very least,
626 |   // this is a formalization for deprecating methods.
627 |   optional bool deprecated = 33 [default=false];
628 | 
629 |   // The parser stores options it doesn't recognize here. See above.
630 |   repeated UninterpretedOption uninterpreted_option = 999;
631 | 
632 |   // Clients can define custom options in extensions of this message. See above.
633 |   extensions 1000 to max;
634 | }
635 | 
636 | 
637 | // A message representing an option the parser does not recognize.  This only
638 | // appears in options protos created by the compiler::Parser class.
639 | // DescriptorPool resolves these when building Descriptor objects.  Therefore,
640 | // options protos in descriptor objects (e.g. returned by Descriptor::options(),
641 | // or produced by Descriptor::CopyTo()) will never have UninterpretedOptions
642 | // in them.
643 | message UninterpretedOption {
644 |   // The name of the uninterpreted option.  Each string represents a segment in
645 |   // a dot-separated name.  is_extension is true iff a segment represents an
646 |   // extension (denoted with parentheses in options specs in .proto files).
647 |   // E.g., { ["foo", false], ["bar.baz", true], ["qux", false] } represents
648 |   // "foo.(bar.baz).qux".
649 |   message NamePart {
650 |     required string name_part = 1;
651 |     required bool is_extension = 2;
652 |   }
653 |   repeated NamePart name = 2;
654 | 
655 |   // The value of the uninterpreted option, in whatever type the tokenizer
656 |   // identified it as during parsing.  Exactly one of these should be set.
657 |   optional string identifier_value = 3;
658 |   optional uint64 positive_int_value = 4;
659 |   optional int64 negative_int_value = 5;
660 |   optional double double_value = 6;
661 |   optional bytes string_value = 7;
662 |   optional string aggregate_value = 8;
663 | }
664 | 
665 | // ===================================================================
666 | // Optional source code info
667 | 
668 | // Encapsulates information about the original source file from which a
669 | // FileDescriptorProto was generated.
670 | message SourceCodeInfo {
671 |   // A Location identifies a piece of source code in a .proto file which
672 |   // corresponds to a particular definition.  This information is intended
673 |   // to be useful to IDEs, code indexers, documentation generators, and similar
674 |   // tools.
675 |   //
676 |   // For example, say we have a file like:
677 |   //   message Foo {
678 |   //     optional string foo = 1;
679 |   //   }
680 |   // Let's look at just the field definition:
681 |   //   optional string foo = 1;
682 |   //   ^       ^^     ^^  ^  ^^^
683 |   //   a       bc     de  f  ghi
684 |   // We have the following locations:
685 |   //   span   path               represents
686 |   //   [a,i)  [ 4, 0, 2, 0 ]     The whole field definition.
687 |   //   [a,b)  [ 4, 0, 2, 0, 4 ]  The label (optional).
688 |   //   [c,d)  [ 4, 0, 2, 0, 5 ]  The type (string).
689 |   //   [e,f)  [ 4, 0, 2, 0, 1 ]  The name (foo).
690 |   //   [g,h)  [ 4, 0, 2, 0, 3 ]  The number (1).
691 |   //
692 |   // Notes:
693 |   // - A location may refer to a repeated field itself (i.e. not to any
694 |   //   particular index within it).  This is used whenever a set of elements is
695 |   //   logically enclosed in a single code segment.  For example, an entire
696 |   //   extend block (possibly containing multiple extension definitions) will
697 |   //   have an outer location whose path refers to the "extensions" repeated
698 |   //   field without an index.
699 |   // - Multiple locations may have the same path.  This happens when a single
700 |   //   logical declaration is spread out across multiple places.  The most
701 |   //   obvious example is the "extend" block again -- there may be multiple
702 |   //   extend blocks in the same scope, each of which will have the same path.
703 |   // - A location's span is not always a subset of its parent's span.  For
704 |   //   example, the "extendee" of an extension declaration appears at the
705 |   //   beginning of the "extend" block and is shared by all extensions within
706 |   //   the block.
707 |   // - Just because a location's span is a subset of some other location's span
708 |   //   does not mean that it is a descendant.  For example, a "group" defines
709 |   //   both a type and a field in a single declaration.  Thus, the locations
710 |   //   corresponding to the type and field and their components will overlap.
711 |   // - Code which tries to interpret locations should probably be designed to
712 |   //   ignore those that it doesn't understand, as more types of locations could
713 |   //   be recorded in the future.
714 |   repeated Location location = 1;
715 |   message Location {
716 |     // Identifies which part of the FileDescriptorProto was defined at this
717 |     // location.
718 |     //
719 |     // Each element is a field number or an index.  They form a path from
720 |     // the root FileDescriptorProto to the place where the definition occurs.
721 |     // For example, this path:
722 |     //   [ 4, 3, 2, 7, 1 ]
723 |     // refers to:
724 |     //   file.message_type(3)  // 4, 3
725 |     //       .field(7)         // 2, 7
726 |     //       .name()           // 1
727 |     // This is because FileDescriptorProto.message_type has field number 4:
728 |     //   repeated DescriptorProto message_type = 4;
729 |     // and DescriptorProto.field has field number 2:
730 |     //   repeated FieldDescriptorProto field = 2;
731 |     // and FieldDescriptorProto.name has field number 1:
732 |     //   optional string name = 1;
733 |     //
734 |     // Thus, the above path gives the location of a field name.  If we removed
735 |     // the last element:
736 |     //   [ 4, 3, 2, 7 ]
737 |     // this path refers to the whole field declaration (from the beginning
738 |     // of the label to the terminating semicolon).
739 |     repeated int32 path = 1 [packed=true];
740 | 
741 |     // Always has exactly three or four elements: start line, start column,
742 |     // end line (optional, otherwise assumed same as start line), end column.
743 |     // These are packed into a single field for efficiency.  Note that line
744 |     // and column numbers are zero-based -- typically you will want to add
745 |     // 1 to each before displaying to a user.
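746 |     // For example, the three-element span [ 4, 2, 21 ] describes a
747 |     // declaration confined to (zero-based) line 4, from column 2 to
748 |     // column 21 -- line 5 as a user would see it.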
749 |     repeated int32 span = 2 [packed=true];
750 | 
751 |     // If this SourceCodeInfo represents a complete declaration, these are any
752 |     // comments appearing before and after the declaration which appear to be
753 |     // attached to the declaration.
754 |     //
755 |     // A series of line comments appearing on consecutive lines, with no other
756 |     // tokens appearing on those lines, will be treated as a single comment.
757 |     //
758 |     // leading_detached_comments will keep paragraphs of comments that appear
759 |     // before (but not connected to) the current element.  Each paragraph,
760 |     // separated by empty lines, will be one comment element in the repeated
761 |     // field.
762 |     //
763 |     // Only the comment content is provided; comment markers (e.g. //) are
764 |     // stripped out.  For block comments, leading whitespace and an asterisk
765 |     // will be stripped from the beginning of each line other than the first.
766 |     // Newlines are included in the output.
767 |     //
768 |     // Examples:
769 |     //
770 |     //   optional int32 foo = 1;  // Comment attached to foo.
771 |     //   // Comment attached to bar.
772 |     //   optional int32 bar = 2;
773 |     //
774 |     //   optional string baz = 3;
775 |     //   // Comment attached to baz.
776 |     //   // Another line attached to baz.
777 |     //
778 |     //   // Comment attached to qux.
779 |     //   //
780 |     //   // Another line attached to qux.
781 |     //   optional double qux = 4;
782 |     //
783 |     //   // Detached comment for corge.  This is not a leading or trailing
784 |     //   // comment to qux or corge because there are blank lines separating
785 |     //   // it from both.
786 |     //
787 |     //   // Detached comment for corge paragraph 2.
788 |     //
789 |     //   optional string corge = 5;
790 |     //   /* Block comment attached
791 |     //    * to corge.  Leading asterisks
792 |     //    * will be removed. */
793 |     //   /* Block comment attached to
794 |     //    * grault. */
795 |     //   optional int32 grault = 6;
796 |     //
797 |     //   // ignored detached comments.
798 |     optional string leading_comments = 3;
799 |     optional string trailing_comments = 4;
800 |     repeated string leading_detached_comments = 6;
801 |   }
802 | }
803 | 
--------------------------------------------------------------------------------
/avro2uml/example_svgs/g2p_2016-02-26.svg:
--------------------------------------------------------------------------------
[Graphviz-rendered SVG markup omitted: a UML diagram of the g2p schema with clusters for genotypephenotype.avdl, common.avdl, sequenceAnnotations.avdl, and metadata.avdl, containing record nodes such as Feature, FeatureSet, Attributes, OntologyTerm, ExternalIdentifier, Evidence, Dataset, FeaturePhenotypeAssociation, PhenotypeAssociationSet, PhenotypeInstance, EnvironmentalContext, CigarUnit, CigarOperation, Position, Strand, and Experiment, with edges linking fields to the types they reference.]
--------------------------------------------------------------------------------