├── .gitignore ├── LICENSE ├── README.md ├── browser-guides ├── apoc │ ├── 01_apoc_intro.adoc │ ├── 02_datetime.adoc │ ├── 03_load_json.adoc │ ├── 04_refactor_data.adoc │ ├── 05_periodic.adoc │ └── apoc.adoc ├── bebe │ └── bebe_en.adoc ├── data │ ├── CompanyDataAmericans.csv │ ├── ElectionDonationsAmericans.csv │ ├── LandOwnershipAmericans.csv │ ├── PSCAmericans.csv │ ├── asoiaf-all-edges.csv │ ├── asoiaf-book1-edges.csv │ ├── asoiaf-book2-edges.csv │ ├── asoiaf-book3-edges.csv │ ├── asoiaf-book45-edges.csv │ ├── employee-map.json │ ├── person.json │ ├── stream_clean.json │ └── worldcities.csv ├── data_science │ ├── 01_data_import.adoc │ ├── 02_analysis_algo.adoc │ ├── 03_pagerank.adoc │ ├── 04_label_propagation.adoc │ ├── 05_louvain.adoc │ ├── 06_betweenness.adoc │ ├── data_science.adoc │ └── installing_apoc.adoc ├── football_transfers │ └── football_transfers.adoc ├── got │ ├── 01_eda.adoc │ ├── 02_algorithms.adoc │ └── got.adoc ├── got_wwc │ ├── 01_intro.adoc │ ├── 02_got.adoc │ ├── 03_got_houses.adoc │ ├── 04_got_families.adoc │ └── got_wwc.adoc ├── hospital │ └── hospital.adoc ├── img │ ├── AStormOfSwords.jpg │ ├── Graph_betweenness.jpg │ ├── PageRanks-Example.png │ ├── apoc-neo4j-user-defined-procedures.jpg │ ├── betweenness-centrality.png │ ├── bugs-bunny-the-end.jpg │ ├── char_cooccurence.png │ ├── cypher_create.jpg │ ├── cypher_run_button.jpg │ ├── cytutorial_neo4j_browser.jpg │ ├── dark-chocolate-pudding-with-malted-cream.jpg │ ├── database_import.png │ ├── document_common_attributes.png │ ├── download_csv.png │ ├── download_graph.png │ ├── enable_multiline_queries.jpg │ ├── footballtransfer-model.png │ ├── got_header.png │ ├── graph-data-science.jpg │ ├── hospitalmeta.jpg │ ├── jqassistant.png │ ├── label-propagation-graph-algorithm-1.png │ ├── label-propagation-graph-algorithm.png │ ├── life-science-import-datamodel.jpg │ ├── life-sciences-import-model-attribute.jpg │ ├── life-sciences-import-model-gene.jpg │ ├── louvain.jpg │ ├── meetup.png │ ├── 
n10s.png │ ├── neo4j-browser-sync.png │ ├── nodes.png │ ├── northwind_data_model.png │ ├── pin_button.png │ ├── rdf.png │ ├── restaurant_recommendation_model.png │ ├── schema.png │ ├── schema_documents.png │ ├── slides.jpg │ ├── stackexchange-logo.png │ ├── stackoverflow-logo.png │ ├── stackoverflow-model.jpg │ ├── style_actedin_relationship.png │ ├── style_person_node.png │ ├── style_sheet_grass.png │ ├── sushi_restaurants_nyc.png │ ├── sysinfo_stats.png │ ├── transfermarkt.png │ └── ukcompanies_model.png ├── import │ ├── 01_load_csv.adoc │ ├── 02_apoc.adoc │ ├── 03_procedures.adoc │ └── import.adoc ├── intro-browser │ └── intro-browser.adoc ├── javaland │ └── javaland.adoc ├── jqa │ └── jqa.adoc ├── life-science-import │ └── life-science-import.adoc ├── meetup │ ├── 01_meetup_import.adoc │ ├── 02_data_analysis.adoc │ └── meetup.adoc ├── rdf │ └── rdf.adoc ├── recipes │ └── recipes.adoc ├── restaurant_recommendation │ └── restaurant_recommendation.adoc ├── stackoverflow │ └── stackoverflow.adoc └── ukcompanies │ └── ukcompanies.adoc ├── finance └── neo4j_icij.adoc ├── fraud ├── BankFraud-1.png ├── Credit_Card_Fraud_Detection.adoc ├── Offshore_Leaks_and_Azerbaijan.adoc └── bank-fraud-detection.adoc ├── index.adoc ├── mdm ├── Organizational_learning.adoc └── aws-infrastructure.adoc ├── medical ├── DoctorFinder.adoc ├── central_hospital_of_asturias.adoc ├── pharma_drugs_targets.adoc ├── treatment_planners.adoc └── zombie.adoc ├── networkITmanagment ├── GeoptimaAllocation.adoc ├── NetworkDataCenterManagement1.adoc ├── datacenter-management-1.PNG └── network-routing.adoc ├── recommendation ├── Competence_Management.adoc └── marchMadnessBracketBuilder.adoc ├── render-guides.sh ├── retail ├── Menus_in_NYPL.adoc ├── SupplyChainManagement.adoc ├── hierarchy_graphgist.adoc └── northwind-graph.adoc ├── social ├── finding_influencers.adoc ├── neo4j-contact-networks.adoc └── project_management.adoc ├── syntax.adoc ├── uc-search ├── books.adoc ├── citation_patterns.adoc ├── 
graphgist_water.adoc └── yellowstone-gist.adoc └── web └── Aardvark.adoc /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | html 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | CC0 1.0 Universal 2 | 3 | Statement of Purpose 4 | 5 | The laws of most jurisdictions throughout the world automatically confer 6 | exclusive Copyright and Related Rights (defined below) upon the creator and 7 | subsequent owner(s) (each and all, an "owner") of an original work of 8 | authorship and/or a database (each, a "Work"). 9 | 10 | Certain owners wish to permanently relinquish those rights to a Work for the 11 | purpose of contributing to a commons of creative, cultural and scientific 12 | works ("Commons") that the public can reliably and without fear of later 13 | claims of infringement build upon, modify, incorporate in other works, reuse 14 | and redistribute as freely as possible in any form whatsoever and for any 15 | purposes, including without limitation commercial purposes. These owners may 16 | contribute to the Commons to promote the ideal of a free culture and the 17 | further production of creative, cultural and scientific works, or to gain 18 | reputation or greater distribution for their Work in part through the use and 19 | efforts of others. 20 | 21 | For these and/or other purposes and motivations, and without any expectation 22 | of additional consideration or compensation, the person associating CC0 with a 23 | Work (the "Affirmer"), to the extent that he or she is an owner of Copyright 24 | and Related Rights in the Work, voluntarily elects to apply CC0 to the Work 25 | and publicly distribute the Work under its terms, with knowledge of his or her 26 | Copyright and Related Rights in the Work and the meaning and intended legal 27 | effect of CC0 on those rights. 
28 | 29 | 1. Copyright and Related Rights. A Work made available under CC0 may be 30 | protected by copyright and related or neighboring rights ("Copyright and 31 | Related Rights"). Copyright and Related Rights include, but are not limited 32 | to, the following: 33 | 34 | i. the right to reproduce, adapt, distribute, perform, display, communicate, 35 | and translate a Work; 36 | 37 | ii. moral rights retained by the original author(s) and/or performer(s); 38 | 39 | iii. publicity and privacy rights pertaining to a person's image or likeness 40 | depicted in a Work; 41 | 42 | iv. rights protecting against unfair competition in regards to a Work, 43 | subject to the limitations in paragraph 4(a), below; 44 | 45 | v. rights protecting the extraction, dissemination, use and reuse of data in 46 | a Work; 47 | 48 | vi. database rights (such as those arising under Directive 96/9/EC of the 49 | European Parliament and of the Council of 11 March 1996 on the legal 50 | protection of databases, and under any national implementation thereof, 51 | including any amended or successor version of such directive); and 52 | 53 | vii. other similar, equivalent or corresponding rights throughout the world 54 | based on applicable law or treaty, and any national implementations thereof. 55 | 56 | 2. Waiver. 
To the greatest extent permitted by, but not in contravention of, 57 | applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and 58 | unconditionally waives, abandons, and surrenders all of Affirmer's Copyright 59 | and Related Rights and associated claims and causes of action, whether now 60 | known or unknown (including existing as well as future claims and causes of 61 | action), in the Work (i) in all territories worldwide, (ii) for the maximum 62 | duration provided by applicable law or treaty (including future time 63 | extensions), (iii) in any current or future medium and for any number of 64 | copies, and (iv) for any purpose whatsoever, including without limitation 65 | commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes 66 | the Waiver for the benefit of each member of the public at large and to the 67 | detriment of Affirmer's heirs and successors, fully intending that such Waiver 68 | shall not be subject to revocation, rescission, cancellation, termination, or 69 | any other legal or equitable action to disrupt the quiet enjoyment of the Work 70 | by the public as contemplated by Affirmer's express Statement of Purpose. 71 | 72 | 3. Public License Fallback. Should any part of the Waiver for any reason be 73 | judged legally invalid or ineffective under applicable law, then the Waiver 74 | shall be preserved to the maximum extent permitted taking into account 75 | Affirmer's express Statement of Purpose. 
In addition, to the extent the Waiver 76 | is so judged Affirmer hereby grants to each affected person a royalty-free, 77 | non transferable, non sublicensable, non exclusive, irrevocable and 78 | unconditional license to exercise Affirmer's Copyright and Related Rights in 79 | the Work (i) in all territories worldwide, (ii) for the maximum duration 80 | provided by applicable law or treaty (including future time extensions), (iii) 81 | in any current or future medium and for any number of copies, and (iv) for any 82 | purpose whatsoever, including without limitation commercial, advertising or 83 | promotional purposes (the "License"). The License shall be deemed effective as 84 | of the date CC0 was applied by Affirmer to the Work. Should any part of the 85 | License for any reason be judged legally invalid or ineffective under 86 | applicable law, such partial invalidity or ineffectiveness shall not 87 | invalidate the remainder of the License, and in such case Affirmer hereby 88 | affirms that he or she will not (i) exercise any of his or her remaining 89 | Copyright and Related Rights in the Work or (ii) assert any associated claims 90 | and causes of action with respect to the Work, in either case contrary to 91 | Affirmer's express Statement of Purpose. 92 | 93 | 4. Limitations and Disclaimers. 94 | 95 | a. No trademark or patent rights held by Affirmer are waived, abandoned, 96 | surrendered, licensed or otherwise affected by this document. 97 | 98 | b. Affirmer offers the Work as-is and makes no representations or warranties 99 | of any kind concerning the Work, express, implied, statutory or otherwise, 100 | including without limitation warranties of title, merchantability, fitness 101 | for a particular purpose, non infringement, or the absence of latent or 102 | other defects, accuracy, or the present or absence of errors, whether or not 103 | discoverable, all to the greatest extent permissible under applicable law. 104 | 105 | c. 
Affirmer disclaims responsibility for clearing rights of other persons 106 | that may apply to the Work or any use thereof, including without limitation 107 | any person's Copyright and Related Rights in the Work. Further, Affirmer 108 | disclaims responsibility for obtaining any necessary consents, permissions 109 | or other rights required for any use of the Work. 110 | 111 | d. Affirmer understands and acknowledges that Creative Commons is not a 112 | party to this document and has no duty or obligation with respect to this 113 | CC0 or use of the Work. 114 | 115 | For more information, please see 116 | 117 | 118 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # graphgists 2 | Reference Graph Gists 3 | 4 | == Basic Guidelines for Graph Gists 5 | 6 | * Use neo4j-version: 2.3 7 | * Use Neo4j 2.3 features, especially **Labels** 8 | * Adhere to the Cypher style guide (WIP, but: capitalized labels, all-caps rel-types, camel-case properties, if possible consistent keyword casing e.g. all-caps) 9 | * Use meaningful relationship types 10 | * Include a data model between 20 and 150 nodes in size 11 | * Explain the use case and be a good read, but not a novel 12 | * Include a good domain picture and, if possible, other illustrating pictures 13 | * Include meta-information about the author and topics 14 | * Use the graphgist tools (//graph_result, //table, //setup, //hide, //output) 15 | * Hide long setup queries 16 | * Use one line per sentence for easier versioning 17 | * End with //console 18 | 19 | 20 | == Basic Guidelines for Blog Posts 21 | 22 | In order to maximize the impact of the graph gists, we should write and update them with future blog posts in mind. 
23 | 24 | * Posts should be 500+ words 25 | * Posts should include at least one picture or graphic (more are always welcome) 26 | * Posts should include at least one code example 27 | 28 | -------------------------------------------------------------------------------- /browser-guides/apoc/01_apoc_intro.adoc: -------------------------------------------------------------------------------- 1 | = Intro to APOC 2 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data 3 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img 4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc 5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc 6 | :icons: font 7 | :neo4j-version: 3.5 8 | 9 | == Intro to APOC 10 | 11 | In this guide, we will see how to use the standard procedures and functions in the APOC library to assist in many activities with Neo4j. 12 | We will look at some of the most-used procedures, as well as some lesser-known ones, and we will show how to navigate the library to find helpful procedures. 13 | 14 | Before we begin, though, we need to install the APOC library to operate with our Neo4j database instance. 15 | 16 | == Quick Check: Version compatibility matrix 17 | 18 | Since APOC relies on Neo4j's internal APIs in some places, you need to use the right APOC version for your Neo4j installation. 19 | 20 | APOC uses a consistent versioning scheme: `<neo4j-version>.<apoc>`. 21 | The trailing `<apoc>` part of the version number will be incremented with every apoc release. 
22 | 23 | [opts=header] 24 | |=== 25 | |apoc version | neo4j version 26 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.5.0.6[3.5.0.6^] | 3.5.12 (3.5.x) 27 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.4.0.4[3.4.0.6^] | 3.4.12 (3.4.x) 28 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.3.0.4[3.3.0.4^] | 3.3.6 (3.3.x) 29 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.2.3.6[3.2.3.6^] | 3.2.9 (3.2.x) 30 | |=== 31 | 32 | The full version compatibility matrix is in the https://github.com/neo4j-contrib/neo4j-apoc-procedures#version-compatibility-matrix[APOC docs^]. 33 | 34 | == Installation: Getting APOC 35 | 36 | We have several options for installing APOC, depending on what type of Neo4j installation is running. 37 | 38 | 1. *For Neo4j Desktop:* we can install the built-in plugin in the `Project` or `Manage` database view. This automatically takes care of any configurations needed to use APOC. The steps are in the https://neo4j.com/docs/labs/apoc/current/introduction/#_installation_with_neo4j_desktop[APOC documentation^]. 39 | 40 | 2. *For Docker:* The Neo4j Docker image lets us supply a volume for the `/plugins` folder. 41 | Steps are in the https://neo4j.com/docs/labs/apoc/current/introduction/#_using_apoc_with_the_neo4j_docker_image[APOC documentation^]. 42 | 43 | 3. *Neo4j Sandbox or Aura:* These instances are both cloud-based and come with APOC pre-installed and ready to use! No steps are required to use APOC. 44 | 45 | 4. *For Other Installations:* we will need to http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/{apoc-release}[download the jar^] from GitHub and place it in the `$NEO4J_HOME/plugins` folder. Additional information and initial configuration steps are shown in the https://neo4j.com/docs/labs/apoc/current/introduction/#_manual_installation_download_latest_release[APOC documentation^]. 
46 | 47 | == Test APOC installation 48 | 49 | To verify everything installed correctly and we are able to run the procedures, we can try to access the APOC help command. 50 | 51 | [source, cypher] 52 | ---- 53 | CALL apoc.help('') 54 | ---- 55 | 56 | This procedure lists the type (procedure or function), name, text description, signature (format and parameters with types), role, and writes. 57 | 58 | As another option, we can execute the `dbms.procedures()` command and count the procedures in the APOC package. 59 | 60 | [source, cypher] 61 | ---- 62 | CALL dbms.procedures() YIELD name 63 | RETURN head(split(name,".")) as package, count(*), collect(name) as procedures; 64 | ---- 65 | 66 | == Calling APOC in Cypher 67 | 68 | User-defined functions can be used in any expression or predicate, just like built-in functions. 69 | 70 | Procedures can be called stand-alone with `CALL .();` syntax. 71 | You can also integrate them into your Cypher statements, which makes them much more powerful. 72 | 73 | Load JSON example: 74 | 75 | [source, cypher,subs=attributes] 76 | ---- 77 | WITH '{data-url}/person.json' AS url 78 | CALL apoc.load.json(url) YIELD value as person 79 | MERGE (p:Person {name:person.name}) 80 | ON CREATE SET p.age = person.age, p.children = size(person.children) 81 | RETURN p 82 | ---- 83 | 84 | == Next Step 85 | 86 | In the next section, we are going to see how to use APOC to convert dates and times. 
87 | 88 | ifdef::env-guide[] 89 | pass:a[Date & Time Conversion] 90 | endif::[] 91 | 92 | ifdef::env-graphgist[] 93 | link:{gist}/02_datetime.adoc[Date & Time Conversion^] 94 | endif::[] -------------------------------------------------------------------------------- /browser-guides/apoc/02_datetime.adoc: -------------------------------------------------------------------------------- 1 | = Date & Time Conversion in APOC 2 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img 3 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc 4 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc 5 | :icons: font 6 | :neo4j-version: 3.5 7 | 8 | == APOC Date & Time Conversion 9 | 10 | Neo4j supports date and temporal values, but often, we are dealing with differing date formats between systems or files. 11 | These can be difficult to express and translate without a few flexible procedures to handle converting one value formatting to another. 12 | 13 | APOC has several procedures for converting and formatting various date, time, and temporal values. 14 | They save valuable time in manually converting values or creating a procedure from scratch! 15 | The full list of available procedures is in the https://neo4j.com/docs/labs/apoc/current/temporal/[APOC documentation^]. 16 | 17 | == Data set for this guide 18 | 19 | image::{img}/northwind_data_model.png[float=right] 20 | 21 | We will use the Northwind retail system data to test the date and time procedures in this guide. 22 | To load the data, we can run the browser guide below. 23 | 24 | [source,cypher] 25 | ---- 26 | :play northwind 27 | ---- 28 | 29 | A browser guide will appear. 30 | Go ahead and step through the guide, running all of the queries to load all the `Product`, `Supplier`, `Category`, `Order`, and `Customer` data with indexes on specific properties. 31 | 32 | Once completed, we can move to the next slide and start using APOC with this data. 
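Before moving on, it can be worth confirming that the data actually loaded.
The query below is a minimal sketch, assuming the Northwind guide created the labels listed above; it uses only core Cypher and simply counts nodes per label combination.

[source,cypher]
----
// Count nodes per label set to confirm the Northwind import ran
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC
----

If any of the expected labels is missing from the result, re-run the corresponding step of the Northwind guide.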
33 | 34 | == Converting dates from Integer to String 35 | 36 | The APOC `apoc.date.format()` takes an integer value for the date and converts it to a string in the desired format, including a custom one. 37 | This is commonly used when translating data from APIs, flat files, or even other databases and moving that data into or out of Neo4j. 38 | 39 | Format: `apoc.date.format(12345, ['ms'/'s'], ['yyyy/MM/dd HH:mm:ss'])` 40 | 41 | This procedure has 3 parameters - 42 | 43 | 1. the date integer value to convert 44 | 2. how specific the first parameter is (`s` for seconds, `ms` for milliseconds) 45 | 3. how we want the date string result to look 46 | 47 | == apoc.date.format Example: 48 | 49 | Our Northwind data has `Customer` nodes who hopefully make orders with our business. 50 | We probably want to record timestamps when the first contact was sent to the business to see which customers were initially contacted in certain months and which probably made sales in the same year. 51 | 52 | [source, cypher] 53 | ---- 54 | WITH 841914000 as dateInt //1996-09-05 09:00 in epoch seconds 55 | MERGE (c:Customer {companyName: 'Island Trading'}) 56 | SET c.firstContact = apoc.date.format(dateInt, 's', 'yyyy-MM-dd HH:mm:ss') 57 | RETURN c 58 | ---- 59 | 60 | In the example above, we have a date integer in seconds, and we want to update our customer information with that datetime in a human-readable format. 61 | To do that, we merge the `Customer` node and set the `firstContact` property equal to the converted date (using the procedure). 62 | 63 | In the return, we should see the customer's node with all its properties and the formatted date! 64 | 65 | == Converting dates from String to Integer 66 | 67 | Let us do the reverse of what we just did on the previous slide by converting a string value to an integer with `apoc.date.parse()`. 68 | This is helpful for comparing date strings from and to various formats, most commonly in data import or export. 
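As a quick sanity check of how the two conversions relate, the sketch below formats an epoch value and parses the resulting string back.
It reuses the timestamp from the earlier `apoc.date.format` example; the alias names (`dateString`, `parsedBack`) are only illustrative.
The parameters of `apoc.date.parse()` are described next.

[source,cypher]
----
// Round trip: integer -> string -> integer
WITH 841914000 AS dateInt
WITH dateInt, apoc.date.format(dateInt, 's', 'yyyy-MM-dd HH:mm:ss') AS dateString
RETURN dateInt,
       dateString,
       apoc.date.parse(dateString, 's', 'yyyy-MM-dd HH:mm:ss') AS parsedBack
----

Because the same unit and format string are used in both directions, `parsedBack` should equal `dateInt`.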
69 | 70 | Format: `apoc.date.parse('2019/03/25 03:15:59', ['ms'/'s'], ['yyyy/MM/dd HH:mm:ss'])` 71 | 72 | The procedure needs 3 parameters - 73 | 74 | 1. the date string that needs to be converted 75 | 2. how specific the conversion should be (down to seconds `s` or milliseconds `ms`) 76 | 3. what the format is of the date string (1st parameter) 77 | 78 | == apoc.date.parse Example: 79 | 80 | Let us say that we received a notification from our monitoring system that there was an error in the system at timestamp `882230400`, so we need to find out which orders were possibly affected by the error. 81 | We can use `apoc.date.parse()` to convert the string-formatted date in our Northwind data to a timestamp and compare that to the timestamp we have from our error system. 82 | 83 | [source, cypher] 84 | ---- 85 | WITH 882230400 as errorTimestamp //1997-12-16 00:00:00.000 in epoch seconds 86 | MATCH (o:Order) 87 | WHERE apoc.date.parse(o.orderDate, 's', 'yyyy-MM-dd HH:mm:ss.SSS') = errorTimestamp 88 | RETURN o 89 | ---- 90 | 91 | In our example, we are given a date integer (epoch time from the error in the monitoring system) and want to find the orders that were made on that date. 92 | We use `MATCH` to search for `Order` nodes where the converted `orderDate` property (using the procedure) matches the date integer of the error and return the orders that are found. 93 | 94 | In the return, we should see 3 orders that have an order date of `1997-12-16`! 95 | 96 | == Adding or subtracting units from timestamps 97 | 98 | The marketing department might want to see how well a marketing campaign did to generate sales. 99 | The campaign was published at timestamp `891388800`, and we need to find out how many sales it generated within its first 30 days. 100 | 101 | We can use `apoc.date.add()` to take a point in time as an epoch integer (in seconds or milliseconds) and add or subtract a specified time value to find the desired timestamp. 
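To get a feel for the arithmetic before applying it to orders, the window can be previewed in readable form.
This is a sketch that assumes the campaign timestamp is expressed in epoch seconds (hence the `'s'` unit); the alias names are illustrative.

[source,cypher]
----
// Preview the 30-day campaign window as readable dates
WITH 891388800 AS startDate
WITH startDate, apoc.date.add(startDate, 's', 30, 'd') AS endDate
RETURN startDate, endDate,
       apoc.date.format(startDate, 's', 'yyyy-MM-dd') AS campaignStart,
       apoc.date.format(endDate, 's', 'yyyy-MM-dd') AS campaignEnd
----

Passing a negative number as the third argument moves the timestamp backward instead.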
102 | 103 | Format: `apoc.date.add(12345, 'ms', -365, 'd')` 104 | 105 | The procedure above contains 4 parameters - 106 | 107 | 1. the date integer for adding or subtracting 108 | 2. how specific the date integer is (`s` for seconds, `ms` for milliseconds) 109 | 3. the number to add or subtract from the date integer 110 | 4. the unit type to add or subtract 111 | 112 | == apoc.date.add Example: 113 | 114 | [source, cypher] 115 | ---- 116 | WITH 891388800 as startDate 117 | WITH startDate, apoc.date.add(startDate, 's', 30, 'd') as endDate 118 | MATCH (o:Order) 119 | WHERE startDate < apoc.date.parse(o.orderDate,'s','yyyy-MM-dd HH:mm:ss.SSS') < endDate 120 | RETURN count(o) 121 | ---- 122 | 123 | In our query above, we first set the campaign start timestamp as a variable and then pass that to the next line, where we also use that `startDate` to calculate our end date (using the procedure). 124 | The `apoc.date.add` calculates it by adding 30 days (the `30` and `d` parameters) to the start date and setting that as our `endDate`. 125 | We then search for `Order` nodes where the `orderDate` (converted from string to integer using `apoc.date.parse()`) is greater than the start date of the campaign and less than the end date. 126 | 127 | In the return, we should see the number of orders made within 30 days of the campaign publish - a total of 70! 128 | 129 | == Converting date string to temporal type 130 | 131 | So far, we have worked with order dates as strings with a particular format. 132 | However, Neo4j supports date and time types, so it would probably make things much easier if we converted to the native types. 133 | 134 | There is an APOC procedure to convert the format from a string to a temporal type. 135 | Since Neo4j is compatible with the https://en.wikipedia.org/wiki/ISO_8601[ISO 8601^] standard, we will use that for our result format. 
136 | 137 | Format: `apoc.date.convertFormat('2019-12-31 16:14:20', 'yyyy-MM-dd HH:mm:ss', 'iso_date_format')` 138 | 139 | The procedure contains 3 parameters - 140 | 141 | 1. the date string that needs to be converted 142 | 2. what the format is of the date string 143 | 3. the format for the resulting temporal type (can be specified manually, as https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html[Java formats^], or as these https://www.elastic.co/guide/en/elasticsearch/reference/5.5/mapping-date-format.html#built-in-date-formats[built-in formats^]) 144 | 145 | == apoc.date.convertFormat Example: 146 | 147 | [source, cypher] 148 | ---- 149 | MATCH (o:Order) 150 | SET o.isoOrderDate = apoc.date.convertFormat(o.orderDate, 'yyyy-MM-dd HH:mm:ss.SSS', 'iso_date_time') 151 | RETURN o 152 | ---- 153 | 154 | In the query above, we find all the orders in our system and set a new property called `isoOrderDate` that is equal to the converted `orderDate` string. 155 | The `orderDate` is converted using the procedure, specifying the string format it is currently in and the `iso_date_time` format (2019-01-01T00:00:00) we want to have as the result. 156 | 157 | The query should return a sample of the orders we updated (Browser limits how much JavaScript has to render). 158 | Clicking on one shows all the properties on that node, including the new `isoOrderDate` property that is formatted as we expected! 159 | 160 | == Next Step 161 | 162 | In the next section, we are going to see how to use APOC to load JSON data into Neo4j. 
163 | 164 | ifdef::env-guide[] 165 | pass:a[Load JSON Data] 166 | endif::[] 167 | 168 | ifdef::env-graphgist[] 169 | link:{gist}/03_load_json.adoc[Load JSON Data^] 170 | endif::[] -------------------------------------------------------------------------------- /browser-guides/apoc/05_periodic.adoc: -------------------------------------------------------------------------------- 1 | = Batch Data with APOC 2 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img 3 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc 4 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc 5 | :icons: font 6 | :neo4j-version: 3.5 7 | 8 | == Batch Data in Neo4j with APOC 9 | 10 | Sometimes, the updates that need to be made to data are operationally intensive and require more resources than can be allocated in a single transaction. 11 | APOC provides a few options for batching data to handle these larger demands. 12 | 13 | == Data set for this guide 14 | 15 | image::{img}/northwind_data_model.png[float=right] 16 | 17 | Just like in our previous sections on using APOC for refactoring or importing, we will use the Northwind retail system data to test the batching procedures in this guide. 18 | 19 | If you haven't loaded the data from earlier guides in this series (or if you want to start with clean data), you can run the code block below. 20 | The first statement deletes every node and relationship in the database, and the second statement will open the Northwind browser guide where you will need to execute each of the Cypher queries to load the data. 21 | 22 | [source,cypher] 23 | ---- 24 | MATCH (n) DETACH DELETE n; 25 | :play northwind-graph; 26 | ---- 27 | 28 | == Batching data with apoc.periodic.iterate 29 | 30 | For making updates to data in the graph, we may want to make the update across the entire graph or we may want to select a subset of data for updating. 
31 | Either way, we could be dealing with vast amounts of data and may want to batch imports coming from files or other systems to load into our graph. 32 | 33 | The `apoc.periodic.iterate` procedure is one of the best ways to handle a variety of import and update scenarios in a batch manner. 34 | It uses a data-driven statement to select or read data, then uses an operation statement for specifying what we want to do with each batch. 35 | 36 | Format: `apoc.periodic.iterate('data-driven statement', 'operations statement', {config: ...})` 37 | 38 | The procedure has 3 parameters - 39 | 40 | 1. the data-driven statement for selecting/reading data into batches 41 | 2. the operations statement for updating/creating data in batches 42 | 3. any configurations 43 | 44 | == apoc.periodic.iterate Example: 45 | 46 | Let's start with an example that is narrow in scope: flagging products that need to be reordered. 47 | Perhaps we want to send our stock associates messages or put these items on a weekly report. 48 | 49 | To do this, we can search for products where our stock level is equal to or less than our reorder level and add an extra label to those nodes for easy retrieval by various systems or people. 50 | 51 | [source,cypher] 52 | ---- 53 | CALL apoc.periodic.iterate( 54 | 'MATCH (p:Product) WHERE p.unitsInStock <= p.reorderLevel RETURN p', 55 | 'SET p:Reorder', 56 | {batchSize: 100, batchMode: 'BATCH'} 57 | ) YIELD batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages 58 | RETURN batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages 59 | ---- 60 | 61 | Our statement above calls the procedure and uses the first Cypher query to select all of the Products where our stock is less than or equal to the reorder level. 62 | Then, our second statement adds the `Reorder` label to those `Product` nodes. 
63 | Next, we set some config for batchsize and the mode we want batches to execute. 64 | Because our Northwind data set is small, our batch size is also very small (it's not uncommon to see batchSizes set at 10,000 or more on larger graphs). 65 | 66 | Finally, we retrieve some statistics about our procedure execution, so that we have insight if anything goes wrong and can verify all the batches were successful. 67 | Note that since we set our batch size to 100, and we only have 22 updates (22 Product nodes have stock less than/equal to reorder level), it completes in a single batch. 68 | If we had hundreds or thousands of products in our graph and had low stock on most of them, however, we would see more batches. 69 | We could also have added a `parallel: true` config, since these updates wouldn't conflict (no relationships involved). 70 | However, since our graph is very small and we don't have very many updates, we don't need to add this configuration on this statement. 71 | 72 | == Verify results 73 | 74 | We can verify the update worked by running a query like the one below. 75 | 76 | [source,cypher] 77 | ---- 78 | MATCH (p:Product) 79 | RETURN p LIMIT 25; 80 | ---- 81 | 82 | == Another apoc.periodic.iterate Example: 83 | 84 | Let's take, for instance, that we might want to track and maintain our order line item information as a separate node, rather than properties on a relationship. 85 | We may be querying those relationship properties more often than initially thought, and query performance may see a dip, since relationship properties are not as optimized as patterns. 86 | 87 | To do this, we can use `apoc.periodic.iterate` to select all of the `ORDERS` relationships in our graph and add a `LineItem` intermediary node with relationships. 
88 | 89 | [source,cypher] 90 | ---- 91 | CALL apoc.periodic.iterate( 92 | 'MATCH (o:Order)-[r:ORDERS]->(p:Product) RETURN r, o, p', 93 | 'MERGE (i:LineItem {id: o.orderID+p.productID}) 94 | SET i.quantity = r.quantity, i.unitPrice = r.unitPrice, i.discount = r.discount 95 | MERGE (o)-[rel:HAS_ITEM]->(i)-[rel2:IS_FOR]->(p) 96 | DELETE r', 97 | {batchSize: 10000, batchMode: 'BATCH'} 98 | ) YIELD batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages 99 | RETURN batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages 100 | ---- 101 | 102 | Our statement above calls the procedure and selects all of the Orders with an `ORDERS` relationship to Products in the first query. 103 | Then, our second statement takes those patterns and creates a new intermediary node (`LineItem`) with the line item properties (from the existing `ORDERS` relationships). 104 | The next merge statement connects the new line items to the related `Order` and `Product` nodes, and the last statement deletes the old `ORDERS` relationships, since we have the new pattern. 105 | 106 | Finally, we set some config for batch size and the mode in which we want batches to execute. 107 | We retrieve some statistics about our procedure execution, so that we have insight if anything goes wrong and can verify all the batches were successful. 108 | Note that since we set our batch size to 10,000, and we only have 2,155 updates, it completes in a single batch. 109 | If our graph were much larger, however, we could very easily see more batches. 110 | 111 | == Verify results 112 | 113 | We can verify everything looks correct with the query below by selecting a specific customer and pulling all their orders with the new line items and related products.
114 | 115 | [source,cypher] 116 | ---- 117 | MATCH (c:Customer {companyName: 'Hanari Carnes'})-[r:PURCHASED]-(o:Order)-[r2:HAS_ITEM]-(i:LineItem)-[r3:IS_FOR]-(p:Product) 118 | RETURN c, r, o, r2, i, r3, p LIMIT 25 119 | ---- 120 | 121 | == Next Steps 122 | 123 | You are well on your way to mastering the APOC library and improving your interaction with graph data in Neo4j! 124 | Feel free to check out many of our other APOC resources for continuing your learning and discovering the many more useful procedures and functions available. 125 | 126 | * https://neo4j.com/docs/labs/apoc/current/[Reference the APOC documentation^] 127 | * https://www.youtube.com/playlist?list=PL9Hl4pk2FsvXEww23lDX_owoKoqqBQpdq[Video series: see how to use APOC procedures^] 128 | * https://community.neo4j.com/c/neo4j-graph-platform/procedures-apoc/77[Ask questions: join our Neo4j Community Site to get APOC help^] 129 | * https://neo4j.com/labs/apoc/[Learn more about APOC and contributing^] 130 | -------------------------------------------------------------------------------- /browser-guides/apoc/apoc.adoc: -------------------------------------------------------------------------------- 1 | = Awesome Procedures on Cypher (APOC) 2 | :author: Jennifer Reif 3 | :description: Learn to use some of the most popular procedures in the APOC library and explore the capabilities the library can provide 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc 7 | :tags: apoc, procedures, temporal, load-json, data-import, refactor, batching, periodic 8 | :neo4j-version: 3.5 9 | 10 | == Welcome to APOC 11 | 12 | The APOC library is a set of standard user-defined procedures to extend Cypher in Neo4j. 13 | User-defined procedures are custom implementations of certain functionality that cannot be easily expressed in Cypher. 
14 | They are implemented in Java, so they are deployable to a Neo4j instance and callable directly from Cypher. 15 | 16 | image::{img}/apoc-neo4j-user-defined-procedures.jpg[float=right] 17 | 18 | The APOC library consists of over 450 procedures to help with many different tasks in areas like data integration, data conversion, and much more. 19 | 20 | ifdef::env-guide[] 21 | . pass:a[Intro to APOC] 22 | . pass:a[Date & Time Conversion] 23 | . pass:a[Load JSON Data] 24 | . pass:a[Refactor Data] 25 | . pass:a[Batching & Background Operations] 26 | endif::[] 27 | 28 | ifdef::env-graphgist[] 29 | . link:{gist}/01_apoc_intro.adoc[Intro to APOC^] 30 | . link:{gist}/02_datetime.adoc[Date & Time Conversion^] 31 | . link:{gist}/03_load_json.adoc[Load JSON Data^] 32 | . link:{gist}/04_refactor_data.adoc[Refactor Data^] 33 | . link:{gist}/05_periodic.adoc[Batching & Background Operations^] 34 | endif::[] 35 | 36 | == Resources 37 | 38 | * https://neo4j.com/docs/labs/apoc/current/[APOC Documentation^] 39 | * https://github.com/neo4j-contrib/neo4j-apoc-procedures[Github source code repository^] 40 | * https://neo4j.com/docs/java-reference/current/extending-neo4j/procedures-and-functions/functions/[Neo4j Docs: User-Defined Procedures^] 41 | -------------------------------------------------------------------------------- /browser-guides/bebe/bebe_en.adoc: -------------------------------------------------------------------------------- 1 | = Introduction to Graphs and Data 2 | :author: Michael Hunger 3 | :description: Introduce graphs and Cypher to young students with hands-on queries and exploration 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/bebe/img 5 | :tags: browser-guide, intro, cypher, students 6 | :neo4j-version: 3.5 7 | 8 | == Welcome to Neo4j! 9 | 10 | image::{img}/cypher_create.jpg[float=right,width=400] 11 | 12 | Neo4j is a database, a storage for *things* and their *relationships*. 13 | 14 | It is operated with a language called _Cypher_. 
15 | 16 | With it, you can store things, but also find them again. 17 | 18 | Let's try that now. Continue with the arrow to the right. 19 | 20 | == Save things 21 | 22 | We can create ourselves: 23 | 24 | [source,cypher] 25 | ---- 26 | MERGE (me:Person {name: 'Jennifer'}) 27 | RETURN me 28 | ---- 29 | 30 | And then we can find ourselves, too: 31 | 32 | [source,cypher] 33 | ---- 34 | MATCH (p:Person {name: 'Jennifer'}) 35 | RETURN p 36 | ---- 37 | 38 | We show things as circles: `()` or `(:person {name: 'Jennifer'})` 39 | 40 | Can you find your neighbors? Give it a try! 41 | 42 | We can also find all the people: 43 | 44 | [source,cypher] 45 | ---- 46 | MATCH (p:Person) 47 | RETURN p 48 | ---- 49 | 50 | == Change things 51 | 52 | We can also store more than the name, like birthday or favorite color. 53 | 54 | We can find each other and then add new information. 55 | 56 | [source,cypher] 57 | ---- 58 | MATCH (p:Person {name: 'Jennifer'}) 59 | SET p.birthday = 'May' 60 | SET p.color = 'green' 61 | RETURN p 62 | ---- 63 | 64 | Now we can see who all likes the color `green`. 65 | 66 | [source,cypher] 67 | ---- 68 | MATCH (p:Person) 69 | WHERE p.color = 'green' 70 | RETURN p 71 | ---- 72 | 73 | What if we wanted to find out who doesn't like the color green? Or who has a birthday in `July`? 74 | 75 | == Connect things 76 | 77 | For this, we need two (a pair) of things. 78 | 79 | Find *you* and *your* neighbor to your right. 80 | 81 | [source,cypher] 82 | ---- 83 | MATCH (a:Person {name: 'Jennifer'}) 84 | MATCH (b:Person {name: 'Diego'}) 85 | RETURN a,b 86 | ---- 87 | 88 | Relationships are arrows like `+-->+` or `+-[:KNOWS]->+`. 89 | 90 | Now we can connect the neighbors. 91 | 92 | [source,cypher] 93 | ---- 94 | MATCH (a:Person {name: 'Jennifer'}) 95 | MATCH (b:Person {name: 'Diego'}) 96 | MERGE (a)-[k:KNOWS]->(b) 97 | RETURN * 98 | ---- 99 | 100 | How long is our chain? Could we find all the groups of neighbors? 
101 | 102 | [source,cypher] 103 | ---- 104 | MATCH (a)-[k:KNOWS]->(b) 105 | RETURN * 106 | ---- 107 | 108 | == What can you save? 109 | 110 | Answer: ANYTHING! 111 | 112 | * Hobbies, friends, family 113 | * People, movies, songs, books, comics 114 | * Countries, cities, streets 115 | * Schools, classes, dates and times 116 | * Stars, planets, animals, plants 117 | 118 | Or whatever else you are interested in. 119 | 120 | Let's have a look at two things: 121 | 122 | * pass:a[ movies] 123 | * pass:a[helper] 124 | 125 | //Translated with www.DeepL.com/Translator (free version) -------------------------------------------------------------------------------- /browser-guides/data/person.json: -------------------------------------------------------------------------------- 1 | {"name":"Michael", 2 | "age": 41, 3 | "children": ["Selina","Rana","Selma"] 4 | } 5 | -------------------------------------------------------------------------------- /browser-guides/data_science/02_analysis_algo.adoc: -------------------------------------------------------------------------------- 1 | = Data Exploration 2 | :author: Neo4j Engineering 3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science 7 | :tags: data-science, gds, graph-algorithms 8 | :neo4j-version: 3.5 9 | 10 | == Data visualization 11 | 12 | Let's briefly explore the dataset before running some algorithms. 13 | 14 | Run the following query to visualize the schema of your graph: 15 | 16 | [source,cypher] 17 | ---- 18 | CALL db.schema.visualization() 19 | ---- 20 | 21 | The `:Dead`, `:King`, and `:Knight` labels all appear on `:Person` nodes.
22 | You may find it useful to remove them from the visualization to make it easier to inspect. 23 | 24 | == Summary statistics 25 | 26 | Calculate some simple statistics to see how data is distributed. 27 | For example, find the minimum, maximum, average, and standard deviation of the number of interactions per character: 28 | 29 | [source,cypher] 30 | ---- 31 | MATCH (c:Person)-[:INTERACTS]->() 32 | WITH c, count(*) AS num 33 | RETURN min(num) AS min, max(num) AS max, avg(num) AS avg_interactions, stdev(num) AS stdev 34 | ---- 35 | 36 | Calculate the same grouped by book: 37 | 38 | [source,cypher] 39 | ---- 40 | MATCH (c:Person)-[r:INTERACTS]->() 41 | WITH r.book AS book, c, count(*) AS num 42 | RETURN book, min(num) AS min, max(num) AS max, avg(num) AS avg_interactions, stdev(num) AS stdev 43 | ORDER BY book 44 | ---- 45 | 46 | == Getting started with algorithms 47 | 48 | With Neo4j, you can run algorithms on explicitly and implicitly created graphs. In this tutorial, we will show you how to get the most out of the following algorithms: 49 | 50 | * Page Rank 51 | * Label Propagation 52 | * Weakly Connected Components (WCC) 53 | * Louvain 54 | * Node Similarity 55 | * Triangle Count 56 | * Local Clustering Coefficient 57 | 58 | == Algorithm syntax 59 | 60 | There are two ways to run algorithms on your graph: implicit and explicit. The explicit variant creates a subgraph, or projected graph, that is stored in memory, so multiple algorithms can run on it without recreating the subgraph each time. For this guide, we will focus on the implicit variant, which runs on the whole dataset or lets the user create the subgraph ad hoc. 61 | 62 | == Algorithm syntax: implicit graphs 63 | 64 | The implicit variant does not access the graph catalog. 65 | If you want to run an algorithm on such a graph, you configure the graph creation within the algorithm configuration map.
66 | 67 | [source] 68 | ---- 69 | CALL gds.<algo>.<mode>( 70 | configuration: Map 71 | ) 72 | ---- 73 | 74 | * `<algo>` is the algorithm name. 75 | * `<mode>` is the algorithm execution mode. 76 | The supported modes are: 77 | ** `stream`: streams results back to the user. 78 | ** `stats`: returns a summary of the results. 79 | ** `write`: returns stats and writes results to the Neo4j database. 80 | * The `configuration` parameter value is the algorithm-specific configuration. 81 | 82 | After the algorithm execution finishes, the graph is released from memory. 83 | 84 | == Next Steps 85 | 86 | Next, we will dive into using the first algorithm on our dataset: Page Rank. 87 | 88 | ifdef::env-guide[] 89 | pass:a[Centrality: Page Rank] 90 | endif::[] 91 | ifdef::env-graphgist[] 92 | link:{gist}/03_pagerank.adoc[Centrality: Page Rank^] 93 | endif::[] -------------------------------------------------------------------------------- /browser-guides/data_science/03_pagerank.adoc: -------------------------------------------------------------------------------- 1 | = Page Rank 2 | :author: Neo4j Engineering 3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science 7 | :tags: data-science, gds, graph-algorithms, pagerank, centrality 8 | :neo4j-version: 3.5 9 | 10 | == Page Rank 11 | 12 | image::{img}/PageRanks-Example.png[float="right", width="300"] 13 | 14 | Page Rank is an algorithm that measures the transitive influence and connectivity of nodes to find the most *influential* nodes in a graph. It computes an influence value for each node, called a _score_. As a result, the score of a node is a certain weighted average of the scores of its direct neighbors.
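For intuition, the "weighted average" idea can be written with the classic Page Rank formula. This is a sketch of the textbook definition, not a statement of how the library implements it; the symbols are introduced here and do not appear elsewhere in this guide.

[latexmath]
++++
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)
++++

Here latexmath:[T_1, \dots, T_n] are the nodes with a relationship to node latexmath:[A], latexmath:[C(T_i)] is the number of outgoing relationships of latexmath:[T_i], and latexmath:[d] is the damping factor (the `dampingFactor` configuration key, commonly left at 0.85).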
15 | 16 | *How Page Rank works* 17 | 18 | PageRank is an _iterative_ algorithm. 19 | In each iteration, every node divides its score evenly among its neighbors and propagates it to them. The algorithm runs for a configurable maximum number of iterations (default is 20), or until the node scores converge. That occurs when the maximum change in node score between two sequential iterations is smaller than the configured `tolerance` value. 20 | 21 | In the following chapters, you will see how Page Rank identifies the most important nodes. 22 | 23 | == Page Rank: stream mode 24 | 25 | Let's find out who is influential in the graph by running Page Rank. 26 | First, we run a basic Page Rank call in `stream` mode. 27 | 28 | [source, cypher] 29 | ---- 30 | CALL gds.pageRank.stream({ 31 | nodeProjection: 'Person', 32 | relationshipProjection: { 33 | INTERACTS: { 34 | orientation: 'UNDIRECTED' 35 | } 36 | } 37 | }) YIELD nodeId, score 38 | RETURN gds.util.asNode(nodeId).name AS name, score 39 | ORDER BY score DESC LIMIT 10 40 | ---- 41 | 42 | Next, compare the Page Rank of each `Person` node with the number of interactions for that node. 43 | 44 | [source,cypher] 45 | ---- 46 | CALL gds.pageRank.stream({ 47 | nodeProjection: 'Person', 48 | relationshipProjection: { 49 | INTERACTS: { 50 | orientation: 'UNDIRECTED' 51 | } 52 | } 53 | }) YIELD nodeId, score AS pageRank 54 | WITH gds.util.asNode(nodeId) AS n, pageRank 55 | MATCH (n)-[i:INTERACTS]-() 56 | RETURN n.name AS name, pageRank, count(i) AS interactions 57 | ORDER BY pageRank DESC LIMIT 10 58 | ---- 59 | 60 | The result shows that the most talkative characters do not always have the highest rank. 61 | 62 | == Page Rank: write mode 63 | 64 | Now that we have the results from our Page Rank query, we can write them back to Neo4j and use them for further queries. Specify the name of the property to which the algorithm will write using the `writeProperty` key in the config map passed to the procedure.
65 | 66 | [source,cypher] 67 | ---- 68 | CALL gds.pageRank.write({ 69 | nodeProjection: 'Person', 70 | relationshipProjection: { 71 | INTERACTS: { 72 | orientation: 'UNDIRECTED' 73 | } 74 | }, 75 | writeProperty: 'pageRank'}) 76 | ---- 77 | 78 | == Page Rank: rank per book 79 | 80 | Along with the generic `INTERACTS` relationships, you also have `INTERACTS_1`, `INTERACTS_2`, etc. for the different books. 81 | Let's compute and write the Page Rank scores for the first book. 82 | 83 | [source, cypher] 84 | ---- 85 | CALL gds.pageRank.write({ 86 | nodeProjection: 'Person', 87 | relationshipProjection: { 88 | INTERACTS_1: { 89 | orientation: 'UNDIRECTED' 90 | } 91 | }, 92 | writeProperty: 'pageRank1' 93 | }) 94 | ---- 95 | 96 | == Page Rank: exercise 97 | 98 | Let's see what you have learned so far. 99 | 100 | Try to calculate the Page Rank of the other books in the series and store the results in the database to measure and analyze influence. 101 | 102 | * Write queries that call `gds.pageRank.write` for the `INTERACTS_2`, `INTERACTS_3`, `INTERACTS_4`, and `INTERACTS_5` relationship types. (*Hint:* take a look at the previous query as a model) 103 | 104 | == Page Rank: answer questions 105 | 106 | Now, try to write queries to answer the following questions: 107 | 108 | * Which character has the biggest increase in influence from book 1 to 5? 109 | * Which character has the biggest decrease? 110 | 111 | *Note:* Answers are on the next slide. 
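One possible starting point for the exercise, sketched for the fifth book only. It assumes the same anonymous-graph syntax as the book 1 query, that an `INTERACTS_5` relationship type exists in your import, and that the scores go into a `pageRank5` property, which is the name the answer queries read from.

[source,cypher]
----
// Sketch: same call as for book 1, with the relationship type and
// write property swapped for book 5. The property name 'pageRank5'
// is an assumption taken from the exercise answer queries.
CALL gds.pageRank.write({
  nodeProjection: 'Person',
  relationshipProjection: {
    INTERACTS_5: {
      orientation: 'UNDIRECTED'
    }
  },
  writeProperty: 'pageRank5'
})
----

The remaining books would follow the same pattern, changing only the relationship type and the write property.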
112 | 113 | == Page Rank: exercise answer 114 | 115 | .Biggest increase 116 | [source, cypher] 117 | ---- 118 | MATCH (p:Person) 119 | RETURN p.name, p.pageRank1, p.pageRank5, p.pageRank5 - p.pageRank1 AS difference 120 | ORDER BY difference DESC 121 | LIMIT 10 122 | ---- 123 | 124 | .Biggest decrease 125 | [source, cypher] 126 | ---- 127 | MATCH (p:Person) 128 | RETURN p.name, p.pageRank1, p.pageRank5, p.pageRank5 - p.pageRank1 AS difference 129 | ORDER BY difference 130 | LIMIT 10 131 | ---- 132 | 133 | == Next Steps 134 | 135 | The next guide will look at the label propagation algorithm to find groups of people in communities. 136 | 137 | ifdef::env-guide[] 138 | pass:a[Communities: Label Propagation] 139 | endif::[] 140 | ifdef::env-graphgist[] 141 | link:{gist}/04_label_propagation.adoc[Communities: Label Propagation^] 142 | endif::[] -------------------------------------------------------------------------------- /browser-guides/data_science/04_label_propagation.adoc: -------------------------------------------------------------------------------- 1 | = Label Propagation 2 | :author: Neo4j Engineering 3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science 7 | :tags: data-science, gds, graph-algorithms, label-propagation, community 8 | :neo4j-version: 3.5 9 | 10 | == Label Propagation 11 | 12 | image::{img}/label-propagation-graph-algorithm-1.png[float="right",width=300] 13 | 14 | Label Propagation (LPA) is a fast algorithm for finding communities in a graph. It propagates labels throughout the graph and forms communities of nodes based on their influence. 15 | 16 | **How Label Propagation works** 17 | 18 | LPA is an _iterative_ algorithm. 
19 | First, it assigns a unique community label to each node. In each iteration, the algorithm changes this label to the most common one among its neighbors. Densely connected nodes quickly broadcast their labels across the graph. 20 | At the end of the propagation, only a few labels remain. Nodes that have the same community label at convergence are considered to be in the same community. The algorithm runs for a configurable maximum number of iterations, or until it converges. 21 | 22 | For more details, see _https://neo4j.com/docs/graph-data-science/current/algorithms/label-propagation/[the documentation^]_. 23 | 24 | == Label Propagation: example 25 | 26 | Let's run label propagation to find the five largest communities of people interacting with each other. The weight property on the relationship represents the number of interactions between two people. In LPA, the weight is used to determine the influence of neighboring nodes when voting on community assignment. 27 | 28 | Let's now run LPA with just one iteration: 29 | 30 | [source, cypher] 31 | ---- 32 | CALL gds.labelPropagation.stream({ 33 | nodeProjection: 'Person', 34 | relationshipProjection: { 35 | INTERACTS: { 36 | orientation: 'UNDIRECTED', 37 | properties: 'weight' 38 | } 39 | }, 40 | relationshipWeightProperty: 'weight', 41 | maxIterations: 1 42 | }) YIELD nodeId, communityId 43 | RETURN communityId, count(nodeId) AS size 44 | ORDER BY size DESC 45 | LIMIT 5 46 | ---- 47 | 48 | You can see that the nodes are assigned to initial communities. However, the algorithm needs multiple iterations to achieve a stable result. 49 | So, let's run the same procedure with two iterations and see how the results change. 
50 | 51 | [source, cypher] 52 | ---- 53 | CALL gds.labelPropagation.stream({ 54 | nodeProjection: 'Person', 55 | relationshipProjection: { 56 | INTERACTS: { 57 | orientation: 'UNDIRECTED', 58 | properties: 'weight' 59 | } 60 | }, 61 | relationshipWeightProperty: 'weight', 62 | maxIterations: 2 63 | }) YIELD nodeId, communityId 64 | RETURN communityId, count(nodeId) AS size 65 | ORDER BY size DESC 66 | LIMIT 5 67 | ---- 68 | 69 | Usually, label propagation requires more than a few iterations to converge on a stable result. 70 | The number of iterations required depends on the graph structure, so you should experiment. 71 | When the community sizes stop changing (or change only minimally), you have probably arrived at a good number of iterations. 72 | 73 | == Label Propagation: seeding 74 | 75 | Label Propagation can be seeded with an initial community label from a pre-existing node property. This allows you to compute communities incrementally. Let's write the results after the first iteration back to the source graph, under the write property name `community`. 76 | 77 | [source, cypher] 78 | ---- 79 | CALL gds.labelPropagation.write({ 80 | nodeProjection: 'Person', 81 | relationshipProjection: { 82 | INTERACTS: { 83 | orientation: 'UNDIRECTED', 84 | properties: 'weight' 85 | } 86 | }, 87 | relationshipWeightProperty: 'weight', 88 | maxIterations: 1, 89 | writeProperty: 'community' 90 | }) 91 | ---- 92 | 93 | You can now use the `community` property as a seed property for the second iteration. 94 | The results should be the same as the previous run with two iterations. Seeding is particularly useful when the source graph grows and you want to compute communities incrementally without starting again from scratch. 95 | 96 | Now, you can use the `seedProperty` configuration key to specify the property from which you want to seed community IDs.
97 | 98 | [source, cypher] 99 | ---- 100 | CALL gds.labelPropagation.stream({ 101 | nodeProjection: { 102 | Person: { 103 | properties: 'community' 104 | } 105 | }, 106 | relationshipProjection: { 107 | INTERACTS: { 108 | orientation: 'UNDIRECTED', 109 | properties: 'weight' 110 | } 111 | }, 112 | relationshipWeightProperty: 'weight', 113 | maxIterations: 1, 114 | seedProperty: 'community' 115 | }) YIELD nodeId, communityId 116 | RETURN communityId, count(nodeId) AS size 117 | ORDER BY size DESC 118 | LIMIT 5 119 | ---- 120 | 121 | == Label Propagation: exercise 122 | 123 | Now that you understand the basics of LPA, let's experiment a little. 124 | 125 | 1. How many iterations does it take for LPA to converge on a stable number of communities? How many communities do you end up with? 126 | 127 | 2. What happens when you run LPA for 1,000 maxIterations? (_hint: try using YIELD ranIterations_) 128 | 129 | 3. What happens if you run LPA without weights? Do you find the same communities? 130 | 131 | *Bonus task*: What if you use house affiliations as seeds for communities? How would you use Cypher to create the initial seeds? Run the algorithm with the new seeds. Do you find a different set of communities? 132 | 133 | == Label Propagation: exercise answers 134 | 135 | 1. 5 iterations is when the results stabilize and don't seem to change by increasing iterations more than 5. 136 | 137 | 2. It only actually runs 6 times (5 to stabilize and the 6th to verify the community stabilization). 138 | 139 | [source,cypher] 140 | ---- 141 | CALL gds.labelPropagation.stats({ 142 | nodeProjection: 'Person', 143 | relationshipProjection: { 144 | INTERACTS: { 145 | orientation: 'UNDIRECTED', 146 | properties: 'weight' 147 | } 148 | }, 149 | relationshipWeightProperty: 'weight', 150 | maxIterations: 1000 151 | }) YIELD ranIterations 152 | ---- 153 | 154 | The above query uses the stats mode (stream does not output _ranIterations_) and outputs the ranIterations statistic. 
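As a hedged extension of the stats query above: in the GDS 1.x documentation, the stats mode of label propagation also returns `didConverge` and `communityCount`, which speak to the first two exercise questions directly. These field names are taken from that documentation and may differ in other library versions.

[source,cypher]
----
// Sketch: same projection as the answer query, with two extra result
// fields (assumed to be available in stats mode of your GDS version).
CALL gds.labelPropagation.stats({
  nodeProjection: 'Person',
  relationshipProjection: {
    INTERACTS: {
      orientation: 'UNDIRECTED',
      properties: 'weight'
    }
  },
  relationshipWeightProperty: 'weight',
  maxIterations: 1000
}) YIELD ranIterations, didConverge, communityCount
RETURN ranIterations, didConverge, communityCount
----

If `didConverge` is true and `ranIterations` is well below `maxIterations`, raising the limit further should not change the communities.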
155 | 156 | == Label Propagation: exercise answers 157 | 158 | 3. It does change the results. The communities are larger. 159 | 160 | [source,cypher] 161 | ---- 162 | CALL gds.labelPropagation.stream({ 163 | nodeProjection: { 164 | Person: { 165 | properties: 'community' 166 | } 167 | }, 168 | relationshipProjection: { 169 | INTERACTS: { 170 | orientation: 'UNDIRECTED' 171 | } 172 | }, 173 | maxIterations: 5 174 | }) YIELD nodeId, communityId 175 | RETURN communityId, count(nodeId) AS size 176 | ORDER BY size DESC 177 | LIMIT 5 178 | ---- 179 | 180 | == Label Propagation: exercise answers 181 | 182 | *Bonus task*: First, we need to write the algorithm to seed the communities for houses. The node query needs to pull both `Person` and `House` nodes into our graph on which to run label propagation. For the relationship query, we need to create our relationship query to both start and end on the `Person` nodes because the algorithms currently only support monopartite graphs. 183 | 184 | [source,cypher] 185 | ---- 186 | CALL gds.labelPropagation.write({ 187 | nodeQuery: 'MATCH (n) WHERE n:Person OR n:House RETURN id(n) as id', 188 | relationshipQuery: 'MATCH (p1:Person)-[:BELONGS_TO]->(:House)<-[:BELONGS_TO]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target', 189 | writeProperty: 'houseCommunity' 190 | }) 191 | ---- 192 | 193 | Now that we have seeded the communities, we can run the label propagation algorithm on those communities. 
194 | 195 | [source,cypher] 196 | ---- 197 | CALL gds.labelPropagation.stream({ 198 | nodeQuery: 'MATCH (n) WHERE n:Person OR n:House RETURN id(n) as id', 199 | relationshipQuery: 'MATCH (p1:Person)-[:BELONGS_TO]->(:House)<-[:BELONGS_TO]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target', 200 | maxIterations: 2, 201 | seedProperty: 'houseCommunity' 202 | }) YIELD nodeId, communityId 203 | RETURN communityId, count(nodeId) AS size 204 | ORDER BY size DESC 205 | LIMIT 5 206 | ---- 207 | 208 | == Next Steps 209 | 210 | The next guide stays with community detection algorithms and looks at Louvain. 211 | 212 | ifdef::env-guide[] 213 | pass:a[Communities: Louvain] 214 | endif::[] 215 | ifdef::env-graphgist[] 216 | link:{gist}/05_louvain.adoc[Communities: Louvain^] 217 | endif::[] -------------------------------------------------------------------------------- /browser-guides/data_science/05_louvain.adoc: -------------------------------------------------------------------------------- 1 | = Louvain 2 | :author: Neo4j Engineering 3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science 7 | :tags: data-science, gds, graph-algorithms, louvain, community 8 | :neo4j-version: 3.5 9 | 10 | == Louvain 11 | 12 | image::{img}/louvain.jpg[float="right",width=300] 13 | 14 | The Louvain algorithm, like Label Propagation, is a community detection algorithm that identifies clusters of nodes in a graph. 15 | It calculates how densely connected the nodes within a community are. 16 | Louvain also reveals a hierarchy of communities at different scales, which enables you to zoom in on different levels of granularity and find sub-communities within other sub-communities.
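How densely the nodes within communities are connected is usually quantified by _modularity_, the quantity Louvain tries to maximize. The standard textbook definition is sketched here for intuition only; the symbols are introduced here and are not part of this guide's dataset or API.

[latexmath]
++++
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
++++

Here latexmath:[A_{ij}] is the weight of the relationship between nodes latexmath:[i] and latexmath:[j], latexmath:[k_i] is the sum of the relationship weights attached to node latexmath:[i], latexmath:[m] is the total relationship weight in the graph, and latexmath:[\delta(c_i, c_j)] is 1 when both nodes are in the same community and 0 otherwise.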
17 | 18 | *How Louvain works* 19 | 20 | Louvain is a _greedy_, _hierarchical clustering_ algorithm, meaning that it repeats the following two steps until it finds a global optimum: 21 | 22 | . Assign the nodes to communities, favoring local grouping. 23 | . Aggregate the nodes from the same community to form a single node, which inherits all connected relationships. 24 | 25 | These two steps are repeated until no further reassignments of communities are possible. 26 | You can get different results between different runs of the Louvain algorithm because the nodes can be reassigned to groups randomly. 27 | 28 | *What to consider* 29 | 30 | Louvain is significantly slower than Label Propagation, and the results can be hard to interpret. 31 | 32 | The algorithm can also use weights to calculate the communities. 33 | A good sign that you need to tweak your schema or weighting is when you notice that the results include only a _single_ giant community, or every node is a community on its own. 34 | 35 | == Louvain: examples 36 | 37 | Let's compute the Louvain community structure of our person interactions. 38 | 39 | [source, cypher] 40 | ---- 41 | CALL gds.louvain.stream({ 42 | nodeProjection: 'Person', 43 | relationshipProjection: { 44 | INTERACTS: { 45 | orientation: 'UNDIRECTED' 46 | } 47 | } 48 | }) 49 | YIELD nodeId, communityId 50 | RETURN gds.util.asNode(nodeId).name AS person, communityId 51 | ORDER BY communityId DESC 52 | ---- 53 | 54 | The query returns the name of each person and the id of the community to which it belongs. 55 | If you want to investigate how many communities are available, and the number of members of each community, you can change the RETURN statement. 
56 | 57 | [source, cypher] 58 | ---- 59 | CALL gds.louvain.stream({ 60 | nodeProjection: 'Person', 61 | relationshipProjection: { 62 | INTERACTS: { 63 | orientation: 'UNDIRECTED' 64 | } 65 | } 66 | }) 67 | YIELD nodeId, communityId 68 | RETURN communityId, COUNT(DISTINCT nodeId) AS members 69 | ORDER BY members DESC 70 | ---- 71 | 72 | The result is 1382 communities, 11 of which have more than one member. 73 | 74 | == Louvain: weighting 75 | 76 | Now let's run the Louvain algorithm on a weighted graph. 77 | This way, it considers the relationship weights when calculating the modularity. 78 | 79 | We will need to use the `weight` property on the INTERACTS relationship to evaluate communities with weights: 80 | 81 | [source,cypher] 82 | ---- 83 | CALL gds.louvain.stream({ 84 | nodeProjection: 'Person', 85 | relationshipProjection: { 86 | INTERACTS: { 87 | orientation: 'UNDIRECTED', 88 | aggregation: 'NONE', 89 | properties: { 90 | weight: { 91 | property: 'weight', 92 | aggregation: 'NONE', 93 | defaultValue: 0.0 94 | } 95 | } 96 | } 97 | }, 98 | relationshipWeightProperty: 'weight' 99 | }) 100 | YIELD nodeId, communityId 101 | RETURN communityId, COUNT(DISTINCT nodeId) AS members 102 | ORDER BY members DESC 103 | ---- 104 | 105 | The result is 1384 communities, 13 of which have more than one member. 106 | 107 | == Louvain: intermediate communities 108 | 109 | Now let's try to identify communities at multiple levels in the graph: first small communities, and then combine those smaller groups into larger ones.
110 | 111 | To retrieve the intermediate communities, set `includeIntermediateCommunities` to `true`: 112 | 113 | [source,cypher] 114 | ---- 115 | CALL gds.louvain.stream({ 116 | nodeProjection: 'Person', 117 | relationshipProjection: { 118 | INTERACTS: { 119 | orientation: 'UNDIRECTED', 120 | aggregation: 'NONE', 121 | properties: { 122 | weight: { 123 | property: 'weight', 124 | aggregation: 'NONE', 125 | defaultValue: 0.0 126 | } 127 | } 128 | } 129 | }, 130 | includeIntermediateCommunities: true 131 | }) 132 | YIELD nodeId, communityId, intermediateCommunityIds 133 | RETURN communityId, COUNT(DISTINCT nodeId) AS members, intermediateCommunityIds 134 | ---- 135 | 136 | You can extract membership in different levels of communities and see how the composition changes: 137 | 138 | [source,cypher] 139 | ---- 140 | CALL gds.louvain.stream({ 141 | nodeProjection: 'Person', 142 | relationshipProjection: { 143 | INTERACTS: { 144 | orientation: 'UNDIRECTED', 145 | aggregation: 'NONE', 146 | properties: { 147 | weight: { 148 | property: 'weight', 149 | aggregation: 'NONE', 150 | defaultValue: 0.0 151 | } 152 | } 153 | } 154 | }, 155 | includeIntermediateCommunities: true 156 | }) 157 | YIELD nodeId, intermediateCommunityIds 158 | RETURN count(distinct intermediateCommunityIds[0]), count(distinct intermediateCommunityIds[1]) 159 | ---- 160 | 161 | `includeIntermediateCommunities: false` is the default value, in which case, the `intermediateCommunityIds` field of the result is `null`. 162 | 163 | == Next Steps 164 | 165 | In the next guide, we will go back to centrality algorithms with a look at betweenness centrality. 
166 | 167 | ifdef::env-guide[] 168 | pass:a[Centralities: Betweenness] 169 | endif::[] 170 | ifdef::env-graphgist[] 171 | link:{gist}/06_betweenness.adoc[Centralities: Betweenness^] 172 | endif::[] -------------------------------------------------------------------------------- /browser-guides/data_science/06_betweenness.adoc: -------------------------------------------------------------------------------- 1 | = Betweenness Centrality 2 | :author: Neo4j Engineering 3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science 7 | :tags: data-science, gds, graph-algorithms, betweenness, centrality 8 | :neo4j-version: 3.5 9 | 10 | == Betweenness Centrality 11 | 12 | image::{img}/Graph_betweenness.jpg[float="right", width="300"] 13 | 14 | *How Betweenness Centrality works* 15 | 16 | The algorithm calculates unweighted shortest paths between all pairs of nodes in the graph. 17 | Each node receives a score based on the number of shortest paths that pass through the node. 18 | Nodes that lie on more shortest paths between other nodes will have higher betweenness centrality scores. 19 | 20 | == Betweenness Centrality: stream mode 21 | 22 | Let's find out who is influential in the graph by running Betweenness Centrality. 23 | 24 | First, you run the Betweenness Centrality algorithm in `stream` mode. 
25 | 26 | [source, cypher] 27 | ---- 28 | CALL gds.betweenness.stream({ 29 | nodeProjection: 'Person', 30 | relationshipProjection: { 31 | INTERACTS: { 32 | orientation: 'UNDIRECTED' 33 | } 34 | } 35 | }) YIELD nodeId, score 36 | RETURN gds.util.asNode(nodeId).name AS name, score 37 | ORDER BY score DESC LIMIT 10 38 | ---- 39 | 40 | If you ran Page Rank previously, you may notice that the result is similar. 41 | You can run the Page Rank query again and compare the result. 42 | 43 | [source, cypher] 44 | ---- 45 | CALL gds.pageRank.stream({ 46 | nodeProjection: 'Person', 47 | relationshipProjection: { 48 | INTERACTS: { 49 | orientation: 'UNDIRECTED' 50 | } 51 | } 52 | }) YIELD nodeId, score 53 | RETURN gds.util.asNode(nodeId).name AS name, score 54 | ORDER BY score DESC LIMIT 10 55 | ---- 56 | 57 | The result is similar, but not identical. 58 | In general, betweenness centrality is a good metric to identify bottlenecks and bridges in a graph, while page rank is used to understand the influence of a node in a network. 59 | 60 | == Betweenness Centrality: stats, write and mutate 61 | 62 | In stats mode, betweenness centrality will return the minimum, maximum, and sum of the centrality scores. 63 | 64 | [source, cypher] 65 | ---- 66 | CALL gds.betweenness.stats({ 67 | nodeProjection: 'Person', 68 | relationshipProjection: { 69 | INTERACTS: { 70 | orientation: 'UNDIRECTED' 71 | } 72 | } 73 | }) 74 | YIELD minimumScore, maximumScore, scoreSum 75 | ---- 76 | 77 | The same is returned by the write and mutate modes as well, in addition to writing results back to Neo4j (write mode) or mutating the in-memory graph (mutate mode). 78 | 79 | == Next Steps 80 | 81 | Congratulations! We have learned and practiced some of the key algorithms for studying influence (centrality) and communities (community detection). 82 | For additional learning, see the full and expanded https://localhost:7474/browser?cmd=play&arg=graph-data-science[guide] for the graph data science library. 
83 | https://neo4j.com/docs/graph-data-science/current/[Reference documentation] for the Neo4j graph data science library is also available for detailed information. -------------------------------------------------------------------------------- /browser-guides/data_science/data_science.adoc: -------------------------------------------------------------------------------- 1 | = Introduction to Graph Data Science 2 | :author: Neo4j Engineering 3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science 7 | :tags: data-science, gds, graph-algorithms 8 | :neo4j-version: 3.5 9 | 10 | == Welcome to an Introduction to Graph Data Science 11 | 12 | The Neo4j Graph Data Science (GDS) library contains a set of graph algorithms exposed through Cypher procedures. 13 | Graph algorithms provide insights into the graph structure and elements, for example, by computing centrality and similarity scores and detecting communities. 14 | 15 | This guide follows the ordinary workflow for running the product tier algorithms: PageRank, Label Propagation, Louvain, and Betweenness Centrality. We will cover the following concepts: 16 | 17 | * Create a graph and import the data. 18 | * Configure the algorithm to suit your needs and the data. 19 | 20 | image::{img}/graph-data-science.jpg[float=right] 21 | 22 | ifdef::env-guide[] 23 | . pass:a[Data and Import] 24 | . pass:a[Data Exploration] 25 | . pass:a[Page Rank] 26 | . pass:a[Label Propagation] 27 | . pass:a[Louvain] 28 | . pass:a[Betweenness Centrality] 29 | endif::[] 30 | 31 | ifdef::env-graphgist[] 32 | . link:{gist}/01_data_import.adoc[Data and Import^] 33 | . link:{gist}/02_analysis_algo.adoc[Data Exploration^] 34 | . 
link:{gist}/03_pagerank.adoc[Page Rank^] 35 | . link:{gist}/04_label_propagation.adoc[Label Propagation^] 36 | . link:{gist}/05_louvain.adoc[Louvain^] 37 | . link:{gist}/06_betweenness.adoc[Betweenness Centrality^] 38 | endif::[] 39 | 40 | == Further Resources 41 | 42 | For more resources, see link:https://neo4j.com/developer/graph-data-science/[the developer guides^]. 43 | 44 | The official Graph Data Science (GDS) library documentation can be found link:https://neo4j.com/docs/graph-data-science/current/[here^]. -------------------------------------------------------------------------------- /browser-guides/data_science/installing_apoc.adoc: -------------------------------------------------------------------------------- 1 | = Installing Awesome Procedures (apoc) 2 | :author: Neo4j Engineering 3 | 4 | == APOC library installation 5 | 6 | https://twitter.com/mesirii[Michael Hunger] has created the 7 | https://github.com/neo4j-contrib/neo4j-apoc-procedures[apoc] library which contains lots of useful procedures that we can use in our Neo4j applications. 8 | 9 | Let’s get `apoc` installed on our local instances of Neo4j: 10 | 11 | * You should have already copied `apoc.jar` onto your machine. If you haven’t, then grab a USB stick from one of the trainers or download the latest version of apoc from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/latest 12 | * Copy `apoc.jar` into your `plugins` folder wherever you have installed Neo4j. 13 | * Restart Neo4j 14 | 15 | == Check apoc installed correctly 16 | 17 | If you run the following command, you can see which additional procedures are now available to us: 18 | 19 | [source,highlight,pre-scrollable,programlisting,cm-s-neo,code,runnable,standalone-example,ng-binding] 20 | ---- 21 | CALL dbms.procedures() 22 | YIELD name, signature 23 | WITH name, signature 24 | WHERE name STARTS WITH "apoc" 25 | RETURN name, signature 26 | ---- 27 | 28 | If you don’t see any rows, grab your closest trainer for help. 
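A lighter-weight check is to ask APOC for its version. This is a minimal sketch that assumes the jar was picked up from the `plugins` folder after the restart; if the function is unknown, the plugin has not loaded.

[source,cypher]
----
// Returns the installed APOC version string if the plugin loaded
RETURN apoc.version() AS apocVersion
----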
29 | Once you’ve got it installed, you can close this guide and return to the previous one. -------------------------------------------------------------------------------- /browser-guides/got/got.adoc: -------------------------------------------------------------------------------- 1 | = Neo4j Graph of Thrones and Data Science 2 | :author: Mark Needham 3 | :description: Explore the Game of Thrones world with Cypher and data science algorithms 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/got/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got 7 | :tags: intro, cypher, load-csv, gds, algorithms, data-science 8 | :neo4j-version: 3.5 9 | 10 | == Welcome to Neo4j Graph of Thrones and Data Science 11 | 12 | image:{img}/got_header.png[got_header,float=right,width=500] 13 | 14 | ifdef::env-guide[] 15 | . pass:a[Exploratory Data Analysis] 16 | . pass:a[Applied Graph Algorithms] 17 | endif::[] 18 | 19 | ifdef::env-graphgist[] 20 | . link:{gist}/01_eda.adoc[Exploratory Data Analysis^] 21 | . 
link:{gist}/02_algorithms.adoc[Applied Graph Algorithms^] 22 | endif::[] 23 | 24 | == Further Resources 25 | 26 | * https://neo4j.com/graphgists[Graphgist Examples] 27 | * https://neo4j.com/docs/stable/cypher-refcard/[Cypher Reference Card] 28 | * https://neo4j.com/docs/cypher-manual/current/[Neo4j Cypher Manual] 29 | * https://graphdatabases.com[e-book: Graph Databases (free)] 30 | -------------------------------------------------------------------------------- /browser-guides/got_wwc/01_intro.adoc: -------------------------------------------------------------------------------- 1 | = Intro to Neo4j and Cypher 2 | :csv-url: https://raw.githubusercontent.com/neo4j-meetups/modeling-worked-example/master/data/ 3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img 4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc 5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc 6 | :icons: font 7 | :neo4j-version: 3.5 8 | 9 | == Intro to Neo4j 10 | 11 | Welcome to the first of a set of interactive guides. 12 | In these guides you'll execute some pre-written Cypher queries and have the chance to write some yourself. 13 | 14 | Let's get started! 15 | 16 | == Your Turn - `CREATE` Arya 17 | 18 | We'll use the `CREATE` keyword to create a node representing Arya Stark. 19 | Run the following query: 20 | 21 | [source,cypher] 22 | ---- 23 | CREATE (:Character {name: 'Arya Stark'}) 24 | ---- 25 | 26 | This query: 27 | 28 | * creates a node 29 | * with the `Character` label and 30 | * a `name` property with value `Arya Stark` 31 | 32 | Properties are stored as key/value pairs. 33 | The allowed data types are: strings, numbers, booleans and arrays. 34 | 35 | == MATCH - Finding Arya 36 | 37 | Now let's try and find Arya. 38 | 39 | We want to `MATCH` a pattern in the graph. 40 | In this case that pattern is a node with the `Character` label and with the `name` property set to `Arya Stark`. 
41 | 42 | [source,cypher] 43 | ---- 44 | MATCH (character:Character {name: 'Arya Stark'}) 45 | RETURN character 46 | ---- 47 | 48 | This is syntactic sugar for the following long hand: 49 | 50 | [source,cypher] 51 | ---- 52 | MATCH (character:Character) 53 | WHERE character.name = 'Arya Stark' 54 | RETURN character 55 | ---- 56 | 57 | == SET - Add and update properties 58 | 59 | Let's add Arya's title to the Arya node: 60 | 61 | [source, cypher] 62 | ---- 63 | MATCH (character:Character {name: 'Arya Stark'}) 64 | SET character.title = "Princess" 65 | RETURN character 66 | ---- 67 | 68 | == Schema-less by default 69 | 70 | Try creating Arya again: 71 | 72 | [source,cypher] 73 | ---- 74 | CREATE (:Character {name: 'Arya Stark'}) 75 | ---- 76 | 77 | What happens? 78 | 79 | [source,cypher] 80 | ---- 81 | MATCH (character:Character {name: 'Arya Stark'}) 82 | RETURN character 83 | ---- 84 | 85 | Oh no! We've now got two Aryas! 86 | 87 | == Constraints 88 | 89 | image::{img}/slides.jpg[] 90 | 91 | == Constraints 92 | 93 | Let's create a constraint on the `name` property for any `Character` nodes so we don't end up with duplicates. 94 | 95 | [source, cypher] 96 | ---- 97 | CREATE CONSTRAINT character_name ON (c:Character) 98 | ASSERT c.name IS UNIQUE; 99 | ---- 100 | 101 | Unfortunately we can't actually create the constraint because we already have two `Character` nodes with the same `name`. 102 | 103 | == Deleting the second Arya 104 | 105 | We need to delete the second Arya we created. 106 | 107 | We can work out which node that is by finding the one that doesn't have the `title` property. 
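Before deleting anything, it can help to preview what the filter will match. The read-only sketch below uses the same pattern and `WHERE` condition as the delete, but returns the internal node id and properties instead. It assumes the `title` property was set on only one of the two Arya nodes, as in the earlier `SET` step.

[source,cypher]
----
MATCH (character:Character {name: 'Arya Stark'})
WHERE NOT EXISTS (character.title)
// Read-only preview: nothing is deleted here
RETURN id(character) AS nodeId, properties(character) AS props
----

If exactly one row comes back, the same filter should be safe to use for the delete.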
108 | We'll then use the `DELETE` command to get rid of that node: 109 | 110 | [source, cypher] 111 | ---- 112 | MATCH (character:Character {name: 'Arya Stark'}) 113 | WHERE NOT EXISTS (character.title) 114 | DELETE character 115 | ---- 116 | 117 | Now we can try and apply our constraint again: 118 | 119 | [source, cypher] 120 | ---- 121 | CREATE CONSTRAINT character_name ON (c:Character) 122 | ASSERT c.name IS UNIQUE; 123 | ---- 124 | 125 | You can see which constraints and indexes have been created by running the following command: 126 | 127 | [source, cypher] 128 | ---- 129 | :schema 130 | ---- 131 | 132 | == MERGE - Get-Or-Create 133 | 134 | Now let's try and create Arya again: 135 | 136 | [source,cypher] 137 | ---- 138 | CREATE (:Character {name: 'Arya Stark'}) 139 | ---- 140 | 141 | This time the unique constraint stops us. 142 | 143 | The `MERGE` keyword can come in useful here. 144 | `MERGE` will: 145 | 146 | * `MATCH` to check the whole pattern exists 147 | * If not, Cypher will `CREATE` it 148 | * `MERGE`-ing on the constraint - ensures strong guarantees 149 | 150 | [source, cypher] 151 | ---- 152 | MERGE (character:Character {name: 'Arya Stark'}) 153 | RETURN character 154 | ---- 155 | 156 | == Exercise: Create some more nodes 157 | 158 | Now it's your turn! 159 | We need to create nodes to represent `House Stark` and `Winter is Coming`. 160 | 161 | image::{img}/nodes.png[] 162 | 163 | == Answer: Create some more nodes 164 | 165 | [source,cypher] 166 | ---- 167 | MERGE (allegiance:House {name: 'House Stark'}) 168 | RETURN allegiance 169 | ---- 170 | 171 | [source,cypher] 172 | ---- 173 | MERGE (episode:Episode {number: 1}) 174 | ON CREATE SET episode.title = 'Winter is Coming' 175 | RETURN episode 176 | ---- 177 | 178 | == Create relationships 179 | 180 | Now we need to connect our nodes together. 
181 | 182 | We'll start by writing a query to find and return `Arya Stark` and `House Stark`: 183 | 184 | [source, cypher] 185 | ---- 186 | MATCH (house:House {name: 'House Stark'}) 187 | MATCH (character:Character {name: 'Arya Stark'}) 188 | RETURN character, house 189 | ---- 190 | 191 | To create a relationship between them we can use the `CREATE` or `MERGE` keywords. 192 | 193 | [source, cypher] 194 | ---- 195 | MATCH (house:House {name: 'House Stark'}) 196 | MATCH (character:Character {name: 'Arya Stark'}) 197 | CREATE (character)-[:HAS_ALLEGIANCE_TO]->(house) 198 | ---- 199 | 200 | or 201 | 202 | [source, cypher] 203 | ---- 204 | MATCH (house:House {name: 'House Stark'}) 205 | MATCH (character:Character {name: 'Arya Stark'}) 206 | MERGE (character)-[:HAS_ALLEGIANCE_TO]->(house) 207 | ---- 208 | 209 | The `MERGE` version of the query will only create the relationship once no matter how many times we run it. 210 | The `CREATE` version will create a new relationship each time we run it. 211 | 212 | == Exercise: Create a relationship between `Arya Stark` and `Winter is Coming` 213 | 214 | Following the previous example, let's now create a relationship between `Arya Stark` and `Winter is Coming`. 
215 | 216 | == Answer: Create a relationship between `Arya Stark` and `Winter is Coming` 217 | 218 | [source, cypher] 219 | ---- 220 | MATCH (character:Character {name: 'Arya Stark'}) 221 | MATCH (episode:Episode {number: 1}) 222 | MERGE (character)-[:APPEARED_IN]->(episode) 223 | ---- 224 | 225 | == Next Step 226 | 227 | In the next section we're going to import the full dataset and play with that 228 | 229 | ifdef::env-guide[] 230 | pass:a[Game of Thrones dataset] 231 | endif::[] 232 | ifdef::env-graphgist[] 233 | link:{gist}/02_got.adoc[Game of Thrones dataset^] 234 | endif::[] -------------------------------------------------------------------------------- /browser-guides/got_wwc/02_got.adoc: -------------------------------------------------------------------------------- 1 | = Game of Thrones: Characters and Episodes 2 | :csv-url: https://raw.githubusercontent.com/mneedham/neo4j-got/master/data/import 3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img 4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc 5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc 6 | :icons: font 7 | :neo4j-version: 3.5 8 | 9 | == The Game of Thrones dataset 10 | 11 | Now that you've got a bit of practice with the Cypher syntax it's time to work with a bigger dataset. 12 | 13 | For this section it's best to start with a blank slate but first a quick overview about deleting graph data. 14 | 15 | image::{img}/slides.jpg[] 16 | 17 | == Delete all the things 18 | 19 | Run the following query to delete all the data we've created so far: 20 | 21 | [source, cypher] 22 | ---- 23 | MATCH (n) 24 | DETACH DELETE n 25 | ---- 26 | 27 | The `DETACH DELETE` clause deletes a node and any relationships connected to it. 28 | 29 | We'll also delete the constraint that we created on `:Character(name)`. 30 | In the real dataset, some characters have the same name and are distinguished by having a different `id`. 
31 | 32 | [source, cypher] 33 | ---- 34 | DROP CONSTRAINT character_name; 35 | ---- 36 | 37 | Now we're ready to explore the Game of Thrones dataset. 38 | 39 | == LOAD CSV - The ETL Power Tool 40 | 41 | We're going to be using the `LOAD CSV` command in Cypher so first look at the slides for a brief introduction. 42 | 43 | image::{img}/slides.jpg[] 44 | 45 | == LOAD CSV - Exploring the data 46 | 47 | As well as importing data from CSV files we can also use `LOAD CSV` to explore those same files. 48 | 49 | Run the following query to see how many characters there are: 50 | 51 | [source, cypher,subs=attributes] 52 | ---- 53 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters.csv" AS row 54 | RETURN COUNT(*) 55 | ---- 56 | 57 | Refer to the link:https://neo4j.com/docs/cypher-refcard/current/[Cypher Refcard^] to see the full set of commands/functions available to us. 58 | 59 | == LOAD CSV - Exploring the data 60 | 61 | We can look at the individual rows by returning them directly rather than applying the `COUNT` function. 62 | 63 | The following query will return the first 5 rows of the CSV file. 64 | 65 | [source, cypher,subs=attributes] 66 | ---- 67 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters.csv" AS row 68 | RETURN row 69 | LIMIT 5 70 | ---- 71 | 72 | The `LIMIT` clause works the same way as in SQL. 73 | 74 | Try returning more rows or removing the `LIMIT` clause to see what other data the file contains. 75 | 76 | == Create the characters 77 | 78 | Now let's combine `LOAD CSV` with the commands we learnt in the first half of the session and put all the GoT characters into our graph. 
Run the following query: 80 | 81 | [source, cypher,subs=attributes] 82 | ---- 83 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters.csv" AS row 84 | MERGE (c:Character {id: row.link}) 85 | ON CREATE SET c.name = row.character 86 | ---- 87 | 88 | This query: 89 | 90 | * iterates over every row in the `characters.csv` file 91 | * creates a node with the `Character` label and an `id` property if such a node doesn't already exist 92 | * sets the `name` property if the node is being created 93 | 94 | == Finding characters 95 | 96 | Now let's see what we've imported into the database. 97 | Run the following query to see a sample of the nodes we've just created: 98 | 99 | [source, cypher] 100 | ---- 101 | MATCH (c:Character) 102 | RETURN c 103 | ORDER BY rand() 104 | LIMIT 25 105 | ---- 106 | 107 | The use of the `rand()` function means we get a different 25 characters each time. 108 | Try running the query a few times. 109 | 110 | Now that we've got the characters loaded it's time to import some episodes for them to appear in. 111 | 112 | == Importing episodes 113 | 114 | We have a CSV file containing episodes which we can explore by running the following query: 115 | 116 | [source, cypher, subs=attributes] 117 | ---- 118 | LOAD CSV WITH HEADERS FROM "{csv-url}/overview.csv" AS row 119 | RETURN row 120 | LIMIT 10 121 | ---- 122 | 123 | We'll run the following query to create a node for each episode: 124 | 125 | [source, cypher, subs=attributes] 126 | ---- 127 | LOAD CSV WITH HEADERS FROM "{csv-url}/overview.csv" AS row 128 | MERGE (episode:Episode {id: toInteger(row.episodeId)}) 129 | ON CREATE SET 130 | episode.season = toInteger(row.season), 131 | episode.number = toInteger(row.episode), 132 | episode.title = row.title 133 | ---- 134 | 135 | `LOAD CSV` reads every field as a string, so by default the properties we set from a row have the `String` data type. 136 | In this case we want `season`, `number` and `id` to be numeric so we coerce the data using the `toInteger` function. 
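To see the coercion in isolation, the short sketch below calls `toInteger` on literal strings rather than CSV fields. The literal values are made up for illustration; the point is that a string that can't be parsed comes back as `null` rather than raising an error.

[source,cypher]
----
RETURN toInteger('4') AS season,          // the integer 4
       toInteger('not a number') AS bad   // null: the string can't be parsed
----

When the source data may be messy, it is worth checking for `null` values before using the result as a key.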
137 | 138 | So now we've got characters and episodes but we still haven't got a graph as they aren't connected yet. 139 | Let's do that next. 140 | 141 | == Connecting episodes and characters 142 | 143 | (Surprise, surprise) We also have a CSV file containing the episodes that characters appeared in. 144 | We can explore that by running the following query: 145 | 146 | [source, cypher, subs=attributes] 147 | ---- 148 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_episodes.csv" AS row 149 | RETURN row 150 | LIMIT 10 151 | ---- 152 | 153 | We're going to create an `APPEARED_IN` relationship between a `Character` and `Episode` for each row in the file. 154 | 155 | [source, cypher, subs=attributes] 156 | ---- 157 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_episodes.csv" AS row 158 | MATCH (episode:Episode {id: toInteger(row.episodeId)}) 159 | MATCH (character:Character {id: row.character}) 160 | MERGE (character)-[:APPEARED_IN]->(episode) 161 | ---- 162 | 163 | This query: 164 | 165 | * looks up an episode 166 | * looks up a character 167 | * creates an `APPEARED_IN` relationship between them if one doesn't already exist. 168 | 169 | If you run this query again you'll see that it doesn't do anything the second time around. 170 | 171 | == Characters and Episodes 172 | 173 | We should now have a graph connecting Game of Thrones characters with the episodes that they appear in. 
174 | 175 | Run the following query to check everything has imported correctly: 176 | 177 | [source, cypher] 178 | ---- 179 | MATCH (character:Character)-[:APPEARED_IN]->(episode:Episode) 180 | RETURN * 181 | ORDER BY rand() 182 | LIMIT 25 183 | ---- 184 | 185 | This query: 186 | 187 | * looks up nodes with the label `Character` 188 | * that have an outgoing `APPEARED_IN` relationship 189 | * to nodes with the label `Episode` 190 | * and finds 25 paths that match that pattern and returns them 191 | 192 | Spend a couple of minutes clicking around the graph visualisation to get a feel for the data we've imported. 193 | 194 | == Aggregation queries 195 | 196 | In the next section we'll have an exercise where you will write queries to answer some questions. 197 | A couple of these queries will require use of aggregation functions so we'll quickly go over those. 198 | 199 | Perhaps the most obvious question to answer after importing characters and episodes is `Who appeared in the most episodes?`. 200 | We can write the following query to answer that question: 201 | 202 | [source, cypher] 203 | ---- 204 | MATCH (character:Character)-[:APPEARED_IN]->() 205 | RETURN character.name, COUNT(*) AS appearances 206 | ORDER BY appearances DESC 207 | ---- 208 | 209 | Look at the slides for a quick explanation of this query: 210 | 211 | image::{img}/slides.jpg[] 212 | 213 | == Exercise 214 | 215 | Here are a few questions for you to try and answer: 216 | 217 | * Who appeared in the most episodes in season 4? 218 | * Which `Stark` character appeared the least times? 219 | * Which episodes does `Arya Stark` not appear in? (You'll need to write a `WHERE NOT` clause in this query) 220 | 221 | Don't forget the link:https://neo4j.com/docs/cypher-refcard/current/[Cypher Refcard^] is your friend! 222 | 223 | == Answer: Who appeared in the most episodes in season 4? 
224 | 225 | [source, cypher] 226 | ---- 227 | MATCH (character:Character)-[:APPEARED_IN]->({season: 4}) 228 | RETURN character.id, character.name, COUNT(*) AS appearances 229 | ORDER BY appearances DESC 230 | ---- 231 | 232 | == Answer: Which `Stark` character appeared the least times? 233 | 234 | [source, cypher] 235 | ---- 236 | MATCH (character:Character)-[:APPEARED_IN]->() 237 | WHERE character.name ENDS WITH "Stark" 238 | RETURN character.id, character.name, COUNT(*) AS appearances 239 | ORDER BY appearances 240 | LIMIT 1 241 | ---- 242 | 243 | == Answer: Which episodes does `Arya Stark` not appear in? 244 | 245 | [source, cypher] 246 | ---- 247 | MATCH (episode: Episode) 248 | WHERE NOT (:Character {name: "Arya Stark"})-[:APPEARED_IN]->(episode) 249 | RETURN episode 250 | ORDER BY episode.id 251 | ---- 252 | 253 | == Next Step 254 | 255 | In the next section we're going to look at the houses that characters belong to. 256 | 257 | ifdef::env-guide[] 258 | pass:a[Houses] 259 | endif::[] 260 | ifdef::env-graphgist[] 261 | link:{gist}/03_got_houses.adoc[Houses^] 262 | endif::[] -------------------------------------------------------------------------------- /browser-guides/got_wwc/03_got_houses.adoc: -------------------------------------------------------------------------------- 1 | = Game of Thrones: Houses 2 | :csv-url: https://raw.githubusercontent.com/mneedham/neo4j-got/master/data/import/ 3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img 4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc 5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc 6 | :icons: font 7 | :neo4j-version: 3.5 8 | 9 | == Importing houses 10 | 11 | In this next section we're going to import the houses that characters belong to. 
12 | 13 | Run the following query to explore the houses CSV file: 14 | 15 | [source, cypher,subs=attributes] 16 | ---- 17 | LOAD CSV WITH HEADERS FROM "{csv-url}/houses.csv" AS row 18 | RETURN row 19 | ---- 20 | 21 | Now let's create a node with the `House` label for each row in the file: 22 | 23 | [source, cypher,subs=attributes] 24 | ---- 25 | LOAD CSV WITH HEADERS FROM "{csv-url}/houses.csv" AS row 26 | MERGE (house:House {id: row.link}) 27 | ON CREATE SET house.name = row.name 28 | ---- 29 | 30 | Run the following query to return all the houses: 31 | 32 | [source, cypher] 33 | ---- 34 | MATCH (house:House) 35 | RETURN house 36 | ---- 37 | 38 | You should see 73 nodes if the import has worked as expected. 39 | 40 | == Exercise: Create allegiances 41 | 42 | Now it's your turn! 43 | Run the following query to view the allegiances between characters and houses: 44 | 45 | [source, cypher,subs=attributes] 46 | ---- 47 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_houses.csv" AS row 48 | RETURN row.character, row.house 49 | ---- 50 | 51 | Now create a `HAS_ALLEGIANCE_TO` relationship for each character/house pair in the file. 52 | 53 | == Answer: Create allegiances 54 | 55 | [source, cypher,subs=attributes] 56 | ---- 57 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_houses.csv" AS row 58 | MATCH (character:Character {id: row.character}) 59 | MATCH (house:House {id: row.house}) 60 | MERGE (character)-[:HAS_ALLEGIANCE_TO]->(house) 61 | ---- 62 | 63 | == Exploring allegiances 64 | 65 | Run the following query to check that the allegiances have been created: 66 | 67 | [source, cypher] 68 | ---- 69 | MATCH (character:Character)-[:HAS_ALLEGIANCE_TO]->(house) 70 | RETURN character.id, character.name, count(*) AS allegiances 71 | ORDER BY allegiances DESC 72 | ---- 73 | 74 | You should see `Randyll Tarly` in first place with 4 houses. 75 | 76 | == Exercise: What houses do people have allegiance to? 77 | 78 | Time for another mini exercise. 
79 | 80 | See if you can tweak the query from the previous slide to include the names of the houses as well as the count. 81 | 82 | _Tip_ The link:https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-collect[`collect` function] will be helpful. 83 | 84 | == Answer: What houses do people have allegiance to? 85 | 86 | We just need to add a call to `collect()` as part of our `RETURN` statement: 87 | 88 | [source, cypher] 89 | ---- 90 | MATCH (character:Character)-[:HAS_ALLEGIANCE_TO]->(house) 91 | RETURN character.id, character.name, collect(house.name) AS houses, count(*) AS allegiances 92 | ORDER BY allegiances DESC 93 | ---- 94 | 95 | == Appearances of the Starks 96 | 97 | In the previous guide we wrote the following query to find the `Stark` character who appeared in the least episodes. 98 | 99 | [source, cypher] 100 | ---- 101 | MATCH (character:Character)-[:APPEARED_IN]->() 102 | WHERE character.name ENDS WITH "Stark" 103 | RETURN character.id, character.name, count(*) AS appearances 104 | ORDER BY appearances 105 | LIMIT 1 106 | ---- 107 | 108 | This query made the assumption that all members of `House Stark` had a name that ended in `Stark`, which isn't necessarily the case. 109 | 110 | The following query finds the most prominent members of `House Stark` who don't have a `Stark` surname: 111 | 112 | [source, cypher] 113 | ---- 114 | MATCH (:House {name: "House Stark"})<-[:HAS_ALLEGIANCE_TO]-(character:Character)-[:APPEARED_IN]->() 115 | WHERE NOT(character.name ENDS WITH "Stark") 116 | RETURN character.id, character.name, count(*) AS appearances 117 | ORDER BY appearances DESC 118 | ---- 119 | 120 | Try tweaking the query to see if there are prominent characters of other houses who don't have the surname of that House. 
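As one possible tweak, the sketch below swaps in a different house. It assumes the dataset contains a house named `House Lannister` and that its members typically carry the surname `Lannister`; neither value appears in this guide, so adjust both to a house that exists in your graph.

[source,cypher]
----
// Assumed house name and surname; replace with values from your data
MATCH (:House {name: "House Lannister"})<-[:HAS_ALLEGIANCE_TO]-(character:Character)-[:APPEARED_IN]->()
WHERE NOT(character.name ENDS WITH "Lannister")
RETURN character.id, character.name, count(*) AS appearances
ORDER BY appearances DESC
----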
121 | 122 | == Multiple aggregations 123 | 124 | Let's revisit one of our queries from the first section where we found the characters who'd appeared in the most episodes: 125 | 126 | [source, cypher] 127 | ---- 128 | MATCH (character:Character)-[:APPEARED_IN]->() 129 | RETURN character.name, count(*) AS appearances 130 | ORDER BY appearances DESC 131 | ---- 132 | 133 | It would be cool to see which houses each character has allegiance to as well. 134 | We might try to extend the query to use the `collect` function to do this with the following query: 135 | 136 | [source, cypher] 137 | ---- 138 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character)-[:APPEARED_IN]->() 139 | RETURN character.id, character.name, collect(house.name) AS houses, count(*) AS appearances 140 | ORDER BY appearances DESC 141 | ---- 142 | 143 | Unfortunately, this doesn't give us the result we might have expected. 144 | We've got the house names repeated loads of times and `appearances` is now wrong! 145 | 146 | == Multiple aggregations 147 | 148 | The problem is that we're doing aggregations across different relationships. 149 | But we're trying to do it all in one go. 150 | 151 | Look at the slides for an explanation of how we can use the `WITH` keyword to get around this: 152 | 153 | image::{img}/slides.jpg[] 154 | 155 | == Multiple aggregations using `WITH` 156 | 157 | The following query will correctly calculate the houses and appearances for each character: 158 | 159 | [source, cypher] 160 | ---- 161 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character) 162 | WITH character, collect(house.name) AS houses 163 | MATCH (character)-[:APPEARED_IN]->() 164 | RETURN character.id, character.name, houses, count(*) AS appearances 165 | ORDER BY appearances DESC 166 | ---- 167 | 168 | == Exercise 169 | 170 | Update the query to: 171 | 172 | * only show characters who have appeared in 30 or more episodes. 173 | * and have allegiance to more than 1 house. 
174 | 175 | _Tip_ The link:https://neo4j.com/docs/cypher-manual/current/clauses/with/[WITH] and link:https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-size[size()] documentation pages are your friends. 176 | 177 | == Answer: only show characters who have appeared in 30 or more episodes. 178 | 179 | [source, cypher] 180 | ---- 181 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character) 182 | 183 | WITH character, collect(house.name) AS houses 184 | MATCH (character)-[:APPEARED_IN]->() 185 | 186 | WITH character, houses, count(*) AS appearances 187 | WHERE appearances >= 30 188 | 189 | RETURN character.id, character.name, houses, appearances 190 | ORDER BY appearances DESC 191 | ---- 192 | 193 | == Answer: only show characters who have appeared in 30 or more episodes and have allegiance to more than 1 house. 194 | 195 | [source, cypher] 196 | ---- 197 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character) 198 | 199 | WITH character, collect(house.name) AS houses 200 | WHERE size(houses) > 1 201 | MATCH (character)-[:APPEARED_IN]->() 202 | 203 | WITH character, houses, count(*) AS appearances 204 | WHERE appearances >= 30 205 | 206 | RETURN character.id, character.name, houses, appearances 207 | ORDER BY appearances DESC 208 | ---- 209 | 210 | == Next Step 211 | 212 | In the next section, we're going to look at the relationships between different characters and the houses they belong to. 
213 | 214 | ifdef::env-guide[] 215 | pass:a[Family Ties] 216 | endif::[] 217 | ifdef::env-graphgist[] 218 | link:{gist}/04_got_families.adoc[Family Ties^] 219 | endif::[] -------------------------------------------------------------------------------- /browser-guides/got_wwc/04_got_families.adoc: -------------------------------------------------------------------------------- 1 | = Game of Thrones: Families 2 | :csv-url: https://raw.githubusercontent.com/mneedham/neo4j-got/master/data/import 3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img 4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc 5 | :guides: https://guides.neo4j.com/got_wwc 6 | :icons: font 7 | :neo4j-version: 3.5 8 | 9 | == Importing families 10 | 11 | In this final section, we're going to import the family relationships between characters. 12 | 13 | Run the following query to explore the family ties CSV file: 14 | 15 | [source,cypher,subs=attributes] 16 | ---- 17 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row 18 | RETURN row 19 | ---- 20 | 21 | We've got mother and father relationships between pairs of characters. 22 | The father relationships are a bit more nuanced which we can see by running the following query: 23 | 24 | [source,cypher, subs=attributes] 25 | ---- 26 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row 27 | RETURN DISTINCT row.relationship, row.type 28 | ---- 29 | 30 | We have biological fathers, adoptive and legal fathers. 31 | 32 | == Importing families 33 | 34 | First let's import the mother relationships. 
35 | We'll create a `PARENT_OF` relationship from a mother to their child for each row in the CSV file: 36 | 37 | [source, cypher, subs=attributes] 38 | ---- 39 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row 40 | WITH row WHERE row.relationship = "mother" 41 | MATCH (character1:Character {id: row.character1}) 42 | MATCH (character2:Character {id: row.character2}) 43 | MERGE (character2)-[:PARENT_OF {type: "mother"}]->(character1) 44 | ---- 45 | 46 | Now we'll do the same with the fathers but we'll also record the type of father relationship as part of the `type` property on the relationship. 47 | 48 | [source, cypher, subs=attributes] 49 | ---- 50 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row 51 | WITH row WHERE row.relationship = "father" 52 | MATCH (character1:Character {id: row.character1}) 53 | MATCH (character2:Character {id: row.character2}) 54 | MERGE (character2)-[:PARENT_OF {type: row.type + " " + row.relationship}]->(character1) 55 | ---- 56 | 57 | == Who are Jon Snow's parents? 58 | 59 | Now we can finally start exploring the relationships between different characters. 60 | We'll start by finding Jon Snow's parents: 61 | 62 | [source, cypher] 63 | ---- 64 | MATCH path = (character:Character {name: "Jon Snow"})<-[:PARENT_OF]-(parent) 65 | RETURN path 66 | ---- 67 | 68 | This query returns both his biological and adoptive fathers. 69 | If we want to only return his biological parents we can tweak the query like so: 70 | 71 | [source, cypher] 72 | ---- 73 | MATCH path = (character:Character {name: "Jon Snow"})<-[parentOf:PARENT_OF]-(parent) 74 | WHERE parentOf.type IN ["mother", "biological father"] 75 | RETURN path 76 | ---- 77 | 78 | == Who are Jon Snow's biological grandparents? 79 | 80 | And finally we can find Jon Snow's biological grandparents! 
81 | 82 | [source, cypher] 83 | ---- 84 | MATCH path = (:Character {name: "Jon Snow"})<-[:PARENT_OF*..2]-(parent) 85 | WHERE all(x in relationships(path) WHERE x.type IN ["mother", "biological father"]) 86 | RETURN path 87 | ---- -------------------------------------------------------------------------------- /browser-guides/got_wwc/got_wwc.adoc: -------------------------------------------------------------------------------- 1 | = An Intro to Neo4j with Game of Thrones 2 | :author: Mark Needham 3 | :description: Learn Cypher and explore the Game of Thrones world 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc 7 | :tags: browser-guide, intro, cypher, load-csv, aggregation 8 | :neo4j-version: 3.5 9 | 10 | == Welcome to An Intro to Neo4j with Game of Thrones 11 | 12 | image::{img}/nodes.png[float=right,width=400] 13 | 14 | ifdef::env-guide[] 15 | . pass:a[Intro to Cypher] 16 | . pass:a[Game of Thrones: Characters and Episodes] 17 | . pass:a[Game of Thrones: Houses] 18 | . pass:a[Game of Thrones: Family Ties] 19 | endif::[] 20 | 21 | ifdef::env-graphgist[] 22 | . link:{gist}/01_intro.adoc[Intro to Cypher^] 23 | . link:{gist}/02_got.adoc[Game of Thrones: Characters and Episodes^] 24 | . link:{gist}/03_got_houses.adoc[Game of Thrones: Houses^] 25 | . 
link:{gist}/04_got_families.adoc[Game of Thrones: Family Ties^] 26 | endif::[] 27 | 28 | == Further Resources 29 | 30 | * https://neo4j.com/graphgists[GraphGist Examples^] 31 | * https://neo4j.com/docs/cypher-refcard/current/[Cypher Reference Card^] 32 | * https://neo4j.com/docs/cypher-manual/current/[Neo4j Cypher Manual^] 33 | * https://neo4j.com/developer/cypher-resources/[Cypher Resources^] 34 | * https://graphdatabases.com[Free e-book: Graph Databases^] -------------------------------------------------------------------------------- /browser-guides/hospital/hospital.adoc: -------------------------------------------------------------------------------- 1 | = Working with Hierarchical Trees in Neo4j 2 | :author: Tomaz Bratanic 3 | :description: Approach hierarchical tree structures in Neo4j by querying and exploring a hospital data set 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/hospital/img 5 | :tags: hierarchy, trees, parent-child, hospital, load-csv, apoc 6 | :neo4j-version: 3.5 7 | 8 | image:{img}/hospitalmeta.jpg[hospitalmeta,width=400] 9 | 10 | == Introduction 11 | 12 | My name is Tomaz Bratanic. I want to demonstrate how you should approach hierarchical location trees in Neo4j. From what I have learned while importing and querying them, I came up with a few ground rules 13 | one should follow in order to get correct query results. 14 | 15 | === Rules of location tree: 16 | 17 | * _All relationships are directed from children to parents, going up the 18 | hierarchy._ 19 | * _We have a single type for all relationships. (PARENT;FROM;IS_IN)_ 20 | * _Every node has a single outgoing relationship to its parent._ 21 | * _Every node can have one or multiple incoming relationships from its 22 | children._ 23 | 24 | === Contact: 25 | 26 | * _twitter: @tb_tomaz_ 27 | * _github: https://github.com/tomasonjo_ 28 | * _blog: https://tbgraph.wordpress.com/category/hospital_ 29 | 30 | == Import 31 | 32 | Let's load some data into our graph to explore.
33 | 34 | === Add constraints and indexes 35 | 36 | First, we need to add indexes and constraints, as they will optimize our queries. The first four statements below create the indexes, and the last two create the unique constraints. You will also need to have the APOC library installed for the spatial query later in this guide. 37 | 38 | [source,cypher] 39 | ---- 40 | CREATE INDEX ON :County(name); 41 | CREATE INDEX ON :City(name); 42 | CREATE INDEX ON :ZipCode(name); 43 | CREATE INDEX ON :Address(name); 44 | 45 | CREATE CONSTRAINT ON (h:Hospital) ASSERT h.id IS UNIQUE; 46 | CREATE CONSTRAINT ON (s:State) ASSERT s.name IS UNIQUE; 47 | ---- 48 | 49 | == Location hierarchical tree import 50 | 51 | Notice that we do not take the standard approach of 52 | merging each node separately. Instead, we merge each node in a pattern with its 53 | parent in the hierarchical tree, because some counties/cities/addresses share 54 | the same name. 55 | 56 | [source,cypher] 57 | ---- 58 | LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row 59 | WITH row 60 | WHERE row.State = 'NY' 61 | // state name is unique 62 | MERGE (state:State{name:row.State}) 63 | // merge by pattern with their parents 64 | MERGE (state)<-[:IS_IN]-(county:County{name:row.`County Name`}) 65 | MERGE (county)<-[:IS_IN]-(city:City{name:row.City}) 66 | MERGE (city)<-[:IS_IN]-(zip:ZipCode{name:row.`ZIP Code`}) 67 | MERGE (zip)<-[:IS_IN]-(address:Address{name:row.Address}) 68 | // for entities, it is best to have an id system 69 | MERGE (h:Hospital{id:row.`Provider ID`}) 70 | MERGE (h)-[:IS_IN]->(address) 71 | ---- 72 | 73 | == Additional hospital information 74 | 75 | We will also import some additional information about the hospitals such as their ratings, ownership, and more.
76 | 77 | [source,cypher] 78 | ---- 79 | LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row 80 | WITH row 81 | WHERE row.State = 'NY' 82 | MATCH (h:Hospital{id:row.`Provider ID`}) 83 | SET h.phone=row.`Phone Number`, 84 | h.emergency_services = row.`Emergency Services`, 85 | h.name= row.`Hospital Name`, 86 | h.mortality = row.`Mortality national comparison`, 87 | h.safety = row.`Safety of care national comparison`, 88 | h.timeliness = row.`Timeliness of care national comparison`, 89 | h.experience = row.`Patient experience national comparison`, 90 | h.effectiveness = row.`Effectiveness of care national comparison` 91 | MERGE (type:HospitalType{name:row.`Hospital Type`}) 92 | MERGE (h)-[:HAS_TYPE]->(type) 93 | MERGE (ownership:Ownership{name: row.`Hospital Ownership`}) 94 | MERGE (h)-[:HAS_OWNERSHIP]->(ownership) 95 | MERGE (rating:Rating{name:row.`Hospital overall rating`}) 96 | MERGE (h)-[:HAS_RATING]->(rating) 97 | ---- 98 | 99 | == Geospatial import 100 | 101 | The last thing to import is the geospatial information of hospitals. 102 | 103 | [source,cypher] 104 | ---- 105 | LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/gpsinfo.csv" as row 106 | MATCH (hospital:Hospital {id:row.id}) 107 | SET hospital.latitude = toFloat(row.latitude), 108 | hospital.longitude = toFloat(row.longitude) 109 | ---- 110 | 111 | == Spatial query example 112 | 113 | Let's say you get lost on `Liberty Island` and want to find the nearest 10 114 | hospitals. Distance is in meters. 
*Note: does not work in Neo4j Sandbox.* 115 | 116 | [source,cypher] 117 | ---- 118 | WITH "Liberty Island, Manhattan" as myLocation 119 | call apoc.spatial.geocodeOnce(myLocation) YIELD location 120 | WITH point({longitude: location.longitude, latitude: location.latitude}) as myPosition,100 as distanceInKm 121 | MATCH (h:Hospital)-->(rating:Rating) 122 | WHERE exists(h.latitude) and 123 | distance(myPosition, point({longitude:h.longitude,latitude:h.latitude})) < (distanceInKm * 1000) 124 | RETURN h.name as hospital,rating.name as rating,distance(myPosition, 125 | point({longitude:h.longitude,latitude:h.latitude})) as distance 126 | ORDER BY distance LIMIT 10 127 | ---- 128 | 129 | == Data Validation 130 | 131 | === Validation #1 132 | 133 | We can check if any `:Address` has more than one relationship going up the hierarchy. This verifies the rule that every node has a single outgoing relationship to its parent. 134 | 135 | [source,cypher] 136 | ---- 137 | MATCH (a:Address) 138 | WHERE size((a)-[:IS_IN]->()) > 1 139 | RETURN a 140 | ---- 141 | 142 | === Validation #2 143 | 144 | We can also check the length of all the paths in the location tree. 145 | Because of the rules we set, every hospital must have exactly one 146 | location path, because every hospital has exactly one address. 147 | 148 | [source,cypher] 149 | ---- 150 | MATCH path=(h:Hospital)-[:IS_IN*..10]->(location) 151 | WHERE NOT (location)-[:IS_IN]->() 152 | RETURN distinct(length(path)) as length, 153 | count(*) as numberOfPaths, 154 | count(distinct(h)) as numberOfHospitals 155 | ---- 156 | 157 | == Data Validation 158 | 159 | === Validation #3 160 | 161 | Check how many labels each node has. 162 | This is useful when learning. You do not wish to have nodes without labels. 163 | 164 | [source,cypher] 165 | ---- 166 | MATCH (n) 167 | RETURN size(labels(n)) as size,count(*) as count 168 | ---- 169 | 170 | == Queries 171 | 172 | Let's run a few queries and learn about our data.
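
A simple first query is a count of nodes per label, which gives a quick overview of what the import created. This is a minimal sketch that assumes only the labels created by the import queries earlier in this guide:

[source,cypher]
----
// Count nodes for each label combination, largest group first
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC
----

If a label from the import is missing from the result, the corresponding import step probably did not run.
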
173 | 174 | === Average rating by ownership 175 | 176 | [source,cypher] 177 | ---- 178 | MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:HAS_OWNERSHIP]->(o) 179 | RETURN o.name as ownership,avg(toInteger(r.name)) as averageRating 180 | ORDER BY averageRating DESC LIMIT 15 181 | ---- 182 | 183 | === Number of hospitals per city 184 | 185 | [source,cypher] 186 | ---- 187 | MATCH (h:Hospital)-[:IS_IN*3..3]->(city) 188 | RETURN city.name as city,count(h) as NumberOfHospitals 189 | ORDER BY NumberOfHospitals DESC LIMIT 15 190 | ---- 191 | 192 | == Queries 193 | 194 | === Top 15 states by rating 195 | 196 | [source,cypher] 197 | ---- 198 | MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:IS_IN*5..5]->(state) 199 | WHERE NOT r.name="Not Available" 200 | RETURN state.name as state,avg(toInteger(r.name)) as averageRating,count(h) as numberOfHospitals 201 | ORDER BY averageRating DESC LIMIT 15 202 | ---- 203 | 204 | === Which states have the most above-average hospitals in effectiveness 205 | 206 | [source,cypher] 207 | ---- 208 | MATCH (h:Hospital)-[:IS_IN*5..5]->(state) 209 | WHERE h.effectiveness = "Above the National average" 210 | RETURN state.name as state,h.effectiveness,count(h) as numberOfHospitals 211 | ORDER BY numberOfHospitals DESC LIMIT 15 212 | ---- 213 | 214 | === Which states have the most below-average hospitals in mortality 215 | 216 | [source,cypher] 217 | ---- 218 | MATCH (h:Hospital)-[:IS_IN*5..5]->(state) 219 | WHERE h.mortality = "Below the National average" 220 | RETURN state.name as state,h.mortality,count(h) as numberOfHospitals 221 | ORDER BY numberOfHospitals DESC LIMIT 15 222 | ---- 223 | -------------------------------------------------------------------------------- /browser-guides/img/AStormOfSwords.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/AStormOfSwords.jpg
-------------------------------------------------------------------------------- /browser-guides/img/Graph_betweenness.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/Graph_betweenness.jpg -------------------------------------------------------------------------------- /browser-guides/img/PageRanks-Example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/PageRanks-Example.png -------------------------------------------------------------------------------- /browser-guides/img/apoc-neo4j-user-defined-procedures.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/apoc-neo4j-user-defined-procedures.jpg -------------------------------------------------------------------------------- /browser-guides/img/betweenness-centrality.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/betweenness-centrality.png -------------------------------------------------------------------------------- /browser-guides/img/bugs-bunny-the-end.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/bugs-bunny-the-end.jpg -------------------------------------------------------------------------------- /browser-guides/img/char_cooccurence.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/char_cooccurence.png -------------------------------------------------------------------------------- /browser-guides/img/cypher_create.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/cypher_create.jpg -------------------------------------------------------------------------------- /browser-guides/img/cypher_run_button.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/cypher_run_button.jpg -------------------------------------------------------------------------------- /browser-guides/img/cytutorial_neo4j_browser.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/cytutorial_neo4j_browser.jpg -------------------------------------------------------------------------------- /browser-guides/img/dark-chocolate-pudding-with-malted-cream.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/dark-chocolate-pudding-with-malted-cream.jpg -------------------------------------------------------------------------------- /browser-guides/img/database_import.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/database_import.png 
-------------------------------------------------------------------------------- /browser-guides/img/document_common_attributes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/document_common_attributes.png -------------------------------------------------------------------------------- /browser-guides/img/download_csv.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/download_csv.png -------------------------------------------------------------------------------- /browser-guides/img/download_graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/download_graph.png -------------------------------------------------------------------------------- /browser-guides/img/enable_multiline_queries.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/enable_multiline_queries.jpg -------------------------------------------------------------------------------- /browser-guides/img/footballtransfer-model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/footballtransfer-model.png -------------------------------------------------------------------------------- /browser-guides/img/got_header.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/got_header.png -------------------------------------------------------------------------------- /browser-guides/img/graph-data-science.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/graph-data-science.jpg -------------------------------------------------------------------------------- /browser-guides/img/hospitalmeta.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/hospitalmeta.jpg -------------------------------------------------------------------------------- /browser-guides/img/jqassistant.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/jqassistant.png -------------------------------------------------------------------------------- /browser-guides/img/label-propagation-graph-algorithm-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/label-propagation-graph-algorithm-1.png -------------------------------------------------------------------------------- /browser-guides/img/label-propagation-graph-algorithm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/label-propagation-graph-algorithm.png -------------------------------------------------------------------------------- 
/browser-guides/img/life-science-import-datamodel.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/life-science-import-datamodel.jpg -------------------------------------------------------------------------------- /browser-guides/img/life-sciences-import-model-attribute.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/life-sciences-import-model-attribute.jpg -------------------------------------------------------------------------------- /browser-guides/img/life-sciences-import-model-gene.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/life-sciences-import-model-gene.jpg -------------------------------------------------------------------------------- /browser-guides/img/louvain.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/louvain.jpg -------------------------------------------------------------------------------- /browser-guides/img/meetup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/meetup.png -------------------------------------------------------------------------------- /browser-guides/img/n10s.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/n10s.png -------------------------------------------------------------------------------- /browser-guides/img/neo4j-browser-sync.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/neo4j-browser-sync.png -------------------------------------------------------------------------------- /browser-guides/img/nodes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/nodes.png -------------------------------------------------------------------------------- /browser-guides/img/northwind_data_model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/northwind_data_model.png -------------------------------------------------------------------------------- /browser-guides/img/pin_button.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/pin_button.png -------------------------------------------------------------------------------- /browser-guides/img/rdf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/rdf.png -------------------------------------------------------------------------------- /browser-guides/img/restaurant_recommendation_model.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/restaurant_recommendation_model.png -------------------------------------------------------------------------------- /browser-guides/img/schema.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/schema.png -------------------------------------------------------------------------------- /browser-guides/img/schema_documents.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/schema_documents.png -------------------------------------------------------------------------------- /browser-guides/img/slides.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/slides.jpg -------------------------------------------------------------------------------- /browser-guides/img/stackexchange-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/stackexchange-logo.png -------------------------------------------------------------------------------- /browser-guides/img/stackoverflow-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/stackoverflow-logo.png 
-------------------------------------------------------------------------------- /browser-guides/img/stackoverflow-model.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/stackoverflow-model.jpg -------------------------------------------------------------------------------- /browser-guides/img/style_actedin_relationship.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/style_actedin_relationship.png -------------------------------------------------------------------------------- /browser-guides/img/style_person_node.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/style_person_node.png -------------------------------------------------------------------------------- /browser-guides/img/style_sheet_grass.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/style_sheet_grass.png -------------------------------------------------------------------------------- /browser-guides/img/sushi_restaurants_nyc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/sushi_restaurants_nyc.png -------------------------------------------------------------------------------- /browser-guides/img/sysinfo_stats.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/sysinfo_stats.png -------------------------------------------------------------------------------- /browser-guides/img/transfermarkt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/transfermarkt.png -------------------------------------------------------------------------------- /browser-guides/img/ukcompanies_model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/ukcompanies_model.png -------------------------------------------------------------------------------- /browser-guides/import/01_load_csv.adoc: -------------------------------------------------------------------------------- 1 | = Neo4j import: LOAD CSV in Cypher 2 | :author: Mark Needham 3 | :description: Learn how to use 3 methods for importing data into Neo4j 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/import/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/import 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/import 7 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data 8 | :tags: import, data, load, load-csv 9 | :neo4j-version: 3.5 10 | :icons: font 11 | 12 | == Intro to the dataset 13 | 14 | Welcome to the first of a set of interactive guides. 15 | In these guides, we will import a dataset containing the connections between U.S. airports in 2008. 16 | 17 | Let's get started! 18 | 19 | == Exploring data with `LOAD CSV` 20 | 21 | While we are getting started with our dataset, it's much easier to work with a subset of the data so that we can iterate quickly. 
22 | A smaller dataset containing 10,000 connections between U.S. airports lives in `flights_initial.csv`. 23 | 24 | We can run the following query to see what data we have to work with: 25 | 26 | [source,cypher,subs=attributes] 27 | ---- 28 | LOAD CSV WITH HEADERS FROM "{data-url}/flights_initial.csv" AS row 29 | RETURN row 30 | LIMIT 5 31 | ---- 32 | 33 | This query: 34 | 35 | * loads the file `flights_initial.csv` 36 | * iterates over the file, referring to each line as the variable `row` 37 | * and returns the first 5 rows in the file 38 | 39 | If you see an error message that mentions `Couldn't load the external resource`, the CSV files haven't been copied to the correct location. 40 | Grab a trainer for help! 41 | 42 | There are lots of different fields in this CSV file. 43 | 44 | == Importing flights and airports 45 | 46 | Run the following query to create nodes and relationships for the flights: 47 | 48 | [source,cypher,subs=attributes] 49 | ---- 50 | LOAD CSV WITH HEADERS FROM "{data-url}/flights_initial.csv" AS row 51 | MERGE (origin:Airport {code: row.Origin}) 52 | MERGE (destination:Airport {code: row.Dest}) 53 | WITH row.UniqueCarrier + row.FlightNum + "_" + row.Year + "-" + row.Month + "-" + row.DayofMonth + "_" + row.Origin + "_" + row.Dest AS flightIdentifier, row, origin, destination 54 | MERGE (flight:Flight { id: flightIdentifier }) 55 | ON CREATE SET flight.date = row.Year + "-" + row.Month + "-" + row.DayofMonth, 56 | flight.airline = row.UniqueCarrier, flight.number = row.FlightNum, flight.departure = row.CRSDepTime, 57 | flight.arrival = row.CRSArrTime, flight.distance = row.Distance, flight.cancelled = row.Cancelled 58 | MERGE (flight)-[:ORIGIN]->(origin) 59 | MERGE (flight)-[:DESTINATION]->(destination) 60 | ---- 61 | 62 | This query: 63 | 64 | * iterates through each row in the file 65 | * creates nodes with the `Airport` label for the origin and destination airports if they don't already exist 66 | * creates nodes with the `Flight` label for
flights if they don't already exist. We invent our own `flightIdentifier` as there isn't one in the dataset 67 | * creates an `ORIGIN` relationship from the flight to the origin airport 68 | * creates a `DESTINATION` relationship from the flight to the destination airport 69 | 70 | You'll notice that this query took quite a while to run - we'll look at how to address that in a minute, but first let's talk about property types. 71 | 72 | == Coercing values 73 | 74 | `LOAD CSV` reads every field as a string, so by default our properties are stored as strings. 75 | This will cause us some problems when we start querying the data. 76 | 77 | What if we want to find all the flights that were longer than 500km? 78 | We might write the following query: 79 | 80 | [source,cypher] 81 | ---- 82 | MATCH (flight:Flight) 83 | WHERE flight.distance > 500 84 | RETURN flight 85 | ---- 86 | 87 | No rows! 88 | That's maybe surprising since we know there are definitely some flights that meet this criterion, but comparing a string property with a number never evaluates to true. 89 | 90 | == Coercing values: Integers 91 | 92 | Cypher has functions that allow us to coerce values to other types. 93 | You can read more about them in the https://neo4j.com/docs/cypher-manual/current/functions/scalar/#query-functions-scalar[scalar functions section] of the https://neo4j.com/docs/cypher-manual/current/[Cypher manual^]. 94 | 95 | We can use the `toInteger` function to convert the `distance` property. 96 | 97 | [source,cypher] 98 | ---- 99 | MATCH (flight:Flight) 100 | SET flight.distance = toInteger(flight.distance) 101 | ---- 102 | 103 | Now let's retry the query: 104 | 105 | [source,cypher] 106 | ---- 107 | MATCH (flight:Flight) 108 | WHERE flight.distance > 500 109 | RETURN flight 110 | ---- 111 | 112 | == Coercing values: Booleans 113 | 114 | The `cancelled` property hasn't been imported in an optimal way either. 115 | Ideally, we would like that to be a boolean value, but at the moment, it's stored as the string `"0"` or `"1"`.
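One way to confirm how the values are currently stored is to group the flights by the property. The following is a minimal sketch that assumes the import above has already been run; the aliases `cancelled` and `flights` are only illustrative.

```cypher
// Count flights per distinct value of the cancelled property.
// Returning a non-aggregated column next to count(*) groups by that column.
MATCH (flight:Flight)
RETURN flight.cancelled AS cancelled, count(*) AS flights
```

Before any conversion, the `cancelled` column is expected to contain only string values. Running the same query again after a conversion is a quick way to check that every node was updated.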
116 | 117 | There isn't a function to fix this but we can write some Cypher that will do the trick: 118 | 119 | [source,cypher] 120 | ---- 121 | MATCH (flight:Flight) 122 | SET flight.cancelled = CASE WHEN flight.cancelled = "1" THEN true ELSE false END 123 | ---- 124 | 125 | Now we can write a query to find all the flights that were cancelled: 126 | 127 | [source,cypher] 128 | ---- 129 | MATCH (flight:Flight) 130 | WHERE flight.cancelled 131 | RETURN flight 132 | ---- 133 | 134 | == Speeding up the import 135 | 136 | Next, we are going to import 40,000 more flights, but first, we need to make our import script quicker. 137 | 138 | In our initial `LOAD CSV` command, we do multiple label scans on our `MERGE` clauses to create origins, destinations, and flights. 139 | 140 | We can create unique constraints to solve this problem. 141 | This will have the added benefit of stopping us from accidentally creating duplicate nodes! 142 | 143 | [source,cypher] 144 | ---- 145 | CREATE CONSTRAINT ON (a:Airport) 146 | ASSERT a.code IS UNIQUE 147 | ---- 148 | 149 | [source,cypher] 150 | ---- 151 | CREATE CONSTRAINT ON (f:Flight) 152 | ASSERT f.id IS UNIQUE 153 | ---- 154 | 155 | Run the following commands to check our constraints were created: 156 | 157 | [source,cypher] 158 | ---- 159 | :schema 160 | ---- 161 | 162 | == Import a bigger dataset 163 | 164 | Now we are ready to import some more flights. 165 | We will use the `USING PERIODIC COMMIT` clause so that we don't build up lots of transaction state in memory - by default our query will commit every 1,000 rows. 
166 | 167 | Run the following command: 168 | 169 | [source,cypher,subs=attributes] 170 | ---- 171 | USING PERIODIC COMMIT 172 | LOAD CSV WITH HEADERS FROM "{csv-url}flights_50k.csv" AS row 173 | MERGE (origin:Airport {code: row.Origin}) 174 | MERGE (destination:Airport {code: row.Dest}) 175 | WITH row.UniqueCarrier + row.FlightNum + "_" + row.Year + "-" + row.Month + "-" + row.DayofMonth + "_" + row.Origin + "_" + row.Dest AS flightIdentifier, row, origin, destination 176 | MERGE (flight:Flight { id: flightIdentifier }) 177 | ON CREATE SET flight.date = row.Year + "-" + row.Month + "-" + row.DayofMonth, 178 | flight.airline = row.UniqueCarrier, flight.number = row.FlightNum, flight.departure = row.CRSDepTime, 179 | flight.arrival = row.CRSArrTime, flight.distance = row.Distance, flight.cancelled = row.Cancelled 180 | MERGE (flight)-[:ORIGIN]->(origin) 181 | MERGE (flight)-[:DESTINATION]->(destination) 182 | ---- 183 | 184 | == Checking our import 185 | 186 | We now have 50,000 flights in the database, which we can check by executing the following query: 187 | 188 | WARNING: If you don't have enough heap configured, this query will fail, despite the `PERIODIC COMMIT`. That's because of the `Eager` operator that's inserted by the double `MERGE` on the same label-property combination. 189 | 190 | [source,cypher] 191 | ---- 192 | MATCH (:Flight) 193 | RETURN count(*) 194 | ---- 195 | 196 | == Next step 197 | 198 | We can get a lot of data into Neo4j using pure Cypher, but if we want to import data from other sources, then APOC is the best method that covers a wide range of other data import scenarios. 
199 | 200 | ifdef::env-guide[] 201 | pass:a[Cypher and APOC] 202 | endif::[] 203 | 204 | ifdef::env-graphgist[] 205 | link:{gist}/02_apoc.adoc[Cypher and APOC^] 206 | endif::[] -------------------------------------------------------------------------------- /browser-guides/import/03_procedures.adoc: -------------------------------------------------------------------------------- 1 | = Neo4j import: Custom procedures 2 | :author: Mark Needham 3 | :description: Learn how to use 3 methods for importing data into Neo4j 4 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data 5 | :img: https://s3.amazonaws.com/guides.neo4j.com/import/img 6 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/import 7 | :guides: https://s3.amazonaws.com/guides.neo4j.com/import 8 | :tags: import, data, load, custom-procedures, user-defined, procedures 9 | :neo4j-version: 3.5 10 | :icons: font 11 | 12 | == Procedures 13 | 14 | In this next section, we'll get some practice writing custom procedures. 15 | You'll need to have Java installed on your machine for this exercise. 16 | 17 | == OpenStreetMap 18 | 19 | OpenStreetMap provides https://wiki.openstreetmap.org/wiki/Downloading_data[several different ways to export data^], including the Overpass API which allows us to specify the coordinates of a bounded box that we would like to download. 20 | 21 | e.g. https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145] 22 | 23 | If we open that URI, we will see something like this: 24 | 25 | ``` 26 | 27 | The data included in this document is from www.openstreetmap.org. The data is made available under ODbL. 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | ``` 39 | 40 | == Exploring OpenStreetMap with `apoc.load.xml` 41 | 42 | We want to create nodes based on the `<node>` elements and connect them together using the `<way>` elements.
43 | 44 | In OSM, https://wiki.openstreetmap.org/wiki/Node[a node^] represents "a single point in space defined by its latitude, longitude, and node id." 45 | 46 | Let's first try using APOC's `apoc.load.xml` procedure to do this. 47 | The following query finds the points in a bounded box in Munich: 48 | 49 | [source,cypher] 50 | ---- 51 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]') 52 | YIELD value 53 | UNWIND value["_children"] AS child 54 | 55 | WITH child WHERE child["_type"] = "node" 56 | RETURN child.id AS id, child.lat AS latitude, child.lon AS longitude, child["user"] AS userName 57 | LIMIT 10 58 | ---- 59 | 60 | == Importing OpenStreetMap with `apoc.load.xml` 61 | 62 | Now let's import those points! 63 | 64 | First we'll create a unique constraint on `:Point(id)` so that we don't end up with duplicate points. 65 | This command will also create an index which will be useful in the next section: 66 | 67 | [source,cypher] 68 | ---- 69 | CREATE CONSTRAINT ON (p:Point) 70 | ASSERT p.id is UNIQUE 71 | ---- 72 | 73 | == Import the XML data 74 | 75 | Run the following query to import the xml data and create the `Users` and `Points` for our data model: 76 | 77 | [source,cypher] 78 | ---- 79 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]') 80 | YIELD value 81 | UNWIND value["_children"] AS child 82 | 83 | WITH child WHERE child["_type"] = "node" 84 | WITH child.id AS id, child.lat AS latitude, child.lon AS longitude, child["user"] AS userName 85 | 86 | MERGE (point:Point {id: id}) 87 | SET point.latitude = latitude, point.longitude = longitude 88 | MERGE (user:User {name: userName}) 89 | MERGE (user)-[:EDITED]->(point) 90 | ---- 91 | 92 | == Verify data in the graph 93 | 94 | We can run the following query to check the points were created: 95 | 96 | [source,cypher] 97 | ---- 98 | MATCH (point:Point)<-[:EDITED]-(user) 99 | RETURN point.id, point.latitude, point.longitude, 
user.name 100 | LIMIT 25 101 | ---- 102 | 103 | == Importing OpenStreetMap with `apoc.load.xml` 104 | 105 | Next, we want to create a relationship between adjacent points. 106 | 107 | Let's first see what the data in the `<way>` elements looks like: 108 | 109 | [source,cypher] 110 | ---- 111 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]') 112 | YIELD value 113 | UNWIND value["_children"] AS child 114 | 115 | WITH child WHERE child["_type"] = "way" 116 | RETURN child.id AS id, [child in child["_children"] where child["_type"] = "nd"] AS children 117 | LIMIT 1 118 | ---- 119 | 120 | We want to create a `CONNECTS` relationship between the adjacent nodes inside a given `way`. For instance, if `children` contained `[1,2,3]` we want to create `(1)-[:CONNECTS]->(2)` and `(2)-[:CONNECTS]->(3)`. 121 | 122 | == Importing OpenStreetMap with `apoc.load.xml` 123 | 124 | Run the following query to add a `CONNECTS` relationship between adjacent nodes: 125 | 126 | [source,cypher] 127 | ---- 128 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]') 129 | YIELD value 130 | UNWIND value["_children"] AS child 131 | 132 | WITH child WHERE child["_type"] = "way" 133 | WITH child.id AS id, [child in child["_children"] where child["_type"] = "nd"] AS children 134 | UNWIND range(0, size(children) - 2) AS idx 135 | WITH id, children[idx] as start, children[idx+1] AS end 136 | MATCH (p1:Point {id: start["ref"]}) 137 | MATCH (p2:Point {id: end["ref"]}) 138 | MERGE (p1)-[:CONNECTS]->(p2) 139 | ---- 140 | 141 | == Querying OpenStreetMap 142 | 143 | Now let's see if we can find a path between two points: 144 | 145 | [source,cypher] 146 | ---- 147 | MATCH (p1:Point {id: "3800618341"}) 148 | MATCH (p2:Point {id: "1485915298"}) 149 | MATCH path = shortestpath((p1)-[:CONNECTS*]-(p2)) 150 | RETURN p1, p2, path 151 | ---- 152 | 153 | Cool! All good so far.
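The adjacent-pair step from the `CONNECTS` import can be tried on its own without touching the graph. This is a minimal sketch, assuming a literal list `[1, 2, 3]` in place of the `nd` references of a way; `startRef` and `endRef` are illustrative names that are not part of the import query.

```cypher
// Pair each list element with its successor.
// range(0, size(children) - 2) yields every index that still has a right-hand neighbour.
WITH [1, 2, 3] AS children
UNWIND range(0, size(children) - 2) AS idx
RETURN children[idx] AS startRef, children[idx + 1] AS endRef
```

This should return two rows, pairing 1 with 2 and 2 with 3. The upper bound is `size(children) - 2` so that `idx + 1` never runs past the end of the list.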
154 | 155 | == Custom procedures 156 | 157 | We were able to achieve what we wanted with `apoc.load.xml`, but the Cypher we have to write gets more complicated as we get deeper into the XML structure. 158 | We also had to run two queries to achieve our desired graph structure. It would be nice if we could do everything in one pass. 159 | 160 | We've started on the implementation of a procedure that can do just this! 161 | You can find it on the Neo4j training repository - https://github.com/neo4j-contrib/training/tree/master/import/custom-procedure. 162 | 163 | == OSM Import Procedure 164 | 165 | Go ahead and clone the repository, then build the procedure by executing the following command: 166 | 167 | ``` 168 | mvn clean install -DskipTests 169 | ``` 170 | 171 | We'll then have the following jar in our `target` directory: 172 | 173 | ``` 174 | $ ls target/neo4j*.jar 175 | target/neo4j-procedures-examples-1.0.0-SNAPSHOT.jar 176 | ``` 177 | 178 | Copy that into your Neo4j `plugins` directory and restart Neo4j. 179 | 180 | == Running the OSM Import Procedure 181 | 182 | We've already implemented importing nodes which you can try out by executing the following command: 183 | 184 | [source, cypher] 185 | ---- 186 | CALL osm.importUri('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543, 48.145]') 187 | ---- 188 | 189 | == Exercise: Adding connections to the OSM Import Procedure 190 | 191 | Now we need to update our procedure to import the connections as well. 192 | 193 | This is where you will need Java installed on your system to give this a try. 194 | 195 | == Next steps 196 | 197 | Congratulations! You have completed this guide and learned how to use Cypher with LOAD CSV, the APOC library, and custom procedures to import data into Neo4j. 198 | There are more things you can do with each of these methods, as well as other import methods available, such as the ETL tool, language drivers, Kettle, and command line tools. 
199 | Feel free to check out the resources linked below for more information! 200 | 201 | * https://neo4j.com/developer/data-import/[Data Import with Neo4j] 202 | * https://neo4j.com/docs/cypher-manual/current/[Cypher documentation] 203 | * https://neo4j.com/labs/apoc/current/[APOC documentation] 204 | * https://neo4j.com/docs/cypher-manual/current/functions/user-defined/[User-Defined Cypher Functions] 205 | * https://neo4j.com/developer/cypher/procedures-functions/[Writing custom procedures and functions] -------------------------------------------------------------------------------- /browser-guides/import/import.adoc: -------------------------------------------------------------------------------- 1 | = Neo4j Import 2 | :author: Mark Needham 3 | :description: Learn how to use 3 methods for importing data into Neo4j 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/import/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/import 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/import 7 | :tags: import, data, load, load-csv, apoc, procedures 8 | :neo4j-version: 3.5 9 | 10 | == Welcome to Neo4j Import 11 | 12 | image:{img}/database_import.png[db-import,width=300,float=right] 13 | 14 | In this set of guides, we will cover three different ways to import data into Neo4j. These are not the only three ways to ingest data for Neo4j, but they are common methods. 15 | 16 | ifdef::env-guide[] 17 | . pass:a[Cypher and LOAD CSV] 18 | . pass:a[Cypher and APOC] 19 | . pass:a[Procedures] 20 | endif::[] 21 | 22 | ifdef::env-graphgist[] 23 | . link:{gist}/01_load_csv.adoc[Cypher and LOAD CSV^] 24 | . link:{gist}/02_apoc.adoc[Cypher and APOC^] 25 | . 
link:{gist}/03_procedures.adoc[Procedures^] 26 | endif::[] 27 | 28 | == Further Resources 29 | 30 | * https://neo4j.com/graphgists[Graph Gist Examples] 31 | * https://neo4j.com/docs/stable/cypher-refcard/[Cypher Reference Card] 32 | * https://neo4j.com/labs/apoc/[APOC documentation] 33 | * https://neo4j.com/docs/cypher-manual/current/[Cypher documentation] 34 | * https://graphdatabases.com[e-book: Graph Databases (free)] -------------------------------------------------------------------------------- /browser-guides/meetup/01_meetup_import.adoc: -------------------------------------------------------------------------------- 1 | = Data import 2 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data 3 | :img: https://s3.amazonaws.com/guides.neo4j.com/meetup/img 4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/meetup 5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/meetup 6 | :icons: font 7 | :neo4j-version: 3.5 8 | 9 | == Import data from Meetup API 10 | 11 | First, we need to import the data from the Meetup API. 12 | Many of the endpoints provided by Meetup.com are restricted and require an account and credentials, but for this guide, we will only query the open endpoint for event RSVPs. 13 | 14 | For additional data or analysis, you can create a free account and import from many other endpoints, as outlined in the https://www.meetup.com/meetup_api/docs/[Meetup API documentation^]. 15 | 16 | == Setup: Indexes and Constraints 17 | 18 | To help speed up performance of queries and ensure unique entities, let's go ahead and set up some constraints and indexes. 19 | 20 | *Note:* Ensure the `Enable multi statement query editor` setting is checked under `Settings` in Neo4j Browser. 
21 | 22 | [source,cypher] 23 | ---- 24 | CREATE INDEX ON :Member(id); 25 | 26 | CREATE INDEX ON :Event(id); 27 | CREATE INDEX ON :Event(time); 28 | CREATE INDEX ON :Event(location); 29 | 30 | CREATE INDEX ON :Group(id); 31 | CREATE INDEX ON :Group(name); 32 | CREATE INDEX ON :Group(location); 33 | 34 | CREATE INDEX ON :Venue(id); 35 | CREATE INDEX ON :Venue(location); 36 | CREATE INDEX ON :RSVP(id); 37 | CREATE INDEX ON :Topic(name); 38 | CREATE INDEX ON :Topic(urlkey); 39 | 40 | CREATE INDEX ON :City(name); 41 | CREATE INDEX ON :City(location); 42 | CREATE INDEX ON :City(population); 43 | 44 | CREATE INDEX ON :Country(iso2); 45 | CREATE INDEX ON :Country(name); 46 | 47 | CREATE CONSTRAINT ON (t:Topic) ASSERT t.id IS UNIQUE; 48 | ---- 49 | 50 | == Import data 51 | 52 | Now we can import the data with the statement below. 53 | It creates group, member, event, venue, rsvp, and topic entities in our graph. 54 | 55 | The query will take a few minutes to complete, as it is retrieving 100 entities from the API and creating all of the relations at once. 56 | 57 | *Note:* Each time this query is run, it may yield different results. The query is not filtering a specific set of RSVP data, so it will retrieve whatever is provided by the API. 
58 | 59 | [source, cypher] 60 | ---- 61 | WITH 'https://stream.meetup.com/2/rsvps' as url 62 | CALL apoc.load.json(url) YIELD value 63 | WITH value LIMIT 100 64 | WITH value.venue as venueData, value.member as memberData, value.event as eventData, value.group.group_topics as topics, value as data, apoc.map.removeKeys(value.group, ['group_topics']) as groupData 65 | 66 | MERGE (member:Member { id: memberData.member_id }) 67 | ON CREATE SET member.name = memberData.member_name, member.photo = memberData.photo 68 | 69 | MERGE (event:Event { id: eventData.event_id }) 70 | ON CREATE SET event.name = eventData.event_name, event.time = datetime({ epochMillis: coalesce(eventData.time, 0) }), event.url = eventData.event_url 71 | 72 | MERGE (group:Group { id: groupData.group_id }) 73 | ON CREATE SET group.name = groupData.group_name, group.city = groupData.group_city, group.country = groupData.group_country, group.state = groupData.group_state, group.location = point({latitude: groupData.group_lat, longitude: groupData.group_lon}), group.urlname = groupData.group_urlname 74 | 75 | MERGE (venue:Venue { id: coalesce(venueData.venue_id, randomUUID()) }) 76 | ON CREATE SET venue.name = venueData.venue_name, venue.location = point({latitude: venueData.lat, longitude: venueData.lon}) 77 | 78 | CREATE (rsvp:RSVP {id: coalesce(data.rsvp_id, randomUUID()), guests: coalesce(data.guests, 0), mtime: datetime({ epochMillis: coalesce(data.mtime, 0) }), response: data.response, visibility: data.visibility}) 79 | MERGE (rsvp)-[:MEMBER]->(member) 80 | MERGE (rsvp)-[:EVENT]->(event) 81 | MERGE (rsvp)-[:GROUP]->(group) 82 | 83 | MERGE (member)-[:RSVP]->(event) 84 | MERGE (event)<-[:HELD]-(group) 85 | MERGE (event)-[:LOCATED_AT]->(venue) 86 | 87 | WITH group, topics 88 | UNWIND topics as tp 89 | MERGE (t:Topic { urlkey: tp.urlkey }) 90 | ON CREATE SET t.name = tp.topic_name 91 | MERGE (group)-[:TOPIC]->(t); 92 | ---- 93 | 94 | == Verify data import 95 | 96 | We should have a small data set in 
our graph database for us to query and explore now! 97 | Before we dive into exploration, though, let's take a look at the data model of what we imported. 98 | 99 | [source,cypher] 100 | ---- 101 | //what does our data model look like? 102 | CALL db.schema.visualization(); 103 | ---- 104 | 105 | == Improvements? 106 | 107 | Hm, it might be nice to have location (country/city) separated for our meetup groups so that we can easily query for groups in a certain area. 108 | Let's see if we can fix that by importing all countries and cities in the world. 109 | 110 | == Import World Cities/Countries 111 | 112 | [source,cypher,subs=attributes] 113 | ---- 114 | LOAD CSV WITH HEADERS 115 | FROM '{data-url}/worldcities.csv' AS line 116 | 117 | MERGE (country:Country {name: coalesce(line.country, '')}) 118 | SET country.iso2 = coalesce(line.iso2, ''), country.iso3 = coalesce(line.iso3, '') 119 | 120 | MERGE (c:City {name: coalesce(line.city, '')}) 121 | SET c.id = coalesce(line.id, ''), c.asciiName = coalesce(line.city_ascii, ''), c.adminName = coalesce(line.admin_name, ''), c.capital = coalesce(line.capital, ''), c.location = point({latitude: toFloat(coalesce(line.lat, '0.0')), longitude: toFloat(coalesce(line.lng, '0.0'))}), c.population = coalesce(toInteger(line.population), 0) 122 | 123 | MERGE (c)-[:IN]->(country); 124 | ---- 125 | 126 | == Verify City/Country import 127 | 128 | We can verify our last import with a quick query searching for the city of `London`. 129 | 130 | [source,cypher] 131 | ---- 132 | MATCH (c:City {name: 'London'})-[r:IN]-(o:Country) 133 | RETURN c, r, o 134 | ---- 135 | 136 | A few results should come back. It looks like that United Kingdom city also shares a name with cities in a couple of different states in the United States, as well as a city in Canada. 137 | 138 | Now we need to tie those locations back to our meetup groups!
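The location queries in this guide compare `distance()` results, which are returned in meters for geographic points, against thresholds expressed in miles. As a quick sanity check on such constants, the conversion can be computed in Cypher itself. This is only a sketch; the name `metersPerMile` is illustrative, and 1609.344 is the number of meters in one international mile.

```cypher
// Convert mile-based thresholds to meters before comparing them with distance().
WITH 1609.344 AS metersPerMile
RETURN 15 * metersPerMile AS fifteenMilesInMeters,
       10 * metersPerMile AS tenMilesInMeters
```

Fifteen miles comes out at roughly 24140 meters and ten miles at roughly 16093 meters, which gives a reference point for any hard-coded threshold.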
139 | 140 | == Add relationships between locations, meetup groups, and events 141 | 142 | [source,cypher] 143 | ---- 144 | //link groups and locations 145 | MATCH (g:Group) 146 | WITH toUpper(g.country) as iso2, g 147 | MATCH (c:Country {iso2: iso2}) 148 | MERGE (g)-[r:IN]->(c) 149 | RETURN count(r); 150 | ---- 151 | 152 | [source,cypher] 153 | ---- 154 | //link venues and cities 155 | CALL apoc.periodic.iterate("MATCH (c:City) RETURN c.location as loc, c", 156 | "WITH loc, c, 24140.2 as FifteenMilesInMeters 157 | MATCH (v:Venue) 158 | WHERE distance(v.location, c.location) < FifteenMilesInMeters 159 | MERGE (v)-[r:NEAR]->(c)", { batchSize: 500 }) 160 | YIELD batches, total 161 | RETURN batches, total; 162 | ---- 163 | 164 | == Import check 165 | 166 | Now that we have all of that data, let's take a look at our data model again, then run a few summary queries to understand what all we have. 167 | 168 | [source,cypher] 169 | ---- 170 | CALL db.schema.visualization(); 171 | ---- 172 | 173 | == Data summary queries 174 | 175 | [source,cypher] 176 | ---- 177 | //How many meetup groups are in our dataset? 178 | MATCH (n:Group) RETURN count(n); 179 | ---- 180 | 181 | [source,cypher] 182 | ---- 183 | //find some cities with events 184 | MATCH (c:City)-[n:NEAR]-(v:Venue)-[l:LOCATED_AT]-(e:Event) 185 | RETURN * LIMIT 20; 186 | ---- 187 | 188 | [source,cypher] 189 | ---- 190 | //find some upcoming events 191 | MATCH (e:Event)-[l:LOCATED_AT]-(v:Venue)-[n:NEAR]-(c:City) 192 | WHERE e.time > datetime() 193 | RETURN * LIMIT 20; 194 | ---- 195 | 196 | == Next 197 | 198 | In the next section, we are going to explore our data more thoroughly using queries. 
199 | 200 | ifdef::env-guide[] 201 | pass:a[Data Analysis] 202 | endif::[] 203 | 204 | ifdef::env-graphgist[] 205 | link:{gist}/02_data_analysis.adoc[Data Analysis^] 206 | endif::[] -------------------------------------------------------------------------------- /browser-guides/meetup/02_data_analysis.adoc: -------------------------------------------------------------------------------- 1 | = Data Analysis 2 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data 3 | :img: https://s3.amazonaws.com/guides.neo4j.com/meetup/img 4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/meetup 5 | :icons: font 6 | :neo4j-version: 3.5 7 | 8 | == Explore Meetup data with Cypher 9 | 10 | We can dig deeper into our graph and gain insights from the relationships between entities. 11 | 12 | First, let's see which cities have the most events. 13 | 14 | *Remember:* We imported whatever RSVP data the API happened to return, so your results may vary.
15 | 16 | [source,cypher] 17 | ---- 18 | //find cities with most events 19 | MATCH (c:City)-[n:NEAR]-(v:Venue)-[l:LOCATED_AT]-(e:Event) 20 | RETURN c.name, count(e) as count ORDER BY count DESC; 21 | ---- 22 | 23 | == Analysis: Topics 24 | 25 | === Query if any groups are in the Tech topic 26 | [source,cypher] 27 | ---- 28 | //note: there may not be any groups for this topic in our database due to random import 29 | MATCH (t:Topic {name: 'Tech'})-[r:TOPIC]-(g:Group) 30 | RETURN *; 31 | ---- 32 | 33 | === Find most popular topics 34 | 35 | [source,cypher] 36 | ---- 37 | MATCH (t:Topic)-[r:TOPIC]-(g:Group) 38 | RETURN t.name, count(g) as count ORDER BY count DESC 39 | ---- 40 | 41 | == Analysis: Topics 42 | 43 | === Find which users attend the most Meetups in a random topic 44 | 45 | [source,cypher] 46 | ---- 47 | MATCH (t:Topic) 48 | WITH collect(t) as topics 49 | WITH apoc.coll.randomItem(topics) as targetTopic 50 | MATCH (targetTopic)-[:TOPIC]-(g:Group)-[:HELD]-(e:Event)<-[:EVENT]-(r:RSVP)-[:MEMBER]-(member:Member) 51 | RETURN targetTopic.name as topic, member.name as member, count(r) as RSVPs 52 | ORDER BY RSVPs DESC limit 10; 53 | ---- 54 | 55 | == Analysis: Groups 56 | 57 | === Which group was created most recently? 58 | 59 | [source,cypher] 60 | ---- 61 | MATCH (g:Group) 62 | RETURN g 63 | ORDER BY g.created DESC 64 | LIMIT 1 65 | ---- 66 | 67 | === How many groups have been running for at least 1 year? 68 | 69 | [source,cypher] 70 | ---- 71 | //note: there may not be any results due to random import 72 | MATCH (g:Group) 73 | WHERE (timestamp() - g.created) / 1000 / 3600 / 24 / 365 >= 1 74 | RETURN count(g) 75 | ---- 76 | 77 | == Analysis: Groups 78 | 79 | === Find groups with 'Neo4j' or 'Data' in their name. 80 | 81 | [source,cypher] 82 | ---- 83 | MATCH (g:Group) 84 | WHERE g.name CONTAINS 'Neo4j' OR g.name CONTAINS 'Data' 85 | RETURN g 86 | ---- 87 | 88 | === What are the distinct topics for those groups? 
89 | 90 | [source,cypher] 91 | ---- 92 | MATCH (g:Group)-[:TOPIC]->(t:Topic) 93 | WHERE g.name CONTAINS 'Neo4j' OR g.name CONTAINS 'Data' 94 | RETURN t.name, count(*) 95 | ---- 96 | 97 | == Analysis: Events 98 | 99 | === Who brings the most guests? 100 | 101 | [source,cypher] 102 | ---- 103 | MATCH (r:RSVP)-[:MEMBER]->(m:Member) 104 | WHERE r.guests > 5 105 | RETURN m.name, sum(r.guests) as totalGuests 106 | ORDER BY totalGuests DESC limit 10; 107 | ---- 108 | 109 | === Which venue hosts the most meetups? 110 | 111 | [source,cypher] 112 | ---- 113 | MATCH (v:Venue)<-[:LOCATED_AT]-(e:Event) 114 | WHERE v.name IS NOT NULL 115 | RETURN v.name, v.location, count(e) as events 116 | ORDER BY events desc 117 | LIMIT 10; 118 | ---- 119 | 120 | == Analysis: Events 121 | 122 | === Find meetups a random venue has hosted? 123 | 124 | [source,cypher] 125 | ---- 126 | MATCH (v:Venue) 127 | WHERE v.name IS NOT NULL 128 | WITH collect(v) as venues 129 | WITH apoc.coll.randomItem(venues) as venue 130 | MATCH (venue)<-[:LOCATED_AT]-(e:Event)<-[:HELD]-(g:Group), 131 | (e)-[:EVENT]-(r:RSVP) 132 | RETURN venue.name, venue.location, e.name, g.name, count(r) as RSVPs 133 | LIMIT 10; 134 | ---- 135 | 136 | == Analysis: Shortest paths 137 | 138 | === Shortest paths between random venues 139 | 140 | [source,cypher] 141 | ---- 142 | MATCH (v:Venue) 143 | WHERE v.name IS NOT NULL 144 | WITH collect(v) as venues 145 | WITH apoc.coll.randomItem(venues) as v1, apoc.coll.randomItem(venues) as v2 146 | MATCH p=shortestPath((v1)-[*]-(v2)) 147 | RETURN p; 148 | ---- 149 | 150 | === Shortest path between two random topics 151 | 152 | [source,cypher] 153 | ---- 154 | MATCH (t:Topic) 155 | WITH collect(t) as topics 156 | WITH apoc.coll.randomItem(topics) as t1, apoc.coll.randomItem(topics) as t2 157 | MATCH p=shortestPath((t1)-[*]-(t2)) 158 | RETURN p; 159 | ---- 160 | 161 | == Analysis: Shortest paths 162 | 163 | === Shortest path among 3 random members 164 | 165 | [source,cypher] 166 | ---- 167 | 
MATCH (m:Member) 168 | WITH collect(m) as members 169 | WITH apoc.coll.randomItem(members) as m1, apoc.coll.randomItem(members) as m2, apoc.coll.randomItem(members) as m3 170 | MATCH p1=shortestPath((m1)-[*]-(m2)), 171 | p2=shortestPath((m2)-[*]-(m3)), 172 | p3=shortestPath((m1)-[*]-(m3)) 173 | RETURN p1, p2, p3; 174 | ---- 175 | 176 | == Analysis: Find events in area 177 | 178 | === Find future Richmond meetups within 10 miles of downtown 179 | 180 | [source,cypher] 181 | ---- 182 | WITH point({ latitude: 37.5407246, longitude: -77.4360481 }) as RichmondVA, 16093.4 as TenMiles /* 10 mi expressed in meters */ 183 | MATCH (v:Venue)<-[:LOCATED_AT]-(e:Event)-[:HELD]-(g:Group) 184 | WHERE distance(v.location, RichmondVA) < TenMiles AND e.time > datetime() 185 | RETURN g.name as GroupName, e.name as EventName, e.time as When, v.name as Venue limit 10; 186 | ---- 187 | 188 | == Analysis: Find events in area 189 | 190 | === Find events within distance of random location 191 | 192 | [source,cypher] 193 | ---- 194 | WITH rand() * 90 * (CASE WHEN rand() <= 0.5 THEN 1 ELSE -1 END) as randLat, rand() * 90 * (CASE WHEN rand() <= 0.5 THEN 1 ELSE -1 END) as randLon 195 | WITH point({ latitude: randLat, longitude: randLon }) as randomLocation 196 | MATCH (v:Venue)-[:NEAR]->(city:City)-[:IN]->(c:Country) 197 | RETURN city.name as City, 198 | c.name as Country, 199 | v.name as Venue, 200 | v.location as VenueLocation, 201 | randomLocation as RandomLocation, 202 | distance(v.location, randomLocation) as DistanceInMeters 203 | ORDER BY distance(v.location, randomLocation) ASC 204 | LIMIT 1; 205 | ---- 206 | 207 | == Analysis: Find events in area 208 | 209 | === Find upcoming dance events in Manhattan 210 | 211 | [source,cypher] 212 | ---- 213 | WITH point({ latitude: 40.758896, longitude: -73.985130 }) as TimesSquareManhattan, 16093.4 as TenMiles 214 | MATCH (v:Venue)<-[:LOCATED_AT]-(e:Event), 215 | (e)-[:HELD]-(g:Group), 216 | (g)-[:TOPIC]->(t:Topic), 217 | (e)<-[:EVENT]-(r:RSVP) 218 
| WHERE e.time >= datetime("2018-09-06T00:00:00Z") AND 219 | e.time <= datetime("2018-09-06T23:59:59Z") AND 220 | distance(v.location, TimesSquareManhattan) < TenMiles AND 221 | v.name is not null AND 222 | t.name =~ '(?i).*dancing.*' 223 | RETURN g.name as GroupName, 224 | collect(distinct t.name) as topics, 225 | e.name as EventName, 226 | count(r) as RSVPs, 227 | e.time as When, 228 | v.name as Venue 229 | ORDER BY RSVPs DESC 230 | LIMIT 100; 231 | ---- 232 | 233 | == Next 234 | 235 | We have seen how to use Cypher to import and analyze meetup data from the Meetup API. 236 | We can continue analysis with additional queries, import other data for more layers, and more! -------------------------------------------------------------------------------- /browser-guides/meetup/meetup.adoc: -------------------------------------------------------------------------------- 1 | = Analyzing Meetup Data with Neo4j 2 | :author: Neo4j Devrel 3 | :description: Analyze API data from Meetup.com with Neo4j 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/meetup/img 5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/meetup 6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/meetup 7 | :tags: cypher, data-analysis, similarity, import, load-csv 8 | :neo4j-version: 3.5 9 | :icons: font 10 | 11 | == Import and Analyze data from Meetup.com API 12 | 13 | image::{img}/meetup.png[float=right] 14 | 15 | In this guide, we will call the Meetup API that provides data for meetup groups, topics, members, events, and more to import the data set to Neo4j. 16 | Once imported, we can explore the data as a graph using the Cypher query language to retrieve and discover insights and interesting details. 17 | 18 | Table of Contents: 19 | 20 | ifdef::env-guide[] 21 | . pass:a[Data Import] 22 | . pass:a[Data Analysis] 23 | endif::[] 24 | 25 | ifdef::env-graphgist[] 26 | . link:{gist}/01_meetup_import.adoc[Data Import^] 27 | . 
link:{gist}/02_data_analysis.adoc[Data Analysis^] 28 | endif::[] 29 | 30 | == Further Resources 31 | 32 | * https://neo4j.com/graphgists[Graph Gist Examples] 33 | * https://neo4j.com/docs/cypher-refcard/current/[Cypher Reference Card] 34 | * https://neo4j.com/docs/cypher-manual/current/[Cypher Manual] 35 | * https://neo4j.com/developer/cypher/resources/[Cypher Resources] 36 | * https://graphdatabases.com[e-book: Graph Databases (free)] -------------------------------------------------------------------------------- /browser-guides/restaurant_recommendation/restaurant_recommendation.adoc: -------------------------------------------------------------------------------- 1 | = Restaurant Recommendations 2 | :author: Neo4j 3 | :description: Understand and build a small recommendation engine 4 | :img: https://s3.amazonaws.com/guides.neo4j.com/restaurant_recommendation/img 5 | :tags: recommendation, graph-search, introduction 6 | :neo4j-version: 3.5 7 | :icons: font 8 | 9 | == Restaurant Recommendations: Introduction 10 | 11 | image::{img}/restaurant_recommendation_model.png[height=300,float=right] 12 | 13 | We want to demonstrate how easy it is to model a domain as a graph and answer questions in almost-natural language. 14 | 15 | Graph-based search and discovery is a prominent use case for graph databases like https://neo4j.com[Neo4j]. 16 | 17 | Here, we use a domain of restaurants that serve cuisines and are located in a city. 18 | 19 | The domain diagram was created with the http://www.apcjones.com/arrows/[Arrows tool]. 20 | 21 | == Setup: Creating Friends, Restaurants, Cities, and Cuisines 22 | 23 | We will create a small example graph of people with cuisines they like and the restaurants serving those cuisines. 24 | Our people are in the same social circle (friend relationships), so we can create recommendations of cuisines and restaurants others will like based on their social connections and their preferences.
25 | 26 | [source,cypher] 27 | ---- 28 | CREATE (philip:Person {name:"Philip"})-[:IS_FRIEND_OF]->(emil:Person {name:"Emil"}), 29 | (philip)-[:IS_FRIEND_OF]->(michael:Person {name:"Michael"}), 30 | (philip)-[:IS_FRIEND_OF]->(andreas:Person {name:"Andreas"}) 31 | CREATE (sushi:Cuisine {name:"Sushi"}), (nyc:City {name:"New York"}), 32 | (iSushi:Restaurant {name:"iSushi"})-[:SERVES]->(sushi),(iSushi)-[:LOCATED_IN]->(nyc), 33 | (michael)-[:LIKES]->(iSushi), 34 | (andreas)-[:LIKES]->(iSushi), 35 | (zam:Restaurant {name:"Zushi Zam"})-[:SERVES]->(sushi),(zam)-[:LOCATED_IN]->(nyc), 36 | (andreas)-[:LIKES]->(zam) 37 | ---- 38 | 39 | == Setup: Philip's Friends 40 | 41 | First, let's look at some of our graph data and find who is friends with Philip. 42 | 43 | [source,cypher] 44 | ---- 45 | MATCH (philip:Person {name:'Philip'})-[:IS_FRIEND_OF]-(person) 46 | RETURN person.name 47 | ---- 48 | 49 | We should see 3 friends of Philip in our graph - Andreas, Michael, and Emil. 50 | 51 | == Restaurants in NYC and their cuisines 52 | 53 | Now let's look at restaurants and the cities where they are located with the cuisines they serve. 54 | 55 | [source,cypher] 56 | ---- 57 | MATCH (nyc:City {name:'New York'})<-[:LOCATED_IN]-(restaurant)-[:SERVES]->(cuisine) 58 | RETURN nyc, restaurant, cuisine 59 | ---- 60 | 61 | This query should show us nodes and relationships for the `City` of New York, 2 restaurants `LOCATED_IN` that city, and that each restaurant `SERVES` the `Cuisine` of sushi. 62 | 63 | == Graph Search Recommendation 64 | 65 | image::{img}/sushi_restaurants_nyc.png[height=300,float=right] 66 | 67 | Now that we have an idea what our data looks like, we can start recommending things based on the relationships connecting our people, location, cuisines, and restaurants. 68 | 69 | We want to make a recommendation for Philip by answering the following question: 70 | 71 | "" 72 | Find Sushi Restaurants in New York that Philip's friends like.
73 | "" 74 | 75 | == Recommendation criteria 76 | 77 | To answer this question, we need to find our starting point - _Philip_ needs the recommendation, so his node is where we start our search in the graph. 78 | Now we need to determine which parts of the graph to search using the following criteria from the question: 79 | 80 | * Find _Philip_ and his friends 81 | * Find _Restaurants_ that are located in _New York_ 82 | * Find _Restaurants_ that serve the cuisine _sushi_ 83 | * Find _Restaurants_ that _Philip's friends_ like 84 | 85 | == Recommendation query 86 | 87 | With those criteria, we construct this query: 88 | 89 | [source,cypher] 90 | ---- 91 | MATCH (philip:Person {name: 'Philip'}), 92 | (philip)-[:IS_FRIEND_OF]-(friend), 93 | (restaurant:Restaurant)-[:LOCATED_IN]->(:City {name: 'New York'}), 94 | (restaurant)-[:SERVES]->(:Cuisine {name: 'Sushi'}), 95 | (friend)-[:LIKES]->(restaurant) 96 | RETURN restaurant.name as restaurantName, collect(friend.name) AS recommendedBy, count(*) AS numberOfRecommendations 97 | ORDER BY numberOfRecommendations DESC 98 | ---- 99 | 100 | This tells us that 2 of Philip's friends recommend iSushi restaurant for sushi, and 1 of his friends recommends Zushi Zam restaurant for sushi. 101 | 102 | == More on recommendations 103 | 104 | Larger graphs and deeper relationship paths can add complexity and power to recommendation engines. This example shows the beginning steps and logic for building these systems using the relationships in the network to recommend products, hobbies, services, similarities, and more. 
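As one hedged illustration of a deeper relationship path, the sketch below widens the search from Philip's direct friends to friends of friends with a variable-length `IS_FRIEND_OF*1..2` pattern. It reuses only the labels and relationship types from the setup above; the two-hop limit is an assumption chosen for illustration and is not part of the original guide.

[source,cypher]
----
// Sketch: recommend New York sushi restaurants liked by Philip's friends
// or by friends of those friends (at most 2 hops, an assumed limit).
MATCH (philip:Person {name: 'Philip'})-[:IS_FRIEND_OF*1..2]-(person:Person),
      (person)-[:LIKES]->(restaurant:Restaurant),
      (restaurant)-[:LOCATED_IN]->(:City {name: 'New York'}),
      (restaurant)-[:SERVES]->(:Cuisine {name: 'Sushi'})
// Guard against paths that lead back to Philip himself
WHERE person <> philip
RETURN restaurant.name AS restaurantName,
       collect(DISTINCT person.name) AS recommendedBy,
       count(DISTINCT person) AS numberOfRecommendations
ORDER BY numberOfRecommendations DESC
----

On the small sample graph nobody has second-degree friends, so this should list the same restaurants as the previous query. The difference only shows up once more `IS_FRIEND_OF` relationships exist.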
105 | 106 | * https://neo4j.com/use-cases/real-time-recommendation-engine/[Use case: Recommendations Engine] 107 | * https://neo4j.com/developer/cypher/guide-build-a-recommendation-engine/[Tutorial: Building Recommendation Engine] -------------------------------------------------------------------------------- /fraud/BankFraud-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/fraud/BankFraud-1.png -------------------------------------------------------------------------------- /fraud/bank-fraud-detection.adoc: -------------------------------------------------------------------------------- 1 | = Bank Fraud Detection 2 | :neo4j-version: 2.3.0-RC1 3 | :author: Kenny Bastani 4 | :twitter: @kennybastani 5 | :domain: finance 6 | :use-case: fraud-detection 7 | 8 | == Introduction to Problem 9 | 10 | (original source: https://github.com/neo4j-contrib/gists/blob/master/other/BankFraudDetection.adoc ) 11 | 12 | This interactive Neo4j graph tutorial covers bank fraud detection scenarios. 13 | 14 | Banks and Insurance companies lose billions of dollars every year to fraud. 15 | Traditional methods of fraud detection play an important role in minimizing these losses. 16 | However, increasingly sophisticated fraudsters have developed a variety of ways to elude discovery, both by working together and by leveraging various other means of constructing false identities. 17 | 18 | ''' 19 | 20 | == Explanation of Scenario 21 | 22 | While no fraud prevention measures can ever be perfect, significant opportunity for improvement lies in looking beyond the individual data points, to the connections that link them. 23 | Oftentimes these connections go unnoticed until it is too late-- something that is unfortunate, as these connections oftentimes hold the best clues. 
24 | 25 | === Typical Scenario 26 | 27 | While the exact details behind each first-party fraud collusion vary from operation to operation, the pattern below illustrates how fraud rings commonly operate: 28 | 29 | * A group of two or more people organize into a fraud ring 30 | * The ring shares a subset of legitimate contact information, i.e., phone numbers and addresses, combining them to create a number of fictional identities 31 | * Ring members open accounts using these fictional identities 32 | * New accounts are added to the original ones: unsecured credit lines, credit cards, overdraft protection, personal loans, etc. 33 | * The accounts are used as normally, with regular purchases and timely payments 34 | * Banks increase the revolving credit lines over time, due to the observed responsible credit behavior 35 | * One day the ring "busts out", coordinating their activity, maxing out all of their credit lines, and disappearing 36 | * Sometimes fraudsters will go a step further and bring all of their balances to zero using fake checks immediately before the prior step, doubling the damage 37 | * Collections processes ensue, but agents are never able to reach the fraudster 38 | * The uncollectible debt is written off 39 | 40 | ''' 41 | 42 | == Explanation of Solution 43 | 44 | Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high degree of accuracy, and are capable of stopping advanced fraud scenarios in real time. 
45 | 46 | === How Graph Databases Can Help 47 | 48 | Augmenting one's existing fraud detection infrastructure to support ring detection can be done by running appropriate entity link analysis queries using a graph database, and running checks during key stages in the customer & account lifecycle, such as: 49 | 50 | * At the time the account is created 51 | * During an investigation 52 | * As soon as a credit balance threshold is hit 53 | * When a check is bounced 54 | 55 | Real time graph traversals tied to the right kinds of events can help banks identify probable fraud rings, during or even before the Bust-Out occurs. 56 | 57 | ''' 58 | 59 | == Bank Fraud Graph Data Model 60 | 61 | Graph databases have emerged as an ideal tool for overcoming these hurdles. 62 | Languages like Cypher provide a simple semantic for detecting rings in the graph, navigating connections in memory, in real time. 63 | 64 | The graph data model below represents how the data actually looks to the graph database, and illustrates how one can find rings by simply walking the graph: 65 | 66 | .Bank Fraud 67 | image::https://raw.github.com/neo4j-contrib/gists/master/other/images/BankFraud-1.png[Bank Fraud] 68 | 69 | ''' 70 | 71 | == Sample Data Set 72 | 73 | //hide 74 | //setup 75 | [source,cypher] 76 | ---- 77 | 78 | // Create account holders 79 | CREATE (accountHolder1:AccountHolder { 80 | FirstName: "John", 81 | LastName: "Doe", 82 | UniqueId: "JohnDoe" }) 83 | 84 | CREATE (accountHolder2:AccountHolder { 85 | FirstName: "Jane", 86 | LastName: "Appleseed", 87 | UniqueId: "JaneAppleseed" }) 88 | 89 | CREATE (accountHolder3:AccountHolder { 90 | FirstName: "Matt", 91 | LastName: "Smith", 92 | UniqueId: "MattSmith" }) 93 | 94 | // Create Address 95 | CREATE (address1:Address { 96 | Street: "123 NW 1st Street", 97 | City: "San Francisco", 98 | State: "California", 99 | ZipCode: "94101" }) 100 | 101 | // Connect 3 account holders to 1 address 102 | CREATE 
(accountHolder1)-[:HAS_ADDRESS]->(address1), 103 | (accountHolder2)-[:HAS_ADDRESS]->(address1), 104 | (accountHolder3)-[:HAS_ADDRESS]->(address1) 105 | 106 | // Create Phone Number 107 | CREATE (phoneNumber1:PhoneNumber { PhoneNumber: "555-555-5555" }) 108 | 109 | // Connect 2 account holders to 1 phone number 110 | CREATE (accountHolder1)-[:HAS_PHONENUMBER]->(phoneNumber1), 111 | (accountHolder2)-[:HAS_PHONENUMBER]->(phoneNumber1) 112 | 113 | // Create SSN 114 | CREATE (ssn1:SSN { SSN: "241-23-1234" }) 115 | 116 | // Connect 2 account holders to 1 SSN 117 | CREATE (accountHolder2)-[:HAS_SSN]->(ssn1), 118 | (accountHolder3)-[:HAS_SSN]->(ssn1) 119 | 120 | // Create SSN and connect 1 account holder 121 | CREATE (ssn2:SSN { SSN: "241-23-4567" })<-[:HAS_SSN]-(accountHolder1) 122 | 123 | // Create Credit Card and connect 1 account holder 124 | CREATE (creditCard1:CreditCard { 125 | AccountNumber: "1234567890123456", 126 | Limit: 5000, Balance: 1442.23, 127 | ExpirationDate: "01-20", 128 | SecurityCode: "123" })<-[:HAS_CREDITCARD]-(accountHolder1) 129 | 130 | // Create Bank Account and connect 1 account holder 131 | CREATE (bankAccount1:BankAccount { 132 | AccountNumber: "2345678901234567", 133 | Balance: 7054.43 })<-[:HAS_BANKACCOUNT]-(accountHolder1) 134 | 135 | // Create Credit Card and connect 1 account holder 136 | CREATE (creditCard2:CreditCard { 137 | AccountNumber: "1234567890123456", 138 | Limit: 4000, Balance: 2345.56, 139 | ExpirationDate: "02-20", 140 | SecurityCode: "456" })<-[:HAS_CREDITCARD]-(accountHolder2) 141 | 142 | // Create Bank Account and connect 1 account holder 143 | CREATE (bankAccount2:BankAccount { 144 | AccountNumber: "3456789012345678", 145 | Balance: 4231.12 })<-[:HAS_BANKACCOUNT]-(accountHolder2) 146 | 147 | // Create Unsecured Loan and connect 1 account holder 148 | CREATE (unsecuredLoan2:UnsecuredLoan { 149 | AccountNumber: "4567890123456789-0", 150 | Balance: 9045.53, 151 | APR: .0541, 152 | LoanAmount: 12000.00 
})<-[:HAS_UNSECUREDLOAN]-(accountHolder2) 153 | 154 | // Create Bank Account and connect 1 account holder 155 | CREATE (bankAccount3:BankAccount { 156 | AccountNumber: "4567890123456789", 157 | Balance: 12345.45 })<-[:HAS_BANKACCOUNT]-(accountHolder3) 158 | 159 | // Create Unsecured Loan and connect 1 account holder 160 | CREATE (unsecuredLoan3:UnsecuredLoan { 161 | AccountNumber: "5678901234567890-0", 162 | Balance: 16341.95, APR: .0341, 163 | LoanAmount: 22000.00 })<-[:HAS_UNSECUREDLOAN]-(accountHolder3) 164 | 165 | // Create Phone Number and connect 1 account holder 166 | CREATE (phoneNumber2:PhoneNumber { 167 | PhoneNumber: "555-555-1234" })<-[:HAS_PHONENUMBER]-(accountHolder3) 168 | 169 | RETURN * 170 | ---- 171 | 172 | //graph 173 | 174 | ''' 175 | 176 | == Entity Link Analysis 177 | 178 | Entity link analysis on the above data model is demonstrated below. 179 | The brackets in the result tables below isolate the individual elements of a http://neo4j.com/docs/stable/syntax-collections.html[collection].
180 | 181 | == Find account holders who share more than one piece of legitimate contact information 182 | 183 | [source,cypher] 184 | ---- 185 | MATCH (accountHolder:AccountHolder)-[]->(contactInformation) 186 | WITH contactInformation, 187 | count(accountHolder) AS RingSize 188 | MATCH (contactInformation)<-[]-(accountHolder) 189 | WITH collect(accountHolder.UniqueId) AS AccountHolders, 190 | contactInformation, RingSize 191 | WHERE RingSize > 1 192 | RETURN AccountHolders AS FraudRing, 193 | labels(contactInformation) AS ContactType, 194 | RingSize 195 | ORDER BY RingSize DESC 196 | ---- 197 | 198 | //output 199 | //table 200 | 201 | 202 | 203 | == Determine the financial risk of a possible fraud ring 204 | 205 | [source,cypher] 206 | ---- 207 | MATCH (accountHolder:AccountHolder)-[]->(contactInformation) 208 | WITH contactInformation, 209 | count(accountHolder) AS RingSize 210 | MATCH (contactInformation)<-[]-(accountHolder), 211 | (accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount) 212 | WITH collect(DISTINCT accountHolder.UniqueId) AS AccountHolders, 213 | contactInformation, RingSize, 214 | SUM(CASE type(r) 215 | WHEN 'HAS_CREDITCARD' THEN unsecuredAccount.Limit 216 | WHEN 'HAS_UNSECUREDLOAN' THEN unsecuredAccount.Balance 217 | ELSE 0 218 | END) as FinancialRisk 219 | WHERE RingSize > 1 220 | RETURN AccountHolders AS FraudRing, 221 | labels(contactInformation) AS ContactType, 222 | RingSize, 223 | round(FinancialRisk) as FinancialRisk 224 | ORDER BY FinancialRisk DESC 225 | ---- 226 | 227 | //output 228 | //table 229 | 230 | //console 231 | -------------------------------------------------------------------------------- /index.adoc: -------------------------------------------------------------------------------- 1 | == Neo4j Use-Case Examples as Guides and GraphGists 2 | :graphgist: http://neo4j.com/graphgist 3 | :guides: http://guides.neo4j.com/graphgists 4 | 5 | Let's look at some interesting use case examples in detail 6 | 7 | . 
pass:a[Bank Fraud Detection] {graphgist}/9d627127-003b-411a-b3ce-f8d3970c2afa[(GraphGist)] 8 | . pass:a[Books Management Graph] {graphgist}/56c4ceb8-0af1-4d36-b14c-aaa482dc2abc[(GraphGist)] 9 | . pass:a[Analyzing Offshore Leaks] {graphgist}/ec65c2fa-9d83-4894-bc1e-98c475c7b57a[(GraphGist)] 10 | . pass:a[Network Dependency Graph] {graphgist}/306bb0c7-9820-4c29-9835-15625e4e9f96[(GraphGist)] 11 | . pass:a[Job Recommendation System] {graphgist}/4cea8113-30e9-46bc-bbb0-06236a9bd8b9[(GraphGist)] 12 | 13 | === Other Resources 14 | 15 | * http://neo4j.com/graphgists[All Graph Gists] 16 | * http://portal.graphgist.org[GraphGist Author Portal] 17 | -------------------------------------------------------------------------------- /medical/DoctorFinder.adoc: -------------------------------------------------------------------------------- 1 | = DoctorFinder! 2 | :neo4j-version: 2.3.0 3 | :author: The Vidal Team 4 | :twitter: @fbiville 5 | 6 | :toc: 7 | 8 | This GraphGist represents a mobile application backend helping users to find adequate drugs and specialists given their physical characteristics, location and current symptoms. 9 | 10 | == Our resulting model 11 | 12 | [[img-model]] 13 | .DoctorFinder model 14 | image::http://img15.hostingpics.net/pics/800451GraphGist.png[DoctorFinder! 
model, 854, 500] 15 | 16 | //hide 17 | //setup 18 | [source,cypher] 19 | ------- 20 | CREATE 21 | (_6:DrugClass {name:"Bronchodilators"}), 22 | (_7:DrugClass {name:"Corticosteroids"}), 23 | (_8:DrugClass {name:"Xanthine"}), 24 | (_9:Drug {name:"Salbutamol"}), 25 | (_10:Drug {name:"Terbutaline"}), 26 | (_11:Drug {name:"Bambuterol"}), 27 | (_12:Drug {name:"Formoterol"}), 28 | (_13:Drug {name:"Salmeterol"}), 29 | (_14:Drug {name:"Beclometasone"}), 30 | (_15:Drug {name:"Budesonide"}), 31 | (_16:Drug {name:"Ciclesonide"}), 32 | (_17:Drug {name:"Fluticasone"}), 33 | (_18:Drug {name:"Mometasone"}), 34 | (_19:Drug {name:"Betametasone"}), 35 | (_20:Drug {name:"Prednisolone"}), 36 | (_21:Drug {name:"Dilatrane"}), 37 | (_22:Allergy {name:"Hypersensitivity to Betametasone"}), 38 | (_23:Pathology {name:"Asthma"}), 39 | (_24:Symptom {name:"Wheezing"}), 40 | (_25:Symptom {name:"Chest tightness"}), 41 | (_26:Symptom {name:"Cough"}), 42 | (_27:Doctor {latitude:48.8573,longitude:2.35685,name:"Irving Matrix"}), 43 | (_28:Doctor {latitude:46.83144,longitude:-71.28454,name:"Jack McKee"}), 44 | (_29:Doctor {latitude:48.86982,longitude:2.32503,name:"Michaela Quinn"}), 45 | (_30:DoctorSpecialization {name:"Physician"}), 46 | (_31:DoctorSpecialization {name:"Angiologist"}), 47 | (_6)-[:CURES {age_max:60,age_min:18,indication:"Adult asthma"}]->_23, 48 | (_7)-[:CURES {age_max:18,age_min:5,indication:"Child asthma"}]->_23, 49 | (_8)-[:CURES {age_max:60,age_min:18,indication:"Adult asthma"}]->_23, 50 | (_9)-[:BELONGS_TO_CLASS]->(_6), 51 | (_10)-[:BELONGS_TO_CLASS]->(_6), 52 | (_11)-[:BELONGS_TO_CLASS]->(_6), 53 | (_12)-[:BELONGS_TO_CLASS]->(_6), 54 | (_13)-[:BELONGS_TO_CLASS]->(_6), 55 | (_14)-[:BELONGS_TO_CLASS]->(_7), 56 | (_15)-[:BELONGS_TO_CLASS]->(_7), 57 | (_16)-[:BELONGS_TO_CLASS]->(_7), 58 | (_17)-[:BELONGS_TO_CLASS]->(_7), 59 | (_18)-[:BELONGS_TO_CLASS]->(_7), 60 | (_19)-[:BELONGS_TO_CLASS]->(_6), 61 | (_19)-[:BELONGS_TO_CLASS]->(_7), 62 | (_19)-[:MAY_CAUSE_ALLERGY]->(_22), 63 | 
(_20)-[:BELONGS_TO_CLASS]->(_7), 64 | (_21)-[:BELONGS_TO_CLASS]->_8, 65 | (_23)-[:MAY_MANIFEST_SYMPTOMS]->(_24), 66 | (_23)-[:MAY_MANIFEST_SYMPTOMS]->(_25), 67 | (_23)-[:MAY_MANIFEST_SYMPTOMS]->(_26), 68 | (_27)-[:SPECIALISES_IN]->(_31), 69 | (_28)-[:SPECIALISES_IN]->(_31), 70 | (_29)-[:SPECIALISES_IN]->(_30), 71 | (_30)-[:CAN_PRESCRIBE]->(_7), 72 | (_31)-[:CAN_PRESCRIBE]->(_6) 73 | ------- 74 | //graph 75 | 76 | 77 | From VIDAL with ♥ (`Suzanne`, `Nicolas`, `Édouard`, `Marouane`, `Sébastian`, `Thibaut`, `Olivier`, `Sylvain`, `Florent` (aka Cypher translator)). 78 | 79 | == User stories 80 | 81 | === Symptom autocompletion 82 | 83 | > **As** an application user, + 84 | > **When** I start typing my symptoms 85 | > **Then** matching symptoms are returned in alphabetical order. 86 | 87 | ==== Example 88 | 89 | User types 'c'. 90 | 91 | [source,cypher] 92 | ---- 93 | MATCH (s:Symptom) 94 | WHERE UPPER(s.name)=~ UPPER('c.*') 95 | RETURN s.name AS `Symptom` 96 | ORDER BY s.name ASC 97 | ---- 98 | //table 99 | 100 | For simplicity's sake, this query will not be included in the following examples. 101 | However, it would definitely be the first clause of each (as user types only symptom starts). 102 | Subsequent queries will assume symptom names were resolved by this first sub-query. 103 | 104 | === Drug advisor 105 | 106 | > **As** an application user, + 107 | > **When** I start typing my symptoms 108 | > 109 | > **Then** adequate drugs are returned, grouped by their therapeutic class. 110 | 111 | ==== Example 112 | 113 | Current user is a 35-year old man, manifesting **wheezing** and **chest tightness**, suffering from **hypersensitivity to Betametasone** allergy. 114 | 115 | We expect all drugs of class `Bronchodilators` (`Betametasone` drug excluded, because of the aforementioned allergy) and `Xanthine` to appear as they are the only therapeutic classes suitable for adults in our dataset. 
116 | 117 | [source,cypher] 118 | ---- 119 | MATCH (patho:Pathology)-[:MAY_MANIFEST_SYMPTOMS]->(symptoms:Symptom) 120 | WHERE symptoms.name IN ['Chest tightness', 'Wheezing'] 121 | WITH patho 122 | 123 | MATCH (DrugClass:DrugClass)-[cures:CURES]->(patho) 124 | WHERE cures.age_min <= 35 AND 35 < cures.age_max 125 | WITH DrugClass 126 | 127 | MATCH (drug:Drug)-[:BELONGS_TO_CLASS]->(DrugClass), (allergy:Allergy) 128 | WHERE allergy.name IN ['Hypersensitivity to Betametasone'] 129 | AND (NOT (drug)-[:MAY_CAUSE_ALLERGY]->(allergy)) 130 | RETURN DrugClass.name AS `Therapeutic class`, COLLECT(DISTINCT drug.name) AS `Drugs`; 131 | ---- 132 | //table 133 | 134 | === Doctor finder 135 | 136 | > **As** an application user, + 137 | > **When** I start typing my symptoms 138 | > 139 | > **Then** the doctors who (ahah!) can prescribe adequate drugs are returned with these drugs, ordered by proximity. 140 | 141 | See definition above for what 'adequate drugs' means. 142 | If drugs can be purchased without prescription, the mention 'No doctor required' for these drugs should be returned, with a distance to user home of **0**. 143 | 144 | ==== Example 145 | 146 | Current user is a 19-year old woman, manifesting **cough**, suffering from hypersensitivity to Betametasone allergy and living at '14, rue de Bruxelles 75009 PARIS, FRANCE' (latitude:48.88344, longitude:2.33180). 147 | 148 | We expect all angiologists to be returned as the drugs they can prescribe can cure illnesses related to the user symptom. 149 | 150 | Moreover, drugs of class `Xanthine` do not require a prescription and they can cure the same kind of illnesses as well.
151 | 152 | [source,cypher] 153 | ---- 154 | MATCH (patho:Pathology)-[:MAY_MANIFEST_SYMPTOMS]->(symptoms:Symptom) 155 | WHERE symptoms.name IN ['Cough'] 156 | WITH patho 157 | 158 | MATCH (DrugClass:DrugClass)-[cures:CURES]->(patho) 159 | WHERE cures.age_min <= 19 AND 19 < cures.age_max 160 | WITH DrugClass 161 | 162 | MATCH (drug:Drug)-[:BELONGS_TO_CLASS]->(DrugClass), (allergy:Allergy) 163 | WHERE allergy.name IN ['Hypersensitivity to Betametasone'] 164 | AND (NOT (drug)-[:MAY_CAUSE_ALLERGY]->(allergy)) 165 | WITH DrugClass, drug 166 | 167 | OPTIONAL MATCH (doctor:Doctor)-->(spe:DoctorSpecialization)-[:CAN_PRESCRIBE]->(DrugClass) 168 | RETURN COALESCE(doctor.name + ' (' + spe.name + ')', 'No doctor required') AS `Doctor`, COLLECT(DISTINCT drug.name) AS `Drugs for your symptoms`, 2 * 6371 * asin(sqrt(haversin(radians(48.88344 - COALESCE(doctor.latitude,48.88344))) + cos(radians(48.88344)) * cos(radians(COALESCE(doctor.latitude,90)))* haversin(radians(2.33180 - COALESCE(doctor.longitude,2.33180))))) AS `Distance to home (km)` 169 | ORDER BY `Distance to home (km)` ASC; 170 | ---- 171 | //table 172 | 173 | As obfuscated as it looks, the distance computation is just a null-safe variant of the haversine formula explained in the Cypher manual (indeed, there are drugs that do not require a doctor prescription). 174 | 175 | //console 176 | -------------------------------------------------------------------------------- /medical/pharma_drugs_targets.adoc: -------------------------------------------------------------------------------- 1 | = Pharmaceutical Drugs and their Targets 2 | Josh Kunken 3 | v1.0, 14-Dec-2013 4 | :neo4j-version: 2.3.0 5 | :author: Josh Kunken 6 | :twitter: joshkunken 7 | 8 | :toc: 9 | 10 | == Domain 11 | 12 | A pharmaceutical portfolio is a collection of drug compounds, their respective indications, and their targets. 13 | A pharmaceutical company or drugstore organizes its pharmaceutical products into one or more portfolios.
14 | A drug portfolio thus contains multiple pharmaceuticals, with each pharmaceutical containing a link to one or more of its targets in the human body. 15 | This lends itself to be modeled as a graph. 16 | Each pharmaceutical and drug target can also have a distinct set of attributes which also fit nicely into the property graph model. 17 | Within the examples found in this use case, most drug targets happen to be G-protein coupled receptors (GPCRs), for which structures have only recently been solved in the last several years. 18 | 19 | A drug can have one or more targets. 20 | A target can be targeted by one or more drugs. 21 | This is not a complete solution for all the drug portfolio use cases but provides a good starting point. 22 | 23 | .Domain Model 24 | image::http://www.sohosci.com/drug_portfolio.PNG[Domain Model] 25 | 26 | 27 | == Setup 28 | 29 | The sample data set uses a pharmaceutical portfolio. 30 | 31 | //hide 32 | //setup 33 | [source,cypher] 34 | ---- 35 | CREATE (drugPortfolio:Portfolio{ name:'Pharmaceutical Portfolio' }) 36 | 37 | CREATE (drugs:Category { name:'Drugs' }) 38 | CREATE drugs-[:PARENT]->drugPortfolio 39 | 40 | CREATE (antipsychotic_agents:Category { name:'Antipsychotic Agents' }) 41 | CREATE antipsychotic_agents-[:PARENT]->drugs 42 | CREATE (antiparkinson_agents:Category { name:'Antiparkinson Agents' }) 43 | CREATE antiparkinson_agents-[:PARENT]->drugs 44 | CREATE (antimigraine_agents:Category { name:'Antimigraine Agents' }) 45 | CREATE antimigraine_agents-[:PARENT]->drugs 46 | CREATE (antidepressive_agents:Category { name:'Antidepressive Agents' }) 47 | CREATE antidepressive_agents-[:PARENT]->drugs 48 | CREATE (antiallergic_agents:Category { name:'Antiallergic Agents' }) 49 | CREATE antiallergic_agents-[:PARENT]->drugs 50 | CREATE (cns_stimulants:Category { name:'CNS Stimulants' }) 51 | CREATE cns_stimulants-[:PARENT]->drugs 52 | CREATE (bronchodilator_agents:Category { name:'Bronchodilator Agents' }) 53 | CREATE 
bronchodilator_agents-[:PARENT]->drugs 54 | CREATE (vasodilator:Category { name:'Vasodilator' }) 55 | CREATE vasodilator-[:PARENT]->drugs 56 | 57 | CREATE (HUMAN_5HT1A:DrugTarget{ name:'5HT1A_HUMAN' }) 58 | CREATE (HUMAN_5HT1B:DrugTarget{ name:'5HT1B_HUMAN' }) 59 | CREATE (HUMAN_5HT2A:DrugTarget{ name:'5HT2A_HUMAN' }) 60 | CREATE (HUMAN_AA1R:DrugTarget{ name:'AA1R_HUMAN' }) 61 | CREATE (HUMAN_AA2AR:DrugTarget{ name:'AA2AR_HUMAN' }) 62 | CREATE (HUMAN_AA2BR:DrugTarget{ name:'AA2BR_HUMAN' }) 63 | 64 | CREATE (clozapine:Product { name:'Clozapine' }) 65 | CREATE clozapine-[:OF_TYPE]->antipsychotic_agents 66 | CREATE clozapine-[:TARGETS]->HUMAN_5HT1A 67 | 68 | CREATE (aripiprazole:Product { name:'Aripiprazole' }) 69 | CREATE aripiprazole-[:OF_TYPE]->antipsychotic_agents 70 | CREATE aripiprazole-[:TARGETS]->HUMAN_5HT1A 71 | 72 | CREATE (lisuride:Product { name:'Lisuride' }) 73 | CREATE lisuride-[:OF_TYPE]->antiparkinson_agents 74 | CREATE lisuride-[:TARGETS]->HUMAN_5HT1A 75 | 76 | CREATE (methysergide:Product { name:'Methysergide' }) 77 | CREATE methysergide-[:OF_TYPE]->antimigraine_agents 78 | CREATE methysergide-[:TARGETS]->HUMAN_5HT1A 79 | 80 | CREATE (almotriptan:Product { name:'Almotriptan' }) 81 | CREATE almotriptan-[:OF_TYPE]->antimigraine_agents 82 | CREATE almotriptan-[:TARGETS]->HUMAN_5HT1B 83 | 84 | CREATE (eletriptan:Product { name:'Eletriptan' }) 85 | CREATE eletriptan-[:OF_TYPE]->antimigraine_agents 86 | CREATE eletriptan-[:TARGETS]->HUMAN_5HT1B 87 | 88 | CREATE (ergotamine:Product { name:'Ergotamine' }) 89 | CREATE ergotamine-[:OF_TYPE]->antimigraine_agents 90 | CREATE ergotamine-[:TARGETS]->HUMAN_5HT1B 91 | 92 | CREATE (frovatriptan:Product { name:'Frovatriptan' }) 93 | CREATE frovatriptan-[:OF_TYPE]->antimigraine_agents 94 | CREATE frovatriptan-[:TARGETS]->HUMAN_5HT1B 95 | 96 | CREATE (naratriptan:Product { name:'Naratriptan' }) 97 | CREATE naratriptan-[:OF_TYPE]->antimigraine_agents 98 | CREATE naratriptan-[:TARGETS]->HUMAN_5HT1B 99 | 100 | CREATE 
(chlorprothixene:Product { name:'Chlorprothixene' }) 101 | CREATE chlorprothixene-[:OF_TYPE]->antipsychotic_agents 102 | CREATE chlorprothixene-[:TARGETS]->HUMAN_5HT2A 103 | 104 | CREATE clozapine-[:OF_TYPE]->antipsychotic_agents 105 | CREATE clozapine-[:TARGETS]->HUMAN_5HT2A 106 | 107 | CREATE (cyclobenzaprine:Product { name:'Cyclobenzaprine' }) 108 | CREATE cyclobenzaprine-[:OF_TYPE]->antidepressive_agents 109 | CREATE cyclobenzaprine-[:TARGETS]->HUMAN_5HT2A 110 | 111 | CREATE (cyproheptadine:Product { name:'Cyproheptadine' }) 112 | CREATE cyproheptadine-[:OF_TYPE]->antiallergic_agents 113 | CREATE cyproheptadine-[:TARGETS]->HUMAN_5HT2A 114 | 115 | CREATE (caffeine:Product { name:'Caffeine' }) 116 | CREATE caffeine-[:OF_TYPE]->cns_stimulants 117 | CREATE caffeine-[:TARGETS]->HUMAN_AA1R 118 | CREATE caffeine-[:TARGETS]->HUMAN_AA2AR 119 | 120 | CREATE (theophylline:Product { name:'Theophylline' }) 121 | CREATE theophylline-[:OF_TYPE]->bronchodilator_agents 122 | CREATE theophylline-[:TARGETS]->HUMAN_AA1R 123 | CREATE theophylline-[:TARGETS]->HUMAN_AA2AR 124 | CREATE theophylline-[:TARGETS]->HUMAN_AA2BR 125 | 126 | CREATE (regadenoson:Product { name:'Regadenoson' }) 127 | CREATE regadenoson-[:OF_TYPE]->vasodilator 128 | CREATE regadenoson-[:TARGETS]->HUMAN_AA2AR 129 | ---- 130 | 131 | === Try other queries yourself!
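As a possible starting point, the sketch below groups the drugs in the sample data by the target they act on, so shared targets stand out. It assumes only the `Product` and `DrugTarget` labels and the `TARGETS` relationship created in the setup above; the column aliases are illustrative.

[source,cypher]
----
// Sketch: list each drug target with the drugs that act on it,
// most frequently targeted first.
MATCH (drug:Product)-[:TARGETS]->(target:DrugTarget)
RETURN target.name AS Target,
       COLLECT(drug.name) AS Drugs,
       COUNT(drug) AS NumberOfDrugs
ORDER BY NumberOfDrugs DESC
----

Swapping `TARGETS` for `OF_TYPE` and `DrugTarget` for `Category` would give the same kind of grouping by drug category.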
132 | //console 133 | 134 | == Use Cases 135 | 136 | == All portfolios 137 | 138 | [source,cypher] 139 | ---- 140 | MATCH (c:Portfolio) 141 | RETURN c.name AS Portfolios 142 | ---- 143 | //table 144 | 145 | == All categories by Depth 146 | 147 | [source,cypher] 148 | ---- 149 | MATCH p=(cats:Category)-[:PARENT*]->(cat:Portfolio) 150 | RETURN LENGTH(p) AS Depth, COLLECT(cats.name) AS Categories 151 | ORDER BY Depth ASC 152 | ---- 153 | //table 154 | 155 | == All categories of a given depth 156 | 157 | [source,cypher] 158 | ---- 159 | MATCH p=(cats:Category)-[:PARENT*]->(cat:Portfolio) 160 | WHERE cat.name='Pharmaceutical Portfolio' AND length(p)=1 161 | RETURN cats.name AS `Categories of Given Level` 162 | ORDER BY cats.name 163 | ---- 164 | //table 165 | 166 | == All sub-categories of a given category 167 | 168 | [source,cypher] 169 | ---- 170 | MATCH (cats:Category)-[:PARENT]->(parentCat:Category), (parentCat)-[:PARENT*]->(c:Portfolio) 171 | RETURN parentCat.name AS Parent, COLLECT(cats.name) AS SubCategories 172 | ---- 173 | //table 174 | 175 | == All parents and their child categories 176 | 177 | [source,cypher] 178 | ---- 179 | MATCH (child:Category)-[:PARENT*]->(parent) 180 | RETURN parent.name AS Parent, COLLECT(child.name) AS Children 181 | ---- 182 | //table 183 | 184 | == All parents and their immediate children 185 | 186 | [source,cypher] 187 | ---- 188 | MATCH (child:Category)-[:PARENT]->(parent) 189 | RETURN labels(parent), parent.name AS Parent, COLLECT(child.name) AS Children 190 | ---- 191 | //table 192 | //console 193 | -------------------------------------------------------------------------------- /medical/treatment_planners.adoc: -------------------------------------------------------------------------------- 1 | = Behavioral Health Treatment Planning 2 | Greg Ricker 3 | v1.0, 22-2-2015 4 | :neo4j-version: 2.3.0 5 | :author: Greg Ricker 6 | :twitter: @greg_ricker 7 | 8 | :toc: 9 | 10 | == Using the Wiley Treatment Plan 11 | 12 | I am using
the "Wiley treatment plan" data set as the basis of our domain model since it is one standard that is used in behavioral health. 13 | A key aspect of treatment in the field of behavioral health involves creating a four-part treatment plan, packaged as libraries, consisting of a Problem, Goal, Objective, and Intervention. 14 | 15 | === The Problem 16 | 17 | The Problem states, in general terms, what the patient is suffering with. 18 | For example, this might be Depression, Low self-Esteem, Substance Abuse, or something else. 19 | 20 | === The Goal 21 | 22 | The Goal is the end result. 23 | For example, a patient might have the goal of "Demonstrate respect and regard for self and others". 24 | 25 | === The Objectives 26 | 27 | Objectives are milestones along the way from the Problem to the Goal: ways in which the patient is going to improve. 28 | 29 | === The Intervention 30 | 31 | Interventions are tasks or activities performed as part of the plan. 32 | These may be actions taken by the patient and others involved in the treatment plan. 33 | 34 | === Additional Complications 35 | 36 | In practice, there are several snarls in the model described above that make the implementation of the Wiley plan difficult for relational databases. 37 | 38 | The first is the existence of links from problem to goal to intervention restricting interventions to those related only to a specific problem and/or goal. 39 | Setting this up in an SQL database required a separate table to maintain "linkages" for each plan. 40 | Generating an appropriate plan requires traversing the linkage table a number of times, resulting in queries that can run from two to ten seconds depending on how many libraries are loaded. 41 | 42 | The second is that not everyone uses the plan in the defined order of problem -> goal -> objective -> intervention. 43 | Any practical implementation of the treatment plan system has to let the user start from any point in the plan and work from there.
44 | For example, the user can start with a goal, jump to an intervention, and then move on to a problem. 45 | 46 | Thirdly, the information (goals, objectives, interventions) is reused. 47 | For example, GoalA may be used for ProblemA in LibraryA, but it may be used again with other problems within the same library or across libraries. 48 | 49 | .The Wiley Treatment Plan Domain Model 50 | [Domain Model] 51 | image::https://gricker.files.wordpress.com/2015/02/wiley.png[] 52 | 53 | == Implementing The Wiley Plan using Neo4j 54 | 55 | Implementing the model in Neo4j resulted in 300 nodes and 120k relationships. 56 | A typical query runs in about 500 ms and `RETURN`s 500-700 values. 57 | In addition, adding custom plans that deviate from the Wiley plan was easy and didn't affect performance. 58 | 59 | .Modified Wiley Treatment Plan Domain Model 60 | [Domain Model] 61 | image::https://gricker.files.wordpress.com/2015/02/treatment-model.png[] 62 | 63 | == Nodes 64 | 65 | ---- 66 | (:Library) 67 | (:Problem) 68 | (:Goal) 69 | ---- 70 | 71 | == Relationships 72 | 73 | ---- 74 | (:Library)-[:HAS_PROBLEM]->(:Problem)-[:HAS_GOAL]->(:Goal) 75 | (:Problem)-[:HAS_OBJECTIVE]->(:Objective) 76 | (:Problem)-[:HAS_INTERVENTION]->(:Intervention) 77 | ---- 78 | 79 | === Sample Dataset 80 | 81 | The sample data set uses one library, one problem, and four each of objectives, goals, and interventions.
82 | 83 | //hide 84 | //setup 85 | //output 86 | [source,cypher] 87 | ---- 88 | CREATE (lib:Library {GroupID:'230', Description:'School Counseling and Social Work'}) 89 | CREATE (prob1:Problem {name:'17', Description:'Parenting Skills/Discipline',GroupID:'230'}) 90 | CREATE (obj1:Objective {name:'9', Description:'Parents use natural and logical consequences to redirect the students behavior.',GroupID:'230',ProblemNumber:'17'}) 91 | CREATE (obj2:Objective {name:'8', Description:'Parents allow the student to learn from his/her mistakes.',GroupID:'230',ProblemNumber:'17'}) 92 | CREATE (obj3:Objective {name:'6', Description:'Parents set limits using positive discipline strategies.',GroupID:'230',ProblemNumber:'17'}) 93 | CREATE (obj4:Objective {name:'20', Description:'Parents work to maintain a strong, couple-centered family environment',GroupID:'230',ProblemNumber:'17'}) 94 | CREATE (goal1:Goal {name:'4', Description:'Acquire positive and moral character traits',GroupID:'230',ProblemNumber:'17'}) 95 | CREATE (goal2:Goal {name:'3', Description:'Demonstrate respect and regard for self and others.',GroupID:'230',ProblemNumber:'17'}) 96 | CREATE (goal3:Goal {name:'5', Description:'Parents acquire positive discipline strategies that set limits and encourage independence.,',GroupID:'230',ProblemNumber:'17'}) 97 | CREATE (goal4:Goal {name:'6', Description:'Family atmosphere is peaceful, loving, and harmonious.',GroupID:'230',ProblemNumber:'17'}) 98 | CREATE (intervention1:Intervention {name:'7', Description:'Suggest that the parents and the student meet weekly at a designated time to review progress, give encouragement, note continuing concerns, and keep a written progress report to share with a counselor or private therapist.',GroupID:'230',ProblemNumber:'17'}) 99 | CREATE (intervention2:Intervention {name:'38', Description:'Encourage the parents and teachers to allow the student to seek his/her own solutions with guidance even if it requires some struggle and learning from 
mistakes. Recommend that the parents and teachers listen to the students problems with empathy and give guidance or assistance only when requested; discuss the results of this approach in a subsequent counseling session.',GroupID:'230',ProblemNumber:'17'}) 100 | CREATE (intervention3:Intervention {name:'1', Description:'Meet with the parents to obtain information about discipline, family harmony, and the students developmental history.',GroupID:'230',ProblemNumber:'17'}) 101 | CREATE (intervention4:Intervention {name:'8', Description:'Have the student complete the (Personal Profile) informational sheet from the School Counseling and School Social Homework Planner (Knapp), which details pertinent personal data, or gather personal information in an informal interview with the student."',GroupID:'230',ProblemNumber:'17'}) 102 | CREATE (lib)-[:HAS_PROBLEM]->(prob1) 103 | CREATE (prob1)-[:HAS_GOAL]->(goal1) 104 | CREATE (prob1)-[:HAS_GOAL]->(goal2) 105 | CREATE (prob1)-[:HAS_GOAL]->(goal3) 106 | CREATE (prob1)-[:HAS_GOAL]->(goal4) 107 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj1) 108 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj2) 109 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj3) 110 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj4) 111 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention1) 112 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention2) 113 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention3) 114 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention4) 115 | ---- 116 | // graph 117 | 118 | == Use Cases 119 | 120 | == Display All Objectives 121 | 122 | [source,cypher] 123 | ---- 124 | MATCH (c:Objective) 125 | RETURN c.Description AS Objective 126 | ---- 127 | //table 128 | 129 | == Display All Problems 130 | [source,cypher] 131 | ---- 132 | MATCH (c:Problem) 133 | RETURN c.Description AS Problem 134 | ---- 135 | //table 136 | 137 | == Display All Goals 138 | 139 | [source,cypher] 140 | ---- 141 | MATCH (c:Goal) 142 | RETURN c.Description AS Goal 143 | ---- 144 | //table 145 | 146 | 
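== Start from a goal and work back to the problem

As noted under Additional Complications, a user may begin anywhere in the plan rather than at the problem.
A minimal sketch of that idea, assuming the sample data set above (goal `3` is simply one of the four sample goals, picked for illustration), starts at a goal and follows the relationships back through its problem to the interventions:

[source,cypher]
----
// Start at one goal, step back to its problem, then out to that problem's interventions
MATCH (g:Goal {name:'3'})<-[:HAS_GOAL]-(p:Problem)-[:HAS_INTERVENTION]->(i:Intervention)
RETURN g.Description AS Goal, p.Description AS Problem, collect(i.name) AS Interventions
----
//table

The same pattern can be anchored on an `Objective` or an `Intervention` instead; no linkage table is needed because relationships can be followed in either direction.
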
== Find interventions for problem number 17 in all libraries 147 | 148 | [source,cypher] 149 | ---- 150 | MATCH (lib:Library)-[:HAS_PROBLEM]->(st:Problem{name:'17'})-[:HAS_INTERVENTION]->(i:Intervention) 151 | RETURN lib.Description AS Library, st.Description AS Problem, i.Description AS Intervention; 152 | ---- 153 | //table 154 | 155 | == Display All Problems, Interventions, and Objectives for one library 156 | 157 | [source,cypher] 158 | ---- 159 | MATCH (lib:Library{GroupID:'230'})-[:HAS_PROBLEM]->(st:Problem{name:'17'})-[:HAS_INTERVENTION]->(i:Intervention) WITH i, st MATCH (st)-[:HAS_OBJECTIVE]->(m:Objective) 160 | RETURN st.Description AS Problem, m.Description AS Objective, i.Description AS Intervention; 161 | ---- 162 | //table 163 | 164 | == Conclusion 165 | Developing the treatment planner in SQL took months to get correct and to bring performance to the point where it was usable. 166 | I used py2neo to import the data into the graph. 167 | In all, it took less than a week from start to finish (it took longer to create this gist). 168 | 169 | //console 170 | -------------------------------------------------------------------------------- /networkITmanagment/datacenter-management-1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/networkITmanagment/datacenter-management-1.PNG -------------------------------------------------------------------------------- /networkITmanagment/network-routing.adoc: -------------------------------------------------------------------------------- 1 | = Information Flow Through a Network 2 | :neo4j-version: 2.3.0 3 | :twitter: @lyonwj 4 | :author: William L. 5 | 6 | :toc: 7 | 8 | == Introduction 9 | 10 | Modern financial markets operate very quickly due to algorithmic trading. 11 | Essentially, computers execute trades automatically based on information inputs and clever algorithms.
All of this occurs very rapidly. 12 | The computers are so good at executing trades quickly, in fact, that the speed at which information moves from one city to another is actually a limiting factor. 13 | For example, if an announcement that impacts the financial markets is made in Washington, DC, traders in New York, NY will probably "hear" about it before those in, say, Seattle, WA. 14 | 15 | Since information can travel no faster than the speed of light (in reality it travels much more slowly because of various processing that must be done along the way) we can compute a lower bound on the time it will take a particular piece of information released in one city to arrive in other cities. 16 | As you might expect, this problem involves flow through a network, and is therefore fairly simple to model in Neo4j (or any graph database, really). 17 | 18 | == Data Model 19 | 20 | Our data model consists of a set of cities, each with a latitude and longitude, which we will use to compute distances (we could have entered the distances as data, but it was more fun to use Neo4j for this task, and in a more generalized case you might want the nodes to be able to move). 21 | Each city is linked to one or more other cities by `:BACKBONE_TO` relationships. 22 | This indicates that the two cities involved in the relationship have an Internet backbone running between them. 23 | For this example, we mostly made up the backbones, although the backbones running to Tokyo are accurate given actual undersea cable topology. 
24 | 25 | .Information network data model 26 | image::http://i.imgur.com/uxv29rM.png[Information network data model] 27 | 28 | //hide 29 | //setup 30 | [source,cypher] 31 | ---- 32 | CREATE (chc:City { name: "Chicago", lat: 41.833, lon: -87.617 }), (sea:City { name: "Seattle", lat: 47.617, lon: -122.334 }), 33 | (sfo:City { name: "San Francisco", lat: 37.783, lon: -122.433 }), (tok:City { name: "Tokyo", lat: 35.667, lon: 139.75 }), 34 | (chc)-[:BACKBONE_TO]->(sea), (sea)-[:BACKBONE_TO]->(sfo), (sea)-[:BACKBONE_TO]->(tok), (sfo)-[:BACKBONE_TO]->(tok) 35 | ---- 36 | //graph 37 | 38 | == Distance Computation 39 | 40 | Before we can establish a lower bound on the time it takes information to flow between cities, we must determine the distance between cities that are linked by an Internet backbone. 41 | We assume that the cables connecting cities lie on https://en.wikipedia.org/wiki/Great-circle_distance[great circles], in other words, that each cable lies on the shortest possible line between cities. 42 | 43 | [source,cypher] 44 | ---- 45 | // Find the great circle distance from Tokyo to Seattle 46 | MATCH (c1:City { name: "Tokyo" }), (c2:City { name: "Seattle" }) 47 | RETURN 2 * 6371 * asin(sqrt(haversin(radians(c1.lat - c2.lat)) + cos(radians(c1.lat)) * cos(radians(c2.lat)) * haversin(radians(c1.lon - c2.lon)))) AS Distance 48 | ---- 49 | //table 50 | 51 | In our case we would like to record the distances between cities that are connected with an Internet backbone. 52 | We can store this data on the `:BACKBONE_TO` edges. 53 | Note that we don't have to worry about double-computing because we have specified a relationship direction, so each distance will only be computed once. 54 | If we left out the direction, each distance would be computed twice, although the end result would be exactly the same. 
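The point about direction can be checked with a small read-only query; this is only a sketch against the sample graph above. Dropping the arrow lets every `:BACKBONE_TO` relationship match once from each end, so the number of matched rows should be twice the number of distinct relationships:

[source,cypher]
----
// Undirected match: each backbone is found once per endpoint
MATCH (c1:City)-[r:BACKBONE_TO]-(c2:City)
RETURN count(r) AS `Matched Rows`, count(DISTINCT r) AS `Distinct Backbones`
----
//table

The directed form used for the update, which follows, visits each backbone exactly once.
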
55 | 56 | [source,cypher] 57 | ---- 58 | // Add distance to backbone edges 59 | MATCH (c1:City)-[r:BACKBONE_TO]->(c2:City) 60 | WITH 2 * 6371 * asin(sqrt(haversin(radians(c1.lat - c2.lat)) + cos(radians(c1.lat)) * cos(radians(c2.lat)) * haversin(radians(c1.lon - c2.lon)))) AS dist, r, c1, c2 61 | SET r.dist = dist 62 | RETURN c1.name, c2.name, r.dist 63 | ---- 64 | //graph 65 | 66 | == Mapping Information Flow 67 | 68 | Our next step is to find candidate paths from one city to another. 69 | Once we have these paths, we can compute the minimum amount of time it would take a piece of information to move along each path, and find the shortest route between them. 70 | First, we can find all unique, simple (no repeated cities) paths from one city to another. 71 | Here is an example for Tokyo and Chicago: 72 | 73 | [source,cypher] 74 | ---- 75 | // Find all unique, simple paths from one city to another 76 | MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" }) 77 | WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c))) 78 | RETURN DISTINCT extract(n IN nodes(p) | n.name) AS Path, length(p) AS Length 79 | ---- 80 | //table 81 | 82 | Next, we would like to find the shortest path, in terms of distance: 83 | 84 | [source,cypher] 85 | ---- 86 | // Find the shortest distance path from one city to another 87 | MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" }) 88 | WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c))) 89 | WITH reduce(s = 0, hop IN rels(p) | s + hop.dist) AS distance, p 90 | ORDER BY distance LIMIT 1 91 | RETURN DISTINCT extract(n IN nodes(p) | n.name) AS Path, length(p) AS Length, distance AS Distance 92 | ---- 93 | //table 94 | 95 | The next step is to change distance into an amount of time. 96 | We will assume that information travels between cities at the speed of light.
97 | As mentioned earlier, this is not strictly true, but since we are looking for a lower bound on the time it takes information to move from one city to another, using the fastest possible speed for the information flow itself makes sense. 98 | 99 | As an aside, we could make this model more complicated, and perhaps more accurate, by taking into account the processing time at each junction point along the way. 100 | When information arrives at a particular node in the network it must be routed to the next node along its path; this takes some non-zero amount of time. 101 | Therefore, we might actually want to minimize the distance information has to travel, subject to the constraint that each "hop" has a cost. 102 | In this case, we would tend to prefer less complicated paths, even if they are slightly longer. 103 | 104 | We can compute the number of milliseconds it should take information to travel between cities using the query below. 105 | Note that we simply divide the distance in kilometers by 300 to get milliseconds, since light travels at roughly 300,000 kilometers per second, or 300 kilometers per millisecond. 106 | 107 | [source,cypher] 108 | ---- 109 | // Compute minimum milliseconds for information travel from one city to another 110 | MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" }) 111 | WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c))) 112 | WITH reduce(s = 0, hop IN rels(p) | s + hop.dist) AS distance, p 113 | ORDER BY distance LIMIT 1 114 | RETURN DISTINCT extract(n IN nodes(p) | n.name) AS Path, length(p) AS Length, distance / 300 AS ms 115 | ---- 116 | //table 117 | 118 | We can then generalize this query to compute the time it would take information to travel between any pair of cities.
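Before generalizing, the hop-cost idea from the aside above can be sketched for the Tokyo to Chicago pair. The fixed cost of 1 ms per hop is purely an assumption for illustration, not a measured figure, and the query relies on the `dist` property set earlier:

[source,cypher]
----
// Rank simple paths by propagation time plus an assumed 1 ms penalty per hop
MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" })
WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c)))
WITH p, reduce(s = 0, hop IN rels(p) | s + hop.dist) / 300 + 1.0 * length(p) AS ms
ORDER BY ms LIMIT 1
RETURN extract(n IN nodes(p) | n.name) AS Path, length(p) AS Hops, ms AS `ms with hop cost`
----
//table

With a large enough per-hop cost, a path with fewer hops can win even when its cable distance is longer. The all-pairs generalization of the plain speed-of-light query follows.
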
119 | 120 | [source,cypher] 121 | ---- 122 | // Compute the minimum travel time between all pairs of cities 123 | MATCH p=(c1:City)-[:BACKBONE_TO*]-(c2:City) 124 | WHERE c1.name <> c2.name AND all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c))) 125 | WITH reduce(s = 0, hop IN rels(p) | s + hop.dist) AS distance, p, c1, c2 126 | ORDER BY distance 127 | RETURN c1.name AS `Start City`, c2.name AS `End City`, collect(distance / 300)[0] AS ms 128 | ORDER BY c2.name 129 | ---- 130 | //table 131 | 132 | == About 133 | 134 | Created by George Lesica (https://twitter.com/glesica[@glesica]) and William Lyon (https://twitter.com/lyonwj[@lyonwj]). 135 | 136 | //console 137 | -------------------------------------------------------------------------------- /render-guides.sh: -------------------------------------------------------------------------------- 1 | export GUIDES=../neo4j-guides 2 | 3 | rm -rf html 4 | mkdir html 5 | 6 | $GUIDES/run.sh index.adoc html/index.html +1 http://guides.neo4j.com/graphgists 7 | 8 | s3cmd put -P html/index.html s3://guides.neo4j.com/graphgists 9 | 10 | #. http://neo4j.com/graphgist/9d627127-003b-411a-b3ce-f8d3970c2afa[Bank Fraud Detection] 11 | 12 | $GUIDES/run.sh fraud/bank-fraud-detection.adoc html/fraud 13 | 14 | #. http://neo4j.com/graphgist/56c4ceb8-0af1-4d36-b14c-aaa482dc2abc[Books Management Graph] 15 | 16 | $GUIDES/run.sh uc-search/books.adoc html/books 17 | 18 | #. http://neo4j.com/graphgist/ec65c2fa-9d83-4894-bc1e-98c475c7b57a[Analyzing Offshore Leaks] 19 | 20 | $GUIDES/run.sh fraud/Offshore_Leaks_and_Azerbaijan.adoc html/leaks 21 | 22 | #. http://neo4j.com/graphgist/306bb0c7-9820-4c29-9835-15625e4e9f96[Network Dependency Graph] 23 | 24 | $GUIDES/run.sh networkITmanagment/NetworkDataCenterManagement1.adoc html/network 25 | 26 | #.
http://neo4j.com/graphgist/4cea8113-30e9-46bc-bbb0-06236a9bd8b9[Job Recommendation System] 27 | 28 | $GUIDES/run.sh recommendation/Competence_Management.adoc html/jobs 29 | 30 | s3cmd put -P --recursive html/* s3://guides.neo4j.com/graphgists/ 31 | -------------------------------------------------------------------------------- /retail/hierarchy_graphgist.adoc: -------------------------------------------------------------------------------- 1 | = (Product) Hierarchy GraphGist 2 | :neo4j-version: 2.3.0 3 | :twitter: @rvanbruggen 4 | :author: Rik Van Bruggen 5 | 6 | :toc: 7 | 8 | == Introduction 9 | 10 | This gist is a complement to http://blog.bruggen.com/2014/03/using-Neo4j-to-manage-and-calculate.html[a blogpost that I wrote] about managing hierarchical data structures in http://www.Neo4j.org[Neo4j]. 11 | 12 | In this example, we are using a "product hierarchy", essentially holding information about the composition of a product (what is it made of, how many of the components are used, and at the lowest level, what is the price of these components). 13 | The model looks like this: 14 | 15 | .Model of a Product Hierarchy 16 | image::http://1.bp.blogspot.com/-XIjEXWHpNmc/Uzbhuoo-9xI/AAAAAAABNWE/7zYyn3Vl3i0/s3200/Screen+Shot+2014-03-29+at+16.04.35.png[] 17 | 18 | Note that in the GraphGist, I have cut the tree depth to 5 levels (product to costs) instead of 6 in the blogpost - and that I also reduced the width of the tree to make it manageable in a gist. 19 | 20 | == Loading some data: a 5-level tree 21 | First we have to load the data into the graph. 
This was a bit of work - but not difficult at all: 22 | 23 | .Creating the top of the tree, the Product (just one in this case): 24 | [source,cypher] 25 | ---- 26 | CREATE (n1:Product {id:1}) 27 | ---- 28 | .Then CREATE the Cost Groups: 29 | [source,cypher] 30 | ---- 31 | MATCH (n1:Product) FOREACH (r IN range(1,3) | CREATE (n2:CostGroup {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n1)) 32 | ---- 33 | .Then add the Cost Types to the Cost Groups: 34 | [source,cypher] 35 | ---- 36 | MATCH (n2:CostGroup) FOREACH (r IN range(1,5) | CREATE (n3:CostType {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n2)) 37 | ---- 38 | .Then add the Cost Subtypes to the Cost Types: 39 | [source,cypher] 40 | ---- 41 | MATCH (n3:CostType) FOREACH (r IN range(1,3) | CREATE (n4:CostSubtype {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n3)) 42 | ---- 43 | .Then finally add the Costs to the Cost Subtypes: 44 | [source,cypher] 45 | ---- 46 | MATCH (n4:CostSubtype) FOREACH (r IN range(1,5) | CREATE (n5:COST {id:r,price:round(rand()*1000)})-[:PART_OF {quantity:round(rand()*100)}]->(n4)) 47 | ---- 48 | 49 | The actual graph then looks like this: 50 | 51 | //graph 52 | 53 | == Querying the hierarchy structure == 54 | 55 | Then we can do some easy queries. 56 | Let's check the structure of the hierarchy and the number of nodes: 57 | 58 | [source,cypher] 59 | ---- 60 | MATCH (n) 61 | RETURN labels(n) AS `Kinds of Nodes`, count(n) AS `Number of Nodes`; 62 | ---- 63 | 64 | This is what it looks like: 65 | 66 | //table 67 | 68 | Now let's start manipulating the graph and do some interesting stuff. 69 | Let's calculate the price of the product at the top of this hierarchy by sweeping through the graph and multiplying price by the quantities on each of the relationships.
70 | 71 | [source,cypher] 72 | ---- 73 | //calculating price based on full sweep of the tree 74 | MATCH (n1:Product {id:1})<-[r1]-(:CostGroup)<-[r2]-(:CostType)<-[r3]-(:CostSubtype)<-[r4]-(n5:COST) 75 | RETURN sum(r1.quantity*r2.quantity*r3.quantity*r4.quantity*n5.price) AS `Price of Product` 76 | ---- 77 | //table 78 | 79 | == Optimising the calculation with intermediate price values at every level 80 | 81 | But maybe we can do that more efficiently by calculating intermediate prices for each of the levels in the hierarchy: 82 | 83 | [source, cypher] 84 | ---- 85 | //calculate intermediate pricing 86 | MATCH (n4:CostSubtype)<-[r4]-(n5:COST) 87 | WITH n4,sum(r4.quantity*n5.price) AS Sum 88 | SET n4.price=Sum; 89 | ---- 90 | [source, cypher] 91 | ---- 92 | MATCH (n3:CostType)<-[r3]-(n4:CostSubtype) 93 | WITH n3,sum(r3.quantity*n4.price) AS Sum 94 | SET n3.price=Sum; 95 | ---- 96 | [source, cypher] 97 | ---- 98 | MATCH (n2:CostGroup)<-[r2]-(n3:CostType) 99 | WITH n2,sum(r2.quantity*n3.price) AS Sum 100 | SET n2.price=Sum; 101 | ---- 102 | [source, cypher] 103 | ---- 104 | MATCH (n1:Product)<-[r1]-(n2:CostGroup) 105 | WITH n1, sum(r1.quantity*n2.price) AS Sum 106 | SET n1.price=Sum 107 | RETURN Sum; 108 | ---- 109 | //table 110 | 111 | Then we can easily calculate the price of the product by just using the intermediate pricing, and scanning a MUCH smaller part of the graph: 112 | 113 | [source, cypher] 114 | ---- 115 | MATCH (n1:Product {id:1})<-[r1]-(n2:CostGroup) 116 | RETURN sum(r1.quantity*n2.price) AS `Price of Product` 117 | ---- 118 | 119 | //table 120 | 121 | We can check the accuracy by looking at a different level and verifying if we get the same result: 122 | 123 | [source, cypher] 124 | ---- 125 | MATCH (n1:Product {id:1})<-[r1]-(n2:CostGroup)<-[r2]-(n3:CostType) 126 | RETURN sum(r1.quantity*r2.quantity*n3.price) AS `Price of Product` 127 | ---- 128 | //table 129 | 130 | Yey! That seems to have confirmed the theory! 
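
As a further check, and only as a sketch that assumes the intermediate-price queries above have already been run, the full sweep and the roll-up stored on the product node can be compared in a single read-only query:

[source,cypher]
----
// Compare the full-tree calculation with the price stored on the Product node
MATCH (n1:Product {id:1})<-[r1]-(:CostGroup)<-[r2]-(:CostType)<-[r3]-(:CostSubtype)<-[r4]-(n5:COST)
WITH n1, sum(r1.quantity*r2.quantity*r3.quantity*r4.quantity*n5.price) AS fullSweep
RETURN fullSweep AS `Full Sweep`, n1.price AS `Stored Price`, fullSweep = n1.price AS `Prices Agree`
----
//table

The two should agree because multiplication distributes over the sums: each stored price already contains the quantity-weighted total of everything beneath it.
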
131 | 132 | == What if something changes in the hierarchy? == 133 | Now let's see what happens if we change the price of one of the costs at the bottom of the tree: 134 | 135 | [source,cypher] 136 | ---- 137 | MATCH (n5:COST) 138 | WITH n5, n5.price AS OldPrice LIMIT 1 139 | SET n5.price = n5.price*10 140 | WITH n5.price-OldPrice AS PriceDiff,n5 141 | MATCH (n5)-[r4:PART_OF]->(n4:CostSubtype)-[r3:PART_OF]->(n3:CostType)-[r2:PART_OF]->(n2:CostGroup)-[r1:PART_OF]->(n1:Product) 142 | SET n4.price=n4.price+(PriceDiff*r4.quantity), 143 | n3.price=n3.price+(PriceDiff*r4.quantity*r3.quantity), 144 | n2.price=n2.price+(PriceDiff*r4.quantity*r3.quantity*r2.quantity), 145 | n1.price=n1.price+(PriceDiff*r4.quantity*r3.quantity*r2.quantity*r1.quantity) 146 | RETURN PriceDiff AS `Price Difference`, n1.price AS `New Price of Product` 147 | ---- 148 | //table 149 | 150 | Then we can also go back and replay the queries above and see what has happened in the console below: 151 | 152 | == Conclusion == 153 | 154 | I hope this gist complements the blogpost and gives you some ideas around how to work with any kind of hierarchy using Neo4j. 155 | 156 | == About the Author 157 | 158 | This gist was created by link:mailto:rik@neotechnology.com[Rik Van Bruggen] 159 | 160 | * link:http://blog.bruggen.com[My Blog] 161 | * link:http://twitter.com/rvanbruggen[On Twitter] 162 | * link:http://be.linkedin.com/in/rikvanbruggen/[On LinkedIn] 163 | 164 | //console 165 | -------------------------------------------------------------------------------- /retail/northwind-graph.adoc: -------------------------------------------------------------------------------- 1 | = Northwind Graph 2 | :neo4j-version: 2.3.0 3 | 4 | :toc: 5 | 6 | == From RDBMS to Graph, using a classic dataset 7 | 8 | The _Northwind Graph_ demonstrates how to migrate from a relational 9 | database to Neo4j.
The transformation is iterative and deliberate, 10 | emphasizing the conceptual shift from relational tables to the nodes and 11 | relationships of a graph. 12 | 13 | This guide will show you how to: 14 | 15 | 1. Load: create data from external CSV files 16 | 2. Index: index nodes based on label 17 | 3. Relate: transform foreign key references into data relationships 18 | 4. Promote: transform join records into relationships 19 | 20 | 21 | == Product Catalog 22 | 23 | Northwind sells food products in a few categories, provided by 24 | suppliers. Let's start by loading the product catalog tables. 25 | 26 | The load statements below require public internet 27 | access. `LOAD CSV` will retrieve a CSV file from a valid URL, applying a 28 | Cypher statement to each row using a named map (here we're using the 29 | name `row`). 30 | 31 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151109/product-category-supplier.png[image] 32 | 33 | == Load records 34 | 35 | [source,cypher] 36 | ---- 37 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row 38 | CREATE (n:Product) 39 | SET n = row, 40 | n.unitPrice = toFloat(row.unitPrice), 41 | n.unitsInStock = toInt(row.unitsInStock), n.unitsOnOrder = toInt(row.unitsOnOrder), 42 | n.reorderLevel = toInt(row.reorderLevel), n.discontinued = (row.discontinued <> "0") 43 | ---- 44 | 45 | [source,cypher] 46 | ---- 47 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/categories.csv" AS row 48 | CREATE (n:Category) 49 | SET n = row 50 | ---- 51 | 52 | [source,cypher] 53 | ---- 54 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/suppliers.csv" AS row 55 | CREATE (n:Supplier) 56 | SET n = row 57 | ---- 58 | 59 | == Create indexes 60 | 61 | [source,cypher] 62 | ---- 63 | CREATE INDEX ON :Product(productID) 64 | ---- 65 | 66 | [source,cypher] 67 | ---- 68 | CREATE INDEX ON :Category(categoryID) 69 | ---- 70 | 71 | [source,cypher] 72 | ---- 73 | CREATE
INDEX ON :Supplier(supplierID) 74 | ---- 75 | 76 | == Product Catalog Graph 77 | 78 | The products, categories and suppliers are related through foreign key 79 | references. Let's promote those to data relationships to realize the 80 | graph. 81 | 82 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151108/product-graph.png[image] 83 | 84 | === Create data relationships 85 | 86 | [source,cypher] 87 | ---- 88 | MATCH (p:Product),(c:Category) 89 | WHERE p.categoryID = c.categoryID 90 | CREATE (p)-[:PART_OF]->(c) 91 | ---- 92 | 93 | Note you only need to compare property values like this when first 94 | creating relationships. 95 | 96 | Calculate join, materialize relationship. 97 | (See http://neo4j.com/developer/guide-importing-data-and-etl[importing 98 | guide] for more details.) 99 | 100 | [source,cypher] 101 | ---- 102 | MATCH (p:Product),(s:Supplier) 103 | WHERE p.supplierID = s.supplierID 104 | CREATE (s)-[:SUPPLIES]->(p) 105 | ---- 106 | 107 | Note you only need to compare property values like this when first 108 | creating relationships. 109 | 110 | == Querying Product Catalog Graph 111 | 112 | Let's try some queries using patterns. 113 | 114 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151108/product-graph.png[image] 115 | 116 | === Query using patterns 117 | 118 | List the product categories provided by each supplier: 119 | 120 | [source,cypher] 121 | ---- 122 | MATCH (s:Supplier)-->(:Product)-->(c:Category) 123 | RETURN s.companyName AS Company, collect(DISTINCT c.categoryName) AS Categories 124 | ---- 125 | //table 126 | 127 | Find the produce suppliers: 128 | 129 | [source,cypher] 130 | ---- 131 | MATCH (c:Category {categoryName:"Produce"})<--(:Product)<--(s:Supplier) 132 | RETURN DISTINCT s.companyName AS ProduceSuppliers 133 | ---- 134 | //table
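
Once the relationships exist, grouping by category no longer needs a property comparison. A minimal sketch, using only the labels and the `PART_OF` relationship created above, counts the products in each category by traversal:

[source,cypher]
----
// Count products per category by following PART_OF instead of joining on categoryID
MATCH (c:Category)<-[:PART_OF]-(p:Product)
RETURN c.categoryName AS Category, count(p) AS Products
ORDER BY Products DESC
----
//table
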
135 | 136 | == Customer Orders 137 | 138 | Northwind customers place orders which may detail multiple 139 | products. image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151108/customer-orders.png[image] 140 | 141 | === Load and index records 142 | 143 | [source,cypher] 144 | ---- 145 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/customers.csv" AS row 146 | CREATE (n:Customer) 147 | SET n = row 148 | ---- 149 | 150 | [source,cypher] 151 | ---- 152 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/orders.csv" AS row 153 | CREATE (n:Order) 154 | SET n = row 155 | ---- 156 | 157 | [source,cypher] 158 | ---- 159 | CREATE INDEX ON :Customer(customerID) 160 | ---- 161 | 162 | [source,cypher] 163 | ---- 164 | CREATE INDEX ON :Order(orderID) 165 | ---- 166 | 167 | == Create data relationships 168 | 169 | [source,cypher] 170 | ---- 171 | MATCH (c:Customer),(o:Order) 172 | WHERE c.customerID = o.customerID 173 | CREATE (c)-[:PURCHASED]->(o) 174 | ---- 175 | 176 | Note you only need to compare property values like this when first 177 | creating relationships. 178 | 179 | == Customer Order Graph 180 | 181 | Notice that Order Details are always part of an Order and that 182 | they _relate_ the Order to a Product; they're a join table. Join 183 | tables are always a sign of a data relationship, indicating shared 184 | information between two other records.
185 | 186 | Here, we'll directly promote each OrderDetail record into a relationship 187 | in the graph. image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151107/order-graph.png[image] 188 | 189 | 190 | === Load and index records 191 | 192 | [source,cypher] 193 | ---- 194 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/order-details.csv" AS row 195 | MATCH (p:Product), (o:Order) 196 | WHERE p.productID = row.productID AND o.orderID = row.orderID 197 | CREATE (o)-[details:ORDERS]->(p) 198 | SET details = row, 199 | details.quantity = toInt(row.quantity) 200 | ---- 201 | 202 | Note you only need to compare property values like this when first 203 | creating relationships. 204 | 205 | == Query using patterns 206 | 207 | [source,cypher] 208 | ---- 209 | MATCH (cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(p:Product), 210 | (p)-[:PART_OF]->(c:Category {categoryName:"Produce"}) 211 | RETURN DISTINCT cust.contactName AS CustomerName, SUM(o.quantity) AS TotalProductsPurchased 212 | ---- 213 | //table 214 | 215 | _More Resources_ 216 | 217 | * http://neo4j.com/developer/guide-importing-data-and-etl/[Full 218 | Northwind import example] 219 | * http://neo4j.com/developer[Developer resources] 220 | 221 | -------------------------------------------------------------------------------- /syntax.adoc: -------------------------------------------------------------------------------- 1 | = How to create a GraphGist 2 | Anders Nawroth 3 | v0.1, 2013-09-01 4 | :neo4j-version: 2.3 5 | :author: Anders Nawroth 6 | :twitter: @nawroth 7 | :style: red:Person(name), #54A835/#1078B5/white:Database(name) 8 | 9 | You create a GraphGist by creating a https://gist.github.com/[GitHub Gist] in http://asciidoctor.org/docs/asciidoc-quick-reference/[AsciiDoc] and entering its URL in the form on this page.
10 | Alternatively, you can put an AsciiDoc document in https://www.dropbox.com/[Dropbox], etherpad, pastebin or google doc, and enter the public URL in the URL-box top-right. 11 | 12 | This GraphGist shows the basics of using AsciiDoc syntax and a few additions for GraphGists. 13 | The additions are entered as comments on their own line. 14 | They are: +//console+ for a query console; +//hide+, +//setup+ and +//output+ to configure a query; +//graph+ and +//table+ to visualize queries and show a result table. 15 | 16 | Click on the Page Source button in the menu to see the source for this GraphGist. 17 | Read below to get the full details. 18 | 19 | == Configure GraphGist Metadata 20 | 21 | The metadata is optional, it is partially used when submitting a GraphGist. 22 | To select a particular version, the `neo4j-version` attribute can be used. 23 | 24 | To provide custom styling to the graph visualization the `style` attribute provides means to set the color/[border-color]/[text-color] for a label and to pre-select a property. 25 | The general syntax is: `:style: color:Label1(property1), color/border-color/text-color:Label2(property2),...` 26 | 27 | Here are the settings for this document. 28 | 29 | [subs="attributes"] 30 | ---- 31 | :neo4j-version: {neo4j-version} 32 | :author: {author} 33 | :twitter: {twitter} 34 | :style: {style} 35 | ---- 36 | 37 | == Define a http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html[Cypher] query 38 | 39 | [source,cypher] 40 | ---- 41 | MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what) 42 | RETURN who,likes,what 43 | ---- 44 | 45 | becomes: 46 | 47 | [source,cypher] 48 | ---- 49 | MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what) 50 | RETURN who,likes,what 51 | ---- 52 | 53 | _Queries are executed in the order they appear on the page during rendering, so make sure they can be performed in that order._ 54 | Each query has a green or red button to indicate if the query was successful or not. 
The console is set up after the queries have executed, with an empty database, so the reader can play around with the queries.

There are three additional settings you can use for queries.
They all go as comments, on their own lines, before the query.
The settings are:

[width="50%",cols="1m,5"]
|===
| hide | Hide the query. The reader can still expand it to see it.
Useful for long queries like setting up initial data.
| setup | Initialize the console with this query.
| output | Show the output from the query.
The output is always there, but this option makes it visible at page load for this query.
|===

Let's try all the settings together. This query will initialize the console, be hidden, and show its raw output:

 //hide
 //setup
 //output
 [source,cypher]
 ----
 CREATE (me:Person {name:'Me'})-[r:LIKES]->(neo4j:Database {name:'Neo4j',link:'http://neo4j.com'})
 RETURN me.name, r, neo4j
 ----

which becomes:

//hide
//setup
//output
[source,cypher]
----
CREATE (me:Person {name:'Me'})-[r:LIKES]->(neo4j:Database {name:'Neo4j',link:'http://neo4j.com'})
RETURN me.name, r, neo4j
----


== Show a graph visualization

The visualization is based on the **database contents** after the preceding query in the page.

+//graph+

becomes:

//graph


== Show a graph visualization of a query result

[source,cypher]
----
MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what)
RETURN who,likes,what
----

The visualization is based on the **results** returned by the preceding query in the page.

+//graph_result+

becomes:

//graph_result

== Show a result table for a query

This will show a result table for the preceding query.
Properties or cells that are URLs are rendered as links, and image URLs are rendered as inline images.

[source,cypher]
----
MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what)
RETURN who,likes,what, who.name, what.name, what.link
----

+//table+

becomes:

//table

== Query with error

This is what happens if a query causes an error:

[source,cypher]
----
CREATE (n:QueryLanguage {name:'cypher'}
----

== Include a query console

A button to hide or show the console is always included.
The +//console+ comment places the Cypher console where you want it and opens it by default.

+//console+

becomes:

//console

== Basic AsciiDoc formatting

[width="50%",cols="1m,1a"]
|===
| \_Italic_ | _Italic_
| \*Bold* | *Bold*
| \`Monospace` | `Monospace`
| `http://www.neo4j.org/` | http://www.neo4j.org/
| `http://www.neo4j.org/[neo4j.org]` | http://www.neo4j.org/[neo4j.org]
| `link:./?5956246[Link to a GraphGist]` | link:./?5956246[Link to a GraphGist]
|===

Document Info:

----
= Graph Gist Title
:neo4j-version: 2.3
:author: Author Name
:twitter: twitterhandle
----

Headings:

 = Heading 1
 == Heading 2
 === Heading 3

Images:

 image::http://assets.neo4j.org/img/still/cineasts.gif[]

image::http://assets.neo4j.org/img/still/cineasts.gif[]

----
* Item 1
** Item 1.1
* Item 2
----

* Item 1
** Item 1.1
* Item 2

----
. First
. Second
----

. First
. Second

Monospaced block: indent lines with one space.

Tables are well supported.
See http://asciidoctor.org/docs/asciidoc-quick-reference/[AsciiDoc Quick Reference] for information on that and more.

--------------------------------------------------------------------------------
/uc-search/graphgist_water.adoc:
--------------------------------------------------------------------------------
= Piping Water
:neo4j-version: 2.3.0
:author: Shaun Daley
:twitter: @shaundaley1
:tags: resources
:domain: Shutting Valves and Migrating Infrastructure

:toc:

== Inspiration

London's antique water distribution network is infamous: it loses a http://www.theguardian.com/commentisfree/2012/may/08/water-industry-pipes-scandal[quarter of the water] supplied to London, spilt into the ground. The consequences: http://www.bbc.co.uk/news/10213835[desalination], massive additional CO~2~ emissions, road congestion caused by too many emergency excavations, and very high water prices for consumers.
London's case is severe but not atypical: most cities suffer from the same underlying infrastructure problem.
Pipes and valves buried below busy urban streets are inherently difficult and expensive to maintain.
Inaccessibility, lack of information, failure to process data efficiently and the high cost of each human intervention in legacy systems all compound to undermine efficient resource distribution.

Modern wireless networking offers the first crucial step towards reducing the human time and cost of maintenance and avoiding environmental damage: modern valves can be remotely (and even autonomously) operated to isolate components and pipe sections; modern flow sensors can transmit cross-section flow rates minute by minute.
The remaining challenge is to log, model and process these resources on a city scale (London-sized) with several hundred million components (pipe sections, valves, pumps, flow sensors, outlets, sources, etc.) and the sparse, evolving relations between those components.
Neo4j fits this bill perfectly.

To illustrate the domain, we'll eliminate most complexity and focus on a single (but ubiquitous) problem.


_Using the Design for Queryability modeling approach by http://twitter.com/ianrobinson[Ian Robinson]_

== Illustrating the Domain

=== Application/End-User Goals

____
*As an* engineer for a water utility

*I want* to know the accessible (remotely controllable, manhole-accessible or excavation-accessible) valves for shutting off a set of components.

*So that* we can rapidly isolate a burst pipe, leaking valve or a set of components for replacement.
____

=== Questions To Ask of the Domain

____
What is the set of valves that must be closed to isolate the smallest possible part of the network including a set of components?

What is the minimum set of network-controllable (or manhole-accessible) valves that must be closed to isolate a set of components?
____


=== Identify Nodes

* Component: Pipes, sources, outlets, pumps, flow sensors, et cetera
* Connection
* Valve

=== Identify Relationships Between Nodes

----
(:Component)<-[:CONNECTS]-(:Connection)-[:CONNECTS]->(:Component)
(:Valve)-[:CLOSES]->(:Connection)
----

Valves are distinct objects with their own attributes, e.g. IP address, API key, state information, and whether they are manually closed clockwise or anti-clockwise (both valve closing directions exist and are widely prevalent in UK legacy water distribution infrastructure).
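
As a minimal sketch of how such attributes could sit on a valve node: the property names `ipAddress`, `state` and `closesClockwise`, and their values, are hypothetical and chosen only for illustration; only `access` appears in the data model used on this page. The statement is shown as a plain listing, like the patterns above, so it is not run against this page's database.

----
// Hypothetical valve properties (assumed names, for illustration only)
CREATE (v:Valve {
  access: 'API',              // how the valve can be reached
  ipAddress: '192.0.2.10',    // documentation-range address, not a real device
  state: 'open',
  closesClockwise: true       // both closing directions exist in legacy networks
})-[:CLOSES]->(c:Connection)
RETURN v, c
----
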
For efficiency of querying, the valve's access type is also duplicated in the type of the relationships from the connection it closes:

----
(:Valve{access:'API'})-[:CLOSES]->(:Connection)-[:API_CONNECTS]->(:Component)
(:Valve{access:'Manhole'})-[:CLOSES]->(:Connection)-[:MANHOLE_CONNECTS]->(:Component)
(:Valve{access:'Excavation'})-[:CLOSES]->(:Connection)-[:EXCAV_CONNECTS]->(:Component)
----

== Candidate Data Model

//hide
//setup
[source,cypher]
----
CREATE
  (burstPipe:Component{name:"BurstPipe"}),
  (pipe1:Component),
  (pipe2:Component),
  (pipe3:Component),
  (pipe4:Component),
  (pipe5:Component),
  (pipe6:Component),
  (pipe7:Component),
  (pipe8:Component),
  (pipe9:Component),
  (pipe10:Component),
  (pipe11:Component),
  (connection1:Connection),
  (connection2:Connection),
  (connection3:Connection),
  (connection4:Connection),
  (connection5:Connection),
  (connection6:Connection),
  (connection7:Connection),
  (connection8:Connection),
  (connection9:Connection),
  (connection10:Connection),
  (connection11:Connection),
  (valve1:Valve {access:'API'}),
  (valve2:Valve {access:'Excavation'}),
  (valve3:Valve {access:'API'}),
  (valve4:Valve {access:'Manhole'}),
  (valve5:Valve {access:'API'}),
  (valve6:Valve {access:'API'}),
  (valve7:Valve {access:'API'}),
  (valve8:Valve {access:'API'}),
  (connection1)-[:API_CONNECTS]->(burstPipe),
  (connection1)-[:API_CONNECTS]->(pipe1),
  (connection2)-[:EXCAV_CONNECTS]->(burstPipe),
  (connection2)-[:EXCAV_CONNECTS]->(pipe2),
  (connection3)-[:CONNECTS]->(pipe2),
  (connection3)-[:CONNECTS]->(pipe3),
  (connection4)-[:API_CONNECTS]->(pipe2),
  (connection4)-[:API_CONNECTS]->(pipe4),
  (connection5)-[:MANHOLE_CONNECTS]->(pipe3),
  (connection5)-[:MANHOLE_CONNECTS]->(pipe5),
  (connection6)-[:API_CONNECTS]->(pipe3),
  (connection6)-[:API_CONNECTS]->(pipe6),
  (connection7)-[:CONNECTS]->(pipe5),
  (connection7)-[:CONNECTS]->(pipe7),
  (connection8)-[:CONNECTS]->(pipe5),
  (connection8)-[:CONNECTS]->(pipe8),
  (connection9)-[:API_CONNECTS]->(pipe5),
  (connection9)-[:API_CONNECTS]->(pipe9),
  (connection10)-[:API_CONNECTS]->(pipe7),
  (connection10)-[:API_CONNECTS]->(pipe10),
  (connection11)-[:API_CONNECTS]->(pipe8),
  (connection11)-[:API_CONNECTS]->(pipe11),
  (valve1)-[:CLOSES]->(connection1),
  (valve2)-[:CLOSES]->(connection2),
  (valve3)-[:CLOSES]->(connection4),
  (valve4)-[:CLOSES]->(connection5),
  (valve5)-[:CLOSES]->(connection6),
  (valve6)-[:CLOSES]->(connection9),
  (valve7)-[:CLOSES]->(connection10),
  (valve8)-[:CLOSES]->(connection11)
RETURN *
----
//graph

=== Isolate the Burst Pipe Using Only Remote Calls to API-Accessible Valves

[source,cypher]
----
// Look up the burst pipe by label and name instead of the legacy auto-index
MATCH (burstPipe:Component {name:'BurstPipe'})
MATCH (burstPipe)-[:CONNECTS|EXCAV_CONNECTS|MANHOLE_CONNECTS*0..]-()-[:API_CONNECTS]-(h)-[:CLOSES]-(v {access:'API'})
RETURN v
----
//table

=== Isolate the Burst Pipe Using Manhole-Accessible and API-Accessible Valves

[source,cypher]
----
// Look up the burst pipe by label and name instead of the legacy auto-index
MATCH (burstPipe:Component {name:'BurstPipe'})
MATCH (burstPipe)-[:CONNECTS|EXCAV_CONNECTS*0..]-()-[:MANHOLE_CONNECTS|API_CONNECTS]-(h)-[:CLOSES]-(v)
RETURN v
----
//table

=== Isolate the Burst Pipe Using Any Existing Valves

[source,cypher]
----
// Look up the burst pipe by label and name instead of the legacy auto-index
MATCH (burstPipe:Component {name:'BurstPipe'})
MATCH (burstPipe)-[:CONNECTS*0..]-()-[:EXCAV_CONNECTS|MANHOLE_CONNECTS|API_CONNECTS]-(h)-[:CLOSES]-(v)
RETURN v
----
//table

== Extension

For real-world application, there are some necessary modifications (e.g.
modelling state information in relationships, such as whether a connection is presently closed or scheduled for opening/closing; limiting query depth and reporting query failure if the maximum query depth is reached).

In real-world application, extending the above model could add greater value still:

- estimating the marginal water savings from replacing any defined set of components
- estimating the resilience of network water pressure to failure of specific pumps (both current and under hypothetical modifications to the network)
- scheduling replacement or state-change of parts, and communicating this seamlessly (and automatically) in real time to all other parties that this might affect

This approach is more generic than it may initially seem.
Many resource problems involve networks of distribution in which many components interact across sparse relationships (electricity generation and distribution, natural gas, sewage, district-piped heating); rapid and efficient querying on these relationships is necessary for efficient resource allocation and better environmental and cost outcomes.

//console
--------------------------------------------------------------------------------