├── .gitignore
├── LICENSE
├── README.md
├── browser-guides
│   ├── apoc
│   │   ├── 01_apoc_intro.adoc
│   │   ├── 02_datetime.adoc
│   │   ├── 03_load_json.adoc
│   │   ├── 04_refactor_data.adoc
│   │   ├── 05_periodic.adoc
│   │   └── apoc.adoc
│   ├── bebe
│   │   └── bebe_en.adoc
│   ├── data
│   │   ├── CompanyDataAmericans.csv
│   │   ├── ElectionDonationsAmericans.csv
│   │   ├── LandOwnershipAmericans.csv
│   │   ├── PSCAmericans.csv
│   │   ├── asoiaf-all-edges.csv
│   │   ├── asoiaf-book1-edges.csv
│   │   ├── asoiaf-book2-edges.csv
│   │   ├── asoiaf-book3-edges.csv
│   │   ├── asoiaf-book45-edges.csv
│   │   ├── employee-map.json
│   │   ├── person.json
│   │   ├── stream_clean.json
│   │   └── worldcities.csv
│   ├── data_science
│   │   ├── 01_data_import.adoc
│   │   ├── 02_analysis_algo.adoc
│   │   ├── 03_pagerank.adoc
│   │   ├── 04_label_propagation.adoc
│   │   ├── 05_louvain.adoc
│   │   ├── 06_betweenness.adoc
│   │   ├── data_science.adoc
│   │   └── installing_apoc.adoc
│   ├── football_transfers
│   │   └── football_transfers.adoc
│   ├── got
│   │   ├── 01_eda.adoc
│   │   ├── 02_algorithms.adoc
│   │   └── got.adoc
│   ├── got_wwc
│   │   ├── 01_intro.adoc
│   │   ├── 02_got.adoc
│   │   ├── 03_got_houses.adoc
│   │   ├── 04_got_families.adoc
│   │   └── got_wwc.adoc
│   ├── hospital
│   │   └── hospital.adoc
│   ├── img
│   │   ├── AStormOfSwords.jpg
│   │   ├── Graph_betweenness.jpg
│   │   ├── PageRanks-Example.png
│   │   ├── apoc-neo4j-user-defined-procedures.jpg
│   │   ├── betweenness-centrality.png
│   │   ├── bugs-bunny-the-end.jpg
│   │   ├── char_cooccurence.png
│   │   ├── cypher_create.jpg
│   │   ├── cypher_run_button.jpg
│   │   ├── cytutorial_neo4j_browser.jpg
│   │   ├── dark-chocolate-pudding-with-malted-cream.jpg
│   │   ├── database_import.png
│   │   ├── document_common_attributes.png
│   │   ├── download_csv.png
│   │   ├── download_graph.png
│   │   ├── enable_multiline_queries.jpg
│   │   ├── footballtransfer-model.png
│   │   ├── got_header.png
│   │   ├── graph-data-science.jpg
│   │   ├── hospitalmeta.jpg
│   │   ├── jqassistant.png
│   │   ├── label-propagation-graph-algorithm-1.png
│   │   ├── label-propagation-graph-algorithm.png
│   │   ├── life-science-import-datamodel.jpg
│   │   ├── life-sciences-import-model-attribute.jpg
│   │   ├── life-sciences-import-model-gene.jpg
│   │   ├── louvain.jpg
│   │   ├── meetup.png
│   │   ├── n10s.png
│   │   ├── neo4j-browser-sync.png
│   │   ├── nodes.png
│   │   ├── northwind_data_model.png
│   │   ├── pin_button.png
│   │   ├── rdf.png
│   │   ├── restaurant_recommendation_model.png
│   │   ├── schema.png
│   │   ├── schema_documents.png
│   │   ├── slides.jpg
│   │   ├── stackexchange-logo.png
│   │   ├── stackoverflow-logo.png
│   │   ├── stackoverflow-model.jpg
│   │   ├── style_actedin_relationship.png
│   │   ├── style_person_node.png
│   │   ├── style_sheet_grass.png
│   │   ├── sushi_restaurants_nyc.png
│   │   ├── sysinfo_stats.png
│   │   ├── transfermarkt.png
│   │   └── ukcompanies_model.png
│   ├── import
│   │   ├── 01_load_csv.adoc
│   │   ├── 02_apoc.adoc
│   │   ├── 03_procedures.adoc
│   │   └── import.adoc
│   ├── intro-browser
│   │   └── intro-browser.adoc
│   ├── javaland
│   │   └── javaland.adoc
│   ├── jqa
│   │   └── jqa.adoc
│   ├── life-science-import
│   │   └── life-science-import.adoc
│   ├── meetup
│   │   ├── 01_meetup_import.adoc
│   │   ├── 02_data_analysis.adoc
│   │   └── meetup.adoc
│   ├── rdf
│   │   └── rdf.adoc
│   ├── recipes
│   │   └── recipes.adoc
│   ├── restaurant_recommendation
│   │   └── restaurant_recommendation.adoc
│   ├── stackoverflow
│   │   └── stackoverflow.adoc
│   └── ukcompanies
│       └── ukcompanies.adoc
├── finance
│   └── neo4j_icij.adoc
├── fraud
│   ├── BankFraud-1.png
│   ├── Credit_Card_Fraud_Detection.adoc
│   ├── Offshore_Leaks_and_Azerbaijan.adoc
│   └── bank-fraud-detection.adoc
├── index.adoc
├── mdm
│   ├── Organizational_learning.adoc
│   └── aws-infrastructure.adoc
├── medical
│   ├── DoctorFinder.adoc
│   ├── central_hospital_of_asturias.adoc
│   ├── pharma_drugs_targets.adoc
│   ├── treatment_planners.adoc
│   └── zombie.adoc
├── networkITmanagment
│   ├── GeoptimaAllocation.adoc
│   ├── NetworkDataCenterManagement1.adoc
│   ├── datacenter-management-1.PNG
│   └── network-routing.adoc
├── recommendation
│   ├── Competence_Management.adoc
│   └── marchMadnessBracketBuilder.adoc
├── render-guides.sh
├── retail
│   ├── Menus_in_NYPL.adoc
│   ├── SupplyChainManagement.adoc
│   ├── hierarchy_graphgist.adoc
│   └── northwind-graph.adoc
├── social
│   ├── finding_influencers.adoc
│   ├── neo4j-contact-networks.adoc
│   └── project_management.adoc
├── syntax.adoc
├── uc-search
│   ├── books.adoc
│   ├── citation_patterns.adoc
│   ├── graphgist_water.adoc
│   └── yellowstone-gist.adoc
└── web
    └── Aardvark.adoc
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | html
3 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | CC0 1.0 Universal
2 |
3 | Statement of Purpose
4 |
5 | The laws of most jurisdictions throughout the world automatically confer
6 | exclusive Copyright and Related Rights (defined below) upon the creator and
7 | subsequent owner(s) (each and all, an "owner") of an original work of
8 | authorship and/or a database (each, a "Work").
9 |
10 | Certain owners wish to permanently relinquish those rights to a Work for the
11 | purpose of contributing to a commons of creative, cultural and scientific
12 | works ("Commons") that the public can reliably and without fear of later
13 | claims of infringement build upon, modify, incorporate in other works, reuse
14 | and redistribute as freely as possible in any form whatsoever and for any
15 | purposes, including without limitation commercial purposes. These owners may
16 | contribute to the Commons to promote the ideal of a free culture and the
17 | further production of creative, cultural and scientific works, or to gain
18 | reputation or greater distribution for their Work in part through the use and
19 | efforts of others.
20 |
21 | For these and/or other purposes and motivations, and without any expectation
22 | of additional consideration or compensation, the person associating CC0 with a
23 | Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
24 | and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
25 | and publicly distribute the Work under its terms, with knowledge of his or her
26 | Copyright and Related Rights in the Work and the meaning and intended legal
27 | effect of CC0 on those rights.
28 |
29 | 1. Copyright and Related Rights. A Work made available under CC0 may be
30 | protected by copyright and related or neighboring rights ("Copyright and
31 | Related Rights"). Copyright and Related Rights include, but are not limited
32 | to, the following:
33 |
34 | i. the right to reproduce, adapt, distribute, perform, display, communicate,
35 | and translate a Work;
36 |
37 | ii. moral rights retained by the original author(s) and/or performer(s);
38 |
39 | iii. publicity and privacy rights pertaining to a person's image or likeness
40 | depicted in a Work;
41 |
42 | iv. rights protecting against unfair competition in regards to a Work,
43 | subject to the limitations in paragraph 4(a), below;
44 |
45 | v. rights protecting the extraction, dissemination, use and reuse of data in
46 | a Work;
47 |
48 | vi. database rights (such as those arising under Directive 96/9/EC of the
49 | European Parliament and of the Council of 11 March 1996 on the legal
50 | protection of databases, and under any national implementation thereof,
51 | including any amended or successor version of such directive); and
52 |
53 | vii. other similar, equivalent or corresponding rights throughout the world
54 | based on applicable law or treaty, and any national implementations thereof.
55 |
56 | 2. Waiver. To the greatest extent permitted by, but not in contravention of,
57 | applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
58 | unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
59 | and Related Rights and associated claims and causes of action, whether now
60 | known or unknown (including existing as well as future claims and causes of
61 | action), in the Work (i) in all territories worldwide, (ii) for the maximum
62 | duration provided by applicable law or treaty (including future time
63 | extensions), (iii) in any current or future medium and for any number of
64 | copies, and (iv) for any purpose whatsoever, including without limitation
65 | commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
66 | the Waiver for the benefit of each member of the public at large and to the
67 | detriment of Affirmer's heirs and successors, fully intending that such Waiver
68 | shall not be subject to revocation, rescission, cancellation, termination, or
69 | any other legal or equitable action to disrupt the quiet enjoyment of the Work
70 | by the public as contemplated by Affirmer's express Statement of Purpose.
71 |
72 | 3. Public License Fallback. Should any part of the Waiver for any reason be
73 | judged legally invalid or ineffective under applicable law, then the Waiver
74 | shall be preserved to the maximum extent permitted taking into account
75 | Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
76 | is so judged Affirmer hereby grants to each affected person a royalty-free,
77 | non transferable, non sublicensable, non exclusive, irrevocable and
78 | unconditional license to exercise Affirmer's Copyright and Related Rights in
79 | the Work (i) in all territories worldwide, (ii) for the maximum duration
80 | provided by applicable law or treaty (including future time extensions), (iii)
81 | in any current or future medium and for any number of copies, and (iv) for any
82 | purpose whatsoever, including without limitation commercial, advertising or
83 | promotional purposes (the "License"). The License shall be deemed effective as
84 | of the date CC0 was applied by Affirmer to the Work. Should any part of the
85 | License for any reason be judged legally invalid or ineffective under
86 | applicable law, such partial invalidity or ineffectiveness shall not
87 | invalidate the remainder of the License, and in such case Affirmer hereby
88 | affirms that he or she will not (i) exercise any of his or her remaining
89 | Copyright and Related Rights in the Work or (ii) assert any associated claims
90 | and causes of action with respect to the Work, in either case contrary to
91 | Affirmer's express Statement of Purpose.
92 |
93 | 4. Limitations and Disclaimers.
94 |
95 | a. No trademark or patent rights held by Affirmer are waived, abandoned,
96 | surrendered, licensed or otherwise affected by this document.
97 |
98 | b. Affirmer offers the Work as-is and makes no representations or warranties
99 | of any kind concerning the Work, express, implied, statutory or otherwise,
100 | including without limitation warranties of title, merchantability, fitness
101 | for a particular purpose, non infringement, or the absence of latent or
102 | other defects, accuracy, or the present or absence of errors, whether or not
103 | discoverable, all to the greatest extent permissible under applicable law.
104 |
105 | c. Affirmer disclaims responsibility for clearing rights of other persons
106 | that may apply to the Work or any use thereof, including without limitation
107 | any person's Copyright and Related Rights in the Work. Further, Affirmer
108 | disclaims responsibility for obtaining any necessary consents, permissions
109 | or other rights required for any use of the Work.
110 |
111 | d. Affirmer understands and acknowledges that Creative Commons is not a
112 | party to this document and has no duty or obligation with respect to this
113 | CC0 or use of the Work.
114 |
115 | For more information, please see
116 | <http://creativecommons.org/publicdomain/zero/1.0/>
117 |
118 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # graphgists
2 | Reference Graph Gists
3 |
4 | == Basic Guidelines for Graph Gists
5 |
6 | * Use neo4j-version: 2.3
7 | * Use Neo4j 2.3 features, especially **Labels**
8 | * Adhere to the Cypher style guide (WIP, but: capitalized labels, all-caps rel-types, camel-case properties, if possible consistent keyword casing e.g. all-caps)
9 | * Use meaningful relationship types
10 | * Include a data model between 20 and 150 nodes in size
11 | * Explain the use case and be a good read, but not a novel
12 | * Include a good domain picture, if possible other illustrating pictures
13 | * Include meta-information about the author and topics
14 | * Use the graphgist tools (//graph_result, //table, //setup, //hide, //output)
15 | * Hide long setup queries
16 | * Use one line per sentence for easier versioning
17 | * End with //console
18 |
19 |
20 | == Basic Guidelines for Blog Posts
21 |
22 | In order to maximize the impact of the graph gists, we should write and update them with future blog posts in mind.
23 |
24 | * Posts should be 500+ words
25 | * Posts should include at least one picture or graphic (more are always welcome)
26 | * Posts should include at least one code example
27 |
28 |
--------------------------------------------------------------------------------
/browser-guides/apoc/01_apoc_intro.adoc:
--------------------------------------------------------------------------------
1 | = Intro to APOC
2 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data
3 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img
4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc
5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc
6 | :icons: font
7 | :neo4j-version: 3.5
8 |
9 | == Intro to APOC
10 |
11 | In this guide, we will see how to use the standard procedures and functions in the APOC library to assist in many activities with Neo4j.
12 | We will look at some of the most-used procedures, as well as some lesser known, and we will show how to navigate the library to find helpful procedures.
13 |
14 | Before we begin, though, we need to install the APOC library to operate with our Neo4j database instance.
15 |
16 | == Quick Check: Version compatibility matrix
17 |
18 | Since APOC relies on Neo4j's internal APIs in some places, you need to use the right APOC version for your Neo4j installation.
19 |
20 | APOC uses a consistent versioning scheme: `<neo4j-version>.<apoc>`.
21 | The trailing `<apoc>` part of the version number is incremented with every APOC release.
22 |
23 | [opts=header]
24 | |===
25 | |apoc version | neo4j version
26 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.5.0.6[3.5.0.6^] | 3.5.12 (3.5.x)
27 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.4.0.4[3.4.0.6^] | 3.4.12 (3.4.x)
28 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.3.0.4[3.3.0.4^] | 3.3.6 (3.3.x)
29 | | http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.2.3.6[3.2.3.6^] | 3.2.9 (3.2.x)
30 | |===
31 |
32 | The full version compatibility matrix is in the https://github.com/neo4j-contrib/neo4j-apoc-procedures#version-compatibility-matrix[APOC docs^].
33 |
34 | == Installation: Getting APOC
35 |
36 | We have a couple of options for installing APOC, depending on what type of Neo4j installation is running.
37 |
38 | 1. *For Neo4j Desktop:* we can install the built-in plugin in the `Project` or `Manage` database view. This automatically takes care of any configurations needed to use APOC. The steps are in the https://neo4j.com/docs/labs/apoc/current/introduction/#_installation_with_neo4j_desktop[APOC documentation^].
39 |
40 | 2. *For Docker:* The Neo4j Docker image lets us supply a volume for the `/plugins` folder.
41 | Steps are in the https://neo4j.com/docs/labs/apoc/current/introduction/#_using_apoc_with_the_neo4j_docker_image[APOC documentation^].
42 |
43 | 3. *Neo4j Sandbox or Aura:* These instances are both cloud-based and come with APOC pre-installed and ready to use! No steps required to use APOC.
44 |
45 | 4. *For Other Installations:* we will need to http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/{apoc-release}[download the jar^] from Github and place it in the `$NEO4J_HOME/plugins` folder. Additional information and initial configuration steps are shown in the https://neo4j.com/docs/labs/apoc/current/introduction/#_manual_installation_download_latest_release[APOC documentation^].
46 |
47 | == Test APOC installation
48 |
49 | To verify everything installed correctly and we are able to run the procedures, we can try to access the APOC help command.
50 |
51 | [source, cypher]
52 | ----
53 | CALL apoc.help('')
54 | ----
55 |
56 | This procedure lists the type (procedure or function), name, text description, signature (format and parameters with types), role, and writes.
57 |
58 | As another option, we can execute the `dbms.procedures()` command and count the procedures in the APOC package.
59 |
60 | [source, cypher]
61 | ----
62 | CALL dbms.procedures() YIELD name
63 | RETURN head(split(name,".")) as package, count(*), collect(name) as procedures;
64 | ----
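
To make the grouping in the query above concrete, here is a minimal sketch in plain Python (not Cypher or APOC) of what `head(split(name,"."))` followed by `count(*)` and `collect(name)` computes.
The list of procedure names is a hypothetical sample chosen for illustration; a real server returns many more.

```python
from collections import defaultdict

# Hypothetical sample of procedure names (assumption, not real server output).
names = ["apoc.help", "apoc.load.json", "apoc.periodic.iterate", "db.labels", "dbms.procedures"]


def group_by_package(procedure_names):
    """Group names by the text before the first dot, like head(split(name, "."))."""
    groups = defaultdict(list)
    for name in procedure_names:
        package = name.split(".")[0]
        groups[package].append(name)
    return dict(groups)


packages = group_by_package(names)
for package, procedures in packages.items():
    # One row per package: package name, count(*), collect(name)
    print(package, len(procedures), procedures)
```

If the `apoc` row is missing from the real query result, the library is most likely not installed or not loaded.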
65 |
66 | == Calling APOC in Cypher
67 |
68 | User-defined functions can be used in any expression or predicate, just like built-in functions.
69 |
70 | Procedures can be called stand-alone with `CALL procedure.name();` syntax.
71 | You can also integrate them into your Cypher statements, which makes them much more powerful.
72 |
73 | Load JSON example:
74 |
75 | [source, cypher,subs=attributes]
76 | ----
77 | WITH '{data-url}/person.json' AS url
78 | CALL apoc.load.json(url) YIELD value as person
79 | MERGE (p:Person {name:person.name})
80 | ON CREATE SET p.age = person.age, p.children = size(person.children)
81 | RETURN p
82 | ----
83 |
84 | == Next Step
85 |
86 | In the next section, we are going to see how to use APOC to convert dates and times.
87 |
88 | ifdef::env-guide[]
89 | pass:a[Date & Time Conversion]
90 | endif::[]
91 |
92 | ifdef::env-graphgist[]
93 | link:{gist}/02_datetime.adoc[Date & Time Conversion^]
94 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/apoc/02_datetime.adoc:
--------------------------------------------------------------------------------
1 | = Date & Time Conversion in APOC
2 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img
3 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc
4 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc
5 | :icons: font
6 | :neo4j-version: 3.5
7 |
8 | == APOC Date & Time Conversion
9 |
10 | Neo4j supports date and temporal values, but often, we are dealing with differing date formats between systems or files.
11 | These can be difficult to express and translate without a few flexible procedures to handle converting one value formatting to another.
12 |
13 | APOC has several procedures for converting and formatting various date, time, and temporal values.
14 | They save valuable time in manually converting values or creating a procedure from scratch!
15 | The full list of available procedures is in the https://neo4j.com/docs/labs/apoc/current/temporal/[APOC documentation^].
16 |
17 | == Data set for this guide
18 |
19 | image::{img}/northwind_data_model.png[float=right]
20 |
21 | We will use the Northwind retail system data to test the date and time procedures in this guide.
22 | To load the data, we can run the browser guide below.
23 |
24 | [source,cypher]
25 | ----
26 | :play northwind
27 | ----
28 |
29 | A browser guide will appear.
30 | Go ahead and step through the guide, running all of the queries to load all the `Product`, `Supplier`, `Category`, `Order`, and `Customer` data with indexes on specific properties.
31 |
32 | Once completed, we can move to the next slide and start using APOC with this data.
33 |
34 | == Converting dates from Integer to String
35 |
36 | The APOC `apoc.date.format()` takes an integer value for the date and converts it to a string in the desired format, including a custom one.
37 | This is commonly used when translating data from APIs, flat files, or even other databases and moving that data into or out of Neo4j.
38 |
39 | Format: `apoc.date.format(12345, ['ms'/'s'], ['yyyy/MM/dd HH:mm:ss'])`
40 |
41 | This procedure has 3 parameters -
42 |
43 | 1. the date integer value to convert
44 | 2. how specific the first parameter is (`s` for seconds, `ms` for milliseconds)
45 | 3. how we want the date string result to look
46 |
47 | == apoc.date.format Example:
48 |
49 | Our Northwind data has `Customer` nodes who hopefully make orders with our business.
50 | We might want to record when each customer first contacted the business, so that we can see which customers made contact in a given month and which of them went on to place orders in the same year.
51 |
52 | [source, cypher]
53 | ----
54 | WITH 841914000 as dateInt //1996-09-05 09:00 in epoch seconds
55 | MERGE (c:Customer {companyName: 'Island Trading'})
56 | SET c.firstContact = apoc.date.format(dateInt, 's', 'yyyy-MM-dd HH:mm:ss')
57 | RETURN c
58 | ----
59 |
60 | In the example above, we have a date integer in seconds, and we want to update our customer information with that datetime in a human-readable format.
61 | To do that, we merge the `Customer` node and set the `firstContact` property equal to the converted date (using the procedure).
62 |
63 | In the return, we should see the customer's node with all its properties and the formatted date!
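
As a sanity check outside the database, the same conversion can be sketched in plain Python.
This is only a cross-check of the arithmetic, assuming the timestamp is interpreted in UTC; `format_epoch_seconds` is a hypothetical helper name and is not part of APOC.

```python
from datetime import datetime, timezone


def format_epoch_seconds(epoch_seconds):
    """Render epoch seconds as a 'yyyy-MM-dd HH:mm:ss' string, interpreted in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


first_contact = format_epoch_seconds(841914000)
print(first_contact)  # 1996-09-05 09:00:00
```

The printed value matches the comment in the Cypher query, which is a quick way to confirm that the integer really is in seconds rather than milliseconds.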
64 |
65 | == Converting dates from String to Integer
66 |
67 | Let us do the reverse of what we just did on the previous slide by converting a string value to an integer with `apoc.date.parse()`.
68 | This is helpful for comparing date strings from and to various formats, most commonly in data import or export.
69 |
70 | Format: `apoc.date.parse('2019/03/25 03:15:59', ['ms'/'s'], ['yyyy/MM/dd HH:mm:ss'])`
71 |
72 | The procedure needs 3 parameters -
73 |
74 | 1. the date string that needs to be converted
75 | 2. how specific the conversion should be (down to seconds `s` or milliseconds `ms`)
76 | 3. the format of the date string (the 1st parameter)
77 |
78 | == apoc.date.parse Example:
79 |
80 | Let us say that we received a notification from our monitoring system that there was an error in the system at timestamp `882230400`, so we need to find out which orders were possibly affected by the error.
81 | We can use `apoc.date.parse()` to convert the string-formatted date in our Northwind data to a timestamp and compare that to the timestamp we have from our error system.
82 |
83 | [source, cypher]
84 | ----
85 | WITH 882230400 as errorTimestamp //1997-12-16 00:00:00.000 in epoch seconds
86 | MATCH (o:Order)
87 | WHERE apoc.date.parse(o.orderDate, 's', 'yyyy-MM-dd HH:mm:ss.SSS') = errorTimestamp
88 | RETURN o
89 | ----
90 |
91 | In our example, we are given a date integer (epoch time from the error in monitoring system) and want to find the orders that were made on that date.
92 | We use `MATCH` to search for `Order` nodes where the converted `orderDate` property (using the procedure) matches the date integer of the error and return the orders that are found.
93 |
94 | In the return, we should see 3 orders that have an order date of `1997-12-16`!
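
The reverse conversion can be cross-checked the same way.
The sketch below is plain Python rather than APOC, assumes the date string is in UTC, and uses a hypothetical helper name, `parse_to_epoch_seconds`.

```python
from datetime import datetime, timezone


def parse_to_epoch_seconds(text):
    """Parse a 'yyyy-MM-dd HH:mm:ss.SSS' string (assumed UTC) into epoch seconds."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


error_timestamp = parse_to_epoch_seconds("1997-12-16 00:00:00.000")
print(error_timestamp)  # 882230400
```

Because the comparison in the query is an exact equality on seconds, only orders stamped at exactly midnight on that date match.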
95 |
96 | == Adding or subtracting units from timestamps
97 |
98 | The marketing department might want to see how well a marketing campaign did to generate sales.
99 | The campaign was published at timestamp `891388800`, and we need to find out how many sales it generated within the first 30 days running.
100 |
101 | We can use `apoc.date.add()` to take a point in time as an epoch integer (in seconds or milliseconds) and add or subtract a specified time value to find the desired timestamp.
102 |
103 | Format: `apoc.date.add(12345, 'ms', -365, 'd')`
104 |
105 | The procedure above contains 4 parameters -
106 |
107 | 1. the date integer for adding or subtracting
108 | 2. how specific the date integer is (`s` for seconds, `ms` for milliseconds)
109 | 3. the number to add or subtract from the date integer
110 | 4. the unit type to add or subtract
111 |
112 | == apoc.date.add Example:
113 |
114 | [source, cypher]
115 | ----
116 | WITH 891388800 as startDate
117 | WITH startDate, apoc.date.add(startDate, 's', 30, 'd') as endDate
118 | MATCH (o:Order)
119 | WHERE startDate < apoc.date.parse(o.orderDate,'s','yyyy-MM-dd HH:mm:ss.SSS') < endDate
120 | RETURN count(o)
121 | ----
122 |
123 | In our query above, we first set the campaign start timestamp as a variable and then pass that to the next line, where we also use that `startDate` to calculate our end date (using the procedure).
124 | The `apoc.date.add` calculates it by adding 30 days (the `30` and `d` parameters) to the start date and setting that as our `endDate`.
125 | We then search for `Order` nodes where the `orderDate` (converted from string to integer using `apoc.date.parse()`) is greater than the start date of the campaign and less than the end date.
126 |
127 | In the return, we should see the number of orders made within 30 days of the campaign's publication - a total of 70!
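
The 30-day window itself is simple arithmetic, which the following plain-Python sketch reproduces.
It assumes UTC and uses a hypothetical helper name, `add_days`; it does not call APOC.

```python
from datetime import datetime, timedelta, timezone


def add_days(epoch_seconds, days):
    """Add a number of days to an epoch-seconds timestamp (UTC)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return int((moment + timedelta(days=days)).timestamp())


start_date = 891388800
end_date = add_days(start_date, 30)

print(end_date)  # 893980800
print(datetime.fromtimestamp(start_date, tz=timezone.utc).date())  # 1998-04-01
print(datetime.fromtimestamp(end_date, tz=timezone.utc).date())    # 1998-05-01
```

Passing a negative number of days moves the timestamp backwards, which mirrors the `-365` in the format line above.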
128 |
129 | == Converting date string to temporal type
130 |
131 | So far, we have worked with order dates as strings with a particular format.
132 | However, Neo4j supports date and time types, so it would probably make things much easier if we converted to the native types.
133 |
134 | There is an APOC procedure that converts a date string from one format into another, including the format that Neo4j's temporal types expect.
135 | Since Neo4j is compatible with the https://en.wikipedia.org/wiki/ISO_8601[ISO 8601^] standard, we will use that for our result format.
136 |
137 | Format: `apoc.date.convertFormat('2019-12-31 16:14:20', 'yyyy-MM-dd HH:mm:ss', 'iso_date_time')`
138 |
139 | The procedure contains 3 parameters -
140 |
141 | 1. the date string that needs to be converted
142 | 2. the current format of the date string
143 | 3. the format for the resulting temporal type (can be specified manually, as https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html[Java formats^], or as these https://www.elastic.co/guide/en/elasticsearch/reference/5.5/mapping-date-format.html#built-in-date-formats[built-in formats^])
144 |
145 | == apoc.date.convertFormat Example:
146 |
147 | [source, cypher]
148 | ----
149 | MATCH (o:Order)
150 | SET o.isoOrderDate = apoc.date.convertFormat(o.orderDate, 'yyyy-MM-dd HH:mm:ss.SSS', 'iso_date_time')
151 | RETURN o
152 | ----
153 |
154 | In the query above, we find all the orders in our system and set a new property called `isoOrderDate` that is equal to the converted `orderDate` string.
155 | The `orderDate` is converted using the procedure, specifying the string format it is currently in and the `iso_date_time` format (2019-01-01T00:00:00) we want to have as the result.
156 |
157 | The query should return a sample of the orders we updated (Browser limits how much JavaScript has to render).
158 | Clicking on one shows all the properties on that node, including the new `isoOrderDate` property that is formatted as we expected!
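
To preview the shape of the converted value without touching the graph, the same reformatting can be sketched in plain Python.
This is an illustration of the string-to-string conversion only; `to_iso_date_time` is a hypothetical helper, and the exact output of APOC may differ in details such as fractional seconds.

```python
from datetime import datetime


def to_iso_date_time(text):
    """Convert 'yyyy-MM-dd HH:mm:ss.SSS' into an ISO 8601 date-time string."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return moment.isoformat()


iso_order_date = to_iso_date_time("1997-12-16 00:00:00.000")
print(iso_order_date)  # 1997-12-16T00:00:00
```

Note that the result is still a string; only its layout has changed to the ISO 8601 form.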
159 |
160 | == Next Step
161 |
162 | In the next section, we are going to see how to use APOC to load JSON data into Neo4j.
163 |
164 | ifdef::env-guide[]
165 | pass:a[Load JSON Data]
166 | endif::[]
167 |
168 | ifdef::env-graphgist[]
169 | link:{gist}/03_load_json.adoc[Load JSON Data^]
170 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/apoc/05_periodic.adoc:
--------------------------------------------------------------------------------
1 | = Batch Data with APOC
2 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img
3 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc
4 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc
5 | :icons: font
6 | :neo4j-version: 3.5
7 |
8 | == Batch Data in Neo4j with APOC
9 |
10 | Sometimes, the updates that need to be made to data are operationally intensive and require more resources than can be allocated in a single transaction.
11 | APOC provides a few options for batching data to handle these larger demands.
12 |
13 | == Data set for this guide
14 |
15 | image::{img}/northwind_data_model.png[float=right]
16 |
17 | Just like in our previous sections on using APOC for refactoring or importing, we will use the Northwind retail system data to test the batching procedures in this guide.
18 |
19 | If you haven't loaded the data from earlier guides in this series (or if you want to start with clean data), you can run the code block below.
20 | The second statement will open the Northwind browser guide where you will need to execute each of the Cypher queries to load the data.
21 |
22 | [source,cypher]
23 | ----
24 | MATCH (n) DETACH DELETE n;
25 | :play northwind-graph;
26 | ----
27 |
28 | == Batching data with apoc.periodic.iterate
29 |
30 | For making updates to data in the graph, we may want to make the update across the entire graph or we may want to select a subset of data for updating.
31 | Either way, we could be dealing with vast amounts of data and may want to batch imports coming from files or other systems to load into our graph.
32 |
33 | The `apoc.periodic.iterate` procedure is one of the best ways to handle a variety of import and update scenarios in a batch manner.
34 | It uses a data-driven statement to select or read data, then uses an operation statement for specifying what we want to do with each batch.
35 |
36 | Format: `apoc.periodic.iterate('data-driven statement', 'operations statement', {config: ...})`
37 |
38 | The procedure has 3 parameters -
39 |
40 | 1. the data-driven statement for selecting/reading data into batches
41 | 2. the operations statement for updating/creating data in batches
42 | 3. any configurations
43 |
44 | == apoc.periodic.iterate Example:
45 |
46 | Let's start with an example that is narrow in scope and is based on the need that we might want to flag products that need to be reordered.
47 | Perhaps we want to send our stock associates messages or put these items on a weekly report.
48 |
49 | To do this, we can search for products where our stock level is equal to or less than our reorder level and add an extra label to those nodes for easy retrieval by various systems or people.
50 |
51 | [source,cypher]
52 | ----
53 | CALL apoc.periodic.iterate(
54 | 'MATCH (p:Product) WHERE p.unitsInStock <= p.reorderLevel RETURN p',
55 | 'SET p:Reorder',
56 | {batchSize: 100, batchMode: 'BATCH'}
57 | ) YIELD batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages
58 | RETURN batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages
59 | ----
60 |
61 | Our statement above calls the procedure and uses the first Cypher query to select all of the Products where our stock is less than or equal to the reorder level.
62 | Then, our second statement adds the `Reorder` label to those `Product` nodes.
63 | Next, we set some config for the batch size and the mode in which we want the batches to execute.
64 | Because our Northwind data set is small, our batch size is also very small (it's not uncommon to see batchSizes set at 10,000 or more on larger graphs).
65 |
66 | Finally, we retrieve some statistics about our procedure execution, so that we have insight if anything goes wrong and can verify all the batches were successful.
67 | Note that since we set our batch size to 100, and we only have 22 updates (22 Product nodes have stock less than/equal to reorder level), it completes in a single batch.
68 | If we had hundreds or thousands of products in our graph and had low stock on most of them, however, we would see more batches.
69 | We could also have added a `parallel: true` config, since these updates wouldn't conflict (no relationships involved).
70 | However, since our graph is very small and we don't have very many updates, we don't need to add this configuration on this statement.
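
The relationship between the number of updates and the number of batches can be sketched in a few lines of plain Python.
This only illustrates the chunking idea and is not how APOC is implemented; the helper names and the 25,000-row figure are hypothetical.

```python
import math


def split_into_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def batch_count(total, batch_size):
    """Number of batches needed to cover total rows."""
    return math.ceil(total / batch_size)


# 22 products to flag with a batch size of 100 fit into a single batch.
products = list(range(22))
batches = list(split_into_batches(products, 100))
print(len(batches))  # 1

# A hypothetical larger graph: 25,000 rows at 10,000 per batch need 3 batches.
print(batch_count(25000, 10000))  # 3
```

Each batch corresponds to one transaction, so a smaller batch size means more, lighter transactions.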
71 |
72 | == Verify results
73 |
74 | We can verify the update worked by running a query like the one below.
75 |
76 | [source,cypher]
77 | ----
78 | MATCH (p:Product)
79 | RETURN p LIMIT 25;
80 | ----
81 |
82 | == Another apoc.periodic.iterate Example:
83 |
84 | Suppose we want to track and maintain our order line item information as separate nodes, rather than as properties on a relationship.
85 | We may be querying those relationship properties more often than initially thought, and query performance may see a dip, since relationship properties are not as optimized as patterns.
86 |
87 | To do this, we can use `apoc.periodic.iterate` to select all of the `ORDERS` relationships in our graph and add a `LineItem` intermediary node with relationships.
88 |
89 | [source,cypher]
90 | ----
91 | CALL apoc.periodic.iterate(
92 | 'MATCH (o:Order)-[r:ORDERS]->(p:Product) RETURN r, o, p',
93 | 'MERGE (i:LineItem {id: o.orderID+p.productID})
94 | SET i.quantity = r.quantity, i.unitPrice = r.unitPrice, i.discount = r.discount
95 | MERGE (o)-[rel:HAS_ITEM]->(i)-[rel2:IS_FOR]->(p)
96 | DELETE r',
97 | {batchSize: 10000, batchMode: 'BATCH'}
98 | ) YIELD batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages
99 | RETURN batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages
100 | ----
101 |
102 | Our statement above calls the procedure and selects all of the Orders with an `ORDERS` relationship to Products in the first query.
103 | Then, our second statement takes those patterns and creates a new intermediary node (`LineItem`) with the line item properties (from the existing `ORDERS` relationships).
104 | The next merge statement connects the new line items to the related `Order` and `Product` nodes, and the last statement deletes the old `ORDERS` relationships, since we have the new pattern.
105 |
106 | Finally, we set some config for the batch size and the mode in which we want batches to execute.
107 | We retrieve some statistics about our procedure execution, so that we have insight if anything goes wrong and can verify all the batches were successful.
108 | Note that since we set our batch size to 10,000, and we only have 2,155 updates, it completes in a single batch.
109 | If our graph were much larger, however, we could very easily see more batches.
110 |
111 | == Verify results
112 |
113 | We can verify everything looks correct with the query below by selecting a specific customer and pulling all their orders with the new line items and related products.
114 |
115 | [source,cypher]
116 | ----
117 | MATCH (c:Customer {companyName: 'Hanari Carnes'})-[r:PURCHASED]-(o:Order)-[r2:HAS_ITEM]-(i:LineItem)-[r3:IS_FOR]-(p:Product)
118 | RETURN c, r, o, r2, i, r3, p LIMIT 25
119 | ----
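
As an additional sanity check, a sketch like the one below counts any `ORDERS` relationships that survived the refactoring. If every batch committed, it should return 0.

[source,cypher]
----
// count() over an empty match still returns a single row with 0
MATCH (:Order)-[r:ORDERS]->(:Product)
RETURN count(r) AS remainingOrders
----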
120 |
121 | == Next Steps
122 |
123 | You are well on your way to mastering the APOC library and improving your interaction with graph data in Neo4j!
124 | Feel free to check out many of our other APOC resources for continuing your learning and discovering the many more useful procedures and functions available.
125 |
126 | * https://neo4j.com/docs/labs/apoc/current/[Reference the APOC documentation^]
127 | * https://www.youtube.com/playlist?list=PL9Hl4pk2FsvXEww23lDX_owoKoqqBQpdq[Video series: see how to use APOC procedures^]
128 | * https://community.neo4j.com/c/neo4j-graph-platform/procedures-apoc/77[Ask questions: join our Neo4j Community Site to get APOC help^]
129 | * https://neo4j.com/labs/apoc/[Learn more about APOC and contributing^]
130 |
--------------------------------------------------------------------------------
/browser-guides/apoc/apoc.adoc:
--------------------------------------------------------------------------------
1 | = Awesome Procedures on Cypher (APOC)
2 | :author: Jennifer Reif
3 | :description: Learn to use some of the most popular procedures in the APOC library and explore the capabilities the library can provide
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/apoc/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/apoc
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/apoc
7 | :tags: apoc, procedures, temporal, load-json, data-import, refactor, batching, periodic
8 | :neo4j-version: 3.5
9 |
10 | == Welcome to APOC
11 |
12 | The APOC library is a set of standard user-defined procedures to extend Cypher in Neo4j.
13 | User-defined procedures are custom implementations of certain functionality that cannot be easily expressed in Cypher.
14 | They are implemented in Java, so they are deployable to a Neo4j instance and callable directly from Cypher.
15 |
16 | image::{img}/apoc-neo4j-user-defined-procedures.jpg[float=right]
17 |
18 | The APOC library consists of over 450 procedures to help with many different tasks in areas like data integration, data conversion, and much more.
19 |
20 | ifdef::env-guide[]
21 | . pass:a[Intro to APOC]
22 | . pass:a[Date & Time Conversion]
23 | . pass:a[Load JSON Data]
24 | . pass:a[Refactor Data]
25 | . pass:a[Batching & Background Operations]
26 | endif::[]
27 |
28 | ifdef::env-graphgist[]
29 | . link:{gist}/01_apoc_intro.adoc[Intro to APOC^]
30 | . link:{gist}/02_datetime.adoc[Date & Time Conversion^]
31 | . link:{gist}/03_load_json.adoc[Load JSON Data^]
32 | . link:{gist}/04_refactor_data.adoc[Refactor Data^]
33 | . link:{gist}/05_periodic.adoc[Batching & Background Operations^]
34 | endif::[]
35 |
36 | == Resources
37 |
38 | * https://neo4j.com/docs/labs/apoc/current/[APOC Documentation^]
39 | * https://github.com/neo4j-contrib/neo4j-apoc-procedures[GitHub source code repository^]
40 | * https://neo4j.com/docs/java-reference/current/extending-neo4j/procedures-and-functions/functions/[Neo4j Docs: User-Defined Procedures^]
41 |
--------------------------------------------------------------------------------
/browser-guides/bebe/bebe_en.adoc:
--------------------------------------------------------------------------------
1 | = Introduction to Graphs and Data
2 | :author: Michael Hunger
3 | :description: Introduce graphs and Cypher to young students with hands-on queries and exploration
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/bebe/img
5 | :tags: browser-guide, intro, cypher, students
6 | :neo4j-version: 3.5
7 |
8 | == Welcome to Neo4j!
9 |
10 | image::{img}/cypher_create.jpg[float=right,width=400]
11 |
12 | Neo4j is a database, a storage for *things* and their *relationships*.
13 |
14 | It is operated with a language called _Cypher_.
15 |
16 | With it, you can store things, but also find them again.
17 |
18 | Let's try that now. Continue with the arrow to the right.
19 |
20 | == Save things
21 |
22 | We can create ourselves:
23 |
24 | [source,cypher]
25 | ----
26 | MERGE (me:Person {name: 'Jennifer'})
27 | RETURN me
28 | ----
29 |
30 | And then we can find ourselves, too:
31 |
32 | [source,cypher]
33 | ----
34 | MATCH (p:Person {name: 'Jennifer'})
35 | RETURN p
36 | ----
37 |
38 | We show things as circles: `()` or `(:Person {name: 'Jennifer'})`
39 |
40 | Can you find your neighbors? Give it a try!
41 |
42 | We can also find all the people:
43 |
44 | [source,cypher]
45 | ----
46 | MATCH (p:Person)
47 | RETURN p
48 | ----
49 |
50 | == Change things
51 |
52 | We can also store more than the name, like birthday or favorite color.
53 |
54 | We can find each other and then add new information.
55 |
56 | [source,cypher]
57 | ----
58 | MATCH (p:Person {name: 'Jennifer'})
59 | SET p.birthday = 'May'
60 | SET p.color = 'green'
61 | RETURN p
62 | ----
63 |
64 | Now we can see who likes the color `green`.
65 |
66 | [source,cypher]
67 | ----
68 | MATCH (p:Person)
69 | WHERE p.color = 'green'
70 | RETURN p
71 | ----
72 |
73 | What if we wanted to find out who doesn't like the color green? Or who has a birthday in `July`?
74 |
75 | == Connect things
76 |
77 | For this, we need two things (a pair).
78 |
79 | Find *you* and *your* neighbor to your right.
80 |
81 | [source,cypher]
82 | ----
83 | MATCH (a:Person {name: 'Jennifer'})
84 | MATCH (b:Person {name: 'Diego'})
85 | RETURN a,b
86 | ----
87 |
88 | Relationships are arrows like `+-->+` or `+-[:KNOWS]->+`.
89 |
90 | Now we can connect the neighbors.
91 |
92 | [source,cypher]
93 | ----
94 | MATCH (a:Person {name: 'Jennifer'})
95 | MATCH (b:Person {name: 'Diego'})
96 | MERGE (a)-[k:KNOWS]->(b)
97 | RETURN *
98 | ----
99 |
100 | How long is our chain? Could we find all the groups of neighbors?
101 |
102 | [source,cypher]
103 | ----
104 | MATCH (a)-[k:KNOWS]->(b)
105 | RETURN *
106 | ----
107 |
108 | == What can you save?
109 |
110 | Answer: ANYTHING!
111 |
112 | * Hobbies, friends, family
113 | * People, movies, songs, books, comics
114 | * Countries, cities, streets
115 | * Schools, classes, dates and times
116 | * Stars, planets, animals, plants
117 |
118 | Or whatever you feel like and what you are interested in.
119 |
120 | Let's have a look at two things:
121 |
122 | * pass:a[ movies]
123 | * pass:a[helper]
124 |
125 | //Translated with www.DeepL.com/Translator (free version)
--------------------------------------------------------------------------------
/browser-guides/data/person.json:
--------------------------------------------------------------------------------
1 | {"name":"Michael",
2 | "age": 41,
3 | "children": ["Selina","Rana","Selma"]
4 | }
5 |
--------------------------------------------------------------------------------
/browser-guides/data_science/02_analysis_algo.adoc:
--------------------------------------------------------------------------------
1 | = Data Exploration
2 | :author: Neo4j Engineering
3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science
7 | :tags: data-science, gds, graph-algorithms
8 | :neo4j-version: 3.5
9 |
10 | == Data visualization
11 |
12 | Let's briefly explore the dataset before running some algorithms.
13 |
14 | Run the following query to visualize the schema of your graph:
15 |
16 | [source,cypher]
17 | ----
18 | CALL db.schema.visualization()
19 | ----
20 |
21 | The `:Dead`, `:King`, and `:Knight` labels all appear on `:Person` nodes.
22 | You may find it useful to remove them from the visualization to make it easier to inspect.
23 |
24 | == Summary statistics
25 |
26 | Calculate some simple statistics to see how data is distributed.
27 | For example, find the minimum, maximum, average, and standard deviation of the number of interactions per character:
28 |
29 | [source,cypher]
30 | ----
31 | MATCH (c:Person)-[:INTERACTS]->()
32 | WITH c, count(*) AS num
33 | RETURN min(num) AS min, max(num) AS max, avg(num) AS avg_interactions, stdev(num) AS stdev
34 | ----
35 |
36 | Calculate the same grouped by book:
37 |
38 | [source,cypher]
39 | ----
40 | MATCH (c:Person)-[r:INTERACTS]->()
41 | WITH r.book AS book, c, count(*) AS num
42 | RETURN book, min(num) AS min, max(num) AS max, avg(num) AS avg_interactions, stdev(num) AS stdev
43 | ORDER BY book
44 | ----
45 |
46 | == Getting started with algorithms
47 |
48 | With Neo4j, you can run algorithms on explicitly and implicitly created graphs. In this tutorial, we will show you how to get the most out of the following algorithms:
49 |
50 | * Page Rank
51 | * Label Propagation
52 | * Weakly Connected Components (WCC)
53 | * Louvain
54 | * Node Similarity
55 | * Triangle Count
56 | * Local Clustering Coefficient
57 |
58 | == Algorithm syntax
59 |
60 | There are two ways to run algorithms on your graph: explicit and implicit. The explicit way creates a subgraph (a projected graph) that is stored in memory, so you can run multiple algorithms on it without re-creating the subgraph each time. For this guide, we will focus on the implicit way, which runs on the whole dataset or lets you create the subgraph ad hoc.
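
For contrast, the explicit variant might be sketched as below. This assumes the GDS 1.x catalog procedures `gds.graph.create` and `gds.graph.drop`; the graph name `got-interactions` is a hypothetical name chosen for illustration. Run the three statements one at a time unless multi-statement mode is enabled in Neo4j Browser.

[source,cypher]
----
// 1. Project the graph once and store it in memory under a name
CALL gds.graph.create(
  'got-interactions',
  'Person',
  {INTERACTS: {orientation: 'UNDIRECTED'}}
) YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;

// 2. Run one or more algorithms against the named graph
CALL gds.pageRank.stream('got-interactions')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 5;

// 3. Release the memory when you are finished
CALL gds.graph.drop('got-interactions') YIELD graphName
RETURN graphName;
----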
61 |
62 | == Algorithm syntax: implicit graphs
63 |
64 | The implicit variant does not access the graph catalog.
65 | If you want to run an algorithm on such a graph, you configure the graph creation within the algorithm configuration map.
66 |
67 | [source]
68 | ----
69 | CALL gds.<algo>.<mode>(
70 | configuration: Map
71 | )
72 | ----
73 |
74 | * `<algo>` is the algorithm name.
75 | * `<mode>` is the algorithm execution mode.
76 | The supported modes are:
77 | ** `stream`: streams results back to the user.
78 | ** `stats`: returns a summary of the results.
79 | ** `write`: returns stats, as well as writes results to the Neo4j database.
80 | * The `configuration` parameter value is the algorithm-specific configuration.
81 |
82 | After the algorithm execution finishes, the graph is released from memory.
83 |
84 | == Next Steps
85 |
86 | Next, we will dive into using the first algorithm on our dataset: Page Rank.
87 |
88 | ifdef::env-guide[]
89 | pass:a[Centrality: Page Rank]
90 | endif::[]
91 | ifdef::env-graphgist[]
92 | link:{gist}/03_pagerank.adoc[Centrality: Page Rank^]
93 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/data_science/03_pagerank.adoc:
--------------------------------------------------------------------------------
1 | = Page Rank
2 | :author: Neo4j Engineering
3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science
7 | :tags: data-science, gds, graph-algorithms, pagerank, centrality
8 | :neo4j-version: 3.5
9 |
10 | == Page Rank
11 |
12 | image::{img}/PageRanks-Example.png[float="right", width="300"]
13 |
14 | Page Rank is an algorithm that measures the transitive influence and connectivity of nodes to find the most *influential* nodes in a graph. It computes an influence value for each node, called a _score_. As a result, the score of a node is a certain weighted average of the scores of its direct neighbors.
15 |
16 | *How Page Rank works*
17 |
18 | PageRank is an _iterative_ algorithm.
19 | In each iteration, every node propagates its score evenly divided to its neighbors. The algorithm runs for a configurable maximum number of iterations (default is 20), or until the node scores converge. That occurs when the maximum change in node score between two sequential iterations is smaller than the configured `tolerance` value.
20 |
21 | In the following chapters, you will see how Page Rank identifies the most important nodes.
22 |
23 | == Page Rank: stream mode
24 |
25 | Let's find out who is influential in the graph by running Page Rank.
26 | First, we run a basic Page Rank call in `stream` mode.
27 |
28 | [source, cypher]
29 | ----
30 | CALL gds.pageRank.stream({
31 | nodeProjection: 'Person',
32 | relationshipProjection: {
33 | INTERACTS: {
34 | orientation: 'UNDIRECTED'
35 | }
36 | }
37 | }) YIELD nodeId, score
38 | RETURN gds.util.asNode(nodeId).name AS name, score
39 | ORDER BY score DESC LIMIT 10
40 | ----
41 |
42 | Then, you compare the Page Rank of each `Person` node with the number of interactions for that node.
43 |
44 | [source,cypher]
45 | ----
46 | CALL gds.pageRank.stream({
47 | nodeProjection: 'Person',
48 | relationshipProjection: {
49 | INTERACTS: {
50 | orientation: 'UNDIRECTED'
51 | }
52 | }
53 | }) YIELD nodeId, score AS pageRank
54 | WITH gds.util.asNode(nodeId) AS n, pageRank
55 | MATCH (n)-[i:INTERACTS]-()
56 | RETURN n.name AS name, pageRank, count(i) AS interactions
57 | ORDER BY pageRank DESC LIMIT 10
58 | ----
59 |
60 | The result shows that the most talkative characters do not always have the highest rank.
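
The stopping criteria described in the introduction can be set in the same config map. The following is a sketch, assuming the `maxIterations` and `tolerance` configuration keys of the GDS Page Rank procedure; the values are illustrative rather than recommended.

[source,cypher]
----
CALL gds.pageRank.stream({
  nodeProjection: 'Person',
  relationshipProjection: {
    INTERACTS: {
      orientation: 'UNDIRECTED'
    }
  },
  maxIterations: 50,   // upper bound on iterations (the default is 20)
  tolerance: 0.0001    // stop early once scores change by less than this
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10
----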
61 |
62 | == Page Rank: write mode
63 |
64 | Now that we have the results from our Page Rank query, we can write them back to Neo4j and use them for further queries. Specify the name of the property to which the algorithm will write using the `writeProperty` key in the config map passed to the procedure.
65 |
66 | [source,cypher]
67 | ----
68 | CALL gds.pageRank.write({
69 | nodeProjection: 'Person',
70 | relationshipProjection: {
71 | INTERACTS: {
72 | orientation: 'UNDIRECTED'
73 | }
74 | },
75 | writeProperty: 'pageRank'})
76 | ----
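
Once the property is written, it behaves like any other node property. As a minimal sketch, the query below reads the stored scores back with plain Cypher, assuming the `pageRank` property name used above.

[source,cypher]
----
// Read the scores written by gds.pageRank.write
MATCH (p:Person)
WHERE p.pageRank IS NOT NULL
RETURN p.name AS name, p.pageRank AS pageRank
ORDER BY pageRank DESC
LIMIT 10
----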
77 |
78 | == Page Rank: rank per book
79 |
80 | Along with the generic `INTERACTS` relationships, you also have `INTERACTS_1`, `INTERACTS_2`, etc. for the different books.
81 | Let's compute and write the Page Rank scores for the first book.
82 |
83 | [source, cypher]
84 | ----
85 | CALL gds.pageRank.write({
86 | nodeProjection: 'Person',
87 | relationshipProjection: {
88 | INTERACTS_1: {
89 | orientation: 'UNDIRECTED'
90 | }
91 | },
92 | writeProperty: 'pageRank1'
93 | })
94 | ----
95 |
96 | == Page Rank: exercise
97 |
98 | Let's see what you have learned so far.
99 |
100 | Try to calculate the Page Rank of the other books in the series and store the results in the database to measure and analyze influence.
101 |
102 | * Write queries that call `gds.pageRank.write` for the `INTERACTS_2`, `INTERACTS_3`, `INTERACTS_4`, and `INTERACTS_5` relationship types. (*Hint:* take a look at the previous query as a model)
103 |
104 | == Page Rank: answer questions
105 |
106 | Now, try to write queries to answer the following questions:
107 |
108 | * Which character has the biggest increase in influence from book 1 to 5?
109 | * Which character has the biggest decrease?
110 |
111 | *Note:* Answers are on the next slide.
112 |
113 | == Page Rank: exercise answer
114 |
115 | .Biggest increase
116 | [source, cypher]
117 | ----
118 | MATCH (p:Person)
119 | RETURN p.name, p.pageRank1, p.pageRank5, p.pageRank5 - p.pageRank1 AS difference
120 | ORDER BY difference DESC
121 | LIMIT 10
122 | ----
123 |
124 | .Biggest decrease
125 | [source, cypher]
126 | ----
127 | MATCH (p:Person)
128 | RETURN p.name, p.pageRank1, p.pageRank5, p.pageRank5 - p.pageRank1 AS difference
129 | ORDER BY difference
130 | LIMIT 10
131 | ----
132 |
133 | == Next Steps
134 |
135 | The next guide will look at the label propagation algorithm to find groups of people in communities.
136 |
137 | ifdef::env-guide[]
138 | pass:a[Communities: Label Propagation]
139 | endif::[]
140 | ifdef::env-graphgist[]
141 | link:{gist}/04_label_propagation.adoc[Communities: Label Propagation^]
142 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/data_science/04_label_propagation.adoc:
--------------------------------------------------------------------------------
1 | = Label Propagation
2 | :author: Neo4j Engineering
3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science
7 | :tags: data-science, gds, graph-algorithms, label-propagation, community
8 | :neo4j-version: 3.5
9 |
10 | == Label Propagation
11 |
12 | image::{img}/label-propagation-graph-algorithm-1.png[float="right",width=300]
13 |
14 | Label Propagation (LPA) is a fast algorithm for finding communities in a graph. It propagates labels throughout the graph and forms communities of nodes based on their influence.
15 |
16 | **How Label Propagation works**
17 |
18 | LPA is an _iterative_ algorithm.
19 | First, it assigns a unique community label to each node. In each iteration, the algorithm changes each node's label to the most common one among its neighbors. Densely connected nodes quickly broadcast their labels across the graph.
20 | At the end of the propagation, only a few labels remain. Nodes that have the same community label at convergence are considered to be in the same community. The algorithm runs for a configurable maximum number of iterations, or until it converges.
21 |
22 | For more details, see _https://neo4j.com/docs/graph-data-science/current/algorithms/label-propagation/[the documentation^]_.
23 |
24 | == Label Propagation: example
25 |
26 | Let's run label propagation to find the five largest communities of people interacting with each other. The weight property on the relationship represents the number of interactions between two people. In LPA, the weight is used to determine the influence of neighboring nodes when voting on community assignment.
27 |
28 | Let's now run LPA with just one iteration:
29 |
30 | [source, cypher]
31 | ----
32 | CALL gds.labelPropagation.stream({
33 | nodeProjection: 'Person',
34 | relationshipProjection: {
35 | INTERACTS: {
36 | orientation: 'UNDIRECTED',
37 | properties: 'weight'
38 | }
39 | },
40 | relationshipWeightProperty: 'weight',
41 | maxIterations: 1
42 | }) YIELD nodeId, communityId
43 | RETURN communityId, count(nodeId) AS size
44 | ORDER BY size DESC
45 | LIMIT 5
46 | ----
47 |
48 | You can see that the nodes are assigned to initial communities. However, the algorithm needs multiple iterations to achieve a stable result.
49 | So, let's run the same procedure with two iterations and see how the results change.
50 |
51 | [source, cypher]
52 | ----
53 | CALL gds.labelPropagation.stream({
54 | nodeProjection: 'Person',
55 | relationshipProjection: {
56 | INTERACTS: {
57 | orientation: 'UNDIRECTED',
58 | properties: 'weight'
59 | }
60 | },
61 | relationshipWeightProperty: 'weight',
62 | maxIterations: 2
63 | }) YIELD nodeId, communityId
64 | RETURN communityId, count(nodeId) AS size
65 | ORDER BY size DESC
66 | LIMIT 5
67 | ----
68 |
69 | Usually, label propagation requires more than a few iterations to converge on a stable result.
70 | The number of the required iterations depends on the graph structure -- you should experiment.
71 | When you don't see the numbers in each community changing (or changing very minimally), then you have probably arrived at a good number of iterations.
72 |
73 | == Label Propagation: seeding
74 |
75 | Label Propagation can be seeded with an initial community label from a pre-existing node property. This allows you to compute communities incrementally. Let's write the results after the first iteration back to the source graph, under the write property name `community`.
76 |
77 | [source, cypher]
78 | ----
79 | CALL gds.labelPropagation.write({
80 | nodeProjection: 'Person',
81 | relationshipProjection: {
82 | INTERACTS: {
83 | orientation: 'UNDIRECTED',
84 | properties: 'weight'
85 | }
86 | },
87 | relationshipWeightProperty: 'weight',
88 | maxIterations: 1,
89 | writeProperty: 'community'
90 | })
91 | ----
92 |
93 | You can now use the `community` property as a seed property for the second iteration.
94 | The results should be the same as the previous run with two iterations. Seeding is particularly useful when the source graph grows and you want to compute communities incrementally without starting again from scratch.
95 |
96 | Now, you can use the `seedProperty` configuration key to specify the property from which you want to seed community IDs.
97 |
98 | [source, cypher]
99 | ----
100 | CALL gds.labelPropagation.stream({
101 | nodeProjection: {
102 | Person: {
103 | properties: 'community'
104 | }
105 | },
106 | relationshipProjection: {
107 | INTERACTS: {
108 | orientation: 'UNDIRECTED',
109 | properties: 'weight'
110 | }
111 | },
112 | relationshipWeightProperty: 'weight',
113 | maxIterations: 1,
114 | seedProperty: 'community'
115 | }) YIELD nodeId, communityId
116 | RETURN communityId, count(nodeId) AS size
117 | ORDER BY size DESC
118 | LIMIT 5
119 | ----
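
To see who ended up together, the `community` property written earlier can be queried directly. The sketch below lists the largest written communities with a few sample members; it assumes `Person` nodes have a `name` property, as in the Page Rank guide, and the sample size of five is arbitrary.

[source,cypher]
----
MATCH (p:Person)
WHERE p.community IS NOT NULL
// Group people by the community id stored on the node
WITH p.community AS community, collect(p.name) AS members
RETURN community, size(members) AS size, members[..5] AS sampleMembers
ORDER BY size DESC
LIMIT 5
----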
120 |
121 | == Label Propagation: exercise
122 |
123 | Now that you understand the basics of LPA, let's experiment a little.
124 |
125 | 1. How many iterations does it take for LPA to converge on a stable number of communities? How many communities do you end up with?
126 |
127 | 2. What happens when you run LPA for 1,000 maxIterations? (_hint: try using YIELD ranIterations_)
128 |
129 | 3. What happens if you run LPA without weights? Do you find the same communities?
130 |
131 | *Bonus task*: What if you use house affiliations as seeds for communities? How would you use Cypher to create the initial seeds? Run the algorithm with the new seeds. Do you find a different set of communities?
132 |
133 | == Label Propagation: exercise answers
134 |
135 | 1. The results stabilize after 5 iterations; increasing `maxIterations` beyond 5 does not seem to change them.
136 |
137 | 2. It only actually runs 6 iterations (5 to stabilize and a 6th to verify that the communities have stabilized).
138 |
139 | [source,cypher]
140 | ----
141 | CALL gds.labelPropagation.stats({
142 | nodeProjection: 'Person',
143 | relationshipProjection: {
144 | INTERACTS: {
145 | orientation: 'UNDIRECTED',
146 | properties: 'weight'
147 | }
148 | },
149 | relationshipWeightProperty: 'weight',
150 | maxIterations: 1000
151 | }) YIELD ranIterations
152 | ----
153 |
154 | The above query uses the stats mode (stream does not output _ranIterations_) and outputs the ranIterations statistic.
155 |
156 | == Label Propagation: exercise answers
157 |
158 | 3. It does change the results. The communities are larger.
159 |
160 | [source,cypher]
161 | ----
162 | CALL gds.labelPropagation.stream({
163 | nodeProjection: {
164 | Person: {
165 | properties: 'community'
166 | }
167 | },
168 | relationshipProjection: {
169 | INTERACTS: {
170 | orientation: 'UNDIRECTED'
171 | }
172 | },
173 | maxIterations: 5
174 | }) YIELD nodeId, communityId
175 | RETURN communityId, count(nodeId) AS size
176 | ORDER BY size DESC
177 | LIMIT 5
178 | ----
179 |
180 | == Label Propagation: exercise answers
181 |
182 | *Bonus task*: First, we need to run the algorithm to create the seed communities for houses. The node query pulls both `Person` and `House` nodes into the graph on which we run label propagation. The relationship query must both start and end on `Person` nodes, because the algorithms currently only support monopartite graphs.
183 |
184 | [source,cypher]
185 | ----
186 | CALL gds.labelPropagation.write({
187 | nodeQuery: 'MATCH (n) WHERE n:Person OR n:House RETURN id(n) as id',
188 | relationshipQuery: 'MATCH (p1:Person)-[:BELONGS_TO]->(:House)<-[:BELONGS_TO]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target',
189 | writeProperty: 'houseCommunity'
190 | })
191 | ----
192 |
193 | Now that we have seeded the communities, we can run the label propagation algorithm on those communities.
194 |
195 | [source,cypher]
196 | ----
197 | CALL gds.labelPropagation.stream({
198 | nodeQuery: 'MATCH (n) WHERE n:Person OR n:House RETURN id(n) AS id, n.houseCommunity AS houseCommunity',
199 | relationshipQuery: 'MATCH (p1:Person)-[:BELONGS_TO]->(:House)<-[:BELONGS_TO]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target',
200 | maxIterations: 2,
201 | seedProperty: 'houseCommunity'
202 | }) YIELD nodeId, communityId
203 | RETURN communityId, count(nodeId) AS size
204 | ORDER BY size DESC
205 | LIMIT 5
206 | ----
207 |
208 | == Next Steps
209 |
210 | The next guide stays with community detection algorithms and looks at Louvain.
211 |
212 | ifdef::env-guide[]
213 | pass:a[Communities: Louvain]
214 | endif::[]
215 | ifdef::env-graphgist[]
216 | link:{gist}/05_louvain.adoc[Communities: Louvain^]
217 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/data_science/05_louvain.adoc:
--------------------------------------------------------------------------------
1 | = Louvain
2 | :author: Neo4j Engineering
3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science
7 | :tags: data-science, gds, graph-algorithms, louvain, community
8 | :neo4j-version: 3.5
9 |
10 | == Louvain
11 |
12 | image::{img}/louvain.jpg[float="right",width=300]
13 |
14 | The Louvain algorithm, like Label Propagation, is a community detection algorithm that identifies clusters of nodes in a graph.
15 | It calculates how densely connected the nodes within a community are.
16 | Louvain also reveals a hierarchy of communities at different scales, which enables you to zoom in on different levels of granularity and find sub-communities within other sub-communities.
17 |
18 | *How Louvain works*
19 |
20 | Louvain is a _greedy_, _hierarchical clustering_ algorithm, meaning that it repeats the following two steps, each time trying to improve the modularity of the grouping:
21 |
22 | . Assign the nodes to communities, favoring local grouping.
23 | . Aggregate the nodes from the same community to form a single node, which inherits all connected relationships.
24 |
25 | These two steps are repeated until no further reassignments of communities are possible.
26 | You can get different results between different runs of the Louvain algorithm because the nodes can be reassigned to groups randomly.
27 |
28 | *What to consider*
29 |
30 | Louvain is significantly slower than Label Propagation, and the results can be hard to interpret.
31 |
32 | The algorithm can also use weights to calculate the communities.
33 | A good sign that you need to tweak your schema or weighting is when you notice that the results include only a _single_ giant community, or every node is a community on its own.
34 |
35 | == Louvain: examples
36 |
37 | Let's compute the Louvain community structure of our person interactions.
38 |
39 | [source, cypher]
40 | ----
41 | CALL gds.louvain.stream({
42 | nodeProjection: 'Person',
43 | relationshipProjection: {
44 | INTERACTS: {
45 | orientation: 'UNDIRECTED'
46 | }
47 | }
48 | })
49 | YIELD nodeId, communityId
50 | RETURN gds.util.asNode(nodeId).name AS person, communityId
51 | ORDER BY communityId DESC
52 | ----
53 |
54 | The query returns the name of each person and the id of the community to which they belong.
55 | If you want to investigate how many communities there are, and the number of members of each community, you can change the `RETURN` clause.
56 |
57 | [source, cypher]
58 | ----
59 | CALL gds.louvain.stream({
60 | nodeProjection: 'Person',
61 | relationshipProjection: {
62 | INTERACTS: {
63 | orientation: 'UNDIRECTED'
64 | }
65 | }
66 | })
67 | YIELD nodeId, communityId
68 | RETURN communityId, COUNT(DISTINCT nodeId) AS members
69 | ORDER BY members DESC
70 | ----
71 |
72 | The result is 1382 communities, 11 of which have more than one member.
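
To look inside the larger communities rather than only counting them, the member names can be collected per community. This is a sketch built on the same stream call; the cut-off of ten sample names is arbitrary.

[source,cypher]
----
CALL gds.louvain.stream({
  nodeProjection: 'Person',
  relationshipProjection: {
    INTERACTS: {
      orientation: 'UNDIRECTED'
    }
  }
})
YIELD nodeId, communityId
WITH communityId, collect(gds.util.asNode(nodeId).name) AS names
WHERE size(names) > 1   // skip single-member communities
RETURN communityId, size(names) AS members, names[..10] AS sampleNames
ORDER BY members DESC
----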
73 |
74 | == Louvain: weighting
75 |
76 | Now let's run the Louvain algorithm on a weighted graph.
77 | This way, it considers the relationship weights when calculating the modularity.
78 |
79 | We will need to use the `weight` property on the INTERACTS relationship to evaluate communities with weights:
80 |
81 | [source,cypher]
82 | ----
83 | CALL gds.louvain.stream({
84 | nodeProjection: 'Person',
85 | relationshipProjection: {
86 | INTERACTS: {
87 | orientation: 'UNDIRECTED',
88 | aggregation: 'NONE',
89 | properties: {
90 | weight: {
91 | property: 'weight',
92 | aggregation: 'NONE',
93 | defaultValue: 0.0
94 | }
95 | }
96 | }
97 | },
98 | relationshipWeightProperty: 'weight'
99 | })
100 | YIELD nodeId, communityId
101 | RETURN communityId, COUNT(DISTINCT nodeId) AS members
102 | ORDER BY members DESC
103 | ----
104 |
105 | The result is 1384 communities, 13 of which have more than one member.
106 |
107 | == Louvain: intermediate communities
108 |
109 | Now let's try to identify communities at multiple levels in the graph: first small communities, and then combine those smaller groups into larger ones.
110 |
111 | To retrieve the intermediate communities, set `includeIntermediateCommunities` to `true`:
112 |
113 | [source,cypher]
114 | ----
115 | CALL gds.louvain.stream({
116 | nodeProjection: 'Person',
117 | relationshipProjection: {
118 | INTERACTS: {
119 | orientation: 'UNDIRECTED',
120 | aggregation: 'NONE',
121 | properties: {
122 | weight: {
123 | property: 'weight',
124 | aggregation: 'NONE',
125 | defaultValue: 0.0
126 | }
127 | }
128 | }
129 | },
130 | includeIntermediateCommunities: true
131 | })
132 | YIELD nodeId, communityId, intermediateCommunityIds
133 | RETURN communityId, COUNT(DISTINCT nodeId) AS members, intermediateCommunityIds
134 | ----
135 |
136 | You can extract membership in different levels of communities and see how the composition changes:
137 |
138 | [source,cypher]
139 | ----
140 | CALL gds.louvain.stream({
141 | nodeProjection: 'Person',
142 | relationshipProjection: {
143 | INTERACTS: {
144 | orientation: 'UNDIRECTED',
145 | aggregation: 'NONE',
146 | properties: {
147 | weight: {
148 | property: 'weight',
149 | aggregation: 'NONE',
150 | defaultValue: 0.0
151 | }
152 | }
153 | }
154 | },
155 | includeIntermediateCommunities: true
156 | })
157 | YIELD nodeId, intermediateCommunityIds
158 | RETURN count(distinct intermediateCommunityIds[0]), count(distinct intermediateCommunityIds[1])
159 | ----
160 |
161 | `includeIntermediateCommunities` defaults to `false`, in which case the `intermediateCommunityIds` field of the result is `null`.
162 |
163 | == Next Steps
164 |
165 | In the next guide, we will go back to centrality algorithms with a look at betweenness centrality.
166 |
167 | ifdef::env-guide[]
168 | pass:a[Centralities: Betweenness]
169 | endif::[]
170 | ifdef::env-graphgist[]
171 | link:{gist}/06_betweenness.adoc[Centralities: Betweenness^]
172 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/data_science/06_betweenness.adoc:
--------------------------------------------------------------------------------
1 | = Betweenness Centrality
2 | :author: Neo4j Engineering
3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science
7 | :tags: data-science, gds, graph-algorithms, betweenness, centrality
8 | :neo4j-version: 3.5
9 |
10 | == Betweenness Centrality
11 |
12 | image::{img}/Graph_betweenness.jpg[float="right", width="300"]
13 |
14 | *How Betweenness Centrality works*
15 |
16 | The algorithm calculates unweighted shortest paths between all pairs of nodes in the graph.
17 | Each node receives a score based on the number of shortest paths that pass through the node.
18 | Nodes that lie on more shortest paths between other nodes will have higher betweenness centrality scores.
19 |
20 | == Betweenness Centrality: stream mode
21 |
22 | Let's find out who is influential in the graph by running Betweenness Centrality.
23 |
24 | First, you run the Betweenness Centrality algorithm in `stream` mode.
25 |
26 | [source, cypher]
27 | ----
28 | CALL gds.betweenness.stream({
29 | nodeProjection: 'Person',
30 | relationshipProjection: {
31 | INTERACTS: {
32 | orientation: 'UNDIRECTED'
33 | }
34 | }
35 | }) YIELD nodeId, score
36 | RETURN gds.util.asNode(nodeId).name AS name, score
37 | ORDER BY score DESC LIMIT 10
38 | ----
39 |
40 | If you ran Page Rank previously, you may notice that the result is similar.
41 | You can run the Page Rank query again and compare the result.
42 |
43 | [source, cypher]
44 | ----
45 | CALL gds.pageRank.stream({
46 | nodeProjection: 'Person',
47 | relationshipProjection: {
48 | INTERACTS: {
49 | orientation: 'UNDIRECTED'
50 | }
51 | }
52 | }) YIELD nodeId, score
53 | RETURN gds.util.asNode(nodeId).name AS name, score
54 | ORDER BY score DESC LIMIT 10
55 | ----
56 |
57 | The result is similar, but not identical.
58 | In general, betweenness centrality is a good metric for identifying bottlenecks and bridges in a graph, while PageRank is used to understand the influence of a node in a network.
59 |
60 | == Betweenness Centrality: stats, write and mutate
61 |
62 | In stats mode, betweenness centrality will return the minimum, maximum, and sum of the centrality scores.
63 |
64 | [source, cypher]
65 | ----
66 | CALL gds.betweenness.stats({
67 | nodeProjection: 'Person',
68 | relationshipProjection: {
69 | INTERACTS: {
70 | orientation: 'UNDIRECTED'
71 | }
72 | }
73 | })
74 | YIELD minimumScore, maximumScore, scoreSum
75 | ----
76 |
77 | The same is returned by the write and mutate modes as well, in addition to writing results back to Neo4j (write mode) or mutating the in-memory graph (mutate mode).
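
As a sketch of write mode, the following call stores each score on the `Person` nodes. The property name `betweenness` is an assumption chosen for this example, and the exact set of yielded fields may differ between GDS versions, so check the reference documentation for the version you have installed:

[source,cypher]
----
CALL gds.betweenness.write({
  nodeProjection: 'Person',
  relationshipProjection: {
    INTERACTS: {
      orientation: 'UNDIRECTED'
    }
  },
  // Assumed property name; pick any name that is not already in use
  writeProperty: 'betweenness'
})
YIELD nodePropertiesWritten, minimumScore, maximumScore, scoreSum
----

Afterwards the scores could be read back with plain Cypher, for example `MATCH (p:Person) RETURN p.name, p.betweenness ORDER BY p.betweenness DESC LIMIT 10`.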
78 |
79 | == Next Steps
80 |
81 | Congratulations! We have learned and practiced some of the key algorithms for studying influence (centrality) and communities (community detection).
82 | For additional learning, see the full and expanded https://localhost:7474/browser?cmd=play&arg=graph-data-science[guide] for the graph data science library.
83 | https://neo4j.com/docs/graph-data-science/current/[Reference documentation] for the Neo4j graph data science library is also available for detailed information.
--------------------------------------------------------------------------------
/browser-guides/data_science/data_science.adoc:
--------------------------------------------------------------------------------
1 | = Introduction to Graph Data Science
2 | :author: Neo4j Engineering
3 | :description: Get an introduction to the graph data science library with hands-on practice with some of the key graph algorithms
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/data_science/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data_science
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/data_science
7 | :tags: data-science, gds, graph-algorithms
8 | :neo4j-version: 3.5
9 |
10 | == Welcome to an Introduction to Graph Data Science
11 |
12 | The Neo4j Graph Data Science (GDS) library contains a set of graph algorithms exposed through Cypher procedures.
13 | Graph algorithms provide insights into the graph structure and elements, for example, by computing centrality and similarity scores and detecting communities.
14 |
15 | This guide follows the ordinary workflow for running the product tier algorithms: PageRank, Label Propagation, Louvain, and Betweenness Centrality. We will cover the following concepts:
16 |
17 | * Create a graph and import the data.
18 | * Configure the algorithm to suit your needs and the data.
19 |
20 | image::{img}/graph-data-science.jpg[float=right]
21 |
22 | ifdef::env-guide[]
23 | . pass:a[Data and Import]
24 | . pass:a[Data Exploration]
25 | . pass:a[Page Rank]
26 | . pass:a[Label Propagation]
27 | . pass:a[Louvain]
28 | . pass:a[Betweenness Centrality]
29 | endif::[]
30 |
31 | ifdef::env-graphgist[]
32 | . link:{gist}/01_data_import.adoc[Data and Import^]
33 | . link:{gist}/02_analysis_algo.adoc[Data Exploration^]
34 | . link:{gist}/03_pagerank.adoc[Page Rank^]
35 | . link:{gist}/04_label_propagation.adoc[Label Propagation^]
36 | . link:{gist}/05_louvain.adoc[Louvain^]
37 | . link:{gist}/06_betweenness.adoc[Betweenness Centrality^]
38 | endif::[]
39 |
40 | == Further Resources
41 |
42 | For more resources, see link:https://neo4j.com/developer/graph-data-science/[the developer guides^].
43 |
44 | The official Graph Data Science (GDS) library documentation can be found link:https://neo4j.com/docs/graph-data-science/current/[here^].
--------------------------------------------------------------------------------
/browser-guides/data_science/installing_apoc.adoc:
--------------------------------------------------------------------------------
1 | = Installing Awesome Procedures (apoc)
2 | :author: Neo4j Engineering
3 |
4 | == APOC library installation
5 |
6 | https://twitter.com/mesirii[Michael Hunger] has created the
7 | https://github.com/neo4j-contrib/neo4j-apoc-procedures[apoc] library which contains lots of useful procedures that we can use in our Neo4j applications.
8 |
9 | Let’s get `apoc` installed on our local instances of Neo4j:
10 |
11 | * You should have already copied `apoc.jar` onto your machine. If you haven’t, then grab a USB stick from one of the trainers or download the latest version of apoc from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/latest
12 | * Copy `apoc.jar` into your `plugins` folder wherever you have installed Neo4j.
13 | * Restart Neo4j
14 |
15 | == Check apoc installed correctly
16 |
17 | If you run the following command, you can see which additional procedures are now available to us:
18 |
19 | [source,cypher]
20 | ----
21 | CALL dbms.procedures()
22 | YIELD name, signature
23 | WITH name, signature
24 | WHERE name STARTS WITH "apoc"
25 | RETURN name, signature
26 | ----
27 |
28 | If you don’t see any rows, grab your closest trainer for help.
29 | Once you’ve got it installed, you can close this guide and return to the previous one.
--------------------------------------------------------------------------------
/browser-guides/got/got.adoc:
--------------------------------------------------------------------------------
1 | = Neo4j Graph of Thrones and Data Science
2 | :author: Mark Needham
3 | :description: Explore the Game of Thrones world with Cypher and data science algorithms
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/got/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got
7 | :tags: intro, cypher, load-csv, gds, algorithms, data-science
8 | :neo4j-version: 3.5
9 |
10 | == Welcome to Neo4j Graph of Thrones and Data Science
11 |
12 | image:{img}/got_header.png[got_header,float=right,width=500]
13 |
14 | ifdef::env-guide[]
15 | . pass:a[Exploratory Data Analysis]
16 | . pass:a[Applied Graph Algorithms]
17 | endif::[]
18 |
19 | ifdef::env-graphgist[]
20 | . link:{gist}/01_eda.adoc[Exploratory Data Analysis^]
21 | . link:{gist}/02_algorithms.adoc[Applied Graph Algorithms^]
22 | endif::[]
23 |
24 | == Further Resources
25 |
26 | * https://neo4j.com/graphgists[Graphgist Examples]
27 | * https://neo4j.com/docs/stable/cypher-refcard/[Cypher Reference Card]
28 | * https://neo4j.com/docs/cypher-manual/current/[Neo4j Cypher Manual]
29 | * https://graphdatabases.com[e-book: Graph Databases (free)]
30 |
--------------------------------------------------------------------------------
/browser-guides/got_wwc/01_intro.adoc:
--------------------------------------------------------------------------------
1 | = Intro to Neo4j and Cypher
2 | :csv-url: https://raw.githubusercontent.com/neo4j-meetups/modeling-worked-example/master/data/
3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img
4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc
5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc
6 | :icons: font
7 | :neo4j-version: 3.5
8 |
9 | == Intro to Neo4j
10 |
11 | Welcome to the first of a set of interactive guides.
12 | In these guides you'll execute some pre-written Cypher queries and have the chance to write some yourself.
13 |
14 | Let's get started!
15 |
16 | == Your Turn - `CREATE` Arya
17 |
18 | We'll use the `CREATE` keyword to create a node representing Arya Stark.
19 | Run the following query:
20 |
21 | [source,cypher]
22 | ----
23 | CREATE (:Character {name: 'Arya Stark'})
24 | ----
25 |
26 | This query:
27 |
28 | * creates a node
29 | * with the `Character` label and
30 | * a `name` property with value `Arya Stark`
31 |
32 | Properties are stored as key/value pairs.
33 | The allowed data types are: strings, numbers, booleans and arrays.
34 |
35 | == MATCH - Finding Arya
36 |
37 | Now let's try and find Arya.
38 |
39 | We want to `MATCH` a pattern in the graph.
40 | In this case that pattern is a node with the `Character` label and with the `name` property set to `Arya Stark`.
41 |
42 | [source,cypher]
43 | ----
44 | MATCH (character:Character {name: 'Arya Stark'})
45 | RETURN character
46 | ----
47 |
48 | This is syntactic sugar for the following longhand:
49 |
50 | [source,cypher]
51 | ----
52 | MATCH (character:Character)
53 | WHERE character.name = 'Arya Stark'
54 | RETURN character
55 | ----
56 |
57 | == SET - Add and update properties
58 |
59 | Let's add Arya's title to the Arya node:
60 |
61 | [source, cypher]
62 | ----
63 | MATCH (character:Character {name: 'Arya Stark'})
64 | SET character.title = "Princess"
65 | RETURN character
66 | ----
67 |
68 | == Schema-less by default
69 |
70 | Try creating Arya again:
71 |
72 | [source,cypher]
73 | ----
74 | CREATE (:Character {name: 'Arya Stark'})
75 | ----
76 |
77 | What happens?
78 |
79 | [source,cypher]
80 | ----
81 | MATCH (character:Character {name: 'Arya Stark'})
82 | RETURN character
83 | ----
84 |
85 | Oh no! We've now got two Aryas!
86 |
87 | == Constraints
88 |
89 | image::{img}/slides.jpg[]
90 |
91 | == Constraints
92 |
93 | Let's create a constraint on the `name` property for any `Character` nodes so we don't end up with duplicates.
94 |
95 | [source, cypher]
96 | ----
97 | CREATE CONSTRAINT character_name ON (c:Character)
98 | ASSERT c.name IS UNIQUE;
99 | ----
100 |
101 | Unfortunately we can't actually create the constraint because we already have two `Character` nodes with the same `name`.
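
If you want to see exactly which values are blocking the constraint, a grouping query along these lines can help. It is a minimal sketch; the alias `copies` is just an illustrative name:

[source,cypher]
----
MATCH (c:Character)
// Group by name and count how many nodes share it
WITH c.name AS name, count(*) AS copies
WHERE copies > 1
RETURN name, copies
----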
102 |
103 | == Deleting the second Arya
104 |
105 | We need to delete the second Arya we created.
106 |
107 | We can work out which node that is by finding the one that doesn't have the `title` property.
108 | We'll then use the `DELETE` command to get rid of that node:
109 |
110 | [source, cypher]
111 | ----
112 | MATCH (character:Character {name: 'Arya Stark'})
113 | WHERE NOT EXISTS (character.title)
114 | DELETE character
115 | ----
116 |
117 | Now we can try and apply our constraint again:
118 |
119 | [source, cypher]
120 | ----
121 | CREATE CONSTRAINT character_name ON (c:Character)
122 | ASSERT c.name IS UNIQUE;
123 | ----
124 |
125 | You can see which constraints and indexes have been created by running the following command:
126 |
127 | [source, cypher]
128 | ----
129 | :schema
130 | ----
131 |
132 | == MERGE - Get-Or-Create
133 |
134 | Now let's try and create Arya again:
135 |
136 | [source,cypher]
137 | ----
138 | CREATE (:Character {name: 'Arya Stark'})
139 | ----
140 |
141 | This time the unique constraint stops us.
142 |
143 | The `MERGE` keyword can come in useful here.
144 | `MERGE` will:
145 |
146 | * `MATCH` to check the whole pattern exists
147 | * If not, Cypher will `CREATE` it
148 | * Using `MERGE` on a property with a unique constraint guarantees that no duplicate node is created
149 |
150 | [source, cypher]
151 | ----
152 | MERGE (character:Character {name: 'Arya Stark'})
153 | RETURN character
154 | ----
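
`MERGE` can also tell you which of the two branches it took. The sketch below sets one property when the node is created and another when it already existed; `createdAt` and `lastSeenAt` are hypothetical property names used only for illustration:

[source,cypher]
----
MERGE (character:Character {name: 'Arya Stark'})
// Runs only when MERGE had to create the node
ON CREATE SET character.createdAt = timestamp()
// Runs only when MERGE found an existing node
ON MATCH SET character.lastSeenAt = timestamp()
RETURN character
----

Since Arya already exists at this point, running it should take the `ON MATCH` branch and leave `createdAt` unset.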
155 |
156 | == Exercise: Create some more nodes
157 |
158 | Now it's your turn!
159 | We need to create nodes to represent `House Stark` and `Winter is Coming`.
160 |
161 | image::{img}/nodes.png[]
162 |
163 | == Answer: Create some more nodes
164 |
165 | [source,cypher]
166 | ----
167 | MERGE (allegiance:House {name: 'House Stark'})
168 | RETURN allegiance
169 | ----
170 |
171 | [source,cypher]
172 | ----
173 | MERGE (episode:Episode {number: 1})
174 | ON CREATE SET episode.title = 'Winter is Coming'
175 | RETURN episode
176 | ----
177 |
178 | == Create relationships
179 |
180 | Now we need to connect our nodes together.
181 |
182 | We'll start by writing a query to find and return `Arya Stark` and `House Stark`:
183 |
184 | [source, cypher]
185 | ----
186 | MATCH (house:House {name: 'House Stark'})
187 | MATCH (character:Character {name: 'Arya Stark'})
188 | RETURN character, house
189 | ----
190 |
191 | To create a relationship between them we can use the `CREATE` or `MERGE` keywords.
192 |
193 | [source, cypher]
194 | ----
195 | MATCH (house:House {name: 'House Stark'})
196 | MATCH (character:Character {name: 'Arya Stark'})
197 | CREATE (character)-[:HAS_ALLEGIANCE_TO]->(house)
198 | ----
199 |
200 | or
201 |
202 | [source, cypher]
203 | ----
204 | MATCH (house:House {name: 'House Stark'})
205 | MATCH (character:Character {name: 'Arya Stark'})
206 | MERGE (character)-[:HAS_ALLEGIANCE_TO]->(house)
207 | ----
208 |
209 | The `MERGE` version of the query will only create the relationship once no matter how many times we run it.
210 | The `CREATE` version will create a new relationship each time we run it.
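
One way to check this for yourself is to count the relationships between the two nodes after running either version a few times. This is a small sketch; the alias `relationships` is an illustrative name:

[source,cypher]
----
MATCH (:Character {name: 'Arya Stark'})-[r:HAS_ALLEGIANCE_TO]->(:House {name: 'House Stark'})
// More than 1 here means the CREATE version was run repeatedly
RETURN count(r) AS relationships
----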
211 |
212 | == Exercise: Create a relationship between `Arya Stark` and `Winter is Coming`
213 |
214 | Following the example in the previous section, let's now create a relationship between Arya and `Winter is Coming`.
215 |
216 | == Answer: Create a relationship between `Arya Stark` and `Winter is Coming`
217 |
218 | [source, cypher]
219 | ----
220 | MATCH (character:Character {name: 'Arya Stark'})
221 | MATCH (episode:Episode {number: 1})
222 | MERGE (character)-[:APPEARED_IN]->(episode)
223 | ----
224 |
225 | == Next Step
226 |
227 | In the next section we're going to import the full dataset and play with that.
228 |
229 | ifdef::env-guide[]
230 | pass:a[Game of Thrones dataset]
231 | endif::[]
232 | ifdef::env-graphgist[]
233 | link:{gist}/02_got.adoc[Game of Thrones dataset^]
234 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/got_wwc/02_got.adoc:
--------------------------------------------------------------------------------
1 | = Game of Thrones: Characters and Episodes
2 | :csv-url: https://raw.githubusercontent.com/mneedham/neo4j-got/master/data/import
3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img
4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc
5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc
6 | :icons: font
7 | :neo4j-version: 3.5
8 |
9 | == The Game of Thrones dataset
10 |
11 | Now that you've got a bit of practice with the Cypher syntax it's time to work with a bigger dataset.
12 |
13 | For this section it's best to start with a blank slate, but first, a quick overview of deleting graph data.
14 |
15 | image::{img}/slides.jpg[]
16 |
17 | == Delete all the things
18 |
19 | Run the following query to delete all the data we've created so far:
20 |
21 | [source, cypher]
22 | ----
23 | MATCH (n)
24 | DETACH DELETE n
25 | ----
26 |
27 | The `DETACH DELETE` clause deletes a node and any relationships connected to it.
28 |
29 | We'll also delete the constraint that we created on `:Character(name)`.
30 | In the real dataset, some characters have the same name and are distinguished by having a different `id`.
31 |
32 | [source, cypher]
33 | ----
34 | DROP CONSTRAINT character_name;
35 | ----
36 |
37 | Now we're ready to explore the Game of Thrones dataset.
38 |
39 | == LOAD CSV - The ETL Power Tool
40 |
41 | We're going to be using the `LOAD CSV` command in Cypher so first look at the slides for a brief introduction.
42 |
43 | image::{img}/slides.jpg[]
44 |
45 | == LOAD CSV - Exploring the data
46 |
47 | As well as importing data from CSV files we can also use `LOAD CSV` to explore those same files.
48 |
49 | Run the following query to see how many characters there are:
50 |
51 | [source, cypher,subs=attributes]
52 | ----
53 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters.csv" AS row
54 | RETURN COUNT(*)
55 | ----
56 |
57 | Refer to the link:https://neo4j.com/docs/cypher-refcard/current/[Cypher Refcard^] to see the full set of commands/functions available to us.
58 |
59 | == LOAD CSV - Exploring the data
60 |
61 | We can look at the individual rows by returning them directly rather than applying the `COUNT` function.
62 |
63 | The following query will return the first 5 rows of the CSV file.
64 |
65 | [source, cypher,subs=attributes]
66 | ----
67 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters.csv" AS row
68 | RETURN row
69 | LIMIT 5
70 | ----
71 |
72 | The `LIMIT` clause works the same way as in SQL.
73 |
74 | Try returning more rows or removing the `LIMIT` clause to see what other data the file contains.
75 |
76 | == Create the characters
77 |
78 | Now let's combine `LOAD CSV` with the commands we learnt in the first half of the session and put all the GoT characters into our graph.
79 | Run the following query:
80 |
81 | [source, cypher,subs=attributes]
82 | ----
83 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters.csv" AS row
84 | MERGE (c:Character {id: row.link})
85 | ON CREATE SET c.name = row.character
86 | ----
87 |
88 | This query:
89 |
90 | * iterates over every row in the `characters.csv` file
91 | * creates a node with the `Character` label and an `id` property if such a node doesn't already exist
92 | * sets the `name` property if the node is being created
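
To see why the import merges on `id` rather than `name`, you could compare the number of nodes with the number of distinct names once the import has finished. This is a sketch with illustrative aliases:

[source,cypher]
----
MATCH (c:Character)
// If distinctNames is smaller than nodes, some characters share a name
RETURN count(c) AS nodes, count(DISTINCT c.name) AS distinctNames
----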
93 |
94 | == Finding characters
95 |
96 | Now let's see what we've imported into the database.
97 | Run the following query to see a sample of the nodes we've just created:
98 |
99 | [source, cypher]
100 | ----
101 | MATCH (c:Character)
102 | RETURN c
103 | ORDER BY rand()
104 | LIMIT 25
105 | ----
106 |
107 | The use of the `rand()` function means we get a different 25 characters each time.
108 | Try running the query a few times.
109 |
110 | Now that we've got the characters loaded it's time to import some episodes for them to appear in.
111 |
112 | == Importing episodes
113 |
114 | We have a CSV file containing episodes which we can explore by running the following query:
115 |
116 | [source, cypher, subs=attributes]
117 | ----
118 | LOAD CSV WITH HEADERS FROM "{csv-url}/overview.csv" AS row
119 | RETURN row
120 | LIMIT 10
121 | ----
122 |
123 | We'll run the following query to create a node for each episode:
124 |
125 | [source, cypher, subs=attributes]
126 | ----
127 | LOAD CSV WITH HEADERS FROM "{csv-url}/overview.csv" AS row
128 | MERGE (episode:Episode {id: toInteger(row.episodeId)})
129 | ON CREATE SET
130 | episode.season = toInteger(row.season),
131 | episode.number = toInteger(row.episode),
132 | episode.title = row.title
133 | ----
134 |
135 | By default, every value that `LOAD CSV` reads from a file is a `String`.
136 | In this case we want `season`, `number` and `id` to be numeric so we coerce the data using the `toInteger` function.
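
You can try the conversion on its own, without touching the graph. In this small sketch the second column is `null`, because `toInteger` returns `null` when a string cannot be parsed as a number:

[source,cypher]
----
RETURN toInteger('42') AS parsed,
       toInteger('not a number') AS unparsable
----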
137 |
138 | So now we've got characters and episodes but we still haven't got a graph as they aren't connected yet.
139 | Let's do that next.
140 |
141 | == Connecting episodes and characters
142 |
143 | (Surprise, surprise) We also have a CSV file containing the episodes that characters appeared in.
144 | We can explore that by running the following query:
145 |
146 | [source, cypher, subs=attributes]
147 | ----
148 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_episodes.csv" AS row
149 | RETURN row
150 | LIMIT 10
151 | ----
152 |
153 | We're going to create an `APPEARED_IN` relationship between a `Character` and `Episode` for each row in the file.
154 |
155 | [source, cypher, subs=attributes]
156 | ----
157 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_episodes.csv" AS row
158 | MATCH (episode:Episode {id: toInteger(row.episodeId)})
159 | MATCH (character:Character {id: row.character})
160 | MERGE (character)-[:APPEARED_IN]->(episode)
161 | ----
162 |
163 | This query:
164 |
165 | * looks up an episode
166 | * looks up a character
167 | * creates an `APPEARED_IN` relationship between them if one doesn't already exist.
168 |
169 | If you run this query again you'll see that it doesn't do anything the second time around.
170 |
171 | == Characters and Episodes
172 |
173 | We should now have a graph connecting Game of Thrones characters with the episodes that they appear in.
174 |
175 | Run the following query to check everything has imported correctly:
176 |
177 | [source, cypher]
178 | ----
179 | MATCH (character:Character)-[:APPEARED_IN]->(episode:Episode)
180 | RETURN *
181 | ORDER BY rand()
182 | LIMIT 25
183 | ----
184 |
185 | This query:
186 |
187 | * looks up nodes with the label `Character`
188 | * that have an outgoing `APPEARED_IN` relationship
189 | * to nodes with the label `Episode`
190 | * and finds 25 paths that match that pattern and returns them
191 |
192 | Spend a couple of minutes clicking around the graph visualisation to get a feel for the data we've imported.
193 |
194 | == Aggregation queries
195 |
196 | In the next section we'll have an exercise where you will write queries to answer some questions.
197 | A couple of these queries will require use of aggregation functions so we'll quickly go over those.
198 |
199 | Perhaps the most obvious question to answer after importing characters and episodes is `Who appeared in the most episodes?`.
200 | We can write the following query to answer that question:
201 |
202 | [source, cypher]
203 | ----
204 | MATCH (character:Character)-[:APPEARED_IN]->()
205 | RETURN character.name, COUNT(*) AS appearances
206 | ORDER BY appearances DESC
207 | ----
208 |
209 | Look at the slides for a quick explanation of this query:
210 |
211 | image::{img}/slides.jpg[]
212 |
213 | == Exercise
214 |
215 | Here are a few questions for you to try and answer:
216 |
217 | * Who appeared in the most episodes in season 4?
218 | * Which `Stark` character appeared the fewest times?
219 | * Which episodes does `Arya Stark` not appear in? (You'll need to write a `WHERE NOT` clause in this query)
220 |
221 | Don't forget the link:https://neo4j.com/docs/cypher-refcard/current/[Cypher Refcard^] is your friend!
222 |
223 | == Answer: Who appeared in the most episodes in season 4?
224 |
225 | [source, cypher]
226 | ----
227 | MATCH (character:Character)-[:APPEARED_IN]->({season: 4})
228 | RETURN character.id, character.name, COUNT(*) AS appearances
229 | ORDER BY appearances DESC
230 | ----
231 |
232 | == Answer: Which `Stark` character appeared the fewest times?
233 |
234 | [source, cypher]
235 | ----
236 | MATCH (character:Character)-[:APPEARED_IN]->()
237 | WHERE character.name ENDS WITH "Stark"
238 | RETURN character.id, character.name, COUNT(*) AS appearances
239 | ORDER BY appearances
240 | LIMIT 1
241 | ----
242 |
243 | == Answer: Which episodes does `Arya Stark` not appear in?
244 |
245 | [source, cypher]
246 | ----
247 | MATCH (episode:Episode)
248 | WHERE NOT (:Character {name: "Arya Stark"})-[:APPEARED_IN]->(episode)
249 | RETURN episode
250 | ORDER BY episode.id
251 | ----
252 |
253 | == Next Step
254 |
255 | In the next section we're going to look at the houses that characters belong to.
256 |
257 | ifdef::env-guide[]
258 | pass:a[Houses]
259 | endif::[]
260 | ifdef::env-graphgist[]
261 | link:{gist}/03_got_houses.adoc[Houses^]
262 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/got_wwc/03_got_houses.adoc:
--------------------------------------------------------------------------------
1 | = Game of Thrones: Houses
2 | :csv-url: https://raw.githubusercontent.com/mneedham/neo4j-got/master/data/import
3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img
4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc
5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc
6 | :icons: font
7 | :neo4j-version: 3.5
8 |
9 | == Importing houses
10 |
11 | In this next section we're going to import the houses that characters belong to.
12 |
13 | Run the following query to explore the houses CSV file:
14 |
15 | [source, cypher,subs=attributes]
16 | ----
17 | LOAD CSV WITH HEADERS FROM "{csv-url}/houses.csv" AS row
18 | RETURN row
19 | ----
20 |
21 | Now let's create a node with the `House` label for each row in the file:
22 |
23 | [source, cypher,subs=attributes]
24 | ----
25 | LOAD CSV WITH HEADERS FROM "{csv-url}/houses.csv" AS row
26 | MERGE (house:House {id: row.link})
27 | ON CREATE SET house.name = row.name
28 | ----
29 |
30 | Run the following query to return all the houses:
31 |
32 | [source, cypher]
33 | ----
34 | MATCH (house:House)
35 | RETURN house
36 | ----
37 |
38 | You should see 73 nodes if the import has worked as expected.
39 |
40 | == Exercise: Create allegiances
41 |
42 | Now it's your turn!
43 | Run the following query to view the allegiances between characters and houses:
44 |
45 | [source, cypher,subs=attributes]
46 | ----
47 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_houses.csv" AS row
48 | RETURN row.character, row.house
49 | ----
50 |
51 | Now create a `HAS_ALLEGIANCE_TO` relationship for each character/house pair in the file.
52 |
53 | == Answer: Create allegiances
54 |
55 | [source, cypher,subs=attributes]
56 | ----
57 | LOAD CSV WITH HEADERS FROM "{csv-url}/characters_houses.csv" AS row
58 | MATCH (character:Character {id: row.character})
59 | MATCH (house:House {id: row.house})
60 | MERGE (character)-[:HAS_ALLEGIANCE_TO]->(house)
61 | ----
62 |
63 | == Exploring allegiances
64 |
65 | Run the following query to check that the allegiances have been created:
66 |
67 | [source, cypher]
68 | ----
69 | MATCH (character:Character)-[:HAS_ALLEGIANCE_TO]->(house)
70 | RETURN character.id, character.name, count(*) AS allegiances
71 | ORDER BY allegiances DESC
72 | ----
73 |
74 | You should see `Randyll Tarly` in first place with 4 houses.
75 |
76 | == Exercise: What houses do people have allegiance to?
77 |
78 | Time for another mini exercise.
79 |
80 | See if you can tweak the query from the previous slide to include the names of the houses as well as the count.
81 |
82 | _Tip_ The link:https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-collect[`collect` function] will be helpful.
83 |
84 | == Answer: What houses do people have allegiance to?
85 |
86 | We just need to add a call to `collect()` as part of our `RETURN` statement:
87 |
88 | [source, cypher]
89 | ----
90 | MATCH (character:Character)-[:HAS_ALLEGIANCE_TO]->(house)
91 | RETURN character.id, character.name, collect(house.name) AS houses, count(*) AS allegiances
92 | ORDER BY allegiances DESC
93 | ----
94 |
95 | == Appearances of the Starks
96 |
97 | In the previous guide we wrote the following query to find the `Stark` character who appeared in the fewest episodes.
98 |
99 | [source, cypher]
100 | ----
101 | MATCH (character:Character)-[:APPEARED_IN]->()
102 | WHERE character.name ENDS WITH "Stark"
103 | RETURN character.id, character.name, count(*) AS appearances
104 | ORDER BY appearances
105 | LIMIT 1
106 | ----
107 |
108 | This query made the assumption that all members of `House Stark` had a name that ended in `Stark`, which isn't necessarily the case.
109 |
110 | The following query finds the most prominent members of `House Stark` who don't have a `Stark` surname:
111 |
112 | [source, cypher]
113 | ----
114 | MATCH (:House {name: "House Stark"})<-[:HAS_ALLEGIANCE_TO]-(character:Character)-[:APPEARED_IN]->()
115 | WHERE NOT(character.name ENDS WITH "Stark")
116 | RETURN character.id, character.name, count(*) AS appearances
117 | ORDER BY appearances DESC
118 | ----
119 |
120 | Try tweaking the query to see if there are prominent characters of other houses who don't have the surname of that House.
121 |
122 | == Multiple aggregations
123 |
124 | Let's revisit one of our queries from the first section where we found the characters who'd appeared in the most episodes:
125 |
126 | [source, cypher]
127 | ----
128 | MATCH (character:Character)-[:APPEARED_IN]->()
129 | RETURN character.name, count(*) AS appearances
130 | ORDER BY appearances DESC
131 | ----
132 |
133 | It would be cool to see which houses each character had allegiance to as well.
134 | We might try to extend the query to use the `collect` function to do this with the following query:
135 |
136 | [source, cypher]
137 | ----
138 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character)-[:APPEARED_IN]->()
139 | RETURN character.id, character.name, collect(house.name) AS houses, count(*) AS appearances
140 | ORDER BY appearances DESC
141 | ----
142 |
143 | Unfortunately, this doesn't give us the result we might have expected.
144 | We've got the house names repeated loads of times and `appearances` is now wrong!
145 |
146 | == Multiple aggregations
147 |
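148 | The problem is that we're aggregating across two different relationships in one go.
149 | The `MATCH` clause returns one row for every combination of house and episode, so each house name is collected once per episode and `count(*)` is multiplied by the number of houses.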
150 |
151 | Look at the slides for an explanation of how we can use the `WITH` keyword to get around this:
152 |
153 | image::{img}/slides.jpg[]
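
One way to see the problem for yourself is to count the rows that the single `MATCH` produces per character. This is only a sketch: the `episode` variable and the `rows` alias are names introduced here for illustration, and it assumes the data imported earlier in this guide.

[source, cypher]
----
// Each house is paired with each episode, so (assuming no duplicate
// relationships) rows = houses * appearances for every character
MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character)-[:APPEARED_IN]->(episode)
RETURN character.name,
       count(*) AS rows,
       count(DISTINCT house) AS houses,
       count(DISTINCT episode) AS appearances
ORDER BY rows DESC
----

For a character with allegiance to more than one house, `rows` should come out larger than `appearances`, which is why `count(*)` overstated the appearances in the previous query.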
154 |
155 | == Multiple aggregations using `WITH`
156 |
157 | The following query will correctly calculate the houses and appearances for each character:
158 |
159 | [source, cypher]
160 | ----
161 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character)
162 | WITH character, collect(house.name) AS houses
163 | MATCH (character)-[:APPEARED_IN]->()
164 | RETURN character.id, character.name, houses, count(*) AS appearances
165 | ORDER BY appearances DESC
166 | ----
167 |
168 | == Exercise
169 |
170 | Update the query to:
171 |
172 | * only show characters who have appeared in 30 or more episodes.
173 | * and have allegiance to more than 1 house.
174 |
175 | _Tip_ The link:https://neo4j.com/docs/cypher-manual/current/clauses/with/[WITH] and link:https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-size[size()] documentation pages are your friends.
176 |
177 | == Answer: only show characters who have appeared in 30 or more episodes.
178 |
179 | [source, cypher]
180 | ----
181 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character)
182 |
183 | WITH character, collect(house.name) AS houses
184 | MATCH (character)-[:APPEARED_IN]->()
185 |
186 | WITH character, houses, count(*) AS appearances
187 | WHERE appearances >= 30
188 |
189 | RETURN character.id, character.name, houses, appearances
190 | ORDER BY appearances DESC
191 | ----
192 |
193 | == Answer: only show characters who have appeared in 30 or more episodes and have allegiance to more than 1 house.
194 |
195 | [source, cypher]
196 | ----
197 | MATCH (house:House)<-[:HAS_ALLEGIANCE_TO]-(character:Character)
198 |
199 | WITH character, collect(house.name) AS houses
200 | WHERE size(houses) > 1
201 | MATCH (character)-[:APPEARED_IN]->()
202 |
203 | WITH character, houses, count(*) AS appearances
204 | WHERE appearances >= 30
205 |
206 | RETURN character.id, character.name, houses, appearances
207 | ORDER BY appearances DESC
208 | ----
209 |
210 | == Next Step
211 |
212 | In the next section, we're going to look at the relationships between different characters and the houses they belong to.
213 |
214 | ifdef::env-guide[]
215 | pass:a[Family Ties]
216 | endif::[]
217 | ifdef::env-graphgist[]
218 | link:{gist}/04_got_families.adoc[Family Ties^]
219 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/got_wwc/04_got_families.adoc:
--------------------------------------------------------------------------------
1 | = Game of Thrones: Families
2 | :csv-url: https://raw.githubusercontent.com/mneedham/neo4j-got/master/data/import
3 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img
4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc
5 | :guides: https://guides.neo4j.com/got_wwc
6 | :icons: font
7 | :neo4j-version: 3.5
8 |
9 | == Importing families
10 |
11 | In this final section, we're going to import the family relationships between characters.
12 |
13 | Run the following query to explore the family ties CSV file:
14 |
15 | [source,cypher,subs=attributes]
16 | ----
17 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row
18 | RETURN row
19 | ----
20 |
21 | We've got mother and father relationships between pairs of characters.
22 | The father relationships are a bit more nuanced, as we can see by running the following query:
23 |
24 | [source,cypher, subs=attributes]
25 | ----
26 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row
27 | RETURN DISTINCT row.relationship, row.type
28 | ----
29 |
30 | We have biological, adoptive, and legal fathers.
31 |
32 | == Importing families
33 |
34 | First let's import the mother relationships.
35 | We'll create a `PARENT_OF` relationship from a mother to their child for each row in the CSV file:
36 |
37 | [source, cypher, subs=attributes]
38 | ----
39 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row
40 | WITH row WHERE row.relationship = "mother"
41 | MATCH (character1:Character {id: row.character1})
42 | MATCH (character2:Character {id: row.character2})
43 | MERGE (character2)-[:PARENT_OF {type: "mother"}]->(character1)
44 | ----
45 |
46 | Now we'll do the same with the fathers but we'll also record the type of father relationship as part of the `type` property on the relationship.
47 |
48 | [source, cypher, subs=attributes]
49 | ----
50 | LOAD CSV WITH HEADERS FROM "{csv-url}/family_ties.csv" AS row
51 | WITH row WHERE row.relationship = "father"
52 | MATCH (character1:Character {id: row.character1})
53 | MATCH (character2:Character {id: row.character2})
54 | MERGE (character2)-[:PARENT_OF {type: row.type + " " + row.relationship}]->(character1)
55 | ----
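
To check that both imports did what we expected, we could count the `PARENT_OF` relationships by their `type` property. This is a minimal sketch; the `relationships` alias is introduced here for illustration, and the exact values returned depend on the contents of the CSV file.

[source, cypher]
----
// Group the imported relationships by the type we stored on them
MATCH ()-[parentOf:PARENT_OF]->()
RETURN parentOf.type AS type, count(*) AS relationships
ORDER BY relationships DESC
----

We'd expect one row for `mother` plus one row for each kind of father relationship found in the file.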
56 |
57 | == Who are Jon Snow's parents?
58 |
59 | Now we can finally start exploring the relationships between different characters.
60 | We'll start by finding Jon Snow's parents:
61 |
62 | [source, cypher]
63 | ----
64 | MATCH path = (character:Character {name: "Jon Snow"})<-[:PARENT_OF]-(parent)
65 | RETURN path
66 | ----
67 |
68 | This query returns both his biological and adoptive fathers.
69 | If we want to only return his biological parents we can tweak the query like so:
70 |
71 | [source, cypher]
72 | ----
73 | MATCH path = (character:Character {name: "Jon Snow"})<-[parentOf:PARENT_OF]-(parent)
74 | WHERE parentOf.type IN ["mother", "biological father"]
75 | RETURN path
76 | ----
77 |
78 | == Who are Jon Snow's biological grandparents?
79 |
80 | And finally we can find Jon Snow's biological grandparents!
81 |
82 | [source, cypher]
83 | ----
84 | MATCH path = (:Character {name: "Jon Snow"})<-[:PARENT_OF*..2]-(parent)
85 | WHERE all(x in relationships(path) WHERE x.type IN ["mother", "biological father"])
86 | RETURN path
87 | ----
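
The query above uses `*..2`, so it returns paths of length 1 and 2, which means parents appear alongside grandparents. If we only want the grandparents, one option is to fix the path length at exactly 2. This is a sketch that assumes the same data; the `grandparent` variable is a name chosen here for illustration.

[source, cypher]
----
// *2 matches paths of exactly two PARENT_OF hops
MATCH path = (:Character {name: "Jon Snow"})<-[:PARENT_OF*2]-(grandparent)
WHERE all(x IN relationships(path) WHERE x.type IN ["mother", "biological father"])
RETURN grandparent.name
----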
--------------------------------------------------------------------------------
/browser-guides/got_wwc/got_wwc.adoc:
--------------------------------------------------------------------------------
1 | = An Intro to Neo4j with Game of Thrones
2 | :author: Mark Needham
3 | :description: Learn Cypher and explore the Game of Thrones world
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/got_wwc/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/got_wwc
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/got_wwc
7 | :tags: browser-guide, intro, cypher, load-csv, aggregation
8 | :neo4j-version: 3.5
9 |
10 | == Welcome to An Intro to Neo4j with Game of Thrones
11 |
12 | image::{img}/nodes.png[float=right,width=400]
13 |
14 | ifdef::env-guide[]
15 | . pass:a[Intro to Cypher]
16 | . pass:a[Game of Thrones: Characters and Episodes]
17 | . pass:a[Game of Thrones: Houses]
18 | . pass:a[Game of Thrones: Family Ties]
19 | endif::[]
20 |
21 | ifdef::env-graphgist[]
22 | . link:{gist}/01_intro.adoc[Intro to Cypher^]
23 | . link:{gist}/02_got.adoc[Game of Thrones: Characters and Episodes^]
24 | . link:{gist}/03_got_houses.adoc[Game of Thrones: Houses^]
25 | . link:{gist}/04_got_families.adoc[Game of Thrones: Family Ties^]
26 | endif::[]
27 |
28 | == Further Resources
29 |
30 | * https://neo4j.com/graphgists[GraphGist Examples^]
31 | * https://neo4j.com/docs/cypher-refcard/current/[Cypher Reference Card^]
32 | * https://neo4j.com/docs/cypher-manual/current/[Neo4j Cypher Manual^]
33 | * https://neo4j.com/developer/cypher-resources/[Cypher Resources^]
34 | * https://graphdatabases.com[Free e-book: Graph Databases^]
--------------------------------------------------------------------------------
/browser-guides/hospital/hospital.adoc:
--------------------------------------------------------------------------------
1 | = Working with Hierarchical Trees in Neo4j
2 | :author: Tomaz Bratanic
3 | :description: Approach hierarchical tree structures in Neo4j by querying and exploring a hospital data set
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/hospital/img
5 | :tags: hierarchy, trees, parent-child, hospital, load-csv, apoc
6 | :neo4j-version: 3.5
7 |
8 | image:{img}/hospitalmeta.jpg[hospitalmeta,width=400]
9 |
10 | == Introduction
11 |
12 | My name is Tomaz Bratanic. I want to demonstrate how you should approach hierarchical location trees in Neo4j. From what I have learned while importing and querying them, I came up with a few ground rules
13 | that one should follow in order to get correct query results.
14 |
15 | === Rules of location tree:
16 |
17 | * _All relationships are directed from children to parents, going up the
18 | hierarchy._
19 | * _We use a single type for all relationships (for example PARENT, FROM, or IS_IN)._
20 | * _Every node has a single outgoing relationship to its parent._
21 | * _Every node can have one or multiple incoming relationships from its
22 | children._
23 |
24 | === Contact:
25 |
26 | * _twitter: @tb_tomaz_
27 | * _github: https://github.com/tomasonjo_
28 | * _blog: https://tbgraph.wordpress.com/category/hospital_
29 |
30 | == Import
31 |
32 | Let's load some data into our graph to explore.
33 |
34 | === Add constraints and indexes
35 |
36 | First, we need to add indexes and constraints, as they will optimize our queries. The first four statements below create the indexes, and the last two create the unique constraints. You will also need the APOC library installed for the geocoding query later in this guide.
37 |
38 | [source,cypher]
39 | ----
40 | CREATE INDEX ON :County(name);
41 | CREATE INDEX ON :City(name);
42 | CREATE INDEX ON :ZipCode(name);
43 | CREATE INDEX ON :Address(name);
44 |
45 | CREATE CONSTRAINT ON (h:Hospital) ASSERT h.id IS UNIQUE;
46 | CREATE CONSTRAINT ON (s:State) ASSERT s.name IS UNIQUE;
47 | ----
48 |
49 | == Location hierarchical tree import
50 |
51 | Notice that we do not take the standard approach of merging
52 | each node separately. Instead, we merge each node in a pattern with its
53 | parent in the hierarchical tree, because some counties/cities/addresses share
54 | the same name.
55 |
56 | [source,cypher]
57 | ----
58 | LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
59 | WITH row
60 | WHERE row.State = 'NY'
61 | // state name is unique
62 | MERGE (state:State{name:row.State})
63 | // merge by pattern with their parents
64 | MERGE (state)<-[:IS_IN]-(county:County{name:row.`County Name`})
65 | MERGE (county)<-[:IS_IN]-(city:City{name:row.City})
66 | MERGE (city)<-[:IS_IN]-(zip:ZipCode{name:row.`ZIP Code`})
67 | MERGE (zip)<-[:IS_IN]-(address:Address{name:row.Address})
68 | // for entities, it is best to have an id system
69 | MERGE (h:Hospital{id:row.`Provider ID`})
70 | MERGE (h)-[:IS_IN]->(address)
71 | ----
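
To see why merging by pattern matters, we could look for city names that occur under more than one county. This is a sketch under the assumption that the import above has been run; the `name` and `counties` aliases are introduced here for illustration, and with only the `NY` rows loaded it may well return no rows.

[source,cypher]
----
// City names shared by several counties end up as separate :City nodes
MATCH (city:City)-[:IS_IN]->(county:County)
WITH city.name AS name, collect(county.name) AS counties
WHERE size(counties) > 1
RETURN name, counties
----

Had we merged each `:City` by name alone, any names listed here would have collapsed into a single node with several parents, breaking the single-parent rule.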
72 |
73 | == Additional hospital information
74 |
75 | We will also import some additional information about the hospitals such as their ratings, ownership, and more.
76 |
77 | [source,cypher]
78 | ----
79 | LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
80 | WITH row
81 | WHERE row.State = 'NY'
82 | MATCH (h:Hospital{id:row.`Provider ID`})
83 | SET h.phone=row.`Phone Number`,
84 | h.emergency_services = row.`Emergency Services`,
85 | h.name= row.`Hospital Name`,
86 | h.mortality = row.`Mortality national comparison`,
87 | h.safety = row.`Safety of care national comparison`,
88 | h.timeliness = row.`Timeliness of care national comparison`,
89 | h.experience = row.`Patient experience national comparison`,
90 | h.effectiveness = row.`Effectiveness of care national comparison`
91 | MERGE (type:HospitalType{name:row.`Hospital Type`})
92 | MERGE (h)-[:HAS_TYPE]->(type)
93 | MERGE (ownership:Ownership{name: row.`Hospital Ownership`})
94 | MERGE (h)-[:HAS_OWNERSHIP]->(ownership)
95 | MERGE (rating:Rating{name:row.`Hospital overall rating`})
96 | MERGE (h)-[:HAS_RATING]->(rating)
97 | ----
98 |
99 | == Geospatial import
100 |
101 | The last thing to import is the geospatial information of hospitals.
102 |
103 | [source,cypher]
104 | ----
105 | LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/gpsinfo.csv" as row
106 | MATCH (hospital:Hospital {id:row.id})
107 | SET hospital.latitude = toFloat(row.latitude),
108 | hospital.longitude = toFloat(row.longitude)
109 | ----
110 |
111 | == Spatial query example
112 |
113 | Let's say you get lost on `Liberty Island` and want to find the nearest 10
114 | hospitals. Distance is in meters. *Note: does not work in Neo4j Sandbox.*
115 |
116 | [source,cypher]
117 | ----
118 | WITH "Liberty Island, Manhattan" as myLocation
119 | call apoc.spatial.geocodeOnce(myLocation) YIELD location
120 | WITH point({longitude: location.longitude, latitude: location.latitude}) as myPosition,100 as distanceInKm
121 | MATCH (h:Hospital)-->(rating:Rating)
122 | WHERE exists(h.latitude) and
123 | distance(myPosition, point({longitude:h.longitude,latitude:h.latitude})) < (distanceInKm * 1000)
124 | RETURN h.name as hospital,rating.name as rating,distance(myPosition,
125 | point({longitude:h.longitude,latitude:h.latitude})) as distance
126 | ORDER BY distance LIMIT 10
127 | ----
128 |
129 | == Data Validation
130 |
131 | === Validation #1
132 |
133 | We can check if any `:Address` has more than one relationship going up the hierarchy. This verifies the rule that every node has a single outgoing relationship to its parent.
134 |
135 | [source,cypher]
136 | ----
137 | MATCH (a:Address)
138 | WHERE size((a)-[:IS_IN]->()) > 1
139 | RETURN a
140 | ----
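
The same rule applies to every level of the tree, not just `:Address`. As a sketch, the check could be widened to any node with an outgoing `IS_IN` relationship; the `parents` alias is a name introduced here for illustration.

[source,cypher]
----
// Any node listed here has more than one parent and violates the rule
MATCH (n)-[:IS_IN]->(parent)
WITH n, count(parent) AS parents
WHERE parents > 1
RETURN labels(n) AS labels, n.name AS name, parents
----

If the import followed the rules, this query should return no rows.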
141 |
142 | === Validation #2
143 |
144 | We can also check the length of all the paths in the location tree.
145 | Because of the rules we put in place, every hospital must have exactly one
146 | location path, because every hospital has exactly one address.
147 |
148 | [source,cypher]
149 | ----
150 | MATCH path=(h:Hospital)-[:IS_IN*..10]->(location)
151 | WHERE NOT (location)-[:IS_IN]->()
152 | RETURN distinct(length(path)) as length,
153 | count(*) as numberOfPaths,
154 | count(distinct(h)) as numberOfHospitals
155 | ----
156 |
157 | == Data Validation
158 |
159 | === Validation #3
160 |
161 | Check how many labels each node has.
162 | This is useful when learning. You do not wish to have nodes without labels.
163 |
164 | [source,cypher]
165 | ----
166 | MATCH (n)
167 | RETURN size(labels(n)) as size,count(*) as count
168 | ----
169 |
170 | == Queries
171 |
172 | Let's run a few queries and learn about our data.
173 |
174 | === Average rating by ownership
175 |
176 | [source,cypher]
177 | ----
178 | MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:HAS_OWNERSHIP]->(o)
179 | RETURN o.name as ownership,avg(toInteger(r.name)) as averageRating
180 | ORDER BY averageRating DESC LIMIT 15
181 | ----
182 |
183 | === Number of hospitals per city
184 |
185 | [source,cypher]
186 | ----
187 | MATCH (h:Hospital)-[:IS_IN*3..3]->(city)
188 | RETURN city.name as city,count(h) as NumberOfHospitals
189 | ORDER BY NumberOfHospitals DESC LIMIT 15
190 | ----
191 |
192 | == Queries
193 |
194 | === Top 15 states by rating
195 |
196 | [source,cypher]
197 | ----
198 | MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:IS_IN*5..5]->(state)
199 | WHERE NOT r.name="Not Available"
200 | RETURN state.name as state,avg(toInteger(r.name)) as averageRating,count(h) as numberOfHospitals
201 | ORDER BY averageRating DESC LIMIT 15
202 | ----
203 |
204 | === Which states have the most above-average hospitals in effectiveness
205 |
206 | [source,cypher]
207 | ----
208 | MATCH (h:Hospital)-[:IS_IN*5..5]->(state)
209 | WHERE h.effectiveness = "Above the National average"
210 | RETURN state.name as state,h.effectiveness,count(h) as numberOfHospitals
211 | ORDER BY numberOfHospitals DESC LIMIT 15
212 | ----
213 |
214 | === Which states have the most below-average hospitals in mortality
215 |
216 | [source,cypher]
217 | ----
218 | MATCH (h:Hospital)-[:IS_IN*5..5]->(state)
219 | WHERE h.mortality = "Below the National average"
220 | RETURN state.name as state,h.mortality,count(h) as numberOfHospitals
221 | ORDER BY numberOfHospitals DESC LIMIT 15
222 | ----
223 |
--------------------------------------------------------------------------------
/browser-guides/img/AStormOfSwords.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/AStormOfSwords.jpg
--------------------------------------------------------------------------------
/browser-guides/img/Graph_betweenness.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/Graph_betweenness.jpg
--------------------------------------------------------------------------------
/browser-guides/img/PageRanks-Example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/PageRanks-Example.png
--------------------------------------------------------------------------------
/browser-guides/img/apoc-neo4j-user-defined-procedures.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/apoc-neo4j-user-defined-procedures.jpg
--------------------------------------------------------------------------------
/browser-guides/img/betweenness-centrality.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/betweenness-centrality.png
--------------------------------------------------------------------------------
/browser-guides/img/bugs-bunny-the-end.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/bugs-bunny-the-end.jpg
--------------------------------------------------------------------------------
/browser-guides/img/char_cooccurence.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/char_cooccurence.png
--------------------------------------------------------------------------------
/browser-guides/img/cypher_create.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/cypher_create.jpg
--------------------------------------------------------------------------------
/browser-guides/img/cypher_run_button.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/cypher_run_button.jpg
--------------------------------------------------------------------------------
/browser-guides/img/cytutorial_neo4j_browser.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/cytutorial_neo4j_browser.jpg
--------------------------------------------------------------------------------
/browser-guides/img/dark-chocolate-pudding-with-malted-cream.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/dark-chocolate-pudding-with-malted-cream.jpg
--------------------------------------------------------------------------------
/browser-guides/img/database_import.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/database_import.png
--------------------------------------------------------------------------------
/browser-guides/img/document_common_attributes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/document_common_attributes.png
--------------------------------------------------------------------------------
/browser-guides/img/download_csv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/download_csv.png
--------------------------------------------------------------------------------
/browser-guides/img/download_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/download_graph.png
--------------------------------------------------------------------------------
/browser-guides/img/enable_multiline_queries.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/enable_multiline_queries.jpg
--------------------------------------------------------------------------------
/browser-guides/img/footballtransfer-model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/footballtransfer-model.png
--------------------------------------------------------------------------------
/browser-guides/img/got_header.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/got_header.png
--------------------------------------------------------------------------------
/browser-guides/img/graph-data-science.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/graph-data-science.jpg
--------------------------------------------------------------------------------
/browser-guides/img/hospitalmeta.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/hospitalmeta.jpg
--------------------------------------------------------------------------------
/browser-guides/img/jqassistant.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/jqassistant.png
--------------------------------------------------------------------------------
/browser-guides/img/label-propagation-graph-algorithm-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/label-propagation-graph-algorithm-1.png
--------------------------------------------------------------------------------
/browser-guides/img/label-propagation-graph-algorithm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/label-propagation-graph-algorithm.png
--------------------------------------------------------------------------------
/browser-guides/img/life-science-import-datamodel.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/life-science-import-datamodel.jpg
--------------------------------------------------------------------------------
/browser-guides/img/life-sciences-import-model-attribute.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/life-sciences-import-model-attribute.jpg
--------------------------------------------------------------------------------
/browser-guides/img/life-sciences-import-model-gene.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/life-sciences-import-model-gene.jpg
--------------------------------------------------------------------------------
/browser-guides/img/louvain.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/louvain.jpg
--------------------------------------------------------------------------------
/browser-guides/img/meetup.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/meetup.png
--------------------------------------------------------------------------------
/browser-guides/img/n10s.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/n10s.png
--------------------------------------------------------------------------------
/browser-guides/img/neo4j-browser-sync.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/neo4j-browser-sync.png
--------------------------------------------------------------------------------
/browser-guides/img/nodes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/nodes.png
--------------------------------------------------------------------------------
/browser-guides/img/northwind_data_model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/northwind_data_model.png
--------------------------------------------------------------------------------
/browser-guides/img/pin_button.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/pin_button.png
--------------------------------------------------------------------------------
/browser-guides/img/rdf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/rdf.png
--------------------------------------------------------------------------------
/browser-guides/img/restaurant_recommendation_model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/restaurant_recommendation_model.png
--------------------------------------------------------------------------------
/browser-guides/img/schema.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/schema.png
--------------------------------------------------------------------------------
/browser-guides/img/schema_documents.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/schema_documents.png
--------------------------------------------------------------------------------
/browser-guides/img/slides.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/slides.jpg
--------------------------------------------------------------------------------
/browser-guides/img/stackexchange-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/stackexchange-logo.png
--------------------------------------------------------------------------------
/browser-guides/img/stackoverflow-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/stackoverflow-logo.png
--------------------------------------------------------------------------------
/browser-guides/img/stackoverflow-model.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/stackoverflow-model.jpg
--------------------------------------------------------------------------------
/browser-guides/img/style_actedin_relationship.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/style_actedin_relationship.png
--------------------------------------------------------------------------------
/browser-guides/img/style_person_node.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/style_person_node.png
--------------------------------------------------------------------------------
/browser-guides/img/style_sheet_grass.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/style_sheet_grass.png
--------------------------------------------------------------------------------
/browser-guides/img/sushi_restaurants_nyc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/sushi_restaurants_nyc.png
--------------------------------------------------------------------------------
/browser-guides/img/sysinfo_stats.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/sysinfo_stats.png
--------------------------------------------------------------------------------
/browser-guides/img/transfermarkt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/transfermarkt.png
--------------------------------------------------------------------------------
/browser-guides/img/ukcompanies_model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/browser-guides/img/ukcompanies_model.png
--------------------------------------------------------------------------------
/browser-guides/import/01_load_csv.adoc:
--------------------------------------------------------------------------------
1 | = Neo4j import: LOAD CSV in Cypher
2 | :author: Mark Needham
3 | :description: Learn how to use 3 methods for importing data into Neo4j
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/import/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/import
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/import
7 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data
8 | :tags: import, data, load, load-csv
9 | :neo4j-version: 3.5
10 | :icons: font
11 |
12 | == Intro to the dataset
13 |
14 | Welcome to the first of a set of interactive guides.
15 | In these guides, we will import a dataset containing the connections between U.S. airports in 2008.
16 |
17 | Let's get started!
18 |
19 | == Exploring data with `LOAD CSV`
20 |
21 | While we are getting started with our dataset, it's much easier to work with a subset of the data so that we can iterate quickly.
22 | A smaller dataset containing 10,000 connections between U.S. airports lives in `flights_initial.csv`.
23 |
24 | We can run the following query to see what data we have to work with:
25 |
26 | [source,cypher,subs=attributes]
27 | ----
28 | LOAD CSV WITH HEADERS FROM "{data-url}/flights_initial.csv" AS row
29 | RETURN row
30 | LIMIT 5
31 | ----
32 |
33 | This query:
34 |
35 | * loads the file `flights_initial.csv`
36 | * iterates over the file, referring to each line as the variable `row`
37 | * and returns the first 5 lines in the file
38 |
39 | If you see an error message that mentions `Couldn't load the external resource`, the CSV files haven't been copied to the correct location.
40 | Grab a trainer for help!
41 |
42 | There are lots of different fields in this CSV file.
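
To list those fields without scrolling through whole rows, we can ask for the keys of the first row. This is a small sketch that relies on `WITH HEADERS` turning each row into a map, so the built-in `keys` function returns the column names:

[source,cypher,subs=attributes]
----
// Return only the column names of the first row
LOAD CSV WITH HEADERS FROM "{data-url}/flights_initial.csv" AS row
RETURN keys(row) AS fields
LIMIT 1
----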
43 |
44 | == Importing flights and airports
45 |
46 | Run the following query to create nodes and relationships for the flights:
47 |
48 | [source,cypher,subs=attributes]
49 | ----
50 | LOAD CSV WITH HEADERS FROM "{data-url}/flights_initial.csv" AS row
51 | MERGE (origin:Airport {code: row.Origin})
52 | MERGE (destination:Airport {code: row.Dest})
53 | WITH row.UniqueCarrier + row.FlightNum + "_" + row.Year + "-" + row.Month + "-" + row.DayofMonth + "_" + row.Origin + "_" + row.Dest AS flightIdentifier, row, origin, destination
54 | MERGE (flight:Flight { id: flightIdentifier })
55 | ON CREATE SET flight.date = row.Year + "-" + row.Month + "-" + row.DayofMonth,
56 | flight.airline = row.UniqueCarrier, flight.number = row.FlightNum, flight.departure = row.CRSDepTime,
57 | flight.arrival = row.CRSArrTime, flight.distance = row.Distance, flight.cancelled = row.Cancelled
58 | MERGE (flight)-[:ORIGIN]->(origin)
59 | MERGE (flight)-[:DESTINATION]->(destination)
60 | ----
61 |
62 | This query:
63 |
64 | * iterates through each row in the file
65 | * creates nodes with the `Airport` label for the origin and destination airports if they don't already exist
66 | * creates nodes with the `Flight` label for flights if they don't already exist. We invent our own `flightIdentifier` as there isn't one in the dataset
67 | * creates an `ORIGIN` relationship from the flight to the origin airport
68 | * creates a `DESTINATION` relationship from the flight to the destination airport
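
To see what the invented identifier looks like, the sketch below builds one from a hand-written row. The values in the map are made up for illustration and are not taken from the dataset:

[source,cypher]
----
// Hypothetical row: every value is a string, just as LOAD CSV would deliver it
WITH {UniqueCarrier: "WN", FlightNum: "335", Year: "2008", Month: "1", DayofMonth: "3", Origin: "IAD", Dest: "TPA"} AS row
RETURN row.UniqueCarrier + row.FlightNum + "_" + row.Year + "-" + row.Month + "-" + row.DayofMonth + "_" + row.Origin + "_" + row.Dest AS flightIdentifier
----

For this row the identifier should come out as `WN335_2008-1-3_IAD_TPA`.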
69 |
70 | You'll notice that the import query took quite a while to run. We'll look at how to address that in a minute, but first let's talk about property types.
71 |
72 | == Coercing values
73 |
74 | By default, `LOAD CSV` reads every field as a string, so all of our properties are stored as strings.
75 | This will cause us some problems when we start querying the data.
76 |
77 | What if we want to find all the flights that were longer than 500 miles?
78 | We might write the following query:
79 |
80 | [source,cypher]
81 | ----
82 | MATCH (flight:Flight)
83 | WHERE flight.distance > 500
84 | RETURN flight
85 | ----
86 |
87 | No rows!
88 | That's maybe surprising since we know there are definitely some flights that meet this criterion.
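
The query returns nothing because `distance` is a string and `500` is an integer. Cypher doesn't order values of different types with `>`; the comparison evaluates to `null`, and `WHERE` drops the row. A quick way to see this, using literal values rather than the dataset:

[source,cypher]
----
// Literal values only: no data is read from the graph
RETURN "600" > 500 AS stringVsInteger,
       "1000" > "500" AS stringVsString
----

The first column should be `null`. The second should be `false`, because strings are compared character by character, which is another reason to store distances as numbers.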
89 |
90 | == Coercing values: Integers
91 |
92 | Cypher has functions that allow us to coerce values to other types.
93 | You can read more about them in the https://neo4j.com/docs/cypher-manual/current/functions/scalar/#query-functions-scalar[scalar functions section] of the https://neo4j.com/docs/cypher-manual/current/[Cypher manual^].
94 |
95 | We can use the `toInteger` function to convert the `distance` property.
96 |
97 | [source,cypher]
98 | ----
99 | MATCH (flight:Flight)
100 | SET flight.distance = toInteger(flight.distance)
101 | ----
102 |
103 | Now let's retry the query:
104 |
105 | [source,cypher]
106 | ----
107 | MATCH (flight:Flight)
108 | WHERE flight.distance > 500
109 | RETURN flight
110 | ----
111 |
112 | == Coercing values: Booleans
113 |
114 | The `cancelled` property hasn't been imported in an optimal way either.
115 | Ideally, we would like that to be a boolean value, but at the moment, it's stored as the string `"0"` or `"1"`.
116 |
117 | There isn't a function that converts these values directly, but we can write some Cypher that will do the trick:
118 |
119 | [source,cypher]
120 | ----
121 | MATCH (flight:Flight)
122 | SET flight.cancelled = CASE WHEN flight.cancelled = "1" THEN true ELSE false END
123 | ----
124 |
125 | Now we can write a query to find all the flights that were cancelled:
126 |
127 | [source,cypher]
128 | ----
129 | MATCH (flight:Flight)
130 | WHERE flight.cancelled
131 | RETURN flight
132 | ----
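
A grouped count gives a quick sanity check on the conversion. This sketch assumes the `SET` query above has already been run, so `cancelled` holds `true` or `false` on every flight:

[source,cypher]
----
// Count flights per value of the cancelled property
MATCH (flight:Flight)
RETURN flight.cancelled AS cancelled, count(*) AS flights
----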
133 |
134 | == Speeding up the import
135 |
136 | Next, we are going to import 40,000 more flights, but first, we need to make our import script quicker.
137 |
138 | In our initial `LOAD CSV` command, each `MERGE` clause does a label scan to check whether the origin, destination, or flight node already exists.
139 |
140 | We can create unique constraints to solve this problem, because each constraint is backed by an index that `MERGE` can use for the lookup.
141 | This will have the added benefit of stopping us from accidentally creating duplicate nodes!
142 |
143 | [source,cypher]
144 | ----
145 | CREATE CONSTRAINT ON (a:Airport)
146 | ASSERT a.code IS UNIQUE
147 | ----
148 |
149 | [source,cypher]
150 | ----
151 | CREATE CONSTRAINT ON (f:Flight)
152 | ASSERT f.id IS UNIQUE
153 | ----
154 |
155 | Run the following command to check our constraints were created:
156 |
157 | [source,cypher]
158 | ----
159 | :schema
160 | ----
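
To check that a `MERGE` can now use the index behind the constraint, we can prefix a lookup with `EXPLAIN`, which shows the query plan without running the query. The airport code `SFO` is only an example value. With the constraint in place, we would expect the plan to contain a unique index seek rather than a `NodeByLabelScan`, though the exact operator names depend on the Neo4j version:

[source,cypher]
----
// Show the plan only; nothing is written to the graph
EXPLAIN
MERGE (origin:Airport {code: "SFO"})
RETURN origin
----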
161 |
162 | == Import a bigger dataset
163 |
164 | Now we are ready to import some more flights.
165 | We will use the `USING PERIODIC COMMIT` clause so that we don't build up lots of transaction state in memory - by default our query will commit every 1,000 rows.
166 |
167 | Run the following command:
168 |
169 | [source,cypher,subs=attributes]
170 | ----
171 | USING PERIODIC COMMIT
172 | LOAD CSV WITH HEADERS FROM "{data-url}/flights_50k.csv" AS row
173 | MERGE (origin:Airport {code: row.Origin})
174 | MERGE (destination:Airport {code: row.Dest})
175 | WITH row.UniqueCarrier + row.FlightNum + "_" + row.Year + "-" + row.Month + "-" + row.DayofMonth + "_" + row.Origin + "_" + row.Dest AS flightIdentifier, row, origin, destination
176 | MERGE (flight:Flight { id: flightIdentifier })
177 | ON CREATE SET flight.date = row.Year + "-" + row.Month + "-" + row.DayofMonth,
178 | flight.airline = row.UniqueCarrier, flight.number = row.FlightNum, flight.departure = row.CRSDepTime,
179 | flight.arrival = row.CRSArrTime, flight.distance = row.Distance, flight.cancelled = row.Cancelled
180 | MERGE (flight)-[:ORIGIN]->(origin)
181 | MERGE (flight)-[:DESTINATION]->(destination)
182 | ----
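
The batch size can also be set explicitly by adding a number after `USING PERIODIC COMMIT`. The value `500` below is an arbitrary choice for illustration, and the query is cut down to a single `MERGE` to keep the sketch short:

[source,cypher,subs=attributes]
----
// Commit after every 500 rows instead of the default 1,000
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "{data-url}/flights_50k.csv" AS row
MERGE (:Airport {code: row.Origin})
----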
183 |
184 | == Checking our import
185 |
186 | WARNING: If you don't have enough heap configured, the import query in the previous section will fail despite the `USING PERIODIC COMMIT`. That's because of the `Eager` operator that the planner inserts for the double `MERGE` on the same label-property combination.
187 |
188 | Once the import has finished, we should have 50,000 flights in the database, which we can check by executing the following query:
189 |
190 | [source,cypher]
191 | ----
192 | MATCH (:Flight)
193 | RETURN count(*)
194 | ----
195 |
196 | == Next step
197 |
198 | We can get a lot of data into Neo4j using pure Cypher, but if we want to import data from other sources, the APOC library covers a wide range of other import scenarios.
199 |
200 | ifdef::env-guide[]
201 | pass:a[Cypher and APOC]
202 | endif::[]
203 |
204 | ifdef::env-graphgist[]
205 | link:{gist}/02_apoc.adoc[Cypher and APOC^]
206 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/import/03_procedures.adoc:
--------------------------------------------------------------------------------
1 | = Neo4j import: Custom procedures
2 | :author: Mark Needham
3 | :description: Learn how to use 3 methods for importing data into Neo4j
4 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data
5 | :img: https://s3.amazonaws.com/guides.neo4j.com/import/img
6 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/import
7 | :guides: https://s3.amazonaws.com/guides.neo4j.com/import
8 | :tags: import, data, load, custom-procedures, user-defined, procedures
9 | :neo4j-version: 3.5
10 | :icons: font
11 |
12 | == Procedures
13 |
14 | In this next section, we'll get some practice writing custom procedures.
15 | You'll need to have Java installed on your machine for this exercise.
16 |
17 | == OpenStreetMap
18 |
19 | OpenStreetMap provides https://wiki.openstreetmap.org/wiki/Downloading_data[several different ways to export data^], including the Overpass API, which allows us to specify the coordinates of a bounding box that we would like to download.
20 |
21 | e.g. https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]
22 |
23 | If we open that URI, we will see something like this:
24 |
25 | ```
26 |
27 | <note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | ```
39 |
40 | == Exploring OpenStreetMap with `apoc.load.xml`
41 |
42 | We want to create nodes based on the `<node>` elements and connect them together using the `<way>` elements.
43 |
44 | In OSM, https://wiki.openstreetmap.org/wiki/Node[a node^] represents "a single point in space defined by its latitude, longitude, and node id."
45 |
46 | Let's first try using APOC's `apoc.load.xml` procedure to do this.
47 | The following query finds the points in a bounded box in Munich:
48 |
49 | [source,cypher]
50 | ----
51 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]')
52 | YIELD value
53 | UNWIND value["_children"] AS child
54 |
55 | WITH child WHERE child["_type"] = "node"
56 | RETURN child.id AS id, child.lat AS latitude, child.lon AS longitude, child["user"] AS userName
57 | LIMIT 10
58 | ----
59 |
60 | == Importing OpenStreetMap with `apoc.load.xml`
61 |
62 | Now let's import those points!
63 |
64 | First we'll create a unique constraint on `:Point(id)` so that we don't end up with duplicate points.
65 | This command will also create an index which will be useful in the next section:
66 |
67 | [source,cypher]
68 | ----
69 | CREATE CONSTRAINT ON (p:Point)
70 | ASSERT p.id IS UNIQUE
71 | ----
72 |
73 | == Import the XML data
74 |
75 | Run the following query to import the XML data and create the `User` and `Point` nodes for our data model:
76 |
77 | [source,cypher]
78 | ----
79 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]')
80 | YIELD value
81 | UNWIND value["_children"] AS child
82 |
83 | WITH child WHERE child["_type"] = "node"
84 | WITH child.id AS id, child.lat AS latitude, child.lon AS longitude, child["user"] AS userName
85 |
86 | MERGE (point:Point {id: id})
87 | SET point.latitude = latitude, point.longitude = longitude
88 | MERGE (user:User {name: userName})
89 | MERGE (user)-[:EDITED]->(point)
90 | ----
91 |
92 | == Verify data in the graph
93 |
94 | We can run the following query to check the points were created:
95 |
96 | [source,cypher]
97 | ----
98 | MATCH (point:Point)<-[:EDITED]-(user)
99 | RETURN point.id, point.latitude, point.longitude, user.name
100 | LIMIT 25
101 | ----
102 |
103 | == Importing OpenStreetMap with `apoc.load.xml`
104 |
105 | Next, we want to create a relationship between adjacent points.
106 |
107 | Let's first see what the data in the `<way>` elements looks like:
108 |
109 | [source,cypher]
110 | ----
111 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]')
112 | YIELD value
113 | UNWIND value["_children"] AS child
114 |
115 | WITH child WHERE child["_type"] = "way"
116 | RETURN child.id AS id, [child in child["_children"] where child["_type"] = "nd"] AS children
117 | LIMIT 1
118 | ----
119 |
120 | We want to create a `CONNECTS` relationship between the adjacent nodes inside a given `way`. For instance, if `children` contained `[1,2,3]`, we would want to create `(1)-[:CONNECTS]->(2)` and `(2)-[:CONNECTS]->(3)`.
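
The pairing logic can be tried on its own with a literal list. In this sketch, `fromId` and `toId` are illustrative names:

[source,cypher]
----
// Pair each element with its successor: indexes 0..size-2
WITH [1, 2, 3] AS children
UNWIND range(0, size(children) - 2) AS idx
RETURN children[idx] AS fromId, children[idx + 1] AS toId
----

This should return two rows, `1, 2` and `2, 3`, one per `CONNECTS` relationship we want to create.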
121 |
122 | == Importing OpenStreetMap with `apoc.load.xml`
123 |
124 | Run the following query to add a `CONNECTS` relationship between adjacent nodes:
125 |
126 | [source,cypher]
127 | ----
128 | CALL apoc.load.xml('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]')
129 | YIELD value
130 | UNWIND value["_children"] AS child
131 |
132 | WITH child WHERE child["_type"] = "way"
133 | WITH child.id AS id, [child in child["_children"] where child["_type"] = "nd"] AS children
134 | UNWIND range(0, size(children) - 2) AS idx
135 | WITH id, children[idx] as start, children[idx+1] AS end
136 | MATCH (p1:Point {id: start["ref"]})
137 | MATCH (p2:Point {id: end["ref"]})
138 | MERGE (p1)-[:CONNECTS]->(p2)
139 | ----
140 |
141 | == Querying OpenStreetMap
142 |
143 | Now let's see if we can find a path between two points:
144 |
145 | [source,cypher]
146 | ----
147 | MATCH (p1:Point {id: "3800618341"})
148 | MATCH (p2:Point {id: "1485915298"})
149 | MATCH path = shortestPath((p1)-[:CONNECTS*]-(p2))
150 | RETURN p1, p2, path
151 | ----
152 |
153 | Cool! All good so far.
154 |
155 | == Custom procedures
156 |
157 | We were able to achieve what we wanted with `apoc.load.xml`, but the Cypher we have to write gets more complicated as we get deeper into the XML structure.
158 | We also had to run two queries to achieve our desired graph structure. It would be nice if we could do everything in one pass.
159 |
160 | We've started on the implementation of a procedure that can do just this!
161 | You can find it in the Neo4j training repository: https://github.com/neo4j-contrib/training/tree/master/import/custom-procedure
162 |
163 | == OSM Import Procedure
164 |
165 | Go ahead and clone the repository, then build the procedure by executing the following command:
166 |
167 | ```
168 | mvn clean install -DskipTests
169 | ```
170 |
171 | We'll then have the following jar in our `target` directory:
172 |
173 | ```
174 | $ ls target/neo4j*.jar
175 | target/neo4j-procedures-examples-1.0.0-SNAPSHOT.jar
176 | ```
177 |
178 | Copy that into your Neo4j `plugins` directory and restart Neo4j.
179 |
180 | == Running the OSM Import Procedure
181 |
182 | We've already implemented importing nodes, which you can try out by executing the following command:
183 |
184 | [source, cypher]
185 | ----
186 | CALL osm.importUri('https://overpass-api.de/api/xapi_meta?*[bbox=11.54,48.14,11.543,48.145]')
187 | ----
188 |
189 | == Exercise: Adding connections to the OSM Import Procedure
190 |
191 | Now we need to update our procedure to import the connections as well.
192 |
193 | You will need Java installed on your system to give this a try.
194 |
195 | == Next steps
196 |
197 | Congratulations! You have completed this guide and learned how to use Cypher with LOAD CSV, the APOC library, and custom procedures to import data into Neo4j.
198 | There are more things you can do with each of these methods, as well as other import methods available, such as the ETL tool, language drivers, Kettle, and command line tools.
199 | Feel free to check out the resources linked below for more information!
200 |
201 | * https://neo4j.com/developer/data-import/[Data Import with Neo4j]
202 | * https://neo4j.com/docs/cypher-manual/current/[Cypher documentation]
203 | * https://neo4j.com/labs/apoc/current/[APOC documentation]
204 | * https://neo4j.com/docs/cypher-manual/current/functions/user-defined/[User-Defined Cypher Functions]
205 | * https://neo4j.com/developer/cypher/procedures-functions/[Writing custom procedures and functions]
--------------------------------------------------------------------------------
/browser-guides/import/import.adoc:
--------------------------------------------------------------------------------
1 | = Neo4j Import
2 | :author: Mark Needham
3 | :description: Learn how to use 3 methods for importing data into Neo4j
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/import/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/import
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/import
7 | :tags: import, data, load, load-csv, apoc, procedures
8 | :neo4j-version: 3.5
9 |
10 | == Welcome to Neo4j Import
11 |
12 | image:{img}/database_import.png[db-import,width=300,float=right]
13 |
14 | In this set of guides, we will cover three different ways to import data into Neo4j. These are not the only three ways to ingest data into Neo4j, but they are common methods.
15 |
16 | ifdef::env-guide[]
17 | . pass:a[Cypher and LOAD CSV]
18 | . pass:a[Cypher and APOC]
19 | . pass:a[Procedures]
20 | endif::[]
21 |
22 | ifdef::env-graphgist[]
23 | . link:{gist}/01_load_csv.adoc[Cypher and LOAD CSV^]
24 | . link:{gist}/02_apoc.adoc[Cypher and APOC^]
25 | . link:{gist}/03_procedures.adoc[Procedures^]
26 | endif::[]
27 |
28 | == Further Resources
29 |
30 | * https://neo4j.com/graphgists[Graph Gist Examples]
31 | * https://neo4j.com/docs/stable/cypher-refcard/[Cypher Reference Card]
32 | * https://neo4j.com/labs/apoc/[APOC documentation]
33 | * https://neo4j.com/docs/cypher-manual/current/[Cypher documentation]
34 | * https://graphdatabases.com[e-book: Graph Databases (free)]
--------------------------------------------------------------------------------
/browser-guides/meetup/01_meetup_import.adoc:
--------------------------------------------------------------------------------
1 | = Data import
2 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data
3 | :img: https://s3.amazonaws.com/guides.neo4j.com/meetup/img
4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/meetup
5 | :guides: https://s3.amazonaws.com/guides.neo4j.com/meetup
6 | :icons: font
7 | :neo4j-version: 3.5
8 |
9 | == Import data from Meetup API
10 |
11 | First, we need to import the data from the Meetup API.
12 | Many of the endpoints provided by Meetup.com are restricted and require an account and credentials, but for this guide, we will only query the open endpoint for event RSVPs.
13 |
14 | For additional data or analysis, you can create a free account and import from many other endpoints, as outlined in the https://www.meetup.com/meetup_api/docs/[Meetup API documentation^].
15 |
16 | == Setup: Indexes and Constraints
17 |
18 | To speed up our queries and ensure unique entities, let's go ahead and set up some constraints and indexes.
19 |
20 | *Note:* Ensure the `Enable multi statement query editor` setting is checked under `Settings` in Neo4j Browser.
21 |
22 | [source,cypher]
23 | ----
24 | CREATE INDEX ON :Member(id);
25 |
26 | CREATE INDEX ON :Event(id);
27 | CREATE INDEX ON :Event(time);
28 | CREATE INDEX ON :Event(location);
29 |
30 | CREATE INDEX ON :Group(id);
31 | CREATE INDEX ON :Group(name);
32 | CREATE INDEX ON :Group(location);
33 |
34 | CREATE INDEX ON :Venue(id);
35 | CREATE INDEX ON :Venue(location);
36 | CREATE INDEX ON :RSVP(id);
37 | CREATE INDEX ON :Topic(name);
38 | CREATE INDEX ON :Topic(urlkey);
39 |
40 | CREATE INDEX ON :City(name);
41 | CREATE INDEX ON :City(location);
42 | CREATE INDEX ON :City(population);
43 |
44 | CREATE INDEX ON :Country(iso2);
45 | CREATE INDEX ON :Country(name);
46 |
47 | CREATE CONSTRAINT ON (t:Topic) ASSERT t.id IS UNIQUE;
48 | ----
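
To confirm the indexes exist and have finished populating, we can list them. This assumes the `db.indexes` procedure, which is available in Neo4j 3.5; later versions provide `SHOW INDEXES` instead:

[source,cypher]
----
// List all indexes, including the one backing the Topic constraint
CALL db.indexes();
----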
49 |
50 | == Import data
51 |
52 | Now we can import the data with the statement below.
53 | It creates group, member, event, venue, rsvp, and topic entities in our graph.
54 |
55 | The query will take a few minutes to complete, as it is retrieving 100 RSVPs from the API and creating all of the nodes and relationships at once.
56 |
57 | *Note:* Each time this query is run, it may yield different results. The query is not filtering a specific set of RSVP data, so it will retrieve whatever is provided by the API.
58 |
59 | [source, cypher]
60 | ----
61 | WITH 'https://stream.meetup.com/2/rsvps' as url
62 | CALL apoc.load.json(url) YIELD value
63 | WITH value LIMIT 100
64 | WITH value.venue as venueData, value.member as memberData, value.event as eventData, value.group.group_topics as topics, value as data, apoc.map.removeKeys(value.group, ['group_topics']) as groupData
65 |
66 | MERGE (member:Member { id: memberData.member_id })
67 | ON CREATE SET member.name = memberData.member_name, member.photo = memberData.photo
68 |
69 | MERGE (event:Event { id: eventData.event_id })
70 | ON CREATE SET event.name = eventData.event_name, event.time = datetime({ epochMillis: coalesce(eventData.time, 0) }), event.url = eventData.event_url
71 |
72 | MERGE (group:Group { id: groupData.group_id })
73 | ON CREATE SET group.name = groupData.group_name, group.city = groupData.group_city, group.country = groupData.group_country, group.state = groupData.group_state, group.location = point({latitude: groupData.group_lat, longitude: groupData.group_lon}), group.urlname = groupData.group_urlname
74 |
75 | MERGE (venue:Venue { id: coalesce(venueData.venue_id, randomUUID()) })
76 | ON CREATE SET venue.name = venueData.venue_name, venue.location = point({latitude: venueData.lat, longitude: venueData.lon})
77 |
78 | CREATE (rsvp:RSVP {id: coalesce(data.rsvp_id, randomUUID()), guests: coalesce(data.guests, 0), mtime: datetime({ epochMillis: coalesce(data.mtime, 0) }), response: data.response, visibility: data.visibility})
79 | MERGE (rsvp)-[:MEMBER]->(member)
80 | MERGE (rsvp)-[:EVENT]->(event)
81 | MERGE (rsvp)-[:GROUP]->(group)
82 |
83 | MERGE (member)-[:RSVP]->(event)
84 | MERGE (event)<-[:HELD]-(group)
85 | MERGE (event)-[:LOCATED_AT]->(venue)
86 |
87 | WITH group, topics
88 | UNWIND topics as tp
89 | MERGE (t:Topic { urlkey: tp.urlkey })
90 | ON CREATE SET t.name = tp.topic_name
91 | MERGE (group)-[:TOPIC]->(t);
92 | ----
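
Two helpers in the import query are worth trying in isolation: `datetime({ epochMillis: ... })` turns a millisecond timestamp into a temporal value, and `coalesce` falls back to its next argument when a value is missing. The timestamp below is a made-up example, not a value from the API:

[source,cypher]
----
// 1546300800000 ms after the Unix epoch is 2019-01-01T00:00:00Z
RETURN datetime({ epochMillis: 1546300800000 }) AS eventTime,
       coalesce(null, 0) AS guests,
       coalesce(null, randomUUID()) AS generatedId;
----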
93 |
94 | == Verify data import
95 |
96 | We should now have a small data set in our graph database to query and explore!
97 | Before we dive into exploration, though, let's take a look at the data model of what we imported.
98 |
99 | [source,cypher]
100 | ----
101 | //what does our data model look like?
102 | CALL db.schema.visualization();
103 | ----
104 |
105 | == Improvements?
106 |
107 | Hm, it might be nice to have location (country/city) separated for our meetup groups so that we can easily query for groups in a certain area.
108 | Let's see if we can fix that by importing all countries and cities in the world.
109 |
110 | == Import World Cities/Countries
111 |
112 | [source,cypher,subs=attributes]
113 | ----
114 | LOAD CSV WITH HEADERS
115 | FROM '{data-url}/worldcities.csv' AS line
116 |
117 | MERGE (country:Country {name: coalesce(line.country, '')})
118 | SET country.iso2 = coalesce(line.iso2, ''), country.iso3 = coalesce(line.iso3, '')
119 |
120 | MERGE (c:City {name: coalesce(line.city, '')})
121 | SET c.id = coalesce(line.id, ''), c.asciiName = coalesce(line.city_ascii, ''), c.adminName = coalesce(line.admin_name, ''), c.capital = coalesce(line.capital, ''), c.location = point({latitude: toFloat(coalesce(line.lat, '0.0')), longitude: toFloat(coalesce(line.lng, '0.0'))}), c.population = coalesce(toInteger(coalesce(line.population, 0)), 0)
122 |
123 | MERGE (c)-[:IN]->(country);
124 | ----
125 |
126 | == Verify City/Country import
127 |
128 | We can verify our last import with a quick query searching for the city of `London`.
129 |
130 | [source,cypher]
131 | ----
132 | MATCH (c:City {name: 'London'})-[r:IN]-(o:Country)
133 | RETURN c, r, o
134 | ----
135 |
136 | A few results should come back. It looks like the United Kingdom city shares its name with cities in a couple of different states in the United States, as well as a city in Canada.
137 |
138 | Now we need to tie those locations back to our meetup groups!
139 |
140 | == Add relationships between locations, meetup groups, and events
141 |
142 | [source,cypher]
143 | ----
144 | //link groups and locations
145 | MATCH (g:Group)
146 | WITH toUpper(g.country) as iso2, g
147 | MATCH (c:Country {iso2: iso2})
148 | MERGE (g)-[r:IN]->(c)
149 | RETURN count(r);
150 | ----
151 |
152 | [source,cypher]
153 | ----
154 | //link venues and cities
155 | CALL apoc.periodic.iterate("MATCH (c:City) RETURN c.location as loc, c",
156 | "WITH loc, c, 24140.2 as FifteenMilesInMeters
157 | MATCH (v:Venue)
158 | WHERE distance(v.location, c.location) < FifteenMilesInMeters
159 | MERGE (v)-[r:NEAR]->(c)", { batchSize: 500 })
160 | YIELD batches, total
161 | RETURN batches, total;
162 | ----
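
The distance threshold is in meters because `distance` returns meters for geographic points. A sketch of the conversion, using 1609.344 meters per mile:

[source,cypher]
----
// Convert miles to meters for use with distance()
WITH 1609.344 AS metersPerMile
RETURN 15 * metersPerMile AS fifteenMilesInMeters,
       10 * metersPerMile AS tenMilesInMeters;
----

The first value is roughly 24140.2, which matches the constant used above.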
163 |
164 | == Import check
165 |
166 | Now that we have all of that data, let's take a look at our data model again, then run a few summary queries to understand what we have.
167 |
168 | [source,cypher]
169 | ----
170 | CALL db.schema.visualization();
171 | ----
172 |
173 | == Data summary queries
174 |
175 | [source,cypher]
176 | ----
177 | //How many meetup groups are in our dataset?
178 | MATCH (n:Group) RETURN count(n);
179 | ----
180 |
181 | [source,cypher]
182 | ----
183 | //find some cities with events
184 | MATCH (c:City)-[n:NEAR]-(v:Venue)-[l:LOCATED_AT]-(e:Event)
185 | RETURN * LIMIT 20;
186 | ----
187 |
188 | [source,cypher]
189 | ----
190 | //find some upcoming events
191 | MATCH (e:Event)-[l:LOCATED_AT]-(v:Venue)-[n:NEAR]-(c:City)
192 | WHERE e.time > datetime()
193 | RETURN * LIMIT 20;
194 | ----
195 |
196 | == Next
197 |
198 | In the next section, we are going to explore our data more thoroughly using queries.
199 |
200 | ifdef::env-guide[]
201 | pass:a[Data Analysis]
202 | endif::[]
203 |
204 | ifdef::env-graphgist[]
205 | link:{gist}/02_data_analysis.adoc[Data Analysis^]
206 | endif::[]
--------------------------------------------------------------------------------
/browser-guides/meetup/02_data_analysis.adoc:
--------------------------------------------------------------------------------
1 | = Data Analysis
2 | :data-url: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data
3 | :img: https://s3.amazonaws.com/guides.neo4j.com/meetup/img
4 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/meetup
5 | :icons: font
6 | :neo4j-version: 3.5
7 |
8 | == Explore Meetup data with Cypher
9 |
10 | We can dig deeper into our graph and use the relationships between entities to gain insights.
11 |
12 | First, let's see which cities have the most events.
13 |
14 | *Remember:* We imported random data from the API, so there may be varying results.
15 |
16 | [source,cypher]
17 | ----
18 | //find cities with most events
19 | MATCH (c:City)-[n:NEAR]-(v:Venue)-[l:LOCATED_AT]-(e:Event)
20 | RETURN c.name, count(e) as count ORDER BY count DESC;
21 | ----
22 |
23 | == Analysis: Topics
24 |
25 | === Query if any groups are in the Tech topic
26 | [source,cypher]
27 | ----
28 | //note: there may not be any groups for this topic in our database due to random import
29 | MATCH (t:Topic {name: 'Tech'})-[r:TOPIC]-(g:Group)
30 | RETURN *;
31 | ----
32 |
33 | === Find most popular topics
34 |
35 | [source,cypher]
36 | ----
37 | MATCH (t:Topic)-[r:TOPIC]-(g:Group)
38 | RETURN t.name, count(g) as count ORDER BY count DESC
39 | ----
40 |
41 | == Analysis: Topics
42 |
43 | === Find which users attend the most Meetups in a random topic
44 |
45 | [source,cypher]
46 | ----
47 | MATCH (t:Topic)
48 | WITH collect(t) as topics
49 | WITH apoc.coll.randomItem(topics) as targetTopic
50 | MATCH (targetTopic)-[:TOPIC]-(g:Group)-[:HELD]-(e:Event)<-[:EVENT]-(r:RSVP)-[:MEMBER]-(member:Member)
51 | RETURN targetTopic.name as topic, member.name as member, count(r) as RSVPs
52 | ORDER BY RSVPs DESC limit 10;
53 | ----
54 |
55 | == Analysis: Groups
56 |
57 | === Which group was created most recently?
58 |
59 | [source,cypher]
60 | ----
61 | MATCH (g:Group)
62 | RETURN g
63 | ORDER BY g.created DESC
64 | LIMIT 1
65 | ----
66 |
67 | === How many groups have been running for at least 1 year?
68 |
69 | [source,cypher]
70 | ----
71 | //note: this returns 0 unless groups have a created property (the import query in the previous guide does not set one)
72 | MATCH (g:Group)
73 | WHERE (timestamp() - g.created) / 1000 / 3600 / 24 / 365 >= 1
74 | RETURN count(g)
75 | ----
76 |
77 | == Analysis: Groups
78 |
79 | === Find groups with 'Neo4j' or 'Data' in their name.
80 |
81 | [source,cypher]
82 | ----
83 | MATCH (g:Group)
84 | WHERE g.name CONTAINS 'Neo4j' OR g.name CONTAINS 'Data'
85 | RETURN g
86 | ----
87 |
88 | === What are the distinct topics for those groups?
89 |
90 | [source,cypher]
91 | ----
92 | MATCH (g:Group)-[:TOPIC]->(t:Topic)
93 | WHERE g.name CONTAINS 'Neo4j' OR g.name CONTAINS 'Data'
94 | RETURN t.name, count(*)
95 | ----
96 |
97 | == Analysis: Events
98 |
99 | === Who brings the most guests?
100 |
101 | [source,cypher]
102 | ----
103 | MATCH (r:RSVP)-[:MEMBER]->(m:Member)
104 | WHERE r.guests > 5
105 | RETURN m.name, sum(r.guests) as totalGuests
106 | ORDER BY totalGuests DESC limit 10;
107 | ----
108 |
109 | === Which venue hosts the most meetups?
110 |
111 | [source,cypher]
112 | ----
113 | MATCH (v:Venue)<-[:LOCATED_AT]-(e:Event)
114 | WHERE v.name IS NOT NULL
115 | RETURN v.name, v.location, count(e) as events
116 | ORDER BY events desc
117 | LIMIT 10;
118 | ----
119 |
120 | == Analysis: Events
121 |
122 | === Find meetups a random venue has hosted
123 |
124 | [source,cypher]
125 | ----
126 | MATCH (v:Venue)
127 | WHERE v.name IS NOT NULL
128 | WITH collect(v) as venues
129 | WITH apoc.coll.randomItem(venues) as venue
130 | MATCH (venue)<-[:LOCATED_AT]-(e:Event)<-[:HELD]-(g:Group),
131 | (e)-[:EVENT]-(r:RSVP)
132 | RETURN venue.name, venue.location, e.name, g.name, count(r) as RSVPs
133 | LIMIT 10;
134 | ----
135 |
136 | == Analysis: Shortest paths
137 |
138 | === Shortest paths between random venues
139 |
140 | [source,cypher]
141 | ----
142 | MATCH (v:Venue)
143 | WHERE v.name IS NOT NULL
144 | WITH collect(v) as venues
145 | WITH apoc.coll.randomItem(venues) as v1, apoc.coll.randomItem(venues) as v2
146 | MATCH p=shortestPath((v1)-[*]-(v2))
147 | RETURN p;
148 | ----
149 |
150 | === Shortest path between two random topics
151 |
152 | [source,cypher]
153 | ----
154 | MATCH (t:Topic)
155 | WITH collect(t) as topics
156 | WITH apoc.coll.randomItem(topics) as t1, apoc.coll.randomItem(topics) as t2
157 | MATCH p=shortestPath((t1)-[*]-(t2))
158 | RETURN p;
159 | ----
160 |
161 | == Analysis: Shortest paths
162 |
163 | === Shortest path among 3 random members
164 |
165 | [source,cypher]
166 | ----
167 | MATCH (m:Member)
168 | WITH collect(m) as members
169 | WITH apoc.coll.randomItem(members) as m1, apoc.coll.randomItem(members) as m2, apoc.coll.randomItem(members) as m3
170 | MATCH p1=shortestPath((m1)-[*]-(m2)),
171 | p2=shortestPath((m2)-[*]-(m3)),
172 | p3=shortestPath((m1)-[*]-(m3))
173 | RETURN p1, p2, p3;
174 | ----
175 |
176 | == Analysis: Find events in area
177 |
178 | === Find future Richmond meetups within 10 miles of downtown
179 |
180 | [source,cypher]
181 | ----
182 | WITH point({ latitude: 37.5407246, longitude: -77.4360481 }) as RichmondVA, 16093.4 as TenMiles /* 10 mi expressed in meters (1 mi = 1609.344 m) */
183 | MATCH (v:Venue)<-[:LOCATED_AT]-(e:Event)-[:HELD]-(g:Group)
184 | WHERE distance(v.location, RichmondVA) < TenMiles AND e.time > datetime()
185 | RETURN g.name as GroupName, e.name as EventName, e.time as When, v.name as Venue limit 10;
186 | ----
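
`distance()` on WGS-84 points returns meters, so a radius given in miles has to be converted before it is compared. A small sketch of that arithmetic, using the international mile of 1609.344 m (the `miles` and `meters` names are illustrative only):

[source,cypher]
----
// 1 international mile = 1609.344 meters
WITH 10 AS miles
RETURN miles * 1609.344 AS meters;
----

This guide targets Neo4j 3.5; on Neo4j 5 and later the equivalent function is named `point.distance()`.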
187 |
188 | == Analysis: Find events in area
189 |
190 | === Find events within distance of random location
191 |
192 | [source,cypher]
193 | ----
194 | WITH rand() * 90 * (CASE WHEN rand() <= 0.5 THEN 1 ELSE -1 END) as randLat, rand() * 180 * (CASE WHEN rand() <= 0.5 THEN 1 ELSE -1 END) as randLon
195 | WITH point({ latitude: randLat, longitude: randLon }) as randomLocation
196 | MATCH (v:Venue)-[:NEAR]->(city:City)-[:IN]->(c:Country)
197 | RETURN city.name as City,
198 | c.name as Country,
199 | v.name as Venue,
200 | v.location as VenueLocation,
201 | randomLocation as RandomLocation,
202 | distance(v.location, randomLocation) as DistanceInMeters
203 | ORDER BY distance(v.location, randomLocation) ASC
204 | LIMIT 1;
205 | ----
206 |
207 | == Analysis: Find events in area
208 |
209 | === Find upcoming dance events in Manhattan
210 |
211 | [source,cypher]
212 | ----
213 | WITH point({ latitude: 40.758896, longitude: -73.985130 }) as TimesSquareManhattan, 16093.4 as TenMiles
214 | MATCH (v:Venue)<-[:LOCATED_AT]-(e:Event),
215 | (e)-[:HELD]-(g:Group),
216 | (g)-[:TOPIC]->(t:Topic),
217 | (e)<-[:EVENT]-(r:RSVP)
218 | WHERE e.time >= datetime("2018-09-06T00:00:00Z") AND
219 | e.time <= datetime("2018-09-06T23:59:59Z") AND
220 | distance(v.location, TimesSquareManhattan) < TenMiles AND
221 | v.name is not null AND
222 | t.name =~ '(?i).*dancing.*'
223 | RETURN g.name as GroupName,
224 | collect(distinct t.name) as topics,
225 | e.name as EventName,
226 | count(r) as RSVPs,
227 | e.time as When,
228 | v.name as Venue
229 | ORDER BY RSVPs DESC
230 | LIMIT 100;
231 | ----
232 |
233 | == Next
234 |
235 | We have seen how to use Cypher to import and analyze meetup data from the Meetup API.
236 | We can continue analysis with additional queries, import other data for more layers, and more!
--------------------------------------------------------------------------------
/browser-guides/meetup/meetup.adoc:
--------------------------------------------------------------------------------
1 | = Analyzing Meetup Data with Neo4j
2 | :author: Neo4j Devrel
3 | :description: Analyze API data from Meetup.com with Neo4j
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/meetup/img
5 | :gist: https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/meetup
6 | :guides: https://s3.amazonaws.com/guides.neo4j.com/meetup
7 | :tags: cypher, data-analysis, similarity, import, load-csv
8 | :neo4j-version: 3.5
9 | :icons: font
10 |
11 | == Import and Analyze data from Meetup.com API
12 |
13 | image::{img}/meetup.png[float=right]
14 |
15 | In this guide, we will call the Meetup API that provides data for meetup groups, topics, members, events, and more to import the data set to Neo4j.
16 | Once imported, we can explore the data as a graph using the Cypher query language to retrieve and discover insights and interesting details.
17 |
18 | Table of Contents:
19 |
20 | ifdef::env-guide[]
21 | . pass:a[Data Import]
22 | . pass:a[Data Analysis]
23 | endif::[]
24 |
25 | ifdef::env-graphgist[]
26 | . link:{gist}/01_meetup_import.adoc[Data Import^]
27 | . link:{gist}/02_data_analysis.adoc[Data Analysis^]
28 | endif::[]
29 |
30 | == Further Resources
31 |
32 | * https://neo4j.com/graphgists[Graph Gist Examples]
33 | * https://neo4j.com/docs/cypher-refcard/current/[Cypher Reference Card]
34 | * https://neo4j.com/docs/cypher-manual/current/[Cypher Manual]
35 | * https://neo4j.com/developer/cypher/resources/[Cypher Resources]
36 | * https://graphdatabases.com[e-book: Graph Databases (free)]
--------------------------------------------------------------------------------
/browser-guides/restaurant_recommendation/restaurant_recommendation.adoc:
--------------------------------------------------------------------------------
1 | = Restaurant Recommendations
2 | :author: Neo4j
3 | :description: Understand and build a small recommendation engine
4 | :img: https://s3.amazonaws.com/guides.neo4j.com/restaurant_recommendation/img
5 | :tags: recommendation, graph-search, introduction
6 | :neo4j-version: 3.5
7 | :icons: font
8 |
9 | == Restaurant Recommendations: Introduction
10 |
11 | image::{img}/restaurant_recommendation_model.png[height=300,float=right]
12 |
13 | We want to demonstrate how easy it is to model a domain as a graph and answer questions in almost-natural language.
14 |
15 | Graph-based search and discovery is a prominent use case for graph databases like https://neo4j.com[Neo4j].
16 |
17 | Here, we use a domain of restaurants that serve cuisines and are located in a city.
18 |
19 | The domain diagram was created with the http://www.apcjones.com/arrows/[Arrows tool].
20 |
21 | == Setup: Creating Friends, Restaurants, Cities, and Cuisines
22 |
23 | We will create a small example graph of people with cuisines they like and the restaurants serving those cuisines.
24 | Our people are in the same social circle (friend relationships), so we can create recommendations of cuisines and restaurants others will like based on their social connections and their preferences.
25 |
26 | [source,cypher]
27 | ----
28 | CREATE (philip:Person {name:"Philip"})-[:IS_FRIEND_OF]->(emil:Person {name:"Emil"}),
29 | (philip)-[:IS_FRIEND_OF]->(michael:Person {name:"Michael"}),
30 | (philip)-[:IS_FRIEND_OF]->(andreas:Person {name:"Andreas"})
31 | CREATE (sushi:Cuisine {name:"Sushi"}), (nyc:City {name:"New York"}),
32 | (iSushi:Restaurant {name:"iSushi"})-[:SERVES]->(sushi),(iSushi)-[:LOCATED_IN]->(nyc),
33 | (michael)-[:LIKES]->(iSushi),
34 | (andreas)-[:LIKES]->(iSushi),
35 | (zam:Restaurant {name:"Zushi Zam"})-[:SERVES]->(sushi),(zam)-[:LOCATED_IN]->(nyc),
36 | (andreas)-[:LIKES]->(zam)
37 | ----
38 |
39 | == Setup: Philip's Friends
40 |
41 | First, let's look at some of our graph data and find who is friends with Philip.
42 |
43 | [source,cypher]
44 | ----
45 | MATCH (philip:Person {name:'Philip'})-[:IS_FRIEND_OF]-(person)
46 | RETURN person.name
47 | ----
48 |
49 | We should see 3 friends of Philip in our graph - Andreas, Michael, and Emil.
50 |
51 | == Restaurants in NYC and their cuisines
52 |
53 | Now let's look at restaurants and the cities where they are located with the cuisines they serve.
54 |
55 | [source,cypher]
56 | ----
57 | MATCH (nyc:City {name:'New York'})<-[:LOCATED_IN]-(restaurant)-[:SERVES]->(cuisine)
58 | RETURN nyc, restaurant, cuisine
59 | ----
60 |
61 | This query should show us nodes and relationships for the `City` of New York, 2 restaurants `LOCATED_IN` that city, and that each restaurant `SERVES` the `Cuisine` of sushi.
62 |
63 | == Graph Search Recommendation
64 |
65 | image::{img}/sushi_restaurants_nyc.png[height=300,float=right]
66 |
67 | Now that we have an idea what our data looks like, we can start recommending things based on the relationships connecting our people, location, cuisines, and restaurants.
68 |
69 | We want to make a recommendation for Philip by answering the following question:
70 |
71 | ""
72 | Find Sushi Restaurants in New York that Philip's friends like.
73 | ""
74 |
75 | == Recommendation criteria
76 |
77 | To answer this question, we need to find our starting point - _Philip_ needs the recommendation, so his node is where we start our search in the graph.
78 | Now we need to determine which parts of the graph to search using the following criteria from the question:
79 |
80 | * Find _Philip_ and his friends
81 | * Find _Restaurants_ that are located in _New York_
82 | * Find _Restaurants_ that serve the cuisine _sushi_
83 | * Find _Restaurants_ that _Philip's friends_ like
84 |
85 | == Recommendation query
86 |
87 | With those criteria, we construct this query:
88 |
89 | [source,cypher]
90 | ----
91 | MATCH (philip:Person {name: 'Philip'}),
92 | (philip)-[:IS_FRIEND_OF]-(friend),
93 | (restaurant:Restaurant)-[:LOCATED_IN]->(:City {name: 'New York'}),
94 | (restaurant)-[:SERVES]->(:Cuisine {name: 'Sushi'}),
95 | (friend)-[:LIKES]->(restaurant)
96 | RETURN restaurant.name as restaurantName, collect(friend.name) AS recommendedBy, count(*) AS numberOfRecommendations
97 | ORDER BY numberOfRecommendations DESC
98 | ----
99 |
100 | This tells us that 2 of Philip's friends recommend iSushi restaurant for sushi, and 1 of his friends recommends Zushi Zam restaurant for sushi.
101 |
102 | == More on recommendations
103 |
104 | Larger graphs and deeper relationship paths can add complexity and power to recommendation engines. This example shows the beginning steps and logic for building these systems using the relationships in the network to recommend products, hobbies, services, similarities, and more.
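
As one illustration of a deeper relationship path, the sketch below widens the search from direct friends to friends of friends with a variable-length pattern. It is only a sketch against the small example graph in this guide: the `*1..2` depth is an arbitrary choice, and in this data set it returns the same restaurants as the query above because no second-level friends exist.

[source,cypher]
----
// Reach friends (1 hop) and friends of friends (2 hops)
MATCH (philip:Person {name: 'Philip'})-[:IS_FRIEND_OF*1..2]-(person:Person),
      (person)-[:LIKES]->(restaurant:Restaurant)-[:SERVES]->(:Cuisine {name: 'Sushi'})
WHERE person <> philip
RETURN restaurant.name AS restaurantName,
       collect(DISTINCT person.name) AS recommendedBy,
       count(DISTINCT person) AS numberOfRecommendations
ORDER BY numberOfRecommendations DESC
----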
105 |
106 | * https://neo4j.com/use-cases/real-time-recommendation-engine/[Use case: Recommendations Engine]
107 | * https://neo4j.com/developer/cypher/guide-build-a-recommendation-engine/[Tutorial: Building Recommendation Engine]
--------------------------------------------------------------------------------
/fraud/BankFraud-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/fraud/BankFraud-1.png
--------------------------------------------------------------------------------
/fraud/bank-fraud-detection.adoc:
--------------------------------------------------------------------------------
1 | = Bank Fraud Detection
2 | :neo4j-version: 2.3.0-RC1
3 | :author: Kenny Bastani
4 | :twitter: @kennybastani
5 | :domain: finance
6 | :use-case: fraud-detection
7 |
8 | == Introduction to Problem
9 |
10 | (original source: https://github.com/neo4j-contrib/gists/blob/master/other/BankFraudDetection.adoc )
11 |
12 | This interactive Neo4j graph tutorial covers bank fraud detection scenarios.
13 |
14 | Banks and Insurance companies lose billions of dollars every year to fraud.
15 | Traditional methods of fraud detection play an important role in minimizing these losses.
16 | However, increasingly sophisticated fraudsters have developed a variety of ways to elude discovery, both by working together and by leveraging various other means of constructing false identities.
17 |
18 | '''
19 |
20 | == Explanation of Scenario
21 |
22 | While no fraud prevention measures can ever be perfect, significant opportunity for improvement lies in looking beyond the individual data points, to the connections that link them.
23 | These connections often go unnoticed until it is too late, which is unfortunate because they frequently hold the best clues.
24 |
25 | === Typical Scenario
26 |
27 | While the exact details behind each first-party fraud collusion vary from operation to operation, the pattern below illustrates how fraud rings commonly operate:
28 |
29 | * A group of two or more people organize into a fraud ring
30 | * The ring shares a subset of legitimate contact information, e.g., phone numbers and addresses, combining them to create a number of fictional identities
31 | * Ring members open accounts using these fictional identities
32 | * New accounts are added to the original ones: unsecured credit lines, credit cards, overdraft protection, personal loans, etc.
33 | * The accounts are used normally, with regular purchases and timely payments
34 | * Banks increase the revolving credit lines over time, due to the observed responsible credit behavior
35 | * One day the ring "busts out", coordinating their activity, maxing out all of their credit lines, and disappearing
36 | * Sometimes fraudsters will go a step further and bring all of their balances to zero using fake checks immediately before the prior step, doubling the damage
37 | * Collections processes ensue, but agents are never able to reach the fraudster
38 | * The uncollectible debt is written off
39 |
40 | '''
41 |
42 | == Explanation of Solution
43 |
44 | Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high degree of accuracy, and are capable of stopping advanced fraud scenarios in real time.
45 |
46 | === How Graph Databases Can Help
47 |
48 | Augmenting one's existing fraud detection infrastructure to support ring detection can be done by running appropriate entity link analysis queries using a graph database, and running checks during key stages in the customer & account lifecycle, such as:
49 |
50 | * At the time the account is created
51 | * During an investigation
52 | * As soon as a credit balance threshold is hit
53 | * When a check is bounced
54 |
55 | Real-time graph traversals tied to the right kinds of events can help banks identify probable fraud rings during, or even before, the bust-out.
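
As a rough sketch of such a check at account creation, the query below looks for existing account holders who share contact information with one given holder. It assumes the labels, relationship types, and `UniqueId` property of the sample data set defined later in this guide; `JaneAppleseed` merely stands in for the newly created account holder.

[source,cypher]
----
// Assumption: 'JaneAppleseed' plays the role of the new account holder
MATCH (newHolder:AccountHolder { UniqueId: "JaneAppleseed" })-[:HAS_ADDRESS|HAS_PHONENUMBER|HAS_SSN]->(contactInformation),
      (contactInformation)<-[:HAS_ADDRESS|HAS_PHONENUMBER|HAS_SSN]-(other:AccountHolder)
// One row per other holder, listing which kinds of contact data are shared
RETURN other.UniqueId AS SharesWith,
       collect(labels(contactInformation)[0]) AS SharedContactTypes
----

A holder that shows up with several shared contact types could then be routed to manual review before the account is approved.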
56 |
57 | '''
58 |
59 | == Bank Fraud Graph Data Model
60 |
61 | Graph databases have emerged as an ideal tool for overcoming these hurdles.
62 | Languages like Cypher provide simple semantics for detecting rings in the graph, navigating connections in memory, in real time.
63 |
64 | The graph data model below represents how the data actually looks to the graph database, and illustrates how one can find rings by simply walking the graph:
65 |
66 | .Bank Fraud
67 | image::https://raw.github.com/neo4j-contrib/gists/master/other/images/BankFraud-1.png[Bank Fraud]
68 |
69 | '''
70 |
71 | == Sample Data Set
72 |
73 | //hide
74 | //setup
75 | [source,cypher]
76 | ----
77 |
78 | // Create account holders
79 | CREATE (accountHolder1:AccountHolder {
80 | FirstName: "John",
81 | LastName: "Doe",
82 | UniqueId: "JohnDoe" })
83 |
84 | CREATE (accountHolder2:AccountHolder {
85 | FirstName: "Jane",
86 | LastName: "Appleseed",
87 | UniqueId: "JaneAppleseed" })
88 |
89 | CREATE (accountHolder3:AccountHolder {
90 | FirstName: "Matt",
91 | LastName: "Smith",
92 | UniqueId: "MattSmith" })
93 |
94 | // Create Address
95 | CREATE (address1:Address {
96 | Street: "123 NW 1st Street",
97 | City: "San Francisco",
98 | State: "California",
99 | ZipCode: "94101" })
100 |
101 | // Connect 3 account holders to 1 address
102 | CREATE (accountHolder1)-[:HAS_ADDRESS]->(address1),
103 | (accountHolder2)-[:HAS_ADDRESS]->(address1),
104 | (accountHolder3)-[:HAS_ADDRESS]->(address1)
105 |
106 | // Create Phone Number
107 | CREATE (phoneNumber1:PhoneNumber { PhoneNumber: "555-555-5555" })
108 |
109 | // Connect 2 account holders to 1 phone number
110 | CREATE (accountHolder1)-[:HAS_PHONENUMBER]->(phoneNumber1),
111 | (accountHolder2)-[:HAS_PHONENUMBER]->(phoneNumber1)
112 |
113 | // Create SSN
114 | CREATE (ssn1:SSN { SSN: "241-23-1234" })
115 |
116 | // Connect 2 account holders to 1 SSN
117 | CREATE (accountHolder2)-[:HAS_SSN]->(ssn1),
118 | (accountHolder3)-[:HAS_SSN]->(ssn1)
119 |
120 | // Create SSN and connect 1 account holder
121 | CREATE (ssn2:SSN { SSN: "241-23-4567" })<-[:HAS_SSN]-(accountHolder1)
122 |
123 | // Create Credit Card and connect 1 account holder
124 | CREATE (creditCard1:CreditCard {
125 | AccountNumber: "1234567890123456",
126 | Limit: 5000, Balance: 1442.23,
127 | ExpirationDate: "01-20",
128 | SecurityCode: "123" })<-[:HAS_CREDITCARD]-(accountHolder1)
129 |
130 | // Create Bank Account and connect 1 account holder
131 | CREATE (bankAccount1:BankAccount {
132 | AccountNumber: "2345678901234567",
133 | Balance: 7054.43 })<-[:HAS_BANKACCOUNT]-(accountHolder1)
134 |
135 | // Create Credit Card and connect 1 account holder
136 | CREATE (creditCard2:CreditCard {
137 | AccountNumber: "1234567890123456",
138 | Limit: 4000, Balance: 2345.56,
139 | ExpirationDate: "02-20",
140 | SecurityCode: "456" })<-[:HAS_CREDITCARD]-(accountHolder2)
141 |
142 | // Create Bank Account and connect 1 account holder
143 | CREATE (bankAccount2:BankAccount {
144 | AccountNumber: "3456789012345678",
145 | Balance: 4231.12 })<-[:HAS_BANKACCOUNT]-(accountHolder2)
146 |
147 | // Create Unsecured Loan and connect 1 account holder
148 | CREATE (unsecuredLoan2:UnsecuredLoan {
149 | AccountNumber: "4567890123456789-0",
150 | Balance: 9045.53,
151 | APR: .0541,
152 | LoanAmount: 12000.00 })<-[:HAS_UNSECUREDLOAN]-(accountHolder2)
153 |
154 | // Create Bank Account and connect 1 account holder
155 | CREATE (bankAccount3:BankAccount {
156 | AccountNumber: "4567890123456789",
157 | Balance: 12345.45 })<-[:HAS_BANKACCOUNT]-(accountHolder3)
158 |
159 | // Create Unsecured Loan and connect 1 account holder
160 | CREATE (unsecuredLoan3:UnsecuredLoan {
161 | AccountNumber: "5678901234567890-0",
162 | Balance: 16341.95, APR: .0341,
163 | LoanAmount: 22000.00 })<-[:HAS_UNSECUREDLOAN]-(accountHolder3)
164 |
165 | // Create Phone Number and connect 1 account holder
166 | CREATE (phoneNumber2:PhoneNumber {
167 | PhoneNumber: "555-555-1234" })<-[:HAS_PHONENUMBER]-(accountHolder3)
168 |
169 | RETURN *
170 | ----
171 |
172 | //graph
173 |
174 | '''
175 |
176 | == Entity Link Analysis
177 |
178 | Performing entity link analysis on the above data model is demonstrated below.
179 | The brackets in the tables below isolate the individual elements of a http://neo4j.com/docs/stable/syntax-collections.html[collection].
180 |
181 | == Find account holders who share more than one piece of legitimate contact information
182 |
183 | [source,cypher]
184 | ----
185 | MATCH (accountHolder:AccountHolder)-[]->(contactInformation)
186 | WITH contactInformation,
187 | count(accountHolder) AS RingSize
188 | MATCH (contactInformation)<-[]-(accountHolder)
189 | WITH collect(accountHolder.UniqueId) AS AccountHolders,
190 | contactInformation, RingSize
191 | WHERE RingSize > 1
192 | RETURN AccountHolders AS FraudRing,
193 | labels(contactInformation) AS ContactType,
194 | RingSize
195 | ORDER BY RingSize DESC
196 | ----
197 |
198 | //output
199 | //table
200 |
201 |
202 |
203 | == Determine the financial risk of a possible fraud ring
204 |
205 | [source,cypher]
206 | ----
207 | MATCH (accountHolder:AccountHolder)-[]->(contactInformation)
208 | WITH contactInformation,
209 | count(accountHolder) AS RingSize
210 | MATCH (contactInformation)<-[]-(accountHolder),
211 | (accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount)
212 | WITH collect(DISTINCT accountHolder.UniqueId) AS AccountHolders,
213 | contactInformation, RingSize,
214 | SUM(CASE type(r)
215 | WHEN 'HAS_CREDITCARD' THEN unsecuredAccount.Limit
216 | WHEN 'HAS_UNSECUREDLOAN' THEN unsecuredAccount.Balance
217 | ELSE 0
218 | END) as FinancialRisk
219 | WHERE RingSize > 1
220 | RETURN AccountHolders AS FraudRing,
221 | labels(contactInformation) AS ContactType,
222 | RingSize,
223 | round(FinancialRisk) as FinancialRisk
224 | ORDER BY FinancialRisk DESC
225 | ----
226 |
227 | //output
228 | //table
229 |
230 | //console
231 |
--------------------------------------------------------------------------------
/index.adoc:
--------------------------------------------------------------------------------
1 | == Neo4j Use-Case Examples as Guides and GraphGists
2 | :graphgist: http://neo4j.com/graphgist
3 | :guides: http://guides.neo4j.com/graphgists
4 |
5 | Let's look at some interesting use case examples in detail.
6 |
7 | . pass:a[Bank Fraud Detection] {graphgist}/9d627127-003b-411a-b3ce-f8d3970c2afa[(GraphGist)]
8 | . pass:a[Books Management Graph] {graphgist}/56c4ceb8-0af1-4d36-b14c-aaa482dc2abc[(GraphGist)]
9 | . pass:a[Analyzing Offshore Leaks] {graphgist}/ec65c2fa-9d83-4894-bc1e-98c475c7b57a[(GraphGist)]
10 | . pass:a[Network Dependency Graph] {graphgist}/306bb0c7-9820-4c29-9835-15625e4e9f96[(GraphGist)]
11 | . pass:a[Job Recommendation System] {graphgist}/4cea8113-30e9-46bc-bbb0-06236a9bd8b9[(GraphGist)]
12 |
13 | === Other Resources
14 |
15 | * http://neo4j.com/graphgists[All Graph Gists]
16 | * http://portal.graphgist.org[GraphGist Author Portal]
17 |
--------------------------------------------------------------------------------
/medical/DoctorFinder.adoc:
--------------------------------------------------------------------------------
1 | = DoctorFinder!
2 | :neo4j-version: 2.3.0
3 | :author: The Vidal Team
4 | :twitter: @fbiville
5 |
6 | :toc:
7 |
8 | This GraphGist represents a mobile application backend helping users to find adequate drugs and specialists given their physical characteristics, location and current symptoms.
9 |
10 | == Our resulting model
11 |
12 | [[img-model]]
13 | .DoctorFinder model
14 | image::http://img15.hostingpics.net/pics/800451GraphGist.png[DoctorFinder! model, 854, 500]
15 |
16 | //hide
17 | //setup
18 | [source,cypher]
19 | -------
20 | CREATE
21 | (_6:DrugClass {name:"Bronchodilators"}),
22 | (_7:DrugClass {name:"Corticosteroids"}),
23 | (_8:DrugClass {name:"Xanthine"}),
24 | (_9:Drug {name:"Salbutamol"}),
25 | (_10:Drug {name:"Terbutaline"}),
26 | (_11:Drug {name:"Bambuterol"}),
27 | (_12:Drug {name:"Formoterol"}),
28 | (_13:Drug {name:"Salmeterol"}),
29 | (_14:Drug {name:"Beclometasone"}),
30 | (_15:Drug {name:"Budesonide"}),
31 | (_16:Drug {name:"Ciclesonide"}),
32 | (_17:Drug {name:"Fluticasone"}),
33 | (_18:Drug {name:"Mometasone"}),
34 | (_19:Drug {name:"Betametasone"}),
35 | (_20:Drug {name:"Prednisolone"}),
36 | (_21:Drug {name:"Dilatrane"}),
37 | (_22:Allergy {name:"Hypersensitivity to Betametasone"}),
38 | (_23:Pathology {name:"Asthma"}),
39 | (_24:Symptom {name:"Wheezing"}),
40 | (_25:Symptom {name:"Chest tightness"}),
41 | (_26:Symptom {name:"Cough"}),
42 | (_27:Doctor {latitude:48.8573,longitude:2.35685,name:"Irving Matrix"}),
43 | (_28:Doctor {latitude:46.83144,longitude:-71.28454,name:"Jack McKee"}),
44 | (_29:Doctor {latitude:48.86982,longitude:2.32503,name:"Michaela Quinn"}),
45 | (_30:DoctorSpecialization {name:"Physician"}),
46 | (_31:DoctorSpecialization {name:"Angiologist"}),
47 | (_6)-[:CURES {age_max:60,age_min:18,indication:"Adult asthma"}]->(_23),
48 | (_7)-[:CURES {age_max:18,age_min:5,indication:"Child asthma"}]->(_23),
49 | (_8)-[:CURES {age_max:60,age_min:18,indication:"Adult asthma"}]->(_23),
50 | (_9)-[:BELONGS_TO_CLASS]->(_6),
51 | (_10)-[:BELONGS_TO_CLASS]->(_6),
52 | (_11)-[:BELONGS_TO_CLASS]->(_6),
53 | (_12)-[:BELONGS_TO_CLASS]->(_6),
54 | (_13)-[:BELONGS_TO_CLASS]->(_6),
55 | (_14)-[:BELONGS_TO_CLASS]->(_7),
56 | (_15)-[:BELONGS_TO_CLASS]->(_7),
57 | (_16)-[:BELONGS_TO_CLASS]->(_7),
58 | (_17)-[:BELONGS_TO_CLASS]->(_7),
59 | (_18)-[:BELONGS_TO_CLASS]->(_7),
60 | (_19)-[:BELONGS_TO_CLASS]->(_6),
61 | (_19)-[:BELONGS_TO_CLASS]->(_7),
62 | (_19)-[:MAY_CAUSE_ALLERGY]->(_22),
63 | (_20)-[:BELONGS_TO_CLASS]->(_7),
64 | (_21)-[:BELONGS_TO_CLASS]->(_8),
65 | (_23)-[:MAY_MANIFEST_SYMPTOMS]->(_24),
66 | (_23)-[:MAY_MANIFEST_SYMPTOMS]->(_25),
67 | (_23)-[:MAY_MANIFEST_SYMPTOMS]->(_26),
68 | (_27)-[:SPECIALISES_IN]->(_31),
69 | (_28)-[:SPECIALISES_IN]->(_31),
70 | (_29)-[:SPECIALISES_IN]->(_30),
71 | (_30)-[:CAN_PRESCRIBE]->(_7),
72 | (_31)-[:CAN_PRESCRIBE]->(_6)
73 | -------
74 | //graph
75 |
76 |
77 | From VIDAL with ♥ (`Suzanne`, `Nicolas`, `Édouard`, `Marouane`, `Sébastian`, `Thibaut`, `Olivier`, `Sylvain`, `Florent` (aka Cypher translator)).
78 |
79 | == User stories
80 |
81 | === Symptom autocompletion
82 |
83 | > **As** an application user, +
84 | > **When** I start typing my symptoms +
85 | > **Then** matching symptoms are returned in alphabetical order.
86 |
87 | ==== Example
88 |
89 | User types 'c'.
90 |
91 | [source,cypher]
92 | ----
93 | MATCH (s:Symptom)
94 | WHERE UPPER(s.name)=~ UPPER('c.*')
95 | RETURN s.name AS `Symptom`
96 | ORDER BY s.name ASC
97 | ----
98 | //table
99 |
100 | For simplicity's sake, this query will not be included in the following examples.
101 | However, it would definitely be the first clause of each (as the user types only the beginning of each symptom).
102 | Subsequent queries will assume symptom names were resolved by this first sub-query.
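
A regular expression built from user input treats characters such as `.` or `*` as pattern syntax. A possible alternative is a plain prefix match, sketched below under the assumption that the Neo4j version in use provides both `STARTS WITH` and `toLower()`.

[source,cypher]
----
MATCH (s:Symptom)
// toLower() on both sides keeps the prefix match case-insensitive
WHERE toLower(s.name) STARTS WITH toLower('c')
RETURN s.name AS `Symptom`
ORDER BY s.name ASC
----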
103 |
104 | === Drug advisor
105 |
106 | > **As** an application user, +
107 | > **When** I start typing my symptoms
108 | >
109 | > **Then** adequate drugs are returned, grouped by their therapeutic class.
110 |
111 | ==== Example
112 |
113 | Current user is a 35-year-old man, manifesting **wheezing** and **chest tightness**, and suffering from a **hypersensitivity to Betametasone** allergy.
114 |
115 | We expect all drugs of class `Bronchodilators` (`Betametasone` drug excluded, because of the aforementioned allergy) and `Xanthine` to appear as they are the only therapeutic classes suitable for adults in our dataset.
116 |
117 | [source,cypher]
118 | ----
119 | MATCH (patho:Pathology)-[:MAY_MANIFEST_SYMPTOMS]->(symptoms:Symptom)
120 | WHERE symptoms.name IN ['Chest tightness', 'Wheezing']
121 | WITH patho
122 |
123 | MATCH (DrugClass:DrugClass)-[cures:CURES]->(patho)
124 | WHERE cures.age_min <= 35 AND 35 < cures.age_max
125 | WITH DrugClass
126 |
127 | MATCH (drug:Drug)-[:BELONGS_TO_CLASS]->(DrugClass), (allergy:Allergy)
128 | WHERE allergy.name IN ['Hypersensitivity to Betametasone']
129 | AND (NOT (drug)-[:MAY_CAUSE_ALLERGY]->(allergy))
130 | RETURN DrugClass.name AS `Therapeutic class`, COLLECT(DISTINCT drug.name) AS `Drugs`;
131 | ----
132 | //table
133 |
134 | === Doctor finder
135 |
136 | > **As** an application user, +
137 | > **When** I start typing my symptoms
138 | >
139 | > **Then** the doctors who (ahah!) can prescribe adequate drugs are returned with these drugs, ordered by proximity.
140 |
141 | See the definition above for what 'adequate drugs' means.
142 | If drugs can be purchased without prescription, the mention 'No doctor required' for these drugs should be returned, with a distance to user home of **0**.
143 |
144 | ==== Example
145 |
146 | Current user is a 19-year-old woman, manifesting **cough**, suffering from a hypersensitivity to Betametasone allergy and living at '14, rue de Bruxelles 75009 PARIS, FRANCE' (latitude:48.88344, longitude:2.33180).
147 |
148 | We expect all angiologists to be returned as the drugs they can prescribe can cure illnesses related to the user symptom.
149 |
150 | Moreover, drugs of class `Xanthine` do not require a prescription and they can cure the same kind of illnesses as well.
151 |
152 | [source,cypher]
153 | ----
154 | MATCH (patho:Pathology)-[:MAY_MANIFEST_SYMPTOMS]->(symptoms:Symptom)
155 | WHERE symptoms.name IN ['Cough']
156 | WITH patho
157 |
158 | MATCH (DrugClass:DrugClass)-[cures:CURES]->(patho)
159 | WHERE cures.age_min <= 19 AND 19 < cures.age_max
160 | WITH DrugClass
161 |
162 | MATCH (drug:Drug)-[:BELONGS_TO_CLASS]->(DrugClass), (allergy:Allergy)
163 | WHERE allergy.name IN ['Hypersensitivity to Betametasone']
164 | AND (NOT (drug)-[:MAY_CAUSE_ALLERGY]->(allergy))
165 | WITH DrugClass, drug
166 |
167 | OPTIONAL MATCH (doctor:Doctor)-->(spe:DoctorSpecialization)-[:CAN_PRESCRIBE]->(DrugClass)
168 | RETURN COALESCE(doctor.name + ' (' + spe.name + ')', 'No doctor required') AS `Doctor`, COLLECT(DISTINCT drug.name) AS `Drugs for your symptoms`, 2 * 6371 * asin(sqrt(haversin(radians(48.88344 - COALESCE(doctor.latitude,48.88344))) + cos(radians(48.88344)) * cos(radians(COALESCE(doctor.latitude,90)))* haversin(radians(2.33180 - COALESCE(doctor.longitude,2.33180))))) AS `Distance to home (km)`
169 | ORDER BY `Distance to home (km)` ASC;
170 | ----
171 | //table
172 |
173 | As obfuscated as it looks, the distance computation is just a null-safe variant of the haversine formula explained in the Cypher manual (indeed, there are drugs that do not require a doctor's prescription).
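
For comparison, the sketch below shows the same haversine computation without the `COALESCE` null handling, applied to every doctor in the data set. The `homeLat`, `homeLon`, and `km` names are illustrative; 6371 is the Earth radius in kilometers already used in the query above.

[source,cypher]
----
// User home coordinates from the example above
WITH 48.88344 AS homeLat, 2.33180 AS homeLon
MATCH (doctor:Doctor)
// d = 2 * r * asin(sqrt(hav(dLat) + cos(lat1) * cos(lat2) * hav(dLon)))
RETURN doctor.name AS `Doctor`,
       2 * 6371 * asin(sqrt(
         haversin(radians(homeLat - doctor.latitude)) +
         cos(radians(homeLat)) * cos(radians(doctor.latitude)) *
         haversin(radians(homeLon - doctor.longitude))
       )) AS km
ORDER BY km ASC
----

In the null-safe variant, a missing doctor falls back to the home coordinates in both difference terms, so each haversine term evaluates to zero and the distance comes out as 0.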
174 |
175 | //console
176 |
--------------------------------------------------------------------------------
/medical/pharma_drugs_targets.adoc:
--------------------------------------------------------------------------------
1 | = Pharmaceutical Drugs and their Targets
2 | Josh Kunken
3 | v1.0, 14-Dec-2013
4 | :neo4j-version: 2.3.0
5 | :author: Josh Kunken
6 | :twitter: joshkunken
7 |
8 | :toc:
9 |
10 | == Domain
11 |
12 | A pharmaceutical portfolio is a collection of drug compounds, their respective indications, and their targets.
13 | A pharmaceutical company or drugstore organizes its pharmaceutical products into one or more portfolios.
14 | A drug portfolio thus contains multiple pharmaceuticals, with each pharmaceutical containing a link to one or more of its targets in the human body.
15 | This structure lends itself to being modeled as a graph.
16 | Each pharmaceutical and drug target can also have a distinct set of attributes which also fit nicely into the property graph model.
17 | Within the examples found in this use case, most drug targets happen to be G-protein coupled receptors (GPCRs), whose structures have only been solved in the last several years.
18 |
19 | A drug can have one or more targets.
20 | A target can be targeted by one or more drugs.
21 | This is not a complete solution for all the drug portfolio use cases but provides a good starting point.
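
The many-to-many relationship between drugs and targets can be explored with a query along the following lines. It is a sketch that assumes the `Product` and `DrugTarget` labels and the `TARGETS` relationship type from the setup data below, and it should be run after that data has been created.

[source,cypher]
----
// List each target with the drugs that act on it
MATCH (target:DrugTarget)<-[:TARGETS]-(drug:Product)
RETURN target.name AS Target,
       collect(drug.name) AS Drugs,
       count(drug) AS NumberOfDrugs
ORDER BY NumberOfDrugs DESC
----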
22 |
23 | .Domain Model
24 | image::http://www.sohosci.com/drug_portfolio.PNG[Domain Model]
25 |
26 |
27 | == Setup
28 |
29 | The sample data set uses a pharmaceutical portfolio.
30 |
31 | //hide
32 | //setup
33 | [source,cypher]
34 | ----
35 | CREATE (drugPortfolio:Portfolio{ name:'Pharmaceutical Portfolio' })
36 |
37 | CREATE (drugs:Category { name:'Drugs' })
38 | CREATE drugs-[:PARENT]->drugPortfolio
39 |
40 | CREATE (antipsychotic_agents:Category { name:'Antipsychotic Agents' })
41 | CREATE antipsychotic_agents-[:PARENT]->drugs
42 | CREATE (antiparkinson_agents:Category { name:'Antiparkinson Agents' })
43 | CREATE antiparkinson_agents-[:PARENT]->drugs
44 | CREATE (antimigraine_agents:Category { name:'Antimigraine Agents' })
45 | CREATE antimigraine_agents-[:PARENT]->drugs
46 | CREATE (antidepressive_agents:Category { name:'Antidepressive Agents' })
47 | CREATE antidepressive_agents-[:PARENT]->drugs
48 | CREATE (antiallergic_agents:Category { name:'Antiallergic Agents' })
49 | CREATE antiallergic_agents-[:PARENT]->drugs
50 | CREATE (cns_stimulants:Category { name:'CNS Stimulants' })
51 | CREATE cns_stimulants-[:PARENT]->drugs
52 | CREATE (bronchodilator_agents:Category { name:'Bronchodilator Agents' })
53 | CREATE bronchodilator_agents-[:PARENT]->drugs
54 | CREATE (vasodilator:Category { name:'Vasodilator' })
55 | CREATE vasodilator-[:PARENT]->drugs
56 |
57 | CREATE (HUMAN_5HT1A:DrugTarget{ name:'5HT1A_HUMAN' })
58 | CREATE (HUMAN_5HT1B:DrugTarget{ name:'5HT1B_HUMAN' })
59 | CREATE (HUMAN_5HT2A:DrugTarget{ name:'5HT2A_HUMAN' })
60 | CREATE (HUMAN_AA1R:DrugTarget{ name:'AA1R_HUMAN' })
61 | CREATE (HUMAN_AA2AR:DrugTarget{ name:'AA2AR_HUMAN' })
62 | CREATE (HUMAN_AA2BR:DrugTarget{ name:'AA2BR_HUMAN' })
63 |
64 | CREATE (clozapine:Product { name:'Clozapine' })
65 | CREATE clozapine-[:OF_TYPE]->antipsychotic_agents
66 | CREATE clozapine-[:TARGETS]->HUMAN_5HT1A
67 |
68 | CREATE (aripiprazole:Product { name:'Aripiprazole' })
69 | CREATE aripiprazole-[:OF_TYPE]->antipsychotic_agents
70 | CREATE aripiprazole-[:TARGETS]->HUMAN_5HT1A
71 |
72 | CREATE (lisuride:Product { name:'Lisuride' })
73 | CREATE lisuride-[:OF_TYPE]->antiparkinson_agents
74 | CREATE lisuride-[:TARGETS]->HUMAN_5HT1A
75 |
76 | CREATE (methysergide:Product { name:'Methysergide' })
77 | CREATE methysergide-[:OF_TYPE]->antimigraine_agents
78 | CREATE methysergide-[:TARGETS]->HUMAN_5HT1A
79 |
80 | CREATE (almotriptan:Product { name:'Almotriptan' })
81 | CREATE almotriptan-[:OF_TYPE]->antimigraine_agents
82 | CREATE almotriptan-[:TARGETS]->HUMAN_5HT1B
83 |
84 | CREATE (eletriptan:Product { name:'Eletriptan' })
85 | CREATE eletriptan-[:OF_TYPE]->antimigraine_agents
86 | CREATE eletriptan-[:TARGETS]->HUMAN_5HT1B
87 |
88 | CREATE (ergotamine:Product { name:'Ergotamine' })
89 | CREATE ergotamine-[:OF_TYPE]->antimigraine_agents
90 | CREATE ergotamine-[:TARGETS]->HUMAN_5HT1B
91 |
92 | CREATE (frovatriptan:Product { name:'Frovatriptan' })
93 | CREATE frovatriptan-[:OF_TYPE]->antimigraine_agents
94 | CREATE frovatriptan-[:TARGETS]->HUMAN_5HT1B
95 |
96 | CREATE (naratriptan:Product { name:'Naratriptan' })
97 | CREATE naratriptan-[:OF_TYPE]->antimigraine_agents
98 | CREATE naratriptan-[:TARGETS]->HUMAN_5HT1B
99 |
100 | CREATE (chlorprothixene:Product { name:'Chlorprothixene' })
101 | CREATE chlorprothixene-[:OF_TYPE]->antipsychotic_agents
102 | CREATE chlorprothixene-[:TARGETS]->HUMAN_5HT2A
103 |
104 | CREATE clozapine-[:OF_TYPE]->antipsychotic_agents
105 | CREATE clozapine-[:TARGETS]->HUMAN_5HT2A
106 |
107 | CREATE (cyclobenzaprine:Product { name:'Cyclobenzaprine' })
108 | CREATE cyclobenzaprine-[:OF_TYPE]->antidepressive_agents
109 | CREATE cyclobenzaprine-[:TARGETS]->HUMAN_5HT2A
110 |
111 | CREATE (cyproheptadine:Product { name:'Cyproheptadine' })
112 | CREATE cyproheptadine-[:OF_TYPE]->antiallergic_agents
113 | CREATE cyproheptadine-[:TARGETS]->HUMAN_5HT2A
114 |
115 | CREATE (caffeine:Product { name:'Caffeine' })
116 | CREATE caffeine-[:OF_TYPE]->cns_stimulants
117 | CREATE caffeine-[:TARGETS]->HUMAN_AA1R
118 | CREATE caffeine-[:TARGETS]->HUMAN_AA2AR
119 |
120 | CREATE (theophylline:Product { name:'Theophylline' })
121 | CREATE theophylline-[:OF_TYPE]->bronchodilator_agents
122 | CREATE theophylline-[:TARGETS]->HUMAN_AA1R
123 | CREATE theophylline-[:TARGETS]->HUMAN_AA2AR
124 | CREATE theophylline-[:TARGETS]->HUMAN_AA2BR
125 |
126 | CREATE (regadenoson:Product { name:'Regadenoson' })
127 | CREATE regadenoson-[:OF_TYPE]->vasodilator
128 | CREATE regadenoson-[:TARGETS]->HUMAN_AA2AR
129 | ----
130 |
131 | === Try other queries yourself!
132 | //console
133 |
134 | == Use Cases
135 |
136 | == All portfolios
137 |
138 | [source,cypher]
139 | ----
140 | MATCH (c:Portfolio)
141 | RETURN c.name AS Portfolios
142 | ----
143 | //table
144 |
145 | == All categories by Depth
146 |
147 | [source,cypher]
148 | ----
149 | MATCH p=(cats:Category)-[:PARENT*]->(cat:Portfolio)
150 | RETURN LENGTH(p) AS Depth, COLLECT(cats.name) AS Categories
151 | ORDER BY Depth ASC
152 | ----
153 | //table
154 |
155 | == All categories of a given depth
156 |
157 | [source,cypher]
158 | ----
159 | MATCH p=(cats:Category)-[:PARENT*]->(cat:Portfolio)
160 | WHERE cat.name='Pharmaceutical Portfolio' AND length(p)=1
161 | RETURN cats.name AS `Categories of Given Level`
162 | ORDER BY cats.name
163 | ----
164 | //table
165 |
166 | == All sub-categories of a given category
167 |
168 | [source,cypher]
169 | ----
170 | MATCH (cats:Category)-[:PARENT]->(parentCat:Category), (parentCat)-[:PARENT*]->(c:Portfolio)
171 | RETURN parentCat.name AS Parent, COLLECT(cats.name) AS SubCategories
172 | ----
173 | //table
174 |
175 | == All parents and their child categories
176 |
177 | [source,cypher]
178 | ----
179 | MATCH (child:Category)-[:PARENT*]->(parent)
180 | RETURN parent.name AS Parent, COLLECT(child.name) AS Children
181 | ----
182 | //table
183 |
184 | == All parents and their immediate children
185 |
186 | [source,cypher]
187 | ----
188 | MATCH (child:Category)-[:PARENT]->(parent)
189 | RETURN labels(parent), parent.name AS Parent, COLLECT(child.name) AS Children
190 | ----
191 | //table
192 | //console
193 |
--------------------------------------------------------------------------------
/medical/treatment_planners.adoc:
--------------------------------------------------------------------------------
1 | = Behavioral Health Treatment Planning
2 | Greg Ricker
3 | v1.0, 22-2-2015
4 | :neo4j-version: 2.3.0
5 | :author: Greg Ricker
6 | :twitter: @greg_ricker
7 |
8 | :toc:
9 |
10 | == Using the Wiley Treatment Plan
11 |
12 | I am using the "Wiley treatment plan" data set as the basis of our domain model since it is one standard that is used in behavioral health.
13 | A key aspect of treatment in the field of behavioral health involves creating a four-part treatment plan, packaged as libraries, consisting of a Problem, Goal, Objective, and Intervention.
14 |
15 | === The Problem
16 |
17 | The Problem states, in general terms, what the patient is suffering from.
18 | For example, this might be Depression, Low Self-Esteem, Substance Abuse, or something else.
19 |
20 | === The Goal
21 |
22 | The Goal is the end result.
23 | For example, a patient might have the goal of "Demonstrate respect and regard for self and others".
24 |
25 | === The Objectives
26 |
27 | Objectives are milestones along the way from the Problem to the Goal: ways in which the patient is going to improve.
28 |
29 | === The Intervention
30 |
31 | Interventions are tasks or activities performed as part of the plan.
32 | These may be actions taken by the patient and others involved in the treatment plan.
33 |
34 | === Additional Complications
35 |
36 | In practice, there are several snarls in the model described above that make the implementation of the Wiley plan difficult for relational databases.
37 |
38 | The first is the existence of links from problem to goal to intervention, which restrict interventions to those related to a specific problem and/or goal.
39 | Setting this up in an SQL database required a separate table to maintain "linkages" for each plan.
40 | Generating an appropriate plan requires traversing the linkage table a number of times, resulting in queries that can run from two to ten seconds depending on how many libraries are loaded.
41 |
42 | The second complication is that not everyone uses the plan in the defined order of problem -> goal -> objective -> intervention.
43 | Any practical implementation of the treatment plan system has to let the user start from any point in the plan and work from there.
44 | For example, the user can start with goal then jump to intervention and then on to problem.
45 |
46 | Thirdly, the information (goals, objectives, interventions) is reused.
47 | For example, GoalA may be used for ProblemA in LibraryA, but it may be used again with other problems within the same library or across libraries.
48 |
49 | .The Wiley Treatment Plan Domain Model
50 | [Domain Model]
51 | image::https://gricker.files.wordpress.com/2015/02/wiley.png[]
52 |
53 | == Implementing The Wiley Plan using Neo4j
54 |
55 | Implementing the model in Neo4j resulted in 300 nodes and 120k relationships.
56 | A typical query runs in about 500 ms and returns 500-700 values.
57 | In addition, adding custom plans that deviate from the Wiley plan was easy and didn't affect performance.
58 |
59 | .Modified Wiley Treatment Plan Domain Model
60 | [Domain Model]
61 | image::https://gricker.files.wordpress.com/2015/02/treatment-model.png[]
62 |
63 | == Nodes
64 |
65 | ----
66 | (:Library)
67 | (:Problem)
68 | (:Goal)
69 | ----
70 |
71 | == Relationships
72 |
73 | ----
74 | (:Library)-[:HAS_PROBLEM]->(:Problem)-[:HAS_GOAL]->(:Goal)
75 | (:Problem)-[:HAS_OBJECTIVE]->(:Objective)
76 | (:Problem)-[:HAS_INTERVENTION]->(:Intervention)
77 | ----
78 |
79 | === Sample Dataset
80 |
81 | The sample data set uses one library, one problem, and four each of objectives, goals, and interventions.
82 |
83 | //hide
84 | //setup
85 | //output
86 | [source,cypher]
87 | ----
88 | CREATE (lib:Library {GroupID:'230', Description:'School Counseling and Social Work'})
89 | CREATE (prob1:Problem {name:'17', Description:'Parenting Skills/Discipline',GroupID:'230'})
90 | CREATE (obj1:Objective {name:'9', Description:'Parents use natural and logical consequences to redirect the students behavior.',GroupID:'230',ProblemNumber:'17'})
91 | CREATE (obj2:Objective {name:'8', Description:'Parents allow the student to learn from his/her mistakes.',GroupID:'230',ProblemNumber:'17'})
92 | CREATE (obj3:Objective {name:'6', Description:'Parents set limits using positive discipline strategies.',GroupID:'230',ProblemNumber:'17'})
93 | CREATE (obj4:Objective {name:'20', Description:'Parents work to maintain a strong, couple-centered family environment',GroupID:'230',ProblemNumber:'17'})
94 | CREATE (goal1:Goal {name:'4', Description:'Acquire positive and moral character traits',GroupID:'230',ProblemNumber:'17'})
95 | CREATE (goal2:Goal {name:'3', Description:'Demonstrate respect and regard for self and others.',GroupID:'230',ProblemNumber:'17'})
96 | CREATE (goal3:Goal {name:'5', Description:'Parents acquire positive discipline strategies that set limits and encourage independence.',GroupID:'230',ProblemNumber:'17'})
97 | CREATE (goal4:Goal {name:'6', Description:'Family atmosphere is peaceful, loving, and harmonious.',GroupID:'230',ProblemNumber:'17'})
98 | CREATE (intervention1:Intervention {name:'7', Description:'Suggest that the parents and the student meet weekly at a designated time to review progress, give encouragement, note continuing concerns, and keep a written progress report to share with a counselor or private therapist.',GroupID:'230',ProblemNumber:'17'})
99 | CREATE (intervention2:Intervention {name:'38', Description:'Encourage the parents and teachers to allow the student to seek his/her own solutions with guidance even if it requires some struggle and learning from mistakes. Recommend that the parents and teachers listen to the students problems with empathy and give guidance or assistance only when requested; discuss the results of this approach in a subsequent counseling session.',GroupID:'230',ProblemNumber:'17'})
100 | CREATE (intervention3:Intervention {name:'1', Description:'Meet with the parents to obtain information about discipline, family harmony, and the students developmental history.',GroupID:'230',ProblemNumber:'17'})
101 | CREATE (intervention4:Intervention {name:'8', Description:'Have the student complete the (Personal Profile) informational sheet from the School Counseling and School Social Homework Planner (Knapp), which details pertinent personal data, or gather personal information in an informal interview with the student.',GroupID:'230',ProblemNumber:'17'})
102 | CREATE (lib)-[:HAS_PROBLEM]->(prob1)
103 | CREATE (prob1)-[:HAS_GOAL]->(goal1)
104 | CREATE (prob1)-[:HAS_GOAL]->(goal2)
105 | CREATE (prob1)-[:HAS_GOAL]->(goal3)
106 | CREATE (prob1)-[:HAS_GOAL]->(goal4)
107 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj1)
108 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj2)
109 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj3)
110 | CREATE (prob1)-[:HAS_OBJECTIVE]->(obj4)
111 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention1)
112 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention2)
113 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention3)
114 | CREATE (prob1)-[:HAS_INTERVENTION]->(intervention4)
115 | ----
116 | // graph
117 |
118 | == Use Cases
119 |
120 | == Display All Objectives
121 |
122 | [source,cypher]
123 | ----
124 | MATCH (c:Objective)
125 | RETURN c.Description AS Objective
126 | ----
127 | //table
128 |
129 | == Display All Problems
130 | [source,cypher]
131 | ----
132 | MATCH (c:Problem)
133 | RETURN c.Description AS Problem
134 | ----
135 | //table
136 |
137 | == Display All Goals
138 |
139 | [source,cypher]
140 | ----
141 | MATCH (c:Goal)
142 | RETURN c.Description AS Goal
143 | ----
144 | //table
145 |
146 | == Find interventions for problem number 17 across all libraries
147 |
148 | [source,cypher]
149 | ----
150 | MATCH (lib:Library)-[:HAS_PROBLEM]->(st:Problem{name:'17'})-[:HAS_INTERVENTION]-(i:Intervention)
151 | RETURN lib.Description AS Library, st.Description AS Problem, i.Description AS Intervention;
152 | ----
153 | //table
154 |
155 | == Display All Problems, Interventions, and Objectives for one library
156 |
157 | [source,cypher]
158 | ----
159 | MATCH (lib:Library{GroupID:'230'})-[:HAS_PROBLEM]->(st:Problem{name:'17'})-[:HAS_INTERVENTION]-(i:Intervention) with i,st MATCH (st)-[:HAS_OBJECTIVE]->(m:Objective)
160 | RETURN st.Description AS Problem, m.Description AS Objective, i.Description AS Intervention;
161 | ----
162 | //table
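
== Start from a Goal Instead of a Problem

As noted under Additional Complications, users do not always work in the problem -> goal -> objective -> intervention order.
A minimal sketch of starting from a goal, walking back to the problem that uses it, and then on to that problem's interventions, assuming the sample data above (the goal `name` value '3' is just one of the sample goals, picked for illustration):

[source,cypher]
----
// Start at a Goal, find the Problem(s) that use it, then list their Interventions
MATCH (g:Goal {name:'3'})<-[:HAS_GOAL]-(p:Problem)-[:HAS_INTERVENTION]->(i:Intervention)
RETURN g.Description AS Goal, p.Description AS Problem, i.Description AS Intervention
----
//table

Because a relationship can be traversed in either direction, this entry point needs no separate linkage table.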
163 |
164 | == Conclusion
165 | Developing the treatment planner in SQL took months to get correct and to bring performance to the point where it was usable.
166 | For the Neo4j version, I used py2neo to import the data into the graph.
167 | In all, it took less than a week from start to finish (it took longer to create this gist).
168 |
169 | //console
170 |
--------------------------------------------------------------------------------
/networkITmanagment/datacenter-management-1.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neo4j-examples/graphgists/81d6bef98570eea3ce3cc16515fcebc6ed785f9c/networkITmanagment/datacenter-management-1.PNG
--------------------------------------------------------------------------------
/networkITmanagment/network-routing.adoc:
--------------------------------------------------------------------------------
1 | = Information Flow Through a Network
2 | :neo4j-version: 2.3.0
3 | :twitter: @lyonwj
4 | :author: William L.
5 |
6 | :toc:
7 |
8 | == Introduction
9 |
10 | Modern financial markets operate very quickly due to algorithmic trading.
11 | Essentially, computers execute trades automatically based on information inputs and clever algorithms. All of this occurs very rapidly.
12 | The computers are so good at executing trades quickly, in fact, that the speed at which information moves from one city to another is actually a limiting factor.
13 | For example, if an announcement that impacts the financial markets is made in Washington, DC, traders in New York, NY will probably "hear" about it before those in, say, Seattle, WA.
14 |
15 | Since information can travel no faster than the speed of light (in reality it travels much more slowly because of various processing that must be done along the way) we can compute a lower bound on the time it will take a particular piece of information released in one city to arrive in other cities.
16 | As you might expect, this problem involves flow through a network, and is therefore fairly simple to model in Neo4j (or any graph database, really).
17 |
18 | == Data Model
19 |
20 | Our data model consists of a set of cities, each with a latitude and longitude, which we will use to compute distances (we could have entered the distances as data, but it was more fun to use Neo4j for this task, and in a more generalized case you might want the nodes to be able to move).
21 | Each city is linked to one or more other cities by `:BACKBONE_TO` relationships.
22 | This indicates that the two cities involved in the relationship have an Internet backbone running between them.
23 | For this example, we mostly made up the backbones, although the backbones running to Tokyo are accurate given actual undersea cable topology.
24 |
25 | .Information network data model
26 | image::http://i.imgur.com/uxv29rM.png[Information network data model]
27 |
28 | //hide
29 | //setup
30 | [source,cypher]
31 | ----
32 | CREATE (chc:City { name: "Chicago", lat: 41.833, lon: -87.617 }), (sea:City { name: "Seattle", lat: 47.617, lon: -122.334 }),
33 | (sfo:City { name: "San Francisco", lat: 37.783, lon: -122.433 }), (tok:City { name: "Tokyo", lat: 35.667, lon: 139.75 }),
34 | (chc)-[:BACKBONE_TO]->(sea), (sea)-[:BACKBONE_TO]->(sfo), (sea)-[:BACKBONE_TO]->(tok), (sfo)-[:BACKBONE_TO]->(tok)
35 | ----
36 | //graph
37 |
38 | == Distance Computation
39 |
40 | Before we can establish a lower bound on the time it takes information to flow between cities, we must determine the distance between cities that are linked by an Internet backbone.
41 | We assume that the cables connecting cities lie on https://en.wikipedia.org/wiki/Great-circle_distance[great circles]; in other words, each cable follows the shortest possible line between its two cities.
42 |
43 | [source,cypher]
44 | ----
45 | // Find the great circle distance from Tokyo to Seattle
46 | MATCH (c1:City { name: "Tokyo" }), (c2:City { name: "Seattle" })
47 | RETURN 2 * 6371 * asin(sqrt(haversin(radians(c1.lat - c2.lat)) + cos(radians(c1.lat)) * cos(radians(c2.lat)) * haversin(radians(c1.lon - c2.lon)))) AS Distance
48 | ----
49 | //table
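
For reference, the query above evaluates the haversine formula for great-circle distance.
Here the latitudes (phi) and longitudes (lambda) of the two cities are in radians, and r = 6371 km is the mean Earth radius used in the query:

[latexmath]
++++
d = 2r \arcsin\left(\sqrt{\operatorname{hav}(\varphi_1 - \varphi_2) + \cos\varphi_1 \cos\varphi_2 \, \operatorname{hav}(\lambda_1 - \lambda_2)}\right),
\qquad
\operatorname{hav}(\theta) = \sin^2\frac{\theta}{2} = \frac{1 - \cos\theta}{2}
++++

Cypher's `haversin()` function computes the hav term, which is why the query only needs `radians()`, `cos()`, `sqrt()`, and `asin()` around it.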
50 |
51 | In our case we would like to record the distances between cities that are connected with an Internet backbone.
52 | We can store this data on the `:BACKBONE_TO` edges.
53 | Note that we don't have to worry about double-computing because we have specified a relationship direction, so each distance will only be computed once.
54 | If we left out the direction, each distance would be computed twice, although the end result would be exactly the same.
55 |
56 | [source,cypher]
57 | ----
58 | // Add distance to backbone edges
59 | MATCH (c1:City)-[r:BACKBONE_TO]->(c2:City)
60 | WITH 2 * 6371 * asin(sqrt(haversin(radians(c1.lat - c2.lat))+ cos(radians(c1.lat))* cos(radians(c2.lat))* haversin(radians(c1.lon - c2.lon)))) AS dist, r, c1, c2
61 | SET r.dist = dist
62 | RETURN c1.name, c2.name, r.dist
63 | ----
64 | //graph
65 |
66 | == Mapping Information Flow
67 |
68 | Our next step is to find candidate paths from one city to another.
69 | Once we have these paths, we can compute the minimum amount of time it would take a piece of information to move along each path, and find the shortest route between them.
70 | First, we can find all unique, simple paths (no repeated cities) from one city to another.
71 | Here is an example for Tokyo and Chicago:
72 |
73 | [source,cypher]
74 | ----
75 | // Find all unique, simple paths from one city to another
76 | MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" })
77 | WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c)))
78 | RETURN DISTINCT extract(n IN nodes(p) | n.name) AS Path, length(p) AS Length
79 | ----
80 | //table
81 |
82 | Next, we would like to find the shortest path, in terms of distance:
83 |
84 | [source,cypher]
85 | ----
86 | // Find the shortest distance path from one city to another
87 | MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" })
88 | WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c)))
89 | WITH reduce(s = 0, hop IN rels(p) | s + hop.dist) AS distance, p
90 | ORDER BY distance LIMIT 1
91 | RETURN DISTINCT extract(n IN nodes(p) | n.name) AS Path, length(p) AS Length, distance AS Distance
92 | ----
93 | //table
94 |
95 | The next step is to change distance into an amount of time.
96 | We will assume that information travels between cities at the speed of light.
97 | As mentioned earlier this is not strictly true, but since we are looking for a lower bound on the time it takes information to move from one city to another, using the fastest possible speed for the information flow itself makes sense.
98 |
99 | As an aside, we could make this model more complicated, and perhaps more accurate, by taking into account the processing time at each junction point along the way.
100 | When information arrives at a particular node in the network, it must be routed to the next node along its path, and this takes some non-zero amount of time.
101 | Therefore, we might actually want to minimize the distance information has to travel, subject to the constraint that each "hop" has a cost.
102 | In this case, we would tend to prefer less complicated paths, even if they are slightly longer.
103 |
104 | We can compute the number of milliseconds it should take information to travel between cities using the query below.
105 | Note that we simply divide the distance in kilometers by 300 to get milliseconds, since light travels at roughly 300,000 kilometers per second, or 300 kilometers per millisecond.
106 |
107 | [source,cypher]
108 | ----
109 | // Compute minimum milliseconds for information travel from one city to another
110 | MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" })
111 | WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c)))
112 | WITH reduce(s = 0, hop IN rels(p) | s + hop.dist) AS distance, p
113 | ORDER BY distance LIMIT 1
114 | RETURN DISTINCT extract(n IN nodes(p) | n.name) AS Path, length(p) AS Length, distance / 300 AS ms
115 | ----
116 | //table
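
The earlier aside about processing time at each junction can be sketched by adding a fixed delay per hop to the propagation time.
The 1 ms value (`hopDelayMs`) is a made-up figure for illustration, not a measurement:

[source,cypher]
----
// Sketch: rank paths by propagation time plus an assumed fixed delay per hop
WITH 1.0 AS hopDelayMs // hypothetical routing delay per hop, in milliseconds
MATCH p=(:City { name: "Tokyo" })-[:BACKBONE_TO*]-(:City { name: "Chicago" })
WHERE all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c)))
WITH p, reduce(s = 0, hop IN rels(p) | s + hop.dist) / 300 + length(p) * hopDelayMs AS ms
RETURN extract(n IN nodes(p) | n.name) AS Path, length(p) AS Hops, ms
ORDER BY ms LIMIT 1
----
//table

With a large enough per-hop delay, a path with fewer hops can win even if it covers slightly more distance.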
117 |
118 | We can then generalize this query to compute the time it would take information to travel between any pair of cities.
119 |
120 | [source,cypher]
121 | ----
122 | // Compute the minimum travel time between all pairs of cities
123 | MATCH p=(c1:City)-[:BACKBONE_TO*]-(c2:City)
124 | WHERE c1.name <> c2.name and all(c IN nodes(p) WHERE 1=length(filter(m IN nodes(p) WHERE m=c)))
125 | WITH reduce(s = 0, hop IN rels(p) | s + hop.dist) AS distance, p, c1, c2
126 | ORDER BY distance
127 | RETURN c1.name AS `Start City`, c2.name AS `End City`, collect(distance / 300)[0] AS ms
128 | ORDER BY c2.name
129 | ----
130 | //table
131 |
132 | == About
133 |
134 | Created by George Lesica (https://twitter.com/glesica[@glesica]) and William Lyon (https://twitter.com/lyonwj[@lyonwj]).
135 |
136 | //console
137 |
--------------------------------------------------------------------------------
/render-guides.sh:
--------------------------------------------------------------------------------
1 | export GUIDES=../neo4j-guides
2 |
3 | rm -rf html
4 | mkdir html
5 |
6 | $GUIDES/run.sh index.adoc html/index.html +1 http://guides.neo4j.com/graphgists
7 |
8 | s3cmd put -P html/index.html s3://guides.neo4j.com/graphgists
9 |
10 | #. http://neo4j.com/graphgist/9d627127-003b-411a-b3ce-f8d3970c2afa[Bank Fraud Detection]
11 |
12 | $GUIDES/run.sh fraud/bank-fraud-detection.adoc html/fraud
13 |
14 | #. http://neo4j.com/graphgist/56c4ceb8-0af1-4d36-b14c-aaa482dc2abc[Books Management Graph]
15 |
16 | $GUIDES/run.sh uc-search/books.adoc html/books
17 |
18 | #. http://neo4j.com/graphgist/ec65c2fa-9d83-4894-bc1e-98c475c7b57a[Analyzing Offshore Leaks]
19 |
20 | $GUIDES/run.sh fraud/Offshore_Leaks_and_Azerbaijan.adoc html/leaks
21 |
22 | #. http://neo4j.com/graphgist/306bb0c7-9820-4c29-9835-15625e4e9f96[Network Dependency Graph]
23 |
24 | $GUIDES/run.sh networkITmanagment/NetworkDataCenterManagement1.adoc html/network
25 |
26 | #. http://neo4j.com/graphgist/4cea8113-30e9-46bc-bbb0-06236a9bd8b9[Job Recommendation System]
27 |
28 | $GUIDES/run.sh recommendation/Competence_Management.adoc html/jobs
29 |
30 | s3cmd put -P --recursive html/* s3://guides.neo4j.com/graphgists/
31 |
--------------------------------------------------------------------------------
/retail/hierarchy_graphgist.adoc:
--------------------------------------------------------------------------------
1 | = (Product) Hierarchy GraphGist
2 | :neo4j-version: 2.3.0
3 | :twitter: @rvanbruggen
4 | :author: Rik Van Bruggen
5 |
6 | :toc:
7 |
8 | == Introduction
9 |
10 | This gist is a complement to http://blog.bruggen.com/2014/03/using-Neo4j-to-manage-and-calculate.html[a blogpost that I wrote] about managing hierarchical data structures in http://www.Neo4j.org[Neo4j].
11 |
12 | In this example, we are using a "product hierarchy", essentially holding information about the composition of a product (what is it made of, how many of the components are used, and at the lowest level, what is the price of these components).
13 | The model looks like this:
14 |
15 | .Model of a Product Hierarchy
16 | image::http://1.bp.blogspot.com/-XIjEXWHpNmc/Uzbhuoo-9xI/AAAAAAABNWE/7zYyn3Vl3i0/s3200/Screen+Shot+2014-03-29+at+16.04.35.png[]
17 |
18 | Note that in the GraphGist, I have cut the tree depth to 5 levels (product to costs) instead of 6 in the blogpost - and that I also reduced the width of the tree to make it manageable in a gist.
19 |
20 | == Loading some data: a 5-level tree
21 | First we have to load the data into the graph. This was a bit of work - but not difficult at all:
22 |
23 | .Creating the top of the tree, the Product (just one in this case):
24 | [source,cypher]
25 | ----
26 | CREATE (n1:Product {id:1})
27 | ----
28 | .Then CREATE the Cost Groups:
29 | [source,cypher]
30 | ----
31 | MATCH (n1:Product) foreach (r in range(1,3) | CREATE (n2:CostGroup {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n1))
32 | ----
33 | .Then add the Cost Types to the Cost Groups:
34 | [source,cypher]
35 | ----
36 | MATCH (n2:CostGroup) foreach (r in range(1,5) | CREATE (n3:CostType {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n2))
37 | ----
38 | .Then add the Cost Subtypes to the Cost Types:
39 | [source,cypher]
40 | ----
41 | MATCH (n3:CostType) foreach (r in range(1,3) | CREATE (n4:CostSubtype {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n3))
42 | ----
43 | .Then finally add the Costs to the Cost Subtypes:
44 | [source,cypher]
45 | ----
46 | MATCH (n4:CostSubtype) foreach (r in range(1,5) | CREATE (n5:COST {id:r,price:round(rand()*1000)})-[:PART_OF {quantity:round(rand()*100)}]->(n4))
47 | ----
48 |
49 | The actual graph then looks like this:
50 |
51 | //graph
52 |
53 | == Querying the hierarchy structure ==
54 |
55 | Then we can do some easy queries.
56 | Let's check the structure of the hierarchy and the number of nodes:
57 |
58 | [source,cypher]
59 | ----
60 | MATCH (n)
61 | RETURN labels(n) AS `Kinds of Nodes`, count(n) AS `Number of Nodes`;
62 | ----
63 |
64 | This is what it looks like:
65 |
66 | //table
67 |
68 | Now let's start manipulating the graph and do some interesting stuff.
69 | Let's calculate the price of the product at the top of this hierarchy by sweeping through the graph and multiplying the price by the quantities on each of the relationships.
70 |
71 | [source,cypher]
72 | ----
73 | //calculating price based on full sweep of the tree
74 | MATCH (n1:Product {id:1})<-[r1]-(:CostGroup)<-[r2]-(:CostType)<-[r3]-(:CostSubtype)<-[r4]-(n5:COST)
75 | RETURN sum(r1.quantity*r2.quantity*r3.quantity*r4.quantity*n5.price) AS `Price of Product`
76 | ----
77 | //table
78 |
79 | == Optimising the calculation with intermediate price values at every level
80 |
81 | But maybe we can do that more efficiently by calculating intermediate prices for each of the levels in the hierarchy:
82 |
83 | [source, cypher]
84 | ----
85 | //calculate intermediate pricing
86 | MATCH (n4:CostSubtype)<-[r4]-(n5:COST)
87 | WITH n4,sum(r4.quantity*n5.price) AS Sum
88 | SET n4.price=Sum;
89 | ----
90 | [source, cypher]
91 | ----
92 | MATCH (n3:CostType)<-[r3]-(n4:CostSubtype)
93 | WITH n3,sum(r3.quantity*n4.price) AS Sum
94 | SET n3.price=Sum;
95 | ----
96 | [source, cypher]
97 | ----
98 | MATCH (n2:CostGroup)<-[r2]-(n3:CostType)
99 | WITH n2,sum(r2.quantity*n3.price) AS Sum
100 | SET n2.price=Sum;
101 | ----
102 | [source, cypher]
103 | ----
104 | MATCH (n1:Product)<-[r1]-(n2:CostGroup)
105 | WITH n1, sum(r1.quantity*n2.price) AS Sum
106 | SET n1.price=Sum
107 | RETURN Sum;
108 | ----
109 | //table
110 |
111 | Then we can easily calculate the price of the product by just using the intermediate pricing, and scanning a MUCH smaller part of the graph:
112 |
113 | [source, cypher]
114 | ----
115 | MATCH (n1:Product {id:1})<-[r1]-(n2:CostGroup)
116 | RETURN sum(r1.quantity*n2.price) AS `Price of Product`
117 | ----
118 |
119 | //table
120 |
121 | We can check the accuracy by looking at a different level and verifying if we get the same result:
122 |
123 | [source, cypher]
124 | ----
125 | MATCH (n1:Product {id:1})<-[r1]-(n2:CostGroup)<-[r2]-(n3:CostType)
126 | RETURN sum(r1.quantity*r2.quantity*n3.price) AS `Price of Product`
127 | ----
128 | //table
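
As a further check, a sketch that puts the stored intermediate result next to the full sweep from earlier in a single row; the two columns should match if the intermediate prices were computed as above:

[source,cypher]
----
// Compare the price stored on the Product with a full sweep of the tree
MATCH (n1:Product {id:1})<-[r1]-(:CostGroup)<-[r2]-(:CostType)<-[r3]-(:CostSubtype)<-[r4]-(n5:COST)
RETURN n1.price AS `Stored Price`,
       sum(r1.quantity*r2.quantity*r3.quantity*r4.quantity*n5.price) AS `Full Sweep Price`
----
//table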
129 |
130 | Yay! That seems to have confirmed the theory!
131 |
132 | == What if something changes in the hierarchy? ==
133 | Now let's see what happens if we change the price of one of the costs at the bottom of the tree:
134 |
135 | [source,cypher]
136 | ----
137 | MATCH (n5:COST)
138 | WITH n5, n5.price AS OldPrice LIMIT 1
139 | SET n5.price = n5.price*10
140 | WITH n5.price-OldPrice AS PriceDiff,n5
141 | MATCH (n5)-[r4:PART_OF]->(n4:CostSubtype)-[r3:PART_OF]->(n3:CostType)-[r2:PART_OF]->(n2:CostGroup)-[r1:PART_OF]->(n1:Product)
142 | SET n4.price=n4.price+(PriceDiff*r4.quantity),
143 | n3.price=n3.price+(PriceDiff*r4.quantity*r3.quantity),
144 | n2.price=n2.price+(PriceDiff*r4.quantity*r3.quantity*r2.quantity),
145 | n1.price=n1.price+(PriceDiff*r4.quantity*r3.quantity*r2.quantity*r1.quantity)
146 | RETURN PriceDiff AS `Price Difference`, n1.price AS `New Price of Product`
147 | ----
148 | //table
149 |
150 | Then we can also go back and replay the queries above and see what has happened in the console below:
151 |
152 | == Conclusion ==
153 |
154 | I hope this gist complements the blogpost and gives you some ideas around how to work with any kind of hierarchy using Neo4j.
155 |
156 | == About the Author
157 |
158 | This gist was created by link:mailto:rik@neotechnology.com[Rik Van Bruggen]
159 |
160 | * link:http://blog.bruggen.com[My Blog]
161 | * link:http://twitter.com/rvanbruggen[On Twitter]
162 | * link:http://be.linkedin.com/in/rikvanbruggen/[On LinkedIn]
163 |
164 | //console
165 |
--------------------------------------------------------------------------------
/retail/northwind-graph.adoc:
--------------------------------------------------------------------------------
1 | = Northwind Graph
2 | :neo4j-version: 2.3.0
3 |
4 | :toc:
5 |
6 | == From RDBMS to Graph, using a classic dataset
7 |
8 | The __Northwind Graph__ demonstrates how to migrate from a relational
9 | database to Neo4j. The transformation is iterative and deliberate,
10 | emphasizing the conceptual shift from relational tables to the nodes and
11 | relationships of a graph.
12 |
13 | This guide will show you how to:
14 |
15 | 1. Load: create data from external CSV files
16 | 2. Index: index nodes based on label
17 | 3. Relate: transform foreign key references into data relationships
18 | 4. Promote: transform join records into relationships
19 |
20 |
21 | == Product Catalog
22 |
23 | Northwind sells food products in a few categories, provided by
24 | suppliers. Let's start by loading the product catalog tables.
25 |
26 | The load statements that follow require public internet
27 | access. `LOAD CSV` will retrieve a CSV file from a valid URL, applying a
28 | Cypher statement to each row using a named map (here we're using the
29 | name `row`).
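
To see what the named map looks like before creating any nodes, a read-only preview along these lines can help (a sketch; the `LIMIT` of 3 is arbitrary).
`LOAD CSV` reads every field as a string, which is why the product load statement converts numeric columns with `toFloat` and `toInt`.

[source,cypher]
----
// Preview the first few rows as maps without writing anything to the graph
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
RETURN row LIMIT 3
----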
30 |
31 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151109/product-category-supplier.png[image]
32 |
33 | == Load records
34 |
35 | [source,cypher]
36 | ----
37 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
38 | CREATE (n:Product)
39 | SET n = row,
40 | n.unitPrice = toFloat(row.unitPrice),
41 | n.unitsInStock = toInt(row.unitsInStock), n.unitsOnOrder = toInt(row.unitsOnOrder),
42 | n.reorderLevel = toInt(row.reorderLevel), n.discontinued = (row.discontinued <> "0")
43 | ----
44 |
45 | [source,cypher]
46 | ----
47 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/categories.csv" AS row
48 | CREATE (n:Category)
49 | SET n = row
50 | ----
51 |
52 | [source,cypher]
53 | ----
54 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/suppliers.csv" AS row
55 | CREATE (n:Supplier)
56 | SET n = row
57 | ----
58 |
59 | == Create indexes
60 |
61 | [source,cypher]
62 | ----
63 | CREATE INDEX ON :Product(productID)
64 | ----
65 |
66 | [source,cypher]
67 | ----
68 | CREATE INDEX ON :Category(categoryID)
69 | ----
70 |
71 | [source,cypher]
72 | ----
73 | CREATE INDEX ON :Supplier(supplierID)
74 | ----
75 |
76 | == Product Catalog Graph
77 |
78 | The products, categories and suppliers are related through foreign key
79 | references. Let's promote those to data relationships to realize the
80 | graph.
81 |
82 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151108/product-graph.png[image]
83 |
84 | === Create data relationships
85 |
86 | [source,cypher]
87 | ----
88 | MATCH (p:Product),(c:Category)
89 | WHERE p.categoryID = c.categoryID
90 | CREATE (p)-[:PART_OF]->(c)
91 | ----
92 |
93 | Note that you only need to compare property values like this when first
94 | creating relationships.
95 |
96 | This calculates the join once and materializes it as a relationship.
97 | (See http://neo4j.com/developer/guide-importing-data-and-etl[importing
98 | guide] for more details.)
99 |
100 | [source,cypher]
101 | ----
102 | MATCH (p:Product),(s:Supplier)
103 | WHERE p.supplierID = s.supplierID
104 | CREATE (s)-[:SUPPLIES]->(p)
105 | ----
106 |
107 | Note that you only need to compare property values like this when first
108 | creating relationships.
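
To confirm that both relationship types were materialized, one option is to count them. A minimal sketch, assuming the two `CREATE` statements in this section have already been run:

[source,cypher]
----
// Count the relationships of each type created from the foreign keys.
MATCH (:Product)-[r:PART_OF]->(:Category)
RETURN "PART_OF" AS type, count(r) AS relationships
UNION ALL
MATCH (:Supplier)-[r:SUPPLIES]->(:Product)
RETURN "SUPPLIES" AS type, count(r) AS relationships
----

If every product row carries exactly one matching `categoryID` and `supplierID`, both counts should equal the number of `Product` nodes.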
109 |
110 | == Querying Product Catalog Graph
111 |
112 | Let's try some queries using patterns.
113 |
114 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151108/product-graph.png[image]
115 |
116 | === Query using patterns
117 |
118 | List the product categories provided by each supplier:
119 |
120 | [source,cypher]
121 | ----
122 | MATCH (s:Supplier)-->(:Product)-->(c:Category)
123 | RETURN s.companyName as Company, collect(distinct c.categoryName) as Categories
124 | ----
125 | //table
126 |
127 | Find the produce suppliers:
128 |
129 | [source,cypher]
130 | ----
131 | MATCH (c:Category {categoryName:"Produce"})<--(:Product)<--(s:Supplier)
132 | RETURN DISTINCT s.companyName as ProduceSuppliers
133 | ----
134 | //table
135 |
136 | == Customer Orders
137 |
138 | Northwind customers place orders, each of which may detail multiple products.
139 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151108/customer-orders.png[image]
140 |
141 | === Load and index records
142 |
143 | [source,cypher]
144 | ----
145 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/customers.csv" AS row
146 | CREATE (n:Customer)
147 | SET n = row
148 | ----
149 |
150 | [source,cypher]
151 | ----
152 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/orders.csv" AS row
153 | CREATE (n:Order)
154 | SET n = row
155 | ----
156 |
157 | [source,cypher]
158 | ----
159 | CREATE INDEX ON :Customer(customerID)
160 | ----
161 |
162 | [source,cypher]
163 | ----
164 | CREATE INDEX ON :Order(orderID)
165 | ----
166 |
167 | == Create data relationships
168 |
169 | [source,cypher]
170 | ----
171 | MATCH (c:Customer),(o:Order)
172 | WHERE c.customerID = o.customerID
173 | CREATE (c)-[:PURCHASED]->(o)
174 | ----
175 |
176 | Note that you only need to compare property values like this when first
177 | creating relationships.
178 |
179 | == Customer Order Graph
180 |
181 | Notice that Order Details are always part of an Order and that
182 | they _relate_ the Order to a Product: they're a join table. Join
183 | tables are always a sign of a data relationship, indicating shared
184 | information between two other records.
185 |
186 | Here, we'll directly promote each OrderDetail record into a relationship in the graph.
187 | image:http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/20160211151107/order-graph.png[image]
188 |
189 |
190 | === Load and index records
191 |
192 | [source,cypher]
193 | ----
194 | LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/order-details.csv" AS row
195 | MATCH (p:Product), (o:Order)
196 | WHERE p.productID = row.productID AND o.orderID = row.orderID
197 | CREATE (o)-[details:ORDERS]->(p)
198 | SET details = row,
199 | details.quantity = toInteger(row.quantity)
200 | ----
201 |
202 | Note that you only need to compare property values like this when first
203 | creating relationships.
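
To spot-check the promoted order details, the following sketch lists a few orders with their line count and total quantity. It assumes the load above has already run; the `LIMIT 5` and the sort order are arbitrary choices.

[source,cypher]
----
// Each ORDERS relationship is one former OrderDetail row.
MATCH (o:Order)-[d:ORDERS]->(:Product)
RETURN o.orderID AS orderID, count(d) AS lineItems, sum(d.quantity) AS totalQuantity
ORDER BY totalQuantity DESC
LIMIT 5
----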
204 |
205 | == Query using patterns
206 |
207 | [source,cypher]
208 | ----
209 | MATCH (cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(p:Product),
210 | (p)-[:PART_OF]->(c:Category {categoryName:"Produce"})
211 | RETURN DISTINCT cust.contactName as CustomerName, SUM(o.quantity) AS TotalProductsPurchased
212 | ----
213 | //table
214 |
215 | _More Resources_
216 |
217 | * http://neo4j.com/developer/guide-importing-data-and-etl/[Full
218 | Northwind import example]
219 | * http://neo4j.com/developer[Developer resources]
220 |
221 |
--------------------------------------------------------------------------------
/syntax.adoc:
--------------------------------------------------------------------------------
1 | = How to create a GraphGist
2 | Anders Nawroth
3 | v0.1, 2013-09-01
4 | :neo4j-version: 2.3
5 | :author: Anders Nawroth
6 | :twitter: @nawroth
7 | :style: red:Person(name), #54A835/#1078B5/white:Database(name)
8 |
9 | You create a GraphGist by creating a https://gist.github.com/[GitHub Gist] in http://asciidoctor.org/docs/asciidoc-quick-reference/[AsciiDoc] and entering its URL in the form on this page.
10 | Alternatively, you can put an AsciiDoc document in https://www.dropbox.com/[Dropbox], Etherpad, Pastebin or a Google Doc, and enter the public URL in the URL box at the top right.
11 |
12 | This GraphGist shows the basics of using AsciiDoc syntax and a few additions for GraphGists.
13 | The additions are entered as comments on their own line.
14 | They are: +//console+ for a query console; +//hide+, +//setup+ and +//output+ to configure a query; +//graph+ and +//table+ to visualize queries and show a result table.
15 |
16 | Click on the Page Source button in the menu to see the source for this GraphGist.
17 | Read below to get the full details.
18 |
19 | == Configure GraphGist Metadata
20 |
21 | The metadata is optional; part of it is used when submitting a GraphGist.
22 | To select a particular Neo4j version, use the `neo4j-version` attribute.
23 |
24 | To provide custom styling to the graph visualization the `style` attribute provides means to set the color/[border-color]/[text-color] for a label and to pre-select a property.
25 | The general syntax is: `:style: color:Label1(property1), color/border-color/text-color:Label2(property2),...`
26 |
27 | Here are the settings for this document.
28 |
29 | [subs="attributes"]
30 | ----
31 | :neo4j-version: {neo4j-version}
32 | :author: {author}
33 | :twitter: {twitter}
34 | :style: {style}
35 | ----
36 |
37 | == Define a http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html[Cypher] query
38 |
39 | [source,cypher]
40 | ----
41 | MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what)
42 | RETURN who,likes,what
43 | ----
44 |
45 | becomes:
46 |
47 | [source,cypher]
48 | ----
49 | MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what)
50 | RETURN who,likes,what
51 | ----
52 |
53 | _Queries are executed in the order they appear on the page during rendering, so make sure they can be performed in that order._
54 | Each query has a green or red button to indicate if the query was successful or not.
55 | The console is set up after the executions, with an empty database, for the reader to play around with the queries.
56 |
57 | There are three additional settings you can use for queries.
58 | They all go as comments, on their own lines, before the query.
59 | The settings are:
60 |
61 | [width="50%",cols="1m,5"]
62 | |===
63 | | hide | Hide the query. The reader can still expand it to see it.
64 | Useful for long queries like setting up initial data.
65 | | setup | Initialize the console with this query.
66 | | output | Show the output from the query.
67 | The output is always there, but this option makes it visible at page load for this query.
68 | |===
69 |
70 | Let's try all the settings together, which means this query will be used to initialize the console, it will be hidden, and the raw output will be shown:
71 |
72 | //hide
73 | //setup
74 | //output
75 | [source,cypher]
76 | ----
77 | CREATE (me:Person {name:'Me'})-[r:LIKES]->(neo4j:Database {name:'Neo4j',link:'http://neo4j.com'})
78 | RETURN me.name, r, neo4j
79 | ----
80 |
81 | which becomes:
82 |
83 | //hide
84 | //setup
85 | //output
86 | [source,cypher]
87 | ----
88 | CREATE (me:Person {name:'Me'})-[r:LIKES]->(neo4j:Database {name:'Neo4j',link:'http://neo4j.com'})
89 | RETURN me.name, r, neo4j
90 | ----
91 |
92 |
93 | == Show a graph visualization
94 |
95 | The visualization is based on the **database contents** after the preceding query in the page.
96 |
97 | +//graph+
98 |
99 | becomes:
100 |
101 | //graph
102 |
103 |
104 | == Show a graph visualization of query results
105 |
106 | [source,cypher]
107 | ----
108 | MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what)
109 | RETURN who,likes,what
110 | ----
111 |
112 | The visualization is based on the **results** returned by the preceding query in the page.
113 |
114 | +//graph_result+
115 |
116 | becomes:
117 |
118 | //graph_result
119 |
120 | == Show a result table for a query
121 |
122 | This will show a result table for the preceding query.
123 | Properties/Cells that are URLs are rendered as links and image URLs are rendered as inline images.
124 |
125 | [source,cypher]
126 | ----
127 | MATCH (who:Person {name:'Me'})-[likes:LIKES]->(what)
128 | RETURN who,likes,what, who.name, what.name, what.link
129 | ----
130 |
131 | +//table+
132 |
133 | becomes:
134 |
135 | //table
136 |
137 | == Query with error
138 |
139 | This is what happens if a query causes an error:
140 |
141 | [source,cypher]
142 | ----
143 | CREATE (n:QueryLanguage {name:'cypher'}
144 | ----
145 |
146 | == Include a query console
147 |
148 | This places the Cypher console where you want it and opens it by default.
149 | A console button to hide or show the console is included either way.
150 |
151 | +//console+
152 |
153 | becomes:
154 |
155 | //console
156 |
157 | == Basic AsciiDoc formatting
158 |
159 | [width="50%",cols="1m,1a"]
160 | |===
161 | | \_Italic_ | _Italic_
162 | | \*Bold* | *Bold*
163 | | \`Monospace` | `Monospace`
164 | | `http://www.neo4j.org/` | http://www.neo4j.org/
165 | | `http://www.neo4j.org/[neo4j.org]` | http://www.neo4j.org/[neo4j.org]
166 | | `link:./?5956246[Link to a GraphGist]` | link:./?5956246[Link to a GraphGist]
167 | |===
168 |
169 | Document Info:
170 |
171 | ----
172 | = Graph Gist Title
173 | :neo4j-version: 2.3
174 | :author: Author Name
175 | :twitter: twitterhandle
176 | ----
177 |
178 | Headings:
179 |
180 | = Heading 1
181 | == Heading 2
182 | === Heading 3
183 |
184 | Images:
185 |
186 | image::http://assets.neo4j.org/img/still/cineasts.gif[]
187 |
188 | image::http://assets.neo4j.org/img/still/cineasts.gif[]
189 |
190 | ----
191 | * Item 1
192 | ** Item 1.1
193 | * Item 2
194 | ----
195 |
196 | * Item 1
197 | ** Item 1.1
198 | * Item 2
199 |
200 | ----
201 | . First
202 | . Second
203 | ----
204 |
205 | . First
206 | . Second
207 |
208 | Monospaced block: indent lines with one space.
209 |
210 | Tables are well supported.
211 | See http://asciidoctor.org/docs/asciidoc-quick-reference/[AsciiDoc Quick Reference] for information on that and more.
212 |
--------------------------------------------------------------------------------
/uc-search/graphgist_water.adoc:
--------------------------------------------------------------------------------
1 | = Piping Water
2 | :neo4j-version: 2.3.0
3 | :author: Shaun Daley
4 | :twitter: @shaundaley1
5 | :tags: resources
6 | :domain: Shutting Valves and Migrating Infrastructure
7 |
8 | :toc:
9 |
10 | == Inspiration
11 |
12 | London's antique water distribution network is infamous: it loses a http://www.theguardian.com/commentisfree/2012/may/08/water-industry-pipes-scandal[quarter of the water] supplied to London (spilt into the ground). Consequence: http://www.bbc.co.uk/news/10213835[desalination], massive additional CO~2~ emissions, road congestion caused by too many emergency excavations and very high water prices for consumers.
13 | London's case is severe but not atypical: most cities suffer from the same underlying infrastructure problem.
14 | Pipes and valves buried below busy urban streets are inherently difficult and expensive to maintain.
15 | Inaccessibility, lack of information, failure to efficiently process data and the high cost of each human intervention in legacy systems all compound to undermine efficient resource distribution.
16 |
17 | Modern wireless networking offers the first crucial part in reducing the human time and cost of maintenance, and avoiding environmental damage: modern valves can be remotely (and even autonomously) operated to isolate components and pipe sections; modern flow sensors can transmit cross-section flow rates minute-by-minute.
18 | The remaining challenge is to log, model and process these resources on a city scale (London sized) with several hundred million components (pipe sections, valves, pumps, flow sensors, outlets, sources, etc) and the sparse, evolving relations between those components.
19 | Neo4j fits this bill perfectly.
20 |
21 | To illustrate the domain, we'll eliminate most complexity and focus on a single (but ubiquitous) problem.
22 |
23 |
24 | _Using the Design for Queryability modeling approach by http://twitter.com/ianrobinson[Ian Robinson]_
25 |
26 | == Illustrating the Domain
27 |
28 | === Application/End-User Goals
29 |
30 | ____
31 | *As an* engineer for a water utility
32 |
33 | *I want* to know the accessible (remotely controllable, man-hole-accessible or excavation-accessible) valves for shutting off a set of components.
34 |
35 | *So that* we can rapidly isolate a burst pipe, leaking valve or a set of components for replacement.
36 | ____
37 |
38 | === Questions To Ask of the Domain
39 |
40 | ____
41 | What is the set of valves that must be closed to isolate the smallest possible part of the network including a set of components?
42 | What is the minimum set of network controllable (or manhole-accessible) valves that must be closed to isolate a set of components?
43 | ____
44 |
45 |
46 | === Identify Nodes
47 |
48 | * Component: Pipes, sources, outlets, pumps, flow sensors, et cetera
49 | * Connection
50 | * Valve
51 |
52 | === Identify Relationships Between Nodes
53 |
54 | ----
55 | (:Component)<-[:CONNECTS]-(:Connection)-[:CONNECTS]->(:Component)
56 | (:Valve)-[:CLOSES]->(:Connection)
57 | ----
58 |
59 | Valves are distinct objects with their own properties, e.g. IP address, API key, state information, and whether they are manually closed clockwise or anti-clockwise (both valve closing directions exist and are widely prevalent in UK legacy water distribution infrastructure).
60 | For efficiency of querying, the valve's access type is also duplicated in the type of the connecting relationship:
61 |
62 | ----
63 | (:Valve {access:'API'})-[:CLOSES]->(:Connection)-[:API_CONNECTS]->(:Component)
64 | (:Valve {access:'Manhole'})-[:CLOSES]->(:Connection)-[:MANHOLE_CONNECTS]->(:Component)
65 | (:Valve {access:'Excavation'})-[:CLOSES]->(:Connection)-[:EXCAV_CONNECTS]->(:Component)
66 | ----
67 |
68 | == Candidate Data Model
69 |
70 | //hide
71 | //setup
72 | [source,cypher]
73 | ----
74 | CREATE
75 | (burstPipe:Component{name:"BurstPipe"}),
76 | (pipe1:Component),
77 | (pipe2:Component),
78 | (pipe3:Component),
79 | (pipe4:Component),
80 | (pipe5:Component),
81 | (pipe6:Component),
82 | (pipe7:Component),
83 | (pipe8:Component),
84 | (pipe9:Component),
85 | (pipe10:Component),
86 | (pipe11:Component),
87 | (connection1:Connection),
88 | (connection2:Connection),
89 | (connection3:Connection),
90 | (connection4:Connection),
91 | (connection5:Connection),
92 | (connection6:Connection),
93 | (connection7:Connection),
94 | (connection8:Connection),
95 | (connection9:Connection),
96 | (connection10:Connection),
97 | (connection11:Connection),
98 | (valve1:Valve {access:'API'}),
99 | (valve2:Valve {access:'Excavation'}),
100 | (valve3:Valve {access:'API'}),
101 | (valve4:Valve {access:'Manhole'}),
102 | (valve5:Valve {access:'API'}),
103 | (valve6:Valve {access:'API'}),
104 | (valve7:Valve {access:'API'}),
105 | (valve8:Valve {access:'API'}),
106 | (connection1)-[:API_CONNECTS]->(burstPipe),
107 | (connection1)-[:API_CONNECTS]->(pipe1),
108 | (connection2)-[:EXCAV_CONNECTS]->(burstPipe),
109 | (connection2)-[:EXCAV_CONNECTS]->(pipe2),
110 | (connection3)-[:CONNECTS]->(pipe2),
111 | (connection3)-[:CONNECTS]->(pipe3),
112 | (connection4)-[:API_CONNECTS]->(pipe2),
113 | (connection4)-[:API_CONNECTS]->(pipe4),
114 | (connection5)-[:MANHOLE_CONNECTS]->(pipe3),
115 | (connection5)-[:MANHOLE_CONNECTS]->(pipe5),
116 | (connection6)-[:API_CONNECTS]->(pipe3),
117 | (connection6)-[:API_CONNECTS]->(pipe6),
118 | (connection7)-[:CONNECTS]->(pipe5),
119 | (connection7)-[:CONNECTS]->(pipe7),
120 | (connection8)-[:CONNECTS]->(pipe5),
121 | (connection8)-[:CONNECTS]->(pipe8),
122 | (connection9)-[:API_CONNECTS]->(pipe5),
123 | (connection9)-[:API_CONNECTS]->(pipe9),
124 | (connection10)-[:API_CONNECTS]->(pipe7),
125 | (connection10)-[:API_CONNECTS]->(pipe10),
126 | (connection11)-[:API_CONNECTS]->(pipe8),
127 | (connection11)-[:API_CONNECTS]->(pipe11),
128 | (valve1)-[:CLOSES]->(connection1),
129 | (valve2)-[:CLOSES]->(connection2),
130 | (valve3)-[:CLOSES]->(connection4),
131 | (valve4)-[:CLOSES]->(connection5),
132 | (valve5)-[:CLOSES]->(connection6),
133 | (valve6)-[:CLOSES]->(connection9),
134 | (valve7)-[:CLOSES]->(connection10),
135 | (valve8)-[:CLOSES]->(connection11)
136 | RETURN *
137 | ----
138 | // graph
139 |
140 | === Isolate the Burst Pipe Using Only Remote Calls to API-Accessible Valves
141 |
142 | [source,cypher]
143 | ----
144 | MATCH (burstPipe:Component {name:'BurstPipe'})
145 | MATCH (burstPipe)-[:CONNECTS|EXCAV_CONNECTS|MANHOLE_CONNECTS*0..]-()-[:API_CONNECTS]-(h)-[:CLOSES]-(v {access:'API'})
146 | RETURN v
147 | ----
148 | // table
149 |
150 | === Isolate the Burst Pipe Using Manhole-Accessible and API-Accessible Valves
151 |
152 | [source,cypher]
153 | ----
154 | MATCH (burstPipe:Component {name:'BurstPipe'})
155 | MATCH (burstPipe)-[:CONNECTS|EXCAV_CONNECTS*0..]-()-[:MANHOLE_CONNECTS|API_CONNECTS]-(h)-[:CLOSES]-(v)
156 | RETURN v
157 | ----
158 | // table
159 |
160 | === Isolate the Burst Pipe Using Any Existing Valves
161 |
162 | [source,cypher]
163 | ----
164 | MATCH (burstPipe:Component {name:'BurstPipe'})
165 | MATCH (burstPipe)-[:CONNECTS*0..]-()-[:EXCAV_CONNECTS|MANHOLE_CONNECTS|API_CONNECTS]-(h)-[:CLOSES]-(v)
166 | RETURN v
167 | ----
168 | // table
169 |
170 | == Extension
171 |
172 | For real-world application, some modifications are necessary (e.g. modelling state information in relationships, such as whether a connection is presently closed or scheduled for opening/closing; limiting query depth and reporting query failure when the maximum query depth is reached).
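
As one illustration of limiting query depth, the variable-length part of the pattern can be given an upper bound. This is a sketch only: the bound of 10 hops is an arbitrary assumption, and the start node is looked up by its label and `name` property.

[source,cypher]
----
// Same traversal as the "any existing valves" query, but capped at 10 unvalved hops.
MATCH (burstPipe:Component {name:'BurstPipe'})
MATCH (burstPipe)-[:CONNECTS*0..10]-()-[:EXCAV_CONNECTS|MANHOLE_CONNECTS|API_CONNECTS]-(h)-[:CLOSES]-(v)
RETURN DISTINCT v
----

Detecting that the bound was actually reached, and reporting that as a failure, would need additional logic that is not shown here.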
173 |
174 | In real-world application, extending the above model offers potential for adding greater value still:
175 |
176 | - estimating the marginal water savings from replacing any defined set of components
177 | - estimating the resilience of network water pressure to failure of specific pumps (both current and under hypothetical modifications to the network)
178 | - scheduling replacement or state-change of parts, and communicating this seamlessly (and automatically) in real time to all other parties that this might affect
179 |
180 | This approach is more generic than it may initially seem.
181 | Many resource problems involve networks of distribution in which many components interact across sparse relationships (electricity generation and distribution, natural gas, sewage, district-piped heating); rapid and efficient querying on these relationships is necessary for efficient resource allocation and better environmental and cost outcomes.
182 |
183 | //console
184 |
--------------------------------------------------------------------------------