├── .gitignore ├── img ├── architecture.jpg └── Goblin_Neo4J_Dependency_Graph.png ├── README.md ├── 004_Miner.md ├── 002_Neo4jDatabase.md ├── 001_Installation.md ├── 005_AddedValues.md ├── 003_WeaverAPI.md └── LICENSE /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /img/architecture.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Goblin-Ecosystem/goblinTutorial/HEAD/img/architecture.jpg -------------------------------------------------------------------------------- /img/Goblin_Neo4J_Dependency_Graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Goblin-Ecosystem/goblinTutorial/HEAD/img/Goblin_Neo4J_Dependency_Graph.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # goblinTutorial 2 | A tutorial on how to use the datasets and tools from the Goblin ecosystem. 3 | 4 | The Goblin framework (see Figure Below) is organized around a **Neo4J database** of the whole Maven Central dependency graph. 5 | This database can be created and updated incrementally using **Goblin Miner**. 6 | The database can be queried directly using Cypher (the Neo4j query language) or through the **Goblin Weaver** tool. 7 | 8 |
9 | Goblin framework 10 |
11 | 12 | We give here some guidance on using the Neo4J databases and the Weaver and Miner tools. 13 | 14 | - [Installing the data set and the tools](001_Installation.md) 15 | 16 | This tutorial will help you install a Neo4J database for an ecosystem dependency graph. 17 | Several versions of such database dumps are provided (see link in the tutorial), either "raw" or with added metrics pre-computed and associated with the graph nodes. 18 | 19 | - [Using the Neo4J database](002_Neo4jDatabase.md) 20 | 21 | This tutorial will help you understand the structure of the dependency graph database and gives an idea of the number of nodes and edges it contains. 22 | It also gives examples of queries you can send directly to the Neo4J engine (using Neo4J's Cypher query language). 23 | For programmatic access to the database, one can take the Weaver source code as a model. 24 | 25 | - [Using and extending the Weaver tool](003_WeaverAPI.md) 26 | 27 | This tutorial explains how to use and extend the Weaver tool, which computes and retrieves parts of the dependency graph enriched with added information. 28 | Some metrics are already supported by the Weaver (and included in some of the database dumps we provide), but one may extend it to include new ones. 29 | 30 | - [Regenerating a fresh dependency graph using the Miner](004_Miner.md) 31 | 32 | This tutorial is provided in case one wants to regenerate a fresh dependency graph database from scratch (e.g., a version more up to date with respect to Maven Central). 33 | 34 | - [Automatic and manual deletion, and populating the entire database with added values](005_AddedValues.md) 35 | 36 | This tutorial explains how added values are automatically deleted from the database, how to delete them manually, and how to populate the database with added values.
37 | -------------------------------------------------------------------------------- /004_Miner.md: -------------------------------------------------------------------------------- 1 | # GoblinMiner 2 | The Miner source code is available on [GitHub](https://github.com/Goblin-Ecosystem/goblinDependencyMiner) 3 | 4 | The Goblin Miner lets you generate and/or update a Maven Central dependency graph in a Neo4j database. 5 | 6 | To get data on all Maven releases, we use the Central index archive available [here](https://repo.maven.apache.org/maven2/.index/). 7 | First, the program downloads the most recent archive and unpacks it with the Maven Indexer CLI jar present in the lib folder. 8 | This creates a "central-lucene-index" folder at the root of the project during execution; the folder is deleted at the end of the program. 9 | Doc: https://maven.apache.org/repository/central-index.html 10 | 11 | ## Requirements 12 | - Java 17 13 | - Maven 14 | 15 | ## Configuration 16 | ### Configuration file 17 | To run the application, you need to edit the configuration file src/main/resources/configuration.yml: 18 | - **dataBaseExport:** the database(s) to export data to (Postgres, neo4J or both). 19 | - **update:** set to true to update an existing neo4j graph, or to false to generate a graph from scratch. 20 | - **thread:** the number of threads allocated to run the program. 21 | ### Database configuration 22 | #### Postgres 23 | To configure your Postgres database, put your database connection information in the src/main/resources/META-INF/persistence.xml file. 24 | #### Neo4J 25 | To configure your Neo4J database, put your database connection information in the src/main/resources/configuration.yml file. 26 | 27 | ## Run 28 | **Generating the graph requires a lot of memory**, so the JVM must be allowed to use up to 30 GB of heap at execution time:
29 | > _JAVA_OPTIONS="-Xmx30G" mvn clean compile exec:java 30 | 31 | ## Time 32 | Generation and updating can be very time-consuming. The times below were measured with a 12-thread configuration on a machine with the following characteristics: 33 | > OS: Red Hat Enterprise Linux
34 | > OS version: 8.7
35 | > 16 CPUs: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz
36 | > Memory: 64 GB
37 | 38 | Time to run the project from scratch on October 05, 2023: 4.1 days. 39 | Time to update our dataset from October 05, 2023 to October 14, 2023 (nine days): 1h06m. 40 | Time to update our dataset from April 14, 2023 to October 14, 2023 (six months): 6h23m. 41 | Time to update our dataset from October 14, 2022 to October 14, 2023 (one year): 11h27m. -------------------------------------------------------------------------------- /002_Neo4jDatabase.md: -------------------------------------------------------------------------------- 1 | # Neo4j database 2 | Database dumps are available on [Zenodo](https://doi.org/10.5281/zenodo.13734581) 3 | 4 | The dependency graph database is composed of two node types (for libraries and for their releases) and two edge types (from releases to their dependencies and from libraries to their releases). The nodes for libraries (type Artifact) contain the Maven id (g.a) and a boolean "found". The "found" boolean indicates whether the artifact has been found in the ecosystem; this makes it possible to keep dependency information even when the target artifact is not found on Maven Central. The nodes for releases (type Release) contain the Maven id (g.a.v), the release timestamp, and the version. The edges for dependencies (type dependency) go from Release nodes to Artifact nodes and contain the target version (which can be a range) and the scope (compile, test, etc.). The edges for versioning (type relationship_AR) go from Artifact nodes to Release nodes. 5 | 6 | ![](./img/Goblin_Neo4J_Dependency_Graph.png "Graph structure") 7 | 8 | The latest version of our dataset, dated August 30th, 2024, contains 15,117,217 nodes (658,078 libraries and 14,459,139 releases) and 134,119,545 edges (119,660,406 dependencies and 14,459,139 versioning edges).
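The counts above can be cross-checked with a little arithmetic: node and edge totals are the sums of their parts, and the number of versioning edges equals the number of releases, which is consistent with each release being linked to a single library. The Java sketch below (the `DatasetStats` class is ours, for illustration only) uses only the figures quoted in this section, not a live database, and derives two averages from them.

```java
import java.util.Locale;

public class DatasetStats {
    // Counts quoted for the dataset dated August 30th, 2024.
    static final long LIBRARIES = 658_078L;           // Artifact nodes
    static final long RELEASES = 14_459_139L;         // Release nodes
    static final long DEPENDENCY_EDGES = 119_660_406L;
    static final long VERSIONING_EDGES = 14_459_139L; // relationship_AR edges

    static long totalNodes() { return LIBRARIES + RELEASES; }

    static long totalEdges() { return DEPENDENCY_EDGES + VERSIONING_EDGES; }

    // Average number of outgoing dependency edges per Release node (all scopes).
    static double dependenciesPerRelease() { return (double) DEPENDENCY_EDGES / RELEASES; }

    // Average number of Release nodes per Artifact node.
    static double releasesPerLibrary() { return (double) RELEASES / LIBRARIES; }

    public static void main(String[] args) {
        System.out.println("nodes = " + totalNodes()); // nodes = 15117217
        System.out.println("edges = " + totalEdges()); // edges = 134119545
        System.out.println("versioning edges == releases: " + (VERSIONING_EDGES == RELEASES)); // true
        System.out.printf(Locale.ROOT, "dependencies per release = %.2f%n", dependenciesPerRelease()); // 8.28
        System.out.printf(Locale.ROOT, "releases per library = %.2f%n", releasesPerLibrary()); // 21.97
    }
}
```

So, on average, a release declares about 8.3 dependencies and a library has about 22 releases in this snapshot.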
9 | 10 | We also provide a second version of this dataset enriched with the Weaver metrics, which adds new “AddedValue” nodes containing the metrics (CVE (dated September 4, 2024), freshness, popularity and speed) to the database. This adds 44,035,495 new nodes. 11 | 12 | ## Neo4j Cypher querying 13 | Cypher is Neo4j’s declarative query language; more information is available [here](https://neo4j.com/docs/cypher-manual/current/queries/basic/). 14 | 15 | Here, we present basic examples of Cypher queries on the Neo4j Maven Central dependency graph. 16 | 17 | 18 | ### Get a Release Node 19 | ```cypher 20 | MATCH (r:Release) WHERE r.id='org.jgrapht:jgrapht-core:1.5.2' RETURN r 21 | ``` 22 | 23 | ### Get all Library versions 24 | To retrieve Release nodes only, use this query: 25 | ```cypher 26 | MATCH (a:Artifact) WHERE a.id='org.jgrapht:jgrapht-core' 27 | WITH a 28 | MATCH (a)-[e:relationship_AR]->(r) 29 | RETURN r 30 | ``` 31 | 32 | To retrieve the Artifact node, the edges and the Releases, use this query: 33 | ```cypher 34 | MATCH (a:Artifact) WHERE a.id='org.jgrapht:jgrapht-core' 35 | WITH a 36 | MATCH (a)-[e:relationship_AR]->(r) 37 | RETURN a, e, r 38 | ``` 39 | 40 | ### Get Release direct dependencies 41 | To get all direct dependencies of a Release, use this query: 42 | ```cypher 43 | MATCH (r:Release) WHERE r.id='org.jgrapht:jgrapht-core:1.5.2' 44 | WITH r 45 | MATCH (r)-[e:dependency]->(a) 46 | RETURN r, e, a 47 | ``` 48 | 49 | To filter dependencies, for example to remove test dependencies, use: 50 | ```cypher 51 | MATCH (r:Release) WHERE r.id='org.jgrapht:jgrapht-core:1.5.2' 52 | WITH r 53 | MATCH (r)-[e:dependency]->(a) WHERE e.scope<>'test' 54 | RETURN r, e, a 55 | ``` 56 | 57 | ### Get Release dependents 58 | Note that the target versions stored in dependency edges can be ranges (e.g. [1.0,2.0)). 59 | The query for dependents only matches the exact version string, so it does not return dependents that declare a range including 1.5.2.
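Since `targetVersion` is stored as a string, checking whether a declared range such as `[1.0,2.0)` includes a given release has to be done outside an exact-match query, for instance client-side after fetching all `dependency` edges that point to the artifact. The Java sketch below (the `VersionRange` class is hypothetical, not part of the Weaver) handles a single Maven-style range with inclusive `[`/`]` and exclusive `(`/`)` bounds, possibly open on one side, and compares versions as dot-separated integers. This is a simplifying assumption: Maven's real version ordering (qualifiers such as `-SNAPSHOT` or `-beta`) and union ranges like `[1.0,1.2),[1.3,)` are not supported, and a plain version such as `1.5.2` is treated here as an exact match, whereas Maven considers it only a soft requirement.

```java
import java.util.Arrays;

public class VersionRange {

    // Compares versions made of dot-separated integers ("1.5.2" vs "1.10").
    // Missing segments count as 0, so "1.5" equals "1.5.0". Qualifiers are not supported.
    static int compare(String a, String b) {
        int[] x = parse(a), y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    private static int[] parse(String v) {
        return Arrays.stream(v.trim().split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // True if `version` satisfies `spec`: either a plain version (treated as an exact match here)
    // or a single range such as "[1.0,2.0)", "(,1.0]", "[1.5,)" or "[1.5.2]".
    static boolean contains(String spec, String version) {
        spec = spec.trim();
        if (!spec.startsWith("[") && !spec.startsWith("(")) {
            return compare(spec, version) == 0;
        }
        boolean lowerInclusive = spec.charAt(0) == '[';
        boolean upperInclusive = spec.charAt(spec.length() - 1) == ']';
        String body = spec.substring(1, spec.length() - 1);
        int comma = body.indexOf(',');
        if (comma < 0) {
            // "[1.5.2]" pins exactly one version.
            return compare(body, version) == 0;
        }
        String lower = body.substring(0, comma).trim();
        String upper = body.substring(comma + 1).trim();
        if (!lower.isEmpty()) {
            int c = compare(version, lower);
            if (c < 0 || (c == 0 && !lowerInclusive)) return false;
        }
        if (!upper.isEmpty()) {
            int c = compare(version, upper);
            if (c > 0 || (c == 0 && !upperInclusive)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(contains("[1.0,2.0)", "1.5.2")); // true
        System.out.println(contains("[1.0,2.0)", "2.0"));   // false
        System.out.println(contains("1.5.2", "1.5.2"));     // true
        System.out.println(contains("[1.5,)", "1.4.9"));    // false
    }
}
```

Applied to the `targetVersion` of every `dependency` edge pointing to `org.jgrapht:jgrapht-core`, such a filter would also keep the dependents that declare a range including 1.5.2.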
60 | 61 | ```cypher 62 | MATCH (r:Release)-[d:dependency]->(a:Artifact) 63 | WHERE a.id = 'org.jgrapht:jgrapht-core' AND d.targetVersion = '1.5.2' 64 | RETURN r, d, a 65 | ``` 66 | 67 | ### Metrics-enriched database: get specific release CVE 68 | This query works on the dump enriched with metrics; here we simply look for the CVEs of a specific release. 69 | ```cypher 70 | MATCH (r:Release)-[a:addedValues]->(v:AddedValue) WHERE r.id='org.apache.logging.log4j:log4j-core:2.17.0' AND v.type='CVE' RETURN v.value 71 | ``` 72 | 73 | ## Neo4j Programmatic querying 74 | For more complicated queries (such as iteration or recursion), it may be simpler to work programmatically, using the Neo4j drivers available for various programming languages. 75 | 76 | The [Weaver code](https://github.com/Goblin-Ecosystem/goblinWeaver) shows examples of how to use the Neo4j database in Java. 77 | -------------------------------------------------------------------------------- /001_Installation.md: -------------------------------------------------------------------------------- 1 | # Installation 2 | 3 | We explain later how to generate or update a dependency graph database. For now, we assume you are working with a snapshot provided online. 4 | 5 | ## Requirements 6 | 7 | The Goblin framework relies on a Neo4J graph database. You can run it in a container (Docker), with Neo4J Desktop (on your machine), or with Neo4J as a server application (on a remote machine). 8 | 9 | The requirements are: 10 | 11 | - if you install and run everything using Docker: Docker 12 | - if you install manually and run Neo4J using Neo4J Desktop: Java 17 to run the Goblin tools. 13 | - if you install manually and run Neo4J as a server application: Java 17 **and** Java 11 (Neo4J as a server application does not seem to work with Java 17).
14 | 15 | ## Installation using Docker 16 | 17 | We provide Docker files to install the Graph Database and Goblin Weaver [here](https://github.com/Goblin-Ecosystem/Neo4jWeaverDocker). 18 | Please note that with this setup you cannot choose the graph database version by default (you may modify `Dockerfile.neo4j` to do so). Furthermore, the user name is `neo4j` and the password is set to `Password1` (this can also be changed in `Dockerfile.neo4j`). 19 | 20 | For the first run: 21 | 22 | ```sh 23 | docker-compose up --build 24 | ``` 25 | 26 | For other runs: 27 | 28 | ```sh 29 | docker-compose up 30 | ``` 31 | 32 | Then, proceed to the "Test of the Installation" section below to check your installation. 33 | 34 | ## Manual Installation of the Graph Database 35 | 36 | 1. install [Neo4J](https://neo4j.com/product/neo4j-graph-database/), see [here](https://neo4j.com/docs/operations-manual/4.4/installation/) 37 | 38 | You can install either the Desktop application (if you are experimenting on your computer) or the server application (if you are running on a server machine). 39 | 40 | **Important:** if you are using the Desktop application, you will be able to select the DBMS version when importing the database dump. If you are using the server application, you **must** make sure that you install version 4.x. 41 | 42 | 2. download the database dump from [here](https://doi.org/10.5281/zenodo.13734581) 43 | 44 | We release newer versions from time to time. See the version list there. 45 | 46 | 3. import the database dump into Neo4J 47 | 48 | a. Desktop application 49 | 50 | - open Neo4J Desktop 51 | - an example project must have been created and opened (if not, please create one) 52 | - click on the "Add" dropdown menu and select "File" 53 | - find and select your dump file 54 | - put the cursor over the dump file name in the project file list, select "..."
and click on "Create new DBMS from dump" 55 | - give a name to your new database, set up a password (and keep it somewhere safe), and **select the latest 4.x version in the "Version" dropdown** 56 | - click on the "Create" button 57 | - the database appears at the top of the project and you can click "Start" to start it 58 | 59 | b. Server application 60 | 61 | - Stop the database if it's running: 62 | ```sh 63 | sudo systemctl stop neo4j 64 | ``` 65 | - Load the dump: 66 | ```sh 67 | sudo -u neo4j neo4j-admin load --from=/path/to/dump.dump --database=neo4j --force 68 | ``` 69 | - Start neo4j: 70 | ```sh 71 | sudo systemctl start neo4j 72 | ``` 73 | - To access the database, issue the command: 74 | ```sh 75 | cypher-shell -u neo4j -p your_password 76 | ``` 77 | 78 | ## Manual Installation of Goblin Weaver 79 | 80 | 1. Make sure that the Neo4j database containing the graph is running. 81 | 2. Download the Goblin Weaver jar [here](https://github.com/Goblin-Ecosystem/goblinWeaver/releases) 82 | 3. Open a terminal and run the following command (if needed, update the Neo4j user, password and URI): 83 | ```sh 84 | java -Dneo4jUri="bolt://localhost:7687/" -Dneo4jUser="neo4j" -Dneo4jPassword="Password1" -jar goblinWeaver-2.1.0.jar 85 | ``` 86 | 87 | The program will first download the osv.dev dataset and create a folder called "osvData"; this takes approximately 3 minutes. 88 | For other runs, **if you don't want to update the CVE data**, you can add the "noUpdate" argument to the java -jar command like this: 89 | ```sh 90 | java -Dneo4jUri="bolt://localhost:7687/" -Dneo4jUser="neo4j" -Dneo4jPassword="Password1" -jar goblinWeaver-2.1.0.jar noUpdate 91 | ``` 92 | 93 | ## Test of the Installation 94 | 95 | ### Neo4j dataset 96 | To verify that the Neo4j dataset works, open a terminal and run the following command: 97 | 98 | **Important:** If needed, update the Neo4j user, password and URI.
99 | ```sh 100 | curl -H "Content-Type: application/json" -X POST \ 101 | -u neo4j:Password1 \ 102 | -d '{"statements": [{"statement": "MATCH (n) RETURN count(n)"}]}' \ 103 | http://localhost:7474/db/neo4j/tx/commit 104 | ``` 105 | 106 | If it works, you should get a response like this (the count depends on the dump version): 107 | ```json 108 | {"results":[{"columns":["count(n)"],"data":[{"row":[14077982],"meta":[null]}]}],"errors":[]} 109 | ``` 110 | 111 | ### Weaver 112 | If the Weaver has started without displaying an error, it is ready for use. 113 | The Swagger documentation should then be available at this link: 114 | [http://localhost:8080/swagger-ui/index.html](http://localhost:8080/swagger-ui/index.html) 115 | 116 | ## Accessibility 117 | - By default, the Neo4j web interface is accessible at http://localhost:7474 118 | - By default, the Weaver REST API is accessible at http://localhost:8080 119 | -------------------------------------------------------------------------------- /005_AddedValues.md: -------------------------------------------------------------------------------- 1 | # Added values 2 | As indicated on the [003_WeaverAPI.md](003_WeaverAPI.md) page of the tutorial, the Weaver can add values to the graph. 3 | This page explains these new nodes of the graph in more detail. 4 | 5 | ## Deletion of added values 6 | ### Automatic deletion 7 | The added values can be automatically deleted by the Weaver. 8 | #### CVE 9 | CVE entries depend on the [OSV](https://osv.dev/) database. Whenever this database is updated, all AddedValue nodes of type CVE are deleted from the database. 10 | 11 | ⚠️ The OSV database is updated automatically when the Weaver starts.
12 | To prevent it from updating, add the `noUpdate` argument: 13 | ```sh 14 | java -Dneo4jUri="bolt://localhost:7687/" -Dneo4jUser="neo4j" -Dneo4jPassword="Password1" -jar goblinWeaver-2.1.0.jar noUpdate 15 | ``` 16 | 17 | #### FRESHNESS, POPULARITY_1_YEAR, SPEED 18 | These three metrics are calculated only from the nodes present in the graph; they do not depend on external data. 19 | Thus, when the Weaver starts, it checks whether the graph has been updated and, if so, deletes these added values. 20 | 21 | ### Manual deletion 22 | #### Weaver 23 | The Weaver provides routes for deleting added values from the database. 24 | 25 | To delete all added values, use the following route: 26 | **Method**: `DELETE` 27 | **ROUTE**: `/addedValues` 28 | 29 | To delete one or more specific added value(s), use the following route: 30 | **Method**: `DELETE` 31 | **ROUTE**: `/addedValue` 32 | **Body example**: 33 | 34 | ```json 35 | { 36 | "addedValues": 37 | ["CVE", "FRESHNESS"] 38 | } 39 | ``` 40 | 41 | #### Cypher 42 | You can also delete AddedValue nodes directly with Cypher in Neo4j. 43 | Example of a Cypher query: 44 | ```cypher 45 | :auto MATCH (n:AddedValue) WHERE n.type IN ['CVE', 'FRESHNESS'] CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 10000 ROWS; 46 | ``` 47 | 48 | Here, the deletion runs in batches of 10,000 rows to avoid exhausting memory, as there can be millions of nodes of this type. The `:auto` prefix (a Neo4j Browser command) runs the query in an implicit, auto-commit transaction, which `CALL { ... } IN TRANSACTIONS` requires. For comparison, the same query without batching would look like this: 49 | ```cypher 50 | MATCH (n:AddedValue) WHERE n.type IN ['CVE', 'FRESHNESS'] WITH n DETACH DELETE n 51 | ``` 52 | 53 | ## Addition of added values 54 | As mentioned on the [003_WeaverAPI.md](003_WeaverAPI.md) page, when you call the Weaver and specify a value to add, it computes the value and automatically adds it to your Neo4j database.
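To give an idea of what such a computation involves, the FRESHNESS value (described in [003_WeaverAPI.md](003_WeaverAPI.md) as the number of more recent releases and the time elapsed, in milliseconds, between a release and the most recent one) only needs the release timestamps of one artifact. The Java sketch below is an illustration of that definition, not the Weaver's implementation; the `FreshnessSketch` class and its `Freshness` record are hypothetical, and "more recent" is assumed to mean "later timestamp". With the jgrapht-core timestamps from the `/release/newVersions` example, it reproduces the values reported there for 1.5.1.

```java
import java.util.List;

public class FreshnessSketch {

    // Hypothetical holder mirroring the two fields of the FRESHNESS added value.
    record Freshness(long numberMissedRelease, long outdatedTimeInMs) {}

    // releaseTimestamps: timestamps (ms) of all releases of one artifact.
    // releaseTimestamp: timestamp (ms) of the release being evaluated.
    static Freshness compute(List<Long> releaseTimestamps, long releaseTimestamp) {
        long missed = releaseTimestamps.stream().filter(t -> t > releaseTimestamp).count();
        long latest = releaseTimestamps.stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(releaseTimestamp);
        // A release that is itself the most recent one is not outdated.
        long outdated = Math.max(0L, latest - releaseTimestamp);
        return new Freshness(missed, outdated);
    }

    public static void main(String[] args) {
        // Timestamps of jgrapht-core 1.5.1 and 1.5.2, from the /release/newVersions example.
        List<Long> timestamps = List.of(1616171280000L, 1683053308000L);
        System.out.println(compute(timestamps, 1616171280000L));
        // Freshness[numberMissedRelease=1, outdatedTimeInMs=66882028000]
    }
}
```

Earlier releases of the artifact can be included in the list without changing the result, since only later timestamps are counted.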
55 | 56 | The [Zenodo](https://doi.org/10.5281/zenodo.13734581) archive contains a database with the CVE, FRESHNESS, POPULARITY, and SPEED metrics already pre-calculated. However, if you update the database or the CVE data, these values are automatically deleted (as they are no longer up to date). Also, if you want to add a custom metric or use aggregated metrics, you may want to populate the entire database rather than calculate them on the fly. 57 | 58 | Here is a simple example of Java code that populates the database with a specific added value using the Weaver: 59 | ```java 60 | import org.json.simple.JSONArray; 61 | import org.json.simple.JSONObject; 62 | 63 | import java.io.IOException; 64 | import java.io.OutputStream; 65 | import java.net.HttpURLConnection; 66 | import java.net.URL; 67 | import java.nio.charset.StandardCharsets; 68 | import java.util.List; 69 | 70 | public class App 71 | { 72 | private static final String API_URL = "http://localhost:8080"; 73 | private static final int nodesMaxId = XXX; // replace XXX with the maximum node id (see the query below) 74 | private static final int batchSize = 50000; 75 | 76 | public static void main( String[] args ) 77 | { 78 | int startId = 0; 79 | for (int currentStart = startId; currentStart <= nodesMaxId; currentStart += batchSize) { 80 | int currentEnd = currentStart + batchSize - 1; 81 | String cypherQuery = String.format( 82 | "MATCH (n:Release) " + 83 | "WHERE id(n) >= %d AND id(n) <= %d " + 84 | "RETURN n;", 85 | currentStart, currentEnd 86 | ); 87 | System.out.println(currentEnd + "/" + nodesMaxId); 88 | cypherQuery(cypherQuery, List.of("CVE")); 89 | } 90 | } 91 | 92 | public static void cypherQuery(String query, List<String> addedValues){ 93 | String apiRoute = "/cypher"; 94 | JSONObject bodyJsonObject = new JSONObject(); 95 | bodyJsonObject.put("query", query); 96 | JSONArray jsonArray = new JSONArray(); 97 | jsonArray.addAll(addedValues); 98 | bodyJsonObject.put("addedValues", jsonArray); 99 | executeQuery(bodyJsonObject, apiRoute); 100 | } 101 | 102 | private
static void executeQuery(JSONObject bodyJsonObject, String apiRoute){ 103 | try { 104 | URL url = new URL(API_URL+apiRoute); 105 | HttpURLConnection http = (HttpURLConnection) url.openConnection(); 106 | http.setRequestMethod("POST"); 107 | http.setRequestProperty("Content-Type", "application/json; charset=UTF-8"); 108 | http.setRequestProperty("Accept", "application/json"); 109 | http.setDoOutput(true); 110 | 111 | byte[] out = bodyJsonObject.toString().getBytes(StandardCharsets.UTF_8); 112 | 113 | try (OutputStream stream = http.getOutputStream()) { 114 | stream.write(out); 115 | } 116 | if(http.getResponseCode() != 200){ 117 | System.out.println("Error with query: \n "+bodyJsonObject); 118 | } 119 | http.disconnect(); 120 | } catch (IOException e) { 121 | System.out.println("Unable to connect to API:\n" + e); 122 | } 123 | } 124 | } 125 | ``` 126 | This code populates the database with one or more metrics by iterating over the node ids in batches of 50,000 to avoid overloading memory. 127 | 128 | To use this code, you need to make the following modifications: 129 | - In the Cypher query `MATCH (n:Release)`, specify the type of node to which the added values apply, such as `Release` or `Artifact`. 130 | - Set `List.of("CVE")` to the added value(s) you want to calculate. 131 | - Finally, set the `nodesMaxId` variable to the maximum Neo4j node ID present in the database.
To find this value, execute the following Cypher query in Neo4j: 132 | ```cypher 133 | MATCH (n) 134 | RETURN max(id(n)) AS maxId 135 | ``` -------------------------------------------------------------------------------- /003_WeaverAPI.md: -------------------------------------------------------------------------------- 1 | # Weaver API 2 | The Weaver source code is available on [GitHub](https://github.com/Goblin-Ecosystem/goblinWeaver) 3 | 4 | The Goblin Weaver REST API is an alternative to direct database access with the Cypher language, and it also enriches the dependency graph with new information on demand. Memoization avoids re-computing enrichments as long as the base graph is not recomputed or updated. For this, new kinds of nodes (type AddedValue) and edges (type addedValues, from an Artifact or Release node to an AddedValue node) are stored in the graph database. Be careful: since the graph is large, calculating metrics (especially aggregated ones) for the whole graph can be time-consuming. 5 | 6 | ## Added values 7 | Added values are assigned to a certain type of node (Artifact or Release). 8 | The page [005_AddedValues.md](005_AddedValues.md) gives more information about deleting and adding added values. 9 | Currently, the Weaver can compute the following added values: 10 | 11 | ### Release nodes added values 12 | The Release added values can be computed either locally (*e.g.*, the CVEs of a release) or aggregated (*e.g.*, the CVEs of a release and of all its direct and indirect dependencies). 13 | To use the aggregated value of a metric, add "_AGGREGATE" after the metric name. 14 | For example, to get the aggregated value of the "CVE" metric, use "CVE_AGGREGATE". 15 | 16 | #### CVE 17 | [Common Vulnerabilities and Exposures (CVE)](https://cve.mitre.org/) is a dictionary of public information on security vulnerabilities.
18 | We use the [osv.dev](https://osv.dev/) dataset to get CVE information. 19 | For each CVE, our added value contains its name, CWE (type of vulnerability) and severity (low, moderate, high, critical). 20 | 21 | #### FRESHNESS 22 | For a specific release, the number of more recent releases available and the time elapsed, in milliseconds, between that release and the most recent release. 23 | More information about freshness [here](https://ieeexplore.ieee.org/abstract/document/7202955). 24 | 25 | #### POPULARITY_1_YEAR 26 | To compute the popularity of a Release, we count the dependents of that version of the library over a one-year window (going back from the date of the dependency graph). 27 | There are many ways to calculate the popularity of a release, and you can extend the Weaver to create your own, or to modify the one-year window we've defined. 28 | 29 | ### Artifact nodes added values 30 | #### SPEED 31 | The average number of releases per day of an artifact. More information in our [BENEVOL 2022 paper](https://hal.science/hal-03725099/document). 32 | 33 | ## Run the Weaver 34 | ### Manual Installation of Goblin Weaver 35 | 36 | 1. Make sure that the Neo4j database containing the graph is running. 37 | 2. Open a terminal and run the following command (if needed, update the Neo4j user, password and URI): 38 | ```sh 39 | java -Dneo4jUri="bolt://localhost:7687/" -Dneo4jUser="neo4j" -Dneo4jPassword="Password1" -jar goblinWeaver-2.1.0.jar 40 | ``` 41 | 42 | The program will first download the osv.dev dataset and create a folder called "osvData"; this takes approximately 3 minutes.
43 | For other runs, **if you don't want to update the CVE data**, you can add the "noUpdate" argument to the java -jar command like this: 44 | ```sh 45 | java -Dneo4jUri="bolt://localhost:7687/" -Dneo4jUser="neo4j" -Dneo4jPassword="Password1" -jar goblinWeaver-2.1.0.jar noUpdate 46 | ``` 47 | 48 | ## Use the Weaver 49 | Pre-defined routes are available, but you can also send your own Cypher queries directly to the API. 50 | You can add a list of added values to the request body, and the Weaver will enrich the result with them. 51 | 52 | The Weaver API comes with its Swagger documentation: http://localhost:8080/swagger-ui/index.html 53 | 54 | ### Example: new versions of a release with metrics 55 | Here, we want to know which versions of jgrapht-core are available after 1.5.0, together with their CVE, freshness and popularity information. 56 | 57 | To do that, we use a pre-defined route: 58 | 59 | **Method**: `POST` 60 | **ROUTE**: `/release/newVersions` 61 | **Body**: 62 | 63 | ```json 64 | { 65 | "groupId": "org.jgrapht", 66 | "artifactId": "jgrapht-core", 67 | "version": "1.5.0", 68 | "addedValues": ["CVE", "FRESHNESS", "POPULARITY_1_YEAR"] 69 | } 70 | ``` 71 | 72 | **Response** 73 | ```json 74 | { 75 | "nodes": [ 76 | { 77 | "cve": [], 78 | "id": "org.jgrapht:jgrapht-core:1.5.1", 79 | "nodeType": "RELEASE", 80 | "freshness": { 81 | "numberMissedRelease": "1", 82 | "outdatedTimeInMs": "66882028000" 83 | }, 84 | "version": "1.5.1", 85 | "popularity_1_year": 105, 86 | "timestamp": 1616171280000 87 | }, 88 | { 89 | "cve": [], 90 | "id": "org.jgrapht:jgrapht-core:1.5.2", 91 | "nodeType": "RELEASE", 92 | "freshness": { 93 | "numberMissedRelease": "0", 94 | "outdatedTimeInMs": "0" 95 | }, 96 | "version": "1.5.2", 97 | "popularity_1_year": 953, 98 | "timestamp": 1683053308000 99 | } 100 | ] 101 | } 102 | ``` 103 | 104 | ### Example: Cypher query 105 | Routes can be used to spare clients from writing Cypher, or to build more complex queries (e.g.
a sequence of queries with processing between them). 106 | It is also possible to send Cypher directly to the Weaver and ask it to enrich the result. 107 | 108 | For example, the Cypher query below retrieves all log4j-core versions and adds the CVEs associated with each of them: 109 | 110 | **Method**: `POST` 111 | **ROUTE**: `/cypher` 112 | **Body**: 113 | ```json 114 | { 115 | "query": "MATCH (a:Artifact) WHERE a.id='org.apache.logging.log4j:log4j-core' WITH a MATCH (a)-[e:relationship_AR]->(r) RETURN r", 116 | "addedValues": ["CVE"] 117 | } 118 | ``` 119 | 120 | ## Extend the Weaver 121 | The Weaver is designed to be extensible, allowing users to easily add the information their research needs. 122 | 123 | The Weaver source code is available on [GitHub](https://github.com/Goblin-Ecosystem/goblinWeaver) 124 | 125 | ### Build 126 | 127 | **Requirements:** Java 17, Maven 128 | 129 | To build the project, run: 130 | ```sh 131 | mvn clean package 132 | ``` 133 | 134 | ### Add new added values 135 | 1. Go to weaver/addedValue/AddedValueEnum and add the name of your new value. 136 | 2. Fill in the three methods of this enumeration for your new added value. 137 | 3. Create a new class that extends weaver/addedValue/AbstractAddedValue. 138 | 4. (optional) If you also want to create an aggregated value of your new added value, create a new class that extends your previous new class and implements the "AggregateValue" interface. 139 | 5. Write your internal logic in this new class. 140 | 141 | ### Add new routes 142 | 1. The route files are located in: `src/main/java/com/cifre/sap/su/goblinWeaver/api/controllers` 143 | 2. There is one class per route type (release, artifact, graph, cypher). Open the file corresponding to your route, or create a new one. 144 | 3.
We use the Spring framework to build our API; you only need to create a new method with the following structure: 145 | ```java 146 | @Operation( 147 | description = "My description", 148 | summary = "My summary" 149 | ) 150 | @PostMapping("/my/route") 151 | public JSONObject myNewRoute(@RequestBody MyQuery myQuery) { 152 | // Your logic here: query the graph and build the JSON response 153 | return new JSONObject(); 154 | } 155 | ``` 156 | 157 | The body content (defined by `@RequestBody`) is described by the classes present in `src/main/java/com/cifre/sap/su/goblinWeaver/api/entities`. You can reuse an existing class or create a new one. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files.
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | --------------------------------------------------------------------------------