├── .gitignore
├── LICENSE
├── MANIFEST_FILE_NOTE.md
├── README.md
├── endpoints
    ├── AUTHENTICATION.md
    ├── BATCH_GENERATE_PRESIGNED_URLS.md
    ├── GET_DOWNLOAD_LIMITS.md
    ├── GET_MY_PACKAGES.md
    ├── GET_PACKAGE.md
    ├── GET_PACKAGES.md
    ├── GET_PACKAGE_FILES.md
    ├── GET_PACKAGE_FILES_FROM_S3.md
    ├── GET_PACKAGE_FILE_DOWNLOAD_CREDENTIALS.md
    ├── GET_SHARED_PACKAGES.md
    └── README.md
└── example
    ├── JAVA.md
    ├── PYTHON.md
    └── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | **/.idea/
2 | **/*.iml


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2021 NIMH Data Archive
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/MANIFEST_FILE_NOTE.md:
--------------------------------------------------------------------------------
 1 | More on Manifest Files
 2 | ======================
 3 | There are a few key differences between Package Service
 4 | and Data Manager, which those moving manifest file-based
 5 | downloaders between the two tools may have noticed.
 6 | We have tried to make the service as third-party friendly as
 7 | possible, which includes changes to how files are indexed
 8 | and downloaded. It is therefore important to know the
 9 | differences you will encounter and how to work around
10 | them.  
11 | 
12 | The most important difference is the indexing of files:
13 | Package Service is very particular about how it keeps track
14 | of file data. If you use the S3 URL to Package Service
15 | file conversion method outlined in the [sample code segments](example/README.md),
16 | the package being queried must contain the files in question
17 | for the S3 conversion to work. What does this mean? Let's
18 | explore a hypothetical:  
19 | 
20 | Let's say you're operating a flow similar to
21 | [DCAN Labs](https://github.com/DCAN-Labs/nda-abcd-s3-downloader):
22 | you want to download your package somewhere remote, but you also
23 | want to take advantage of the extra metadata located within your
24 | manifest file. You can download the manifest file by interfacing
25 | with Package Service directly, without having to jump through the
26 | hoop of downloading it through Download Manager.
27 | 
28 | However, even if you follow the instructions perfectly, you may find that Package
29 | Service cannot convert any of the S3 URLs to Package Service file references.
30 | This happens when the package was created without its associated files: the files
31 | are then not part of your package, so Package Service cannot locate the package
32 | file associations for those S3 URLs.
33 | 
34 | To avoid this, include associated files when you create your package. This ensures
35 | the associations are present for Package Service to resolve the S3 URLs. It also makes
36 | the package more 'cluttered', so you would otherwise have to iterate over all of the
37 | package's files, checking their names to pick out the ids of the files you actually
38 | want to download and process. Because that is tedious, we have implemented a way to
39 | narrow bulk file requests down to a specific type of file. As long as you know what
40 | type of file you're looking for, a simple specification lets you shave off thousands
41 | of files.  
42 | 
43 | You can read more on the specific endpoints of Package Service [here](endpoints/README.md).


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | Package Service Documentation
 2 | =============================
 3 | - [Code Examples](example/README.md)
 4 | - [Endpoints](endpoints/README.md)
 5 | - [A note on using manifest files](MANIFEST_FILE_NOTE.md)
 6 | 
 7 | FAQ
 8 | ---
 9 | ### I keep getting an 'access denied' when querying a package I created today
10 | This is probably caused by the database not being fully caught up. While
11 | your package is technically ready to download, its permissions have not
12 | been updated yet; permissions are updated every night.
13 | 
14 | ### What is a manifest file?
15 | A manifest file is a file contained within a package whose
16 | express intent is to serve as an extra layer of metadata for that specific
17 | package. Manifest files tend to contain various forms of data along with an
18 | S3 link to the file associated with that data. This effectively allows
19 | advanced users to perform specific operations on specific files within a
20 | package without having to operate on the entire set of data.


--------------------------------------------------------------------------------
/endpoints/AUTHENTICATION.md:
--------------------------------------------------------------------------------
1 | Authentication Endpoint
2 | =======================
3 | - Returns a 200 if the user credentials are valid and usable.
4 | - URL: `/auth`
5 | - Http Method: `GET`
6 | - No Parameters
7 | - Requires authentication header
8 |     - `Authorization`: `Basic <base64 encoded username:password>`
9 | - No return data.
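
A minimal sketch of building this header in Python, using only the standard library. The helper name `basic_auth_header` is hypothetical, and the `Accept` and `User-Agent` headers used in the code examples are left out for brevity.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Basic authentication header described above."""
    # Join the credentials with a colon, then Base64-encode the UTF-8 bytes.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


# Placeholder credentials for illustration only.
print(basic_auth_header("user", "pass"))
```

The returned dictionary can be merged into the headers of any request that is marked as requiring authentication.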


--------------------------------------------------------------------------------
/endpoints/BATCH_GENERATE_PRESIGNED_URLS.md:
--------------------------------------------------------------------------------
 1 | Batch Generate Presigned Urls Endpoint
 2 | ======================================
 3 | - This endpoint is used to generate presigned urls to
 4 | allow for the downloading of files from the Package Service.
 5 | - URL: `/{packageId}/files/batchGeneratePresignedUrls`
 6 | - Http Method: `POST`
 7 | - URL Parameters:
 8 |     - `packageId` Long: The id of the package
 9 |       you want to fetch the files for.
10 | - No Query Parameters
11 | - Post Parameters:
12 |     - Long[]: The json post data to this endpoint
13 |     should only contain an array of longs that are
14 |     the file ids to generate urls for.
15 | - Sample Post Data:
16 | ```json
17 | [
18 |   1,
19 |   2
20 | ]
21 | ```
22 | - [Requires authentication header](AUTHENTICATION.md)
23 | - Return data:
24 | ```json
25 | {
26 |   "presignedUrls" : [
27 |     {
28 |       "package_file_id" : 1,
29 |       "downloadURL" : "https://somes3url.s3.amazonaws.com"
30 |     },
31 |     {
32 |       "package_file_id" : 2,
33 |       "downloadURL" : "https://somes3url.s3.amazonaws.com"
34 |     }
35 |   ],
36 |   "errors" : []
37 | }
38 | ```
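
As a rough sketch of consuming this response in Python, the return data can be turned into a lookup from file id to download URL. The helper name `index_presigned_urls` is hypothetical, and because the shape of entries in `errors` is not documented here, the sketch only counts them.

```python
def index_presigned_urls(payload: dict) -> dict:
    """Map each package_file_id in the response to its downloadURL."""
    return {
        entry["package_file_id"]: entry["downloadURL"]
        for entry in payload.get("presignedUrls", [])
    }


# Sample payload copied from the return data above.
sample = {
    "presignedUrls": [
        {"package_file_id": 1, "downloadURL": "https://somes3url.s3.amazonaws.com"},
        {"package_file_id": 2, "downloadURL": "https://somes3url.s3.amazonaws.com"},
    ],
    "errors": [],
}

urls_by_id = index_presigned_urls(sample)
# Only the number of errors is reported, since their structure is unknown.
print(len(urls_by_id), "urls,", len(sample["errors"]), "errors")
```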


--------------------------------------------------------------------------------
/endpoints/GET_DOWNLOAD_LIMITS.md:
--------------------------------------------------------------------------------
 1 | Get Download Limits Endpoint
 2 | ============================
 3 | - Used to get the download limits of the user.
 4 | - URL: `/downloadlimits`
 5 | - Http Method: `GET`
 6 | - No Parameters
 7 | - [Requires authentication header](AUTHENTICATION.md)
 8 | - Return data:
 9 | ```json
10 | {
11 |   "download_threshold" : 21990232555520,
12 |   "download_volume_outside_aws" : 1069080920055,
13 |   "download_volume_inside_aws" : 2208240775095,
14 |   "download_volume_monthly_outside_aws" : 32858087790,
15 |   "download_volume_monthly_inside_aws" : 321444489496
16 | }
17 | ```
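
The values above are large enough to be hard to read at a glance. The sketch below converts them to tebibytes, assuming the fields are byte counts; the source does not state the unit, so treat that as an assumption. The helper name `to_tib` is hypothetical.

```python
TIB = 1024 ** 4  # bytes per tebibyte


def to_tib(num_bytes: int) -> float:
    """Convert a byte count to tebibytes (assumes the input is in bytes)."""
    return num_bytes / TIB


# Sample payload copied from the return data above.
limits = {
    "download_threshold": 21990232555520,
    "download_volume_outside_aws": 1069080920055,
    "download_volume_inside_aws": 2208240775095,
    "download_volume_monthly_outside_aws": 32858087790,
    "download_volume_monthly_inside_aws": 321444489496,
}

for name, value in limits.items():
    print(f"{name}: {to_tib(value):.2f} TiB")
```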


--------------------------------------------------------------------------------
/endpoints/GET_MY_PACKAGES.md:
--------------------------------------------------------------------------------
 1 | Get My Packages Endpoint
 2 | ========================
 3 | - Gets the packages created by your user, similar to
 4 | [Get Packages](GET_PACKAGES.md) but executes a less expensive query.
 5 | - URL: `/mypackages`
 6 | - Http Method: `GET`
 7 | - No Parameters
 8 | - [Requires authentication header](AUTHENTICATION.md)
 9 | - Return data:
10 | ```json
11 | [
12 |   {
13 |     "links" : [
14 |       {
15 |         "rel" : "self",
16 |         "href" : "https://nda.nih.gov/api/package/1"
17 |       },
18 |       {
19 |         "rel" : "associate",
20 |         "href" : "https://nda.nih.gov/api/package/1/associate"
21 |       }
22 |     ],
23 |     "package_id" : 1,
24 |     "status" : "Ready to Download",
25 |     "description" : "Description",
26 |     "total_package_size" : 1024,
27 |     "has_associated_files" : true,
28 |     "created_date" : "2021-02-11T00:00:00.000-0500",
29 |     "package_type" : "My Package",
30 |     "permission_group" : "Permission Group",
31 |     "file_count" : 2
32 |   }
33 | ]
34 | ```


--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE.md:
--------------------------------------------------------------------------------
 1 | Get Package Endpoint
 2 | ====================
 3 | - Used to get data on a specific package.
 4 | - URL: `/{packageId}`
 5 | - Http Method: `GET`
 6 | - URL Parameters:
 7 |     - `packageId` Long: The id of the package
 8 |       you want to fetch data for.
 9 | - [Requires authentication header](AUTHENTICATION.md)
10 | - Return data:
11 | ```json
12 | {
13 |   "links" : [
14 |     {
15 |       "rel" : "self",
16 |       "href" : "https://nda.nih.gov/api/package/1"
17 |     },
18 |     {
19 |       "rel" : "associate",
20 |       "href" : "https://nda.nih.gov/api/package/1/associate"
21 |     }
22 |   ],
23 |   "package_id" : 1,
24 |   "status" : "Ready to Download",
25 |   "description" : "Description",
26 |   "total_package_size" : 1024,
27 |   "has_associated_files" : true,
28 |   "created_date" : "2021-02-11T00:00:00.000-0500",
29 |   "package_type" : "Shared Package",
30 |   "permission_group" : "Permission Group",
31 |   "file_count" : 2
32 | }
33 | ```


--------------------------------------------------------------------------------
/endpoints/GET_PACKAGES.md:
--------------------------------------------------------------------------------
 1 | Get Packages Endpoint
 2 | =====================
 3 | - Used to get packages that the user has access to.
 4 | - URL: `/`
 5 | - Http Method: `GET`
 6 | - Query Parameters:
 7 |     - `type` PackageType **Optional**: Allows you to filter on
 8 |     what type of package you wish to query for.
 9 |     Package types include:
10 |       - All
11 |       - My Package
12 |       - Shared Package
13 |     - `status` PackageStatus **Optional**: Allows you to filter on
14 |     the status of package you wish to query for.
15 |     Package statuses include:
16 |       - Error Creating Package
17 |       - Ready to Download
18 |       - Download Paused
19 |       - Download Complete
20 |       - Package Deleted
21 |       - Creating Package
22 |       - Downloading
23 |       - Pending
24 |       - Download Stopped
25 |       - Initiate
26 |       - Download Error
27 |       - Upload Complete
28 |       - Error
29 |       - Download Incomplete
30 |       - Package Empty
31 |       - GUID Web Service
32 |     - `hasAssociatedFiles` Boolean **Optional**: Allows you to filter on
33 |     if a package has files associated with it.
34 | - [Requires authentication header](AUTHENTICATION.md)
35 | - Return data:
36 | ```json
37 | [
38 |   {
39 |     "links" : [
40 |       {
41 |         "rel" : "self",
42 |         "href" : "https://nda.nih.gov/api/package/1"
43 |       },
44 |       {
45 |         "rel" : "associate",
46 |         "href" : "https://nda.nih.gov/api/package/1/associate"
47 |       }
48 |     ],
49 |     "package_id" : 1,
50 |     "status" : "Ready to Download",
51 |     "description" : "Description",
52 |     "total_package_size" : 1024,
53 |     "has_associated_files" : true,
54 |     "created_date" : "2021-02-11T00:00:00.000-0500",
55 |     "package_type" : "Shared Package",
56 |     "permission_group" : "Permission Group",
57 |     "file_count" : 2
58 |   }
59 | ]
60 | ```
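
A minimal sketch of building the request URL with the optional filters, using only the standard library. The base URL is taken from the code examples. The helper name `build_packages_url` is hypothetical, and sending the boolean as lowercase `true`/`false` with form-style encoding of spaces is an assumption about what the service accepts.

```python
from urllib.parse import urlencode

# Base URL as used in the code examples.
BASE_URL = "https://nda.nih.gov/api/package"


def build_packages_url(package_type=None, status=None, has_associated_files=None):
    """Build the Get Packages URL, adding only the filters that were given."""
    params = {}
    if package_type is not None:
        params["type"] = package_type
    if status is not None:
        params["status"] = status
    if has_associated_files is not None:
        # Assumption: booleans are sent as lowercase "true"/"false".
        params["hasAssociatedFiles"] = "true" if has_associated_files else "false"
    query = urlencode(params)
    return BASE_URL + "/" + ("?" + query if query else "")


print(build_packages_url(status="Ready to Download", has_associated_files=True))
```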


--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE_FILES.md:
--------------------------------------------------------------------------------
 1 | Get Package Files Endpoint
 2 | ==========================
 3 | - Used to get the files of a specific package that the
 4 | user has access to.
 5 | - URL: `/{packageId}/files`
 6 | - Http Method: `GET`
 7 | - URL Parameters:
 8 |     - `packageId` Long: The id of the package
 9 |     you want to fetch the files for.
10 | - Query Parameters:
11 |     - `types` FileType[] **Optional**: An array of the desired
12 |     file types. File types include:
13 |         - Package Metadata
14 |         - Data
15 |         - Study
16 |         - Experiment
17 |         - Associated
18 |         - Collection
19 |         - Unknown
20 |     - `page` Integer **Optional**: Results are paginated to make
21 |     processing easier, this value determines which
22 |     page to query. Can also be `last` or `first`.
23 |     - `size` Integer **Optional**: Determines the size of the
24 |     pages returned. Can also be `all`.
25 | - [Requires authentication header](AUTHENTICATION.md)
26 | - Return data:
27 | ```json
28 | {
29 |   "results" : [
30 |     {
31 |       "dataFile" : false,
32 |       "documentFile" : false,
33 |       "associatedFile" : false,
34 |       "_links" : {
35 |         "self" : {
36 |           "href" : "/{packageId}/files/1"
37 |         },
38 |         "download_url" : {
39 |           "href" : "/{packageId}/files/1/download_url"
40 |         },
41 |         "download_token" : {
42 |           "href" : "/{packageId}/files/1/download_token"
43 |         }
44 |       },
45 |       "package_file_id" : 1,
46 |       "download_alias" : "somefile.pdf",
47 |       "file_size" : 2048,
48 |       "is_associated_file" : false,
49 |       "created_date" : "2021-02-10T00:00:00.000-0500",
50 |       "is_data_file" : false,
51 |       "is_document_file" : false,
52 |       "nda_file_type" : "Package Metadata"
53 |     }
54 |   ],
55 |   "_links" : {
56 |     "next" : {
57 |       "href" : "/{packageId}/files?page=2&size=1"
58 |     },
59 |     "previous" : {
60 |       "href" : "/{packageId}/files?page=1&size=1"
61 |     },
62 |     "first" : {
63 |       "href" : "/{packageId}/files?page=1&size=1"
64 |     },
65 |     "last" : {
66 |       "href" : "/{packageId}/files?page=15&size=1"
67 |     }
68 |   }
69 | }
70 | ```
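
Because results are paginated, a client usually needs to walk the `_links.next` references. The sketch below shows one way to do that. `fetch_page` is a hypothetical callable that takes an href and returns the parsed JSON for that page (for example, a thin wrapper around an authenticated GET request). It is also an assumption that the final page either omits `next` or points back to a page already visited.

```python
def iter_package_files(fetch_page, first_href):
    """Yield every file entry, following `_links.next` from page to page."""
    href = first_href
    seen = set()
    # Stop when there is no next link or the link repeats a visited page.
    while href and href not in seen:
        seen.add(href)
        page = fetch_page(href)
        yield from page.get("results", [])
        next_link = (page.get("_links") or {}).get("next") or {}
        href = next_link.get("href")


# Two fake pages standing in for real responses.
fake_pages = {
    "/1/files?page=1&size=1": {
        "results": [{"package_file_id": 1, "download_alias": "a.pdf"}],
        "_links": {"next": {"href": "/1/files?page=2&size=1"}},
    },
    "/1/files?page=2&size=1": {
        "results": [{"package_file_id": 2, "download_alias": "b.pdf"}],
        "_links": {},
    },
}

files = {
    f["package_file_id"]: f["download_alias"]
    for f in iter_package_files(fake_pages.__getitem__, "/1/files?page=1&size=1")
}
print(files)
```

Passing the fetch function in as a parameter keeps the paging logic independent of any particular HTTP library.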


--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE_FILES_FROM_S3.md:
--------------------------------------------------------------------------------
 1 | Get Package Files From S3 Endpoint
 2 | ==================================
 3 | - Used to convert S3 links to package file references.
 4 | - URL: `/{packageId}/files`
 5 | - Http Method: `POST`
 6 | - URL Parameters:
 7 |     - `packageId` Long: The id of the package
 8 |       you want to fetch the files for.
 9 | - No Query Parameters
10 | - Post Parameters:
11 |     - String[]: The json post data to this endpoint
12 |       should only contain an array of strings that are
13 |       the s3 urls to convert to Package Service files.
14 | - Sample Post Data:
15 | ```json
16 | [
17 |   "s3://some/s3/url",
18 |   "s3://some/other/s3/url"
19 | ]
20 | ```
21 | - [Requires authentication header](AUTHENTICATION.md)
22 | - Return data:
23 | ```json
24 | [
25 |   {
26 |     "dataFile" : false,
27 |     "documentFile" : false,
28 |     "associatedFile" : false,
29 |     "_links" : {
30 |       "self" : {
31 |         "href" : "/{packageId}/files/1"
32 |       },
33 |       "download_url" : {
34 |         "href" : "/{packageId}/files/1/download_url"
35 |       },
36 |       "download_token" : {
37 |         "href" : "/{packageId}/files/1/download_token"
38 |       }
39 |     },
40 |     "package_file_id" : 1,
41 |     "download_alias" : "somefile.pdf",
42 |     "file_size" : 2048,
43 |     "is_associated_file" : false,
44 |     "created_date" : "2021-02-10T00:00:00.000-0500",
45 |     "is_data_file" : false,
46 |     "is_document_file" : false,
47 |     "nda_file_type" : "Package Metadata"
48 |   }
49 | ]
50 | ```


--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE_FILE_DOWNLOAD_CREDENTIALS.md:
--------------------------------------------------------------------------------
 1 | Get Package File Download Credentials Endpoint
 2 | ==============================================
 3 | - Used to generate AWS credentials for downloading files.
 4 | These credentials can be used with AWS libraries to download
 5 | files from the S3 bucket.
 6 | - URL: `/{packageId}/files/multiFileDownloadCredentials`
 7 | - Http Method: `GET`
 8 | - Query Parameters:
 9 |     - `package_file_id` Long[]: The file ids you want to
10 |     generate an S3 download token for.
11 | - [Requires authentication header](AUTHENTICATION.md)
12 | - Return data:
13 | ```json
14 | {
15 |   "access_key" : "somekey",
16 |   "secret_key" : "somesecretkey",
17 |   "session_token" : "somesessiontoken",
18 |   "expiration_date" : "2021-02-14T00:00:00.000-0500",
19 |   "package_files" : [
20 |     {
21 |       "package_file_id" : 1,
22 |       "destination_uri" : "s3://some/s3/url",
23 |       "download_alias" : "filename"
24 |     }
25 |   ]
26 | }
27 | ```
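
AWS libraries generally expect a bucket name and an object key, not a full `s3://` URI. The sketch below splits `destination_uri` into those two parts, assuming it follows the usual `s3://bucket/key` layout. The helper name `split_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the authority part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Sample value copied from the return data above.
bucket, key = split_s3_uri("s3://some/s3/url")
print(bucket, key)
```

The `access_key`, `secret_key`, and `session_token` values from the response would then be supplied to whichever AWS library you use, alongside the bucket and key.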


--------------------------------------------------------------------------------
/endpoints/GET_SHARED_PACKAGES.md:
--------------------------------------------------------------------------------
 1 | Get Shared Packages Endpoint
 2 | ============================
 3 | - Gets the packages shared with your user, similar to
 4 |   [Get Packages](GET_PACKAGES.md) but executes a less expensive query.
 5 | - URL: `/sharedpackages`
 6 | - Http Method: `GET`
 7 | - No Parameters
 8 | - [Requires authentication header](AUTHENTICATION.md)
 9 | - Return data:
10 | ```json
11 | [
12 |   {
13 |     "userId": 1,
14 |     "dataRepositoryId": 1,
15 |     "dataRepositoryName": "Name",
16 |     "dataRepositoryDesc": "Description",
17 |     "packageId": 1,
18 |     "packageName": "PackageName",
19 |     "packageDescription": "Package Description",
20 |     "createdDate": "2021-02-03T00:11:57.095-0500",
21 |     "fileCount": 1,
22 |     "fileSize": 1024,
23 |     "links": [
24 |       {
25 |         "rel": "self",
26 |         "href": "https://nda.nih.gov/api/package/1"
27 |       },
28 |       {
29 |         "rel": "associate",
30 |         "href": "https://nda.nih.gov/api/package/1/associate"
31 |       }
32 |     ]
33 |   }
34 | ]
35 | ```


--------------------------------------------------------------------------------
/endpoints/README.md:
--------------------------------------------------------------------------------
 1 | Endpoints Advanced Documentation
 2 | ================================
 3 | These are the endpoints which are currently documented:
 4 | - [Authentication](AUTHENTICATION.md)
 5 | - [Get Packages](GET_PACKAGES.md)
 6 | - [Get Download Limits](GET_DOWNLOAD_LIMITS.md)
 7 | - [Get My Packages](GET_MY_PACKAGES.md)
 8 | - [Get Shared Packages](GET_SHARED_PACKAGES.md)
 9 | - [Get Specific Package](GET_PACKAGE.md)
10 | - [Get Package Files](GET_PACKAGE_FILES.md)
11 | - [Get Package Files from S3 Urls](GET_PACKAGE_FILES_FROM_S3.md)
12 | - [Get Package File Download Credentials](GET_PACKAGE_FILE_DOWNLOAD_CREDENTIALS.md)
13 | - [Batch Generate Presigned Urls](BATCH_GENERATE_PRESIGNED_URLS.md)  
14 | 
15 | There are more endpoints that have yet to be
16 | documented.


--------------------------------------------------------------------------------
/example/JAVA.md:
--------------------------------------------------------------------------------
  1 | Java Code Example
  2 | =================
  3 | Some sample Java code for retrieving files
  4 | from Package Service. The HTTP Client library
  5 | used in this example is the Apache HTTP Client,
  6 | and the JSON Library used in this example is Gson.
  7 | 
  8 | Authentication
  9 | --------------
 10 | 
 11 | You can verify if your authentication credentials work
 12 | by using the `/auth` endpoint:
 13 | ```java
 14 | // Create the HttpClient with a customized User-Agent.
 15 | // Please be sure that you communicate to the Package Service
 16 | // using a custom User-Agent that helps us identify you.
 17 | // This will help protect against your address getting blocked
 18 | // in the case of an incident occurring.
 19 | HttpClient client = HttpClients.custom()
 20 |                         .setUserAgent("Example Client")
 21 |                         .build();
 22 | 
 23 | // Your credentials string should be Base64 encoded.
 24 | String credentials = Base64.getEncoder().encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
 25 | 
 26 | // Create a basic get request and set the authorization
 27 | // and the accepted content type.
 28 | HttpGet authRequest = new HttpGet("https://nda.nih.gov/api/package/auth");
 29 | authRequest.setHeader("Accept", "application/json");
 30 | authRequest.setHeader("Authorization", "Basic " + credentials);
 31 | 
 32 | // Execute the request and store the response
 33 | HttpResponse response = client.execute(authRequest);
 34 | 
 35 | // Store the HTTP Response code
 36 | int status = response.getStatusLine().getStatusCode();
 37 | 
 38 | // Business logic
 39 | if (status == 200) {
 40 |     System.out.println("Successfully authenticated!");
 41 | } else {
 42 |     System.out.println("Authentication failed...");
 43 |     return;
 44 | }
 45 | ```
 46 | 
 47 | The auth endpoint is not required in your usage though,
 48 | it is only helpful for validating your credentials.  
 49 | 
 50 | Retrieving Files
 51 | ----------------
 52 | #### From Nothing:
 53 | If you want to get and download all files related to a
 54 | single package:
 55 | ```java
 56 | // Assume code in authentication section is present.
 57 | 
 58 | // Make a Gson object for decoding the JSON responses.
 59 | final Gson GSON = new GsonBuilder()
 60 |                     .setPrettyPrinting()
 61 |                     .create();
 62 | 
 63 | final long packageId = 1234;
 64 | 
 65 | // Construct the request to get the files of package 1234
 66 | // URL structure is: https://nda.nih.gov/api/package/{packageId}/files
 67 | HttpGet filesRequest = new HttpGet("https://nda.nih.gov/api/package/" + packageId + "/files");
 68 | filesRequest.setHeader("Accept", "application/json");
 69 | filesRequest.setHeader("Authorization", "Basic " + credentials);
 70 | 
 71 | response = client.execute(filesRequest);
 72 | 
 73 | // Get the String response body.
 74 | String responseBody = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
 75 | 
 76 | // Convert the response body String to a JsonObject.
 77 | // This endpoint will only return a JsonObject.
 78 | JsonObject object = GSON.fromJson(responseBody, JsonObject.class);
 79 | 
 80 | // The results property will always be a JsonArray.
 81 | JsonArray results = object.get("results").getAsJsonArray();
 82 | 
 83 | // Business Logic.
 84 | 
 85 | // In our example we want to parse all files in
 86 | // a package and store their IDs
 87 | Map<Long, String> files = new HashMap<>();
 88 | 
 89 | results.forEach(e -> {
 90 |     // The array will always contain JsonObjects.
 91 |     JsonObject arrayObject = e.getAsJsonObject();
 92 |     
 93 |     // Map the file's ID to its download alias.
 94 |     files.put(arrayObject.get("package_file_id").getAsLong(), arrayObject.get("download_alias").getAsString());
 95 | });
 96 | ```
 97 | 
 98 | #### From S3 URL
 99 | This is a use-case that was inspired by
100 | the [DCAN Labs NDA ABCD Downloader](https://github.com/DCAN-Labs/nda-abcd-s3-downloader).
101 | ([More on this workflow](../MANIFEST_FILE_NOTE.md))
102 | This method allows you to convert pre-existing s3
103 | file references to Package Service files:
104 | ```java
105 | // Assume code in authentication section is present.
106 | 
107 | // Handle reading in the file and adding the urls to a list.
108 | ArrayList<String> s3Urls = new ArrayList<>();
109 | 
110 | // Not all manifest files are the same, all that really matters
111 | // for this approach is loading the S3 paths, and having a package
112 | // that has all of the S3 files associated with it.
113 | String rows;
114 | // Open reader and read in data.
115 | BufferedReader reader = new BufferedReader(new FileReader("datastructure_manifest.txt"));
116 | while ((rows = reader.readLine()) != null) {
117 |     String[] data = rows.split("\t");
118 | 
119 |     for (String row : data) {
120 |         // Sanitize
121 |         String s3 = row.replace("\"", "");
122 | 
123 |         if (!s3.startsWith("s3://")) {
124 |             continue;
125 |         }
126 | 
127 |         s3Urls.add(s3);
128 |     }
129 | }
130 | 
131 | // Make a Gson object for decoding the JSON responses.
132 | final Gson GSON = new GsonBuilder()
133 |                     .setPrettyPrinting()
134 |                     .create();
135 | 
136 | final long packageId = 1234;
137 | 
138 | // Fetch files from package 1234
139 | HttpPost urlRequest = new HttpPost("https://nda.nih.gov/api/package/" + packageId + "/files");
140 | urlRequest.setHeader("Accept", "application/json");
141 | urlRequest.setHeader("Authorization", "Basic " + credentials);
142 | 
143 | // Create our post data.
144 | JsonArray filesArray = new JsonArray();
145 | 
146 | // Populate post data.
147 | s3Urls.forEach(filesArray::add);
148 | 
149 | // Set post data.
150 | urlRequest.setEntity(new ByteArrayEntity(GSON.toJson(filesArray).getBytes(StandardCharsets.UTF_8)));
151 | 
152 | response = client.execute(urlRequest);
153 | 
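154 | // Business Logic. This endpoint returns a JSON array of file objects.
155 | JsonArray results = GSON.fromJson(EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8), JsonArray.class);
156 | Map<Long, String> files = new HashMap<>();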
157 | 
158 | results.forEach(e -> {
159 |     // The array will always contain JsonObjects.
160 |     JsonObject arrayObject = e.getAsJsonObject();
161 | 
162 |     // Map the file's ID to its download alias.
163 |     files.put(arrayObject.get("package_file_id").getAsLong(), arrayObject.get("download_alias").getAsString());
164 | });
165 | ```
166 | 
167 | Downloading Files
168 | -----------------
169 | After receiving files using one of the above methods you will have a map of
170 | all file ids in a package and their names. If you were to want to then move
171 | to downloading said files, you'd do something along the lines of:
172 | ```java
173 | // Assume code in authentication section is present.
174 | // Assume that one of the retrieving files implementations is present too
175 | 
176 | // Create a post request to the batch generate presigned urls endpoint.
177 | HttpPost urlRequest = new HttpPost("https://nda.nih.gov/api/package/" + packageId + "/files/batchGeneratePresignedUrls");
178 | urlRequest.setHeader("Accept", "application/json");
179 | urlRequest.setHeader("Authorization", "Basic " + credentials);
180 | 
181 | // Create our post data.
182 | JsonArray filesArray = new JsonArray();
183 | 
184 | // Populate post data.
185 | files.keySet().forEach(filesArray::add);
186 | 
187 | // Set post data.
188 | urlRequest.setEntity(new ByteArrayEntity(GSON.toJson(filesArray).getBytes(StandardCharsets.UTF_8)));
189 | 
190 | response = client.execute(urlRequest);
191 | 
192 | responseBody = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
193 | object = GSON.fromJson(responseBody, JsonObject.class);
194 | 
195 | // Business Logic.
196 | 
197 | // Get the presignedUrls array from response.
198 | JsonArray urlArray = object.get("presignedUrls").getAsJsonArray();
199 | 
200 | // Iterate on urls.
201 | urlArray.forEach(e -> {
202 |     // Always will be an array of JsonObjects.
203 |     JsonObject downloadObject = e.getAsJsonObject();
204 |     
205 |     Long fileId = downloadObject.get("package_file_id").getAsLong();
206 | 
207 |     // You should catch an IOException around all of the following code.
208 |     // It's not caught here for brevity.
209 |     URL downloadUrl = new URL(downloadObject.get("downloadURL").getAsString());
210 |     String name = files.get(fileId);
211 |     
212 |     // Begin the download.
213 |     ReadableByteChannel channel = Channels.newChannel(downloadUrl.openStream());
214 |     
215 |     // Process the file name so paths work across platforms.
216 |     name = name.replace("/", File.separator);
217 |     File result = new File("somedownloaddir", name);
218 |     
219 |     // Since packages have their own file structures, having
220 |     // this here will ensure that all required directories are
221 |     // created based off of the remote.
222 |     if (!result.getParentFile().exists()) {
223 |         result.getParentFile().mkdirs();
224 |     }
225 |     
226 |     // Actually write the data to the file.
227 |     FileOutputStream fileOut = new FileOutputStream(result);
228 |     fileOut.getChannel().transferFrom(channel, 0, Long.MAX_VALUE);
229 | });
230 | ```
231 | 
232 | Endpoints used in this example:
233 | - [Authentication](../endpoints/AUTHENTICATION.md)
234 | - [Get Package Files](../endpoints/GET_PACKAGE_FILES.md)
235 | - [Batch Generate Presigned URLS](../endpoints/BATCH_GENERATE_PRESIGNED_URLS.md)
236 | - [Get Package Files from S3 Link](../endpoints/GET_PACKAGE_FILES_FROM_S3.md)


--------------------------------------------------------------------------------
/example/PYTHON.md:
--------------------------------------------------------------------------------
  1 | Python Code Example
  2 | ===================
  3 | Some sample Python code for retrieving files
  4 | from Package Service. The `requests` library
  5 | is used in this example (`pip install requests`).
  6 | The sample code was written using Python v3.9.1.
  7 | 
  8 | Authentication
  9 | --------------
 10 | 
 11 | You can verify if your authentication credentials work
 12 | by using the `/auth` endpoint:
 13 | ```python
 14 | import base64
 15 | import requests
 16 | import json
 17 | import urllib.request
 18 | import shutil
 19 | from pathlib import Path
 20 | 
 21 | # Encode our credentials then convert it to a string.
 22 | credentials = base64.b64encode(b'username:password').decode('utf-8')
 23 | 
 24 | # Create the headers we will be using for all requests.
 25 | headers = {
 26 |     'Authorization': 'Basic ' + credentials,
 27 |     'User-Agent': 'Example Client',
 28 |     'Accept': 'application/json'
 29 | }
 30 | 
 31 | # Send Http request
 32 | response = requests.get('https://nda.nih.gov/api/package/auth', headers=headers)
 33 | 
 34 | # Business Logic.
 35 | 
 36 | # If the response status code does not equal 200,
 37 | # raise an exception.
 38 | if response.status_code != requests.codes.ok:
 39 |     print('failed to authenticate')
 40 |     response.raise_for_status()
 41 | 
 42 | # The auth endpoint does not return any data to parse;
 43 | # only an HTTP response code is returned.
 44 | ```
 45 | 
 46 | Calling the auth endpoint is optional;
 47 | it is only useful for validating your credentials.
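
Because the same headers accompany every request, it can be convenient to build
them from variables instead of a hard-coded byte string. The following is a
minimal sketch; `build_headers` is a hypothetical helper name, not part of any
NDA library, and the header values simply mirror the ones used above:

```python
import base64


def build_headers(username, password, user_agent='Example Client'):
    """Build the request headers used throughout this example.

    build_headers is a hypothetical helper, not part of any NDA library.
    """
    # HTTP Basic auth: base64-encode "username:password", then decode to text.
    raw = f'{username}:{password}'.encode('utf-8')
    credentials = base64.b64encode(raw).decode('utf-8')
    return {
        'Authorization': 'Basic ' + credentials,
        'User-Agent': user_agent,
        'Accept': 'application/json',
    }


headers = build_headers('username', 'password')
```

The resulting `headers` dictionary can be passed to `requests.get` or
`requests.post` exactly as in the rest of this example.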
 48 | 
 49 | Retrieving Files
 50 | ----------------
 51 | #### From Nothing:
 52 | If you want to get and download all files related to a
 53 | single package:
 54 | ```python
 55 | # Assume code in authentication section is present.
 56 | 
 57 | packageId = 1234
 58 | 
 59 | # Construct the request to get the files of package 1234
 60 | # URL structure is: https://nda.nih.gov/api/package/{packageId}/files
 61 | response = requests.get('https://nda.nih.gov/api/package/' + str(packageId) + '/files', headers=headers)
 62 | 
 63 | # Get the results array from the json response.
 64 | results = response.json()['results']
 65 | 
 66 | # Business Logic.
 67 | 
 68 | files = {}
 69 | 
 70 | # Add important file data to the files dictionary.
 71 | for f in results:
 72 |     files[f['package_file_id']] = {'name': f['download_alias']}
 73 | ```
 74 | 
 75 | #### From S3 URL
 76 | This is a use-case that was inspired by
 77 | the [DCAN Labs NDA ABCD Downloader](https://github.com/DCAN-Labs/nda-abcd-s3-downloader).
 78 | ([More on this workflow](../MANIFEST_FILE_NOTE.md))
 79 | This method allows you to convert pre-existing s3
 80 | file references to Package Service files:
 81 | ```python
 82 | import csv
 83 | 
 84 | # Assume code in authentication section is present.
 85 | 
 86 | packageId = 1234
 87 | 
 88 | s3Files = []
 89 | 
 90 | # Load in and process the manifest file.
 91 | # Not all manifest files are structured like this, all you require is
 92 | # an S3 url and a package that has the files associated with it.
 93 | with open('datastructure_manifest.txt', 'r', newline='') as manifest:
 94 |     for row in csv.reader(manifest, dialect='excel-tab'):
 95 |         for cell in row:
 96 |             if cell.startswith('s3://'):
 97 |                 s3Files.append(cell)
 98 | 
 99 | # Header cells do not start with 's3://', so the filter above has already
100 | # skipped the manifest's two column-declaration rows; no trimming is needed.
101 | 
102 | # Post the S3 urls to get the matching files of package 1234
103 | # URL structure is: https://nda.nih.gov/api/package/{packageId}/files
104 | response = requests.post('https://nda.nih.gov/api/package/' + str(packageId) + '/files', json=s3Files, headers=headers)
105 | 
106 | # Business Logic.
107 | 
108 | files = {}
109 | 
110 | # Add important file data to the files dictionary.
111 | # We can skip having to transform the json because a json array is returned.
112 | for f in response.json():
113 |     files[f['package_file_id']] = {'name': f['download_alias']}
114 | ```
115 | 
116 | Downloading Files
117 | -----------------
118 | After retrieving files using one of the above methods, you will have a dictionary of
119 | all file ids in the package and their names. To download those files,
120 | you would do something along the lines of:
121 | ```python
122 | # Assume code in authentication section is present.
123 | # Assume that one of the retrieving files implementations is present too
124 | 
125 | # Create a post request to the batch generate presigned urls endpoint.
126 | # Use keys from files dictionary to form a list, which is converted to
127 | # a json array which is posted.
128 | response = requests.post('https://nda.nih.gov/api/package/' + str(packageId) + '/files/batchGeneratePresignedUrls', json=list(files.keys()), headers=headers)
129 | 
130 | # Get the presigned urls from the response.
131 | results = response.json()['presignedUrls']
132 | 
133 | # Business Logic.
134 | 
135 | # Add a download key to the file's data.
136 | for url in results:
137 |     files[url['package_file_id']]['download'] = url['downloadURL']
138 | 
139 | # Iterate over each file id and its data to perform the downloads.
140 | for file_id, data in files.items():
141 |     name = data['name']
142 |     downloadUrl = data['download']
143 |     # Build the file's local path inside a downloads directory
144 |     file = 'downloads/' + name
145 |     # Strip out the file's name for creating non-existent directories
146 |     directory = file[:file.rfind('/')]
147 |     
148 |     # Create non-existent directories. Package files have their
149 |     # own directory structure, and this ensures that it is
150 |     # kept intact when downloading.
151 |     Path(directory).mkdir(parents=True, exist_ok=True)
152 |     
153 |     # Initiate the download.
154 |     with urllib.request.urlopen(downloadUrl) as dl, open(file, 'wb') as out_file:
155 |         shutil.copyfileobj(dl, out_file)
156 | ```
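
The directory handling above can also be expressed with `pathlib`. This is only
a sketch: `local_path_for` is a hypothetical helper, the alias in the usage line
is made up, and the code assumes that each download alias is a relative,
`/`-separated path. It additionally rejects aliases that would resolve outside
the downloads directory:

```python
from pathlib import Path, PurePosixPath


def local_path_for(download_alias, root='downloads'):
    """Map a file's download alias to a path under root (hypothetical helper)."""
    alias = PurePosixPath(download_alias)
    # Refuse absolute paths and '..' segments so nothing is written outside root.
    if alias.is_absolute() or '..' in alias.parts:
        raise ValueError('unsafe download alias: ' + download_alias)
    return Path(root).joinpath(*alias.parts)


# Made-up alias, for illustration only.
target = local_path_for('some_dir/sub_dir/example.txt')
```

Before writing, the parent directories can be created with
`target.parent.mkdir(parents=True, exist_ok=True)`, which corresponds to the
`Path(directory).mkdir(...)` call in the loop above.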
157 | 
158 | Endpoints used in this example:
159 | - [Authentication](../endpoints/AUTHENTICATION.md)
160 | - [Get Package Files](../endpoints/GET_PACKAGE_FILES.md)
161 | - [Batch Generate Presigned URLS](../endpoints/BATCH_GENERATE_PRESIGNED_URLS.md)
162 | - [Get Package Files from S3 Link](../endpoints/GET_PACKAGE_FILES_FROM_S3.md)


--------------------------------------------------------------------------------
/example/README.md:
--------------------------------------------------------------------------------
1 | Code Examples
2 | =============
3 | - [Java](JAVA.md)
4 | - [Python](PYTHON.md)
5 | 


--------------------------------------------------------------------------------