├── .gitignore
├── LICENSE
├── MANIFEST_FILE_NOTE.md
├── README.md
├── endpoints
│   ├── AUTHENTICATION.md
│   ├── BATCH_GENERATE_PRESIGNED_URLS.md
│   ├── GET_DOWNLOAD_LIMITS.md
│   ├── GET_MY_PACKAGES.md
│   ├── GET_PACKAGE.md
│   ├── GET_PACKAGES.md
│   ├── GET_PACKAGE_FILES.md
│   ├── GET_PACKAGE_FILES_FROM_S3.md
│   ├── GET_PACKAGE_FILE_DOWNLOAD_CREDENTIALS.md
│   ├── GET_SHARED_PACKAGES.md
│   └── README.md
└── example
    ├── JAVA.md
    ├── PYTHON.md
    └── README.md

/.gitignore:
--------------------------------------------------------------------------------
**/.idea/
**/*.iml

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 NIMH Data Archive

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/MANIFEST_FILE_NOTE.md:
--------------------------------------------------------------------------------
More on Manifest Files
======================
There are a few key differences between Package Service
and Data Manager, and those moving their manifest-file-based
downloaders between the two tools may have noticed them.
We have tried to make the service as third-party friendly as
possible, which includes changes to how files are indexed
and downloaded, so it's important to know the key
differences you'll encounter and how to work around
them.

The most important of these is file indexing:
Package Service is very particular about how it keeps track
of file data. This means that if you use the S3 URL to Package
Service file conversion method outlined in the [sample code segments](example/README.md),
the package being queried must contain the files
for the S3 conversion to work. What does this mean? Let's
explore a hypothetical.

Let's say you're operating a flow similar to
[DCAN Labs](https://github.com/DCAN-Labs/nda-abcd-s3-downloader):
you want to download your package somewhere remote, but you also
want to take advantage of the extra metadata in your
manifest file. Rather than jumping through the hoop of
downloading the manifest file through Download Manager,
you can download it by interfacing with Package Service directly.
However, even if you follow the instructions perfectly, you'll notice that Package
Service cannot convert any of the S3 URLs to Package Service file references.
This is because the files are not associated with your package: when you created
the package, you didn't include the associated files, so Package Service cannot
locate the package file associations for those S3 URLs. To alleviate this, include
associated files when you create your package. This ensures the associations are
present for Package Service to resolve the S3 URLs, but it also makes the package
more 'cluttered', so you'll probably have to iterate over all files, checking their
names, to make sure you're downloading and processing the correct ones. Iterating over
all of a package's files to collect the appropriate ids is tedious, so we have
implemented a way to narrow your bulk file requests down to specific file types
(the `types` query parameter of [Get Package Files](endpoints/GET_PACKAGE_FILES.md)).
As long as you know what type of file you're looking for, a simple specification
lets you shave off thousands of files.

You can read more on the specific endpoints of Package Service [here](endpoints/README.md).

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Package Service Documentation
=============================
- [Code Examples](example/README.md)
- [Endpoints](endpoints/README.md)
- [A note on using manifest files](MANIFEST_FILE_NOTE.md)

FAQ
---
### I keep getting an 'access denied' when querying a package I created today
This is probably because the database has not fully caught up. While
your package is technically ready to download, permissions are only
updated every night, so your access may not be in place yet.

### What is a manifest file?
A manifest file is a file contained within a package whose express
purpose is to serve as an extra layer of metadata for that package.
Manifest files tend to contain various forms of data along with an
S3 link to the file associated with that data. This lets advanced
users perform specific operations on specific files within a package
without having to operate on the entire data set.

--------------------------------------------------------------------------------
/endpoints/AUTHENTICATION.md:
--------------------------------------------------------------------------------
Authentication Endpoint
=======================
- Returns a 200 if the user's credentials are valid and usable.
- URL: `/auth`
- Http Method: `GET`
- No Parameters
- Requires authentication header
  - `Authorization`: `Basic <base64-encoded username:password>`
- No return data.

--------------------------------------------------------------------------------
/endpoints/BATCH_GENERATE_PRESIGNED_URLS.md:
--------------------------------------------------------------------------------
Batch Generate Presigned URLs Endpoint
======================================
- This endpoint is used to generate presigned URLs that
  allow files to be downloaded from the Package Service.
- URL: `/{packageId}/files/batchGeneratePresignedUrls`
- Http Method: `POST`
- URL Parameters:
  - `packageId` Long: The id of the package
    containing the files.
- No Query Parameters
- Post Parameters:
  - Long[]: The JSON post body for this endpoint
    should contain only an array of longs: the
    ids of the files to generate URLs for.
- Sample Post Data:
```json
[
  1,
  2
]
```
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
{
  "presignedUrls" : [
    {
      "package_file_id" : 1,
      "downloadURL" : "https://somes3url.s3.amazonaws.com"
    },
    {
      "package_file_id" : 2,
      "downloadURL" : "https://somes3url.s3.amazonaws.com"
    }
  ],
  "errors" : []
}
```

--------------------------------------------------------------------------------
/endpoints/GET_DOWNLOAD_LIMITS.md:
--------------------------------------------------------------------------------
Get Download Limits Endpoint
============================
- Used to get the download limits of the user.
- URL: `/downloadlimits`
- Http Method: `GET`
- No Parameters
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
{
  "download_threshold" : 21990232555520,
  "download_volume_outside_aws" : 1069080920055,
  "download_volume_inside_aws" : 2208240775095,
  "download_volume_monthly_outside_aws" : 32858087790,
  "download_volume_monthly_inside_aws" : 321444489496
}
```

--------------------------------------------------------------------------------
/endpoints/GET_MY_PACKAGES.md:
--------------------------------------------------------------------------------
Get My Packages Endpoint
========================
- Gets the packages created by your user; similar to
  [Get Packages](GET_PACKAGES.md) but executes a less expensive query.
- URL: `/mypackages`
- Http Method: `GET`
- No Parameters
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
[
  {
    "links" : [
      {
        "rel" : "self",
        "href" : "https://nda.nih.gov/api/package/1"
      },
      {
        "rel" : "associate",
        "href" : "https://nda.nih.gov/api/package/1/associate"
      }
    ],
    "package_id" : 1,
    "status" : "Ready to Download",
    "description" : "Description",
    "total_package_size" : 1024,
    "has_associated_files" : true,
    "created_date" : "2021-02-11T00:00:00.000-0500",
    "package_type" : "My Package",
    "permission_group" : "Permission Group",
    "file_count" : 2
  }
]
```

--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE.md:
--------------------------------------------------------------------------------
Get Package Endpoint
====================
- Used to get data on a specific package.
- URL: `/{packageId}`
- Http Method: `GET`
- URL Parameters:
  - `packageId` Long: The id of the package
    you want to fetch.
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
{
  "links" : [
    {
      "rel" : "self",
      "href" : "https://nda.nih.gov/api/package/1"
    },
    {
      "rel" : "associate",
      "href" : "https://nda.nih.gov/api/package/1/associate"
    }
  ],
  "package_id" : 1,
  "status" : "Ready to Download",
  "description" : "Description",
  "total_package_size" : 1024,
  "has_associated_files" : true,
  "created_date" : "2021-02-11T00:00:00.000-0500",
  "package_type" : "Shared Package",
  "permission_group" : "Permission Group",
  "file_count" : 2
}
```

--------------------------------------------------------------------------------
/endpoints/GET_PACKAGES.md:
--------------------------------------------------------------------------------
Get Packages Endpoint
=====================
- Used to get packages that the user has access to.
- URL: `/`
- Http Method: `GET`
- Query Parameters:
  - `type` PackageType **Optional**: Filters the results by
    package type. Package types include:
    - All
    - My Package
    - Shared Package
  - `status` PackageStatus **Optional**: Filters the results by
    package status. Package statuses include:
    - Error Creating Package
    - Ready to Download
    - Download Paused
    - Download Complete
    - Package Deleted
    - Creating Package
    - Downloading
    - Pending
    - Download Stopped
    - Initiate
    - Download Error
    - Upload Complete
    - Error
    - Download Incomplete
    - Package Empty
    - GUID Web Service
  - `hasAssociatedFiles` Boolean **Optional**: Filters the results by
    whether a package has files associated with it.
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
[
  {
    "links" : [
      {
        "rel" : "self",
        "href" : "https://nda.nih.gov/api/package/1"
      },
      {
        "rel" : "associate",
        "href" : "https://nda.nih.gov/api/package/1/associate"
      }
    ],
    "package_id" : 1,
    "status" : "Ready to Download",
    "description" : "Description",
    "total_package_size" : 1024,
    "has_associated_files" : true,
    "created_date" : "2021-02-11T00:00:00.000-0500",
    "package_type" : "Shared Package",
    "permission_group" : "Permission Group",
    "file_count" : 2
  }
]
```

--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE_FILES.md:
--------------------------------------------------------------------------------
Get Package Files Endpoint
==========================
- Used to get the files of a specific package that the
  user has access to.
- URL: `/{packageId}/files`
- Http Method: `GET`
- URL Parameters:
  - `packageId` Long: The id of the package
    you want to fetch the files for.
- Query Parameters:
  - `types` FileType[] **Optional**: An array of the desired
    file types. File types include:
    - Package Metadata
    - Data
    - Study
    - Experiment
    - Associated
    - Collection
    - Unknown
  - `page` Integer **Optional**: Results are paginated to make
    processing easier; this value determines which
    page to query. Can also be `last` or `first`.
  - `size` Integer **Optional**: Determines the size of the
    pages returned. Can also be `all`.
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
{
  "results" : [
    {
      "dataFile" : false,
      "documentFile" : false,
      "associatedFile" : false,
      "_links" : {
        "self" : {
          "href" : "/{packageId}/files/1"
        },
        "download_url" : {
          "href" : "/{packageId}/files/1/download_url"
        },
        "download_token" : {
          "href" : "/{packageId}/files/1/download_token"
        }
      },
      "package_file_id" : 1,
      "download_alias" : "somefile.pdf",
      "file_size" : 2048,
      "is_associated_file" : false,
      "created_date" : "2021-02-10T00:00:00.000-0500",
      "is_data_file" : false,
      "is_document_file" : false,
      "nda_file_type" : "Package Metadata"
    }
  ],
  "_links" : {
    "next" : {
      "href" : "/{packageId}/files?page=2&size=1"
    },
    "previous" : {
      "href" : "/{packageId}/files?page=1&size=1"
    },
    "first" : {
      "href" : "/{packageId}/files?page=1&size=1"
    },
    "last" : {
      "href" : "/{packageId}/files?page=15&size=1"
    }
  }
}
```

--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE_FILES_FROM_S3.md:
--------------------------------------------------------------------------------
Get Package Files From S3 Endpoint
==================================
- Used to convert S3 links to package file references.
- URL: `/{packageId}/files`
- Http Method: `POST`
- URL Parameters:
  - `packageId` Long: The id of the package whose
    files the S3 links should be resolved against.
- No Query Parameters
- Post Parameters:
  - String[]: The JSON post body for this endpoint
    should contain only an array of strings: the
    S3 URLs to convert to Package Service files.
- Sample Post Data:
```json
[
  "s3://some/s3/url",
  "s3://some/other/s3/url"
]
```
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
[
  {
    "dataFile" : false,
    "documentFile" : false,
    "associatedFile" : false,
    "_links" : {
      "self" : {
        "href" : "/{packageId}/files/1"
      },
      "download_url" : {
        "href" : "/{packageId}/files/1/download_url"
      },
      "download_token" : {
        "href" : "/{packageId}/files/1/download_token"
      }
    },
    "package_file_id" : 1,
    "download_alias" : "somefile.pdf",
    "file_size" : 2048,
    "is_associated_file" : false,
    "created_date" : "2021-02-10T00:00:00.000-0500",
    "is_data_file" : false,
    "is_document_file" : false,
    "nda_file_type" : "Package Metadata"
  }
]
```

--------------------------------------------------------------------------------
/endpoints/GET_PACKAGE_FILE_DOWNLOAD_CREDENTIALS.md:
--------------------------------------------------------------------------------
Get Package File Download Credentials Endpoint
==============================================
- Used to generate AWS credentials for downloading files.
  These credentials let you use AWS libraries to download
  the files directly from the S3 bucket.
- URL: `/{packageId}/files/multiFileDownloadCredentials`
- Http Method: `GET`
- URL Parameters:
  - `packageId` Long: The id of the package
    containing the files.
- Query Parameters:
  - `package_file_id` Long[]: The ids of the files you want to
    generate S3 download credentials for.
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
{
  "access_key" : "somekey",
  "secret_key" : "somesecretkey",
  "session_token" : "somesessiontoken",
  "expiration_date" : "2021-02-14T00:00:00.000-0500",
  "package_files" : [
    {
      "package_file_id" : 1,
      "destination_uri" : "s3://some/s3/url",
      "download_alias" : "filename"
    }
  ]
}
```

--------------------------------------------------------------------------------
/endpoints/GET_SHARED_PACKAGES.md:
--------------------------------------------------------------------------------
Get Shared Packages Endpoint
============================
- Gets the packages shared with your user; similar to
  [Get Packages](GET_PACKAGES.md) but executes a less expensive query.
- URL: `/sharedpackages`
- Http Method: `GET`
- No Parameters
- [Requires authentication header](AUTHENTICATION.md)
- Return data:
```json
[
  {
    "userId": 1,
    "dataRepositoryId": 1,
    "dataRepositoryName": "Name",
    "dataRepositoryDesc": "Description",
    "packageId": 1,
    "packageName": "PackageName",
    "packageDescription": "Package Description",
    "createdDate": "2021-02-03T00:11:57.095-0500",
    "fileCount": 1,
    "fileSize": 1024,
    "links": [
      {
        "rel": "self",
        "href": "https://nda.nih.gov/api/package/1"
      },
      {
        "rel": "associate",
        "href": "https://nda.nih.gov/api/package/1/associate"
      }
    ]
  }
]
```

--------------------------------------------------------------------------------
/endpoints/README.md:
--------------------------------------------------------------------------------
Endpoints Advanced Documentation
================================
These are the endpoints that are currently documented:
- [Authentication](AUTHENTICATION.md)
- [Get Packages](GET_PACKAGES.md)
- [Get Download Limits](GET_DOWNLOAD_LIMITS.md)
- [Get My Packages](GET_MY_PACKAGES.md)
- [Get Shared Packages](GET_SHARED_PACKAGES.md)
- [Get Specific Package](GET_PACKAGE.md)
- [Get Package Files](GET_PACKAGE_FILES.md)
- [Get Package Files from S3 URLs](GET_PACKAGE_FILES_FROM_S3.md)
- [Get Package File Download Credentials](GET_PACKAGE_FILE_DOWNLOAD_CREDENTIALS.md)
- [Batch Generate Presigned URLs](BATCH_GENERATE_PRESIGNED_URLS.md)

There are more endpoints; they just have yet to be
documented.

--------------------------------------------------------------------------------
/example/JAVA.md:
--------------------------------------------------------------------------------
Java Code Example
=================
Some sample Java code for retrieving files
from Package Service. The HTTP client library
used in this example is the Apache HTTP Client,
and the JSON library used in this example is Gson.

Authentication
--------------

You can verify whether your authentication credentials work
by using the `/auth` endpoint:
```java
// Imports used across this example (Apache HttpClient 4.x and Gson).
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ByteArrayEntity;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Create the HttpClient with a customized User-Agent.
// Please be sure that you communicate with the Package Service
// using a custom User-Agent that helps us identify you.
// This will help protect your address from getting blocked
// in the event of an incident.
HttpClient client = HttpClients.custom()
        .setUserAgent("Example Client")
        .build();

// Your credentials string should be Base64 encoded.
// `username` and `password` hold your NDA credentials.
String credentials = Base64.getEncoder()
        .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));

// Create a basic get request and set the authorization
// and the accepted content type.
HttpGet authRequest = new HttpGet("https://nda.nih.gov/api/package/auth");
authRequest.setHeader("Accept", "application/json");
authRequest.setHeader("Authorization", "Basic " + credentials);

// Execute the request and store the response.
HttpResponse response = client.execute(authRequest);

// Store the HTTP response code.
int status = response.getStatusLine().getStatusCode();

// Release the connection; the auth endpoint returns no body to read.
EntityUtils.consumeQuietly(response.getEntity());

// Business logic.
if (status == 200) {
    System.out.println("Successfully authenticated!");
} else {
    System.out.println("Authentication failed...");
    return;
}
```

You don't need to call the auth endpoint in normal usage;
it is only helpful for validating your credentials.

Retrieving Files
----------------
#### From Nothing:
If you want to get and download all files related to a
single package:
```java
// Assume code in authentication section is present.

// Make a Gson object for decoding the JSON responses.
final Gson GSON = new GsonBuilder()
        .setPrettyPrinting()
        .create();

final long packageId = 1234;

// Construct the request to get the files of package 1234.
// URL structure is: https://nda.nih.gov/api/package/{packageId}/files
// size=all asks for every file on a single page (see the `size` query parameter).
HttpGet filesRequest = new HttpGet("https://nda.nih.gov/api/package/" + packageId + "/files?size=all");
filesRequest.setHeader("Accept", "application/json");
filesRequest.setHeader("Authorization", "Basic " + credentials);

response = client.execute(filesRequest);

// Get the String response body.
String responseBody = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);

// Convert the response body String to a JsonObject.
// This endpoint will only return a JsonObject.
JsonObject object = GSON.fromJson(responseBody, JsonObject.class);

// The results property will always be a JsonArray.
JsonArray results = object.get("results").getAsJsonArray();

// Business logic.

// In our example we want to parse all files in
// a package and store their IDs and names.
Map<Long, String> files = new HashMap<>();

results.forEach(e -> {
    // The array will always contain JsonObjects.
    JsonObject arrayObject = e.getAsJsonObject();

    // Map the file's ID to its download alias (its path within the package).
    files.put(arrayObject.get("package_file_id").getAsLong(), arrayObject.get("download_alias").getAsString());
});
```

#### From S3 URL
This use case was inspired by
the [DCAN Labs NDA ABCD Downloader](https://github.com/DCAN-Labs/nda-abcd-s3-downloader).
([More on this workflow](../MANIFEST_FILE_NOTE.md))
This method allows you to convert pre-existing S3
file references to Package Service files:
```java
// Assume code in authentication section is present.

// Read the manifest file and collect the S3 URLs it contains.
List<String> s3Urls = new ArrayList<>();

// Not all manifest files are the same; all that really matters
// for this approach is loading the S3 paths and having a package
// that has all of the S3 files associated with it.
// try-with-resources closes the reader when we are done.
try (BufferedReader reader = new BufferedReader(new FileReader("datastructure_manifest.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // Manifest files are tab separated.
        for (String column : line.split("\t")) {
            // Sanitize: strip any quotes around the value.
            String s3 = column.replace("\"", "");

            if (s3.startsWith("s3://")) {
                s3Urls.add(s3);
            }
        }
    }
}

// Make a Gson object for decoding the JSON responses.
final Gson GSON = new GsonBuilder()
        .setPrettyPrinting()
        .create();

final long packageId = 1234;

// Convert the S3 URLs to files of package 1234.
HttpPost s3Request = new HttpPost("https://nda.nih.gov/api/package/" + packageId + "/files");
s3Request.setHeader("Accept", "application/json");
s3Request.setHeader("Content-Type", "application/json");
s3Request.setHeader("Authorization", "Basic " + credentials);

// Create our post data.
JsonArray s3Array = new JsonArray();

// Populate post data.
s3Urls.forEach(s3Array::add);

// Set post data.
s3Request.setEntity(new ByteArrayEntity(GSON.toJson(s3Array).getBytes(StandardCharsets.UTF_8)));

response = client.execute(s3Request);

// This endpoint returns a JSON array of file objects.
String responseBody = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
JsonArray results = GSON.fromJson(responseBody, JsonArray.class);

// Business logic.

Map<Long, String> files = new HashMap<>();

results.forEach(e -> {
    // The array will always contain JsonObjects.
    JsonObject arrayObject = e.getAsJsonObject();

    // Map the file's ID to its download alias (its path within the package).
    files.put(arrayObject.get("package_file_id").getAsLong(), arrayObject.get("download_alias").getAsString());
});
```

Downloading Files
-----------------
After retrieving files using one of the above methods, you will have a map of
file ids and their names. To then download those files,
you could do something along the lines of:
```java
// Assume code in authentication section is present.
// Assume that one of the retrieving files implementations is present too.

// Create a post request to the batch generate presigned urls endpoint.
HttpPost urlRequest = new HttpPost("https://nda.nih.gov/api/package/" + packageId + "/files/batchGeneratePresignedUrls");
urlRequest.setHeader("Accept", "application/json");
urlRequest.setHeader("Content-Type", "application/json");
urlRequest.setHeader("Authorization", "Basic " + credentials);

// Create our post data.
JsonArray filesArray = new JsonArray();

// Populate post data with the file ids.
files.keySet().forEach(filesArray::add);

// Set post data.
urlRequest.setEntity(new ByteArrayEntity(GSON.toJson(filesArray).getBytes(StandardCharsets.UTF_8)));

response = client.execute(urlRequest);

responseBody = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
JsonObject presignResponse = GSON.fromJson(responseBody, JsonObject.class);

// Business logic.

// Get the presignedUrls array from the response.
JsonArray urlArray = presignResponse.get("presignedUrls").getAsJsonArray();

// Iterate on the URLs. A plain for loop (instead of forEach with a lambda)
// lets the IOExceptions thrown below propagate; this assumes the enclosing
// method declares `throws IOException`.
for (JsonElement e : urlArray) {
    // This will always be an array of JsonObjects.
    JsonObject downloadObject = e.getAsJsonObject();

    long fileId = downloadObject.get("package_file_id").getAsLong();

    URL downloadUrl = new URL(downloadObject.get("downloadURL").getAsString());
    String name = files.get(fileId);

    // Convert the remote "/" separators so this works on every platform.
    name = name.replace("/", File.separator);
    File result = new File("somedownloaddir", name);

    // Since packages have their own file structures, this
    // ensures that all required directories are created
    // based off of the remote.
    if (!result.getParentFile().exists()) {
        result.getParentFile().mkdirs();
    }

    // Begin the download and write the data to the file.
    // try-with-resources closes both the remote stream and the local file.
    try (ReadableByteChannel channel = Channels.newChannel(downloadUrl.openStream());
         FileOutputStream fileOut = new FileOutputStream(result)) {
        fileOut.getChannel().transferFrom(channel, 0, Long.MAX_VALUE);
    }
}
```

Endpoints used in this example:
- [Authentication](../endpoints/AUTHENTICATION.md)
- [Get Package Files](../endpoints/GET_PACKAGE_FILES.md)
- [Batch Generate Presigned URLs](../endpoints/BATCH_GENERATE_PRESIGNED_URLS.md)
- [Get Package Files from S3 Link](../endpoints/GET_PACKAGE_FILES_FROM_S3.md)

--------------------------------------------------------------------------------
/example/PYTHON.md:
--------------------------------------------------------------------------------
Python Code Example
===================
Some sample Python code for retrieving files
from Package Service. The `requests` library
is used in this example (`pip install requests`).
The sample code was written using Python v3.9.1.

Authentication
--------------

You can verify whether your authentication credentials work
by using the `/auth` endpoint:
```python
import base64
import requests
import urllib.request
import shutil
from pathlib import Path

# Encode our credentials, then convert them to a string.
credentials = base64.b64encode(b'username:password').decode('utf-8')

# Create the headers we will be using for all requests.
headers = {
    'Authorization': 'Basic ' + credentials,
    'User-Agent': 'Example Client',
    'Accept': 'application/json'
}

# Send the HTTP request.
response = requests.get('https://nda.nih.gov/api/package/auth', headers=headers)

# Business logic.

# If the response status code does not equal 200,
# raise an exception.
if response.status_code != requests.codes.ok:
    print('failed to authenticate')
    response.raise_for_status()

# The auth endpoint does not return any data to parse;
# only an HTTP response code is returned.
```

You don't need to call the auth endpoint in normal usage;
it is only helpful for validating your credentials.

Retrieving Files
----------------
#### From Nothing:
If you want to get and download all files related to a
single package:
```python
# Assume code in authentication section is present.

packageId = 1234

# Construct the request to get the files of package 1234.
# URL structure is: https://nda.nih.gov/api/package/{packageId}/files
# size=all asks for every file on a single page (see the `size` query parameter).
response = requests.get('https://nda.nih.gov/api/package/' + str(packageId) + '/files',
                        params={'size': 'all'}, headers=headers)

# Get the results array from the JSON response.
results = response.json()['results']

# Business logic.

files = {}

# Add important file data to the files dictionary.
for f in results:
    files[f['package_file_id']] = {'name': f['download_alias']}
```

#### From S3 URL
This use case was inspired by
the [DCAN Labs NDA ABCD Downloader](https://github.com/DCAN-Labs/nda-abcd-s3-downloader).
([More on this workflow](../MANIFEST_FILE_NOTE.md))
This method allows you to convert pre-existing S3
file references to Package Service files:
```python
import csv

# Assume code in authentication section is present.

packageId = 1234

s3Files = []

# Load in and process the manifest file.
# Not all manifest files are structured like this; all you need is
# an S3 URL and a package that has the files associated with it.
with open('datastructure_manifest.txt', 'r', newline='') as manifest:
    reader = csv.reader(manifest, dialect='excel-tab')
    # The manifest files list their column declarations twice
    # (two header rows), so skip those rows before collecting URLs.
    next(reader, None)
    next(reader, None)
    for row in reader:
        for value in row:
            if value.startswith('s3://'):
                s3Files.append(value)

# Construct the request to convert the S3 URLs to files of package 1234.
# URL structure is: https://nda.nih.gov/api/package/{packageId}/files
response = requests.post('https://nda.nih.gov/api/package/' + str(packageId) + '/files', json=s3Files, headers=headers)

# Business logic.

files = {}

# Add important file data to the files dictionary.
# This endpoint returns a JSON array, so there is no wrapper object to unpack.
for f in response.json():
    files[f['package_file_id']] = {'name': f['download_alias']}
```

Downloading Files
-----------------
After retrieving files using one of the above methods, you will have a dictionary
mapping file ids to their names. To then download those files,
you could do something along the lines of:
```python
# Assume code in authentication section is present.
# Assume that one of the retrieving files implementations is present too.

# Create a post request to the batch generate presigned urls endpoint.
# The keys of the files dictionary form a list, which is posted
# as a JSON array.
response = requests.post('https://nda.nih.gov/api/package/' + str(packageId) + '/files/batchGeneratePresignedUrls', json=list(files.keys()), headers=headers)

# Get the presigned URLs from the response.
results = response.json()['presignedUrls']

# Business logic.

# Add a download key to the file's data.
for url in results:
    files[url['package_file_id']]['download'] = url['downloadURL']

# Iterate on each file's data to perform the downloads.
for data in files.values():
    # Skip files that did not get a presigned URL (see the response's "errors" list).
    downloadUrl = data.get('download')
    if downloadUrl is None:
        continue

    # Download into a local "downloads" directory. Package files have their
    # own directory structure, and joining the download alias keeps it intact.
    file = Path('downloads') / data['name']

    # Create any non-existent parent directories.
    file.parent.mkdir(parents=True, exist_ok=True)

    # Initiate the download.
    with urllib.request.urlopen(downloadUrl) as dl, open(file, 'wb') as out_file:
        shutil.copyfileobj(dl, out_file)
```

Endpoints used in this example:
- [Authentication](../endpoints/AUTHENTICATION.md)
- [Get Package Files](../endpoints/GET_PACKAGE_FILES.md)
- [Batch Generate Presigned URLs](../endpoints/BATCH_GENERATE_PRESIGNED_URLS.md)
- [Get Package Files from S3 Link](../endpoints/GET_PACKAGE_FILES_FROM_S3.md)

--------------------------------------------------------------------------------
/example/README.md:
--------------------------------------------------------------------------------
Code Examples
=============
- [Java](JAVA.md)
- [Python](PYTHON.md)

--------------------------------------------------------------------------------
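
Both code examples read a package's file list from the Get Package Files endpoint (`GET /{packageId}/files`), whose responses are paginated through the `page` and `size` query parameters and the `_links` object. The Python sketch below shows one way to walk every page; it is not part of Package Service. It assumes that `_links` hrefs are paths relative to `https://nda.nih.gov/api/package`, that page numbering starts at 1 (as the `first` link in the sample response suggests), and that paging should stop once `next` is missing, repeats a visited URL, or a page comes back empty. `iter_package_files`, `resolve_href`, and the injected `fetch_json` callable are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator

# Assumption: relative `_links` hrefs hang off this base URL.
BASE_URL = 'https://nda.nih.gov/api/package'


def resolve_href(href: str) -> str:
    """Turn a `_links` href into an absolute URL (hypothetical helper)."""
    if href.startswith('http://') or href.startswith('https://'):
        return href
    return BASE_URL + href


def iter_package_files(package_id: int,
                       fetch_json: Callable[[str], dict],
                       page_size: int = 100) -> Iterator[dict]:
    """Yield every file record of a package by following `_links.next`.

    `fetch_json` is a hypothetical callable that performs an authenticated
    GET on a URL and returns the decoded JSON body.
    """
    # Assumption: pages are numbered from 1, like the sample `first` link.
    url = f'{BASE_URL}/{package_id}/files?page=1&size={page_size}'
    visited = set()
    # Stop if there is no next page or the server points back at a visited page.
    while url and url not in visited:
        visited.add(url)
        body = fetch_json(url)
        results = body.get('results', [])
        if not results:
            break
        yield from results
        next_link = body.get('_links', {}).get('next')
        url = resolve_href(next_link['href']) if next_link else None
```

With the Python example's `headers`, `fetch_json` could be `lambda url: requests.get(url, headers=headers).json()`. Injecting it keeps the paging logic testable without network access; passing `size=all` in a single request is the simpler alternative the endpoint documents.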