├── .github
    ├── CODE_OF_CONDUCT.md
    ├── ISSUE_TEMPLATE.md
    └── PULL_REQUEST_TEMPLATE.md
├── .gitignore
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE.md
├── README.md
├── code
    ├── AOAIHandler.py
    ├── AzureBatch.py
    ├── AzureStorageHandler.py
    ├── RunBatch.py
    └── Utilities.py
├── media
    ├── batch_accel_overview_new.png
    └── overview.pdf
├── requirements.txt
└── templates
    ├── AOAI_config_template.json
    ├── app_config.json
    ├── batch_template.json
    └── storage_config.json


/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Microsoft Open Source Code of Conduct
 2 | 
 3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
 4 | 
 5 | Resources:
 6 | 
 7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
 8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
 9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
 1 | <!--
 2 | IF SUFFICIENT INFORMATION IS NOT PROVIDED VIA THE FOLLOWING TEMPLATE THE ISSUE MIGHT BE CLOSED WITHOUT FURTHER CONSIDERATION OR INVESTIGATION
 3 | -->
 4 | > Please provide us with the following information:
 5 | > ---------------------------------------------------------------
 6 | 
 7 | ### This issue is for a: (mark with an `x`)
 8 | ```
 9 | - [ ] bug report -> please search issues before submitting
10 | - [ ] feature request
11 | - [ ] documentation issue or request
12 | - [ ] regression (a behavior that used to work and stopped in a new release)
13 | ```
14 | 
15 | ### Minimal steps to reproduce
16 | >
17 | 
18 | ### Any log messages given by the failure
19 | >
20 | 
21 | ### Expected/desired behavior
22 | >
23 | 
24 | ### OS and Version?
25 | > Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
26 | 
27 | ### Versions
28 | >
29 | 
30 | ### Mention any other details that might be useful
31 | 
32 | > ---------------------------------------------------------------
33 | > Thanks! We'll be in touch soon.
34 | 


--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
 1 | ## Purpose
 2 | <!-- Describe the intention of the changes being proposed. What problem does it solve or functionality does it add? -->
 3 | * ...
 4 | 
 5 | ## Does this introduce a breaking change?
 6 | <!-- Mark one with an "x". -->
 7 | ```
 8 | [ ] Yes
 9 | [ ] No
10 | ```
11 | 
12 | ## Pull Request Type
13 | What kind of change does this Pull Request introduce?
14 | 
15 | <!-- Please check the one that applies to this PR using "x". -->
16 | ```
17 | [ ] Bugfix
18 | [ ] Feature
19 | [ ] Code style update (formatting, local variables)
20 | [ ] Refactoring (no functional changes, no api changes)
21 | [ ] Documentation content changes
22 | [ ] Other... Please describe:
23 | ```
24 | 
25 | ## How to Test
26 | *  Get the code
27 | 
28 | ```
29 | git clone [repo-address]
30 | cd [repo-name]
31 | git checkout [branch-name]
32 | pip install -r requirements.txt
33 | ```
34 | 
35 | * Test the code
36 | <!-- Add steps to run the tests suite and/or manually test -->
37 | ```
38 | ```
39 | 
40 | ## What to Check
41 | Verify that the following are valid
42 | * ...
43 | 
44 | ## Other Information
45 | <!-- Add any other helpful information that may be needed here. -->


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | /.venv
2 | /config
3 | /data
4 | /notebooks
5 | /code/__pycache__
6 | /code/test.py


--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
 1 | ## [project-title] Changelog
 2 | 
 3 | <a name="x.y.z"></a>
 4 | # x.y.z (yyyy-mm-dd)
 5 | 
 6 | *Features*
 7 | * ...
 8 | 
 9 | *Bug Fixes*
10 | * ...
11 | 
12 | *Breaking Changes*
13 | * ...
14 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to [project-title]
 2 | 
 3 | This project welcomes contributions and suggestions.  Most contributions require you to agree to a
 4 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
 5 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
 6 | 
 7 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide
 8 | a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
 9 | provided by the bot. You will only need to do this once across all repos using our CLA.
10 | 
11 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
12 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
13 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
14 | 
15 |  - [Code of Conduct](#coc)
16 |  - [Issues and Bugs](#issue)
17 |  - [Feature Requests](#feature)
18 |  - [Submission Guidelines](#submit)
19 | 
20 | ## <a name="coc"></a> Code of Conduct
21 | Help us keep this project open and inclusive. Please read and follow our [Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
22 | 
23 | ## <a name="issue"></a> Found an Issue?
24 | If you find a bug in the source code or a mistake in the documentation, you can help us by
25 | [submitting an issue](#submit-issue) to the GitHub Repository. Even better, you can
26 | [submit a Pull Request](#submit-pr) with a fix.
27 | 
28 | ## <a name="feature"></a> Want a Feature?
29 | You can *request* a new feature by [submitting an issue](#submit-issue) to the GitHub
30 | Repository. If you would like to *implement* a new feature, please submit an issue with
31 | a proposal for your work first, to be sure that we can use it.
32 | 
33 | * **Small Features** can be crafted and directly [submitted as a Pull Request](#submit-pr).
34 | 
35 | ## <a name="submit"></a> Submission Guidelines
36 | 
37 | ### <a name="submit-issue"></a> Submitting an Issue
38 | Before you submit an issue, search the archive; your question may already have been answered.
39 | 
40 | If your issue appears to be a bug, and hasn't been reported, open a new issue.
41 | Help us to maximize the effort we can spend fixing issues and adding new
42 | features, by not reporting duplicate issues.  Providing the following information will increase the
43 | chances of your issue being dealt with quickly:
44 | 
45 | * **Overview of the Issue** - if an error is being thrown a non-minified stack trace helps
46 | * **Version** - what version is affected (e.g. 0.1.2)
47 | * **Motivation for or Use Case** - explain what you are trying to do and why the current behavior is a bug for you
48 | * **Browsers and Operating System** - is this a problem with all browsers?
49 | * **Reproduce the Error** - provide a live example or an unambiguous set of steps
50 | * **Related Issues** - has a similar issue been reported before?
51 | * **Suggest a Fix** - if you can't fix the bug yourself, perhaps you can point to what might be
52 |   causing the problem (line of code or commit)
53 | 
54 | You can file new issues by providing the above information at the corresponding repository's issues link: https://github.com/[organization-name]/[repository-name]/issues/new.
55 | 
56 | ### <a name="submit-pr"></a> Submitting a Pull Request (PR)
57 | Before you submit your Pull Request (PR) consider the following guidelines:
58 | 
59 | * Search the repository (https://github.com/[organization-name]/[repository-name]/pulls) for an open or closed PR
60 |   that relates to your submission. You don't want to duplicate effort.
61 | 
62 | * Make your changes in a new git fork:
63 | 
64 | * Commit your changes using a descriptive commit message
65 | * Push your fork to GitHub:
66 | * In GitHub, create a pull request
67 | * If we suggest changes then:
68 |   * Make the required updates.
69 |   * Rebase your fork and force push to your GitHub repository (this will update your Pull Request):
70 | 
71 |     ```shell
72 |     git rebase master -i
73 |     git push -f
74 |     ```
75 | 
76 | That's it! Thank you for your contribution!
77 | 


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 |     MIT License
 2 | 
 3 |     Copyright (c) Microsoft Corporation.
 4 | 
 5 |     Permission is hereby granted, free of charge, to any person obtaining a copy
 6 |     of this software and associated documentation files (the "Software"), to deal
 7 |     in the Software without restriction, including without limitation the rights
 8 |     to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 |     copies of the Software, and to permit persons to whom the Software is
10 |     furnished to do so, subject to the following conditions:
11 | 
12 |     The above copyright notice and this permission notice shall be included in all
13 |     copies or substantial portions of the Software.
14 | 
15 |     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 |     IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 |     FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 |     AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 |     LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 |     OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 |     SOFTWARE


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | <h1>Unofficial Azure OpenAI Batch Accelerator</h1>
 2 | <h2>Disclaimer:</h2>This is a reference implementation of the Azure OpenAI Batch API designed to be extended for different use cases.<br/>
 3 |  This code is <b>NOT</b> intended for production use; it is a starting point/reference implementation of the Azure OpenAI (AOAI) Batch API. The code here is provided <b>AS IS</b>; you assume all responsibility (e.g., charges) for running this code. Test in your own environment before running large use cases. Lastly, this is a work in progress and will be updated frequently. Please check back regularly for updates. 
 4 |  <h1>Background & Overview</h1>
 5 |  This accelerator is designed to help users to quickly start using the Azure OpenAI Batch API. An overview of how the accelerator works is shown below:
 6 |  
 7 |  ![Overview](media/batch_accel_overview_new.png)
 8 |  
 9 |  Key features of the accelerator are:
10 |  <br/>
11 |  
12 |  1. Automated Batch Job Submission and Creation
13 |  2. Multi-threaded Async Processing to Reduce Overall Processing Time
14 |  3. Automated Error Tracking
15 |  4. Multi-directory Hierarchy Support
16 |  5. Configurable Micro-batch support
17 |  6. Automated Post-job Cleanup
18 |     
19 |  <br/>
20 |  For more details, including a detailed data flow diagram, please see <a href="media/overview.pdf" type="application/pdf">this</a> overview.
21 |  
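The micro-batch feature listed above can be sketched as follows. This is a minimal illustration, not the accelerator's own code: `process_in_micro_batches` and `fake_worker` are hypothetical names, and the worker only simulates submitting one file and waiting for its batch job.

```python
import asyncio


async def process_in_micro_batches(items, worker, micro_batch_size):
    """Run worker(item) for every item, at most micro_batch_size at a time."""
    results = []
    for start in range(0, len(items), micro_batch_size):
        chunk = items[start:start + micro_batch_size]
        # All tasks in a chunk run concurrently; the next chunk starts only
        # after every task in the current chunk has finished.
        results.extend(await asyncio.gather(*(worker(item) for item in chunk)))
    return results


async def fake_worker(name):
    # Hypothetical stand-in for uploading one file and waiting for its job.
    await asyncio.sleep(0)
    return f"{name}:done"


files = [f"file_{i}.jsonl" for i in range(5)]
results = asyncio.run(process_in_micro_batches(files, fake_worker, micro_batch_size=2))
```

With five files and a micro-batch size of 2, the work runs as three sequential chunks of 2, 2, and 1 files. A larger size raises concurrency but also the number of jobs in flight at once.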
22 | <h1>Installation & Setup</h1>
23 | <i>Environment:</i><br/><br/>
24 | 
25 | 1. Python 3.11 (or higher)
26 | 2. Pip
27 | 3. An Azure Data Lake Storage (v2) account
28 | 4. An Azure OpenAI deployment
29 | 
30 | <br/><i>The following pip packages are required:</i><br/>
31 | 1. azure-storage-file-datalake<br/>
32 | 2. openai
33 | 3. tiktoken
34 | 4. requests
35 | 5. token-count
36 | 6. asyncio
37 | 7. aiohttp
38 | 
39 | It is also recommended to install these dependencies in a virtual environment (e.g., `.venv`) to avoid conflicts.
40 | <h2>Connecting AOAI to Azure Storage</h2>
41 | The `Storage Blob Data Contributor` role must be assigned to the AOAI service's Managed Identity so that AOAI can access the data in the Azure Storage Account.
42 | <h1>Configuration:</h1>
43 | Three configuration files and one variable are required to use this accelerator:
44 | 
45 | 1. `AOAI_config.json` - This file contains the settings for AOAI.
46 | 2. `storage_config.json` - This file contains the settings for the Azure Data Lake Storage Account which will hold the input/output of the job.
47 | 3. `app_config.json` - This file contains the application configuration settings.
48 | 4. `APP_CONFIG` in `RunBatch.py` - This variable should point to the `app_config.json` file, which defines the app settings. Alternatively, this value can be set as an environment variable in the underlying OS. Command line parameter-based input will be supported in the future.
49 | 
50 | Reference templates of these files have been provided in the `templates` directory where <> denote settings that must be filled in. 
51 | Other important settings are:
52 | 
53 | 1. aoai_api_version - This must be set to `2024-07-01-preview` as that's the only API version which supports the Batch API at this time. In the future, different versions can be set here.
54 | 2. batch_job_endpoint - This must be set to `/chat/completions`.
55 | 3. batch_size - This controls the 'micro batch' size, which is the number of files that will be sent to the batch service in parallel. It is set to a recommended value of `10` but can be changed
56 | based on the requirements/file sizes being sent to the batch service.
57 | 4. download_to_local - This controls whether files are downloaded locally to count the number of tokens in each file. Currently this should be left at the default value of `false`; it may be used in future versions.
58 | 5. input_directory/filesystem - This is the directory and filesystem the code will check for input files, respectively. The default directory setting of `/` assumes no directories in the input filesystem. The current implementation is not recursive; if input files are stored in a directory in the input filesystem/container then it should be specified here.
59 | 6. output_directory/filesystem - This is the directory and filesystem the code will write output files, respectively. The default directory setting of `/` assumes no directories in the output filesystem. 
60 | 7. error_directory/filesystem - This is the directory and filesystem the code will write error files, respectively. The default directory setting of `/` assumes no directories in the error filesystem.
61 | 8. continuous_mode - This setting controls how the code is run. If set to `true`, it will continuously check the input directory for files every 60 seconds, taking a snapshot of the files and kicking off a series of batch jobs to process until all files are processed. To stop, press `ctrl+c`. If set to `false` it will only run when executed. 
62 | 
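A minimal sketch of loading and validating one of these configuration files is shown here. The helper `load_json_config` and the sample values are hypothetical; the required keys are the ones `AOAIHandler` reads from its config dictionary, and the real templates may contain further settings.

```python
import json
import os
import tempfile

# Keys that AOAIHandler reads from its config dictionary.
REQUIRED_AOAI_KEYS = (
    "aoai_deployment_name", "aoai_endpoint", "aoai_key",
    "aoai_api_version", "batch_job_endpoint", "completion_window",
)


def load_json_config(path, required_keys=()):
    """Load a JSON config file and fail early if a required key is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(f"{path} is missing settings: {', '.join(missing)}")
    return config


# Placeholder values for illustration only; substitute real settings.
sample = {
    "aoai_deployment_name": "my-batch-deployment",
    "aoai_endpoint": "https://example.openai.azure.com/",
    "aoai_key": "<key>",
    "aoai_api_version": "2024-07-01-preview",
    "batch_job_endpoint": "/chat/completions",
    "completion_window": "24h",
}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "AOAI_config.json")
    with open(sample_path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)
    aoai_config = load_json_config(sample_path, REQUIRED_AOAI_KEYS)
```

Validating at load time surfaces a missing setting before any file is uploaded, rather than as a `KeyError` partway through a run.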
63 | <h1>Using the accelerator</h1>
64 | 
65 | 1. <b>Input</b>: Upload formatted batch files to the input location specified in the `storage_config.json` configuration file. Once all files are uploaded, start `RunBatch.py` in the `code` directory. The code will run continuously or once, depending on the `continuous_mode` setting described above. 
66 | 2. <b>Output</b>: The code will create a directory in the `processed_filesystem_system_name` location in the `storage_config.json` configuration file for each file processed, along with a timestamp of when the file was processed. The raw input file will also be moved to the `processed` directory. In addition, any errors will be written to the `error_filesystem_system_name` location, with a timestamp. 
67 | 3. <b>Metadata</b>: The output creates a metadata file for each input file which contains mapping information which may be useful for automated processing of results.
68 | 4. <b>Cleanup</b>: After processing is complete, the code will automatically clean up all files in the input directory, all locally downloaded files, and all files uploaded to the AOAI Batch Service.
69 | 
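The "formatted batch files" in step 1 are JSON Lines files with one request per line. The sketch below builds such a file in memory, assuming the Azure OpenAI Batch input layout (`custom_id`, `method`, `url`, `body`) with `url` matching the `batch_job_endpoint` setting. The helper `build_batch_lines`, the prompts, and the deployment name are hypothetical.

```python
import json


def build_batch_lines(prompts, deployment_name, endpoint="/chat/completions"):
    """Return JSON Lines text with one chat completion request per prompt."""
    lines = []
    for index, prompt in enumerate(prompts):
        request = {
            "custom_id": f"task-{index}",  # must be unique within the file
            "method": "POST",
            "url": endpoint,
            "body": {
                "model": deployment_name,  # the AOAI deployment name
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines) + "\n"


batch_text = build_batch_lines(
    ["What is 2+2?", "Name a prime number."],
    deployment_name="my-batch-deployment",  # placeholder
)
```

The `custom_id` is what lets results in the output file be matched back to the requests that produced them, so it is worth deriving from a stable identifier in real data.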
70 | <h1>Issues</h1>
71 | If you have any problems using this code or would like to see a new feature added, please create a new issue using the 'Issues' tab.
72 | 
73 | 


--------------------------------------------------------------------------------
/code/AOAIHandler.py:
--------------------------------------------------------------------------------
  1 | 
  2 | from openai import AzureOpenAI
  3 | import requests
  4 | import aiohttp
  5 | import datetime
  6 | import asyncio
  7 | 
  8 | class AOAIHandler:
  9 |     def __init__(self, config, batch=False):
 10 |         self.config_data = config
 11 |         self.model = config["aoai_deployment_name"]
 12 |         self.batch_endpoint = config["batch_job_endpoint"]
 13 |         self.completion_window = config["completion_window"]
 14 |         self.aoai_client = self.init_client(config)
 15 |         self.batch_status = {}
 16 |         self.azure_endpoint = config['aoai_endpoint']
 17 |         self.api_version = config['aoai_api_version']
 18 |         self.api_key = config["aoai_key"]
 19 |     def init_client(self,config):
 20 |         client = AzureOpenAI(
 21 |             azure_endpoint = config['aoai_endpoint'], 
 22 |             api_key=config['aoai_key'],  
 23 |             api_version=config['aoai_api_version']
 24 |         )
 25 |         return client
 26 |     async def upload_batch_input_file_async(self,input_file_name, input_file_path, session):
 27 |         try:
 28 |             url = f"{self.azure_endpoint}openai/files/import?api-version={self.api_version}"
 29 |             headers = {
 30 |                 "Content-Type": "application/json",
 31 |                 "api-key": self.api_key  # Key read from the AOAI config
 32 |             }
 33 |             # Define the payload
 34 |             payload = {
 35 |                 "purpose": "batch",
 36 |                 "filename": input_file_name,
 37 |                 "content_url": input_file_path
 38 |             }
 39 |             async with session.post(url, headers=headers, json=payload) as response:
 40 |                 return await response.json()
 41 |         except Exception as e:
 42 |             print(f"An exception occurred while uploading the file: {e}")
 43 |             return False
 44 |     def upload_batch_input_file(self,input_file_name, input_file_path):
 45 |         try:
 46 |             url = f"{self.azure_endpoint}openai/files/import?api-version={self.api_version}"
 47 |             headers = {
 48 |                 "Content-Type": "application/json",
 49 |                 "api-key": self.api_key  # Key read from the AOAI config
 50 |             }
 51 |             # Define the payload
 52 |             payload = {
 53 |                 "purpose": "batch",
 54 |                 "filename": input_file_name,
 55 |                 "content_url": input_file_path
 56 |             }
 57 |         
 58 |             return requests.request("POST", url, headers=headers, json=payload)
 59 |         except Exception as e:
 60 |             print(f"An exception occurred while uploading the file: {e}")
 61 |             return False
 62 |     def delete_single(self, file_id):
 63 |         deletion_status = False
 64 |         try:
 65 |             # Attempt to delete the file
 66 |             response = self.aoai_client.files.delete(file_id)
 67 |             print(f"File {file_id} deleted from client successfully.")
 68 |             deletion_status = True
 69 |         except Exception as e:
 70 |             # Handle any exceptions that occur
 71 |             print(f"An error occurred while deleting file {file_id}: {e}")
 72 |         return deletion_status
 73 |     def delete_all_files(self):
 74 |         deletion_status = {}
 75 |         file_objects = self.aoai_client.files.list().data
 76 |         # Extracting the ids using a list comprehension
 77 |         file_ids = [file_object.id for file_object in file_objects] 
 78 |         for file_id in file_ids:
 79 |             try:
 80 |                 # Attempt to delete the file
 81 |                 response = self.aoai_client.files.delete(file_id)
 82 |                 print(f"File {file_id} deleted successfully.")
 83 |                 deletion_status[file_id] = True
 84 |             except Exception as e:
 85 |                 # Handle any exceptions that occur
 86 |                 print(f"An error occurred while deleting file {file_id}: {e}")
 87 |                 deletion_status[file_id] = False
 88 |     
 89 |         return deletion_status
 90 |     def create_batch_job(self,file_id):
 91 |         # Submit a batch job with the file
 92 |         batch_response = self.aoai_client.batches.create(
 93 |             input_file_id=file_id,
 94 |             endpoint=self.batch_endpoint,
 95 |             completion_window=self.completion_window,
 96 |         )
 97 |         # Save batch ID for later use
 98 |         batch_id = batch_response.id
 99 |         self.batch_status[batch_id] = "Submitted"
100 |         return batch_response
101 |     async def wait_for_file_upload(self, file_id):
102 |         status = "pending"
103 |         while True:
104 |             file_response = self.aoai_client.files.retrieve(file_id)
105 |             status = file_response.status
106 |             if status == "error":
107 |                 print(f"{datetime.datetime.now()} Error occurred while processing file {file_id}")
108 |                 break
109 |             elif status == "processed":
110 |                 print(f"{datetime.datetime.now()} File {file_id} processed successfully.")
111 |                 break
112 |             else:
113 |                 print(f"{datetime.datetime.now()} File Id: {file_id}, Status: {status}")
114 |             await asyncio.sleep(5)
115 |         return file_response
116 |     async def wait_for_batch_job(self, batch_id):
117 |         # Poll until the batch job reaches a terminal state
118 |         status = "validating"
119 |         while status not in ("completed", "failed", "canceled", "cancelled", "expired"):
120 |             batch_response = self.aoai_client.batches.retrieve(batch_id)
121 |             status = batch_response.status
122 |             print(f"{datetime.datetime.now()} Batch Id: {batch_id},  Status: {status}")
123 |             await asyncio.sleep(10)
124 |         if status == "failed":
125 |             print(f"Batch job {batch_id} failed.")
126 |         elif status in ("canceled", "cancelled", "expired"):
127 |             print(f"Batch job {batch_id} ended with status: {status}.")
128 |         else:
129 |             print(f"Batch job {batch_id} completed successfully.")
130 |         return batch_response

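The polling pattern used by `wait_for_file_upload` and `wait_for_batch_job` above can be sketched generically. This is an illustration under assumptions, not part of the handler: `poll_until_terminal` and `max_polls` are hypothetical, the terminal status set is assumed from the Batch API status values, and a simulated status sequence stands in for repeated `batches.retrieve()` calls.

```python
import asyncio

# Assumed terminal statuses for a batch job.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


async def poll_until_terminal(fetch_status, interval_seconds=10.0, max_polls=None):
    """Call fetch_status() until it returns a terminal status or polls run out."""
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status in TERMINAL_STATUSES:
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"still '{status}' after {polls} polls")
        # Yield to the event loop so other files can make progress meanwhile.
        await asyncio.sleep(interval_seconds)


# Simulated status sequence; a real caller would query the service instead.
fake_statuses = iter(["validating", "in_progress", "finalizing", "completed"])
final_status = asyncio.run(
    poll_until_terminal(lambda: next(fake_statuses), interval_seconds=0)
)
```

Checking the status before sleeping avoids one extra wait after the job has already finished, and the optional `max_polls` bound keeps a job stuck in a non-terminal state from blocking a micro-batch indefinitely.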

--------------------------------------------------------------------------------
/code/AzureBatch.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import json
  3 | from Utilities import Utils
  4 | import asyncio
  5 | import aiohttp
  6 | class AzureBatch:
  7 |     def __init__(self, aoai_client, input_storage_handler, 
  8 |                  error_storage_handler, processed_storage_handler, batch_path,
  9 |                 input_directory_client, local_download_path, output_directory, error_directory,
 10 |                 count_tokens=False):
 11 |         self.aoai_client = aoai_client
 12 |         self.input_storage_handler = input_storage_handler
 13 |         self.error_storage_handler = error_storage_handler
 14 |         self.processed_storage_handler = processed_storage_handler
 15 |         self.batch_path = batch_path
 16 |         self.input_directory_client = input_directory_client
 17 |         self.local_download_path = local_download_path
 18 |         self.output_directory = output_directory
 19 |         self.error_directory = error_directory
 20 |         self.count_tokens = count_tokens
 21 | 
 22 |     async def process_all_files(self,files,micro_batch_size):
 23 |         tasks = []
 24 |         current_tasks = 0
 25 |         async with aiohttp.ClientSession() as session:
 26 |             for file in files:
 27 |                 tasks.append(self.process_file(file, session))
 28 |                 current_tasks += 1
 29 |                 if current_tasks == micro_batch_size:
 30 |                     await asyncio.gather(*tasks)
 31 |                     tasks = []
 32 |                     current_tasks = 0
 33 |             #Process any remaining tasks
 34 |             if len(tasks) > 0:
 35 |                 await asyncio.gather(*tasks)
 36 |         
 37 |     async def process_file(self,file, session):
 38 |         print(f"Processing file {file}")
 39 |         filename_only = Utils.get_file_name_only(file)
 40 |         file_wo_directory = Utils.strip_directory_name(file)
 41 |         file_extension = Utils.get_file_extension(file_wo_directory)
 42 |         output_directory_name = self.output_directory+"/"+Utils.append_postfix(filename_only)
 43 |         error_directory_name = self.error_directory+"/"+Utils.append_postfix(filename_only)
 44 |         #Mark start time
 45 |         processing_result = {}
 46 |         batch_data = None
 47 |         try:
 48 |             batch_data = await self.submit_batch_job(file, file_wo_directory, error_directory_name, filename_only, session)
 49 |             if batch_data is None:
 50 |                 return
 51 |             self.process_batch_result(batch_data, filename_only, file_extension, file_wo_directory, 
 52 |                                   error_directory_name, output_directory_name)
 53 |             cleanup_status = self.cleanup_batch(file_wo_directory,batch_data["file_id"], batch_data["output_file_id"], batch_data["error_file_id"])
 54 |             processing_result["cleanup_status"] = cleanup_status
 55 |         except Exception as e:
 56 |             #Unexpected exception during processing
 57 |             print(f"An error occurred while processing file: {file}. Error: {e}")
 58 |             if batch_data is not None:
 59 |                 file_write_result = self.error_storage_handler.write_content_to_directory(batch_data["batch_file_data"],error_directory_name,filename_only)
 60 |                 cleanup_status = self.cleanup_batch(file_wo_directory,batch_data["file_id"], batch_data["output_file_id"], batch_data["error_file_id"])
 61 |                 processing_result["cleanup_status"] = cleanup_status
 62 |         return processing_result
 63 |     
 64 |     async def submit_batch_job(self,file, file_wo_directory, error_directory_name, filename_only, session):
 65 |         batch_storage_path = self.batch_path + file
 66 |         try:
 67 |             if self.local_download_path is not None:
 68 |                 output_path = os.path.join(self.local_download_path, file)
 69 |                 batch_file_data = self.input_storage_handler.save_file_to_local(file, 
 70 |                                             self.input_directory_client, output_path)
 71 |                 if self.count_tokens:
 72 |                     token_size = Utils.get_tokens_in_file(output_path,"gpt-4")
 73 |             else:
 74 |                 batch_file_data = self.input_storage_handler.get_file_data(file_wo_directory,self.input_directory_client)
 75 |                 batch_file_string = str(batch_file_data)
 76 |                 if self.count_tokens:
 77 |                     token_size = Utils.num_tokens_from_string(batch_file_string,"gpt-4")
 78 |         except Exception as e:
 79 |             print(f"Could not download file: {file}. Error: {e}")
 80 |             return None
 81 |         if self.count_tokens:
 82 |             print(f"File {file} has {token_size} tokens")
 83 |         else:
 84 |             token_size = "N/A"
 85 |          # Process the file
 86 |         upload_response = await self.aoai_client.upload_batch_input_file_async(file,batch_storage_path, session)
 87 |         if not upload_response:
 88 |             print(f"An error occurred while uploading file {file}. Please check the file and try again.")
 89 |             file_write_result = self.error_storage_handler.write_content_to_directory(batch_file_data,error_directory_name,file_wo_directory)
 90 |             cleanup_status = self.cleanup_batch(file_wo_directory,None, None, None)
 91 |             return None
 92 |         file_content_json = upload_response
 93 |         if "error" in file_content_json:
 94 |             print(f"An error occurred while uploading file {file}. Please check the file and try again.\n\nCode: "+file_content_json["error"]["code"]+"\n\nMessage: "+file_content_json["error"]["message"])
 95 |             file_write_result = self.error_storage_handler.write_content_to_directory(batch_file_data,error_directory_name,file_wo_directory)
 96 |             cleanup_status = self.cleanup_batch(file_wo_directory,None, None, None)
 97 |             return None
 98 |         file_id = file_content_json['id']
 99 |         print(f"file_id: {file_content_json['id']}")
100 |         #TODO: Check if the file was uploaded successfully, if not, move to error folder and cleanup
101 |         await self.aoai_client.wait_for_file_upload(file_id)
102 |         try:
103 |             initial_batch_response = self.aoai_client.create_batch_job(file_id)
104 |         except Exception as e:
105 |             print(f"An error occurred while creating batch job for file: {file}. Error: {e}")
106 |             file_write_result = self.error_storage_handler.write_content_to_directory(batch_file_data,error_directory_name,file_wo_directory)
107 |             cleanup_status = self.cleanup_batch(file_wo_directory,None, None, None)
108 |             return None
109 |         #Poll until the batch job reaches a terminal state
110 |         finished_batch_response = await self.aoai_client.wait_for_batch_job(initial_batch_response.id)
111 |         batch_data = {
112 |             "file": file,
113 |             "input_file_id": finished_batch_response.input_file_id,
114 |             "batch_job_id": initial_batch_response.id,
115 |             "error_file_id": finished_batch_response.error_file_id,
116 |             "output_file_id": finished_batch_response.output_file_id,
117 |             "token_size": token_size,
118 |             "initial_batch_response": initial_batch_response,
119 |             "finished_batch_response": finished_batch_response,
120 |             "file_id": file_id,
121 |             "batch_file_data": batch_file_data
122 |         }
123 |         return batch_data
124 |     
125 |     def process_batch_result(self,batch_data, filename_only, file_extension, file_wo_directory, 
126 |                              error_directory_name, output_directory_name):
127 |         batch_metadata = self.create_batch_metadata(batch_data)
128 |         metadata_filename = f"{filename_only}_metadata."+file_extension
129 |         if batch_data["error_file_id"] is not None:
130 |             error_file_content = self.aoai_client.aoai_client.files.content(batch_data["error_file_id"])
131 |             error_file_content_string = str(error_file_content.text)
132 |         else:
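133 |             batch_errors = batch_data["finished_batch_response"].errors
134 |             error_file_content = {}
135 |             #Number each error so entries do not overwrite each other; errors may be None
136 |             for error_index, error in enumerate(getattr(batch_errors, "data", None) or [], start=1):
137 |                 error_file_content["Error "+str(error_index)] = error.message
138 |             error_file_content_string = json.dumps(error_file_content) if error_file_content else ""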
139 |         if batch_data["output_file_id"] is not None:
140 |             output_file_content = self.aoai_client.aoai_client.files.content(batch_data["output_file_id"])  
141 |             output_file_content_string = str(output_file_content.text)
142 |         else:
143 |             output_file_content = ""
144 |             output_file_content_string = "" 
145 |         filename = batch_data["file"]
146 |         file_id = batch_data["initial_batch_response"].id
147 |         if not error_file_content_string == "":
148 |             error_filename = f"{filename_only}_error."+file_extension
149 |             batch_metadata["error_file_name"] = error_filename
150 |             error_file_content_json = error_file_content_string
151 |             error_file_metadata = json.dumps(batch_metadata)
152 |             error_content_write_result = self.error_storage_handler.write_content_to_directory(error_file_content_json,error_directory_name,error_filename)
153 |             error_metadata_write_result = self.error_storage_handler.write_content_to_directory(error_file_metadata,error_directory_name,metadata_filename)
154 |             file_write_result = self.error_storage_handler.write_content_to_directory(batch_data["batch_file_data"],error_directory_name,file_wo_directory)
155 |             if error_content_write_result and error_metadata_write_result:
156 |                 print("An error file with details was written to the 'error' directory.")
157 |             else:
158 |                 print(f"There was a problem processing file: {filename} and details could not be written to storage. Please check {file_id} for more details.")
159 |         if not output_file_content_string == "":
160 |             output_filename = f"{filename_only}_output."+file_extension
161 |             batch_metadata["output_file_name"] = output_filename
162 |             # Output content was already downloaded above; reuse output_file_content_string
163 |             output_file_content_json = output_file_content_string
164 |             output_file_metadata = json.dumps(batch_metadata)
165 |             output_content_write_result = self.processed_storage_handler.write_content_to_directory(output_file_content_json,output_directory_name,output_filename)
166 |             output_metadata_write_result = self.processed_storage_handler.write_content_to_directory(output_file_metadata,output_directory_name,metadata_filename)
167 |             file_write_result = self.processed_storage_handler.write_content_to_directory(batch_data["batch_file_data"],output_directory_name,file_wo_directory)
168 |             if output_content_write_result and output_metadata_write_result:
169 |                 print(f"File: {filename} has been processed successfully. Results are available in the 'processed' directory.")
170 |             else:
171 |                 print(f"File: {filename} has been processed successfully but could not be written to storage. Please check {file_id} for more details.")  
172 |     
173 |     def create_batch_metadata(self,batch_data):
174 |         batch_metadata = {
175 |             "file_name": batch_data["file"],
176 |             "input_file_id": batch_data["finished_batch_response"].input_file_id,
177 |             "batch_job_id": batch_data["initial_batch_response"].id,
178 |             "error_file_id": batch_data["finished_batch_response"].error_file_id,
179 |             "output_file_id": batch_data["finished_batch_response"].output_file_id,
180 |             "token_size": batch_data["token_size"],
181 |             "file_id": batch_data["file_id"]
182 |         }
183 |         return batch_metadata
184 |     
185 |     def cleanup_batch(self,filename,file_id, output_file_id, error_file_id):
186 |         cleanup_result = {}
187 |         if file_id is not None:
188 |             print("Deleting input file from client...")
189 |             deletion_status = self.aoai_client.delete_single(file_id)     
190 |         if output_file_id is not None:
191 |             print("Deleting output file from client...")
192 |             deletion_status = self.aoai_client.delete_single(output_file_id)
193 |         if error_file_id is not None:
194 |             print("Deleting error file from client...")
195 |             deletion_status = self.aoai_client.delete_single(error_file_id)
196 |         if self.local_download_path is not None:
197 |             local_filename_with_path = os.path.join(self.local_download_path, filename)
198 |             if os.path.exists(local_filename_with_path):
199 |                 os.remove(local_filename_with_path)
200 |                 print(f"File {local_filename_with_path} deleted successfully.")
201 |                 cleanup_result["local_file_deletion"] = True
202 |         az_storage_deletion_status = self.input_storage_handler.delete_file_data(filename,self.input_directory_client)
203 |         if az_storage_deletion_status:
204 |             print(f"File {filename} deleted from storage successfully.")
205 |             cleanup_result["az_storage_file_deletion"] = True
206 |         else:
207 |             print(f"An error occurred while deleting file {filename} from storage.")
208 |             cleanup_result["az_storage_file_deletion"] = False
209 |         return cleanup_result
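
When a batch job has no error file, `process_batch_result` builds an error summary from `finished_batch_response.errors.data`. The following is a minimal sketch of that step, assuming each error object exposes a `message` attribute as in the code above. `summarize_batch_errors` is a hypothetical helper, not part of the repository, and the `SimpleNamespace` objects are stand-ins for the SDK's error entries.

```python
import json
from types import SimpleNamespace


def summarize_batch_errors(errors):
    """Return a JSON string mapping "Error N" to each error message."""
    # `errors` may be None or empty when no job-level errors were reported.
    summary = {
        f"Error {index}": error.message
        for index, error in enumerate(errors or [], start=1)
    }
    return json.dumps(summary)


# Stand-ins for the entries found in finished_batch_response.errors.data
sample_errors = [
    SimpleNamespace(message="invalid model"),
    SimpleNamespace(message="malformed line 3"),
]
print(summarize_batch_errors(sample_errors))
```

Using `enumerate(..., start=1)` gives every error its own key, so no message overwrites another.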


--------------------------------------------------------------------------------
/code/AzureStorageHandler.py:
--------------------------------------------------------------------------------
  1 | from azure.storage.filedatalake import (
  2 |     DataLakeServiceClient,
  3 |     DataLakeDirectoryClient,
  4 |     FileSystemClient
  5 | )
  6 | import json
  7 | class StorageHandler:
  8 |     def __init__(self, storage_account_name, storage_account_key, file_system_name=None):
  9 |         self.storage_account_name = storage_account_name
 10 |         self.storage_account_key = storage_account_key
 11 |         self.service_client = self.get_service_client_account_key(storage_account_name, storage_account_key)
 12 |         if file_system_name is not None:
 13 |             self.file_system_client = self.get_file_system_client(file_system_name)
 14 |         else:
 15 |             self.file_system_client = None
 16 |         self.byte_read_size = 50000
 17 |     def get_directories(self,path):
 18 |         paths = self.file_system_client.get_paths(path=path)
 19 |         return_paths = []
 20 |         for current_path in paths:
 21 |             if current_path.is_directory:
 22 |                 return_paths.append(current_path.name)
 23 |         #No subdirectories found, return the current directory
 24 |         if len(return_paths) == 0:
 25 |             return_paths.append(path)
 26 |         return return_paths
 27 |     def write_content_to_directory(self, file_content, directory_name, output_filename):
 28 |         write_result = False  
 29 |         destination_directory_client = self.get_or_create_directory_client(directory_name)     
 30 |         result_file_content_status = self.write_json_to_storage(output_filename,file_content,destination_directory_client)
 31 |         if result_file_content_status:
 32 |             write_result = True
 33 |             print(f"File {output_filename} written to storage directory.")
 34 |         else:
 35 |             print(f"Error writing file {output_filename} to directory.")
 36 |         return write_result
 37 |     def get_or_create_directory_client(self,directory_name):
 38 |         dir_exists = self.check_directory_exists(directory_name)
 39 |         if(dir_exists):
 40 |             directory_client = self.get_directory_client(directory_name)
 41 |         else:
 42 |             directory_client = self.create_directory(directory_name)
 43 |         return directory_client
 44 |     def write_bytes_to_storage_chunked(self, source_filename,source_directory_client, 
 45 |                                        destination_filename,destination_directory_client):
 46 |         try:
 47 |             output_file_stream = destination_directory_client.get_file_client(destination_filename)
 48 |             file_content_stream = self.get_file_stream(source_filename,source_directory_client)
 49 |             byte_stream = file_content_stream.read(self.byte_read_size)
 50 |             offset = 0
 51 |             while len(byte_stream) > 0:
 52 |                 size = len(byte_stream)
 53 |                 if not output_file_stream.exists():
 54 |                     output_file_stream.upload_data(data=byte_stream, overwrite=True)
 55 |                 else:
 56 |                     output_file_stream.append_data(data=byte_stream, offset=offset, length=size, flush=True)
 57 |                 offset += size
 58 |                 byte_stream = file_content_stream.read(self.byte_read_size)
 59 |         except Exception as e:
 60 |             print(f"Error writing file {source_filename} to destination directory: {e}")
 61 |             
 62 |     def copy_file_to_directory(self, source_filename, source_directory, destination_filesystem_client, 
 63 |                                destination_directory, destination_filename ):
 64 |         source_directory_client = self.get_directory_client(source_directory)
 65 |         destination_directory_client = destination_filesystem_client.get_or_create_directory_client(destination_directory)
 66 |         self.write_bytes_to_storage_chunked(source_filename,source_directory_client,destination_filename, 
 67 |                                             destination_directory_client)
 68 |         
 69 |         return True
 70 |     def write_json_to_storage(self,output_name,output_data,directory_client):
 71 |         return_code = True
 72 |         try:
 73 |             file_client = directory_client.get_file_client(output_name)
 74 |             file_client.upload_data(output_data, overwrite=True)
 75 |         except Exception as e:
 76 |             print(f"Error uploading file {output_name}: {e}")
 77 |             return_code = False
 78 |         return return_code
 79 |     def check_directory_exists(self,directory_name):
 80 |         return_status = False
 81 |         try:
 82 |             directory_client = self.file_system_client.get_directory_client(directory_name)
 83 |             if directory_client.exists():
 84 |                 return_status = True
 85 |             else:
 86 |                 return_status = False
 87 |         except Exception as e:
 88 |             return_status = False
 89 |         return return_status   
 90 |     def create_directory(self, directory_name: str) -> DataLakeDirectoryClient:
 91 |         directory_client = self.file_system_client.create_directory(directory_name)
 92 |         return directory_client
 93 |     
 94 |     def get_directory_client(self, directory_name: str) -> DataLakeDirectoryClient:
 95 |         directory_client = self.file_system_client.get_directory_client(directory_name)
 96 |         return directory_client
 97 |     
 98 |     def get_file_list(self, path: str) -> list:
 99 |         file_list = [] 
100 |         paths = self.file_system_client.get_paths(path=path)
101 |         for current_path in paths:
102 |             if not current_path.is_directory:
103 |                 file_list.append(current_path.name)
104 |         return file_list
105 |     def get_file_stream(self, file_name,directory_client):
106 |         file_client = directory_client.get_file_client(file_name)
107 |         download = file_client.download_file()
108 |         return download
109 |     def get_file_data(self, file_name,directory_client):
110 |         file_client = directory_client.get_file_client(file_name)
111 |         download = file_client.download_file()
112 |         return download.readall()
113 |     def delete_file_data(self, file_name,directory_client):
114 |         return_status = True
115 |         try:
116 |             file_client = directory_client.get_file_client(file_name)
117 |             file_client.delete_file()
118 |         except Exception as e:
119 |             return_status = False
120 |         return return_status
121 |     def save_file_to_local(self, file_name, directory_client, local_path):
122 |         file_client = directory_client.get_file_client(file_name)
123 |         download = file_client.download_file()
124 |         data = download.readall()
125 |         try:
126 |             with open(local_path, "wb") as file:
127 |                 file.write(data)
128 |             print(f"File {file_name} saved to local path {local_path}")
129 |         except Exception as e:
130 |             print(f"An error occurred while saving file {file_name} to local path {local_path}: {e}")
131 |         return data
132 | 
133 |     def get_file_system_client(self, file_system_name: str) -> FileSystemClient:
134 |         file_system_client = self.service_client.get_file_system_client(file_system_name)
135 |         return file_system_client
136 | 
137 |     def get_service_client_account_key(self, account_name, account_key) -> DataLakeServiceClient:
138 |         account_url = f"https://{account_name}.dfs.core.windows.net"
139 |         service_client = DataLakeServiceClient(account_url, credential=account_key)
140 | 
141 |         return service_client
142 | 
143 | 
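
The read/append loop in `write_bytes_to_storage_chunked` can be illustrated without a storage account. The sketch below uses in-memory `io.BytesIO` streams as stand-ins for the Data Lake download stream and file client; `copy_in_chunks` is a hypothetical helper, and the default chunk size mirrors `byte_read_size` above.

```python
import io


def copy_in_chunks(source, destination, chunk_size=50000):
    """Copy source to destination one chunk at a time; return bytes copied."""
    offset = 0
    chunk = source.read(chunk_size)
    while chunk:
        # In the real handler this is upload_data/append_data at `offset`.
        destination.write(chunk)
        offset += len(chunk)
        chunk = source.read(chunk_size)
    return offset


payload = b"x" * 120001  # deliberately not a multiple of the chunk size
src, dst = io.BytesIO(payload), io.BytesIO()
copied = copy_in_chunks(src, dst)
print(f"Copied {copied} bytes")
```

Tracking the offset in one place keeps the last, shorter chunk from being written at the wrong position.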


--------------------------------------------------------------------------------
/code/RunBatch.py:
--------------------------------------------------------------------------------
 1 | from Utilities import Utils
 2 | from AzureStorageHandler import StorageHandler
 3 | from AOAIHandler import AOAIHandler
 4 | from AzureBatch import AzureBatch
 5 | import time
 6 | import asyncio
 7 | import signal
 8 | import sys
 9 | import os
10 | 
11 | def signal_handler(sig, frame):
12 |     print('Exiting...')
13 |     sys.exit(0)
14 | 
15 | def main():
16 |     signal.signal(signal.SIGINT, signal_handler)
17 |     APP_CONFIG = os.environ.get('APP_CONFIG', r"C:\Users\dade\Desktop\AOAIBatchWorkingFork\aoai-batch-api-accelerator\config\app_config.json")
18 |     try:
19 |         app_config_data = Utils.read_json_data(APP_CONFIG)
20 |         storage_config_data = Utils.read_json_data(app_config_data["storage_config"])
21 |         storage_account_name = storage_config_data["storage_account_name"]
22 |         storage_account_key = storage_config_data["storage_account_key"]
23 |         input_filesystem_system_name =  storage_config_data["input_filesystem_system_name"]
24 |         error_filesystem_system_name = storage_config_data["error_filesystem_system_name"]
25 |         processed_filesystem_system_name = storage_config_data["processed_filesystem_system_name"]
26 |         input_directory = storage_config_data["input_directory"]
27 |         output_directory = storage_config_data["output_directory"]
28 |         error_directory = storage_config_data["error_directory"]
29 |         aoai_config_data = Utils.read_json_data(app_config_data["AOAI_config"])
30 |         BATCH_PATH = "https://"+storage_account_name+".blob.core.windows.net/"+input_filesystem_system_name+"/"
31 |         batch_size = int(app_config_data["batch_size"])
32 |         count_tokens = int(app_config_data["count_tokens"])
33 |         aoai_client = AOAIHandler(aoai_config_data)
34 |         input_storage_handler = StorageHandler(storage_account_name, storage_account_key, input_filesystem_system_name)
35 |         error_storage_handler = StorageHandler(storage_account_name, storage_account_key, error_filesystem_system_name)
36 |         processed_storage_handler = StorageHandler(storage_account_name, storage_account_key, processed_filesystem_system_name)
37 |         files = input_storage_handler.get_file_list(input_directory)
38 |         input_directory_client = input_storage_handler.get_directory_client(input_directory)
39 |         download_to_local = app_config_data["download_to_local"]
40 |         local_download_path = None
41 |         if download_to_local:
42 |             local_download_path = app_config_data["local_download_path"]
43 |         continuous_mode = app_config_data["continuous_mode"]
44 |         azure_batch = AzureBatch(aoai_client, input_storage_handler, 
45 |                                 error_storage_handler, processed_storage_handler, BATCH_PATH, input_directory_client, 
46 |                                 local_download_path,output_directory, error_directory,count_tokens)
47 |     except Exception as e:
48 |         print(f"An error occurred while initializing the application, please check the configuration. \n\n\tException:\n\n\t\t{e}\n\n")
49 |         return
50 |     if continuous_mode:
51 |         print("Running in continuous mode")
52 |         while True:
53 |             if len(files) > 0:
54 |                 asyncio.run(azure_batch.process_all_files(files, batch_size))
55 |             else:
56 |                 print("No files found. Sleeping for 60 seconds")
57 |                 time.sleep(60) 
58 |             files = input_storage_handler.get_file_list(input_directory)
59 |     else:
60 |         print("Running in on-demand mode")
61 |         asyncio.run(azure_batch.process_all_files(files, batch_size))   
62 | 
63 |     #TODO: 1) Support blob storage
64 |      
65 | 
66 | 
67 |           
68 |            
69 | if __name__ == "__main__":
70 |     main()
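
`main()` reads several keys from the app config and reports any failure through one generic exception message. As a sketch only, the check below validates those keys up front so a missing one is named explicitly. The key names are taken from `main()` above; `load_app_config` and `REQUIRED_APP_KEYS` are hypothetical names, not part of the repository.

```python
import json
import os
import tempfile

# Keys that main() reads unconditionally from the app config.
REQUIRED_APP_KEYS = (
    "AOAI_config", "storage_config", "batch_size",
    "download_to_local", "continuous_mode", "count_tokens",
)


def load_app_config(path):
    """Load the app config and fail early with the names of missing keys."""
    with open(path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    missing = [key for key in REQUIRED_APP_KEYS if key not in config]
    # local_download_path is only needed when downloads are enabled.
    if config.get("download_to_local") and "local_download_path" not in config:
        missing.append("local_download_path")
    if missing:
        raise KeyError("app config is missing keys: " + ", ".join(missing))
    return config


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "app_config.json")
    with open(sample_path, "w", encoding="utf-8") as sample_file:
        json.dump({"AOAI_config": "a.json", "storage_config": "s.json",
                   "batch_size": 10, "download_to_local": False,
                   "continuous_mode": True, "count_tokens": False}, sample_file)
    loaded_config = load_app_config(sample_path)
    print(loaded_config["batch_size"])
```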


--------------------------------------------------------------------------------
/code/Utilities.py:
--------------------------------------------------------------------------------
 1 | import json
 2 | import tiktoken
 3 | import os
 4 | from token_count import TokenCount
 5 | from  datetime import datetime
 6 | class Utils:
 7 |     #Add utility to count output tokens and estimate price.
 8 |     def __init__(self):
 9 |         pass
10 |     @staticmethod
11 |     def strip_directory_name(file_name):
12 |         file_name_split = file_name.split("/")
13 |         return file_name_split[-1]
14 |     @staticmethod
15 |     def get_file_name_only(file_name):
16 |         file_name_with_extension = Utils.strip_directory_name(file_name)
17 |         file_name_with_extension_split = file_name_with_extension.split(".")
18 |         file_name_only = file_name_with_extension_split[0]
19 |         return file_name_only
20 |     @staticmethod
21 |     def read_json_data(file_name):
22 |         with open(file_name) as json_file:
23 |             data = json.load(json_file)
24 |         return data
25 |     @staticmethod
26 |     def get_file_list(directory):
27 |         file_list = []
28 |         for file in os.listdir(directory):
29 |             file_list.append(file)
30 |         return file_list
31 |     @staticmethod
32 |     def num_tokens_from_string(string: str, encoding_name: str) -> int:
33 |         encoding = tiktoken.encoding_for_model(encoding_name)
34 |         num_tokens = len(encoding.encode(string))
35 |         return num_tokens
36 |     @staticmethod
37 |     def get_tokens_in_file(file, model_family):
38 |         tc = TokenCount(model_name=model_family)
39 |         tokens = tc.num_tokens_from_file(file)
40 |         return tokens
41 |     @staticmethod
42 |     def append_postfix(file):
43 |         datetime_string = datetime.today().strftime('%Y-%m-%d_%H_%M_%S')
44 |         return f"{file}_{datetime_string}"
45 |     @staticmethod
46 |     def clean_binary_string(data):
47 |         return data[2:-1].replace('\\n', '').replace('\\"', '"').replace('\\\\', '\\')
48 |     @staticmethod
49 |     def convert_to_json_from_binary_string(data):
50 |         # Remove the leading "b'" and trailing "'"
51 |         data_str = data[2:-1]
52 | 
53 |         # Replace escape sequences
54 |         data_str_clean = data_str.replace('\\n', '').replace('\\"', '"').replace('\\\\', '\\')
55 | 
56 |         # Convert the JSON string to a dictionary
57 |         data_dict = json.loads(data_str_clean)
58 |         return data_dict
59 |     @staticmethod
60 |     def get_file_extension(file_name):
61 |         file_name_split = file_name.split(".")
62 |         #No extension
63 |         extension = file_name
64 |         if len(file_name_split) > 1:
65 |             extension = file_name_split[-1]
66 |         return extension
67 |         
68 | 
69 | 
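
`clean_binary_string` and `convert_to_json_from_binary_string` work on the string form of a bytes object, slicing off `b'` and `'` and undoing escapes by hand. Where the raw bytes are still available, decoding them directly may be simpler. The sketch below assumes UTF-8 content with one JSON object per line; `parse_jsonl_bytes` is a hypothetical helper, not part of `Utils`.

```python
import json


def parse_jsonl_bytes(data: bytes):
    """Decode UTF-8 bytes and parse each non-empty line as a JSON object."""
    text = data.decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


raw = b'{"custom_id": "a", "ok": true}\n{"custom_id": "b", "ok": false}\n'
records = parse_jsonl_bytes(raw)
print(records[0]["custom_id"])
```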


--------------------------------------------------------------------------------
/media/batch_accel_overview_new.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/aoai-batch-api-accelerator/61315acbc5360f0ec0d6bbfa04693d9921ea9216/media/batch_accel_overview_new.png


--------------------------------------------------------------------------------
/media/overview.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/aoai-batch-api-accelerator/61315acbc5360f0ec0d6bbfa04693d9921ea9216/media/overview.pdf


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | azure-storage-file-datalake
2 | openai
3 | tiktoken
4 | requests
5 | token-count
6 | aiohttp


--------------------------------------------------------------------------------
/templates/AOAI_config_template.json:
--------------------------------------------------------------------------------
1 | {
2 |     "aoai_key": "<Azure OpenAI key>",
3 |     "aoai_api_version": "2024-07-01-preview",
4 |     "aoai_endpoint": "<Azure OpenAI endpoint>",
5 |     "aoai_deployment_name": "<Azure OpenAI deployment name>",
6 |     "batch_job_endpoint": "/chat/completions",
7 |     "completion_window": "24h"
8 | }


--------------------------------------------------------------------------------
/templates/app_config.json:
--------------------------------------------------------------------------------
1 | {
2 |     "AOAI_config": "<Local path to AOAI_config.json>",
3 |     "storage_config": "<Local path to storage_config.json>",
4 |     "local_download_path": "<Local path for file downloads if enabled>",
5 |     "batch_size":10,
6 |     "download_to_local":false,
7 |     "continuous_mode":true,
8 |     "count_tokens":false
9 | }


--------------------------------------------------------------------------------
/templates/batch_template.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "custom_id":"<UID>",
 3 |     "method":"POST",
 4 |     "url":"/chat/completions",
 5 |     "body":{
 6 |         "model":"<Deployment Name>",
 7 |         "messages":[
 8 |             {
 9 |                 "role":"system",
10 |                 "content":"<System Prompt>"
11 |             },
12 |             {
13 |                 "role":"user",
14 |                 "content":"<User Prompt>"
15 |             }
16 |         ]
17 |     }
18 | }
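
The template above describes a single request. A batch input file holds one such JSON object per line (JSONL). The following sketch generates those lines from a list of user prompts, assuming a random UUID is an acceptable `custom_id`. `build_batch_lines` and the sample values are hypothetical and only illustrate the shape of the file.

```python
import json
import uuid


def build_batch_lines(deployment_name, system_prompt, user_prompts):
    """Return JSONL text with one chat-completions request per user prompt."""
    lines = []
    for prompt in user_prompts:
        request = {
            "custom_id": str(uuid.uuid4()),  # assumption: any unique string
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": deployment_name,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines) + "\n"


jsonl_text = build_batch_lines(
    "my-deployment",  # placeholder deployment name
    "You are a helpful assistant.",
    ["Summarize document A.", "Summarize document B."],
)
print(jsonl_text)
```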


--------------------------------------------------------------------------------
/templates/storage_config.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "storage_account_name": "<Azure Data Lake Storage Account Name>",
 3 |     "storage_account_key": "<Azure Data Lake Storage Account Key>",
 4 |     "input_filesystem_system_name": "<Input Filesystem/Container System Name>",
 5 |     "processed_filesystem_system_name": "<Processed Filesystem/Container System Name>",
 6 |     "error_filesystem_system_name": "<Error Filesystem/Container System Name>",
 7 |     "input_directory": "/",
 8 |     "output_directory": "/",
 9 |     "error_directory": "/"
10 |     
11 | }


--------------------------------------------------------------------------------