├── .gitignore
├── CONTRIBUTE.md
├── README.md
├── docs
    ├── index.html
    ├── materials.html
    └── molecules.html
├── images
    ├── AiMat_logo_purple.png
    └── ChemMatData_logo_final.png
├── materials.json
└── molecules.json


/.gitignore:
--------------------------------------------------------------------------------
 1 | # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
 2 | 
 3 | # dependencies
 4 | website/node_modules
 5 | website/.pnp
 6 | .pnp.js
 7 | 
 8 | # testing
 9 | website/coverage
10 | 
11 | # production
12 | website/build
13 | 
14 | # misc
15 | .DS_Store
16 | .env.local
17 | .env.development.local
18 | .env.test.local
19 | .env.production.local
20 | 
21 | npm-debug.log*
22 | yarn-debug.log*
23 | yarn-error.log*
24 | 


--------------------------------------------------------------------------------
/CONTRIBUTE.md:
--------------------------------------------------------------------------------
 1 | ## Option 1: 
 2 | Please send a link to a missing dataset to Jana (jana.zeller@student.kit.edu) and Pascal (pascal.friederich@kit.edu).
 3 | Or, if you have a new dataset, please send it to us along with a short description.
 4 | 
 5 | ## Option 2: 
 6 | Click on the `branch` button and then click on the green `new branch` button.
 7 | Edit the `molecules.json` or `materials.json` file directly online on GitHub.
 8 | 
 9 | ## Option 3: 
10 | 1. Fork the project on GitHub, then clone your fork (`git clone https://github.com/<your-username>/ChemMatData.git`)
11 | 2. Create your Dataset / Feature Branch (`git checkout -b feature/AmazingDataset`)
12 | 3. Open the desired JSON file (molecules.json or materials.json) in a text editor.
13 | 4. Create a new JSON object that represents your dataset. Make sure to include all the relevant fields: Dataset Name, Domain, Task Type, Data Type, #Compounds, #Tasks, Short Description, Papers, and DownloadLink.
14 | 5. If the dataset has multiple values for Task Type, Data Type, or DownloadLink, separate each entry with a comma followed by a whitespace (", ").
15 | 6. For the Papers field, create an array of objects where each object represents a paper with the fields Name and Link.
16 | 
17 | Your JSON could now look like this:
18 | ```
19 | {
20 |     "Dataset Name": "QM9",
21 |     "Domain": "Quantum Mechanics",
22 |     "Short Description": "QM9 is a comprehensive dataset that provides geometric, energetic, electronic and thermodynamic properties for a subset of GDB-17 database, comprising 134 thousand stable organic molecules with up to nine heavy atoms. All molecules are modeled using density functional theory (B3LYP/6-31G(2df,p) based DFT)",
23 |     "#Tasks": 16,
24 |     "#Compounds": 134000,
25 |     "Task Type": "Regression",
26 |     "Data Type": "SMILES, 3D coordinates",
27 |     "DownloadLink": "http://quantum-machine.org/datasets/#:~:text=Available%20via-,figshare,-.",
28 |     "Papers" : [
29 |         {
30 |             "Name": "Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17",
31 |             "Link": "http://pubs.acs.org/doi/abs/10.1021/ci300415d"
32 |         },
33 |         {
34 |             "Name": "Quantum chemistry structures and properties of 134 kilo molecules",
35 |             "Link": "http://quantum-machine.org/datasets/#:~:text=A.%20von%20Lilienfeld%2C-,Quantum%20chemistry%20structures%20and%20properties%20of%20134%20kilo%20molecules,-%2C%20Scientific%20Data"
36 |         }
37 |     ]
38 | }
39 | ```
40 | 7. Add your new dataset object to the array, making sure to place a comma before it.
41 | 8. Commit your Changes (`git commit -m 'Add some AmazingDataset'`)
42 | 9. Push to the Branch (`git push origin feature/AmazingDataset`)
43 | 10. Open a Pull Request. You open a pull request by clicking on the branch icon in the start page and navigating to the branch you just added. In the yellow banner click the `Compare & pull request` button. Select the `main` branch as the branch you want to merge into. Write a short description about the dataset you added. Then click on `Create Pull Request`. See [this full detailed description](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) from GitHub on how to open pull requests.
44 | 
45 | 
46 | 
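Before opening a pull request, it can help to check a new entry against the fields listed in step 4. The following is a minimal sketch, not part of the repository: the helper name `validateDatasetEntry` is hypothetical, and the integer check on `#Compounds` and `#Tasks` is an assumption based on the QM9 example above.

```javascript
// Field names taken from step 4 of the contribution guide.
const REQUIRED_FIELDS = [
  "Dataset Name", "Domain", "Task Type", "Data Type",
  "#Compounds", "#Tasks", "Short Description", "Papers", "DownloadLink",
];

// Hypothetical helper (not part of the repository): returns a list of
// human-readable problems; an empty list means the entry looks well-formed.
function validateDatasetEntry(entry) {
  const problems = [];

  // Every field from step 4 must be present.
  for (const field of REQUIRED_FIELDS) {
    if (!(field in entry)) problems.push(`missing field: ${field}`);
  }

  // Assumption: counts are stored as plain integers, as in the QM9 example.
  for (const field of ["#Compounds", "#Tasks"]) {
    if (field in entry && !Number.isInteger(entry[field])) {
      problems.push(`${field} should be an integer`);
    }
  }

  // Step 6: Papers is an array of objects with the fields Name and Link.
  if ("Papers" in entry) {
    if (!Array.isArray(entry["Papers"])) {
      problems.push("Papers should be an array");
    } else {
      entry["Papers"].forEach((paper, i) => {
        for (const key of ["Name", "Link"]) {
          if (!paper || typeof paper[key] !== "string") {
            problems.push(`Papers[${i}] is missing ${key}`);
          }
        }
      });
    }
  }

  return problems;
}
```

Calling `validateDatasetEntry(entry)` on each object parsed from `molecules.json` or `materials.json` and printing any returned problems is one way to catch a missing field before the website tries to render it.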


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # ChemMatData
 2 | ## Global collection of molecular and materials datasets
 3 | Website at [aimat-lab.github.io/ChemMatData](https://aimat-lab.github.io/ChemMatData/index.html)
 4 | 
 5 | <img src="images/ChemMatData_logo_final.png" width="700">
 6 | 
 7 | ---
 8 | <!-- ABOUT THE PROJECT -->
 9 | ## About The Project
10 | Our main goal is to create a collaborative platform
11 | where we can gather and categorize various datasets,
12 | making them conveniently accessible in one place.
13 | We are actively collecting datasets for [molecules](https://github.com/aimat-lab/ChemMatData/blob/main/molecules.json) as well as [crystalline structures](https://github.com/aimat-lab/ChemMatData/blob/main/materials.json) to provide a comprehensive resource for researchers, scientists, and enthusiasts.
14 | 
15 | See the list of all [molecular datasets](https://github.com/aimat-lab/ChemMatData/blob/main/molecules.json).
16 | See the list of all [materials datasets](https://github.com/aimat-lab/ChemMatData/blob/main/materials.json).
17 | 
18 | You can sort and explore all available datasets using our [website](https://aimat-lab.github.io/ChemMatData/index.html).
19 | 
20 | 
21 | ---
22 | <!-- CONTRIBUTING -->
23 | ## Contributing
24 | If you have additional datasets that you believe should be included in our repository, we encourage you to [contribute](https://github.com/aimat-lab/ChemMatData/blob/main/CONTRIBUTE.md).
25 | Here's how you can do it:
26 | 1. Send a link to a missing dataset, or your own dataset with a short description, to Pascal (pascal.friederich@kit.edu).
27 | 2. Directly add your dataset to the table in your browser.
28 | 3. Clone and extend this repository ([detailed description here](https://github.com/aimat-lab/ChemMatData/blob/main/CONTRIBUTE.md))
29 | 
30 | We appreciate your contribution and look forward to incorporating your suggested datasets into our growing collection!
31 | 
32 | ---
33 | <!-- CONTRIBUTORS -->
34 | ## List of contributors
35 | 
36 | Send us pull requests or emails with new datasets if you want to see your name here!
37 | 
38 | ---
39 | <!-- CONTACT -->
40 | ## About Us
41 | An open-source project hosted by the [AiMat Group](https://aimat.iti.kit.edu/) at the [Karlsruhe Institute of Technology (KIT)](https://www.kit.edu/).
42 | The initial version of this project was developed by Jana Zeller and Pascal Friederich.
43 | 
44 | <a href="https://aimat.science"><img src="images/AiMat_logo_purple.png" width="300"></a>
45 | 
46 | 
47 | 
48 | 


--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
 1 | <!DOCTYPE html>
 2 | <html lang="en">
 3 | <head>
 4 |   <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
 5 |   <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-rbsA2VBKQhggwzxH7pPCaAqO46MgnOM80zW1RWuH61DGLwZJEdK2Kadq2F9CUG65" crossorigin="anonymous">
 6 | </head>
 7 | 
 8 | <body>
 9 |   <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-kenU1KFdBIe4zVF0s0G1M5b4hcpxyD9F7jL+jjXkk+Q2h455rYXK/7HAuoJl+0I4" crossorigin="anonymous"></script>
10 |   <script src="https://kit.fontawesome.com/1620f13295.js" crossorigin="anonymous"></script>
11 | 
12 |   <nav class="navbar navbar-expand-lg navbar-light bg-light" style="padding: 24px">
13 |     <a class="navbar-brand" href="./index.html">ChemMatData</a>
14 |     <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
15 |       <span class="navbar-toggler-icon"></span>
16 |     </button>
17 |   
18 |     <div class="collapse navbar-collapse" id="navbarSupportedContent">
19 |       <ul class="navbar-nav mr-auto">
20 |         <li class="nav-item active">
21 |           <a class="nav-link" href="index.html">Home</a>
22 |         </li>
23 |         <li class="nav-item active">
24 |           <a class="nav-link" href="molecules.html">Molecules</a>
25 |         </li>
26 |         <li class="nav-item active">
27 |           <a class="nav-link" href="materials.html">Materials</a>
28 |         </li>
29 |       </ul>
30 |     </div>
31 |   </nav>
32 |   
33 |   <div style="text-align: center;">
34 |     <img src="https://cdn.jsdelivr.net/gh/aimat-lab/ChemMatData@main/images/ChemMatData_logo_final.png" class="img-fluid" style="width: 400px; margin: 32px;" alt="ChemMatData logo">
35 |   </div>
36 | 
37 |   <div class="d-flex justify-content-center" style="margin-bottom: 32px;">
38 |     <a href="molecules.html">
39 |       <button type="button" class="btn btn-outline-dark" style="padding: 16px; margin: 16px">
40 |         <i class="fa-solid fa-atom fa-xl"></i>
41 |         <text>
42 |           Explore Molecules
43 |         </text>
44 |       </button>
45 |     </a>
46 | 
47 |     <a href="materials.html">
48 |       <button type="button" class="btn btn-outline-dark" style="padding: 16px; margin: 16px">
49 |         <i class="fa-solid fa-diagram-project fa-xl"></i>
50 |         <text>
51 |           Explore Materials
52 |         </text>
53 |       </button>
54 |     </a>
55 |     
56 |     <a href="https://github.com/aimat-lab/ChemMatData">
57 |       <button type="button" class="btn btn-outline-dark" style="padding: 16px; margin: 16px">
58 |         <i class="fa-brands fa-github fa-xl"></i>
59 |         <text>
60 |           Contribute on GitHub!
61 |         </text>
62 |       </button>
63 |     </a>
64 |   </div>
65 | 
66 |   <p style="text-align: justify; margin: 32px">
67 |     Our main goal is to create a collaborative platform
68 |     where we can gather and categorize various datasets,
69 |     making them conveniently accessible in one place.
70 |     We are actively collecting datasets for <a href="./molecules.html">molecules</a> as well as <a href="./materials.html">crystalline structures</a>
71 |     to provide a comprehensive resource for researchers, scientists, and enthusiasts.
72 |   </p>
73 | 
74 |   <div style="text-align: justify; margin: 32px">
75 |     <h3>Contributors</h3>
76 |     <p>Send us pull requests or emails with new datasets if you want to see your name here!</p>
77 |   </div>
78 | 
79 |   <div style="text-align: justify; margin: 32px">
80 |     <h3>About Us</h3>
81 |     <p>An open-source project hosted by the <a href="https://aimat.science">AiMat Group</a> at the <a href="https://www.kit.edu/">Karlsruhe Institute of Technology (KIT)</a>.</p>
82 |   </div>
83 | 
84 |   <div style="text-align: center;">
85 |     <a href="https://aimat.science">
86 |       <img class="img-fluid" style="width: 200px" src="https://cdn.jsdelivr.net/gh/aimat-lab/ChemMatData@main/images/AiMat_logo_purple.png" alt="AiMat logo">
87 |     </a>
88 |   </div>
89 | 
90 |   <footer class="py-3 my-4">
91 |     <ul class="nav justify-content-center border-bottom pb-3 mb-3">
92 |       <li class="nav-item"><a href="https://aimat.science" class="nav-link px-2 text-muted">AiMat</a></li>
93 |       <li class="nav-item"><a href="https://aimat.iti.kit.edu/legals.php" class="nav-link px-2 text-muted">Legals</a></li>
94 |       <li class="nav-item"><a href="https://aimat.iti.kit.edu/datenschutz.php" class="nav-link px-2 text-muted">Privacy Policy</a></li>
95 |     </ul>
96 |     <p class="text-center text-muted">© 2023 AiMAT</p>
97 |   </footer>
98 | </body>


--------------------------------------------------------------------------------
/docs/materials.html:
--------------------------------------------------------------------------------
 1 | <!DOCTYPE html>
 2 | <html lang="en">
 3 | <head>
 4 |     <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
 5 |   
 6 |     <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-rbsA2VBKQhggwzxH7pPCaAqO46MgnOM80zW1RWuH61DGLwZJEdK2Kadq2F9CUG65" crossorigin="anonymous">
 7 | </head>
 8 | 
 9 | <body>
10 |     <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-kenU1KFdBIe4zVF0s0G1M5b4hcpxyD9F7jL+jjXkk+Q2h455rYXK/7HAuoJl+0I4" crossorigin="anonymous"></script>
11 | 
12 |     <nav class="navbar navbar-expand-lg navbar-light bg-light" style="padding: 24px">
13 |         <a class="navbar-brand" href="./index.html">ChemMatData</a>
14 |         <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
15 |           <span class="navbar-toggler-icon"></span>
16 |         </button>
17 |       
18 |         <div class="collapse navbar-collapse" id="navbarSupportedContent">
19 |           <ul class="navbar-nav mr-auto">
20 |             <li class="nav-item active">
21 |               <a class="nav-link" href="index.html">Home</a>
22 |             </li>
23 |             <li class="nav-item active">
24 |               <a class="nav-link" href="molecules.html">Molecules</a>
25 |             </li>
26 |             <li class="nav-item active">
27 |               <a class="nav-link" href="materials.html">Materials</a>
28 |             </li>
29 |           </ul>
30 |         </div>
31 |       </nav>
32 | 
33 |     <h2 class="text-center">Materials</h2>
34 |     <h2 class="text-center">Coming Soon!</h2>
35 |   
36 |     <footer class="py-3 my-4">
37 |       <ul class="nav justify-content-center border-bottom pb-3 mb-3">
38 |         <li class="nav-item"><a href="https://aimat.science" class="nav-link px-2 text-muted">AiMat</a></li>
39 |         <li class="nav-item"><a href="https://aimat.iti.kit.edu/legals.php" class="nav-link px-2 text-muted">Legals</a></li>
40 |         <li class="nav-item"><a href="https://aimat.iti.kit.edu/datenschutz.php" class="nav-link px-2 text-muted">Privacy Policy</a></li>
41 |       </ul>
42 |       <p class="text-center text-muted">© 2023 AiMAT</p>
43 |     </footer>
44 | </body>


--------------------------------------------------------------------------------
/docs/molecules.html:
--------------------------------------------------------------------------------
  1 | <!DOCTYPE html>
  2 | <html lang="en">
  3 | <head>
  4 |     <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
  5 |   
  6 |     <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-rbsA2VBKQhggwzxH7pPCaAqO46MgnOM80zW1RWuH61DGLwZJEdK2Kadq2F9CUG65" crossorigin="anonymous">
  7 |     
  8 |     <link href="https://cdn.jsdelivr.net/gh/tofsjonas/sortable@latest/sortable.min.css" rel="stylesheet" />
  9 |     <script src="https://cdn.jsdelivr.net/gh/tofsjonas/sortable@latest/sortable.min.js"></script>
 10 |   
 11 |     <script>
 12 |       function mapToBadges(string, mapping) {
 13 |         var cell = $("<div>")
 14 |         var labels = string.split(", ");
 15 |         for (const label of labels) {
 16 |           var color = 'secondary';
 17 |           if (label in mapping) {
 18 |             color = mapping[label];
 19 |           }
 20 |           var badge = $("<span>").addClass("badge bg-" + color).text(label);
 21 |           cell.append(badge);
 22 |         }
 23 |         return cell;
 24 |       }
 25 |   
 26 |       $(document).ready(function() {
 27 |         $.getJSON("https://cdn.jsdelivr.net/gh/aimat-lab/ChemMatData@latest/molecules.json", function(data) {
 28 |           var tableBody = $("#table-body");
 29 |   
 30 |           data.forEach(function(item, id) {
 31 |             var row = $("<tr>");
 32 |             row.append($("<td value=\"Name\">").text(item["Dataset Name"]));
 33 |             
 34 |             // Badge color per domain; unknown domains fall back to 'secondary'
 35 |             var domainCell = $("<td>");
 36 |             const domainColorMap = {
 37 |               'Quantum Mechanics': 'primary',
 38 |               'Physical Chemistry': 'success',
 39 |               'Biophysics': 'dark',
 40 |               'Physiology': 'info'
 41 |             };
 42 |             domainCell.append(mapToBadges(item['Domain'], domainColorMap));
 43 |             row.append(domainCell);
 44 |   
 45 |             row.append($("<td value=\"#Tasks\">").text(item["#Tasks"]));
 46 |             row.append($("<td value=\"#Compounds\">").text(item["#Compounds"]));
 47 |             
 48 |             var taskTypeCell = $("<td>");
 49 |             const taskColorMap = {
 50 |               'Regression': 'primary',
 51 |               'Classification': 'warning',
 52 |               'Rank': 'success'
 53 |             };
 54 |             taskTypeCell.append(mapToBadges(item['Task Type'], taskColorMap));
 55 |             row.append(taskTypeCell);
 56 |             
 57 |             var dataTypeCell = $("<td>");
 58 |             const dataTypeColorMap = {
 59 |               'molecular graph': 'primary',
 60 |               '3D coordinates': 'warning',
 61 |               'SMILES': 'success'
 62 |             }
 63 |             dataTypeCell.append(mapToBadges(item['Data Type'], dataTypeColorMap));
 64 |             row.append(dataTypeCell);
 65 |   
 66 |             var infoButtonCell = $("<td>");
 67 |             var infoButton = $("<button class=\"btn btn-primary\" type=\"button\" data-bs-toggle=\"modal\">");
 68 |             infoButton.attr("data-bs-target", "#modal-" + id);
 69 |             infoButton.text("More Info");
 70 |             infoButtonCell.append(infoButton);
 71 |             row.append(infoButtonCell);
 72 |   
 73 |             var modal = $("<div class=\"modal fade\">");
 74 |             modal.attr("id", "modal-" + id);
 75 |             modal.attr("tabindex", "-1");
 76 |             modal.attr("role", "dialog");
 77 |             modal.attr("aria-labelledby", "modal-" + id + "-label");
 78 |             var modalDialog = $("<div class=\"modal-dialog\" role=\"document\">");
 79 |             var modalContent = $("<div class=\"modal-content\">");
 80 |             var modalHeader = $("<div class=\"modal-header\">");
 81 |             var modalTitle = $("<h5 class=\"modal-title\">");
 82 |             modalTitle.attr("id", "modal-" + id + "-label");
 83 |             modalTitle.text(item["Dataset Name"]);
 84 |   
 85 |             var modalBody = $("<div class=\"modal-body\">");
 86 |             var description = $("<p>").text(item["Short Description"])
 87 |             modalBody.append(description);
 88 |   
 89 |             var referencesCell = $("<div>").text("References:");
 90 |             var referencesList = $("<ul>");
 91 |             for (const paper of item['Papers'] || []) {
 92 |               var listItem = $("<li>");
 93 |               var reference = $("<a>").attr("href", paper.Link).text(paper.Name);
 94 |               listItem.append(reference);
 95 |               referencesList.append(listItem);
 96 |             }
 97 |             referencesCell.append(referencesList);
 98 |             modalBody.append(referencesCell);
 99 |   
100 |             var linkCell = $("<p>").text("Download here: ");
101 |             var links = item["DownloadLink"].split(", ")
102 |             for (const link of links) {
103 |               var text = "Download"
104 |               try {
105 |                 text = new URL(link).hostname
106 |               } catch (error) {
107 |                 console.log(error)
108 |               }
109 |               var linkATag = $("<a>").attr("href", link).text(text);
110 |               linkCell.append(linkATag);
111 |               linkCell.append(" ");
112 |             }
113 |             modalBody.append(linkCell);
114 |   
115 |             var modalFooter = $("<div class=\"modal-footer\">");
116 |             var closeButton = $("<button type=\"button\" class=\"btn btn-secondary\" data-bs-dismiss=\"modal\">Close</button>");
117 |             modalHeader.append(modalTitle);
118 |             modalContent.append(modalHeader);
119 |             modalContent.append(modalBody);
120 |             modalContent.append(modalFooter);
121 |             modalFooter.append(closeButton);
122 |             modalDialog.append(modalContent);
123 |             modal.append(modalDialog);
124 |             row.append($("<td>").append(modal));
125 |   
126 |             tableBody.append(row);
127 |           });
128 |   
129 |           // Initialize the modal component
130 |           $(".modal").modal();
131 |         });
132 |       });
133 |     </script>
134 | </head>
135 | 
136 | <body>
137 |     <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-kenU1KFdBIe4zVF0s0G1M5b4hcpxyD9F7jL+jjXkk+Q2h455rYXK/7HAuoJl+0I4" crossorigin="anonymous"></script>
138 | 
139 |     <nav class="navbar navbar-expand-lg navbar-light bg-light" style="padding: 24px">
140 |         <a class="navbar-brand" href="./index.html">ChemMatData</a>
141 |         <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
142 |           <span class="navbar-toggler-icon"></span>
143 |         </button>
144 |       
145 |         <div class="collapse navbar-collapse" id="navbarSupportedContent">
146 |           <ul class="navbar-nav mr-auto">
147 |             <li class="nav-item active">
148 |               <a class="nav-link" href="index.html">Home</a>
149 |             </li>
150 |             <li class="nav-item active">
151 |               <a class="nav-link" href="molecules.html">Molecules</a>
152 |             </li>
153 |             <li class="nav-item active">
154 |               <a class="nav-link" href="materials.html">Materials</a>
155 |             </li>
156 |           </ul>
157 |         </div>
158 |       </nav>
159 | 
160 |     <h2 class="text-center">Molecules</h2>
161 |     <table class="table sortable">
162 |       <thead>
163 |         <tr>
164 |           <th data-field="Dataset Name">Dataset Name</th>
165 |           <th data-field="Domain">Domain</th>
166 |           <!-- <th data-field="Short Description">Short Description</th> -->
167 |           <th data-field="#Tasks">#Tasks</th>
168 |           <th data-field="#Compounds">#Compounds</th>
169 |           <th data-field="Task Type">Task Type</th>
170 |           <th data-field="Data Type">Data Type</th>
171 |           <th data-field="More Info">More Info</th>
172 |         </tr>
173 |       </thead>
174 |       <tbody id="table-body">
175 |       </tbody>
176 |     </table>
177 |   
178 |     <footer class="py-3 my-4">
179 |       <ul class="nav justify-content-center border-bottom pb-3 mb-3">
180 |         <li class="nav-item"><a href="https://aimat.science" class="nav-link px-2 text-muted">AiMat</a></li>
181 |         <li class="nav-item"><a href="https://aimat.iti.kit.edu/legals.php" class="nav-link px-2 text-muted">Legals</a></li>
182 |         <li class="nav-item"><a href="https://aimat.iti.kit.edu/datenschutz.php" class="nav-link px-2 text-muted">Privacy Policy</a></li>
183 |       </ul>
184 |       <p class="text-center text-muted">© 2023 AiMAT</p>
185 |     </footer>
186 | </body>


--------------------------------------------------------------------------------
/images/AiMat_logo_purple.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aimat-lab/ChemMatData/c99907247ff053318e7479dcf6e6e6d9e2198a72/images/AiMat_logo_purple.png


--------------------------------------------------------------------------------
/images/ChemMatData_logo_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aimat-lab/ChemMatData/c99907247ff053318e7479dcf6e6e6d9e2198a72/images/ChemMatData_logo_final.png


--------------------------------------------------------------------------------
/materials.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aimat-lab/ChemMatData/c99907247ff053318e7479dcf6e6e6d9e2198a72/materials.json


--------------------------------------------------------------------------------
/molecules.json:
--------------------------------------------------------------------------------
  1 | [
  2 |     {
  3 |         "Dataset Name": "QM9",
  4 |         "Domain": "Quantum Mechanics",
  5 |         "Short Description": "QM9 is a comprehensive dataset that provides geometric, energetic, electronic and thermodynamic properties for a subset of GDB-17 database, comprising 134 thousand stable organic molecules with up to nine heavy atoms. All molecules are modeled using density functional theory (B3LYP/6-31G(2df,p) based DFT)",
  6 |         "#Tasks": 16,
  7 |         "#Compounds": 134000,
  8 |         "Task Type": "Regression",
  9 |         "Data Type": "SMILES, 3D coordinates",
 10 |         "DownloadLink": "http://quantum-machine.org/datasets/#:~:text=Available%20via-,figshare,-.",
 11 |         "Papers" : [
 12 |             {
 13 |                 "Name": "Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17",
 14 |                 "Link": "http://pubs.acs.org/doi/abs/10.1021/ci300415d"
 15 |             },
 16 |             {
 17 |                 "Name": "Quantum chemistry structures and properties of 134 kilo molecules",
 18 |                 "Link": "http://quantum-machine.org/datasets/#:~:text=A.%20von%20Lilienfeld%2C-,Quantum%20chemistry%20structures%20and%20properties%20of%20134%20kilo%20molecules,-%2C%20Scientific%20Data"
 19 |             }
 20 |         ]
 21 |     },
 22 |     {
 23 |         "Dataset Name": "PCQM4Mv2",
 24 |         "Domain": "Quantum Mechanics",
 25 |         "Short Description": "Based on the PubChemQC, we define a meaningful ML task of predicting DFT-calculated HOMO-LUMO energy gap of molecules given their 2D molecular graphs. The HOMO-LUMO gap is one of the most practically-relevant quantum chemical properties of molecules since it is related to reactivity, photoexcitation, and charge transport.",
 26 |         "#Tasks": 1,
 27 |         "#Compounds": 3378606,
 28 |         "Task Type": "Regression",
 29 |         "Data Type": "SMILES",
 30 |         "DownloadLink": "https://ogb.stanford.edu/docs/lsc/pcqm4mv2/#dataset",
 31 |         "Papers": []
 32 |     },
 33 |     {
 34 |         "Dataset Name": "Alchemy",
 35 |         "Domain": "Quantum Mechanics",
 36 |         "Short Description": "The dataset comprises 12 quantum mechanical properties of 119,487 organic molecules with up to 14 heavy atoms, sampled from the GDB MedChem database.",
 37 |         "#Tasks": 12,
 38 |         "#Compounds": 202579,
 39 |         "Task Type": "Regression",
 40 |         "Data Type": "SMILES, 3D coordinates",
 41 |         "DownloadLink": "https://chrsmrrs.github.io/datasets/docs/datasets/",
 42 |         "Papers": [
 43 |             {
 44 |                 "Name": "Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models",
 45 |                 "Link": "https://arxiv.org/pdf/1906.09427.pdf"
 46 |             }
 47 |         ]
 48 |     },
 49 |     {
 50 |         "Dataset Name": "BACE",
 51 |         "Domain": "Biophysics",
 52 |         "Short Description": "The BACE dataset provides quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors of human β-secretase 1 (BACE-1)",
 53 |         "#Tasks": 2,
 54 |         "#Compounds": 1522,
 55 |         "Task Type": "Regression, Classification",
 56 |         "Data Type": "SMILES",
 57 |         "DownloadLink": "https://moleculenet.org/datasets-1",
 58 |         "Papers": []
 59 |     },
 60 |     {
 61 |         "Dataset Name": "Freesolv",
 62 |         "Domain": "Physical Chemistry",
 63 |         "Short Description": "A collection of experimental and calculated hydration free energies for small molecules in water. The calculated values are derived from alchemical free energy calculations using molecular dynamics simulations.",
 64 |         "#Tasks": 1,
 65 |         "#Compounds": 643,
 66 |         "Task Type": "Regression",
 67 |         "Data Type": "SMILES",
 68 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=organic%20small%20molecules.-,FreeSolv,-%3A%20Experimental%20and%20calculated, https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=4-,FreeSolv,-Solvation%20free%20energy",
 69 |         "Papers": [
 70 |             {
 71 |                 "Name": "FreeSolv: a database of experimental and calculated hydration free energies, with input files",
 72 |                 "Link": "https://pubmed.ncbi.nlm.nih.gov/24928188/"
 73 |             }
 74 |         ]
 75 |     },
 76 |     {
 77 |         "Dataset Name": "ESOL (delaney)",
 78 |         "Domain": "Physical Chemistry",
 79 |         "Short Description": "Water solubility data (log solubility in mols per litre) for common organic small molecules.",
 80 |         "#Tasks": 1,
 81 |         "#Compounds": 1128,
 82 |         "Task Type": "Regression",
 83 |         "Data Type": "SMILES",
 84 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=modelled%20small%20molecules.-,ESOL,-%3A%20Water%20solubility%20data, https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=5-,ESOL,-ESOL%20(delaney)%20is",
 85 |         "Papers": [
 86 |             {
 87 |                 "Name": "ESOL: Estimating Aqueous Solubility Directly from Molecular Structure",
 88 |                 "Link": "https://pubs.acs.org/doi/10.1021/ci034243x"
 89 |             }
 90 |         ]
 91 |     },
 92 |     {
 93 |         "Dataset Name": "Lipophilicity",
 94 |         "Domain": "Physical Chemistry",
 95 |         "Short Description": "Experimental results of octanol/water distribution coefficient (logD at pH 7.4).",
 96 |         "#Tasks": 1,
 97 |         "#Compounds": 4200,
 98 |         "Task Type": "Regression",
 99 |         "Data Type": "SMILES",
100 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=molecules%20in%20water.-,Lipophilicity,-%3A%20Experimental%20results%20of, https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=3%2C%205-,Lipophilicity,-SMILES%20strings%20are",
101 |         "Papers": []
102 |     },
103 |     {
104 |         "Dataset Name": "MUV",
105 |         "Domain": "Biophysics",
106 |         "Short Description": "Subset of PubChem BioAssay by applying a refined nearest neighbor analysis, designed for validation of virtual screening techniques.",
107 |         "#Tasks": 17,
108 |         "#Compounds": 93087,
109 |         "Task Type": "Classification",
110 |         "Data Type": "SMILES",
111 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=high%2Dthroughput%20screening.-,MUV,-%3A%20Subset%20of%20PubChem",
112 |         "Papers": [
113 |             {
114 |                 "Name": "MoleculeNet: A Benchmark for Molecular Machine Learning",
115 |                 "Link": "https://arxiv.org/abs/1703.00564"
116 |             }
117 |         ]
118 |     },
119 |     {
120 |         "Dataset Name": "HIV",
121 |         "Domain": "Biophysics",
122 |         "Short Description": "Experimentally measured abilities to inhibit HIV replication.",
123 |         "#Tasks": 1,
124 |         "#Compounds": 41127,
125 |         "Task Type": "Classification",
126 |         "Data Type": "SMILES",
127 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=virtual%20screening%20techniques.-,HIV,-%3A%20Experimentally%20measured%20abilities",
128 |         "Papers": [
129 |             {
130 |                 "Name": "MoleculeNet: a benchmark for molecular machine learning",
131 |                 "Link": "https://pubs.rsc.org/en/content/articlehtml/2018/sc/c7sc02664a"
132 |             }
133 |         ]
134 |     },
135 |     {
136 |         "Dataset Name": "AIDS",
137 |         "Domain": "Biophysics",
138 |         "Short Description": "The DTP AIDS Antiviral Screen has checked tens of thousands of compounds for evidence of anti-HIV activity. Available are screening results and chemical structural data on compounds that are not covered by a confidentiality agreement.",
139 |         "#Tasks": 2,
140 |         "#Compounds": 2000,
141 |         "Task Type": "Classification",
142 |         "Data Type": "molecular graph",
143 |         "DownloadLink": "https://chrsmrrs.github.io/datasets/docs/datasets/#:~:text=%E2%80%93-,AIDS,-alchemy_full",
144 |         "Papers": [
145 |             {
146 |                 "Name": "IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning",
147 |                 "Link": "https://link.springer.com/chapter/10.1007/978-3-540-89689-0_33"
148 |             },
149 |             {
150 |                 "Name": "AIDS Antiviral Screen Data (2004)",
151 |                 "Link": "https://wiki.nci.nih.gov/display/NCIDTPdata/AIDS+Antiviral+Screen+Data"
152 |             }
153 |         ]
154 |     },
155 |     {
156 |         "Dataset Name": "PDBbind",
157 |         "Domain": "Biophysics",
158 |         "Short Description": "Binding affinities for biomolecular complexes; structures of both proteins and ligands are provided.",
159 |         "#Tasks": 1,
160 |         "#Compounds": 11908,
161 |         "Task Type": "Regression",
162 |         "Data Type": "SMILES, 3D coordinates",
163 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=inhibit%20HIV%20replication.-,PDBbind,-%3A%20Binding%20affinities%20for",
164 |         "Papers": [
165 |             {
166 |                 "Name": "Comparative assessment of scoring functions on a diverse test set",
167 |                 "Link": "https://pubmed.ncbi.nlm.nih.gov/19358517/"
168 |             }
169 |         ]
170 |     },
171 |     {
172 |         "Dataset Name": "BBBP",
173 |         "Domain": "Physiology",
174 |         "Short Description": "Binary labels of blood-brain barrier penetration (permeability).",
175 |         "#Tasks": 1,
176 |         "#Compounds": 2039,
177 |         "Task Type": "Classification",
178 |         "Data Type": "SMILES",
179 |         "DownloadLink": "https://moleculenet.org/datasets-1, https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=5-,BBBP,-Blood%E2%80%93brain%20barrier",
180 |         "Papers": [
181 |             {
182 |                 "Name": "MoleculeNet: a benchmark for molecular machine learning",
183 |                 "Link": "https://pubs.rsc.org/en/content/articlehtml/2018/sc/c7sc02664a"
184 |             }
185 |         ]
186 |     },
187 |     {
188 |         "Dataset Name": "Tox21",
189 |         "Domain": "Physiology",
190 |         "Short Description": "Qualitative toxicity measurements on 12 biological targets, including nuclear receptors and stress response pathways.",
191 |         "#Tasks": 12,
192 |         "#Compounds": 7831,
193 |         "Task Type": "Classification",
194 |         "Data Type": "SMILES",
195 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=barrier%20penetration(permeability).-,Tox21,-%3A%20Qualitative%20toxicity%20measurements",
196 |         "Papers": [
197 |             {
198 |                 "Name": "MoleculeNet: a benchmark for molecular machine learning",
199 |                 "Link": "https://pubs.rsc.org/en/content/articlehtml/2018/sc/c7sc02664a"
200 |             }
201 |         ]
202 |     },
203 |     {
204 |         "Dataset Name": "ToxCast",
205 |         "Domain": "Physiology",
206 |         "Short Description": "Toxicology data for a large library of compounds based on in vitro high-throughput screening, including experiments on over 600 tasks.",
207 |         "#Tasks": 617,
208 |         "#Compounds": 8575,
209 |         "Task Type": "Classification",
210 |         "Data Type": "SMILES",
211 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=stress%20response%20pathways.-,ToxCast,-%3A%20Toxicology%20data%20for",
212 |         "Papers": [
213 |             {
214 |                 "Name": "MoleculeNet: a benchmark for molecular machine learning",
215 |                 "Link": "https://pubs.rsc.org/en/content/articlehtml/2018/sc/c7sc02664a"
216 |             }
217 |         ]
218 |     },
219 |     {
220 |         "Dataset Name": "SIDER",
221 |         "Domain": "Physiology",
222 |         "Short Description": "Database of marketed drugs and adverse drug reactions (ADR), grouped into 27 system organ classes.",
223 |         "#Tasks": 27,
224 |         "#Compounds": 1427,
225 |         "Task Type": "Classification",
226 |         "Data Type": "SMILES",
227 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=over%20600%20tasks.-,SIDER,-%3A%20Database%20of%20marketed",
228 |         "Papers": [
229 |             {
230 |                 "Name": "MoleculeNet: a benchmark for molecular machine learning",
231 |                 "Link": "https://pubs.rsc.org/en/content/articlehtml/2018/sc/c7sc02664a"
232 |             }
233 |         ]
234 |     },
235 |     {
236 |         "Dataset Name": "ClinTox",
237 |         "Domain": "Physiology",
238 |         "Short Description": "Qualitative data of drugs approved by the FDA and those that have failed clinical trials for toxicity reasons.",
239 |         "#Tasks": 2,
240 |         "#Compounds": 1478,
241 |         "Task Type": "Classification",
242 |         "Data Type": "SMILES",
243 |         "DownloadLink": "https://moleculenet.org/datasets-1#:~:text=system%20organ%20classes.-,ClinTox,-%3A%20Qualitative%20data%20of, https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=5-,ClinTox,-The%20ClinTox%20dataset",
244 |         "Papers": [
245 |             {
246 |                 "Name": "MoleculeNet: a benchmark for molecular machine learning",
247 |                 "Link": "https://pubs.rsc.org/en/content/articlehtml/2018/sc/c7sc02664a"
248 |             }
249 |         ]
250 |     },
251 |     {
252 |         "Dataset Name": "Quantitative toxicity - LD50",
253 |         "Domain": "Physiology",
254 |         "Short Description": "The oral rat LD50 dataset (LD50).",
255 |         "#Tasks": 1,
256 |         "#Compounds": 7413,
257 |         "Task Type": "Regression",
258 |         "Data Type": "SMILES",
259 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=1-,Quantitative%20toxicity,-LD50",
260 |         "Papers": [
261 |             {
262 |                 "Name": "Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks",
263 |                 "Link": "https://users.math.msu.edu/users/weig/paper/p222.pdf"
264 |             },
265 |             {
266 |                 "Name": "Algebraic graph-assisted bidirectional transformers for molecular property prediction",
267 |                 "Link": "https://www.nature.com/articles/s41467-021-23720-w.pdf"
268 |             }
269 |         ]
270 |     },
271 |     {
272 |         "Dataset Name": "Quantitative toxicity - IGC50",
273 |         "Domain": "Physiology",
274 |         "Short Description": "Tetrahymena pyriformis IGC50 dataset (IGC50).",
275 |         "#Tasks": 1,
276 |         "#Compounds": 1792,
277 |         "Task Type": "Regression",
278 |         "Data Type": "SMILES",
279 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=1-,Quantitative%20toxicity,-LD50",
280 |         "Papers": [
281 |             {
282 |                 "Name": "Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks",
283 |                 "Link": "https://users.math.msu.edu/users/weig/paper/p222.pdf"
284 |             },
285 |             {
286 |                 "Name": "Algebraic graph-assisted bidirectional transformers for molecular property prediction",
287 |                 "Link": "https://www.nature.com/articles/s41467-021-23720-w.pdf"
288 |             }
289 |         ]
290 |     },
291 |     {
292 |         "Dataset Name": "Quantitative toxicity - LC50",
293 |         "Domain": "Physiology",
294 |         "Short Description": "96 h fathead minnow LC50 dataset.",
295 |         "#Tasks": 1,
296 |         "#Compounds": 813,
297 |         "Task Type": "Regression",
298 |         "Data Type": "SMILES",
299 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=1-,Quantitative%20toxicity,-LD50",
300 |         "Papers": [
301 |             {
302 |                 "Name": "Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks",
303 |                 "Link": "https://users.math.msu.edu/users/weig/paper/p222.pdf"
304 |             },
305 |             {
306 |                 "Name": "Algebraic graph-assisted bidirectional transformers for molecular property prediction",
307 |                 "Link": "https://www.nature.com/articles/s41467-021-23720-w.pdf"
308 |             }
309 |         ]
310 |     },
311 |     {
312 |         "Dataset Name": "Quantitative toxicity - LC50DM",
313 |         "Domain": "Physiology",
314 |         "Short Description": "48 h Daphnia magna LC50 dataset (LC50DM).",
315 |         "#Tasks": 1,
316 |         "#Compounds": 353,
317 |         "Task Type": "Regression",
318 |         "Data Type": "SMILES",
319 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=1-,Quantitative%20toxicity,-LD50",
320 |         "Papers": [
321 |             {
322 |                 "Name": "Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks",
323 |                 "Link": "https://users.math.msu.edu/users/weig/paper/p222.pdf"
324 |             },
325 |             {
326 |                 "Name": "Algebraic graph-assisted bidirectional transformers for molecular property prediction",
327 |                 "Link": "https://www.nature.com/articles/s41467-021-23720-w.pdf"
328 |             }
329 |         ]
330 |     },
331 |     {
332 |         "Dataset Name": "beet",
333 |         "Domain": "Physiology",
334 |         "Short Description": "The toxicity in honey bees (beet) dataset was extracted from a study on the prediction of acute contact toxicity of pesticides in honeybees. The dataset contains 254 compounds with their experimental values.",
335 |         "#Tasks": 2,
336 |         "#Compounds": 254,
337 |         "Task Type": "Classification",
338 |         "Data Type": "SMILES",
339 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=5-,beet,-The%20toxicity%20in",
340 |         "Papers": [
341 |             {
342 |                 "Name": "Extracting Predictive Representations from Hundreds of Millions of Molecules",
343 |                 "Link": "https://pubs.acs.org/doi/pdf/10.1021/acs.jpclett.1c03058"
344 |             }
345 |         ]
346 |     },
347 |     {
348 |         "Dataset Name": "logP",
349 |         "Domain": "",
350 |         "Short Description": "Partition coefficient datasets, including training set (8199 compounds), Food and Drug Administration (FDA) set, Star, and Nonstar set.",
351 |         "#Tasks": 3,
352 |         "#Compounds": "8199(train), 406(test-FDA), 223(test-Star), 43(test-Nonstar)",
353 |         "Task Type": "Regression",
354 |         "Data Type": "SMILES, 3D coordinates",
355 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/3D/#:~:text=Reference-,logP,-Partition%20coefficient%20datasets",
356 |         "Papers": [
357 |             {
358 |                 "Name": "Algebraic graph-assisted bidirectional transformers for molecular property prediction",
359 |                 "Link": "https://www.nature.com/articles/s41467-021-23720-w.pdf"
360 |             },
361 |             {
362 |                 "Name": "TopP–S: Persistent Homology-Based Multi-Task Deep Neural Networks for Simultaneous Predictions of Partition Coefficient and Aqueous Solubility",
363 |                 "Link": "https://users.math.msu.edu/users/weig/paper/p223.pdf"
364 |             }
365 |         ]
366 |     },
367 |     {
368 |         "Dataset Name": "logS(1)",
369 |         "Domain": "",
370 |         "Short Description": "Small aqueous solubility datasets.",
371 |         "#Tasks": 2,
372 |         "#Compounds": 1431,
373 |         "Task Type": "Regression",
374 |         "Data Type": "SMILES, 3D coordinates",
375 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=1%2C%203-,logS(1),-A%20diverse%20dataset",
376 |         "Papers": [
377 |             {
378 |                 "Name": "TopP–S: Persistent Homology-Based Multi-Task Deep Neural Networks for Simultaneous Predictions of Partition Coefficient and Aqueous Solubility",
379 |                 "Link": "https://users.math.msu.edu/users/weig/paper/p223.pdf"
380 |             }
381 |         ]
382 |     },
383 |     {
384 |         "Dataset Name": "DPP4",
385 |         "Domain": "",
386 |         "Short Description": "The DPP-4 inhibitors (DPP4) dataset was extracted from ChEMBL for the DPP-4 target. The data were processed by removing salts, normalizing molecular structures, and checking for duplicate molecules, leaving 3933 molecules.",
387 |         "#Tasks": 1,
388 |         "#Compounds": 3933,
389 |         "Task Type": "Regression",
390 |         "Data Type": "SMILES",
391 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=3%2C%205-,DPP4,-DPP%2D4%20inhibitors",
392 |         "Papers": [
393 |             {
394 |                 "Name": "Extracting Predictive Representations from Hundreds of Millions of Molecules",
395 |                 "Link": "https://pubs.acs.org/doi/pdf/10.1021/acs.jpclett.1c03058"
396 |             }
397 |         ]
398 |     },
399 |     {
400 |         "Dataset Name": "Ames",
401 |         "Domain": "",
402 |         "Short Description": "Ames mutagenicity. The dataset includes 6512 compounds and corresponding binary labels from Ames Mutagenicity results.",
403 |         "#Tasks": 1,
404 |         "#Compounds": 6512,
405 |         "Task Type": "Classification",
406 |         "Data Type": "SMILES",
407 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=3%2C%205-,Ames,-Ames%20mutagenicity.%20The",
408 |         "Papers": [
409 |             {
410 |                 "Name": "Extracting Predictive Representations from Hundreds of Millions of Molecules",
411 |                 "Link": "https://pubs.acs.org/doi/pdf/10.1021/acs.jpclett.1c03058"
412 |             }
413 |         ]
414 |     },
415 |     {
416 |         "Dataset Name": "DUD",
417 |         "Domain": "",
418 |         "Short Description": "A Directory of Useful Decoys (DUD).",
419 |         "#Tasks": 21,
420 |         "#Compounds": "between 31 and 365 actives and 1,344 and 15,560 decoys depending on target",
421 |         "Task Type": "Rank",
422 |         "Data Type": "SMILES",
423 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#:~:text=5-,DUD,-A%20Directory%20of",
424 |         "Papers": [
425 |             {
426 |                 "Name": "Extracting Predictive Representations from Hundreds of Millions of Molecules",
427 |                 "Link": "https://pubs.acs.org/doi/pdf/10.1021/acs.jpclett.1c03058"
428 |             }
429 |         ]
430 |     },
431 |     {
432 |         "Dataset Name": "MUV",
433 |         "Domain": "",
434 |         "Short Description": "Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening.",
435 |         "#Tasks": 17,
436 |         "#Compounds": "30 actives and 1,500 decoys per target",
437 |         "Task Type": "Rank",
438 |         "Data Type": "SMILES",
439 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref5:~:text=5-,MUV,-Maximum%20Unbiased%20Validation",
440 |         "Papers": [
441 |             {
442 |                 "Name": "Extracting Predictive Representations from Hundreds of Millions of Molecules",
443 |                 "Link": "https://pubs.acs.org/doi/pdf/10.1021/acs.jpclett.1c03058"
444 |             }
445 |         ]
446 |     },
447 |     {
448 |         "Dataset Name": "Cocaine addiction datasets",
449 |         "Domain": "",
450 |         "Short Description": "The 36 cocaine-addiction related datasets were collected from the ChEMBL database (https://www.ebi.ac.uk/chembl/) and the literature (references 1 and 2 in the README file), and involve 32 cocaine-addiction protein targets. The labels are binding affinities to these targets.",
451 |         "#Tasks": 36,
452 |         "#Compounds": "between 114 and 6,923 depending on the target",
453 |         "Task Type": "Regression",
454 |         "Data Type": "SMILES",
455 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/#ref6:~:text=5-,Cocaine%20addiction%20datasets,-The%2036%20cocaine",
456 |         "Papers": [
457 |             {
458 |                 "Name": "Proteome-informed machine learning studies of cocaine addiction",
459 |                 "Link": "https://weilab.math.msu.edu/DataLibrary/2D/#ref6:~:text=of%20cocaine%20addiction%22.-,PDF,-%5B7%5D%20Hongsong"
460 |             }
461 |         ]
462 |     },
463 |     {
464 |         "Dataset Name": "Cocaine addiction datasets 2",
465 |         "Domain": "",
466 |         "Short Description": "The 30 additional cocaine-addiction related datasets were collected from the ChEMBL database (https://www.ebi.ac.uk/chembl/), and involve 30 cocaine-addiction protein targets. The labels are binding affinities to these targets.",
467 |         "#Tasks": 36,
468 |         "#Compounds": "between 123 and 6,923 depending on the target",
469 |         "Task Type": "Regression",
470 |         "Data Type": "SMILES",
471 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/Downloads/cocaine_addiction-datasets2.zip",
472 |         "Papers": [
473 |             {
474 |                 "Name": "Proteome-informed machine learning studies of cocaine addiction",
475 |                 "Link": "https://pubs.acs.org/doi/pdf/10.1021/acs.jpclett.1c03133"
476 |             }
477 |         ]
478 |     },
479 |     {
480 |         "Dataset Name": "Drug_addiction_related",
481 |         "Domain": "",
482 |         "Short Description": "Receptors related to opioid or cocaine addiction.",
483 |         "#Tasks": 11,
484 |         "#Compounds": "between 815 and 11,297 depending on target",
485 |         "Task Type": "Regression",
486 |         "Data Type": "SMILES",
487 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/Downloads/drug_addiction_related_WeiWeb_2D.zip",
488 |         "Papers": [
489 |             {
490 |                 "Name": "TIDAL: Topology-Inferred Drug Addiction Learning",
491 |                 "Link": "https://pubs.acs.org/doi/full/10.1021/acs.jcim.3c00046"
492 |             }
493 |         ]
494 |     },
495 |     {
496 |         "Dataset Name": "hERG blocker/non-blocker datasets",
497 |         "Domain": "",
498 |         "Short Description": "Seven datasets are provided for the classification of hERG blockers/non-blockers. These datasets are taken from the literature, and the original datasets are included.",
499 |         "#Tasks": 7,
500 |         "#Compounds": "between 927 and 203,853 (train) and 407 and 87,366 (test) depending on the task",
501 |         "Task Type": "Classification",
502 |         "Data Type": "SMILES",
503 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/Downloads/hERG-classification.zip",
504 |         "Papers": [
505 |             {
506 |                 "Name": "Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models",
507 |                 "Link": "https://www.sciencedirect.com/science/article/pii/S0010482522011994"
508 |             }
509 |         ]
510 |     },
511 |     {
512 |         "Dataset Name": "Opioid use disorder datasets",
513 |         "Domain": "",
514 |         "Short Description": "75 datasets collected from the ChEMBL database (https://www.ebi.ac.uk/chembl/) and used in the machine-learning study of opioid use disorder. The labels are binding affinities to these targets.",
515 |         "#Tasks": 75,
516 |         "#Compounds": "between 268 and 6,298 depending on the task",
517 |         "Task Type": "Regression",
518 |         "Data Type": "SMILES",
519 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/Downloads/OUD-datasets.zip",
520 |         "Papers": [
521 |             {
522 |                 "Name": "Machine-learning Analysis of Opioid Use Disorder Informed by MOR, DOR, KOR, NOR and ZOR-Based Interactome Networks",
523 |                 "Link": "https://arxiv.org/abs/2301.04815"
524 |             },
525 |             {
526 |                 "Name": "Machine-learning Repurposing of DrugBank Compounds for Opioid Use Disorder",
527 |                 "Link": "https://arxiv.org/abs/2303.00240"
528 |             }
529 |         ]
530 |     },
531 |     {
532 |         "Dataset Name": "SVS datasets",
533 |         "Domain": "",
534 |         "Short Description": "Nine datasets for biomolecular interactions, including 4 regression and 5 classification datasets.",
535 |         "#Tasks": 9,
536 |         "#Compounds": "between 186 and 11,188 depending on the task",
537 |         "Task Type": "Regression, Classification",
538 |         "Data Type": "SMILES",
539 |         "DownloadLink": "https://weilab.math.msu.edu/DataLibrary/2D/Downloads/SVS_datasets.zip",
540 |         "Papers": [
541 |             {
542 |                 "Name": "SVSBI: Sequence-based virtual screening of biomolecular interactions",
543 |                 "Link": "https://arxiv.org/abs/2212.13617"
544 |             }
545 |         ]
546 |     }
547 | ]
548 | 


--------------------------------------------------------------------------------