├── .gitignore ├── FOSS4Spec.xlsx ├── FOSS4Spec_2025-05-12.xlsx ├── FOSS4Spectroscopy.Rmd ├── Include ├── Table1.png └── sorttable.js ├── LICENSE ├── README.md ├── Render.R ├── Utilities ├── dereplicate_repos.R ├── final_manual_check.R ├── review_candidates.R └── run_searches.R ├── docs └── index.html └── ~$FOSS4Spec.xlsx /.gitignore: -------------------------------------------------------------------------------- 1 | *.Rhistory 2 | .DS_Store 3 | .Rapp.history 4 | .Rproj.user/ 5 | *~ 6 | Links404.csv 7 | DateReport.csv 8 | Searches/ 9 | -------------------------------------------------------------------------------- /FOSS4Spec.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bryanhanson/FOSS4Spectroscopy/a5dfd759e93b7ac2b59658169a9645e1a011a442/FOSS4Spec.xlsx -------------------------------------------------------------------------------- /FOSS4Spec_2025-05-12.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bryanhanson/FOSS4Spectroscopy/a5dfd759e93b7ac2b59658169a9645e1a011a442/FOSS4Spec_2025-05-12.xlsx -------------------------------------------------------------------------------- /FOSS4Spectroscopy.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: FOSS For Spectroscopy 3 | author: Bryan A. Hanson, DePauw University 4 | date: >- 5 | `r paste(format(Sys.time(), 6 | format = "%Y-%m-%d %H:%M", tz = "GMT"), "UTC")` 7 | output: 8 | html_document: 9 | theme: united 10 | --- 11 | 12 | 14 | 15 | 16 | 21 | 22 | 23 | 24 | 25 | 26 | The following table collects information about free and open source software ([FOSS](https://en.wikipedia.org/wiki/Free_and_open-source_software)) for spectroscopy. All information is taken from the respective websites and/or repositories. Some projects have been described in publications. 
If so, we try to provide a link, but the document may be behind a paywall. This [blog post](https://chemospec.org/posts/2021-04-19-Search-GH-Topics/2021-04-19-Search-GH-Topics.html) gives some details about how we find interesting repositories. 27 | 28 | Unless otherwise noted, the software mentioned here: 29 | 30 | * Is suitable for one or more of the following techniques: NMR, IR, Raman, ESR/EPR, fluorescence, XRF, LIBS and UV-Vis. 31 | * Mass *spectrometry* software is not included (see the work by Stanstrup just below). 32 | * Software for MRI is not included. 33 | * Software for remote sensing is generally not included, though some projects involving hyperspectral imaging are included. 34 | * While some folks publish open source add-ons for `Matlab`, `Matlab` is not open source and projects written in `Matlab` are not included. 35 | 36 | Some other places to look: 37 | 38 | * [Stanstrup *et al.*](https://www.mdpi.com/2218-1989/9/10/200) have published a comprehensive paper describing the `R` packages suitable for use in metabolomics, which partially overlaps with the information here. The authors have also created a dynamic [document](https://rformassspectrometry.github.io/metaRbolomics-book/) with the same information and more. 39 | * [awesome-spectra](https://github.com/erwanp/awesome-spectra) is a page somewhat in the spirit of the work here, but apparently depends on authors to add their own material, and is missing some key entries (e.g. no NMR packages). 40 | * [All Things Raman](https://github.com/allthingsraman) is a curated collection of software for Raman spectroscopy. 41 | * The [CRAN Task View for Chemometrics & Computational Physics](https://cran.r-project.org/web/views/ChemPhys.html) includes some `R` packages listed here as well as related software. 42 | 43 | #### How Does One Choose a Package? 
 44 | 45 | *The projects listed here have been lightly vetted and the process is imperfect!* If a project looks incomplete, is a class project, or I can't tell what it does, it's not included! Of course, if you feel your package has been omitted in error, feel free to request its inclusion. With that in mind, there are still many packages to consider. As a general guide, the excellent checklist provided by [Lortie *et al.*](https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.5970) is included here for your consideration. 46 | 47 | Lortie Table 1 48 | 49 | ```{r setupR, echo = FALSE, results = "hide"} 50 | # Clean up the workspace but keep the local token, if present 51 | # This is necessary for the local build 52 | keep <- "github_token" 53 | rm(list = ls()[!(ls() %in% keep)]) 54 | 55 | suppressPackageStartupMessages(library("knitr")) 56 | suppressPackageStartupMessages(library("gt")) 57 | suppressPackageStartupMessages(library("readxl")) 58 | suppressPackageStartupMessages(library("httr")) 59 | suppressPackageStartupMessages(library("lubridate")) 60 | suppressPackageStartupMessages(library("jsonlite")) 61 | suppressPackageStartupMessages(library("rvest")) #??? 62 | suppressPackageStartupMessages(library("stringr")) 63 | suppressPackageStartupMessages(library("xml2")) 64 | suppressPackageStartupMessages(library("webu")) 65 | 66 | opts_chunk$set(echo = FALSE) 67 | 68 | set_config(timeout(40)) # httr setting for GET calls (default 13) 69 | cnt <- 0L # counter for the number of GET calls to Github 70 | username <- "bryanhanson" 71 | ``` 72 | 73 | 74 | ```{r readDB} 75 | # Please edit FOSS4Spec.xlsx to add information or make corrections. 76 | # Please follow the conventions of existing entries for consistency. 77 | # Remember the table in the web page is sortable so consistency in 78 | # description and focus is especially important for users to obtain 79 | # useful information easily. 
80 | 81 | DF <- as.data.frame(read_excel("FOSS4Spec.xlsx", na = "NA")) 82 | # shorten names for less typing 83 | names(DF) <- c("pkgname", "desc", "lang", "focus", "repo", "web", "pub", "maint", "maint_email", "author_email") 84 | DF <- DF[order(DF$pkgname),] 85 | ``` 86 | 87 | ```{r token} 88 | junk <- check_for_github_token(github_token) 89 | ``` 90 | 91 | ```{r verifyURLs} 92 | # This takes some time! 93 | ne <- nrow(DF) # number of entries 94 | 95 | # Check all URLs, if the site is down handle so table always looks good. 96 | # Site URL might also just be missing from table, handle this too. 97 | webLink <- rep(FALSE, ne) # If TRUE, there is a link in the input table 98 | repoLink <- rep(FALSE, ne) 99 | pubLink <- rep(FALSE, ne) 100 | 101 | webOK <- rep(FALSE, ne) # If TRUE, URL was reachable 102 | repoOK <- rep(FALSE, ne) 103 | pubOK <- rep(FALSE, ne) 104 | 105 | # These are used for internal reporting 106 | # If TRUE, URL was given but not reachable 107 | badWeb <- rep(FALSE, ne) 108 | badRepo <- rep(FALSE, ne) 109 | badPub <- rep(FALSE, ne) 110 | 111 | for (i in 1:ne) { 112 | if (!is.na(DF$web[i])) { 113 | webOK[i] <- good_url(DF$web[i]) 114 | webLink[i] <- TRUE 115 | if (webLink[i] != webOK[i]) badWeb[i] <- TRUE 116 | } 117 | 118 | if (!is.na(DF$repo[i])) { 119 | repoOK[i] <- good_url(DF$repo[i]) 120 | repoLink[i] <- TRUE 121 | if (repoLink[i] != repoOK[i]) badRepo[i] <- TRUE 122 | } 123 | 124 | if (!is.na(DF$pub[i])) { 125 | pubOK[i] <- good_url(DF$pub[i]) 126 | pubLink[i] <- TRUE 127 | if (pubLink[i] != pubOK[i]) badPub[i] <- TRUE 128 | } 129 | } 130 | # If URLs are bad they will still be added to the table as hyperlinks, but 131 | # those links will give status 404. 
132 | # Write a report so maintainers can check & fix if it's on our end 133 | LinkReport <- data.frame(name = DF$pkgname, webLink, webOK, repoLink, repoOK, pubLink, pubOK, stringsAsFactors = FALSE) 134 | keep <- badPub | badRepo | badWeb 135 | LinkReport <- LinkReport[keep,] 136 | if (nrow(LinkReport) > 0) write.csv(LinkReport, row.names = FALSE, file = "Reports/Links404.csv") 137 | ``` 138 | 139 | ```{r checkUpdateDate, warning = FALSE} 140 | # Use the info from checking URLs above 141 | webDate <- as.POSIXct(rep(NA, ne)) # see stackoverflow.com/a/33002710/633251 142 | commitDate <- as.POSIXct(rep(NA, ne)) 143 | issueDate <- as.POSIXct(rep(NA, ne)) 144 | updateDate <- as.POSIXct(rep(NA, ne)) 145 | 146 | repoType <- rep("xx", ne) 147 | repoType[grepl("github\\.com", DF$repo)] <- "gh" 148 | 149 | for (i in 1:ne) { 150 | 151 | if (webOK[i]) { 152 | ans <- find_page_date(flatten_web_page(DF$web[i])) 153 | if (!is.na(ans)) webDate[i] <- ans 154 | } 155 | 156 | if (repoOK[i]) { 157 | if (repoType[i] == "gh") { 158 | # NA returned when repo path bad 159 | tmp <- get_github_dates(DF$repo[i], "commits") 160 | cnt <- cnt + 1 161 | if (!is.na(tmp)) commitDate[i] <- ymd(tmp) 162 | tmp <- get_github_dates(DF$repo[i], "issues") 163 | cnt <- cnt + 1 164 | if (!is.na(tmp)) issueDate[i] <- ymd(tmp) 165 | } 166 | } 167 | 168 | # updateDate will be the most recent of webDate, issueDate, commitDate 169 | # If all are NA, a warning is issued so to avoid that do: 170 | if (is.na(webDate[i]) & is.na(commitDate[i]) & is.na(issueDate[i])) next 171 | updateDate[i] <- max(webDate[i], commitDate[i], issueDate[i], na.rm = TRUE) 172 | } 173 | updateDate <- date(updateDate) # -> ymd 174 | 175 | # Write a report 176 | DateReport <- data.frame(name = DF$pkgname, webDate, commitDate, issueDate, updateDate, stringsAsFactors = FALSE) 177 | write.csv(DateReport, row.names = FALSE, file = "Reports/DateReport.csv") 178 | ``` 179 | 180 | ```{r createNamelink} 181 | # Additional processing of the input 
values 182 | # Combine name, website and pub as available to create hyperlink 183 | # If website is missing, use repo instead (otherwise one must edit the original table more) 184 | namelink <- DF$pkgname # There must be at least a pkgname in the input table 185 | for (i in 1:ne) { 186 | if (!is.na(DF$web[i])) { 187 | namelink[i] <- paste("[", DF$pkgname[i], "](", DF$web[i], ")", sep = "") 188 | } 189 | if (is.na(DF$web[i])) { 190 | if (!is.na(DF$repo[i])) { 191 | namelink[i] <- paste("[", DF$pkgname[i], "](", DF$repo[i], ")", sep = "") 192 | } 193 | } 194 | if (!is.na(DF$pub[i])) { 195 | namelink[i] <- paste(namelink[i], " ([pub](", DF$pub[i], "))", sep = "") 196 | } 197 | } 198 | ``` 199 | 200 | ```{r createTable} 201 | DF2 <- data.frame(namelink, DF$desc, DF$lang, DF$focus, updateDate, stringsAsFactors = FALSE) 202 | names(DF2) <- c("Name", "Description", "Lang", "Focus", "Status") 203 | ``` 204 | 205 |
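
For anyone curious how the "Name" column is put together, here is a minimal standalone sketch of the same link-building rule as the `createNamelink` chunk: link to the website if there is one, otherwise to the repository, and append a publication link when available. The function name `make_namelink` and the toy data are assumptions made purely for illustration; they are not part of the repository, and the URLs are placeholders.

```r
# Hypothetical helper (not in the original repository): a vectorized
# version of the link-building rule used in the createNamelink chunk.
make_namelink <- function(pkgname, web, repo, pub) {
  # Prefer the website; fall back to the repository when web is NA
  target <- ifelse(!is.na(web), web, repo)
  # With no target at all, keep the bare package name
  out <- ifelse(is.na(target), pkgname,
                paste0("[", pkgname, "](", target, ")"))
  # Append a publication link where one is available
  ifelse(is.na(pub), out, paste0(out, " ([pub](", pub, "))"))
}

# Toy input; these URLs are placeholders, not entries from FOSS4Spec.xlsx
toy <- data.frame(
  pkgname = c("pkgA", "pkgB", "pkgC"),
  web     = c("https://example.org/a", NA, NA),
  repo    = c("https://example.org/repoA", "https://example.org/repoB", NA),
  pub     = c(NA, "https://example.org/pubB", NA),
  stringsAsFactors = FALSE
)

links <- make_namelink(toy$pkgname, toy$web, toy$repo, toy$pub)
print(links)
```

The vectorized form avoids the explicit loop, but the loop in the chunk is arguably easier to follow when debugging a single troublesome entry, so treat this as an alternative rather than a replacement.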
 206 | 207 | * *Click on a header to sort the table ("Focus" may be a good starting point)* 208 | * [Abbreviations](#abbreviations) & terms below the table 209 | * The table currently has `r nrow(DF2)` entries; the majority are `R` (`r length(DF2$Lang[DF2$Lang == "R"])`) or `Python` (`r length(DF2$Lang[DF2$Lang == "Python"])`). 210 | 211 | ```{r printTable, results = "asis"} 212 | DF2 |> 213 | gt()|> 214 | opt_stylize(style = 1) |> 215 | fmt_markdown() 216 | ``` 217 | 218 | Additions or corrections? Please [file an issue](https://github.com/bryanhanson/FOSS4Spectroscopy/issues) with the necessary information or submit a pull request at the [repo](https://github.com/bryanhanson/FOSS4Spectroscopy). 219 | 220 | ##### Status Column: {#status} 221 | 222 | The status column in the table gives the date of the most recent: 223 | 224 | * commit to a repository, 225 | * activity on an issue filed in a repository, 226 | * web site update, or 227 | * submission to an archival network such as CRAN or PyPI 228 | 229 | as a proxy for regular maintenance, at the time this page was updated. Keep in mind that some software is fairly mature and thus an older status date does not necessarily mean the software is not maintained. Follow the links to the websites for more details. 230 | 231 | *The status date is found by automatic checking of sites with valid links. Commits and issues are only checked for Github sites, via the Github API. Web site updates are found using a simple search for common date formats and may be inaccurate. Keep in mind the most recent activity may be on any branch in a repo, and may be newer than any official releases.*[^1] 232 | 233 | ##### Abbreviations & Terms: {#abbreviations} 234 | 235 | * __CRAN:__ [Comprehensive R Archive Network](https://cran.r-project.org/) 236 | * __EDA:__ Exploratory data analysis (unsupervised chemometrics) 237 | * __Focus:__ The type of spectroscopy the software is focused on. 
Keep in mind that software designed with a given spectroscopy in mind may still work with other types of spectroscopic data. 238 | * __Language:__ Most software is built on several languages. In the table, "lang" refers to the primary language used to create the software. 239 | * __PyPI:__ [Python Package Index](https://pypi.org/) 240 | * __Python:__ Multipurpose programming language [details](https://www.python.org/about/) 241 | * __R:__ A software environment for statistical computing and graphics [details, platforms](https://www.r-project.org/) 242 | 243 | 244 | [^1]: We made `r cnt` inquiries to Github to produce this report. 245 | -------------------------------------------------------------------------------- /Include/Table1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bryanhanson/FOSS4Spectroscopy/a5dfd759e93b7ac2b59658169a9645e1a011a442/Include/Table1.png -------------------------------------------------------------------------------- /Include/sorttable.js: -------------------------------------------------------------------------------- 1 | /* 2 | SortTable 3 | version 2 4 | 7th April 2007 5 | Stuart Langridge, http://www.kryogenix.org/code/browser/sorttable/ 6 | 7 | Instructions: 8 | Download this file 9 | Add to your HTML 10 | Add class="sortable" to any table you'd like to make sortable 11 | Click on the headers to sort 12 | 13 | Thanks to many, many people for contributions and suggestions. 14 | Licenced as X11: http://www.kryogenix.org/code/browser/licence.html 15 | This basically means: do what you want with it. 
16 | */ 17 | 18 | 19 | var stIsIE = /*@cc_on!@*/false; 20 | 21 | sorttable = { 22 | init: function() { 23 | // quit if this function has already been called 24 | if (arguments.callee.done) return; 25 | // flag this function so we don't do the same thing twice 26 | arguments.callee.done = true; 27 | // kill the timer 28 | if (_timer) clearInterval(_timer); 29 | 30 | if (!document.createElement || !document.getElementsByTagName) return; 31 | 32 | sorttable.DATE_RE = /^(\d\d?)[\/\.-](\d\d?)[\/\.-]((\d\d)?\d\d)$/; 33 | 34 | forEach(document.getElementsByTagName('table'), function(table) { 35 | if (table.className.search(/\bsortable\b/) != -1) { 36 | sorttable.makeSortable(table); 37 | } 38 | }); 39 | 40 | }, 41 | 42 | makeSortable: function(table) { 43 | if (table.getElementsByTagName('thead').length == 0) { 44 | // table doesn't have a tHead. Since it should have, create one and 45 | // put the first table row in it. 46 | the = document.createElement('thead'); 47 | the.appendChild(table.rows[0]); 48 | table.insertBefore(the,table.firstChild); 49 | } 50 | // Safari doesn't support table.tHead, sigh 51 | if (table.tHead == null) table.tHead = table.getElementsByTagName('thead')[0]; 52 | 53 | if (table.tHead.rows.length != 1) return; // can't cope with two header rows 54 | 55 | // Sorttable v1 put rows with a class of "sortbottom" at the bottom (as 56 | // "total" rows, for example). This is B&R, since what you're supposed 57 | // to do is put them in a tfoot. So, if there are sortbottom rows, 58 | // for backwards compatibility, move them to tfoot (creating it if needed). 
59 | sortbottomrows = []; 60 | for (var i=0; i5' : ' ▴'; 104 | this.appendChild(sortrevind); 105 | return; 106 | } 107 | if (this.className.search(/\bsorttable_sorted_reverse\b/) != -1) { 108 | // if we're already sorted by this column in reverse, just 109 | // re-reverse the table, which is quicker 110 | sorttable.reverse(this.sorttable_tbody); 111 | this.className = this.className.replace('sorttable_sorted_reverse', 112 | 'sorttable_sorted'); 113 | this.removeChild(document.getElementById('sorttable_sortrevind')); 114 | sortfwdind = document.createElement('span'); 115 | sortfwdind.id = "sorttable_sortfwdind"; 116 | sortfwdind.innerHTML = stIsIE ? ' 6' : ' ▾'; 117 | this.appendChild(sortfwdind); 118 | return; 119 | } 120 | 121 | // remove sorttable_sorted classes 122 | theadrow = this.parentNode; 123 | forEach(theadrow.childNodes, function(cell) { 124 | if (cell.nodeType == 1) { // an element 125 | cell.className = cell.className.replace('sorttable_sorted_reverse',''); 126 | cell.className = cell.className.replace('sorttable_sorted',''); 127 | } 128 | }); 129 | sortfwdind = document.getElementById('sorttable_sortfwdind'); 130 | if (sortfwdind) { sortfwdind.parentNode.removeChild(sortfwdind); } 131 | sortrevind = document.getElementById('sorttable_sortrevind'); 132 | if (sortrevind) { sortrevind.parentNode.removeChild(sortrevind); } 133 | 134 | this.className += ' sorttable_sorted'; 135 | sortfwdind = document.createElement('span'); 136 | sortfwdind.id = "sorttable_sortfwdind"; 137 | sortfwdind.innerHTML = stIsIE ? ' 6' : ' ▾'; 138 | this.appendChild(sortfwdind); 139 | 140 | // build an array to sort. 
This is a Schwartzian transform thing, 141 | // i.e., we "decorate" each row with the actual sort key, 142 | // sort based on the sort keys, and then put the rows back in order 143 | // which is a lot faster because you only do getInnerText once per row 144 | row_array = []; 145 | col = this.sorttable_columnindex; 146 | rows = this.sorttable_tbody.rows; 147 | for (var j=0; j 12) { 184 | // definitely dd/mm 185 | return sorttable.sort_ddmm; 186 | } else if (second > 12) { 187 | return sorttable.sort_mmdd; 188 | } else { 189 | // looks like a date, but we can't tell which, so assume 190 | // that it's dd/mm (English imperialism!) and keep looking 191 | sortfn = sorttable.sort_ddmm; 192 | } 193 | } 194 | } 195 | } 196 | return sortfn; 197 | }, 198 | 199 | getInnerText: function(node) { 200 | // gets the text we want to use for sorting for a cell. 201 | // strips leading and trailing whitespace. 202 | // this is *not* a generic getInnerText function; it's special to sorttable. 203 | // for example, you can override the cell text with a customkey attribute. 204 | // it also gets .value for fields. 
205 | 206 | if (!node) return ""; 207 | 208 | hasInputs = (typeof node.getElementsByTagName == 'function') && 209 | node.getElementsByTagName('input').length; 210 | 211 | if (node.getAttribute("sorttable_customkey") != null) { 212 | return node.getAttribute("sorttable_customkey"); 213 | } 214 | else if (typeof node.textContent != 'undefined' && !hasInputs) { 215 | return node.textContent.replace(/^\s+|\s+$/g, ''); 216 | } 217 | else if (typeof node.innerText != 'undefined' && !hasInputs) { 218 | return node.innerText.replace(/^\s+|\s+$/g, ''); 219 | } 220 | else if (typeof node.text != 'undefined' && !hasInputs) { 221 | return node.text.replace(/^\s+|\s+$/g, ''); 222 | } 223 | else { 224 | switch (node.nodeType) { 225 | case 3: 226 | if (node.nodeName.toLowerCase() == 'input') { 227 | return node.value.replace(/^\s+|\s+$/g, ''); 228 | } 229 | case 4: 230 | return node.nodeValue.replace(/^\s+|\s+$/g, ''); 231 | break; 232 | case 1: 233 | case 11: 234 | var innerText = ''; 235 | for (var i = 0; i < node.childNodes.length; i++) { 236 | innerText += sorttable.getInnerText(node.childNodes[i]); 237 | } 238 | return innerText.replace(/^\s+|\s+$/g, ''); 239 | break; 240 | default: 241 | return ''; 242 | } 243 | } 244 | }, 245 | 246 | reverse: function(tbody) { 247 | // reverse the rows in a tbody 248 | newrows = []; 249 | for (var i=0; i=0; i--) { 253 | tbody.appendChild(newrows[i]); 254 | } 255 | delete newrows; 256 | }, 257 | 258 | /* sort functions 259 | each sort function takes two parameters, a and b 260 | you are comparing a[0] and b[0] */ 261 | sort_numeric: function(a,b) { 262 | aa = parseFloat(a[0].replace(/[^0-9.-]/g,'')); 263 | if (isNaN(aa)) aa = 0; 264 | bb = parseFloat(b[0].replace(/[^0-9.-]/g,'')); 265 | if (isNaN(bb)) bb = 0; 266 | return aa-bb; 267 | }, 268 | sort_alpha: function(a,b) { 269 | if (a[0]==b[0]) return 0; 270 | if (a[0] 0 ) { 316 | var q = list[i]; list[i] = list[i+1]; list[i+1] = q; 317 | swap = true; 318 | } 319 | } // for 320 | t--; 321 | 
322 | if (!swap) break; 323 | 324 | for(var i = t; i > b; --i) { 325 | if ( comp_func(list[i], list[i-1]) < 0 ) { 326 | var q = list[i]; list[i] = list[i-1]; list[i-1] = q; 327 | swap = true; 328 | } 329 | } // for 330 | b++; 331 | 332 | } // while(swap) 333 | } 334 | } 335 | 336 | /* ****************************************************************** 337 | Supporting functions: bundled here to avoid depending on a library 338 | ****************************************************************** */ 339 | 340 | // Dean Edwards/Matthias Miller/John Resig 341 | 342 | /* for Mozilla/Opera9 */ 343 | if (document.addEventListener) { 344 | document.addEventListener("DOMContentLoaded", sorttable.init, false); 345 | } 346 | 347 | /* for Internet Explorer */ 348 | /*@cc_on @*/ 349 | /*@if (@_win32) 350 | document.write("