├── .gitignore
└── README.md


/.gitignore:
--------------------------------------------------------------------------------
*.sqlite
*.zip

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# nutrition-data

The USDA provides [a database of nutritional data][1], which is useful for food-related
software. It is distributed as a zip of flat files and a PDF that describes the format.

This is a quick-and-dirty Python script that converts that zip file to a SQLite
database. To avoid hard-coding schemas, I try to parse the schema from the `pdftotext`
output of the documentation.

To run this you will need the `requests` and `sqlite3` Python modules (`sqlite3` ships
with the Python standard library) and the `pdftotext` executable.

The resulting database uses text fields for `NDB_NO`, per the specification in the
PDF file. If you don't need an exact match for the upstream identifiers, which
have leading zeros, you could convert these fields to integers.


[1]: http://www.ars.usda.gov/Services/docs.htm?docid=8964

--------------------------------------------------------------------------------
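
The README's note about converting `NDB_NO` to integers can be illustrated with a minimal sketch. It is not part of the script: the table names `food` and `food_int`, the `description` column, and the sample rows are all made up for illustration, and the real schema is whatever the script parses from the PDF. The sketch shows that a text identifier only matches with its leading zero intact, and that `CAST(... AS INTEGER)` drops that zero.

```python
import sqlite3

# Hypothetical table and rows, for illustration only; the real tables and
# columns come from the schema parsed out of the USDA documentation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food (NDB_NO TEXT PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO food VALUES (?, ?)",
    [("01001", "example food A"), ("10001", "example food B")],
)

# With a text column the comparison is exact, so the leading zero matters.
exact = conn.execute(
    "SELECT description FROM food WHERE NDB_NO = ?", ("01001",)
).fetchone()
missing = conn.execute(
    "SELECT description FROM food WHERE NDB_NO = ?", ("1001",)
).fetchone()

# Copy the rows into a table keyed by integer; the cast drops leading zeros.
conn.execute("CREATE TABLE food_int (NDB_NO INTEGER PRIMARY KEY, description TEXT)")
conn.execute(
    "INSERT INTO food_int SELECT CAST(NDB_NO AS INTEGER), description FROM food"
)
ids = [row[0] for row in conn.execute("SELECT NDB_NO FROM food_int ORDER BY NDB_NO")]

print(exact, missing, ids)
```

After such a conversion the identifiers no longer match the upstream files character for character, which is the trade-off the README describes.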