├── .gitignore ├── LICENSE ├── README.md ├── _config.yml ├── code ├── classification │ ├── ModelSelection_v2.html │ ├── ModelSelection_v2.ipynb │ ├── Model_Aug2020.html │ ├── Model_Aug2020.ipynb │ ├── training_codebook.txt │ ├── training_set_suppl_v2.csv │ └── training_set_v2.csv ├── data_acquisition │ ├── jp2_download.py │ ├── search.csv │ └── xml_parser.py ├── marginalia │ ├── cropfunctions.py │ ├── example_utilities.py │ ├── marginalia_determination.py │ └── marginalia_removal.py ├── ocr │ ├── adjRec.py │ ├── geonames.py │ ├── geonames.txt │ ├── ocr_func.py │ └── ocr_use.py └── split_cleanup │ ├── 00_initial_ch_sec_split.py │ ├── 01_auto_chap_clean1.py │ ├── 02_auto_chap_clean2.py │ ├── 03_gen_manual_chapfix_files.py │ ├── 04_integrate_manual_chapfixes.py │ ├── 05_auto_section_clean.py │ ├── 06_gen_final_agg.py │ └── 07_final_sec_appraisal.py ├── environment.yml ├── examples ├── adjustment_recommendation │ ├── adjRec.ipynb │ ├── adjRec.py │ ├── adjusted.png │ ├── example_image.py │ ├── geonames.py │ ├── geonames.txt │ ├── images │ │ └── lawsresolutionso1891nort_jpg │ │ │ ├── lawsresolutionso1891nort_0272.jpg │ │ │ ├── lawsresolutionso1891nort_0374.jpg │ │ │ ├── lawsresolutionso1891nort_0542.jpg │ │ │ ├── lawsresolutionso1891nort_0606.jpg │ │ │ ├── lawsresolutionso1891nort_0771.jpg │ │ │ ├── lawsresolutionso1891nort_0944.jpg │ │ │ ├── lawsresolutionso1891nort_1114.jpg │ │ │ ├── lawsresolutionso1891nort_1210.jpg │ │ │ ├── lawsresolutionso1891nort_1373.jpg │ │ │ └── lawsresolutionso1891nort_1494.jpg │ ├── marginalia_metadata_demo.csv │ ├── ocr_func.py │ ├── sample_metadata.csv │ └── unadjusted.png ├── marginalia_determination │ ├── cropfunctions.py │ ├── example_image.py │ ├── lawsresolutionso1891nort_jp2 │ │ ├── lawsresolutionso1891nort_0697.jpg │ │ ├── lawsresolutionso1891nort_0715.jpg │ │ └── lawsresolutionso1891nort_0716.jpg │ ├── marginalia_determination.html │ ├── marginalia_determination.ipynb │ ├── output │ │ └── marginalia_metadata_demo.csv │ └── 
sample_metadata.csv ├── ocr │ ├── adjustments_demo.csv │ ├── images │ │ └── lawsresolutionso1891nort_jpg │ │ │ ├── lawsresolutionso1891nort_0272.jpg │ │ │ ├── lawsresolutionso1891nort_0374.jpg │ │ │ ├── lawsresolutionso1891nort_0542.jpg │ │ │ ├── lawsresolutionso1891nort_0606.jpg │ │ │ ├── lawsresolutionso1891nort_0771.jpg │ │ │ ├── lawsresolutionso1891nort_0944.jpg │ │ │ ├── lawsresolutionso1891nort_1114.jpg │ │ │ ├── lawsresolutionso1891nort_1210.jpg │ │ │ ├── lawsresolutionso1891nort_1373.jpg │ │ │ └── lawsresolutionso1891nort_1494.jpg │ ├── marginalia_metadata_demo.csv │ ├── ocr_func.py │ ├── ocr_use.ipynb │ ├── output │ │ └── lawsresolutionso1891nort │ │ │ ├── lawsresolutionso1891nort_adjustments.txt │ │ │ ├── lawsresolutionso1891nort_private laws.txt │ │ │ ├── lawsresolutionso1891nort_private laws_data.tsv │ │ │ ├── lawsresolutionso1891nort_public laws.txt │ │ │ └── lawsresolutionso1891nort_public laws_data.tsv │ └── xmljpegmerge_demo.csv └── split_cleanup │ ├── 1899_public_chapnumflags_step4.csv │ ├── 1899_public_chapnumflags_step5.csv │ ├── 1899_public_final_agg.csv │ ├── 1899_public_initial_agg.csv │ ├── 1899_public_sample_flag_rows.csv │ ├── 1899_public_sample_raw.csv │ ├── 1899_public_weird_chaps_example.csv │ ├── chap_num_manual.png │ ├── split_cleanup.ipynb │ └── step4_fixlog.csv ├── images ├── Pauli_Murray.jpg ├── UniversityLibraries_logo_black_h75.png └── mellon-foundation-logo.jpg ├── index.html ├── index.md ├── installation.md ├── oer ├── .ipynb_checkpoints │ ├── 04-HowToOCR-checkpoint.ipynb │ ├── 05-StructuringOCRData-checkpoint.ipynb │ └── environment_backup-checkpoint.yml ├── 00-Introduction-AlgorithmsOfResistance.ipynb ├── 01-AlgorithmsOfResistance-WhatIsAnAlgorithm.ipynb ├── 02-GatheringACorpus.ipynb ├── 03-WhatIsOCR.ipynb ├── 04-HowToOCR.ipynb ├── 05-StructuringOCRData.ipynb ├── 06-ExploratoryAnalysis.ipynb ├── NC_counties.txt ├── README.md ├── environment_backup.yml ├── geonames.txt ├── images │ ├── 00-intro-01.jpeg │ ├── 00-intro-02.jpg │ 
├── 00-intro-03.jpeg │ ├── 00-intro-04.jpeg │ ├── 00-intro-05.jpeg │ ├── 00-intro-06.jpeg │ ├── 00-intro-07.jpeg │ ├── 00-intro-08.jpeg │ ├── 00-intro-09.jpeg │ ├── 00-intro-10.jpeg │ ├── 00-intro-11.jpg │ ├── 00-intro-12.jpg │ ├── 00-intro-25.jpg │ ├── 01-algorithms-01.jpg │ ├── 06-corpus-01.jpeg │ ├── 06-corpus-02.jpeg │ ├── 06-corpus-03.jpeg │ ├── 06-corpus-04.jpg │ ├── 06-corpus-05.jpeg │ ├── 06-corpus-06.jpeg │ ├── 06-corpus-07.jpeg │ ├── 06-corpus-08.jpeg │ ├── 06-corpus-09.jpg │ ├── 06-corpus-10.jpeg │ ├── 06-corpus-11.jpeg │ ├── 06-corpus-12.jpeg │ ├── 06-corpus-13.jpeg │ ├── 06-corpus-14.jpeg │ ├── 06-corpus-15.jpg │ ├── 06-corpus-16.jpg │ ├── 06-corpus-17.jpg │ ├── 06-corpus-18.jpg │ ├── 06-corpus-runcode.mp4 │ ├── 07-ocr-01.jpeg │ ├── 07-ocr-02.jpeg │ ├── 07-ocr-03.jpeg │ ├── 07-ocr-04.jpeg │ ├── 07-ocr-05.jpeg │ ├── 07-ocr-05.txt │ ├── 07-ocr-06.jpeg │ ├── 07-ocr-06.txt │ ├── 07-ocr-07.jpeg │ ├── 07-ocr-07.txt │ ├── 07-ocr-08.jpeg │ ├── 07-ocr-08.txt │ ├── 08-ocr-01.jpeg │ ├── 08-ocr-02.jpeg │ ├── 08-ocr-03.jpeg │ ├── 08-ocr-04.jpeg │ ├── 08-ocr-05.jpeg │ ├── 08-ocr-06.jpeg │ ├── 08-ocr-07.jpeg │ ├── 09-data-01.jpeg │ ├── 09-data-02.jpeg │ ├── 09-data-03.jpeg │ ├── 09-data-04.jpeg │ ├── 09-data-05.jpeg │ ├── 09-data-06.jpeg │ ├── 09-data-07.jpeg │ ├── 10-explore-01.jpeg │ ├── 10-explore-02.jpeg │ ├── 10-explore-03.jpeg │ ├── 10-explore-04.jpeg │ ├── 10-explore-05.jpeg │ ├── 10-explore-06.jpeg │ ├── 10-explore-07.jpeg │ ├── 10-explore-08.jpeg │ ├── 10-explore-09.jpeg │ ├── 10-explore-10.jpeg │ ├── 10-explore-11.jpeg │ ├── 10-explore-12.jpeg │ ├── 10-explore-13.jpeg │ ├── 10-explore-14.jpeg │ ├── Anaconda_Nucleus_Horizontal_white.svg │ ├── LawBooks-feature.png │ ├── chronam_daybook_19151112_pellagra_full.jpg │ ├── chronam_daybook_19151112_pellagra_full_bboxes.png │ ├── noun_arrow with loops_2073885.png │ ├── sessionlawsresol1955nort_0057.jpg │ ├── sessionlawsresol1955nort_0057_300ppi.jpg │ ├── sessionlawsresol1955nort_0057_grayscale.jpg │ ├── 
sessionlawsresol1955nort_0057_inverted.jpg │ ├── sessionlawsresol1955nort_0057_rotated.jpg │ ├── sessionlawsresol1955nort_0057_skewed.jpg │ └── sessionlawsresol1955nort_0058.jpg ├── jc_laws_list.csv ├── jclaws_dataset.csv ├── jpg_output │ ├── sessionlawsresol1955nort_0000.jpg │ ├── sessionlawsresol1955nort_0001.jpg │ ├── sessionlawsresol1955nort_0002.jpg │ ├── sessionlawsresol1955nort_0003.jpg │ ├── sessionlawsresol1955nort_0004.jpg │ ├── sessionlawsresol1955nort_0005.jpg │ ├── sessionlawsresol1955nort_0006.jpg │ ├── sessionlawsresol1955nort_0007.jpg │ ├── sessionlawsresol1955nort_0008.jpg │ ├── sessionlawsresol1955nort_0009.jpg │ ├── sessionlawsresol1955nort_0010.jpg │ ├── sessionlawsresol1955nort_0011.jpg │ ├── sessionlawsresol1955nort_0012.jpg │ ├── sessionlawsresol1955nort_0013.jpg │ ├── sessionlawsresol1955nort_0014.jpg │ ├── sessionlawsresol1955nort_0015.jpg │ ├── sessionlawsresol1955nort_0016.jpg │ ├── sessionlawsresol1955nort_0017.jpg │ ├── sessionlawsresol1955nort_0018.jpg │ ├── sessionlawsresol1955nort_0019.jpg │ ├── sessionlawsresol1955nort_0020.jpg │ ├── sessionlawsresol1955nort_0021.jpg │ ├── sessionlawsresol1955nort_0022.jpg │ ├── sessionlawsresol1955nort_0023.jpg │ ├── sessionlawsresol1955nort_0024.jpg │ └── sessionlawsresol1955nort_0025.jpg ├── on_the_books_text_jc_all.txt ├── sample │ ├── sessionlawsresol1955nort_0057.jpg │ ├── sessionlawsresol1955nort_0058.jpg │ ├── sessionlawsresol1955nort_0059.jpg │ ├── sessionlawsresol1955nort_0060.jpg │ ├── sessionlawsresol1955nort_0061.jpg │ ├── sessionlawsresol1955nort_0062.jpg │ ├── sessionlawsresol1955nort_0063.jpg │ ├── sessionlawsresol1955nort_0064.jpg │ ├── sessionlawsresol1955nort_0065.jpg │ └── sessionlawsresol1955nort_0066.jpg ├── sample_output.txt ├── sample_output │ ├── sample_output_spellchecked.csv │ ├── sessionlawsresol1955nort_0057.txt │ ├── sessionlawsresol1955nort_0058.txt │ ├── sessionlawsresol1955nort_0059.txt │ ├── sessionlawsresol1955nort_0060.txt │ ├── sessionlawsresol1955nort_0061.txt │ 
├── sessionlawsresol1955nort_0062.txt │ ├── sessionlawsresol1955nort_0063.txt │ ├── sessionlawsresol1955nort_0064.txt │ ├── sessionlawsresol1955nort_0065.txt │ └── sessionlawsresol1955nort_0066.txt ├── sessionlawsresol1955nort_0057.jpg ├── sessionlawsresol1955nort_0057_grayscale.jpg └── sessionlawsresol1955nort_0057_inverted.jpg └── workflow.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # OnTheBooks 2 | 3 | [On the Books: Jim Crow and Algorithms of Resistance](https://onthebooks.lib.unc.edu/) is a [collections as data project](https://collectionsasdata.github.io/part2whole/) of the [University of North Carolina at Chapel Hill Libraries](https://library.unc.edu/) to make North Carolina legal history accessible to researchers by creating a corpus that contains over one hundred years of North Carolina session laws from Reconstruction through the Civil Rights Movement (1866-1967). The project also used machine learning to identify Jim Crow laws during this period. 
4 | 5 | [Read more](https://unc-libraries-data.github.io/OnTheBooks/) 6 | 7 | ## [Installation and Dependencies](installation.md) 8 | 9 | ## [Workflow (code and examples)](workflow.md) 10 | 11 | ## [Text Corpora](https://doi.org/10.17615/5c4g-sd44) 12 | 13 | ## [Open Educational Resource](/oer) 14 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-slate -------------------------------------------------------------------------------- /code/classification/Model_Aug2020.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook documents the model fit during the first phase of *On the Books: Jim Crow and Algorithms of Resistance*, as of August 2020." 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Packages" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import os\n", 24 | "import re\n", 25 | "\n", 26 | "import pandas as pd\n", 27 | "import numpy as np\n", 28 | "import scipy.sparse\n", 29 | "\n", 30 | "from nltk.tokenize import word_tokenize\n", 31 | "from nltk.corpus import stopwords\n", 32 | "\n", 33 | "from sklearn.feature_extraction.text import CountVectorizer\n", 34 | "from sklearn.calibration import CalibratedClassifierCV, calibration_curve\n", 35 | "\n", 36 | "from xgboost import XGBClassifier" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Data Preparation\n" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "train_df = pd.read_csv(\"../training_set/training_set_v0_clean.csv\")" 53 | ] 54 | }, 55 | { 
56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "We performed simple preprocessing on the text:\n", 60 | "* Replaced hyphenated and line-broken words with unbroken words.\n", 61 | "* Removed section numbering from the law text (\"section_text\").\n", 62 | "* Removed all non-ASCII characters (most of these were OCR errors).\n", 63 | "* Converted all words to lower case.\n", 64 | "* Removed stopwords based on `nltk`'s default list.\n", 65 | "    * We also removed any words occurring in fewer than 2 or more than 1000 documents.\n", 66 | "* We used session or volume identifier (\"csv\") information to extract a numeric year. In the case of multi-year volumes (e.g. 1956-1957) the earlier year was used.\n", 67 | "\n", 68 | "Then we converted the text into a document-term matrix, augmented with year and law type variables." 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 3, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "repl = lambda m: m.group(\"letter\")\n", 78 | "\n", 79 | "# Fix hyphenated words\n", 80 | "train_df[\"text\"] = train_df.text.str.replace(r\"-[ \\|]+(?P<letter>[a-zA-Z])\", repl, regex=True).astype(\"str\")\n", 81 | "train_df[\"section_text\"] = train_df.section_text.str.replace(r\"-[ \\|]+(?P<letter>[a-zA-Z])\", repl, regex=True).astype(\"str\")\n", 82 | "train_df[\"section_text\"] = [re.sub(r'- *\\n+(\\w+ *)', r'\\1\\n', r) for r in train_df[\"section_text\"]]\n", 83 | "\n", 84 | "# Remove section titles (e.g. \"Sec. 
1\") from law text.\n", 85 | "train_df[\"start\"] = train_df.section_raw.str.len().fillna(0).astype(\"int\")\n", 86 | "train_df[\"section_text\"] = train_df.apply(lambda x: x['section_text'][(x[\"start\"]):], axis=1).str.strip()\n", 87 | "\n", 88 | "# Remove all non-ASCII characters\n", 89 | "train_df[\"section_text\"] = train_df[\"section_text\"].str.replace(r\"[^\\x00-\\x7F]\", \"\", regex=True)\n", 90 | "\n", 91 | "law_list = [word_tokenize(r.lower()) for r in train_df.section_text]\n", 92 | "stop_words = stopwords.words('english')\n", 93 | "law_list = [[word for word in law if word not in stop_words] for law in law_list]\n", 94 | "\n", 95 | "# Extract a numeric year variable\n", 96 | "train_df[\"year\"] = train_df.sess.str.slice(start = 0, stop = 4).astype(\"float\")\n", 97 | "train_df.loc[train_df.sess.isna(), \"year\"] = train_df.csv.str.extract(r\"(\\d{4})\", expand=False).astype(\"float\")\n", 98 | "\n", 99 | "def dummy(doc):\n", 100 | "    return doc\n", 101 | "# Remove terms appearing in fewer than 2 or more than 1000 documents, then convert to a document-term matrix.\n", 102 | "vect = CountVectorizer(tokenizer=dummy, preprocessor=dummy, decode_error = \"ignore\",\n", 103 | "                       min_df = 2, max_df = 1000)\n", 104 | "dtm = vect.fit_transform(law_list)\n", 105 | "\n", 106 | "# Add year and law type variables.\n", 107 | "extra_df = train_df.loc[:,[\"year\",\"type\"]].copy()\n", 108 | "extra_df = pd.get_dummies(extra_df, columns = [\"type\"], prefix = [\"type\"])\n", 109 | "X = scipy.sparse.hstack((dtm, extra_df.values))" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "## Model Details\n", 117 | "\n", 118 | "The `fit_params` below were selected using an 80-20 training-test split, followed by 10-fold cross-validation on the training set. We will include a basic template of our model selection process later this year." 
119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 4, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "fit_params = {'colsample_bytree': 0.3, 'gamma': 0.3, 'learning_rate': 0.3, \n", 128 | "              'max_depth': 20, 'min_child_weight': 1, 'n_estimators': 50, \n", 129 | "              'scale_pos_weight': 5}\n", 130 | "all_mod = XGBClassifier(**fit_params)\n", 131 | "all_modfit = all_mod.fit(X, train_df.assessment)" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "The XGBoost classifier outperformed the other models we evaluated. Read more about XGBoost [here](https://arxiv.org/abs/1603.02754).\n", 139 | "\n", 140 | "After fitting, we used probability calibration to adjust the model probabilities to better reflect the training set." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 5, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "calibrated_mod = CalibratedClassifierCV(all_modfit, cv=10, method=\"isotonic\")\n", 150 | "calibrated_modfit = calibrated_mod.fit(X, train_df.assessment)\n", 151 | "\n", 152 | "\n", 153 | "train_df[\"base_labels\"] = all_modfit.predict(X)\n", 154 | "train_df[\"base_probs\"] = all_modfit.predict_proba(X)[:,1]\n", 155 | "train_df[\"calibrated_probs\"] = calibrated_modfit.predict_proba(X)[:,1]\n", 156 | "train_df[\"calibrated_labels\"] = (train_df.calibrated_probs > 0.9).astype(\"int\")" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "We reported any laws with a calibrated probability over 90% as Jim Crow laws with a source of \"model\", unless they were also later confirmed by an expert, in which case they were labeled \"model and expert\". We chose to be conservative at this point to minimize false positives; since this project will continue over the coming year, we will have more time to fine-tune the modeling process." 
164 | ] 165 | } 166 | ], 167 | "metadata": { 168 | "kernelspec": { 169 | "display_name": "Python 3", 170 | "language": "python", 171 | "name": "python3" 172 | }, 173 | "language_info": { 174 | "codemirror_mode": { 175 | "name": "ipython", 176 | "version": 3 177 | }, 178 | "file_extension": ".py", 179 | "mimetype": "text/x-python", 180 | "name": "python", 181 | "nbconvert_exporter": "python", 182 | "pygments_lexer": "ipython3", 183 | "version": "3.7.3" 184 | }, 185 | "toc": { 186 | "base_numbering": 1, 187 | "nav_menu": {}, 188 | "number_sections": true, 189 | "sideBar": true, 190 | "skip_h1_title": true, 191 | "title_cell": "Table of Contents", 192 | "title_sidebar": "Contents", 193 | "toc_cell": false, 194 | "toc_position": {}, 195 | "toc_section_display": true, 196 | "toc_window_display": false 197 | } 198 | }, 199 | "nbformat": 4, 200 | "nbformat_minor": 4 201 | } 202 | -------------------------------------------------------------------------------- /code/classification/training_codebook.txt: -------------------------------------------------------------------------------- 1 | id: Standardized identifier for each law consisting of: year, law type, chapter_num, and section_num 2 | source: Source of Jim Crow law assessment (Pauli Murray, Richard Paschal, or project experts - William Sturkey or Kimber Thomas) 3 | jim_crow: Indicator of Jim Crow (1) or not Jim Crow (0) 4 | type: Type of law 5 | chapter_num: Chapter number as integer, generated from OCR and data cleaning 6 | section_num: Section number as integer, generated from OCR and data cleaning 7 | chapter_text: The text of the title and any introduction before the first section of the law 8 | section_text: The text of the specified section 9 | extrinsic: Supplemental data only. This field indicates whether the Jim Crow assessment was extrinsic (1), i.e. based almost completely on information outside the text of the law, or implicit (0). 
10 | -------------------------------------------------------------------------------- /code/data_acquisition/jp2_download.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | """ 5 | jp2_download.py 6 | 7 | 8 | @summary: This script downloads and stores volumes of image files from 9 | the Internet Archive. It uses an existing list of volume 10 | titles to create request links. 11 | 12 | Once the downloads are complete, the script checks for errors. 13 | 14 | It uses the volume title list mentioned above to check for missing 15 | volumes so that the user can download missing volumes manually. 16 | It then checks for download errors by comparing file sizes 17 | of local copies with those of the Internet Archive copies. 18 | 19 | Discrepancies are printed for the user. 20 | 21 | @author: Rucha Dalwadi 22 | 23 | Digital Research Services 24 | University Libraries 25 | UNC Chapel Hill 26 | 27 | """ 28 | 29 | import urllib.request 30 | import os 31 | import csv 32 | import time 33 | 34 | os.chdir(r"")# The Directory where you want the pdfs to be downloaded in like C:\Users\onthebooks\Documents\lawpdfs 35 | 36 | # Open the file with identifiers for parsing 37 | with open("search.csv","r") as identifiers: 38 | reader=csv.DictReader(identifiers) 39 | l=[d['identifier'] for d in reader] # identifier is the column with the volume names 40 | 41 | ct=0 42 | start=time.time() 43 | problems=list() 44 | fails=0 45 | sourcelink = 'https://archive.org/download/' # A single web source contains all the files to download 46 | 47 | # The identifiers are used to generate links to the images for download 48 | for f in l: 49 | try: 50 | ct+=1 51 | # A download link is created for each file by appending the id and file extension to the source 52 | full_link = sourcelink+f+'/'+f+'_jp2.zip' 53 | urllib.request.urlretrieve(full_link, f+'_jp2.zip') 54 | time.sleep(120) 55 | if ct%10==0: 56 | print(str(ct)+": 
"+f) 57 | print(time.time()-start) 58 | except Exception: 59 | fails+=1 60 | time.sleep(60) 61 | 62 | end=time.time() 63 | print(end-start) 64 | 65 | ## Checking for and resolving problems: 66 | 67 | # Build the list of download links for every volume 68 | zips = [sourcelink + f + '/' + f + '_jp2.zip' for f in l] 69 | 70 | # Get a list of the missing folders 71 | missed = [z for z in zips if z.split("/")[-1] not in os.listdir(".")] 72 | print(missed) 73 | # manually download missed files 74 | 75 | # Get a list of the items with broken links by comparing file sizes of the 76 | # original file to the downloaded file 77 | broken_dl=[] 78 | for z in zips: 79 | local=z.split("/")[-1] 80 | local_size=os.path.getsize(local) 81 | with urllib.request.urlopen(z) as web: 82 | meta=dict(web.info()) 83 | web_size=int(meta["Content-Length"]) 84 | if web_size!=local_size: 85 | broken_dl.append(z) 86 | 87 | # Print the sizes (in MB) of the broken files and original files for comparison 88 | for z in broken_dl: 89 | local=z.split("/")[-1] 90 | local_size=os.path.getsize(local) 91 | with urllib.request.urlopen(z) as web: 92 | meta=dict(web.info()) 93 | web_size=int(meta["Content-Length"]) 94 | print(local_size/(1024**2),web_size/(1024**2)) 95 | 96 | -------------------------------------------------------------------------------- /code/data_acquisition/search.csv: -------------------------------------------------------------------------------- 1 | identifier 2 | lawsofstateofnor184849nor 3 | privatelawsofsta1905nort 4 | publiclawsofstat186465nor 5 | publiclawsofstat185859nor 6 | publiclawsresolx1921nort 7 | privatelawsofsta1895nort 8 | lawsresolutionso1887nort 9 | publiclawsresolu1927nort 10 | publiclawsresolu1920nort 11 | publiclawsresolu1924nort 12 | publiclocallawsp1941nort 13 | lawsofstateofnor184041nort 14 | publiclocallawso1915nort 15 | publiclocallawso1911nort 16 | publiclawsresolu1908nort 17 | publiclawsresolu1905nort 18 | publiclocallawso1913nort 19 | lawsofstateofnor183637nort 20 | lawsresolutionso1883nort 21 | publiclawsresolu187273nor 22 | 
publiclocallawsp1924nort 23 | publiclocallawsp1921nort 24 | lawsresolutionso1891nort 25 | privlawsofsta185455nort 26 | privatelawsofsta1907nort 27 | privatelawsofsta186970nor 28 | privatelawsofsta1913nort 29 | privatelawsofsta186869nor 30 | publiclawsresolu1903nort 31 | publiclawsresolu1909nort 32 | privatelawsofsta1893nort 33 | privatelawsofsta1901nort 34 | publiclawsresolu1935nort 35 | publiclawsresolu1923nort 36 | publiclawsofstat185657nor 37 | publiclawsofstat186970nor 38 | publiclawsofstat187071nor 39 | publiclawsofstat186566nor 40 | publiclawsofstat186061nor 41 | lawsresolutionso1880nort 42 | lawsresolutionso1881nort 43 | publiclocallawsp1933nort 44 | publiclawsresolu1907nort 45 | publiclocallawsp1917nort 46 | publiclawsresolu1913nort 47 | publiclawsresolu1899nort 48 | publiclocallawsp1925nort 49 | privatelawsofsta186667nor 50 | privatelawsofsta187071nor 51 | privatelawsofsta187172nor 52 | publiclocallawsp1919nort 53 | publiclocallawsp1935nort 54 | lawsofstateofnor184647nor 55 | lawsofstateofnor185051nor 56 | privatelawsofsta1908nort 57 | privatelawsofsta186465nor 58 | privatelawsofsta1915nort 59 | privatelawsofsta1899nort 60 | privatelawsofsta186566nor 61 | publiclawsresolu1933nort 62 | publiclawsresolu1931nort 63 | publiclawsofstat186869nor 64 | publiclawsresolu1921nort 65 | publiclawsofstat187172nor 66 | publiclawsofstat186667nor 67 | publiclawsofstat186263nor 68 | publiclawsofstat1868nort 69 | lawsresolutionso1879nort 70 | lawsresolutionso187475nor 71 | lawsresolutionso187374nor 72 | lawsofnorthcarol1827nort 73 | lawsofnorthcarol1813nort 74 | lawsofnorthcarol1822nort 75 | lawsofnorthcarol183132nort 76 | publiclocallawsp1923nort 77 | publiclocallawsp1927nort 78 | publiclocallawsp3839nort 79 | publiclocallawsp1929nort 80 | publiclocallawsp1931nort 81 | lawsofstateofnor184243nort 82 | lawsofstateofnor18381839nort 83 | lawsofstateofnor184445nor 84 | lawsresolutionso187677nor 85 | lawsofstateofnor1852nort 86 | lawsresolutionso1889nort 87 | 
lawsresolutionso1885nort 88 | publiclawsresolu1893nort 89 | publiclawsresolu1901nort 90 | publiclawsresolu1897nort 91 | publiclawsofstat1861nort 92 | publiclawsofstat185455nor 93 | publiclawsofstat1863nort 94 | publiclawsresolu1941nort 95 | publiclawsresolu1936nort 96 | publiclawsresolu1925nort 97 | publiclawsresolu1929nort 98 | publiclawsresolu193839nor 99 | privatelawsofsta1897nort 100 | privatelawsofsta1903nort 101 | privatelawsofsta1909nort 102 | privatelawsofsta1911nort 103 | sessionlawsresol1953nort 104 | sessionlawsresol19891nort 105 | sessionlawsresol1973nort 106 | sessionlawsresol1949nort 107 | sessionlawsr199192nort 108 | sessionlawsresol1995nort 109 | sessionlawsresol1943nort 110 | lawsofnorthcarol1791nort 111 | lawsofnorthcarol1819nort 112 | lawsofnorthcarol1816nort 113 | lawsofnorthcarol1799nort 114 | lawsofnorthcarol1798nort 115 | lawsofnorthcarol1807nort 116 | lawsofnorthcarol1800nort 117 | lawsofnorthcarol1835nort 118 | lawsofnorthcarol1828nort 119 | lawsofnorthcarol1801nort 120 | lawsofnorthcarol1795nort 121 | sessionlawsresol02nort 122 | sessionlawsresol19892nort 123 | sessionlawsresol1947nort 124 | sessionlawsresol19911nort 125 | sessionlawsresol19953nort 126 | sessionlawsresol1955nort 127 | sessionlaws198788nort 128 | publiclawsresolu1900nort 129 | publiclocallaws1913nort 130 | publiclawsresolu1911nort 131 | publiclawsresolu1917nort 132 | publiclawsresolu1915nort 133 | publiclawsresolu1895nort 134 | publiclawsresolu1919nort 135 | lawsofnorthcarol1797nort 136 | lawsofnorthcarol1792nort 137 | lawsofnorthcarol1820nort 138 | lawsofnorthcarol1817nort 139 | lawsofnorthcarol1790nort 140 | lawsofnorthcarol1825nort 141 | lawsofnorthcarol1829nort 142 | lawsofnorthcarol1818nort 143 | sessionlawsresol00nort 144 | sessionlawsre19932nort 145 | sessionlaws1997983nort 146 | sessionlaws197778nort 147 | sessionlawsresol1969nort 148 | sessionlaws198384nort 149 | sessionlawsresol19912nort 150 | sessionlawsre199394nort 151 | sessionlawsresol1945nort 152 | 
sessionlawsresol03nort 153 | sessionlawsresol1959nort 154 | sessionlawsresol19991nort 155 | sessionlawsresol19972nort 156 | sessionlaws195657nort 157 | sessionlaws196365nort 158 | sessionlaws7980nort 159 | sessionlawsresol19971nort 160 | sessionlawsresol1977nort 161 | sessionlawsresol1981nort 162 | sessionlawsresol1961nort 163 | sessionlaws198990nort 164 | sessionlawsresol19871nort 165 | sessionlaws19656667nort 166 | sessionlaws1997984nort 167 | lawsofnorthcarol1821nort 168 | lawsofnorthcarol1793nort 169 | lawsofnorthcarol1796nort 170 | lawsofnorthcarol1794nort 171 | lawsofnorthcarol1812nort 172 | lawsofnorthcarol1810nort 173 | lawsofnorthcarol1809nort 174 | lawsofnorthcarol183435nort 175 | lawsofnorthcarol1811nort 176 | lawsofnorthcarol183334nort 177 | lawsofnorthcarol1823nort 178 | lawsofnorthcarol183031nort 179 | lawsofnorthcarol1826nort 180 | lawsofnorthcarol183233nort 181 | lawsofnorthcarol1815nort 182 | lawsofnorthcarol1814nort 183 | lawsofnorthcarol1824nort 184 | sessionlawsresol1971nort 185 | sessionlawsresol83nort 186 | sessionlawsresol1951nort 187 | sessionlawsl197576nort 188 | sessionlawsresol1985nort 189 | sessionlawsresol1963nort 190 | sessionlawsresol19872nort 191 | sessionlaws198586nort 192 | sessionlaws8182nort 193 | sessionlawsresol19952nort 194 | sessionlawsr19931nort 195 | sessionlawsresol1975nort 196 | -------------------------------------------------------------------------------- /code/data_acquisition/xml_parser.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | 5 | """ 6 | xml_parser.py 7 | 8 | 9 | @summary: The purpose of this script is to extract certain metadata 10 | for a set of volumes using the volume names (identifiers) from the Internet Archive. 11 | First we use "search.csv" to construct a list of the volumes whose xml metadata we 12 | want to parse. 
We use the volume names to build request urls for the xml metadata 13 | files associated with each volume. We then parse the xml files to locate and store 14 | the following information for each page in a given volume: 15 | 16 | handSide: the hand side (L/R) of a given leaf. 17 | pageNum: logical page numbers (image numbers) 18 | leafNum: Physical page numbers 19 | filename: The filename associated with each page image 20 | 21 | This information is then written to xml_metadata.csv with each image 22 | file in each volume constituting a row. The information in this file 23 | can then be combined with other, manually compiled metadata 24 | to form the xmljpegmerge.csv file, used in later steps. 25 | 26 | 27 | @author: Rucha Dalwadi 28 | 29 | Digital Research Services 30 | University Libraries 31 | UNC Chapel Hill 32 | 33 | """ 34 | 35 | import xml.etree.ElementTree as ET 36 | import csv 37 | import pandas as pd 38 | import os 39 | import urllib 40 | 41 | # Using the search.csv file, a list of the volumes whose xml files will be parsed is created. 
42 | with open("search.csv","r") as identifiers: 43 | reader = csv.DictReader(identifiers) 44 | l = [d['identifier'] for d in reader] # The identifier column contains the volume names 45 | 46 | # Through the xml files, extract the logical page numbers (pageNum), physical page numbers (leafNum) 47 | # and leaf hand side (handSide) 48 | handSide = [] 49 | pageNum = [] 50 | filename = [] 51 | leafNum = [] 52 | master = [] 53 | 54 | for i in l: 55 | try: 56 | # Get the xml file of a volume by creating a download link using the volume identifier 57 | xml = urllib.request.urlopen('https://archive.org/download/' + i + '/' + i + '_' + 'scandata.xml') 58 | tree = ET.parse(xml) 59 | root = tree.getroot() 60 | 61 | # Add this volume's page metadata to the master list 62 | for page in root[2].findall('page'): 63 | leafNum.append(int(page.attrib['leafNum'])) 64 | handSide.append(page.find('handSide').text) 65 | 66 | page_dict = {} 67 | page_dict['leafNum'] = int(page.attrib['leafNum']) 68 | if page.find('pageNumber') is not None: 69 | page_dict['pageNum'] = page.find('pageNumber').text 70 | else: 71 | page_dict['pageNum'] = '' 72 | 73 | page_dict['handSide'] = page.find('handSide').text 74 | page_dict['filename'] = i + '_' + '%04d' % page_dict['leafNum'] 75 | master.append(page_dict) 76 | except Exception: 77 | print(i) 78 | 79 | # Write the collected metadata to a csv file once all volumes have been parsed 80 | with open('xml_metadata.csv', 'w') as csvfile: 81 | writer1 = csv.DictWriter(csvfile, fieldnames=['filename','leafNum','handSide','pageNum'], lineterminator='\n') 82 | writer1.writeheader() 83 | for row in master: 84 | writer1.writerow(row) 85 | 86 | -------------------------------------------------------------------------------- /code/marginalia/example_utilities.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon Jul 8 08:16:57 2019 4 | 5 | @author: mtjansen 6 | """ 7 | 8 | from PIL import Image, ImageDraw 9 | import os 10 | import csv 11 | import sys, time  # used by sys.path.append below and by the timing in the batch loop 12 | 
sys.path.append(os.path.abspath(r"C:\Users\mtjansen\Desktop\OnTheBooks")) 13 | from cropfunctions import * 14 | 15 | 16 | def combine_bbox(b1,b2): 17 | """Combines successive boundary boxes, each within the last. 18 | Returns: 19 | tuple: Coordinates of crop (left,upper,right,lower) 20 | """ 21 | b1 = list(b1) 22 | b2 = list(b2) 23 | total=[b1[k] + b2[k] for k in range(2)] + [b1[k-2] + b2[k] for k in range(2,4)] 24 | return tuple(total) 25 | 26 | def example_image(orig, diff, angle, band_dict, bheight, total_bbox, orig_bbox): 27 | bd_ct = len(band_dict["hbands"]) 28 | back_height = bd_ct*(bheight+20)+100 29 | 30 | # bounds = Image.new(orig.mode, (orig.size[0],back_height), "white") 31 | # 32 | # for row in band_dict["hbands"]: 33 | # band_bbox = list(row["raw"]) 34 | # band_bbox[1] = row["index"] 35 | # band_bbox[3] = row["index"]+50 36 | # band = orig.crop(orig_bbox).crop(tuple(band_bbox)) 37 | # hoff = list(row["raw"])[0] + list(orig_bbox)[0] 38 | # voff = row["index"] + list(orig_bbox)[1]+10*(row["index"]+50)/bheight 39 | # bounds.paste(band,(hoff,int(voff))) 40 | 41 | img = orig.copy().rotate(angle).crop(orig_bbox) 42 | bounds = Image.new(orig.mode, orig.size, "white") 43 | draw = ImageDraw.Draw(bounds) 44 | 45 | for row in band_dict["hbands"]: 46 | band_bbox = list(row["raw"]) 47 | band_bbox[1] = row["index"]-50 48 | band_bbox[3] = row["index"] 49 | band = combine_bbox(orig_bbox,band_bbox) 50 | spot = tuple(list(band)[0:2]) 51 | bounds.paste(img.crop(band_bbox),spot) 52 | if list(band_bbox)[2]-list(band_bbox)[0]>10: 53 | draw.rectangle(band,outline="red",fill = None,width=2) 54 | 55 | final = orig.rotate(angle).crop(total_bbox) 56 | 57 | back_width = (orig.size[0]+bounds.size[0]+final.size[0])+250 58 | back = Image.new(orig.mode, (back_width,back_height), "white") 59 | 60 | back.paste(orig,(50,50)) 61 | back.paste(bounds,(orig.size[0]+150,50)) 62 | back.paste(final,(orig.size[0]+bounds.size[0]+250,list(total_bbox)[1]+50)) 63 | 64 | return back 65 | 66 | 67 | 
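# Worked example (an added sketch, not part of the original script): combine_bbox
# composes two successive crops, where the second box b2 is expressed relative to
# the region already selected by b1, so b2 is shifted by b1's upper-left corner.
# The helper below restates the arithmetic so the example is self-contained.
def _combine_bbox_demo(b1, b2):
    b1, b2 = list(b1), list(b2)
    # left/upper: offset b2's corner by b1's upper-left;
    # right/lower: offset by that same corner
    return tuple([b1[k] + b2[k] for k in range(2)] +
                 [b1[k - 2] + b2[k] for k in range(2, 4)])

# Cropping an image to (10, 20, 500, 700) and then to (5, 5, 450, 600) within that
# region is equivalent to a single crop of (15, 25, 460, 620) on the original image.
assert _combine_bbox_demo((10, 20, 500, 700), (5, 5, 450, 600)) == (15, 25, 460, 620)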
os.chdir(r"C:\Users\mtjansen\Desktop\OnTheBooks\1865-1968 jp2 files") 68 | 69 | ############################ 70 | # Get xml data from file. ## 71 | ############################ 72 | master = [] 73 | 74 | with open(r"..\xmljpegmerge_official.csv", "r") as csvfile: 75 | reader = csv.DictReader(csvfile) 76 | for row in reader: 77 | row_dict = dict() 78 | row_dict["filename"] = row["filename"] + ".jp2" 79 | row_dict["side"] = row["handSide"].lower() 80 | row_dict["folder"] = row["filename"].split("_")[0]+"_jp2" 81 | row_dict["type"] = row['sectiontype'] 82 | row_dict["start_section"] = False 83 | master.append(row_dict) 84 | 85 | master = sorted(master, key = lambda i: i['filename']) 86 | 87 | for k in range(1,len(master)): 88 | if master[k]["type"] != master[k-1]["type"]: 89 | master[k]["start_section"] = True 90 | 91 | master = [m for m in master if "186465" not in m["filename"]] 92 | 93 | batch = [row for row in master if row["filename"] in ["publiclocallawsp1917nort_0568.jp2","publiclocallawsp1933nort_0063.jp2"]] 94 | 95 | output_dir = r"C:\Users\mtjansen\Desktop\OnTheBooks\outwide_fix" 96 | for r in batch: 97 | t0 = time.time() 98 | f = os.path.join(r["folder"],r["filename"]) 99 | orig = Image.open(f) 100 | 101 | side = r["side"] 102 | 103 | ang = rotation_angle(orig) 104 | 105 | if r["start_section"]: 106 | diff, background, orig_bbox = trim(orig, angle=ang, find_top=False) 107 | else: 108 | diff, background, orig_bbox = trim(orig, angle=ang) 109 | 110 | if "196" in r["folder"] or "195" in r["folder"]: 111 | total_bbox = orig_bbox 112 | cut = None 113 | else: 114 | bheight = 50 115 | band_dict = get_bands(diff, bheight=bheight) 116 | 117 | width = diff.size[0] 118 | cut = simp_bd(band_dict=band_dict, diff=diff, side=side, width=width, 119 | pad=10, freq =0.9) 120 | 121 | out_bbox = [0, 0] + list(diff.size) 122 | side_dict = {"left":0, "right":2} 123 | out_bbox[side_dict[side]] = cut 124 | 125 | total_bbox = combine_bbox(orig_bbox,out_bbox) 126 | 127 | ex = 
example_image(orig=orig, diff=diff, angle=ang, band_dict=band_dict, 128 | bheight=bheight, total_bbox=total_bbox, orig_bbox=orig_bbox) 129 | 130 | out = os.path.join(output_dir,r["filename"].replace(".jp2","_BREAK.jpg")) 131 | ex.save(out, "JPEG") 132 | -------------------------------------------------------------------------------- /code/marginalia/marginalia_determination.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Thu Jun 27 13:30:26 2019 4 | 5 | @author: mtjansen 6 | """ 7 | 8 | import sys 9 | import os 10 | import csv 11 | import shutil 12 | import time 13 | import random 14 | 15 | from collections import Counter 16 | from PIL import Image, ImageChops, ImageStat 17 | from scipy.ndimage import interpolation as inter 18 | import numpy as np 19 | 20 | sys.path.append(os.path.abspath(r"C:\Users\mtjansen\Desktop\OnTheBooks")) 21 | from cropfunctions import * 22 | 23 | os.chdir(r"C:\Users\mtjansen\Desktop\OnTheBooks\1865-1968 jp2 files") 24 | 25 | ############################ 26 | # Get xml data from file. 
## 27 | ############################ 28 | master = [] 29 | 30 | with open(r"..\xmljpegmerge_official.csv", "r") as csvfile: 31 | reader = csv.DictReader(csvfile) 32 | for row in reader: 33 | row_dict = dict() 34 | row_dict["filename"] = row["filename"] + ".jp2" 35 | row_dict["side"] = row["handSide"].lower() 36 | row_dict["folder"] = row["filename"].split("_")[0]+"_jp2" 37 | row_dict["type"] = row['sectiontype'] 38 | row_dict["start_section"] = False 39 | master.append(row_dict) 40 | 41 | master = sorted(master, key = lambda i: i['filename']) 42 | 43 | for k in range(1,len(master)): 44 | if master[k]["type"] != master[k-1]["type"]: 45 | master[k]["start_section"] = True 46 | 47 | master = [m for m in master if "186465" not in m["filename"]] 48 | # Process metadata 49 | 50 | #test = random.sample(master,500) 51 | batch = master[80000:] 52 | meta = [] 53 | 54 | img_ct = 0 55 | start = time.time() 56 | for r in batch: 57 | #t0 = time.time() 58 | f = os.path.join(r["folder"],r["filename"]) 59 | orig = Image.open(f) 60 | 61 | side = r["side"] 62 | 63 | ang = rotation_angle(orig) 64 | 65 | if r["start_section"]: 66 | diff, background, orig_bbox = trim(orig, angle=ang, find_top=False) 67 | else: 68 | diff, background, orig_bbox = trim(orig, angle=ang) 69 | 70 | if "196" in r["folder"] or "195" in r["folder"]: 71 | total_bbox = orig_bbox 72 | cut = None 73 | else: 74 | bheight = 50 75 | band_dict = get_bands(diff, bheight=bheight) 76 | 77 | width = diff.size[0] 78 | cut = simp_bd(band_dict=band_dict, diff=diff, side=side, width=width, 79 | pad=10, freq =0.9) 80 | 81 | out_bbox = [0, 0] + list(diff.size) 82 | side_dict = {"left":0, "right":2} 83 | out_bbox[side_dict[side]] = cut 84 | 85 | total_bbox = combine_bbox(orig_bbox,out_bbox) 86 | 87 | meta_list = [r["filename"], ang, side, cut] 88 | meta_list.extend(background) 89 | meta_list.extend(total_bbox) 90 | meta.append(meta_list) 91 | img_ct +=1 92 | #print (r["filename"], time.time() - t0) 93 | if img_ct % 100 ==0: 94 | 
print(img_ct, time.time() - start) 95 | 96 | headers = ["file","angle","side","cut","backR","backG","backB", 97 | "bbox1","bbox2","bbox3","bbox4"] 98 | with open(r"..\marginalia_metadata_part2.csv","a",newline="") as outfile: 99 | writer=csv.writer(outfile) 100 | if outfile.tell() == 0: 101 | writer.writerow(headers) 102 | for row in meta: 103 | writer.writerow(row) 104 | -------------------------------------------------------------------------------- /code/marginalia/marginalia_removal.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Fri Jul 5 09:11:57 2019 4 | 5 | @author: mtjansen 6 | """ 7 | 8 | import csv 9 | import os 10 | from PIL import Image 11 | 12 | meta = [] 13 | with open(r"C:\Users\mtjansen\Desktop\OnTheBooks\marginalia_metadata.csv","r") as csvfile: 14 | reader = csv.DictReader(csvfile) 15 | for row in reader: 16 | meta.append(row) 17 | 18 | def remove_marginalia(img, meta, image_directory, file_output=False, output_directory = None): 19 | """Uses marginalia metadata to crop image and add border. 20 | 21 | Parameters: 22 | img (str): image file name with or without .jp2 file ending 23 | meta (list): list of dicts from "marginalia_metadata.csv" with keys: 24 | file: file path with extension 25 | angle: angle of rotation 26 | backR: Red channel of background color in RGB 27 | backG: Green channel of background color in RGB 28 | backB: Blue channel of background color in RGB 29 | bbox1: First coordinate of bounding box (left) 30 | bbox2: Second coordinate of bounding box (top) 31 | bbox3: Third coordinate of bounding box (right) 32 | bbox4: Fourth coordinate of bounding box (bottom) 33 | image_directory (str): path to directory containing volume subfolders e.g.
34 | 1865-1968 jp2 files\sessionlaws196365nort_jp2\sessionlaws196365nort_0000.jp2 35 | The path above maps to a single image, therefore the path to 36 | 1865-1968 jp2 files should be supplied to image_directory 37 | file_output (bool): whether to locally save a jpg version of the 38 | cropped image 39 | output_directory (str): path to directory to save output images if indicated 40 | by file_output. Directory structure will mirror the input directories 41 | in image_directory 42 | 43 | Returns: 44 | PIL.Image.Image: An image cropped as indicated in meta, with a 200 pixel wide 45 | border filled in with the supplied background color in meta. 46 | If file_output is selected, a jpg version of the cropped image will be saved 47 | to output_directory. 48 | """ 49 | 50 | try: 51 | if not img.endswith(".jp2"): 52 | img = img+".jp2" 53 | row = [r for r in meta if r["file"]==img][0] 54 | path = os.path.join(image_directory, 55 | row["file"].split("_")[0]+"_jp2", 56 | row["file"]) 57 | background = tuple([int(n) for n in [row["backR"],row["backG"], 58 | row["backB"]]]) 59 | bbox = tuple([int(n) for n in [row["bbox1"],row["bbox2"], 60 | row["bbox3"],row["bbox4"]]]) 61 | orig = Image.open(path) 62 | new = orig.rotate(float(row["angle"])).crop(bbox) 63 | outimg = Image.new(orig.mode, tuple(x+400 for x in new.size), background) 64 | offset = (200, 200) 65 | outimg.paste(new, offset) 66 | 67 | if file_output: 68 | # out = os.path.join(output_directory, 69 | # row["file"].split("_")[0]+"_jp2", 70 | # row["file"]) 71 | # if not (os.path.exists(os.path.split(out)[0])): 72 | # os.mkdir(os.path.split(out)[0]) 73 | out = os.path.join(output_directory, 74 | row["file"].replace(".jp2",".jpg")) 75 | outimg.save(out, "JPEG") 76 | 77 | return outimg 78 | except IndexError: 79 | print("Image not found in metadata") 80 | 81 | 82 | 83 | #Test 84 | image_dir = r"C:\Users\mtjansen\Desktop\OnTheBooks\1865-1968 jp2 files" 85 | output_dir = r"C:\Users\mtjansen\Desktop\OnTheBooks\out_width" 86 | 87 | #import random 88
| #test_set = random.sample(meta,500) 89 | 90 | outliers = [] 91 | with open(r"C:\Users\mtjansen\Desktop\OnTheBooks\outlier_metadata_width.csv","r") as csvfile: 92 | reader = csv.DictReader(csvfile) 93 | for row in reader: 94 | outliers.append(row) 95 | 96 | #test_set = [row for row in meta if row["file"] in ["publiclocallawsp1917nort_0568.jp2","publiclocallawsp1933nort_0063.jp2"]] 97 | # 98 | #test_set = random.sample(outliers,100) 99 | 100 | 101 | for row in outliers: 102 | img = remove_marginalia(img = row["file"], 103 | meta = outliers, 104 | image_directory = image_dir, 105 | file_output = True, 106 | output_directory = output_dir) 107 | # Manual equivalent of the remove_marginalia call above, kept for reference 108 | for row in outliers: 109 | path = os.path.join(image_dir, 110 | row["file"].split("_")[0]+"_jp2", 111 | row["file"]) 112 | out = os.path.join(output_dir, 113 | row["file"].replace(".jp2",".jpg")) 114 | 115 | background = tuple([int(n) for n in [row["backR"],row["backG"], 116 | row["backB"]]]) 117 | bbox = tuple([int(n) for n in [row["bbox1"],row["bbox2"], 118 | row["bbox3"],row["bbox4"]]]) 119 | 120 | orig = Image.open(path) 121 | new = orig.rotate(float(row["angle"])).crop(bbox) 122 | outimg = Image.new(orig.mode, tuple(x+400 for x in new.size), background) 123 | offset = (200, 200) 124 | outimg.paste(new, offset) 125 | outimg.save(out, "JPEG") -------------------------------------------------------------------------------- /code/ocr/adjRec.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Tue Jul 23 16:38:46 2019 5 | 6 | @author: Lorin Bruckner 7 | 8 | Digital Research Services 9 | University Libraries 10 | UNC Chapel Hill 11 | """ 12 | 13 | import os, sys 14 | import pandas 15 | from random import sample 16 | import csv 17 | 18 | #get ocr functions 19 | sys.path.insert(0, "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/OCR/") 20 | from ocr_func import cutMarg, OCRtestImg, testList 21
| 22 | def adjRec(vol, dirpath, masterlist, margdata, n): 23 | 24 | """ 25 | 26 | Get the best image adjustments to use on a volume. 27 | 28 | vol (str) : The name for the volume to be tested. Should not include 29 | "_jp2" (so for 1879, it's "lawsresolutionso1879nort") 30 | 31 | dirpath (str) : The directory path for the folder where ALL volumes are located 32 | 33 | masterlist (str) : The direct file path for xmljpegmerge_official.csv 34 | 35 | margdata (str) : The direct file path for the csv with marginalia data 36 | 37 | n (int) : The sample size to use for testing 38 | 39 | """ 40 | 41 | #Merge csvs (renamed so the local variable doesn't shadow the csv module) 42 | mastercsv = pandas.read_csv(masterlist) 43 | margcsv = pandas.read_csv(margdata) 44 | mastercsv["filename"] = mastercsv["filename"] + ".jp2" 45 | merged = mastercsv.merge(margcsv, left_on="filename", right_on="file") 46 | 47 | #Create a pool of image filenames for the volume and take a sample 48 | pool = [] 49 | csvf = merged[merged["filename"].str.startswith(vol)].set_index("filename") 50 | 51 | for row in csvf.itertuples(): 52 | pool.append(os.path.normpath(os.path.join(dirpath, vol + "_jp2/" + row.file))) 53 | 54 | pool = sample(pool, n) 55 | 56 | #Get images for files in sample, cut margins and make a test list 57 | imgs = [] 58 | results = [] 59 | 60 | for img in pool: 61 | 62 | #get image name 63 | name = os.path.split(img)[1] 64 | 65 | #get values for cutting margins 66 | rotate = csvf.loc[name]["angle"] 67 | left = csvf.loc[name]["bbox1"] 68 | up = csvf.loc[name]["bbox2"] 69 | right = csvf.loc[name]["bbox3"] 70 | lower = csvf.loc[name]["bbox4"] 71 | bkgcol = (csvf.loc[name]["backR"], csvf.loc[name]["backG"], csvf.loc[name]["backB"]) 72 | 73 | #cut the margins 74 | img = cutMarg(img = img, rotate = rotate, left = left, up = up, right = right, 75 | lower = lower, border = 200, bkgcol = bkgcol) 76 | 77 | #add the new image to the list 78 | imgs.append(img) 79 | 80 | #perform an OCR test on the new image and add the results to the list 81 | 
results.append(OCRtestImg(img)) 82 | 83 | #create a testList object with the images and results 84 | testSample = testList(imgs, results) 85 | 86 | #set up a dict of recommended adjustments and perform tests 87 | adjustments = { "volume": vol, "color": 1.0, "invert": False, 88 | "autocontrast": 0, "blur": False, "sharpen": False, 89 | "smooth": False, "xsmooth": False } 90 | 91 | #color test 92 | testRes = testSample.adjustTest("color", levels = [1,.75,.5,.25,0]) 93 | best = float(testRes["best_adjustment"].replace("color", "")) 94 | if best != 1.0: 95 | testSample = testSample.adjustImg(color = best) 96 | adjustments["color"] = best 97 | 98 | #invert test 99 | # testRes = testSample.adjustTest("invert") 100 | # if testRes["best_adjustment"] == "invertTrue": 101 | # testSample = testSample.adjustImg(invert = True) 102 | # adjustments["invert"] = True 103 | 104 | #autocontrast test 105 | testRes = testSample.adjustTest("autocontrast", levels = [0,2,4,6,8]) 106 | best = float(testRes["best_adjustment"].replace("autocontrast", "")) 107 | if best != 0.0: 108 | testSample = testSample.adjustImg(autocontrast = best) 109 | adjustments["autocontrast"] = best 110 | 111 | #blur test 112 | testRes = testSample.adjustTest("blur") 113 | if testRes["best_adjustment"] == "blurTrue": 114 | testSample = testSample.adjustImg(blur = True) 115 | adjustments["blur"] = True 116 | 117 | #sharpen test 118 | testRes = testSample.adjustTest("sharpen") 119 | if testRes["best_adjustment"] == "sharpenTrue": 120 | testSample = testSample.adjustImg(sharpen = True) 121 | adjustments["sharpen"] = True 122 | 123 | #smooth test 124 | testRes = testSample.adjustTest("smooth") 125 | if testRes["best_adjustment"] == "smoothTrue": 126 | testSample = testSample.adjustImg(smooth = True) 127 | adjustments["smooth"] = True 128 | 129 | #xsmooth test 130 | testRes = testSample.adjustTest("xsmooth") 131 | if testRes["best_adjustment"] == "xsmoothTrue": 132 | adjustments["xsmooth"] = True 133 | 134 | return
adjustments 135 | 136 | 137 | ########### Set up locations ############################################### 138 | 139 | dirpath = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/1865-1968 jp2 files/" 140 | masterlist = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/xmljpegmerge_official.csv" 141 | margdata = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/marginalia_metadata_part2_fix.csv" 142 | 143 | 144 | ########### Recommend Adjustments for a Single Volume ###################### 145 | 146 | adj1943 = adjRec("sessionlawsresol1943nort", dirpath, masterlist, margdata, 10) 147 | 148 | 149 | ########### Create a CSV with Adjustment Specs for all Volumes ############## 150 | 151 | savfile = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/output/adjustments.csv" 152 | 153 | for folder in os.listdir(dirpath): 154 | 155 | if folder == ".DS_Store": 156 | continue 157 | 158 | #get volume 159 | vol = folder.replace("_jp2", "") 160 | print("Testing " + vol + "...") 161 | 162 | #perform adjustment tests 163 | adjRow = adjRec(vol, dirpath, masterlist, margdata, 10) 164 | 165 | #record adjustments (write the header only on the first row) 166 | with open(savfile, "a", newline="") as f: 167 | w = csv.DictWriter(f, adjRow.keys()) 168 | if f.tell() == 0: 169 | w.writeheader() 170 | w.writerow(adjRow) -------------------------------------------------------------------------------- /code/ocr/geonames.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Fri May 31 16:46:58 2019 5 | 6 | @author: Lorin Bruckner 7 | 8 | Digital Research Services 9 | University Libraries 10 | UNC Chapel Hill 11 | """ 12 | 13 | import pandas 14 | from nltk import word_tokenize 15 | 16 | #Read in the tab delimited file from http://download.geonames.org/export/dump/US.zip 17 | #File was downloaded 5/31/19, 4:41 PM 18 | gn =
pandas.read_csv("/Users/tuesday/Documents/_Projects/Research/OnTheBooks/US/US.txt", sep="\t", header = None) 19 | 20 | #Filter records for North Carolina 21 | ncgn = gn[gn.loc[:,10] == "NC"] 22 | 23 | #Collect all geonames into a single string 24 | geonames = [] 25 | for index,row in ncgn.iterrows(): 26 | if isinstance(row[2], str): 27 | geonames.append(row[2]) 28 | geonames = " ".join(geonames) 29 | 30 | #Tokenize geonames. Remove punctuation and single letters, lowercase, then drop duplicates. 31 | geotokens = word_tokenize(geonames) 32 | geotokens = [token for token in geotokens if token.isalpha()] 33 | geotokens = [token.lower() for token in geotokens] 34 | geotokens = list(dict.fromkeys(geotokens)) 35 | geotokens = [token for token in geotokens if len(token) > 1] 36 | 37 | #Create text file to add to Spell Checker 38 | with open("/Users/tuesday/Documents/_Projects/Research/OnTheBooks/geonames.txt", "w") as file: 39 | for token in geotokens: 40 | file.write(token + " ") -------------------------------------------------------------------------------- /code/ocr/ocr_use.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Mon Jul 22 15:18:26 2019 5 | 6 | @author: Lorin Bruckner 7 | 8 | Digital Research Services 9 | University Libraries 10 | UNC Chapel Hill 11 | """ 12 | 13 | import os, sys 14 | import pandas 15 | from datetime import datetime 16 | 17 | #get ocr functions 18 | sys.path.insert(0, "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/OCR/") 19 | from ocr_func import cutMarg, adjustImg, tsvOCR 20 | 21 | #Set up locations 22 | masterlist = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/xmljpegmerge_official.csv" 23 | margdata = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/marginalia_metadata_part2_fix.csv" 24 | adjdata = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/adjustments_fixed.csv" 25 | rootImgDir =
"/Users/tuesday/Documents/_Projects/Research/OnTheBooks/1865-1968 jp2 files/" 26 | outDir = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/output/" 27 | 28 | #Read csvs 29 | mastercsv = pandas.read_csv(masterlist) 30 | margcsv = pandas.read_csv(margdata) 31 | adjcsv = pandas.read_csv(adjdata) 32 | 33 | #Create column for volume 34 | for index, row in mastercsv.iterrows(): 35 | volume = row["filename"].split("_")[0] 36 | mastercsv.at[index, "volume"] = volume 37 | 38 | #Merge csvs 39 | mastercsv["filename"] = mastercsv["filename"] + ".jp2" 40 | mcsv = mastercsv.merge(margcsv, left_on="filename", right_on="file") 41 | fcsv = mcsv.merge(adjcsv, on = "volume", how = "right") 42 | 43 | #get separate volumes 44 | volsGrouped = fcsv.groupby("volume") 45 | vols = volsGrouped.groups.keys() 46 | 47 | #loop through volumes 48 | for vol in vols: 49 | 50 | print("") 51 | 52 | #create a folder for the volume in the output directory if it doesn't already exist 53 | newdir = os.path.normpath(os.path.join(outDir, vol)) 54 | if not os.path.exists(newdir): 55 | os.mkdir(newdir) 56 | 57 | #select rows for volume 58 | voldf = volsGrouped.get_group(vol) 59 | 60 | #get separate section types 61 | secsGrouped = voldf.groupby("sectiontype") 62 | secs = secsGrouped.groups.keys() 63 | 64 | #create separate OCR files for each section 65 | for sec in secs: 66 | 67 | #select rows for section type 68 | secsdf = secsGrouped.get_group(sec) 69 | 70 | print(datetime.now().strftime("%H:%M") + " Processing " + vol + " " + sec + "...") 71 | 72 | #Loop through section 73 | for row in secsdf.itertuples(): 74 | 75 | img = os.path.normpath(os.path.join(rootImgDir, vol + "_jp2", row.file)) 76 | 77 | #set up margin cutting 78 | cuts = {"rotate" : row.angle, 79 | "left" : row.bbox1, 80 | "up" : row.bbox2, 81 | "right" : row.bbox3, 82 | "lower" : row.bbox4, 83 | "border" : 200, 84 | "bkgcol" : (row.backR, row.backG, row.backB)} 85 | 86 | #set up image adjustment 87 | adjustments = {"color":
row.color, 88 | "autocontrast": row.autocontrast, 89 | "blur": row.blur, 90 | "sharpen": row.sharpen, 91 | "smooth": row.smooth, 92 | "xsmooth": row.xsmooth} 93 | 94 | #Record image adjustments 95 | adjf = open(os.path.normpath(os.path.join(outDir, vol, vol + "_adjustments.txt")), "w") 96 | adjf.write("IMAGE ADJUSTMENTS\n\n") 97 | for key, value in adjustments.items(): 98 | adjf.write("{}: {}\n" .format(key, value)) 99 | adjf.close() 100 | 101 | #OCR the image 102 | tsvOCR((adjustImg(cutMarg(img, **cuts), **adjustments)), 103 | savpath = os.path.normpath(os.path.join(outDir, vol, vol + "_" + sec + ".txt")), 104 | tsvfile = vol + "_" + sec + "_data.tsv") 105 | -------------------------------------------------------------------------------- /code/split_cleanup/00_initial_ch_sec_split.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Aug 11 14:14:58 2020 4 | 5 | @summary: This script parses raw OCR output files for section and chapter 6 | headers. It creates new versions of these raw files as well as new files 7 | that have been aggregated into sections. This script corresponds to the 8 | "Step 1. Initial Splitting Process" section of the Split Cleanup Jupyter 9 | notebook. 10 | 11 | @author: Rucha Dalwadi & Matt Jansen 12 | 13 | Digital Research Services 14 | University Libraries 15 | UNC Chapel Hill 16 | """ 17 | 18 | 19 | import pandas as pd 20 | import numpy as np 21 | import joblib 22 | import re 23 | import os 24 | 25 | 26 | 27 | def tsvparser(filename): 28 | """ 29 | 30 | Identifies chapters and sections within a raw OCR output .tsv file for 31 | a single volume. 32 | 33 | Assigns chapter and section identifiers to rows in the raw file and 34 | creates an aggregate pd.DataFrame object grouping text into 35 | individual sections. 36 | 37 | Outputs a new .csv version of the raw file and creates an initial .csv 38 | version of the aggregate file. 
39 | 40 | Arguments 41 | -------------------------------------------------------------------------- 42 | filename (str) : The filepath for a single volume's raw .tsv OCR 43 | output file. 44 | 45 | 46 | Returns 47 | -------------------------------------------------------------------------- 48 | N/A 49 | 50 | """ 51 | 52 | 53 | 54 | # Import the raw .tsv file into a pd.DataFrame object 55 | raw = pd.read_csv(filename) 56 | raw['text'] = raw['text'].replace(np.nan, '') 57 | 58 | # Add columns to dataframe and create lists to which identified chapter 59 | # and section headers will be appended 60 | # Set the variables for chapter and section that will be used to fill out 61 | # the above lists 62 | chapter = '' 63 | chapter_column = [] 64 | raw['chapter'] = '' 65 | section = '' 66 | section_column = [] 67 | raw['section'] = '' 68 | 69 | # Iterate through all rows in the raw file (word by word) 70 | for i in range(0, raw.shape[0]): 71 | 72 | # Initialize as variables the regex patterns used to identify 73 | # chapters (match_chapter), abbreviated section headers 74 | # (match_section), and unabbreviated "Section 1" sections 75 | # (match_section1) 76 | match_chapter = re.match('^(C|O)[A-Za-z]*(R|r)(\.|,|:|;)*$', raw.iloc[i]['text']) 77 | match_section = re.match('(S|s)[a-zA-Z]{2,3}(\.|,|:|;){0,2}$', raw.iloc[i]['text']) 78 | match_section1 = re.match('S[a-zA-Z]+$', raw.iloc[i]['text']) 79 | 80 | # Create a matching condition to check for three blank rows above 81 | # potential matches 82 | blank3 = (re.match('^$', raw.iloc[i-1]['text']) and 83 | re.match('^$', raw.iloc[i-2]['text']) and 84 | re.match('^$', raw.iloc[i-3]['text'])) 85 | 86 | # The following conditional statements check for situations that 87 | # indicate the beginning of a new chapter, a new section, or a new 88 | # first section. If any of these are satisfied, the 'chapter' or 89 | # 'section' variable is changed accordingly.
Once all conditionals have 90 | # been checked, the resulting 'chapter' and 'section' values for the 91 | # word in question are added to their respective lists. 92 | # The results are two lists, one for each column, with a chapter and 93 | # section value for each row in the raw file. 94 | 95 | # Check for new chapters 96 | if (match_chapter and 97 | re.search('[0-9.]+(\.|,|:|;){0,2}', raw.iloc[i+1]['text']) and 98 | blank3): 99 | chapter = raw.iloc[i]['text'] +' '+ raw.iloc[i+1]['text'] 100 | 101 | # Check for new abbreviated sections 102 | if ((match_section and re.search('^[0-9.\}]+(\.|,|:|;){0,2}$', raw.iloc[i+1]['text'])) or 103 | (match_section and blank3)): 104 | section = raw.iloc[i]['text'] +' '+ raw.iloc[i+1]['text'] 105 | 106 | # Check for new unabbreviated "Section 1" sections 107 | if (match_section1 and 108 | re.search('^(1|.)(\.|,|:|;){0,2}$', raw.iloc[i+1]['text'])): 109 | section = raw.iloc[i]['text'] +' '+ raw.iloc[i+1]['text'] 110 | 111 | # Set the "section" value to blank for areas of text belonging 112 | # to a chapter title and not an actual section 113 | if (match_chapter and 114 | re.search('[0-9.]+(\.|,|:|;){0,2}', raw.iloc[i+1]['text']) and 115 | blank3 != raw.iloc[i]['chapter']): 116 | section = '' 117 | 118 | # Add the resulting 'section' and 'chapter' values to their respective 119 | # lists. 120 | section_column.append(section) 121 | chapter_column.append(chapter) 122 | 123 | # Once all words in the raw file have been checked, add the lists as 124 | # columns to the raw dataframe. 
125 | raw['chapter'] = chapter_column 126 | raw['section'] = section_column 127 | 128 | # Add a chapter index to differentiate duplicate chapter headers 129 | raw["chapter_index"] = ((raw["chapter"].notna()) & (raw["chapter"]!=raw["chapter"].shift(1))).cumsum() 130 | 131 | # Add cell values for special cases 132 | raw.loc[((raw["chapter"]=="") & (raw["section"]=="")), ["chapter","section"]] = "Paratextual" 133 | raw.loc[((raw["chapter"]!="") & (raw["section"]=="")), ["section"]] = "Chapter_Title" 134 | raw.loc[((raw["chapter"]=="") & (raw["section"]!="")), ["chapter"]] = "Chapter_UNKNOWN" 135 | 136 | # Create the aggregate dataframe grouping words by their identified 137 | # section assignments 138 | agg = raw[raw["text"]!=""].groupby(['chapter', 'section', 'chapter_index'], sort=False)['text'].apply(' '.join).reset_index() 139 | 140 | # Output the raw and aggregate dataframes as .csv files 141 | raw_outname = os.path.join("outputs","raw",filename.replace(".tsv",'') + "_output.csv") 142 | agg_outname = os.path.join("outputs","agg",filename.replace(".tsv",'') + "_aggregated_output.csv") 143 | 144 | raw.to_csv(raw_outname, index=False, encoding="utf-8-sig") 145 | agg.to_csv(agg_outname, index=False, encoding="utf-8-sig") 146 | 147 | 148 | def main(): 149 | 150 | # Set OCR output file directory path 151 | ocr_path = "." 152 | 153 | # Create directories for new raw/agg files 154 | parse_output_path = "."
155 | os.makedirs(os.path.join(parse_output_path,"outputs","agg"),exist_ok=True) 156 | os.makedirs(os.path.join(parse_output_path,"outputs","raw"),exist_ok=True) 157 | 158 | # Create a list of all raw OCR output files in corpus 159 | listdir = [f for f in os.listdir(ocr_path) if f.endswith(".tsv")] 160 | 161 | # Run "tsvparser" function in parallel to decrease compute time 162 | with joblib.parallel_backend(n_jobs=7,backend='loky'): 163 | joblib.Parallel(verbose=5)(joblib.delayed(tsvparser)(filename) for filename in listdir) 164 | 165 | 166 | if __name__ == "__main__": 167 | main() -------------------------------------------------------------------------------- /code/split_cleanup/01_auto_chap_clean1.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Thu Aug 6 10:28:19 2020 4 | 5 | @summary: This script corresponds to the "Step 2. Chapter Cleanup: 6 | First Automatic Pass" section of the Split Cleanup Jupyter notebook 7 | documentation. This script generates a group of excel files, one for each 8 | volume, with chapter errors identified and with certain suggested 9 | corrections ("chapnumflags" files). These files are then utilized in the 10 | "Step 3. Chapter Cleanup: First Manual Pass" section of the Split 11 | Cleanup Jupyter notebook. 
12 | 13 | @author: Neil Byers 14 | 15 | Digital Research Services 16 | University Libraries 17 | UNC Chapel Hill 18 | """ 19 | 20 | 21 | import csv 22 | import pandas as pd 23 | import os 24 | from string import punctuation 25 | import numpy as np 26 | import xlsxwriter 27 | csv.field_size_limit(600000) 28 | 29 | # Create a variable to store all automatic fix/recommendation data for each volume 30 | # This will later be used to output a report file for this step 31 | meta_list=[] 32 | 33 | 34 | def initial_chap_fixes(agg_folder, agg_file): 35 | """ 36 | This function identifies chapter split numbering errors, suggests corrections 37 | for certain situations, and outputs a volume-level list of chapters with 38 | potential errors and suggested corrections flagged for manual review. 39 | The function does not provide any return values. Instead, it outputs a 40 | single Excel file for each volume and adds volume-level metadata to the 41 | corpus-level report list ("meta_list") 42 | 43 | Arguments 44 | -------------------------------------------------------------------------- 45 | agg_folder (str) : The string filepath for the directory containing 46 | the corpus "aggregate" files 47 | agg_file (str) : The string base file name for an individual 48 | volume's "aggregate" file 49 | 50 | Returns 51 | -------------------------------------------------------------------------- 52 | N/A 53 | 54 | """ 55 | 56 | # Create path string variables and import the agg file into a Pandas dataframe 57 | inpath = os.path.join(agg_folder, agg_file) 58 | 59 | outpath = inpath.replace("chap_adjusted_agg", "chap_num_flags") 60 | outpath = outpath.replace("aggregated_chapadjusted.csv", "chapnumflags.xlsx") 61 | 62 | vol_df = pd.read_csv(inpath, encoding = 'utf-8-sig') 63 | 64 | # Create lists to be converted to series for a chapter-level dataframe that 65 | # will be exported as an excel file 66 | chap_headers = [] 67 | chap_indices = [] 68 | chap_nums_raw = [] 69 | 70 | 71 | # Populate the above lists
72 | if vol_df.loc[0,"chapter"]=="Paratextual": 73 | for idx, row in vol_df.iterrows(): 74 | if idx > 0: 75 | if vol_df.loc[idx,"chapter"] != vol_df.loc[idx-1,"chapter"]: 76 | chap_headers.append(vol_df.loc[idx,"chapter"]) 77 | chap_indices.append(vol_df.loc[idx,"chapter_index"]) 78 | else: 79 | for idx, row in vol_df.iterrows(): 80 | if idx==0: 81 | chap_headers.append(vol_df.loc[idx,"chapter"]) 82 | chap_indices.append(vol_df.loc[idx,"chapter_index"]) 83 | elif vol_df.loc[idx,"chapter"] != vol_df.loc[idx-1,"chapter"]: 84 | chap_headers.append(vol_df.loc[idx,"chapter"]) 85 | chap_indices.append(vol_df.loc[idx,"chapter_index"]) 86 | for i in range(0,len(chap_headers)): 87 | 88 | try: 89 | chapter_num = chap_headers[i].split()[1] 90 | chapter_num = chapter_num.rstrip(punctuation) 91 | chap_nums_raw.append(chapter_num) 92 | except IndexError: 93 | chap_nums_raw.append("N/A") 94 | 95 | # Convert the above lists to series 96 | # Create a stable list of the original numbering for 97 | # comparison purposes 98 | raw_titles = pd.Series(chap_headers) 99 | indices_Series = pd.Series(chap_indices) 100 | unq_ch = pd.Series(pd.to_numeric(chap_nums_raw, errors="coerce")) 101 | orig_num = unq_ch.copy() 102 | 103 | 104 | 105 | # Complete all lag3, lag2, and lag1 fixes 106 | for row_num in range(3,len(unq_ch)-3): 107 | if unq_ch[row_num]!= (unq_ch[row_num+1]-1) and unq_ch[row_num]!= (unq_ch[row_num-1]+1): 108 | lag1_test = (unq_ch[row_num+1]-unq_ch[row_num-1])==2 109 | lag2_test = (unq_ch[row_num+2]-unq_ch[row_num-2])==4 and unq_ch[row_num+2]-unq_ch[row_num+1]==1 110 | lag3_test = (unq_ch[row_num+3]-unq_ch[row_num-3])==6 and unq_ch[row_num+3]-unq_ch[row_num+2]==1 111 | 112 | if lag1_test and lag2_test and lag3_test: 113 | unq_ch[row_num] = unq_ch[row_num-3]+3 114 | 115 | for row_num in range(2,len(unq_ch)-2): 116 | if unq_ch[row_num]!= (unq_ch[row_num+1]-1) and unq_ch[row_num]!= (unq_ch[row_num-1]+1): 117 | lag1_test = (unq_ch[row_num+1]-unq_ch[row_num-1])==2 118 | lag2_test =
(unq_ch[row_num+2]-unq_ch[row_num-2])==4 and unq_ch[row_num+2]-unq_ch[row_num+1]==1 119 | if lag1_test and lag2_test: 120 | unq_ch[row_num] = unq_ch[row_num-2]+2 121 | 122 | for row_num in range(1,len(unq_ch)-1): 123 | if unq_ch[row_num]!= (unq_ch[row_num+1]-1) and unq_ch[row_num]!= (unq_ch[row_num-1]+1): 124 | lag1_test = (unq_ch[row_num+1]-unq_ch[row_num-1])==2 125 | if lag1_test: 126 | unq_ch[row_num] = unq_ch[row_num-1]+1 127 | 128 | # Parse chapter rows in groups of 5 to flag areas with potential errors 129 | # Mark those chapters that were corrected by the lag fix steps above 130 | max_diff = unq_ch.diff(1).rolling(window=5, center=True).max() 131 | min_diff = unq_ch.diff(1).rolling(window=5, center=True).min() 132 | flag = ~((max_diff == min_diff) & (max_diff == 1)) 133 | corrected = np.logical_and(unq_ch != orig_num, np.isnan(unq_ch)==False) 134 | 135 | # Compile dataframe to be exported as an excel file 136 | output = pd.concat([raw_titles, orig_num, indices_Series, unq_ch, corrected, flag], axis=1) 137 | output.columns = ['chap_title', 'raw_num', 'chapter_index', 'corrected_num', 'correction_made', 'flag'] 138 | 139 | # Create an excel workbook from the above dataframe, add formatting to make 140 | # corrections and errors more easily findable, and save. 141 | with pd.ExcelWriter(outpath, engine='xlsxwriter') as writer: 142 | 143 | # create workbook object 144 | workbook = writer.book 145 | 146 | header_format = workbook.add_format({'bold': True, 147 | 'valign': 'vcenter', 148 | 'border': 1, 149 | 'bg_color': '#e2efda', 150 | 'font_size': 14}) 151 | flag_format = workbook.add_format({'bg_color': '#f8cbad'}) 152 | corrected_format = workbook.add_format({'bg_color': '#b7dee8'}) 153 | 154 | 155 | # Convert the dataframe to an XlsxWriter Excel object. 
156 | output.to_excel(writer, sheet_name='op', index=False, startrow=1, header=False) 157 | outputSheet = writer.sheets['op'] 158 | for col_num, value in enumerate(output.columns.values): 159 | outputSheet.write(0, col_num, value, header_format) 160 | outputSheet.set_column(0, 0, 13) 161 | outputSheet.set_column(1, 1, 10) 162 | outputSheet.set_column(2, 2, 16) 163 | outputSheet.set_column(3, 3, 18) 164 | outputSheet.set_column(4, 4, 9) 165 | 166 | outputSheet.conditional_format('F2:F'+str(output.shape[0]+1), {'type': 'cell', 167 | 'criteria': 'equal to', 168 | 'value': True, 169 | 'format': flag_format}) 170 | outputSheet.conditional_format('E2:E'+str(output.shape[0]+1), {'type': 'cell', 171 | 'criteria': 'equal to', 172 | 'value': True, 173 | 'format': corrected_format}) 174 | 175 | # no explicit save needed: the ExcelWriter context manager saves on exit 176 | 177 | 178 | # Add fix metadata for the volume in question to the corpus-level list 179 | # This list will be saved as a report .csv file 180 | try: 181 | corrections = corrected.value_counts()[1] 182 | except KeyError: # no corrections were made 183 | corrections = 0 184 | meta_list.append({"agg_file":agg_file, 185 | "chap_count":output.shape[0], 186 | "flags":flag.value_counts()[1], 187 | "corrections":corrections}) 188 | 189 | 190 | 191 | def main(): 192 | # Set the filepath variable for the directory containing the corpus 193 | # aggregate files 194 | agg_filelist = os.listdir(r"C:\Users\npbyers\Desktop\OTB\ChapNumFixes\chap_adjusted_agg") 195 | agg_folder = "./chap_adjusted_agg/" 196 | 197 | # Perform chapter fix/report operations for each volume using the 198 | # "initial_chap_fixes" function 199 | for agg_file in agg_filelist: 200 | initial_chap_fixes(agg_folder, agg_file) 201 | 202 | # Compile the corpus-level report for this step and output it to a .csv file 203 | meta = pd.DataFrame(meta_list) 204 | meta.to_csv("chap_nums_check.csv") 205 | 206 | if __name__ == "__main__": 207 | main() -------------------------------------------------------------------------------- 
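The lag fixes in the script above apply the same test at three window widths. As a minimal, self-contained sketch of the narrowest ("lag1") case — toy data invented for illustration, not part of the pipeline — a chapter number that agrees with neither neighbor, while its neighbors sit exactly 2 apart, is treated as an OCR misread and replaced with its predecessor plus 1:

```python
# Toy demonstration of the "lag1" chapter-number repair used above.
# The value 9 stands in for an OCR misread of chapter 4.
import pandas as pd

nums = pd.Series([1, 2, 3, 9, 5, 6, 7], dtype="float64")

for i in range(1, len(nums) - 1):
    # out of sequence with both neighbors?
    if nums[i] != nums[i + 1] - 1 and nums[i] != nums[i - 1] + 1:
        # neighbors bracket exactly one missing value?
        if nums[i + 1] - nums[i - 1] == 2:
            nums[i] = nums[i - 1] + 1  # repair the middle value

print(nums.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

The rolling-window `flag` computed afterwards then marks any five-chapter stretch whose first differences are not uniformly 1, so sequences the lag fixes could not repair are still routed to a human reviewer.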
/code/split_cleanup/03_gen_manual_chapfix_files.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Fri Aug 7 08:44:09 2020 4 | 5 | @summary: This script creates the files used in "Step 5. Chapter Cleanup: 6 | Second Manual Pass" (see the Split Cleanup Jupyter notebook). These 7 | files, called 'flag_rows' files, consist of all rows from a given volume's 8 | raw file which belong to chapters that remain 'flagged' - in other words, 9 | those chapters in the volume that are identified as being in the vicinity 10 | of chapter numbering errors. It places these new files along with the 11 | 'chapnumflags' files from previous steps in volume-specific directories to 12 | enable quick action by manual reviewers. 13 | 14 | 15 | @author: Neil Byers 16 | 17 | Digital Research Services 18 | University Libraries 19 | UNC Chapel Hill 20 | """ 21 | 22 | 23 | import pandas as pd 24 | import os 25 | import numpy as np 26 | import joblib 27 | import shutil 28 | 29 | def create_manual_files(raw_fix_pair): 30 | """ 31 | This function generates files containing only those rows in a raw file 32 | that belong to chapters which remain 'flagged' after the first rounds of 33 | manual and automatic chapter numbering error corrections. One "flag_rows" 34 | file is generated for each volume with remaining chapter errors. These files 35 | are intended for use by manual reviewers to aid them in cleaning the chapter 36 | errors that could not be fixed automatically. 37 | 38 | The "flag_rows" files contain chapter, section, text, and chapter_index 39 | information for each row (word). Volume metadata and Internet Archive 40 | jpeg/pdf urls for each page are also included. These final pieces of information 41 | allow manual reviewers to quickly access page images to aid them 42 | in correcting errors. 
Finally, the raw file index location for each row is 43 | added so that any changes made to the "flag_rows" file can be re-integrated 44 | into new versions of the raw files. 45 | 46 | Once the "flag_rows" file is compiled, it is output as a .csv file. A version 47 | of the final 'chapnumfixes' file for each affected volume is copied into the 48 | same directory so that manual reviewers will have access to all necessary 49 | files in one location. 50 | 51 | Arguments 52 | -------------------------------------------------------------------------- 53 | raw_fix_pair (list) : List with the string filepaths for both the 54 | raw file and "chapnumfixes" file for a 55 | given volume. 56 | 57 | Returns 58 | -------------------------------------------------------------------------- 59 | N/A 60 | """ 61 | 62 | # load files & create dataframes 63 | 64 | rawfile = raw_fix_pair[0] 65 | fixfile = raw_fix_pair[1] 66 | volume = (os.path.basename(rawfile)) 67 | volume = volume.replace("_output_chapadjusted_rd2.csv", "") 68 | 69 | raw_df = pd.read_csv(rawfile, encoding='utf-8', low_memory=False) 70 | fix_df = pd.read_excel(fixfile) # read_excel no longer accepts an encoding argument 71 | 72 | # Identify raw file rows assigned to "flagged" chapters and compile a new 73 | # dataframe containing only these rows. 
74 | if (fix_df['flag']==True).any(): 75 | 76 | raw_df['chapter'] = raw_df['chapter'].replace(np.nan, '') 77 | raw_df['section'] = raw_df['section'].replace(np.nan, '') 78 | raw_df['text'] = raw_df['text'].replace(np.nan, '') 79 | raw_df['chapter_index'] = raw_df['chapter_index'].replace(np.nan, '') 80 | raw_df['flag'] = False 81 | 82 | for i in range(0, fix_df.shape[0]): 83 | idx = fix_df.iloc[i]["chapter_index"] 84 | if fix_df.iloc[i]['flag']: 85 | raw_df.loc[raw_df["chapter_index"]==idx, "flag"] = True 86 | 87 | flag_df = raw_df[raw_df['flag']==True].copy() 88 | 89 | # Add IA urls to flag_rows file 90 | flag_df["vol"] = flag_df["name"].str.split(pat = "_") 91 | flag_df["vol"] = flag_df["vol"].apply(lambda x: x[0]) 92 | flag_df['img_num'] = flag_df["name"].str.split(pat = "_") 93 | flag_df['img_num'] = flag_df['img_num'].apply(lambda x: x[1].replace(".jp2", "")) 94 | flag_df["jpg_url"] = "https://archive.org/download/" + flag_df["vol"] + "/" + flag_df["vol"] + "_jp2.zip/" + flag_df["vol"] + "_jp2%2F" + flag_df["name"] + "&ext=jpg" 95 | flag_df["pdf_url"] = "https://archive.org/download/" + flag_df["vol"] + "/" + flag_df["vol"] + ".pdf#page=" + flag_df['img_num'] 96 | 97 | 98 | flag_df = flag_df[['text', 'name', 'chapter', 'chapter_index', 'section', 'jpg_url', 'pdf_url']] 99 | 100 | 101 | # Output flag_rows file 102 | # Copy the chapnumfixes file to the same location 103 | outname = volume + "_flag_rows.csv" 104 | 105 | outdir = "./manual_fixes/" + volume 106 | if not os.path.exists(outdir): 107 | os.mkdir(outdir) 108 | 109 | fullname = os.path.join(outdir, outname) 110 | 111 | flag_df.to_csv(fullname, index_label="rawfile_index") 112 | shutil.copy2(fixfile, outdir) 113 | 114 | def main(): 115 | # Set directories for raw and "chapnumfixes" files 116 | raw_path = r"C:\Users\npbyers\Desktop\OTB\ChapNumFixes\chap_adjusted_raw_round2" 117 | fix_path = r"C:\Users\npbyers\Desktop\OTB\ChapNumFixes\chap_num_fixes_final" 118 | 119 | rawfolder = "./chap_adjusted_raw_round2/" 120 | 
fixfolder = "./chap_num_fixes_final/" 121 | 122 | # Create sorted filepath lists for both sets of files so raw/fix pairs match by volume 123 | raw_filelist = sorted((rawfolder + f) for f in os.listdir(raw_path) if f.endswith(".csv")) 124 | fix_filelist = sorted((fixfolder + f) for f in os.listdir(fix_path) if f.endswith(".xlsx")) 125 | 126 | # Create a list of pairs, each containing the path for a raw file 127 | # and "chapnumfixes" file for a given volume 128 | raw_fix_pairs = [] 129 | for i in range(0, len(raw_filelist)): 130 | raw_fix_pairs.append([raw_filelist[i], fix_filelist[i]]) 131 | 132 | # Run the 'create_manual_files' function in parallel to reduce compute time 133 | with joblib.parallel_backend(n_jobs=7,backend='loky'): 134 | joblib.Parallel(verbose=5)(joblib.delayed(create_manual_files)(pair) for pair in raw_fix_pairs) 135 | 136 | 137 | if __name__ == "__main__": 138 | main() -------------------------------------------------------------------------------- /code/split_cleanup/06_gen_final_agg.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Fri Aug 7 10:44:11 2020 4 | 5 | @summary: This script generates a final round of aggregate files from the raw 6 | files that resulted from the automatic and manual section-cleaning 7 | processes. In the Split Cleanup Jupyter notebook, this script corresponds 8 | to "Step 8. Generating Final Files / Remaining Error Appraisal" 9 | 10 | @author: Neil Byers 11 | 12 | Digital Research Services 13 | University Libraries 14 | UNC Chapel Hill 15 | """ 16 | 17 | 18 | import csv 19 | import pandas as pd 20 | import os 21 | import numpy as np 22 | import joblib 23 | csv.field_size_limit(600000) 24 | 25 | 26 | def generate_new(raw_file): 27 | """ 28 | This function generates new aggregate files to reflect changes made in the 29 | manual section error correction process. 
The final versions of these files 30 | contain recalculated chapter and section indices, the aggregated text of 31 | each section, and Internet Archive urls for each section's first page. 32 | 33 | Arguments 34 | -------------------------------------------------------------------------- 35 | raw_file (str) : The "raw" file path for a single volume 36 | 37 | Returns 38 | -------------------------------------------------------------------------- 39 | N/A 40 | """ 41 | 42 | raw_df = pd.read_csv(raw_file, encoding='utf-8', low_memory=False) 43 | 44 | raw_df['chapter'] = raw_df['chapter'].replace(np.nan, '') 45 | raw_df['section'] = raw_df['section'].replace(np.nan, '') 46 | raw_df['text'] = raw_df['text'].replace(np.nan, '') 47 | 48 | # reset chapter_index in raw files 49 | raw_df["chapter_index"] = (raw_df.chapter != raw_df.chapter.shift(1)).cumsum() 50 | raw_df['chapter_index'] = raw_df['chapter_index'].replace(np.nan, '') 51 | 52 | # reset section_index in raw files 53 | raw_df["section_index"] = (raw_df.section != raw_df.section.shift(1)).groupby(raw_df.chapter).cumsum() 54 | 55 | # Create a new aggregate dataframe with the jpeg image name on which 56 | # a given section begins included on each row (section) 57 | group_list = ['chapter', 'chapter_index', 'section', 'section_index'] 58 | agg_dict = {'text': ' '.join, 59 | 'name': 'first'} 60 | agg = raw_df[raw_df["text"]!=""].groupby(group_list, sort=False, as_index=False).agg(agg_dict) 61 | agg.rename(columns={"name":"first_jpeg"}, inplace=True) 62 | 63 | # Generate the Internet Archive jpeg/pdf urls for each section's start page 64 | # based on the page image file name. 
Each row (section) in the 64 | # aggregate file will thus be paired with its page image urls 65 | agg["vol"] = agg["first_jpeg"].str.split(pat = "_") 66 | agg["vol"] = agg["vol"].apply(lambda x: x[0]) 67 | agg['img_num'] = agg["first_jpeg"].str.split(pat = "_") 68 | agg['img_num'] = agg['img_num'].apply(lambda x: x[1].replace(".jp2", "")) 69 | agg["first_jpg_url"] = "https://archive.org/download/" + agg["vol"] + "/" + agg["vol"] + "_jp2.zip/" + agg["vol"] + "_jp2%2F" + agg["first_jpeg"] + "&ext=jpg" 70 | agg["pdf_url"] = "https://archive.org/download/" + agg["vol"] + "/" + agg["vol"] + ".pdf#page=" + agg['img_num'] 71 | 72 | 73 | 74 | 75 | # Remove extraneous columns from aggregate dataframe 76 | agg = agg.drop(columns=['first_jpeg', 'vol', 'img_num']) 77 | 78 | #output new raw and agg files to .csv 79 | raw_outname = raw_file.replace("_output.csv", "_output_final.csv") 80 | raw_outname = raw_outname.replace("/sec_clean/raw1/", "/sec_clean_final/raw/") 81 | agg_outname = raw_file.replace("_output.csv", "_aggregated_output_final.csv") 82 | agg_outname = agg_outname.replace("/sec_clean/raw1/", "/sec_clean_final/agg/") 83 | 84 | 85 | #output new raw/agg to file 86 | raw_df.to_csv(raw_outname, index=False, encoding="utf-8") 87 | agg.to_csv(agg_outname, index=False, encoding="utf-8") 88 | 89 | def main(): 90 | # Set directory path locations for raw files 91 | raw_path = r"C:\Users\npbyers\Desktop\OTB\SectNumFixes\sec_clean\raw1" 92 | rawfolder = "./sec_clean/raw1/" 93 | 94 | # Create a list of all raw files 95 | raw_filelist = [(rawfolder + f) for f in os.listdir(raw_path) if f.endswith(".csv")] 96 | 97 | # Create a new aggregate file using the 'generate_new' function. 98 | # This operation is run in parallel to reduce compute time. 
99 | with joblib.parallel_backend(n_jobs=7,backend='loky'): 100 | joblib.Parallel(verbose=5)(joblib.delayed(generate_new)(raw_file) for raw_file in raw_filelist) 101 | 102 | if __name__ == "__main__": 103 | main() -------------------------------------------------------------------------------- /code/split_cleanup/07_final_sec_appraisal.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Fri Aug 7 11:23:54 2020 4 | 5 | @summary: Creates two corpus-level report files on remaining section errors 6 | to aid in future error correction efforts. The first 7 | ('remaining_sec_errors.csv') consists of volume-level information about 8 | remaining section gaps. The second ('final_error_chap_rows.csv') contains 9 | section-level information for all chapters containing section numbering 10 | errors, as identified by section numbering 'gaps'. In the Split Cleanup 11 | Jupyter notebook, this script corresponds to "Step 8. Generating Final 12 | Files / Remaining Error Appraisal". 13 | 14 | 15 | @author: Neil Byers 16 | 17 | Digital Research Services 18 | University Libraries 19 | UNC Chapel Hill 20 | """ 21 | 22 | 23 | import csv 24 | import pandas as pd 25 | import os 26 | from string import punctuation 27 | import numpy as np 28 | import joblib 29 | csv.field_size_limit(600000) 30 | 31 | 32 | def error_check(raw_file): 33 | """ 34 | This function compiles information related to the section 'gaps' present in 35 | a single volume. This information (total sections, chapters containing errors, 36 | types of errors remaining, etc.) is then used to compile two corpus-level 37 | report files to aid in future rounds of manual review. One, the 'meta' section 38 | errors file, contains metadata related to the remaining errors (gaps) in each 39 | volume of the corpus. 
The second, 'final_error_chap_rows.csv' consists of 40 | rows for all sections of all chapters in the corpus which still contain 41 | sections preceded by 'gaps' after all of the previous cleanup steps. These 42 | files will be used in future rounds of manual and automatic review to complete 43 | the section cleanup process for the entire corpus. 44 | 45 | Arguments 46 | -------------------------------------------------------------------------- 47 | raw_file (str) : The "raw" file path for a single volume 48 | 49 | Returns 50 | -------------------------------------------------------------------------- 51 | report_row : A dictionary containing the title of the volume, 52 | the total number of sections, the total number of 53 | chapters, the number of remaining section errors, 54 | the number of chapters containing section errors, 55 | and a list of dictionaries, one for each section, 56 | for all sections in chapters that still contain 57 | sections with 'gaps'. 58 | """ 59 | 60 | # Read in raw file 61 | raw_df = pd.read_csv(raw_file, encoding='utf-8', low_memory=False) 62 | 63 | # eliminate np.nan from the raw dataframe 64 | raw_df['chapter'] = raw_df['chapter'].replace(np.nan, '') 65 | raw_df['section'] = raw_df['section'].replace(np.nan, '') 66 | raw_df['text'] = raw_df['text'].replace(np.nan, '') 67 | 68 | #reset chapter_index 69 | raw_df["chapter_index"] = (raw_df.chapter != raw_df.chapter.shift(1)).cumsum() 70 | raw_df['chapter_index'] = raw_df['chapter_index'].replace(np.nan, '') 71 | 72 | #Reset section index 73 | raw_df["section_index"] = (raw_df.section != raw_df.section.shift(1)).groupby(raw_df.chapter).cumsum() 74 | 75 | 76 | 77 | 78 | 79 | # Create a dataframe for the volume's sections, one per row 80 | sections = raw_df.loc[:, ['chapter', 'chapter_index', 'section', 'section_index']].drop_duplicates() 81 | # Extract the numeric values from the section headers and convert them to float 82 | sections['raw_num'] = sections['section'].apply(lambda x: 0 if x 
== "Chapter_Title" else (x.strip().split()[1].rstrip(punctuation) if " " in x.strip() and x.strip().split()[1].rstrip(punctuation).isnumeric() else np.nan)) 83 | sections.loc[sections["section"]=="Paratextual", "raw_num"] = 0 84 | sections['raw_num'] = sections['raw_num'].astype(float) 85 | 86 | # Create a groupby dataframe to group the sections by their respective chapters 87 | chapters = sections.groupby('chapter_index') 88 | 89 | # Generate the gap information for each section 90 | # This value indicates the numeric distance between a given section number 91 | # and that of the preceding section 92 | sections["gap"] = chapters.raw_num.diff(1) 93 | sections["gap"] = sections["gap"].replace(np.nan, 1) 94 | 95 | 96 | 97 | # Create a list of the unique gap values present in the volume 98 | gaps_remaining = sections['gap'].value_counts().keys().tolist() 99 | # Determine the frequencies of the above gap values 100 | gap_counts = sections['gap'].value_counts().tolist() 101 | 102 | # Create variables for frequencies of gaps of 2, gaps of 3, and gaps of any 103 | # other value with the exception of 1 104 | two_gaps_left = 0 105 | three_gaps_left = 0 106 | other_gaps_left = 0 107 | 108 | for i in range(0, len(gaps_remaining)): 109 | if gaps_remaining[i] == 2: 110 | two_gaps_left = gap_counts[i] 111 | elif gaps_remaining[i] == 3: 112 | three_gaps_left = gap_counts[i] 113 | elif gaps_remaining[i] != 1: 114 | other_gaps_left += gap_counts[i] 115 | 116 | 117 | # Calculate total sections in the volume, including those that are missing 118 | # "Other" gaps are excluded because these are often not actual gaps and 119 | # are likely to artificially inflate the section count variable. 120 | total_sections = sections.shape[0]+two_gaps_left+(2*three_gaps_left) 121 | total_chapters = len(chapters.groups.keys()) 122 | 123 | # Calculate 'errors remaining' to include missing chapters and 'other' errors, 124 | # as indicated by gaps with values other than 1, 2, or 3. 
125 | errors_remaining = two_gaps_left+(2*three_gaps_left)+other_gaps_left 126 | 127 | # Extract the volume title 128 | vol = os.path.basename(raw_file).replace("_data_cleaned_new.csv", "") 129 | 130 | 131 | 132 | # Create a dataframe that includes all sections for all chapters containing 133 | # sections with a gap value other than one, or zero in the case of Paratextuals 134 | # and Chapter_Titles 135 | error_chaps = [] 136 | total_error_chaps = 0 137 | for b in chapters.groups.keys(): 138 | chap_sects = chapters.get_group(b) 139 | for r in range(0,chap_sects.shape[0]): 140 | if (chap_sects.iloc[r]['gap']==0 and chap_sects.iloc[r]['section_index']!=1) or chap_sects.iloc[r]['gap']!=1: 141 | total_error_chaps += 1 142 | for s in range(0,chap_sects.shape[0]): 143 | error_chaps.append({"vol":vol, 144 | "ch_index":chap_sects.iloc[s]['chapter_index'], 145 | "ch_title":chap_sects.iloc[s]['chapter'], 146 | "sec_index":chap_sects.iloc[s]['section_index'], 147 | "sec_title":chap_sects.iloc[s]['section'], 148 | "gap":chap_sects.iloc[s]['gap']}) 149 | break 150 | 151 | 152 | # Compile the data below into a dictionary. This dictionary is then returned 153 | # by the function to be added as a row to a corpus-level report. 
Rows from 154 | # the 'error_chaps' list will be added to a corpus-level document containing 155 | # all sections of all chapters containing sections with remaining gaps 156 | report_row = {"vol":vol, 157 | "total_chapters":total_chapters, 158 | "total_sections":total_sections, 159 | "errors_remaining":errors_remaining, 160 | "error_chaps": total_error_chaps, 161 | "error_chaps_list": error_chaps} 162 | 163 | return report_row 164 | 165 | 166 | def main(): 167 | 168 | # Set raw file directory variables and create a list of all raw files 169 | raw_path = r"C:\Users\npbyers\Desktop\OTB\SectNumFixes\final\raw" 170 | rawfolder = "./final/raw/" 171 | raw_filelist = [(rawfolder + f) for f in os.listdir(raw_path) if f.endswith(".csv")] 172 | 173 | 174 | # Call the error_check function above, once for each volume, in parallel 175 | # to decrease compute time. 176 | with joblib.parallel_backend(n_jobs=7,backend='loky'): 177 | report_rows = joblib.Parallel(verbose=5)( 178 | joblib.delayed(error_check)(raw_file) for raw_file in raw_filelist) 179 | 180 | # Compile the .csv file with all sections from all chapters containing 181 | # sections with unusual gaps (gaps with values other than 1, or 0 in 182 | # certain cases) 183 | error_chap_master = [] 184 | for row in report_rows: 185 | for i in row['error_chaps_list']: 186 | error_chap_master.append(i) 187 | error_chap_df=pd.DataFrame(error_chap_master) 188 | error_chap_df.to_csv(r"C:\Users\npbyers\Desktop\OTB\SectNumFixes\final_error_chap_rows.csv", index=False) 189 | 190 | # Compile the .csv file with volume-level information about remaining errors 191 | # in the corpus as a whole. 
192 | report_df = pd.DataFrame(report_rows) 193 | meta_df = report_df.drop(columns='error_chaps_list') 194 | meta_df.to_csv(r"C:\Users\npbyers\Desktop\OTB\SectNumFixes\remaining_sec_errors.csv", index=False) 195 | 196 | 197 | if __name__ == "__main__": 198 | main() -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: oer-environment 2 | channels: 3 | - conda-forge 4 | - anaconda 5 | - defaults 6 | 7 | dependencies: 8 | - python 9 | - tesseract 10 | - pytesseract 11 | - pillow 12 | - pip 13 | - pip: 14 | - geopandas 15 | - internetarchive 16 | - matplotlib 17 | - nltk 18 | - pandas 19 | - pillow 20 | - pyspellchecker 21 | - requests 22 | -------------------------------------------------------------------------------- /examples/adjustment_recommendation/adjRec.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Tue Jul 23 16:38:46 2019 5 | 6 | @author: Lorin Bruckner 7 | 8 | Digital Research Services 9 | University Libraries 10 | UNC Chapel Hill 11 | """ 12 | 13 | import os, sys 14 | import pandas as pandas 15 | from random import sample 16 | import csv 17 | 18 | #get ocr functions 19 | sys.path.append(os.path.abspath("./")) 20 | from ocr_func import cutMarg, OCRtestImg, testList 21 | 22 | def adjRec(vol, dirpath, masterlist, margdata, n): 23 | 24 | """ 25 | 26 | Get the best image adjustments to use on a volume. 27 | 28 | vol (str) : The name for the volume to be tested. 
Should not include 29 | "_jp2" (so for 1879, it's "lawsresolutionso1879nort") 30 | 31 | dirpath (str) : The directory path for the folder where ALL volumes are located 32 | 33 | masterlist (str) : The direct file path for xmljpegmerge_official.csv 34 | 35 | margdata (str) : The direct file path for the csv with marginalia data 36 | 37 | n (int) : The sample size to use for testing 38 | 39 | """ 40 | 41 | #Merge csvs (use "merged" rather than "csv" to avoid shadowing the csv module) 42 | mastercsv = pandas.read_csv(masterlist) 43 | margcsv = pandas.read_csv(margdata) 44 | mastercsv["filename"] = mastercsv["filename"] + ".jp2" 45 | merged = mastercsv.merge(margcsv, left_on="filename", right_on="file") 46 | 47 | #Create a pool of image filenames for the volume and take a sample 48 | pool = [] 49 | csvf = merged[merged["filename"].str.startswith(vol)].set_index("filename") 50 | 51 | for row in csvf.itertuples(): 52 | pool.append(os.path.normpath(os.path.join(dirpath, vol + "_jp2/" + row.file))) 53 | 54 | pool = sample(pool, n) 55 | 56 | #Get images for files in sample, cut margins and make a test list 57 | imgs = [] 58 | results = [] 59 | 60 | for img in pool: 61 | 62 | #get image name 63 | name = os.path.split(img)[1] 64 | 65 | #get values for cutting margins 66 | rotate = csvf.loc[name]["angle"] 67 | left = csvf.loc[name]["bbox1"] 68 | up = csvf.loc[name]["bbox2"] 69 | right = csvf.loc[name]["bbox3"] 70 | lower = csvf.loc[name]["bbox4"] 71 | bkgcol = (csvf.loc[name]["backR"], csvf.loc[name]["backG"], csvf.loc[name]["backB"]) 72 | 73 | #cut the margins 74 | img = cutMarg(img = img, rotate = rotate, left = left, up = up, right = right, 75 | lower = lower, border = 200, bkgcol = bkgcol) 76 | 77 | #add the new image to the list 78 | imgs.append(img) 79 | 80 | #perform an OCR test on the new image and add the results to the list 81 | results.append(OCRtestImg(img)) 82 | 83 | #create a testList object with the images and results 84 | testSample = testList(imgs, results) 85 | 86 | #set up a dict of recommended adjustments and perform tests 87 | 
adjustments = { "volume": vol, "color": 1.0, "invert": False, 88 | "autocontrast": 0, "blur": False, "sharpen": False, 89 | "smooth": False, "xsmooth": False } 90 | 91 | #color test 92 | testRes = testSample.adjustTest("color", levels = [1,.75,.5,.25,0]) 93 | best = float(testRes["best_adjustment"].replace("color", "")) 94 | if best != 1.0: 95 | testSample = testSample.adjustSampleImgs(color = best) 96 | adjustments["color"] = best 97 | 98 | #invert test 99 | # testRes = testSample.adjustTest("invert") 100 | # if testRes["best_adjustment"] == "invertTrue": 101 | # testSample = testSample.adjustImg(invert = True) 102 | # adjustments["invert"] = True 103 | 104 | #autocontrast test 105 | testRes = testSample.adjustTest("autocontrast", levels = [0,2,4,6,8]) 106 | best = float(testRes["best_adjustment"].replace("autocontrast", "")) 107 | if best != 0.0: 108 | testSample = testSample.adjustSampleImgs(autocontrast = best) 109 | adjustments["autocontrast"] = best 110 | 111 | #blur test 112 | testRes = testSample.adjustTest("blur") 113 | if testRes["best_adjustment"] == "blurTrue": 114 | testSample = testSample.adjustSampleImgs(blur = True) 115 | adjustments["blur"] = True 116 | 117 | #sharpen test 118 | testRes = testSample.adjustTest("sharpen") 119 | if testRes["best_adjustment"] == "sharpenTrue": 120 | testSample = testSample.adjustSampleImgs(sharpen = True) 121 | adjustments["sharpen"] = True 122 | 123 | #smooth test 124 | testRes = testSample.adjustTest("smooth") 125 | if testRes["best_adjustment"] == "smoothTrue": 126 | testSample = testSample.adjustSampleImgs(smooth = True) 127 | adjustments["smooth"] = True 128 | 129 | #xsmooth test 130 | testRes = testSample.adjustTest("xsmooth") 131 | if testRes["best_adjustment"] == "xsmoothTrue": 132 | testSample = testSample.adjustSampleImgs(xsmooth = True) 133 | adjustments["xsmooth"] = True 134 | 135 | return adjustments 136 | 137 | 138 | ########### Set up locations ############################################### 139 | 140 | 
dirpath = "./images" 141 | masterlist = "sample_metadata.csv" 142 | margdata = "marginalia_metadata_demo.csv" 143 | 144 | 145 | ########### Recommend Adjustments for a Single Volume ###################### 146 | 147 | adjDEMO = adjRec("lawsresolutionso1891nort", dirpath, masterlist, margdata, 3) 148 | 149 | 150 | ########### Create a CSV with Adjustment Specs for all Volumes ############## 151 | 152 | savfile = "/Users/tuesday/Documents/_Projects/Research/OnTheBooks/output/adjustmentsDEMO.csv" 153 | 154 | for folder in os.listdir(dirpath): 155 | 156 | if folder == ".DS_Store": 157 | continue 158 | 159 | #get volume 160 | vol = folder.replace("_jp2", "") 161 | print("Testing " + vol + "...") 162 | 163 | #perform adjustment tests 164 | adjRow = adjRec(vol, dirpath, masterlist, margdata, 10) 165 | 166 | #record adjustments 167 | with open(savfile, "a") as f: 168 | w = csv.DictWriter(f, adjRow.keys()) 169 | if f.tell() == 0: 170 | w.writeheader() 171 | w.writerow(adjRow) 172 | else: 173 | w.writerow(adjRow) -------------------------------------------------------------------------------- /examples/adjustment_recommendation/adjusted.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/adjusted.png -------------------------------------------------------------------------------- /examples/adjustment_recommendation/example_image.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon Jul 8 08:16:57 2019 4 | 5 | @author: mtjansen, npbyers 6 | """ 7 | 8 | from PIL import Image, ImageDraw 9 | import os 10 | import sys 11 | 12 | sys.path.append(os.path.abspath("./")) 13 | from cropfunctions import * 14 | 15 | def bandimage(orig, angle, band_dict, bheight, orig_bbox): 16 | bd_ct = len(band_dict["band_bboxes"]) 17 | back_height = 
bd_ct*(bheight+20)+100 18 | img = orig.copy().rotate(angle).crop(orig_bbox) 19 | bounds = Image.new(orig.mode, orig.size, "white") 20 | draw = ImageDraw.Draw(bounds) 21 | 22 | for row in band_dict["band_bboxes"]: 23 | band_bbox = list(row["raw"]) 24 | band_bbox[1] = row["index"]-50 25 | band_bbox[3] = row["index"] 26 | band = combine_bbox(orig_bbox,band_bbox) 27 | spot = tuple(list(band)[0:2]) 28 | bounds.paste(img.crop(band_bbox),spot) 29 | if list(band_bbox)[2]-list(band_bbox)[0]>10: 30 | draw.rectangle(band,outline="red",fill = None,width=2) 31 | 32 | return bounds 33 | 34 | def diffbands(diff, band_dict, cut, bheight): 35 | cdiff = diff.convert(mode="RGB") 36 | 37 | bd_ct = len(band_dict["band_bboxes"]) 38 | back_height = bd_ct*(bheight+20)+100 39 | bounds = Image.new(cdiff.mode, cdiff.size, "white") 40 | drawBands = ImageDraw.Draw(bounds) 41 | 42 | 43 | for row in band_dict["band_bboxes"]: 44 | band_bbox = list(row["raw"]) 45 | band_bbox[1] = row["index"]-50 46 | band_bbox[3] = row["index"] 47 | spot = tuple(list(band_bbox)[0:2]) 48 | bounds.paste(diff.crop(band_bbox),spot) 49 | if list(band_bbox)[2]-list(band_bbox)[0]>10: 50 | drawBands.rectangle(band_bbox,outline="#fc8003", fill = None,width=2) 51 | 52 | drawCut = ImageDraw.Draw(bounds) 53 | drawCut.line((cut,0, cut, bounds.size[1]),fill ="#0f03fc",width = 7) 54 | 55 | return bounds 56 | 57 | def bandsdisplay(bandimages): 58 | back_width = (bandimages[0].size[0]+bandimages[1].size[0]+bandimages[2].size[0]+200) 59 | back_height = (max([bandimages[0].size[1], bandimages[1].size[1], bandimages[2].size[1]])+100) 60 | back = Image.new(bandimages[0].mode, (back_width,back_height), "white") 61 | 62 | back.paste(bandimages[0],(50,50)) 63 | back.paste(bandimages[1],(bandimages[0].size[0]+100,50)) 64 | back.paste(bandimages[2],(bandimages[0].size[0]+bandimages[1].size[0]+150,50)) 65 | 66 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 67 | 68 | 69 | return rs 70 | 71 | 72 | def comparison1(orig, diff, 
angle, orig_bbox): 73 | img = orig.copy().convert(mode="RGBA") 74 | box=Image.new('RGBA', (img.size[0],img.size[1])) 75 | d = ImageDraw.Draw(box) 76 | d.rectangle(orig_bbox,outline="red",fill = None,width=5) 77 | w=box.rotate(-angle) 78 | 79 | superimpose = Image.new('RGBA', (img.size[0],img.size[1])) 80 | superimpose.paste(img, (0,0)) 81 | superimpose.paste(w, (0,0), mask=w) 82 | 83 | back_width = (img.size[0]+diff.size[0]+150) 84 | back_height = (img.size[1]+100) 85 | back = Image.new(img.mode, (back_width,back_height), "white") 86 | draw = ImageDraw.Draw(img) 87 | 88 | back.paste(superimpose,(50,50)) 89 | back.paste(diff,(superimpose.size[0]+100,orig_bbox[1]+50)) 90 | 91 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 92 | return rs 93 | 94 | def comparison2(band, final): 95 | back_width = (band.size[0]+final.size[0]+150) 96 | back_height = (band.size[1]+100) 97 | back = Image.new(final.mode, (back_width,back_height), "white") 98 | 99 | back.paste(band,(50,50)) 100 | back.paste(final,(band.size[0]+100,50)) 101 | 102 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 103 | return rs 104 | 105 | def origdisplay(orig1, orig2, orig3): 106 | back_width = (orig1.size[0]+orig2.size[0]+orig3.size[0]+200) 107 | back_height = (max([orig1.size[1], orig2.size[1], orig3.size[1]])+100) 108 | back = Image.new(orig1.mode, (back_width,back_height), "white") 109 | 110 | back.paste(orig1,(50,50)) 111 | back.paste(orig2,(orig1.size[0]+100,50)) 112 | back.paste(orig3,(orig1.size[0]+orig2.size[0]+150,50)) 113 | 114 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 115 | return rs 116 | 117 | def diffdisplay(diff1, diff2, diff3): 118 | img1 = diff1.copy().convert(mode="RGBA") 119 | img2 = diff2.copy().convert(mode="RGBA") 120 | img3 = diff3.copy().convert(mode="RGBA") 121 | back_width = (img1.size[0]+img2.size[0]+img3.size[0]+200) 122 | back_height = (max([img1.size[1], img2.size[1], img3.size[1]])+100) 123 | back = Image.new(img1.mode, 
(back_width,back_height), "white") 124 | 125 | back.paste(img1,(50,50)) 126 | back.paste(img2,(img1.size[0]+100,50)) 127 | back.paste(img3,(img1.size[0]+img2.size[0]+150,50)) 128 | 129 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 130 | return rs 131 | 132 | def finalsdisplay(finalimages): 133 | #IN USE 134 | back_width = (finalimages[0].size[0]+finalimages[1].size[0]+finalimages[2].size[0]+200) 135 | back_height = (max([finalimages[0].size[1], finalimages[1].size[1], finalimages[2].size[1]])+100) 136 | back = Image.new(finalimages[0].mode, (back_width,back_height), "white") 137 | 138 | back.paste(finalimages[0],(50,50)) 139 | back.paste(finalimages[1],(finalimages[0].size[0]+100,50)) 140 | back.paste(finalimages[2],(finalimages[0].size[0]+finalimages[1].size[0]+150,50)) 141 | 142 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 143 | return rs 144 | -------------------------------------------------------------------------------- /examples/adjustment_recommendation/geonames.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Fri May 31 16:46:58 2019 5 | 6 | @author: Lorin Bruckner 7 | 8 | Digital Research Services 9 | University Libraries 10 | UNC Chapel Hill 11 | """ 12 | 13 | import pandas 14 | from nltk import word_tokenize 15 | 16 | #Read in the tab delimited file from http://download.geonames.org/export/dump/US.zip 17 | #File was downloaded 5/31/19, 4:41 PM 18 | gn = pandas.read_csv("/Users/tuesday/Documents/_Projects/Research/OnTheBooks/US/US.txt", sep ="\t", header = None) 19 | 20 | #Filter records for North Carolina 21 | ncgn = gn[gn.loc[:,10] == "NC"] 22 | 23 | #Dump all geonames into a single string 24 | geonames = "" 25 | for index,row in ncgn.iterrows(): 26 | if type(row[2]) is str: 27 | geonames = geonames + " " + row[2] 28 | 29 | #Tokenize geonames. Remove punctuation, duplicates and single letters. 
Make lowercase. 30 | geotokens = word_tokenize(geonames) 31 | geotokens = [token for token in geotokens if token.isalpha()] 32 | geotokens = list(dict.fromkeys(geotokens)) 33 | geotokens = [token for token in geotokens if len(token) > 1] 34 | geotokens = [token.lower() for token in geotokens] 35 | 36 | #Create text file to add to Spell Checker 37 | with open("/Users/tuesday/Documents/_Projects/Research/OnTheBooks/geonames.txt", "w") as file: 38 | for token in geotokens: 39 | file.write(token + " ") -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0272.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0272.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0374.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0374.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0542.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0542.jpg 
-------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0606.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0606.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0771.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0771.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0944.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0944.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1114.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1114.jpg -------------------------------------------------------------------------------- 
/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1210.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1210.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1373.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1373.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1494.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1494.jpg -------------------------------------------------------------------------------- /examples/adjustment_recommendation/marginalia_metadata_demo.csv: -------------------------------------------------------------------------------- 1 | file,angle,side,cut,backR,backG,backB,bbox1,bbox2,bbox3,bbox4 2 | lawsresolutionso1891nort_0272.jpg,0,left,350,229,212,185,427,337,1915,2960 3 | lawsresolutionso1891nort_0374.jpg,0,left,350,228,212,187,446,352,1927,2964 4 | lawsresolutionso1891nort_0542.jpg,0,left,350,228,212,186,472,337,1957,2954 5 | lawsresolutionso1891nort_0606.jpg,0,left,353,227,211,184,459,318,1944,3225 6 | 
lawsresolutionso1891nort_0771.jpg,0,right,1484,214,198,174,46,345,1530,2510 7 | lawsresolutionso1891nort_0944.jpg,0,left,350,232,215,192,488,326,1974,2945 8 | lawsresolutionso1891nort_1114.jpg,0,left,350,231,215,193,504,327,1989,2947 9 | lawsresolutionso1891nort_1210.jpg,0,left,350,232,217,197,490,296,1976,2909 10 | lawsresolutionso1891nort_1373.jpg,0,right,1478,216,201,178,30,316,1508,2958 11 | lawsresolutionso1891nort_1494.jpg,-0.25,left,358,232,216,195,483,390,1965,3045 12 | -------------------------------------------------------------------------------- /examples/adjustment_recommendation/sample_metadata.csv: -------------------------------------------------------------------------------- 1 | filename,leafNum,handSide,page,sectiontype,sectiontitle,fileUrl 2 | lawsresolutionso1891nort_0272,272,LEFT,226,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0272.jpg 3 | lawsresolutionso1891nort_0374,374,LEFT,328,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0374.jpg 4 | lawsresolutionso1891nort_0542,542,LEFT,496,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0542.jpg 5 | lawsresolutionso1891nort_0606,606,LEFT,558,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0606.jpg 6 | lawsresolutionso1891nort_0771,771,RIGHT,723,private laws,Private Laws of the State of North Carolina Session 
1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0771.jpg 7 | lawsresolutionso1891nort_0944,944,LEFT,896,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0944.jpg 8 | lawsresolutionso1891nort_1114,1114,LEFT,1066,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1114.jpg 9 | lawsresolutionso1891nort_1210,1210,LEFT,1162,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1210.jpg 10 | lawsresolutionso1891nort_1373,1373,RIGHT,1325,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1373.jpg 11 | lawsresolutionso1891nort_1494,1494,LEFT,1446,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1494.jpg 12 | -------------------------------------------------------------------------------- /examples/adjustment_recommendation/unadjusted.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/adjustment_recommendation/unadjusted.png -------------------------------------------------------------------------------- 
/examples/marginalia_determination/example_image.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon Jul 8 08:16:57 2019 4 | 5 | @author: mtjansen, npbyers 6 | """ 7 | 8 | from PIL import Image, ImageDraw 9 | import os 10 | import sys 11 | 12 | sys.path.append(os.path.abspath("./")) 13 | from cropfunctions import * 14 | 15 | def bandimage(orig, angle, band_dict, bheight, orig_bbox): 16 | bd_ct = len(band_dict["band_bboxes"]) 17 | back_height = bd_ct*(bheight+20)+100 18 | img = orig.copy().rotate(angle).crop(orig_bbox) 19 | bounds = Image.new(orig.mode, orig.size, "white") 20 | draw = ImageDraw.Draw(bounds) 21 | 22 | for row in band_dict["band_bboxes"]: 23 | band_bbox = list(row["raw"]) 24 | band_bbox[1] = row["index"]-50 25 | band_bbox[3] = row["index"] 26 | band = combine_bbox(orig_bbox,band_bbox) 27 | spot = tuple(list(band)[0:2]) 28 | bounds.paste(img.crop(band_bbox),spot) 29 | if list(band_bbox)[2]-list(band_bbox)[0]>10: 30 | draw.rectangle(band,outline="red",fill = None,width=2) 31 | 32 | return bounds 33 | 34 | def diffbands(diff, band_dict, cut, bheight): 35 | cdiff = diff.convert(mode="RGB") 36 | 37 | bd_ct = len(band_dict["band_bboxes"]) 38 | back_height = bd_ct*(bheight+20)+100 39 | bounds = Image.new(cdiff.mode, cdiff.size, "white") 40 | drawBands = ImageDraw.Draw(bounds) 41 | 42 | 43 | for row in band_dict["band_bboxes"]: 44 | band_bbox = list(row["raw"]) 45 | band_bbox[1] = row["index"]-50 46 | band_bbox[3] = row["index"] 47 | spot = tuple(list(band_bbox)[0:2]) 48 | bounds.paste(diff.crop(band_bbox),spot) 49 | if list(band_bbox)[2]-list(band_bbox)[0]>10: 50 | drawBands.rectangle(band_bbox,outline="#fc8003", fill = None,width=2) 51 | 52 | drawCut = ImageDraw.Draw(bounds) 53 | drawCut.line((cut,0, cut, bounds.size[1]),fill ="#0f03fc",width = 7) 54 | 55 | return bounds 56 | 57 | def bandsdisplay(bandimages): 58 | back_width = 
(bandimages[0].size[0]+bandimages[1].size[0]+bandimages[2].size[0]+200) 59 | back_height = (max([bandimages[0].size[1], bandimages[1].size[1], bandimages[2].size[1]])+100) 60 | back = Image.new(bandimages[0].mode, (back_width,back_height), "white") 61 | 62 | back.paste(bandimages[0],(50,50)) 63 | back.paste(bandimages[1],(bandimages[0].size[0]+100,50)) 64 | back.paste(bandimages[2],(bandimages[0].size[0]+bandimages[1].size[0]+150,50)) 65 | 66 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 67 | 68 | 69 | return rs 70 | 71 | 72 | def comparison1(orig, diff, angle, orig_bbox): 73 | img = orig.copy().convert(mode="RGBA") 74 | box=Image.new('RGBA', (img.size[0],img.size[1])) 75 | d = ImageDraw.Draw(box) 76 | d.rectangle(orig_bbox,outline="red",fill = None,width=5) 77 | w=box.rotate(-angle) 78 | 79 | superimpose = Image.new('RGBA', (img.size[0],img.size[1])) 80 | superimpose.paste(img, (0,0)) 81 | superimpose.paste(w, (0,0), mask=w) 82 | 83 | back_width = (img.size[0]+diff.size[0]+150) 84 | back_height = (img.size[1]+100) 85 | back = Image.new(img.mode, (back_width,back_height), "white") 86 | draw = ImageDraw.Draw(img) 87 | 88 | back.paste(superimpose,(50,50)) 89 | back.paste(diff,(superimpose.size[0]+100,orig_bbox[1]+50)) 90 | 91 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 92 | return rs 93 | 94 | def comparison2(band, final): 95 | back_width = (band.size[0]+final.size[0]+150) 96 | back_height = (band.size[1]+100) 97 | back = Image.new(final.mode, (back_width,back_height), "white") 98 | 99 | back.paste(band,(50,50)) 100 | back.paste(final,(band.size[0]+100,50)) 101 | 102 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 103 | return rs 104 | 105 | def origdisplay(orig1, orig2, orig3): 106 | back_width = (orig1.size[0]+orig2.size[0]+orig3.size[0]+200) 107 | back_height = (max([orig1.size[1], orig2.size[1], orig3.size[1]])+100) 108 | back = Image.new(orig1.mode, (back_width,back_height), "white") 109 | 110 | 
back.paste(orig1,(50,50)) 111 | back.paste(orig2,(orig1.size[0]+100,50)) 112 | back.paste(orig3,(orig1.size[0]+orig2.size[0]+150,50)) 113 | 114 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 115 | return rs 116 | 117 | def diffdisplay(diff1, diff2, diff3): 118 | img1 = diff1.copy().convert(mode="RGBA") 119 | img2 = diff2.copy().convert(mode="RGBA") 120 | img3 = diff3.copy().convert(mode="RGBA") 121 | back_width = (img1.size[0]+img2.size[0]+img3.size[0]+200) 122 | back_height = (max([img1.size[1], img2.size[1], img3.size[1]])+100) 123 | back = Image.new(img1.mode, (back_width,back_height), "white") 124 | 125 | back.paste(img1,(50,50)) 126 | back.paste(img2,(img1.size[0]+100,50)) 127 | back.paste(img3,(img1.size[0]+img2.size[0]+150,50)) 128 | 129 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 130 | return rs 131 | 132 | def finalsdisplay(finalimages): 133 | #IN USE 134 | back_width = (finalimages[0].size[0]+finalimages[1].size[0]+finalimages[2].size[0]+200) 135 | back_height = (max([finalimages[0].size[1], finalimages[1].size[1], finalimages[2].size[1]])+100) 136 | back = Image.new(finalimages[0].mode, (back_width,back_height), "white") 137 | 138 | back.paste(finalimages[0],(50,50)) 139 | back.paste(finalimages[1],(finalimages[0].size[0]+100,50)) 140 | back.paste(finalimages[2],(finalimages[0].size[0]+finalimages[1].size[0]+150,50)) 141 | 142 | rs = back.resize((int(back.size[0]/5),int(back.size[1]/5))) 143 | return rs 144 | -------------------------------------------------------------------------------- /examples/marginalia_determination/lawsresolutionso1891nort_jp2/lawsresolutionso1891nort_0697.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/marginalia_determination/lawsresolutionso1891nort_jp2/lawsresolutionso1891nort_0697.jpg 
-------------------------------------------------------------------------------- /examples/marginalia_determination/lawsresolutionso1891nort_jp2/lawsresolutionso1891nort_0715.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/marginalia_determination/lawsresolutionso1891nort_jp2/lawsresolutionso1891nort_0715.jpg -------------------------------------------------------------------------------- /examples/marginalia_determination/lawsresolutionso1891nort_jp2/lawsresolutionso1891nort_0716.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/marginalia_determination/lawsresolutionso1891nort_jp2/lawsresolutionso1891nort_0716.jpg -------------------------------------------------------------------------------- /examples/marginalia_determination/output/marginalia_metadata_demo.csv: -------------------------------------------------------------------------------- 1 | file,angle,side,cut,backR,backG,backB,bbox1,bbox2,bbox3,bbox4 2 | lawsresolutionso1891nort_0697.jp2,0.0,right,1470,214,197,169,52,316,1522,1570 3 | lawsresolutionso1891nort_0697.jp2,-0.75,right,1485,206,185,154,50,427,1535,3153 4 | lawsresolutionso1891nort_0697.jp2,0.0,left,350,225,205,176,494,352,1980,2972 5 | -------------------------------------------------------------------------------- /examples/marginalia_determination/sample_metadata.csv: -------------------------------------------------------------------------------- 1 | filename,leafNum,handSide,page,sectiontype,sectiontitle,fileUrl 2 | lawsresolutionso1891nort_0697,697,RIGHT,649,public laws,Public Laws of the State of North Carolina Session 
1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0697.jpg 3 | lawsresolutionso1891nort_0715,715,RIGHT,667,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0715.jpg 4 | lawsresolutionso1891nort_0716,716,LEFT,668,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0716.jpg 5 | -------------------------------------------------------------------------------- /examples/ocr/adjustments_demo.csv: -------------------------------------------------------------------------------- 1 | volume,color,invert,autocontrast,blur,sharpen,smooth,xsmooth 2 | lawsresolutionso1891nort,0.75,False,4,False,False,False,False 3 | -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0272.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0272.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0374.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0374.jpg -------------------------------------------------------------------------------- 
/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0542.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0542.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0606.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0606.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0771.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0771.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0944.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_0944.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1114.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1114.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1210.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1210.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1373.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1373.jpg -------------------------------------------------------------------------------- /examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1494.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/images/lawsresolutionso1891nort_jpg/lawsresolutionso1891nort_1494.jpg -------------------------------------------------------------------------------- /examples/ocr/marginalia_metadata_demo.csv: -------------------------------------------------------------------------------- 1 | file,angle,side,cut,backR,backG,backB,bbox1,bbox2,bbox3,bbox4 2 | lawsresolutionso1891nort_0272.jpg,0,left,350,229,212,185,427,337,1915,2960 3 | lawsresolutionso1891nort_0374.jpg,0,left,350,228,212,187,446,352,1927,2964 4 | 
lawsresolutionso1891nort_0542.jpg,0,left,350,228,212,186,472,337,1957,2954 5 | lawsresolutionso1891nort_0606.jpg,0,left,353,227,211,184,459,318,1944,3225 6 | lawsresolutionso1891nort_0771.jpg,0,right,1484,214,198,174,46,345,1530,2510 7 | lawsresolutionso1891nort_0944.jpg,0,left,350,232,215,192,488,326,1974,2945 8 | lawsresolutionso1891nort_1114.jpg,0,left,350,231,215,193,504,327,1989,2947 9 | lawsresolutionso1891nort_1210.jpg,0,left,350,232,217,197,490,296,1976,2909 10 | lawsresolutionso1891nort_1373.jpg,0,right,1478,216,201,178,30,316,1508,2958 11 | lawsresolutionso1891nort_1494.jpg,-0.25,left,358,232,216,195,483,390,1965,3045 12 | -------------------------------------------------------------------------------- /examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_adjustments.txt: -------------------------------------------------------------------------------- 1 | IMAGE ADJUSTMENTS 2 | 3 | color: 0.75 4 | autocontrast: 4 5 | blur: False 6 | sharpen: False 7 | smooth: False 8 | xsmooth: False 9 | -------------------------------------------------------------------------------- /examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_private laws.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_private laws.txt -------------------------------------------------------------------------------- /examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_private laws_data.tsv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_private laws_data.tsv 
-------------------------------------------------------------------------------- /examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_public laws.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_public laws.txt -------------------------------------------------------------------------------- /examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_public laws_data.tsv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/ocr/output/lawsresolutionso1891nort/lawsresolutionso1891nort_public laws_data.tsv -------------------------------------------------------------------------------- /examples/ocr/xmljpegmerge_demo.csv: -------------------------------------------------------------------------------- 1 | filename,leafNum,handSide,page,sectiontype,sectiontitle,fileUrl 2 | lawsresolutionso1891nort_0272,272,LEFT,226,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0272.jpg 3 | lawsresolutionso1891nort_0374,374,LEFT,328,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0374.jpg 4 | lawsresolutionso1891nort_0542,542,LEFT,496,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0542.jpg 5 | 
lawsresolutionso1891nort_0606,606,LEFT,558,public laws,Public Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0606.jpg 6 | lawsresolutionso1891nort_0771,771,RIGHT,723,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0771.jpg 7 | lawsresolutionso1891nort_0944,944,LEFT,896,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_0944.jpg 8 | lawsresolutionso1891nort_1114,1114,LEFT,1066,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1114.jpg 9 | lawsresolutionso1891nort_1210,1210,LEFT,1162,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1210.jpg 10 | lawsresolutionso1891nort_1373,1373,RIGHT,1325,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1373.jpg 11 | lawsresolutionso1891nort_1494,1494,LEFT,1446,private laws,Private Laws of the State of North Carolina Session 1891,https://archive.org/download/lawsresolutionso1891nort/lawsresolutionso1891nort_jp2.zip/lawsresolutionso1891nort_jp2%2Flawsresolutionso1891nort_1494.jpg 12 | -------------------------------------------------------------------------------- 
/examples/split_cleanup/1899_public_chapnumflags_step4.csv: -------------------------------------------------------------------------------- 1 | chap_title,chapter_index,raw_num,corrected_num,correction_made,flag,gap 2 | CHAPTER 1.,2,1,1,FALSE,TRUE, 3 | CHaprer 1—2.,3,,,FALSE,TRUE, 4 | CHAPTER 2.,4,2,2,FALSE,TRUE, 5 | CHAPTER 3.,5,3,3,FALSE,TRUE, 6 | CHAPTER 4.,6,4,4,FALSE,TRUE, 7 | CHAPTER 5.,7,5,5,FALSE,TRUE, 8 | CHAPTER 6.,8,6,6,FALSE,TRUE, 9 | CHAPTER 6&8.,9,,,FALSE,TRUE, 10 | CHAPTER 9.,10,9,9,FALSE,TRUE, 11 | CHAPTER 10.,11,10,10,FALSE,TRUE, 12 | CHAPTER 14.,12,14,14,FALSE,TRUE,3 13 | CHAPTER 15.,13,15,15,FALSE,TRUE, 14 | CHAPTER 16.,14,16,16,FALSE,TRUE, 15 | CHAPTER 17.,15,17,17,FALSE,FALSE, 16 | CHAPTER 18.,16,18,18,FALSE,FALSE, 17 | CHAPTER 19.,17,19,19,FALSE,FALSE, 18 | CHAPTER 20.,18,20,20,FALSE,FALSE, 19 | CHAPTER 21.,19,21,21,FALSE,FALSE, 20 | CHAPTER 22.,20,22,22,FALSE,FALSE, 21 | CHAPTER 23.,21,23,23,FALSE,FALSE, 22 | CHAPTER 24.,22,24,24,FALSE,FALSE, 23 | CHAPTER 25.,23,25,25,FALSE,FALSE, 24 | CHAPTER 26.,24,26,26,FALSE,FALSE, 25 | CHAPTER 27.,25,27,27,FALSE,FALSE, 26 | CHAPTER 28.,26,28,28,FALSE,FALSE, 27 | CHAPTER 29.,27,29,29,FALSE,FALSE, 28 | CHAPTER 30.,28,30,30,FALSE,FALSE, 29 | CHAPTER 31.,29,31,31,FALSE,FALSE, 30 | CHAPTER 382.,30,382,32,TRUE,FALSE, 31 | CHAPTER 33.,31,33,33,FALSE,FALSE, 32 | CHAPTER 34.,32,34,34,FALSE,FALSE, 33 | CHAPTER 35.,33,35,35,FALSE,FALSE, 34 | -------------------------------------------------------------------------------- /examples/split_cleanup/1899_public_chapnumflags_step5.csv: -------------------------------------------------------------------------------- 1 | chap_title,raw_num,chapter_index,corrected_num,correction_made,flag 2 | CHAPTER 1.0,1,2,1,FALSE,TRUE 3 | CHaprer 1—2.,,3,,FALSE,TRUE 4 | CHAPTER 2.0,2,4,2,FALSE,TRUE 5 | CHAPTER 3.0,3,5,3,FALSE,TRUE 6 | CHAPTER 4.0,4,6,4,FALSE,TRUE 7 | CHAPTER 5.0,5,7,5,FALSE,TRUE 8 | CHAPTER 6.0,6,8,6,FALSE,TRUE 9 | CHAPTER 6&8.,,9,,FALSE,TRUE 10 | CHAPTER 
9.0,9,10,9,FALSE,TRUE 11 | CHAPTER 10.0,10,11,10,FALSE,TRUE 12 | CHAPTER 11.0,11,12,11,FALSE,TRUE 13 | CHAPTER 13.0,13,13,13,FALSE,TRUE 14 | CHAPTER 14.0,14,14,14,FALSE,TRUE 15 | CHAPTER 15.0,15,15,15,FALSE,TRUE 16 | CHAPTER 16.0,16,16,16,FALSE,FALSE 17 | CHAPTER 17.0,17,17,17,FALSE,FALSE 18 | CHAPTER 18.0,18,18,18,FALSE,FALSE 19 | CHAPTER 19.0,19,19,19,FALSE,FALSE 20 | CHAPTER 20.0,20,20,20,FALSE,FALSE 21 | CHAPTER 21.0,21,21,21,FALSE,FALSE 22 | CHAPTER 22.0,22,22,22,FALSE,FALSE 23 | CHAPTER 23.0,23,23,23,FALSE,FALSE 24 | CHAPTER 24.0,24,24,24,FALSE,FALSE 25 | CHAPTER 25.0,25,25,25,FALSE,FALSE 26 | CHAPTER 26.0,26,26,26,FALSE,FALSE 27 | CHAPTER 27.0,27,27,27,FALSE,FALSE 28 | CHAPTER 28.0,28,28,28,FALSE,FALSE 29 | CHAPTER 29.0,29,29,29,FALSE,FALSE 30 | CHAPTER 30.0,30,30,30,FALSE,FALSE 31 | CHAPTER 31.0,31,31,31,FALSE,FALSE 32 | CHAPTER 32.0,32,32,32,FALSE,FALSE 33 | CHAPTER 33.0,33,33,33,FALSE,FALSE 34 | CHAPTER 34.0,34,34,34,FALSE,FALSE 35 | CHAPTER 35.0,35,35,35,FALSE,FALSE 36 | -------------------------------------------------------------------------------- /examples/split_cleanup/1899_public_weird_chaps_example.csv: -------------------------------------------------------------------------------- 1 | vol,ch_index,ch_title,sec_index,sec_title,gap 2 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,1,Chapter_Title, 3 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,2,Section 1.,1 4 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,3,SkEc. 2.,1 5 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,4,Sec. 2.,0 6 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,5,Src. 3.,1 7 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,6,Src. 4.,1 8 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,7,Suc. 5.,1 9 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,8,Src. 6.,1 10 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,9,Suc. 7.,1 11 | publiclawsresolu1899nort_public laws,18,CHAPTER 17.0,10,Suc. 
8.,1 12 | -------------------------------------------------------------------------------- /examples/split_cleanup/chap_num_manual.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/examples/split_cleanup/chap_num_manual.png -------------------------------------------------------------------------------- /examples/split_cleanup/step4_fixlog.csv: -------------------------------------------------------------------------------- 1 | Volume,Reviewer,Chapter(s),Notes,Affected image jpg url,Transcription required,transcription_index,transcription_ID,transcription_order,transcription_chapter,transcription_section,transcription_text 2 | publiclawsresolu1899nort_public laws_data,Neil,1,"Page header misread as chapter title. Last line of Chapter 1, Sec. 66 at top of page was left out by OCR. OCR resumes at sec. 67, but sections 67-70 have been assigned in the raw file to the misread chapter title. False chapter title removed, Chapter 1.0 title extended through end of chapter 1. ""Chapter_Title"" value replaced with Sec 66. Value in section column for affected rows.",https://archive.org/download/publiclawsresolu1899nort/publiclawsresolu1899nort_jp2.zip/publiclawsresolu1899nort_jp2%2Fpubliclawsresolu1899nort_0096.jp2&ext=jpg,yes,12034,1,1,CHAPTER 1.0,SEC. 66.,"his right mind, or such time as he may be considered harmless and incurable." 3 | publiclawsresolu1899nort_public laws_data,Neil,6,"OCR missed bits from chapter 6, sections 4 and 5.",https://archive.org/download/publiclawsresolu1899nort/publiclawsresolu1899nort_jp2.zip/publiclawsresolu1899nort_jp2%2Fpubliclawsresolu1899nort_0102.jp2&ext=jpg,yes,14494,2,1,CHAPTER 6.0,Sec. 4.,Sec. 4. 
that this act shall apply to the election of 4 | publiclawsresolu1899nort_public laws_data,Neil,6,"OCR missed bits from chapter 6, sections 4 and 5.",https://archive.org/download/publiclawsresolu1899nort/publiclawsresolu1899nort_jp2.zip/publiclawsresolu1899nort_jp2%2Fpubliclawsresolu1899nort_0102.jp2&ext=jpg,yes,14513,3,1,CHAPTER 6.0,Sec. 4.,act are hereby repealed. 5 | publiclawsresolu1899nort_public laws_data,Neil,6,"OCR missed bits from chapter 6, sections 4 and 5.",https://archive.org/download/publiclawsresolu1899nort/publiclawsresolu1899nort_jp2.zip/publiclawsresolu1899nort_jp2%2Fpubliclawsresolu1899nort_0102.jp2&ext=jpg,yes,14514,4,1,CHAPTER 6.0,Sec. 5.,"Sec. 5. That this act shall be in force from and after its ratification. Ratified this seventh day of January, A.D. eighteen hundred and ninety-nine." 6 | publiclawsresolu1899nort_public laws_data,Neil,"7,8","Split script missed the chapter 7 chapter header and OCR misread the chapter 8 header (""chapter 6&8""). Mis-assigned chapter and section titles were corrected.",https://archive.org/download/publiclawsresolu1899nort/publiclawsresolu1899nort_jp2.zip/publiclawsresolu1899nort_jp2%2Fpubliclawsresolu1899nort_0102.jp2&ext=jpg,no,,,,,, 7 | publiclawsresolu1899nort_public laws_data,Neil,12,"Chapter 12 header misread as ""Ch"" ""meh"" ""th"" across 3 rows. 
Chapter/section titles cleaned up in vicinity of error",https://archive.org/download/publiclawsresolu1899nort/publiclawsresolu1899nort_jp2.zip/publiclawsresolu1899nort_jp2%2Fpubliclawsresolu1899nort_0132.jp2&ext=jpg,no,,,,,, 8 | -------------------------------------------------------------------------------- /images/Pauli_Murray.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/images/Pauli_Murray.jpg -------------------------------------------------------------------------------- /images/UniversityLibraries_logo_black_h75.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/images/UniversityLibraries_logo_black_h75.png -------------------------------------------------------------------------------- /images/mellon-foundation-logo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/images/mellon-foundation-logo.jpg -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Redirecting to https://onthebooks.lib.unc.edu 4 | 5 | 6 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /installation.md: -------------------------------------------------------------------------------- 1 | # Software Installation Documentation 2 | ### Optional: Install the Windows Subsystem for Linux (Ubuntu): 3 | * To install Linux, first enable 
the Windows Subsystem for Linux optional feature by running the following command in Windows PowerShell (Start --> Windows PowerShell --> Right click --> Run as Administrator): 4 | * `Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux` 5 | * Restart when prompted 6 | * Install Ubuntu from the [Microsoft Store](https://www.microsoft.com/en-us/p/ubuntu/9nblggh4msv6?activetab=pivot:overviewtab) 7 | * Click on the install button 8 | * Click on the launch button 9 | * Once installation is complete, create a UNIX user and password 10 | 11 | ### 1. Install Anaconda Python: 12 | * Install Anaconda from the [Anaconda website](https://www.anaconda.com/distribution/) 13 | * Download the latest Python version (64-bit Graphical Installer) 14 | * Run the Anaconda setup file with default settings 15 | 16 | ### 2. Install Tesseract: 17 | * __Windows__: 18 | * Install from the [GitHub link](https://github.com/UB-Mannheim/tesseract/wiki) 19 | * Click on tesseract-ocr-w64-setup-v4.1.0.20190314 (rc1) 20 | * Run the setup file with default settings, noting the installation location 21 | * Add the installation location to your PATH variable 22 | * __MacOS__: 23 | * To install using MacPorts, run the command: `sudo port install tesseract` 24 | * To install using Homebrew, run the command: `brew install tesseract` 25 | 26 | ### 3. Install Python packages 27 | * Optional: 28 | Use your conda terminal to create a new environment called "onthebooks": 29 | ``` 30 | conda create -n onthebooks python=3.7 31 | ``` 32 | Activate the environment and install the basic dependencies with: 33 | ``` 34 | conda activate onthebooks 35 | conda install pandas 36 | conda install spyder 37 | ``` 38 | 39 | * Open a conda terminal and execute the following: 40 | + Note: Pillow needs to be reinstalled after openjpeg is installed to correctly link to the jpeg2000 decoder. 
41 | ``` 42 | conda install openjpeg 43 | pip install Pillow --force-reinstall 44 | pip install pyspellchecker 45 | pip install pytesseract 46 | ``` 47 | -------------------------------------------------------------------------------- /oer/.ipynb_checkpoints/environment_backup-checkpoint.yml: -------------------------------------------------------------------------------- 1 | # This file is a duplicate of the environment.yml file 2 | # stored in the root of the On The Books Github repository. 3 | # If you only need the oer folder in the repository, 4 | # copy its contents to a new Github repository and rename 5 | # this file to environment.yml so that Binder will pull your 6 | # dependencies from this file. 7 | name: oer-environment 8 | channels: 9 | - conda-forge 10 | dependencies: 11 | - python 12 | - pip 13 | - pip: 14 | - geopandas 15 | - internetarchive 16 | - matplotlib 17 | - nltk 18 | - pandas 19 | - pillow 20 | - pyspellchecker 21 | - pytesseract 22 | - requests 23 | -------------------------------------------------------------------------------- /oer/NC_counties.txt: -------------------------------------------------------------------------------- 1 | Alamance 2 | Alexander 3 | Alleghany 4 | Anson 5 | Ashe 6 | Avery 7 | Beaufort 8 | Bertie 9 | Bladen 10 | Brunswick 11 | Buncombe 12 | Burke 13 | Cabarrus 14 | Caldwell 15 | Camden 16 | Carteret 17 | Caswell 18 | Catawba 19 | Chatham 20 | Cherokee 21 | Chowan 22 | Clay 23 | Cleveland 24 | Columbus 25 | Craven 26 | Cumberland 27 | Currituck 28 | Dare 29 | Davidson 30 | Davie 31 | Duplin 32 | Durham 33 | Edgecombe 34 | Forsyth 35 | Franklin 36 | Gaston 37 | Gates 38 | Graham 39 | Granville 40 | Greene 41 | Guilford 42 | Halifax 43 | Harnett 44 | Haywood 45 | Henderson 46 | Hertford 47 | Hoke 48 | Hyde 49 | Iredell 50 | Jackson 51 | Johnston 52 | Jones 53 | Lee 54 | Lenoir 55 | Lincoln 56 | McDowell 57 | Macon 58 | Madison 59 | Martin 60 | Mecklenburg 61 | Mitchell 62 | Montgomery 63 | Moore 64 | Nash 65 | New 
Hanover 66 | Northampton 67 | Onslow 68 | Orange 69 | Pamlico 70 | Pasquotank 71 | Pender 72 | Perquimans 73 | Person 74 | Pitt 75 | Polk 76 | Randolph 77 | Richmond 78 | Robeson 79 | Rockingham 80 | Rowan 81 | Rutherford 82 | Sampson 83 | Scotland 84 | Stanly 85 | Stokes 86 | Surry 87 | Swain 88 | Transylvania 89 | Tyrrell 90 | Union 91 | Vance 92 | Wake 93 | Warren 94 | Washington 95 | Watauga 96 | Wayne 97 | Wilkes 98 | Wilson 99 | Yadkin 100 | Yancey -------------------------------------------------------------------------------- /oer/environment_backup.yml: -------------------------------------------------------------------------------- 1 | # This file is a duplicate of the environment.yml file 2 | # stored in the root of the On The Books Github repository. 3 | # If you only need the oer folder in the repository, 4 | # copy its contents to a new Github repository and rename 5 | # this file to environment.yml so that Binder will pull your 6 | # dependencies from this file. 7 | name: oer-environment 8 | channels: 9 | - conda-forge 10 | dependencies: 11 | - python 12 | - pip 13 | - pip: 14 | - geopandas 15 | - internetarchive 16 | - matplotlib 17 | - nltk 18 | - pandas 19 | - pillow 20 | - pyspellchecker 21 | - pytesseract 22 | - requests 23 | -------------------------------------------------------------------------------- /oer/images/00-intro-01.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-01.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-02.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-02.jpg -------------------------------------------------------------------------------- 
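The installation steps in `installation.md` above (Tesseract on the PATH, plus the Pillow, pyspellchecker, and pytesseract packages) can be sanity-checked before running any of the project's scripts. The following is a minimal sketch, not part of the repository, using only the Python standard library; the tool and module names checked are the ones the installation instructions ask for:

```python
import importlib.util
import shutil


def missing_executables(executables=("tesseract",)):
    """Return the required command-line tools that are NOT found on PATH."""
    return [exe for exe in executables if shutil.which(exe) is None]


def missing_modules(modules=("PIL", "pytesseract", "spellchecker", "pandas")):
    """Return the required Python packages that are NOT importable.

    Note the import names differ from the pip package names:
    Pillow -> PIL, pyspellchecker -> spellchecker.
    """
    return [mod for mod in modules if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    problems = missing_executables() + missing_modules()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Environment looks complete.")
```

If `tesseract` is reported missing on Windows, the usual cause is that the installation location was not added to the PATH variable, as noted in step 2 above.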
/oer/images/00-intro-03.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-03.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-04.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-04.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-05.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-05.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-06.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-06.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-07.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-07.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-08.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-08.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-09.jpeg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-09.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-10.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-10.jpeg -------------------------------------------------------------------------------- /oer/images/00-intro-11.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-11.jpg -------------------------------------------------------------------------------- /oer/images/00-intro-12.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-12.jpg -------------------------------------------------------------------------------- /oer/images/00-intro-25.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/00-intro-25.jpg -------------------------------------------------------------------------------- /oer/images/01-algorithms-01.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/01-algorithms-01.jpg -------------------------------------------------------------------------------- /oer/images/06-corpus-01.jpeg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-01.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-02.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-02.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-03.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-03.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-04.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-04.jpg -------------------------------------------------------------------------------- /oer/images/06-corpus-05.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-05.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-06.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-06.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-07.jpeg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-07.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-08.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-08.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-09.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-09.jpg -------------------------------------------------------------------------------- /oer/images/06-corpus-10.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-10.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-11.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-11.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-12.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-12.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-13.jpeg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-13.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-14.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-14.jpeg -------------------------------------------------------------------------------- /oer/images/06-corpus-15.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-15.jpg -------------------------------------------------------------------------------- /oer/images/06-corpus-16.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-16.jpg -------------------------------------------------------------------------------- /oer/images/06-corpus-17.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-17.jpg -------------------------------------------------------------------------------- /oer/images/06-corpus-18.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-18.jpg -------------------------------------------------------------------------------- /oer/images/06-corpus-runcode.mp4: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/06-corpus-runcode.mp4 -------------------------------------------------------------------------------- /oer/images/07-ocr-01.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-01.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-02.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-02.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-03.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-03.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-04.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-04.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-05.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-05.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-05.txt: 
-------------------------------------------------------------------------------- 1 | hereby re-enacted: Provided, however, that convicts shall not be worked on said railroad in the counties of New l Hanover or Pender. subSEC. 2. That if the company shall fail to begin the be '.uc-construction of the road within twelve months from the -------------------------------------------------------------------------------- /oer/images/07-ocr-06.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-06.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-06.txt: -------------------------------------------------------------------------------- 1 | year the sum of twenty-five cents on each three hundred dollars' t\"Orth of property and the same arnouut on each poll, which shall constitute and he held a sinking fund: -------------------------------------------------------------------------------- /oer/images/07-ocr-07.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-07.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-07.txt: -------------------------------------------------------------------------------- 1 | a S€1'.)arate fund, -------------------------------------------------------------------------------- /oer/images/07-ocr-08.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/07-ocr-08.jpeg -------------------------------------------------------------------------------- /oer/images/07-ocr-08.txt: 
-------------------------------------------------------------------------------- 1 | meanor and upon conviction shall be tined not less than ten doliars nor more than thirty Jollars, or imprisoned uot less tbun ten days nor rnore than thirty dayc, or both at the disereLiou of the cCJurc. SEC. 10 . .?i'ovided, tliat no person shall be admitted into I sa.1d school a-: a studet"!t who has uot :1ttained the age of fiftEeu yeHrs; and that all tbo::ie \Vho shnll eujoy the priv -------------------------------------------------------------------------------- /oer/images/08-ocr-01.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/08-ocr-01.jpeg -------------------------------------------------------------------------------- /oer/images/08-ocr-02.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/08-ocr-02.jpeg -------------------------------------------------------------------------------- /oer/images/08-ocr-03.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/08-ocr-03.jpeg -------------------------------------------------------------------------------- /oer/images/08-ocr-04.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/08-ocr-04.jpeg -------------------------------------------------------------------------------- /oer/images/08-ocr-05.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/08-ocr-05.jpeg -------------------------------------------------------------------------------- /oer/images/08-ocr-06.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/08-ocr-06.jpeg -------------------------------------------------------------------------------- /oer/images/08-ocr-07.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/08-ocr-07.jpeg -------------------------------------------------------------------------------- /oer/images/09-data-01.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/09-data-01.jpeg -------------------------------------------------------------------------------- /oer/images/09-data-02.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/09-data-02.jpeg -------------------------------------------------------------------------------- /oer/images/09-data-03.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/09-data-03.jpeg -------------------------------------------------------------------------------- /oer/images/09-data-04.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/09-data-04.jpeg -------------------------------------------------------------------------------- /oer/images/09-data-05.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/09-data-05.jpeg -------------------------------------------------------------------------------- /oer/images/09-data-06.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/09-data-06.jpeg -------------------------------------------------------------------------------- /oer/images/09-data-07.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/09-data-07.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-01.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-01.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-02.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-02.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-03.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-03.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-04.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-04.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-05.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-05.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-06.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-06.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-07.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-07.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-08.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-08.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-09.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-09.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-10.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-10.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-11.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-11.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-12.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-12.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-13.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-13.jpeg -------------------------------------------------------------------------------- /oer/images/10-explore-14.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/10-explore-14.jpeg -------------------------------------------------------------------------------- /oer/images/Anaconda_Nucleus_Horizontal_white.svg: -------------------------------------------------------------------------------- 1 | 
-------------------------------------------------------------------------------- /oer/images/LawBooks-feature.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/LawBooks-feature.png -------------------------------------------------------------------------------- /oer/images/chronam_daybook_19151112_pellagra_full.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/chronam_daybook_19151112_pellagra_full.jpg -------------------------------------------------------------------------------- /oer/images/chronam_daybook_19151112_pellagra_full_bboxes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/chronam_daybook_19151112_pellagra_full_bboxes.png -------------------------------------------------------------------------------- /oer/images/noun_arrow with loops_2073885.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/noun_arrow with loops_2073885.png -------------------------------------------------------------------------------- /oer/images/sessionlawsresol1955nort_0057.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/sessionlawsresol1955nort_0057.jpg -------------------------------------------------------------------------------- /oer/images/sessionlawsresol1955nort_0057_300ppi.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/sessionlawsresol1955nort_0057_300ppi.jpg -------------------------------------------------------------------------------- /oer/images/sessionlawsresol1955nort_0057_grayscale.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/sessionlawsresol1955nort_0057_grayscale.jpg -------------------------------------------------------------------------------- /oer/images/sessionlawsresol1955nort_0057_inverted.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/sessionlawsresol1955nort_0057_inverted.jpg -------------------------------------------------------------------------------- /oer/images/sessionlawsresol1955nort_0057_rotated.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/sessionlawsresol1955nort_0057_rotated.jpg -------------------------------------------------------------------------------- /oer/images/sessionlawsresol1955nort_0057_skewed.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/sessionlawsresol1955nort_0057_skewed.jpg -------------------------------------------------------------------------------- /oer/images/sessionlawsresol1955nort_0058.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/images/sessionlawsresol1955nort_0058.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0000.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0000.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0001.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0001.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0002.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0002.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0003.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0003.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0004.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0004.jpg 
-------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0005.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0005.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0006.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0006.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0007.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0007.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0008.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0008.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0009.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0009.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0010.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0010.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0011.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0011.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0012.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0012.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0013.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0013.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0014.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0014.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0015.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0015.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0016.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0016.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0017.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0017.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0018.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0018.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0019.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0019.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0020.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0020.jpg 
-------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0021.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0021.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0022.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0022.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0023.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0023.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0024.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0024.jpg -------------------------------------------------------------------------------- /oer/jpg_output/sessionlawsresol1955nort_0025.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/jpg_output/sessionlawsresol1955nort_0025.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0057.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0057.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0058.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0058.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0059.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0059.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0060.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0060.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0061.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0061.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0062.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0062.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0063.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0063.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0064.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0064.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0065.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0065.jpg -------------------------------------------------------------------------------- /oer/sample/sessionlawsresol1955nort_0066.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample/sessionlawsresol1955nort_0066.jpg -------------------------------------------------------------------------------- /oer/sample_output.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sample_output.txt -------------------------------------------------------------------------------- 
/oer/sample_output/sessionlawsresol1955nort_0057.txt: -------------------------------------------------------------------------------- 1 | SESSION LAWS 2 | 3 | OF THE 4 | 5 | STATE OF NORTH CAROLINA 6 | 7 | SESSION 1955 8 | 9 | S. B. 4 CHAPTER 1 10 | 11 | AN ACT TO AUTHORIZE THE BOARD OF TRUSTEES OF THE 12 | SOUTHERN PINES SCHOOL DISTRICT TO TRANSFER CERTAIN 13 | FUNDS FROM ITS DEBT SERVICE ACCOUNT TO ITS CAPITAL 14 | OUTLAY OR CURRENT EXPENSE ACCOUNTS, OR TO BOTH 15 | SUCH ACCOUNTS. 16 | 17 | The General Assembly of North Carolina do enact: 18 | 19 | Section 1. The Board of Trustees of the Southern Pines School Dis- 20 | triet is hereby authorized and empowered to transfer all surplus funds held 21 | by it in its debt service account on the date of the ratification of this Act 22 | or on July 1, 1955, to its capital outlay account or current expense account, 23 | or to both such accounts, and to use said funds for capital outlay or current 24 | expense purposes, or both, including the construction of school buildings. 25 | 26 | See. 2. All laws and clauses of laws in conflict with this Act are hereby 27 | repealed. 28 | 29 | See. 3. This Act shall become effective on and after its ratification. 30 | 31 | In the General Assembly read three times and ratified, this the 14th 32 | day of January, 1955. 33 | 34 | H. B. 13 CHAPTER 2 35 | 36 | AN ACT TO PERMIT THE BOARD OF COMMISSIONERS OF 37 | CATAWBA COUNTY TO MAKE APPROPRIATIONS FOR BUILDING 38 | WATER LINES, SEWER LINES OR EITHER OF THEM, FROM THE 39 | CORPORATE LIMITS OF MUNICIPALITIES TO COMMUNITIES IN 40 | THE COUNTY. 41 | 42 | The General Assembly of North Carolina do enact: 43 | 44 | Section 1. 
The Board of County Commissioners of Catawba County is 45 | hereby authorized and empowered in its discretion to expend out of non- 46 | tax funds available to said board such amount or amounts as it may deem 47 | wise, not exceeding in the aggregate the sum of one hundred and twenty- 48 | 49 | 1 50 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0058.txt: -------------------------------------------------------------------------------- 1 | CH. 2-3 1955—-SEssION Laws 2 | 3 | five thousand dollars ($125,000.00), to be used in such amounts in the dis- 4 | cretion of said board of county commissioners for the purpose of acquir- 5 | ing easements for water and sewer lines, or either of them, and for the 6 | purpose of laying and constructing water and sewer lines or either of them 7 | from the corporate limits of municipalities located in Catawba County to 8 | communities located within said county, but outside of the corporate limits 9 | of municipalities, said water and sewer lines, or either of them, shall be 10 | constructed and laid and said easements therefor shall be acquired, for the 11 | purpose of promoting the general welfare of said county and the expense 12 | of laying and construction of said lines and acquiring said easements is 13 | hereby declared to be expenditures for public purposes. 14 | 15 | Sec. 2. All laws and clauses of laws in conflict with this Act are hereby 16 | repealed. 17 | 18 | See. 3. This Act shall be in full force and effect from and after its 19 | ratification. 20 | 21 | In the General Assembly read three times and ratified, this the 14th 22 | day of January, 1955. 23 | 24 | S. B. 13 CHAPTER 3 25 | 26 | AN ACT TO AMEND THE ELECTION LAW HERETOFORE PROVIDED 27 | FOR THE TOWN OF CONETOE, IN EDGECOMBE COUNTY, AND 28 | TO FIX THE DATES OF ELECTIONS FOR SAID TOWN. 29 | 30 | The General Assembly of North Carolina do enact: 31 | 32 | Section 1. 
Amend Section 2 of Chapter 673 of the Session Laws of 1953 33 | by striking out the following: “1953”, appearing in the first line of said 34 | Section 2, and by inserting in lieu thereof the following: “1955”. 35 | 36 | See. 2. Amend Section 3 of Chapter 673 of the Session Laws of 1953 37 | by striking out the figures “1953”, as the same appear in the eighth line of 38 | said Section 3, and by inserting in lieu thereof the figures “1955”, 39 | 40 | Sec. 3. Amend Section 4 of Chapter 673 of the Session Laws of 1953 by 41 | striking out the figures “1955”, as the same appear in the first line of said 42 | Section 4, and by inserting in lieu thereof the figures “1957”. 43 | 44 | Further amend said Section 4 of Chapter 673 of the Session Laws of 45 | 1953 by striking out the figures “1953”, as the same appear in the tenth 46 | line of said Section 4, and by inserting in lieu thereof the figures “1955”. 47 | 48 | Sec. 4. All laws and clauses of laws in conflict with this Act are hereby 49 | repealed. 50 | 51 | See. 5. This Act shall be in full force and effect from and after its rati- 52 | fication. 53 | 54 | In the General Assembly read three times and ratified, this the 14th day 55 | of January, 1955. 56 | 57 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0059.txt: -------------------------------------------------------------------------------- 1 | 1955—SEsSION LAWS Cu. 4 2 | 3 | H. B. 34 CHAPTER 4 4 | 5 | AN ACT TO PROVIDE THAT THE OFFICE OF SOLICITOR OF THE 6 | RECORDER’S COURT OF FRANKLIN COUNTY BE AN ELECTIVE 7 | OFFICE. 8 | 9 | The General Assembly of North Carolina do enact: 10 | 11 | Section 1. That Section 6 of Chapter 12, Session Laws of 1951 is here- 12 | by repealed. 13 | 14 | Sec. 2. That G. S. 
7-235 is hereby amended by adding at the end there- 15 | of the following: 16 | 17 | “Provided that as of February 1, 1955, the office of prosecuting attor- 18 | ney of the Recorder’s Court of Franklin County, denominated solicitor, 19 | shall be an elective office. The first solicitor shall be elected by the Board 20 | of Commissioners of Franklin County on or before February 1, 1955, and 21 | shall hold his office under said appointment until the first Monday in De- 22 | cember, 1956. At the primary and general elections to be held in the year 23 | 1956 and biennially thereafter, the Solicitor of the Recorder’s Court of 24 | Franklin County shall be nominated and elected in the same manner and 25 | at the same time as is now or may hereafter be provided by law for the 26 | nomination and election of the elective officers of the county; the term of 27 | office of said Solicitor shall begin on the first Monday in December follow- 28 | ing the biennial general election at which he shall have been elected and 29 | shall extend to the first Monday in December following the next ensuing 30 | biennial general election. In the event of a vacancy in the office of Solicitor, 31 | either by death, resignation, failure to qualify, or otherwise, the Board of 32 | Commissioners of Franklin County shall fill such vacancy by appointment 33 | and the person so appointed shall serve until the first Monday in December 34 | following the next biennial general election. The salary of said Solicitor 35 | shall be two thousand four hundred dollars ($2400.00) per year and shall 36 | be paid in equal monthly installments from the General Fund of the county. 
37 | The said Solicitor shall, at the time of his appointment or nomination and 38 | election, be a qualified elector of Franklin County and a licensed attorney 39 | at law, and before entering upon the duties of his office shall take and sub- 40 | scribe an oath substantially in the form required of State solicitors by 41 | Section 11-11 of the General Statutes of North Carolina, and said oath shall 42 | be recorded by the Clerk of the Superior Court of Franklin County.” 43 | 44 | Sec. 3. All laws and clauses of laws in conflict with the provisions of 45 | this Act are hereby repealed. 46 | 47 | See. 4. This Act shall be in full force and effect from and after its rati- 48 | fication. 49 | 50 | In the General Assembly read three times and ratified, this the 26th day 51 | of January, 1955. 52 | 53 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0060.txt: -------------------------------------------------------------------------------- 1 | Cu. 5-6-7 1955—SEssION LAws 2 | 3 | H. B. 2 CHAPTER 5 4 | 5 | AN ACT TO REPEAL CHAPTER 501 OF THE SESSION LAWS OF 1953, 6 | RELATING TO COMMITTEE HEARINGS ON THE APPROPRIA- 7 | TIONS BILL. 8 | 9 | The General Assembly of North Carolina do enact: 10 | 11 | Section 1. Chapter 501 of the Session Laws of 1958 is repealed. 12 | 13 | See. 2. All laws and clauses of laws in conflict with this Act are here- 14 | by repealed. 15 | 16 | See. 3. This Act shall be in full force and effect from and after its rati- 17 | fication. 18 | 19 | In the General Assembly read three times and ratified, this the 28th 20 | day of January, 1955. 21 | 22 | H. B. 18 CHAPTER 6 23 | 24 | AN ACT TO AMEND THE CHARTER OF THE CITY OF SALISBURY 25 | BY REQUIRING COUNCIL MEETINGS TO BE HELD AS OFTEN AS 26 | TWICE MONTHLY INSTEAD OF ONCE WEEKLY. 27 | 28 | The General Assembly of North Carolina do enact: 29 | 30 | Section 1. 
Section 8 of Chapter 231 of the Private Laws of 1927, as 31 | amended by Chapter 178 of the Private Laws of 1929, be and the same is 32 | hereby further amended by striking out the first sentence appearing there- 33 | in and inserting in lieu thereof the following: 34 | 35 | “The Council shall fix suitable times for its regular meetings, which 36 | shall be as often as twice monthly.” 37 | 38 | Sec. 2. All laws and clauses of laws in conflict with this Act are here- 39 | by repealed. 40 | 41 | Sec. 3. This Act shall be in full force and effect from and after its 42 | ratification. 43 | 44 | In the General Assembly read three times and ratified, this the 28th 45 | day of January, 1955. 46 | 47 | S. B. 33 CHAPTER 7 48 | 49 | AN ACT TO AMEND ARTICLE 4 OF CHAPTER 15 OF THE GENERAL 50 | STATUTES SO AS TO PROVIDE FOR THE ISSUANCE OF SEARCH 51 | WARRANTS FOR NARCOTIC DRUGS. 52 | 53 | The General Assembly of North Carolina do enact: 54 | 55 | Section 1. G. S. 15-25 is amended by inserting between the comma fol- 56 | lowing the word “premises” and the word “any” in line 5 of said Section 57 | the words and punetuation “any narcotic drugs as defined in Article 5 of 58 | Chapter 90 of the General Statutes,”. G. S. 15-25 is further amended by 59 | inserting between the word “such” and the word “stolen” in line 22 of said 60 | Section the words and punctuation “narcotic drugs,”. 61 | 62 | Sec. 2. All laws and clauses of laws in conflict with this Act are hereby 63 | 64 | 4 65 | 66 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0061.txt: -------------------------------------------------------------------------------- 1 | 1955—SEssion LAws Cu. 7-8-9 2 | 3 | repealed. 4 | 5 | Sec. 3. This Act shall be in full force and effect from and after its rati- 6 | fication. 7 | 8 | In the General Assembly read three times and ratified, this the 3rd day 9 | of February, 1955. 10 | 11 | S. B. 
38 CHAPTER 8 12 | 13 | AN ACT TO AMEND G. S. 7-274 SO AS TO AUTHORIZE THE CLERK 14 | OR DEPUTY CLERK OF THE GENERAL COUNTY COURT OF 15 | HALIFAX COUNTY TO ISSUE CRIMINAL WARRANTS. 16 | 17 | The General Assembly of North Carolina do enact: 18 | 19 | Section 1. G. S. 7-274 is hereby amended by striking out the word 20 | “Halifax” as the same appears in line 13. 21 | 22 | Sec. 2. All laws and clauses of laws in conflict with this Act are hereby 23 | repealed. 24 | 25 | Sec. 3. This Act shall become effective upon ratification. 26 | 27 | In the General Assembly read three times and ratified, this the 3rd day 28 | of February, 1955. 29 | 30 | H. B. 28 CHAPTER 9 31 | 32 | AN ACT TO AUTHORIZE AND EMPOWER THE BOARD OF COMMIS- 33 | SIONERS OF STOKES COUNTY TO SELL AND CONVEY THE 34 | TRACT OF LAND AND BUILDINGS SITUATED THEREON FOR- 35 | MERLY USED BY THE COUNTY IN CONNECTION WITH THE 36 | OPERATION AND MAINTENANCE OF THE COUNTY HOME 37 | FARM. 38 | 39 | The General Assembly of North Carolina do enact: 40 | 41 | Section 1. The Board of County Commissioners of Stokes County is 42 | hereby authorized and empowered to sell at public or private sale the entire 43 | tract of land and buildings situated thereon known as the County Home 44 | Farm or such part or parts thereof as in the discretion of the board will 45 | not be needed for public purposes. If the sale is made at public auction, 46 | notice of the sale shall be published once a week for two successive weeks 47 | in a newspaper of general circulation in the county. After any such public 48 | sale, the board of county commissioners is authorized to reject any bid 49 | which in the opinion of the board is not considered to be the fair market 50 | value of the partial or entire tract of land offered. If, after public auction, 51 | the board of county commissioners rejects the highest bid made, further 52 | public auctions may be held or the partial or entire tract of land may be 53 | sold privately for a higher price. 
54 | 55 | Sec. 2. In carrying out the provisions of this Act the Board of County 56 | Commissioners of Stokes County may execute all necessary deeds and may 57 | employ an auction company to assist with subdividing and selling the 58 | property involved but shall not pay any company so employed more than 59 | 60 | 5 61 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0062.txt: -------------------------------------------------------------------------------- 1 | Cu. 9-10-11 1955—SEssIon LAWS 2 | 3 | four per cent (4%) of the sales price as confirmed by the board of county 4 | commissioners. 5 | 6 | See. 3. All laws and clauses of laws in conflict with this Act are hereby 7 | repealed. 8 | 9 | See. 4. This Act shall be in full force and effect from and after its 10 | ratification. 11 | 12 | In the General Assembly read three times and ratified, this the 8rd day 13 | of February, 1955. 14 | 15 | H. B. 58 CHAPTER 10 16 | 17 | AN ACT TO AMEND G. §S. 1-109, RELATING TO PROSECUTION 18 | BONDS, SO AS TO PLACE THE STATE ON THE SAME BASIS AS 19 | CITIES AND TOWNS WITH RESPECT TO EXEMPTION THERE- 20 | FROM. 21 | 22 | The General Assembly of North Carolina do enact: 23 | 24 | Section 1. G. S. 1-109 is hereby amended by inserting the words “the 25 | State of North Carolina or any of its agencies, commissions or institutions, 26 | or to” immediately following the word “to” and immediately preceding the 27 | word “counties”, in line 3 of paragraph 3, and by inserting the words “the 28 | State of North Carolina or any of its agencies, commissions or institutions, 29 | and” immediately following the word “that” and immediately preceding the 30 | word “counties” in line 4 of paragraph 3. 31 | 32 | See. 2. 
This Act shall apply to pending litigation, and all actions or 33 | proceedings heretofore instituted by the State of North Carolina or its 34 | agencies shall be valid as if the provisions of this Act had at all times been 35 | the law of the land. 36 | 37 | Sec. 3. All laws and clauses of laws in conflict with this Act are hereby 38 | repealed. 39 | 40 | Sec. 4. This Act shall become effective upon its ratification. 41 | 42 | In the General Assembly read three times and ratified, this the 3rd day 43 | of February, 1955. 44 | 45 | S. B. 22 CHAPTER 11 46 | 47 | AN ACT TO AMEND G. S. 153-38 SO AS TO PROVIDE FOR THE PAY- 48 | MENT OF THE EXPENSES BY GRANVILLE COUNTY OF THE 49 | COUNTY AUDITOR, THE CLERK TO THE BOARD OF COUNTY 50 | COMMISSIONERS, AND THE COUNTY ATTORNEY IN ATTEND- 51 | ING MEETINGS OF THE STATE ASSOCIATION OF COUNTY COM- 52 | MISSIONERS. 53 | 54 | The General Assembly of North Carolina do enact: 55 | Section 1. G. S. 158-38 is amended by adding at the end thereof a new 56 | paragraph to read as follows: 57 | “In Granville County, the Board of County Commissioners is authorized, 58 | in its discretion, to pay the expenses of the County Auditor, the Clerk to 59 | 60 | 6 61 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0063.txt: -------------------------------------------------------------------------------- 1 | 1955—SESSION Laws Cu. 11-12-13 2 | 3 | the Board of County Commissioners, and the County Attorney in attend- 4 | ing meetings of the State Association of County Commissioners.” 5 | 6 | See. 2. All action heretofore taken by the Board of County Commission- 7 | ers of Granville County in paying the expenses of the officials named in 8 | Section 1 of this Act in attending meetings of the State Association of 9 | County Commissioners is hereby validated, ratified, and confirmed. 10 | 11 | Sec. 3. 
All laws and clauses of laws in conflict with this Act are hereby 12 | repealed, 13 | 14 | See. 4. This Act shall be in full force and effect from and after its rati- 15 | fication. 16 | 17 | In the General Assembly read three times and ratified, this the 4th day 18 | of February, 1955. 19 | 20 | S. B. 34 CHAPTER 12 21 | 22 | AN ACT TO AMEND CHAPTER 465 OF THE SESSION LAWS OF 1949 23 | TO AUTHORIZE THE BOARD OF COUNTY COMMISSIONERS OF 24 | ROWAN COUNTY IN ITS DISCRETION TO ADD THE DUTIES 25 | AND POWERS OF COUNTY TAX SUPERVISOR TO THOSE NOW 26 | BEING PERFORMED BY THE COUNTY TAX COLLECTOR. 27 | 28 | The General Assembly of North Carolina do enact: 29 | 30 | Section 1. Section 1 of Chapter 465 of the Session Laws of 1949 is 31 | hereby amended by rewriting Section 4 thereof to read as follows: “Sec. 4. 32 | The Board of County Commissioners of Rowan County may, in its discre- 33 | tion, add the duties and powers of County Tax Supervisor to those now 34 | being performed by the County Auditor or by the County Tax Collector 35 | and, in such event, may pay such County Auditor or County Tax Collector 36 | such additional compensation for such services as, in its discretion, it may 37 | deem appropriate.” 38 | 39 | Sec. 2. All laws and clauses of laws in conflict with this Act are here- 40 | by repealed. 41 | 42 | Sec. 3. This Act shall be in full force and effect from and after its rati- 43 | fication. 44 | 45 | In the General Assembly read three times and ratified, this the 4th 46 | day of February, 1955. 47 | 48 | S. B. 35 CHAPTER 13 49 | 50 | AN ACT AUTHORIZING THE BOARD OF COUNTY COMMISSIONERS 51 | OF ROWAN COUNTY TO EXTEND THE PERIOD DURING WHICH 52 | IT MAY SIT IN 1955 AS A BOARD OF EQUALIZATION AND RE- 53 | VIEW. 
54 | 55 | WHEREAS, the Board of County Commissioners of Rowan County are 56 | in the process of revaluing taxable property in Rowan County; and 57 | WHEREAS, the revaluation was not completed on January Ist, 1955 58 | the day upon which tax listing began; and 59 | WHEREAS, the said County Commissioners, acting as a Board of 60 | Equalization and Review from March 21st to April 11th, 1955, will not 61 | 62 | ui 63 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0064.txt: -------------------------------------------------------------------------------- 1 | Cu. 18-14 1955—SrssIon LAws 2 | 3 | have sufficient time to properly consider all complaints likely to arise on 4 | account of such revaluations: Now, therefore, 5 | The General Assembly of North Carolina do enact: 6 | 7 | Section 1. That the Board of County Commissioners in its discretion 8 | may extend the period during which it may sit in the year 1955 as a Board 9 | of Equalization and Review until such time as it has completed the work 10 | of hearing and determining complaints relating to revaluation; but said 11 | extension shall end on or before October Ist, 1955. 12 | 13 | See. 2. All laws and clauses of laws in conflict with this Act are here- 14 | by repealed. 15 | 16 | See. 3. This Act shall be in full force and effect from and after its 17 | ratification. 18 | 19 | In the General Assembly read three times and ratified, this the 4th day 20 | of February, 1955. 21 | 22 | S. B. 101 CHAPTER 14 23 | 24 | AN ACT TO AMEND CHAPTER 788 OF THE SESSION LAWS OF 1953 25 | SO AS TO APPOINT A MEMBER OF THE BOARD OF EDUCATION 26 | OF BRUNSWICK COUNTY TO SERVE OUT THE UNEXPIRED 27 | TERM OF RAY WALTON. 
28 | 29 | WHEREAS, Ray Walton was named in Chapter 788 of the Session 30 | Laws of 1953 to serve on the Board of Education of Brunswick County for 31 | a term of two years; and 32 | 33 | WHEREAS, Ray Walton having been elected Senator from the Tenth 34 | Senatorial District to serve in the 1955 General Assembly resigned from 35 | his position as a member of the Board of Education of Brunswick 36 | County: Now, therefore, 37 | 38 | The General Assembly of North Carolina do enact: 39 | 40 | Section 1. Section 1 of Chapter 788 of the Session Laws of 1953 is here- 41 | by amended so as to provide that Thomas St. George is appointed a mem- 42 | ber of the Board of Education of Brunswick County to serve for the un- 43 | expired term of Ray Walton. 44 | 45 | Sec. 2. All laws and clauses of laws in conflict with this Act are here- 46 | by repealed. 47 | 48 | Sec. 3. This Act shall be in full force and effect from and after its 49 | ratification. 50 | 51 | In the General Assembly read three times and ratified, this the 4th day 52 | of February, 1955. 53 | 54 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0065.txt: -------------------------------------------------------------------------------- 1 | 1955—SESSION LAws Cu. 15-16-17 2 | 3 | H. B. 6 CHAPTER 15 4 | 5 | AN ACT TO REPEAL CHAPTER 522 OF THE SESSION LAWS OF 1953, 6 | RELATING TO COUNTY POLICEMEN OF MCDOWELL COUNTY. 7 | 8 | The General Assembly of North Carolina do enact: 9 | 10 | Section 1. Chapter 522 of the Session Laws of 1958 is repealed. 11 | 12 | Sec. 2. All laws and clauses of laws in conflict with this Act are here- 13 | by repealed. 14 | 15 | Sec. 3. This Act shall be in full force and effect from and after its rati- 16 | fication. 17 | 18 | In the General Assembly read three times and ratified, this the 4th day 19 | of February, 1955. 
20 | 21 | He BYi5 CHAPTER 16 22 | 23 | AN ACT TO INCREASE THE MEMBERSHIP OF THE BOARD OF 24 | COUNTY COMMISSIONERS OF PERSON COUNTY FROM 3 TO 5, 25 | AND TO AMEND G. §. 153-5. 26 | 27 | The General Assembly of North Carolina do enact: 28 | 29 | Section 1. That G. S. 153-5 is hereby amended by adding at the end 30 | thereof a paragraph reading as follows: 31 | 32 | “There shall be elected in Person County at the general election to be 33 | held in the year 1956 and every two years thereafter by the duly qualified 34 | voters thereof, a board of county commissioners composed of five persons 35 | who shall serve for a term of two years from the first Monday in Decem- 36 | ber after their election and until their successors are elected and quali- 37 | fied.” 38 | 39 | Sec. 2. All laws and clauses of laws in conflict with the provisions of 40 | this Act are hereby repealed. 41 | 42 | See. 3. This Act shall be in full force and effect from and after its 43 | ratification. 44 | 45 | In the General Assembly read three times and ratified, this the 4th day 46 | of February, 1955. 47 | 48 | H. B. 16 CHAPTER 17 49 | 50 | AN ACT TO AMEND CHAPTER 105 OF THE GENERAL STATUTES SO. 51 | AS TO CHANGE THE TIME FOR FILING STATE INCOME TAX 52 | RETURNS BY PERSONS OTHER THAN CORPORATIONS FROM 53 | THE FIFTEENTH DAY OF MARCH TO THE FIFTEENTH DAY OF 54 | APRIL IN EACH YEAR, AND TO CONFORM THE STATE LAW TO 55 | THE FEDERAL LAW AS TO THE TIME FOR FILING RETURNS. 56 | 57 | The General Assembly of North Carolina do enact: 58 | 59 | Section 1. The first paragraph of G. S. 
105-155 is hereby amended by 60 | rewriting said paragraph to read as follows: 61 | 62 | “Returns shall be in such form as the Commissioner of Revenue may 63 | from time to time prescribe, and shall be filed with the Commissioner at his 64 | 65 | 9 66 | -------------------------------------------------------------------------------- /oer/sample_output/sessionlawsresol1955nort_0066.txt: -------------------------------------------------------------------------------- 1 | CH. 17 1955—-SEssIoN LAWS 2 | 3 | main office, or at any branch office which he may establish. The return of 4 | every person reporting on a calendar year basis shall be filed on or before 5 | the fifteenth day of April in each year, and the return of every person 6 | reporting on a fiscal year basis shall be filed on or before the fifteenth 7 | day of the fourth month following the close of the fiscal year. The return 8 | of a corporation reporting on a calendar year basis shall be filed on or 9 | before the fifteenth day of March in each year, and the return of a cor- 10 | poration reporting on a fiscal year basis shall be filed on or before the 11 | fifteenth day of the third month following the close of the fiscal year. In 12 | ease of sickness, absence, or other disability or whenever in his judgment 13 | good cause exists, the Commissioner may allow further time for filing 14 | returns.” 15 | 16 | See. 2. Subsection (1) of G. S. 105-157 is hereby amended by rewriting 17 | the subsection to read as follows 18 | 19 | “(1) Except as otherwise provided in this Section, the full amount of 20 | the tax payable as shown on the face of the return shall be paid to the 21 | Commissioner of Revenue at the office where the return is filed at the time 22 | fixed by law for filing the return. 
23 | 24 | “If the taxpayer is a person reporting on a calendar year basis and the 25 | amount of tax exceeds fifty dollars ($50.00), payment may be made in two 26 | equal installments: one-half at the time of filing the return, and one-half 27 | on or before the fifteenth day of September following the date the return 28 | was originally due to be filed, with interest on the deferred payment at the 29 | rate of four per cent (4%) per annum from the date the return was origi- 30 | nally due to be filed. If the taxpayer is a person reporting on a calendar 31 | year basis and the amount of the tax exceeds four hundred dollars ($400.00), 32 | payment may be made in four equal installments: one-fourth at the time 33 | of filing the return, one-fourth on or before the fifteenth day of June fol- 34 | lowing the date the return was originally due to be filed, one-fourth on or 35 | before the fifteenth day of September following the date the return was 36 | originally due to be filed, and one-fourth on or before the fifteenth day of 37 | December following the date the return was originally due to be filed, with 38 | interest on deferred payments at the rate of four per cent (4%) per annum 39 | from the date the return was originally due to be filed. 40 | 41 | “If the taxpayer is a person reporting on a fiscal year basis or a cor- 42 | poration reporting on either a calendar year or fiscal year basis and the 43 | amount of the tax exceeds fifty dollars ($50.00), payment may be made in 44 | two equal installments: one-half on the date the return is filed, and one- 45 | half on or before the fifteenth day of the sixth month following the month 46 | in which the return was originally due to be filed, with interest on the 47 | deferred payment at the rate of four per cent (4%) per annum from the 48 | date the return was originally due to be filed. 
If the taxpayer is a person 49 | reporting on a fiscal year basis or a corporation reporting on either a cal- 50 | endar year or fiscal year basis and the amount of the tax exceeds four 51 | hundred dollars ($490.00), payment may be made in four equal install- 52 | ments: one-fourth at the time of filing the return, one-fourth on or before 53 | the fifteenth day of the third month following the month in which the re- 54 | turn was originally due to be filed, one-fourth on or before the fifteenth 55 | 56 | 10 57 | -------------------------------------------------------------------------------- /oer/sessionlawsresol1955nort_0057.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sessionlawsresol1955nort_0057.jpg -------------------------------------------------------------------------------- /oer/sessionlawsresol1955nort_0057_grayscale.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sessionlawsresol1955nort_0057_grayscale.jpg -------------------------------------------------------------------------------- /oer/sessionlawsresol1955nort_0057_inverted.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UNC-Libraries-data/OnTheBooks/18f8ba7d6c0007c0c5a7388fef61790fdfa758a3/oer/sessionlawsresol1955nort_0057_inverted.jpg -------------------------------------------------------------------------------- /workflow.md: -------------------------------------------------------------------------------- 1 | # *OnTheBooks* Workflow 2 | This page is meant to provide an overview of the workflow used to create the On the Books corpus. The workflow can be divided into seven major stages: 3 | 4 | 1. Data Acquisition 5 | 2. 
Marginalia Determination 6 | 3. Image Adjustment Recommendations 7 | 4. Optical Character Recognition (OCR) 8 | 5. Section Splitting & Cleaning 9 | 6. Analysis 10 | 7. XML Generation 11 | 12 | ## Data Acquisition 13 | During data acquisition, images and metadata were gathered through a combination of automatic downloads from the Internet Archive and manual metadata creation. 14 | 15 | First, digitized versions of the volumes were identified using the Internet Archive's advanced search interface. Using the metadata returned by this search, all images comprising the corpus were downloaded using [jp2_download.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/data_acquisition/jp2_download.py). Extraneous page images, such as blank pages or those containing tables of contents, were manually identified and deleted. 16 | 17 | Next, metadata such as law type (private laws, public laws, etc.) and original print page number were manually compiled for corpus images. These metadata were combined with other page-level metadata such as the leaf number (PDF page number) and page hand side (left or right), gathered from Internet Archive XML files using [xml_parser.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/data_acquisition/xml_parser.py). 18 | 19 | The products of this stage consisted of a curated set of all relevant image files as well as page-level metadata for all images in the corpus: 20 | * file name 21 | * leaf number 22 | * hand side 23 | * print page number 24 | * law type section 25 | * law type section title 26 | * Internet Archive image URL 27 | 28 | These items were compiled into a corpus-level document called 'xmljpegmerge.csv'. 29 | 30 | **Output File(s):** 31 | * *xmljpegmerge.csv* - .csv file with page-level metadata for the entire corpus 32 | 33 | ## Marginalia Determination 34 | Marginalia, which is text that serves as a finding aid, was printed in the corpus volumes prior to 1951.
The marginalia are not part of the laws and needed to be excluded from the OCR process, as did paratextual information from page headers and footers. The marginalia determination process identified the coordinates of the main text body to be OCR'd, along with the median page color, which was used to create a blank, color-neutral border around the main body text on each page; Tesseract OCR performs best when the text is not too close to the edge of the page. 35 | 36 | This step was accomplished using [marginalia_determination.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/marginalia/marginalia_determination.py) in concert with [cropfunctions.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/marginalia/cropfunctions.py). Detailed documentation for this step can be found [here](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/examples/marginalia_determination/marginalia_determination.ipynb). 37 | 38 | **Output File(s):** 39 | * *marginalia_metadata.csv* - .csv file containing main body text boundary coordinates and background color information for each page in the corpus 40 | 41 | ## Image Adjustment Recommendations 42 | Once the marginalia cropping information had been compiled, various image adjustments were tested for each volume to maximize OCR performance. A sample of images from each volume was tested using different values for a range of parameters (color, contrast, etc.). After the optimal image adjustments for each volume had been determined, they were stored for use during the following OCR stage. 43 | 44 | This step was accomplished using [adjRec.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/ocr/adjRec.py) in concert with [ocr_func.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/ocr/ocr_func.py).
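At its core, the adjustment recommendation described above amounts to a small grid search: apply each candidate combination of adjustment values to a sample of pages, score the OCR output, and keep the best-scoring combination. Below is a minimal sketch of that loop; `recommend_adjustments` and `ocr_confidence` are illustrative names and the parameter ranges are placeholders, not the project's actual values in adjRec.py. In the real pipeline the scorer would apply the adjustments to the image and run Tesseract.

```python
from itertools import product

def recommend_adjustments(sample_pages, ocr_confidence):
    """Grid-search brightness/contrast values and return the combination
    with the highest mean OCR confidence over a sample of pages.

    `ocr_confidence(page, brightness, contrast)` is a caller-supplied
    scorer; in the real pipeline it would adjust the image, run
    Tesseract, and return the mean word-level confidence.
    """
    brightness_vals = [0.8, 1.0, 1.2]  # illustrative candidate values
    contrast_vals = [1.0, 1.5, 2.0]    # illustrative candidate values

    best_score, best_params = float("-inf"), None
    for b, c in product(brightness_vals, contrast_vals):
        # Average the score across the sampled pages for this combination.
        score = sum(ocr_confidence(p, b, c) for p in sample_pages) / len(sample_pages)
        if score > best_score:
            best_score, best_params = score, {"brightness": b, "contrast": c}
    return best_params

if __name__ == "__main__":
    # Toy scorer: pretend confidence peaks at brightness 1.0, contrast 1.5.
    def toy_scorer(page, b, c):
        return 100 - 10 * abs(b - 1.0) - 5 * abs(c - 1.5)

    print(recommend_adjustments(["page1", "page2"], toy_scorer))
    # → {'brightness': 1.0, 'contrast': 1.5}
```

Mean word-level confidence is a convenient proxy for OCR quality because Tesseract reports it per token, but any scorer with the same signature could be swapped in.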
Detailed documentation for this step can be found [here](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/examples/adjustment_recommendation/adjRec.ipynb). 45 | 46 | **Output File(s):** 47 | * *adjustments.csv* - .csv file containing OCR-optimized image adjustment parameter values for each volume 48 | 49 | ## Optical Character Recognition (OCR) 50 | With the prerequisite files in place ("adjustments.csv", "marginalia_metadata.csv", and "xmljpegmerge.csv"), OCR was performed on each page of each volume to produce a series of output files. OCR output files were saved for each law type (public, private, public-local) and session (e.g. Private Laws of the State of North Carolina, Session 1891 saved as lawsresolutionso1891nort_private laws_data.tsv). 51 | 52 | This step was accomplished using [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki), accessed programmatically via the [pytesseract](https://pypi.org/project/pytesseract/) wrapper. The scripts involved in this stage were [ocr_use.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/ocr/ocr_use.py) and the functions contained in [ocr_func.py](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/code/ocr/ocr_func.py). Detailed documentation for this step can be found [here](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/examples/ocr/ocr_use.ipynb). 53 | 54 | **Output File(s):** 55 | * *(volume)_adjustments.txt* - stores the image adjustments used to perform OCR on that particular volume. One of these files was created for each physical volume. 56 | * *(volume)_(section).txt* - stores a compiled version of all OCR'd text for a given law type section. One of these files was created for each set of laws ("Public", "Private", etc.) found in each physical volume. 57 | * *(volume)_(section)_data.tsv* - a word-level .tsv file for a given section.
Each row in this file corresponds to an individual token (word) recorded by the OCR process, along with its page coordinates and a confidence value. One of these files was created for each set of laws ("Public", "Private", etc.) in each physical volume. 58 | 59 | ## Section Splitting & Cleaning 60 | After OCR was complete, each volume was 'split' into its constituent chapters and sections, with each section representing an individual law. This was accomplished using regular expression pattern matching on the word-level "(volume)_(section)_data.tsv" files produced in the previous step. Once initial assignments had been made, the corpus underwent a lengthy cleaning process that eliminated most section and chapter assignment errors and created a set of "aggregate" files in which all words were aggregated into their assigned sections (laws). 61 | 62 | This step was accomplished using the seven scripts located [here](https://github.com/UNC-Libraries-data/OnTheBooks/tree/main/code/split_cleanup) in combination with several rounds of manual review. Detailed documentation for this step can be found [here](https://github.com/UNC-Libraries-data/OnTheBooks/blob/main/examples/split_cleanup/split_cleanup.ipynb). 63 | 64 | **Output File(s):** 65 | * *(volume)_(section)_data.csv* - an updated version of the 'raw' output .tsv files created in the OCR step. One of these files was created for each set of laws ("Public", "Private", etc.) found in each physical volume. 66 | * *(volume)_(section)_aggregate_data.csv* - contains all volume text aggregated into sections (laws). One of these files was created for each set of laws ("Public", "Private", etc.) found in each physical volume. 67 | 68 | ## Analysis 69 | The analysis phase of the project involved both supervised and unsupervised learning methods. The purposes of this phase were twofold: 70 | 1. To use automated techniques to help explore and better understand the characteristics and composition of Jim Crow laws 71 | 2.
To provide an efficient means for expanding the collection of Jim Crow laws already identified by experts 72 | 73 | This phase utilized the aggregated versions of the laws compiled during the previous phase: "(volume)_(section)_aggregate_data.csv". 74 | 75 | Latent Dirichlet Allocation, an unsupervised method, was used to build a topic model for the laws. This analysis was conducted by team member Rucha Dalwadi and is detailed in her master’s paper ([Dalwadi 2020](https://doi.org/10.17615/tksc-t217)). 76 | 77 | Following the unsupervised work, Jim Crow laws were identified using "active" supervised classification. A training set was compiled by expert reviewers doing close reading, and a combination of preliminary classification runs and further expert review was used to expand this labeled training set. The resulting expanded training set was used to [perform classification on the entire corpus (script)](https://unc-libraries-data.github.io/OnTheBooks/code/classification/ModelSelection_v2.html). This allowed laws to be labeled as "Jim Crow" or "not Jim Crow" based on a predetermined probability threshold. 78 | 79 | This step was accomplished using [scikit-learn](https://scikit-learn.org/) and [XGBoost](https://xgboost.readthedocs.io/) to build and evaluate models. For text processing, [nltk](https://www.nltk.org/) was used. 80 | 81 | **Output File(s):** 82 | * *jim_crow_list.csv* - contains all laws identified as Jim Crow laws by expert reviewers, analytical models, or both. 83 | * *law_list.csv* - contains all laws in the corpus with all metadata accumulated from previous steps, along with each law's Jim Crow classification value and classification source (experts, models, or both). 84 | 85 | ## XML Generation 86 | Following the analysis phase, the corpus was prepared for dissemination from the [Carolina Digital Repository](https://doi.org/10.17615/5c4g-sd44). Each volume was enriched with metadata as XML.
Metadata files were merged using a unique identifier, then added to the corpus as XML elements and attributes. Python's [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html) API was used to generate the XML. A .xsd schema was then created that defines the information provided about each volume in the corpus, such as the volume title, year, and session name. The schema also describes the laws contained in each volume, such as law titles, types, and Jim Crow classifications. 87 | 88 | **Output File(s):** 89 | * *onthebooks.xsd* - the XML schema definition for all XML files in the corpus. 90 | * *(volume).xml* - contains metadata and content for all laws within a given volume, tagged according to the above schema. One of these files was created for each physical volume in the corpus. 91 | --------------------------------------------------------------------------------
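As a rough illustration of the XML generation stage described in workflow.md above, merged metadata rows can be turned into a tree with ElementTree along the following lines. The element and attribute names here are invented for the example and do not reflect the actual onthebooks.xsd schema.

```python
import xml.etree.ElementTree as ET

def build_volume_xml(volume_meta, laws):
    """Build an XML tree for one volume from merged metadata.

    `volume_meta` is a dict of volume-level fields; `laws` is a list of
    dicts with per-law metadata and text. Names are illustrative only.
    """
    root = ET.Element("volume", {
        "title": volume_meta["title"],
        "year": str(volume_meta["year"]),
        "session": volume_meta["session"],
    })
    for law in laws:
        # One <law> element per section (law), carrying its metadata as
        # attributes and its title and text as child elements.
        law_el = ET.SubElement(root, "law", {
            "chapter": str(law["chapter"]),
            "type": law["type"],
            "jimCrow": law["jim_crow"],  # classification value, e.g. "yes"/"no"
        })
        ET.SubElement(law_el, "title").text = law["title"]
        ET.SubElement(law_el, "text").text = law["text"]
    return ET.ElementTree(root)

if __name__ == "__main__":
    tree = build_volume_xml(
        {"title": "Session Laws 1955", "year": 1955, "session": "1955"},
        [{"chapter": 5, "type": "public", "jim_crow": "no",
          "title": "AN ACT TO REPEAL CHAPTER 501 OF THE SESSION LAWS OF 1953",
          "text": "Section 1. ..."}],
    )
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Each tree can then be serialized with `tree.write(path, encoding="utf-8", xml_declaration=True)`; note that ElementTree does not validate against an .xsd schema itself, so validation would require an external tool.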