├── LICENSE
├── XML Test Code
│   ├── lexicons.csv
│   ├── RMA-Tool-CSVOnly.py
│   ├── RMA-Tool.py
│   └── RMA-GUI.py
├── project notes
├── Code
│   ├── GUI-Documentation.md
│   ├── Test Versions
│   │   ├── RMA-Tool-1.0.py
│   │   ├── RMA-GUI-1.0.py
│   │   ├── RMA-GUI-2.0.py
│   │   └── RMA-GUI-2.5.py
│   ├── Past Versions
│   │   ├── MacOS-UsersGuide-2.5.md
│   │   ├── MaRMAT-CommandLine-2.5.py
│   │   └── MaRMAT-GUI-2.5.2.py
│   ├── lexicon-reparative-metadata.csv
│   ├── MarMAT-CommandLine-2.6.py
│   ├── example-output-lcsh-subject-lexicon.csv
│   ├── MaRMAT-GUI-2.5.3.py
│   ├── example-output-reparative-metadata-lexicon.csv
│   ├── lexicon-LCSH.csv
│   └── example-input-metadata.csv
└── README.md
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2024 J. Willard Marriott Library
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/XML Test Code/lexicons.csv:
--------------------------------------------------------------------------------
1 | Aggrandizement,RaceEuphemisms,RaceTerms,SlaveryTerms,GenderTerms,LGBTQ,MentalIllness,Disability
2 | acclaimed,color blind,aboriginal,abolition,matriarch,biologically female,brain damaged,able-bodied
3 | ambitious,colored,aboriginals,abolitionist,miss,biologically male,committed suicide,birth defect
4 | celebrated,coloured,aborigines,antislavery,mistress,dyke,crazy,confined to a wheelchair
5 | distinguished,negro,afro-american,anti-slavery,mrs.,fag,dumb,cripple
6 | eminent,race relations,aliens,bill of sale,muse,gay lifestyle,emotionally disturbed,crippled
7 | esteemed,race situation,arab,bills of sale,patriarch,gays,insane,deaf-mute
8 | expert,race-based,arabs,enslaved,spouse,homosexual,retarded,deformed
9 | famous,racial,asians,freed slave,wife,homosexuality,slow learner,disabled person
10 | father of,racism,asiatic,freed slaves,,lesbianism,special needs,dwarf
11 | foremost,riot,blacks,freedman,,sexual minorities,stupid,epileptic
12 | founding father,troubles,bushman,freedmen,,sexual minority,,handicap
13 | genius,unruly,bushmen,fugitive slave,,sexual preference,,handicapped
14 | gentleman,,bushwoman,hired out,,tranny,,invalid
15 | important,,chink,hiring out,,transvestite,,lame
16 | influential,,civilized,manumission,,,,midget
17 | man of letters,,coolie,manumitted,,,,paraplegic
18 | masterpiece,,coolies,negro,,,,physically challenged
19 | notable,,creole,overseer,,,,the deaf
20 | patriot,,creoles,plantation,,,,the disabled
21 | pioneer,,ethnic,planter,,,,wheelchair-bound
22 | plantation owner,,exotic,runaway slave,,,,
23 | planter,,gook,runaway slaves,,,,
24 | preeminent,,gypsies,slave,,,,
25 | prestigious,,gypsy,slave holder,,,,
26 | prolific,,hispanics,slave master,,,,
27 | prominent,,illegal alien,slave owner,,,,
28 | renowned,,illegal aliens,slave revolt,,,,
29 | respected,,illegal immigrant,slaveholder,,,,
30 | revolutionary,,illegal immigrants,slavery,,,,
31 | seminal,,illegals,slaves,,,,
32 | successful,,indian,,,,,
33 | wealthy,,indians,,,,,
34 | ,,japs,,,,,
35 | ,,mammy,,,,,
36 | ,,mulatto,,,,,
37 | ,,mulattoes,,,,,
38 | ,,mulattos,,,,,
39 | ,,native americans,,,,,
40 | ,,natives,,,,,
41 | ,,negro,,,,,
42 | ,,negroes,,,,,
43 | ,,negros,,,,,
44 | ,,oriental,,,,,
45 | ,,primitive people,,,,,
46 | ,,primitives,,,,,
47 | ,,pygmies,,,,,
48 | ,,pygmy,,,,,
49 | ,,sambo,,,,,
50 | ,,savages,,,,,
51 | ,,segregated,,,,,
52 | ,,squaw,,,,,
53 | ,,squaws,,,,,
54 | ,,uncivilized,,,,,
55 | 
--------------------------------------------------------------------------------
/project notes:
--------------------------------------------------------------------------------
1 | Here are our project notes from Friday, April 19, 2024
2 | 
3 | 
4 | Project notes from working meeting:
5 | 
6 | We continued to look at issues with harvesting OAI and converting XML to CSV
7 | We decided to focus only on downloaded CSV and TSV files
8 | 
9 | Kaylee explored writing something that would help Rachel convert TSV files, or is this done in Open Refine? Does this need to be revisited?
10 | We had discussions about supplementing the lexicon.
11 | 
12 | Kaylee wrote code and Rachel tested it on her PC
13 | One issue raised was the idea of searching for phrases, for example "biological male"
14 | We also thought of supplementing the lexicon with lists of LCSH terms that were outdated, in addition to terms that were problematic.
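One possible way to handle whole phrases (a sketch only, not yet in the tool; the term and text below are placeholder values) is a regex search with word boundaries, which matches multi-word terms and avoids partial-word hits:

    import re

    term = "biological male"  # placeholder lexicon phrase
    text = "Portrait of a biological male athlete"  # placeholder metadata text
    # \b word boundaries keep substrings from matching, e.g. "riot" inside "Marriott"
    if re.search(r"\b" + re.escape(term.lower()) + r"\b", text.lower()):
        print("match")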
15 | We tried to find ways of getting automated lists of changed LCSH subject headings and did not come up with a good solution.
16 | Rachel and Anna are copying and pasting lists of changes into Excel and adjusting them in order to come up with these terms from classweb:
17 | https://classweb.org/approved-subjects/ (Rachel to do from 2011-2016, Anna to work on 2017-2024)
18 | 
19 | At the close of day, there is a working script, with some issues - tokenizing on parts of words causes the "riot" in "Marriott" to appear.
20 | 
21 | Need to change the find-matches section so the script looks for whole words only in order to resolve this
22 | 
23 | Kaylee cleared out the riots as a closing activity today! WOO!
24 | 
25 | 
26 | May 1, 2024
27 | 
28 | We reviewed current progress. Rachel tested the GUI, it did not work
29 | Anna prepared Aileen H. Clyde sample metadata, it is in the project Box folder
30 | Kaylee notes that we can add more information describing the project in advance of our next meeting
31 | 
32 | June 6
33 | 
34 | We are going to do CSV only
35 | We are still considering the progress bar
36 | Anna will review documentation and see if there is anything additional to add about the requirement for CSV files
37 | 
38 | Rachel will finalize lexicons
39 | 
40 | Kaylee will add an additional output column for original context
41 | 
42 | We discussed the issues with matching against the problem LCSH lexicon
43 | Kaylee thinks that the processing step where the metadata gets stripped of punctuation is removing the subdivisions between LCSH headings and subheadings
44 | This might be making the matching process against the LCSH lexicon not function well.
45 | Anna asked whether adding intermediate steps that print out some of the behind-the-scenes processing would be useful
46 | 
47 | Maybe we need a separate tool that could process digital library LCSH against an LCSH lexicon
48 | 
49 | Also maybe take only headings, not subheadings, from the LCSH changes in order to pick up broad matches
50 | 
--------------------------------------------------------------------------------
/Code/GUI-Documentation.md:
--------------------------------------------------------------------------------
1 | # Marriott Reparative Metadata Assessment Tool (MaRMAT) GUI
2 | 
3 | The MaRMAT GUI is a graphical application built using Tkinter in Python. This tool allows users to match terms from a problematic-terms lexicon file against text data from a collections metadata file, facilitating metadata cleanup and analysis.
4 | 
5 | ## Overview
6 | 
7 | The application provides the following functionalities:
8 | 
9 | - Load CSV files for both lexicon and metadata.
10 | - Select specific columns from the metadata for analysis.
11 | - Choose an identifier column in the metadata to relate back to the original dataset.
12 | - Select categories of terms from the lexicon for searching.
13 | - Perform matching to find terms in selected metadata columns and export results to a CSV file.
14 | 
15 | ## Features
16 | 
17 | - **User Interface**: Utilizes Tkinter for a GUI interface.
18 | - **File Loading**: Supports loading CSV files for lexicon and metadata.
19 | - **Column Selection**: Allows users to choose specific columns from metadata for term analysis.
20 | - **Identifier Selection**: Enables selection of an identifier column for linking matched terms back to the original metadata.
21 | - **Category Selection**: Provides options to select categories of terms from the lexicon for matching.
22 | - **Matching Process**: Performs regex-based term matching across selected metadata columns and chosen lexicon categories, as sketched below.
23 | - **Output**: Exports matched data to a CSV file for further analysis or use.
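
Under the hood, the matching step works roughly like the sketch below (a simplified, self-contained version of the command-line scripts in this repository; the file names and the "Title"/"Description" columns are placeholders):

```python
import re
import pandas as pd

lexicon = pd.read_csv("lexicon.csv")    # expects "term" and "category" columns
metadata = pd.read_csv("metadata.csv")  # expects an "Identifier" column

matches = []
for _, row in metadata.iterrows():
    for col in ["Title", "Description"]:  # columns you would select in the GUI
        if isinstance(row[col], str):
            for term, category in zip(lexicon["term"], lexicon["category"]):
                # \b word boundaries ensure whole-term matches only
                if re.search(r"\b" + re.escape(term.lower()) + r"\b", row[col].lower()):
                    matches.append((row["Identifier"], term, category, col))

pd.DataFrame(matches, columns=["Identifier", "Term", "Category", "Column"]).to_csv("matches.csv", index=False)
```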
24 | 
25 | ## Getting Started
26 | 
27 | To use the MaRMAT GUI, follow these steps:
28 | 
29 | 1. Download the [RMA-GUI-2.52.py](https://github.com/kayleealexander/RMA-Tool/blob/main/Code/RMA-GUI-2.52.py) file.
30 | 2. Download one of our sample lexicons from the [Code](https://github.com/kayleealexander/RMA-Tool/tree/main/Code) folder, or create your own.
31 | 3. Download the metadata you want to assess as a CSV file.
32 | 4. Run the [RMA-GUI-2.52.py](https://github.com/kayleealexander/RMA-Tool/blob/main/Code/RMA-GUI-2.52.py) file and follow the prompts.
33 | 
34 | ## Using the Tool
35 | 
36 | **1. Load Lexicon and Metadata**:
37 |    - Follow on-screen instructions to load your lexicon and metadata CSV files using the provided buttons.
38 | 
39 | **2. Perform Analysis**:
40 |    - Select columns from your metadata for analysis.
41 |    - Choose an identifier column for matching results back to the original dataset.
42 |    - Select categories of terms from the lexicon for analysis.
43 |    - Click "Perform Matching" to find matches and export the results as a CSV file.
44 | 
45 | ## Additional Notes
46 | 
47 | **Dependencies**: Ensure you have Python 3.x and the `pandas` library installed as per the installation instructions.
48 | 
49 | ## Contact
50 | 
51 | For any questions or support, please contact [Kaylee Alexander](mailto:kaylee.alexander@utah.edu).
52 | 
--------------------------------------------------------------------------------
/Code/Test Versions/RMA-Tool-1.0.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import string
3 | import re
4 | 
5 | def load_lexicon(file_path):
6 |     try:
7 |         # Load the lexicon CSV file into a DataFrame
8 |         lexicon_df = pd.read_csv(file_path, encoding='latin1')
9 |         return lexicon_df
10 |     except FileNotFoundError:
11 |         print("File not found. Please provide a valid file path.")
12 |         return None
13 |     except Exception as e:
14 |         print("An error occurred:", e)
15 |         return None
16 | 
17 | def load_metadata(file_path):
18 |     try:
19 |         # Load the metadata CSV file into a DataFrame
20 |         metadata_df = pd.read_csv(file_path, encoding='latin1')
21 | 
22 |         # Remove punctuation from specified columns
23 |         punctuation_table = str.maketrans('', '', string.punctuation)
24 |         metadata_df['Title'] = metadata_df['Title'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x)
25 |         metadata_df['Description'] = metadata_df['Description'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x)
26 |         metadata_df['Collection Name'] = metadata_df['Collection Name'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x)
27 | 
28 |         return metadata_df
29 |     except FileNotFoundError:
30 |         print("File not found. 
Please provide a valid file path.") 31 | return None 32 | except Exception as e: 33 | print("An error occurred:", e) 34 | return None 35 | 36 | def find_matches(lexicon_df, metadata_df): 37 | matches = [] 38 | # Iterate over each row in the metadata DataFrame 39 | for index, row in metadata_df.iterrows(): 40 | # Process the text in each specified column 41 | for col in ['Title', 'Description', 'Subject', 'Collection Name']: 42 | # Check if the value in the column is a string 43 | if isinstance(row[col], str): 44 | # Iterate over each term in the lexicon and check for matches 45 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 46 | # Check if the whole term exists in the text column 47 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 48 | matches.append((row['Identifier'], term, category, col)) 49 | return matches 50 | 51 | # Example usage 52 | lexicon_file_path = "lexicon.csv" # Replace with the path to your lexicon CSV file 53 | metadata_file_path = "metadata.csv" # Replace with the path to your metadata CSV file 54 | output_file_path = "matches.csv" # Path to the output matches CSV 55 | 56 | lexicon = load_lexicon(lexicon_file_path) 57 | metadata = load_metadata(metadata_file_path) 58 | 59 | # Perform matching 60 | if lexicon is not None and metadata is not None: 61 | matches = find_matches(lexicon, metadata) 62 | # Create DataFrame from matches 63 | matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column']) 64 | # Merge matches with original metadata using left join on "Identifier" 65 | merged_df = pd.merge(metadata, matches_df, on="Identifier", how="left") 66 | # Filter out rows without matches 67 | merged_df = merged_df.dropna(subset=['Term']) 68 | # Save merged DataFrame to CSV 69 | merged_df.to_csv(output_file_path, index=False) 70 | 71 | print("Merged data saved to:", output_file_path) 72 | -------------------------------------------------------------------------------- /Code/Test Versions/RMA-GUI-1.0.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog 3 | from tkinter import messagebox 4 | import pandas as pd 5 | import string 6 | import re 7 | 8 | def load_lexicon(file_path): 9 | try: 10 | lexicon_df = pd.read_csv(file_path, encoding='latin1') 11 | return lexicon_df 12 | except FileNotFoundError: 13 | messagebox.showerror("Error", "File not found. Please provide a valid file path.") 14 | return None 15 | except Exception as e: 16 | messagebox.showerror("Error", f"An error occurred: {e}") 17 | return None 18 | 19 | def load_metadata(file_path): 20 | try: 21 | metadata_df = pd.read_csv(file_path, encoding='latin1') 22 | punctuation_table = str.maketrans('', '', string.punctuation) 23 | metadata_df['Title'] = metadata_df['Title'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x) 24 | metadata_df['Description'] = metadata_df['Description'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x) 25 | metadata_df['Collection Name'] = metadata_df['Collection Name'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x) 26 | return metadata_df 27 | except FileNotFoundError: 28 | messagebox.showerror("Error", "File not found. 
Please provide a valid file path.") 29 | return None 30 | except Exception as e: 31 | messagebox.showerror("Error", f"An error occurred: {e}") 32 | return None 33 | 34 | def find_matches(lexicon_df, metadata_df): 35 | matches = [] 36 | for index, row in metadata_df.iterrows(): 37 | for col in ['Title', 'Description', 'Subject', 'Collection Name']: 38 | if isinstance(row[col], str): 39 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 40 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 41 | matches.append((row['Identifier'], term, category, col)) 42 | return matches 43 | 44 | def execute_matching(): 45 | lexicon_file_path = lexicon_entry.get() 46 | metadata_file_path = metadata_entry.get() 47 | output_file_path = output_entry.get() 48 | 49 | lexicon = load_lexicon(lexicon_file_path) 50 | metadata = load_metadata(metadata_file_path) 51 | 52 | if lexicon is not None and metadata is not None: 53 | matches = find_matches(lexicon, metadata) 54 | matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column']) 55 | merged_df = pd.merge(metadata, matches_df, on="Identifier", how="left") 56 | merged_df = merged_df.dropna(subset=['Term']) 57 | merged_df.to_csv(output_file_path, index=False) 58 | messagebox.showinfo("Success", "Matching process completed. Output saved successfully.") 59 | 60 | # GUI 61 | root = tk.Tk() 62 | root.title("Lexicon Matcher") 63 | 64 | # Lexicon file path entry 65 | lexicon_label = tk.Label(root, text="Lexicon File Path:") 66 | lexicon_label.grid(row=0, column=0, padx=5, pady=5) 67 | lexicon_entry = tk.Entry(root, width=50) 68 | lexicon_entry.grid(row=0, column=1, padx=5, pady=5) 69 | lexicon_button = tk.Button(root, text="Browse", command=lambda: lexicon_entry.insert(tk.END, filedialog.askopenfilename())) 70 | lexicon_button.grid(row=0, column=2, padx=5, pady=5) 71 | 72 | # Metadata file path entry 73 | metadata_label = tk.Label(root, text="Metadata File Path:") 74 | metadata_label.grid(row=1, column=0, padx=5, pady=5) 75 | metadata_entry = tk.Entry(root, width=50) 76 | metadata_entry.grid(row=1, column=1, padx=5, pady=5) 77 | metadata_button = tk.Button(root, text="Browse", command=lambda: metadata_entry.insert(tk.END, filedialog.askopenfilename())) 78 | metadata_button.grid(row=1, column=2, padx=5, pady=5) 79 | 80 | # Output file path entry 81 | output_label = tk.Label(root, text="Output File Path:") 82 | output_label.grid(row=2, column=0, padx=5, pady=5) 83 | output_entry = tk.Entry(root, width=50) 84 | output_entry.grid(row=2, column=1, padx=5, pady=5) 85 | output_button = tk.Button(root, text="Browse", command=lambda: output_entry.insert(tk.END, filedialog.asksaveasfilename(defaultextension=".csv"))) 86 | output_button.grid(row=2, column=2, padx=5, pady=5) 87 | 88 | # Execute button 89 | execute_button = tk.Button(root, text="Execute Matching", command=execute_matching) 90 | execute_button.grid(row=3, column=1, padx=5, pady=5) 91 | 92 | root.mainloop() 93 | -------------------------------------------------------------------------------- /XML Test Code/RMA-Tool-CSVOnly.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import nltk 3 | import string 4 | from nltk.tokenize import word_tokenize 5 | from nltk.corpus import stopwords 6 | 7 | # Download NLTK resources if not already downloaded 8 | nltk.download('punkt') 9 | nltk.download('stopwords') 10 | 11 | def load_lexicon_from_csv(file_path): 12 | """ 13 | Loads lexicon categories and terms from 
a CSV file into a dictionary.
14 | 
15 |     Parameters:
16 |     - file_path (str): File path of the CSV input file.
17 | 
18 |     Returns:
19 |     - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values.
20 | 
21 |     Note:
22 |     - The CSV file should have lexicon categories as column headers and terms listed under each category.
23 |     """
24 | 
25 |     lexicon = {
26 |         "Aggrandizement": [],
27 |         "RaceEuphemisms": [],
28 |         "RaceTerms": [],
29 |         "SlaveryTerms": [],
30 |         "GenderTerms": [],
31 |         "LGBTQ": [],
32 |         "MentalIllness": [],
33 |         "Disability": []
34 |     }
35 | 
36 |     with open(file_path, 'r', encoding='utf-8-sig') as csv_file:
37 |         csv_reader = csv.reader(csv_file)
38 |         next(csv_reader)  # Skip the header row
39 | 
40 |         for row in csv_reader:
41 |             for i, category in enumerate(lexicon):  # Each CSV column holds the terms for one category
42 |                 if i < len(row) and row[i]: lexicon[category].append(row[i])  # Append only non-empty cells (avoids cross-category bleed from extending every list with the whole row)
43 | 
44 |     return lexicon
45 | 
46 | def search_and_append_lexicon_category(lexicon, input_csv_file, output_csv_file):
47 |     """
48 |     Searches for lexicon term matches in an input CSV file, appends lexicon categories to each row, and writes the modified data into an output CSV file.
49 | 
50 |     Parameters:
51 |     - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values.
52 |     - input_csv_file (str): File path of the input CSV file.
53 |     - output_csv_file (str): File path of the output CSV file.
54 | 
55 |     Note:
56 |     - The input CSV file should contain columns specified for lexicon analysis.
57 |     - The output CSV file will have additional columns for each lexicon category, indicating the matched terms.
58 |     """
59 | 
60 |     # Load lexicon
61 |     lexicon = load_lexicon_from_csv(lexicon)
62 | 
63 |     # Open input CSV file for reading and output CSV file for writing
64 |     with open(input_csv_file, 'r', newline='', encoding='utf-8') as input_csv, \
65 |          open(output_csv_file, 'w', newline='', encoding='utf-8') as output_csv:
66 | 
67 |         reader = csv.DictReader(input_csv)
68 |         fieldnames = reader.fieldnames + list(lexicon.keys())  # Add lexicon category names as additional columns
69 |         writer = csv.DictWriter(output_csv, fieldnames=fieldnames)
70 |         writer.writeheader()
71 | 
72 |         # Iterate over rows in the input CSV file
73 |         for row in reader:
74 |             # Initialize dictionary to store token matches for each lexicon category
75 |             token_matches = {category: [] for category in lexicon}
76 | 
77 |             # Tokenize and preprocess text from specified columns
78 |             for column in ["Title", "Subject", "Description", "Collection Name"]:
79 |                 text = row[column]
80 |                 if text:
81 |                     tokens = word_tokenize(text.lower())
82 |                     filtered_tokens = [word for word in tokens if word not in stopwords.words('english') and word not in string.punctuation and not word.isdigit() and word != '--']
83 |                     # Search for matches between tokens and terms in the lexicon (single tokens only, so multi-word lexicon terms will not match here)
84 |                     for category, terms in lexicon.items():
85 |                         matches = [term for term in filtered_tokens if term in terms]
86 |                         token_matches[category].extend(matches)
87 | 
88 |             # Update the row with token matches for each lexicon category
89 |             row.update(token_matches)
90 |             # Write the modified row to the output CSV file
91 |             writer.writerow(row)
92 | 
93 | # File paths
94 | lexicon_file_path = "PATH_TO_LEXICON_CSV_FILE"  # Insert path to your lexicon CSV file
95 | input_csv_file_path = "PATH_TO_INPUT_CSV_FILE"  # Insert path to your input CSV file
96 | output_csv_file_path = "PATH_TO_OUTPUT_CSV_FILE"  # Insert path to desired output CSV file
97 | 
98 | # Search for matches, append lexicon 
categories, and write to output CSV 99 | search_and_append_lexicon_category(lexicon_file_path, input_csv_file_path, output_csv_file_path) 100 | 101 | print("Lexicon matching and appending completed.") 102 | -------------------------------------------------------------------------------- /Code/Past Versions/MacOS-UsersGuide-2.5.md: -------------------------------------------------------------------------------- 1 | # Comprehensive Guide for Running MaRMAT in Terminal on a Mac 2 | 3 | ## 1. Prerequisites 4 | 5 | ### 1.1 **Python Installation**: 6 | - Ensure Python is installed by running the following in Terminal: 7 | 8 | ```bash 9 | python3 --version 10 | ``` 11 | 12 | - If Python is not installed, download it from the [official Python website](https://www.python.org/downloads/). 13 | 14 | ### 1.2 **Library Requirements**: 15 | - **Pandas**: Install the `pandas` library: 16 | 17 | ```bash 18 | pip3 install pandas 19 | ``` 20 | 21 | - **Regular Expression (`re`) Module**: This module is part of Python’s standard library. Confirm its availability: 22 | 23 | ```bash 24 | python3 -c "import re; print('re module is available')" 25 | ``` 26 | 27 | ## 2. Step-by-Step Instructions 28 | 29 | ### 2.1. **Save the Script** 30 | - Ensure the [MaRMAT-CommandLine-2.5.py](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-CommandLine-2.5.py) script is saved on your Mac. 31 | - Add the lexicon (e.g., [Reparative Metadata](https://github.com/marriott-library/MaRMAT/blob/main/Code/reparative-metadata-lexicon.csv), [LCSH](https://github.com/marriott-library/MaRMAT/blob/main/Code/LCSH-lexicon.csv)) you'd like to use as well as the metadata file you want to analyze to the same folder. 32 | 33 | ### 2.2. **Opening the Script for Editing with TextEdit** 34 | 35 | - **Locate the Script**: 36 | - Open Finder and navigate to the directory where the script is saved (e.g., `Documents`, `Downloads`). 37 | 38 | - **Open with TextEdit**: 39 | - Right-click on the script file (`MaRMAT-CommandLine-2.5.py`) and select **Open With > TextEdit**. 40 | - If you don’t see TextEdit, choose **Other...** and select TextEdit (or another text editor) from the list. 41 | 42 | - **Edit the Script**: 43 | - In TextEdit, find and modify the sections below according to your specific file paths and requirements. Note: These sections are all at the very end of the script under "# Example usage" (you can use `command` + `F` to search for "Example usage" to quickly find this). 44 | 45 | - **Load Lexicon**: 46 | 47 | ```python 48 | tool.load_lexicon("/path/to/your/lexicon.csv") # Replace with the path to your lexicon CSV file. 49 | ``` 50 | 51 | - **Load Metadata**: 52 | 53 | ```python 54 | tool.load_metadata("/path/to/your/metadata.csv") # Replace with the path to your metadata CSV file. 55 | ``` 56 | 57 | - **Select Columns for Matching**: 58 | 59 | ```python 60 | tool.select_columns(["Column1", "Column2"]) # Replace with the metadata column names you want to analyze. 61 | ``` 62 | 63 | - **Select Identifier Column**: 64 | 65 | ```python 66 | tool.select_identifier_column("Identifier") # Replace with the name of your identifier column (e.g., a record ID number). 67 | ``` 68 | 69 | - **Select Categories for Matching**: 70 | 71 | ```python 72 | tool.select_categories(["RaceTerms"]) # Replace with the categories from the lexicon that you want to search for. 
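     # Multiple categories can be searched at once; for example (illustrative
     # names -- they must match values in your lexicon's "category" column):
     # tool.select_categories(["RaceTerms", "GenderTerms"])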
73 |      ```
74 | 
75 |    - **Perform Matching and View Results**:
76 | 
77 |      ```python
78 |      tool.perform_matching("/path/to/your/output.csv") # Replace with the path to your output file.
79 |      ```
80 | 
81 |    - Save your changes by clicking **File > Save** or pressing `Command + S`.
82 | 
83 | - **Ensure Proper TextEdit Settings**:
84 |    - If TextEdit opens the file in **Rich Text Format (RTF)**, change it to **Plain Text** by selecting **Format > Make Plain Text** from the menu. This ensures the script runs correctly.
85 | 
86 | ### 2.3. **Running the Script**
87 | 
88 | - **Open Terminal**:
89 | 
90 | - **Navigate to the Script’s Directory**:
91 |    - Use the `cd` command to go to the directory where your script is located, for example:
92 | 
93 |    ```bash
94 |    cd ~/Documents
95 |    ```
96 | 
97 | - **Execute the Script**:
98 |    - Run the script using the following command:
99 | 
100 |    ```bash
101 |    python3 MaRMAT-CommandLine-2.5.py
102 |    ```
103 | 
104 | ## 3. Additional Considerations
105 | 
106 | - **Full Paths**: Always use the full path for files if they are not in the same directory as the script.
107 | - **Output Files**: Ensure the output directory specified in the script has the appropriate permissions to save the results.
108 | 
--------------------------------------------------------------------------------
/Code/lexicon-reparative-metadata.csv:
--------------------------------------------------------------------------------
1 | term,category
2 | acclaimed,Aggrandizement
3 | ambitious,Aggrandizement
4 | celebrated,Aggrandizement
5 | distinguished,Aggrandizement
6 | eminent,Aggrandizement
7 | esteemed,Aggrandizement
8 | expert,Aggrandizement
9 | famous,Aggrandizement
10 | father of,Aggrandizement
11 | foremost,Aggrandizement
12 | founding father,Aggrandizement
13 | genius,Aggrandizement
14 | gentleman,Aggrandizement
15 | important,Aggrandizement
16 | influential,Aggrandizement
17 | man of letters,Aggrandizement
18 | masterpiece,Aggrandizement
19 | notable,Aggrandizement
20 | patriot,Aggrandizement
21 | pioneer,Aggrandizement
22 | plantation owner,Aggrandizement
23 | preeminent,Aggrandizement
24 | prestigious,Aggrandizement
25 | prolific,Aggrandizement
26 | prominent,Aggrandizement
27 | renowned,Aggrandizement
28 | respected,Aggrandizement
29 | revolutionary,Aggrandizement
30 | seminal,Aggrandizement
31 | successful,Aggrandizement
32 | wealthy,Aggrandizement
33 | able-bodied,Disability
34 | birth defect,Disability
35 | confined to a wheelchair,Disability
36 | cripples,Disability
37 | cripple,Disability
38 | crippled,Disability
39 | deaf-mute,Disability
40 | deformed,Disability
41 | disabled person,Disability
42 | dwarf,Disability
43 | dwarfs,Disability
44 | epileptic,Disability
45 | handicap,Disability
46 | handicapped,Disability
47 | invalid,Disability
48 | invalids,Disability
49 | lame,Disability
50 | midget,Disability
51 | paraplegic,Disability
52 | physically challenged,Disability
53 | the deaf,Disability
54 | the disabled,Disability
55 | wheelchair-bound,Disability
56 | matriarch,Gender
57 | miss,Gender
58 | mistress,Gender
59 | mrs.,Gender
60 | muse,Gender
61 | patriarch,Gender
62 | spouse,Gender
63 | wife,Gender
64 | wives,Gender
65 | biologically female,LGBTQ
66 | biologically male,LGBTQ
67 | dyke,LGBTQ
68 | fag,LGBTQ
69 | gay lifestyle,LGBTQ
70 | gays,LGBTQ
71 | homosexual,LGBTQ
72 | homosexuality,LGBTQ
73 | lesbianism,LGBTQ
74 | sexual minorities,LGBTQ
75 | sexual minority,LGBTQ
76 | sexual preference,LGBTQ
77 | tranny,LGBTQ
78 | transvestite,LGBTQ
79 | brain damaged,MentalIllness
committed suicide,MentalIllness 81 | crazy,MentalIllness 82 | dumb,MentalIllness 83 | emotionally disturbed,MentalIllness 84 | insane,MentalIllness 85 | retarded,MentalIllness 86 | slow learner,MentalIllness 87 | special needs,MentalIllness 88 | stupid,MentalIllness 89 | color blind,RaceEuphemisms 90 | colored,RaceEuphemisms 91 | coloured,RaceEuphemisms 92 | race relations,RaceEuphemisms 93 | race situation,RaceEuphemisms 94 | race-based,RaceEuphemisms 95 | racial,RaceEuphemisms 96 | racism,RaceEuphemisms 97 | riot,RaceEuphemisms 98 | troubles,RaceEuphemisms 99 | unruly,RaceEuphemisms 100 | aboriginal,Race 101 | aboriginals,Race 102 | aborigines,Race 103 | afro-american,Race 104 | aliens,Race 105 | arab,Race 106 | arabs,Race 107 | asians,Race 108 | asiatic,Race 109 | blacks,Race 110 | bushman,Race 111 | bushmen,Race 112 | bushwoman,Race 113 | chink,Race 114 | civilized,Race 115 | coolie,Race 116 | coolies,Race 117 | creole,Race 118 | creoles,Race 119 | ethnic,Race 120 | exotic,Race 121 | gook,Race 122 | gypsies,Race 123 | gypsy,Race 124 | hispanics,Race 125 | illegal alien,Race 126 | illegal aliens,Race 127 | illegal immigrant,Race 128 | illegal immigrants,Race 129 | illegals,Race 130 | indian,Race 131 | indians,Race 132 | japs,Race 133 | mammy,Race 134 | mulatto,Race 135 | mulattoes,Race 136 | mulattos,Race 137 | Native Americans,Race 138 | natives,Race 139 | negroes,Race 140 | negros,Race 141 | oriental,Race 142 | primitive people,Race 143 | primitives,Race 144 | pygmies,Race 145 | pygmy,Race 146 | sambo,Race 147 | savages,Race 148 | segregated,Race 149 | squaw,Race 150 | squaws,Race 151 | uncivilized,Race 152 | lamenite,Race 153 | abolition,Slavery 154 | abolitionist,Slavery 155 | antislavery,Slavery 156 | anti-slavery,Slavery 157 | bill of sale,Slavery 158 | bills of sale,Slavery 159 | enslaved,Slavery 160 | freed slave,Slavery 161 | freed slaves,Slavery 162 | freedman,Slavery 163 | freedmen,Slavery 164 | fugitive slave,Slavery 165 | hired out,Slavery 166 | hiring out,Slavery 167 | manumission,Slavery 168 | manumitted,Slavery 169 | overseer,Slavery 170 | plantation,Slavery 171 | runaway slave,Slavery 172 | runaway slaves,Slavery 173 | slave,Slavery 174 | slave holder,Slavery 175 | slave master,Slavery 176 | slave owner,Slavery 177 | slave revolt,Slavery 178 | slaveholder,Slavery 179 | slavery,Slavery 180 | slaves,Slavery 181 | evacuate,USJapaneseIncarceration 182 | evacuation,USJapaneseIncarceration 183 | evacuees,USJapaneseIncarceration 184 | evacuee,USJapaneseIncarceration 185 | relocation,USJapaneseIncarceration 186 | internment,USJapaneseIncarceration 187 | assembly center,USJapaneseIncarceration 188 | relocation center,USJapaneseIncarceration 189 | non-aliens,USJapaneseIncarceration 190 | native American aliens,USJapaneseIncarceration 191 | civilian exclusion orders,USJapaneseIncarceration 192 | relocate,USJapaneseIncarceration 193 | relocation,USJapaneseIncarceration 194 | -------------------------------------------------------------------------------- /Code/Past Versions/MaRMAT-CommandLine-2.5.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import re 3 | 4 | class MaRMAT: 5 | """A tool for assessing metadata and identifying matches based on a provided lexicon.""" 6 | 7 | def __init__(self): 8 | """Initialize the assessment tool.""" 9 | self.lexicon_df = None 10 | self.metadata_df = None 11 | self.columns = [] # List of all available columns in the metadata 12 | self.categories = [] # List of all available 
categories in the lexicon
13 |         self.selected_columns = []  # List of columns selected for matching
14 |         self.identifier_column = None  # Identifier column used to uniquely identify rows
15 | 
16 |     def load_lexicon(self, file_path):
17 |         """Load the lexicon file.
18 | 
19 |         Parameters:
20 |             file_path (str): Path to the lexicon CSV file.
21 | 
22 |         """
23 |         try:
24 |             self.lexicon_df = pd.read_csv(file_path, encoding='latin1')
25 |             print("Lexicon loaded successfully.")
26 |         except Exception as e:
27 |             print(f"An error occurred while loading lexicon: {e}")
28 | 
29 |     def load_metadata(self, file_path):
30 |         """Load the metadata file.
31 | 
32 |         Parameters:
33 |             file_path (str): Path to the metadata CSV file.
34 | 
35 |         """
36 |         try:
37 |             self.metadata_df = pd.read_csv(file_path, encoding='latin1')
38 |             print("Metadata loaded successfully.")
39 |         except Exception as e:
40 |             print(f"An error occurred while loading metadata: {e}")
41 | 
42 |     def select_columns(self, columns):
43 |         """Select columns from the metadata for matching.
44 | 
45 |         Parameters:
46 |             columns (list of str): List of column names in the metadata.
47 | 
48 |         """
49 |         self.selected_columns = columns
50 | 
51 |     def select_identifier_column(self, column):
52 |         """Select the identifier column used for uniquely identifying rows.
53 | 
54 |         Parameters:
55 |             column (str): Name of the identifier column in the metadata.
56 | 
57 |         """
58 |         self.identifier_column = column
59 | 
60 |     def select_categories(self, categories):
61 |         """Select categories from the lexicon for matching.
62 | 
63 |         Parameters:
64 |             categories (list of str): List of category names in the lexicon.
65 | 
66 |         """
67 |         self.categories = categories
68 | 
69 |     def perform_matching(self, output_file):
70 |         """Perform matching between selected columns and categories and save results to a CSV file.
71 | 
72 |         Parameters:
73 |             output_file (str): Path to the output CSV file to save matching results.
74 | 
75 |         """
76 |         if self.lexicon_df is None or self.metadata_df is None:
77 |             print("Please load lexicon and metadata files first.")
78 |             return
79 | 
80 |         matches = self.find_matches(self.selected_columns, self.categories)
81 |         matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column'])
82 |         print(matches_df)
83 | 
84 |         # Write results to CSV
85 |         try:
86 |             matches_df.to_csv(output_file, index=False)
87 |             print(f"Results saved to {output_file}")
88 |         except Exception as e:
89 |             print(f"An error occurred while saving results: {e}")
90 | 
91 |     def find_matches(self, selected_columns, selected_categories):
92 |         """Find matches between metadata and lexicon based on selected columns and categories.
93 | 
94 |         Parameters:
95 |             selected_columns (list of str): List of column names from metadata for matching.
96 |             selected_categories (list of str): List of category names from the lexicon for matching.
97 | 
98 |         Returns:
99 |             list of tuple: List of tuples containing matched results (Identifier, Term, Category, Column).
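            e.g., [("record-123", "riot", "RaceEuphemisms", "Title")] (illustrative values)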
100 | 101 | """ 102 | matches = [] 103 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 104 | for index, row in self.metadata_df.iterrows(): 105 | for col in selected_columns: 106 | if isinstance(row[col], str): 107 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 108 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 109 | matches.append((row[self.identifier_column], term, category, col)) 110 | return matches 111 | 112 | # Define output file path 113 | output_file = "matches.csv" # Input the file path where you want to save your matches here. 114 | 115 | # Example usage: 116 | print("1. Initialize the tool:") 117 | tool = MaRMAT() 118 | 119 | print("\n2. Load lexicon and metadata files:") 120 | tool.load_lexicon("lexicon.csv") # Input the path to your lexicon CSV file. 121 | tool.load_metadata("metadata.csv") # Input the path to your metadata CSV file. 122 | 123 | print("\n3. Select columns for matching:") 124 | tool.select_columns(["Column1", "Column2"]) # Input the name(s) of the metadata column(s) you want to analyze. 125 | 126 | print("\n4. Select the identifier column:") 127 | tool.select_identifier_column("Identifier") # Input the name of your identifier column (e.g., a record ID number). 128 | 129 | print("\n5. Select categories for matching:") 130 | tool.select_categories(["RaceTerms"]) # Input the categories from the lexicon that you want to search for. 131 | 132 | print("\n6. Perform matching and view results:") 133 | tool.perform_matching(output_file) 134 | -------------------------------------------------------------------------------- /Code/MarMAT-CommandLine-2.6.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import re 3 | 4 | class MaRMAT: 5 | """A tool for assessing metadata and identifying matches based on a provided lexicon.""" 6 | 7 | def __init__(self): 8 | """Initialize the assessment tool.""" 9 | self.lexicon_df = None 10 | self.metadata_df = None 11 | self.columns = [] # List of all available columns in the metadata 12 | self.categories = [] # List of all available categories in the lexicon 13 | self.selected_columns = [] # List of columns selected for matching 14 | self.identifier_column = None # Identifier column used to uniquely identify rows 15 | 16 | def load_lexicon(self, file_path): 17 | """Load the lexicon file. 18 | 19 | Parameters: 20 | file_path (str): Path to the lexicon CSV file. 21 | 22 | """ 23 | try: 24 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 25 | print("Lexicon loaded successfully.") 26 | except Exception as e: 27 | print(f"An error occurred while loading lexicon: {e}") 28 | 29 | def load_metadata(self, file_path): 30 | """Load the metadata file. 31 | 32 | Parameters: 33 | file_path (str): Path to the metadata CSV file. 34 | 35 | """ 36 | try: 37 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 38 | print("Metadata loaded successfully.") 39 | except Exception as e: 40 | print(f"An error occurred while loading metadata: {e}") 41 | 42 | def select_columns(self, columns): 43 | """Select columns from the metadata for matching. 44 | 45 | Parameters: 46 | columns (list of str): List of column names in the metadata. 47 | 48 | """ 49 | self.selected_columns = columns 50 | 51 | def select_identifier_column(self, column): 52 | """Select the identifier column used for uniquely identifying rows. 
53 | 
54 |         Parameters:
55 |             column (str): Name of the identifier column in the metadata.
56 | 
57 |         """
58 |         self.identifier_column = column
59 | 
60 |     def select_categories(self, categories):
61 |         """Select categories from the lexicon for matching.
62 | 
63 |         Parameters:
64 |             categories (list of str): List of category names in the lexicon.
65 | 
66 |         """
67 |         self.categories = categories
68 | 
69 |     def perform_matching(self, output_file):
70 |         """Perform matching between selected columns and categories and save results to a CSV file.
71 | 
72 |         Parameters:
73 |             output_file (str): Path to the output CSV file to save matching results.
74 | 
75 |         """
76 |         if self.lexicon_df is None or self.metadata_df is None:
77 |             print("Please load lexicon and metadata files first.")
78 |             return
79 | 
80 |         matches = self.find_matches(self.selected_columns, self.categories)
81 |         matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column'])
82 |         print(matches_df)
83 | 
84 |         # Write results to CSV
85 |         try:
86 |             matches_df.to_csv(output_file, index=False)
87 |             print(f"Results saved to {output_file}")
88 |         except Exception as e:
89 |             print(f"An error occurred while saving results: {e}")
90 | 
91 |     def find_matches(self, selected_columns, selected_categories):
92 |         """Find matches between metadata and lexicon based on selected columns and categories.
93 | 
94 |         Parameters:
95 |             selected_columns (list of str): List of column names from metadata for matching.
96 |             selected_categories (list of str): List of category names from the lexicon for matching.
97 | 
98 |         Returns:
99 |             list of tuple: List of tuples containing matched results (Identifier, Term, Category, Column).
100 | 
101 |         """
102 |         matches = []
103 |         lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)]
104 |         for index, row in self.metadata_df.iterrows():
105 |             for col in selected_columns:
106 |                 if isinstance(row[col], str):
107 |                     for term, category in zip(lexicon_df['term'], lexicon_df['category']):
108 |                         if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()):
109 |                             matches.append((row[self.identifier_column], term, category, col))
110 |         return matches
111 | 
112 | # Main program for command line interaction
113 | if __name__ == "__main__":
114 |     print("1. Initialize the tool:")
115 |     tool = MaRMAT()
116 | 
117 |     print("\n2. Load lexicon and metadata files:")
118 |     lexicon_path = input("Enter the path to the lexicon CSV file: ")
119 |     tool.load_lexicon(lexicon_path)
120 | 
121 |     metadata_path = input("Enter the path to the metadata CSV file: ")
122 |     tool.load_metadata(metadata_path)
123 | 
124 |     print("\n3. Select columns for matching:")
125 |     columns = input("Enter the column names for matching, separated by commas: ").split(",")
126 |     tool.select_columns([col.strip() for col in columns])  # Strip whitespace
127 | 
128 |     print("\n4. Select the identifier column:")
129 |     identifier = input("Enter the name of the identifier column: ")
130 |     tool.select_identifier_column(identifier)
131 | 
132 |     print("\n5. Select categories for matching:")
133 |     categories = input("Enter the categories from the lexicon for matching, separated by commas: ").split(",")
134 |     tool.select_categories([cat.strip() for cat in categories])  # Strip whitespace
135 | 
136 |     print("\n6. 
Perform matching and view results:") 137 | output_file = input("Enter the path to save the output CSV file: ") 138 | tool.perform_matching(output_file) 139 | -------------------------------------------------------------------------------- /Code/example-output-lcsh-subject-lexicon.csv: -------------------------------------------------------------------------------- 1 | Identifier,Term,Category,Column,Original Text 2 | 1533946,Indians of North America,ProblemLCSH,subjects,"Indians of North America--Monuments--Photographs; Indians of North America--Art--Photographs; Malin, Millard F., 1891-1974--Works--Photographs; Sculptures--Photographs; Indigenous peoples--North America" 3 | 1398979,Navajo,ProblemLCSH,subjects,"Navajo Indians--Music--Photographs; Navajo Indians-Photographs; Salt Lake City (Utah)--Photographs; Olympic Winter Games (19th : 2002 : Salt Lake City, Utah)--Photographs; Indigenous peoples--North America" 4 | 962277,Indians of North America,ProblemLCSH,subjects,"Wounded Knee (S.D.)--History--Indian occupation, 1973--Photographs; Indians of North America; Indigenous peoples--North America" 5 | 2348627,Navajo,ProblemLCSH,subjects,Navajo Mountain (Utah and Ariz.)--Photographs; Colorado River (Colo.-Mexico)--Photographs 6 | 1498946,Discovery and exploration,ProblemLCSH,subjects,"Pueblo Indians--History--Art; Southwest, New--Discovery and exploration--Art; Indigenous peoples--North America" 7 | 995167,Navajo,ProblemLCSH,subjects,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America 8 | 958790,Race,ProblemLCSH,subjects,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 9 | 998919,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 10 | 1302623,Navajo,ProblemLCSH,subjects,Volcanic fields--Arizona--Apache County--Photographs; Volcanic fields--Navajo Indian Reservation--Photographs; Buttes--Arizona--Apache County--Photographs; Buttes--Navajo Indian Reservation--Photographs; Landforms--Arizona--Apache County--Photographs; Landforms--Navajo Indian Reservation--Photographs; Geology--Arizona--Apache County--Photographs; Navajo Indian Reservation--Photographs 11 | 995167,Navajo,ProblemLCSH,subjects,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America 12 | 2364219,Mormon,ChangeHeadingLCSH,subjects,Mormon pioneers--Photographs; Pioneer Day (Mormon history)--Photographs; Parades--Utah--Salt Lake City--Photographs 13 | 2364219,Mormon pioneers,ChangeHeadingLCSH,subjects,Mormon pioneers--Photographs; Pioneer Day (Mormon history)--Photographs; Parades--Utah--Salt Lake City--Photographs 14 | 941713,"Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH,subjects,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Clothing & dress--Tule Lake Relocation Center--1940-1950" 15 | 
941496,"Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH,subjects,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Agriculture--1940-1950" 16 | 941536,"Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH,subjects,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; World War, 1939-1945--Concentration camps--California; Farming--California--Tule Lake--1940-1950; Agricultural laborers--California--Tule lake--1940-1950" 17 | 958469,Race,ProblemLCSH,subjects,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 18 | 958790,Race,ProblemLCSH,subjects,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 19 | 998946,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history----Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 20 | 998908,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history----Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 21 | 998938,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Mormon history); Days of '47; Holidays; Local holidays; Theatrical sets; Costumes (character dress); Thrones; McKay, Calleen Alice Robinson, 1928-2005; Women; Pioneer Days Royalty; Centennial Queens; Beauty contestants" 22 | 998919,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 23 | 2292054,Race,ProblemLCSH,subjects,"Civil rights movements--United States--History--20th century; Civil rights--Utah; Race discrimination--Religious aspects--Latter Day Saint churches; African Americans; Racism against Black people; Nabors, Charles James,1934-1986; African American scientists" 24 | 1396789,Hispanic Americans,ProblemLCSH,subjects,Hispanic Americans--Utah; Latin Americans--Utah 25 | 1396777,Indians of North America,ProblemLCSH,subjects,"Indians--Legal status, laws, etc.; Indians of North America--Legal status, laws, etc.--Canada; Indians of North America--Legal status, laws, etc.--United States; Discrimination; Indians--Social conditions; Indigenous peoples--North America" 26 | 1049391,Nephites,ChangeHeadingLCSH,subjects,Nephites; Folklore--Utah; Latter Day Saints--Utah--History--Anecdotes 27 | 1006680,Mormon,ChangeHeadingLCSH,subjects,Church of Jesus Christ of Latter-day Saints--History; Utah--History; Mormon converts 28 | 1006680,Mormon converts,ChangeHeadingLCSH,subjects,Church of Jesus Christ of Latter-day Saints--History; Utah--History; Mormon converts 29 | 1470940,Victims,ProblemLCSH,subjects,Nuclear weapons Testing; Nuclear weapons--United States--Testing; Nuclear weapons testing 
victims; Nuclear weapons--United States--Testing; Radioactive fallout 30 | 818891,Mormon,ChangeHeadingLCSH,subjects,"Loewenberg, Peter, 1933- --Interviews; Brodie, Fawn McKay, 1915-1981--Biography; Latter Day Saints--Biography; Mormon scholars--Biography; University of California, Los Angeles--Faculty--Biography" 31 | 818891,Mormon scholars,ChangeHeadingLCSH,subjects,"Loewenberg, Peter, 1933- --Interviews; Brodie, Fawn McKay, 1915-1981--Biography; Latter Day Saints--Biography; Mormon scholars--Biography; University of California, Los Angeles--Faculty--Biography" 32 | 1396789,Hispanic Americans,ProblemLCSH,subjects,Hispanic Americans--Utah; Latin Americans--Utah 33 | 893658,Race,ProblemLCSH,subjects,"African Americans--Utah--Interviews; Henry, Alberta H., 1920-2005--Interviews; African Americans--Civil rights--Utah; Utah--Race relations" 34 | 893658,Race relations,ProblemLCSH,subjects,"African Americans--Utah--Interviews; Henry, Alberta H., 1920-2005--Interviews; African Americans--Civil rights--Utah; Utah--Race relations" 35 | 958364,Race,ProblemLCSH,subjects,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 36 | 1739616,Victims,ProblemLCSH,subjects,Murder victims--Photographs 37 | -------------------------------------------------------------------------------- /XML Test Code/RMA-Tool.py: -------------------------------------------------------------------------------- 1 | import xml.etree.ElementTree as ET 2 | import csv 3 | import nltk 4 | import string 5 | from nltk.tokenize import word_tokenize 6 | from nltk.corpus import stopwords 7 | import re 8 | 9 | # Download NLTK resources if not already downloaded 10 | nltk.download('punkt') 11 | nltk.download('stopwords') 12 | 13 | def parse_xml_to_csv(xml_file, csv_file): 14 | """ 15 | Parses an XML file containing specific metadata and writes the extracted data into a CSV file. 16 | 17 | Parameters: 18 | - xml_file (str): File path of the XML input file. 19 | - csv_file (str): File path of the CSV output file. 20 | 21 | Note: 22 | - Make sure the XML file follows a specific structure with predefined namespaces. 23 | - Ensure that the CSV file path points to a writable location. 
24 | """ 25 | 26 | # Define namespaces 27 | namespaces = { 28 | 'oai': 'http://www.openarchives.org/OAI/2.0/', 29 | 'qdc': 'http://worldcat.org/xmlschemas/qdc-1.0/', 30 | 'dcterms': 'http://purl.org/dc/terms/', 31 | 'dc': 'http://purl.org/dc/elements/1.1/' 32 | } 33 | 34 | # Load stopwords and punctuation 35 | stop_words = set(stopwords.words('english')) 36 | punctuation = set(string.punctuation) 37 | 38 | # Parse XML 39 | tree = ET.parse(xml_file) 40 | root = tree.getroot() 41 | 42 | # Open CSV file for writing 43 | with open(csv_file, 'w', newline='', encoding='utf-8') as csvfile: 44 | writer = csv.writer(csvfile) 45 | 46 | # Write headers 47 | writer.writerow(['Identifier', 'Title', 'Subject', 'IdentifierURL', 'Token']) 48 | 49 | # Extract data from XML and write to CSV 50 | for record in root.findall('.//oai:record', namespaces): 51 | identifier = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 52 | title = record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces) is not None else "" 53 | subject = record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces) is not None else "" 54 | identifier_url = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 55 | 56 | # Tokenize and preprocess title and subject 57 | title_tokens = [word for word in word_tokenize(title.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if title else [] 58 | subject_tokens = [word for word in word_tokenize(subject.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if subject else [] 59 | 60 | # Write each token as a separate row with other columns filled down 61 | for token in title_tokens: 62 | writer.writerow([identifier, title, subject, identifier_url, token]) 63 | 64 | for token in subject_tokens: 65 | writer.writerow([identifier, title, subject, identifier_url, token]) 66 | 67 | # Define file paths 68 | xml_file_path = "PATH_TO_XML_FILE" # Insert path to your XML file 69 | csv_file_path = "PATH_TO_CSV_FILE" # Insert path to desired CSV output file 70 | 71 | # Parse XML and write to CSV 72 | parse_xml_to_csv(xml_file_path, csv_file_path) 73 | 74 | print("Conversion from XML to CSV successfully completed.") 75 | 76 | def load_lexicon_from_csv(file_path): 77 | """ 78 | Loads lexicon categories and terms from a CSV file into a dictionary. 79 | 80 | Parameters: 81 | - file_path (str): File path of the CSV input file. 82 | 83 | Returns: 84 | - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values. 85 | 86 | Note: 87 | - The CSV file should have lexicon categories as column headers and terms listed under each category. 
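    - For example, the bundled lexicons.csv uses the header row: Aggrandizement,RaceEuphemisms,RaceTerms,SlaveryTerms,GenderTerms,LGBTQ,MentalIllness,Disability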
88 | """ 89 | 90 | lexicon = { 91 | "Aggrandizement": [], 92 | "RaceEuphemisms": [], 93 | "RaceTerms": [], 94 | "SlaveryTerms": [], 95 | "GenderTerms": [], 96 | "LGBTQ": [], 97 | "MentalIllness": [], 98 | "Disability": [] 99 | } 100 | 101 | with open(file_path, 'r', encoding='utf-8-sig') as csv_file: 102 | csv_reader = csv.reader(csv_file) 103 | next(csv_reader) # Skip the header row 104 | 105 | for row in csv_reader: 106 | if len(row) == 8: 107 | lexicon["Aggrandizement"].append(row[0]) 108 | lexicon["RaceEuphemisms"].append(row[1]) 109 | lexicon["RaceTerms"].append(row[2]) 110 | lexicon["SlaveryTerms"].append(row[3]) 111 | lexicon["GenderTerms"].append(row[4]) 112 | lexicon["LGBTQ"].append(row[5]) 113 | lexicon["MentalIllness"].append(row[6]) 114 | lexicon["Disability"].append(row[7]) 115 | else: 116 | print("Invalid row format. Skipping...") 117 | 118 | return lexicon 119 | 120 | def search_and_append_lexicon_category(lexicon, input_csv_file, output_csv_file): 121 | """ 122 | Searches for lexicon term matches in an input CSV file, appends lexicon categories to each row, and writes the modified data into an output CSV file. 123 | 124 | Parameters: 125 | - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values. 126 | - input_csv_file (str): File path of the input CSV file. 127 | - output_csv_file (str): File path of the output CSV file. 128 | 129 | Note: 130 | - The input CSV file should contain a 'Token' column where lexicon term matches will be searched. 131 | - The output CSV file will have an additional column 'LexiconCategory' appended to each row, indicating the matched lexicon categories. 132 | """ 133 | 134 | # Load lexicon 135 | lexicon = load_lexicon_from_csv(lexicon) 136 | 137 | # Open input CSV file for reading and output CSV file for writing 138 | with open(input_csv_file, 'r', newline='', encoding='utf-8') as input_csv, \ 139 | open(output_csv_file, 'w', newline='', encoding='utf-8') as output_csv: 140 | 141 | reader = csv.reader(input_csv) 142 | writer = csv.writer(output_csv) 143 | 144 | # Write headers to the output CSV file 145 | headers = next(reader) 146 | headers.append('LexiconCategory') 147 | writer.writerow(headers) 148 | 149 | # Iterate over rows in the input CSV file 150 | for row in reader: 151 | token = row[4] # Assuming token is in the 5th column 152 | # Search for matches between token and terms in the lexicon 153 | matching_categories = [category for category, terms in lexicon.items() if token in terms] 154 | # Append lexicon category to the row 155 | row.append(', '.join(matching_categories)) 156 | # Write the modified row to the output CSV file 157 | writer.writerow(row) 158 | 159 | # Open the output CSV file again to remove rows without a LexiconCategory 160 | with open(output_csv_file, 'r', newline='', encoding='utf-8') as output_csv: 161 | reader = csv.reader(output_csv) 162 | # Filter rows based on presence of LexiconCategory 163 | rows_to_keep = [row for row in reader if row[-1] != ''] # Assuming LexiconCategory is the last column 164 | # Write filtered rows back to the output CSV file 165 | with open(output_csv_file, 'w', newline='', encoding='utf-8') as updated_output_csv: 166 | writer = csv.writer(updated_output_csv) 167 | writer.writerows(rows_to_keep) 168 | 169 | # File paths 170 | lexicon_file_path = "PATH_TO_LEXICON_CSV_FILE" # Insert path to your lexicon CSV file 171 | input_csv_file_path = "PATH_TO_INPUT_CSV_FILE" # Insert path to your input CSV file 172 | output_csv_file_path = "PATH_TO_OUTPUT_CSV_FILE" 
# Insert path to desired output CSV file 173 | 174 | # Search for matches, append lexicon category, and remove rows without a LexiconCategory 175 | search_and_append_lexicon_category(lexicon_file_path, input_csv_file_path, output_csv_file_path) 176 | 177 | print("Reparative Metadata Audit successfully completed.") 178 | -------------------------------------------------------------------------------- /XML Test Code/RMA-GUI.py: -------------------------------------------------------------------------------- 1 | import xml.etree.ElementTree as ET 2 | import csv 3 | import nltk 4 | import string 5 | from nltk.tokenize import word_tokenize 6 | from nltk.corpus import stopwords 7 | import tkinter as tk 8 | from tkinter import filedialog, messagebox 9 | 10 | # Download NLTK resources if not already downloaded 11 | nltk.download('punkt') 12 | nltk.download('stopwords') 13 | 14 | def parse_xml_to_csv(xml_file, csv_file): 15 | """ 16 | Parses an XML file containing specific metadata and writes the extracted data into a CSV file. 17 | 18 | Parameters: 19 | - xml_file (str): File path of the XML input file. 20 | - csv_file (str): File path of the CSV output file. 21 | 22 | Note: 23 | - Make sure the XML file follows a specific structure with predefined namespaces. 24 | - Ensure that the CSV file path points to a writable location. 25 | """ 26 | 27 | # Define namespaces 28 | namespaces = { 29 | 'oai': 'http://www.openarchives.org/OAI/2.0/', 30 | 'qdc': 'http://worldcat.org/xmlschemas/qdc-1.0/', 31 | 'dcterms': 'http://purl.org/dc/terms/', 32 | 'dc': 'http://purl.org/dc/elements/1.1/' 33 | } 34 | 35 | # Load stopwords and punctuation 36 | stop_words = set(stopwords.words('english')) 37 | punctuation = set(string.punctuation) 38 | 39 | # Parse XML 40 | tree = ET.parse(xml_file) 41 | root = tree.getroot() 42 | 43 | # Open CSV file for writing 44 | with open(csv_file, 'w', newline='', encoding='utf-8') as csvfile: 45 | writer = csv.writer(csvfile) 46 | 47 | # Write headers 48 | writer.writerow(['Identifier', 'Title', 'Subject', 'IdentifierURL', 'Token']) 49 | 50 | # Extract data from XML and write to CSV 51 | for record in root.findall('.//oai:record', namespaces): 52 | identifier = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 53 | title = record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces) is not None else "" 54 | subject = record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces) is not None else "" 55 | identifier_url = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 56 | 57 | # Tokenize and preprocess title and subject 58 | title_tokens = [word for word in word_tokenize(title.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if title else [] 59 | subject_tokens = [word for word in word_tokenize(subject.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if subject else [] 60 | 61 | # Write each token as a separate row with other columns filled down 62 | for token in title_tokens: 63 | writer.writerow([identifier, title, subject, identifier_url, token]) 64 | 65 | for token in 
subject_tokens:
66 |                 writer.writerow([identifier, title, subject, identifier_url, token])
67 | 
68 | def load_lexicon_from_csv(file_path):
69 |     """
70 |     Loads lexicon categories and terms from a CSV file into a dictionary.
71 | 
72 |     Parameters:
73 |     - file_path (str): File path of the CSV input file.
74 | 
75 |     Returns:
76 |     - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values.
77 | 
78 |     Note:
79 |     - The CSV file should have lexicon categories as column headers and terms listed under each category.
80 |     """
81 | 
82 |     lexicon = {
83 |         "Aggrandizement": [],
84 |         "RaceEuphemisms": [],
85 |         "RaceTerms": [],
86 |         "SlaveryTerms": [],
87 |         "GenderTerms": [],
88 |         "LGBTQ": [],
89 |         "MentalIllness": [],
90 |         "Disability": []
91 |     }
92 | 
93 |     with open(file_path, 'r', encoding='utf-8-sig') as csv_file:
94 |         csv_reader = csv.reader(csv_file)
95 |         next(csv_reader)  # Skip the header row
96 | 
97 |         for row in csv_reader:
98 |             if len(row) == 8:
99 |                 lexicon["Aggrandizement"].append(row[0])
100 |                 lexicon["RaceEuphemisms"].append(row[1])
101 |                 lexicon["RaceTerms"].append(row[2])
102 |                 lexicon["SlaveryTerms"].append(row[3])
103 |                 lexicon["GenderTerms"].append(row[4])
104 |                 lexicon["LGBTQ"].append(row[5])
105 |                 lexicon["MentalIllness"].append(row[6])
106 |                 lexicon["Disability"].append(row[7])
107 |             else:
108 |                 print(f"Invalid row format (expected 8 columns, got {len(row)}). Skipping...")
109 | 
110 |     return lexicon
111 | 
112 | def search_and_append_lexicon_category(lexicon, input_csv_file, output_csv_file):
113 |     """
114 |     Searches for lexicon term matches in an input CSV file, appends lexicon categories to each row, and writes the modified data into an output CSV file.
115 | 
116 |     Parameters:
117 |     - lexicon (str): File path of the lexicon CSV file; it is loaded into a dictionary of categories and terms via load_lexicon_from_csv().
118 |     - input_csv_file (str): File path of the input CSV file.
119 |     - output_csv_file (str): File path of the output CSV file.
120 | 
121 |     Note:
122 |     - The input CSV file should contain a 'Token' column where lexicon term matches will be searched.
123 |     - The output CSV file will have an additional column 'LexiconCategory' appended to each row, indicating the matched lexicon categories.
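    - Matching is exact list membership against single-word tokens, so multi-word lexicon entries (e.g., "color blind") will never match; substring and phrase matches are not detected.
    - Rows whose token matches no lexicon term are removed from the output in a second pass.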
124 | """ 125 | 126 | # Load lexicon 127 | lexicon = load_lexicon_from_csv(lexicon) 128 | 129 | # Open input CSV file for reading and output CSV file for writing 130 | with open(input_csv_file, 'r', newline='', encoding='utf-8') as input_csv, \ 131 | open(output_csv_file, 'w', newline='', encoding='utf-8') as output_csv: 132 | 133 | reader = csv.reader(input_csv) 134 | writer = csv.writer(output_csv) 135 | 136 | # Write headers to the output CSV file 137 | headers = next(reader) 138 | headers.append('LexiconCategory') 139 | writer.writerow(headers) 140 | 141 | # Iterate over rows in the input CSV file 142 | for row in reader: 143 | token = row[4] # Assuming token is in the 5th column 144 | # Search for matches between token and terms in the lexicon 145 | matching_categories = [category for category, terms in lexicon.items() if token in terms] 146 | # Append lexicon category to the row 147 | row.append(', '.join(matching_categories)) 148 | # Write the modified row to the output CSV file 149 | writer.writerow(row) 150 | 151 | # Open the output CSV file again to remove rows without a LexiconCategory 152 | with open(output_csv_file, 'r', newline='', encoding='utf-8') as output_csv: 153 | reader = csv.reader(output_csv) 154 | # Filter rows based on presence of LexiconCategory 155 | rows_to_keep = [row for row in reader if row[-1] != ''] # Assuming LexiconCategory is the last column 156 | # Write filtered rows back to the output CSV file 157 | with open(output_csv_file, 'w', newline='', encoding='utf-8') as updated_output_csv: 158 | writer = csv.writer(updated_output_csv) 159 | writer.writerows(rows_to_keep) 160 | 161 | def browse_xml(): 162 | filename = filedialog.askopenfilename(filetypes=[("XML Files", "*.xml")]) 163 | xml_entry.delete(0, tk.END) 164 | xml_entry.insert(0, filename) 165 | 166 | def browse_lexicon(): 167 | filename = filedialog.askopenfilename(filetypes=[("CSV Files", "*.csv")]) 168 | lexicon_entry.delete(0, tk.END) 169 | lexicon_entry.insert(0, filename) 170 | 171 | def browse_output(): 172 | filename = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV Files", "*.csv")]) 173 | output_entry.delete(0, tk.END) 174 | output_entry.insert(0, filename) 175 | 176 | def process_files(): 177 | xml_file = xml_entry.get() 178 | lexicon_file = lexicon_entry.get() 179 | output_file = output_entry.get() 180 | 181 | if not xml_file or not lexicon_file or not output_file: 182 | messagebox.showerror("Error", "Please select XML file, lexicon file, and output file.") 183 | return 184 | 185 | try: 186 | parse_xml_to_csv(xml_file, "temp.csv") 187 | search_and_append_lexicon_category(lexicon_file, "temp.csv", output_file) 188 | messagebox.showinfo("Success", "Conversion completed successfully.") 189 | except Exception as e: 190 | messagebox.showerror("Error", f"An error occurred: {str(e)}") 191 | 192 | # Remove temp file 193 | import os 194 | os.remove("temp.csv") 195 | 196 | # Create GUI 197 | root = tk.Tk() 198 | root.title("XML to CSV Converter") 199 | 200 | # XML File 201 | xml_label = tk.Label(root, text="Please select the XML file you want to analyze:") 202 | xml_label.grid(row=0, column=0, padx=5, pady=5, sticky="w") 203 | xml_entry = tk.Entry(root, width=50) 204 | xml_entry.grid(row=0, column=1, columnspan=2, padx=5, pady=5) 205 | xml_button = tk.Button(root, text="Browse", command=browse_xml) 206 | xml_button.grid(row=0, column=3, padx=5, pady=5) 207 | 208 | # Lexicon File 209 | lexicon_label = tk.Label(root, text="Navigate to the lexicons.csv file saved on your 
computer:") 210 | lexicon_label.grid(row=1, column=0, padx=5, pady=5, sticky="w") 211 | lexicon_entry = tk.Entry(root, width=50) 212 | lexicon_entry.grid(row=1, column=1, columnspan=2, padx=5, pady=5) 213 | lexicon_button = tk.Button(root, text="Browse", command=browse_lexicon) 214 | lexicon_button.grid(row=1, column=3, padx=5, pady=5) 215 | 216 | # Output File 217 | output_label = tk.Label(root, text="Choose where you would like the audit file to be saved:") 218 | output_label.grid(row=2, column=0, padx=5, pady=5, sticky="w") 219 | output_entry = tk.Entry(root, width=50) 220 | output_entry.grid(row=2, column=1, columnspan=2, padx=5, pady=5) 221 | output_button = tk.Button(root, text="Browse", command=browse_output) 222 | output_button.grid(row=2, column=3, padx=5, pady=5) 223 | 224 | # Process Button 225 | process_button = tk.Button(root, text="Process", command=process_files) 226 | process_button.grid(row=3, column=0, columnspan=4, padx=5, pady=5) 227 | 228 | root.mainloop() 229 | -------------------------------------------------------------------------------- /Code/Test Versions/RMA-GUI-2.0.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox 3 | from tkinter import ttk 4 | import pandas as pd 5 | import string 6 | import re 7 | 8 | class ReparativeMetadataAuditTool(tk.Tk): 9 | def __init__(self): 10 | super().__init__() 11 | self.title("Lexicon Matching") 12 | 13 | # Initialize variables 14 | self.lexicon_df = None 15 | self.metadata_df = None 16 | self.columns = [] 17 | self.categories = [] 18 | self.selected_columns = [] # Store selected columns as an attribute 19 | self.category_selection_page_active = False # Track whether category selection page is active 20 | 21 | # Create main frame 22 | self.main_frame = ttk.Frame(self) 23 | self.main_frame.pack(fill='both', expand=True) 24 | 25 | # Load lexicon button 26 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 27 | self.load_lexicon_button.pack(pady=10) 28 | 29 | # Load metadata button 30 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 31 | self.load_metadata_button.pack(pady=10) 32 | 33 | # Next button 34 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 35 | self.next_button.pack(pady=10) 36 | 37 | # Hide next button initially 38 | self.next_button.pack_forget() 39 | 40 | # Second screen frame (Column selection) 41 | self.column_selection_frame = ttk.Frame(self) 42 | 43 | # Column selection label 44 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 45 | self.column_label.pack(pady=5) 46 | 47 | # Columns listbox 48 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 49 | self.column_listbox.pack(pady=5) 50 | 51 | # All columns checkbox 52 | self.all_columns_var = tk.BooleanVar(value=False) 53 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 54 | self.all_columns_checkbox.pack(pady=5) 55 | 56 | # Next button for column selection 57 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_category_selection) 58 | self.next_button_columns.pack(pady=10) 59 | 60 | # Back button for column selection 61 | self.back_button_columns = 
ttk.Button(self.column_selection_frame, text="Back", command=self.show_main_frame) 62 | self.back_button_columns.pack(pady=10) 63 | 64 | # Second screen frame (Category selection) 65 | self.category_selection_frame = ttk.Frame(self) 66 | 67 | # Initialize match button 68 | self.match_button = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 69 | 70 | # Initialize all_categories_var 71 | self.all_categories_var = tk.BooleanVar(value=False) 72 | 73 | # Hide second screen frames initially 74 | self.column_selection_frame.pack_forget() 75 | self.category_selection_frame.pack_forget() 76 | 77 | def load_lexicon(self): 78 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 79 | if file_path: 80 | try: 81 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 82 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 83 | self.load_lexicon_button.config(state='disabled') 84 | except Exception as e: 85 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 86 | 87 | def load_metadata(self): 88 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 89 | if file_path: 90 | try: 91 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 92 | messagebox.showinfo("Success", "Metadata loaded successfully.") 93 | self.load_metadata_button.config(state='disabled') 94 | self.next_button.pack() 95 | except Exception as e: 96 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 97 | 98 | def show_column_selection(self): 99 | if self.lexicon_df is None or self.metadata_df is None: 100 | messagebox.showwarning("Warning", "Please load lexicon and metadata files first.") 101 | return 102 | 103 | # Populate columns listbox 104 | self.columns = self.metadata_df.columns.tolist() 105 | for column in self.columns: 106 | self.column_listbox.insert(tk.END, column) 107 | 108 | # Show column selection frame 109 | self.main_frame.pack_forget() 110 | self.column_selection_frame.pack(fill='both', expand=True) 111 | 112 | def show_category_selection(self): 113 | # Get selected columns 114 | self.selected_columns = self.get_selected_columns() # Store selected columns 115 | if not self.selected_columns: 116 | messagebox.showwarning("Warning", "Please select at least one column.") 117 | return 118 | 119 | # Clear previous selections 120 | self.categories.clear() 121 | if hasattr(self, 'category_listbox'): 122 | self.category_listbox.destroy() # Destroy previous listbox if exists 123 | self.category_listbox = tk.Listbox(self.category_selection_frame, selectmode='multiple') 124 | self.category_listbox.pack(pady=5) 125 | 126 | # Populate categories listbox 127 | self.categories = self.lexicon_df['category'].unique().tolist() 128 | for category in self.categories: 129 | self.category_listbox.insert(tk.END, category) 130 | 131 | # Show category selection frame 132 | self.column_selection_frame.pack_forget() 133 | self.match_button.pack_forget() # Hide matching button if it's already displayed 134 | self.category_selection_frame.pack(fill='both', expand=True) 135 | self.category_selection_page_active = True # Set category selection page as active 136 | self.match_button.pack(pady=10) # Display matching button 137 | 138 | def perform_matching(self): 139 | # Hide back buttons 140 | self.back_button_columns.pack_forget() 141 | 142 | if self.category_selection_page_active: # Check if currently on category selection page 143 | # Get selected categories 144 | 
selected_categories = self.get_selected_categories() 145 | 146 | if not selected_categories: 147 | messagebox.showwarning("Warning", "Please select at least one category.") 148 | return 149 | 150 | # Proceed with matching 151 | matches = self.find_matches(self.selected_columns, selected_categories) 152 | else: 153 | # Get selected columns 154 | selected_columns = self.get_selected_columns() 155 | 156 | if not selected_columns: 157 | messagebox.showwarning("Warning", "Please select at least one column.") 158 | return 159 | 160 | # Get selected categories 161 | selected_categories = self.get_selected_categories() 162 | 163 | if not selected_categories: 164 | messagebox.showwarning("Warning", "Please select at least one category.") 165 | return 166 | 167 | # Proceed with matching 168 | matches = self.find_matches(selected_columns, selected_categories) 169 | 170 | # Filter matches to include only selected columns 171 | matches_filtered = [(identifier, term, category, col) for identifier, term, category, col in matches if col in self.selected_columns] 172 | 173 | # Save to CSV 174 | output_file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 175 | if output_file_path: 176 | try: 177 | matches_df = pd.DataFrame(matches_filtered, columns=['Identifier', 'Term', 'Category', 'Column']) 178 | matches_df.to_csv(output_file_path, index=False) 179 | messagebox.showinfo("Success", f"Merged data saved to: {output_file_path}") 180 | except Exception as e: 181 | messagebox.showerror("Error", f"An error occurred while saving file: {e}") 182 | 183 | def toggle_columns(self): 184 | if self.all_columns_var.get(): 185 | self.column_listbox.selection_set(0, tk.END) 186 | self.column_listbox.config(state='disabled') 187 | else: 188 | self.column_listbox.selection_clear(0, tk.END) 189 | self.column_listbox.config(state='normal') 190 | 191 | def toggle_categories(self): 192 | if self.all_categories_var.get(): 193 | self.category_listbox.selection_set(0, tk.END) 194 | self.category_listbox.config(state='disabled') 195 | else: 196 | self.category_listbox.selection_clear(0, tk.END) 197 | self.category_listbox.config(state='normal') 198 | 199 | def get_selected_columns(self): 200 | if self.all_columns_var.get(): 201 | return self.columns 202 | else: 203 | return [self.columns[i] for i in self.column_listbox.curselection()] 204 | 205 | def get_selected_categories(self): 206 | if self.all_categories_var.get(): 207 | return self.categories 208 | else: 209 | return [self.categories[i] for i in self.category_listbox.curselection()] 210 | 211 | def find_matches(self, selected_columns, selected_categories): 212 | matches = [] 213 | # Filter lexicon based on selected categories 214 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 215 | # Iterate over each row in the metadata DataFrame 216 | for index, row in self.metadata_df.iterrows(): 217 | # Process the text in each specified column 218 | for col in selected_columns: 219 | # Check if the value in the column is a string 220 | if isinstance(row[col], str): 221 | # Iterate over each term in the lexicon and check for matches 222 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 223 | # Check if the whole term exists in the text column 224 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 225 | matches.append((row['Identifier'], term, category, col)) 226 | break # Break out of the inner loop once a match is found in this column 227 | return matches 
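    # A minimal illustration of the whole-word matching used in find_matches
    # (hypothetical strings, not part of the tool's flow):
    #
    #   import re
    #   re.search(r'\b' + re.escape('indian') + r'\b', 'busts of ute indians')    # -> None:
    #       'indians' provides no word boundary immediately after 'indian'
    #   re.search(r'\b' + re.escape('indian') + r'\b', 'american indian movement')  # -> match
    #
    # re.escape() keeps punctuation inside lexicon terms (e.g. "mrs.") literal, and
    # the break above records at most one lexicon match per metadata cell.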
228 | 229 | def show_main_frame(self): 230 | if self.category_selection_page_active: 231 | self.category_selection_page_active = False 232 | self.category_selection_frame.pack_forget() 233 | self.column_selection_frame.pack(fill='both', expand=True) 234 | else: 235 | self.column_selection_frame.pack_forget() 236 | self.main_frame.pack(fill='both', expand=True) 237 | 238 | # Create and run the application 239 | app = ReparativeMetadataAuditTool() 240 | app.mainloop() 241 | -------------------------------------------------------------------------------- /Code/Past Versions/MaRMAT-GUI-2.5.2.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox, ttk 3 | import pandas as pd 4 | import re 5 | import threading 6 | 7 | class MaRMAT(tk.Tk): 8 | def __init__(self): 9 | super().__init__() 10 | self.title("Marriott Reparative Metadata Assessment Tool (MaRMAT)") 11 | 12 | # Initialize variables 13 | self.lexicon_df = None 14 | self.metadata_df = None 15 | self.columns = [] 16 | self.categories = [] 17 | self.selected_columns = [] 18 | self.identifier_column = None 19 | 20 | # Create main frame 21 | self.main_frame = ttk.Frame(self) 22 | self.main_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 23 | 24 | # Explanation text 25 | self.explanation_text = """ 26 | Welcome to Marriott Reparative Metadata Assessment Tool (MaRMAT)! 27 | 28 | This tool allows you to match terms from a problematic terms lexicon file with text data from a collections metadata file. 29 | 30 | Please follow the steps below: 31 | 32 | 1. Load your lexicon and metadata files using the provided buttons. 33 | 34 | 2. On the next screen, select the columns from your metadata file that you want to analyze. 35 | 36 | 3. After selecting columns, choose the column in your metadata file that you want to rewrite as the "Identifier" column that relates back to your original metadata (e.g., a collection ID). 37 | 38 | 4. Then, choose the categories of terms from the lexicon that you want to search for. 39 | 40 | 5. Click "Perform Matching" to find matches and export the results to a CSV file. 41 | 42 | Let's get started! 
43 | """ 44 | 45 | self.explanation_label = ttk.Label(self.main_frame, text=self.explanation_text, justify='left', wraplength=600) 46 | self.explanation_label.grid(row=0, column=0, columnspan=3, padx=20, pady=20, sticky="nsew") 47 | 48 | # Load lexicon button 49 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 50 | self.load_lexicon_button.grid(row=1, column=0, padx=10, pady=10, sticky="w") 51 | 52 | # Load metadata button 53 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 54 | self.load_metadata_button.grid(row=1, column=1, padx=10, pady=10, sticky="w") 55 | 56 | # Reset button 57 | self.reset_button = ttk.Button(self.main_frame, text="Reset", command=self.reset) 58 | self.reset_button.grid(row=1, column=2, padx=10, pady=10, sticky="w") 59 | 60 | # Next button 61 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 62 | self.next_button.grid(row=2, column=0, columnspan=3, pady=10, sticky="nsew") 63 | self.next_button.grid_remove() # Hide next button initially 64 | 65 | def load_lexicon(self): 66 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 67 | if file_path: 68 | try: 69 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 70 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 71 | self.load_lexicon_button.config(state='disabled') 72 | except Exception as e: 73 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 74 | 75 | def load_metadata(self): 76 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 77 | if file_path: 78 | try: 79 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 80 | messagebox.showinfo("Success", "Metadata loaded successfully.") 81 | self.load_metadata_button.config(state='disabled') 82 | self.next_button.grid() 83 | except Exception as e: 84 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 85 | 86 | def show_column_selection(self): 87 | if self.lexicon_df is None or self.metadata_df is None: 88 | messagebox.showwarning("Warning", "Please load lexicon and metadata files first.") 89 | return 90 | 91 | # Populate columns listbox 92 | self.columns = self.metadata_df.columns.tolist() 93 | self.column_selection_frame = ttk.Frame(self) 94 | self.column_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 95 | 96 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 97 | self.column_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 98 | 99 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 100 | self.column_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 101 | for column in self.columns: 102 | self.column_listbox.insert(tk.END, column) 103 | 104 | self.all_columns_var = tk.BooleanVar(value=False) 105 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 106 | self.all_columns_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 107 | 108 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_identifier_selection) 109 | self.next_button_columns.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 110 | 111 | self.back_button_columns = ttk.Button(self.column_selection_frame, text="Back", 
command=self.back_to_main_frame) 112 | self.back_button_columns.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 113 | 114 | def show_identifier_selection(self): 115 | self.selected_columns = self.get_selected_columns() # Store selected columns 116 | if not self.selected_columns: 117 | messagebox.showwarning("Warning", "Please select at least one column.") 118 | return 119 | 120 | self.column_selection_frame.grid_remove() 121 | 122 | self.identifier_selection_frame = ttk.Frame(self) 123 | self.identifier_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 124 | 125 | self.identifier_label = ttk.Label(self.identifier_selection_frame, text="Select Identifier Column:") 126 | self.identifier_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 127 | 128 | self.identifier_var = tk.StringVar() 129 | self.identifier_dropdown = ttk.Combobox(self.identifier_selection_frame, textvariable=self.identifier_var, state='readonly') 130 | self.identifier_dropdown.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 131 | self.identifier_dropdown['values'] = self.metadata_df.columns.tolist() # Show all columns as options 132 | self.identifier_dropdown.current(0) # Select first column by default 133 | 134 | self.next_button_identifier = ttk.Button(self.identifier_selection_frame, text="Next", command=self.show_category_selection) 135 | self.next_button_identifier.grid(row=2, column=0, padx=10, pady=10, sticky="nsew") 136 | 137 | self.back_button_identifier = ttk.Button(self.identifier_selection_frame, text="Back", command=self.back_to_column_selection) 138 | self.back_button_identifier.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 139 | 140 | def show_category_selection(self): 141 | self.identifier_column = self.identifier_var.get() 142 | 143 | self.identifier_selection_frame.grid_remove() 144 | 145 | self.category_selection_frame = ttk.Frame(self) 146 | self.category_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 147 | 148 | self.category_label = ttk.Label(self.category_selection_frame, text="Select Categories to Analyze:") 149 | self.category_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 150 | 151 | self.category_listbox = tk.Listbox(self.category_selection_frame, selectmode='multiple') 152 | self.category_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 153 | self.categories = self.lexicon_df['category'].unique().tolist() 154 | for category in self.categories: 155 | self.category_listbox.insert(tk.END, category) 156 | 157 | self.all_categories_var = tk.BooleanVar(value=False) 158 | self.all_categories_checkbox = ttk.Checkbutton(self.category_selection_frame, text="All", variable=self.all_categories_var, command=self.toggle_categories) 159 | self.all_categories_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 160 | 161 | self.next_button_categories = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 162 | self.next_button_categories.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 163 | 164 | self.back_button_categories = ttk.Button(self.category_selection_frame, text="Back", command=self.back_to_identifier_selection) 165 | self.back_button_categories.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 166 | 167 | def perform_matching(self): 168 | selected_categories = self.get_selected_categories() 169 | if not selected_categories: 170 | messagebox.showwarning("Warning", "Please select at least one category.") 171 | return 172 | 173 | matches = 
self.find_matches(self.selected_columns, selected_categories) 174 | matches_filtered = [(identifier, term, category, col, text) for identifier, term, category, col, text in matches if col in self.selected_columns] 175 | output_file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 176 | if output_file_path: 177 | try: 178 | matches_df = pd.DataFrame(matches_filtered, columns=['Identifier', 'Term', 'Category', 'Column', 'Original Text']) 179 | matches_df.to_csv(output_file_path, index=False) 180 | messagebox.showinfo("Success", f"Merged data saved to: {output_file_path}") 181 | self.reset() 182 | except Exception as e: 183 | messagebox.showerror("Error", f"An error occurred while saving file: {e}") 184 | 185 | def toggle_columns(self): 186 | if self.all_columns_var.get(): 187 | self.column_listbox.selection_set(0, tk.END) 188 | self.column_listbox.config(state='disabled') 189 | else: 190 | self.column_listbox.selection_clear(0, tk.END) 191 | self.column_listbox.config(state='normal') 192 | 193 | def toggle_categories(self): 194 | if self.all_categories_var.get(): 195 | self.category_listbox.selection_set(0, tk.END) 196 | self.category_listbox.config(state='disabled') 197 | else: 198 | self.category_listbox.selection_clear(0, tk.END) 199 | self.category_listbox.config(state='normal') 200 | 201 | def get_selected_columns(self): 202 | if self.all_columns_var.get(): 203 | return self.columns 204 | else: 205 | return [self.columns[i] for i in self.column_listbox.curselection()] 206 | 207 | def get_selected_categories(self): 208 | return [self.categories[i] for i in self.category_listbox.curselection()] 209 | 210 | def find_matches(self, selected_columns, selected_categories): 211 | matches = [] 212 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 213 | for index, row in self.metadata_df.iterrows(): 214 | for col in selected_columns: 215 | if isinstance(row[col], str): 216 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 217 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 218 | matches.append((row[self.identifier_column], term, category, col, row[col])) 219 | break 220 | return matches 221 | 222 | def back_to_main_frame(self): 223 | self.column_selection_frame.grid_remove() 224 | self.main_frame.grid() 225 | 226 | def back_to_column_selection(self): 227 | self.identifier_selection_frame.grid_remove() 228 | self.column_selection_frame.grid() 229 | 230 | def back_to_identifier_selection(self): 231 | self.category_selection_frame.grid_remove() 232 | self.identifier_selection_frame.grid() 233 | 234 | def reset(self): 235 | self.load_lexicon_button.config(state='normal') 236 | self.load_metadata_button.config(state='normal') 237 | self.lexicon_df = None 238 | self.metadata_df = None 239 | self.columns = [] 240 | self.categories = [] 241 | self.selected_columns = [] 242 | self.identifier_column = None 243 | self.next_button.grid_remove() 244 | self.explanation_label.grid() 245 | 246 | # Create and run the application 247 | app = MaRMAT() 248 | app.mainloop() 249 | -------------------------------------------------------------------------------- /Code/MaRMAT-GUI-2.5.3.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox, ttk 3 | import pandas as pd 4 | import re 5 | import threading 6 | 7 | class MaRMAT(tk.Tk): 8 | def __init__(self): 9 | super().__init__() 10 | 
self.title("Marriott Reparative Metadata Assessment Tool (MaRMAT)") 11 | 12 | # Initialize variables 13 | self.lexicon_df = None 14 | self.metadata_df = None 15 | self.columns = [] 16 | self.categories = [] 17 | self.selected_columns = [] 18 | self.identifier_column = None 19 | 20 | # Create main frame 21 | self.main_frame = ttk.Frame(self) 22 | self.main_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 23 | 24 | # Explanation text 25 | self.explanation_text = """ 26 | Welcome to Marriott Reparative Metadata Assessment Tool (MaRMAT)! 27 | 28 | This tool allows you to match terms from a problematic terms lexicon file with text data from a collections metadata file. 29 | 30 | Please follow the steps below: 31 | 32 | 1. Load your lexicon and metadata files using the provided buttons. 33 | 34 | 2. On the next screen, select the columns from your metadata file that you want to analyze. 35 | 36 | 3. After selecting columns, choose the column in your metadata file that you want to rewrite as the "Identifier" column that relates back to your original metadata (e.g., a collection ID). 37 | 38 | 4. Then, choose the categories of terms from the lexicon that you want to search for. 39 | 40 | 5. Click "Perform Matching" to find matches and export the results to a CSV file. 41 | 42 | Let's get started! 43 | """ 44 | 45 | self.explanation_label = ttk.Label(self.main_frame, text=self.explanation_text, justify='left', wraplength=600) 46 | self.explanation_label.grid(row=0, column=0, columnspan=3, padx=20, pady=20, sticky="nsew") 47 | 48 | # Load lexicon button 49 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 50 | self.load_lexicon_button.grid(row=1, column=0, padx=10, pady=10, sticky="w") 51 | 52 | # Load metadata button 53 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 54 | self.load_metadata_button.grid(row=1, column=1, padx=10, pady=10, sticky="w") 55 | 56 | # Reset button 57 | self.reset_button = ttk.Button(self.main_frame, text="Reset", command=self.reset) 58 | self.reset_button.grid(row=1, column=2, padx=10, pady=10, sticky="w") 59 | 60 | # Next button 61 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 62 | self.next_button.grid(row=2, column=0, columnspan=3, pady=10, sticky="nsew") 63 | self.next_button.grid_remove() # Hide next button initially 64 | 65 | def load_lexicon(self): 66 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 67 | if file_path: 68 | try: 69 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 70 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 71 | self.load_lexicon_button.config(state='disabled') 72 | except Exception as e: 73 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 74 | 75 | def load_metadata(self): 76 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 77 | if file_path: 78 | try: 79 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 80 | messagebox.showinfo("Success", "Metadata loaded successfully.") 81 | self.load_metadata_button.config(state='disabled') 82 | self.next_button.grid() 83 | except Exception as e: 84 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 85 | 86 | def show_column_selection(self): 87 | if self.lexicon_df is None or self.metadata_df is None: 88 | messagebox.showwarning("Warning", "Please load lexicon and metadata 
files first.") 89 | return 90 | 91 | # Populate columns listbox 92 | self.columns = self.metadata_df.columns.tolist() 93 | self.column_selection_frame = ttk.Frame(self) 94 | self.column_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 95 | 96 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 97 | self.column_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 98 | 99 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 100 | self.column_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 101 | for column in self.columns: 102 | self.column_listbox.insert(tk.END, column) 103 | 104 | self.all_columns_var = tk.BooleanVar(value=False) 105 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 106 | self.all_columns_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 107 | 108 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_identifier_selection) 109 | self.next_button_columns.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 110 | 111 | self.back_button_columns = ttk.Button(self.column_selection_frame, text="Back", command=self.back_to_main_frame) 112 | self.back_button_columns.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 113 | 114 | def show_identifier_selection(self): 115 | self.selected_columns = self.get_selected_columns() # Store selected columns 116 | if not self.selected_columns: 117 | messagebox.showwarning("Warning", "Please select at least one column.") 118 | return 119 | 120 | self.column_selection_frame.grid_remove() 121 | 122 | self.identifier_selection_frame = ttk.Frame(self) 123 | self.identifier_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 124 | 125 | self.identifier_label = ttk.Label(self.identifier_selection_frame, text="Select Identifier Column:") 126 | self.identifier_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 127 | 128 | self.identifier_var = tk.StringVar() 129 | self.identifier_dropdown = ttk.Combobox(self.identifier_selection_frame, textvariable=self.identifier_var, state='readonly') 130 | self.identifier_dropdown.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 131 | self.identifier_dropdown['values'] = self.metadata_df.columns.tolist() # Show all columns as options 132 | self.identifier_dropdown.current(0) # Select first column by default 133 | 134 | self.next_button_identifier = ttk.Button(self.identifier_selection_frame, text="Next", command=self.show_category_selection) 135 | self.next_button_identifier.grid(row=2, column=0, padx=10, pady=10, sticky="nsew") 136 | 137 | self.back_button_identifier = ttk.Button(self.identifier_selection_frame, text="Back", command=self.back_to_column_selection) 138 | self.back_button_identifier.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 139 | 140 | def show_category_selection(self): 141 | self.identifier_column = self.identifier_var.get() 142 | 143 | self.identifier_selection_frame.grid_remove() 144 | 145 | self.category_selection_frame = ttk.Frame(self) 146 | self.category_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 147 | 148 | self.category_label = ttk.Label(self.category_selection_frame, text="Select Categories to Analyze:") 149 | self.category_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 150 | 151 | self.category_listbox = 
tk.Listbox(self.category_selection_frame, selectmode='multiple') 152 | self.category_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 153 | self.categories = self.lexicon_df['category'].unique().tolist() 154 | for category in self.categories: 155 | self.category_listbox.insert(tk.END, category) 156 | 157 | self.all_categories_var = tk.BooleanVar(value=False) 158 | self.all_categories_checkbox = ttk.Checkbutton(self.category_selection_frame, text="All", variable=self.all_categories_var, command=self.toggle_categories) 159 | self.all_categories_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 160 | 161 | self.next_button_categories = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 162 | self.next_button_categories.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 163 | 164 | self.back_button_categories = ttk.Button(self.category_selection_frame, text="Back", command=self.back_to_identifier_selection) 165 | self.back_button_categories.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 166 | 167 | def perform_matching(self): 168 | selected_categories = self.get_selected_categories() 169 | if not selected_categories: 170 | messagebox.showwarning("Warning", "Please select at least one category.") 171 | return 172 | 173 | matches = self.find_matches(self.selected_columns, selected_categories) 174 | matches_filtered = [(identifier, term, category, col, text) for identifier, term, category, col, text in matches if col in self.selected_columns] 175 | output_file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 176 | if output_file_path: 177 | try: 178 | matches_df = pd.DataFrame(matches_filtered, columns=['Identifier', 'Term', 'Category', 'Column', 'Original Text']) 179 | matches_df.to_csv(output_file_path, index=False) 180 | messagebox.showinfo("Success", f"Merged data saved to: {output_file_path}") 181 | self.reset() 182 | except Exception as e: 183 | messagebox.showerror("Error", f"An error occurred while saving file: {e}") 184 | 185 | def toggle_columns(self): 186 | if self.all_columns_var.get(): 187 | self.column_listbox.selection_set(0, tk.END) 188 | self.column_listbox.config(state='disabled') 189 | else: 190 | self.column_listbox.selection_clear(0, tk.END) 191 | self.column_listbox.config(state='normal') 192 | 193 | def toggle_categories(self): 194 | if self.all_categories_var.get(): 195 | self.category_listbox.selection_set(0, tk.END) 196 | self.category_listbox.config(state='disabled') 197 | else: 198 | self.category_listbox.selection_clear(0, tk.END) 199 | self.category_listbox.config(state='normal') 200 | 201 | def get_selected_columns(self): 202 | if self.all_columns_var.get(): 203 | return self.columns 204 | else: 205 | return [self.columns[i] for i in self.column_listbox.curselection()] 206 | 207 | def get_selected_categories(self): 208 | return [self.categories[i] for i in self.category_listbox.curselection()] 209 | 210 | def find_matches(self, selected_columns, selected_categories): 211 | matches = [] 212 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 213 | 214 | for index, row in self.metadata_df.iterrows(): 215 | for col in selected_columns: 216 | if isinstance(row[col], str): 217 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 218 | # Check for multiple matches of terms within a single metadata cell 219 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 220 | 
matches.append((row[self.identifier_column], term, category, col, row[col])) 221 | 222 | return matches 223 | 224 | def back_to_main_frame(self): 225 | self.column_selection_frame.grid_remove() 226 | self.main_frame.grid() 227 | 228 | def back_to_column_selection(self): 229 | self.identifier_selection_frame.grid_remove() 230 | self.column_selection_frame.grid() 231 | 232 | def back_to_identifier_selection(self): 233 | self.category_selection_frame.grid_remove() 234 | self.identifier_selection_frame.grid() 235 | 236 | def reset(self): 237 | self.load_lexicon_button.config(state='normal') 238 | self.load_metadata_button.config(state='normal') 239 | self.lexicon_df = None 240 | self.metadata_df = None 241 | self.columns = [] 242 | self.categories = [] 243 | self.selected_columns = [] 244 | self.identifier_column = None 245 | self.next_button.grid_remove() 246 | self.explanation_label.grid() 247 | 248 | # Create and run the application 249 | app = MaRMAT() 250 | app.mainloop() 251 | -------------------------------------------------------------------------------- /Code/example-output-reparative-metadata-lexicon.csv: -------------------------------------------------------------------------------- 1 | Identifier,Term,Category,Column,Original Text 2 | 337805,aborigines,RaceTerms,title,Aborigines of Taiwan [001] 3 | 332408,aboriginal,RaceTerms,title,"Ainu (Japan's aboriginal people), Hokkaido, Japan [30]" 4 | 332408,wife,GenderTerms,description,"Photo of Japan's Aboriginal people (Chief, his wife and an unidentified person), Hokkaido, Japan" 5 | 332408,aboriginal,RaceTerms,description,"Photo of Japan's Aboriginal people (Chief, his wife and an unidentified person), Hokkaido, Japan" 6 | 335588,gentleman,Aggrandizement,description,"Photo of a Japanese gentleman holding a hand fan, Tokyo, Japan" 7 | 330740,dwarf,Disability,description,"Photograph of block print: ""A Potted Dwarf Pine with a Basin and a Towel on a Rack - Horse Talisman (Mayoke)"", also known as ""A surimono still-life composition"", (from the series A Set of Horses (Umazukushi), 1822) by Katsushika Hokusai (Japanese, 1760-1849), (approximate size, may vary slightly) 206 mm x 183 mm (8.11 in. x 7.20 in.)" 8 | 1533946,indians,RaceTerms,title,Busts of Ute Indians [1] 9 | 1533946,indian,RaceTerms,description,"Black-and-white photograph of a bust of an American Indian by Millard F. Malin, from a set commissioned by the State of Utah in 1934. His models were Ute Indians in the Uinta Basin." 10 | 1533946,indians,RaceTerms,description,"Black-and-white photograph of a bust of an American Indian by Millard F. Malin, from a set commissioned by the State of Utah in 1934. His models were Ute Indians in the Uinta Basin." 11 | 962277,indian,RaceTerms,description,"Photo taken at a court hearing or de-briefing following the American Indian Movement takeover at Wounded Knee, South Dakota, in 1973." 12 | 1498946,indian,RaceTerms,title,Spanish at Indian pueblo 13 | 1498946,indian,RaceTerms,description,"Photograph of an illustration in an unidentified publication, artist's rendition of a party of Spanish horsemen at an Indian pueblo, perhaps in New Mexico." 
14 | 995167,Native Americans,RaceTerms,title,Native Americans herding sheep on horseback 15 | 995167,indians,RaceTerms,description,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation 16 | 947066,squaw,RaceTerms,description,"Photo of a large wood sculpture at Palisades Tahoe (previously Squaw Valley) in Olympic Valley, California, depicting skiers. It was carved in 1995 " 17 | 958790,wife,GenderTerms,title,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960." 18 | 958790,wife,GenderTerms,description,"Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960" 19 | 998919,pioneer,Aggrandizement,title,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 20 | 998919,pioneer,Aggrandizement,description,"Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 21 | 1302623,indian,RaceTerms,title,"Basalt-capped mesa on Dolores (Triassic), 6± miles south of Beddehoche (Indian Wells), Ariz., 1909 (photo G-67)" 22 | 1302623,indian,RaceTerms,description,"Photograph of Black Butte, a basalt-capped mesa south of Indian Wells. From Herbert E. Gregory Book 2: Navajo-Hopi, San Juan 1909" 23 | 1302623,indian,RaceTerms,spatial coverage,"Black Butte (Navajo County, Ariz.); Five Buttes (Ariz.); Navajo County (Ariz.); Navajo Indian Reservation; Arizona" 24 | 995167,Native Americans,RaceTerms,title,Native Americans herding sheep on horseback 25 | 995167,indians,RaceTerms,description,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation 26 | 995167,indian,RaceTerms,spatial coverage,Navajo Indian Reservation 27 | 2364219,pioneer,Aggrandizement,title,"Pioneer Day parade, 1880 (Carter, photo)" 28 | 2364219,pioneer,Aggrandizement,description,"Black and white photograph of the Salt Lake Pioneer Day Parade, July 24, 1880." 29 | 941713,evacuees,JapaneseincarcerationTerm,title,Newly arrived evacuees standing behind their baggage. 
30 | 941713,evacuees,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 31 | 941713,relocation,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 32 | 941713,relocation center,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 33 | 941713,relocation,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 34 | 941713,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese American Relocation Photograph Collection 35 | 941713,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese American Relocation Photograph Collection 36 | 941496,evacuees,JapaneseincarcerationTerm,title,Evacuees cleaning vegetables in the packing shed. 37 | 941496,evacuees,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 38 | 941496,relocation,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 39 | 941496,relocation center,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 40 | 941496,relocation,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 41 | 941496,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 42 | 941496,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 43 | 941536,evacuees,JapaneseincarcerationTerm,title,Evacuees harvesting potatoes at Tule Lake. [5] 44 | 941536,evacuees,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 45 | 941536,relocation,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 46 | 941536,relocation center,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 47 | 941536,relocation,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 48 | 941536,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 49 | 941536,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 50 | 958469,wife,GenderTerms,title,"Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle the ""Challenger"" on the Bonneville Salt Flats Raceway in 1960." 
51 | 958469,wife,GenderTerms,description,"Photo of Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle, the ""Challenger,"" on the Bonneville Salt Flats Raceway in 1960" 52 | 958790,wife,GenderTerms,title,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960." 53 | 958790,wife,GenderTerms,description,"Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960" 54 | 998946,pioneer,Aggrandizement,title,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City" 55 | 998946,pioneer,Aggrandizement,description,"Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 56 | 998908,pioneer,Aggrandizement,title,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City" 57 | 998908,pioneer,Aggrandizement,description,"Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 58 | 998938,pioneer,Aggrandizement,title,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 59 | 998938,pioneer,Aggrandizement,description,"Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 60 | 998919,pioneer,Aggrandizement,title,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 61 | 998919,pioneer,Aggrandizement,description,"Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 62 | 2292054,racism,RaceEuphemisms,description,"Series of letters from Albert Fritz to the Salt Lake Police Department, Assistant Chief Ralph Knusden regarding peaceful demonstrations. NAACP call for protest on State Capitol building due to failure of the Utah Legislature to adopt civil rights legislation regarding housing and public accommodations (2005). NAACP appealed to the Governor of the state of Utah to include civil rights legislation on the docket for the special session of the Utah state Legislature, which he has announced. Local NAACP leaders noticed that Utah is the only northern state with no civil rights legislation (1968). Letter from Albert B. Fritz (Salt Lake NAACP) to Honorable George D. Clyde, Governor of Utah (1965?) regarding lack of civil rights legislation in Utah. Article from the Wall Street Journal (1964) ""Civil Rights Irony: New U.S. Agency's First Case Likely to Come From Utah"" by Donald Moffitt. Article discusses racism in Salt Lake City and highlights Chuck Nabors who moved to Salt Lake City to attend the University of Utah. Nabors rented an apartment sight unseen, when the landlord saw he was an African American the landlord backed out of lease. 
After Nabors found a landlord who would rent to him, neighbors petitioned to have him leave." 63 | 1396789,ethnic,RaceTerms,description,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speakeing migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""" 64 | 1396777,racial,RaceEuphemisms,description,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs dated April 5, 1979; and a bibliography of discrimination." 65 | 1396777,indian,RaceTerms,description,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs dated April 5, 1979; and a bibliography of discrimination." 66 | 1396777,indians,RaceTerms,description,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs dated April 5, 1979; and a bibliography of discrimination." 67 | 1396789,ethnic,RaceTerms,description,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speakeing migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""" 68 | 893658,blacks,RaceTerms,collection name,"Ms0453, Interviews with Blacks in Utah, 1982-1988" 69 | 946265,pioneer,Aggrandizement,title,"Arnold Lunn, left, ski historian and author of the book, The Story of Ski-ing, 1952. And Hjalmar Hvam, right, ski pioneer and inventor of America's first safety binding in 1937." 70 | 947346,pioneer,Aggrandizement,title,"Utah ski pioneer Mel Fletcher skiing on a pair of his homemade ""Barrel Staves,"" circa 1952." 
71 | 947346,pioneer,Aggrandizement,description,"Photo of Mel Fletcher, skiing pioneer and ski instructor, on homemade skis" 72 | 2509070,crippled,Disability,title," ""Captive Jewels--Our Crippled Children"" speech, retyped" 73 | -------------------------------------------------------------------------------- /Code/Test Versions/RMA-GUI-2.5.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox, ttk 3 | import pandas as pd 4 | import re 5 | import threading 6 | import queue 7 | 8 | class ReparativeMetadataAuditTool(tk.Tk): 9 | def __init__(self): 10 | super().__init__() 11 | self.title("Reparative Metadata Audit Tool") 12 | 13 | # Initialize variables 14 | self.lexicon_df = None 15 | self.metadata_df = None 16 | self.columns = [] 17 | self.categories = [] 18 | self.selected_columns = [] 19 | self.identifier_column = None 20 | self.category_selection_page_active = False 21 | 22 | # Create main frame 23 | self.main_frame = ttk.Frame(self) 24 | self.main_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 25 | 26 | # Explanation text 27 | explanation_text = ( 28 | "Welcome to the Reparative Metadata Audit Tool!\n\n" 29 | "This tool allows you to match terms from a problematic terms lexicon file with text data from a collections metadata file.\n\n" 30 | "Please follow the steps below:\n\n" 31 | "1. Load your lexicon and metadata files using the provided buttons.\n" 32 | "2. Select the columns from your metadata file that you want to analyze.\n" 33 | "3. Choose the column in your metadata file that you want to use as the 'Identifier' column.\n" 34 | "4. Choose the categories of terms from the lexicon that you want to search for.\n" 35 | "5. Click 'Perform Matching' to find matches and export the results to a CSV file.\n\n" 36 | "Let's get started!" 
37 | ) 38 | 39 | self.explanation_label = ttk.Label(self.main_frame, text=explanation_text, justify='left') 40 | self.explanation_label.grid(row=0, column=0, columnspan=3, padx=20, pady=20, sticky="nsew") 41 | 42 | # Load lexicon button 43 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 44 | self.load_lexicon_button.grid(row=1, column=0, padx=10, pady=10, sticky="w") 45 | 46 | # Load metadata button 47 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 48 | self.load_metadata_button.grid(row=1, column=1, padx=10, pady=10, sticky="w") 49 | 50 | # Reset button 51 | self.reset_button = ttk.Button(self.main_frame, text="Reset", command=self.reset) 52 | self.reset_button.grid(row=1, column=2, padx=10, pady=10, sticky="w") 53 | 54 | # Next button 55 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 56 | self.next_button.grid(row=2, column=0, columnspan=3, pady=10, sticky="nsew") 57 | self.next_button.grid_remove() # Hide next button initially 58 | 59 | # Progress bar 60 | self.progress_bar = ttk.Progressbar(self.main_frame, orient="horizontal", mode="determinate") 61 | self.progress_bar.grid(row=3, column=0, columnspan=3, pady=10, padx=10, sticky="ew") 62 | self.progress_bar.grid_remove() # Hide progress bar initially 63 | 64 | # Queue for thread communication 65 | self.matching_queue = queue.Queue() 66 | self.matching_thread = None 67 | self.check_queue_job = None 68 | 69 | def load_lexicon(self): 70 | file_path = filedialog.askopenfilename(filetypes=[("CSV and TSV files", "*.csv *.tsv")]) 71 | if file_path: 72 | try: 73 | if file_path.endswith('.csv'): 74 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 75 | elif file_path.endswith('.tsv'): 76 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1', sep='\t') 77 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 78 | self.load_lexicon_button.config(state='disabled') 79 | except Exception as e: 80 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 81 | 82 | def load_metadata(self): 83 | file_path = filedialog.askopenfilename(filetypes=[("CSV and TSV files", "*.csv *.tsv")]) 84 | if file_path: 85 | try: 86 | if file_path.endswith('.csv'): 87 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 88 | elif file_path.endswith('.tsv'): 89 | self.metadata_df = pd.read_csv(file_path, encoding='latin1', sep='\t') 90 | messagebox.showinfo("Success", "Metadata loaded successfully.") 91 | self.load_metadata_button.config(state='disabled') 92 | self.next_button.grid() 93 | except Exception as e: 94 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 95 | 96 | def show_column_selection(self): 97 | if self.lexicon_df is None or self.metadata_df is None: 98 | messagebox.showwarning("Warning", "Please load lexicon and metadata files first.") 99 | return 100 | 101 | # Populate columns listbox 102 | self.columns = self.metadata_df.columns.tolist() 103 | self.column_selection_frame = ttk.Frame(self) 104 | self.column_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 105 | 106 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 107 | self.column_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 108 | 109 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 110 | self.column_listbox.grid(row=1, 
column=0, padx=10, pady=5, sticky="nsew") 111 | for column in self.columns: 112 | self.column_listbox.insert(tk.END, column) 113 | 114 | self.all_columns_var = tk.BooleanVar(value=False) 115 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 116 | self.all_columns_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 117 | 118 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_identifier_selection) 119 | self.next_button_columns.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 120 | 121 | self.back_button_columns = ttk.Button(self.column_selection_frame, text="Back", command=self.back_to_main_frame) 122 | self.back_button_columns.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 123 | self.back_button_columns.grid_remove() # Hide back button initially 124 | 125 | def show_identifier_selection(self): 126 | self.selected_columns = self.get_selected_columns() # Store selected columns 127 | if not self.selected_columns: 128 | messagebox.showwarning("Warning", "Please select at least one column.") 129 | return 130 | 131 | self.column_selection_frame.grid_remove() 132 | 133 | self.identifier_selection_frame = ttk.Frame(self) 134 | self.identifier_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 135 | 136 | self.identifier_label = ttk.Label(self.identifier_selection_frame, text="Select Identifier Column:") 137 | self.identifier_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 138 | 139 | self.identifier_var = tk.StringVar() 140 | self.identifier_dropdown = ttk.Combobox(self.identifier_selection_frame, textvariable=self.identifier_var, state='readonly') 141 | self.identifier_dropdown.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 142 | self.identifier_dropdown['values'] = self.metadata_df.columns.tolist() # Show all columns as options 143 | self.identifier_dropdown.current(0) # Select first column by default 144 | 145 | self.next_button_identifier = ttk.Button(self.identifier_selection_frame, text="Next", command=self.show_category_selection) 146 | self.next_button_identifier.grid(row=2, column=0, padx=10, pady=10, sticky="nsew") 147 | 148 | self.back_button_identifier = ttk.Button(self.identifier_selection_frame, text="Back", command=self.back_to_column_selection) 149 | self.back_button_identifier.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 150 | 151 | def show_category_selection(self): 152 | self.identifier_column = self.identifier_var.get() 153 | 154 | self.identifier_selection_frame.grid_remove() 155 | 156 | self.category_selection_frame = ttk.Frame(self) 157 | self.category_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 158 | 159 | self.category_label = ttk.Label(self.category_selection_frame, text="Select Categories to Analyze:") 160 | self.category_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 161 | 162 | self.category_listbox = tk.Listbox(self.category_selection_frame, selectmode='multiple') 163 | self.category_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 164 | self.categories = self.lexicon_df['category'].unique().tolist() 165 | for category in self.categories: 166 | self.category_listbox.insert(tk.END, category) 167 | 168 | self.all_categories_var = tk.BooleanVar(value=False) 169 | self.all_categories_checkbox = ttk.Checkbutton(self.category_selection_frame, text="All", variable=self.all_categories_var, 
command=self.toggle_categories) 170 | self.all_categories_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 171 | 172 | self.next_button_categories = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 173 | self.next_button_categories.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 174 | 175 | self.back_button_categories = ttk.Button(self.category_selection_frame, text="Back", command=self.back_to_identifier_selection) 176 | self.back_button_categories.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 177 | 178 | def perform_matching(self): 179 | selected_categories = self.get_selected_categories() 180 | if not selected_categories: 181 | messagebox.showwarning("Warning", "Please select at least one category.") 182 | return 183 | 184 | self.progress_bar.grid() 185 | self.progress_bar.start() 186 | 187 | self.matched_results = [] 188 | 189 | def match_terms(): 190 | total_rows = len(self.metadata_df) 191 | for idx, row in self.metadata_df.iterrows(): 192 | for col in self.selected_columns: 193 | cell_value = str(row[col]) 194 | for _, lexicon_row in self.lexicon_df.iterrows(): 195 | term = lexicon_row['term'] 196 | category = lexicon_row['category'] 197 | if category in selected_categories: 198 | if re.search(rf'\b{re.escape(term)}\b', cell_value, re.IGNORECASE): 199 | match_info = { 200 | 'Identifier': row[self.identifier_column], 201 | 'Column': col, 202 | 'Term': term, 203 | 'Category': category, 204 | 'Original Text': cell_value 205 | } 206 | self.matched_results.append(match_info) 207 | self.matching_queue.put((idx + 1) / total_rows * 100) 208 | 209 | def process_queue(): 210 | try: 211 | while True: 212 | progress = self.matching_queue.get_nowait() 213 | self.progress_bar['value'] = progress 214 | self.update_idletasks() 215 | except queue.Empty: 216 | if not self.matching_thread.is_alive(): 217 | self.progress_bar.stop() 218 | self.progress_bar.grid_remove() 219 | if self.matched_results: 220 | self.export_results() 221 | else: 222 | messagebox.showinfo("No Matches", "No matches found.") 223 | else: self.check_queue_job = self.after(100, process_queue)  # keep polling until the matching thread finishes 224 | 225 | self.matching_thread = threading.Thread(target=match_terms) 226 | self.matching_thread.start() 227 | self.check_queue_job = self.after(100, process_queue) 228 | 229 | def export_results(self): 230 | results_df = pd.DataFrame(self.matched_results) 231 | save_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 232 | if save_path: 233 | try: 234 | results_df.to_csv(save_path, index=False, encoding='utf-8') 235 | messagebox.showinfo("Success", "Results exported successfully.") 236 | except Exception as e: 237 | messagebox.showerror("Error", f"An error occurred while exporting results: {e}") 238 | 239 | def get_selected_columns(self): 240 | if self.all_columns_var.get(): 241 | return self.columns 242 | selected_indices = self.column_listbox.curselection() 243 | return [self.columns[i] for i in selected_indices] 244 | 245 | def get_selected_categories(self): 246 | if self.all_categories_var.get(): 247 | return self.categories 248 | selected_indices = self.category_listbox.curselection() 249 | return [self.categories[i] for i in selected_indices] 250 | 251 | def toggle_columns(self): 252 | if self.all_columns_var.get(): 253 | self.column_listbox.selection_set(0, tk.END) 254 | else: 255 | self.column_listbox.selection_clear(0, tk.END) 256 | 257 | def toggle_categories(self): 258 | if self.all_categories_var.get(): 259 |
self.category_listbox.selection_set(0, tk.END) 260 | else: 261 | self.category_listbox.selection_clear(0, tk.END) 262 | 263 | def reset(self): 264 | self.destroy() 265 | self.__init__() 266 | self.mainloop() 267 | 268 | def back_to_main_frame(self): 269 | self.column_selection_frame.grid_remove() 270 | self.main_frame.grid() 271 | 272 | def back_to_column_selection(self): 273 | self.identifier_selection_frame.grid_remove() 274 | self.column_selection_frame.grid() 275 | 276 | def back_to_identifier_selection(self): 277 | self.category_selection_frame.grid_remove() 278 | self.identifier_selection_frame.grid() 279 | 280 | if __name__ == "__main__": 281 | app = ReparativeMetadataAuditTool() 282 | app.mainloop() 283 | 284 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | **MaRMAT Beta is now obsolete. For the most current version of MaRMAT, see https://github.com/marriott-library/MaRMAT.** 2 | 3 | # Marriott Reparative Metadata Assessment Tool (MaRMAT) - Beta 4 | 5 | The Marriott Reparative Metadata Assessment Tool (MaRMAT) is a Python application designed for auditing collections metadata files against a lexicon of potentially problematic terms. The tool's design facilitates an easy-to-follow assessment process. For PC users, we provide a graphical interface for file loading, column selection, and term matching, making it user-friendly for those with limited programming experience. The tool can also be run from your command line. 6 | 7 | Note: Whether you are using the GUI for Windows or the command-line tool for MacOS, you will need to have [Python](https://docs.python.org/3/) and the `pandas` library for Python installed on your machine. Installation instructions for `pandas` are provided below in the respective **Dependencies** sections for the GUI and command-line tool. 8 | 9 | We value your feedback! Please take [this survey](https://docs.google.com/forms/d/e/1FAIpQLSfaABD5qsU2trEjrDWs3MoytgiNCaD08GJvRWzqhgzv5GjoDg/viewform?usp=sf_link) to tell us about your experience using MaRMAT. 10 | 11 | 12 | ## **Table of Contents** 13 | 1. [Project Background](#1-project-background) 14 | 15 | 1.1 [About the Tool](#11-about-the-tool) 16 | 17 | 1.2 [The Lexicons](#12-the-lexicons) 18 | 19 | 1.3 [Features](#13-features) 20 | 21 | 1.4 [Example Outputs and Tutorial](#14-example-outputs-and-tutorial) 22 | 23 | 2. [The GUI for Windows Users](#2-the-gui-for-windows-users) 24 | 25 | 2.1 [Usage](#21-usage) 26 | 27 | 2.2 [Dependencies](#22-dependencies) 28 | 29 | 2.3 [Installation](#23-installation) 30 | 31 | 2.4 [Troubleshooting](#24-troubleshooting) 32 | 33 | 3. [The Command-Line Tool](#3-the-command-line-tool) 34 | 35 | 3.1 [Usage](#31-usage) 36 | 37 | 3.2 [Dependencies](#32-dependencies) 38 | 39 | 3.3 [Notes](#33-notes) 40 | 41 | 4. [Credits and Acknowledgments](#4-credits-and-acknowledgments) 42 | 5. [User Feedback Survey](#5-user-feedback-survey) 43 | 44 | ## 1. Project Background 45 | Identifying potentially harmful language and problematic or outdated Library of Congress Subject Headings is one step toward reparative metadata practices. Deciding what and how to change this metadata, however, is up to metadata practitioners and involves awareness, education, and sensitivity to the communities and history reflected in digital collections.
The [Inclusive Metadata Toolkit](https://osf.io/2nmpc/), created by the Digital Library Federation's Cultural Assessment Working Group, provides resources to educate and assist in reparative metadata decision-making. 46 | 47 | The Marriott Reparative Metadata Assessment Tool (MaRMAT) is based on [Duke University's Description Audit Tool](https://github.com/duke-libraries/description-audit). It is intended to assist digital collections metadata practitioners in bulk analysis of metadata collections to identify potentially harmful language in descriptions and to facilitate repairing metadata to reflect current and preferred terminology. While Duke University's Description Audit Tool was created to analyze MARC XML and EAD finding aid metadata, MaRMAT was developed to analyze metadata in a spreadsheet format, allowing for assessment of Dublin Core and other metadata schemas, since it requires only key column-header names. In addition, the script has been altered to provide more custom querying capabilities. 48 | 49 | ### 1.1 About the Tool 50 | At the most basic level, [MaRMAT](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-GUI-2.5.3.py) is designed to match terms from a lexicon with textual data and produce a CSV file containing the matched results. It utilizes the pandas library for data manipulation and regular expressions for text processing. It was designed primarily with librarians in mind, specifically those engaged in reparative metadata practices, to assist in identifying terms in their metadata that may be outdated, biased, or otherwise problematic. The underlying code (including preliminary iterations) and sample lexicons for using the tool can be accessed via the [Code](https://github.com/marriott-library/MaRMAT/tree/main/Code) folder of this repository. For additional information about the GUI, see [GUI-Documentation](https://github.com/marriott-library/MaRMAT/blob/main/Code/GUI-Documentation.md). 51 | 52 | ### 1.2 The Lexicons 53 | There are two lexicons provided to help begin your reparative metadata assessment. Not all of the terms in these lexicons may need remediation; rather, they may signal areas of your collections that should be reviewed carefully. Users may download the provided lexicons to use in MaRMAT as is, remove terms that may not be problematic in your metadata, or add additional terms and categories based on specific project needs. The only requirement for a lexicon to work against another file is that the CSV file contain two columns: "Term" and "Category" (case sensitive). Therefore, the tool's use is not limited to assessing metadata for problematic terms; it may also be loaded with a custom lexicon to perform matching against a variety of content types. 54 | 55 | | Lexicon | Description | 56 | | :----------:| ---------- | 57 | | Reparative Metadata Lexicon | The [Reparative Metadata Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/lexicon-reparative-metadata.csv) includes potentially harmful terminology organized by category and is best suited for uncontrolled metadata fields (e.g., Title, Description). This lexicon has been adapted from Duke University's lexicons, which were created for similar use cases.
For the Marriott Reparative Metadata Assessment Tool (MaRMAT), Duke's [lexicons](https://github.com/duke-libraries/description-audit/tree/main/lexicons) were modified by transposing across their category columns to create a single lexicon (term, category) that better accommodates users adding additional terms and categories without having to adjust the underlying code structure. | 58 | | Library of Congress Subject Heading (LCSH) Lexicon | The [LCSH Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/lexicon-LCSH.csv) includes selected changed and canceled LCSH (mostly from 2023) and headings that have been identified as problematic. The LCSH Lexicon is best suited to run against the Subject metadata field, or other fields that contain LCSH terms. | 59 | 60 | ### 1.3 Features 61 | - Load lexicon and metadata files in CSV format. 62 | - Select columns from the metadata file for analysis. 63 | - Choose the column in the metadata file to be used as the "Identifier" column so that the output can be reconciled with the original metadata file. 64 | - Select categories of terms from the lexicon for analysis. 65 | - Perform matching to find matches between selected columns and categories. 66 | - Export results to a CSV file. 67 | 68 | ### 1.4 Example Outputs and Tutorial 69 | 70 | To provide users with a sense of what to expect from running MaRMAT against their own metadata collections, below are example metadata to load and query against the provided lexicons, along with example outputs from each lexicon: 71 | 1. [Example Input: Potentially Problematic Metadata](https://github.com/marriott-library/MaRMAT/blob/main/Code/example-input-metadata.csv) 72 | 2. [Example Output: Reparative Metadata Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/example-output-reparative-metadata-lexicon.csv) 73 | 3. [Example Output: LCSH Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/example-output-lcsh-subject-lexicon.csv) 74 | 75 | Please keep in mind that these reports are just snippets of larger reports. Users should be aware that there may be false positives or results that may not need remediation. For example, the LCSH term "Race" is considered a problem heading, but MaRMAT may flag other headings containing "race," as in "Bonneville Salt Flats Race, Utah." Likewise, the gender term "wife" may not always signal an unnamed woman, and terms that may be harmful in some contexts may not be in others. Therefore, we stress the importance of human review and intervention prior to making broad conclusions or global changes based on MaRMAT outputs. 76 | 77 | To assist in getting started with MaRMAT, there is also a [video tutorial](https://youtu.be/uspAoqfj99g?si=jQArVdlbGm_qN78l) that demonstrates the first steps in using the GUI for Windows (subtitles can be enabled in settings). The MacOS demonstration is also available [on video](https://youtu.be/j_fFplU1W_o); please note that audio for this demo is coming soon. 78 | 79 | ## 2. The GUI for Windows Users 80 | To facilitate wider use, the [MaRMAT GUI](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-GUI-2.5.3.py) allows users to easily load a lexicon and a metadata file, select a key column (i.e., Identifier) to use in reconciling matches, and choose the columns and categories they'd like to perform matching on. 81 | 82 | *Note: The GUI is not compatible with MacOS. Additional information on the MaRMAT GUI is available [here](https://github.com/marriott-library/MaRMAT/blob/main/Code/GUI-Documentation.md).*
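Under the hood, the GUI and the command-line tool share the same core matching step: each selected metadata cell is scanned for each lexicon term with a case-insensitive, word-boundary regular expression. The snippet below is a minimal sketch of that logic, not the tool itself; the file names are hypothetical, and the column names assume the bundled lexicon header (`term`, `category`) and the `id`, `title`, and `description` columns of the example metadata.

```python
import re
import pandas as pd

# Hypothetical file names, for illustration only.
lexicon = pd.read_csv("lexicon.csv")    # assumes columns: term, category
metadata = pd.read_csv("metadata.csv")  # assumes columns: id, title, description, ...

matches = []
for _, record in metadata.iterrows():
    for col in ["title", "description"]:  # the columns selected for analysis
        text = str(record[col])
        for _, lex in lexicon.iterrows():
            # \b word boundaries match whole words only, case-insensitively
            pattern = rf"\b{re.escape(str(lex['term']))}\b"
            if re.search(pattern, text, re.IGNORECASE):
                matches.append({
                    "Identifier": record["id"],
                    "Term": lex["term"],
                    "Category": lex["category"],
                    "Column": col,
                    "Original Text": text,
                })

pd.DataFrame(matches).to_csv("matching_results.csv", index=False)
```

Note that the word boundaries prevent substring hits (e.g., "race" inside "racetrack") but not contextual false positives such as "Bonneville Salt Flats Race," which is why human review of the output remains essential.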
83 | 84 | ### 2.1 Usage 85 | 1. Loading Files: 86 | - Click on the "Load Lexicon" button to load the lexicon file. 87 | - Click on the "Load Metadata" button to load the metadata file. 88 | 89 | 2. Selecting Columns: 90 | - After loading files, click "Next" to proceed to column selection. 91 | - Select the columns from the metadata file that you want to analyze. 92 | 93 | 3. Selecting Identifier Column: 94 | - After selecting columns, choose the column in the metadata file that will serve as the key column or "Identifier" column, such as a record ID. 95 | 96 | 4. Selecting Categories: 97 | - Next, choose the categories of terms from the lexicon that you want to search for. 98 | 99 | 5. Performing Matching: 100 | - Click "Perform Matching" to find matches between selected columns and categories. 101 | - The results will be exported to a CSV file. 102 | 103 | ### 2.2 Dependencies 104 | - **[Python 3.x](https://docs.python.org/3/)**: Python is a widely used high-level programming language for general-purpose programming. 105 | 106 | - **[Tkinter](https://docs.python.org/3/library/tk.html)**: Tkinter is Python's standard GUI (Graphical User Interface) package. It is used to create desktop applications with a graphical interface. It is usually included with Python distributions, so no separate installation is required. 107 | 108 | - **[re](https://docs.python.org/3/library/re.html)**: This module provides regular expression matching operations. It's a built-in module in Python and doesn't require separate installation. 109 | 110 | - **[pandas](https://pandas.pydata.org/docs/)**: Pandas is a Python library that provides easy-to-use data structures and data analysis tools for manipulating and analyzing structured data, particularly tabular data. Pandas can be installed using pip in your command line interface: ``py -m pip install pandas`` 111 | 112 | *Note: These dependencies are essential for running MaRMAT. If you don't have Python installed, you can download it from the [official Python website](https://www.python.org/downloads).* 113 | 114 | ### 2.3 Installation 115 | No installation is required. Simply follow the steps below to download and run the Python script to start the application on your PC. 116 | 117 | 1. Download the Python Script: 118 | - Download the [MaRMAT-GUI-2.5.3.py](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-GUI-2.5.3.py) script to a location on your PC where you can easily find it, such as your Desktop or Downloads. 119 | 120 | 2. Ensure Python is Installed: 121 | - To make sure that Python is installed on your PC, search for "Python" in your Start Menu or look for the Python folder in your Program Files. 122 | - If Python is not installed, you can download and install it from the official [Python website](https://www.python.org/downloads). 123 | 124 | 3. Double-Click the Python Script: 125 | - Navigate to the location where you downloaded the script. 126 | - Double-click on the script file (i.e., MaRMAT-GUI-2.5.3.py). 127 | 128 | 4. Application Starts: 129 | - The application should start running automatically; the GUI will appear on your screen. 130 | 131 | ### 2.4 Troubleshooting 132 | The GUI should automatically open when you open the Python code file. If you are having issues with the GUI opening, try opening the file in Python IDLE and running it. IDLE should give you an error message with insights as to why it is not loading correctly.
If you are receiving error messages related to ``pandas``, such as ``No module named 'pandas'``, follow these steps to install ``pandas``. 133 | 134 | 1. Open your command line interface. 135 | 136 | 2. Type the following into the command line: ``py -m pip install pandas`` 137 | 138 | 3. Press Enter to run the command. 139 | 140 | If this process does not resolve your issue, follow these Getting Started tips to make sure Python and the pip installer are running correctly on your PC: [https://pip.pypa.io/en/stable/getting-started](https://pip.pypa.io/en/stable/getting-started/) 141 | 142 | ## 3. The Command-Line Tool 143 | The [MaRMAT command-line tool](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-CommandLine-2.6.py) can be run by any user from their command line. 144 | 145 | ### 3.1 Usage 146 | 1. Install Python if not already installed (Python 3.x recommended). 147 | 148 | 2. Clone or download the MaRMAT repository. 149 | 150 | 3. Use the command-line interface to navigate to the directory where you saved the files (e.g., `Downloads`, `Desktop`). For example, run `cd Downloads` to change your directory to your `Downloads` folder. 151 | 152 | 4. Run the tool in your command line using the following command: ```python3 MaRMAT-CommandLine-2.6.py``` 153 | 154 | 5. Follow the prompts in your command line to provide the paths to the lexicon and metadata files. 155 | 156 | 6. Follow the prompts to input the names of the columns you want to analyze in the metadata file, the name of the column that should be used as the identifier or key column, and the categories of terms from the lexicon that you want to search for in your metadata file. Note: inputs are case sensitive. 157 | 158 | 7. Follow the prompt to provide the path you would like to save your output to. 159 | 160 | 8. Review the matching results displayed on the console or in the generated CSV file. 161 | 162 | *Note: Demonstration video coming soon* 163 | 164 | ### 3.2 Dependencies 165 | 166 | - **[Python 3.x](https://docs.python.org/3/)**: Python is a widely used high-level programming language for general-purpose programming. 167 | 168 | - **[pandas](https://pandas.pydata.org/docs/)**: Pandas is a Python library that provides easy-to-use data structures and data analysis tools for manipulating and analyzing structured data, particularly tabular data. Pandas can be installed using pip in Terminal: `pip install pandas` 169 | 170 | - **[re](https://docs.python.org/3/library/re.html)**: This module provides regular expression matching operations. It's a built-in module in Python and doesn't require separate installation. 171 | 172 | *Note: These dependencies are necessary to run the provided code successfully. Ensure that you have them installed before running the code.* 173 | 174 | ### 3.3 Notes 175 | - Ensure that both the lexicon and metadata files are in CSV format. 176 | - The lexicon file should contain columns for terms and their corresponding categories ("Term","Category"). 177 | - The metadata file should contain the text data to be analyzed, with each row representing a separate entry. 178 | - The metadata file should contain a column, such as a Record ID, that you can use as an "Identifier" to reconcile the tool's output with your original metadata. 179 | - The tool outputs matching results to a CSV file named "matching_results.csv" in the tool's directory. 180 | 181 | ## 4.
Credits and Acknowledgments 182 | Code developed by [Kaylee Alexander](https://github.com/kayleealexander) in collaboration with ChatGPT 3.5, [Rachel Wittmann](https://github.com/RachelJaneWittmann), and [Anna Neatrour](https://github.com/aneatrour) at the University of Utah's J. Willard Marriott Library. MaRMAT Beta was released in July 2024. 183 | 184 | This tool was inspired by the Duke University Libraries Description Audit Tool, developed by [Noah Huffman](https://github.com/noahgh221) at the Rubenstein Library, and expanded by [Miriam Shams-Rainey](https://github.com/mshamsrainey) (see [Description-Audit](https://github.com/duke-libraries/description-audit/tree/main)). 185 | 186 | ## 5. User Feedback Survey 187 | After using MaRMAT, please take [this survey](https://docs.google.com/forms/d/e/1FAIpQLSfaABD5qsU2trEjrDWs3MoytgiNCaD08GJvRWzqhgzv5GjoDg/viewform?usp=sf_link) and tell us about your experience using MaRMAT. We appreciate your feedback! 188 | -------------------------------------------------------------------------------- /Code/lexicon-LCSH.csv: -------------------------------------------------------------------------------- 1 | term,category 2 | Afghanistan--Politics and government--2001-,ChangeHeadingLCSH 3 | African American bisexuals,ChangeHeadingLCSH 4 | African American gays,ChangeHeadingLCSH 5 | African American gays in literature,ChangeHeadingLCSH 6 | Aged,ChangeHeadingLCSH 7 | "Al-Aqsa Intifada, 2000-",ChangeHeadingLCSH 8 | Aliens,ChangeHeadingLCSH 9 | Ammassalimiut dialect,ChangeHeadingLCSH 10 | Amycus (Greek mythology),CancelHeadingLCSH 11 | Anaxarete (Greek mythology),CancelHeadingLCSH 12 | Anaxarete (Greek mythology) in literature,CancelHeadingLCSH 13 | Anime,CancelHeadingLCSH 14 | "Anti-war poetry, Oriya",ChangeHeadingLCSH 15 | Antifa (Organisation),ProblemLCSH 16 | "Argentina--History--Dirty War, 1976-1983",ProblemLCSH 17 | Asian American bisexuals,ChangeHeadingLCSH 18 | Asian American gays,ChangeHeadingLCSH 19 | Asian flu,ProblemLCSH 20 | Asperger's syndrome,ProblemLCSH 21 | Attention-deficit disorder in adolescence,CancelHeadingLCSH 22 | Attention-deficit disorder in adults,CancelHeadingLCSH 23 | Attention-deficit-disordered children,CancelHeadingLCSH 24 | Attention-deficit-disordered parents,ChangeHeadingLCSH 25 | Attention-deficit-disordered youth,ChangeHeadingLCSH 26 | B.C.
and A.D.,ProblemLCSH 27 | B17 (Steam locomotive),ChangeHeadingLCSH 28 | Baucis (Greek mythology),CancelHeadingLCSH 29 | Bellerophon (Greek mythology),CancelHeadingLCSH 30 | Berbers,ProblemLCSH 31 | Bildungsromans,ProblemLCSH 32 | "Bird, Mount (Ross Island, Ross Sea, Antarctica)",ChangeHeadingLCSH 33 | Bisexuals,ChangeHeadingLCSH 34 | Bisexuals' writings,ChangeHeadingLCSH 35 | "Bisexuals' writings, American",ChangeHeadingLCSH 36 | "Bisexuals' writings, Canadian",ChangeHeadingLCSH 37 | "Bisexuals' writings, English",ChangeHeadingLCSH 38 | Blacks,ChangeHeadingLCSH 39 | Boat people,ProblemLCSH 40 | Bossiness,ProblemLCSH 41 | Boys love (Gay erotica),ChangeHeadingLCSH 42 | "Brazil--History--Revolution, 1964",ProblemLCSH 43 | Brecon Beacons National Park (Wales),ChangeHeadingLCSH 44 | Buddhist gays,ChangeHeadingLCSH 45 | Bukit Baka-Bukit Raya National Park (Indonesia),ChangeHeadingLCSH 46 | Candra (Hindu deity),CancelHeadingLCSH 47 | "Cang Lang Ting (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 48 | Catholic gays,ChangeHeadingLCSH 49 | Cerebral palsied,ProblemLCSH 50 | Cheju Tol Munhwa Kongwŏn (Korea),ChangeHeadingLCSH 51 | Child pornography,ProblemLCSH 52 | Children of attention-deficit-disordered parents,ChangeHeadingLCSH 53 | Children of egg donors,CancelHeadingLCSH 54 | Children of sperm donors,CancelHeadingLCSH 55 | "Children's literature, Oriya",ChangeHeadingLCSH 56 | "Children's poetry, Oriya",ChangeHeadingLCSH 57 | Christian gays,ChangeHeadingLCSH 58 | Church work with African American gays,ChangeHeadingLCSH 59 | Church work with attention-deficit-disordered youth,ChangeHeadingLCSH 60 | Church work with gays,ChangeHeadingLCSH 61 | Civil procedure (Bantu law),CancelHeadingLCSH 62 | Climatic changes,ProblemLCSH 63 | Closeted gays,ChangeHeadingLCSH 64 | Closeted gays in literature,ChangeHeadingLCSH 65 | Common fallacies,ProblemLCSH 66 | Crimea (Ukraine)--History--2014-,ChangeHeadingLCSH 67 | Criminals,ProblemLCSH 68 | Cripples,ChangeHeadingLCSH 69 | Cross-dressing,ProblemLCSH 70 | Crystalline lens,ProblemLCSH 71 | Dattātreya (Hindu deity),CancelHeadingLCSH 72 | Deaf gays,ChangeHeadingLCSH 73 | Defloration,ProblemLCSH 74 | Devakī (Hindu mythology),CancelHeadingLCSH 75 | Deviant behavior,ProblemLCSH 76 | Dewbow,ChangeHeadingLCSH 77 | "Didactic poetry, Oriya",ChangeHeadingLCSH 78 | Discovery and exploration,ProblemLCSH 79 | Discrimination against overweight persons,ProblemLCSH 80 | Domestic relations,ProblemLCSH 81 | "Donetsk International Airport, Battle for, 2nd, Ukraine, 2014-2015",ChangeHeadingLCSH 82 | Dwarfs (Persons),ProblemLCSH 83 | East Indians,ProblemLCSH 84 | Eskimos,ProblemLCSH 85 | Ethnic arts,ProblemLCSH 86 | Evaluative (Linguistics),ChangeHeadingLCSH 87 | Ex-gays,ChangeHeadingLCSH 88 | Expulsion of the Mormons,ChangeHeadingLCSH 89 | Female circumcision,ChangeHeadingLCSH 90 | Feminine hygiene products,ChangeHeadingLCSH 91 | Feminine hygiene products industry,ChangeHeadingLCSH 92 | Fetishism,ProblemLCSH 93 | Florida Trail (Fla.),CancelHeadingLCSH 94 | Fogbow,ChangeHeadingLCSH 95 | "Folk drama, Oriya",ChangeHeadingLCSH 96 | "Folk literature, Oriya",ChangeHeadingLCSH 97 | "Folk songs, Dunsun",ChangeHeadingLCSH 98 | "Folk songs, Kalâtdlisut",ChangeHeadingLCSH 99 | "Folk songs, Karok",ChangeHeadingLCSH 100 | Freedmen,ChangeHeadingLCSH 101 | Friesland (Netherlands)--History,ChangeHeadingLCSH 102 | Future life,ProblemLCSH 103 | Galván family,ChangeHeadingLCSH 104 | Gays,ChangeHeadingLCSH 105 | Gays and rock music,ChangeHeadingLCSH 106 | Gays and sports,ChangeHeadingLCSH 107 | Gays 
and the performing arts,ChangeHeadingLCSH 108 | Gays in advertising,CancelHeadingLCSH 109 | Gays in higher education,ChangeHeadingLCSH 110 | Gays in literature,ChangeHeadingLCSH 111 | Gays in mass media,ChangeHeadingLCSH 112 | Gays in motion pictures,ChangeHeadingLCSH 113 | Gays in popular culture,ChangeHeadingLCSH 114 | Gays in the civil service,ChangeHeadingLCSH 115 | Gays in the performing arts,ChangeHeadingLCSH 116 | Gays with disabilities,ChangeHeadingLCSH 117 | "Gays, Black",ChangeHeadingLCSH 118 | Gays' writings,ChangeHeadingLCSH 119 | "Gays' writings, American",ChangeHeadingLCSH 120 | "Gays' writings, Australian",ChangeHeadingLCSH 121 | "Gays' writings, Austrian",ChangeHeadingLCSH 122 | "Gays' writings, Basque",ChangeHeadingLCSH 123 | "Gays' writings, Canadian",ChangeHeadingLCSH 124 | "Gays' writings, Caribbean",ChangeHeadingLCSH 125 | "Gays' writings, Catalan",ChangeHeadingLCSH 126 | "Gays' writings, Chilean",ChangeHeadingLCSH 127 | "Gays' writings, Chinese",ChangeHeadingLCSH 128 | "Gays' writings, Costa Rican",ChangeHeadingLCSH 129 | "Gays' writings, Dominican",ChangeHeadingLCSH 130 | "Gays' writings, English",ChangeHeadingLCSH 131 | "Gays' writings, French",ChangeHeadingLCSH 132 | "Gays' writings, Galician",ChangeHeadingLCSH 133 | "Gays' writings, German",ChangeHeadingLCSH 134 | "Gays' writings, Irish",ChangeHeadingLCSH 135 | "Gays' writings, Israeli",ChangeHeadingLCSH 136 | "Gays' writings, Italian",ChangeHeadingLCSH 137 | "Gays' writings, Latin American",ChangeHeadingLCSH 138 | "Gays' writings, Malaysian (English)",ChangeHeadingLCSH 139 | "Gays' writings, Philippine (English)",ChangeHeadingLCSH 140 | "Gays' writings, Portuguese",ChangeHeadingLCSH 141 | "Gays' writings, Puerto Rican",ChangeHeadingLCSH 142 | "Gays' writings, Scottish",ChangeHeadingLCSH 143 | "Gays' writings, South African (English)",ChangeHeadingLCSH 144 | "Gays' writings, Spanish",ChangeHeadingLCSH 145 | "Gays' writings, Spanish American",ChangeHeadingLCSH 146 | "Gays' writings, Tagalog",ChangeHeadingLCSH 147 | "Gays’ writings, Bikol",ChangeHeadingLCSH 148 | Gender identity disorders,ChangeHeadingLCSH 149 | Gender-nonconforming people,ProblemLCSH 150 | Giants,CancelHeadingLCSH 151 | Giants in art,ChangeHeadingLCSH 152 | Giants in literature,ChangeHeadingLCSH 153 | Giants in the Bible,ChangeHeadingLCSH 154 | "Glen Villa Art Garden (North Hatley, Québec)",ChangeHeadingLCSH 155 | God (Islam),ProblemLCSH 156 | Great Chalfield Manor (England),ChangeHeadingLCSH 157 | Gypsies,ChangeHeadingLCSH 158 | "Hand-to-hand fighting, Oriental",ProblemLCSH 159 | Handicapped children,ChangeHeadingLCSH 160 | Hearing impaired,ProblemLCSH 161 | "Heng Jie (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 162 | Héta Indians,ChangeHeadingLCSH 163 | Hispanic American gays,ChangeHeadingLCSH 164 | Hispanic Americans,ProblemLCSH 165 | "Historical drama, Oriya",ChangeHeadingLCSH 166 | Homeless persons,ProblemLCSH 167 | Hotchkiss automobile,ChangeHeadingLCSH 168 | Human monkeypox,ChangeHeadingLCSH 169 | "Humorous poetry, Oriya",ChangeHeadingLCSH 170 | Husband and wife,ProblemLCSH 171 | Hypermnestra (Greek mythology),CancelHeadingLCSH 172 | Hypnobryales,ChangeHeadingLCSH 173 | Illegal aliens,ChangeHeadingLCSH 174 | Illegal immigration,ProblemLCSH 175 | Illegitimacy,ProblemLCSH 176 | Illegitimate children,ProblemLCSH 177 | Indecent assault,CancelHeadingLCSH 178 | Indian cooking,ProblemLCSH 179 | Indian gays,ChangeHeadingLCSH 180 | Indian gays in literature,ChangeHeadingLCSH 181 | Indians of North America,ProblemLCSH 182 | Inmates of 
institutions,ProblemLCSH 183 | Invalids,ProblemLCSH 184 | Isobryales,CancelHeadingLCSH 185 | Italian American gays,CancelHeadingLCSH 186 | "Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH 187 | Japanimation,CancelHeadingLCSH 188 | Jewish bisexuals,ChangeHeadingLCSH 189 | Jewish gays,ChangeHeadingLCSH 190 | Jewish question,ProblemLCSH 191 | Juvenile delinquents,ProblemLCSH 192 | Kalâtdlisut dialect,ChangeHeadingLCSH 193 | Kalâtdlisut language,ChangeHeadingLCSH 194 | Kalâtdlisut literature,ChangeHeadingLCSH 195 | Kalâtdlisut poetry,ChangeHeadingLCSH 196 | Karok art,ChangeHeadingLCSH 197 | Karok artists,ChangeHeadingLCSH 198 | Karok baskets,ChangeHeadingLCSH 199 | Karok Indians,ChangeHeadingLCSH 200 | Karok language,ChangeHeadingLCSH 201 | Karok mythology,ChangeHeadingLCSH 202 | Karok women,ChangeHeadingLCSH 203 | Keeche Indians,ChangeHeadingLCSH 204 | Kejimkujik National Park (N.S.),ChangeHeadingLCSH 205 | "KeyArena (Seattle, Wash.)",ChangeHeadingLCSH 206 | Kham language,ChangeHeadingLCSH 207 | Kingdom of God (Mormon theology),ChangeHeadingLCSH 208 | Kings and rulers,ProblemLCSH 209 | Landlord and tenant,ProblemLCSH 210 | "Laudatory poetry, Oriya",ChangeHeadingLCSH 211 | "Law, Bantu",CancelHeadingLCSH 212 | "Lebanon--History--Israeli intervention, 1982-1985",ProblemLCSH 213 | Legal assistance to gays,ChangeHeadingLCSH 214 | Leprosy,ProblemLCSH 215 | Leprosy,ProblemLCSH 216 | Libraries and bisexuals,ChangeHeadingLCSH 217 | Libraries and gays,ChangeHeadingLCSH 218 | Mah-Meri (Malaysian people),ChangeHeadingLCSH 219 | "Masks, Mah-Meri",ChangeHeadingLCSH 220 | Mass media and gays,ChangeHeadingLCSH 221 | Mentally retarded persons,ChangeHeadingLCSH 222 | Mexican American gays,CancelHeadingLCSH 223 | Mexican fruit-fly,ChangeHeadingLCSH 224 | Middle-aged gays,ChangeHeadingLCSH 225 | Minority gays,CancelHeadingLCSH 226 | Minority gays in literature,ChangeHeadingLCSH 227 | Missions to Mormons,ChangeHeadingLCSH 228 | Mogul Empire,ProblemLCSH 229 | Mongoloid race,ProblemLCSH 230 | Mormon,ChangeHeadingLCSH 231 | Mormon almanacs,ChangeHeadingLCSH 232 | Mormon architecture,ChangeHeadingLCSH 233 | Mormon art,ChangeHeadingLCSH 234 | Mormon artists,ChangeHeadingLCSH 235 | Mormon arts,ChangeHeadingLCSH 236 | Mormon athletes,ChangeHeadingLCSH 237 | Mormon authors,ChangeHeadingLCSH 238 | Mormon boys,ChangeHeadingLCSH 239 | Mormon children,ChangeHeadingLCSH 240 | Mormon Church,ChangeHeadingLCSH 241 | Mormon church buildings,ChangeHeadingLCSH 242 | Mormon cities and towns,ChangeHeadingLCSH 243 | Mormon converts,ChangeHeadingLCSH 244 | Mormon cooking,ChangeHeadingLCSH 245 | Mormon cosmology,ChangeHeadingLCSH 246 | Mormon decorative arts,ChangeHeadingLCSH 247 | Mormon families,ChangeHeadingLCSH 248 | Mormon fundamentalism,ChangeHeadingLCSH 249 | Mormon furniture,ChangeHeadingLCSH 250 | Mormon gays,ChangeHeadingLCSH 251 | Mormon girls,ChangeHeadingLCSH 252 | Mormon handcart companies,ChangeHeadingLCSH 253 | Mormon historians,ChangeHeadingLCSH 254 | Mormon hygiene,ChangeHeadingLCSH 255 | Mormon intellectuals,ChangeHeadingLCSH 256 | Mormon interpretations,ChangeHeadingLCSH 257 | Mormon men,ChangeHeadingLCSH 258 | Mormon midwives,ChangeHeadingLCSH 259 | Mormon missionaries,ChangeHeadingLCSH 260 | Mormon neo-orthodoxy,ChangeHeadingLCSH 261 | Mormon painting,ChangeHeadingLCSH 262 | Mormon pilgrims and pilgrimages,ChangeHeadingLCSH 263 | Mormon pioneers,ChangeHeadingLCSH 264 | Mormon press,ChangeHeadingLCSH 265 | Mormon quilts,ChangeHeadingLCSH 266 | Mormon returned 
missionaries,ChangeHeadingLCSH 267 | Mormon scholars,ChangeHeadingLCSH 268 | Mormon seminaries,ChangeHeadingLCSH 269 | Mormon shrines,ChangeHeadingLCSH 270 | Mormon tabernacles,ChangeHeadingLCSH 271 | Mormon temples,ChangeHeadingLCSH 272 | Mormon universities and colleges,ChangeHeadingLCSH 273 | Mormon wit and humor,ChangeHeadingLCSH 274 | Mormon women,ChangeHeadingLCSH 275 | Mormon women authors,ChangeHeadingLCSH 276 | Mormon women missionaries,ChangeHeadingLCSH 277 | Mormon youth,ChangeHeadingLCSH 278 | Mormons,ChangeHeadingLCSH 279 | Mormons in art,ChangeHeadingLCSH 280 | Mormons in literature,ChangeHeadingLCSH 281 | Mormons in mass media,ChangeHeadingLCSH 282 | Mormons in motion pictures,ChangeHeadingLCSH 283 | Morrigan (Celtic deity),CancelHeadingLCSH 284 | Multiple personality disorder,ProblemLCSH 285 | Muslim gays,ChangeHeadingLCSH 286 | "Mythology, Aboriginal Australian",ProblemLCSH 287 | Nanovor (Game),CancelHeadingLCSH 288 | Navajo,ProblemLCSH 289 | Neopagan gays,ChangeHeadingLCSH 290 | Nephites,ChangeHeadingLCSH 291 | New Jerusalem (Mormon theology),ChangeHeadingLCSH 292 | Ngati Awa (New Zealand people),ChangeHeadingLCSH 293 | Ngati Haua (New Zealand people),ChangeHeadingLCSH 294 | Ngati Mahuta (New Zealand people),ChangeHeadingLCSH 295 | Ngati Mamoe (New Zealand people),ChangeHeadingLCSH 296 | Ngati Pukenga (New Zealand people),ChangeHeadingLCSH 297 | Nilam River (Pakistan),ChangeHeadingLCSH 298 | Nilam River Valley (Pakistan),ChangeHeadingLCSH 299 | Nile mosaic (Palestrina),CancelHeadingLCSH 300 | Obsession,ProblemLCSH 301 | Obsession (Psychology),ProblemLCSH 302 | Older gays,ChangeHeadingLCSH 303 | Older Mormons,ChangeHeadingLCSH 304 | "One-act plays, Oriya",ChangeHeadingLCSH 305 | Ordinances for the dead (Mormon Church),ChangeHeadingLCSH 306 | Ordination of gays,ChangeHeadingLCSH 307 | Oriental literature,ProblemLCSH 308 | Orientalism,ProblemLCSH 309 | Oriya drama,ChangeHeadingLCSH 310 | Oriya essays,ChangeHeadingLCSH 311 | Oriya fiction,ChangeHeadingLCSH 312 | Oriya language,ChangeHeadingLCSH 313 | Oriya literature,ChangeHeadingLCSH 314 | Oriya poetry,ChangeHeadingLCSH 315 | Oriya poetry--1500-1800,ChangeHeadingLCSH 316 | Oriya poetry--20th century,ChangeHeadingLCSH 317 | Oriya prose literature,ChangeHeadingLCSH 318 | Oriya prose literature--To 1500,ChangeHeadingLCSH 319 | Oriya wit and humor,ChangeHeadingLCSH 320 | "Ou Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 321 | Overweight gays,ChangeHeadingLCSH 322 | Pacific Islander American bisexuals,ChangeHeadingLCSH 323 | Pacific Islander American gays,ChangeHeadingLCSH 324 | Pacifists,ProblemLCSH 325 | Palestinian Arabs,ProblemLCSH 326 | Parental leave,ChangeHeadingLCSH 327 | Parents of attention-deficit-disordered children,ChangeHeadingLCSH 328 | Parents of gays,ChangeHeadingLCSH 329 | Parolees,ProblemLCSH 330 | Patriarchal blessings (Mormon Church),ChangeHeadingLCSH 331 | Patriarchs (Mormon theology),ChangeHeadingLCSH 332 | "Patriotic poetry, Oriya",ChangeHeadingLCSH 333 | People with mental disabilities,ProblemLCSH 334 | People with social disabilities,ProblemLCSH 335 | Philemon (Greek mythology),CancelHeadingLCSH 336 | Pioneer Day (Mormon history),ChangeHeadingLCSH 337 | Plan of salvation (Mormon theology),ChangeHeadingLCSH 338 | Poor,ProblemLCSH 339 | Popular music--South Korea--2011-2020,ProblemLCSH 340 | Porirua Harbour (N.Z.),ChangeHeadingLCSH 341 | Posse Comitatus (Group),CancelHeadingLCSH 342 | Pregnant women,ProblemLCSH 343 | Presbyterian gays,ChangeHeadingLCSH 344 | Primitive art,ProblemLCSH 345 | 
Prisoners,ProblemLCSH 346 | Problem children,ProblemLCSH 347 | Prophets (Mormon theology),ChangeHeadingLCSH 348 | Prostitution,ProblemLCSH 349 | Protestant gays,ChangeHeadingLCSH 350 | Psychic trauma,ProblemLCSH 351 | "Quatrains, Oriya",ChangeHeadingLCSH 352 | Race,ProblemLCSH 353 | Race relations,ProblemLCSH 354 | Race riots,ProblemLCSH 355 | Racially mixed people,ProblemLCSH 356 | Radio programs for gays,ChangeHeadingLCSH 357 | "Rāsa literature, Oriya",ChangeHeadingLCSH 358 | Rastafarian,ProblemLCSH 359 | "Red River Rebellion, 1869-1870",ChangeHeadingLCSH 360 | "Reichstagsgebäude (Berlin, Germany)",CancelHeadingLCSH 361 | "Religious literature, Oriya",ChangeHeadingLCSH 362 | "Religious poetry, Oriya",ChangeHeadingLCSH 363 | Restoration of the gospel (Mormon doctrine),ChangeHeadingLCSH 364 | "Revolutionary poetry, Oriya",ChangeHeadingLCSH 365 | Rgyal-roṅ (China),ChangeHeadingLCSH 366 | "Riel Rebellion, 1885",ChangeHeadingLCSH 367 | Samantabhadra (Buddhist deity),CancelHeadingLCSH 368 | Schizophrenics,ProblemLCSH 369 | Sex change,ChangeHeadingLCSH 370 | Sex role,ProblemLCSH 371 | Sexual minorities,ProblemLCSH 372 | Sexual reorientation programs,ChangeHeadingLCSH 373 | "Shiquan Jie (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 374 | "Short stories, Oriya",ChangeHeadingLCSH 375 | Social disabilities,ProblemLCSH 376 | Social work with bisexuals,ChangeHeadingLCSH 377 | Social work with gays,ChangeHeadingLCSH 378 | Socially handicapped,ChangeHeadingLCSH 379 | South Asian American gays,ChangeHeadingLCSH 380 | Street-food vendors,ChangeHeadingLCSH 381 | Stuart Island State Park (Wash.),ChangeHeadingLCSH 382 | Substance abuse,ProblemLCSH 383 | Śulvasūtras,CancelHeadingLCSH 384 | "Support (Domestic relations law, Hindu)",ChangeHeadingLCSH 385 | "Support (Domestic relations law, Islamic)",ChangeHeadingLCSH 386 | "Support (Domestic relations law, Jewish)",ChangeHeadingLCSH 387 | Television and gays,ChangeHeadingLCSH 388 | Television programs for gays,ChangeHeadingLCSH 389 | Temple endowments (Mormon Church),ChangeHeadingLCSH 390 | Temple work (Mormon Church),ChangeHeadingLCSH 391 | Tenth of Muḥarram,ProblemLCSH 392 | Theology,ProblemLCSH 393 | "Tibet, Plateau of",ChangeHeadingLCSH 394 | Tramps,ProblemLCSH 395 | Transvestism,ChangeHeadingLCSH 396 | Triangles (Interpersonal relations),ProblemLCSH 397 | "Ukraine Conflict, 2014-",CancelHeadingLCSH 398 | Ukraine--Economic conditions--1991-,ChangeHeadingLCSH 399 | Ukraine--Economic policy--1991-,ChangeHeadingLCSH 400 | Ukraine--Foreign relations--1991-,ChangeHeadingLCSH 401 | Ukraine--History--1917-,ChangeHeadingLCSH 402 | Ukraine--History--1917-1991,ChangeHeadingLCSH 403 | Ukraine--History--1991-,ChangeHeadingLCSH 404 | "Ukraine--History--Euromaidan Protests, 2013-2014",ChangeHeadingLCSH 405 | "Ukraine--History--Russian Invasion, 2022-",ChangeHeadingLCSH 406 | Ukraine--Intellectual life--1991-,ChangeHeadingLCSH 407 | Ukraine--Politics and government--1991-,ChangeHeadingLCSH 408 | Ukraine--Social conditions--1991-,ChangeHeadingLCSH 409 | Ultra-Orthodox,ProblemLCSH 410 | United orders (Mormon Church),ChangeHeadingLCSH 411 | United States. 
Army--Gays,ChangeHeadingLCSH 412 | Unskilled labor,ProblemLCSH 413 | "Vaishnava poetry, Oriya",ChangeHeadingLCSH 414 | Victims,ProblemLCSH 415 | Waima'a language,ChangeHeadingLCSH 416 | "Wang Shi Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 417 | Wards (Mormon Church),ChangeHeadingLCSH 418 | Whites,ChangeHeadingLCSH 419 | Women in the Mormon Church,ChangeHeadingLCSH 420 | Women in the Mormon sacred books,ChangeHeadingLCSH 421 | Word recognition,ProblemLCSH 422 | World music,ProblemLCSH 423 | "World War, 1939-1945--Gays",ChangeHeadingLCSH 424 | Yellow peril,CancelHeadingLCSH 425 | "Yipu Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 426 | "Zhuo Zheng Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 427 | Zion (Mormon Church),ChangeHeadingLCSH 428 | Zoroastrianim,CancelHeadingLCSH 429 | -------------------------------------------------------------------------------- /Code/example-input-metadata.csv: -------------------------------------------------------------------------------- 1 | id,title,description,creator,date,collection name,subjects,spatial coverage 2 | 337805,Aborigines of Taiwan [001],"A 1974 photo showing a group of aborigine dancers, Taiwan","Tierney, Lennox",1974,P0479 Lennox and Catherine Tierney Photo Collection,Indigenous peoples--Photographs; Taiwan aborigines--Photographs; Amis (Taiwan people)--Photographs; Dance--Photographs; Clothing and dress--Photographs; Taiwan; Dance; Clothing and dress; Republic of China,Taiwan 3 | 332408,"Ainu (Japan's aboriginal people), Hokkaido, Japan [30]","Photo of Japan's Aboriginal people (Chief, his wife and an unidentified person), Hokkaido, Japan","Tierney, Lennox",1959,P0479 Lennox and Catherine Tierney Photo Collection,Ainu--Photographs; Men--Photographs; Women--Photographs; Japan; Indigenous peoples,Shiraoi-chō (Japan) 4 | 335588,People of Japan [002],"Photo of a Japanese gentleman holding a hand fan, Tokyo, Japan","Tierney, Lennox",1950; 1951; 1952,P0479 Lennox and Catherine Tierney Photo Collection,Japanese--Japan--Tokyo--Photographs; Men--Japan--Tokyo--Photographs; Fans--Japan--Tokyo--Photographs; Japan; Men; Fans,Japan 5 | 330740,Block printing: Katsushika Hokusai [015],"Photograph of block print: ""A Potted Dwarf Pine with a Basin and a Towel on a Rack - Horse Talisman (Mayoke)"", also known as ""A surimono still-life composition"", (from the series A Set of Horses (Umazukushi), 1822) by Katsushika Hokusai (Japanese, 1760-1849), (approximate size, may vary slightly) 206 mm x 183 mm (8.11 in. x 7.20 in.)","Tierney, Lennox",2003,P0479 Lennox and Catherine Tierney Photo Collection,"Katsushika, Hokusai, 1760-1849--Photographs; Block printing--Japan--Photographs; Ukiyoe--Japan--Photographs; Surimono--Japan--Photographs; Trees--Art--Photographs; Bonsai--Art--Photographs; Pine--Art--Photographs; Towels--Art--Photographs; Basins (Containers)--Art--Photographs; Water--Art--Photographs; Art; Ukiyoe; Surimono", 6 | 1533946,Busts of Ute Indians [1],"Black-and-white photograph of a bust of an American Indian by Millard F. Malin, from a set commissioned by the State of Utah in 1934. His models were Ute Indians in the Uinta Basin.",,1934; 1935; 1936,P0177 Millard F. 
Malin Photographs,"Indians of North America--Monuments--Photographs; Indians of North America--Art--Photographs; Malin, Millard F., 1891-1974--Works--Photographs; Sculptures--Photographs; Indigenous peoples--North America", 7 | 1398979,"Navajo Pavilion, Gateway Center, February 14, 2002 [18]","Color photograph of Navajo performers at the Navajo pavillion, Gateway Center in downtown Salt Lake City, February 14, 2002.",,2002-02-14,P0810 Peter L. Goss Photograph Collection,"Navajo Indians--Music--Photographs; Navajo Indians-Photographs; Salt Lake City (Utah)--Photographs; Olympic Winter Games (19th : 2002 : Salt Lake City, Utah)--Photographs; Indigenous peoples--North America","Salt Lake City, Salt Lake County, Utah, United States" 8 | 962277,"Photo taken during the AIM takeover and ultimate surrender at Wounded Knee, South Dakota. (De-briefing or court hearing.)","Photo taken at a court hearing or de-briefing following the American Indian Movement takeover at Wounded Knee, South Dakota, in 1973.",,1973,P0181 Stanley Lyman Photograph Collection,"Wounded Knee (S.D.)--History--Indian occupation, 1973--Photographs; Indians of North America; Indigenous peoples--North America",Wounded Knee (S.D.); South Dakota 9 | 2348627,Sur. Navajo Mt. looking down Colorado from 5 miles below San Juan,"Black and white photograph showing a view of Navajo Mountain from the Colorado River. Photo taken during the U.S. Geological Survey's 1921 survey of the San Juan River, led by K. W. Trimble, with Bert Loper serving as chief boatman.",,1921,P0243 Grand Canyon and San Juan River photograph colleciton,Navajo Mountain (Utah and Ariz.)--Photographs; Colorado River (Colo.-Mexico)--Photographs,Colorado River (Colo.-Mexico); San Juan County (Utah) 10 | 1498946,Spanish at Indian pueblo,"Photograph of an illustration in an unidentified publication, artist's rendition of a party of Spanish horsemen at an Indian pueblo, perhaps in New Mexico.",,1600; 1601; 1602; 1603; 1604; 1605; 1606; 1607; 1608; 1609; 1610; 1611; 1612; 1613; 1614; 1615; 1616; 1617; 1618; 1619; 1620; 1621; 1622; 1623; 1624; 1625; 1626; 1627; 1628; 1629; 1630; 1631; 1632; 1633; 1634; 1635; 1636; 1637; 1638; 1639; 1640; 1641; 1642; 1643; 1644; 1645; 1646; 1647; 1648; 1649; 1650; 1651; 1652; 1653; 1654; 1655; 1656; 1657; 1658; 1659; 1660; 1661; 1662; 1663; 1664; 1665; 1666; 1667; 1668; 1669; 1670; 1671; 1672; 1673; 1674; 1675; 1676; 1677; 1678; 1679; 1680; 1681; 1682; 1683; 1684; 1685; 1686; 1687; 1688; 1689; 1690; 1691; 1692; 1693; 1694; 1695; 1696; 1697; 1698; 1699; 1700; 1701; 1702; 1703; 1704; 1705; 1706; 1707; 1708; 1709; 1710; 1711; 1712; 1713; 1714; 1715; 1716; 1717; 1718; 1719; 1720; 1721; 1722; 1723; 1724; 1725; 1726; 1727; 1728; 1729; 1730; 1731; 1732; 1733; 1734; 1735; 1736; 1737; 1738; 1739; 1740; 1741; 1742; 1743; 1744; 1745; 1746; 1747; 1748; 1749; 1750; 1751; 1752; 1753; 1754; 1755; 1756; 1757; 1758; 1759; 1760; 1761; 1762; 1763; 1764; 1765; 1766; 1767; 1768; 1769; 1770; 1771; 1772; 1773; 1774; 1775; 1776; 1777; 1778; 1779; 1780; 1781; 1782; 1783; 1784; 1785; 1786; 1787; 1788; 1789; 1790; 1791; 1792; 1793; 1794; 1795; 1796; 1797; 1798; 1799; 1800,P0185 Drawings of Western History,"Pueblo Indians--History--Art; Southwest, New--Discovery and exploration--Art; Indigenous peoples--North America","Southwest, New; New Mexico" 11 | 995167,Native Americans herding sheep on horseback,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation,,1950; 1951; 1952; 1953; 1954; 1955; 1956; 1957; 1958; 1959; 
1960; 1961; 1962; 1963; 1964; 1965; 1966; 1967; 1968; 1969; 1970; 1971; 1972; 1973; 1974; 1975; 1976; 1977; 1978; 1979; 1980,P0561 Wallace Stegner Photograph Collection,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America, 12 | 947066,"Large woodcut located in Olympic Valley, California","Photo of a large wood sculpture at Palisades Tahoe (previously Squaw Valley) in Olympic Valley, California, depicting skiers. It was carved in 1995 ",,1995; 1996; 1997; 1998; 1999; 2000,P0413 Alan K. Engen Photograph Collection,Skiers--Art; Skis and skiing--Art; Wood sculpture—Photographs,"Palisades Tahoe, Placer County, California, United States; Olympic Valley, Placer County, California, United States" 13 | 958790,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960.","Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960",,1960,P0790 Shipler Studio Photograph Collection,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs",Utah; Tooele County (Utah); Bonneville Salt Flats (Utah) 14 | 998919,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 15 | 1302623,"Basalt-capped mesa on Dolores (Triassic), 6± miles south of Beddehoche (Indian Wells), Ariz., 1909 (photo G-67)","Photograph of Black Butte, a basalt-capped mesa south of Indian Wells. From Herbert E. Gregory Book 2: Navajo-Hopi, San Juan 1909","Gregory, Herbert E. (Herbert Ernest), 1869-1952",,P0013 Herbert E. 
Gregory Photograph Collection,Volcanic fields--Arizona--Apache County--Photographs; Volcanic fields--Navajo Indian Reservation--Photographs; Buttes--Arizona--Apache County--Photographs; Buttes--Navajo Indian Reservation--Photographs; Landforms--Arizona--Apache County--Photographs; Landforms--Navajo Indian Reservation--Photographs; Geology--Arizona--Apache County--Photographs; Navajo Indian Reservation--Photographs,"Black Butte (Navajo County, Ariz.); Five Buttes (Ariz.); Navajo County (Ariz.); Navajo Indian Reservation; Arizona" 16 | 995167,Native Americans herding sheep on horseback,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation,,1950; 1951; 1952; 1953; 1954; 1955; 1956; 1957; 1958; 1959; 1960; 1961; 1962; 1963; 1964; 1965; 1966; 1967; 1968; 1969; 1970; 1971; 1972; 1973; 1974; 1975; 1976; 1977; 1978; 1979; 1980,,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America,Navajo Indian Reservation 17 | 2364219,"Pioneer Day parade, 1880 (Carter, photo)","Black and white photograph of the Salt Lake Pioneer Day Parade, July 24, 1880.","Carter, C. W.",,"P0251 Salt Lake City, Utah",Mormon pioneers--Photographs; Pioneer Day (Mormon history)--Photographs; Parades--Utah--Salt Lake City--Photographs,Salt Lake City (Utah); Salt Lake County (Utah) 18 | 941713,Newly arrived evacuees standing behind their baggage.,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II,,1942; 1943 ,P0144 Japanese American Relocation Photograph Collection,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Clothing & dress--Tule Lake Relocation Center--1940-1950",Modoc County (Calif.); California 19 | 941496,Evacuees cleaning vegetables in the packing shed.,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II,,1941; 1942; 1943 ,P0144 Japanese Relocation Photograph Collection,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Agriculture--1940-1950",Modoc County (Calif.); California 20 | 941536,Evacuees harvesting potatoes at Tule Lake. 
[5],Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II,,1942-11-05,P0144 Japanese Relocation Photograph Collection,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; World War, 1939-1945--Concentration camps--California; Farming--California--Tule Lake--1940-1950; Agricultural laborers--California--Tule Lake--1940-1950", 21 | 958469,"Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle the ""Challenger"" on the Bonneville Salt Flats Raceway in 1960.","Photo of Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle, the ""Challenger,"" on the Bonneville Salt Flats Raceway in 1960",,,P0790 Shipler Studio Photograph Collection,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs", 22 | 958790,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960.","Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960",,,P0790 Shipler Studio Photograph Collection,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs", 23 | 998946,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history)--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 24 | 998908,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history)--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 25 | 998938,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Mormon history); Days of '47; Holidays; Local holidays; Theatrical sets; Costumes (character dress); 
Thrones; McKay, Calleen Alice Robinson, 1928-2005; Women; Pioneer Days Royalty; Centennial Queens; Beauty contestants",Salt Lake City (Utah); Salt Lake County (Utah) 26 | 998919,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history)--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 27 | 2292054,Albert Fritz letters about peaceful demonstrations,"Series of letters from Albert Fritz to the Salt Lake Police Department, Assistant Chief Ralph Knusden, regarding peaceful demonstrations. NAACP call for protest at the State Capitol building due to the failure of the Utah Legislature to adopt civil rights legislation regarding housing and public accommodations (2005). NAACP appealed to the Governor of the state of Utah to include civil rights legislation on the docket for the special session of the Utah State Legislature, which he had announced. Local NAACP leaders noted that Utah was the only northern state with no civil rights legislation (1968). Letter from Albert B. Fritz (Salt Lake NAACP) to Honorable George D. Clyde, Governor of Utah (1965?) regarding the lack of civil rights legislation in Utah. Article from the Wall Street Journal (1964), ""Civil Rights Irony: New U.S. Agency's First Case Likely to Come From Utah"" by Donald Moffitt. The article discusses racism in Salt Lake City and highlights Chuck Nabors, who moved to Salt Lake City to attend the University of Utah. Nabors rented an apartment sight unseen; when the landlord saw he was an African American, the landlord backed out of the lease. After Nabors found a landlord who would rent to him, neighbors petitioned to have him leave.",,1952-05-10; 2005-03-07; 1968-04-10; 2008-04-16; 1964-10-20,ACCN 1007 Charles James Nabors papers,"Civil rights movements--United States--History--20th century; Civil rights--Utah; Race discrimination--Religious aspects--Latter Day Saint churches; African Americans; Racism against Black people; Nabors, Charles James, 1934-1986; African American scientists", 28 | 1396789,Working papers toward a history of the Spanish-speaking people of Utah: a report of research of the Mexican-American Documentation Project,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speaking migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""",,,,Hispanic Americans--Utah; Latin Americans--Utah, 29 | 1396777,Human rights and the Native peoples of the Americas,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. 
Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs, dated April 5, 1979; and a bibliography of discrimination.",,,,"Indians--Legal status, laws, etc.; Indians of North America--Legal status, laws, etc.--Canada; Indians of North America--Legal status, laws, etc.--United States; Discrimination; Indians--Social conditions; Indigenous peoples--North America",America; North America; United States; Canada 30 | 1049391,"""Three Nephites"" stories [3]","A collection of handwritten and typed papers recounting stories about mysterious visitors, often identified as ""Nephites""",,1932; 1933; 1934; 1935; 1936; 1937; 1938; 1939; 1940; 1941; 1942; 1943; 1944; 1945; 1946,,Nephites; Folklore--Utah; Latter Day Saints--Utah--History--Anecdotes, 31 | 1006680,Letter of Ebenezer and George Brooks,"October 6, 1880 - Corinth, New York; Brooks, Ebenezer and George, to My Dear beloved niece, and Dear Cosen Angela. Ebenezer writes on his eighty-eighth birthday to tell of his conversion to Mormonism. George, his son, adds a note. Short genealogical list of Brookses",,,MS0120 Philip T. Blair Family Papers,Church of Jesus Christ of Latter-day Saints--History; Utah--History; Mormon converts, 32 | 1470940,"Interview with Joseph Ward Spendlove, Downwinders of Utah Archive, June 25, 2019","Transcript (8 pages) of an interview conducted by Justin Sorensen and Anthony Sams in Tooele, Utah on June 25, 2019. Spendlove discusses his experiences growing up in Delta, Utah. He discusses the various health issues experienced by his family members. He mentions playing outside during the testing and not being concerned about the possible effects. He also discusses aspects of Downwinders receiving compensation.",,,Everett L. Cooley Oral History Project,Nuclear weapons--Testing; Nuclear weapons--United States--Testing; Nuclear weapons testing victims; Radioactive fallout,Corinth (N.Y.) 33 | 818891,"Peter Loewenberg, Los Angeles, California: an interview by Newell Bringhurst","Transcript (28 pages) of an interview by Newell G. Bringhurst with Peter Loewenberg, an associate of Fawn Brodie, on December 12, 1988. This interview is no. 272 in the Everett L. Cooley Oral History Project, and tape no. U-941. Accompanied by Loewenberg's curriculum vitae","Loewenberg, Peter, 1933-",,ACCN 0814 Everett L. Cooley Oral History Project,"Loewenberg, Peter, 1933- --Interviews; Brodie, Fawn McKay, 1915-1981--Biography; Latter Day Saints--Biography; Mormon scholars--Biography; University of California, Los Angeles--Faculty--Biography","Los Angeles, Los Angeles County, California, United States, http://sws.geonames.org/5368361/" 34 | 1396789,Working papers toward a history of the Spanish-speaking people of Utah: a report of research of the Mexican-American Documentation Project,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. 
Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speaking migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""",,,American West Center Research Projects,Hispanic Americans--Utah; Latin Americans--Utah,"Utah, United States" 35 | 893658,"Interviews with African Americans in Utah, Alberta Henry, Interview 1","Transcript (137 pages) of an interview by Leslie Kelen with Alberta Henry on July 21, 1983. From Interviews with African Americans in Utah",,,"Ms0453, Interviews with Blacks in Utah, 1982-1988","African Americans--Utah--Interviews; Henry, Alberta H., 1920-2005--Interviews; African Americans--Civil rights--Utah; Utah--Race relations", 36 | 958364,"John Cobb's racing vehicle the ""Railton Mobil Special"" being worked on by a group of men on the Bonneville Salt Flats Raceway in 1938. [10]","Photo of the chassis of John Cobb's racing vehicle, the ""Railton Mobil Special,"" being worked on by a group of men on the Bonneville Salt Flats Raceway in 1938. Cobb is third from left, talking to the man in a suit and hat.",,,P0790 Shipler Studio Photograph Collection,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs",Utah; Tooele County (Utah); Bonneville Salt Flats (Utah) 37 | 946265,"Arnold Lunn, left, ski historian and author of the book, The Story of Ski-ing, 1952. And Hjalmar Hvam, right, ski pioneer and inventor of America's first safety binding in 1937.","Photo shows skiing pioneers Arnold Lunn (left) and Hjalmar Hvam, probably in the 1940s",,,P0413 Alan K. Engen Photograph Collection,"Lunn, Arnold, 1888-1974--Photographs; Hvam, Hjalmar, 1902-1996--Photographs; Skiers--Photographs", 38 | 947346,"Utah ski pioneer Mel Fletcher skiing on a pair of his homemade ""Barrel Staves,"" circa 1952.","Photo of Mel Fletcher, skiing pioneer and ski instructor, on homemade skis",,,P0413 Alan K. Engen Photograph Collection,"Fletcher, Mel, 1918-2010--Photographs; Skiers--Utah--Photographs","Deer Valley (Summit County, Utah); Park City (Utah); Summit County (Utah)" 39 | 1739616,"Crime, Veda Goff, murder victim","Black and white photograph of Vida Irene Goff, who was murdered at Magna, Utah, December 29, 1945",,1945-12-29,P0244 Olive Woolley Burt Photograph Collection,Murder victims--Photographs, 40 | 2509070,"""Captive Jewels--Our Crippled Children"" speech, retyped",,"Priest, Ivy Baker, 1905-1975",1953; 1954; 1955; 1956; 1957; 1958; 1959; 1960; 1961,MS0163 Ivy Baker Priest Papers,"Priest, Ivy Baker, 1905-1975; Women in politics--United States--Sources; Women--Utah--Biography; United States--Department of the Treasury; National Society for Crippled Children and Adults; People with disabilities", 41 | --------------------------------------------------------------------------------