├── LICENSE
├── XML Test Code
│   ├── lexicons.csv
│   ├── RMA-Tool-CSVOnly.py
│   ├── RMA-Tool.py
│   └── RMA-GUI.py
├── project notes
├── Code
│   ├── GUI-Documentation.md
│   ├── Test Versions
│   │   ├── RMA-Tool-1.0.py
│   │   ├── RMA-GUI-1.0.py
│   │   ├── RMA-GUI-2.0.py
│   │   └── RMA-GUI-2.5.py
│   ├── Past Versions
│   │   ├── MacOS-UsersGuide-2.5.md
│   │   ├── MaRMAT-CommandLine-2.5.py
│   │   └── MaRMAT-GUI-2.5.2.py
│   ├── lexicon-reparative-metadata.csv
│   ├── MarMAT-CommandLine-2.6.py
│   ├── example-output-lcsh-subject-lexicon.csv
│   ├── MaRMAT-GUI-2.5.3.py
│   ├── example-output-reparative-metadata-lexicon.csv
│   ├── lexicon-LCSH.csv
│   └── example-input-metadata.csv
└── README.md
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2024 J. Willard Marriott Library
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/XML Test Code/lexicons.csv:
--------------------------------------------------------------------------------
1 | Aggrandizement,RaceEuphemisms,RaceTerms,SlaveryTerms,GenderTerms,LGBTQ,MentalIllness,Disability
2 | acclaimed,color blind,aboriginal,abolition,matriarch,biologically female,brain damaged,able-bodied
3 | ambitious,colored,aboriginals,abolitionist,miss,biologically male,committed suicide,birth defect
4 | celebrated,coloured,aborigines,antislavery,mistress,dyke,crazy,confined to a wheelchair
5 | distinguished,negro,afro-american,anti-slavery,mrs.,fag,dumb,cripple
6 | eminent,race relations,aliens,bill of sale,muse,gay lifestyle,emotionally disturbed,crippled
7 | esteemed,race situation,arab,bills of sale,patriarch,gays,insane,deaf-mute
8 | expert,race-based,arabs,enslaved,spouse,homosexual,retarded,deformed
9 | famous,racial,asians,freed slave,wife,homosexuality,slow learner,disabled person
10 | father of,racism,asiatic,freed slaves,,lesbianism,special needs,dwarf
11 | foremost,riot,blacks,freedman,,sexual minorities,stupid,epileptic
12 | founding father,troubles,bushman,freedmen,,sexual minority,,handicap
13 | genius,unruly,bushmen,fugitive slave,,sexual preference,,handicapped
14 | gentleman,,bushwoman,hired out,,tranny,,invalid
15 | important,,chink,hiring out,,transvestite,,lame
16 | influential,,civilized,manumission,,,,midget
17 | man of letters,,coolie,manumitted,,,,paraplegic
18 | masterpiece,,coolies,negro,,,,physically challenged
19 | notable,,creole,overseer,,,,the deaf
20 | patriot,,creoles,plantation,,,,the disabled
21 | pioneer,,ethnic,planter,,,,wheelchair-bound
22 | plantation owner,,exotic,runaway slave,,,,
23 | planter,,gook,runaway slaves,,,,
24 | preeminent,,gypsies,slave,,,,
25 | prestigious,,gypsy,slave holder,,,,
26 | prolific,,hispanics,slave master,,,,
27 | prominent,,illegal alien,slave owner,,,,
28 | renowned,,illegal aliens,slave revolt,,,,
29 | respected,,illegal immigrant,slaveholder,,,,
30 | revolutionary,,illegal immigrants,slavery,,,,
31 | seminal,,illegals,slaves,,,,
32 | successful,,indian,,,,,
33 | wealthy,,indians,,,,,
34 | ,,japs,,,,,
35 | ,,mammy,,,,,
36 | ,,mulatto,,,,,
37 | ,,mulattoes,,,,,
38 | ,,mulattos,,,,,
39 | ,,native americans,,,,,
40 | ,,natives,,,,,
41 | ,,negro,,,,,
42 | ,,negroes,,,,,
43 | ,,negros,,,,,
44 | ,,oriental,,,,,
45 | ,,primitive people,,,,,
46 | ,,primitives,,,,,
47 | ,,pygmies,,,,,
48 | ,,pygmy,,,,,
49 | ,,sambo,,,,,
50 | ,,savages,,,,,
51 | ,,segregated,,,,,
52 | ,,squaw,,,,,
53 | ,,squaws,,,,,
54 | ,,uncivilized,,,,,
55 | 
--------------------------------------------------------------------------------
/project notes:
--------------------------------------------------------------------------------
1 | Here are our project notes from Friday, April 19, 2024
2 | 
3 | 
4 | Project notes from working meeting:
5 | 
6 | We continued to look at issues with harvesting OAI and converting XML to CSV
7 | We decided to focus only on downloaded CSV and TSV files
8 | 
9 | Kaylee explored writing something that would help Rachel convert TSV files, or is this done in Open Refine? Does this need to be revisited?
10 | We had discussions about supplementing the lexicon.
11 | 
12 | Kaylee wrote code and Rachel tested it on her PC
13 | One issue raised was the idea of searching for phrases, for example "biological male"
14 | We also thought of supplementing the lexicon with lists of LCSH terms that were outdated, in addition to terms that were problematic.
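One possible way to handle whole phrases (a sketch only, not yet in the tool; the term and text below are placeholder values) is a regex search with word boundaries, which matches multi-word terms and avoids partial-word hits:

    import re

    term = "biological male"  # placeholder lexicon phrase
    text = "Portrait of a biological male athlete"  # placeholder metadata text
    # \b word boundaries keep substrings from matching, e.g. "riot" inside "Marriott"
    if re.search(r"\b" + re.escape(term.lower()) + r"\b", text.lower()):
        print("match")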
15 | We tried to find ways of getting automated lists of changed LCSH subject headings and did not come up with a good solution.
16 | Rachel and Anna are copying and pasting lists of changes into Excel and adjusting them in order to come up with these terms from classweb:
17 | https://classweb.org/approved-subjects/ (Rachel to do from 2011-2016, Anna to work on 2017-2024)
18 | 
19 | At the close of day, there is a working script, with some issues - tokenizing on parts of words causes the "riot" in "Marriott" to appear.
20 | 
21 | Need to change the find-matches section so the script looks for whole words only in order to resolve this
22 | 
23 | Kaylee cleared out the riots as a closing activity today! WOO!
24 | 
25 | 
26 | May 1, 2024
27 | 
28 | We reviewed current progress. Rachel tested the GUI, it did not work
29 | Anna prepared Aileen H. Clyde sample metadata, it is in the project Box folder
30 | Kaylee notes that we can add more information describing the project in advance of our next meeting
31 | 
32 | June 6
33 | 
34 | We are going to do CSV only
35 | We are still considering the progress bar
36 | Anna will review documentation and see if there is anything additional to add about the requirement for CSV files
37 | 
38 | Rachel will finalize lexicons
39 | 
40 | Kaylee will add an additional output column for original context
41 | 
42 | We discussed the issues with matching against the problem LCSH lexicon
43 | Kaylee thinks that the processing step where the metadata gets stripped of punctuation is removing the subdivisions between LCSH headings and subheadings
44 | This might be making the matching process against the LCSH lexicon not function well.
45 | Anna asked whether adding intermediate steps that print out some of the behind-the-scenes processing would be useful
46 | 
47 | Maybe we need a separate tool that could process digital library LCSH against an LCSH lexicon
48 | 
49 | Also maybe take only headings, not subheadings, from the LCSH changes in order to pick up broad matches
50 | 
--------------------------------------------------------------------------------
/Code/GUI-Documentation.md:
--------------------------------------------------------------------------------
1 | # Marriott Reparative Metadata Assessment Tool (MaRMAT) GUI
2 | 
3 | The MaRMAT GUI is a graphical application built using Tkinter in Python. This tool allows users to match terms from a problematic-terms lexicon file against text data from a collections metadata file, facilitating metadata cleanup and analysis.
4 | 
5 | ## Overview
6 | 
7 | The application provides the following functionalities:
8 | 
9 | - Load CSV files for both lexicon and metadata.
10 | - Select specific columns from the metadata for analysis.
11 | - Choose an identifier column in the metadata to relate back to the original dataset.
12 | - Select categories of terms from the lexicon for searching.
13 | - Perform matching to find terms in selected metadata columns and export results to a CSV file.
14 | 
15 | ## Features
16 | 
17 | - **User Interface**: Utilizes Tkinter for a GUI interface.
18 | - **File Loading**: Supports loading CSV files for lexicon and metadata.
19 | - **Column Selection**: Allows users to choose specific columns from metadata for term analysis.
20 | - **Identifier Selection**: Enables selection of an identifier column for linking matched terms back to the original metadata.
21 | - **Category Selection**: Provides options to select categories of terms from the lexicon for matching.
22 | - **Matching Process**: Performs regex-based term matching across selected metadata columns and chosen lexicon categories, as sketched below.
23 | - **Output**: Exports matched data to a CSV file for further analysis or use.
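
Under the hood, the matching step works roughly like the sketch below (a simplified, self-contained version of the command-line scripts in this repository; the file names and the "Title"/"Description" columns are placeholders):

```python
import re
import pandas as pd

lexicon = pd.read_csv("lexicon.csv")    # expects "term" and "category" columns
metadata = pd.read_csv("metadata.csv")  # expects an "Identifier" column

matches = []
for _, row in metadata.iterrows():
    for col in ["Title", "Description"]:  # columns you would select in the GUI
        if isinstance(row[col], str):
            for term, category in zip(lexicon["term"], lexicon["category"]):
                # \b word boundaries ensure whole-term matches only
                if re.search(r"\b" + re.escape(term.lower()) + r"\b", row[col].lower()):
                    matches.append((row["Identifier"], term, category, col))

pd.DataFrame(matches, columns=["Identifier", "Term", "Category", "Column"]).to_csv("matches.csv", index=False)
```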
24 | 
25 | ## Getting Started
26 | 
27 | To use the MaRMAT GUI, follow these steps:
28 | 
29 | 1. Download the [RMA-GUI-2.52.py](https://github.com/kayleealexander/RMA-Tool/blob/main/Code/RMA-GUI-2.52.py) file.
30 | 2. Download one of our sample lexicons from the [Code](https://github.com/kayleealexander/RMA-Tool/tree/main/Code) folder, or create your own.
31 | 3. Download the metadata you want to assess as a CSV file.
32 | 4. Run the [RMA-GUI-2.52.py](https://github.com/kayleealexander/RMA-Tool/blob/main/Code/RMA-GUI-2.52.py) file and follow the prompts.
33 | 
34 | ## Using the Tool
35 | 
36 | **1. Load Lexicon and Metadata**:
37 |    - Follow on-screen instructions to load your lexicon and metadata CSV files using the provided buttons.
38 | 
39 | **2. Perform Analysis**:
40 |    - Select columns from your metadata for analysis.
41 |    - Choose an identifier column for matching results back to the original dataset.
42 |    - Select categories of terms from the lexicon for analysis.
43 |    - Click "Perform Matching" to find matches and export the results as a CSV file.
44 | 
45 | ## Additional Notes
46 | 
47 | **Dependencies**: Ensure you have Python 3.x and the `pandas` library installed as per the installation instructions.
48 | 
49 | ## Contact
50 | 
51 | For any questions or support, please contact [Kaylee Alexander](mailto:kaylee.alexander@utah.edu).
52 | 
--------------------------------------------------------------------------------
/Code/Test Versions/RMA-Tool-1.0.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import string
3 | import re
4 | 
5 | def load_lexicon(file_path):
6 |     try:
7 |         # Load the lexicon CSV file into a DataFrame
8 |         lexicon_df = pd.read_csv(file_path, encoding='latin1')
9 |         return lexicon_df
10 |     except FileNotFoundError:
11 |         print("File not found. Please provide a valid file path.")
12 |         return None
13 |     except Exception as e:
14 |         print("An error occurred:", e)
15 |         return None
16 | 
17 | def load_metadata(file_path):
18 |     try:
19 |         # Load the metadata CSV file into a DataFrame
20 |         metadata_df = pd.read_csv(file_path, encoding='latin1')
21 | 
22 |         # Remove punctuation from specified columns
23 |         punctuation_table = str.maketrans('', '', string.punctuation)
24 |         metadata_df['Title'] = metadata_df['Title'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x)
25 |         metadata_df['Description'] = metadata_df['Description'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x)
26 |         metadata_df['Collection Name'] = metadata_df['Collection Name'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x)
27 | 
28 |         return metadata_df
29 |     except FileNotFoundError:
30 |         print("File not found. 
Please provide a valid file path.") 31 | return None 32 | except Exception as e: 33 | print("An error occurred:", e) 34 | return None 35 | 36 | def find_matches(lexicon_df, metadata_df): 37 | matches = [] 38 | # Iterate over each row in the metadata DataFrame 39 | for index, row in metadata_df.iterrows(): 40 | # Process the text in each specified column 41 | for col in ['Title', 'Description', 'Subject', 'Collection Name']: 42 | # Check if the value in the column is a string 43 | if isinstance(row[col], str): 44 | # Iterate over each term in the lexicon and check for matches 45 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 46 | # Check if the whole term exists in the text column 47 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 48 | matches.append((row['Identifier'], term, category, col)) 49 | return matches 50 | 51 | # Example usage 52 | lexicon_file_path = "lexicon.csv" # Replace with the path to your lexicon CSV file 53 | metadata_file_path = "metadata.csv" # Replace with the path to your metadata CSV file 54 | output_file_path = "matches.csv" # Path to the output matches CSV 55 | 56 | lexicon = load_lexicon(lexicon_file_path) 57 | metadata = load_metadata(metadata_file_path) 58 | 59 | # Perform matching 60 | if lexicon is not None and metadata is not None: 61 | matches = find_matches(lexicon, metadata) 62 | # Create DataFrame from matches 63 | matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column']) 64 | # Merge matches with original metadata using left join on "Identifier" 65 | merged_df = pd.merge(metadata, matches_df, on="Identifier", how="left") 66 | # Filter out rows without matches 67 | merged_df = merged_df.dropna(subset=['Term']) 68 | # Save merged DataFrame to CSV 69 | merged_df.to_csv(output_file_path, index=False) 70 | 71 | print("Merged data saved to:", output_file_path) 72 | -------------------------------------------------------------------------------- /Code/Test Versions/RMA-GUI-1.0.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog 3 | from tkinter import messagebox 4 | import pandas as pd 5 | import string 6 | import re 7 | 8 | def load_lexicon(file_path): 9 | try: 10 | lexicon_df = pd.read_csv(file_path, encoding='latin1') 11 | return lexicon_df 12 | except FileNotFoundError: 13 | messagebox.showerror("Error", "File not found. Please provide a valid file path.") 14 | return None 15 | except Exception as e: 16 | messagebox.showerror("Error", f"An error occurred: {e}") 17 | return None 18 | 19 | def load_metadata(file_path): 20 | try: 21 | metadata_df = pd.read_csv(file_path, encoding='latin1') 22 | punctuation_table = str.maketrans('', '', string.punctuation) 23 | metadata_df['Title'] = metadata_df['Title'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x) 24 | metadata_df['Description'] = metadata_df['Description'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x) 25 | metadata_df['Collection Name'] = metadata_df['Collection Name'].apply(lambda x: x.translate(punctuation_table) if isinstance(x, str) else x) 26 | return metadata_df 27 | except FileNotFoundError: 28 | messagebox.showerror("Error", "File not found. 
Please provide a valid file path.") 29 | return None 30 | except Exception as e: 31 | messagebox.showerror("Error", f"An error occurred: {e}") 32 | return None 33 | 34 | def find_matches(lexicon_df, metadata_df): 35 | matches = [] 36 | for index, row in metadata_df.iterrows(): 37 | for col in ['Title', 'Description', 'Subject', 'Collection Name']: 38 | if isinstance(row[col], str): 39 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 40 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 41 | matches.append((row['Identifier'], term, category, col)) 42 | return matches 43 | 44 | def execute_matching(): 45 | lexicon_file_path = lexicon_entry.get() 46 | metadata_file_path = metadata_entry.get() 47 | output_file_path = output_entry.get() 48 | 49 | lexicon = load_lexicon(lexicon_file_path) 50 | metadata = load_metadata(metadata_file_path) 51 | 52 | if lexicon is not None and metadata is not None: 53 | matches = find_matches(lexicon, metadata) 54 | matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column']) 55 | merged_df = pd.merge(metadata, matches_df, on="Identifier", how="left") 56 | merged_df = merged_df.dropna(subset=['Term']) 57 | merged_df.to_csv(output_file_path, index=False) 58 | messagebox.showinfo("Success", "Matching process completed. Output saved successfully.") 59 | 60 | # GUI 61 | root = tk.Tk() 62 | root.title("Lexicon Matcher") 63 | 64 | # Lexicon file path entry 65 | lexicon_label = tk.Label(root, text="Lexicon File Path:") 66 | lexicon_label.grid(row=0, column=0, padx=5, pady=5) 67 | lexicon_entry = tk.Entry(root, width=50) 68 | lexicon_entry.grid(row=0, column=1, padx=5, pady=5) 69 | lexicon_button = tk.Button(root, text="Browse", command=lambda: lexicon_entry.insert(tk.END, filedialog.askopenfilename())) 70 | lexicon_button.grid(row=0, column=2, padx=5, pady=5) 71 | 72 | # Metadata file path entry 73 | metadata_label = tk.Label(root, text="Metadata File Path:") 74 | metadata_label.grid(row=1, column=0, padx=5, pady=5) 75 | metadata_entry = tk.Entry(root, width=50) 76 | metadata_entry.grid(row=1, column=1, padx=5, pady=5) 77 | metadata_button = tk.Button(root, text="Browse", command=lambda: metadata_entry.insert(tk.END, filedialog.askopenfilename())) 78 | metadata_button.grid(row=1, column=2, padx=5, pady=5) 79 | 80 | # Output file path entry 81 | output_label = tk.Label(root, text="Output File Path:") 82 | output_label.grid(row=2, column=0, padx=5, pady=5) 83 | output_entry = tk.Entry(root, width=50) 84 | output_entry.grid(row=2, column=1, padx=5, pady=5) 85 | output_button = tk.Button(root, text="Browse", command=lambda: output_entry.insert(tk.END, filedialog.asksaveasfilename(defaultextension=".csv"))) 86 | output_button.grid(row=2, column=2, padx=5, pady=5) 87 | 88 | # Execute button 89 | execute_button = tk.Button(root, text="Execute Matching", command=execute_matching) 90 | execute_button.grid(row=3, column=1, padx=5, pady=5) 91 | 92 | root.mainloop() 93 | -------------------------------------------------------------------------------- /XML Test Code/RMA-Tool-CSVOnly.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import nltk 3 | import string 4 | from nltk.tokenize import word_tokenize 5 | from nltk.corpus import stopwords 6 | 7 | # Download NLTK resources if not already downloaded 8 | nltk.download('punkt') 9 | nltk.download('stopwords') 10 | 11 | def load_lexicon_from_csv(file_path): 12 | """ 13 | Loads lexicon categories and terms from 
a CSV file into a dictionary.
14 | 
15 |     Parameters:
16 |     - file_path (str): File path of the CSV input file.
17 | 
18 |     Returns:
19 |     - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values.
20 | 
21 |     Note:
22 |     - The CSV file should have lexicon categories as column headers and terms listed under each category.
23 |     """
24 | 
25 |     lexicon = {
26 |         "Aggrandizement": [],
27 |         "RaceEuphemisms": [],
28 |         "RaceTerms": [],
29 |         "SlaveryTerms": [],
30 |         "GenderTerms": [],
31 |         "LGBTQ": [],
32 |         "MentalIllness": [],
33 |         "Disability": []
34 |     }
35 | 
36 |     with open(file_path, 'r', encoding='utf-8-sig') as csv_file:
37 |         csv_reader = csv.reader(csv_file)
38 |         next(csv_reader)  # Skip the header row
39 | 
40 |         for row in csv_reader:
41 |             for i, category in enumerate(lexicon):  # Each CSV column holds the terms for one category
42 |                 if i < len(row) and row[i]: lexicon[category].append(row[i])  # Append only non-empty cells (avoids cross-category bleed from extending every list with the whole row)
43 | 
44 |     return lexicon
45 | 
46 | def search_and_append_lexicon_category(lexicon, input_csv_file, output_csv_file):
47 |     """
48 |     Searches for lexicon term matches in an input CSV file, appends lexicon categories to each row, and writes the modified data into an output CSV file.
49 | 
50 |     Parameters:
51 |     - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values.
52 |     - input_csv_file (str): File path of the input CSV file.
53 |     - output_csv_file (str): File path of the output CSV file.
54 | 
55 |     Note:
56 |     - The input CSV file should contain columns specified for lexicon analysis.
57 |     - The output CSV file will have additional columns for each lexicon category, indicating the matched terms.
58 |     """
59 | 
60 |     # Load lexicon
61 |     lexicon = load_lexicon_from_csv(lexicon)
62 | 
63 |     # Open input CSV file for reading and output CSV file for writing
64 |     with open(input_csv_file, 'r', newline='', encoding='utf-8') as input_csv, \
65 |          open(output_csv_file, 'w', newline='', encoding='utf-8') as output_csv:
66 | 
67 |         reader = csv.DictReader(input_csv)
68 |         fieldnames = reader.fieldnames + list(lexicon.keys())  # Add lexicon category names as additional columns
69 |         writer = csv.DictWriter(output_csv, fieldnames=fieldnames)
70 |         writer.writeheader()
71 | 
72 |         # Iterate over rows in the input CSV file
73 |         for row in reader:
74 |             # Initialize dictionary to store token matches for each lexicon category
75 |             token_matches = {category: [] for category in lexicon}
76 | 
77 |             # Tokenize and preprocess text from specified columns
78 |             for column in ["Title", "Subject", "Description", "Collection Name"]:
79 |                 text = row[column]
80 |                 if text:
81 |                     tokens = word_tokenize(text.lower())
82 |                     filtered_tokens = [word for word in tokens if word not in stopwords.words('english') and word not in string.punctuation and not word.isdigit() and word != '--']
83 |                     # Search for matches between tokens and terms in the lexicon (single tokens only, so multi-word lexicon terms will not match here)
84 |                     for category, terms in lexicon.items():
85 |                         matches = [term for term in filtered_tokens if term in terms]
86 |                         token_matches[category].extend(matches)
87 | 
88 |             # Update the row with token matches for each lexicon category
89 |             row.update(token_matches)
90 |             # Write the modified row to the output CSV file
91 |             writer.writerow(row)
92 | 
93 | # File paths
94 | lexicon_file_path = "PATH_TO_LEXICON_CSV_FILE"  # Insert path to your lexicon CSV file
95 | input_csv_file_path = "PATH_TO_INPUT_CSV_FILE"  # Insert path to your input CSV file
96 | output_csv_file_path = "PATH_TO_OUTPUT_CSV_FILE"  # Insert path to desired output CSV file
97 | 
98 | # Search for matches, append lexicon 
categories, and write to output CSV 99 | search_and_append_lexicon_category(lexicon_file_path, input_csv_file_path, output_csv_file_path) 100 | 101 | print("Lexicon matching and appending completed.") 102 | -------------------------------------------------------------------------------- /Code/Past Versions/MacOS-UsersGuide-2.5.md: -------------------------------------------------------------------------------- 1 | # Comprehensive Guide for Running MaRMAT in Terminal on a Mac 2 | 3 | ## 1. Prerequisites 4 | 5 | ### 1.1 **Python Installation**: 6 | - Ensure Python is installed by running the following in Terminal: 7 | 8 | ```bash 9 | python3 --version 10 | ``` 11 | 12 | - If Python is not installed, download it from the [official Python website](https://www.python.org/downloads/). 13 | 14 | ### 1.2 **Library Requirements**: 15 | - **Pandas**: Install the `pandas` library: 16 | 17 | ```bash 18 | pip3 install pandas 19 | ``` 20 | 21 | - **Regular Expression (`re`) Module**: This module is part of Python’s standard library. Confirm its availability: 22 | 23 | ```bash 24 | python3 -c "import re; print('re module is available')" 25 | ``` 26 | 27 | ## 2. Step-by-Step Instructions 28 | 29 | ### 2.1. **Save the Script** 30 | - Ensure the [MaRMAT-CommandLine-2.5.py](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-CommandLine-2.5.py) script is saved on your Mac. 31 | - Add the lexicon (e.g., [Reparative Metadata](https://github.com/marriott-library/MaRMAT/blob/main/Code/reparative-metadata-lexicon.csv), [LCSH](https://github.com/marriott-library/MaRMAT/blob/main/Code/LCSH-lexicon.csv)) you'd like to use as well as the metadata file you want to analyze to the same folder. 32 | 33 | ### 2.2. **Opening the Script for Editing with TextEdit** 34 | 35 | - **Locate the Script**: 36 | - Open Finder and navigate to the directory where the script is saved (e.g., `Documents`, `Downloads`). 37 | 38 | - **Open with TextEdit**: 39 | - Right-click on the script file (`MaRMAT-CommandLine-2.5.py`) and select **Open With > TextEdit**. 40 | - If you don’t see TextEdit, choose **Other...** and select TextEdit (or another text editor) from the list. 41 | 42 | - **Edit the Script**: 43 | - In TextEdit, find and modify the sections below according to your specific file paths and requirements. Note: These sections are all at the very end of the script under "# Example usage" (you can use `command` + `F` to search for "Example usage" to quickly find this). 44 | 45 | - **Load Lexicon**: 46 | 47 | ```python 48 | tool.load_lexicon("/path/to/your/lexicon.csv") # Replace with the path to your lexicon CSV file. 49 | ``` 50 | 51 | - **Load Metadata**: 52 | 53 | ```python 54 | tool.load_metadata("/path/to/your/metadata.csv") # Replace with the path to your metadata CSV file. 55 | ``` 56 | 57 | - **Select Columns for Matching**: 58 | 59 | ```python 60 | tool.select_columns(["Column1", "Column2"]) # Replace with the metadata column names you want to analyze. 61 | ``` 62 | 63 | - **Select Identifier Column**: 64 | 65 | ```python 66 | tool.select_identifier_column("Identifier") # Replace with the name of your identifier column (e.g., a record ID number). 67 | ``` 68 | 69 | - **Select Categories for Matching**: 70 | 71 | ```python 72 | tool.select_categories(["RaceTerms"]) # Replace with the categories from the lexicon that you want to search for. 
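     # Multiple categories can be searched at once; for example (illustrative
     # names -- they must match values in your lexicon's "category" column):
     # tool.select_categories(["RaceTerms", "GenderTerms"])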
73 |      ```
74 | 
75 |    - **Perform Matching and View Results**:
76 | 
77 |      ```python
78 |      tool.perform_matching("/path/to/your/output.csv") # Replace with the path to your output file.
79 |      ```
80 | 
81 |    - Save your changes by clicking **File > Save** or pressing `Command + S`.
82 | 
83 | - **Ensure Proper TextEdit Settings**:
84 |    - If TextEdit opens the file in **Rich Text Format (RTF)**, change it to **Plain Text** by selecting **Format > Make Plain Text** from the menu. This ensures the script runs correctly.
85 | 
86 | ### 2.3. **Running the Script**
87 | 
88 | - **Open Terminal**:
89 | 
90 | - **Navigate to the Script’s Directory**:
91 |    - Use the `cd` command to go to the directory where your script is located, for example:
92 | 
93 |    ```bash
94 |    cd ~/Documents
95 |    ```
96 | 
97 | - **Execute the Script**:
98 |    - Run the script using the following command:
99 | 
100 |    ```bash
101 |    python3 MaRMAT-CommandLine-2.5.py
102 |    ```
103 | 
104 | ## 3. Additional Considerations
105 | 
106 | - **Full Paths**: Always use the full path for files if they are not in the same directory as the script.
107 | - **Output Files**: Ensure the output directory specified in the script has the appropriate permissions to save the results.
108 | 
--------------------------------------------------------------------------------
/Code/lexicon-reparative-metadata.csv:
--------------------------------------------------------------------------------
1 | term,category
2 | acclaimed,Aggrandizement
3 | ambitious,Aggrandizement
4 | celebrated,Aggrandizement
5 | distinguished,Aggrandizement
6 | eminent,Aggrandizement
7 | esteemed,Aggrandizement
8 | expert,Aggrandizement
9 | famous,Aggrandizement
10 | father of,Aggrandizement
11 | foremost,Aggrandizement
12 | founding father,Aggrandizement
13 | genius,Aggrandizement
14 | gentleman,Aggrandizement
15 | important,Aggrandizement
16 | influential,Aggrandizement
17 | man of letters,Aggrandizement
18 | masterpiece,Aggrandizement
19 | notable,Aggrandizement
20 | patriot,Aggrandizement
21 | pioneer,Aggrandizement
22 | plantation owner,Aggrandizement
23 | preeminent,Aggrandizement
24 | prestigious,Aggrandizement
25 | prolific,Aggrandizement
26 | prominent,Aggrandizement
27 | renowned,Aggrandizement
28 | respected,Aggrandizement
29 | revolutionary,Aggrandizement
30 | seminal,Aggrandizement
31 | successful,Aggrandizement
32 | wealthy,Aggrandizement
33 | able-bodied,Disability
34 | birth defect,Disability
35 | confined to a wheelchair,Disability
36 | cripples,Disability
37 | cripple,Disability
38 | crippled,Disability
39 | deaf-mute,Disability
40 | deformed,Disability
41 | disabled person,Disability
42 | dwarf,Disability
43 | dwarfs,Disability
44 | epileptic,Disability
45 | handicap,Disability
46 | handicapped,Disability
47 | invalid,Disability
48 | invalids,Disability
49 | lame,Disability
50 | midget,Disability
51 | paraplegic,Disability
52 | physically challenged,Disability
53 | the deaf,Disability
54 | the disabled,Disability
55 | wheelchair-bound,Disability
56 | matriarch,Gender
57 | miss,Gender
58 | mistress,Gender
59 | mrs.,Gender
60 | muse,Gender
61 | patriarch,Gender
62 | spouse,Gender
63 | wife,Gender
64 | wives,Gender
65 | biologically female,LGBTQ
66 | biologically male,LGBTQ
67 | dyke,LGBTQ
68 | fag,LGBTQ
69 | gay lifestyle,LGBTQ
70 | gays,LGBTQ
71 | homosexual,LGBTQ
72 | homosexuality,LGBTQ
73 | lesbianism,LGBTQ
74 | sexual minorities,LGBTQ
75 | sexual minority,LGBTQ
76 | sexual preference,LGBTQ
77 | tranny,LGBTQ
78 | transvestite,LGBTQ
79 | brain damaged,MentalIllness
committed suicide,MentalIllness 81 | crazy,MentalIllness 82 | dumb,MentalIllness 83 | emotionally disturbed,MentalIllness 84 | insane,MentalIllness 85 | retarded,MentalIllness 86 | slow learner,MentalIllness 87 | special needs,MentalIllness 88 | stupid,MentalIllness 89 | color blind,RaceEuphemisms 90 | colored,RaceEuphemisms 91 | coloured,RaceEuphemisms 92 | race relations,RaceEuphemisms 93 | race situation,RaceEuphemisms 94 | race-based,RaceEuphemisms 95 | racial,RaceEuphemisms 96 | racism,RaceEuphemisms 97 | riot,RaceEuphemisms 98 | troubles,RaceEuphemisms 99 | unruly,RaceEuphemisms 100 | aboriginal,Race 101 | aboriginals,Race 102 | aborigines,Race 103 | afro-american,Race 104 | aliens,Race 105 | arab,Race 106 | arabs,Race 107 | asians,Race 108 | asiatic,Race 109 | blacks,Race 110 | bushman,Race 111 | bushmen,Race 112 | bushwoman,Race 113 | chink,Race 114 | civilized,Race 115 | coolie,Race 116 | coolies,Race 117 | creole,Race 118 | creoles,Race 119 | ethnic,Race 120 | exotic,Race 121 | gook,Race 122 | gypsies,Race 123 | gypsy,Race 124 | hispanics,Race 125 | illegal alien,Race 126 | illegal aliens,Race 127 | illegal immigrant,Race 128 | illegal immigrants,Race 129 | illegals,Race 130 | indian,Race 131 | indians,Race 132 | japs,Race 133 | mammy,Race 134 | mulatto,Race 135 | mulattoes,Race 136 | mulattos,Race 137 | Native Americans,Race 138 | natives,Race 139 | negroes,Race 140 | negros,Race 141 | oriental,Race 142 | primitive people,Race 143 | primitives,Race 144 | pygmies,Race 145 | pygmy,Race 146 | sambo,Race 147 | savages,Race 148 | segregated,Race 149 | squaw,Race 150 | squaws,Race 151 | uncivilized,Race 152 | lamenite,Race 153 | abolition,Slavery 154 | abolitionist,Slavery 155 | antislavery,Slavery 156 | anti-slavery,Slavery 157 | bill of sale,Slavery 158 | bills of sale,Slavery 159 | enslaved,Slavery 160 | freed slave,Slavery 161 | freed slaves,Slavery 162 | freedman,Slavery 163 | freedmen,Slavery 164 | fugitive slave,Slavery 165 | hired out,Slavery 166 | hiring out,Slavery 167 | manumission,Slavery 168 | manumitted,Slavery 169 | overseer,Slavery 170 | plantation,Slavery 171 | runaway slave,Slavery 172 | runaway slaves,Slavery 173 | slave,Slavery 174 | slave holder,Slavery 175 | slave master,Slavery 176 | slave owner,Slavery 177 | slave revolt,Slavery 178 | slaveholder,Slavery 179 | slavery,Slavery 180 | slaves,Slavery 181 | evacuate,USJapaneseIncarceration 182 | evacuation,USJapaneseIncarceration 183 | evacuees,USJapaneseIncarceration 184 | evacuee,USJapaneseIncarceration 185 | relocation,USJapaneseIncarceration 186 | internment,USJapaneseIncarceration 187 | assembly center,USJapaneseIncarceration 188 | relocation center,USJapaneseIncarceration 189 | non-aliens,USJapaneseIncarceration 190 | native American aliens,USJapaneseIncarceration 191 | civilian exclusion orders,USJapaneseIncarceration 192 | relocate,USJapaneseIncarceration 193 | relocation,USJapaneseIncarceration 194 | -------------------------------------------------------------------------------- /Code/Past Versions/MaRMAT-CommandLine-2.5.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import re 3 | 4 | class MaRMAT: 5 | """A tool for assessing metadata and identifying matches based on a provided lexicon.""" 6 | 7 | def __init__(self): 8 | """Initialize the assessment tool.""" 9 | self.lexicon_df = None 10 | self.metadata_df = None 11 | self.columns = [] # List of all available columns in the metadata 12 | self.categories = [] # List of all available 
categories in the lexicon
13 |         self.selected_columns = []  # List of columns selected for matching
14 |         self.identifier_column = None  # Identifier column used to uniquely identify rows
15 | 
16 |     def load_lexicon(self, file_path):
17 |         """Load the lexicon file.
18 | 
19 |         Parameters:
20 |             file_path (str): Path to the lexicon CSV file.
21 | 
22 |         """
23 |         try:
24 |             self.lexicon_df = pd.read_csv(file_path, encoding='latin1')
25 |             print("Lexicon loaded successfully.")
26 |         except Exception as e:
27 |             print(f"An error occurred while loading lexicon: {e}")
28 | 
29 |     def load_metadata(self, file_path):
30 |         """Load the metadata file.
31 | 
32 |         Parameters:
33 |             file_path (str): Path to the metadata CSV file.
34 | 
35 |         """
36 |         try:
37 |             self.metadata_df = pd.read_csv(file_path, encoding='latin1')
38 |             print("Metadata loaded successfully.")
39 |         except Exception as e:
40 |             print(f"An error occurred while loading metadata: {e}")
41 | 
42 |     def select_columns(self, columns):
43 |         """Select columns from the metadata for matching.
44 | 
45 |         Parameters:
46 |             columns (list of str): List of column names in the metadata.
47 | 
48 |         """
49 |         self.selected_columns = columns
50 | 
51 |     def select_identifier_column(self, column):
52 |         """Select the identifier column used for uniquely identifying rows.
53 | 
54 |         Parameters:
55 |             column (str): Name of the identifier column in the metadata.
56 | 
57 |         """
58 |         self.identifier_column = column
59 | 
60 |     def select_categories(self, categories):
61 |         """Select categories from the lexicon for matching.
62 | 
63 |         Parameters:
64 |             categories (list of str): List of category names in the lexicon.
65 | 
66 |         """
67 |         self.categories = categories
68 | 
69 |     def perform_matching(self, output_file):
70 |         """Perform matching between selected columns and categories and save results to a CSV file.
71 | 
72 |         Parameters:
73 |             output_file (str): Path to the output CSV file to save matching results.
74 | 
75 |         """
76 |         if self.lexicon_df is None or self.metadata_df is None:
77 |             print("Please load lexicon and metadata files first.")
78 |             return
79 | 
80 |         matches = self.find_matches(self.selected_columns, self.categories)
81 |         matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column'])
82 |         print(matches_df)
83 | 
84 |         # Write results to CSV
85 |         try:
86 |             matches_df.to_csv(output_file, index=False)
87 |             print(f"Results saved to {output_file}")
88 |         except Exception as e:
89 |             print(f"An error occurred while saving results: {e}")
90 | 
91 |     def find_matches(self, selected_columns, selected_categories):
92 |         """Find matches between metadata and lexicon based on selected columns and categories.
93 | 
94 |         Parameters:
95 |             selected_columns (list of str): List of column names from metadata for matching.
96 |             selected_categories (list of str): List of category names from the lexicon for matching.
97 | 
98 |         Returns:
99 |             list of tuple: List of tuples containing matched results (Identifier, Term, Category, Column).
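            e.g., [("record-123", "riot", "RaceEuphemisms", "Title")] (illustrative values)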
100 | 101 | """ 102 | matches = [] 103 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 104 | for index, row in self.metadata_df.iterrows(): 105 | for col in selected_columns: 106 | if isinstance(row[col], str): 107 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 108 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 109 | matches.append((row[self.identifier_column], term, category, col)) 110 | return matches 111 | 112 | # Define output file path 113 | output_file = "matches.csv" # Input the file path where you want to save your matches here. 114 | 115 | # Example usage: 116 | print("1. Initialize the tool:") 117 | tool = MaRMAT() 118 | 119 | print("\n2. Load lexicon and metadata files:") 120 | tool.load_lexicon("lexicon.csv") # Input the path to your lexicon CSV file. 121 | tool.load_metadata("metadata.csv") # Input the path to your metadata CSV file. 122 | 123 | print("\n3. Select columns for matching:") 124 | tool.select_columns(["Column1", "Column2"]) # Input the name(s) of the metadata column(s) you want to analyze. 125 | 126 | print("\n4. Select the identifier column:") 127 | tool.select_identifier_column("Identifier") # Input the name of your identifier column (e.g., a record ID number). 128 | 129 | print("\n5. Select categories for matching:") 130 | tool.select_categories(["RaceTerms"]) # Input the categories from the lexicon that you want to search for. 131 | 132 | print("\n6. Perform matching and view results:") 133 | tool.perform_matching(output_file) 134 | -------------------------------------------------------------------------------- /Code/MarMAT-CommandLine-2.6.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import re 3 | 4 | class MaRMAT: 5 | """A tool for assessing metadata and identifying matches based on a provided lexicon.""" 6 | 7 | def __init__(self): 8 | """Initialize the assessment tool.""" 9 | self.lexicon_df = None 10 | self.metadata_df = None 11 | self.columns = [] # List of all available columns in the metadata 12 | self.categories = [] # List of all available categories in the lexicon 13 | self.selected_columns = [] # List of columns selected for matching 14 | self.identifier_column = None # Identifier column used to uniquely identify rows 15 | 16 | def load_lexicon(self, file_path): 17 | """Load the lexicon file. 18 | 19 | Parameters: 20 | file_path (str): Path to the lexicon CSV file. 21 | 22 | """ 23 | try: 24 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 25 | print("Lexicon loaded successfully.") 26 | except Exception as e: 27 | print(f"An error occurred while loading lexicon: {e}") 28 | 29 | def load_metadata(self, file_path): 30 | """Load the metadata file. 31 | 32 | Parameters: 33 | file_path (str): Path to the metadata CSV file. 34 | 35 | """ 36 | try: 37 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 38 | print("Metadata loaded successfully.") 39 | except Exception as e: 40 | print(f"An error occurred while loading metadata: {e}") 41 | 42 | def select_columns(self, columns): 43 | """Select columns from the metadata for matching. 44 | 45 | Parameters: 46 | columns (list of str): List of column names in the metadata. 47 | 48 | """ 49 | self.selected_columns = columns 50 | 51 | def select_identifier_column(self, column): 52 | """Select the identifier column used for uniquely identifying rows. 
53 | 
54 |         Parameters:
55 |             column (str): Name of the identifier column in the metadata.
56 | 
57 |         """
58 |         self.identifier_column = column
59 | 
60 |     def select_categories(self, categories):
61 |         """Select categories from the lexicon for matching.
62 | 
63 |         Parameters:
64 |             categories (list of str): List of category names in the lexicon.
65 | 
66 |         """
67 |         self.categories = categories
68 | 
69 |     def perform_matching(self, output_file):
70 |         """Perform matching between selected columns and categories and save results to a CSV file.
71 | 
72 |         Parameters:
73 |             output_file (str): Path to the output CSV file to save matching results.
74 | 
75 |         """
76 |         if self.lexicon_df is None or self.metadata_df is None:
77 |             print("Please load lexicon and metadata files first.")
78 |             return
79 | 
80 |         matches = self.find_matches(self.selected_columns, self.categories)
81 |         matches_df = pd.DataFrame(matches, columns=['Identifier', 'Term', 'Category', 'Column'])
82 |         print(matches_df)
83 | 
84 |         # Write results to CSV
85 |         try:
86 |             matches_df.to_csv(output_file, index=False)
87 |             print(f"Results saved to {output_file}")
88 |         except Exception as e:
89 |             print(f"An error occurred while saving results: {e}")
90 | 
91 |     def find_matches(self, selected_columns, selected_categories):
92 |         """Find matches between metadata and lexicon based on selected columns and categories.
93 | 
94 |         Parameters:
95 |             selected_columns (list of str): List of column names from metadata for matching.
96 |             selected_categories (list of str): List of category names from the lexicon for matching.
97 | 
98 |         Returns:
99 |             list of tuple: List of tuples containing matched results (Identifier, Term, Category, Column).
100 | 
101 |         """
102 |         matches = []
103 |         lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)]
104 |         for index, row in self.metadata_df.iterrows():
105 |             for col in selected_columns:
106 |                 if isinstance(row[col], str):
107 |                     for term, category in zip(lexicon_df['term'], lexicon_df['category']):
108 |                         if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()):
109 |                             matches.append((row[self.identifier_column], term, category, col))
110 |         return matches
111 | 
112 | # Main program for command line interaction
113 | if __name__ == "__main__":
114 |     print("1. Initialize the tool:")
115 |     tool = MaRMAT()
116 | 
117 |     print("\n2. Load lexicon and metadata files:")
118 |     lexicon_path = input("Enter the path to the lexicon CSV file: ")
119 |     tool.load_lexicon(lexicon_path)
120 | 
121 |     metadata_path = input("Enter the path to the metadata CSV file: ")
122 |     tool.load_metadata(metadata_path)
123 | 
124 |     print("\n3. Select columns for matching:")
125 |     columns = input("Enter the column names for matching, separated by commas: ").split(",")
126 |     tool.select_columns([col.strip() for col in columns])  # Strip whitespace
127 | 
128 |     print("\n4. Select the identifier column:")
129 |     identifier = input("Enter the name of the identifier column: ")
130 |     tool.select_identifier_column(identifier)
131 | 
132 |     print("\n5. Select categories for matching:")
133 |     categories = input("Enter the categories from the lexicon for matching, separated by commas: ").split(",")
134 |     tool.select_categories([cat.strip() for cat in categories])  # Strip whitespace
135 | 
136 |     print("\n6. 
Perform matching and view results:") 137 | output_file = input("Enter the path to save the output CSV file: ") 138 | tool.perform_matching(output_file) 139 | -------------------------------------------------------------------------------- /Code/example-output-lcsh-subject-lexicon.csv: -------------------------------------------------------------------------------- 1 | Identifier,Term,Category,Column,Original Text 2 | 1533946,Indians of North America,ProblemLCSH,subjects,"Indians of North America--Monuments--Photographs; Indians of North America--Art--Photographs; Malin, Millard F., 1891-1974--Works--Photographs; Sculptures--Photographs; Indigenous peoples--North America" 3 | 1398979,Navajo,ProblemLCSH,subjects,"Navajo Indians--Music--Photographs; Navajo Indians-Photographs; Salt Lake City (Utah)--Photographs; Olympic Winter Games (19th : 2002 : Salt Lake City, Utah)--Photographs; Indigenous peoples--North America" 4 | 962277,Indians of North America,ProblemLCSH,subjects,"Wounded Knee (S.D.)--History--Indian occupation, 1973--Photographs; Indians of North America; Indigenous peoples--North America" 5 | 2348627,Navajo,ProblemLCSH,subjects,Navajo Mountain (Utah and Ariz.)--Photographs; Colorado River (Colo.-Mexico)--Photographs 6 | 1498946,Discovery and exploration,ProblemLCSH,subjects,"Pueblo Indians--History--Art; Southwest, New--Discovery and exploration--Art; Indigenous peoples--North America" 7 | 995167,Navajo,ProblemLCSH,subjects,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America 8 | 958790,Race,ProblemLCSH,subjects,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 9 | 998919,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 10 | 1302623,Navajo,ProblemLCSH,subjects,Volcanic fields--Arizona--Apache County--Photographs; Volcanic fields--Navajo Indian Reservation--Photographs; Buttes--Arizona--Apache County--Photographs; Buttes--Navajo Indian Reservation--Photographs; Landforms--Arizona--Apache County--Photographs; Landforms--Navajo Indian Reservation--Photographs; Geology--Arizona--Apache County--Photographs; Navajo Indian Reservation--Photographs 11 | 995167,Navajo,ProblemLCSH,subjects,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America 12 | 2364219,Mormon,ChangeHeadingLCSH,subjects,Mormon pioneers--Photographs; Pioneer Day (Mormon history)--Photographs; Parades--Utah--Salt Lake City--Photographs 13 | 2364219,Mormon pioneers,ChangeHeadingLCSH,subjects,Mormon pioneers--Photographs; Pioneer Day (Mormon history)--Photographs; Parades--Utah--Salt Lake City--Photographs 14 | 941713,"Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH,subjects,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Clothing & dress--Tule Lake Relocation Center--1940-1950" 15 | 
941496,"Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH,subjects,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Agriculture--1940-1950" 16 | 941536,"Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH,subjects,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; World War, 1939-1945--Concentration camps--California; Farming--California--Tule Lake--1940-1950; Agricultural laborers--California--Tule lake--1940-1950" 17 | 958469,Race,ProblemLCSH,subjects,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 18 | 958790,Race,ProblemLCSH,subjects,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 19 | 998946,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history----Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 20 | 998908,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history----Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 21 | 998938,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Mormon history); Days of '47; Holidays; Local holidays; Theatrical sets; Costumes (character dress); Thrones; McKay, Calleen Alice Robinson, 1928-2005; Women; Pioneer Days Royalty; Centennial Queens; Beauty contestants" 22 | 998919,Mormon,ChangeHeadingLCSH,subjects,"Pioneer Day (Latter Day Saint history) (Mormon history--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs" 23 | 2292054,Race,ProblemLCSH,subjects,"Civil rights movements--United States--History--20th century; Civil rights--Utah; Race discrimination--Religious aspects--Latter Day Saint churches; African Americans; Racism against Black people; Nabors, Charles James,1934-1986; African American scientists" 24 | 1396789,Hispanic Americans,ProblemLCSH,subjects,Hispanic Americans--Utah; Latin Americans--Utah 25 | 1396777,Indians of North America,ProblemLCSH,subjects,"Indians--Legal status, laws, etc.; Indians of North America--Legal status, laws, etc.--Canada; Indians of North America--Legal status, laws, etc.--United States; Discrimination; Indians--Social conditions; Indigenous peoples--North America" 26 | 1049391,Nephites,ChangeHeadingLCSH,subjects,Nephites; Folklore--Utah; Latter Day Saints--Utah--History--Anecdotes 27 | 1006680,Mormon,ChangeHeadingLCSH,subjects,Church of Jesus Christ of Latter-day Saints--History; Utah--History; Mormon converts 28 | 1006680,Mormon converts,ChangeHeadingLCSH,subjects,Church of Jesus Christ of Latter-day Saints--History; Utah--History; Mormon converts 29 | 1470940,Victims,ProblemLCSH,subjects,Nuclear weapons Testing; Nuclear weapons--United States--Testing; Nuclear weapons testing 
victims; Nuclear weapons--United States--Testing; Radioactive fallout 30 | 818891,Mormon,ChangeHeadingLCSH,subjects,"Loewenberg, Peter, 1933- --Interviews; Brodie, Fawn McKay, 1915-1981--Biography; Latter Day Saints--Biography; Mormon scholars--Biography; University of California, Los Angeles--Faculty--Biography" 31 | 818891,Mormon scholars,ChangeHeadingLCSH,subjects,"Loewenberg, Peter, 1933- --Interviews; Brodie, Fawn McKay, 1915-1981--Biography; Latter Day Saints--Biography; Mormon scholars--Biography; University of California, Los Angeles--Faculty--Biography" 32 | 1396789,Hispanic Americans,ProblemLCSH,subjects,Hispanic Americans--Utah; Latin Americans--Utah 33 | 893658,Race,ProblemLCSH,subjects,"African Americans--Utah--Interviews; Henry, Alberta H., 1920-2005--Interviews; African Americans--Civil rights--Utah; Utah--Race relations" 34 | 893658,Race relations,ProblemLCSH,subjects,"African Americans--Utah--Interviews; Henry, Alberta H., 1920-2005--Interviews; African Americans--Civil rights--Utah; Utah--Race relations" 35 | 958364,Race,ProblemLCSH,subjects,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs" 36 | 1739616,Victims,ProblemLCSH,subjects,Murder victims--Photographs 37 | -------------------------------------------------------------------------------- /XML Test Code/RMA-Tool.py: -------------------------------------------------------------------------------- 1 | import xml.etree.ElementTree as ET 2 | import csv 3 | import nltk 4 | import string 5 | from nltk.tokenize import word_tokenize 6 | from nltk.corpus import stopwords 7 | import re 8 | 9 | # Download NLTK resources if not already downloaded 10 | nltk.download('punkt') 11 | nltk.download('stopwords') 12 | 13 | def parse_xml_to_csv(xml_file, csv_file): 14 | """ 15 | Parses an XML file containing specific metadata and writes the extracted data into a CSV file. 16 | 17 | Parameters: 18 | - xml_file (str): File path of the XML input file. 19 | - csv_file (str): File path of the CSV output file. 20 | 21 | Note: 22 | - Make sure the XML file follows a specific structure with predefined namespaces. 23 | - Ensure that the CSV file path points to a writable location. 
24 | """ 25 | 26 | # Define namespaces 27 | namespaces = { 28 | 'oai': 'http://www.openarchives.org/OAI/2.0/', 29 | 'qdc': 'http://worldcat.org/xmlschemas/qdc-1.0/', 30 | 'dcterms': 'http://purl.org/dc/terms/', 31 | 'dc': 'http://purl.org/dc/elements/1.1/' 32 | } 33 | 34 | # Load stopwords and punctuation 35 | stop_words = set(stopwords.words('english')) 36 | punctuation = set(string.punctuation) 37 | 38 | # Parse XML 39 | tree = ET.parse(xml_file) 40 | root = tree.getroot() 41 | 42 | # Open CSV file for writing 43 | with open(csv_file, 'w', newline='', encoding='utf-8') as csvfile: 44 | writer = csv.writer(csvfile) 45 | 46 | # Write headers 47 | writer.writerow(['Identifier', 'Title', 'Subject', 'IdentifierURL', 'Token']) 48 | 49 | # Extract data from XML and write to CSV 50 | for record in root.findall('.//oai:record', namespaces): 51 | identifier = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 52 | title = record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces) is not None else "" 53 | subject = record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces) is not None else "" 54 | identifier_url = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 55 | 56 | # Tokenize and preprocess title and subject 57 | title_tokens = [word for word in word_tokenize(title.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if title else [] 58 | subject_tokens = [word for word in word_tokenize(subject.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if subject else [] 59 | 60 | # Write each token as a separate row with other columns filled down 61 | for token in title_tokens: 62 | writer.writerow([identifier, title, subject, identifier_url, token]) 63 | 64 | for token in subject_tokens: 65 | writer.writerow([identifier, title, subject, identifier_url, token]) 66 | 67 | # Define file paths 68 | xml_file_path = "PATH_TO_XML_FILE" # Insert path to your XML file 69 | csv_file_path = "PATH_TO_CSV_FILE" # Insert path to desired CSV output file 70 | 71 | # Parse XML and write to CSV 72 | parse_xml_to_csv(xml_file_path, csv_file_path) 73 | 74 | print("Conversion from XML to CSV successfully completed.") 75 | 76 | def load_lexicon_from_csv(file_path): 77 | """ 78 | Loads lexicon categories and terms from a CSV file into a dictionary. 79 | 80 | Parameters: 81 | - file_path (str): File path of the CSV input file. 82 | 83 | Returns: 84 | - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values. 85 | 86 | Note: 87 | - The CSV file should have lexicon categories as column headers and terms listed under each category. 
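    - For example, the bundled lexicons.csv uses the header row: Aggrandizement,RaceEuphemisms,RaceTerms,SlaveryTerms,GenderTerms,LGBTQ,MentalIllness,Disability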
88 | """ 89 | 90 | lexicon = { 91 | "Aggrandizement": [], 92 | "RaceEuphemisms": [], 93 | "RaceTerms": [], 94 | "SlaveryTerms": [], 95 | "GenderTerms": [], 96 | "LGBTQ": [], 97 | "MentalIllness": [], 98 | "Disability": [] 99 | } 100 | 101 | with open(file_path, 'r', encoding='utf-8-sig') as csv_file: 102 | csv_reader = csv.reader(csv_file) 103 | next(csv_reader) # Skip the header row 104 | 105 | for row in csv_reader: 106 | if len(row) == 8: 107 | lexicon["Aggrandizement"].append(row[0]) 108 | lexicon["RaceEuphemisms"].append(row[1]) 109 | lexicon["RaceTerms"].append(row[2]) 110 | lexicon["SlaveryTerms"].append(row[3]) 111 | lexicon["GenderTerms"].append(row[4]) 112 | lexicon["LGBTQ"].append(row[5]) 113 | lexicon["MentalIllness"].append(row[6]) 114 | lexicon["Disability"].append(row[7]) 115 | else: 116 | print("Invalid row format. Skipping...") 117 | 118 | return lexicon 119 | 120 | def search_and_append_lexicon_category(lexicon, input_csv_file, output_csv_file): 121 | """ 122 | Searches for lexicon term matches in an input CSV file, appends lexicon categories to each row, and writes the modified data into an output CSV file. 123 | 124 | Parameters: 125 | - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values. 126 | - input_csv_file (str): File path of the input CSV file. 127 | - output_csv_file (str): File path of the output CSV file. 128 | 129 | Note: 130 | - The input CSV file should contain a 'Token' column where lexicon term matches will be searched. 131 | - The output CSV file will have an additional column 'LexiconCategory' appended to each row, indicating the matched lexicon categories. 132 | """ 133 | 134 | # Load lexicon 135 | lexicon = load_lexicon_from_csv(lexicon) 136 | 137 | # Open input CSV file for reading and output CSV file for writing 138 | with open(input_csv_file, 'r', newline='', encoding='utf-8') as input_csv, \ 139 | open(output_csv_file, 'w', newline='', encoding='utf-8') as output_csv: 140 | 141 | reader = csv.reader(input_csv) 142 | writer = csv.writer(output_csv) 143 | 144 | # Write headers to the output CSV file 145 | headers = next(reader) 146 | headers.append('LexiconCategory') 147 | writer.writerow(headers) 148 | 149 | # Iterate over rows in the input CSV file 150 | for row in reader: 151 | token = row[4] # Assuming token is in the 5th column 152 | # Search for matches between token and terms in the lexicon 153 | matching_categories = [category for category, terms in lexicon.items() if token in terms] 154 | # Append lexicon category to the row 155 | row.append(', '.join(matching_categories)) 156 | # Write the modified row to the output CSV file 157 | writer.writerow(row) 158 | 159 | # Open the output CSV file again to remove rows without a LexiconCategory 160 | with open(output_csv_file, 'r', newline='', encoding='utf-8') as output_csv: 161 | reader = csv.reader(output_csv) 162 | # Filter rows based on presence of LexiconCategory 163 | rows_to_keep = [row for row in reader if row[-1] != ''] # Assuming LexiconCategory is the last column 164 | # Write filtered rows back to the output CSV file 165 | with open(output_csv_file, 'w', newline='', encoding='utf-8') as updated_output_csv: 166 | writer = csv.writer(updated_output_csv) 167 | writer.writerows(rows_to_keep) 168 | 169 | # File paths 170 | lexicon_file_path = "PATH_TO_LEXICON_CSV_FILE" # Insert path to your lexicon CSV file 171 | input_csv_file_path = "PATH_TO_INPUT_CSV_FILE" # Insert path to your input CSV file 172 | output_csv_file_path = "PATH_TO_OUTPUT_CSV_FILE" 
# Insert path to desired output CSV file 173 | 174 | # Search for matches, append lexicon category, and remove rows without a LexiconCategory 175 | search_and_append_lexicon_category(lexicon_file_path, input_csv_file_path, output_csv_file_path) 176 | 177 | print("Reparative Metadata Audit successfully completed.") 178 | -------------------------------------------------------------------------------- /XML Test Code/RMA-GUI.py: -------------------------------------------------------------------------------- 1 | import xml.etree.ElementTree as ET 2 | import csv 3 | import nltk 4 | import string 5 | from nltk.tokenize import word_tokenize 6 | from nltk.corpus import stopwords 7 | import tkinter as tk 8 | from tkinter import filedialog, messagebox 9 | 10 | # Download NLTK resources if not already downloaded 11 | nltk.download('punkt') 12 | nltk.download('stopwords') 13 | 14 | def parse_xml_to_csv(xml_file, csv_file): 15 | """ 16 | Parses an XML file containing specific metadata and writes the extracted data into a CSV file. 17 | 18 | Parameters: 19 | - xml_file (str): File path of the XML input file. 20 | - csv_file (str): File path of the CSV output file. 21 | 22 | Note: 23 | - Make sure the XML file follows a specific structure with predefined namespaces. 24 | - Ensure that the CSV file path points to a writable location. 25 | """ 26 | 27 | # Define namespaces 28 | namespaces = { 29 | 'oai': 'http://www.openarchives.org/OAI/2.0/', 30 | 'qdc': 'http://worldcat.org/xmlschemas/qdc-1.0/', 31 | 'dcterms': 'http://purl.org/dc/terms/', 32 | 'dc': 'http://purl.org/dc/elements/1.1/' 33 | } 34 | 35 | # Load stopwords and punctuation 36 | stop_words = set(stopwords.words('english')) 37 | punctuation = set(string.punctuation) 38 | 39 | # Parse XML 40 | tree = ET.parse(xml_file) 41 | root = tree.getroot() 42 | 43 | # Open CSV file for writing 44 | with open(csv_file, 'w', newline='', encoding='utf-8') as csvfile: 45 | writer = csv.writer(csvfile) 46 | 47 | # Write headers 48 | writer.writerow(['Identifier', 'Title', 'Subject', 'IdentifierURL', 'Token']) 49 | 50 | # Extract data from XML and write to CSV 51 | for record in root.findall('.//oai:record', namespaces): 52 | identifier = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 53 | title = record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:title', namespaces) is not None else "" 54 | subject = record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:subject', namespaces) is not None else "" 55 | identifier_url = record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces).text if record.find('./oai:metadata/qdc:qualifieddc/dc:identifier', namespaces) is not None else "" 56 | 57 | # Tokenize and preprocess title and subject 58 | title_tokens = [word for word in word_tokenize(title.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if title else [] 59 | subject_tokens = [word for word in word_tokenize(subject.lower()) if word not in stop_words and word not in punctuation and not word.isdigit() and word != '--'] if subject else [] 60 | 61 | # Write each token as a separate row with other columns filled down 62 | for token in title_tokens: 63 | writer.writerow([identifier, title, subject, identifier_url, token]) 64 | 65 | for token in 
subject_tokens:
66 |                 writer.writerow([identifier, title, subject, identifier_url, token])
67 | 
68 | def load_lexicon_from_csv(file_path):
69 |     """
70 |     Loads lexicon categories and terms from a CSV file into a dictionary.
71 | 
72 |     Parameters:
73 |     - file_path (str): File path of the CSV input file.
74 | 
75 |     Returns:
76 |     - lexicon (dict): Dictionary containing lexicon categories as keys and lists of terms as values.
77 | 
78 |     Note:
79 |     - The CSV file should have lexicon categories as column headers and terms listed under each category.
80 |     """
81 | 
82 |     lexicon = {
83 |         "Aggrandizement": [],
84 |         "RaceEuphemisms": [],
85 |         "RaceTerms": [],
86 |         "SlaveryTerms": [],
87 |         "GenderTerms": [],
88 |         "LGBTQ": [],
89 |         "MentalIllness": [],
90 |         "Disability": []
91 |     }
92 | 
93 |     with open(file_path, 'r', encoding='utf-8-sig') as csv_file:
94 |         csv_reader = csv.reader(csv_file)
95 |         next(csv_reader)  # Skip the header row
96 | 
97 |         for row in csv_reader:
98 |             if len(row) == 8:
99 |                 lexicon["Aggrandizement"].append(row[0])
100 |                 lexicon["RaceEuphemisms"].append(row[1])
101 |                 lexicon["RaceTerms"].append(row[2])
102 |                 lexicon["SlaveryTerms"].append(row[3])
103 |                 lexicon["GenderTerms"].append(row[4])
104 |                 lexicon["LGBTQ"].append(row[5])
105 |                 lexicon["MentalIllness"].append(row[6])
106 |                 lexicon["Disability"].append(row[7])
107 |             else:
108 |                 print(f"Invalid row format (expected 8 columns, got {len(row)}). Skipping...")
109 | 
110 |     return lexicon
111 | 
112 | def search_and_append_lexicon_category(lexicon, input_csv_file, output_csv_file):
113 |     """
114 |     Searches for lexicon term matches in an input CSV file, appends lexicon categories to each row, and writes the modified data into an output CSV file.
115 | 
116 |     Parameters:
117 |     - lexicon (str): File path of the lexicon CSV file; it is loaded into a dictionary of categories and terms via load_lexicon_from_csv().
118 |     - input_csv_file (str): File path of the input CSV file.
119 |     - output_csv_file (str): File path of the output CSV file.
120 | 
121 |     Note:
122 |     - The input CSV file should contain a 'Token' column where lexicon term matches will be searched.
123 |     - The output CSV file will have an additional column 'LexiconCategory' appended to each row, indicating the matched lexicon categories.
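    - Matching is exact list membership against single-word tokens, so multi-word lexicon entries (e.g., "color blind") will never match; substring and phrase matches are not detected.
    - Rows whose token matches no lexicon term are removed from the output in a second pass.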
124 | """ 125 | 126 | # Load lexicon 127 | lexicon = load_lexicon_from_csv(lexicon) 128 | 129 | # Open input CSV file for reading and output CSV file for writing 130 | with open(input_csv_file, 'r', newline='', encoding='utf-8') as input_csv, \ 131 | open(output_csv_file, 'w', newline='', encoding='utf-8') as output_csv: 132 | 133 | reader = csv.reader(input_csv) 134 | writer = csv.writer(output_csv) 135 | 136 | # Write headers to the output CSV file 137 | headers = next(reader) 138 | headers.append('LexiconCategory') 139 | writer.writerow(headers) 140 | 141 | # Iterate over rows in the input CSV file 142 | for row in reader: 143 | token = row[4] # Assuming token is in the 5th column 144 | # Search for matches between token and terms in the lexicon 145 | matching_categories = [category for category, terms in lexicon.items() if token in terms] 146 | # Append lexicon category to the row 147 | row.append(', '.join(matching_categories)) 148 | # Write the modified row to the output CSV file 149 | writer.writerow(row) 150 | 151 | # Open the output CSV file again to remove rows without a LexiconCategory 152 | with open(output_csv_file, 'r', newline='', encoding='utf-8') as output_csv: 153 | reader = csv.reader(output_csv) 154 | # Filter rows based on presence of LexiconCategory 155 | rows_to_keep = [row for row in reader if row[-1] != ''] # Assuming LexiconCategory is the last column 156 | # Write filtered rows back to the output CSV file 157 | with open(output_csv_file, 'w', newline='', encoding='utf-8') as updated_output_csv: 158 | writer = csv.writer(updated_output_csv) 159 | writer.writerows(rows_to_keep) 160 | 161 | def browse_xml(): 162 | filename = filedialog.askopenfilename(filetypes=[("XML Files", "*.xml")]) 163 | xml_entry.delete(0, tk.END) 164 | xml_entry.insert(0, filename) 165 | 166 | def browse_lexicon(): 167 | filename = filedialog.askopenfilename(filetypes=[("CSV Files", "*.csv")]) 168 | lexicon_entry.delete(0, tk.END) 169 | lexicon_entry.insert(0, filename) 170 | 171 | def browse_output(): 172 | filename = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV Files", "*.csv")]) 173 | output_entry.delete(0, tk.END) 174 | output_entry.insert(0, filename) 175 | 176 | def process_files(): 177 | xml_file = xml_entry.get() 178 | lexicon_file = lexicon_entry.get() 179 | output_file = output_entry.get() 180 | 181 | if not xml_file or not lexicon_file or not output_file: 182 | messagebox.showerror("Error", "Please select XML file, lexicon file, and output file.") 183 | return 184 | 185 | try: 186 | parse_xml_to_csv(xml_file, "temp.csv") 187 | search_and_append_lexicon_category(lexicon_file, "temp.csv", output_file) 188 | messagebox.showinfo("Success", "Conversion completed successfully.") 189 | except Exception as e: 190 | messagebox.showerror("Error", f"An error occurred: {str(e)}") 191 | 192 | # Remove temp file 193 | import os 194 | os.remove("temp.csv") 195 | 196 | # Create GUI 197 | root = tk.Tk() 198 | root.title("XML to CSV Converter") 199 | 200 | # XML File 201 | xml_label = tk.Label(root, text="Please select the XML file you want to analyze:") 202 | xml_label.grid(row=0, column=0, padx=5, pady=5, sticky="w") 203 | xml_entry = tk.Entry(root, width=50) 204 | xml_entry.grid(row=0, column=1, columnspan=2, padx=5, pady=5) 205 | xml_button = tk.Button(root, text="Browse", command=browse_xml) 206 | xml_button.grid(row=0, column=3, padx=5, pady=5) 207 | 208 | # Lexicon File 209 | lexicon_label = tk.Label(root, text="Navigate to the lexicons.csv file saved on your 
computer:") 210 | lexicon_label.grid(row=1, column=0, padx=5, pady=5, sticky="w") 211 | lexicon_entry = tk.Entry(root, width=50) 212 | lexicon_entry.grid(row=1, column=1, columnspan=2, padx=5, pady=5) 213 | lexicon_button = tk.Button(root, text="Browse", command=browse_lexicon) 214 | lexicon_button.grid(row=1, column=3, padx=5, pady=5) 215 | 216 | # Output File 217 | output_label = tk.Label(root, text="Choose where you would like the audit file to be saved:") 218 | output_label.grid(row=2, column=0, padx=5, pady=5, sticky="w") 219 | output_entry = tk.Entry(root, width=50) 220 | output_entry.grid(row=2, column=1, columnspan=2, padx=5, pady=5) 221 | output_button = tk.Button(root, text="Browse", command=browse_output) 222 | output_button.grid(row=2, column=3, padx=5, pady=5) 223 | 224 | # Process Button 225 | process_button = tk.Button(root, text="Process", command=process_files) 226 | process_button.grid(row=3, column=0, columnspan=4, padx=5, pady=5) 227 | 228 | root.mainloop() 229 | -------------------------------------------------------------------------------- /Code/Test Versions/RMA-GUI-2.0.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox 3 | from tkinter import ttk 4 | import pandas as pd 5 | import string 6 | import re 7 | 8 | class ReparativeMetadataAuditTool(tk.Tk): 9 | def __init__(self): 10 | super().__init__() 11 | self.title("Lexicon Matching") 12 | 13 | # Initialize variables 14 | self.lexicon_df = None 15 | self.metadata_df = None 16 | self.columns = [] 17 | self.categories = [] 18 | self.selected_columns = [] # Store selected columns as an attribute 19 | self.category_selection_page_active = False # Track whether category selection page is active 20 | 21 | # Create main frame 22 | self.main_frame = ttk.Frame(self) 23 | self.main_frame.pack(fill='both', expand=True) 24 | 25 | # Load lexicon button 26 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 27 | self.load_lexicon_button.pack(pady=10) 28 | 29 | # Load metadata button 30 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 31 | self.load_metadata_button.pack(pady=10) 32 | 33 | # Next button 34 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 35 | self.next_button.pack(pady=10) 36 | 37 | # Hide next button initially 38 | self.next_button.pack_forget() 39 | 40 | # Second screen frame (Column selection) 41 | self.column_selection_frame = ttk.Frame(self) 42 | 43 | # Column selection label 44 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 45 | self.column_label.pack(pady=5) 46 | 47 | # Columns listbox 48 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 49 | self.column_listbox.pack(pady=5) 50 | 51 | # All columns checkbox 52 | self.all_columns_var = tk.BooleanVar(value=False) 53 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 54 | self.all_columns_checkbox.pack(pady=5) 55 | 56 | # Next button for column selection 57 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_category_selection) 58 | self.next_button_columns.pack(pady=10) 59 | 60 | # Back button for column selection 61 | self.back_button_columns = 
ttk.Button(self.column_selection_frame, text="Back", command=self.show_main_frame) 62 | self.back_button_columns.pack(pady=10) 63 | 64 | # Second screen frame (Category selection) 65 | self.category_selection_frame = ttk.Frame(self) 66 | 67 | # Initialize match button 68 | self.match_button = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 69 | 70 | # Initialize all_categories_var 71 | self.all_categories_var = tk.BooleanVar(value=False) 72 | 73 | # Hide second screen frames initially 74 | self.column_selection_frame.pack_forget() 75 | self.category_selection_frame.pack_forget() 76 | 77 | def load_lexicon(self): 78 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 79 | if file_path: 80 | try: 81 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 82 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 83 | self.load_lexicon_button.config(state='disabled') 84 | except Exception as e: 85 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 86 | 87 | def load_metadata(self): 88 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 89 | if file_path: 90 | try: 91 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 92 | messagebox.showinfo("Success", "Metadata loaded successfully.") 93 | self.load_metadata_button.config(state='disabled') 94 | self.next_button.pack() 95 | except Exception as e: 96 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 97 | 98 | def show_column_selection(self): 99 | if self.lexicon_df is None or self.metadata_df is None: 100 | messagebox.showwarning("Warning", "Please load lexicon and metadata files first.") 101 | return 102 | 103 | # Populate columns listbox 104 | self.columns = self.metadata_df.columns.tolist() 105 | for column in self.columns: 106 | self.column_listbox.insert(tk.END, column) 107 | 108 | # Show column selection frame 109 | self.main_frame.pack_forget() 110 | self.column_selection_frame.pack(fill='both', expand=True) 111 | 112 | def show_category_selection(self): 113 | # Get selected columns 114 | self.selected_columns = self.get_selected_columns() # Store selected columns 115 | if not self.selected_columns: 116 | messagebox.showwarning("Warning", "Please select at least one column.") 117 | return 118 | 119 | # Clear previous selections 120 | self.categories.clear() 121 | if hasattr(self, 'category_listbox'): 122 | self.category_listbox.destroy() # Destroy previous listbox if exists 123 | self.category_listbox = tk.Listbox(self.category_selection_frame, selectmode='multiple') 124 | self.category_listbox.pack(pady=5) 125 | 126 | # Populate categories listbox 127 | self.categories = self.lexicon_df['category'].unique().tolist() 128 | for category in self.categories: 129 | self.category_listbox.insert(tk.END, category) 130 | 131 | # Show category selection frame 132 | self.column_selection_frame.pack_forget() 133 | self.match_button.pack_forget() # Hide matching button if it's already displayed 134 | self.category_selection_frame.pack(fill='both', expand=True) 135 | self.category_selection_page_active = True # Set category selection page as active 136 | self.match_button.pack(pady=10) # Display matching button 137 | 138 | def perform_matching(self): 139 | # Hide back buttons 140 | self.back_button_columns.pack_forget() 141 | 142 | if self.category_selection_page_active: # Check if currently on category selection page 143 | # Get selected categories 144 | 
selected_categories = self.get_selected_categories() 145 | 146 | if not selected_categories: 147 | messagebox.showwarning("Warning", "Please select at least one category.") 148 | return 149 | 150 | # Proceed with matching 151 | matches = self.find_matches(self.selected_columns, selected_categories) 152 | else: 153 | # Get selected columns 154 | selected_columns = self.get_selected_columns() 155 | 156 | if not selected_columns: 157 | messagebox.showwarning("Warning", "Please select at least one column.") 158 | return 159 | 160 | # Get selected categories 161 | selected_categories = self.get_selected_categories() 162 | 163 | if not selected_categories: 164 | messagebox.showwarning("Warning", "Please select at least one category.") 165 | return 166 | 167 | # Proceed with matching 168 | matches = self.find_matches(selected_columns, selected_categories) 169 | 170 | # Filter matches to include only selected columns 171 | matches_filtered = [(identifier, term, category, col) for identifier, term, category, col in matches if col in self.selected_columns] 172 | 173 | # Save to CSV 174 | output_file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 175 | if output_file_path: 176 | try: 177 | matches_df = pd.DataFrame(matches_filtered, columns=['Identifier', 'Term', 'Category', 'Column']) 178 | matches_df.to_csv(output_file_path, index=False) 179 | messagebox.showinfo("Success", f"Merged data saved to: {output_file_path}") 180 | except Exception as e: 181 | messagebox.showerror("Error", f"An error occurred while saving file: {e}") 182 | 183 | def toggle_columns(self): 184 | if self.all_columns_var.get(): 185 | self.column_listbox.selection_set(0, tk.END) 186 | self.column_listbox.config(state='disabled') 187 | else: 188 | self.column_listbox.selection_clear(0, tk.END) 189 | self.column_listbox.config(state='normal') 190 | 191 | def toggle_categories(self): 192 | if self.all_categories_var.get(): 193 | self.category_listbox.selection_set(0, tk.END) 194 | self.category_listbox.config(state='disabled') 195 | else: 196 | self.category_listbox.selection_clear(0, tk.END) 197 | self.category_listbox.config(state='normal') 198 | 199 | def get_selected_columns(self): 200 | if self.all_columns_var.get(): 201 | return self.columns 202 | else: 203 | return [self.columns[i] for i in self.column_listbox.curselection()] 204 | 205 | def get_selected_categories(self): 206 | if self.all_categories_var.get(): 207 | return self.categories 208 | else: 209 | return [self.categories[i] for i in self.category_listbox.curselection()] 210 | 211 | def find_matches(self, selected_columns, selected_categories): 212 | matches = [] 213 | # Filter lexicon based on selected categories 214 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 215 | # Iterate over each row in the metadata DataFrame 216 | for index, row in self.metadata_df.iterrows(): 217 | # Process the text in each specified column 218 | for col in selected_columns: 219 | # Check if the value in the column is a string 220 | if isinstance(row[col], str): 221 | # Iterate over each term in the lexicon and check for matches 222 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 223 | # Check if the whole term exists in the text column 224 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 225 | matches.append((row['Identifier'], term, category, col)) 226 | break # Break out of the inner loop once a match is found in this column 227 | return matches 
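    # A minimal illustration of the whole-word matching used in find_matches
    # (hypothetical strings, not part of the tool's flow):
    #
    #   import re
    #   re.search(r'\b' + re.escape('indian') + r'\b', 'busts of ute indians')    # -> None:
    #       'indians' provides no word boundary immediately after 'indian'
    #   re.search(r'\b' + re.escape('indian') + r'\b', 'american indian movement')  # -> match
    #
    # re.escape() keeps punctuation inside lexicon terms (e.g. "mrs.") literal, and
    # the break above records at most one lexicon match per metadata cell.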
228 | 229 | def show_main_frame(self): 230 | if self.category_selection_page_active: 231 | self.category_selection_page_active = False 232 | self.category_selection_frame.pack_forget() 233 | self.column_selection_frame.pack(fill='both', expand=True) 234 | else: 235 | self.column_selection_frame.pack_forget() 236 | self.main_frame.pack(fill='both', expand=True) 237 | 238 | # Create and run the application 239 | app = ReparativeMetadataAuditTool() 240 | app.mainloop() 241 | -------------------------------------------------------------------------------- /Code/Past Versions/MaRMAT-GUI-2.5.2.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox, ttk 3 | import pandas as pd 4 | import re 5 | import threading 6 | 7 | class MaRMAT(tk.Tk): 8 | def __init__(self): 9 | super().__init__() 10 | self.title("Marriott Reparative Metadata Assessment Tool (MaRMAT)") 11 | 12 | # Initialize variables 13 | self.lexicon_df = None 14 | self.metadata_df = None 15 | self.columns = [] 16 | self.categories = [] 17 | self.selected_columns = [] 18 | self.identifier_column = None 19 | 20 | # Create main frame 21 | self.main_frame = ttk.Frame(self) 22 | self.main_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 23 | 24 | # Explanation text 25 | self.explanation_text = """ 26 | Welcome to Marriott Reparative Metadata Assessment Tool (MaRMAT)! 27 | 28 | This tool allows you to match terms from a problematic terms lexicon file with text data from a collections metadata file. 29 | 30 | Please follow the steps below: 31 | 32 | 1. Load your lexicon and metadata files using the provided buttons. 33 | 34 | 2. On the next screen, select the columns from your metadata file that you want to analyze. 35 | 36 | 3. After selecting columns, choose the column in your metadata file that you want to rewrite as the "Identifier" column that relates back to your original metadata (e.g., a collection ID). 37 | 38 | 4. Then, choose the categories of terms from the lexicon that you want to search for. 39 | 40 | 5. Click "Perform Matching" to find matches and export the results to a CSV file. 41 | 42 | Let's get started! 
43 | """ 44 | 45 | self.explanation_label = ttk.Label(self.main_frame, text=self.explanation_text, justify='left', wraplength=600) 46 | self.explanation_label.grid(row=0, column=0, columnspan=3, padx=20, pady=20, sticky="nsew") 47 | 48 | # Load lexicon button 49 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 50 | self.load_lexicon_button.grid(row=1, column=0, padx=10, pady=10, sticky="w") 51 | 52 | # Load metadata button 53 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 54 | self.load_metadata_button.grid(row=1, column=1, padx=10, pady=10, sticky="w") 55 | 56 | # Reset button 57 | self.reset_button = ttk.Button(self.main_frame, text="Reset", command=self.reset) 58 | self.reset_button.grid(row=1, column=2, padx=10, pady=10, sticky="w") 59 | 60 | # Next button 61 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 62 | self.next_button.grid(row=2, column=0, columnspan=3, pady=10, sticky="nsew") 63 | self.next_button.grid_remove() # Hide next button initially 64 | 65 | def load_lexicon(self): 66 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 67 | if file_path: 68 | try: 69 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 70 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 71 | self.load_lexicon_button.config(state='disabled') 72 | except Exception as e: 73 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 74 | 75 | def load_metadata(self): 76 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 77 | if file_path: 78 | try: 79 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 80 | messagebox.showinfo("Success", "Metadata loaded successfully.") 81 | self.load_metadata_button.config(state='disabled') 82 | self.next_button.grid() 83 | except Exception as e: 84 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 85 | 86 | def show_column_selection(self): 87 | if self.lexicon_df is None or self.metadata_df is None: 88 | messagebox.showwarning("Warning", "Please load lexicon and metadata files first.") 89 | return 90 | 91 | # Populate columns listbox 92 | self.columns = self.metadata_df.columns.tolist() 93 | self.column_selection_frame = ttk.Frame(self) 94 | self.column_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 95 | 96 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 97 | self.column_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 98 | 99 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 100 | self.column_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 101 | for column in self.columns: 102 | self.column_listbox.insert(tk.END, column) 103 | 104 | self.all_columns_var = tk.BooleanVar(value=False) 105 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 106 | self.all_columns_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 107 | 108 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_identifier_selection) 109 | self.next_button_columns.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 110 | 111 | self.back_button_columns = ttk.Button(self.column_selection_frame, text="Back", 
command=self.back_to_main_frame) 112 | self.back_button_columns.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 113 | 114 | def show_identifier_selection(self): 115 | self.selected_columns = self.get_selected_columns() # Store selected columns 116 | if not self.selected_columns: 117 | messagebox.showwarning("Warning", "Please select at least one column.") 118 | return 119 | 120 | self.column_selection_frame.grid_remove() 121 | 122 | self.identifier_selection_frame = ttk.Frame(self) 123 | self.identifier_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 124 | 125 | self.identifier_label = ttk.Label(self.identifier_selection_frame, text="Select Identifier Column:") 126 | self.identifier_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 127 | 128 | self.identifier_var = tk.StringVar() 129 | self.identifier_dropdown = ttk.Combobox(self.identifier_selection_frame, textvariable=self.identifier_var, state='readonly') 130 | self.identifier_dropdown.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 131 | self.identifier_dropdown['values'] = self.metadata_df.columns.tolist() # Show all columns as options 132 | self.identifier_dropdown.current(0) # Select first column by default 133 | 134 | self.next_button_identifier = ttk.Button(self.identifier_selection_frame, text="Next", command=self.show_category_selection) 135 | self.next_button_identifier.grid(row=2, column=0, padx=10, pady=10, sticky="nsew") 136 | 137 | self.back_button_identifier = ttk.Button(self.identifier_selection_frame, text="Back", command=self.back_to_column_selection) 138 | self.back_button_identifier.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 139 | 140 | def show_category_selection(self): 141 | self.identifier_column = self.identifier_var.get() 142 | 143 | self.identifier_selection_frame.grid_remove() 144 | 145 | self.category_selection_frame = ttk.Frame(self) 146 | self.category_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 147 | 148 | self.category_label = ttk.Label(self.category_selection_frame, text="Select Categories to Analyze:") 149 | self.category_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 150 | 151 | self.category_listbox = tk.Listbox(self.category_selection_frame, selectmode='multiple') 152 | self.category_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 153 | self.categories = self.lexicon_df['category'].unique().tolist() 154 | for category in self.categories: 155 | self.category_listbox.insert(tk.END, category) 156 | 157 | self.all_categories_var = tk.BooleanVar(value=False) 158 | self.all_categories_checkbox = ttk.Checkbutton(self.category_selection_frame, text="All", variable=self.all_categories_var, command=self.toggle_categories) 159 | self.all_categories_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 160 | 161 | self.next_button_categories = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 162 | self.next_button_categories.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 163 | 164 | self.back_button_categories = ttk.Button(self.category_selection_frame, text="Back", command=self.back_to_identifier_selection) 165 | self.back_button_categories.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 166 | 167 | def perform_matching(self): 168 | selected_categories = self.get_selected_categories() 169 | if not selected_categories: 170 | messagebox.showwarning("Warning", "Please select at least one category.") 171 | return 172 | 173 | matches = 
self.find_matches(self.selected_columns, selected_categories) 174 | matches_filtered = [(identifier, term, category, col, text) for identifier, term, category, col, text in matches if col in self.selected_columns] 175 | output_file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 176 | if output_file_path: 177 | try: 178 | matches_df = pd.DataFrame(matches_filtered, columns=['Identifier', 'Term', 'Category', 'Column', 'Original Text']) 179 | matches_df.to_csv(output_file_path, index=False) 180 | messagebox.showinfo("Success", f"Merged data saved to: {output_file_path}") 181 | self.reset() 182 | except Exception as e: 183 | messagebox.showerror("Error", f"An error occurred while saving file: {e}") 184 | 185 | def toggle_columns(self): 186 | if self.all_columns_var.get(): 187 | self.column_listbox.selection_set(0, tk.END) 188 | self.column_listbox.config(state='disabled') 189 | else: 190 | self.column_listbox.selection_clear(0, tk.END) 191 | self.column_listbox.config(state='normal') 192 | 193 | def toggle_categories(self): 194 | if self.all_categories_var.get(): 195 | self.category_listbox.selection_set(0, tk.END) 196 | self.category_listbox.config(state='disabled') 197 | else: 198 | self.category_listbox.selection_clear(0, tk.END) 199 | self.category_listbox.config(state='normal') 200 | 201 | def get_selected_columns(self): 202 | if self.all_columns_var.get(): 203 | return self.columns 204 | else: 205 | return [self.columns[i] for i in self.column_listbox.curselection()] 206 | 207 | def get_selected_categories(self): 208 | return [self.categories[i] for i in self.category_listbox.curselection()] 209 | 210 | def find_matches(self, selected_columns, selected_categories): 211 | matches = [] 212 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 213 | for index, row in self.metadata_df.iterrows(): 214 | for col in selected_columns: 215 | if isinstance(row[col], str): 216 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 217 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 218 | matches.append((row[self.identifier_column], term, category, col, row[col])) 219 | break 220 | return matches 221 | 222 | def back_to_main_frame(self): 223 | self.column_selection_frame.grid_remove() 224 | self.main_frame.grid() 225 | 226 | def back_to_column_selection(self): 227 | self.identifier_selection_frame.grid_remove() 228 | self.column_selection_frame.grid() 229 | 230 | def back_to_identifier_selection(self): 231 | self.category_selection_frame.grid_remove() 232 | self.identifier_selection_frame.grid() 233 | 234 | def reset(self): 235 | self.load_lexicon_button.config(state='normal') 236 | self.load_metadata_button.config(state='normal') 237 | self.lexicon_df = None 238 | self.metadata_df = None 239 | self.columns = [] 240 | self.categories = [] 241 | self.selected_columns = [] 242 | self.identifier_column = None 243 | self.next_button.grid_remove() 244 | self.explanation_label.grid() 245 | 246 | # Create and run the application 247 | app = MaRMAT() 248 | app.mainloop() 249 | -------------------------------------------------------------------------------- /Code/MaRMAT-GUI-2.5.3.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox, ttk 3 | import pandas as pd 4 | import re 5 | import threading 6 | 7 | class MaRMAT(tk.Tk): 8 | def __init__(self): 9 | super().__init__() 10 | 
self.title("Marriott Reparative Metadata Assessment Tool (MaRMAT)") 11 | 12 | # Initialize variables 13 | self.lexicon_df = None 14 | self.metadata_df = None 15 | self.columns = [] 16 | self.categories = [] 17 | self.selected_columns = [] 18 | self.identifier_column = None 19 | 20 | # Create main frame 21 | self.main_frame = ttk.Frame(self) 22 | self.main_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 23 | 24 | # Explanation text 25 | self.explanation_text = """ 26 | Welcome to Marriott Reparative Metadata Assessment Tool (MaRMAT)! 27 | 28 | This tool allows you to match terms from a problematic terms lexicon file with text data from a collections metadata file. 29 | 30 | Please follow the steps below: 31 | 32 | 1. Load your lexicon and metadata files using the provided buttons. 33 | 34 | 2. On the next screen, select the columns from your metadata file that you want to analyze. 35 | 36 | 3. After selecting columns, choose the column in your metadata file that you want to rewrite as the "Identifier" column that relates back to your original metadata (e.g., a collection ID). 37 | 38 | 4. Then, choose the categories of terms from the lexicon that you want to search for. 39 | 40 | 5. Click "Perform Matching" to find matches and export the results to a CSV file. 41 | 42 | Let's get started! 43 | """ 44 | 45 | self.explanation_label = ttk.Label(self.main_frame, text=self.explanation_text, justify='left', wraplength=600) 46 | self.explanation_label.grid(row=0, column=0, columnspan=3, padx=20, pady=20, sticky="nsew") 47 | 48 | # Load lexicon button 49 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 50 | self.load_lexicon_button.grid(row=1, column=0, padx=10, pady=10, sticky="w") 51 | 52 | # Load metadata button 53 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 54 | self.load_metadata_button.grid(row=1, column=1, padx=10, pady=10, sticky="w") 55 | 56 | # Reset button 57 | self.reset_button = ttk.Button(self.main_frame, text="Reset", command=self.reset) 58 | self.reset_button.grid(row=1, column=2, padx=10, pady=10, sticky="w") 59 | 60 | # Next button 61 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 62 | self.next_button.grid(row=2, column=0, columnspan=3, pady=10, sticky="nsew") 63 | self.next_button.grid_remove() # Hide next button initially 64 | 65 | def load_lexicon(self): 66 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 67 | if file_path: 68 | try: 69 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 70 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 71 | self.load_lexicon_button.config(state='disabled') 72 | except Exception as e: 73 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 74 | 75 | def load_metadata(self): 76 | file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]) 77 | if file_path: 78 | try: 79 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 80 | messagebox.showinfo("Success", "Metadata loaded successfully.") 81 | self.load_metadata_button.config(state='disabled') 82 | self.next_button.grid() 83 | except Exception as e: 84 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 85 | 86 | def show_column_selection(self): 87 | if self.lexicon_df is None or self.metadata_df is None: 88 | messagebox.showwarning("Warning", "Please load lexicon and metadata 
files first.") 89 | return 90 | 91 | # Populate columns listbox 92 | self.columns = self.metadata_df.columns.tolist() 93 | self.column_selection_frame = ttk.Frame(self) 94 | self.column_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 95 | 96 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 97 | self.column_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 98 | 99 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 100 | self.column_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 101 | for column in self.columns: 102 | self.column_listbox.insert(tk.END, column) 103 | 104 | self.all_columns_var = tk.BooleanVar(value=False) 105 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 106 | self.all_columns_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 107 | 108 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_identifier_selection) 109 | self.next_button_columns.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 110 | 111 | self.back_button_columns = ttk.Button(self.column_selection_frame, text="Back", command=self.back_to_main_frame) 112 | self.back_button_columns.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 113 | 114 | def show_identifier_selection(self): 115 | self.selected_columns = self.get_selected_columns() # Store selected columns 116 | if not self.selected_columns: 117 | messagebox.showwarning("Warning", "Please select at least one column.") 118 | return 119 | 120 | self.column_selection_frame.grid_remove() 121 | 122 | self.identifier_selection_frame = ttk.Frame(self) 123 | self.identifier_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 124 | 125 | self.identifier_label = ttk.Label(self.identifier_selection_frame, text="Select Identifier Column:") 126 | self.identifier_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 127 | 128 | self.identifier_var = tk.StringVar() 129 | self.identifier_dropdown = ttk.Combobox(self.identifier_selection_frame, textvariable=self.identifier_var, state='readonly') 130 | self.identifier_dropdown.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 131 | self.identifier_dropdown['values'] = self.metadata_df.columns.tolist() # Show all columns as options 132 | self.identifier_dropdown.current(0) # Select first column by default 133 | 134 | self.next_button_identifier = ttk.Button(self.identifier_selection_frame, text="Next", command=self.show_category_selection) 135 | self.next_button_identifier.grid(row=2, column=0, padx=10, pady=10, sticky="nsew") 136 | 137 | self.back_button_identifier = ttk.Button(self.identifier_selection_frame, text="Back", command=self.back_to_column_selection) 138 | self.back_button_identifier.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 139 | 140 | def show_category_selection(self): 141 | self.identifier_column = self.identifier_var.get() 142 | 143 | self.identifier_selection_frame.grid_remove() 144 | 145 | self.category_selection_frame = ttk.Frame(self) 146 | self.category_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 147 | 148 | self.category_label = ttk.Label(self.category_selection_frame, text="Select Categories to Analyze:") 149 | self.category_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 150 | 151 | self.category_listbox = 
tk.Listbox(self.category_selection_frame, selectmode='multiple') 152 | self.category_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 153 | self.categories = self.lexicon_df['category'].unique().tolist() 154 | for category in self.categories: 155 | self.category_listbox.insert(tk.END, category) 156 | 157 | self.all_categories_var = tk.BooleanVar(value=False) 158 | self.all_categories_checkbox = ttk.Checkbutton(self.category_selection_frame, text="All", variable=self.all_categories_var, command=self.toggle_categories) 159 | self.all_categories_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 160 | 161 | self.next_button_categories = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 162 | self.next_button_categories.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 163 | 164 | self.back_button_categories = ttk.Button(self.category_selection_frame, text="Back", command=self.back_to_identifier_selection) 165 | self.back_button_categories.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 166 | 167 | def perform_matching(self): 168 | selected_categories = self.get_selected_categories() 169 | if not selected_categories: 170 | messagebox.showwarning("Warning", "Please select at least one category.") 171 | return 172 | 173 | matches = self.find_matches(self.selected_columns, selected_categories) 174 | matches_filtered = [(identifier, term, category, col, text) for identifier, term, category, col, text in matches if col in self.selected_columns] 175 | output_file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 176 | if output_file_path: 177 | try: 178 | matches_df = pd.DataFrame(matches_filtered, columns=['Identifier', 'Term', 'Category', 'Column', 'Original Text']) 179 | matches_df.to_csv(output_file_path, index=False) 180 | messagebox.showinfo("Success", f"Merged data saved to: {output_file_path}") 181 | self.reset() 182 | except Exception as e: 183 | messagebox.showerror("Error", f"An error occurred while saving file: {e}") 184 | 185 | def toggle_columns(self): 186 | if self.all_columns_var.get(): 187 | self.column_listbox.selection_set(0, tk.END) 188 | self.column_listbox.config(state='disabled') 189 | else: 190 | self.column_listbox.selection_clear(0, tk.END) 191 | self.column_listbox.config(state='normal') 192 | 193 | def toggle_categories(self): 194 | if self.all_categories_var.get(): 195 | self.category_listbox.selection_set(0, tk.END) 196 | self.category_listbox.config(state='disabled') 197 | else: 198 | self.category_listbox.selection_clear(0, tk.END) 199 | self.category_listbox.config(state='normal') 200 | 201 | def get_selected_columns(self): 202 | if self.all_columns_var.get(): 203 | return self.columns 204 | else: 205 | return [self.columns[i] for i in self.column_listbox.curselection()] 206 | 207 | def get_selected_categories(self): 208 | return [self.categories[i] for i in self.category_listbox.curselection()] 209 | 210 | def find_matches(self, selected_columns, selected_categories): 211 | matches = [] 212 | lexicon_df = self.lexicon_df[self.lexicon_df['category'].isin(selected_categories)] 213 | 214 | for index, row in self.metadata_df.iterrows(): 215 | for col in selected_columns: 216 | if isinstance(row[col], str): 217 | for term, category in zip(lexicon_df['term'], lexicon_df['category']): 218 | # Check for multiple matches of terms within a single metadata cell 219 | if re.search(r'\b' + re.escape(term.lower()) + r'\b', row[col].lower()): 220 | 
matches.append((row[self.identifier_column], term, category, col, row[col])) 221 | 222 | return matches 223 | 224 | def back_to_main_frame(self): 225 | self.column_selection_frame.grid_remove() 226 | self.main_frame.grid() 227 | 228 | def back_to_column_selection(self): 229 | self.identifier_selection_frame.grid_remove() 230 | self.column_selection_frame.grid() 231 | 232 | def back_to_identifier_selection(self): 233 | self.category_selection_frame.grid_remove() 234 | self.identifier_selection_frame.grid() 235 | 236 | def reset(self): 237 | self.load_lexicon_button.config(state='normal') 238 | self.load_metadata_button.config(state='normal') 239 | self.lexicon_df = None 240 | self.metadata_df = None 241 | self.columns = [] 242 | self.categories = [] 243 | self.selected_columns = [] 244 | self.identifier_column = None 245 | self.next_button.grid_remove() 246 | self.explanation_label.grid() 247 | 248 | # Create and run the application 249 | app = MaRMAT() 250 | app.mainloop() 251 | -------------------------------------------------------------------------------- /Code/example-output-reparative-metadata-lexicon.csv: -------------------------------------------------------------------------------- 1 | Identifier,Term,Category,Column,Original Text 2 | 337805,aborigines,RaceTerms,title,Aborigines of Taiwan [001] 3 | 332408,aboriginal,RaceTerms,title,"Ainu (Japan's aboriginal people), Hokkaido, Japan [30]" 4 | 332408,wife,GenderTerms,description,"Photo of Japan's Aboriginal people (Chief, his wife and an unidentified person), Hokkaido, Japan" 5 | 332408,aboriginal,RaceTerms,description,"Photo of Japan's Aboriginal people (Chief, his wife and an unidentified person), Hokkaido, Japan" 6 | 335588,gentleman,Aggrandizement,description,"Photo of a Japanese gentleman holding a hand fan, Tokyo, Japan" 7 | 330740,dwarf,Disability,description,"Photograph of block print: ""A Potted Dwarf Pine with a Basin and a Towel on a Rack - Horse Talisman (Mayoke)"", also known as ""A surimono still-life composition"", (from the series A Set of Horses (Umazukushi), 1822) by Katsushika Hokusai (Japanese, 1760-1849), (approximate size, may vary slightly) 206 mm x 183 mm (8.11 in. x 7.20 in.)" 8 | 1533946,indians,RaceTerms,title,Busts of Ute Indians [1] 9 | 1533946,indian,RaceTerms,description,"Black-and-white photograph of a bust of an American Indian by Millard F. Malin, from a set commissioned by the State of Utah in 1934. His models were Ute Indians in the Uinta Basin." 10 | 1533946,indians,RaceTerms,description,"Black-and-white photograph of a bust of an American Indian by Millard F. Malin, from a set commissioned by the State of Utah in 1934. His models were Ute Indians in the Uinta Basin." 11 | 962277,indian,RaceTerms,description,"Photo taken at a court hearing or de-briefing following the American Indian Movement takeover at Wounded Knee, South Dakota, in 1973." 12 | 1498946,indian,RaceTerms,title,Spanish at Indian pueblo 13 | 1498946,indian,RaceTerms,description,"Photograph of an illustration in an unidentified publication, artist's rendition of a party of Spanish horsemen at an Indian pueblo, perhaps in New Mexico." 
14 | 995167,Native Americans,RaceTerms,title,Native Americans herding sheep on horseback 15 | 995167,indians,RaceTerms,description,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation 16 | 947066,squaw,RaceTerms,description,"Photo of a large wood sculpture at Palisades Tahoe (previously Squaw Valley) in Olympic Valley, California, depicting skiers. It was carved in 1995 " 17 | 958790,wife,GenderTerms,title,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960." 18 | 958790,wife,GenderTerms,description,"Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960" 19 | 998919,pioneer,Aggrandizement,title,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 20 | 998919,pioneer,Aggrandizement,description,"Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 21 | 1302623,indian,RaceTerms,title,"Basalt-capped mesa on Dolores (Triassic), 6± miles south of Beddehoche (Indian Wells), Ariz., 1909 (photo G-67)" 22 | 1302623,indian,RaceTerms,description,"Photograph of Black Butte, a basalt-capped mesa south of Indian Wells. From Herbert E. Gregory Book 2: Navajo-Hopi, San Juan 1909" 23 | 1302623,indian,RaceTerms,spatial coverage,"Black Butte (Navajo County, Ariz.); Five Buttes (Ariz.); Navajo County (Ariz.); Navajo Indian Reservation; Arizona" 24 | 995167,Native Americans,RaceTerms,title,Native Americans herding sheep on horseback 25 | 995167,indians,RaceTerms,description,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation 26 | 995167,indian,RaceTerms,spatial coverage,Navajo Indian Reservation 27 | 2364219,pioneer,Aggrandizement,title,"Pioneer Day parade, 1880 (Carter, photo)" 28 | 2364219,pioneer,Aggrandizement,description,"Black and white photograph of the Salt Lake Pioneer Day Parade, July 24, 1880." 29 | 941713,evacuees,JapaneseincarcerationTerm,title,Newly arrived evacuees standing behind their baggage. 
30 | 941713,evacuees,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 31 | 941713,relocation,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 32 | 941713,relocation center,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 33 | 941713,relocation,JapaneseincarcerationTerm,description,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II 34 | 941713,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese American Relocation Photograph Collection 35 | 941713,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese American Relocation Photograph Collection 36 | 941496,evacuees,JapaneseincarcerationTerm,title,Evacuees cleaning vegetables in the packing shed. 37 | 941496,evacuees,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 38 | 941496,relocation,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 39 | 941496,relocation center,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 40 | 941496,relocation,JapaneseincarcerationTerm,description,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II 41 | 941496,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 42 | 941496,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 43 | 941536,evacuees,JapaneseincarcerationTerm,title,Evacuees harvesting potatoes at Tule Lake. [5] 44 | 941536,evacuees,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 45 | 941536,relocation,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 46 | 941536,relocation center,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 47 | 941536,relocation,JapaneseincarcerationTerm,description,Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II 48 | 941536,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 49 | 941536,relocation,JapaneseincarcerationTerm,collection name,P0144 Japanese Relocation Photograph Collection 50 | 958469,wife,GenderTerms,title,"Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle the ""Challenger"" on the Bonneville Salt Flats Raceway in 1960." 
51 | 958469,wife,GenderTerms,description,"Photo of Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle, the ""Challenger,"" on the Bonneville Salt Flats Raceway in 1960" 52 | 958790,wife,GenderTerms,title,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960." 53 | 958790,wife,GenderTerms,description,"Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960" 54 | 998946,pioneer,Aggrandizement,title,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City" 55 | 998946,pioneer,Aggrandizement,description,"Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 56 | 998908,pioneer,Aggrandizement,title,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City" 57 | 998908,pioneer,Aggrandizement,description,"Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 58 | 998938,pioneer,Aggrandizement,title,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 59 | 998938,pioneer,Aggrandizement,description,"Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 60 | 998919,pioneer,Aggrandizement,title,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 61 | 998919,pioneer,Aggrandizement,description,"Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City" 62 | 2292054,racism,RaceEuphemisms,description,"Series of letters from Albert Fritz to the Salt Lake Police Department, Assistant Chief Ralph Knusden regarding peaceful demonstrations. NAACP call for protest on State Capitol building due to failure of the Utah Legislature to adopt civil rights legislation regarding housing and public accommodations (2005). NAACP appealed to the Governor of the state of Utah to include civil rights legislation on the docket for the special session of the Utah state Legislature, which he has announced. Local NAACP leaders noticed that Utah is the only northern state with no civil rights legislation (1968). Letter from Albert B. Fritz (Salt Lake NAACP) to Honorable George D. Clyde, Governor of Utah (1965?) regarding lack of civil rights legislation in Utah. Article from the Wall Street Journal (1964) ""Civil Rights Irony: New U.S. Agency's First Case Likely to Come From Utah"" by Donald Moffitt. Article discusses racism in Salt Lake City and highlights Chuck Nabors who moved to Salt Lake City to attend the University of Utah. Nabors rented an apartment sight unseen, when the landlord saw he was an African American the landlord backed out of lease. 
After Nabors found a landlord who would rent to him, neighbors petitioned to have him leave." 63 | 1396789,ethnic,RaceTerms,description,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speakeing migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""" 64 | 1396777,racial,RaceEuphemisms,description,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs dated April 5, 1979; and a bibliography of discrimination." 65 | 1396777,indian,RaceTerms,description,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs dated April 5, 1979; and a bibliography of discrimination." 66 | 1396777,indians,RaceTerms,description,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs dated April 5, 1979; and a bibliography of discrimination." 67 | 1396789,ethnic,RaceTerms,description,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speakeing migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""" 68 | 893658,blacks,RaceTerms,collection name,"Ms0453, Interviews with Blacks in Utah, 1982-1988" 69 | 946265,pioneer,Aggrandizement,title,"Arnold Lunn, left, ski historian and author of the book, The Story of Ski-ing, 1952. And Hjalmar Hvam, right, ski pioneer and inventor of America's first safety binding in 1937." 70 | 947346,pioneer,Aggrandizement,title,"Utah ski pioneer Mel Fletcher skiing on a pair of his homemade ""Barrel Staves,"" circa 1952." 
71 | 947346,pioneer,Aggrandizement,description,"Photo of Mel Fletcher, skiing pioneer and ski instructor, on homemade skis" 72 | 2509070,crippled,Disability,title," ""Captive Jewels--Our Crippled Children"" speech, retyped" 73 | -------------------------------------------------------------------------------- /Code/Test Versions/RMA-GUI-2.5.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | from tkinter import filedialog, messagebox, ttk 3 | import pandas as pd 4 | import re 5 | import threading 6 | import queue 7 | 8 | class ReparativeMetadataAuditTool(tk.Tk): 9 | def __init__(self): 10 | super().__init__() 11 | self.title("Reparative Metadata Audit Tool") 12 | 13 | # Initialize variables 14 | self.lexicon_df = None 15 | self.metadata_df = None 16 | self.columns = [] 17 | self.categories = [] 18 | self.selected_columns = [] 19 | self.identifier_column = None 20 | self.category_selection_page_active = False 21 | 22 | # Create main frame 23 | self.main_frame = ttk.Frame(self) 24 | self.main_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 25 | 26 | # Explanation text 27 | explanation_text = ( 28 | "Welcome to the Reparative Metadata Audit Tool!\n\n" 29 | "This tool allows you to match terms from a problematic terms lexicon file with text data from a collections metadata file.\n\n" 30 | "Please follow the steps below:\n\n" 31 | "1. Load your lexicon and metadata files using the provided buttons.\n" 32 | "2. Select the columns from your metadata file that you want to analyze.\n" 33 | "3. Choose the column in your metadata file that you want to use as the 'Identifier' column.\n" 34 | "4. Choose the categories of terms from the lexicon that you want to search for.\n" 35 | "5. Click 'Perform Matching' to find matches and export the results to a CSV file.\n\n" 36 | "Let's get started!" 
37 | ) 38 | 39 | self.explanation_label = ttk.Label(self.main_frame, text=explanation_text, justify='left') 40 | self.explanation_label.grid(row=0, column=0, columnspan=3, padx=20, pady=20, sticky="nsew") 41 | 42 | # Load lexicon button 43 | self.load_lexicon_button = ttk.Button(self.main_frame, text="Load Lexicon", command=self.load_lexicon) 44 | self.load_lexicon_button.grid(row=1, column=0, padx=10, pady=10, sticky="w") 45 | 46 | # Load metadata button 47 | self.load_metadata_button = ttk.Button(self.main_frame, text="Load Metadata", command=self.load_metadata) 48 | self.load_metadata_button.grid(row=1, column=1, padx=10, pady=10, sticky="w") 49 | 50 | # Reset button 51 | self.reset_button = ttk.Button(self.main_frame, text="Reset", command=self.reset) 52 | self.reset_button.grid(row=1, column=2, padx=10, pady=10, sticky="w") 53 | 54 | # Next button 55 | self.next_button = ttk.Button(self.main_frame, text="Next", command=self.show_column_selection) 56 | self.next_button.grid(row=2, column=0, columnspan=3, pady=10, sticky="nsew") 57 | self.next_button.grid_remove() # Hide next button initially 58 | 59 | # Progress bar 60 | self.progress_bar = ttk.Progressbar(self.main_frame, orient="horizontal", mode="determinate") 61 | self.progress_bar.grid(row=3, column=0, columnspan=3, pady=10, padx=10, sticky="ew") 62 | self.progress_bar.grid_remove() # Hide progress bar initially 63 | 64 | # Queue for thread communication 65 | self.matching_queue = queue.Queue() 66 | self.matching_thread = None 67 | self.check_queue_job = None 68 | 69 | def load_lexicon(self): 70 | file_path = filedialog.askopenfilename(filetypes=[("CSV and TSV files", "*.csv *.tsv")]) 71 | if file_path: 72 | try: 73 | if file_path.endswith('.csv'): 74 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1') 75 | elif file_path.endswith('.tsv'): 76 | self.lexicon_df = pd.read_csv(file_path, encoding='latin1', sep='\t') 77 | messagebox.showinfo("Success", "Lexicon loaded successfully.") 78 | self.load_lexicon_button.config(state='disabled') 79 | except Exception as e: 80 | messagebox.showerror("Error", f"An error occurred while loading lexicon: {e}") 81 | 82 | def load_metadata(self): 83 | file_path = filedialog.askopenfilename(filetypes=[("CSV and TSV files", "*.csv *.tsv")]) 84 | if file_path: 85 | try: 86 | if file_path.endswith('.csv'): 87 | self.metadata_df = pd.read_csv(file_path, encoding='latin1') 88 | elif file_path.endswith('.tsv'): 89 | self.metadata_df = pd.read_csv(file_path, encoding='latin1', sep='\t') 90 | messagebox.showinfo("Success", "Metadata loaded successfully.") 91 | self.load_metadata_button.config(state='disabled') 92 | self.next_button.grid() 93 | except Exception as e: 94 | messagebox.showerror("Error", f"An error occurred while loading metadata: {e}") 95 | 96 | def show_column_selection(self): 97 | if self.lexicon_df is None or self.metadata_df is None: 98 | messagebox.showwarning("Warning", "Please load lexicon and metadata files first.") 99 | return 100 | 101 | # Populate columns listbox 102 | self.columns = self.metadata_df.columns.tolist() 103 | self.column_selection_frame = ttk.Frame(self) 104 | self.column_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 105 | 106 | self.column_label = ttk.Label(self.column_selection_frame, text="Select Columns to Analyze:") 107 | self.column_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 108 | 109 | self.column_listbox = tk.Listbox(self.column_selection_frame, selectmode='multiple') 110 | self.column_listbox.grid(row=1, 
column=0, padx=10, pady=5, sticky="nsew") 111 | for column in self.columns: 112 | self.column_listbox.insert(tk.END, column) 113 | 114 | self.all_columns_var = tk.BooleanVar(value=False) 115 | self.all_columns_checkbox = ttk.Checkbutton(self.column_selection_frame, text="All", variable=self.all_columns_var, command=self.toggle_columns) 116 | self.all_columns_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 117 | 118 | self.next_button_columns = ttk.Button(self.column_selection_frame, text="Next", command=self.show_identifier_selection) 119 | self.next_button_columns.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 120 | 121 | self.back_button_columns = ttk.Button(self.column_selection_frame, text="Back", command=self.back_to_main_frame) 122 | self.back_button_columns.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 123 | self.back_button_columns.grid_remove() # Hide back button initially 124 | 125 | def show_identifier_selection(self): 126 | self.selected_columns = self.get_selected_columns() # Store selected columns 127 | if not self.selected_columns: 128 | messagebox.showwarning("Warning", "Please select at least one column.") 129 | return 130 | 131 | self.column_selection_frame.grid_remove() 132 | 133 | self.identifier_selection_frame = ttk.Frame(self) 134 | self.identifier_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 135 | 136 | self.identifier_label = ttk.Label(self.identifier_selection_frame, text="Select Identifier Column:") 137 | self.identifier_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 138 | 139 | self.identifier_var = tk.StringVar() 140 | self.identifier_dropdown = ttk.Combobox(self.identifier_selection_frame, textvariable=self.identifier_var, state='readonly') 141 | self.identifier_dropdown.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 142 | self.identifier_dropdown['values'] = self.metadata_df.columns.tolist() # Show all columns as options 143 | self.identifier_dropdown.current(0) # Select first column by default 144 | 145 | self.next_button_identifier = ttk.Button(self.identifier_selection_frame, text="Next", command=self.show_category_selection) 146 | self.next_button_identifier.grid(row=2, column=0, padx=10, pady=10, sticky="nsew") 147 | 148 | self.back_button_identifier = ttk.Button(self.identifier_selection_frame, text="Back", command=self.back_to_column_selection) 149 | self.back_button_identifier.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 150 | 151 | def show_category_selection(self): 152 | self.identifier_column = self.identifier_var.get() 153 | 154 | self.identifier_selection_frame.grid_remove() 155 | 156 | self.category_selection_frame = ttk.Frame(self) 157 | self.category_selection_frame.grid(row=0, column=0, padx=20, pady=20, sticky="nsew") 158 | 159 | self.category_label = ttk.Label(self.category_selection_frame, text="Select Categories to Analyze:") 160 | self.category_label.grid(row=0, column=0, padx=10, pady=5, sticky="w") 161 | 162 | self.category_listbox = tk.Listbox(self.category_selection_frame, selectmode='multiple') 163 | self.category_listbox.grid(row=1, column=0, padx=10, pady=5, sticky="nsew") 164 | self.categories = self.lexicon_df['category'].unique().tolist() 165 | for category in self.categories: 166 | self.category_listbox.insert(tk.END, category) 167 | 168 | self.all_categories_var = tk.BooleanVar(value=False) 169 | self.all_categories_checkbox = ttk.Checkbutton(self.category_selection_frame, text="All", variable=self.all_categories_var, 
command=self.toggle_categories) 170 | self.all_categories_checkbox.grid(row=2, column=0, padx=10, pady=5, sticky="w") 171 | 172 | self.next_button_categories = ttk.Button(self.category_selection_frame, text="Perform Matching", command=self.perform_matching) 173 | self.next_button_categories.grid(row=3, column=0, padx=10, pady=10, sticky="nsew") 174 | 175 | self.back_button_categories = ttk.Button(self.category_selection_frame, text="Back", command=self.back_to_identifier_selection) 176 | self.back_button_categories.grid(row=4, column=0, padx=10, pady=10, sticky="nsew") 177 | 178 | def perform_matching(self): 179 | selected_categories = self.get_selected_categories() 180 | if not selected_categories: 181 | messagebox.showwarning("Warning", "Please select at least one category.") 182 | return 183 | 184 | self.progress_bar.grid() 185 | self.progress_bar.start() 186 | 187 | self.matched_results = [] 188 | 189 | def match_terms(): 190 | total_rows = len(self.metadata_df) 191 | for idx, row in self.metadata_df.iterrows(): 192 | for col in self.selected_columns: 193 | cell_value = str(row[col]) 194 | for _, lexicon_row in self.lexicon_df.iterrows(): 195 | term = lexicon_row['term'] 196 | category = lexicon_row['category'] 197 | if category in selected_categories: 198 | if re.search(rf'\b{re.escape(term)}\b', cell_value, re.IGNORECASE): 199 | match_info = { 200 | 'Identifier': row[self.identifier_column], 201 | 'Column': col, 202 | 'Term': term, 203 | 'Category': category, 204 | 'Original Text': cell_value 205 | } 206 | self.matched_results.append(match_info) 207 | self.matching_queue.put((idx + 1) / total_rows * 100) 208 | 209 | def process_queue(): 210 | try: 211 | while True: 212 | progress = self.matching_queue.get_nowait() 213 | self.progress_bar['value'] = progress 214 | self.update_idletasks() 215 | except queue.Empty: 216 | if not self.matching_thread.is_alive(): 217 | self.progress_bar.stop() 218 | self.progress_bar.grid_remove() 219 | if self.matched_results: 220 | self.export_results() 221 | else: 222 | messagebox.showinfo("No Matches", "No matches found.") 223 | else: self.check_queue_job = self.after(100, process_queue)  # keep polling until the matching thread finishes 224 | 225 | self.matching_thread = threading.Thread(target=match_terms) 226 | self.matching_thread.start() 227 | self.check_queue_job = self.after(100, process_queue) 228 | 229 | def export_results(self): 230 | results_df = pd.DataFrame(self.matched_results) 231 | save_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")]) 232 | if save_path: 233 | try: 234 | results_df.to_csv(save_path, index=False, encoding='utf-8') 235 | messagebox.showinfo("Success", "Results exported successfully.") 236 | except Exception as e: 237 | messagebox.showerror("Error", f"An error occurred while exporting results: {e}") 238 | 239 | def get_selected_columns(self): 240 | if self.all_columns_var.get(): 241 | return self.columns 242 | selected_indices = self.column_listbox.curselection() 243 | return [self.columns[i] for i in selected_indices] 244 | 245 | def get_selected_categories(self): 246 | if self.all_categories_var.get(): 247 | return self.categories 248 | selected_indices = self.category_listbox.curselection() 249 | return [self.categories[i] for i in selected_indices] 250 | 251 | def toggle_columns(self): 252 | if self.all_columns_var.get(): 253 | self.column_listbox.selection_set(0, tk.END) 254 | else: 255 | self.column_listbox.selection_clear(0, tk.END) 256 | 257 | def toggle_categories(self): 258 | if self.all_categories_var.get(): 259 |
self.category_listbox.selection_set(0, tk.END) 260 | else: 261 | self.category_listbox.selection_clear(0, tk.END) 262 | 263 | def reset(self): 264 | self.destroy() 265 | self.__init__() 266 | self.mainloop() 267 | 268 | def back_to_main_frame(self): 269 | self.column_selection_frame.grid_remove() 270 | self.main_frame.grid() 271 | 272 | def back_to_column_selection(self): 273 | self.identifier_selection_frame.grid_remove() 274 | self.column_selection_frame.grid() 275 | 276 | def back_to_identifier_selection(self): 277 | self.category_selection_frame.grid_remove() 278 | self.identifier_selection_frame.grid() 279 | 280 | if __name__ == "__main__": 281 | app = ReparativeMetadataAuditTool() 282 | app.mainloop() 283 | 284 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | **MaRMAT Beta is now obsolete. For the most current version of MaRMAT, see https://github.com/marriott-library/MaRMAT.** 2 | 3 | # Marriott Reparative Metadata Assessment Tool (MaRMAT) - Beta 4 | 5 | The Marriott Reparative Metadata Assessment Tool (MaRMAT) is a Python application designed for auditing collections metadata files against a lexicon of potentially problematic terms. The tool's design facilitates an easy-to-follow assessment process. For PC users, we provide a graphical interface for file loading, column selection, and term matching, making it user-friendly for those with limited programming experience. The tool can also be run from your command line. 6 | 7 | Note: Whether you are using the GUI for Windows or the command-line tool for MacOS, you will need to have [Python](https://docs.python.org/3/) and the `pandas` library for Python installed on your machine. Installation instructions for `pandas` are provided below in the respective **Dependencies** sections for the GUI and command-line tool. 8 | 9 | We value your feedback! Please take [this survey](https://docs.google.com/forms/d/e/1FAIpQLSfaABD5qsU2trEjrDWs3MoytgiNCaD08GJvRWzqhgzv5GjoDg/viewform?usp=sf_link) to tell us about your experience using MaRMAT. 10 | 11 | 12 | ## **Table of Contents** 13 | 1. [Project Background](#1-project-background) 14 | 15 | 1.1 [About the Tool](#11-about-the-tool) 16 | 17 | 1.2 [The Lexicons](#12-the-lexicons) 18 | 19 | 1.3 [Features](#13-features) 20 | 21 | 1.4 [Example Outputs and Tutorial](#14-example-outputs-and-tutorial) 22 | 23 | 2. [The GUI for Windows Users](#2-the-gui-for-windows-users) 24 | 25 | 2.1 [Usage](#21-usage) 26 | 27 | 2.2 [Dependencies](#22-dependencies) 28 | 29 | 2.3 [Installation](#23-installation) 30 | 31 | 2.4 [Troubleshooting](#24-troubleshooting) 32 | 33 | 3. [The Command-Line Tool](#3-the-command-line-tool) 34 | 35 | 3.1 [Usage](#31-usage) 36 | 37 | 3.2 [Dependencies](#32-dependencies) 38 | 39 | 3.3 [Notes](#33-notes) 40 | 41 | 4. [Credits and Acknowledgments](#4-credits-and-acknowledgments) 42 | 5. [User Feedback Survey](#5-user-feedback-survey) 43 | 44 | ## 1. Project Background 45 | Identifying potentially harmful language and problematic or outdated Library of Congress Subject Headings is one step toward reparative metadata practices. Deciding what and how to change this metadata, however, is up to metadata practitioners and involves awareness, education, and sensitivity to the communities and history reflected in digital collections.
The [Inclusive Metadata Toolkit](https://osf.io/2nmpc/), created by the Digital Library Federation's Cultural Assessment Working Group, provides resources to educate and assist in reparative metadata decision-making. 46 | 47 | The Marriott Reparative Metadata Assessment Tool (MaRMAT) is based on [Duke University's Description Audit Tool](https://github.com/duke-libraries/description-audit). It is intended to assist digital collections metadata practitioners in bulk analysis of metadata collections to identify potentially harmful language in descriptions and to facilitate repairing metadata to reflect current and preferred terminology. While Duke University's Description Audit Tool was created to analyze MARC XML and EAD finding aid metadata, MaRMAT was developed to analyze metadata in a spreadsheet format, allowing for assessment of Dublin Core and other metadata schemas, since it requires only key column-header names. In addition, the script has been altered to provide more custom querying capabilities. 48 | 49 | ### 1.1 About the Tool 50 | At the most basic level, [MaRMAT](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-GUI-2.5.3.py) is designed to match terms from a lexicon with textual data and produce a CSV file containing the matched results. It utilizes the pandas library for data manipulation and regular expressions for text processing. It was designed primarily with librarians in mind, specifically those engaged in reparative metadata practices, to assist in identifying terms in their metadata that may be outdated, biased, or otherwise problematic. The underlying code (including preliminary iterations) and sample lexicons for using the tool can be accessed via the [Code](https://github.com/marriott-library/MaRMAT/tree/main/Code) folder of this repository. For additional information about the GUI, see [GUI-Documentation](https://github.com/marriott-library/MaRMAT/blob/main/Code/GUI-Documentation.md). 51 | 52 | ### 1.2 The Lexicons 53 | There are two lexicons provided to help begin your reparative metadata assessment. Not all of the terms in these lexicons may need remediation; rather, they may signal areas of your collections that should be reviewed carefully. Users may download the provided lexicons to use in MaRMAT as is, remove terms that may not be problematic in your metadata, or add additional terms and categories based on specific project needs. The only requirement for a lexicon to work against another file is that the CSV file contain two columns: "Term" and "Category" (case sensitive). Therefore, the tool's use is not limited to assessing metadata for problematic terms; it may also be loaded with a custom lexicon to perform matching against a variety of content types. 54 | 55 | | Lexicon | Description | 56 | | :----------:| ---------- | 57 | | Reparative Metadata Lexicon | The [Reparative Metadata Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/lexicon-reparative-metadata.csv) includes potentially harmful terminology organized by category and is best suited for uncontrolled metadata fields (e.g., Title, Description). This lexicon has been adapted from Duke University's lexicons, which were created for similar use cases.
For the Marriott Reparative Metadata Assessment Tool (MaRMAT), Duke's [lexicons](https://github.com/duke-libraries/description-audit/tree/main/lexicons) were modified by transposing across their category columns to create a single lexicon (term, category) that better accommodates users adding additional terms and categories without having to adjust the underlying code structure. | 58 | | Library of Congress Subject Heading (LCSH) Lexicon | The [LCSH Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/lexicon-LCSH.csv) includes selected changed and canceled LCSH (mostly from 2023) and headings that have been identified as problematic. The LCSH Lexicon is best suited to run against the Subject metadata field, or other fields that contain LCSH terms. | 59 | 60 | ### 1.3 Features 61 | - Load lexicon and metadata files in CSV format. 62 | - Select columns from the metadata file for analysis. 63 | - Choose the column in the metadata file to be used as the "Identifier" column so that the output can be reconciled with the original metadata file. 64 | - Select categories of terms from the lexicon for analysis. 65 | - Perform matching to find matches between selected columns and categories. 66 | - Export results to a CSV file. 67 | 68 | ### 1.4 Example Outputs and Tutorial 69 | 70 | To provide users with a sense of what to expect from running MaRMAT against their own metadata collections, below are example metadata to load and query against the provided lexicons, along with example outputs from each lexicon: 71 | 1. [Example Input: Potentially Problematic Metadata](https://github.com/marriott-library/MaRMAT/blob/main/Code/example-input-metadata.csv) 72 | 2. [Example Output: Reparative Metadata Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/example-output-reparative-metadata-lexicon.csv) 73 | 3. [Example Output: LCSH Lexicon](https://github.com/marriott-library/MaRMAT/blob/main/Code/example-output-lcsh-subject-lexicon.csv) 74 | 75 | Please keep in mind that these reports are just snippets of larger reports. Users should be aware that there may be false positives or results that may not need remediation. For example, the LCSH term "Race" is considered a problem heading, but MaRMAT may flag other headings containing "race," as in "Bonneville Salt Flats Race, Utah." Likewise, the gender term "wife" may not always signal an unnamed woman, and terms that may be harmful in some contexts may not be in others. Therefore, we stress the importance of human review and intervention prior to making broad conclusions or global changes based on MaRMAT outputs. 76 | 77 | To assist in getting started with MaRMAT, there is also a [video tutorial](https://youtu.be/uspAoqfj99g?si=jQArVdlbGm_qN78l) that demonstrates the first steps in using the GUI for Windows (subtitles can be enabled in settings). The MacOS demonstration is also available [on video](https://youtu.be/j_fFplU1W_o); please note that audio for this demo is coming soon. 78 | 79 | ## 2. The GUI for Windows Users 80 | To facilitate wider use, the [MaRMAT GUI](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-GUI-2.5.3.py) allows users to easily load a lexicon and a metadata file, select a key column (i.e., Identifier) to use in reconciling matches, and choose the columns and categories they'd like to perform matching on. 81 | 82 | *Note: The GUI is not compatible with MacOS. Additional information on the MaRMAT GUI is available [here](https://github.com/marriott-library/MaRMAT/blob/main/Code/GUI-Documentation.md).*
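Under the hood, the GUI and the command-line tool share the same core matching step: each selected metadata cell is scanned for each lexicon term with a case-insensitive, word-boundary regular expression. The snippet below is a minimal sketch of that logic, not the tool itself; the file names are hypothetical, and the column names assume the bundled lexicon header (`term`, `category`) and the `id`, `title`, and `description` columns of the example metadata.

```python
import re
import pandas as pd

# Hypothetical file names, for illustration only.
lexicon = pd.read_csv("lexicon.csv")    # assumes columns: term, category
metadata = pd.read_csv("metadata.csv")  # assumes columns: id, title, description, ...

matches = []
for _, record in metadata.iterrows():
    for col in ["title", "description"]:  # the columns selected for analysis
        text = str(record[col])
        for _, lex in lexicon.iterrows():
            # \b word boundaries match whole words only, case-insensitively
            pattern = rf"\b{re.escape(str(lex['term']))}\b"
            if re.search(pattern, text, re.IGNORECASE):
                matches.append({
                    "Identifier": record["id"],
                    "Term": lex["term"],
                    "Category": lex["category"],
                    "Column": col,
                    "Original Text": text,
                })

pd.DataFrame(matches).to_csv("matching_results.csv", index=False)
```

Note that the word boundaries prevent substring hits (e.g., "race" inside "racetrack") but not contextual false positives such as "Bonneville Salt Flats Race," which is why human review of the output remains essential.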
83 | 84 | ### 2.1 Usage 85 | 1. Loading Files: 86 | - Click on the "Load Lexicon" button to load the lexicon file. 87 | - Click on the "Load Metadata" button to load the metadata file. 88 | 89 | 2. Selecting Columns: 90 | - After loading files, click "Next" to proceed to column selection. 91 | - Select the columns from the metadata file that you want to analyze. 92 | 93 | 3. Selecting Identifier Column: 94 | - After selecting columns, choose the column in the metadata file that will serve as the key column or "Identifier" column, such as a record ID. 95 | 96 | 4. Selecting Categories: 97 | - Next, choose the categories of terms from the lexicon that you want to search for. 98 | 99 | 5. Performing Matching: 100 | - Click "Perform Matching" to find matches between selected columns and categories. 101 | - The results will be exported to a CSV file. 102 | 103 | ### 2.2 Dependencies 104 | - **[Python 3.x](https://docs.python.org/3/)**: Python is a widely used high-level programming language for general-purpose programming. 105 | 106 | - **[Tkinter](https://docs.python.org/3/library/tk.html)**: Tkinter is Python's standard GUI (Graphical User Interface) package. It is used to create desktop applications with a graphical interface. It is usually included with Python distributions, so no separate installation is required. 107 | 108 | - **[re](https://docs.python.org/3/library/re.html)**: This module provides regular expression matching operations. It's a built-in module in Python and doesn't require separate installation. 109 | 110 | - **[pandas](https://pandas.pydata.org/docs/)**: Pandas is a Python library that provides easy-to-use data structures and data analysis tools for manipulating and analyzing structured data, particularly tabular data. Pandas can be installed using pip in your command line interface: ``py -m pip install pandas`` 111 | 112 | *Note: These dependencies are essential for running MaRMAT. If you don't have Python installed, you can download it from the [official Python website](https://www.python.org/downloads).* 113 | 114 | ### 2.3 Installation 115 | No installation is required. Simply follow the steps below to download and run the Python script to start the application on your PC. 116 | 117 | 1. Download the Python Script: 118 | - Download the [MaRMAT-GUI-2.5.3.py](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-GUI-2.5.3.py) script to a location on your PC where you can easily find it, such as your Desktop or Downloads. 119 | 120 | 2. Ensure Python is Installed: 121 | - To make sure that Python is installed on your PC, search for "Python" in your Start Menu or look for the Python folder in your Program Files. 122 | - If Python is not installed, you can download and install it from the official [Python website](https://www.python.org/downloads). 123 | 124 | 3. Double-Click the Python Script: 125 | - Navigate to the location where you downloaded the script. 126 | - Double-click on the script file (i.e., MaRMAT-GUI-2.5.3.py). 127 | 128 | 4. Application Starts: 129 | - The application should start running automatically; the GUI will appear on your screen. 130 | 131 | ### 2.4 Troubleshooting 132 | The GUI should automatically open when you open the Python code file. If you are having issues with the GUI opening, try opening the file in Python IDLE and running it. IDLE should give you an error message with insights as to why it is not loading correctly.
If you are receiving error messages related to ``pandas``, such as ``No module named 'pandas'``, follow these steps to install ``pandas``. 133 | 134 | 1. Open your command line interface. 135 | 136 | 2. Type the following into the command line: ``py -m pip install pandas`` 137 | 138 | 3. Press Enter to run the command. 139 | 140 | If this process does not resolve your issue, follow these Getting Started tips to make sure Python and the pip installer are running correctly on your PC: [https://pip.pypa.io/en/stable/getting-started](https://pip.pypa.io/en/stable/getting-started/) 141 | 142 | ## 3. The Command-Line Tool 143 | The [MaRMAT command-line tool](https://github.com/marriott-library/MaRMAT/blob/main/Code/MaRMAT-CommandLine-2.6.py) can be run by any user from their command line. 144 | 145 | ### 3.1 Usage 146 | 1. Install Python if not already installed (Python 3.x recommended). 147 | 148 | 2. Clone or download the MaRMAT repository. 149 | 150 | 3. Use the command-line interface to navigate to the directory where you saved the files (e.g., `Downloads`, `Desktop`). For example, run `cd Downloads` to change your directory to your `Downloads` folder. 151 | 152 | 4. Run the tool in your command line using the following command: ```python3 MaRMAT-CommandLine-2.6.py``` 153 | 154 | 5. Follow the prompts in your command line to provide the paths to the lexicon and metadata files. 155 | 156 | 6. Follow the prompts to input the names of the columns you want to analyze in the metadata file, the name of the column that should be used as the identifier or key column, and the categories of terms from the lexicon that you want to search for in your metadata file. Note: inputs are case sensitive. 157 | 158 | 7. Follow the prompt to provide the path you would like to save your output to. 159 | 160 | 8. Review the matching results displayed on the console or in the generated CSV file. 161 | 162 | *Note: Demonstration video coming soon* 163 | 164 | ### 3.2 Dependencies 165 | 166 | - **[Python 3.x](https://docs.python.org/3/)**: Python is a widely used high-level programming language for general-purpose programming. 167 | 168 | - **[pandas](https://pandas.pydata.org/docs/)**: Pandas is a Python library that provides easy-to-use data structures and data analysis tools for manipulating and analyzing structured data, particularly tabular data. Pandas can be installed using pip in Terminal: `pip install pandas` 169 | 170 | - **[re](https://docs.python.org/3/library/re.html)**: This module provides regular expression matching operations. It's a built-in module in Python and doesn't require separate installation. 171 | 172 | *Note: These dependencies are necessary to run the provided code successfully. Ensure that you have them installed before running the code.* 173 | 174 | ### 3.3 Notes 175 | - Ensure that both the lexicon and metadata files are in CSV format. 176 | - The lexicon file should contain columns for terms and their corresponding categories ("Term","Category"). 177 | - The metadata file should contain the text data to be analyzed, with each row representing a separate entry. 178 | - The metadata file should contain a column, such as a Record ID, that you can use as an "Identifier" to reconcile the tool's output with your original metadata. 179 | - The tool outputs matching results to a CSV file named "matching_results.csv" in the tool's directory. 180 | 181 | ## 4.
Credits and Acknowledgments 182 | Code developed by [Kaylee Alexander](https://github.com/kayleealexander) in collaboration with ChatGPT 3.5, [Rachel Wittmann](https://github.com/RachelJaneWittmann), and [Anna Neatrour](https://github.com/aneatrour) at the University of Utah's J. Willard Marriott Library. MaRMAT Beta was released in July 2024. 183 | 184 | This tool was inspired by the Duke University Libraries Description Audit Tool, developed by [Noah Huffman](https://github.com/noahgh221) at the Rubenstein Library, and expanded by [Miriam Shams-Rainey](https://github.com/mshamsrainey) (see [Description-Audit](https://github.com/duke-libraries/description-audit/tree/main)). 185 | 186 | ## 5. User Feedback Survey 187 | After using MaRMAT, please take [this survey](https://docs.google.com/forms/d/e/1FAIpQLSfaABD5qsU2trEjrDWs3MoytgiNCaD08GJvRWzqhgzv5GjoDg/viewform?usp=sf_link) and tell us about your experience using MaRMAT. We appreciate your feedback! 188 | -------------------------------------------------------------------------------- /Code/lexicon-LCSH.csv: -------------------------------------------------------------------------------- 1 | term,category 2 | Afghanistan--Politics and government--2001-,ChangeHeadingLCSH 3 | African American bisexuals,ChangeHeadingLCSH 4 | African American gays,ChangeHeadingLCSH 5 | African American gays in literature,ChangeHeadingLCSH 6 | Aged,ChangeHeadingLCSH 7 | "Al-Aqsa Intifada, 2000-",ChangeHeadingLCSH 8 | Aliens,ChangeHeadingLCSH 9 | Ammassalimiut dialect,ChangeHeadingLCSH 10 | Amycus (Greek mythology),CancelHeadingLCSH 11 | Anaxarete (Greek mythology),CancelHeadingLCSH 12 | Anaxarete (Greek mythology) in literature,CancelHeadingLCSH 13 | Anime,CancelHeadingLCSH 14 | "Anti-war poetry, Oriya",ChangeHeadingLCSH 15 | Antifa (Organisation),ProblemLCSH 16 | "Argentina--History--Dirty War, 1976-1983",ProblemLCSH 17 | Asian American bisexuals,ChangeHeadingLCSH 18 | Asian American gays,ChangeHeadingLCSH 19 | Asian flu,ProblemLCSH 20 | Asperger's syndrome,ProblemLCSH 21 | Attention-deficit disorder in adolescence,CancelHeadingLCSH 22 | Attention-deficit disorder in adults,CancelHeadingLCSH 23 | Attention-deficit-disordered children,CancelHeadingLCSH 24 | Attention-deficit-disordered parents,ChangeHeadingLCSH 25 | Attention-deficit-disordered youth,ChangeHeadingLCSH 26 | B.C.
and A.D.,ProblemLCSH 27 | B17 (Steam locomotive),ChangeHeadingLCSH 28 | Baucis (Greek mythology),CancelHeadingLCSH 29 | Bellerophon (Greek mythology),CancelHeadingLCSH 30 | Berbers,ProblemLCSH 31 | Bildungsromans,ProblemLCSH 32 | "Bird, Mount (Ross Island, Ross Sea, Antarctica)",ChangeHeadingLCSH 33 | Bisexuals,ChangeHeadingLCSH 34 | Bisexuals' writings,ChangeHeadingLCSH 35 | "Bisexuals' writings, American",ChangeHeadingLCSH 36 | "Bisexuals' writings, Canadian",ChangeHeadingLCSH 37 | "Bisexuals' writings, English",ChangeHeadingLCSH 38 | Blacks,ChangeHeadingLCSH 39 | Boat people,ProblemLCSH 40 | Bossiness,ProblemLCSH 41 | Boys love (Gay erotica),ChangeHeadingLCSH 42 | "Brazil--History--Revolution, 1964",ProblemLCSH 43 | Brecon Beacons National Park (Wales),ChangeHeadingLCSH 44 | Buddhist gays,ChangeHeadingLCSH 45 | Bukit Baka-Bukit Raya National Park (Indonesia),ChangeHeadingLCSH 46 | Candra (Hindu deity),CancelHeadingLCSH 47 | "Cang Lang Ting (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 48 | Catholic gays,ChangeHeadingLCSH 49 | Cerebral palsied,ProblemLCSH 50 | Cheju Tol Munhwa Kongwŏn (Korea),ChangeHeadingLCSH 51 | Child pornography,ProblemLCSH 52 | Children of attention-deficit-disordered parents,ChangeHeadingLCSH 53 | Children of egg donors,CancelHeadingLCSH 54 | Children of sperm donors,CancelHeadingLCSH 55 | "Children's literature, Oriya",ChangeHeadingLCSH 56 | "Children's poetry, Oriya",ChangeHeadingLCSH 57 | Christian gays,ChangeHeadingLCSH 58 | Church work with African American gays,ChangeHeadingLCSH 59 | Church work with attention-deficit-disordered youth,ChangeHeadingLCSH 60 | Church work with gays,ChangeHeadingLCSH 61 | Civil procedure (Bantu law),CancelHeadingLCSH 62 | Climatic changes,ProblemLCSH 63 | Closeted gays,ChangeHeadingLCSH 64 | Closeted gays in literature,ChangeHeadingLCSH 65 | Common fallacies,ProblemLCSH 66 | Crimea (Ukraine)--History--2014-,ChangeHeadingLCSH 67 | Criminals,ProblemLCSH 68 | Cripples,ChangeHeadingLCSH 69 | Cross-dressing,ProblemLCSH 70 | Crystalline lens,ProblemLCSH 71 | Dattātreya (Hindu deity),CancelHeadingLCSH 72 | Deaf gays,ChangeHeadingLCSH 73 | Defloration,ProblemLCSH 74 | Devakī (Hindu mythology),CancelHeadingLCSH 75 | Deviant behavior,ProblemLCSH 76 | Dewbow,ChangeHeadingLCSH 77 | "Didactic poetry, Oriya",ChangeHeadingLCSH 78 | Discovery and exploration,ProblemLCSH 79 | Discrimination against overweight persons,ProblemLCSH 80 | Domestic relations,ProblemLCSH 81 | "Donetsk International Airport, Battle for, 2nd, Ukraine, 2014-2015",ChangeHeadingLCSH 82 | Dwarfs (Persons),ProblemLCSH 83 | East Indians,ProblemLCSH 84 | Eskimos,ProblemLCSH 85 | Ethnic arts,ProblemLCSH 86 | Evaluative (Linguistics),ChangeHeadingLCSH 87 | Ex-gays,ChangeHeadingLCSH 88 | Expulsion of the Mormons,ChangeHeadingLCSH 89 | Female circumcision,ChangeHeadingLCSH 90 | Feminine hygiene products,ChangeHeadingLCSH 91 | Feminine hygiene products industry,ChangeHeadingLCSH 92 | Fetishism,ProblemLCSH 93 | Florida Trail (Fla.),CancelHeadingLCSH 94 | Fogbow,ChangeHeadingLCSH 95 | "Folk drama, Oriya",ChangeHeadingLCSH 96 | "Folk literature, Oriya",ChangeHeadingLCSH 97 | "Folk songs, Dunsun",ChangeHeadingLCSH 98 | "Folk songs, Kalâtdlisut",ChangeHeadingLCSH 99 | "Folk songs, Karok",ChangeHeadingLCSH 100 | Freedmen,ChangeHeadingLCSH 101 | Friesland (Netherlands)--History,ChangeHeadingLCSH 102 | Future life,ProblemLCSH 103 | Galván family,ChangeHeadingLCSH 104 | Gays,ChangeHeadingLCSH 105 | Gays and rock music,ChangeHeadingLCSH 106 | Gays and sports,ChangeHeadingLCSH 107 | Gays 
and the performing arts,ChangeHeadingLCSH 108 | Gays in advertising,CancelHeadingLCSH 109 | Gays in higher education,ChangeHeadingLCSH 110 | Gays in literature,ChangeHeadingLCSH 111 | Gays in mass media,ChangeHeadingLCSH 112 | Gays in motion pictures,ChangeHeadingLCSH 113 | Gays in popular culture,ChangeHeadingLCSH 114 | Gays in the civil service,ChangeHeadingLCSH 115 | Gays in the performing arts,ChangeHeadingLCSH 116 | Gays with disabilities,ChangeHeadingLCSH 117 | "Gays, Black",ChangeHeadingLCSH 118 | Gays' writings,ChangeHeadingLCSH 119 | "Gays' writings, American",ChangeHeadingLCSH 120 | "Gays' writings, Australian",ChangeHeadingLCSH 121 | "Gays' writings, Austrian",ChangeHeadingLCSH 122 | "Gays' writings, Basque",ChangeHeadingLCSH 123 | "Gays' writings, Canadian",ChangeHeadingLCSH 124 | "Gays' writings, Caribbean",ChangeHeadingLCSH 125 | "Gays' writings, Catalan",ChangeHeadingLCSH 126 | "Gays' writings, Chilean",ChangeHeadingLCSH 127 | "Gays' writings, Chinese",ChangeHeadingLCSH 128 | "Gays' writings, Costa Rican",ChangeHeadingLCSH 129 | "Gays' writings, Dominican",ChangeHeadingLCSH 130 | "Gays' writings, English",ChangeHeadingLCSH 131 | "Gays' writings, French",ChangeHeadingLCSH 132 | "Gays' writings, Galician",ChangeHeadingLCSH 133 | "Gays' writings, German",ChangeHeadingLCSH 134 | "Gays' writings, Irish",ChangeHeadingLCSH 135 | "Gays' writings, Israeli",ChangeHeadingLCSH 136 | "Gays' writings, Italian",ChangeHeadingLCSH 137 | "Gays' writings, Latin American",ChangeHeadingLCSH 138 | "Gays' writings, Malaysian (English)",ChangeHeadingLCSH 139 | "Gays' writings, Philippine (English)",ChangeHeadingLCSH 140 | "Gays' writings, Portuguese",ChangeHeadingLCSH 141 | "Gays' writings, Puerto Rican",ChangeHeadingLCSH 142 | "Gays' writings, Scottish",ChangeHeadingLCSH 143 | "Gays' writings, South African (English)",ChangeHeadingLCSH 144 | "Gays' writings, Spanish",ChangeHeadingLCSH 145 | "Gays' writings, Spanish American",ChangeHeadingLCSH 146 | "Gays' writings, Tagalog",ChangeHeadingLCSH 147 | "Gays’ writings, Bikol",ChangeHeadingLCSH 148 | Gender identity disorders,ChangeHeadingLCSH 149 | Gender-nonconforming people,ProblemLCSH 150 | Giants,CancelHeadingLCSH 151 | Giants in art,ChangeHeadingLCSH 152 | Giants in literature,ChangeHeadingLCSH 153 | Giants in the Bible,ChangeHeadingLCSH 154 | "Glen Villa Art Garden (North Hatley, Québec)",ChangeHeadingLCSH 155 | God (Islam),ProblemLCSH 156 | Great Chalfield Manor (England),ChangeHeadingLCSH 157 | Gypsies,ChangeHeadingLCSH 158 | "Hand-to-hand fighting, Oriental",ProblemLCSH 159 | Handicapped children,ChangeHeadingLCSH 160 | Hearing impaired,ProblemLCSH 161 | "Heng Jie (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 162 | Héta Indians,ChangeHeadingLCSH 163 | Hispanic American gays,ChangeHeadingLCSH 164 | Hispanic Americans,ProblemLCSH 165 | "Historical drama, Oriya",ChangeHeadingLCSH 166 | Homeless persons,ProblemLCSH 167 | Hotchkiss automobile,ChangeHeadingLCSH 168 | Human monkeypox,ChangeHeadingLCSH 169 | "Humorous poetry, Oriya",ChangeHeadingLCSH 170 | Husband and wife,ProblemLCSH 171 | Hypermnestra (Greek mythology),CancelHeadingLCSH 172 | Hypnobryales,ChangeHeadingLCSH 173 | Illegal aliens,ChangeHeadingLCSH 174 | Illegal immigration,ProblemLCSH 175 | Illegitimacy,ProblemLCSH 176 | Illegitimate children,ProblemLCSH 177 | Indecent assault,CancelHeadingLCSH 178 | Indian cooking,ProblemLCSH 179 | Indian gays,ChangeHeadingLCSH 180 | Indian gays in literature,ChangeHeadingLCSH 181 | Indians of North America,ProblemLCSH 182 | Inmates of 
institutions,ProblemLCSH 183 | Invalids,ProblemLCSH 184 | Isobryales,CancelHeadingLCSH 185 | Italian American gays,CancelHeadingLCSH 186 | "Japanese Americans--Evacuation and relocation, 1942-1945",ChangeHeadingLCSH 187 | Japanimation,CancelHeadingLCSH 188 | Jewish bisexuals,ChangeHeadingLCSH 189 | Jewish gays,ChangeHeadingLCSH 190 | Jewish question,ProblemLCSH 191 | Juvenile delinquents,ProblemLCSH 192 | Kalâtdlisut dialect,ChangeHeadingLCSH 193 | Kalâtdlisut language,ChangeHeadingLCSH 194 | Kalâtdlisut literature,ChangeHeadingLCSH 195 | Kalâtdlisut poetry,ChangeHeadingLCSH 196 | Karok art,ChangeHeadingLCSH 197 | Karok artists,ChangeHeadingLCSH 198 | Karok baskets,ChangeHeadingLCSH 199 | Karok Indians,ChangeHeadingLCSH 200 | Karok language,ChangeHeadingLCSH 201 | Karok mythology,ChangeHeadingLCSH 202 | Karok women,ChangeHeadingLCSH 203 | Keeche Indians,ChangeHeadingLCSH 204 | Kejimkujik National Park (N.S.),ChangeHeadingLCSH 205 | "KeyArena (Seattle, Wash.)",ChangeHeadingLCSH 206 | Kham language,ChangeHeadingLCSH 207 | Kingdom of God (Mormon theology),ChangeHeadingLCSH 208 | Kings and rulers,ProblemLCSH 209 | Landlord and tenant,ProblemLCSH 210 | "Laudatory poetry, Oriya",ChangeHeadingLCSH 211 | "Law, Bantu",CancelHeadingLCSH 212 | "Lebanon--History--Israeli intervention, 1982-1985",ProblemLCSH 213 | Legal assistance to gays,ChangeHeadingLCSH 214 | Leprosy,ProblemLCSH 215 | Leprosy,ProblemLCSH 216 | Libraries and bisexuals,ChangeHeadingLCSH 217 | Libraries and gays,ChangeHeadingLCSH 218 | Mah-Meri (Malaysian people),ChangeHeadingLCSH 219 | "Masks, Mah-Meri",ChangeHeadingLCSH 220 | Mass media and gays,ChangeHeadingLCSH 221 | Mentally retarded persons,ChangeHeadingLCSH 222 | Mexican American gays,CancelHeadingLCSH 223 | Mexican fruit-fly,ChangeHeadingLCSH 224 | Middle-aged gays,ChangeHeadingLCSH 225 | Minority gays,CancelHeadingLCSH 226 | Minority gays in literature,ChangeHeadingLCSH 227 | Missions to Mormons,ChangeHeadingLCSH 228 | Mogul Empire,ProblemLCSH 229 | Mongoloid race,ProblemLCSH 230 | Mormon,ChangeHeadingLCSH 231 | Mormon almanacs,ChangeHeadingLCSH 232 | Mormon architecture,ChangeHeadingLCSH 233 | Mormon art,ChangeHeadingLCSH 234 | Mormon artists,ChangeHeadingLCSH 235 | Mormon arts,ChangeHeadingLCSH 236 | Mormon athletes,ChangeHeadingLCSH 237 | Mormon authors,ChangeHeadingLCSH 238 | Mormon boys,ChangeHeadingLCSH 239 | Mormon children,ChangeHeadingLCSH 240 | Mormon Church,ChangeHeadingLCSH 241 | Mormon church buildings,ChangeHeadingLCSH 242 | Mormon cities and towns,ChangeHeadingLCSH 243 | Mormon converts,ChangeHeadingLCSH 244 | Mormon cooking,ChangeHeadingLCSH 245 | Mormon cosmology,ChangeHeadingLCSH 246 | Mormon decorative arts,ChangeHeadingLCSH 247 | Mormon families,ChangeHeadingLCSH 248 | Mormon fundamentalism,ChangeHeadingLCSH 249 | Mormon furniture,ChangeHeadingLCSH 250 | Mormon gays,ChangeHeadingLCSH 251 | Mormon girls,ChangeHeadingLCSH 252 | Mormon handcart companies,ChangeHeadingLCSH 253 | Mormon historians,ChangeHeadingLCSH 254 | Mormon hygiene,ChangeHeadingLCSH 255 | Mormon intellectuals,ChangeHeadingLCSH 256 | Mormon interpretations,ChangeHeadingLCSH 257 | Mormon men,ChangeHeadingLCSH 258 | Mormon midwives,ChangeHeadingLCSH 259 | Mormon missionaries,ChangeHeadingLCSH 260 | Mormon neo-orthodoxy,ChangeHeadingLCSH 261 | Mormon painting,ChangeHeadingLCSH 262 | Mormon pilgrims and pilgrimages,ChangeHeadingLCSH 263 | Mormon pioneers,ChangeHeadingLCSH 264 | Mormon press,ChangeHeadingLCSH 265 | Mormon quilts,ChangeHeadingLCSH 266 | Mormon returned 
missionaries,ChangeHeadingLCSH 267 | Mormon scholars,ChangeHeadingLCSH 268 | Mormon seminaries,ChangeHeadingLCSH 269 | Mormon shrines,ChangeHeadingLCSH 270 | Mormon tabernacles,ChangeHeadingLCSH 271 | Mormon temples,ChangeHeadingLCSH 272 | Mormon universities and colleges,ChangeHeadingLCSH 273 | Mormon wit and humor,ChangeHeadingLCSH 274 | Mormon women,ChangeHeadingLCSH 275 | Mormon women authors,ChangeHeadingLCSH 276 | Mormon women missionaries,ChangeHeadingLCSH 277 | Mormon youth,ChangeHeadingLCSH 278 | Mormons,ChangeHeadingLCSH 279 | Mormons in art,ChangeHeadingLCSH 280 | Mormons in literature,ChangeHeadingLCSH 281 | Mormons in mass media,ChangeHeadingLCSH 282 | Mormons in motion pictures,ChangeHeadingLCSH 283 | Morrigan (Celtic deity),CancelHeadingLCSH 284 | Multiple personality disorder,ProblemLCSH 285 | Muslim gays,ChangeHeadingLCSH 286 | "Mythology, Aboriginal Australian",ProblemLCSH 287 | Nanovor (Game),CancelHeadingLCSH 288 | Navajo,ProblemLCSH 289 | Neopagan gays,ChangeHeadingLCSH 290 | Nephites,ChangeHeadingLCSH 291 | New Jerusalem (Mormon theology),ChangeHeadingLCSH 292 | Ngati Awa (New Zealand people),ChangeHeadingLCSH 293 | Ngati Haua (New Zealand people),ChangeHeadingLCSH 294 | Ngati Mahuta (New Zealand people),ChangeHeadingLCSH 295 | Ngati Mamoe (New Zealand people),ChangeHeadingLCSH 296 | Ngati Pukenga (New Zealand people),ChangeHeadingLCSH 297 | Nilam River (Pakistan),ChangeHeadingLCSH 298 | Nilam River Valley (Pakistan),ChangeHeadingLCSH 299 | Nile mosaic (Palestrina),CancelHeadingLCSH 300 | Obsession,ProblemLCSH 301 | Obsession (Psychology),ProblemLCSH 302 | Older gays,ChangeHeadingLCSH 303 | Older Mormons,ChangeHeadingLCSH 304 | "One-act plays, Oriya",ChangeHeadingLCSH 305 | Ordinances for the dead (Mormon Church),ChangeHeadingLCSH 306 | Ordination of gays,ChangeHeadingLCSH 307 | Oriental literature,ProblemLCSH 308 | Orientalism,ProblemLCSH 309 | Oriya drama,ChangeHeadingLCSH 310 | Oriya essays,ChangeHeadingLCSH 311 | Oriya fiction,ChangeHeadingLCSH 312 | Oriya language,ChangeHeadingLCSH 313 | Oriya literature,ChangeHeadingLCSH 314 | Oriya poetry,ChangeHeadingLCSH 315 | Oriya poetry--1500-1800,ChangeHeadingLCSH 316 | Oriya poetry--20th century,ChangeHeadingLCSH 317 | Oriya prose literature,ChangeHeadingLCSH 318 | Oriya prose literature--To 1500,ChangeHeadingLCSH 319 | Oriya wit and humor,ChangeHeadingLCSH 320 | "Ou Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 321 | Overweight gays,ChangeHeadingLCSH 322 | Pacific Islander American bisexuals,ChangeHeadingLCSH 323 | Pacific Islander American gays,ChangeHeadingLCSH 324 | Pacifists,ProblemLCSH 325 | Palestinian Arabs,ProblemLCSH 326 | Parental leave,ChangeHeadingLCSH 327 | Parents of attention-deficit-disordered children,ChangeHeadingLCSH 328 | Parents of gays,ChangeHeadingLCSH 329 | Parolees,ProblemLCSH 330 | Patriarchal blessings (Mormon Church),ChangeHeadingLCSH 331 | Patriarchs (Mormon theology),ChangeHeadingLCSH 332 | "Patriotic poetry, Oriya",ChangeHeadingLCSH 333 | People with mental disabilities,ProblemLCSH 334 | People with social disabilities,ProblemLCSH 335 | Philemon (Greek mythology),CancelHeadingLCSH 336 | Pioneer Day (Mormon history),ChangeHeadingLCSH 337 | Plan of salvation (Mormon theology),ChangeHeadingLCSH 338 | Poor,ProblemLCSH 339 | Popular music--South Korea--2011-2020,ProblemLCSH 340 | Porirua Harbour (N.Z.),ChangeHeadingLCSH 341 | Posse Comitatus (Group),CancelHeadingLCSH 342 | Pregnant women,ProblemLCSH 343 | Presbyterian gays,ChangeHeadingLCSH 344 | Primitive art,ProblemLCSH 345 | 
Prisoners,ProblemLCSH 346 | Problem children,ProblemLCSH 347 | Prophets (Mormon theology),ChangeHeadingLCSH 348 | Prostitution,ProblemLCSH 349 | Protestant gays,ChangeHeadingLCSH 350 | Psychic trauma,ProblemLCSH 351 | "Quatrains, Oriya",ChangeHeadingLCSH 352 | Race,ProblemLCSH 353 | Race relations,ProblemLCSH 354 | Race riots,ProblemLCSH 355 | Racially mixed people,ProblemLCSH 356 | Radio programs for gays,ChangeHeadingLCSH 357 | "Rāsa literature, Oriya",ChangeHeadingLCSH 358 | Rastafarian,ProblemLCSH 359 | "Red River Rebellion, 1869-1870",ChangeHeadingLCSH 360 | "Reichstagsgebäude (Berlin, Germany)",CancelHeadingLCSH 361 | "Religious literature, Oriya",ChangeHeadingLCSH 362 | "Religious poetry, Oriya",ChangeHeadingLCSH 363 | Restoration of the gospel (Mormon doctrine),ChangeHeadingLCSH 364 | "Revolutionary poetry, Oriya",ChangeHeadingLCSH 365 | Rgyal-roṅ (China),ChangeHeadingLCSH 366 | "Riel Rebellion, 1885",ChangeHeadingLCSH 367 | Samantabhadra (Buddhist deity),CancelHeadingLCSH 368 | Schizophrenics,ProblemLCSH 369 | Sex change,ChangeHeadingLCSH 370 | Sex role,ProblemLCSH 371 | Sexual minorities,ProblemLCSH 372 | Sexual reorientation programs,ChangeHeadingLCSH 373 | "Shiquan Jie (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 374 | "Short stories, Oriya",ChangeHeadingLCSH 375 | Social disabilities,ProblemLCSH 376 | Social work with bisexuals,ChangeHeadingLCSH 377 | Social work with gays,ChangeHeadingLCSH 378 | Socially handicapped,ChangeHeadingLCSH 379 | South Asian American gays,ChangeHeadingLCSH 380 | Street-food vendors,ChangeHeadingLCSH 381 | Stuart Island State Park (Wash.),ChangeHeadingLCSH 382 | Substance abuse,ProblemLCSH 383 | Śulvasūtras,CancelHeadingLCSH 384 | "Support (Domestic relations law, Hindu)",ChangeHeadingLCSH 385 | "Support (Domestic relations law, Islamic)",ChangeHeadingLCSH 386 | "Support (Domestic relations law, Jewish)",ChangeHeadingLCSH 387 | Television and gays,ChangeHeadingLCSH 388 | Television programs for gays,ChangeHeadingLCSH 389 | Temple endowments (Mormon Church),ChangeHeadingLCSH 390 | Temple work (Mormon Church),ChangeHeadingLCSH 391 | Tenth of Muḥarram,ProblemLCSH 392 | Theology,ProblemLCSH 393 | "Tibet, Plateau of",ChangeHeadingLCSH 394 | Tramps,ProblemLCSH 395 | Transvestism,ChangeHeadingLCSH 396 | Triangles (Interpersonal relations),ProblemLCSH 397 | "Ukraine Conflict, 2014-",CancelHeadingLCSH 398 | Ukraine--Economic conditions--1991-,ChangeHeadingLCSH 399 | Ukraine--Economic policy--1991-,ChangeHeadingLCSH 400 | Ukraine--Foreign relations--1991-,ChangeHeadingLCSH 401 | Ukraine--History--1917-,ChangeHeadingLCSH 402 | Ukraine--History--1917-1991,ChangeHeadingLCSH 403 | Ukraine--History--1991-,ChangeHeadingLCSH 404 | "Ukraine--History--Euromaidan Protests, 2013-2014",ChangeHeadingLCSH 405 | "Ukraine--History--Russian Invasion, 2022-",ChangeHeadingLCSH 406 | Ukraine--Intellectual life--1991-,ChangeHeadingLCSH 407 | Ukraine--Politics and government--1991-,ChangeHeadingLCSH 408 | Ukraine--Social conditions--1991-,ChangeHeadingLCSH 409 | Ultra-Orthodox,ProblemLCSH 410 | United orders (Mormon Church),ChangeHeadingLCSH 411 | United States. 
Army--Gays,ChangeHeadingLCSH 412 | Unskilled labor,ProblemLCSH 413 | "Vaishnava poetry, Oriya",ChangeHeadingLCSH 414 | Victims,ProblemLCSH 415 | Waima'a language,ChangeHeadingLCSH 416 | "Wang Shi Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 417 | Wards (Mormon Church),ChangeHeadingLCSH 418 | Whites,ChangeHeadingLCSH 419 | Women in the Mormon Church,ChangeHeadingLCSH 420 | Women in the Mormon sacred books,ChangeHeadingLCSH 421 | Word recognition,ProblemLCSH 422 | World music,ProblemLCSH 423 | "World War, 1939-1945--Gays",ChangeHeadingLCSH 424 | Yellow peril,CancelHeadingLCSH 425 | "Yipu Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 426 | "Zhuo Zheng Yuan (Suzhou, Jiangsu Sheng, China)",ChangeHeadingLCSH 427 | Zion (Mormon Church),ChangeHeadingLCSH 428 | Zoroastrianim,CancelHeadingLCSH 429 | -------------------------------------------------------------------------------- /Code/example-input-metadata.csv: -------------------------------------------------------------------------------- 1 | id,title,description,creator,date,collection name,subjects,spatial coverage 2 | 337805,Aborigines of Taiwan [001],"A 1974 photo showing a group of aborigine dancers, Taiwan","Tierney, Lennox",1974,P0479 Lennox and Catherine Tierney Photo Collection,Indigenous peoples--Photographs; Taiwan aborigines--Photographs; Amis (Taiwan people)--Photographs; Dance--Photographs; Clothing and dress--Photographs; Taiwan; Dance; Clothing and dress; Republic of China,Taiwan 3 | 332408,"Ainu (Japan's aboriginal people), Hokkaido, Japan [30]","Photo of Japan's Aboriginal people (Chief, his wife and an unidentified person), Hokkaido, Japan","Tierney, Lennox",1959,P0479 Lennox and Catherine Tierney Photo Collection,Ainu--Photographs; Men--Photographs; Women--Photographs; Japan; Indigenous peoples,Shiraoi-chō (Japan) 4 | 335588,People of Japan [002],"Photo of a Japanese gentleman holding a hand fan, Tokyo, Japan","Tierney, Lennox",1950; 1951; 1952,P0479 Lennox and Catherine Tierney Photo Collection,Japanese--Japan--Tokyo--Photographs; Men--Japan--Tokyo--Photographs; Fans--Japan--Tokyo--Photographs; Japan; Men; Fans,Japan 5 | 330740,Block printing: Katsushika Hokusai [015],"Photograph of block print: ""A Potted Dwarf Pine with a Basin and a Towel on a Rack - Horse Talisman (Mayoke)"", also known as ""A surimono still-life composition"", (from the series A Set of Horses (Umazukushi), 1822) by Katsushika Hokusai (Japanese, 1760-1849), (approximate size, may vary slightly) 206 mm x 183 mm (8.11 in. x 7.20 in.)","Tierney, Lennox",2003,P0479 Lennox and Catherine Tierney Photo Collection,"Katsushika, Hokusai, 1760-1849--Photographs; Block printing--Japan--Photographs; Ukiyoe--Japan--Photographs; Surimono--Japan--Photographs; Trees--Art--Photographs; Bonsai--Art--Photographs; Pine--Art--Photographs; Towels--Art--Photographs; Basins (Containers)--Art--Photographs; Water--Art--Photographs; Art; Ukiyoe; Surimono", 6 | 1533946,Busts of Ute Indians [1],"Black-and-white photograph of a bust of an American Indian by Millard F. Malin, from a set commissioned by the State of Utah in 1934. His models were Ute Indians in the Uinta Basin.",,1934; 1935; 1936,P0177 Millard F. 
Malin Photographs,"Indians of North America--Monuments--Photographs; Indians of North America--Art--Photographs; Malin, Millard F., 1891-1974--Works--Photographs; Sculptures--Photographs; Indigenous peoples--North America", 7 | 1398979,"Navajo Pavilion, Gateway Center, February 14, 2002 [18]","Color photograph of Navajo performers at the Navajo pavillion, Gateway Center in downtown Salt Lake City, February 14, 2002.",,2002-02-14,P0810 Peter L. Goss Photograph Collection,"Navajo Indians--Music--Photographs; Navajo Indians-Photographs; Salt Lake City (Utah)--Photographs; Olympic Winter Games (19th : 2002 : Salt Lake City, Utah)--Photographs; Indigenous peoples--North America","Salt Lake City, Salt Lake County, Utah, United States" 8 | 962277,"Photo taken during the AIM takeover and ultimate surrender at Wounded Knee, South Dakota. (De-briefing or court hearing.)","Photo taken at a court hearing or de-briefing following the American Indian Movement takeover at Wounded Knee, South Dakota, in 1973.",,1973,P0181 Stanley Lyman Photograph Collection,"Wounded Knee (S.D.)--History--Indian occupation, 1973--Photographs; Indians of North America; Indigenous peoples--North America",Wounded Knee (S.D.); South Dakota 9 | 2348627,Sur. Navajo Mt. looking down Colorado from 5 miles below San Juan,"Black and white photograph showing a view of Navajo Mountain from the Colorado River. Photo taken during the U.S. Geological Survey's 1921 survey of the San Juan River, led by K. W. Trimble, with Bert Loper serving as chief boatman.",,1921,P0243 Grand Canyon and San Juan River photograph colleciton,Navajo Mountain (Utah and Ariz.)--Photographs; Colorado River (Colo.-Mexico)--Photographs,Colorado River (Colo.-Mexico); San Juan County (Utah) 10 | 1498946,Spanish at Indian pueblo,"Photograph of an illustration in an unidentified publication, artist's rendition of a party of Spanish horsemen at an Indian pueblo, perhaps in New Mexico.",,1600; 1601; 1602; 1603; 1604; 1605; 1606; 1607; 1608; 1609; 1610; 1611; 1612; 1613; 1614; 1615; 1616; 1617; 1618; 1619; 1620; 1621; 1622; 1623; 1624; 1625; 1626; 1627; 1628; 1629; 1630; 1631; 1632; 1633; 1634; 1635; 1636; 1637; 1638; 1639; 1640; 1641; 1642; 1643; 1644; 1645; 1646; 1647; 1648; 1649; 1650; 1651; 1652; 1653; 1654; 1655; 1656; 1657; 1658; 1659; 1660; 1661; 1662; 1663; 1664; 1665; 1666; 1667; 1668; 1669; 1670; 1671; 1672; 1673; 1674; 1675; 1676; 1677; 1678; 1679; 1680; 1681; 1682; 1683; 1684; 1685; 1686; 1687; 1688; 1689; 1690; 1691; 1692; 1693; 1694; 1695; 1696; 1697; 1698; 1699; 1700; 1701; 1702; 1703; 1704; 1705; 1706; 1707; 1708; 1709; 1710; 1711; 1712; 1713; 1714; 1715; 1716; 1717; 1718; 1719; 1720; 1721; 1722; 1723; 1724; 1725; 1726; 1727; 1728; 1729; 1730; 1731; 1732; 1733; 1734; 1735; 1736; 1737; 1738; 1739; 1740; 1741; 1742; 1743; 1744; 1745; 1746; 1747; 1748; 1749; 1750; 1751; 1752; 1753; 1754; 1755; 1756; 1757; 1758; 1759; 1760; 1761; 1762; 1763; 1764; 1765; 1766; 1767; 1768; 1769; 1770; 1771; 1772; 1773; 1774; 1775; 1776; 1777; 1778; 1779; 1780; 1781; 1782; 1783; 1784; 1785; 1786; 1787; 1788; 1789; 1790; 1791; 1792; 1793; 1794; 1795; 1796; 1797; 1798; 1799; 1800,P0185 Drawings of Western History,"Pueblo Indians--History--Art; Southwest, New--Discovery and exploration--Art; Indigenous peoples--North America","Southwest, New; New Mexico" 11 | 995167,Native Americans herding sheep on horseback,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation,,1950; 1951; 1952; 1953; 1954; 1955; 1956; 1957; 1958; 1959; 
1960; 1961; 1962; 1963; 1964; 1965; 1966; 1967; 1968; 1969; 1970; 1971; 1972; 1973; 1974; 1975; 1976; 1977; 1978; 1979; 1980,P0561 Wallace Stegner Photograph Collection,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America, 12 | 947066,"Large woodcut located in Olympic Valley, California","Photo of a large wood sculpture at Palisades Tahoe (previously Squaw Valley) in Olympic Valley, California, depicting skiers. It was carved in 1995 ",,1995; 1996; 1997; 1998; 1999; 2000,P0413 Alan K. Engen Photograph Collection,Skiers--Art; Skis and skiing--Art; Wood sculpture—Photographs,"Palisades Tahoe, Placer County, California, United States; Olympic Valley, Placer County, California, United States" 13 | 958790,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960.","Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960",,1960,P0790 Shipler Studio Photograph Collection,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs",Utah; Tooele County (Utah); Bonneville Salt Flats (Utah) 14 | 998919,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 15 | 1302623,"Basalt-capped mesa on Dolores (Triassic), 6± miles south of Beddehoche (Indian Wells), Ariz., 1909 (photo G-67)","Photograph of Black Butte, a basalt-capped mesa south of Indian Wells. From Herbert E. Gregory Book 2: Navajo-Hopi, San Juan 1909","Gregory, Herbert E. (Herbert Ernest), 1869-1952",,P0013 Herbert E. 
Gregory Photograph Collection,Volcanic fields--Arizona--Apache County--Photographs; Volcanic fields--Navajo Indian Reservation--Photographs; Buttes--Arizona--Apache County--Photographs; Buttes--Navajo Indian Reservation--Photographs; Landforms--Arizona--Apache County--Photographs; Landforms--Navajo Indian Reservation--Photographs; Geology--Arizona--Apache County--Photographs; Navajo Indian Reservation--Photographs,"Black Butte (Navajo County, Ariz.); Five Buttes (Ariz.); Navajo County (Ariz.); Navajo Indian Reservation; Arizona" 16 | 995167,Native Americans herding sheep on horseback,Photograph of Navajo Indians on horseback herding sheep; unidentified location but probably in Navajo Reservation,,1950; 1951; 1952; 1953; 1954; 1955; 1956; 1957; 1958; 1959; 1960; 1961; 1962; 1963; 1964; 1965; 1966; 1967; 1968; 1969; 1970; 1971; 1972; 1973; 1974; 1975; 1976; 1977; 1978; 1979; 1980,,Horsemanship--Photographs; Sheepherding--Photographs; Navajo Indians--Photographs; Navajo Indian Reservation--Photographs; Indigenous peoples--North America,Navajo Indian Reservation 17 | 2364219,"Pioneer Day parade, 1880 (Carter, photo)","Black and white photograph of the Salt Lake Pioneer Day Parade, July 24, 1880.","Carter, C. W.",,"P0251 Salt Lake City, Utah",Mormon pioneers--Photographs; Pioneer Day (Mormon history)--Photographs; Parades--Utah--Salt Lake City--Photographs,Salt Lake City (Utah); Salt Lake County (Utah) 18 | 941713,Newly arrived evacuees standing behind their baggage.,Photo of two newly arrived evacuees standing by a truck behind their baggage at the Tule Lake Relocation Center in California during World War II,,1942; 1943 ,P0144 Japanese American Relocation Photograph Collection,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Clothing & dress--Tule Lake Relocation Center--1940-1950",Modoc County (Calif.); California 19 | 941496,Evacuees cleaning vegetables in the packing shed.,Photo of evacuees cleaning vegetables in the packing shed at the Tule Lake Relocation Center in California during World War II,,1941; 1942; 1943 ,P0144 Japanese Relocation Photograph Collection,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; Tule Lake Relocation Center--People--1940-1950; World War, 1939-1945--Concentration camps--California; Agriculture--1940-1950",Modoc County (Calif.); California 20 | 941536,Evacuees harvesting potatoes at Tule Lake. 
[5],Photo of evacuees harvesting potatoes at the Tule Lake Relocation Center in California during World War II,,1942-11-05,P0144 Japanese Relocation Photograph Collection,"Japanese Americans--Evacuation and relocation, 1942-1945--Photographs; Tule Lake Relocation Center--Photographs; World War, 1939-1945--Concentration camps--California; Farming--California--Tule Lake--1940-1950; Agricultural laborers--California--Tule Lake--1940-1950", 21 | 958469,"Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle the ""Challenger"" on the Bonneville Salt Flats Raceway in 1960.","Photo of Mickey Thompson and wife Trudy Thompson standing in front of his racing vehicle, the ""Challenger,"" on the Bonneville Salt Flats Raceway in 1960",,,P0790 Shipler Studio Photograph Collection,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs", 22 | 958790,"Mickey Thompson inside his racing vehicle the ""Challenger"" getting a hug from his wife Trudy Thompson on the Bonneville Salt Flats Raceway in 1960.","Photo of Mickey Thompson inside his racing vehicle, the ""Challenger,"" getting a kiss from his wife, Trudy Thompson, while crew members stand by on the Bonneville Salt Flats Raceway in 1960",,,P0790 Shipler Studio Photograph Collection,"Thompson, Mickey, 1928-1988--Photographs; Thompson, Trudy--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1960-1970--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs", 23 | 998946,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history)--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 24 | 998908,"Vern Adix's 1947 Centennial Pioneer Days covered wagon throne for coronation of Pioneer Days Queen Calleen Alice Robinson in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of Vern Adix's 1947 Centennial Pioneer Days covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history)--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 25 | 998938,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Mormon history); Days of '47; Holidays; Local holidays; Theatrical sets; Costumes (character dress); 
Thrones; McKay, Calleen Alice Robinson, 1928-2005; Women; Pioneer Days Royalty; Centennial Queens; Beauty contestants",Salt Lake City (Utah); Salt Lake County (Utah) 26 | 998919,"1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City","Scan of 35mm slide of 1947 Centennial Coronation of Pioneer Days Queen Calleen Alice Robinson seated in Vern Adix's covered wagon throne in the Utah State Capitol rotunda, Salt Lake City",,,P0316 Vern Adix Photograph Collection,"Pioneer Day (Latter Day Saint history) (Mormon history)--Photographs; Utah State Capitol (Salt Lake City, Utah)--Photographs",Salt Lake City (Utah); Salt Lake County (Utah) 27 | 2292054,Albert Fritz letters about peaceful demonstrations,"Series of letters from Albert Fritz to the Salt Lake Police Department, Assistant Chief Ralph Knusden, regarding peaceful demonstrations. NAACP call for protest at the State Capitol building due to the failure of the Utah Legislature to adopt civil rights legislation regarding housing and public accommodations (2005). NAACP appealed to the Governor of the state of Utah to include civil rights legislation on the docket for the special session of the Utah State Legislature, which he had announced. Local NAACP leaders noted that Utah was the only northern state with no civil rights legislation (1968). Letter from Albert B. Fritz (Salt Lake NAACP) to Honorable George D. Clyde, Governor of Utah (1965?) regarding the lack of civil rights legislation in Utah. Article from the Wall Street Journal (1964), ""Civil Rights Irony: New U.S. Agency's First Case Likely to Come From Utah"" by Donald Moffitt. The article discusses racism in Salt Lake City and highlights Chuck Nabors, who moved to Salt Lake City to attend the University of Utah. Nabors rented an apartment sight unseen; when the landlord saw he was an African American, the landlord backed out of the lease. After Nabors found a landlord who would rent to him, neighbors petitioned to have him leave.",,1952-05-10; 2005-03-07; 1968-04-10; 2008-04-16; 1964-10-20,ACCN 1007 Charles James Nabors papers,"Civil rights movements--United States--History--20th century; Civil rights--Utah; Race discrimination--Religious aspects--Latter Day Saint churches; African Americans; Racism against Black people; Nabors, Charles James, 1934-1986; African American scientists", 28 | 1396789,Working papers toward a history of the Spanish-speaking people of Utah: a report of research of the Mexican-American Documentation Project,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speaking migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""",,,,Hispanic Americans--Utah; Latin Americans--Utah, 29 | 1396777,Human rights and the Native peoples of the Americas,"The 14th Occasional paper of the University of Utah's American West Center, including an essay by Mexican anthropologist Alejandro Marroquin, ""The problem of racial discrimination,"" about the history of discrimination against American Indians and efforts to address it; a statement of the Canadian government on Indian policy from 1969; an essay by S. 
Lyman Tyler about Indian policy in the United States during 1968-1971; a statement by Forrest J. Gerard, Assistant Secretary for Indian Affairs, dated April 5, 1979; and a bibliography of discrimination.",,,,"Indians--Legal status, laws, etc.; Indians of North America--Legal status, laws, etc.--Canada; Indians of North America--Legal status, laws, etc.--United States; Discrimination; Indians--Social conditions; Indigenous peoples--North America",America; North America; United States; Canada 30 | 1049391,"""Three Nephites"" stories [3]","A collection of handwritten and typed papers recounting stories about mysterious visitors, often identified as ""Nephites""",,1932; 1933; 1934; 1935; 1936; 1937; 1938; 1939; 1940; 1941; 1942; 1943; 1944; 1945; 1946,,Nephites; Folklore--Utah; Latter Day Saints--Utah--History--Anecdotes, 31 | 1006680,Letter of Ebenezer and George Brooks,"October 6, 1880 - Corinth, New York; Brooks, Ebenezer and George, to My Dear beloved niece, and Dear Cosen Angela. Ebenezer writes on his eighty-eighth birthday to tell of his conversion to Mormonism. George, his son, adds a note. Short genealogical list of Brookses",,,MS0120 Philip T. Blair Family Papers,Church of Jesus Christ of Latter-day Saints--History; Utah--History; Mormon converts, 32 | 1470940,"Interview with Joseph Ward Spendlove, Downwinders of Utah Archive, June 25, 2019","Transcript (8 pages) of an interview conducted by Justin Sorensen and Anthony Sams in Tooele, Utah on June 25, 2019. Spendlove discusses his experiences growing up in Delta, Utah. He discusses the various health issues experienced by his family members. He mentions playing outside during the testing and not being concerned about the possible effects. He also discusses aspects of Downwinders receiving compensation.",,,Everett L. Cooley Oral History Project,Nuclear weapons--Testing; Nuclear weapons--United States--Testing; Nuclear weapons testing victims; Radioactive fallout,Corinth (N.Y.) 33 | 818891,"Peter Loewenberg, Los Angeles, California: an interview by Newell Bringhurst","Transcript (28 pages) of an interview by Newell G. Bringhurst with Peter Loewenberg, an associate of Fawn Brodie, on December 12, 1988. This interview is no. 272 in the Everett L. Cooley Oral History Project, and tape no. U-941. Accompanied by Loewenberg's curriculum vitae","Loewenberg, Peter, 1933-",,ACCN 0814 Everett L. Cooley Oral History Project,"Loewenberg, Peter, 1933- --Interviews; Brodie, Fawn McKay, 1915-1981--Biography; Latter Day Saints--Biography; Mormon scholars--Biography; University of California, Los Angeles--Faculty--Biography","Los Angeles, Los Angeles County, California, United States, http://sws.geonames.org/5368361/" 34 | 1396789,Working papers toward a history of the Spanish-speaking people of Utah: a report of research of the Mexican-American Documentation Project,"A collection of papers gathered as the fourth Occasional paper of the University of Utah's American West Center, focused on the history of Spanish-speaking people of Utah. 
Articles include Vincent Mayer's ""Oral history: another approach to ethnic history""; Paul Morgan and Vincent Mayer's ""The Spanish-speaking population of Utah from 1900 to 1935""; Ann Nelson's ""Spanish-speaking migrant laborer in Utah 1950 to 1955""; and Greg Coronado's ""Spanish-speaking organizations in Utah.""",,,American West Center Research Projects,Hispanic Americans--Utah; Latin Americans--Utah,"Utah, United States" 35 | 893658,"Interviews with African Americans in Utah, Alberta Henry, Interview 1","Transcript (137 pages) of an interview by Leslie Kelen with Alberta Henry on July 21, 1983. From Interviews with African Americans in Utah",,,"Ms0453, Interviews with Blacks in Utah, 1982-1988","African Americans--Utah--Interviews; Henry, Alberta H., 1920-2005--Interviews; African Americans--Civil rights--Utah; Utah--Race relations", 36 | 958364,"John Cobb's racing vehicle the ""Railton Mobil Special"" being worked on by a group of men on the Bonneville Salt Flats Raceway in 1938. [10]","Photo of the chassis of John Cobb's racing vehicle, the ""Railton Mobil Special,"" being worked on by a group of men on the Bonneville Salt Flats Raceway in 1938. Cobb is third from left, talking to the man in a suit and hat.",,,P0790 Shipler Studio Photograph Collection,"Cobb, John Rhodes, 1899-1952--Photographs; Automobiles, Racing--Utah--Bonneville Salt Flats--Photographs; Bonneville Salt Flats (Utah); Bonneville Salt Flats Race, Utah--Photographs; Automobile racing--Utah--1930-1940--Photographs; Automobiles, Racing--Speed records--History; Automobile racing--Speed records--History; Antique and classic cars--Photographs",Utah; Tooele County (Utah); Bonneville Salt Flats (Utah) 37 | 946265,"Arnold Lunn, left, ski historian and author of the book, The Story of Ski-ing, 1952. And Hjalmar Hvam, right, ski pioneer and inventor of America's first safety binding in 1937.","Photo shows skiing pioneers Arnold Lunn (left) and Hjalmar Hvam, probably in the 1940s",,,P0413 Alan K. Engen Photograph Collection,"Lunn, Arnold, 1888-1974--Photographs; Hvam, Hjalmar, 1902-1996--Photographs; Skiers--Photographs", 38 | 947346,"Utah ski pioneer Mel Fletcher skiing on a pair of his homemade ""Barrel Staves,"" circa 1952.","Photo of Mel Fletcher, skiing pioneer and ski instructor, on homemade skis",,,P0413 Alan K. Engen Photograph Collection,"Fletcher, Mel, 1918-2010--Photographs; Skiers--Utah--Photographs","Deer Valley (Summit County, Utah); Park City (Utah); Summit County (Utah)" 39 | 1739616,"Crime, Veda Goff, murder victim","Black and white photograph of Vida Irene Goff, who was murdered at Magna, Utah, December 29, 1945",,1945-12-29,P0244 Olive Woolley Burt Photograph Collection,Murder victims--Photographs, 40 | 2509070,"""Captive Jewels--Our Crippled Children"" speech, retyped",,"Priest, Ivy Baker, 1905-1975",1953; 1954; 1955; 1956; 1957; 1958; 1959; 1960; 1961,MS0163 Ivy Baker Priest Papers,"Priest, Ivy Baker, 1905-1975; Women in politics--United States--Sources; Women--Utah--Biography; United States--Department of the Treasury; National Society for Crippled Children and Adults; People with disabilities", 41 | --------------------------------------------------------------------------------