├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── demo
    ├── __init__.py
    ├── pydantic.ipynb
    └── requests_session_example.py
├── media
    ├── airflow1.png
    ├── airflow2.png
    ├── airflow3.png
    ├── airflow4.png
    ├── airflow5.png
    ├── airflow7.png
    ├── env-mgmt.png
    ├── marc1.png
    ├── marc2.png
    ├── pax-opex1.png
    ├── pax-opex2.png
    └── pax-opex3.png
├── mtg_notes.md
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | pip-wheel-metadata/
 24 | share/python-wheels/
 25 | *.egg-info/
 26 | .installed.cfg
 27 | *.egg
 28 | MANIFEST
 29 | 
 30 | # PyInstaller
 31 | #  Usually these files are written by a python script from a template
 32 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 33 | *.manifest
 34 | *.spec
 35 | 
 36 | # Installer logs
 37 | pip-log.txt
 38 | pip-delete-this-directory.txt
 39 | 
 40 | # Unit test / coverage reports
 41 | htmlcov/
 42 | .tox/
 43 | .nox/
 44 | .coverage
 45 | .coverage.*
 46 | .cache
 47 | nosetests.xml
 48 | coverage.xml
 49 | *.cover
 50 | *.py,cover
 51 | .hypothesis/
 52 | .pytest_cache/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | target/
 76 | 
 77 | # Jupyter Notebook
 78 | .ipynb_checkpoints
 79 | 
 80 | # IPython
 81 | profile_default/
 82 | ipython_config.py
 83 | 
 84 | # pyenv
 85 | .python-version
 86 | 
 87 | # pipenv
 88 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 89 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 90 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 91 | #   install all needed dependencies.
 92 | #Pipfile.lock
 93 | 
 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 95 | __pypackages__/
 96 | 
 97 | # Celery stuff
 98 | celerybeat-schedule
 99 | celerybeat.pid
100 | 
101 | # SageMath parsed files
102 | *.sage.py
103 | 
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 | 
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 | 
117 | # Rope project settings
118 | .ropeproject
119 | 
120 | # mkdocs documentation
121 | /site
122 | 
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 | 
128 | # Pyre type checker
129 | .pyre/
130 | 
131 | # IntelliJ
132 | .idea


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 | 
3 | ## How to contribute
4 | Use [pull requests](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) to propose changes to this repository:
5 | 1. Fork this repo and create your branch from `main`
6 | 2. Make changes/additions in your fork
7 | 3. When ready, open a pull request
8 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022 Tomek
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # p4l-resources
  2 | Shared space for the Python{4}Lib group.
  3 | 
  4 | See our [meeting notes](mtg_notes.md) for more details.
  5 | 
  6 | Upcoming meetings (meetings at 11am Eastern time):
  7 | + *No meeting on May 14 during Code4Lib conference*
  8 | + May 28, 2024: Thomas Guinard talks about Jupyter Kernel Gateway
  9 | + June 11, 2024: Rebecca Hyams demos Postman
 10 | + June 25, 2024: Charles Brown-Roberts & Eddie Prieto introduce deployment of web apps (Flask, security, and more)
 11 | 
 12 | Would you like to suggest a worthy resource? See [contributing instructions](CONTRIBUTING.md).
 13 | 
 14 | 
 15 | ## Python Resources
 16 | ### Reference
 17 | + [Python Cheatsheet](https://www.pythoncheatsheet.org/)
 18 | 
 19 | ### Books
 20 | + [Automate the Boring Stuff with Python : practical programming for total beginners / Al Sweigart](https://worldcat.org/title/1128094127)
 21 | + [Python crash course : a hands-on, project based introduction to programming / Eric Matthes](https://search.worldcat.org/title/1350635022)
 22 | + [Python workout: 50 ten-minute exercises / Reuven M. Lerner](https://search.worldcat.org/title/1121083840)
 23 | + [Effective Python: 59 Ways to Write Better Python / Brett Slatkin](https://www.worldcat.org/title/1140129622)
 24 | + [Pandas for everyone: Python data analysis / Daniel Y. Chen](https://worldcat.org/en/title/1240309883)
 25 | + [Data Visualization with Python and JavaScript, 2nd Edition / Kyran Dale](https://www.oreilly.com/library/view/data-visualization-with/9781098111861/)
 26 | 
 27 | ### Tutorials
 28 | + [Official Python Tutorial](https://docs.python.org/3/tutorial/index.html)
 29 | 
 30 | ### Courses
 31 | + General Python courses on [Coursera](https://www.coursera.org/courses?query=python) (free to enroll)
 32 | + [Python for Librarians / Library Juice Academy](https://libraryjuiceacademy.com/shop/course/270-python-for-librarians/) (fee)
 33 | + [Library Carpentry](https://librarycarpentry.org/lessons/) (free lessons, paid sessions with an instructor)
 34 | + [Learn Python 3 the Hard Way / Zed Shaw](https://shop.learncodethehardway.org/access/buy/9/) (free with O'Reilly for Higher Education subscription)
 35 | 
 36 | ### Articles
 37 | + [Fuzzy Matching at Scale / Josh Taylor](https://towardsdatascience.com/fuzzy-matching-at-scale-84f2bfd0c536)
 38 | + [19 Sweet Python Syntax Sugar for Improving Your Coding Experience](https://medium.com/techtofreedom/19-sweet-python-syntax-sugar-for-improving-your-coding-experience-37c4118fc6b1)
 39 | 
 40 | ### Podcasts
 41 | + [Python Bytes](https://pythonbytes.fm/) weekly Python news podcast hosted by Michael Kennedy and Brian Okken
 42 | + [Test & Code](https://testandcode.com/) hosted by Brian Okken, focused on automated testing in Python
 43 | + [Talk Python To Me](https://talkpython.fm/) hosted by Michael Kennedy
 44 | + [Podcast.__init__](https://www.pythonpodcast.com/) hosted by Tobias Macey
 45 | + [The Real Python Podcast](https://realpython.com/podcasts/rpp/) weekly coding tips, news, and interviews
 46 | 
 47 | ### Blogs
 48 | + [Practical Business Python](https://pbpython.com/) / data science centric
 49 | 
 50 | ### Member Presentations
 51 | + [Finding a path forward: The use of Python to support technical services work in academic libraries](https://docs.google.com/presentation/d/1598qxRIB08_kLaJov_CsKWHw5VctFY0MIZhohQUG6ww/edit#slide=id.p1) Talk given at Python{4}Lib 9/20/22 by Maria Collins and Xiaoyan Song based on their presentation at ER&L 2022
 52 | + [Intro to unit testing in Python / Yamil Suárez](https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing)
 53 | + [Speedy pandas : a super brief intro to Python's pandas library / Michelle Janowiecki](https://docs.google.com/presentation/d/1xRdNVonTxi9-gEsQkNvbF1e47o_2cuo1iimunoFUky4/edit#slide=id.p)
 54 | 
 55 | ## Tools
 56 | 
 57 | ### Library metadata
 58 | + [pybibframe](https://pypi.org/project/pybibframe/) - MARC/XML to RDF or Versa output converter
 59 | + [pymarc](https://pymarc.readthedocs.io/en/latest/) - MARC parser
 60 | + [marcgrep](https://github.com/phette23/marcgreppy) - CLI for searching MARC files
 61 | 
 62 | #### ILS & other library systems wrappers
 63 | + [almapipy](https://github.com/UCDavisLibrary/almapipy) - Alma API wrapper
 64 | + [caiasoft-sdk-python](https://github.com/kstatelibraries/caiasoft-sdk-python) - SDK for Connecting to the CaiaSoft API
 65 | 
 66 | #### Transliteration / romanization
 67 | + [Aksharamukha](https://github.com/virtualvinodh/aksharamukha-python) - transliteration of 120 Indic languages
 68 | + [ArabicTransliterator](https://github.com/MTG/ArabicTransliterator) - ALA-LC transliteration tool for Arabic
 69 | + [cyrillic-transliteration](https://github.com/opendatakosovo/cyrillic-transliteration) - bi-directional transliteration of Cyrillic script to Latin script and vice versa
 70 | + [graphtransliterator](https://github.com/seanpue/graphtransliterator)
 71 | 
 72 | ### Data Analysis
 73 | #### Pandas
 74 | + [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/index.html)
 75 | + [Intro to Python: Pandas for Metadata Transformation and Cleanup / workshop by Michelle Janowiecki](https://mjanowiecki.github.io/intro-pandas-metadata/intro.html)
 76 | + [Speedy pandas / Michelle Janowiecki](https://docs.google.com/presentation/d/1xRdNVonTxi9-gEsQkNvbF1e47o_2cuo1iimunoFUky4/edit#slide=id.p)
 77 | + [All Pandas json_normalize() you should know for flattening JSON / B. Chen](https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd)
 78 | ### Data Validators
 79 | + [Pydantic official documentation](https://docs.pydantic.dev/latest/)
 80 | 
 81 | ### GUI
 82 | + [VisualTK](https://visualtk.com/) / (great starting point to visually create a GUI in Tkinter)
 83 | + [Gooey](https://pypi.org/project/Gooey/) (simple GUI package, transforms argparse into GUI)
 84 | 
 85 | ### HTTP
 86 | #### Requests
 87 | + [Requests official docs](https://requests.readthedocs.io/en/latest/)
 88 | + [Python's Requests Library (Guide) / Alex Ronquillo](https://realpython.com/python-requests/)
 89 | + [HTTPX official docs](https://www.python-httpx.org/)
 90 | #### Link checkers
 91 | + [LinkChecker official documentation](https://linkchecker.github.io/linkchecker/)
 92 | #### Retries
 93 | + [stamina official docs](https://stamina.hynek.me/en/stable/index.html)
 94 | + [tenacity official docs](https://tenacity.readthedocs.io/en/latest/)
 95 | 
 96 | ### Packaging
 97 | #### Briefcase (packaging)
 98 | + [Briefcase documentation](https://briefcase.readthedocs.io/en/latest/)
 99 | + [PyCon 2020 'Snakes In a Case' talk by Russell Keith-Magee](https://us.pycon.org/2020/schedule/presentation/126/)
100 | + [Qt for Python & Briefcase](https://doc.qt.io/qtforpython/deployment-briefcase.html)
101 | 
102 | #### PyInstaller (packaging)
103 | + [PyInstaller documentation](https://pyinstaller.org/en/stable/index.html)
104 | + [Easy Steps to Create an Executable in Python Using Pyinstaller / Renu Khandelwal](https://medium.com/swlh/easy-steps-to-create-an-executable-in-python-using-pyinstaller-cc48393bcc64)
105 | + [Using PyInstaller to Easily Distribute Python Applications / Luke Lee](https://realpython.com/pyinstaller-python/)
106 | + [auto-py-to-exe](https://pypi.org/project/auto-py-to-exe/) - PyInstaller made easy
108 | 
109 | ### QR codes
110 | + [QR Code Demystify / Ivan](https://ivantay2003.medium.com/qr-code-demystify-2a5263ab136e)
111 | + [python-barcode](https://python-barcode.readthedocs.io/en/stable/)
112 | + [PyQRCode](https://pythonhosted.org/PyQRCode/)
113 | + [pyzbar](https://github.com/NaturalHistoryMuseum/pyzbar/)
114 | 
115 | ### RDF
116 | + [rdflib](https://rdflib.readthedocs.io/en/stable/)
117 | + [Gephi](https://gephi.org)
118 | 
119 | ### Testing
120 | + [Intro to unit testing in Python / Yamil Suárez](https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing)
121 | 
122 | ### Visualization
123 | + [Python Data Visualization: Where to Start? : Interview with Chris Moffitt / Talk Python To Me: episode # 384](https://talkpython.fm/episodes/transcript/384/python-data-visualization-where-to-start) (a great overview of available tools)
124 | + [Data Visualization with Python and JavaScript, 2nd Edition / Kyran Dale](https://www.oreilly.com/library/view/data-visualization-with/9781098111861/)
126 | 


--------------------------------------------------------------------------------
/demo/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/demo/__init__.py


--------------------------------------------------------------------------------
/demo/pydantic.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": null,
  6 |    "id": "5a244171-3219-4bd5-9c97-ae59c7621536",
  7 |    "metadata": {
  8 |     "tags": []
  9 |    },
 10 |    "outputs": [],
 11 |    "source": [
 12 |     "!pip install pydantic"
 13 |    ]
 14 |   },
 15 |   {
 16 |    "cell_type": "markdown",
 17 |    "id": "0d596cd1-9e07-4d91-8903-1f28cc011c9f",
 18 |    "metadata": {},
 19 |    "source": [
 20 |     "## Processing data using dicts\n",
 21 |     "\n",
 22 |     "Dictionaries are the backbone of Python data structures, but errors are easy to miss with them because they do not enforce what kind of data you put into them."
 23 |    ]
 24 |   },
 25 |   {
 26 |    "cell_type": "code",
 27 |    "execution_count": null,
 28 |    "id": "690a8ae9-69a8-4fa9-8a5b-7e653e268263",
 29 |    "metadata": {
 30 |     "tags": []
 31 |    },
 32 |    "outputs": [],
 33 |    "source": [
 34 |     "def report_pet(pet_dict):\n",
 35 |     "    print(f\"My name is {pet_dict['name']} and I need {pet_dict['n_legs'] / 2} pairs of pants\")"
 36 |    ]
 37 |   },
 38 |   {
 39 |    "cell_type": "code",
 40 |    "execution_count": null,
 41 |    "id": "38adc86e-7f71-43d2-baa8-1567c339ec4b",
 42 |    "metadata": {
 43 |     "tags": []
 44 |    },
 45 |    "outputs": [],
 46 |    "source": [
 47 |     "json_1 = {\"name\": \"Mittens\", \"n_legs\": 4}\n",
 48 |     "json_2 = {\"name\": \"Slither\", \"n_legs\": 0}\n",
 49 |     "json_3 = {\"name\": \"Skitter\", \"n_legs\": \"8\"}\n",
 50 |     "json_4 = {\"n_legs\": 6}"
 51 |    ]
 52 |   },
 53 |   {
 54 |    "cell_type": "code",
 55 |    "execution_count": null,
 56 |    "id": "59e2d709-11fd-4003-b425-5157c8dfa3c1",
 57 |    "metadata": {
 58 |     "tags": []
 59 |    },
 60 |    "outputs": [],
 61 |    "source": [
 62 |     "report_pet(json_1)"
 63 |    ]
 64 |   },
 65 |   {
 66 |    "cell_type": "code",
 67 |    "execution_count": null,
 68 |    "id": "f455fb8e-d89a-4480-b957-c68c1ab2352b",
 69 |    "metadata": {
 70 |     "tags": []
 71 |    },
 72 |    "outputs": [],
 73 |    "source": [
 74 |     "report_pet(json_2)"
 75 |    ]
 76 |   },
 77 |   {
 78 |    "cell_type": "markdown",
 79 |    "id": "c28bfe76-1e30-4b8f-9c8d-07df4ce36ec4",
 80 |    "metadata": {},
 81 |    "source": [
 82 |     "The first two pets work fine because their dictionaries have data that happens to be valid. But things start to go wrong if we pass the wrong data type."
 83 |    ]
 84 |   },
 85 |   {
 86 |    "cell_type": "code",
 87 |    "execution_count": null,
 88 |    "id": "01d1833f-9931-4800-b445-bee966e91606",
 89 |    "metadata": {
 90 |     "tags": []
 91 |    },
 92 |    "outputs": [],
 93 |    "source": [
 94 |     "report_pet(json_3)"
 95 |    ]
 96 |   },
 97 |   {
 98 |    "cell_type": "markdown",
 99 |    "id": "5fc6b993-72b6-4046-8108-0c2bff3c50b0",
100 |    "metadata": {},
101 |    "source": [
102 |     "This error only comes up when we run our function to report on the pet - it doesn't check the data any earlier."
103 |    ]
104 |   },
105 |   {
106 |    "cell_type": "code",
107 |    "execution_count": null,
108 |    "id": "bdf85d03-10ff-4a95-a545-498f59910610",
109 |    "metadata": {
110 |     "tags": []
111 |    },
112 |    "outputs": [],
113 |    "source": [
114 |     "report_pet(json_4)"
115 |    ]
116 |   },
117 |   {
118 |    "cell_type": "markdown",
119 |    "id": "e4075caa-33fc-4139-a710-7683a2aebb7b",
120 |    "metadata": {
121 |     "execution": {
122 |      "iopub.execute_input": "2023-07-27T13:36:02.207076Z",
123 |      "iopub.status.busy": "2023-07-27T13:36:02.206472Z",
124 |      "iopub.status.idle": "2023-07-27T13:36:02.210591Z",
125 |      "shell.execute_reply": "2023-07-27T13:36:02.209948Z",
126 |      "shell.execute_reply.started": "2023-07-27T13:36:02.207056Z"
127 |     },
128 |     "tags": []
129 |    },
130 |    "source": [
131 |     "And when our dictionary is missing an entire field, Python raises a `KeyError` that names only the missing key, so we have to work out ourselves where the data went wrong."
132 |    ]
133 |   },
134 |   {
135 |    "cell_type": "markdown",
136 |    "id": "f67db9a6-3861-45c5-917b-468afe93344f",
137 |    "metadata": {},
138 |    "source": [
139 |     "## Processing data with Pydantic"
140 |    ]
141 |   },
142 |   {
143 |    "cell_type": "markdown",
144 |    "id": "47e53073-a6db-4986-a5da-17015803ff83",
145 |    "metadata": {},
146 |    "source": [
147 |     "[Pydantic](https://docs.pydantic.dev/latest/) uses Python type hints to define a class - a way of stating the exact shape of data we expect to receive."
148 |    ]
149 |   },
150 |   {
151 |    "cell_type": "code",
152 |    "execution_count": null,
153 |    "id": "b91eb9a1-37b1-4456-bf76-27939bebde1d",
154 |    "metadata": {
155 |     "tags": []
156 |    },
157 |    "outputs": [],
158 |    "source": [
159 |     "from pydantic import BaseModel\n",
160 |     "\n",
161 |     "class PydanticPet(BaseModel):\n",
162 |     "    name: str\n",
163 |     "    n_legs: int\n",
164 |     "\n",
165 |     "def report_pypet(pypet: PydanticPet):\n",
166 |     "    print(f\"My name is {pypet.name} and I need {pypet.n_legs / 2} pairs of pants\")"
167 |    ]
168 |   },
169 |   {
170 |    "cell_type": "markdown",
171 |    "id": "e3d62f5e-8077-4c5f-805d-6987f108e683",
172 |    "metadata": {},
173 |    "source": [
174 |     "Note that we aren't accessing dictionary keys with `[\"strings\"]` that may or may not succeed, but instead using dot notation `pypet.name` because we _know_ that every `PydanticPet` instance has an attribute called `name`."
175 |    ]
176 |   },
177 |   {
178 |    "cell_type": "code",
179 |    "execution_count": null,
180 |    "id": "2ce2f8c1-bcac-422c-bfef-74d3aea57489",
181 |    "metadata": {
182 |     "tags": []
183 |    },
184 |    "outputs": [],
185 |    "source": [
186 |     "pypet_1 = PydanticPet(**json_1)\n",
187 |     "# Using ** is a Python trick that passes a dictionary to a function by \"expanding\" it and putting in the key names as arguments\n",
188 |     "# pypet_1 = PydanticPet(name=\"Mittens\", n_legs=4)"
189 |    ]
190 |   },
191 |   {
192 |    "cell_type": "code",
193 |    "execution_count": null,
194 |    "id": "95382aac-56dd-4997-9e1c-060b585ab8fb",
195 |    "metadata": {
196 |     "tags": []
197 |    },
198 |    "outputs": [],
199 |    "source": [
200 |     "pypet_1"
201 |    ]
202 |   },
203 |   {
204 |    "cell_type": "code",
205 |    "execution_count": null,
206 |    "id": "2068e358-1139-4174-9571-4e07869b0f58",
207 |    "metadata": {
208 |     "tags": []
209 |    },
210 |    "outputs": [],
211 |    "source": [
212 |     "report_pypet(pypet_1)"
213 |    ]
214 |   },
215 |   {
216 |    "cell_type": "code",
217 |    "execution_count": null,
218 |    "id": "d8755600-1b92-4b9b-ba61-62e7cc5bbff6",
219 |    "metadata": {
220 |     "tags": []
221 |    },
222 |    "outputs": [],
223 |    "source": [
224 |     "pypet_2 = PydanticPet(**json_2)"
225 |    ]
226 |   },
227 |   {
228 |    "cell_type": "code",
229 |    "execution_count": null,
230 |    "id": "77225e76-eb12-43b2-8941-1936178e2e64",
231 |    "metadata": {
232 |     "tags": []
233 |    },
234 |    "outputs": [],
235 |    "source": [
236 |     "pypet_2"
237 |    ]
238 |   },
239 |   {
240 |    "cell_type": "code",
241 |    "execution_count": null,
242 |    "id": "041a1ea7-7c8b-464f-8755-2fd1e1ace059",
243 |    "metadata": {
244 |     "tags": []
245 |    },
246 |    "outputs": [],
247 |    "source": [
248 |     "report_pypet(pypet_2)"
249 |    ]
250 |   },
251 |   {
252 |    "cell_type": "markdown",
253 |    "id": "8fe78f14-d817-4dbf-974a-48b05ae5a0b8",
254 |    "metadata": {},
255 |    "source": [
256 |     "Pydantic can automate certain kinds of data parsing, such as converting the string `\"8\"` to the integer `8`."
257 |    ]
258 |   },
259 |   {
260 |    "cell_type": "code",
261 |    "execution_count": null,
262 |    "id": "3a2519f3-ebdf-41d1-bda5-e342a2520a93",
263 |    "metadata": {
264 |     "tags": []
265 |    },
266 |    "outputs": [],
267 |    "source": [
268 |     "pypet_3 = PydanticPet(**json_3)"
269 |    ]
270 |   },
271 |   {
272 |    "cell_type": "code",
273 |    "execution_count": null,
274 |    "id": "e779abfa-9808-47c8-8490-f3883070c14c",
275 |    "metadata": {
276 |     "tags": []
277 |    },
278 |    "outputs": [],
279 |    "source": [
280 |     "pypet_3"
281 |    ]
282 |   },
283 |   {
284 |    "cell_type": "code",
285 |    "execution_count": null,
286 |    "id": "ee16367b-4cdf-48cb-8110-2f2d4bf32355",
287 |    "metadata": {
288 |     "tags": []
289 |    },
290 |    "outputs": [],
291 |    "source": [
292 |     "report_pypet(pypet_3)"
293 |    ]
294 |   },
295 |   {
296 |    "cell_type": "code",
297 |    "execution_count": null,
298 |    "id": "1b66ef5f-21f3-4b8d-b43e-12fa05800f12",
299 |    "metadata": {
300 |     "tags": []
301 |    },
302 |    "outputs": [],
303 |    "source": [
304 |     "PydanticPet(**json_4)"
305 |    ]
306 |   },
307 |   {
308 |    "cell_type": "markdown",
309 |    "id": "da3c71cf-354d-4b22-be22-ba0eaec91afd",
310 |    "metadata": {},
311 |    "source": [
312 |     "Pydantic raises a `ValidationError` that provides a clear reason why the data passed in was invalid."
313 |    ]
314 |   },
315 |   {
316 |    "cell_type": "markdown",
317 |    "id": "b0c0aa3e-b907-4317-bb71-3615c15da220",
318 |    "metadata": {},
319 |    "source": [
320 |     "## Nesting and lists\n",
321 |     "\n",
322 |     "Pydantic models can refer to other pydantic models, and can nest lists of data too."
323 |    ]
324 |   },
325 |   {
326 |    "cell_type": "code",
327 |    "execution_count": null,
328 |    "id": "0ab0b456-873c-4a68-882d-e5c1abdcfc03",
329 |    "metadata": {
330 |     "tags": []
331 |    },
332 |    "outputs": [],
333 |    "source": [
334 |     "class PetDaycare(BaseModel):\n",
335 |     "    name: str\n",
336 |     "    founding_year: int | None = None  # None is allowed, and the None default makes founding_year optional\n",
337 |     "    current_pets: list[PydanticPet] = []"
338 |    ]
339 |   },
340 |   {
341 |    "cell_type": "code",
342 |    "execution_count": null,
343 |    "id": "01ef018e-74d3-4c89-89ad-ebd71205f584",
344 |    "metadata": {
345 |     "tags": []
346 |    },
347 |    "outputs": [],
348 |    "source": [
349 |     "local_daycare = PetDaycare(name=\"All Things That Crawl\")"
350 |    ]
351 |   },
352 |   {
353 |    "cell_type": "code",
354 |    "execution_count": null,
355 |    "id": "570f829b-1540-40cb-84c3-1a380b63e4b6",
356 |    "metadata": {
357 |     "tags": []
358 |    },
359 |    "outputs": [],
360 |    "source": [
361 |     "local_daycare"
362 |    ]
363 |   },
364 |   {
365 |    "cell_type": "code",
366 |    "execution_count": null,
367 |    "id": "6a179a22-3674-4dd9-be92-d624630e482a",
368 |    "metadata": {
369 |     "tags": []
370 |    },
371 |    "outputs": [],
372 |    "source": [
373 |     "local_daycare.current_pets.append(pypet_1)\n",
374 |     "local_daycare.current_pets.append(pypet_2)\n",
375 |     "local_daycare.current_pets.append(pypet_3)"
376 |    ]
377 |   },
378 |   {
379 |    "cell_type": "code",
380 |    "execution_count": null,
381 |    "id": "07beecd1-2281-4550-af8b-22780f19887c",
382 |    "metadata": {
383 |     "tags": []
384 |    },
385 |    "outputs": [],
386 |    "source": [
387 |     "local_daycare"
388 |    ]
389 |   },
390 |   {
391 |    "cell_type": "code",
392 |    "execution_count": null,
393 |    "id": "68d8f84b-9073-493d-a52f-e6130c28e25f",
394 |    "metadata": {
395 |     "tags": []
396 |    },
397 |    "outputs": [],
398 |    "source": [
399 |     "for pet in local_daycare.current_pets:\n",
400 |     "    report_pypet(pet)"
401 |    ]
402 |   },
403 |   {
404 |    "cell_type": "markdown",
405 |    "id": "1115a0bd-dcf9-4d21-b4c5-0e70d4ad7810",
406 |    "metadata": {},
407 |    "source": [
408 |     "One of the biggest uses of pydantic is serializing data to JSON to be used in API servers."
409 |    ]
410 |   },
411 |   {
412 |    "cell_type": "code",
413 |    "execution_count": null,
414 |    "id": "4de9d024-fe4d-41c0-b342-a3eed2218758",
415 |    "metadata": {
416 |     "tags": []
417 |    },
418 |    "outputs": [],
419 |    "source": [
420 |     "local_daycare.json()"
421 |    ]
422 |   },
423 |   {
424 |    "cell_type": "markdown",
425 |    "id": "60a1fc4c-320f-4254-92a2-5a9dba595f35",
426 |    "metadata": {},
427 |    "source": [
428 |     "Pydantic can also autogenerate a JSON Schema that can power API documentation pages."
429 |    ]
430 |   },
431 |   {
432 |    "cell_type": "code",
433 |    "execution_count": null,
434 |    "id": "7850fda9-051c-4731-904e-0ca62c7c9777",
435 |    "metadata": {
436 |     "tags": []
437 |    },
438 |    "outputs": [],
439 |    "source": [
440 |     "PetDaycare.schema()"
441 |    ]
442 |   },
443 |   {
444 |    "cell_type": "code",
445 |    "execution_count": null,
446 |    "id": "bfe1f33c-5424-44e3-b783-e636675ddd24",
447 |    "metadata": {},
448 |    "outputs": [],
449 |    "source": []
450 |   }
451 |  ],
452 |  "metadata": {
453 |   "kernelspec": {
454 |    "display_name": "Python 3 (ipykernel)",
455 |    "language": "python",
456 |    "name": "python3"
457 |   },
458 |   "language_info": {
459 |    "codemirror_mode": {
460 |     "name": "ipython",
461 |     "version": 3
462 |    },
463 |    "file_extension": ".py",
464 |    "mimetype": "text/x-python",
465 |    "name": "python",
466 |    "nbconvert_exporter": "python",
467 |    "pygments_lexer": "ipython3",
468 |    "version": "3.10.6"
469 |   }
470 |  },
471 |  "nbformat": 4,
472 |  "nbformat_minor": 5
473 | }
474 | 
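
The two failure modes the notebook describes for plain dicts can be reproduced with the standard library alone. The sketch below reuses the notebook's sample data, with two assumptions made for illustration: `report_pet` returns its message instead of printing it, and `try_report` is a hypothetical helper that is not part of the notebook.

```python
def report_pet(pet_dict):
    # Same message as the notebook's function, but returned rather than printed
    return f"My name is {pet_dict['name']} and I need {pet_dict['n_legs'] / 2} pairs of pants"


def try_report(pet_dict):
    """Hypothetical helper: return the report, or the name of the exception raised."""
    try:
        return report_pet(pet_dict)
    except (TypeError, KeyError) as exc:
        return type(exc).__name__


json_1 = {"name": "Mittens", "n_legs": 4}
json_3 = {"name": "Skitter", "n_legs": "8"}
json_4 = {"n_legs": 6}

print(try_report(json_1))  # valid data: the report string
print(try_report(json_3))  # TypeError: the string "8" cannot be divided by 2
print(try_report(json_4))  # KeyError: the "name" key is missing
```

In both failing cases the problem surfaces only when the data is used, which is the gap the Pydantic model in the notebook closes by validating at construction time.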


--------------------------------------------------------------------------------
/demo/requests_session_example.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Pros:
 3 | The `requests.Session` object lets you persist parameters across all requests issued within the session.
 4 | When a service requires authentication, the session stores the credentials and reuses them
 5 | for subsequent calls.
 6 | If the service you connect to supports keep-alive connections, the session reuses the same connection
 7 | across requests instead of establishing a new one for each request.
 8 | 
 9 | To install the Requests library:
10 | `pip install requests`
11 | """
12 | 
13 | 
14 | from requests import Session
15 | 
16 | 
17 | def print_request_headers(r, *args, **kwargs):
18 |     print(r.url, r.request.headers)
19 | 
20 | 
21 | def make_multiple_requests_in_session():
22 |     """
23 |     Issue multiple requests to the same service (id.loc.gov),
24 |     persist the connection and attach appropriate headers.
25 | 
26 |     id.loc.gov does not require authentication, but other services
27 |     may. Credentials or access tokens can be stored in the session object and used
28 |     for each request.
29 |     """
30 |     with Session() as session:
31 |         session.headers.update(
32 |             {"User-Agent": "my_email", "Accept": "application/json"}
33 |         )  # will attach these parameters to each request header during the session
34 |         timeout = 5  # seconds; Session has no session-wide timeout, so pass it on every call
35 | 
36 |         terms = ["sh85080541", "sh91002704", "sh85088368"]
37 |         for term in terms:
38 |             url = f"https://id.loc.gov/authorities/subjects/{term}"
39 |             response = session.get(url, hooks={"response": print_request_headers}, timeout=timeout)
40 |             if response.status_code == 200:
41 |                 yield response.json()
42 |             else:
43 |                 continue
44 | 
45 | 
46 | if __name__ == "__main__":
47 |     results = make_multiple_requests_in_session()
48 |     for response in results:
49 |         # do something with each response
50 |         pass
51 | 
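
The header persistence described in the module docstring can be checked without sending anything over the network. This is a minimal sketch, assuming the Requests library is installed: `Session.prepare_request` merges the session-level headers into a request that is built but never sent. The `application/rdf+xml` override is an illustrative value, not something the script above uses.

```python
import requests

session = requests.Session()
session.headers.update({"User-Agent": "my_email", "Accept": "application/json"})

url = "https://id.loc.gov/authorities/subjects/sh85080541"

# Build a request without sending it; session headers are merged in
prepared = session.prepare_request(requests.Request("GET", url))
print(prepared.headers["User-Agent"], prepared.headers["Accept"])

# A header set on an individual request overrides the session-level value
override = session.prepare_request(
    requests.Request("GET", url, headers={"Accept": "application/rdf+xml"})
)
print(override.headers["Accept"])

session.close()
```

Headers set on the session act as defaults, so a single call can still ask for a different representation without changing the session itself.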


--------------------------------------------------------------------------------
/media/airflow1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow1.png


--------------------------------------------------------------------------------
/media/airflow2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow2.png


--------------------------------------------------------------------------------
/media/airflow3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow3.png


--------------------------------------------------------------------------------
/media/airflow4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow4.png


--------------------------------------------------------------------------------
/media/airflow5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow5.png


--------------------------------------------------------------------------------
/media/airflow7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow7.png


--------------------------------------------------------------------------------
/media/env-mgmt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/env-mgmt.png


--------------------------------------------------------------------------------
/media/marc1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/marc1.png


--------------------------------------------------------------------------------
/media/marc2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/marc2.png


--------------------------------------------------------------------------------
/media/pax-opex1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/pax-opex1.png


--------------------------------------------------------------------------------
/media/pax-opex2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/pax-opex2.png


--------------------------------------------------------------------------------
/media/pax-opex3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/pax-opex3.png


--------------------------------------------------------------------------------
/mtg_notes.md:
--------------------------------------------------------------------------------
  1 | ### May 16, 2024 (Code4Lib Post-Conference Session)
  2 |   + Eric Phetteplace ran a workshop on [Python4Lib](https://2024.code4lib.org/workshop/Python4Lib) at Code4Lib 2024 in Ann Arbor
  3 |   + Started with an open discussion where we talked about people's experience with Python and some general topics
  4 |     + Some folks were mainly familiar with running Python in notebooks, others were more familiar with running Python scripts
  5 |     + We spoke a bit about managing dependencies and tools like Pipenv/Poetry that help with this and abstract over virtual environments
  6 |     + We discussed asyncio and asynchronous programming generally: when to use it, what types of problems it addresses, and CPU-bound (computation-heavy) vs. IO-bound (network/file-heavy) tasks
  7 |     + Eric introduced his [`marcgrep`](https://github.com/phette23/marcgreppy) CLI tool for searching MARC records
  8 |   + We worked through the [c4l24-python4lib](https://github.com/phette23/c4l24-python4lib) repo which has notebooks on several topics. The only topics we covered specifically were:
  9 |     + [Jupyter Notebooks](https://github.com/phette23/c4l24-python4lib/blob/main/docs/notebooks.md) (the material was delivered as notebooks)
 10 |     + [Pymarc](https://github.com/phette23/c4l24-python4lib/blob/main/docs/pymarc.ipynb) and common usage patterns, the most foolproof ways to get and modify record information
 11 |     + [Pandas](https://github.com/phette23/c4l24-python4lib/blob/main/docs/pandas.ipynb) and its fundamental concepts (DataFrames, Series), how to summarize loaded data, stopped after introducing how to filter via bracket expressions
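
To make the IO-bound case from the asyncio discussion concrete, here is a minimal sketch using only the standard library. `fake_request` is a made-up stand-in for a network call (the sleep plays the role of latency); it is not code from the workshop.

```python
import asyncio
import time


async def fake_request(record_id, delay=0.1):
    """Hypothetical IO-bound call; asyncio.sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return record_id


async def fetch_sequentially(ids):
    # Each await finishes before the next call starts, so the waits add up.
    return [await fake_request(i) for i in ids]


async def fetch_concurrently(ids):
    # gather() schedules every coroutine at once, so the waits overlap.
    return list(await asyncio.gather(*(fake_request(i) for i in ids)))


def timed(coro_fn, ids):
    """Run an async function to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(ids))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(10))
    _, slow = timed(fetch_sequentially, ids)
    _, fast = timed(fetch_concurrently, ids)
    print(f"sequential: {slow:.2f}s, concurrent: {fast:.2f}s")
```

This only helps when the program spends its time waiting. A CPU-bound task has no waiting to overlap, so asyncio alone would not speed it up.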
 12 | 
 13 | ### April 30, 2024
 14 |   + David asked if anyone had experience with or knew of any automated discard assessment tools
 15 |     + Javier said he has 25,000 volumes to assess for discard
 16 |     + Tomasz said other groups may know more about these types of tools because tech services may not have responsibility for collections assessment. Reference librarians may know more about potential tools to use.
 17 |     + Sara Amato has used OCLC API “to look at WC holdings and compare also to HathiTrust and comparisons to other libraries in our group to help make decisions - not great for large scale projects but good for smaller lists. I don’t have the code up anywhere though… and it doesn’t have any item level data like circ.”
 18 |  + Tomasz asked if Pymarc will have a new release due to a change in how indicators are handled
 19 |    + Indicators will be a named tuple that can only have two positions rather than a list which could be of any length
 20 |      + The change is outlined in this merge request: https://gitlab.com/pymarc/pymarc/-/merge_requests/206
 21 |    + Ed: No scheduled release, reluctant to introduce another major version with breaking changes
 22 |    + More discussion of the change is in the [pymarc google group](https://groups.google.com/g/pymarc/c/cMkDb-dDDBY?pli=1)
 23 |  + Michael asked if anyone has experience working with APIs for wikimedia/wikimedia commons
 24 |    + He has copyright free newspaper images he would like to upload in bulk as PDFs (rather than image files which the other wikicommons tools can use)
 25 |    + Javier mentioned using the APIs to get data out of wikimedia commons but not to POST data
 26 |  + Tomasz asked about Michael’s involvement in movement to preserve Ukrainian cultural heritage materials after the start of the full scale invasion
 27 |    + Michael noted there are two parts to this preservation work:
 28 |      + [SUCHO](https://www.sucho.org/) works on preserving publicly available materials
 29 |      + There is a separate effort to back up digital materials that are not publicly available
 30 |    + Michael mentioned Maryna Paliienko, a Fulbright Scholar from Taras Shevchenko University, whose project focuses on archives
 31 |      + Maryna and Michael recently gave a presentation at NYU: https://www.nycarchivists.org/event-5671162
 32 |  + Michelle asked for help figuring out why her API calls hang when she tries to upload large files
 33 |    + Files are ~2GB and she is posting them using the DSpace API. The files have to be read in binary before uploading them and the requests just hang after uploading the file successfully
 34 |    + Yamil mentioned that Python has issues with downloading files that are larger than available RAM and wondered if it has a similar issue with uploading files larger than available RAM
 35 |      + He also provided link to streaming uploads with Requests: https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads
 36 |    + Impromptu code review: https://github.com/mjanowiecki/dspace7-rest-api/blob/main/post/postItemsToCollection.py
 37 |      + Susan asked if the code is sending the correct residual size
 38 |      + If chunks are in unequal sizes (or the last chunk is not the same size as the others), the API will wait for the last chunk to reach the size of the other chunks
 39 |      + Ed said it could be helpful to add the complete upload size in the content-length header with the POST request
 40 |    + Michelle provided a link to a tool that makes it easier to authenticate using the DSpace API: https://github.com/the-library-code/dspace-rest-python/tree/main
 41 |  + John asked if anyone had recommendations for tools to use to take messy data from google docs and publish it to a dashboard a couple of times a year
 42 |    + Has been looking at [Streamlit](https://streamlit.io/) and [Pygwalker](https://github.com/Kanaries/pygwalker) as potential options
 43 |      + Pygwalker has tableau-like display
 44 |    + Jeremy used streamlit for a project with Hopkins Marine Station: https://taxa.stanford.edu/
 45 |      + One issue he noted was that every time a user would interact with the dashboard it would completely reload
 46 |  + Michael mentioned stumbling across a tool called [Discorpy](https://discorpy.readthedocs.io/en/latest/index.html) and thought it may be of interest after discussion in last Python4Lib session about image cropping/manipulation
 47 |    + It is a tool for measuring lens distortion in a camera
 48 |  + Yamil mentioned he is learning about [SeleniumBase](https://seleniumbase.io/)
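
As a rough illustration of the streaming-upload idea from the large-file discussion above: the sketch below reads a file in fixed-size chunks and, separately, hands an open file object to Requests so the body is streamed from disk rather than loaded into RAM. `iter_chunks` and `stream_upload` are hypothetical helpers, the URL is whatever endpoint you are posting to, and the request format the DSpace API actually expects is not covered here.

```python
import os
import tempfile


def iter_chunks(path, chunk_size=1024 * 1024):
    """Yield a file's bytes in fixed-size pieces; the last piece may be shorter."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def stream_upload(url, path, headers=None):
    """Hypothetical helper: POST a file without reading it all into memory."""
    import requests  # third-party; imported here so iter_chunks works without it

    with open(path, "rb") as handle:
        # Given a file object, Requests streams the body from disk.
        return requests.post(url, data=handle, headers=headers or {})


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 2500)
    sizes = [len(chunk) for chunk in iter_chunks(tmp.name, chunk_size=1000)]
    print(sizes, sum(sizes) == os.path.getsize(tmp.name))
    os.remove(tmp.name)
```

Note that the final chunk is smaller than the others, which is the "residual size" situation raised in the code review.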
 49 | 
 50 | ### April 16, 2024
 51 |  + David provided an update on the upcoming Python4Lib presentation schedule:
 52 |    + April 30 - open topics
 53 |    + May 14 - skipped, C4L in person
 54 |    + May 28 - Thomas will be talking Jupyter Kernel Gateways
 55 |    + June 11 - Rebecca will be talking Postman
 56 |  + Eric Phetteplace spoke about hosting a Python4Lib workshop at the upcoming Code4Lib conference
 57 |    + https://2024.code4lib.org/workshop/Python4Lib
 58 |    + He mentioned that he would welcome a volunteer to help with the session and that he can probably get the cost of the workshop refunded for the volunteer
 59 |      + It’ll be a loose conversation similar to a Python4Lib meeting and will cover more specific topics in the second half
 60 |      + He mentioned asyncio as a potential topic he would like to explore in the session
 61 |  + Eric spoke about getting access to some High Performance Computing and exploring parallel processing
 62 |    + He mentioned that this set up has a “head node” that coordinates with the other nodes
 63 |    + We shared some links with information on parallel work in Python
 64 |      + https://realpython.com/python-concurrency/
 65 |      + https://docs.python.org/3/library/multiprocessing.html
 66 |      + https://realpython.com/async-io-python/
 67 |      + https://realpython.com/python-gil/
 68 |    + Then we spent a long time talking about the pros and cons of doing parallel work with Python
 69 |      + Clinton had some details and examples of why Python’s language design makes it very slow for parallel work compared to languages like Rust and C
 70 |      + The GIL is becoming optional in Python 3.13: https://www.blog.pythonlibrary.org/2023/08/16/global-interpreter-lock-optional-in-python-3-13/
 71 |  + We also talked about how, even though Python is slower than other languages, you can update existing Python code/projects to use the current parallel options in Python and in many situations still get really good improvements in performance
 72 |    + Michelle shared an example of working with the Alma API using asyncio
 73 |    + Her work went from a runtime of 1 hour for 2000 API calls to 5 minutes for 2000 API calls
 74 |      + https://github.com/jhu-library-applications/alma-api/blob/main/updateItemFieldsFromCSVAsync.py
 75 |      + Her code updates Alma items from a CSV, doing batches of 1000 rows at a time from the spreadsheet (to help catch errors in more manageable sets)
 76 |    + Clinton also shared a Python profiler, to help see what parts of your code are running slow/fast and which parts are using C-based code (which runs faster)
 77 |      + https://github.com/plasma-umass/scalene
 78 |      + He also shared a presentation on Python performance
 79 |        + [Python Performance Matters by Emery Berger (Strange Loop 2022)](https://www.youtube.com/watch?v=vVUnCXKuNOg)
 80 |    + Jerrell asked if anyone had been working on AI assisted image cropping
 81 |      + No one had worked on this yet but many people are interested in the topic
 82 |    + We briefly talked about the use of [Whisper (from OpenAI)](https://openai.com/research/whisper) to create transcripts of videos
 83 |      + We also spoke about [Otter AI](https://otter.ai/), another transcript platform that can use Zoom
 84 |    + Handprint also came up
 85 |      + https://2022.code4lib.org/talks/Handprint-A-program-to-explore-and-compare-major-cloudbased-services-for-handwritten-text-recognition
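
The batching approach described above (working through a spreadsheet in fixed-size batches while the calls inside each batch run concurrently) might look roughly like this. It is a sketch, not Michelle's code: `update_item` is a made-up stand-in for an API call, and the `barcode` field and the concurrency limit are assumptions.

```python
import asyncio


def batches(rows, size):
    """Split a list of rows into consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


async def update_item(row, limit):
    """Hypothetical stand-in for one API call; returns (barcode, succeeded)."""
    async with limit:               # cap how many calls are in flight at once
        await asyncio.sleep(0.001)  # pretend to wait on the network
        return row["barcode"], True


async def run_batches(rows, batch_size=1000, max_in_flight=20):
    """Process rows batch by batch and return the barcodes that failed."""
    limit = asyncio.Semaphore(max_in_flight)
    failures = []
    for batch in batches(rows, batch_size):
        results = await asyncio.gather(*(update_item(row, limit) for row in batch))
        # Checking results per batch keeps errors in manageable sets.
        failures.extend(barcode for barcode, ok in results if not ok)
    return failures


if __name__ == "__main__":
    rows = [{"barcode": str(n)} for n in range(2500)]
    print(len(asyncio.run(run_batches(rows))), "failures")
```

The semaphore is one way to avoid flooding an API with thousands of simultaneous requests; the right limit depends on the service's rate limits.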
 86 | 
 87 | ### April 2, 2024
 88 |  + Charlotte and Tomasz have released a new [version (1.0) of Bookops-Worldcat](https://github.com/BookOps-CAT/bookops-worldcat), a Python wrapper for the WorldCat Metadata API.
 89 |    + The new version supports changes made in [version 2.0 of the Metadata API](https://developer.api.oclc.org/wc-metadata-v2).
 90 |    + The documentation is available on GitHub pages: https://bookops-cat.github.io/bookops-worldcat/
 91 |  + Lauren at Rice is working on a reclamation project, gave a shoutout to Rebecca for some python notes she shared in the past.
 92 |    + Here is Rebecca’s code:
 93 |      + Pulls specified data from holdings records in Alma, using the Bibs API
 94 |      + https://github.com/LibraryNinja/Holdings_Record_Inpsector
 95 |  + Rebecca talked about her recent work using Tkinter. She has been changing code written using PySimpleGUI to Tkinter after PySimpleGUI changed its licensing to require a fee for higher ed use.
 96 |      + https://docs.python.org/3/library/tkinter.html
 97 |      + https://realpython.com/python-gui-tkinter/
 98 |      + https://github.com/TomSchimansky/CustomTkinter
 99 |    + Someone asked Rebecca for beginner Tkinter resources and she recommended two courses/videos
100 |      + [Create Graphical User Interfaces With Python And TKinter](https://www.youtube.com/playlist?list=PLCC34OHNcOtoC6GglhF3ncJ5rLwQrLGnV)
101 |      + [A Linkedin Learning Course](https://www.linkedin.com/learning/python-gui-development-with-tkinter-2?u=2147385)
102 |    + Eric asked if one can create a single executable with a custom desktop icon for the resulting app with Tkinter
103 |      + Rebecca said it is possible, but would require the use of a packaging utility
104 |        + Rebecca: “PyInstaller is the thing that packages it all up using the command line, Auto-py-to-exe is a layer on top for it”
105 |  + Emily had a question about using pymarc for some batch edits, but it did not work as she hoped(?)
106 |    + “At my institution, we’ve got one person (me) identifying OCLC numbers for changes in one, now pymarc script, that a second person then feeds into the Metadata API 2.0 to make changes. Using the BookOps library would we be able to integrate the script searching for identifiers with the script that makes batch changes?”
107 |  + Charles shared a new project he and Eddie are working on using Flask to connect to the Alma API
108 |    + https://flask.palletsprojects.com/en/3.0.x/
109 |    + https://en.wikipedia.org/wiki/Flask_(web_framework)
110 |    + The application lives on the Azure cloud, but it runs via Docker for local tests and on the cloud
111 |  + Javier asked about Charles' use of ChatGPT 4, if he could share reasons to justify the cost of chatGPT 4
112 |    + Javier also asked about the various “personas” that Charles used.
113 |    + Charles then explained how to give “context” to each “persona,” such as stating that the human user is already experienced in programming.
114 |    + Charles also mentioned that he asks chatGPT questions that chatGPT may need answered before it can properly answer a particular prompt (or all prompts going forward for a single “persona”)
115 |    + Charles also recommended other LLMs that worked well for him for code questions if you cannot pay for ChatGPT 4 (some of the ones below have paid versions too)
116 |      + https://www.phind.com/search
117 |      + https://www.anthropic.com/claude
118 | 
119 | ### March 19th, 2024
120 |  + Yamil and Charlotte gave a presentation on Python Virtual Environments & requirements.txt
121 |    + https://docs.google.com/presentation/d/1XvnmQFdCkBWnD4javgJ0SPn-Uzp7F8if4dIh6qPxKos/edit?usp=sharing
122 |  + Q&A/Discussion
123 |    + Using pyproject.toml vs. requirements.txt
124 |      + pyproject.toml files are more complex/powerful
125 |      + this should be a presentation topic in the future
126 |      + https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
127 |    + Dependency management and how to properly deploy code to someone else’s machine
128 |    + pipx: https://github.com/pypa/pipx
129 |      + how to install packages globally while still keeping them separate from the global Python install
130 | 
131 | ### March 5th, 2024
132 |  + Rebecca mentioned that PySimpleGUI has moved to a license model and was wondering if it is common for a package to move to a closed license
133 |    + Clinton mentioned he has seen it maybe 5 times
134 |    + It makes projects very brittle because every person needs to get a key annually
135 |  + We discussed alternatives to PySimpleGUI
136 |    + TKinter: https://docs.python.org/3/library/tkinter.html
137 |    + PyQt: https://wiki.python.org/moin/PyQt
138 |    + Clinton also mentioned using a python backend with a simple HTML frontend in the past as a potential alternative to PySimpleGUI
139 |      + If the project doesn't need the user interface to change, the project won't require any JavaScript
140 |      + Buttons can send calls to Flask endpoints
141 |        + Example: randomizing math exercises from text book
142 |        + Basic inputs with some rendering in Flask
143 |        + It has a low barrier to entry
144 |        + The python is running locally and you type in the local host in the browser
145 |        + Will always use a browser as the front end
146 |    + Brooks mentioned [FastUI](https://github.com/pydantic/FastUI) and [DearPyGUI](https://github.com/hoffstadt/DearPyGui)
147 |      + https://talkpython.fm/episodes/show/348/dear-pygui-simple-yet-fast-python-gui-apps
148 |  + Tomasz mentioned that python isn’t really known for windows apps especially because TKinter is part of the standard library but looks very dated
149 |    + The library isn’t copied into your virtual environment
150 |    + https://beeware.org/project/projects/libraries/toga/
151 |    + Rebecca mentioned TTKbootstrap: https://ttkbootstrap.readthedocs.io/en/latest/
152 |  + Rebecca asked how to ensure that one won’t be burned in the future
153 |    + Clinton suggested focussing on tools with very wide adoption (like Flask or Django)
154 |    + Tools that are widely used can’t make that sort of change without it being too disruptive
155 |  + If anyone would like to evaluate any of these tools and present on their findings it would be a welcome presentation
156 |  + Rebecca mentioned a self-checkout tool that she is developing and asked for feedback
157 |    + She is working with a group within CUNY to develop this tool
158 |    + It will run in a terminal where someone could enter their User ID and check out a book
159 |  + Charlotte asked for feedback on [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat)
160 |  + David mentioned that he and Lauren are working on an OCLC reclamation using [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat)
161 |  + Clinton offered to present on creating simple APIs in the future
162 |    + Eric said he was interested in learning more about FastAPI
163 |    + Tomasz asked about Jupyter Kernel Gateway to implement a local API to query from within an OpenRefine project
164 |      + https://github.com/MichaelMarkert/GND4C/blob/main/APIs_for_OpenRefine/localAPI.ipynb
165 |  + Kate asked about adding 758 fields to ILS records
166 |    + She is exploring adding them to their collection in a batch
167 | 
168 | ### February 20th, 2024
169 | (Missing notes from Jeremy's presentation on pyscript)
170 | 
171 | 
172 | ### February 6, 2024
173 |  + Upcoming scheduled presentations/chats:
174 |    + Jeremy Nelson will talk about [pyscript](https://pyscript.net/) on Feb 20
175 |    + Charlotte and Yamil will be talking virtual environments on Mar 19
176 |  + Rebecca recently gave a chat about something she built with [PySimpleGUI](https://www.pysimplegui.org/en/latest/)
177 |    + there will be a video of this soon
178 |  + Michael went over how he solved his PDF batch change issue by using [pikePDF](https://pikepdf.readthedocs.io/en/latest/)
179 |    + He just wanted to batch change some simple low level PDF file metadata like the “author” field for the whole PDF file, but pikePDF can do a lot more with PDFs
180 |    + He mentioned how PDFs save file metadata in two ways, but pikePDF helps him access either
181 |    + He also mentioned an older Perl based tool called `exiftool` that is good for grabbing file metadata info
182 |      + https://exiftool.org/
183 |  + He fired up the [PyCharm Python IDE](https://www.jetbrains.com/pycharm/) and ran the debugger on some sample code to show us some issues that he initially had, but has since solved
184 |     ```
185 |     from pikepdf import Pdf
186 | 
187 |     with Pdf.open('original.pdf') as pdf:
188 |         with pdf.open_metadata() as meta:  # metadata edits are applied when this block exits
189 |             del meta['dc:description']
190 |             del meta['pdf:Keywords']
191 |         pdf.save('clean.pdf')  # write the cleaned copy to a new file
192 | 
193 |     ```
194 |  + Yamil mentioned the upcoming PyCon 2024 and its $100 online-only registration option. The videos will also be posted on their YouTube channel after a month or so.
195 |    + https://us.pycon.org/2024/
196 |    + https://us.pycon.org/2024/attend/information/
197 |  + David asked about any new projects people have started with Python lately
198 |    + He mentioned that he is teaching a colleague to update OCLC holdings with Python using the OCLC Metadata API
199 |    + He also mentioned [bookops-worldcat](https://bookops-cat.github.io/bookops-worldcat/0.5/), Tomasz's library that acts as a “wrapper” for use with the OCLC Metadata API
200 |      + “... Bookops-Worldcat is a Python wrapper around OCLC’s Worldcat Metadata API which supports changes released in the version 1.1 (May 2020) of the web service. The package features methods that utilize search functionality of the API as well as read-write endpoints. The Bookops-Worldcat package simplifies some of the OCLC API boilerplate, and ideally lowers the technological threshold for cataloging departments that may not have sufficient programming support to access and utilize those web services. Python language, with its gentle learning curve, has the potential to be a perfect vehicle towards this goal. ...”
201 |    + David said he will share some sample code to show how he uses the OCLC Metadata API to update holdings with Python
202 |  + Alison asked if anyone has successfully used Alma APIs and scripting to bulk change loan due dates for expired patrons
203 |    + Alma doesn’t automatically do this when patron expiration dates change, which is a huge issue.
204 |      + Rebecca: I haven’t changed loan dates but I have done other small things with the user/fulfillment API so far
205 |      + Matt: I’ve used Python & the API once or twice to make bulk change due dates for specific users, but it’s been a while. Should be possible to do what you’re asking, though
206 |      + David: I think our systems librarian does something like that at the end of the semester or FY. I can check with him and see if there’s anything he’d be willing to share.
207 | 
208 | ### January 23, 2024
209 | + Mike was having issues making bulk edits to the built-in metadata (eg. author) in PDF files using the [pypdf module](https://pypi.org/project/pypdf/)
210 |   + repo: https://github.com/py-pdf/pypdf
211 |   + Daniel suggested he try a module like [PyExifTool](https://pypi.org/project/PyExifTool/) that taps into exif data
212 | + David mentioned that his library is migrating into Ex Libris Alma/Primo in the near future.
213 |   + He asked about existing Alma API wrappers and whether anyone had experience using them
214 |   + No one had suggestions for an API wrapper for Alma but many suggested he ask on the various Code4lib Slack channels
215 |   + There is a [possibly outdated project from UC Davis from 5 years ago](https://github.com/UCDavisLibrary/almapipy)
216 | + Clinton put in a plug for using Postman to quickly use APIs
217 |   + https://www.postman.com/
218 |   + Craig also suggested [Insomnia](https://insomnia.rest/) as an alternative for working with APIs manually
219 |   + We may try to have a presentation in this group on the very basics of Postman in the future
220 | + David E. asked about how folks have been using chatGPT for coding python
221 |   + Many folks had success with writing code with chatGPT, but chatGPT does not know a lot about some technologies
222 |     + It doesn't know some details of OpenSearch and has invented functions in PyMARC when asked
223 |   + [HuggingChat](https://huggingface.co/chat/) was suggested as a better alternative to chatGPT, since it has a more recently updated model
224 |     + ChatGPT’s 3.x model is from 2021 and HuggingChat's model is supposed to be newer
225 |     + it has an option to “search the web” that, when enabled, will try to complement its answers with information queried from the web
226 |   + Eric has used chatGPT for creating unit tests with more advanced features like “test parameterization”
227 | + Eric mentioned that he proposed a post-conference session at Code4lib 2024 for this group (python{4}lib)
228 |   + He asked for topic suggestions and volunteers
229 |   + The session will happen in the morning
230 | + David E. asked if folks are starting new projects that will necessitate using python to finish the projects
231 |   + For those migrating to FOLIO ILS the [EBSCO python client](https://folio-migration-tools.readthedocs.io/en/latest/) was recommended
232 | + Daniel asked for suggestions for PAID software for digital humanities, since they have a budget for it
233 |   + Here were the suggestions:
234 |     + [Constellate from Jstor labs](https://labs.jstor.org/projects/text-mining) is a text analysis tool and they run workshops
235 |     + [Gale Digital Scholar Lab](https://www.gale.com/primary-sources/digital-scholar-lab#how-the-lab-works)
236 | 
237 | ### January 9, 2024
238 | John Dewees, DAM Lead at the University of Rochester, gave a presentation on the pax-opex-utility
239 | [pax-opex-utility](https://github.com/rochester-rcl/pax-opex-utility) is "a graphical utility to format PAX objects and OPEX metadata for ingest into Preservica as SIPs to be synced with ArchivesSpace"
240 | + He used a PySimpleGUI utility to create a Windows executable
241 |   + https://www.pysimplegui.org/en/latest/
242 |   + the pax-opex-utility only works on Windows at this time
243 |   + from David E.:
244 |     + One thought on implementing on Mac vs. PC: I think there are different pathing formats/norms to follow. Depending on users they may need to make some adjustments if certain paths are hard coded. (I’ve made that an issue for myself by cleverly coding between a laptop and work PC.)
245 | + someone asked about libraries that can be used to package up assets for Archivematica and libraries that can be used to work with metadata in ArchivesSpace
246 |   + someone else shared [ArchivesSnake](https://github.com/archivesspace-labs/ArchivesSnake)
247 | + Tomasz asked how this software is “shipped” to users
248 |   + John said the users download software from the software’s Github repo’s release section
249 | + Someone asked if the code had unit tests, and some were not familiar with unit tests
250 |   + Yamil shared a presentation he gave to this same group last year called [“Intro to unit testing in Python”](https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing)
251 | + We talked about how to save credentials in your OS and not in the app
252 |   + Tomasz mentioned a Python module that can help with this:
253 |     + “The [Python keyring library](https://github.com/jaraco/keyring) provides an easy way to access the system keyring service from python. It can be used in any application that needs safe password storage. These recommended keyring backends are supported:”
254 |       + macOS Keychain
255 |       + Freedesktop Secret Service supports many DE including GNOME (requires secretstorage)
256 |       + KDE4 & KDE5 KWallet (requires dbus)
257 |       + Windows Credential Locker
258 | + We talked about how to handle using paths in your code to work in more than one OS
259 |   + it was suggested to look into using the built in “pathlib” library to make it easier to create cross platform paths and thus use less manual string concatenation to create paths
260 |     + https://realpython.com/python-pathlib/
261 |     + https://docs.python.org/3/library/pathlib.html
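
A small sketch of the pathlib suggestion above, using only the standard library. `export_path` and the folder and file names are made up for illustration.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def export_path(base_dir, collection, filename):
    """Build a path with the / operator instead of string concatenation."""
    return Path(base_dir) / collection / filename


if __name__ == "__main__":
    target = export_path(Path.home(), "exports", "records.csv")
    print(target.name, target.suffix, target.parent.name)

    # The "pure" classes show how the same parts render on each platform.
    print(PurePosixPath("data", "exports", "records.csv"))    # data/exports/records.csv
    print(PureWindowsPath("data", "exports", "records.csv"))  # data\exports\records.csv
```

Because `Path` picks the right separator for the OS it runs on, the same script can move between a Mac laptop and a Windows work PC without hard-coded slashes.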
262 | 
263 | Screenshots from John's presentation:
264 | ![pax-opex1](media/pax-opex1.png)
265 | ![pax-opex2](media/pax-opex2.png)
266 | ![pax-opex3](media/pax-opex3.png)
267 | 
268 | ### December 13, 2023
269 | We briefly talked about [“for … else” construct](https://docs.python.org/3/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops) that was recently mentioned in the #python Slack channel
270 | + I have only used it once, but I was very confused the first time I saw it
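
For anyone else confused by the construct, here is a short sketch of “for … else”: the `else` block runs only when the loop finishes without hitting `break`. The function name and the identifiers are invented for the example.

```python
def first_with_prefix(identifiers, prefix):
    """Return the first identifier starting with `prefix`, or None if there is none."""
    for identifier in identifiers:
        if identifier.startswith(prefix):
            found = identifier
            break  # leaving the loop via break skips the else clause
    else:
        # Runs only when the loop ended without a break (including an empty list).
        found = None
    return found


if __name__ == "__main__":
    ids = ["(DLC)2001012345", "(OCoLC)12345678", "(OCoLC)87654321"]
    print(first_with_prefix(ids, "(OCoLC)"))  # (OCoLC)12345678
    print(first_with_prefix(ids, "(NNC)"))    # None
```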
271 | 
272 | “This is a summary of what features appeared in which versions of Python.”
273 | + https://nedbatchelder.com/text/which-py.html
274 | + I found this page very helpful, it is created by the maintainer of the [coverage.py](https://coverage.readthedocs.io/) Python module
275 | 
276 | We talked about using Google Colab as a way to try to run a python script with more resources than on your local machine. For example, you may be able to tap into GPUs with Google Colab.
277 | + “Colab is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs.”
278 | 
279 | Someone asked about running Python or non-Python projects on Digital Ocean; some have used it and were happy with it. I use the Digital Ocean help docs for Unix/shell and even Python topics quite often
280 | + John: “I’m not sure about now, but a few years ago Digital Ocean did some good free webinars on Django and Flask. The instructors really knew a lot about deploying Python on DO.”
281 |   + [Getting Started with Flask](https://www.digitalocean.com/community/tech-talks/getting-started-with-flask)
282 |   + [Deploying your Python Applications](https://www.digitalocean.com/community/tech-talks/deploying-your-python-applications)
283 | 
284 | Daniel talked briefly about a new project called [jupyter-ai](https://github.com/jupyterlab/jupyter-ai) (and gave a live demo)
285 | 
286 | We spoke about doing quick python tests or experiments with a local Jupyter notebook
287 | + Another alternative for doing quick local tests or running interactive commands for production use is IPython, the code base that was the foundation of Jupyter Notebooks
288 | + https://ipython.readthedocs.io/en/stable/index.html
289 | 
290 | Book suggestion from John:
291 | + I’ve just started this book to try and build more programming practice into my workday: [Python Workout: 50 ten-minute exercises](https://www.manning.com/books/python-workout)
292 | + It’s included on O’Reilly if you have an institutional subscription.
293 | 
294 | On the topic of new things we have tried lately
295 | + I finally started using the [coverage.py](https://coverage.readthedocs.io/) Python module
296 | + “Coverage.py is a tool for measuring code coverage of Python programs. It monitors your program, noting which parts of the code have been executed, then analyzes the source to identify code that could have been executed but was not.”
297 | 
298 | We spoke about coming up with New Year's Python learning resolutions, or a 7 days of code challenge
299 | 
300 | Also the group was asked if we should continue to have a mix of scheduled presentations and free chat time
301 | + the group would like to keep this mix
302 | 
303 | We talked about John’s earlier idea (from Slack) of finding out whether there are any Python-related presentations proposed for Code4lib (accepted or not) that could be given during this Python group's meetings for those who cannot attend Code4lib
304 | 
305 | Tomasz mentioned how he suddenly found out that `distutils` (https://docs.python.org/3.10/library/distutils.html) was removed from the new Python 3.12 release
306 | + “`distutils` is deprecated with removal planned for Python 3.12. See the What’s New entry for more information.”
307 | + we talked a bit about how Python does remove features, but it tries to give “deprecation warnings” a year or so before a feature/module is removed
308 | + “You get what you pay for” reminds me of this: https://xkcd.com/2347
309 |   + Susan: That xkcd reminds me of the node.js/javascript library whose developer yanked it from all the public repos a few years back, and it broke basically everything. Was it underscore?
310 |     + [left-pad](https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code)
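To make the deprecation-warning mechanism mentioned above concrete, here is a minimal sketch using the standard library `warnings` module. The function names `parse_old` and `parse_new` are made up for illustration; they are not part of `distutils` or any real library.

```python
import warnings


def parse_new(text):
    """The replacement API (hypothetical)."""
    return text.strip().lower()


def parse_old(text):
    """The old API (hypothetical): still works, but warns callers to migrate."""
    warnings.warn(
        "parse_old() is deprecated; use parse_new() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller's line
    )
    return parse_new(text)


# DeprecationWarnings are often hidden by default, so capture them explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = parse_old("  Hello ")

for item in caught:
    print(item.category.__name__, "-", item.message)
```

Running your own scripts or tests with `python -W error::DeprecationWarning` turns these warnings into errors, which is one way to find out about an upcoming removal before it happens.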
311 | 
312 | 
313 | The removal of the `distutils` module led to a discussion about [Python virtual environments](https://realpython.com/python-virtual-environments-a-primer/) (also known as a venv, after the name of Python’s built-in `venv` module)
314 | + by default, a virtual environment uses whichever Python version created it, typically the single version installed on your OS
315 | + to run a virtual environment on a different version of Python locally, you still need to install that separate Python version first (and there are multiple ways to do that[1])
316 |  + these are 2 ways (of several) to have more than one version of Python installed:
317 |  + https://github.com/pyenv/pyenv
318 |  + Docker
319 | + this group may have a future presentation on Python virtual environments (Yamil and Charlotte agreed to present on the topic)
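As a small illustration of the point that a virtual environment is tied to the Python version that created it, here is a sketch that uses the built-in `venv` module from Python itself (the usual command-line equivalent is `python -m venv <folder>`). The folder name `demo-env` is just an example, and the environment is created in a temporary directory so nothing is left behind.

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_env(target):
    """Create a virtual environment at `target` and return its config text."""
    # with_pip=False keeps this sketch fast and offline; use True for real work.
    venv.create(target, with_pip=False)
    return (Path(target) / "pyvenv.cfg").read_text()


with tempfile.TemporaryDirectory() as tmp:
    config = make_env(Path(tmp) / "demo-env")

# pyvenv.cfg records which interpreter the environment was built from.
running = ".".join(str(part) for part in sys.version_info[:3])
print(f"created with Python {running}")
print(config)
```

The printed `pyvenv.cfg` contains a `version` entry matching the interpreter that ran the script, which is why a tool like pyenv (or Docker) is still needed to get a different Python version in the first place.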
320 | 
321 | ### November 28, 2023
322 | Michael Benowitz, a Tech Lead at the NYPL, gave a presentation on Airflow.
323 | “[Apache Airflow](https://airflow.apache.org/) is a platform created by the community to programmatically author, schedule and monitor workflows.”
324 | A link to the slides is forthcoming; in the meantime, screenshots of a few of the slides are included below.
325 | + [Wikipedia article on Airflow](https://en.wikipedia.org/wiki/Apache_Airflow)
326 | + It is a free and open source product, but for production use it typically needs to run on a central VM/server instead of just on your own workstation. There are “cloud” providers that handle the hosting for you.
327 | + Airflow can be part of an [ETL workflow](https://en.wikipedia.org/wiki/Extract,_transform,_load)
328 | + Scheduling jobs with Airflow can be easier than with older tools like [cron](https://en.wikipedia.org/wiki/Cron), and it comes with a GUI
329 | 
330 | Airflow cloud options:
331 | + https://aws.amazon.com/managed-workflows-for-apache-airflow/pricing/
332 | + https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/data-factory/how-does-managed-airflow-work.md
333 | + Astronomer
334 | + DAGster
335 | 
336 | Additional technologies used and/or mentioned:
337 | + https://en.wikipedia.org/wiki/Kubernetes
338 | + https://www.sqlalchemy.org/
339 | + https://docs.pydantic.dev/latest/
340 | + https://newrelic.com/ - has a way to give free “seats” to certain non-profit organizations
341 | + https://en.wikipedia.org/wiki/AWS_Lambda
342 | 
343 | Screenshots of Mike's presentation:
344 | ![airflow1](media/airflow1.png)
345 | ![airflow2](media/airflow2.png)
346 | ![airflow3](media/airflow3.png)
347 | ![airflow4](media/airflow4.png)
348 | ![airflow5](media/airflow5.png)
349 | ![airflow7](media/airflow7.png)
350 | 
351 | ### November 14, 2023
352 | + We talked about the MARC21 standard, how each record has a maximum size of 99,999 bytes/octets, and how individual fields can be at most 9,999 bytes/octets in size
353 | https://www.loc.gov/marc/specifications/specrecstruc.html
354 | + I then shared the Python pymarc snippet that inspired this size talk; it processed a large MARCXML export of 80k records to find out whether any individual records were larger than 99,999 bytes/octets
355 | https://pymarc.readthedocs.io/en/latest/
356 | + I was happy to find a convenient pymarc function that reads in a MARCXML file and returns a Python list of individual pymarc records
357 | ```python
358 | records = pymarc.marcxml.parse_xml_to_array('myfile.xml')
359 | ```
360 | + though this function loads all of the data into RAM and could seriously impact your computer's performance if you don’t have a lot of RAM available
361 |   + there are other functions and approaches that only load a few XML records at a time
362 |   + the resulting code found 4 records in our data that were over the size limit
363 | + then there was a question about how hard it is to use pymarc to analyze subject data in a batch of records
364 |   + we then shared a few more examples of how simple it can be to use pymarc
365 |   + and how general knowledge of Python concepts, like looping through lists and using conditional statements, goes a long way toward making pymarc easy to use
366 |   + see the images below of Eric’s example pymarc code that was shared
367 | 
368 | ![marc1](media/marc1.png)
369 | 
370 | ![marc2](media/marc2.png)
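The idea of loading only one MARCXML record at a time, and of looping over records to pull out subject data, can be sketched with nothing but the standard library. This is not the pymarc snippet that was shared in the meeting; it is an illustrative alternative built on `xml.etree.ElementTree.iterparse`, and the two-record sample collection is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Python for librarians</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Python (Computer program language)</subfield></datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Untitled</subfield></datafield>
  </record>
</collection>"""


def iter_records(source):
    """Yield one <record> element at a time instead of building a full list."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == MARC_NS + "record":
            yield elem
            elem.clear()  # drop the record's children once the caller is done


def subject_headings(record):
    """Return the $a text of every 6XX (subject) datafield in one record."""
    headings = []
    for field in record.iter(MARC_NS + "datafield"):
        if field.get("tag", "").startswith("6"):
            for sub in field.iter(MARC_NS + "subfield"):
                if sub.get("code") == "a" and sub.text:
                    headings.append(sub.text.strip())
    return headings


results = [subject_headings(rec) for rec in iter_records(io.BytesIO(SAMPLE))]
print(results)
```

In real use, `io.BytesIO(SAMPLE)` would be replaced with the path to the MARCXML export. Each record must be processed before the loop moves on, because it is cleared afterwards.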
371 | 
372 | + Rebecca had a question about properly creating a graph using Google Colab, Pandas, and plotly.
373 |   + https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html
374 |   + https://pandas.pydata.org/
375 |   + https://plotly.com/python/
376 | 
377 | + Rebecca was hosting her code on Google Colab, which is a way to run Jupyter notebooks on a hosted site and then share them with others
378 |   + https://research.google.com/colaboratory/
379 |   + https://jupyter.org/
380 | + we briefly spoke about how we should avoid using regular expressions when processing XML data
381 |   + we should instead use a Python module that is specifically designed for processing XML
382 |   + here are some short posts with comments on why we should avoid using regex with XML
383 |     + https://medium.com/thecyberfibre/stop-parsing-x-html-with-regular-expression-2cf13215b411
384 |     + https://stackoverflow.com/questions/8577060/why-is-it-such-a-bad-idea-to-parse-xml-with-regex
385 | + Here are some examples of Python modules that are meant to handle XML
386 |   + [ElementTree XML API](https://docs.python.org/3/library/xml.etree.elementtree.html) - this one is built in to Python
387 |   + [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) - this one is usually used for parsing HTML but can handle XML
388 |   + [lxml](https://lxml.de/)
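As a short illustration of why an XML parser is the safer choice, the sketch below uses the built-in ElementTree module on a made-up catalog snippet. The second `<book>` lists its attributes in a different order and wraps its title in CDATA, two things that tend to trip up a hand-written regular expression but that a parser handles without any special cases.

```python
import xml.etree.ElementTree as ET

# Invented sample data: note the attribute order and the CDATA section.
XML = """<catalog>
  <book id="b1" lang="en"><title>Dune</title></book>
  <book lang="pl" id="b2"><title><![CDATA[Solaris & more]]></title></book>
</catalog>"""

root = ET.fromstring(XML)

# Map each book id to its title, regardless of how the markup was written.
titles = {book.get("id"): book.findtext("title") for book in root.iter("book")}
print(titles)
```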
389 | 
390 | ### October 31st, 2023
391 | + Introductions, refreshing memories of returning attendees and new attendees; common threads from intros:
392 |   + Alma
393 |   + OCLC API (APIs in general)
394 |   + ArchivesSpace
395 | + John Dewees had a question on CSVs: generally, how big is too big for Python to handle? Is there a point where a file is too big to be ingested and handled properly?
396 | + John Pillbeam mentioned SQLite might work well here, since it is essentially a database in a single file on disk and is adaptable to quite a few operations.
397 | + Bruce Orcutt mentioned SQLite might be the best way to go as well, though keep the upfront maintenance in mind.
398 | + Paul Clough mentioned you may need an Object Relational Mapper (ORM) in front of SQLite. It translates between the application and the database (abstracting the database out).
399 | + Emily Frazier mentioned using a Python script that loads 8 million rows of a TSV into pandas. It worked but was a bit slow.
400 | + Rebecca Hyams mentioned an Alma project which helps draw out certain elements of MARC data. You can get really granular from the API. ENUG presentations were mentioned, including Rebecca’s presentation on item/inventory and PySimpleGUI
401 | + Comments about documenting projects. Susan mentioned good comments in code and a narrative of it in a separate Word doc.
402 | + Bruce asked about Constellate.
403 | + John Pillbeam linked to the courses/workshops at constellate.org/events.
404 | + John P. linked to another course by one of the Constellate devs. He is currently going through this free online course/textbook that one of the Constellate trainers created: https://pandas.pythonhumanities.com/
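For the CSV size question above, one way the SQLite suggestion could look in practice is sketched below, using only the standard library `csv` and `sqlite3` modules. Rows are read and inserted in small batches, so the whole file never has to fit in RAM. The table name, column names and sample rows are invented for the example, and all columns are stored as untyped text.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv(conn, csv_file, table="items", batch_size=1000):
    """Copy rows from an open CSV file into SQLite, batch_size rows at a time."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    total = 0
    while True:
        batch = list(islice(reader, batch_size))  # only this batch is in memory
        if not batch:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', batch)
        total += len(batch)
    conn.commit()
    return total


sample = io.StringIO(
    "barcode,title,status\n"
    "31234,Dune,available\n"
    "31235,Solaris,on loan\n"
    "31236,Ubik,available\n"
)
conn = sqlite3.connect(":memory:")  # use a file path to keep the database
loaded = load_csv(conn, sample, batch_size=2)
available = conn.execute(
    "SELECT COUNT(*) FROM items WHERE status = ?", ("available",)
).fetchone()[0]
print(loaded, available)
```

Once the data is in SQLite, questions like "how many items are available?" become SQL queries instead of loops over the whole file.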
405 | 
406 | ### October 17th, 2023
407 | + we talked about [FRBR](https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records)
408 |   + Talked about record-rollups
409 | + Susan mentioned that she started working through Adam Emery’s “Learn Python” tutorials.
410 | + Eric has recently liked working through the [spaCy course site to learn about Natural Language Processing (NLP)](https://course.spacy.io/en)
411 | + Yamil liked the tutorials that this site has, since you can run examples right on their site without having to install anything locally
412 | + Susan later asked whether she should use a locally installed version of Python or Jupyter notebooks for her first real project
413 |   + John: the consensus was that it is better to have a locally installed version
414 |   + though Jupyter notebooks or Google Colab can be great to practice or prototype things
415 | + “I just discovered this via that Glyph blog post - an updater for the python.org Mac installer”: https://mopup.readthedocs.io/en/latest/
416 | + David shared a free online Python tutorial:
417 |   + https://learn-python.adamemery.dev/
418 | + Other more advanced suggestions included
419 |   + using [pyenv](https://github.com/pyenv/pyenv) to easily manage having more than one version of Python on your machine
420 |   + a few people mentioned that they like using [Poetry](https://python-poetry.org/) for “packaging and dependency management”
421 | + John D. mentioned: “Just finished the official [PySimpleGUI](https://www.pysimplegui.org/en/latest/) Udemy course and created my first graphical utility which has been fun”
422 |   + This group may have a future session to demonstrate PySimpleGUI
423 | + Tomasz asked if folks knew about Python tools for “transliteration” of non-Latin text
424 |   + A graph-based transliteration tool: https://github.com/seanpue/graphtransliterator
425 | + We went back to talking about tools for local development
426 |   + Here is an image that I found through my local (Boston) Python meet up, of all the tools that can be used for setting up your code...
427 |     + for virtual environments, for creating packages, for multiple python versions, etc. https://cdn.fosstodon.org/media_attachments/files/110/741/748/598/833/261/small/13d5e21803357140.png
428 |     + It is a bit overwhelming
429 |     + Here is the presentation that this image was taken from, uploaded this summer, covering a lot of the possible tools that can be used… https://youtu.be/MsJjzVIVs6M
430 | 
431 | ### October 3rd, 2023
432 | + Guest Speakers:
433 |   + SimplyE (Python project, e-reader app) (Mike with NYPL, Tomasz to contact)
434 |   + Alma and ArchivesSpace sync utility (ASpace and Alma APIs), Bruce Orcutt in group (Dave to email/slack)
435 |   + PySimpleGUI (John Dewees) (Dave to email/slack)
436 |   + Citation generator at U of Miami (Eddy and Charles) Citation Style Language
437 |     + https://pypi.org/project/citeproc-py/
438 |     + https://pypi.org/project/citeproc-py-styles/
439 |     + https://github.com/brechtm/citeproc-py
440 |     + https://citationstyles.org/
441 |     + Though some libraries aren’t actively maintained.
442 | + Side note, Charles works on an open source LibGuides alternative.
443 | + Some general chat about the nature of open source projects - great grassroots! Though it can be fragile/risky.
444 | + Some code was generated by ChatGPT for the basic LMS on the list of exercises Charles provided
445 | + Think small and tailor the items to the library discipline. Build upon one thing to the next?
446 | + GUI? Connect to WorldCat?
447 | + Carpentries lessons, link to git space? https://carpentries.org/community-lessons/
448 | + John Pillbeam mentioned the incubator for finding concepts that may not be included in main lesson plans yet.
449 | 
450 | ### September 19th, 2023
451 | + Ben asked what to tell others who say they want to use Python with AI, specifically with the ChatGPT API
452 |   + We spoke about how there is some ability to run API calls for free for version 3.5, though there is a cost for running API calls for the 4.x version
453 |   + The pricing for Hugging Face https://huggingface.co/pricing was mentioned as an alternative
454 |   + From David: Hugging Face also has a variety of tags around different areas of AI. So there’s the Natural Language Processing stuff, but ChatGPT is the big player there. But things like object detection and audio tools are there.
455 |   + Yamil suggested running the tutorials at https://scikit-learn.org/stable/
456 |     + Simple and efficient tools for predictive data analysis
457 |     + Accessible to everybody, and reusable in various contexts
458 |     + Built on NumPy, SciPy, and matplotlib
459 |     + Open source, commercially usable - BSD license
460 |   + Recent post from Simon Willison on Python and OpenAI tools: https://simonwillison.net/2023/Sep/12/llm-clip-and-chat/
461 |   + We talked about concerns about the AI hype and over-reliance on AI.
462 |   + We very briefly spoke about NLP (Natural Language Processing), and how it is just a small part of the “engine” that is a platform like ChatGPT
463 |     + to try to learn NLP I ran some tutorials using the Python module https://spacy.io/
464 |     + “spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
465 |     + If you’re working with a lot of text, you’ll eventually want to know more about it. For example, what’s it about? What do the words mean in context?”
466 | + We spoke about Charles’s new repository with exercises to learn Python skills
467 |   + https://github.com/UMiamiLibraries/python4lib-python-exercises/blob/main/README.md
468 |   + Charles is looking for collaborators
469 | + Tomasz talked about issues with being an organizational customer of Naxos, which provides streaming audio/video content
470 |   + For example, how to make sure the catalog is serving the correct sets of valid MARC files, with valid 856 tags that lead to the content
471 |   + Here is a presentation on the pitfalls of keeping your holdings in sync with vendors
472 |   + [Everything is Broken, but by How Much Exactly (video)?](https://phette.net/prez/everything-is-broken) [(slides)](https://phette23.github.io/everything-is-broken/#/)
473 |   + Tomasz would like to see if he can use Python to automate the process of keeping the holdings in sync, meaning that MARC records for content that is no longer available via Naxos are deleted from the catalog in a timely manner
474 |     + For example, doing some analysis with Pandas
475 | + Kate wrote:
476 |   + Once we migrate to our new ILS (Symphony), we will eventually (hopefully!) start using their eResource Central system for all our eContent and be able to do away with MARC records for eContent. But for now we use a combination of extracting batches of records in order to use MarcEdit’s link checker or other link checkers, or just periodically wiping out all our MARC records for a particular vendor and loading a new batch from the vendor for all our holdings
477 |   + We’re about to do that now with Axis 360 since they’ve switched to “Boundless”. We have over 30,000 MARC records for Axis 360, so just too much to handle
478 |   + Mentioned the difficulty of trying to fix issues, like misspellings or bad records, in the large batches of vendor MARC records that need to be added to our catalogs
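The holdings-sync idea discussed above boils down to comparing two sets of identifiers: what the catalog currently has and what the vendor currently offers. Here is a minimal sketch of that comparison with plain Python sets. The identifiers are invented, and how they would be extracted from MARC records or a vendor title list is left out.

```python
def plan_sync(catalog_ids, vendor_ids):
    """Work out which records to delete, add, or keep to match the vendor list."""
    catalog, vendor = set(catalog_ids), set(vendor_ids)
    return {
        "delete": sorted(catalog - vendor),  # in the catalog, gone from the vendor
        "add": sorted(vendor - catalog),     # offered by the vendor, not yet loaded
        "keep": sorted(catalog & vendor),    # present on both sides
    }


# Hypothetical identifiers, for illustration only.
catalog_ids = ["nx001", "nx002", "nx003"]
vendor_ids = ["nx002", "nx003", "nx004"]

plan = plan_sync(catalog_ids, vendor_ids)
print(plan)
```

The same comparison can be done in Pandas with a merge, but for simple ID lists sets are usually enough.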
479 | + We spoke about the limitations of licensing content from Naxos (or similar vendors) versus actually storing that content locally
480 | + Briefly mentioned the ongoing “Internet Archive lawsuit”
482 | This is an [article from the New York Times](https://www.nytimes.com/2023/08/13/business/media/internet-archive-emergency-lending-library.html?unlocked_article_code=wcOmLYkdU__rOiiM6CNfze5OdE8Y4h41_rWZGFXrGdG-380Ng1Dkw0URPeZyTdFWmVYedUOlhz1hQFujukvNfw6un9L-aR5-AXLvbT4yWNv_tPLhfkj0Ou344H0i50355VZDbp5Uv9U6xLKJrJGh7WRZ-Vi6WbWosiHTpN7j-qR60P1SUSZn9nweYhFky5gIPNubaGpsUrRt3V1ZbzqG_aQMpfqbQSjFZamJkm84kzV_bqbbDB1q370gK6OkBDZbrBifM0fTKnqQaVItqvokBYaeEExJsRMugQQlJiKInxc7V44Cg5xK0piv3Q6ulQj1V1i2QYsbQGgSQwjv_bzTmknPkPRHMkfI9Uf2jdYqM5GHRn9zwqk9tvqXTw&smid=url-share) that is several weeks old about the lawsuit
483 |   + a key quote from the article that we talked about:
484 |   + “Libraries came before publishers,” the 62-year-old librarian said in a recent interview in the former Christian Science church in western San Francisco that houses the archive. “We came before copyright. But publishers now think of libraries as customer service departments for their database products.”
485 | 
486 | ### September 5th, 2023
487 | + Charles showed some code that batch creates APA & AMA citations
488 | + Carlos wanted feedback on how to add small improvements to their code that creates citations
489 |   + for example, when there is no volume number for a citation, how to elegantly leave the volume number out
490 | + someone suggested using Python 3.10's `match`/`case` functionality, which is formally called “Structural Pattern Matching”
491 |   + this feature was added in Python 3.10; the tutorial for it is PEP 636 https://peps.python.org/pep-0636/
492 | + we briefly talked about how PEP stands for “Python Enhancement Proposal”
493 | + Here is a site with a brief explanation on how to use “Structural Pattern Matching” in Python 3.10
494 | https://realpython.com/python310-new-features/#structural-pattern-matching
495 | + Eduardo, who works with Charles, mentioned that they are trying to figure out how to encode that some parts of the citation have to be in italic when using Pandas to batch create citations
496 | + Tom has this suggestion for dealing with citation data
497 |   + If you want to play with bibtex files to manage your citations instead of excel, you could possibly use this https://github.com/caltechlibrary/pybtex-apa7-style
498 |   + https://github.com/cproctor/pybtex-apa7-style/blob/master/formatting/apa.py
499 | + Yamil talked about using “unittest” for a pre-existing Python code base, and mentioned that you can keep older tests in unittest style and just add new tests that use pytest (pytest can run both styles)
500 |   + [Info on Python built-in unittest module](https://docs.python.org/3/library/unittest.html)
501 |   + versus the non-built-in [pytest module](https://docs.pytest.org/en/7.4.x/), also for “unit tests”
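Here is a small sketch of what mixing the two styles in one test file can look like. The function under test, `normalize_isbn`, is a made-up example. Running `pytest` on a file like this collects both the older `unittest.TestCase` class and the newer plain test function.

```python
import unittest


def normalize_isbn(raw):
    """Hypothetical function under test: strip hyphens and spaces from an ISBN."""
    return raw.replace("-", "").replace(" ", "")


class TestNormalizeIsbnLegacy(unittest.TestCase):
    """Older test, kept in unittest style."""

    def test_strips_hyphens(self):
        self.assertEqual(normalize_isbn("978-0-13-110362-7"), "9780131103627")


def test_strips_spaces():
    """Newer test, written as a plain pytest-style function."""
    assert normalize_isbn("978 0 13 110362 7") == "9780131103627"
```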
502 | + we talked about “Library Carpentry” classes and how helpful they have been. They can cover various topics, including Python
503 |   + https://librarycarpentry.org/index.html
504 |   + “Library Carpentry focuses on building software and data skills within library and information-related communities. Our goal is to empower people in these roles to use software and data in their own work and to become advocates for and train others in efficient, effective and reproducible data and software practices. Our workshops are based on our lessons. ”
505 |   + The [umbrella organization for Library Carpentry](https://carpentries.org/index.html) includes: Data Carpentry and Software Carpentry
506 | + Yamil was asked to briefly speak about a session at the Open Library Foundation’s (OLF) conference (WOLFCon) that covered the FOLIO ILS and the use of Python for post migration clean up by folks at Wellesley
507 |   + https://github.com/wellesleyfolio/WOLFcon_2023
508 |   + here are more links for Python FOLIO tools/modules
509 |     + https://github.com/FOLIO-FSE/folioclient
510 |     + https://github.com/folio-org/folio-tools
511 | + this site was suggested for improving your Python skills; other programming languages are supported too
512 |   + https://exercism.org/
513 | + we spoke about Python community’s preferred writing style versus Ruby’s
514 | + We spoke about PEP8, which is the main Python style guide
515 |   + [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/)
516 |   + Here is a Python module to check if your code follows PEP8 without making changes
517 | https://pycodestyle.pycqa.org/en/latest/
518 | + We spoke about [Black](https://black.readthedocs.io/en/stable/), which can be used to change your code to match PEP8
519 |   + “Black: The uncompromising code formatter”
520 |   + We spoke about how the PyCharm Python editor is great about reminding you to follow PEP8 when you write your code, and about how it gives you the option to automatically reformat individual code snippets to follow PEP8, instead of just reformatting all of your code
521 |   + Yamil also mentioned that he has opened up existing Python codebases in PyCharm, and the PyCharm indexer has found many hidden bugs in code that had never run or code that had logic flaws
522 | 
523 | ### August 22, 2023
524 | ... missing ... :sob:
525 | 
526 | ### August 8, 2023
527 | Our meeting focused on [Pydantic](https://docs.pydantic.dev/latest/). Matt Lincoln from JSTOR Labs gave a brief introduction to the tool and its uses.
528 | 
529 | Matt used [this jupyter notebook](demo/pydantic.ipynb) to demo basic Pydantic syntax and validation functionality.
530 | 
531 | + Data validation can be done using Python type hints
532 | + Fast and extensible, Pydantic plays nicely with your linters/IDE/brain. Define how data should be in pure, canonical Python 3.7+; validate it with Pydantic.
533 | + We briefly talked about wanting to review how to create classes and objects in Python in a future meeting.
534 | + Pydantic can help with IDE / editor auto complete / auto suggest
535 | + Pydantic has an x.json() function/method to serialize data to JSON (renamed `model_dump_json()` in Pydantic version 2)
536 | + great for writing APIs
537 | + Pydantic has an x.schema() method (which uses JSON schemas; renamed `model_json_schema()` in version 2)
538 |     + the schema can then be used to create API documentation for using the API
539 | + The [FastAPI](https://fastapi.tiangolo.com/) platform for Python-based APIs uses Pydantic a lot
540 | + FYI: Pydantic version 2 is just coming out, and some products/Python modules that use Pydantic may not be ready for version 2 yet, but should still support version 1
541 | + we also briefly talked about Python’s built-in “data classes”
542 | + “In Python, a data class is a class that is designed to only hold data values. They aren’t different from regular classes, but they usually don’t have any other methods. They are typically used to store information that will be passed between different parts of a program or a system.”
543 |   + https://docs.python.org/3/library/dataclasses.html
544 |   + https://realpython.com/python-data-classes/
545 |   + https://www.dataquest.io/blog/how-to-use-python-data-classes/
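As a point of comparison with Pydantic, here is a small sketch of a built-in data class. The `Patron` class and its fields are invented for the example. Data classes do not validate or convert values on their own, so the checks in `__post_init__` are written by hand here; that manual work is roughly the part Pydantic automates.

```python
from dataclasses import asdict, dataclass


@dataclass
class Patron:
    """Plain data holder; the field names are illustrative only."""
    barcode: str
    name: str
    fines: float = 0.0

    def __post_init__(self):
        # Hand-written validation and conversion, run right after __init__.
        if not (isinstance(self.barcode, str) and self.barcode.isdigit()):
            raise ValueError(f"barcode must be a string of digits: {self.barcode!r}")
        self.fines = float(self.fines)


patron = Patron(barcode="21234000123456", name="Ada", fines="2.50")
print(asdict(patron))

try:
    Patron(barcode="not-a-barcode", name="Grace")
except ValueError as err:
    print("rejected:", err)
```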
546 | + we talked about how Pydantic is not a replacement for “JSON Schemas”; Pydantic is a complementary tool
547 |   + https://json-schema.org/
548 |   + https://www.tutorialspoint.com/json/json_schema.htm
549 | + talked about Pydantic validators and their application
550 |   + https://docs.pydantic.dev/2.1/usage/validators/
551 |   + one approach is to start with a less strict class with loose rules
552 |   + then do some clean up/transformation
553 |   + then switch to a more strict Pydantic validating class
554 | + we briefly talked about typing in Python in general, and how helpful it can be
555 |   + https://docs.python.org/3/library/typing.html
556 |   + https://realpython.com/lessons/type-hinting/
557 |   + https://towardsdatascience.com/12-beginner-concepts-about-type-hints-to-improve-your-python-code-90f1ba0ac49
558 |   + “Type hints are performed using Python annotations (introduced since PEP 3107). They are used to add types to variables, parameters, function arguments as well as their return values, class attributes, and methods. Adding type hints has no runtime effect: these are only hints and are not enforced on their own.”
559 |   + For example, in statically typed languages like C or C++, if you initially declare a variable as one type (e.g. string), you can’t just later on use it as another type (e.g. int) like we can do in Python
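The point that type hints have no runtime effect can be seen in a few lines. In this sketch (the function and its arguments are made up), the hints are readable through `typing.get_type_hints`, yet passing a string where an `int` is hinted still runs without any error. Catching that mistake is the job of a type checker, an IDE, or a validation library such as Pydantic.

```python
from typing import get_type_hints


def shelf_label(call_number: str, copies: int = 1) -> str:
    """Build a simple label; the hints document intent but are not enforced."""
    return f"{call_number} x{copies}"


hints = get_type_hints(shelf_label)
print(hints)

# "2" is a str, not the hinted int, but Python runs this without complaint.
label = shelf_label("QA76.73", "2")
print(label)
```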
560 | + questions for Matt:
561 |   + is there any integration between pydantic and popular [ORMs](https://www.fullstackpython.com/object-relational-mappers-orms.html) (like [sqlalchemy](https://www.sqlalchemy.org/) for example)? Answer: yes, pydantic data classes should work well with most ORMs
562 |   + can pydantic validation features be useful in format crosswalks when we do not care about JSON output? Answer: yes, although in some cases more strict and detailed validation may be required. Still, the out-of-the-box validation in pydantic would be very useful in Matt's opinion
563 | 
564 | ### July 25, 2023
565 | + Rebecca:
566 |   + Inventory tool to active scan vs. lists, processes, & jobs https://github.com/LibraryNinja/alma_inventory_utility/tree/main
567 |   + Utilizes: pysimplegui, auto-py-to-exe
568 |   + Old method: Make a barcode set, run job on Alma to update
569 |   + Problem of not really knowing if something wasn’t found or had a status (loan, out of place, etc.)
570 |   + This is loosely based on Jeremy Hobbs’s LazyLists utility, adapted to an inventory project. (https://github.com/MrJeremyHobbs/LazyLists)
571 |   + Examines items in XML
572 |   + Pulls in some basic information to confirm for users.
573 |   + Indicates set aside for problematic titles (tech services would handle)
574 |   + Used auto-py-to-exe to allow student workers to run this small utility on their machines.
575 | + Julie:
576 |   + Sierra had a shelflist/inventory but it did not really work well, so a Python inventory tool is great!
577 |   + Had used SQL lists to help scan/match with selenium
578 |   + Tools for link checking?
579 |   + Authentication with EZ Proxy
580 |   + https://pypi.org/project/LinkChecker/
581 | + Charles:
582 |   + Plotly module for data vis
583 |   + Neat 54 lines of code to create an interactive map of internet usage over time worldwide
584 |   + Charles does a 1-hour challenge to help learn new modules.
585 |   + ChatGPT can help; there are some prompt setups you can do to reduce repetitive typing
586 |   + https://code.visualstudio.com now has a Postman extension.
587 |   + https://www.pythonanywhere.com/ helps host and run python in the cloud (from the Anaconda people)
588 |   + https://www.git-tower.com/education/mac GUI for Git
589 | 
590 | ### July 11, 2023
591 | Rough and incomplete summary of topics covered in today’s (2023-07-11) Python{4}Lib group meeting
592 | + we talked about TAP - Text Analysis Pedagogy classes
593 |   + https://www.ithaka.org/constellate/text-analysis-pedagogy-institute/
594 | + Eric mentioned the Python Wagtail CMS built on top of the Python Django software dev sponsored by Google
595 |   + https://wagtail.org/
596 |   + https://www.djangoproject.com/
597 |   + Eric’s library moved off of Drupal by switching to Wagtail
598 | + We briefly talked about using https://gunicorn.org/, a Python WSGI HTTP server, to serve Python software like Django and Flask apps
599 | + Eric also mentioned a Python-based institutional repository, and how it compared to the PHP-based Islandora digital repository
600 |   + [InvenioRDM](https://inveniordm.docs.cern.ch)
601 | + we talked about using http://docopt.org/ instead of using the [Python built-in argparse module](https://docs.python.org/3/library/argparse.html) for parsing command line (CLI) parameters
602 | + We then talked about parsing ezproxy “audit” files with Python
603 | + then Eric shared a script that he created to parse a data file for the Koha ILS using docopt to parse the CLI parameters that are listed in the comments at the top of the file
604 |   + https://github.com/cca/koha_patron_import/blob/main/create_koha_csv.py
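For comparison with docopt, which builds the parser from the usage text in a docstring, here is a minimal sketch of the same kind of command-line interface written with the built-in `argparse` module. The argument names are invented and do not match Eric's script.

```python
import argparse


def build_parser():
    """Define a small CLI; the option names are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Convert a patron export into a CSV file."
    )
    parser.add_argument("input_file", help="path to the source data file")
    parser.add_argument(
        "-o", "--output", default="patrons.csv", help="where to write the CSV"
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="parse the input but write nothing"
    )
    return parser


# Passing a list stands in for what a user would type on the command line;
# calling parse_args() with no arguments reads sys.argv instead.
args = build_parser().parse_args(["students.json", "--dry-run"])
print(args.input_file, args.output, args.dry_run)
```

With argparse the options are declared in code and the `--help` text is generated from them, while docopt goes the other way and derives the options from the help text.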
605 | + We talked about how to improve your coding style before posting your Python code on Github or on the internet.
606 | + Yamil recommended this book which helped him write in more standard/professional Python style: [“Beyond the Basic Stuff with Python / Al Sweigart”](https://inventwithpython.com/beyond/)
607 |   + this section talks about how to better understand Python errors messages like “stack traces”
608 |   + [Dealing With Errors And Asking For Help](https://inventwithpython.com/beyond/chapter1.html)
609 | + We then talked about when to use the `try: ... except:` Python syntax to catch exceptions, since folks often did not see it being used a lot in other people's code
611 |   + some of us mentioned that we don’t use it all of the time, but in some situations we always make sure to use it. For example, it is common to use `try: ... except:` when you are calling a method that commonly raises exceptions.
612 |   + One example is the Python Selenium module for writing “functional tests” for web pages. There are several Selenium methods that start with `find_` and can easily trigger an exception if what you are looking for in a webpage is not found. In this context I always use a `try: ... except:` statement around calls like `find_element_by_css_selector()`
613 |   + there is of course a lot more that can be said about when to use `try: ... except:` in your Python code
614 |   + this [chapter from the “Beyond the Basic Stuff with Python” book](https://inventwithpython.com/beyond/chapter6.html), among many tips, shows how to use the built-in dictionary get() method to avoid accidentally triggering a KeyError exception when you try to access a dictionary key that does not actually exist
615 |     + Writing Pythonic Code - Pythonic Ways to Use Dictionaries
616 |     + using the get() dictionary method to avoid KeyError exceptions
617 | 
618 |   ```python
619 |   my_dict = {'username': 'joe'}
620 |   # my_dict['password'] would raise a KeyError exception, because the key does not exist
621 |   my_dict.get('password', False) # simply returns False, or whatever is placed in the 2nd parameter of get()
622 |   ```
623 | 
624 | ### June 27, 2023
625 | + We talked about how the US PyCon (Python Conference) recently released their videos from their 2023 conference
626 |   + [2023 sessions youtube channel](https://www.youtube.com/watch?v=eZwHvBsoPn4&list=PL2Uw4_HvXqvY2zhJ9AMUa_Z6dtMGF3gtb)
627 |   + [PyCon YouTube home with past conference videos](https://www.youtube.com/@PyConUS)
628 | + Charles shared this article about “typo squatting” on popular Python module names to trick users into installing malware: https://arstechnica.com/information-technology/2023/02/451-malicious-packages-available-in-pypi-contained-crypto-stealing-malware/
629 | + Charles also talked about a project at the University of Miami that collects data from Twitter for research purposes. He mentioned that there is now an API limit that only allows checking 7 days into the past. He will post the name of the Python module they are using to
630 | + Eric mentioned issues that he has had archiving older tweets in the past. He also mentioned challenges evaluating misspelled words and interpreting emojis.
631 | + Talked about unit testing and test coverage with the Python project called [coverage](https://coverage.readthedocs.io/en/7.2.7/)
632 | + Tomasz mentioned [coveralls](https://coveralls.io/) that gives you nice visual reports on your test coverage that can be integrated with Github and be part of CI
633 | + we of course talked about using [Pytest](https://docs.pytest.org/en/7.3.x/) for your unit tests
634 | + For those that are unfamiliar with “unit tests” and “pytest”, here is a presentation Yamil gave this group a few months ago: [“Intro to unit testing in Python”](https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing)
636 | + We talked a bit about parallel processing in Python to finish work faster
637 | + We shared a link to the [free version of chapter 17 of the 2nd edition of “Automate the Boring Stuff with Python”](https://automatetheboringstuff.com/2e/chapter17/)
638 | + Yamil brought up an approach suggested by the author (Al Sweigart) of “Automate the Boring Stuff with Python” to download files from the internet using the Python requests module, but in a way where you are not limited by the amount of free RAM on your computer ([section: Saving Downloaded Files to the Hard Drive](https://automatetheboringstuff.com/2e/chapter12/))
639 |   + Here is the snippet that uses a loop with the iter_content() method, to prevent using up all your RAM if the file is larger than the amount of free RAM on your system...
640 | 
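641 | ```python
642 | import requests
643 | 
644 | # stream=True defers the download, so iter_content() really reads it chunk by chunk
645 | res = requests.get('https://automatetheboringstuff.com/files/rj.txt', stream=True)
646 | res.raise_for_status()
647 | 
648 | with open('RomeoAndJuliet.txt', 'wb') as play_file:  # closes the file when done
649 |     for chunk in res.iter_content(100000):  # at most 100,000 bytes at a time
650 |         play_file.write(chunk)
651 | ```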
652 | + We also mentioned [NetworkX](https://networkx.org/), a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
653 | + Tomasz asked if anyone was doing batch work on images with Python, looking for a faster way to process a large number of images. We talked about perhaps using multiprocessing for this.
654 |   + Again from the book: [“Automate the Boring Stuff with Python”, Ch 19](https://automatetheboringstuff.com/2e/chapter19/) talks about using the [Pillow](https://python-pillow.org/) Python module to batch change images
655 |   + There should also be ways to control ImageMagick, a very well-known non-Python library, from Python to make batch changes to images. Yamil has worked on many projects, like the Drupal/PHP-based Islandora project, that use ImageMagick for making changes to images
656 | 
657 | ### June 13, 2023
658 | + Python podcasts suggestion from Tomasz: [PythonBytes](https://pythonbytes.fm/)
659 | + David talked about a new Python module called “Pandas AI” that he did find useful if you have a paid ChatGPT account
660 |   + https://github.com/gventuri/pandas-ai
661 |   + “Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational”
662 |   + David did also find a poorly written blog post that claimed features that Pandas AI does NOT have, so stay away from this article…
663 |     + https://levelup.gitconnected.com/introducing-pandasai-the-generative-ai-python-library-568a971af014
664 | + We talked about when we have used chatGPT to write some Python code snippets, and what were our results.
665 |   + The results were mostly positive, but we talked about the benefits of already knowing Python well enough to formulate requests more precisely and to evaluate how good the chatGPT responses were
666 |   + Someone mentioned that chatGPT has become an alternative to StackOverflow, especially if you are in a hurry
667 |   + Someone mentioned GitHub Copilot, which uses AI to write code for you: “Those of us who have GitHub educator accounts have free access to Copilot. Have not tried it. Very reluctant, personally.”
668 |     + https://en.wikipedia.org/wiki/GitHub_Copilot
669 |     + As a counter argument there is this article [“Why I don’t use Copilot”](https://inkdroid.org/2023/06/04/copilot/)
670 |   + Will StackOverflow become obsolete with the revolution in AI? Yamil thinks that it is a good inspiration for prompts, and still has great information
671 |   + We saw an example of sharing a snippet of object oriented Python code to ask chatGPT to explain what is missing
672 |   + One of the participants was glad to get the explanations from chatGPT of what was missing in their object oriented code
673 |     + Here is the link to the chat https://chat.openai.com/share/fea426fb-cb02-4b38-9f42-128f59115fc4
674 |   + A recent Code4Lib article that talked about using AI generated code was shared [“Utilizing R and Python for Institutional Repository Daily Jobs”](https://journal.code4lib.org/articles/17134)
675 |   + We briefly talked about the ethics of using AI-written code from models trained on code that others published publicly on GitHub, but without their explicit consent
676 |   + Podcast example created by AI:
677 |     “I’ve been listening to this series in the Planet Money podcast where they try to make an entire podcast episode made by AI:” https://www.npr.org/series/1178395718/planet-money-makes-an-episode-using-ai
678 |   + Charles asked if anyone was using Python to automate work with the Azure cloud computing platform
679 |     + https://en.wikipedia.org/wiki/Microsoft_Azure
680 |     + We briefly talked about [“Azure Functions”](https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview?pivots=programming-language-csharp) which seem similar to [AWS Lambda](https://en.wikipedia.org/wiki/AWS_Lambda)
681 | + We talked about a great site and free book that many people use to get started with Python [“Python for Everyone”](https://www.py4e.com/)
682 | + We also talked about the well-known and still very popular Python [Requests](https://requests.readthedocs.io/en/latest/) module, and also the newer and “async compatible” [HTTPX](https://www.python-httpx.org/) module, which was also mentioned on the Python Slack channel.
683 | 
684 | ### May 30, 2023
685 | + David shared his code utilizing `pymarc` to harvest and clean OCLC records. An older example of code: https://github.com/derlandson/PyCat
686 | + Demo of Match MARC toolset as well.
687 | + Tomasz reported his first experiences using `pymarc` v.5
688 | + Discussed a potential `pymarc` feature: ordering subfields according to a particular field's cataloging practice
689 |   + challenge: no clear, outlined rules to base it on
690 | + Rebecca demoed a script created to have circ desk staff click a single button for simple questions (directions, tech, find a book, etc.) Creates output file and emails results as csv once per month.
691 |  Currently doesn’t need admin permissions but various features may impact this.
692 |   + simplified `pyinstaller` app: [auto-py-to-exe](https://pypi.org/project/auto-py-to-exe/) was used to help redeploy to other PCs.
694 | 
695 | ### May 16, 2023
696 | + We had a brief discussion about [pymarc](https://pymarc.readthedocs.io/en/latest/) and [MARC authority data](https://www.loc.gov/marc/authority/ecadhome.html)
697 |   + sparked by Benjamin's issues with using pymarc for authority records
698 |   + Tomasz ran some quick tests and they looked good: `pymarc` was able to read such data, but more tests are needed to see if manipulating and writing is done correctly. There were concerns about differences in the leader field between the bibliographic and authority data
699 | 
700 | #### Ed Summers intro to new pymarc
701 | + David introduced Ed
702 | + Ed stated that pymarc is the work of many people, and that his involvement is more that of a maintainer
703 | 
704 | ##### Breaking changes in pymarc v.5:
705 | + new class `pymarc.Subfield`
706 | + helper properties instead of methods
707 |   + old: record.title(), new: record.title
708 |   + old: record.publisher(), new: record.publisher
709 | + automatically sets the UTF-8 code in position 9 of the record leader
710 |   + pymarc always converts data to unicode, but before it did not attempt to change the code in the leader to reflect that
711 |   + most people don't want to write MARC-8, and want UTF-8 encoded data
712 | 
713 | + Ed shows off doing live coding! Uses [Google Colab](https://colab.research.google.com/) and Jupyter notebooks (tip: you can pip install packages in Colab: `!pip install pymarc`; the exclamation mark tells the notebook cell that this is not Python code but a command line command)
714 | + Ed shows initiating new record instance, and adding fields with the new model for subfields
715 | + `Subfield` is a python [`namedtuple`](https://docs.python.org/3.10/library/collections.html?highlight=namedtuple#collections.namedtuple)
716 | 
717 | *New:*
718 | ```python
719 | from pymarc import Record, Field, Subfield
720 | 
721 | record = Record()
722 | record.add_field(
723 |     Field(
724 |         tag="245",
725 |         indicators=["0", "0"],
726 |         subfields=[
727 |             Subfield(code="a", value="Foo :"),
728 |             Subfield(code="b", value=" bar /"),
729 |             Subfield(code="c", value="Spam.")
730 |         ]
731 |     ))
732 | ```
733 | or simply:
734 | ```python
735 | field = Field(
736 |     tag="245",
737 |     indicators=["0", "0"],
738 |     subfields=[
739 |         Subfield("a", "Foo :"),
740 |         Subfield("b", "bar /"),
741 |         Subfield("c", "Spam.")
742 |     ])
743 | ```
744 | 
745 | *Old:*
746 | ```python
747 | record.add_field(
748 |     Field(
749 |         tag="245",
750 |         indicators=["0", "0"],
751 |         subfields=["a", "Foo :", "b", "bar /", "c", "Spam."]
752 |     ))
753 | ```
754 | 
755 | + The new model has advantages over subfields as a list of strings:
756 |   + matches how catalogers think about subfields - as code-value pairs (Tomasz)
757 |   + helps guard against errors such as missing an element to properly create a subfield
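
The second advantage can be illustrated without pymarc at all. The sketch below uses a stand-in `Subfield` built with `collections.namedtuple` (an assumption for illustration, not pymarc's own class) to contrast a flat list that is missing one element with explicit code-value pairs.

```python
from collections import namedtuple

# Stand-in for illustration only; pymarc 5 ships its own Subfield class
Subfield = namedtuple("Subfield", ["code", "value"])

# Old style: a flat list. The "b" code was forgotten, so every later
# element silently shifts and codes and values get mixed up.
flat = ["a", "Foo :", "bar /", "c", "Spam."]
pairs_from_flat = list(zip(flat[0::2], flat[1::2]))
print(pairs_from_flat)

# New style: each subfield is built as an explicit code-value pair
subfields = [
    Subfield("a", "Foo :"),
    Subfield("b", "bar /"),
    Subfield("c", "Spam."),
]
code, value = subfields[1]  # namedtuples unpack like ordinary tuples

# Forgetting the value now fails immediately instead of corrupting the field
try:
    Subfield("b")
    error_message = ""
except TypeError as err:
    error_message = str(err)
print(error_message)
```

With the flat list, the mistake only surfaces later as a field whose codes and values are swapped; with pairs, the error is raised on the line that caused it.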
758 | 
759 | 
760 | + discussed briefly the differences between pymarc and the similar Perl library [MARC::Record](https://metacpan.org/pod/MARC::Record)
761 | + Ed showed a tip on how to handle malformed or otherwise invalid records when looping over a file:
762 | records that cause errors (malformed bibs, leader length problems, etc.) are returned as `None` by the reader
763 | 
764 | ```python
765 | from pymarc import MARCReader
766 | 
767 | with open("foo.mrc", "rb") as marcfile:
768 |     reader = MARCReader(marcfile)
769 |     for record in reader:
770 |         if record is None:
771 |             print(reader.current_exception)
772 |         else:
773 |             print(record.title)  # do something with the valid record
774 | ```
775 | 
776 | + talked about potential new features in pymarc, for example handling of [linked 880 fields](https://www.loc.gov/marc/bibliographic/bd880.html) that include parallel data in non-Latin scripts
777 | 
778 | ### May 2, 2023
779 | + We talked about Rebecca’s code for parsing MARCXML from Ex Libris Alma.
780 | Then we talked about our various experiences (good and bad) with parsing XML with Python’s built-in ElementTree module versus lxml versus Beautiful Soup. We took a moment to talk about the typical issues that can come up with web scraping when a site’s HTML changes over time.
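
As a rough illustration of the ElementTree option, here is a minimal sketch that pulls a control number and a title out of a MARCXML record. The sample record and the `get_subfields` helper are made up for this example (they are not Rebecca's code or actual Alma output), and it assumes the record uses the standard MARCXML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record for illustration, not taken from Alma
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">123456</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title :</subfield>
    <subfield code="b">a subtitle /</subfield>
  </datafield>
</record>"""

# Map a short prefix to the MARCXML namespace so the paths stay readable
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def get_subfields(record, tag, code):
    """Return the text of every subfield `code` in every datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, NS)]


record = ET.fromstring(MARCXML)
control_number = record.findtext("marc:controlfield[@tag='001']", namespaces=NS)
title = get_subfields(record, "245", "a")
print(control_number, title)
```

Forgetting the namespace mapping is a common stumbling block: without it, `findall()` returns an empty list for namespaced records rather than raising an error.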
781 | 
782 | + We then spoke about Eric Morgan’s recent question to the Code4lib mailing list about “literary warrant.”
783 | 
784 | + John asked if anyone had experience with Python modules for creating barcodes. We briefly also spoke about creating QR codes with Python.
785 | John is using this Python module:
786 | https://python-barcode.readthedocs.io/en/stable/
787 | Jason is using:
788 | PyQRCode==1.2.1
789 | pyzbar==0.1.9
790 | Emma shared a good explainer on QR codes : https://ivantay2003.medium.com/qr-code-demystify-2a5263ab136e
792 | 
793 | + Meghan asked what tools to use when handed Excel or CSV files from which users would like charts created in a way that is shareable. This is in addition to creating charts inside Excel and Jupyter or Colab notebooks, then sharing them with a group of people.
794 | Here are some of the suggestions discussed...
795 | Plotly - https://plotly.com/
796 | Streamlit - https://streamlit.io/
797 | This book on mixing Python to process data, but then use JS based tools for web visualization was mentioned again in this group...
798 | Data Visualization with Python and JavaScript, 2nd Edition
799 | https://www.oreilly.com/library/view/data-visualization-with/9781098111861/
800 | The author’s website is also worth a look: https://www.kyrandale.com
802 | 
803 | + We talked about creating RDFs with Python, including Python modules, visualization tool, and GML files
804 | https://github.com/RDFLib/rdflib
805 | https://rdflib.readthedocs.io/en/stable/
806 | “RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.”
807 | https://gephi.org/
808 | “Gephi is the leading visualization and exploration software for all kinds of graphs and networks. Gephi is open-source and free.”
809 | GML files
810 | https://en.wikipedia.org/wiki/Graph_Modelling_Language
811 | + We talked briefly about the new version 5.0 of Pymarc and that we would like to go over the changes to Pymarc in this group in the future
812 | https://gitlab.com/pymarc/pymarc/-/releases/v5.0.0
813 | 
814 | ### April 18, 2023
815 | At today's meeting we had @michelle.janowiecki give a short presentation on Pandas, partially based on a longer Pandas presentation she has given before.
816 | [Speedy pandas : a super brief intro to Python's pandas library (see slides)](https://docs.google.com/presentation/d/1xRdNVonTxi9-gEsQkNvbF1e47o_2cuo1iimunoFUky4/edit#slide=id.p)
817 | Here are a couple of useful links from her presentation...
818 | 
819 | #### Pandas Official resources
820 | + [documentation website](https://pandas.pydata.org/pandas-docs/stable/index.html)
821 | + [User Guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)
822 | + [API reference](https://pandas.pydata.org/pandas-docs/stable/reference/index.html)
823 | 
824 | #### Pandas Additional resources
825 | + ["Pandas for Metadata Transformation and Cleanup" workshop by Michelle Janowiecki](https://mjanowiecki.github.io/intro-pandas-metadata/intro.html)
826 | + the best book: [Pandas for everyone : Python data analysis](https://www.worldcat.org/title/pandas-for-everyone-python-data-analysis/oclc/1240309883?referer=br&ht=edition)
827 | 
828 | #### Examples of the code Michelle demonstrated
829 | 
830 | ```python
831 | import pandas as pd
832 | 
833 | filename = "sampleData.csv"
834 | df = pd.read_csv(filename)
835 | print(df.head())
836 | 
837 | print(df.columns)
838 | 
839 | degree_department = df["degree_department"]
840 | department_unique = degree_department.unique()
841 | print(department_unique)
842 | unique_list = list(department_unique)
843 | print(unique_list)
844 | ```
845 | 
846 | ```python
847 | import pandas as pd
848 | 
849 | filename = "sampleData.csv"
850 | df = pd.read_csv(filename)
851 | 
852 | print(df.shape)
853 | df = df.dropna(axis=0, how="all")
854 | df = df.dropna(axis=1, how="all")
855 | df = df.drop_duplicates()
856 | df["title"] = df["title"].str.strip()
857 | 
858 | print(df.head())
859 | print(df.shape)
860 | 
861 | df.to_csv("sampleData_cleaned.csv", index=False)
862 | ```
863 | 
864 | ```python
865 | import pandas as pd
866 | 
867 | df_1 = pd.read_csv("frame_1.csv")
868 | df_2 = pd.read_csv("frame_2.csv")
869 | 
870 | merged = pd.merge(df_1, df_2, how="left", on="subject_id")
871 | print(merged.head())
872 | 
873 | merged.to_csv("merged_frames.csv", index=False)
874 | ```
875 | 
876 | These are some of the Pandas features @michelle.janowiecki demonstrated today
877 | + drop_duplicates()
878 | + dropna()
879 | + merge()
880 | 
881 | After the presentation we all exchanged pandas usage tips
882 | + like pd.json_normalize(a_dict)
883 |   + “All Pandas json_normalize() you should know for flattening JSON”
884 |   + https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd
885 | + and the ability of doing mathematical
886 | + also there was a mention of the command line JQ tool for parsing JSON
887 |   + https://stedolan.github.io/jq/
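
To give a sense of what "flattening" nested JSON means, here is a hand-written sketch using only the standard library. It is not how pandas implements `json_normalize()`, which handles many more cases; the `flatten` helper and the sample record are hypothetical.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key path along
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are
            flat[full_key] = value
    return flat


# Hypothetical record for illustration
record = {"id": 1, "title": {"main": "Foo", "sub": "bar"}, "tags": ["x", "y"]}
flat_record = flatten(record)
print(flat_record)
```

Each flattened dictionary then maps naturally onto one row of a table, with the dotted keys as column names.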
888 | 
889 | 
890 | ### April 4, 2023
891 | #### The mini-workshop "An Introduction to Python for Absolute Beginners":
892 | A very basic intro to Python for librarians who have little to no experience with Python but who want to get started.
893 | + What is Python and why is it useful? (5 min)
894 | + Hands-on practice with basic operations in Python, using Google Colaboratory (25 min)
895 | + Print function
896 | + Data types
897 | + Arithmetic operations
898 | + String concatenation
899 | + Variable assignment
900 | + Q&A/Resources (15 min)
901 | 
902 | #### Notes
903 | + We got a shortened version of a Rice University Library workshop called “Mini Python Intro”
904 | + We used Google Colab
905 |   + Which is a free Google service that essentially hosts Python Jupyter Notebooks that can be shared with others
906 |   + https://colab.research.google.com/
907 | + For the training we used the following resources
908 |   + Pre-loaded notebook:
909 |     + https://colab.research.google.com/drive/1m3cz4KeozooHFzjswyjgJmbfXZTfG0mP?usp=sharing
910 |     + Exercises:
911 |       + https://drive.google.com/file/d/1CRda_Gh3mrqpEmbnvF58-7jvYnmV-LhI/view?usp=share_link
912 | + We discussed the Python print() sep parameter
913 | + David shared an article and specifically a tip about using a “union” operator, in this special case a bar “|” character, to join multiple dictionaries together.
914 |   + https://medium.com/techtofreedom/19-sweet-python-syntax-sugar-for-improving-your-coding-experience-37c4118fc6b1
915 | + Python f-strings were briefly mentioned
916 |   + https://realpython.com/python-f-strings/
917 |   + f-strings are available as of Python 3.6
918 | + Talked about copy/pasting parts of your Python error messages right into Google to help you figure out what is wrong
919 |   + and how https://stackoverflow.com/ is a common place to look for error advice
920 |   + Google Colab actually offers to send you to StackOverflow when you get an error on code running in Colab
921 | + @Yamil Suárez shared a code snippet demonstrating how to read a file stored in a Google Drive into Google Colab:
922 | ```python
923 | from google.colab import drive
924 | drive.mount('/content/drive')
925 | 
926 | import pandas as pd
927 | 
928 | df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/data.tsv', sep='\t')
929 | ```
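
The `print()` `sep` parameter, the dictionary union operator, and f-strings mentioned in the notes above can be sketched together in a few lines. The variable names and values are made up for illustration, and the `|` operator on dictionaries assumes Python 3.9 or newer.

```python
# print() joins its arguments with sep (a single space by default)
print("2023", "04", "04", sep="-")

# Dictionary union with | (Python 3.9+); on duplicate keys the right side wins
defaults = {"format": "csv", "limit": 10}
overrides = {"limit": 50}
settings = defaults | overrides

# f-strings (Python 3.6+) embed expressions directly in the string
summary = f"Exporting {settings['limit']} rows as {settings['format']}"
print(summary)
```

Neither `defaults` nor `overrides` is modified by `|`; the union is a new dictionary.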
930 | 
931 | 
932 | 
933 | ### March 21, 2023
934 | + Talked about this group’s [new repository](https://github.com/code4lib/python4lib-resources), and that we want to encourage others to contribute changes via PRs (or reach out to the group)
935 | + Talked about combining JS and python for web visualization
936 |   + [Data Visualization with Python and JavaScript, 2nd Edition by Kyran Dale](https://www.oreilly.com/library/view/data-visualization-with/9781098111861/)
937 | + Talked about whether we should currently be using homebrew to install Python on macOS
938 |   + https://docs.brew.sh/Homebrew-and-Python
939 |   + consensus was that it should work fine
940 |   + “if you use VSCode, it recommends homebrew on mac. I used home-brew to install 3.10 and I haven’t encountered any issues”
941 |   + (https://code.visualstudio.com/docs/python/python-tutorial)
942 |   + we talked about how Anaconda can also be used for Python installations
943 | + Talked about [Library Carpentry lessons](https://librarycarpentry.org/lessons/) on Python and other skills like bash, OpenRefine
944 | + Spoke a bit about [Google Colab](https://colab.research.google.com/), which is essentially Jupyter Notebooks in the cloud, with no need for local installation
945 | + Pivoted to talk about interesting things seen during Code4Lib
946 |   + the Python GUI package mentioned named [Gooey](https://pypi.org/project/Gooey/)
947 |   + “There was a poster about updating subject headings as well. Which was something we had briefly talked about briefly a week before C4L.”
948 | + Touched on a suggested breaking change to [pymarc](https://gitlab.com/pymarc/pymarc), [MR details](https://gitlab.com/pymarc/pymarc/-/merge_requests/194)
949 |   + this change uses Python “namedtuples”
950 |   + this change is welcome by many
951 |   + We then covered how to use pymarc with authority records, as opposed to bibliographic records - more research needs to be done
952 | + NOTE: this Python group in the future plans to host a pymarc “code recipe” sharing session
953 | + Talked about current issues in pymarc with MARC bib tag 880
954 | 
955 | ### March 7, 2023
956 | + Introductions with a few new members
957 | + Moved the Python{4}Lib resource page to a Code{4}Lib repository, thanks @klinga
958 | + @Rebecca Hyams working on an ELUNA Dev. Day presentation gathering specific holding data (granular) from Alma via API and parsing it via python script.
959 | Chat about maintaining authorities when you’ve decided to change from standard language. Is/should there be a tool to check for changes for authorities you select?
960 | + A project for a heat map visual for circulation might be a new way of helping to weed/collection develop.
961 |   + Perhaps there's interest to have a working group dive into different projects. Could be helpful for design ideas.
962 | + Dashboards and/or developing scripts that can translate one form of data to another; identifying transformation steps and when to streamline them in one script vs. multiple.
963 | + IPEDS data transformations. A lot of data isn’t as streamlined as we’d like every time IPEDS comes up. Still quite local though. (Changes year to year?)
964 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | certifi==2022.9.24
2 | charset-normalizer==2.1.1
3 | idna==3.4
4 | requests==2.28.1
5 | urllib3==1.26.12
6 | 


--------------------------------------------------------------------------------