├── .gitignore ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── demo ├── __init__.py ├── pydantic.ipynb └── requests_session_example.py ├── media ├── airflow1.png ├── airflow2.png ├── airflow3.png ├── airflow4.png ├── airflow5.png ├── airflow7.png ├── env-mgmt.png ├── marc1.png ├── marc2.png ├── pax-opex1.png ├── pax-opex2.png └── pax-opex3.png ├── mtg_notes.md └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | 131 | # IntelliJ 132 | .idea -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | ## How to contribute 4 | Use [pull requests](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) to propose changes to this repository: 5 | 1. Fork this repo and create your branch from `main` 6 | 2. Make changes/additions in your fork 7 | 3. 
When ready issue a pull request 8 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Tomek 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # p4l-resources 2 | Shared space for the Python{4}Lib group. 3 | 4 | See our [meeting notes](mtg_notes.md) for more details. 
5 | 6 | Upcoming meetings (meetings at 11am Eastern time): 7 | + *No meeting on May 14 during Code4Lib conference* 8 | + May 28, 2024: Thomas Guinard talks about Jupyter Kernel Gateway 9 | + June 11, 2024: Rebecca Hyams demos Postman 10 | + June 25, 2024: Charles Brown-Roberts & Eddie Prieto introduce deployment of web apps (Flask, security, and more) 11 | 12 | Would you like to suggest a worthy resource? See [contributing instructions](CONTRIBUTING.md). 13 | 14 | 15 | ## Python Resources 16 | ### Reference 17 | + [Python Cheatsheet](https://www.pythoncheatsheet.org/) 18 | 19 | ### Books 20 | + [Automate the Boring Stuff with Python : practical programming for total beginners / Al Sweigart](https://worldcat.org/title/1128094127) 21 | + [Python crash course : a hands-on, project based introduction to programming / Eric Matthes](https://search.worldcat.org/title/1350635022) 22 | + [Python workout: 50 ten-minute exercises / Reuven M. Lerner](https://search.worldcat.org/title/1121083840) 23 | + [Effective Python: 59 Ways to Write Better Python / Brett Slatkin](https://www.worldcat.org/title/1140129622) 24 | + [Pandas for everyone: Python data analysis / Daniel Y. 
Chen](https://worldcat.org/en/title/1240309883) 25 | + [Data Visualization with Python and JavaScript, 2nd Edition by Kyran Dale](https://www.oreilly.com/library/view/data-visualization-with/9781098111861/) 26 | 27 | ### Tutorials 28 | + [Official Python Tutorial](https://docs.python.org/3/tutorial/index.html) 29 | 30 | ### Courses 31 | + General Python courses on [Coursera](https://www.coursera.org/courses?query=python) (free to enroll) 32 | + [Python for Librarians / Library Juice Academy](https://libraryjuiceacademy.com/shop/course/270-python-for-librarians/) (fee) 33 | + [Library Carpentry](https://librarycarpentry.org/lessons/) (free lessons, paid sessions with an instructor) 34 | + [Learn Python 3 the Hard Way / Zed Shaw](https://shop.learncodethehardway.org/access/buy/9/) (free with O'Reilly for Higher Education subscription) 35 | 36 | ### Articles 37 | + [Fuzzy Matching at Scale / Josh Taylor](https://towardsdatascience.com/fuzzy-matching-at-scale-84f2bfd0c536) 38 | + [19 Sweet Python Syntax Sugar for Improving Your Coding Experience](https://medium.com/techtofreedom/19-sweet-python-syntax-sugar-for-improving-your-coding-experience-37c4118fc6b1) 39 | 40 | ### Podcasts 41 | + [Python Bytes](https://pythonbytes.fm/) weekly Python news podcast hosted by Michael Kennedy and Brian Okken 42 | + [Test & Code](https://testandcode.com/) hosted by Brian Okken, focused on automated testing in Python 43 | + [Talk Python To Me](https://talkpython.fm/) hosted by Michael Kennedy 44 | + [Podcast.__init__](https://www.pythonpodcast.com/) hosted by Tobias Macey 45 | + [The Real Python Podcast](https://realpython.com/podcasts/rpp/) weekly coding tips, news, and interviews 46 | 47 | ### Blogs 48 | + [Practical Business Python](https://pbpython.com/) / data science centric 49 | 50 | ### Member Presentations 51 | + [Finding a path forward: The use of Python to support technical services work in academic 
libraries](https://docs.google.com/presentation/d/1598qxRIB08_kLaJov_CsKWHw5VctFY0MIZhohQUG6ww/edit#slide=id.p1) Talk given at Python{4}Lib 9/20/22 by Maria Collins and Xiaoyan Song based on their presentation at ER&L 2022 52 | + [Intro to unit testing in Python / Yamil Suárez](https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing) 53 | + [Speedy pandas : a super brief intro to Python's pandas library / Michelle Janowiecki](https://docs.google.com/presentation/d/1xRdNVonTxi9-gEsQkNvbF1e47o_2cuo1iimunoFUky4/edit#slide=id.p) 54 | 55 | ## Tools 56 | 57 | ### Library metadata 58 | + [pybibframe](https://pypi.org/project/pybibframe/) - MARC/XML to RDF or Versa output converter 59 | + [pymarc](https://pymarc.readthedocs.io/en/latest/) - MARC parser 60 | + [marcgrep](https://github.com/phette23/marcgreppy) - CLI for searching MARC files 61 | 62 | #### ILS & other library systems wrappers 63 | + [almapipy](https://github.com/UCDavisLibrary/almapipy) - Alma API wrapper 64 | + [caiasoft-sdk-python](https://github.com/kstatelibraries/caiasoft-sdk-python) - SDK for Connecting to the CaiaSoft API 65 | 66 | #### Transliteration / romanization 67 | + [Aksharamukha](https://github.com/virtualvinodh/aksharamukha-python) - transliteration of 120 Indic languages 68 | + [ArabicTransliterator](https://github.com/MTG/ArabicTransliterator) - ALA-LC transliteration tool for Arabic 69 | + [cyrillic-transliteration](https://github.com/opendatakosovo/cyrillic-transliteration) - bi-directional transliteration of Cyrillic script to Latin script and vice versa 70 | + [graphtransliterator](https://github.com/seanpue/graphtransliterator) 71 | 72 | ### Data Analysis 73 | #### Pandas 74 | + [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/index.html) 75 | + [Intro to Python: Pandas for Metadata Transformation and Cleanup / workshop by Michelle Janowiecki](https://mjanowiecki.github.io/intro-pandas-metadata/intro.html) 76 | + [Speedy 
pandas / Michelle Janowiecki](https://docs.google.com/presentation/d/1xRdNVonTxi9-gEsQkNvbF1e47o_2cuo1iimunoFUky4/edit#slide=id.p) 77 | + [All Pandas json_normalize() you should know for flattening JSON / B. Chen](https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd) 78 | ### Data Validators 79 | + [Pydantic official documentation](https://docs.pydantic.dev/latest/) 80 | 81 | ### GUI 82 | + [VisualTK](https://visualtk.com/) / (great starting point to visually create a GUI in Tkinter) 83 | + [Gooey](https://pypi.org/project/Gooey/) (simple GUI package, transforms argparse into GUI) 84 | 85 | ### HTTP 86 | #### Requests 87 | + [Requests official docs](https://requests.readthedocs.io/en/latest/) 88 | + [Python's Requests Library (Guide) / Alex Ronquillo](https://realpython.com/python-requests/) 89 | + [HTTPX official docs](https://www.python-httpx.org/) 90 | #### Link checkers 91 | + [LinkChecker official documentation](https://linkchecker.github.io/linkchecker/) 92 | #### Retries 93 | + [stamina official docs](https://stamina.hynek.me/en/stable/index.html) 94 | + [tenacity official docs](https://tenacity.readthedocs.io/en/latest/) 95 | 96 | ### Packaging 97 | #### Briefcase (packaging) 98 | + [Briefcase documentation](https://briefcase.readthedocs.io/en/latest/) 99 | + [PyCon 2020 'Snakes In a Case' talk by Russell Keith-Magee](https://us.pycon.org/2020/schedule/presentation/126/) 100 | + [Qt for Python & Briefcase](https://doc.qt.io/qtforpython/deployment-briefcase.html) 101 | 102 | #### PyInstaller (packaging) 103 | + [PyInstaller documentation](https://pyinstaller.org/en/stable/index.html) 104 | + [Easy Steps to Create an Executable in Python Using Pyinstaller / Renu Khandelwal](https://medium.com/swlh/easy-steps-to-create-an-executable-in-python-using-pyinstaller-cc48393bcc64) 105 | + [Using PyInstaller to Easily Distribute Python Applications / Luke Lee](https://realpython.com/pyinstaller-python/) 106 | + 
[auto-py-to-exe](https://pypi.org/project/auto-py-to-exe/) - PyInstaller made easy 108 | 109 | ### QR codes 110 | + [QR Code Demystify / Ivan](https://ivantay2003.medium.com/qr-code-demystify-2a5263ab136e) 111 | + [python-barcode](https://python-barcode.readthedocs.io/en/stable/) 112 | + [PyQRCode](https://pythonhosted.org/PyQRCode/) 113 | + [pyzbar](https://github.com/NaturalHistoryMuseum/pyzbar/) 114 | 115 | ### RDF 116 | + [rdflib](https://rdflib.readthedocs.io/en/stable/) 117 | + [Gephi](https://gephi.org) 118 | 119 | ### Testing 120 | + [Intro to unit testing in Python / Yamil Suárez](https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing) 121 | 122 | ### Visualization 123 | + [Python Data Visualization: Where to Start? : Interview with Chris Moffitt / Talk Python To Me: episode # 384](https://talkpython.fm/episodes/transcript/384/python-data-visualization-where-to-start) (a great overview of available tools) 124 | + [Data Visualization with Python and JavaScript, 2nd Edition by Kyran Dale](https://www.oreilly.com/library/view/data-visualization-with/9781098111861/) 126 | -------------------------------------------------------------------------------- /demo/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/demo/__init__.py -------------------------------------------------------------------------------- /demo/pydantic.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "id": "5a244171-3219-4bd5-9c97-ae59c7621536", 7 | "metadata": { 8 | "tags": [] 9 | }, 10 | "outputs": [], 11 | "source": [ 12 | "!pip install pydantic" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "id": "0d596cd1-9e07-4d91-8903-1f28cc011c9f", 18 | "metadata": 
{}, 19 | "source": [ 20 | "## Processing data using dicts\n", 21 | "\n", 22 | "Dictionaries are the backbone of python data structures, but it is very easy to miss errors with them because they do not enforce what kind of data you put into them." 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": null, 28 | "id": "690a8ae9-69a8-4fa9-8a5b-7e653e268263", 29 | "metadata": { 30 | "tags": [] 31 | }, 32 | "outputs": [], 33 | "source": [ 34 | "def report_pet(pet_dict):\n", 35 | " print(f\"My name is {pet_dict['name']} and I need {pet_dict['n_legs'] / 2} pairs of pants\")" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "id": "38adc86e-7f71-43d2-baa8-1567c339ec4b", 42 | "metadata": { 43 | "tags": [] 44 | }, 45 | "outputs": [], 46 | "source": [ 47 | "json_1 = {\"name\": \"Mittens\", \"n_legs\": 4}\n", 48 | "json_2 = {\"name\": \"Slither\", \"n_legs\": 0}\n", 49 | "json_3 = {\"name\": \"Skitter\", \"n_legs\": \"8\"}\n", 50 | "json_4 = {\"n_legs\": 6}" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "id": "59e2d709-11fd-4003-b425-5157c8dfa3c1", 57 | "metadata": { 58 | "tags": [] 59 | }, 60 | "outputs": [], 61 | "source": [ 62 | "report_pet(json_1)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "id": "f455fb8e-d89a-4480-b957-c68c1ab2352b", 69 | "metadata": { 70 | "tags": [] 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "report_pet(json_2)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "id": "c28bfe76-1e30-4b8f-9c8d-07df4ce36ec4", 80 | "metadata": {}, 81 | "source": [ 82 | "The first two pets work fine because their dictionaries have data that happens to be valid. 
But things start to go wrong if we pass the wrong data type." 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "id": "01d1833f-9931-4800-b445-bee966e91606", 89 | "metadata": { 90 | "tags": [] 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "report_pet(json_3)" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "id": "5fc6b993-72b6-4046-8108-0c2bff3c50b0", 100 | "metadata": {}, 101 | "source": [ 102 | "This error only comes up when we run our function to report on the pet - it doesn't check the data any earlier." 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "id": "bdf85d03-10ff-4a95-a545-498f59910610", 109 | "metadata": { 110 | "tags": [] 111 | }, 112 | "outputs": [], 113 | "source": [ 114 | "report_pet(json_4)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "id": "e4075caa-33fc-4139-a710-7683a2aebb7b", 120 | "metadata": { 121 | "execution": { 122 | "iopub.execute_input": "2023-07-27T13:36:02.207076Z", 123 | "iopub.status.busy": "2023-07-27T13:36:02.206472Z", 124 | "iopub.status.idle": "2023-07-27T13:36:02.210591Z", 125 | "shell.execute_reply": "2023-07-27T13:36:02.209948Z", 126 | "shell.execute_reply.started": "2023-07-27T13:36:02.207056Z" 127 | }, 128 | "tags": [] 129 | }, 130 | "source": [ 131 | "And when our dictionary is missing an entire field, we need to figure out what the \"key error\" is." 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "id": "f67db9a6-3861-45c5-917b-468afe93344f", 137 | "metadata": {}, 138 | "source": [ 139 | "## Processing data with Pydantic" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "id": "47e53073-a6db-4986-a5da-17015803ff83", 145 | "metadata": {}, 146 | "source": [ 147 | "[Pydantic](https://docs.pydantic.dev/latest/) uses Python type hints to define a class - a way of stating the exact shape of data we expect to receive." 
148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "id": "b91eb9a1-37b1-4456-bf76-27939bebde1d", 154 | "metadata": { 155 | "tags": [] 156 | }, 157 | "outputs": [], 158 | "source": [ 159 | "from pydantic import BaseModel\n", 160 | "\n", 161 | "class PydanticPet(BaseModel):\n", 162 | " name: str\n", 163 | " n_legs: int\n", 164 | "\n", 165 | "def report_pypet(pypet: PydanticPet):\n", 166 | " print(f\"My name is {pypet.name} and I need {pypet.n_legs / 2} pairs of pants\")" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "id": "e3d62f5e-8077-4c5f-805d-6987f108e683", 172 | "metadata": {}, 173 | "source": [ 174 | "Note that we aren't accessing dictionary keys with `[\"strings\"]` that may or may not succeed, but instead using dot notation `pypet.name` because we _know_ that every `PydanticPet` instance has an attribute called `name`." 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "id": "2ce2f8c1-bcac-422c-bfef-74d3aea57489", 181 | "metadata": { 182 | "tags": [] 183 | }, 184 | "outputs": [], 185 | "source": [ 186 | "pypet_1 = PydanticPet(**json_1)\n", 187 | "# The ** operator unpacks a dictionary into keyword arguments, so the line above is equivalent to:\n", 188 | "# pypet_1 = PydanticPet(name=\"Mittens\", n_legs=4)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "id": "95382aac-56dd-4997-9e1c-060b585ab8fb", 195 | "metadata": { 196 | "tags": [] 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "pypet_1" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "id": "2068e358-1139-4174-9571-4e07869b0f58", 207 | "metadata": { 208 | "tags": [] 209 | }, 210 | "outputs": [], 211 | "source": [ 212 | "report_pypet(pypet_1)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "id": "d8755600-1b92-4b9b-ba61-62e7cc5bbff6", 219 | "metadata": { 
220 | "tags": [] 221 | }, 222 | "outputs": [], 223 | "source": [ 224 | "pypet_2 = PydanticPet(**json_2)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "id": "77225e76-eb12-43b2-8941-1936178e2e64", 231 | "metadata": { 232 | "tags": [] 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "pypet_2" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "id": "041a1ea7-7c8b-464f-8755-2fd1e1ace059", 243 | "metadata": { 244 | "tags": [] 245 | }, 246 | "outputs": [], 247 | "source": [ 248 | "report_pypet(pypet_2)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "id": "8fe78f14-d817-4dbf-974a-48b05ae5a0b8", 254 | "metadata": {}, 255 | "source": [ 256 | "Pydantic can automate certain kinds of data parsing, such as converting the string `\"8\"` to the integer `8`." 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "id": "3a2519f3-ebdf-41d1-bda5-e342a2520a93", 263 | "metadata": { 264 | "tags": [] 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "pypet_3 = PydanticPet(**json_3)" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "id": "e779abfa-9808-47c8-8490-f3883070c14c", 275 | "metadata": { 276 | "tags": [] 277 | }, 278 | "outputs": [], 279 | "source": [ 280 | "pypet_3" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "id": "ee16367b-4cdf-48cb-8110-2f2d4bf32355", 287 | "metadata": { 288 | "tags": [] 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "report_pypet(pypet_3)" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": null, 298 | "id": "1b66ef5f-21f3-4b8d-b43e-12fa05800f12", 299 | "metadata": { 300 | "tags": [] 301 | }, 302 | "outputs": [], 303 | "source": [ 304 | "PydanticPet(**json_4)" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "id": "da3c71cf-354d-4b22-be22-ba0eaec91afd", 310 | "metadata": {}, 311 | "source": 
[ 312 | "Pydantic raises a `ValidationError` that provides a clear reason why the data passed in was invalid." 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "id": "b0c0aa3e-b907-4317-bb71-3615c15da220", 318 | "metadata": {}, 319 | "source": [ 320 | "## Nesting and lists\n", 321 | "\n", 322 | "Pydantic models can refer to other Pydantic models, and can nest lists of data too." 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "id": "0ab0b456-873c-4a68-882d-e5c1abdcfc03", 329 | "metadata": { 330 | "tags": [] 331 | }, 332 | "outputs": [], 333 | "source": [ 334 | "class PetDaycare(BaseModel):\n", 335 | " name: str\n", 336 | " founding_year: int | None = None # Optional: the explicit None default lets callers omit founding_year (Pydantic v2 requires the field otherwise)\n", 337 | " current_pets: list[PydanticPet] = []" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "id": "01ef018e-74d3-4c89-89ad-ebd71205f584", 344 | "metadata": { 345 | "tags": [] 346 | }, 347 | "outputs": [], 348 | "source": [ 349 | "local_daycare = PetDaycare(name=\"All Things That Crawl\")" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "id": "570f829b-1540-40cb-84c3-1a380b63e4b6", 356 | "metadata": { 357 | "tags": [] 358 | }, 359 | "outputs": [], 360 | "source": [ 361 | "local_daycare" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "id": "6a179a22-3674-4dd9-be92-d624630e482a", 368 | "metadata": { 369 | "tags": [] 370 | }, 371 | "outputs": [], 372 | "source": [ 373 | "local_daycare.current_pets.append(pypet_1)\n", 374 | "local_daycare.current_pets.append(pypet_2)\n", 375 | "local_daycare.current_pets.append(pypet_3)" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "id": "07beecd1-2281-4550-af8b-22780f19887c", 382 | "metadata": { 383 | "tags": [] 384 | }, 385 | "outputs": [], 386 | "source": [ 387 | "local_daycare" 388 | ] 389 | }, 390 | { 
391 | "cell_type": "code", 392 | "execution_count": null, 393 | "id": "68d8f84b-9073-493d-a52f-e6130c28e25f", 394 | "metadata": { 395 | "tags": [] 396 | }, 397 | "outputs": [], 398 | "source": [ 399 | "for pet in local_daycare.current_pets:\n", 400 | " report_pypet(pet)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "id": "1115a0bd-dcf9-4d21-b4c5-0e70d4ad7810", 406 | "metadata": {}, 407 | "source": [ 408 | "One of the biggest uses of pydantic is serializing data to JSON to be used in API servers." 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | "execution_count": null, 414 | "id": "4de9d024-fe4d-41c0-b342-a3eed2218758", 415 | "metadata": { 416 | "tags": [] 417 | }, 418 | "outputs": [], 419 | "source": [ 420 | "local_daycare.json()" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "id": "60a1fc4c-320f-4254-92a2-5a9dba595f35", 426 | "metadata": {}, 427 | "source": [ 428 | "Pydantic also can autogenerate a JSONSchema that can power API documentation pages." 
429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": null, 434 | "id": "7850fda9-051c-4731-904e-0ca62c7c9777", 435 | "metadata": { 436 | "tags": [] 437 | }, 438 | "outputs": [], 439 | "source": [ 440 | "PetDaycare.schema()" 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": null, 446 | "id": "bfe1f33c-5424-44e3-b783-e636675ddd24", 447 | "metadata": {}, 448 | "outputs": [], 449 | "source": [] 450 | } 451 | ], 452 | "metadata": { 453 | "kernelspec": { 454 | "display_name": "Python 3 (ipykernel)", 455 | "language": "python", 456 | "name": "python3" 457 | }, 458 | "language_info": { 459 | "codemirror_mode": { 460 | "name": "ipython", 461 | "version": 3 462 | }, 463 | "file_extension": ".py", 464 | "mimetype": "text/x-python", 465 | "name": "python", 466 | "nbconvert_exporter": "python", 467 | "pygments_lexer": "ipython3", 468 | "version": "3.10.6" 469 | } 470 | }, 471 | "nbformat": 4, 472 | "nbformat_minor": 5 473 | } 474 | -------------------------------------------------------------------------------- /demo/requests_session_example.py: -------------------------------------------------------------------------------- 1 | """ 2 | Pros: 3 | The `requests.Session` object lets you persist parameters across all requests issued within the session. 4 | When a service requires authentication, the session will store the credentials and persist them to be used 5 | for subsequent calls. 6 | If a service you connect to allows keep-alive connections, the Requests session will reuse the connection 7 | across all requests instead of establishing a new one for each request. 
8 | 9 | To install the Requests library: 10 | `pip install requests` 11 | """ 12 | 13 | 14 | from requests import Session 15 | 16 | 17 | def print_request_headers(r, *args, **kwargs): 18 | print(r.url, r.request.headers) 19 | 20 | 21 | def make_multiple_requests_in_session(): 22 | """ 23 | Issue multiple requests to the same service (id.loc.gov), 24 | persist the connection and attach appropriate headers. 25 | 26 | id.loc.gov does not require authentication, but other services 27 | may. Credentials or access tokens can be stored in the session object and used 28 | for each request. 29 | """ 30 | with Session() as session: 31 | session.headers.update( 32 | {"User-Agent": "my_email", "Accept": "application/json"} 33 | ) # will attach these parameters to each request header during the session 34 | # Requests ignores a timeout attribute set on the Session, so the timeout is passed to each call below 35 | 36 | terms = ["sh85080541", "sh91002704", "sh85088368"] 37 | for term in terms: 38 | url = f"https://id.loc.gov/authorities/subjects/{term}" 39 | response = session.get(url, timeout=5, hooks={"response": print_request_headers}) 40 | if response.status_code == 200: 41 | yield response.json() 42 | else: 43 | continue 44 | 45 | 46 | if __name__ == "__main__": 47 | results = make_multiple_requests_in_session() 48 | for response in results: 49 | # do something with each response 50 | pass 51 | -------------------------------------------------------------------------------- /media/airflow1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow1.png -------------------------------------------------------------------------------- /media/airflow2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow2.png 
-------------------------------------------------------------------------------- /media/airflow3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow3.png -------------------------------------------------------------------------------- /media/airflow4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow4.png -------------------------------------------------------------------------------- /media/airflow5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow5.png -------------------------------------------------------------------------------- /media/airflow7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/airflow7.png -------------------------------------------------------------------------------- /media/env-mgmt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/env-mgmt.png -------------------------------------------------------------------------------- /media/marc1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/marc1.png -------------------------------------------------------------------------------- /media/marc2.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/marc2.png -------------------------------------------------------------------------------- /media/pax-opex1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/pax-opex1.png -------------------------------------------------------------------------------- /media/pax-opex2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/pax-opex2.png -------------------------------------------------------------------------------- /media/pax-opex3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/code4lib/python4lib-resources/5d2ac0cebf3a560eb9c59fb69c8dc7270d3f4075/media/pax-opex3.png -------------------------------------------------------------------------------- /mtg_notes.md: -------------------------------------------------------------------------------- 1 | ### May 16, 2024 (Code4Lib Post-Conference Session) 2 | + Eric Phetteplace ran a workshop on [Python4Lib](https://2024.code4lib.org/workshop/Python4Lib) at Code4Lib 2024 in Ann Arbor 3 | + Started with an open discussion where we talked about people's experience with Python and some general topics 4 | + Some folks were mainly familiar with running Python in notebooks, others were more familiar with running Python scripts 5 | + We spoke a bit about managing dependencies and tools like Pipenv/Poetry that help with this and abstract over virtual environments 6 | + We discussed asyncio and asynchronous programming generally, when to use it, what types of problems it addresses, and 
CPU-bound (computation heavy) vs IO-bound (network/files heavy) tasks 7 | + Eric introduced his [`marcgrep`](https://github.com/phette23/marcgreppy) CLI tool for searching MARC records 8 | + We worked through the [c4l24-python4lib](https://github.com/phette23/c4l24-python4lib) repo which has notebooks on several topics. The only topics we covered specifically were: 9 | + [Jupyter Notebooks](https://github.com/phette23/c4l24-python4lib/blob/main/docs/notebooks.md) (the material was delivered as notebooks) 10 | + [Pymarc](https://github.com/phette23/c4l24-python4lib/blob/main/docs/pymarc.ipynb) and common usage patterns, the most foolproof ways to get and modify record information 11 | + [Pandas](https://github.com/phette23/c4l24-python4lib/blob/main/docs/pandas.ipynb) and its fundamental concepts (DataFrames, Series), how to summarize loaded data, stopped after introducing how to filter via bracket expressions 12 | 13 | ### April 30, 2024 14 | + David asked if anyone had experience with or knew of any automated discard assessment tools 15 | + Javier said he has 25,000 volumes to assess for discard 16 | + Tomasz said other groups may know more about these types of tools because tech services may not have responsibility for collections assessment. Reference librarians may know more about potential tools to use. 17 | + Sara Amato has used OCLC API “to look at WC holdings and compare also to HathiTrust and comparisons to other libraries in our group to help make decisions - not great for large scale projects but good for smaller lists. 
I don’t have the code up anywhere though… and it doesn’t have any item level data like circ.” 18 | + Tomasz asked if Pymarc will have a new release due to a change in how indicators are handled 19 | + Indicators will be a named tuple that can only have two positions rather than a list which could be of any length 20 | + The change is outlined in this merge request: https://gitlab.com/pymarc/pymarc/-/merge_requests/206 21 | + Ed: No scheduled release, reluctant to introduce another major version with breaking changes 22 | + More discussion of the change is in the [pymarc google group](https://groups.google.com/g/pymarc/c/cMkDb-dDDBY?pli=1) 23 | + Michael asked if anyone has experience working with APIs for wikimedia/wikimedia commons 24 | + He has copyright free newspaper images he would like to upload in bulk as PDFs (rather than image files which the other wikicommons tools can use) 25 | + Javier mentioned using the APIs to get data out of wikimedia commons but not to POST data 26 | + Tomasz asked about Michael’s involvement in movement to preserve Ukrainian cultural heritage materials after the start of the full scale invasion 27 | + Michael noted there are two parts to this preservation work: 28 | + [SUCHO](https://www.sucho.org/) works on preserving publicly available materials 29 | + There is a separate effort to back up digital materials that are not publicly available 30 | + Michael mentioned Maryna Paliienko, a Fulbright Scholar from Taras Shevchenko University, whose project focuses on archives 31 | + Maryna and Michael recently gave a presentation at NYU: https://www.nycarchivists.org/event-5671162 32 | + Michelle asked for help figuring out why her API calls hang when she tries to upload large files 33 | + Files are ~2GB and she is posting them using the DSpace API. 
The files have to be read in binary before uploading them and the requests just hang after uploading the file successfully 34 | + Yamil mentioned that Python has issues with downloading files that are larger than available RAM and wondered if it has a similar issue with uploading files larger than available RAM 35 | + He also provided link to streaming uploads with Requests: https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads 36 | + Impromptu code review: https://github.com/mjanowiecki/dspace7-rest-api/blob/main/post/postItemsToCollection.py 37 | + Susan asked if the code is sending the correct residual size 38 | + If chunks are in unequal sizes (or the last chunk is not the same size as the others), the API will wait for the last chunk to reach the size of the other chunks 39 | + Ed said it could be helpful to add the complete upload size in the content-length header with the POST request 40 | + Michelle provided a link to a tool that makes it easier to authenticate using the DSpace API: https://github.com/the-library-code/dspace-rest-python/tree/main 41 | + John asked if anyone had recommendations for tools to use to take messy data from google docs and publish it to a dashboard a couple of times a year 42 | + Has been looking at [Streamlit](https://streamlit.io/) and [Pygwalker](https://github.com/Kanaries/pygwalker) as potential options 43 | + Pygwalker has tableau-like display 44 | + Jeremy used streamlit for a project with Hopkins Marine Station: https://taxa.stanford.edu/ 45 | + One issue he noted was that every time a user would interact with the dashboard it would completely reload 46 | + Michael mentioned stumbling across a tool called [Discorpy](https://discorpy.readthedocs.io/en/latest/index.html) and thought it may be of interest after discussion in last Python4Lib session about image cropping/manipulation 47 | + It is a tool for measuring lens distortion in a camera 48 | + Yamil mentioned he is learning about 
[SeleniumBase](https://seleniumbase.io/) 49 | 50 | ### April 16, 2024 51 | + David provided an update on the upcoming Python4Lib presentation schedule: 52 | + April 30 - open topics 53 | + May 14 - skipped, C4L in person 54 | + May 28 - Thomas will be talking Jupyter Kernel Gateways 55 | + June 11 - Rebecca will be talking Postman 56 | + Eric Phetteplace spoke about hosting a Python4Lib workshop at the upcoming Code4Lib conference 57 | + https://2024.code4lib.org/workshop/Python4Lib 58 | + He mentioned that he would welcome a volunteer to help with the session and mentioned that he can probably get the cost of the workshop refunded for the volunteer 59 | + It’ll be a loose conversation similar to a Python4Lib meeting and will cover more specific topics in the second half 60 | + He mentioned asyncio as a potential topic he would like to explore in the session 61 | + Eric spoke about getting access to some High Performance Computing and exploring parallel processing 62 | + He mentioned that this setup has a “head node” that coordinates with the other nodes 63 | + We shared some links with information on parallel work in Python 64 | + https://realpython.com/python-concurrency/ 65 | + https://docs.python.org/3/library/multiprocessing.html 66 | + https://realpython.com/async-io-python/ 67 | + https://realpython.com/python-gil/ 68 | + Then we spent a long time talking about the pros and cons of doing parallel work with Python 69 | + Clinton had some details and examples of reasons why Python’s language design makes it very slow for parallel work compared to many other languages like Rust and C 70 | + The GIL is becoming optional in Python 3.13: https://www.blog.pythonlibrary.org/2023/08/16/global-interpreter-lock-optional-in-python-3-13/ 71 | + We also talked about how, despite the fact that Python is slower than other languages, you can take existing Python code/projects and update them to use the current parallel options in Python and in many situations you can still get really good 
improvements in performance 72 | + Michelle shared an example of working with the Alma API using asyncio 73 | + Her work went from a runtime of 1 hour for 2000 API calls to 5 minutes for 2000 API calls 74 | + https://github.com/jhu-library-applications/alma-api/blob/main/updateItemFieldsFromCSVAsync.py 75 | + Her code updates Alma items from a CSV, doing batches of 1000 rows at a time from the spreadsheet (to help catch errors in more manageable sets) 76 | + Clinton also shared a Python profiler that helps show which parts of your code are running slow/fast and which parts are using C-based code (which runs faster) 77 | + https://github.com/plasma-umass/scalene 78 | + He also shared a presentation on Python performance 79 | + [Python Performance Matters by Emery Berger (Strange Loop 2022)](https://www.youtube.com/watch?v=vVUnCXKuNOg) 80 | + Jerrell asked if anyone had been working on AI-assisted image cropping 81 | + No one had worked on this yet but many people are interested in the topic 82 | + We briefly talked about the use of [Whisper (from OpenAI)](https://openai.com/research/whisper) to create transcripts of videos 83 | + We also spoke about [Otter AI](https://otter.ai/), another transcription platform that works with Zoom 84 | + Handprint also came up 85 | + https://2022.code4lib.org/talks/Handprint-A-program-to-explore-and-compare-major-cloudbased-services-for-handwritten-text-recognition 86 | 87 | ### April 2, 2024 88 | + Charlotte and Tomasz have released a new [version (1.0) of Bookops-Worldcat](https://github.com/BookOps-CAT/bookops-worldcat), a Python wrapper for the WorldCat Metadata API. 89 | + The new version supports changes made in [version 2.0 of the Metadata API](https://developer.api.oclc.org/wc-metadata-v2). 90 | + The documentation is available on GitHub pages: https://bookops-cat.github.io/bookops-worldcat/ 91 | + Lauren at Rice is working on a reclamation project and gave a shoutout to Rebecca for some Python notes she shared in the past. 
92 | + Here is Rebecca’s code: 93 | + Pulls specified data from holdings records in Alma, using the Bibs API 94 | + https://github.com/LibraryNinja/Holdings_Record_Inpsector 95 | + Rebecca talked about her recent work using Tkinter. She has been changing code written using PySimpleGUI to Tkinter after PySimpleGUI changed its licensing to require a fee for higher ed use. 96 | + https://docs.python.org/3/library/tkinter.html 97 | + https://realpython.com/python-gui-tkinter/ 98 | + https://github.com/TomSchimansky/CustomTkinter 99 | + Someone asked Rebecca for beginner Tkinter resources and she recommended two courses/videos 100 | + [Create Graphical User Interfaces With Python And TKinter](https://www.youtube.com/playlist?list=PLCC34OHNcOtoC6GglhF3ncJ5rLwQrLGnV) 101 | + [A Linkedin Learning Course](https://www.linkedin.com/learning/python-gui-development-with-tkinter-2?u=2147385) 102 | + Eric asked if one can create a single executable with a custom desktop icon for the resulting app with Tkinter 103 | + Rebecca said it is possible, but would require the use of a packaging utility 104 | + Rebecca: “PyInstaller is the thing that packages it all up using the command line, Auto-py-to-exe is a layer on top for it” 105 | + Emily had a question about using pymarc for some batch edits, but it did not work as she hoped(?) 106 | + “At my institution, we’ve got one person (me) identifying OCLC numbers for changes in one, now pymarc script, that a second person then feeds into the Metadata API 2.0 to make changes. 
Using the BookOps library would we be able to integrate the script searching for identifiers with the script that makes batch changes?” 107 | + Charles shared a new project he and Eddie are working on using Flask to connect to the Alma API 108 | + https://flask.palletsprojects.com/en/3.0.x/ 109 | + https://en.wikipedia.org/wiki/Flask_(web_framework) 110 | + The application lives on the Azure cloud and runs via Docker both for local tests and in the cloud 111 | + Javier asked about Charles' use of ChatGPT 4 and whether he could share reasons to justify the cost of ChatGPT 4 112 | + Javier also asked about the various “personas” that Charles used. 113 | + Charles then explained how to give “context” to each “persona,” for example by stating that the human user is already experienced in programming. 114 | + Charles also mentioned that he asks chatGPT questions that chatGPT may need answered before it can properly answer a particular prompt (or all prompts going forward for a single “persona”) 115 | + Charles also recommended other LLMs that worked well for him for code questions if you cannot pay for ChatGPT 4 (some of the ones below have paid versions too) 116 | + https://www.phind.com/search 117 | + https://www.anthropic.com/claude 118 | 119 | ### March 19th, 2024 120 | + Yamil and Charlotte gave a presentation on Python Virtual Environments & requirements.txt 121 | + https://docs.google.com/presentation/d/1XvnmQFdCkBWnD4javgJ0SPn-Uzp7F8if4dIh6qPxKos/edit?usp=sharing 122 | + Q&A/Discussion 123 | + Using pyproject.toml vs. 
requirements.txt 124 | + pyproject.toml files are more complex/powerful 125 | + this should be a presentation topic in the future 126 | + https://packaging.python.org/en/latest/guides/writing-pyproject-toml/ 127 | + Dependency management and how to properly deploy code to someone else’s machine 128 | + pipx: https://github.com/pypa/pipx 129 | + how to install packages globally while still keeping them separate from the global Python install 130 | 131 | ### March 5th, 2024 132 | + Rebecca mentioned that PySimpleGUI has moved to a paid license model and was wondering if it is common for a package to move to a closed license 133 | + Clinton mentioned he has seen it maybe 5 times 134 | + It makes projects very brittle because every person needs to get a key annually 135 | + We discussed alternatives to PySimpleGUI 136 | + Tkinter: https://docs.python.org/3/library/tkinter.html 137 | + PyQt: https://wiki.python.org/moin/PyQt 138 | + Clinton also mentioned using a Python backend with a simple HTML frontend in the past as a potential alternative to PySimpleGUI 139 | + If the project doesn't need the user interface to change, the project won't require any JavaScript 140 | + Buttons can send calls to Flask endpoints 141 | + Example: randomizing math exercises from a textbook 142 | + Basic inputs with some rendering in Flask 143 | + It has a low barrier to entry 144 | + The Python code runs locally and you open the localhost address in the browser 145 | + Will always use a browser as the front end 146 | + Brooks mentioned [FastUI](https://github.com/pydantic/FastUI) and [DearPyGUI](https://github.com/hoffstadt/DearPyGui) 147 | + https://talkpython.fm/episodes/show/348/dear-pygui-simple-yet-fast-python-gui-apps 148 | + Tomasz mentioned that Python isn’t really known for windows apps, especially because Tkinter is part of the standard library but looks very dated 149 | + The library isn’t copied into your virtual environment 150 | + https://beeware.org/project/projects/libraries/toga/ 
151 | + Rebecca mentioned TTKbootstrap: https://ttkbootstrap.readthedocs.io/en/latest/ 152 | + Rebecca asked how to ensure that one won’t be burned in the future 153 | + Clinton suggested focussing on tools with very wide adoption (like Flask or Django) 154 | + Tools that are widely used can’t make that sort of change without it being too disruptive 155 | + If anyone would like to evaluate any of these tools and present on their findings it would be a welcome presentation 156 | + Rebecca mentioned a self-checkout tool that she is developing and asked for feedback 157 | + She is working with a group within CUNY to develop this tool 158 | + It will run in a terminal where someone could enter their User ID and check out a book 159 | + Charlotte asked for feedback on [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat) 160 | + David mentioned that he and Lauren are working on an OCLC reclamation using [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat) 161 | + Clinton offered to present on creating simple APIs in the future 162 | + Eric said he was interested in learning more about FastAPI 163 | + Tomasz asked about Jupyter Kernel Gateway to implement a local API to query from within an OpenRefine project 164 | + https://github.com/MichaelMarkert/GND4C/blob/main/APIs_for_OpenRefine/localAPI.ipynb 165 | + Kate asked about adding 758 fields to ILS records 166 | + She is exploring adding them to their collection in a batch 167 | 168 | ### February 20th, 2024 169 | (Missing notes from Jeremy's presentation on pyscript) 170 | 171 | 172 | ### February 6, 2024 173 | + Upcoming scheduled presentations/chats: 174 | + Jeremy Nelson will talk about [pyscript](https://pyscript.net/) on Feb 20 175 | + Charlotte and Yamil will be talking virtual environments on Mar 19 176 | + Rebecca recently gave a chat about something she built with [PysimpleGui](https://www.pysimplegui.org/en/latest/) 177 | + there will be a video of this soon 178 | + Michael went 
over how he solved his PDF batch change issue by using [pikePDF](https://pikepdf.readthedocs.io/en/latest/) 179 | + He just wanted to batch change some simple low-level PDF file metadata like the “author” field for the whole PDF file, but pikePDF can do a lot more with PDFs 180 | + He mentioned how PDFs save file metadata in two ways, but pikePDF helps him access either 181 | + He also mentioned an older Perl-based tool called `exiftool` that is good for grabbing file metadata info 182 | + https://exiftool.org/ 183 | + He fired up the [Pycharm python IDE](https://www.jetbrains.com/pycharm/) and ran the debugger on some sample code to show us some issues that he initially had, but has since solved 184 | ``` 185 | from pikepdf import Pdf 186 | 187 | with Pdf.open('original.pdf') as pdf: 188 | with pdf.open_metadata() as meta: 189 | del meta['dc:description'] 190 | del meta['pdf:Keywords'] 191 | pdf.save('clean.pdf') 192 | 193 | ``` 194 | + Yamil mentioned the upcoming PyCon 2024 and its $100 online-only registration option. The videos will also be posted on their YouTube channel after a month or so. 195 | + https://us.pycon.org/2024/ 196 | + https://us.pycon.org/2024/attend/information/ 197 | + David asked about any new projects people have started with Python lately 198 | + He mentioned that he is teaching a colleague to update OCLC holdings with Python using the OCLC Metadata API 199 | + He also mentioned [bookops-worldcat](https://bookops-cat.github.io/bookops-worldcat/0.5/), Tomasz's library that acts as a “wrapper” for use with the OCLC Metadata API 200 | + “... Bookops-Worldcat is a Python wrapper around OCLC’s Worldcat Metadata API which supports changes released in the version 1.1 (May 2020) of the web service. The package features methods that utilize search functionality of the API as well as read-write endpoints. 
The Bookops-Worldcat package simplifies some of the OCLC API boilerplate, and ideally lowers the technological threshold for cataloging departments that may not have sufficient programming support to access and utilize those web services. Python language, with its gentle learning curve, has the potential to be a perfect vehicle towards this goal. ...” 201 | + David said he will share some sample code to show how he uses the OCLC Metadata API to update holdings with Python 202 | + Alison asked if anyone has successfully used Alma APIs and scripting to bulk change loan due dates for expired patrons 203 | + Alma doesn’t automatically do this when patron expiration dates change, which is a huge issue. 204 | + Rebecca: I haven’t changed loan dates but I have done other small things with the user/fulfillment API so far 205 | + Matt: I’ve used Python & the API once or twice to make bulk change due dates for specific users, but it’s been a while. Should be possible to do what you’re asking, though 206 | + David: I think our systems librarian does something like that at the end of the semester or FY. I can check with him and see if there’s anything he’d be willing to share. 207 | 208 | ### January 23, 2024 209 | + Mike was having issues making bulk edits to the built-in metadata (eg. author) in PDF files using the [pypdf module](https://pypi.org/project/pypdf/) 210 | + repo: https://github.com/py-pdf/pypdf 211 | + Daniel suggested he try a module like [PyExifTool](https://pypi.org/project/PyExifTool/) that taps into exif data 212 | + David mentioned that his library is migrating into Ex Libris Alma/Primo in the near future. 
213 | + He asked whether anyone uses or has experience with existing Alma API wrappers 214 | + No one had suggestions for an API wrapper for Alma but many suggested he ask on the various Code4lib Slack channels 215 | + There is a [possibly outdated project from UC Davis from 5 years ago](https://github.com/UCDavisLibrary/almapipy) 216 | + Clinton put in a plug for using Postman to quickly try out APIs 217 | + https://www.postman.com/ 218 | + Craig also suggested [Insomnia](https://insomnia.rest/) as an alternative for working with APIs manually 219 | + We may try to have a presentation in this group on the very basics of Postman in the future 220 | + David E. asked about how folks have been using chatGPT for coding Python 221 | + Many folks had success with writing code with chatGPT, but chatGPT does not know a lot about some technologies 222 | + It doesn't know some details of OpenSearch and has invented functions in PyMARC when asked 223 | + [HuggingChat](https://huggingface.co/chat/) was suggested as a better alternative to chatGPT, since it has a more recently updated model 224 | + ChatGPT’s 3.x model is from 2021 and HuggingChat's model is supposed to be newer 225 | + it has an option to “search the web” that, when enabled, will try to complement its answers with information queried from the web 226 | + Eric has used chatGPT for creating unit tests with more advanced features like “test parameterization” 227 | + Eric mentioned that he proposed a post-conference session at Code4lib 2024 for this group (python{4}lib) 228 | + He asked for topic suggestions and volunteers 229 | + The session will happen in the morning 230 | + David E. 
asked if folks are starting new projects that will necessitate using python to finish the projects 231 | + For those migrating to FOLIO ILS the [EBSCO python client](https://folio-migration-tools.readthedocs.io/en/latest/) was recommended 232 | + Daniel asked for suggestions for PAID software for digital humanities, since they have a budget for it 233 | + Here were the suggestions: 234 | + [Constellate from Jstor labs](https://labs.jstor.org/projects/text-mining) is a text analysis tool and they run workshops 235 | + [Gale Digital Scholar Lab](https://www.gale.com/primary-sources/digital-scholar-lab#how-the-lab-works) 236 | 237 | ### January 9, 2024 238 | John Dewees, DAM Lead at the University of Rochester, gave a presentation on the pax-opex-utility 239 | [pax-opex-utility](https://github.com/rochester-rcl/pax-opex-utility) is "a graphical utility to format PAX objects and OPEX metadata for ingest into Preservica as SIPs to be synced with ArchivesSpace" 240 | + He used a PySimpleGUI utility to create a Windows executable 241 | + https://www.pysimplegui.org/en/latest/ 242 | + the pax-opex-utility only works on Windows at this time 243 | + from David E.: 244 | + One thought on implementing on Mac vs. PC: I think there are different pathing formats/norms to follow. Depending on users they may need to make some adjustments if certain paths are hard coded. (I’ve made that an issue for myself by cleverly coding between a laptop and work PC.) 
245 | + someone asked about libraries that can be used to package up assets for Archivematica and libraries that can be used to work with metadata in ArchivesSpace 246 | + someone else shared [ArchivesSnake](https://github.com/archivesspace-labs/ArchivesSnake) 247 | + Tomasz asked how is this software “shipped” to users 248 | + John said the users download software from the software’s Github repo’s release section 249 | + Someone asked if the code had unit tests, and some were not familiar with unit tests 250 | + Yamil shared a presentation he gave to this same group last year called [“Intro to unit testing in Python”](https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing) 251 | + We talked about how to save credentials in your OS and not in the app 252 | + Tomasz mentioned a Python module that can help with this: 253 | + “The [Python keyring library](https://github.com/jaraco/keyring) provides an easy way to access the system keyring service from python. It can be used in any application that needs safe password storage. 
These recommended keyring backends are supported:” 254 | + macOS Keychain 255 | + Freedesktop Secret Service supports many DE including GNOME (requires secretstorage) 256 | + KDE4 & KDE5 KWallet (requires dbus) 257 | + Windows Credential Locker 258 | + We talked about how to handle using paths in your code to work in more than one OS 259 | + it was suggested to look into using the built in “pathlib” library to make it easier to create cross platform paths and thus use less manual string concatenation to create paths 260 | + https://realpython.com/python-pathlib/ 261 | + https://docs.python.org/3/library/pathlib.html 262 | 263 | Screenshots from John's presentation: 264 | ![pax-opex1](media/pax-opex1.png) 265 | ![pax-opex2](media/pax-opex2.png) 266 | ![pax-opex3](media/pax-opex3.png) 267 | 268 | ### December 13, 2023 269 | We briefly talked about [“for … else” construct](https://docs.python.org/3/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops) that was recently mentioned in the #python Slack channel 270 | + I have only used it once, but I was very confused the first time I saw it 271 | 272 | “This is a summary of what features appeared in which versions of Python.” 273 | + https://nedbatchelder.com/text/which-py.html 274 | + I found this page very helpful, it is created by the maintainer of the [coverage.py](https://coverage.readthedocs.io/) Python module 275 | 276 | We talked about using Google Colab as a way to try to run a python script with more resources than on your local machine. For example, you may be able to tap into GPUs with Google Colab. 277 | + “Colab is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs.” 278 | 279 | Someone asked about running Python or non-Python projects on Digital Ocean, some have used it and were happy with them. 
I use the Digital Ocean help docs for Unix/shell and even Python topics quite often 280 | + John: “I’m not sure about now, but a few years ago Digital Ocean did some good free webinars on Django and Flask. The instructors really knew a lot about deploying Python on DO.” 281 | + [Getting Started with Flask](https://www.digitalocean.com/community/tech-talks/getting-started-with-flask) 282 | + [Deploying your Python Applications](https://www.digitalocean.com/community/tech-talks/deploying-your-python-applications) 283 | 284 | Daniel talked briefly about a new project called [jupyter-ai](https://github.com/jupyterlab/jupyter-ai) (and gave a live demo) 285 | 286 | We spoke about doing quick Python tests or experiments with a local Jupyter notebook 287 | + Another option for doing quick local tests or running interactive commands for production use is IPython, the code base that was the foundation of Jupyter Notebooks 288 | + https://ipython.readthedocs.io/en/stable/index.html 289 | 290 | Book suggestion from John: 291 | + I’ve just started this book to try and build more programming practice into my workday: [Python Workout: 50 ten-minute exercises](https://www.manning.com/books/python-workout) 292 | + It’s included on O’Reilly if you have an institutional subscription. 293 | 294 | On the topic of new things we have tried lately 295 | + I finally started using the [coverage.py](https://coverage.readthedocs.io/) Python module 296 | + “Coverage.py is a tool for measuring code coverage of Python programs. 
It monitors your program, noting which parts of the code have been executed, then analyzes the source to identify code that could have been executed but was not.” 297 | 298 | We spoke about coming up with New Year's Python learning resolutions, or a 7 days of code challenge 299 | 300 | Also the group was asked if we should continue to have a mix of scheduled presentations and free chat time 301 | + the group would like to keep this mix 302 | 303 | We talked about John’s earlier idea (from Slack) about finding out if there are any Python-related presentations meant for Code4lib (accepted or not) that could be given during this Python group's meetings for those who cannot attend Code4lib 304 | 305 | Tomasz mentioned how he suddenly found out that `distutils` (https://docs.python.org/3.10/library/distutils.html) was removed from the new Python 3.12 release 306 | + “`distutils` is deprecated with removal planned for Python 3.12. See the What’s New entry for more information.” 307 | + we talked a bit about how Python does remove features, but it tries to give “deprecation warnings” a year or so before a feature/module is removed 308 | + “You get what you pay for” reminds me of this: https://xkcd.com/2347 309 | + Susan: That xkcd reminds me of the node.js/javascript library whose developer yanked it from all the public repos a few years back, and it broke basically everything. Was it underscore? 
310 | + [left-pad](https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code) 311 | 312 | 313 | The removal of that distutils module led to a discussion about [Python virtual environments](https://realpython.com/python-virtual-environments-a-primer/) (also known as a venv, which is the Python built-in module’s name) 314 | + by default the virtual environment works with whatever single Python version is installed on your OS 315 | + you still need to set up a separate Python version (and there are multiple ways to do that[1]) if you want a virtual environment that runs a different version of Python locally 316 | + these are 2 ways (of several) to have more than one version of Python: 317 | + https://github.com/pyenv/pyenv 318 | + Docker 319 | + this group may have a future presentation on Python virtual environments (Yamil and Charlotte agreed to present on the topic) 320 | 321 | ### November 28, 2023 322 | Michael Benowitz, a Tech Lead at the NYPL, gave a presentation on Airflow. 323 | “[Apache Airflow](https://airflow.apache.org/) is a platform created by the community to programmatically author, schedule and monitor workflows.” 324 | Link to slides will be forthcoming, I will include screenshots of a few of the slides in the meantime. 325 | + [Wikipedia article on Airflow](https://en.wikipedia.org/wiki/Apache_Airflow) 326 | + It is a free and open source product, but typically needs to run on a central VM/server for production use, instead of just running on your own workstation. There are “cloud” providers for handling the hosting for you. 
327 | + Airflow can be part of an [ETL workflow](https://en.wikipedia.org/wiki/Extract,_transform,_load) 328 | + Workflows can be easier to schedule in Airflow than with older tools like [cron](https://en.wikipedia.org/wiki/Cron), and it comes with a GUI 329 | 330 | Airflow cloud options: 331 | + https://aws.amazon.com/managed-workflows-for-apache-airflow/pricing/ 332 | + https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/data-factory/how-does-managed-airflow-work.md 333 | + Astronomer 334 | + DAGster 335 | 336 | Additional technologies used and/or mentioned: 337 | + https://en.wikipedia.org/wiki/Kubernetes 338 | + https://www.sqlalchemy.org/ 339 | + https://docs.pydantic.dev/latest/ 340 | + https://newrelic.com/ - has a way to give free “seats” to certain non-profit organizations 341 | + https://en.wikipedia.org/wiki/AWS_Lambda 342 | 343 | Screenshots of Mike's presentation: 344 | ![airflow1](media/airflow1.png) 345 | ![airflow2](media/airflow2.png) 346 | ![airflow3](media/airflow3.png) 347 | ![airflow4](media/airflow4.png) 348 | ![airflow5](media/airflow5.png) 349 | ![airflow7](media/airflow7.png) 350 | 351 | ### November 14, 2023 352 | + We talked about the MARC21 standard, how each record has a max size of 99,999 bytes/octets, and that individual fields can only have a maximum of 9,999 bytes/octets in size 353 | https://www.loc.gov/marc/specifications/specrecstruc.html 354 | + I then shared the Python pymarc snippet that inspired this size talk, which processed a large 80k-record MARCXML file export to find if any individual records were larger than 99,999 bytes/octets 355 | https://pymarc.readthedocs.io/en/latest/ 356 | + I was happy to find a convenient pymarc method that reads in MARCXML files and returns a Python list of individual pymarc records 357 | ```python 358 | records = pymarc.marcxml.parse_xml_to_array('myfile.xml') 359 | ``` 360 | + though this method loads all data in RAM and could seriously impact your computer performance if you don’t have a lot of 
RAM available 361 | there are other functions and approaches to only load a few XML records at a time 362 | the resulting code found 4 records in our data 363 | then there was a question about how hard it is to use pymarc to analyze subject data in a batch of records 364 | we then shared a few more examples of how simple it can be to use pymarc 365 | and how general knowledge of Python concepts like looping through lists and using conditional statements goes a long way to make it easy to use pymarc 366 | see the images of Eric’s pymarc example code that was shared 367 | 368 | ![marc1](media/marc1.png) 369 | 370 | ![marc2](media/marc2.png) 371 | 372 | + Rebecca had a question about properly creating a graph using Google Colab, Pandas, and plotly. 373 | + https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html 374 | + https://pandas.pydata.org/ 375 | + https://plotly.com/python/ 376 | 377 | + Rebecca was hosting her code on Google Colab, which is a way to run Jupyter notebooks on a hosted site that you can then share with others 378 | https://research.google.com/colaboratory/ 379 | https://jupyter.org/ 380 | + we briefly spoke about how we should avoid using regular expressions when processing XML data 381 | and should instead use Python modules that are specifically designed for processing XML 382 | here are some short posts with comments on why we should avoid using regex with XML 383 | + https://medium.com/thecyberfibre/stop-parsing-x-html-with-regular-expression-2cf13215b411 384 | + https://stackoverflow.com/questions/8577060/why-is-it-such-a-bad-idea-to-parse-xml-with-regex 385 | + Here are some examples of Python modules that are meant to handle XML 386 | + [ElementTree XML API](https://docs.python.org/3/library/xml.etree.elementtree.html) (this one is built in to Python) 387 | + [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) (this one is usually used for parsing HTML but can handle XML)
388 | + [lxml](https://lxml.de/) 389 | 390 | ### October 31st, 2023 391 | + Introductions, refreshing memories of returning attendees and new attendees; common threads from intros: 392 | + Alma 393 | + OCLC API (APIs in general) 394 | + ArchivesSpace 395 | + John Dewees Question on CSVs - Generally how big is too big for python to handle CSVs? Is there a moment where something is too big to be ingested and handled properly? 396 | + John Pillbeam mentioned SQLite might work well here, which is sort of a file on disk and is adaptable for quite a few operations. 397 | + Bruce Orcutt mentioned SQLite might be the best way to go as well, though think of the upfront maintenance. 398 | + Paul Clough mentioned you may need an Object Relational Mapping (ORM) in front of the SQLite. It helps translate between the application and its needs (abstracts it out.) 399 | + Emily Frazier mentioned using a python script which loads 8 million rows of a TSV into pandas. It worked but was a bit slow. 400 | + Rebecca Hyams mentioned an Alma project which helps draw out certain elements of MARC data. You can get really granular from API. ENUG Presentations including Rebecca’s presentation on item/inventory and PySimpleGUI 401 | + Comments about documenting projects. Susan mentioned good comments in code and a narrative of it in a separate Word doc. 402 | + Constellate was asked after by Bruce. 403 | + John Pillbeam linked to the courses/workshops at constellate.org/events. 404 | + John P. linked to another course by one of the Constellate devs. Currently going through this free online course/textbook that one of the Constellate trainers created: https://pandas.pythonhumanities.com/ 405 | 406 | ### October 17th, 2023 407 | + we talked about [FRBR](https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records) 408 | + Talked about record-rollups 409 | + Susan mentioned that she started working through Adam Emery’s “Learn Python” tutorials.
410 | + Eric has recently liked working with the [spaCy site to learn about Natural Language Processing (NLP)](https://course.spacy.io/en) 411 | + Yamil liked the tutorials that this site has, since you can run examples right on their site without having to install anything locally 412 | + Susan later asked if she should use a locally installed version of Python or use Jupyter notebooks for her first real project 413 | + John: the consensus was that it is better to have a locally installed version 414 | + though Jupyter notebooks or Google Colab can be great to practice or prototype things 415 | + “I just discovered this via that Glyph blog post - an updater for the python.org Mac installer: https://mopup.readthedocs.io/en/latest/” 416 | + David shared a free online Python tutorial: 417 | + https://learn-python.adamemery.dev/ 418 | + Other more advanced suggestions included 419 | + using [pyenv](https://github.com/pyenv/pyenv) to easily manage having more than one version of Python on your machine 420 | + a few people mentioned that they like using [Poetry](https://python-poetry.org/) for “packaging and dependency management” 421 | + John D. mentioned: “Just finished the official [PySimpleGUI](https://www.pysimplegui.org/en/latest/) Udemy course and created my first graphical utility which has been fun” 422 | + This group may have a future session to demonstrate PySimpleGUI 423 | + Tomasz asked if folks knew about Python tools for “transliteration” of non-Latin text 424 | + A graph-based transliteration tool: https://github.com/seanpue/graphtransliterator 425 | + We went back to talking about tools for local development 426 | + Here is an image that I found through my local (Boston) Python meet up, of all the tools that can be used for setting up your code... 427 | + for virtual environments, for creating packages, for multiple python versions, etc.
https://cdn.fosstodon.org/media_attachments/files/110/741/748/598/833/261/small/13d5e21803357140.png 428 | + It is a bit overwhelming 429 | + Here is a presentation where this image was taken from uploaded this summer covering a lot of the possible tools that can be used… https://youtu.be/MsJjzVIVs6M 430 | 431 | ### October 3rd, 2023 432 | + Guest Speakers: 433 | + Simply E (python project ereader app) (Mike with NYPL, Tomasz to contact) 434 | + Alma and Archivespace sync utility (Aspace and Alma APIs) Bruce Orcutt in Group (Dave to email/slack) 435 | + Pysimple GUI (John Dewees) (Dave to email/slack) 436 | + Citation generator at U of Miami (Eddy and Charles) Citation Style Language 437 | + https://pypi.org/project/citeproc-py/ 438 | + https://pypi.org/project/citeproc-py-styles/ 439 | + https://github.com/brechtm/citeproc-py 440 | + https://citationstyles.org/ 441 | + Though some libraries aren’t actively maintained. 442 | + Side note, Charles works on an open source LibGuides alternative. 443 | + Some general chat about the nature of open source projects - great grassroots! Though it can be fragile/risky. 444 | + Some code generated by chat GPT for the basic LMS on the list of exercises Charles provided 445 | + Think small and tailor the items to the library discipline. Build upon one thing to the next? 446 | + GUI? Connect to WorldCat? 447 | + Carpentries lessons, link to git space? https://carpentries.org/community-lessons/ 448 | + John Pillbeam mentioned the incubator for finding concepts that may not be included in main lesson plans yet. 
449 | 450 | ### September 19th, 2023 451 | + Ben asked what to tell others who say they want to use Python with AI, specifically with the chatGPT API 452 | + We spoke about how there is some ability to run some API calls for free for version 3.5, though there is a cost for running API calls for the 4.x version 453 | + The pricing for Hugging Face https://huggingface.co/pricing was mentioned as an alternative 454 | + From David: Hugging Face also has a variety of tags around different areas of AI. So there’s the Natural Language Processing stuff, but ChatGPT is the big player there. But things like object detection and audio tools are there. 455 | + Yamil suggested running the tutorials at https://scikit-learn.org/stable/ 456 | + Simple and efficient tools for predictive data analysis 457 | + Accessible to everybody, and reusable in various contexts 458 | + Built on NumPy, SciPy, and matplotlib 459 | + Open source, commercially usable - BSD license 460 | + Recent post from Simon Willison on Python and OpenAI tools: https://simonwillison.net/2023/Sep/12/llm-clip-and-chat/ 461 | + We talked about concerns about the AI hype and over-reliance on AI. 462 | + We very briefly spoke about NLP - Natural Language Processing, and how that is just a small part of the “engine” that is a platform like chatGPT 463 | + to try to learn NLP I ran some tutorials using the python module https://spacy.io/ 464 | + spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. 465 | + If you’re working with a lot of text, you’ll eventually want to know more about it. For example, what’s it about? What do the words mean in context?
” 466 | + We spoke about Charles’s new repository with exercises to learn python skills 467 | + https://github.com/UMiamiLibraries/python4lib-python-exercises/blob/main/README.md 468 | + Charles is looking for collaborators 469 | + Tomasz talked about issues with being an organizational customer of Naxos, which is a streaming audio/video content provider 470 | + For example, how to make sure the catalog is serving the correct sets of valid MARC files that also have valid 856 tags leading to the content 471 | + Here is a presentation on the pitfalls of keeping your holdings in sync with vendors 472 | + [Everything is Broken, but by How Much Exactly (video)?](https://phette.net/prez/everything-is-broken) [(slides)](https://phette23.github.io/everything-is-broken/#/) 473 | + Tomasz would like to see if he can use Python to automate the process of keeping the holdings in sync, meaning that MARC records for content that is no longer available via Naxos are deleted from the catalog in a timely manner 474 | + For example, doing some analysis with Pandas 475 | + Kate wrote: 476 | + Once we migrate to our new ILS (Symphony), we will eventually (hopefully!) start using their eResource Central system for all our eContent and be able to do away with MARC records for eContent. But for now we use a combination of extracting batches of records in order to use MarcEdit’s link checker or other link checkers, or just periodically wiping out all our MARC records for a particular vendor and loading a new batch from the vendor for all our holdings 477 | + We’re about to do that now with Axis 360 since they’ve switched to “Boundless”. We have over 30,000 MARC records for Axis 360, so just too much to handle 478 | + Mentioned the difficulty of fixing problems in the large sets of vendor MARC records that need to be added to our catalogs.
For example, like misspellings or bad records 479 | + We spoke about about the limitations of licensing content from Naxos (or similar vendors) versus actually storing that content locally 480 | + Briefly mentioned the ongoing “Internet Archive lawsuit” 481 | + Here is an article about the lawsuit it that is a few weeks old 482 | This is an [article from the New York Times](https://www.nytimes.com/2023/08/13/business/media/internet-archive-emergency-lending-library.html?unlocked_article_code=wcOmLYkdU__rOiiM6CNfze5OdE8Y4h41_rWZGFXrGdG-380Ng1Dkw0URPeZyTdFWmVYedUOlhz1hQFujukvNfw6un9L-aR5-AXLvbT4yWNv_tPLhfkj0Ou344H0i50355VZDbp5Uv9U6xLKJrJGh7WRZ-Vi6WbWosiHTpN7j-qR60P1SUSZn9nweYhFky5gIPNubaGpsUrRt3V1ZbzqG_aQMpfqbQSjFZamJkm84kzV_bqbbDB1q370gK6OkBDZbrBifM0fTKnqQaVItqvokBYaeEExJsRMugQQlJiKInxc7V44Cg5xK0piv3Q6ulQj1V1i2QYsbQGgSQwjv_bzTmknPkPRHMkfI9Uf2jdYqM5GHRn9zwqk9tvqXTw&smid=url-share) that is several weeks old about the lawsuit 483 | a key quote from the article that we talked about 484 | “Libraries came before publishers,” the 62-year-old librarian said in a recent interview in the former Christian Science church in western San Francisco that houses the archive. “We came before copyright. 
But publishers now think of libraries as customer service departments for their database products.” 485 | 486 | ### September 5th, 2023 487 | + Charles showed some code that batch creates APA & AMA citations 488 | + Carlos wanted feedback on how to add small improvements to their code that creates citations 489 | + for example, when there is no volume number for a citation, how to elegantly not add a volume number 490 | + someone suggested using Python 3.10's `match`/`case` functionality, which is formally called “Structural Pattern Matching” 491 | + this feature was added in Python 3.10 and is introduced in the PEP 636 tutorial https://peps.python.org/pep-0636/ 492 | + we briefly talked about how PEP stands for “Python Enhancement Proposal” 493 | + Here is a site with a brief explanation on how to use “Structural Pattern Matching” in Python 3.10 494 | https://realpython.com/python310-new-features/#structural-pattern-matching 495 | + Eduardo, who works with Charles, mentioned that they are trying to figure out how to encode that some parts of the citation have to be in italics when using Pandas to batch create citations 496 | + Tom has this suggestion for dealing with citation data 497 | + If you want to play with bibtex files to manage your citations instead of Excel, you could possibly use this https://github.com/caltechlibrary/pybtex-apa7-style 498 | + https://github.com/cproctor/pybtex-apa7-style/blob/master/formatting/apa.py 499 | + Yamil talked about using “unittest” for a pre-existing python code base, but mentioned that you can keep older tests in unittest style and just add new tests that use pytest 500 | + [Info on Python built-in unittest module](https://docs.python.org/3/library/unittest.html) 501 | + versus the non-built-in [pytest module](https://docs.pytest.org/en/7.4.x/) also for “unit tests” 502 | + we talked about “Library Carpentry” classes and how helpful they have been.
They can cover various topics, including Python 503 | + https://librarycarpentry.org/index.html 504 | + “Library Carpentry focuses on building software and data skills within library and information-related communities. Our goal is to empower people in these roles to use software and data in their own work and to become advocates for and train others in efficient, effective and reproducible data and software practices. Our workshops are based on our lessons.” 505 | + The [umbrella organization for Library Carpentry](https://carpentries.org/index.html) also includes Data Carpentry and Software Carpentry 506 | + Yamil was asked to briefly speak about a session at the Open Library Foundation’s (OLF) conference (WOLFCon) that covered the FOLIO ILS and the use of Python for post-migration clean up by folks at Wellesley 507 | + https://github.com/wellesleyfolio/WOLFcon_2023 508 | + here are more links for Python FOLIO tools/modules 509 | + https://github.com/FOLIO-FSE/folioclient 510 | + https://github.com/folio-org/folio-tools 511 | + this site was suggested for improving your Python skills; other programming languages are supported as well 512 | + https://exercism.org/ 513 | + we spoke about the Python community’s preferred writing style versus Ruby’s 514 | + We spoke about PEP 8, which is the main Python style guide 515 | + [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/) 516 | + Here is a Python module to check if your code follows PEP 8 without making changes 517 | https://pycodestyle.pycqa.org/en/latest/ 518 | + spoke about [Black](https://black.readthedocs.io/en/stable/), which can be used to change your code to match PEP 8 519 | + “Black: The uncompromising code formatter” 520 | + We spoke about how the PyCharm Python editor is great about reminding you to follow PEP 8 when you write your code, and also gives you the option to automatically reformat individual code snippets to follow PEP 8, instead of just reformatting all of your code 521 | + Yamil also
mentioned how he has opened up existing Python codebases in PyCharm, and the PyCharm indexer has found many hidden bugs in code that had never run or code that had logic flaws 522 | 523 | ### August 22, 2023 524 | ... missing ... :sob: 525 | 526 | ### August 8, 2023 527 | Our meeting focused on [Pydantic](https://docs.pydantic.dev/latest/). Matt Lincoln from JSTOR Labs gave a brief introduction to the tool and its uses. 528 | 529 | Matt used [this jupyter notebook](demo/pydantic.ipynb) to demo basic Pydantic syntax and validation functionality. 530 | 531 | + Data validation can be done using Python type hints 532 | + Fast and extensible, Pydantic plays nicely with your linters/IDE/brain. Define how data should be in pure, canonical Python 3.7+; validate it with Pydantic. 533 | + We briefly talked about wanting to review how to create classes and objects in Python in a future meeting. 534 | + Pydantic can help with IDE / editor auto complete / auto suggest 535 | + Pydantic has an x.json() function/method to serialize data to JSON (in Pydantic version 2 this is x.model_dump_json()) 536 | + great for writing APIs 537 | + Pydantic has an x.schema() method (which uses JSON schemas; in version 2 this is x.model_json_schema()) 538 | + the schema can then be used to create documentation for using the API 539 | + The [FastAPI](https://fastapi.tiangolo.com/) platform for Python based APIs uses Pydantic a lot 540 | + FYI: Pydantic version 2 is just coming out and some products/python modules that use Pydantic may not be ready for version 2 yet, but should still support version 1 541 | we also briefly talked about Python’s built in “data classes” 542 | + “In Python, a data class is a class that is designed to only hold data values. They aren’t different from regular classes, but they usually don’t have any other methods.
They are typically used to store information that will be passed between different parts of a program or a system.” 543 | + https://docs.python.org/3/library/dataclasses.html 544 | + https://realpython.com/python-data-classes/ 545 | + https://www.dataquest.io/blog/how-to-use-python-data-classes/ 546 | + we talked about how Pydantic is not a replacement for “JSON Schemas” but a complementary tool 547 | + https://json-schema.org/ 548 | + https://www.tutorialspoint.com/json/json_schema.htm 549 | + talked about Pydantic validators and their application 550 | + https://docs.pydantic.dev/2.1/usage/validators/ 551 | + start with a less strict validating class with loose rules 552 | + then do some clean up/transformation 553 | + then switch to a more strict Pydantic validating class 554 | + we briefly talked about typing in Python in general, and how helpful it can be 555 | + https://docs.python.org/3/library/typing.html 556 | + https://realpython.com/lessons/type-hinting/ 557 | + https://towardsdatascience.com/12-beginner-concepts-about-type-hints-to-improve-your-python-code-90f1ba0ac49 558 | + “Type hints are performed using Python annotations (introduced since PEP 3107). They are used to add types to variables, parameters, function arguments as well as their return values, class attributes, and methods. Adding type hints has no runtime effect: these are only hints and are not enforced on their own.” 559 | + For example, in statically typed languages like C or C++, if you initially declare a variable as one type (e.g. string), you can’t just later on use it as another type (e.g. int) like we can do in Python 560 | + questions for Matt: 561 | + is there any integration between pydantic and popular [ORMs](https://www.fullstackpython.com/object-relational-mappers-orms.html) (like [sqlalchemy](https://www.sqlalchemy.org/) for example)?
Answer: yes, pydantic data classes should work well with most ORMs 562 | + can pydantic validation features be useful in format crosswalks when we do not care about JSON output? Answer: yes, although in some cases more strict and detailed validation may be required. Still, out-of-the-box validation in pydantic would be very useful in Matt's opinion 563 | 564 | ### July 25, 2023 565 | + Rebecca: 566 | + Inventory tool to actively scan vs. lists, processes, & jobs https://github.com/LibraryNinja/alma_inventory_utility/tree/main 567 | + Utilizes: pysimplegui, auto-py-to-exe 568 | + Old method: Make a barcode set, run job on Alma to update 569 | + Problem of not really knowing if something wasn’t found or had a status (loan, out of place, etc.) 570 | + This is loosely based on Jeremy Hobbs's Lazy Lists utility, adapted to an inventory project. (https://github.com/MrJeremyHobbs/LazyLists) 571 | + Examines items in XML 572 | + Pulls in some basic information to confirm for users. 573 | + Indicates set aside for problematic titles (tech services would handle) 574 | + Used auto-py-to-exe to allow student workers to run this small utility on their machines. 575 | + Julie: 576 | + Sierra had a shelflist/inventory but it did not really work well, so a python inventory tool is great! 577 | + Had used SQL lists to help scan/match with selenium 578 | + Tools for link checking? 579 | + Authentication with EZproxy 580 | + https://pypi.org/project/LinkChecker/ 581 | + Charles: 582 | + Plotly module for data vis 583 | + Neat 54 lines of code to create an interactive map of internet usage over time worldwide 584 | + Charles does a 1-hour challenge to help learn new modules. 585 | + ChatGPT for helping, there are some prompt setups you can do to reduce repetitive typing 586 | + https://code.visualstudio.com now has a Postman extension.
587 | + https://www.pythonanywhere.com/ helps host and run python in the cloud (from the Anaconda people) 588 | + https://www.git-tower.com/education/mac GUI for Git 589 | 590 | ### July 11, 2023 591 | Rough and incomplete summary of topics covered in today’s (2023-07-11) Python{4}Lib group meeting 592 | + we talked about TAP - Text Analysis Pedagogy classes 593 | + https://www.ithaka.org/constellate/text-analysis-pedagogy-institute/ 594 | + Eric mentioned the Python Wagtail CMS built on top of the Python Django software dev sponsored by Google 595 | + https://wagtail.org/ 596 | + https://www.djangoproject.com/ 597 | + Eric’s library moved off of Drupal by switching to Wagtail 598 | + We briefly talked about using the https://gunicorn.org/ Python WSGI HTTP server to serve Python software like Django, Flask 599 | + Eric also mentioned a Python based institutional repository, and how it compared to the PHP based Islandora digital repository 600 | + [InvenioRDM](https://inveniordm.docs.cern.ch) 601 | + we talked about using http://docopt.org/ instead of using the [Python built-in argparse module](https://docs.python.org/3/library/argparse.html) for parsing command line (CLI) parameters 602 | + We then talked about parsing EZproxy “audit” files with Python 603 | + then Eric shared a script that he created to parse a data file for the Koha ILS using docopt to parse the CLI parameters that are listed in the comments at the top of the file 604 | + https://github.com/cca/koha_patron_import/blob/main/create_koha_csv.py 605 | + We talked about how to improve your coding style before posting your Python code on GitHub or on the internet.
606 | + Yamil recommended this book which helped him write in more standard/professional Python style: [“Beyond the Basic Stuff with Python / Al Sweigart”](https://inventwithpython.com/beyond/) 607 | + this section talks about how to better understand Python error messages like “stack traces” 608 | + [Dealing With Errors And Asking For Help](https://inventwithpython.com/beyond/chapter1.html) 609 | + We then talked about when to use the `try: except:` 610 | Python syntax to catch exceptions, since folks often did not see `try` being used a lot in other people’s code 611 | + some of us mentioned that we don’t use them all of the time but in some situations we always make sure to use them. For example, it is common to use `try` when you are using a method that commonly raises exceptions. 612 | + Like in the Python Selenium module for writing “functional tests” for web pages. There are several Selenium methods that start with find_***() and can easily trigger an exception if what you are looking for in a webpage is not found.
In this context I always use a `try` statement around calls like find_element_by_css() 613 | + there is of course a lot more that can be said of when to use `try` in your Python code 614 | + this [chapter from the “Beyond the Basic Stuff with Python” book](https://inventwithpython.com/beyond/chapter6.html), among many tips, includes how to use the built-in dictionary get() method so that you do not accidentally trigger a KeyError exception when you try to access a Python dictionary key that does not actually exist 615 | + Writing Pythonic Code - Pythonic Ways to Use Dictionaries 616 | + using the get() dictionary method to avoid KeyError exceptions 617 | 618 | ```python 619 | my_dict = {'username': 'joe'} 620 | my_dict['password'] # raises KeyError exception 621 | my_dict.get('password', False) # simply returns False, or whatever is placed in the 2nd parameter of get() 622 | ``` 623 | 624 | ### June 27, 2023 625 | + We talked about how the US PyCon (Python Conference) recently released their videos from their 2023 conference 626 | + [2023 sessions youtube channel](https://www.youtube.com/watch?v=eZwHvBsoPn4&list=PL2Uw4_HvXqvY2zhJ9AMUa_Z6dtMGF3gtb) 627 | + [PyCon YouTube home with past conference videos](https://www.youtube.com/@PyConUS) 628 | + Charles shared this article about “typo squatting” popular Python module names to trick users into installing malware: https://arstechnica.com/information-technology/2023/02/451-malicious-packages-available-in-pypi-contained-crypto-stealing-malware/ 629 | + Charles also talked about a project at the University of Miami that collects data from Twitter for research purposes. He mentioned that there is now an API limit to only be able to check 7 days in the past. He will post the name of the Python module they are using to 630 | + Eric mentioned issues that he has had archiving older tweets in the past. Also mentioned challenges evaluating misspelled words and how to interpret emojis.
631 | + Talked about unit testing and test coverage with the Python project called [coverage](https://coverage.readthedocs.io/en/7.2.7/) 632 | + Tomasz mentioned [coveralls](https://coveralls.io/) that gives you nice visual reports on your test coverage that can be integrated with GitHub and be part of CI 633 | + we of course talked about using [Pytest](https://docs.pytest.org/en/7.3.x/) for your unit tests 634 | + For those that are unfamiliar with “unit tests” and “pytest”, here is a presentation Yamil gave this group a few months ago [“Intro to unit testing in Python”]( 635 | https://docs.google.com/presentation/d/1t1dl7SANyhp4uClRP2JsijWj05nr5AkbUJIAB66GKFQ/edit?usp=sharing) 636 | + We talked a bit about parallel processing in Python to finish work faster 637 | + We shared a link to the [free version of chapter 17 of the 2nd edition of “Automate the Boring Stuff with Python”](https://automatetheboringstuff.com/2e/chapter17/) 638 | + Yamil brought up a suggested approach by the author (Al Sweigart) of “Automate the Boring Stuff with Python” to download files from the internet using the python requests module, but in a way that you will not be limited by the amount of free RAM on your computer ([section: Saving Downloaded Files to the Hard Drive](https://automatetheboringstuff.com/2e/chapter12/)) 639 | + Here is the snippet that uses a loop with the iter_content() method, to prevent using up all your RAM if the file is larger than the amount of free RAM on your system (stream=True is added here so that requests does not read the whole body into memory up front)... 640 | 641 | ```python 642 | import requests 643 | 644 | res = requests.get('https://automatetheboringstuff.com/files/rj.txt', stream=True) 645 | res.raise_for_status() 646 | 647 | with open('RomeoAndJuliet.txt', 'wb') as playFile: 648 |     # write the download to disk in chunks of up to 100,000 bytes 649 |     for chunk in res.iter_content(100000): 650 |         playFile.write(chunk) 651 | ``` 652 | + Also we mentioned [NetworkX](https://networkx.org/): NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
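
The parallel processing idea mentioned in this meeting can be sketched with the standard library's `concurrent.futures` module. This is only a minimal illustration and not code that was shown in the meeting: the names `run_in_parallel` and `describe` are made up for this example, and `describe` is a stand-in for real per-file work such as downloading or converting one file.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(worker, items, max_workers=4):
    """Apply worker to every item using a small pool of threads.

    Results are returned in the same order as items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))


def describe(filename):
    # Hypothetical placeholder for real work on one file.
    return f"processed {filename}"


if __name__ == "__main__":
    print(run_in_parallel(describe, ["a.jpg", "b.jpg", "c.jpg"]))
```

Threads mostly help when the work is waiting on the network or the disk. For CPU-heavy jobs, `ProcessPoolExecutor` from the same module offers the same `map()` interface and is usually the better fit.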
653 | + Tomasz asked if anyone was doing any batch work on images with Python, to find a faster way to process a larger number of images. We talked about perhaps using multiprocessing for this. 654 | + Again from the book [“Automate the Boring Stuff with Python”, Ch 19](https://automatetheboringstuff.com/2e/chapter19/), which talks about using the [Pillow](https://python-pillow.org/) Python module to batch change images 655 | + Also there should be ways to use the very well known non-python library called ImageMagick, but controlled through Python, for batch making changes to images. Yamil has worked with many projects, like the Drupal/PHP based Islandora project, that use ImageMagick for making changes to images 656 | 657 | ### June 13, 2023 658 | + Python podcasts suggestion from Tomasz: [PythonBytes](https://pythonbytes.fm/) 659 | + David talked about a new Python module called “Pandas AI” that he did find useful if you have a paid chatGPT account 660 | + https://github.com/gventuri/pandas-ai 661 | + “Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational” 662 | + David did also find a poorly written blog post that was claiming features that Pandas AI does NOT have, so stay away from this article… 663 | + https://levelup.gitconnected.com/introducing-pandasai-the-generative-ai-python-library-568a971af014 664 | + We talked about when we have used chatGPT to write some Python code snippets, and what our results were. 665 | + The results were mostly positive, but we talked about the benefits of already knowing Python well enough to formulate requests more precisely and evaluate how good the chatGPT responses were 666 | + Someone mentioned that chatGPT has become an alternative to StackOverflow, especially if you are in a hurry 667 | + Someone mentioned GitHub Copilot: “Those of us who have GitHub educator accounts have free access to Copilot. Have not tried it.
Very reluctant, personally.” Copilot uses AI to write code for you. 668 | + https://en.wikipedia.org/wiki/GitHub_Copilot 669 | + As a counter argument there is this article [“Why I don’t use Copilot”](https://inkdroid.org/2023/06/04/copilot/) 670 | + Will StackOverflow become obsolete with the revolution in AI? Yamil thinks that it is a good inspiration for prompts, and still has great information 671 | + We saw an example of sharing a snippet of object oriented Python code to ask chatGPT to explain what is missing 672 | + One of the participants was glad to get the explanations from chatGPT of what was missing in their object oriented code 673 | + Here is the link to the chat https://chat.openai.com/share/fea426fb-cb02-4b38-9f42-128f59115fc4 674 | + A recent Code4Lib article that talked about using AI generated code was shared [“Utilizing R and Python for Institutional Repository Daily Jobs”](https://journal.code4lib.org/articles/17134) 675 | + We briefly talked about the ethics of using code written by AI that was trained on code that others published publicly on GitHub, but without their explicit consent 676 | + Podcast example created by AI: 677 | “I’ve been listening to this series in the Planet Money podcast where they try to make an entire podcast episode made by AI:” https://www.npr.org/series/1178395718/planet-money-makes-an-episode-using-ai 678 | + Charles asked if anyone was using Python to automate work with the Azure cloud computing platform 679 | + https://en.wikipedia.org/wiki/Microsoft_Azure 680 | + We briefly talked about [“Azure Functions”](https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview?pivots=programming-language-csharp) which seem similar to [AWS Lambda](https://en.wikipedia.org/wiki/AWS_Lambda) 681 | + We talked about a great site and free book that many people use to get started with Python [“Python for Everybody”](https://www.py4e.com/) 682 | + We also talked about the well known and still very popular Python
[Requests](https://requests.readthedocs.io/en/latest/) module, but also the newer and “async compatible” [HTTPX](https://www.python-httpx.org/) module, which was also mentioned on the Python Slack channel. 683 | 684 | ### May 30, 2023 685 | + David shared his code utilizing `pymarc` to harvest and clean OCLC records. An older example of code: https://github.com/derlandson/PyCat 686 | + Demo of Match MARC toolset as well. 687 | + Tomasz reported his first experiences using `pymarc` v.5 688 | + Discussed a potential `pymarc` feature: ordering subfields according to a particular field cataloging practice 689 | + challenge: no clear, outlined rules to base it on 690 | + Rebecca demoed a script created to have circ desk staff click a single button for simple questions (directions, tech, find a book, etc.). It creates an output file and emails the results as CSV once per month. 691 | Currently doesn’t need admin permissions but various features may impact this. 692 | + simplified `pyinstaller` app: [auto-py-to-exe](https://pypi.org/project/auto-py-to-exe/) 693 | was used to help redeploy to other PCs. 694 | 695 | ### May 16, 2023 696 | + We had a brief discussion about [pymarc](https://pymarc.readthedocs.io/en/latest/) and [MARC authority data](https://www.loc.gov/marc/authority/ecadhome.html) 697 | + sparked by Benjamin's issues with using pymarc for authority records 698 | + Tomasz ran some quick tests and they looked good: `pymarc` was able to read such data, but more tests are needed to see if manipulating and writing is done correctly. 
There were concerns about differences in the leader field between the bibliographic and authority data 699 | 700 | #### Ed Summers intro to new pymarc 701 | + David introduced Ed 702 | + Ed stated that pymarc is the work of many people; his involvement is more that of a maintainer 703 | 704 | ##### Breaking changes in pymarc v.5: 705 | + new class `pymarc.Subfield` 706 | + helper properties instead of methods 707 | + old: record.title(), new: record.title 708 | + old: record.publisher(), new: record.publisher 709 | + automatically sets the UTF-8 code in position 9 of the record leader 710 | + pymarc always converts data to unicode, but before it did not attempt to change the code in the leader to reflect that 711 | + most people don't want to write MARC-8, and want UTF-8 encoded data 712 | 713 | + Ed shows off doing live coding! Uses [Google Colab](https://colab.research.google.com/) and Jupyter notebooks (tip: you can pip install packages in Colab: `!pip install pymarc`, the exclamation mark tells the notebook cell to run the line as a command line script rather than as Python code) 714 | + Ed shows initiating a new record instance, and adding fields with the new model for subfields 715 | + `Subfield` is a Python [`namedtuple`](https://docs.python.org/3.10/library/collections.html?highlight=namedtuple#collections.namedtuple) 716 | 717 | *New:* 718 | ```python 719 | from pymarc import Record, Field, Subfield 720 | 721 | record = Record() 722 | record.add_field( 723 | Field( 724 | tag="245", 725 | indicators=["0", "0"], 726 | subfields=[ 727 | Subfield(code="a", value="Foo :"), 728 | Subfield(code="b", value="bar /"), 729 | Subfield(code="c", value="Spam.") 730 | ] 731 | )) 732 | ``` 733 | or simply: 734 | ```python 735 | field = Field( 736 | tag="245", 737 | indicators=["0", "0"], 738 | subfields=[ 739 | Subfield("a", "Foo :"), 740 | Subfield("b", "bar /"), 741 | Subfield("c", "Spam.") 742 | ]) 743 | ``` 744 | 745 | *old* 746 | ```python 747 | record.add_field( 748 | Field( 749 | tag="245", 750 | 
indicators=["0", "0"], 751 | subfields=["a", "Foo :", "b", "bar /", "c", "Spam."] 752 | )) 753 | ``` 754 | 755 | + New model has advantages over subfields as a list of strings: 756 | + matches how catalogers think about subfields - as code-value pairs (Tomasz) 757 | + helps guard against errors such as missing an element needed to properly create a subfield 758 | 759 | 760 | + discussed briefly the differences between pymarc and the similar Perl library [MARC::Record](https://metacpan.org/pod/MARC::Record) 761 | + Ed showed a tip on how to avoid malformed or otherwise invalid records when looping over a file: 762 | records that raise errors while looping are returned as None (malformed bibs, leader length problems, etc.) 763 | 764 | ```python 765 | from pymarc import MARCReader 766 | 767 | with open("foo.mrc", "rb") as marcfile: 768 | reader = MARCReader(marcfile) 769 | for record in reader: 770 | if record is None: 771 | print(reader.current_exception) 772 | else: 773 | pass  # do something with the record 774 | ``` 775 | 776 | + talked about potential new features in pymarc, for example handling of [linked 880 fields](https://www.loc.gov/marc/bibliographic/bd880.html) that include parallel data in non-Latin scripts 777 | 778 | ### May 2, 2023 779 | + We talked about Rebecca’s code for parsing MARCXML from Ex Libris Alma 780 | Then we talked about our various experiences (good and bad) with parsing XML with Python’s built-in ElementTree module versus lxml versus Beautiful Soup. We took a moment to talk about the typical issues that can come up with web scraping when a site’s HTML changes over time. 781 | 782 | + We then spoke about Eric Morgan’s recent question to the Code4lib mailing list about “literary warrant.” 783 | 784 | + John asked if anyone had experience with Python modules for creating barcodes. We also briefly spoke about creating QR codes with Python. 
785 | John is using this Python module: 786 | https://python-barcode.readthedocs.io/en/stable/ 787 | Jason is using: 788 | PyQRCode==1.2.1 789 | pyzbar==0.1.9 790 | Emma shared a good explainer on QR codes: https://ivantay2003.medium.com/qr-code-demystify-2a5263ab136e 791 | 792 | 793 | + Meghan asked what tools to use when handed Excel or CSV files from which users would like charts created in a way that is shareable. This is in addition to creating charts inside Excel and Jupyter or Colab notebooks, then sharing them with a group of people. 794 | Here are some of the suggestions discussed... 795 | Plotly - https://plotly.com/ 796 | Streamlit - https://streamlit.io/ 797 | This book on using Python to process data, but then using JS based tools for web visualization, was mentioned again in this group... 798 | Data Visualization with Python and JavaScript, 2nd Edition 799 | https://www.oreilly.com/library/view/data-visualization-with/9781098111861/ 800 | The author’s website is also worth a look: https://www.kyrandale.com 801 | 802 | 803 | + We talked about creating RDF data with Python, including Python modules, visualization tools, and GML files 804 | https://github.com/RDFLib/rdflib 805 | https://rdflib.readthedocs.io/en/stable/ 806 | “RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.” 807 | https://gephi.org/ 808 | “Gephi is the leading visualization and exploration software for all kinds of graphs and networks. 
Gephi is open-source and free.” 809 | GML files 810 | https://en.wikipedia.org/wiki/Graph_Modelling_Language 811 | + We talked briefly about the new version 5.0 of pymarc and that we would like to go over the changes to pymarc in this group in the future 812 | https://gitlab.com/pymarc/pymarc/-/releases/v5.0.0 813 | 814 | ### April 18, 2023 815 | At today's meeting we had @michelle.janowiecki give a short presentation on Pandas, partially based on a longer Pandas presentation she has given before. 816 | [Speedy pandas : a super brief intro to Python's pandas library (see slides)](https://docs.google.com/presentation/d/1xRdNVonTxi9-gEsQkNvbF1e47o_2cuo1iimunoFUky4/edit#slide=id.p) 817 | Here are a couple of useful links from her presentation... 818 | 819 | #### Pandas Official resources 820 | + [documentation website](https://pandas.pydata.org/pandas-docs/stable/index.html) 821 | + [User Guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html) 822 | + [API reference](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) 823 | 824 | #### Pandas Additional resources 825 | + ["Pandas for Metadata Transformation and Cleanup" workshop by Michelle Janowiecki](https://mjanowiecki.github.io/intro-pandas-metadata/intro.html) 826 | + the best book: [Pandas for everyone : Python data analysis](https://www.worldcat.org/title/pandas-for-everyone-python-data-analysis/oclc/1240309883?referer=br&ht=edition) 827 | 828 | #### Examples of the code Michelle demonstrated 829 | 830 | ```python 831 | import pandas as pd 832 | 833 | filename = "sampleData.csv" 834 | df = pd.read_csv(filename) 835 | print(df.head()) 836 | 837 | print(df.columns) 838 | 839 | degree_department = df["degree_department"] 840 | department_unique = degree_department.unique() 841 | print(department_unique) 842 | unique_list = list(department_unique) 843 | print(unique_list) 844 | ``` 845 | 846 | ```python 847 | import pandas as pd 848 | 849 | filename = "sampleData.csv" 850 | df = 
pd.read_csv(filename) 851 | 852 | print(df.shape) 853 | df = df.dropna(axis=0, how="all") 854 | df = df.dropna(axis=1, how="all") 855 | df = df.drop_duplicates() 856 | df["title"] = df["title"].str.strip() 857 | 858 | print(df.head()) 859 | print(df.shape) 860 | 861 | df.to_csv("sampleData_cleaned.csv", index=False) 862 | ``` 863 | 864 | ```python 865 | import pandas as pd 866 | 867 | df_1 = pd.read_csv("frame_1.csv") 868 | df_2 = pd.read_csv("frame_2.csv") 869 | 870 | merged = pd.merge(df_1, df_2, how="left", on="subject_id") 871 | print(merged.head()) 872 | 873 | merged.to_csv("merged_frames.csv", index=False) 874 | ``` 875 | 876 | These are some of the Pandas features @michelle.janowiecki demonstrated today 877 | + drop_duplicates() 878 | + dropna() 879 | + merge() 880 | 881 | After the presentation we all exchanged pandas usage tips 882 | + like pd.json_normalize(a_dict) 883 | + “All Pandas json_normalize() you should know for flattening JSON” 884 | + https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 885 | + and the ability of doing mathematical 886 | + also there was a mention of the command line JQ tool for parsing JSON 887 | + https://stedolan.github.io/jq/ 888 | 889 | 890 | ### April 4, 2023 891 | #### The mini-workshop "An Introduction to Python for Absolute Beginners": 892 | A very basic intro to Python for librarians who have little to no experience with Python but who want to get started. 893 | + What is Python and why is it useful? 
(5 min) 894 | + Hands-on practice with basic operations in Python, using Google Colaboratory (25 min) 895 | + Print function 896 | + Data types 897 | + Arithmetic operations 898 | + String concatenation 899 | + Variable assignment 900 | + Q&A/Resources (15 min) 901 | 902 | #### Notes 903 | + We got a shortened version of a Rice University Library workshop called “Mini Python Intro” 904 | + We used Google Colab 905 | + a free Google service that essentially hosts Python Jupyter Notebooks that can be shared with others 906 | + https://colab.research.google.com/ 907 | + The training used the following resources 908 | + Pre-loaded notebook: 909 | + https://colab.research.google.com/drive/1m3cz4KeozooHFzjswyjgJmbfXZTfG0mP?usp=sharing 910 | + Exercises: 911 | + https://drive.google.com/file/d/1CRda_Gh3mrqpEmbnvF58-7jvYnmV-LhI/view?usp=share_link 912 | + We discussed the `sep` parameter of Python's `print()` function 913 | + David shared an article, and specifically a tip about using the “union” operator, in this case the bar “|” character, to join multiple dictionaries together (available for dictionaries as of Python 3.9). 
914 | + https://medium.com/techtofreedom/19-sweet-python-syntax-sugar-for-improving-your-coding-experience-37c4118fc6b1 915 | + Python f-strings were briefly mentioned 916 | + https://realpython.com/python-f-strings/ 917 | + f-strings are available as of Python 3.6 918 | + Talked about copying and pasting parts of your Python error messages right into Google to help you figure out what is wrong 919 | + and how https://stackoverflow.com/ is a common place to look for error advice 920 | + Google Colab actually offers to send you to StackOverflow when you get an error on code running in Colab 921 | + @Yamil Suárez shared a code snippet demonstrating how to read a file stored in Google Drive into Google Colab: 922 | ```python 923 | from google.colab import drive 924 | drive.mount('/content/drive') 925 | 926 | import pandas as pd 927 | 928 | df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/data.tsv', sep='\t') 929 | ``` 930 | 931 | 932 | 933 | ### March 21, 2023 934 | + Talked about this group’s [new repository](https://github.com/code4lib/python4lib-resources), and that we want to encourage others to contribute changes via PRs (or reach out to the group) 935 | + Talked about combining JS and Python for web visualization 936 | + [Data Visualization with Python and JavaScript, 2nd Edition by Kyran Dale](https://www.oreilly.com/library/view/data-visualization-with/9781098111861/) 937 | + Talked about whether we should currently be using Homebrew to install Python on macOS 938 | + https://docs.brew.sh/Homebrew-and-Python 939 | + consensus was that it should work fine 940 | + “if you use VSCode, it recommends homebrew on mac. 
I used home-brew to install 3.10 and I haven’t encountered any issues” 941 | + (https://code.visualstudio.com/docs/python/python-tutorial) 942 | + we talked about how Anaconda can be used for Python installations 943 | + Talked about [Library Carpentry lessons](https://librarycarpentry.org/lessons/) on Python and other skills like bash, OpenRefine 944 | + Spoke a bit about [Google Colab](https://colab.research.google.com/), which is essentially Jupyter Notebooks in the cloud, no need for local installation 945 | + Pivoted to talk about interesting things seen during Code4Lib 946 | + a Python GUI package named [Gooey](https://pypi.org/project/Gooey/) was mentioned 947 | + “There was a poster about updating subject headings as well. Which was something we had talked about briefly a week before C4L.” 948 | + Touched on a suggested breaking change to [pymarc](https://gitlab.com/pymarc/pymarc), [MR details](https://gitlab.com/pymarc/pymarc/-/merge_requests/194) 949 | + this change uses Python “namedtuples” 950 | + this change is welcomed by many 951 | + We then covered how to use pymarc with authority records, as opposed to bibliographic records - more research needs to be done 952 | + NOTE: this Python group plans to host a pymarc “code recipe” sharing session in the future 953 | + Talked about current issues in pymarc with MARC bib tag 880 954 | 955 | ### March 7, 2023 956 | + Introductions with a few new members 957 | + Moved the Python{4}Lib resource page to a Code{4}Lib repository, thanks @klinga 958 | + @Rebecca Hyams is working on an ELUNA Dev. Day presentation about gathering specific (granular) holdings data from Alma via API and parsing it with a Python script. 959 | Chat about maintaining authorities when you’ve decided to change from standard language. Is/should there be a tool to check for changes for authorities you select? 960 | + A project for a heat map visual for circulation might be a new way of helping with weeding and collection development. 
961 | + Perhaps there's interest to have a working group dive into different projects. Could be helpful for design ideas. 962 | + Dashboards and/or developing scripts that can translate one form of data to another; identifying transformation steps and when to streamline them in one script vs. multiple. 963 | + IPEDS data transformations. A lot of data isn’t as streamlined as we’d like every time IPEDS comes up. Still quite local though. (Changes year to year?) 964 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | certifi==2022.9.24 2 | charset-normalizer==2.1.1 3 | idna==3.4 4 | requests==2.28.1 5 | urllib3==1.26.12 6 | --------------------------------------------------------------------------------