├── BnF.md ├── LICENSE ├── README.md └── presentations ├── CENL_AI_Marcel_Gygli.pdf ├── CENL_AI_Yves_Maurer.pdf └── readme.md /BnF.md: -------------------------------------------------------------------------------- 1 | # BnF PoCs and applications of AI 2 | 3 | [AI at the BnF](https://www.bnf.fr/fr/feuille-de-route-ia): roadmap ([EN poster](https://www.bnf.fr/sites/default/files/2022-01/Poster_AI%20Roadmap_BnF_202112.pdf)) 4 | 5 | ## A. Image 6 | 7 | ### General presentation 8 | 9 | [IFLA Galway 2022 presentation](https://www.universityofgalway.ie/ifla/abstracts/): [slides](https://docs.google.com/presentation/d/1RbUvVw8mr3DVKfKncGBMeq9s0mbDT_LEmNfpgZ8Zuws/) 10 | 11 | ### Work 12 | 13 | [GallicaPix](https://gallicapix.bnf.fr/), hybrid retrieval of heritage images (in-house PoC, 2017). [GitHub](https://github.com/altomator/Image_Retrieval) 14 | 15 | [GallicaSnoop](https://snoop.inria.fr/bnf/login), visual similarity engine (research project, collaboration with INA and [INRIA](https://hal.science/hal-02096036), 2018-). 16 | See also: [poster](https://www.bnf.fr/sites/default/files/2022-05/Poster_Gallica_Snoop.pdf), [presentation](https://www.culture.gouv.fr/Media/Thematiques/Innovation-numerique/Folder/Atelier-INRIA-2019/GallicaSnoop) 17 | 18 | [JADIS](https://bnf-jadis.github.io), semantic segmentation of heritage maps (master's student project, collaboration with EPFL, 2019). [GitHub](https://github.com/BnF-jadis) 19 | 20 | CLIP model and heritage images: [Dataiku DSS](https://gallery.dataiku.com/projects/EX_CLIP/) experimentation on Japanese engravings (partnership with Dataiku, 2022); in-house [Flask web app](https://github.com/altomator/CLIP_test/) 21 | 22 | [IIIF and ML](https://github.com/altomator/IIIF): experimentation with Gallica content (in-house PoCs, 2020-) 23 | 24 | 25 | ## B. Text 26 | 27 | ### Work 28 | 29 | HTR 30 | 31 | OCR 32 | 33 | 34 | ## C. 
Data 35 | 36 | Personalised content recommendation (research project in collaboration with SCAI/Sorbonne Université, 2022-2023) 37 | 38 | [ALGOCOL](https://actions-recherche.bnf.fr/BnF/anirw3.nsf/IX01/A2022000001_dalgocol-fouille-de-donnees-et-algorithmes-de-prediction-de-l-etat-des-collections): Decision support for the conservation and management of collections (PhD thesis, 2018-2022). See also [presentation](https://bbf.enssib.fr/consulter/bbf-2022-00-0000-008) and [paper](https://www.sciencedirect.com/science/article/abs/pii/S0169023X2200026X#) 39 | 40 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Legal Code 2 | 3 | CC0 1.0 Universal 4 | 5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE 6 | LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN 7 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS 8 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES 9 | REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS 10 | PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM 11 | THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED 12 | HEREUNDER. 13 | 14 | Statement of Purpose 15 | 16 | The laws of most jurisdictions throughout the world automatically confer 17 | exclusive Copyright and Related Rights (defined below) upon the creator 18 | and subsequent owner(s) (each and all, an "owner") of an original work of 19 | authorship and/or a database (each, a "Work"). 
20 | 21 | Certain owners wish to permanently relinquish those rights to a Work for 22 | the purpose of contributing to a commons of creative, cultural and 23 | scientific works ("Commons") that the public can reliably and without fear 24 | of later claims of infringement build upon, modify, incorporate in other 25 | works, reuse and redistribute as freely as possible in any form whatsoever 26 | and for any purposes, including without limitation commercial purposes. 27 | These owners may contribute to the Commons to promote the ideal of a free 28 | culture and the further production of creative, cultural and scientific 29 | works, or to gain reputation or greater distribution for their Work in 30 | part through the use and efforts of others. 31 | 32 | For these and/or other purposes and motivations, and without any 33 | expectation of additional consideration or compensation, the person 34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she 35 | is an owner of Copyright and Related Rights in the Work, voluntarily 36 | elects to apply CC0 to the Work and publicly distribute the Work under its 37 | terms, with knowledge of his or her Copyright and Related Rights in the 38 | Work and the meaning and intended legal effect of CC0 on those rights. 39 | 40 | 1. Copyright and Related Rights. A Work made available under CC0 may be 41 | protected by copyright and related or neighboring rights ("Copyright and 42 | Related Rights"). Copyright and Related Rights include, but are not 43 | limited to, the following: 44 | 45 | i. the right to reproduce, adapt, distribute, perform, display, 46 | communicate, and translate a Work; 47 | ii. moral rights retained by the original author(s) and/or performer(s); 48 | iii. publicity and privacy rights pertaining to a person's image or 49 | likeness depicted in a Work; 50 | iv. rights protecting against unfair competition in regards to a Work, 51 | subject to the limitations in paragraph 4(a), below; 52 | v. 
rights protecting the extraction, dissemination, use and reuse of data 53 | in a Work; 54 | vi. database rights (such as those arising under Directive 96/9/EC of the 55 | European Parliament and of the Council of 11 March 1996 on the legal 56 | protection of databases, and under any national implementation 57 | thereof, including any amended or successor version of such 58 | directive); and 59 | vii. other similar, equivalent or corresponding rights throughout the 60 | world based on applicable law or treaty, and any national 61 | implementations thereof. 62 | 63 | 2. Waiver. To the greatest extent permitted by, but not in contravention 64 | of, applicable law, Affirmer hereby overtly, fully, permanently, 65 | irrevocably and unconditionally waives, abandons, and surrenders all of 66 | Affirmer's Copyright and Related Rights and associated claims and causes 67 | of action, whether now known or unknown (including existing as well as 68 | future claims and causes of action), in the Work (i) in all territories 69 | worldwide, (ii) for the maximum duration provided by applicable law or 70 | treaty (including future time extensions), (iii) in any current or future 71 | medium and for any number of copies, and (iv) for any purpose whatsoever, 72 | including without limitation commercial, advertising or promotional 73 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each 74 | member of the public at large and to the detriment of Affirmer's heirs and 75 | successors, fully intending that such Waiver shall not be subject to 76 | revocation, rescission, cancellation, termination, or any other legal or 77 | equitable action to disrupt the quiet enjoyment of the Work by the public 78 | as contemplated by Affirmer's express Statement of Purpose. 79 | 80 | 3. Public License Fallback. 
Should any part of the Waiver for any reason 81 | be judged legally invalid or ineffective under applicable law, then the 82 | Waiver shall be preserved to the maximum extent permitted taking into 83 | account Affirmer's express Statement of Purpose. In addition, to the 84 | extent the Waiver is so judged Affirmer hereby grants to each affected 85 | person a royalty-free, non transferable, non sublicensable, non exclusive, 86 | irrevocable and unconditional license to exercise Affirmer's Copyright and 87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the 88 | maximum duration provided by applicable law or treaty (including future 89 | time extensions), (iii) in any current or future medium and for any number 90 | of copies, and (iv) for any purpose whatsoever, including without 91 | limitation commercial, advertising or promotional purposes (the 92 | "License"). The License shall be deemed effective as of the date CC0 was 93 | applied by Affirmer to the Work. Should any part of the License for any 94 | reason be judged legally invalid or ineffective under applicable law, such 95 | partial invalidity or ineffectiveness shall not invalidate the remainder 96 | of the License, and in such case Affirmer hereby affirms that he or she 97 | will not (i) exercise any of his or her remaining Copyright and Related 98 | Rights in the Work or (ii) assert any associated claims and causes of 99 | action with respect to the Work, in either case contrary to Affirmer's 100 | express Statement of Purpose. 101 | 102 | 4. Limitations and Disclaimers. 103 | 104 | a. No trademark or patent rights held by Affirmer are waived, abandoned, 105 | surrendered, licensed or otherwise affected by this document. 106 | b. 
Affirmer offers the Work as-is and makes no representations or 107 | warranties of any kind concerning the Work, express, implied, 108 | statutory or otherwise, including without limitation warranties of 109 | title, merchantability, fitness for a particular purpose, non 110 | infringement, or the absence of latent or other defects, accuracy, or 111 | the present or absence of errors, whether or not discoverable, all to 112 | the greatest extent permissible under applicable law. 113 | c. Affirmer disclaims responsibility for clearing rights of other persons 114 | that may apply to the Work or any use thereof, including without 115 | limitation any person's Copyright and Related Rights in the Work. 116 | Further, Affirmer disclaims responsibility for obtaining any necessary 117 | consents, permissions or other rights required for any use of the 118 | Work. 119 | d. Affirmer understands and acknowledges that Creative Commons is not a 120 | party to this document and has no duty or obligation with respect to 121 | this CC0 or use of the Work. 122 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Awesome AI in Libraries [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) 3 | 4 | Libraries hold an increasing amount of digital data that can be extracted, analysed and processed using different AI methods for different use cases. Some libraries manage this internally, others set up projects and others yet have a dedicated labs environment. 5 | 6 | Please feel free to populate this list with your project or initiative ([how-to](https://github.com/sindresorhus/awesome/blob/main/contributing.md)) or drop an email to cenl.ng.ai@gmail.com to inform us about it. 
7 | 8 | ## Content 9 | * [Training](#training-for-glam-practitioners) 10 | * [AI Initiatives](#ai-initiatives-in-europe) 11 | * [Community Resources](#community-resources) 12 | * [Other Awesome Lists](#other-awesome-lists) 13 | 14 | 15 | ## Training for GLAM practitioners 16 | - [Programming Historian](https://programminghistorian.org/) 17 | - [Library Carpentry](https://librarycarpentry.org/) 18 | 19 | CENL "AI in Libraries" webinar series: see the 2023 presentations on the [CENL website](https://www.cenl.org/network-group-ai-in-libraries-webinars-2023/). 20 | 21 | ## AI Initiatives in Europe 22 | 23 | ### Research projects 24 | - [NewsEye](https://www.newseye.eu/), funded by the European Union’s Horizon 2020 research and innovation programme, is a research project advancing the state of the art and introducing new concepts, methods and tools for digital humanities by providing enhanced access to historical newspapers. NewsEye uses AI approaches for document analysis, OCR and text analysis. 25 | - [Impresso](https://impresso-project.ch/) (EPFL-DHLAB, UZH-CL, C2DH). The objective of the project "Media monitoring of the past. Mining 200 years of historical newspapers" is to enable critical text mining of newspaper archives by implementing computational linguistics tools to extract, process, link, and explore data from print media archives. 
26 | - [Living with Machines](https://livingwithmachines.ac.uk/) (Turing Institute, British Library) is a research project that rethinks the impact of technology on the lives of ordinary people during the Industrial Revolution. 27 | 28 | ### Internal projects 29 | - **National Library of Finland**: 30 | - [Automatic description of digital data](https://www.cenl.org/national-library-finland-high-performance-digitisation-giving-a-boost-to-the-description-of-digital-data/) using an intelligent annotation pipeline for semi-automated annotation (adding metadata) and enrichment of archived material, such as newspapers, books and official documents. 31 | - [Annif](https://annif.org/): tool for automated subject indexing and classification 32 | - **Bibliothèque nationale de France**: 33 | - [GallicaPix](https://gallicapix.bnf.fr/), hybrid retrieval of heritage images using trained classification models and commercial AI APIs. 34 | - [GallicaSnoop](https://snoop.inria.fr/bnf/login), visual similarity engine 35 | - *See more projects [here](https://github.com/CENL-Network-Group-AI/awesome-list/blob/main/BnF.md)* 36 | - **National Library of Norway**: 37 | - Machine Learning and the Dewey Decimal ([conference presentation](https://nkos-eu.github.io/2019/content/NKOS2019-presentation-wetjen.pdf), NKOS 2019; [conference paper](http://library.ifla.org/id/eprint/2216), IFLA WLIC 2018) 38 | - [NoTraM - Norwegian Transformer Model](https://github.com/NBAiLab/notram), a transformer-based model for the Norwegian language 39 | - **Helsinki Central Library Oodi**: [Headai](https://medium.com/headai-customer-stories/customer-story-oodi-1d1ef2554bb6), a virtual information assistant 40 | - **Royal Library of Belgium**: [Cataloguing Books](https://www.realdolmen.com/en/case-study/artificial-intelligence-helps-royal-library-of-belgium), a tool developed in Microsoft Power Apps that detects metadata from the title page (title, author, publisher, and so on). 
Future developments: detection of the page type (title page, colophon, or back cover) and, based on the result, further actions: for title pages, extracting the title, author and publisher; for colophons, extracting the ISBN, legal deposit number, names and publisher; for back covers, subject indexing. 41 | - **Swiss National Library**: [Automatic Classification of e-Dissertations](https://github.com/CENL-Network-Group-AI/awesome-list/blob/main/presentations/CENL_AI_Marcel_Gygli.pdf) The Swiss National Library receives from university libraries one copy of each dissertation produced in Switzerland, a large proportion of which are now digital. Each dissertation is to be classified into one of approximately 100 subject groups. The aim of this project is to test open source algorithms that automatically classify the theses. 42 | - **German National Library (DNB)**: [Automatic subject indexing with Annif](https://swib.org/swib21/slides/03-02-uhlmann.pdf) 43 | - **National Library of Luxembourg**: 44 | - [Fine grained language identification in multilingual corpus with OCR errors](https://github.com/CENL-Network-Group-AI/awesome-list/blob/main/presentations/CENL_AI_Yves_Maurer.pdf) The project describes how the BnL identified 18 different languages in a collection of 8 million articles, ranging from majority languages like French and German to minority cases like Latin or Esperanto, using a combination of existing models and dictionaries. 45 | - [OCR post-correction](https://github.com/natliblux/nautilusocr) This project aims to enhance the OCR quality of original METS/ALTO packages. 
46 | 47 | ### Tools and services 48 | - [Transkribus](https://readcoop.eu/transkribus/): a platform for the transcription, recognition and searching of historical documents 49 | - [Visual Geometry Group](https://www.robots.ox.ac.uk/~vgg/), University of Oxford: computer vision tools for Digital Humanities 50 | - [DANE (Distributed Annotation ‘n’ Enrichment)](https://dane.readthedocs.io/en/latest/index.html), Netherlands Institute for Sound and Vision 51 | 52 | ## Community Resources 53 | 54 | ### Other Awesome Lists 55 | - [Ai4lam Look Book](https://docs.google.com/presentation/d/1iWG9RpPaMlikUAe8mfVlYQeoCiNH8ct2ILFtbMI7P_o/edit#slide=id.p): Knowledge Base of AI 56 | Projects in Libraries, Archives and Museums from the [ai4lam.org](https://sites.google.com/view/ai4lam) international initiative. 57 | - [The Museums + AI Network](https://themuseumsai.network/about): list of Artificial Intelligence (AI) [initiatives in museums](https://docs.google.com/spreadsheets/d/1A7IVnucQZ0ICxYSOCjqQ1oV3xGgNzDKtIYGrk6smV7w/edit#gid=0) 58 | - [AI:CULT - Culturally aware AI](https://www.cultural-ai.nl/projects/aicult-culturally-aware-ai) 59 | 60 | 61 | -------------------------------------------------------------------------------- /presentations/CENL_AI_Marcel_Gygli.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CENL-Network-Group-AI/awesome-list/9a460cbc7a56b95a890824f1d760a3cb567b67c1/presentations/CENL_AI_Marcel_Gygli.pdf -------------------------------------------------------------------------------- /presentations/CENL_AI_Yves_Maurer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CENL-Network-Group-AI/awesome-list/9a460cbc7a56b95a890824f1d760a3cb567b67c1/presentations/CENL_AI_Yves_Maurer.pdf -------------------------------------------------------------------------------- /presentations/readme.md: 
-------------------------------------------------------------------------------- 1 | 2 | --------------------------------------------------------------------------------