├── templates └── meeting-notes.md ├── meeting-minutes ├── notes-2020-02-05.md ├── notes-2021-06-30.md ├── notes-2020-03-04.md ├── notes-2020-10-14.md ├── notes-2021-04-28.md ├── notes-2021-02-24.md ├── notes-2020-06-24.md ├── notes-2020-12-01.md ├── notes-2020-09-02.md └── notes-2020-05-20.md ├── README.md ├── CONTRIBUTING.md ├── CODE_OF_CONDUCT.md └── LICENSE.md /templates/meeting-notes.md: -------------------------------------------------------------------------------- 1 | # Template for taking notes from Humanities & Data Science Discussion group 2 | 3 | *This template is used in the shared HackMD shared by the discussion hosts.* 4 | 5 | :::info 6 | - **Next meeting date:** 7 | - **Hosts:** 8 | - **Contact:** 9 | - **Join Zoom Meeting:** <>ZOOM LINK 10 | ::: 11 | 12 | **Direct links to the notes from the last meetings:** 13 | 14 | [TOC] 15 | 16 | 17 | -- 18 | 19 | ## Notes: dd Month, yyyy 20 | 21 | ## Topic 22 | 23 | - 24 | 25 | ## Aim of this meeting 26 | 27 | - 28 | 29 | 30 | ## Volunteer to take notes 31 | 32 | - 33 | - 34 | - 35 | - 36 | - 37 | 38 | ## Participants 39 | 40 | **Participants (write your names below)** 41 | 42 | *Name / Institute / What's your experience with preprints?* (answer in ONE short sentence) 43 | 44 | - 45 | - 46 | - 47 | - 48 | - 49 | 50 | :dart: Discussion Goal 51 | --- 52 | 53 | - 54 | 55 | **2 minutes silent note-taking: personal reflection** 56 | "Add '+1' (plus 1_) next to a statement that you agree with and would like to discuss" 57 | 58 | *Name / response* 59 | 60 | - 61 | - 62 | 63 | 64 | :books: Reference and other works mentioned during the discussion 65 | --- 66 | *Please add link and reference, any work that has been discussed and mentioned* 67 | 68 | - 69 | 70 | :mag: Main arguments from the discussion 71 | --- 72 | *anyone can help taking notes.* 73 | 74 | - 75 | 76 | :closed_book: Closing thoughts 77 | -- 78 | 79 | - 80 | 81 | ### Additional Drafted Notes 82 | 83 | 84 | - None for this meeting. 
85 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2020-02-05.md: -------------------------------------------------------------------------------- 1 | # Notes: 05 February, 2020 2 | 3 | - **Hosts:** 4 | - Fede, Leontien 5 | 6 | ### Topic 7 | 8 | - [Data Science Tutorials & Humanities Scholars](https://docs.google.com/presentation/d/1ZfY0_GyYBkRyvkrCt_7hJhShFNdYGaRYQUsKpizUBAY/edit?usp=sharing) 9 | 10 | **Participants (write your names below)** 11 | 12 | - Malvika, Kasra, Katie, Daniel W, DanVan, Dave, Kaspar, Mariona, Olivia ... 13 | 14 | :books: References 15 | --- 16 | - The Programming Historian: https://programminghistorian.org/ 17 | - 18 | 19 | 20 | :dart: Discussion Goal 21 | --- 22 | 23 | - The focus will be on the benefits and drawbacks of tutorials enabling humanities scholars to easily use data science methods. 24 | 25 | 26 | :mag: Main arguments from the discussion 27 | --- 28 | 29 | 1. Tutorials are useful for very specific tasks, to learn what a tool could/should do, not necessarily to learn data science. They are a first step into the field, but from the discussion it became apparent that it is easier to learn data science from books, courses or internships. 30 | 31 | 2. Different methodological frameworks between science and humanities education. We talked about whether data science will ever become part of humanities curricula, due to the demand from students who see it as necessary for entering the job market. We compared this with the training in biology, a science that (partly) relies on qualitative methods. 32 | 33 | 3. Tutorials often don’t have an interactive component (compared to working in groups). This leads to less of a community feeling; it is also unclear how reliable they generally are. The Programming Historian addresses many of these issues, with peer review, frequent updates and an active Twitter community. 
34 | 35 | :closed_book: Closing thoughts 36 | -- 37 | - 38 | 39 | ### Additional Drafted Notes 40 | 41 | 42 | - 43 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Humanities & Data Science Discussion Group 2 | 3 | First and foremost, Welcome! 🎉 4 | 5 | This repository is used to develop, store and maintain resources for the Humanities & Data Science (HDS) Discussion Group, which is part of a broader [Turing Interest Group on the same topic](https://www.turing.ac.uk/research/interest-groups/humanities-and-data-science). 6 | 7 | This document (the README file) is a hub to give you some information about the project. 8 | 9 | Since January 2020, we have been organising and hosting sessions to discuss important topics at the intersection of humanities and data science. 10 | Members of this discussion/interest group are from The Alan Turing Institute, University College London and the British Library, along with their friends or colleagues. 11 | As we gain momentum in this project and test different formats to run our discussion sessions, we understand that there might be more people who would like to get involved in the discussions or read notes from these meetings. 12 | 13 | ## Maintainers and Organisers 14 | 15 | The project maintainers and discussion group organisers are the following members: 16 | - Federico Nanni (@fnanni), Senior Data Scientist, Research Engineering Group 17 | - Katherine L. 
McDonough (@kmcdono2), History Postdoc, Living with Machines project 18 | - Valeria Vitale (@nottinauta), Research Associate, Digital Cultural Heritage 19 | - Kalle Westerling (@kallewesterling), DH Research Software Engineer, Living with Machines project 20 | - Anne Lee Steele (@aleesteele), Community Manager of The Turing Way 21 | 22 | ### Thanks to Previous Organisers 23 | 24 | - Leontien Talboom (@makethecatwise), PhD Student, UCL 25 | - Malvika Sharan (@malvikasharan), Senior Researcher and Co-Lead of The Turing Way 26 | 27 | ## Getting Involved 28 | 29 | If you would like to get involved or share ideas to make resources from these sessions available to a wider audience, please get in touch with the maintainers by opening an [issue](https://github.com/fedenanni/HDS-DiscussionGroup/issues) on this repository. 30 | 31 | If you would like to help with any of the issues raised in this repository or have other ideas for contributions, please check out our [contributors' guidelines](./CONTRIBUTING.md). 32 | 33 | Please note that it's very important to us that we maintain a positive and supportive environment for everyone who wants to participate. 34 | When you join us we ask that you follow our [Code of Conduct](CODE_OF_CONDUCT.md) in all interactions both online and offline. 35 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # How to contribute? 2 | 3 | First of all, thanks for taking the time to contribute! 🎉👍 4 | 5 | This repository is a place to share resources for the Humanities & Data Science (HDS) discussion and interest group and we welcome suggestions and contributions. 6 | You can report mistakes and errors, propose a topic for discussion, offer resources or help create posts based on past discussions. 
7 | 8 | For any organisation-related queries or concerns, you can reach the organisers directly by emailing Federico Nanni (email: fnanni@turing.ac.uk). 9 | We have a [Code of Conduct](./CODE_OF_CONDUCT.md) that applies to all the activities and projects involved in this event. 10 | 11 | Whatever your background and availability, there is a way to contribute to this GitHub repository. 12 | 13 | 🏃 I'm busy, I only have 1 minute 14 | --- 15 | 16 | Get in touch with the members who organise discussion sessions and maintain documents in the repository (for instance by fixing typos in [our meeting minutes](https://github.com/fedenanni/HDS-DiscussionGroup/tree/meeting-resources/meeting-minutes)! 😊) 17 | 18 | ⏳ I've got 5 minutes - tell me what I should do 19 | --- 20 | 21 | Submit issues/pull requests to suggest new topics for discussion 22 | 23 | 🎉 It's my life's mission to learn everything about Humanities and Data Science 24 | --- 25 | 26 | Suggest a topic and chair the session! Contribute to the following posts, documents and notes. 27 | 28 | Please open a GitHub issue to suggest new content, contribute examples, or let us know about errors/bugs. 29 | 30 | 🛠 I am ready to contribute 31 | --- 32 | 33 | - For open tasks in this repository, please see the [Issues section](https://github.com/fedenanni/HDS-DiscussionGroup/issues). 34 | - Report mistakes, errors or missing information in this repository by opening a [Pull Request](https://github.com/fedenanni/HDS-DiscussionGroup/pulls) 35 | - Read details on [how to open a pull request](https://opensource.guide/how-to-contribute/#opening-a-pull-request) 36 | - Submit trivial fixes (for example, a typo, a broken link or an obvious error) 37 | - Start work on a contribution that is already listed as an issue, or something you’ve already discussed 38 | - A pull request doesn’t have to represent finished work. 
It’s usually better to open a pull request early on, so others can watch or give feedback on your progress. Just mark it as a “WIP” (Work in Progress) in the subject line. You can always add more commits later. 39 | 40 | Acknowledgements 🙌 41 | --- 42 | 43 | This project is maintained and organised by Federico Nanni, Leontien Talboom, Malvika Sharan and Katherine L McDonough. 44 | 45 | This work is licensed under a Creative Commons Attribution 4.0 International license. 46 | You are free to share and adapt the material for any purpose, even commercially, 47 | as long as you provide attribution (give appropriate credit, provide a link to the license, 48 | and indicate if changes were made) in any reasonable manner, but not in any way that suggests the 49 | licensor endorses you or your use, and with no additional restrictions. 50 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2021-06-30.md: -------------------------------------------------------------------------------- 1 | # Notes 30 June 2021 2 | 3 | ## Topic: Non-English NLP 4 | 5 | :mailbox_with_mail: [Invitation prompt](https://hackmd.io/GGuqNbzpS2qGvwh28N5e9w) 6 | 7 | **Participants (write your names below)** 8 | 9 | *Name/Institute* 10 | 11 | - Katie McDonough / Turing 12 | - Federico Nanni / Turing 13 | - Malvika Sharan / Turing 14 | - Leontien Talboom / UCL & The National Archives, UK 15 | - Quinn 16 | - Daniel 17 | - Kevin 18 | - Javad 19 | - Alex Brandsen / Faculty of Archaeology, Leiden University 20 | - Martin 21 | - Serge Sharoff / University of Leeds 22 | - Ludovic Moncla / LIRIS, INSA Lyon, France 23 | - Thao Do 24 | 25 | **Volunteers to take notes: Please add your name below** 26 | 27 | - Fede 28 | - Leontien 29 | - Malvika 30 | 31 | ### Slides 32 | https://hackmd.io/_7ga9589T8OqQs1F6GWpEA 33 | 34 | #### References from slides/invitation 35 | 36 | - Just an example of why Wikipedia should not be always assumed to be an available corpus for 
each language: https://www.theguardian.com/uk-news/2020/aug/26/shock-an-aw-us-teenager-wrote-huge-slice-of-scots-wikipedia 37 | - Quinn Dombrowski, "What's a "word": Multilingual DH and the English Default" (2020) 38 | http://www.quinndombrowski.com/?q=blog/2020/10/15/whats-word-multilingual-dh-and-english-default 39 | - The Multilingual Digital Humanities initiative 40 | https://multilingualdh.org/en/ 41 | - Emily Bender on (her) Bender Rule (2019) 42 | https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/ 43 | - Right to Left Conference 2021 44 | https://dhsi.org/dhsi-2021-online-edition/dhsi-2021-online-edition-aligned-conferences-and-events/dhsi-2021-right-to-left/ 45 | - NEH-funded project: New Languages for NLP 46 | https://newnlp.princeton.edu/ 47 | - https://gscl.org/en 48 | - https://www.ai-lc.it/en/ 49 | - https://hausanlp.github.io/ 50 | - Domenico Fiormonte, "Towards a Cultural Critique of Digital Humanities" in Debates in the Digital Humanities 2016 51 | 52 | ### :dart: Quick Question 53 | 54 | *Feel free to answer or add a '+1' next to a statement that you agree with and/or would like to discuss* 55 | 56 | **What is your experience with non-English languages in NLP or DH?** 57 | 58 | 59 | 60 | ### ✍ Initial Drafted Notes 61 | 62 | *anyone can help taking notes* 63 | 64 | 65 | Non-English NLP. 66 | 67 | A lot of development work with non-English languages happens outside of the NLP community and within DH. How do we approach working with non-English material in these communities? 68 | 69 | Problems: 70 | - getting data 71 | - developing methods 72 | - sharing work 73 | - how we teach 74 | 75 | Significant differences across centuries for many languages. Many NLP methods are developed to work in English, leaving gaps in tools / software libraries for working in other languages. 
The imperative to present in English is a problem: if there is no attention to linguistic diversity in the classroom then there will be a lack of it in the public sphere. 76 | 77 | Why do people work in other languages? 78 | 79 | Quinn: Often there are no things out of the box that you can use, because most of the time these are things that are not really used in the NLP community. 80 | 81 | Katie: the tools that are out there are terrible and don't work. 82 | 83 | Alex: the Dutch archaeology pipeline is not generalisable and so very hard to publish 84 | 85 | Serge: the diversity of language is quite large 86 | 87 | ![](https://i.imgur.com/f2TE1sD.jpg) 88 | 89 | the rise of multilingual methods brings fewer language resources, and might work for some tasks 90 | 91 | Daniel: relation between business and law - if you have a model that assigns a tag and works well in English, does it work well in another language? 92 | 93 | Serge: machine translation and multilingual transformers might work 94 | 95 | Quinn: there is unavoidable labour for annotations. Often it is not scalable 96 | 97 | Katie: many complex and time-consuming questions 98 | 99 | Fede: finding good annotators across languages is hard 100 | 101 | Alex: there are many gray areas when working in other languages 102 | 103 | Leontien: archival selection is approached in a very engineering way 104 | 105 | Serge: simple engineering solutions might be a starting point to enable language-specific research 106 | 107 | Katie: what are typical problems when teaching multilingual NLP? 108 | 109 | Quinn: words are not words and sentences are not sentences. 110 | 111 | Katie: certain types of linguistic work in the humanities are easier to do when working in English compared to other languages 112 | 113 | ### :books: Reference and other works mentioned during the discussion 114 | 115 | *Please add links and references to any work that has been discussed and mentioned* 116 | 117 | * Nekoto, W. et al. 2020. 
‘Participatory Research for Low-resourced Machine Translation: A Case 118 | Study in African Languages’, Findings of Association for Computational Linguistics, ACL Anthology. 119 | URL: https://www.aclweb.org/anthology/2020.findings-emnlp.195 120 | DOI: 10.18653/v1/2020.findings-emnlp.195 - This paper proposed participatory research for African languages 121 | * This paper from researchers in Nigeria who are working to embed Hausa and other local languages for NLP: https://arxiv.org/abs/1911.10708 122 | https://github.com/hausanlp 123 | * Linguistic variety, cognate languages and NLP: http://corpus.leeds.ac.uk/serge/publications/2020-jnle.pdf 124 | 125 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct for the HDS Discussion group 2 | 3 | As a community-oriented project we welcome everyone, and encourage a friendly and positive environment. 4 | 5 | This code of conduct outlines our expectations for participants within the 6 | community, as well as steps to reporting unacceptable behavior. We are committed 7 | to providing a welcoming and inspiring community for all and expect our code of 8 | conduct to be honored. Anyone who violates this code of conduct may be banned 9 | from the community. 10 | 11 | Our open source community strives to: 12 | 13 | - **Be friendly and patient.** 14 | 15 | - **Be welcoming**: We strive to be a community that welcomes and supports 16 | people of all backgrounds and identities. This includes, but is not limited to 17 | members of any race, ethnicity, culture, national origin, colour, immigration 18 | status, social and economic class, educational level, sex, sexual orientation, 19 | gender identity and expression, age, size, family status, political belief, 20 | religion, and mental and physical ability. 
21 | 22 | - **Be considerate**: Your work will be used by other people, and you in turn 23 | will depend on the work of others. Any decision you take will affect users and 24 | colleagues, and you should take those consequences into account when making 25 | decisions. Remember that we're a world-wide community, so you might not be 26 | communicating in someone else's primary language. 27 | 28 | - **Be respectful**: Not all of us will agree all the time, but disagreement is 29 | no excuse for poor behavior and poor manners. We might all experience some 30 | frustration now and then, but we cannot allow that frustration to turn into a 31 | personal attack. It’s important to remember that a community where people feel 32 | uncomfortable or threatened is not a productive one. 33 | 34 | - **Be careful in the words that we choose**: We are a community of 35 | professionals, and we conduct ourselves professionally. Be kind to others. Do 36 | not insult or put down other participants. Harassment and other exclusionary 37 | behavior aren't acceptable. This includes, but is not limited to: Violent 38 | threats or language directed against another person, Discriminatory jokes and 39 | language, Posting sexually explicit or violent material, Posting (or 40 | threatening to post) other people’s personally identifying information 41 | (“doxing”), Personal insults, especially those using racist or sexist terms, 42 | Unwelcome sexual attention, Advocating for, or encouraging, any of the above 43 | behavior, Repeated harassment of others. In general, if someone asks you to 44 | stop, then stop. 45 | 46 | - **Try to understand why we disagree**: Disagreements, both social and 47 | technical, happen all the time. It is important that we resolve disagreements 48 | and differing views constructively. Remember that we’re different. Diversity 49 | contributes to the strength of our community, which is composed of people from 50 | a wide range of backgrounds. 
Different people have different perspectives on 51 | issues. Being unable to understand why someone holds a viewpoint doesn’t mean 52 | that they’re wrong. Don’t forget that it is human to err and blaming each 53 | other doesn’t get us anywhere. Instead, focus on helping to resolve issues and 54 | learning from mistakes. 55 | 56 | ### Diversity Statement 57 | 58 | We encourage everyone to participate and are committed to building a community 59 | for all. Although we will fail at times, we seek to treat everyone both as 60 | fairly and equally as possible. Whenever a participant has made a mistake, we 61 | expect them to take responsibility for it. If someone has been harmed or 62 | offended, it is our responsibility to listen carefully and respectfully, and do 63 | our best to right the wrong. 64 | 65 | Although this list cannot be exhaustive, we explicitly honor diversity in age, 66 | gender, gender identity or expression, culture, ethnicity, language, national 67 | origin, political beliefs, profession, race, religion, sexual orientation, 68 | socioeconomic status, and technical ability. We will not tolerate discrimination 69 | based on any of the protected characteristics above, including participants with 70 | disabilities. 71 | 72 | ### Reporting Issues 73 | 74 | If you experience or witness unacceptable behavior, or have any other concerns, please report it by contacting the project organiser and maintainer: 75 | Federico Nanni (email: fnanni@turing.ac.uk) and Leontien Talboom (email: leontien.talboom.18@ucl.ac.uk). 76 | 77 | To report an issue involving one of the members, please email other members individually. 78 | 79 | All reports will be handled with discretion. In your report please include: 80 | 81 | - Your contact information. 82 | 83 | - Names (real, nicknames, or pseudonyms) of any individuals involved. If there 84 | are additional witnesses, please include them as well. 
Your account of what 85 | occurred, and if you believe the incident is ongoing. If there is a publicly 86 | available record (e.g. a mailing list archive or a public IRC logger), please 87 | include a link. 88 | 89 | - Any additional information that may be helpful. 90 | 91 | After filing a report, a representative will contact you personally, review the 92 | incident, follow up with any additional questions, and make a decision as to how 93 | to respond. If the person who is harassing you is part of the response team, 94 | they will recuse themselves from handling your incident. If the complaint 95 | originates from a member of the response team, it will be handled by a different 96 | member of the response team. We will respect confidentiality requests for the 97 | purpose of protecting victims of abuse. 98 | 99 | ### Attribution & Acknowledgements 100 | 101 | This code of conduct is based on the Open Code of Conduct from the [TODO Group](https://github.com/todogroup/opencodeofconduct/). 
102 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2020-03-04.md: -------------------------------------------------------------------------------- 1 | # Notes: 04 March, 2020 2 | 3 | - **Hosts:** 4 | - Fede, Leontien 5 | 6 | ### Topic 7 | 8 | - [Data-driven publications in the Humanities](https://docs.google.com/presentation/d/13nPK5f9Z6wEwOkjbNfLQI4WZ1cRJ9HfDcl6MmmuaJtY/edit?usp=sharing) - comments are open 9 | 10 | **Participants (write your names below)** 11 | 12 | - Malvika, Kasra, Laura, Katie, DanVan, Kaspar, Mariona, Giorgia Occhini, Tim, Amy 13 | 14 | :dart: Discussion Goal 15 | --- 16 | 17 | - Discussing the impact of data-driven research on the overall debate concerning methodology in the humanities 18 | 19 | :books: Works mentioned during the discussion 20 | --- 21 | - Gregory Crane, [What Do You Do with a Million Books?](http://www.dlib.org/dlib/march06/crane/03crane.html), 2006 22 | - [The Culturomics paper](http://www.culturomics.org/), 2010 23 | - Dan Cohen, [Initial Thoughts on the Google Books Ngram Viewer and Datasets](https://dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/ "https://dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/"), 2010 24 | - Anthony Grafton, [Loneliness and Freedom](https://www.historians.org/publications-and-directories/perspectives-on-history/march-2011/loneliness-and-freedom), 2011 25 | - Cameron Blevins, [Topic Modeling Martha Ballard's Diary](http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/), 2010 26 | - Matthew Jockers and Annie Swafford discussion around the Syuzhet package, 2015. 
Starting points: [1](http://www.matthewjockers.net/2015/02/02/syuzhet/) and [2](https://annieswafford.wordpress.com/2015/03/02/syuzhet/) 27 | - Scott Weingart, [“Digital History” Can Never Be New](https://scottbot.net/digital-history-can-never-be-new/), 2016 28 | 29 | 30 | :mag: Main arguments from the discussion 31 | --- 32 | 33 | - The different role that examples play in historical research compared to other disciplines (especially the social and natural sciences). In the former, they are presented as evidence supporting a specific narrative, while in the latter they offer insights into a quantified property of the analysed data. 34 | - Initially, a distant reading method like topic modeling was used for browsing and visualizing the collection, not for deriving evidence (although the distinction is thin) 35 | - History deals with questions that often cannot be answered using big data and quantitative approaches. For instance "how" questions, rather than "what". 36 | 37 | :closed_book: Closing remarks/questions/topics (for future discussions!) 38 | -- 39 | 40 | - The role of private companies in digitizing collections and making them available (ethical, copyright and accessibility issues) 41 | - Non-domain experts doing research in the humanities (as well as in biology, medicine, psychology) because they know how to work at scale 42 | - The difference between discovery and justification in the Humanities (starting from [Trevor Owens, 2012](http://www.trevorowens.org/2012/11/discovery-and-justification-are-different-notes-on-sciencing-the-humanities/)) 43 | - How different disciplines answer "why" questions, and whether this is changing with the advent of data science. 44 | 45 | ### Additional Drafted Notes 46 | 47 | 48 | - Previous experience working/playing with tools for working with large datasets without any goal or questions in mind 49 | - How can playing around be turned into actual research: making sense of the outcome? 
50 | 51 | - What happens when you find something you did not expect? e.g. sentiment analysis of Dorian Gray that ends up showing that the book is sad in the first part and happy towards the end. Is data exploration used in other fields? 52 | - Linguistics researchers working on oral tradition - can create maps and names 53 | - Engineers - hypothesis generation does not expect surprises, but going forward in exploration can give you surprise to support or reject these surprises 54 | - Historians go to the archive with some questions to select the collections to look at - then they can lean on serendipity, which can lead to the crystallisation of bigger new questions 55 | - In bioinformatics we can start with a set of data, e.g. multiple cancer sequencing data (transcriptomics, metagenomics, metabolomics, proteomics), and we can study patterns and derive conclusions about what kind of cancers they are, what causes them, and which are the genes or the drug targets. Serendipity and surprises are basically everything - but larger datasets allow us to remove noise from the actual signal in the data. 56 | - Conclusion: It's hard to ignore surprises and ask more questions when they appear as a side effect of an original question 57 | 58 | - Dan Cohen's reaction to n-grams: are trends derived from big data historical evidence, or do we just want to search and learn? 59 | - Human rights critical theory: this approach allows them to look at the last status (what it is) and then go back to looking into the data to see how it started. 60 | - Social sciences: hypothesis generation in the social sciences is based on assumptions derived by a specific group of people working on selected cases and examples - it changes with people, their environment and cases/examples 61 | - Engineer: we need tools for exploration and other tools for trends - trends can allow us to avoid averaging out (overfitting or underfitting) of observations. 
62 | - when you are working with millions of articles, you will find an article that matches your ideal observation 63 | - improving how we use methods to go from distant to close reading or vice versa: avoiding cherry-picking of observations made through an analysis by using computational methods that can help avoid these biases 64 | - Do computational approaches to history help historians ask new questions, or provide new methods to explore old questions? 65 | - In other fields, general observations allow future predictions, e.g. conflict, infections 66 | - It also depends on the relevance in our community, e.g. coronavirus vs infection in general 67 | - Some questions can't be addressed with close reading because they are about trends (and vice versa) 68 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2020-10-14.md: -------------------------------------------------------------------------------- 1 | # Notes: 14 October, 2020 2 | 3 | ## Topic 4 | 5 | - Ethical implications of archiving the web, especially social media 6 | 7 | ## Aim of this meeting 8 | 9 | - We would like to talk about the benefits and drawbacks of guaranteeing long-term access to this type of material, focusing in particular on the dichotomy between authorial consent and historical preservation. 10 | 11 | - Here are a few starting points for the discussion: 12 | - [Guest Editorial: Reflections on the Ethics of Web Archiving](https://www.tandfonline.com/doi/full/10.1080/15332748.2018.1517589) 13 | - [We Could, but Should We? 
Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections](https://uwspace.uwaterloo.ca/bitstream/handle/10012/11649/Milligan_etal_JCDL2016%281%29-s.pdf?sequence=1&isAllowed=y) 14 | - Archiving social media for good: 15 | - [https://www.docnow.io/](https://www.docnow.io/) 16 | - [https://www.bellingcat.com/](https://www.bellingcat.com/) 17 | 18 | ## Participants 19 | 20 | **Participants (write your names below)** 21 | 22 | *Name / Institute* 23 | 24 | - Federico Nanni, The Alan Turing Institute 25 | - Leontien Talboom, UCL & The National Archives 26 | - Jenny Bunn, The National Archives 27 | - Andy 28 | - Nicola 29 | - Helena 30 | - Ian 31 | 32 | :books: Reference and other works mentioned during the discussion 33 | --- 34 | Recent blog post on an experiment we ran using Webrecorder in the last UK General Election: https://blogs.bl.uk/webarchive/2020/05/using-webrecorder-to-archive-uk-political-party-leaders-social-media-after-the-uk-general-election-2.html 35 | 36 | This is an older blog post discussing archiving social media through Heritrix: https://blogs.bl.uk/webarchive/2017/04/the-challenges-of-web-archiving-social-media.html 37 | 38 | IIPC collections are cross-national and multilingual. All are open access: https://archive-it.org/home/IIPC 39 | 40 | WARCnet: https://cc.au.dk/en/warcnet/
41 | 42 | IIPC Research Working Group: https://netpreserve.org/about-us/working-groups/research-working-group/ 43 | 44 | :mag: Main arguments from the discussion 45 | --- 46 | The discussion started with a few examples of how archiving of social media is done in practice, covering the UK Government Web Archive, Bellingcat and Documenting the Now. 47 | - The UK Government Web Archive archives, for example, YouTube and Twitter content from government pages. They are only able to archive the original content, none of the comments or other community parts of it. 48 | - Documenting the Now doesn't archive material itself but offers a set of tools to empower people to archive material online. 49 | - Bellingcat does archive the context around it, which sometimes includes private or confidential information. But they archive it as evidence, which is a slightly different purpose. 50 | 51 | The starting question was around the difference between archiving and capturing social media. This then led to the British Library (BL) outlining their approach to archiving social media. They also talked about Legal Deposit, as this limits them from archiving at scale. Another question was asked around UK content and how this should be determined, as these boundaries may not be as visible on the web. 52 | 53 | Then there was a discussion around metadata, especially focusing on losing context, as with the UK Government Web Archive, which only captures what the government does online. The BL keeps specific metadata with its material to preserve the context, but what is considered enough and what are we actually capturing? 54 | 55 | 56 | 57 | :closed_book: Closing thoughts 58 | --- 59 | Although the discussion may not have given concrete answers to what should and shouldn't be archived, it was good to hear people's different approaches to this and what problems they encountered when doing this type of work. 
60 | 61 | ### Additional Drafted Notes 62 | 63 | 64 | We will start from a few examples on how this is done in practice. 65 | 66 | - The UKGWA archives Twitter and YouTube, and is trying Facebook 67 | - They keep the tweets (no context / retweets / comments) 68 | 69 | Another example is Document the Now (they don't archive material themselves but offer tools) 70 | - Bellingcat is an investigative journalism company. They use social media to debunk certain things - examples on Covid misinformation 71 | - They take more of a "don't ask for permission, ask for forgiveness" approach. 72 | 73 | Is it useful to archive 74 | 75 | - difference between archiving and capture 76 | - from BL. Archiving social media preserving the entire life cycle. Legal deposit - they don't archive social media at scale. 77 | 78 | How to assess what is UK content? 79 | - Heritrix used for archiving at scale - not suited for archiving at scale 80 | - The UKGWA doesn't keep the dynamic nature of social media 81 | - Losing context around the tweet 82 | - UK Web Archive at BL tries to get context from metadata 83 | - Social media material is as much as possible publicly available (you would need an additional permission from the content owner) 84 | - what do we create when we are doing it? What are we capturing? (who archives social media at scale?) 85 | - build social network from the XVIII century through letters - you can do that, but it is not there in the same form 86 | - what are we preserving and why? 87 | - Research on social media - but how did you collect them? 88 | - Discussion around anonymisation / deanonymisation 89 | - And the role of the archive in this - it could act as a collaborator for the researchers for guaranteeing the way data is collected 90 | - Prioritisation from the archivist point of view. 
This is usually not a priority 91 | - Technical expertise is needed 92 | - We can run the code from our holdings 93 | - CommonCrawl / Internet Archive are available so people can run their code first there 94 | - document the process of how data has been collected 95 | - question for the researchers: what do they want? 96 | 97 | Discussion around preserving cross-national events 98 | - the distinction between private and public is dissolved now - the social context is all mixed together 99 | - discussion about the role of BL on archiving and making available 100 | - archiving newsletter? discusses how the BL currently archives Weibo 101 | - moving more towards contracting out to the users the choice of the type of contents to preserve 102 | - who makes the choice? Delegating to the crowd 103 | - but are we replicating the old model? 104 | - people are already contacting the BL for preserving their online activities before 105 | 106 | Discussion on who to communicate to the crawler what you should not 107 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2021-04-28.md: -------------------------------------------------------------------------------- 1 | # Notes 28 April, 2021 2 | 3 | ## Topic: Open-source Journalism 4 | 5 | **Participants (write your names below)** 6 | 7 | *Name/Institute* 8 | 9 | - Federico Nanni, The Alan Turing Institute 10 | - Leontien Talboom, UCL & The National Archives UK 11 | - Malvika Sharan, The Turing Way - Turing Institute 12 | - Katie McDonough, The Alan Turing Institute 13 | - David Beavan, The Alan Turing Institute 14 | - Giovanni Maria Pala, University of Oxford/Magdalen college 15 | - Bernard Ogden, The National Archives 16 | - Camila 17 | - Andre 18 | - Rossitza Atanassova, British Library 19 | - S. 
Sharoff 20 | - Ridda 21 | - Ismael 22 | - Andre Piza - Alan Turing Institute 23 | - Andrea Kocsis, The National Archives 24 | 25 | **Volunteers to take notes: Please add your name below** 26 | 27 | - Federico 28 | - Malvika 29 | 30 | ### Slides 31 | 32 | [Open Source Journalism](https://docs.google.com/presentation/d/1EWPeRaRDjYKbs8Y1P06rqKIB8Ul0AUhUMNOmHxD31pA) 33 | 34 | - references: 35 | - https://www.bbc.com/news/technology-22214511 36 | - https://www.bellingcat.com/ 37 | 38 | ### :dart: Quick Question 39 | 40 | *Feel free to answer or add a '+1' next to a statement that you agree with and/or would like to discuss* 41 | 42 | **What is your experience with open-source journalism? Do you think it could be of any help for your discipline? If so, why?** 43 | 44 | *Name / response* 45 | 46 | * David / Along with Camila here on the call, I'm part of [Turing Data Stories](https://github.com/alan-turing-institute/TuringDataStories): a mix of open data, code, narrative 💬, visuals 📊📈 and knowledge 🧠 to help understand the world around us. 47 | * Blatant self promo: Come join us, with stories, data, ideas and community building 48 | * [name=Camila] 🙌 YASS! 49 | * Ismael / Not very much, but I spoke with the Comms team today and it was interesting to hear how things have changed during the pandemic, as journalists now just look out for preprints. See Fox ([2020](https://www.sciencemediacentre.org/what-should-press-officers-advise-on-preprints-during-a-pandemic/)). 50 | * David / One of my fave exhibitions ever was [Forensic Architecture](https://forensic-architecture.org/) '...undertake advanced spatial and media investigations into cases of human rights violations, with and on behalf of communities affected by political violence, human rights organisations, international prosecutors, environmental justice groups, and media organisations.' 
51 | * Camila: One of my main sources for Venezuelan news is an [OSINT account](https://twitter.com/conflictsw) that can't be censored by the regime. It has been extremely useful for knowing what is happening during certain events. 52 | * Malvika: Positive journalism in the context of citizen science (researcher's night, road show, data reporting, blogging) and negative journalism in the context of public shaming of people on social media (Twitter trends on open source related news - current one on Basecamp). 53 | * I am thinking more in the direction of peer-reviewing of news where ideas are represented fairly (a more wholesome view of world events rather than living in a bubble). 54 | * Andre: I'm a journalist by background and I'm running a project with the Bureau Local (a branch of the Bureau of Investigative Journalism) that works with a lot of open source or citizen journalism principles. The project I'm running is a collaboration with Coney (interactive theatre company) and they are working on innovative community engagement, investigation, storytelling and impact strategies. The idea is that this could help both organisations to tell contemporary stories that matter to people co-creating with communities. It is based on the idea that traditional journalism is extractive of society and it needs to work better for communities affected by the issues that journalists cover. **For reference**: [FT's The Uber Game](https://ig.ft.com/uber-game/) 55 | * Katie: I am familiar with a couple of projects at the Spatial History Project at Stanford that are nice combinations of mapping/data viz and journalist- and documentarian-driven projects. For ex: http://web.stanford.edu/group/spatialhistory/cgi-bin/site/project.php?id=1045 56 | * Giovanni / No real experience but curious about the value as a source, both for current affairs and Social Science, but also within a more classic "historical source" framing. 
I wonder how much notions of legacy/obsolescence of the journalistic open-data are discussed within these groups. 57 | 58 | 59 | ### ✍ Initial Drafted Notes 60 | 61 | *anyone can help taking notes.* 62 | 63 | 64 | - Starting point on Boston bombing and picking the wrong suspect (discussion from online forums like Reddit). 65 | During the last 5 years open source journalism as a counter-movement to alt-facts. 66 | - Transparency as a key aspect - very very clear where they got their sources from and there's lots of help from the public. Example of Bellingcat on geolocating picture. Open discussion is public around information. 67 | - S. Sharoff: texts published not by professional journalists but by everyday people. The format and style is often similar but the type of content is different. 68 | The founder of Bellingcat is not a journalist. 69 | Andrea: she is trained as a journalist, but it's not the education that makes you a journalist, more about the practice. 70 | - Andrea: internship is again learning through practice instead of professional training (trust in editing) 71 | - Open source journalism is about building accountability. Quicker response to any concern compared to established outlets. 72 | - S. Sharoff: there are some linguistic differences in the way information is presented. Often opinions and third-person viewpoints are not mixed. Fake news spreading often these are more personal 73 | - André Piza: open-source journalism has a great impact on organisations. They recognise that there's a big change in the field 74 | - Andrea: the role of citizen journalism in anti-democratic countries. 75 | - Malvika: open-source journalism and open-source development. Transparency seems to be the key. Peer reviewing, challenging the power imbalance and allowing users to become contributors. 
76 | - Here they define if blogging can be called open source journalism: [https://www.upstart.net.au/explainer-open-source-journalism/](https://www.upstart.net.au/explainer-open-source-journalism/) 77 | * Crowdsourcing data and stories: 78 | - Dave: transparency is the main aspect. Bellingcat is based on open source data. Traditional journalism doesn't show the sources and the flow so clearly ("protecting sources"). 79 | - Katie: writing about research and writing about journalism. Training graduate students how to write to non-academics about their research - often it is done to replace the comms stuff. 80 | - Dave: Turing Data Stories: https://github.com/alan-turing-institute/TuringDataStories 81 | - Andre: Crowdsourcing data and stories: https://www.thebureauinvestigates.com/blog/2021-04-14/a-blueprint-for-investigative-journalism-how-the-bureau-worked-alongside-riders-to-investigate-deliveroo 82 | - Leontien: FullFact and it's not about being trusted, it's about being trustworthy. 83 | - Serge: writing is for an audience and for different audiences you'll have different messages. 84 | - Andrea: the audience is more differentiated than just age. Older demographics are more exposed to fake news 85 | - Camila: YouTube as first source of news for young people. 86 | - Andre: since 2019 social media is no longer growing as a source of news and also trust in people is changing. 
87 | 88 | ### :books: Reference and other works mentioned during the discussion 89 | 90 | *Please add link and reference, any work that has been discussed and mentioned* 91 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2021-02-24.md: -------------------------------------------------------------------------------- 1 | # Notes 24 February, 2021 2 | 3 | ### Topic: The Role of Authorship ([slides](https://docs.google.com/presentation/d/1oanQWbP_yg9UkLSI9OrNwSxST0K5mkgXvNiymz4ixqw/edit?usp=sharing)) 4 | 5 | **Volunteers to take notes: Please add your name below** 6 | 7 | - Leontien 8 | - Malvika 9 | 10 | **Participants (write your names below)** 11 | 12 | *Name / Institute* 13 | 14 | - Federico Nanni, The Alan Turing Institute 15 | - Leontien Talboom, UCL & The National Archives UK 16 | - Malvika Sharan, The Turing Way - Turing Institute 17 | - Katie McDonough, The Alan Turing Institute 18 | - Emma Karoune, The Alan Turing Institute 19 | - David Beavan, The Alan Turing Institute 20 | - Ismael Kherroubi Garcia, The Alan Turing Institute 21 | - Jez Cope, The British Library 22 | - Glenn, The Alan Turing Institute 23 | - Jenny Bunn, The National Archives 24 | - John Moore (The National Archives) 25 | - Alessandro Tirapani 26 | - Isil Bilgin (University of Reading, Brainhack Global Organization Committee) 27 | - Kaspar Beelen, The Alan Turing Institute 28 | - Bernard Ogden 29 | - Serge Sharoff, University of Leeds 30 | - Becca Hutcheon 31 | - James Smithies, King's Digital Lab, Kings College London 32 | - Rossitza Atanassova, The British Library 33 | 34 | ### :dart: Quick Question 35 | 36 | *Feel free to answer or add a '+1' next to a statement that you agree with and/or would like to discuss* 37 | **Is authorship a topic you discuss openly with your colleagues / group? 
Or is it something that comes out only at the very end of a project (when you’re about to submit a paper for instance)?** 38 | 39 | *Name / response* 40 | 41 | - Jenny - Coming from the perspective of being an archivist we have been reluctant to claim ownership of our work (e.g. in the form of a catalogue). We operate in a different framework around recognition - For many years it has been seen as important that we are invisible, neutral in the process, but this is increasingly being questioned. +1 DB: neutrality I suspect is a myth, the catalog is a reflection of individuals, and certainly society at time of writing/editing. JB - Exactly this is what is being recognised and some are calling for colophons to be added - a sort of positionality statement for archivists. 42 | * Emma - single author still fairly common in my field, archaeology but I am speaking about this a lot at the moment with The Turing Way - discussing how many authors and contributions can be captured or attributed fairly, how to record collaborative projects etc. 43 | - Ismael - Yes, with those I discuss philosophy of science with; not with the people I am currently preparing a conference paper with. Hmmm, may have caught myself there. 44 | * Malvika: Yes, but because I am not currently in a job that sits in a traditional academic system. It is quite important in the open source community as a lot of work stays hidden and we would like it to receive acknowledgement. 45 | * 46 | * Isil: I guess authorship agreements differ greatly between fields, and even countries, depending on the a priori accepted conventions regarding the order, who should be there, and so on. This seems like a challenge to break in many respects once it becomes a backbone structure, especially when you are an ECR. 
The question here is how to decide which contribution is worth more than another, or how to quantify them, given that we are all working under public funding, where the outputs and benefits of the research are more important to focus on than these discussions. 47 | * James: We're taking a more ad hoc approach than we'd like in King's Digital Lab, due to other priorities. It's something we're starting to consider now, but on a case by case basis. We are running a 'Research Collaboration Framework pilot' that defines RSE research involvement in the pre-grant stage, and defines high level authorship / attribution expectations, but doesn't get down to the specifics of attribution in articles. In general, King's DH has long assumed a 'movie credits' approach to attribution on websites, but we do have work to do with other kinds of outputs. It would actually be useful to have someone other than me here, given I'm lab director and it's really an issue for the RSE team to decide (in my opinion). 48 | * [name=Jez] Mostly discuss this as part of the writing process myself; very interested in recording the different roles in metadata, see e.g. [CASRAI Contributor Roles Taxonomy (CRediT)](https://casrai.org/credit/) 49 | * Yes, and it gets amplified when there may be different linked outputs, e.g. papers and code. Getting agreement over who are authors, who are contributors who are acknowledged is always hard +1 [name=Glenn] 50 | * Alessandro: in social sciences, it is uncommon to have more than 3 authors. Most papers that are not single authored have quite clear rules. The person who did most work/had the idea/collected data is the first author, and the others are listed depending on contribution. At times a new author is added in the review process, and it would normally go last. So I think there is not much debate beforehand, but it can be discussed when more than one person collects data or does substantial work along the way. 
In a few cases, you can see a * saying 'All authors contributed equally'. 51 | 52 | 53 | 54 | ### ✍ Initial Drafted Notes 55 | 56 | *anyone can help taking notes.* 57 | 58 | 59 | First discussion talked about the contribution of code, how does this work? GitHub holds a history of who has worked on the code, but what if you move it? 60 | 61 | Jenny talks about being the 'invisible labour' of doing archival work and that archivists are complicit with that as they perceive themselves as neutral. This question has come up more now that there is more recognition for preprocessing datasets within academia. But archivists do not get recognized in the same way as someone who may be doing this preprocessing. 62 | 63 | At King's Digital Lab they are looking at different ways that people may be recognised for their research contribution. They are able to choose between three options, 64 | co-investigator, research services or undertaking a project and getting recognition for it. 65 | 66 | Other invisible work is done by for example students, who may not be recognised for the tedious work in projects. Jenny also talks about participants in this setting and that it is also about understanding when recognition is important. 67 | 68 | Alessandro talks about how in social sciences the position is almost irrelevant. It doesn't really matter as much, whereas in other fields it matters quite a lot. 69 | 70 | Fede then mentions how he would like to discuss interdisciplinary work and publishing it. And what do you do? Do you publish in a highly prestigious interdisciplinary journal? Or do you publish in two different journals? What is the best way of doing this? 71 | 72 | Fede also asks if code is something that is seen as more general across disciplines? David did say that it is not, there are different ways of coding and different ideas of good code. 
73 | 74 | Ismael asks if people really are seeking recognition and Jenny talks about how there are different levels of recognition, also there is a difference between people who want the recognition and people who need the recognition. 75 | 76 | Isil talks about how every contribution is important. Creating a mindset and a community where any contribution is a contribution is important. 77 | 78 | ### :books: Reference and other works mentioned during the discussion 79 | 80 | *Please add link and reference, any work that has been discussed and mentioned* 81 | 82 | - [CASRAI Contributor Roles Taxonomy (CRediT)](https://casrai.org/credit/) 83 | - [The Turing Way](https://the-turing-way.netlify.app) - a guide to reproducible data science that will support students and academics as they develop their code, with the aim of helping them produce work that will be regarded as gold-standard examples of trustworthy and reusable research. 84 | - [Communicating Open Science](https://github.com/alan-turing-institute/the-turing-way/issues/1733) 85 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2020-06-24.md: -------------------------------------------------------------------------------- 1 | ## Notes: 24 June, 2020 2 | 3 | - Host: Leontien and Fede 4 | 5 | ## Topic: 6 | 7 | - Commercial organisations doing the job of libraries/archives 8 | - [Slides](https://docs.google.com/presentation/d/1ZfY0\_GyYBkRyvkrCt\_7hJhShFNdYGaRYQUsKpizUBAY/edit?usp=sharing) 9 | 10 | ## Aim of this meeting 11 | 12 | - Having a conversation rather than a one-to-many reading group 13 | - Discussing topics at the intersections of the two disciplines 14 | - Trying to consider different / uncommon points of view 15 | - Sign up to this mailing list: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=TURINGINS-HUMANITIES-DATASCIENCE 16 | 17 | 18 | **Participants (write your names below)** 19 | 20 | Name / Institute / What brought you here? 
(answer in a short sentence) 21 | - Federico Nanni / Turing / Leading the session 22 | - Leontien Talboom / UCL / chairing the session 23 | - Katie McDonough / Turing 24 | - Malvika Sharan / Turing / Community discussions 25 | - Sarah Gibson / Turing 26 | - Scott Bailey 27 | - Rossitza Atanassova 28 | - Patricia Murrieta 29 | - KBeelen 30 | - Eirini Goudarouli (TNA) 31 | - Daniel Wilson / Turing / Historian working with/on 'data' 32 | - D Vanstrien 33 | - David Beavan / Turing / Co-Organiser Humanities & Data Science interest group 34 | - Bernard Ogden 35 | - A Lang 36 | - Barbara McGillivray 37 | - 38 | 39 | 40 | :dart: Discussion Goal 41 | --- 42 | 43 | Commercial digitalisation organisations are not libraries, or are they? 44 | 45 | Example 1: 46 | - The National Archives has an agreement with "Findmypast" to secure records on ancestry 47 | - Pro - Findmypast does the digitalisation and preserves the data in a different format 48 | - Con - this is available only upon visiting the National Archives reading room; to access the records from home there is a paywall 49 | 50 | Example 2: 51 | - Google Books: digital bookstores are not libraries 52 | - they are copyright and authors don't benefit from them 53 | - You need to pay to access the books and hence it's Google that profits from this and not society 54 | 55 | Example 3: 56 | - Internet Archive - not a library but piracy, as there is no license to make these books available 57 | ![](https://i.imgur.com/sQpDrsM.png) 58 | 59 | Questions: 60 | 61 | 62 | 2. What are the drawbacks of commercial organisations acting in this environment? 63 | - As their main goal is to make money, how do we ensure that our values also come across? 64 | - How do we guarantee long-term preservation? (when the hype is gone) 65 | - If the business is based on data, how do we ensure that data is open and fully available? 66 | 67 | 3. How should we be setting up such a relationship? 
68 | - Which value do we recognize in their work, apart from the invested budget? 
69 | - How can we ensure that our expertise is not lost? 70 | - Or is it something that academia should discourage as a whole? 71 | 72 | 73 | :books: Reference and other works mentioned during the discussion 74 | --- 75 | 76 | - [Google books](https://books.google.com/) 77 | - [Torching the Modern-Day Library of Alexandria](https://www.theatlantic.com/technology/archive/2017/04/the-tragedy-of-google-books/523320/) 78 | “Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.” 79 | - [Google & the Future of Books by Robert Darnton](http://www.nybooks.com/articles/22281) 80 | - Gale Cengage came up with the Digital Scholar Lab to allow computation with the digitised collections behind a paywall: https://insights.uksg.org/articles/10.1629/uksg.482/ 81 | - [Removing Barriers to Digital Scholarship](https://www.gale.com/intl/primary-sources/digital-scholar-lab) 82 | 83 | :mag: Main arguments from the discussion 84 | --- 85 | 86 | **Questions 1.** What are the benefits of commercial organisations acting in this environment? 87 | - How would we otherwise fund large digitisation projects? 88 | - Does this mean the material is more widely available? 89 | - Does this simplify cross-country efforts? 90 | 91 | **Discussion on Google books and British Library (BL) contract on digitalisation of literature**: 92 | - Google books: https://books.google.com/ 93 | - What is their business model? 
- very unclear to libraries and archives 94 | - Rossitza: Google Books are still going, and digitising collections at the BL and other institutions 95 | - Daniel Wilson (in chat): As Rossitza said, they *are* digitising thousands of books a month: but the business model is more opaque. I assumed Google Books was meant to be a ‘loss leader’ for the wider operation: PR for their ambition to ‘organise all knowledge’ (aka advertising) 96 | - A Lang (in chat): book historian Robert Darnton wrote a good piece on this in the NYRB some years ago (ironically enough, behind a paywall: http://www.nybooks.com/articles/22281) 97 | 98 | Questions on Google digitalisation: 99 | - Patricia Murrieta (in chat): what is the arrangement between the BL and them Rossitza? In terms of what you get and what do they get? 100 | - The contract is available online, it is quite inflexible https://www.openrightsgroup.org/blog/access-to-the-agreement-between-google-books-and-the-british-library/ 101 | - What are they interested in digitalising? What kind of material? Just speculating what's their goal :D 102 | - Curators had the freedom to select materials but there are restrictions on dimension and condition to meet the requirements of the scanning equipment Google use 103 | - A lot of the books can be rejected if the metadata is missing 104 | - There is a focus on scale rather than content, e.g. a request to digitalise some specific material was rejected because metadata was missing and BL did not have the resources 105 | - The goal seems to be text mining and OCR ... 106 | - Libraries are having different dialogues with Google group (separately), and it's not consistent 107 | - Even if their goal is not the most charitable, how can communities benefit from it? 108 | - David Beavan (in chat): They are hoovering up all of human knowledge. Born digital for them = web, they have got covered. This is a way of going back in time. Language models, semantic change, OCR, gateway to knowledge. 
If they become the de facto place for knowledge/search and put libraries out of business (even if only by convenience) then you’ll get adverts between page turns etc. etc. Google are ultimately an advertising company 109 | 110 | - Daniel Wilson: there is one buyer and no competition. We need to understand what it is that they gain from this, in order to value the resource they are being given. 111 | - In any case, my point was more that the BL felt its hands were tied, even before it got to that point
 112 | 113 | - Katie: ancestry free from local libraries in the UK during the pandemic. 114 | - Genealogy organisations hold a lot of power (personal information) 115 | - Based on where they are (America or UK), they also compete for information 116 | 117 | - Patricia: It's in a way like publishing companies. In order to change the model, holders of knowledge would have to choose not to go with them... 118 | - Mia: Other organisations can access Google Books, but they don’t mind the unlimited liability that Turing didn’t agree to 119 | 120 | - Kate M (in chat): Is there any writing/research about which countries have provided public funding for digitization vs. those that have gone (at least primarily) with commercial digitization? 121 | 122 | - Mia: Really great overview on the efforts of different countries in digitizing their cultural heritage (France, Finland, Australia, New Zealand, Canada) in comparison with the UK 123 | 124 | :closed_book: Closing remarks/questions/topics (for future discussions!) 125 | -- 126 | 127 | - In science we have a strong open movement on the basis that taxpayers' money (public funding) going into research should produce output that is publicly accessible. 128 | - However, that kind of funding is missing in humanities, which is shocking given the fact that humanities affects generations of scholars, researchers, politicians and citizens. 129 | - What we have also realised is that some of the researchers work on a field not because that's what they want to do, but because that's the only field they can access papers on - I wonder if that pattern exists within humanities as well. 
130 | - The embargoes for IP rights on research output are the same across all these fields 131 | 132 | ### Additional Drafted Notes 133 | 134 | 135 | - 136 | 137 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2020-12-01.md: -------------------------------------------------------------------------------- 1 | 2 | ## Notes 01 December, 2020 3 | 4 | ## Topic 5 | 6 | **Ground Truth and the Humanities**: The Structured Representation of Places (and other Named Entities) in a Knowledge Base ([slides](https://docs.google.com/presentation/d/1PIddDoFrhQsSwxvfy715_5idNujxbnmv2Jo-ztS-8_s/edit?usp=sharing)) 7 | 8 | 9 | ## Volunteers to take notes 10 | 11 | - Fede 12 | - Malvika 13 | - 14 | 15 | ## Participants 16 | 17 | **Participants (write your names below)** 18 | 19 | *Name / Institute* 20 | 21 | - Federico Nanni, The Alan Turing Institute 22 | - Leontien Talboom, UCL & The National Archives UK 23 | - Malvika Sharan, The Turing Way - Turing Institute 24 | - Katie McDonough, The Alan Turing Institute 25 | - Ludovic Moncla, INSA Lyon 26 | - Carmen Brando, EHESS Paris 27 | - Rossitza Atanassova, British Library 28 | - Bruno Martins 29 | - Matt 30 | - Arno 31 | - S. 
Sharoff, Leeds 32 | - Gethin Rees 33 | - Beatrice Alex 34 | - Janelle Jenstad, University of Victoria, Map of Early Modern London 35 | - Daniel Wilson 36 | - Kaspar Beelen 37 | - Ruth Mostern 38 | - Karl Grossner, World Historical Gazetteer 39 | - Arianna Ciula, King's Digital Lab, King's College London (UK) 40 | - Francesca Benatti, The Open University 41 | - Enrico Daga, The Open University 42 | - Miranda Lewis 43 | - Arianna Ciula 44 | - Mark Bell 45 | - Katherine Bellamy 46 | - Yann Ryan, QMUL, Networking Archives project 47 | - Arno Bosse, KNAW Humanities Cluster, Amsterdam 48 | - Alex Butterworth 49 | - Barbara McGillivray 50 | - Bekka Kahn 51 | 52 | 53 | ✍ Initial Drafted Notes 54 | --- 55 | *anyone can help taking notes.* 56 | 57 | 58 | - Ground Truth in Humanities: came across this through [field paper](https://wiki.openstreetmap.org/wiki/Field_Papers) 59 | - Remote sensing data and historical map: what counts as ground truth about historical places? 60 | - The concept of ground truth, especially in digital humanities 61 | 62 | - Reference for the concept in the humanities 63 | - Etymology of the word ground truth: remote sensing, the truth "in the ground". Correcting the digital data (no longer or never represented) what is on the ground. Ground truth as returning to the ground. Going back to the landscape 64 | 65 | - As per wiki: https://en.wikipedia.org/wiki/Ground_truth 66 | - 'records the use of the word "**Groundtruth**" in the sense of a "fundamental **truth**" from Henry Ellison's poem "The Siberian Exile's Tale", published in 1833.' 67 | - [Geographic information systems](https://en.wikipedia.org/wiki/Geographic_information_system "Geographic information system") such as GIS, GPS, and GNSS, have become so widespread that the term "ground truth" has taken on special meaning in that context. 
If the location coordinates returned by a location method such as GPS are an estimate of a location, then the "ground truth" is the actual location on Earth. 
68 | 69 | - Arianna Ciula: in terms of computer science context of verifiability and the lack of data in the humanities to often establish ground truth: https://drops.dagstuhl.de/opus/volltexte/2013/4167/pdf/dagman-v002-i001-p014-12382.pdf - useful to have in some circumstances e.g. when substantial training datasets are needed 70 | - Coincidentally today with some other colleagues we had a meeting with Prof Charlotte Roueche about re-building/refreshing this gazetteer project https://www.slsgazetteer.org/ and one of the discussion points focused on the constraints on measurement methods at the time when names were recorded (the point is that we take for granted even the identification of exact coordinates). 71 | 72 | - Janelle Jenstad: the need for a name authority for places. What counts as ground truth**s** about a place? Who wrote those documents and why? 73 | - Who has the right to name? 74 | - Experience working on historical mapping of early modern London, North America's indigenous land. 75 | - Who from the past has the right to name the land, who we have forgotten, how do we develop a gazetteer that brings people to get their voices heard (truth from the land/ground) 76 | 77 | - Ruth Mostern: what we see in published works and what we see on the ground are not the same thing! Go to the ground to figure out what is there. Thinking about the spatial humanities as part of the humanities. Decolonizing college campuses - list of every building / road in every campus and studying all individuals commemorated. 78 | 79 | - Definition: A **gazetteer** is a geographical **dictionary** or directory, an important reference for information about places and place names (see: toponymy), used in conjunction with a map or a full atlas. 80 | 81 | - Bea: working on NLP and geoparsing in Edinburgh. Ground truth for verifying an algorithm. The project and task drive the decision on the gold standard / ground truth. 
82 | - Katie: is the gold standard the same thing as the ground truth? 83 | - Bruno Martins: we use ground truth referring to annotations that can be considered as truth. we often work on projects that have ambiguity - we look at the level of agreement. there are tools to quantify uncertainty. reaching something that is close to a consensus. In geography you want to go beyond categorical variables. Does probability theory 84 | - S. Sharoff: another point of view from NLP. Classification of genres for instance, often there is the problem of reliability of annotations. Annotations offer different perspectives. If you have a task for translating, a question is what is the cognitive difficulty of this (this is really difficult to establish) 85 | - Katie: there is lots of theory on how to handle interannotator agreement, but with visual material the reference point is going back to pre-digital geography. 86 | - Arianna C: To react to Bruno’s comment: agreed but I also think it depends a lot on the dataset we are discussing; e.g. in the case of palaeography mentioned above there are simply not enough data to establish a ground truth in the computational sense 87 | 88 | 89 | - Francesca Benatti: Another historical example is the Irish Ordnance Survey, which renamed every single place name in Ireland from Irish Gaelic into English during 1824-1842, while Ireland was part of the United Kingdom. Even after Irish independence in 1922, the English place names are the ones that have remained in common usage. Some of the Irish Gaelic names have only survived in the documents of the surveyors. In Northern Ireland, certain places have different names for the nationalist community and for the unionist community e.g. Derry/Londonderry. “Truth” cannot be separated from the political. 90 | 91 | - Karl Grossner: it is a bit ironic that a group of humanists discuss how to establish truth. There is observational data, but this is different from truth.
92 | - Katie: Who's on first as an interesting project (from the internet, not from academia). How do we capture multiple truths? 93 | - Arianna: Data Information Knowledge pyramid quoted a lot but not always clear what it means in practice so enjoying Karl’s explanation 94 | - Ruth: Devil's advocate position: the fact that there was a Gaelic name that was erased by British colonialism (as per the example above) IS a truth, which needs to be excavated. Truth is a good word for this process. 95 | - Bekka: Knowledge as ‘belief’ also makes me think about holy/spiritual places which were real at historical moments such as Hades, or Jerusalem (I know the team at KIMA have thought about this a lot). 96 | - Arno: be transparent on what we want to use the gazetteer for. Why are you building the gazetteer? 97 | - Katie: what is important when you construct a gazetteer (both historically and now) 98 | 99 | - Enrico: what is the identity of a place? Are London and Londinium the same place or different places? You use the term ground truth to make a point in computer science. What is the role of "ground truth" in the humanities? Maybe we should speak more about "evidence".
100 | - Arianna: defined data models in a layered way (definition of strata depends on scope/purpose/user research) - even a minimum data model for place can be quite complex as the concept is relational (space-time) 101 | 102 | Full Zoom chat available [here](https://drive.google.com/file/d/1oVRDSPYsBZpUzylSjjubTQJLibdu2LTD/view?usp=sharing) 103 | 104 | :books: Reference and other works mentioned during the discussion 105 | --- 106 | *Please add link and reference, any work that has been discussed and mentioned* 107 | 108 | - [field paper](https://wiki.openstreetmap.org/wiki/Field_Papers) 109 | - Gazetteer project: https://www.slsgazetteer.org/ 110 | - ref by Arianna: the humanities to often establish ground truth: https://drops.dagstuhl.de/opus/volltexte/2013/4167/pdf/dagman-v002-i001-p014-12382.pdf 111 | - Ref by Janelle: https://www.w3.org/2009/12/rdf-ws/papers/ws21: Halpin and Hayes, "When owl:sameAs isn't the Same" 112 | - Ref by Karl Grossner: https://asistdl.onlinelibrary.wiley.com/doi/pdf/10.1002/asi.24194 113 | - Ref by Francesca: Logainm.ie The Placenames Database of Ireland, which is working to register both Irish and English place names https://www.logainm.ie/en/. It also links to historical sources of place names
 114 | - Recent Reassembling Republic of Letters publication also includes useful reference to modelling of places: https://www.univerlag.uni-goettingen.de/handle/3/isbn-978-3-86395-403-1?locale-attribute=en 115 | - cf. World Historical Gazetteer: http://whgazetteer.org 116 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2020-09-02.md: -------------------------------------------------------------------------------- 1 | ## Notes: 02 September, 2020 2 | 3 | ## Aim of this meeting 4 | 5 | The focus is on the **current and future role of preprints** as a way of sharing research findings, with examples from different communities. 6 | 7 | 8 | ## Participants 9 | 10 | **Participants (write your names below)** 11 | 12 | *Name / Institute* 13 | 14 | - Federico Nanni, The Alan Turing Institute 15 | - Leontien Talboom, UCL & The National Archives 16 | - Jessica Polka, ASAPbio 17 | - Anna Rogers, University of Copenhagen 18 | - Dmytro Mishkin, CTU in Prague 19 | - Demitra Ellina, F1000Research 20 | - Barbara McGillivray, University of Cambridge 21 | - David Beavan, The Alan Turing Institute 22 | - Rennie Mapp, University of Virginia, US 23 | - Martin O'Reilly, The Alan Turing Institute 24 | - Callum Mole, The Alan Turing Institute 25 | - Adam Tsakalidis, QMUL & The Alan Turing Institute 26 | - Alessandro Tirapani, City, University of London 27 | - Giulia Paci, UCL 28 | - Amy Tabb, at the meeting as an independent scholar, USA 29 | 30 | :dart: Quick Questions 31 | --- 32 | *Feel free to answer them or add a '+1' next to a statement that you agree with and/or would like to discuss* 33 | 34 | 35 | **In which cases do you post a preprint of your work?** 36 | 37 | *Name / response* 38 | 39 | - Dmytro / Anytime, unless my collaborators are against it 40 | - Leontien / Never, it is not common in my field 41 | - Jessica / Always (except some commissioned review articles) 42 | - Adam T / faster access
(post-acceptance) 43 | - David B / does final author copy count here? i.e. to satisfy open access - where it's mandated by funder 44 | - Martin O / Pre-submission or post-acceptance (depending on journal policy) if journal paper is not open access, as I always want a freely available copy. I'd like to move my default to pre-submission pre-print as standard practice 45 | - Callum / I publish a pre-print at paper submission. Primarily for faster access since the review process can be so sluggish sometimes. I also like that the review process is then transparent (if the paper changes a lot from pre-print to journal article). 46 | - Amy Tabb / most of the time when publishing w/ CS/ECE researchers. 47 | 48 | **And when you don't?** 49 | 50 | *Name / response* 51 | 52 | - Dmytro / Only if co-authors are not allowed to. 53 | - Jessica / Review article requested by the journal (depending on journal policy) 54 | - Alessandro / In our field (organisation studies) it is extremely uncommon to do it altogether. Multiple journals ask you to take it down (few do it nonetheless) and some do not accept articles already posted online 55 | - Amy Tabb / when the lead authors are not in favor and/or it is not in the discipline's tradition (entomology). 56 | 57 | 58 | **How do you select which new preprints to read?** 59 | 60 | *Name / response* 61 | 62 | - Leontien / Mainly shared by people on Twitter [name=DavidB] +1 [name=Martin O] +1 63 | - Dmytro/ http://www.arxiv-sanity.com/, twitter 64 | - Jessica / Twitter - we have cataloged some efforts here: https://reimaginereview.asapbio.org/explore/?search_keywords=preprint&sort=latest 65 | - Amy Tabb / twitter 66 | 67 | **Do you ever question your approach?** 68 | 69 | *Name / response* 70 | 71 | - Dmytro / No. I thought about it a lot, but cannot find reasons not to for myself. 72 | - Amy Tabb / Also no. Preprinting has been very positive for my work and allows me to transfer the technology.
73 | 74 | **Do you then regularly read the final paper when it is published?** 75 | 76 | *Name / response* 77 | 78 | - David B / Nope, things have often moved on well before. Makes it tricky as to what to cite, the preprint or the final 79 | - Leontien / Depends on the type of work it is, as some work will be outdated quite quickly 80 | - Dmytro / rarely, mostly if it is updated on arXiv and the paper is very relevant to me 81 | - Jessica / we have recommended that preprint servers implement changelog in metadata, would be good to see this for journals as well: https://asapbio.org/biopreprints2020-report 82 | - Demitra/ At F1000Research we combine the preprint with the open post-pub peer review process, with all versions linked and an amendment box explaining what has changed between versions 83 | - Amy Tabb / not frequently because of access issues. 84 | 85 | 86 | :books: Initial Drafted Notes 87 | --- 88 | The Current and Future Role of Preprints Across Research Communities 89 | 90 | Preprints - scholarly or scientific papers that precede formal peer review and publication in a peer-reviewed scholarly or scientific journal. Some communities use them a lot more than others; there has also been a rapid increase in preprints in the last decades. 91 | 92 | Why publish preprints? 93 | - Increased visibility 94 | - Increased citations 95 | - Faster dissemination of results 96 | - May prevent scooping 97 | - It might be an easy way to wrap up a side project 98 | - Bypassing paywall 99 | 100 | Anna discusses the behaviour in NLP regarding preprints. Main point of preprints here is to try and get results out faster. Preprints can be very different from the published version, Anna gives an example of her own work which turned out completely different than the initial preprint. She talks about how even if the published paper turns out much better, not a lot of people revisit it.
101 | Dmytro disagrees, and talks about the fact that he does read an updated version if it is published. 102 | Martin would love to see how a preprint can change over time, a change log would be very helpful with that. Are they even still the same paper over time? 103 | 104 | There are very few journals in the life sciences that point to the preprints. Jessica talks about how journals should acknowledge that preprints exist. 105 | 106 | Some fields seem to be more comfortable citing preprints than other fields. In NLP citations for preprints seem to be more common than citing the actual published work. 107 | 108 | Across our different disciplines there are different approaches to editing and finishing off the final version of a published paper. 109 | 110 | Another point that we touched upon is the sheer number of preprints, and this slightly touches on the topic of trust of preprints. How do you select them? What approach is used here? 111 | 112 | Demitra from F1000 Research talks about how she approaches different disciplines. They are looking at the differences between fields, for example, for some fields you need a PhD to be considered an expert, whilst others view a Master's degree as sufficient. Because the reviews are open and citable, it encourages people to do a better job at reviewing. 113 | 114 | Dmytro is concerned about early career researchers writing open critical reviews, which could impact their careers. Demitra talks about how you can team up and protect yourself from these types of situations. 115 | 116 | Preprints are increasing the speed of research, but does this push research into a certain direction? And is that the direction that you want to go in? Is this harmful for the research community? 117 | 118 | Jessica - making science openly accessible and making it possible for everyone to provide their expertise gives the writers much better feedback than a more traditional peer-review approach.
But the downside of this is that it is easier to disseminate misinformation. 119 | 120 | Rennie, from a digital humanities background, takes an example of a journal from her field (Cultural Analytics). This journal is against preprints, because they can disrupt the blind peer-review process. Both Dmytro and Martin question why this process is blind. Anna would like to preserve anonymity, her blogpost about this is linked in the section below for more details. 121 | 122 | Alessandro talks about how his field does not perceive preprints very well, they are not very well known either. Also, the discussion is different depending on the research methods, qualitative and quantitative material will need different approaches. He closes by mentioning how we should rethink the journal process. 123 | 124 | :mag: Main arguments from the discussion 125 | --- 126 | - A preprint may change drastically between the time it is made available and when the actual paper is published. This raises questions around what is actually being cited, as the finished paper could be very different 127 | - Peer review is perceived very differently across different fields. It is difficult to find the balance between giving researchers credit for this work, but also keeping people's reputation intact if they provide a critical review, especially if this is an early career researcher. 128 | - The benefits of preprints differ across fields, in the section above some examples have been given. However, there was a strong positive perception of publishing preprints. But some fields may be more used to using them than others.
129 | 130 | 131 | 132 | 133 | :books: Reference and other works mentioned during the discussion 134 | --- 135 | There is a tool for [seeing changes in arxiv papers](https://github.com/temken/comparxiv) 136 | 137 | [ReImagineReview](https://reimaginereview.asapbio.org/explore/?search_keywords=preprint&sort=latest) 138 | 139 | [F1000 Research](https://f1000research.com/about) 140 | 141 | Anna Rogers - [Should the reviewers know who the authors are?](https://hackingsemantics.xyz/2020/anonymity/#BharadhwajTurpinEtAl_2020_De-anonymization_of_authors_through_arXiv_submissions_during_double-blind_review) 142 | 143 | Dmytro Mishkin & Amy Tabb - [(part I) Hands off Arxiv!](https://amytabb.com/ts/2020_06_29/) 144 | 145 | Dmytro Mishkin & Amy Tabb - [(part II) What does it mean to publish your scientific paper in 2020?](https://amytabb.com/ts/2020_08_21/) 146 | 147 | Amy Tabb - [arXiv paper explainer](https://amytabb.com/ts/2020_08_09/) 148 | 149 | [Data feminism](https://mitpress.mit.edu/books/data-feminism) as an example of sharing qualitative research before publication 150 | 151 | [Twitter thread](https://twitter.com/annargrs/status/1301204793235566600) by Anna Rogers wrapping up the discussion 152 | 153 | :closed_book: Closing thoughts 154 | -- 155 | 156 | Next aspects on the topic that we can discuss in further sessions: 157 | * open peer review in the humanities 158 | * preprints / working draft and qualitative research 159 | * being a reviewer as a "job" 160 | -------------------------------------------------------------------------------- /meeting-minutes/notes-2020-05-20.md: -------------------------------------------------------------------------------- 1 | # Notes: 20 May, 2020 2 | 3 | - Hosts: Katie and Fede 4 | 5 | 6 | ## Topic 7 | 8 | - The Computational Humanities and Toxic Masculinity? 
A (long) reflection ([Original blogpost](https://latex-ninja.com/2020/04/19/the-computational-humanities-and-toxic-masculinity-a-long-reflection/), [Our slides](https://docs.google.com/presentation/d/11qi43HYFjogFJV36u2pS0CPLqP43d_Ypvv4pYvxuKVY/edit?usp=sharing)) 9 | - Katie McDonough (from Living with Machines) will introduce the topic and Fede chairs the debate. 10 | 11 | ## Aim of this meeting 12 | 13 | - Having a conversation rather than a one-to-many reading group 14 | - Discussing topics at the intersections of the two disciplines 15 | - Trying to consider different / uncommon points of view 16 | - Chatting over lunch (a tea / beer) to make it as informal and relaxed as possible 17 | - Trying to have this the first Wednesday of every month (up for discussion) 18 | 19 | **Participants (write your names below)** 20 | 21 | Name / Institute / What brought you here? (answer in a short sentence) 22 | - Federico Nanni / Turing / Leading the session 23 | - Leontien Talboom / UCL / Leading the session 24 | - Katie McDonough / Turing / Chairing the session 25 | - Malvika Sharan / Turing / Want to capture different perspective in the Turing Way project 26 | - Barbara McGillivray / Turing and Cambridge 27 | - Ismael Kherroubi Garcia / Ethics Research Assistant at Turing 28 | * Sarah Gibson / I'm a Research Software Engineer in the Research Engineering Group at the Turing. I'm an advocate for reproducible research and work on open projects like mybinder.org and The Turing Way. I'm also on the Living with Machines project at the Turing. 
29 | * James Smithies from King’s Digital Lab, King’s College London * Glen Cameron / Illinois (US) working at HathiTrust Research Center 30 | * Laura Carter / Human Rights Centre at the University of Essex, currently an Enrichment student at the Turing 31 | * Arianna Ciula / Deputy Director and Senior Research Software Analyst at King’s Digital Lab, King’s College London (UK) 32 | * Scott Bailey / Data and Visualization Librarian at NC State University Libraries, but previously worked at the Scholars’ Lab @ UVa, and at Stanford’s Center for Interdisciplinary Digital Research (CIDR) 33 | * Eirini Goudarouli / Heads of Digital Research Programmes at The National Archives, UK 34 | * Jane Winters / School of Advanced Study, University of London. 35 | * James Cummings / Newcastle University, DH, Late Medieval Drama, TEI geek, that sort of thing. 36 | * Luca Scholz / Lecturer in Digital Humanities at the University of Manchester (UK) 37 | * David Beavan / Turing Research Engineering. Amongst other things, I’m co-organiser of the Humanities & Data Science SIG at the Turing, find out more here: https://www.turing.ac.uk/research/interest-groups/humanities-and-data-science 38 | * Kaspar Beelen / Research Associate at the Alan Turing Institute (Living with Machines project) 39 | * Charlotte Tupman / Digital Humanities Lab at the University of Exeter. Into ancient inscriptions. 40 | * Giulia Occhini / PhD student at the Turing in Data Science/NLP/Digital Humanities and other stuff 41 | * Sarah Lang / (also known as The LaTeX Ninja and author of the post discussed today) - my non-Ninja-self works at the Centre for Information Modellierung (Zentrum für Informationsmodellierung) in Graz, doing my PhD in Digital Humanities on early modern science / alchemy. My internet isn't always stable, so no permanent video ;) 42 | * Melvin Wevers / DHLab of the KNAW Humanities Cluster in Amsterdam. One of the organizers of the Computational Humanities Research workshop. 
43 | * Kevin Xu / Research Software Engineer at the Turing.
 44 | * Glen Worthey/ U. of Illinois, Urbana-Champaign, at the HathiTrust Research Center. Thanks to Katie (my former Stanford colleague) for the invitation from across the Atlantic. Great to see many old friends and colleagues, looking forward to meeting new ones! 45 | * David De Roure / (Dave D as opposed to Dave B…), a Turing Fellow and my project is AI and music (I’m a digital musicologist, also known occasionally as a computational musicologist…). In Oxford I look after the Digital Humanities network (DH@Ox) which comes together annually in the DHOx Summer school (a cut-down 3 day online event this year). I’m a visiting prof at the Royal Northern College of Music working on science and music. I’m also involved in the UKRI research and innovation infrastructure exercise. 46 | * Daniel van Strien / I work at the BL as a digital curator.
 47 | * Olivia Vane / Research Software Engineer at the British Library (Living with Machines project) 48 | 49 | :dart: Discussion Goal 50 | --- 51 | 52 | - 53 | 54 | 55 | :books: Reference and other works mentioned during the discussion 56 | --- 57 | 58 | - Gender bias before and after “Computational Humanities,” some starting points 59 | - [Beyond the Margins: Intersectionality and the Digital Humanities](https://www.digitalhumanities.org/dhq/vol/9/2/000208/000208.html), DHQ (2015) by Roopika Risam 60 | - [The Radical Potential of the Digital Humanities](https://blogs.lse.ac.uk/impactofsocialsciences/2015/08/12/the-radical-unrealized-potential-of-digital-humanities/), Miriam Posner 61 | - [Bodies of Information](https://dhdebates.gc.cuny.edu/projects/bodies-of-information), ed. by Jacqueline Wernimont and Elizabeth Losh (2018) 62 | - [DH-WoGeM](http://www.dhwogem.org/) 63 | - [Data Feminism by Catherine D’Ignazio and Lauren F. Klein](https://bookbook.pubpub.org/data-feminism) 64 | - [LaTeX Ninja blog post](https://latex-ninja.com/2020/04/19/the-computational-humanities-and-toxic-masculinity-a-long-reflection), 19 April 2020 (the author is here!) 65 | 66 | ### Initial Drafted Notes 67 | 68 | 69 | Here we can collaboratively take notes on the main passages of the conversation, which we can then organize as below. 70 | - Does computational skill denote some power structure? - is it assumed to be masculine (and hence to exclude women or other genders) 71 | - Digital humanities has grown out of the humanities by way of techie people; similarly, computational humanities seems to have come out of folks in computation who are interested in humanities - not sure if that's what creates a niche (Comment from Zoom: Does CH vs DH play back into long disproved stereotypes of The Two Cultures? Or is it different from that?)
72 | - A lot of the points in the blog post echo the human rights approach that people in the legal space talk about 73 | - Problem of binaries: techie - fuzzy divide (techies are engineers and fuzzies are historians and literature folks) that separates the intellectual community on a campus 74 | - There are indeed gendered aspects that exist in many research spaces, and as a research community we should think about how we manage these privileges and power dynamics 75 | - By the author on what led her to draft the post 76 | - The motivation comes from the lack of full understanding of what Computational Humanities actually stood for 77 | - In many languages this doesn't exist as a field 78 | - She noticed that some jobs in humanities are offered to computer scientists because they can do machine learning and a qualified humanities specialist might not 79 | - Many conferences highlight only computational visualisation and not so much the humanities 80 | - Women and men will have the same chances of getting selected if they work on the same topic, but what if a field is also gender biased and that's the field that gets more focus 81 | - Privilege hazard: when you have privilege you don't see the problem 82 | - Often people get offended by people pointing out less privilege. They are afraid to speak up, and therefore having a safe space is useful. 83 | - ![](https://i.imgur.com/nTqIN1y.jpg) 84 | 85 | - How power dynamics influence the way we do research 86 | - When working in science there is an attitude of "verificationism" - as people who don't have the same lived experience want to understand what others are talking about (putting yourself in the shoes of other genders) - this causes frustration on both sides of the debate 87 | 88 | - People come for a value but stay for the ethos 89 | - How is DH formed, and what aspects are being considered? 90 | - Why is a separate community being formed?
91 | 92 | - Melvin: I think it's not a clear separation, in our view it's more like a special interest / subcommunity within the larger community 93 | - James: Glen, It may also be different in the different regionalities of DH, where there are different focuses. 94 | - James: Sarah, Maybe the perception of marginalisation is something we all have but to more/lesser degrees? 95 | 96 | 97 | ### Some comments from the chat 98 | 99 | - From James Cummings to Everyone: (4:32 pm): I'm not sure women have the same chance of getting accepted to a conference. At least that isn't what the statistics over a long period seem to show. There is a PI-goes-to-the-conference, and then far too many of those are still men. 100 | - From Ismael to Everyone: (4:34 pm): Having zero background in computational or digital humanities, I only learned the term when I saw this discussion advertised! I am happy to see the definitions are vague - I have a feeling that defining what either one is (or clarifying that they are the same) could be a first step (setting aside the enormous social background through which all concepts, names, etc. are interpreted for a moment) 101 | - From Jez Cope to Everyone: (4:35 pm): My gut reaction is "we need to investigate this more" too, but I'm also aware that attitude tends to perpetuate the status quo, both because you don't have to change anything until you've investigated, but also there's a danger of confirmation bias 102 | - From James Cummings to Everyone: (4:36 pm): Ismael: There is a long history (and publications like 'Defining DH') on what is or isn't DH. I've learned from experience when someone claims they are doing DH it isn't my place to say whether that is real 'DH' or not. ;-) 103 | - From quinn dombrowski to Everyone: (4:37 pm): On representation in conference acceptance, there's this paper on DH (through 2015) which suggests underrepresentation https://scottbot.net/representation-at-digital-humanities-conferences-2000-2015/
 104 | - Sarah: Thanks! I think this is a good summary of where I wanted to go with the post - I get how continuous feminism debates can be somewhat annoying to men, but it's just like Laura said - if you have the privilege, you can't just "see" the perspective of those who don't 105 | - From quinn dombrowski to Everyone: (4:42 pm) If the stats reflect fewer women submitting, isn't that a problem too?
 106 | - From Arianna Ciula to Everyone: (4:42 pm) My reactions: names ARE important, names often mean identity especially at certain stages in life (e.g. early career); society has problems with diversity (just look at figures on salaries across sectors); DH/RSE/Computational Humanities are right to question/problematise names and question bias/problematise reifications of societal bias/problems; however you would assume we had figured out by now that instrumental and intellectual are entangled - if it isn’t this community who can articulate it best, who else? 107 | - james - Yeah, we are way past having to ask women to prove they don't feel comfortable/experience misogyny. Just believe them. (But also, forgive us privileged if we forget EDI sometimes. Prod us when we do.) 108 | - From James Cummings to Everyone: (5:02 pm): (Hopefully we'll eventually get to a place where we don't need prodding, it is normal.) 109 | - From Melvin Wevers to Everyone: (5:03 pm): @sarah, I see how in the grant-world, traditional hum is threatened by computational approaches. But having a community dealing with issues related to communities is not necessarily set up as something that invalidates this field of scholarship, communities = computation 110 | - From James Smithies to Everyone: (5:03 pm): Thanks to the organisers and everyone who shared their thoughts - really valuable for me. 111 | 112 | 113 | :mag: Main arguments from the discussion 114 | --- 115 | 116 | - 117 | 118 | :closed_book: Closing remarks/questions/topics (for future discussions!) 119 | -- 120 | 121 | - 122 | 123 | ### Additional Drafted Notes 124 | 125 | 126 | - 127 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | # Attribution 4.0 International 2 | 3 | Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice.
Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible. 4 | 5 | ### Using Creative Commons Public Licenses 6 | 7 | Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses. 8 | 9 | * __Considerations for licensors:__ Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. [More considerations for licensors](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensors). 10 | 11 | * __Considerations for the public:__ By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. 
If the licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. [More considerations for the public](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensees). 12 | 13 | ## Creative Commons Attribution 4.0 International Public License 14 | 15 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 16 | 17 | ### Section 1 – Definitions. 18 | 19 | a. __Adapted Material__ means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. 
For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 20 | 21 | b. __Adapter's License__ means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 22 | 23 | c. __Copyright and Similar Rights__ means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. 24 | 25 | d. __Effective Technological Measures__ means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 26 | 27 | e. __Exceptions and Limitations__ means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 28 | 29 | f. __Licensed Material__ means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 30 | 31 | g. __Licensed Rights__ means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 32 | 33 | h. __Licensor__ means the individual(s) or entity(ies) granting rights under this Public License. 34 | 35 | i. 
__Share__ means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 36 | 37 | j. __Sui Generis Database Rights__ means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 38 | 39 | k. __You__ means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. 40 | 41 | ### Section 2 – Scope. 42 | 43 | a. ___License grant.___ 44 | 45 | 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 46 | 47 | A. reproduce and Share the Licensed Material, in whole or in part; and 48 | 49 | B. produce, reproduce, and Share Adapted Material. 50 | 51 | 2. __Exceptions and Limitations.__ For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 52 | 53 | 3. __Term.__ The term of this Public License is specified in Section 6(a). 54 | 55 | 4. __Media and formats; technical modifications allowed.__ The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. 
The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. 56 | 57 | 5. __Downstream recipients.__ 58 | 59 | A. __Offer from the Licensor – Licensed Material.__ Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 60 | 61 | B. __No downstream restrictions.__ You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 62 | 63 | 6. __No endorsement.__ Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). 64 | 65 | b. ___Other rights.___ 66 | 67 | 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 68 | 69 | 2. Patent and trademark rights are not licensed under this Public License. 70 | 71 | 3. 
To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. 72 | 73 | ### Section 3 – License Conditions. 74 | 75 | Your exercise of the Licensed Rights is expressly made subject to the following conditions. 76 | 77 | a. ___Attribution.___ 78 | 79 | 1. If You Share the Licensed Material (including in modified form), You must: 80 | 81 | A. retain the following if it is supplied by the Licensor with the Licensed Material: 82 | 83 | i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 84 | 85 | ii. a copyright notice; 86 | 87 | iii. a notice that refers to this Public License; 88 | 89 | iv. a notice that refers to the disclaimer of warranties; 90 | 91 | v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 92 | 93 | B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 94 | 95 | C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 96 | 97 | 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 98 | 99 | 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. 100 | 101 | 4. 
If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License. 102 | 103 | ### Section 4 – Sui Generis Database Rights. 104 | 105 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 106 | 107 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; 108 | 109 | b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and 110 | 111 | c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. 112 | 113 | For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 114 | 115 | ### Section 5 – Disclaimer of Warranties and Limitation of Liability. 116 | 117 | a. __Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.__ 118 | 119 | b. 
__To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.__ 120 | 121 | c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 122 | 123 | ### Section 6 – Term and Termination. 124 | 125 | a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 126 | 127 | b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 128 | 129 | 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 130 | 131 | 2. upon express reinstatement by the Licensor. 132 | 133 | For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 134 | 135 | c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 136 | 137 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License. 138 | 139 | ### Section 7 – Other Terms and Conditions. 140 | 141 | a. 
The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 142 | 143 | b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 144 | 145 | ### Section 8 – Interpretation. 146 | 147 | a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 148 | 149 | b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 150 | 151 | c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 152 | 153 | d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 154 | 155 | > Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the [CC0 Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/legalcode). 
Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at [creativecommons.org/policies](http://creativecommons.org/policies), Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses. 156 | > 157 | > Creative Commons may be contacted at creativecommons.org 158 | --------------------------------------------------------------------------------