├── LICENSE ├── README.md ├── assignments ├── assignment-01.md ├── assignment-02.md ├── assignment-03.md └── assignment-04.md ├── modules ├── images │ ├── .ignore │ ├── module-02-unt-homepage-source.png │ ├── module-02-unt-homepage.png │ ├── module-04-save-options.png │ ├── module-05-save-page-now.png │ ├── module-06-archive-today.png │ ├── module-07-oldweb-today-01.png │ ├── module-07-oldweb-today-02.png │ ├── module-08-archiveready-01.png │ ├── module-08-archiveready-02.png │ ├── module-08-arquivo.png │ ├── module-09-awp-01.png │ ├── module-09-ukwa.png │ ├── module-10-time-travel-01.png │ ├── module-10-time-travel-02.png │ ├── module-10-trove-01.png │ ├── module-10-trove-02.png │ ├── module-10-trove-03.png │ ├── module-11-commoncrawl-01.png │ ├── module-11-replayweb-01.png │ ├── module-11-replayweb-02.png │ ├── module-11-replayweb-03.png │ ├── module-13-conifer-01.png │ ├── module-13-conifer-02.png │ ├── module-13-conifer-03.png │ ├── module-13-conifer-04.png │ ├── module-14-robust-01.png │ ├── module-14-robust-02.png │ └── module-14-robust-03.png ├── module-00-introductions.md ├── module-01-what-is-a-web-archive.md ├── module-02-what-is-the-web.md ├── module-03-who-does-web-archiving.md ├── module-04-technology-overview.md ├── module-05-capture.md ├── module-06-preserve.md ├── module-07-playback.md ├── module-08-other-tools.md ├── module-09-collection-policies.md ├── module-10-metadata.md ├── module-11-quality-assurance.md ├── module-12-research.md ├── module-13-intellectual-property-ethics.md └── module-14-future-of-web-archive.md ├── syllabus-5960.001-Web-Archiving-2022-Spring.md └── syllabus-5960.001-Web-Archiving-2023-Spring.md /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal 
advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. 
If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. 
For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. 
Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. 
The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 188 | 189 | b. Other rights. 190 | 191 | 1. 
Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. 
indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. 
UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. 
automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. 
No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org.
396 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Web Archiving Course 2 | 3 | Primary Author: 4 | * Mark Phillips, Ph.D. 5 | * mark.phillips@unt.edu 6 | * https://twitter.com/vphill 7 | 8 | ## Overview 9 | 10 | This course is divided into [modules](./modules/) that are designed to build sequentially throughout the course. 11 | 12 | There are fourteen modules in total starting the first week of class and ending the last week of the semester. 13 | 14 | Each week there is an **_Overview and Objectives_** section which gives students an idea of what is going to be happening in that week's module. 15 | 16 | The **_Readings_** page will list the required and optional readings for that week. In addition to traditional readings, there are videos and sometimes audio recordings for students to listen to. 17 | 18 | A feature of each module called _**Exploring Web Archives**_ is included to give students a bit of a guided introduction to the wide range of web archives and collections that exist around the world. These explorations are generally a part of the weekly discussion. 19 | 20 | Finally, in each module there is a **_Discussion_** related to the topics covered that week. Additionally, the web archives the students explored in the Exploring Web Archives section will come back as part of the weekly discussion. 21 | 22 | Generally, the current week's module and the following week's module will be published for students who like to work a bit ahead. The graded discussions are opened on Monday morning of the module week. 23 | 24 | In addition to weekly readings, activities, and discussions, there are four major [assignments](./assignments/) spread across the semester.
The final assignment is informed by the third assignment. 25 | 26 | ## Sections Taught 27 | 28 | This course has been taught in the following semesters. 29 | 30 | * [UNT INFO 5960.001 - Web Archiving](syllabus-5960.001-Web-Archiving-2023-Spring.md) - Spring 2023 31 | * [UNT INFO 5960.001 - Web Archiving](syllabus-5960.001-Web-Archiving-2022-Spring.md) - Spring 2022 32 | 33 | 34 | ## Corrections / Suggestions / Contributions 35 | 36 | If you notice typos, have readings that you would like to suggest, or have ideas for web archiving activities, I am interested in receiving contributions to this course. Please reach out to me directly or, if you are inclined, submit an __Issue__ with this repository. 37 | 38 | ## Acknowledgements 39 | 40 | The readings and activities in this course were in part informed by instructors of similar courses around the country. I would like to highlight the work of Samantha Abrams (WISC), Lori Donovan (UMICH), and Ayoung Yoon (UNC). 41 | 42 | ## Contributors 43 | 44 | Kristy Phillips - https://github.com/k-phillips 45 | 46 | ## License 47 | 48 |
This work is licensed under a Creative Commons Attribution 4.0 International License. 49 | -------------------------------------------------------------------------------- /assignments/assignment-01.md: -------------------------------------------------------------------------------- 1 | # Assignment 1: Web Archive Critique 2 | 3 | ## Context: 4 | The purpose of the Web Archive Critique is to explore a web archive and describe what you observe. The assignment is designed for you to share your observations about an existing web archive, including the organization responsible for creating the archive, the scope of the collection, any quality or performance issues you find, and options that might improve the experience with the collection. 5 | 6 | ## Assignment: 7 | Identify an existing web archive. This could be one that you have encountered in the course so far, or one that you discover on your own. The readings and the Exploring Web Archives portion of Module Three are good places to start looking for potential web archives. Once you select the archive, explore the collected websites to get a good idea of what is or isn’t collected as part of the archive. 8 | 9 | For this assignment, choose a topical web archive. For example, instead of the Library of Congress Web Archives as a whole, choose one of its individual archiving initiatives. Likewise, if you look at Archive-It collections, instead of choosing an institution, which might have multiple projects, you are better off choosing a more topical or event-focused web archive. 10 | 11 | In this assignment you will be describing a web archive. It is important to include links to content within the web archive that support your observations. 12 | 13 | ## Organization and Content: 14 | The analysis should have the following sections: Background, Scope, Quality, Closing, and References. 15 | 16 | ### Background 17 | 18 | Provide an overview of the web archive that you have selected.
This should include information about the collecting organization, the motivation for creating this web archive, and any technical information that is available about how the collection was built. 19 | 20 | Other questions that are good to think about (though they might not apply to all collections) include: 21 | When was the collection created? Why was it created? Does it expand upon an existing physical collection or support a program of study or scholarship at that institution? Who is responsible for the creation of this web archive? Is it a single institution or a collaborative project? What tools/platforms were used in the creation of this web archive? Is it an ongoing collection or has it been completed? How are users expected to find this web archive? How is it cataloged or described? 22 | 23 | ### Scope 24 | What is the scope of the content being collected? Does it focus on a specific event or is it based around a topic or subject? Does it contain specific kinds of websites like political websites or election websites? Does it contain websites from a specific period of time? 25 | 26 | Just as important as what is included, what doesn’t seem to be included in the archive? 27 | 28 | ### Quality 29 | Describe the overall quality of the captured websites in the archive. Are there types of content that don’t seem to display or render in the playback tool very well? Does most of the content display correctly? If there are specific websites or website features that don’t seem to display well, discuss them in this section. Include links to examples that demonstrate the issues you identify whenever possible. 30 | 31 | ### Closing 32 | Summarize your overall observations of the web archive you chose. Discuss the feedback you would give the collection creators if given the opportunity. Provide any suggestions for additional content that might be missing from the collection. 33 | 34 | ### References 35 | Any references cited within the document.
36 | 37 | ## Layout Specifics: 38 | 2-3 pages of textual content (1000-1500 words), font-size 11 pt, double-spaced with 1-inch margins throughout the document. 39 | 40 | Feel free to include screenshots as needed to provide examples or highlight points. Don’t try to fill up space with screenshots; if they make your document a bit longer than three pages, that isn’t a problem. 41 | 42 | Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA. 43 | 44 | Put your last name in the upper right margin. Include pagination in the bottom margin. 45 | 46 | Name the document Assignment1_lastname.docx, Assignment1_lastname.doc, or Assignment1_lastname.odt depending on which tool you use. You will submit this to the **Major Assignment: Web Archive Critique** in Canvas. 47 | 48 | ## Grading Rubric 49 | Design (10 points) 50 | * Does the document follow the specific instructions for the assignment? 51 | * Does the document contain the correct information in the header and footer? 52 | * Does the document use appropriate margins, line spacing, and font size? 53 | * Is the document’s length appropriate based on the instructions? 54 | 55 | Content (30 points) 56 | * Does the document introduce the selected web archive? 57 | * Does the document identify the institutions involved with the creation of the web archive? 58 | * Does the document describe the scope of content in the web archive? 59 | * Does the document discuss the quality of the web captures contained in the web archive? 60 | * Does the document include critique or suggestions for improvement? 61 | 62 | Linking and Citations (5 points) 63 | * Does the document include links to the web archive? 64 | * Does the document include links to support observations or critique? 65 | * Does the document include links to examples of quality issues? 66 | * Does the document include properly formatted citations?
67 | 68 | Delivery (5 points) 69 | * Was the document submitted to the correct assignment module on Canvas? 70 | * Was the document submitted on time? 71 | * Was the document submitted in the correct file format (.doc, .docx, .odt)? 72 | * Was the document submitted with the correct file name? 73 | 74 | 75 | 76 | -------------------------------------------------------------------------------- /assignments/assignment-02.md: -------------------------------------------------------------------------------- 1 | # Assignment 2: Web Archive Tools Critique 2 | 3 | ## Context: 4 | The purpose of the Web Archive Tools Critique is to learn to critically evaluate technology tools and services related to the web archiving process. The assignment is designed for you to share your observations about an existing tool or service related to the broad practice of archiving the web. These observations will include who is responsible for creating the tool/service, what problem it is trying to solve, whether other tools in this space do something similar or different, what costs (if any) are associated with the tool, and how users generally interact with the tool (hosted service, locally run software, browser plugin, or tool requiring a server). 5 | 6 | ## Assignment: 7 | Identify an existing tool used in the web archiving space. I suggest starting with the “Tools and Software” section of the Awesome Web Archiving list from the IIPC (https://github.com/iipc/awesome-web-archiving), but if you have another tool or service in mind that isn’t on that list, feel free to use it; the tool just has to be related to the overall web archiving process. Once you select the tool/service, explore the documentation/website for that tool/service to get a good idea of what problem it is trying to solve and to gather other information about the creation of the tool/service.
8 | 9 | While it isn’t strictly required, I suggest you pick a tool or service that you are able to download, create an account with, or generally use as part of your assignment. I couldn’t imagine writing about a tool I am not able to see, test, or use in any sort of believable way, and you will find this assignment much easier if you pick something you can actually test. 10 | 11 | In this assignment it is important to include links to relevant web pages and documentation related to the tool that support your observations. 12 | 13 | ## Organization and Content: 14 | The assignment should start with the title “Assignment 2: Web Archive Tools Critique” centered at the top of the first page. 15 | 16 | The analysis should have the following section headings: Introduction, Problem Space, Technology Requirements, Assessment, and References. 17 | 18 | ### Introduction 19 | Provide an overview of the web archive tool or service that you have selected. This should include information about the tool itself, the motivation for creating this tool or service, and a brief description of what this tool or service does. Other useful questions to address in the introduction include: Who is responsible for this tool? How long has this tool or service been around? What kind of software license does it have? What are the costs involved with using this tool? 20 | 21 | ### Problem Space 22 | Provide a discussion of why this tool or service exists. What is the problem space that this tool or service was created for? Go into more detail than in the introduction about what the tool or service actually accomplishes. How does this tool help to solve or minimize the issues in the problem space? 23 | 24 | ### Technology Requirements 25 | Describe the technology requirements for using this kind of tool or service. Is it designed to be downloaded and run on a desktop or run as a server? Is it a hosted service that requires an account and a subscription?
Which environments can the tool be installed in: Windows, Linux, Mac? If you have a chance to install the tool and try using it, what experience did you have in the process? 26 | 27 | You should also include any information (and links) related to online documentation for the tool or service. Does the tool or service have an online mailing list for help? Is there a help forum? 28 | 29 | ### Assessment 30 | Discuss your overall observations of the web archive tool or service that you chose. Would you recommend this tool for others to use? Do you think that it solves or minimizes the problem it exists to solve? Does it seem like the tool is still maintained and in use? What is the size of the user community of the tool? Is there enough documentation available to get the tool installed and working? Are there things that would make using the tool easier that you would suggest to the project owner? 31 | 32 | ### References 33 | 34 | Any references cited within the document. Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA. 35 | 36 | ## Layout Specifics: 37 | 2-3 pages of textual content (1000-1500 words), font-size 11 pt, double spaced with 1 inch margins throughout the document. 38 | 39 | The title for the assignment should be centered horizontally at the top of the first page (just like this document). Sections should be bolded and slightly larger than the 11pt font used in the rest of the document. I suggest using the Headings available in most word processing tools. 40 | 41 | Feel free to include screenshots as needed to provide examples or highlight points. Don’t try to fill up space with screenshots; if they make your document a bit longer than three pages, that isn’t a problem. 42 | 43 | Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA. 44 | 45 | Put your last name in the upper right margin. Include pagination in the bottom margin.
46 | 47 | Name the document Assignment2_lastname.docx, Assignment2_lastname.doc, or Assignment2_lastname.odt depending on which tool you use. You will submit this to the **Major Assignment: Web Archive Tool Critique** in Canvas. 48 | 49 | ## Grading Rubric 50 | Design (10 points) 51 | * Does the document follow the specific instructions for the assignment? 52 | * Does the document contain a title and section headings? 53 | * Does the document contain the correct information in the header and footer? 54 | * Does the document use appropriate margins, line spacing, and font size? 55 | * Is the document’s length appropriate based on the instructions? 56 | 57 | Content (30 points) 58 | * Does the document introduce the selected web archive tool or service? 59 | * Does the document identify the person/organization responsible for the creation of the tool or service? 60 | * Does the document describe the problem space the tool or service is trying to address? 61 | * Does the document discuss the functionality and features of the tool or service? 62 | * Does the document include a critique or limitations of the tool or service? 63 | 64 | Linking and Citations (5 points) 65 | * Does the document include links to the web archive tool’s website or home page? 66 | * Does the document include links to support observations or critique? 67 | * Does the document include properly formatted citations? 68 | 69 | Delivery (5 points) 70 | * Was the document submitted to the correct assignment module on Canvas? 71 | * Was the document submitted on time? 72 | * Was the document submitted in the correct file format (.doc, .docx, .odt)? 73 | * Was the document submitted with the correct file name?
74 | 75 | -------------------------------------------------------------------------------- /assignments/assignment-03.md: -------------------------------------------------------------------------------- 1 | # Assignment 3: Web Archive Collection Plan 2 | 3 | ## Context: 4 | 5 | The purpose of the Web Archive Collection Plan is to apply the things you have learned so far in this course and begin to think about building an actual web archive collection. The assignment is designed to give you experience in developing a collection plan that relates to a web archive that you will be building in the final project for this course. This assignment will make use of an existing set of collection planning guidelines developed by Murray and Hsieh (2006). 6 | 7 | ## Assignment: 8 | 9 | For this assignment you will make use of the Collection Planning Guidelines by Murray and Hsieh (https://digital.library.unt.edu/ark:/67531/metadc33006/), which were introduced in the Collection Policies Module of this course. This set of guidelines provides information about what to include in a collection plan. While you should be familiar with the entire document, Section 3, Creating a Web Collection Plan (p. 18), is the part you will want to read most closely. We will be developing our Collection Plans to include Sections 1-4: 10 | 11 | * Section 1. Mission & Scope 12 | * Section 2. Selection Activities 13 | * Section 3. Web Site Acquisition 14 | * Section 4. Descriptive Metadata Requirements 15 | 16 | This assignment will feed into your final project for this course, which involves the creation of a small web archive collection using the Conifer tool. More information about the final project will be released in a few weeks. For now, this assignment will start you thinking about building a web archive collection, and the final project will follow up with actually doing the crawling.
17 | 18 | For this assignment, you will be creating the Collection Plan for a collection of archived web pages. The exact topic of the collection is up to you. You can choose a topic or subject-based collection, event-based collection, or organizational collection. You will want to scope your collection broadly enough that it can eventually include at least 15 seed URLs, but not so broadly that it becomes infeasible to create as part of this course. Some sections (such as the Mission) might require you to get creative to complete. Feel free to use your imagination here: either create a fictional organization that this collection belongs to, or develop the collection on behalf of an existing institution. 19 | 20 | For this assignment, you will need to identify 5-10 seed URLs you will include in your web archive collection. In addition to each seed URL, you will include a description of why that seed has been included in this collection. 21 | 22 | An example collection plan using these guidelines for the CyberCemetery is available for reference at https://digital.library.unt.edu/ark:/67531/metadc36313/ 23 | 24 | ## Organization and Content: 25 | The assignment should start with the title “Assignment 3: Web Archive Collection Plan” centered at the top of the first page. 26 | 27 | The following layout, described in Appendix A (Web Collection Plan Outline, p. 41) of the Collection Planning Guidelines, is a good starting place for organizing your document. 28 | 29 | Section 1. Mission & Scope 30 | 31 | 1. Mission Statement 32 | 2. User Group(s) 33 | 3. Collection Subject, Theme, or Event 34 | 4. Curator(s) 35 | 36 | Section 2. Selection Activities 37 | 38 | 1. Seed List 39 | (Include 5-10 seed URLs for your web archive collection and include a description of why they are included in this collection. These will be used in the final project.) 40 | 1. URL(s) 41 | 2. Brief Description(s) 42 | 2. Initial Boundary Specification 43 | 1.
Depth of linked web pages within the seed URL host 44 | 2. Inclusion or exclusion of linked web pages from external hosts for each seed URL host 45 | * Depth of linked web pages from external hosts (if included) 46 | 3. Rights Metadata 47 | 1. Rights designation 48 | 2. Rights metadata 49 | 3. Linked and sourced objects 50 | 51 | Section 3. Web Site Acquisition 52 | 53 | 1. Frequency of Capture 54 | 1. Date 55 | 2. Interval 56 | 2. Capture Boundaries 57 | 1. Depth of linked web pages within the seed URL host 58 | 2. Inclusion or exclusion of linked web pages from external hosts for each seed URL host 59 | * Depth of linked web pages from external hosts (if included) 60 | 3. Material Types & Formats 61 | 1. Excluded types 62 | 2. Excluded formats 63 | 4. Interactive & Dynamic Content 64 | 1. Authentication (username/password) 65 | 2. Email links 66 | 3. Forms 67 | 4. Database-generated pages (based on user queries) 68 | 5. Dynamically or programmatically generated web pages 69 | 70 | Section 4. Descriptive Metadata Requirements 71 | 72 | 1. Level of Description 73 | 1. Collection Level 74 | 2. Web Site Level 75 | 3. Information object level 76 | 2. Metadata elements 77 | 1. Essential 78 | 2. Desirable 79 | 3. Controlled vocabularies 80 | 81 | **References** 82 | Any references cited within the document. Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA. 83 | 84 | ## Layout Specifics: 85 | 4-5 pages of textual content, font-size 11 pt, double spaced with 1 inch margins throughout the document. Include page numbers in the bottom right side of the footer. Each of the four sections for this document should fill roughly a page. 86 | 87 | The title for the assignment should be centered horizontally at the top of the first page (just like this document). Sections should be bolded and slightly larger than the 11pt font used in the rest of the document. I suggest using the Headings available in most word processing tools. 
88 | 89 | Feel free to include screenshots as needed to provide examples or highlight points. 90 | 91 | Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA. 92 | 93 | Put your last name in the upper right margin. Include pagination in the bottom margin. 94 | 95 | Name the document Assignment3_lastname.docx, Assignment3_lastname.doc, or Assignment3_lastname.odt depending on which tool you use. 96 | You will submit this to the **Major Assignment: Web Archive Collection Plan** in Canvas. 97 | 98 | ## Grading Rubric 99 | 100 | Design (10 points) 101 | * Does the document follow the specific instructions for the assignment? 102 | * Does the document contain a title and section headings? 103 | * Does the document contain the correct information in the header and footer? 104 | * Does the document use appropriate margins, line spacing, and font size? 105 | * Is the document’s length appropriate based on the instructions? 106 | 107 | Content (30 points) 108 | * Does the document introduce the mission and scope of the collection? 109 | * Does the document identify the collection focus such as subject, theme, or event? 110 | * Does the document have at least five seed URLs and descriptions? 111 | * Does the document discuss the frequency of capture and capture scope? 112 | * Does the document include information about metadata elements? 113 | 114 | Linking and Citations (5 points) 115 | * Does the document have at least five seed URLs and descriptions? 116 | * Does the document include necessary citations? 117 | * Does the document include properly formatted citations? 118 | 119 | Delivery (5 points) 120 | * Was the document submitted to the correct assignment module on Canvas? 121 | * Was the document submitted on time? 122 | * Was the document submitted in the correct file format (.doc, .docx, .odt)? 123 | * Was the document submitted with the correct file name?
124 | -------------------------------------------------------------------------------- /assignments/assignment-04.md: -------------------------------------------------------------------------------- 1 | # Assignment 4: Building a Web Archive 2 | 3 | ## Context: 4 | 5 | The Building a Web Archive assignment serves as the final project for this course. It has been designed to provide you with an opportunity to draw from the readings, discussion, and assignments you have worked on in previous modules of this course, and build a sample web archive based on your Assignment 3: Web Archive Collection Plan. 6 | 7 | ## Assignment: 8 | 9 | For this assignment you will make use of the Web Archive Collection Plan that you submitted in Assignment 3 and begin to build that web archive. 10 | 11 | We will use the free Conifer (previously called Webrecorder.io) tool located at https://conifer.rhizome.org/ 12 | 13 | In Module 13 you signed up for a free account, created a public collection, and uploaded a link for that collection to the class discussion board. You will be adding seed URLs to that collection and making use of the tools and functionality in Conifer to conduct your crawls of those seeds. 14 | 15 | If you have questions about using the Conifer tool, I suggest you begin with the Conifer help guides (https://guide.conifer.rhizome.org/) or look at some of the YouTube videos that discuss the operation of the service. Overall, the tool should be familiar from the Web Archive Exercises in class. 16 | 17 | For your final project you will need to expand your initial seed list to a total of at least 15 seeds. 18 | 19 | For each of the seeds you will document at least the following pieces of information: 20 | 21 | | Metadata Fields | Description of Field | 22 | |-------------------------------------|------------------------------------------------------------------------------| 23 | | Seed URL |Seed URL you will be collecting.
| 24 | | Pre-Crawl Review: | Problems that might exist for a crawler. | 25 | | Title of Seed: |The title of the seed/document/website | 26 | | Description of Seed URL: | Textual description of the seed URL | 27 | | Creator/Author/Publisher: | Who is responsible for the creation of this seed URL | 28 | | Reason for Inclusion in Collection: | Why did you include this seed in your collection? | 29 | | Post-Crawl Review: | After crawling, what limitations do you notice in your crawled content? | 30 | | Crawled Seed URL | Link directly to the seed URL in your collection in Conifer. | 31 | 32 | You are welcome to include other metadata fields that are appropriate for the type of collection you are creating. These could include country, language, branch of government, Olympian name, political party, or anything else that would be helpful for a user trying to use your collection. The pre- and post-crawl reviews are opportunities to communicate the crawling challenges you see based on your experience in this course. They are also a way of describing any quality issues that you identify in your collection after you have completed your crawls. 33 | 34 | You are free to present the metadata and fields in any format that works for you; just make sure it is clear what the fields are and which seed URL they belong to. 35 | 36 | When you are crawling your seeds, you should keep two things in mind. First, you have a limited amount of space (5GB) for this work, so don’t go crazy trying to capture a ton of video, for example. Second, you want to make sure you capture each seed at an appropriate level that fits within the scope of your collection. You may not need to capture the site in its entirety; just make sure you discuss what you did and didn’t crawl in your Pre-Crawl and Post-Crawl Review sections. 37 | 38 | ## Organization and Content: 39 | 40 | The assignment should start with the title “Assignment 4: Building a Web Archive” centered at the top of the first page.
41 | 42 | The following sections should be present as headings. 43 | 44 | ### Collection Overview 45 | This section outlines the collection you are building, including links to the public collection page in the Conifer service. When writing this overview, take into account the seeds you have added beyond those in your Collection Plan so that the overview reflects the expanded collection. In this section you should speak to the crawl modality of the web archive you are creating (domain, website, topical, event, document). 46 | 47 | ### Seed List 48 | This section will contain all 15 (or more, if you choose) seed URLs and associated metadata fields (as listed in the Assignment section above). You may format these in whatever way you feel best conveys the information clearly. 49 | 50 | ### References 51 | Any references cited within the document. Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA. 52 | 53 | ## Layout Specifics: 54 | 55 | Font-size 11 pt, double spaced with 1 inch margins throughout the document. Include page numbers in the bottom right side of the footer. 56 | 57 | The title for the assignment should be centered horizontally at the top of the first page (just like this document). Sections should be bolded and slightly larger than the 11pt font used in the rest of the document. I suggest using the Headings available in most word processing tools. 58 | 59 | Feel free to include screenshots as needed to provide examples or highlight points. 60 | 61 | Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA. 62 | 63 | Put your last name in the upper right margin. Include pagination in the bottom margin. 64 | 65 | Name the document Assignment4_lastname.docx, Assignment4_lastname.doc, or Assignment4_lastname.odt depending on which tool you use. You will submit this to the **Major Assignment: Building a Web Archive** in Canvas.
66 | 67 | ## Grading Rubric 68 | 69 | Design (15 points) 70 | * Does the document follow the specific instructions for the assignment? 71 | * Does the document contain a title and section headings? 72 | * Does the document contain the correct information in the header and footer? 73 | * Does the document use appropriate margins, line spacing, and font size? 74 | * Is the document’s length appropriate based on the instructions? 75 | 76 | Content (65 points) 77 | * Does the document provide an overview of the collection? 78 | * Does the document identify the collection crawl modality? 79 | * Does the document have at least fifteen seed URLs? 80 | * Does the document include the required metadata fields with each seed URL? 81 | * Does the archived content reflect the seed list and the pre- and post-crawl reviews? 82 | 83 | Linking and Citations (10 points) 84 | * Does the document have at least fifteen seed URLs and appropriate metadata? 85 | * Does the document include necessary citations? 86 | * Does the document include properly formatted citations? 87 | 88 | Delivery (10 points) 89 | * Was the document submitted to the correct assignment module on Canvas? 90 | * Was the document submitted on time? 91 | * Was the document submitted in the correct file format (.doc, .docx, .odt)? 92 | * Was the document submitted with the correct file name?
93 | -------------------------------------------------------------------------------- /modules/images/.ignore: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /modules/images/module-02-unt-homepage-source.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-02-unt-homepage-source.png -------------------------------------------------------------------------------- /modules/images/module-02-unt-homepage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-02-unt-homepage.png -------------------------------------------------------------------------------- /modules/images/module-04-save-options.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-04-save-options.png -------------------------------------------------------------------------------- /modules/images/module-05-save-page-now.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-05-save-page-now.png -------------------------------------------------------------------------------- /modules/images/module-06-archive-today.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-06-archive-today.png 
-------------------------------------------------------------------------------- /modules/images/module-07-oldweb-today-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-07-oldweb-today-01.png -------------------------------------------------------------------------------- /modules/images/module-07-oldweb-today-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-07-oldweb-today-02.png -------------------------------------------------------------------------------- /modules/images/module-08-archiveready-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-08-archiveready-01.png -------------------------------------------------------------------------------- /modules/images/module-08-archiveready-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-08-archiveready-02.png -------------------------------------------------------------------------------- /modules/images/module-08-arquivo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-08-arquivo.png -------------------------------------------------------------------------------- /modules/images/module-09-awp-01.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-09-awp-01.png -------------------------------------------------------------------------------- /modules/images/module-09-ukwa.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-09-ukwa.png -------------------------------------------------------------------------------- /modules/images/module-10-time-travel-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-time-travel-01.png -------------------------------------------------------------------------------- /modules/images/module-10-time-travel-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-time-travel-02.png -------------------------------------------------------------------------------- /modules/images/module-10-trove-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-trove-01.png -------------------------------------------------------------------------------- /modules/images/module-10-trove-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-trove-02.png -------------------------------------------------------------------------------- /modules/images/module-10-trove-03.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-trove-03.png -------------------------------------------------------------------------------- /modules/images/module-11-commoncrawl-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-commoncrawl-01.png -------------------------------------------------------------------------------- /modules/images/module-11-replayweb-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-replayweb-01.png -------------------------------------------------------------------------------- /modules/images/module-11-replayweb-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-replayweb-02.png -------------------------------------------------------------------------------- /modules/images/module-11-replayweb-03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-replayweb-03.png -------------------------------------------------------------------------------- /modules/images/module-13-conifer-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-01.png 
-------------------------------------------------------------------------------- /modules/images/module-13-conifer-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-02.png -------------------------------------------------------------------------------- /modules/images/module-13-conifer-03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-03.png -------------------------------------------------------------------------------- /modules/images/module-13-conifer-04.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-04.png -------------------------------------------------------------------------------- /modules/images/module-14-robust-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-14-robust-01.png -------------------------------------------------------------------------------- /modules/images/module-14-robust-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-14-robust-02.png -------------------------------------------------------------------------------- /modules/images/module-14-robust-03.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-14-robust-03.png -------------------------------------------------------------------------------- /modules/module-00-introductions.md: -------------------------------------------------------------------------------- 1 | # Overview of a Weekly Module 2 | 3 | This course is divided into modules that correspond to the weeks in the semester. 4 | 5 | There are fourteen modules in total, starting the first week of class and ending the last week of the semester. 6 | 7 | Each week there is an **_Overview and Objectives_** which gives you an idea of what is going to be happening in this week's module. 8 | 9 | The **_Readings_** page will list the required and optional readings for that week. In addition to traditional readings, there are videos to watch and sometimes audio recordings to listen to. 10 | 11 | A feature of each module called _**Exploring Web Archives**_ is included to give you a bit of a guided introduction to the wide range of web archives and collections that exist around the world. These explorations are generally a part of the weekly discussion. 12 | 13 | Finally, in each module there is a **_Discussion_** related to the topics covered that week. Additionally, the web archives you explored in the Exploring Web Archives section will come back as part of the weekly discussion. 14 | 15 | Generally, the current week's module and the following week's module will be published for those who like to work a bit ahead. The graded discussions open on Monday morning of the module week. 16 | 17 | # Read the Syllabus 18 | 19 | Take the time to read the Syllabus for the course. 20 | 21 | %Link to Syllabus document% 22 | 23 | # Discussion - Introduce Yourself 24 | 25 | This is a pretty standard first discussion for an online course.
26 | 27 | In a paragraph or two, introduce yourself: where you are from, what program you are in, and your progress in that program. 28 | 29 | Try to share some things about yourself that will allow me and your classmates to get to know you a little better. 30 | 31 | If possible, include a picture so that we can put a face to a name in the discussions. 32 | -------------------------------------------------------------------------------- /modules/module-01-what-is-a-web-archive.md: -------------------------------------------------------------------------------- 1 | # Module One - What is a Web Archive? 2 | 3 | ## Module One - Overview and Objectives 4 | 5 | ### Overview: 6 | 7 | This week we are going to explore what a web archive is, the reasons behind building these types of collections, and finally, what kinds of content you might expect in a web archive. 8 | 9 | There are a couple of short videos to watch as well as some readings that present the concept of what a web archive is. 10 | 11 | There is a graded discussion for this module. 12 | 13 | ### Objectives: 14 | 15 | 1. Understand why web archives exist 16 | 2. Begin to interact with existing web archives 17 | 3. Explore existing web archives and report out on what you discover. 18 | 19 | 20 | ## Module One - Readings 21 | 22 | ### Web Archiving: 23 | 24 | * UK Web Archive. "What is a Web Archive?" (April 2, 2015) https://www.youtube.com/watch?v=ubDHY-ynWi0 25 | * Potter, Abbey. “The Why and What of Web Archives.” The Signal: Digital Preservation (April 29, 2014) http://blogs.loc.gov/digitalpreservation/2014/04/the-why-and-what-of-web-archives/ 26 | * Lepore, Jill. “The Cobweb: Can the Internet be archived?” The New Yorker (January 26, 2015) http://www.newyorker.com/magazine/2015/01/26/cobweb 27 | * National Digital Stewardship Alliance. “Web Archiving in the United States: A 2017 Survey” 2017 https://osf.io/ht6ay/ 28 | * Skim this survey report.
29 * Bragg, Molly, Hanna, Kristine, et al. “The Web Archiving Life Cycle Model.” (March 2013) http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf 30 * Introduction: pp. 1-5 31 * Vision and Objectives: pp. 5-8 32 33 ### Digital Preservation: 34 * Lavoie, Brian. “The Open Archival Information System (OAIS) Reference Model: Introductory Guide” (2nd Edition) DPC Technology Watch Report 14-02 (October 2014) http://dx.doi.org/10.7207/TWR14-02 35 * Sections 5 and 6 (pp. 7-28) 36 37 ## Module One - Exploring Web Archives 38 39 ### Exploring Web Archives 40 41 Each week we will try to learn about a new web archive, web archiving tool, or web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 42 43 This week we will start off with the largest web archive that we have, the Internet Archive's Wayback Machine. 44 45 https://web.archive.org/ 46 47 We will learn more about the Internet Archive in future weeks. For now, type a URL into the Wayback Machine and see what you can find. You could try https://unt.edu or see what cnn.com looked like ten years ago. You can also try viewing the different thumbnails that rotate on the page. After you have selected a URL, explore some of the features of the interface. 48 49 ## Module One - Discussion 50 51 ### Discussion Post: 52 53 In at least one paragraph, discuss what you learned about web archiving in this week's introduction to the topic. Were you familiar with this area before this course? Have you ever found yourself using a web archive in your research or work? Knowing that web archives exist, how do you think they might be useful in your work in the future?
Finally, if something didn't get answered in the readings, or if a question came up that you would like to hear others' ideas on, please include that in your post. 54 55 In at least one paragraph, discuss what you learned about the Internet Archive's Wayback Machine. What URLs did you look at? Were you surprised by anything that you found? What is your previous experience with the Wayback Machine? 56 57 ### Class Engagement: 58 59 After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least two of your classmates' posts. 60 61 If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 62 -------------------------------------------------------------------------------- /modules/module-02-what-is-the-web.md: -------------------------------------------------------------------------------- 1 # Module Two - What is the Web? 2 3 ## Module Two - Overview and Objectives 4 5 ### Overview: 6 This week we are going to look at the building blocks that make up the web. A basic familiarity with how the web works is important as we begin to discuss web archiving. 7 8 There are several short videos to watch as well as some readings that provide an overview of common web components such as HTTP, URLs, HTML, and HTTP headers. 9 10 ### Objectives: 11 1. Become familiar with the building blocks of the web. 12 2. Understand how the Web and the Internet are related but different. 13 3. Become familiar with the basics of HTML and how to view the source of websites. 14 15 ## Module Two - Readings 16 17 ### Web Architecture 18 19 * Computerphile. _Web vs Internet (Deep Dark Web Pt1)_ (June 17, 2016) - https://www.youtube.com/watch?v=oiR2mvep_nQ 20 * Eye on Tech. _What is a URL?
URL Components and How it Works_ (January 8, 2020) https://www.youtube.com/watch?v=-LPe4tYckkg 21 * Computer Hope. _URL_ (December 5, 2021) https://www.computerhope.com/jargon/u/url.htm 22 * CoffeeCup Software. _Absolute Vs. Relative Paths/Links_ (September 6, 2017) https://www.coffeecup.com/help/articles/absolute-vs-relative-pathslinks/ 23 * Mozilla. _An Overview of HTTP_ https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview 24 * WebConcepts. _Web Server Concepts and Examples_ (October 5, 2020). https://www.youtube.com/watch?v=9J1nJOivdyw 25 * SoftwareEngenius. _Learn in 5 Minutes: Http Headers (General/Request/Response/Entity)_ (July 31, 2020) https://www.youtube.com/watch?v=1v7RoeXyww4 26 * Fielding, et al. _RFC 2616 Section 10. Status Code Definitions._ https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html 27 28 ### Further Readings (and Videos) 29 30 * Jake Wright. _Learn HTML in 12 Minutes_ (Nov 10, 2010). https://www.youtube.com/watch?v=bWPMSSsVdPk 31 * Computerphile. _SGML HTML XML What's the Difference? (Part 1)_ (Apr 13, 2016). https://www.youtube.com/watch?v=RH0o-QjnwDg 32 * Computerphile. _HTML: Poison or Panacea? (HTML Part 2)_ (Apr 22, 2016). https://www.youtube.com/watch?v=Q4dYwEyjZcY 33 * Mozilla. _Getting started with HTML_ - https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML/Getting_started 34 * Tim Berners-Lee. _Information Management: A Proposal_ (March 1989) - https://www.w3.org/History/1989/proposal.html 35 * Ben Cotton. _6 RFCs for understanding how the internet works (And three for fun)_ (July 6, 2018) - https://opensource.com/article/18/7/requests-for-comments-to-know 36 * The Internet Society.
Hypertext Transfer Protocol -- HTTP/1.1 (June 1999) - https://datatracker.ietf.org/doc/html/rfc2616 37 38 ## Module Two - HTML Exercise 39 40 ### HTML Exercise 41 42 Visit either https://texteditor.com/html/editor/ or https://htmlfiddle.net/ and experiment with what happens when you use the What You See Is What You Get (WYSIWYG) text editor. On one side of the page you can type text and add formatting, and on the other side of the screen you will see the HTML markup that produces it. Make sure you create several paragraphs and add some links and an image to see what happens. Finally, experiment with different formats like bold, italics, lists, and maybe a table. The goal is to see the different tags that are created in the HTML. 43 44 ### View the Source 45 The next exercise is to become familiar with ways of viewing the source of the HTML pages you use all the time in your web browser. Most browsers have a way of viewing the source HTML code that is used to render the pages you are looking at. This includes the links, the references to images, stylesheets, and JavaScript, and the other markup needed to add structure, design, and interactivity to the webpages you are viewing online. 46 47 Viewing the source is fairly easy in most browsers: if you right-click with your mouse on the page you are looking at, you will see an option that says something like "View Page Source". When you click on this option, your browser will open a new tab and show you the HTML that makes up the page you are on. 48 49 Looking at a few browsers on my Mac, here is what the option you should click on is called. It might be slightly different in another browser or operating system. 50 51 * Chrome - View Page Source 52 * Firefox - View Page Source 53 * Safari - Show Page Source 54 55 #### Example of what you should see 56 For a website like https://unt.edu you might see this in your browser.
57 58 ![Alt](images/module-02-unt-homepage.png "UNT Homepage") 59 60 If you right-click on this page and view the source code, you will see something that looks like this. 61 62 ![Alt](images/module-02-unt-homepage-source.png "UNT Homepage Source View") 63 64 ### Exploring Web Archives 65 66 This week we will be looking at the web archives of the Library of Congress. 67 68 Pay attention to how the archived websites are organized. 69 70 What is the difference between the presentation of an individual archived website and the presentation of a collection of those sites? 71 72 * Web Archiving - About this Program - https://www.loc.gov/programs/web-archiving/about-this-program/ 73 * Collections with Web Archives - https://www.loc.gov/web-archives/collections/ 74 * Web Archives - https://www.loc.gov/web-archives/collections/ 75 76 ## Discussion 77 78 ### Discussion Post: 79 In at least one paragraph, discuss what you learned about the components that are used together to build the web as we know it. What areas were you most or least familiar with before this week's readings? Are there pieces that you would like to learn more about? 80 81 In at least one paragraph, discuss the collections you discovered at the Library of Congress. Were you surprised by what you found there? Are there things you think are missing based on your exploration of the web archive holdings? What would you like to know more about in relation to the Library of Congress Web Archives? Please include links to the specific sites you reference, including links into the web archives themselves. One of the goals of this course is to become comfortable with linking into web archives and making them an active part of your online experience. 82 83 Note: It is easy to go astray in the Library of Congress Collections. Make sure that you are looking at web archives and not archival collections on the web.
84 85 ### Class Engagement: 86 After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least two of your classmates' posts. 87 88 If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 89 -------------------------------------------------------------------------------- /modules/module-03-who-does-web-archiving.md: -------------------------------------------------------------------------------- 1 # Module Three - Who Does Web Archiving? 2 3 ## Overview and Objectives 4 5 ### Overview: 6 This week we will look at who is building web archives. This will include which institutions are involved, what types of institutions they are, where they are located, and most importantly, what they are collecting. 7 8 There are a couple of short videos as well as some readings that discuss who is involved in the web archiving process. 9 10 There is a graded discussion for this module. 11 12 ### Objectives: 13 1. Be able to identify institutions that have web archiving initiatives. 14 2. Begin to evaluate the scope of a web archive. 15 16 ## Readings 17 18 ### Readings 19 20 * Major D., Gomes D. (2021) Web Archives Preserve Our Digital Collective Memory. In: Gomes D., Demidova E., Winters J., Risse T. (eds) The Past Web. Springer, Cham. https://doi.org/10.1007%2F978-3-030-63291-5_2 21 * This chapter is a resource that the UNT Libraries subscribe to. 22 * This link will hopefully get you right to the PDF - https://link-springer-com.libproxy.library.unt.edu/content/pdf/10.1007%2F978-3-030-63291-5_2.pdf 23 * Pennock, Maureen. “Web Archiving” Digital Preservation Coalition Technology Watch Report 13-01 (March 2013) - http://dx.doi.org/10.7207/twr13-01 24 * Read sections 1 and 1.1 through 1.4 (pages 3-8). 25 * Skim the rest of the document.
26 * National Digital Stewardship Alliance. Web Archiving in the United States: A 2017 Survey. 2017 https://osf.io/ht6ay/ 27 * Review this report by skimming it again. 28 * Wikipedia. List of Web archiving initiatives - https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives 29 * Skim this page and explore some of the initiatives. 30 * PBS NewsHour. Internet history is fragile. This archive is making sure it doesn't disappear. (January 2, 2017) https://www.youtube.com/watch?v=K8I28erYFLc 31 32 ### Who does Web Archiving? 33 34 Internet Archive - https://archive.org 35 * Wayback Machine - https://web.archive.org 36 37 International Internet Preservation Consortium (IIPC) - https://netpreserve.org/ 38 * Members List - https://netpreserve.org/about-us/members/ 39 * IIPC Publications in the UNT Digital Library - https://digital.library.unt.edu/explore/partners/IIPC/browse/ 40 41 Archive-It - https://archive-it.org/ 42 * List of Organizations - https://archive-it.org/explore?show=Organizations 43 * List of Collections - https://archive-it.org/explore?show=Collections 44 45 UNT Libraries - https://webarchive.library.unt.edu/ 46 * CyberCemetery - https://cybercemetery.unt.edu/ 47 * UNT Web Archives - https://digital.library.unt.edu/explore/collections/UNTWEB/browse/?sort=date_d 48 * UNT Libraries Archive-It Collections - https://archive-it.org/organizations/1181 49 * End of Term Web Archive - https://eotarchive.org/ 50 51 Web Archiving Texas Interest Group 52 * https://www.tdl.org/members/groups/web-archiving-texas-interest-group/ 53 54 ## Exploring Web Archives 55 56 Each week we will try to learn about a new web archive, web archiving tool, or web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
57 58 This week we will look at the institutions and organizations that are using the web archiving service Archive-It. 59 60 Archive-It - https://archive-it.org/ 61 62 * List of Organizations - https://archive-it.org/explore?show=Organizations 63 * List of Collections - https://archive-it.org/explore?show=Collections 64 65 Take a look around these sites and explore two institutions in depth. You will report out on these institutions and their collections in this week's discussion. 66 67 ## Discussion 68 69 ### Discussion Post: 70 71 In at least one paragraph, discuss what you learned about who is involved in the web archiving space. What kinds of institutions did you primarily see? What kinds of collections did you see from these institutions? Are there geographic areas from which you didn't see much activity? Why might that be? 72 73 In at least one paragraph per institution, identify which institution or organization you looked at in the Archive-It platform. What kinds of content are they collecting? Did you notice any similarities with other organizations that you looked at? Did you notice any differences in the kinds of things they collect? Please include links to the specific sites you reference, including links into the web archives themselves. One of the goals of this course is to become comfortable with linking into web archives and making them an active part of your online experience. 74 75 ### Class Engagement: 76 After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 77 78 If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
79 | -------------------------------------------------------------------------------- /modules/module-04-technology-overview.md: -------------------------------------------------------------------------------- 1 | # Module Four - Technology Overview 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | The web is a growing and changing environment. Because of this constant change, the tools and processes used to harvest, capture, and archive the web also have to change in order to keep up. This module will introduce you to the major components involved in the web archiving process including capture or harvest, replay or playback, and finally discovery and access. 7 | 8 | Future modules will discuss these concepts in greater detail. 9 | 10 | There are several readings and a longer video recorded in 2019 to watch that will present the major components of technology used in web archiving. 11 | 12 | There is a graded discussion for this module. 13 | 14 | ### Objectives: 15 | 1. Be able to identify the high-level technology components related to web archiving. 16 | 2. Begin to learn about the different components and their primary uses. 17 | 3. Begin to understand some of the limitations present in harvesting resources from the web. 18 | 19 | ## Readings 20 | 21 | * Niu, j. (2012). An Overview of Web Archiving. _D-Lib Magazine_. 18(3/4) https://doi.org/10.1045/march2012-niu1 22 | * This article provides a good overview of the components in the web archiving space. 23 | * Texas Digital Library (2019). Intro to Web Archiving Texas #1 Web Archiving Technology, Tools & Resources - https://www.youtube.com/watch?v=vkSKPQccuMg 24 | * Mark Phillips, Associate Dean for Digital Libraries, University of North Texas 25 | * Courtney Mumma, Deputy Director, Texas Digital Library 26 | * Lauren Ko, Supervisor, Software Development Unit, University of North Texas 27 | * International Internet Preservation Consortium (2022). 
_Awesome Web Archiving_ - https://github.com/iipc/awesome-web-archiving 28 * Skim this list of tools and technologies for web archiving. 29 * Follow a few different links and explore. (You will need to pick one for this week's discussion.) 30 * Hockx-Yu, H. (2009). Web Archiving Tools: An Overview - https://www.dpconline.org/docs/miscellaneous/events/394-0907hockxyumissing-links/file 31 * This presentation does a great job of covering a wide range of concerns related to the UK Web Archive. 32 33 ## Archiving Exercise 34 35 ### Web Archiving Exercise - Browser-Based "Save As" 36 In a previous module we looked at the HTML that makes up the web pages that we view in our browsers. We learned how to "view source" on a page and see the underlying code. 37 38 In this exercise we will look at how a browser can be used to save web content to your local machine. 39 40 Most browsers have the ability to save a web page to your local machine. 41 42 This option is found in one of two places. First, you can look in the File menu at the top of your browser. The exact wording will differ depending on which browser and operating system you use, but on a Mac with Chrome I see "Save Page As". 43 44 Another option is to right-click on the page you are interested in saving, and you will see something like "Save As" (again using Chrome on a Mac). 45 46 You will generally have two different options when you save an HTML file from your browser (though sometimes there are more). They will be some variation of "HTML Only" and "Complete". 47 48 Example save dialog from Chrome on a Mac. 49 50 ![Alt](images/module-04-save-options.png "Example save dialog from Chrome on a Mac.") 51 52 53 ### Exercise 54 55 Pick an HTML page, ideally the homepage of an agency or organization. 56 57 0.
(Just a hint.) I like to create a new empty folder on my Desktop named something like "captures" so that when I'm trying to find this stuff later, it doesn't get mixed up with everything else in my normal Downloads folder. As you save files you will need to navigate to this folder on your Desktop, but in the long run it will be easier to deal with. 58 1. Using Save As, first save the HTML file as "HTML only". Pay attention to where on your hard drive you save the file. Next, navigate to that location and try to open the file in your browser. Does the page display the same as the "live" version? What does your browser's URL bar display? What happens when you click on links, and what displays in the URL bar after you click a link? 59 2. Going back to the homepage you chose (and not your saved copy), this time save the HTML file as "complete" or "all files". Pay attention to where on your hard drive you save the file. Again, navigate to the location where you saved it, and this time notice the different files that are present. In addition to the HTML file, what other files were downloaded? Next, open the file in your browser. Does the page display the same as the "live" version? What does your browser's URL bar display? What happens when you click on links, and what displays in the URL bar after you click a link? What is the total size of all of the files that were downloaded? 60 3. View the source of these saved HTML versions and compare them to the "view source" of the live website. Do you notice any differences between the saved pages and the version that is on the web? 61 62 ## Exploring Web Archives 63 64 Each week we will try to learn about a new web archive, web archiving tool, or web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
65 66 This week we will look at the Ivy Plus Library Web Collecting Program - https://ivpluslibraries.org/programs/ivy-plus-libraries-confederation-web-collecting-program/ 67 68 This program uses the Archive-It service for its web archiving activities. 69 70 A gateway into its collections can be found here - https://archive-it.org/home/IvyPlus 71 72 Explore the collections that are included in this program. In this week's discussion you will describe what you find in these collections, why they are being collected, and the scope of what is being collected. 73 74 ## Discussion 75 76 ### Discussion Post: 77 In at least one paragraph, discuss what you learned about the technologies involved in the web archiving space. In addition to the big buckets of Capture, Preserve, and Playback, what other things should also be considered based on your readings? 78 79 Which tools did you explore in the Awesome list? Link to the tool and give a brief description of the problem it is trying to solve. 80 81 In at least one paragraph, discuss the web archive you identified this week in the Ivy Plus Library Web Collecting Program. Include a link to the web archive and discuss some of the types of content you found inside. Was there anything in their collections you hadn't expected? Were there things you expected to find that were not there? Please include links to the specific sites you reference, including links into the web archives themselves. One of the goals of this course is to become comfortable with linking into web archives and making them an active part of your online experience. 82 83 Finally, in at least one paragraph, what did you discover in the "Save As" exercise this week? What website did you capture? What happened when you opened the saved file in your browser? How was the "HTML Only" version different from the option to save the complete page or all files? What kinds of files did you see when you saved the page?
What differences did you notice between the saved versions of the web page and the live version? 84 85 ### Class Engagement: 86 After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 87 88 If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 89 -------------------------------------------------------------------------------- /modules/module-05-capture.md: -------------------------------------------------------------------------------- 1 # Module Five - Capture 2 3 ## Overview and Objectives 4 5 ### Overview: 6 The terms capturing, harvesting, and crawling are usually used interchangeably to describe the acquisition process in web archiving. This module will take a deeper look at the process of acquiring content for a web archive, introduce you to some new terms, and give you a chance to create what might be your first web capture. 7 8 This will build on concepts that you were introduced to in [Module Four](./module-04-technology-overview.md). 9 10 There are several readings, some online documentation to skim, and several PowerPoint slide decks that you will review. 11 12 ### Objectives: 13 1. Become familiar with common capture-related terms such as seed, path, domain, and subdomain. 14 2. Understand where acquisition of content fits into the lifecycle of a web archive. 15 3. Create your first web capture in the Wayback Machine at the Internet Archive.
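Objective 1 names the parts of a URL that crawl scoping is built on. As a minimal sketch using only the Python standard library, the code below splits a seed URL into host, domain, subdomain, and path, then applies one hypothetical scope rule. The `describe_seed` and `in_scope` names are illustrative helpers, not part of Heritrix or Archive-It, and treating the last two labels of the host as "the domain" is a simplifying assumption that breaks for suffixes such as `.co.uk`.

```python
from urllib.parse import urlsplit


def describe_seed(seed_url: str) -> dict:
    """Split a seed URL into the parts that crawl-scope rules usually refer to."""
    parts = urlsplit(seed_url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    # Simplifying assumption: the last two labels form the domain (wrong for e.g. .co.uk).
    domain = ".".join(labels[-2:])
    # Anything to the left of the domain is treated as the subdomain.
    subdomain = ".".join(labels[:-2])
    return {
        "scheme": parts.scheme,
        "host": host,
        "domain": domain,
        "subdomain": subdomain,
        "path": parts.path or "/",
    }


def in_scope(candidate_url: str, seed_url: str) -> bool:
    """Hypothetical scope rule: same host, and the path starts with the seed's path.

    This is a naive string-prefix check, so a seed path of "/about" would also
    match "/aboutus".
    """
    seed = describe_seed(seed_url)
    candidate = describe_seed(candidate_url)
    return candidate["host"] == seed["host"] and candidate["path"].startswith(seed["path"])


print(describe_seed("https://webarchive.library.unt.edu/"))
# {'scheme': 'https', 'host': 'webarchive.library.unt.edu', 'domain': 'unt.edu', 'subdomain': 'webarchive.library', 'path': '/'}
```

Real crawlers express scope through their own configurable rules; the prefix check here only illustrates how a seed's host and path can define what counts as in scope, and why a link to `https://unt.edu/` would fall outside a crawl seeded at `https://cybercemetery.unt.edu/`.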
16 17 ## Readings 18 19 ### Web Archiving 20 * Archive-It Help Center. Glossary of Archive-It and Web Archiving Terms - https://support.archive-it.org/hc/en-us/articles/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms 21 * Review these terms and pay specific attention to Crawl, Crawler, Document, Domain, Host, Scope, Seed, and Sub-domain. 22 * About /robots.txt - https://www.robotstxt.org/robotstxt.html 23 * Become familiar with the concepts discussed on this page. 24 * This is how website owners can give crawlers hints about what they may crawl, and more often about what they should not crawl. 25 * International Internet Preservation Consortium (2020). Session 3A: Main Concepts and Technologies: Capture 26 * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3a-slides/ 27 * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3a-notes/ 28 * Review these slides and speaker notes. I suggest reading the notes with the slides open in another part of your screen. 29 * Mohr, G., Stack, M., Ranitovic, I., Avery, D., & Kimpton, M. (2004). An Introduction to Heritrix: An open source archival quality web crawler. International Web Archiving Workshop. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.6877&rep=rep1&type=pdf 30 * This article is a bit dated, but it is the best reference on the beginnings of Heritrix that I could find. 31 * Davis, R. (2011). Saving the Smithsonian's Web. https://siarchives.si.edu/blog/saving-smithsonians-web 32 33 ### Heritrix 34 Review the following links to learn more about the Heritrix crawler. 35 * https://github.com/internetarchive/heritrix3 36 * https://github.com/internetarchive/heritrix3/wiki 37 * https://netpreserveblog.wordpress.com/2019/02/19/a-new-release-of-heritrix-3/ 38 39 Bonus: Video Overview of Heritrix 40 41 * Fisher, D. (2018). Heritrix Web Crawler.
- https://www.youtube.com/watch?v=RmHG0MaFJSI 42 * This video was made by a peer of yours at Simmons, in the School of Library & Information Science. I think they do a great job with the presentation overall. 43 44 ### Additional Optional Readings 45 * Brunelle, J., Ferrante, K., Wilczek, E., Weigle, M. & Nelson, M. (2016). Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives. D-Lib Magazine. 22(1/2) https://doi.org/10.1045/january2016-brunelle 46 47 ## Archiving Exercise 48 49 ### Web Archiving Exercise - Saving a Webpage at the Internet Archive 50 This week we are going to save our first webpage in a full web archiving infrastructure. 51 52 Again we turn to the Internet Archive and its Wayback Machine. 53 54 If you navigate to https://web.archive.org/, you will see the "Save Page Now" tool in the bottom right of the page. 55 56 ![Alt](images/module-05-save-page-now.png 'Wayback Machine home with red arrow pointing to the "Save Page Now" tool.') 57 58 59 This will link you to the "Save Page Now" interface - https://web.archive.org/save 60 61 This week we are going to select a webpage and save it to the Internet Archive's collection. 62 63 When picking your page to archive, keep in mind that you will be describing and linking to it in this week's discussion post. 64 65 One thing to note: if you register for an Internet Archive account and sign in, you will have more options than you will without an account. For this exercise you do not need an account. 66 67 Select a webpage you want to archive, add its URL to the box, leave the "Save error pages (HTTP Status=4xx, 5xx)" option checked, and click "Save Page". 68 69 After you click the "Save Page" button, leave the page open so that you can see what is happening.
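When a capture finishes, the Wayback Machine typically shows it at an address made of a fixed prefix, a 14-digit UTC timestamp (YYYYMMDDhhmmss), and the original URL, for example https://web.archive.org/web/20120619151338/http://www.iwaw.net/05/kunze.pdf. Assuming that address pattern, the sketch below builds and takes apart such links, which can help when you link to your own capture in the discussion. `capture_link` and `parse_capture_link` are hypothetical helpers written for this illustration, not an Internet Archive API.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web/"


def capture_link(original_url: str, captured_at: datetime) -> str:
    """Build a Wayback Machine link for a capture made at a given (timezone-aware) time."""
    # Wayback timestamps are 14 digits, YYYYMMDDhhmmss, expressed in UTC.
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"


def parse_capture_link(link: str) -> tuple[str, datetime]:
    """Recover the original URL and capture time from a Wayback link.

    Assumes a full 14-digit timestamp; any suffix after the digits is ignored.
    """
    if not link.startswith(WAYBACK_PREFIX):
        raise ValueError("not a web.archive.org capture link")
    # Split only on the first "/" so the original URL keeps its own slashes.
    stamp, original_url = link[len(WAYBACK_PREFIX):].split("/", 1)
    captured_at = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return original_url, captured_at


link = capture_link("https://unt.edu/", datetime(2022, 1, 31, 12, 0, 0, tzinfo=timezone.utc))
print(link)  # https://web.archive.org/web/20220131120000/https://unt.edu/
```

Because the timestamp sorts chronologically as plain text, the same pattern is what lets you see, and link to, many captures of one URL over time.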
70 71 In this week's discussion you will write a paragraph about what you see happening. How many files did you end up collecting as part of that page? Share any observations you have about the process, and ask any questions you might have. What are the potential uses of this kind of "micro" web archiving service? 72 73 You will need to link to the specific capture you initiated in the Wayback Machine. 74 75 ## Exploring Web Archives 76 77 Each week we will try to learn about a new web archive, web archiving tool, or web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 78 79 This week we will look at the web archives at the UNT Libraries. 80 81 ### Web Archives 82 83 UNT Web Archives (UNT Digital Library Interface) - https://digital.library.unt.edu/explore/collections/UNTWEB/browse/?sort=date_d 84 85 UNT Libraries' Web Archives (Wayback Interface) - https://webarchive.library.unt.edu/ 86 87 CyberCemetery - https://cybercemetery.unt.edu 88 89 * Cathy Hartman and CyberCemetery - https://www.digitalpreservation.gov/series/pioneers/hartman.html 90 * Hartman, C. N., Hastings, S. K., & Alemneh, D. G. (2004). The Cybercemetery: Prolonging Usable Afterlife. IS&T--the Society for Imaging Science and Technology. https://digital.library.unt.edu/ark:/67531/metadc29310/ 91 92 UNT Libraries' Archive-It Collections - https://archive-it.org/organizations/1181 93 * Primarily focused on Special Collections 94 95 ### Grant Projects Related to Web Archives 96 97 **National Digital Information Infrastructure and Preservation Program: Web-at-Risk (2005-2008)** 98 99 * Web-at-Risk: Preserving Our Nation's Cultural Heritage - UNT Digital Library 100 * Seneca, T. (2009). The Web-at-Risk at Three: Overview of an NDIIPP Web Archiving Project. Library Trends, 57(3), 427-441.
http://hdl.handle.net/2142/13606 101 102 **Expanding Collection Development Practices to Web Archives (EOTCD) (2009-2013)** 103 104 * Hartman, C. N., Murray, K. R., & Phillips, M. E. (2013). Classification Of The End-Of-Term Archive: Extending Collection Development Practices To Web Archives. https://digital.library.unt.edu/ark:/67531/metadc152437/ 105 * Murray, K. R., & Hartman, C. N. (2012). Classifying the End-of-Term Archive. IS&T--the Society for Imaging Science and Technology Archiving Conference, 2012, Copenhagen, Denmark. https://digital.library.unt.edu/ark:/67531/metadc93305/ 106 * Phillips, M. E., & Murray, K. R. (2013). Improving Access to Web Archives through Innovative Analysis of PDF Content. IS&T--the Society for Imaging Science and Technology Archiving Conference, 2013, Washington, D.C., United States. https://digital.library.unt.edu/ark:/67531/metadc155622/ 107 108 **Programmatic Extraction of 'Documents' from Web Archives (2017-2020)** 109 110 * Phillips, M. E., & Caragea, C. (2017). Programmatic Extraction of 'Documents' from Web Archives. https://www.imls.gov/grants/awarded/lg-71-17-0202-17 111 * Fox, N. T., Phillips, M. E., & Tarver, H. (2020). Programmatic Extraction of ‘Documents’ from Web Archives: Identifying Document Characteristics from Content Selector Interviews. https://digital.library.unt.edu/ark:/67531/metadc1757659/ 112 * Patel, K., Caragea, C., Phillips, M. E., & Fox, N. (2020). Identifying Documents In-Scope of a Collection from Web Archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries 2020, 167-176. https://doi.org/10.1145/3383583.3398540 113 * arXiv version - https://arxiv.org/abs/2009.00611 114 115 ## Discussion 116 117 ### Discussion Post: 118 In at least one paragraph, discuss what you learned about the capture technologies involved in the web archiving space. What were some of the terms that were new to you this week?
What are some things that still need clarity for you? 119 | 120 | In at least one paragraph, describe what happened when you used the Wayback Machine's "Save Page Now" mechanism. What webpage did you choose to save? Link to your capture of the website in the Wayback Machine. How many files did you end up collecting that were included in that page? Share any observations you have in the process, and ask any questions you might have about the process. What are the potential uses of this kind of "micro" web archiving service? 121 | 122 | Finally, in at least one paragraph, discuss the web archive or archived website that you reviewed this week from the UNT Libraries. Were you surprised by anything you found in the websites? Are there things that you would have expected that you didn't see? Discuss any of the related projects or grants that you explored as well. 123 | 124 | ### Class Engagement: 125 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 126 | 127 | If there are any unanswered questions, feel free to try to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 128 | -------------------------------------------------------------------------------- /modules/module-06-preserve.md: -------------------------------------------------------------------------------- 1 | # Module Six - Preserve 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | Once you have decided what you are interested in collecting, and after you have decided how to crawl this content, you need to think about how you will store and preserve the crawled content. This module will take a deeper look at the process of preservation of content for a web archive, introduce you to some new terms and file formats, and give you a chance to create another web capture with a different tool.
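The objectives and readings for this module center on the WARC format. As a rough illustration of its record layout, the Python sketch below serializes one `resource` record by hand: a `WARC/1.1` version line, named header fields, a blank line, the content block, and two trailing CRLFs, following the layout described in the WARC 1.1 specification. The helper name `build_warc_record` and the example URL are hypothetical, and the sketch sets only a handful of fields. Crawlers such as Heritrix write WARC files with their own writers rather than with code like this.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type: str, target_uri: str, content_type: str, block: bytes) -> bytes:
    """Serialize one WARC/1.1 record: version line, named fields, blank line, block, two CRLFs.

    Hypothetical helper for illustration only; it sets a minimal subset of fields.
    """
    headers = [
        ("WARC-Type", record_type),                       # e.g. warcinfo, request, response, resource
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),  # globally unique identifier for this record
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),  # capture time in UTC
        ("WARC-Target-URI", target_uri),                  # the URI the content came from
        ("Content-Type", content_type),                   # media type of the block
        ("Content-Length", str(len(block))),              # length of the block in bytes (octets)
    ]
    head = "WARC/1.1\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    # The block follows the blank line, and every record ends with two CRLFs.
    return head.encode("utf-8") + block + b"\r\n\r\n"


record = build_warc_record("resource", "http://example.com/hello.txt", "text/plain", b"Hello, web archive!")
print(record.decode("utf-8"))
```

Printing `record` shows the plain-text header followed by the 19-byte block. Comparing this output with the informative examples in Annex B of the specification is one way to see which additional fields other record types carry.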
7 | 8 | This will build on concepts that you were introduced to in Module Four. 9 | 10 | There are several readings, some online documentation to skim, a video to watch, and several PowerPoint presentations that you will review. 11 | 12 | ### Objectives: 13 | 1. Become familiar with different approaches to preserving web content. 14 | 2. Become familiar with the Web ARChive (WARC) File Format. 15 | 3. Create your second web capture in the https://archive.today service. 16 | 17 | ## Readings 18 | 19 | ### Readings 20 | 21 | * International Internet Preservation Consortium (2020). Session 3b: Main Concepts and Technologies: Preserve 22 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3b-slides/ 23 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3b-notes/ 24 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen. 25 | * Kunze, J. (2005). WARC: an Archiving Format for the Web. International Web Archiving Workshop - https://web.archive.org/web/20120619151338/http://www.iwaw.net/05/kunze.pdf 26 | * Pennock, Maureen. “Web Archiving” Digital Preservation Coalition Technology Watch Report 13-01 (March 2013) - http://dx.doi.org/10.7207/twr13-01 27 | * Section 3. Standards (pgs. 17-18) 28 | * International Internet Preservation Consortium (2020). IIPC Training Video Case Study, Topic 7: Web Archiving Tools and Services - https://www.youtube.com/watch?v=MaynKx0_Oow 29 | 30 | ### Web ARChive (WARC) File Format 31 | Skim this documentation and focus on the types of WARC records that can exist. (This is part of this week's discussion.) 32 | 33 | * International Internet Preservation Consortium. (2022). WARC Specifications.
https://iipc.github.io/warc-specifications/ 34 | * https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/ 35 | * https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#warc-record-types 36 | * https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#annex-b-informative-examples-of-warc-records 37 | 38 | Review/Skim the following links to learn more about the WARC File Format. 39 | * Library of Congress. (2020). WARC, Web ARChive file format. https://www.loc.gov/preservation/digital/formats/fdd/fdd000236.shtml 40 | * The National Archives (2020). Details for: WARC. http://www.nationalarchives.gov.uk/pronom/fmt/289 41 | * Wikipedia - https://en.wikipedia.org/wiki/Web_ARChive 42 | * International Organization for Standardization (2009). ISO 28500:2009 Information and documentation — WARC file format. https://www.iso.org/standard/44717.html 43 | * **Do NOT buy this standard. It is for reference only.** 44 | * Instead, use the public draft before standardization - http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf 45 | * Archive Team (2021) - https://wiki.archiveteam.org/index.php?title=The_WARC_Ecosystem 46 | * ARC Format (precursor of the WARC format) - https://github.com/internetarchive/heritrix3/wiki/ARC%20File%20Format 47 | 48 | ### Additional Optional Video 49 | * Consultative Committee for Space Data Systems (CCSDS), Data Archive Interoperability (DAI) Working Group, Kearney, Michael W. III, Giaretta, D., Garrett, J., Hughes, S. (2020). What's missing from WARC? - https://www.youtube.com/watch?v=vdEaz109uAo 50 | 51 | ## Archiving Exercise 52 | 53 | ### Saving a Webpage with Archive.Today 54 | 55 | This week we are going to save our second webpage in a full web archiving infrastructure. 56 | 57 | This time we turn to a service called Archive.Today.
58 | 59 | You can read more about this service on its Wikipedia page (https://en.wikipedia.org/wiki/Archive.today). 60 | 61 | Start by navigating to https://archive.today (it will most likely redirect to https://archive.ph; this is fine). 62 | 63 | ![Alt](images/module-06-archive-today.png "Homepage for the Archive.Today service.") 64 | 65 | This week we are going to select a webpage and save it using the archive.today service. 66 | 67 | When picking your page to archive, keep in mind that you will be describing and linking to it in this week's discussion post. 68 | 69 | Select a webpage you want to archive, enter its URL in the box that says "My url is alive and I want to archive its content", and press Enter. 70 | 71 | You may be asked to prove you are not a robot. Once you have done that, you will see whether the webpage has been archived before or whether you are the first to capture it. If it has been archived before, go ahead and say that you would like to archive it again. 72 | 73 | After you click the "Save" button, leave the page open so that you can see what is happening. 74 | 75 | In the discussion this week you will write a paragraph about what you see happening. How many files did you end up collecting that were included in that page? Share any observations you have in the process, and ask any questions you might have about the process. Compare this with the "Save Page Now" tool from the Internet Archive's Wayback Machine. You will need to link to the specific capture you initiated using archive.today in this week's discussion post. 76 | 77 | ## Exploring Web Archives 78 | 79 | 80 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 81 | 82 | This week we will look at the Collaborative Collections of the International Internet Preservation Consortium.
83 | 84 | ### Collaborative Collections 85 | * International Internet Preservation Consortium (IIPC). Collaborative Collections - https://netpreserve.org/projects/collaborative-collections/ 86 | * IIPC Collaborative Collections at Archive-It - https://archive-it.org/home/IIPC 87 | * Using IIPC Collaborative Collections WARC data - https://netpreserve.org/iipc-cdg-warc-data/ 88 | * Thurman, A., & Grotke, A. (2016). Content Development Group & Collaborative Collections Update. - https://digital.library.unt.edu/ark:/67531/metadc1477165/ 89 | 90 | ## Discussion 91 | 92 | ### Discussion Post: 93 | In at least one paragraph, discuss what you learned this week about preservation of web archives and specifically about the WARC format. What were some of the terms or concepts that were new to you this week? What are some things that still need clarity for you? 94 | 95 | Looking back at the WARC Specification on GitHub, you will find examples of the different types of WARC records that are commonly seen in a WARC file. Annex B lists these examples - https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#annex-b-informative-examples-of-warc-records . In at least one paragraph, pick one of these record types and, in your own words, describe why this kind of record makes sense in the WARC format. How is it used? What is it used for? What kind of information does it contain? Why is this information important? 96 | 97 | In at least one paragraph, describe what happened when you used the Archive.Today service. What webpage did you choose to save? Link to your capture of the website at the Archive.Today service. How many files did you end up collecting that were included in that page? Share any observations you have in the process, and ask any questions you might have about the process. 98 | 99 | Finally, in at least one paragraph, discuss the collaborative collection you chose to look at in the IIPC collection.
Were you surprised by anything you found in the websites? Are there things that you would have expected that you didn't see? What are some other topics that an international group like the IIPC might want to consider for a collaborative collection in the future? Remember to include links to the collection that you chose as well as examples of archived websites in your post. 100 | 101 | ### Class Engagement: 102 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 103 | 104 | If there are any unanswered questions, feel free to try to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 105 | -------------------------------------------------------------------------------- /modules/module-07-playback.md: -------------------------------------------------------------------------------- 1 | # Module Seven - Playback 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | 7 | Now that we have learned about standard ways of harvesting and storing web content in a web archive, the next step is providing access to that content. In the web archiving community this is most often referred to as **playback** or **replay**. This module will take a deeper look at some of the fundamentals of how replay works in most web archives. Additionally, it will introduce you to some of the standard playback tools such as OpenWayback and pywb. 8 | 9 | This will build on concepts that you were introduced to in Module Six. 10 | 11 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review. 12 | 13 | ### Objectives: 14 | 15 | 1. Learn the fundamentals of how playback of archived web content works. 16 | 2. Become familiar with some of the common software tools for web archive replay. 17 | 3.
Experiment with different replay environments and combinations of browsers and operating systems with https://oldweb.today/ 18 | 19 | ## Readings 20 | 21 | * International Internet Preservation Consortium (2020). Session 3C: Main Concepts and Technologies: Playback 22 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3c-slides/ 23 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3c-notes/ 24 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen. 25 | * Internet Archive (2021). How to use the Wayback Machine. https://www.youtube.com/watch?v=ts1tu1BiSuY 26 | * Sigurðsson, K. (2020). The Future of Playback. https://netpreserveblog.wordpress.com/2020/06/16/the-future-of-playback/ 27 | * International Internet Preservation Consortium (2020). IIPC Training Video Case Study, Topic 6: Accessing and Using Web Archives. 28 | https://www.youtube.com/watch?v=Dng8d9ytOUc 29 | * A Short on How the Wayback Machine Stores more Pages than Stars in the Milky Way. http://highscalability.com/blog/2014/5/19/a-short-on-how-the-wayback-machine-stores-more-pages-than-st.html 30 | * Technical discussion of how the Wayback Machine works.
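Playback in the Wayback Machine is organized around URLs of the form `https://web.archive.org/web/<timestamp>/<original URL>`, where the timestamp has up to 14 digits (`YYYYMMDDhhmmss`). When the requested timestamp does not match a capture exactly, the Wayback Machine typically redirects to the capture closest in time. The Python sketch below illustrates only that nearest-capture step, under simplifying assumptions: the list of capture timestamps is invented and held in memory (playback software such as pywb or OpenWayback looks captures up in an index such as a CDX file), a partial timestamp such as `19960101` defaults to the first day of the month or year at midnight, and the helper names are hypothetical.

```python
from datetime import datetime

# Fallback digits for a partial timestamp: month 01, day 01, 00:00:00.
# The first four characters are never used because a 4-digit year is assumed.
TIMESTAMP_DEFAULTS = "00000101000000"


def parse_timestamp(ts: str) -> datetime:
    """Parse a YYYYMMDDhhmmss timestamp, filling any missing trailing parts from the defaults."""
    full = ts + TIMESTAMP_DEFAULTS[len(ts):]
    return datetime.strptime(full, "%Y%m%d%H%M%S")


def closest_capture(requested: str, captures: list[str]) -> str:
    """Return the capture timestamp nearest in time to the requested one (captures must be non-empty)."""
    target = parse_timestamp(requested)
    return min(captures, key=lambda ts: abs(parse_timestamp(ts) - target))


def wayback_url(timestamp: str, original_url: str) -> str:
    """Build a replay URL following the web.archive.org/web/<timestamp>/<url> pattern."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Hypothetical capture timestamps for http://nasa.gov, invented for illustration (not real index data).
captures = ["19961019064735", "19961227104232", "19970404021040"]
best = closest_capture("19970101", captures)
print(wayback_url(best, "http://nasa.gov"))
```

With these invented captures, a request for `19970101` resolves to the late-December 1996 capture, so the script prints `https://web.archive.org/web/19961227104232/http://nasa.gov`. Production systems may weigh other factors (for example, preferring successful responses over errors), which this sketch ignores.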
31 | 32 | ### Software Links 33 | * Wayback Machine on Wikipedia - https://en.wikipedia.org/wiki/Wayback_Machine 34 | * Webrecorder pywb 2.6 - https://github.com/webrecorder/pywb 35 | * OpenWayback - https://github.com/iipc/openwayback/ 36 | * OpenWayback to pywb Transition Guide and pywb update - https://netpreserveblog.wordpress.com/2020/12/16/openwayback-to-pywb-transition-guide/ 37 | * OpenWayback Transition Guide - https://pywb.readthedocs.io/en/latest/manual/owb-transition.html 38 | * ReplayWeb.page - https://replayweb.page/ 39 | 40 | ## Archiving Exercise 41 | 42 | ### Web Archiving Exercise - Viewing a website on Old Web Today 43 | 44 | This week we will look at viewing websites on older browsers and operating systems. 45 | 46 | The tool that we will be exploring is a web service called Old Web Today (https://oldweb.today). 47 | 48 | This tool is developed and maintained by the team that created Webrecorder (https://webrecorder.net/). 49 | 50 | The goal of this tool is to make it easier for users to experience websites using the browsers and technologies from the time the websites were captured. 51 | 52 | ![Alt](images/module-07-oldweb-today-01.png "Homepage for the OldWeb.Today service.") 53 | 54 | Start by navigating to the website https://oldweb.today/ 55 | 56 | **Note**: _I have had mixed results with this service. I think it is very interesting and one of the only ways for us to experience browsers from over twenty years ago. That being said, it is pretty finicky and can be a bit frustrating. Try a few different things as experiments and spend at least 15 minutes trying different combinations._ 57 | 58 | I suggest starting with the NCSA Mosaic 2 browser. You can then add a URL that is likely to have existed early in the web. I chose http://nasa.gov and was curious about what things looked like back in 1996. 59 | 60 | You can select other websites and time periods, or even view a modern website from today in a browser from the past.
61 | 62 | Here is an example that will give you an idea of how things work. For the activity, please select a different URL for your experiments. 63 | 64 | * About the Mosaic web browser - https://en.wikipedia.org/wiki/Mosaic_(web_browser) 65 | * NASA website from 1996 on NCSA Mosaic 2 - https://oldweb.today/?browser=nm2-mac#19960101/http://nasa.gov 66 | * NASA website from 1998 on NCSA Mosaic 2 - https://oldweb.today/?browser=nm2-mac#19980101/http://nasa.gov 67 | * NASA website from the live web on NCSA Mosaic 2 - https://oldweb.today/?browser=nm2-mac#http://nasa.gov 68 | 69 | ![Alt](images/module-07-oldweb-today-02.png "Oldweb.Today with emulated nasa.gov in an emulated NCSA Mosaic Browser.") 70 | 71 | 72 | Experiment with different browsers and URLs, and switch between the live web and the archived web. 73 | 74 | You have to be patient with all of these emulated systems. What the service is doing behind the scenes (emulating an operating system in JavaScript) is pretty cool, but it takes patience. Here is more information about the technology - https://github.com/oldweb-today/oldweb-today 75 | 76 | For the discussion this week you will describe your experience with this system: what you tried to access, how well it did or didn't work, and whether you were surprised by anything. What is the earliest you can remember accessing websites on the internet? What tools do you remember? What websites do you remember using? Finally, what are different uses for a web service like this in the web archiving space? 77 | 78 | ## Exploring Web Archives 79 | 80 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 81 | 82 | This week we will look at the End of Term (EOT) Collaborative Web Archive.
83 | 84 | ### End of Term Web Archive 85 | * End of Term Web Archive Website - https://eotarchive.org/ 86 | * End of Term Twitter Account - https://twitter.com/eotarchive 87 | * End of Term Web Archive Wikipedia Page - https://en.wikipedia.org/wiki/End_of_Term_Web_Archive 88 | * Web archive of press releases about the 2016 EOT Web Archive - https://archive-it.org/collections/8311 89 | 90 | ### Collected Websites 91 | * Browse 2008, 2012, 2016 - http://eotarchive.cdlib.org/ 92 | * End of Term 2008 - UNT Digital Library - https://webarchive.library.unt.edu/eot2008/ 93 | * End of Term 2012 - UNT Digital Library - https://webarchive.library.unt.edu/eot2012/ 94 | * End of Term 2016 - UNT Digital Library (UNT Crawls Only) - https://webarchive.library.unt.edu/eot2016/ 95 | * End of Term 2020 - UNT Digital Library (UNT Crawls Only) - https://webarchive.library.unt.edu/eot2020/ 96 | 97 | ### Seed Lists for Collection 98 | 99 | URL Nomination Tool - https://digital2.library.unt.edu/nomination/ 100 | 101 | * End of Term Presidential Harvest 2008 - https://digital2.library.unt.edu/nomination/eth2008/ 102 | * End of Term Presidential Harvest 2012 - https://digital2.library.unt.edu/nomination/eth2012/ 103 | * End of Term 2012 - Bulk Lists - https://digital2.library.unt.edu/nomination/eth2012_bulk/ 104 | * End of Term Presidential Harvest 2016 - https://digital2.library.unt.edu/nomination/eth2016/ 105 | * End of Term 2016 - Bulk Lists - https://digital2.library.unt.edu/nomination/eth2016_bulk/ 106 | * End of Term Presidential Harvest 2020 - https://digital2.library.unt.edu/nomination/eth2020/ 107 | * End of Term 2020 - Bulk Lists - https://digital2.library.unt.edu/nomination/eth2020_bulk/ 108 | 109 | ### Articles about the End of Term 110 | * Seneca, T., Grotke, A., Hartman, C. N., & Carpenter, K. (2012). It Takes A Village To Save The Web: The End Of Term Web Archive. Documents to the People, 40(1). https://digital.library.unt.edu/ark:/67531/metadc84373/ 111 | * Phillips, M. E.
& Phillips, K. K. (2017). End of Term 2016 Presidential Web Archive. Against the Grain, 29(6). https://doi.org/10.7771/2380-176X.7874 112 | * Phillips, M. E., Chudnov, D., & Jacobs, J. R. (2016). Exploratory Analysis of the End of Term Web Archive: Comparing Two Collections. Web Archiving Workshop, Joint Conference on Digital Libraries, Newark, New Jersey. https://digital.library.unt.edu/ark:/67531/metadc854106/ 113 | 114 | ## Discussion 115 | 116 | ### Discussion Post: 117 | In at least one paragraph, discuss what you learned this week about playback of and access to web archives. What were some of the terms or concepts that were new to you this week? What are some things that still need clarity for you? 118 | 119 | In at least one paragraph, describe what happened when you used the OldWeb.Today service. What combinations did you try to access, and how well did they work? Was this the first time you have used emulated software? What did you think of the process? What is the earliest you can remember accessing websites on the internet? What tools do you remember? What websites do you remember using? Finally, what are different uses for a web service like this in the web archiving space? 120 | 121 | Finally, in at least two paragraphs, discuss the End of Term Web Archive and what you learned about this collaborative collection. Who are some of the institutions involved with this effort? What websites did you try to access in the different terms' web archives? Were you successful in navigating to the different terms' content? Because this is a volunteer effort, there are some serious limitations on how users can access this content. Based on what you have learned this week and over the past few weeks in this course, what suggestions would you make to this effort for improving access to these web crawls?
122 | 123 | ### Class Engagement: 124 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 125 | 126 | If there are any unanswered questions, feel free to try to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 127 | -------------------------------------------------------------------------------- /modules/module-08-other-tools.md: -------------------------------------------------------------------------------- 1 | # Module Eight - Other Tools 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | The purpose of this module is to provide an overview of a number of tools for curation, characterization, or profiling (to analyze and understand the captured content), and tools to widen access to collections and raise awareness of web archives. 7 | 8 | This will build on concepts that you were introduced to in the previous technology modules. 9 | 10 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review. 11 | 12 | ### Objectives: 13 | 1. Learn about additional tools in the web archiving landscape. 14 | 2. Become familiar with some of the standards in development for enabling cross-archive access (Memento). 15 | 3. Experiment with ArchiveReady and familiarize yourself with the concept of archivability. 16 | 17 | 18 | ## Readings 19 | 20 | ### General 21 | 22 | The readings in this module are chosen to give you some exposure to different tools and systems that are being used around the world for different aspects of the web archive lifecycle. 23 | 24 | * International Internet Preservation Consortium (2020).
Session 3D: Main Concepts and Technologies: Other Tools 25 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3d-slides/ 26 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3d-notes/ 27 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen. 28 | 29 | ### Memento 30 | 31 | * Jones S.M., Klein M., Sompel H.V., Nelson M.L., Weigle M.C. (2021). Interoperability for Accessing Versions of Web Resources with the Memento Protocol. In: Gomes D., Demidova E., Winters J., Risse T. (eds) The Past Web. Springer, Cham. https://doi.org/10.1007/978-3-030-63291-5_9 32 | * Direct Link to UNT Libraries Access - https://link-springer-com.libproxy.library.unt.edu/chapter/10.1007/978-3-030-63291-5_9 33 | * Web Archiving Fundamentals pt 2: Memento - https://video.vt.edu/media/Web+Archiving+Fundamentals+pt+2A+Memento/1_5gvqowto 34 | * This is probably the best overview of what Memento actually is and how it can be used to improve access to web archives. 35 | * Memento Guide - Introduction to Memento - http://www.mementoweb.org/guide/quick-intro/ 36 | * About the Memento Project - http://mementoweb.org/about/ 37 | * Memento Project - https://en.wikipedia.org/wiki/Memento_Project 38 | * Memento, About the Time Travel Service - http://timetravel.mementoweb.org/about/ 39 | * Memento at the W3C - https://www.w3.org/blog/2016/08/memento-at-the-w3c/ 40 | * Coalition for Networked Information (2010). CNI: Memento: Time Travel for the Web - 41 | https://www.youtube.com/watch?v=ePBMn-_I1rU 42 | * I don't expect you to watch this whole video, but it is a very good deep dive into Memento. 43 | 44 | ### Webrecorder/Conifer 45 | 46 | Webrecorder (service is now called Conifer) 47 | 48 | * Conifer. (2019). Introduction to Webrecorder.io - getting started https://www.youtube.com/watch?v=yX2RrfNPQjg 49 | * Rhizome. (2020). Webrecorder.io is now Conifer.rhizome.org.
https://blog.conifer.rhizome.org/2020/06/11/webrecorder-conifer.html 50 | * Frequently Asked Questions - https://conifer.rhizome.org/_faq 51 | * Conifer Guide - https://guide.conifer.rhizome.org/ 52 | 53 | ### Archives Unleashed 54 | 55 | * Ruest, N., Lin, J., Milligan, I., & Fritz, S. (2020). The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. Association for Computing Machinery, New York, NY, USA, 157–166. https://doi.org/10.1145/3383583.3398513 56 | * UNT Libraries Access Link - https://dl-acm-org.libproxy.library.unt.edu/doi/pdf/10.1145/3383583.3398513 57 | * Archives Unleashed Overview - https://www.youtube.com/watch?v=nBwgM63MxY8 58 | * Project Website - https://archivesunleashed.org/ 59 | * The Archives Unleashed Toolkit - https://archivesunleashed.org/aut/ 60 | 61 | ### Seed Nomination Services 62 | * URL Nomination Tool - https://digital2.library.unt.edu/nomination/ 63 | * URL Nomination Tool Code (django-nomination) - https://github.com/unt-libraries/django-nomination 64 | * URL Nomination Tool Presentation - https://digital.library.unt.edu/ark:/67531/metadc287023/m2/ 65 | * Cobweb Service - https://cdlib.org/services/pad/webarchiving/cobweb/ 66 | * Cobweb Code - https://github.com/CobwebOrg/cobweb 67 | 68 | ### Other Crawling Services / Software Suites 69 | * Web Curator Tool - https://webcuratortool.org/ 70 | * Web Curator Tool Code - https://github.com/DIA-NZ/webcurator/wiki 71 | * Web Curator Tool Documentation - http://webcurator.sourceforge.net/ 72 | * Netarchive Suite - https://sbforge.org/display/NAS/NetarchiveSuite 73 | * Netarchive Suite Github - https://github.com/netarchivesuite 74 | 75 | 76 | ## Archiving Exercise 77 | 78 | ### Web Archiving Exercise - Archivability with ArchiveReady 79 | This week we will look at the concept of archivability as it pertains to Web Archives. 
80 | 81 | First, take a look at this paper by Banos, Kim, Ross, and Manolopoulos. 82 | 83 | Banos V., Kim Y., Ross S., Manolopoulos Y.: CLEAR: a credible method to evaluate website archivability, iPRES 2013, http://purl.pt/24107/1/iPres2013_PDF/CLEAR%20a%20credible%20method%20to%20evaluate%20website%20archivability.pdf 84 | 85 | You don't need to read the article in depth, but it is helpful for understanding the context of the work. 86 | 87 | Next, navigate to http://archiveready.com (sadly it is not an https service). 88 | 89 | ![Alt](images/module-08-archiveready-01.png "Homepage for the ArchiveReady service.") 90 | 91 | Get familiar with this service by browsing around the website. 92 | 93 | After you have looked around, choose the homepage of an organization or other website you want to test. 94 | 95 | Enter the website's URL into the provided box and choose "Check now". 96 | 97 | For this example, I chose to look at the Electric Reliability Council of Texas (ERCOT) website (https://ercot.com). 98 | 99 | ![Alt](images/module-08-archiveready-02.png "ArchiveReady results page for http://ercot.com") 100 | 101 | Look at the different results tabs to get an idea of the different metrics and archivability facets. 102 | 103 | In this week's discussion you will share the website you chose for this exercise. Additionally, you will share the Overall rating as well as a synthesis of the findings from the service. Any additional observations you find interesting about this service would be good to share as well. Finally, explain how you think this kind of service could be helpful in the web archiving lifecycle.
104 | 105 | ### Additional Readings about Archivability 106 | * Web Archivability Community Group - https://www.w3.org/community/webarchivability/ 107 | * Archivability - https://library.stanford.edu/projects/web-archiving/archivability 108 | * Banos V., Manolopoulos Y.: A quantitative approach to evaluate Website Archivability using the CLEAR+ method, International Journal on Digital Libraries, 2015, https://doi.org/10.1007/s00799-015-0144-4 109 | * UNT Direct Link - https://www-proquest-com.libproxy.library.unt.edu/docview/1785958458?pq-origsite=summon 110 | 111 | ## Exploring Web Archives 112 | 113 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 114 | 115 | This week we will look at the Portugal National Web Archive (https://arquivo.pt/?l=en). 116 | 117 | ![Alt](images/module-08-arquivo.png "Homepage for Arquivo.pt") 118 | 119 | What is Arquivo.pt - https://sobre.arquivo.pt/en/help/what-is-arquivo-pt/ 120 | 121 | Examples of preserved pages - https://sobre.arquivo.pt/en/examples/examples/ 122 | 123 | Exhibitions - https://sobre.arquivo.pt/en/examples/collections/ 124 | 125 | YouTube Channel for Arquivo.pt - https://www.youtube.com/channel/UCEMJX0ICk1t2TzuNXghxKDg 126 | 127 | One thing I would like you to notice is the different ways of presenting web archives. The style of the Wayback interface that they present is different from others we have seen so far in this course. A description of the differences would be useful in your discussion post. The service also presents some examples and exhibits to help users get into the web archives a little better. How well does this work in your opinion? What did you end up exploring in the archive? How well does arquivo.pt present content in Portuguese and English?
Have you seen other interfaces in multiple languages in our web archive exploration so far? 128 | 129 | It should come as no surprise that the web archive focusing on the .pt domain and the national web of Portugal might not have the same content we are used to seeing here in the United States. If you have trouble thinking about what to look at in the archive, consider finding the URL of a city in Portugal, a local sports team, or a cultural event in Portugal and exploring what has been archived. 130 | 131 | ## Discussion 132 | 133 | ### Discussion Post: 134 | In at least one paragraph, discuss what you learned this week about other tools and services in the web archive landscape. What were some of the terms or concepts that were new to you this week? What are some things that still need clarity for you? How do you think a protocol like Memento and its associated technologies can benefit the web archive landscape? 135 | 136 | In at least one paragraph, describe what happened when you used the ArchiveReady service. What was the website you chose for this exercise? What was the Overall rating that this website received? Discuss the findings from the service in addition to this overall score. Any additional observations you find interesting about this service would be good to share as well. Finally, explain how you think this kind of service could be helpful in the web archiving lifecycle. 137 | 138 | Finally, in at least two paragraphs, discuss the Portugal National Web Archive arquivo.pt and what you learned about this web archive. What are some differences you noticed in the presentation of web archives in this service compared to collections we have looked at in previous weeks? Share your observations of the examples and the exhibitions, and share your opinion about whether they helped you explore the collection. Finally, what are some of the websites that you explored in arquivo.pt?
139 | 140 | ### Class Engagement: 141 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 142 | 143 | If there are any unanswered questions, feel free to try to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 144 | 145 | 146 | -------------------------------------------------------------------------------- /modules/module-09-collection-policies.md: -------------------------------------------------------------------------------- 1 | # Module Nine - Collection Policies 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | 7 | The purpose of this module is to show how collection policies, collection scopes, and general selection activities take place when building a web archive. Web archiving is, at its lowest level, another tool for building collections in libraries and archives. In order to communicate the scope of these collections to others, some sort of collection scope or policy statement can be useful. 8 | 9 | This will build on concepts that you were introduced to in the technology modules and start to align them with other concepts in the library and archives space. 10 | 11 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review. 12 | 13 | ### Objectives: 14 | 1. Familiarize yourself with different web archive collection policies. 15 | 2. Understand the role that web archives can play in the collecting and acquisition of materials in libraries and archives. 16 | 3. Install and use the ArchiveWeb.page tool for creating web collections locally.
17 | 18 | ## Readings 19 | 20 | ### General 21 | 22 | The readings this week were selected to give you an introduction to the collection development approaches that are common when building collections with web archiving tools and techniques. These readings combine theory and practice in this space. In the Murray & Hsieh piece, you will be introduced to the framework we will use for the next major assignment. 23 | 24 | * International Internet Preservation Consortium (2020). _Session 7: Writing a Web Archiving Policy_ 25 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-7-slides/ 26 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-7-notes/ 27 | * Review these slides and speaker notes. I suggest reading the notes while you have the slides open in another part of the screen. 28 | * International Internet Preservation Consortium (2020). _IIPC Training Video Case Study, Topic 5: Web Archiving Collecting Policies_ 29 | https://www.youtube.com/watch?v=-NxJXrUTJ8A 30 | * Post, C. (2017). Building a Living, Breathing Archive: A Review of Appraisal Theories and Approaches for Web Archives. Preservation, Digital Technology & Culture, 46(2), 69-77. https://doi.org/10.1515/pdtc-2016-0031 31 | * UNT Libraries Direct Link - https://libproxy.library.unt.edu/login?url=https://www.proquest.com/docview/1940603266 32 | * Free online version - http://libres.uncg.edu/ir/uncg/f/C_Post_Building_2017.pdf 33 | * Summers, E., & Punzalan, R. (2017). Bots, Seeds and People: Web archives as infrastructure. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 821-834. https://doi.org/10.1145/2998181.2998345 34 | * UNT Libraries Direct Link - https://dl-acm-org.libproxy.library.unt.edu/doi/10.1145/2998181.2998345 35 | * Free arXiv Version - https://arxiv.org/abs/1611.02493 (pdf: https://arxiv.org/pdf/1611.02493.pdf) 36 | * Read Crawl Modalities in Findings p.
825-826 (or pp. 5-6 in the preprint) 37 | * Ward, E. (2018). Archiving the Web @EBRPL: Creating and following a web collecting policy in a public library. https://archive-it.org/blog/post/archiving-the-web-ebrpl-creating-and-following-a-web-collecting-policy-in-a-public-library/ 38 | * Murray, K., & Hsieh, I. (2006). Collection Planning Guidelines. https://digital.library.unt.edu/ark:/67531/metadc33006/ 39 | * https://digital.library.unt.edu/ark:/67531/metadc33006/m2/1/high_res_d/cpg_final_31may2006.pdf 40 | * Pages 18-40 will be revisited in the next major assignment. 41 | 42 | ### Web Archive Collection/Collecting Policies 43 | 44 | Below is a selection of policies for web archive collections at different institutions around the US. This is only a sample of those that are easily identifiable. Notice the differences in scope and breadth across the plans. In this week's discussion, you will select one of these to describe or, ideally, find a policy from another institution not listed here. 45 | 46 | * Columbia University Libraries - https://library.columbia.edu/collections/web-archives/policies.html 47 | * J.
Paul Getty Trust - https://archives.getty.edu/getty_images/digitalresources/PublicWebArchivesCollectingPolicy.pdf 48 | * Montana State University - Web Archives Policies and Procedures - https://lib.utsa.edu/specialcollections/sites/specialcollections/files/2020-09/WebArchives_Policy_2020-08-20.pdf 49 | * NCSU Web Archiving - https://ncsu-libraries.github.io/web-archiving-docs/ 50 | * Purdue University - https://www.lib.purdue.edu/sites/default/files/spcol/purdue-archives-web-archiving-policy.pdf 51 | * Stanford University Library - https://library.stanford.edu/projects/web-archiving/collection-development 52 | * University of California San Francisco - https://www.library.ucsf.edu/archives/ucsf/web/policy/ 53 | * University of Chicago Web Archive Collection - https://www.lib.uchicago.edu/e/scrc/findingaids/view.php?eadid=ICU.SPCL.UCWEB 54 | * UT San Antonio Web Archives Policy - https://lib.utsa.edu/specialcollections/sites/specialcollections/files/2020-09/WebArchives_Policy_2020-08-20.pdf 55 | * Virginia Memory - https://www.virginiamemory.com/collections/web_archives/guidelines 56 | 57 | International Internet Preservation Consortium - Collection Development Policies - https://netpreserve.org/web-archiving/collection-development-policies/ 58 | 59 | ## Archiving Exercise 60 | 61 | ### ArchiveWeb.page 62 | 63 | You can complete this exercise either by installing a free extension from the Chrome Web Store (https://chrome.google.com/webstore/) or by installing the desktop version of the ArchiveWeb.page application. This tool will let you create local web archives of websites and access these collections. Because this involves installing software on your computer, it might be a bit more involved than previous exercises.
64 | 65 | Here is a video that gives an overview of why this tool exists - 66 | https://www.youtube.com/watch?v=hPcwDoDfhmo 67 | 68 | 69 | The easiest way to work with this tool is with a modern version of the Chrome (https://www.google.com/chrome/) browser. If you can't install the Chrome browser, you can download a desktop version of ArchiveWeb.page instead. 70 | 71 | Next, navigate to https://archiveweb.page/ 72 | 73 | ![Alt](images/module-09-awp-01.png "Homepage for Archiveweb.page") 74 | 75 | 76 | This video by Ilya Kreymer gives a nice overview of the process - 77 | https://www.youtube.com/watch?v=AP6wucoqJw0&t=1067s 78 | 79 | You can also look at the provided guide - https://archiveweb.page/guide 80 | 81 | ### Activity 82 | 83 | The goal of this exercise is to experiment with this tool and try to archive some web content on your own computer. This tool covers both the capture and playback pieces of the web archiving workflow. It is an interactive tool that records the pages you browse in a Chrome tab, and it can even capture some content automatically using its "Autopilot" feature. 84 | 85 | Try recording a website you are familiar with. I would suggest picking an organizational, governmental, or other website and steering clear of social media sites as you start. This makes it a bit easier to see how things are getting captured. If you like, you can go back and try capturing social media sites as well; they are just not the best place to begin. Try creating a collection, capturing some content, and then downloading that content to your local machine. 86 | 87 | In the discussion this week, you will report on your success with this tool and share which website you captured and how well the tool worked. Were you able to download the web archive to your local machine? What did you think about this experience compared to others?
What are your observations about how this is similar to or different from the other hosted web archiving tools you have used in previous exercises? 88 | 89 | ## Exploring Web Archives 90 | 91 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 92 | 93 | This week we will look at the UK Web Archive. 94 | 95 | ![Alt](images/module-09-ukwa.png "Homepage for the UK Web Archive.") 96 | 97 | Start by navigating over to the UK Web Archive - https://www.webarchive.org.uk/ 98 | 99 | UK Web Archive: About us - https://www.webarchive.org.uk/en/ukwa/about 100 | 101 | Topics and Themes - https://www.webarchive.org.uk/en/ukwa/collection 102 | 103 | UK Web Archive: GitHub repositories - https://github.com/ukwa/ 104 | 105 | The UK Web Archive, like arquivo.pt from last week, is a web archive focused primarily on a national domain. 106 | 107 | In this case, the UK Web Archive focuses on the countries that make up the United Kingdom. 108 | 109 | One thing you should try is exploring the different topics and themes to get a better idea of the websites that are included. 110 | 111 | Another feature of this web archive that you should try is word or phrase searching. 112 | 113 | Many of you have noted in other discussions in this course that search would be nice. Again, as with arquivo.pt, we are starting to see search as another way of accessing content in web archives. 114 | 115 | ## Discussion 116 | 117 | ### Discussion Post: 118 | In at least one paragraph, discuss what you learned this week about collection policies for web archives. How familiar were you with collection description or scope statements in the past? What is the value of having collection policies for web archives?
119 | 120 | In their 2017 article, Summers and Punzalan describe different modalities for crawling, including domain, website, topical, event-based, and document crawls. In at least one paragraph, select one of these modalities, describe it in your own terms, and give an example of a type of web archive collection that could fit this modality. The example could come from your previous explorations in class or could be a web archive collection that could be created. 121 | 122 | In one paragraph, describe what you found in a web archive collection policy. You can choose one from this week's readings or, if you want a virtual fist bump from me when I grade, find an example of a policy or collection scope that isn't listed in the readings. Some examples of things you might comment on are the audience of the document, its structure, how detailed or broad it is, or anything else you noticed when looking at it. How does the document you identified assist in the web archiving process? 123 | 124 | In at least one paragraph, describe what happened when you used the ArchiveWeb.page tool. Did you have any challenges getting it installed and working? What sites did you try to capture? How well did they work for you? Were you able to download the resulting archive file? What kind of file downloaded? What did you think about this experience compared to others? What are your observations about how this is similar to or different from the other hosted web archiving tools you have used in previous exercises? 125 | 126 | Finally, in at least two paragraphs, discuss the UK Web Archive (UKWA) and what you learned about this web archive. What are some differences you noticed in the presentation of web archives in this service compared to collections we have looked at in previous weeks? Share your observations of the topics and themes feature, and share your opinion on whether it helped in exploring the collection.
Finally, what are some of the websites that you explored in the UKWA? 127 | 128 | ### Class Engagement: 129 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 130 | 131 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 132 | 133 | -------------------------------------------------------------------------------- /modules/module-10-metadata.md: -------------------------------------------------------------------------------- 1 | # Module Ten - Metadata 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | 7 | The purpose of this module is to explore the use of metadata to describe web archives. Metadata provides description and navigational aids for many types of digital collections, and web archives are one of the types of digital resources that benefit from it. Metadata can be applied at different levels within a web archive itself, and this module will discuss different approaches to metadata use in web archives. 8 | 9 | This will build on concepts discussed in the collection policies module and will be important in the final project for this course. 10 | 11 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review. 12 | 13 | ### Objectives: 14 | 15 | 1. Familiarize yourself with different approaches to applying metadata to web archives. 16 | 2. Become familiar with Dublin Core metadata as it applies to web archives. 17 | 3. Explore the Memento Time Travel Service for accessing multiple web archives in a single interface. 18 | 19 | ## Readings 20 | 21 | ### General 22 | 23 | The readings this week were selected to give you an introduction to different approaches for applying metadata to web archives.
There is a wide range of options for metadata and web archives, and depending on the scope, size, and organization of your web archiving program, one or more approaches might be in place. 24 | 25 | * Bragg, M., & Hanna, K. (2013). The Web Archiving Life Cycle Model. http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf 26 | * pp. 20-21 (Metadata and Description) 27 | * Dooley, J., Farrell, K., Kim, T., & Venlet, J. (2017). Developing Web Archiving Metadata Best Practices to Meet User 28 | Needs. Journal of Western Archives, 8(2). https://doi.org/10.26077/cffd-294a 29 | 30 | ### OCLC Web Archive Metadata 31 | 32 | * Dooley, J. (2016). Slam bam WAM: Wrangling best practices for web archiving metadata - https://hangingtogether.org/slam-bam-wam-wrangling-best-practices-for-web-archiving-metadata/ 33 | * Dooley, J., & Bowers, K. (2018). Descriptive Metadata for Web Archiving. OCLC Research. https://www.oclc.org/research/publications/2018/oclcresearch-descriptive-metadata.html 34 | * Skim this publication. 35 | * Web Archiving Metadata Working Group - OCLC Research - https://www.oclc.org/research/areas/research-collections/wam.html 36 | Project website. 37 | * OCLC Research (2018). Outcomes from the OCLC Research Library Partnership Web Archiving Metadata Working Group 38 | https://www.youtube.com/watch?v=xTR8RK3t2jU 39 | * A nice overview of the work. 40 | 41 | ### Other Metadata Resources 42 | 43 | * UNC University Archives (2013). University of North Carolina at Chapel Hill University Archives Collected Websites, 2012-2021 https://finding-aids.lib.unc.edu/40417/ 44 | * New York Art Resources Consortium (2018). Metadata Application Profile for Description of Websites with Archived Versions Version 2. https://web.archive.org/web/20200702210617/https://www.nyarc.org/sites/default/files/web-archiving-profile-version2.pdf 45 | * Skim this publication. 46 | * Archive-It. (2021).
Add, edit, and manage your metadata. https://support.archive-it.org/hc/en-us/articles/208332603-Add-edit-and-manage-your-metadata 47 | * Venlet, J. (2018). Behind the Scenes: Describing Archived Websites - https://blogs.lib.unc.edu/uarms/2018/05/23/describing-archived-websites/ 48 | * Formenton, D., & Gracioso, L. (2022). Metadata standards in web archiving technological resources for ensuring the digital preservation of archived websites. RDBCI: Digital Journal of Library and Information Science, 20. https://doi.org/10.20396/rdbci.v20i00.8666263 49 | 50 | ## Archiving Exercise 51 | 52 | ### Web Archiving Exercise - Time Travel with Memento 53 | 54 | This week we will look in depth at the Time Travel service, which helps you discover Mementos from web archiving programs around the world. 55 | 56 | We learned about Memento in Module Eight - Other Tools, so if you would like to review the Memento Protocol or the specifics of TimeGates and TimeMaps, that is a good place to start. Here is a quick overview of the different components for your review: http://www.mementoweb.org/guide/quick-intro/ 57 | 58 | The Time Travel Service is an easy way to see what these protocol and infrastructure components can enable once they have been implemented. 59 | 60 | First, navigate to http://timetravel.mementoweb.org/ 61 | 62 | ![Alt](images/module-10-time-travel-01.png "Homepage for Memento Time Travel Service") 63 | 64 | You can learn more about the service by navigating to the about page - http://timetravel.mementoweb.org/about/ 65 | 66 | You can then enter a URL you want to look at and the date and time you are interested in viewing. I've decided to look at https://unt.edu back in January of 2002.
http://timetravel.mementoweb.org/list/20020124170643/http://unt.edu 67 | 68 | ![Alt](images/module-10-time-travel-02.png "Time Travel Service for http://unt.edu in January, 2002") 69 | 70 | You can see the different web archives that have Mementos nearest to the time I am interested in. The service also tells you how far each Memento is from the requested date. In this example, the closest Memento is from 1 hour before my requested time of 5:06:43 PM on January 24, 2002. 71 | 72 | Try out some different URLs and times in the service. What were the archives that you saw the most in the results? What was the closest to your requested time that you saw? What was the furthest away? For example, the furthest-away Memento for the search above was 11 years 290 days after the requested time, in Arquivo.pt. Explore some of the different web archive instances that are listed and see if you notice the differences in the Mementos. Why is a service like this useful? What are some situations where having more precise knowledge of when a web archive harvested the content it provides comes into play? 73 | 74 | In this week's discussion, you will report on your experiments with this tool. 75 | 76 | ## Exploring Web Archives 77 | 78 | ### Trove - Australian Web Archive 79 | 80 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 81 | 82 | This week we will look at the Trove archive in Australia. 83 | 84 | Trove is an aggregation platform for libraries, universities, museums, and galleries across Australia. Trove is maintained by the National Library of Australia.
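The Time Travel list URL used in this module's exercise, http://timetravel.mementoweb.org/list/20020124170643/http://unt.edu, carries the requested date as a 14-digit `YYYYMMDDhhmmss` timestamp, and Trove's archived page URLs, such as https://webarchive.nla.gov.au/awa/20140313113916/http://www.gbrmpa.gov.au/, use the same convention. The short Python sketch below shows one way to pull that timestamp apart and rebuild a Time Travel URL. It is an illustration only, not code from either service: the helper names are hypothetical, it assumes the timestamp always sits between a fixed path marker (`/list/` or `/awa/`) and the original URL, and it treats the timestamp as UTC.

```python
from datetime import datetime, timezone

# 14-digit archive timestamp layout: YYYYMMDDhhmmss
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_archive_timestamp(ts: str) -> datetime:
    """Turn a 14-digit timestamp into a datetime (assumed here to be UTC)."""
    if len(ts) != 14 or not ts.isdigit():
        raise ValueError(f"expected a 14-digit timestamp, got {ts!r}")
    return datetime.strptime(ts, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def split_archive_url(url: str, marker: str) -> tuple[str, datetime]:
    """Split '<prefix><marker><timestamp>/<original URL>' into (original URL, datetime).

    Hypothetical helper: assumes the marker (e.g. '/list/' or '/awa/') appears
    once, immediately before the timestamp.
    """
    _, found, rest = url.partition(marker)
    if not found:
        raise ValueError(f"marker {marker!r} not found in {url!r}")
    # Everything before the first '/' is the timestamp; the rest is the original URL.
    ts, _, original = rest.partition("/")
    return original, parse_archive_timestamp(ts)


def time_travel_list_url(original: str, when: datetime) -> str:
    """Build a Time Travel list URL in the same shape as the example above."""
    return f"http://timetravel.mementoweb.org/list/{when.strftime(TIMESTAMP_FORMAT)}/{original}"


if __name__ == "__main__":
    original, when = split_archive_url(
        "http://timetravel.mementoweb.org/list/20020124170643/http://unt.edu", "/list/"
    )
    print(original, when.isoformat())
    # prints: http://unt.edu 2002-01-24T17:06:43+00:00
```

Because the parsed values are `datetime` objects, subtracting a requested time from a Memento's time gives a `timedelta`, which is similar in spirit to the "1 hour before" style distances the Time Travel results report.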
85 | 86 | https://webarchive.nla.gov.au/collection 87 | 88 | 89 | 90 | ![Alt](images/module-10-trove-01.png "Homepage for Trove in Australia") 91 | 92 | https://trove.nla.gov.au/help/categories/websites-category 93 | 94 | There are two ways of interacting with Trove and its web archive. The first is to use the Sub Collections listed on the main page. You can browse down into sub collections and locate examples of websites held in the collection. 95 | 96 | Another option is to use the search feature at the top left of the page. I did a quick, broad search for 'barrier reef': https://trove.nla.gov.au/search/category/websites?keyword=barrier%20reef 97 | 98 | ![Alt](images/module-10-trove-02.png "Search for 'barrier reef' in Trove.") 99 | 100 | Picking the first result displays this page: https://webarchive.nla.gov.au/awa/20140313113916/http://www.gbrmpa.gov.au/ 101 | 102 | ![Alt](images/module-10-trove-03.png "Archived page for the Great Barrier Reef Marine Park Authority in the Trove system.") 103 | 104 | Take some time to explore this web archive and the different ways it presents information. In this week's discussion, you will talk about some of the things that you find and give your observations on the interface and how it presents information differently from the other web archives we have seen so far in class. 105 | 106 | ## Discussion 107 | 108 | ### Discussion Post: 109 | In at least one paragraph, discuss what you learned this week about metadata for web archives. How familiar were you with metadata in general before this week? What are some of the concepts that were new to you in relation to metadata? Discuss some of the different "levels" that metadata can describe in a web archive, such as a seed, site, collection, or document. 110 | 111 | In one paragraph, give a short sales pitch for the tool or service that you described in the Web Archive Tool Critique assignment from a few weeks ago.
Give an overview of the tool, what it is trying to accomplish, and anything you would recommend to your fellow students about the tool. 112 | 113 | In at least one paragraph, describe what happened when you used the Time Travel Service from the Memento team. What were the archives that you saw the most in the results? What was the closest to your requested time that you saw? What was the furthest away? Why is a service like this useful? What are some situations where having more precise knowledge of when a web archive harvested the content it provides comes into play? Any other thoughts or observations about this tool would be great to mention. 114 | 115 | Finally, in at least two paragraphs, discuss the Trove system and the Australian Web Archive. What are some differences you noticed in the presentation of web archives in this service compared to collections we have looked at in previous weeks? Share your observations of the Sub Collections feature, and share your opinion on whether it helped in exploring the collection. Finally, what are some of the websites that you explored in Trove? How well did the search function work for you? 116 | 117 | ### Class Engagement: 118 | 119 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 120 | 121 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 122 | -------------------------------------------------------------------------------- /modules/module-11-quality-assurance.md: -------------------------------------------------------------------------------- 1 | # Module Eleven - Quality Assurance 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | The purpose of this module is to explore quality assessment in web archives.
As we have seen so far in our course, there are many times when the replay of content does not match the original site. This can have many causes. Was the content within the crawl scope? Was there an issue extracting the URL for the content? Were there issues harvesting the content? Are the issues related to playback? Is the content in the web archive but unavailable because of a long chain of redirects, as we see with YouTube content? There are many reasons that quality may not meet expectations in a web archive, and this module will give you an introduction to the main concepts. 7 | 8 | This will build on concepts discussed in Module Nine - Collection Policies and Module Ten - Metadata, and will be important in the final project for this course. 9 | 10 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review. 11 | 12 | ### Objectives: 13 | 1. Familiarize yourself with quality assessment in web archives. 14 | 2. Identify common issues found in web archives and their playback. 15 | 3. Explore the use of the ReplayWeb.page service. 16 | 17 | ## Readings 18 | 19 | The readings this week were selected to give you an introduction to quality assessment in web archives. This module will introduce common quality control problems that can occur in the web archiving space, along with approaches that can be used to combat these problems. 20 | 21 | ### Overview of the Challenges 22 | * Brown, A. (2006). Archiving websites: A practical guide for information management professionals. 23 | * Chapter 5, Quality assurance and cataloging, pp. 69-81 24 | * Click on the title at this link - https://scholar.google.com/citations?view_op=view_citation&hl=en&user=gZuRr94AAAAJ&citation_for_view=gZuRr94AAAAJ:Se3iqnhoufwC I was able to get to the 5th chapter via Google Books when I went through this URL. Let me know if this doesn't work for you.
25 | * Available at the Discovery Park Library - https://discover.library.unt.edu/catalog/b3062630 26 | * Please do not attempt to purchase this chapter. I can work with you to get access. It is a good resource but kind of a pain to get access to. 27 | * Bragg, M., & Hanna, K. (2013). The Web Archiving Life Cycle Model. http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf 28 | * pp. 26-27 (Quality Assurance and Analysis) 29 | * Reyes, B., Phillips, M. E., & Ko, L. (2014). Current Quality Assurance Practices in Web Archiving. https://digital.library.unt.edu/ark:/67531/metadc333026/ 30 | * Reyes, B., McDevitt, J., Sun, J., & Liu, X. (2020). Quality Matters: A New Approach for Detecting Quality Problems in Web Archives. Proceedings of the Annual Conference of CAIS. https://doi.org/10.29173/cais1145 31 | 32 | ### Blog Posts 33 | * Not All Websites Are Made Equal (Or Friendly): Archiving ephemeral art content on the web - https://archive-it.org/blog/post/not-all-websites-are-made-equal-or-friendly-archiving-ephemeral-art-content-on-the-web/ 34 | * Hockx-Yu, H. (2012). How good is good enough? - Quality Assurance of harvested web resources. https://britishlibrary.typepad.co.uk/webarchive/2012/10/how-good-is-good-enough-quality-assurance-of-harvested-web-resources.html 35 | * UK Web Archive blog. (2017). The Challenges of Web Archiving Social Media - http://blogs.bl.uk/webarchive/2017/04/the-challenges-of-web-archiving-social-media.html 36 | 37 | ### Other Reading 38 | * Reyes, B. (2018). A Grounded Theory of Information Quality in Web Archives. 39 | * Dissertation - https://digital.library.unt.edu/ark:/67531/metadc1248497/ 40 | * Defense Slides - https://digital.library.unt.edu/ark:/67531/metadc1181153/ 41 | * Archive-It. (n.d.). Scoping crawls for specific types of sites. https://support.archive-it.org/hc/en-us/sections/201841373-Scoping-crawls-for-specific-types-of-sites 42 | * Archive-It. (n.d.). Quality Assurance Overview.
https://support.archive-it.org/hc/en-us/articles/208333833-Quality-Assurance-Overview 43 | * Library of Congress. (n.d.). Creating Preservable Websites. https://www.loc.gov/programs/web-archiving/for-site-owners/creating-preservable-websites/ 44 | * Marill, J., Boyko, A., Ashenfelder, M., & Jones, G. (2004). Web Harvesting Survey. https://digital.library.unt.edu/ark:/67531/metadc1457765/ 45 | 46 | ## Archiving Exercise 47 | 48 | ### Web Archiving Exercise - ReplayWeb.page 49 | 50 | This week we are going to look at one of the tools in the suite of tools being developed by the Webrecorder group. 51 | 52 | This exercise will build on the work that you did in the Module Nine Archiving Exercise, where you looked at the ArchiveWeb.page tool. 53 | 54 | The ReplayWeb.page service is designed to give you access to the contents of common web archive files directly in your browser. 55 | 56 | Start by navigating to https://replayweb.page/ 57 | 58 | ![Alt](images/module-11-replayweb-01.png "Homepage for Replayweb.page") 59 | 60 | You will load a WARC or WACZ file into this service and investigate the contents of that web archive container file. 61 | 62 | You can also visit the documentation pages for this site - https://replayweb.page/docs/ 63 | 64 | I would prefer that you try to use a .warc or .wacz file that you have created with ArchiveWeb.page. You can use the content you collected previously if you happened to save that file, or you can quickly build another small archive. 65 | 66 | If you can't use your own content, there are a few example files available here - https://replayweb.page/docs/examples 67 | 68 | Here are some example screenshots from my session where I loaded a .wacz file and looked at the contents of that file. 69 | 70 | ![Alt](images/module-11-replayweb-02.png "Loading a file in ReplayWeb.page") 71 | 72 | 73 | And here is the view after selecting the first webpage on that screen to look at it in detail.
74 | 75 | ![Alt](images/module-11-replayweb-03.png "Viewing a file in ReplayWeb.page") 76 | 77 | 78 | In the discussion this week, I would like to hear your observations on this tool and on interacting with these files after you have created them. What did you think about the tool? Did you have any trouble with it? 79 | 80 | ### Bonus activity: 81 | 82 | Did you know that the .wacz format is actually just a standard Zip file? It is based on this specification, https://webrecorder.github.io/wacz-spec/1.2.0/, which is currently being developed. If you change your file's name from something like webarchive.wacz to webarchive.zip, you will most likely be able to open it on your computer. If you do this bonus, it would be great to see what you found inside the WACZ file. Note: a WARC file does not work the same way. 83 | 84 | ## Exploring Web Archives 85 | 86 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 87 | 88 | This week we will look at Common Crawl. 89 | 90 | From their website's main page: "We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone." 91 | 92 | https://commoncrawl.org/ 93 | 94 | ![Alt](images/module-11-commoncrawl-01.png "Homepage for Common Crawl") 95 | 96 | Explore the website to learn more about this project. 97 | 98 | The main component of this project that people interact with is the data. 99 | 100 | Take a look at the "getting started" page. https://commoncrawl.org/the-data/get-started/ 101 | 102 | Pick one of the monthly crawls and explore what kinds of files are made available. Are any of these formats familiar? Are any of them new to you?
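One format you will likely recognize in the Common Crawl listings is WARC, the same container you can download from ArchiveWeb.page and load into ReplayWeb.page. Each WARC record starts with a version line and a block of `Name: value` headers, followed by a blank line and then the record's content; Common Crawl's WAT (metadata) and WET (extracted text) files are, to my understanding, packaged as WARC records too. The Python sketch below illustrates that layout under simplifying assumptions: a single, uncompressed record with CRLF line endings and no folded header lines. The function name `parse_warc_record` and the sample record are made up for demonstration and are not part of any Common Crawl tooling.

```python
def parse_warc_record(record: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split one uncompressed WARC record into (version line, headers, content block).

    Simplified sketch: assumes CRLF line endings, a single record, no folded
    (continuation) header lines, and case-sensitive header names.
    """
    # The header block ends at the first blank line (CRLF CRLF).
    head, sep, rest = record.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("record has no blank line after its headers")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]  # e.g. "WARC/1.0"
    if not version.startswith("WARC/"):
        raise ValueError(f"not a WARC record: {version!r}")
    headers: dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in header values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the size of the content block after the headers.
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


# A made-up "resource" record used only for illustration.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-Date: 2023-01-27T12:00:00Z\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)

if __name__ == "__main__":
    version, headers, block = parse_warc_record(SAMPLE)
    print(version, headers["WARC-Type"], headers["WARC-Target-URI"], block)
    # prints: WARC/1.0 resource http://example.com/ b'hello'
```

Real Common Crawl files are gzip-compressed and hold many records in sequence, so for actual data a maintained library such as warcio is a better fit than a hand-rolled parser like this one.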
103 | 104 | Take the time to look at the list of projects that have made use of Common Crawl data - https://commoncrawl.org/the-data/examples/ 105 | 106 | For this week's discussion, I would like to hear about what you found with Common Crawl. I would also like you to identify one project using Common Crawl data that you found interesting and describe it to the rest of your classmates. Don't forget to include links or citations so we can see what you are looking at. 107 | 108 | ## Discussion 109 | 110 | ### Discussion Post: 111 | 112 | In at least one paragraph, discuss what you learned this week about quality assurance or assessment for web archives. What are some of the common issues that can crop up when archiving websites and cause quality problems? Which of the readings resonated with you the most this week? 113 | 114 | In at least one paragraph, describe what happened when you used the ReplayWeb.page service. Were you able to successfully load your file from a few weeks ago? If not, did you look at any of the example files? What are your overall observations of this tool? What future uses can you see for the tool? Were you able to open a WACZ file and peek inside? What did you find inside? 115 | 116 | Finally, in at least two paragraphs, discuss what you learned about Common Crawl this week. What are its goals? How was it started? What services does it provide? Which monthly crawl did you look at? What kinds of formats did they provide access to? Were they all familiar to you? 117 | 118 | In the second Common Crawl paragraph, which project that uses Common Crawl data did you find interesting? Describe what the project is about in a couple of sentences. Include links to the project so that we can follow along with your description.
119 | 120 | 121 | ### Class Engagement: 122 | 123 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 124 | 125 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it? 126 | -------------------------------------------------------------------------------- /modules/module-12-research.md: -------------------------------------------------------------------------------- 1 | # Module Twelve - Research with Web Archives 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | 7 | The purpose of this module is to explore some of the ways that web archives can be used to answer research questions. You will look at some examples of research that seeks to answer questions about the field of web archives, as well as the broader use of web archives to answer questions in different disciplines. 8 | 9 | There are several readings, some online documentation to skim, a few videos to watch, and several PowerPoint presentations that you will review. 10 | 11 | ### Objectives: 12 | 13 | 1. Familiarize yourself with different kinds of research that take place with web archives. 14 | 2. Learn ways of identifying how web archives are being used in research. 15 | 3. Gain experience identifying research that utilizes web archives. 16 | 17 | ## Readings 18 | 19 | The readings this week were selected to give you an introduction to different kinds of research being done with web archives. You will also look at how different web archives are working to enable access for researchers, and at different strategies that are in place for working with web archives in research projects.
20 | 21 | This week's subject could easily fill a whole semester as we look at the kinds of research conducted with web archives and how web archives are working to facilitate that research. I've tried to pick some examples of each and have also provided a number of websites for different initiatives in this space. 22 | 23 | ### Examples of Research with Web Archives 24 | 25 | * Ben-David, A. (2019). Web Archives as Memoryware: Critical reflections on sources and methods for web history. International Internet Preservation Consortium's Web Archiving Conference 2019, Zagreb, Croatia. 26 | https://www.youtube.com/watch?v=2kRC2X88kF4 27 | * While a bit on the longer side, I thought this was one of the most interesting talks I've heard in a long time. The volume is low at the beginning, but it improves after a few minutes when the microphone is moved. 28 | * Brügger, N. (2009). Website history and the website as an object of study. New Media & Society, 11(1–2), 115–132. https://doi.org/10.1177/1461444808099574 29 | * UNT Libraries direct link - https://journals-sagepub-com.libproxy.library.unt.edu/doi/pdf/10.1177/1461444808099574 30 | * Milligan, I. (2016). Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives. International Journal of Humanities and Arts Computing, 10(1), 78-94. https://doi.org/10.3366/ijhac.2016.0161 31 | * University of Waterloo's Institutional Repository Preprint - https://uwspace.uwaterloo.ca/handle/10012/10322 32 | * Webster, P. (2021). Digital archaeology in the web of links: Reconstructing a late-1990s web sphere. In D. Gomes, E. Demidova, J. Winters, & T. Risse (Eds.), The Past Web (pp. 155-164). https://doi.org/10.1007/978-3-030-63291-5_12 33 | * UNT Libraries direct link - https://link-springer-com.libproxy.library.unt.edu/content/pdf/10.1007/978-3-030-63291-5_12.pdf 34 | * Ben-David, A. (2021). Critical web archive research. In D. Gomes, E. Demidova, J. Winters, & T. Risse (Eds.), The Past Web (pp. 181-188).
https://doi.org/10.1007/978-3-030-63291-5_14 35 | * UNT Libraries direct link - https://link-springer-com.libproxy.library.unt.edu/content/pdf/10.1007/978-3-030-63291-5_14.pdf 36 | 37 | ### Working with Researchers 38 | 39 | * Zierau, E., & Moldrup-Dalum, P. (2021). Making web collections for research sustainable & reusable: Possibilities and challenges experienced. International Internet Preservation Consortium's Web Archiving Conference 2021. 40 | https://www.youtube.com/watch?v=8DGsyEylnM4 41 | * Ruest, N., Lin, J., Milligan, I., & Fritz, S. (2020). The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives. Joint Conference on Digital Libraries. https://doi.org/10.1145/3383583.3398513 42 | * Also available as a preprint from arXiv - https://arxiv.org/abs/2001.05399 43 | * Lin, J., Milligan, I., Wiebe, J., & Zhou, A. (2017). Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives. Journal on Computing and Cultural Heritage. 10(4).
1-30 https://doi.org/10.1145/3097570 44 | * UNT Libraries direct link - https://dl-acm-org.libproxy.library.unt.edu/doi/10.1145/3097570 45 | 46 | ### Initiatives 47 | * Archive-It Research Services - https://support.archive-it.org/hc/en-us/articles/209671666-Introduction-to-Archive-It-Research-Services-ARS- 48 | * Web Archive Transformation (WAT) files - https://support.archive-it.org/hc/en-us/articles/360039686611 49 | * Web Archive Named Entities (WANE) files - https://support.archive-it.org/hc/en-us/articles/360039691351 50 | * Longitudinal Graph Analysis (LGA) files - https://support.archive-it.org/hc/en-us/articles/360039291992 51 | * RESAW, a Research Infrastructure for the Study of Archived Web Materials - http://resaw.eu/ 52 | * Web Archiving and Digital Libraries - https://fox.cs.vt.edu/wadl2022.html 53 | * Archives Unleashed - https://archivesunleashed.org/ 54 | * WARCnet - https://cc.au.dk/en/warcnet/ 55 | * WARCnet Papers - https://cc.au.dk/en/warcnet/warcnet-papers/ 56 | 57 | ## Archiving Exercise 58 | 59 | ### Web Archiving Exercise - Research with Web Archives 60 | This week we are going to explore different research projects that use web archives to answer research questions. 61 | 62 | I like to think about work with web archives as falling into a few different types. First, there is work that explores the nature of web archives themselves. This is often an analysis of the shape, size, or contents of a web archive. It can also include research into how the web archive was crawled or into the relationships between content in the archived websites. This can be done through network analysis or by working with other derivative formats created from the web archive itself. 63 | 64 | Another type of research uses web archives as large text datasets to build tools, models, and services for research. You will have seen this in some of the Common Crawl (https://commoncrawl.org) research that you looked at in a previous module.
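As a small illustration of the network analysis mentioned in the first type of research above, the sketch below counts how many links point at each host from other hosts, given a list of (source, target) link pairs extracted from captured pages. The link pairs, hosts, and function names are hypothetical; derivative formats such as WAT or LGA files carry this kind of link information, but their actual layouts are not reproduced here.

```python
from collections import Counter
from urllib.parse import urlsplit


def host(url):
    """Return the lowercase host name of a URL (port removed), or '' if none."""
    return urlsplit(url).hostname or ""


def inbound_host_links(link_pairs):
    """Count links into each host from *other* hosts.

    link_pairs is an iterable of (source_url, target_url) tuples, such as the
    outlinks found on each captured page. Links that stay within the same
    host are skipped so the counts reflect connections between sites.
    """
    counts = Counter()
    for source, target in link_pairs:
        src, dst = host(source), host(target)
        if src and dst and src != dst:
            counts[dst] += 1
    return counts


# Hypothetical link pairs standing in for data extracted from a web archive.
links = [
    ("https://example.com/", "https://example.org/about"),
    ("https://example.net/news", "https://example.org/"),
    ("https://example.org/", "https://example.com/contact"),
    ("https://example.org/", "https://example.org/people"),
]

print(inbound_host_links(links).most_common())
# [('example.org', 2), ('example.com', 1)]
```

Counting at the host level rather than the page level is one simple design choice; the same pairs could instead feed a full graph library to look at clusters or central sites over time.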
65 | 66 | Finally, there is research that uses web archives as a data source to answer questions in a specific discipline, such as political science, history, or health policy. The list of potential uses is almost endless. 67 | 68 | This week's exercise is to identify **two** example papers or articles from **two** of these three rough categories. In other words, identify **two** papers, and don't choose the same category twice. 69 | 70 | * Web archives to study web archives. 71 | * Web archives for building models. 72 | * Web archives for answering disciplinary research questions. 73 | 74 | In this week's discussion, include a citation for each paper or article that you identified, along with a paragraph describing the research and how the web archive was used to facilitate it. Specifics about the web archive, such as domain, size, time periods, or formats, would be great to include. 75 | 76 | Where can you find these papers? 77 | 78 | I suggest beginning this assignment with some broad searches in Google Scholar - https://scholar.google.com/ 79 | 80 | ## Exploring Web Archives 81 | 82 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 83 | 84 | This week we will look at the output of the International Internet Preservation Consortium's (IIPC) General Assembly and Web Archiving Conference. 85 | 86 | https://digital.library.unt.edu/explore/collections/IIPCM/ 87 | 88 | This collection of presentations is hosted by the UNT Digital Library and includes 316 presentations from eight years of events hosted by the IIPC. 89 | 90 | You can view the items in this collection at this link.
91 | 92 | https://digital.library.unt.edu/explore/collections/IIPCM/browse/ 93 | 94 | Take some time to explore these presentations to get a better sense of the kinds of work being carried out by members of the IIPC and by others in the web archiving space. 95 | 96 | In this week's discussion you will identify one presentation and describe the work it presents. Make sure that you link to the presentation so that others can see what you are referencing. 97 | 98 | ## Discussion 99 | 100 | ### Discussion Post: 101 | In at least one paragraph, discuss what you learned this week about research being conducted with web archives. What were some of the concepts that were new to you this week? Did you have any thoughts about the different kinds of research that can be done with web archives? In this week's exercise, three broad categories were suggested for classifying research conducted with web archives. How well do you think those three categories hold up? Are they too broad? Too narrow? Do they need additional categories based on what you found? Share your thoughts with the class. 102 | 103 | In at least one paragraph each, discuss the two articles or papers that you identified in this week's web archiving exercise. In addition to a citation for each paper or article, describe the research and how the web archive was used to facilitate it. Specifics about the web archive, such as domain, size, time periods, or formats, would be great to include. 104 | 105 | Finally, in at least one paragraph, identify the presentation you found in the IIPC's General Assembly (GA) and Web Archiving Conference (WAC) collection in the UNT Digital Library. Who was involved in the work? What was the scope of the work or project? What questions would you like to ask the presenters about their work if you had the chance?
106 | 107 | ### Class Engagement: 108 | 109 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 110 | 111 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate further? If so, what was it? 112 | -------------------------------------------------------------------------------- /modules/module-13-intellectual-property-ethics.md: -------------------------------------------------------------------------------- 1 | # Module Thirteen - Intellectual Property / Ethics 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | 7 | The purpose of this module is to become familiar with the ethical considerations related to web archives. Ethics is an important component of most aspects of building collections, and web archiving has many areas where ethics and intellectual property are involved. 8 | 9 | In this module, both ethics and intellectual property are discussed. These are two distinct areas that overlap and are often considered together. They don't always align: there are many situations where it might be legal to do something (free of copyright restrictions or intellectual property concerns) but not ethical to do so. Another concept involved in this space is bias, which is also at play in any collection-building activity, including web archiving. 10 | 11 | There are several readings, some online documentation to skim, a few videos to watch, and several PowerPoint presentations that you will review. 12 | 13 | ### Objectives: 14 | 15 | * Familiarize yourself with ethics as they are associated with web archives. 16 | * Understand basic concepts of intellectual property and copyright as they apply to web archives. 17 | * Register and create a public collection in the Conifer tool.
18 | 19 | ## Readings 20 | 21 | The readings this week were selected to introduce you to different ways that ethics, legal considerations, and bias come into play within the scope of web archives. While some of the readings for this week might seem only adjacent to web archiving, they are all worth considering as you continue to think about building collections of content created by other people, other nations, and other communities. 22 | 23 | ### Ethics 24 | 25 | * Jules, B., Summers, E., & Mitchell, V. Jr. (2018). Ethical Considerations for Archiving Social Media Content Generated by Contemporary Social Movements: Challenges, Opportunities, and Recommendations. https://www.docnow.io/docs/docnow-whitepaper-2018.pdf 26 | * Graham, P. (2019). Guest Editorial: Reflections on the Ethics of Web Archiving. Journal of Archival Organization, 14(3-4), 103-110. https://doi.org/10.1080/15332748.2018.1517589 27 | * Summers, E. (2014). On Forgetting. https://inkdroid.org/2014/11/18/on-forgetting/ 28 | * Dolan-Mescal, A. (2017). Opportunities for making appraisal transparent when documenting the now. https://news.docnow.io/opportunities-for-making-appraisal-transparent-when-documenting-the-now-10b807606d39 29 | * Bingham, N. J., & Byrne, H. (2021). Archival strategies for contemporary collecting in a world of big data: Challenges and opportunities with curating the UK web archive. Big Data & Society. https://doi.org/10.1177/2053951721990409 30 | * George Washington University Libraries. (2018). Social media research ethical and privacy guidelines. https://gwu-libraries.github.io/sfm-ui/resources/social_media_research_ethical_and_privacy_guidelines.pdf 31 | * National Forum on Ethics & Archiving of the Web - https://eaw.rhizome.org/ 32 | * This is a very interesting collection of talks and recorded videos from the event.
33 | * Digital Curation Ethics (Web Archive) - https://archive-it.org/collections/9982 34 | * A collection of papers and projects related to ethics in digital curation. 35 | * Kahle, B. (1992). Ethics of Digital Librarianship. https://archive.org/about/ethics_BK.php 36 | * de Klerk, T. (2018). Ethics in Archives: Decisions in Digital Archiving. https://www.lib.ncsu.edu/news/special-collections/ethics-in-archives%3A-decisions-in-digital-archiving 37 | 38 | ### Legal / Intellectual Property 39 | * Grotke, A. (2012). Legal Issues in Web Archiving. https://blogs.loc.gov/thesignal/2012/05/legal-issues-in-web-archiving/ 40 | * Brindley, L. (2012). The memory of a nation in a digital world: Act quickly or our intellectual record will disappear down a black hole. The New Statesman. https://www.newstatesman.com/culture/2012/05/memory-nation-digital-world 41 | * International Internet Preservation Consortium. (2022). Legal Deposit. https://netpreserve.org/web-archiving/legal-deposit/ 42 | * International Federation of Library Associations and Institutions. (2011). IFLA Statement on Legal Deposit. https://www.ifla.org/publications/ifla-statement-on-legal-deposit-2011/ 43 | * Association of Research Libraries. (n.d.). Copyright & Fair Use/Fair Dealing. https://www.arl.org/category/our-priorities/advocacy-public-policy/copyright-and-fair-use/ 44 | 45 | ## Archiving Exercise 46 | 47 | ### Web Archiving Exercise - Creating a Collection with Conifer 48 | 49 | This week we are looking at the Conifer service offered by Rhizome. Many of you will know from our readings that Conifer was developed in partnership with the Webrecorder group and was previously called webrecorder.io. The service is essentially the same as before; the renaming reflects changes in how the service is governed in relation to other projects.
50 | 51 | For our final project we will use the Conifer service to capture web content for the web archive that you described in your Web Archive Collection Plan. 52 | 53 | For this exercise, you will create a free account with the Conifer service and then create a public collection for the web archive you described in your collection plan. 54 | 55 | Begin by navigating to https://conifer.rhizome.org/ 56 | 57 | ![Alt](images/module-13-conifer-01.png "Homepage of the Conifer service at https://conifer.rhizome.org/") 58 | 59 | Next, register for a free account with the service. You will have 5GB of free space on this service, and if you don't go wild with your final assignment, this should be sufficient. If you want to use this service more in the future, there are options for more storage with a subscription. 60 | 61 | Once you have created your account, you will be given the option to create a collection. Create a new collection and give it the name you chose in your Collection Plan document. When creating the collection, click the toggle to make it viewable by everyone. Here is what my "Create a new collection" page looked like. 62 | 63 | ![Alt](images/module-13-conifer-02.png "Create a new collection page in Conifer") 64 | 65 | After you create the collection, you will have an empty collection where you can start capturing items. 66 | 67 | If you click on the Collection Cover link, you will see the public-facing display and the URL that you can share with the class. 68 | 69 | ![Alt](images/module-13-conifer-03.png "Collection management page in Conifer.") 70 | 71 | You can then share the link to your public collection. Here is the link to the collection I just created.
72 | 73 | https://conifer.rhizome.org/mphillips/action-figure-web-archive 74 | 75 | ![Alt](images/module-13-conifer-04.png "Action Figure Web Archive Collection Overview") 76 | 77 | Feel free to explore other ways of adding information to your new collection. For example, you can add a description of the collection, and there may be other options you can make use of. 78 | 79 | That is all you need to do for this week's exercise. In this week's discussion, post the link to your public collection as an example of what you will be working on for the final project. Making the collection public lets all of us see your work more easily. 80 | 81 | ## Exploring Web Archives 82 | 83 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 84 | 85 | This week we will look at the members of the International Internet Preservation Consortium (IIPC). 86 | 87 | https://netpreserve.org/about-us/members/ 88 | 89 | Because many of these members are national libraries, they operate under their own countries' copyright and intellectual property laws. Many of them have some sort of legal mandate for collecting resources, but not all of them are able to display all of the content that they collect. 90 | 91 | Take some time to explore the members and, if possible, navigate to each institution's web archive. You will notice some familiar names, like Australia's web archive, UKWA, Arquivo.pt, and the Library of Congress, that we have looked at in past weeks. For your discussion post this week, pick a web archiving institution other than the ones we have seen in previous modules' Exploring Web Archives sections.
92 | 93 | ## Discussion 94 | 95 | ### Discussion Post: 96 | In at least one paragraph, discuss what you learned this week about ethics and intellectual property in relation to web archives. Had you thought much about this aspect earlier in the course? Are there things, like restricted access to some web archives, that you see differently based on the readings this week? What are your thoughts about legal mandates to collect web content as a component of preserving the culture and intellectual output of a nation? 97 | 98 | In at least one paragraph, discuss this week's exercise with Conifer. Did you run into any problems setting up an account? Post a link to your collection and give the class a brief overview of the collection you will be creating based on your Collection Plan. 99 | 100 | Finally, in at least one paragraph, identify one of the members of the IIPC that you haven't already looked at in a previous module's Exploring Web Archives section. What did you learn about that member? What kind of institution is it: a national library, research library, archive, or commercial service? What kinds of web archives do they collect? What do they have online? If possible, link to the member institution's pages about its web archiving initiative. 101 | 102 | ### Class Engagement: 103 | 104 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 105 | 106 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate further? If so, what was it?
107 | -------------------------------------------------------------------------------- /modules/module-14-future-of-web-archive.md: -------------------------------------------------------------------------------- 1 | # Module Fourteen - Future of Web Archives 2 | 3 | ## Overview and Objectives 4 | 5 | ### Overview: 6 | The purpose of this module is to introduce some of the emerging areas of web archiving, as well as web-archiving-adjacent activities, that may change how the field does its work. This module will present several initiatives that are currently in development, as well as some examples of recent projects in the broad scope of web archiving. 7 | 8 | There are several readings, some online documentation to skim, a few videos to watch, and several PowerPoint presentations that you will review. 9 | 10 | ### Objectives: 11 | 1. Introduce projects that may have an effect on web archiving in the future. 12 | 2. Become aware of the concept of Robust Links. 13 | 3. Create and share an example of a Robust Link with the class. 14 | 15 | ## Readings 16 | 17 | The readings this week were selected to introduce you to projects on the periphery of the web archiving space that we have not been able to cover in depth so far in this course. You may have run across many of them in previous readings, but this module is an opportunity to learn more about them. 18 | 19 | ### General Readings 20 | * Lynch, C. (2022). The Dangerous Complacency of “Web Archiving” Rhetoric. Against the Grain, 33(6). https://www.charleston-hub.com/2022/01/the-dangerous-complacency-of-web-archiving-rhetoric/ 21 | * Lynch, C. (2017). Stewardship in the "Age of Algorithms". First Monday 22(12).
https://doi.org/10.5210/fm.v22i12.8097 22 | 23 | ### Robust Links 24 | * https://robustlinks.mementoweb.org/ 25 | * About the Project - https://robustlinks.mementoweb.org/about/ 26 | * Specification - https://robustlinks.mementoweb.org/spec/ 27 | * Sanderson, R., Phillips, M., & Van de Sompel, H. (2011). Analyzing the Persistence of Referenced Web Resources with Memento. In Proceedings of the Open Repositories 2011 Conference. 28 | * arXiv link - https://doi.org/10.48550/arXiv.1105.3459 29 | * UNT Digital Library - https://digital.library.unt.edu/ark:/67531/metadc39318/ 30 | 31 | ### Signposting 32 | * https://signposting.org/ 33 | * About the Project - https://signposting.org/#about 34 | * Klein, M., Shankar, H., & Van de Sompel, H. (2018). Signposting for Repositories. In Proceedings of the Joint Conference on Digital Libraries. https://doi.org/10.1145/3197026.3203879 35 | * UNT Direct Link - https://dl-acm-org.libproxy.library.unt.edu/doi/10.1145/3197026.3203879 36 | * Klein, M., Van de Sompel, H., Sanderson, R., Shankar, H., Balakireva, L., Zhou, K., & Tobin, R. (2014). Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE, 9(12), e115253. https://doi.org/10.1371/journal.pone.0115253 37 | 38 | ### Web Archive Collection Zipped (WACZ) 39 | * Summers, E. (2021). Web Archives on, of, and off, the Web. https://inkdroid.org/2021/11/24/wacz/ 40 | * Open Knowledge Foundation. (2022). Ilya Kreymer's and Ed Summers' presentation about standardising the WACZ format 41 | https://www.youtube.com/watch?v=TIyOTEyAu7k 42 | * Specification - https://webrecorder.github.io/wacz-spec/1.1.1/ 43 | 44 | ## Archiving Exercise 45 | 46 | ### Web Archiving Exercise - Robust Links 47 | 48 | One of the most powerful, and at the same time most fragile, things that make the web possible is the mechanism used for connecting resources: links. Links are the basis of the web, and we use them all the time without thinking about them.
That is, until they stop working, we get a 404 error, and we have to figure out what to do next. 49 | 50 | Beyond simply missing pages, there are situations where it is important to reference a specific version of a website. In writing, we accomplish this with citations and references that record when a URL was accessed. In this exercise we will explore other approaches to this problem. 51 | 52 | First, head over to the Robustify service. 53 | 54 | https://robustlinks.mementoweb.org/ 55 | 56 | ![Alt](images/module-14-robust-01.png "Robustify your Links service.") 57 | 58 | Choose a website you want to link to, along with a specific date and time. 59 | 60 | After you enter your link and the text you want the link to display, click submit and the service will begin its work. 61 | 62 | ![Alt](images/module-14-robust-02.png "Your Robust Link (in progress)") 63 | 64 | In my example I have chosen a link to the Dallas Morning News (https://dallasnews.com) for April 24, 2022. 65 | 66 | Once the service has finished, you will see the following screen. 67 | 68 | ![Alt](images/module-14-robust-03.png "Completed Robust Links page.") 69 | 70 | You have several options for how to display your new Robust Link. 71 | 72 | For the discussion this week, you will share the code for your Robust Link in its snippet form, like the example below. 73 | 74 | ``` 75 | Dallas Morning News for April 24, 2022 78 | ``` 79 | 80 | ## Exploring Web Archives 81 | 82 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it. 83 | 84 | This week is an open week: identify a web archive collection that you haven't previously mentioned in your discussion posts and share it with the class.
This can be a collection from a larger service such as Archive-It (https://archive-it.org/) or the Library of Congress Web Archive Collections (https://www.loc.gov/web-archives/collections/), but feel free to explore others. 85 | 86 | In the discussion this week you will introduce the web archive collection you identified and share at least two example URLs of content in that web archive. 87 | 88 | ## Discussion 89 | 90 | ### Discussion Post: 91 | In at least one paragraph, discuss what you learned this week from the readings. Discuss your opinions of the Clifford Lynch articles. How do they, or do they not, align with your thinking so far in this course? 92 | 93 | In another paragraph, share what you learned from the other examples in this module's readings. Have you come across any of them before in this course? What do you think the future of web archiving holds? 94 | 95 | In at least one paragraph, discuss this week's exercise with Robust Links. What is the point of this tool/service/specification? What website did you create a link for? Share your Robust Link in your discussion, formatted as a code snippet. Where do you think this kind of service fits into the web archiving and scholarly communication landscape? 96 | 97 | Finally, in at least one paragraph, introduce the web archive collection you found in this week's Exploring Web Archives section. Who created the collection? What is the scope of the collection? Include the URL for the collection and then include two example URLs from content within that collection. 98 | 99 | ### Class Engagement: 100 | 101 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts. 102 | 103 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster.
Did they mention something that made you investigate further? If so, what was it? 104 | 105 | -------------------------------------------------------------------------------- /syllabus-5960.001-Web-Archiving-2022-Spring.md: -------------------------------------------------------------------------------- 1 | # INFO 5960.001 - Web Archiving 2 | ## Course Information 3 | 4 | **Term: Spring 2022** 5 | 6 | **Location: Online** - https://learn.unt.edu 7 | 8 | ## Instructor Information 9 | 10 | Instructor: Mark Phillips Ph.D. (he/him) 11 | Office Hours: Tues 2:00-4:00PM or by appointment 12 | Office Location: Online using Zoom 13 | Email: mark.phillips@unt.edu 14 | 15 | ## Course Description 16 | The web is a fundamental component of nearly all modern interaction. Preserving content from the web and providing long-term access to it presents an interesting set of challenges for information scientists. In this course, you will develop knowledge and skills related to the standards, tools, and processes of web archiving. You will learn the mechanics of web archiving and its relation to familiar concepts like collection building and appraisal, access and use, and ethics. This course provides hands-on experience working with different projects and tools and is designed for anyone interested in the topic; no prior experience in web archiving is needed. 17 | 18 | ## Objectives 19 | By the end of this course you should be able to: 20 | 21 | * Discuss the role and potential of the Web as a source of information, and the characteristics of the Web that matter for archiving and preservation. 22 | * Be familiar with tools and appropriate techniques for preserving different aspects of the web, including “standard” websites, and have a working understanding of how to preserve API-based web content such as social media sites. 23 | * Recognize the challenges of Web archiving.
24 | * Become proficient at using, interpreting, and explaining common playback tools such as the Wayback Machine. 25 | * Increase your awareness of legal and policy constraints on Web archiving. 26 | * Be familiar with the standards and best practices for sustainably archiving Web content. 27 | 28 | ## Required/Recommended Materials 29 | This course does not have a required textbook but will instead rely on a wide range of resources such as reports, articles, white papers, conference proceedings, presentations, and video recordings. A full list of readings is available in LEARN as weekly modules. 30 | 31 | Every effort has been made to select resources that are either open access or available to you as a student through the UNT Libraries, which are paid for by your library fee. If you have any trouble accessing the readings, do not hesitate to reach out to me for assistance. 32 | 33 | ## Assignments 34 | The assignments for this course are designed to allow you to demonstrate and develop your knowledge related to web archives. 35 | 36 | ### Readings 37 | There are reading assignments posted with each module for this course. These readings have been identified to introduce concepts and ideas to you. In addition to readings, there will be some assigned video content that you are expected to watch. The readings and videos will provide the topics for the discussion posts mentioned below. 38 | 39 | ### Discussion Posts 40 | Each week you will be assigned a discussion post that will be due on Sunday night by 11:59 PM (Central Time). The discussion will be related to the weekly readings and will allow you to demonstrate your understanding of concepts introduced that week. Additionally, these discussion posts will allow you to explore web archives and introduce what you have found to your fellow students.
In addition to your discussion post, you will be expected to comment on other students’ discussion posts as part of the participation portion of the course. 41 | 42 | ### Web Archive Critique 43 | A brief paper (2-3 pages, 12-point font, double-spaced) discussing the features of a web archive and its content. This paper will include an overview of the web archive, who is responsible for creating the archive, what tools are used in creating the archive, its collection scope, and how long the archive has been in operation. Instructions will be distributed two weeks before the deadline. 44 | 45 | ### Web Archive Tools Critique 46 | A brief paper (2-3 pages, 12-point font, double-spaced) discussing a specific tool or service in the web archiving space. This paper will include an overview of the tool or service, the problem that it tries to solve, the history of the tool, and who or what institution is responsible for it. Instructions will be distributed two weeks before the deadline. 47 | 48 | ### Semester Project - Creating a web archive 49 | Throughout the semester you will be constructing the building blocks of a final class project. This project is to create a web archive related to a topic area you choose in consultation with your professor. Midway through the semester, a Web Archive Collection Plan describing your plan for this collection will be due. The semester project will be due during the last full week of the semester. 50 | 51 | ### Examinations 52 | There are no midterm or final examinations in this course. 53 | 54 | ## How to Succeed in this Course 55 | Office hours offer you an opportunity to ask for clarification or get help understanding class material. I encourage you to connect with me for support. Additional office hours will be offered virtually as the semester concludes. Your success is my goal. 56 | 57 | I have blocked out Tuesdays from 2:00 to 4:00 PM for Zoom-based office hours.
If you are not able to meet during these times please send me an email and we can find a better date and time that will work for both of us. 58 | 59 | The University of North Texas makes reasonable academic accommodation for students with disabilities. Students seeking reasonable accommodation must first register with the Office of Disability Access (ODA) to verify their eligibility. If a disability is verified, the ODA will provide you with a reasonable accommodation letter to be delivered to faculty to begin a private discussion regarding your specific needs in a course. You may request reasonable accommodations at any time; however, ODA notices of reasonable accommodation should be provided as early as possible in the semester to avoid any delay in implementation. Note that students must obtain a new letter of reasonable accommodation for every semester and must meet with each faculty member prior to implementation in each class. Students are strongly encouraged to deliver letters of reasonable accommodation during faculty office hours or by appointment. Faculty members have the authority to ask students to discuss such letters during their designated office hours to protect the privacy of the student. For additional information, refer to the Office of Disability Access website (http://www.unt.edu/oda). You may also contact ODA by phone at (940) 565-4323. 60 | 61 | I value the many perspectives students bring to our campus. Please work with me to create a classroom culture of open communication, mutual respect, and inclusion. All discussions should be respectful and civil. Although disagreements and debates are encouraged, personal attacks are unacceptable. Together, we can ensure a safe and welcoming classroom for all. If you ever feel like this is not the case, please see me during office hours and let me know. We are all learning together. 
62 |
63 | ## Assessing Your Work
64 | Grades will be determined as follows:
65 |
66 | | Assignment Type | Point Distribution |
67 | |-----------------------------|-----------------------------|
68 | | Discussions (15 total) | 150 points (10 points each) |
69 | | Web Archive Critique | 50 points |
70 | | Web Archive Tools Critique | 50 points |
71 | | Web Archive Collection Plan | 50 points |
72 | | Web Archive Final Project | 100 points |
73 |
74 |
75 |
76 | | Grading Scale | Letter Grade |
77 | |---------------|--------------|
78 | | 90-100% | A |
79 | | 80-89% | B |
80 | | 70-79% | C |
81 | | 60-69% | D |
82 | | Below 60% | F |
83 |
84 |
85 | ### Late work
86 | All students are expected to submit their discussions, assignments, and final project by the due date. This prevents students from falling too far behind in the course and allows the instructor to assign grades consistently and promptly.
87 |
88 | Module assignments not completed by 11:59 PM Central Time on Sunday will be penalized 15% of the assignment’s points for each day late, unless there are extenuating circumstances. A final project received after the due date will incur a 5-point deduction for each day late, unless there are extenuating circumstances.
89 |
90 | The only exceptions are if a) the student has a personal or family medical emergency, or b) the student informs the instructor of a conflict well in advance and receives permission to turn in an assignment late.
91 |
92 | ### Incompletes
93 | A grade of incomplete (I) will be given only for justifiable reasons (such as a serious illness or military service) and only if you are passing the course. It is your responsibility to contact the instructor to request an incomplete and discuss the requirements for completing the course. If you do not remove the incomplete within the period agreed upon with the instructor or within one calendar year, you will receive a grade of F. Please refer to https://registrar.unt.edu/grades/incompletes for more information.
94 |
95 | ### Withdrawal
96 | A grade of withdrawal (W) or withdrawal-failing (WF) will be given depending on your participation and grades to date. If you simply disappear and do not file a formal UNT withdrawal form, you may receive a grade of F.
97 |
98 | ## Course Requirements / Schedule
99 |
100 | ### Schedule
101 |
102 | | Week | Date | Topic | Assignment Due | Points Possible |
103 | |--------------|-------|----------------------------------------------------------------------------|-----------------------------|-----------------|
104 | | | 01/18 | [Introduction to Class][module_00] | | |
105 | | Week 1 | 01/18 | [What is a Web Archive?][module_01] | **Module One** | |
106 | | | 01/23 | | Introduction | 10 pts. |
107 | | | 01/23 | | Discussion | 10 pts. |
108 | | Week 2 | 01/24 | [What is the Web][module_02] | **Module Two** | |
109 | | | 01/30 | | Discussion | 10 pts. |
110 | | Week 3 | 01/31 | [Who does Web Archiving?][module_03] | **Module Three** | |
111 | | | 01/31 | [Assignment 1: Web Archive Critique][assignment_01] | | |
112 | | | 02/06 | | Discussion | 10 pts. |
113 | | Week 4 | 02/07 | [Technology Overview][module_04] | **Module Four** | |
114 | | | 02/13 | | Discussion | 10 pts. |
115 | | | 02/13 | | Web Archive Critique | 50 pts. |
116 | | Week 5 | 02/14 | [Capture][module_05] | **Module Five** | |
117 | | | 02/20 | | Discussion | 10 pts. |
118 | | Week 6 | 02/21 | [Preserve][module_06] | **Module Six** | |
119 | | | 02/21 | [Assignment 2: Web Archive Tool Critique][assignment_02] | | |
120 | | | 02/27 | | Discussion | 10 pts. |
121 | | Week 7 | 02/28 | [Playback][module_07] | **Module Seven** | |
122 | | | 03/06 | | Discussion | 10 pts. |
123 | | Week 8 | 03/07 | [Other Tools][module_08] | **Module Eight** | |
124 | | | 03/13 | | Discussion | 10 pts. |
125 | | | 03/13 | | Web Archive Tools Critique | 50 pts. |
126 | | Spring Break | 03/14 | | | |
127 | | | | | | |
128 | | Week 9 | 03/21 | [Collection Policies][module_09] | **Module Nine** | |
129 | | | | [Assignment 3: Web Archive Collection Plan][assignment_03] | | |
130 | | | 03/27 | | Discussion | 10 pts. |
131 | | Week 10 | 03/28 | [Metadata][module_10] | **Module Ten** | |
132 | | | 04/03 | | Discussion | 10 pts. |
133 | | Week 11 | 04/04 | [Quality Assurance][module_11] | **Module Eleven** | |
134 | | | 04/10 | | Discussion | 10 pts. |
135 | | | 04/10 | | Web Archive Collection Plan | 50 pts. |
136 | | Week 12 | 04/11 | [Research with Web Archives][module_12] | **Module Twelve** | |
137 | | | 04/11 | [Final Project: Building a Web Archive][assignment_04] | | |
138 | | | 04/17 | | Discussion | 10 pts. |
139 | | Week 13 | 04/18 | [Intellectual Property][module_13] | **Module Thirteen** | |
140 | | | 04/24 | | Discussion | 10 pts. |
141 | | Week 14 | 04/25 | [Future of Web Archives][module_14] | **Module Fourteen** | |
142 | | | 05/01 | | Discussion | 10 pts. |
143 | | Week 15 | 05/05 | | Final Project Due | 100 pts. |
144 | | | | | | |
145 | | Week 16 | 05/09 | Finals Week | No Assignments | |
146 |
147 |
148 | Every student in my class can improve by doing their own work and trying their hardest with access to appropriate resources. Students who use other people’s work without citation violate UNT’s Academic Integrity Policy. Please read and follow this important set of guidelines for your academic success (https://policy.unt.edu/policy/06-003). If you have questions about this or any other UNT policy, please email me or come discuss it with me during my office hours.
149 |
150 | ## Attendance and Participation
151 |
152 | Success in this course depends on your active participation and engagement throughout the course. As such, students are required to complete all assignments by the due date and to actively participate in class discussions.
153 |
154 | Additionally, students are expected to:
155 | * Log on at least two times a week, ideally on different days, to complete weekly assignments, assessments, discussions, and/or other weekly deliverables as directed by the instructor and outlined in the syllabus.
156 | * Participate in the weekly threaded discussions. In addition to posting a response to the thread topic presented, students are expected to respond to each other and to comments and questions from the instructor and/or other students.
157 |
158 | If you find that you cannot meet the class’s minimum discussion requirements because of illness or other extenuating circumstances, please contact me as soon as possible.
159 |
160 | Students will not be marked present for a given week if they have not posted on the discussion forum, submitted an assignment or essay, or completed an assessment (if one was administered) that week.
161 |
162 | Out of consideration for the health and safety of everyone in our community, please inform the professor and instructional team if you are unable to attend class meetings because you are ill. If you are experiencing any symptoms of COVID (https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html), please seek medical attention from the Student Health and Wellness Center (940-565-2333 or askSHWC@unt.edu) or your health care provider PRIOR to coming to campus. UNT also requires you to contact the UNT COVID Team at COVID@unt.edu for guidance on actions to take due to symptoms, pending or positive test results, or potential exposure.
163 |
164 | [module_00]: ./modules/module-00-introductions.md
165 | [module_01]: ./modules/module-01-what-is-a-web-archive.md
166 | [module_02]: ./modules/module-02-what-is-the-web.md
167 | [module_03]: ./modules/module-03-who-does-web-archiving.md
168 | [module_04]: ./modules/module-04-technology-overview.md
169 | [module_05]: ./modules/module-05-capture.md
170 | [module_06]: ./modules/module-06-preserve.md
171 | [module_07]: ./modules/module-07-playback.md
172 | [module_08]: ./modules/module-08-other-tools.md
173 | [module_09]: ./modules/module-09-collection-policies.md
174 | [module_10]: ./modules/module-10-metadata.md
175 | [module_11]: ./modules/module-11-quality-assurance.md
176 | [module_12]: ./modules/module-12-research.md
177 | [module_13]: ./modules/module-13-intellectual-property-ethics.md
178 | [module_14]: ./modules/module-14-future-of-web-archive.md
179 |
180 | [assignment_01]: ./assignments/assignment-01.md
181 | [assignment_02]: ./assignments/assignment-02.md
182 | [assignment_03]: ./assignments/assignment-03.md
183 | [assignment_04]: ./assignments/assignment-04.md
184 |
185 |
--------------------------------------------------------------------------------
/syllabus-5960.001-Web-Archiving-2023-Spring.md:
--------------------------------------------------------------------------------
1 | # INFO 5960.001 - Web Archiving
2 | ## Course Information
3 |
4 | **Term: Spring 2023**
5 |
6 | **Location: Online** - https://learn.unt.edu
7 |
8 | ## Instructor Information
9 |
10 | Instructor: Mark Phillips Ph.D. (he/him)
11 | Office Hours: Wed 2:00-4:00PM by appointment
12 | Office Location: Online using Zoom - https://unt.zoom.us/my/mark.phillips
13 | Email: mark.phillips@unt.edu
14 |
15 | ## Course Description
16 | The web is a fundamental component of nearly all modern interaction. Preserving content from the web and providing long-term access to preserved content presents an interesting set of challenges for information scientists. In this course, you will develop knowledge and skills related to the standards, tools, and processes of web archiving. You will learn the mechanics of web archiving and its relation to familiar concepts like collection building and appraisal, access and use, and ethics. This course will provide hands-on experience working with different projects and tools and is designed for anyone interested in the topic; no prior experience in web archiving is required.
17 |
18 | ## Objectives
19 | By the end of this course you should be able to:
20 |
21 | * Discuss the role and potential of the Web as an information resource, and the characteristics of the Web that affect its archiving and preservation.
22 | * Be familiar with tools and appropriate techniques for preserving different kinds of web content, including “standard” websites, and have a working understanding of preserving API-based web content such as social media sites.
23 | * Recognize the challenges of Web archiving.
24 | * Become proficient at using, interpreting, and explaining common playback tools such as the Wayback Machine.
25 | * Increase your awareness of legal and policy constraints on Web archiving.
26 | * Be familiar with the standards and best practices for sustainably archiving Web content.
27 |
28 | ## Required/Recommended Materials
29 | This course does not have a required textbook but will instead rely on a wide range of resources such as reports, articles, white papers, conference proceedings, presentations, and video recordings. A full list of readings is available in LEARN as weekly modules.
30 |
31 | Every effort has been made to select resources that are either open access or available to you as a student through the UNT Libraries, where access is paid for by your library fee. If you have any trouble accessing the readings, do not hesitate to reach out to me for assistance.
32 |
33 | ## Assignments
34 | The assignments for this course are designed to help you develop and demonstrate your knowledge of web archives.
35 |
36 | ### Readings
37 | Reading assignments are posted with each module for this course. These readings were selected to introduce key concepts and ideas. In addition to the readings, you will be assigned some video content that you are expected to watch. The readings and videos provide the topics for the discussion posts described below.
38 |
39 | ### Discussion Posts
40 | Each week you will be assigned a discussion post that is due on Sunday night by 11:59 PM (Central Time). The discussion will be related to the weekly readings and will allow you to demonstrate your understanding of concepts introduced that week. These discussion posts will also allow you to explore web archives and introduce what you have found to your fellow students. In addition to your own discussion post, you will be expected to comment on other students’ discussion posts as part of the participation portion of the course.
41 |
42 | ### Web Archive Critique
43 | A brief paper (2-3 pages, 12-point font, double-spaced) discussing the features of a web archive and its content. This paper will include an overview of the web archive, who is responsible for creating the archive, what tools are used to create it, its collection scope, and how long the archive has been operational. Instructions will be distributed two weeks before the deadline.
44 |
45 | ### Web Archive Tools Critique
46 | A brief paper (2-3 pages, 12-point font, double-spaced) discussing a specific tool or service in the web archiving space. This paper will include an overview of the tool or service, the problem it tries to solve, the history of the tool, and the person or institution responsible for the tool or service. Instructions will be distributed two weeks before the deadline.
47 |
48 | ### Semester Project - Creating a web archive
49 | Throughout the semester, you will build the components of a final class project: a web archive on a topic you choose in consultation with your professor. Midway through the semester, a Web Archive Collection Plan describing your plan for this collection will be due. The semester project is due in the last full week of the semester.
50 |
51 | ### Examinations
52 | There are no midterm or final examinations in this course.
53 |
54 | ## How to Succeed in this Course
55 | Office hours offer you an opportunity to ask for clarification or get help understanding class material. I encourage you to connect with me for support. Additional office hours will be offered virtually as the semester concludes. Your success is my goal.
56 |
57 | I have blocked out Tuesdays from 2:00 to 4:00 PM for Zoom-based office hours. If you are not able to meet during these times, please send me an email, and we can find a date and time that works for both of us.
58 |
59 | The University of North Texas makes reasonable academic accommodation for students with disabilities. Students seeking reasonable accommodation must first register with the Office of Disability Access (ODA) to verify their eligibility. If a disability is verified, the ODA will provide you with a reasonable accommodation letter to be delivered to faculty to begin a private discussion regarding your specific needs in a course. You may request reasonable accommodations at any time; however, ODA notices of reasonable accommodation should be provided as early as possible in the semester to avoid any delay in implementation. Note that students must obtain a new letter of reasonable accommodation for every semester and must meet with each faculty member prior to implementation in each class. Students are strongly encouraged to deliver letters of reasonable accommodation during faculty office hours or by appointment. Faculty members have the authority to ask students to discuss such letters during their designated office hours to protect the privacy of the student. For additional information, refer to the Office of Disability Access website (http://www.unt.edu/oda). You may also contact ODA by phone at (940) 565-4323.
60 |
61 | I value the many perspectives students bring to our campus. Please work with me to create a classroom culture of open communication, mutual respect, and inclusion. All discussions should be respectful and civil. Although disagreements and debates are encouraged, personal attacks are unacceptable. Together, we can ensure a safe and welcoming classroom for all. If you ever feel that this is not the case, please see me during office hours and let me know. We are all learning together.
62 |
63 | ## Assessing Your Work
64 | Grades will be determined as follows:
65 |
66 | | Assignment Type | Point Distribution |
67 | |-----------------------------|-----------------------------|
68 | | Discussions (15 total) | 150 points (10 points each) |
69 | | Web Archive Critique | 50 points |
70 | | Web Archive Tools Critique | 50 points |
71 | | Web Archive Collection Plan | 50 points |
72 | | Web Archive Final Project | 100 points |
73 |
74 |
75 |
76 | | Grading Scale | Letter Grade |
77 | |---------------|--------------|
78 | | 90-100% | A |
79 | | 80-89% | B |
80 | | 70-79% | C |
81 | | 60-69% | D |
82 | | Below 60% | F |
83 |
84 |
85 | ### Late work
86 | All students are expected to submit their discussions, assignments, and final project by the due date. This prevents students from falling too far behind in the course and allows the instructor to assign grades consistently and promptly.
87 |
88 | Module assignments not completed by 11:59 PM Central Time on Sunday will be penalized 15% of the assignment’s points for each day late, unless there are extenuating circumstances. A final project received after the due date will incur a 5-point deduction for each day late, unless there are extenuating circumstances.
89 |
90 | The only exceptions are if a) the student has a personal or family medical emergency, or b) the student informs the instructor of a conflict well in advance and receives permission to turn in an assignment late.
91 |
92 | ### Incompletes
93 | A grade of incomplete (I) will be given only for justifiable reasons (such as a serious illness or military service) and only if you are passing the course. It is your responsibility to contact the instructor to request an incomplete and discuss the requirements for completing the course. If you do not remove the incomplete within the period agreed upon with the instructor or within one calendar year, you will receive a grade of F. Please refer to https://registrar.unt.edu/grades/incompletes for more information.
94 |
95 | ### Withdrawal
96 | A grade of withdrawal (W) or withdrawal-failing (WF) will be given depending on your participation and grades to date. If you simply disappear and do not file a formal UNT withdrawal form, you may receive a grade of F.
97 |
98 | ## Course Requirements / Schedule
99 |
100 | ### Schedule
101 |
102 | | Week | Date | Topic | Assignment Due | Points Possible |
103 | |--------------|-------|----------------------------------------------------------------------------|-----------------------------|-----------------|
104 | | | 01/17 | [Introduction to Class][module_00] | | |
105 | | Week 1 | 01/17 | [What is a Web Archive?][module_01] | **Module One** | |
106 | | | 01/22 | | Introduction | 10 pts. |
107 | | | 01/22 | | Discussion | 10 pts. |
108 | | Week 2 | 01/23 | [What is the Web][module_02] | **Module Two** | |
109 | | | 01/29 | | Discussion | 10 pts. |
110 | | Week 3 | 01/30 | [Who does Web Archiving?][module_03] | **Module Three** | |
111 | | | 01/30 | [Assignment 1: Web Archive Critique][assignment_01] | | |
112 | | | 02/05 | | Discussion | 10 pts. |
113 | | Week 4 | 02/06 | [Technology Overview][module_04] | **Module Four** | |
114 | | | 02/12 | | Discussion | 10 pts. |
115 | | | 02/12 | | Web Archive Critique | 50 pts. |
116 | | Week 5 | 02/13 | [Capture][module_05] | **Module Five** | |
117 | | | 02/19 | | Discussion | 10 pts. |
118 | | Week 6 | 02/20 | [Preserve][module_06] | **Module Six** | |
119 | | | 02/20 | [Assignment 2: Web Archive Tool Critique][assignment_02] | | |
120 | | | 02/26 | | Discussion | 10 pts. |
121 | | Week 7 | 02/27 | [Playback][module_07] | **Module Seven** | |
122 | | | 03/05 | | Discussion | 10 pts. |
123 | | Week 8 | 03/06 | [Other Tools][module_08] | **Module Eight** | |
124 | | | 03/12 | | Discussion | 10 pts. |
125 | | | 03/12 | | Web Archive Tools Critique | 50 pts. |
126 | | Spring Break | 03/13 | | | |
127 | | | | | | |
128 | | Week 9 | 03/20 | [Collection Policies][module_09] | **Module Nine** | |
129 | | | | [Assignment 3: Web Archive Collection Plan][assignment_03] | | |
130 | | | 03/26 | | Discussion | 10 pts. |
131 | | Week 10 | 03/27 | [Metadata][module_10] | **Module Ten** | |
132 | | | 04/02 | | Discussion | 10 pts. |
133 | | Week 11 | 04/03 | [Quality Assurance][module_11] | **Module Eleven** | |
134 | | | 04/09 | | Discussion | 10 pts. |
135 | | | 04/09 | | Web Archive Collection Plan | 50 pts. |
136 | | Week 12 | 04/10 | [Research with Web Archives][module_12] | **Module Twelve** | |
137 | | | 04/10 | [Final Project: Building a Web Archive][assignment_04] | | |
138 | | | 04/16 | | Discussion | 10 pts. |
139 | | Week 13 | 04/17 | [Intellectual Property][module_13] | **Module Thirteen** | |
140 | | | 04/23 | | Discussion | 10 pts. |
141 | | Week 14 | 04/24 | [Future of Web Archives][module_14] | **Module Fourteen** | |
142 | | | 04/30 | | Discussion | 10 pts. |
143 | | Week 15 | 05/04 | | Final Project Due | 100 pts. |
144 | | | | | | |
145 | | Week 16 | 05/08 | Finals Week | No Assignments | |
146 |
147 |
148 | Every student in my class can improve by doing their own work and trying their hardest with access to appropriate resources. Students who use other people’s work without citation violate UNT’s Academic Integrity Policy. Please read and follow this important set of guidelines for your academic success (https://policy.unt.edu/policy/06-003). If you have questions about this or any other UNT policy, please email me or come discuss it with me during my office hours.
149 |
150 | ## Attendance and Participation
151 |
152 | Success in this course depends on your active participation and engagement throughout the course. As such, students are required to complete all assignments by the due date and to actively participate in class discussions.
153 |
154 | Additionally, students are expected to:
155 | * Log on at least two times a week, ideally on different days, to complete weekly assignments, assessments, discussions, and/or other weekly deliverables as directed by the instructor and outlined in the syllabus.
156 | * Participate in the weekly threaded discussions. In addition to posting a response to the thread topic presented, students are expected to respond to each other and to comments and questions from the instructor and/or other students.
157 |
158 | If you find that you cannot meet the class’s minimum discussion requirements because of illness or other extenuating circumstances, please contact me as soon as possible.
159 |
160 | Students will not be marked present for a given week if they have not posted on the discussion forum, submitted an assignment or essay, or completed an assessment (if one was administered) that week.
161 |
162 | Out of consideration for the health and safety of everyone in our community, please inform the professor and instructional team if you are unable to attend class meetings because you are ill. If you are experiencing any symptoms of COVID (https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html), please seek medical attention from the Student Health and Wellness Center (940-565-2333 or askSHWC@unt.edu) or your health care provider PRIOR to coming to campus. UNT also requires you to contact the UNT COVID Team at COVID@unt.edu for guidance on actions to take due to symptoms, pending or positive test results, or potential exposure.
163 |
164 | [module_00]: ./modules/module-00-introductions.md
165 | [module_01]: ./modules/module-01-what-is-a-web-archive.md
166 | [module_02]: ./modules/module-02-what-is-the-web.md
167 | [module_03]: ./modules/module-03-who-does-web-archiving.md
168 | [module_04]: ./modules/module-04-technology-overview.md
169 | [module_05]: ./modules/module-05-capture.md
170 | [module_06]: ./modules/module-06-preserve.md
171 | [module_07]: ./modules/module-07-playback.md
172 | [module_08]: ./modules/module-08-other-tools.md
173 | [module_09]: ./modules/module-09-collection-policies.md
174 | [module_10]: ./modules/module-10-metadata.md
175 | [module_11]: ./modules/module-11-quality-assurance.md
176 | [module_12]: ./modules/module-12-research.md
177 | [module_13]: ./modules/module-13-intellectual-property-ethics.md
178 | [module_14]: ./modules/module-14-future-of-web-archive.md
179 |
180 | [assignment_01]: ./assignments/assignment-01.md
181 | [assignment_02]: ./assignments/assignment-02.md
182 | [assignment_03]: ./assignments/assignment-03.md
183 | [assignment_04]: ./assignments/assignment-04.md
184 |
185 |
--------------------------------------------------------------------------------