├── LICENSE
├── README.md
├── assignments
│   ├── assignment-01.md
│   ├── assignment-02.md
│   ├── assignment-03.md
│   └── assignment-04.md
├── modules
│   ├── images
│   │   ├── .ignore
│   │   ├── module-02-unt-homepage-source.png
│   │   ├── module-02-unt-homepage.png
│   │   ├── module-04-save-options.png
│   │   ├── module-05-save-page-now.png
│   │   ├── module-06-archive-today.png
│   │   ├── module-07-oldweb-today-01.png
│   │   ├── module-07-oldweb-today-02.png
│   │   ├── module-08-archiveready-01.png
│   │   ├── module-08-archiveready-02.png
│   │   ├── module-08-arquivo.png
│   │   ├── module-09-awp-01.png
│   │   ├── module-09-ukwa.png
│   │   ├── module-10-time-travel-01.png
│   │   ├── module-10-time-travel-02.png
│   │   ├── module-10-trove-01.png
│   │   ├── module-10-trove-02.png
│   │   ├── module-10-trove-03.png
│   │   ├── module-11-commoncrawl-01.png
│   │   ├── module-11-replayweb-01.png
│   │   ├── module-11-replayweb-02.png
│   │   ├── module-11-replayweb-03.png
│   │   ├── module-13-conifer-01.png
│   │   ├── module-13-conifer-02.png
│   │   ├── module-13-conifer-03.png
│   │   ├── module-13-conifer-04.png
│   │   ├── module-14-robust-01.png
│   │   ├── module-14-robust-02.png
│   │   └── module-14-robust-03.png
│   ├── module-00-introductions.md
│   ├── module-01-what-is-a-web-archive.md
│   ├── module-02-what-is-the-web.md
│   ├── module-03-who-does-web-archiving.md
│   ├── module-04-technology-overview.md
│   ├── module-05-capture.md
│   ├── module-06-preserve.md
│   ├── module-07-playback.md
│   ├── module-08-other-tools.md
│   ├── module-09-collection-policies.md
│   ├── module-10-metadata.md
│   ├── module-11-quality-assurance.md
│   ├── module-12-research.md
│   ├── module-13-intellectual-property-ethics.md
│   └── module-14-future-of-web-archive.md
├── syllabus-5960.001-Web-Archiving-2022-Spring.md
└── syllabus-5960.001-Web-Archiving-2023-Spring.md
/LICENSE:
--------------------------------------------------------------------------------
1 | Attribution 4.0 International
2 |
3 | =======================================================================
4 |
5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
6 | does not provide legal services or legal advice. Distribution of
7 | Creative Commons public licenses does not create a lawyer-client or
8 | other relationship. Creative Commons makes its licenses and related
9 | information available on an "as-is" basis. Creative Commons gives no
10 | warranties regarding its licenses, any material licensed under their
11 | terms and conditions, or any related information. Creative Commons
12 | disclaims all liability for damages resulting from their use to the
13 | fullest extent possible.
14 |
15 | Using Creative Commons Public Licenses
16 |
17 | Creative Commons public licenses provide a standard set of terms and
18 | conditions that creators and other rights holders may use to share
19 | original works of authorship and other material subject to copyright
20 | and certain other rights specified in the public license below. The
21 | following considerations are for informational purposes only, are not
22 | exhaustive, and do not form part of our licenses.
23 |
24 | Considerations for licensors: Our public licenses are
25 | intended for use by those authorized to give the public
26 | permission to use material in ways otherwise restricted by
27 | copyright and certain other rights. Our licenses are
28 | irrevocable. Licensors should read and understand the terms
29 | and conditions of the license they choose before applying it.
30 | Licensors should also secure all rights necessary before
31 | applying our licenses so that the public can reuse the
32 | material as expected. Licensors should clearly mark any
33 | material not subject to the license. This includes other CC-
34 | licensed material, or material used under an exception or
35 | limitation to copyright. More considerations for licensors:
36 | wiki.creativecommons.org/Considerations_for_licensors
37 |
38 | Considerations for the public: By using one of our public
39 | licenses, a licensor grants the public permission to use the
40 | licensed material under specified terms and conditions. If
41 | the licensor's permission is not necessary for any reason--for
42 | example, because of any applicable exception or limitation to
43 | copyright--then that use is not regulated by the license. Our
44 | licenses grant only permissions under copyright and certain
45 | other rights that a licensor has authority to grant. Use of
46 | the licensed material may still be restricted for other
47 | reasons, including because others have copyright or other
48 | rights in the material. A licensor may make special requests,
49 | such as asking that all changes be marked or described.
50 | Although not required by our licenses, you are encouraged to
51 | respect those requests where reasonable. More_considerations
52 | for the public:
53 | wiki.creativecommons.org/Considerations_for_licensees
54 |
55 | =======================================================================
56 |
57 | Creative Commons Attribution 4.0 International Public License
58 |
59 | By exercising the Licensed Rights (defined below), You accept and agree
60 | to be bound by the terms and conditions of this Creative Commons
61 | Attribution 4.0 International Public License ("Public License"). To the
62 | extent this Public License may be interpreted as a contract, You are
63 | granted the Licensed Rights in consideration of Your acceptance of
64 | these terms and conditions, and the Licensor grants You such rights in
65 | consideration of benefits the Licensor receives from making the
66 | Licensed Material available under these terms and conditions.
67 |
68 |
69 | Section 1 -- Definitions.
70 |
71 | a. Adapted Material means material subject to Copyright and Similar
72 | Rights that is derived from or based upon the Licensed Material
73 | and in which the Licensed Material is translated, altered,
74 | arranged, transformed, or otherwise modified in a manner requiring
75 | permission under the Copyright and Similar Rights held by the
76 | Licensor. For purposes of this Public License, where the Licensed
77 | Material is a musical work, performance, or sound recording,
78 | Adapted Material is always produced where the Licensed Material is
79 | synched in timed relation with a moving image.
80 |
81 | b. Adapter's License means the license You apply to Your Copyright
82 | and Similar Rights in Your contributions to Adapted Material in
83 | accordance with the terms and conditions of this Public License.
84 |
85 | c. Copyright and Similar Rights means copyright and/or similar rights
86 | closely related to copyright including, without limitation,
87 | performance, broadcast, sound recording, and Sui Generis Database
88 | Rights, without regard to how the rights are labeled or
89 | categorized. For purposes of this Public License, the rights
90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar
91 | Rights.
92 |
93 | d. Effective Technological Measures means those measures that, in the
94 | absence of proper authority, may not be circumvented under laws
95 | fulfilling obligations under Article 11 of the WIPO Copyright
96 | Treaty adopted on December 20, 1996, and/or similar international
97 | agreements.
98 |
99 | e. Exceptions and Limitations means fair use, fair dealing, and/or
100 | any other exception or limitation to Copyright and Similar Rights
101 | that applies to Your use of the Licensed Material.
102 |
103 | f. Licensed Material means the artistic or literary work, database,
104 | or other material to which the Licensor applied this Public
105 | License.
106 |
107 | g. Licensed Rights means the rights granted to You subject to the
108 | terms and conditions of this Public License, which are limited to
109 | all Copyright and Similar Rights that apply to Your use of the
110 | Licensed Material and that the Licensor has authority to license.
111 |
112 | h. Licensor means the individual(s) or entity(ies) granting rights
113 | under this Public License.
114 |
115 | i. Share means to provide material to the public by any means or
116 | process that requires permission under the Licensed Rights, such
117 | as reproduction, public display, public performance, distribution,
118 | dissemination, communication, or importation, and to make material
119 | available to the public including in ways that members of the
120 | public may access the material from a place and at a time
121 | individually chosen by them.
122 |
123 | j. Sui Generis Database Rights means rights other than copyright
124 | resulting from Directive 96/9/EC of the European Parliament and of
125 | the Council of 11 March 1996 on the legal protection of databases,
126 | as amended and/or succeeded, as well as other essentially
127 | equivalent rights anywhere in the world.
128 |
129 | k. You means the individual or entity exercising the Licensed Rights
130 | under this Public License. Your has a corresponding meaning.
131 |
132 |
133 | Section 2 -- Scope.
134 |
135 | a. License grant.
136 |
137 | 1. Subject to the terms and conditions of this Public License,
138 | the Licensor hereby grants You a worldwide, royalty-free,
139 | non-sublicensable, non-exclusive, irrevocable license to
140 | exercise the Licensed Rights in the Licensed Material to:
141 |
142 | a. reproduce and Share the Licensed Material, in whole or
143 | in part; and
144 |
145 | b. produce, reproduce, and Share Adapted Material.
146 |
147 | 2. Exceptions and Limitations. For the avoidance of doubt, where
148 | Exceptions and Limitations apply to Your use, this Public
149 | License does not apply, and You do not need to comply with
150 | its terms and conditions.
151 |
152 | 3. Term. The term of this Public License is specified in Section
153 | 6(a).
154 |
155 | 4. Media and formats; technical modifications allowed. The
156 | Licensor authorizes You to exercise the Licensed Rights in
157 | all media and formats whether now known or hereafter created,
158 | and to make technical modifications necessary to do so. The
159 | Licensor waives and/or agrees not to assert any right or
160 | authority to forbid You from making technical modifications
161 | necessary to exercise the Licensed Rights, including
162 | technical modifications necessary to circumvent Effective
163 | Technological Measures. For purposes of this Public License,
164 | simply making modifications authorized by this Section 2(a)
165 | (4) never produces Adapted Material.
166 |
167 | 5. Downstream recipients.
168 |
169 | a. Offer from the Licensor -- Licensed Material. Every
170 | recipient of the Licensed Material automatically
171 | receives an offer from the Licensor to exercise the
172 | Licensed Rights under the terms and conditions of this
173 | Public License.
174 |
175 | b. No downstream restrictions. You may not offer or impose
176 | any additional or different terms or conditions on, or
177 | apply any Effective Technological Measures to, the
178 | Licensed Material if doing so restricts exercise of the
179 | Licensed Rights by any recipient of the Licensed
180 | Material.
181 |
182 | 6. No endorsement. Nothing in this Public License constitutes or
183 | may be construed as permission to assert or imply that You
184 | are, or that Your use of the Licensed Material is, connected
185 | with, or sponsored, endorsed, or granted official status by,
186 | the Licensor or others designated to receive attribution as
187 | provided in Section 3(a)(1)(A)(i).
188 |
189 | b. Other rights.
190 |
191 | 1. Moral rights, such as the right of integrity, are not
192 | licensed under this Public License, nor are publicity,
193 | privacy, and/or other similar personality rights; however, to
194 | the extent possible, the Licensor waives and/or agrees not to
195 | assert any such rights held by the Licensor to the limited
196 | extent necessary to allow You to exercise the Licensed
197 | Rights, but not otherwise.
198 |
199 | 2. Patent and trademark rights are not licensed under this
200 | Public License.
201 |
202 | 3. To the extent possible, the Licensor waives any right to
203 | collect royalties from You for the exercise of the Licensed
204 | Rights, whether directly or through a collecting society
205 | under any voluntary or waivable statutory or compulsory
206 | licensing scheme. In all other cases the Licensor expressly
207 | reserves any right to collect such royalties.
208 |
209 |
210 | Section 3 -- License Conditions.
211 |
212 | Your exercise of the Licensed Rights is expressly made subject to the
213 | following conditions.
214 |
215 | a. Attribution.
216 |
217 | 1. If You Share the Licensed Material (including in modified
218 | form), You must:
219 |
220 | a. retain the following if it is supplied by the Licensor
221 | with the Licensed Material:
222 |
223 | i. identification of the creator(s) of the Licensed
224 | Material and any others designated to receive
225 | attribution, in any reasonable manner requested by
226 | the Licensor (including by pseudonym if
227 | designated);
228 |
229 | ii. a copyright notice;
230 |
231 | iii. a notice that refers to this Public License;
232 |
233 | iv. a notice that refers to the disclaimer of
234 | warranties;
235 |
236 | v. a URI or hyperlink to the Licensed Material to the
237 | extent reasonably practicable;
238 |
239 | b. indicate if You modified the Licensed Material and
240 | retain an indication of any previous modifications; and
241 |
242 | c. indicate the Licensed Material is licensed under this
243 | Public License, and include the text of, or the URI or
244 | hyperlink to, this Public License.
245 |
246 | 2. You may satisfy the conditions in Section 3(a)(1) in any
247 | reasonable manner based on the medium, means, and context in
248 | which You Share the Licensed Material. For example, it may be
249 | reasonable to satisfy the conditions by providing a URI or
250 | hyperlink to a resource that includes the required
251 | information.
252 |
253 | 3. If requested by the Licensor, You must remove any of the
254 | information required by Section 3(a)(1)(A) to the extent
255 | reasonably practicable.
256 |
257 | 4. If You Share Adapted Material You produce, the Adapter's
258 | License You apply must not prevent recipients of the Adapted
259 | Material from complying with this Public License.
260 |
261 |
262 | Section 4 -- Sui Generis Database Rights.
263 |
264 | Where the Licensed Rights include Sui Generis Database Rights that
265 | apply to Your use of the Licensed Material:
266 |
267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right
268 | to extract, reuse, reproduce, and Share all or a substantial
269 | portion of the contents of the database;
270 |
271 | b. if You include all or a substantial portion of the database
272 | contents in a database in which You have Sui Generis Database
273 | Rights, then the database in which You have Sui Generis Database
274 | Rights (but not its individual contents) is Adapted Material; and
275 |
276 | c. You must comply with the conditions in Section 3(a) if You Share
277 | all or a substantial portion of the contents of the database.
278 |
279 | For the avoidance of doubt, this Section 4 supplements and does not
280 | replace Your obligations under this Public License where the Licensed
281 | Rights include other Copyright and Similar Rights.
282 |
283 |
284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
285 |
286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
296 |
297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
306 |
307 | c. The disclaimer of warranties and limitation of liability provided
308 | above shall be interpreted in a manner that, to the extent
309 | possible, most closely approximates an absolute disclaimer and
310 | waiver of all liability.
311 |
312 |
313 | Section 6 -- Term and Termination.
314 |
315 | a. This Public License applies for the term of the Copyright and
316 | Similar Rights licensed here. However, if You fail to comply with
317 | this Public License, then Your rights under this Public License
318 | terminate automatically.
319 |
320 | b. Where Your right to use the Licensed Material has terminated under
321 | Section 6(a), it reinstates:
322 |
323 | 1. automatically as of the date the violation is cured, provided
324 | it is cured within 30 days of Your discovery of the
325 | violation; or
326 |
327 | 2. upon express reinstatement by the Licensor.
328 |
329 | For the avoidance of doubt, this Section 6(b) does not affect any
330 | right the Licensor may have to seek remedies for Your violations
331 | of this Public License.
332 |
333 | c. For the avoidance of doubt, the Licensor may also offer the
334 | Licensed Material under separate terms or conditions or stop
335 | distributing the Licensed Material at any time; however, doing so
336 | will not terminate this Public License.
337 |
338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
339 | License.
340 |
341 |
342 | Section 7 -- Other Terms and Conditions.
343 |
344 | a. The Licensor shall not be bound by any additional or different
345 | terms or conditions communicated by You unless expressly agreed.
346 |
347 | b. Any arrangements, understandings, or agreements regarding the
348 | Licensed Material not stated herein are separate from and
349 | independent of the terms and conditions of this Public License.
350 |
351 |
352 | Section 8 -- Interpretation.
353 |
354 | a. For the avoidance of doubt, this Public License does not, and
355 | shall not be interpreted to, reduce, limit, restrict, or impose
356 | conditions on any use of the Licensed Material that could lawfully
357 | be made without permission under this Public License.
358 |
359 | b. To the extent possible, if any provision of this Public License is
360 | deemed unenforceable, it shall be automatically reformed to the
361 | minimum extent necessary to make it enforceable. If the provision
362 | cannot be reformed, it shall be severed from this Public License
363 | without affecting the enforceability of the remaining terms and
364 | conditions.
365 |
366 | c. No term or condition of this Public License will be waived and no
367 | failure to comply consented to unless expressly agreed to by the
368 | Licensor.
369 |
370 | d. Nothing in this Public License constitutes or may be interpreted
371 | as a limitation upon, or waiver of, any privileges and immunities
372 | that apply to the Licensor or You, including from the legal
373 | processes of any jurisdiction or authority.
374 |
375 |
376 | =======================================================================
377 |
378 | Creative Commons is not a party to its public
379 | licenses. Notwithstanding, Creative Commons may elect to apply one of
380 | its public licenses to material it publishes and in those instances
381 | will be considered the “Licensor.” The text of the Creative Commons
382 | public licenses is dedicated to the public domain under the CC0 Public
383 | Domain Dedication. Except for the limited purpose of indicating that
384 | material is shared under a Creative Commons public license or as
385 | otherwise permitted by the Creative Commons policies published at
386 | creativecommons.org/policies, Creative Commons does not authorize the
387 | use of the trademark "Creative Commons" or any other trademark or logo
388 | of Creative Commons without its prior written consent including,
389 | without limitation, in connection with any unauthorized modifications
390 | to any of its public licenses or any other arrangements,
391 | understandings, or agreements concerning use of licensed material. For
392 | the avoidance of doubt, this paragraph does not form part of the
393 | public licenses.
394 |
395 | Creative Commons may be contacted at creativecommons.org.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Web Archiving Course
2 |
3 | Primary Author:
4 | * Mark Phillips, Ph.D.
5 | * mark.phillips@unt.edu
6 | * https://twitter.com/vphill
7 |
8 | ## Overview
9 |
10 | This course is divided into [modules](./modules/) that are designed to build sequentially throughout the course.
11 |
12 | There are fourteen modules in total, starting the first week of class and ending the last week of the semester.
13 |
14 | Each week there is an **_Overview and Objectives_** section that tells students what will happen in that week's module.
15 |
16 | The **_Readings_** page lists the required and optional readings for that week. In addition to traditional readings, there are videos to watch and sometimes audio recordings to listen to.
17 |
18 | Each module also includes a feature called _**Exploring Web Archives**_ that gives students a guided introduction to the wide range of web archives and collections that exist around the world. These explorations are generally part of the weekly discussion.
19 |
20 | Finally, each module has a **_Discussion_** related to the topics covered that week, which also revisits the web archives students explored in the Exploring Web Archives section.
21 |
22 | Generally, both the current week's module and the following week's module are published, for students who like to work a bit ahead. The graded discussions open on Monday morning of the module week.
23 |
24 | In addition to weekly readings, activities, and discussions, there are four major [assignments](./assignments/) spread across the semester. The final assignment is informed by the third assignment.
25 |
26 | ## Sections Taught
27 |
28 | This course has been taught in the following semesters.
29 |
30 | * [UNT INFO 5960.001 - Web Archiving](syllabus-5960.001-Web-Archiving-2023-Spring.md) - Spring 2023
31 | * [UNT INFO 5960.001 - Web Archiving](syllabus-5960.001-Web-Archiving-2022-Spring.md) - Spring 2022
32 |
33 |
34 | ## Corrections / Suggestions / Contributions
35 |
36 | If you notice typos, have readings that you would like to suggest, or have ideas for web archiving activities, I am interested in receiving your contributions to this course. Please reach out to me directly or, if you are inclined, submit an __Issue__ with this repository.
37 |
38 | ## Acknowledgements
39 |
40 | The readings and activities in this course were informed in part by instructors of similar courses around the country. I would like to highlight the work of Samantha Abrams (WISC), Lori Donovan (UMICH), and Ayoung Yoon (UNC).
41 |
42 | ## Contributors
43 |
44 | Kristy Phillips - https://github.com/k-phillips
45 |
46 | ## License
47 |
48 | This work is licensed under a Creative Commons Attribution 4.0 International License.
49 |
--------------------------------------------------------------------------------
/assignments/assignment-01.md:
--------------------------------------------------------------------------------
1 | # Assignment 1: Web Archive Critique
2 |
3 | ## Context:
4 | The purpose of the Web Archive Critique is to explore a web archive and describe what you observe. The assignment is designed for you to share your observations about an existing web archive, including the organization responsible for creating the archive, the scope of the collection, any quality or performance issues you find, and finally to discuss options that might improve the experience with the collection.
5 |
6 | ## Assignment:
7 | Identify an existing web archive. This could be one that you have encountered in the course so far or something you discover on your own. The readings and the Exploring Web Archives portion of Module Three are good places to start looking for potential web archives. Once you select the archive, explore the collected websites to get a good idea of what is or isn’t collected as part of the archive.
8 |
9 | For this assignment, choose a topical web archive. For example, instead of the Library of Congress Web Archives as a whole, choose one of its individual archiving initiatives. Likewise, if you look at Archive-It, rather than choosing an institution, which might have multiple projects, you are better off choosing a more topical or event-focused collection.
10 |
11 | In this assignment you will be describing a web archive. It is important to include links to content within the web archive that support your observations.
12 |
13 | ## Organization and Content:
14 | The analysis should have the following sections: Background, Scope, Quality, Closing, and References.
15 |
16 | ### Background
17 |
18 | Provide an overview of the web archive that you have selected. This should include information about the collecting organization, the motivation for creating this web archive, and any available technical information about building the collection.
19 |
20 | Other questions that are good to think about (though might not apply to all collections) include:
21 | When was the collection created? Why was it created? Does it expand upon an existing physical collection or support a program of study or scholarship at that institution? Who is responsible for the creation of this web archive? Is it a single institution or a collaborative project? What tools/platforms were used in the creation of this web archive? Is it an ongoing collection or has it been completed? How are users expected to find this web archive? How is it cataloged or described?
22 |
23 | ### Scope
24 | What is the scope of the content being collected? Does it focus on a specific event or is it based around a topic or subject? Does it contain specific kinds of websites like political websites or election websites? Does it contain websites from a specific period of time?
25 |
26 | Just as important as what is included, what doesn’t seem to be included in the archive?
27 |
28 | ### Quality
29 | Describe the overall quality of the captured websites in the archive. Are there types of content that don’t seem to display or render in the playback tool very well? Does most of the content display correctly? If there are specific websites or website features that don’t seem to display well, discuss that in this section. Include links to examples that demonstrate the issues you identify whenever possible.
30 |
31 | ### Closing
32 | Discuss your overall observations of the web archive you chose, including the feedback you would give the collection creators if given the opportunity. Provide any suggestions for additional content that might be missing from the collection.
33 |
34 | ### References
35 | List any references cited within the document.
36 |
37 | ## Layout Specifics:
38 | 2-3 pages of textual content (1000-1500 words), font-size 11 pt, double spaced with 1 inch margins throughout the document.
39 |
40 | Feel free to include screenshots as needed to provide examples or highlight points. Don’t use screenshots to fill up space; if they make your document a bit longer than three pages, that isn’t a problem.
41 |
42 | Use APA standards for citations. Consult an online source such as Purdue OWL for APA specifics.
43 |
44 | Put your last name in the upper right margin. Include pagination in the bottom margin.
45 |
46 | Name the document Assignment1_lastname.docx, Assignment1_lastname.doc, or Assignment1_lastname.odt depending on which tool you use. You will submit this to the **Major Assignment: Web Archive Critique** in Canvas.
47 |
48 | ## Grading Rubric
49 | Design (10 points)
50 | * Does the document follow the specific instructions for the assignment?
51 | * Does the document contain the correct information in the header and footer?
52 | * Does the document use appropriate margins, line spacing, and font size?
53 | * Is the document’s length appropriate based on the instructions?
54 |
55 | Content (30 points)
56 | * Does the document introduce the selected web archive?
57 | * Does the document identify the institutions involved with the creation of the web archive?
58 | * Does the document describe the scope of content in the web archive?
59 | * Does the document discuss the quality of the web captures contained in the web archive?
60 | * Does the document include critique or suggestions for improvement?
61 |
62 | Linking and Citations (5 points)
63 | * Does the document include links to the web archive?
64 | * Does the document include links to support observations or critique?
65 | * Does the document include links to examples of quality issues?
66 | * Does the document include properly formatted citations?
67 |
68 | Delivery (5 points)
69 | * Was the document submitted to the correct assignment module on Canvas?
70 | * Was the document submitted on time?
71 | * Was the document submitted in the correct file format (.doc, .docx, .odt)?
72 | * Was the document submitted with the correct file name?
73 |
74 |
75 |
76 |
--------------------------------------------------------------------------------
/assignments/assignment-02.md:
--------------------------------------------------------------------------------
1 | # Assignment 2: Web Archive Tools Critique
2 |
3 | ## Context:
4 | The purpose of the Web Archive Tools Critique is to learn to critically evaluate technology tools and services related to the web archiving process. The assignment is designed for you to share your observations about an existing tool or service related to the broad practice of archiving the web. These observations will include who is responsible for creating the tool or service, what problem it is trying to solve, whether other tools in this space do something similar or different, what costs (if any) are associated with the tool, and how users generally interact with it (as a service, software that is run locally, a browser plugin, or a tool requiring a server).
5 |
6 | ## Assignment:
7 | Identify an existing tool used in the web archiving space. I suggest starting with the “Tools and Software” section of the Awesome Web Archiving list from the IIPC (https://github.com/iipc/awesome-web-archiving). If you have another tool or service in mind that isn’t on that list, feel free to use it; the tool just has to be related to the overall web archiving process. Once you select the tool or service, explore its documentation and website to get a good idea of what problem it is trying to solve and other information about its creation.
8 |
9 | While it isn’t strictly required, I suggest you pick a tool or service that you are able to download, create an account with, or otherwise use as part of your assignment. I couldn’t imagine writing believably about a tool I am not able to see, test, or use, and you will find this assignment much easier if you pick something you can actually test.
10 |
11 | In this assignment it is important to include links to relevant web pages and documentation related to the tool that support your observations.
12 |
13 | ## Organization and Content:
14 | The assignment should start with the title “Assignment 2: Web Archive Tools Critique” centered at the top of the first page.
15 |
16 | The analysis should have the following section headings: Introduction, Problem Space, Technology Requirements, Assessment, and References.
17 |
18 | ### Introduction
19 | Provide an overview of the web archive tool or service that you have selected. This should include information about the tool itself, the motivation for creating it, and a brief description of what it does. Other useful questions to answer in the introduction: Who is responsible for this tool? How long has this tool or service been around? What kind of software license does it have? What are the costs involved with using this tool?
20 |
21 | ### Problem Space
22 | Provide a discussion of why this tool or service exists. What is the problem space that this tool or service was created for? Go into more detail than in the introduction about what the tool or service actually accomplishes. How does this tool help to solve or minimize the issues in the problem space?
23 |
24 | ### Technology Requirements
25 | Describe the technology requirements for using this tool or service. Is it designed to be downloaded and run on a desktop, or run as a server? Is it a hosted service that requires an account and a subscription? In what environments can the tool be installed: Windows, Linux, or Mac? If you have a chance to install the tool and try using it, what was your experience with the process?
26 |
27 | You should also include any information (and links) related to online documentation for the tool or service. Does the tool or service have an online mailing list for help? Is there a help forum?
28 |
29 | ### Assessment
30 | Discuss the overall observations of the web archive tool or service that you chose. Would you recommend this tool for others to use? Do you think that it solves or minimizes the problem it exists to solve? Does it seem like the tool is still maintained and in use? What is the size of the user community of the tool? Is there enough documentation available to get the tool installed and working? Are there things that would make using the tool easier that you would suggest to the project owner?
31 |
32 | ### References
33 |
34 | Any references cited within the document. Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA.
35 |
36 | ## Layout Specifics:
37 | 2-3 pages of textual content (1000-1500 words), font-size 11 pt, double spaced with 1 inch margins throughout the document.
38 |
39 | The title for the assignment should be centered horizontally at the top of the first page (just like this document). Sections should be bolded and slightly larger than the 11pt font used in the rest of the document. I suggest using the Headings available in most word processing tools.
40 |
41 | Feel free to include screenshots as needed to provide examples or highlight points. Don’t try to fill up space with screenshots; if they make your document a bit longer than three pages, that isn’t a problem.
42 |
43 | Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA.
44 |
45 | Put your last name in the upper right margin. Include pagination in the bottom margin.
46 |
47 | Name the document Assignment2_lastname.docx, Assignment2_lastname.doc, or Assignment2_lastname.odt depending on which tool you use. You will submit this to the **Major Assignment: Web Archive Tool Critique** in Canvas.
48 |
49 | ## Grading Rubric
50 | Design (10 points)
51 | * Does the document follow the specific instructions for the assignment?
52 | * Does the document contain a title and section headings?
53 | * Does the document contain the correct information in the header and footer?
54 | * Does the document use appropriate margins, line spacing, and font size?
55 | * Is the document’s length appropriate based on the instructions?
56 |
57 | Content (30 points)
58 | * Does the document introduce the selected web archive tool or service?
59 | * Does the document identify the person/organization responsible for the creation of the tool or service?
60 | * Does the document describe the problem space the tool or service is trying to solve?
61 | * Does the document discuss the functionality and features of the tool or service?
62 | * Does the document include critique or limitations of the tool or service?
63 |
64 | Linking and Citations (5 points)
65 | * Does the document include links to the web archive tool’s website or home page?
66 | * Does the document include links to support observations or critique?
67 | * Does the document include properly formatted citations?
68 |
69 | Delivery (5 points)
70 | * Was the document submitted to the correct assignment module on Canvas?
71 | * Was the document submitted on time?
72 | * Was the document submitted in the correct file format (.doc, .docx, .odt)?
73 | * Was the document submitted with the correct file name?
74 |
75 |
--------------------------------------------------------------------------------
/assignments/assignment-03.md:
--------------------------------------------------------------------------------
1 | # Assignment 3: Web Archive Collection Plan
2 |
3 | ## Context:
4 |
5 | The purpose of the Web Archive Collection Plan is to apply the things you have learned so far in this course and begin to think about building an actual web archive collection. The assignment is designed to give you experience in developing a collection plan that relates to a web archive that you will be building in the final project for this course. This assignment will make use of an existing set of collection planning guidelines by Murray and Hsieh (2006) that were introduced to you earlier in this course.
6 |
7 | ## Assignment:
8 |
9 | For this assignment you will make use of the Collection Planning Guidelines by Murray and Hsieh (https://digital.library.unt.edu/ark:/67531/metadc33006/) that were introduced in the Collection Policies Module of this course. This set of guidelines provides information about what to include in a collection plan. While you should be familiar with the entire document, Section 3, Creating a Web Collection Plan (p. 18), is the part you will want to read most closely. Your Collection Plan will include Sections 1-4:
10 |
11 | * Section 1. Mission & Scope
12 | * Section 2. Selection Activities
13 | * Section 3. Web Site Acquisition
14 | * Section 4. Descriptive Metadata Requirements
15 |
16 | This assignment will feed into your final project for this course, which involves the creation of a small web archive collection using the Conifer tool. More information about the final project will be released in a few weeks. For now, this assignment will start you thinking about building a web archive collection and the final project will follow up with actually doing the crawling.
17 |
18 | For this assignment, you will be creating the Collection Policy for a collection of archived web pages. The exact topic of the collection is up to you. You can choose a topic or subject-based collection, event-based collection, or organizational collection. You will want to scope your collection large enough that it can include at least 15 seed URLs, but not so broad that it becomes infeasible to create as part of this course. Some sections (such as the Mission) might require you to get creative to complete. Feel free to use your imagination here: either create a fictional organization that this collection belongs to, or develop the collection on behalf of an existing institution.
19 |
20 | For this assignment, you will need to identify 5-10 seed URLs to include in your web archive collection. For each seed URL, include a description of why that seed has been included in this collection.
21 |
22 | An example collection plan using these guidelines for the CyberCemetery is available for reference here. https://digital.library.unt.edu/ark:/67531/metadc36313/
23 |
24 | ## Organization and Content:
25 | The assignment should start with the title “Assignment 3: Web Archive Collection Plan” centered at the top of the first page.
26 |
27 | The following layout, described in Appendix A. Web Collection Plan Outline (p. 41) of the Collection Planning Guidelines, is a good starting place for organizing your document.
28 |
29 | Section 1. Mission & Scope
30 |
31 | 1. Mission Statement
32 | 2. User Group(s)
33 | 3. Collection Subject, Theme, or Event
34 | 4. Curator(s)
35 |
36 | Section 2. Selection Activities
37 |
38 | 1. Seed List
39 | (Include 5-10 seed URLs for your web archive collection and include a description of why they are included in this collection. These will be used in the final project).
40 | 1. URL(s)
41 | 2. Brief Description(s)
42 | 2. Initial Boundary Specification
43 | 1. Depth of linked web pages within the seed URL host
44 | 2. Inclusion or exclusion of linked web pages from external hosts for each seed URL host
45 | * Depth of linked web pages from external hosts (if included)
46 | 3. Rights Metadata
47 | 1. Rights designation
48 | 2. Rights metadata
49 | 3. Linked and sourced objects
50 |
51 | Section 3. Web Site Acquisition
52 |
53 | 1. Frequency of Capture
54 | 1. Date
55 | 2. Interval
56 | 2. Capture Boundaries
57 | 1. Depth of linked web pages within the seed URL host
58 | 2. Inclusion or exclusion of linked web pages from external hosts for each seed URL host
59 | * Depth of linked web pages from external hosts (if included)
60 | 3. Material Types & Formats
61 | 1. Excluded types
62 | 2. Excluded formats
63 | 4. Interactive & Dynamic Content
64 | 1. Authentication (username/password)
65 | 2. Email links
66 | 3. Forms
67 | 4. Database-generated pages (based on user queries)
68 | 5. Dynamically or programmatically generated web pages
69 |
70 | Section 4. Descriptive Metadata Requirements
71 |
72 | 1. Level of Description
73 | 1. Collection Level
74 | 2. Web Site Level
75 | 3. Information object level
76 | 2. Metadata elements
77 | 1. Essential
78 | 2. Desirable
79 | 3. Controlled vocabularies
80 |
81 | **References**
82 | Any references cited within the document. Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA.
83 |
84 | ## Layout Specifics:
85 | 4-5 pages of textual content, font-size 11 pt, double spaced with 1 inch margins throughout the document. Include page numbers in the bottom right side of the footer. Each of the four sections for this document should fill roughly a page.
86 |
87 | The title for the assignment should be centered horizontally at the top of the first page (just like this document). Sections should be bolded and slightly larger than the 11pt font used in the rest of the document. I suggest using the Headings available in most word processing tools.
88 |
89 | Feel free to include screenshots as needed to provide examples or highlight points.
90 |
91 | Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA.
92 |
93 | Put your last name in the upper right margin. Include pagination in the bottom margin.
94 |
95 | Name the document Assignment3_lastname.docx, Assignment3_lastname.doc, or Assignment3_lastname.odt depending on which tool you use.
96 | You will submit this to the **Major Assignment: Web Archive Collection Plan** in Canvas.
97 |
98 | ## Grading Rubric
99 |
100 | Design (10 points)
101 | * Does the document follow the specific instructions for the assignment?
102 | * Does the document contain a title and section headings?
103 | * Does the document contain the correct information in the header and footer?
104 | * Does the document use appropriate margins, line spacing, and font size?
105 | * Is the document’s length appropriate based on the instructions?
106 |
107 | Content (30 points)
108 | * Does the document introduce the mission and scope of the collection?
109 | * Does the document identify the collection focus such as subject, theme, or event?
110 | * Does the document have at least five seed URLs and descriptions?
111 | * Does the document discuss the frequency of capture and capture scope?
112 | * Does the document include information about metadata elements?
113 |
114 | Linking and Citations (5 points)
115 | * Does the document have at least five seed URLs and descriptions?
116 | * Does the document include necessary citations?
117 | * Does the document include properly formatted citations?
118 |
119 | Delivery (5 points)
120 | * Was the document submitted to the correct assignment module on Canvas?
121 | * Was the document submitted on time?
122 | * Was the document submitted in the correct file format (.doc, .docx, .odt)?
123 | * Was the document submitted with the correct file name?
124 |
--------------------------------------------------------------------------------
/assignments/assignment-04.md:
--------------------------------------------------------------------------------
1 | # Assignment 4: Building a Web Archive
2 |
3 | ## Context:
4 |
5 | The purpose of the Building a Web Archive assignment is to function as a final project for this course. It has been designed to provide you with an opportunity to draw from the readings, discussion, and assignments you have worked on in previous modules of this course, and build a sample web archive based on your Assignment 3: Web Archive Collection Plan.
6 |
7 | ## Assignment:
8 |
9 | For this assignment you will make use of the Web Archive Collection Plan that you submitted in Assignment 3 and begin to build that web archive.
10 |
11 | We will use the free Conifer (previously called Webrecorder.io) tool located at https://conifer.rhizome.org/
12 |
13 | In Module 13 you signed up for a free account, created a public collection, and uploaded a link for that collection to the class discussion board. You will be adding seed URLs to that collection and making use of the tools and functionality in Conifer to conduct your crawls of those seeds.
14 |
15 | If you have questions about using the Conifer tool, I suggest you begin with the Conifer Help guides https://guide.conifer.rhizome.org/ or look at some of the YouTube videos that discuss the operation of the service. Overall, the tool should be familiar to you from the class Web Archive Exercises.
16 |
17 | For your final project you will need to expand your initial seed list to a total of at least 15 seeds.
18 |
19 | For each of the seeds you will document at least the following pieces of information.
20 |
21 | | Metadata Field                      | Description of Field                                                         |
22 | |-------------------------------------|------------------------------------------------------------------------------|
23 | | Seed URL                            | The seed URL you will be collecting.                                         |
24 | | Pre-Crawl Review                    | Problems that might exist for a crawler.                                     |
25 | | Title of Seed                       | The title of the seed/document/website.                                      |
26 | | Description of Seed URL             | Textual description of the seed URL.                                         |
27 | | Creator/Author/Publisher            | Who is responsible for the creation of this seed URL.                        |
28 | | Reason for Inclusion in Collection  | Why did you include this seed in your collection?                            |
29 | | Post-Crawl Review                   | Any limitations you notice in your crawled content after crawling.           |
30 | | Crawled Seed URL                    | Link directly to the seed URL in your collection in Conifer.                 |
31 |
32 | You are welcome to include other metadata fields that are appropriate for the type of collection you are creating. These could include country, language, branch of government, Olympian name, political party, or anything else that would be helpful for a user trying to use your collection. The pre- and post-crawl review are opportunities to communicate some of the crawling challenges you see based on your experience in this course. It is also a way of describing any quality issues that you identify in your collection after you have completed your crawls.
33 |
34 | You are free to present the metadata and fields in any format that works for you, just make sure it is clear what the fields are, and which seed URL they belong to.
35 |
36 | When you are crawling your seeds, you should keep two things in mind. First, you have a limited amount of space (5GB) for this work, so don’t go crazy trying to capture a ton of video, for example. Second, you want to make sure you capture each seed at an appropriate level that fits within the scope of your collection. You may not need to capture the site in its entirety; just make sure you discuss what you did and didn’t crawl in your Pre-Crawl and Post-Crawl Review sections.
37 |
38 | ## Organization and Content:
39 |
40 | The assignment should start with the title “Assignment 4: Building a Web Archive” centered at the top of the first page.
41 |
42 | The following sections should be present as headings.
43 |
44 | ### Collection Overview
45 | This section outlines the collection you are building, including links to the public collection page in the Conifer service. When writing this overview, take into account the seeds you have added beyond those in your Collection Plan so that the overview reflects the full collection. In this section you should speak to the crawl modality of the web archive you are creating (domain, website, topical, event, document).
46 |
47 | ### Seed List
48 | This section will contain all 15 seed URLs (or more, if you choose to include more) and their associated metadata fields (as listed in the Assignment section above). You may format these in whatever way you feel best conveys the information clearly.
49 |
50 | ### References
51 | Any references cited within the document. Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA.
52 |
53 | ## Layout Specifics:
54 |
55 | Font-size 11 pt, double spaced with 1 inch margins throughout the document. Include page numbers in the bottom right side of the footer.
56 |
57 | The title for the assignment should be centered horizontally at the top of the first page (just like this document). Sections should be bolded and slightly larger than the 11pt font used in the rest of the document. I suggest using the Headings available in most word processing tools.
58 |
59 | Feel free to include screenshots as needed to provide examples or highlight points.
60 |
61 | Use APA standards for citations. Use an online source (Purdue OWL) for specifics about APA.
62 |
63 | Put your last name in the upper right margin. Include pagination in the bottom margin.
64 |
65 | Name the document Assignment4_lastname.docx, Assignment4_lastname.doc, or Assignment4_lastname.odt depending on which tool you use. You will submit this to the **Major Assignment: Building a Web Archive** in Canvas.
66 |
67 | ## Grading Rubric
68 |
69 | Design (15 points)
70 | * Does the document follow the specific instructions for the assignment?
71 | * Does the document contain a title and section headings?
72 | * Does the document contain the correct information in the header and footer?
73 | * Does the document use appropriate margins, line spacing, and font size?
74 | * Is the document’s length appropriate based on the instructions?
75 |
76 | Content (65 points)
77 | * Does the document provide an overview of the collection?
78 | * Does the document identify the collection crawl modality?
79 | * Does the document have at least fifteen seed URLs?
80 | * Does the document include the required metadata fields with each seed URL?
81 | * Does the archived content reflect the seed list and the pre- and post-crawl reviews?
82 |
83 | Linking and Citations (10 points)
84 | * Does the document have at least fifteen seed URLs and appropriate metadata?
85 | * Does the document include necessary citations?
86 | * Does the document include properly formatted citations?
87 |
88 | Delivery (10 points)
89 | * Was the document submitted to the correct assignment module on Canvas?
90 | * Was the document submitted on time?
91 | * Was the document submitted in the correct file format (.doc, .docx, .odt)?
92 | * Was the document submitted with the correct file name?
93 |
--------------------------------------------------------------------------------
/modules/images/.ignore:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/modules/images/module-02-unt-homepage-source.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-02-unt-homepage-source.png
--------------------------------------------------------------------------------
/modules/images/module-02-unt-homepage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-02-unt-homepage.png
--------------------------------------------------------------------------------
/modules/images/module-04-save-options.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-04-save-options.png
--------------------------------------------------------------------------------
/modules/images/module-05-save-page-now.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-05-save-page-now.png
--------------------------------------------------------------------------------
/modules/images/module-06-archive-today.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-06-archive-today.png
--------------------------------------------------------------------------------
/modules/images/module-07-oldweb-today-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-07-oldweb-today-01.png
--------------------------------------------------------------------------------
/modules/images/module-07-oldweb-today-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-07-oldweb-today-02.png
--------------------------------------------------------------------------------
/modules/images/module-08-archiveready-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-08-archiveready-01.png
--------------------------------------------------------------------------------
/modules/images/module-08-archiveready-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-08-archiveready-02.png
--------------------------------------------------------------------------------
/modules/images/module-08-arquivo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-08-arquivo.png
--------------------------------------------------------------------------------
/modules/images/module-09-awp-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-09-awp-01.png
--------------------------------------------------------------------------------
/modules/images/module-09-ukwa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-09-ukwa.png
--------------------------------------------------------------------------------
/modules/images/module-10-time-travel-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-time-travel-01.png
--------------------------------------------------------------------------------
/modules/images/module-10-time-travel-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-time-travel-02.png
--------------------------------------------------------------------------------
/modules/images/module-10-trove-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-trove-01.png
--------------------------------------------------------------------------------
/modules/images/module-10-trove-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-trove-02.png
--------------------------------------------------------------------------------
/modules/images/module-10-trove-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-10-trove-03.png
--------------------------------------------------------------------------------
/modules/images/module-11-commoncrawl-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-commoncrawl-01.png
--------------------------------------------------------------------------------
/modules/images/module-11-replayweb-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-replayweb-01.png
--------------------------------------------------------------------------------
/modules/images/module-11-replayweb-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-replayweb-02.png
--------------------------------------------------------------------------------
/modules/images/module-11-replayweb-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-11-replayweb-03.png
--------------------------------------------------------------------------------
/modules/images/module-13-conifer-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-01.png
--------------------------------------------------------------------------------
/modules/images/module-13-conifer-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-02.png
--------------------------------------------------------------------------------
/modules/images/module-13-conifer-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-03.png
--------------------------------------------------------------------------------
/modules/images/module-13-conifer-04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-13-conifer-04.png
--------------------------------------------------------------------------------
/modules/images/module-14-robust-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-14-robust-01.png
--------------------------------------------------------------------------------
/modules/images/module-14-robust-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-14-robust-02.png
--------------------------------------------------------------------------------
/modules/images/module-14-robust-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vphill/web-archiving-course/a8285a5f5a9233b3afe4416b73c6d66ec71a83e5/modules/images/module-14-robust-03.png
--------------------------------------------------------------------------------
/modules/module-00-introductions.md:
--------------------------------------------------------------------------------
1 | # Overview of a Weekly Module
2 |
3 | This course is divided into modules that correspond to the weeks in the semester.
4 |
5 | There are fourteen modules in total starting the first week of class and ending the last week of the semester.
6 |
7 | Each week there is an **_Overview and Objectives_** which gives you an idea of what is going to be happening in this week's module.
8 |
9 | The **_Readings_** page will list the required and optional readings for that week. In addition to traditional readings, there are videos and sometimes audio recordings for you to listen to.
10 |
11 | A feature of each module called _**Exploring Web Archives**_ is included to give you a bit of a guided introduction to the wide range of web archives and collections that exist around the world. These explorations are generally a part of the weekly discussion.
12 |
13 | Finally, in each module there is a **_Discussion_** related to the topics covered that week. Additionally, the web archives you explored in the Exploring Web Archives section will come back as part of the weekly discussion.
14 |
15 | Generally, the current week's module and the following week's module will be published, for those who like to work a bit ahead. The graded discussions open on Monday morning of the module week.
16 |
17 | # Read the Syllabus
18 |
19 | Take the time to read the Syllabus for the course.
20 |
21 | %Link to Syllabus document%
22 |
23 | # Discussion - Introduce Yourself
24 |
25 | This is a pretty standard first discussion for an online course.
26 |
27 | In a paragraph or two introduce yourself, where you are from, what program you are in, and your progress in that program.
28 |
29 | Try to share some things about yourself that will allow me and your classmates to get to know you a little better.
30 |
31 | If possible, include a picture so that we can put a face to a name in the discussions.
32 |
--------------------------------------------------------------------------------
/modules/module-01-what-is-a-web-archive.md:
--------------------------------------------------------------------------------
1 | # Module One - What is a Web Archive?
2 |
3 | ## Module One - Overview and Objectives
4 |
5 | ### Overview:
6 |
7 | This week we are going to explore what a web archive is, the reasons behind building these types of collections, and finally, what kinds of content you might expect to find in a web archive.
8 |
9 | There are a couple of short videos to watch as well as some readings that present the concept of what a web archive is.
10 |
11 | There is a graded discussion for this module.
12 |
13 | ### Objectives:
14 |
15 | 1. Understand why web archives exist
16 | 2. Begin to interact with existing web archives
17 | 3. Explore existing web archives and report out on what you discover.
18 |
19 |
20 | ## Module One - Readings
21 |
22 | ### Web Archiving:
23 |
24 | * UK Web Archive. "What is a Web Archive?" (April 2, 2015) https://www.youtube.com/watch?v=ubDHY-ynWi0
25 | * Potter, Abbey. “The Why and What of Web Archives.” The Signal: Digital Preservation (April 29, 2014) http://blogs.loc.gov/digitalpreservation/2014/04/the-why-and-what-of-web-archives/
26 | * Lepore, Jill. “The Cobweb: Can the Internet be archived?” The New Yorker (January 26, 2015) http://www.newyorker.com/magazine/2015/01/26/cobweb
27 | * National Digital Stewardship Alliance. “Web Archiving in the United States: A 2017 Survey” 2017 https://osf.io/ht6ay/
28 | * Skim this survey report.
29 | * Bragg, Molly, Hanna, Kristine, et al. “The Web Archiving Life Cycle Model.” (March 2013) http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf
30 | * Introduction: pp. 1-5
31 | * Vision and Objectives: pp. 5-8
32 |
33 | ### Digital Preservation:
34 | * Lavoie, Brian. “The Open Archival Information System (OAIS) Reference Model: Introductory Guide” (2nd Edition) DPC Technology Watch Report 14-02 (October 2014) http://dx.doi.org/10.7207/TWR14-02
35 | * Section 5 and 6 (p 7-28)
36 |
37 | ## Module One - Exploring Web Archives
38 |
39 | ### Exploring Web Archives
40 |
41 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
42 |
43 | This week we will start off with the largest web archive that we have, the Internet Archive's Wayback Machine.
44 |
45 | https://web.archive.org/
46 |
47 | We will learn more about the Internet Archive in future weeks. For now, type a URL into the Wayback Machine and see what you can find. You could try https://unt.edu or see what cnn.com looked like ten years ago. You can also try viewing the different thumbnails that rotate on the page. After you have selected a URL, explore some of the features of the interface.
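If you later want to look up captures without the web interface, the Internet Archive also offers a Wayback Availability API at `https://archive.org/wayback/available`, which returns the capture closest to a requested timestamp as JSON. The sketch below is a minimal, illustrative Python example that assumes the response shape documented for that API (`archived_snapshots`, then `closest`, then `url`); the function names are our own and are not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public endpoint of the Wayback Availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str = "") -> str:
    """Build a query URL; timestamp is YYYYMMDDhhmmss or any leading part of it."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response_text: str):
    """Return the replay URL of the closest capture, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def fetch_closest(url: str, timestamp: str = ""):
    """Ask the live API (requires network access) for the closest capture."""
    with urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, `fetch_closest("cnn.com", "20120101")` asks for the capture of cnn.com closest to January 1, 2012, and returns a link you can open in your browser.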
48 |
49 | ## Module One - Discussion
50 |
51 | ### Discussion Post:
52 |
53 | In at least one paragraph, discuss what you learned about web archiving in this week's introduction to the topic. Were you familiar with this area before this course? Have you ever found yourself using a web archive in your research or work? Knowing that there are web archives, how do you think they might be useful in your work in the future? Finally, if something didn't get answered in the readings, or if a question came up that you would like to hear others' ideas on, please include that in your post.
54 |
55 | In at least one paragraph, discuss what you learned about the Internet Archive's Wayback Machine. What URLs did you look at? Were you surprised by anything that you found? What is your previous experience with the Wayback Machine?
56 |
57 | ### Class Engagement:
58 |
59 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least two of your classmates' posts.
60 |
61 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
62 |
--------------------------------------------------------------------------------
/modules/module-02-what-is-the-web.md:
--------------------------------------------------------------------------------
1 | # Module Two - What is the Web?
2 |
3 | ## Module Two - Overview and Objectives
4 |
5 | ### Overview:
6 | This week we are going to be looking at the building blocks that make up the web. Having a basic familiarity with how the web works is important as we begin to discuss web archiving.
7 |
8 | There are several short videos to watch as well as some readings that provide an overview of common web components such as HTTP, URLs, HTML, and HTTP headers.
9 |
10 | ### Objectives:
11 | 1. Become familiar with building blocks of the web.
12 | 2. Understand how the Web and the Internet are related but different.
13 | 3. Become familiar with the basics of HTML and how to view the source of websites.
14 |
15 | ## Module Two - Readings
16 |
17 | ### Web Architecture
18 |
19 | * Computerphile. _Web vs Internet (Deep Dark Web Pt1)_ (June 17, 2016) - https://www.youtube.com/watch?v=oiR2mvep_nQ
20 | * Eye on Tech. _What is a URL? URL Components and How it Works_ (January 8, 2020) https://www.youtube.com/watch?v=-LPe4tYckkg
21 | * Computer Hope. URL (December 5, 2021) https://www.computerhope.com/jargon/u/url.htm
22 | * CoffeeCup Software. _Absolute Vs. Relative Paths/Links_ (September 6, 2017) https://www.coffeecup.com/help/articles/absolute-vs-relative-pathslinks/
23 | * Mozilla. _"An Overview of HTTP"_ https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview
24 | * WebConcepts. _Web Server Concepts and Examples_ (October 5, 2020). https://www.youtube.com/watch?v=9J1nJOivdyw
25 | * SoftwareEngenius. _Learn in 5 Minutes: Http Headers (General/Request/Response/Entity)_ (July 31, 2020) https://www.youtube.com/watch?v=1v7RoeXyww4
26 | * Fielding, et al. _RFC 2616 Section 10. Status Code Definitions._ https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
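The URL and absolute-vs-relative-path readings can be tried out with Python's standard `urllib.parse` module. In this small sketch, `urlsplit` breaks a URL into its components and `urljoin` resolves a relative link against the page it appears on, much as a browser or crawler does. The page URL and links used here are hypothetical examples.

```python
from urllib.parse import urljoin, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the components covered in this week's readings."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # e.g. https
        "host": parts.hostname,      # e.g. www.unt.edu
        "port": parts.port,          # None when the default port is implied
        "path": parts.path,          # location of the resource on the server
        "query": parts.query,        # text after "?"
        "fragment": parts.fragment,  # text after "#", not sent to the server
    }


# A hypothetical page on which some links appear.
PAGE = "https://www.unt.edu/about/index.html"

for href in ["history.html", "../images/logo.png", "/contact/", "https://example.org/"]:
    # Relative links are resolved against PAGE; absolute links come back unchanged.
    print(f"{href!r:24} -> {urljoin(PAGE, href)}")
```

Notice that the relative link `history.html` inherits the `/about/` directory of the page, while `/contact/` starts from the root of the site.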
27 |
28 | ### Further Readings (and Videos)
29 |
30 | * Jake Wright. _Learn HTML in 12 Minutes_ (Nov 10, 2010). https://www.youtube.com/watch?v=bWPMSSsVdPk
31 | * Computerphile. SGML HTML XML What's the Difference? (Part 1) (Apr 13, 2016). https://www.youtube.com/watch?v=RH0o-QjnwDg
32 | * Computerphile. HTML: Poison or Panacea? (HTML Part 2) (Apr 22, 2016). https://www.youtube.com/watch?v=Q4dYwEyjZcY
33 | * Mozilla Getting started with HTML - https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML/Getting_started
34 | * Tim Berners-Lee. Information Management: A Proposal (March 1989) - https://www.w3.org/History/1989/proposal.html
35 | * Ben Cotton. 6 RFCs for understanding how the internet works (And three for fun) (July 6, 2018) - https://opensource.com/article/18/7/requests-for-comments-to-know
36 | * The Internet Society. Hypertext Transfer Protocol -- HTTP/1.1 (June 1999) - https://datatracker.ietf.org/doc/html/rfc2616
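RFC 2616 groups status codes into five classes by their first digit: Informational (1xx), Successful (2xx), Redirection (3xx), Client Error (4xx), and Server Error (5xx). As a minimal sketch, Python's standard `http.HTTPStatus` enum can supply the reason phrase while the class comes from integer division by 100; the `describe_status` helper is our own illustration, not a standard function.

```python
from http import HTTPStatus

# Status code classes as named in RFC 2616, Section 10.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return e.g. '404 Not Found (Client Error)' for an HTTP status code."""
    category = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is in a valid class but is not a registered status.
        phrase = "Unregistered code"
    return f"{code} {phrase} ({category})"
```

This is the distinction behind archiving tools that offer to save "error pages": those are responses in the 4xx and 5xx classes.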
37 |
38 | ## Module Two - HTML Exercise
39 |
40 | ### HTML Exercise
41 |
42 | Visit either https://texteditor.com/html/editor/ or https://htmlfiddle.net/ and experiment with what happens when you use the What You See Is What You Get (WYSIWYG) text editor. On one side of the page you can type text and add formatting, and on the other side of the screen you will see the HTML markup that is being generated. Make sure you create several paragraphs and add some links and an image to see what happens. Finally, experiment with different formats like bold, italics, lists, and maybe a table. The goal is to see the different tags that are being created in the HTML.
43 |
44 | ### View the Source
45 | The next exercise is to become familiar with ways of viewing the source of the HTML pages you use all the time in your web browser. Most browsers have a way of viewing the source HTML code that is used to render the pages you are looking at. This will include the links, images, stylesheets, JavaScript, and other markup needed to add structure, design, and interactivity to the webpages you are viewing online.
46 |
47 | Viewing the source is actually fairly easy in most browsers. If you right-click with your mouse on the page you are looking at, you will see an option that says something like "View Page Source". When you click this option, your browser will open a new tab showing the HTML that makes up the page you are on.
48 |
49 | Here is what the option is called in a few browsers on my Mac. It might be slightly different in another browser or operating system.
50 |
51 | * Chrome - View Page Source
52 | * Firefox - View Page Source
53 | * Safari - Show Page Source
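"View Page Source" shows the raw HTML the server sent. As an illustrative sketch, the same text can be retrieved with Python's standard library, and `html.parser` can tally which tags the page uses. The `User-Agent` string is a placeholder assumption, and some sites respond differently to scripts than to browsers, so the result may not match your browser exactly.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TagCounter(HTMLParser):
    """Counts every start tag, including self-closing ones like <img/>."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def view_source(url: str) -> str:
    """Download the raw HTML of a page (requires network access)."""
    # The User-Agent value is an arbitrary placeholder for this exercise.
    request = Request(url, headers={"User-Agent": "web-archiving-course-example/0.1"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def count_tags(html: str) -> Counter:
    """Return how many times each tag name appears in the HTML."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

For example, `count_tags(view_source("https://unt.edu")).most_common(10)` lists the ten most frequent tags on the UNT homepage, which you can compare against what you see in your browser's source view.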
54 |
55 | #### Example of what you should see
56 | For a website like https://unt.edu you might see this in your browser.
57 |
58 | 
59 |
60 | If you right-click on this page and view the source code, you will see something that looks like this.
61 |
62 | 
63 |
64 | ### Exploring Web Archives
65 |
66 | This week we will be looking at the web archives of the Library of Congress.
67 |
68 | Pay attention to how the archived web sites are organized.
69 |
70 | What is the difference between this presentation of an archived website and a collection of those sites?
71 |
72 | * Web Archiving - About this Program - https://www.loc.gov/programs/web-archiving/about-this-program/
73 | * Collections with Web Archives - https://www.loc.gov/web-archives/collections/
74 | * Web Archives - https://www.loc.gov/web-archives/collections/
75 |
76 | ## Discussion
77 |
78 | ### Discussion Post:
79 | In at least one paragraph, discuss what you learned about the components that are used together to build the web as we know it. What areas were you most or least familiar with before this week's readings? Are there pieces that you would like to learn more about?
80 |
81 | In at least one paragraph, discuss the collections you discovered at the Library of Congress. Were you surprised by what you found there? Are there things you think are missing based on your exploration of the web archive holdings? What would you like to know more about in relation to the Library of Congress Web Archives? Please include links to the specific sites you reference, including links into the web archives themselves. One of the goals of this course is to become comfortable with linking into web archives and making them an active part of your online experience.
82 |
83 | Note: It is easy to go astray in the Library of Congress Collections. Make sure that what you are looking at are web archives, and not archival collections on the web.
84 |
85 | ### Class Engagement:
86 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least two of your classmates' posts.
87 |
88 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
89 |
--------------------------------------------------------------------------------
/modules/module-03-who-does-web-archiving.md:
--------------------------------------------------------------------------------
1 | # Module Three - Who Does Web Archiving?
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 | This week we will look at who is building web archives. This will include which institutions are involved, what type of institutions they are, where they are located, and most importantly, what they are collecting.
7 |
8 | There are a couple of short videos as well as some readings that discuss who is involved in the web archiving process.
9 |
10 | There is a graded discussion for this module.
11 |
12 | ### Objectives:
13 | 1. Be able to identify institutions that have web archiving initiatives.
14 | 2. Begin to evaluate the scope of a web archive.
15 |
16 | ## Readings
17 |
18 | ### Readings
19 |
20 | * Major D., Gomes D. (2021) Web Archives Preserve Our Digital Collective Memory. In: Gomes D., Demidova E., Winters J., Risse T. (eds) The Past Web. Springer, Cham. https://doi.org/10.1007%2F978-3-030-63291-5_2
21 | * This chapter is a resource that the UNT Libraries subscribes to.
22 | * This link will hopefully get you right to the pdf - https://link-springer-com.libproxy.library.unt.edu/content/pdf/10.1007%2F978-3-030-63291-5_2.pdf
23 | * Pennock, Maureen. “Web Archiving” Digital Preservation Coalition Technology Watch Report 13-01 (March 2013) - http://dx.doi.org/10.7207/twr13-01
24 | * Read 1, 1.1, 1.2, 1.3, 1.4 (pages 3-8)
25 | * Skim the rest of the document.
26 | * National Digital Stewardship Alliance. Web Archiving in the United States: A 2017 Survey. 2017 https://osf.io/ht6ay/
27 | * Review this report by skimming it again.
28 | * Wikipedia List of Web archiving initiatives - https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives
29 | * Skim this page and explore some of the initiatives.
30 | * PBS NewsHour Internet history is fragile. This archive is making sure it doesn't disappear. (January 2, 2017) https://www.youtube.com/watch?v=K8I28erYFLc
31 |
32 | ### Who does Web Archiving?
33 |
34 | Internet Archive - https://archive.org
35 | * Wayback Machine - https://web.archive.org
36 |
37 | International Internet Preservation Consortium (IIPC) - https://netpreserve.org/
38 | * Members List - https://netpreserve.org/about-us/members/
39 | * IIPC Publications in the UNT Digital Library - https://digital.library.unt.edu/explore/partners/IIPC/browse/
40 |
41 | Archive-It - https://archive-it.org/
42 | * List of Organizations - https://archive-it.org/explore?show=Organizations
43 | * List of Collections - https://archive-it.org/explore?show=Collections
44 |
45 | UNT Libraries - https://webarchive.library.unt.edu/
46 | * CyberCemetery - https://cybercemetery.unt.edu/
47 | * UNT Web Archives - https://digital.library.unt.edu/explore/collections/UNTWEB/browse/?sort=date_d
48 | * UNT Libraries Archive-It Collections - https://archive-it.org/organizations/1181
49 | * End of Term Web Archive - https://eotarchive.org/
50 |
51 | Web Archiving Texas Interest Group
52 | * https://www.tdl.org/members/groups/web-archiving-texas-interest-group/
53 |
54 | ## Exploring Web Archives
55 |
56 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
57 |
58 | This week we will look at the institutions and organizations that are using the web archiving service Archive-It.
59 |
60 | Archive-It - https://archive-it.org/
61 |
62 | * List of Organizations - https://archive-it.org/explore?show=Organizations
63 | * List of Collections - https://archive-it.org/explore?show=Collections
64 |
65 | Take a look around these sites and explore two institutions in depth. You will report out on these institutions and their collections in this week's discussion.
66 |
67 | ## Discussion
68 |
69 | ### Discussion Post:
70 |
71 | In at least one paragraph, discuss what you learned about who is involved in the web archiving space. What kinds of institutions did you primarily see? What kinds of collections did you see from these institutions? Are there areas (geographically) that you didn't see much activity from? Why might that be?
72 |
73 | In at least one paragraph per institution, identify which institution or organization you looked at in the Archive-It platform. What kinds of content are they collecting? Did you notice any similarities with other organizations that you looked at? Did you notice any differences in the kinds of things they collect? Please include links to the specific sites you reference including links into the web archives themselves. One of the goals of this course is to become comfortable with linking into web archives and making them an active part of your online experience.
74 |
75 | ### Class Engagement:
76 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
77 |
78 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
79 |
--------------------------------------------------------------------------------
/modules/module-04-technology-overview.md:
--------------------------------------------------------------------------------
1 | # Module Four - Technology Overview
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 | The web is a growing and changing environment. Because of this constant change, the tools and processes used to harvest, capture, and archive the web also have to change in order to keep up. This module will introduce you to the major components involved in the web archiving process including capture or harvest, replay or playback, and finally discovery and access.
7 |
8 | Future modules will discuss these concepts in greater detail.
9 |
10 | There are several readings and a longer video (recorded in 2019) that present the major components of technology used in web archiving.
11 |
12 | There is a graded discussion for this module.
13 |
14 | ### Objectives:
15 | 1. Be able to identify the high-level technology components related to web archiving.
16 | 2. Begin to learn about the different components and their primary uses.
17 | 3. Begin to understand some of the limitations present in harvesting resources from the web.
18 |
19 | ## Readings
20 |
21 | * Niu, J. (2012). An Overview of Web Archiving. _D-Lib Magazine_. 18(3/4) https://doi.org/10.1045/march2012-niu1
22 | * This article provides a good overview of the components in the web archiving space.
23 | * Texas Digital Library (2019). Intro to Web Archiving Texas #1 Web Archiving Technology, Tools & Resources - https://www.youtube.com/watch?v=vkSKPQccuMg
24 | * Mark Phillips, Associate Dean for Digital Libraries, University of North Texas
25 | * Courtney Mumma, Deputy Director, Texas Digital Library
26 | * Lauren Ko, Supervisor, Software Development Unit, University of North Texas
27 | * International Internet Preservation Consortium (2022). _Awesome Web Archiving_ - https://github.com/iipc/awesome-web-archiving
28 | * Skim this list of tools and technologies for web archiving.
29 | * Follow a few different links and explore. (You will need to pick one for the discussion this week)
30 | * Hockx-Yu, H. (2009) Web Archiving Tools: An Overview - https://www.dpconline.org/docs/miscellaneous/events/394-0907hockxyumissing-links/file
31 | * This presentation does a great job of presenting a wide range of concerns related to the UK Web Archive.
32 |
33 | ## Archiving Exercise
34 |
35 | ### Web Archiving Exercise - Browser Based "Save As"
36 | In a previous module we looked at the HTML that makes up the web pages that we view in our browsers. We learned how to "view source" on a page and see the underlying code.
37 |
38 | In this exercise we will look at how a browser can be used to save web content to your local machine.
39 |
40 | Most browsers have the ability to save a web page to your local machine.
41 |
42 | This is found in one of two places. First, you can look in the File menu at the top of your browser. The exact wording will be different depending on which browser and operating system you use, but on a Mac with Chrome I see "Save Page As".
43 |
44 | Another option is to right-click on the page you are interested in saving, and you will see something like "Save as" (again using Chrome on a Mac).
45 |
46 | You will generally have two different options when you save an HTML file from your browser (though sometimes there are more). They will be some variation of "HTML Only" and "Complete".
47 |
48 | Example save dialog from Chrome on a Mac.
49 |
50 | 
51 |
52 |
53 | ### Exercise
54 |
55 | Pick an HTML page, ideally the homepage of an agency or organization.
56 |
57 | 0. (Just a hint) I like to create a new empty folder on my Desktop named something like "captures" so that when I'm trying to find these files later, they don't get mixed up with everything else in my Downloads folder. As you save files you will need to navigate to this folder on your Desktop, but in the long run it will be easier to deal with.
58 | 1. Using Save As, first save the HTML file as "HTML only". Pay attention to where on your hard drive you save the file. Next, navigate to that location and try to open the file in your browser. Does the page display the same as the "live" version? What does the URL bar in your browser display for the "URL"? What happens when you click on links, and what displays in the URL bar after you click a link?
59 | 2. Going back to the homepage you chose (and not your saved copy), this time save the HTML file as "complete" or "all files". Pay attention to where on your hard drive you save the file. Again, navigate to the location you saved it to, and this time notice the different files that are present. In addition to the HTML file, what other files were downloaded? Next, open the file in your browser. Does the page display the same as the "live" version? What does the URL bar in your browser display for the "URL"? What happens when you click on links, and what displays in the URL bar after you click a link? What is the file size for all of the files that were downloaded?
60 | 3. View the source on these saved HTML versions and compare them to the "view source" on the live website. Do you notice any differences with the saved pages and the version that is on the web?
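Roughly speaking, the "Complete" option finds the other files a page depends on (images, scripts, stylesheets) and saves them alongside the HTML. The sketch below approximates only that first step: it scans HTML for `img` and `script` `src` attributes and for stylesheet `link` elements, and resolves each to an absolute URL. It is a simplification under those assumptions; browsers may also save resources this code does not look for, such as images referenced from inside stylesheets.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequisiteFinder(HTMLParser):
    """Collects URLs of images, scripts, and stylesheets referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.requisites = []  # first-seen order, without duplicates

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        if tag in ("img", "script"):
            ref = attrs.get("src")
        elif tag == "link":
            # rel can hold several space-separated values, e.g. "preload stylesheet".
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                ref = attrs.get("href")
        if ref:
            # Relative references are resolved against the page's own URL.
            absolute = urljoin(self.base_url, ref)
            if absolute not in self.requisites:
                self.requisites.append(absolute)


def find_requisites(html: str, base_url: str) -> list:
    finder = RequisiteFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.requisites
```

Comparing this list with the folder your browser created is one way to check what "Complete" actually saved. Ordinary `<a href>` links are ignored, which is why clicking a link in the saved copy takes you back to the live web.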
61 |
62 | ## Exploring Web Archives
63 |
64 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
65 |
66 | This week we will look at the Ivy Plus Library Web Collecting Program - https://ivpluslibraries.org/programs/ivy-plus-libraries-confederation-web-collecting-program/
67 |
68 | This program uses the Archive-It service for its web archiving activities.
69 |
70 | A gateway into their collections can be found here - https://archive-it.org/home/IvyPlus
71 |
72 | Explore the collections that are included in this program. In the discussion for this week you will describe what you find in these collections, why they are being collected, and the scope of what is being collected.
73 |
74 | ## Discussion
75 |
76 | ### Discussion Post:
77 | In at least one paragraph, discuss what you learned about the technologies involved in the web archiving space. Based on your readings, what else should be considered in addition to the big buckets of Capture, Preserve, and Playback?
78 |
79 | What tools did you explore in the Awesome List? Link to the tool and give a brief description of the problem it is trying to solve.
80 |
81 | In at least one paragraph, discuss the web archive you identified this week in the Ivy Plus Library Web Collecting Program. Include a link to the web archive and discuss some of the types of content you found inside. Was there anything in their collections you hadn't expected? Were there things you expected to find that were not there? Please include links to the specific sites you reference, including links into the web archives themselves. One of the goals of this course is to become comfortable with linking into web archives and making them an active part of your online experience.
82 |
83 | Finally, in at least one paragraph, what did you discover in the "Save As" exercise this week? What website did you capture? What happened when you opened the saved file in your browser? How was the HTML Only different from the other option to save things completely or save all files? What kinds of files did you see when you saved things? What differences did you notice between the saved versions of the web page and the live version?
84 |
85 | ### Class Engagement:
86 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
87 |
88 | If there are any unanswered questions, feel free to offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
89 |
--------------------------------------------------------------------------------
/modules/module-05-capture.md:
--------------------------------------------------------------------------------
1 | # Module Five - Capture
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 | The terms capturing, harvesting, and crawling are usually used interchangeably to describe the acquisition process in web archiving. This module will take a deeper look at the process of acquiring content for a web archive, introduce you to some new terms, and give you a chance to create what might be your first web capture.
7 |
8 | This will build on concepts that you were introduced to in [Module Four](./module-04-technology-overview.md).
9 |
10 | There are several readings, some online documentation to skim, and several slide presentations that you will review.
11 |
12 | ### Objectives:
13 | 1. Become familiar with common capture related terms such as seed, path, domain, subdomain.
14 | 2. Understand where acquisition of content fits into the lifecycle of a web archive.
15 | 3. Create your first web capture in the Wayback Machine at the Internet Archive.
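Seeds and scope can be made concrete with a small model. The sketch below is hypothetical: the rule names `host`, `domain`, and `path` are our own labels, not Archive-It or Heritrix settings, and the domain rule naively keeps the last two labels of a hostname (real crawlers consult the Public Suffix List so that a name like `bbc.co.uk` is handled correctly).

```python
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    """Naive assumption: the domain is the last two labels (unt.edu)."""
    return ".".join(host.split(".")[-2:])


def in_scope(seed: str, candidate: str, rule: str = "host") -> bool:
    """Decide whether candidate falls within a seed's scope (hypothetical rules)."""
    s, c = urlsplit(seed), urlsplit(candidate)
    s_host, c_host = s.hostname or "", c.hostname or ""
    if rule == "host":
        # Only the exact host of the seed, e.g. www.unt.edu.
        return c_host == s_host
    if rule == "domain":
        # The domain plus any subdomain, e.g. library.unt.edu.
        dom = registered_domain(s_host)
        return c_host == dom or c_host.endswith("." + dom)
    if rule == "path":
        # Same host, and under the seed's directory.
        prefix = s.path if s.path.endswith("/") else s.path.rsplit("/", 1)[0] + "/"
        return c_host == s_host and c.path.startswith(prefix)
    raise ValueError(f"unknown rule: {rule}")
```

With the seed `https://www.unt.edu/library/`, a page on `library.unt.edu` is out of scope under `host` but in scope under `domain`, which is the kind of distinction the glossary terms seed, domain, and sub-domain describe.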
16 |
17 | ## Readings
18 |
19 | ### Web Archiving
20 | * Archive-It Help Center, Glossary of Archive-It and Web Archiving Terms - https://support.archive-it.org/hc/en-us/articles/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms
21 | * Review these terms and pay specific attention to Crawl, Crawler, Document, Domain, Host, Scope, Seed, and Sub-domain.
22 | * About /robots.txt - https://www.robotstxt.org/robotstxt.html
23 | * Become familiar with the concepts discussed on this page.
24 | * This is how website owners can give crawlers hints about what to crawl, though it is most often used to say what not to crawl.
25 | * International Internet Preservation Consortium (2020). Session 3A: Main Concepts and Technologies: Capture
26 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3a-slides/
27 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3a-notes/
28 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen.
29 | * Mohr, G., Stack, M., Ranitovic, I., Avery, D., & Kimpton, M. (2004). An Introduction to Heritrix: An open source archival quality web crawler. International Web Archiving Workshop. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.6877&rep=rep1&type=pdf
30 | * This article is a tiny bit dated but is the best reference about the beginnings of Heritrix that I could find.
31 | * Davis, R. (2011). Saving the Smithsonian's Web. https://siarchives.si.edu/blog/saving-smithsonians-web
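To see how a well-behaved crawler interprets robots.txt, Python's standard `urllib.robotparser` module can evaluate the rules. This sketch parses a made-up robots.txt from a string rather than fetching one; the user-agent names `archive-bot` and `somebot` are invented for the example. For a live site you would instead call `set_url()` with the site's `/robots.txt` URL and then `read()`. Keep in mind that robots.txt is advisory: it only has an effect if the crawler chooses to check it.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: everyone is kept out of /private/,
# and a crawler calling itself "archive-bot" is kept out of the whole site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: archive-bot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(SAMPLE_ROBOTS_TXT)
for agent, url in [
    ("somebot", "https://example.org/index.html"),
    ("somebot", "https://example.org/private/notes.html"),
    ("archive-bot", "https://example.org/index.html"),
]:
    print(agent, url, "allowed" if rules.can_fetch(agent, url) else "disallowed")
```

The most specific matching `User-agent` group wins, so `archive-bot` is blocked everywhere even though the `*` group only blocks `/private/`.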
32 |
33 | ### Heritrix
34 | Review the following links to learn more about the Heritrix Crawler.
35 | * https://github.com/internetarchive/heritrix3
36 | * https://github.com/internetarchive/heritrix3/wiki
37 | * https://netpreserveblog.wordpress.com/2019/02/19/a-new-release-of-heritrix-3/
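At its core, a crawler like Heritrix starts from seeds, keeps a queue (often called the "frontier") of URLs to visit, skips URLs it has already seen or that fall outside its scope, and repeats. The toy sketch below shows only that loop, breadth-first, over a hypothetical in-memory link graph instead of the live web. It is not how Heritrix is implemented; Heritrix adds politeness delays, robots.txt handling, WARC writing, and far more configurable scope rules.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical site: each URL maps to the links found on that page.
LINK_GRAPH = {
    "https://example.org/": [
        "https://example.org/about",
        "https://example.org/news",
        "https://other.example.com/",  # different host, so out of scope
    ],
    "https://example.org/about": ["https://example.org/"],
    "https://example.org/news": ["https://example.org/news/1", "https://example.org/about"],
    "https://example.org/news/1": [],
}


def crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl limited to the seeds' hosts; returns visit order."""
    allowed_hosts = {urlsplit(s).hostname for s in seeds}
    frontier = deque(seeds)  # URLs waiting to be fetched
    seen = set(seeds)        # URLs already queued, to avoid duplicates
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # a real crawler would fetch and archive it here
        for link in get_links(url):
            if link not in seen and urlsplit(link).hostname in allowed_hosts:
                seen.add(link)
                frontier.append(link)
    return visited


pages = crawl(["https://example.org/"], lambda u: LINK_GRAPH.get(u, []))
```

The `max_pages` limit stands in for the budgets and time limits that real crawls are given.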
38 |
39 | Bonus Video Overview of Heritrix
40 |
41 | * Fisher, D. (2018). Heritrix Web Crawler. - https://www.youtube.com/watch?v=RmHG0MaFJSI
42 | * This is a video from a peer of yours at Simmons in the School of Library & Information Science. I think they do a great job in the presentation overall.
43 |
44 | ### Additional Optional Readings
45 | * Brunelle, J., Ferrante, K., Wilczek, E., Weigle, M. & Nelson, M. (2016). Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives. D-Lib Magazine. 22(1/2) https://doi.org/10.1045/january2016-brunelle
46 |
47 | ## Archiving Exercise
48 |
49 | ### Web Archiving Exercise - Saving a Webpage at the Internet Archive
50 | This week we are going to save our first webpage in a full web archiving infrastructure.
51 |
52 | Again we turn to the Internet Archive and their Wayback Machine.
53 |
54 | If you navigate over to https://web.archive.org/, you will see the "Save Page Now" option in the bottom right side of the page.
55 |
56 | 
57 |
58 |
59 | This will link you to the "Save Page Now" interface - https://web.archive.org/save
60 |
61 | This week we are going to select a webpage and save it to the Internet Archive's collection.
62 |
63 | When picking your page to archive, keep in mind that you will be describing and linking to it in this week's discussion post.
64 |
65 | One thing to note: if you register for an Internet Archive account and sign in, you will have more options than you will without an account. For this exercise you do not need an account.
66 |
67 | Select a webpage you want to archive and enter its URL in the box. Leave the "Save error pages (HTTP Status=4xx, 5xx)" option checked, and click "Save Page".
68 |
69 | After you click the "Save Page" button leave the page open so that you can see what is happening.
70 |
71 | In the discussion this week you will write a paragraph about what you see happening. How many files did you end up collecting that were included in that page? Share any observations you have about the process, and ask any questions you might have about it. What are the potential uses of this kind of "micro" web archiving service?
72 |
73 | You will need to link to the specific capture you initiated in the Wayback Machine.
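Wayback Machine links follow a predictable pattern: `https://web.archive.org/web/`, then a 14-digit timestamp (year, month, day, hour, minute, second, in UTC), then the original URL. Save Page Now can also be reached by appending a URL to `https://web.archive.org/save/`. A minimal sketch for building and reading these links is below; the helper names are our own, and the parser assumes the replay URL contains a full 14-digit timestamp.

```python
from datetime import datetime, timezone

WAYBACK = "https://web.archive.org"


def save_page_now_url(url: str) -> str:
    """Link that asks Save Page Now to capture url."""
    return f"{WAYBACK}/save/{url}"


def replay_url(url: str, when: datetime) -> str:
    """Link to the capture of url at (or nearest to) the given UTC time."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK}/web/{timestamp}/{url}"


def timestamp_from_replay_url(replay: str) -> datetime:
    """Read the capture time back out of a Wayback replay link."""
    segment = replay.split("/web/", 1)[1].split("/", 1)[0]
    # Keep only the 14 digits; replay links can carry suffixes such as "id_".
    return datetime.strptime(segment[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
```

Once your capture finishes, the link shown in your browser's address bar has this `/web/<timestamp>/<url>` form, and that is the link to share in the discussion.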
74 |
75 | ## Exploring Web Archives
76 |
77 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
78 |
79 | This week we will look at the web archives at the UNT Libraries.
80 |
81 | ### Web Archives
82 |
83 | UNT Web Archives (UNT Digital Library Interface) - https://digital.library.unt.edu/explore/collections/UNTWEB/browse/?sort=date_d
84 |
85 | UNT Libraries' Web Archives (Wayback Interface) - https://webarchive.library.unt.edu/
86 |
87 | CyberCemetery - https://cybercemetery.unt.edu
88 |
89 | * Cathy Hartman and CyberCemetery - https://www.digitalpreservation.gov/series/pioneers/hartman.html
90 | * Hartman, C. N., Hastings, S. K., & Alemneh, D. G. (2004). The Cybercemetery: Prolonging Usable Afterlife. IS&T--the Society for Imaging Science and Technology. https://digital.library.unt.edu/ark:/67531/metadc29310/
91 |
92 | UNT Libraries' Archive-It Collections - https://archive-it.org/organizations/1181
93 | * Primarily focused on Special Collections
94 |
95 | ### Grant Project related to Web Archives
96 |
97 | **National Digital Information and Infrastructure Preservation Program: Web-at-Risk (2005-2008)**
98 |
99 | * Web-at-Risk: Preserving Our Nation's Cultural Heritage - UNT Digital Library
100 | * Seneca, T. (2009). The Web-at-Risk at Three: Overview of an NDIIPP Web Archiving Project. Library Trends, 57(3), 427-441. http://hdl.handle.net/2142/13606
101 |
102 | **Expanding Collection Development Practices to Web Archives (EOTCD) (2009-2013)**
103 |
104 | * Hartman, C. N., Murray, K. R., & Phillips, M. E. (2013). Classification Of The End-Of-Term Archive: Extending Collection Development Practices To Web Archives. https://digital.library.unt.edu/ark:/67531/metadc152437/
105 | * Murray, K. R., & Hartman, C. N. (2012). Classifying the End-of-Term Archive. IS & T--the Society for Imaging Science and Technology Archiving Conference, 2012, Copenhagen, Denmark. https://digital.library.unt.edu/ark:/67531/metadc93305/
106 | * Phillips, M. E., & Murray, K. R. (2013). Improving Access to Web Archives through Innovative Analysis of PDF Content. IS & T--the Society for Imaging Science and Technology Archiving Conference, 2013, Washington, D.C., United States. https://digital.library.unt.edu/ark:/67531/metadc155622/
107 |
108 | **Programmatic Extraction of 'Documents' from Web Archives (2017-2020)**
109 |
110 | * Phillips, M. E. & Caragea, C. (2017). Programmatic Extraction of 'Documents' from Web Archives. https://www.imls.gov/grants/awarded/lg-71-17-0202-17
111 | * Fox, N. T., Phillips, M. E., & Tarver, H. (2020). Programmatic Extraction of ‘Documents’ from Web Archives: Identifying Document Characteristics from Content Selector Interviews. https://digital.library.unt.edu/ark:/67531/metadc1757659/
112 | * Patel, K., Caragea, C., Phillips, M. E., & Fox, N. (2020). Identifying Documents In-Scope of a Collection from Web Archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 167-176. https://doi.org/10.1145/3383583.3398540
113 | * arXiv version - https://arxiv.org/abs/2009.00611
114 |
115 | ## Discussion
116 |
117 | ### Discussion Post:
118 | In at least one paragraph, discuss what you learned about the capture technologies involved in the web archiving space. What were some of the terms that were new to you this week? What are some things that still need clarity for you?
119 |
120 | In at least one paragraph, describe what happened when you used the Wayback Machine's "Save Page Now" mechanism. What webpage did you choose to save? Link to your capture of the website in the Wayback Machine. How many files did you end up collecting that were included in that page? Share any observations you have in the process, and ask any questions you might have about the process. What are the potential uses of this kind of "micro" web archiving service?
121 |
122 | Finally, in at least one paragraph, discuss the web archive or archived website that you reviewed this week from the UNT Libraries. Were you surprised by anything you found in the websites? Are there things that you would have expected that you didn't see? Discuss any of the related projects or grants that you explored as well.
123 |
124 | ### Class Engagement:
125 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
126 |
127 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
128 |
--------------------------------------------------------------------------------
/modules/module-06-preserve.md:
--------------------------------------------------------------------------------
1 | # Module Six - Preserve
2 |
3 | ## Overview and Objective
4 |
5 | ### Overview:
6 | Once you have decided what you are interested in collecting, and how to crawl this content, you need to think about how you will store and preserve the crawled content. This module will take a deeper look at the process of preservation of content for a web archive, introduce you to some new terms and file formats, and give you a chance to create another web capture with a different tool.
7 |
8 | This will build on concepts that you were introduced to in Module Four.
9 |
10 | There are several readings, some online documentation to skim, a video to watch, and several slide decks that you will review.
11 |
12 | ### Objectives:
13 | 1. Become familiar with different approaches to preserving web content.
14 | 2. Become familiar with the Web ARChive (WARC) file format.
15 | 3. Create your second web capture using the archive.today service (https://archive.today).
16 |
17 | ## Readings
18 |
19 | ### Readings
20 |
21 | * International Internet Preservation Consortium (2020). Session 3b: Main Concepts and Technologies: Preserve
22 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3b-slides/
23 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3b-notes/
24 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen.
25 | * Kunze, J. (2005). WARC: an Archiving Format for the Web. International Web Archiving Workshop - https://web.archive.org/web/20120619151338/http://www.iwaw.net/05/kunze.pdf
26 | * Pennock, Maureen. “Web Archiving” Digital Preservation Coalition Technology Watch Report 13-01 (March 2013) - http://dx.doi.org/10.7207/twr13-01
27 | * Section 3. Standards (pgs. 17-18)
28 | * International Internet Preservation Consortium (2020). IIPC Training Video Case Study, Topic 7: Web Archiving Tools and Services - https://www.youtube.com/watch?v=MaynKx0_Oow
29 |
30 | ### Web ARChive (WARC) File Format
31 | Skim this documentation and focus on the types of WARC records that can exist. (This is part of this week's discussion)
32 |
33 | * International Internet Preservation Consortium. (2022). WARC Specifications. https://iipc.github.io/warc-specifications/
34 | * https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/
35 | * https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#warc-record-types
36 | * https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#annex-b-informative-examples-of-warc-records
37 |
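To make the record structure concrete, here is a minimal Python sketch (standard library only) that serializes and re-reads a single WARC/1.1 record. It assumes the layout described in the WARC 1.1 specification: a `WARC/1.1` version line, named header fields, a blank line, the content block, and two trailing CRLFs. The function names `make_warc_record` and `parse_warc_headers` are hypothetical illustrations; in practice WARC files are written by crawling and capture tools, not by hand.

```python
import uuid
from datetime import datetime, timezone

# The eight record types defined in WARC 1.1.
WARC_RECORD_TYPES = (
    "warcinfo", "response", "resource", "request",
    "metadata", "revisit", "conversion", "continuation",
)


def make_warc_record(record_type: str, target_uri: str, block: bytes,
                     content_type: str) -> bytes:
    """Serialize one record: version line, header fields, blank line, block, CRLF CRLF."""
    if record_type not in WARC_RECORD_TYPES:
        raise ValueError(f"unknown WARC record type: {record_type}")
    fields = {
        "WARC-Type": record_type,
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI": target_uri,
        "Content-Type": content_type,
        "Content-Length": str(len(block)),  # length of the content block in bytes
    }
    header = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields.items())
    return header.encode("utf-8") + b"\r\n" + block + b"\r\n\r\n"


def parse_warc_headers(record: bytes) -> dict:
    """Read the version line and the named fields that precede the content block."""
    header, _, _ = record.partition(b"\r\n\r\n")  # the first blank line ends the header
    version, *lines = header.decode("utf-8").split("\r\n")
    parsed = {"version": version}
    for line in lines:
        name, _, value = line.partition(":")  # split at the first colon only
        parsed[name.strip()] = value.strip()
    return parsed


record = make_warc_record("resource", "http://example.com/note.txt",
                          b"hello, archive", "text/plain")
print(parse_warc_headers(record)["WARC-Type"])  # prints: resource
```

In a `response` record the block would instead be a complete HTTP response (status line, headers, and body) with `Content-Type: application/http; msgtype=response`; the Annex B examples show that layout.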
38 | Review/Skim the following links to learn more about the WARC File Format.
39 | * Library of Congress. (2020). WARC, Web ARChive file format. https://www.loc.gov/preservation/digital/formats/fdd/fdd000236.shtml
40 | * The National Archives (2020). Details for: WARC http://www.nationalarchives.gov.uk/pronom/fmt/289
41 | * Wikipedia - https://en.wikipedia.org/wiki/Web_ARChive
42 | * International Organization for Standardization (2009). ISO 28500:2009 Information and documentation — WARC file format. https://www.iso.org/standard/44717.html
43 | * **Do NOT buy this standard. It is for reference only.**
44 | * Instead, use the public draft before standardization - http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
45 | * Archive Team (2021). The WARC Ecosystem - https://wiki.archiveteam.org/index.php?title=The_WARC_Ecosystem
46 | * ARC Format (precursor of the WARC format) - https://github.com/internetarchive/heritrix3/wiki/ARC%20File%20Format
47 |
48 | ### Additional Optional Video
49 | * Consultative Committee for Space Data Systems (CCSDS), Data Archive Interoperability (DAI) Working Group, Kearney, Michael W. III, Giaretta, D., Garrett, J., Hughes, S. (2020). What's missing from WARC? - https://www.youtube.com/watch?v=vdEaz109uAo
50 |
51 | ## Archiving Exercise
52 |
53 | ### Saving a Webpage with Archive.Today
54 |
55 | This week we are going to save our second webpage using a full web archiving service.
56 |
57 | This time we turn to a service called Archive.Today.
58 |
59 | You can read more about this service on its Wikipedia page (https://en.wikipedia.org/wiki/Archive.today)
60 |
61 | Start by navigating to https://archive.today (it will most likely redirect to https://archive.ph; this is fine).
62 |
63 | 
64 |
65 | This week we are going to select a webpage and save it using the archive.today service.
66 |
67 | When picking your page to archive, keep in mind that you will be describing and linking to it in this week's discussion post.
68 |
69 | Select a webpage you want to archive, add it to the box that says "My url is alive and I want to archive its content" and hit enter.
70 |
71 | You may be asked to prove you are not a robot. Once you have done that, you will see whether the webpage has been archived before or whether you are the first to capture it. If it has been archived before, go ahead and say that you would like to archive it again.
72 |
73 | After you click the "Save" button, leave the page open so that you can see what is happening.
74 |
75 | In the discussion this week you will write a paragraph about what you see happening. How many files did you end up collecting that were included in that page? Share any observations you have in the process, and ask any questions you might have about the process. Compare this with the Save Page Now tool from the Internet Archive's Wayback Machine. You will need to link to the specific capture you initiated using archive.today in this week's discussion post.
76 |
77 | ## Exploring Web Archives
78 |
80 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
81 |
82 | This week we will look at the Collaborative Collections of the International Internet Preservation Consortium.
83 |
84 | ### Collaborative Collections
85 | * International Internet Preservation Consortium (IIPC). Collaborative Collections - https://netpreserve.org/projects/collaborative-collections/
86 | * IIPC Collaborative Collections at Archive-It. - https://archive-it.org/home/IIPC
87 | * Using IIPC Collaborative Collections WARC data - https://netpreserve.org/iipc-cdg-warc-data/
88 | * Thurman, A., & Grotke, A. (2016). Content Development Group & Collaborative Collections Update. - https://digital.library.unt.edu/ark:/67531/metadc1477165/
89 |
90 | ## Discussion
91 |
92 | ### Discussion Post:
93 | In at least one paragraph, discuss what you learned this week about preservation of web archives and specifically about the WARC format. What were some of the terms or concepts that were new to you this week? What are some things that still need clarity for you?
94 |
95 | The WARC Specification on GitHub contains examples of different types of WARC records that are normally seen in a WARC file. Annex B lists these examples - https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#annex-b-informative-examples-of-warc-records . In at least one paragraph, pick one of these record types and, in your own words, describe why this kind of record makes sense in the WARC format. How is it used? What is it used for? What kind of information does it contain? Why is this information important?
96 |
97 | In at least one paragraph, describe what happened when you used the Archive.Today service. What webpage did you choose to save? Link to your capture of the website at the Archive.Today service. How many files did you end up collecting that were included in that page? Share any observations you have in the process, and ask any questions you might have about the process.
98 |
99 | Finally, in at least one paragraph, discuss the collaborative collection you chose to look at in the IIPC collection. Were you surprised by anything you found in the websites? Are there things that you would have expected that you didn't see? What are some other examples of archives that an international group like the IIPC might want to look at for a collaborative collection in the future? Remember to include links to the collection that you choose as well as examples of archived websites in your post.
100 |
101 | ### Class Engagement:
102 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
103 |
104 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
105 |
--------------------------------------------------------------------------------
/modules/module-07-playback.md:
--------------------------------------------------------------------------------
1 | # Module Seven - Playback
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 |
7 | Now that we have learned about standard ways of harvesting and storing web content in a web archive, the next step is providing access to that content. In the web archiving community this is most often referred to as **playback** or **replay**. This module will take a deeper look at some of the fundamentals of how replay works in most web archives. Additionally, it will introduce you to some of the standard playback tools such as OpenWayback and pywb.
8 |
9 | This will build on concepts that you were introduced to in Module Six.
10 |
11 | There are several readings, some online documentation to skim, and several slide decks that you will review.
12 |
13 | ### Objectives:
14 |
15 | 1. Learn the fundamentals of how playback of archived web content works.
16 | 2. Become familiar with some of the common software tools for web archive replay.
17 | 3. Experiment with different replay environments and combinations of browsers and operating systems with https://oldweb.today/
18 |
19 | ## Readings
20 |
21 | * International Internet Preservation Consortium (2020). Session 3C: Main Concepts and Technologies: Playback
22 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3c-slides/
23 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3c-notes/
24 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen.
25 | * Internet Archive (2021). How to use the Wayback Machine. https://www.youtube.com/watch?v=ts1tu1BiSuY
26 | * Sigurðsson, K. (2020). The Future of Playback. https://netpreserveblog.wordpress.com/2020/06/16/the-future-of-playback/
27 | * International Internet Preservation Consortium (2020). IIPC Training Video Case Study, Topic 6: Accessing and Using Web Archives.
28 | https://www.youtube.com/watch?v=Dng8d9ytOUc
29 | * A Short on How the Wayback Machine Stores more Pages than Stars in the Milky Way. http://highscalability.com/blog/2014/5/19/a-short-on-how-the-wayback-machine-stores-more-pages-than-st.html
30 | * Technical discussion of how the Wayback Machine Works.
31 |
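Most Wayback-style replay systems address a capture with a URL of the form `<prefix>/<timestamp>/<original URL>`, where the timestamp has up to 14 digits (`YYYYMMDDhhmmss`); for example, `https://web.archive.org/web/20120619151338/http://www.iwaw.net/05/kunze.pdf` requests the copy of the Kunze paper captured on 19 June 2012 at 15:13:38. The Python sketch below builds and takes apart such URLs. The function names are hypothetical, the optional two-letter modifier (such as `id_`) is an assumption about common Wayback/pywb behavior rather than something covered in the readings, and deployments that put a collection name instead of `web` in the path would need a different pattern.

```python
import re
from datetime import datetime

# <host>/web/<1-14 digit timestamp><optional modifier such as "id_">/<original URL>
REPLAY_RE = re.compile(
    r"^(?P<prefix>https?://[^/]+/web)/(?P<ts>\d{1,14})(?P<mod>[a-z]{2}_)?/(?P<original>.+)$"
)


def build_replay_url(original: str, when: datetime,
                     prefix: str = "https://web.archive.org/web") -> str:
    """Request the capture closest to `when`; Wayback-style archives typically redirect to the nearest one."""
    return f"{prefix}/{when.strftime('%Y%m%d%H%M%S')}/{original}"


def parse_replay_url(url: str) -> dict:
    """Split a replay URL into its timestamp and the original (live web) URL."""
    match = REPLAY_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback-style replay URL: {url}")
    ts = match.group("ts")
    return {
        "timestamp": ts,
        "original": match.group("original"),
        # Only a full 14-digit timestamp names an exact second.
        "captured": datetime.strptime(ts, "%Y%m%d%H%M%S") if len(ts) == 14 else None,
    }
```

A shortened timestamp such as `/web/1996/http://nasa.gov` is still a valid request; it simply leaves it to the archive to pick a nearby capture, which is why the sketch does not convert it to an exact date.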
32 | ### Software Links
33 | * Wayback Machine on Wikipedia - https://en.wikipedia.org/wiki/Wayback_Machine
34 | * Webrecorder pywb 2.6 - https://github.com/webrecorder/pywb
35 | * OpenWayback - https://github.com/iipc/openwayback/
36 | * OpenWayback to pywb Transition Guide and pywb update - https://netpreserveblog.wordpress.com/2020/12/16/openwayback-to-pywb-transition-guide/
37 | * OpenWayback Transition Guide - https://pywb.readthedocs.io/en/latest/manual/owb-transition.html
38 | * ReplayWeb.page - https://replayweb.page/
39 |
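Behind a replay URL, tools such as pywb look up captures in an index. pywb, for example, can read CDXJ index lines of the form `<SURT key> <timestamp> <JSON>`, where the SURT key writes the host in reverse order (`gov,nasa)/` for `nasa.gov`). The sketch below shows the basic lookup idea: canonicalize the requested URL, keep the index lines with the same key, and choose the capture closest in time. The `simple_surt` function is a deliberately simplified assumption (real canonicalization has many more rules), the index lines are invented samples, and all helper names are hypothetical.

```python
import json
from datetime import datetime
from urllib.parse import urlsplit

TS_FORMAT = "%Y%m%d%H%M%S"


def simple_surt(url: str) -> str:
    """Simplified SURT-style key: reversed host labels, ')' then path (and query)."""
    parts = urlsplit(url.lower())
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    return key + ("?" + parts.query if parts.query else "")


def nearest_capture(index_lines, url: str, wanted: str):
    """Return (timestamp, fields) of the capture closest in time to `wanted`, or None."""
    key = simple_surt(url)
    target = datetime.strptime(wanted, TS_FORMAT)
    best = None
    for line in index_lines:
        line_key, ts, blob = line.split(" ", 2)
        if line_key != key:
            continue
        # Compare real time differences, not raw 14-digit numbers.
        distance = abs((datetime.strptime(ts, TS_FORMAT) - target).total_seconds())
        if best is None or distance < best[0]:
            best = (distance, ts, json.loads(blob))
    return None if best is None else best[1:]


INDEX = [  # invented sample lines for illustration only
    'gov,nasa)/ 19961019064827 {"url": "http://www.nasa.gov/", "mime": "text/html", "status": "200"}',
    'gov,nasa)/ 19990125082823 {"url": "http://www.nasa.gov/", "mime": "text/html", "status": "200"}',
    'gov,nasa)/about 19980101000000 {"url": "http://www.nasa.gov/about", "mime": "text/html", "status": "200"}',
]
```

The linear scan keeps the idea visible; production indexes are sorted by key so that tools can binary-search them instead of reading every line.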
40 | ## Archiving Exercise
41 |
42 | ### Web Archiving Exercise - Viewing a website on Old Web Today
43 |
44 | This week we will look at viewing websites on older browsers and operating systems.
45 |
46 | The tool that we will be exploring is a web service called Old Web Today https://oldweb.today .
47 |
48 | This tool is developed and maintained by the team that created Webrecorder (https://webrecorder.net/ )
49 |
50 | The goal of this tool is to make it easier for users to experience websites with the browsers and technologies of the period when those websites were captured.
51 |
52 | 
53 |
54 | Start by navigating to the website https://oldweb.today/
55 |
56 | **Note**: _I have had mixed results with this service. I think it is very interesting and one of the only ways for us to experience browsers from over twenty years ago. That being said, it is pretty finicky and can be a bit frustrating. Try a few different things as experiments and spend at least 15 minutes trying different combinations._
57 |
58 | I suggest starting with the NCSA Mosaic 2 browser. You can then add a URL that is likely to have existed early in the web. I chose http://nasa.gov and was curious about what things looked like back in 1996.
59 |
60 | You can select other websites and time periods or even look at a modern website from today on a browser from the past.
61 |
62 | Here is an example that will give you an idea of how things work. For the activity, please select a different URL for your experiments.
63 |
64 | * About the Mosaic web browser - https://en.wikipedia.org/wiki/Mosaic_(web_browser)
65 | * NASA website from 1996 on NCSA Mosaic 2 - https://oldweb.today/?browser=nm2-mac#19960101/http://nasa.gov
66 | * NASA website from 1998 on NCSA Mosaic 2 - https://oldweb.today/?browser=nm2-mac#19980101/http://nasa.gov
67 | * NASA website from the Live web on NCSA Mosaic 2 - https://oldweb.today/?browser=nm2-mac#http://nasa.gov
68 |
69 | 
70 |
71 |
72 | Experiment with different browsers and URLs, and switch between the live web and the archived web.
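The example links above share one pattern: the browser is chosen with a `browser` query parameter, and the URL fragment holds either `<timestamp>/<url>` for the archived web or just `<url>` for the live web. The Python sketch below assumes that pattern holds in general (it is inferred from the three NASA links, not from oldweb.today documentation) and uses it to plan a small grid of experiments. Only the `nm2-mac` browser ID comes from the examples; any other ID you add is an assumption, so copy it from the address bar after picking a browser in the menu.

```python
from itertools import product
from typing import Optional
from urllib.parse import urlencode


def oldweb_url(browser: str, url: str, timestamp: Optional[str] = None) -> str:
    """Build an oldweb.today link; omit `timestamp` to load the live web."""
    fragment = f"{timestamp}/{url}" if timestamp else url
    return f"https://oldweb.today/?{urlencode({'browser': browser})}#{fragment}"


# One browser, one site, several dates plus the live web (None).
browsers = ["nm2-mac"]  # add other IDs copied from the site's menu (assumed format)
dates = ["19960101", "19980101", "20020101", None]
for browser, stamp in product(browsers, dates):
    print(oldweb_url(browser, "http://nasa.gov", stamp))
```

Swap `http://nasa.gov` for the URL you choose for the activity; each printed link is one combination to try.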
73 |
74 | You have to be patient with all of these emulated systems. What it is doing behind the scenes (emulating an operating system in JavaScript) is pretty cool, but it takes patience. Here is more information about the technology - https://github.com/oldweb-today/oldweb-today
75 |
76 | For the discussion this week you will describe your experience with this system: what you tried to access, how well it did or didn't work, and whether you were surprised by anything. What is the earliest you can remember accessing websites on the internet? What tools do you remember? What websites do you remember using? Finally, what are different uses for a web service like this in the web archiving space?
77 |
78 | ## Exploring Web Archives
79 |
80 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
81 |
82 | This week we will look at the End of Term (EOT) Collaborative Web Archive.
83 |
84 | ### End of Term Web Archive
85 | * End of Term Web Archive Website - https://eotarchive.org/
86 | * End of Term Twitter Account - https://twitter.com/eotarchive
87 | * End of Term Web Archive Wikipedia Page - https://en.wikipedia.org/wiki/End_of_Term_Web_Archive
88 | * Web archive of press releases about the 2016 EOT Web Archive - https://archive-it.org/collections/8311
89 |
90 | ### Collected Websites
91 | * Browse 2008, 2012, 2016 - http://eotarchive.cdlib.org/
92 | * End of Term 2008 - UNT Digital Library - https://webarchive.library.unt.edu/eot2008/
93 | * End of Term 2012 - UNT Digital Library - https://webarchive.library.unt.edu/eot2012/
94 | * End of Term 2016 - UNT Digital Library (UNT Crawls Only) - https://webarchive.library.unt.edu/eot2016/
95 | * End of Term 2020 - UNT Digital Library (UNT Crawls Only) - https://webarchive.library.unt.edu/eot2020/
96 |
97 | ### Seed Lists for Collection
98 |
99 | URL Nomination Tool - https://digital2.library.unt.edu/nomination/
100 |
101 | * End of Term Presidential Harvest 2008 - https://digital2.library.unt.edu/nomination/eth2008/
102 | * End of Term Presidential Harvest 2012 - https://digital2.library.unt.edu/nomination/eth2012/
103 | * End of Term 2012 - Bulk Lists - https://digital2.library.unt.edu/nomination/eth2012_bulk/
104 | * End of Term Presidential Harvest 2016 - https://digital2.library.unt.edu/nomination/eth2016/
105 | * End of Term 2016 - Bulk Lists - https://digital2.library.unt.edu/nomination/eth2016_bulk/
106 | * End of Term Presidential Harvest 2020 - https://digital2.library.unt.edu/nomination/eth2020/
107 | * End of Term 2020 - Bulk Lists - https://digital2.library.unt.edu/nomination/eth2020_bulk/
108 |
109 | ### Articles about the End of Term
110 | * Seneca, T., Grotke, A., Hartman, C. N., & Carpenter, K. (2012). It Takes A Village To Save The Web: The End Of Term Web Archive. Documents to the People. 40(1). https://digital.library.unt.edu/ark:/67531/metadc84373/
111 | * Phillips, M. E. & Phillips, K. K. (2017). End of Term 2016 Presidential Web Archive. Against the Grain 29(6) https://doi.org/10.7771/2380-176X.7874
112 | * Phillips, M. E. , Chudnov, D., & Jacobs, J. R. (2016). Exploratory Analysis of the End of Term Web Archive: Comparing Two Collections. Web Archiving Workshop, Joint Conference on Digital Libraries, Newark, New Jersey. https://digital.library.unt.edu/ark:/67531/metadc854106/
113 |
114 | ## Discussion
115 |
116 | ### Discussion Post:
117 | In at least one paragraph, discuss what you learned this week about playback of and access to web archives. What were some of the terms or concepts that were new to you this week? What are some things that still need clarity for you?
118 |
119 | In at least one paragraph, describe what happened when you used the OldWeb.Today service. What combinations did you try to access, and how well did they work? Was this the first time you have used emulated software? What did you think of the process? What is the earliest you can remember accessing websites on the internet? What tools do you remember? What websites do you remember using? Finally, what are different uses for a web service like this in the web archiving space?
120 |
121 | Finally, in at least two paragraphs, discuss the End of Term Web Archive and what you learned about this collaborative collection. Who are some of the institutions involved with this effort? What websites did you try to access in each term's web archive? Were you successful in navigating to each term's content? Because this is a volunteer effort, there are some serious limitations in how users can access this content. Based on what you have learned this week and over the past few weeks in this course, what suggestions would you make to this effort for improving access to these web crawls?
122 |
123 | ### Class Engagement:
124 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
125 |
126 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
127 |
--------------------------------------------------------------------------------
/modules/module-08-other-tools.md:
--------------------------------------------------------------------------------
1 | # Module Eight - Other Tools
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 | The purpose of this module is to provide an overview of a number of tools for curation, characterization or profiling (to analyze and understand the captured content) and tools to widen access to collections and raise awareness for web archives.
7 |
8 | This will build on concepts that you were introduced to in the previous technology modules.
9 |
10 | There are several readings, some online documentation to skim, and several slide decks that you will review.
11 |
12 | ### Objectives:
13 | 1. Learn about additional tools in the web archiving landscape.
14 | 2. Become familiar with some of the standards for enabling cross-archive access, such as Memento.
15 | 3. Experiment with ArchiveReady and familiarize yourself with the concept of archivability.
16 |
17 |
18 | ## Readings
19 |
20 | ### General
21 |
22 | The readings in this module are chosen to give you some exposure to different tools and systems that are being used around the world for different aspects of the web archive lifecycle.
23 |
24 | * International Internet Preservation Consortium (2020). Session 3D: Main Concepts and Technologies: Other Tools
25 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-3d-slides/
26 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-3d-notes/
27 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen.
28 |
29 | ### Memento
30 |
31 | * Jones S.M., Klein M., Van de Sompel H., Nelson M.L., Weigle M.C. (2021) Interoperability for Accessing Versions of Web Resources with the Memento Protocol. In: Gomes D., Demidova E., Winters J., Risse T. (eds) The Past Web. Springer, Cham. https://doi.org/10.1007/978-3-030-63291-5_9
32 | * Direct Link to UNT Libraries Access - https://link-springer-com.libproxy.library.unt.edu/chapter/10.1007/978-3-030-63291-5_9
33 | * Web Archiving Fundamentals pt 2: Memento https://video.vt.edu/media/Web+Archiving+Fundamentals+pt+2A+Memento/1_5gvqowto
34 | * This is probably the best overview of what Memento actually is and how it can be used to improve access to web archives.
35 | * Memento Guide - Introduction to Memento - http://www.mementoweb.org/guide/quick-intro/
36 | * About the Memento Project - http://mementoweb.org/about/
37 | * Memento Project - https://en.wikipedia.org/wiki/Memento_Project
38 | * Memento, About the Time Travel Service - http://timetravel.mementoweb.org/about/
39 | * Memento at the W3C - https://www.w3.org/blog/2016/08/memento-at-the-w3c/
40 | * Coalition for Networked Information (2010). CNI: Memento: Time Travel for the Web -
41 | https://www.youtube.com/watch?v=ePBMn-_I1rU
42 | * I don't expect you to watch this whole video, but it is a very good deep dive into Memento.
43 |
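At the protocol level, Memento (RFC 7089) is ordinary HTTP: a client sends an `Accept-Datetime` header to a TimeGate, and archives describe related captures in a `Link` header using relation types such as `original`, `timegate`, and `memento`. The Python sketch below (standard library only) builds such a request without sending it and pulls the memento entries out of a `Link` header. The `example.org`/`example.net` URLs and the helper names are hypothetical, and the regular expressions are a simplification rather than a complete RFC 8288 Link-header parser.

```python
import re
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime

# One '<uri>; name=value; ...' entry; quoted values may contain commas.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def timegate_request(timegate_url: str, when: datetime) -> urllib.request.Request:
    """Build (but do not send) a request asking a TimeGate for the capture nearest `when`."""
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(timegate_url, headers={"Accept-Datetime": stamp})


def parse_link_header(value: str) -> list:
    """Split a Link header into dicts holding the target URI and its parameters."""
    links = []
    for match in LINK_RE.finditer(value):
        params = {name: quoted or bare
                  for name, quoted, bare in PARAM_RE.findall(match.group(2))}
        links.append({"uri": match.group(1), **params})
    return links


def mementos(value: str) -> list:
    """Keep entries whose rel includes 'memento' (e.g. 'first memento')."""
    return [link for link in parse_link_header(value)
            if "memento" in link.get("rel", "").split()]


sample = ('<http://a.example.org/>; rel="original", '
          '<http://arxiv.example.net/timegate/http://a.example.org/>; rel="timegate", '
          '<http://arxiv.example.net/web/20000620180259/http://a.example.org/>; '
          'rel="first memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT"')
req = timegate_request("http://arxiv.example.net/timegate/http://a.example.org/",
                       datetime(2000, 6, 20, 18, 2, 59, tzinfo=timezone.utc))
```

Passing `req` to `urllib.request.urlopen` against a real TimeGate (such as the Time Travel service linked above) would normally be answered with a redirect to the selected memento, though archives differ in exactly which headers they return; pass a timezone-aware `datetime` so the GMT conversion is unambiguous.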
44 | ### Webrecorder/Conifer
45 |
46 | Webrecorder (service is now called Conifer)
47 |
48 | * Conifer. (2019). Introduction to Webrecorder.io - getting started https://www.youtube.com/watch?v=yX2RrfNPQjg
49 | * Rhizome. (2020). Webrecorder.io is now Conifer.rhizome.org. https://blog.conifer.rhizome.org/2020/06/11/webrecorder-conifer.html
50 | * Frequently Asked Questions - https://conifer.rhizome.org/_faq
51 | * Conifer Guide - https://guide.conifer.rhizome.org/
52 |
53 | ### Archives Unleashed
54 |
55 | * Ruest, N., Lin, J., Milligan, I., & Fritz, S. (2020). The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. Association for Computing Machinery, New York, NY, USA, 157–166. https://doi.org/10.1145/3383583.3398513
56 | * UNT Libraries Access Link - https://dl-acm-org.libproxy.library.unt.edu/doi/pdf/10.1145/3383583.3398513
57 | * Archives Unleashed Overview - https://www.youtube.com/watch?v=nBwgM63MxY8
58 | * Project Website - https://archivesunleashed.org/
59 | * The Archives Unleashed Toolkit - https://archivesunleashed.org/aut/
60 |
61 | ### Seed Nomination Services
62 | * URL Nomination Tool - https://digital2.library.unt.edu/nomination/
63 | * URL Nomination Tool Code (django-nomination) - https://github.com/unt-libraries/django-nomination
64 | * URL Nomination Tool Presentation - https://digital.library.unt.edu/ark:/67531/metadc287023/m2/
65 | * Cobweb Service - https://cdlib.org/services/pad/webarchiving/cobweb/
66 | * Cobweb Code - https://github.com/CobwebOrg/cobweb
67 |
68 | ### Other Crawling Services / Software Suites
69 | * Web Curator Tool - https://webcuratortool.org/
70 | * Web Curator Tool Code - https://github.com/DIA-NZ/webcurator/wiki
71 | * Web Curator Tool Documentation - http://webcurator.sourceforge.net/
72 | * Netarchive Suite - https://sbforge.org/display/NAS/NetarchiveSuite
73 | * Netarchive Suite Github - https://github.com/netarchivesuite
74 |
75 |
76 | ## Archiving Exercise
77 |
78 | ### Web Archiving Exercise - Archivability with ArchiveReady
79 | This week we will look at the concept of archivability as it pertains to Web Archives.
80 |
81 | First, take a look at this paper by Banos, Kim, Ross and Manolopoulos.
82 |
83 | Banos V., Kim Y., Ross S., Manolopoulos Y.: CLEAR: a credible method to evaluate website archivability, iPRES 2013, http://purl.pt/24107/1/iPres2013_PDF/CLEAR%20a%20credible%20method%20to%20evaluate%20website%20archivability.pdf
84 |
85 | You don't need to read the article in depth, but it is helpful to get the context of the work.
86 |
87 | Next, navigate to http://archiveready.com (sadly it is not an https service).
88 |
89 | 
90 |
91 | Get familiar with this service by browsing around the website.
92 |
93 | After you have looked around, choose the homepage of an organization or other website you want to test.
94 |
95 | Enter the website's URL into the provided box and choose "Check now".
96 |
97 | For this example I chose to look at the Electric Reliability Council of Texas (ERCOT) website (https://ercot.com ).
98 |
99 | 
100 |
101 | Look at the different results tabs to get an idea of the different metrics and archivability facets.
102 |
103 | In this week's discussion you will share the website you chose for this exercise. Additionally, you will share the Overall ratings as well as a synthesis of the findings from the service. Any additional observations you find interesting about this service would be good to share as well. Finally, explain how you think this kind of service could be helpful in the web archiving lifecycle.
104 |
105 | ### Additional Readings about Archivability
106 | * Web Archivability Community Group - https://www.w3.org/community/webarchivability/
107 | * Archivability - https://library.stanford.edu/projects/web-archiving/archivability
108 | * Banos V., Manolopoulos Y.: A quantitative approach to evaluate Website Archivability using the CLEAR+ method, International Journal on Digital Libraries, 2015, https://doi.org/10.1007/s00799-015-0144-4
109 | * UNT Direct Link - https://www-proquest-com.libproxy.library.unt.edu/docview/1785958458?pq-origsite=summon
110 |
111 | ## Exploring Web Archives
112 |
113 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
114 |
115 | This week we will look at the Portugal National Web Archive (https://arquivo.pt/?l=en )
116 |
117 | 
118 |
119 | What is Arquivo.pt - https://sobre.arquivo.pt/en/help/what-is-arquivo-pt/
120 |
121 | Examples of preserved pages - https://sobre.arquivo.pt/en/examples/examples/
122 |
123 | Exhibitions - https://sobre.arquivo.pt/en/examples/collections/
124 |
125 | Youtube Channel for Arquivo.pt - https://www.youtube.com/channel/UCEMJX0ICk1t2TzuNXghxKDg
126 |
127 | Some things I would like you to notice are the different ways of presenting web archives. The style of the Wayback interface that they present is different from others we have seen so far in this course. A description of the differences would be useful in your discussion post. The service also presents some examples and exhibits to help users get into the web archives a little better. How well does this work in your opinion? What did you end up exploring in the archive? How well does arquivo.pt present content in Portuguese and English? Have you seen other interfaces in multiple languages in our web archive exploration so far?
128 |
129 | It should come as no surprise that a web archive focusing on the .pt domain and the national web of Portugal might not have the same content we are used to seeing here in the United States. If you have trouble thinking about what to look at in the archive, consider finding the URL of a city in Portugal, a local sports team, or another cultural event in Portugal and exploring what has been archived.
130 |
131 | ## Discussion
132 |
133 | ### Discussion Post:
134 | In at least one paragraph, discuss what you learned this week about other tools and services in the web archive landscape. What were some of the terms or concepts that were new to you this week? What are some things that still need clarity for you? How do you think a protocol like Memento and its associated technologies can benefit the web archive landscape?
135 |
136 | In at least one paragraph, describe what happened when you used the ArchiveReady service. What website did you choose for this exercise? What overall rating did this website receive? Discuss the findings from the service in addition to this overall score, and share any additional observations you find interesting about the service. Finally, explain how you think this kind of service could be helpful in the web archiving lifecycle.
137 |
138 | Finally, in at least two paragraphs, discuss the Portugal National Web Archive arquivo.pt and what you learned about this web archive. What are some differences you noticed in the presentation of web archives in this service compared to collections we have looked at in previous weeks? Share your observations of the examples and the exhibitions, and share your opinion about whether they helped in exploring the collection. Finally, what are some of the websites that you explored in arquivo.pt?
139 |
140 | ### Class Engagement:
141 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
142 |
143 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
144 |
146 |
--------------------------------------------------------------------------------
/modules/module-09-collection-policies.md:
--------------------------------------------------------------------------------
1 | # Module Nine - Collection Policies
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 |
7 | The purpose of this module is to show how collection policies, collection scopes, and general selection activities fit into building a web archive. At its most basic level, web archiving is another tool for building collections in libraries and archives. A collection scope or policy statement can be useful for communicating the scope of these collections to others.
8 |
9 | This will build on concepts that you were introduced to in the technology modules and start to align them with other concepts in the library and archives space.
10 |
11 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review.
12 |
13 | ### Objectives:
14 | 1. Familiarize yourself with different web archive collection policies.
15 | 2. Understand the role that web archives can play in the collecting and acquisition of materials in libraries and archives.
16 | 3. Install and use the ArchiveWeb.page tool for creating web collections locally.
17 |
18 | ## Readings
19 |
20 | ### General
21 |
22 | The readings this week were selected to give you an introduction to the collection development approaches that are common when building collections with web archiving tools and techniques. These readings combine theory and practice in this space. The Murray & Hsieh piece introduces the framework we will use for the next major assignment.
23 |
24 | * International Internet Preservation Consortium (2020). _Session 7: Writing a Web Archiving Policy_
25 | * Slides - https://netpreserve.org/download/iipc-training-session-beginners-7-slides/
26 | * Speaker Notes - https://netpreserve.org/download/iipc-training-session-beginners-7-notes/
27 | * Review these slides and speaker notes. I suggest reading the notes when you have the slides open in another part of the screen.
28 | * International Internet Preservation Consortium (2020). _IIPC Training Video Case Study, Topic 5: Web Archiving Collecting Policies_
29 | https://www.youtube.com/watch?v=-NxJXrUTJ8A
30 | * Post, C. (2017). Building a Living, Breathing Archive: A Review of Appraisal Theories and Approaches for Web Archives. Preservation, Digital Technology & Culture 46(2). 69-77. https://doi.org/10.1515/pdtc-2016-0031
31 | * UNT Libraries Direct Link - https://libproxy.library.unt.edu/login?url=https://www.proquest.com/docview/1940603266
32 | * Free online version - http://libres.uncg.edu/ir/uncg/f/C_Post_Building_2017.pdf
33 | * Summers, E. & Punzalan, R. (2017). Bots, Seeds and People: Web archives as infrastructure. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 821-834. https://doi.org/10.1145/2998181.2998345
34 | * UNT Libraries Direct Link - https://dl-acm-org.libproxy.library.unt.edu/doi/10.1145/2998181.2998345
35 | * Free arXiv Version - https://arxiv.org/abs/1611.02493 (pdf: https://arxiv.org/pdf/1611.02493.pdf)
36 | * Read the "Crawl Modalities" section of the Findings, pp. 825-826 (pp. 5-6 in the preprint).
37 | * Ward, E. (2018). Archiving the Web @EBRPL: Creating and following a web collecting policy in a public library. https://archive-it.org/blog/post/archiving-the-web-ebrpl-creating-and-following-a-web-collecting-policy-in-a-public-library/
38 | * Murray, K. & Hsieh, I. (2006). Collection Planning Guidelines https://digital.library.unt.edu/ark:/67531/metadc33006/
39 | * https://digital.library.unt.edu/ark:/67531/metadc33006/m2/1/high_res_d/cpg_final_31may2006.pdf
40 | * Pages 18-40 will be used in the next major assignment.
41 |
42 | ### Web Archive Collection/Collecting Policies
43 |
44 | A selection of policies for web archive collections at different institutions around the US. This is only a sample of those that are easily identifiable; notice the scope and breadth of the different plans. In this week's discussion, you will select one of these to describe or, ideally, find a policy from another institution not listed here.
45 |
46 | * Columbia University Libraries - https://library.columbia.edu/collections/web-archives/policies.html
47 | * J. Paul Getty Trust - https://archives.getty.edu/getty_images/digitalresources/PublicWebArchivesCollectingPolicy.pdf
48 | * Montana State University - Web Archives Policies and Procedures - https://lib.utsa.edu/specialcollections/sites/specialcollections/files/2020-09/WebArchives_Policy_2020-08-20.pdf
49 | * NCSU Web Archiving - https://ncsu-libraries.github.io/web-archiving-docs/
50 | * Purdue University - https://www.lib.purdue.edu/sites/default/files/spcol/purdue-archives-web-archiving-policy.pdf
51 | * Stanford University Library - https://library.stanford.edu/projects/web-archiving/collection-development
52 | * University of California San Francisco - https://www.library.ucsf.edu/archives/ucsf/web/policy/
53 | * University of Chicago Web Archive Collection - https://www.lib.uchicago.edu/e/scrc/findingaids/view.php?eadid=ICU.SPCL.UCWEB
54 | * UT San Antonio Web Archives Policy - https://lib.utsa.edu/specialcollections/sites/specialcollections/files/2020-09/WebArchives_Policy_2020-08-20.pdf
55 | * Virginia Memory - https://www.virginiamemory.com/collections/web_archives/guidelines
56 |
57 | International Internet Preservation Consortium - Collection Development Policies - https://netpreserve.org/web-archiving/collection-development-policies/
58 |
59 | ## Archiving Exercise
60 |
61 | ### ArchiveWeb.page
62 |
63 | This exercise can be completed either by installing a free extension from the Chrome Web Store (https://chrome.google.com/webstore/) or by installing the desktop version of the ArchiveWeb.page application. The tool lets you create web archives of websites locally and access those collections. Because this involves installing something on your computer, it might be a bit more involved than previous exercises.
64 |
65 | Here is a video that gives an overview of why this tool exists -
66 | https://www.youtube.com/watch?v=hPcwDoDfhmo
67 |
68 |
69 | The easiest way to work with this tool is with a modern version of the Chrome (https://www.google.com/chrome/) browser. If you can't install Chrome, you can download a desktop version of ArchiveWeb.page instead.
70 |
71 | Next, navigate to https://archiveweb.page/
72 |
73 | 
74 |
75 |
76 | This video by Ilya Kreymer gives a nice overview of the process:
77 | https://www.youtube.com/watch?v=AP6wucoqJw0&t=1067s
78 |
79 | You can also look at the provided guide - https://archiveweb.page/guide
80 |
81 | ### Activity
82 |
83 | The goal of this exercise is to experiment with this tool and try to archive some web content on your own computer. The tool handles both the capture and playback pieces of the web archive workflow. It is an interactive tool that records the pages you browse in a Chrome tab, and its "Autopilot" feature can capture some content automatically.
84 |
85 | Try recording a website you are familiar with. I suggest picking an organizational, governmental, or similar website and steering clear of social media sites as you start, because this makes it easier to see how things are being captured. If you like, you can go back later and try capturing social media sites as well; they just aren't the best place to begin. Try creating a collection, capturing some content, and then downloading that content to your local machine.
86 |
87 | In the discussion this week, you will report on your success with this tool and share what website you captured and how well the tool worked. Were you able to download the web archive to your local machine? What did you think about this experience compared to others? How is this tool similar to or different from the hosted web archiving tools you have used in previous exercises?
88 |
89 | ## Exploring Web Archives
90 |
91 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
92 |
93 | This week we will look at the UK Web Archive.
94 |
95 | 
96 |
97 | Start by navigating over to the UK Web Archive - https://www.webarchive.org.uk/
98 |
99 | UK Web Archive: About us - https://www.webarchive.org.uk/en/ukwa/about
100 |
101 | Topics and Themes - https://www.webarchive.org.uk/en/ukwa/collection
102 |
103 | UK Web Archive: Github repositories - https://github.com/ukwa/
104 |
105 | Like arquivo.pt from last week, the UK Web Archive focuses primarily on a national domain.
106 |
107 | In this case, that means the countries that make up the United Kingdom.
108 |
109 | One of the things you should try is to explore the different topics and themes to get a better idea of the websites that are included.
110 |
111 | Another feature about this web archive that you should try is the word or phrase searching.
112 |
113 | In other discussions in this course, many of you have noted that search would be useful. As with arquivo.pt, we are starting to see search offered as another way to access content in web archives.
114 |
115 | ## Discussion
116 |
117 | ### Discussion Post:
118 | In at least one paragraph, discuss what you learned this week about collection policies for web archives. How familiar were you with collection description or scope statements in the past? What is the value of having collection policies for web archives?
119 |
120 | In the Summers and Punzalan (2017) article, the authors describe different modalities for crawling: domain, website, topical, event-based, and document crawls. In at least one paragraph, select one of these modalities, describe it in your own terms, and give an example of a type of web archive collection that could fit this modality. The example could come from your previous explorations in class or be a web archive collection that could be created.
121 |
122 | In one paragraph, describe what you found in a web archive collection policy. You can choose one from this week's readings or, if you want a virtual fist bump from me when I grade, find an example of a policy or collection scope that isn't listed in the readings. You might comment on the audience of the document, its structure, how detailed or broad it is, or anything else you noticed when looking at it. How does the document you identified assist in the web archiving process?
123 |
124 | In at least one paragraph, describe what happened when you used the ArchiveWeb.page tool. Did you have any challenges getting it installed and working? What sites did you try to capture? How well did they work for you? Were you able to download the resulting archive file? What kind of file was downloaded? What did you think about this experience compared to others? How is this tool similar to or different from the hosted web archiving tools you have used in previous exercises?
125 |
126 | Finally, in at least two paragraphs, discuss the UK Web Archive (UKWA) and what you learned about this web archive. What are some differences you noticed in the presentation of web archives in this service compared to collections we have looked at in previous weeks? Share your observations of the topics and themes feature and your opinion about whether it helped in exploring the collection. Finally, what are some of the websites that you explored in the UKWA?
127 |
128 | ### Class Engagement:
129 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
130 |
131 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
132 |
133 |
--------------------------------------------------------------------------------
/modules/module-10-metadata.md:
--------------------------------------------------------------------------------
1 | # Module Ten - Metadata
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 |
7 | The purpose of this module is to explore the use of metadata to describe web archives. Metadata provides description and navigational aids for many types of digital collections, and web archives are one of the types of digital resources that benefit from it. Metadata can be applied at different levels within a web archive, and this module discusses different approaches to metadata use in web archives.
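As a rough illustration of description at the level of a single seed, the Python sketch below builds a simple Dublin Core record with the standard library's `xml.etree.ElementTree`. The choice of elements, the wrapper element name `record`, and all of the example values are hypothetical. The OCLC readings in this module discuss which elements are actually recommended for web archives.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, serialized with the conventional "dc" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a <record> element holding simple Dublin Core elements.

    `fields` maps a Dublin Core element name (e.g. "title") to one or
    more values; repeatable elements such as "subject" get one child each.
    """
    record = ET.Element("record")  # hypothetical wrapper element name
    for name, values in fields.items():
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Hypothetical seed-level description of one archived website.
seed = {
    "title": ["University of North Texas homepage"],
    "identifier": ["https://www.unt.edu/"],
    "creator": ["University of North Texas"],
    "type": ["Website"],
    "language": ["eng"],
    "subject": ["Universities and colleges", "Higher education"],
}

print(ET.tostring(dublin_core_record(seed), encoding="unicode"))
```

The same function could describe a whole collection by passing collection-level values instead. That is one way to see the different "levels" of description discussed in the readings.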
8 |
9 | This will build on concepts discussed in the collection policies module and will be important in the final project for this course.
10 |
11 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review.
12 |
13 | ### Objectives:
14 |
15 | 1. Familiarize yourself with different approaches to applying metadata to web archives.
16 | 2. Become familiar with Dublin Core metadata as it applies to web archives.
17 | 3. Explore the Memento Time Travel Service for accessing multiple web archives in a single interface.
18 |
19 | ## Readings
20 |
21 | ### General
22 |
23 | The readings this week were selected to give you an introduction to different approaches for applying metadata to web archives. There is a wide range of options for metadata and web archives and depending on the scope, size, and organization of your web archiving program, one or more approaches might be in place.
24 |
25 | * Bragg, M., & Hanna, K. (2013). The Web Archiving Life Cycle Model. http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf
26 | * pp. 20-21 (Metadata and Description)
27 | * Dooley, J., Farrell, K., Kim, T., & Venlet, J. (2017). Developing Web Archiving Metadata Best Practices to Meet User
28 | Needs. Journal of Western Archives. 8(2). https://doi.org/10.26077/cffd-294a
29 |
30 | ### OCLC Web Archive Metadata
31 |
32 | * Dooley J. (2016). Slam bam WAM: Wrangling best practices for web archiving metadata - https://hangingtogether.org/slam-bam-wam-wrangling-best-practices-for-web-archiving-metadata/
33 | * Dooley, J. & Bowers, K. (2018). Descriptive Metadata for Web Archiving. OCLC Research https://www.oclc.org/research/publications/2018/oclcresearch-descriptive-metadata.html
34 | * Skim this publication.
35 | * Web Archiving Metadata Working Group - OCLC Research - https://www.oclc.org/research/areas/research-collections/wam.html
36 | Project website.
37 | * OCLC Research (2018). Outcomes from the OCLC Research Library Partnership Web Archiving Metadata Working Group
38 | https://www.youtube.com/watch?v=xTR8RK3t2jU
39 | * A nice overview of the work.
40 |
41 | ### Other Metadata Resources
42 |
43 | * UNC University Archives (2013). University of North Carolina at Chapel Hill University Archives Collected Websites, 2012-2021 https://finding-aids.lib.unc.edu/40417/
44 | * New York Art Resources Consortium (2018). Metadata Application Profile for Description of Websites with Archived Versions Version 2. https://web.archive.org/web/20200702210617/https://www.nyarc.org/sites/default/files/web-archiving-profile-version2.pdf
45 | * Skim this publication.
46 | * Archive-It. (2021). Add, edit, and manage your metadata https://support.archive-it.org/hc/en-us/articles/208332603-Add-edit-and-manage-your-metadata
47 | * Venlet, J. (2018). Behind the Scenes: Describing Archived Websites - https://blogs.lib.unc.edu/uarms/2018/05/23/describing-archived-websites/
48 | * Formenton, D. & Gracioso, L. (2022). Metadata standards in web archiving technological resources for ensuring the digital preservation of archived websites. RDBCI: Digital Journal of Library and Information Science, 20 https://doi.org/10.20396/rdbci.v20i00.8666263
49 |
50 | ## Archiving Exercise
51 |
52 | ### Web Archiving Exercise - Time Travel with Memento
53 |
54 | This week we will look in depth at the Time Travel service that helps discover Mementos from different web archiving programs around the world.
55 |
56 | We learned about Memento in Module Eight - Other Tools, so if you would like to review the Memento Protocol or the specifics of TimeGates and TimeMaps, that is a good place to start. Here is a quick overview of the different components for your review: http://www.mementoweb.org/guide/quick-intro/
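In the Memento protocol, a client asks a TimeGate for a past version of a URL by sending an `Accept-Datetime` header in HTTP date format (always in GMT). The Python sketch below only builds such a request with `urllib.request`; it does not send it. The TimeGate address `http://timetravel.mementoweb.org/timegate/` is an assumption about the aggregator's endpoint, so check the Memento guide before relying on it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def timegate_request(timegate: str, uri_r: str, when: datetime) -> Request:
    """Build (but do not send) a Memento request for `uri_r` as of `when`.

    `timegate` is a TimeGate base URL; appending the original URL (URI-R)
    to it is an assumed addressing convention.
    """
    # Accept-Datetime uses the HTTP date format and must be expressed in GMT.
    accept = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate + uri_r, headers={"Accept-Datetime": accept})


req = timegate_request(
    "http://timetravel.mementoweb.org/timegate/",  # assumed aggregator TimeGate
    "http://unt.edu",
    datetime(2002, 1, 24, 17, 6, 43, tzinfo=timezone.utc),
)
print(req.full_url)  # http://timetravel.mementoweb.org/timegate/http://unt.edu
# urllib stores header names capitalized, hence "Accept-datetime" here.
print(req.get_header("Accept-datetime"))  # Thu, 24 Jan 2002 17:06:43 GMT
```

If you send this request with `urllib.request.urlopen(req)`, the TimeGate should redirect to the Memento closest to the requested datetime. That response's `Memento-Datetime` header reports when the capture was actually made.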
57 |
58 | The Time Travel Service is an easy way to see what these protocol and infrastructure components can enable once they have been implemented.
59 |
60 | First, navigate to http://timetravel.mementoweb.org/
61 |
62 | 
63 |
64 | You can learn more about the service by navigating to the about page - http://timetravel.mementoweb.org/about/
65 |
66 | You can then enter the URL you want to look at and the date and time you are interested in viewing. I decided to look at https://unt.edu as of January 24, 2002 (17:06:43): http://timetravel.mementoweb.org/list/20020124170643/http://unt.edu
67 |
68 | 
69 |
70 | You can see the different web archives that have Mementos nearest to the time I am interested in. The service also tells you how far each archive's Memento is from the requested date. In this example, the closest Memento is from 1 hour before my requested time of 5:06:43 PM on January 24, 2002.
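The 14-digit value in the list URL above (`20020124170643`) is a `YYYYMMDDhhmmss` timestamp. The short Python sketch below parses two such timestamps and reports how far a capture is from the requested time, which is the kind of distance the Time Travel results display. The capture timestamp in the example is hypothetical. Both values are assumed to be in the same time zone (Memento uses GMT).

```python
from datetime import datetime


def parse_timestamp(ts: str) -> datetime:
    """Parse a 14-digit YYYYMMDDhhmmss archive timestamp."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def describe_offset(requested: str, captured: str) -> str:
    """Describe how far a capture is from the requested time, e.g. '0d 1h before'."""
    delta = parse_timestamp(captured) - parse_timestamp(requested)
    seconds = int(abs(delta.total_seconds()))
    days, remainder = divmod(seconds, 86400)  # 86,400 seconds per day
    hours = remainder // 3600
    direction = "after" if delta.total_seconds() >= 0 else "before"
    return f"{days}d {hours}h {direction}"


# Requested time from the list URL above; the capture time is hypothetical.
print(describe_offset("20020124170643", "20020124160643"))  # 0d 1h before
```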
71 |
72 | Try out some different URLs and times in the service. Which archives did you see most often in the results? What was the closest Memento to your requested time that you saw? What was the furthest away? In my search above, for example, the furthest Memento was in Arquivo.pt, 11 years 290 days after the requested time. Explore some of the different web archives that are listed and see whether you notice differences in the Mementos. Why is a service like this useful? In what situations does it matter to know precisely when a web archive harvested the content it is providing?
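Behind the scenes, each archive can publish a TimeMap that lists all of its Mementos for a URL, using the link format defined by the Memento specification (RFC 7089). The Python sketch below parses a small TimeMap and picks out the Mementos closest to and furthest from a requested time. The sample TimeMap uses placeholder `example.org` and `archive.example` addresses. The regular-expression parsing is a simplification that assumes well-formed input with no `<` inside attribute values.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# A made-up TimeMap in link format; real ones list many more Mementos.
SAMPLE_TIMEMAP = """<http://example.org/>; rel="original",
<http://archive.example/web/20020124160643/http://example.org/>; rel="first memento"; datetime="Thu, 24 Jan 2002 16:06:43 GMT",
<http://archive.example/web/20131111000000/http://example.org/>; rel="last memento"; datetime="Mon, 11 Nov 2013 00:00:00 GMT"
"""


def mementos_from_timemap(timemap: str) -> list[tuple[str, datetime]]:
    """Return (URI-M, capture datetime) pairs found in a link-format TimeMap."""
    mementos = []
    # Each link is <uri> followed by ;-separated attributes up to the next "<".
    for uri, attrs in re.findall(r"<([^>]+)>([^<]*)", timemap):
        rel = re.search(r'\brel="([^"]*)"', attrs)
        when = re.search(r'\bdatetime="([^"]*)"', attrs)
        # rel can be "memento", "first memento", "last memento", and so on.
        if rel and when and "memento" in rel.group(1).split():
            mementos.append((uri, parsedate_to_datetime(when.group(1))))
    return mementos


def closest_and_furthest(mementos, requested: datetime):
    """Return the Mementos nearest to and furthest from `requested`."""
    by_distance = sorted(mementos, key=lambda m: abs(m[1] - requested))
    return by_distance[0], by_distance[-1]


requested = datetime(2002, 1, 24, 17, 6, 43, tzinfo=timezone.utc)
closest, furthest = closest_and_furthest(mementos_from_timemap(SAMPLE_TIMEMAP), requested)
print("closest: ", closest[0])
print("furthest:", furthest[0])
```

An aggregator such as Time Travel does roughly this across many archives' TimeMaps at once. That is why a single query can surface both a near match and a capture years away.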
73 |
74 | In this week's discussion, you will report on your experiments with this tool.
75 |
76 | ## Exploring Web Archives
77 |
78 | ### Trove - Australian Web Archive
79 |
80 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
81 |
82 | This week we will look at the Trove archive in Australia.
83 |
84 | Trove is an aggregation platform for libraries, universities, museums, and galleries across Australia. Trove is maintained by the National Library of Australia.
85 |
86 | https://webarchive.nla.gov.au/collection
87 |
88 |
89 |
90 | 
91 |
92 | https://trove.nla.gov.au/help/categories/websites-category
93 |
94 | There are two ways of interacting with Trove and its web archive. The first is to use the subcollections listed on the main page: you can browse down into subcollections and locate examples of websites held in the collection.
95 |
96 | Another option is to use the search feature at the top left of the page. I did a quick, broad search for 'barrier reef': https://trove.nla.gov.au/search/category/websites?keyword=barrier%20reef
97 |
98 | 
99 |
100 | Selecting the first result displays this page: https://webarchive.nla.gov.au/awa/20140313113916/http://www.gbrmpa.gov.au/
101 |
102 | 
103 |
104 | Take some time to explore this web archive and the different ways it presents information. In this week's discussion, you will talk about some of the things you find and give your observations on the interface and on how it presents information differently from other web archives we have seen so far in class.
105 |
106 | ## Discussion
107 |
108 | ### Discussion Post:
109 | In at least one paragraph, discuss what you learned this week about metadata for web archives. How familiar were you generally with metadata before this week? What are some of the concepts that were new to you in relation to metadata? Discuss some of the different "levels" that metadata can describe in a web archive such as a seed, site, collection, or document.
110 |
111 | In one paragraph, give a short sales pitch for the tool or service that you described in the Web Archive Tool Critique assignment from a few weeks ago. Give an overview of the tool, what it is trying to accomplish, and anything you would recommend to your fellow students about it.
112 |
113 | In at least one paragraph, describe what happened when you used the Time Travel Service from the Memento team. Which archives did you see most often in the results? What was the closest Memento to your requested time that you saw? What was the furthest away? Why is a service like this useful? In what situations does it matter to know precisely when a web archive harvested the content it is providing? Any other thoughts or observations about this tool would be great to mention.
114 |
115 | Finally, in at least two paragraphs, discuss the Trove system and the Australian Web Archive. What are some differences you noticed in the presentation of web archives in this service compared to collections we have looked at in previous weeks? Share your observations of the subcollections feature and your opinion about whether it helped in exploring the collection. Finally, what are some of the websites that you explored in Trove? How well did the search function work for you?
116 |
117 | ### Class Engagement:
118 |
119 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
120 |
121 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
122 |
--------------------------------------------------------------------------------
/modules/module-11-quality-assurance.md:
--------------------------------------------------------------------------------
1 | # Module Eleven - Quality Assurance
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 | The purpose of this module is to explore quality assessment in web archives. As we have seen so far in our course, there are many times when the replay of content does not match the original site, and this can have many causes. Was the content within the scope of the crawl? Was there an issue extracting the URL for the content? Were there issues harvesting the content? Are the issues related to playback? Is the content in the web archive but unavailable because of a long chain of redirects, as we see with YouTube content? There are many reasons that quality may not meet expectations in a web archive, and this module gives you an introduction to the main concepts.
7 |
8 | This builds on concepts discussed in Module Nine - Collection Policies and Module Ten - Metadata, and it will be important in the final project for this course.
9 |
10 | There are several readings, some online documentation to skim, and several PowerPoint presentations that you will review.
11 |
12 | ### Objectives:
13 | 1. Familiarize yourself with quality assessment in web archives.
14 | 2. Identify common issues found in web archives and their playback.
15 | 3. Explore the use of the ReplayWeb.page service.
16 |
17 | ## Readings
18 |
19 | The readings this week were selected to give you an introduction to quality assessment in web archives. This module will introduce common problems that can occur in the web archiving space related to quality control and approaches that can be used to combat these problems.
20 |
21 | ### Overview of the Challenges
22 | * Brown, A. (2006). Archiving websites: A practical guide for information management professionals.
23 | * Chapter 5, Quality assurance and cataloging. 69-81
24 | * Click the title at this link - https://scholar.google.com/citations?view_op=view_citation&hl=en&user=gZuRr94AAAAJ&citation_for_view=gZuRr94AAAAJ:Se3iqnhoufwC I was able to reach Chapter 5 via Google Books through this URL; let me know if it doesn't work for you.
25 | * Available at the Discovery Park Library - https://discover.library.unt.edu/catalog/b3062630
26 | * Please do not purchase this chapter. I can work with you to get access. It is a good resource but somewhat difficult to access.
27 | * Bragg, M., & Hanna, K. (2013). The Web Archiving Life Cycle Model. http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf
28 | * pp. 26-27 (Quality Assurance and Analysis)
29 | * Reyes, B., Phillips, M. E., & Ko, L. (2014). Current Quality Assurance Practices in Web Archiving. https://digital.library.unt.edu/ark:/67531/metadc333026/
30 | * Reyes, B., McDevitt, J., Sun, J., & Liu, X. (2020). Quality Matters: A New Approach for Detecting Quality Problems in Web Archives. Proceedings of the Annual Conference of CAIS. https://doi.org/10.29173/cais1145
31 |
32 | ### Blog Posts
33 | * Not All Websites Are Made Equal (Or Friendly): Archiving ephemeral art content on the web - https://archive-it.org/blog/post/not-all-websites-are-made-equal-or-friendly-archiving-ephemeral-art-content-on-the-web/
34 | * Hockx-Yu, H. (2012). How good is good enough? - Quality Assurance of harvested web resources. https://britishlibrary.typepad.co.uk/webarchive/2012/10/how-good-is-good-enough-quality-assurance-of-harvested-web-resources.html
35 | * UK Web Archive blog. (2017). The Challenges of Web Archiving Social Media - http://blogs.bl.uk/webarchive/2017/04/the-challenges-of-web-archiving-social-media.html
36 |
37 | ### Other Reading
38 | * Reyes, B. (2018). A Grounded Theory of Information Quality in Web Archives.
39 | * Dissertation - https://digital.library.unt.edu/ark:/67531/metadc1248497/
40 | * Defense Slide - https://digital.library.unt.edu/ark:/67531/metadc1181153/
41 | * Archive-It (n.d.) Scoping crawls for specific types of sites. https://support.archive-it.org/hc/en-us/sections/201841373-Scoping-crawls-for-specific-types-of-sites
42 | * Archive-It (n.d.) Quality Assurance Overview. https://support.archive-it.org/hc/en-us/articles/208333833-Quality-Assurance-Overview
43 | * Library of Congress (n.d.) Creating Preservable Websites. https://www.loc.gov/programs/web-archiving/for-site-owners/creating-preservable-websites/
44 | * Marill, J., Boyko, A., Ashenfelder, M., & Jones, G. (2004). Web Harvesting Survey. https://digital.library.unt.edu/ark:/67531/metadc1457765/
45 |
46 | ## Archiving Exercise
47 |
48 | ### Web Archiving Exercise - ReplayWeb.page
49 |
50 | This week we are going to look at one of the tools in the suite being developed by the Webrecorder project.
51 |
52 | This exercise will build on the work that you did in Module Nine - Archiving Exercise where you looked at the ArchiveWeb.page service.
53 |
54 | The ReplayWeb.page service is designed to give you access to the contents of common web archive files directly in your browser.
55 |
56 | Start by navigating to https://replayweb.page/
57 |
58 | 
59 |
60 | You will load a WARC or WACZ file into this service and investigate the contents of that web archive container file.
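To get a feel for what ReplayWeb.page is reading, it helps to know how a WARC file is laid out. A WARC file is a sequence of records. Each record has a block of `Name: value` header lines (such as `WARC-Type` and `WARC-Target-URI`), then a blank line, then `Content-Length` bytes of content. The Python sketch below walks those record headers using only the standard library. It assumes a well-formed file, and the tiny in-memory sample record is made up for illustration. For real work, Webrecorder's `warcio` library handles the many edge cases this sketch ignores.

```python
import io


def iter_warc_headers(stream):
    """Yield a dict of header fields for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # skip the blank lines that end the previous record
        headers = {"version": line.decode("utf-8").strip()}  # e.g. "WARC/1.1"
        while True:
            raw = stream.readline()
            if raw in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Skip over the content block (e.g. an HTTP response) without parsing it.
        stream.read(int(headers.get("Content-Length", 0)))
        yield headers


# A tiny made-up record; real WARC files hold many records of several types.
SAMPLE = (
    b"WARC/1.1\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"WARC-Date: 2023-01-01T00:00:00Z\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

for record in iter_warc_headers(io.BytesIO(SAMPLE)):
    print(record["WARC-Type"], record["WARC-Target-URI"])  # resource http://example.org/
```

A compressed `.warc.gz` file stores each record as a separate gzip member, and Python's `gzip` module reads those members back to back. So `iter_warc_headers(gzip.open("my-capture.warc.gz", "rb"))` should work the same way (the file name is hypothetical).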
61 |
62 | You can read the documentation for this tool here: https://replayweb.page/docs/
63 |
64 | I would prefer that you use a .warc or .wacz file that you created with ArchiveWeb.page. You can use the content you collected previously if you saved that file, or you can quickly build another small archive.
65 |
66 | If you can't use your own content there are a few example files available here - https://replayweb.page/docs/examples
67 |
68 | Here are some example screenshots from my session where I loaded a .wacz file and looked at the contents of that file.
69 |
70 | 
71 |
72 |
73 | And here is the view after selecting the first web page on that screen:
74 |
75 | 
76 |
77 |
78 | In the discussion this week, I would like to hear your observations on this tool and on interacting with these files after you have created them. What did you think about the tool? Did you have any trouble with it?
79 |
80 | ### Bonus activity:
81 |
82 | Did you know that a .wacz file is actually just a standard ZIP file? The format is based on a specification that is currently being developed: https://webrecorder.github.io/wacz-spec/1.2.0/. If you change your file's name from something like webarchive.wacz to webarchive.zip, you will most likely be able to open it on your computer. If you do this bonus, it would be great to hear what you found inside the WACZ file. Note: a .warc file does not work this way, because it is not a ZIP archive.
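You don't strictly need to rename the file, because a ZIP library will open it regardless of its extension. The Python sketch below lists the members of a WACZ file with the standard `zipfile` module. Based on the WACZ specification linked above, you should expect something like the following:

* a `datapackage.json` manifest
* WARC data under `archive/`
* an index under `indexes/`
* a page list under `pages/`

The in-memory demo file below only imitates that layout with placeholder contents.

```python
import io
import zipfile


def wacz_members(source) -> list[tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each file in a WACZ."""
    with zipfile.ZipFile(source) as wacz:  # a WACZ is an ordinary ZIP archive
        return [(info.filename, info.file_size) for info in wacz.infolist()]


# Build a small stand-in WACZ in memory that imitates the expected layout.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("datapackage.json", "{}")  # placeholder manifest
    z.writestr("pages/pages.jsonl", "")
    z.writestr("indexes/index.cdx", "")
    z.writestr("archive/data.warc.gz", b"")

for name, size in wacz_members(demo):
    print(f"{size:>8}  {name}")
```

To inspect your own download, call `wacz_members("my-collection.wacz")` with the path to your file (the name here is hypothetical).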
83 |
84 | ## Exploring Web Archives
85 |
86 | Each week we will try to learn about a new web archive, a web archiving tool, or a web archiving service. The goal is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
87 |
88 | This week we will look at Common Crawl.
89 |
90 | From the main page of their website: "We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone."
91 |
92 | https://commoncrawl.org/
93 |
94 | 
95 |
96 | Explore the website to learn more about this project.
97 |
98 | The main component of this project that people interact with is the data.
99 |
100 | Take a look at the "getting started" page. https://commoncrawl.org/the-data/get-started/
101 |
102 | Pick one of the monthly crawls and explore what kinds of files they make available. Are any of these formats familiar? Are any of them new to you?
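Each monthly crawl has an identifier such as `CC-MAIN-2023-06`. For each crawl, Common Crawl publishes gzipped lists of its WARC, WAT (metadata), and WET (extracted text) files. The Python sketch below builds the URL of one of these listings and turns the lines of a downloaded listing into full download URLs. Several details are assumptions based on how the Get Started page has described the data:

* the `https://data.commoncrawl.org/` base address
* the `crawl-data/<crawl ID>/<type>.paths.gz` layout

The example crawl ID and segment path are placeholders, so verify these details there and check which crawls are currently available.

```python
# Assumed base address for Common Crawl's public data files.
BASE = "https://data.commoncrawl.org/"


def paths_listing_url(crawl_id: str, kind: str) -> str:
    """URL of the gzipped listing of one file type ("warc", "wat", or "wet") for a crawl."""
    return f"{BASE}crawl-data/{crawl_id}/{kind}.paths.gz"


def file_urls(paths_text: str) -> list[str]:
    """Turn the lines of a decompressed *.paths listing into full download URLs."""
    return [BASE + line.strip() for line in paths_text.splitlines() if line.strip()]


print(paths_listing_url("CC-MAIN-2023-06", "wet"))
# https://data.commoncrawl.org/crawl-data/CC-MAIN-2023-06/wet.paths.gz

# Hypothetical listing line; real lines name specific segments and files.
print(file_urls("crawl-data/CC-MAIN-2023-06/segments/EXAMPLE/wet/EXAMPLE.warc.wet.gz\n"))
```

Individual data files can be very large. Download a single listing first, which `gzip.decompress` can unpack, before fetching any of the data files it points to.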
103 |
104 | Take the time to look at the list of projects that have made use of Common Crawl Data - https://commoncrawl.org/the-data/examples/
105 |
106 | For this week's discussion I would like to hear about what you found with Common Crawl. I would also like for you to identify one project that uses Common Crawl Data that you found interesting and describe the project to the rest of your classmates. Don't forget to include links or citations so we can see what you are looking at.
107 |
108 | ## Discussion
109 |
110 | ### Discussion Post:
111 |
112 | In at least one paragraph, discuss what you learned this week about quality assurance or assessment for web archives. What are some of the common issues that can crop up when archiving websites that cause quality issues? Which of the readings resonated with you the most this week?
113 |
114 | In at least one paragraph, describe what happened when you used the ReplayWeb.page service. Were you able to successfully load your file from a few weeks ago? If not, did you look at any of the example files? What are your overall observations of this tool? What future uses can you see for it? Were you able to open a WACZ file and peek inside? What did you find inside?
115 |
116 | Finally, in at least two paragraphs, discuss what you learned about Common Crawl this week. What are its goals? How was it started? What services does it provide? Which monthly crawl did you look at? What kinds of formats did they provide access to? Were they all familiar to you?
117 |
118 | In the second Common Crawl paragraph, which project that uses Common Crawl data did you find interesting? Describe what the project is about in a couple of sentences. Include links to the project so that we can follow along with your description.
119 |
120 |
121 | ### Class Engagement:
122 |
123 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
124 |
125 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
126 |
--------------------------------------------------------------------------------
/modules/module-12-research.md:
--------------------------------------------------------------------------------
1 | # Module Twelve - Research with Web Archives
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 |
7 | The purpose of this module is to explore some of the ways that web archives can be used to answer research questions. You will look at some examples of research that seeks to answer questions about the field of web archives as well as the broader use of web archives to answer questions in different disciplines.
8 |
9 | There are several readings, some online documentation to skim, a few videos to watch, and several PowerPoint presentations that you will review.
10 |
11 | ### Objectives:
12 |
13 | 1. Familiarize yourself with different kinds of research that take place with web archives.
14 | 2. Learn ways of identifying how web archives are being used in research.
15 | 3. Gain experience identifying research that utilizes web archives.
16 |
17 | ## Readings
18 |
19 | The readings this week were selected to give you an introduction to different kinds of research being done with web archives. You will also look at how different web archives are working to enable access for researchers and at different strategies for working with web archives in research projects.
20 |
21 | This week's subject could easily fill a whole semester as we look at the kinds of research conducted with web archives and how web archives are working to facilitate that research. I've tried to pick some examples of each and have provided a number of websites for different initiatives in this space.
22 |
23 | ### Examples of Research with Web Archives
24 |
25 | * Ben-David, A. (2019). Web Archives as Memoryware: Critical reflections on sources and methods for web history. International Internet Preservation Consortium's Web Archiving Conference 2019, Zagreb, Croatia.
26 | https://www.youtube.com/watch?v=2kRC2X88kF4
27 | * While a bit on the longer side, I thought this was one of the most interesting talks I've heard in a long time. The volume is low at the beginning but it gets better after a few minutes when the microphone is moved.
28 | * Brügger, N. (2009). Website history and the website as an object of study. New Media & Society, 11(1–2), 115–132. https://doi.org/10.1177/1461444808099574
29 | * UNT Libraries direct link - https://journals-sagepub-com.libproxy.library.unt.edu/doi/pdf/10.1177/1461444808099574
30 | * Milligan, I. (2016). Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives. International Journal of Humanities and Arts Computing 10(1) 78-94. https://doi.org/10.3366/ijhac.2016.0161
31 | * University of Waterloo's Institutional Repository Preprint - https://uwspace.uwaterloo.ca/handle/10012/10322
32 | * Webster, P. (2021). Digital archaeology in the web of links: Reconstructing a late-1990's web sphere. In D. Gomes, E. Demidova, J. Winters, & T. Risse (Eds.), The Past Web (155-164). https://doi.org/10.1007/978-3-030-63291-5_12
33 | * UNT Libraries direct link - https://link-springer-com.libproxy.library.unt.edu/content/pdf/10.1007/978-3-030-63291-5_12.pdf
34 | * Ben-David, A. (2021). Critical web archive research. In D. Gomes, E. Demidova, J. Winters, & T. Risse (Eds.), The Past Web (181-188). https://doi.org/10.1007/978-3-030-63291-5_14
35 | * UNT Libraries direct link - https://link-springer-com.libproxy.library.unt.edu/content/pdf/10.1007/978-3-030-63291-5_14.pdf
36 |
37 | ### Working with Researchers
38 |
39 | * Zierau, E., & Moldrup-Dalum, P. (2021). Making web collections for research sustainable & reusable: Possibilities and challenges Experienced. International Internet Preservation Consortium's Web Archiving Conference 2021.
40 | https://www.youtube.com/watch?v=8DGsyEylnM4
41 | * Ruest, N., Lin, J., Milligan, I., & Fritz, S. (2020). The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives. Joint Conference on Digital Libraries. https://doi.org/10.1145/3383583.3398513
42 | * Also available as preprint from arXiv - https://arxiv.org/abs/2001.05399
43 | * Lin, J., Milligan, I., Wiebe, J., & Zhou, A. (2017). Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives. Journal on Computing and Cultural Heritage, 10(4), 1-30. https://doi.org/10.1145/3097570
44 | * UNT Libraries Direct Link to resource - https://dl-acm-org.libproxy.library.unt.edu/doi/10.1145/3097570
45 |
46 | ### Initiatives
47 | * Archive-It Research Services - https://support.archive-it.org/hc/en-us/articles/209671666-Introduction-to-Archive-It-Research-Services-ARS-
48 | * Web Archive Transformation (WAT) files - https://support.archive-it.org/hc/en-us/articles/360039686611
49 | * Web Archive Named Entities (WANE) files - https://support.archive-it.org/hc/en-us/articles/360039691351
50 | * Longitudinal Graph Analysis (LGA) files - https://support.archive-it.org/hc/en-us/articles/360039291992
51 | * RESAW, a Research Infrastructure for the Study of Archived Web Materials - http://resaw.eu/
52 | * Web Archiving and Digital Libraries - https://fox.cs.vt.edu/wadl2022.html
53 | * Archives Unleashed - https://archivesunleashed.org/
54 | * WARCnet - https://cc.au.dk/en/warcnet/
55 | * WARCnet Papers - https://cc.au.dk/en/warcnet/warcnet-papers/
56 |
57 | ## Archiving Exercise
58 |
59 | ### Web Archiving Exercise - Research with Web Archives
60 | This week we are going to explore different research projects that use web archives to answer research questions.
61 |
62 | I like to think about work with web archives as falling into a few different types. First, there is work that explores the nature of web archives themselves. This is often analysis of the shape, size, or contents of the web archive. It can also include research about how the web archive was crawled or about the relationships between content in the archived websites. This can be done with network analysis or by working with other derivative formats from the web archive itself.
63 |
64 | Another type of research makes use of web archives as a dataset of large amounts of text to build tools, models, and services for research. You will have seen this in some of the Common Crawl (https://commoncrawl.org) research that you looked at in a previous module.
65 |
66 | Finally, there is research that uses web archives as a data source to answer questions in a specific discipline such as political science, history, or health policy. The list of potential uses is almost endless.
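Network analysis, mentioned above for research that studies web archives themselves, often starts by collapsing page-to-page links into a graph of which sites link to which. The minimal Python sketch below does that and counts how many distinct domains link to each domain. The input, a list of `(source URL, target URL)` pairs, is a hypothetical stand-in for links you might extract from WARC or WAT files; it is not the format of any particular derivative file such as Archive-It's LGA files, and the function names are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_link_counts(links):
    """Collapse (source URL, target URL) pairs into domain-to-domain edge counts."""
    edges = Counter()
    for source, target in links:
        src = urlsplit(source).hostname  # hostname is lower-cased, or None
        dst = urlsplit(target).hostname
        # Skip relative links (no hostname) and links within the same site.
        if src and dst and src != dst:
            edges[(src, dst)] += 1
    return edges


def in_degree(edges):
    """Count how many distinct domains link to each target domain."""
    counts = Counter()
    for _source, target in edges:  # each (source, target) key is unique
        counts[target] += 1
    return counts
```

A domain with a high in-degree is one that many other sites in the collection point to, which is one simple way to spot the central sites in an archived web sphere.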
67 |
68 | This week's exercise is to identify **two** example papers or articles from **two** of these three rough categories. In other words, each paper should come from a different category; don't choose the same category twice.
69 |
70 | * Web archives to study web archives.
71 | * Web archives for building models.
72 | * Web archives for answering disciplinary research questions.
73 |
74 | In this week's discussion, include a citation for the paper or articles that you identified along with a paragraph describing the research and how the web archive was used to facilitate that research. Information about specifics of the web archive such as domain, size, time periods, or formats would be great to include.
75 |
76 | Where to find these papers?
77 |
78 | I suggest doing some broad searches in Google Scholar - https://scholar.google.com/ as a way to begin this assignment.
79 |
80 | ## Exploring Web Archives
81 |
82 | Each week we will try and learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
83 |
84 | This week we will look at the output of the International Internet Preservation Consortium's (IIPC) General Assembly and Web Archiving Conference.
85 |
86 | https://digital.library.unt.edu/explore/collections/IIPCM/
87 |
88 | This collection of presentations is hosted by the UNT Digital Library and includes 316 presentations from eight years of events hosted by the IIPC.
89 |
90 | You can view the items in this collection at this link.
91 |
92 | https://digital.library.unt.edu/explore/collections/IIPCM/browse/
93 |
94 | Take some time to explore these presentations to get a better sense of the kinds of presentations and work that is being carried out by members of the IIPC as well as others in this web archives space.
95 |
96 | In this week's discussion you will identify one presentation and write a description of the work being described in the presentation. Make sure that you link to the presentation so that others can see what you are referencing.
97 |
98 | ## Discussion
99 |
100 | ### Discussion Post:
101 | In at least one paragraph, discuss what you learned this week about research being conducted with web archives. What were some of the concepts that were new to you this week? Did you have any thoughts about the different kinds of research that can be done with web archives? In this week's exercise, three broad areas were suggested for a way of classifying research conducted with web archives. How do you think those three categories hold up? Too broad? Too narrow? Need additional ones based on what you found? Share your thoughts with the class.
102 |
103 | In at least one paragraph each, discuss the two articles or papers that you identified in this week's web archiving exercise. In addition to a citation for the paper or articles that you identified, describe the research and how the web archive was used to facilitate that research. Information about specifics of the web archive such as domain, size, time periods, or formats would be great to include.
104 |
105 | Finally, in at least one paragraph, identify the presentation you chose from the IIPC's GA and WAC collection in the UNT Digital Library. Who was involved in the work? What was the scope of the work or project? What questions would you like to ask the presenters about their work if you had the chance?
106 |
107 | ### Class Engagement:
108 |
109 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
110 |
111 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
112 |
--------------------------------------------------------------------------------
/modules/module-13-intellectual-property-ethics.md:
--------------------------------------------------------------------------------
1 | # Module Thirteen - Intellectual Property / Ethics
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 |
7 | The purpose of this module is to become familiar with the ethical considerations involved in web archiving. Ethics is an important component in most aspects of building collections, and web archiving has many areas where ethics and intellectual property are involved.
8 |
9 | In this module, both ethics and intellectual property are discussed. These are two different areas that overlap and are often considered together, but they don't always align: there are many situations where it might be legal to do something (with no copyright restrictions or intellectual property concerns) but not ethical to do so. Another concept involved in this space is bias, which is at play in any collection-building activity, including web archives.
10 |
11 | There are several readings, some online documentation to skim, a few videos to watch, and several PowerPoint presentations that you will review.
12 |
13 | ### Objectives:
14 |
15 | * Familiarize yourself with ethics as they are associated with web archives.
16 | * Understand basic concepts of intellectual property and copyright as they apply to web archives.
17 | * Register and create a public collection in the Conifer tool.
18 |
19 | ## Readings
20 |
21 | The readings this week were selected to give you an introduction to different ways that ethics, legal considerations, and bias come into play within the scope of web archives. While some of the readings for this week might seem only adjacent to web archiving, they are all worth considering as you continue to think about building collections of content created by other people, other nations, and other communities.
22 |
23 | ### Ethics
24 |
25 | * Jules, B., Summers, E., & Mitchell, V. Jr. (2018). Ethical Considerations for Archiving Social Media Content Generated by Contemporary Social Movements: Challenges, Opportunities, and Recommendations. https://www.docnow.io/docs/docnow-whitepaper-2018.pdf
26 | * Graham, P. (2019). Guest Editorial: Reflections on the Ethics of Web Archiving, Journal of Archival Organization, 14(3-4), 103-110, https://doi.org/10.1080/15332748.2018.1517589
27 | * Summers, E. (2014). On Forgetting. https://inkdroid.org/2014/11/18/on-forgetting/
28 | * Dolan-Mescal, A. (2017). Opportunities for making appraisal transparent when documenting the now. https://news.docnow.io/opportunities-for-making-appraisal-transparent-when-documenting-the-now-10b807606d39
29 | * Bingham, N. J., & Byrne, H. (2021). Archival strategies for contemporary collecting in a world of big data: Challenges and opportunities with curating the UK web archive. Big Data & Society https://doi.org/10.1177/2053951721990409
30 | * George Washington University Libraries. (2018). Social media research ethical and privacy guidelines. https://gwu-libraries.github.io/sfm-ui/resources/social_media_research_ethical_and_privacy_guidelines.pdf
31 | * National Forum on Ethics & Archiving of the Web - https://eaw.rhizome.org/
32 | * This is a very interesting collection of talks and recorded videos from the event.
33 | * Digital Curation Ethics (Web Archive) - https://archive-it.org/collections/9982
34 | * Collection of papers and projects related to ethics in digital curation
35 | * Kahle, B. (1992). Ethics of Digital Librarianship. https://archive.org/about/ethics_BK.php
36 | * de Klerk, T. (2018). Ethics in Archives: Decisions in Digital Archiving. https://www.lib.ncsu.edu/news/special-collections/ethics-in-archives%3A-decisions-in-digital-archiving
37 |
38 | ### Legal / Intellectual Property
39 | * Grotke, A. (2012). Legal Issues in Web Archiving. https://blogs.loc.gov/thesignal/2012/05/legal-issues-in-web-archiving/
40 | * Brindley, L. (2012). The memory of a nation in a digital world: Act quickly or our intellectual record will disappear down a black hole. The New Statesman. https://www.newstatesman.com/culture/2012/05/memory-nation-digital-world
41 | * International Internet Preservation Consortium (2022). Legal Deposit. https://netpreserve.org/web-archiving/legal-deposit/
42 | * International Federation of Library Associations and Institutions. (2011). IFLA Statement on Legal Deposit. https://www.ifla.org/publications/ifla-statement-on-legal-deposit-2011/
43 | * Association of Research Libraries. (N.D.) Copyright & Fair Use/Fair Dealing https://www.arl.org/category/our-priorities/advocacy-public-policy/copyright-and-fair-use/
44 |
45 | ## Archiving Exercise
46 |
47 | ### Web Archiving Exercise - Creating a Collection with Conifer
48 |
49 | This week we are looking at the Conifer service offered by Rhizome. Many of you will know from our readings that Conifer was developed in partnership with the Webrecorder group and was previously called webrecorder.io. The service is basically the same as it was and the renaming reflects the changes in governance of the service in relation to other projects.
50 |
51 | For our final project we will be making use of the Conifer service to capture web content as part of the web archives that you have described in your Web Archive Collection Plan.
52 |
53 | For this exercise, you will create a free account with the Conifer service and then create a public collection for the web archive you described in your collection plan.
54 |
55 | Begin by navigating over to https://conifer.rhizome.org/
56 |
57 | 
58 |
59 | Next, register for a free account with the service. You will have 5GB of free space on this service and if you don't go wild with your final assignment, this should be sufficient. If you want to use this service more in the future there are options for more storage with a subscription.
60 |
61 | Once you have created your account you will be given the option to create a collection. Create a new collection and name it what you chose in your Collection Plan document. When you are creating the collection click the toggle to make it viewable by everyone. Here is what my create a collection page looked like.
62 |
63 | 
64 |
65 | After you create the collection, you will have a blank collection where you can start capturing items for your collection.
66 |
67 | If you click on the Collection Cover link you will be given the public facing display and the URL that you can share with the class.
68 |
69 | 
70 |
71 | You can then share the link to your public collection. Here is the link to the collection I just created.
72 |
73 | https://conifer.rhizome.org/mphillips/action-figure-web-archive
74 |
75 | 
76 |
77 | If you want to explore different ways to add information to your new collection, feel free. There is a way to add a description of the collection, and there may be other options you can make use of.
78 |
79 | And that is all that you need to do for this week's exercise. In this week's discussion, you can post the link to your public collection as an example of what you will be working on for the final project. The public collection allows us all to see the work you are doing more easily.
80 |
81 | ## Exploring Web Archives
82 |
83 | Each week we will try and learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
84 |
85 | This week we will look at the members of the International Internet Preservation Consortium (IIPC).
86 |
87 | https://netpreserve.org/about-us/members/
88 |
89 | Because many of these libraries are national libraries, they are operating under their local copyright and intellectual property laws. Many of them have some sort of legal mandate in place for collecting resources but not all of them have the ability to display all of the content that they collect.
90 |
91 | Take some time to explore the members and try to navigate to their institutions' web archives if possible. You will notice some familiar groups like Australia, UKWA, Arquivo.pt, and the Library of Congress that we have looked at in past weeks. In your report in this week's discussion, pick a web archiving institution other than the ones we have already seen in previous modules' Exploring Web Archives sections.
92 |
93 | ## Discussion
94 |
95 | ### Discussion Post:
96 | In at least one paragraph, discuss what you learned this week about ethics and intellectual property in relation to web archives. Had you thought much about this aspect previously in the course? Are there things like restricted access to some web archives that you see differently based on the readings this week? What are your thoughts about legal mandates to collect web content as a component of preserving culture and intellectual output of a nation?
97 |
98 | In at least one paragraph discuss this week's exercise with Conifer. Did you run into any problems with setting up an account? Post a link to your collection and give the class a brief overview of the collection you will be creating based on your Collection Plan.
99 |
100 | Finally, in at least one paragraph, identify one of the members of the IIPC that you haven't already looked at in a previous module's Exploring Web Archives section. What did you learn about that member? What kind of library is it: national, research, archive, or commercial service? What kinds of web archives do they collect? What do they have online? Link to the member institution's pages on its web archiving initiative if possible.
101 |
102 | ### Class Engagement:
103 |
104 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
105 |
106 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
107 |
--------------------------------------------------------------------------------
/modules/module-14-future-of-web-archive.md:
--------------------------------------------------------------------------------
1 | # Module Fourteen - Future of Web Archives
2 |
3 | ## Overview and Objectives
4 |
5 | ### Overview:
6 | The purpose of this module is to introduce some of the emerging areas of web archiving, or web-archiving-adjacent activities, that may change how the field does its work. This module will present several initiatives that are in current development as well as some examples of recent projects in the broad scope of web archiving.
7 |
8 | There are several readings, some online documentation to skim, a few videos to watch, and several PowerPoint presentations that you will review.
9 |
10 | ### Objectives:
11 | 1. Introduce projects that may have an effect on web archiving in the future.
12 | 2. Become aware of the concept of Robust Links.
13 | 3. Create and share an example of a Robust Link with the class.
14 |
15 | ## Readings
16 |
17 | The readings this week were selected to give you an introduction to different projects that are on the periphery of the web archiving space that we have not been able to cover in great depth so far in this course. Many of them you may have run across in previous readings but this is an opportunity for you to learn more about them in this module.
18 |
19 | ### General Readings
20 | * Lynch, C. (2022). The Dangerous Complacency of “Web Archiving” Rhetoric. Against the Grain 33(6) https://www.charleston-hub.com/2022/01/the-dangerous-complacency-of-web-archiving-rhetoric/
21 | * Lynch, C. (2017) Stewardship in the "Age of Algorithms". First Monday 22(12). https://doi.org/10.5210/fm.v22i12.8097
22 |
23 | ### Robust Links
24 | * https://robustlinks.mementoweb.org/
25 | * About the Project - https://robustlinks.mementoweb.org/about/
26 | * Specification - https://robustlinks.mementoweb.org/spec/
27 | * Sanderson, R., Phillips, M., & Van de Sompel H. (2011). Analyzing the Persistence of Referenced Web Resources with Memento. In proceedings Open Repositories 2011 Conference.
28 | * ArXiv Link - https://doi.org/10.48550/arXiv.1105.3459
29 | * UNT Digital Library - https://digital.library.unt.edu/ark:/67531/metadc39318/
30 |
31 | ### Signposting
32 | * https://signposting.org/
33 | * About the Project - https://signposting.org/#about
34 | * Klein, M., Shankar, H., & Van de Sompel H. (2018). Signposting for Repositories. In proceedings Joint Conference on Digital Libraries. https://doi.org/10.1145/3197026.3203879
35 | * UNT Direct Link - https://dl-acm-org.libproxy.library.unt.edu/doi/10.1145/3197026.3203879
36 | * Klein, M., Van de Sompel, H., Sanderson, R., Shankar, H., Balakireva, L., Zhou, K., & Tobin, R. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE 9(12): e115253. https://doi.org/10.1371/journal.pone.0115253
37 |
38 | ### Web Archive Collection Zipped (WACZ)
39 | * Summers, Ed. (2021). Web Archives on, of, and off, the Web. https://inkdroid.org/2021/11/24/wacz/
40 | * Open Knowledge Foundation. (2022). Ilya Kreymer's and Ed Summers' presentation about standardising the WACZ format
41 | https://www.youtube.com/watch?v=TIyOTEyAu7k
42 | * Specification - https://webrecorder.github.io/wacz-spec/1.1.1/
43 |
44 | ## Archiving Exercise
45 |
46 | ### Web Archiving Exercise - Robust Links
47 |
48 | One of the most powerful, and at the same time most fragile, things that make the web possible is the mechanism used for connecting resources: links. These are the basis of the web and we use them all the time without thinking about them. That is, until they do not work for us, we get a 404 error, and we have to figure out what to do next.
49 |
50 | In addition to pages simply going missing, there are situations when a specific version of a website is important to reference. In writing, we accomplish this with citations and references that record when a URL was last accessed. There are other approaches to this problem space that we will explore in this exercise.
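One of these approaches is the Memento protocol (RFC 7089), which the Robust Links work builds on. A client asks a TimeGate for the archived copy of a URL closest to a given date by sending an `Accept-Datetime` header, and the TimeGate redirects to the best matching memento. The Python sketch below only builds such a request with the standard library; it does not send it. Using the Wayback Machine's `/web/<URL>` address as the TimeGate is an assumption for illustration, and the function name is hypothetical.

```python
from datetime import timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumed TimeGate for illustration; any Memento-compliant TimeGate works the same way.
TIMEGATE = "https://web.archive.org/web/"


def timegate_request(original_url, when):
    """Build (but do not send) a Memento datetime-negotiation request."""
    # Accept-Datetime must be an HTTP date in GMT. astimezone() converts an
    # aware datetime to UTC (a naive one is treated as local time first).
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + original_url, headers={"Accept-Datetime": accept_datetime})
```

For example, `timegate_request("https://dallasnews.com", datetime(2022, 4, 24, 12, tzinfo=timezone.utc))` builds a request whose `Accept-Datetime` value is `Sun, 24 Apr 2022 12:00:00 GMT`. Passing it to `urllib.request.urlopen` would send it; a compliant TimeGate should then redirect to the memento it selected, but check the response rather than assuming a particular archive's behavior.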
51 |
52 | First, head over to the Robustify service.
53 |
54 | https://robustlinks.mementoweb.org/
55 |
56 | 
57 |
58 | Choose a website you want to link to with a specific date and time.
59 |
60 | After you enter your link and the text you want the link to display, hit submit and the service will begin doing its work.
61 |
62 |
63 |
64 | In my example I have chosen a link to the Dallas Morning News (https://dallasnews.com) for April 24, 2022.
65 |
66 | Once the service has completed you will be provided with the following screen.
67 |
68 | 
69 |
70 | You have some options for how you can choose to display your new robust link.
71 |
72 | For the discussion this week you will share the code for your Robust Link in its snippet form. You can use the snippet option as shown below.
73 |
74 | ```
75 | Dallas Morning News for April 24, 2022
78 | ```
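The snippet the Robustify service gives you is an ordinary HTML anchor with a few extra attributes from the Robust Links specification: `href` points at the live page, `data-versionurl` points at an archived copy (a memento), and `data-versiondate` records the date the link refers to. As a rough sketch of that structure, the Python function below assembles such an anchor; the function name and the memento URL in the usage example are hypothetical placeholders, not output from the service.

```python
from html import escape


def robust_link(original_url, version_url, version_date, text):
    """Return an <a> element carrying Robust Links data-* attributes."""
    # href points at the live page; the archived copy and its date travel
    # along in data-versionurl and data-versiondate. escape() keeps characters
    # such as & and " from breaking the HTML attributes.
    return (
        f'<a href="{escape(original_url)}" '
        f'data-versionurl="{escape(version_url)}" '
        f'data-versiondate="{escape(version_date)}">{escape(text)}</a>'
    )
```

For example, `robust_link("https://dallasnews.com", "https://web.archive.org/web/20220424000000/https://dallasnews.com/", "2022-04-24", "Dallas Morning News for April 24, 2022")` returns a single `<a>` element. In practice, copy the snippet the service generates, since it contains the memento URL the service actually created or found.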
79 |
80 | ## Exploring Web Archives
81 |
82 | Each week we will try and learn about a new web archive, a web archiving tool, or a web archiving service. The goal of this is to get an introduction to what is happening in the web archiving space, what is being collected, and who is collecting it.
83 |
84 | This week is an open week for identifying a web archive collection that you haven't previously mentioned in your discussion postings and sharing it with the class. This can be a collection from a larger service such as Archive-It (https://archive-it.org/) or the Library of Congress Web Archive Collections (https://www.loc.gov/web-archives/collections/), but if there are others you want to explore, feel free.
85 |
86 | In the discussion this week, you will share the web archive collection you identified along with at least two example URLs of content in that web archive.
87 |
88 | ## Discussion
89 |
90 | ### Discussion Post:
91 | In at least one paragraph, discuss what you learned this week from the readings. Discuss your opinions of the Clifford Lynch articles. How do they, or do they not, align with your thinking so far in this course?
92 |
93 | In another paragraph, share what you learned from the other examples in the module's readings. Have you come across any of them before in this course? What do you think the future of web archiving holds?
94 |
95 | In at least one paragraph, discuss this week's exercise with Robust Links. What is the point of this tool/service/specification? What website did you create a link for? Share your Robust Link in your discussion using the code snippet style. Where do you think this kind of service fits into the web archiving and scholarly communication landscape?
96 |
97 | Finally, in at least one paragraph, introduce the web archive collection you found in the Exploring Web Archives section from this week. Who created the collection? What is the scope of the collection? Include the URL for the collection and then include two example URLs from content within that collection.
98 |
99 | ### Class Engagement:
100 |
101 | After you have made the discussion post described above, take the time to respond to, comment on, or engage with at least **two** of your classmates' posts.
102 |
103 | If there are any unanswered questions feel free to try and offer an answer or suggestion to the original poster. Did they mention something that made you investigate something further? If so, what was it?
104 |
105 |
--------------------------------------------------------------------------------
/syllabus-5960.001-Web-Archiving-2022-Spring.md:
--------------------------------------------------------------------------------
1 | # INFO 5960.001 - Web Archiving
2 | ## Course Information
3 |
4 | **Term: Spring 2022**
5 |
6 | **Location: Online** - https://learn.unt.edu
7 |
8 | ## Instructor Information
9 |
10 | * Instructor: Mark Phillips Ph.D. (he/him)
11 | * Office Hours: Tues 2:00-4:00PM or by appointment
12 | * Office Location: Online using Zoom
13 | * Email: mark.phillips@unt.edu
14 |
15 | ## Course Description
16 | The web is a fundamental component of nearly all modern interaction. Preserving content from the web and providing long-term access to preserved content presents an interesting set of challenges for Information Scientists. In this course, you will develop knowledge and skills related to the standards, tools, and processes of web archiving. You will learn the mechanics of web archiving and its relation to familiar concepts like collection building and appraisal, access and use, and ethics. This course will provide hands-on experience working with different projects and tools, and is designed for anyone interested in the topic without any need for prior experience in web archiving.
17 |
18 | ## Objectives
19 | By the end of this course you should be able to:
20 |
21 | * Discuss the role and potential of the Web as an information resource, and the characteristics of the Web that affect archiving and preservation.
22 | * Be familiar with tools and appropriate techniques for preservation of different aspects of the web including “standard” websites as well as a working understanding of preserving API-based web content like social media sites.
23 | * Recognize the challenges of Web archiving.
24 | * Become proficient at using, interpreting, and explaining common playback tools such as the Wayback Machine.
25 | * Increase your awareness of legal and policy constraints on Web archiving.
26 | * Be familiar with the standards and best practices for sustainably archiving Web content.
27 |
28 | ## Required/Recommended Materials
29 | This course does not have a required textbook but will instead rely on a wide range of resources such as reports, articles, white papers, conference proceedings, presentations, and video recordings. A full list of readings is available in LEARN as weekly modules.
30 |
31 | Every effort has been made to select resources that are either open access or available to you as a student from the UNT Libraries, which are paid for by your library fee. If you have any trouble accessing the readings, do not hesitate to reach out to me for assistance.
32 |
33 | ## Assignments
34 | The assignments for this course are designed to allow you to demonstrate and develop your knowledge related to web archives.
35 |
36 | ### Readings
37 | There are reading assignments posted with each module for this course. These readings have been identified to introduce concepts and ideas to you. In addition to readings there will be some video content assigned that you are expected to watch. The readings and videos will provide the topics for the discussion posts mentioned below.
38 |
39 | ### Discussion Posts
40 | Each week you will be assigned a discussion post that will be due on Sunday night by 11:59 PM (Central Time). The discussion will be related to the weekly readings and will allow you to demonstrate your understanding of concepts introduced that week. Additionally, these discussion posts will allow you to explore web archives and introduce what you have found to your fellow students. In addition to your discussion post, you will be expected to comment on other students’ discussion posts as part of the participation portion of the course.
41 |
42 | ### Web Archive Critique
43 | A brief paper (2-3 pages, font-size 12, double-spaced) discussing the features of a web archive and its content. This paper will include an overview of the web archive, who is responsible for creating the archive, what tools are used in creating the archive, its collection scope, and how long the archive has been operational. Instructions will be distributed two weeks before the deadline.
44 |
45 | ### Web Archive Tools Critique
46 | A brief paper (2-3 pages, font-size 12, double-spaced) discussing a specific tool or service in the web archiving space. This paper will include an overview of the tool or service, the problem that it tries to solve, the history of the tool, and who or what institution is responsible for the tool or service. Instructions will be distributed two weeks before the deadline.
47 |
48 | ### Semester Project - Creating a web archive
49 | Throughout the semester you will be constructing the building blocks of a final class project. This project is to create a web archive related to a topic area you choose in consultation with your professor. Midway through the semester a Web Archive Collection Plan will be due describing the plan for this collection. The semester project will be due the last full week of the semester.
50 |
51 | ### Examinations
52 | There are no midterm or final examinations in this course.
53 |
54 | ## How to Succeed in this Course
55 | Office hours offer you an opportunity to ask for clarification or find support with understanding class material. I encourage you to connect with me for support. Additional office hours will be offered virtually as the semester concludes. Your success is my goal.
56 |
57 | I have blocked out my schedule on Tuesdays from 2:00 until 4:00 for scheduling Zoom-based office hours. If you are not able to meet during these times, please send me an email and we can find a better date and time that will work for both of us.
58 |
59 | The University of North Texas makes reasonable academic accommodation for students with disabilities. Students seeking reasonable accommodation must first register with the Office of Disability Access (ODA) to verify their eligibility. If a disability is verified, the ODA will provide you with a reasonable accommodation letter to be delivered to faculty to begin a private discussion regarding your specific needs in a course. You may request reasonable accommodations at any time; however, ODA notices of reasonable accommodation should be provided as early as possible in the semester to avoid any delay in implementation. Note that students must obtain a new letter of reasonable accommodation for every semester and must meet with each faculty member prior to implementation in each class. Students are strongly encouraged to deliver letters of reasonable accommodation during faculty office hours or by appointment. Faculty members have the authority to ask students to discuss such letters during their designated office hours to protect the privacy of the student. For additional information, refer to the Office of Disability Access website (http://www.unt.edu/oda). You may also contact ODA by phone at (940) 565-4323.
60 |
61 | I value the many perspectives students bring to our campus. Please work with me to create a classroom culture of open communication, mutual respect, and inclusion. All discussions should be respectful and civil. Although disagreements and debates are encouraged, personal attacks are unacceptable. Together, we can ensure a safe and welcoming classroom for all. If you ever feel like this is not the case, please see me during office hours and let me know. We are all learning together.
62 |
63 | ## Assessing Your Work
64 | Grades will be determined as follows:
65 |
66 | | Assignment Type | Point Distribution |
67 | |-----------------------------|-----------------------------|
68 | | Discussions (15 total) | 150 points (10 points each) |
69 | | Web Archive Critique | 50 points |
70 | | Web Archive Tools Critique | 50 points |
71 | | Web Archive Collection Plan | 50 points |
72 | | Web Archive Final Project | 100 points |
73 |
74 |
75 |
76 | | Grading Scale | Letter Grade |
77 | |---------------|--------------|
78 | | 90-100% | A |
79 | | 80-89% | B |
80 | | 70-79% | C |
81 | | 60-69% | D |
82 | | Below 60% | F |
83 |
84 |
85 | ### Late work
86 | All students are expected to submit their discussions, assignments, and final project by the due date. This prevents students from getting too far behind in the course and allows the instructor to assign grades in a consistent and timely manner.
87 |
88 | All students who do not complete their module assignments by 11:59 PM Central Time on Sunday will be penalized 15% of the module assignment’s points for each day late unless there are extenuating circumstances. A final project received after the due date will incur a 5-point deduction for each day late unless there are extenuating circumstances.
89 |
90 | The only exceptions are if (a) the student has a personal or family medical emergency, or (b) the student informs the instructor of a conflict well in advance and receives permission to turn in an assignment late.
91 |
92 | ### Incompletes
93 | A grade of incomplete (I) will be given only for justifiable reasons (such as a serious illness or military service) and only if you are passing the course. It is your responsibility to contact the instructor to request an incomplete and discuss requirements for completing the course. If you do not remove the incomplete within the period agreed upon with the instructor or within one calendar year, you will receive a grade of F. Please refer to https://registrar.unt.edu/grades/incompletes for more information.
94 |
95 | ### Withdrawal
96 | A grade of withdrawal (W) or withdrawal-failing (WF) will be given depending on your participation and grades to date. If you simply disappear and do not file a formal UNT withdrawal form, you may receive a grade of an F.
97 |
98 | ## Course Requirements / Schedule
99 |
100 | ### Schedule
101 |
102 | | Week | Date | Topic | Assignment Due | Points Possible |
103 | |--------------|-------|----------------------------------------------------------------------------|-----------------------------|-----------------|
104 | | | 01/18 | [Introduction to Class][module_00] | | |
105 | | Week 1 | 01/18 | [What is a Web Archive?][module_01] | **Module One** | |
106 | | | 01/23 | | Introduction | 10 pts. |
107 | | | 01/23 | | Discussion | 10 pts. |
108 | | Week 2 | 01/24 | [What is the Web][module_02] | **Module Two** | |
109 | | | 01/30 | | Discussion | 10 pts. |
110 | | Week 3 | 01/31 | [Who does Web Archiving?][module_03] | **Module Three** | |
111 | | | 01/31 | [Assignment 1: Web Archive Critique][assignment_01] | | |
112 | | | 02/06 | | Discussion | 10 pts. |
113 | | Week 4 | 02/07 | [Technology Overview][module_04] | **Module Four** | |
114 | | | 02/13 | | Discussion | 10 pts. |
115 | | | 02/13 | | Web Archive Critique | 50 pts. |
116 | | Week 5 | 02/14 | [Capture][module_05] | **Module Five** | |
117 | | | 02/20 | | Discussion | 10 pts. |
118 | | Week 6 | 02/21 | [Preserve][module_06] | **Module Six** | |
119 | | | 02/21 | [Assignment 2: Web Archive Tools Critique][assignment_02] | | |
120 | | | 02/27 | | Discussion | 10 pts. |
121 | | Week 7 | 02/28 | [Playback][module_07] | **Module Seven** | |
122 | | | 03/06 | | Discussion | 10 pts. |
123 | | Week 8 | 03/07 | [Other Tools][module_08] | **Module Eight** | |
124 | | | 03/13 | | Discussion | 10 pts. |
125 | | | 03/13 | | Web Archive Tools Critique | 50 pts. |
126 | | Spring Break | 03/14 | | | |
127 | | | | | | |
128 | | Week 9 | 03/21 | [Collection Policies][module_09] | **Module Nine** | |
129 | | | 03/21 | [Assignment 3: Web Archive Collection Plan][assignment_03] | | |
130 | | | 03/27 | | Discussion | 10 pts. |
131 | | Week 10 | 03/28 | [Metadata][module_10] | **Module Ten** | |
132 | | | 04/03 | | Discussion | 10 pts. |
133 | | Week 11 | 04/04 | [Quality Assurance][module_11] | **Module Eleven** | |
134 | | | 04/10 | | Discussion | 10 pts. |
135 | | | 04/10 | | Web Archive Collection Plan | 50 pts. |
136 | | Week 12 | 04/11 | [Research with Web Archives][module_12] | **Module Twelve** | |
137 | | | 04/11 | [Final Project: Building a Web Archive][assignment_04] | | |
138 | | | 04/17 | | Discussion | 10 pts. |
139 | | Week 13 | 04/18 | [Intellectual Property][module_13] | **Module Thirteen** | |
140 | | | 04/24 | | Discussion | 10 pts. |
141 | | Week 14 | 04/25 | [Future of Web Archives][module_14] | **Module Fourteen** | |
142 | | | 05/01 | | Discussion | 10 pts. |
143 | | Week 15 | 05/05 | | Final Project Due | 100 pts. |
144 | | | | | | |
145 | | Week 16 | 05/09 | Finals Week | No Assignments | |
146 |
147 |
148 | Every student in my class can improve by doing their own work and trying their hardest with access to appropriate resources. Students who use other people’s work without citations will be violating UNT’s Academic Integrity Policy. Please read and follow this important set of guidelines for your academic success (https://policy.unt.edu/policy/06-003). If you have questions about this, or any UNT policy, please email me or come discuss this with me during my office hours.
149 |
150 | ## Attendance and Participation
151 |
152 | Success in this course is dependent on your active participation and engagement throughout the course. As such, students are required to complete all assignments by the due date, and to actively participate in class discussions.
153 |
154 | Additionally, students are expected to:
155 | * Log on at least twice a week, ideally on different days, in order to complete weekly assignments, assessments, discussions, and/or other weekly deliverables as directed by the instructor and outlined in the syllabus;
156 | * Participate in the weekly threaded discussions. In addition to posting a response to the thread topic presented, students are expected to respond to each other and to comments and questions from the instructor and/or other students.
157 |
158 | If you find that you cannot meet the class's minimum discussion requirements due to extenuating circumstances, please contact me as soon as possible.
159 |
160 | Students will not be marked present for the course in a particular week if they have not posted on the discussion forum, submitted an assignment or essay, or completed an assessment (if one is administered) that week.
161 |
162 | Please inform the professor and instructional team if you are unable to attend class meetings because you are ill, out of consideration for the health and safety of everyone in our community. If you are experiencing any symptoms of COVID (https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html), please seek medical attention from the Student Health and Wellness Center (940-565-2333 or askSHWC@unt.edu) or your health care provider PRIOR to coming to campus. UNT also requires you to contact the UNT COVID Team at COVID@unt.edu for guidance on actions to take due to symptoms, pending or positive test results, or potential exposure.
163 |
164 | [module_00]: ./modules/module-00-introductions.md
165 | [module_01]: ./modules/module-01-what-is-a-web-archive.md
166 | [module_02]: ./modules/module-02-what-is-the-web.md
167 | [module_03]: ./modules/module-03-who-does-web-archiving.md
168 | [module_04]: ./modules/module-04-technology-overview.md
169 | [module_05]: ./modules/module-05-capture.md
170 | [module_06]: ./modules/module-06-preserve.md
171 | [module_07]: ./modules/module-07-playback.md
172 | [module_08]: ./modules/module-08-other-tools.md
173 | [module_09]: ./modules/module-09-collection-policies.md
174 | [module_10]: ./modules/module-10-metadata.md
175 | [module_11]: ./modules/module-11-quality-assurance.md
176 | [module_12]: ./modules/module-12-research.md
177 | [module_13]: ./modules/module-13-intellectual-property-ethics.md
178 | [module_14]: ./modules/module-14-future-of-web-archive.md
179 |
180 | [assignment_01]: ./assignments/assignment-01.md
181 | [assignment_02]: ./assignments/assignment-02.md
182 | [assignment_03]: ./assignments/assignment-03.md
183 | [assignment_04]: ./assignments/assignment-04.md
184 |
185 |
--------------------------------------------------------------------------------
/syllabus-5960.001-Web-Archiving-2023-Spring.md:
--------------------------------------------------------------------------------
1 | # INFO 5960.001 - Web Archiving
2 | ## Course Information
3 |
4 | **Term: Spring 2023**
5 |
6 | **Location: Online** - https://learn.unt.edu
7 |
8 | ## Instructor Information
9 |
10 | * Instructor: Mark Phillips Ph.D. (he/him)
11 | * Office Hours: Wed 2:00-4:00PM by appointment
12 | * Office Location: Online using Zoom - https://unt.zoom.us/my/mark.phillips
13 | * Email: mark.phillips@unt.edu
14 |
15 | ## Course Description
16 | The web is a fundamental component of nearly all modern interaction. Preserving content from the web and providing long-term access to preserved content presents an interesting set of challenges for Information Scientists. In this course, you will develop knowledge and skills related to the standards, tools, and processes of web archiving. You will learn the mechanics of web archiving and its relation to familiar concepts like collection building and appraisal, access and use, and ethics. This course will provide hands-on experience working with different projects and tools, and is designed for anyone interested in the topic without any need for prior experience in web archiving.
17 |
18 | ## Objectives
19 | By the end of this course you should be able to:
20 |
21 | * Discuss the role and potential of the Web as an information resource, and the characteristics of the Web that affect archiving and preservation.
22 | * Be familiar with tools and appropriate techniques for preserving different aspects of the web, including “standard” websites, and have a working understanding of preserving API-based web content such as social media sites.
23 | * Recognize the challenges of Web archiving.
24 | * Become proficient at using, interpreting, and explaining common playback tools such as the Wayback Machine.
25 | * Increase your awareness of legal and policy constraints on Web archiving.
26 | * Be familiar with the standards and best practices for sustainably archiving Web content.
27 |
28 | ## Required/Recommended Materials
29 | This course does not have a required textbook but will instead rely on a wide range of resources such as reports, articles, white papers, conference proceedings, presentations, and video recordings. A full list of readings is available in LEARN as weekly modules.
30 |
31 | Every effort has been made to select resources that are either open access or available to you as a student from the UNT Libraries, which are paid for by your library fee. If you have any trouble accessing the readings, do not hesitate to reach out to me for assistance.
32 |
33 | ## Assignments
34 | The assignments for this course are designed to allow you to demonstrate and develop your knowledge related to web archives.
35 |
36 | ### Readings
37 | There are reading assignments posted with each module for this course. These readings have been identified to introduce concepts and ideas to you. In addition to readings there will be some video content assigned that you are expected to watch. The readings and videos will provide the topics for the discussion posts mentioned below.
38 |
39 | ### Discussion Posts
40 | Each week you will be assigned a discussion post that will be due on Sunday night by 11:59 PM (Central Time). The discussion will be related to the weekly readings and will allow you to demonstrate your understanding of concepts introduced that week. Additionally, these discussion posts will allow you to explore web archives and introduce what you have found to your fellow students. In addition to your discussion post, you will be expected to comment on other students’ discussion posts as part of the participation portion of the course.
41 |
42 | ### Web Archive Critique
43 | A brief paper (2-3 pages, font-size 12, double-spaced) discussing the features of a web archive and its content. This paper will include an overview of the web archive, who is responsible for creating the archive, what tools are used in creating the archive, its collection scope, and how long the archive has been operational. Instructions will be distributed two weeks before the deadline.
44 |
45 | ### Web Archive Tools Critique
46 | A brief paper (2-3 pages, font-size 12, double-spaced) discussing a specific tool or service in the web archiving space. This paper will include an overview of the tool or service, the problem that it tries to solve, the history of the tool, and who or what institution is responsible for the tool or service. Instructions will be distributed two weeks before the deadline.
47 |
48 | ### Semester Project - Creating a web archive
49 | Throughout the semester you will be constructing the building blocks of a final class project. This project is to create a web archive related to a topic area you choose in consultation with your professor. Midway through the semester a Web Archive Collection Plan will be due describing the plan for this collection. The semester project will be due the last full week of the semester.
50 |
51 | ### Examinations
52 | There are no midterm or final examinations in this course.
53 |
54 | ## How to Succeed in this Course
55 | Office hours offer you an opportunity to ask for clarification or find support with understanding class material. I encourage you to connect with me for support. Additional office hours will be offered virtually as the semester concludes. Your success is my goal.
56 |
57 | I have blocked out my schedule on Wednesdays from 2:00 until 4:00 for scheduling Zoom-based office hours. If you are not able to meet during these times, please send me an email and we can find a better date and time that will work for both of us.
58 |
59 | The University of North Texas makes reasonable academic accommodation for students with disabilities. Students seeking reasonable accommodation must first register with the Office of Disability Access (ODA) to verify their eligibility. If a disability is verified, the ODA will provide you with a reasonable accommodation letter to be delivered to faculty to begin a private discussion regarding your specific needs in a course. You may request reasonable accommodations at any time; however, ODA notices of reasonable accommodation should be provided as early as possible in the semester to avoid any delay in implementation. Note that students must obtain a new letter of reasonable accommodation for every semester and must meet with each faculty member prior to implementation in each class. Students are strongly encouraged to deliver letters of reasonable accommodation during faculty office hours or by appointment. Faculty members have the authority to ask students to discuss such letters during their designated office hours to protect the privacy of the student. For additional information, refer to the Office of Disability Access website (http://www.unt.edu/oda). You may also contact ODA by phone at (940) 565-4323.
60 |
61 | I value the many perspectives students bring to our campus. Please work with me to create a classroom culture of open communication, mutual respect, and inclusion. All discussions should be respectful and civil. Although disagreements and debates are encouraged, personal attacks are unacceptable. Together, we can ensure a safe and welcoming classroom for all. If you ever feel like this is not the case, please see me during office hours and let me know. We are all learning together.
62 |
63 | ## Assessing Your Work
64 | Grades will be determined as follows:
65 |
66 | | Assignment Type | Point Distribution |
67 | |-----------------------------|-----------------------------|
68 | | Discussions (15 total) | 150 points (10 points each) |
69 | | Web Archive Critique | 50 points |
70 | | Web Archive Tools Critique | 50 points |
71 | | Web Archive Collection Plan | 50 points |
72 | | Web Archive Final Project | 100 points |
73 |
74 |
75 |
76 | | Grading Scale | Letter Grade |
77 | |---------------|--------------|
78 | | 90-100% | A |
79 | | 80-89% | B |
80 | | 70-79% | C |
81 | | 60-69% | D |
82 | | Below 60% | F |
83 |
84 |
85 | ### Late work
86 | All students are expected to submit their discussions, assignments, and final project by the due date. This prevents students from getting too far behind in the course and allows the instructor to assign grades in a consistent and timely manner.
87 |
88 | All students who do not complete their module assignments by 11:59 PM Central Time on Sunday will be penalized 15% of the module assignment’s points for each day late unless there are extenuating circumstances. A final project received after the due date will incur a 5-point deduction for each day late unless there are extenuating circumstances.
89 |
90 | The only exceptions are if (a) the student has a personal or family medical emergency, or (b) the student informs the instructor of a conflict well in advance and receives permission to turn in an assignment late.
91 |
92 | ### Incompletes
93 | A grade of incomplete (I) will be given only for justifiable reasons (such as a serious illness or military service) and only if you are passing the course. It is your responsibility to contact the instructor to request an incomplete and discuss requirements for completing the course. If you do not remove the incomplete within the period agreed upon with the instructor or within one calendar year, you will receive a grade of F. Please refer to https://registrar.unt.edu/grades/incompletes for more information.
94 |
95 | ### Withdrawal
96 | A grade of withdrawal (W) or withdrawal-failing (WF) will be given depending on your participation and grades to date. If you simply disappear and do not file a formal UNT withdrawal form, you may receive a grade of an F.
97 |
98 | ## Course Requirements / Schedule
99 |
100 | ### Schedule
101 |
102 | | Week | Date | Topic | Assignment Due | Points Possible |
103 | |--------------|-------|----------------------------------------------------------------------------|-----------------------------|-----------------|
104 | | | 01/17 | [Introduction to Class][module_00] | | |
105 | | Week 1 | 01/17 | [What is a Web Archive?][module_01] | **Module One** | |
106 | | | 01/22 | | Introduction | 10 pts. |
107 | | | 01/22 | | Discussion | 10 pts. |
108 | | Week 2 | 01/23 | [What is the Web][module_02] | **Module Two** | |
109 | | | 01/29 | | Discussion | 10 pts. |
110 | | Week 3 | 01/30 | [Who does Web Archiving?][module_03] | **Module Three** | |
111 | | | 01/30 | [Assignment 1: Web Archive Critique][assignment_01] | | |
112 | | | 02/05 | | Discussion | 10 pts. |
113 | | Week 4 | 02/06 | [Technology Overview][module_04] | **Module Four** | |
114 | | | 02/12 | | Discussion | 10 pts. |
115 | | | 02/12 | | Web Archive Critique | 50 pts. |
116 | | Week 5 | 02/13 | [Capture][module_05] | **Module Five** | |
117 | | | 02/19 | | Discussion | 10 pts. |
118 | | Week 6 | 02/20 | [Preserve][module_06] | **Module Six** | |
119 | | | 02/20 | [Assignment 2: Web Archive Tools Critique][assignment_02] | | |
120 | | | 02/26 | | Discussion | 10 pts. |
121 | | Week 7 | 02/27 | [Playback][module_07] | **Module Seven** | |
122 | | | 03/05 | | Discussion | 10 pts. |
123 | | Week 8 | 03/06 | [Other Tools][module_08] | **Module Eight** | |
124 | | | 03/12 | | Discussion | 10 pts. |
125 | | | 03/12 | | Web Archive Tools Critique | 50 pts. |
126 | | Spring Break | 03/13 | | | |
127 | | | | | | |
128 | | Week 9 | 03/20 | [Collection Policies][module_09] | **Module Nine** | |
129 | | | 03/20 | [Assignment 3: Web Archive Collection Plan][assignment_03] | | |
130 | | | 03/26 | | Discussion | 10 pts. |
131 | | Week 10 | 03/27 | [Metadata][module_10] | **Module Ten** | |
132 | | | 04/02 | | Discussion | 10 pts. |
133 | | Week 11 | 04/03 | [Quality Assurance][module_11] | **Module Eleven** | |
134 | | | 04/09 | | Discussion | 10 pts. |
135 | | | 04/09 | | Web Archive Collection Plan | 50 pts. |
136 | | Week 12 | 04/10 | [Research with Web Archives][module_12] | **Module Twelve** | |
137 | | | 04/10 | [Final Project: Building a Web Archive][assignment_04] | | |
138 | | | 04/16 | | Discussion | 10 pts. |
139 | | Week 13 | 04/17 | [Intellectual Property][module_13] | **Module Thirteen** | |
140 | | | 04/23 | | Discussion | 10 pts. |
141 | | Week 14 | 04/24 | [Future of Web Archives][module_14] | **Module Fourteen** | |
142 | | | 04/30 | | Discussion | 10 pts. |
143 | | Week 15 | 05/04 | | Final Project Due | 100 pts. |
144 | | | | | | |
145 | | Week 16 | 05/08 | Finals Week | No Assignments | |
146 |
147 |
148 | Every student in my class can improve by doing their own work and trying their hardest with access to appropriate resources. Students who use other people’s work without citations will be violating UNT’s Academic Integrity Policy. Please read and follow this important set of guidelines for your academic success (https://policy.unt.edu/policy/06-003). If you have questions about this, or any UNT policy, please email me or come discuss this with me during my office hours.
149 |
150 | ## Attendance and Participation
151 |
152 | Success in this course is dependent on your active participation and engagement throughout the course. As such, students are required to complete all assignments by the due date, and to actively participate in class discussions.
153 |
154 | Additionally, students are expected to:
155 | * Log on at least twice a week, ideally on different days, in order to complete weekly assignments, assessments, discussions, and/or other weekly deliverables as directed by the instructor and outlined in the syllabus;
156 | * Participate in the weekly threaded discussions. In addition to posting a response to the thread topic presented, students are expected to respond to each other and to comments and questions from the instructor and/or other students.
157 |
158 | If you find that you cannot meet the class's minimum discussion requirements due to extenuating circumstances, please contact me as soon as possible.
159 |
160 | Students will not be marked present for the course in a particular week if they have not posted on the discussion forum, submitted an assignment or essay, or completed an assessment (if one is administered) that week.
161 |
162 | Please inform the professor and instructional team if you are unable to attend class meetings because you are ill, out of consideration for the health and safety of everyone in our community. If you are experiencing any symptoms of COVID (https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html), please seek medical attention from the Student Health and Wellness Center (940-565-2333 or askSHWC@unt.edu) or your health care provider PRIOR to coming to campus. UNT also requires you to contact the UNT COVID Team at COVID@unt.edu for guidance on actions to take due to symptoms, pending or positive test results, or potential exposure.
163 |
164 | [module_00]: ./modules/module-00-introductions.md
165 | [module_01]: ./modules/module-01-what-is-a-web-archive.md
166 | [module_02]: ./modules/module-02-what-is-the-web.md
167 | [module_03]: ./modules/module-03-who-does-web-archiving.md
168 | [module_04]: ./modules/module-04-technology-overview.md
169 | [module_05]: ./modules/module-05-capture.md
170 | [module_06]: ./modules/module-06-preserve.md
171 | [module_07]: ./modules/module-07-playback.md
172 | [module_08]: ./modules/module-08-other-tools.md
173 | [module_09]: ./modules/module-09-collection-policies.md
174 | [module_10]: ./modules/module-10-metadata.md
175 | [module_11]: ./modules/module-11-quality-assurance.md
176 | [module_12]: ./modules/module-12-research.md
177 | [module_13]: ./modules/module-13-intellectual-property-ethics.md
178 | [module_14]: ./modules/module-14-future-of-web-archive.md
179 |
180 | [assignment_01]: ./assignments/assignment-01.md
181 | [assignment_02]: ./assignments/assignment-02.md
182 | [assignment_03]: ./assignments/assignment-03.md
183 | [assignment_04]: ./assignments/assignment-04.md
184 |
185 |
--------------------------------------------------------------------------------