25 |
26 |
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/wasteland-nav.xhtml:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
20 |
30 |
31 |
32 |
33 |
--------------------------------------------------------------------------------
/content/epub20_minimal/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
--------------------------------------------------------------------------------
/content/epub20__invalid_entity/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
--------------------------------------------------------------------------------
/content/epub20_minimal_encryption/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_no_fallback/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_crazy_columns.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:18:57+02:00
4 |
5 | 2016-04-01T13:11:36Z
6 | application/epub+zip
7 | 2.0
8 | Well-formed
9 | application/epub+zip
10 |
11 | CharacterCount273
12 | Languageen
13 | Info
14 | Identifierurn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6
15 | CreationDate2016-04-01T13:11:36Z
16 | TitleCrazy Columns
17 | CreatorJohan van der Knijff
18 | Date2016-04-01
19 |
20 |
21 |
22 |
23 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_crazy_fixed_layout.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:26+02:00
4 |
5 | 2016-03-31T17:10:22Z
6 | application/epub+zip
7 | 2.0
8 | Well-formed
9 | application/epub+zip
10 |
11 | CharacterCount491
12 | Languageen
13 | Info
14 | Identifierurn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6
15 | CreationDate2016-03-31T17:10:22Z
16 | TitleCrazy Fixed Layout
17 | CreatorJohan van der Knijff
18 | Date2016-03-31
19 |
20 |
21 |
22 |
23 |
--------------------------------------------------------------------------------
/content/epub20_missingfontresource/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback_noID/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
--------------------------------------------------------------------------------
/content/epub20_xpgt/OEBPS/content.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
5 | When (not) to migrate a PDF to PDF/A
6 | Johan van der Knijff
7 | en
8 |
9 |
10 | 2015-03-03
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
--------------------------------------------------------------------------------
/analyse.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Analyse EPUBs in build directory with epubcheck (both versions 3 and 4)
4 |
5 | # Location of epubcheck jars (update according to your own system)
6 | epubcheck3Jar=/usr/share/java/epubcheck.jar
7 | epubcheck4Jar=/home/johan/epubcheck/epubcheck.jar
8 |
9 | # ---- No need to edit anything below this line, unless you know what you're doing!
10 |
11 | # Installation directory
12 | instDir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
13 |
14 | # Build directory - where newly created epubs are stored
15 | buildDir="$instDir"/build/
16 |
17 | # Epubcheck output directory - epubcheck output goes here
18 | eCOutDir="$instDir"/epubcheckout/
19 |
20 | # Subdirs for epubcheck versions
21 | ec3OutDir="$eCOutDir"3.0.1/
22 | ec4OutDir="$eCOutDir"4.0.1/
23 |
24 | # Create output directory structure if it doesn't exist already
25 |
26 | if ! [ -d "$eCOutDir" ] ; then
27 | mkdir "$eCOutDir"
28 | fi
29 |
30 | if ! [ -d "$ec3OutDir" ] ; then
31 | mkdir "$ec3OutDir"
32 | fi
33 |
34 | if ! [ -d "$ec4OutDir" ] ; then
35 | mkdir "$ec4OutDir"
36 | fi
37 |
38 | # **************
39 | # MAIN PROCESSING LOOP
40 | # **************
41 |
42 | counter=0
43 |
44 | while IFS= read -r -d $'\0' epub ; do
45 | # Base name (strip away path)
46 | epubFileName=$(basename "$epub")
47 | epubBaseName="${epubFileName%.*}"
48 |
49 | # Generate epubcheck output file names
50 | ec3OutName="$ec3OutDir$epubBaseName".xml
51 | ec4OutName="$ec4OutDir$epubBaseName".xml
52 |
53 | # Run Epubcheck
54 | java -jar "$epubcheck3Jar" "$epub" -out "$ec3OutName" # 2>tmpec3.stderr
55 | java -jar "$epubcheck4Jar" "$epub" -out "$ec4OutName" # 2>tmpec4.stderr
56 |
57 | done < <(find "$buildDir" -maxdepth 1 -mindepth 1 -type f -print0)
58 |
59 |
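A possible follow-up step, not part of the original repository: once analyse.sh has filled the epubcheckout/3.0.1/ and epubcheckout/4.0.1/ directories, the reports can be tallied per epubcheck version. The sketch below is a hypothetical helper (`summarise_reports` is an illustrative name, not an existing tool). It assumes each report contains the literal status text "Well-formed" or "Not well-formed", as the epubcheck outputs in this repository do, and it only greps for that text rather than parsing the XML.

```shell
#!/bin/bash
# Hypothetical helper, not part of analyse.sh: count epubcheck reports
# in one output directory whose status is "Not well-formed".
# Assumption: the status appears as literal text in each .xml report.

summarise_reports() {
    local outDir="$1"
    local total=0
    local invalid=0
    local report
    # With nullglob, an empty directory gives zero loop iterations
    # instead of the unexpanded pattern "*.xml"
    shopt -s nullglob
    for report in "$outDir"/*.xml; do
        total=$((total + 1))
        if grep -q "Not well-formed" "$report"; then
            invalid=$((invalid + 1))
            echo "NOT WELL-FORMED: $(basename "$report")"
        fi
    done
    shopt -u nullglob
    echo "$invalid of $total reports not well-formed"
}
```

Usage (assuming the directory layout created by analyse.sh): `summarise_reports "$instDir"/epubcheckout/3.0.1/`, then the same for 4.0.1/ to compare how the two epubcheck versions judge the same EPUBs.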
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/wasteland.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 | The Waste Land
9 |
10 |
11 |
12 |
13 |
14 | I. THE BURIAL OF THE DEAD
15 |
16 |
17 |
18 |
19 |
20 | II. A GAME OF CHESS
21 |
22 |
23 |
24 |
25 |
26 | III. THE FIRE SERMON
27 |
28 |
29 |
30 |
31 |
32 | IV. DEATH BY WATER
33 |
34 |
35 |
36 |
37 |
38 | V. WHAT THE THUNDER SAID
39 |
40 |
41 |
42 |
43 |
44 | NOTES ON "THE WASTE LAND"
45 |
46 |
47 |
48 |
49 |
50 |
--------------------------------------------------------------------------------
/content/epub20_xpgt/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_minimal/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20__invalid_entity/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_minimal_encryption/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_missingfontresource/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_no_fallback/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback_noID/OEBPS/toc.ncx:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | Unknown
13 |
14 |
15 |
16 |
17 | When (not) to migrate a PDF to PDF/A
18 |
19 |
20 |
21 |
22 | PDF/A is a profile
23 |
24 |
25 |
26 |
27 |
28 | Loss, alteration during migration
29 |
30 |
31 |
32 |
33 |
34 | Complexity and effect of errors
35 |
36 |
37 |
38 |
39 |
40 | Digitised vs born-digital
41 |
42 |
43 |
44 |
45 |
46 | Conclusions
47 |
48 |
49 |
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/epub20_crazy_fixed_layout/OEBPS/Styles/styles.css:
--------------------------------------------------------------------------------
1 | body
2 | {
3 | left:0;
4 | top:0;
5 | margin:0;
6 | position:absolute;
7 | width:700px;
8 | height:1100px;
9 | border: 0;
10 | }
11 |
12 | .heading1
13 | {
14 | font-family:century;
15 | font-size:25px;
16 | color:#231F20;
17 | line-height: 1.7em;
18 | text-align: justify;
19 | width: 550px;
20 | }
21 |
22 | #h01
23 | {
24 | position:absolute;
25 | left:40px;
26 | top:20px;
27 | letter-spacing:0.6px;
28 | word-spacing:0.1em;
29 | }
30 |
31 | .para
32 | {
33 | font-family:century;
34 | font-size:18.5px;
35 | color:#231F20;
36 | line-height: 1.7em;
37 | text-align: justify;
38 | width: 550px;
39 | }
40 |
41 | #p01
42 | {
43 | position:absolute;
44 | left:40px;
45 | top:80px;
46 | letter-spacing:0.42px;
47 | word-spacing:0.1em;
48 | }
49 | #p02
50 | {
51 | position:absolute;
52 | left:40px;
53 | top:120px;
54 | letter-spacing:0.42px;
55 | word-spacing:0.1em;
56 | }
57 | #p03
58 | {
59 | position:absolute;
60 | left:40px;
61 | top:160px;
62 | letter-spacing:0.42px;
63 | word-spacing:0.1em;
64 | }
65 | #p04
66 | {
67 | position:absolute;
68 | left:40px;
69 | top:200px;
70 | letter-spacing:0.42px;
71 | word-spacing:0.1em;
72 | }
73 | #p05
74 | {
75 | position:absolute;
76 | left:40px;
77 | top:240px;
78 | letter-spacing:0.42px;
79 | word-spacing:0.1em;
80 | }
81 | #p06
82 | {
83 | position:absolute;
84 | left:40px;
85 | top:280px;
86 | letter-spacing:0.42px;
87 | word-spacing:0.1em;
88 | }
89 | #p07
90 | {
91 | position:absolute;
92 | left:40px;
93 | top:320px;
94 | letter-spacing:0.42px;
95 | word-spacing:0.1em;
96 | }
97 | #p08
98 | {
99 | position:absolute;
100 | left:40px;
101 | top:360px;
102 | letter-spacing:0.42px;
103 | word-spacing:0.1em;
104 | }
105 | #p09
106 | {
107 | position:absolute;
108 | left:40px;
109 | top:400px;
110 | letter-spacing:0.42px;
111 | word-spacing:0.1em;
112 | }
113 | #p10
114 | {
115 | position:absolute;
116 | left:40px;
117 | top:440px;
118 | letter-spacing:0.42px;
119 | word-spacing:0.1em;
120 | }
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_minimal_encryption.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:52+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Not well-formed
9 |
10 | ERROR: : OPS/XHTML file OEBPS/Text/pdfMigration.html cannot be decrypted
11 | ERROR: /OEBPS/toc.ncx(24): 'pdfa-is-a-profile': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
12 | ERROR: /OEBPS/toc.ncx(30): 'loss-alteration-during-migration': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
13 | ERROR: /OEBPS/toc.ncx(36): 'complexity-and-effect-of-errors': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
14 | ERROR: /OEBPS/toc.ncx(42): 'digitised-vs-born-digital': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
15 | ERROR: /OEBPS/toc.ncx(48): 'conclusions': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
16 |
17 | application/epub+zip
18 |
19 | CharacterCount25
20 | Languageen
21 | Info
22 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
23 | CreationDate2015-06-02T16:34:06Z
24 | TitleWhen (not) to migrate a PDF to PDF/A
25 | CreatorJohan van der Knijff
26 | Date2015-03-03
27 |
28 | hasEncryptiontrue
29 |
30 |
31 |
32 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_encryption_binary_content.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:18+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Not well-formed
9 |
10 | ERROR: : OPS/XHTML file OEBPS/Text/pdfMigration.html cannot be decrypted
11 | ERROR: /OEBPS/toc.ncx(24): 'pdfa-is-a-profile': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
12 | ERROR: /OEBPS/toc.ncx(30): 'loss-alteration-during-migration': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
13 | ERROR: /OEBPS/toc.ncx(36): 'complexity-and-effect-of-errors': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
14 | ERROR: /OEBPS/toc.ncx(42): 'digitised-vs-born-digital': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
15 | ERROR: /OEBPS/toc.ncx(48): 'conclusions': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
16 |
17 | application/epub+zip
18 |
19 | CharacterCount25
20 | Languageen
21 | Info
22 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
23 | CreationDate2015-06-02T16:34:06Z
24 | TitleWhen (not) to migrate a PDF to PDF/A
25 | CreatorJohan van der Knijff
26 | Date2015-03-03
27 |
28 | hasEncryptiontrue
29 |
30 |
31 |
32 |
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/wasteland.opf:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | code.google.com.epub-samples.wasteland-otf-obfuscated
5 | The Waste Land
6 | T.S. Eliot
7 | en-US
8 | 2011-09-01
9 | OTF font obfuscated using algorithm defined in OCF 3.0, fallback to sans-serif system font
10 | 2012-01-18T12:47:00Z
11 |
12 | This work is shared with the public using the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.
13 |
14 | http://code.google.com/p/epub-samples/
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
--------------------------------------------------------------------------------
/content/epub20_xpgt/OEBPS/page-template.xpgt:
--------------------------------------------------------------------------------
1 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 |
47 |
48 |
49 |
50 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_dtbook.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:36+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Well-formed
12 | application/epub+zip
13 |
14 |
15 | Language
16 |
17 | en
18 |
19 |
20 |
21 | Info
22 |
23 |
24 | Identifier
25 |
26 | C00000
27 |
28 |
29 |
30 | CreationDate
31 |
32 | 2015-06-02T16:34:06Z
33 |
34 |
35 |
36 | Title
37 |
38 | Valentin Haüy - the father of the education for the blind
39 |
40 |
41 |
42 | Creator
43 |
44 | Beatrice Christensen Sköld
45 |
46 |
47 |
48 | Date
49 |
50 | 2007-08-09
51 |
52 |
53 |
54 | Publisher
55 |
56 | TPB
57 |
58 |
59 |
60 |
61 |
62 | MediaTypes
63 |
64 | image/jpeg
65 | text/css
66 | application/x-dtbook+xml
67 | application/x-dtbncx+xml
68 |
69 |
70 |
71 |
72 |
73 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_minimal.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:59+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Well-formed
9 | application/epub+zip
10 |
11 | CharacterCount4520
12 | Languageen
13 | Info
14 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
15 | CreationDate2015-06-02T16:34:06Z
16 | TitleWhen (not) to migrate a PDF to PDF/A
17 | CreatorJohan van der Knijff
18 | Date2015-03-03
19 |
20 | References
21 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format
22 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format
23 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
24 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
25 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
26 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
27 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
28 |
29 |
30 |
31 |
32 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_crazy_columns.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:01+02:00
7 |
8 | 2016-04-01T13:11:36Z
9 | application/epub+zip
10 | 2.0.1
11 | Well-formed
12 |
13 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (13-1)
14 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (18-7)
15 |
16 | application/epub+zip
17 |
18 |
19 | CharacterCount
20 |
21 | 273
22 |
23 |
24 |
25 | Language
26 |
27 | en
28 |
29 |
30 |
31 | Info
32 |
33 |
34 | Identifier
35 |
36 | urn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6
37 |
38 |
39 |
40 | CreationDate
41 |
42 | 2016-04-01T13:11:36Z
43 |
44 |
45 |
46 | Title
47 |
48 | Crazy Columns
49 |
50 |
51 |
52 | Creator
53 |
54 | Johan van der Knijff
55 |
56 |
57 |
58 | Date
59 |
60 | 2016-04-01
61 |
62 |
63 |
64 |
65 |
66 | MediaTypes
67 |
68 | application/x-dtbncx+xml
69 | application/xhtml+xml
70 | text/css
71 |
72 |
73 |
74 |
75 |
76 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_foreign_resource_with_fallback.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:45+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Well-formed
9 | application/epub+zip
10 |
11 | CharacterCount4521
12 | Languageen
13 | Info
14 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
15 | CreationDate2015-06-02T16:34:06Z
16 | TitleWhen (not) to migrate a PDF to PDF/A
17 | CreatorJohan van der Knijff
18 | Date2015-03-03
19 |
20 | References
21 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format
22 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format
23 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
24 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
25 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
26 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
27 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
28 |
29 |
30 |
31 |
32 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_foreign_resource_with_fallback_noID.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:12+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Well-formed
9 | application/epub+zip
10 |
11 | CharacterCount4521
12 | Languageen
13 | Info
14 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
15 | CreationDate2015-06-02T16:34:06Z
16 | TitleWhen (not) to migrate a PDF to PDF/A
17 | CreatorJohan van der Knijff
18 | Date2015-03-03
19 |
20 | References
21 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format
22 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format
23 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
24 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
25 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
26 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
27 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
28 |
29 |
30 |
31 |
32 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_foreign_resource_no_fallback.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:18:44+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Not well-formed
9 |
10 | ERROR: /OEBPS/Text/pdfMigration.html(20): non-standard image resource 'OEBPS/Images/pdfVenn.jp2' of type 'image/jp2'
11 |
12 | application/epub+zip
13 |
14 | CharacterCount4520
15 | Languageen
16 | Info
17 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
18 | CreationDate2015-06-02T16:34:06Z
19 | TitleWhen (not) to migrate a PDF to PDF/A
20 | CreatorJohan van der Knijff
21 | Date2015-03-03
22 |
23 | References
24 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format
25 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format
26 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
27 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
28 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
29 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
30 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
31 |
32 |
33 |
34 |
35 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_xpgt.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:20:05+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Not well-formed
9 |
10 | ERROR: /OEBPS/Text/pdfMigration.html(9): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.
11 |
12 | application/epub+zip
13 |
14 | CharacterCount4526
15 | Languageen
16 | Info
17 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
18 | CreationDate2015-06-02T16:34:06Z
19 | TitleWhen (not) to migrate a PDF to PDF/A
20 | CreatorJohan van der Knijff
21 | Date2015-03-03
22 |
23 | References
24 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format
25 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format
26 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
27 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
28 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
29 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
30 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
31 |
32 |
33 |
34 |
35 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20__invalid_entity.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:39+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Not well-formed
9 |
10 | ERROR: /OEBPS/Text/pdfMigration.html(17): An invalid XML character (Unicode: 0xb) was found in the element content of the document.
11 | ERROR: /OEBPS/Text/pdfMigration.html: An invalid XML character (Unicode: 0xb) was found in the element content of the document.
12 | ERROR: /OEBPS/toc.ncx(30): 'loss-alteration-during-migration': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
13 | ERROR: /OEBPS/toc.ncx(36): 'complexity-and-effect-of-errors': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
14 | ERROR: /OEBPS/toc.ncx(42): 'digitised-vs-born-digital': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
15 | ERROR: /OEBPS/toc.ncx(48): 'conclusions': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html'
16 |
17 | application/epub+zip
18 |
19 | CharacterCount25
20 | Languageen
21 | Info
22 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
23 | CreationDate2015-06-02T16:34:06Z
24 | TitleWhen (not) to migrate a PDF to PDF/A
25 | CreatorJohan van der Knijff
26 | Date2015-03-03
27 |
28 | References
29 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format
30 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format
31 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
32 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
33 |
34 |
35 |
36 |
37 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub20_missingfontresource.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:18:50+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | application/epub+zip
7 | 2.0
8 | Not well-formed
9 |
10 | ERROR: /OEBPS/stylesheet.css(6): 'OEBPS/Fonts/CourierStd.otf': referenced resource missing in the package.
11 |
12 | application/epub+zip
13 |
14 | CharacterCount4523
15 | Languageen
16 | Info
17 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
18 | CreationDate2015-06-02T16:34:06Z
19 | TitleWhen (not) to migrate a PDF to PDF/A
20 | CreatorJohan van der Knijff
21 | Date2015-03-03
22 |
23 | Fonts
24 | Font
25 | FontNameCourier
26 | FontFiletrue
27 |
28 |
29 | References
30 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format
31 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format
32 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
33 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
34 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
35 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
36 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
37 |
38 |
39 |
40 |
41 |
--------------------------------------------------------------------------------
/epubcheckout/3.0.1/epub30_font_obfuscation.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 2017-08-29T12:19:05+02:00
4 |
5 | 2015-06-02T16:34:06Z
6 | 2012-01-18T12:47:00Z
7 | application/epub+zip
8 | 3.0
9 | Well-formed
10 |
11 | WARN: /EPUB/wasteland.ncx: meta@dtb:uid content 'code.google.com.epub-samples.wasteland-basic' should conform to unique-identifier in content.opf: 'code.google.com.epub-samples.wasteland-otf-obfuscated'
12 |
13 | application/epub+zip
14 |
15 | CharacterCount31671
16 | Languageen-US
17 | Info
18 | Identifiercode.google.com.epub-samples.wasteland-otf-obfuscated
19 | CreationDate2015-06-02T16:34:06Z
20 | ModDate2012-01-18T12:47:00Z
21 | TitleThe Waste Land
22 | CreatorT.S. Eliot
23 | Date2011-09-01
24 | RightsThis work is shared with the public using the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.
25 |
26 | Fonts
27 | Font
28 | FontNameOldStandard
29 | FontFiletrue
30 |
31 | Font
32 | FontNameOldStandard,bold
33 | FontFiletrue
34 |
35 | Font
36 | FontNameOldStandard,italic
37 | FontFiletrue
38 |
39 |
40 | References
41 | Referencehttp://creativecommons.org/licenses/by-sa/3.0/
42 | Referencehttp://en.wikipedia.org/wiki/Simon_Fieldhouse
43 |
44 | hasEncryptiontrue
45 |
46 |
47 |
48 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_minimal_encryption.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:56+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Not well-formed
12 |
13 | RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_minimal_encryption.epub
14 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67)
15 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82)
16 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81)
17 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75)
18 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61)
19 |
20 | application/epub+zip
21 |
22 |
23 | CharacterCount
24 |
25 | 25
26 |
27 |
28 |
29 | Language
30 |
31 | en
32 |
33 |
34 |
35 | Info
36 |
37 |
38 | Identifier
39 |
40 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
41 |
42 |
43 |
44 | CreationDate
45 |
46 | 2015-06-02T16:34:06Z
47 |
48 |
49 |
50 | Title
51 |
52 | When (not) to migrate a PDF to PDF/A
53 |
54 |
55 |
56 | Creator
57 |
58 | Johan van der Knijff
59 |
60 |
61 |
62 | Date
63 |
64 | 2015-03-03
65 |
66 |
67 |
68 |
69 |
70 | MediaTypes
71 |
72 | application/x-dtbncx+xml
73 | application/xhtml+xml
74 | image/png
75 | image/jpeg
76 |
77 |
78 |
79 | hasEncryption
80 |
81 | true
82 |
83 |
84 |
85 |
86 |
87 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_encryption_binary_content.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:23+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Not well-formed
12 |
13 | RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_encryption_binary_content.epub
14 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67)
15 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82)
16 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81)
17 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75)
18 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61)
19 | HTM-023, WARN, [An invalid XHTML Named Entity was found: '&0;'.], OEBPS/Text/pdfMigration.html (18-197)
20 | HTM-023, WARN, [An invalid XHTML Named Entity was found: '&l0xb'.], OEBPS/Text/pdfMigration.html (291-6)
21 |
22 | application/epub+zip
23 |
24 |
25 | CharacterCount
26 |
27 | 25
28 |
29 |
30 |
31 | Language
32 |
33 | en
34 |
35 |
36 |
37 | Info
38 |
39 |
40 | Identifier
41 |
42 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
43 |
44 |
45 |
46 | CreationDate
47 |
48 | 2015-06-02T16:34:06Z
49 |
50 |
51 |
52 | Title
53 |
54 | When (not) to migrate a PDF to PDF/A
55 |
56 |
57 |
58 | Creator
59 |
60 | Johan van der Knijff
61 |
62 |
63 |
64 | Date
65 |
66 | 2015-03-03
67 |
68 |
69 |
70 |
71 |
72 | MediaTypes
73 |
74 | application/x-dtbncx+xml
75 | application/xhtml+xml
76 | image/png
77 | image/jpeg
78 |
79 |
80 |
81 | hasEncryption
82 |
83 | true
84 |
85 |
86 |
87 |
88 |
89 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_crazy_fixed_layout.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:30+02:00
7 |
8 | 2016-03-31T17:10:22Z
9 | application/epub+zip
10 | 2.0.1
11 | Well-formed
12 |
13 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (6-2)
14 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (24-1)
15 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (43-1)
16 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (51-1)
17 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (59-1)
18 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (67-1)
19 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (75-1)
20 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (83-1)
21 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (91-1)
22 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (99-1)
23 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (107-1)
24 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (115-1)
25 |
26 | application/epub+zip
27 |
28 |
29 | CharacterCount
30 |
31 | 491
32 |
33 |
34 |
35 | Language
36 |
37 | en
38 |
39 |
40 |
41 | Info
42 |
43 |
44 | Identifier
45 |
46 | urn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6
47 |
48 |
49 |
50 | CreationDate
51 |
52 | 2016-03-31T17:10:22Z
53 |
54 |
55 |
56 | Title
57 |
58 | Crazy Fixed Layout
59 |
60 |
61 |
62 | Creator
63 |
64 | Johan van der Knijff
65 |
66 |
67 |
68 | Date
69 |
70 | 2016-03-31
71 |
72 |
73 |
74 |
75 |
76 | MediaTypes
77 |
78 | application/x-dtbncx+xml
79 | application/xhtml+xml
80 | text/css
81 |
82 |
83 |
84 |
85 |
86 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_minimal.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:20:03+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Well-formed
12 | application/epub+zip
13 |
14 |
15 | CharacterCount
16 |
17 | 4520
18 |
19 |
20 |
21 | Language
22 |
23 | en
24 |
25 |
26 |
27 | Info
28 |
29 |
30 | Identifier
31 |
32 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
33 |
34 |
35 |
36 | CreationDate
37 |
38 | 2015-06-02T16:34:06Z
39 |
40 |
41 |
42 | Title
43 |
44 | When (not) to migrate a PDF to PDF/A
45 |
46 |
47 |
48 | Creator
49 |
50 | Johan van der Knijff
51 |
52 |
53 |
54 | Date
55 |
56 | 2015-03-03
57 |
58 |
59 |
60 |
61 |
62 | References
63 |
64 |
65 | Reference
66 |
67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format
68 |
69 |
70 |
71 | Reference
72 |
73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
74 |
75 |
76 |
77 | Reference
78 |
79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
80 |
81 |
82 |
83 | Reference
84 |
85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
86 |
87 |
88 |
89 | Reference
90 |
91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
92 |
93 |
94 |
95 | Reference
96 |
97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
98 |
99 |
100 |
101 |
102 |
103 | MediaTypes
104 |
105 | application/x-dtbncx+xml
106 | application/xhtml+xml
107 | image/png
108 | image/jpeg
109 |
110 |
111 |
112 |
113 |
114 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20__invalid_entity.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:43+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Not well-formed
12 |
13 | RSC-016, FATAL, [Fatal Error while parsing file 'An invalid XML character (Unicode: 0xb) was found in the element content of the document.'.], OEBPS/Text/pdfMigration.html (17-520)
14 | RSC-005, ERROR, [Error while parsing file 'An invalid XML character (Unicode: 0xb) was found in the element content of the document.'.], OEBPS/Text/pdfMigration.html
15 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82)
16 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81)
17 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75)
18 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61)
19 |
20 | application/epub+zip
21 |
22 |
23 | CharacterCount
24 |
25 | 25
26 |
27 |
28 |
29 | Language
30 |
31 | en
32 |
33 |
34 |
35 | Info
36 |
37 |
38 | Identifier
39 |
40 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
41 |
42 |
43 |
44 | CreationDate
45 |
46 | 2015-06-02T16:34:06Z
47 |
48 |
49 |
50 | Title
51 |
52 | When (not) to migrate a PDF to PDF/A
53 |
54 |
55 |
56 | Creator
57 |
58 | Johan van der Knijff
59 |
60 |
61 |
62 | Date
63 |
64 | 2015-03-03
65 |
66 |
67 |
68 |
69 |
70 | References
71 |
72 |
73 | Reference
74 |
75 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format
76 |
77 |
78 |
79 | Reference
80 |
81 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
82 |
83 |
84 |
85 | Reference
86 |
87 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
88 |
89 |
90 |
91 |
92 |
93 | MediaTypes
94 |
95 | application/x-dtbncx+xml
96 | application/xhtml+xml
97 | image/png
98 | image/jpeg
99 |
100 |
101 |
102 |
103 |
104 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_foreign_resource_with_fallback.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:50+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Well-formed
12 | application/epub+zip
13 |
14 |
15 | CharacterCount
16 |
17 | 4521
18 |
19 |
20 |
21 | Language
22 |
23 | en
24 |
25 |
26 |
27 | Info
28 |
29 |
30 | Identifier
31 |
32 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
33 |
34 |
35 |
36 | CreationDate
37 |
38 | 2015-06-02T16:34:06Z
39 |
40 |
41 |
42 | Title
43 |
44 | When (not) to migrate a PDF to PDF/A
45 |
46 |
47 |
48 | Creator
49 |
50 | Johan van der Knijff
51 |
52 |
53 |
54 | Date
55 |
56 | 2015-03-03
57 |
58 |
59 |
60 |
61 |
62 | References
63 |
64 |
65 | Reference
66 |
67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format
68 |
69 |
70 |
71 | Reference
72 |
73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
74 |
75 |
76 |
77 | Reference
78 |
79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
80 |
81 |
82 |
83 | Reference
84 |
85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
86 |
87 |
88 |
89 | Reference
90 |
91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
92 |
93 |
94 |
95 | Reference
96 |
97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
98 |
99 |
100 |
101 |
102 |
103 | MediaTypes
104 |
105 | application/x-dtbncx+xml
106 | application/xhtml+xml
107 | image/jp2
108 | image/png
109 | image/jpeg
110 |
111 |
112 |
113 |
114 |
115 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_foreign_resource_with_fallback_noID.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:19:16+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Well-formed
12 | application/epub+zip
13 |
14 |
15 | CharacterCount
16 |
17 | 4521
18 |
19 |
20 |
21 | Language
22 |
23 | en
24 |
25 |
26 |
27 | Info
28 |
29 |
30 | Identifier
31 |
32 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
33 |
34 |
35 |
36 | CreationDate
37 |
38 | 2015-06-02T16:34:06Z
39 |
40 |
41 |
42 | Title
43 |
44 | When (not) to migrate a PDF to PDF/A
45 |
46 |
47 |
48 | Creator
49 |
50 | Johan van der Knijff
51 |
52 |
53 |
54 | Date
55 |
56 | 2015-03-03
57 |
58 |
59 |
60 |
61 |
62 | References
63 |
64 |
65 | Reference
66 |
67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format
68 |
69 |
70 |
71 | Reference
72 |
73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
74 |
75 |
76 |
77 | Reference
78 |
79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
80 |
81 |
82 |
83 | Reference
84 |
85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
86 |
87 |
88 |
89 | Reference
90 |
91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
92 |
93 |
94 |
95 | Reference
96 |
97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
98 |
99 |
100 |
101 |
102 |
103 | MediaTypes
104 |
105 | application/x-dtbncx+xml
106 | application/xhtml+xml
107 | image/jp2
108 | image/png
109 | image/jpeg
110 |
111 |
112 |
113 |
114 |
115 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_xpgt.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:20:10+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Well-formed
12 | application/epub+zip
13 |
14 |
15 | CharacterCount
16 |
17 | 4526
18 |
19 |
20 |
21 | Language
22 |
23 | en
24 |
25 |
26 |
27 | Info
28 |
29 |
30 | Identifier
31 |
32 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
33 |
34 |
35 |
36 | CreationDate
37 |
38 | 2015-06-02T16:34:06Z
39 |
40 |
41 |
42 | Title
43 |
44 | When (not) to migrate a PDF to PDF/A
45 |
46 |
47 |
48 | Creator
49 |
50 | Johan van der Knijff
51 |
52 |
53 |
54 | Date
55 |
56 | 2015-03-03
57 |
58 |
59 |
60 |
61 |
62 | References
63 |
64 |
65 | Reference
66 |
67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format
68 |
69 |
70 |
71 | Reference
72 |
73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
74 |
75 |
76 |
77 | Reference
78 |
79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
80 |
81 |
82 |
83 | Reference
84 |
85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
86 |
87 |
88 |
89 | Reference
90 |
91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
92 |
93 |
94 |
95 | Reference
96 |
97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
98 |
99 |
100 |
101 |
102 |
103 | MediaTypes
104 |
105 | application/x-dtbncx+xml
106 | application/xhtml+xml
107 | image/png
108 | image/jpeg
109 | text/css
110 | application/vnd.adobe-page-template+xml
111 |
112 |
113 |
114 |
115 |
116 |
--------------------------------------------------------------------------------
/epubcheckout/4.0.1/epub20_foreign_resource_no_fallback.xml:
--------------------------------------------------------------------------------
1 |
2 |
6 | 2017-08-29T12:18:48+02:00
7 |
8 | 2015-06-02T16:34:06Z
9 | application/epub+zip
10 | 2.0.1
11 | Not well-formed
12 |
13 | MED-003, ERROR, [Non-standard image resource of type image/jp2 found.], OEBPS/Text/pdfMigration.html (20-63)
14 |
15 | application/epub+zip
16 |
17 |
18 | CharacterCount
19 |
20 | 4520
21 |
22 |
23 |
24 | Language
25 |
26 | en
27 |
28 |
29 |
30 | Info
31 |
32 |
33 | Identifier
34 |
35 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6
36 |
37 |
38 |
39 | CreationDate
40 |
41 | 2015-06-02T16:34:06Z
42 |
43 |
44 |
45 | Title
46 |
47 | When (not) to migrate a PDF to PDF/A
48 |
49 |
50 |
51 | Creator
52 |
53 | Johan van der Knijff
54 |
55 |
56 |
57 | Date
58 |
59 | 2015-03-03
60 |
61 |
62 |
63 |
64 |
65 | References
66 |
67 |
68 | Reference
69 |
70 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format
71 |
72 |
73 |
74 | Reference
75 |
76 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf
77 |
78 |
79 |
80 | Reference
81 |
82 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf
83 |
84 |
85 |
86 | Reference
87 |
88 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf
89 |
90 |
91 |
92 | Reference
93 |
94 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation
95 |
96 |
97 |
98 | Reference
99 |
100 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21
101 |
102 |
103 |
104 |
105 |
106 | MediaTypes
107 |
108 | application/x-dtbncx+xml
109 | application/xhtml+xml
110 | image/jp2
111 | image/jpeg
112 |
113 |
114 |
115 |
116 |
117 |
--------------------------------------------------------------------------------
/pubresources/pdfMigration.md:
--------------------------------------------------------------------------------
1 | # When (not) to migrate a PDF to PDF/A
2 |
3 | It is well known that PDF documents can contain features that are preservation risks (e.g. see [here](https://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format) and [here](http://wiki.opf-labs.org/display/TR/Portable+Document+Format)). Migration of existing *PDF*s to *PDF/A* is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can itself be quite risky. As I often get questions on this subject, I thought a short write-up might be worthwhile.
4 |
5 |
6 |
7 |
8 | ## *PDF/A* is a profile
9 | First, it's important to stress that the *PDF/A* standards (*A-1*, *A-2* and *A-3*) are really just *profiles* of the *PDF* format. More specifically, *PDF/A-1* is a subset of [*PDF 1.4*](http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf), whereas *PDF/A-2* and *PDF/A-3* are based on [the ISO 32000 version of *PDF 1.7*](http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf). What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' *PDF*. They also restrict how other features are implemented, for example by requiring that all fonts are embedded in the document. This is illustrated by the simple Venn diagram below, which shows the feature sets of the aforementioned *PDF* flavours:
10 |
11 | 
12 |
13 | Here we see how *PDF/A-1* is a subset of *PDF 1.4*, which in turn is a subset of *PDF 1.7*. *PDF/A-2* and *PDF/A-3* (aggregated here as one entity for the sake of readability) are subsets of *PDF 1.7*, and include all the features of *PDF/A-1*.
14 |
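15 | Keeping this in mind, it's easy to see that migrating an arbitrary *PDF* to *PDF/A* can result in problems: any feature of the source document that falls outside the *PDF/A* subset has to be either dropped or altered.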
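The subset relationships in the diagram can be mimicked with plain Python sets. This is only a toy model. The feature labels are illustrative placeholders based on the examples above (multimedia, encryption, interactive content, non-embedded fonts), not a complete list of what the *PDF/A* standards allow or prohibit. `features_at_risk` is a hypothetical helper. The sketch shows why any source feature outside the target profile ends up being dropped or altered.

```python
# Toy model of the Venn diagram: each PDF flavour is the set of features it permits.
# The labels are illustrative placeholders, NOT an authoritative feature list.
PDF_1_4 = frozenset({
    "text", "raster images", "embedded fonts", "non-embedded fonts",
    "multimedia", "encryption", "interactive content",
})
# "newer features" stands in for everything PDF 1.7 adds on top of PDF 1.4.
PDF_1_7 = PDF_1_4 | {"newer features"}

# Features excluded by the PDF/A profiles in this toy model. The requirement
# that all fonts are embedded is modelled as excluding "non-embedded fonts".
EXCLUDED_BY_PDF_A = frozenset({
    "multimedia", "encryption", "interactive content", "non-embedded fonts",
})
PDF_A_1 = PDF_1_4 - EXCLUDED_BY_PDF_A
PDF_A_2_3 = PDF_1_7 - EXCLUDED_BY_PDF_A  # A-2 and A-3 aggregated, as in the diagram


def features_at_risk(source_features, target_profile):
    """Return the source features outside the target profile, i.e. the ones
    a migration would have to drop or alter."""
    return set(source_features) - target_profile


# A source PDF with a movie and a non-embedded font:
print(sorted(features_at_risk({"text", "multimedia", "non-embedded fonts"}, PDF_A_1)))
# ['multimedia', 'non-embedded fonts']
```

A source that only uses features inside the profile yields an empty set. That is exactly the "unproblematic" case discussed further on.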
16 |
17 | ## Loss, alteration during migration
18 | Suppose, as an example, that we have a *PDF* that contains a movie. Multimedia is prohibited in *PDF/A*, so migrating to *PDF/A* will simply result in the loss of the movie. Fonts are another example: all fonts in a *PDF/A* document must be embedded. But what happens if the source *PDF* uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently substitute some other, perhaps similar, font? And how do you check for this?
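One partial answer to that last question is to list the fonts that are not embedded in the source before migrating. The sketch below assumes the font dictionaries have already been read into plain Python dicts by some *PDF* library. That input format and the helper names are hypothetical. The sketch relies on the font descriptor keys defined in the *PDF* Reference:

- An embedded font program is stored under `FontFile` (Type 1), `FontFile2` (TrueType) or `FontFile3` (other formats such as CFF).
- Composite (`Type0`) fonts keep the program in their descendant CIDFont.
- `Type3` glyphs are defined inside the file itself.

```python
from typing import Any, Iterable, Mapping

# Font descriptor keys that hold an embedded font program (PDF Reference):
# FontFile (Type 1), FontFile2 (TrueType), FontFile3 (e.g. CFF / OpenType).
FONT_FILE_KEYS = ("FontFile", "FontFile2", "FontFile3")


def is_embedded(font: Mapping[str, Any]) -> bool:
    """Return True if a (pre-parsed, hypothetical) font dict carries its font program."""
    subtype = font.get("Subtype")
    if subtype == "Type3":
        # Type 3 glyphs are PDF content streams stored in the file itself.
        return True
    if subtype == "Type0":
        # Composite font: the program lives in the descendant CIDFont.
        descendants = font.get("DescendantFonts", [])
        return bool(descendants) and all(is_embedded(d) for d in descendants)
    descriptor = font.get("FontDescriptor")
    if descriptor is None:
        # E.g. one of the standard 14 fonts, referenced by name only.
        return False
    return any(key in descriptor for key in FONT_FILE_KEYS)


def non_embedded_fonts(fonts: Iterable[Mapping[str, Any]]) -> list[str]:
    """Return the (sorted) BaseFont names of all fonts that are not embedded."""
    return sorted(f.get("BaseFont", "?") for f in fonts if not is_embedded(f))


fonts = [
    {"Subtype": "TrueType", "BaseFont": "Arial", "FontDescriptor": {"FontFile2": b"..."}},
    {"Subtype": "Type1", "BaseFont": "Courier"},  # referenced by name only
    {"Subtype": "Type0", "BaseFont": "MSMincho",
     "DescendantFonts": [{"Subtype": "CIDFontType2", "FontDescriptor": {"FontName": "MSMincho"}}]},
]
print(non_embedded_fonts(fonts))  # ['Courier', 'MSMincho']
```

This only covers the source side. It says nothing about which substitute font a given migration tool would actually pick.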
19 |
20 | ## Complexity and effect of errors
21 | Migrations like these also typically involve a complete reprocessing of the *PDF*'s internal structure. Given the format's complexity, there is a lot of potential for things to go wrong in this process. This is particularly true if the source *PDF* contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a [sufficiently reliable *PDF* validator](http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf)), these cases can be difficult to deal with. Some further considerations can be found [here](http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation) (the context there is slightly different, but the risks are similar).
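In the absence of a reliable validator, one crude safeguard is to compare the text extracted from the original and the migrated file, page by page. The sketch below is an assumption rather than an established method. The text is assumed to have already been extracted into one string per page, and the function name and the 0.99 similarity threshold are arbitrary choices. It uses `difflib.SequenceMatcher` from the Python standard library. It can only catch lost or altered text, not font substitution or layout changes.

```python
import difflib
from typing import Sequence


def changed_pages(source_pages: Sequence[str], migrated_pages: Sequence[str],
                  threshold: float = 0.99) -> list[int]:
    """Return 1-based numbers of pages whose extracted text differs noticeably.

    Pages that exist in only one of the two versions are always flagged.
    """
    flagged = []
    for number in range(1, max(len(source_pages), len(migrated_pages)) + 1):
        if number > len(source_pages) or number > len(migrated_pages):
            flagged.append(number)  # page missing from one of the versions
            continue
        ratio = difflib.SequenceMatcher(
            None, source_pages[number - 1], migrated_pages[number - 1]
        ).ratio()
        if ratio < threshold:
            flagged.append(number)
    return flagged


source = ["When (not) to migrate a PDF to PDF/A", "PDF/A is a profile", "Conclusions"]
migrated = ["When (not) to migrate a PDF to PDF/A", "PDF/A is a prof?le"]
print(changed_pages(source, migrated))  # [2, 3]
```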
22 |
23 | ## Digitised vs born-digital
24 | The origin of the source *PDF*s may be another thing to take into account. If *PDF*s were originally created as part of a digitisation project (e.g. scanned books), the *PDF* is usually little more than a wrapper around a set of page images, perhaps augmented by an OCR layer. Migrating such *PDF*s to *PDF/A* is fairly straightforward, since the source files are unlikely to contain any features that are not allowed in *PDF/A*. At the same time, this also means that the benefits of migrating such files to *PDF/A* are limited, since the source *PDF*s weren't problematic to begin with!
25 |
26 | The potential benefits of *PDF/A* may be more obvious for much born-digital content; however, for the reasons given in the previous sections, the migration is then also more complex, and a lot more can go wrong (see also [here](http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21) for some additional considerations).
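One rough way to tell the two categories apart before deciding on a migration is to look at what each page contains. The sketch below works on hypothetical per-page summaries. The `PageSummary` fields are assumptions about what a *PDF* parser could report, not an existing API. The sketch treats a document as digitised if every page is a single image plus, at most, invisible text. Text rendering mode 3 (set with the `Tr` operator) makes text invisible, which is how OCR layers are commonly stored. Real files will not always fit such a simple rule, so this is a triage heuristic at best.

```python
from dataclasses import dataclass, field


@dataclass
class PageSummary:
    """Hypothetical per-page summary, assumed to come from some PDF parser."""
    image_count: int
    # Text rendering modes (PDF 'Tr' operator) used on the page; 3 = invisible.
    text_render_modes: set[int] = field(default_factory=set)
    # Anything else: vector graphics, annotations, multimedia, scripts, ...
    has_other_content: bool = False


def looks_digitised(pages: list[PageSummary]) -> bool:
    """Heuristic: every page is one image, optionally with an invisible OCR layer."""
    return bool(pages) and all(
        page.image_count == 1
        and page.text_render_modes <= {3}  # no text, or invisible text only
        and not page.has_other_content
        for page in pages
    )


scanned = [PageSummary(image_count=1, text_render_modes={3}), PageSummary(image_count=1)]
born_digital = [PageSummary(image_count=0, text_render_modes={0}, has_other_content=True)]
print(looks_digitised(scanned), looks_digitised(born_digital))  # True False
```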
27 |
28 | ## Conclusions
29 | Although migrating *PDF* documents to *PDF/A* may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to succeed for source *PDF*s that weren't problematic to begin with, which defeats the very purpose of migrating to *PDF/A*. In specific cases, migration to *PDF/A* may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of *PDF*s (both source *and* destination!), it would also seem prudent to always keep the originals.
--------------------------------------------------------------------------------
/pubresources/pdfMigration.html:
--------------------------------------------------------------------------------
1 |
When (not) to migrate a PDF to PDF/A
2 |
It is well known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can itself be quite risky. As I often get questions on this subject, I thought a short write-up might be worthwhile.
3 |
4 |
5 |
6 |
PDF/A is a profile
7 |
First, it's important to stress that the PDF/A standards (A-1, A-2 and A-3) are really just profiles of the PDF format. More specifically, PDF/A-1 is a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. They also restrict how other features are implemented, for example by requiring that all fonts are embedded in the document. This is illustrated by the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:
8 |
9 |
PDF Venn diagram
10 |
11 |
Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.
12 |
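Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems: any feature of the source document that falls outside the PDF/A subset has to be either dropped or altered.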
13 |
Loss, alteration during migration
14 |
Suppose, as an example, that we have a PDF that contains a movie. Multimedia is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the movie. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently substitute some other, perhaps similar, font? And how do you check for this?
15 |
Complexity and effect of errors
16 |
Migrations like these also typically involve a complete reprocessing of the PDF's internal structure. Given the format's complexity, there is a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).
17 |
Digitised vs born-digital
18 |
The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a set of page images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is fairly straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are limited, since the source PDFs weren't problematic to begin with!
19 |
The potential benefits of PDF/A may be more obvious for much born-digital content; however, for the reasons given in the previous sections, the migration is then also more complex, and a lot more can go wrong (see also here for some additional considerations).
20 |
Conclusions
21 |
Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to succeed for source PDFs that weren't problematic to begin with, which defeats the very purpose of migrating to PDF/A. In specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.
It is well known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can itself be quite risky. As I often get questions on this subject, I thought a short write-up might be worthwhile.
14 |
15 |
PDF/A is a profile
16 |
17 |
First, it's important to stress that the PDF/A standards (A-1, A-2 and A-3) are really just profiles of the PDF format. More specifically, PDF/A-1 is a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. They also restrict how other features are implemented, for example by requiring that all fonts are embedded in the document. This is illustrated by the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:
18 |
19 |
20 |
21 |
22 |
PDF Venn diagram
23 |
24 |
25 |
Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.
26 |
27 |
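Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems: any feature of the source document that falls outside the PDF/A subset has to be either dropped or altered.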
28 |
29 |
Loss, alteration during migration
30 |
31 |
Suppose, as an example, that we have a PDF that contains a movie. Multimedia is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the movie. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently substitute some other, perhaps similar, font? And how do you check for this?
32 |
33 |
Complexity and effect of errors
34 |
35 |
Migrations like these also typically involve a complete reprocessing of the PDF's internal structure. Given the format's complexity, there is a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).
Digitised vs born-digital
The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are pretty limited, since the source PDFs weren't problematic to begin with!
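Whether a PDF is likely to be a digitised document can often be guessed from its structure. The sketch below counts image XObjects and checks whether any Flate-decoded stream uses text rendering mode 3 (`3 Tr`, invisible text), which OCR layers commonly rely on. It is a rough heuristic under stated assumptions: the stream matching ignores `/Length`, only FlateDecode is handled, and inline images are not counted. Because stream objects cannot be stored inside object streams, image XObject dictionaries remain visible in the raw bytes. `scan_profile` is a hypothetical name.

```python
import re
import zlib

# Naive stream matcher: ignores /Length and assumes an EOL after "stream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
# Image XObject dictionaries (not inside object streams, so visible as-is).
IMAGE_RE = re.compile(rb"/Subtype\s*/Image(?![A-Za-z])")
# Text rendering mode 3 ("3 Tr") paints invisible text, typical of OCR layers.
INVISIBLE_TEXT_RE = re.compile(rb"(?<![\d.])3\s+Tr(?![A-Za-z])")


def _decoded_streams(data: bytes):
    """Yield each stream's data, Flate-decoded where possible."""
    for raw in STREAM_RE.findall(data):
        try:
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # not Flate-encoded (or damaged): scan it as-is


def scan_profile(data: bytes) -> dict:
    """Rough indicators that a PDF is an image wrapper with an OCR layer."""
    return {
        "image_objects": len(IMAGE_RE.findall(data)),
        "invisible_text": any(
            INVISIBLE_TEXT_RE.search(s) for s in _decoded_streams(data)
        ),
    }
```

Many image objects combined with invisible text suggest a scan with an OCR layer, the kind of file the paragraph above describes as low-risk but also low-benefit to migrate.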
The potential benefits of PDF/A may be more obvious for a lot of born-digital content. However, for the reasons listed in the previous sections, the migration is more complex, and there's simply a lot more that can go wrong (see also here for some additional considerations).
Conclusions
Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to succeed for source PDFs that weren't problematic to begin with, which defeats the very purpose of migrating to PDF/A. For specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.