├── content
│   ├── epub20_xpgt
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.png
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── stylesheet.css
│   │   │   ├── Text
│   │   │   │   └── cover.xhtml
│   │   │   ├── content.opf
│   │   │   ├── toc.ncx
│   │   │   └── page-template.xpgt
│   │   └── META-INF
│   │       └── container.xml
│   ├── epub20_dtbook
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── valentin.jpg
│   │   │   └── package.opf
│   │   └── META-INF
│   │       └── container.xml
│   ├── epub20_minimal
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.png
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── Text
│   │   │   │   ├── cover.xhtml
│   │   │   │   └── pdfMigration.html
│   │   │   ├── content.opf
│   │   │   └── toc.ncx
│   │   └── META-INF
│   │       └── container.xml
│   ├── epub20_crazy_columns
│   │   ├── mimetype
│   │   ├── META-INF
│   │   │   ├── calibre_bookmarks.txt
│   │   │   └── container.xml
│   │   └── OEBPS
│   │       ├── Styles
│   │       │   └── styles.css
│   │       ├── toc.ncx
│   │       ├── content.opf
│   │       └── Text
│   │           └── Section0001.xhtml
│   ├── epub20__invalid_entity
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.png
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── Text
│   │   │   │   ├── cover.xhtml
│   │   │   │   └── pdfMigration.html
│   │   │   ├── content.opf
│   │   │   └── toc.ncx
│   │   └── META-INF
│   │       └── container.xml
│   ├── epub20_crazy_fixed_layout
│   │   ├── mimetype
│   │   ├── META-INF
│   │   │   ├── calibre_bookmarks.txt
│   │   │   └── container.xml
│   │   └── OEBPS
│   │       ├── toc.ncx
│   │       ├── content.opf
│   │       ├── Text
│   │       │   └── Section0001.xhtml
│   │       └── Styles
│   │           └── styles.css
│   ├── epub20_minimal_encryption
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.png
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── Text
│   │   │   │   └── cover.xhtml
│   │   │   ├── content.opf
│   │   │   └── toc.ncx
│   │   └── META-INF
│   │       ├── container.xml
│   │       └── encryption.xml
│   ├── epub20_missingfontresource
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.png
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── stylesheet.css
│   │   │   ├── Text
│   │   │   │   └── cover.xhtml
│   │   │   ├── content.opf
│   │   │   └── toc.ncx
│   │   └── META-INF
│   │       └── container.xml
│   ├── epub30_font_obfuscation
│   │   ├── mimetype
│   │   ├── EPUB
│   │   │   ├── wasteland-cover.jpg
│   │   │   ├── OldStandard-Bold.obf.otf
│   │   │   ├── OldStandard-Italic.obf.otf
│   │   │   ├── OldStandard-Regular.obf.otf
│   │   │   ├── wasteland-night.css
│   │   │   ├── fonts.css
│   │   │   ├── wasteland.css
│   │   │   ├── wasteland-nav.xhtml
│   │   │   ├── wasteland.ncx
│   │   │   └── wasteland.opf
│   │   └── META-INF
│   │       ├── container.xml
│   │       └── encryption.xml
│   ├── epub20_encryption_binary_content
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.png
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── Text
│   │   │   │   ├── pdfMigration.html
│   │   │   │   └── cover.xhtml
│   │   │   ├── content.opf
│   │   │   └── toc.ncx
│   │   └── META-INF
│   │       ├── container.xml
│   │       └── encryption.xml
│   ├── epub20_foreign_resource_no_fallback
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.jp2
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── Text
│   │   │   │   ├── cover.xhtml
│   │   │   │   └── pdfMigration.html
│   │   │   ├── content.opf
│   │   │   └── toc.ncx
│   │   └── META-INF
│   │       └── container.xml
│   ├── epub20_foreign_resource_with_fallback
│   │   ├── mimetype
│   │   ├── OEBPS
│   │   │   ├── Images
│   │   │   │   ├── pdfVenn.jp2
│   │   │   │   ├── pdfVenn.png
│   │   │   │   └── corruptedjp2_kdu_zoom3.jpg
│   │   │   ├── Text
│   │   │   │   ├── cover.xhtml
│   │   │   │   └── pdfMigration.html
│   │   │   ├── content.opf
│   │   │   └── toc.ncx
│   │   └── META-INF
│   │       └── container.xml
│   └── epub20_foreign_resource_with_fallback_noID
│       ├── mimetype
│       ├── OEBPS
│       │   ├── Images
│       │   │   ├── pdfVenn.jp2
│       │   │   ├── pdfVenn.png
│       │   │   └── corruptedjp2_kdu_zoom3.jpg
│       │   ├── Text
│       │   │   ├── cover.xhtml
│       │   │   └── pdfMigration.html
│       │   ├── content.opf
│       │   └── toc.ncx
│       └── META-INF
│           └── container.xml
├── pubresources
│   ├── pdfVenn.png
│   ├── epub20_minimal.epub
│   ├── corruptedjp2_kdu_zoom3.jpg
│   ├── encryption.xml
│   ├── pdfMigration.md
│   └── pdfMigration.html
├── epubcheckout
│   ├── 3.0.1
│   │   ├── epub20_crazy_columns.xml
│   │   ├── epub20_crazy_fixed_layout.xml
│   │   ├── epub20_minimal_encryption.xml
│   │   ├── epub20_encryption_binary_content.xml
│   │   ├── epub20_minimal.xml
│   │   ├── epub20_foreign_resource_with_fallback.xml
│   │   ├── epub20_foreign_resource_with_fallback_noID.xml
│   │   ├── epub20_foreign_resource_no_fallback.xml
│   │   ├── epub20_xpgt.xml
│   │   ├── epub20__invalid_entity.xml
│   │   ├── epub20_missingfontresource.xml
│   │   └── epub30_font_obfuscation.xml
│   └── 4.0.1
│       ├── epub20_dtbook.xml
│       ├── epub20_crazy_columns.xml
│       ├── epub20_minimal_encryption.xml
│       ├── epub20_encryption_binary_content.xml
│       ├── epub20_crazy_fixed_layout.xml
│       ├── epub20_minimal.xml
│       ├── epub20__invalid_entity.xml
│       ├── epub20_foreign_resource_with_fallback.xml
│       ├── epub20_foreign_resource_with_fallback_noID.xml
│       ├── epub20_xpgt.xml
│       ├── epub20_foreign_resource_no_fallback.xml
│       └── epub20_missingfontresource.xml
└── analyse.sh

/content/epub20_xpgt/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_dtbook/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_minimal/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_crazy_columns/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20__invalid_entity/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_crazy_fixed_layout/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_minimal_encryption/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_missingfontresource/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_no_fallback/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback_noID/mimetype:
--------------------------------------------------------------------------------
application/epub+zip
--------------------------------------------------------------------------------
/pubresources/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/pubresources/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub20_crazy_columns/META-INF/calibre_bookmarks.txt:
--------------------------------------------------------------------------------
calibre_current_page_bookmark*|!|?|*0*|!|?|*/2/4/2[h01]/1:0
--------------------------------------------------------------------------------
/content/epub20_crazy_fixed_layout/META-INF/calibre_bookmarks.txt:
--------------------------------------------------------------------------------
calibre_current_page_bookmark*|!|?|*0*|!|?|*/2/4/2[h01]/1:1
--------------------------------------------------------------------------------
/pubresources/epub20_minimal.epub:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/pubresources/epub20_minimal.epub
--------------------------------------------------------------------------------
/content/epub20_dtbook/OEBPS/valentin.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_dtbook/OEBPS/valentin.jpg
--------------------------------------------------------------------------------
/pubresources/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/pubresources/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_xpgt/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_xpgt/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub20_minimal/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_minimal/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub20__invalid_entity/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20__invalid_entity/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/wasteland-cover.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub30_font_obfuscation/EPUB/wasteland-cover.jpg
--------------------------------------------------------------------------------
/content/epub20_minimal_encryption/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_minimal_encryption/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub20_missingfontresource/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_missingfontresource/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub20_xpgt/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_xpgt/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/OldStandard-Bold.obf.otf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub30_font_obfuscation/EPUB/OldStandard-Bold.obf.otf
--------------------------------------------------------------------------------
/content/epub20_minimal/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_minimal/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/OldStandard-Italic.obf.otf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub30_font_obfuscation/EPUB/OldStandard-Italic.obf.otf
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_encryption_binary_content/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/OldStandard-Regular.obf.otf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub30_font_obfuscation/EPUB/OldStandard-Regular.obf.otf
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_no_fallback/OEBPS/Images/pdfVenn.jp2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_no_fallback/OEBPS/Images/pdfVenn.jp2
--------------------------------------------------------------------------------
/content/epub20_missingfontresource/OEBPS/stylesheet.css:
--------------------------------------------------------------------------------

@font-face {
    font-family: Courier;
    font-style: normal;
    font-weight: normal;
    src:url("Fonts/CourierStd.otf");
}

--------------------------------------------------------------------------------
/content/epub20__invalid_entity/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20__invalid_entity/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/OEBPS/Text/pdfMigration.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_encryption_binary_content/OEBPS/Text/pdfMigration.html
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback/OEBPS/Images/pdfVenn.jp2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_with_fallback/OEBPS/Images/pdfVenn.jp2
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_with_fallback/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub20_minimal_encryption/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_minimal_encryption/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback_noID/OEBPS/Images/pdfVenn.jp2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_with_fallback_noID/OEBPS/Images/pdfVenn.jp2
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback_noID/OEBPS/Images/pdfVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_with_fallback_noID/OEBPS/Images/pdfVenn.png
--------------------------------------------------------------------------------
/content/epub20_missingfontresource/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_missingfontresource/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_encryption_binary_content/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_xpgt/OEBPS/stylesheet.css:
--------------------------------------------------------------------------------
/* Style Sheet */
body {
    font-family: "Times New Roman",serif;
    font-size: 1em;
    color: rgb(0,0,130);
    background-color: rgb(240,240,240);
}

--------------------------------------------------------------------------------
/content/epub20_foreign_resource_no_fallback/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_no_fallback/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_with_fallback/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback_noID/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KBNLresearch/epubPolicyTests/master/content/epub20_foreign_resource_with_fallback_noID/OEBPS/Images/corruptedjp2_kdu_zoom3.jpg
--------------------------------------------------------------------------------
/content/epub20_dtbook/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 |
--------------------------------------------------------------------------------
/content/epub20_minimal/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_xpgt/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20__invalid_entity/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_crazy_columns/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_minimal_encryption/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_missingfontresource/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_crazy_fixed_layout/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_no_fallback/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_foreign_resource_with_fallback_noID/META-INF/container.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------
/content/epub20_crazy_columns/OEBPS/Styles/styles.css:
--------------------------------------------------------------------------------
.heading1
{
    font-family:century;
    font-size:25px;
    color:#231F20;
    line-height: 1.7em;
    text-align: justify;
    width: 550px;
}

#h01
{
    position:absolute;
    left:40px;
    top:20px;
}

.pos {position:absolute;
}

--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/wasteland-night.css:
--------------------------------------------------------------------------------
@charset "UTF-8";
@import "wasteland.css";

body {
    color: rgb(255,250,205);
    background-color: rgb(20,20,20);
}

span.lnum {
    color: rgb(175,170,125);
}

a.noteref {
    color: rgb(120,120,120);
}

section#rearnotes a {
    color: rgb(255,250,205);
}
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/fonts.css:
--------------------------------------------------------------------------------
@font-face {
    font-family: 'OldStandard';
    font-weight: normal;
    font-style: normal;
    src:url(OldStandard-Regular.obf.otf) format('opentype');
}

@font-face {
    font-family: 'OldStandard';
    font-weight: bold;
    font-style: normal;
    src:url(OldStandard-Bold.obf.otf) format('opentype');
}

@font-face {
    font-family:'OldStandard';
    font-weight: normal;
    font-style: italic;
    src:url(OldStandard-Italic.obf.otf) format('opentype');
}

--------------------------------------------------------------------------------
/pubresources/encryption.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | urn:uuid:a22336f8-c280-11e4-8dfc-aa07a5b093db 6 | 7 | 8 | 9 | 10 | 11 | 12 |
--------------------------------------------------------------------------------
/content/epub20_minimal_encryption/META-INF/encryption.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 6 | 7 | 8 | 9 | 10 | 11 | 12 |
--------------------------------------------------------------------------------
/content/epub20_encryption_binary_content/META-INF/encryption.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 6 | 7 | 8 | 9 | 10 | 11 | 12 |
--------------------------------------------------------------------------------
/content/epub20_xpgt/OEBPS/Text/cover.xhtml:
--------------------------------------------------------------------------------
1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10
| 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_minimal/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20__invalid_entity/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_minimal_encryption/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_missingfontresource/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_encryption_binary_content/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_no_fallback/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback_noID/OEBPS/Text/cover.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | Cover 8 | 9 | 10 | 11 |
12 | 13 |
14 | 15 | 16 | -------------------------------------------------------------------------------- /content/epub20_crazy_columns/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | Unknown 12 | 13 | 14 | 15 | 16 | Start 17 | 18 | 19 | 20 | 21 | -------------------------------------------------------------------------------- /content/epub20_crazy_fixed_layout/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | Unknown 12 | 13 | 14 | 15 | 16 | Start 17 | 18 | 19 | 20 | 21 | -------------------------------------------------------------------------------- /content/epub30_font_obfuscation/META-INF/encryption.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | -------------------------------------------------------------------------------- /content/epub20_crazy_columns/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6 5 | 6 | Crazy Columns 7 | Johan van der Knijff 8 | en 9 | 2016-04-01 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | -------------------------------------------------------------------------------- /content/epub20_crazy_fixed_layout/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6 5 | 6 | Crazy Fixed Layout 7 | Johan van der Knijff 8 | en 9 | 2016-03-31 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | -------------------------------------------------------------------------------- /content/epub20_dtbook/OEBPS/package.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 
| 4 | Valentin Haüy - the father of the education for the blind 5 | Beatrice Christensen Sköld 6 | TPB 7 | 2006-03-23 8 | 2007-08-09 9 | C00000 10 | en 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
--------------------------------------------------------------------------------
/content/epub30_font_obfuscation/EPUB/wasteland.css:
--------------------------------------------------------------------------------
@charset "UTF-8";
@import "fonts.css";
@namespace "http://www.w3.org/1999/xhtml";
@namespace epub "http://www.idpf.org/2007/ops";

body {
    margin-left: 6em;
    margin-right: 16em;
    color: black;
    /* use sans-serif as fallback to make the difference obvious */
    font-family: 'OldStandard', sans-serif;
    background-color: rgb(255,255,245);
    line-height: 1.5em;
}

h2 {
    margin-top: 5em;
    margin-bottom: 2em;
}

h3 {
    margin-top: 3em;
}

.linegroup {
    margin-top: 1.6em;
}

span.lnum {
    float: right;
    color: gray;
    font-size : 90%;
}

a.noteref {
    color: rgb(215,215,195);
    text-decoration: none;
    margin-left: 0.5em;
    margin-right: 0.5em;
}

section#rearnotes a {
    color: black;
    text-decoration: none;
    border-bottom : 1px dotted gray;
    margin-right: 0.8em;
}

.indent {
    padding-left: 3em;
}

.indent2 {
    padding-left: 5em;
}

*[epub|type~='dedication'] {
    padding-left: 2em;
}

--------------------------------------------------------------------------------
/content/epub20_crazy_columns/OEBPS/Text/Section0001.xhtml:
--------------------------------------------------------------------------------
1 | 2 | 4 | 5 | 6 | 7 | 8 | Crazy Columns Example 9 | 10 | 11 | 12 |

Crazy Columns

13 | 14 |
This is an EPUB file
15 |
page. Even though this
16 |
that uses a two-column
17 |
file is valid EPUB, there's
18 |
layout. For each column,
19 |
no way to establish the
20 |
every line is placed at
21 |
logical reading order of
22 |
a fixed position on the
23 |
the text.
24 | 25 | 26 | -------------------------------------------------------------------------------- /content/epub20_crazy_fixed_layout/OEBPS/Text/Section0001.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | Crazy Fixed Layout Example 10 | 11 | 12 | 13 |

Crazy Fixed Layout

14 | 15 |

This is an EPUB 2 file that uses a fixed layout.

16 |

This is achieved by placing each line inside a

17 |

paragraph element. Each paragraph element

18 |

is placed at a fixed position on the page. Even

19 |

though this file is valid EPUB, this is a pretty

20 |

terrible idea, because in most readers the text

21 |

will not reflow after resizing the viewer window.

22 |

If you increase the font size, text will appear

23 |

garbled. This is a really crap way to make an

24 |

EPUB.

25 | 26 | -------------------------------------------------------------------------------- /content/epub30_font_obfuscation/EPUB/wasteland-nav.xhtml: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 20 | 30 | 31 | 32 | 33 | -------------------------------------------------------------------------------- /content/epub20_minimal/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /content/epub20__invalid_entity/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /content/epub20_minimal_encryption/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /content/epub20_encryption_binary_content/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van 
der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_no_fallback/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_crazy_columns.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:18:57+02:00 4 | 5 | 2016-04-01T13:11:36Z 6 | application/epub+zip 7 | 2.0 8 | Well-formed 9 | application/epub+zip 10 | 11 | CharacterCount273 12 | Languageen 13 | Info 14 | Identifierurn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6 15 | CreationDate2016-04-01T13:11:36Z 16 | TitleCrazy Columns 17 | CreatorJohan van der Knijff 18 | Date2016-04-01 19 | 20 | 21 | 22 | 23 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_crazy_fixed_layout.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:26+02:00 4 | 5 | 2016-03-31T17:10:22Z 6 | application/epub+zip 7 | 2.0 8 | Well-formed 9 | application/epub+zip 10 | 11 | CharacterCount491 12 | Languageen 13 | Info 14 | Identifierurn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6 15 | CreationDate2016-03-31T17:10:22Z 16 | TitleCrazy Fixed Layout 17 | CreatorJohan van der Knijff 18 | Date2016-03-31 19 | 20 | 21 | 22 | 23 | -------------------------------------------------------------------------------- /content/epub20_missingfontresource/OEBPS/content.opf: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback_noID/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | -------------------------------------------------------------------------------- /content/epub20_xpgt/OEBPS/content.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 5 | When (not) to migrate a PDF to PDF/A 6 | Johan van der Knijff 7 | en 8 | 9 | 10 | 2015-03-03 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | -------------------------------------------------------------------------------- /analyse.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Analyse Epubs in build directory with epubcheck (both 
version 3 & 4) 4 | 5 | # Location of epubcheck jars (update according to your own system) 6 | epubcheck3Jar=/usr/share/java/epubcheck.jar 7 | epubcheck4Jar=/home/johan/epubcheck/epubcheck.jar 8 | 9 | # ---- No need to edit anything below this line, unless you know what you're doing! 10 | 11 | # Installation directory 12 | instDir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" 13 | 14 | # Build directory - where newly created epubs are stored 15 | buildDir="$instDir"/build/ 16 | 17 | # Epubcheck output directory - epubcheck output goes here 18 | eCOutDir="$instDir"/epubcheckout/ 19 | 20 | # Subdirs for epubcheck versions 21 | ec3OutDir="$eCOutDir"3.0.1/ 22 | ec4OutDir="$eCOutDir"4.0.1/ 23 | 24 | # Create output directory structure if it doesn't exist already 25 | 26 | if ! [ -d "$eCOutDir" ] ; then 27 | mkdir "$eCOutDir" 28 | fi 29 | 30 | if ! [ -d "$ec3OutDir" ] ; then 31 | mkdir "$ec3OutDir" 32 | fi 33 | 34 | if ! [ -d "$ec4OutDir" ] ; then 35 | mkdir "$ec4OutDir" 36 | fi 37 | 38 | # ************** 39 | # MAIN PROCESSING LOOP 40 | # ************** 41 | 42 | counter=0 43 | 44 | while IFS= read -r -d '' epub ; do 45 | # Base name (strip away path) 46 | epubFileName=$(basename "$epub") 47 | epubBaseName="${epubFileName%.*}" 48 | 49 | # Generate epubcheck output file names 50 | ec3OutName="$ec3OutDir""$epubBaseName".xml 51 | ec4OutName="$ec4OutDir""$epubBaseName".xml 52 | 53 | # Run Epubcheck 54 | java -jar "$epubcheck3Jar" "$epub" -out "$ec3OutName" # 2>tmpec3.stderr 55 | java -jar "$epubcheck4Jar" "$epub" -out "$ec4OutName" # 2>tmpec4.stderr 56 | 57 | done < <(find "$buildDir" -maxdepth 1 -mindepth 1 -type f -print0) 58 | 59 | -------------------------------------------------------------------------------- /content/epub30_font_obfuscation/EPUB/wasteland.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | The Waste Land 9 | 10 | 11 | 12 | 13 | 14 | I. THE BURIAL OF THE DEAD 15 | 16 | 17 | 18 | 19 | 20 | II.
A GAME OF CHESS 21 | 22 | 23 | 24 | 25 | 26 | III. THE FIRE SERMON 27 | 28 | 29 | 30 | 31 | 32 | IV. DEATH BY WATER 33 | 34 | 35 | 36 | 37 | 38 | V. WHAT THE THUNDER SAID 39 | 40 | 41 | 42 | 43 | 44 | NOTES ON "THE WASTE LAND" 45 | 46 | 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /content/epub20_xpgt/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20_minimal/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20__invalid_entity/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 
| 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20_minimal_encryption/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20_missingfontresource/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20_encryption_binary_content/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 
48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_no_fallback/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback_noID/OEBPS/toc.ncx: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Unknown 13 | 14 | 15 | 16 | 17 | When (not) to migrate a PDF to PDF/A 18 | 19 | 20 | 21 | 22 | PDF/A is a profile 23 | 24 | 25 | 26 | 27 | 28 | Loss, alteration during migration 29 | 30 | 31 | 32 | 33 | 34 | Complexity and effect of errors 35 | 36 | 37 | 38 | 39 | 40 | Digitised vs born-digital 41 | 42 | 43 | 44 | 45 | 46 | Conclusions 47 | 48 | 49 | 50 | 51 | 52 | 53 | 
-------------------------------------------------------------------------------- /content/epub20_crazy_fixed_layout/OEBPS/Styles/styles.css: -------------------------------------------------------------------------------- 1 | body 2 | { 3 | left:0; 4 | top:0; 5 | margin:0; 6 | position:absolute; 7 | width:700px; 8 | height:1100px; 9 | border: 0; 10 | } 11 | 12 | .heading1 13 | { 14 | font-family:century; 15 | font-size:25px; 16 | color:#231F20; 17 | line-height: 1.7em; 18 | text-align: justify; 19 | width: 550px; 20 | } 21 | 22 | #h01 23 | { 24 | position:absolute; 25 | left:40px; 26 | top:20px; 27 | letter-spacing:0.6px; 28 | word-spacing:0.1em; 29 | } 30 | 31 | .para 32 | { 33 | font-family:century; 34 | font-size:18.5px; 35 | color:#231F20; 36 | line-height: 1.7em; 37 | text-align: justify; 38 | width: 550px; 39 | } 40 | 41 | #p01 42 | { 43 | position:absolute; 44 | left:40px; 45 | top:80px; 46 | letter-spacing:0.42px; 47 | word-spacing:0.1em; 48 | } 49 | #p02 50 | { 51 | position:absolute; 52 | left:40px; 53 | top:120px; 54 | letter-spacing:0.42px; 55 | word-spacing:0.1em; 56 | } 57 | #p03 58 | { 59 | position:absolute; 60 | left:40px; 61 | top:160px; 62 | letter-spacing:0.42px; 63 | word-spacing:0.1em; 64 | } 65 | #p04 66 | { 67 | position:absolute; 68 | left:40px; 69 | top:200px; 70 | letter-spacing:0.42px; 71 | word-spacing:0.1em; 72 | } 73 | #p05 74 | { 75 | position:absolute; 76 | left:40px; 77 | top:240px; 78 | letter-spacing:0.42px; 79 | word-spacing:0.1em; 80 | } 81 | #p06 82 | { 83 | position:absolute; 84 | left:40px; 85 | top:280px; 86 | letter-spacing:0.42px; 87 | word-spacing:0.1em; 88 | } 89 | #p07 90 | { 91 | position:absolute; 92 | left:40px; 93 | top:320px; 94 | letter-spacing:0.42px; 95 | word-spacing:0.1em; 96 | } 97 | #p08 98 | { 99 | position:absolute; 100 | left:40px; 101 | top:360px; 102 | letter-spacing:0.42px; 103 | word-spacing:0.1em; 104 | } 105 | #p09 106 | { 107 | position:absolute; 108 | left:40px; 109 | top:400px; 110 | 
letter-spacing:0.42px; 111 | word-spacing:0.1em; 112 | } 113 | #p10 114 | { 115 | position:absolute; 116 | left:40px; 117 | top:440px; 118 | letter-spacing:0.42px; 119 | word-spacing:0.1em; 120 | } -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_minimal_encryption.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:52+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Not well-formed 9 | 10 | ERROR: : OPS/XHTML file OEBPS/Text/pdfMigration.html cannot be decrypted 11 | ERROR: /OEBPS/toc.ncx(24): 'pdfa-is-a-profile': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 12 | ERROR: /OEBPS/toc.ncx(30): 'loss-alteration-during-migration': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 13 | ERROR: /OEBPS/toc.ncx(36): 'complexity-and-effect-of-errors': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 14 | ERROR: /OEBPS/toc.ncx(42): 'digitised-vs-born-digital': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 15 | ERROR: /OEBPS/toc.ncx(48): 'conclusions': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 16 | 17 | application/epub+zip 18 | 19 | CharacterCount25 20 | Languageen 21 | Info 22 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 23 | CreationDate2015-06-02T16:34:06Z 24 | TitleWhen (not) to migrate a PDF to PDF/A 25 | CreatorJohan van der Knijff 26 | Date2015-03-03 27 | 28 | hasEncryptiontrue 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_encryption_binary_content.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:18+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Not well-formed 9 | 10 | ERROR: : OPS/XHTML file OEBPS/Text/pdfMigration.html cannot be 
decrypted 11 | ERROR: /OEBPS/toc.ncx(24): 'pdfa-is-a-profile': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 12 | ERROR: /OEBPS/toc.ncx(30): 'loss-alteration-during-migration': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 13 | ERROR: /OEBPS/toc.ncx(36): 'complexity-and-effect-of-errors': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 14 | ERROR: /OEBPS/toc.ncx(42): 'digitised-vs-born-digital': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 15 | ERROR: /OEBPS/toc.ncx(48): 'conclusions': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 16 | 17 | application/epub+zip 18 | 19 | CharacterCount25 20 | Languageen 21 | Info 22 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 23 | CreationDate2015-06-02T16:34:06Z 24 | TitleWhen (not) to migrate a PDF to PDF/A 25 | CreatorJohan van der Knijff 26 | Date2015-03-03 27 | 28 | hasEncryptiontrue 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /content/epub30_font_obfuscation/EPUB/wasteland.opf: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | code.google.com.epub-samples.wasteland-otf-obfuscated 5 | The Waste Land 6 | T.S. Eliot 7 | en-US 8 | 2011-09-01 9 | OTF font obfuscated using algorithm defined in OCF 3.0, fallback to sans-serif system font 10 | 2012-01-18T12:47:00Z 11 | 12 | This work is shared with the public using the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. 
13 | 14 | http://code.google.com/p/epub-samples/ 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | -------------------------------------------------------------------------------- /content/epub20_xpgt/OEBPS/page-template.xpgt: -------------------------------------------------------------------------------- 1 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_dtbook.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:19:36+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Well-formed 12 | application/epub+zip 13 | 14 | 15 | Language 16 | 17 | en 18 | 19 | 20 | 21 | Info 22 | 23 | 24 | Identifier 25 | 26 | C00000 27 | 28 | 29 | 30 | CreationDate 31 | 32 | 2015-06-02T16:34:06Z 33 | 34 | 35 | 36 | Title 37 | 38 | Valentin Haüy - the father of the education for the blind 39 | 40 | 41 | 42 | Creator 43 | 44 | Beatrice Christensen Sköld 45 | 46 | 47 | 48 | Date 49 | 50 | 2007-08-09 51 | 52 | 53 | 54 | Publisher 55 | 56 | TPB 57 | 58 | 59 | 60 | 61 | 62 | MediaTypes 63 | 64 | image/jpeg 65 | text/css 66 | application/x-dtbook+xml 67 | application/x-dtbncx+xml 68 | 69 | 70 | 71 | 72 | 73 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_minimal.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:59+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Well-formed 9 | application/epub+zip 10 | 11 | CharacterCount4520 12 | Languageen 13 | Info 14 | 
Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 15 | CreationDate2015-06-02T16:34:06Z 16 | TitleWhen (not) to migrate a PDF to PDF/A 17 | CreatorJohan van der Knijff 18 | Date2015-03-03 19 | 20 | References 21 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format 22 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format 23 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 24 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 25 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 26 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 27 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 28 | 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_crazy_columns.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:19:01+02:00 7 | 8 | 2016-04-01T13:11:36Z 9 | application/epub+zip 10 | 2.0.1 11 | Well-formed 12 | 13 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (13-1) 14 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (18-7) 15 | 16 | application/epub+zip 17 | 18 | 19 | CharacterCount 20 | 21 | 273 22 | 23 | 24 | 25 | Language 26 | 27 | en 28 | 29 | 30 | 31 | Info 32 | 33 | 34 | Identifier 35 | 36 | urn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6 37 | 38 | 39 | 40 | CreationDate 41 | 42 | 2016-04-01T13:11:36Z 43 | 44 | 45 | 46 | Title 47 | 48 | Crazy Columns 49 | 50 | 51 | 52 | Creator 53 | 54 | Johan van der Knijff 55 | 56 | 57 | 58 | Date 59 | 60 | 2016-04-01 61 | 
62 | 63 | 64 | 65 | 66 | MediaTypes 67 | 68 | application/x-dtbncx+xml 69 | application/xhtml+xml 70 | text/css 71 | 72 | 73 | 74 | 75 | 76 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_foreign_resource_with_fallback.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:45+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Well-formed 9 | application/epub+zip 10 | 11 | CharacterCount4521 12 | Languageen 13 | Info 14 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 15 | CreationDate2015-06-02T16:34:06Z 16 | TitleWhen (not) to migrate a PDF to PDF/A 17 | CreatorJohan van der Knijff 18 | Date2015-03-03 19 | 20 | References 21 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format 22 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format 23 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 24 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 25 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 26 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 27 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 28 | 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_foreign_resource_with_fallback_noID.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:12+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Well-formed 9 | application/epub+zip 10 | 11 | CharacterCount4521 12 | 
Languageen 13 | Info 14 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 15 | CreationDate2015-06-02T16:34:06Z 16 | TitleWhen (not) to migrate a PDF to PDF/A 17 | CreatorJohan van der Knijff 18 | Date2015-03-03 19 | 20 | References 21 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format 22 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format 23 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 24 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 25 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 26 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 27 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 28 | 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_foreign_resource_no_fallback.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:18:44+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Not well-formed 9 | 10 | ERROR: /OEBPS/Text/pdfMigration.html(20): non-standard image resource 'OEBPS/Images/pdfVenn.jp2' of type 'image/jp2' 11 | 12 | application/epub+zip 13 | 14 | CharacterCount4520 15 | Languageen 16 | Info 17 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 18 | CreationDate2015-06-02T16:34:06Z 19 | TitleWhen (not) to migrate a PDF to PDF/A 20 | CreatorJohan van der Knijff 21 | Date2015-03-03 22 | 23 | References 24 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format 25 
| Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format 26 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 27 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 28 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 29 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 30 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 31 | 32 | 33 | 34 | 35 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_xpgt.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:20:05+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Not well-formed 9 | 10 | ERROR: /OEBPS/Text/pdfMigration.html(9): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified. 
11 | 12 | application/epub+zip 13 | 14 | CharacterCount4526 15 | Languageen 16 | Info 17 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 18 | CreationDate2015-06-02T16:34:06Z 19 | TitleWhen (not) to migrate a PDF to PDF/A 20 | CreatorJohan van der Knijff 21 | Date2015-03-03 22 | 23 | References 24 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format 25 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format 26 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 27 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 28 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 29 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 30 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 31 | 32 | 33 | 34 | 35 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20__invalid_entity.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:39+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Not well-formed 9 | 10 | ERROR: /OEBPS/Text/pdfMigration.html(17): An invalid XML character (Unicode: 0xb) was found in the element content of the document. 11 | ERROR: /OEBPS/Text/pdfMigration.html: An invalid XML character (Unicode: 0xb) was found in the element content of the document. 
12 | ERROR: /OEBPS/toc.ncx(30): 'loss-alteration-during-migration': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 13 | ERROR: /OEBPS/toc.ncx(36): 'complexity-and-effect-of-errors': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 14 | ERROR: /OEBPS/toc.ncx(42): 'digitised-vs-born-digital': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 15 | ERROR: /OEBPS/toc.ncx(48): 'conclusions': fragment identifier is not defined in 'OEBPS/Text/pdfMigration.html' 16 | 17 | application/epub+zip 18 | 19 | CharacterCount25 20 | Languageen 21 | Info 22 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 23 | CreationDate2015-06-02T16:34:06Z 24 | TitleWhen (not) to migrate a PDF to PDF/A 25 | CreatorJohan van der Knijff 26 | Date2015-03-03 27 | 28 | References 29 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format 30 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format 31 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 32 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 33 | 34 | 35 | 36 | 37 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub20_missingfontresource.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:18:50+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | application/epub+zip 7 | 2.0 8 | Not well-formed 9 | 10 | ERROR: /OEBPS/stylesheet.css(6): 'OEBPS/Fonts/CourierStd.otf': referenced resource missing in the package. 
11 | 12 | application/epub+zip 13 | 14 | CharacterCount4523 15 | Languageen 16 | Info 17 | Identifierurn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 18 | CreationDate2015-06-02T16:34:06Z 19 | TitleWhen (not) to migrate a PDF to PDF/A 20 | CreatorJohan van der Knijff 21 | Date2015-03-03 22 | 23 | Fonts 24 | Font 25 | FontNameCourier 26 | FontFiletrue 27 | 28 | 29 | References 30 | Referencehttps://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format 31 | Referencehttp://wiki.opf-labs.org/display/TR/Portable+Document+Format 32 | Referencehttp://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 33 | Referencehttp://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 34 | Referencehttp://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 35 | Referencehttp://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 36 | Referencehttp://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 37 | 38 | 39 | 40 | 41 | -------------------------------------------------------------------------------- /epubcheckout/3.0.1/epub30_font_obfuscation.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 2017-08-29T12:19:05+02:00 4 | 5 | 2015-06-02T16:34:06Z 6 | 2012-01-18T12:47:00Z 7 | application/epub+zip 8 | 3.0 9 | Well-formed 10 | 11 | WARN: /EPUB/wasteland.ncx: meta@dtb:uid content 'code.google.com.epub-samples.wasteland-basic' should conform to unique-identifier in content.opf: 'code.google.com.epub-samples.wasteland-otf-obfuscated' 12 | 13 | application/epub+zip 14 | 15 | CharacterCount31671 16 | Languageen-US 17 | Info 18 | Identifiercode.google.com.epub-samples.wasteland-otf-obfuscated 19 | CreationDate2015-06-02T16:34:06Z 20 | ModDate2012-01-18T12:47:00Z 21 | TitleThe 
Waste Land 22 | CreatorT.S. Eliot 23 | Date2011-09-01 24 | RightsThis work is shared with the public using the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. 25 | 26 | Fonts 27 | Font 28 | FontNameOldStandard 29 | FontFiletrue 30 | 31 | Font 32 | FontNameOldStandard,bold 33 | FontFiletrue 34 | 35 | Font 36 | FontNameOldStandard,italic 37 | FontFiletrue 38 | 39 | 40 | References 41 | Referencehttp://creativecommons.org/licenses/by-sa/3.0/ 42 | Referencehttp://en.wikipedia.org/wiki/Simon_Fieldhouse 43 | 44 | hasEncryptiontrue 45 | 46 | 47 | 48 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_minimal_encryption.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:19:56+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Not well-formed 12 | 13 | RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_minimal_encryption.epub 14 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67) 15 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82) 16 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81) 17 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75) 18 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61) 19 | 20 | application/epub+zip 21 | 22 | 23 | CharacterCount 24 | 25 | 25 26 | 27 | 28 | 29 | Language 30 | 31 | en 32 | 33 | 34 | 35 | Info 36 | 37 | 38 | Identifier 39 | 40 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 41 | 42 | 43 | 44 | CreationDate 45 | 46 | 2015-06-02T16:34:06Z 47 | 48 | 49 | 50 | Title 51 | 52 | When (not) to migrate a PDF to PDF/A 53 | 54 | 55 | 56 | Creator 57 | 58 | Johan van der Knijff 59 | 60 | 61 | 62 | Date 63 | 64 | 2015-03-03 65 | 66 | 67 | 68 | 69 | 70 | MediaTypes 71 | 72 | application/x-dtbncx+xml 73 | application/xhtml+xml 
74 | image/png 75 | image/jpeg 76 | 77 | 78 | 79 | hasEncryption 80 | 81 | true 82 | 83 | 84 | 85 | 86 | 87 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_encryption_binary_content.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:19:23+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Not well-formed 12 | 13 | RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_encryption_binary_content.epub 14 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67) 15 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82) 16 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81) 17 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75) 18 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61) 19 | HTM-023, WARN, [An invalid XHTML Named Entity was found: '&0;'.], OEBPS/Text/pdfMigration.html (18-197) 20 | HTM-023, WARN, [An invalid XHTML Named Entity was found: '&l0xb'.], OEBPS/Text/pdfMigration.html (291-6) 21 | 22 | application/epub+zip 23 | 24 | 25 | CharacterCount 26 | 27 | 25 28 | 29 | 30 | 31 | Language 32 | 33 | en 34 | 35 | 36 | 37 | Info 38 | 39 | 40 | Identifier 41 | 42 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 43 | 44 | 45 | 46 | CreationDate 47 | 48 | 2015-06-02T16:34:06Z 49 | 50 | 51 | 52 | Title 53 | 54 | When (not) to migrate a PDF to PDF/A 55 | 56 | 57 | 58 | Creator 59 | 60 | Johan van der Knijff 61 | 62 | 63 | 64 | Date 65 | 66 | 2015-03-03 67 | 68 | 69 | 70 | 71 | 72 | MediaTypes 73 | 74 | application/x-dtbncx+xml 75 | application/xhtml+xml 76 | image/png 77 | image/jpeg 78 | 79 | 80 | 81 | hasEncryption 82 | 83 | true 84 | 85 | 86 | 87 | 88 | 89 | -------------------------------------------------------------------------------- 
/epubcheckout/4.0.1/epub20_crazy_fixed_layout.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:19:30+02:00 7 | 8 | 2016-03-31T17:10:22Z 9 | application/epub+zip 10 | 2.0.1 11 | Well-formed 12 | 13 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (6-2) 14 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (24-1) 15 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (43-1) 16 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (51-1) 17 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (59-1) 18 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (67-1) 19 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (75-1) 20 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (83-1) 21 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (91-1) 22 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (99-1) 23 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (107-1) 24 | CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (115-1) 25 | 26 | application/epub+zip 27 | 28 | 29 | CharacterCount 30 | 31 | 491 32 | 33 | 34 | 35 | Language 36 | 37 | en 38 | 39 | 40 | 41 | Info 42 | 43 | 44 | Identifier 45 | 46 | urn:uuid:65df5593-b4ab-4d06-9fb4-bb9916eb99e6 47 | 48 | 49 | 50 | CreationDate 51 | 52 | 2016-03-31T17:10:22Z 53 | 54 | 55 | 56 | Title 57 | 58 | Crazy Fixed Layout 59 | 60 | 61 | 62 | Creator 63 | 64 | Johan van der Knijff 65 | 66 | 67 | 68 | Date 69 | 70 | 2016-03-31 71 | 72 | 73 | 74 | 75 | 76 | MediaTypes 77 | 78 | application/x-dtbncx+xml 79 | application/xhtml+xml 80 | text/css 81 | 82 | 83 | 84 | 85 | 
86 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_minimal.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:20:03+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Well-formed 12 | application/epub+zip 13 | 14 | 15 | CharacterCount 16 | 17 | 4520 18 | 19 | 20 | 21 | Language 22 | 23 | en 24 | 25 | 26 | 27 | Info 28 | 29 | 30 | Identifier 31 | 32 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 33 | 34 | 35 | 36 | CreationDate 37 | 38 | 2015-06-02T16:34:06Z 39 | 40 | 41 | 42 | Title 43 | 44 | When (not) to migrate a PDF to PDF/A 45 | 46 | 47 | 48 | Creator 49 | 50 | Johan van der Knijff 51 | 52 | 53 | 54 | Date 55 | 56 | 2015-03-03 57 | 58 | 59 | 60 | 61 | 62 | References 63 | 64 | 65 | Reference 66 | 67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format 68 | 69 | 70 | 71 | Reference 72 | 73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 74 | 75 | 76 | 77 | Reference 78 | 79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 80 | 81 | 82 | 83 | Reference 84 | 85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 86 | 87 | 88 | 89 | Reference 90 | 91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 92 | 93 | 94 | 95 | Reference 96 | 97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 98 | 99 | 100 | 101 | 102 | 103 | MediaTypes 104 | 105 | application/x-dtbncx+xml 106 | application/xhtml+xml 107 | image/png 108 | image/jpeg 109 | 110 | 111 | 112 | 113 | 114 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20__invalid_entity.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 
2017-08-29T12:19:43+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Not well-formed 12 | 13 | RSC-016, FATAL, [Fatal Error while parsing file 'An invalid XML character (Unicode: 0xb) was found in the element content of the document.'.], OEBPS/Text/pdfMigration.html (17-520) 14 | RSC-005, ERROR, [Error while parsing file 'An invalid XML character (Unicode: 0xb) was found in the element content of the document.'.], OEBPS/Text/pdfMigration.html 15 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82) 16 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81) 17 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75) 18 | RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61) 19 | 20 | application/epub+zip 21 | 22 | 23 | CharacterCount 24 | 25 | 25 26 | 27 | 28 | 29 | Language 30 | 31 | en 32 | 33 | 34 | 35 | Info 36 | 37 | 38 | Identifier 39 | 40 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 41 | 42 | 43 | 44 | CreationDate 45 | 46 | 2015-06-02T16:34:06Z 47 | 48 | 49 | 50 | Title 51 | 52 | When (not) to migrate a PDF to PDF/A 53 | 54 | 55 | 56 | Creator 57 | 58 | Johan van der Knijff 59 | 60 | 61 | 62 | Date 63 | 64 | 2015-03-03 65 | 66 | 67 | 68 | 69 | 70 | References 71 | 72 | 73 | Reference 74 | 75 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format 76 | 77 | 78 | 79 | Reference 80 | 81 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 82 | 83 | 84 | 85 | Reference 86 | 87 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 88 | 89 | 90 | 91 | 92 | 93 | MediaTypes 94 | 95 | application/x-dtbncx+xml 96 | application/xhtml+xml 97 | image/png 98 | image/jpeg 99 | 100 | 101 | 102 | 103 | 104 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_foreign_resource_with_fallback.xml: 
-------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:19:50+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Well-formed 12 | application/epub+zip 13 | 14 | 15 | CharacterCount 16 | 17 | 4521 18 | 19 | 20 | 21 | Language 22 | 23 | en 24 | 25 | 26 | 27 | Info 28 | 29 | 30 | Identifier 31 | 32 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 33 | 34 | 35 | 36 | CreationDate 37 | 38 | 2015-06-02T16:34:06Z 39 | 40 | 41 | 42 | Title 43 | 44 | When (not) to migrate a PDF to PDF/A 45 | 46 | 47 | 48 | Creator 49 | 50 | Johan van der Knijff 51 | 52 | 53 | 54 | Date 55 | 56 | 2015-03-03 57 | 58 | 59 | 60 | 61 | 62 | References 63 | 64 | 65 | Reference 66 | 67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format 68 | 69 | 70 | 71 | Reference 72 | 73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 74 | 75 | 76 | 77 | Reference 78 | 79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 80 | 81 | 82 | 83 | Reference 84 | 85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 86 | 87 | 88 | 89 | Reference 90 | 91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 92 | 93 | 94 | 95 | Reference 96 | 97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 98 | 99 | 100 | 101 | 102 | 103 | MediaTypes 104 | 105 | application/x-dtbncx+xml 106 | application/xhtml+xml 107 | image/jp2 108 | image/png 109 | image/jpeg 110 | 111 | 112 | 113 | 114 | 115 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_foreign_resource_with_fallback_noID.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:19:16+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 
11 | Well-formed 12 | application/epub+zip 13 | 14 | 15 | CharacterCount 16 | 17 | 4521 18 | 19 | 20 | 21 | Language 22 | 23 | en 24 | 25 | 26 | 27 | Info 28 | 29 | 30 | Identifier 31 | 32 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 33 | 34 | 35 | 36 | CreationDate 37 | 38 | 2015-06-02T16:34:06Z 39 | 40 | 41 | 42 | Title 43 | 44 | When (not) to migrate a PDF to PDF/A 45 | 46 | 47 | 48 | Creator 49 | 50 | Johan van der Knijff 51 | 52 | 53 | 54 | Date 55 | 56 | 2015-03-03 57 | 58 | 59 | 60 | 61 | 62 | References 63 | 64 | 65 | Reference 66 | 67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format 68 | 69 | 70 | 71 | Reference 72 | 73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 74 | 75 | 76 | 77 | Reference 78 | 79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 80 | 81 | 82 | 83 | Reference 84 | 85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 86 | 87 | 88 | 89 | Reference 90 | 91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 92 | 93 | 94 | 95 | Reference 96 | 97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 98 | 99 | 100 | 101 | 102 | 103 | MediaTypes 104 | 105 | application/x-dtbncx+xml 106 | application/xhtml+xml 107 | image/jp2 108 | image/png 109 | image/jpeg 110 | 111 | 112 | 113 | 114 | 115 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_xpgt.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:20:10+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Well-formed 12 | application/epub+zip 13 | 14 | 15 | CharacterCount 16 | 17 | 4526 18 | 19 | 20 | 21 | Language 22 | 23 | en 24 | 25 | 26 | 27 | Info 28 | 29 | 30 | Identifier 31 | 32 | 
urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 33 | 34 | 35 | 36 | CreationDate 37 | 38 | 2015-06-02T16:34:06Z 39 | 40 | 41 | 42 | Title 43 | 44 | When (not) to migrate a PDF to PDF/A 45 | 46 | 47 | 48 | Creator 49 | 50 | Johan van der Knijff 51 | 52 | 53 | 54 | Date 55 | 56 | 2015-03-03 57 | 58 | 59 | 60 | 61 | 62 | References 63 | 64 | 65 | Reference 66 | 67 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format 68 | 69 | 70 | 71 | Reference 72 | 73 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 74 | 75 | 76 | 77 | Reference 78 | 79 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 80 | 81 | 82 | 83 | Reference 84 | 85 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 86 | 87 | 88 | 89 | Reference 90 | 91 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 92 | 93 | 94 | 95 | Reference 96 | 97 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 98 | 99 | 100 | 101 | 102 | 103 | MediaTypes 104 | 105 | application/x-dtbncx+xml 106 | application/xhtml+xml 107 | image/png 108 | image/jpeg 109 | text/css 110 | application/vnd.adobe-page-template+xml 111 | 112 | 113 | 114 | 115 | 116 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_foreign_resource_no_fallback.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:18:48+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Not well-formed 12 | 13 | MED-003, ERROR, [Non-standard image resource of type image/jp2 found.], OEBPS/Text/pdfMigration.html (20-63) 14 | 15 | application/epub+zip 16 | 17 | 18 | CharacterCount 19 | 20 | 4520 21 | 22 | 23 | 24 | Language 25 | 26 | en 27 | 28 | 29 | 30 | Info 31 | 32 | 33 | Identifier 34 | 35 | 
urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 36 | 37 | 38 | 39 | CreationDate 40 | 41 | 2015-06-02T16:34:06Z 42 | 43 | 44 | 45 | Title 46 | 47 | When (not) to migrate a PDF to PDF/A 48 | 49 | 50 | 51 | Creator 52 | 53 | Johan van der Knijff 54 | 55 | 56 | 57 | Date 58 | 59 | 2015-03-03 60 | 61 | 62 | 63 | 64 | 65 | References 66 | 67 | 68 | Reference 69 | 70 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format 71 | 72 | 73 | 74 | Reference 75 | 76 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 77 | 78 | 79 | 80 | Reference 81 | 82 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 83 | 84 | 85 | 86 | Reference 87 | 88 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 89 | 90 | 91 | 92 | Reference 93 | 94 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 95 | 96 | 97 | 98 | Reference 99 | 100 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 101 | 102 | 103 | 104 | 105 | 106 | MediaTypes 107 | 108 | application/x-dtbncx+xml 109 | application/xhtml+xml 110 | image/jp2 111 | image/jpeg 112 | 113 | 114 | 115 | 116 | 117 | -------------------------------------------------------------------------------- /pubresources/pdfMigration.md: -------------------------------------------------------------------------------- 1 | # When (not) to migrate a PDF to PDF/A 2 | 3 | It is well-known that PDF documents can contain features that are preservation risks (e.g. see [here](https://web.archive.org/web/20130515073645/http://libraries.stackexchange.com/questions/964/what-preservation-risks-are-associated-with-the-pdf-file-format) and [here](http://wiki.opf-labs.org/display/TR/Portable+Document+Format)). Migration of existing *PDF*s to *PDF/A* is sometimes advocated as a strategy for mitigating these risks. 
However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.

## *PDF/A* is a profile
First, it's important to stress that each of the *PDF/A* standards (*A-1*, *A-2* and *A-3*) is really just a *profile* within the *PDF* format. More specifically, *PDF/A-1* offers a subset of [*PDF 1.4*](http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf), whereas *PDF/A-2* and *PDF/A-3* are based on [the ISO 32000 version of *PDF 1.7*](http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf). What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' *PDF*. Also, they narrow down the way other features are implemented, for example by requiring that all fonts are embedded in the document. This can be illustrated with the simple Venn diagram below, which shows the feature sets of the aforementioned *PDF* flavours:

![PDF Venn diagram](http://www.openplanetsfoundation.org/system/files/pdfVenn.png)

Here we see how *PDF/A-1* is a subset of *PDF 1.4*, which in turn is a subset of *PDF 1.7*. *PDF/A-2* and *PDF/A-3* (aggregated here as one entity for the sake of readability) are subsets of *PDF 1.7*, and include all the features of *PDF/A-1*.

Keeping this in mind, it's easy to see that migrating an arbitrary *PDF* to *PDF/A* can result in problems.

## Loss, alteration during migration
Suppose, as an example, that we have a *PDF* that contains a movie. This is prohibited in *PDF/A*, so migrating to *PDF/A* will simply result in the loss of the multimedia content. Fonts are another example: all fonts in a *PDF/A* document must be embedded. But what happens if the source *PDF* uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently use some alternative, perhaps similar font? And how do you check for this?

## Complexity and effect of errors
Also, migrations like these typically involve a complete re-processing of the *PDF*'s internal structure. The format's complexity implies that there's a lot of potential for things to go wrong in this process. This is particularly true if the source *PDF* contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a [sufficiently reliable *PDF* validator](http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf)), these cases can be difficult to deal with. Some further considerations can be found [here](http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation) (the context there is slightly different, but the risks are similar).

## Digitised vs born-digital
The origin of the source *PDF*s may be another thing to take into account. If *PDF*s were originally created as part of a digitisation project (e.g. scanned books), the *PDF* is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such *PDF*s to *PDF/A* is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in *PDF/A*. At the same time, this also means that the benefits of migrating such files to *PDF/A* are pretty limited, since the source *PDF*s weren't problematic to begin with!

The potential benefits of *PDF/A* may be more obvious for a lot of born-digital content; however, for the reasons listed in the previous section, the migration is more complex, and there's just a lot more that can go wrong (see also [here](http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21) for some additional considerations).

## Conclusions
Although migrating *PDF* documents to *PDF/A* may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to be successful for source *PDF*s that weren't problematic to begin with, which defeats the very purpose of migrating to *PDF/A*. For specific cases, migration to *PDF/A* may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of *PDF*s (both source *and* destination!), it would also seem prudent to always keep the originals.

--------------------------------------------------------------------------------
/pubresources/pdfMigration.html:
--------------------------------------------------------------------------------

When (not) to migrate a PDF to PDF/A


It is well known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.


PDF/A is a profile


First, it's important to stress that each of the PDF/A standards (A-1, A-2 and A-3) is really just a profile within the PDF format. More specifically, PDF/A-1 offers a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. Also, they narrow down the way other features are implemented, for example by requiring that all fonts are embedded in the document. This can be illustrated with the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:

PDF Venn diagram

Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.


Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems.
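
The subset relationship can be made concrete with a toy model. The sketch below is only an illustration: the feature labels are the three examples named in the text, not an exhaustive list from the PDF/A specifications, and the function name `features_at_risk` is made up for this example.

```python
# Illustrative labels only: the real PDF/A specifications prohibit or
# restrict many more features than the three examples used here.
PROHIBITED_IN_PDF_A = frozenset({"multimedia", "encryption", "interactive content"})


def features_at_risk(source_features):
    """Return the features of a source PDF that fall outside the PDF/A profile."""
    return sorted(set(source_features) & PROHIBITED_IN_PDF_A)


if __name__ == "__main__":
    # A hypothetical source document that uses two prohibited features.
    source = {"text", "images", "multimedia", "encryption"}
    print(features_at_risk(source))
```

Anything the function returns is something a migration has to drop or alter; an empty list is the case where the source already sits inside the profile.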


Loss, alteration during migration


Suppose, as an example, that we have a PDF that contains a movie. This is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the multimedia content. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently use some alternative, perhaps similar font? And how do you check for this?
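
One way to approach that last question is to inspect the font dictionaries of the source PDF before migrating. The sketch below is a hedged illustration, not a complete checker. It assumes the font dictionaries have already been read and fully resolved into plain Python dicts by some PDF library (the loading step is not shown), and the names `is_embedded` and `non_embedded_fonts` are made up for this example. It relies on the fact that an embedded font program is stored under `/FontFile`, `/FontFile2` or `/FontFile3` in the font descriptor.

```python
# Keys under which a PDF font descriptor stores an embedded font program.
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def is_embedded(font):
    """Return True if a (resolved) font dictionary carries its own glyph data."""
    subtype = font.get("/Subtype")
    if subtype == "/Type3":
        # Type 3 glyphs are defined by content streams inside the PDF itself.
        return True
    if subtype == "/Type0":
        # Composite fonts keep the font descriptor on their descendant font.
        descendants = font.get("/DescendantFonts", [])
        return bool(descendants) and all(is_embedded(d) for d in descendants)
    descriptor = font.get("/FontDescriptor", {})
    return any(key in descriptor for key in FONT_FILE_KEYS)


def non_embedded_fonts(fonts):
    """Return the sorted base font names of all fonts without embedded data."""
    return sorted(
        font.get("/BaseFont", "<unnamed>") for font in fonts if not is_embedded(font)
    )


if __name__ == "__main__":
    # Hypothetical font dictionaries, shaped like the ones found in a PDF.
    fonts = [
        {"/Subtype": "/Type1", "/BaseFont": "/Courier"},
        {
            "/Subtype": "/TrueType",
            "/BaseFont": "/ABCDEF+SomeFont",
            "/FontDescriptor": {"/FontFile2": b""},
        },
    ]
    print(non_embedded_fonts(fonts))
```

A non-empty result before migration flags the documents for which the tool will have to find a substitute font, which is exactly the case where silent substitution is worth checking for afterwards.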


Complexity and effect of errors


Also, migrations like these typically involve a complete re-processing of the PDF's internal structure. The format's complexity implies that there's a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).


Digitised vs born-digital


The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are pretty limited, since the source PDFs weren't problematic to begin with!


The potential benefits of PDF/A may be more obvious for a lot of born-digital content; however, for the reasons listed in the previous section, the migration is more complex, and there's just a lot more that can go wrong (see also here for some additional considerations).


Conclusions


Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to be successful for source PDFs that weren't problematic to begin with, which defeats the very purpose of migrating to PDF/A. For specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.

22 | -------------------------------------------------------------------------------- /epubcheckout/4.0.1/epub20_missingfontresource.xml: -------------------------------------------------------------------------------- 1 | 2 | 6 | 2017-08-29T12:18:55+02:00 7 | 8 | 2015-06-02T16:34:06Z 9 | application/epub+zip 10 | 2.0.1 11 | Not well-formed 12 | 13 | RSC-007, ERROR, [Referenced resource could not be found in the EPUB.], OEBPS/stylesheet.css (6-1) 14 | 15 | application/epub+zip 16 | 17 | 18 | CharacterCount 19 | 20 | 4523 21 | 22 | 23 | 24 | Language 25 | 26 | en 27 | 28 | 29 | 30 | Info 31 | 32 | 33 | Identifier 34 | 35 | urn:uuid:f930f4b3-cba2-42ba-ab26-d49438ab00d6 36 | 37 | 38 | 39 | CreationDate 40 | 41 | 2015-06-02T16:34:06Z 42 | 43 | 44 | 45 | Title 46 | 47 | When (not) to migrate a PDF to PDF/A 48 | 49 | 50 | 51 | Creator 52 | 53 | Johan van der Knijff 54 | 55 | 56 | 57 | Date 58 | 59 | 2015-03-03 60 | 61 | 62 | 63 | 64 | 65 | Fonts 66 | 67 | 68 | Font 69 | 70 | 71 | FontName 72 | 73 | Courier 74 | 75 | 76 | 77 | FontFile 78 | 79 | false 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | References 88 | 89 | 90 | Reference 91 | 92 | http://wiki.opf-labs.org/display/TR/Portable+Document+Format 93 | 94 | 95 | 96 | Reference 97 | 98 | http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf 99 | 100 | 101 | 102 | Reference 103 | 104 | http://acroeng.adobe.com/PDFReference/ISO32000/PDF32000-Adobe.pdf 105 | 106 | 107 | 108 | Reference 109 | 110 | http://duff-johnson.com/wp-content/uploads/2014/01/PDFValidationDreamOrYawn.pdf 111 | 112 | 113 | 114 | Reference 115 | 116 | http://web.archive.org/web/20130605142355/http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation 117 | 118 | 119 | 120 | Reference 121 | 122 | http://qanda.digipres.org/19/what-are-the-benefits-and-risks-of-using-the-pdf-a-file-format?show=21#a21 123 | 124 | 125 | 126 | 127 | 128 | MediaTypes 129 | 130 | application/x-dtbncx+xml 131 | 
application/xhtml+xml 132 | image/png 133 | image/jpeg 134 | text/css 135 | 136 | 137 | 138 | 139 | 140 | -------------------------------------------------------------------------------- /content/epub20_minimal/OEBPS/Text/pdfMigration.html: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |

When (not) to migrate a PDF to PDF/A


It is well known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.

PDF/A is a profile

First, it's important to stress that each of the PDF/A standards (A-1, A-2 and A-3) is really just a profile within the PDF format. More specifically, PDF/A-1 is a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. They also narrow down the way other features are implemented, for example by requiring that all fonts are embedded in the document. This can be illustrated with the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:

[Figure: PDF Venn diagram]

Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.
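
The subset relations in the diagram can be mimicked with plain Python sets. This is only a toy sketch: the feature names are illustrative placeholders rather than an inventory taken from the PDF or PDF/A specifications, and `features_lost` is a hypothetical helper.

```python
# Toy model of the Venn diagram; feature names are illustrative assumptions,
# not a complete or authoritative list from the specifications.
PDF_A1 = frozenset({"text", "images", "embedded_fonts"})
PDF_14 = PDF_A1 | {"non_embedded_fonts", "encryption", "multimedia"}
PDF_A2_A3 = PDF_A1 | {"jpeg2000", "layers"}
PDF_17 = PDF_14 | PDF_A2_A3 | {"3d_artwork"}


def features_lost(source_features, target_profile):
    """Return the features of a source document that the target profile does not allow."""
    return set(source_features) - set(target_profile)


# The subset relations described in the text hold for this toy model.
assert PDF_A1 <= PDF_14 <= PDF_17
assert PDF_A1 <= PDF_A2_A3 <= PDF_17

# A hypothetical source PDF with a movie and a non-embedded font.
source = {"text", "images", "multimedia", "non_embedded_fonts"}
print(sorted(features_lost(source, PDF_A1)))
```

Anything returned by `features_lost` is content that a migration to the target profile would have to drop or alter.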

Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems.

Loss, alteration during migration

Suppose, as an example, that we have a PDF that contains a movie. This is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the multimedia content. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently use some alternative, perhaps similar font? And how do you check for this?
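
One way to approach that last question is to inspect the font descriptors of the source PDF before migrating. In PDF, an embedded font program is referenced from the font descriptor through a `FontFile`, `FontFile2` or `FontFile3` entry. The sketch below assumes the descriptors have already been extracted into plain dictionaries by some PDF library (that step is not shown, and `find_non_embedded_fonts` is a hypothetical helper), so it illustrates the check rather than a complete tool.

```python
# Keys that reference an embedded font program in a PDF font descriptor.
EMBEDDED_FONT_KEYS = ("FontFile", "FontFile2", "FontFile3")


def find_non_embedded_fonts(fonts):
    """Return the names of fonts whose descriptor references no embedded font program.

    `fonts` is an assumed, simplified representation: a list of dicts with a
    "name" and an optional "descriptor" dict, as a PDF library might expose them.
    """
    missing = []
    for font in fonts:
        descriptor = font.get("descriptor") or {}
        if not any(key in descriptor for key in EMBEDDED_FONT_KEYS):
            missing.append(font["name"])
    return missing


# Hypothetical example data, not taken from a real file.
fonts = [
    {"name": "Courier", "descriptor": None},
    {"name": "ABCDEF+Minion", "descriptor": {"FontFile3": "<stream>"}},
]
print(find_non_embedded_fonts(fonts))
```

Fonts flagged this way are the ones a migration tool would have to substitute or fail on. Type 3 fonts, which define their glyphs inside the PDF itself, would need separate handling.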

Complexity and effect of errors

Also, migrations like these typically involve a complete re-processing of the PDF's internal structure. The format's complexity implies that there's a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).

Digitised vs born-digital

The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are pretty limited, since the source PDFs weren't problematic to begin with!

The potential benefits of PDF/A may be more obvious for a lot of born-digital content; however, for the reasons listed in the previous section, the migration is more complex, and there's just a lot more that can go wrong (see also here for some additional considerations).

Conclusions

Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to be successful for source PDFs that weren't problematic to begin with, which belies the very purpose of migrating to PDF/A. For specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.

-------------------------------------------------------------------------------- /content/epub20__invalid_entity/OEBPS/Text/pdfMigration.html: --------------------------------------------------------------------------------

When (not) to migrate a PDF to PDF/A

It is well-known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.

PDF/A is a profile

First, it's important to stress that each of the PDF/A standards (A-1, A-2 and A-3) is really just a profile within the PDF format. More specifically, PDF/A-1 is a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. They also narrow down the way other features are implemented, for example by requiring that all fonts are embedded in the document. This can be illustrated with the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:

[Figure: PDF Venn diagram]

Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.

Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems.

Loss, alteration during migration

Suppose, as an example, that we have a PDF that contains a movie. This is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the multimedia content. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently use some alternative, perhaps similar font? And how do you check for this?

Complexity and effect of errors

Also, migrations like these typically involve a complete re-processing of the PDF's internal structure. The format's complexity implies that there's a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).

Digitised vs born-digital

The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are pretty limited, since the source PDFs weren't problematic to begin with!

The potential benefits of PDF/A may be more obvious for a lot of born-digital content; however, for the reasons listed in the previous section, the migration is more complex, and there's just a lot more that can go wrong (see also here for some additional considerations).

Conclusions

Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to be successful for source PDFs that weren't problematic to begin with, which belies the very purpose of migrating to PDF/A. For specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.

-------------------------------------------------------------------------------- /content/epub20_foreign_resource_no_fallback/OEBPS/Text/pdfMigration.html: --------------------------------------------------------------------------------

When (not) to migrate a PDF to PDF/A

It is well-known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.

PDF/A is a profile

First, it's important to stress that each of the PDF/A standards (A-1, A-2 and A-3) is really just a profile within the PDF format. More specifically, PDF/A-1 is a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. They also narrow down the way other features are implemented, for example by requiring that all fonts are embedded in the document. This can be illustrated with the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:

[Figure: PDF Venn diagram]

Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.

Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems.

Loss, alteration during migration

Suppose, as an example, that we have a PDF that contains a movie. This is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the multimedia content. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently use some alternative, perhaps similar font? And how do you check for this?

Complexity and effect of errors

Also, migrations like these typically involve a complete re-processing of the PDF's internal structure. The format's complexity implies that there's a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).

Digitised vs born-digital

The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are pretty limited, since the source PDFs weren't problematic to begin with!

The potential benefits of PDF/A may be more obvious for a lot of born-digital content; however, for the reasons listed in the previous section, the migration is more complex, and there's just a lot more that can go wrong (see also here for some additional considerations).

Conclusions

Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to be successful for source PDFs that weren't problematic to begin with, which belies the very purpose of migrating to PDF/A. For specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.

-------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback_noID/OEBPS/Text/pdfMigration.html: --------------------------------------------------------------------------------

When (not) to migrate a PDF to PDF/A

It is well-known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.

PDF/A is a profile

First, it's important to stress that each of the PDF/A standards (A-1, A-2 and A-3) is really just a profile within the PDF format. More specifically, PDF/A-1 is a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. They also narrow down the way other features are implemented, for example by requiring that all fonts are embedded in the document. This can be illustrated with the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:

[Figure: PDF Venn diagram]

Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.

Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems.

Loss, alteration during migration

Suppose, as an example, that we have a PDF that contains a movie. This is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the multimedia content. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently use some alternative, perhaps similar font? And how do you check for this?

Complexity and effect of errors

Also, migrations like these typically involve a complete re-processing of the PDF's internal structure. The format's complexity implies that there's a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).

Digitised vs born-digital

The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are pretty limited, since the source PDFs weren't problematic to begin with!

The potential benefits of PDF/A may be more obvious for a lot of born-digital content; however, for the reasons listed in the previous section, the migration is more complex, and there's just a lot more that can go wrong (see also here for some additional considerations).

Conclusions

Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to be successful for source PDFs that weren't problematic to begin with, which belies the very purpose of migrating to PDF/A. For specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.

-------------------------------------------------------------------------------- /content/epub20_foreign_resource_with_fallback/OEBPS/Text/pdfMigration.html: --------------------------------------------------------------------------------

When (not) to migrate a PDF to PDF/A

It is well-known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.

PDF/A is a profile

First, it's important to stress that each of the PDF/A standards (A-1, A-2 and A-3) is really just a profile within the PDF format. More specifically, PDF/A-1 is a subset of PDF 1.4, whereas PDF/A-2 and PDF/A-3 are based on the ISO 32000 version of PDF 1.7. What these profiles have in common is that they prohibit some features (e.g. multimedia, encryption, interactive content) that are allowed in 'regular' PDF. They also narrow down the way other features are implemented, for example by requiring that all fonts are embedded in the document. This can be illustrated with the simple Venn diagram below, which shows the feature sets of the aforementioned PDF flavours:

[Figure: PDF Venn diagram]

Here we see how PDF/A-1 is a subset of PDF 1.4, which in turn is a subset of PDF 1.7. PDF/A-2 and PDF/A-3 (aggregated here as one entity for the sake of readability) are subsets of PDF 1.7, and include all the features of PDF/A-1.

Keeping this in mind, it's easy to see that migrating an arbitrary PDF to PDF/A can result in problems.

Loss, alteration during migration

Suppose, as an example, that we have a PDF that contains a movie. This is prohibited in PDF/A, so migrating to PDF/A will simply result in the loss of the multimedia content. Fonts are another example: all fonts in a PDF/A document must be embedded. But what happens if the source PDF uses non-embedded fonts that are not available on the machine on which the migration is run? Will the migration tool exit with a warning, or will it silently use some alternative, perhaps similar font? And how do you check for this?

Complexity and effect of errors

Also, migrations like these typically involve a complete re-processing of the PDF's internal structure. The format's complexity implies that there's a lot of potential for things to go wrong in this process. This is particularly true if the source PDF contains subtle errors, in which case the risk of losing information is very real (even though the original document may be perfectly readable in a viewer). Since we don't really have any tools for detecting such errors (i.e. a sufficiently reliable PDF validator), these cases can be difficult to deal with. Some further considerations can be found here (the context there is slightly different, but the risks are similar).

Digitised vs born-digital

The origin of the source PDFs may be another thing to take into account. If PDFs were originally created as part of a digitisation project (e.g. scanned books), the PDF is usually little more than a wrapper around a bunch of images, perhaps augmented by an OCR layer. Migrating such PDFs to PDF/A is pretty straightforward, since the source files are unlikely to contain any features that are not allowed in PDF/A. At the same time, this also means that the benefits of migrating such files to PDF/A are pretty limited, since the source PDFs weren't problematic to begin with!

The potential benefits of PDF/A may be more obvious for a lot of born-digital content; however, for the reasons listed in the previous section, the migration is more complex, and there's just a lot more that can go wrong (see also here for some additional considerations).

Conclusions

Although migrating PDF documents to PDF/A may look superficially attractive, it is actually quite risky in practice, and it may easily result in unintentional data loss. Moreover, the risks increase with the number of preservation-unfriendly features, meaning that the migration is most likely to be successful for source PDFs that weren't problematic to begin with, which belies the very purpose of migrating to PDF/A. For specific cases, migration to PDF/A may still be a sensible approach, but the expected benefits should be weighed carefully against the risks. In the absence of stable, generally accepted tools for assessing the quality of PDFs (both source and destination!), it would also seem prudent to always keep the originals.

--------------------------------------------------------------------------------