├── .idea └── vcs.xml ├── LICENSE ├── README.md ├── cop-backend.md ├── cop-solr-fields.md ├── data-crunch-demos ├── date-demo.xsl └── settlement-demo.xsl ├── dka ├── README.md ├── ese2dka.xsl ├── harvested_records │ └── README.md ├── oai_kb_oriental.xml ├── schemas │ ├── DKA.xsd │ └── DKA2.xsd └── specimen │ └── specimen.xml ├── examples.md ├── form-demos ├── README.md ├── adl-form.html ├── cop-form.html └── cop-solr-form.html ├── geographic-data.md ├── image-delivery.md ├── kml-viewer.html ├── links.md ├── metadata-formats.md ├── nielsen-query.kml ├── oai-pmh.md ├── subject203.kml ├── subject203.rss ├── text-corpora.md └── web-service-architecture.png /.idea/vcs.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | GNU Free Documentation License 3 | Version 1.3, 3 November 2008 4 | 5 | 6 | Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. 7 | 8 | Everyone is permitted to copy and distribute verbatim copies 9 | of this license document, but changing it is not allowed. 10 | 11 | 0. PREAMBLE 12 | 13 | The purpose of this License is to make a manual, textbook, or other 14 | functional and useful document "free" in the sense of freedom: to 15 | assure everyone the effective freedom to copy and redistribute it, 16 | with or without modifying it, either commercially or noncommercially. 17 | Secondarily, this License preserves for the author and publisher a way 18 | to get credit for their work, while not being considered responsible 19 | for modifications made by others. 20 | 21 | This License is a kind of "copyleft", which means that derivative 22 | works of the document must themselves be free in the same sense. 
It 23 | complements the GNU General Public License, which is a copyleft 24 | license designed for free software. 25 | 26 | We have designed this License in order to use it for manuals for free 27 | software, because free software needs free documentation: a free 28 | program should come with manuals providing the same freedoms that the 29 | software does. But this License is not limited to software manuals; 30 | it can be used for any textual work, regardless of subject matter or 31 | whether it is published as a printed book. We recommend this License 32 | principally for works whose purpose is instruction or reference. 33 | 34 | 35 | 1. APPLICABILITY AND DEFINITIONS 36 | 37 | This License applies to any manual or other work, in any medium, that 38 | contains a notice placed by the copyright holder saying it can be 39 | distributed under the terms of this License. Such a notice grants a 40 | world-wide, royalty-free license, unlimited in duration, to use that 41 | work under the conditions stated herein. The "Document", below, 42 | refers to any such manual or work. Any member of the public is a 43 | licensee, and is addressed as "you". You accept the license if you 44 | copy, modify or distribute the work in a way requiring permission 45 | under copyright law. 46 | 47 | A "Modified Version" of the Document means any work containing the 48 | Document or a portion of it, either copied verbatim, or with 49 | modifications and/or translated into another language. 50 | 51 | A "Secondary Section" is a named appendix or a front-matter section of 52 | the Document that deals exclusively with the relationship of the 53 | publishers or authors of the Document to the Document's overall 54 | subject (or to related matters) and contains nothing that could fall 55 | directly within that overall subject. (Thus, if the Document is in 56 | part a textbook of mathematics, a Secondary Section may not explain 57 | any mathematics.) 
The relationship could be a matter of historical 58 | connection with the subject or with related matters, or of legal, 59 | commercial, philosophical, ethical or political position regarding 60 | them. 61 | 62 | The "Invariant Sections" are certain Secondary Sections whose titles 63 | are designated, as being those of Invariant Sections, in the notice 64 | that says that the Document is released under this License. If a 65 | section does not fit the above definition of Secondary then it is not 66 | allowed to be designated as Invariant. The Document may contain zero 67 | Invariant Sections. If the Document does not identify any Invariant 68 | Sections then there are none. 69 | 70 | The "Cover Texts" are certain short passages of text that are listed, 71 | as Front-Cover Texts or Back-Cover Texts, in the notice that says that 72 | the Document is released under this License. A Front-Cover Text may 73 | be at most 5 words, and a Back-Cover Text may be at most 25 words. 74 | 75 | A "Transparent" copy of the Document means a machine-readable copy, 76 | represented in a format whose specification is available to the 77 | general public, that is suitable for revising the document 78 | straightforwardly with generic text editors or (for images composed of 79 | pixels) generic paint programs or (for drawings) some widely available 80 | drawing editor, and that is suitable for input to text formatters or 81 | for automatic translation to a variety of formats suitable for input 82 | to text formatters. A copy made in an otherwise Transparent file 83 | format whose markup, or absence of markup, has been arranged to thwart 84 | or discourage subsequent modification by readers is not Transparent. 85 | An image format is not Transparent if used for any substantial amount 86 | of text. A copy that is not "Transparent" is called "Opaque". 
87 | 88 | Examples of suitable formats for Transparent copies include plain 89 | ASCII without markup, Texinfo input format, LaTeX input format, SGML 90 | or XML using a publicly available DTD, and standard-conforming simple 91 | HTML, PostScript or PDF designed for human modification. Examples of 92 | transparent image formats include PNG, XCF and JPG. Opaque formats 93 | include proprietary formats that can be read and edited only by 94 | proprietary word processors, SGML or XML for which the DTD and/or 95 | processing tools are not generally available, and the 96 | machine-generated HTML, PostScript or PDF produced by some word 97 | processors for output purposes only. 98 | 99 | The "Title Page" means, for a printed book, the title page itself, 100 | plus such following pages as are needed to hold, legibly, the material 101 | this License requires to appear in the title page. For works in 102 | formats which do not have any title page as such, "Title Page" means 103 | the text near the most prominent appearance of the work's title, 104 | preceding the beginning of the body of the text. 105 | 106 | The "publisher" means any person or entity that distributes copies of 107 | the Document to the public. 108 | 109 | A section "Entitled XYZ" means a named subunit of the Document whose 110 | title either is precisely XYZ or contains XYZ in parentheses following 111 | text that translates XYZ in another language. (Here XYZ stands for a 112 | specific section name mentioned below, such as "Acknowledgements", 113 | "Dedications", "Endorsements", or "History".) To "Preserve the Title" 114 | of such a section when you modify the Document means that it remains a 115 | section "Entitled XYZ" according to this definition. 116 | 117 | The Document may include Warranty Disclaimers next to the notice which 118 | states that this License applies to the Document. 
These Warranty 119 | Disclaimers are considered to be included by reference in this 120 | License, but only as regards disclaiming warranties: any other 121 | implication that these Warranty Disclaimers may have is void and has 122 | no effect on the meaning of this License. 123 | 124 | 2. VERBATIM COPYING 125 | 126 | You may copy and distribute the Document in any medium, either 127 | commercially or noncommercially, provided that this License, the 128 | copyright notices, and the license notice saying this License applies 129 | to the Document are reproduced in all copies, and that you add no 130 | other conditions whatsoever to those of this License. You may not use 131 | technical measures to obstruct or control the reading or further 132 | copying of the copies you make or distribute. However, you may accept 133 | compensation in exchange for copies. If you distribute a large enough 134 | number of copies you must also follow the conditions in section 3. 135 | 136 | You may also lend copies, under the same conditions stated above, and 137 | you may publicly display copies. 138 | 139 | 140 | 3. COPYING IN QUANTITY 141 | 142 | If you publish printed copies (or copies in media that commonly have 143 | printed covers) of the Document, numbering more than 100, and the 144 | Document's license notice requires Cover Texts, you must enclose the 145 | copies in covers that carry, clearly and legibly, all these Cover 146 | Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on 147 | the back cover. Both covers must also clearly and legibly identify 148 | you as the publisher of these copies. The front cover must present 149 | the full title with all words of the title equally prominent and 150 | visible. You may add other material on the covers in addition. 151 | Copying with changes limited to the covers, as long as they preserve 152 | the title of the Document and satisfy these conditions, can be treated 153 | as verbatim copying in other respects. 
154 | 155 | If the required texts for either cover are too voluminous to fit 156 | legibly, you should put the first ones listed (as many as fit 157 | reasonably) on the actual cover, and continue the rest onto adjacent 158 | pages. 159 | 160 | If you publish or distribute Opaque copies of the Document numbering 161 | more than 100, you must either include a machine-readable Transparent 162 | copy along with each Opaque copy, or state in or with each Opaque copy 163 | a computer-network location from which the general network-using 164 | public has access to download using public-standard network protocols 165 | a complete Transparent copy of the Document, free of added material. 166 | If you use the latter option, you must take reasonably prudent steps, 167 | when you begin distribution of Opaque copies in quantity, to ensure 168 | that this Transparent copy will remain thus accessible at the stated 169 | location until at least one year after the last time you distribute an 170 | Opaque copy (directly or through your agents or retailers) of that 171 | edition to the public. 172 | 173 | It is requested, but not required, that you contact the authors of the 174 | Document well before redistributing any large number of copies, to 175 | give them a chance to provide you with an updated version of the 176 | Document. 177 | 178 | 179 | 4. MODIFICATIONS 180 | 181 | You may copy and distribute a Modified Version of the Document under 182 | the conditions of sections 2 and 3 above, provided that you release 183 | the Modified Version under precisely this License, with the Modified 184 | Version filling the role of the Document, thus licensing distribution 185 | and modification of the Modified Version to whoever possesses a copy 186 | of it. In addition, you must do these things in the Modified Version: 187 | 188 | A. 
Use in the Title Page (and on the covers, if any) a title distinct 189 | from that of the Document, and from those of previous versions 190 | (which should, if there were any, be listed in the History section 191 | of the Document). You may use the same title as a previous version 192 | if the original publisher of that version gives permission. 193 | B. List on the Title Page, as authors, one or more persons or entities 194 | responsible for authorship of the modifications in the Modified 195 | Version, together with at least five of the principal authors of the 196 | Document (all of its principal authors, if it has fewer than five), 197 | unless they release you from this requirement. 198 | C. State on the Title page the name of the publisher of the 199 | Modified Version, as the publisher. 200 | D. Preserve all the copyright notices of the Document. 201 | E. Add an appropriate copyright notice for your modifications 202 | adjacent to the other copyright notices. 203 | F. Include, immediately after the copyright notices, a license notice 204 | giving the public permission to use the Modified Version under the 205 | terms of this License, in the form shown in the Addendum below. 206 | G. Preserve in that license notice the full lists of Invariant Sections 207 | and required Cover Texts given in the Document's license notice. 208 | H. Include an unaltered copy of this License. 209 | I. Preserve the section Entitled "History", Preserve its Title, and add 210 | to it an item stating at least the title, year, new authors, and 211 | publisher of the Modified Version as given on the Title Page. If 212 | there is no section Entitled "History" in the Document, create one 213 | stating the title, year, authors, and publisher of the Document as 214 | given on its Title Page, then add an item describing the Modified 215 | Version as stated in the previous sentence. 216 | J. 
Preserve the network location, if any, given in the Document for 217 | public access to a Transparent copy of the Document, and likewise 218 | the network locations given in the Document for previous versions 219 | it was based on. These may be placed in the "History" section. 220 | You may omit a network location for a work that was published at 221 | least four years before the Document itself, or if the original 222 | publisher of the version it refers to gives permission. 223 | K. For any section Entitled "Acknowledgements" or "Dedications", 224 | Preserve the Title of the section, and preserve in the section all 225 | the substance and tone of each of the contributor acknowledgements 226 | and/or dedications given therein. 227 | L. Preserve all the Invariant Sections of the Document, 228 | unaltered in their text and in their titles. Section numbers 229 | or the equivalent are not considered part of the section titles. 230 | M. Delete any section Entitled "Endorsements". Such a section 231 | may not be included in the Modified Version. 232 | N. Do not retitle any existing section to be Entitled "Endorsements" 233 | or to conflict in title with any Invariant Section. 234 | O. Preserve any Warranty Disclaimers. 235 | 236 | If the Modified Version includes new front-matter sections or 237 | appendices that qualify as Secondary Sections and contain no material 238 | copied from the Document, you may at your option designate some or all 239 | of these sections as invariant. To do this, add their titles to the 240 | list of Invariant Sections in the Modified Version's license notice. 241 | These titles must be distinct from any other section titles. 242 | 243 | You may add a section Entitled "Endorsements", provided it contains 244 | nothing but endorsements of your Modified Version by various 245 | parties--for example, statements of peer review or that the text has 246 | been approved by an organization as the authoritative definition of a 247 | standard. 
248 | 249 | You may add a passage of up to five words as a Front-Cover Text, and a 250 | passage of up to 25 words as a Back-Cover Text, to the end of the list 251 | of Cover Texts in the Modified Version. Only one passage of 252 | Front-Cover Text and one of Back-Cover Text may be added by (or 253 | through arrangements made by) any one entity. If the Document already 254 | includes a cover text for the same cover, previously added by you or 255 | by arrangement made by the same entity you are acting on behalf of, 256 | you may not add another; but you may replace the old one, on explicit 257 | permission from the previous publisher that added the old one. 258 | 259 | The author(s) and publisher(s) of the Document do not by this License 260 | give permission to use their names for publicity for or to assert or 261 | imply endorsement of any Modified Version. 262 | 263 | 264 | 5. COMBINING DOCUMENTS 265 | 266 | You may combine the Document with other documents released under this 267 | License, under the terms defined in section 4 above for modified 268 | versions, provided that you include in the combination all of the 269 | Invariant Sections of all of the original documents, unmodified, and 270 | list them all as Invariant Sections of your combined work in its 271 | license notice, and that you preserve all their Warranty Disclaimers. 272 | 273 | The combined work need only contain one copy of this License, and 274 | multiple identical Invariant Sections may be replaced with a single 275 | copy. If there are multiple Invariant Sections with the same name but 276 | different contents, make the title of each such section unique by 277 | adding at the end of it, in parentheses, the name of the original 278 | author or publisher of that section if known, or else a unique number. 279 | Make the same adjustment to the section titles in the list of 280 | Invariant Sections in the license notice of the combined work. 
281 | 282 | In the combination, you must combine any sections Entitled "History" 283 | in the various original documents, forming one section Entitled 284 | "History"; likewise combine any sections Entitled "Acknowledgements", 285 | and any sections Entitled "Dedications". You must delete all sections 286 | Entitled "Endorsements". 287 | 288 | 289 | 6. COLLECTIONS OF DOCUMENTS 290 | 291 | You may make a collection consisting of the Document and other 292 | documents released under this License, and replace the individual 293 | copies of this License in the various documents with a single copy 294 | that is included in the collection, provided that you follow the rules 295 | of this License for verbatim copying of each of the documents in all 296 | other respects. 297 | 298 | You may extract a single document from such a collection, and 299 | distribute it individually under this License, provided you insert a 300 | copy of this License into the extracted document, and follow this 301 | License in all other respects regarding verbatim copying of that 302 | document. 303 | 304 | 305 | 7. AGGREGATION WITH INDEPENDENT WORKS 306 | 307 | A compilation of the Document or its derivatives with other separate 308 | and independent documents or works, in or on a volume of a storage or 309 | distribution medium, is called an "aggregate" if the copyright 310 | resulting from the compilation is not used to limit the legal rights 311 | of the compilation's users beyond what the individual works permit. 312 | When the Document is included in an aggregate, this License does not 313 | apply to the other works in the aggregate which are not themselves 314 | derivative works of the Document. 
315 | 316 | If the Cover Text requirement of section 3 is applicable to these 317 | copies of the Document, then if the Document is less than one half of 318 | the entire aggregate, the Document's Cover Texts may be placed on 319 | covers that bracket the Document within the aggregate, or the 320 | electronic equivalent of covers if the Document is in electronic form. 321 | Otherwise they must appear on printed covers that bracket the whole 322 | aggregate. 323 | 324 | 325 | 8. TRANSLATION 326 | 327 | Translation is considered a kind of modification, so you may 328 | distribute translations of the Document under the terms of section 4. 329 | Replacing Invariant Sections with translations requires special 330 | permission from their copyright holders, but you may include 331 | translations of some or all Invariant Sections in addition to the 332 | original versions of these Invariant Sections. You may include a 333 | translation of this License, and all the license notices in the 334 | Document, and any Warranty Disclaimers, provided that you also include 335 | the original English version of this License and the original versions 336 | of those notices and disclaimers. In case of a disagreement between 337 | the translation and the original version of this License or a notice 338 | or disclaimer, the original version will prevail. 339 | 340 | If a section in the Document is Entitled "Acknowledgements", 341 | "Dedications", or "History", the requirement (section 4) to Preserve 342 | its Title (section 1) will typically require changing the actual 343 | title. 344 | 345 | 346 | 9. TERMINATION 347 | 348 | You may not copy, modify, sublicense, or distribute the Document 349 | except as expressly provided under this License. Any attempt 350 | otherwise to copy, modify, sublicense, or distribute it is void, and 351 | will automatically terminate your rights under this License. 
352 | 353 | However, if you cease all violation of this License, then your license 354 | from a particular copyright holder is reinstated (a) provisionally, 355 | unless and until the copyright holder explicitly and finally 356 | terminates your license, and (b) permanently, if the copyright holder 357 | fails to notify you of the violation by some reasonable means prior to 358 | 60 days after the cessation. 359 | 360 | Moreover, your license from a particular copyright holder is 361 | reinstated permanently if the copyright holder notifies you of the 362 | violation by some reasonable means, this is the first time you have 363 | received notice of violation of this License (for any work) from that 364 | copyright holder, and you cure the violation prior to 30 days after 365 | your receipt of the notice. 366 | 367 | Termination of your rights under this section does not terminate the 368 | licenses of parties who have received copies or rights from you under 369 | this License. If your rights have been terminated and not permanently 370 | reinstated, receipt of a copy of some or all of the same material does 371 | not give you any rights to use it. 372 | 373 | 374 | 10. FUTURE REVISIONS OF THIS LICENSE 375 | 376 | The Free Software Foundation may publish new, revised versions of the 377 | GNU Free Documentation License from time to time. Such new versions 378 | will be similar in spirit to the present version, but may differ in 379 | detail to address new problems or concerns. See 380 | http://www.gnu.org/copyleft/. 381 | 382 | Each version of the License is given a distinguishing version number. 383 | If the Document specifies that a particular numbered version of this 384 | License "or any later version" applies to it, you have the option of 385 | following the terms and conditions either of that specified version or 386 | of any later version that has been published (not as a draft) by the 387 | Free Software Foundation. 
If the Document does not specify a version 388 | number of this License, you may choose any version ever published (not 389 | as a draft) by the Free Software Foundation. If the Document 390 | specifies that a proxy can decide which future versions of this 391 | License can be used, that proxy's public statement of acceptance of a 392 | version permanently authorizes you to choose that version for the 393 | Document. 394 | 395 | 11. RELICENSING 396 | 397 | "Massive Multiauthor Collaboration Site" (or "MMC Site") means any 398 | World Wide Web server that publishes copyrightable works and also 399 | provides prominent facilities for anybody to edit those works. A 400 | public wiki that anybody can edit is an example of such a server. A 401 | "Massive Multiauthor Collaboration" (or "MMC") contained in the site 402 | means any set of copyrightable works thus published on the MMC site. 403 | 404 | "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 405 | license published by Creative Commons Corporation, a not-for-profit 406 | corporation with a principal place of business in San Francisco, 407 | California, as well as future copyleft versions of that license 408 | published by that same organization. 409 | 410 | "Incorporate" means to publish or republish a Document, in whole or in 411 | part, as part of another Document. 412 | 413 | An MMC is "eligible for relicensing" if it is licensed under this 414 | License, and if all works that were first published under this License 415 | somewhere other than this MMC, and subsequently incorporated in whole or 416 | in part into the MMC, (1) had no cover texts or invariant sections, and 417 | (2) were thus incorporated prior to November 1, 2008. 418 | 419 | The operator of an MMC Site may republish an MMC contained in the site 420 | under CC-BY-SA on the same site at any time before August 1, 2009, 421 | provided the MMC is eligible for relicensing. 
422 | 423 | 424 | ADDENDUM: How to use this License for your documents 425 | 426 | To use this License in a document you have written, include a copy of 427 | the License in the document and put the following copyright and 428 | license notices just after the title page: 429 | 430 | Copyright (c) YEAR YOUR NAME. 431 | Permission is granted to copy, distribute and/or modify this document 432 | under the terms of the GNU Free Documentation License, Version 1.3 433 | or any later version published by the Free Software Foundation; 434 | with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. 435 | A copy of the license is included in the section entitled "GNU 436 | Free Documentation License". 437 | 438 | If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, 439 | replace the "with...Texts." line with this: 440 | 441 | with the Invariant Sections being LIST THEIR TITLES, with the 442 | Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. 443 | 444 | If you have Invariant Sections without Cover Texts, or some other 445 | combination of the three, merge those two alternatives to suit the 446 | situation. 447 | 448 | If your document contains nontrivial examples of program code, we 449 | recommend releasing these examples in parallel under your choice of 450 | free software license, such as the GNU General Public License, 451 | to permit their use in free software. 452 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [READ ME](README.md) - [OAI Dissemination](oai-pmh.md) - [Web services in COP](cop-backend.md) - [Aerial Photography](geographic-data.md) - [Image delivery](image-delivery.md) - [Metadata Formats](metadata-formats.md) - [Text Corpora](text-corpora.md) 2 | 3 | # Access digital objects! 
4 | 5 | The Royal Library, Copenhagen, has been digitizing Cultural Heritage 6 | Objects in its collections for well over two decades. More recently, 7 | over the last 8-10 years, we have tried to build our dissemination platforms 8 | using a REST-based architecture. 9 | 10 | This collection of documents describes the various APIs we are using 11 | ourselves to provide access to our data to library patrons, in the 12 | hope that the access points could be useful for a new category of library 13 | patron whose research or studies would benefit from programmatic access 14 | to our digital collections. 15 | 16 | ![Linked data](https://www.w3.org/DesignIssues/diagrams/lod/597992118v2_350x350_Back.jpg) [linked data](https://www.w3.org/DesignIssues/LinkedData.html) 17 | 18 | 19 | ## Licences & Legalese 20 | 21 | The documentation here is provided as is, and mind you: __Everything 22 | that's free comes with no guarantee__. As a matter of fact the 23 | material in __this git repository__ is licensed to you under the 24 | [GNU Free Documentation License](LICENSE) 25 | 26 | The material we __provide access to using the APIs described__ is of two kinds: 27 | 28 | 1. Metadata: This comes to you as [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/). I.e., 29 | our metadata is basically public domain. 30 | 31 | 2. Content: The content in our digital collections is delivered under 32 | various licenses, but the most common one is 33 | [Attribution-NonCommercial-NoDerivs 2.0 Generic](https://creativecommons.org/licenses/by-nc-nd/2.0/). I.e., our content is provided with a much more restrictive CC license. 34 | 35 | ## Caveats 36 | 37 | The APIs described have been used successfully in projects (in some 38 | cases for many years). We, their users & developers, created each of 39 | them to get a job done. They are usually well tested and work 40 | well, but they are not polished, nor do they give helpful error 41 | messages etc.
42 | 43 | The formats we are delivering are to a large extent based on 44 | standards. That means that wherever possible we refer to external 45 | documentation. We guarantee nothing 46 | concerning the external information sources, nor that our data is 47 | strictly valid in relation to those documents. 48 | 49 | ## Services by scope and purpose 50 | 51 | Our services build upon the ideas 52 | 53 | + [COOL URIs](https://www.w3.org/Provider/Style/URI) 54 | 55 | and 56 | 57 | + [linked data](https://www.w3.org/DesignIssues/LinkedData.html) 58 | 59 | as they are presented by Tim Berners-Lee. We don't promise that our 60 | links will persist for 2000 years, but we do our best to keep them and 61 | if we don't we promise to make redirects according to best practise. 62 | 63 | As you will see, we are slightly better than three on Berners-Lee's 64 | five-grade scale. We do not provide access to RDF and SPARQL, but to RSS 65 | and OpenSearch. Few provide access to RDF these days, so perhaps W3C 66 | should redesign the cop (see above). 67 | 68 | Neither do we link to external sources. 69 | 70 | A brief introduction to the characters in the story 71 | 72 | + [Digital Editions - COP](http://www.kb.dk/editions/any/2009/jul/editions/en/), an acronym for which one possible interpretation could be **C**ommon **O**bject **P**ublishing platform. The word _Common_ would refer to its being a platform shared between different collections and media types. However, when it was released we intentionally never gave the service a brand name.
73 | + [Aerial Photography collection - DSFL](http://www.kb.dk/danmarksetfraluften/) (Danmark Set Fra Luften) 74 | + [Rex - our Integrated Library System](http://rex.kb.dk/) and Aleph - our Online public access catalog (OPAC) 75 | + National Aggregator - A system run by us aggregating material from Danish libraries, museums and archives on behalf of Europeana 76 | + [Archive for Danish Literature](http://www.adl.dk/), ADL 77 | 78 | ### Dissemination of metadata 79 | 80 | The purpose of our dissemination is to enable us to synchronize data 81 | between our own systems, but also to share our data with Cultural 82 | Heritage communities at large. The dissemination services are aimed at 83 | aggregator services of various kinds. Our dissemination API is [OAI-PMH](oai-pmh.md) 84 | 85 | ### Search and retrieval 86 | 87 | The metadata and objects in COP are accessible from two front ends 88 | 89 | + [Digital Editions](http://www.kb.dk/editions/any/2009/jul/editions/en/) 90 | + Aerial Photography Collection, [Danmark set fra Luften](http://www.kb.dk/danmarksetfraluften/) (DSFL) 91 | 92 | Both services use the same web service, but DSFL has some 93 | geographical extensions in order to run its map based interface 94 | 95 | + [General COP web services](cop-backend.md) 96 | + [Geographical extensions](geographic-data.md) 97 | 98 | ### Metadata formats 99 | 100 | Through COP we deliver metadata to various services, in [different 101 | formats](metadata-formats.md). 102 | 103 | ### Content formats 104 | 105 | Virtually all content in COP is delivered as images; delivery is 106 | described in a [separate document](image-delivery.md). 107 | 108 | ### Text and literature 109 | 110 | Currently the whole [Archive for Danish Literature is available](text-corpora.md).
111 | 112 | ## Acknowledgements 113 | 114 | Andreas Borchsenius West contributed information about feeds and the 115 | JSON service used for maps, notably our Aerial photography collection 116 | Danmark Set Fra Luften (DSFL). 117 | -------------------------------------------------------------------------------- /cop-backend.md: -------------------------------------------------------------------------------- 1 | [READ ME](README.md) - [OAI Dissemination](oai-pmh.md) - [Web services in COP](cop-backend.md) - [Aerial Photography](geographic-data.md) - [Image delivery](image-delivery.md) - [Metadata Formats](metadata-formats.md) - [Text Corpora](text-corpora.md) 2 | 3 | # Web services in COP 4 | 5 | ## Web service architecture 6 | 7 | ![COP Architecture](web-service-architecture.png) 8 | 9 | ## Syndication service formats 10 | 11 | The syndication service is made for search and retrieval. By default it 12 | delivers a search result set (or a search for a record ID) in [rss 13 | 2.0](https://cyber.harvard.edu/rss/rss.html). 14 | The mods and kml formats are supported as well. You can "toggle" between these formats using the `format` CGI 15 | parameter. 16 | 17 | | format | root element | 18 | |:-------|:-------------| 19 | | rss | ``` ... ```| 20 | | mods | ``` ... ``` | 21 | | kml | ``` ... ``` | 22 | 23 | The formats rss and kml are used internally in our services. All search and retrieval in 24 | [Digital Editions - COP](http://www.kb.dk/editions/any/2009/jul/editions/en/) 25 | is based on the former; the latter is translated to ```json``` and is then used in the 26 | client side rendering of maps in [DSFL](http://www.kb.dk/danmarksetfraluften/) 27 | 28 | Note that mods records are embedded in the rss. 29 | 30 | ## SOLR Search 31 | 32 | It is now possible to use our public [SOLR index for searching](cop-solr-fields.md). We have [a brief demo](http://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/cop-solr-form.html) on how to use it.
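
Returning to the `format` parameter of the syndication service: the following is a minimal sketch of how a script might switch an existing syndication URL between rss, mods and kml. The helper name `with_format` is hypothetical and not part of COP; the URL is one of the example URLs in this document, and the sketch only builds the URL, it does not contact the server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The three formats named in the syndication format table.
SUPPORTED_FORMATS = ("rss", "mods", "kml")


def with_format(url: str, fmt: str = "rss") -> str:
    """Return `url` with its `format` query parameter set to `fmt`.

    Hypothetical helper for illustration; it only rewrites the query string.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    parts = urlsplit(url)
    # Keep every existing parameter except a previous `format`.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    params.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(params)))


example = (
    "http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/en/"
    "?page=2&query=jesus&itemsPerPage=40"
)
print(with_format(example, "kml"))
```

The resulting URL can be fetched with any HTTP client. Whether a given edition returns useful kml depends on whether its records carry geographic data, which this sketch does not check.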
33 | 34 | ## Open Search 35 | 36 | Clients communicates with the server with Amazon A9.com [Open 37 | search](http://www.opensearch.org/Home) protocol. 38 | 39 | All formats supported include a Open search header (used for 40 | resultset navigation) at the very beginning of each result set: 41 | 42 | ``` 43 | 1 44 | 40 45 | 46 | 104820 47 | 51 | 54 | ``` 55 | 56 | This search result starts with record 1 out of 104820, you obtain them in chunks of 40 57 | items. You can adjust your retrieval using the `page` and `itemsPerPage` [CGI variables](cop-backend.md#query-parameters-in-cop) 58 | 59 | ## Examples 60 | 61 | + search "by subject", searching by navigation -- examples 62 | + http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/en/?page=1&subject=2120&itemsPerPage=40 63 | which is synonymous with 64 | + http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/subject2120/en/?page=1 65 | The result set can be further molded by the itemsPerPage parameter. For example 66 | + http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/subject2109/en/?itemsPerPage=5 67 | if you need page 4 in the result set add that to the `URI` 68 | + http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/subject2109/en/?itemsPerPage=5&page=4 69 | + search "by querying", ordinary search -- for example 70 | http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/en/?page=2&query=jesus&itemsPerPage=40 71 | + You can search for anything in any of the editions using a search form 72 | http://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/cop-form.html 73 | 74 | To see what is going on in these examples, you have to view source, or 75 | you just see how your browser renders RSS. 
If you're using a computer
supporting a Unix command line, such as macOS or Linux, it is actually
easier to use the `xmllint` command:

`xmllint --format "http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/subject2109/en/?itemsPerPage=5&page=4"`

Don't forget the quotation marks around the URI, or your shell will
try to do clever things with the meta-characters (?& etc). The `--format`
option ensures that the retrieved document is nicely indented and
pretty-printed.

## Query parameters in COP

Complete list of supported parameters in COP

| Parameter | Use in PATH | Use in CGI | Comment |
|-----------|:-------------|:------------|:--------|
|query | - | supported | queries all fields |
|searchWide | - | supported | 'true' if searching all fields, 'false' otherwise |
|format | - | supported | kml, rss and mods |
|page | - | supported | |
|itemsPerPage | - | supported | |
|object | supported | - | |
|subject | supported | supported | deprecated |
|bbo | - | supported | Bounding Box |
|notBefore | - | supported | Not before a given date in ISO format YYYY-MM-DD |
|notAfter | - | supported | Not after a given date in ISO format YYYY-MM-DD |

Field candidates, mostly geographical ones

| Parameter | Used in PATH | Used in CGI | Comment | Status|
|-----------|:-------------|:------------|:--------|-------|
|title | - | supported |||
|person | - | supported |||
|building | - | supported |||
|parish | - | supported |||
|street | - | supported |||
|housenumber | - | supported |||
|zipcode | - | supported |||
|cadastre | - | supported |||
|area | - | supported |||
|city | - | supported |||
|location | - | supported |||
|note | - | supported |||
|orientation | - | supported |||

We have more [detailed information on
geographical searching](geographic-data.md). 122 | 123 | ## Navigation service 124 | 125 | The subject hierarchy needed for filtering and for building the 126 | browsing service can be retrieved from the navigation service. Links 127 | to RSS & HTML representations of the data -- ex 128 | 129 | http://www.kb.dk/cop/navigation/images/billed/2010/okt/billeder/subject841/ 130 | 131 | Here note that you get all nodes in the subject tree down to _kirker 132 | og kirkegårde_, i.e., a complete bread crumb path: 133 | 134 | _Billeder_ - _Samlinger_ - _Fotografarkiver_ - _Türck, Sven_ - _arkitektur_ 135 | 136 | Architecture is the parent of _kirker og kirkegårde_ 137 | 138 | ``` 139 | 145 | 151 | 157 | 163 | 164 | 165 | ``` 166 | 167 | Each node contains The subjects name in Danish or when applicable also English, such as for the following one: 168 | 169 | http://www.kb.dk/cop/navigation/manus/ortsam/2009/okt/orientalia/subject637/en/ 170 | 171 | where there is `text-en` in addition to `text`. For each `outline` we 172 | get three `URI` attributes: htmlUrl, url and xmlUrl. They point to the 173 | content of a subject, the navigation info for that subject and finally 174 | the corresponding syndication services. Hence, considering the 175 | `outline` with subject 1036 above well get: 176 | 177 | + Its subject tree: http://www.kb.dk/cop/navigation/images/billed/2010/okt/billeder/subject1036/ 178 | + Its syndication feed: http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/subject1036/ 179 | + And finally, if you append a language, for example, `en/` as in English to the htmlUrl http://www.kb.dk/images/billed/2010/okt/billeder/subject1036/en/ a the search result in HTML for gravsten (i.e., tomb stone) with user interface in English. 180 | 181 | ## Content service 182 | 183 | A majority of the digital objects in COP are "complex" in one way or 184 | another. By that we understand that we need to see more than one file 185 | to consume the entire object. 
The simplest case is a photograph 186 | requiring a digital image of each side, where there are essential 187 | information on the back. This means that we need a table of 188 | contents (TOC) for every digital object in the service, and that TOC 189 | is delivered through the content service. 190 | 191 | Here is one for a fairly complex song book 192 | 193 | http://www.kb.dk/cop/content/manus/musman/2010/dec/viser/object23942/en/ 194 | 195 | and one for two page letter 196 | 197 | http://www.kb.dk/cop/content/letters/judsam/2011/mar/dsa/object10/en/ 198 | 199 | respectively. 200 | 201 | Typically they look like: 202 | 203 | ``` 204 | 209 | 210 | 212 | 213 | 215 | 216 | 218 | 219 | 220 | 222 | 223 | ... 224 | 225 | ``` 226 | 227 | The nesting of outline elements seems to be dependent on some oddities 228 | in the authoring tool. In order to read the book, you'll have to 229 | retrieve the images. The construction of [image URIs is described elsewhere](image-delivery.md). 230 | 231 | 232 | 233 | 234 | 235 | -------------------------------------------------------------------------------- /cop-solr-fields.md: -------------------------------------------------------------------------------- 1 | # COP SOLR fields 2 | 3 | The index is built for the purpose of making a single bibliographic search index on top of a heterogeneous collection. 4 | It is built based on a much more fine grained description in MODS. 5 | The index contains 6 | 7 | 1. fields required by SOLR and related functions 8 | 2. fairly course grained fields having [Dublin Core](http://dublincore.org/documents/dces/), [ESE](http://pro.europeana.eu/page/ese-documentation) semantics or TEI semantics 9 | 3. fine grained fields for access to some data in the MODS 10 | 11 | __NB:__ The fields here are also available for use through [COP web services](cop-backend.md) 12 | 13 | The source, when given, is the xpath to where it is stored in the 14 | MODS. 
In the xpaths we occasionally refer to xml namespace for mods 15 | (md), dc, ese, tei (t) and xhtml (h). 16 | 17 | We have [a brief demo](http://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/cop-solr-form.html) on how to use these fields in our cop-editions index. 18 | 19 | ## Global fields 20 | 21 | | field(s) | source | Comments | 22 | |:---------|:-------|:---------| 23 | | id || as everywhere else ;) | 24 | | medium_ssi | Record ID | For resources images, letters, maps, manus, pamphlets, books. Then medium_ssi can be editions and categories for whole collections and subject areas, respectively | 25 | 26 | medium_ssi is categories, editions we can retrieve information about topics and collections respectively; 27 | the remaining medium_ssi values return to actual resources. 28 | 29 | ## Edition fields 30 | 31 | | field(s) | source | Comments | 32 | |:---------|:-------|:---------| 33 | | id || as everywhere else ;) | 34 | | medium_ssi || Always 'editions' for editions, what else could it be? | 35 | | name_ssi, name_en_ssi || Name of the edition in Danish and English respectively | 36 | | top_cat_ssi || Category ID of the subject being the root of the editions subject tree | 37 | | description_tdsim, description_tesim || Sequence of paragraphs descripting the resource in Danish and English, respectively | 38 | | collection_da_ssi, collection_en_ssi || The name of the library collection from where the edition emanates | 39 | | department_da_ssi, department_en_ssi || The the English and Danish names of the organisational unit whithin the library responsible for that collection | 40 | | contact_email_ssi || Contact information for that unit | 41 | 42 | ## Category fields 43 | 44 | | field(s) | source | Comments | 45 | |:---------|:-------|:---------| 46 | | id | /md:mods/md:recordInfo/md:recordIdentifier | The record ID as given in the MODS record | 47 | | medium_ssi || Always 'categories' for subject matter. Surprised? 
| 48 | | parent_ssi || id of parent node in the tree 49 | | node_tdsi || Danish name of the node 50 | | node_tesi || English name of the node 51 | | bread_crumb_ssim || array of category ids starting from the parent_ssi, through to the id of the edition | 52 | 53 | ## Fields for COP and Aerial photography resources 54 | 55 | | field(s) | source | Comments | 56 | |:---------|:-------|:---------| 57 | | cataloging_language_ssi | /md:mods/md:recordInfo/md:languageOfCataloging/md:languageTerm | 'da' or 'en', i.e., the default language for strings in the record | 58 | | full_title_tsim | /md:mods/md:titleInfo/md:title | All titles concatenated | 59 | | title_tesim, title_tdsim, title_tsim | /md:mods/md:titleInfo/md:title | Lists of all titles in English (tesim), Danish (tdsim) or other languages (tsim), respectively. Isn't used in any clever way | 60 | | author_tsim, author_nasim, creator_tsim, creator_nasim, creator_tsi | /md:mods/md:name[md:role/md:roleTerm[@type='text']='creator' or
md:role/md:roleTerm[@type='code']='cre' or
md:role/md:roleTerm[@type='code']='aut'] | Author and creator are synonymous. nasim is **untokenized** and tsim **tokenized** text. The tsi fields contain the **first** instance of the field in the MODS record | 61 | | contributor_tesim, contributor_tdsim, contributor_tsim, contributor_tsi, contributor_nasim | DC translation of the MODS name roles| The tsi fields contain the **first** instance of the field in the MODS record | 62 | | publisher_tesim, publisher_tdsim, publisher_tsim, publisher_tsi, publisher_nasim | MODS | Currently not used because of the nature of the collections in the service | 63 | | description_tesim, description_tdsim, description_tsim | DC translation of the MODS descriptions | 64 | | format_tesim, format_tdsim, format_tsim | DC translation of the MODS | 65 | | read_direction_ssi || LTR or RTL, used for describing whether a text is left to right or the other way around. RTL means that pages should be browsed from RTL | 66 | | type_tesim, type_tdsim, type_tsim | DC translation of the MODS | **messy** | 67 | | dc_type_tesim, dc_type_tdsim, dc_type_tsim | Translated directly from MODS | **messy** | 68 | | language_tsim | DC translation of the MODS | Usually a RFC 4646 language tag | 69 | | rights_tsim | DC translation of the MODS | Usually link to the appropriate CC license | 70 | | coverage_tdsim, coverage_tesim | DC translation of the MODS | Can be place names | 71 | | dcterms_spatial | DC translation of the MODS | lat long, especially for aerial photography | 72 | | cobject_location_tsi, cobject_location_tsim, cobject_location_ssim | /md:mods/md:subject/md:geographic/md:location | Place as a subject | 73 | | person_residence_tsim || Place as in the recidence of a sender or recipient | 74 | | origin_place_tsim || Place as in place of origin i.e., publication | 75 | | subject_tesim, subject_tdsim, subject_tsim | DC translation of the MODS | 76 | | pub_dat_tsim || Buggy | 77 | | readable_dat_string_tsim || Buggy | 78 | | local_id_ssi, 
local_id_fngsi | image URI | ID containing image file name. Use for connecting image to physical instance | 79 | | shelf_mark_tdsim, shelf_mark_tesim | | How to find the physical instance | 80 | | subject_topic_id_ssim | /md:mods/md:extension/h:div/h:a/@h:href | The list of IDs of the categories a given resource belong to | 81 | | subject_topic_facet_tesim, subject_topic_facet_tdsim | /md:mods/md:extension/h:div/h:a | The list of names of the categories a given resource belong to. The categories are either in Danish (tdsim) or English (tesim) | 82 | | mods_ts, processed_mods_ts || original XML blobs. processed_mods_ts is the complete one with some keywords and descriptions from external databases | 83 | | mods_uri_tsim || The URI of the MODS record used for generating the SOLR one | 84 | | thumbnail_square_url_ssm || An array of one square image URI for the resource | 85 | | thumbnail_url_ssm || An array of one image URI for the resource | 86 | | content_metadata_image_iiif_info_ssm || An array with URIs for images of all pages in a multipage document. 
See [Image delivery](image-delivery.md#constructing-iiif-uris) | 87 | | cobject_not_before_dtsi | /md:mods/md:originInfo/md:dateCreated/@t:notBefore | [Using TEI date model](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-date.html) | 88 | | cobject_not_after_dtsi | /md:mods/md:originInfo/md:dateCreated/@t:notAfter | [Using TEI date model](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-date.html) | 89 | | cobject_edition_ssi | assigned during ingest | 90 | | cobject_title_ssi | extracted from MODS during ingest | 91 | | cobject_id_ssi || synonym to id | 92 | | cobject_person_tsim, cobject_person_ssim | /md:mods/md:name[@type='personal']/md:namePart | Name of persons regardless of their relation to the work | 93 | | cobject_random_number_dbsi | generated on database ingest | used for various sorting and selection tasks | 94 | 95 | ### Fields added for Spotlight exhibitions 96 | 97 | | spotlight_exhibition_ssi || The name the exhibition in which the object appear. | 98 | 99 | ### Crowd sourced fields in Aerial photography 100 | 101 | | field(s) | source | Comments | 102 | |:---------|:-------|:---------| 103 | | luftfo_type_ssim, luftfo_type_tdsim | 104 | | cobject_bookmark_ssi || currently not used| 105 | | cobject_building_ssim, cobject_building_tsim || Name of a building | 106 | | cobject_correctness_isi || Indicator of whether crowd sourcing is pending (0) or completed (1) | 107 | | cobject_interestingness_isi || An integer [0.. 10] indicating how much interest or effort the object has generated among users | 108 | | cobject_last_modified_lsi || Long integer. Unix date | 109 | | area_area_tsim | Encoded as a hierarchicalGeographic subject in MODS. See Appendix below. | Usually a comma seperated list of places from more general to more specific like "Danmark, Sjælland, Arnøje" which together specifies a place. 
| 110 | | area_cadastre_tsim | See Appendix below | matrikelnummer in Denmark | 111 | | area_parish_tsim | See Appendix below | Sogn in Denmark | 112 | | area_building_tsim | See Appendix below | Overlaps with cobject_building_ssim above, but seems to be more precise | 113 | | citySection_zipcode_tsim | See Appendix below | Postnummer | 114 | | citySection_housenumber_tsim | See Appendix below | 115 | | citySection_street_tsim | See Appendix below | 116 | 117 | ### Appendix: Hierarchical geographic subject 118 | ``` 119 | 120 | 121 | Havrebjerg 122 | 34 123 | Havrebjerg 124 | Havrebjerg Brugsforening 125 | 4200 126 | 69 127 | Krænkerupvej 128 | Slagelse 129 | 130 | 131 | ``` -------------------------------------------------------------------------------- /data-crunch-demos/date-demo.xsl: -------------------------------------------------------------------------------- 1 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | -------------------------------------------------------------------------------- /data-crunch-demos/settlement-demo.xsl: -------------------------------------------------------------------------------- 1 | 4 | 5 | 6 | 7 | 8 | http://www.kb.dk/cop/syndication/letters/judsam/2011/mar/dsa/en/?query= 9 | 10 | 11 | 12 | 13 | 14 | 15 | -------------------------------------------------------------------------------- /dka/README.md: -------------------------------------------------------------------------------- 1 | # Harvesting using OAI -- a worked example 2 | 3 | Run as 4 | 5 | ``` 6 | cd harvested_records 7 | xsltproc ../ese2dka.xsl 'http://oai.kb.dk/oai/provider?verb=ListRecords&metadataPrefix=ese&set=kb.partiprogrammer' 8 | ``` 9 | 10 | which will result in 2510 xml files in the directory harvested_records, 11 | containing descriptions of as many political pamphlets. The ese2dka.xsl is 12 | recursive and will retrieve all content for a given set (i.e., it is actually 13 | a complete and simple OAI-PMH harvester written in XSLT 1.0). 
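
The recursion that ese2dka.xsl performs can also be sketched outside XSLT. The Python below is not part of this repository. It assumes the standard OAI-PMH 2.0 flow, where a `ListRecords` response may carry a `resumptionToken` that is sent back as the only argument besides `verb`. The function names `list_records_url` and `next_token` are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://oai.kb.dk/oai/provider"  # endpoint used in the command above


def list_records_url(metadata_prefix, set_spec, token=None, base_url=BASE_URL):
    """Build a ListRecords request URL (first request or a resumption)."""
    if token:
        # In OAI-PMH 2.0 the resumptionToken is an exclusive argument
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": set_spec,
        }
    return base_url + "?" + urllib.parse.urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken of a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()
```

A harvester built on this would fetch `list_records_url("ese", "kb.partiprogrammer")`, store the records, pass the response to `next_token`, and repeat with the returned token until it gets `None`.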

Any edition from the Royal Library, Copenhagen and its content providers can
be harvested from oai.kb.dk. See

http://oai.kb.dk/oai/provider?verb=ListSets

for a complete list of editions. Other possible harvests are

```
cd harvested_records
xsltproc ../ese2dka.xsl 'http://oai.kb.dk/oai/provider?verb=ListRecords&metadataPrefix=ese&set=kb.orientalmss'
xsltproc ../ese2dka.xsl 'http://oai.kb.dk/oai/provider?verb=ListRecords&metadataPrefix=ese&set=kb:kortatlas:ww1'
```

which yield descriptions of 42 oriental manuscripts and 180 maps related to
the First World War.

There is a set of files created during testing of the
harvester/transformation.

All harvested records validate against DKA2.xsd.

-------------------------------------------------------------------------------- /dka/ese2dka.xsl: -------------------------------------------------------------------------------- 1 | 2 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | Resuming harvesting with 47 | 48 | 49 | 50 | 51 | 52 | 55 | 56 | <xsl:for-each select="dc:title"> 57 | <xsl:value-of select="."/><xsl:if test="position() < last()"><xsl:text>; </xsl:text></xsl:if> 58 | </xsl:for-each> 59 | 60 | 61 | 62 |
63 |
64 |

65 | 66 | 67 | 68 |
69 |
70 |
71 |

72 | 73 | 74 |

75 | Beskrivelse 76 |
77 | 78 | 79 | 80 | 81 | 82 |

83 |
84 | 85 | 86 |

87 | Format 88 |
89 | 90 | 91 | 92 |

93 |
94 | 95 | 96 |

97 | Kolofon 98 |
99 | 100 | og 101 | 102 | , 103 | 104 |

105 |
106 | 107 | 108 |

109 | 110 | Mere om objektet 111 | 112 |

113 | 114 |

115 | 116 | 117 | 118 | 119 | 120 | Mere fra samme udgivelse 121 | 122 |

123 | 124 |
Copyright 125 |
126 | © 127 |
128 | 129 |
130 | 131 |
132 | 133 | width:100%; 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 |
142 | 143 |
144 | 145 |
146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 162 | 163 | 164 | 165 | 166 | contributor 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | creator 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | Copyright © 191 | 192 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 |
207 |
208 | 209 |
210 | -------------------------------------------------------------------------------- /dka/harvested_records/README.md: -------------------------------------------------------------------------------- 1 | # place for harvested records -- derived content 2 | -------------------------------------------------------------------------------- /dka/schemas/DKA.xsd: -------------------------------------------------------------------------------- 1 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | -------------------------------------------------------------------------------- /dka/schemas/DKA2.xsd: -------------------------------------------------------------------------------- 1 | 7 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 31 | 32 | 33 | 34 | 35 | This field should contain a comma-seperated 36 | list of types among: Video Sound Image Text 37 | This field is deprecated as it is only used 38 | for filtering on the DKA frontend - it 39 | should instead be implemented as a computed 40 | solr index from the file formats attached to 41 | the object. 42 | 43 | 44 | 45 | 47 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 57 | 58 | 59 | A contributor is a 60 | person/organisation on the 61 | "frontside of the camera", such 62 | as an actor. 63 | 64 | 65 | 66 | 68 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 80 | 81 | 82 | A creator is a 83 | person/organisation from "behind 84 | of the camera", such as a 85 | producer, director or 86 | destributing organisation. 
87 | 88 | 89 | 90 | 92 | 94 | 95 | 96 | 97 | 98 | 99 | 101 | 103 | 105 | 107 | 108 | 109 | 111 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 129 | 130 | 131 | 132 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | -------------------------------------------------------------------------------- /dka/specimen/specimen.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | Tryggelev Nor 6 | 7 | 8 |
9 |

Angrebet fra alle sider

10 |

Tryggelev Nor ligger på Langelands vestkyst. Indtil 2008 lå der en 11 | langdysse på toppen af klinten. Dyssen lå langs stranden. Gennem mange 12 | år har havet arbejdet sig ind på kysten. Til sidst lå alle randstenene 13 | fra den ene langside af dyssen nede på stranden i en lang 14 | række. Langdyssen er bare én af Danmarks cirka 2500 fredede dysser og 15 | jættestuer. Oprindelig var der ti gange så mange. Der bliver passet godt 16 | på de tilbageværende, men i nogle tilfælde kan lovgivningen alene ikke 17 | forhindre ødelæggelserne. Det gælder, når det er naturen selv, som 18 | nedbryder fortidsminderne.

19 |
20 |
21 |

Fra gravkammer til køleskab

22 |

Langdyssen ved Tryggelev Nor blev ikke kun angrebet af havet. På 23 | landsiden var dyssen også under angreb. Her havde en greve fra 24 | Tranekær nemlig fjernet randstenene fra den anden langside i den 25 | sidste halvdel af 1800-tallet. Stenene blev kløvet og brugt til at 26 | bygge et stendige. Samtidig havde greven bygget sig en jagthytte på 27 | stedet. I den forbindelse var gravkammeret blevet indrettet som 28 | viktualiekælder, altså en slags naturligt køleskab. I 2006 gnavede 29 | en storm endnu et stykke af klinten. Selve gravkammeret var i fare 30 | for at falde ned og til fare for publikum. Det var ikke muligt at 31 | kystsikre klinten effektivt. I 2008 blev gravkammeret derfor skilt 32 | ad og flyttet i sikkerhed.

33 |
34 |

35 | Skrevet af Jørgen Westphal 36 |

37 |
38 | 1001 fortællinger om Danmark - Kulturstyrelsen 39 | http://www.kulturarv.dk/1001fortaellinger/da_DK/s/630 40 | Picture, Sound 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | Langeland 53 | Copyright © Kulturstyrelsen 54 | 55 | 54.791752 56 | 10.661015 57 | 58 | 59 | 60 | Stenalderens bønder 61 | Klima- og landskabsændringer gennem 10000 år 62 | Vi skriver historie 63 | Megalit 64 | dysse 65 | Neolitikum 66 | Stenalder 67 | sikring af fortidsminder 68 | 69 | 70 | 71 | 72 |

-------------------------------------------------------------------------------- /examples.md: --------------------------------------------------------------------------------

# Examples

Our web services were created some years ago to be able to build our digital collection web portal http://www.kb.dk/editions/any/2009/jul/editions/da/
so the portal is a good place to start exploring our data.

Please note that, for historical reasons:

* The ID of every object in our system is not a number but a string that looks like a URL.

For example, take this portrait of Søren Kierkegaard:
http://www.kb.dk/images/billed/2010/okt/billeder/object76439/

The ID of this image is not *object76439* but the string *images/billed/2010/okt/billeder/object76439*
(Note that changing any of its components, such as the date or the year, will not give any new result)

* The "base URL" of our web service is http://www.kb.dk/editions/any/2009/jul/editions/da/


The data is structured by editions; an object is unique but can be in several editions.
Each edition has categories, and each category can have sub-categories.
As you can see in our [COP web portal](http://www.kb.dk/editions/any/2009/jul/editions/da/), we have around 14 editions:
* Billede: http://www.kb.dk/images/billed/2010/okt/billeder/da/
* Partiprogrammer: http://www.kb.dk/pamphlets/dasmaa/2008/feb/partiprogrammer/da/
* Judaistisk Samling: Tidlige & sjældne tryk: http://www.kb.dk/books/judsam/2010/maj/jstryk/da/
* and so on ...
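
Because edition and object IDs are path-like strings, a query URL can be put together by plain concatenation. This is a minimal Python sketch, assuming the `/cop/syndication/` prefix and the CGI parameters (`query`, `notBefore`, `notAfter`, `page`, `itemsPerPage`) documented in cop-backend.md. The function name `syndication_url` is hypothetical and not part of any library of ours.

```python
from urllib.parse import urlencode

COP_BASE = "http://www.kb.dk/cop/syndication/"


def syndication_url(edition_id, lang="da", **params):
    """Build a syndication URL for an edition ID such as 'images/billed/2010/okt/billeder'."""
    path = edition_id.strip("/") + "/" + lang + "/"
    # Leave out parameters that were passed as None
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return COP_BASE + path + ("?" + query if query else "")
```

Note that `urlencode` percent-encodes non-ASCII characters, so a query like `Søkort` is sent as `S%C3%B8kort`.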


Three concrete examples:

+ If you want to find any data about Søren Kierkegaard:

http://www.kb.dk/cop/editions/any/2009/jul/editions/da/?query=Søren+kierkegaard

+ If you want to get all the "søkort" from Atlanterhavet:

http://www.kb.dk/cop/editions/any/2009/jul/editions/da/?query=Søkort&location=Atlanterhavet

+ Or get all the "småtryk/partiprogrammer" of the "socialdemokraterne" in the period 1900-1930:

Get all objects containing socialdemokraterne:

http://www.kb.dk/cop/editions/any/2009/jul/editions/da/?query=socialdemokraterne

Keep only the partiprogrammer:

http://www.kb.dk/cop/syndication/pamphlets/dasmaa/2008/feb/partiprogrammer/da/?query=socialdemokraterne

Refine your search by setting the notBefore and notAfter parameters:
http://www.kb.dk/cop/syndication/pamphlets/dasmaa/2008/feb/partiprogrammer/da/?query=socialdemokraterne&notBefore=1900&notAfter=1930

-------------------------------------------------------------------------------- /form-demos/README.md: --------------------------------------------------------------------------------

[API READ ME](../README.md) - [OAI Dissemination](../oai-pmh.md) - [Web services in COP](../cop-backend.md) - [Aerial Photography](../geographic-data.md) - [Image delivery](../image-delivery.md) - [Metadata Formats](../metadata-formats.md) - [Text Corpora](../text-corpora.md)

# Try these forms

The forms are in HTML.
You need to go elsewhere to try them, for example here: 7 | 8 | * [text corpus search](https://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/adl-form.html) 9 | * [cop solr form](https://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/cop-solr-form.html) 10 | * [cop backend demo](https://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/cop-form.html) 11 | -------------------------------------------------------------------------------- /form-demos/adl-form.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Text corpus search API 5 | 7 | 8 | 9 | 10 |
11 | 12 |
13 |

Text corpus search API

14 | 15 |
16 | 17 |

This document is a part of the Royal Danish Library's APIs, and in particular the documentation on how to use our texts. See also Licences & Legalese and Caveats

24 | 25 | 26 |
29 | 30 |

Try out the API here

31 |
32 |
Search for
33 |
34 | 37 |
38 |
filter query
39 |
40 | 43 |
44 |
field list
45 |
46 | 49 |
50 |
result format
51 |
52 | 56 |
57 |
start record
58 |
59 |
number of records
60 |
61 |
Sort by
62 |
63 |
Query parser
64 |
65 | 68 | 69 |
70 |

71 | 73 | 83 | Reset form! 84 | 85 |
86 |
87 |
88 |
89 |
90 |
 
 91 |       
92 |
93 |
94 | 95 |
96 |

Properties of the search index

97 | 98 |

All the texts that can be searched using the API are marked up according to the Text Encoding Initiative (TEI) guidelines.

101 | 102 |

The software used for indexing is described in the documentation of the project SOLR and Snippets

104 | 105 |
    106 |
  • The text documents are basically ordered 108 | hierarchies of overlapping content objects. In 109 | particular we can not easily simultaneously 110 | ascertain what content there is on a given page and see what 111 | content there is in a paragraph starting on that page. However, 112 | we can always know on what page a given chapter, paragraph 113 | or whatever commences. That is a fundamental property of 114 | text.
  • 115 |
  • 116 | 117 | In text service the objects in the content hierarchy are 118 | 119 |
      120 |
    • A work is an entity someone has decided to annotate using metadata. It is hence the unit the search engine returns in the result set. The granularity is an editorial issue: the more works there are in a volume, the less text there is in each work and the higher the granularity.
    • 126 | 127 |
    • The leaf is the smallest unit of the tree 128 | which can be identified and therefore retrievable and 129 | possible to index. The user interface gives for each 130 | work in a result set a list of 131 | leafs that are relevant for the 132 | search. leafs are possible to quote but they 133 | do usually not appear in table of contents.
    • 134 | 135 |
    • The trunks are contained in works. They may contain other trunk nodes, or works or leafs. It is possible to address a trunk, so you can send a URI to someone and say: Read chapter 5, it is so good! They are indexed and searchable in principle. However, the user interface only supports them in table of contents and quotation services.
    • 143 | 144 |
    • A volume is what comes close to a physical 145 | book. It contains one or more works. If a 146 | volume contains only one work, we refer to it 147 | as a monograph
    • 148 | 149 |
    150 |
  • 151 | 152 |
  • 153 | All text is indexed down to leaf, basically 154 | paragraph, level, which implies 155 |
      156 |
    • Paragraph in prose: <p> ... </p>
    • 157 |
    • Speech in drama: <sp> ... </sp>
    • 158 |
    • Strophe in poetry: <lg> ... </lg>
    • 159 |
    The distinction here between prose, drama and poetry is not based on philological analysis; rather, it is determined by what markup was used to represent the text. There are other leaf nodes, like table rows, list items etc. If the markup is made stringently, then this way of indexing will be stringent.
  • 167 | 168 |
  • The same text may appear on multiple levels in the 169 | index, and hence be addressed as, for example, paragraph, 170 | chapter and volume. In particular, works will 171 | contain all text from its leaf nodes.
  • 172 | 173 |
  • The index granularity differs between literary genres. For instance, poems and individual short stories or essays can be treated as individual works, and a single volume can contain hundreds of such items, whereas there is usually only one novel in a volume.
  • 178 | 179 |
180 | 181 |

Note that this document does not define or describe all fields in the index. The index is far too rich for that, but I believe that it contains what it takes to use it. What I have left out is basically more of the same.

Finally, not all fields are available for all editions, because of the heterogeneity of the data or the wishes of the projects contributing data.

ID and Relations fields

label | description | values
id | The ID of the record. It identifies the collection and the TEI file, and is constructed as a string concatenation of that basename with the xml:id of the content indexed and some other stuff. | string
volume_id_ssi | The ID of the volume that contains the node |
part_of_ssim | Array of IDs of the trunk nodes that are containers of the node at hand. It typically contains one (or more) work(s) as parent(s) (works may contain works) and a volume as an ancestor. Some works are monographs (i.e., they are contained in a volume with only one work), and for those the part_of_ssim field becomes meaningless. |

Filter fields

label | description | values
cat_ssi | Category of a text. Use it when limiting searches to works, or to find volumes or author portraits (biographies); omit it otherwise. | work, author, period
is_editorial_ssi | The originator of the content is someone other than the author. In this service these are typically forewords, prefaces, comments etc. in a scholarly edition. | yes, no
type_ssi | Node type in the document. A trunk node can be a whole work, a chapter etc., whereas a leaf could be a paragraph of prose, a stanza (or strophe) of poetry or a speech in a dialogue in a dramatic work. For historical reasons, whole texts have type_ssi:work. A search for type_ssi:trunk will yield a result set comprising chapters or sections of some kind. | work, trunk, leaf, volume
is_monograph_ssi | A monograph in the text service is perhaps not what you expect (on the other hand, what you expect is a monograph in the text service). A monograph is a volume with only one work. | yes, no
genre_ssi | Genre of a leaf node. Note that this is not the genre of a work, but the structure of the paragraph level markup. If there is a song in a dramatic work, the speech in question might be classified as containing mostly poetry. Available for all editions except GV. | prose, poetry, play
subcollection_ssi | Filter with respect to collection. public-index.kb.dk contains all these editions. | adl, gv, jura, letters, lh, sks, tfs

Sort fields

position_isi | The position of the current node along the sibling XPath axis in the document. Sorting with respect to this field guarantees that the result is presented in document order. (We cannot use page number, which might be a roman numeral or an arabic one. Also, we need to take into account leaf nodes within pages.) | integer
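To see why position_isi rather than the page number is the sort key, consider the following small sketch. The records, their ids and their values are invented for illustration; only the field names page_ssi and position_isi come from this document.

```python
# Hypothetical leaf records; ids and values are made up for illustration.
# A preface paginated with roman numerals precedes the main text.
records = [
    {"id": "leaf-a", "page_ssi": "ix", "position_isi": 1},
    {"id": "leaf-b", "page_ssi": "x", "position_isi": 2},
    {"id": "leaf-c", "page_ssi": "1", "position_isi": 3},
    {"id": "leaf-d", "page_ssi": "2", "position_isi": 4},
    {"id": "leaf-e", "page_ssi": "10", "position_isi": 5},
]


def ids_sorted_by(field):
    """Return the record ids ordered by the given field."""
    return [r["id"] for r in sorted(records, key=lambda r: r[field])]


# page_ssi is a string, so sorting on it is lexicographic and
# scrambles the document order.
by_page = ids_sorted_by("page_ssi")

# position_isi is an integer and preserves the document order.
by_position = ids_sorted_by("position_isi")

print(by_page)
print(by_position)
```

Sorting on the string field puts page 10 before page 2 and the roman numerals last, whereas the integer position keeps the records in reading order.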


Search fields

label | description | values
work_title_tesim | Misc. metadata fields. There are more of them, but they should be self-explanatory. | just plain text
volume_title_tesim | |
work_title_tesim | |
author_name_tesim | The author(s) of a document. For messages it is assumed that author is a synonym of sender. |
text_tesim | The text | just plain text
prose_extract_tesim, verse_extract_tesim, performance_extract_tesim | The text, as in text_tesim, split up into fields according to its form. The three fields get their content from <p> ... </p>, <lg> ... </lg> and <sp> ... </sp> respectively. | just plain text
contains_ssi | We measure the length of the texts in prose_extract_tesim, verse_extract_tesim and performance_extract_tesim; whichever is the longest determines the value of this field. | prose, poetry, play
speaker_tesim | The name of a character uttering something in a dialogue | just plain text
page_ssi | The page number where a leaf node (paragraph, speech or strophe) starts. | string (either integer or roman numerals)
person_name_ssim, person_name_tesim | Names of persons mentioned in works or, in the case of letters, the name of the recipient. The field can be accessed both as text (tesim) and string (ssim). The names in these fields are normalized to last name first (LNF) format. Also, the normalized form usually hits variants: Shakespeare, William hits William Shakespeare, and Jesus hits Kristus (Danish for Christ) as well. This applies only to these fields; there is no query expansion for the full text. |
other_location_ssim, other_location_tesim, sender_location_tesim | Names of places mentioned in works or, in the case of letters, the residence of the sender. The field can be accessed both as text (tesim) and string (ssim). The place names are usually normalized. For instance, a search in these fields for Danmark hits Dannemark as well. The reverse is not true: a search for Dannemark hits only the word Dannemark in the full text (see text_tesim above). sender_location_tesim applies to letters only. |
bible_ref_ssim, bible_ref_tesim | References to the Bible mentioned in works. The field can be accessed both as text (tesim) and string (ssim). The references use standard Danish abbreviations, like 1 Mos; 1 Kor 13,12; 1 Mos 2,7; Matt 16,18; Sl; Åb; ApG; Joh 1,14; Jak; Job. In many cases it is best to use bible_ref_ssim and search for the exact string "1 Kor 13,12". The references are standardized annotations, whereas the full texts (of Grundtvig and Kierkegaard) may just allude to a place in the Bible. |
year_itsi | Year of release, publication or, in case of a message, the year it was sent. | long int

Examples

Find all works try it! (Click on "try it" to fill in the form to the left. You may then submit the search or customize it for your purposes. You might need to reset the form before a new search.)

    type_ssi:work AND is_editorial_ssi:no

Find all works by Gustaf Munch-Petersen try it!

    author_name_tesim:munch
    AND
    type_ssi:work

Find all speeches in dialogues (TEI <sp> elements) in the Archive for Danish Literature (ADL), written by someone called Jeppe try it!

    genre_ssi:play
    AND
    subcollection_ssi:adl
    AND
    author_name_tesim:jeppe

Find all speeches in dialogues (<sp> elements) in ADL, spoken by a character named Jeppe try it!

    genre_ssi:play
    AND
    subcollection_ssi:adl
    AND
    speaker_tesim:jeppe

Find all strophes of poetry by N.F.S. Grundtvig containing the words hjerte and smerte (the two words rhyme, which heart and agony do not) in subcollection ADL. The query only makes sense for leafs; both words will most likely appear in any 19th century text of significant length. try it!

    type_ssi:leaf
    AND
    genre_ssi:poetry
    AND
    subcollection_ssi:adl
    AND
    author_name_tesim:grundtvig
    AND
    text_tesim:hjerte
    AND
    text_tesim:smerte

Find all dialogue (all TEI speeches, <sp> ... </sp>) in the plays by Holberg where someone is talking about Mester Erich try it!

    genre_ssi:play
    AND
    subcollection_ssi:adl
    AND
    text_tesim:mester erich
    AND
    author_name_tesim:holberg

Find all letters sent from Berlin by Georg Brandes

Filter by letters, search by author and sender location try it!

    subcollection_ssi:letters
    AND
    author_name_tesim:georg brandes
    AND
    sender_location_tesim:berlin

Find all letters sent from Paris before 1850

Filter by letters, search by year_itsi and sender location try it!

    subcollection_ssi:letters
    AND
    sender_location_tesim:paris
    AND
    year_itsi:[1000 TO 1850]
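The queries above are all built the same way: field:value clauses joined with AND. A minimal sketch of how a client might assemble such a query and encode it as request parameters follows. The helper names are invented here, and the assumption that the form fields map onto the standard Solr parameters q, start, rows, wt, sort and fl is mine, not something this document states.

```python
from urllib.parse import parse_qs, urlencode


def build_query(clauses):
    """Join (field, value) pairs as field:value clauses separated by AND."""
    return " AND ".join(f"{field}:{value}" for field, value in clauses)


def select_params(query, start=0, rows=10, sort=None, fields=None):
    """Encode a query as a URL query string.

    Assumption: the service accepts the standard Solr parameters
    q, start, rows, wt, sort and fl.
    """
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    if sort:
        params["sort"] = sort
    if fields:
        params["fl"] = " ".join(fields)
    return urlencode(params)


# The "letters sent from Paris before 1850" example from above.
query = build_query([
    ("subcollection_ssi", "letters"),
    ("sender_location_tesim", "paris"),
    ("year_itsi", "[1000 TO 1850]"),
])
encoded = select_params(query, rows=50, sort="year_itsi asc",
                        fields=["id", "year_itsi"])
print(query)
print(encoded)
```

The encoded string would be appended to the select URL of whichever index instance you use; that URL is deliberately left out here.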
670 | 671 |

Filter, join and sort examples

Find all works by Holberg containing poetry try it! Steps in the search:

Search for author

    author_name_tesim:holberg

Filter by genre_ssi:poetry, but return the record corresponding to the containing work rather than the leaf node corresponding to a piece of poetry. This requires a database join:

    {!join to=id from=part_of_ssim}genre_ssi:poetry

Find all letters sent from Berlin by Georg Brandes as above, but sort descending by date (year)

I.e., filter by letters, search by author and sender location try it!

Add a sort by clause

    year_itsi desc
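The join and sort clauses in the two examples above can be generated programmatically. The sketch below uses a hypothetical helper, join_to_parent, to produce the join expression. Passing it as a Solr filter query (fq) next to the main query is an assumption on my part; check how the form submits it before relying on this.

```python
from urllib.parse import parse_qs, urlencode


def join_to_parent(leaf_filter, to_field="id", from_field="part_of_ssim"):
    """Build a join expression that maps matching leaf records to their containers."""
    return "{!join to=%s from=%s}%s" % (to_field, from_field, leaf_filter)


# Works by Holberg containing poetry.
holberg_poetry = [
    ("q", "author_name_tesim:holberg"),
    # Assumption: the join is sent as a filter query.
    ("fq", join_to_parent("genre_ssi:poetry")),
]

# Letters from Berlin by Georg Brandes, newest first.
brandes_letters = [
    ("q", "subcollection_ssi:letters AND author_name_tesim:georg brandes "
          "AND sender_location_tesim:berlin"),
    ("sort", "year_itsi desc"),
]

encoded_join = urlencode(holberg_poetry)
encoded_sort = urlencode(brandes_letters)
print(encoded_join)
print(encoded_sort)
```

The same helper covers the joins against volume_id_ssi used further down, by passing to_field="volume_id_ssi".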
Find all years when Grundtvig mentions hell (in Danish helvede). try it! You can limit the retrieval to document id and year only by entering id year_itsi into the field list field in the form, and get all records by setting the number of records to (say) 500.

query

    subcollection_ssi:gv
    AND
    verse_extract_tesim:helvede
    AND
    type_ssi:work

field list

    id year_itsi

sort by ascending

    year_itsi asc

Note the difference between *_extract_tesim and genre_ssi. The former limits the search to text in the specified form within a document, whereas genre_ssi specifies the form of the record itself. genre_ssi is only applicable to paragraph level records.

    subcollection_ssi:gv
    AND
    text_tesim:helvede
    AND
    type_ssi:work
    AND
    genre_ssi:poetry

will give zero hits whereas

    subcollection_ssi:gv
    AND
    text_tesim:helvede
    AND
    type_ssi:leaf
    AND
    genre_ssi:poetry

will give a lot of hits, one for each strophe.

An interesting exercise we leave to the reader is to repeat the search for paradise (the same in Danish) or heaven. Do Grundtvig's mentions of hell and paradise (or heaven) in any way correlate temporally?
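As a starting point for the exercise, here is a minimal sketch of how the two result sets could be compared once they have been downloaded with the field list id year_itsi. The records below are invented placeholders, not real search results, and the function names are hypothetical.

```python
from collections import Counter


def hits_per_year(docs):
    """Count records per year, skipping records without a year_itsi value."""
    return Counter(doc["year_itsi"] for doc in docs if "year_itsi" in doc)


def shared_years(a, b):
    """Return the years that occur in both counters, in ascending order."""
    return sorted(set(a) & set(b))


# Placeholder records; ids and years are made up for illustration only.
hell_docs = [
    {"id": "doc-1", "year_itsi": 1815},
    {"id": "doc-2", "year_itsi": 1815},
    {"id": "doc-3", "year_itsi": 1837},
    {"id": "doc-4"},  # no year available
]
paradise_docs = [
    {"id": "doc-5", "year_itsi": 1837},
    {"id": "doc-6", "year_itsi": 1850},
]

hell = hits_per_year(hell_docs)
paradise = hits_per_year(paradise_docs)
print(hell)
print(shared_years(hell, paradise))
```

With real data, the two counters could be plotted side by side or fed into a correlation measure of your choice.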
Poetry often consists of strophes containing lines (which may or may not contain rhymes and rhythm). In TEI, strophes are lines in a line group element (<lg>). Find all strophes containing "regn" (i.e., rain) in poetry in volume 1 of Gustaf Munch-Petersen's collected works.

Sort the result set in inverse document order Try it!

The actual search

    volume_id_ssi:adl-texts-munp1-root
    AND
    text_tesim:regn
    AND
    genre_ssi:poetry

The sort

    position_isi desc

A poem is, technically in TEI, a sequence of line groups (see above). Find all poems (i.e., works) containing strophes with "regn" (i.e., rain) in volume 1 of Gustaf Munch-Petersen's collected works.

Sort the result set in the actual document order Try it!

The actual search

    volume_id_ssi:adl-texts-munp1-root
    AND
    text_tesim:regn

The join

    {!join to=id from=part_of_ssim}genre_ssi:poetry

The sort

    position_isi asc

Find paragraphs or strophes where there are references to 1 Corinthians 13:12 (1 Kor 13,12: For now we see only a reflection as in a mirror; then we shall see face to face.) in the works of N.F.S. Grundtvig. try it!

The query

    bible_ref_ssim:"1 Kor 13,12"
    AND
    subcollection_ssi:gv
    AND
    is_editorial_ssi:no

Sort chronologically

    year_itsi asc

Join with volume parent to return works. For paragraphs of prose.

    {!join to=volume_id_ssi from=part_of_ssim}genre_ssi:prose

Join with volume parent to return works. Same thing as the join above but for strophes of poetry. Try it again for poetry!

    {!join to=volume_id_ssi from=part_of_ssim}genre_ssi:poetry

I believe 1 Corinthians 13:12 is the part of the scripture most quoted by Grundtvig, but he does that more in prose than in poetry. On the other hand, he wrote more prose, in spite of the fact that he is one of the most prolific hymn authors not only in Denmark but in the whole of Scandinavia.

Choose index instance

You cannot use the index-test instance outside our network. Forget this if you are not a developer at kb.dk.

Colophon

This document was authored by

Sigfrid Lundberg
The Royal Danish Library
Denmark

who also wrote the indexer. However, a large number of people have contributed to this by coding services on top of the index. That process has required clarifications of this document and modifications of the index. This is the fruit of teamwork.

--------------------------------------------------------------------------------
/form-demos/cop-form.html:
--------------------------------------------------------------------------------

COP API form demo

Edition

Search for

Start page

Items per page

format

--------------------------------------------------------------------------------
/form-demos/cop-solr-form.html:
--------------------------------------------------------------------------------

COP search API

This document is a part of the Royal Danish Library's APIs, and in particular the documentation on how to use our image based resources. See also Licences & Legalese and Caveats.

Search for

result format

start record

number of records

Query parser

Fields to search


More to come

See COP SOLR data in our public index.

--------------------------------------------------------------------------------
/geographic-data.md:
--------------------------------------------------------------------------------
[READ ME](README.md) - [OAI Dissemination](oai-pmh.md) - [Web services in COP](cop-backend.md) - [Aerial Photography](geographic-data.md) - [Image delivery](image-delivery.md) - [Metadata Formats](metadata-formats.md) - [Text Corpora](text-corpora.md)

# Accessing the Aerial Photograph Collection

## Overview

This document describes the search interfaces for our Aerial
Photography Collection, [Danmark set fra
Luften](http://www.kb.dk/danmarksetfraluften/) (DSFL). Because of
its geographical aspect it has its own frontend.

DSFL draws its data from several sources, notably

+ The metadata is stored in Common Object Publishing (COP)
+ Vertical, photogrammetric images from 1954, 1995 and 2006. These are provided by [COWI](http://www.cowi.dk/menu/home/) based on originals in our collections.

The service is based on the Amazon A9 OpenSearch web service protocol. A more [detailed description of that is available](cop-backend.md#open-search).

## COP Backend - Syndication Service

The whole dataset can (at least in principle) be accessed from http://www.kb.dk/cop/syndication/images/luftfo/2011/maj/luftfoto/subject203/

A single record can be retrieved using a URI of this form: http://www.kb.dk/cop/syndication/images/luftfo/2011/maj/luftfoto/object59452/

A simple area search with the result presented in [KML format](https://developers.google.com/kml/documentation/)

http://www.kb.dk/cop/syndication/images/luftfo/2011/maj/luftfoto/subject203/?format=kml&type=all&bbo=10.80531074987789,55.57241860489453,10.568933033813437,55.48147359047444&notBefore=1920-01-01&notAfter=1970-12-31&itemsPerPage=50&page=1&random=0.0


### Parameters and sample values

format = (kml, rss, atom, mods)

type = all, 1,2,3

    all = Alle typer, all types of photos
    1 = Skråfoto, aerial photo taken at an angle
    2 = Lodfoto, 90 degree aerial photo
    3 = Protokolside, protocol page

bounding box

bbo = 10.80531074987789,55.57241860489453,10.568933033813437,55.48147359047444

http://www.kb.dk/cop/syndication/images/luftfo/2011/maj/luftfoto/subject203/?format=kml&type=all&bbo=10.80531074987789,55.57241860489453,10.568933033813437,55.48147359047444&notBefore=1920-01-01&notAfter=1970-12-31&itemsPerPage=50&page=1&random=0.0&correctness=1

notBefore=1920-01-01
Do not return pictures from before this date, YYYY-MM-DD

notAfter=1970-12-31
Do not return pictures from after this date, YYYY-MM-DD

itemsPerPage=1-5000
The number of items to be returned per page. If 10000 records
are found in an area, only 5000 will be returned; the remaining items (5001
to 10000) can be retrieved by setting the page variable to 2

page= 1,2,3 etc. depending on the number of results. Works as an offset value.

random= 0.1
a value between 0.0 and 1.0.
Optional and only relevant for 61 | the luftfoto frontend... 62 | 63 | ### Searching with a query string 64 | 65 | query= a search term. 66 | 67 | Further specification on the query search term. Certain fields can be specified inside a query term. These are, currently 68 | 69 | + location:X 70 | + person:Y 71 | + address:Z 72 | + building:A 73 | 74 | Example: Searching for "Lykkegård" in the full text of the record and 75 | combining that with a boolean AND with a search for "Jørgensen" in 76 | the person field yields 77 | 78 | ``` 79 | Lykkegård&person::Jørgensen 80 | ``` 81 | 82 | URI encoded and inserted as the value of the query CGI variable this gives 83 | 84 | http://www.kb.dk/cop/syndication/images/luftfo/2011/maj/luftfoto/subject203/?format=rss&query=lykkeg%C3%A5rd%26person%3A%3AJ%C3%B8rgensen&type=all&bbo=10.826596760620077,55.54834253439101,10.590219044555624,55.45734180334893¬Before=1920-01-01¬After=1970-12-31&itemsPerPage=50&page=1&random=0.0 85 | 86 | Example Output in KML 87 | 88 | ``` 89 | 90 | 91 | 103 | 1 104 | 3 105 | 106 | 511 107 | 108 | 109 | 110 | Lykkegård, Valdemar, gårdejer (1948) 111 | #balloon-style 112 | 113 | 114 | 10.686429993629417,55.49797116219872 115 | 116 | 117 | 118 | http://www.kb.dk/images/luftfo/2011/maj/luftfoto/object78814 119 | 120 | 121 | Lykkegård, Valdemar, gårdejer 122 | 123 | 124 | Sylvest Jensen 125 | 126 | 127 | 1948 128 | 129 | 130 | Skråfoto 131 | 132 | 133 | 134 | 135 | 136 | Danmark, Fyn, Viby 137 | 138 | 139 | http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3147/L3147_38.jpg 140 | 141 | 142 | http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3147/L3147_38.jpg 143 | 144 | 145 | 2011-07-11 146 | 147 | 148 | 2012-02-16 149 | 150 | 151 | 1 152 | 153 | 154 | 155 | 156 | Petersen, gårdejer (1936/1937/1938) 157 | #balloon-style 158 | 159 | 160 | 10.685745670318965,55.49695929660715 161 | 162 | 163 | 164 
| http://www.kb.dk/images/luftfo/2011/maj/luftfoto/object78546 165 | 166 | 167 | Petersen, gårdejer 168 | 169 | 170 | Sylvest Jensen 171 | 172 | 173 | 1936/1937/1938 174 | 175 | 176 | Skråfoto 177 | 178 | 179 | 180 | 181 | 182 | Danmark, Fyn, Viby 183 | 184 | 185 | http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L0271/L0271_12.jpg 186 | 187 | 188 | http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L0271/L0271_12.jpg 189 | 190 | 191 | 2011-07-11 192 | 193 | 194 | 2011-07-11 195 | 196 | 197 | 1 198 | 199 | 200 | 201 | 202 | Hansen, uddeler (1948) 203 | #balloon-style 204 | 205 | 206 | 10.68613495063778,55.496497415382834 207 | 208 | 209 | 210 | http://www.kb.dk/images/luftfo/2011/maj/luftfoto/object78812 211 | 212 | 213 | Hansen, uddeler 214 | 215 | 216 | Sylvest Jensen 217 | 218 | 219 | 1948 220 | 221 | 222 | Skråfoto 223 | 224 | 225 | 226 | 227 | 228 | Danmark, Fyn, Viby 229 | 230 | 231 | http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3147/L3147_36.jpg 232 | 233 | 234 | http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3147/L3147_36.jpg 235 | 236 | 237 | 2011-07-11 238 | 239 | 240 | 2012-02-16 241 | 242 | 243 | 1 244 | 245 | 246 | 247 | 248 | 249 | ``` 250 | ## Shortcut - Luftfoto JSON Service 251 | 252 | As an alternative to the xml based webservice a JSON REST service is available: 253 | 254 | type: POST 255 | 256 | url: http://www.kb.dk/danmarksetfraluften/async/search/ 257 | request: bbo=10.395490670074423,55.22227193719089,10.296785378326376,55.18994697228769&zoom=14&lat=55.20611273543719&lng=10.346138024200382&page=1&q_fritekst=&q_stednavn=&q_bygningsnavn=&q_person=&q_adresse=¬Before=1920¬After=1970&category=subject203&itemType=all&thumbnailSize= 258 | 259 | sample response (slightly modified to display less results): 260 | 261 
| ``` 262 | { 263 | 264 | "status":"OK", 265 | "kmlFeedUrl":"REPLACED_EXTERNAL_BACKEND_GUI_URI/syndication/images/luftfo/2011/maj/luftfoto/subject203/?format\u003dkml\u0026type\u003dall\u0026bbo\u003d10.395490670074423,55.22227193719089,10.296785378326376,55.18994697228769\u0026notBefore\u003d1920-01-01\u0026notAfter\u003d1970-12-31\u0026itemsPerPage\u003d50\u0026page\u003d1\u0026random\u003d0.0", 266 | "rssUrl":"http://cop-02.kb.dk:8080/cop/syndication/images/luftfo/2011/maj/luftfoto/subject203/?format\u003drss\u0026type\u003dall\u0026bbo\u003d10.395490670074423,55.22227193719089,10.296785378326376,55.18994697228769\u0026notBefore\u003d1920-01-01\u0026notAfter\u003d1970-12-31\u0026itemsPerPage\u003d50\u0026page\u003d1\u0026random\u003d0.0", 267 | "copjects":[ 268 | { 269 | "id":"object80081", 270 | "title":"Larsen, Marius, gårdejer (1949)", 271 | "longitude":"10.299912225723233", 272 | "latitude":"55.20855233840371", 273 | "location":"Danmark, Fyn, Sandholt Lyndelse", 274 | "mods":"", 275 | "atomLink":"http://www.kb.dk/danmarksetfraluften/images/luftfo/2011/maj/luftfoto/object80081", 276 | "imgUrl":"http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3555/L3555_31a.jpg", 277 | "imgURLThumb":"http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3555/L3555_31a.jpg", 278 | "imgURLFull":"", 279 | "itemType":"Skråfoto", 280 | "stjerne":"", 281 | "photoYear":"1949", 282 | "photoName":"Larsen, Marius, gårdejer", 283 | "iconType":"", 284 | "isPartofCluster":false, 285 | "correctness":"1" 286 | }, 287 | { 288 | 289 | "id":"object80083", 290 | "title":"Frederiksen, Fr., gårdejer (1949)", 291 | "longitude":"10.301543008804288", 292 | "latitude":"55.20570867133342", 293 | "location":"Danmark, Fyn, Sandholt Lyndelse", 294 | "mods":"", 295 | "atomLink":"http://www.kb.dk/danmarksetfraluften/images/luftfo/2011/maj/luftfoto/object80083", 296 | 
"imgUrl":"http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3555/L3555_32a.jpg", 297 | "imgURLThumb":"http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3555/L3555_32a.jpg", 298 | "imgURLFull":"", 299 | "itemType":"Skråfoto", 300 | "stjerne":"", 301 | "photoYear":"1949", 302 | "photoName":"Frederiksen, Fr., gårdejer", 303 | "iconType":"", 304 | "isPartofCluster":false, 305 | "correctness":"1" 306 | }, 307 | { 308 | "id":"object79751", 309 | "title":"(1949)", 310 | "longitude":"10.344592463493541", 311 | "latitude":"55.20884006089258", 312 | "location":"Danmark, Fyn, Hillerslev", 313 | "mods":"", 314 | "atomLink":"http://www.kb.dk/danmarksetfraluften/images/luftfo/2011/maj/luftfoto/object79751", 315 | "imgUrl":"http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3542/L3542_37a.jpg", 316 | "imgURLThumb":"http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L-serien/Negativer/L3542/L3542_37a.jpg", 317 | "imgURLFull":"", 318 | "itemType":"Skråfoto", 319 | "stjerne":"", 320 | "photoYear":"1949", 321 | "photoName":"", 322 | "iconType":"", 323 | "isPartofCluster":false, 324 | "correctness":"1" 325 | }, 326 | { 327 | "id":"object69117", 328 | "title":"(1939)", 329 | "longitude":"10.300898918628718", 330 | "latitude":"55.206124979742256", 331 | "location":"Danmark, Fyn, Sandholt Lyndelse", 332 | "mods":"", 333 | "atomLink":"http://www.kb.dk/danmarksetfraluften/images/luftfo/2011/maj/luftfoto/object69117", 334 | "imgUrl":"http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L0815_06.jpg", 335 | "imgURLThumb":"http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L0815_06.jpg", 336 | "imgURLFull":"", 337 | "itemType":"Skråfoto", 338 | "stjerne":"", 339 | "photoYear":"1939", 340 | "photoName":"", 341 | 
"iconType":"", 342 | "isPartofCluster":false, 343 | "correctness":"1" 344 | }, 345 | { 346 | "id":"object69033", 347 | "title":"Bondesen, Rs., gårdejer (1939)", 348 | "longitude":"10.35555557834302", 349 | "latitude":"55.19091957553203", 350 | "location":"Danmark, Fyn, Nybølle", 351 | "mods":"", 352 | "atomLink":"http://www.kb.dk/danmarksetfraluften/images/luftfo/2011/maj/luftfoto/object69033", 353 | "imgUrl":"http://www.kb.dk/imageService/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L0812_16.jpg", 354 | "imgURLThumb":"http://www.kb.dk/imageService/w150/h150/online_master_arkiv_10/non-archival/Maps/FYNLUFTFOTO/L0812_16.jpg", 355 | "imgURLFull":"", 356 | "itemType":"Skråfoto", 357 | "stjerne":"", 358 | "photoYear":"1939", 359 | "photoName":"Bondesen, Rs., gårdejer", 360 | "iconType":"", 361 | "isPartofCluster":true, 362 | "correctness":"0" 363 | }, 364 | ], 365 | "totalResultsCount":153, 366 | "currentPage":1, 367 | "copjectsCount":50, 368 | "pagesCount":4, 369 | "resultsPerPage":50, 370 | "zoomLvl":14 371 | } 372 | ``` 373 | 374 | ### The encoding of coordinates 375 | 376 | `Lat,Long` or `Long,Lat?` That depends ;^) 377 | 378 | * In a `KML`l feed the expected order is Longitude, Latitude format. See KML reference 379 | http://code.google.com/intl/da-DK/apis/kml/documentation/kmlreference.html#point 380 | * In an `RSS` feed using the GeoRSS:point tag, the coordinates are Latitude, Longitude. 381 | http://www.georss.org/georss 382 | * In `MODS` the md:coordinates the order is Latitude, Longitude. 
383 | http://www.loc.gov/standards/mods/v3/mods-userguide-elements.html#coordinates 384 | 385 | 386 | -------------------------------------------------------------------------------- /image-delivery.md: -------------------------------------------------------------------------------- 1 | [READ ME](README.md) - [OAI Dissemination](oai-pmh.md) - [Web services in COP](cop-backend.md) - [Aerial Photography](geographic-data.md) - [Image delivery](image-delivery.md) - [Metadata Formats](metadata-formats.md) - [Text Corpora](text-corpora.md) 2 | 3 | # Image delivery 4 | 5 | Just about everything in COP is about images, i.e., digital images of 6 | sheets of paper. Front and back of a photograph or a page in book is 7 | special case of that. Note, however, that an image of a text is 8 | actually a text, just as an image of a photograph is a photograph. 9 | 10 | The COPs APIs are basically about enabling the building of user 11 | interfaces for all these objects. The images themselves, however, are 12 | delivered by a special protocol, IIIF Image API cf. the [IIIF 13 | Documents](http://iiif.io/api/image/2.1/). We give a short 14 | introduction on how to construct an image URI given the data you may 15 | get from COP. 16 | 17 | Two cases arise: 18 | 19 | ## Constructing single image URIs 20 | 21 | The identifier of an image is found in an element called md:identifier 22 | (see the [metadata section](metadata-formats.md#identifiers) for more 23 | information on how to find them in the MODS section) with 24 | displayLabel="image" and displayLabel="thumbnail", respectively. 
25 | 26 | 27 | ``` 28 | Image Uri 30 | ``` 31 | or 32 | 33 | ``` 34 | Thumbnail Uri 36 | ``` 37 | 38 | The URIs in the two has the forms 39 | 40 | http://www.kb.dk/imageService/online_master_arkiv_6/non-archival/Images/BILLED/2008/Billede/dk_eksp_album_191/kbb_alb_2_191_friis_011.jpg 41 | 42 | and 43 | 44 | http://www.kb.dk/imageService/w150/h150/online_master_arkiv_6/non-archival/Images/BILLED/2008/Billede/dk_eksp_album_191/kbb_alb_2_191_friis_011.jpg 45 | 46 | and return 1024 and 150 px wide images, respectively. 47 | 48 | These forms are old and deprecated and predate our implementation of IIIF. The 49 | image's width and height can be set using the parameters w and h in the 50 | URI path. Only one of them are needed. E.g., 51 | 52 | http://www.kb.dk/imageService/w640/online_master_arkiv_6/non-archival/Images/BILLED/2008/Billede/dk_eksp_album_191/kbb_alb_2_191_friis_011.jpg 53 | 54 | will return a 640 px wide image. 55 | 56 | ## Constructing IIIF URIs 57 | 58 | The [IIIF Image API](http://iiif.io/api/image/2.1/) use other syntaxes. 59 | 60 | In the Image Uri mentioned above, the substing after `imageService` and before `.jpg` is 61 | 62 | imageService/online_master_arkiv_6/non-archival/Images/BILLED/2008/Billede/dk_eksp_album_191/kbb_alb_2_191_friis_011 63 | 64 | prepending `http://kb-images.kb.dk/` and `appending` /info.json to 65 | that string we get the URI of the technical metadata of the image in 66 | json format from the server. 
I.e., 67 | 68 | http://kb-images.kb.dk/online_master_arkiv_6/non-archival/Images/BILLED/2008/Billede/dk_eksp_album_191/kbb_alb_2_191_friis_011/info.json 69 | 70 | returning 71 | 72 | ``` 73 | { 74 | "@context" : "http://iiif.io/api/image/2/context.json", 75 | "@id" : "http://kb-images.kb.dk/online_master_arkiv_6/non-archival/Images/BILLED/2008/Billede/dk_eksp_album_191/kbb_alb_2_191_friis_011", 76 | "protocol" : "http://iiif.io/api/image", 77 | "width" : 2940, 78 | "height" : 2212, 79 | "sizes" : [ 80 | { "width" : 183, "height" : 138 }, 81 | { "width" : 367, "height" : 276 }, 82 | { "width" : 735, "height" : 553 }, 83 | { "width" : 1470, "height" : 1106 } 84 | ], 85 | "tiles" : [ 86 | { "width" : 256, "height" : 256, "scaleFactors" : [ 1, 2, 4, 8, 16 ] } 87 | ], 88 | "profile" : [ 89 | "http://iiif.io/api/image/2/level1.json", 90 | { "formats" : [ "jpg" ], 91 | "qualities" : [ "native","color","gray" ], 92 | "supports" : ["regionByPct","sizeByForcedWh","sizeByWh","sizeAboveFull","rotationBy90s","mirroring","gray"] } 93 | ] 94 | } 95 | ``` 96 | 97 | Note the @id in the json-ld above. Append '/full/!2000,/0/native.jpg' to that and you get 98 | 99 | http://kb-images.kb.dk/online_master_arkiv_6/non-archival/Images/BILLED/2008/Billede/dk_eksp_album_191/kbb_alb_2_191_friis_011/full/!2000,/0/native.jpg 100 | 101 | Dereference that and you will get a 2000 px wide image. The width of 102 | this particular image is 2940 and you can get all of that; you 103 | can retrieve images up to 8000x8000 px. 104 | 105 | Image URIs may appear in a few more formats. Note, for example, that the 106 | htmlUrl values in outlines of [Chresten Jensens 107 | Visebog](cop-backend.md#content-service) are actually links to tif 108 | files.
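
The same construction works for such tif paths as for the imageService URIs above. The following is a minimal Python sketch, assuming that every image is served under `http://kb-images.kb.dk/` and that dropping the file extension gives the IIIF identifier, as the examples in this document suggest. The function names are ours for illustration and are not part of any COP API.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: the IIIF server named in this document serves all images.
IIIF_SERVER = "http://kb-images.kb.dk/"


def iiif_id(source: str) -> str:
    """Return the IIIF @id for a legacy imageService URI or a bare image path.

    Legacy URIs carrying w.../h... size segments (thumbnails) are not handled.
    """
    path = urlparse(source).path.lstrip("/")
    marker = "imageService/"
    if path.startswith(marker):
        path = path[len(marker):]
    stem, _extension = posixpath.splitext(path)  # drop .jpg, .tif, ...
    return IIIF_SERVER + stem


def info_json_uri(source: str) -> str:
    """URI of the technical metadata (json-ld) of the image."""
    return iiif_id(source) + "/info.json"


def image_uri(source: str, size: str = "!1024,") -> str:
    """URI of the full image region at the requested size."""
    return f"{iiif_id(source)}/full/{size}/0/native.jpg"


if __name__ == "__main__":
    page = ("online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/"
            "dfs_1906_6a_16/dfs_1906_6a_16_001.tif")
    print(info_json_uri(page))
    print(image_uri(page))
```

The size argument is passed through unchanged, so `!1024,` and `!2000,` behave as in the examples above.
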
You can easily retrieve a 109 | [json-ld file for each page](http://kb-images.kb.dk/online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/dfs_1906_6a_16/dfs_1906_6a_16_001/info.json) 110 | in the manuscript and then [download them](http://kb-images.kb.dk/online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/dfs_1906_6a_16/dfs_1906_6a_16_001/full/!1024,/0/native.jpg) 111 | -------------------------------------------------------------------------------- /kml-viewer.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | KML example 4 | 5 | 6 |

The viewer-js used here isn't compatible with the needles we are asking for in our KML.

7 |
8 |
9 | 10 | 11 |
12 |
13 | 14 | 15 | -------------------------------------------------------------------------------- /links.md: -------------------------------------------------------------------------------- 1 | 2 | # Links in COP 3 | 4 | The objects are presented as parts of what we refer to as 5 | editions. Each edition is typically the fruit of a digitisation 6 | project within a collection. 7 | 8 | 9 | 10 | 11 | 12 | pamphlets 13 | manus 14 | images 15 | editions 16 | letters 17 | maps 18 | books 19 | 20 | 21 | 22 | | edition | description | 23 | |:--------|:------------| 24 | | /editions/any/2009/jul/editions | The edition of all editions | 25 | | /books/judsam/2010/maj/jstryk | Judaistisk Samling: Tidlige & sjældne tryk | 26 | | /books/ortsam/2011/mar/ostryk | Tidlige tryk i Orientalsk Samling | 27 | | /images/billed/2010/okt/billeder | Billeder | 28 | | /images/billed/2014/jun/hca | H.C. Andersens Papirklip | 29 | | /letters/judsam/2011/mar/dsa | David Simonsens Arkiv | 30 | | /manus/judsam/2009/sep/dsh | David Simonsens Håndskrifter | 31 | | /manus/judsam/2010/maj/jsmss | Judaistisk Samling: Håndskrifter | 32 | | /manus/musman/2010/dec/viser | DFS | 33 | | /manus/ortsam/2009/okt/orientalia | Oriental Collection: Manuscripts | 34 | | /manus/vmanus/2011/dec/ha | Vesterlandske håndskrifter | 35 | | /maps/kortsa/2012/jul/kortatlas | Kort og Atlas | 36 | | /pamphlets/dasmaa/2008/feb/daellsvarehus | Varehuskataloger | 37 | | /pamphlets/dasmaa/2008/feb/partiprogrammer | Partiprogram | 38 | | /pamphlets/dasmaa/2012/jul/smaatryk | Småtryk | 39 | -------------------------------------------------------------------------------- /metadata-formats.md: -------------------------------------------------------------------------------- 1 | [READ ME](README.md) - [OAI Dissemination](oai-pmh.md) - [Web services in COP](cop-backend.md) - [Aerial Photography](geographic-data.md) - [Image delivery](image-delivery.md) - [Metadata Formats](metadata-formats.md) - [Text Corpora](text-corpora.md) 2 | 3 | 
# The Metadata Formats Used in Syndication and Dissemination 4 | 5 | Through the `format` CGI variable, data can be syndicated in the 6 | following formats: `kml`, `rss`, `solr` and `mods`. 7 | 8 | ## format=kml 9 | 10 | The [KML](https://developers.google.com/kml/documentation/) feed has 11 | not been designed to be consumed directly by external software like 12 | [Google Maps](http://maps.google.com/), and at the time of writing it is 13 | not known whether that works. 14 | 15 | ## format=rss 16 | 17 | [RSS 2.0](https://cyber.harvard.edu/rss/rss.html) is the main 18 | format. This feed includes Open Search, geo and GeoRSS extensions. For 19 | example 20 | 21 | ``` 22 | 55.27093341035208 10.155321304320978 23 | 55.27093341035208 24 | 10.155321304320978 25 | ``` 26 | 27 | which is the position of [a building in 28 | DSFL](http://www.kb.dk/cop/syndication/images/luftfo/2011/maj/luftfoto/object176012/da/). 29 | 30 | The `geo` namespace refers to https://www.w3.org/2003/01/geo/wgs84_pos 31 | and the `georss` to http://www.georss.org/georss. The other RSS 32 | elements used are `title` and `link`. 33 | 34 | ``` 35 | Viby - 1952 36 | http://www.kb.dk/images/luftfo/2011/maj/luftfoto/object154326/da/ 37 | ``` 38 | 39 | We embed the detailed bibliographic metadata directly inside the 40 | RSS. So each item in the feed has the following outline. 41 | 42 | ``` 43 | 44 | ... 45 | http://www.kb.dk/... 46 | 47 | ... 48 | latitude, longitude 49 | latitude 50 | longitude 51 | 52 | ``` 53 | 54 | So the RSS feed allows us to get everything, while it is actually 55 | understood by browsers and other software. However, the ` 56 | ... ` is only understood by software aware of this practice. 57 | 58 | ## format=mods 59 | 60 | As noted [elsewhere](cop-backend.md#syndication-service-formats), the mods format contains data in this form 61 | 62 | ``` 63 | 64 | ... open search header ... 65 | ... 66 | ... 67 | ... 68 | ... 
69 | 70 | ``` 71 | 72 | where the open search header is described in the [syndication service 73 | documentation](cop-backend.md#open-search). 74 | 75 | ## Metadata conventions 76 | 77 | The bulk of the data is in the mods object. The [Metadata Object 78 | Description Schema 79 | (mods)](http://www.loc.gov/standards/mods/mods-guidance.html) is 80 | developed and maintained by the Library of Congress and is well 81 | documented. Here we restrict ourselves to conventions and 82 | idiosyncrasies in our implementations. 83 | 84 | ### Identifiers 85 | 86 | 87 | | XPath | Example value | Comment | 88 | |:------|:--------------|:--------| 89 | | //md:identifier[@type="uri"][1] | `` http://www.kb.dk/images/billed/2010/okt/billeder/object67582/da/ `` | URI of the object | 90 | | //md:identifier[@type="local"] | `` H02165_020.tif `` | The local ID is usually the object's call number | 91 | | //md:identifier[@type="domsGuid"] | `` Uid:dk:kb:doms:2007-01/d651d950-1e88-11e2-808e-0016357f605f `` | A UUID in any of several formats, usable for connecting the object to the ID used for digital preservation, if applicable | 92 | | //md:identifier[@displayLabel="image"][@type="uri"] | `` http://www.kb.dk/imageService/online_master_arkiv_12/non-archival/Maps/FYNLUFTFOTO/H-serien/H02165/H02165_020.jpg `` | A fairly high resolution JPG | 93 | | //md:identifier[@displayLabel="thumbnail"][@type="uri"] | `` http://www.kb.dk/imageService/w150/h150/online_master_arkiv_12/non-archival/Maps/FYNLUFTFOTO/H-serien/H02165/H02165_020.jpg `` | URI of a thumbnail | 94 | 95 | 96 | ### Language and other information about the records 97 | 98 | The COP user interface is bilingual, supporting Danish and 99 | English. Technically that implies that the cataloguing language can be 100 | either of these two languages. It is possible to read the cataloguing 101 | language from the recordInfo section of a record.
102 | 103 | 104 | ``` 105 | 106 | 107 | da 108 | 109 | http://www.kb.dk/cop/images/luftfo/2011/maj/luftfoto/object154326 110 | 2012-09-26 111 | 2012-10-25 112 | 113 | ``` 114 | 115 | The ID of the record and its creation and revision dates can be found here as well. 116 | 117 | ### Resource language 118 | 119 | Some of the objects in COP are in unusual language and script 120 | combinations, like Judeo-Arabic (Arabic written in Hebrew script) and 121 | even western European languages like Italian and German written in 122 | Hebrew. In our Oriental collections the combination of language and 123 | script is not at all obvious. Throughout history people have written the 124 | languages they know, using the scripts they know, which in some cases 125 | gives rise to texts like this [Die abhandlung der 126 | algebra](http://www.kb.dk/manus/judsam/2009/sep/dsh/object28241/en/) 127 | which is written in German but in Hebrew script. 128 | 129 | To handle this we use an Internet standard for [language tagging, 130 | namely RFC 4646](https://www.ietf.org/rfc/rfc4646.txt). 131 | 132 | ### Event metadata 133 | 134 | In publishing the [Comics, caricature and newspaper cartoons 135 | collection](http://www.kb.dk/images/billed/2010/okt/billeder/subject2427/da/) 136 | we realized that it is more or less meaningless to publish artwork of 137 | this kind without being able to connect it to a description of the 138 | situation in which it was created. Find below an example of how such an event 139 | is encoded as a related item, which in this case is a revue performance. 140 | 141 | ``` 142 | 143 | Revy af Heick, Hilda (f. 1946) sanger og Heick, Keld (f. 1946) musiker og sanger 144 | 145 | 1991 146 | 147 | Lindenborg Kro, Lindenborg, Roskilde 148 | 149 | 150 | 151 | ``` 152 | 153 | A `` element may contain any element allowed in the 154 | `` root element; the relation can be given in the `type` 155 | attribute, and the element can use an xlink:href instead of containing elements.
156 | 157 | ### Host publications 158 | 159 | A newspaper cartoon, just as an example, appears as a 160 | `` of type host. Only a handful of elements are supported 161 | inside hosts. 162 | 163 | ``` 164 | 165 | 166 | B.T. 167 | 168 | 169 | ``` 170 | 171 | ### Complex and multipage resources 172 | 173 | COP supports a hierarchical table of contents. A fairly large 174 | proportion of the objects are complex and use this feature. There are, 175 | however, only very few fields other than an image URI. The system 176 | supports titles on each item though. [This is one such 177 | object](http://www.kb.dk/cop/syndication/manus/musman/2010/dec/viser/object23942/en/?format=mods) 178 | 179 | 180 | ``` 181 | 182 | online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/dfs_1906_6a_16/dfs_1906_6a_16_001.tif 183 | 184 | Chresten Jensens Visebog 185 | 186 | 187 | online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/dfs_1906_6a_16/dfs_1906_6a_16_002.tif 188 | 189 | 190 | online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/dfs_1906_6a_16/dfs_1906_6a_16_003.tif 191 | 192 | 01 Held Danmarks gamle rige 193 | 194 | 195 | 196 | online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/dfs_1906_6a_16/dfs_1906_6a_16_004.tif 197 | 198 | online_master_arkiv_5/non-archival/Manus/VMANUS/2009/jun/dfs_1906_6a_16/dfs_1906_6a_16_005.tif 199 | 200 | 02 Frivillige møder vi 201 | 202 | 203 | ``` 204 | 205 | ## format=solr 206 | 207 | The cop backend/crud engine is now capable of indexing its databases using SOLR. The procedure is: 208 | 209 | 1. dumb down the record to DC and ESE 210 | 2. create elements directly from mods for data which has to be a little bit smarter 211 | 3. load the result into SOLR.
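
A record in this form can be requested from the syndication service by setting the `format` variable. The following is a minimal Python sketch, assuming the URI layout seen in the syndication links in this document (edition path and object id, then language, then `format`). `syndication_uri` is a hypothetical helper, not part of COP.

```python
from urllib.parse import urlencode

# Assumption: base URI taken from the syndication links in this document.
SYNDICATION_BASE = "http://www.kb.dk/cop/syndication"
# The four formats listed for the `format` CGI variable.
FORMATS = ("kml", "rss", "solr", "mods")


def syndication_uri(object_path: str, lang: str = "da", fmt: str = "solr") -> str:
    """Build a syndication URI for one object in the requested format.

    object_path is the edition path plus object id, e.g.
    "/images/billed/2010/okt/billeder/object67582".
    """
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    path = "/" + object_path.strip("/")
    return f"{SYNDICATION_BASE}{path}/{lang}/?{urlencode({'format': fmt})}"


if __name__ == "__main__":
    record = "/images/billed/2010/okt/billeder/object67582"
    for fmt in FORMATS:
        print(syndication_uri(record, "da", fmt))
```
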
212 | 213 | 214 | http://www.kb.dk/cop/syndication/images/billed/2010/okt/billeder/object67582/da/?format=solr 215 | 216 | 217 | The same record as delivered from our public index 218 | 219 | 220 | http://public-index.kb.dk/solr/cop-editions/select?defType=edismax&indent=on&q=id:/images/billed/2010/okt/billeder/object67582&wt=json 221 | 222 | 223 | The semantics of the fields are outlined in the [COP SOLR fields](cop-solr-fields.md) documentation. 224 | We have [a brief demo](http://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/cop-solr-form.html) on how to use them. 225 | 226 | 227 | ## Europeana Semantic Elements (ESE) 228 | 229 | [ESE](http://pro.europeana.eu/page/ese-documentation) is only 230 | supported by the [OAI server](README.md#dissemination-of-metadata), 231 | not the syndication services. EDM is not supported. 232 | 233 | 234 | -------------------------------------------------------------------------------- /oai-pmh.md: -------------------------------------------------------------------------------- 1 | [READ ME](README.md) - [OAI Dissemination](oai-pmh.md) - [Web services in COP](cop-backend.md) - [Aerial Photography](geographic-data.md) - [Image delivery](image-delivery.md) - [Metadata Formats](metadata-formats.md) - [Text Corpora](text-corpora.md) 2 | 3 | # OAI Dissemination API 4 | 5 | Our dissemination API is OAI-PMH, and our targets are COP, Aleph (our 6 | OPAC) and the National aggregator. The most important aggregators are 7 | the National Aggregator (which is our own OAI to OAI gateway and 8 | aggregation service), REX, our discovery system, and Europeana. 9 | 10 | When using these services, you need to store the harvested records in a database or 11 | index of your own. The [Open Archives Initiative Protocol for Metadata 12 | Harvesting](http://www.openarchives.org/OAI/openarchivesprotocol.html) 13 | (OAI PMH) is well known, and we will not provide detailed information 14 | about it here.
15 | 16 | A few examples to give an idea of what it is about: 17 | 18 | 1. Each OAI provider should be able to [Identify](http://www.openarchives.org/OAI/openarchivesprotocol.html#Identify) itself. 19 | + http://www.kb.dk/cop/oai/?verb=Identify 20 | 21 | 2. An OAI provider may contain multiple [collections or sets](http://www.openarchives.org/OAI/openarchivesprotocol.html#ListSets) 22 | + http://oai.kb.dk/oai/provider?verb=ListSets 23 | 24 | 3. For a given set we provide access to a [list of records](http://www.openarchives.org/OAI/openarchivesprotocol.html#ListRecords), 25 | for instance of all manuscripts in the Judaica collection 26 | + http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:judsam:2010:maj:jsmss&metadataPrefix=mods 27 | 28 | 4. One should be able to get hold of a [single record, given its Identifier](http://www.openarchives.org/OAI/openarchivesprotocol.html#GetRecord) 29 | + http://www.kb.dk/cop/oai/?verb=GetRecord&identifier=oai:kb.dk:manus:judsam:2010:maj:jsmss:object62730&metadataPrefix=oai_dc 30 | 31 | You can easily harvest any of our editions by choosing the 32 | desired set in the ListSets example above and inserting it into the 33 | ListRecords URI. In the ListRecords example above we obtained the 34 | records in mods format; other alternatives are oai_dc and ese. See 35 | below.
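
The harvesting URIs described above can be assembled programmatically. The following is a minimal Python sketch, assuming the base URI `http://www.kb.dk/cop/oai/` used in the examples. The helper names are hypothetical, and the mapping from an edition path to a set spec is inferred from the set names in this documentation rather than from a published rule.

```python
from urllib.parse import urlencode

# Assumption: base URI taken from the ListRecords and GetRecord examples.
OAI_BASE = "http://www.kb.dk/cop/oai/"
# Metadata formats named in this document.
METADATA_PREFIXES = ("mods", "oai_dc", "ese")


def set_spec(edition_path: str) -> str:
    """Turn an edition path such as /manus/judsam/2010/maj/jsmss into a set spec.

    Inferred pattern: prefix "oai:kb.dk" and replace each "/" with ":".
    """
    return "oai:kb.dk" + edition_path.rstrip("/").replace("/", ":")


def _oai_uri(query: dict) -> str:
    # Keep ":" unescaped so the result looks like the URIs in this document.
    return OAI_BASE + "?" + urlencode(query, safe=":")


def list_records_uri(spec: str, metadata_prefix: str = "mods") -> str:
    """URI listing all records of one set in the given metadata format."""
    if metadata_prefix not in METADATA_PREFIXES:
        raise ValueError(f"unknown metadataPrefix: {metadata_prefix!r}")
    return _oai_uri({"verb": "ListRecords", "set": spec,
                     "metadataPrefix": metadata_prefix})


def get_record_uri(identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """URI retrieving a single record by its OAI identifier."""
    if metadata_prefix not in METADATA_PREFIXES:
        raise ValueError(f"unknown metadataPrefix: {metadata_prefix!r}")
    return _oai_uri({"verb": "GetRecord", "identifier": identifier,
                     "metadataPrefix": metadata_prefix})


if __name__ == "__main__":
    spec = set_spec("/manus/judsam/2010/maj/jsmss")
    print(list_records_uri(spec, "ese"))
    print(get_record_uri(spec + ":object62730"))
```

A real harvester also has to follow the `resumptionToken` that OAI-PMH providers return with incomplete lists; that part is left out here.
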
36 | 37 | Examples: 38 | 39 | + http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:judsam:2010:maj:jsmss&metadataPrefix=ese 40 | + http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:judsam:2010:maj:jsmss&metadataPrefix=oai_dc 41 | 42 | | Set Spec | Set Name | 43 | |----------|----------| 44 | | [oai:kb.dk:letters:judsam:2011:mar:dsa](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:letters:judsam:2011:mar:dsa&metadataPrefix=ese) | David Simonsens Arkiv | 45 | | [oai:kb.dk:manus:judsam:2010:maj:jsmss](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:judsam:2010:maj:jsmss&metadataPrefix=ese) | Judaistisk Samling: Håndskrifter | 46 | | [oai:kb.dk:images:luftfo:2011:maj:luftfoto](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:images:luftfo:2011:maj:luftfoto&metadataPrefix=ese) | Luftfoto | 47 | | [oai:kb.dk:pamphlets:dasmaa:2012:jul:smaatryk](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:pamphlets:dasmaa:2012:jul:smaatryk&metadataPrefix=ese) | Småtryk | 48 | | [oai:kb.dk:maps:kortsa:2012:jul:kortatlas](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:maps:kortsa:2012:jul:kortatlas&metadataPrefix=ese) | Kort og Atlas | 49 | | [oai:kb.dk:pamphlets:dasmaa:2008:feb:partiprogrammer](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:pamphlets:dasmaa:2008:feb:partiprogrammer&metadataPrefix=ese) | Partiprogram | 50 | | [oai:kb.dk:images:billed:2010:okt:billeder](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:images:billed:2010:okt:billeder&metadataPrefix=ese) | Billeder | 51 | | [oai:kb.dk:manus:ortsam:2009:okt:orientalia](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:ortsam:2009:okt:orientalia&metadataPrefix=ese) | Oriental Collection: Manuscripts | 52 | | [oai:kb.dk:pamphlets:dasmaa:2008:feb:daellsvarehus](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:pamphlets:dasmaa:2008:feb:daellsvarehus&metadataPrefix=ese) | Varehuskataloger | 53 | | 
[oai:kb.dk:\books:judsam:2010:maj:jstryk](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:books:judsam:2010:maj:jstryk&metadataPrefix=ese) | Judaistisk Samling: Tidlige & sjældne tryk | 54 | | [oai:kb.dk:manus:judsam:2009:sep:dsh](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:judsam:2009:sep:dsh&metadataPrefix=ese) | David Simonsens Håndskrifter | 55 | | [oai:kb.dk:\books:ortsam:2011:mar:ostryk](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:books:ortsam:2011:mar:ostryk&metadataPrefix=ese) | Tidlige tryk i Orientalsk Samling | 56 | | [oai:kb.dk:manus:musman:2010:dec:viser](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:musman:2010:dec:viser&metadataPrefix=ese) | DFS | 57 | | [oai:kb.dk:images:billed:2014:jun:hca](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:images:billed:2014:jun:hca&metadataPrefix=ese) | H.C. Andersen | 58 | | [oai:kb.dk:manus:vmanus:2011:dec:ha](http://www.kb.dk/cop/oai/?verb=ListRecords&set=oai:kb.dk:manus:vmanus:2011:dec:ha&metadataPrefix=ese) | Vesterlandske håndskrifter | 59 | 60 | 61 | -------------------------------------------------------------------------------- /text-corpora.md: -------------------------------------------------------------------------------- 1 | [READ ME](README.md) - [OAI Dissemination](oai-pmh.md) - [Web services in COP](cop-backend.md) - [Aerial Photography](geographic-data.md) - [Image delivery](image-delivery.md) - [Metadata Formats](metadata-formats.md) - [Text Corpora](text-corpora.md) 2 | 3 | # Access to web services for text search, retrieval and other operations 4 | 5 | The Royal Danish Library provide access to some text and language 6 | resources. Until recently these resources have been intended solely for 7 | users coming to a site using a browser for searching, browsing and 8 | reading. 9 | 10 | Recently we have decided to complement these end user services with 11 | various text APIs. 
We hope that they are useful for students and 12 | scholars alike, and we also hope that this could be seen as a 13 | contribution to the discussions on what kind of web services and what 14 | APIs are useful within digital humanities and literary computing. 15 | 16 | The text resources are 17 | 18 | * [Archive for Danish Literature, ADL](http://www.adl.dk/). The APIs 19 | described in this document apply to this data set. The [literary texts used are available](https://github.com/Det-Kongelige-Bibliotek/public-adl-text-sources) 20 | * [Danmark's Breve](http://danmarksbreve.kb.dk/) uses basically the 21 | same APIs, but we have not decided to release the API for this data 22 | set. 23 | 24 | The APIs described here are provided with similar 25 | [caveats](README.md#caveats) and [legal 26 | restrictions](README.md#licences--legalese) as the other services 27 | described, and like them, these APIs are work in progress as public 28 | services. Also they are byproducts of our services and front ends. 29 | 30 | There are two kinds of services (and thus servers hosting the corresponding APIs) 31 | 32 | * text search service API 33 | * text retrieval service API 34 | 35 | The meaning of search service is obvious; the text retrieval service 36 | is somewhat less so. _Snippet server_ is our internal nickname for a 37 | set of web services that retrieves, transforms and delivers text 38 | snippets to the front end or other components using it. 39 | 40 | To make good use of them, you need both search and retrieval APIs. 41 | Then you may search and discover what works and snippets there are, 42 | and retrieve and link to them. 43 | 44 | ## Text encoding 45 | 46 | Most texts are from collected works and are critical editions. All 47 | data and metadata available are in XML markup according to the Text Encoding 48 | Initiative, TEI, 49 | [Guidelines](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/).
50 | 51 | ## Anchors, searchability and retrievability 52 | 53 | The search system (which you cannot use just yet, see above) creates 54 | records corresponding to three levels 55 | 56 | * volume 57 | * work 58 | * text item 59 | 60 | where volume and work are defined as described above. When indexing, 61 | records are created taking 62 | 63 | * metadata from the TEI header given the reference in the decls attribute 64 | * text from the appropriate level, and below. 65 | 66 | All work and text item records contain data on the xml:id of the 67 | containing element and the xml:id page number of the preceding page 68 | break. 69 | 70 | The text items are indexed in such a way that a search result can address a single 71 | 72 | * paragraph of prose 73 | * strophe in poetry 74 | * speech in a play 75 | 76 | Please note that strophes occurring inside a speech are not recognised 77 | as poetry. 78 | 79 | Typically one volume contributes (obviously) one volume record, one to 80 | dozens of work records and hundreds or thousands of text items. 81 | The records for works and text items import basic metadata and includes 82 | 83 | ## Connecting text with facsimile 84 | 85 | Our digital library has users interested in viewing the printed text, 86 | if for no other reason than to check the original when there 87 | are OCR errors. 88 | 89 | Facsimiles are delivered through our IIIF server. A page is turned 90 | whenever one finds a page break in the XML text, that is, a 91 | &lt;pb/&gt; element.
It looks like 92 | 93 | ``` 94 | 95 | ``` 96 | 97 | An image URI is constructed by prepending 98 | `http://kb-images.kb.dk/public/` and appending 99 | `/full/,750/0/native.jpg` to the content of the facs attribute in the 100 | page break, resulting in a URI of this form: 101 | 102 | http://kb-images.kb.dk/public/adl/grundtvig/grundtvig08/grun8136/full/,750/0/native.jpg 103 | 104 | All images connected to a given snippet can be retrieved as an HTML 105 | document through the facsimile web service 106 | 107 | http://labs.kb.dk/storage/adl/present.xq?c=texts&doc=grundtvig08val.xml&id=workid80553&op=facsimile 108 | 109 | 110 | ## Text search 111 | 112 | The search API is described in detail in separate documents 113 | 114 | * We use [SOLR for searching](https://cwiki.apache.org/confluence/display/solr/Searching) 115 | * SOLR has its own [Common Query Parameters](https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters) 116 | * We provide a [document about what search fields there are and how to use them](http://rawgit.com/Det-Kongelige-Bibliotek/access-digital-objects/master/form-demos/adl-form.html). 117 | 118 | A search result can be returned in JSON or XML format. Here is an example, where we search for works 119 | 120 | * whose title contains Jerusalem 121 | * that are written by Gustaf Munch-Petersen 122 | 123 | SOLR returns [JSON](http://public-index.kb.dk/solr/text-retriever-core/select/?q=author_name_tesim%3AGustaf+Munch-Petersen%0D%0Aand%0D%0Acat_ssi%3Awork%0D%0Aand%0D%0Awork_title_tesim%3AJerusalem&wt=json&start=0&rows=10&defType=edismax&indent=on) or [XML](http://public-index.kb.dk/solr/text-retriever-core/select/?q=author_name_tesim%3AGustaf+Munch-Petersen%0D%0Aand%0D%0Acat_ssi%3Awork%0D%0Aand%0D%0Awork_title_tesim%3AJerusalem&wt=xml&start=0&rows=10&defType=edismax&indent=on) and the returned content is the same. 124 | 125 | The simplest way to retrieve the data is to look for the url_ssi.
In the example linked to, it contains the value "texts/munp1.xml#workid72997", which is the concatenation of three variables 126 | 127 | * collection (c) = texts 128 | * document (doc) = munp1.xml 129 | * id = workid72997 130 | 131 | You can now construct the retrieval URI using the script present.xq and the three parameters: 132 | 133 | * [http://labs.kb.dk/storage/adl/present.xq?c=texts&doc=munp1.xml&id=workid72997&op=render](http://labs.kb.dk/storage/adl/present.xq?c=texts&doc=munp1.xml&id=workid72997&op=render) 134 | * [http://labs.kb.dk/storage/adl/present.xq?c=texts&doc=munp1.xml&id=workid72997&op=toc&targetOp=render](http://labs.kb.dk/storage/adl/present.xq?c=texts&doc=munp1.xml&id=workid72997&op=toc&targetOp=render) 135 | 136 | More on what you can do with the texts using the parameters follows below. 137 | 138 | ## Retrieval APIs for our texts 139 | 140 | There are several text retrieval scripts in the Snippet Server. 141 | [The source code is free](https://github.com/Det-Kongelige-Bibliotek/solr-and-snippets). 142 | 143 | We concentrate on two. The first is present.xq, which we use for extracting snippets 144 | and transforming them. The HTML produced is mere fragments that you 145 | can include in your document as you like. 146 | 147 | There is an alternative script, present-text.xq, which does the same as 148 | present.xq, except that it delivers the text as pure text with 149 | neither XML nor HTML markup. 150 | 151 | Virtually all scripts work similarly, with the following arguments. 152 | 153 | * doc -- the name of the document to be rendered or transformed.
Here are some examples of doc names you can test 154 | * [hcaeventyr01val.xml](http://labs.kb.dk/storage/adl/texts/hcaeventyr01val.xml) 155 | * [hcaeventyr02val.xml](http://labs.kb.dk/storage/adl/texts/hcaeventyr02val.xml) 156 | * [munp1.xml](http://labs.kb.dk/storage/adl/texts/munp1.xml) 157 | * [munp2.xml](http://labs.kb.dk/storage/adl/texts/munp2.xml) 158 | * op, targetOp -- op is the operation to be performed upon the document doc, targetOp is the operation to be performed in links inside the service. Possible values of op and targetOp are 159 | * 'render' which implies that doc is transformed into HTML. 160 | * http://labs.kb.dk/storage/adl/present.xq?doc=aakjaer01val.xml&op=render 161 | * http://labs.kb.dk/storage/adl/present.xq?doc=aakjaer01val.xml&op=render&q=samlede with an argument q giving a search string to be highlighted in the text, in this case _samlede_ 162 | * 'solrize' which returns a solr ... document, which is ready to be sent to SOLR. C.f., http://labs.kb.dk/storage/adl/present.xq?doc=aakjaer01val.xml&op=solrize 163 | * 'toc' returns a HTML table of contents 164 | * http://labs.kb.dk/storage/adl/present.xq?doc=aakjaer01val.xml&op=toc If a 'toc' and a text generated through 'render' are included into one document, all internal links will work. 165 | * http://labs.kb.dk/storage/adl/present.xq?doc=aakjaer01val.xml&op=toc&targetOp=render 166 | note the targetOp=render, which makes the toc script generate links to the _render_ed version of the doc. This is good for testing. 167 | * id -- the id of a part inside the doc which is to be treated. 
168 | * q -- the query; present.xq labels the hits in the text 169 | 170 | Some more examples 171 | 172 | * Holberg, vol 3, HTML: http://labs.kb.dk/storage/adl/present.xq?doc=holb03val.xml&op=render 173 | * Holberg, vol 3, page 18: http://labs.kb.dk/storage/adl/present.xq?doc=holb03val.xml&op=render#s18 174 | * The TOC of Den politiske Kandstøber http://labs.kb.dk/storage/adl/present.xq?doc=holb03val.xml&op=toc&targetOp=render&id=workid54980 175 | * The TOC of Den politiske Kandstøber, Actus II http://labs.kb.dk/storage/adl/present.xq?doc=holb03val.xml&op=toc&targetOp=render&id=idm140583366846000 176 | * Den politiske Kandstøber, Actus II http://labs.kb.dk/storage/adl/present.xq?doc=holb03val.xml&op=render&id=idm140583366846000 177 | * A single speech in that play, 178 | * as HTML http://labs.kb.dk/storage/adl/present.xq?doc=holb03val.xml&op=render&id=idm140583366681648 179 | * or as SOLR doc http://labs.kb.dk/storage/adl/present.xq?doc=holb03val.xml&op=solrize&id=idm140583366681648 180 | * A TOC for a small work http://labs.kb.dk/storage/adl/present.xq?doc=aakjaer01val.xml&op=toc&targetOp=render&id=workid59384 181 | * The page 27 (of the original volume) inside that work http://labs.kb.dk/storage/adl/present.xq?doc=aakjaer01val.xml&op=toc&targetOp=render&id=workid593843#s27 182 | 183 | -------------------------------------------------------------------------------- /web-service-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kb-dk/access-digital-objects/4b2645bde7de65a99a8f98993d50c8adaea37a6e/web-service-architecture.png --------------------------------------------------------------------------------