Architecture of the World Wide Web (Second Edition)

├── .gitattributes
├── README.md
├── editorial.css
├── images
│   └── uri-res-rep.png
├── index.html
└── tidyconfig.txt

/.gitattributes:
--------------------------------------------------------------------------------
# I live in cygwin -- no CR wanted ever
* text eol=lf

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
webarch
====
[The Architecture of the World Wide Web](http://w3ctag.github.io/webarch/) (Second Edition)

* Unofficial editors' draft: http://w3ctag.github.io/webarch/
* Not yet ready for prime-time

--------------------------------------------------------------------------------
/editorial.css:
--------------------------------------------------------------------------------
.del { background-color: #ffbbbb }
.del:before { content: "\002193" } /* down-arrow */
.del:after { content: "\002193" }
.ins { background-color: #bbffbb }
.ins:before { content: "\002191" } /* up-arrow */
.ins:after { content: "\002191" }

--------------------------------------------------------------------------------
/images/uri-res-rep.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w3ctag/webarch/866b0fc141f9b690a0ce64134316aa7d9f187d90/images/uri-res-rep.png

--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------

105 |

The World Wide Web uses relatively simple technologies with sufficient scalability, efficiency and utility that they have resulted in a remarkable ↓information space of interrelated resources↓ ↑interconnected space of information and services↑, growing across languages, cultures, and media. In an effort to preserve these properties of the information space as the technologies evolve, this architecture document discusses the core design components of the Web. They are identification of ↓resources↓ ↑information and services↑, representation of ↓resource state↓ ↑information state and service requests↑, and the protocols that support the interaction between agents and resources in the space. We relate core design components, constraints, and good practices to the principles and properties they support.


126 | This is an unofficial draft and work in progress. It has no official 127 | standing: rather it represents a trial balloon to help the TAG decide whether to proceed with a 129 | second edition of AWWW or 130 | not. 131 |

132 |

This draft highlights most differences from the first edition of AWWW, with deletions presented ↓like this↓ and insertions presented ↑like this↑.

138 |

139 | This section describes the status of this document at the time of 140 | its publication. Other documents may supersede this document. A list of 141 | current W3C publications and the latest revision of this technical 142 | report can be found in the W3C 143 | technical reports index at http://www.w3.org/TR/. 144 |

145 |

146 | This document has been developed by W3C's Technical Architecture Group (TAG), 148 | which, by charter 149 | maintains a list of 150 | architectural issues. The scope of this document is a useful subset 151 | of those issues; it is not intended to address all of them. The TAG 152 | intends to address the remaining (and future) issues after publication 153 | of Volume Two as a Recommendation. 154 |

155 |

This document uses concepts and terms regarding URIs in general, and http: URIs in particular, as defined by the IETF. In an 18 Oct 2004 announcement, the revision of RFC2396 was endorsed as an IETF Specification, though the latest published draft as of this writing is draft-fielding-uri-rfc2396bis-07. The [URI] ↓citation should reflect publication of the relevant RFC in future revisions↓ ↑specification is the primary normative reference for URIs in general↑. For http: (and https:, and the HTTP protocol), [[!HTTP11]] is the specification currently in force, but [[!HTTPbis]], which will update it, is nearing final approval, and we assume it will be in force by the time this second edition is completed.

173 |

174 | The references and caveats wrt HTTP(bis) above will need to be 175 | corrected as and when HTTPbis is approved HTTP11.2e, or whatever. 176 |


List of Principles, Constraints, and Good Practice Notes

The following principles, constraints, and good practice notes are discussed in this document and listed here for convenience. There is also a free-standing summary.

Identification

Global Identifiers (principle, 2)
Identify with URIs (practice, 2.1)
URIs Identify a Single Resource (constraint, 2.2)
Avoiding URI aliases (practice, 2.3.1)
Consistent URI usage (practice, 2.3.1)
Reuse URI schemes (practice, 2.4)
URI opacity (practice, 2.5)

Interaction

Reuse representation formats (practice, 3.2)
Data-metadata inconsistency (constraint, 3.3)
Metadata association (practice, 3.3)
Safe retrieval (principle, 3.4)
Available representation (practice, 3.5)
Reference does not imply dereference (principle, 3.5)
Consistent representation (practice, 3.5.1)

Data Formats

Version information (practice, 4.2.1)
Namespace policy (practice, 4.2.2)
Extensibility mechanisms (practice, 4.2.3)
Extensibility conformance (practice, 4.2.3)
Unknown extensions (practice, 4.2.3)
Separation of content, presentation, interaction (practice, 4.3)
Link identification (practice, 4.4)
Web linking (practice, 4.4)
Generic URIs (practice, 4.4)
Hypertext links (practice, 4.4)
Namespace adoption (practice, 4.5.3)
Namespace documents (practice, 4.5.4)
QNames Indistinguishable from URIs (constraint, 4.5.5)
QName Mapping (practice, 4.5.5)
XML and "text/*" (practice, 4.5.7)
XML and character encodings (practice, 4.5.7)

General Architecture Principles

Orthogonality (principle, 5.1)
Error recovery (principle, 5.3)


318 | Introduction 319 |

320 |

327 |

328 | Examples such as the following travel scenario are used 329 | throughout this document to illustrate typical behavior of Web 330 | agents—people or software acting on this information space. A 331 | user agent acts on behalf of a user. Software agents include 332 | servers, proxies, spiders, browsers, and multimedia players. 333 |

334 |

335 |

336 | Story 337 |

338 |

339 |

340 | While planning a trip to Mexico, Nadia reads “Oaxaca weather 341 | information: 'http://weather.example.com/oaxaca'” in a glossy 342 | travel magazine. Nadia has enough experience with the Web to 343 | recognize that "http://weather.example.com/oaxaca" is a URI and 344 | that she is likely to be able to retrieve associated information 345 | with her Web browser. When Nadia enters the URI into her browser: 346 |

347 |

The browser recognizes that what Nadia typed is a URI. 349 |
The browser performs an information retrieval action in 351 | accordance with its configured behavior for resources identified 352 | via the "http" URI scheme. 353 |
The authority responsible for "weather.example.com" provides 355 | information in a response to the retrieval request. 356 |
The browser interprets the response, identified as XHTML by the 358 | server, and performs additional retrieval actions for inline 359 | graphics and other content as necessary. 360 |
The browser displays the retrieved information, which includes 362 | hypertext links to other information. Nadia can follow these 363 | hypertext links to retrieve additional information. 364 |

366 |

367 |

368 |

369 | This scenario illustrates the three architectural bases of the Web that 370 | are discussed in this document: 371 |

372 |

374 |
375 | Identification. URIs are used to identify resources. In this 376 | travel scenario, the resource is a periodically updated report on 377 | the weather in Oaxaca, and the URI is 378 | “http://weather.example.com/oaxaca”. 379 |
380 |
382 |
383 | Interaction. Web agents communicate using standardized 384 | protocols that enable interaction through the exchange of messages 385 | which adhere to a defined syntax and semantics. By entering a URI 386 | into a retrieval dialog or selecting a hypertext link, Nadia tells 387 | her browser to perform a retrieval action for the resource 388 | identified by the URI. In this example, the browser sends an HTTP 389 | GET request (part of the HTTP protocol) to the server at 390 | "weather.example.com", via TCP/IP port 80, and the server sends 391 | back a message containing what it determines to be a representation 392 | of the resource as of the time that representation was generated. 393 | Note that this example is specific to hypertext browsing of 394 | information—other kinds of interaction are possible, both within 395 | browsers and through the use of other types of Web agent; our 396 | example is intended to illustrate one common interaction, not 397 | define the range of possible interactions or limit the ways in 398 | which agents might use the Web. 399 |
400 |
402 |
Formats. Most protocols used for representation retrieval and/or submission make use of a sequence of one or more messages, which taken together contain a payload of representation data and metadata, to transfer the representation between agents. The choice of interaction protocol places limits on the formats of representation data and metadata that can be transmitted. HTTP, for example, typically transmits a single octet stream plus metadata, and uses the "Content-Type" and "Content-Encoding" header fields to further identify the format of the representation. In this scenario, the representation transferred is in XHTML, as identified by the "Content-Type" HTTP header field containing the registered Internet media type name, "application/xhtml+xml". That Internet media type name indicates that the representation data can be processed according to the XHTML specification.
418 |
419 | Nadia's browser is configured and programmed to interpret the 420 | receipt of an "application/xhtml+xml" typed representation as an 421 | instruction to render the content of that representation according 422 | to the XHTML rendering model, including any subsidiary interactions 423 | (such as requests for external style sheets or in-line images) 424 | called for by the representation. In the scenario, the XHTML 425 | representation data received from the initial request instructs 426 | Nadia's browser to also retrieve and render in-line the weather 427 | maps, each identified by a URI and thus causing an additional 428 | retrieval action, resulting in additional representations that are 429 | processed by the browser according to their own data formats (e.g., 430 | "application/svg+xml" indicates the SVG data format), and this 431 | process continues until all of the data formats have been rendered. 432 | The result of all of this processing, once the browser has reached 433 | an application steady-state that completes Nadia's initial 434 | requested action, is commonly referred to as a "Web page". 435 |
436 |

438 |
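The interaction and format steps of the travel scenario can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular browser works: the function names `build_get_request` and `media_type` are invented here, the request carries only the `Host` header, and the `charset` parameter in the sample header value is an assumption that does not appear in the scenario.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for an http URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the 'http' scheme")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header carries the authority component of the URI.
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


def media_type(content_type: str) -> str:
    """Extract the media type name from a Content-Type field value."""
    return content_type.split(";", 1)[0].strip().lower()


request = build_get_request("http://weather.example.com/oaxaca")
# Hypothetical Content-Type value; the charset parameter is an assumption.
kind = media_type("application/xhtml+xml; charset=utf-8")
print(request)
print(kind)
```

The agent uses the media type name, not the URI, to decide how to process the representation data it receives.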

439 | The following illustration shows the relationship between identifier, 440 | resource, and representation. 441 |

442 |

[Figure: the relationship between identifier, resource, and representation.]

In the remainder of this document, we highlight important architectural points regarding Web identifiers, protocols, and formats. We also discuss some important general architectural principles and how they apply to the Web.

453 |

454 |

455 | About this Document 456 |

457 |

458 | This document describes the properties we desire of the Web and the 459 | design choices that have been made to achieve them. It promotes the 460 | reuse of existing standards when suitable, and gives guidance on how 461 | to innovate in a manner consistent with Web architecture. 462 |

463 |

464 | The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used in the 465 | principles, constraints, and good practice notes in accordance with 466 | RFC 2119 [[!RFC2119]]. 467 |

468 |

469 | This document does not include conformance provisions for these 470 | reasons: 471 |

472 |

Conforming software is expected to be so diverse that it would 474 | not be useful to be able to refer to the class of conforming software 475 | agents. 476 |
Some of the good practice notes concern people; specifications 478 | generally define conformance for software, not people. 479 |
We do not believe that the addition of a conformance section is 481 | likely to increase the utility of the document. 482 |

484 |

485 |

486 | Audience of this Document 487 |

488 |

489 | This document is intended to inform discussions about issues of Web 490 | architecture. The intended audience for this document includes: 491 |

492 |

Participants in W3C Activities 494 |
Other groups and individuals designing technologies to be 496 | integrated into the Web 497 |
Implementers of W3C specifications 499 |
Web content authors and publishers 501 |

503 |

504 | Note: This document does not distinguish in any 505 | formal way the terms "language" and "format." Context determines 506 | which term is used. The phrase "specification designer" encompasses 507 | language, format, and protocol designers. 508 |

509 |

510 |

511 |

512 | Scope of this Document 513 |

514 |

515 | This document presents the general architecture of the Web. Other 516 | groups inside and outside W3C also address specialized aspects of 517 | Web architecture, including accessibility, quality assurance, 518 | internationalization, device independence, and Web Services. The 519 | section on Architectural Specifications includes references 520 | to these related specifications. 521 |

522 |

523 | This document strives for a balance between brevity and precision 524 | while including illustrative examples. TAG findings are 526 | informational documents that complement the current document by 527 | providing more detail about selected topics. This document includes 528 | some excerpts from the findings. Since the findings evolve 529 | independently, this document includes references to approved TAG 530 | findings. For other TAG issues covered by this document but without 531 | an approved finding, references are to entries in the TAG issues list. 533 |

534 |

535 | Many of the examples in this document that involve human activity 536 | suppose the familiar Web interaction model (illustrated at the 537 | beginning of the Introduction) where a person follows a link via a 538 | user agent, the user agent retrieves and presents data, the user 539 | follows another link, etc. This document does not discuss in any 540 | detail other interaction models such as voice browsing (see, for 541 | example, [[!VOICEXML20]]). The choice of interaction model may have 542 | an impact on expected agent behavior. For instance, when a 543 | graphical user agent running on a laptop computer or hand-held 544 | device encounters an error, the user agent can report errors 545 | directly to the user through visual and audio cues, and present the 546 | user with options for resolving the errors. On the other hand, when 547 | someone is browsing the Web through voice input and audio-only 548 | output, stopping the dialog to wait for user input may reduce 549 | usability since it is so easy to "lose one's place" when browsing 550 | with only audio-output. This document does not discuss how the 551 | principles, constraints, and good practices identified here apply 552 | in all interaction contexts. 553 |

554 |

555 |

556 |

557 | Principles, Constraints, and Good Practice Notes 558 |

559 |

560 | The important points of this document are categorized as follows: 561 |

562 |

Principle: An architectural principle is a fundamental rule that applies to a large number of situations and variables. Architectural principles include "separation of concerns", "generic interface", "self-descriptive syntax," "visible semantics," "network effect" (Metcalfe's Law), and Amdahl's Law: "The speed of a system is limited by its slowest component."
Constraint: In the design of the Web, some choices, like the names of the p and li elements in HTML, the choice of the colon (:) character in URIs, or grouping bits into eight-bit units (octets), are somewhat arbitrary; if paragraph had been chosen instead of p or asterisk (*) instead of colon, the large-scale result would, most likely, have been the same. This document focuses on more fundamental design choices: design choices that lead to constraints, i.e., restrictions in behavior or interaction within the system. Constraints may be imposed for technical, policy, or other reasons to achieve desirable properties in the system, such as accessibility, global scope, relative ease of evolution, efficiency, and dynamic extensibility.
Good practice: Good practice by software developers, content authors, site managers, users, and specification designers increases the value of the Web.


606 | Identification 607 |

608 |

609 | In order to communicate internally, a community agrees (to a reasonable 610 | extent) on a set of terms and their meanings. One goal of the Web, 611 | since its inception, has been to build a global community in which any 612 | party can share information with any other party. To achieve this goal, 613 | the Web makes use of a single global identification system: the URI. 614 | URIs are a cornerstone of Web architecture, providing identification 615 | that is common across the Web. The global scope of URIs promotes 616 | large-scale "network effects": the value of an identifier increases the 617 | more it is used consistently (for example, the more it is used in 618 | hypertext links). 619 |

620 |

621 |

622 | Principle: Global 623 | Identifiers 624 |

625 |

626 | Global naming leads to global network effects. 627 |

628 |

629 |

630 | This principle dates back at least as far as Douglas Engelbart's 631 | seminal work on open hypertext systems; see section Every Object 633 | Addressable in [[!Eng90]]. 634 |

635 |

636 |

637 | Benefits of URIs 638 |

639 |

640 | The choice of syntax for global identifiers is somewhat arbitrary; it 641 | is their global scope that is important. The Uniform Resource 642 | Identifier, [[!URI]], has been successfully deployed since the 643 | creation of the Web. There are substantial benefits to participating 644 | in the existing network of URIs, including linking, bookmarking, 645 | caching, and indexing by search engines, and there are substantial 646 | costs to creating a new identification system that has the same 647 | properties as URIs. 648 |

649 |

650 |

651 | Good practice: Identify with URIs 653 |

654 |

655 | To benefit from and increase the value of the World Wide Web, 656 | agents should provide URIs as identifiers for resources. 657 |

658 |

659 |

660 | A resource should have an associated URI if another party might 661 | reasonably want to create a hypertext link to it, make or refute 662 | assertions about it, retrieve or cache a representation of it, 663 | include all or part of it by reference into another representation, 664 | annotate it, or perform other operations on it. Software developers 665 | should expect that sharing URIs across applications will be useful, 666 | even if that utility is not initially evident. The TAG finding 667 | "URIs, 669 | Addressability, and the use of HTTP GET and POST" 670 | discusses additional benefits and considerations of URI 671 | addressability. 672 |

673 |

674 | Note: Some URI schemes (such as the "ftp" URI scheme 675 | specification) use the term "designate" where this document uses 676 | "identify." 677 |

678 |

679 |

680 |

681 | URI/Resource Relationships 682 |

683 |

684 | By design a URI identifies one resource. We do not limit the scope of 685 | what might be a resource. The term "resource" is used in a 686 | general sense for whatever might be identified by a URI. It is 687 | conventional on the hypertext Web to describe Web pages, images, 688 | product catalogs, etc. as “resources”. The distinguishing 689 | characteristic of these resources is that all of their essential 690 | characteristics can be conveyed in a message. We identify this set as 691 | “information resources.” 692 |

693 |

This document is an example of an information resource. It consists of words and punctuation symbols and graphics and other artifacts that can be encoded, with varying degrees of fidelity, into a sequence of bits. There is nothing about the essential information content of this document that cannot in principle be transferred in a message. In the case of this document, the message payload is the representation of this document.

702 |

703 | However, our use of the term resource is intentionally more broad. 704 | Other things, such as cars and dogs (and, if you've printed this 705 | document on physical sheets of paper, the artifact that you are 706 | holding in your hand), are resources too. They are not information 707 | resources, however, because their essence is not information. 708 | Although it is possible to describe a great many things about a car 709 | or a dog in a sequence of bits, the sum of those things will 710 | invariably be an approximation of the essential character of the 711 | resource. 712 |

713 |

714 | We define the term “information resource” because we observe that it 715 | is useful in discussions of Web technology and may be useful in 716 | constructing specifications for facilities built for use on the Web. 717 |

718 |

719 |

720 | Constraint: URIs Identify a Single Resource 722 |

723 |

724 | Assign distinct URIs to distinct resources. 725 |

726 |

727 |

728 | Since the scope of a URI is global, the resource identified by a URI 729 | does not depend on the context in which the URI appears (see also the 730 | section about indirect identification). 731 |

732 |

733 | [[!URI]] is an agreement about how the Internet community allocates 734 | names and associates them with the resources they identify. URIs are 735 | divided into schemes that define, via their scheme 736 | specification, the mechanism by which scheme-specific identifiers are 737 | associated with resources. For example, the "http" URI scheme 738 | ([[!HTTP11]]) uses DNS and TCP-based HTTP servers for the purpose of 739 | identifier allocation and resolution. As a result, identifiers such 740 | as "http://example.com/somepath#someFrag" often take on meaning 741 | through the community experience of performing an HTTP GET request on 742 | the identifier and, if given a successful response, interpreting the 743 | response as a representation of the identified resource. (See also 744 | Fragment Identifiers.) Of course, a retrieval action like GET 745 | is not the only way to obtain information about a resource. One might 746 | also publish a document that purports to define the meaning of a 747 | particular URI. These other sources of information may suggest 748 | meanings for such identifiers, but it's a local policy decision 749 | whether those suggestions should be heeded. 750 |
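As a small illustration of how the example identifier above decomposes, the following sketch uses Python's standard `urllib.parse` module. It only separates the URI's components and performs no retrieval. The variable names are chosen for this example.

```python
from urllib.parse import urldefrag, urlsplit

uri = "http://example.com/somepath#someFrag"

# Split off the fragment identifier from the rest of the URI.
without_fragment, fragment = urldefrag(uri)

# Break the URI into the parts the "http" scheme relies on.
parts = urlsplit(uri)
scheme = parts.scheme   # selects the scheme specification
host = parts.hostname   # the name resolved through DNS
path = parts.path       # interpreted by the HTTP server

print(without_fragment, fragment, scheme, host, path)
```

An HTTP GET request would be made for the URI without its fragment; the fragment is interpreted by the agent against the representation it receives.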

751 |

752 | Just as one might wish to refer to a person by different names (by 753 | full name, first name only, sports nickname, romantic nickname, and 754 | so forth), Web architecture allows the association of more than one 755 | URI with a resource. URIs that identify the same resource are called 756 | URI aliases. The section on URI aliases discusses 757 | some of the potential costs of creating multiple URIs for the same 758 | resource. 759 |

760 |

761 | Several sections of this document address questions about the 762 | relationship between URIs and resources, including: 763 |

764 |

How much can I tell about a resource by inspection of a URI that 766 | identifies it? See the sections on URI schemes and URI 767 | opacity. 768 |
Who determines what resource a URI identifies? See the section on 770 | URI allocation. 771 |
Can the resource identified by a URI change over time? See the sections on URI persistence and representation management.
Since more than one URI can identify the same resource, how do I 777 | know which URIs identify the same resource? See the sections on 778 | URI comparison and assertions that two URIs identify the 779 | same resource. 780 |

782 |

783 |

784 | URI collision 785 |

786 |

787 | By design, a URI identifies one resource. Using the same URI to 788 | directly identify different resources produces a URI 789 | collision. Collision often imposes a cost in communication 790 | due to the effort required to resolve ambiguities. 791 |

792 |

793 | Suppose, for example, that one organization makes use of a URI to 794 | refer to the movie The Sting, and another organization 795 | uses the same URI to refer to a discussion forum about The 796 | Sting. To a third party, aware of both organizations, this 797 | collision creates confusion about what the URI identifies, 798 | undermining the value of the URI. If one wanted to talk about the 799 | creation date of the resource identified by the URI, for instance, 800 | it would not be clear whether this meant "when the movie was 801 | created" or "when the discussion forum about the movie was 802 | created." 803 |
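The movie and forum example can be modeled with a toy registry that refuses to let one URI identify two different resources. This is only a sketch of the idea: the class names, the URI `http://movies.example.org/the-sting`, and the string labels standing in for resources are all hypothetical. On the Web no single registry of this kind exists.

```python
class UriCollisionError(ValueError):
    """Raised when one URI would identify two different resources."""


class UriRegistry:
    """Toy mapping from URI to the single resource it identifies."""

    def __init__(self):
        self._resources = {}

    def allocate(self, uri, resource):
        existing = self._resources.get(uri)
        if existing is not None and existing != resource:
            raise UriCollisionError(
                f"{uri} already identifies {existing!r}, not {resource!r}"
            )
        self._resources[uri] = resource

    def resolve(self, uri):
        return self._resources[uri]


registry = UriRegistry()
uri = "http://movies.example.org/the-sting"  # hypothetical URI
registry.allocate(uri, "the movie The Sting")

try:
    registry.allocate(uri, "a discussion forum about The Sting")
    collided = False
except UriCollisionError:
    collided = True
```

With the collision rejected, a statement such as "the creation date of the resource identified by this URI" has exactly one reading.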

804 |

805 | Social and technical solutions have been devised to help avoid URI 806 | collision. However, the success or failure of these different 807 | approaches depends on the extent to which there is consensus in the 808 | Internet community on abiding by the defining specifications. 809 |

810 |

811 | The section on URI allocation examines 812 | approaches for establishing the authoritative source of information 813 | about what resource a URI identifies. 814 |

815 |

816 | URIs are sometimes used for indirect identification. This 817 | does not necessarily lead to collisions. 818 |

819 |

820 |

821 |

822 | URI allocation 823 |

824 |

825 | URI allocation is the process of associating a URI with a resource. 826 | Allocation can be performed both by resource owners and by other 827 | parties. It is important to avoid URI collision. 828 |

829 |

830 |

831 | URI ownership 832 |

833 |

839 |

to pass on ownership of some or all owned URIs to another 841 | owner—delegation; and 842 |
to associate a resource with an owned URI—URI allocation. 844 |

846 |

847 | By social convention, URI ownership is delegated from the IANA 848 | URI scheme registry [[!IANASchemes]], itself a social entity, to 849 | IANA-registered URI scheme specifications. Some URI scheme 850 | specifications further delegate ownership to subordinate 851 | registries or to other nominated owners, who may further delegate 852 | ownership. In the case of a specification, ownership ultimately 853 | lies with the community that maintains the specification. 854 |

855 |

The approach taken for the "http" URI scheme, for example, follows the pattern whereby the Internet community delegates authority, via the IANA URI scheme registry and the DNS, over a set of URIs with a common prefix to one particular owner. One consequence of this approach is the Web's heavy reliance on the central DNS registry. A different approach is taken by the URN Syntax scheme [[!RFC2141]], which delegates ownership of portions of URN space to URN Namespace specifications, which themselves are registered in an IANA-maintained registry of URN Namespace Identifiers.

867 |

868 | URI owners are responsible for avoiding the assignment of 869 | equivalent URIs to multiple resources. Thus, if a URI scheme 870 | specification does provide for the delegation of individual or 871 | organized sets of URIs, it should take pains to ensure that 872 | ownership ultimately resides in the hands of a single social 873 | entity. Allowing multiple owners increases the likelihood of URI 874 | collisions. 875 |

876 |

URI owners may organize or deploy infrastructure to ensure that representations of associated resources are available and, where appropriate, interaction with the resource is possible through the exchange of representations. There are social expectations for responsible representation management by URI owners. Additional social implications of URI ownership are not discussed here.

885 |

886 | See TAG issue siteData-36, 888 | which concerns the expropriation of naming authority. 889 |

890 |

891 |

892 |

893 | Other allocation schemes 894 |

895 |

896 | Some schemes use techniques other than delegated ownership to 897 | avoid collision. For example, the specification for the data URL 898 | (sic) scheme [[!RFC2397]] specifies that the resource identified 899 | by a data scheme URI has only one possible representation. The 900 | representation data makes up the URI that identifies that 901 | resource. Thus, the specification itself determines how data URIs 902 | are allocated; no delegation is possible. 903 |
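To make concrete how the representation data makes up a data URI, here is a minimal sketch that builds such a URI and reads it back. The helper names `make_data_uri` and `read_data_uri` are invented for this example, and the parsing covers only the simple `data:[<mediatype>][;base64],<data>` shape, not every detail of [[!RFC2397]].

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes

DEFAULT_TYPE = "text/plain;charset=US-ASCII"  # default named in RFC 2397


def make_data_uri(data: bytes, media_type: str = "text/plain") -> str:
    """Embed the representation data directly in a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def read_data_uri(uri: str) -> Tuple[str, bytes]:
    """Recover (media type, representation data) from a data URI."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, payload = uri[len("data:"):].partition(",")
    if header.endswith(";base64"):
        media_type = header[: -len(";base64")] or DEFAULT_TYPE
        return media_type, base64.b64decode(payload)
    # Without ";base64" the data is percent-encoded.
    return header or DEFAULT_TYPE, unquote_to_bytes(payload)


uri = make_data_uri(b"Sunny in Oaxaca")
media_type, data = read_data_uri(uri)
```

Because the URI is computed from the data, two parties who encode the same bytes in the same way arrive at the same URI without any delegation step.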

904 |

905 | Other schemes (such as "news:comp.text.xml") rely on a social 906 | process. 907 |

908 |

909 |

910 |

911 |

912 | Indirect Identification 913 |

914 |

915 | To say that the URI "mailto:nadia@example.com" identifies both an 916 | Internet mailbox and Nadia, the person, introduces a URI collision. 917 | However, we can use the URI to indirectly identify Nadia. 918 | Identifiers are commonly used in this way. 919 |

920 |

921 | Listening to a news broadcast, one might hear a report on Britain 922 | that begins, "Today, 10 Downing Street announced a series of new 923 | economic measures." Generally, "10 Downing Street" identifies the 924 | official residence of Britain's Prime Minister. In this context, 925 | the news reporter is using it (as English rhetoric allows) to 926 | indirectly identify the British government. Similarly, URIs 927 | identify resources, but they can also be used in many constructs to 928 | indirectly identify other resources. Globally adopted assignment 929 | policies make some URIs appealing as general-purpose identifiers. 930 | Local policy establishes what they indirectly identify. 931 |

932 |

933 | Suppose that nadia@example.com is Nadia's email 934 | address. The organizers of a conference Nadia attends might use 935 | "mailto:nadia@example.com" to refer indirectly to her (e.g., by 936 | using the URI as a database key in their database of conference 937 | participants). This does not introduce a URI collision. 938 |
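The conference database mentioned above might look like the following sketch. The record fields and the `mailbox` helper are hypothetical. The point is only that the URI still identifies a mailbox, while local policy (here, the organizers' table) uses it to refer indirectly to a person.

```python
from urllib.parse import urlsplit

# Local policy: the organizers key participant records by mailto URI.
participants = {
    "mailto:nadia@example.com": {"name": "Nadia", "session": "opening keynote"},
}


def mailbox(uri: str) -> str:
    """Return the Internet mailbox that a mailto URI identifies."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("expected a mailto URI")
    return parts.path


key = "mailto:nadia@example.com"
person = participants[key]  # indirect identification of Nadia
address = mailbox(key)      # what the URI itself identifies
```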

939 |

940 |

941 |

942 |

943 | URI Comparisons 944 |

945 |

946 | URIs that are identical, character-by-character, refer to the same 947 | resource. Since Web Architecture allows the association of multiple 948 | URIs with a given resource, two URIs that are not 949 | character-by-character identical may still refer to the same 950 | resource. Different URIs do not necessarily refer to different 951 | resources but there is generally a higher computational cost to 952 | determine that different URIs refer to the same resource. 953 |

954 |

955 | To reduce the risk of a false negative (i.e., an incorrect conclusion 956 | that two URIs do not refer to the same resource) or a false positive 957 | (i.e., an incorrect conclusion that two URIs do refer to the same 958 | resource), some specifications describe equivalence tests in addition 959 | to character-by-character comparison. Agents that reach conclusions 960 | based on comparisons that are not licensed by the relevant 961 | specifications take responsibility for any problems that result; see 962 | the section on error handling for more 963 | information about responsible behavior when reaching unlicensed 964 | conclusions. Section 6 of [[!URI]] provides more information about 965 | comparing URIs and reducing the risk of false negatives and 966 | positives. 967 |
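As an illustration of an equivalence test that goes beyond character-by-character comparison, the following Python sketch applies a few of the normalizations discussed in section 6 of [[!URI]]: lowercasing the scheme and host, dropping a default port, and normalizing percent-encoding. The names `normalize`, `equivalent`, and `DEFAULT_PORTS` are hypothetical, only "http" and "https" get default-port handling, and URIs outside the sketch's scope are returned unchanged so that it falls back to plain string comparison.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two schemes get default-port handling in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _normalize_escapes(text):
    """Decode %XX for unreserved characters; uppercase all other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def normalize(uri):
    """Return a normalized form of uri, or uri unchanged if out of scope."""
    parts = urlsplit(uri)
    # Out of scope here: userinfo, IPv6 literals, and empty query/fragment
    # delimiters, which must not be silently dropped.
    if "@" in parts.netloc or "[" in parts.netloc:
        return uri
    if uri.endswith(("?", "#")) or "?#" in uri:
        return uri
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = _normalize_escapes(parts.path)
    if scheme in DEFAULT_PORTS and host and not path:
        path = "/"  # an empty path is equivalent to "/" for these schemes
    query = _normalize_escapes(parts.query)
    return urlunsplit((scheme, netloc, path, query, parts.fragment))


def equivalent(a, b):
    """True if a and b are identical or share a normalized form."""
    return a == b or normalize(a) == normalize(b)
```

A result of `False` from `equivalent` does not show that the two URIs identify different resources; it only means this particular test could not establish that they identify the same one.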

968 |

969 | See also the assertion that two URIs identify the same 970 | resource. 971 |

972 |

973 |

974 | URI aliases 975 |

976 |

977 | Although there are benefits (such as naming flexibility) to URI 978 | aliases, there are also costs. URI aliases are harmful when they 979 | divide the Web of related resources. A corollary of Metcalfe's 980 | Principle (the "network effect") is that the value of a given 981 | resource can be measured by the number and value of other resources 982 | in its network neighborhood, that is, the resources that link to 983 | it. 984 |

985 |

986 | The problem with aliases is that if half of the neighborhood points 987 | to one URI for a given resource, and the other half points to a 988 | second, different URI for that same resource, the neighborhood is 989 | divided. Not only is the aliased resource undervalued because of 990 | this split, the entire neighborhood of resources loses value 991 | because of the missing second-order relationships that should have 992 | existed among the referring resources by virtue of their references 993 | to the aliased resource. 994 |

995 |

996 |

997 | Good practice: Avoiding URI aliases 999 |

1000 |

1001 | A URI owner SHOULD NOT associate arbitrarily different URIs with 1002 | the same resource. 1003 |

1004 |

1005 |

1006 | URI consumers also have a role in ensuring URI consistency. For 1007 | instance, when transcribing a URI, agents should not gratuitously 1008 | percent-encode characters. The term "character" refers to URI 1009 | characters as defined in section 2 of [[!URI]]; percent-encoding is 1010 | discussed in section 2.1 of that specification. 1011 |

1012 |

1013 |

1014 | Good practice: Consistent URI usage 1016 |

1017 |

1018 | An agent that receives a URI SHOULD refer to the associated 1019 | resource using the same URI, character-by-character. 1020 |

1021 |

1022 |

1023 | When a URI alias does become common currency, the URI owner should use protocol techniques such 1025 | as server-side redirects to relate the two resources. The community 1026 | benefits when the URI owner supports redirection of an aliased URI 1027 | to the corresponding "official" URI. For more information on 1028 | redirection, see section 10.3, Redirection, in [[!HTTP11]]. See 1029 | also [[!CHIPS]] for a discussion of some best practices for server 1030 | administrators. 1031 |
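A minimal Python sketch of the redirection idea follows. It assumes a hypothetical alias table (`OFFICIAL_URI`) and a hypothetical `respond` function that returns a status code and headers instead of running a real server; the aliased URIs are invented for illustration. Status code 301 is the "Moved Permanently" response described in section 10.3 of [[!HTTP11]].

```python
# Hypothetical alias table: each key is an aliased URI that has become
# common currency, each value is the corresponding "official" URI.
OFFICIAL_URI = {
    "http://weather.example.com/Oaxaca": "http://weather.example.com/oaxaca",
    "http://weather.example.com/mx/oaxaca": "http://weather.example.com/oaxaca",
}


def respond(requested_uri):
    """Return (status, headers) for a request to requested_uri."""
    official = OFFICIAL_URI.get(requested_uri)
    if official is not None:
        # Point clients at the official URI rather than serving the alias.
        return 301, {"Location": official}
    return 200, {}
```

An agent that follows the redirect can then record and reuse the official URI, which helps keep the neighborhood of referring resources from dividing.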

1032 |

1033 |

1034 |

1035 | Representation reuse 1036 |

1037 |

1038 | URI aliasing only occurs when more than one URI is used to identify 1039 | the same resource. The fact that different resources sometimes have 1040 | the same representation does not make the URIs for those resources 1041 | aliases. 1042 |

1043 |

1044 |

1045 | Story 1046 |

1047 |

1048 |

1049 | Dirk would like to add a link from his Web site to the Oaxaca 1050 | weather site. He uses the URI http://weather.example.com/oaxaca 1051 | and labels his link “report on weather in Oaxaca on 1052 | 1 August 2004”. Nadia points out to Dirk that he is 1053 | setting misleading expectations for the URI he has used. The 1054 | Oaxaca weather site policy is that the URI in question 1055 | identifies a report on the current weather in Oaxaca—on any 1056 | given day—and not the weather on 1 August. Of course, on the 1057 | first of August in 2004, Dirk's link will be correct, but the 1058 | rest of the time he will be misleading readers. Nadia points 1059 | out to Dirk that the managers of the Oaxaca weather site do 1060 | make available a different URI permanently assigned to a 1061 | resource reporting on the weather on 1 August 2004. 1062 |

1063 |

1064 |

1065 |

1066 | In this story, there are two resources: “a report on the current 1067 | weather in Oaxaca” and “a report on the weather in Oaxaca on 1068 | 1 August 2004”. The managers of the Oaxaca weather site 1069 | assign two URIs to these two different resources. On 1070 | 1 August 2004, the representations for these resources 1071 | are identical. The fact that dereferencing two different URIs 1072 | produces identical representations does not imply that the two URIs 1073 | are aliases. 1074 |

1075 |

1076 |

1077 |

1078 |

1079 | URI Schemes 1080 |

1081 |

1082 | In the URI "http://weather.example.com/", the "http" that appears 1083 | before the colon (":") names a URI scheme. Each URI scheme has a 1084 | specification that explains the scheme-specific details of how scheme 1085 | identifiers are allocated and become associated with a resource. The 1086 | URI syntax is thus a federated and extensible naming system wherein 1087 | each scheme's specification may further restrict the syntax and 1088 | semantics of identifiers within that scheme. 1089 |

1090 |

1091 | Examples of URIs from various schemes include: 1092 |

1093 |

mailto:joe@example.org 1095 |
ftp://example.org/aDirectory/aFile 1097 |
news:comp.infosystems.www 1099 |
tel:+1-816-555-1212 1101 |
ldap://ldap.example.org/c=GB?objectClass?one 1103 |
urn:oasis:names:tc:entity:xmlns:xml:catalog 1105 |

1107 |
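As a small illustration, the scheme name of each example URI above can be extracted with Python's standard `urllib.parse.urlsplit`. The helper name `scheme_of` and the `EXAMPLES` list are introduced only for this sketch.

```python
from urllib.parse import urlsplit

# The example URIs listed above.
EXAMPLES = [
    "mailto:joe@example.org",
    "ftp://example.org/aDirectory/aFile",
    "news:comp.infosystems.www",
    "tel:+1-816-555-1212",
    "ldap://ldap.example.org/c=GB?objectClass?one",
    "urn:oasis:names:tc:entity:xmlns:xml:catalog",
]


def scheme_of(uri):
    """Return the scheme name: the part of the URI before the first colon."""
    return urlsplit(uri).scheme


for uri in EXAMPLES:
    print(scheme_of(uri), "->", uri)
```

Everything after the colon is governed by the scheme's own specification, which is why a generic parser can say little more about, for instance, the "tel" or "urn" examples.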

1108 | While Web architecture allows the definition of new schemes, 1109 | introducing a new scheme is costly. Many aspects of URI processing 1110 | are scheme-dependent, and a large amount of deployed software already 1111 | processes URIs of well-known schemes. Introducing a new URI scheme 1112 | requires the development and deployment not only of client software 1113 | to handle the scheme, but also of ancillary agents such as gateways, 1114 | proxies, and caches. See [[!RFC2718]] for other considerations and 1115 | costs related to URI scheme design. 1116 |

1117 |

1118 | Because of these costs, if a URI scheme exists that meets the needs 1119 | of an application, designers should use it rather than invent one. 1120 |

1121 |

1122 |

1123 | Good practice: Reuse URI schemes 1125 |

1126 |

1127 | A specification SHOULD reuse an existing URI scheme (rather than 1128 | create a new one) when it provides the desired properties of 1129 | identifiers and their relation to resources. 1130 |

1131 |

1132 |

1133 | Consider our travel scenario: should the 1134 | agent providing information about the weather in Oaxaca register a 1135 | new URI scheme "weather" for the identification of resources related 1136 | to the weather? They might then publish URIs such as 1137 | "weather://travel.example.com/oaxaca". When a software agent 1138 | dereferences such a URI, if what really happens is that HTTP GET is 1139 | invoked to retrieve a representation of the resource, then an "http" 1140 | URI would have sufficed. 1141 |

1142 |

1143 |

1144 | URI Scheme Registration 1145 |

1146 |

1147 | The Internet Assigned Numbers Authority (IANA) 1148 | maintains a registry [[!IANASchemes]] of mappings between URI 1149 | scheme names and scheme specifications. For instance, the IANA 1150 | registry indicates that the "http" scheme is defined in 1151 | [[!HTTP11]]. The process for registering a new URI scheme is 1152 | defined in [[!RFC2717]]. 1153 |

1154 |

1155 | Unregistered URI schemes SHOULD NOT be used for a number of 1156 | reasons: 1157 |

1158 |

There is no generally accepted way to locate the scheme 1160 | specification. 1161 |
Someone else may be using the scheme for other purposes. 1163 |
One should not expect that general-purpose software will do 1165 | anything useful with URIs of this scheme beyond URI comparison. 1166 |

1168 |

1169 | One misguided motivation for registering a new URI scheme is to 1170 | allow a software agent to launch a particular application when 1171 | retrieving a representation. The same thing can be accomplished at 1172 | lower expense by dispatching instead on the type of the 1173 | representation, thereby allowing use of existing transfer protocols 1174 | and implementations. 1175 |

1176 |

1177 | Even if an agent cannot process representation data in an unknown 1178 | format, it can at least retrieve it. The data may contain enough 1179 | information to allow a user or user agent to make some use of it. 1180 | When an agent does not handle a new URI scheme, it cannot retrieve 1181 | a representation. 1182 |

1183 |

1184 | When designing a new data format, the preferred mechanism to 1185 | promote its deployment on the Web is the Internet media type (see 1186 | Representation Types and Internet 1187 | Media Types). Media types also provide a means for building new 1188 | information applications, as described in future directions for data formats. 1190 |

1191 |

1192 |

1193 |

1194 |

1195 | URI Opacity 1196 |

1197 |

1198 | It is tempting to guess the nature of a resource by inspection of a 1199 | URI that identifies it. However, the Web is designed so that agents 1200 | communicate resource information state through representations, not identifiers. In 1202 | general, one cannot determine the type of a resource representation 1203 | by inspecting a URI for that resource. For example, the ".html" at 1204 | the end of "http://example.com/page.html" provides no guarantee that 1205 | representations of the identified resource will be served with the 1206 | Internet media type "text/html". The publisher is free to allocate 1207 | identifiers and define how they are served. The HTTP protocol does 1208 | not constrain the Internet media type based on the path component of 1209 | the URI; the URI owner is free to configure the server to return a 1210 | representation using PNG or any other data format. 1211 |
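The point can be sketched in Python: an agent reads the Internet media type from the representation metadata and ignores whatever the URI's path appears to suggest. The function name `media_type` and the sample header dictionary are assumptions made for this illustration, not part of any specification.

```python
def media_type(headers):
    """Return the media type from a 'Content-Type' header value, or None.

    Parameters such as charset are dropped; the type is lowercased because
    media type names are case-insensitive.
    """
    value = headers.get("Content-Type", "")
    return value.split(";", 1)[0].strip().lower() or None


# Hypothetical response metadata for "http://example.com/page.html":
# the ".html" suffix notwithstanding, the owner chose to serve PNG.
response_headers = {"Content-Type": "image/png"}
print(media_type(response_headers))
```

Here the agent would hand the representation data to a PNG implementation, because the metadata says so, even though the URI ends in ".html".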

1212 |

1213 | Resource state may evolve over time. Requiring a URI owner to publish 1214 | a new URI for each change in resource state would lead to a 1215 | significant number of broken references. For robustness, Web 1216 | architecture promotes independence between an identifier and the 1217 | state of the identified resource. 1218 |

1219 |

1220 |

1221 | Good practice: URI 1222 | opacity 1223 |

1224 |

1225 | Agents making use of URIs SHOULD NOT attempt to infer properties of 1226 | the referenced resource. 1227 |

1228 |

1229 |

1230 | In practice, a small number of inferences can be made because they 1231 | are explicitly licensed by the relevant specifications. Some of these 1232 | inferences are discussed in the details of retrieving a representation. 1234 |

1235 |

1236 | The example URI used in the travel scenario 1237 | ("http://weather.example.com/oaxaca") suggests to a human reader that 1238 | the identified resource has something to do with the weather in 1239 | Oaxaca. A site reporting the weather in Oaxaca could just as easily 1240 | be identified by the URI "http://vjc.example.com/315". And the URI 1241 | "http://weather.example.com/vancouver" might identify the resource 1242 | "my photo album." 1243 |

1244 |

1245 | On the other hand, the URI "mailto:joe@example.com" indicates that 1246 | the URI refers to a mailbox. The "mailto" URI scheme specification 1247 | authorizes agents to infer that URIs of this form identify Internet 1248 | mailboxes. 1249 |

1250 |

1258 |

1259 |

1260 |

1261 | Fragment Identifiers 1262 |

1263 |

1264 |

1265 | Story 1266 |

1267 |

1268 |

1269 | When browsing the XHTML document that Nadia receives as a 1270 | representation of the resource identified by 1271 | "http://weather.example.com/oaxaca", she finds that the URI 1272 | "http://weather.example.com/oaxaca#weekend" refers to the part of 1273 | the representation that conveys information about the weekend 1274 | outlook. This URI includes the fragment identifier "weekend" (the 1275 | string after the "#"). 1276 |

1277 |

1278 |

1279 |

1280 | The fragment identifier component of a URI allows indirect 1281 | identification of a secondary resource by reference to a 1282 | primary resource and additional identifying information. The 1283 | secondary resource may be some portion or subset of the primary 1284 | resource, some view on representations of the primary resource, or 1285 | some other resource defined or described by those representations. 1286 | The terms "primary resource" and "secondary resource" are defined in 1287 | section 3.5 of [[!URI]]. 1288 |
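For illustration, Python's standard `urllib.parse.urldefrag` separates a URI into the part that identifies the primary resource and the fragment identifier. The helper name `split_fragment` is hypothetical. Note that `urldefrag` does not distinguish an empty fragment from an absent one.

```python
from urllib.parse import urldefrag


def split_fragment(uri):
    """Return (primary_uri, fragment); fragment is '' when there is none."""
    primary_uri, fragment = urldefrag(uri)
    return primary_uri, fragment


print(split_fragment("http://weather.example.com/oaxaca#weekend"))
```

With the URI from the story, the first element is the URI of the primary resource and the second is the fragment identifier "weekend".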

1289 |

1290 | The terms “primary” and “secondary” in this context do not limit the 1291 | nature of the resource—they are not classes. In this context, primary 1292 | and secondary simply indicate that there is a relationship between 1293 | the resources for the purposes of one URI: the URI with a fragment 1294 | identifier. Any resource can be identified as a secondary resource. 1295 | It might also be identified using a URI without a fragment 1296 | identifier, and a resource may be identified as a secondary resource 1297 | via multiple URIs. The purpose of these terms is to enable discussion 1298 | of the relationship between such resources, not to limit the nature 1299 | of a resource. 1300 |

1301 |

1302 | The interpretation of fragment identifiers is discussed in the 1303 | section on media types and fragment 1304 | identifier semantics. 1305 |

1306 |

1307 | See TAG issue abstractComponentRefs-37, 1309 | which concerns the use of fragment identifiers with namespace names 1310 | to identify abstract components. 1311 |

1312 |

1313 |

1314 |

1315 | Future Directions for Identifiers 1316 |

1317 |

1318 | There remain open questions regarding identifiers on the Web. 1319 |

1320 |

1321 |

1322 | Internationalized identifiers 1323 |

1324 |

1325 | The integration of internationalized identifiers (i.e., composed of 1326 | characters beyond those allowed by [[!URI]]) into the Web 1327 | architecture is an important and open issue. See TAG issue IRIEverywhere-27 1329 | for discussion about work going on in this area. 1330 |

1331 |

1332 |

1333 |

1334 | Assertion that two URIs identify the same resource 1335 |

1336 |

1337 | Emerging Semantic Web technologies, including the "Web Ontology 1338 | Language (OWL)" [OWL10], define RDF properties 1339 | such as sameAs to assert that two URIs identify the 1340 | same resource or inverseFunctionalProperty to imply 1341 | it. 1342 |

1343 |

1344 |

1345 |

1346 |

1347 |

1348 | Interaction 1349 |

1350 |

1351 | Communication between agents over a network about resources involves 1352 | URIs, messages, and data. The Web's protocols (including HTTP, FTP, 1353 | SOAP, NNTP, and SMTP) are based on the exchange of messages. A 1354 | message may include data as well as metadata about a 1355 | resource (such as the "Alternates" and "Vary" HTTP headers), the 1356 | message data, and the message itself (such as the "Transfer-encoding" 1357 | HTTP header). A message may even include metadata about the message 1358 | metadata (for message-integrity checks, for instance). 1359 |

1360 |

1361 |

1362 | Story 1363 |

1364 |

1365 |

1366 | Nadia follows a hypertext link labeled "satellite image" expecting 1367 | to retrieve a satellite photo of the Oaxaca region. The link to the 1368 | satellite image is an XHTML link encoded as <a 1369 | href="http://example.com/satimage/oaxaca">satellite 1370 | image</a>. Nadia's browser analyzes the URI and 1371 | determines that its scheme is "http". The 1372 | browser configuration determines how it locates the identified 1373 | information, which might be via a cache of prior retrieval actions, 1374 | by contacting an intermediary (such as a proxy server), or by 1375 | direct access to the server identified by a portion of the URI. In 1376 | this example, the browser opens a network connection to port 80 on 1377 | the server at "example.com" and sends a "GET" message as specified 1378 | by the HTTP protocol, requesting a representation of the resource. 1379 |

1380 |

1381 | The server sends a response message to the browser, once again 1382 | according to the HTTP protocol. The message consists of several 1383 | headers and a JPEG image. The browser reads the headers, learns 1384 | from the "Content-Type" field that the Internet media type of the 1385 | representation is "image/jpeg", reads the sequence of octets that 1386 | make up the representation data, and renders the image. 1387 |

1388 |

1389 |

1390 |

1391 | This section describes the architectural principles and constraints 1392 | regarding interactions between agents, including such topics as network 1393 | protocols and interaction styles, along with interactions between the 1394 | Web as a system and the people that make use of it. The fact that the 1395 | Web is a highly distributed system affects architectural constraints 1396 | and assumptions about interactions. 1397 |

1398 |

1399 |

1400 | Using a URI to Access a Resource 1401 |

1402 |

1403 | Agents may use a URI to access the referenced resource; this is 1404 | called dereferencing the URI. Access may take many forms, 1405 | including retrieving a representation of the resource (for instance, 1406 | by using HTTP GET or HEAD), adding or modifying a representation of 1407 | the resource (for instance, by using HTTP POST or PUT, which in some 1408 | cases may change the actual state of the resource if the submitted 1409 | representations are interpreted as instructions to that end), and 1410 | deleting some or all representations of the resource (for instance, 1411 | by using HTTP DELETE, which in some cases may result in the deletion 1412 | of the resource itself). 1413 |

1414 |

1415 | There may be more than one way to access a resource for a given URI; 1416 | application context determines which access method an agent uses. For 1417 | instance, a browser might use HTTP GET to retrieve a representation 1418 | of a resource, whereas a hypertext link checker might use HTTP HEAD 1419 | on the same URI simply to establish whether a representation is 1420 | available. Some URI schemes set expectations about available access 1421 | methods; others (such as the URN scheme [[!URN]]) do not. Section 1422 | 1.2.2 of [[!URI]] discusses the separation of identification and 1423 | interaction in more detail. For more information about relationships 1424 | between multiple access methods and URI addressability, see the TAG 1425 | finding "URIs, 1427 | Addressability, and the use of HTTP GET and POST". 1428 |
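The browser and link-checker cases can be sketched with Python's standard `urllib.request.Request`, which accepts a `method` argument. The two function names are hypothetical, and the sketch only constructs the requests; it does not send them.

```python
from urllib.request import Request


def browser_request(uri):
    """A browser wants the representation itself, so it uses GET."""
    return Request(uri, method="GET")


def link_checker_request(uri):
    """A link checker only needs to know whether a representation is
    available, so it uses HEAD on the same URI."""
    return Request(uri, method="HEAD")


uri = "http://weather.example.com/oaxaca"
for request in (browser_request(uri), link_checker_request(uri)):
    print(request.get_method(), request.full_url)
```

The URI is identical in both cases; only the application context, and therefore the access method, differs.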

1429 |

1430 | Although many URI schemes are named after 1431 | protocols, this does not imply that use of such a URI will 1432 | necessarily result in access to the resource via the named protocol. 1433 | Even when an agent uses a URI to retrieve a representation, that 1434 | access might be through gateways, proxies, caches, and name 1435 | resolution services that are independent of the protocol associated 1436 | with the scheme name. 1437 |

1438 |

1439 | Many URI schemes define a default interaction protocol for attempting 1440 | access to the identified resource. That interaction protocol is often 1441 | the basis for allocating identifiers within that scheme, just as 1442 | "http" URIs are defined in terms of TCP-based HTTP servers. However, 1443 | this does not imply that all interaction with such resources is 1444 | limited to the default interaction protocol. For example, information 1445 | retrieval systems often make use of proxies to interact with a 1446 | multitude of URI schemes, such as HTTP proxies being used to access 1447 | "ftp" and "wais" resources. Proxies can also provide enhanced 1448 | services, such as annotation proxies that combine normal information 1449 | retrieval with additional metadata retrieval to provide a seamless, 1450 | multidimensional view of resources using the same protocols and user 1451 | agents as the non-annotated Web. Likewise, future protocols may be 1452 | defined that encompass our current systems, using entirely different 1453 | interaction mechanisms, without changing the existing identifier 1454 | schemes. See also the principle of 1455 | orthogonal specifications. 1456 |

1457 |

1458 |

1459 | Details of retrieving a representation 1460 |

1461 |

1462 | Dereferencing a URI generally involves a succession of steps as 1463 | described in multiple specifications and implemented by the agent. 1464 | The following example illustrates the series of specifications that 1465 | governs the process when a user agent is instructed to follow a 1466 | hypertext link that is part of an SVG 1467 | document. In this example, the URI is 1468 | "http://weather.example.com/oaxaca" and the application context 1469 | calls for the user agent to retrieve and render a representation of 1470 | the identified resource. 1471 |

1472 |

Since the URI is part of a hypertext link in an SVG document, 1474 | the first relevant specification is the SVG 1.1 Recommendation 1475 | [SVG11]. Section 17.1 of this 1477 | specification imports the link semantics defined in XLink 1.0 1478 | [XLink10]: "The remote resource (the 1479 | destination for the link) is defined by a URI specified by the 1480 | XLink href attribute on the 'a' element." 1481 | The SVG specification goes on to state that interpretation of an 1482 | a element involves retrieving a representation of a 1483 | resource, identified by the href attribute in the 1484 | XLink namespace: "By activating these links (by clicking with the 1485 | mouse, through keyboard input, voice commands, etc.), users may 1486 | visit these resources." 1487 |
The XLink 1.0 [XLink10] specification, 1489 | which defines the href attribute in section 5.4, 1490 | states that "The value of the href attribute must be a URI 1491 | reference as defined in [IETF RFC 2396], or must result in a URI 1492 | reference after the escaping procedure described below is applied." 1493 |
The URI specification [[!URI]] states that "Each URI begins 1495 | with a scheme name that refers to a specification for assigning 1496 | identifiers within that scheme." The URI scheme name in this 1497 | example is "http". 1498 |
[[!IANASchemes]] states that the "http" scheme is defined by 1500 | the HTTP/1.1 specification (RFC 2616 [[!HTTP11]], section 3.2.2). 1501 |
In this SVG context, the agent constructs an HTTP GET request 1503 | (per section 9.3 of [[!HTTP11]]) to retrieve the representation. 1504 |
Section 6 of [[!HTTP11]] defines how the server constructs a 1506 | corresponding response message, including the 'Content-Type' field. 1507 |
Section 1.4 of [[!HTTP11]] states "HTTP communication usually 1509 | takes place over TCP/IP connections." This example addresses 1510 | neither that step in the process nor other steps such as Domain 1511 | Name System (DNS) resolution. 1512 |
The agent interprets the returned representation according to 1514 | the data format specification that corresponds to the 1515 | representation's Internet Media 1516 | Type (the value of the HTTP 'Content-Type') in the relevant 1517 | IANA registry [MEDIATYPEREG]. 1518 |

1520 |
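The middle of this sequence (identifying the scheme, then constructing the HTTP GET request) can be sketched in Python. The function name `build_get_request` is hypothetical, the sketch handles only the "http" scheme, and, like the list above, it leaves out DNS resolution and the TCP connection itself.

```python
from urllib.parse import urlsplit


def build_get_request(uri):
    """Return (host, port, request_bytes) for an 'http' URI.

    Any fragment identifier is left out of the request: it is interpreted
    by the agent and is not sent to the server.
    """
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the 'http' scheme")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host = parts.hostname
    port = parts.port or 80  # 80 is the default port for "http"
    host_header = host if port == 80 else f"{host}:{port}"
    message = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return host, port, message.encode("ascii")


host, port, request = build_get_request("http://weather.example.com/oaxaca")
print(host, port)
print(request.decode("ascii"))
```

The agent would then open a connection to the returned host and port, send the bytes, and interpret the response according to the media type named in its 'Content-Type' field.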

1521 | Precisely which representation(s) are retrieved depends on a number 1522 | of factors, including: 1523 |

1524 |

Whether the URI owner makes available any representations at 1526 | all; 1527 |
Whether the agent making the request has access privileges for 1529 | those representations (see the section on linking and access control); 1531 |
If the URI owner has provided more than one representation (in 1533 | different formats such as HTML, PNG, or RDF; in different languages 1534 | such as English and Spanish; or transformed dynamically according 1535 | to the hardware or software capabilities of the recipient), the 1536 | resulting representation may depend on negotiation between the user 1537 | agent and server. 1538 |
The time of the request; the world changes over time, so 1540 | representations of resources are also likely to change over time. 1541 |

1543 |

1544 | Assuming that a representation has been successfully retrieved, the 1545 | expressive power of the representation's format will affect how 1546 | precisely the representation provider communicates resource state. 1547 | If the representation communicates the state of the resource 1548 | inaccurately, this inaccuracy or ambiguity may lead to confusion 1549 | among users about what the resource is. If different users reach 1550 | different conclusions about what the resource is, they may 1551 | interpret this as a URI collision. 1552 | Some communities, such as the ones developing the Semantic Web, 1553 | seek to provide a framework for accurately communicating the 1554 | semantics of a resource in a machine-readable way. Machine-readable 1555 | semantics may alleviate some of the ambiguity associated with 1556 | natural language descriptions of resources. 1557 |

1558 |

1559 |

1560 |

1561 |

1562 | Representation Types and Internet Media Types 1563 |

1564 |

1565 | A representation is data that encodes information about 1566 | resource state. Representations do not necessarily describe the 1567 | resource, or portray a likeness of the resource, or represent the 1568 | resource in other senses of the word "represent". 1569 |

1570 |

1571 | Representations of a resource may be sent or received using 1572 | interaction protocols. These protocols in turn determine the form in 1573 | which representations are conveyed on the Web. HTTP, for example, 1574 | provides for transmission of representations as octet streams typed 1575 | using Internet media types [RFC2046]. 1576 |

1577 |

1578 | Just as it is important to reuse existing URI schemes whenever 1579 | possible, there are significant benefits to using media-typed octet 1580 | streams for representations even in the unusual case where a new URI 1581 | scheme and associated protocol are to be defined. For example, if the 1582 | Oaxaca weather were conveyed to Nadia's browser using a protocol 1583 | other than HTTP, then software to render formats such as 1584 | application/xhtml+xml and image/png would still be usable if the new 1585 | protocol supported transmission of those types. This is an example of 1586 | the principle of orthogonal 1587 | specifications. 1588 |

1589 |

1590 |

1591 | Good practice: Reuse representation formats 1593 |

1594 |

1595 | New protocols created for the Web SHOULD transmit representations 1596 | as octet streams typed by Internet media types. 1597 |

1598 |

1599 |

1600 | The Internet media type mechanism does have some limitations. For 1601 | instance, media type strings do not support versioning or other parameters. See TAG issues 1603 | uriMediaType-9 1605 | and mediaTypeManagement-45 1607 | which concern aspects of the media type mechanism. 1608 |

1609 |

1610 |

1611 | Representation types and fragment identifier semantics 1612 |

1613 |

1614 | The Internet Media Type defines the syntax and semantics of the 1615 | fragment identifier (introduced in Fragment 1616 | Identifiers), if any, that may be used in conjunction with a 1617 | representation. 1618 |

1619 |

1620 |

1621 | Story 1622 |

1623 |

1624 |

1625 | In one of his XHTML pages, Dirk creates a hypertext link to an 1626 | image that Nadia has published on the Web. He writes the 1627 | link as <a 1628 | href="http://www.example.com/images/nadia#hat">Nadia's 1629 | hat</a>. Emma views Dirk's XHTML page in her Web 1630 | browser and follows the link. The HTML implementation in her 1631 | browser removes the fragment from the URI and requests the 1632 | image "http://www.example.com/images/nadia". Nadia serves an 1633 | SVG representation of the image (with Internet media type 1634 | "image/svg+xml"). Emma's Web browser starts up an SVG 1635 | implementation to view the image. It passes the original URI, 1636 | including the fragment, 1637 | "http://www.example.com/images/nadia#hat", to this 1638 | implementation, causing a view of the hat to be displayed 1639 | rather than the complete image. 1640 |

1641 |

1642 |

1643 |

1644 | Note that the HTML implementation in Emma's browser did not need to 1645 | understand the syntax or semantics of the SVG fragment 1646 | (nor does the SVG implementation have to understand HTML, WebCGM, 1647 | RDF ... fragment syntax or semantics); it merely had to recognize 1648 | the # delimiter from the URI syntax [URI] and remove the fragment 1649 | when accessing the resource. This orthogonality is an important feature of 1651 | Web architecture; it is what enabled Emma's browser to provide a 1652 | useful service without requiring an upgrade. 1653 |

1654 |

1655 | The semantics of a fragment identifier are defined by the set of 1656 | representations that might result from a retrieval action on the 1657 | primary resource. The fragment's format and resolution are 1658 | therefore dependent on the type of a potentially retrieved 1659 | representation, even though such a retrieval is only performed if 1660 | the URI is dereferenced. If no such representation exists, then the 1661 | semantics of the fragment are considered unknown and, effectively, 1662 | unconstrained. Fragment identifier semantics are orthogonal to URI 1663 | schemes and thus cannot be redefined by URI scheme specifications. 1664 |

1665 |

1666 | Interpretation of the fragment identifier is performed solely by 1667 | the agent that dereferences a URI; the fragment identifier is not 1668 | passed to other systems during the process of retrieval. This means 1669 | that some intermediaries in Web architecture (such as proxies) have 1670 | no interaction with fragment identifiers and that redirection (in 1671 | HTTP [[!HTTP11]], for example) does not account for fragments. 1672 |
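The division of labor in the story can be sketched in Python. Everything named here is hypothetical: `follow_link`, the `FRAGMENT_HANDLERS` table, the two handler functions, and the `fetch` callable, which stands in for the retrieval step and returns a media type and representation data. The strings returned by the handlers merely describe what a real implementation would do.

```python
from urllib.parse import urldefrag


def html_fragment(fragment):
    return f"scroll to the element identified by '{fragment}'"


def svg_fragment(fragment):
    return f"show SVG view '{fragment}'"


# Hypothetical registry: fragment semantics are looked up by the media type
# of the retrieved representation, not by the URI scheme.
FRAGMENT_HANDLERS = {
    "text/html": html_fragment,
    "application/xhtml+xml": html_fragment,
    "image/svg+xml": svg_fragment,
}


def follow_link(uri, fetch):
    """Dereference uri; return a description of how the fragment is applied.

    Returns None when the retrieved format defines no fragment semantics
    known to this agent.
    """
    primary_uri, fragment = urldefrag(uri)
    media_type, _data = fetch(primary_uri)  # the fragment is never sent
    if not fragment:
        return "show the complete representation"
    handler = FRAGMENT_HANDLERS.get(media_type)
    return handler(fragment) if handler else None
```

Because the lookup happens only after retrieval, the same link works unchanged if Nadia later serves a different format, provided that format defines consistent semantics for "hat".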

1673 |

1674 |

1675 |

1676 | Fragment identifiers and content negotiation 1677 |

1678 |

1679 | Content negotiation refers to the practice of making 1680 | available multiple representations via the same URI. Negotiation 1681 | between the requesting agent and the server determines which 1682 | representation is served (usually with the goal of serving the 1683 | "best" representation a receiving agent can process). HTTP is an 1684 | example of a protocol that enables representation providers to use 1685 | content negotiation. 1686 |
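A simplified Python sketch of server-side selection follows. It assumes the agent states its preferences in an HTTP "Accept" header with optional q-values, and it ignores media type parameters other than q. The function names `parse_accept`, `quality`, and `negotiate` are hypothetical, and real servers apply more detailed rules than this.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        preferences.append((media_range.lower(), q))
    return preferences


def quality(media_type, preferences):
    """Return the q-value of the most specific range matching media_type."""
    main_type = media_type.partition("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in preferences:
        if media_range == media_type:
            specificity = 2
        elif media_range == f"{main_type}/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(available, accept_header):
    """Pick the available media type the agent prefers, or None."""
    preferences = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for media_type in available:  # on ties, the server's order wins
        q = quality(media_type, preferences)
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type
```

For example, `negotiate(["text/html", "image/png"], "image/png, text/html;q=0.8")` selects "image/png", because the agent gave it the higher quality value.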

1687 |

1688 | Individual data formats may define their own rules for use of the 1689 | fragment identifier syntax for specifying different types of 1690 | subsets, views, or external references that are identifiable as 1691 | secondary resources by that media type. Therefore, representation 1692 | providers must manage content negotiation carefully when used with 1693 | a URI that contains a fragment identifier. Consider an example 1694 | where the owner of the URI 1695 | "http://weather.example.com/oaxaca/map#zicatela" uses content 1696 | negotiation to serve two representations of the identified 1697 | resource. Three situations can arise: 1698 |

1699 |

The interpretation of "zicatela" is defined consistently by 1701 | both data format specifications. The representation provider 1702 | decides when definitions of fragment identifier semantics are 1703 | sufficiently consistent. 1704 |
The interpretation of "zicatela" is defined inconsistently by 1706 | the data format specifications. 1707 |
The interpretation of "zicatela" is defined in one data format 1709 | specification but not the other. 1710 |

1712 |

1713 | The first situation—consistent semantics—poses no problem. 1714 |

1715 |

1716 | The second case is a server management error: representation 1717 | providers must not use content negotiation to serve representation 1718 | formats that have inconsistent fragment identifier semantics. This 1719 | situation also leads to URI collision. 1720 | 1721 |

1722 |

1723 | The third case is not a server management error. It is a means by 1724 | which the Web can grow. Because the Web is a distributed system in 1725 | which formats and agents are deployed in a non-uniform manner, Web 1726 | architecture does not constrain authors to only use "lowest common 1727 | denominator" formats. Content authors may take advantage of new 1728 | data formats while still ensuring reasonable backward-compatibility 1729 | for agents that do not yet implement them. 1730 |

1731 |

1732 | In case three, behavior by the receiving agent should vary 1733 | depending on whether the negotiated format defines fragment 1734 | identifier semantics. When a received data format does not define 1735 | fragment identifier semantics, the agent should not perform 1736 | silent error recovery unless the 1737 | user has given consent; see [[!CUAP]] for additional suggested 1738 | agent behavior in this case. 1739 |

1740 |

1741 | See related TAG issue RDFinXHTML-35. 1743 |

1744 |

1745 |

1746 |

1747 |

1748 | Inconsistencies between Representation Data and Metadata 1749 |

1750 |

1751 | Successful communication between two parties depends on a reasonably 1752 | shared understanding of the semantics of exchanged messages, both 1753 | data and metadata. At times, there may be inconsistencies between a 1754 | message sender's data and metadata. Examples, observed in practice, 1755 | of inconsistencies between representation data and metadata include: 1756 |

1757 |

The actual character encoding of a representation (e.g., 1759 | "iso-8859-1", specified by the encoding attribute in an 1760 | XML declaration) is inconsistent with the charset parameter in the 1761 | representation metadata (e.g., "utf-8", specified by the 1762 | 'Content-Type' field in an HTTP header). 1763 |
The namespace of the root element 1765 | of XML representation data (e.g., as specified by the "xmlns" 1766 | attribute) is inconsistent with the value of the 'Content-Type' field 1767 | in an HTTP header. 1768 |

1770 |

1771 | On the other hand, there is no inconsistency in serving HTML content 1772 | with the media type "text/plain", for example, as this combination is 1773 | licensed by specifications. 1774 |

1775 |

1776 | Receiving agents should detect protocol inconsistencies and perform 1777 | proper error recovery. 1778 |

1779 |

1780 |

1781 | Constraint: Data-metadata inconsistency 1783 |

1784 |

1785 | Agents MUST NOT ignore message metadata without the consent of the 1786 | user. 1787 |

1788 |

1789 |

1790 | Thus, for example, if the parties responsible for 1791 | "weather.example.com" mistakenly label the satellite photo of Oaxaca 1792 | as "image/gif" instead of "image/jpeg", and if Nadia's browser 1793 | detects a problem, Nadia's browser must not ignore the problem (e.g., 1794 | by simply rendering the JPEG image) without Nadia's consent. Nadia's 1795 | browser can notify Nadia of the problem or notify Nadia and take 1796 | corrective action. 1797 |
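The character encoding inconsistency listed earlier in this section can be detected mechanically. The following Python sketch is a minimal illustration, not a complete implementation: the function names are hypothetical, the regular expressions cover only common cases, and the XML declaration is assumed to be readable as ASCII-compatible bytes. It reports the mismatch so that an agent can notify the user instead of silently ignoring the metadata.

```python
import re


def header_charset(content_type):
    """Extract the charset parameter from a Content-Type value, if any."""
    m = re.search(r'charset\s*=\s*"?([^";\s]+)"?', content_type, re.IGNORECASE)
    return m.group(1).lower() if m else None


def declared_xml_encoding(body):
    """Extract the encoding named in an XML declaration, if any."""
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', body)
    return m.group(1).decode("ascii").lower() if m else None


def check_consistency(content_type, body):
    """Return a description of a charset mismatch, or None if none is found."""
    from_header = header_charset(content_type)
    from_data = declared_xml_encoding(body)
    if from_header and from_data and from_header != from_data:
        return (f"charset mismatch: header says {from_header}, "
                f"XML declaration says {from_data}")
    return None
```

When only one of the two sources names an encoding, this sketch reports nothing, which matches the case where no conflicting metadata is supplied.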

1798 |

1799 | Furthermore, representation providers can help reduce the risk of 1800 | inconsistencies through careful assignment of representation metadata 1801 | (especially that which applies across representations). The section 1802 | on media types for XML presents an 1803 | example of reducing the risk of error by providing no metadata about 1804 | character encoding when serving XML. 1805 |

1806 |

1807 | The accuracy of metadata relies on the server administrators, the 1808 | authors of representations, and the software that they use. 1809 | Practically, the capabilities of the tools and the social 1810 | relationships may be the limiting factors. 1811 |

1812 |

1813 | The accuracy of these and other metadata fields is just as important 1814 | for dynamic Web resources, where a little bit of thought and 1815 | programming can often ensure correct metadata for a huge number of 1816 | resources. 1817 |

1818 |

1819 | Often there is a separation of control between the users who create 1820 | representations of resources and the server managers who maintain the 1821 | Web site software. Given that it is generally the Web site software 1822 | that provides the metadata associated with a resource, it follows 1823 | that coordination between the server managers and content creators is 1824 | required. 1825 |

1826 |

1827 |

1828 | Good practice: Metadata association 1830 |

1831 |

1832 | Server managers SHOULD allow representation creators to control the 1833 | metadata associated with their representations. 1834 |

1835 |

1836 |

1837 | In particular, content creators need to be able to control the 1838 | content type (for extensibility) and the character encoding (for 1839 | proper internationalization). 1840 |

1841 |

1848 |

1849 |

1850 |

1851 | Safe Interactions 1852 |

1853 |

1854 | Nadia's retrieval of weather information (an example of a read-only 1855 | query or lookup) qualifies as a "safe" interaction; a safe 1856 | interaction is one where the agent does not incur any 1857 | obligation beyond the interaction. An agent may incur an obligation 1858 | through other means (such as by signing a contract). If an agent does 1859 | not have an obligation before a safe interaction, it does not have 1860 | that obligation afterwards. 1861 |

1862 |

1863 | Other Web interactions resemble orders more than queries. These 1864 | unsafe interactions may cause a change to the state of a 1865 | resource and the user may be held responsible for the consequences of 1866 | these interactions. Unsafe interactions include subscribing to a 1867 | newsletter, posting to a list, or modifying a database. 1868 | Note: In this context, the word "unsafe" does not 1869 | necessarily mean "dangerous"; the term "safe" is used in section 1870 | 9.1.1 of [[!HTTP11]] and "unsafe" is the natural opposite. 1871 |

1872 |

1873 |

1874 | Story 1875 |

1876 |

1877 |

1878 | Nadia decides to book a vacation to Oaxaca at 1879 | "booking.example.com." She enters data into a series of online 1880 | forms and is ultimately asked for credit card information to 1881 | purchase the airline tickets. She provides this information in 1882 | another form. When she presses the "Purchase" button, her browser 1883 | opens another network connection to the server at 1884 | "booking.example.com" and sends a message composed of form data 1885 | using the POST method. This is an unsafe interaction; Nadia wishes to 1887 | change the state of the system by exchanging money for airline 1888 | tickets. 1889 |

1890 |

1891 | The server reads the POST request, and after performing the 1892 | booking transaction returns a message to Nadia's browser that 1893 | contains a representation of the results of Nadia's request. The 1894 | representation data is in XHTML so that it can be saved or 1895 | printed out for Nadia's records. 1896 |

1897 |

1898 | Note that neither the data transmitted with the POST nor the data 1899 | received in the response necessarily correspond to any resource 1900 | identified by a URI. 1901 |

1902 |

1903 |

1904 |

1905 | Safe interactions are important because these are interactions where 1906 | users can browse with confidence and where agents (including search 1907 | engines and browsers that pre-cache data for the user) can follow 1908 | hypertext links safely. Users (or agents acting on their behalf) do 1909 | not commit themselves to anything by querying a resource or following 1910 | a hypertext link. 1911 |

1912 |

1913 |

1914 | Principle: Safe 1915 | retrieval 1916 |

1917 |

1918 | Agents do not incur obligations by retrieving a representation. 1919 |

1920 |

1921 |

1922 | For instance, it is incorrect to publish a URI that, when followed as 1923 | part of a hypertext link, subscribes a user to a mailing list. 1924 | Remember that search engines may follow such hypertext links. 1925 |

1926 |

1927 | The fact that HTTP GET, the access method most often used when 1928 | following a hypertext link, is safe does not imply that all safe 1929 | interactions must be done through HTTP GET. At times, there may be 1930 | good reasons (such as confidentiality requirements or practical 1931 | limits on URI length) to conduct an otherwise safe operation using a 1932 | mechanism generally reserved for unsafe operations (e.g., HTTP POST). 1933 |
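An agent that follows links automatically, such as a search engine or a pre-caching browser, can only rely on the safety that the protocol promises. The Python sketch below is an illustrative assumption about how such an agent might filter candidate requests: the names `SAFE_METHODS` and `may_follow_automatically` and the "/purchase" path are hypothetical, and the set contains only GET and HEAD, the methods that section 9.1.1 of [[!HTTP11]] describes as safe.

```python
# Methods an automated agent may issue without user involvement (assumption:
# only the methods that HTTP/1.1 section 9.1.1 describes as safe).
SAFE_METHODS = frozenset({"GET", "HEAD"})


def may_follow_automatically(method):
    """Return True if an agent may issue this request without asking the user."""
    return method.upper() in SAFE_METHODS


candidates = [
    ("GET", "http://weather.example.com/oaxaca"),
    ("POST", "http://booking.example.com/purchase"),  # hypothetical path
]
prefetchable = [uri for method, uri in candidates
                if may_follow_automatically(method)]
```

A safe operation carried over POST is simply treated conservatively by this filter: it is never prefetched, which costs efficiency but not correctness.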

1934 |

1935 | For more information about safe and unsafe operations using HTTP GET 1936 | and POST, and handling security concerns around the use of HTTP GET, 1937 | see the TAG finding "URIs, 1939 | Addressability, and the use of HTTP GET and POST". 1940 |

1941 |

1942 |

1943 | Unsafe interactions and accountability 1944 |

1945 |

1946 |

1947 | Story 1948 |

1949 |

1950 |

1951 | Nadia pays for her airline tickets online (through a POST 1952 | interaction as described above). She receives a Web page with 1953 | confirmation information and wishes to bookmark it so that she 1954 | can refer to it when she calculates her expenses. Although 1955 | Nadia can print out the results, or save them to a file, she 1956 | would also like to bookmark them. 1957 |

1958 |

1959 |

1960 |

1961 | Transaction requests and results are valuable resources, and like 1962 | all valuable resources, it is useful to be able to refer to them 1963 | with a persistent URI. However, in 1964 | practice, Nadia cannot bookmark her commitment to pay (expressed 1965 | via the POST request) or the airline company's acknowledgment and 1966 | commitment to provide her with a flight (expressed via the response 1967 | to the POST). 1968 |

1969 |

1970 | There are ways to provide persistent URIs for transaction requests 1971 | and their results. For transaction requests, user agents can 1972 | provide an interface for managing transactions where the user agent 1973 | has incurred an obligation on behalf of the user. For transaction 1974 | results, HTTP allows representation providers to associate a URI 1975 | with the results of an HTTP POST request using the 1976 | "Content-Location" header (described in section 14.14 of 1977 | [[!HTTP11]]). 1978 |
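The use of "Content-Location" for transaction results can be sketched as follows. This is a minimal in-memory model in Python, not a description of any real booking system: the `BookingSite` class, the "/confirmations/" path layout, and the form field names are all hypothetical. It shows a POST response that names a URI for its result, and a later GET of that URI returning the same representation.

```python
import itertools


class BookingSite:
    """Toy server model: stores each POST result under its own URI path."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._confirmations = {}  # path -> stored confirmation text

    def post_purchase(self, form):
        """Handle a purchase and name the result with Content-Location."""
        path = f"/confirmations/{next(self._ids)}"
        body = f"Confirmed: {form['passenger']} to {form['destination']}"
        self._confirmations[path] = body
        headers = {"Content-Type": "text/plain; charset=utf-8",
                   "Content-Location": path}
        return 200, headers, body

    def get(self, path):
        """Retrieve a previously stored confirmation (a safe interaction)."""
        if path in self._confirmations:
            headers = {"Content-Type": "text/plain; charset=utf-8"}
            return 200, headers, self._confirmations[path]
        return 404, {}, ""
```

With such a response, a user agent could offer to bookmark the URI given in the header, so that the confirmation remains retrievable after the POST interaction has ended.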

1979 |

1980 |

1981 |

1982 |

1983 | Representation Management 1984 |

1985 |

1986 |

1987 | Story 1988 |

1989 |

1990 |

1991 | Since Nadia finds the Oaxaca weather site useful, she emails a 1992 | review to her friend Dirk recommending that he check out 1993 | 'http://weather.example.com/oaxaca'. Dirk clicks on the resulting 1994 | hypertext link in the email he receives and is frustrated by a 1995 | 404 (not found). Dirk tries again the next day and receives a 1996 | representation with "news" that is two weeks old. He tries one 1997 | more time the next day only to receive a representation that 1998 | claims that the weather in Oaxaca is sunny, even though his 1999 | friends in Oaxaca tell him by phone that in fact it is raining. 2000 | Dirk and Nadia conclude that the URI owners are unreliable or 2001 | unpredictable. Although the URI owner has chosen the Web as a 2002 | communication medium, the owner has lost two customers due to 2003 | ineffective representation management. 2004 |

2005 |

2006 |

2007 |

2008 | A URI owner may supply zero or more authoritative representations of 2009 | the resource identified by that URI. There is a benefit to the 2010 | community in providing representations. 2011 |

2012 |

2013 |

2014 | Good practice: Available representation 2016 |

2017 |

2018 | A URI owner SHOULD provide representations of the resource it 2019 | identifies. 2020 |

2021 |

2022 |

2023 | For example, owners of XML namespace URIs should use them to identify 2024 | a namespace document. 2025 |

2026 |

2027 | Just because representations are available does not mean that it is 2028 | always desirable to retrieve them. In fact, in some cases the 2029 | opposite is true. 2030 |

2031 |

2032 |

2033 | Principle: Reference does not imply 2035 | dereference 2036 |

2037 |

2038 | An application developer or specification author SHOULD NOT require 2039 | networked retrieval of representations each time they are 2040 | referenced. 2041 |

2042 |

2043 |

2044 | Dereferencing a URI has a (potentially significant) cost in computing 2045 | and bandwidth resources, may have security implications, and may 2046 | impose significant latency on the dereferencing application. 2047 | Dereferencing URIs should be avoided except when necessary. 2048 |

2049 |

2056 |

2057 |

2058 | URI persistence 2059 |

2060 |

2061 | As is the case with many human interactions, confidence in 2062 | interactions via the Web depends on stability and predictability. 2063 | For an information resource, persistence depends on the consistency 2064 | of representations. The representation provider decides when 2065 | representations are sufficiently consistent (although that 2066 | determination generally takes user expectations into account). 2067 |

2068 |

2069 | Although persistence in this case is observable as a result of 2070 | representation retrieval, the term URI persistence is 2071 | used to describe the desirable property that, once associated with 2072 | a resource, a URI should continue indefinitely to refer to that 2073 | resource. 2074 |

2075 |

2076 |

2077 | Good practice: Consistent representation 2079 |

2080 |

2081 | A URI owner SHOULD provide representations of the identified 2082 | resource consistently and predictably. 2083 |

2084 |

2085 |

2086 | URI persistence is a matter of policy and commitment on the part of 2087 | the URI owner. The choice of a 2088 | particular URI scheme provides no guarantee that those URIs will be 2089 | persistent or that they will not be persistent. 2090 |

2091 |

2092 | HTTP [[!HTTP11]] has been designed to help manage URI persistence. 2093 | For example, HTTP redirection (using the 3xx response codes) 2094 | permits servers to tell an agent that further action needs to be 2095 | taken by the agent in order to fulfill the request (for example, a 2096 | new URI is associated with the resource). 2097 |
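As a rough illustration of redirection in support of persistence, the Python sketch below models a server that keeps an old path working by answering with a 301 response and a "Location" header, and an agent that follows it. The tables `MOVED` and `CURRENT`, the paths in them, and the redirect limit are assumptions made for the example.

```python
# Hypothetical site state: one resource was renamed, the old path is kept alive.
MOVED = {"/oaxaca-weather": "/oaxaca"}
CURRENT = {"/oaxaca": "Weather report for Oaxaca"}


def respond(path):
    """Server side: return (status, headers, body) for a requested path."""
    if path in CURRENT:
        return 200, {}, CURRENT[path]
    if path in MOVED:
        return 301, {"Location": MOVED[path]}, ""
    return 404, {}, ""


def fetch(path, max_redirects=5):
    """Agent side: follow 301 responses up to a fixed limit."""
    for _ in range(max_redirects + 1):
        status, headers, body = respond(path)
        if status == 301:
            path = headers["Location"]
            continue
        return status, path, body
    raise RuntimeError("too many redirects")
```

Under this model, a link to the old path published years earlier still leads to a representation, which is the behavior the reader of an old email or bookmark expects.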

2098 |

2099 | In addition, content negotiation 2100 | promotes consistency, as a site manager is not required to define 2101 | new URIs when adding support for a new format specification. 2102 | Protocols that do not support content negotiation (such as FTP) 2103 | require a new identifier when a new data format is introduced. 2104 | Improper use of content negotiation can lead to inconsistent 2105 | representations. 2106 |

2107 |

2108 | For more discussion about URI persistence, see [Cool]. 2110 |

2111 |

2112 |

2113 |

2114 | Linking and access control 2115 |

2116 |

2117 | It is reasonable to limit access to a resource (for commercial or 2118 | security reasons, for example), but merely identifying the resource 2119 | is like referring to a book by title. In exceptional circumstances, 2120 | people may have agreed to keep titles or URIs confidential (for 2121 | example, a book author and a publisher may agree to keep the URI of a 2122 | page containing additional material secret until after the book is 2123 | published); otherwise, they are free to exchange them. 2124 |

2125 |

2126 | As an analogy: The owners of a building might have a policy that 2127 | the public may only enter the building via the main front door, and 2128 | only during business hours. People who work in the building and who 2129 | make deliveries to it might use other doors as appropriate. Such a 2130 | policy would be enforced by a combination of security personnel and 2131 | mechanical devices such as locks and pass-cards. One would not 2132 | enforce this policy by hiding some of the building entrances, nor 2133 | by requesting legislation requiring the use of the front door and 2134 | forbidding anyone to reveal the fact that there are other doors to 2135 | the building. 2136 |

2137 |

2138 |

2139 | Story 2140 |

2141 |

2142 |

2143 | Nadia sends to Dirk the URI of the current article she is 2144 | reading. With his browser, Dirk follows the hypertext link and 2145 | is asked to enter his subscriber username and password. Since 2146 | Dirk is also a subscriber to services provided by 2147 | "weather.example.com," he can access the same information as 2148 | Nadia. Thus, the authority for "weather.example.com" can limit 2149 | access to authorized parties and still provide the benefits of 2150 | URIs. 2151 |

2152 |

2153 |

2154 |

2155 | The Web provides several mechanisms to control access to resources; 2156 | these mechanisms do not rely on hiding or suppressing URIs for 2157 | those resources. For more information, see the TAG finding 2158 | "'Deep Linking' in 2160 | the World Wide Web". 2161 |

2162 |

2163 |

2164 |

2165 | Supporting Navigation 2166 |

2167 |

2168 | It is a strength of Web Architecture that links can be made and 2169 | shared; a user who has found an interesting part of the Web can 2170 | share this experience just by republishing a URI. 2171 |

2172 |

2173 |

2174 | Story 2175 |

2176 |

2177 |

2178 | Nadia and Dirk want to visit the Museum of Weather Forecasting 2179 | in Oaxaca. Nadia goes to "http://maps.example.com", locates the 2180 | museum, and mails the URI 2181 | "http://maps.example.com/oaxaca?lat=17.065;lon=-96.716;scale=6" 2182 | to Dirk. Dirk goes to "http://mymaps.example.com", locates the 2183 | museum, and mails the URI 2184 | "http://mymaps.example.com/geo?sessionID=765345;userID=Dirk" to 2185 | Nadia. Dirk reads Nadia's email and is able to follow the link 2186 | to the map. Nadia reads Dirk's email, follows the link, and 2187 | receives an error message 'No such session/user'. Nadia has to 2188 | start again from "http://mymaps.example.com" and find the 2189 | museum location once more. 2190 |

2191 |

2192 |

2193 |

2194 | For resources that are generated on demand, machine generation of 2195 | URIs is common. For resources that might usefully be bookmarked for 2196 | later perusal, or shared with others, server managers should avoid 2197 | needlessly restricting the reusability of such URIs. If the 2198 | intention is to restrict information to a particular user, as might 2199 | be the case in a home banking application for example, designers 2200 | should use appropriate access control 2201 | mechanisms. 2202 |
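The difference between the two URIs in the story is that one carries the view itself and the other only points at private server state. The Python sketch below illustrates the first approach, assuming hypothetical helper names `map_uri` and `view_from_uri`. It uses "&" as the query separator because that is what the standard library produces, whereas the story's URI uses ";".

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def map_uri(base, lat, lon, scale):
    """Build a URI that encodes the map view itself, not a session."""
    query = urlencode({"lat": lat, "lon": lon, "scale": scale})
    return f"{base}?{query}"


def view_from_uri(uri):
    """Recover the map view from the URI alone, with no per-user state."""
    params = parse_qs(urlsplit(uri).query)
    return (float(params["lat"][0]),
            float(params["lon"][0]),
            int(params["scale"][0]))


shared = map_uri("http://maps.example.com/oaxaca", 17.065, -96.716, 6)
```

Because everything needed to reproduce the view is in the URI, any recipient can dereference it; a URI that names only a session identifier cannot offer that.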

2203 |

2204 | Interactions conducted with HTTP POST (where HTTP GET could have 2205 | been used) also limit navigation possibilities. The user cannot 2206 | create a bookmark or share the URI because HTTP POST transactions 2207 | do not typically result in a different URI as the user interacts 2208 | with the site. 2209 |

2210 |

2211 |

2212 |

2213 |

2214 | Future Directions for Interaction 2215 |

2216 |

2217 | There remain open questions regarding Web interactions. The TAG 2218 | expects future versions of this document to address in more detail 2219 | the relationship between the architecture described herein, Web Services, peer-to-peer systems, 2221 | instant messaging systems (such as [[!RFC3920]]), streaming audio 2222 | (such as RTSP [[!RFC2326]]), and voice-over-IP (such as SIP 2223 | [[!RFC3261]]). 2224 |

2225 |

2226 |

2227 |

2228 |

2229 | Data Formats 2230 |

2231 |

2232 | A data format specification (for example, for XHTML, RDF/XML, SMIL, 2233 | XLink, CSS, and PNG) embodies an agreement on the correct 2234 | interpretation of representation 2235 | data. The first data format used on the Web was HTML. Since then, data 2236 | formats have grown in number. Web architecture does not constrain which 2237 | data formats content providers can use. This flexibility is important 2238 | because there is constant evolution in applications, resulting in new 2239 | data formats and refinements of existing formats. Although Web 2240 | architecture allows for the deployment of new data formats, the 2241 | creation and deployment of new formats (and agents able to handle them) 2242 | is expensive. Thus, before inventing a new data format (or "meta" 2243 | format such as XML), designers should carefully consider re-using one 2244 | that is already available. 2245 |

2246 |

2247 | For a data format to be usefully interoperable between two parties, the 2248 | parties must agree (to a reasonable extent) about its syntax and 2249 | semantics. Shared understanding of a data format promotes 2250 | interoperability but does not imply constraints on usage; for instance, 2251 | a sender of data cannot count on being able to constrain the behavior 2252 | of a data receiver. 2253 |

2254 |

2255 | Below we describe some characteristics of a data format that facilitate 2256 | integration into Web architecture. This document does not address 2257 | generally beneficial characteristics of a specification such as 2258 | readability, simplicity, attention to programmer goals, attention to 2259 | user needs, accessibility, or internationalization. The section on 2260 | architectural specifications includes 2261 | references to additional format specification guidelines. 2262 |

2263 |

2264 |

2265 | Binary and Textual Data Formats 2266 |

2267 |

2268 | Binary data formats are those in which portions of the data are 2269 | encoded for direct use by computer processors, for example 32 bit 2270 | little-endian two's-complement and 64 bit IEEE double-precision 2271 | floating-point. The portions of data so represented include numeric 2272 | values, pointers, and compressed data of all sorts. 2273 |

2274 |

2275 | A textual data format is one in which the data is specified in a 2276 | defined encoding as a sequence of characters. HTML, Internet e-mail, 2277 | and all XML-based formats are textual. 2278 | Increasingly, internationalized textual data formats refer to the 2279 | Unicode repertoire [[!UNICODE]] for character definitions. 2280 |
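To make the distinction concrete, the Python sketch below encodes the same values in a binary layout and in a textual one. The sample values are arbitrary, and little-endian byte order for the double is an assumption of this example.

```python
import struct

# Binary: fixed-width layouts intended for direct use by a processor.
int_bytes = struct.pack("<i", 1000)    # 32-bit little-endian two's-complement
double_bytes = struct.pack("<d", 1.0)  # 64-bit IEEE double, little-endian

# Textual: the same values as characters in a defined encoding.
int_text = "1000".encode("utf-8")
double_text = "1.0".encode("utf-8")

# Reading the binary form back requires knowing the exact layout in advance.
(decoded_int,) = struct.unpack("<i", int_bytes)
(decoded_double,) = struct.unpack("<d", double_bytes)
```

The binary integer always occupies four bytes, while the textual form grows with the number of digits but can be read and edited with an ordinary text editor.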

2281 |

2282 | If a data format is textual, as defined in this section, that does 2283 | not imply that it should be served with a media type beginning with 2284 | "text/". Although XML-based formats are textual, many XML-based 2285 | formats do not consist primarily of phrases in natural language. See 2286 | the section on media types for XML for 2287 | issues that arise when "text/" is used in conjunction with an 2288 | XML-based format. 2289 |

2290 |

2291 | In principle, all data can be represented using textual formats. In 2292 | practice, some types of content (e.g., audio and video) are generally 2293 | represented using binary formats. 2294 |

2295 |

2296 | The trade-offs between binary and textual data formats are complex 2297 | and application-dependent. Binary formats can be substantially more 2298 | compact, particularly for complex pointer-rich data structures. Also, 2299 | they can be consumed more rapidly by agents in those cases where they 2300 | can be loaded into memory and used with little or no conversion. 2301 | Note, however, that such cases are relatively uncommon as such direct 2302 | use may open the door to security issues that can only practically be 2303 | addressed by examining every aspect of the data structure in detail. 2304 |

2305 |

2306 | Textual formats are usually more portable and interoperable. Textual 2307 | formats also have the considerable advantage that they can be 2308 | directly read by human beings (and understood, given sufficient 2309 | documentation). This can simplify the tasks of creating and 2310 | maintaining software, and allow the direct intervention of humans in 2311 | the processing chain without recourse to tools more complex than the 2312 | ubiquitous text editor. Finally, it simplifies the necessary human 2313 | task of learning about new data formats; this is called the "view 2314 | source" effect. 2315 |

2316 |

2317 | It is important to emphasize that intuition as to such matters as 2318 | data size and processing speed is not a reliable guide in data format 2319 | design; quantitative studies are essential to a correct understanding 2320 | of the trade-offs. Therefore, designers of a data format 2321 | specification should make a considered choice between binary and 2322 | textual format design. 2323 |

2324 |

2325 | See TAG issue binaryXML-30. 2327 |

2328 |

2329 |

2330 |

2331 | Versioning and Extensibility 2332 |

2333 |

2334 | In a perfect world, language designers would invent languages that 2335 | perfectly met the requirements presented to them, the requirements 2336 | would be a perfect model of the world, they would never change over 2337 | time, and all implementations would be perfectly interoperable 2338 | because the specifications would have no variability. 2339 |

2340 |

2341 | In the real world, language designers imperfectly address the 2342 | requirements as they interpret them, the requirements inaccurately 2343 | model the world, conflicting requirements are presented, and they 2344 | change over time. As a result, designers negotiate with users, make 2345 | compromises, and often introduce extensibility mechanisms so that it 2346 | is possible to work around problems in the short term. In the long 2347 | term, they produce multiple versions of their languages, as the 2348 | problem, and their understanding of it, evolve. The resulting 2349 | variability in specifications, languages, and implementations 2350 | introduces interoperability costs. 2351 |

2352 |

2353 | Extensibility and versioning are strategies to help manage the 2354 | natural evolution of information on the Web and technologies used to 2355 | represent that information. For more information about how these 2356 | strategies introduce variability and how that variability impacts 2357 | interoperability, see Variability in 2358 | Specifications. 2359 |

2360 |

2361 | See TAG issue XMLVersioning-41, 2363 | which concerns good practices for designing extensible XML languages 2364 | and for handling versioning. See also "Web Architecture: Extensible 2365 | Languages" [[!EXTLANG]]. 2366 |

2367 |

2368 |

2369 | Versioning 2370 |

2371 |

2372 | There is typically a (long) transition period during which multiple 2373 | versions of a format, protocol, or agent are simultaneously in use. 2374 |

2375 |

2376 |

2377 | Good practice: Version information 2379 |

2380 |

2381 | A data format specification SHOULD provide for version 2382 | information. 2383 |

2384 |

2385 |

2386 |

2387 |

2388 | Versioning and XML namespace policy 2389 |

2390 |

2391 |

2392 | Story 2393 |

2394 |

2395 |

2396 | Nadia and Dirk are designing an XML data format to encode data 2397 | about the film industry. They provide for extensibility by 2398 | using XML namespaces and creating a schema that allows the 2399 | inclusion, in certain places, of elements from any namespace. 2400 | When they revise their format, Nadia proposes a new optional 2401 | lang attribute on the film element. 2402 | Dirk feels that such a change requires them to assign a new 2403 | namespace name, which might require changes to deployed 2404 | software. Nadia explains to Dirk that their choice of 2405 | extensibility strategy in conjunction with their namespace 2406 | policy allows certain changes that do not affect conformance of 2407 | existing content and software, and thus no change to the 2408 | namespace identifier is required. They chose this policy to 2409 | help them meet their goals of reducing the cost of change. 2410 |

2411 |

2412 |

2413 |

2414 | Dirk and Nadia have chosen a particular namespace change policy 2415 | that allows them to avoid changing the namespace name whenever they 2416 | make changes that do not affect conformance of deployed content and 2417 | software. They might have chosen a different policy, for example 2418 | that any new element or attribute has to belong to a namespace 2419 | other than the original one. Whatever the chosen policy, it should 2420 | set clear expectations for users of the format. 2421 |

2422 |

2423 | In general, changing the namespace name of an element completely 2424 | changes the element name. If "a" and "b" are bound to two different 2425 | URIs, a:element and b:element are as 2426 | distinct as a:eieio and a:xyzzy. 2427 | Practically speaking, this means that deployed applications will 2428 | have to be upgraded in order to recognize the new language; the 2429 | cost of this upgrade may be very high. 2430 |
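The effect of a namespace change on element names can be observed with any namespace-aware parser. The short Python sketch below uses the standard library's ElementTree; the two namespace URIs are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two prefixes bound to two different (hypothetical) namespace URIs.
doc = (
    '<root xmlns:a="http://example.org/films/v1" '
    'xmlns:b="http://example.org/films/v2">'
    "<a:element/><b:element/>"
    "</root>"
)

root = ET.fromstring(doc)
# ElementTree reports each name as "{namespace-uri}local-name".
tags = [child.tag for child in root]
same_name = tags[0] == tags[1]
```

Here `same_name` is false even though both elements share the local name "element", so software that matches on the old expanded name will not recognize the new one.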

2431 |

2432 | It follows that there are significant tradeoffs to be considered 2433 | when deciding on a namespace change policy. If a vocabulary has no 2434 | extensibility points (that is, if it does not allow elements or 2435 | attributes from foreign namespaces or have a mechanism for dealing 2436 | with unrecognized names from the same namespace), it may be 2437 | absolutely necessary to change the namespace name. Languages that 2438 | allow some form of extensibility without requiring a change to the 2439 | namespace name are more likely to evolve gracefully. 2440 |

2441 |

2442 |

2443 | Good practice: Namespace policy 2445 |

2446 |

2447 | An XML format specification SHOULD include information about 2448 | change policies for XML namespaces. 2449 |

2450 |

2451 |

2452 | As an example of a change policy designed to reflect the variable 2453 | stability of a namespace, consider the W3C namespace policy for 2455 | documents on the W3C Recommendation track. The policy sets 2456 | expectations that the Working Group responsible for the namespace 2457 | may modify it in any way until a certain point in the process 2458 | ("Candidate Recommendation") at which point W3C constrains the set 2459 | of possible changes to the namespace in order to promote stable 2460 | implementations. 2461 |

2462 |

2463 | Note that since namespace names are URIs, the owner of a namespace 2464 | URI has the authority to decide the namespace change policy. 2465 |

2466 |

2467 |

2468 |

2469 | Extensibility 2470 |

2471 |

2472 | Requirements change over time. Successful technologies are adopted 2473 | and adapted by new users. Designers can facilitate the transition 2474 | process by making careful choices about extensibility during the 2475 | design of a language or protocol specification. 2476 |

2477 |

2478 | In making these choices, the designers must weigh the trade-offs 2479 | between extensibility, simplicity, and variability. A language 2480 | without extensibility mechanisms may be simpler and less variable, 2481 | improving initial interoperability. However, it's likely that 2482 | changes to that language will be more difficult, possibly more 2483 | complex and more variable, than if the initial design had provided 2484 | such mechanisms. This may decrease interoperability over the long 2485 | term. 2486 |

2487 |

2488 |

2489 | Good practice: Extensibility mechanisms 2491 |

2492 |

2493 | A specification SHOULD provide mechanisms that allow any party to 2494 | create extensions. 2495 |

2496 |

2497 |

2498 | Extensibility introduces variability which has an impact on 2499 | interoperability. However, languages that have no extensibility 2500 | mechanisms may be extended in ad hoc ways that impact 2501 | interoperability as well. One key criterion of the mechanisms 2502 | provided by language designers is that they allow the extended 2503 | languages to remain in conformance with the original specification, 2504 | increasing the likelihood of interoperability. 2505 |

2506 |

2507 |

2508 | Good practice: Extensibility conformance 2510 |

2511 |

2512 | Extensibility MUST NOT interfere with conformance to the original 2513 | specification. 2514 |

2515 |

2516 |

2517 | Application needs determine the most appropriate extension strategy 2518 | for a specification. For example, applications designed to operate 2519 | in closed environments may allow specification designers to define 2520 | a versioning strategy that would be impractical at the scale of the 2521 | Web. 2522 |

2523 |

2524 |

2525 | Good practice: Unknown extensions 2527 |

2528 |

2529 | A specification SHOULD specify agent behavior in the face of 2530 | unrecognized extensions. 2531 |

2532 |

2533 |

2534 | Two strategies have emerged as being particularly useful: 2535 |

2536 |

"Must ignore": The agent ignores any content it does not 2538 | recognize. 2539 |
"Must understand": The agent treats unrecognized markup as an 2541 | error condition. 2542 |

2544 |

2545 | A powerful design approach is for the language to allow either form 2546 | of extension, but to distinguish explicitly between them in the 2547 | syntax. 2548 |

2549 |

2550 | Additional strategies include prompting the user for more input and 2551 | automatically retrieving data from available hypertext links. More 2552 | complex strategies are also possible, including mixing strategies. 2553 | For instance, a language can include mechanisms for overriding 2554 | standard behavior. Thus, a data format can specify "must ignore" 2555 | semantics but also allow for extensions that override that 2556 | semantics in light of application needs (for instance, with "must 2557 | understand" semantics for a particular extension). 2558 |
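The two strategies, and the mixed approach in which "must ignore" is the default but a particular extension can demand "must understand" treatment, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Field` type, its `must_understand` flag, and the `KNOWN_FIELDS` set are hypothetical and are not taken from any particular specification.

```python
from dataclasses import dataclass


class UnknownExtensionError(Exception):
    """Raised when an unrecognized field is flagged as must-understand."""


@dataclass
class Field:
    name: str
    value: str
    must_understand: bool = False  # hypothetical per-extension flag


# The vocabulary this illustrative agent recognizes (an assumption).
KNOWN_FIELDS = {"title", "author"}


def process(fields):
    """Return the recognized fields; ignore or reject unknown ones per their flag."""
    accepted = {}
    for field in fields:
        if field.name in KNOWN_FIELDS:
            accepted[field.name] = field.value
        elif field.must_understand:
            # "Must understand": unrecognized content is an error condition.
            raise UnknownExtensionError(field.name)
        # Otherwise "must ignore": unrecognized content is silently skipped.
    return accepted


message = [Field("title", "Oaxaca weather"), Field("x-mood", "sunny")]
result = process(message)  # the unknown "x-mood" field is ignored
```

Because the flag travels with the extension itself, the distinction between the two forms is explicit in the syntax, which is the property the text above recommends.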

2559 |

2560 | Extensibility is not free. Providing hooks for extensibility is one 2561 | of many requirements to be factored into the costs of language 2562 | design. Experience suggests that the long term benefits of a 2563 | well-designed extensibility mechanism generally outweigh the costs. 2564 |

2565 |

2566 | See “D.3 2567 | Extensibility and Extensions” in [[!QA]]. 2568 |

2569 |

2570 |

2571 |

2572 | Composition of data formats 2573 |

2574 |

2575 | Many modern data formats include mechanisms for composition. For 2576 | example: 2577 |

2578 |

It is possible to embed text comments in some image formats, 2580 | such as JPEG/JFIF. Although these comments are embedded in the 2581 | containing data, they are not intended to affect the display of the 2582 | image. 2583 |
There are container formats such as SOAP which fully expect 2585 | content from multiple namespaces but which provide an overall 2586 | semantic relationship of message envelope and payload. 2587 |
The semantics of combining RDF documents containing multiple 2589 | vocabularies are well-defined. 2590 |

2592 |

2593 | In principle, these relationships can be mixed and nested 2594 | arbitrarily. A SOAP message, for example, can contain an SVG image 2595 | that contains an RDF comment which refers to a vocabulary of terms 2596 | for describing the image. 2597 |

2598 |

2599 | Note, however, that for general XML there is no semantic model that 2600 | defines the interactions within XML documents with elements and/or 2601 | attributes from a variety of namespaces. Each application must 2602 | define how namespaces interact and what effect the namespace of an 2603 | element has on the element's ancestors, siblings, and descendants. 2604 |

2605 |

2606 | See TAG issues mixedUIXMLNamespace-33 2608 | (concerning the meaning of a document composed of content in 2609 | multiple namespaces), xmlFunctions-34 2611 | (concerning one approach for managing XML transformation and 2612 | composability), and RDFinXHTML-35 2614 | (concerning the interpretation of RDF when embedded in an XHTML 2615 | document). 2616 |

2617 |

2618 |

2619 |

2620 |

2621 | Separation of Content, Presentation, and Interaction 2622 |

2623 |

2624 | The Web is a heterogeneous environment where a wide variety of agents 2625 | provide access to content to users with a wide variety of 2626 | capabilities. It is good practice for authors to create content that 2627 | can reach the widest possible audience, including users with 2628 | graphical desktop computers, hand-held devices and mobile phones, 2629 | users with disabilities who may require speech synthesizers, and 2630 | devices not yet imagined. Furthermore, authors cannot predict in some 2631 | cases how an agent will display or process their content. Experience 2632 | shows that the separation of content, presentation, and interaction 2633 | promotes the reuse and device-independence of content; this follows 2634 | from the principle of orthogonal 2635 | specifications. 2636 |

2637 |

2638 | This separation also facilitates reuse of authored source content 2639 | across multiple delivery contexts. Sometimes, functional user 2640 | experiences suited to any delivery context can be generated by using 2641 | an adaptation process applied to a representation that does not 2642 | depend on the access mechanism. For more information about principles 2643 | of device-independence, see [DIPRINCIPLES]. 2645 |

2646 |

2647 |

2648 | Good practice: Separation of 2649 | content, presentation, interaction 2650 |

2651 |

2652 | A specification SHOULD allow authors to separate content from both 2653 | presentation and interaction concerns. 2654 |

2655 |

2656 |

2657 | Note that when content, presentation, and interaction are separated 2658 | by design, agents need to recombine them. There is a recombination 2659 | spectrum, with "client does all" at one end and "server does all" at 2660 | the other. 2661 |

2662 |

2663 | There are advantages to each approach. For instance, when a client 2664 | (such as a mobile phone) communicates device capabilities to the 2665 | server (for example, using CC/PP), the server can tailor the 2666 | delivered content to fit that client. The server can, for example, 2667 | enable faster downloads by adjusting links to refer to lower 2668 | resolution images, smaller video or no video at all. Similarly, if 2669 | the content has been authored with multiple branches, the server can 2670 | remove unused branches before delivery. In addition, by tailoring the 2671 | content to match the characteristics of a target client, the server 2672 | can help reduce client-side computation. However, specializing 2673 | content in this manner reduces caching efficiency. 2674 |

2675 |

2676 | On the other hand, designing content that can be recombined on 2677 | the client also tends to make that content applicable to a wider 2678 | range of devices. This design also improves caching efficiency and 2679 | offers users more presentation options. Media-dependent style sheets 2680 | can be used to tailor the content on the client side to particular 2681 | groups of target devices. For textual content with a regular and 2682 | repeating structure, the combined size of the text content plus the 2683 | style sheet is typically less than that of fully recombined content; 2684 | the savings improve further if the style sheet is reused by other 2685 | pages. 2686 |

2687 |

2688 | In practice, a combination of both approaches is often used. The 2689 | design decision about where on this spectrum an application should be 2690 | placed depends on the power of the client, the power and the load on 2691 | the server, and the bandwidth of the medium that connects them. If 2692 | the number of possible clients is unbounded, the application will 2693 | scale better if more computation is pushed to the client. 2694 |

2695 |

2696 | Of course, it may not be desirable to reach the widest possible 2697 | audience. Designers should consider appropriate technologies, such as 2698 | encryption and access control, for limiting 2699 | the audience. 2700 |

2701 |

2702 | Some data formats are designed to describe presentation (including 2703 | SVG and XSL Formatting Objects). Data formats such as these 2704 | demonstrate that one can only separate content from presentation (or 2705 | interaction) so far; at some point it becomes necessary to talk about 2706 | presentation. Per the principle of orthogonal specifications these data formats 2708 | should only address presentation issues. 2709 |

2710 |

2711 | See the TAG issues formattingProperties-19 2713 | (concerning interoperability in the case of formatting properties and 2714 | names) and contentPresentation-26 2716 | (concerning the separation of semantic and presentational markup). 2717 |

2718 |

2719 |

2720 |

2721 | Hypertext 2722 |

2723 |

2724 | A defining characteristic of the Web is that it allows embedded 2725 | references to other resources via URIs. The simplicity of creating 2726 | hypertext links using absolute URIs (<a 2727 | href="http://www.example.com/foo">) and relative URI 2728 | references (<a href="foo"> and <a 2729 | href="foo#anchor">) is partly (perhaps largely) responsible 2730 | for the success of the hypertext Web as we know it today. 2731 |

2732 |

2733 | When one resource (representation) refers to another resource with a 2734 | URI, this constitutes a link between the two resources. 2735 | Additional metadata may also form part of the link (see [[!XLINK10]], 2736 | for example). Note: In this document, the term 2737 | "link" generally means "relationship", not "physical connection". 2738 |

2739 |

2740 |

2741 | Good practice: Link identification 2743 |

2744 |

2745 | A specification SHOULD provide ways to identify links to other 2746 | resources, including to secondary resources (via fragment 2747 | identifiers). 2748 |

2749 |

2750 |

2751 | Formats that allow content authors to use URIs instead of local 2752 | identifiers promote the network effect: the value of these formats 2753 | grows with the size of the deployed Web. 2754 |

2755 |

2756 |

2757 | Good practice: Web 2758 | linking 2759 |

2760 |

2761 | A specification SHOULD allow Web-wide linking, not just internal 2762 | document linking. 2763 |

2764 |

2765 |

2766 |

2767 | Good practice: Generic URIs 2769 |

2770 |

2771 | A specification SHOULD allow content authors to use URIs without 2772 | constraining them to a limited set of URI schemes. 2773 |

2774 |

2775 |

2776 | What agents do with a hypertext link is not constrained by Web 2777 | architecture and may depend on application context. Users of 2778 | hypertext links expect to be able to navigate among representations 2779 | by following links. 2780 |

2781 |

2782 |

2783 | Good practice: Hypertext links 2785 |

2786 |

2787 | A data format SHOULD incorporate hypertext links if hypertext is 2788 | the expected user interface paradigm. 2789 |

2790 |

2791 |

2792 | Data formats that do not allow content authors to create hypertext 2793 | links lead to the creation of "terminal nodes" on the Web. 2794 |

2795 |

2796 |

2797 | URI references 2798 |

2799 |

2800 | Links are commonly expressed using URI references 2801 | (defined in section 4.2 of [[!URI]]), which may be combined with a 2802 | base URI to yield a usable URI. Section 5.1 of [[!URI]] explains 2803 | different ways to establish a base URI for a resource and 2804 | establishes a precedence among them. For instance, the base URI may 2805 | be a URI for the resource, or specified in a representation (see 2806 | the base elements provided by HTML and XML, and the 2807 | HTTP 'Content-Location' header). See also the section on links in XML. 2809 |

2810 |

2811 | Agents resolve a URI reference before using the resulting URI to 2812 | interact with another agent. URI references help in content 2813 | management by allowing content authors to design a representation 2814 | locally, i.e., without concern for which global identifier may 2815 | later be used to refer to the associated resource. 2816 |
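As a minimal sketch of that resolution step, the Python standard library's `urllib.parse.urljoin` combines a base URI with a URI reference. The base URI and the third reference below are made-up illustrations; the first two references are the ones used in the Hypertext section.

```python
from urllib.parse import urljoin

# Assumed base URI for the representation that contains the links.
base = "http://www.example.com/docs/index.html"

# Relative URI references as an author might write them locally.
references = ["foo", "foo#anchor", "../images/map.png"]

# Resolve each reference against the base before using it.
resolved = [urljoin(base, ref) for ref in references]

for ref, uri in zip(references, resolved):
    print(f"{ref!r} -> {uri}")
```

The author of the representation only wrote `foo`; the global identifier depends on whichever base URI is in effect when the agent resolves the reference.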

2817 |

2818 |

2819 |

2820 |

2821 | XML-Based Data Formats 2822 |

2823 |

2824 | Many data formats are XML-based, that is to say they 2825 | conform to the syntax rules defined in the XML specification 2826 | [[!XML10]] or [XML11]. This section discusses 2827 | issues that are specific to such formats. Anyone seeking guidance in 2828 | this area is urged to consult the "Guidelines For the Use of XML in 2829 | IETF Protocols" [IETFXML], which contains a 2830 | thorough discussion of the considerations that govern whether or not 2831 | XML ought to be used, as well as specific guidelines on how it ought 2832 | to be used. While it is directed at Internet applications with 2833 | specific reference to protocols, the discussion is generally 2834 | applicable to Web scenarios as well. 2835 |

2836 |

2837 | The discussion here should be seen as ancillary to the content of 2838 | [IETFXML]. Refer also to "XML Accessibility 2839 | Guidelines" [XAG] for help designing XML formats 2840 | that lower barriers to Web accessibility for people with 2841 | disabilities. 2842 |

2843 |

2844 |

2845 | When to use an XML-based format 2846 |

2847 |

2848 | XML defines textual data formats that are naturally suited to 2849 | describing data objects which are hierarchical and processed in a 2850 | chosen sequence. It is widely, but not universally, applicable for 2851 | data formats; an audio or video format, for example, is unlikely to 2852 | be well suited to expression in XML. Design constraints that would 2853 | suggest the use of XML include: 2854 |

2855 |

Requirement for a hierarchical structure. 2857 |
Need for a wide range of tools on a variety of platforms. 2859 |
Need for data that can outlive the applications that currently 2861 | process it. 2862 |
Ability to support internationalization in a self-describing 2864 | way that makes confusion over coding options unlikely. 2865 |
Early detection of encoding errors with no requirement to "work 2867 | around" such errors. 2868 |
A high proportion of human-readable textual content. 2870 |
Potential composition of the data format with other XML-encoded 2872 | formats. 2873 |
Desire for data easily parsed by both humans and machines. 2875 |
Desire for vocabularies that can be invented in a distributed 2877 | manner and combined flexibly. 2878 |

2880 |

2881 |

2882 |

2883 | Links in XML 2884 |

2885 |

2886 | Sophisticated linking mechanisms have been invented for XML 2887 | formats. XPointer allows links to address content that does not 2888 | have an explicit, named anchor. [[!XLINK10]] is an appropriate 2889 | specification for representing links in hypertext XML applications. XLink allows links to 2891 | have multiple ends and to be expressed either inline or in "link 2892 | bases" stored external to any or all of the resources identified by 2893 | the links it contains. 2894 |

2895 |

2896 | Designers of XML-based formats may consider using XLink and, for 2897 | defining fragment identifier syntax, using the XPointer framework 2898 | and XPointer element() Schemes. 2899 |

2900 |

2901 | XLink is not the only linking design that has been proposed for 2902 | XML, nor is it universally accepted as a good design. See also TAG 2903 | issue xlinkScope-23. 2905 |

2906 |

2907 |

2908 |

2909 | XML namespaces 2910 |

2911 |

2912 | The purpose of an XML namespace (defined in [XMLNS]) is to allow the deployment of XML vocabularies 2914 | (in which element and attribute names are defined) in a global 2915 | environment and to reduce the risk of name collisions in a given 2916 | document when vocabularies are combined. For example, the MathML 2917 | and SVG specifications both define the set element. 2918 | Although XML data from different formats such as MathML and SVG can 2919 | be combined in a single document, in this case there could be 2920 | ambiguity about which set element was intended. XML 2921 | namespaces reduce the risk of name collisions by taking advantage 2922 | of existing systems for allocating globally scoped names: the URI 2923 | system (see also the section on URI allocation). When using XML namespaces, 2925 | each local name in an XML vocabulary is paired with a URI (called 2926 | the namespace URI) to distinguish the local name from local names 2927 | in other vocabularies. 2928 |
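The pairing of local names with namespace URIs can be observed with a namespace-aware parser. The sketch below uses Python's `xml.etree.ElementTree`, which reports each element name as `{namespace URI}local-name`; the wrapping `doc` element and the prefixes `m` and `s` are arbitrary choices made for this illustration.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# A hypothetical compound document containing both "set" elements.
document = f"""
<doc xmlns:m="{MATHML}" xmlns:s="{SVG}">
  <m:set/>
  <s:set/>
</doc>
"""

root = ET.fromstring(document)

# Each tag is reported as "{namespace URI}local name", so the two
# "set" elements are distinguishable despite sharing a local name.
tags = [child.tag for child in root]
```

Both elements have the local name `set`, yet the parser reports two distinct names because each is paired with a different namespace URI.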

2929 |

2930 | The use of URIs confers additional benefits. First, each URI/local 2931 | name pair can be mapped to another URI, grounding the terms of the 2932 | vocabulary in the Web. These terms may be important resources and 2933 | thus it is appropriate to be able to associate URIs with them. 2934 |

2935 |

2936 | For flat namespaces, concatenation is one useful mapping. If 2937 | namespace URIs that end with a hash (“#”) are chosen, then simple 2938 | concatenation of the namespace URI and the local name creates a URI 2939 | for a secondary resource (the identified term). This technique is 2940 | used for many [[!RDFXML]] namespaces. 2941 |
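A short sketch of the concatenation mapping follows, using the RDF syntax namespace as the example. The helper name `term_uri` is an assumption made for illustration; it is not defined by any specification.

```python
from urllib.parse import urldefrag

# Namespace URI ending in "#", as used by the RDF syntax vocabulary.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def term_uri(namespace_uri: str, local_name: str) -> str:
    """Map a (namespace URI, local name) pair to a URI by simple concatenation."""
    return namespace_uri + local_name


uri = term_uri(RDF_NS, "type")

# The local name ends up as the fragment identifier, so the URI
# identifies a secondary resource of the namespace document.
without_fragment, fragment = urldefrag(uri)
```

Here `fragment` is the local name and `without_fragment` is the namespace URI minus its trailing hash.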

2942 |

2943 | Other mappings are likely to be more suitable for hierarchical 2944 | namespaces; see the related TAG issue abstractComponentRefs-37. 2946 |

2947 |

2948 | Designers of XML-based data formats who declare namespaces thus 2949 | make it possible to reuse those data formats and combine them in 2950 | novel ways not yet imagined. Failure to declare namespaces makes 2951 | such reuse more difficult, even impractical in some cases. 2952 |

2953 |

2954 |

2955 | Good practice: Namespace adoption 2957 |

2958 |

2959 | A specification that establishes an XML vocabulary SHOULD place 2960 | all element names and global attribute names in a namespace. 2961 |

2962 |

2963 |

2964 | Attributes are always scoped by the element on which they appear. 2965 | An attribute that is "global," that is, one that might meaningfully 2966 | appear on elements of many types, including elements in other 2967 | namespaces, should be explicitly placed in a namespace. Local 2968 | attributes, ones associated with only a particular element type, 2969 | need not be included in a namespace since their meaning will always 2970 | be clear from the context provided by that element. 2971 |

2972 |

2973 | The type attribute from the W3C XML Schema Instance 2974 | namespace "http://www.w3.org/2001/XMLSchema-instance" ([XMLSCHEMA], section 4.3.2) is an example of a 2976 | global attribute. It can be used by authors of any vocabulary to 2977 | make an assertion in instance data about the type of the element on 2978 | which it appears. As a global attribute, it must always be 2979 | qualified. The frame attribute on an HTML table is an 2980 | example of a local attribute. There is no value in placing that 2981 | attribute in a namespace since the attribute is unlikely to be 2982 | useful on an element other than an HTML table. 2983 |
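The difference between a qualified global attribute and an unqualified local one is visible in how a namespace-aware parser reports attribute names. In this sketch the `reading` element, its `unit` attribute, and the `Temperature` type name are hypothetical; only the XML Schema Instance namespace URI comes from the text above.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# A hypothetical element carrying one global and one local attribute.
element = ET.fromstring(
    f'<reading xmlns:xsi="{XSI}" xsi:type="Temperature" unit="C">21</reading>'
)

# The global attribute is keyed by its namespace URI plus local name.
global_type = element.get(f"{{{XSI}}}type")

# The local attribute has no namespace; its meaning comes from the element.
local_unit = element.get("unit")

# An unqualified "type" is a different name and is not present.
unqualified_type = element.get("type")
```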

2984 |

2985 | Applications that rely on DTD processing must impose additional 2986 | constraints on the use of namespaces. DTDs perform validation based 2987 | on the lexical form of the element and attribute names in the 2988 | document. This makes prefixes syntactically significant in ways 2989 | that are not anticipated by [[!XMLNS]]. 2990 |

2991 |

2992 |

2993 |

2994 | Namespace documents 2995 |

2996 |

2997 |

2998 | Story 2999 |

3000 |

3001 |

3002 | Nadia receives representation data from "weather.example.com" 3003 | in an unfamiliar data format. She knows enough about XML to 3004 | recognize which XML namespace the elements belong to. Since the 3005 | namespace is identified by the URI 3006 | "http://weather.example.com/2003/format", she asks her browser 3007 | to retrieve a representation of the identified resource. She 3008 | gets back some useful data that allows her to learn more about 3009 | the data format. Nadia's browser may also be able to perform 3010 | some operations automatically (i.e., unattended by a human 3011 | overseer) given data that has been optimized for software 3012 | agents. For example, her browser might, on Nadia's behalf, 3013 | download additional agents to process and render the format. 3014 |

3015 |

3016 |

3017 |

3018 | Another benefit of using URIs to build XML namespaces is that the 3019 | namespace URI can be used to identify an information resource that 3020 | contains useful information, machine-usable and/or human-usable, 3021 | about terms in the namespace. This type of information resource is 3022 | called a namespace document. When a namespace URI owner 3023 | provides a namespace document, it is authoritative for the 3024 | namespace. 3025 |

3026 |

3027 | There are many reasons to provide a namespace document. A person 3028 | might want to: 3029 |

3030 |

understand the purpose of the namespace, 3032 |
learn how to use the markup vocabulary in the namespace, 3034 |
find out who controls it and associated policies, 3036 |
request authority to access schemas or collateral material 3038 | about it, or 3039 |
report a bug or situation that could be considered an error in 3041 | some collateral material. 3042 |

3044 |

3045 | A processor might want to: 3046 |

3047 |

retrieve a schema, for validation, 3049 |
retrieve a style sheet, for presentation, or 3051 |
retrieve ontologies, for making inferences. 3053 |

3055 |

3056 | In general, there is no established best practice for creating 3057 | representations of a namespace document; application expectations 3058 | will influence what data format or formats are used. Application 3059 | expectations will also influence whether relevant information 3060 | appears directly in a representation or is referenced from it. 3061 |

3062 |

3063 |

3064 | Good practice: Namespace documents 3066 |

3067 |

3068 | The owner of an XML namespace name SHOULD make available material 3069 | intended for people to read and material optimized for software 3070 | agents in order to meet the needs of those who will use the 3071 | namespace vocabulary. 3072 |

3073 |

3074 |

3075 | The following are examples of data formats for 3076 | namespace documents: [[!OWL10]], [[!RDDL]], [[!XMLSCHEMA-1]], and 3077 | [[!XHTML11]]. Each of these formats meets different requirements 3078 | described above for satisfying the needs of an agent that wants 3079 | more information about the namespace. Note, however, issues related 3080 | to fragment identifiers and content 3081 | negotiation if content negotiation is used. 3082 |

3083 |

3084 | See TAG issues namespaceDocument-8 3086 | (concerning desired characteristics of namespace documents) and 3087 | abstractComponentRefs-37 3089 | (concerning the use of fragment identifiers with namespace names to 3090 | identify abstract components). 3091 |

3092 |

3093 |

3094 |

3095 | QNames in XML 3096 |

3097 |

3098 | Section 3 of "Namespaces in XML" [XMLNS] 3099 | provides a syntactic construct known as a QName for the compact 3100 | expression of qualified names in XML documents. A qualified name is 3101 | a pair consisting of a URI, which names a namespace, and a local 3102 | name placed within that namespace. "Namespaces in XML" provides for 3103 | the use of QNames as names for XML elements and attributes. 3104 |

3105 |

3106 | Other specifications, starting with [XSLT10], 3107 | have employed the idea of using QNames in contexts other than 3108 | element and attribute names, for example in attribute values and in 3109 | element content. However, general XML processors cannot reliably 3110 | recognize QNames as such when they are used in attribute values and 3111 | in element content; for example, the syntax of QNames overlaps with 3112 | that of URIs. Experience has also revealed other limitations to 3113 | QNames, such as losing namespace bindings after XML 3114 | canonicalization. 3115 |

3116 |

3117 |

3118 | Constraint: QNames Indistinguishable from URIs 3120 |

3121 |

3122 | Do not allow both QNames and URIs in attribute values or element 3123 | content where they are indistinguishable. 3124 |

3125 |

3126 |

3127 | For more information, see the TAG finding "Using QNames as 3129 | Identifiers in Content". 3130 |

3131 |

3132 | Because QNames are compact, some specification designers have 3133 | adopted the same syntax as a means of identifying resources. Though 3134 | convenient as a shorthand notation, this usage has a cost. There is 3135 | no single, accepted way to convert a QName into a URI or vice 3136 | versa. Although QNames are convenient, they do not replace the URI 3137 | as the identification system of the Web. The use of QNames to 3138 | identify Web resources without providing a mapping to URIs is 3139 | inconsistent with Web architecture. 3140 |
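Since there is no single accepted conversion, a specification that uses QNames as identifiers has to state its own. The sketch below shows one possible mapping, concatenation of the namespace URI and the local name, which is the technique described for flat namespaces in the section on XML namespaces. The function name and the `bindings` dictionary (standing in for the in-scope namespace declarations) are assumptions for illustration.

```python
def qname_to_uri(qname: str, bindings: dict) -> str:
    """Expand 'prefix:local' to a URI by concatenation (one possible mapping)."""
    prefix, separator, local_name = qname.partition(":")
    if not separator or not prefix or not local_name:
        raise ValueError(f"not a prefixed QName: {qname!r}")
    if prefix not in bindings:
        # Without the in-scope binding the QName cannot be interpreted.
        raise KeyError(f"no namespace bound to prefix {prefix!r}")
    return bindings[prefix] + local_name


# Stand-in for namespace declarations in scope where the QName appears.
bindings = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

uri = qname_to_uri("rdf:type", bindings)
```

The lookup fails when the prefix binding is unavailable, which is one way to see why losing namespace bindings (for example, after canonicalization) makes QNames in content fragile.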

3141 |

3142 |

3143 | Good practice: QName Mapping 3145 |

3146 |

3147 | A specification in which QNames serve as resource identifiers 3148 | MUST provide a mapping to URIs. 3149 |

3150 |

3151 |

3152 | See XML namespaces for examples of 3153 | some mapping strategies. 3154 |

3155 |

3156 | See also TAG issues rdfmsQnameUriMapping-6 3158 | (concerning the mapping of QNames to URIs), qnameAsId-18 3160 | (concerning the use of QNames as identifiers in XML content), and 3161 | abstractComponentRefs-37 3163 | (concerning the use of fragment identifiers with namespace names to 3164 | identify abstract components). 3165 |

3166 |

3167 |

3168 |

3169 | XML ID semantics 3170 |

3171 |

3172 | Consider the following fragment of XML: <section 3173 | name="foo">. Does the section element have what the 3174 | XML Recommendation refers to as the ID foo (i.e., 3175 | "foo" must not appear in the surrounding XML document more than 3176 | once)? One cannot answer this question by examining the element and 3177 | its attributes alone. In XML, the quality of "being an ID" is 3178 | associated with the type of an attribute, not its name. Finding the 3179 | IDs in a document requires additional processing. 3180 |

3181 |

Processing the document with a processor that recognizes DTD 3183 | attribute list declarations (in the external or internal subset) 3184 | might reveal a declaration that identifies the name 3185 | attribute as an ID. Note: This processing is not 3186 | necessarily part of validation. A non-validating, DTD-aware 3187 | processor can recognize IDs. 3188 |
Processing the document with a W3C XML schema might reveal an 3190 | element declaration that identifies the name attribute 3191 | as a W3C XML Schema ID. 3192 |
In practice, processing the document with another schema 3194 | language, such as RELAX NG [RELAXNG], might 3195 | reveal the attributes declared to be of type ID in the XML Schema sense. 3196 | Many modern specifications begin processing XML at the Infoset 3197 | [INFOSET] level and do not specify 3198 | normatively how an Infoset is constructed. For those 3199 | specifications, any process that establishes the ID type in the 3200 | Infoset (and Post Schema Validation Infoset (PSVI) 3201 | defined in [XMLSCHEMA]) may usefully 3202 | identify the attributes of type ID. 3203 |
In practice, applications may have independent means (such as 3205 | those defined in the XPointer specification, [XPTRFR] 3208 | section 3.2) of locating identifiers inside a document. 3209 |

3211 |

3212 | To further complicate matters, DTDs establish the ID type in the 3213 | Infoset whereas W3C XML Schema produces a PSVI but does not modify 3214 | the original Infoset. This leaves open the possibility that a 3215 | processor might only look in the Infoset and consequently would 3216 | fail to recognize schema-assigned IDs. 3217 |
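The following sketch illustrates the underlying problem with a non-validating parser, here Python's `xml.etree.ElementTree`. The `doc` wrapper element and the internal DTD subset are assumptions made for the example. Even though the DTD declares `name` to be of type ID, the data model this parser exposes carries only the attribute's name and value, and the duplicate value is accepted because ID uniqueness is a validity constraint rather than a well-formedness constraint.

```python
import xml.etree.ElementTree as ET

# The internal subset declares "name" as an ID on section elements,
# yet the document uses the same value twice.
document = """<!DOCTYPE doc [
  <!ATTLIST section name ID #IMPLIED>
]>
<doc>
  <section name="foo"/>
  <section name="foo"/>
</doc>"""

# A non-validating parse succeeds: the document is well-formed.
root = ET.fromstring(document)

# Only plain name/value pairs are exposed; nothing marks "name" as an ID.
names = [section.get("name") for section in root.iter("section")]
attributes = [dict(section.attrib) for section in root.iter("section")]
```

An application built on this data model would need one of the additional processing steps listed above to learn which attributes are IDs.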

3218 |

3219 | See the TAG issue xmlIDSemantics-32 3221 | for additional background information and [XML-ID] for a solution under development. 3223 |

3224 |

3225 |

3226 |

3227 | Media types for XML 3228 |

3229 |

3230 | RFC 3023 defines the Internet media types "application/xml" and 3231 | "text/xml", and describes a convention whereby XML-based data 3232 | formats use Internet media types with a "+xml" suffix, for example 3233 | "image/svg+xml". 3234 |

3235 |

3236 | There are two problems associated with the “text” media types: 3237 | First, for data identified as "text/*", Web intermediaries are 3238 | allowed to "transcode", i.e., convert one character encoding to 3239 | another. Transcoding may make the self-description false or may 3240 | cause the document to be not well-formed. 3241 |

3242 |

3243 |

3244 | Good practice: XML 3245 | and "text/*" 3246 |

3247 |

3248 | In general, a representation provider SHOULD NOT assign Internet 3249 | media types beginning with "text/" to XML representations. 3250 |

3251 |

3252 |

3253 | Second, representations whose Internet media types begin with 3254 | "text/" are required, unless the charset parameter is 3255 | specified, to be considered to be encoded in US-ASCII. Since the 3256 | syntax of XML is designed to make documents self-describing, it is 3257 | good practice to omit the charset parameter, and since 3258 | XML is very often not encoded in US-ASCII, the use of "text/" 3259 | Internet media types effectively precludes this good practice. 3260 |
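Both problems can be illustrated with a small sketch using Python's `xml.etree.ElementTree`. The sample document and the simulated intermediary are assumptions for illustration. The first parse shows an XML document describing its own encoding with no help from protocol headers; the second shows what happens when an intermediary transcodes the bytes without updating the encoding declaration, so that the self-description becomes false.

```python
import xml.etree.ElementTree as ET

# An XML document that declares its own character encoding.
source = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'
original_bytes = source.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes correctly.
text = ET.fromstring(original_bytes).text

# Simulated intermediary: transcodes to UTF-8 but leaves the declaration
# untouched, so the document now misdescribes itself.
transcoded_bytes = original_bytes.decode("iso-8859-1").encode("utf-8")

# The parser still trusts the (now false) declaration and garbles the text.
garbled = ET.fromstring(transcoded_bytes).text
```

In this case the transcoded document still parses but yields the wrong characters; with other encoding pairs the result can fail to be well-formed at all.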

3261 |

3262 |

3263 | Good practice: XML 3264 | and character encodings 3265 |

3266 |

3267 | In general, a representation provider SHOULD NOT specify the 3268 | character encoding for XML data in protocol headers since the 3269 | data is self-describing. 3270 |

3271 |

3272 |

3273 |

3274 |

3275 | Fragment identifiers in XML 3276 |

3277 |

3278 | The section on media types and 3279 | fragment identifier semantics discusses the interpretation of 3280 | fragment identifiers. Designers of an XML-based data format 3281 | specification should define the semantics of fragment identifiers 3282 | in that format. The XPointer Framework [XPTRFR] provides an interoperable starting point. 3284 |

3285 |

3286 | When the media type assigned to representation data is 3287 | "application/xml", there are no semantics defined for fragment 3288 | identifiers, and authors should not make use of fragment 3289 | identifiers in such data. The same is true if the assigned media 3290 | type has the suffix "+xml" (defined in "XML Media Types" [RFC3023]), and the data format specification does 3292 | not specify fragment identifier semantics. In short, just knowing 3293 | that content is XML does not provide information about fragment 3294 | identifier semantics. 3295 |

3296 |

3297 | Many people assume that the fragment identifier #abc, 3298 | when referring to XML data, identifies the element in the document 3299 | with the ID "abc". However, there is no normative support for this 3300 | assumption. A revision of RFC 3023 is expected to address this. 3301 |

3302 |

3303 | See TAG issue fragmentInXML-28. 3305 |

3306 |

3307 |

3308 |

3309 |

3310 | Future Directions for Data Formats 3311 |

3312 |

3313 | Data formats enable the creation of new applications to make use of 3314 | the information space infrastructure. The Semantic Web is one such 3315 | application, built on top of RDF [RDFXML]. This 3316 | document does not discuss the Semantic Web in detail; the TAG expects 3317 | that future volumes of this document will. See the related TAG issue 3318 | httpRange-14. 3320 |

3321 |

3322 |

3323 |

3324 |

3325 | General Architecture Principles 3326 |

3327 |

3328 | A number of general architecture principles apply to all three bases of 3329 | Web architecture. 3330 |

3331 |

3332 |

3333 | Orthogonal Specifications 3334 |

3335 |

3336 | Identification, interaction, and representation are orthogonal 3337 | concepts, meaning that technologies used for identification, 3338 | interaction, and representation may evolve independently. For 3339 | instance: 3340 |

3341 |

Resources are identified with URIs. URIs can be published without 3343 | building any representations of the resource or determining whether 3344 | any representations are available. 3345 |
A generic URI syntax allows agents to function in many cases 3347 | without knowing specifics of URI schemes. 3348 |
In many cases one may change the representation of a resource 3350 | without disrupting references to the resource (for example, by using 3351 | content negotiation). 3352 |

3354 |

3355 | When two specifications are orthogonal, one may change one without 3356 | requiring changes to the other, even if one has dependencies on the 3357 | other. For example, although the HTTP specification depends on the 3358 | URI specification, the two may evolve independently. This 3359 | orthogonality increases the flexibility and robustness of the Web. 3360 | For example, one may refer by URI to an image without knowing 3361 | anything about the format chosen to represent the image. This has 3362 | facilitated the introduction of image formats such as PNG and SVG 3363 | without disrupting existing references to image resources. 3364 |

3365 |

3366 |

3367 | Principle: Orthogonality 3369 |

3370 |

3371 | Orthogonal abstractions benefit from orthogonal specifications. 3372 |

3373 |

3374 |

Experience demonstrates that problems arise where orthogonal concepts occur in a single specification. Consider, for example, the HTML specification, which includes the orthogonal x-www-form-urlencoded specification. Software developers (for example, of [CGI] applications) might have an easier time finding the specification if it were published separately and then cited from the HTTP, URI, and HTML specifications.
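For readers unfamiliar with the encoding mentioned above, here is a minimal sketch using Python's standard `urllib.parse` module. The field names and values are made up for illustration. The point is only that the encoding can be produced and consumed without any HTML processing.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, as a form submission or a CGI script might see them.
fields = {"q": "web architecture", "lang": "en"}

# Encode the fields: spaces become "+", pairs are joined with "&".
encoded = urlencode(fields)

# Decode the string back into a mapping of names to lists of values.
decoded = parse_qs(encoded)

print(encoded)  # q=web+architecture&lang=en
print(decoded)  # {'q': ['web architecture'], 'lang': ['en']}
```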

3383 |

Problems also arise when specifications attempt to modify orthogonal abstractions described elsewhere. An historical version of the HTML specification added a "Refresh" value to the http-equiv attribute of the meta element. It was defined to be equivalent to the HTTP header of the same name. The authors of the HTTP specification ultimately decided not to provide this header, which left the two specifications awkwardly at odds with each other. The W3C HTML Working Group eventually removed the "Refresh" value.

3395 |

A specification should clearly indicate which features overlap with those governed by another specification.

3399 |

3400 |

3401 |

3402 | Extensibility 3403 |

3404 |

The information in the Web and the technologies used to represent that information change over time. Extensibility is the property of a technology that promotes evolution without sacrificing interoperability. Some examples of successful technologies designed to allow change while minimizing disruption include:

3411 |

the fact that URI schemes are orthogonally specified;
the use of an open set of Internet media types in mail and HTTP to specify document interpretation;
the separation of the generic XML grammar and the open set of XML namespaces for element and attribute names;
extensibility models in Cascading Style Sheets (CSS), XSLT 1.0, and SOAP;
user agent plug-ins.

3426 |

An example of an unsuccessful extension mechanism is HTTP mandatory extensions [HTTPEXT]. The community has sought mechanisms to extend HTTP, but apparently the costs of the mandatory extension proposal (notably in complexity) outweighed the benefits and thus hampered adoption.

3433 |

Below we discuss the property of "extensibility," exhibited by URIs, some data formats, and some protocols (through the incorporation of new messages).

3438 |

Subset language: one language is a subset (or "profile") of a second language if any document in the first language is also a valid document in the second language and has the same interpretation in the second language.

3444 |

Extended language: If one language is a subset of another, the latter superset is called an extended language; the difference between the languages is called the extension. Clearly, extending a language is better for interoperability than creating an incompatible language.

3451 |

Ideally, many instances of a superset language can be safely and usefully processed as though they were in the subset language. Languages that can evolve this way, allowing applications to provide new information when necessary while still interoperating with applications that only understand a subset of the current language, are said to be "extensible." Language designers can facilitate extensibility by defining the default behavior for unknown extensions: for example, that they be ignored (in some defined way) or that they be treated as errors.

3462 |

For example, from early on in the Web, HTML agents followed the convention of ignoring unknown tags. This choice left room for innovation (i.e., non-standard elements) and encouraged the deployment of HTML. However, interoperability problems arose as well. In this type of environment, there is an inevitable tension between interoperability in the short term and the desire for extensibility. Experience shows that designs that strike the right balance between allowing change and preserving interoperability are more likely to thrive and are less likely to disrupt the Web community. Orthogonal specifications help reduce the risk of disruption.
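The "ignore unknown tags" convention can be sketched with Python's standard `html.parser` module. The set of known tags, the class name, and the sample markup are assumptions made for this illustration; real user agents apply far more elaborate rules.

```python
from html.parser import HTMLParser

# Hypothetical subset language: the only tags this agent understands.
KNOWN_TAGS = {"p", "em", "a"}


class MustIgnoreParser(HTMLParser):
    """Keep all character data; note unknown tags without failing on them."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        # An unknown tag is recorded and otherwise ignored; its content is kept.
        if tag not in KNOWN_TAGS:
            self.ignored.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def process(document):
    """Return (text, ignored_tags) for a document in the extended language."""
    parser = MustIgnoreParser()
    parser.feed(document)
    parser.close()
    return "".join(parser.text), parser.ignored
```

With this policy, a document that uses an extension element is still processed usefully by an agent that only knows the subset, though the extension's intended effect is lost.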

3475 |

For further discussion, see the section on versioning and extensibility. See also TAG issue xmlProfiles-29 and HTML Dialects.

3482 |

3483 |

3484 |

3485 | Error Handling 3486 |

3487 |

Errors occur in networked information systems. An error condition can be well-characterized (e.g., well-formedness errors in XML or 4xx client errors in HTTP) or arise unpredictably. Error correction means that an agent repairs a condition so that within the system, it is as though the error never occurred. One example of error correction involves data retransmission in response to a temporary network failure. Error recovery means that an agent does not repair an error condition but continues processing by addressing the fact that the error has occurred.

3498 |

Agents frequently correct errors without user awareness, sparing users the details of complex network communications. On the other hand, it is important that agents recover from error in a way that is evident to users, since the agents are acting on their behalf.

3505 |

3506 |

3507 | Principle: Error recovery 3509 |

3510 |

Agents that recover from error by making a choice without the user's consent are not acting on the user's behalf.

3514 |

3515 |

An agent is not required to interrupt the user (e.g., by popping up a confirmation box) to obtain consent. The user may indicate consent through pre-selected configuration options, modes, or selectable user interface toggles, with appropriate reporting to the user when the agent detects an error. Agent developers should not ignore usability issues when designing error recovery behavior.
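The distinction between error correction and error recovery can be made concrete with a small Python sketch. Everything named here (`TransientNetworkError`, `retrieve`, the `fallback` and `notices` parameters) is hypothetical. The retry loop stands in for silent correction, and the notice list stands in for whatever reporting channel an agent offers its user.

```python
class TransientNetworkError(Exception):
    """Hypothetical error representing a temporary network failure."""


def retrieve(fetch, retries=2, fallback=None, notices=None):
    """Call fetch(); correct transient failures by retrying, else recover.

    fetch    -- zero-argument callable that returns data or raises
    fallback -- value used if correction fails (a pre-selected user option)
    notices  -- list that receives user-visible reports of recovery
    """
    last_error = None
    for _ in range(retries + 1):
        try:
            # Error correction: a successful retry hides the failure entirely.
            return fetch()
        except TransientNetworkError as err:
            last_error = err
    # Error recovery: the error was not repaired, so make that evident.
    if notices is not None:
        notices.append(f"Could not retrieve resource: {last_error}")
    return fallback
```

The fallback is chosen in advance by the caller, which models consent given through configuration instead of an interrupting dialog.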

3523 |

To promote interoperability, specification designers should identify predictable error conditions. Experience has led to the following observations about error-handling approaches.

3528 |

Protocol designers should provide enough information about an error condition so that an agent can address the error condition. For instance, an HTTP 404 status code (not found) is useful because it allows user agents to present relevant information to users, enabling them to contact the representation provider in case of problems.
Experience with the cost of building a user agent to handle the diverse forms of ill-formed HTML content convinced the designers of the XML specification to require that agents fail upon encountering ill-formed content. Because users are unlikely to tolerate such failures, this design choice has pressured all parties into respecting XML's constraints, to the benefit of all.
An agent that encounters unrecognized content may handle it in a number of ways, including by considering it an error; see also the section on extensibility and versioning.
Error behavior that is appropriate for a person may not be appropriate for software. People are capable of exercising judgement in ways that software applications generally cannot. An informal error response may suffice for a person but not for a processor.
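The XML fail-on-error behavior described in the list above can be observed with Python's standard `xml.etree.ElementTree` parser. The helper names below are hypothetical, and the sketch only shows that ill-formed input is rejected outright and never silently repaired.

```python
import xml.etree.ElementTree as ET


def parse_strict(text):
    """Return the root element's tag, or raise ET.ParseError if ill-formed."""
    return ET.fromstring(text).tag


def check(text):
    """Return ("ok", root_tag) or ("error", message) for the given XML text."""
    try:
        return ("ok", parse_strict(text))
    except ET.ParseError as err:
        # The parser reports the problem; no partial tree is handed back.
        return ("error", str(err))
```

For instance, `check("<a><b/></a>")` succeeds, while `check("<a><b></a>")` reports an error because the `b` element is never closed.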

3553 |

See the TAG issue contentTypeOverride-24, which concerns the source of authoritative metadata.

3558 |

3559 |

3560 |

3561 | Protocol-based Interoperability 3562 |

3563 |

The Web follows Internet tradition in that its important interfaces are defined in terms of protocols, by specifying the syntax, semantics, and sequencing constraints of the messages interchanged. Protocols designed to be resilient in the face of widely varying environments have helped the Web scale and have facilitated communication across multiple trust boundaries. Traditional application programming interfaces (APIs) do not always take these constraints into account, nor should they be required to. One effect of protocol-based design is that the technology shared among agents often lasts longer than the agents themselves.

3575 |

It is common for programmers working with the Web to write code that generates and parses these messages directly. It is less common, but not unusual, for end users to have direct exposure to these messages. It is often desirable to give users access to format and protocol details, for example by allowing them to “view source,” so that they may gain expertise in the workings of the underlying system.
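To give a sense of what generating and parsing such messages directly can look like, here is a minimal Python sketch for HTTP/1.1. The function names and the sample host are hypothetical, and the parsing is simplified: it ignores repeated headers, folded lines, and message bodies.

```python
def build_request(host, path):
    """Return a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw):
    """Split a raw response into (version, status, reason, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize to lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers
```

Because the message syntax is defined by the protocol and not by any one library, code like this interoperates with any conforming agent on the other side.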

3583 |

3584 |

3585 |

3586 |

3587 | Glossary 3588 |

3589 |

Content negotiation: The practice of making multiple representations available via the same URI. Which representation is served depends on negotiation between the requesting agent and the agent serving the representations.
Dereference a URI: Access a representation of the resource identified by the URI.
Error correction: An agent repairs an error so that within the system, it is as though the error never occurred.
Error recovery: An agent invokes exceptional behavior because it does not correct the error.
Extended language: If one language is a subset of another, the latter is called an extended language.
Fragment identifier: The part of a URI that allows identification of a secondary resource.
Information resource: A resource which has the property that all of its essential characteristics can be conveyed in a message.
Link: A relationship between two resources when one resource (representation) refers to the other resource by means of a URI.
Message: A unit of communication between agents.
Namespace document: An information resource that contains useful information, machine-processable and/or human-readable, about terms in a particular XML namespace.
Representation: Data that encodes information about resource state.
Resource: Anything that might be identified by a URI.
Safe interaction: Interaction with a resource where an agent does not incur any obligation beyond the interaction.
Secondary resource: A resource related to another resource through the primary resource with additional identifying information (the fragment identifier).
Subset language: One language is a subset of a second language if any document in the first language is also a valid document in the second language and has the same interpretation in the second language.
URI: Acronym for Uniform Resource Identifier.
URI aliases: Two or more different URIs that identify the same resource.
URI collision: The use of the same URI to refer to more than one resource in the context of Web protocols and formats.
URI ownership: A relationship between a URI and a social entity, such as a person, organization, or specification.
URI persistence: The social expectation that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource.
URI reference: An operational shorthand for a URI.
Uniform Resource Identifier (URI): A global identifier in the context of the World Wide Web.
Unsafe interaction: Interaction with a resource that is not safe interaction.
User agent: One type of Web agent; a piece of software acting on behalf of a person.
WWW: Acronym for World Wide Web.
Web: Shortened form of World Wide Web.
Web agent: A person or a piece of software acting on the information space on behalf of a person, entity, or process.
World Wide Web: An information space in which items of interest are identified by Uniform Resource Identifiers.
XML-based format: One that conforms to the syntax rules defined in the XML specification.
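Two of the glossary terms, "URI reference" and "fragment identifier," may be easier to grasp with a small sketch using Python's standard `urllib.parse` module. The base URI and the relative reference are made-up examples.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical base URI of the representation in which the reference appears.
base = "http://example.org/docs/guide.html"

# A relative URI reference is shorthand that is resolved against the base.
absolute = urljoin(base, "../images/map.png#north")

# The fragment identifier names a secondary resource within the primary one.
primary, fragment = urldefrag(absolute)

print(absolute)  # http://example.org/images/map.png#north
print(primary)   # http://example.org/images/map.png
print(fragment)  # north
```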

3797 |

3798 |

3799 |

3800 |

3801 | References 3802 |

3803 |

CGI: Common Gateway Interface/1.1 Specification. Available at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html.
CHIPS: Common HTTP Implementation Problems, O. Théreaux, January 2003. This W3C Team Submission is available at http://www.w3.org/TR/chips/.
CUAP: Common User Agent Problems, K. Dubost, January 2003. This W3C Team Submission is available at http://www.w3.org/TR/cuap.
Cool: Cool URIs don't change, T. Berners-Lee, W3C, 1998. Available at http://www.w3.org/Provider/Style/URI. Note that the title is somewhat misleading. It is not the URIs that change; it is what they identify.
Eng90: Knowledge-Domain Interoperability and an Open Hyperdocument System, D. C. Engelbart, June 1990.
HTTPEXT: Mandatory Extensions in HTTP, H. Frystyk Nielsen, P. Leach, S. Lawrence, 20 January 1998. This expired IETF Internet Draft is available at http://www.w3.org/Protocols/HTTP/ietf-http-ext/draft-frystyk-http-mandatory.
IANASchemes: IANA's online registry of URI Schemes is available at http://www.iana.org/assignments/uri-schemes.
IETFXML: IETF Guidelines For The Use of XML in IETF Protocols, S. Hollenbeck, M. Rose, L. Masinter, eds., 2 November 2002. This IETF Internet Draft is available at http://www.imc.org/ietf-xml-use/xml-guidelines-07.txt. If this document is no longer available, refer to the ietf-xml-use mailing list.
INFOSET: XML Information Set (Second Edition), R. Tobin, J. Cowan, Editors, W3C Recommendation, 04 February 2004, http://www.w3.org/TR/2004/REC-xml-infoset-20040204. Latest version available at http://www.w3.org/TR/xml-infoset.
IRI: IETF Internationalized Resource Identifiers (IRIs), M. Dürst, M. Suignard, Nov 2002. This IETF Internet Draft is available at http://www.w3.org/International/iri-edit/draft-duerst-iri.html. If this document is no longer available, refer to the home page for Editing 'Internationalized Resource Identifiers (IRIs)'.
MEDIATYPEREG: IANA's online registry of Internet Media Types is available at http://www.iana.org/assignments/media-types/index.html.
OWL10: OWL Web Ontology Language Reference, M. Dean, G. Schreiber, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-ref-20040210/. Latest version available at http://www.w3.org/TR/owl-ref/.
P3P10: The Platform for Privacy Preferences 1.0 (P3P1.0) Specification, M. Marchiori, Editor, W3C Recommendation, 16 April 2002, http://www.w3.org/TR/2002/REC-P3P-20020416/. Latest version available at http://www.w3.org/TR/P3P/.
RDDL: Resource Directory Description Language (RDDL), J. Borden, T. Bray, eds., 1 June 2003. This document is available at http://www.tbray.org/tag/rddl/rddl3.html.
RDFXML: RDF/XML Syntax Specification (Revised), D. Beckett, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/. Latest version available at http://www.w3.org/TR/rdf-syntax-grammar.
RELAXNG: The RELAX NG schema language project.
REST: Representational State Transfer (REST), Chapter 5 of "Architectural Styles and the Design of Network-based Software Architectures", Doctoral Thesis of R. T. Fielding, 2000. Designers of protocol specifications in particular should invest time in understanding the REST model and the relevance of its principles to a given design. These principles include statelessness, clear assignment of roles to parties, uniform address space, and a limited, uniform set of verbs. Available at http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm.
RFC2045: IETF RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, N. Freed, N. Borenstein, November 1996. Available at http://www.ietf.org/rfc/rfc2045.txt.
RFC2046: IETF RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, N. Freed, N. Borenstein, November 1996. Available at http://www.ietf.org/rfc/rfc2046.txt.
RFC2119: IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
RFC2141: IETF RFC 2141: URN Syntax, R. Moats, May 1997. Available at http://www.ietf.org/rfc/rfc2141.txt.
RFC2326: IETF RFC 2326: Real Time Streaming Protocol (RTSP), H. Schulzrinne, A. Rao, R. Lanphier, April 1998. Available at http://www.ietf.org/rfc/rfc2326.txt.
RFC2397: IETF RFC 2397: The “data” URL scheme, L. Masinter, August 1998. Available at http://www.ietf.org/rfc/rfc2397.txt.
RFC2616: IETF RFC 2616: Hypertext Transfer Protocol - HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt.
RFC2717: IETF RFC 2717: Registration Procedures for URL Scheme Names, R. Petke, I. King, November 1999. Available at http://www.ietf.org/rfc/rfc2717.txt.
RFC2718: IETF RFC 2718: Guidelines for new URL Schemes, L. Masinter, H. Alvestrand, D. Zigmond, R. Petke, November 1999. Available at http://www.ietf.org/rfc/rfc2718.txt.
RFC2818: IETF RFC 2818: HTTP Over TLS, E. Rescorla, May 2000. Available at http://www.ietf.org/rfc/rfc2818.txt.
RFC3023: IETF RFC 3023: XML Media Types, M. Murata, S. St. Laurent, D. Kohn, January 2001. Available at http://www.ietf.org/rfc/rfc3023.txt.
RFC3236: IETF RFC 3236: The 'application/xhtml+xml' Media Type, M. Baker, P. Stark, January 2002. Available at http://www.ietf.org/rfc/rfc3236.txt.
RFC3261: IETF RFC 3261: SIP: Session Initiation Protocol, J. Rosenberg, H. Schulzrinne, G. Camarillo, et al., June 2002. Available at http://www.ietf.org/rfc/rfc3261.txt.
RFC3920: IETF RFC 3920: Extensible Messaging and Presence Protocol (XMPP): Core, P. Saint-Andre, Ed., October 2004. Available at http://www.ietf.org/rfc/rfc3920.txt.
RFC977: IETF RFC 977: Network News Transfer Protocol, B. Kantor, P. Lapsley, February 1986. Available at http://www.ietf.org/rfc/rfc977.txt.
SOAP12: SOAP Version 1.2 Part 1: Messaging Framework, J. Moreau, N. Mendelsohn, H. Frystyk Nielsen, et al., Editors, W3C Recommendation, 24 June 2003, http://www.w3.org/TR/2003/REC-soap12-part1-20030624/. Latest version available at http://www.w3.org/TR/soap12-part1/.
SVG11: Scalable Vector Graphics (SVG) 1.1 Specification, 藤沢淳, J. Ferraiolo, D. Jackson, Editors, W3C Recommendation, 14 January 2003, http://www.w3.org/TR/2003/REC-SVG11-20030114/. Latest version available at http://www.w3.org/TR/SVG11/.
UNICODE: See the Unicode Consortium home page for information about the latest version of Unicode and character repertoires.
URI: Uniform Resource Identifiers (URI): Generic Syntax (T. Berners-Lee, R. Fielding, L. Masinter, Eds.) is currently being revised. Citations labeled [[!URI]] refer to "Uniform Resource Identifier (URI): Generic Syntax."
UniqueDNS: IAB Technical Comment on the Unique DNS Root, B. Carpenter, 27 September 1999. Available at http://www.icann.org/correspondence/iab-tech-comment-27sept99.htm.
VOICEXML2: Voice Extensible Markup Language (VoiceXML) Version 2.0, B. Porter, A. Hunt, K. Rehor, et al., Editors, W3C Recommendation, 16 March 2004, http://www.w3.org/TR/2004/REC-voicexml20-20040316/. Latest version available at http://www.w3.org/TR/voicexml20.
XHTML11: XHTML™ 1.1 - Module-based XHTML, S. McCarron, M. Altheim, Editors, W3C Recommendation, 31 May 2001, http://www.w3.org/TR/2001/REC-xhtml11-20010531. Latest version available at http://www.w3.org/TR/xhtml11/.
XLink10: XML Linking Language (XLink) Version 1.0, E. Maler, S. DeRose, D. Orchard, Editors, W3C Recommendation, 27 June 2001, http://www.w3.org/TR/2001/REC-xlink-20010627/. Latest version available at http://www.w3.org/TR/xlink/.
XML-ID: xml:id Version 1.0, D. Veillard, J. Marsh, Editors, W3C Working Draft (work in progress), 07 April 2004, http://www.w3.org/TR/2004/WD-xml-id-20040407. Latest version available at http://www.w3.org/TR/xml-id/.
XML10: Extensible Markup Language (XML) 1.0 (Third Edition), F. Yergeau, J. Paoli, C. M. Sperberg-McQueen, et al., Editors, W3C Recommendation, 04 February 2004, http://www.w3.org/TR/2004/REC-xml-20040204. Latest version available at http://www.w3.org/TR/REC-xml.
XML11: Extensible Markup Language (XML) 1.1, J. Paoli, C. M. Sperberg-McQueen, J. Cowan, et al., Editors, W3C Recommendation, 04 February 2004, http://www.w3.org/TR/2004/REC-xml11-20040204/. Latest version available at http://www.w3.org/TR/xml11/.
XMLNS: Namespaces in XML 1.1, R. Tobin, D. Hollander, A. Layman, et al., Editors, W3C Recommendation, 04 February 2004, http://www.w3.org/TR/2004/REC-xml-names11-20040204. Latest version available at http://www.w3.org/TR/xml-names11/.
XMLSCHEMA: XML Schema Part 1: Structures, D. Beech, M. Maloney, H. S. Thompson, et al., Editors, W3C Recommendation, 02 May 2001, http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/. Latest version available at http://www.w3.org/TR/xmlschema-1/.
XPTRFR: XPointer Framework, E. Maler, N. Walsh, P. Grosso, et al., Editors, W3C Recommendation, 25 March 2003, http://www.w3.org/TR/2003/REC-xptr-framework-20030325/. Latest version available at http://www.w3.org/TR/xptr-framework/.
XSLT10: XSL Transformations (XSLT) Version 1.0, J. Clark, Editor, W3C Recommendation, 16 November 1999, http://www.w3.org/TR/1999/REC-xslt-19991116. Latest version available at http://www.w3.org/TR/xslt.

4273 |

4274 |

4275 | Architectural Specifications 4276 |

4277 |

ATAG10: Authoring Tool Accessibility Guidelines 1.0, C. McCathieNevile, I. Jacobs, J. Treviranus, et al., Editors, W3C Recommendation, 03 February 2000, http://www.w3.org/TR/2000/REC-ATAG10-20000203. Latest version available at http://www.w3.org/TR/ATAG10.
CHARMOD: Character Model for the World Wide Web 1.0: Fundamentals, R. Ishida, M. J. Dürst, M. Wolf, et al., Editors, W3C Working Draft (work in progress), 25 February 2004, http://www.w3.org/TR/2004/WD-charmod-20040225/. Latest version available at http://www.w3.org/TR/charmod/.
DIPRINCIPLES: Device Independence Principles, R. Gimson, Editor, W3C Note, 01 September 2003, http://www.w3.org/TR/2003/NOTE-di-princ-20030901/. Latest version available at http://www.w3.org/TR/di-princ/.
EXTLANG: Web Architecture: Extensible Languages, T. Berners-Lee, D. Connolly, 10 February 1998. This W3C Note is available at http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210.
Fielding: Principled Design of the Modern Web Architecture, R.T. Fielding and R.N. Taylor, UC Irvine. In Proceedings of the 2000 International Conference on Software Engineering (ICSE 2000), Limerick, Ireland, June 2000, pp. 407-416. This document is available at http://www.ics.uci.edu/~fielding/pubs/webarch_icse2000.pdf.
QA: QA Framework: Specification Guidelines, D. Hazaël-Massieux, L. Rosenthal, L. Henderson, et al., Editors, W3C Working Draft (work in progress), 30 August 2004, http://www.w3.org/TR/2004/WD-qaframe-spec-20040830/. Latest version available at http://www.w3.org/TR/qaframe-spec/.
RFC1958: IETF RFC 1958: Architectural Principles of the Internet, B. Carpenter, June 1996. Available at http://www.ietf.org/rfc/rfc1958.txt.
SPECVAR: Variability in Specifications, L. Rosenthal, D. Hazaël-Massieux, Editors, W3C Working Draft (work in progress), 30 August 2004, http://www.w3.org/TR/2004/WD-spec-variability-20040830/. Latest version available at http://www.w3.org/TR/spec-variability/.
UAAG10: User Agent Accessibility Guidelines 1.0, J. Gunderson, I. Jacobs, E. Hansen, Editors, W3C Recommendation, 17 December 2002, http://www.w3.org/TR/2002/REC-UAAG10-20021217/. Latest version available at http://www.w3.org/TR/UAAG10/.
WCAG20: Web Content Accessibility Guidelines 2.0, W. Chisholm, J. White, B. Caldwell, et al., Editors, W3C Working Draft (work in progress), 30 July 2004, http://www.w3.org/TR/2004/WD-WCAG20-20040730/. Latest version available at http://www.w3.org/TR/WCAG20/.
WSA: Web Services Architecture, D. Booth, F. McCabe, E. Newcomer, et al., Editors, W3C Note, 11 February 2004, http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/. Latest version available at http://www.w3.org/TR/ws-arch/.
XAG: XML Accessibility Guidelines, S. B. Palmer, C. McCathieNevile, D. Dardailler, Editors, W3C Working Draft (work in progress), 03 October 2002, http://www.w3.org/TR/2002/WD-xag-20021003. Latest version available at http://www.w3.org/TR/xag.

4415 |

4416 |

4417 |

4418 |

4419 |

4420 | Acknowledgments 4421 |

4422 |

This document was authored by the W3C Technical Architecture Group, which included the following participants: Tim Berners-Lee (co-Chair, W3C), Tim Bray (Antarctica Systems), Dan Connolly (W3C), Paul Cotton (Microsoft Corporation), Roy Fielding (Day Software), Mario Jeckle (Daimler Chrysler), Chris Lilley (W3C), Noah Mendelsohn (IBM), David Orchard (BEA Systems), Norman Walsh (Sun Microsystems), and Stuart Williams (co-Chair, Hewlett-Packard).

4431 |

4432 | The TAG appreciates the many contributions on the TAG's public mailing 4433 | list, www-tag@w3.org (archive), which have 4435 | helped to improve this document. 4436 |

4437 |

4438 | In addition, contributions by David Booth, Erik Bruchez, Kendall Clark, 4439 | Karl Dubost, Bob DuCharme, Martin Duerst, Olivier Fehr, Al Gilman, Tim 4440 | Goodwin, Elliotte Rusty Harold, Tony Hammond, Sandro Hawke, Ryan Hayes, 4441 | Dominique Hazaël-Massieux, Masayasu Ishikawa, David M. Karr, Graham 4442 | Klyne, Jacek Kopecky, Ken Laskey, Susan Lesch, Håkon Wium Lie, Frank 4443 | Manola, Mark Nottingham, Bijan Parsia, Peter F. Patel-Schneider, David 4444 | Pawson, Michael Sperberg-McQueen, Patrick Stickler, and Yuxiao Zhao are 4445 | gratefully acknowledged. 4446 |

4447 |

4448 | 4449 | 4450 | -------------------------------------------------------------------------------- /tidyconfig.txt: -------------------------------------------------------------------------------- 1 | char-encoding: utf8 2 | indent: yes 3 | wrap: 80 4 | tidy-mark: no 5 | --------------------------------------------------------------------------------