4 |
5 |
6 |
7 | Architecture of the World Wide Web (Second Edition)
8 |
9 |
12 |
101 |
102 |
103 |
104 |
105 |
106 | The World Wide Web uses relatively simple technologies with sufficient
107 | scalability, efficiency and utility that they have resulted in a
108 | remarkable interconnected space of information
110 | and services, growing across languages, cultures, and media. In an effort to preserve these properties of
112 | the information space as the technologies
113 | evolve, this architecture document discusses the core design components
114 | of the Web. They are identification of information and
116 | services, representation of information state and service
118 | requests, and the protocols that support the interaction between
119 | agents and resources in the space. We relate
120 | core design components, constraints, and good practices to the
121 | principles and properties they support.
122 |
123 |
124 |
125 |
126 | This is an unofficial draft and work in progress. It has no official
127 | standing: rather it represents a trial balloon to help the TAG decide whether to proceed with a
129 | second edition of AWWW or
130 | not.
131 |
132 |
133 | This draft highlights most differences from the first edition of
134 | AWWW, with deletions presented like this and insertions presented like this.
137 |
138 |
139 | This section describes the status of this document at the time of
140 | its publication. Other documents may supersede this document. A list of
141 | current W3C publications and the latest revision of this technical
142 | report can be found in the W3C
143 | technical reports index at http://www.w3.org/TR/.
144 |
145 |
146 | This document has been developed by W3C's Technical Architecture Group (TAG),
148 | which, by charter,
149 | maintains a list of
150 | architectural issues. The scope of this document is a useful subset
151 | of those issues; it is not intended to address all of them. The TAG
152 | intends to address the remaining (and future) issues after publication
153 | of Volume Two as a Recommendation.
154 |
155 |
156 | This document uses concepts and terms regarding URIs in general, and http: URIs in particular, as
158 | defined by the IETF. In an
160 | 18 Oct 2004 announcement, the revision of RFC2396 was endorsed as
161 | an IETF Specification, though the latest published draft as of this
162 | writing is draft-fielding-uri-rfc2396bis-07.
164 | The [URI] specification is the primary normative reference for URIs in
167 | general. For http: (and https:, and the HTTP
168 | protocol), [[!HTTP11]] is the specification currently in force, but
169 | [[!HTTPbis]], which will update it, is nearing final approval, and we
170 | assume it will be in force by the time this second edition is
171 | completed.
172 |
173 |
174 | The references and caveats wrt HTTP(bis) above will need to be
175 | corrected as and when HTTPbis is approved HTTP11.2e, or whatever.
176 |
177 |
178 |
179 |
180 | List of Principles, Constraints, and Good Practice Notes
181 |
182 |
183 | The following principles, constraints, and good practice notes are
184 | discussed in this document and listed here for convenience. There is
185 | also a free-standing summary.
186 |
321 | The World Wide Web (WWW, or simply
322 | Web) is an information space in which the items
323 | of interest, referred to as resources, are identified by global
324 | identifiers called Uniform Resource Identifiers
325 | (URI).
326 |
327 |
328 | Examples such as the following travel scenario are used
329 | throughout this document to illustrate typical behavior of Web
330 | agents—people or software acting on this information space. A
331 | user agent acts on behalf of a user. Software agents include
332 | servers, proxies, spiders, browsers, and multimedia players.
333 |
334 |
335 |
336 | Story
337 |
338 |
339 |
340 | While planning a trip to Mexico, Nadia reads “Oaxaca weather
341 | information: 'http://weather.example.com/oaxaca'” in a glossy
342 | travel magazine. Nadia has enough experience with the Web to
343 | recognize that "http://weather.example.com/oaxaca" is a URI and
344 | that she is likely to be able to retrieve associated information
345 | with her Web browser. When Nadia enters the URI into her browser:
346 |
347 |
348 |
The browser recognizes that what Nadia typed is a URI.
349 |
350 |
The browser performs an information retrieval action in
351 | accordance with its configured behavior for resources identified
352 | via the "http" URI scheme.
353 |
354 |
The authority responsible for "weather.example.com" provides
355 | information in a response to the retrieval request.
356 |
357 |
The browser interprets the response, identified as XHTML by the
358 | server, and performs additional retrieval actions for inline
359 | graphics and other content as necessary.
360 |
361 |
The browser displays the retrieved information, which includes
362 | hypertext links to other information. Nadia can follow these
363 | hypertext links to retrieve additional information.
364 |
365 |
366 |
367 |
368 |
369 | This scenario illustrates the three architectural bases of the Web that
370 | are discussed in this document:
371 |
372 |
373 |
374 |
375 | Identification. URIs are used to identify resources. In this
376 | travel scenario, the resource is a periodically updated report on
377 | the weather in Oaxaca, and the URI is
378 | “http://weather.example.com/oaxaca”.
379 |
380 |
381 |
382 |
383 | Interaction. Web agents communicate using standardized
384 | protocols that enable interaction through the exchange of messages
385 | which adhere to a defined syntax and semantics. By entering a URI
386 | into a retrieval dialog or selecting a hypertext link, Nadia tells
387 | her browser to perform a retrieval action for the resource
388 | identified by the URI. In this example, the browser sends an HTTP
389 | GET request (part of the HTTP protocol) to the server at
390 | "weather.example.com", via TCP/IP port 80, and the server sends
391 | back a message containing what it determines to be a representation
392 | of the resource as of the time that representation was generated.
393 | Note that this example is specific to hypertext browsing of
394 | information—other kinds of interaction are possible, both within
395 | browsers and through the use of other types of Web agent; our
396 | example is intended to illustrate one common interaction, not
397 | define the range of possible interactions or limit the ways in
398 | which agents might use the Web.
399 |
400 |
401 |
402 |
403 | Formats. Most protocols used for representation retrieval
404 | and/or submission make use of a sequence of one or more messages,
405 | which taken together contain a payload of representation data and
406 | metadata, to transfer the representation between agents. The choice
407 | of interaction protocol places limits on the formats of
408 | representation data and metadata that can be transmitted. HTTP, for
409 | example, typically transmits a single octet stream plus metadata,
410 | and uses the "Content-Type" and "Content-Encoding" header fields to
411 | further identify the format of the representation. In this
412 | scenario, the representation transferred is in XHTML, as identified
413 | by the "Content-Type" HTTP header field containing the registered
414 | Internet media type name, "application/xhtml+xml". That Internet
415 | media type name indicates that the representation data can be
416 | processed according to the XHTML specification.
417 |
418 |
419 | Nadia's browser is configured and programmed to interpret the
420 | receipt of an "application/xhtml+xml" typed representation as an
421 | instruction to render the content of that representation according
422 | to the XHTML rendering model, including any subsidiary interactions
423 | (such as requests for external style sheets or in-line images)
424 | called for by the representation. In the scenario, the XHTML
425 | representation data received from the initial request instructs
426 | Nadia's browser to also retrieve and render in-line the weather
427 | maps, each identified by a URI and thus causing an additional
428 | retrieval action, resulting in additional representations that are
429 | processed by the browser according to their own data formats (e.g.,
430 | "application/svg+xml" indicates the SVG data format), and this
431 | process continues until all of the data formats have been rendered.
432 | The result of all of this processing, once the browser has reached
433 | an application steady-state that completes Nadia's initial
434 | requested action, is commonly referred to as a "Web page".
435 |
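The Interaction and Formats steps of the scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular browser behaves: the helper names build_get_request and media_type are ours, the request is formatted as text and never sent over the network, and the Content-Type value passed in at the end is an assumed example of what the server might return.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str) -> str:
    """Format the HTTP GET request an agent might send for an "http" URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("this sketch only handles 'http' URIs with a host")
    # An empty path is requested as "/"; a query, if present, is part of the target.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Port 80 is the default for "http" and is left out of the Host field.
    host = parts.hostname
    if parts.port not in (None, 80):
        host = f"{host}:{parts.port}"
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def media_type(content_type: str) -> str:
    """Return the Internet media type from a Content-Type field value,
    dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


request = build_get_request("http://weather.example.com/oaxaca")
print(request.splitlines()[0])                             # GET /oaxaca HTTP/1.1
print(media_type("application/xhtml+xml; charset=utf-8"))  # application/xhtml+xml
```

An agent would use the media type returned by the second helper to choose how to process the representation data, for example by handing it to an XHTML renderer.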
436 |
437 |
438 |
439 | The following illustration shows the relationship between identifier,
440 | resource, and representation.
441 |
442 |
443 |
446 |
447 | In the remainder of this document, we highlight important
448 | architectural points regarding Web identifiers, protocols, and
449 | formats. We also discuss some important general architectural
450 | principles and how they apply to the Web.
451 |
452 |
453 |
454 |
455 | About this Document
456 |
457 |
458 | This document describes the properties we desire of the Web and the
459 | design choices that have been made to achieve them. It promotes the
460 | reuse of existing standards when suitable, and gives guidance on how
461 | to innovate in a manner consistent with Web architecture.
462 |
463 |
464 | The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used in the
465 | principles, constraints, and good practice notes in accordance with
466 | RFC 2119 [[!RFC2119]].
467 |
468 |
469 | This document does not include conformance provisions for these
470 | reasons:
471 |
472 |
473 |
Conforming software is expected to be so diverse that it would
474 | not be useful to be able to refer to the class of conforming software
475 | agents.
476 |
477 |
Some of the good practice notes concern people; specifications
478 | generally define conformance for software, not people.
479 |
480 |
We do not believe that the addition of a conformance section is
481 | likely to increase the utility of the document.
482 |
483 |
484 |
485 |
486 | Audience of this Document
487 |
488 |
489 | This document is intended to inform discussions about issues of Web
490 | architecture. The intended audience for this document includes:
491 |
492 |
493 |
Participants in W3C Activities
494 |
495 |
Other groups and individuals designing technologies to be
496 | integrated into the Web
497 |
498 |
Implementers of W3C specifications
499 |
500 |
Web content authors and publishers
501 |
502 |
503 |
504 | Note: This document does not distinguish in any
505 | formal way the terms "language" and "format." Context determines
506 | which term is used. The phrase "specification designer" encompasses
507 | language, format, and protocol designers.
508 |
509 |
510 |
511 |
512 | Scope of this Document
513 |
514 |
515 | This document presents the general architecture of the Web. Other
516 | groups inside and outside W3C also address specialized aspects of
517 | Web architecture, including accessibility, quality assurance,
518 | internationalization, device independence, and Web Services. The
519 | section on Architectural Specifications includes references
520 | to these related specifications.
521 |
522 |
523 | This document strives for a balance between brevity and precision
524 | while including illustrative examples. TAG findings are
526 | informational documents that complement the current document by
527 | providing more detail about selected topics. This document includes
528 | some excerpts from the findings. Since the findings evolve
529 | independently, this document includes references to approved TAG
530 | findings. For other TAG issues covered by this document but without
531 | an approved finding, references are to entries in the TAG issues list.
533 |
534 |
535 | Many of the examples in this document that involve human activity
536 | suppose the familiar Web interaction model (illustrated at the
537 | beginning of the Introduction) where a person follows a link via a
538 | user agent, the user agent retrieves and presents data, the user
539 | follows another link, etc. This document does not discuss in any
540 | detail other interaction models such as voice browsing (see, for
541 | example, [[!VOICEXML20]]). The choice of interaction model may have
542 | an impact on expected agent behavior. For instance, when a
543 | graphical user agent running on a laptop computer or hand-held
544 | device encounters an error, the user agent can report errors
545 | directly to the user through visual and audio cues, and present the
546 | user with options for resolving the errors. On the other hand, when
547 | someone is browsing the Web through voice input and audio-only
548 | output, stopping the dialog to wait for user input may reduce
549 | usability since it is so easy to "lose one's place" when browsing
550 | with only audio-output. This document does not discuss how the
551 | principles, constraints, and good practices identified here apply
552 | in all interaction contexts.
553 |
554 |
555 |
556 |
557 | Principles, Constraints, and Good Practice Notes
558 |
559 |
560 | The important points of this document are categorized as follows:
561 |
562 |
563 |
564 | Principle
565 |
566 |
567 | An architectural principle is a fundamental rule that applies to
568 | a large number of situations and variables. Architectural
569 | principles include "separation of concerns", "generic interface",
570 | "self-descriptive syntax," "visible semantics," "network effect"
571 | (Metcalfe's Law), and Amdahl's Law: "The speed of a system is
572 | limited by its slowest component."
573 |
574 |
575 | Constraint
576 |
577 |
578 | In the design of the Web, some choices, like the names of the
579 | p and li elements in HTML, the choice
580 | of the colon (:) character in URIs, or grouping bits into
581 | eight-bit units (octets), are somewhat arbitrary; if
582 | paragraph had been chosen instead of p
583 | or asterisk (*) instead of colon, the large-scale result would,
584 | most likely, have been the same. This document focuses on more
585 | fundamental design choices: design choices that lead to
586 | constraints, i.e., restrictions in behavior or interaction within
587 | the system. Constraints may be imposed for technical, policy, or
588 | other reasons to achieve desirable properties in the system, such
589 | as accessibility, global scope, relative ease of evolution,
590 | efficiency, and dynamic extensibility.
591 |
592 |
593 | Good practice
594 |
595 |
596 | Good practice—by software developers, content authors, site
597 | managers, users, and specification designers—increases the value
598 | of the Web.
599 |
600 |
601 |
602 |
603 |
604 |
605 |
606 | Identification
607 |
608 |
609 | In order to communicate internally, a community agrees (to a reasonable
610 | extent) on a set of terms and their meanings. One goal of the Web,
611 | since its inception, has been to build a global community in which any
612 | party can share information with any other party. To achieve this goal,
613 | the Web makes use of a single global identification system: the URI.
614 | URIs are a cornerstone of Web architecture, providing identification
615 | that is common across the Web. The global scope of URIs promotes
616 | large-scale "network effects": the value of an identifier increases the
617 | more it is used consistently (for example, the more it is used in
618 | hypertext links).
619 |
626 | Global naming leads to global network effects.
627 |
628 |
629 |
630 | This principle dates back at least as far as Douglas Engelbart's
631 | seminal work on open hypertext systems; see section Every Object
633 | Addressable in [[!Eng90]].
634 |
635 |
636 |
637 | Benefits of URIs
638 |
639 |
640 | The choice of syntax for global identifiers is somewhat arbitrary; it
641 | is their global scope that is important. The Uniform Resource
642 | Identifier, [[!URI]], has been successfully deployed since the
643 | creation of the Web. There are substantial benefits to participating
644 | in the existing network of URIs, including linking, bookmarking,
645 | caching, and indexing by search engines, and there are substantial
646 | costs to creating a new identification system that has the same
647 | properties as URIs.
648 |
655 | To benefit from and increase the value of the World Wide Web,
656 | agents should provide URIs as identifiers for resources.
657 |
658 |
659 |
660 | A resource should have an associated URI if another party might
661 | reasonably want to create a hypertext link to it, make or refute
662 | assertions about it, retrieve or cache a representation of it,
663 | include all or part of it by reference into another representation,
664 | annotate it, or perform other operations on it. Software developers
665 | should expect that sharing URIs across applications will be useful,
666 | even if that utility is not initially evident. The TAG finding
667 | "URIs,
669 | Addressability, and the use of HTTP GET and POST"
670 | discusses additional benefits and considerations of URI
671 | addressability.
672 |
673 |
674 | Note: Some URI schemes (such as the "ftp" URI scheme
675 | specification) use the term "designate" where this document uses
676 | "identify."
677 |
678 |
679 |
680 |
681 | URI/Resource Relationships
682 |
683 |
684 | By design a URI identifies one resource. We do not limit the scope of
685 | what might be a resource. The term "resource" is used in a
686 | general sense for whatever might be identified by a URI. It is
687 | conventional on the hypertext Web to describe Web pages, images,
688 | product catalogs, etc. as “resources”. The distinguishing
689 | characteristic of these resources is that all of their essential
690 | characteristics can be conveyed in a message. We identify this set as
691 | “information resources.”
692 |
693 |
694 | This document is an example of an information resource. It consists
695 | of words and punctuation symbols and graphics and other artifacts
696 | that can be encoded, with varying degrees of fidelity, into a
697 | sequence of bits. There is nothing about the essential information
698 | content of this document that cannot in principle be transferred in a
699 | message. In the case of this document, the message payload is the
700 | representation of this document.
701 |
702 |
703 | However, our use of the term resource is intentionally more broad.
704 | Other things, such as cars and dogs (and, if you've printed this
705 | document on physical sheets of paper, the artifact that you are
706 | holding in your hand), are resources too. They are not information
707 | resources, however, because their essence is not information.
708 | Although it is possible to describe a great many things about a car
709 | or a dog in a sequence of bits, the sum of those things will
710 | invariably be an approximation of the essential character of the
711 | resource.
712 |
713 |
714 | We define the term “information resource” because we observe that it
715 | is useful in discussions of Web technology and may be useful in
716 | constructing specifications for facilities built for use on the Web.
717 |
724 | Assign distinct URIs to distinct resources.
725 |
726 |
727 |
728 | Since the scope of a URI is global, the resource identified by a URI
729 | does not depend on the context in which the URI appears (see also the
730 | section about indirect identification).
731 |
732 |
733 | [[!URI]] is an agreement about how the Internet community allocates
734 | names and associates them with the resources they identify. URIs are
735 | divided into schemes that define, via their scheme
736 | specification, the mechanism by which scheme-specific identifiers are
737 | associated with resources. For example, the "http" URI scheme
738 | ([[!HTTP11]]) uses DNS and TCP-based HTTP servers for the purpose of
739 | identifier allocation and resolution. As a result, identifiers such
740 | as "http://example.com/somepath#someFrag" often take on meaning
741 | through the community experience of performing an HTTP GET request on
742 | the identifier and, if given a successful response, interpreting the
743 | response as a representation of the identified resource. (See also
744 | Fragment Identifiers.) Of course, a retrieval action like GET
745 | is not the only way to obtain information about a resource. One might
746 | also publish a document that purports to define the meaning of a
747 | particular URI. These other sources of information may suggest
748 | meanings for such identifiers, but it's a local policy decision
749 | whether those suggestions should be heeded.
750 |
751 |
752 | Just as one might wish to refer to a person by different names (by
753 | full name, first name only, sports nickname, romantic nickname, and
754 | so forth), Web architecture allows the association of more than one
755 | URI with a resource. URIs that identify the same resource are called
756 | URI aliases. The section on URI aliases discusses
757 | some of the potential costs of creating multiple URIs for the same
758 | resource.
759 |
760 |
761 | Several sections of this document address questions about the
762 | relationship between URIs and resources, including:
763 |
764 |
765 |
How much can I tell about a resource by inspection of a URI that
766 | identifies it? See the sections on URI schemes and URI
767 | opacity.
768 |
769 |
Who determines what resource a URI identifies? See the section on
770 | URI allocation.
771 |
787 | By design, a URI identifies one resource. Using the same URI to
788 | directly identify different resources produces a URI
789 | collision. Collision often imposes a cost in communication
790 | due to the effort required to resolve ambiguities.
791 |
792 |
793 | Suppose, for example, that one organization makes use of a URI to
794 | refer to the movie The Sting, and another organization
795 | uses the same URI to refer to a discussion forum about The
796 | Sting. To a third party, aware of both organizations, this
797 | collision creates confusion about what the URI identifies,
798 | undermining the value of the URI. If one wanted to talk about the
799 | creation date of the resource identified by the URI, for instance,
800 | it would not be clear whether this meant "when the movie was
801 | created" or "when the discussion forum about the movie was
802 | created."
803 |
804 |
805 | Social and technical solutions have been devised to help avoid URI
806 | collision. However, the success or failure of these different
807 | approaches depends on the extent to which there is consensus in the
808 | Internet community on abiding by the defining specifications.
809 |
810 |
811 | The section on URI allocation examines
812 | approaches for establishing the authoritative source of information
813 | about what resource a URI identifies.
814 |
815 |
816 | URIs are sometimes used for indirect identification. This
817 | does not necessarily lead to collisions.
818 |
819 |
820 |
821 |
822 | URI allocation
823 |
824 |
825 | URI allocation is the process of associating a URI with a resource.
826 | Allocation can be performed both by resource owners and by other
827 | parties. It is important to avoid URI collision.
828 |
829 |
830 |
831 | URI ownership
832 |
833 |
834 | URI ownership is a relation between a URI and a social
835 | entity, such as a person, organization, or specification. URI
836 | ownership gives the relevant social entity certain rights,
837 | including:
838 |
839 |
840 |
to pass on ownership of some or all owned URIs to another
841 | owner—delegation; and
842 |
843 |
to associate a resource with an owned URI—URI allocation.
844 |
845 |
846 |
847 | By social convention, URI ownership is delegated from the IANA
848 | URI scheme registry [[!IANASchemes]], itself a social entity, to
849 | IANA-registered URI scheme specifications. Some URI scheme
850 | specifications further delegate ownership to subordinate
851 | registries or to other nominated owners, who may further delegate
852 | ownership. In the case of a specification, ownership ultimately
853 | lies with the community that maintains the specification.
854 |
855 |
856 | The approach taken for the "http" URI scheme, for example,
857 | follows the pattern whereby the Internet community delegates
858 | authority, via the IANA URI scheme registry and the DNS, over a
859 | set of URIs with a common prefix to one particular owner. One
860 | consequence of this approach is the Web's heavy reliance on the
861 | central DNS registry. A different approach is taken by the URN
862 | Syntax scheme [[!RFC2141]], which delegates ownership of portions
863 | of URN space to URN Namespace specifications which themselves are
864 | registered in an IANA-maintained registry of URN Namespace
865 | Identifiers.
866 |
867 |
868 | URI owners are responsible for avoiding the assignment of
869 | equivalent URIs to multiple resources. Thus, if a URI scheme
870 | specification does provide for the delegation of individual or
871 | organized sets of URIs, it should take pains to ensure that
872 | ownership ultimately resides in the hands of a single social
873 | entity. Allowing multiple owners increases the likelihood of URI
874 | collisions.
875 |
876 |
877 | URI owners may organize or deploy infrastructure to ensure that
878 | representations of associated resources are available and, where
879 | appropriate, interaction with the resource is possible through
880 | the exchange of representations. There are social expectations
881 | for responsible representation management
882 | by URI owners. Additional social implications of URI ownership
883 | are not discussed here.
884 |
885 |
886 | See TAG issue siteData-36,
888 | which concerns the expropriation of naming authority.
889 |
890 |
891 |
892 |
893 | Other allocation schemes
894 |
895 |
896 | Some schemes use techniques other than delegated ownership to
897 | avoid collision. For example, the specification for the data URL
898 | (sic) scheme [[!RFC2397]] specifies that the resource identified
899 | by a data scheme URI has only one possible representation. The
900 | representation data makes up the URI that identifies that
901 | resource. Thus, the specification itself determines how data URIs
902 | are allocated; no delegation is possible.
903 |
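The way a data URI carries its own representation can be sketched as follows. The sketch assumes the base64 form of the general syntax "data:[<mediatype>][;base64],<data>" and ignores the percent-encoded form; the function names make_data_uri and read_data_uri and the sample text are ours, chosen only for illustration.

```python
import base64
from typing import Tuple


def make_data_uri(data: bytes, media_type: str = "text/plain") -> str:
    """Build a data URI whose characters contain the representation data."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"


def read_data_uri(uri: str) -> Tuple[str, bytes]:
    """Recover the media type and representation data from a base64 data URI."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("this sketch only handles base64 data URIs")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


uri = make_data_uri(b"Weather in Oaxaca")
print(uri)
print(read_data_uri(uri))
```

Because the URI is derived entirely from the representation data, no owner needs to be consulted to find out what it identifies, and two parties who encode the same data with the same media type arrive at the same URI.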
904 |
905 | Other schemes (such as "news:comp.text.xml") rely on a social
906 | process.
907 |
908 |
909 |
910 |
911 |
912 | Indirect Identification
913 |
914 |
915 | To say that the URI "mailto:nadia@example.com" identifies both an
916 | Internet mailbox and Nadia, the person, introduces a URI collision.
917 | However, we can use the URI to indirectly identify Nadia.
918 | Identifiers are commonly used in this way.
919 |
920 |
921 | Listening to a news broadcast, one might hear a report on Britain
922 | that begins, "Today, 10 Downing Street announced a series of new
923 | economic measures." Generally, "10 Downing Street" identifies the
924 | official residence of Britain's Prime Minister. In this context,
925 | the news reporter is using it (as English rhetoric allows) to
926 | indirectly identify the British government. Similarly, URIs
927 | identify resources, but they can also be used in many constructs to
928 | indirectly identify other resources. Globally adopted assignment
929 | policies make some URIs appealing as general-purpose identifiers.
930 | Local policy establishes what they indirectly identify.
931 |
932 |
933 | Suppose that nadia@example.com is Nadia's email
934 | address. The organizers of a conference Nadia attends might use
935 | "mailto:nadia@example.com" to refer indirectly to her (e.g., by
936 | using the URI as a database key in their database of conference
937 | participants). This does not introduce a URI collision.
938 |
939 |
940 |
941 |
942 |
943 | URI Comparisons
944 |
945 |
946 | URIs that are identical, character-by-character, refer to the same
947 | resource. Since Web Architecture allows the association of multiple
948 | URIs with a given resource, two URIs that are not
949 | character-by-character identical may still refer to the same
950 | resource. Different URIs do not necessarily refer to different
951 | resources, but there is generally a higher computational cost to
952 | determine that different URIs refer to the same resource.
953 |
954 |
955 | To reduce the risk of a false negative (i.e., an incorrect conclusion
956 | that two URIs do not refer to the same resource) or a false positive
957 | (i.e., an incorrect conclusion that two URIs do refer to the same
958 | resource), some specifications describe equivalence tests in addition
959 | to character-by-character comparison. Agents that reach conclusions
960 | based on comparisons that are not licensed by the relevant
961 | specifications take responsibility for any problems that result; see
962 | the section on error handling for more
963 | information about responsible behavior when reaching unlicensed
964 | conclusions. Section 6 of [[!URI]] provides more information about
965 | comparing URIs and reducing the risk of false negatives and
966 | positives.
967 |
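The difference between character-by-character comparison and a specification-licensed equivalence test can be sketched as below. The normalize function is our own simplified illustration of a small part of the syntax-based normalization described in Section 6 of the URI specification: it lowercases the scheme and host, drops a default port, and treats an empty path as "/". It does not normalize percent-encoding or dot-segments, and it ignores userinfo and IPv6 literals, so it should not be read as a complete equivalence test.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports assumed for this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(uri: str) -> str:
    """Apply a few case and default-port normalizations to a URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    # With an authority present, an empty path is equivalent to "/".
    path = (parts.path or "/") if netloc else parts.path
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


a = "http://weather.example.com/oaxaca"
b = "HTTP://Weather.Example.COM:80/oaxaca"
print(a == b)                        # False: the character strings differ
print(normalize(a) == normalize(b))  # True: equivalent under these rules
```

Note that the path is compared as-is: "http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca" remain different under this sketch, since nothing in the generic syntax licenses the conclusion that they identify the same resource.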
977 | Although there are benefits (such as naming flexibility) to URI
978 | aliases, there are also costs. URI aliases are harmful when they
979 | divide the Web of related resources. A corollary of Metcalfe's
980 | Principle (the "network effect") is that the value of a given
981 | resource can be measured by the number and value of other resources
982 | in its network neighborhood, that is, the resources that link to
983 | it.
984 |
985 |
986 | The problem with aliases is that if half of the neighborhood points
987 | to one URI for a given resource, and the other half points to a
988 | second, different URI for that same resource, the neighborhood is
989 | divided. Not only is the aliased resource undervalued because of
990 | this split, but the entire neighborhood of resources also loses value
991 | because of the missing second-order relationships that should have
992 | existed among the referring resources by virtue of their references
993 | to the aliased resource.
994 |
1001 | A URI owner SHOULD NOT associate arbitrarily different URIs with
1002 | the same resource.
1003 |
1004 |
1005 |
1006 | URI consumers also have a role in ensuring URI consistency. For
1007 | instance, when transcribing a URI, agents should not gratuitously
1008 | percent-encode characters. The term "character" refers to URI
1009 | characters as defined in section 2 of [[!URI]]; percent-encoding is
1010 | discussed in section 2.1 of that specification.
1011 |
1018 | An agent that receives a URI SHOULD refer to the associated
1019 | resource using the same URI, character-by-character.
1020 |
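A short sketch shows why gratuitous percent-encoding works against consistency. The two strings below are our own example: the second is the Oaxaca URI with the letter "o" needlessly written as "%6F". Decoding shows that the two are closely related, yet a consumer that compares URIs character-by-character treats them as different identifiers.

```python
from urllib.parse import unquote

received = "http://weather.example.com/oaxaca"
# The same URI with the letter "o" gratuitously percent-encoded as %6F.
transcribed = "http://weather.example.com/%6Faxaca"

# Character-by-character comparison sees two different URIs.
print(received == transcribed)           # False
# Only after decoding do the strings coincide.
print(unquote(transcribed) == received)  # True
```

An agent that passes on the received string unchanged avoids creating this kind of accidental alias.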
1021 |
1022 |
1023 | When a URI alias does become common currency, the URI owner should use protocol techniques such
1025 | as server-side redirects to relate the two URIs. The community
1026 | benefits when the URI owner supports redirection of an aliased URI
1027 | to the corresponding "official" URI. For more information on
1028 | redirection, see section 10.3, Redirection, in [[!HTTP11]]. See
1029 | also [[!CHIPS]] for a discussion of some best practices for server
1030 | administrators.
1031 |
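The redirection technique can be sketched without a real server. In the sketch below, the alias URI and the alias table are hypothetical, and respond is our own stand-in for the part of a server that chooses a status code and header fields; it is not the API of any particular server software.

```python
from http import HTTPStatus
from typing import Dict, Tuple

# Hypothetical alias table: alias URI -> "official" URI for the same resource.
OFFICIAL_URI = {
    "http://weather.example.com/weather/oaxaca": "http://weather.example.com/oaxaca",
}


def respond(request_uri: str) -> Tuple[int, Dict[str, str]]:
    """Return a status code and header fields for a request.

    Requests for a known alias are answered with a permanent redirect
    whose Location field carries the official URI.
    """
    official = OFFICIAL_URI.get(request_uri)
    if official is not None:
        return int(HTTPStatus.MOVED_PERMANENTLY), {"Location": official}
    return int(HTTPStatus.OK), {}


print(respond("http://weather.example.com/weather/oaxaca"))
print(respond("http://weather.example.com/oaxaca"))
```

Agents that follow the redirect and record the official URI help reunite the references that the alias had divided.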
1032 |
1033 |
1034 |
1035 | Representation reuse
1036 |
1037 |
1038 | URI aliasing only occurs when more than one URI is used to identify
1039 | the same resource. The fact that different resources sometimes have
1040 | the same representation does not make the URIs for those resources
1041 | aliases.
1042 |
1043 |
1044 |
1045 | Story
1046 |
1047 |
1048 |
1049 | Dirk would like to add a link from his Web site to the Oaxaca
1050 | weather site. He uses the URI http://weather.example.com/oaxaca
1051 | and labels his link “report on weather in Oaxaca on
1052 | 1 August 2004”. Nadia points out to Dirk that he is
1053 | setting misleading expectations for the URI he has used. The
1054 | Oaxaca weather site policy is that the URI in question
1055 | identifies a report on the current weather in Oaxaca—on any
1056 | given day—and not the weather on 1 August. Of course, on the
1057 | first of August in 2004, Dirk's link will be correct, but the
1058 | rest of the time he will be misleading readers. Nadia points
1059 | out to Dirk that the managers of the Oaxaca weather site do
1060 | make available a different URI permanently assigned to a
1061 | resource reporting on the weather on 1 August 2004.
1062 |
1063 |
1064 |
1065 |
1066 | In this story, there are two resources: “a report on the current
1067 | weather in Oaxaca” and “a report on the weather in Oaxaca on
1068 | 1 August 2004”. The managers of the Oaxaca weather site
1069 | assign two URIs to these two different resources. On
1070 | 1 August 2004, the representations for these resources
1071 | are identical. The fact that dereferencing two different URIs
1072 | produces identical representations does not imply that the two URIs
1073 | are aliases.
1074 |
1075 |
1076 |
1077 |
1078 |
1079 | URI Schemes
1080 |
1081 |
1082 | In the URI "http://weather.example.com/", the "http" that appears
1083 | before the colon (":") names a URI scheme. Each URI scheme has a
1084 | specification that explains the scheme-specific details of how scheme
1085 | identifiers are allocated and become associated with a resource. The
1086 | URI syntax is thus a federated and extensible naming system wherein
1087 | each scheme's specification may further restrict the syntax and
1088 | semantics of identifiers within that scheme.
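
The scheme component can be picked out mechanically. The sketch below uses Python's standard `urllib.parse.urlsplit`; the helper name `scheme_of` is introduced here purely for illustration.

```python
from urllib.parse import urlsplit


def scheme_of(uri):
    """Return the scheme name, i.e. the part of the URI before the first colon."""
    return urlsplit(uri).scheme


http_scheme = scheme_of("http://weather.example.com/")
mailto_scheme = scheme_of("mailto:joe@example.com")
```

Anything after the scheme name is interpreted according to the scheme's own specification, which is why generic code can rely on little more than this split.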
1089 |
1090 |
1091 | Examples of URIs from various schemes include:
1092 |
1108 | While Web architecture allows the definition of new schemes,
1109 | introducing a new scheme is costly. Many aspects of URI processing
1110 | are scheme-dependent, and a large amount of deployed software already
1111 | processes URIs of well-known schemes. Introducing a new URI scheme
1112 | requires the development and deployment not only of client software
1113 | to handle the scheme, but also of ancillary agents such as gateways,
1114 | proxies, and caches. See [[!RFC2718]] for other considerations and
1115 | costs related to URI scheme design.
1116 |
1117 |
1118 | Because of these costs, if a URI scheme exists that meets the needs
1119 | of an application, designers should use it rather than invent one.
1120 |
1127 | A specification SHOULD reuse an existing URI scheme (rather than
1128 | create a new one) when it provides the desired properties of
1129 | identifiers and their relation to resources.
1130 |
1131 |
1132 |
1133 | Consider our travel scenario: should the
1134 | agent providing information about the weather in Oaxaca register a
1135 | new URI scheme "weather" for the identification of resources related
1136 | to the weather? They might then publish URIs such as
1137 | "weather://travel.example.com/oaxaca". When a software agent
1138 | dereferences such a URI, if what really happens is that HTTP GET is
1139 | invoked to retrieve a representation of the resource, then an "http"
1140 | URI would have sufficed.
1141 |
1142 |
1143 |
1144 | URI Scheme Registration
1145 |
1146 |
1147 | The Internet Assigned Numbers Authority (IANA)
1148 | maintains a registry [[!IANASchemes]] of mappings between URI
1149 | scheme names and scheme specifications. For instance, the IANA
1150 | registry indicates that the "http" scheme is defined in
1151 | [[!HTTP11]]. The process for registering a new URI scheme is
1152 | defined in [[!RFC2717]].
1153 |
1154 |
1155 | Unregistered URI schemes SHOULD NOT be used for a number of
1156 | reasons:
1157 |
1158 |
1159 |
There is no generally accepted way to locate the scheme
1160 | specification.
1161 |
1162 |
Someone else may be using the scheme for other purposes.
1163 |
1164 |
One should not expect that general-purpose software will do
1165 | anything useful with URIs of this scheme beyond URI comparison.
1166 |
1167 |
1168 |
1169 | One misguided motivation for registering a new URI scheme is to
1170 | allow a software agent to launch a particular application when
1171 | retrieving a representation. The same thing can be accomplished at
1172 | lower expense by dispatching instead on the type of the
1173 | representation, thereby allowing use of existing transfer protocols
1174 | and implementations.
1175 |
1176 |
1177 | Even if an agent cannot process representation data in an unknown
1178 | format, it can at least retrieve it. The data may contain enough
1179 | information to allow a user or user agent to make some use of it.
1180 | When an agent does not handle a new URI scheme, it cannot retrieve
1181 | a representation.
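
A minimal sketch of dispatching on the representation's type rather than on the URI scheme follows. The handler names and the fallback are hypothetical; the point is only that an unknown format can still be retrieved and handed to the user, whereas an unknown scheme would stop the agent before retrieval.

```python
# Hypothetical mapping from Internet media type to the component that handles it.
HANDLERS = {
    "text/html": "html-renderer",
    "image/svg+xml": "svg-viewer",
}


def choose_handler(media_type):
    """Pick a handler by media type; unknown formats are still retrievable."""
    return HANDLERS.get(media_type.lower(), "save-to-disk")
```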
1182 |
1183 |
1184 | When designing a new data format, the preferred mechanism to
1185 | promote its deployment on the Web is the Internet media type (see
1186 | Representation Types and Internet
1187 | Media Types). Media types also provide a means for building new
1188 | information applications, as described in future directions for data formats.
1190 |
1191 |
1192 |
1193 |
1194 |
1195 | URI Opacity
1196 |
1197 |
1198 | It is tempting to guess the nature of a resource by inspection of a
1199 | URI that identifies it. However, the Web is designed so that agents
1200 | communicate information state through representations, not identifiers. In
1202 | general, one cannot determine the type of a resource representation
1203 | by inspecting a URI for that resource. For example, the ".html" at
1204 | the end of "http://example.com/page.html" provides no guarantee that
1205 | representations of the identified resource will be served with the
1206 | Internet media type "text/html". The publisher is free to allocate
1207 | identifiers and define how they are served. The HTTP protocol does
1208 | not constrain the Internet media type based on the path component of
1209 | the URI; the URI owner is free to configure the server to return a
1210 | representation using PNG or any other data format.
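
The sketch below shows an agent taking the media type from the 'Content-Type' field of a response rather than from the URI's suffix. It uses Python's standard `email.message.Message` to parse the field value; the function name and the sample header value are assumptions made for the example.

```python
from email.message import Message


def served_media_type(content_type_header):
    """Return the media type named in a Content-Type field value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


uri = "http://example.com/page.html"
# Assumed response header: the owner has configured this URI to serve PNG.
media_type = served_media_type("image/png")
suffix_suggests_html = uri.endswith(".html")
```

Here the suffix hints at HTML, yet the agent should process the representation as "image/png" because that is what the server declared.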
1211 |
1212 |
1213 | Resource state may evolve over time. Requiring a URI owner to publish
1214 | a new URI for each change in resource state would lead to a
1215 | significant number of broken references. For robustness, Web
1216 | architecture promotes independence between an identifier and the
1217 | state of the identified resource.
1218 |
1225 | Agents making use of URIs SHOULD NOT attempt to infer properties of
1226 | the referenced resource.
1227 |
1228 |
1229 |
1230 | In practice, a small number of inferences can be made because they
1231 | are explicitly licensed by the relevant specifications. Some of these
1232 | inferences are discussed in the details of retrieving a representation.
1234 |
1235 |
1236 | The example URI used in the travel scenario
1237 | ("http://weather.example.com/oaxaca") suggests to a human reader that
1238 | the identified resource has something to do with the weather in
1239 | Oaxaca. A site reporting the weather in Oaxaca could just as easily
1240 | be identified by the URI "http://vjc.example.com/315". And the URI
1241 | "http://weather.example.com/vancouver" might identify the resource
1242 | "my photo album."
1243 |
1244 |
1245 | On the other hand, the URI "mailto:joe@example.com" indicates that
1246 | the URI refers to a mailbox. The "mailto" URI scheme specification
1247 | authorizes agents to infer that URIs of this form identify Internet
1248 | mailboxes.
1249 |
1250 |
1251 | Some URI assignment authorities document and publish their URI
1252 | assignment policies. For more information about URI opacity, see TAG
1253 | issues metaDataInURI-31
1255 | and siteData-36.
1257 |
1258 |
1259 |
1260 |
1261 | Fragment Identifiers
1262 |
1263 |
1264 |
1265 | Story
1266 |
1267 |
1268 |
1269 | When browsing the XHTML document that Nadia receives as a
1270 | representation of the resource identified by
1271 | "http://weather.example.com/oaxaca", she finds that the URI
1272 | "http://weather.example.com/oaxaca#weekend" refers to the part of
1273 | the representation that conveys information about the weekend
1274 | outlook. This URI includes the fragment identifier "weekend" (the
1275 | string after the "#").
1276 |
1277 |
1278 |
1279 |
1280 | The fragment identifier component of a URI allows indirect
1281 | identification of a secondary resource by reference to a
1282 | primary resource and additional identifying information. The
1283 | secondary resource may be some portion or subset of the primary
1284 | resource, some view on representations of the primary resource, or
1285 | some other resource defined or described by those representations.
1286 | The terms "primary resource" and "secondary resource" are defined in
1287 | section 3.5 of [[!URI]].
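
Separating the two parts of such a URI is straightforward. This sketch uses Python's standard `urllib.parse.urldefrag` on the URI from the story; the variable names are illustrative.

```python
from urllib.parse import urldefrag

# Split the URI into the primary resource's URI and the fragment identifier.
primary_uri, fragment = urldefrag("http://weather.example.com/oaxaca#weekend")
```

The split itself says nothing about what "weekend" identifies; that depends on the representations of the primary resource.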
1288 |
1289 |
1290 | The terms “primary” and “secondary” in this context do not limit the
1291 | nature of the resource—they are not classes. In this context, primary
1292 | and secondary simply indicate that there is a relationship between
1293 | the resources for the purposes of one URI: the URI with a fragment
1294 | identifier. Any resource can be identified as a secondary resource.
1295 | It might also be identified using a URI without a fragment
1296 | identifier, and a resource may be identified as a secondary resource
1297 | via multiple URIs. The purpose of these terms is to enable discussion
1298 | of the relationship between such resources, not to limit the nature
1299 | of a resource.
1300 |
1307 | See TAG issue abstractComponentRefs-37,
1309 | which concerns the use of fragment identifiers with namespace names
1310 | to identify abstract components.
1311 |
1312 |
1313 |
1314 |
1315 | Future Directions for Identifiers
1316 |
1317 |
1318 | There remain open questions regarding identifiers on the Web.
1319 |
1320 |
1321 |
1322 | Internationalized identifiers
1323 |
1324 |
1325 | The integration of internationalized identifiers (i.e., composed of
1326 | characters beyond those allowed by [[!URI]]) into the Web
1327 | architecture is an important and open issue. See TAG issue IRIEverywhere-27
1329 | for discussion about work going on in this area.
1330 |
1331 |
1332 |
1333 |
1334 | Assertion that two URIs identify the same resource
1335 |
1336 |
1337 | Emerging Semantic Web technologies, including the "Web Ontology
1338 | Language (OWL)" [OWL10], define RDF properties
1339 | such as sameAs to assert that two URIs identify the
1340 | same resource or inverseFunctionalProperty to imply
1341 | it.
1342 |
1343 |
1344 |
1345 |
1346 |
1347 |
1348 | Interaction
1349 |
1350 |
1351 | Communication between agents over a network about resources involves
1352 | URIs, messages, and data. The Web's protocols (including HTTP, FTP,
1353 | SOAP, NNTP, and SMTP) are based on the exchange of messages. A
1354 | message may include data as well as metadata about a
1355 | resource (such as the "Alternates" and "Vary" HTTP headers), the
1356 | message data, and the message itself (such as the "Transfer-encoding"
1357 | HTTP header). A message may even include metadata about the message
1358 | metadata (for message-integrity checks, for instance).
1359 |
1360 |
1361 |
1362 | Story
1363 |
1364 |
1365 |
1366 | Nadia follows a hypertext link labeled "satellite image" expecting
1367 | to retrieve a satellite photo of the Oaxaca region. The link to the
1368 | satellite image is an XHTML link encoded as <a
1369 | href="http://example.com/satimage/oaxaca">satellite
1370 | image</a>. Nadia's browser analyzes the URI and
1371 | determines that its scheme is "http". The
1372 | browser configuration determines how it locates the identified
1373 | information, which might be via a cache of prior retrieval actions,
1374 | by contacting an intermediary (such as a proxy server), or by
1375 | direct access to the server identified by a portion of the URI. In
1376 | this example, the browser opens a network connection to port 80 on
1377 | the server at "example.com" and sends a "GET" message as specified
1378 | by the HTTP protocol, requesting a representation of the resource.
1379 |
1380 |
1381 | The server sends a response message to the browser, once again
1382 | according to the HTTP protocol. The message consists of several
1383 | headers and a JPEG image. The browser reads the headers, learns
1384 | from the "Content-Type" field that the Internet media type of the
1385 | representation is "image/jpeg", reads the sequence of octets that
1386 | make up the representation data, and renders the image.
1387 |
1388 |
1389 |
1390 |
1391 | This section describes the architectural principles and constraints
1392 | regarding interactions between agents, including such topics as network
1393 | protocols and interaction styles, along with interactions between the
1394 | Web as a system and the people that make use of it. The fact that the
1395 | Web is a highly distributed system affects architectural constraints
1396 | and assumptions about interactions.
1397 |
1398 |
1399 |
1400 | Using a URI to Access a Resource
1401 |
1402 |
1403 | Agents may use a URI to access the referenced resource; this is
1404 | called dereferencing the URI. Access may take many forms,
1405 | including retrieving a representation of the resource (for instance,
1406 | by using HTTP GET or HEAD), adding or modifying a representation of
1407 | the resource (for instance, by using HTTP POST or PUT, which in some
1408 | cases may change the actual state of the resource if the submitted
1409 | representations are interpreted as instructions to that end), and
1410 | deleting some or all representations of the resource (for instance,
1411 | by using HTTP DELETE, which in some cases may result in the deletion
1412 | of the resource itself).
1413 |
1414 |
1415 | There may be more than one way to access a resource for a given URI;
1416 | application context determines which access method an agent uses. For
1417 | instance, a browser might use HTTP GET to retrieve a representation
1418 | of a resource, whereas a hypertext link checker might use HTTP HEAD
1419 | on the same URI simply to establish whether a representation is
1420 | available. Some URI schemes set expectations about available access
1421 | methods; others (such as the URN scheme [[!URN]]) do not. Section
1422 | 1.2.2 of [[!URI]] discusses the separation of identification and
1423 | interaction in more detail. For more information about relationships
1424 | between multiple access methods and URI addressability, see the TAG
1425 | finding "URIs,
1427 | Addressability, and the use of HTTP GET and POST".
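
To illustrate how application context selects the access method for one and the same URI, the sketch below builds (but does not send) two requests with Python's standard `urllib.request.Request`. The two function names are hypothetical stand-ins for a browser and a link checker.

```python
from urllib.request import Request

URI = "http://weather.example.com/oaxaca"


def retrieval_request(uri):
    """A browser-style request: with no body, the method defaults to GET."""
    return Request(uri)


def link_check_request(uri):
    """A link-checker-style request: HEAD asks for headers only."""
    return Request(uri, method="HEAD")


browser_method = retrieval_request(URI).get_method()
checker_method = link_check_request(URI).get_method()
```

Either request object could then be passed to `urllib.request.urlopen`; that step is omitted so the sketch makes no network access.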
1428 |
1429 |
1430 | Although many URI schemes are named after
1431 | protocols, this does not imply that use of such a URI will
1432 | necessarily result in access to the resource via the named protocol.
1433 | Even when an agent uses a URI to retrieve a representation, that
1434 | access might be through gateways, proxies, caches, and name
1435 | resolution services that are independent of the protocol associated
1436 | with the scheme name.
1437 |
1438 |
1439 | Many URI schemes define a default interaction protocol for attempting
1440 | access to the identified resource. That interaction protocol is often
1441 | the basis for allocating identifiers within that scheme, just as
1442 | "http" URIs are defined in terms of TCP-based HTTP servers. However,
1443 | this does not imply that all interaction with such resources is
1444 | limited to the default interaction protocol. For example, information
1445 | retrieval systems often make use of proxies to interact with a
1446 | multitude of URI schemes, such as HTTP proxies being used to access
1447 | "ftp" and "wais" resources. Proxies can also provide enhanced
1448 | services, such as annotation proxies that combine normal information
1449 | retrieval with additional metadata retrieval to provide a seamless,
1450 | multidimensional view of resources using the same protocols and user
1451 | agents as the non-annotated Web. Likewise, future protocols may be
1452 | defined that encompass our current systems, using entirely different
1453 | interaction mechanisms, without changing the existing identifier
1454 | schemes. See also the principle of
1455 | orthogonal specifications.
1456 |
1457 |
1458 |
1459 | Details of retrieving a representation
1460 |
1461 |
1462 | Dereferencing a URI generally involves a succession of steps as
1463 | described in multiple specifications and implemented by the agent.
1464 | The following example illustrates the series of specifications that
1465 | governs the process when a user agent is instructed to follow a
1466 | hypertext link that is part of an SVG
1467 | document. In this example, the URI is
1468 | "http://weather.example.com/oaxaca" and the application context
1469 | calls for the user agent to retrieve and render a representation of
1470 | the identified resource.
1471 |
1472 |
1473 |
Since the URI is part of a hypertext link in an SVG document,
1474 | the first relevant specification is the SVG 1.1 Recommendation
1475 | [SVG11]. Section 17.1 of this
1477 | specification imports the link semantics defined in XLink 1.0
1478 | [XLink10]: "The remote resource (the
1479 | destination for the link) is defined by a URI specified by the
1480 | XLink href attribute on the 'a' element."
1481 | The SVG specification goes on to state that interpretation of an
1482 | a element involves retrieving a representation of a
1483 | resource, identified by the href attribute in the
1484 | XLink namespace: "By activating these links (by clicking with the
1485 | mouse, through keyboard input, voice commands, etc.), users may
1486 | visit these resources."
1487 |
1488 |
The XLink 1.0 [XLink10] specification,
1489 | which defines the href attribute in section 5.4,
1490 | states that "The value of the href attribute must be a URI
1491 | reference as defined in [IETF RFC 2396], or must result in a URI
1492 | reference after the escaping procedure described below is applied."
1493 |
1494 |
The URI specification [[!URI]] states that "Each URI begins
1495 | with a scheme name that refers to a specification for assigning
1496 | identifiers within that scheme." The URI scheme name in this
1497 | example is "http".
1498 |
1499 |
[[!IANASchemes]] states that the "http" scheme is defined by
1500 | the HTTP/1.1 specification (RFC 2616 [[!HTTP11]], section 3.2.2).
1501 |
1502 |
In this SVG context, the agent constructs an HTTP GET request
1503 | (per section 9.3 of [[!HTTP11]]) to retrieve the representation.
1504 |
1505 |
Section 6 of [[!HTTP11]] defines how the server constructs a
1506 | corresponding response message, including the 'Content-Type' field.
1507 |
1508 |
Section 1.4 of [[!HTTP11]] states "HTTP communication usually
1509 | takes place over TCP/IP connections." This example addresses
1510 | neither that step in the process nor other steps such as Domain
1511 | Name System (DNS) resolution.
1512 |
1513 |
The agent interprets the returned representation according to
1514 | the data format specification that corresponds to the
1515 | representation's Internet Media
1516 | Type (the value of the HTTP 'Content-Type') in the relevant
1517 | IANA registry [MEDIATYPEREG].
1518 |
1519 |
1520 |
1521 | Precisely which representation(s) are retrieved depends on a number
1522 | of factors, including:
1523 |
1524 |
1525 |
Whether the URI owner makes available any representations at
1526 | all;
1527 |
1528 |
Whether the agent making the request has access privileges for
1529 | those representations (see the section on linking and access control);
1531 |
1532 |
If the URI owner has provided more than one representation (in
1533 | different formats such as HTML, PNG, or RDF; in different languages
1534 | such as English and Spanish; or transformed dynamically according
1535 | to the hardware or software capabilities of the recipient), the
1536 | resulting representation may depend on negotiation between the user
1537 | agent and server.
1538 |
1539 |
The time of the request; the world changes over time, so
1540 | representations of resources are also likely to change over time.
1541 |
1542 |
1543 |
1544 | Assuming that a representation has been successfully retrieved, the
1545 | expressive power of the representation's format will affect how
1546 | precisely the representation provider communicates resource state.
1547 | If the representation communicates the state of the resource
1548 | inaccurately, this inaccuracy or ambiguity may lead to confusion
1549 | among users about what the resource is. If different users reach
1550 | different conclusions about what the resource is, they may
1551 | interpret this as a URI collision.
1552 | Some communities, such as the ones developing the Semantic Web,
1553 | seek to provide a framework for accurately communicating the
1554 | semantics of a resource in a machine readable way. Machine readable
1555 | semantics may alleviate some of the ambiguity associated with
1556 | natural language descriptions of resources.
1557 |
1565 | A representation is data that encodes information about
1566 | resource state. Representations do not necessarily describe the
1567 | resource, or portray a likeness of the resource, or represent the
1568 | resource in other senses of the word "represent".
1569 |
1570 |
1571 | Representations of a resource may be sent or received using
1572 | interaction protocols. These protocols in turn determine the form in
1573 | which representations are conveyed on the Web. HTTP, for example,
1574 | provides for transmission of representations as octet streams typed
1575 | using Internet media types [RFC2046].
1576 |
1577 |
1578 | Just as it is important to reuse existing URI schemes whenever
1579 | possible, there are significant benefits to using media typed octet
1580 | streams for representations even in the unusual case where a new URI
1581 | scheme and associated protocol is to be defined. For example, if the
1582 | Oaxaca weather were conveyed to Nadia's browser using a protocol
1583 | other than HTTP, then software to render formats such as
1584 | application/xhtml+xml and image/png would still be usable if the new
1585 | protocol supported transmission of those types. This is an example of
1586 | the principle of orthogonal
1587 | specifications.
1588 |
1595 | New protocols created for the Web SHOULD transmit representations
1596 | as octet streams typed by Internet media types.
1597 |
1598 |
1599 |
1600 | The Internet media type mechanism does have some limitations. For
1601 | instance, media type strings do not support versioning or other parameters. See TAG issues
1603 | uriMediaType-9
1605 | and mediaTypeManagement-45
1607 | which concern aspects of the media type mechanism.
1608 |
1609 |
1610 |
1611 | Representation types and fragment identifier semantics
1612 |
1613 |
1614 | The Internet Media Type defines the syntax and semantics of the
1615 | fragment identifier (introduced in Fragment
1616 | Identifiers), if any, that may be used in conjunction with a
1617 | representation.
1618 |
1619 |
1620 |
1621 | Story
1622 |
1623 |
1624 |
1625 | In one of his XHTML pages, Dirk creates a hypertext link to an
1626 | image that Nadia has published on the Web. He creates a
1627 | hypertext link with <a
1628 | href="http://www.example.com/images/nadia#hat">Nadia's
1629 | hat</a>. Emma views Dirk's XHTML page in her Web
1630 | browser and follows the link. The HTML implementation in her
1631 | browser removes the fragment from the URI and requests the
1632 | image "http://www.example.com/images/nadia". Nadia serves an
1633 | SVG representation of the image (with Internet media type
1634 | "image/svg+xml"). Emma's Web browser starts up an SVG
1635 | implementation to view the image. It passes the original URI,
1636 | including the fragment,
1637 | "http://www.example.com/images/nadia#hat", to this
1638 | implementation, causing a view of the hat to be displayed
1639 | rather than the complete image.
1640 |
1641 |
1642 |
1643 |
1644 | Note that the HTML implementation in Emma's browser did not need to
1645 | understand the syntax or semantics of the SVG fragment; it merely had
1646 | to recognize the # delimiter from the URI syntax [URI] and remove the
1647 | fragment when accessing the resource. Nor does the SVG implementation
1648 | need to understand HTML, WebCGM, RDF, or other fragment syntax or
1649 | semantics. This orthogonality is an important feature of
1651 | Web architecture; it is what enabled Emma's browser to provide a
1652 | useful service without requiring an upgrade.
1653 |
1654 |
1655 | The semantics of a fragment identifier are defined by the set of
1656 | representations that might result from a retrieval action on the
1657 | primary resource. The fragment's format and resolution are
1658 | therefore dependent on the type of a potentially retrieved
1659 | representation, even though such a retrieval is only performed if
1660 | the URI is dereferenced. If no such representation exists, then the
1661 | semantics of the fragment are considered unknown and, effectively,
1662 | unconstrained. Fragment identifier semantics are orthogonal to URI
1663 | schemes and thus cannot be redefined by URI scheme specifications.
1664 |
1665 |
1666 | Interpretation of the fragment identifier is performed solely by
1667 | the agent that dereferences a URI; the fragment identifier is not
1668 | passed to other systems during the process of retrieval. This means
1669 | that some intermediaries in Web architecture (such as proxies) have
1670 | no interaction with fragment identifiers and that redirection (in
1671 | HTTP [[!HTTP11]], for example) does not account for fragments.
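
A sketch of this division of labor, in Python: the URI sent during retrieval has the fragment removed, and the fragment is interpreted afterwards according to the media type of the representation that came back. The interpreter functions and their descriptive return strings are hypothetical placeholders for real format-specific processing.

```python
from urllib.parse import urldefrag


def html_fragment(fragment):
    return f"scroll to the element with id '{fragment}'"


def svg_fragment(fragment):
    return f"display the SVG view '{fragment}'"


# Hypothetical registry: media type -> fragment interpreter for that format.
FRAGMENT_INTERPRETERS = {
    "text/html": html_fragment,
    "application/xhtml+xml": html_fragment,
    "image/svg+xml": svg_fragment,
}


def request_uri(uri):
    """The URI used for retrieval; the fragment is never sent to the server."""
    return urldefrag(uri).url


def interpret_fragment(uri, media_type):
    """Interpret the fragment per the retrieved format, or None if undefined."""
    fragment = urldefrag(uri).fragment
    interpreter = FRAGMENT_INTERPRETERS.get(media_type)
    if not fragment or interpreter is None:
        return None
    return interpreter(fragment)
```

With the URI from the story, `request_uri` yields "http://www.example.com/images/nadia" and the "hat" fragment is resolved only once the agent knows it received "image/svg+xml".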
1672 |
1673 |
1674 |
1675 |
1676 | Fragment identifiers and content negotiation
1677 |
1678 |
1679 | Content negotiation refers to the practice of making
1680 | available multiple representations via the same URI. Negotiation
1681 | between the requesting agent and the server determines which
1682 | representation is served (usually with the goal of serving the
1683 | "best" representation a receiving agent can process). HTTP is an
1684 | example of a protocol that enables representation providers to use
1685 | content negotiation.
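
As an illustration only, the following Python sketch selects among available representations using the quality values in an HTTP 'Accept' field. It assumes a simplified grammar (media ranges with an optional `q` parameter, other parameters ignored) and is not a complete implementation of HTTP's negotiation rules; all function names are introduced here.

```python
def parse_accept(header):
    """Parse an Accept field value into (media range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((parts[0].lower(), q))
    return prefs


def quality_for(media_type, prefs):
    """Quality for a media type, preferring the most specific matching range."""
    main_type = media_type.split("/")[0]
    for pattern in (media_type, main_type + "/*", "*/*"):
        for media_range, q in prefs:
            if media_range == pattern:
                return q
    return 0.0


def negotiate(accept_header, available):
    """Return the available media type the agent prefers, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for candidate in available:
        q = quality_for(candidate, prefs)
        if q > best_q:
            best, best_q = candidate, q
    return best
```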
1686 |
1687 |
1688 | Individual data formats may define their own rules for use of the
1689 | fragment identifier syntax for specifying different types of
1690 | subsets, views, or external references that are identifiable as
1691 | secondary resources by that media type. Therefore, representation
1692 | providers must manage content negotiation carefully when used with
1693 | a URI that contains a fragment identifier. Consider an example
1694 | where the owner of the URI
1695 | "http://weather.example.com/oaxaca/map#zicatela" uses content
1696 | negotiation to serve two representations of the identified
1697 | resource. Three situations can arise:
1698 |
1699 |
1700 |
The interpretation of "zicatela" is defined consistently by
1701 | both data format specifications. The representation provider
1702 | decides when definitions of fragment identifier semantics are
1703 | sufficiently consistent.
1704 |
1705 |
The interpretation of "zicatela" is defined inconsistently by
1706 | the data format specifications.
1707 |
1708 |
The interpretation of "zicatela" is defined in one data format
1709 | specification but not the other.
1710 |
1711 |
1712 |
1713 | The first situation—consistent semantics—poses no problem.
1714 |
1715 |
1716 | The second case is a server management error: representation
1717 | providers must not use content negotiation to serve representation
1718 | formats that have inconsistent fragment identifier semantics. This
1719 | situation also leads to URI collision.
1721 |
1722 |
1723 | The third case is not a server management error. It is a means by
1724 | which the Web can grow. Because the Web is a distributed system in
1725 | which formats and agents are deployed in a non-uniform manner, Web
1726 | architecture does not constrain authors to only use "lowest common
1727 | denominator" formats. Content authors may take advantage of new
1728 | data formats while still ensuring reasonable backward-compatibility
1729 | for agents that do not yet implement them.
1730 |
1731 |
1732 | In case three, behavior by the receiving agent should vary
1733 | depending on whether the negotiated format defines fragment
1734 | identifier semantics. When a received data format does not define
1735 | fragment identifier semantics, the agent should not perform
1736 | silent error recovery unless the
1737 | user has given consent; see [[!CUAP]] for additional suggested
1738 | agent behavior in this case.
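
The three situations can be told apart mechanically once a provider has recorded what the fragment means in each negotiated format. In this hypothetical Python sketch, strict equality of the recorded descriptions stands in for the provider's judgment of "sufficiently consistent", and `None` marks a format whose specification does not define the fragment.

```python
def classify_fragment_semantics(interpretations):
    """Classify a mapping of media type -> meaning of the fragment (or None)."""
    values = list(interpretations.values())
    defined = {v for v in values if v is not None}
    if not defined:
        return "undefined"          # no format defines the fragment at all
    if len(defined) > 1:
        return "inconsistent"       # second situation: a server management error
    if None in values:
        return "partially defined"  # third situation: permitted
    return "consistent"             # first situation: no problem
```

A provider could run such a check before enabling content negotiation for a URI that is commonly referenced with a fragment identifier.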
1739 |
1748 | Inconsistencies between Representation Data and Metadata
1749 |
1750 |
1751 | Successful communication between two parties depends on a reasonably
1752 | shared understanding of the semantics of exchanged messages, both
1753 | data and metadata. At times, there may be inconsistencies between a
1754 | message sender's data and metadata. Examples, observed in practice,
1755 | of inconsistencies between representation data and metadata include:
1756 |
1757 |
1758 |
The actual character encoding of a representation (e.g.,
1759 | "iso-8859-1", specified by the encoding attribute in an
1760 | XML declaration) is inconsistent with the charset parameter in the
1761 | representation metadata (e.g., "utf-8", specified by the
1762 | 'Content-Type' field in an HTTP header).
1763 |
1764 |
The namespace of the root element
1765 | of XML representation data (e.g., as specified by the "xmlns"
1766 | attribute) is inconsistent with the value of the 'Content-Type' field
1767 | in an HTTP header.
1768 |
1769 |
1770 |
1771 | On the other hand, there is no inconsistency in serving HTML content
1772 | with the media type "text/plain", for example, as this combination is
1773 | licensed by specifications.
1774 |
1775 |
1776 | Receiving agents should detect protocol inconsistencies and perform
1777 | proper error recovery.
1778 |
1785 | Agents MUST NOT ignore message metadata without the consent of the
1786 | user.
1787 |
1788 |
1789 |
1790 | Thus, for example, if the parties responsible for
1791 | "weather.example.com" mistakenly label the satellite photo of Oaxaca
1792 | as "image/gif" instead of "image/jpeg", and if Nadia's browser
1793 | detects a problem, Nadia's browser must not ignore the problem (e.g.,
1794 | by simply rendering the JPEG image) without Nadia's consent. Nadia's
1795 | browser can notify Nadia of the problem or notify Nadia and take
1796 | corrective action.
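
Detecting one of the inconsistencies listed above, a character-encoding mismatch, might look like the following Python sketch. It compares the charset parameter of the 'Content-Type' field with the encoding named in an XML declaration; the regular expression is a simplification and the function names are assumptions. What the agent does on a conflict (notify the user, ask for consent) is left out.

```python
import re
from email.message import Message

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
XML_ENCODING = re.compile(
    rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_charset(content_type_header):
    """Charset parameter from a Content-Type field value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_charset()


def xml_declared_encoding(data):
    """Encoding named in the XML declaration at the start of data, or None."""
    match = XML_ENCODING.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def charset_conflict(content_type_header, data):
    """Return (metadata charset, data encoding) if they disagree, else None."""
    header = declared_charset(content_type_header)
    body = xml_declared_encoding(data)
    if header and body and header != body:
        return (header, body)
    return None
```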
1797 |
1798 |
1799 | Furthermore, representation providers can help reduce the risk of
1800 | inconsistencies through careful assignment of representation metadata
1801 | (especially that which applies across representations). The section
1802 | on media types for XML presents an
1803 | example of reducing the risk of error by providing no metadata about
1804 | character encoding when serving XML.
1805 |
1806 |
1807 | The accuracy of metadata relies on the server administrators, the
1808 | authors of representations, and the software that they use.
1809 | Practically, the capabilities of the tools and the social
1810 | relationships may be the limiting factors.
1811 |
1812 |
1813 | The accuracy of these and other metadata fields is just as important
1814 | for dynamic Web resources, where a little bit of thought and
1815 | programming can often ensure correct metadata for a huge number of
1816 | resources.
1817 |
1818 |
1819 | Often there is a separation of control between the users who create
1820 | representations of resources and the server managers who maintain the
1821 | Web site software. Given that it is generally the Web site software
1822 | that provides the metadata associated with a resource, it follows
1823 | that coordination between the server managers and content creators is
1824 | required.
1825 |
1832 | Server managers SHOULD allow representation creators to control the
1833 | metadata associated with their representations.
1834 |
1835 |
1836 |
1837 | In particular, content creators need to be able to control the
1838 | content type (for extensibility) and the character encoding (for
1839 | proper internationalization).
1840 |
1841 |
1842 | The TAG finding "Authoritative
1844 | Metadata" discusses in more detail how to handle
1845 | data/metadata inconsistency and how server configuration can be used
1846 | to avoid it.
1847 |
1848 |
1849 |
1850 |
1851 | Safe Interactions
1852 |
1853 |
1854 | Nadia's retrieval of weather information (an example of a read-only
1855 | query or lookup) qualifies as a "safe" interaction; a safe
1856 | interaction is one where the agent does not incur any
1857 | obligation beyond the interaction. An agent may incur an obligation
1858 | through other means (such as by signing a contract). If an agent does
1859 | not have an obligation before a safe interaction, it does not have
1860 | that obligation afterwards.
1861 |
1862 |
1863 | Other Web interactions resemble orders more than queries. These
1864 | unsafe interactions may cause a change to the state of a
1865 | resource and the user may be held responsible for the consequences of
1866 | these interactions. Unsafe interactions include subscribing to a
1867 | newsletter, posting to a list, or modifying a database.
1868 | Note: In this context, the word "unsafe" does not
1869 | necessarily mean "dangerous"; the term "safe" is used in section
1870 | 9.1.1 of [[!HTTP11]] and "unsafe" is the natural opposite.
1871 |
1872 |
1873 |
1874 | Story
1875 |
1876 |
1877 |
1878 | Nadia decides to book a vacation to Oaxaca at
1879 | "booking.example.com." She enters data into a series of online
1880 | forms and is ultimately asked for credit card information to
1881 | purchase the airline tickets. She provides this information in
1882 | another form. When she presses the "Purchase" button, her browser
1883 | opens another network connection to the server at
1884 | "booking.example.com" and sends a message composed of form data
1885 | using the POST method. This is an unsafe interaction; Nadia wishes to
1887 | change the state of the system by exchanging money for airline
1888 | tickets.
1889 |
1890 |
1891 | The server reads the POST request, and after performing the
1892 | booking transaction returns a message to Nadia's browser that
1893 | contains a representation of the results of Nadia's request. The
1894 | representation data is in XHTML so that it can be saved or
1895 | printed out for Nadia's records.
1896 |
1897 |
1898 | Note that neither the data transmitted with the POST nor the data
1899 | received in the response necessarily correspond to any resource
1900 | identified by a URI.
1901 |
1902 |
1903 |
1904 |
1905 | Safe interactions are important because these are interactions where
1906 | users can browse with confidence and where agents (including search
1907 | engines and browsers that pre-cache data for the user) can follow
1908 | hypertext links safely. Users (or agents acting on their behalf) do
1909 | not commit themselves to anything by querying a resource or following
1910 | a hypertext link.
1911 |
1918 | Agents do not incur obligations by retrieving a representation.
1919 |
1920 |
1921 |
1922 | For instance, it is incorrect to publish a URI that, when followed as
1923 | part of a hypertext link, subscribes a user to a mailing list.
1924 | Remember that search engines may follow such hypertext links.
1925 |
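As an illustration of the mailing list example, the sketch below shows one way a server might keep retrieval safe. The `/subscribe` path, the `handle` function, and the in-memory `subscribers` set are assumptions made for this example only.

```python
from typing import Optional

SAFE_METHODS = {"GET", "HEAD"}
subscribers = set()  # stands in for the mailing list's stored state


def handle(method: str, path: str, form: Optional[dict] = None):
    """Return (status, body) for a request to a hypothetical mailing-list site."""
    if path != "/subscribe":
        return 404, "Not found."
    if method in SAFE_METHODS:
        # Retrieval only describes the list; it never changes state,
        # so a search engine following the link subscribes nobody.
        return 200, "Submit the form with POST to subscribe."
    if method == "POST" and form and "email" in form:
        subscribers.add(form["email"])  # the unsafe step, reached only via POST
        return 200, "Subscribed."
    return 400, "Bad request."
```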
1926 |
1927 | The fact that HTTP GET, the access method most often used when
1928 | following a hypertext link, is safe does not imply that all safe
1929 | interactions must be done through HTTP GET. At times, there may be
1930 | good reasons (such as confidentiality requirements or practical
1931 | limits on URI length) to conduct an otherwise safe operation using a
1932 | mechanism generally reserved for unsafe operations (e.g., HTTP POST).
1933 |
1934 |
1935 | For more information about safe and unsafe operations using HTTP GET
1936 | and POST, and handling security concerns around the use of HTTP GET,
1937 | see the TAG finding "URIs,
1939 | Addressability, and the use of HTTP GET and POST".
1940 |
1941 |
1942 |
1943 | Unsafe interactions and accountability
1944 |
1945 |
1946 |
1947 | Story
1948 |
1949 |
1950 |
1951 | Nadia pays for her airline tickets online (through a POST
1952 | interaction as described above). She receives a Web page with
1953 | confirmation information and wishes to bookmark it so that she
1954 | can refer to it when she calculates her expenses. Although
1955 | Nadia can print out the results, or save them to a file, she
1956 | would also like to bookmark them.
1957 |
1958 |
1959 |
1960 |
1961 | Transaction requests and results are valuable resources, and like
1962 | all valuable resources, it is useful to be able to refer to them
1963 | with a persistent URI. However, in
1964 | practice, Nadia cannot bookmark her commitment to pay (expressed
1965 | via the POST request) or the airline company's acknowledgment and
1966 | commitment to provide her with a flight (expressed via the response
1967 | to the POST).
1968 |
1969 |
1970 | There are ways to provide persistent URIs for transaction requests
1971 | and their results. For transaction requests, user agents can
1972 | provide an interface for managing transactions where the user agent
1973 | has incurred an obligation on behalf of the user. For transaction
1974 | results, HTTP allows representation providers to associate a URI
1975 | with the results of an HTTP POST request using the
1976 | "Content-Location" header (described in section 14.14 of
1977 | [[!HTTP11]]).
1978 |
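A server-side sketch of the "Content-Location" approach might look like the following. The function name and the `/confirmations/` URI layout are hypothetical; the sketch also assumes the server will later answer GET requests at that URI.

```python
def booking_response(confirmation_id: str, body: str):
    """Build (status, headers, body) for a successful POST to a booking form."""
    headers = {
        "Content-Type": "application/xhtml+xml",
        # Hypothetical URI layout: names the confirmation so that the
        # result of the POST can be bookmarked and retrieved later.
        "Content-Location": (
            f"http://booking.example.com/confirmations/{confirmation_id}"
        ),
    }
    return 200, headers, body
```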
1979 |
1980 |
1981 |
1982 |
1983 | Representation Management
1984 |
1985 |
1986 |
1987 | Story
1988 |
1989 |
1990 |
1991 | Since Nadia finds the Oaxaca weather site useful, she emails a
1992 | review to her friend Dirk recommending that he check out
1993 | 'http://weather.example.com/oaxaca'. Dirk clicks on the resulting
1994 | hypertext link in the email he receives and is frustrated by a
1995 | 404 (not found). Dirk tries again the next day and receives a
1996 | representation with "news" that is two weeks old. He tries one
1997 | more time the next day only to receive a representation that
1998 | claims that the weather in Oaxaca is sunny, even though his
1999 | friends in Oaxaca tell him by phone that in fact it is raining.
2000 | Dirk and Nadia conclude that the URI owners are unreliable or
2001 | unpredictable. Although the URI owner has chosen the Web as a
2002 | communication medium, the owner has lost two customers due to
2003 | ineffective representation management.
2004 |
2005 |
2006 |
2007 |
2008 | A URI owner may supply zero or more authoritative representations of
2009 | the resource identified by that URI. There is a benefit to the
2010 | community in providing representations.
2011 |
2018 | A URI owner SHOULD provide representations of the resource it
2019 | identifies.
2020 |
2021 |
2022 |
2023 | For example, owners of XML namespace URIs should use them to identify
2024 | a namespace document.
2025 |
2026 |
2027 | Just because representations are available does not mean that it is
2028 | always desirable to retrieve them. In fact, in some cases the
2029 | opposite is true.
2030 |
2038 | An application developer or specification author SHOULD NOT require
2039 | networked retrieval of representations each time they are
2040 | referenced.
2041 |
2042 |
2043 |
2044 | Dereferencing a URI has a (potentially significant) cost in computing
2045 | and bandwidth resources, may have security implications, and may
2046 | impose significant latency on the dereferencing application.
2047 | Dereferencing URIs should be avoided except when necessary.
2048 |
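One common way to avoid repeated dereferencing is to keep a local copy of a representation after the first retrieval. The sketch below assumes a caller-supplied `fetch` function and ignores expiry, which a real cache would have to honor.

```python
class CachingResolver:
    """Wrap a fetch function so each URI is retrieved at most once."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable: URI string -> representation
        self._cache = {}
        self.network_calls = 0     # counts actual dereferences, for illustration

    def get(self, uri: str):
        if uri not in self._cache:
            self.network_calls += 1
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]
```

An application that references the same schema or namespace document many times pays the network cost once rather than on every reference.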
2061 | As is the case with many human interactions, confidence in
2062 | interactions via the Web depends on stability and predictability.
2063 | For an information resource, persistence depends on the consistency
2064 | of representations. The representation provider decides when
2065 | representations are sufficiently consistent (although that
2066 | determination generally takes user expectations into account).
2067 |
2068 |
2069 | Although persistence in this case is observable as a result of
2070 | representation retrieval, the term URI persistence is
2071 | used to describe the desirable property that, once associated with
2072 | a resource, a URI should continue indefinitely to refer to that
2073 | resource.
2074 |
2081 | A URI owner SHOULD provide representations of the identified
2082 | resource consistently and predictably.
2083 |
2084 |
2085 |
2086 | URI persistence is a matter of policy and commitment on the part of
2087 | the URI owner. The choice of a
2088 | particular URI scheme provides no guarantee that those URIs will be
2089 | persistent or that they will not be persistent.
2090 |
2091 |
2092 | HTTP [[!HTTP11]] has been designed to help manage URI persistence.
2093 | For example, HTTP redirection (using the 3xx response codes)
2094 | permits servers to tell an agent that further action needs to be
2095 | taken by the agent in order to fulfill the request (for example, a
2096 | new URI is associated with the resource).
2097 |
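The redirection mechanism can be illustrated with a small in-memory model. The paths, the page content, and the function names below are hypothetical; a real server would also distinguish among the various 3xx codes.

```python
MOVED = {"/oaxaca": "/weather/mexico/oaxaca"}        # hypothetical old -> new paths
PAGES = {"/weather/mexico/oaxaca": "Oaxaca weather report"}


def respond(path: str):
    """Server side: return (status, headers, body)."""
    if path in MOVED:
        return 301, {"Location": MOVED[path]}, ""
    if path in PAGES:
        return 200, {}, PAGES[path]
    return 404, {}, "Not found"


def follow(path: str, max_hops: int = 5):
    """Agent side: follow redirects, returning (status, final_path, body)."""
    for _ in range(max_hops):
        status, headers, body = respond(path)
        if status != 301:
            return status, path, body
        path = headers["Location"]
    raise RuntimeError("too many redirects")
```

Because the old path keeps answering with a redirect rather than a 404, links that were shared before the reorganization continue to work.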
2098 |
2099 | In addition, content negotiation
2100 | promotes consistency, as a site manager is not required to define
2101 | new URIs when adding support for a new format specification.
2102 | Protocols that do not support content negotiation (such as FTP)
2103 | require a new identifier when a new data format is introduced.
2104 | Improper use of content negotiation can lead to inconsistent
2105 | representations.
2106 |
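The following sketch shows how a single URI can serve several formats. It is a simplified reading of the "Accept" header: only exact media types and "*/*" are matched, parameters other than "q" are ignored, and the function names are illustrative.

```python
def parse_accept(header: str):
    """Return [(media_type, q)] sorted by descending q (ties keep header order)."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((parts[0], q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept: str, available: dict):
    """Pick the best available representation for one URI, or None."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in available:
            return media_type, available[media_type]
        if media_type == "*/*" and available:
            first = next(iter(available))
            return first, available[first]
    return None
```

Adding support for a new format then means adding an entry to `available`, not minting a new URI.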
2107 |
2108 | For more discussion about URI persistence, see [Cool].
2110 |
2111 |
2112 |
2113 |
2114 | Linking and access control
2115 |
2116 |
2117 | It is reasonable to limit access to a resource (for commercial or
2118 | security reasons, for example), but merely identifying the resource
2119 | is like referring to a book by title. In exceptional circumstances,
2120 | people may have agreed to keep titles or URIs confidential (for
2121 | example, a book author and a publisher may agree to keep the URI of
2122 | a page containing additional material secret until after the book is
2123 | published); otherwise, they are free to exchange them.
2124 |
2125 |
2126 | As an analogy: The owners of a building might have a policy that
2127 | the public may only enter the building via the main front door, and
2128 | only during business hours. People who work in the building and who
2129 | make deliveries to it might use other doors as appropriate. Such a
2130 | policy would be enforced by a combination of security personnel and
2131 | mechanical devices such as locks and pass-cards. One would not
2132 | enforce this policy by hiding some of the building entrances, nor
2133 | by requesting legislation requiring the use of the front door and
2134 | forbidding anyone to reveal the fact that there are other doors to
2135 | the building.
2136 |
2137 |
2138 |
2139 | Story
2140 |
2141 |
2142 |
2143 | Nadia sends to Dirk the URI of the current article she is
2144 | reading. With his browser, Dirk follows the hypertext link and
2145 | is asked to enter his subscriber username and password. Since
2146 | Dirk is also a subscriber to services provided by
2147 | "weather.example.com," he can access the same information as
2148 | Nadia. Thus, the authority for "weather.example.com" can limit
2149 | access to authorized parties and still provide the benefits of
2150 | URIs.
2151 |
2152 |
2153 |
2154 |
2155 | The Web provides several mechanisms to control access to resources;
2156 | these mechanisms do not rely on hiding or suppressing URIs for
2157 | those resources. For more information, see the TAG finding
2158 | "'Deep Linking' in
2160 | the World Wide Web".
2161 |
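The story above can be sketched as a check on credentials rather than on knowledge of the URI. The credential store and function name are hypothetical, and a real deployment would never hold passwords in plain text.

```python
# Hypothetical subscriber records, for illustration only.
SUBSCRIBERS = {"nadia": "nadia-password", "dirk": "dirk-password"}


def get_article(uri: str, credentials=None):
    """Return (status, body); access depends on who asks, not on a secret URI."""
    if credentials is not None:
        user, password = credentials
        if SUBSCRIBERS.get(user) == password:
            return 200, f"Article at {uri}"
    # Anyone may know the URI; only authenticated subscribers get the content.
    return 401, "Authentication required"
```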
2162 |
2163 |
2164 |
2165 | Supporting Navigation
2166 |
2167 |
2168 | It is a strength of Web Architecture that links can be made and
2169 | shared; a user who has found an interesting part of the Web can
2170 | share this experience just by republishing a URI.
2171 |
2172 |
2173 |
2174 | Story
2175 |
2176 |
2177 |
2178 | Nadia and Dirk want to visit the Museum of Weather Forecasting
2179 | in Oaxaca. Nadia goes to "http://maps.example.com", locates the
2180 | museum, and mails the URI
2181 | "http://maps.example.com/oaxaca?lat=17.065;lon=-96.716;scale=6"
2182 | to Dirk. Dirk goes to "http://mymaps.example.com", locates the
2183 | museum, and mails the URI
2184 | "http://mymaps.example.com/geo?sessionID=765345;userID=Dirk" to
2185 | Nadia. Dirk reads Nadia's email and is able to follow the link
2186 | to the map. Nadia reads Dirk's email, follows the link, and
2187 | receives an error message 'No such session/user'. Nadia has to
2188 | start again from "http://mymaps.example.com" and find the
2189 | museum location once more.
2190 |
2191 |
2192 |
2193 |
2194 | For resources that are generated on demand, machine generation of
2195 | URIs is common. For resources that might usefully be bookmarked for
2196 | later perusal, or shared with others, server managers should avoid
2197 | needlessly restricting the reusability of such URIs. If the
2198 | intention is to restrict information to a particular user, as might
2199 | be the case in a home banking application for example, designers
2200 | should use appropriate access control
2201 | mechanisms.
2202 |
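The difference between the two map URIs in the story is that the first one carries the state of the view itself, while the second refers to state held privately by the server. A sketch of the first style, using hypothetical function names and "&" rather than ";" as the query separator:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def map_uri(lat: float, lon: float, scale: int) -> str:
    """Encode the map view in the URI so that any recipient can reuse it."""
    query = urlencode({"lat": lat, "lon": lon, "scale": scale})
    return f"http://maps.example.com/oaxaca?{query}"


def view_from_uri(uri: str) -> dict:
    """Recover the map view from the URI alone, with no session state."""
    params = parse_qs(urlsplit(uri).query)
    return {
        "lat": float(params["lat"][0]),
        "lon": float(params["lon"][0]),
        "scale": int(params["scale"][0]),
    }
```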
2203 |
2204 | Interactions conducted with HTTP POST (where HTTP GET could have
2205 | been used) also limit navigation possibilities. The user cannot
2206 | create a bookmark or share the URI because HTTP POST transactions
2207 | do not typically result in a different URI as the user interacts
2208 | with the site.
2209 |
2210 |
2211 |
2212 |
2213 |
2214 | Future Directions for Interaction
2215 |
2216 |
2217 | There remain open questions regarding Web interactions. The TAG
2218 | expects future versions of this document to address in more detail
2219 | the relationship between the architecture described herein, Web Services, peer-to-peer systems,
2221 | instant messaging systems (such as [[!RFC3920]]), streaming audio
2222 | (such as RTSP [[!RFC2326]]), and voice-over-IP (such as SIP
2223 | [[!RFC3261]]).
2224 |
2225 |
2226 |
2227 |
2228 |
2229 | Data Formats
2230 |
2231 |
2232 | A data format specification (for example, for XHTML, RDF/XML, SMIL,
2233 | XLink, CSS, and PNG) embodies an agreement on the correct
2234 | interpretation of representation
2235 | data. The first data format used on the Web was HTML. Since then, data
2236 | formats have grown in number. Web architecture does not constrain which
2237 | data formats content providers can use. This flexibility is important
2238 | because there is constant evolution in applications, resulting in new
2239 | data formats and refinements of existing formats. Although Web
2240 | architecture allows for the deployment of new data formats, the
2241 | creation and deployment of new formats (and agents able to handle them)
2242 | is expensive. Thus, before inventing a new data format (or "meta"
2243 | format such as XML), designers should carefully consider re-using one
2244 | that is already available.
2245 |
2246 |
2247 | For a data format to be usefully interoperable between two parties, the
2248 | parties must agree (to a reasonable extent) about its syntax and
2249 | semantics. Shared understanding of a data format promotes
2250 | interoperability but does not imply constraints on usage; for instance,
2251 | a sender of data cannot count on being able to constrain the behavior
2252 | of a data receiver.
2253 |
2254 |
2255 | Below we describe some characteristics of a data format that facilitate
2256 | integration into Web architecture. This document does not address
2257 | generally beneficial characteristics of a specification such as
2258 | readability, simplicity, attention to programmer goals, attention to
2259 | user needs, accessibility, or internationalization. The section on
2260 | architectural specifications includes
2261 | references to additional format specification guidelines.
2262 |
2263 |
2264 |
2265 | Binary and Textual Data Formats
2266 |
2267 |
2268 | Binary data formats are those in which portions of the data are
2269 | encoded for direct use by computer processors, for example 32-bit
2270 | little-endian two's-complement and 64-bit IEEE double-precision
2271 | floating-point. The portions of data so represented include numeric
2272 | values, pointers, and compressed data of all sorts.
2273 |
2274 |
2275 | A textual data format is one in which the data is specified in a
2276 | defined encoding as a sequence of characters. HTML, Internet e-mail,
2277 | and all XML-based formats are textual.
2278 | Increasingly, internationalized textual data formats refer to the
2279 | Unicode repertoire [[!UNICODE]] for character definitions.
2280 |
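To make the distinction concrete, the sketch below encodes the same two values both ways using Python's standard `struct` module. The variable names and sample values are arbitrary.

```python
import struct

count = 1000000
ratio = 0.1

# Binary: "<i" is a 32-bit little-endian two's-complement integer and
# "<d" is a 64-bit little-endian IEEE double; "<" disables padding.
binary = struct.pack("<id", count, ratio)

# Textual: the same values as a sequence of characters in a defined encoding.
textual = f"{count} {ratio}".encode("utf-8")

decoded_count, decoded_ratio = struct.unpack("<id", binary)
```

The binary form is always 12 bytes and is opaque without the format description; the textual form varies in length with the values but can be read directly in a text editor.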
2281 |
2282 | If a data format is textual, as defined in this section, that does
2283 | not imply that it should be served with a media type beginning with
2284 | "text/". Although XML-based formats are textual, many XML-based
2285 | formats do not consist primarily of phrases in natural language. See
2286 | the section on media types for XML for
2287 | issues that arise when "text/" is used in conjunction with an
2288 | XML-based format.
2289 |
2290 |
2291 | In principle, all data can be represented using textual formats. In
2292 | practice, some types of content (e.g., audio and video) are generally
2293 | represented using binary formats.
2294 |
2295 |
2296 | The trade-offs between binary and textual data formats are complex
2297 | and application-dependent. Binary formats can be substantially more
2298 | compact, particularly for complex pointer-rich data structures. Also,
2299 | they can be consumed more rapidly by agents in those cases where they
2300 | can be loaded into memory and used with little or no conversion.
2301 | Note, however, that such cases are relatively uncommon as such direct
2302 | use may open the door to security issues that can only practically be
2303 | addressed by examining every aspect of the data structure in detail.
2304 |
2305 |
2306 | Textual formats are usually more portable and interoperable. Textual
2307 | formats also have the considerable advantage that they can be
2308 | directly read by human beings (and understood, given sufficient
2309 | documentation). This can simplify the tasks of creating and
2310 | maintaining software, and allow the direct intervention of humans in
2311 | the processing chain without recourse to tools more complex than the
2312 | ubiquitous text editor. Finally, it simplifies the necessary human
2313 | task of learning about new data formats; this is called the "view
2314 | source" effect.
2315 |
2316 |
2317 | It is important to emphasize that intuition as to such matters as
2318 | data size and processing speed is not a reliable guide in data format
2319 | design; quantitative studies are essential to a correct understanding
2320 | of the trade-offs. Therefore, designers of a data format
2321 | specification should make a considered choice between binary and
2322 | textual format design.
2323 |
2334 | In a perfect world, language designers would invent languages that
2335 | perfectly met the requirements presented to them, the requirements
2336 | would be a perfect model of the world, they would never change over
2337 | time, and all implementations would be perfectly interoperable
2338 | because the specifications would have no variability.
2339 |
2340 |
2341 | In the real world, language designers imperfectly address the
2342 | requirements as they interpret them, the requirements inaccurately
2343 | model the world, conflicting requirements are presented, and they
2344 | change over time. As a result, designers negotiate with users, make
2345 | compromises, and often introduce extensibility mechanisms so that it
2346 | is possible to work around problems in the short term. In the long
2347 | term, they produce multiple versions of their languages, as the
2348 | problem, and their understanding of it, evolve. The resulting
2349 | variability in specifications, languages, and implementations
2350 | introduces interoperability costs.
2351 |
2352 |
2353 | Extensibility and versioning are strategies to help manage the
2354 | natural evolution of information on the Web and technologies used to
2355 | represent that information. For more information about how these
2356 | strategies introduce variability and how that variability impacts
2357 | interoperability, see Variability in
2358 | Specifications.
2359 |
2360 |
2361 | See TAG issue XMLVersioning-41,
2363 | which concerns good practices for designing extensible XML languages
2364 | and for handling versioning. See also "Web Architecture: Extensible
2365 | Languages" [[!EXTLANG]].
2366 |
2367 |
2368 |
2369 | Versioning
2370 |
2371 |
2372 | There is typically a (long) transition period during which multiple
2373 | versions of a format, protocol, or agent are simultaneously in use.
2374 |
2381 | A data format specification SHOULD provide for version
2382 | information.
2383 |
2384 |
2385 |
2386 |
2387 |
2388 | Versioning and XML namespace policy
2389 |
2390 |
2391 |
2392 | Story
2393 |
2394 |
2395 |
2396 | Nadia and Dirk are designing an XML data format to encode data
2397 | about the film industry. They provide for extensibility by
2398 | using XML namespaces and creating a schema that allows the
2399 | inclusion, in certain places, of elements from any namespace.
2400 | When they revise their format, Nadia proposes a new optional
2401 | lang attribute on the film element.
2402 | Dirk feels that such a change requires them to assign a new
2403 | namespace name, which might require changes to deployed
2404 | software. Nadia explains to Dirk that their choice of
2405 | extensibility strategy in conjunction with their namespace
2406 | policy allows certain changes that do not affect conformance of
2407 | existing content and software, and thus no change to the
2408 | namespace identifier is required. They chose this policy to
2409 | help them meet their goals of reducing the cost of change.
2410 |
2411 |
2412 |
2413 |
2414 | Dirk and Nadia have chosen a particular namespace change policy
2415 | that allows them to avoid changing the namespace name whenever they
2416 | make changes that do not affect conformance of deployed content and
2417 | software. They might have chosen a different policy, for example
2418 | that any new element or attribute has to belong to a namespace
2419 | other than the original one. Whatever the chosen policy, it should
2420 | set clear expectations for users of the format.
2421 |
2422 |
2423 | In general, changing the namespace name of an element completely
2424 | changes the element name. If "a" and "b" are bound to two different
2425 | URIs, a:element and b:element are as
2426 | distinct as a:eieio and a:xyzzy.
2427 | Practically speaking, this means that deployed applications will
2428 | have to be upgraded in order to recognize the new language; the
2429 | cost of this upgrade may be very high.
2430 |
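The effect of a namespace change on element names can be observed with any namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports each element name together with its namespace URI; the namespace URIs themselves are made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:a="http://example.org/film/v1"
               xmlns:b="http://example.org/film/v2"
               xmlns:x="http://example.org/film/v1">
  <a:film/>
  <b:film/>
  <x:film/>
</root>"""

root = ET.fromstring(doc)
# Each tag has the form "{namespace-uri}local-name".
tags = [child.tag for child in root]
```

Here `a:film` and `b:film` yield different names because their prefixes are bound to different URIs, whereas `a:film` and `x:film` yield the same name: the URI, not the prefix, determines identity. Software written against the first namespace will therefore not recognize elements in the second.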
2431 |
2432 | It follows that there are significant tradeoffs to be considered
2433 | when deciding on a namespace change policy. If a vocabulary has no
2434 | extensibility points (that is, if it does not allow elements or
2435 | attributes from foreign namespaces or have a mechanism for dealing
2436 | with unrecognized names from the same namespace), it may be
2437 | absolutely necessary to change the namespace name. Languages that
2438 | allow some form of extensibility without requiring a change to the
2439 | namespace name are more likely to evolve gracefully.
2440 |
2447 | An XML format specification SHOULD include information about
2448 | change policies for XML namespaces.
2449 |
2450 |
2451 |
2452 | As an example of a change policy designed to reflect the variable
2453 | stability of a namespace, consider the W3C namespace policy for
2455 | documents on the W3C Recommendation track. The policy sets
2456 | expectations that the Working Group responsible for the namespace
2457 | may modify it in any way until a certain point in the process
2458 | ("Candidate Recommendation") at which point W3C constrains the set
2459 | of possible changes to the namespace in order to promote stable
2460 | implementations.
2461 |
2462 |
2463 | Note that since namespace names are URIs, the owner of a namespace
2464 | URI has the authority to decide the namespace change policy.
2465 |
2466 |
2467 |
2468 |
2469 | Extensibility
2470 |
2471 |
2472 | Requirements change over time. Successful technologies are adopted
2473 | and adapted by new users. Designers can facilitate the transition
2474 | process by making careful choices about extensibility during the
2475 | design of a language or protocol specification.
2476 |
2477 |
2478 | In making these choices, the designers must weigh the trade-offs
2479 | between extensibility, simplicity, and variability. A language
2480 | without extensibility mechanisms may be simpler and less variable,
2481 | improving initial interoperability. However, it's likely that
2482 | changes to that language will be more difficult, possibly more
2483 | complex and more variable, than if the initial design had provided
2484 | such mechanisms. This may decrease interoperability over the long
2485 | term.
2486 |
2493 | A specification SHOULD provide mechanisms that allow any party to
2494 | create extensions.
2495 |
2496 |
2497 |
2498 | Extensibility introduces variability, which has an impact on
2499 | interoperability. However, languages that have no extensibility
2500 | mechanisms may be extended in ad hoc ways that impact
2501 | interoperability as well. One key criterion of the mechanisms
2502 | provided by language designers is that they allow the extended
2503 | languages to remain in conformance with the original specification,
2504 | increasing the likelihood of interoperability.
2505 |
2512 | Extensibility MUST NOT interfere with conformance to the original
2513 | specification.
2514 |
2515 |
2516 |
2517 | Application needs determine the most appropriate extension strategy
2518 | for a specification. For example, applications designed to operate
2519 | in closed environments may allow specification designers to define
2520 | a versioning strategy that would be impractical at the scale of the
2521 | Web.
2522 |
2529 | A specification SHOULD specify agent behavior in the face of
2530 | unrecognized extensions.
2531 |
2532 |
2533 |
2534 | Two strategies have emerged as being particularly useful:
2535 |
2536 |
2537 |
"Must ignore": The agent ignores any content it does not
2538 | recognize.
2539 |
2540 |
"Must understand": The agent treats unrecognized markup as an
2541 | error condition.
2542 |
2543 |
2544 |
2545 | A powerful design approach is for the language to allow either form
2546 | of extension, but to distinguish explicitly between them in the
2547 | syntax.
2548 |
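A sketch of an agent that supports both strategies, distinguished by an explicit per-extension flag, is shown below. The field names, the triple layout, and the exception class are hypothetical and stand in for whatever syntax a real format would define.

```python
KNOWN = {"title", "year"}  # names this hypothetical agent understands


class UnrecognizedExtension(Exception):
    """Raised when a "must understand" extension is not recognized."""


def process(fields):
    """Process a list of (name, value, must_understand) triples."""
    result = {}
    for name, value, must_understand in fields:
        if name in KNOWN:
            result[name] = value
        elif must_understand:
            # "Must understand": unrecognized content is an error condition.
            raise UnrecognizedExtension(name)
        # Otherwise "must ignore": skip the unrecognized content.
    return result
```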
2549 |
2550 | Additional strategies include prompting the user for more input and
2551 | automatically retrieving data from available hypertext links. More
2552 | complex strategies are also possible, including mixing strategies.
2553 | For instance, a language can include mechanisms for overriding
2554 | standard behavior. Thus, a data format can specify "must ignore"
2555 | semantics but also allow for extensions that override that
2556 | semantics in light of application needs (for instance, with "must
2557 | understand" semantics for a particular extension).
2558 |
2559 |
2560 | Extensibility is not free. Providing hooks for extensibility is one
2561 | of many requirements to be factored into the costs of language
2562 | design. Experience suggests that the long-term benefits of a
2563 | well-designed extensibility mechanism generally outweigh the costs.
2564 |
2575 | Many modern data formats include mechanisms for composition. For
2576 | example:
2577 |
2578 |
2579 |
It is possible to embed text comments in some image formats,
2580 | such as JPEG/JFIF. Although these comments are embedded in the
2581 | containing data, they are not intended to affect the display of the
2582 | image.
2583 |
2584 |
There are container formats such as SOAP which fully expect
2585 | content from multiple namespaces but which provide an overall
2586 | semantic relationship of message envelope and payload.
2587 |
2588 |
The semantics of combining RDF documents containing multiple
2589 | vocabularies are well-defined.
2590 |
2591 |
2592 |
2593 | In principle, these relationships can be mixed and nested
2594 | arbitrarily. A SOAP message, for example, can contain an SVG image
2595 | that contains an RDF comment which refers to a vocabulary of terms
2596 | for describing the image.
2597 |
2598 |
2599 | Note, however, that for general XML there is no semantic model that
2600 | defines the interactions within XML documents with elements and/or
2601 | attributes from a variety of namespaces. Each application must
2602 | define how namespaces interact and what effect the namespace of an
2603 | element has on the element's ancestors, siblings, and descendants.
2604 |
2605 |
2606 | See TAG issues mixedUIXMLNamespace-33
2608 | (concerning the meaning of a document composed of content in
2609 | multiple namespaces), xmlFunctions-34
2611 | (concerning one approach for managing XML transformation and
2612 | composability), and RDFinXHTML-35
2614 | (concerning the interpretation of RDF when embedded in an XHTML
2615 | document).
2616 |
2617 |
2618 |
2619 |
2620 |
2621 | Separation of Content, Presentation, and Interaction
2622 |
2623 |
2624 | The Web is a heterogeneous environment where a wide variety of agents
2625 | provide access to content to users with a wide variety of
2626 | capabilities. It is good practice for authors to create content that
2627 | can reach the widest possible audience, including users with
2628 | graphical desktop computers, hand-held devices and mobile phones,
2629 | users with disabilities who may require speech synthesizers, and
2630 | devices not yet imagined. Furthermore, authors cannot predict in some
2631 | cases how an agent will display or process their content. Experience
2632 | shows that the separation of content, presentation, and interaction
2633 | promotes the reuse and device-independence of content; this follows
2634 | from the principle of orthogonal
2635 | specifications.
2636 |
2637 |
2638 | This separation also facilitates reuse of authored source content
2639 | across multiple delivery contexts. Sometimes, functional user
2640 | experiences suited to any delivery context can be generated by using
2641 | an adaptation process applied to a representation that does not
2642 | depend on the access mechanism. For more information about principles
2643 | of device-independence, see [DIPRINCIPLES].
2645 |
2652 | A specification SHOULD allow authors to separate content from both
2653 | presentation and interaction concerns.
2654 |
2655 |
2656 |
2657 | Note that when content, presentation, and interaction are separated
2658 | by design, agents need to recombine them. There is a recombination
2659 | spectrum, with "client does all" at one end and "server does all" at
2660 | the other.
2661 |
2662 |
2663 | There are advantages to each approach. For instance, when a client
2664 | (such as a mobile phone) communicates device capabilities to the
2665 | server (for example, using CC/PP), the server can tailor the
2666 | delivered content to fit that client. The server can, for example,
2667 | enable faster downloads by adjusting links to refer to lower
2668 | resolution images, smaller video or no video at all. Similarly, if
2669 | the content has been authored with multiple branches, the server can
2670 | remove unused branches before delivery. In addition, by tailoring the
2671 | content to match the characteristics of a target client, the server
2672 | can help reduce client side computation. However, specializing
2673 | content in this manner reduces caching efficiency.
2674 |
2675 |
2676 | On the other hand, designing content that can be recombined on
2677 | the client also tends to make that content applicable to a wider
2678 | range of devices. This design also improves caching efficiency and
2679 | offers users more presentation options. Media-dependent style sheets
2680 | can be used to tailor the content on the client side to particular
2681 | groups of target devices. For textual content with a regular and
2682 | repeating structure, the combined size of the text content plus the
2683 | style sheet is typically less than that of fully recombined content;
2684 | the savings improve further if the style sheet is reused by other
2685 | pages.
2686 |
2687 |
2688 | In practice, a combination of both approaches is often used. The
2689 | design decision about where on this spectrum an application should be
2690 | placed depends on the power of the client, the power and the load on
2691 | the server, and the bandwidth of the medium that connects them. If
2692 | the number of possible clients is unbounded, the application will
2693 | scale better if more computation is pushed to the client.
2694 |
2695 |
2696 | Of course, it may not be desirable to reach the widest possible
2697 | audience. Designers should consider appropriate technologies, such as
2698 | encryption and access control, for limiting
2699 | the audience.
2700 |
2701 |
2702 | Some data formats are designed to describe presentation (including
2703 | SVG and XSL Formatting Objects). Data formats such as these
2704 | demonstrate that one can only separate content from presentation (or
2705 | interaction) so far; at some point it becomes necessary to talk about
2706 | presentation. Per the principle of orthogonal specifications, these data formats
2708 | should only address presentation issues.
2709 |
2710 |
2711 | See the TAG issues formattingProperties-19
2713 | (concerning interoperability in the case of formatting properties and
2714 | names) and contentPresentation-26
2716 | (concerning the separation of semantic and presentational markup).
2717 |
2718 |
2719 |
2720 |
2721 | Hypertext
2722 |
2723 |
2724 | A defining characteristic of the Web is that it allows embedded
2725 | references to other resources via URIs. The simplicity of creating
2726 | hypertext links using absolute URIs (<a
2727 | href="http://www.example.com/foo">) and relative URI
2728 | references (<a href="foo"> and <a
2729 | href="foo#anchor">) is partly (perhaps largely) responsible
2730 | for the success of the hypertext Web as we know it today.
2731 |
2732 |
2733 | When one resource (representation) refers to another resource with a
2734 | URI, this constitutes a link between the two resources.
2735 | Additional metadata may also form part of the link (see [[!XLINK10]],
2736 | for example). Note: In this document, the term
2737 | "link" generally means "relationship", not "physical connection".
2738 |
2745 | A specification SHOULD provide ways to identify links to other
2746 | resources, including to secondary resources (via fragment
2747 | identifiers).
2748 |
2749 |
2750 |
2751 | Formats that allow content authors to use URIs instead of local
2752 | identifiers promote the network effect: the value of these formats
2753 | grows with the size of the deployed Web.
2754 |
2771 | A specification SHOULD allow content authors to use URIs without
2772 | constraining them to a limited set of URI schemes.
2773 |
2774 |
2775 |
2776 | What agents do with a hypertext link is not constrained by Web
2777 | architecture and may depend on application context. Users of
2778 | hypertext links expect to be able to navigate among representations
2779 | by following links.
2780 |
2787 | A data format SHOULD incorporate hypertext links if hypertext is
2788 | the expected user interface paradigm.
2789 |
2790 |
2791 |
2792 | Data formats that do not allow content authors to create hypertext
2793 | links lead to the creation of "terminal nodes" on the Web.
2794 |
2795 |
2796 |
2797 | URI references
2798 |
2799 |
2800 | Links are commonly expressed using URI references
2801 | (defined in section 4.2 of [[!URI]]), which may be combined with a
2802 | base URI to yield a usable URI. Section 5.1 of [[!URI]] explains
2803 | different ways to establish a base URI for a resource and
2804 | establishes a precedence among them. For instance, the base URI may
2805 | be a URI for the resource, or specified in a representation (see
2806 | the base elements provided by HTML and XML, and the
2807 | HTTP 'Content-Location' header). See also the section on links in XML.
2809 |
2810 |
2811 | Agents resolve a URI reference before using the resulting URI to
2812 | interact with another agent. URI references help in content
2813 | management by allowing content authors to design a representation
2814 | locally, i.e., without concern for which global identifier may
2815 | later be used to refer to the associated resource.
2816 |
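As an illustration of resolving URI references against a base URI, the following minimal sketch uses the `urljoin` function from Python's standard library. The base URI is a hypothetical value chosen for this example; the references are the ones shown in the Hypertext section.

```python
from urllib.parse import urljoin

# Hypothetical base URI, chosen only for illustration.
base = "http://www.example.com/docs/index.html"

# Two relative references and one absolute URI.
references = ["foo", "foo#anchor", "http://www.example.com/foo"]

# Resolve each reference against the base URI.
resolved = {ref: urljoin(base, ref) for ref in references}

for ref, uri in resolved.items():
    print(f"{ref!r} -> {uri}")
```

The relative references take their scheme, authority, and leading path from the base URI, while the absolute URI is returned unchanged. The same representation could therefore be published under a different base URI without editing its relative links.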
2817 |
2818 |
2819 |
2820 |
2821 | XML-Based Data Formats
2822 |
2823 |
2824 | Many data formats are XML-based, that is to say they
2825 | conform to the syntax rules defined in the XML specification
2826 | [[!XML10]] or [XML11]. This section discusses
2827 | issues that are specific to such formats. Anyone seeking guidance in
2828 | this area is urged to consult the "Guidelines For the Use of XML in
2829 | IETF Protocols" [IETFXML], which contains a
2830 | thorough discussion of the considerations that govern whether or not
2831 | XML ought to be used, as well as specific guidelines on how it ought
2832 | to be used. While it is directed at Internet applications with
2833 | specific reference to protocols, the discussion is generally
2834 | applicable to Web scenarios as well.
2835 |
2836 |
2837 | The discussion here should be seen as ancillary to the content of
2838 | [IETFXML]. Refer also to "XML Accessibility
2839 | Guidelines" [XAG] for help designing XML formats
2840 | that lower barriers to Web accessibility for people with
2841 | disabilities.
2842 |
2843 |
2844 |
2845 | When to use an XML-based format
2846 |
2847 |
2848 | XML defines textual data formats that are naturally suited to
2849 | describing data objects which are hierarchical and processed in a
2850 | chosen sequence. It is widely, but not universally, applicable for
2851 | data formats; an audio or video format, for example, is unlikely to
2852 | be well suited to expression in XML. Design constraints that would
2853 | suggest the use of XML include:
2854 |
2855 |
2856 |
Requirement for a hierarchical structure.
2857 |
2858 |
Need for a wide range of tools on a variety of platforms.
2859 |
2860 |
Need for data that can outlive the applications that currently
2861 | process it.
2862 |
2863 |
Ability to support internationalization in a self-describing
2864 | way that makes confusion over coding options unlikely.
2865 |
2866 |
Early detection of encoding errors with no requirement to "work
2867 | around" such errors.
2868 |
2869 |
A high proportion of human-readable textual content.
2870 |
2871 |
Potential composition of the data format with other XML-encoded
2872 | formats.
2873 |
2874 |
Desire for data easily parsed by both humans and machines.
2875 |
2876 |
Desire for vocabularies that can be invented in a distributed
2877 | manner and combined flexibly.
2878 |
2879 |
2880 |
2881 |
2882 |
2883 | Links in XML
2884 |
2885 |
2886 | Sophisticated linking mechanisms have been invented for XML
2887 | formats. XPointer allows links to address content that does not
2888 | have an explicit, named anchor. [[!XLINK10]] is an appropriate
2889 | specification for representing links in hypertext XML applications. XLink allows links to
2891 | have multiple ends and to be expressed either inline or in "link
2892 | bases" stored external to any or all of the resources identified by
2893 | the links it contains.
2894 |
2895 |
2896 | Designers of XML-based formats may consider using XLink and, for
2897 | defining fragment identifier syntax, using the XPointer framework
2898 | and XPointer element() Schemes.
2899 |
2900 |
2901 | XLink is not the only linking design that has been proposed for
2902 | XML, nor is it universally accepted as a good design. See also TAG
2903 | issue xlinkScope-23.
2905 |
2906 |
2907 |
2908 |
2909 | XML namespaces
2910 |
2911 |
2912 | The purpose of an XML namespace (defined in [XMLNS]) is to allow the deployment of XML vocabularies
2914 | (in which element and attribute names are defined) in a global
2915 | environment and to reduce the risk of name collisions in a given
2916 | document when vocabularies are combined. For example, the MathML
2917 | and SVG specifications both define the set element.
2918 | Although XML data from different formats such as MathML and SVG can
2919 | be combined in a single document, in this case there could be
2920 | ambiguity about which set element was intended. XML
2921 | namespaces reduce the risk of name collisions by taking advantage
2922 | of existing systems for allocating globally scoped names: the URI
2923 | system (see also the section on URI allocation). When using XML namespaces,
2925 | each local name in an XML vocabulary is paired with a URI (called
2926 | the namespace URI) to distinguish the local name from local names
2927 | in other vocabularies.
2928 |
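The pairing of local names with namespace URIs can be observed with a namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the wrapper element `doc` and the prefixes `m` and `s` are assumptions made for this example, while the two namespace URIs are those of MathML and SVG.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

# A made-up compound document; the wrapper element "doc" is hypothetical.
document = (
    f'<doc xmlns:m="{MATHML_NS}" xmlns:s="{SVG_NS}">'
    "<m:set/>"
    "<s:set/>"
    "</doc>"
)

root = ET.fromstring(document)

# ElementTree reports each element name as "{namespace URI}local name".
expanded_names = [child.tag for child in root]

for name in expanded_names:
    print(name)
```

Both children have the local name `set`, yet their expanded names differ because each is paired with a different namespace URI.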
2929 |
2930 | The use of URIs confers additional benefits. First, each URI/local
2931 | name pair can be mapped to another URI, grounding the terms of the
2932 | vocabulary in the Web. These terms may be important resources and
2933 | thus it is appropriate to be able to associate URIs with them.
2934 |
2935 |
2936 | For flat namespaces, concatenation is one useful mapping. If
2937 | namespace URIs that end with a hash (“#”) are chosen, then simple
2938 | concatenation of the namespace URI and the local name creates a URI
2939 | for a secondary resource (the identified term). This technique is
2940 | used for many [[!RDFXML]] namespaces.
2941 |
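The concatenation mapping can be sketched in a few lines of Python. The helper name `term_uri` is an assumption made for this example; the namespace URI is the one used by the RDF vocabulary, and `type` is one of its terms.

```python
from urllib.parse import urldefrag

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def term_uri(namespace_uri: str, local_name: str) -> str:
    """Map a (namespace URI, local name) pair to a URI by concatenation."""
    return namespace_uri + local_name


type_uri = term_uri(RDF_NS, "type")

# Because the namespace URI ends with "#", the local name becomes the
# fragment identifier of the resulting URI.
primary, fragment = urldefrag(type_uri)

print(type_uri)
print(primary, fragment)
```

This mapping is only one possibility; a format whose namespace URI does not end with a hash would need a different rule.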
2942 |
2943 | Other mappings are likely to be more suitable for hierarchical
2944 | namespaces; see the related TAG issue abstractComponentRefs-37.
2946 |
2947 |
2948 | Designers of XML-based data formats who declare namespaces thus
2949 | make it possible to reuse those data formats and combine them in
2950 | novel ways not yet imagined. Failure to declare namespaces makes
2951 | such reuse more difficult, even impractical in some cases.
2952 |
2959 | A specification that establishes an XML vocabulary SHOULD place
2960 | all element names and global attribute names in a namespace.
2961 |
2962 |
2963 |
2964 | Attributes are always scoped by the element on which they appear.
2965 | An attribute that is "global," that is, one that might meaningfully
2966 | appear on elements of many types, including elements in other
2967 | namespaces, should be explicitly placed in a namespace. Local
2968 | attributes, ones associated with only a particular element type,
2969 | need not be included in a namespace since their meaning will always
2970 | be clear from the context provided by that element.
2971 |
2972 |
2973 | The type attribute from the W3C XML Schema Instance
2974 | namespace "http://www.w3.org/2001/XMLSchema-instance" ([XMLSCHEMA], section 4.3.2) is an example of a
2976 | global attribute. It can be used by authors of any vocabulary to
2977 | make an assertion in instance data about the type of the element on
2978 | which it appears. As a global attribute, it must always be
2979 | qualified. The frame attribute on an HTML table is an
2980 | example of a local attribute. There is no value in placing that
2981 | attribute in a namespace since the attribute is unlikely to be
2982 | useful on an element other than an HTML table.
2983 |
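The difference between a global and a local attribute is visible in how a namespace-aware parser reports attribute names. In the sketch below, the element names `reading` and `temperature` and the local attribute `unit` are hypothetical; only the two W3C XML Schema namespace URIs come from the specifications.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document; "reading", "temperature" and "unit"
# are made-up names used only for this illustration.
document = (
    f'<reading xmlns:xsi="{XSI_NS}" xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<temperature xsi:type="xs:decimal" unit="C">21.5</temperature>'
    "</reading>"
)

temperature = ET.fromstring(document)[0]

# The global attribute carries its namespace URI; the local one does not.
attribute_names = set(temperature.attrib)

for name in sorted(attribute_names):
    print(name)
```

The qualified `type` attribute keeps the same expanded name on any element in any vocabulary, whereas `unit` has meaning only in the context of the element on which it appears.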
2984 |
2985 | Applications that rely on DTD processing must impose additional
2986 | constraints on the use of namespaces. DTDs perform validation based
2987 | on the lexical form of the element and attribute names in the
2988 | document. This makes prefixes syntactically significant in ways
2989 | that are not anticipated by [[!XMLNS]].
2990 |
2991 |
2992 |
2993 |
2994 | Namespace documents
2995 |
2996 |
2997 |
2998 | Story
2999 |
3000 |
3001 |
3002 | Nadia receives representation data from "weather.example.com"
3003 | in an unfamiliar data format. She knows enough about XML to
3004 | recognize which XML namespace the elements belong to. Since the
3005 | namespace is identified by the URI
3006 | "http://weather.example.com/2003/format", she asks her browser
3007 | to retrieve a representation of the identified resource. She
3008 | gets back some useful data that allows her to learn more about
3009 | the data format. Nadia's browser may also be able to perform
3010 | some operations automatically (i.e., unattended by a human
3011 | overseer) given data that has been optimized for software
3012 | agents. For example, her browser might, on Nadia's behalf,
3013 | download additional agents to process and render the format.
3014 |
3015 |
3016 |
3017 |
3018 | Another benefit of using URIs to build XML namespaces is that the
3019 | namespace URI can be used to identify an information resource that
3020 | contains useful information, machine-usable and/or human-usable,
3021 | about terms in the namespace. This type of information resource is
3022 | called a namespace document. When a namespace URI owner
3023 | provides a namespace document, it is authoritative for the
3024 | namespace.
3025 |
3026 |
3027 | There are many reasons to provide a namespace document. A person
3028 | might want to:
3029 |
3030 |
3031 |
understand the purpose of the namespace,
3032 |
3033 |
learn how to use the markup vocabulary in the namespace,
3034 |
3035 |
find out who controls it and associated policies,
3036 |
3037 |
request authority to access schemas or collateral material
3038 | about it, or
3039 |
3040 |
report a bug or situation that could be considered an error in
3041 | some collateral material.
3042 |
3043 |
3044 |
3045 | A processor might want to:
3046 |
3047 |
3048 |
retrieve a schema, for validation,
3049 |
3050 |
retrieve a style sheet, for presentation, or
3051 |
3052 |
retrieve ontologies, for making inferences.
3053 |
3054 |
3055 |
3056 | In general, there is no established best practice for creating
3057 | representations of a namespace document; application expectations
3058 | will influence what data format or formats are used. Application
3059 | expectations will also influence whether relevant information
3060 | appears directly in a representation or is referenced from it.
3061 |
3068 | The owner of an XML namespace name SHOULD make available material
3069 | intended for people to read and material optimized for software
3070 | agents in order to meet the needs of those who will use the
3071 | namespace vocabulary.
3072 |
3073 |
3074 |
3075 | The following are examples of data formats for
3076 | namespace documents: [[!OWL10]], [[!RDDL]], [[!XMLSCHEMA-1]], and
3077 | [[!XHTML11]]. Each of these formats meets different requirements
3078 | described above for satisfying the needs of an agent that wants
3079 | more information about the namespace. Note, however, issues related
3080 | to fragment identifiers and content
3081 | negotiation if content negotiation is used.
3082 |
3083 |
3084 | See TAG issues namespaceDocument-8
3086 | (concerning desired characteristics of namespace documents) and
3087 | abstractComponentRefs-37
3089 | (concerning the use of fragment identifiers with namespace names to
3090 | identify abstract components).
3091 |
3092 |
3093 |
3094 |
3095 | QNames in XML
3096 |
3097 |
3098 | Section 3 of "Namespaces in XML" [XMLNS]
3099 | provides a syntactic construct known as a QName for the compact
3100 | expression of qualified names in XML documents. A qualified name is
3101 | a pair consisting of a URI, which names a namespace, and a local
3102 | name placed within that namespace. "Namespaces in XML" provides for
3103 | the use of QNames as names for XML elements and attributes.
3104 |
3105 |
3106 | Other specifications, starting with [XSLT10],
3107 | have employed the idea of using QNames in contexts other than
3108 | element and attribute names, for example in attribute values and in
3109 | element content. However, general XML processors cannot reliably
3110 | recognize QNames as such when they are used in attribute values and
3111 | in element content; for example, the syntax of QNames overlaps with
3112 | that of URIs. Experience has also revealed other limitations to
3113 | QNames, such as losing namespace bindings after XML
3114 | canonicalization.
3115 |
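The following sketch illustrates why QNames in attribute values need application-level handling. A generic parser returns the attribute value as an opaque string, so the application has to track the in-scope namespace bindings itself. The document, the attribute name `ref`, and the helper `resolve_qnames` are all assumptions made for this example, and treating an unprefixed value as using the default namespace is a policy that individual specifications decide differently.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical document: the "ref" attribute value is meant to be a QName.
document = b'<root xmlns:ex="http://example.org/vocab#"><item ref="ex:widget"/></root>'


def resolve_qnames(xml_bytes, attribute):
    """Return (namespace URI, local name) pairs for a QName-valued attribute."""
    in_scope = []  # stack of (prefix, uri) bindings currently in scope
    results = []
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            in_scope.append(payload)
        elif event == "end-ns":
            in_scope.pop()
        elif attribute in payload.attrib:
            prefix, _, local = payload.attrib[attribute].rpartition(":")
            # The innermost binding wins, so search from the top of the stack.
            uri = next((u for p, u in reversed(in_scope) if p == prefix), None)
            results.append((uri, local))
    return results


# A generic parser sees only a string in the attribute value.
literal_value = ET.fromstring(document)[0].get("ref")

# The same string also parses as a URI whose scheme is "ex".
scheme = urlsplit(literal_value).scheme

# Only code that knows "ref" holds a QName can expand it.
expanded = resolve_qnames(document, "ref")

print(literal_value, scheme, expanded)
```

Nothing in the XML itself marks `ref` as QName-valued; that knowledge lives in the application, which is the limitation described above.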
3132 | Because QNames are compact, some specification designers have
3133 | adopted the same syntax as a means of identifying resources. Though
3134 | convenient as a shorthand notation, this usage has a cost. There is
3135 | no single, accepted way to convert a QName into a URI or vice
3136 | versa. Although QNames are convenient, they do not replace the URI
3137 | as the identification system of the Web. The use of QNames to
3138 | identify Web resources without providing a mapping to URIs is
3139 | inconsistent with Web architecture.
3140 |
3147 | A specification in which QNames serve as resource identifiers
3148 | MUST provide a mapping to URIs.
3149 |
3150 |
3151 |
3152 | See XML namespaces for examples of
3153 | some mapping strategies.
3154 |
3155 |
3156 | See also TAG issues rdfmsQnameUriMapping-6
3158 | (concerning the mapping of QNames to URIs), qnameAsId-18
3160 | (concerning the use of QNames as identifiers in XML content), and
3161 | abstractComponentRefs-37
3163 | (concerning the use of fragment identifiers with namespace names to
3164 | identify abstract components).
3165 |
3166 |
3167 |
3168 |
3169 | XML ID semantics
3170 |
3171 |
3172 | Consider the following fragment of XML: <section
3173 | name="foo">. Does the section element have what the
3174 | XML Recommendation refers to as the ID foo (i.e.,
3175 | "foo" must not appear in the surrounding XML document more than
3176 | once)? One cannot answer this question by examining the element and
3177 | its attributes alone. In XML, the quality of "being an ID" is
3178 | associated with the type of an attribute, not its name. Finding the
3179 | IDs in a document requires additional processing.
3180 |
3181 |
3182 |
Processing the document with a processor that recognizes DTD
3183 | attribute list declarations (in the external or internal subset)
3184 | might reveal a declaration that identifies the name
3185 | attribute as an ID. Note: This processing is not
3186 | necessarily part of validation. A non-validating, DTD-aware
3187 | processor can recognize IDs.
3188 |
3189 |
Processing the document with a W3C XML schema might reveal an
3190 | element declaration that identifies the name attribute
3191 | as a W3C XML Schema ID.
3192 |
3193 |
In practice, processing the document with another schema
3194 | language, such as RELAX NG [RELAXNG], might
3195 | reveal the attributes declared to be of type ID in the XML Schema sense.
3196 | Many modern specifications begin processing XML at the Infoset
3197 | [INFOSET] level and do not specify
3198 | normatively how an Infoset is constructed. For those
3199 | specifications, any process that establishes the ID type in the
3200 | Infoset (and Post Schema Validation Infoset (PSVI)
3201 | defined in [XMLSCHEMA]) may usefully
3202 | identify the attributes of type ID.
3203 |
3204 |
In practice, applications may have independent means (such as
3205 | those defined in the XPointer specification, [XPTRFR]
3208 | section 3.2) of locating identifiers inside a document.
3209 |
3210 |
3211 |
3212 | To further complicate matters, DTDs establish the ID type in the
3213 | Infoset whereas W3C XML Schema produces a PSVI but does not modify
3214 | the original Infoset. This leaves open the possibility that a
3215 | processor might only look in the Infoset and consequently would
3216 | fail to recognize schema-assigned IDs.
3217 |
3218 |
3219 | See the TAG issue xmlIDSemantics-32
3221 | for additional background information and [XML-ID] for a solution under development.
3223 |
3224 |
3225 |
3226 |
3227 | Media types for XML
3228 |
3229 |
3230 | RFC 3023 defines the Internet media types "application/xml" and
3231 | "text/xml", and describes a convention whereby XML-based data
3232 | formats use Internet media types with a "+xml" suffix, for example
3233 | "image/svg+xml".
3234 |
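A check based on these conventions might look like the following sketch. The function name `is_xml_media_type` is an assumption made for this example, and the check is a heuristic covering only the two generic types and the "+xml" suffix named above.

```python
def is_xml_media_type(media_type: str) -> bool:
    """Heuristic: does this media type follow the conventions for XML?"""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = media_type.split(";", 1)[0].strip().lower()
    return essence in {"application/xml", "text/xml"} or essence.endswith("+xml")


for candidate in ("image/svg+xml", "application/xml; charset=utf-8", "text/html"):
    print(candidate, is_xml_media_type(candidate))
```

An agent that does not recognize a specific "+xml" type can use a check like this to fall back to generic XML processing.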
3235 |
3236 | There are two problems associated with the “text” media types:
3237 | First, for data identified as "text/*", Web intermediaries are
3238 | allowed to "transcode", i.e., convert one character encoding to
3239 | another. Transcoding may make the self-description false or may
3240 | cause the document to be not well-formed.
3241 |
3248 | In general, a representation provider SHOULD NOT assign Internet
3249 | media types beginning with "text/" to XML representations.
3250 |
3251 |
3252 |
3253 | Second, representations whose Internet media types begin with
3254 | "text/" are required, unless the charset parameter is
3255 | specified, to be considered to be encoded in US-ASCII. Since the
3256 | syntax of XML is designed to make documents self-describing, it is
3257 | good practice to omit the charset parameter, and since
3258 | XML is very often not encoded in US-ASCII, the use of "text/"
3259 | Internet media types effectively precludes this good practice.
3260 |
3267 | In general, a representation provider SHOULD NOT specify the
3268 | character encoding for XML data in protocol headers since the
3269 | data is self-describing.
3270 |
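The effect of transcoding on self-describing XML can be reproduced with a short sketch using Python's standard `xml.etree.ElementTree` module. The document content is made up for this example, and the "intermediary" is simulated by re-encoding the same text without updating its encoding declaration.

```python
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'

# The bytes match the encoding declaration, so the parser needs no outside help.
original = text.encode("iso-8859-1")
parsed = ET.fromstring(original).text

# A simulated intermediary transcodes to UTF-8 but leaves the declaration as is.
transcoded = text.encode("utf-8")
garbled = ET.fromstring(transcoded).text

print(repr(parsed))
print(repr(garbled))
```

The first parse recovers the intended text from the document alone. After transcoding, the declaration no longer describes the bytes, and the parser silently produces different characters.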
3271 |
3272 |
3273 |
3274 |
3275 | Fragment identifiers in XML
3276 |
3277 |
3278 | The section on media types and
3279 | fragment identifier semantics discusses the interpretation of
3280 | fragment identifiers. Designers of an XML-based data format
3281 | specification should define the semantics of fragment identifiers
3282 | in that format. The XPointer Framework [XPTRFR] provides an interoperable starting point.
3284 |
3285 |
3286 | When the media type assigned to representation data is
3287 | "application/xml", there are no semantics defined for fragment
3288 | identifiers, and authors should not make use of fragment
3289 | identifiers in such data. The same is true if the assigned media
3290 | type has the suffix "+xml" (defined in "XML Media Types" [RFC3023]), and the data format specification does
3292 | not specify fragment identifier semantics. In short, just knowing
3293 | that content is XML does not provide information about fragment
3294 | identifier semantics.
3295 |
3296 |
3297 | Many people assume that the fragment identifier #abc,
3298 | when referring to XML data, identifies the element in the document
3299 | with the ID "abc". However, there is no normative support for this
3300 | assumption. A revision of RFC 3023 is expected to address this.
3301 |
3313 | Data formats enable the creation of new applications to make use of
3314 | the information space infrastructure. The Semantic Web is one such
3315 | application, built on top of RDF [RDFXML]. This
3316 | document does not discuss the Semantic Web in detail; the TAG expects
3317 | that future volumes of this document will. See the related TAG issue
3318 | httpRange-14.
3320 |
3321 |
3322 |
3323 |
3324 |
3325 | General Architecture Principles
3326 |
3327 |
3328 | A number of general architecture principles apply to all three bases of
3329 | Web architecture.
3330 |
3331 |
3332 |
3333 | Orthogonal Specifications
3334 |
3335 |
3336 | Identification, interaction, and representation are orthogonal
3337 | concepts, meaning that technologies used for identification,
3338 | interaction, and representation may evolve independently. For
3339 | instance:
3340 |
3341 |
3342 |
Resources are identified with URIs. URIs can be published without
3343 | building any representations of the resource or determining whether
3344 | any representations are available.
3345 |
3346 |
A generic URI syntax allows agents to function in many cases
3347 | without knowing specifics of URI schemes.
3348 |
3349 |
In many cases one may change the representation of a resource
3350 | without disrupting references to the resource (for example, by using
3351 | content negotiation).
3352 |
3353 |
3354 |
3355 | When two specifications are orthogonal, one may change one without
3356 | requiring changes to the other, even if one has dependencies on the
3357 | other. For example, although the HTTP specification depends on the
3358 | URI specification, the two may evolve independently. This
3359 | orthogonality increases the flexibility and robustness of the Web.
3360 | For example, one may refer by URI to an image without knowing
3361 | anything about the format chosen to represent the image. This has
3362 | facilitated the introduction of image formats such as PNG and SVG
3363 | without disrupting existing references to image resources.
3364 |
3371 | Orthogonal abstractions benefit from orthogonal specifications.
3372 |
3373 |
3374 |
3375 | Experience demonstrates that problems arise where orthogonal concepts
3376 | occur in a single specification. Consider, for example, the HTML
3377 | specification which includes the orthogonal x-www-form-urlencoded
3378 | specification. Software developers (for example, of [CGI] applications) might have an easier time finding the
3380 | specification if it were published separately and then cited from the
3381 | HTTP, URI, and HTML specifications.
3382 |
3383 |
3384 | Problems also arise when specifications attempt to modify orthogonal
3385 | abstractions described elsewhere. An historical
3387 | version of the HTML specification added a "Refresh"
3388 | value to the http-equiv attribute of the
3389 | meta element. It was defined to be equivalent to the
3390 | HTTP header of the same name. The authors of the HTTP specification
3391 | ultimately decided not to provide this header and that made the two
3392 | specifications awkwardly at odds with each other. The W3C HTML
3393 | Working Group eventually removed the "Refresh" value.
3394 |
3395 |
3396 | A specification should clearly indicate which features overlap with
3397 | those governed by another specification.
3398 |
3399 |
3400 |
3401 |
3402 | Extensibility
3403 |
3404 |
3405 | The information in the Web and the technologies used to represent
3406 | that information change over time. Extensibility is the property of a
3407 | technology that promotes evolution without sacrificing
3408 | interoperability. Some examples of successful technologies designed
3409 | to allow change while minimizing disruption include:
3410 |
3411 |
3412 |
the fact that URI schemes are orthogonally specified;
3413 |
3414 |
the use of an open set of Internet media types in mail and HTTP
3415 | to specify document interpretation;
3416 |
3417 |
the separation of the generic XML grammar and the open set of XML
3418 | namespaces for element and attribute names;
3419 |
3420 |
extensibility models in Cascading Style Sheets (CSS), XSLT 1.0,
3421 | and SOAP;
3422 |
3423 |
user agent plug-ins.
3424 |
3425 |
3426 |
3427 | An example of an unsuccessful extension mechanism is HTTP mandatory
3428 | extensions [HTTPEXT]. The community has sought
3429 | mechanisms to extend HTTP, but apparently the costs of the mandatory
3430 | extension proposal (notably in complexity) outweighed the benefits
3431 | and thus hampered adoption.
3432 |
3433 |
3434 | Below we discuss the property of "extensibility," exhibited by URIs,
3435 | some data formats, and some protocols (through the incorporation of
3436 | new messages).
3437 |
3438 |
3439 | Subset language: one language is a subset (or "profile")
3440 | of a second language if any document in the first language is also a
3441 | valid document in the second language and has the same interpretation
3442 | in the second language.
3443 |
3444 |
3445 | Extended language: If one language is a subset of another,
3446 | the latter superset is called an extended language; the difference
3447 | between the languages is called the extension. Clearly, extending a
3448 | language is better for interoperability than creating an incompatible
3449 | language.
3450 |
3451 |
3452 | Ideally, many instances of a superset language can be safely and
3453 | usefully processed as though they were in the subset language.
3454 | Languages that can evolve this way, allowing applications to provide
3455 | new information when necessary while still interoperating with
3456 | applications that only understand a subset of the current language,
3457 | are said to be "extensible." Language designers can facilitate
3458 | extensibility by defining the default behavior of unknown
3459 | extensions: for example, that they be ignored (in some defined way) or
3460 | that they be considered errors.
3461 |
3462 |
3463 | For example, from early on in the Web, HTML agents followed the
3464 | convention of ignoring unknown tags. This choice left room for
3465 | innovation (i.e., non-standard elements) and encouraged the
3466 | deployment of HTML. However, interoperability problems arose as well.
3467 | In this type of environment, there is an inevitable tension between
3468 | interoperability in the short term and the desire for extensibility.
3469 | Experience shows that designs that strike the right balance between
3470 | allowing change and preserving interoperability are more likely to
3471 | thrive and are less likely to disrupt the Web community. Orthogonal specifications help reduce the
3473 | risk of disruption.
3474 |
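The two default behaviors for unknown extensions, ignoring them or treating them as errors, can be contrasted in a small sketch. The vocabulary in `KNOWN`, the function `render`, and the sample document are assumptions made for this example. The lenient mode mirrors the convention of ignoring an unknown tag while still processing its content.

```python
import xml.etree.ElementTree as ET

KNOWN = {"p", "em"}  # hypothetical vocabulary understood by this agent


def render(element, strict=False):
    """Collect text content, either ignoring or rejecting unknown tags."""
    if element.tag not in KNOWN and strict:
        raise ValueError(f"unknown element: {element.tag}")
    # In lenient mode the unknown tag is dropped but its content is kept.
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, strict))
        parts.append(child.tail or "")
    return "".join(parts)


doc = ET.fromstring("<p>Hello <blink>new</blink> <em>world</em></p>")

lenient_result = render(doc)

try:
    render(doc, strict=True)
    strict_failed = False
except ValueError:
    strict_failed = True

print(lenient_result, strict_failed)
```

The lenient agent keeps working when the language is extended, at the cost of possibly misrepresenting content it does not understand; the strict agent avoids that risk but cannot process extended documents at all.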
3488 | Errors occur in networked information systems. An error condition can
3489 | be well-characterized (e.g., well-formedness errors in XML or 4xx
3490 | client errors in HTTP) or arise unpredictably. Error
3491 | correction means that an agent repairs a condition so that
3492 | within the system, it is as though the error never occurred. One
3493 | example of error correction involves data retransmission in response
3494 | to a temporary network failure. Error recovery means that
3495 | an agent does not repair an error condition but continues processing
3496 | by addressing the fact that the error has occurred.
3497 |
3498 |
3499 | Agents frequently correct errors without user awareness,
3500 | sparing users the details of complex network communications. On the
3501 | other hand, it is important that agents recover from error
3502 | in a way that is evident to users, since the agents are acting on
3503 | their behalf.
3504 |
3511 | Agents that recover from error by making a choice without the
3512 | user's consent are not acting on the user's behalf.
3513 |
3514 |
3515 |
3516 | An agent is not required to interrupt the user (e.g., by popping up a
3517 | confirmation box) to obtain consent. The user may indicate consent
3518 | through pre-selected configuration options, modes, or selectable user
3519 | interface toggles, with appropriate reporting to the user when the
3520 | agent detects an error. Agent developers should not ignore usability
3521 | issues when designing error recovery behavior.
3522 |
3523 |
3524 | To promote interoperability, specification designers should identify
3525 | predictable error conditions. Experience has led to the following
3526 | observations about error-handling approaches.
3527 |
3528 |
3529 |
Protocol designers should provide enough information about an
3530 | error condition so that an agent can address the error condition. For
3531 | instance, an HTTP 404 status code (not found) is useful because it
3532 | allows user agents to present relevant information to users, enabling
3533 | them to contact the representation provider in case of problems.
3534 |
3535 |
Experience with the cost of building a user agent to handle the
3536 | diverse forms of ill-formed HTML content convinced the designers of
3537 | the XML specification to require that agents fail upon encountering
3538 | ill-formed content. Because users are unlikely to tolerate such
3539 | failures, this design choice has pressured all parties into
3540 | respecting XML's constraints, to the benefit of all.
3541 |
3542 |
An agent that encounters unrecognized content may handle it in a
3543 | number of ways, including by considering it an error; see also the
3544 | section on extensibility and
3545 | versioning.
3546 |
3547 |
Error behavior that is appropriate for a person may not be
3548 | appropriate for software. People are capable of exercising judgement
3549 | in ways that software applications generally cannot. An informal
3550 | error response may suffice for a person but not for a processor.
3551 |
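The first observation above can be sketched as follows. The function name `describe_failure` and the wording of the messages are assumptions made for illustration; only the status code registry in Python's standard `http` module is relied upon.

```python
from http import HTTPStatus


def describe_failure(status_code: int, uri: str) -> str:
    """Turn a protocol-level status code into a message a user can act on."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # The code is not in the standard registry known to this library.
        phrase = "Unrecognized status"
    message = f"{uri}: {status_code} {phrase}."
    if status_code == HTTPStatus.NOT_FOUND:
        message += (
            " The server has no representation for this URI;"
            " consider contacting the representation provider."
        )
    return message


print(describe_failure(404, "http://example.org/missing"))
```

Because the protocol distinguishes "not found" from other failures, the agent can give the user something more useful than a generic error.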
3552 |
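The fail-on-error behavior described above for XML can be observed with any conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the wrapper name `parse_strict` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_strict(document: str) -> ET.Element:
    """Return the root element, or raise ET.ParseError on ill-formed input."""
    return ET.fromstring(document)


print(parse_strict("<a><b/></a>").tag)

try:
    parse_strict("<a><b></a>")  # <b> is never closed
except ET.ParseError as err:
    # The parser stops here instead of guessing what the author meant.
    print("rejected:", err)
```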
3553 |
3554 | See the TAG issue contentTypeOverride-24,
3556 | which concerns the source of authoritative metadata.
3557 |
3558 |
3559 |
3560 |
3561 | Protocol-based Interoperability
3562 |
3563 |
3564 | The Web follows Internet tradition in that its important interfaces
3565 | are defined in terms of protocols, by specifying the syntax,
3566 | semantics, and sequencing constraints of the messages interchanged.
3567 | Protocols designed to be resilient in the face of widely varying
3568 | environments have helped the Web scale and have facilitated
3569 | communication across multiple trust boundaries. Traditional
3570 | application programming interfaces (APIs) do not always
3571 | take these constraints into account, nor should they be required to.
3572 | One effect of protocol-based design is that the technology shared
3573 | among agents often lasts longer than the agents themselves.
3574 |
3575 |
3576 | It is common for programmers working with the Web to write code that
3577 | generates and parses these messages directly. It is less common, but
3578 | not unusual, for end users to have direct exposure to these messages.
3579 | It is often desirable to provide users with access to format and
3580 | protocol details: allowing them to “view source,” whereby they may
3581 | gain expertise in the workings of the underlying system.
3582 |
3594 | The practice of providing multiple
3595 | representations available via the same URI. Which representation is
3596 | served depends on negotiation between the requesting agent and the
3597 | agent serving the representations.
3598 |
3659 | An information resource that contains useful
3660 | information, machine-processable and/or human-readable, about terms
3661 | in a particular XML namespace.
3662 |
3688 | A resource related to another resource through
3689 | the primary resource with additional identifying information (the
3690 | fragment identifier).
3691 |
3696 | One language is a subset of a second language
3697 | if any document in the first language is also a valid document in the
3698 | second language and has the same interpretation in the second
3699 | language.
3700 |
3701 |
3702 | URI
3703 |
3704 |
3705 | Acronym for Uniform Resource Identifier.
3706 |
3732 | The social expectation that once a URI
3733 | identifies a particular resource, it should continue indefinitely to
3734 | refer to that resource.
3735 |
3833 | Cool URIs
3834 | don't change, T. Berners-Lee, W3C, 1998. Available at
3835 | http://www.w3.org/Provider/Style/URI. Note that the title is
3836 | somewhat misleading. It is not the URIs that change; it is what
3837 | they identify.
3838 |
3852 |
3854 | Mandatory Extensions in HTTP, H. Frystyk Nielsen, P.
3855 | Leach, S. Lawrence, 20 January 1998. This expired IETF Internet
3856 | Draft is available at
3857 | http://www.w3.org/Protocols/HTTP/ietf-http-ext/draft-frystyk-http-mandatory.
3858 |
3871 | IETF Guidelines
3873 | For The Use of XML in IETF Protocols, S. Hollenbeck, M.
3874 | Rose, L. Masinter, eds., 2 November 2002. This IETF Internet Draft
3875 | is available at
3876 | http://www.imc.org/ietf-xml-use/xml-guidelines-07.txt. If this
3877 | document is no longer available, refer to the ietf-xml-use mailing
3879 | list.
3880 |
3885 | XML Information Set
3886 | (Second Edition), R. Tobin, J. Cowan, Editors, W3C
3887 | Recommendation, 04 February 2004,
3888 | http://www.w3.org/TR/2004/REC-xml-infoset-20040204. Latest version available at
3890 | http://www.w3.org/TR/xml-infoset.
3891 |
3918 | OWL Web Ontology Language
3919 | Reference, M. Dean, G. Schreiber, Editors, W3C Recommendation,
3920 | 10 February 2004,
3921 | http://www.w3.org/TR/2004/REC-owl-ref-20040210/. Latest version available at
3923 | http://www.w3.org/TR/owl-ref/.
3924 |
3967 |
3969 | Representational State Transfer (REST), Chapter 5 of
3970 | "Architectural Styles and the Design of Network-based Software
3971 | Architectures", Doctoral Thesis of R. T. Fielding, 2000. Designers
3972 | of protocol specifications in particular should invest time in
3973 | understanding the REST model and the relevance of its principles to
3974 | a given design. These principles include statelessness, clear
3975 | assignment of roles to parties, uniform address space, and a
3976 | limited, uniform set of verbs. Available at
3977 | http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm.
3978 |
4035 | IETF RFC 2616:
4036 | Hypertext Transfer Protocol - HTTP/1.1, R. Fielding, J. Gettys, J.
4037 | Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June
4038 | 1999. Available at http://www.ietf.org/rfc/rfc2616.txt.
4039 |
4053 | IETF RFC 2718:
4054 | Guidelines for new URL Schemes, L. Masinter, H.
4055 | Alvestrand, D. Zigmond, R. Petke, November 1999. Available at
4056 | http://www.ietf.org/rfc/rfc2718.txt.
4057 |
4070 | IETF RFC 3023:
4071 | XML Media Types, M. Murata, S. St. Laurent, D. Kohn,
4072 | January 2001. Available at http://www.ietf.org/rfc/rfc3023.txt.
4073 |
4113 | SOAP Version 1.2 Part
4114 | 1: Messaging Framework, J. Moreau, N. Mendelsohn, H. Frystyk
4115 | Nielsen, et al., Editors, W3C Recommendation,
4116 | 24 June 2003,
4117 | http://www.w3.org/TR/2003/REC-soap12-part1-20030624/. Latest version available
4119 | at http://www.w3.org/TR/soap12-part1/.
4120 |
4187 | XML Linking Language (XLink)
4188 | Version 1.0, E. Maler, S. DeRose, D. Orchard, Editors, W3C
4189 | Recommendation, 27 June 2001,
4190 | http://www.w3.org/TR/2001/REC-xlink-20010627/. Latest version available at
4192 | http://www.w3.org/TR/xlink/.
4193 |
4198 | xml:id Version 1.0, D.
4199 | Veillard, J. Marsh, Editors, W3C Working Draft (work in progress),
4200 | 07 April 2004,
4201 | http://www.w3.org/TR/2004/WD-xml-id-20040407. Latest version available at
4203 | http://www.w3.org/TR/xml-id/.
4204 |
4209 | Extensible Markup Language
4210 | (XML) 1.0 (Third Edition), F. Yergeau, J. Paoli, C. M.
4211 | Sperberg-McQueen, et al., Editors, W3C Recommendation,
4212 | 04 February 2004,
4213 | http://www.w3.org/TR/2004/REC-xml-20040204. Latest version available at
4215 | http://www.w3.org/TR/REC-xml.
4216 |
4221 | Extensible Markup Language
4222 | (XML) 1.1, J. Paoli, C. M. Sperberg-McQueen, J. Cowan, et
4223 | al., Editors, W3C Recommendation, 04 February 2004,
4224 | http://www.w3.org/TR/2004/REC-xml11-20040204/. Latest version available at
4226 | http://www.w3.org/TR/xml11/.
4227 |
4232 | Namespaces in XML
4233 | 1.1, R. Tobin, D. Hollander, A. Layman, et al.,
4234 | Editors, W3C Recommendation, 04 February 2004,
4235 | http://www.w3.org/TR/2004/REC-xml-names11-20040204. Latest version available at
4237 | http://www.w3.org/TR/xml-names11/.
4238 |
4243 | XML Schema Part 1:
4244 | Structures, D. Beech, M. Maloney, H. S. Thompson, et
4245 | al., Editors, W3C Recommendation, 02 May 2001,
4246 | http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/. Latest version available at
4248 | http://www.w3.org/TR/xmlschema-1/.
4249 |
4254 | XPointer
4255 | Framework, E. Maler, N. Walsh, P. Grosso, et al.,
4256 | Editors, W3C Recommendation, 25 March 2003,
4257 | http://www.w3.org/TR/2003/REC-xptr-framework-20030325/. Latest version available
4259 | at http://www.w3.org/TR/xptr-framework/.
4260 |
4294 | Character Model for the
4295 | World Wide Web 1.0: Fundamentals, R. Ishida, M. J. Dürst, M.
4296 | Wolf, et al., Editors, W3C Working Draft (work in
4297 | progress), 25 February 2004,
4298 | http://www.w3.org/TR/2004/WD-charmod-20040225/. Latest version available at
4300 | http://www.w3.org/TR/charmod/.
4301 |
4317 | Web
4319 | Architecture: Extensible Languages, T. Berners-Lee, D.
4320 | Connolly, 10 February 1998. This W3C Note is available at
4321 | http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210.
4322 |
4327 | Principled
4329 | Design of the Modern Web Architecture, R.T. Fielding
4330 | and R.N. Taylor, UC Irvine. In Proceedings of the 2000
4331 | International Conference on Software Engineering (ICSE 2000),
4332 | Limerick, Ireland, June 2000, pp. 407-416. This document is
4333 | available at
4334 | http://www.ics.uci.edu/~fielding/pubs/webarch_icse2000.pdf.
4335 |
4340 | QA Framework:
4341 | Specification Guidelines, D. Hazaël-Massieux, L. Rosenthal,
4342 | L. Henderson, et al., Editors, W3C Working Draft (work
4343 | in progress), 30 August 2004,
4344 | http://www.w3.org/TR/2004/WD-qaframe-spec-20040830/. Latest version available
4346 | at http://www.w3.org/TR/qaframe-spec/.
4347 |
4361 | Variability in
4362 | Specifications, L. Rosenthal, D. Hazaël-Massieux, Editors,
4363 | W3C Working Draft (work in progress), 30 August 2004,
4364 | http://www.w3.org/TR/2004/WD-spec-variability-20040830/. Latest version
4366 | available at http://www.w3.org/TR/spec-variability/.
4367 |
4372 | User Agent Accessibility
4373 | Guidelines 1.0, J. Gunderson, I. Jacobs, E. Hansen, Editors,
4374 | W3C Recommendation, 17 December 2002,
4375 | http://www.w3.org/TR/2002/REC-UAAG10-20021217/. Latest version available at
4377 | http://www.w3.org/TR/UAAG10/.
4378 |
4383 | Web Content Accessibility
4384 | Guidelines 2.0, W. Chisholm, J. White, B. Caldwell, et
4385 | al., Editors, W3C Working Draft (work in progress),
4386 | 30 July 2004,
4387 | http://www.w3.org/TR/2004/WD-WCAG20-20040730/. Latest version available at
4389 | http://www.w3.org/TR/WCAG20/.
4390 |
4395 | Web Services
4396 | Architecture, D. Booth, F. McCabe, E. Newcomer, et
4397 | al., Editors, W3C Note, 11 February 2004,
4398 | http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/. Latest version available at
4400 | http://www.w3.org/TR/ws-arch/.
4401 |
4406 | XML Accessibility
4407 | Guidelines, S. B. Palmer, C. McCathieNevile, D. Dardailler,
4408 | Editors, W3C Working Draft (work in progress),
4409 | 03 October 2002,
4410 | http://www.w3.org/TR/2002/WD-xag-20021003. Latest version available at
4412 | http://www.w3.org/TR/xag.
4413 |
4414 |
4415 |
4416 |
4417 |
4418 |
4419 |
4420 | Acknowledgments
4421 |
4422 |
4423 | This document was authored by the W3C Technical Architecture Group
4424 | which included the following participants: Tim Berners-Lee (co-Chair,
4425 | W3C), Tim Bray (Antarctica Systems), Dan Connolly (W3C), Paul Cotton
4426 | (Microsoft Corporation), Roy Fielding (Day Software), Mario Jeckle
4427 | (Daimler Chrysler), Chris Lilley (W3C), Noah Mendelsohn (IBM), David
4428 | Orchard (BEA Systems), Norman Walsh (Sun Microsystems), and Stuart
4429 | Williams (co-Chair, Hewlett-Packard).
4430 |
4431 |
4432 | The TAG appreciates the many contributions on the TAG's public mailing
4433 | list, www-tag@w3.org (archive), which have
4435 | helped to improve this document.
4436 |
4437 |
4438 | In addition, contributions by David Booth, Erik Bruchez, Kendall Clark,
4439 | Karl Dubost, Bob DuCharme, Martin Duerst, Olivier Fehr, Al Gilman, Tim
4440 | Goodwin, Elliotte Rusty Harold, Tony Hammond, Sandro Hawke, Ryan Hayes,
4441 | Dominique Hazaël-Massieux, Masayasu Ishikawa, David M. Karr, Graham
4442 | Klyne, Jacek Kopecky, Ken Laskey, Susan Lesch, Håkon Wium Lie, Frank
4443 | Manola, Mark Nottingham, Bijan Parsia, Peter F. Patel-Schneider, David
4444 | Pawson, Michael Sperberg-McQueen, Patrick Stickler, and Yuxiao Zhao are
4445 | gratefully acknowledged.
4446 |