├── .DS_Store ├── .gitignore ├── use-cases ├── .DS_Store ├── respec-config.js ├── draft.md └── index.html ├── gap-analysis ├── .DS_Store ├── respec-config.js └── index.html ├── user-scenarios ├── .DS_Store ├── respec-config.js ├── draft.md └── index.html ├── samples ├── audio │ ├── case-sub.mp3 │ ├── case-audio.mp3 │ ├── case-break.mp3 │ ├── case-phoneme.mp3 │ ├── case-prosody.mp3 │ ├── case-raven.mp3 │ ├── case-say-as.mp3 │ └── case-emphasis.mp3 ├── index.html ├── respec-config.js ├── w3cptf-singleattr-tests.html └── w3cptf-multiattr-tests.html ├── w3c.json ├── docs ├── bestpractices.md └── explainer.md ├── LICENSE.md ├── CODE_OF_CONDUCT.md ├── scripts ├── DICTIONARY └── proof.js ├── CONTRIBUTING.md ├── index.html ├── common └── acknowledgements.html ├── .github └── workflows │ └── auto-publish.yml ├── README.md ├── presentations └── template.html ├── technical-approach ├── respec-config.js ├── ssml-json-schema-w3cptf.json └── appendixJSON.html ├── explainer ├── respec-config.js └── index.html └── gap-analysis_and_use-case └── respec-config.js /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | node_modules 3 | docs/expainer.md 4 | .vscode -------------------------------------------------------------------------------- /use-cases/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/use-cases/.DS_Store -------------------------------------------------------------------------------- /gap-analysis/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/gap-analysis/.DS_Store -------------------------------------------------------------------------------- /user-scenarios/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/user-scenarios/.DS_Store -------------------------------------------------------------------------------- /samples/audio/case-sub.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-sub.mp3 -------------------------------------------------------------------------------- /samples/audio/case-audio.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-audio.mp3 -------------------------------------------------------------------------------- /samples/audio/case-break.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-break.mp3 -------------------------------------------------------------------------------- /samples/audio/case-phoneme.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-phoneme.mp3 -------------------------------------------------------------------------------- /samples/audio/case-prosody.mp3: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-prosody.mp3 -------------------------------------------------------------------------------- /samples/audio/case-raven.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-raven.mp3 -------------------------------------------------------------------------------- /samples/audio/case-say-as.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-say-as.mp3 -------------------------------------------------------------------------------- /samples/audio/case-emphasis.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/w3c/pronunciation/HEAD/samples/audio/case-emphasis.mp3 -------------------------------------------------------------------------------- /w3c.json: -------------------------------------------------------------------------------- 1 | { 2 | "group": [83907] 3 | , "contacts": ["michael-n-cooper"] 4 | , "repo-type": "rec-track" 5 | } 6 | -------------------------------------------------------------------------------- /docs/bestpractices.md: -------------------------------------------------------------------------------- 1 | #Best Practices for Implementing (or authoring?) Spoken Presentation and Pronunciation of Web Content (DRAFT) 2 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | All documents in this Repository are licensed by contributors 2 | under the 3 | [W3C Document License](https://www.w3.org/Consortium/Legal/copyright-documents). 4 | 5 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | All documentation, code and communication under this repository are covered by the [W3C Code of Ethics and Professional Conduct](https://www.w3.org/Consortium/cepc/). 4 | -------------------------------------------------------------------------------- /scripts/DICTIONARY: -------------------------------------------------------------------------------- 1 | Alexa 2 | DOM 3 | Grenier 4 | HTML 5 | HTML5 6 | JSON 7 | JSON-LD 8 | Léamh 9 | MathML 10 | microdata 11 | SSML 12 | SVG 13 | SaaS 14 | TTS 15 | UI 16 | VoiceXML 17 | W3C 18 | aria-ssml 19 | data-ssml 20 | heteronyms 21 | namespace 22 | namespaces 23 | romaji -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Accessible Platform Architectures Working Group 2 | 3 | Contributions to this repository are intended to become part of Recommendation-track documents governed by the 4 | [W3C Patent Policy](https://www.w3.org/Consortium/Patent-Policy-20040205/) and 5 | [Document License](https://www.w3.org/Consortium/Legal/copyright-documents). To make substantive contributions to specifications, you must either participate 6 | in the relevant W3C Working Group or make a non-member patent licensing commitment. 
7 | 8 | If you are not the sole contributor to a contribution (pull request), please identify all 9 | contributors in the pull request comment. 10 | 11 | To add a contributor (other than yourself, that's automatic), mark them one per line as follows: 12 | 13 | ``` 14 | +@github_username 15 | ``` 16 | 17 | If you added a contributor by mistake, you can remove them in a comment with: 18 | 19 | ``` 20 | -@github_username 21 | ``` 22 | 23 | If you are making a pull request on behalf of someone else but you had no part in designing the 24 | feature, you can remove yourself with the above syntax. 25 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |
4 | 5 |21 | This specification does neat stuff. 22 |
23 |26 | This is an unofficial proposal. 27 |
28 |33 | See ReSpec's user guide 34 | for how toget started! 35 |
36 |The following people contributed to the development of this document.
4 |This document provides samples of Pronunciation.
This sample content has been developed by the W3C Accessible Platform Architecture Pronunciation Task Force to allow developers to 19 | evaluate and test their implementation of the proposed Attribute Model of SSML in HTML. The samples are presented in each of the two proposed 20 | approaches for representing SSML via attributes. There are currently 9 test cases, each incorporating 21 | one or more of the SSML functions defined in the Pronunciation Technical Approach Document. Each test case includes an audio sample of the expected 22 | spoken presentation. 23 |
24 |Developers of read aloud tools, screen readers, and voice assistants should use these test cases to evaluate and test their implementation. Questions and test results 25 | should be filed as issues in the Pronunciation github repo.
26 |Developers of read aloud tools, screen readers, and voice assistants should use these test cases to evaluate and test their implementation. Questions and test results 17 | should be filed as issues in the Pronunciation github repo.
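For orientation only, the sketch below shows roughly how a test case of this kind could be marked up in the single-attribute approach, where one `data-ssml` attribute carries a JSON payload whose keys mirror the SSML functions in the task force's JSON schema. The attribute name and payload shape here are illustrative assumptions, not a copy of the published test markup.

```html
<!-- Illustrative sketch of the single-attribute style: one data-ssml attribute
     holding a JSON object keyed by SSML function (say-as, sub, break, ...).
     Attribute name and JSON shape are assumptions based on this repository's
     JSON schema, not the normative test cases. -->
<p>
  According to the 2010 US Census, the population of
  <span data-ssml='{"say-as": {"interpret-as": "characters"}}'>90274</span>
  increased to
  <span data-ssml='{"say-as": {"interpret-as": "cardinal"}}'>25209</span>
  from 24976 over the past 10 years.
</p>
```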
18 |According the 2010 US Census, the population of 90274 22 | increased to 25209 from 24976 over the past 10 years.
23 |Case 1: Audio rendering 24 |
Once upon a midnight dreary
26 |Case 2: Audio rendering 27 |
NaCL
29 |Case 3: Audio rendering 30 |
She said, "My name is Marie", to which, he replied, "I am Tom."
32 |Case 4: Audio rendering (no sample)
33 |Please use extreme caution..
35 |Case 5: Audio rendering 36 |
Take a deep breath, and exhale.
38 |Case 6: Audio rendering 39 |
The tortoise, said (slowly) "I am almost at the finish line."
41 |Case 7: Audio rendering 42 |
You will hear a brief chime when your time is up.
44 |Case 8: Audio rendering 45 |
47 | Once upon a midnight dreary
48 | ,
49 | while I pondered, weak
50 | and weary,
51 | Over many a quaint and curious volume of forgotton
52 | lore—
53 | While I nodded, nearly napping, suddenly there came a tapping,
54 |
55 | As of some one gently rapping,
56 |
57 | rapping at my chamber door.
58 |
59 |
60 | "'Tis some visitor,"
61 | I muttered,
62 | "tapping
63 | at my chamber door—
64 | Only this
65 | and nothing
66 | more."
67 |
Case 9: Audio rendering 69 | 70 | 71 | 72 | 73 | -------------------------------------------------------------------------------- /user-scenarios/respec-config.js: -------------------------------------------------------------------------------- 1 | var respecConfig = { 2 | // embed RDFa data in the output 3 | trace: true, 4 | doRDFa: '1.0', 5 | includePermalinks: true, 6 | permalinkEdge: true, 7 | permalinkHide: false, 8 | tocIntroductory: true, 9 | // specification status (e.g., WD, LC, NOTE, etc.). If in doubt use ED. 10 | specStatus: "ED", 11 | noRecTrack: true, 12 | //crEnd: "2012-04-30", 13 | //perEnd: "2013-07-23", 14 | //publishDate: "2013-08-22", 15 | //diffTool: "http://www.aptest.com/standards/htmldiff/htmldiff.pl", 16 | 17 | // the specifications short name, as in http://www.w3.org/TR/short-name/ 18 | shortName: "pronunciation-user-scenarios", 19 | 20 | 21 | // if you wish the publication date to be other than today, set this 22 | //publishDate: "2017-05-09", 23 | copyrightStart: "2018", 24 | license: "w3c-software", 25 | 26 | // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date 27 | // and its maturity status 28 | //previousPublishDate: "", 29 | //previousMaturity: "", 30 | //prevRecURI: "", 31 | //previousDiffURI: "", 32 | 33 | // if there a publicly available Editors Draft, this is the link 34 | edDraftURI: "https://w3c.github.io/pronunciation/user-scenarios", 35 | 36 | // if this is a LCWD, uncomment and set the end of its review period 37 | // lcEnd: "2012-02-21", 38 | 39 | // editors, add as many as you like 40 | // only "name" is required 41 | editors: [ 42 | { 43 | name: "Irfan Ali", 44 | url: 'https://www.w3.org/users/98332', 45 | mailto: "irfan.ali@blackRock.com", 46 | company: "BlackRock", 47 | companyURI: "https://www.blackrock.com/", 48 | w3cid: 98332 49 | }, 50 | 51 | { 52 | name: "Sam Kanta", 53 | mailto: "sam.kanta@thesustainablechange.com" 54 | }, 55 | { 56 | name: "Christine Loew", 57 | company: "College Board", 58 | mailto: "cloew@collegeboard.org" 59 | }, 60 | { 61 | name: "Paul Grenier", 62 | company: "Deque System", 63 | mailto: "paul.grenier@deque.com" 64 | 65 | }, 66 | { 67 | name: "Roy Ran", 68 | url: 'https://www.w3.org', 69 | mailto: "ran@w3.org", 70 | company: "W3C", 71 | companyURI: "http://www.w3.org", 72 | w3cid: 100586 73 | } 74 | ], 75 | 76 | // authors, add as many as you like. 77 | // This is optional, uncomment if you have authors as well as editors. 78 | // only "name" is required. Same format as editors. 79 | 80 | //authors: [ 81 | // { name: "Your Name", url: "http://example.org/", 82 | // company: "Your Company", companyURI: "http://example.com/" }, 83 | //], 84 | 85 | /* 86 | alternateFormats: [ 87 | { uri: 'aria-diff.html', label: "Diff from Previous Recommendation" } , 88 | { uri: 'aria.ps', label: "PostScript version" }, 89 | { uri: 'aria.pdf', label: "PDF version" } 90 | ], 91 | */ 92 | 93 | // errata: 'http://www.w3.org/2010/02/rdfa/errata.html', 94 | 95 | // name of the WG 96 | //wg: "Accessible Platform Architectures Working Group", 97 | 98 | // URI of the public WG page 99 | //wgURI: "https://www.w3.org/WAI/APA/", 100 | 101 | // name (with the @w3c.org) of the public mailing to which comments are due 102 | wgPublicList: "public-pronunciation", 103 | 104 | // URI of the patent status for this WG, for Rec-track documents 105 | // !!!! IMPORTANT !!!! 106 | // This is important for Rec-track documents, do not copy a patent URI from a random 107 | // document unless you know what you're doing. 
If in doubt ask your friendly neighbourhood 108 | // Team Contact. 109 | //wgPatentURI: "https://www.w3.org/2004/01/pp-impl/83907/status", 110 | //maxTocLevel: 2, 111 | 112 | //group: 83907 113 | group: "apa", 114 | github: "https://github.com/w3c/pronunciation/" 115 | 116 | // Spec URLs 117 | 118 | 119 | }; 120 | -------------------------------------------------------------------------------- /gap-analysis/respec-config.js: -------------------------------------------------------------------------------- 1 | var respecConfig = { 2 | // embed RDFa data in the output 3 | trace: true, 4 | doRDFa: '1.0', 5 | includePermalinks: true, 6 | permalinkEdge: true, 7 | permalinkHide: false, 8 | tocIntroductory: true, 9 | // specification status (e.g., WD, LC, NOTE, etc.). If in doubt use ED. 10 | specStatus: "ED", 11 | noRecTrack: true, 12 | //crEnd: "2012-04-30", 13 | //perEnd: "2013-07-23", 14 | //publishDate: "2013-08-22", 15 | //diffTool: "http://www.aptest.com/standards/htmldiff/htmldiff.pl", 16 | 17 | // the specifications short name, as in http://www.w3.org/TR/short-name/ 18 | shortName: "pronunciation-gap-analysis", 19 | 20 | 21 | // if you wish the publication date to be other than today, set this 22 | //publishDate: "2017-05-09", 23 | copyrightStart: "2018", 24 | license: "w3c-software", 25 | 26 | // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date 27 | // and its maturity status 28 | //previousPublishDate: "", 29 | //previousMaturity: "", 30 | //prevRecURI: "", 31 | //previousDiffURI: "", 32 | 33 | // if there a publicly available Editors Draft, this is the link 34 | edDraftURI: "https://w3c.github.io/pronunciation/gap-analysis", 35 | 36 | // if this is a LCWD, uncomment and set the end of its review period 37 | // lcEnd: "2012-02-21", 38 | 39 | // editors, add as many as you like 40 | // only "name" is required 41 | editors: [ 42 | { 43 | name: "Markku Hakkinen", 44 | url: 'https://www.w3.org/users/35712', 45 | mailto: "mhakkinen@ets.org", 46 | company: "Educational Testing Service", 47 | companyURI: "https://www.ets.org/", 48 | w3cid: 35712 49 | }, 50 | { 51 | name: "Steve Noble", 52 | url: 'https://www.w3.org/users/112867', 53 | mailto: "steve.noble@pearson.com", 54 | company: "Pearson", 55 | companyURI: "https://www.pearson.com/", 56 | w3cid: 98332 57 | }, 58 | { 59 | name: "Irfan Ali", 60 | url: 'https://www.w3.org/users/98332', 61 | mailto: "irfan.ali@blackrock.com", 62 | company: "BlackRock", 63 | companyURI: "https://www.blackrock.com", 64 | w3cid: 98332 65 | }, 66 | 67 | 68 | { 69 | name: "Roy Ran", 70 | url: 'https://www.w3.org', 71 | mailto: "ran@w3.org", 72 | company: "W3C", 73 | companyURI: "http://www.w3.org", 74 | w3cid: 100586 75 | } 76 | ], 77 | 78 | // authors, add as many as you like. 79 | // This is optional, uncomment if you have authors as well as editors. 80 | // only "name" is required. Same format as editors. 
81 | 82 | //authors: [ 83 | // { name: "Your Name", url: "http://example.org/", 84 | // company: "Your Company", companyURI: "http://example.com/" }, 85 | //], 86 | 87 | /* 88 | alternateFormats: [ 89 | { uri: 'aria-diff.html', label: "Diff from Previous Recommendation" } , 90 | { uri: 'aria.ps', label: "PostScript version" }, 91 | { uri: 'aria.pdf', label: "PDF version" } 92 | ], 93 | */ 94 | 95 | // errata: 'http://www.w3.org/2010/02/rdfa/errata.html', 96 | 97 | // name of the WG 98 | wg: "Accessible Platform Architectures Working Group", 99 | 100 | // URI of the public WG page 101 | wgURI: "https://www.w3.org/WAI/APA/", 102 | 103 | // name (with the @w3c.org) of the public mailing to which comments are due 104 | wgPublicList: "public-pronunciation", 105 | 106 | // URI of the patent status for this WG, for Rec-track documents 107 | // !!!! IMPORTANT !!!! 108 | // This is important for Rec-track documents, do not copy a patent URI from a random 109 | // document unless you know what you're doing. If in doubt ask your friendly neighbourhood 110 | // Team Contact. 111 | wgPatentURI: "https://www.w3.org/2004/01/pp-impl/83907/status", 112 | //maxTocLevel: 2, 113 | 114 | 115 | 116 | // Spec URLs 117 | 118 | 119 | }; 120 | -------------------------------------------------------------------------------- /samples/w3cptf-multiattr-tests.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 |
6 |Developers of read aloud tools, screen readers, and voice assistants should use these test cases to evaluate and test their implementation. Questions and test results 18 | should be filed as issues in the Pronunciation github repo.
19 |According the 2010 US Census, the population of 90274 22 | increased to 25209 from 24976 over the past 10 years.
23 |Case 1: Audio rendering 24 |
Once upon a midnight dreary
26 |Case 2: Audio rendering 27 |
NaCL
29 |Case 3: Audio rendering 30 |
She said, "My name is Marie", to which, he replied, I am Tom."
32 |Case 4: Audio rendering (no sample)
33 |Please use extreme caution..
35 |Case 5: Audio rendering 36 |
ake a deep breath, and exhale.
38 |Case 6: Audio rendering 39 |
The tortoise, said (slowly) " 41 | I am almost at the finish line."
42 |Case 7: Audio rendering 43 |
You will hear a brief chime when your time is up.
45 |Case 8: Audio rendering 46 |
48 | Once upon a midnight dreary
49 | ,
50 | while I pondered, weak
51 | and weary,
52 | Over many a quaint and curious volume of forgotton lore—
54 | While I nodded, nearly napping, suddenly there came a tapping,
55 |
56 | As of some one gently rapping,
57 |
58 | rapping at my chamber door.
59 |
60 |
61 |
62 | "'Tis some visitor,"
63 | I muttered,
"tapping at my chamber door—
66 | Only this and nothing
67 |
more."
68 |
Case 9: Audio rendering 71 | 72 | 73 | 74 | 75 | -------------------------------------------------------------------------------- /gap-analysis_and_use-case/respec-config.js: -------------------------------------------------------------------------------- 1 | var respecConfig = { 2 | // embed RDFa data in the output 3 | trace: true, 4 | doRDFa: '1.0', 5 | includePermalinks: true, 6 | permalinkEdge: true, 7 | permalinkHide: false, 8 | tocIntroductory: true, 9 | // specification status (e.g., WD, LC, NOTE, etc.). If in doubt use ED. 10 | specStatus: "ED", 11 | noRecTrack: true, 12 | //crEnd: "2012-04-30", 13 | //perEnd: "2013-07-23", 14 | //publishDate: "2013-08-22", 15 | //diffTool: "http://www.aptest.com/standards/htmldiff/htmldiff.pl", 16 | 17 | // the specifications short name, as in http://www.w3.org/TR/short-name/ 18 | shortName: "pronunciation-gap-analysis-and-use-cases", 19 | 20 | 21 | // if you wish the publication date to be other than today, set this 22 | //publishDate: "2017-05-09", 23 | copyrightStart: "2020", 24 | 25 | 26 | // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date 27 | // and its maturity status 28 | //previousPublishDate: "", 29 | //previousMaturity: "", 30 | //prevRecURI: "", 31 | //previousDiffURI: "", 32 | 33 | // if there a publicly available Editors Draft, this is the link 34 | edDraftURI: "https://w3c.github.io/pronunciation/gap-analysis_and_use-case", 35 | 36 | // if this is a LCWD, uncomment and set the end of its review period 37 | // lcEnd: "2012-02-21", 38 | 39 | // editors, add as many as you like 40 | // only "name" is required 41 | editors: [ 42 | { 43 | name: "Markku Hakkinen", 44 | url: 'https://www.w3.org/users/35712', 45 | mailto: "mhakkinen@ets.org", 46 | company: "Educational Testing Service", 47 | companyURI: "https://www.ets.org/", 48 | w3cid: 35712 49 | }, 50 | { 51 | name: "Steve Noble", 52 | url: 'https://www.w3.org/users/112867', 53 | mailto: "steve.noble@pearson.com", 54 | company: "Pearson", 55 | companyURI: "https://www.pearson.com/", 56 | w3cid: 98332 57 | }, 58 | { 59 | name: "Dee Dyer", 60 | mailto: "ddyer@ets.org", 61 | company: "Educational Testing Service", 62 | companyURI: "https://www.ets.org/", 63 | w3cid: 00000 64 | 65 | }, 66 | { 67 | name: "Irfan Ali", 68 | url: 'https://www.w3.org/users/98332', 69 | mailto: "iali@ets.org", 70 | company: "Educational Testing Service", 71 | companyURI: "https://www.ets.org/", 72 | w3cid: 98332 73 | }, 74 | { 75 | name: "Paul Grenier", 76 | mailto: "pgrenier@gmail.com" 77 | 78 | }, 79 | 80 | { 81 | name: "Roy Ran", 82 | url: 'https://www.w3.org', 83 | mailto: "ran@w3.org", 84 | company: "W3C", 85 | companyURI: "http://www.w3.org", 86 | w3cid: 100586 87 | } 88 | ], 89 | 90 | // authors, add as many as you like. 91 | // This is optional, uncomment if you have authors as well as editors. 92 | // only "name" is required. Same format as editors. 
93 | 94 | //authors: [ 95 | // { name: "Your Name", url: "http://example.org/", 96 | // company: "Your Company", companyURI: "http://example.com/" }, 97 | //], 98 | 99 | /* 100 | alternateFormats: [ 101 | { uri: 'aria-diff.html', label: "Diff from Previous Recommendation" } , 102 | { uri: 'aria.ps', label: "PostScript version" }, 103 | { uri: 'aria.pdf', label: "PDF version" } 104 | ], 105 | */ 106 | 107 | // errata: 'http://www.w3.org/2010/02/rdfa/errata.html', 108 | 109 | // name of the WG 110 | //wg: "Accessible Platform Architectures Working Group", 111 | 112 | // URI of the public WG page 113 | //wgURI: "https://www.w3.org/WAI/APA/", 114 | 115 | // name (with the @w3c.org) of the public mailing to which comments are due 116 | //wgPublicList: "public-pronunciation", 117 | 118 | // URI of the patent status for this WG, for Rec-track documents 119 | // !!!! IMPORTANT !!!! 120 | // This is important for Rec-track documents, do not copy a patent URI from a random 121 | // document unless you know what you're doing. If in doubt ask your friendly neighbourhood 122 | // Team Contact. 123 | //wgPatentURI: "https://www.w3.org/2004/01/pp-impl/83907/status", 124 | //maxTocLevel: 2, 125 | 126 | group: "apa", 127 | github: "https://github.com/w3c/pronunciation/" 128 | 129 | // Spec URLs 130 | 131 | 132 | }; 133 | -------------------------------------------------------------------------------- /technical-approach/ssml-json-schema-w3cptf.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "http://json-schema.org/draft-07/schema#", 3 | "$id": "http://ets-research.org/ia11ylab/ia/json/ssml-json-schema-w3cptf.json", 4 | "title": "SSML as a single attribute for inclusion in HTML", 5 | "description": "JSON structure representing each SSML element as a JSON object. The SSML properties are dervived from https://www.w3.org/TR/speech-synthesis11/. Several elements are excluded: mark, speak, p, w and the desc attribute. Author: M. 
Hakkinen - ETS", 6 | "type": "object", 7 | "properties": { 8 | "say-as": { 9 | "description": "The unique identifier for a product", 10 | "type": "object", 11 | "properties": { 12 | "interpret-as": { "type": "string", 13 | "enum": ["date","time","telephone","characters","cardinal","ordinal"]}, 14 | "format": { "type": "string" }, 15 | "detail": {"type": "string"} 16 | } 17 | }, 18 | "phoneme": { 19 | "description": "The Phoneme Function", 20 | "type": "object", 21 | "properties": { 22 | "ph": { "type": "string"}, 23 | "alphabet": {"type": "string", 24 | "enum": ["ipa", "x-sampa"]}} 25 | }, 26 | "sub": { 27 | "description": "sub function", 28 | "type": "object", 29 | "properties": { 30 | "alias": {"type":"string"} 31 | } 32 | }, 33 | "voice":{ 34 | "description": "voice function", 35 | "type":"object", 36 | "properties": { 37 | "gender": {"type":"string", 38 | "enum": ["female","male","neutral"]}, 39 | "age": {"type":"integer"}, 40 | "variant":{"type":"string"}, 41 | "name": {"type":"string"}, 42 | "languages": {"type":"string"} 43 | } 44 | }, 45 | "emphasis":{ 46 | "description": "speech emphasis level", 47 | "type":"object", 48 | "properties": { 49 | "level": {"type":"string", 50 | "enum": ["none","x-weak","weak","medium","strong","x-strong"]}, 51 | "time": {"type":"string", 52 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"} 53 | } 54 | }, 55 | "prosody": { 56 | "description": "speech prosody", 57 | "type":"object", 58 | "properties": { 59 | "pitch": {"type":"string", 60 | "pattern":"^x-low|low|medium|high|x-high|default|(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)Hz)$" 61 | }, 62 | "contour": {"type":"string"}, 63 | "range": {"type":"string", 64 | "pattern":"^x-low|low|medium|high|x-high|default|(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)Hz)$" 65 | }, 66 | "rate": {"type":"string", 67 | "pattern":"^x-slow|slow|medium|fast|x-xfast|default|(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)%)$"}, 68 | "duration": {"type": "string", 69 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"}, 70 | "volume": {"type":"string", 71 | "pattern":"^silent|x-soft|soft|medium|loud|x-loud|default|(+|-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)dB)$"} 72 | } 73 | }, 74 | "break": { 75 | "description": "break - insert a timed pause", 76 | "type":"object", 77 | "properties": { 78 | "strength": {"type":"string", 79 | "enum": ["none","x-weak","weak","medium","strong","x-strong"]}, 80 | "time": {"type":"string", 81 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"} 82 | } 83 | }, 84 | "audio": { 85 | "description":"audio element used to insert audio file into speech stream", 86 | "type":"object", 87 | "properties":{ 88 | "src": {"type":"uri"}, 89 | "fetchtimeout":{"type":"string", 90 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"}, 91 | "fetchint":{"type":"string", 92 | "enum": ["safe","prefetch"]}, 93 | "maxage":{"type":"string"}, 94 | "maxstale":{"type":"string"}, 95 | "clipBegin":{"type": "string", 96 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$" 97 | 98 | }, 99 | "clipEnd":{"type": "string", 100 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$" 101 | 102 | }, 103 | "repeatCount":{"type":"integer" 104 | }, 105 | "repeatDur":{"type": "string", 106 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$" 107 | 108 | }, 109 | "soundLevel":{"type":"string", 110 | "pattern":"^(+|-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)dB)$"}, 111 | 112 | "speed":{ 113 | "type":"string", 114 | "pattern":"^((0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)%)$" 115 | } 116 | 117 | } 118 | } 119 | } 120 | } 121 | 122 | 
-------------------------------------------------------------------------------- /technical-approach/appendixJSON.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |
4 |53 | The Pronunciation Task Force develops specifications for hypertext markup language (HTML) author control of 54 | text-to-speech (TTS) presentation. 55 |
56 |The JSON schema defines the specific SSML functions, properties, and values recommended for implementation in 61 | this proposal.
62 | 63 |
65 |
66 | {
67 | "$schema": "http://json-schema.org/draft-07/schema#",
68 | "$id": "http://ets-research.org/ia11ylab/ia/json/ssml-json-schema-w3cptf.json",
69 | "title": "SSML as a single attribute for inclusion in HTML",
70 | "description": "JSON structure representing each SSML element as a JSON object. The SSML properties are dervived
71 | from https://www.w3.org/TR/speech-synthesis11/. Several elements are excluded: mark, speak, p, w and the desc attribute.
72 | Author: M. Hakkinen - ETS",
73 | "type": "object",
74 | "properties": {
75 | "say-as": {
76 | "description": "The unique identifier for a product",
77 | "type": "object",
78 | "properties": {
79 | "interpret-as": { "type": "string",
80 | "enum": ["date","time","telephone","characters","cardinal","ordinal"]},
81 | "format": { "type": "string" },
82 | "detail": {"type": "string"}
83 | }
84 | },
85 | "phoneme": {
86 | "description": "The Phoneme Function",
87 | "type": "object",
88 | "properties": {
89 | "ph": { "type": "string"},
90 | "alphabet": {"type": "string", "enum": ["ipa", "x-sampa"]}}
91 | },
92 | "sub": {
93 | "description": "sub function", "type": "object",
94 | "properties": {
95 | "alias": {"type":"string"}}
96 | },
97 | "voice":{"description": "voice function", "type":"object",
98 | "properties": {
99 | "gender": {"type":"string",
100 | "enum": ["female","male","neutral"]},
101 | "age": {"type":"integer"},
102 | "variant":{"type":"string"},
103 | "name": {"type":"string"},
104 | "languages": {"type":"string"}
105 | }
106 | },
107 | "emphasis":{
108 | "description": "speech emphasis level",
109 | "type":"object",
110 | "properties": {
111 | "level": {"type":"string",
112 | "enum": ["none","x-weak","weak","medium","strong","x-strong"]},
113 | "time": {"type":"string",
114 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"}
115 | }
116 | },
117 | "prosody": {
118 | "description": "speech prosody",
119 | "type":"object",
120 | "properties": {
121 | "pitch": {"type":"string",
122 | "pattern":"^x-low|low|medium|high|x-high|default|(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)Hz)$"},
123 | "contour": {"type":"string"},
124 | "range": {"type":"string",
125 | "pattern":"^x-low|low|medium|high|x-high|default|(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)Hz)$"},
126 | "rate": {"type":"string",
127 | "pattern":"^x-slow|slow|medium|fast|x-xfast|default|(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)%)$"},
128 | "duration": {"type": "string",
129 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"},
130 | "volume": {"type":"string",
131 | "pattern":"^silent|x-soft|soft|medium|loud|x-loud|default|(+|-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)dB)$"}
132 | }
133 | },
134 | "break": {
135 | "description": "break - insert a timed pause",
136 | "type":"object",
137 | "properties": {
138 | "strength": {"type":"string",
139 | "enum": ["none","x-weak","weak","medium","strong","x-strong"]},
140 | "time": {"type":"string",
141 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"}
142 | }
143 | },
144 | "audio": {
145 | "description":"audio element used to insert audio file into speech stream",
146 | "type":"object",
147 | "properties":{
148 | "src": {"type":"uri"},
149 | "fetchtimeout":{"type":"string",
150 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"},
151 | "fetchint":{"type":"string",
152 | "enum": ["safe","prefetch"]},
153 | "maxage":{"type":"string"},
154 | "maxstale":{"type":"string"},
155 | "clipBegin":{"type": "string",
156 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"},
157 | "clipEnd":{"type": "string",
158 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"},
159 | "repeatCount":{"type":"integer"
160 | "repeatDur":{"type": "string",
161 | "pattern":"^(-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)ms|s)$"},
162 | "soundLevel":{"type":"string",
163 | "pattern":"^(+|-?(0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)dB)$"},
164 | "speed":{
165 | "type":"string",
166 | "pattern":"^((0|[1-9]\\d*)?(\\.\\d+)?(?<=\\d)%)$"}
167 | }
168 | }
169 | }
170 | }
171 |
172 |
173 | The objective of the Pronunciation Task Force is to develop normative specifications and best practices guidance collaborating with other W3C groups as appropriate, to provide for proper pronunciation in HTML content when using text to speech (TTS) synthesis. This document defines a standard mechanism to allow 17 | content authors to include spoken presentation guidance in HTML content.
18 | 19 |Accurate, consistent pronunciation and presentation of content spoken by text-to-speech (TTS) synthesis is an essential requirement in education, 27 | communication, entertainment, and other domains. 28 | From helping to teach spelling and pronunciation in different languages, to reading learning materials or new stories, 29 | TTS has become a vital technology for providing access to digital content on the web, through mobile devices, and now via voice-based assistants. 30 | Organizations such as educational publishers and assessment vendors are looking for a standards-based solution to 31 | enable authoring of spoken presentation guidance in HTML which can then be consumed by assistive technologies (AT) and other 32 | applications that utilize text to speech synthesis (TTS) for rendering of content. 33 | Historically, efforts at standardization (e.g. SSML or CSS Speech) have not led to broad adoption of any standard by user agents, authors, or AT; 34 | what has arisen are a variety of non-interoperable approaches that meet specific needs for some applications. This explainer document presents the case for 35 | improving spoken presentation on the Web and how a standards-based approach can address the requirements.
36 | 37 | 38 |This is a proposal for a mechanism to allow content authors to include spoken presentation guidance in HTML content. Such guidance can be used by AT (including screen readers and read aloud tools) and voice assistants to control TTS synthesis. A key requirement is to ensure the spoken presentation content matches the author's intent and user expectations.
42 |The challenge is integrating pronunciation content into HTML so that it is easy to author, does not "break" content, and is straightforward for consumption by AT, voice assistants, and other tools that produce spoken presentation of content.
43 |Several classes of AT users depend upon spoken rendering of web content by TTS synthesis. In contexts such as education, there are specific expectations for accuracy of spoken presentation in terms of pronunciation, emphasis, prosody, pausing, etc.
47 |Correct pronunciation is also important in the context of language learning, where incorrect pronunciation can confuse learners.
48 |In practice, the ecosystem of devices used in classrooms is broad, and each vendor generally provides their own TTS engines for their platforms. Ensuring consistent spoken presentation across devices is a very real problem, and challenge. For many educational assessment vendors, the problem necessitates non-interoperable hacks to tune pronunciation and other presentation features, such as pausing, which itself can introduce new problems through inconsistent representation of text across speech and braille.
49 |It could be argued that continual advances in machine learning will improve the quality of synthesized speech, reducing the need for this proposal. Waiting for a robust solution that will likely still not fully address our needs is risky, especially when an authorable, declarative approach may be within reach (and wouldn't preclude or conflict with continual improvement in TTS technology).
50 |The current situation:
51 |With the growing consumer adoption of voice assistants, user expectations for high quality spoken presentation is growing. Google and Amazon both encourage application developers to utilize SSML to enhance the user experience on their platforms, yet Web content authors do not have the same opportunity to enhance the spoken presentation of their content.
58 |Finding a solution to this need can have broader benefit in allowing authors to create web content that presents a better user experience if the content is presented by voice assistants.
59 |<span> with ARIA attributes)According the 2010 US Census, the population
68 | of
According the 2010 US Census, the population 93 | of 90274 94 | increased to 25209 from 24976 over the past 10 years. 95 |
96 | ``` 97 | 98 | ## Open Questions 99 | 100 | 1. From the TAG/WHATWG perspective, what disadvantages/challenges have we missed with either approach? 101 | 2. Whichever approach makes sense from the web standards perspective, will/can it be adopted by assistive technologies? Particularly for screen readers, does it fit the accessibility API model? 102 | 103 | 104 | 105 | 106 | -------------------------------------------------------------------------------- /user-scenarios/draft.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | As part of the Accessible Platform Architectures (APA) Working Group, the Pronunciation Task Force (PTF) is a collaboration of subject matter experts working to identify and specify the optimal approach which can deliver reliably accurate pronunciation across browser and operating environments. With the introduction of the Kurzweil reading aid in 1976, to the more sophisticated synthetic speech currently used to assist communication as reading aids for the visually impaired and those with reading disabilities, the technology has multiple applications in education, communication, entertainment, etc. From helping to teach spelling and pronunciation in different languages, Text-to-Speech (TTS) has become a vital technology for providing access to digital content on the web and through mobile devices. 4 | 5 | The challenges that TTS presents include but are not limited to: the inability to accommodate regional variation and presentation of every phoneme present throughout the world; the incorrect determination by TTS of the pronunciation of content in context, and; the current inability to influence other pronunciation characteristics such as prosody and emphasis. 6 | 7 | # User Scenarios 8 | The purpose of developing user scenarios is to facilitate discussion and further requirements definition for pronunciation standards developed within the PTF prior to review of the APA. There are numerous interpretations of what form user scenarios adopt. Within the user experience research (UXR) body of practice, a user scenario is a written narrative related to the use of a service from the perspective of a user or user group. Importantly, the context of use is emphasized as is the desired outcome of use. There are potentially thousands of user scenarios for a technology such as TTS, however, the focus for the PTF is on the core scenarios that relate to the kinds of users who will engage with TTS. 9 | 10 | User scenarios, like Personas, represent a composite of real-world experiences. In the case of the PTF, the scenarios were derived from interviews of people who were end-consumers of TTS, as well as submitted narratives and industry examples from practitioners. There are several formats of scenarios. Several are general goal or task-oriented scenarios. Others elaborate on richer context, for example, educational assessment. 11 | 12 | The following user scenarios are organized on the three perspectives of TTS use derived from analysis of the qualitative data collected from the discovery work: 13 | 14 | + **End-Consumers of TTS:** Encompasses those with a visual disability or other need to have TTS operational when using assistive technologies (ATs). 15 | + **Digital Content Managers:** Addresses activities related to those responsible for producing content that needs to be accessible to ATs and W3C-WAI Guidelines. 
16 | + **Software Engineers:** Includes developers and architects required to put TTS into an application or service. 17 | 18 | ## End-Consumers of TTS 19 | Ultimately, the quality and variation of TTS rendering by assistive technologies vary widely according to a user's context. The following user scenarios reinforce the necessity for accurate pronunciation from the perspective of those who consume digitally generated content. 20 | 21 | A. As a traveler who uses assistive technology (AT) with TTS to help navigate through websites, I need to hear arrival and destination codes pronounced accurately so I can select the desired travel itinerary. For example, a user with a visual impairment attempts to book a flight to Ottawa, Canada and so goes to a travel website. The user already knows the airport code and enters "YOW". The site produces the result in a drop-down list as "Ottawa, CA" but the AT does not pronounce the text accurately to help the user make the correct association between their data entry and the list item. 22 | 23 | B. As a test taker (tester) with a visual impairment who may use assistive technology to access the test content with speech software, screen reader or refreshable braille device, I want the content to be presented as intended, with accurate pronunciation and articulation, so that my assessment accurately reflects my knowledge of the content. 24 | 25 | C. As a student/learner with auditory and cognitive processing issues, it is difficult to distinguish sounds, inflections, and variations in pronunciation as rendered through synthetic voice, such as text-to-speech or screen reader technologies. Consistent and accurate pronunciation whether human-provided, external, or embedded is needed to support working executive processing, auditory processing and memory that facilitates comprehension in literacy and numeracy for learning and for assessments. 26 | 27 | D. As an English Learner (EL) or a visually impaired early learner using speech synthesis for reading comprehension that includes decoding words from letters as part of the learning construct (intent of measurement), pronunciation accuracy is vital to successful comprehension, as it allows the learner to distinguish sounds at the sentence, word, syllable, and phoneme level. 28 | 29 | ## Digital Content Management for TTS 30 | The advent of graphical user interfaces (GUIs) for the management and editing of text content has given rise to content creators not requiring technical expertise beyond the ability to operate a text editing application such as Microsoft Word. The following scenario summarizes the general use, accompanied by a hypothetical application. 31 | 32 | A. As a content creator, I want to create content that can readily be delivered through assistive technology, can convey the correct meaning, and ensure that screen readers render the right pronunciation based on the surrounding context. 33 | 34 | B. As a content producer for a global commercial site that is inclusive, I need to be able to provide accessible culture-specific content for different geographic regions. 35 | 36 | ### Educational Assessment 37 | In the educational assessment field, providing accurate and concise pronunciation for students with auditory accommodations, such as text-to-speech (TTS) or students with screen readers, is vital for ensuring content validity and alignment with the intended construct, which objectively measures a test takers knowledge and skills. 
For test administrators/educators, pronunciations must be consistent across instruction and assessment in order to avoid test bias or impact effects for students. Some additional requirements for the test administrators, include, but are not limited to, such scenarios: 38 | 39 | A. As a test administrator, I want to ensure that students with the read-aloud accommodation, who are using assistive technology or speech synthesis as an alternative to a human reader, have the same speech quality (e.g., intonation, expression, pronunciation, and pace, etc.) as a spoken language. 40 | 41 | B. As a math educator, I want to ensure that speech accuracy with mathematical expressions, including numbers, fractions, and operations have accurate pronunciation for those who rely on TTS. Some mathematical expressions require special pronunciations to ensure accurate interpretation while maintaining test validity and construct. Specific examples include: 42 | 43 | + Mathematical formulas written in simple text with special formatting should convey the correct meaning of the expression to identify changes from normal text to super- or to sub-script text. For example, without the proper formatting, the equation: 44 |a3-b3=(a-b)(a2+ab+b2) may incorrectly render through some technologies and applications as a3-b3=(a-b)(a2+ab+b2).
45 | + Distinctions made in writing are often not made explicit in speech; For example, “fx” may be interpreted as fx, f(x), fx, F X, F X. The distinction depends on the context; requiring the author to provide consistent and accurate semantic markup.
46 | + For math equations with Greek letters, it is important that the speech synthesizer be able to distinguish the phonetic differences between them, whether in the natural language or phonetic equivalents. For example, ε (epsilon) υ (upsilon) φ (phi) χ (chi) ξ(xi)
47 |
48 | C. As a test administrator/educator, pronunciations must be consistent across instruction and assessment, in order to avoid test bias and pronunciation effects on performance for students with disabilities (SWD) in comparison to students without disabilities (SWOD). Examples include:
49 |
50 | + If a test question is measuring rhyming of words or sounds of words, the speech synthesis should not read aloud the words, but rather spell out the words in the answer options.
51 | + If a test question is measuring spelling and the student needs to consider spelling correctness/incorrectness, the speech synthesis should not read aloud the misspelt words, especially for words, such as:
52 |
53 | i. *Heteronyms/homographs*: same spelling, different pronunciation, different meanings, such as lead (to go in front of) or lead (a metal); wind (to follow a course that is not straight) or wind (a gust of air); bass (low, deep sound) or bass (a type of fish), etc.
54 |
55 | ii. *Homophone*: words that sound alike, such as, to/two/too; there/their/they're; pray/prey; etc.
56 |
57 | iii. *Homonyms*: multiple meaning words, such as scale (measure) or scale (climb, mount); fair (reasonable) or fair (carnival); suit (outfit) or suit (harmonize); etc.
58 |
59 | ### Academic and Linguistic Practitioners
60 | The extension of content management in TTS is one as a means of encoding and preserving spoken text for academic analyses; irrespective of discipline, subject domain, or research methodology.
61 |
62 | A. As a linguist, I want to represent all the pronunciation variations of a given word in any language, for future analyses.
63 |
64 | B. As a speech language pathologist or speech therapists, I want TTS functionality to include components of speech and language that include dialectal and individual differences in pronunciation; identify differences in intonation, syntax, and semantics, and; allow for enhanced comprehension, language processing and support phonological awareness.
65 |
66 | ## Software Application Development
67 | Technical standards for software development assist organizations and individuals to provide accessible experiences for users with disabilities. The final user scenarios in this document are considered from the perspective of those who design and develop software.
68 |
69 | A. As a Product Owner for a web content management system (CMS), I want the next software product release to have the capability of pronouncing speech "just like Alexa can".
70 |
71 | B. As a client-side user interface developer, I need a way to render text content, so it is spoken accurately with assistive technologies.
72 |
73 |
74 |
75 |
76 |
77 |
--------------------------------------------------------------------------------
/user-scenarios/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | The objective of the Pronunciation Task Force is to develop normative specifications and best practices guidance collaborating with other W3C groups as appropriate, to provide for proper pronunciation in HTML content when using text to speech (TTS) synthesis. This document provides various user scenarios highlighting the need for standardization of pronunciation markup, to ensure that consistent and accurate representation of the content. The requirements that come from the user scenarios provide the basis for the technical requirements/specifications.
17 |As part of the Accessible Platform Architectures (APA) Working Group, the Pronunciation Task Force (PTF) is a collaboration of subject matter experts working to identify and specify the optimal approach which can deliver reliably accurate pronunciation across browser and operating environments. With the introduction of the Kurzweil reading aid in 1976, to the more sophisticated synthetic speech currently used to assist communication as reading aids for the visually impaired and those with reading disabilities, the technology has multiple applications in education, communication, entertainment, etc. From helping to teach spelling and pronunciation in different languages, Text-to-Speech (TTS) has become a vital technology for providing access to digital content on the web and through mobile devices. 26 |
27 |The challenges that TTS presents include but are not limited to: the inability to accommodate regional variation and presentation of every phoneme present throughout the world; the incorrect determination by TTS of the pronunciation of content in context, and; the current inability to influence other pronunciation characteristics such as prosody and emphasis.
28 | 29 | 30 | 31 |The purpose of developing user scenarios is to facilitate discussion and further requirements definition for pronunciation standards developed within the PTF prior to review of the APA. There are numerous interpretations of what form user scenarios adopt. Within the user experience research (UXR) body of practice, a user scenario is a written narrative related to the use of a service from the perspective of a user or user group. Importantly, the context of use is emphasized as is the desired outcome of use. There are potentially thousands of user scenarios for a technology such as TTS, however, the focus for the PTF is on the core scenarios that relate to the kinds of users who will engage with TTS.
36 |User scenarios, like Personas, represent a composite of real-world experiences. In the case of the PTF, the scenarios were derived from interviews of people who were end-consumers of TTS, as well as submitted narratives and industry examples from practitioners. There are several formats of scenarios. Several are general goal or task-oriented scenarios. Others elaborate on richer context, for example, educational assessment.
37 |The following user scenarios are organized on the three perspectives of TTS use derived from analysis of the qualitative data collected from the discovery work:
38 |Ultimately, the quality and variation of TTS rendering by assistive technologies vary widely according to a user's context. The following user scenarios reinforce the necessity for accurate pronunciation from the perspective of those who consume digitally generated content.
46 |The advent of graphical user interfaces (GUIs) for the management and editing of text content has given rise to content creators not requiring technical expertise beyond the ability to operate a text editing application such as Microsoft Word. The following scenario summarizes the general use, accompanied by a hypothetical application.
57 |In the educational assessment field, providing accurate and concise pronunciation for students with auditory accommodations, such as text-to-speech (TTS) or students with screen readers, is vital for ensuring content validity and alignment with the intended construct, which objectively measures a test takers knowledge and skills. For test administrators/educators, pronunciations must be consistent across instruction and assessment in order to avoid test bias or impact effects for students. Some additional requirements for the test administrators, include, but are not limited to, such scenarios:
65 | 66 |a3-b3=(a-b)(a2+ab+b2) may incorrectly render through some technologies and applications as a3-b3=(a-b)(a2+ab+b2).The extension of content management in TTS is one as a means of encoding and preserving spoken text for academic analyses; irrespective of discipline, subject domain, or research methodology.
96 |Technical standards for software development assist organizations and individuals to provide accessible experiences for users with disabilities. The final user scenarios in this document are considered from the perspective of those who design and develop software.
106 |You say, pecan. I say, pecan.
71 | ``` 72 | 73 | Client will parse XML and serialize it before passing to a speech API: 74 | 75 | ```js 76 | var msg = new SpeechSynthesisUtterance(); 77 | var xml = document.getElementById('pecan').content.firstElementChild; 78 | msg.text = serialize(xml); 79 | speechSynthesis.speak(msg); 80 | ``` 81 | 82 | - `aria-ssml` referencing an XML string as script tag 83 | 84 | ```html 85 | 96 | 97 |You say, pecan. I say, pecan.
98 | ``` 99 | 100 | Client will pass the XML string raw to a speech API. 101 | 102 | ```js 103 | var msg = new SpeechSynthesisUtterance(); 104 | msg.text = document.getElementById('pecan').textContent; 105 | speechSynthesis.speak(msg); 106 | ``` 107 | 108 | - `aria-ssml` referencing an external XML document by URL 109 | 110 | ```html 111 |You say, pecan. I say, pecan.
112 | ``` 113 | 114 | Client will pass the string payload to a speech API. 115 | 116 | ```js 117 | var msg = new SpeechSynthesisUtterance(); 118 | var response = await fetch(el.dataset.ssml) 119 | msg.txt = await response.text(); 120 | speechSynthesis.speak(msg); 121 | ``` 122 | 123 | ### Existing Work 124 | 125 | - [`aria-ssml` proposal](https://github.com/alia11y/SSMLinHTMLproposal) 126 | - [SSML](https://www.w3.org/TR/speech-synthesis11/) 127 | - [Web Speech API](https://w3c.github.io/speech-api/) 128 | 129 | ### Problems and Limitations 130 | 131 | - `aria-ssml` is not a valid `aria-*` attribute. 132 | - OS/Browsers combinations that do not support the serialized XML usage of the Web Speech API. 133 | 134 | ## Use Case `data-ssml` 135 | 136 | ### Name 137 | `data-ssml` 138 | 139 | ### Owner 140 | Paul Grenier 141 | 142 | ### Background and Current Practice 143 | As an existing attribute, [`data-*`](https://html.spec.whatwg.org/multipage/dom.html#embedding-custom-non-visible-data-with-the-data-*-attributes) could be used, with some conventions, to include pronunciation content. 144 | 145 | ### Goal 146 | 147 | - Support repeated use within the page context 148 | - Support external file references 149 | - Reuse existing techniques without expanding specifications 150 | 151 | ### Target Audience 152 | 153 | - Hearing users 154 | 155 | ### Implementation Options 156 | 157 | - `data-ssml` as embedded JSON 158 | 159 | When an element with `data-ssml` is encountered by an SSML-aware AT, the AT should enhance the user interface by processing the referenced SSML content and passing it to the [Web Speech API](https://w3c.github.io/speech-api/) or an external API (e.g., [Google's Text to Speech API](https://cloud.google.com/text-to-speech/)). 160 | 161 | 162 | ```html 163 | I say pecan. 164 | You say pecan. 165 | ``` 166 | 167 | Client will convert JSON to SSML and pass the XML string a speech API. 168 | 169 | ```js 170 | var msg = new SpeechSynthesisUtterance(); 171 | msg.text = convertJSONtoSSML(element.dataset.ssml); 172 | speechSynthesis.speak(msg); 173 | ``` 174 | 175 | - `data-ssml` referencing XML by template ID 176 | 177 | ```html 178 | 179 | 180 | 181 |You say, pecan. I say, pecan.
193 | ``` 194 | 195 | Client will parse XML and serialize it before passing to a speech API: 196 | 197 | ```js 198 | var msg = new SpeechSynthesisUtterance(); 199 | var xml = document.getElementById('pecan').content.firstElementChild; 200 | msg.text = serialize(xml); 201 | speechSynthesis.speak(msg); 202 | ``` 203 | 204 | - `data-ssml` referencing an XML string as script tag 205 | 206 | ```html 207 | 218 | 219 |You say, pecan. I say, pecan.
220 | ``` 221 | 222 | Client will pass the XML string raw to a speech API. 223 | 224 | ```js 225 | var msg = new SpeechSynthesisUtterance(); 226 | msg.text = document.getElementById('pecan').textContent; 227 | speechSynthesis.speak(msg); 228 | ``` 229 | 230 | - `data-ssml` referencing an external XML document by URL 231 | 232 | ```html 233 |You say, pecan. I say, pecan.
234 | ``` 235 | 236 | Client will pass the string payload to a speech API. 237 | 238 | ```js 239 | var msg = new SpeechSynthesisUtterance(); 240 | var response = await fetch(el.dataset.ssml) 241 | msg.txt = await response.text(); 242 | speechSynthesis.speak(msg); 243 | ``` 244 | 245 | ### Existing Work 246 | 247 | - [`aria-ssml` proposal](https://github.com/alia11y/SSMLinHTMLproposal) 248 | - [SSML](https://www.w3.org/TR/speech-synthesis11/) 249 | - [Web Speech API](https://w3c.github.io/speech-api/) 250 | 251 | ### Problems and Limitations 252 | 253 | - Does not assume or suggest visual pronunciation help for deaf or hard of hearing 254 | - Use of `data-*` requires input from AT vendors 255 | - XML data is not indexed by search engines 256 | 257 | ## Use Case HTML5 258 | 259 | ### Name 260 | HTML5 261 | 262 | ### Owner 263 | Paul Grenier 264 | 265 | ### Background and Current Practice 266 | HTML5 includes the XML [namespaces](https://www.w3.org/TR/html5/infrastructure.html#namespaces) for MathML and SVG. So, using either's elements in an HTML5 document is valid. Because SSML's implementation is non-visual in nature, browser implementation could be slow or non-existent without affecting how authors use SSML in HTML. Expansion of HTML5 to include SSML namespace would allow valid use of SSML in the HTML5 document. Browsers would treat the element like any other unknown element, as [`HTMLUnknownElement`](https://www.w3.org/TR/html50/dom.html#htmlunknownelement). 267 | 268 | ### Goal 269 | 270 | - Support valid use of SSML in HTML5 documents 271 | - Allow visual pronunciation support 272 | 273 | ### Target Audience 274 | 275 | - SSML-aware technologies and browser extensions 276 | - Search indexers 277 | 278 | ### Implementation Options 279 | 280 | - SSML 281 | 282 | When an element with [`data-ssml`](https://www.w3.org/TR/wai-aria-1.1/#aria-details) is encountered by an [SSML](https://www.w3.org/TR/speech-synthesis11/)-aware AT, the AT should enhance the user interface by processing the referenced SSML content and passing it to the [Web Speech API](https://w3c.github.io/speech-api/) or an external API (e.g., [Google's Text to Speech API](https://cloud.google.com/text-to-speech/)). 283 | 284 | ```html 285 |` tag from SSML is not given the prefix because we still want to start a semantic paragraph within the content. The other tags used in the example have no semantic meaning. Tags like `` in HTML could be converted to `
508 |
509 | This document is the Gap Analysis Review which presents required features of Spoken Text Pronunciation and
14 | Presentation and existing standards or specifications that may support (or enable support) of those features. Gaps
15 | are defined when a
16 | required feature does not have a corresponding method by which it can be authored in HTML. Accurate, consistent pronunciation and presentation of content spoken by text to speech synthesis (TTS) is an
22 | essential requirement in education and other domains. Organizations such as educational publishers and assessment
23 | vendors are
24 | looking for a standards-based solution to enable authoring of spoken presentation guidance in HTML which can then
25 | be consumed by assistive technologies and other applications which utilize text to speech synthesis for rendering
26 | of content. W3C has developed two standards pertaining to the presentation of speech synthesis which have reached
28 | recommendation status, Speech Synthesis Markup Language
29 | (SSML) and the Pronunciation Lexicon Specification
30 | (PLS). Both standards are directly consumed by a speech synthesis engine supporting those standards. While a PLS
31 | file reference may be referenced in an HTML
32 | page using link rel, there is no known uptake of PLS using this method by assistive technologies. The CSS Speech Module is a retired W3C Working Group Note that
38 | describes a mechanism by which content authors may apply a variety of speech styling and presentation properties to
39 | HTML. This approach
40 | has a variety of advantages but does not implement the full set of features required for pronunciation. Section 16 of the Note specifically references the
42 | issue of pronunciation: While a portion of CSS Speech was demonstrated by Apple in 2011 on iOS with Safari and VoiceOver,
55 | it is not presently supported on any platform with any Assistive Technology,
56 | and work on the standard has itself been stopped by the CSS working group. Efforts to address this need have been considered by both assessment technology vendors and the publishing
58 | community. Citing the need for pronunciation and presentation controls, the IMS Global Learning Consortium added
59 | the ability to author SSML
60 | markup, specify PLS files, and reference CSS Speech properties to the Question and Test Interoperability (QTI) Accessible Portable Item Protocol (APIP).
62 | In practice, QTI/APIP
63 | authored content is transformed into HTML for rendering in web browsers. This led to the dilemma that there is no
64 | standardized (and supported) method for inlining SSML in HTML, nor is there support for CSS Speech. This has led
65 | to the situation
66 | where SSML is the primary authoring model, with assessment vendors implementing a custom method for adding the
67 | SSML (or SSML-like) features to HTML using non-standard or data attributes, with customized Read Aloud software
68 | consuming those
69 | attributes for text to speech synthesis. Given the need to deliver accurate spoken presentation, non-standard
70 | approaches often include mis-use of WAI-ARIA, and novel or contextually non-valid attributes (e.g.,
71 | The attribute model for adding pronunciation and presentation guidance for assistive technologies and text to
78 | speech synthesis has demonstrated traction by vendors trying to solve this need. It should be noted that many of
79 | the required
80 | features are not well supported by a single attribute, as most follow the form of a presentation property / value
81 | pairing. Using multiple attributes to provide guidance to assistive technologies is not novel, as seen with
82 | WAI-ARIA where multiple
83 | attributes may be applied to a single element, for example, The common spoken pronunciation requirements from the education domain serve as a primary source for these
95 | features. These requirements can be broken down into the following main functions that would support authoring and
96 | spoken presentation
97 | needs. When content is authored in mixed language, a mechanism is needed to allow authors to indicate both the base
101 | language of the content as well as the language of individual words and phrases. The expectation is that
102 | assistive technologies and
103 | other tools that utilize text to speech synthesis would detect and apply the language requested when presenting
104 | the text. Content authors may elect to make adjustments of those parameters to control the spoken presentation for
109 | purposes such as providing a gender specific voice to reflect that of the author, or for a character (or
110 | characters) in theatrical
111 | presentation of a story. Many assistive technologies already provide user selection of voice family and gender
112 | independent of any authored intent. In some cases words may need to have their phonetic pronunciation prescribed by the content author. This may
117 | occur when uncommon words (not supported by text to speech synthesizers), or in cases where word pronunciation
118 | will vary based on
119 | context, and that context may not be correctly described. There are cases where content that is visually presented may require replacement (substitution) with an
124 | alternate textual form to ensure correct pronunciation by text to speech synthesizers. In some cases phonetic
125 | pronunciation may be a
126 | solution to this need. While end users should have full control over spoken presentation parameters such as speaking rate, pitch, and
131 | volume (e.g., WCAG 1.4.2 ), content authors may elect to make adjustments of those parameters to control the
132 | spoken presentation for
133 | purposes such as a theatrical presentation of a story. Many assistive technologies already provide user control
134 | of speaking rate, pitch, and volume independent of any authored intent. In written text, an author may find it necessary to add emphasis to an important word or phrase. HTML supports
139 | both semantic elements (e.g., While text to speech engines continue to improve in their ability to process text and provide accurate spoken
149 | rendering of acronyms and numeric values, there can be instances where uncommon terms or alphanumeric constructs
150 | pose challenges.
151 | Further, some educators may have specific requirements as to how a numeric value be spoken which may differ from
152 | a TTS engine's default rendering. For example, the Smarter Balanced Assessment Consortium has developed Read Aloud Guidelines to be
154 | followed by human readers used by students who may require a spoken presentation of an educational test, which
155 | includes specific examples
156 | of how numeric values should be read aloud. Precise control as to how numeric values should be spoken may not always be correctly determined by text to
160 | speech engines from context. Examples include speaking a number as individual digits, correct reading of
161 | year values, and
162 | the correct speaking of ordinal and cardinal numbers. Precise control as to how string values should be spoken, which may not be determined correctly by text to
167 | speech synthesizers. Specific spoken presentation requirements exist in the Accessibility Guidelines from PARCC, and
173 | include requirements such as inserting pauses in the spoken presentation, before and after emphasized words and
174 | mathematical
175 | terms. In practice, content authors may find it necessary to insert pauses between numeric values to limit the
176 | chance of hearing multiple numbers as a single value. One common technique to achieve pausing to date has
177 | involved inserting
178 | non-visible commas before or after a text string requiring a pause. While this may work in practice for a read
179 | aloud TTS tool, it is problematic for screen reader users who may, based on verbosity settings, hear the
180 | multiple commas announced,
181 | and for refreshable braille users who will have the commas visible in braille. Based on the features and use cases described in the prior sections, the following table presents existing speech
188 | presentation standards, HTML features, and WAI-ARIA attributes that may offer a method to achieve the requirement
189 | for HTML authors. A blank cell for any approach represents a gap in support. The following sections describe how each of the required features may be met by the use of existing approaches. A
267 | key consideration in the analysis is whether a means exists to directly author (or annotate) HTML content to
268 | incorporate the
269 | spoken presentation and pronunciation feature. Allow content authors to specify the language of text contained within an element so that the TTS used for
273 | rendering will select the appropriate language for synthesis. Example: Allow content authors to specify a specific TTS voice to be used to render text. For example, for content that
285 | presents a dialog between two people, a woman and a man, the author may specify that a female voice be used for
286 | the woman's text and a
287 | male voice be used for the man's text. Some platform TTS services may support a variety of voices, identified by
288 | a name, gender, or even age. Example: Using the Example: Allow content authors to precisely specify the phonetic pronunciation of a word or phrase. Using PLS, all the pronunciations can be factored out into an external PLS document which is referenced by the
303 |
306 | Introduction
33 | While there are technically methods to allow authors to inline SSML within HTML (using namespaces), such an
34 | approach has not been
35 | adopted, and anecdotal comments from browser and assistive technology vendors have suggested this is not a viable
36 | approach.
44 | CSS does not specify how to define the pronunciation (expressed using a well-defined phonetic alphabet) of a
45 | particular piece of text within the markup document. A "phonemes" property was described in earlier drafts of this
46 | specification, but
47 | objections were raised due to breaking the principle of separation between content and presentation (the
48 | "phonemes" authored within aural CSS stylesheets would have needed to be updated each time text changed within the
49 | markup document). The
50 | "phonemes" functionality is therefore considered out-of-scope in CSS (the presentation layer) and should be
51 | addressed in the markup / content layer.
52 |
53 | label). A
72 | particular problem occurs when custom pronunciation is applied via a misuse of the aria-label
73 | attribute, which results in an issue for screen reader users who also rely upon refreshable braille, and in which
74 | a hinted pronunciation
75 | intended only for a text to speech synthesizer also appears on the braille display.
76 | role and aria-checked. The
84 | EPUB
85 | standard for
86 | digital publishing introduced a namespaced version of the SSML phoneme and alphabet
87 | attributes enabling content authors to provide pronunciation guidance. Uptake by the publishing community has been
88 | limited, reportedly
89 | due to the lack of support in reading systems and assistive technologies.
90 | Core Features for Pronunciation and Spoken Presentation
94 | Language
100 | Voice Family / Gender
108 | Phonetic Pronunciation of String Values
116 | String Substitution
123 | Rate / Pitch / Volume
130 | Emphasis
138 | em) and CSS properties which, through a variety of style options,
140 | make programmatic
141 | detection of authored emphasis difficult (e.g., font-weight: heavy). While the emphasis element has
142 | existed since HTML 2.0, there is currently no uptake by assistive technology or read aloud tools to present text
143 | semantically tagged
144 | for emphasis to be spoken with emphasis.Say As
148 | Presentation of Numeric Values
159 | Presentation of String Values
166 | Pausing
172 | Gap Analysis
| Requirement | HTML | WAI-ARIA | PLS | CSS Speech | SSML |
| --- | --- | --- | --- | --- | --- |
| Language | Yes | | | | Yes |
| Voice Family/Gender | | | | Yes | Yes |
| Phonetic Pronunciation | | | Yes | | Yes |
| Substitution | | Partial | | | Yes |
| Rate/Pitch/Volume | | | | Yes | Yes |
| Emphasis | Yes | | | Yes | Yes |
| Say As | | | | | Yes |
| Pausing | | | | Yes | Yes |

Language
272 | HTML
276 | lang attribute can be applied at the document level or to individual elements. (WCAG) (AT
277 | Supported: some)
SSML
279 | <speak> In Paris, they pronounce it <lang xml:lang="fr-FR">Paris</lang>
280 | </speak>
Voice Family/Gender
284 | CSS
290 | The voice-family property can be used to specify the gender of the voice: { voice-family: male; }
SSML
293 | <voice> element, the gender of the speaker, if supported by the TTS engine, can be
294 | specified. <voice gender="female">Mary had a little lamb,</voice>
Phonetic Pronunciation
300 | PLS
302 | <lexicon> element of SSML
304 | Example:
310 | <speak> <lexicon uri="http://www.example.com/movie_lexicon.pls"/>
307 | The title of the movie is: "La vita è bella" (Life is beautiful),
308 | which is directed by Roberto Benigni.</speak>
309 |
The following is a simple example of an SSML document. It includes an Italian movie title and the name of the 314 | director to be read in US English.
315 |316 |
Example: The title of the movie is:
317 | <speak> <phoneme alphabet="ipa" ph="ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə">
318 | "La vita è bella"</phoneme> (Life is beautiful),
319 | which is directed by
320 | <phoneme alphabet="ipa" ph="ɹəˈbɛːɹɾoʊ bɛˈniːnji">
321 | Roberto Benigni </phoneme>.</speak>
322 |
323 |
324 |
325 |
326 |
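For illustration only (not part of the gap analysis), the phoneme example above could be handed to the Web Speech API as a raw SSML string; whether an engine honors SSML passed this way is platform-dependent.

```js
// Sketch, assuming the engine accepts an SSML string as utterance text.
const msg = new SpeechSynthesisUtterance();
msg.text = `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The title of the movie is:
  <phoneme alphabet="ipa" ph="ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə">"La vita è bella"</phoneme>
  (Life is beautiful).
</speak>`;
speechSynthesis.speak(msg);
```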
327 | Allow content authors to substitute a text string to be rendered by TTS instead of the actual text contained in 330 | an element.
331 | The aria-label and aria-labelledby attributes can be used
334 | by an author to supply a text string
335 | that will become the accessible name for the element upon which it is applied. This usage effectively
336 | provides a mechanism for performing text substitution that is supported by a screen reader. However, it is
337 | problematic for one significant reason; for users who utilize screen readers and refreshable Braille, the
338 | content that is voiced will not match the content that is sent to the refreshable Braille device. This mismatch
339 | would not be acceptable for some content, particularly for assessment content.
Pronounce the specified word or phrase as a different word or phrase. Specify the pronunciation to substitute
342 | with the alias attribute.
344 |
345 |
346 | <speak>
347 | My favorite chemical element is <sub alias="aluminum">Al</sub>,
348 | but Al prefers <sub alias="magnesium">Mg</sub>.
349 | </speak>
350 |
351 |
352 |
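As a client-side illustration, a substitution that mirrors the SSML sub element could read an alias from a custom attribute (the data-alias attribute here is hypothetical) instead of overriding the accessible name, so the visible text and braille output are left untouched.

```js
// Sketch with a hypothetical data-alias attribute, e.g. <span data-alias="aluminum">Al</span>.
const el = document.querySelector('[data-alias]');
const msg = new SpeechSynthesisUtterance(el.dataset.alias); // speech says "aluminum", braille still shows "Al"
speechSynthesis.speak(msg);
```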
353 | Allow content authors to specify characteristics, such as rate, pitch, and/or volume of the TTS rendering of 357 | the text.
358 |360 |
‘voice-rate’ property manipulates the rate of generated synthetic speech in terms of words
363 | per minute.368 |
‘voice-pitch’ property specifies the "baseline" pitch of the generated speech output, which
371 | depends on the used ‘voice-family’ instance, and varies across speech synthesis processors (it
372 | approximately corresponds to the average pitch of the output). For example, the common pitch for a male voice
373 | is around 120Hz, whereas it is around 210Hz for a female voice.377 |
‘voice-range’ property specifies the variability in the "baseline" pitch, i.e. how much the
380 | fundamental frequency may deviate from the average pitch of the speech output. The dynamic pitch range of the
381 | generated speech generally increases for a highly animated voice, for example when variations in inflection
382 | are used to convey meaning and emphasis in speech. Typically, a low range produces a flat, monotonic voice,
383 | whereas a high range produces an animated voice.prosody modifies the volume, pitch, and rate of the tagged speech.
389 |
390 |
391 | <speak>
392 | Normal volume for the first sentence.
393 | <prosody volume="x-loud">Louder volume for the second sentence</prosody>.
394 | When I wake up, <prosody rate="x-slow">I speak quite slowly</prosody>.
395 | I can speak with my normal pitch,
396 | <prosody pitch="x-high"> but also with a much higher pitch </prosody>,
397 | and also <prosody pitch="low">with a lower pitch</prosody>.
398 | </speak>
399 |
400 |
401 |
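For comparison, the Web Speech API exposes rate, pitch, and volume directly on the utterance; a client could set these from authored guidance when the engine does not accept SSML prosody (a sketch, not a defined mapping).

```js
// Sketch: utterance-level prosody via the Web Speech API.
const msg = new SpeechSynthesisUtterance('Louder and slower speech for this sentence.');
msg.rate = 0.7;   // 0.1 to 10, default 1
msg.pitch = 1.0;  // 0 to 2, default 1
msg.volume = 1.0; // 0 to 1, default 1
speechSynthesis.speak(msg);
```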
402 | Allow content authors to specify that text content be spoken with emphasis, for example, louder and more 406 | slowly. This can be viewed as a simplification of the Rate/Pitch/Volume controls to reduce authoring complexity. 407 |
408 |
410 | The HTML <em> element marks text that has stress emphasis. The <em>
411 | element can be nested, with each level of nesting indicating a greater degree of emphasis.
412 |
414 | The <em> element is for words that have a stressed emphasis compared to surrounding text,
415 | which is often limited to a word or words of a sentence and affects the meaning of the sentence itself.
416 |
417 | Typically this element is displayed in italic type. However, it should not be used simply to apply italic
418 | styling; use the CSS font-style property for that purpose. Use the <cite>
419 | element to mark the title of a work (book, play, song, etc.). Use the <i> element to mark
420 | text that is in an alternate tone or mood, which covers many common situations for italics such as scientific
421 | names or words in other languages. Use the <strong> element to mark text that has greater
422 | importance than surrounding text.
423 |
427 |
‘voice-stress’ property manipulates the strength of emphasis, which is normally applied
430 | using a combination of pitch change, timing changes, loudness and other acoustic differences. The precise
431 | meaning of the values therefore depend on the language being spoken.Emphasize the tagged words or phrases. Emphasis changes rate and volume of the speech. More emphasis is spoken 437 | louder and slower. Less emphasis is quieter and faster.
438 |439 |
440 |
441 | <speak>
442 | I already told you I
443 | <emphasis level="strong">really like</emphasis> that person.
444 | </speak>
445 |
446 |
447 |
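As an illustration, an assistive technology might approximate authored emphasis without SSML by speaking the emphasized run as its own utterance with adjusted rate and pitch; the helper below is a sketch, not a defined mapping.

```js
// Sketch: approximate emphasis by splitting the text around the emphasized run.
function speakWithEmphasis(before, emphasized, after) {
  const stressed = new SpeechSynthesisUtterance(emphasized);
  stressed.rate = 0.8;  // a little slower
  stressed.pitch = 1.2; // a little higher
  [new SpeechSynthesisUtterance(before), stressed, new SpeechSynthesisUtterance(after)]
    .forEach(u => speechSynthesis.speak(u));
}
speakWithEmphasis('I already told you I', 'really like', 'that person.');
```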
448 | Allow content authors to specify how text is spoken. For example, content authors would be able to indicate 452 | that a series of four numbers should be spoken as a year rather than a cardinal number.
453 |The ‘speak-as’ property determines in what manner text gets rendered aurally, based upon a
455 | predefined list of possibilities.
457 | Speech synthesizers are knowledgeable about what a number is. The ‘speak-as’ property enables some
458 | level of control on how user agents render numbers, and may be implemented as a preprocessing step before
459 | passing the text to the actual speech synthesizer.
460 |
463 | Describes how the text should be interpreted. This lets you provide additional context to the text and eliminate
464 | any ambiguity on how Alexa should render the text. Indicate how Alexa should interpret the text with the
465 | interpret-as attribute.
466 |
468 |
469 |
470 | <speak>
471 | Here is a number spoken as a cardinal number:
472 | <say-as interpret-as="cardinal">12345</say-as>.
473 | Here is the same number with each digit spoken separately:
474 | <say-as interpret-as="digits">12345</say-as>.
475 | Here is a word spelled out: <say-as interpret-as="spell-out">hello</say-as>
476 | </speak>
477 |
478 |
479 |
480 |
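Where say-as is not available, a client could fall back to preprocessing the text before synthesis; the helper below is a sketch that expands a digit run so each digit is spoken separately.

```js
// Sketch: "Your code is 12345." -> "Your code is 1 2 3 4 5."
function speakAsDigits(text) {
  return text.replace(/\d+/g, run => run.split('').join(' '));
}
speechSynthesis.speak(new SpeechSynthesisUtterance(speakAsDigits('Your code is 12345.')));
```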
481 | Allow content authors to specify pauses before or after content to ensure the desired prosody of the 485 | presentation, which can affect the pronunciation of content that precedes or follows the 486 | pause.
487 |
489 | The ‘pause-before’ and ‘pause-after’ properties specify a prosodic boundary (silence
490 | with a specific duration) that occurs before (or after) the speech synthesis rendition of the selected element,
491 | or if any ‘cue-before’ (or ‘cue-after’) is specified, before (or after) the cue within
492 | the aural box model.
493 |
495 |
496 | Note that although the functionality provided by this property is similar to the break element from
497 | the SSML markup language [SSML], the application of ‘pause’ prosodic boundaries within the aural box model of
498 | CSS Speech requires special considerations (e.g. "collapsed" pauses).
499 |
500 |
503 | break represents a pause in the speech. Set the length of the pause with the strength
504 | or time attributes.
505 |
507 |
508 |
509 | <speak>
510 | There is a three second pause here <break time="3s"/>
511 | then the speech continues.
512 | </speak>
513 |
514 |
515 |
516 |
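Without SSML break support, a client can approximate a pause by splitting the text into separate utterances; the synthesis queue introduces a boundary between them, although its duration is engine-dependent (a sketch).

```js
// Sketch: queued utterances produce a prosodic boundary between sentences.
['There is a pause here.', 'Then the speech continues.']
  .forEach(sentence => speechSynthesis.speak(new SpeechSynthesisUtterance(sentence)));
```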
517 | The objective of the Pronunciation Task Force is to develop normative specifications and best practices guidance, collaborating with other W3C groups as appropriate, to provide for proper pronunciation in HTML content when using text to speech (TTS) synthesis. This document provides various use cases highlighting the need for standardization of pronunciation markup, to ensure consistent and accurate spoken representation of the content. The requirements from the user scenarios provide the basis for these technical requirements/specifications.
This document provides use cases which describe specific implementation approaches for introducing pronunciation 26 | and spoken presentation authoring markup into HTML5. These approaches are based on the two primary approaches 27 | that have evolved from the Pronunciation Task Force members. Other approaches may appear in subsequent working drafts. 28 |
29 |Successful use cases will be those that provide ease of authoring and consumption by assistive technologies and user 30 | agents that utilize synthetic speech for spoken presentation of web content. The most challenging aspect of consumption may 31 | be alignment of the markup approach with the standard mechanisms by which assistive technologies, specifically screen 32 | readers, obtain content via platform accessibility APIs. 33 |
34 | 35 | 36 |A new aria attribute could be used to include pronunciation content.
Embed SSML in an HTML document.
46 |aria-ssml as embedded JSON
58 |When AT encounters an element with aria-ssml, the AT should enhance the UI by processing the pronunciation content and passing it to the Web Speech API or an external API (e.g., Google's Text to Speech API).
59 |I say <span aria-ssml='{"phoneme":{"ph":"pɪˈkɑːn","alphabet":"ipa"}}'>pecan</span>.
60 | You say <span aria-ssml='{"phoneme":{"ph":"ˈpi.kæn","alphabet":"ipa"}}'>pecan</span>.
61 | Client will convert JSON to SSML and pass the XML string to a speech API.
62 |var msg = new SpeechSynthesisUtterance();
63 | msg.text = convertJSONtoSSML(element.getAttribute('aria-ssml'));
64 | speechSynthesis.speak(msg);
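The convertJSONtoSSML helper is not defined by this document. A minimal sketch, assuming the embedded JSON convention shown above, might look like the following; note that the element's visible text would also need to be supplied (or the helper could accept the element itself).

```js
// Sketch only: build an SSML string from the embedded JSON convention above.
function convertJSONtoSSML(json, text) {
  const { phoneme } = JSON.parse(json);
  return `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">` +
         `<phoneme alphabet="${phoneme.alphabet}" ph="${phoneme.ph}">${text}</phoneme>` +
         `</speak>`;
}
// Hypothetical usage: msg.text = convertJSONtoSSML(span.getAttribute('aria-ssml'), span.textContent);
```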
65 | aria-ssml referencing XML by template ID
66 |<!-- ssml must appear inside a template to be valid --> 67 | <template id="pecan"> 68 | <?xml version="1.0"?> 69 | <speak version="1.1" 70 | xmlns="http://www.w3.org/2001/10/synthesis" 71 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 72 | xsi:schemaLocation="http://www.w3.org/2001/10/synthesis 73 | http://www.w3.org/TR/speech-synthesis11/synthesis.xsd" 74 | xml:lang="en-US"> 75 | You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 76 | I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>. 77 | </speak> 78 | </template> 79 | 80 | <p aria-ssml="#pecan">You say, pecan. I say, pecan.</p>81 |
Client will parse XML and serialize it before passing to a speech API:
82 |var msg = new SpeechSynthesisUtterance();
83 | var xml = document.getElementById('pecan').content.firstElementChild;
84 | msg.text = serialize(xml);
85 | speechSynthesis.speak(msg);
86 |
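serialize is likewise left undefined above; in a browser it could simply wrap XMLSerializer (a sketch):

```js
// Sketch: serialize the <speak> element from the <template> into an XML string.
function serialize(node) {
  return new XMLSerializer().serializeToString(node);
}
```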
87 | aria-ssml referencing an XML string as script tag
88 |<script id="pecan" type="application/ssml+xml"> 89 | <speak version="1.1" 90 | xmlns="http://www.w3.org/2001/10/synthesis" 91 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 92 | xsi:schemaLocation="http://www.w3.org/2001/10/synthesis 93 | http://www.w3.org/TR/speech-synthesis11/synthesis.xsd" 94 | xml:lang="en-US"> 95 | You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 96 | I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>. 97 | </speak> 98 | </script> 99 | 100 | <p aria-ssml="#pecan">You say, pecan. I say, pecan.</p>101 |
Client will pass the XML string raw to a speech API.
102 |var msg = new SpeechSynthesisUtterance();
103 | msg.text = document.getElementById('pecan').textContent;
104 | speechSynthesis.speak(msg);
105 | aria-ssml referencing an external XML document by URL
106 |<p aria-ssml="http://example.com/pronounce.ssml#pecan">You say, pecan. I say, pecan.</p>107 |
Client will pass the string payload to a speech API.
108 | var msg = new SpeechSynthesisUtterance(); 109 | var response = await fetch(el.getAttribute('aria-ssml')); 110 | msg.text = await response.text(); 111 | speechSynthesis.speak(msg);112 |
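Note that fetch ignores the #pecan fragment; if one external document holds several pronunciations, the client would have to resolve the fragment itself. The fetchSSML helper below is a hypothetical sketch of that step.

```js
// Sketch (hypothetical helper): fetch the document and, when the URL carries a
// fragment, return only the referenced element, serialized back to a string.
async function fetchSSML(url) {
  const response = await fetch(url);
  const text = await response.text();
  const id = new URL(url, location.href).hash.slice(1);
  if (!id) return text;
  const doc = new DOMParser().parseFromString(text, 'application/xml');
  const target = doc.querySelector(`[id="${id}"]`);
  return target ? new XMLSerializer().serializeToString(target) : text;
}
```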
As an existing attribute, data-* could be used, with some conventions, to include pronunciation content.
136 |Hearing users
148 |data-ssml as embedded JSON
152 |When an element with data-ssml is encountered by an SSML-aware AT, the AT should enhance the user interface by processing the referenced SSML content and passing it to the Web Speech API or an external API (e.g., Google's Text to Speech API).
153 |<h2>The Pronunciation of Pecan</h2>
154 | <p><speak>
155 | I say <span data-ssml='{"phoneme":{"ph":"pɪˈkɑːn","alphabet":"ipa"}}'>pecan</span>.
156 | You say <span data-ssml='{"phoneme":{"ph":"ˈpi.kæn","alphabet":"ipa"}}'>pecan</span>.
157 | Client will convert JSON to SSML and pass the XML string to a speech API.
158 |var msg = new SpeechSynthesisUtterance(); 159 | msg.text = convertJSONtoSSML(element.dataset.ssml); 160 | speechSynthesis.speak(msg);161 | 162 |
data-ssml referencing XML by template ID
163 |<!-- ssml must appear inside a template to be valid --> 164 | <template id="pecan"> 165 | <?xml version="1.0"?> 166 | <speak version="1.1" 167 | xmlns="http://www.w3.org/2001/10/synthesis" 168 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 169 | xsi:schemaLocation="http://www.w3.org/2001/10/synthesis 170 | http://www.w3.org/TR/speech-synthesis11/synthesis.xsd" 171 | xml:lang="en-US"> 172 | You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 173 | I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>. 174 | </speak> 175 | </template> 176 | 177 | <p data-ssml="#pecan">You say, pecan. I say, pecan.</p>178 |
Client will parse XML and serialize it before passing to a speech API:
179 |var msg = new SpeechSynthesisUtterance();
180 | var xml = document.getElementById('pecan').content.firstElementChild;
181 | msg.text = serialize(xml);
182 | speechSynthesis.speak(msg);
183 |
184 | data-ssml referencing an XML string as script tag
185 |<script id="pecan" type="application/ssml+xml"> 186 | <speak version="1.1" 187 | xmlns="http://www.w3.org/2001/10/synthesis" 188 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 189 | xsi:schemaLocation="http://www.w3.org/2001/10/synthesis 190 | http://www.w3.org/TR/speech-synthesis11/synthesis.xsd" 191 | xml:lang="en-US"> 192 | You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 193 | I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>. 194 | </speak> 195 | </script> 196 | 197 | <p data-ssml="#pecan">You say, pecan. I say, pecan.</p>198 |
Client will pass the XML string raw to a speech API.
199 |var msg = new SpeechSynthesisUtterance();
200 | msg.text = document.getElementById('pecan').textContent;
201 | speechSynthesis.speak(msg);
202 | data-ssml referencing an external XML document by URL
203 |<p data-ssml="http://example.com/pronounce.ssml#pecan">You say, pecan. I say, pecan.</p>204 |
Client will pass the string payload to a speech API.
205 | var msg = new SpeechSynthesisUtterance(); 206 | var response = await fetch(el.dataset.ssml); 207 | msg.text = await response.text(); 208 | speechSynthesis.speak(msg);209 |
HTML5 includes the XML namespaces for MathML and SVG. So, using either's elements in an HTML5 document is valid. Because SSML's implementation is non-visual in nature, browser implementation could be slow or non-existent without affecting how authors use SSML in HTML. Expansion of HTML5 to include SSML namespace would allow valid use of SSML in the HTML5 document. Browsers would treat the element like any other unknown element, as HTMLUnknownElement.
233 |SSML
251 |When an element with data-ssml is encountered by an SSML-aware AT, the AT should enhance the user interface by processing the referenced SSML content and passing it to the Web Speech API or an external API (e.g., Google's Text to Speech API).
252 |<h2>The Pronunciation of Pecan</h2> 253 | <p><speak> 254 | You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 255 | I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>. 256 | </speak></p>257 |
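For illustration (not part of the proposal), an SSML-aware tool or extension might collect the inline speak elements, which the browser otherwise treats as unknown elements, and pass their markup to the Web Speech API:

```js
// Sketch: voice each inline <speak> element as an SSML string.
document.querySelectorAll('speak').forEach(el => {
  const msg = new SpeechSynthesisUtterance();
  msg.text = new XMLSerializer().serializeToString(el); // or el.textContent as a plain-text fallback
  speechSynthesis.speak(msg);
});
```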
SSML is not valid HTML5
269 |Embed valid SSML in HTML using custom elements registered as ssml-* where * is the actual SSML tag name (except for p which expects the same treatment as an HTML p in HTML layout).
277 |Support use of SSML in HTML documents.
281 |ssml-speak: see demo
292 |Only the <ssml-speak> component requires registration. The component code lifts the SSML by getting the innerHTML and removing the ssml- prefix from the interior tags and passing it to the web speech API. The <p> tag from SSML is not given the prefix because we still want to start a semantic paragraph within the content. The other tags used in the example have no semantic meaning. Tags like <em> in HTML could be converted to <emphasis> in SSML. In that case, CSS styles will come from the browser's default styles or the page author.
293 |<ssml-speak>
294 | Here are <ssml-say-as interpret-as="characters">SSML</ssml-say-as> samples.
295 | I can pause<ssml-break time="3s"></ssml-break>.
296 | I can speak in cardinals.
297 | Your number is <ssml-say-as interpret-as="cardinal">10</ssml-say-as>.
298 | Or I can speak in ordinals.
299 | You are <ssml-say-as interpret-as="ordinal">10</ssml-say-as> in line.
300 | Or I can even speak in digits.
301 | The digits for ten are <ssml-say-as interpret-as="characters">10</ssml-say-as>.
302 | I can also substitute phrases, like the <ssml-sub alias="World Wide Web Consortium">W3C</ssml-sub>.
303 | Finally, I can speak a paragraph with two sentences.
304 | <p>
305 | <ssml-s>You say, <ssml-phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</ssml-phoneme>.</ssml-s>
306 | <ssml-s>I say, <ssml-phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</ssml-phoneme>.</ssml-s>
307 | </p>
308 | </ssml-speak>
309 | <template id="ssml-controls">
310 | <style>
311 | [role="switch"][aria-checked="true"] :first-child,
312 | [role="switch"][aria-checked="false"] :last-child {
313 | background: #000;
314 | color: #fff;
315 | }
316 | </style>
317 | <slot></slot>
318 | <p>
319 | <span id="play">Speak</span>
320 | <button role="switch" aria-checked="false" aria-labelledby="play">
321 | <span>on</span>
322 | <span>off</span>
323 | </button>
324 | </p>
325 | </template>
326 | class SSMLSpeak extends HTMLElement {
327 | constructor() {
328 | super();
329 | const template = document.getElementById('ssml-controls');
330 | const templateContent = template.content;
331 | this.attachShadow({mode: 'open'})
332 | .appendChild(templateContent.cloneNode(true));
333 | }
334 | connectedCallback() {
335 | const button = this.shadowRoot.querySelector('[role="switch"][aria-labelledby="play"]')
336 | const ssml = this.innerHTML.replace(/ssml-/gm, '')
337 | const msg = new SpeechSynthesisUtterance();
338 | msg.lang = document.documentElement.lang;
339 | msg.text = `<speak version="1.1"
340 | xmlns="http://www.w3.org/2001/10/synthesis"
341 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
342 | xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
343 | http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
344 | xml:lang="${msg.lang}">
345 | ${ssml}
346 | </speak>`;
347 | msg.voice = speechSynthesis.getVoices().find(voice => voice.lang.startsWith(msg.lang));
348 | msg.onstart = () => button.setAttribute('aria-checked', 'true');
349 | msg.onend = () => button.setAttribute('aria-checked', 'false');
350 | button.addEventListener('click', () => speechSynthesis[speechSynthesis.speaking ? 'cancel' : 'speak'](msg))
351 | }
352 | }
353 |
354 | customElements.define('ssml-speak', SSMLSpeak);
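One caveat with the component above: in some browsers speechSynthesis.getVoices() returns an empty list until the voiceschanged event has fired, so voice selection may need to be deferred. The pickVoice helper below is a sketch of that refinement, not part of the demo.

```js
// Sketch: resolve a voice for the requested language once voices are available.
function pickVoice(lang) {
  const match = () => speechSynthesis.getVoices().find(v => v.lang.startsWith(lang));
  return new Promise(resolve => {
    const voice = match();
    if (voice) return resolve(voice);
    speechSynthesis.addEventListener('voiceschanged', () => resolve(match()), { once: true });
  });
}
// Hypothetical use inside connectedCallback: pickVoice(msg.lang).then(voice => { msg.voice = voice; });
```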
355 | JSON-LD provides an established standard for embedding data in HTML. Unlike other microdata approaches, JSON-LD helps to reuse standardized annotations through external references.
379 |Support use of SSML in HTML documents.
383 |JSON-LD
394 |<script type="application/ld+json">
395 | {
396 | "@context": "http://schema.org/",
397 | "@id": "/pronunciation#WKRP",
398 | "@type": "RadioStation",
399 | "name": ["WKRP",
400 | "@type": "PronounceableText",
401 | "textValue": "WKRP",
402 | "speechToTextMarkup": "SSML",
403 | "phoneticText": "<speak><say-as interpret-as=\"characters\">WKRP</say-as>"
404 | ]
405 | }
406 | </script>
407 | <p>
408 | Do you listen to <span itemscope
409 | itemtype="http://schema.org/PronounceableText"
410 | itemid="/pronunciation#WKRP">WKRP</span>?
411 | </p>
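For illustration, a client could pair the span's itemid with the JSON-LD @id and hand the phoneticText to the Web Speech API (a sketch, assuming the name array nests the PronounceableText object as shown above and that the engine accepts SSML strings):

```js
// Sketch: look up the PronounceableText entry and speak its SSML.
const data = JSON.parse(
  document.querySelector('script[type="application/ld+json"]').textContent);
const pronounceable = data.name.find(item => item['@type'] === 'PronounceableText');
const msg = new SpeechSynthesisUtterance();
msg.text = pronounceable.phoneticText;
speechSynthesis.speak(msg);
```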
412 | not an established "type"/published schema
427 |<Ruby> annotations are short runs of text presented alongside base text, primarily used in East Asian typography as a guide for pronunciation or to include other annotations.434 |
ruby guides pronunciation visually. This seems like a natural fit for text-to-speech.
435 |ruby with microdata
453 |Microdata can augment the ruby element and its descendants.
454 |<p> 455 | You say, 456 | <span itemscope="" itemtype="http://example.org/Pronunciation"> 457 | <ruby itemprop="phoneme" content="pecan"> 458 | pecan 459 | <rt itemprop="ph">pɪˈkɑːn</rt> 460 | <meta itemprop="alphabet" content="ipa"> 461 | </ruby>. 462 | </span> 463 | I say, 464 | <span itemscope="" itemtype="http://example.org/Pronunciation"> 465 | <ruby itemprop="phoneme" content="pecan"> 466 | pe 467 | <rt itemprop="ph">ˈpi</rt> 468 | can 469 | <rt itemprop="ph">kæn</rt> 470 | <meta itemprop="alphabet" content="ipa"> 471 | </ruby>. 472 | </span> 473 | </p>474 |
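For illustration, an SSML-aware client could turn each annotated ruby run into an SSML phoneme (a sketch; engine support for SSML strings varies):

```js
// Sketch: build a <phoneme> from the ruby base text and its <rt> readings.
document.querySelectorAll('ruby[itemprop="phoneme"]').forEach(ruby => {
  const ph = Array.from(ruby.querySelectorAll('[itemprop="ph"]'))
    .map(rt => rt.textContent).join('');
  const alphabet = ruby.querySelector('[itemprop="alphabet"]')?.content || 'ipa';
  const msg = new SpeechSynthesisUtterance();
  msg.text = `<speak><phoneme alphabet="${alphabet}" ph="${ph}">` +
             `${ruby.getAttribute('content')}</phoneme></speak>`;
  speechSynthesis.speak(msg);
});
```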
The purpose of developing user scenarios is to facilitate discussion and further requirements definition for pronunciation standards developed within the PTF prior to review of the APA. There are numerous interpretations of what form user scenarios adopt. Within the user experience research (UXR) body of practice, a user scenario is a written narrative related to the use of a service from the perspective of a user or user group. Importantly, the context of use is emphasized as is the desired outcome of use. There are potentially thousands of user scenarios for a technology such as TTS, however, the focus for the PTF is on the core scenarios that relate to the kinds of users who will engage with TTS.
496 |User scenarios, like Personas, represent a composite of real-world experiences. In the case of the PTF, the scenarios were derived from interviews of people who were end-consumers of TTS, as well as submitted narratives and industry examples from practitioners. There are several formats of scenarios. Several are general goal or task-oriented scenarios. Others elaborate on richer context, for example, educational assessment.
497 |The following user scenarios are organized on the three perspectives of TTS use derived from analysis of the qualitative data collected from the discovery work:
498 |Need to add the other categories, or remove the list above and just rely on the ToC.
504 | As an AAC user, I want my name to be pronounced correctly, and I want to pronounce others' names correctly using my AAC device.
511 |As an AAC user, I want to be able to input and store the correct pronunciation of others’ names, so I can address people respectfully and build meaningful relationships.
514 |For instance, when meeting someone named “Nguyễn,” the AAC user wants to ensure their device pronounces the name correctly, using IPA or SSML markup, to foster respectful communication and avoid embarrassment.
515 |As an AAC user, I want my name to be pronounced correctly by my device, so that I can confidently introduce myself in social, educational, and professional settings.
519 |For example, a user named “Siobhán” may find that default TTS engines mispronounce her name. She wants to input a phonetic or SSML-based pronunciation so that her name is spoken accurately every time.
520 |Ultimately, the quality and variation of TTS rendering by assistive technologies vary widely according to a user's context. The following user scenarios reinforce the necessity for accurate pronunciation from the perspective of those who consume digitally generated content.
527 |As a traveler who uses assistive technology (AT) with TTS to help navigate through websites, I need to hear arrival and destination codes pronounced accurately so I can select the desired travel itinerary. For example, a user with a visual impairment attempts to book a flight to Ottawa, Canada and so goes to a travel website. The user already knows the airport code and enters "YOW". The site produces the result in a drop-down list as "Ottawa, CA" but the AT does not pronounce the text accurately to help the user make the correct association between their data entry and the list item.
530 |As a test taker (tester) with a visual impairment who may use assistive technology to access the test content with speech software, screen reader or refreshable braille device, I want the content to be presented as intended, with accurate pronunciation and articulation, so that my assessment accurately reflects my knowledge of the content.
534 |As a student/learner with auditory and cognitive processing issues, it is difficult to distinguish sounds, inflections, and variations in pronunciation as rendered through synthetic voice, such as text-to-speech or screen reader technologies. Consistent and accurate pronunciation whether human-provided, external, or embedded is needed to support working executive processing, auditory processing and memory that facilitates comprehension in literacy and numeracy for learning and for assessments.
538 |As an English Learner (EL) or a visually impaired early learner using speech synthesis for reading comprehension that includes decoding words from letters as part of the learning construct (intent of measurement), pronunciation accuracy is vital to successful comprehension, as it allows the learner to distinguish sounds at the sentence, word, syllable, and phoneme level.
542 |The advent of graphical user interfaces (GUIs) for the management and editing of text content has given rise to content creators not requiring technical expertise beyond the ability to operate a text editing application such as Microsoft Word. The following scenario summarizes the general use, accompanied by a hypothetical application.
548 |In the educational assessment field, providing accurate and concise pronunciation for students with auditory accommodations, such as text-to-speech (TTS) or students with screen readers, is vital for ensuring content validity and alignment with the intended construct, which objectively measures a test takers knowledge and skills. For test administrators/educators, pronunciations must be consistent across instruction and assessment in order to avoid test bias or impact effects for students. Some additional requirements for the test administrators, include, but are not limited to, such scenarios:
556 |As a test administrator, I want to ensure that students with the read-aloud accommodation, who are using assistive technology or speech synthesis as an alternative to a human reader, have the same speech quality (e.g., intonation, expression, pronunciation, and pace, etc.) as a spoken language.
559 | This may be similar to the other Test Administrator case below?
560 |As a math educator, I want to ensure that speech accuracy with mathematical expressions, including numbers, fractions, and operations have accurate pronunciation for those who rely on TTS. Some mathematical expressions require special pronunciations to ensure accurate interpretation while maintaining test validity and construct. Specific examples include:
564 | Mathematical formulas written in simple text with special formatting should convey the correct meaning of the expression to identify changes from normal text to super- or to sub-script text. For example, without the proper formatting, the equation a³ − b³ = (a − b)(a² + ab + b²) may incorrectly render through some technologies and applications as a3-b3=(a-b)(a2+ab+b2).
Distinctions made in writing are often not made explicit in speech. For example, “fx” may be interpreted as fx, f(x), fx, F X, F X. The distinction depends on the context, requiring the author to provide consistent and accurate semantic markup.
571 | For math equations with Greek letters, it is important that the speech synthesizer be able to distinguish the phonetic differences between them, whether in the natural language or phonetic equivalents. For example, ε (epsilon), υ (upsilon), φ (phi), χ (chi), ξ (xi).
575 |As a test administrator/educator, pronunciations must be consistent across instruction and assessment, in order to avoid test bias and pronunciation effects on performance for students with disabilities (SWD) in comparison to students without disabilities (SWOD). Examples include:
580 |If a test question is measuring rhyming of words or sounds of words, the speech synthesis should not read aloud the words, but rather spell out the words in the answer options.
583 |If a test question is measuring spelling and the student needs to consider spelling correctness/incorrectness, the speech synthesis should not read aloud the misspelt words, especially for words, such as:
587 | The extension of content management in TTS serves as a means of encoding and preserving spoken text for academic analyses, irrespective of discipline, subject domain, or research methodology.
599 |A. As a linguist, I want to represent all the pronunciation variations of a given word in any language, for future analyses.
602 |As a speech language pathologist or speech therapists, I want TTS functionality to include components of speech and language that include dialectal and individual differences in pronunciation; identify differences in intonation, syntax, and semantics, and; allow for enhanced comprehension, language processing and support phonological awareness.
606 |Technical standards for software development assist organizations and individuals to provide accessible experiences for users with disabilities. The final user scenarios in this document are considered from the perspective of those who design and develop software.
613 | Probably shouldn't use "final" here, as we may re-order.
614 |As a Product Owner for a web content management system (CMS), I want the next software product release to have the capability of pronouncing speech "just like Alexa can".
617 |As a client-side user interface developer, I need a way to render text content, so it is spoken accurately with assistive technologies.
621 |