├── .github └── workflows │ └── pr-push.yml ├── .gitignore ├── .pr-preview.json ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── index-zh-cn.bs ├── index.bs ├── text.bs └── w3c.json /.github/workflows/pr-push.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | on: 3 | pull_request: {} 4 | push: 5 | branches: [main] 6 | jobs: 7 | main: 8 | name: Build, Validate and Deploy 9 | runs-on: ubuntu-latest 10 | steps: 11 | - uses: actions/checkout@v2 12 | - name: index.bs 13 | uses: w3c/spec-prod@v2 14 | with: 15 | SOURCE: index.bs 16 | DESTINATION: index.html 17 | TOOLCHAIN: bikeshed 18 | BUILD_FAIL_ON: fatal 19 | GH_PAGES_BRANCH: gh-pages 20 | - name: text.bs 21 | uses: w3c/spec-prod@v2 22 | with: 23 | SOURCE: text.bs 24 | DESTINATION: text.html 25 | TOOLCHAIN: bikeshed 26 | BUILD_FAIL_ON: fatal 27 | GH_PAGES_BRANCH: gh-pages 28 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Logs 2 | logs 3 | *.log 4 | npm-debug.log* 5 | 6 | # Runtime data 7 | pids 8 | *.pid 9 | *.seed 10 | 11 | # Directory for instrumented libs generated by jscoverage/JSCover 12 | lib-cov 13 | 14 | # Coverage directory used by tools like istanbul 15 | coverage 16 | 17 | # nyc test coverage 18 | .nyc_output 19 | 20 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 21 | .grunt 22 | 23 | # node-waf configuration 24 | .lock-wscript 25 | 26 | # Compiled binary addons (http://nodejs.org/api/addons.html) 27 | build/Release 28 | 29 | # Dependency directories 30 | node_modules 31 | jspm_packages 32 | 33 | # Optional npm cache directory 34 | .npm 35 | 36 | # Optional REPL history 37 | .node_repl_history 38 | -------------------------------------------------------------------------------- /.pr-preview.json: -------------------------------------------------------------------------------- 1 | { 2 | "src_file": "index.bs", 3 | "type": "bikeshed", 4 | "params": { 5 | "force": 1 6 | } 7 | } 8 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | All documentation, code and communication under this repository are covered by the [W3C Code of Ethics and Professional Conduct](https://www.w3.org/Consortium/cepc/). 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Web Platform Incubator Community Group 2 | 3 | This repository is being used for work in the Web Platform Incubator Community Group, governed by the [W3C Community License 4 | Agreement (CLA)](http://www.w3.org/community/about/agreements/cla/). To contribute, you must join 5 | the CG. 6 | 7 | If you are not the sole contributor to a contribution (pull request), please identify all 8 | contributors in the pull request's body or in subsequent comments. 
9 | 10 | To add a contributor (other than yourself, that's automatic), mark them one per line as follows: 11 | 12 | ``` 13 | +@github_username 14 | ``` 15 | 16 | If you added a contributor by mistake, you can remove them in a comment with: 17 | 18 | ``` 19 | -@github_username 20 | ``` 21 | 22 | If you are making a pull request on behalf of someone else but you had no part in designing the 23 | feature, you can remove yourself with the above syntax. 24 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | All Reports in this Repository are licensed by Contributors under the 2 | [W3C Software and Document 3 | License](http://www.w3.org/Consortium/Legal/2015/copyright-software-and-document). Contributions to 4 | Specifications are made under the [W3C CLA](https://www.w3.org/community/about/agreements/cla/). 5 | 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Shape Detection API Specification _:stars:_:movie_camera: 3 | 4 | This is the repository for `shape-detection-api`, an experimental API for detecting Shapes (e.g. Faces, Barcodes, Text) in live or still images on the Web by **using accelerated hardware/OS resources**. 5 | 6 | You're welcome to contribute! Let's make the Web rock our socks off! 7 | 8 | ## [Introduction](https://wicg.github.io/shape-detection-api/#introduction) :blue_book: 9 | 10 | Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, text or QR codes. Detecting these features is computationally expensive, but would lead to interesting use cases e.g. face tagging or detection of high saliency areas. Users interacting with WebCams or other Video Capture Devices have become accustomed to camera-like features such as the ability to focus directly on human faces on the screen of their devices. This is particularly true in the case of mobile devices, where hardware manufacturers have long been supporting these features. Unfortunately, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary. 11 | 12 | ## Use cases :camera: 13 | 14 | QR/barcode/text detection can be used for: 15 | * user identification/registration, e.g. for [voting purposes](https://twitter.com/RegistertoVote/status/733123511128981508); 16 | * eCommerce, e.g. [Walmart Pay](https://www.slashgear.com/awalmart-announces-walmart-pay-for-qr-code-based-mobile-payments-10417912/); 17 | * Augmented Reality overlay, e.g. [here](http://www.multidots.com/augmented-reality/); 18 | * Driving online-to-offline engagement, fighting fakes [etc](https://www.clickz.com/why-have-qr-codes-taken-off-in-china/23662/). 19 | 20 | Face detection can be used for: 21 | * producing fun effects, e.g. [Snapchat Lenses](https://support.snapchat.com/en-US/a/lenses1); 22 | * giving hints to encoders or auto focus routines; 23 | * user name tagging; 24 | * enhance accesibility by e.g. making objects appear larger as the user gets closer like [HeadTrackr](https://www.auduno.com/headtrackr/examples/targets.html); 25 | * speeding up Face Recognition by indicating the areas of the image where faces are present. 26 | 27 | 28 | ## Current Related Efforts and Workarounds :wrench: 29 | 30 | Some Web Apps -gasp- run Detection in Javascript. 
A performance comparison of some such libraries can be found [here](https://github.com/mtschirs/js-objectdetect#performance) (note that this performance evaluation does not include e.g. WebCam image acquisition and/or canvas interactions). 31 | 32 | Samsung Browser [has a private API](https://developer.samsung.com/internet) (click to unfold "Overview for Android", then search for "QR code reader"). 33 | 34 | **TODO**: compare a few JS/native libraries in terms of size and performance. A performance and detection comparison of some popular JS QR code scanners can be found [here](https://github.com/danimoh/qr-scanner-benchmark). `zxingjs2` has [a list of some additional JS libraries](https://github.com/ghybs/zxingjs2#other-barcode-image-processing-libraries-related-to-javascript). 35 | 36 | Android Native Apps usually integrate [ZXing](https://github.com/zxing/zxing) (which amounts to adding ~560KB when counting [core.jar](http://repo1.maven.org/maven2/com/google/zxing/core/3.3.0/), [android-core.jar](http://repo1.maven.org/maven2/com/google/zxing/android-core/3.3.0/) and [android-integration.jar](http://repo1.maven.org/maven2/com/google/zxing/android-integration/3.3.0/)). 37 | 38 | OCR readers in JavaScript are north of 1MB in size. 39 | 40 | ## Potential for misuse :money_with_wings: 41 | 42 | Face Detection is an expensive operation due to the algorithmic complexity. Many requests, or demanding systems like a live stream feed with a certain frame rate, could slow down the whole system or greatly increase power consumption. 43 | 44 | ## Platform specific implementation notes :computer: 45 | 46 | ## Overview 47 | 48 | What platforms support what detector? 49 | 50 | Detector | Mac| Android | Win10 | Linux | ChromeOS | 51 | --------- |:--:| :------:| :---: | :------:| :------: | 52 | Face | sw | hw/sw | sw | ✘| ✘ | 53 | QR/Barcode| sw | sw |✘| ✘| ✘ | 54 | Text | sw | sw | sw | ✘| ✘ | 55 | 56 | 57 | ### Android 58 | 59 | Android provides both a stand-alone software face detector and an interface to the hardware ones. 60 | 61 | | API | uses... | Release notes | 62 | | ------------- |:-------------:| -----:| 63 | | [FaceDetector](https://developer.android.com/reference/android/media/FaceDetector.html)| Software based using the [Neven face detector](https://android.googlesource.com/platform/external/neven)| API Level 1, 2008| 64 | | [Vision.Face](https://developers.google.com/android/reference/com/google/android/gms/vision/face/Face)| Software based | Google Play services 7.2, Aug 2015| 65 | | [Camera2](https://developer.android.com/reference/android/hardware/camera2/CaptureRequest.html#STATISTICS_FACE_DETECT_MODE)| Hardware | API Level 21/Lollipop, 2014 | 66 | | [Camera.Face](https://developer.android.com/reference/android/hardware/Camera.Face.html) (old)| Hardware | API Level 14/Ice Cream Sandwich, 2011 | 67 | 68 | The availability of hardware detection depends on the actual chip; according to the market share in [1H 2016](http://www.antutu.com/en/view.shtml?id=8256) Qualcomm, MediaTek, Samsung and HiSilicon are the largest individual chipset vendors and they all have support for Face Detection (all the top-10 phones are covered as well): 69 | * [Qualcomm Snapdragon](https://developer.qualcomm.com/software/snapdragon-sdk-android/facial-recognition) chipset family has supported it since ~2013 as part of their ISP. 70 | * MediaTek as part of [CorePilot 2.0](http://cdn-cw.mediatek.com/White%20Papers/MediaTek_CorePilot%202.0_Final.pdf) (introduced in 2015).
71 | * [Samsung Exynos](http://www.samsung.com/semiconductor/minisite/Exynos/data/Benefits_of_Exynos_5420_ISP_for_Enhanced_Imaging_Experience.pdf) (at least 2013). 72 | * Huawei HiSilicon [Kirin950](http://www.androidauthority.com/huawei-hisilicon-kirin-950-official-653811) since 2015 (this fabless manufacturer is relatively new). 73 | * It is worth noting that ARM [acquired Apical in 2016](https://www.arm.com/products/graphics-and-multimedia/computer-vision) for its computer vision expertise. 74 | 75 | Barcode/QR and Text detection is available via Google Play Services [barcode](https://developers.google.com/android/reference/com/google/android/gms/vision/barcode/package-summary) and [text](https://developers.google.com/android/reference/com/google/android/gms/vision/text/package-summary), respectively. 76 | 77 | ### Mac OS X / iOS 78 | 79 | Mac OS X/iOS provides `CIDetector` and `Vision Framework` for Face, QR, Text and Rectangle detection in software or hardware. 80 | 81 | | API | uses... | Release notes | 82 | | ------------- |:-------------: | -----:| 83 | | [Vision Framework, Mac OS X](https://developer.apple.com/documentation/vision)| Software and Hardware | OS X v10.13, 2017 | 84 | | [Vision Framework, iOS](https://developer.apple.com/documentation/vision)| Software and Hardware | IOS X v11.0, 2017 | 85 | | [CIDetector, Mac OS X](https://developer.apple.com/library/mac/documentation/CoreImage/Reference/CIDetector_Ref/)| Software | OS X v10.7, 2011 | 86 | | [CIDetector, iOS](https://developer.apple.com/library/ios/documentation/CoreImage/Reference/CIDetector_Ref/) | Software | iOS v5.0, 2011 | 87 | | [AVFoundation](https://developer.apple.com/reference/avfoundation/avcapturemetadataoutput?language=objc)| Hardware | iOS 6.0, 2012 | 88 | 89 | Apple has supported Face Detection in hardware since the [Apple A5 processor](https://en.wikipedia.org/wiki/Apple_A5) introduced in 2011. 90 | 91 | ### Windows 92 | 93 | Windows 10 has a [FaceDetector](https://msdn.microsoft.com/library/windows/apps/dn974129) class and support for Text Detection [OCR](https://msdn.microsoft.com/en-us/library/windows/apps/windows.media.ocr.aspx). 94 | 95 | ## Rendered URL :bookmark_tabs: 96 | 97 | The rendered version of this site can be found in https://wicg.github.io/shape-detection-api (if that's not alive for some reason try the [rawgit rendering](https://rawgit.com/WICG/shape-detection-api/gh-pages/index.html)). 98 | 99 | ## Examples and demos 100 | 101 | https://wicg.github.io/shape-detection-api/#examples 102 | 103 | ## Notes on bikeshedding :bicyclist: 104 | 105 | To compile, run: 106 | 107 | ``` 108 | curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F force=1 > index.html 109 | ``` 110 | 111 | if the produced file has a strange size (i.e. zero), then something went terribly wrong; run instead 112 | 113 | ``` 114 | curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F output=err 115 | ``` 116 | and try to figure out why `bikeshed` did not like the `.bs` :'( 117 | -------------------------------------------------------------------------------- /index-zh-cn.bs: -------------------------------------------------------------------------------- 1 |
  2 | Title: 加速的图形识别
  3 | Repository: wicg/shape-detection-api
  4 | Status: w3c/CG-DRAFT
  5 | ED: https://wicg.github.io/shape-detection-api
  6 | Shortname: shape-detection-api
  7 | Level: 1
  8 | Editor: Miguel Casas-Sanchez, w3cid 82825, Google Inc., mcasas@google.com
  9 | Abstract: 本文档描述了一套Chrome中针对静态和/或动态图像的图形识别(如:人脸识别)API。
 10 | Group: wicg
 11 | !Participate: Join the W3C Community Group
 12 | !Participate: Fix the text through GitHub
 13 | 
14 | 15 | 32 | 33 | # 简介 # {#introduction} 34 | 35 | 照片和图像是互联网构成中最大的部分,其中相当一部分包含了可识别的特征,比如人脸,二维码或者文本。可想而之,识别这些特征的计算开销非常大,但有些很有趣场景,比如在照片中自动标记人脸,或者根据图像中的URL进行重定向。硬件厂商从很久以前就已经开始支持这些特性,但Web应用迟迟未能很好地利用上这些硬件特性,必须借助一些难用的程序库才能达到目的。 36 | 37 | ## 图形识别的场景 ## {#image-sources-for-detection} 38 | 39 | 请参考代码库中自述/解释 的文档。 40 | 41 | # 图形识别API # {#api} 42 | 43 | 某些特定的浏览器可能会提供识别器来标示当前硬件是否提供加速功能。 44 | 45 | ## 用于识别的图像源 ## {#image-sources-for-detection} 46 | 47 |

48 | 本节的灵感来自 [[canvas2dcontext#image-sources-for-2d-rendering-contexts]]。 49 |

50 | 51 | {{ImageBitmapSource}} 允许多种图形接口的实现对象作为图像源,进行识别处理。 52 | 53 | 54 | * 当{{ImageBitmapSource}}对象代表{{HTMLImageElement}}的时候,该元素的图像必须用作源图像。而在特定情况下,当{{ImageBitmapSource}}对象代表{{HTMLImageElement}}中的动画图像的时候,用户代理程序(User Agent)必须显示这个动画图像的默认图像(该默认图像指的是,在动画图像被禁用或不支持动画的环境下,需要展现的图像),或者没有默认图像的话,就显示该动画图像的第一帧。 55 | 56 | * 当{{ImageBitmapSource}}对象代表{{HTMLVideoElement}}的时候,该视频播放的当前帧必须用作源图像,同时,该源图像的尺寸必须是视频源的固有维数(intrinsic dimensions),换句话说,就是视频源经过任意比例的调整后的大小。 57 | 58 | 59 | * 当{{ImageBitmapSource}}对象代表{{HTMLCanvasElement}}的时候,该元素的位图必须用作源图像。 60 | 61 | 当用户代理程序(User Agent)被要求用某种既有的{{ImageBitmapSource}}作为识别器的detect()方法的输入参数的时候,必须执行以下步骤: 62 | 63 | * 如果{{ImageBitmapSource}}所含的有效脚本源([[HTML#concept-origin]])和当前文档的有效脚本源不同,就拒绝对应的Promise对象,并附上一个名为{{SecurityError}}的新建{{DOMException}}对象。 64 | 65 | * 如果一个{{ImageBitmapSource}}是一个处于|broken|状态的{{HTMLImageElement}}对象的话,就拒绝对应的Promise对象,并附上一个名为{{InvalidStateError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。 66 | 67 | 68 | * 如果{{ImageBitmapSource}}是一个不能完整解码的{{HTMLImageElement}}对象的话,就拒绝对应的Promise对象,并附上一个名为{{InvalidStateError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。 69 | 70 | * 如果一个{{ImageBitmapSource}}是一个{{HTMLVideoElement}}对象,且其|readyState|属性为|HAVE_NOTHING| 或 |HAVE_METADATA|的话,就拒绝对应的Promise对象,并附上一个名为{{InvalidStateError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。 71 | 72 | 73 | * 如果一个{{ImageBitmapSource}}是一个{{HTMLCanvasElement}}对象,且其位图的|origin-clean| ([[HTML#concept-canvas-origin-clean]])标识为false的话,就拒绝对应的Promise对象,并附上一个名为{{SecurityError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。 74 | 75 | 76 | 请注意,如果一个{{ImageBitmapSource}}的水平尺寸或垂直尺寸等于0,那么对应的Promise对象就会被简单地当作一个空的已检测对象序列来处理。 77 | 78 | 79 | ## 人脸识别API ## {#face-detection-api} 80 | 81 | {{FaceDetector}}代表一个针对图像中的人脸进行识别的底层加速平台组件。创建时可以选择一个{{FaceDetectorOptions}}的Dictionary对象作为入参。它提供了一个单独的 {{FaceDetector/detect()}}方法操作{{ImageBitmapSource}}对象,并返回Promise对象。如果检测到[[#image-sources-for-detection]]中提及的用例,则该方法必须拒绝该Promise对象;否则,它可能会向{{DetectedFace}}序列推入一个新任务,这样会消耗操作系统或平台资源去依序处理该Promise,每个任务由一个{{DetectedFace/boundingBox}}所包含并界定。 82 | 83 |
 84 | dictionary FaceDetectorOptions {
 85 |   unsigned short maxDetectedFaces;
 86 |   boolean fastMode;
 87 | };
 88 | 
89 | 90 |
91 |
maxDetectedFaces
92 |
当前场景中已识别的人脸数的最大值。
93 |
fastMode
94 |
提示User Agent(UA)尝试以速度优先(于精确度)的模式,通过更小的比例尺(更靠近目标图形)或寻找更大的目标图形的办法进行识别。
95 |
96 | 97 |
 98 | [Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)]
 99 | interface FaceDetector {
100 |   Promise<sequence<DetectedFace>> detect(ImageBitmapSource image);
101 | };
102 | 
103 | 104 |
105 |
FaceDetector(optional FaceDetectorOptions faceDetectorOptions)
106 |
构建一个新的{{FaceDetector}}对象,附带可选项|faceDetectorOptions|。
107 |
detect()
108 |
尝试在{{ImageBitmapSource}} |图像|中识别人脸,如果识别到人脸,则返回一个{{DetectedFace}}序列。
109 |
110 | 111 |
112 | interface DetectedFace {
113 |   [SameObject] readonly attribute DOMRectReadOnly boundingBox;
114 | };
115 | 
116 | 117 |
118 |
boundingBox
119 |
与图像坐标轴对齐的一个矩形,该矩形标示了一个已识别特征的位置和范围。
120 |
121 | 122 |
123 | 人脸识别的实现案例有:Android FaceDetector, Apple's CIFaceFeature 或者 Windows 10 FaceDetector。 124 |
125 | 126 |
127 | Consider adding attributes such as, e.g.: 128 |
129 |     [SameObject] readonly attribute unsigned long id;
130 |     [SameObject] readonly attribute FrozenArray<Landmark>? landmarks;
131 |   
132 | to {{DetectedFace}}. 133 |
134 | 135 | ## 条形码识别API ## {#barcode-detection-api} 136 | 137 | {{BarcodeDetector}}代表一个针对图像中的二维码或条形码进行识别的底层加速平台组件。它提供了一个单独的{{BarcodeDetector/detect()}}方法操作{{ImageBitmapSource}}对象,并返回Promise对象。如果检测到[[#image-sources-for-detection]]中提及的情况,则该方法必须拒绝该Promise对象;否则,它可能会向{{DetectedBarcode}}序列推入一个新任务,这样会消耗操作系统或平台资源去依序处理该Promise。基本上每个任务包含{{DetectedBarcode/boundingBox}}和一系列{{Point2D}},甚至可能还有个解码后的{{DOMString}}对象{{DetectedBarcode/rawValue}},由它们来确定边界。 138 | 139 | 140 |
141 | [Exposed=(Window,Worker), Constructor()]
142 | interface BarcodeDetector {
143 |   Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image);
144 | };
145 | 
146 | 147 |
148 |
detect(ImageBitmapSource image)
149 |
尝试在{{ImageBitmapSource}}图像中识别条形码。
150 |
151 | 152 |
153 | interface DetectedBarcode {
154 |   [SameObject] readonly attribute DOMRectReadOnly boundingBox;
155 |   [SameObject] readonly attribute DOMString rawValue;
156 |   [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
157 | };
158 | 
159 | 160 |
161 |
boundingBox
162 |
与图像坐标轴对齐的一个矩形,该矩形标示了一个已识别特征的位置和范围。
163 | 164 |
rawValue
165 |
从条形码解码得到的DOMString对象,该值可能为多行。
166 | 167 |
cornerPoints
168 |
一串已识别条形码的顶点序列(sequence),顺序从左上角开始,以顺时针方向排列。因为现实中透视形变的原因,该序列不一定表示的是正方形。
169 |
170 | 171 |
172 | 实现了条形码/二维码识别的示例有:Google Play Services 或者 Apple's CICRCodeFeature. 173 |
174 | 175 | ## 文本识别API ## {#text-detection-api} 176 | 177 | TextDetector代表一个针对图像中的文本进行识别的底层加速平台组件。它提供了一个单独的{{TextDetector/detect()}}方法操作{{ImageBitmapSource}}对象,并返回Promise对象。如果检测到[[#image-sources-for-detection]]中提及的情况,则该方法必须拒绝该Promise对象;否则,它可能会向{{DetectedText}}序列推入一个新任务,这样会消耗操作系统或平台资源去依序处理该Promise,基本上每个task包含一个{{DetectedText/rawValue}},并由一个{{DetectedText/boundingBox}}来确定边界。 178 | 179 | 180 | 181 |
182 | [
183 |     Constructor,
184 |     Exposed=(Window,Worker),
185 | ] interface TextDetector {
186 |     Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
187 | };
188 | 
189 | 190 |
191 |
detect(ImageBitmapSource image)
192 |
尝试在{{ImageBitmapSource}} |图像|中识别文本块。.
193 |
194 | 195 |
196 | [
197 |     Constructor,
198 | ] interface DetectedText {
199 |     [SameObject] readonly attribute DOMRect boundingBox;
200 |     [SameObject] readonly attribute DOMString rawValue;
201 | };
202 | 
203 | 204 |
205 |
boundingBox
206 |
与图像坐标轴对齐的一个矩形,该矩形标示了一个已识别特征的位置和范围。
207 | 208 |
rawValue
209 |
从图像中识别到的原始字符串。
210 |
211 | 212 |
213 | 实现了文本识别的示例有:Google Play Services, Apple's CIDetector 或者 Windows 10 OCR API. 214 |
215 | 216 | # 示例 # {#examples} 217 | 218 |

219 | 以下示例的微调或扩展版本,以及更多示例请参考这个codepen集合。 220 |

221 | 222 | ## 图形识别器的平台支持 ## {#platform-support-for-a-given-detector} 223 | 224 |
225 | 以下的示例同样可以在这个codepen中找到微调的版本。 226 |
227 | 228 |
229 | 230 |
231 | if (window.FaceDetector == undefined) {
232 |   console.error('Face Detection not supported on this platform');
233 | }
234 | if (window.BarcodeDetector == undefined) {
235 |   console.error('Barcode Detection not supported on this platform');
236 | }
237 | if (window.TextDetector == undefined) {
238 |   console.error('Text Detection not supported on this platform');
239 | }
240 | 
241 |
242 | 243 | ## 人脸识别 ## {#face-detection} 244 | 245 |
246 | 以下的示例同样可以在这个codepen(或者这个有边界框覆盖的图像示例中)找到。 247 |
248 | 249 |
250 |
251 | let faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 1});
252 | // Assuming |theImage| is e.g. a <img> content, or a Blob.
253 | 
254 | faceDetector.detect(theImage)
255 | .then(detectedFaces => {
256 |   for (const face of detectedFaces) {
257 |     console.log(` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` +
258 |         ` size ${face.boundingBox.width}x${face.boundingBox.height}`);
259 |   }
260 | }).catch(() => {
261 |   console.error("Face Detection failed, boo.");
262 | })
263 | 
264 |
265 | 266 | ## 条形码识别 ## {#barcode-detection} 267 | 268 |
269 | 以下的示例同样可以在这个这个codepen(或者这个覆盖了边界框的图像示例中)找到。 270 |
271 | 272 |
273 |
274 | let barcodeDetector = new BarcodeDetector();
275 | // Assuming |theImage| is e.g. a <img> content, or a Blob.
276 | 
277 | barcodeDetector.detect(theImage)
278 | .then(detectedCodes => {
279 |   for (const barcode of detectedCodes) {
280 |     console.log(` Barcode ${barcode.rawValue}` +
281 |         ` @ (${barcode.boundingBox.x}, ${barcode.boundingBox.y}) with size` +
282 |         ` ${barcode.boundingBox.width}x${barcode.boundingBox.height}`);
283 |   }
284 | }).catch(() => {
285 |   console.error("Barcode Detection failed, boo.");
286 | })
287 | 
288 |
289 | 290 | ## 文本识别 ## {#text-detection} 291 | 292 |
293 | 以下的示例同样可以在这个codepen (或者这个集成了视频捕捉功能的示例)找到。 294 |
295 | 296 |
297 |
298 | let textDetector = new TextDetector();
299 | // Assuming |theImage| is e.g. a <img> content, or a Blob.
300 | 
301 | textDetector.detect(theImage)
302 | .then(detectedTextBlocks => {
303 |   for (const textBlock of detectedTextBlocks) {
304 |     console.log(
305 |         `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
306 |         `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`);
307 |   }
308 | }).catch(() => {
309 |   console.error("Text Detection failed, boo.");
310 | })
311 | 
312 |
313 | 314 | 315 |
316 | spec: ECMAScript; urlPrefix: https://tc39.github.io/ecma262/#
317 |     type: interface
318 |         text: Array; url: sec-array-objects
319 |         text: Promise; url:sec-promise-objects
320 |         text: TypeError; url: sec-native-error-types-used-in-this-standard-typeerror
321 | 
322 | 323 |
324 | type: interface; text: Point2D; url: https://w3c.github.io/mediacapture-image/#Point2D;
325 | 
326 | 327 |
328 | type: interface; text: DOMString; url: https://heycam.github.io/webidl/#idl-DOMString; spec: webidl
329 | 
330 | 331 | 338 | 339 |
340 | {
341 |   "wikipedia": {
342 |       "href": "https://en.wikipedia.org/wiki/Object-class_detection",
343 |       "title": "Object-class Detection Wikipedia Entry",
344 |       "publisher": "Wikipedia",
345 |       "date": "14 September 2016"
346 |   },
347 |   "canvas2dcontext": {
348 |       "authors": [ "Rik Cabanier", "Jatinder Mann", "Jay Munro", "Tom Wiltzius",
349 |                    "Ian Hickson"],
350 |       "href": "https://www.w3.org/TR/2dcontext/",
351 |       "title": "HTML Canvas 2D Context",
352 |       "status": "REC"
353 |   }
354 | }
355 | 
356 | 357 | -------------------------------------------------------------------------------- /index.bs: -------------------------------------------------------------------------------- 1 |
  2 | Title: Accelerated Shape Detection in Images
  3 | Repository: wicg/shape-detection-api
  4 | Status: CG-DRAFT
  5 | ED: https://wicg.github.io/shape-detection-api
  6 | Shortname: shape-detection-api
  7 | Level: 1
  8 | Editor: Miguel Casas-Sanchez 82825, Google LLC https://www.google.com, mcasas@google.com
  9 | Editor: Reilly Grant 83788, Google LLC https://www.google.com, reillyg@google.com
 10 | Abstract: This document describes an API providing access to accelerated shape detectors (e.g. human faces) for still images and/or live image feeds.
 11 | Translation: zh-CN https://wicg.github.io/shape-detection-api/index-zh-cn.html
 12 | Group: wicg
 13 | Markup Shorthands: markdown yes
 14 | !Participate: Join the W3C Community Group
 15 | !Participate: Fix the text through GitHub
 16 | 
17 | 18 | 35 | 36 | # Introduction # {#introduction} 37 | 38 | Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces or barcodes/QR codes. Detecting these features is computationally expensive, but would lead to interesting use cases, e.g. face tagging or web URL redirection. While hardware manufacturers have been supporting these features for a long time, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary. 39 | 40 |

41 | Text Detection, despite being an interesting field, is not considered stable enough across either computing platforms or character sets to be standardized in the context of this document. For reference, a sister informative specification is kept in [[TEXT-DETECTION-API]]. 42 |

43 | 44 | ## Shape detection use cases ## {#use-cases} 45 | 46 | Please see the Readme/Explainer in the repository. 47 | 48 | # Shape Detection API # {#api} 49 | 50 | Individual browsers MAY provide Detectors indicating the availability of hardware providing accelerated operation. 51 | 52 | Detecting features in an image occurs asynchronously, potentially communicating with acceleration hardware independent of the browser. Completion events use the shape detection task source. 53 | 54 | ## Image sources for detection ## {#image-sources-for-detection} 55 | 56 |

57 | This section is inspired by [[2dcontext#image-sources-for-2d-rendering-contexts]]. 58 |

59 | 60 | {{ImageBitmapSource}} allows objects implementing any of a number of interfaces to be used as image sources for the detection process. 61 | 62 | * When an {{ImageBitmapSource}} object represents an {{HTMLImageElement}}, the element's image must be used as the source image. Specifically, when an {{ImageBitmapSource}} object represents an animated image in an {{HTMLImageElement}}, the user agent must use the default image of the animation (the one that the format defines is to be used when animation is not supported or is disabled), or, if there is no such image, the first frame of the animation. 63 | 64 | * When an {{ImageBitmapSource}} object represents an {{HTMLVideoElement}}, then the frame at the current playback position when the method with the argument is invoked must be used as the source image when processing the image, and the source image's dimensions must be the intrinsic dimensions of the media resource (i.e. after any aspect-ratio correction has been applied). 65 | 66 | * When an {{ImageBitmapSource}} object represents an {{HTMLCanvasElement}}, the element's bitmap must be used as the source image. 67 | 68 | When the UA is required to use a given type of {{ImageBitmapSource}} as input argument for the `detect()` method of whichever detector, it MUST run these steps: 69 | 70 | * If any {{ImageBitmapSource}} have an effective script origin ([=origin=]) which is not the same as the Document's effective script origin, then reject the Promise with a new {{DOMException}} whose name is {{SecurityError}}. 71 | 72 | * If the {{ImageBitmapSource}} is an {{HTMLImageElement}} object that is in the `Broken` (HTML Standard §img-error) state, then reject the Promise with a new {{DOMException}} whose name is {{InvalidStateError}}, and abort any further steps. 73 | 74 | * If the {{ImageBitmapSource}} is an {{HTMLImageElement}} object that is not fully decodable then reject the Promise with a new {{DOMException}} whose name is {{InvalidStateError}}, and abort any further steps 75 | 76 | * If the {{ImageBitmapSource}} is an {{HTMLVideoElement}} object whose {{HTMLMediaElement/readyState}} attribute is either {{HAVE_NOTHING}} or {{HAVE_METADATA}} then reject the Promise with a new {{DOMException}} whose name is {{InvalidStateError}}, and abort any further steps. 77 | 78 | * If the {{ImageBitmapSource}} argument is an {{HTMLCanvasElement}} whose bitmap's `origin-clean` (HTML Standard §concept-canvas-origin-clean) flag is false, then reject the Promise with a new {{DOMException}} whose name is {{SecurityError}}, and abort any further steps. 79 | 80 | Note that if the {{ImageBitmapSource}} is an object with either a horizontal dimension or a vertical dimension equal to zero, then the Promise will be simply resolved with an empty sequence of detected objects. 81 | 82 | ## Face Detection API ## {#face-detection-api} 83 | 84 | {{FaceDetector}} represents an underlying accelerated platform's component for detection of human faces in images. It can be created with an optional Dictionary of {{FaceDetectorOptions}}. It provides a single {{FaceDetector/detect()}} operation on an {{ImageBitmapSource}} which result is a Promise. This method MUST reject this promise in the cases detailed in [[#image-sources-for-detection]]; otherwise it MAY queue a task that utilizes the OS/Platform resources to resolve the Promise with a Sequence of {{DetectedFace}}s, each one essentially consisting on and delimited by a {{DetectedFace/boundingBox}}. 85 | 86 |
87 | Example implementations of face detection are e.g. Android FaceDetector (or the Google Play Services vision library), Apple's CIFaceFeature / VNDetectFaceLandmarksRequest or Windows 10 FaceDetector. 88 |
89 | 90 | 91 | [Exposed=(Window,Worker), 92 | SecureContext] 93 | interface FaceDetector { 94 | constructor(optional FaceDetectorOptions faceDetectorOptions = {}); 95 | Promise<sequence<DetectedFace>> detect(ImageBitmapSource image); 96 | }; 97 | 98 | 99 |
100 |
FaceDetector(optional FaceDetectorOptions |faceDetectorOptions|)
101 |
Constructs a new {{FaceDetector}} with the optional |faceDetectorOptions|. 102 |
103 | Detectors may potentially allocate and hold significant resources. Where possible, reuse the same {{FaceDetector}} for several detections. 104 |
105 |
106 |
detect(ImageBitmapSource |image|)
107 |
Tries to detect human faces in the {{ImageBitmapSource}} |image|. The detected faces, if any, are returned as a sequence of {{DetectedFace}}s.
108 |
109 | 110 | ### {{FaceDetectorOptions}} ### {#facedetectoroptions-section} 111 | 112 | 113 | dictionary FaceDetectorOptions { 114 | unsigned short maxDetectedFaces; 115 | boolean fastMode; 116 | }; 117 | 118 | 119 |
120 |
`maxDetectedFaces`
121 |
Hint to the UA to try and limit the number of detected faces on the scene to this maximum.
122 |
`fastMode`
123 |
Hint to the UA to try and prioritise speed over accuracy by e.g. operating on a reduced scale or looking for large features.
124 |
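A minimal, non-normative sketch of how these hints combine with reusing a single detector across frames (the |video| element and the use of requestAnimationFrame() are assumptions for illustration):

```js
// Create one detector up front and reuse it for every frame, since detectors
// may hold significant platform resources; fastMode trades accuracy for speed
// and maxDetectedFaces caps how many results the UA needs to report.
const faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 4});

async function scanFrame(video) {
  // An HTMLVideoElement is a valid ImageBitmapSource: the current frame is used.
  const faces = await faceDetector.detect(video);
  console.log(`${faces.length} face(s) in this frame`);
  requestAnimationFrame(() => scanFrame(video));
}
```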
125 | 126 | ### {{DetectedFace}} ### {#detectedface-section} 127 | 128 | 129 | dictionary DetectedFace { 130 | required DOMRectReadOnly boundingBox; 131 | required sequence<Landmark>? landmarks; 132 | }; 133 | 134 | 135 |
136 |
`boundingBox`
137 |
A rectangle indicating the position and extent of a detected feature aligned to the image axes.
138 |
`landmarks`
139 |
A series of features of interest related to the detected feature.
140 |
141 | 142 | 143 | dictionary Landmark { 144 | required sequence<Point2D> locations; 145 | LandmarkType type; 146 | }; 147 | 148 | 149 |
150 |
`locations`
151 |
A point in the center of the detected landmark, or a sequence of points defining the vertices of a simple polygon surrounding the landmark in either a clockwise or counter-clockwise direction.
152 |
`type`
153 |
Type of the landmark, if known.
154 |
155 | 156 | 157 | enum LandmarkType { 158 | "mouth", 159 | "eye", 160 | "nose" 161 | }; 162 | 163 | 164 |
165 |
`mouth`
166 |
The landmark is identified as a human mouth.
167 |
`eye`
168 |
The landmark is identified as a human eye.
169 |
`nose`
170 |
The landmark is identified as a human nose.
171 |
172 | 173 |
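A short, non-normative sketch of consuming the landmark data described above (|image| stands for any valid {{ImageBitmapSource}}):

```js
async function logLandmarks(image) {
  const faces = await new FaceDetector().detect(image);
  for (const face of faces) {
    // landmarks is nullable; each entry is a centre point or a polygon of vertices.
    for (const landmark of face.landmarks ?? []) {
      const [first] = landmark.locations;
      console.log(`${landmark.type ?? 'unknown'} near (${first.x}, ${first.y})`);
    }
  }
}
```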
174 | Consider adding attributes such as, e.g.: 175 |
176 |     [SameObject] readonly attribute unsigned long id;
177 |   
178 | to {{DetectedFace}}. 179 |
180 | 181 | ## Barcode Detection API ## {#barcode-detection-api} 182 | 183 | {{BarcodeDetector}} represents an underlying accelerated platform's component for detection of linear or two-dimensional barcodes in images. It provides a single {{BarcodeDetector/detect()}} operation on an {{ImageBitmapSource}} whose result is a Promise. This method MUST reject this Promise in the cases detailed in [[#image-sources-for-detection]]; otherwise it MAY queue a task using the OS/Platform resources to resolve the Promise with a sequence of {{DetectedBarcode}}s, each one essentially consisting of and delimited by a {{DetectedBarcode/boundingBox}} and a series of {{Point2D}}s, and possibly a {{DetectedBarcode/rawValue}} decoded {{DOMString}}. 184 | 185 |
186 | Example implementations of Barcode/QR code detection are e.g. Google Play Services or Apple's CIQRCodeFeature / VNDetectBarcodesRequest. 187 |
188 | 189 | 190 | [Exposed=(Window,Worker), 191 | SecureContext] 192 | interface BarcodeDetector { 193 | constructor(optional BarcodeDetectorOptions barcodeDetectorOptions = {}); 194 | static Promise<sequence<BarcodeFormat>> getSupportedFormats(); 195 | 196 | Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image); 197 | }; 198 | 199 | 200 |
201 |
BarcodeDetector(optional BarcodeDetectorOptions |barcodeDetectorOptions|)
202 |
Constructs a new {{BarcodeDetector}} with |barcodeDetectorOptions|. 203 | * If |barcodeDetectorOptions|.{{BarcodeDetectorOptions/formats}} is present and empty, then throw a new {{TypeError}}. 204 | * If |barcodeDetectorOptions|.{{BarcodeDetectorOptions/formats}} is present and contains {{unknown}}, then throw a new {{TypeError}}. 205 |
206 | Detectors may potentially allocate and hold significant resources. Where possible, reuse the same {{BarcodeDetector}} for several detections. 207 |
208 |
209 | 210 |
`getSupportedFormats()`
211 |
This method, when invoked, MUST return a new {{Promise}} |promise| and run the following steps in parallel: 212 |
    213 |
  1. Let |supportedFormats| be a new {{Array}}.
  2. 214 | If the UA does not support barcode detection, [=queue a global task=] on the [=relevant global object=] of [=this=] using the [=shape detection task source=] to [=resolve=] |promise| with |supportedFormats| and abort these steps.
  3. 215 | Enumerate the {{BarcodeFormat}}s that the UA understands as potentially detectable in images. Add these to |supportedFormats|. 216 |
    217 | The UA cannot give a definitive answer as to whether a given barcode format will always be recognized on an image due to e.g. positioning of the symbols or encoding errors. If a given barcode symbology is not in the |supportedFormats| array, however, it should not be detectable whatsoever. 218 |
    219 |
  4. 220 | [=Queue a global task=] on the [=relevant global object=] of [=this=] using the [=shape detection task source=] to [=resolve=] |promise| with |supportedFormats|. 221 |
222 |
223 | The list of supported {{BarcodeFormat}}s is platform dependent; some examples are the ones supported by Google Play Services and Apple's CIQRCodeFeature. 224 |
225 |
226 | 227 |
detect(ImageBitmapSource |image|)
228 |
Tries to detect barcodes in the {{ImageBitmapSource}} |image|.
229 |
230 | 231 | ### {{BarcodeDetectorOptions}} ### {#barcodedetectoroptions-section} 232 | 233 | 234 | dictionary BarcodeDetectorOptions { 235 | sequence<BarcodeFormat> formats; 236 | }; 237 | 238 | 239 |
240 |
`formats`
241 |
A series of {{BarcodeFormat}}s to search for in the subsequent {{BarcodeDetector/detect()}} calls. If not present then the UA SHOULD search for all supported formats. 242 |
243 | Limiting the search to a particular subset of supported formats is likely to provide better performance. 244 |
245 |
246 |
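A non-normative sketch tying {{BarcodeDetector/getSupportedFormats()}} to the {{BarcodeDetectorOptions/formats}} hint (|image| stands for any valid {{ImageBitmapSource}}):

```js
async function scanQRCodes(image) {
  // Ask the UA which symbologies it can detect at all before narrowing down.
  const supportedFormats = await BarcodeDetector.getSupportedFormats();
  if (!supportedFormats.includes('qr_code')) {
    return [];
  }
  // Restricting |formats| lets the UA skip symbologies we do not care about.
  const barcodeDetector = new BarcodeDetector({formats: ['qr_code']});
  return barcodeDetector.detect(image);
}
```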
247 | 248 | ### {{DetectedBarcode}} ### {#detectedbarcode-section} 249 | 250 | 251 | dictionary DetectedBarcode { 252 | required DOMRectReadOnly boundingBox; 253 | required DOMString rawValue; 254 | required BarcodeFormat format; 255 | required sequence<Point2D> cornerPoints; 256 | }; 257 | 258 | 259 |
260 |
`boundingBox`
261 |
A rectangle indicating the position and extent of a detected feature aligned to the image axes.
262 | 263 |
`rawValue`
264 |
String decoded from the barcode. This value might be multiline.
265 | 266 |
`format`
267 |
Detected {{BarcodeFormat}}.
268 | 269 |
`cornerPoints`
270 |
A sequence of corner points of the detected barcode, in clockwise direction and starting with top-left. This is not necessarily a square due to possible perspective distortions.
271 |
272 | 273 | ### {{BarcodeFormat}} ### {#barcodeformat-section} 274 | 275 | 276 | enum BarcodeFormat { 277 | "aztec", 278 | "code_128", 279 | "code_39", 280 | "code_93", 281 | "codabar", 282 | "data_matrix", 283 | "ean_13", 284 | "ean_8", 285 | "itf", 286 | "pdf417", 287 | "qr_code", 288 | "unknown", 289 | "upc_a", 290 | "upc_e" 291 | }; 292 | 293 | 294 |
295 |
`aztec`
296 |
This entry represents a square two-dimensional matrix following [[iso24778]] and with a square bullseye pattern at their centre, thus resembling an Aztec pyramid. Does not require a surrounding blank zone. 297 |
298 |
`code_128`
299 |
Code 128 is a linear (one-dimensional), bidirectionally-decodable, self-checking barcode following [[iso15417]] and able to encode all 128 characters of ASCII (hence the naming). 300 |
301 |
`code_39`
302 |
Code 39 is a discrete and variable-length barcode type. 303 | It is defined in [[iso16388]]. 304 |
305 |
`code_93`
306 |
Code 93 is a linear, continuous symbology with a variable length following [[bc5]]. It offers a larger information density than Code 128 and the visually similar Code 39. Code 93 is used primarily by Canada Post to encode supplementary delivery information.
307 |
`codabar`
308 |
Codabar is a linear barcode symbology developed in 1972 by Pitney Bowes Corp.
310 | 311 |
`data_matrix`
312 |
Data Matrix is an orientation-independent two-dimensional barcode composed of black and white modules arranged in either a square or rectangular pattern following [[iso16022]].
313 | 314 |
`ean_13`
315 |
EAN-13 is a linear barcode based on the UPC-A standard and defined in [[iso15420]]. It was originally developed by the International Article Numbering Association (EAN) in Europe as a superset of the original 12-digit Universal Product Code (UPC) system developed in the United States (UPC-A codes are represented in EAN-13 with the first character set to 0).
316 |
`ean_8`
317 |
EAN-8 is a linear barcode defined in [[iso15420]] and derived from EAN-13.
318 | 319 |
`itf`
320 |
ITF14 barcode is the GS1 implementation of an Interleaved 2 of 5 bar code to encode a Global Trade Item Number. It is continuous, self-checking, bidirectionally decodable and it will always encode 14 digits. 321 | It was once used in the package delivery industry but has been replaced by Code 128. 322 | [[bc2]] 323 |
324 | 325 |
`pdf417`
326 |
PDF417 refers to a continuous two-dimensional barcode symbology format with multiple rows and columns, bi-directionally decodable and according to the Standard [[iso15438]].
327 | 328 |
`qr_code`
329 |
QR Code is a two-dimensional barcode respecting the Standard [[iso18004]]. The information encoded can be text, URL or other data.
330 | 331 |
`unknown`
332 |
This value is used by the platform to signify that it does not know or specify which barcode format is being detected or supported.
333 | 334 |
`upc_a`
335 |
UPC-A is one of the most common linear barcode types and is widely applied to retail in the United States. Defined in [[iso15420]], it represents digits by strips of bars and spaces, each digit being associated with a unique pattern of 2 bars and 2 spaces, both of variable width. UPC-A can encode 12 digits that are uniquely assigned to each trade item, and it is technically a subset of EAN-13 (UPC-A codes are represented in EAN-13 with the first character set to 0).
336 | 337 |
`upc_e`
338 |
UPC-E Barcode is a variation of UPC-A defined in [[iso15420]], compressing out unnecessary zeros for a more compact barcode.
339 |
340 | 341 | # Security and Privacy Considerations # {#security-and-privacy-considerations} 342 | 343 | This section is non-normative. 344 | 345 | This interface reveals information about the contents of an image source. It is 346 | critical for implementations to ensure that it cannot be used to bypass 347 | protections that would otherwise protect an image source from inspection. 348 | [[#image-sources-for-detection]] describes the algorithm to accomplish this. 349 | 350 | By providing high-performance shape detection capabilities this interface allows 351 | developers to run image analysis tasks on the local device. This offers a 352 | privacy advantage over offloading computation to a remote system. Developers 353 | should consider the results returned by this interface as privacy sensitive as 354 | the original image from which they were derived. 355 | 356 | # Examples # {#examples} 357 | 358 | This section is non-normative. 359 | 360 |

361 | Slightly modified/extended versions of these examples (and more) can be found in 362 | e.g. this codepen collection. 363 |

364 | 365 | ## Platform support for a given detector ## {#example-feature-detection} 366 | 367 |
368 | The following example can also be found in e.g. this codepen 370 | with minimal modifications. 371 |
372 | 373 |
374 | ```js 375 | if (window.FaceDetector == undefined) { 376 | console.error('Face Detection not supported on this platform'); 377 | } 378 | if (window.BarcodeDetector == undefined) { 379 | console.error('Barcode Detection not supported on this platform'); 380 | } 381 | ``` 382 |
383 | 384 | ## Face Detection ## {#example-face-detection} 385 | 386 |
387 | The following example can also be found in e.g. 388 | this codepen (or this one, with landmarks overlay). 389 |
390 | 391 |
392 | ```js 393 | let faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 1}); 394 | // Assuming |theImage| is e.g. a <img> content, or a Blob. 395 | 396 | faceDetector.detect(theImage) 397 | .then(detectedFaces => { 398 |   for (const face of detectedFaces) { 399 |     console.log( 400 |         ` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` + 401 |         ` size ${face.boundingBox.width}x${face.boundingBox.height}`); 402 |   } 403 | }).catch(() => { 404 |   console.error("Face Detection failed, boo."); 405 | }) 406 | ``` 407 |
408 | 409 | ## Barcode Detection ## {#example-barcode-detection} 410 | 411 |
412 | The following example can also be found in e.g. 413 | this codepen. 414 |
415 | 416 |
417 | ```js 418 | let barcodeDetector = new BarcodeDetector(); 419 | // Assuming |theImage| is e.g. a <img> content, or a Blob. 420 | 421 | barcodeDetector.detect(theImage) 422 | .then(detectedCodes => { 423 |   for (const barcode of detectedCodes) { 424 |     console.log(` Barcode ${barcode.rawValue}` + 425 |         ` @ (${barcode.boundingBox.x}, ${barcode.boundingBox.y}) with size` + 426 |         ` ${barcode.boundingBox.width}x${barcode.boundingBox.height}`); 427 |   } 428 | }).catch(() => { 429 |   console.error("Barcode Detection failed, boo."); 430 | }) 431 | ``` 432 |
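Each {{DetectedBarcode}} also exposes its {{DetectedBarcode/format}} and {{DetectedBarcode/cornerPoints}}; a non-normative sketch that outlines every detected barcode on a 2D canvas context |ctx| (an assumed variable) drawn over the same image:

```js
barcodeDetector.detect(theImage)
.then(detectedCodes => {
  for (const barcode of detectedCodes) {
    ctx.beginPath();
    // cornerPoints start at the top-left corner and run clockwise.
    barcode.cornerPoints.forEach(({x, y}, i) => i ? ctx.lineTo(x, y) : ctx.moveTo(x, y));
    ctx.closePath();
    ctx.stroke();
    console.log(`${barcode.format}: ${barcode.rawValue}`);
  }
});
```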
433 | 434 | 443 | 444 |
445 | {
446 |   "iso15417": {
447 |       "href": "https://www.iso.org/standard/43896.html",
448 |       "title": "Information technology -- Automatic identification and data capture techniques -- Code 128 bar code symbology specification",
449 |       "publisher": "ISO/IEC",
450 |       "date": "June 2007"
451 |   },
452 |   "iso15420": {
453 |       "href": "https://www.iso.org/standard/46143.html",
454 |       "title": "Information technology -- Automatic identification and data capture techniques -- EAN/UPC bar code symbology specification",
455 |       "publisher": "ISO/IEC",
456 |       "date": "December 2009"
457 |   },
458 |   "iso15438": {
459 |       "href": "https://www.iso.org/standard/65502.html",
460 |       "title": "Information technology -- Automatic identification and data capture techniques -- PDF417 bar code symbology specification",
461 |       "publisher": "ISO/IEC",
462 |       "date": "September 2015"
463 |   },
464 |   "iso16022": {
465 |       "href": "https://www.iso.org/standard/44230.html",
466 |       "title": "Information technology -- Automatic identification and data capture techniques -- Data Matrix bar code symbology specification",
467 |       "publisher": "ISO/IEC",
468 |       "date": "September 2009"
469 |   },
470 |   "iso16388": {
471 |       "href": "https://www.iso.org/standard/43897.html",
472 |       "title": "Information technology -- Automatic identification and data capture techniques -- Code 39 bar code symbology specification",
473 |       "publisher": "ISO/IEC",
474 |       "date": "May 2007"
475 |   },
476 |   "iso18004": {
477 |       "href": "https://www.iso.org/standard/62021.html",
478 |       "title": "Information technology -- Automatic identification and data capture techniques -- QR Code bar code symbology specification",
479 |       "publisher": "ISO/IEC",
480 |       "date": "February 2015"
481 |   },
482 |   "iso24778": {
483 |       "href": "https://www.iso.org/standard/62021.html",
484 |       "title": "Information technology -- Automatic identification and data capture techniques -- Aztec Code bar code symbology specification",
485 |       "publisher": "ISO/IEC",
486 |       "date": "February 2008"
487 |   },
488 |   "bc2" :{
489 |       "title": "ANSI/AIM-BC2, Uniform Symbol Specification - Interleaved 2 of 5",
490 |       "publisher": "ANSI",
491 |       "date": "1995"
492 |   },
493 |   "bc5" :{
494 |       "title": "ANSI/AIM-BC5, Uniform Symbol Specification - Code 93",
495 |       "publisher": "ANSI",
496 |       "date": "1995"
497 |   }
498 | }
499 | 
500 | -------------------------------------------------------------------------------- /text.bs: -------------------------------------------------------------------------------- 1 |
  2 | Title: Accelerated Text Detection in Images
  3 | Repository: wicg/shape-detection-api
  4 | Status: CG-DRAFT
  5 | ED: https://wicg.github.io/shape-detection-api
  6 | Shortname: text-detection-api
  7 | Level: 1
  8 | Editor: Miguel Casas-Sanchez 82825, Google LLC https://www.google.com, mcasas@google.com
  9 | Editor: Reilly Grant 83788, Google LLC https://www.google.com, reillyg@google.com
 10 | Abstract: This document describes an API providing access to accelerated text detectors for still images and/or live image feeds.
 11 | Group: wicg
 12 | Markup Shorthands: markdown yes
 13 | !Participate: Join the W3C Community Group
 14 | !Participate: Fix the text through GitHub
 15 | 
16 | 17 | 34 | 35 | # Introduction # {#introduction} 36 | 37 | Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, QR codes or text. Detecting these features is computationally expensive, but would lead to interesting use cases e.g. face tagging, or web URL redirection. This document deals with text detection whereas the sister document [[SHAPE-DETECTION-API]] specifies the Face and Barcode detection cases and APIs. 38 | 39 | ## Text detection use cases ## {#use-cases} 40 | 41 | Please see the Readme/Explainer in the repository. 42 | 43 | # Text Detection API # {#api} 44 | 45 | Individual browsers MAY provide Detectors indicating the availability of hardware providing accelerated operation. 46 | 47 | ## Image sources for detection ## {#image-sources-for-detection} 48 | 49 | Please refer to [[SHAPE-DETECTION-API#image-sources-for-detection]] 50 | 51 | ## Text Detection API ## {#text-detection-api} 52 | 53 | {{TextDetector}} represents an underlying accelerated platform's component for detection in images of Latin-1 text as defined in [[iso8859-1]]. It provides a single {{TextDetector/detect()}} operation on an {{ImageBitmapSource}} of which the result is a Promise. This method must reject this Promise in the cases detailed in [[#image-sources-for-detection]]; otherwise it may queue a task using the OS/Platform resources to resolve the Promise with a sequence of {{DetectedText}}s, each one essentially consisting on a {{DetectedText/rawValue}} and delimited by a {{DetectedText/boundingBox}} and a series of {{Point2D}}s. 54 | 55 |
56 | Example implementations of text detection are e.g. Google Play Services, Apple's CIDetector (bounding box only, no OCR) or Windows 10 OCR API. 57 |
58 | 59 | 60 | [ 61 | Exposed=(Window,Worker), 62 | SecureContext 63 | ] interface TextDetector { 64 | constructor(); 65 | Promise<sequence<DetectedText>> detect(ImageBitmapSource image); 66 | }; 67 | 68 | 69 |
70 |
`TextDetector()`
71 |
72 |
73 | Detectors may potentially allocate and hold significant resources. Where possible, reuse the same {{TextDetector}} for several detections. 74 |
75 |
76 |
detect(ImageBitmapSource |image|)
77 |
Tries to detect text blocks in the {{ImageBitmapSource}} |image|.
78 |
79 | 80 | ### {{DetectedText}} ### {#detectedtext-section} 81 | 82 | 83 | dictionary DetectedText { 84 | required DOMRectReadOnly boundingBox; 85 | required DOMString rawValue; 86 | required sequence<Point2D> cornerPoints; 87 | }; 88 | 89 | 90 |
91 |
`boundingBox`
92 |
A rectangle indicating the position and extent of a detected feature aligned to the image axes.
93 | 94 |
`rawValue`
95 |
Raw string detected from the image, where characters are drawn from [[iso8859-1]].
96 | 97 |
`cornerPoints`
98 |
A sequence of corner points of the detected feature, in clockwise direction and starting with top-left. This is not necessarily a square due to possible perspective distortions.
99 |
100 | 101 | # Examples # {#examples} 102 | 103 | This section is non-normative. 104 | 105 |

106 | Slightly modified/extended versions of these examples (and more) can be found in 107 | e.g. this codepen collection. 108 |

109 | 110 | ## Platform support for a text detector ## {#example-feature-detection} 111 | 112 |
113 | The following example can also be found in e.g. this codepen 115 | with minimal modifications. 116 |
117 | 118 |
119 | ```js 120 | if (window.TextDetector == undefined) { 121 | console.error('Text Detection not supported on this platform'); 122 | } 123 | ``` 124 |
125 | 126 | ## Text Detection ## {#example-text-detection} 127 | 128 |
129 | The following example can also be found in e.g. 130 | this codepen. 131 |
132 | 133 |
134 | ```js 135 | let textDetector = new TextDetector(); 136 | // Assuming |theImage| is e.g. a <img> content, or a Blob. 137 | 138 | textDetector.detect(theImage) 139 | .then(detectedTextBlocks => { 140 |   for (const textBlock of detectedTextBlocks) { 141 |     console.log( 142 |         `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` + 143 |         `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`); 144 |   } 145 | }).catch(() => { 146 |   console.error("Text Detection failed, boo."); 147 | }) 148 | ``` 149 |
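For live feeds, the same detector can be pointed directly at a playing {{HTMLVideoElement}}; a non-normative sketch, with the getUserMedia() plumbing assumed rather than specified here:

```js
async function scanCamera() {
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  const video = document.createElement('video');
  video.srcObject = stream;
  await video.play();

  const textDetector = new TextDetector();
  // The frame at the current playback position is used as the source image.
  const textBlocks = await textDetector.detect(video);
  for (const textBlock of textBlocks) {
    console.log(textBlock.rawValue);
  }
}
```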
150 | 151 | 158 | 159 |
160 | {
161 |   "iso8859-1": {
162 |       "href": "https://www.iso.org/standard/28245.html",
163 |       "title": "Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1",
164 |       "publisher": "ISO/IEC",
165 |       "date": "April 1998"
166 |   }
167 | }
168 | 
169 | -------------------------------------------------------------------------------- /w3c.json: -------------------------------------------------------------------------------- 1 | { 2 | "group": [80485] 3 | , "contacts": ["marcoscaceres"] 4 | , "repo-type": "cg-report" 5 | } 6 | --------------------------------------------------------------------------------