├── .github └── workflows │ └── pr-push.yml ├── .gitignore ├── .pr-preview.json ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── index-zh-cn.bs ├── index.bs ├── text.bs └── w3c.json /.github/workflows/pr-push.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | on: 3 | pull_request: {} 4 | push: 5 | branches: [main] 6 | jobs: 7 | main: 8 | name: Build, Validate and Deploy 9 | runs-on: ubuntu-latest 10 | steps: 11 | - uses: actions/checkout@v2 12 | - name: index.bs 13 | uses: w3c/spec-prod@v2 14 | with: 15 | SOURCE: index.bs 16 | DESTINATION: index.html 17 | TOOLCHAIN: bikeshed 18 | BUILD_FAIL_ON: fatal 19 | GH_PAGES_BRANCH: gh-pages 20 | - name: text.bs 21 | uses: w3c/spec-prod@v2 22 | with: 23 | SOURCE: text.bs 24 | DESTINATION: text.html 25 | TOOLCHAIN: bikeshed 26 | BUILD_FAIL_ON: fatal 27 | GH_PAGES_BRANCH: gh-pages 28 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Logs 2 | logs 3 | *.log 4 | npm-debug.log* 5 | 6 | # Runtime data 7 | pids 8 | *.pid 9 | *.seed 10 | 11 | # Directory for instrumented libs generated by jscoverage/JSCover 12 | lib-cov 13 | 14 | # Coverage directory used by tools like istanbul 15 | coverage 16 | 17 | # nyc test coverage 18 | .nyc_output 19 | 20 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 21 | .grunt 22 | 23 | # node-waf configuration 24 | .lock-wscript 25 | 26 | # Compiled binary addons (http://nodejs.org/api/addons.html) 27 | build/Release 28 | 29 | # Dependency directories 30 | node_modules 31 | jspm_packages 32 | 33 | # Optional npm cache directory 34 | .npm 35 | 36 | # Optional REPL history 37 | .node_repl_history 38 | -------------------------------------------------------------------------------- /.pr-preview.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "src_file": "index.bs", 3 | "type": "bikeshed", 4 | "params": { 5 | "force": 1 6 | } 7 | } 8 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | All documentation, code and communication under this repository are covered by the [W3C Code of Ethics and Professional Conduct](https://www.w3.org/Consortium/cepc/). 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Web Platform Incubator Community Group 2 | 3 | This repository is being used for work in the Web Platform Incubator Community Group, governed by the [W3C Community License 4 | Agreement (CLA)](http://www.w3.org/community/about/agreements/cla/). To contribute, you must join 5 | the CG. 6 | 7 | If you are not the sole contributor to a contribution (pull request), please identify all 8 | contributors in the pull request's body or in subsequent comments. 9 | 10 | To add a contributor (other than yourself, that's automatic), mark them one per line as follows: 11 | 12 | ``` 13 | +@github_username 14 | ``` 15 | 16 | If you added a contributor by mistake, you can remove them in a comment with: 17 | 18 | ``` 19 | -@github_username 20 | ``` 21 | 22 | If you are making a pull request on behalf of someone else but you had no part in designing the 23 | feature, you can remove yourself with the above syntax. 
24 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | All Reports in this Repository are licensed by Contributors under the 2 | [W3C Software and Document 3 | License](http://www.w3.org/Consortium/Legal/2015/copyright-software-and-document). Contributions to 4 | Specifications are made under the [W3C CLA](https://www.w3.org/community/about/agreements/cla/). 5 | 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Shape Detection API Specification _:stars:_:movie_camera: 3 | 4 | This is the repository for `shape-detection-api`, an experimental API for detecting Shapes (e.g. Faces, Barcodes, Text) in live or still images on the Web by **using accelerated hardware/OS resources**. 5 | 6 | You're welcome to contribute! Let's make the Web rock our socks off! 7 | 8 | ## [Introduction](https://wicg.github.io/shape-detection-api/#introduction) :blue_book: 9 | 10 | Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, text or QR codes. Detecting these features is computationally expensive, but would lead to interesting use cases e.g. face tagging or detection of high saliency areas. Users interacting with WebCams or other Video Capture Devices have become accustomed to camera-like features such as the ability to focus directly on human faces on the screen of their devices. This is particularly true in the case of mobile devices, where hardware manufacturers have long been supporting these features. Unfortunately, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary. 
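To give a flavour of the API being incubated here, below is a sketch of detecting faces with the proposed `FaceDetector` (interface and member names follow the spec draft; the `logFaces` helper is illustrative, and availability varies per platform):

```javascript
// Sketch: detect faces in an ImageBitmapSource (e.g. an <img> element)
// and report their bounding boxes. FaceDetector is the interface this
// repository proposes; it is not available on all platforms yet.
async function logFaces(image) {
  const faceDetector = new FaceDetector({ fastMode: true, maxDetectedFaces: 10 });
  const faces = await faceDetector.detect(image);
  for (const face of faces) {
    const { x, y, width, height } = face.boundingBox;
    console.log(`Face at (${x}, ${y}), size ${width}x${height}`);
  }
  return faces.length;
}
```

See the Examples and demos section below for complete, runnable examples.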
11 | 12 | ## Use cases :camera: 13 | 14 | QR/barcode/text detection can be used for: 15 | * user identification/registration, e.g. for [voting purposes](https://twitter.com/RegistertoVote/status/733123511128981508); 16 | * eCommerce, e.g. [Walmart Pay](https://www.slashgear.com/awalmart-announces-walmart-pay-for-qr-code-based-mobile-payments-10417912/); 17 | * Augmented Reality overlay, e.g. [here](http://www.multidots.com/augmented-reality/); 18 | * Driving online-to-offline engagement, fighting fakes [etc](https://www.clickz.com/why-have-qr-codes-taken-off-in-china/23662/). 19 | 20 | Face detection can be used for: 21 | * producing fun effects, e.g. [Snapchat Lenses](https://support.snapchat.com/en-US/a/lenses1); 22 | * giving hints to encoders or auto focus routines; 23 | * user name tagging; 24 | * enhancing accessibility by e.g. making objects appear larger as the user gets closer, like [HeadTrackr](https://www.auduno.com/headtrackr/examples/targets.html); 25 | * speeding up Face Recognition by indicating the areas of the image where faces are present. 26 | 27 | 28 | ## Current Related Efforts and Workarounds :wrench: 29 | 30 | Some Web Apps -gasp- run Detection in JavaScript. A performance comparison of some such libraries can be found [here](https://github.com/mtschirs/js-objectdetect#performance) (note that this performance evaluation does not include e.g. WebCam image acquisition and/or canvas interactions). 31 | 32 | Samsung Browser [has a private API](https://developer.samsung.com/internet) (click to unfold "Overview for Android", then search for "QR code reader"). 33 | 34 | **TODO**: compare a few JS/native libraries in terms of size and performance. A performance and detection comparison of some popular JS QR code scanners can be found [here](https://github.com/danimoh/qr-scanner-benchmark). `zxingjs2` has [a list of some additional JS libraries](https://github.com/ghybs/zxingjs2#other-barcode-image-processing-libraries-related-to-javascript).
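For comparison, with the API proposed here the core of a barcode-scanning workaround library boils down to a few lines. This is a sketch: the `readBarcodes` helper and its injectable `scope` parameter are illustrative, and `BarcodeDetector` support varies per platform.

```javascript
// Sketch: read all barcode payloads from an ImageBitmapSource without
// shipping a JS decoding library. The detection scope (window, a worker's
// self, ...) is passed in explicitly so the helper is easy to test.
async function readBarcodes(image, scope = globalThis) {
  if (typeof scope.BarcodeDetector !== 'function') {
    throw new Error('Barcode Detection not supported on this platform');
  }
  const detector = new scope.BarcodeDetector();
  const barcodes = await detector.detect(image);
  return barcodes.map(barcode => barcode.rawValue);
}
```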
35 | 36 | Android Native Apps usually integrate [ZXing](https://github.com/zxing/zxing) (which amounts to adding ~560KB when counting [core.jar](http://repo1.maven.org/maven2/com/google/zxing/core/3.3.0/), [android-core.jar](http://repo1.maven.org/maven2/com/google/zxing/android-core/3.3.0/) and [android-integration.jar](http://repo1.maven.org/maven2/com/google/zxing/android-integration/3.3.0/)). 37 | 38 | OCR readers in JavaScript are north of 1MB in size. 39 | 40 | ## Potential for misuse :money_with_wings: 41 | 42 | Face Detection is an expensive operation due to the algorithmic complexity. Many requests, or demanding systems like a live stream feed with a certain frame rate, could slow down the whole system or greatly increase power consumption. 43 | 44 | ## Platform specific implementation notes :computer: 45 | 46 | ### Overview 47 | 48 | What platforms support what detector? 49 | 50 | Detector | Mac| Android | Win10 | Linux | ChromeOS | 51 | --------- |:--:| :------:| :---: | :------:| :------: | 52 | Face | sw | hw/sw | sw | ✘| ✘ | 53 | QR/Barcode| sw | sw |✘| ✘| ✘ | 54 | Text | sw | sw | sw | ✘| ✘ | 55 | 56 | 57 | ### Android 58 | 59 | Android provides both a standalone software face detector and an interface to the hardware ones. 60 | 61 | | API | uses...
| Release notes | 62 | | ------------- |:-------------:| -----:| 63 | | [FaceDetector](https://developer.android.com/reference/android/media/FaceDetector.html)| Software based using the [Neven face detector](https://android.googlesource.com/platform/external/neven)| API Level 1, 2008| 64 | | [Vision.Face](https://developers.google.com/android/reference/com/google/android/gms/vision/face/Face)| Software based | Google Play services 7.2, Aug 2015| 65 | | [Camera2](https://developer.android.com/reference/android/hardware/camera2/CaptureRequest.html#STATISTICS_FACE_DETECT_MODE)| Hardware | API Level 21/Lollipop, 2014 | 66 | | [Camera.Face](https://developer.android.com/reference/android/hardware/Camera.Face.html) (old)| Hardware | API Level 14/Ice Cream Sandwich, 2011 | 67 | 68 | The availability of hardware detection depends on the actual chipset; according to market share in [1H 2016](http://www.antutu.com/en/view.shtml?id=8256), Qualcomm, MediaTek, Samsung and HiSilicon are the largest individual chipset vendors, and they all support Face Detection (all of the top-10 phones are covered as well): 69 | * The [Qualcomm Snapdragon](https://developer.qualcomm.com/software/snapdragon-sdk-android/facial-recognition) chipset family has supported it since ~2013 as part of their ISP. 70 | * MediaTek as part of [CorePilot 2.0](http://cdn-cw.mediatek.com/White%20Papers/MediaTek_CorePilot%202.0_Final.pdf) (introduced in 2015). 71 | * [Samsung Exynos](http://www.samsung.com/semiconductor/minisite/Exynos/data/Benefits_of_Exynos_5420_ISP_for_Enhanced_Imaging_Experience.pdf) (at least 2013). 72 | * Huawei HiSilicon [Kirin950](http://www.androidauthority.com/huawei-hisilicon-kirin-950-official-653811) since 2015 (this fabless manufacturer is relatively new). 73 | * It is worth noting that ARM [acquired Apical in 2016](https://www.arm.com/products/graphics-and-multimedia/computer-vision) for its computer vision expertise.
74 | 75 | Barcode/QR and Text detection is available via Google Play Services [barcode](https://developers.google.com/android/reference/com/google/android/gms/vision/barcode/package-summary) and [text](https://developers.google.com/android/reference/com/google/android/gms/vision/text/package-summary), respectively. 76 | 77 | ### Mac OS X / iOS 78 | 79 | Mac OS X/iOS provides `CIDetector` and `Vision Framework` for Face, QR, Text and Rectangle detection in software or hardware. 80 | 81 | | API | uses... | Release notes | 82 | | ------------- |:-------------: | -----:| 83 | | [Vision Framework, Mac OS X](https://developer.apple.com/documentation/vision)| Software and Hardware | OS X v10.13, 2017 | 84 | | [Vision Framework, iOS](https://developer.apple.com/documentation/vision)| Software and Hardware | iOS v11.0, 2017 | 85 | | [CIDetector, Mac OS X](https://developer.apple.com/library/mac/documentation/CoreImage/Reference/CIDetector_Ref/)| Software | OS X v10.7, 2011 | 86 | | [CIDetector, iOS](https://developer.apple.com/library/ios/documentation/CoreImage/Reference/CIDetector_Ref/) | Software | iOS v5.0, 2011 | 87 | | [AVFoundation](https://developer.apple.com/reference/avfoundation/avcapturemetadataoutput?language=objc)| Hardware | iOS 6.0, 2012 | 88 | 89 | Apple has supported Face Detection in hardware since the [Apple A5 processor](https://en.wikipedia.org/wiki/Apple_A5) introduced in 2011. 90 | 91 | ### Windows 92 | 93 | Windows 10 has a [FaceDetector](https://msdn.microsoft.com/library/windows/apps/dn974129) class and support for Text Detection via [OCR](https://msdn.microsoft.com/en-us/library/windows/apps/windows.media.ocr.aspx). 94 | 95 | ## Rendered URL :bookmark_tabs: 96 | 97 | The rendered version of this site can be found at https://wicg.github.io/shape-detection-api (if that's not alive for some reason try the [rawgit rendering](https://rawgit.com/WICG/shape-detection-api/gh-pages/index.html)).
98 | 99 | ## Examples and demos 100 | 101 | https://wicg.github.io/shape-detection-api/#examples 102 | 103 | ## Notes on bikeshedding :bicyclist: 104 | 105 | To compile, run: 106 | 107 | ``` 108 | curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F force=1 > index.html 109 | ``` 110 | 111 | if the produced file has a strange size (i.e. zero), then something went terribly wrong; run instead 112 | 113 | ``` 114 | curl https://api.csswg.org/bikeshed/ -F file=@index.bs -F output=err 115 | ``` 116 | and try to figure out why `bikeshed` did not like the `.bs` :'( 117 | -------------------------------------------------------------------------------- /index-zh-cn.bs: -------------------------------------------------------------------------------- 1 |
2 | Title: 加速的图形识别 3 | Repository: wicg/shape-detection-api 4 | Status: w3c/CG-DRAFT 5 | ED: https://wicg.github.io/shape-detection-api 6 | Shortname: shape-detection-api 7 | Level: 1 8 | Editor: Miguel Casas-Sanchez, w3cid 82825, Google Inc., mcasas@google.com 9 | Abstract: 本文档描述了一套针对静态和/或动态图像的加速图形识别(如:人脸识别)API。 10 | Group: wicg 11 | !Participate: Join the W3C Community Group 12 | !Participate: Fix the text through GitHub 13 |
14 | 15 | 32 | 33 | # 简介 # {#introduction} 34 | 35 | 照片和图像是互联网构成中最大的部分,其中相当一部分包含了可识别的特征,比如人脸,二维码或者文本。可想而知,识别这些特征的计算开销非常大,但能带来一些很有趣的场景,比如在照片中自动标记人脸,或者根据图像中的URL进行重定向。硬件厂商从很久以前就已经开始支持这些特性,但Web应用迟迟未能很好地利用上这些硬件特性,必须借助一些难用的程序库才能达到目的。 36 | 37 | ## 图形识别的场景 ## {#use-cases} 38 | 39 | 请参考代码库中自述/解释的文档。 40 | 41 | # 图形识别API # {#api} 42 | 43 | 某些特定的浏览器可能会提供识别器来标示当前硬件是否提供加速功能。 44 | 45 | ## 用于识别的图像源 ## {#image-sources-for-detection} 46 | 47 |
48 | 本节的灵感来自 [[canvas2dcontext#image-sources-for-2d-rendering-contexts]]。 49 |
50 | 51 | {{ImageBitmapSource}} 允许多种图形接口的实现对象作为图像源,进行识别处理。 52 | 53 | 54 | * 当{{ImageBitmapSource}}对象代表{{HTMLImageElement}}的时候,该元素的图像必须用作源图像。而在特定情况下,当{{ImageBitmapSource}}对象代表{{HTMLImageElement}}中的动画图像的时候,用户代理程序(User Agent)必须显示这个动画图像的默认图像(该默认图像指的是,在动画图像被禁用或不支持动画的环境下,需要展现的图像),或者没有默认图像的话,就显示该动画图像的第一帧。 55 | 56 | * 当{{ImageBitmapSource}}对象代表{{HTMLVideoElement}}的时候,该视频播放的当前帧必须用作源图像,同时,该源图像的尺寸必须是视频源的固有维数(intrinsic dimensions),换句话说,就是视频源经过任意比例的调整后的大小。 57 | 58 | 59 | * 当{{ImageBitmapSource}}对象代表{{HTMLCanvasElement}}的时候,该元素的位图必须用作源图像。 60 | 61 | 当用户代理程序(User Agent)被要求用某种既有的{{ImageBitmapSource}}作为识别器的detect()
方法的输入参数的时候,必须执行以下步骤:
62 |
63 | * 如果{{ImageBitmapSource}}所含的有效脚本源([[HTML#concept-origin]])和当前文档的有效脚本源不同,就拒绝对应的Promise对象,并附上一个名为{{SecurityError}}的新建{{DOMException}}对象。
64 |
65 | * 如果一个{{ImageBitmapSource}}是一个处于|broken|状态的{{HTMLImageElement}}对象的话,就拒绝对应的Promise对象,并附上一个名为{{InvalidStateError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。
66 |
67 |
68 | * 如果{{ImageBitmapSource}}是一个不能完整解码的{{HTMLImageElement}}对象的话,就拒绝对应的Promise对象,并附上一个名为{{InvalidStateError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。
69 |
70 | * 如果一个{{ImageBitmapSource}}是一个{{HTMLVideoElement}}对象,且其|readyState|属性为|HAVE_NOTHING| 或 |HAVE_METADATA|的话,就拒绝对应的Promise对象,并附上一个名为{{InvalidStateError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。
71 |
72 |
73 | * 如果一个{{ImageBitmapSource}}是一个{{HTMLCanvasElement}}对象,且其位图的|origin-clean| ([[HTML#concept-canvas-origin-clean]])标识为false的话,就拒绝对应的Promise对象,并附上一个名为{{SecurityError}}的新建{{DOMException}}对象,同时停止之后的所有步骤。
74 |
75 |
76 | 请注意,如果一个{{ImageBitmapSource}}的水平尺寸或垂直尺寸等于0,那么对应的Promise对象就会被简单地当作一个空的已检测对象序列来处理。
77 |
78 |
79 | ## 人脸识别API ## {#face-detection-api}
80 |
81 | {{FaceDetector}}代表一个针对图像中的人脸进行识别的底层加速平台组件。创建时可以选择一个{{FaceDetectorOptions}}的Dictionary对象作为入参。它提供了一个单独的 {{FaceDetector/detect()}}方法操作{{ImageBitmapSource}}对象,并返回Promise对象。在[[#image-sources-for-detection]]中所述的情况下,该方法必须拒绝该Promise对象;否则,它可以把一个利用操作系统或平台资源的任务加入队列,以一个{{DetectedFace}}序列来解决(resolve)该Promise,其中每个{{DetectedFace}}实质上由一个{{DetectedFace/boundingBox}}组成并被其界定。
82 |
83 | 84 | dictionary FaceDetectorOptions { 85 | unsigned short maxDetectedFaces; 86 | boolean fastMode; 87 | }; 88 |89 | 90 |
maxDetectedFaces
fastMode
98 | [Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)] 99 | interface FaceDetector { 100 | Promise<sequence<DetectedFace>> detect(ImageBitmapSource image); 101 | }; 102 |103 | 104 |
FaceDetector(optional FaceDetectorOptions faceDetectorOptions)
detect()
112 | interface DetectedFace { 113 | [SameObject] readonly attribute DOMRectReadOnly boundingBox; 114 | }; 115 |116 | 117 |
boundingBox
129 | [SameObject] readonly attribute unsigned long id; 130 | [SameObject] readonly attribute FrozenArray<Landmark>? landmarks; 131 |132 | to {{DetectedFace}}. 133 |
141 | [Exposed=(Window,Worker), Constructor()] 142 | interface BarcodeDetector { 143 | Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image); 144 | }; 145 |146 | 147 |
detect(ImageBitmapSource image)
153 | interface DetectedBarcode { 154 | [SameObject] readonly attribute DOMRectReadOnly boundingBox; 155 | [SameObject] readonly attribute DOMString rawValue; 156 | [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints; 157 | }; 158 |159 | 160 |
boundingBox
rawValue
cornerPoints
182 | [ 183 | Constructor, 184 | Exposed=(Window,Worker), 185 | ] interface TextDetector { 186 | Promise<sequence<DetectedText>> detect(ImageBitmapSource image); 187 | }; 188 |189 | 190 |
detect(ImageBitmapSource image)
196 | [ 197 | Constructor, 198 | ] interface DetectedText { 199 | [SameObject] readonly attribute DOMRect boundingBox; 200 | [SameObject] readonly attribute DOMString rawValue; 201 | }; 202 |203 | 204 |
boundingBox
rawValue
219 | 以下示例的微调或扩展版本,以及更多示例请参考这个codepen集合。 220 |
221 | 222 | ## 图形识别器的平台支持 ## {#platform-support-for-a-given-detector} 223 | 224 |231 | if (window.FaceDetector == undefined) { 232 | console.error('Face Detection not supported on this platform'); 233 | } 234 | if (window.BarcodeDetector == undefined) { 235 | console.error('Barcode Detection not supported on this platform'); 236 | } 237 | if (window.TextDetector == undefined) { 238 | console.error('Text Detection not supported on this platform'); 239 | } 240 |241 |
251 | let faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 1}); 252 | // Assuming |theImage| is e.g. an <img> content, or a Blob. 253 | 254 | faceDetector.detect(theImage) 255 | .then(detectedFaces => { 256 | for (const face of detectedFaces) { 257 | console.log(` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` + 258 | ` size ${face.boundingBox.width}x${face.boundingBox.height}`); 259 | } 260 | }).catch(() => { 261 | console.error("Face Detection failed, boo."); 262 | }) 263 |
264 |
274 | let barcodeDetector = new BarcodeDetector(); 275 | // Assuming |theImage| is e.g. an <img> content, or a Blob. 276 | 277 | barcodeDetector.detect(theImage) 278 | .then(detectedCodes => { 279 | for (const barcode of detectedCodes) { 280 | console.log(` Barcode ${barcode.rawValue}` + 281 | ` @ (${barcode.boundingBox.x}, ${barcode.boundingBox.y}) with size` + 282 | ` ${barcode.boundingBox.width}x${barcode.boundingBox.height}`); 283 | } 284 | }).catch(() => { 285 | console.error("Barcode Detection failed, boo."); 286 | }) 287 |
288 |
298 | let textDetector = new TextDetector(); 299 | // Assuming |theImage| is e.g. an <img> content, or a Blob. 300 | 301 | textDetector.detect(theImage) 302 | .then(detectedTextBlocks => { 303 | for (const textBlock of detectedTextBlocks) { 304 | console.log( 305 | `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` + 306 | `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`); 307 | } 308 | }).catch(() => { 309 | console.error("Text Detection failed, boo."); 310 | }) 311 |
312 |
316 | spec: ECMAScript; urlPrefix: https://tc39.github.io/ecma262/# 317 | type: interface 318 | text: Array; url: sec-array-objects 319 | text: Promise; url:sec-promise-objects 320 | text: TypeError; url: sec-native-error-types-used-in-this-standard-typeerror 321 |322 | 323 |
324 | type: interface; text: Point2D; url: https://w3c.github.io/mediacapture-image/#Point2D; 325 |326 | 327 |
328 | type: interface; text: DOMString; url: https://heycam.github.io/webidl/#idl-DOMString; spec: webidl 329 |330 | 331 |
332 | spec: html 333 | type: dfn 334 | text: allowed to show a popup 335 | text: in parallel 336 | text: incumbent settings object 337 |338 | 339 |
340 | { 341 | "wikipedia": { 342 | "href": "https://en.wikipedia.org/wiki/Object-class_detection", 343 | "title": "Object-class Detection Wikipedia Entry", 344 | "publisher": "Wikipedia", 345 | "date": "14 September 2016" 346 | }, 347 | "canvas2dcontext": { 348 | "authors": [ "Rik Cabanier", "Jatinder Mann", "Jay Munro", "Tom Wiltzius", 349 | "Ian Hickson"], 350 | "href": "https://www.w3.org/TR/2dcontext/", 351 | "title": "HTML Canvas 2D Context", 352 | "status": "REC" 353 | } 354 | } 355 |356 | 357 | -------------------------------------------------------------------------------- /index.bs: -------------------------------------------------------------------------------- 1 |
2 | Title: Accelerated Shape Detection in Images 3 | Repository: wicg/shape-detection-api 4 | Status: CG-DRAFT 5 | ED: https://wicg.github.io/shape-detection-api 6 | Shortname: shape-detection-api 7 | Level: 1 8 | Editor: Miguel Casas-Sanchez 82825, Google LLC https://www.google.com, mcasas@google.com 9 | Editor: Reilly Grant 83788, Google LLC https://www.google.com, reillyg@google.com 10 | Abstract: This document describes an API providing access to accelerated shape detectors (e.g. human faces) for still images and/or live image feeds. 11 | Translation: zh-CN https://wicg.github.io/shape-detection-api/index-zh-cn.html 12 | Group: wicg 13 | Markup Shorthands: markdown yes 14 | !Participate: Join the W3C Community Group 15 | !Participate: Fix the text through GitHub 16 |
17 | 18 | 35 | 36 | # Introduction # {#introduction} 37 | 38 | Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces or barcodes/QR codes. Detecting these features is computationally expensive, but would lead to interesting use cases, e.g. face tagging or web URL redirection. While hardware manufacturers have been supporting these features for a long time, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary. 39 | 40 |
41 | Text Detection, despite being an interesting field, is not considered stable enough across computing platforms or character sets to be standardized in the context of this document. For reference, a sister informative specification is kept in [[TEXT-DETECTION-API]]. 42 |
43 | 44 | ## Shape detection use cases ## {#use-cases} 45 | 46 | Please see the Readme/Explainer in the repository. 47 | 48 | # Shape Detection API # {#api} 49 | 50 | Individual browsers MAY provide Detectors indicating the availability of hardware providing accelerated operation. 51 | 52 | Detecting features in an image occurs asynchronously, potentially communicating with acceleration hardware independent of the browser. Completion events use the shape detection task source. 53 | 54 | ## Image sources for detection ## {#image-sources-for-detection} 55 | 56 |57 | This section is inspired by [[2dcontext#image-sources-for-2d-rendering-contexts]]. 58 |
59 | 60 | {{ImageBitmapSource}} allows objects implementing any of a number of interfaces to be used as image sources for the detection process. 61 | 62 | * When an {{ImageBitmapSource}} object represents an {{HTMLImageElement}}, the element's image must be used as the source image. Specifically, when an {{ImageBitmapSource}} object represents an animated image in an {{HTMLImageElement}}, the user agent must use the default image of the animation (the one that the format defines is to be used when animation is not supported or is disabled), or, if there is no such image, the first frame of the animation. 63 | 64 | * When an {{ImageBitmapSource}} object represents an {{HTMLVideoElement}}, then the frame at the current playback position when the method with the argument is invoked must be used as the source image when processing the image, and the source image's dimensions must be the intrinsic dimensions of the media resource (i.e. after any aspect-ratio correction has been applied). 65 | 66 | * When an {{ImageBitmapSource}} object represents an {{HTMLCanvasElement}}, the element's bitmap must be used as the source image. 67 | 68 | When the UA is required to use a given type of {{ImageBitmapSource}} as input argument for the `detect()` method of any given detector, it MUST run these steps: 69 | 70 | * If the {{ImageBitmapSource}} has an effective script origin ([=origin=]) which is not the same as the Document's effective script origin, then reject the Promise with a new {{DOMException}} whose name is {{SecurityError}}. 71 | 72 | * If the {{ImageBitmapSource}} is an {{HTMLImageElement}} object that is in the `Broken` (HTML Standard §img-error) state, then reject the Promise with a new {{DOMException}} whose name is {{InvalidStateError}}, and abort any further steps.
73 | 74 | * If the {{ImageBitmapSource}} is an {{HTMLImageElement}} object that is not fully decodable then reject the Promise with a new {{DOMException}} whose name is {{InvalidStateError}}, and abort any further steps. 75 | 76 | * If the {{ImageBitmapSource}} is an {{HTMLVideoElement}} object whose {{HTMLMediaElement/readyState}} attribute is either {{HAVE_NOTHING}} or {{HAVE_METADATA}} then reject the Promise with a new {{DOMException}} whose name is {{InvalidStateError}}, and abort any further steps. 77 | 78 | * If the {{ImageBitmapSource}} argument is an {{HTMLCanvasElement}} whose bitmap's `origin-clean` (HTML Standard §concept-canvas-origin-clean) flag is false, then reject the Promise with a new {{DOMException}} whose name is {{SecurityError}}, and abort any further steps. 79 | 80 | Note that if the {{ImageBitmapSource}} is an object with either a horizontal dimension or a vertical dimension equal to zero, then the Promise will be simply resolved with an empty sequence of detected objects. 81 | 82 | ## Face Detection API ## {#face-detection-api} 83 | 84 | {{FaceDetector}} represents an underlying accelerated platform's component for detection of human faces in images. It can be created with an optional Dictionary of {{FaceDetectorOptions}}. It provides a single {{FaceDetector/detect()}} operation on an {{ImageBitmapSource}} whose result is a Promise. This method MUST reject this Promise in the cases detailed in [[#image-sources-for-detection]]; otherwise it MAY queue a task that utilizes the OS/Platform resources to resolve the Promise with a sequence of {{DetectedFace}}s, each one essentially consisting of and delimited by a {{DetectedFace/boundingBox}}. 85 | 86 |<dl>FaceDetector(optional FaceDetectorOptions |faceDetectorOptions|)
detect(ImageBitmapSource |image|)
176 | [SameObject] readonly attribute unsigned long id; 177 |178 | to {{DetectedFace}}. 179 |
BarcodeDetector(optional BarcodeDetectorOptions |barcodeDetectorOptions|)
detect(ImageBitmapSource |image|)
361 | Slightly modified/extended versions of these examples (and more) can be found in 362 | e.g. this codepen collection. 363 |
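The snippets in the following subsections can also be combined; a sketch of feature-detecting the available detectors and running each over the same frame (the `detectShapes` helper name is illustrative, not part of this specification):

```javascript
// Sketch: feature-detect the available shape detectors and run each one
// over the same ImageBitmapSource, collecting all results in one object.
async function detectShapes(image) {
  const results = {};
  if (typeof FaceDetector === 'function') {
    results.faces = await new FaceDetector({ fastMode: true }).detect(image);
  }
  if (typeof BarcodeDetector === 'function') {
    results.barcodes = await new BarcodeDetector().detect(image);
  }
  return results;
}
```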
364 | 365 | ## Platform support for a given detector ## {#example-feature-detection} 366 | 367 |435 | spec: html 436 | type: dfn 437 | text: allowed to show a popup 438 | text: in parallel 439 | text: incumbent settings object 440 | for: / 441 | text: origin 442 |443 | 444 |
445 | { 446 | "iso15417": { 447 | "href": "https://www.iso.org/standard/43896.html", 448 | "title": "Information technology -- Automatic identification and data capture techniques -- Code 128 bar code symbology specification", 449 | "publisher": "ISO/IEC", 450 | "date": "June 2007" 451 | }, 452 | "iso15420": { 453 | "href": "https://www.iso.org/standard/46143.html", 454 | "title": "Information technology -- Automatic identification and data capture techniques -- EAN/UPC bar code symbology specification", 455 | "publisher": "ISO/IEC", 456 | "date": "December 2009" 457 | }, 458 | "iso15438": { 459 | "href": "https://www.iso.org/standard/65502.html", 460 | "title": "Information technology -- Automatic identification and data capture techniques -- PDF417 bar code symbology specification", 461 | "publisher": "ISO/IEC", 462 | "date": "September 2015" 463 | }, 464 | "iso16022": { 465 | "href": "https://www.iso.org/standard/44230.html", 466 | "title": "Information technology -- Automatic identification and data capture techniques -- Data Matrix bar code symbology specification", 467 | "publisher": "ISO/IEC", 468 | "date": "September 2009" 469 | }, 470 | "iso16388": { 471 | "href": "https://www.iso.org/standard/43897.html", 472 | "title": "Information technology -- Automatic identification and data capture techniques -- Code 39 bar code symbology specification", 473 | "publisher": "ISO/IEC", 474 | "date": "May 2007" 475 | }, 476 | "iso18004": { 477 | "href": "https://www.iso.org/standard/62021.html", 478 | "title": "Information technology -- Automatic identification and data capture techniques -- QR Code bar code symbology specification", 479 | "publisher": "ISO/IEC", 480 | "date": "February 2015" 481 | }, 482 | "iso24778": { 483 | "href": "https://www.iso.org/standard/62021.html", 484 | "title": "Information technology -- Automatic identification and data capture techniques -- Aztec Code bar code symbology specification", 485 | "publisher": "ISO/IEC", 486 | "date":
"February 2008" 487 | }, 488 | "bc2" :{ 489 | "title": "ANSI/AIM-BC2, Uniform Symbol Specification - Interleaved 2 of 5", 490 | "publisher": "ANSI", 491 | "date": "1995" 492 | }, 493 | "bc5" :{ 494 | "title": "ANSI/AIM-BC5, Uniform Symbol Specification - Code 93", 495 | "publisher": "ANSI", 496 | "date": "1995" 497 | } 498 | } 499 |500 | -------------------------------------------------------------------------------- /text.bs: -------------------------------------------------------------------------------- 1 |
2 | Title: Accelerated Text Detection in Images 3 | Repository: wicg/shape-detection-api 4 | Status: CG-DRAFT 5 | ED: https://wicg.github.io/shape-detection-api 6 | Shortname: text-detection-api 7 | Level: 1 8 | Editor: Miguel Casas-Sanchez 82825, Google LLC https://www.google.com, mcasas@google.com 9 | Editor: Reilly Grant 83788, Google LLC https://www.google.com, reillyg@google.com 10 | Abstract: This document describes an API providing access to accelerated text detectors for still images and/or live image feeds. 11 | Group: wicg 12 | Markup Shorthands: markdown yes 13 | !Participate: Join the W3C Community Group 14 | !Participate: Fix the text through GitHub 15 |16 | 17 | 34 | 35 | # Introduction # {#introduction} 36 | 37 | Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, QR codes or text. Detecting these features is computationally expensive, but would lead to interesting use cases e.g. face tagging, or web URL redirection. This document deals with text detection whereas the sister document [[SHAPE-DETECTION-API]] specifies the Face and Barcode detection cases and APIs. 38 | 39 | ## Text detection use cases ## {#use-cases} 40 | 41 | Please see the Readme/Explainer in the repository. 42 | 43 | # Text Detection API # {#api} 44 | 45 | Individual browsers MAY provide Detectors indicating the availability of hardware providing accelerated operation. 46 | 47 | ## Image sources for detection ## {#image-sources-for-detection} 48 | 49 | Please refer to [[SHAPE-DETECTION-API#image-sources-for-detection]] 50 | 51 | ## Text Detection API ## {#text-detection-api} 52 | 53 | {{TextDetector}} represents an underlying accelerated platform's component for detection in images of Latin-1 text as defined in [[iso8859-1]]. It provides a single {{TextDetector/detect()}} operation on an {{ImageBitmapSource}} of which the result is a Promise. 
This method must reject this Promise in the cases detailed in [[#image-sources-for-detection]]; otherwise it may queue a task using the OS/Platform resources to resolve the Promise with a sequence of {{DetectedText}}s, each one essentially consisting of a {{DetectedText/rawValue}} and delimited by a {{DetectedText/boundingBox}} and a series of {{Point2D}}s. 54 | 55 |<pre class="idl">
detect(ImageBitmapSource |image|)
106 | Slightly modified/extended versions of these examples (and more) can be found in 107 | e.g. this codepen collection. 108 |
109 | 110 | ## Platform support for a text detector ## {#example-feature-detection} 111 | 112 |152 | spec: html 153 | type: dfn 154 | text: allowed to show a popup 155 | text: in parallel 156 | text: incumbent settings object 157 |158 | 159 |
160 | { 161 | "iso8859-1": { 162 | "href": "https://www.iso.org/standard/28245.html", 163 | "title": "Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1", 164 | "publisher": "ISO/IEC", 165 | "date": "April 1998" 166 | } 167 | } 168 |169 | -------------------------------------------------------------------------------- /w3c.json: -------------------------------------------------------------------------------- 1 | { 2 | "group": [80485] 3 | , "contacts": ["marcoscaceres"] 4 | , "repo-type": "cg-report" 5 | } 6 | --------------------------------------------------------------------------------