├── .gitignore ├── LICENSE ├── README-V1.md ├── README.md ├── images ├── gank today.png ├── gankio-heroku-search.png ├── gankio.png ├── gs_today.png ├── gs_valueanimator.png ├── search-gankio.png ├── search1.png ├── search2.png ├── search3.png └── search4.png ├── releases ├── Gank.1.0.2.alfredworkflow ├── Gank.1.0.3.alfredworkflow └── Gank.2.0.0.alfredworkflow ├── source-v1 ├── gank.py ├── icon.png ├── info.plist ├── version └── workflow │ ├── Notify.tgz │ ├── __init__.py │ ├── background.py │ ├── notify.py │ ├── update.py │ ├── version │ ├── web.py │ └── workflow.py └── source ├── android.png ├── apple.png ├── bs4 ├── __init__.py ├── builder │ ├── __init__.py │ ├── _html5lib.py │ ├── _htmlparser.py │ └── _lxml.py ├── dammit.py ├── element.py ├── testing.py └── tests │ ├── __init__.py │ ├── test_builder_registry.py │ ├── test_docs.py │ ├── test_html5lib.py │ ├── test_htmlparser.py │ ├── test_lxml.py │ ├── test_soup.py │ └── test_tree.py ├── gank.png ├── gank.py ├── html5.png ├── icon.png ├── info.plist ├── other.png ├── picture.png ├── version ├── video.png └── workflow ├── Notify.tgz ├── __init__.py ├── background.py ├── notify.py ├── update.py ├── version ├── web.py └── workflow.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | # PyInstaller 28 | # Usually these files are written by a python script from a template 29 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
30 | *.manifest 31 | *.spec 32 | 33 | # Installer logs 34 | pip-log.txt 35 | pip-delete-this-directory.txt 36 | 37 | # Unit test / coverage reports 38 | htmlcov/ 39 | .tox/ 40 | .coverage 41 | .coverage.* 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | *,cover 46 | .hypothesis/ 47 | 48 | # Translations 49 | *.mo 50 | *.pot 51 | 52 | # Django stuff: 53 | *.log 54 | 55 | # Sphinx documentation 56 | docs/_build/ 57 | 58 | # PyBuilder 59 | target/ 60 | 61 | #Ipython Notebook 62 | .ipynb_checkpoints 63 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 潇涧 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README-V1.md: -------------------------------------------------------------------------------- 1 | # Gank Alfred Workflow 2 | > The missing Alfred Workflow for searching ganks(干货) in gank.io 3 | 4 | Gank searcher: a homemade Alfred Workflow that searches [gank.io](http://gank.io) for ganks (干货, curated dev resources)! 5 | 6 | #### 0. Usage 7 | 8 | [Download the latest release of the workflow file and double-click it to open with Alfred](https://github.com/hujiaweibujidao/Gank-Alfred-Workflow/releases). The workflow has also been submitted to [Packal](http://www.packal.org/workflow/gank-alfred-workflow). 9 | 10 | **Using it requires Alfred with the Powerpack enabled.** 11 | 12 | The trigger keyword for a search is `gk` (short for `gank`) 13 | 14 | ![img](images/search1.png) 15 | 16 | 17 | 18 | #### 1. Motivation 19 | 20 | Search on [gank.io](http://gank.io) matches the keyword against each daily digest, so the results are the digests that contain the keyword rather than the individual ganks. Worse, many recommended ganks are submitted without a description that summarizes their content, which makes them hard to find at all. The current search also cannot combine multiple keywords: searching `ios 动画` returns many results, while searching `动画` alone returns only a handful. (OK, I'll stop here before [@daimajia](https://github.com/daimajia) comes after me 😭) 21 | 22 | ![img](images/search-gankio.png) 23 | 24 | The real reason for this project is that I am about to graduate and start working. Knowing how much I still don't know, I wanted to read some ganks to improve myself, but I couldn't even find where the ganks were 🙈🙈🙈 25 | 26 | #### 2. Improvements 27 | 28 | Two things worth improving: 29 | 30 | (1) Search results should directly list the ganks relevant to the keyword 31 | 32 | (2) Search should take the content of each gank's target page into account 33 | 34 | #### 3. How it works 35 | 36 | Solving all of this with an Alfred Workflow alone would be hard. The workflow simply calls the search API of another project of mine, [Ganks for gank.io](https://github.com/hujiaweibujidao/Ganks-for-gank.io), deployed on the Heroku platform, and shows the results to the user. The project therefore splits into two parts: 37 | 38 | ##### (1) [Ganks for gank.io](https://github.com/hujiaweibujidao/Ganks-for-gank.io) 39 | 40 | **This project uses the [Gank API](http://gank.io/api) to fetch the list of ganks. It also uses the open-source tool [dragnet](https://github.com/seomoz/dragnet) to extract the content of each gank's target page, and then builds an efficient gank search API on top of [Lucene](http://lucene.apache.org/) and [Spark](http://sparkjava.com/), deployed on [Heroku](https://www.heroku.com/).** [Site preview](http://gankio.herokuapp.com/) 41 | 42 | With dragnet, Lucene and the other open-source tools in hand, search becomes much more tractable. See [Ganks for gank.io](https://github.com/hujiaweibujidao/Ganks-for-gank.io) for the details. 43 | 44 | ![img](images/gankio.png) 45 | 
The search API exposed by [http://gankio.herokuapp.com/](http://gankio.herokuapp.com/) works as shown below: send a POST request to `http://gankio.herokuapp.com/search` with the search keyword in a request-body parameter named `keyword`. **Anyone can call this API, so give it a try!** 47 | 48 | ![img](images/gankio-heroku-search.png) 49 | 50 | **You can test the API with any HTTP tool. Just note that my Heroku account is on the free plan, so the app is occasionally suspended. Good luck!** 🙈🙈🙈 51 | 52 | ##### (2) Gank Alfred Workflow 53 | 54 | With the backend search API in place, the Alfred Workflow is easy to write. It uses the widely adopted Python library [deanishe/alfred-workflow](https://github.com/deanishe/alfred-workflow/), which wraps much of a workflow's plumbing: preprocessing user input, caching request data, updating the workflow, and so on. The author has even written an excellent [tutorial](http://www.deanishe.net/alfred-workflow/tutorial.html) to get developers up to speed quickly. 55 | 56 | **The Gank Alfred Workflow returns the top 10 ganks matching the keyword you type, and a keystroke opens the selected gank's URL in your default browser.** Yes, that's all. But if you feel the need, you could also show the latest daily digest by default, or build your own search variations on top of the API that [gank.io](http://gank.io/api) provides. 57 | 58 | #### 4. Screenshots 59 | 60 | ![img](images/search2.png) 61 | 62 | 63 | 64 | ![img](images/search3.png) 65 | 66 | 67 | 68 | ![img](images/search4.png) 69 | 70 | #### 5. Afterword 71 | 72 | Clearly, the more ganks the backend holds, the better this workflow's search experience becomes (all in the name of staying away from Baidu 😓). That is exactly what [GankHub](https://github.com/hujiaweibujidao/GankHub), another project I am still working on, is about: as the name suggests, an enhanced `干货集中营` whose data covers both [gank.io](http://gank.io) and the ganks from the Android development weekly (handled by yet another project of mine, [Ganks-for-andoirdweekly.net](https://github.com/hujiaweibujidao/Ganks-for-andoirdweekly.net)). One thorny problem remains: **how do we pick up backend data updates in good time?** Heroku charges for scheduled tasks!
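The Heroku search endpoint described in section 3 (a POST to `http://gankio.herokuapp.com/search` with a single `keyword` form field, answering with a JSON list of ganks) can be exercised outside Alfred with a few lines of Python. This is a hedged sketch: the response field names (`title`, `source`, `url`) are taken from `gank.py`, and since the app runs on a free Heroku plan it may be asleep, so the request can fail.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = 'http://gankio.herokuapp.com/search'


def build_search_body(keyword):
    """Encode the single `keyword` form field the endpoint expects."""
    return urlencode({'keyword': keyword}).encode('utf-8')


def search(keyword):
    """POST the keyword and decode the JSON list of ganks (network call)."""
    with urlopen(SEARCH_URL, data=build_search_body(keyword)) as resp:
        return json.loads(resp.read().decode('utf-8'))


# Example (requires the Heroku app to be awake):
#   for gank in search('animation')[:10]:
#       print(gank['title'], '->', gank['url'])
```

The same request can of course be sent with `curl` or any other HTTP tool; the endpoint takes nothing but the one form field.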
73 | 74 | If you are interested in my work, remember to follow me on GitHub, or check out [my blog](http://hujiaweibujidao.github.io/). 75 | 76 | #### The license 77 | 78 | ``` 79 | The MIT License (MIT) 80 | 81 | Copyright (c) 2016 Hujiawei 82 | 83 | Permission is hereby granted, free of charge, to any person obtaining a copy 84 | of this software and associated documentation files (the "Software"), to deal 85 | in the Software without restriction, including without limitation the rights 86 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 87 | copies of the Software, and to permit persons to whom the Software is 88 | furnished to do so, subject to the following conditions: 89 | 90 | The above copyright notice and this permission notice shall be included in all 91 | copies or substantial portions of the Software. 92 | 93 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 94 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 95 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 96 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 97 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 98 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 99 | SOFTWARE. 100 | ``` 101 | 102 | 103 | 104 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![Github All Releases](https://img.shields.io/github/downloads/hujiaweibujidao/Gank-Alfred-Workflow/total.svg) 2 | 3 | # Gank Alfred Workflow 4 | 5 | > The missing Alfred Workflow for searching ganks(干货) in gank.io 6 | 7 | Gank searcher: a homemade Alfred Workflow that searches [gank.io](http://gank.io) for ganks (干货)! 8 | 9 | This is V2 of the gank searcher, now that [@daimajia](https://github.com/daimajia) has shipped an official gank search API! 10 | 11 | The official search API is simply in a different league: it is convenient to use and the results are rich! 
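For reference, the official search API mentioned above was, at the time of writing, a plain GET endpoint of the form `/api/search/query/<keyword>/category/<category>/count/<n>/page/<p>` under `http://gank.io`. Treat that pattern as an assumption: the authoritative description lives at http://gank.io/api and may change. A sketch of building the request URL the V2 workflow relies on:

```python
def official_search_url(keyword, category='all', count=50, page=1):
    # Assumed URL pattern from the gank.io API docs at the time of writing;
    # category `all` searches across every gank category.
    return ('http://gank.io/api/search/query/{0}/category/{1}/'
            'count/{2}/page/{3}'.format(keyword, category, count, page))
```

The default `count=50` matches the 50 results the workflow displays; fetching the URL returns JSON whose `results` field holds the matching ganks.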
12 | 13 | #### Usage 14 | 15 | [Download the latest release of the workflow file and double-click it to open with Alfred](https://github.com/hujiaweibujidao/Gank-Alfred-Workflow/releases). 16 | 17 | **Using it requires Alfred with the Powerpack enabled.** 18 | 19 | The trigger keyword for a search is `gs` (short for `gank search`) 20 | 21 | ![img](images/gs_valueanimator.png) 22 | 23 | With the backend search API in place, the Alfred Workflow is easy to write. It uses the Python library [deanishe/alfred-workflow](https://github.com/deanishe/alfred-workflow/), which wraps much of a workflow's plumbing: preprocessing user input, caching request data, updating the workflow, and so on. The author has even written an excellent [tutorial](http://www.deanishe.net/alfred-workflow/tutorial.html) to get developers up to speed quickly. 24 | 25 | **The workflow's main function is to return the top 50 ganks matching the keyword you type (this data is cached for 10 minutes); pressing Enter opens the gank's URL in your default browser. If you type no keyword at all, it shows today's ganks by default (cached for 1 minute). Each gank category gets its own little icon, and every entry shows its recommender underneath.** 26 | 27 | ![img](images/gs_today.png) 28 | 29 | If you are interested in my work, remember to follow me on GitHub, or check out [my blog](http://hujiaweibujidao.github.io/). 30 | 31 | #### The license 32 | 33 | ``` 34 | The MIT License (MIT) 35 | 36 | Copyright (c) 2016 Hujiawei 37 | 38 | Permission is hereby granted, free of charge, to any person obtaining a copy 39 | of this software and associated documentation files (the "Software"), to deal 40 | in the Software without restriction, including without limitation the rights 41 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 42 | copies of the Software, and to permit persons to whom the Software is 43 | furnished to do so, subject to the following conditions: 44 | 45 | The above copyright notice and this permission notice shall be included in all 46 | copies or substantial portions of the Software. 47 | 48 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 49 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 50 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 51 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 52 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 53 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 54 | SOFTWARE. 55 | ``` 56 | 57 | 58 | 59 | -------------------------------------------------------------------------------- /images/gank today.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/gank today.png -------------------------------------------------------------------------------- /images/gankio-heroku-search.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/gankio-heroku-search.png -------------------------------------------------------------------------------- /images/gankio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/gankio.png -------------------------------------------------------------------------------- /images/gs_today.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/gs_today.png -------------------------------------------------------------------------------- /images/gs_valueanimator.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/gs_valueanimator.png -------------------------------------------------------------------------------- /images/search-gankio.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/search-gankio.png -------------------------------------------------------------------------------- /images/search1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/search1.png -------------------------------------------------------------------------------- /images/search2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/search2.png -------------------------------------------------------------------------------- /images/search3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/search3.png -------------------------------------------------------------------------------- /images/search4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/images/search4.png -------------------------------------------------------------------------------- /releases/Gank.1.0.2.alfredworkflow: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/releases/Gank.1.0.2.alfredworkflow -------------------------------------------------------------------------------- /releases/Gank.1.0.3.alfredworkflow: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/releases/Gank.1.0.3.alfredworkflow -------------------------------------------------------------------------------- /releases/Gank.2.0.0.alfredworkflow: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/releases/Gank.2.0.0.alfredworkflow -------------------------------------------------------------------------------- /source-v1/gank.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # encoding: utf-8 3 | 4 | 5 | import sys 6 | 7 | from workflow import Workflow, ICON_WEB, web 8 | 9 | ICON_DEFAULT = 'icon.png' 10 | 11 | def search(query): 12 | # search the ganks from gank.io 13 | url = 'http://gankio.herokuapp.com/search' 14 | # url = 'http://ganhuo.herokuapp.com/search' 15 | params = dict(keyword=query) 16 | r = web.post(url, params) 17 | 18 | # throw an error if request failed, Workflow will catch this and show it to the user 19 | r.raise_for_status() 20 | 21 | return r.json() 22 | 23 | 24 | def main(wf): 25 | # The Workflow instance will be passed to the function 26 | # you call from `Workflow.run`. Not so useful, as 27 | # the `wf` object created in `if __name__ ...` below is global. 28 | # 29 | # Get query from Alfred 30 | query = wf.args[0] 31 | 32 | # Search ganks or load from cached data, 10 mins 33 | def wrapper(): 34 | return search(query) 35 | ganks = wf.cached_data(query, wrapper, max_age=600) 36 | 37 | # Iterate over the ganks returned by the search API and add a result row for each 38 | for gank in ganks: 39 | wf.add_item(title=gank['title'], 40 | subtitle=gank['source'], 41 | arg=gank['url'], 42 | valid=True, 43 | icon=ICON_DEFAULT) 44 | 45 | # Send output to Alfred. You can only call this once. 
46 | # Well, you *can* call it multiple times, but Alfred won't be listening 47 | # any more... 48 | wf.send_feedback() 49 | 50 | 51 | if __name__ == '__main__': 52 | # Create a global `Workflow` object, with self-update settings so the 53 | # workflow can check GitHub releases for newer versions 54 | wf = Workflow(update_settings={ 55 | 'github_slug': 'hujiaweibujidao/Gank-Alfred-Workflow', 56 | 'frequency': 7 57 | }) 58 | 59 | # Install a newer release if the background check found one; this must 60 | # run before `wf.run()`, because `sys.exit` below never returns 61 | if wf.update_available: 62 | wf.start_update() 63 | 64 | # Call your entry function via `Workflow.run()` to enable its helper 65 | # functions, like exception catching, ARGV normalization, magic 66 | # arguments etc. 67 | sys.exit(wf.run(main)) 68 | 
 -------------------------------------------------------------------------------- /source-v1/icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source-v1/icon.png -------------------------------------------------------------------------------- /source-v1/info.plist: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | bundleid 6 | gank.io 7 | category 8 | Tools 9 | connections 10 | 11 | 161947FA-07D7-46BC-AE79-EFF41BBA87EB 12 | 13 | 14 | destinationuid 15 | DCE01299-BCDC-442F-AA5F-13FC1A803AD3 16 | modifiers 17 | 0 18 | modifiersubtext 19 | 20 | 21 | 22 | 23 | createdby 24 | hujiawei 25 | description 26 | Gank searcher for gank.io (干货集中营的干货搜索器) 27 | disabled 28 | 29 | name 30 | Gank 31 | objects 32 | 33 | 34 | config 35 | 36 | plusspaces 37 | 38 | url 39 | {query} 40 | utf8 41 | 42 | 43 | type 44 | alfred.workflow.action.openurl 45 | uid 46 | DCE01299-BCDC-442F-AA5F-13FC1A803AD3 47 | version 48 | 0 49 | 50 | 51 | config 52 | 53 | argumenttype 54 | 0 55 | escaping 56 | 32 57 | keyword 58 | gk 59 | queuedelaycustom 60 | 3 61 | queuedelayimmediatelyinitially 62 | 63 | queuedelaymode 64 | 0 65 | queuemode 66 | 1 67 | runningsubtext 68 | 客官，请稍等...... 
69 | script 70 | python gank.py "{query}" 71 | subtext 72 | 根据您的关键词,搜索10条最相关的干货! 73 | title 74 | 干货搜索器 75 | type 76 | 0 77 | withspace 78 | 79 | 80 | type 81 | alfred.workflow.input.scriptfilter 82 | uid 83 | 161947FA-07D7-46BC-AE79-EFF41BBA87EB 84 | version 85 | 0 86 | 87 | 88 | readme 89 | 90 | uidata 91 | 92 | 161947FA-07D7-46BC-AE79-EFF41BBA87EB 93 | 94 | ypos 95 | 110 96 | 97 | DCE01299-BCDC-442F-AA5F-13FC1A803AD3 98 | 99 | ypos 100 | 10 101 | 102 | 103 | webaddress 104 | http://gank.io 105 | 106 | 107 | -------------------------------------------------------------------------------- /source-v1/version: -------------------------------------------------------------------------------- 1 | 2.0.0 -------------------------------------------------------------------------------- /source-v1/workflow/Notify.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source-v1/workflow/Notify.tgz -------------------------------------------------------------------------------- /source-v1/workflow/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2014 Dean Jackson 5 | # 6 | # MIT Licence. See http://opensource.org/licenses/MIT 7 | # 8 | # Created on 2014-02-15 9 | # 10 | 11 | """ 12 | A Python helper library for `Alfred 2 `_ Workflow 13 | authors. 
14 | """ 15 | 16 | import os 17 | 18 | __title__ = 'Alfred-Workflow' 19 | __version__ = open(os.path.join(os.path.dirname(__file__), 'version')).read() 20 | __author__ = 'Dean Jackson' 21 | __licence__ = 'MIT' 22 | __copyright__ = 'Copyright 2014 Dean Jackson' 23 | 24 | 25 | # Workflow objects 26 | from .workflow import Workflow, manager 27 | 28 | # Exceptions 29 | from .workflow import PasswordNotFound, KeychainError 30 | 31 | # Icons 32 | from .workflow import ( 33 | ICON_ACCOUNT, 34 | ICON_BURN, 35 | ICON_CLOCK, 36 | ICON_COLOR, 37 | ICON_COLOUR, 38 | ICON_EJECT, 39 | ICON_ERROR, 40 | ICON_FAVORITE, 41 | ICON_FAVOURITE, 42 | ICON_GROUP, 43 | ICON_HELP, 44 | ICON_HOME, 45 | ICON_INFO, 46 | ICON_NETWORK, 47 | ICON_NOTE, 48 | ICON_SETTINGS, 49 | ICON_SWIRL, 50 | ICON_SWITCH, 51 | ICON_SYNC, 52 | ICON_TRASH, 53 | ICON_USER, 54 | ICON_WARNING, 55 | ICON_WEB, 56 | ) 57 | 58 | # Filter matching rules 59 | from .workflow import ( 60 | MATCH_ALL, 61 | MATCH_ALLCHARS, 62 | MATCH_ATOM, 63 | MATCH_CAPITALS, 64 | MATCH_INITIALS, 65 | MATCH_INITIALS_CONTAIN, 66 | MATCH_INITIALS_STARTSWITH, 67 | MATCH_STARTSWITH, 68 | MATCH_SUBSTRING, 69 | ) 70 | 71 | __all__ = [ 72 | 'Workflow', 73 | 'manager', 74 | 'PasswordNotFound', 75 | 'KeychainError', 76 | 'ICON_ACCOUNT', 77 | 'ICON_BURN', 78 | 'ICON_CLOCK', 79 | 'ICON_COLOR', 80 | 'ICON_COLOUR', 81 | 'ICON_EJECT', 82 | 'ICON_ERROR', 83 | 'ICON_FAVORITE', 84 | 'ICON_FAVOURITE', 85 | 'ICON_GROUP', 86 | 'ICON_HELP', 87 | 'ICON_HOME', 88 | 'ICON_INFO', 89 | 'ICON_NETWORK', 90 | 'ICON_NOTE', 91 | 'ICON_SETTINGS', 92 | 'ICON_SWIRL', 93 | 'ICON_SWITCH', 94 | 'ICON_SYNC', 95 | 'ICON_TRASH', 96 | 'ICON_USER', 97 | 'ICON_WARNING', 98 | 'ICON_WEB', 99 | 'MATCH_ALL', 100 | 'MATCH_ALLCHARS', 101 | 'MATCH_ATOM', 102 | 'MATCH_CAPITALS', 103 | 'MATCH_INITIALS', 104 | 'MATCH_INITIALS_CONTAIN', 105 | 'MATCH_INITIALS_STARTSWITH', 106 | 'MATCH_STARTSWITH', 107 | 'MATCH_SUBSTRING', 108 | ] 109 | 
-------------------------------------------------------------------------------- /source-v1/workflow/background.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2014 deanishe@deanishe.net 5 | # 6 | # MIT Licence. See http://opensource.org/licenses/MIT 7 | # 8 | # Created on 2014-04-06 9 | # 10 | 11 | """ 12 | Run background tasks 13 | """ 14 | 15 | from __future__ import print_function, unicode_literals 16 | 17 | import sys 18 | import os 19 | import subprocess 20 | import pickle 21 | 22 | from workflow import Workflow 23 | 24 | __all__ = ['is_running', 'run_in_background'] 25 | 26 | _wf = None 27 | 28 | 29 | def wf(): 30 | global _wf 31 | if _wf is None: 32 | _wf = Workflow() 33 | return _wf 34 | 35 | 36 | def _arg_cache(name): 37 | """Return path to pickle cache file for arguments 38 | 39 | :param name: name of task 40 | :type name: ``unicode`` 41 | :returns: Path to cache file 42 | :rtype: ``unicode`` filepath 43 | 44 | """ 45 | 46 | return wf().cachefile('{0}.argcache'.format(name)) 47 | 48 | 49 | def _pid_file(name): 50 | """Return path to PID file for ``name`` 51 | 52 | :param name: name of task 53 | :type name: ``unicode`` 54 | :returns: Path to PID file for task 55 | :rtype: ``unicode`` filepath 56 | 57 | """ 58 | 59 | return wf().cachefile('{0}.pid'.format(name)) 60 | 61 | 62 | def _process_exists(pid): 63 | """Check if a process with PID ``pid`` exists 64 | 65 | :param pid: PID to check 66 | :type pid: ``int`` 67 | :returns: ``True`` if process exists, else ``False`` 68 | :rtype: ``Boolean`` 69 | """ 70 | 71 | try: 72 | os.kill(pid, 0) 73 | except OSError: # not running 74 | return False 75 | return True 76 | 77 | 78 | def is_running(name): 79 | """ 80 | Test whether task is running under ``name`` 81 | 82 | :param name: name of task 83 | :type name: ``unicode`` 84 | :returns: ``True`` if task with name ``name`` is running, else ``False`` 85 | 
:rtype: ``Boolean`` 86 | 87 | """ 88 | pidfile = _pid_file(name) 89 | if not os.path.exists(pidfile): 90 | return False 91 | 92 | with open(pidfile, 'rb') as file_obj: 93 | pid = int(file_obj.read().strip()) 94 | 95 | if _process_exists(pid): 96 | return True 97 | 98 | elif os.path.exists(pidfile): 99 | os.unlink(pidfile) 100 | 101 | return False 102 | 103 | 104 | def _background(stdin='/dev/null', stdout='/dev/null', 105 | stderr='/dev/null'): # pragma: no cover 106 | """Fork the current process into a background daemon. 107 | 108 | :param stdin: where to read input 109 | :type stdin: filepath 110 | :param stdout: where to write stdout output 111 | :type stdout: filepath 112 | :param stderr: where to write stderr output 113 | :type stderr: filepath 114 | 115 | """ 116 | 117 | # Do first fork. 118 | try: 119 | pid = os.fork() 120 | if pid > 0: 121 | sys.exit(0) # Exit first parent. 122 | except OSError as e: 123 | wf().logger.critical("fork #1 failed: ({0:d}) {1}".format( 124 | e.errno, e.strerror)) 125 | sys.exit(1) 126 | # Decouple from parent environment. 127 | os.chdir(wf().workflowdir) 128 | os.umask(0) 129 | os.setsid() 130 | # Do second fork. 131 | try: 132 | pid = os.fork() 133 | if pid > 0: 134 | sys.exit(0) # Exit second parent. 135 | except OSError as e: 136 | wf().logger.critical("fork #2 failed: ({0:d}) {1}".format( 137 | e.errno, e.strerror)) 138 | sys.exit(1) 139 | # Now I am a daemon! 140 | # Redirect standard file descriptors. 141 | si = file(stdin, 'r', 0) 142 | so = file(stdout, 'a+', 0) 143 | se = file(stderr, 'a+', 0) 144 | if hasattr(sys.stdin, 'fileno'): 145 | os.dup2(si.fileno(), sys.stdin.fileno()) 146 | if hasattr(sys.stdout, 'fileno'): 147 | os.dup2(so.fileno(), sys.stdout.fileno()) 148 | if hasattr(sys.stderr, 'fileno'): 149 | os.dup2(se.fileno(), sys.stderr.fileno()) 150 | 151 | 152 | def run_in_background(name, args, **kwargs): 153 | """Pickle arguments to cache file, then call this script again via 154 | :func:`subprocess.call`. 
155 | 156 | :param name: name of task 157 | :type name: ``unicode`` 158 | :param args: arguments passed as first argument to :func:`subprocess.call` 159 | :param \**kwargs: keyword arguments to :func:`subprocess.call` 160 | :returns: exit code of sub-process 161 | :rtype: ``int`` 162 | 163 | When you call this function, it caches its arguments and then calls 164 | ``background.py`` in a subprocess. The Python subprocess will load the 165 | cached arguments, fork into the background, and then run the command you 166 | specified. 167 | 168 | This function will return as soon as the ``background.py`` subprocess has 169 | forked, returning the exit code of *that* process (i.e. not of the command 170 | you're trying to run). 171 | 172 | If that process fails, an error will be written to the log file. 173 | 174 | If a process is already running under the same name, this function will 175 | return immediately and will not run the specified command. 176 | 177 | """ 178 | 179 | if is_running(name): 180 | wf().logger.info('Task `{0}` is already running'.format(name)) 181 | return 182 | 183 | argcache = _arg_cache(name) 184 | 185 | # Cache arguments 186 | with open(argcache, 'wb') as file_obj: 187 | pickle.dump({'args': args, 'kwargs': kwargs}, file_obj) 188 | wf().logger.debug('Command arguments cached to `{0}`'.format(argcache)) 189 | 190 | # Call this script 191 | cmd = ['/usr/bin/python', __file__, name] 192 | wf().logger.debug('Calling {0!r} ...'.format(cmd)) 193 | retcode = subprocess.call(cmd) 194 | if retcode: # pragma: no cover 195 | wf().logger.error('Failed to call task in background') 196 | else: 197 | wf().logger.debug('Executing task `{0}` in background...'.format(name)) 198 | return retcode 199 | 200 | 201 | def main(wf): # pragma: no cover 202 | """ 203 | Load cached arguments, fork into background, then call 204 | :meth:`subprocess.call` with cached arguments 205 | 206 | """ 207 | 208 | name = wf.args[0] 209 | argcache = _arg_cache(name) 210 | if not 
os.path.exists(argcache): 211 | wf.logger.critical('No arg cache found : {0!r}'.format(argcache)) 212 | return 1 213 | 214 | # Load cached arguments 215 | with open(argcache, 'rb') as file_obj: 216 | data = pickle.load(file_obj) 217 | 218 | # Cached arguments 219 | args = data['args'] 220 | kwargs = data['kwargs'] 221 | 222 | # Delete argument cache file 223 | os.unlink(argcache) 224 | 225 | pidfile = _pid_file(name) 226 | 227 | # Fork to background 228 | _background() 229 | 230 | # Write PID to file 231 | with open(pidfile, 'wb') as file_obj: 232 | file_obj.write('{0}'.format(os.getpid())) 233 | 234 | # Run the command 235 | try: 236 | wf.logger.debug('Task `{0}` running'.format(name)) 237 | wf.logger.debug('cmd : {0!r}'.format(args)) 238 | 239 | retcode = subprocess.call(args, **kwargs) 240 | 241 | if retcode: 242 | wf.logger.error('Command failed with [{0}] : {1!r}'.format( 243 | retcode, args)) 244 | 245 | finally: 246 | if os.path.exists(pidfile): 247 | os.unlink(pidfile) 248 | wf.logger.debug('Task `{0}` finished'.format(name)) 249 | 250 | 251 | if __name__ == '__main__': # pragma: no cover 252 | wf().run(main) 253 | -------------------------------------------------------------------------------- /source-v1/workflow/notify.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2015 deanishe@deanishe.net 5 | # 6 | # MIT Licence. See http://opensource.org/licenses/MIT 7 | # 8 | # Created on 2015-11-26 9 | # 10 | 11 | # TODO: Exclude this module from test and code coverage in py2.6 12 | 13 | """ 14 | Post notifications via the OS X Notification Center. This feature 15 | is only available on Mountain Lion (10.8) and later. It will 16 | silently fail on older systems. 17 | 18 | The main API is a single function, :func:`~workflow.notify.notify`. 19 | 20 | It works by copying a simple application to your workflow's data 21 | directory. 
It replaces the application's icon with your workflow's 22 | icon and then calls the application to post notifications. 23 | """ 24 | 25 | from __future__ import print_function, unicode_literals 26 | 27 | import os 28 | import plistlib 29 | import shutil 30 | import subprocess 31 | import sys 32 | import tarfile 33 | import tempfile 34 | import uuid 35 | 36 | import workflow 37 | 38 | 39 | _wf = None 40 | _log = None 41 | 42 | 43 | #: Available system sounds from System Preferences > Sound > Sound Effects 44 | SOUNDS = ( 45 | 'Basso', 46 | 'Blow', 47 | 'Bottle', 48 | 'Frog', 49 | 'Funk', 50 | 'Glass', 51 | 'Hero', 52 | 'Morse', 53 | 'Ping', 54 | 'Pop', 55 | 'Purr', 56 | 'Sosumi', 57 | 'Submarine', 58 | 'Tink', 59 | ) 60 | 61 | 62 | def wf(): 63 | """Return `Workflow` object for this module. 64 | 65 | Returns: 66 | workflow.Workflow: `Workflow` object for current workflow. 67 | """ 68 | global _wf 69 | if _wf is None: 70 | _wf = workflow.Workflow() 71 | return _wf 72 | 73 | 74 | def log(): 75 | """Return logger for this module. 76 | 77 | Returns: 78 | logging.Logger: Logger for this module. 79 | """ 80 | global _log 81 | if _log is None: 82 | _log = wf().logger 83 | return _log 84 | 85 | 86 | def notifier_program(): 87 | """Return path to notifier applet executable. 88 | 89 | Returns: 90 | unicode: Path to Notify.app `applet` executable. 91 | """ 92 | return wf().datafile('Notify.app/Contents/MacOS/applet') 93 | 94 | 95 | def notifier_icon_path(): 96 | """Return path to icon file in installed Notify.app. 97 | 98 | Returns: 99 | unicode: Path to `applet.icns` within the app bundle. 100 | """ 101 | return wf().datafile('Notify.app/Contents/Resources/applet.icns') 102 | 103 | 104 | def install_notifier(): 105 | """Extract `Notify.app` from the workflow to data directory. 106 | 107 | Changes the bundle ID of the installed app and gives it the 108 | workflow's icon. 
109 | """ 110 | archive = os.path.join(os.path.dirname(__file__), 'Notify.tgz') 111 | destdir = wf().datadir 112 | app_path = os.path.join(destdir, 'Notify.app') 113 | n = notifier_program() 114 | log().debug("Installing Notify.app to %r ...", destdir) 115 | # z = zipfile.ZipFile(archive, 'r') 116 | # z.extractall(destdir) 117 | tgz = tarfile.open(archive, 'r:gz') 118 | tgz.extractall(destdir) 119 | assert os.path.exists(n), ( 120 | "Notify.app could not be installed in {0!r}.".format(destdir)) 121 | 122 | # Replace applet icon 123 | icon = notifier_icon_path() 124 | workflow_icon = wf().workflowfile('icon.png') 125 | if os.path.exists(icon): 126 | os.unlink(icon) 127 | 128 | png_to_icns(workflow_icon, icon) 129 | 130 | # Set file icon 131 | # PyObjC isn't available for 2.6, so this is 2.7 only. Actually, 132 | # none of this code will "work" on pre-10.8 systems. Let it run 133 | # until I figure out a better way of excluding this module 134 | # from coverage in py2.6. 135 | if sys.version_info >= (2, 7): # pragma: no cover 136 | from AppKit import NSWorkspace, NSImage 137 | 138 | ws = NSWorkspace.sharedWorkspace() 139 | img = NSImage.alloc().init() 140 | img.initWithContentsOfFile_(icon) 141 | ws.setIcon_forFile_options_(img, app_path, 0) 142 | 143 | # Change bundle ID of installed app 144 | ip_path = os.path.join(app_path, 'Contents/Info.plist') 145 | bundle_id = '{0}.{1}'.format(wf().bundleid, uuid.uuid4().hex) 146 | data = plistlib.readPlist(ip_path) 147 | log().debug('Changing bundle ID to {0!r}'.format(bundle_id)) 148 | data['CFBundleIdentifier'] = bundle_id 149 | plistlib.writePlist(data, ip_path) 150 | 151 | 152 | def validate_sound(sound): 153 | """Coerce `sound` to valid sound name. 154 | 155 | Returns `None` for invalid sounds. Sound names can be found 156 | in `System Preferences > Sound > Sound Effects`. 157 | 158 | Args: 159 | sound (str): Name of system sound. 160 | 161 | Returns: 162 | str: Proper name of sound or `None`. 
163 | """ 164 | if not sound: 165 | return None 166 | 167 | # Case-insensitive comparison of `sound` 168 | if sound.lower() in [s.lower() for s in SOUNDS]: 169 | # Title-case is correct for all system sounds as of OS X 10.11 170 | return sound.title() 171 | return None 172 | 173 | 174 | def notify(title='', text='', sound=None): 175 | """Post notification via Notify.app helper. 176 | 177 | Args: 178 | title (str, optional): Notification title. 179 | text (str, optional): Notification body text. 180 | sound (str, optional): Name of sound to play. 181 | 182 | Raises: 183 | ValueError: Raised if both `title` and `text` are empty. 184 | 185 | Returns: 186 | bool: `True` if notification was posted, else `False`. 187 | """ 188 | if title == text == '': 189 | raise ValueError('Empty notification') 190 | 191 | sound = validate_sound(sound) or '' 192 | 193 | n = notifier_program() 194 | 195 | if not os.path.exists(n): 196 | install_notifier() 197 | 198 | env = os.environ.copy() 199 | enc = 'utf-8' 200 | env['NOTIFY_TITLE'] = title.encode(enc) 201 | env['NOTIFY_MESSAGE'] = text.encode(enc) 202 | env['NOTIFY_SOUND'] = sound.encode(enc) 203 | cmd = [n] 204 | retcode = subprocess.call(cmd, env=env) 205 | if retcode == 0: 206 | return True 207 | 208 | log().error('Notify.app exited with status {0}.'.format(retcode)) 209 | return False 210 | 211 | 212 | def convert_image(inpath, outpath, size): 213 | """Convert an image file using `sips`. 214 | 215 | Args: 216 | inpath (str): Path of source file. 217 | outpath (str): Path to destination file. 218 | size (int): Width and height of destination image in pixels. 219 | 220 | Raises: 221 | RuntimeError: Raised if `sips` exits with non-zero status. 
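`validate_sound` above matches case-insensitively, then title-cases the input to recover the canonical name. A standalone sketch of that behaviour (the `SOUNDS` tuple is copied from the module):

```python
SOUNDS = ('Basso', 'Blow', 'Bottle', 'Frog', 'Funk', 'Glass', 'Hero',
          'Morse', 'Ping', 'Pop', 'Purr', 'Sosumi', 'Submarine', 'Tink')

def validate_sound(sound):
    """Return the canonical sound name, or None if `sound` is unknown."""
    if not sound:
        return None
    # Case-insensitive membership test, as in the module above
    if sound.lower() in [s.lower() for s in SOUNDS]:
        # Title-case recovers the proper name for all system sounds
        return sound.title()
    return None
```

So `validate_sound('ping')` yields `'Ping'`, while unknown or empty names fall through to `None`, which `notify()` then treats as "no sound".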
222 | """ 223 | cmd = [ 224 | b'sips', 225 | b'-z', b'{0}'.format(size), b'{0}'.format(size), 226 | inpath, 227 | b'--out', outpath] 228 | # log().debug(cmd) 229 | with open(os.devnull, 'w') as pipe: 230 | retcode = subprocess.call(cmd, stdout=pipe, stderr=subprocess.STDOUT) 231 | 232 | if retcode != 0: 233 | raise RuntimeError('sips exited with {0}'.format(retcode)) 234 | 235 | 236 | def png_to_icns(png_path, icns_path): 237 | """Convert PNG file to ICNS using `iconutil`. 238 | 239 | Create an iconset from the source PNG file. Generate PNG files 240 | in each size required by OS X, then call `iconutil` to turn 241 | them into a single ICNS file. 242 | 243 | Args: 244 | png_path (str): Path to source PNG file. 245 | icns_path (str): Path to destination ICNS file. 246 | 247 | Raises: 248 | RuntimeError: Raised if `iconutil` or `sips` fail. 249 | """ 250 | tempdir = tempfile.mkdtemp(prefix='aw-', dir=wf().datadir) 251 | 252 | try: 253 | iconset = os.path.join(tempdir, 'Icon.iconset') 254 | 255 | assert not os.path.exists(iconset), ( 256 | "Iconset path already exists : {0!r}".format(iconset)) 257 | os.makedirs(iconset) 258 | 259 | # Copy source icon to icon set and generate all the other 260 | # sizes needed 261 | configs = [] 262 | for i in (16, 32, 128, 256, 512): 263 | configs.append(('icon_{0}x{0}.png'.format(i), i)) 264 | configs.append(('icon_{0}x{0}@2x.png'.format(i), i*2)) 265 | 266 | shutil.copy(png_path, os.path.join(iconset, 'icon_256x256.png')) 267 | shutil.copy(png_path, os.path.join(iconset, 'icon_128x128@2x.png')) 268 | 269 | for name, size in configs: 270 | outpath = os.path.join(iconset, name) 271 | if os.path.exists(outpath): 272 | continue 273 | convert_image(png_path, outpath, size) 274 | 275 | cmd = [ 276 | b'iconutil', 277 | b'-c', b'icns', 278 | b'-o', icns_path, 279 | iconset] 280 | 281 | retcode = subprocess.call(cmd) 282 | if retcode != 0: 283 | raise RuntimeError("iconutil exited with {0}".format(retcode)) 284 | 285 | assert 
os.path.exists(icns_path), ( 286 | "Generated ICNS file not found : {0!r}".format(icns_path)) 287 | finally: 288 | try: 289 | shutil.rmtree(tempdir) 290 | except OSError: # pragma: no cover 291 | pass 292 | 293 | 294 | # def notify_native(title='', text='', sound=''): 295 | # """Post notification via the native API (via pyobjc). 296 | 297 | # At least one of `title` or `text` must be specified. 298 | 299 | # This method will *always* show the Python launcher icon (i.e. the 300 | # rocket with the snakes on it). 301 | 302 | # Args: 303 | # title (str, optional): Notification title. 304 | # text (str, optional): Notification body text. 305 | # sound (str, optional): Name of sound to play. 306 | 307 | # """ 308 | 309 | # if title == text == '': 310 | # raise ValueError('Empty notification') 311 | 312 | # import Foundation 313 | 314 | # sound = sound or Foundation.NSUserNotificationDefaultSoundName 315 | 316 | # n = Foundation.NSUserNotification.alloc().init() 317 | # n.setTitle_(title) 318 | # n.setInformativeText_(text) 319 | # n.setSoundName_(sound) 320 | # nc = Foundation.NSUserNotificationCenter.defaultUserNotificationCenter() 321 | # nc.deliverNotification_(n) 322 | 323 | 324 | if __name__ == '__main__': # pragma: nocover 325 | # Simple command-line script to test module with 326 | # This won't work on 2.6, as `argparse` isn't available 327 | # by default. 
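The loop at the top of `png_to_icns` builds Apple's iconset size table before `iconutil` is invoked. Extracted as a standalone sketch (the function name is ours, not the module's):

```python
def iconset_configs(sizes=(16, 32, 128, 256, 512)):
    """Return (filename, pixels) pairs for an .iconset directory.

    Mirrors the configs loop in png_to_icns: each point size gets a
    normal PNG plus an @2x (Retina) PNG at double the pixel size.
    """
    configs = []
    for i in sizes:
        configs.append(('icon_{0}x{0}.png'.format(i), i))
        configs.append(('icon_{0}x{0}@2x.png'.format(i), i * 2))
    return configs
```

Note the overlap this creates: `icon_256x256.png` and `icon_128x128@2x.png` are both 256 px, which is why `png_to_icns` pre-copies the source PNG to those two names and skips regenerating existing files.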
328 | import argparse 329 | 330 | from unicodedata import normalize 331 | 332 | def uni(s): 333 | """Coerce `s` to normalised Unicode.""" 334 | ustr = s.decode('utf-8') 335 | return normalize('NFD', ustr) 336 | 337 | p = argparse.ArgumentParser() 338 | p.add_argument('-p', '--png', help="PNG image to convert to ICNS.") 339 | p.add_argument('-l', '--list-sounds', help="Show available sounds.", 340 | action='store_true') 341 | p.add_argument('-t', '--title', 342 | help="Notification title.", type=uni, 343 | default='') 344 | p.add_argument('-s', '--sound', type=uni, 345 | help="Optional notification sound.", default='') 346 | p.add_argument('text', type=uni, 347 | help="Notification body text.", default='', nargs='?') 348 | o = p.parse_args() 349 | 350 | # List available sounds 351 | if o.list_sounds: 352 | for sound in SOUNDS: 353 | print(sound) 354 | sys.exit(0) 355 | 356 | # Convert PNG to ICNS 357 | if o.png: 358 | icns = os.path.join( 359 | os.path.dirname(o.png), 360 | b'{0}{1}'.format(os.path.splitext(os.path.basename(o.png))[0], 361 | '.icns')) 362 | 363 | print('Converting {0!r} to {1!r} ...'.format(o.png, icns), 364 | file=sys.stderr) 365 | 366 | assert not os.path.exists(icns), ( 367 | "Destination file already exists : {0}".format(icns)) 368 | 369 | png_to_icns(o.png, icns) 370 | sys.exit(0) 371 | 372 | # Post notification 373 | if o.title == o.text == '': 374 | print('ERROR: Empty notification.', file=sys.stderr) 375 | sys.exit(1) 376 | else: 377 | notify(o.title, o.text, o.sound) 378 | -------------------------------------------------------------------------------- /source-v1/workflow/update.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2014 Fabio Niephaus , 5 | # Dean Jackson 6 | # 7 | # MIT Licence. See http://opensource.org/licenses/MIT 8 | # 9 | # Created on 2014-08-16 10 | # 11 | 12 | """ 13 | Self-updating from GitHub 14 | 15 | .. 
versionadded:: 1.9 16 | 17 | .. note:: 18 | 19 | This module is not intended to be used directly. Automatic updates 20 | are controlled by the ``update_settings`` :class:`dict` passed to 21 | :class:`~workflow.workflow.Workflow` objects. 22 | 23 | """ 24 | 25 | from __future__ import print_function, unicode_literals 26 | 27 | import os 28 | import tempfile 29 | import re 30 | import subprocess 31 | 32 | import workflow 33 | import web 34 | 35 | # __all__ = [] 36 | 37 | 38 | RELEASES_BASE = 'https://api.github.com/repos/{0}/releases' 39 | 40 | 41 | _wf = None 42 | 43 | 44 | def wf(): 45 | global _wf 46 | if _wf is None: 47 | _wf = workflow.Workflow() 48 | return _wf 49 | 50 | 51 | class Version(object): 52 | """Mostly semantic versioning 53 | 54 | The main difference to proper :ref:`semantic versioning ` 55 | is that this implementation doesn't require a minor or patch version. 56 | """ 57 | 58 | #: Match version and pre-release/build information in version strings 59 | match_version = re.compile(r'([0-9\.]+)(.+)?').match 60 | 61 | def __init__(self, vstr): 62 | self.vstr = vstr 63 | self.major = 0 64 | self.minor = 0 65 | self.patch = 0 66 | self.suffix = '' 67 | self.build = '' 68 | self._parse(vstr) 69 | 70 | def _parse(self, vstr): 71 | if vstr.startswith('v'): 72 | m = self.match_version(vstr[1:]) 73 | else: 74 | m = self.match_version(vstr) 75 | if not m: 76 | raise ValueError('Invalid version number: {0}'.format(vstr)) 77 | 78 | version, suffix = m.groups() 79 | parts = self._parse_dotted_string(version) 80 | self.major = parts.pop(0) 81 | if len(parts): 82 | self.minor = parts.pop(0) 83 | if len(parts): 84 | self.patch = parts.pop(0) 85 | if not len(parts) == 0: 86 | raise ValueError('Invalid version (too long) : {0}'.format(vstr)) 87 | 88 | if suffix: 89 | # Build info 90 | idx = suffix.find('+') 91 | if idx > -1: 92 | self.build = suffix[idx+1:] 93 | suffix = suffix[:idx] 94 | if suffix: 95 | if not suffix.startswith('-'): 96 | raise ValueError( 97 | 
'Invalid suffix : `{0}`. Must start with `-`'.format( 98 | suffix)) 99 | self.suffix = suffix[1:] 100 | 101 | # wf().logger.debug('version str `{}` -> {}'.format(vstr, repr(self))) 102 | 103 | def _parse_dotted_string(self, s): 104 | """Parse string ``s`` into list of ints and strings""" 105 | parsed = [] 106 | parts = s.split('.') 107 | for p in parts: 108 | if p.isdigit(): 109 | p = int(p) 110 | parsed.append(p) 111 | return parsed 112 | 113 | @property 114 | def tuple(self): 115 | """Version number as a tuple of major, minor, patch, pre-release""" 116 | 117 | return (self.major, self.minor, self.patch, self.suffix) 118 | 119 | def __lt__(self, other): 120 | if not isinstance(other, Version): 121 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 122 | t = self.tuple[:3] 123 | o = other.tuple[:3] 124 | if t < o: 125 | return True 126 | if t == o: # We need to compare suffixes 127 | if self.suffix and not other.suffix: 128 | return True 129 | if other.suffix and not self.suffix: 130 | return False 131 | return (self._parse_dotted_string(self.suffix) < 132 | self._parse_dotted_string(other.suffix)) 133 | # t > o 134 | return False 135 | 136 | def __eq__(self, other): 137 | if not isinstance(other, Version): 138 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 139 | return self.tuple == other.tuple 140 | 141 | def __ne__(self, other): 142 | return not self.__eq__(other) 143 | 144 | def __gt__(self, other): 145 | if not isinstance(other, Version): 146 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 147 | return other.__lt__(self) 148 | 149 | def __le__(self, other): 150 | if not isinstance(other, Version): 151 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 152 | return not other.__lt__(self) 153 | 154 | def __ge__(self, other): 155 | return not self.__lt__(other) 156 | 157 | def __str__(self): 158 | vstr = '{0}.{1}.{2}'.format(self.major, self.minor, self.patch) 159 | if self.suffix: 160 | vstr += 
'-{0}'.format(self.suffix) 161 | if self.build: 162 | vstr += '+{0}'.format(self.build) 163 | return vstr 164 | 165 | def __repr__(self): 166 | return "Version('{0}')".format(str(self)) 167 | 168 | 169 | def download_workflow(url): 170 | """Download workflow at ``url`` to a local temporary file 171 | 172 | :param url: URL to .alfredworkflow file in GitHub repo 173 | :returns: path to downloaded file 174 | 175 | """ 176 | 177 | filename = url.split("/")[-1] 178 | 179 | if (not url.endswith('.alfredworkflow') or 180 | not filename.endswith('.alfredworkflow')): 181 | raise ValueError('Attachment `{0}` not a workflow'.format(filename)) 182 | 183 | local_path = os.path.join(tempfile.gettempdir(), filename) 184 | 185 | wf().logger.debug( 186 | 'Downloading updated workflow from `{0}` to `{1}` ...'.format( 187 | url, local_path)) 188 | 189 | response = web.get(url) 190 | 191 | with open(local_path, 'wb') as output: 192 | output.write(response.content) 193 | 194 | return local_path 195 | 196 | 197 | def build_api_url(slug): 198 | """Generate releases URL from GitHub slug 199 | 200 | :param slug: Repo name in form ``username/repo`` 201 | :returns: URL to the API endpoint for the repo's releases 202 | 203 | """ 204 | 205 | if len(slug.split('/')) != 2: 206 | raise ValueError('Invalid GitHub slug : {0}'.format(slug)) 207 | 208 | return RELEASES_BASE.format(slug) 209 | 210 | 211 | def get_valid_releases(github_slug, prereleases=False): 212 | """Return list of all valid releases 213 | 214 | :param github_slug: ``username/repo`` for workflow's GitHub repo 215 | :param prereleases: Whether to include pre-releases. 216 | :returns: list of dicts. Each :class:`dict` has the form 217 | ``{'version': '1.1', 'download_url': 'http://github.com/...', 218 | 'prerelease': False }`` 219 | 220 | 221 | A valid release is one that contains one ``.alfredworkflow`` file. 222 | 223 | If the GitHub version (i.e. tag) is of the form ``v1.1``, the leading 224 | ``v`` will be stripped. 
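`Version.__lt__` above compares the numeric tuple first, then treats a pre-release suffix as sorting before the corresponding final release. A minimal self-contained sketch of that ordering rule (not the class itself; build metadata is ignored and suffixes are compared lexically, a simplification of `_parse_dotted_string`):

```python
def parse(vstr):
    """Split 'v1.2.3-beta' into ((1, 2, 3), 'beta')."""
    if vstr.startswith('v'):
        vstr = vstr[1:]
    nums, _, suffix = vstr.partition('-')
    # Missing minor/patch default to 0, as in Version._parse
    parts = [int(p) for p in nums.split('.')] + [0, 0]
    return tuple(parts[:3]), suffix

def version_lt(a, b):
    """True if version string `a` sorts before `b`."""
    (ta, sa), (tb, sb) = parse(a), parse(b)
    if ta != tb:
        return ta < tb
    if sa and not sb:      # pre-release sorts before the final release
        return True
    if sb and not sa:
        return False
    return sa < sb
```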
225 | 226 | """ 227 | 228 | api_url = build_api_url(github_slug) 229 | releases = [] 230 | 231 | wf().logger.debug('Retrieving releases list from `{0}` ...'.format( 232 | api_url)) 233 | 234 | def retrieve_releases(): 235 | wf().logger.info( 236 | 'Retrieving releases for `{0}` ...'.format(github_slug)) 237 | return web.get(api_url).json() 238 | 239 | slug = github_slug.replace('/', '-') 240 | for release in wf().cached_data('gh-releases-{0}'.format(slug), 241 | retrieve_releases): 242 | version = release['tag_name'] 243 | download_urls = [] 244 | for asset in release.get('assets', []): 245 | url = asset.get('browser_download_url') 246 | if not url or not url.endswith('.alfredworkflow'): 247 | continue 248 | download_urls.append(url) 249 | 250 | # Validate release 251 | if release['prerelease'] and not prereleases: 252 | wf().logger.warning( 253 | 'Invalid release {0} : pre-release detected'.format(version)) 254 | continue 255 | if not download_urls: 256 | wf().logger.warning( 257 | 'Invalid release {0} : No workflow file'.format(version)) 258 | continue 259 | if len(download_urls) > 1: 260 | wf().logger.warning( 261 | 'Invalid release {0} : multiple workflow files'.format(version)) 262 | continue 263 | 264 | wf().logger.debug('Release `{0}` : {1}'.format(version, download_urls[0])) 265 | releases.append({ 266 | 'version': version, 267 | 'download_url': download_urls[0], 268 | 'prerelease': release['prerelease'] 269 | }) 270 | 271 | return releases 272 | 273 | 274 | def check_update(github_slug, current_version, prereleases=False): 275 | """Check whether a newer release is available on GitHub 276 | 277 | :param github_slug: ``username/repo`` for workflow's GitHub repo 278 | :param current_version: the currently installed version of the 279 | workflow. :ref:`Semantic versioning <semver>` is required. 280 | :param prereleases: Whether to include pre-releases. 
281 | :type current_version: ``unicode`` 282 | :returns: ``True`` if an update is available, else ``False`` 283 | 284 | If an update is available, its version number and download URL will 285 | be cached. 286 | 287 | """ 288 | 289 | releases = get_valid_releases(github_slug, prereleases) 290 | 291 | wf().logger.info('{0} releases for {1}'.format(len(releases), 292 | github_slug)) 293 | 294 | if not len(releases): 295 | raise ValueError('No valid releases for {0}'.format(github_slug)) 296 | 297 | # GitHub returns releases newest-first 298 | latest_release = releases[0] 299 | 300 | # (latest_version, download_url) = get_latest_release(releases) 301 | vr = Version(latest_release['version']) 302 | vl = Version(current_version) 303 | wf().logger.debug('Latest : {0!r} Installed : {1!r}'.format(vr, vl)) 304 | if vr > vl: 305 | 306 | wf().cache_data('__workflow_update_status', { 307 | 'version': latest_release['version'], 308 | 'download_url': latest_release['download_url'], 309 | 'available': True 310 | }) 311 | 312 | return True 313 | 314 | wf().cache_data('__workflow_update_status', { 315 | 'available': False 316 | }) 317 | return False 318 | 319 | 320 | def install_update(github_slug, current_version): 321 | """If a newer release is available, download and install it 322 | 323 | :param github_slug: ``username/repo`` for workflow's GitHub repo 324 | :param current_version: the currently installed version of the 325 | workflow. :ref:`Semantic versioning <semver>` is required. 326 | :type current_version: ``unicode`` 327 | 328 | If an update is available, it will be downloaded and installed. 329 | 330 | :returns: ``True`` if an update is installed, else ``False`` 331 | 332 | """ 333 | # TODO: `github_slug` and `current_version` are both unused. 
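`check_update` and `install_update` hand off through the `__workflow_update_status` cache entry shown above. A sketch of that handshake with a plain dict standing in for the cache (the helper name and the sample values are ours, not real release data):

```python
def should_install(update_data):
    """Mirror the guard at the top of install_update: proceed only if
    check_update previously cached an available release."""
    return bool(update_data and update_data.get('available'))

# Shape of the dict check_update caches when a newer release exists
status = {
    'version': 'v2.0.0',
    'download_url': 'https://example.com/Gank.alfredworkflow',
    'available': True,
}
```

After a successful install, `install_update` flips `available` back to `False` and re-caches the dict, so repeated runs are no-ops until the next `check_update`.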
334 | 335 | update_data = wf().cached_data('__workflow_update_status', max_age=0) 336 | 337 | if not update_data or not update_data.get('available'): 338 | wf().logger.info('No update available') 339 | return False 340 | 341 | local_file = download_workflow(update_data['download_url']) 342 | 343 | wf().logger.info('Installing updated workflow ...') 344 | subprocess.call(['open', local_file]) 345 | 346 | update_data['available'] = False 347 | wf().cache_data('__workflow_update_status', update_data) 348 | return True 349 | 350 | 351 | if __name__ == '__main__': # pragma: nocover 352 | import sys 353 | 354 | def show_help(): 355 | print('Usage : update.py (check|install) github_slug version [--prereleases]') 356 | sys.exit(1) 357 | 358 | argv = sys.argv[:] 359 | prereleases = '--prereleases' in argv 360 | 361 | if prereleases: 362 | argv.remove('--prereleases') 363 | 364 | if len(argv) != 4: 365 | show_help() 366 | 367 | action, github_slug, version = argv[1:] 368 | 369 | if action not in ('check', 'install'): 370 | show_help() 371 | 372 | if action == 'check': 373 | check_update(github_slug, version, prereleases) 374 | elif action == 'install': 375 | install_update(github_slug, version) 376 | -------------------------------------------------------------------------------- /source-v1/workflow/version: -------------------------------------------------------------------------------- 1 | 1.17.2 -------------------------------------------------------------------------------- /source-v1/workflow/web.py: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | # 3 | # Copyright (c) 2014 Dean Jackson 4 | # 5 | # MIT Licence. See http://opensource.org/licenses/MIT 6 | # 7 | # Created on 2014-02-15 8 | # 9 | 10 | """ 11 | A lightweight HTTP library with a requests-like interface. 
12 | """ 13 | 14 | from __future__ import print_function 15 | 16 | import codecs 17 | import json 18 | import mimetypes 19 | import os 20 | import random 21 | import re 22 | import socket 23 | import string 24 | import unicodedata 25 | import urllib 26 | import urllib2 27 | import urlparse 28 | import zlib 29 | 30 | 31 | USER_AGENT = u'Alfred-Workflow/1.17 (+http://www.deanishe.net/alfred-workflow)' 32 | 33 | # Valid characters for multipart form data boundaries 34 | BOUNDARY_CHARS = string.digits + string.ascii_letters 35 | 36 | # HTTP response codes 37 | RESPONSES = { 38 | 100: 'Continue', 39 | 101: 'Switching Protocols', 40 | 200: 'OK', 41 | 201: 'Created', 42 | 202: 'Accepted', 43 | 203: 'Non-Authoritative Information', 44 | 204: 'No Content', 45 | 205: 'Reset Content', 46 | 206: 'Partial Content', 47 | 300: 'Multiple Choices', 48 | 301: 'Moved Permanently', 49 | 302: 'Found', 50 | 303: 'See Other', 51 | 304: 'Not Modified', 52 | 305: 'Use Proxy', 53 | 307: 'Temporary Redirect', 54 | 400: 'Bad Request', 55 | 401: 'Unauthorized', 56 | 402: 'Payment Required', 57 | 403: 'Forbidden', 58 | 404: 'Not Found', 59 | 405: 'Method Not Allowed', 60 | 406: 'Not Acceptable', 61 | 407: 'Proxy Authentication Required', 62 | 408: 'Request Timeout', 63 | 409: 'Conflict', 64 | 410: 'Gone', 65 | 411: 'Length Required', 66 | 412: 'Precondition Failed', 67 | 413: 'Request Entity Too Large', 68 | 414: 'Request-URI Too Long', 69 | 415: 'Unsupported Media Type', 70 | 416: 'Requested Range Not Satisfiable', 71 | 417: 'Expectation Failed', 72 | 500: 'Internal Server Error', 73 | 501: 'Not Implemented', 74 | 502: 'Bad Gateway', 75 | 503: 'Service Unavailable', 76 | 504: 'Gateway Timeout', 77 | 505: 'HTTP Version Not Supported' 78 | } 79 | 80 | 81 | def str_dict(dic): 82 | """Convert keys and values in ``dic`` into UTF-8-encoded :class:`str` 83 | 84 | :param dic: :class:`dict` of Unicode strings 85 | :returns: :class:`dict` 86 | 87 | """ 88 | if isinstance(dic, CaseInsensitiveDictionary): 
89 | dic2 = CaseInsensitiveDictionary() 90 | else: 91 | dic2 = {} 92 | for k, v in dic.items(): 93 | if isinstance(k, unicode): 94 | k = k.encode('utf-8') 95 | if isinstance(v, unicode): 96 | v = v.encode('utf-8') 97 | dic2[k] = v 98 | return dic2 99 | 100 | 101 | class NoRedirectHandler(urllib2.HTTPRedirectHandler): 102 | """Prevent redirections""" 103 | 104 | def redirect_request(self, *args): 105 | return None 106 | 107 | 108 | # Adapted from https://gist.github.com/babakness/3901174 109 | class CaseInsensitiveDictionary(dict): 110 | """ 111 | Dictionary that enables case insensitive searching while preserving 112 | case sensitivity when keys are listed, ie, via keys() or items() methods. 113 | 114 | Works by storing a lowercase version of the key as the new key and 115 | stores the original key-value pair as the key's value 116 | (values become dictionaries). 117 | 118 | """ 119 | 120 | def __init__(self, initval=None): 121 | 122 | if isinstance(initval, dict): 123 | for key, value in initval.iteritems(): 124 | self.__setitem__(key, value) 125 | 126 | elif isinstance(initval, list): 127 | for (key, value) in initval: 128 | self.__setitem__(key, value) 129 | 130 | def __contains__(self, key): 131 | return dict.__contains__(self, key.lower()) 132 | 133 | def __getitem__(self, key): 134 | return dict.__getitem__(self, key.lower())['val'] 135 | 136 | def __setitem__(self, key, value): 137 | return dict.__setitem__(self, key.lower(), {'key': key, 'val': value}) 138 | 139 | def get(self, key, default=None): 140 | try: 141 | v = dict.__getitem__(self, key.lower()) 142 | except KeyError: 143 | return default 144 | else: 145 | return v['val'] 146 | 147 | def update(self, other): 148 | for k, v in other.items(): 149 | self[k] = v 150 | 151 | def items(self): 152 | return [(v['key'], v['val']) for v in dict.itervalues(self)] 153 | 154 | def keys(self): 155 | return [v['key'] for v in dict.itervalues(self)] 156 | 157 | def values(self): 158 | return [v['val'] for v in 
dict.itervalues(self)] 159 | 160 | def iteritems(self): 161 | for v in dict.itervalues(self): 162 | yield v['key'], v['val'] 163 | 164 | def iterkeys(self): 165 | for v in dict.itervalues(self): 166 | yield v['key'] 167 | 168 | def itervalues(self): 169 | for v in dict.itervalues(self): 170 | yield v['val'] 171 | 172 | 173 | class Response(object): 174 | """ 175 | Returned by :func:`request` / :func:`get` / :func:`post` functions. 176 | 177 | A simplified version of the ``Response`` object in the ``requests`` library. 178 | 179 | >>> r = request('http://www.google.com') 180 | >>> r.status_code 181 | 200 182 | >>> r.encoding 183 | ISO-8859-1 184 | >>> r.content # bytes 185 | ... 186 | >>> r.text # unicode, decoded according to charset in HTTP header/meta tag 187 | u' ...' 188 | >>> r.json() # content parsed as JSON 189 | 190 | """ 191 | 192 | def __init__(self, request, stream=False): 193 | """Call `request` with :mod:`urllib2` and process results. 194 | 195 | :param request: :class:`urllib2.Request` instance 196 | :param stream: Whether to stream response or retrieve it all at once 197 | :type stream: ``bool`` 198 | 199 | """ 200 | 201 | self.request = request 202 | self._stream = stream 203 | self.url = None 204 | self.raw = None 205 | self._encoding = None 206 | self.error = None 207 | self.status_code = None 208 | self.reason = None 209 | self.headers = CaseInsensitiveDictionary() 210 | self._content = None 211 | self._content_loaded = False 212 | self._gzipped = False 213 | 214 | # Execute query 215 | try: 216 | self.raw = urllib2.urlopen(request) 217 | except urllib2.HTTPError as err: 218 | self.error = err 219 | try: 220 | self.url = err.geturl() 221 | # sometimes (e.g. when authentication fails) 222 | # urllib can't get a URL from an HTTPError 223 | # This behaviour changes across Python versions, 224 | # so no test cover (it isn't important). 
225 | except AttributeError: # pragma: no cover 226 | pass 227 | self.status_code = err.code 228 | else: 229 | self.status_code = self.raw.getcode() 230 | self.url = self.raw.geturl() 231 | self.reason = RESPONSES.get(self.status_code) 232 | 233 | # Parse additional info if request succeeded 234 | if not self.error: 235 | headers = self.raw.info() 236 | self.transfer_encoding = headers.getencoding() 237 | self.mimetype = headers.gettype() 238 | for key in headers.keys(): 239 | self.headers[key.lower()] = headers.get(key) 240 | 241 | # Is content gzipped? 242 | # Transfer-Encoding appears to not be used in the wild 243 | # (contrary to the HTTP standard), but no harm in testing 244 | # for it 245 | if ('gzip' in headers.get('content-encoding', '') or 246 | 'gzip' in headers.get('transfer-encoding', '')): 247 | self._gzipped = True 248 | 249 | @property 250 | def stream(self): 251 | return self._stream 252 | 253 | @stream.setter 254 | def stream(self, value): 255 | if self._content_loaded: 256 | raise RuntimeError("`content` has already been read from " 257 | "this Response.") 258 | 259 | self._stream = value 260 | 261 | def json(self): 262 | """Decode response contents as JSON. 263 | 264 | :returns: object decoded from JSON 265 | :rtype: :class:`list` / :class:`dict` 266 | 267 | """ 268 | 269 | return json.loads(self.content, self.encoding or 'utf-8') 270 | 271 | @property 272 | def encoding(self): 273 | """Text encoding of document or ``None`` 274 | 275 | :returns: :class:`str` or ``None`` 276 | 277 | """ 278 | 279 | if not self._encoding: 280 | self._encoding = self._get_encoding() 281 | 282 | return self._encoding 283 | 284 | @property 285 | def content(self): 286 | """Raw content of response (i.e. 
bytes) 287 | 288 | :returns: Body of HTTP response 289 | :rtype: :class:`str` 290 | 291 | """ 292 | 293 | if not self._content: 294 | 295 | # Decompress gzipped content 296 | if self._gzipped: 297 | decoder = zlib.decompressobj(16 + zlib.MAX_WBITS) 298 | self._content = decoder.decompress(self.raw.read()) 299 | 300 | else: 301 | self._content = self.raw.read() 302 | 303 | self._content_loaded = True 304 | 305 | return self._content 306 | 307 | @property 308 | def text(self): 309 | """Unicode-decoded content of response body. 310 | 311 | If no encoding can be determined from HTTP headers or the content 312 | itself, the encoded response body will be returned instead. 313 | 314 | :returns: Body of HTTP response 315 | :rtype: :class:`unicode` or :class:`str` 316 | 317 | """ 318 | 319 | if self.encoding: 320 | return unicodedata.normalize('NFC', unicode(self.content, 321 | self.encoding)) 322 | return self.content 323 | 324 | def iter_content(self, chunk_size=4096, decode_unicode=False): 325 | """Iterate over response data. 326 | 327 | .. 
versionadded:: 1.6 328 | 329 | :param chunk_size: Number of bytes to read into memory 330 | :type chunk_size: ``int`` 331 | :param decode_unicode: Decode to Unicode using detected encoding 332 | :type decode_unicode: ``Boolean`` 333 | :returns: iterator 334 | 335 | """ 336 | 337 | if not self.stream: 338 | raise RuntimeError("You cannot call `iter_content` on a " 339 | "Response unless you passed `stream=True`" 340 | " to `get()`/`post()`/`request()`.") 341 | 342 | if self._content_loaded: 343 | raise RuntimeError( 344 | "`content` has already been read from this Response.") 345 | 346 | def decode_stream(iterator, r): 347 | 348 | decoder = codecs.getincrementaldecoder(r.encoding)(errors='replace') 349 | 350 | for chunk in iterator: 351 | data = decoder.decode(chunk) 352 | if data: 353 | yield data 354 | 355 | data = decoder.decode(b'', final=True) 356 | if data: # pragma: no cover 357 | yield data 358 | 359 | def generate(): 360 | 361 | if self._gzipped: 362 | decoder = zlib.decompressobj(16 + zlib.MAX_WBITS) 363 | 364 | while True: 365 | chunk = self.raw.read(chunk_size) 366 | if not chunk: 367 | break 368 | 369 | if self._gzipped: 370 | chunk = decoder.decompress(chunk) 371 | 372 | yield chunk 373 | 374 | chunks = generate() 375 | 376 | if decode_unicode and self.encoding: 377 | chunks = decode_stream(chunks, self) 378 | 379 | return chunks 380 | 381 | def save_to_path(self, filepath): 382 | """Save retrieved data to file at ``filepath`` 383 | 384 | .. versionadded: 1.9.6 385 | 386 | :param filepath: Path to save retrieved data. 387 | 388 | """ 389 | 390 | filepath = os.path.abspath(filepath) 391 | dirname = os.path.dirname(filepath) 392 | if not os.path.exists(dirname): 393 | os.makedirs(dirname) 394 | 395 | self.stream = True 396 | 397 | with open(filepath, 'wb') as fileobj: 398 | for data in self.iter_content(): 399 | fileobj.write(data) 400 | 401 | def raise_for_status(self): 402 | """Raise stored error if one occurred. 
403 | 404 | error will be instance of :class:`urllib2.HTTPError` 405 | """ 406 | 407 | if self.error is not None: 408 | raise self.error 409 | return 410 | 411 | def _get_encoding(self): 412 | """Get encoding from HTTP headers or content. 413 | 414 | :returns: encoding or `None` 415 | :rtype: ``unicode`` or ``None`` 416 | 417 | """ 418 | 419 | headers = self.raw.info() 420 | encoding = None 421 | 422 | if headers.getparam('charset'): 423 | encoding = headers.getparam('charset') 424 | 425 | # HTTP Content-Type header 426 | for param in headers.getplist(): 427 | if param.startswith('charset='): 428 | encoding = param[8:] 429 | break 430 | 431 | if not self.stream: # Try sniffing response content 432 | # Encoding declared in document should override HTTP headers 433 | if self.mimetype == 'text/html': # sniff HTML headers 434 | m = re.search("""<meta.+charset=["']{0,1}(.+?)["'].*>""", 435 | self.content) 436 | if m: 437 | encoding = m.group(1) 438 | 439 | elif ((self.mimetype.startswith('application/') or 440 | self.mimetype.startswith('text/')) and 441 | 'xml' in self.mimetype): 442 | m = re.search("""<?xml.+encoding=["'](.+?)["'][^>]*\?>""", 443 | self.content) 444 | if m: 445 | encoding = m.group(1) 446 | 447 | # Format defaults 448 | if self.mimetype == 'application/json' and not encoding: 449 | # The default encoding for JSON 450 | encoding = 'utf-8' 451 | 452 | elif self.mimetype == 'application/xml' and not encoding: 453 | # The default for 'application/xml' 454 | encoding = 'utf-8' 455 | 456 | if encoding: 457 | encoding = encoding.lower() 458 | 459 | return encoding 460 | 461 | 462 | def request(method, url, params=None, data=None, headers=None, cookies=None, 463 | files=None, auth=None, timeout=60, allow_redirects=False, 464 | stream=False): 465 | """Initiate an HTTP(S) request. Returns :class:`Response` object. 
467 | 468 | :param method: 'GET' or 'POST' 469 | :type method: ``unicode`` 470 | :param url: URL to open 471 | :type url: ``unicode`` 472 | :param params: mapping of URL parameters 473 | :type params: :class:`dict` 474 | :param data: mapping of form data ``{'field_name': 'value'}`` or 475 | :class:`str` 476 | :type data: :class:`dict` or :class:`str` 477 | :param headers: HTTP headers 478 | :type headers: :class:`dict` 479 | :param cookies: cookies to send to server 480 | :type cookies: :class:`dict` 481 | :param files: files to upload (see below). 482 | :type files: :class:`dict` 483 | :param auth: username, password 484 | :type auth: ``tuple`` 485 | :param timeout: connection timeout limit in seconds 486 | :type timeout: ``int`` 487 | :param allow_redirects: follow redirections 488 | :type allow_redirects: ``Boolean`` 489 | :param stream: Stream content instead of fetching it all at once. 490 | :type stream: ``bool`` 491 | :returns: :class:`Response` object 492 | 493 | 494 | The ``files`` argument is a dictionary:: 495 | 496 | {'fieldname' : { 'filename': 'blah.txt', 497 | 'content': '', 498 | 'mimetype': 'text/plain'} 499 | } 500 | 501 | * ``fieldname`` is the name of the field in the HTML form. 502 | * ``mimetype`` is optional. If not provided, :mod:`mimetypes` will 503 | be used to guess the mimetype, or ``application/octet-stream`` 504 | will be used. 
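The mimetype fallback this docstring describes can be sketched as follows (the helper is ours for illustration, not part of the library's API):

```python
import mimetypes

def file_field(fieldname, filename, content):
    """Build one entry of the `files` dict, guessing the mimetype the
    way request() does: mimetypes first, octet-stream as fallback."""
    mimetype = (mimetypes.guess_type(filename)[0] or
                'application/octet-stream')
    return {fieldname: {'filename': filename,
                        'content': content,
                        'mimetype': mimetype}}
```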
505 | 506 | """ 507 | 508 | # TODO: cookies 509 | socket.setdefaulttimeout(timeout) 510 | 511 | # Default handlers 512 | openers = [] 513 | 514 | if not allow_redirects: 515 | openers.append(NoRedirectHandler()) 516 | 517 | if auth is not None: # Add authorisation handler 518 | username, password = auth 519 | password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm() 520 | password_manager.add_password(None, url, username, password) 521 | auth_manager = urllib2.HTTPBasicAuthHandler(password_manager) 522 | openers.append(auth_manager) 523 | 524 | # Install our custom chain of openers 525 | opener = urllib2.build_opener(*openers) 526 | urllib2.install_opener(opener) 527 | 528 | if not headers: 529 | headers = CaseInsensitiveDictionary() 530 | else: 531 | headers = CaseInsensitiveDictionary(headers) 532 | 533 | if 'user-agent' not in headers: 534 | headers['user-agent'] = USER_AGENT 535 | 536 | # Accept gzip-encoded content 537 | encodings = [s.strip() for s in 538 | headers.get('accept-encoding', '').split(',')] 539 | if 'gzip' not in encodings: 540 | encodings.append('gzip') 541 | 542 | headers['accept-encoding'] = ', '.join(encodings) 543 | 544 | # Force POST by providing an empty data string 545 | if method == 'POST' and not data: 546 | data = '' 547 | 548 | if files: 549 | if not data: 550 | data = {} 551 | new_headers, data = encode_multipart_formdata(data, files) 552 | headers.update(new_headers) 553 | elif data and isinstance(data, dict): 554 | data = urllib.urlencode(str_dict(data)) 555 | 556 | # Make sure everything is encoded text 557 | headers = str_dict(headers) 558 | 559 | if isinstance(url, unicode): 560 | url = url.encode('utf-8') 561 | 562 | if params: # GET args (POST args are handled in encode_multipart_formdata) 563 | 564 | scheme, netloc, path, query, fragment = urlparse.urlsplit(url) 565 | 566 | if query: # Combine query string and `params` 567 | url_params = urlparse.parse_qs(query) 568 | # `params` take precedence over URL query string 569 | 
url_params.update(params) 570 | params = url_params 571 | 572 | query = urllib.urlencode(str_dict(params), doseq=True) 573 | url = urlparse.urlunsplit((scheme, netloc, path, query, fragment)) 574 | 575 | req = urllib2.Request(url, data, headers) 576 | return Response(req, stream) 577 | 578 | 579 | def get(url, params=None, headers=None, cookies=None, auth=None, 580 | timeout=60, allow_redirects=True, stream=False): 581 | """Initiate a GET request. Arguments as for :func:`request`. 582 | 583 | :returns: :class:`Response` instance 584 | 585 | """ 586 | 587 | return request('GET', url, params, headers=headers, cookies=cookies, 588 | auth=auth, timeout=timeout, allow_redirects=allow_redirects, 589 | stream=stream) 590 | 591 | 592 | def post(url, params=None, data=None, headers=None, cookies=None, files=None, 593 | auth=None, timeout=60, allow_redirects=False, stream=False): 594 | """Initiate a POST request. Arguments as for :func:`request`. 595 | 596 | :returns: :class:`Response` instance 597 | 598 | """ 599 | return request('POST', url, params, data, headers, cookies, files, auth, 600 | timeout, allow_redirects, stream) 601 | 602 | 603 | def encode_multipart_formdata(fields, files): 604 | """Encode form data (``fields``) and ``files`` for POST request. 605 | 606 | :param fields: mapping of ``{name : value}`` pairs for normal form fields. 607 | :type fields: :class:`dict` 608 | :param files: dictionary of fieldnames/files elements for file data. 609 | See below for details. 610 | :type files: :class:`dict` of :class:`dicts` 611 | :returns: ``(headers, body)`` ``headers`` is a :class:`dict` of HTTP headers 612 | :rtype: 2-tuple ``(dict, str)`` 613 | 614 | The ``files`` argument is a dictionary:: 615 | 616 | {'fieldname' : { 'filename': 'blah.txt', 617 | 'content': '<binary data>', 618 | 'mimetype': 'text/plain'} 619 | } 620 | 621 | - ``fieldname`` is the name of the field in the HTML form. 622 | - ``mimetype`` is optional.
If not provided, :mod:`mimetypes` will be used to guess the mimetype, or ``application/octet-stream`` will be used. 623 | 624 | """ 625 | 626 | def get_content_type(filename): 627 | """Return or guess mimetype of ``filename``. 628 | 629 | :param filename: filename of file 630 | :type filename: unicode/string 631 | :returns: mime-type, e.g. ``text/html`` 632 | :rtype: :class:`str` 633 | 634 | """ 635 | 636 | return mimetypes.guess_type(filename)[0] or 'application/octet-stream' 637 | 638 | boundary = '-----' + ''.join(random.choice(BOUNDARY_CHARS) 639 | for i in range(30)) 640 | CRLF = '\r\n' 641 | output = [] 642 | 643 | # Normal form fields 644 | for (name, value) in fields.items(): 645 | if isinstance(name, unicode): 646 | name = name.encode('utf-8') 647 | if isinstance(value, unicode): 648 | value = value.encode('utf-8') 649 | output.append('--' + boundary) 650 | output.append('Content-Disposition: form-data; name="%s"' % name) 651 | output.append('') 652 | output.append(value) 653 | 654 | # Files to upload 655 | for name, d in files.items(): 656 | filename = d[u'filename'] 657 | content = d[u'content'] 658 | if u'mimetype' in d: 659 | mimetype = d[u'mimetype'] 660 | else: 661 | mimetype = get_content_type(filename) 662 | if isinstance(name, unicode): 663 | name = name.encode('utf-8') 664 | if isinstance(filename, unicode): 665 | filename = filename.encode('utf-8') 666 | if isinstance(mimetype, unicode): 667 | mimetype = mimetype.encode('utf-8') 668 | output.append('--' + boundary) 669 | output.append('Content-Disposition: form-data; ' 670 | 'name="%s"; filename="%s"' % (name, filename)) 671 | output.append('Content-Type: %s' % mimetype) 672 | output.append('') 673 | output.append(content) 674 | 675 | output.append('--' + boundary + '--') 676 | output.append('') 677 | body = CRLF.join(output) 678 | headers = { 679 | 'Content-Type': 'multipart/form-data; boundary=%s' % boundary, 680 | 'Content-Length': str(len(body)), 681 | } 682 | return (headers, body)
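The multipart assembly above is easiest to follow in isolation. The following is a minimal Python 3 sketch of the same technique `encode_multipart_formdata()` uses (boundary, one part per field, guessed `Content-Type` for files); the field names and sample values are illustrative only, not part of this workflow's API.

```python
# Minimal Python 3 sketch of multipart/form-data encoding, mirroring the
# structure of encode_multipart_formdata() above. Field names and file
# contents below are made-up examples.
import mimetypes
import random
import string

BOUNDARY_CHARS = string.digits + string.ascii_letters


def encode_multipart(fields, files):
    """Return (headers, body) for a multipart/form-data POST."""
    boundary = '-----' + ''.join(random.choice(BOUNDARY_CHARS)
                                 for _ in range(30))
    lines = []
    # Normal form fields: one part per (name, value) pair
    for name, value in fields.items():
        lines += ['--' + boundary,
                  'Content-Disposition: form-data; name="%s"' % name,
                  '', value]
    # File parts: guess the mimetype when none is supplied
    for name, d in files.items():
        mimetype = d.get('mimetype') or (
            mimetypes.guess_type(d['filename'])[0] or
            'application/octet-stream')
        lines += ['--' + boundary,
                  'Content-Disposition: form-data; name="%s"; filename="%s"'
                  % (name, d['filename']),
                  'Content-Type: %s' % mimetype,
                  '', d['content']]
    # Closing boundary, then CRLF line endings throughout
    lines += ['--' + boundary + '--', '']
    body = '\r\n'.join(lines)
    headers = {'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
               'Content-Length': str(len(body))}
    return headers, body


headers, body = encode_multipart(
    {'title': 'hello'},
    {'upload': {'filename': 'a.txt', 'content': 'file data'}})
```

Note the design choice shared with the original: the boundary is random per request, so it cannot collide with typical field content, and the `Content-Type` header must carry the same boundary so the server can split the body back into parts.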
683 | -------------------------------------------------------------------------------- /source/android.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/android.png -------------------------------------------------------------------------------- /source/apple.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/apple.png -------------------------------------------------------------------------------- /source/bs4/__init__.py: -------------------------------------------------------------------------------- 1 | """Beautiful Soup 2 | Elixir and Tonic 3 | "The Screen-Scraper's Friend" 4 | http://www.crummy.com/software/BeautifulSoup/ 5 | 6 | Beautiful Soup uses a pluggable XML or HTML parser to parse a 7 | (possibly invalid) document into a tree representation. Beautiful Soup 8 | provides methods and Pythonic idioms that make it easy to 9 | navigate, search, and modify the parse tree. 10 | 11 | Beautiful Soup works with Python 2.6 and up. It works better if lxml 12 | and/or html5lib is installed.
13 | 14 | For more than you ever wanted to know about Beautiful Soup, see the 15 | documentation: 16 | http://www.crummy.com/software/BeautifulSoup/bs4/doc/ 17 | """ 18 | 19 | __author__ = "Leonard Richardson (leonardr@segfault.org)" 20 | __version__ = "4.1.0" 21 | __copyright__ = "Copyright (c) 2004-2012 Leonard Richardson" 22 | __license__ = "MIT" 23 | 24 | __all__ = ['BeautifulSoup'] 25 | 26 | import re 27 | import warnings 28 | 29 | from .builder import builder_registry 30 | from .dammit import UnicodeDammit 31 | from .element import ( 32 | CData, 33 | Comment, 34 | DEFAULT_OUTPUT_ENCODING, 35 | Declaration, 36 | Doctype, 37 | NavigableString, 38 | PageElement, 39 | ProcessingInstruction, 40 | ResultSet, 41 | SoupStrainer, 42 | Tag, 43 | ) 44 | 45 | # The very first thing we do is give a useful error if someone is 46 | # running this code under Python 3 without converting it. 47 | syntax_error = u'You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work. You need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).' 48 | 49 | class BeautifulSoup(Tag): 50 | """ 51 | This class defines the basic interface called by the tree builders. 52 | 53 | These methods will be called by the parser: 54 | reset() 55 | feed(markup) 56 | 57 | The tree builder may call these methods from its feed() implementation: 58 | handle_starttag(name, attrs) # See note about return value 59 | handle_endtag(name) 60 | handle_data(data) # Appends to the current data node 61 | endData(containerClass=NavigableString) # Ends the current data node 62 | 63 | No matter how complicated the underlying parser is, you should be 64 | able to build a tree using 'start tag' events, 'end tag' events, 65 | 'data' events, and "done with data" events. 66 | 67 | If you encounter an empty-element tag (aka a self-closing tag, 68 | like HTML's
<br> tag), call handle_starttag and then 69 | handle_endtag. 70 | """ 71 | ROOT_TAG_NAME = u'[document]' 72 | 73 | # If the end-user gives no indication which tree builder they 74 | # want, look for one with these features. 75 | DEFAULT_BUILDER_FEATURES = ['html', 'fast'] 76 | 77 | # Used when determining whether a text node is all whitespace and 78 | # can be replaced with a single space. A text node that contains 79 | # fancy Unicode spaces (usually non-breaking) should be left 80 | # alone. 81 | STRIP_ASCII_SPACES = {9: None, 10: None, 12: None, 13: None, 32: None, } 82 | 83 | def __init__(self, markup="", features=None, builder=None, 84 | parse_only=None, from_encoding=None, **kwargs): 85 | """The Soup object is initialized as the 'root tag', and the 86 | provided markup (which can be a string or a file-like object) 87 | is fed into the underlying parser.""" 88 | 89 | if 'convertEntities' in kwargs: 90 | warnings.warn( 91 | "BS4 does not respect the convertEntities argument to the " 92 | "BeautifulSoup constructor. Entities are always converted " 93 | "to Unicode characters.") 94 | 95 | if 'markupMassage' in kwargs: 96 | del kwargs['markupMassage'] 97 | warnings.warn( 98 | "BS4 does not respect the markupMassage argument to the " 99 | "BeautifulSoup constructor. The tree builder is responsible " 100 | "for any necessary markup massage.") 101 | 102 | if 'smartQuotesTo' in kwargs: 103 | del kwargs['smartQuotesTo'] 104 | warnings.warn( 105 | "BS4 does not respect the smartQuotesTo argument to the " 106 | "BeautifulSoup constructor. Smart quotes are always converted " 107 | "to Unicode characters.") 108 | 109 | if 'selfClosingTags' in kwargs: 110 | del kwargs['selfClosingTags'] 111 | warnings.warn( 112 | "BS4 does not respect the selfClosingTags argument to the " 113 | "BeautifulSoup constructor.
The tree builder is responsible " 114 | "for understanding self-closing tags.") 115 | 116 | if 'isHTML' in kwargs: 117 | del kwargs['isHTML'] 118 | warnings.warn( 119 | "BS4 does not respect the isHTML argument to the " 120 | "BeautifulSoup constructor. You can pass in features='html' " 121 | "or features='xml' to get a builder capable of handling " 122 | "one or the other.") 123 | 124 | def deprecated_argument(old_name, new_name): 125 | if old_name in kwargs: 126 | warnings.warn( 127 | 'The "%s" argument to the BeautifulSoup constructor ' 128 | 'has been renamed to "%s."' % (old_name, new_name)) 129 | value = kwargs[old_name] 130 | del kwargs[old_name] 131 | return value 132 | return None 133 | 134 | parse_only = parse_only or deprecated_argument( 135 | "parseOnlyThese", "parse_only") 136 | 137 | from_encoding = from_encoding or deprecated_argument( 138 | "fromEncoding", "from_encoding") 139 | 140 | if len(kwargs) > 0: 141 | arg = kwargs.keys().pop() 142 | raise TypeError( 143 | "__init__() got an unexpected keyword argument '%s'" % arg) 144 | 145 | if builder is None: 146 | if isinstance(features, basestring): 147 | features = [features] 148 | if features is None or len(features) == 0: 149 | features = self.DEFAULT_BUILDER_FEATURES 150 | builder_class = builder_registry.lookup(*features) 151 | if builder_class is None: 152 | raise ValueError( 153 | "Couldn't find a tree builder with the features you " 154 | "requested: %s. Do you need to install a parser library?" 155 | % ",".join(features)) 156 | builder = builder_class() 157 | self.builder = builder 158 | self.is_xml = builder.is_xml 159 | self.builder.soup = self 160 | 161 | self.parse_only = parse_only 162 | 163 | self.reset() 164 | 165 | if hasattr(markup, 'read'): # It's a file-type object. 
166 | markup = markup.read() 167 | (self.markup, self.original_encoding, self.declared_html_encoding, 168 | self.contains_replacement_characters) = ( 169 | self.builder.prepare_markup(markup, from_encoding)) 170 | 171 | try: 172 | self._feed() 173 | except StopParsing: 174 | pass 175 | 176 | # Clear out the markup and remove the builder's circular 177 | # reference to this object. 178 | self.markup = None 179 | self.builder.soup = None 180 | 181 | def _feed(self): 182 | # Convert the document to Unicode. 183 | self.builder.reset() 184 | 185 | self.builder.feed(self.markup) 186 | # Close out any unfinished strings and close all the open tags. 187 | self.endData() 188 | while self.currentTag.name != self.ROOT_TAG_NAME: 189 | self.popTag() 190 | 191 | def reset(self): 192 | Tag.__init__(self, self, self.builder, self.ROOT_TAG_NAME) 193 | self.hidden = 1 194 | self.builder.reset() 195 | self.currentData = [] 196 | self.currentTag = None 197 | self.tagStack = [] 198 | self.pushTag(self) 199 | 200 | def new_tag(self, name, namespace=None, nsprefix=None, **attrs): 201 | """Create a new tag associated with this soup.""" 202 | return Tag(None, self.builder, name, namespace, nsprefix, attrs) 203 | 204 | def new_string(self, s): 205 | """Create a new NavigableString associated with this soup.""" 206 | navigable = NavigableString(s) 207 | navigable.setup() 208 | return navigable 209 | 210 | def insert_before(self, successor): 211 | raise ValueError("BeautifulSoup objects don't support insert_before().") 212 | 213 | def insert_after(self, successor): 214 | raise ValueError("BeautifulSoup objects don't support insert_after().") 215 | 216 | def popTag(self): 217 | tag = self.tagStack.pop() 218 | #print "Pop", tag.name 219 | if self.tagStack: 220 | self.currentTag = self.tagStack[-1] 221 | return self.currentTag 222 | 223 | def pushTag(self, tag): 224 | #print "Push", tag.name 225 | if self.currentTag: 226 | self.currentTag.contents.append(tag) 227 | self.tagStack.append(tag) 228 
| self.currentTag = self.tagStack[-1] 229 | 230 | def endData(self, containerClass=NavigableString): 231 | if self.currentData: 232 | currentData = u''.join(self.currentData) 233 | if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and 234 | not set([tag.name for tag in self.tagStack]).intersection( 235 | self.builder.preserve_whitespace_tags)): 236 | if '\n' in currentData: 237 | currentData = '\n' 238 | else: 239 | currentData = ' ' 240 | self.currentData = [] 241 | if self.parse_only and len(self.tagStack) <= 1 and \ 242 | (not self.parse_only.text or \ 243 | not self.parse_only.search(currentData)): 244 | return 245 | o = containerClass(currentData) 246 | self.object_was_parsed(o) 247 | 248 | def object_was_parsed(self, o): 249 | """Add an object to the parse tree.""" 250 | o.setup(self.currentTag, self.previous_element) 251 | if self.previous_element: 252 | self.previous_element.next_element = o 253 | self.previous_element = o 254 | self.currentTag.contents.append(o) 255 | 256 | def _popToTag(self, name, nsprefix=None, inclusivePop=True): 257 | """Pops the tag stack up to and including the most recent 258 | instance of the given tag. If inclusivePop is false, pops the tag 259 | stack up to but *not* including the most recent instance of 260 | the given tag.""" 261 | #print "Popping to %s" % name 262 | if name == self.ROOT_TAG_NAME: 263 | return 264 | 265 | numPops = 0 266 | mostRecentTag = None 267 | 268 | for i in range(len(self.tagStack) - 1, 0, -1): 269 | if (name == self.tagStack[i].name 270 | and nsprefix == self.tagStack[i].nsprefix): 271 | numPops = len(self.tagStack) - i 272 | break 273 | if not inclusivePop: 274 | numPops = numPops - 1 275 | 276 | for i in range(0, numPops): 277 | mostRecentTag = self.popTag() 278 | return mostRecentTag 279 | 280 | def handle_starttag(self, name, namespace, nsprefix, attrs): 281 | """Push a start tag on to the stack.
282 | 283 | If this method returns None, the tag was rejected by the 284 | SoupStrainer. You should proceed as if the tag had not occurred 285 | in the document. For instance, if this was a self-closing tag, 286 | don't call handle_endtag. 287 | """ 288 | 289 | # print "Start tag %s: %s" % (name, attrs) 290 | self.endData() 291 | 292 | if (self.parse_only and len(self.tagStack) <= 1 293 | and (self.parse_only.text 294 | or not self.parse_only.search_tag(name, attrs))): 295 | return None 296 | 297 | tag = Tag(self, self.builder, name, namespace, nsprefix, attrs, 298 | self.currentTag, self.previous_element) 299 | if tag is None: 300 | return tag 301 | if self.previous_element: 302 | self.previous_element.next_element = tag 303 | self.previous_element = tag 304 | self.pushTag(tag) 305 | return tag 306 | 307 | def handle_endtag(self, name, nsprefix=None): 308 | #print "End tag: " + name 309 | self.endData() 310 | self._popToTag(name, nsprefix) 311 | 312 | def handle_data(self, data): 313 | self.currentData.append(data) 314 | 315 | def decode(self, pretty_print=False, 316 | eventual_encoding=DEFAULT_OUTPUT_ENCODING, 317 | formatter="minimal"): 318 | """Returns a string or Unicode representation of this document. 319 | To get Unicode, pass None for encoding.""" 320 | 321 | if self.is_xml: 322 | # Print the XML declaration 323 | encoding_part = '' 324 | if eventual_encoding != None: 325 | encoding_part = ' encoding="%s"' % eventual_encoding 326 | prefix = u'<?xml version="1.0"%s?>\n' % encoding_part 327 | else: 328 | prefix = u'' 329 | if not pretty_print: 330 | indent_level = None 331 | else: 332 | indent_level = 0 333 | return prefix + super(BeautifulSoup, self).decode( 334 | indent_level, eventual_encoding, formatter) 335 | 336 | class BeautifulStoneSoup(BeautifulSoup): 337 | """Deprecated interface to an XML parser.""" 338 | 339 | def __init__(self, *args, **kwargs): 340 | kwargs['features'] = 'xml' 341 | warnings.warn( 342 | 'The BeautifulStoneSoup class is deprecated.
Instead of using ' 343 | 'it, pass features="xml" into the BeautifulSoup constructor.') 344 | super(BeautifulStoneSoup, self).__init__(*args, **kwargs) 345 | 346 | 347 | class StopParsing(Exception): 348 | pass 349 | 350 | 351 | #By default, act as an HTML pretty-printer. 352 | if __name__ == '__main__': 353 | import sys 354 | soup = BeautifulSoup(sys.stdin) 355 | print soup.prettify() 356 | -------------------------------------------------------------------------------- /source/bs4/builder/__init__.py: -------------------------------------------------------------------------------- 1 | from collections import defaultdict 2 | import itertools 3 | import sys 4 | from bs4.element import ( 5 | CharsetMetaAttributeValue, 6 | ContentMetaAttributeValue, 7 | whitespace_re 8 | ) 9 | 10 | __all__ = [ 11 | 'HTMLTreeBuilder', 12 | 'SAXTreeBuilder', 13 | 'TreeBuilder', 14 | 'TreeBuilderRegistry', 15 | ] 16 | 17 | # Some useful features for a TreeBuilder to have. 18 | FAST = 'fast' 19 | PERMISSIVE = 'permissive' 20 | STRICT = 'strict' 21 | XML = 'xml' 22 | HTML = 'html' 23 | HTML_5 = 'html5' 24 | 25 | 26 | class TreeBuilderRegistry(object): 27 | 28 | def __init__(self): 29 | self.builders_for_feature = defaultdict(list) 30 | self.builders = [] 31 | 32 | def register(self, treebuilder_class): 33 | """Register a treebuilder based on its advertised features.""" 34 | for feature in treebuilder_class.features: 35 | self.builders_for_feature[feature].insert(0, treebuilder_class) 36 | self.builders.insert(0, treebuilder_class) 37 | 38 | def lookup(self, *features): 39 | if len(self.builders) == 0: 40 | # There are no builders at all. 41 | return None 42 | 43 | if len(features) == 0: 44 | # They didn't ask for any features. Give them the most 45 | # recently registered builder. 46 | return self.builders[0] 47 | 48 | # Go down the list of features in order, and eliminate any builders 49 | # that don't match every feature. 
50 | features = list(features) 51 | features.reverse() 52 | candidates = None 53 | candidate_set = None 54 | while len(features) > 0: 55 | feature = features.pop() 56 | we_have_the_feature = self.builders_for_feature.get(feature, []) 57 | if len(we_have_the_feature) > 0: 58 | if candidates is None: 59 | candidates = we_have_the_feature 60 | candidate_set = set(candidates) 61 | else: 62 | # Eliminate any candidates that don't have this feature. 63 | candidate_set = candidate_set.intersection( 64 | set(we_have_the_feature)) 65 | 66 | # The only valid candidates are the ones in candidate_set. 67 | # Go through the original list of candidates and pick the first one 68 | # that's in candidate_set. 69 | if candidate_set is None: 70 | return None 71 | for candidate in candidates: 72 | if candidate in candidate_set: 73 | return candidate 74 | return None 75 | 76 | # The BeautifulSoup class will take feature lists from developers and use them 77 | # to look up builders in this registry. 78 | builder_registry = TreeBuilderRegistry() 79 | 80 | class TreeBuilder(object): 81 | """Turn a document into a Beautiful Soup object tree.""" 82 | 83 | features = [] 84 | 85 | is_xml = False 86 | preserve_whitespace_tags = set() 87 | empty_element_tags = None # A tag will be considered an empty-element 88 | # tag when and only when it has no contents. 89 | 90 | # A value for these tag/attribute combinations is a space- or 91 | # comma-separated list of CDATA, rather than a single CDATA. 92 | cdata_list_attributes = {} 93 | 94 | 95 | def __init__(self): 96 | self.soup = None 97 | 98 | def reset(self): 99 | pass 100 | 101 | def can_be_empty_element(self, tag_name): 102 | """Might a tag with this name be an empty-element tag? 103 | 104 | The final markup may or may not actually present this tag as 105 | self-closing. 106 | 107 | For instance: an HTMLBuilder does not consider a <p> tag to be 108 | an empty-element tag (it's not in 109 | HTMLBuilder.empty_element_tags). This means an empty <p> tag 110 | will be presented as "<p></p>", not "<p/>". 111 | 112 | The default implementation has no opinion about which tags are 113 | empty-element tags, so a tag will be presented as an 114 | empty-element tag if and only if it has no contents. 115 | "<foo></foo>" will become "<foo/>", and "<foo>bar</foo>" will 116 | be left alone. 117 | """ 118 | if self.empty_element_tags is None: 119 | return True 120 | return tag_name in self.empty_element_tags 121 | 122 | def feed(self, markup): 123 | raise NotImplementedError() 124 | 125 | def prepare_markup(self, markup, user_specified_encoding=None, 126 | document_declared_encoding=None): 127 | return markup, None, None, False 128 | 129 | def test_fragment_to_document(self, fragment): 130 | """Wrap an HTML fragment to make it look like a document. 131 | 132 | Different parsers do this differently. For instance, lxml 133 | introduces an empty <head> tag, and html5lib 134 | doesn't. Abstracting this away lets us write simple tests 135 | which run HTML fragments through the parser and compare the 136 | results against other HTML fragments. 137 | 138 | This method should not be used outside of tests. 139 | """ 140 | return fragment 141 | 142 | def set_up_substitutions(self, tag): 143 | return False 144 | 145 | def _replace_cdata_list_attribute_values(self, tag_name, attrs): 146 | """Replaces class="foo bar" with class=["foo", "bar"] 147 | 148 | Modifies its input in place. 149 | """ 150 | if self.cdata_list_attributes: 151 | universal = self.cdata_list_attributes.get('*', []) 152 | tag_specific = self.cdata_list_attributes.get( 153 | tag_name.lower(), []) 154 | for cdata_list_attr in itertools.chain(universal, tag_specific): 155 | if cdata_list_attr in dict(attrs): 156 | # Basically, we have a "class" attribute whose 157 | # value is a whitespace-separated list of CSS 158 | # classes. Split it into a list.
159 | value = attrs[cdata_list_attr] 160 | values = whitespace_re.split(value) 161 | attrs[cdata_list_attr] = values 162 | return attrs 163 | 164 | class SAXTreeBuilder(TreeBuilder): 165 | """A Beautiful Soup treebuilder that listens for SAX events.""" 166 | 167 | def feed(self, markup): 168 | raise NotImplementedError() 169 | 170 | def close(self): 171 | pass 172 | 173 | def startElement(self, name, attrs): 174 | attrs = dict((key[1], value) for key, value in list(attrs.items())) 175 | #print "Start %s, %r" % (name, attrs) 176 | self.soup.handle_starttag(name, attrs) 177 | 178 | def endElement(self, name): 179 | #print "End %s" % name 180 | self.soup.handle_endtag(name) 181 | 182 | def startElementNS(self, nsTuple, nodeName, attrs): 183 | # Throw away (ns, nodeName) for now. 184 | self.startElement(nodeName, attrs) 185 | 186 | def endElementNS(self, nsTuple, nodeName): 187 | # Throw away (ns, nodeName) for now. 188 | self.endElement(nodeName) 189 | #handler.endElementNS((ns, node.nodeName), node.nodeName) 190 | 191 | def startPrefixMapping(self, prefix, nodeValue): 192 | # Ignore the prefix for now. 193 | pass 194 | 195 | def endPrefixMapping(self, prefix): 196 | # Ignore the prefix for now. 197 | # handler.endPrefixMapping(prefix) 198 | pass 199 | 200 | def characters(self, content): 201 | self.soup.handle_data(content) 202 | 203 | def startDocument(self): 204 | pass 205 | 206 | def endDocument(self): 207 | pass 208 | 209 | 210 | class HTMLTreeBuilder(TreeBuilder): 211 | """This TreeBuilder knows facts about HTML. 212 | 213 | Such as which tags are empty-element tags. 214 | """ 215 | 216 | preserve_whitespace_tags = set(['pre', 'textarea']) 217 | empty_element_tags = set(['br' , 'hr', 'input', 'img', 'meta', 218 | 'spacer', 'link', 'frame', 'base']) 219 | 220 | # The HTML standard defines these attributes as containing a 221 | # space-separated list of values, not a single value. 
That is, 222 | # class="foo bar" means that the 'class' attribute has two values, 223 | # 'foo' and 'bar', not the single value 'foo bar'. When we 224 | # encounter one of these attributes, we will parse its value into 225 | # a list of values if possible. Upon output, the list will be 226 | # converted back into a string. 227 | cdata_list_attributes = { 228 | "*" : ['class', 'accesskey', 'dropzone'], 229 | "a" : ['rel', 'rev'], 230 | "link" : ['rel', 'rev'], 231 | "td" : ["headers"], 232 | "th" : ["headers"], 233 | "td" : ["headers"], 234 | "form" : ["accept-charset"], 235 | "object" : ["archive"], 236 | 237 | # These are HTML5 specific, as are *.accesskey and *.dropzone above. 238 | "area" : ["rel"], 239 | "icon" : ["sizes"], 240 | "iframe" : ["sandbox"], 241 | "output" : ["for"], 242 | } 243 | 244 | def set_up_substitutions(self, tag): 245 | # We are only interested in <meta> tags 246 | if tag.name != 'meta': 247 | return False 248 | 249 | http_equiv = tag.get('http-equiv') 250 | content = tag.get('content') 251 | charset = tag.get('charset') 252 | 253 | # We are interested in <meta> tags that say what encoding the 254 | # document was originally in. This means HTML 5-style 255 | # <meta> tags that provide the "charset" attribute. It also means 256 | # HTML 4-style <meta> tags that provide the "content" 257 | # attribute and have "http-equiv" set to "content-type". 258 | # 259 | # In both cases we will replace the value of the appropriate 260 | # attribute with a standin object that can take on any 261 | # encoding.
262 | meta_encoding = None 263 | if charset is not None: 264 | # HTML 5 style: 265 | # <meta charset="utf-8"> 266 | meta_encoding = charset 267 | tag['charset'] = CharsetMetaAttributeValue(charset) 268 | 269 | elif (content is not None and http_equiv is not None 270 | and http_equiv.lower() == 'content-type'): 271 | # HTML 4 style: 272 | # <meta http-equiv="content-type" content="text/html;charset=utf-8"> 273 | tag['content'] = ContentMetaAttributeValue(content) 274 | 275 | return (meta_encoding is not None) 276 | 277 | def register_treebuilders_from(module): 278 | """Copy TreeBuilders from the given module into this module.""" 279 | # I'm fairly sure this is not the best way to do this. 280 | this_module = sys.modules['bs4.builder'] 281 | for name in module.__all__: 282 | obj = getattr(module, name) 283 | 284 | if issubclass(obj, TreeBuilder): 285 | setattr(this_module, name, obj) 286 | this_module.__all__.append(name) 287 | # Register the builder while we're at it. 288 | this_module.builder_registry.register(obj) 289 | 290 | # Builders are registered in reverse order of priority, so that custom 291 | # builder registrations will take precedence. In general, we want lxml 292 | # to take precedence over html5lib, because it's faster. And we only 293 | # want to use HTMLParser as a last resort. 294 | from . import _htmlparser 295 | register_treebuilders_from(_htmlparser) 296 | try: 297 | from . import _html5lib 298 | register_treebuilders_from(_html5lib) 299 | except ImportError: 300 | # They don't have html5lib installed. 301 | pass 302 | try: 303 | from . import _lxml 304 | register_treebuilders_from(_lxml) 305 | except ImportError: 306 | # They don't have lxml installed.
307 | pass 308 | -------------------------------------------------------------------------------- /source/bs4/builder/_html5lib.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'HTML5TreeBuilder', 3 | ] 4 | 5 | import warnings 6 | from bs4.builder import ( 7 | PERMISSIVE, 8 | HTML, 9 | HTML_5, 10 | HTMLTreeBuilder, 11 | ) 12 | from bs4.element import NamespacedAttribute 13 | import html5lib 14 | from html5lib.constants import namespaces 15 | from bs4.element import ( 16 | Comment, 17 | Doctype, 18 | NavigableString, 19 | Tag, 20 | ) 21 | 22 | class HTML5TreeBuilder(HTMLTreeBuilder): 23 | """Use html5lib to build a tree.""" 24 | 25 | features = ['html5lib', PERMISSIVE, HTML_5, HTML] 26 | 27 | def prepare_markup(self, markup, user_specified_encoding): 28 | # Store the user-specified encoding for use later on. 29 | self.user_specified_encoding = user_specified_encoding 30 | return markup, None, None, False 31 | 32 | # These methods are defined by Beautiful Soup. 33 | def feed(self, markup): 34 | if self.soup.parse_only is not None: 35 | warnings.warn("You provided a value for parse_only, but the html5lib tree builder doesn't support parse_only. The entire document will be parsed.") 36 | parser = html5lib.HTMLParser(tree=self.create_treebuilder) 37 | doc = parser.parse(markup, encoding=self.user_specified_encoding) 38 | 39 | # Set the character encoding detected by the tokenizer. 40 | if isinstance(markup, unicode): 41 | # We need to special-case this because html5lib sets 42 | # charEncoding to UTF-8 if it gets Unicode input. 
43 | doc.original_encoding = None 44 | else: 45 | doc.original_encoding = parser.tokenizer.stream.charEncoding[0] 46 | 47 | def create_treebuilder(self, namespaceHTMLElements): 48 | self.underlying_builder = TreeBuilderForHtml5lib( 49 | self.soup, namespaceHTMLElements) 50 | return self.underlying_builder 51 | 52 | def test_fragment_to_document(self, fragment): 53 | """See `TreeBuilder`.""" 54 | return u'<html><head></head><body>%s</body></html>' % fragment 55 | 56 | 57 | class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder): 58 | 59 | def __init__(self, soup, namespaceHTMLElements): 60 | self.soup = soup 61 | super(TreeBuilderForHtml5lib, self).__init__(namespaceHTMLElements) 62 | 63 | def documentClass(self): 64 | self.soup.reset() 65 | return Element(self.soup, self.soup, None) 66 | 67 | def insertDoctype(self, token): 68 | name = token["name"] 69 | publicId = token["publicId"] 70 | systemId = token["systemId"] 71 | 72 | doctype = Doctype.for_name_and_ids(name, publicId, systemId) 73 | self.soup.object_was_parsed(doctype) 74 | 75 | def elementClass(self, name, namespace): 76 | tag = self.soup.new_tag(name, namespace) 77 | return Element(tag, self.soup, namespace) 78 | 79 | def commentClass(self, data): 80 | return TextNode(Comment(data), self.soup) 81 | 82 | def fragmentClass(self): 83 | self.soup = BeautifulSoup("") 84 | self.soup.name = "[document_fragment]" 85 | return Element(self.soup, self.soup, None) 86 | 87 | def appendChild(self, node): 88 | # XXX This code is not covered by the BS4 tests.
89 | self.soup.append(node.element) 90 | 91 | def getDocument(self): 92 | return self.soup 93 | 94 | def getFragment(self): 95 | return html5lib.treebuilders._base.TreeBuilder.getFragment(self).element 96 | 97 | class AttrList(object): 98 | def __init__(self, element): 99 | self.element = element 100 | self.attrs = dict(self.element.attrs) 101 | def __iter__(self): 102 | return list(self.attrs.items()).__iter__() 103 | def __setitem__(self, name, value): 104 | "set attr", name, value 105 | self.element[name] = value 106 | def items(self): 107 | return list(self.attrs.items()) 108 | def keys(self): 109 | return list(self.attrs.keys()) 110 | def __len__(self): 111 | return len(self.attrs) 112 | def __getitem__(self, name): 113 | return self.attrs[name] 114 | def __contains__(self, name): 115 | return name in list(self.attrs.keys()) 116 | 117 | 118 | class Element(html5lib.treebuilders._base.Node): 119 | def __init__(self, element, soup, namespace): 120 | html5lib.treebuilders._base.Node.__init__(self, element.name) 121 | self.element = element 122 | self.soup = soup 123 | self.namespace = namespace 124 | 125 | def appendChild(self, node): 126 | if (node.element.__class__ == NavigableString and self.element.contents 127 | and self.element.contents[-1].__class__ == NavigableString): 128 | # Concatenate new text onto old text node 129 | # XXX This has O(n^2) performance, for input like 130 | # "aaa..." 
131 | old_element = self.element.contents[-1] 132 | new_element = self.soup.new_string(old_element + node.element) 133 | old_element.replace_with(new_element) 134 | else: 135 | self.element.append(node.element) 136 | node.parent = self 137 | 138 | def getAttributes(self): 139 | return AttrList(self.element) 140 | 141 | def setAttributes(self, attributes): 142 | if attributes is not None and len(attributes) > 0: 143 | 144 | converted_attributes = [] 145 | for name, value in list(attributes.items()): 146 | if isinstance(name, tuple): 147 | new_name = NamespacedAttribute(*name) 148 | del attributes[name] 149 | attributes[new_name] = value 150 | 151 | self.soup.builder._replace_cdata_list_attribute_values( 152 | self.name, attributes) 153 | for name, value in attributes.items(): 154 | self.element[name] = value 155 | 156 | # The attributes may contain variables that need substitution. 157 | # Call set_up_substitutions manually. 158 | # 159 | # The Tag constructor called this method when the Tag was created, 160 | # but we just set/changed the attributes, so call it again. 
161 | self.soup.builder.set_up_substitutions(self.element) 162 | attributes = property(getAttributes, setAttributes) 163 | 164 | def insertText(self, data, insertBefore=None): 165 | text = TextNode(self.soup.new_string(data), self.soup) 166 | if insertBefore: 167 | self.insertBefore(text, insertBefore) 168 | else: 169 | self.appendChild(text) 170 | 171 | def insertBefore(self, node, refNode): 172 | index = self.element.index(refNode.element) 173 | if (node.element.__class__ == NavigableString and self.element.contents 174 | and self.element.contents[index-1].__class__ == NavigableString): 175 | # (See comments in appendChild) 176 | old_node = self.element.contents[index-1] 177 | new_str = self.soup.new_string(old_node + node.element) 178 | old_node.replace_with(new_str) 179 | else: 180 | self.element.insert(index, node.element) 181 | node.parent = self 182 | 183 | def removeChild(self, node): 184 | node.element.extract() 185 | 186 | def reparentChildren(self, newParent): 187 | while self.element.contents: 188 | child = self.element.contents[0] 189 | child.extract() 190 | if isinstance(child, Tag): 191 | newParent.appendChild( 192 | Element(child, self.soup, namespaces["html"])) 193 | else: 194 | newParent.appendChild( 195 | TextNode(child, self.soup)) 196 | 197 | def cloneNode(self): 198 | tag = self.soup.new_tag(self.element.name, self.namespace) 199 | node = Element(tag, self.soup, self.namespace) 200 | for key,value in self.attributes: 201 | node.attributes[key] = value 202 | return node 203 | 204 | def hasContent(self): 205 | return self.element.contents 206 | 207 | def getNameTuple(self): 208 | if self.namespace == None: 209 | return namespaces["html"], self.name 210 | else: 211 | return self.namespace, self.name 212 | 213 | nameTuple = property(getNameTuple) 214 | 215 | class TextNode(Element): 216 | def __init__(self, element, soup): 217 | html5lib.treebuilders._base.Node.__init__(self, None) 218 | self.element = element 219 | self.soup = soup 220 | 221 | 
def cloneNode(self): 222 | raise NotImplementedError 223 | -------------------------------------------------------------------------------- /source/bs4/builder/_htmlparser.py: -------------------------------------------------------------------------------- 1 | """Use the HTMLParser library to parse HTML files that aren't too bad.""" 2 | 3 | __all__ = [ 4 | 'HTMLParserTreeBuilder', 5 | ] 6 | 7 | from HTMLParser import ( 8 | HTMLParser, 9 | HTMLParseError, 10 | ) 11 | import sys 12 | import warnings 13 | 14 | # Starting in Python 3.2, the HTMLParser constructor takes a 'strict' 15 | # argument, which we'd like to set to False. Unfortunately, 16 | # http://bugs.python.org/issue13273 makes strict=True a better bet 17 | # before Python 3.2.3. 18 | # 19 | # At the end of this file, we monkeypatch HTMLParser so that 20 | # strict=True works well on Python 3.2.2. 21 | major, minor, release = sys.version_info[:3] 22 | CONSTRUCTOR_TAKES_STRICT = ( 23 | major > 3 24 | or (major == 3 and minor > 2) 25 | or (major == 3 and minor == 2 and release >= 3)) 26 | 27 | from bs4.element import ( 28 | CData, 29 | Comment, 30 | Declaration, 31 | Doctype, 32 | ProcessingInstruction, 33 | ) 34 | from bs4.dammit import EntitySubstitution, UnicodeDammit 35 | 36 | from bs4.builder import ( 37 | HTML, 38 | HTMLTreeBuilder, 39 | STRICT, 40 | ) 41 | 42 | 43 | HTMLPARSER = 'html.parser' 44 | 45 | class BeautifulSoupHTMLParser(HTMLParser): 46 | def handle_starttag(self, name, attrs): 47 | # XXX namespace 48 | self.soup.handle_starttag(name, None, None, dict(attrs)) 49 | 50 | def handle_endtag(self, name): 51 | self.soup.handle_endtag(name) 52 | 53 | def handle_data(self, data): 54 | self.soup.handle_data(data) 55 | 56 | def handle_charref(self, name): 57 | # XXX workaround for a bug in HTMLParser. Remove this once 58 | # it's fixed. 
59 | if name.startswith('x'): 60 | real_name = int(name.lstrip('x'), 16) 61 | else: 62 | real_name = int(name) 63 | 64 | try: 65 | data = unichr(real_name) 66 | except (ValueError, OverflowError), e: 67 | data = u"\N{REPLACEMENT CHARACTER}" 68 | 69 | self.handle_data(data) 70 | 71 | def handle_entityref(self, name): 72 | character = EntitySubstitution.HTML_ENTITY_TO_CHARACTER.get(name) 73 | if character is not None: 74 | data = character 75 | else: 76 | data = "&%s;" % name 77 | self.handle_data(data) 78 | 79 | def handle_comment(self, data): 80 | self.soup.endData() 81 | self.soup.handle_data(data) 82 | self.soup.endData(Comment) 83 | 84 | def handle_decl(self, data): 85 | self.soup.endData() 86 | if data.startswith("DOCTYPE "): 87 | data = data[len("DOCTYPE "):] 88 | self.soup.handle_data(data) 89 | self.soup.endData(Doctype) 90 | 91 | def unknown_decl(self, data): 92 | if data.upper().startswith('CDATA['): 93 | cls = CData 94 | data = data[len('CDATA['):] 95 | else: 96 | cls = Declaration 97 | self.soup.endData() 98 | self.soup.handle_data(data) 99 | self.soup.endData(cls) 100 | 101 | def handle_pi(self, data): 102 | self.soup.endData() 103 | if data.endswith("?") and data.lower().startswith("xml"): 104 | # "An XHTML processing instruction using the trailing '?' 105 | # will cause the '?' to be included in data." - HTMLParser 106 | # docs. 107 | # 108 | # Strip the question mark so we don't end up with two 109 | # question marks. 
110 | data = data[:-1] 111 | self.soup.handle_data(data) 112 | self.soup.endData(ProcessingInstruction) 113 | 114 | 115 | class HTMLParserTreeBuilder(HTMLTreeBuilder): 116 | 117 | is_xml = False 118 | features = [HTML, STRICT, HTMLPARSER] 119 | 120 | def __init__(self, *args, **kwargs): 121 | if CONSTRUCTOR_TAKES_STRICT: 122 | kwargs['strict'] = False 123 | self.parser_args = (args, kwargs) 124 | 125 | def prepare_markup(self, markup, user_specified_encoding=None, 126 | document_declared_encoding=None): 127 | """ 128 | :return: A 4-tuple (markup, original encoding, encoding 129 | declared within markup, whether any characters had to be 130 | replaced with REPLACEMENT CHARACTER). 131 | """ 132 | if isinstance(markup, unicode): 133 | return markup, None, None, False 134 | 135 | try_encodings = [user_specified_encoding, document_declared_encoding] 136 | dammit = UnicodeDammit(markup, try_encodings, is_html=True) 137 | return (dammit.markup, dammit.original_encoding, 138 | dammit.declared_html_encoding, 139 | dammit.contains_replacement_characters) 140 | 141 | def feed(self, markup): 142 | args, kwargs = self.parser_args 143 | parser = BeautifulSoupHTMLParser(*args, **kwargs) 144 | parser.soup = self.soup 145 | try: 146 | parser.feed(markup) 147 | except HTMLParseError, e: 148 | warnings.warn(RuntimeWarning( 149 | "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.")) 150 | raise e 151 | 152 | # Patch 3.2 versions of HTMLParser earlier than 3.2.3 to use some 153 | # 3.2.3 code. This ensures they don't treat markup like
<a href="http://foo.com/">
as a 154 | # string. 155 | # 156 | # XXX This code can be removed once most Python 3 users are on 3.2.3. 157 | if major == 3 and minor == 2 and not CONSTRUCTOR_TAKES_STRICT: 158 | import re 159 | attrfind_tolerant = re.compile( 160 | r'\s*((?<=[\'"\s])[^\s/>][^\s/=>]*)(\s*=+\s*' 161 | r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?') 162 | HTMLParserTreeBuilder.attrfind_tolerant = attrfind_tolerant 163 | 164 | locatestarttagend = re.compile(r""" 165 | <[a-zA-Z][-.a-zA-Z0-9:_]* # tag name 166 | (?:\s+ # whitespace before attribute name 167 | (?:[a-zA-Z_][-.:a-zA-Z0-9_]* # attribute name 168 | (?:\s*=\s* # value indicator 169 | (?:'[^']*' # LITA-enclosed value 170 | |\"[^\"]*\" # LIT-enclosed value 171 | |[^'\">\s]+ # bare value 172 | ) 173 | )? 174 | ) 175 | )* 176 | \s* # trailing whitespace 177 | """, re.VERBOSE) 178 | BeautifulSoupHTMLParser.locatestarttagend = locatestarttagend 179 | 180 | from html.parser import tagfind, attrfind 181 | 182 | def parse_starttag(self, i): 183 | self.__starttag_text = None 184 | endpos = self.check_for_whole_start_tag(i) 185 | if endpos < 0: 186 | return endpos 187 | rawdata = self.rawdata 188 | self.__starttag_text = rawdata[i:endpos] 189 | 190 | # Now parse the data between i+1 and j into a tag and attrs 191 | attrs = [] 192 | match = tagfind.match(rawdata, i+1) 193 | assert match, 'unexpected call to parse_starttag()' 194 | k = match.end() 195 | self.lasttag = tag = rawdata[i+1:k].lower() 196 | while k < endpos: 197 | if self.strict: 198 | m = attrfind.match(rawdata, k) 199 | else: 200 | m = attrfind_tolerant.match(rawdata, k) 201 | if not m: 202 | break 203 | attrname, rest, attrvalue = m.group(1, 2, 3) 204 | if not rest: 205 | attrvalue = None 206 | elif attrvalue[:1] == '\'' == attrvalue[-1:] or \ 207 | attrvalue[:1] == '"' == attrvalue[-1:]: 208 | attrvalue = attrvalue[1:-1] 209 | if attrvalue: 210 | attrvalue = self.unescape(attrvalue) 211 | attrs.append((attrname.lower(), attrvalue)) 212 | k = m.end() 213 | 214 | end = 
rawdata[k:endpos].strip()
215 |         if end not in (">", "/>"):
216 |             lineno, offset = self.getpos()
217 |             if "\n" in self.__starttag_text:
218 |                 lineno = lineno + self.__starttag_text.count("\n")
219 |                 offset = len(self.__starttag_text) \
220 |                          - self.__starttag_text.rfind("\n")
221 |             else:
222 |                 offset = offset + len(self.__starttag_text)
223 |             if self.strict:
224 |                 self.error("junk characters in start tag: %r"
225 |                            % (rawdata[k:endpos][:20],))
226 |             self.handle_data(rawdata[i:endpos])
227 |             return endpos
228 |         if end.endswith('/>'):
229 |             # XHTML-style empty tag: <span attr="value" />
230 |             self.handle_startendtag(tag, attrs)
231 |         else:
232 |             self.handle_starttag(tag, attrs)
233 |             if tag in self.CDATA_CONTENT_ELEMENTS:
234 |                 self.set_cdata_mode(tag)
235 |         return endpos
236 | 
237 |     def set_cdata_mode(self, elem):
238 |         self.cdata_elem = elem.lower()
239 |         self.interesting = re.compile(r'</\s*%s\s*>' % self.cdata_elem, re.I)
240 | 
241 |     BeautifulSoupHTMLParser.parse_starttag = parse_starttag
242 |     BeautifulSoupHTMLParser.set_cdata_mode = set_cdata_mode
243 | 
244 |     CONSTRUCTOR_TAKES_STRICT = True
245 | 
-------------------------------------------------------------------------------- /source/bs4/builder/_lxml.py: --------------------------------------------------------------------------------
1 | __all__ = [
2 |     'LXMLTreeBuilderForXML',
3 |     'LXMLTreeBuilder',
4 |     ]
5 | 
6 | from StringIO import StringIO
7 | import collections
8 | from lxml import etree
9 | from bs4.element import Comment, Doctype, NamespacedAttribute
10 | from bs4.builder import (
11 |     FAST,
12 |     HTML,
13 |     HTMLTreeBuilder,
14 |     PERMISSIVE,
15 |     TreeBuilder,
16 |     XML)
17 | from bs4.dammit import UnicodeDammit
18 | 
19 | LXML = 'lxml'
20 | 
21 | class LXMLTreeBuilderForXML(TreeBuilder):
22 |     DEFAULT_PARSER_CLASS = etree.XMLParser
23 | 
24 |     is_xml = True
25 | 
26 |     # Well, it's permissive by XML parser standards.
27 |     features = [LXML, XML, FAST, PERMISSIVE]
28 | 
29 |     CHUNK_SIZE = 512
30 | 
31 |     @property
32 |     def default_parser(self):
33 |         # This can either return a parser object or a class, which
34 |         # will be instantiated with default arguments.
35 |         return etree.XMLParser(target=self, strip_cdata=False, recover=True)
36 | 
37 |     def __init__(self, parser=None, empty_element_tags=None):
38 |         if empty_element_tags is not None:
39 |             self.empty_element_tags = set(empty_element_tags)
40 |         if parser is None:
41 |             # Use the default parser.
42 |             parser = self.default_parser
43 |         if isinstance(parser, collections.Callable):
44 |             # Instantiate the parser with default arguments
45 |             parser = parser(target=self, strip_cdata=False)
46 |         self.parser = parser
47 |         self.soup = None
48 |         self.nsmaps = None
49 | 
50 |     def _getNsTag(self, tag):
51 |         # Split the namespace URL out of a fully-qualified lxml tag
52 |         # name. Copied from lxml's src/lxml/sax.py.
53 |         if tag[0] == '{':
54 |             return tuple(tag[1:].split('}', 1))
55 |         else:
56 |             return (None, tag)
57 | 
58 |     def prepare_markup(self, markup, user_specified_encoding=None,
59 |                        document_declared_encoding=None):
60 |         """
61 |         :return: A 4-tuple (markup, original encoding, encoding declared
62 |             within markup, whether characters were replaced).
63 |         """
64 |         if isinstance(markup, unicode):
65 |             return markup, None, None, False
66 | 
67 |         try_encodings = [user_specified_encoding, document_declared_encoding]
68 |         dammit = UnicodeDammit(markup, try_encodings, is_html=True)
69 |         return (dammit.markup, dammit.original_encoding,
70 |                 dammit.declared_html_encoding,
71 |                 dammit.contains_replacement_characters)
72 | 
73 |     def feed(self, markup):
74 |         if isinstance(markup, basestring):
75 |             markup = StringIO(markup)
76 |         # Call feed() at least once, even if the markup is empty,
77 |         # or the parser won't be initialized.
78 |         data = markup.read(self.CHUNK_SIZE)
79 |         self.parser.feed(data)
80 |         while data != '':
81 |             # Now call feed() on the rest of the data, chunk by chunk.
82 | data = markup.read(self.CHUNK_SIZE) 83 | if data != '': 84 | self.parser.feed(data) 85 | self.parser.close() 86 | 87 | def close(self): 88 | self.nsmaps = None 89 | 90 | def start(self, name, attrs, nsmap={}): 91 | # Make sure attrs is a mutable dict--lxml may send an immutable dictproxy. 92 | attrs = dict(attrs) 93 | 94 | nsprefix = None 95 | # Invert each namespace map as it comes in. 96 | if len(nsmap) == 0 and self.nsmaps != None: 97 | # There are no new namespaces for this tag, but namespaces 98 | # are in play, so we need a separate tag stack to know 99 | # when they end. 100 | self.nsmaps.append(None) 101 | elif len(nsmap) > 0: 102 | # A new namespace mapping has come into play. 103 | if self.nsmaps is None: 104 | self.nsmaps = [] 105 | inverted_nsmap = dict((value, key) for key, value in nsmap.items()) 106 | self.nsmaps.append(inverted_nsmap) 107 | # Also treat the namespace mapping as a set of attributes on the 108 | # tag, so we can recreate it later. 109 | attrs = attrs.copy() 110 | for prefix, namespace in nsmap.items(): 111 | attribute = NamespacedAttribute( 112 | "xmlns", prefix, "http://www.w3.org/2000/xmlns/") 113 | attrs[attribute] = namespace 114 | namespace, name = self._getNsTag(name) 115 | if namespace is not None: 116 | for inverted_nsmap in reversed(self.nsmaps): 117 | if inverted_nsmap is not None and namespace in inverted_nsmap: 118 | nsprefix = inverted_nsmap[namespace] 119 | break 120 | self.soup.handle_starttag(name, namespace, nsprefix, attrs) 121 | 122 | def end(self, name): 123 | self.soup.endData() 124 | completed_tag = self.soup.tagStack[-1] 125 | namespace, name = self._getNsTag(name) 126 | nsprefix = None 127 | if namespace is not None: 128 | for inverted_nsmap in reversed(self.nsmaps): 129 | if inverted_nsmap is not None and namespace in inverted_nsmap: 130 | nsprefix = inverted_nsmap[namespace] 131 | break 132 | self.soup.handle_endtag(name, nsprefix) 133 | if self.nsmaps != None: 134 | # This tag, or one of its parents, 
introduced a namespace
135 |             # mapping, so pop it off the stack.
136 |             self.nsmaps.pop()
137 |             if len(self.nsmaps) == 0:
138 |                 # Namespaces are no longer in play, so don't bother keeping
139 |                 # track of the namespace stack.
140 |                 self.nsmaps = None
141 | 
142 |     def pi(self, target, data):
143 |         pass
144 | 
145 |     def data(self, content):
146 |         self.soup.handle_data(content)
147 | 
148 |     def doctype(self, name, pubid, system):
149 |         self.soup.endData()
150 |         doctype = Doctype.for_name_and_ids(name, pubid, system)
151 |         self.soup.object_was_parsed(doctype)
152 | 
153 |     def comment(self, content):
154 |         "Handle comments as Comment objects."
155 |         self.soup.endData()
156 |         self.soup.handle_data(content)
157 |         self.soup.endData(Comment)
158 | 
159 |     def test_fragment_to_document(self, fragment):
160 |         """See `TreeBuilder`."""
161 |         return u'<?xml version="1.0" encoding="utf-8"?>\n%s' % fragment
162 | 
163 | 
164 | class LXMLTreeBuilder(HTMLTreeBuilder, LXMLTreeBuilderForXML):
165 | 
166 |     features = [LXML, HTML, FAST, PERMISSIVE]
167 |     is_xml = False
168 | 
169 |     @property
170 |     def default_parser(self):
171 |         return etree.HTMLParser
172 | 
173 |     def feed(self, markup):
174 |         self.parser.feed(markup)
175 |         self.parser.close()
176 | 
177 |     def test_fragment_to_document(self, fragment):
178 |         """See `TreeBuilder`."""
179 |         return u'<html><body>%s</body></html>' % fragment
180 | 
-------------------------------------------------------------------------------- /source/bs4/testing.py: --------------------------------------------------------------------------------
1 | """Helper classes for tests."""
2 | 
3 | import copy
4 | import functools
5 | import unittest
6 | from unittest import TestCase
7 | from bs4 import BeautifulSoup
8 | from bs4.element import (
9 |     CharsetMetaAttributeValue,
10 |     Comment,
11 |     ContentMetaAttributeValue,
12 |     Doctype,
13 |     SoupStrainer,
14 | )
15 | 
16 | from bs4.builder import HTMLParserTreeBuilder
17 | default_builder = HTMLParserTreeBuilder
18 | 
19 | 
20 | class SoupTest(unittest.TestCase):
21 | 
22 |     @property
23 |     def default_builder(self):
24 |         return default_builder()
25 | 
26 |     def soup(self, markup, **kwargs):
27 |         """Build a Beautiful Soup object from markup."""
28 |         builder = kwargs.pop('builder', self.default_builder)
29 |         return BeautifulSoup(markup, builder=builder, **kwargs)
30 | 
31 |     def document_for(self, markup):
32 |         """Turn an HTML fragment into a document.
33 | 
34 |         The details depend on the builder.
35 |         """
36 |         return self.default_builder.test_fragment_to_document(markup)
37 | 
38 |     def assertSoupEquals(self, to_parse, compare_parsed_to=None):
39 |         builder = self.default_builder
40 |         obj = BeautifulSoup(to_parse, builder=builder)
41 |         if compare_parsed_to is None:
42 |             compare_parsed_to = to_parse
43 | 
44 |         self.assertEqual(obj.decode(), self.document_for(compare_parsed_to))
45 | 
46 | 
47 | class HTMLTreeBuilderSmokeTest(object):
48 | 
49 |     """A basic test of a treebuilder's competence.
50 | 
51 |     Any HTML treebuilder, present or future, should be able to pass
52 |     these tests. With invalid markup, there's room for interpretation,
53 |     and different parsers can handle it differently. But with the
54 |     markup in these tests, there's not much room for interpretation.
55 |     """
56 | 
57 |     def assertDoctypeHandled(self, doctype_fragment):
58 |         """Assert that a given doctype string is handled correctly."""
59 |         doctype_str, soup = self._document_with_doctype(doctype_fragment)
60 | 
61 |         # Make sure a Doctype object was created.
62 |         doctype = soup.contents[0]
63 |         self.assertEqual(doctype.__class__, Doctype)
64 |         self.assertEqual(doctype, doctype_fragment)
65 |         self.assertEqual(str(soup)[:len(doctype_str)], doctype_str)
66 | 
67 |         # Make sure that the doctype was correctly associated with the
68 |         # parse tree and that the rest of the document parsed.
69 | self.assertEqual(soup.p.contents[0], 'foo') 70 | 71 | def _document_with_doctype(self, doctype_fragment): 72 | """Generate and parse a document with the given doctype.""" 73 | doctype = '' % doctype_fragment 74 | markup = doctype + '\n

foo

' 75 | soup = self.soup(markup) 76 | return doctype, soup 77 | 78 | def test_normal_doctypes(self): 79 | """Make sure normal, everyday HTML doctypes are handled correctly.""" 80 | self.assertDoctypeHandled("html") 81 | self.assertDoctypeHandled( 82 | 'html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"') 83 | 84 | def test_public_doctype_with_url(self): 85 | doctype = 'html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"' 86 | self.assertDoctypeHandled(doctype) 87 | 88 | def test_system_doctype(self): 89 | self.assertDoctypeHandled('foo SYSTEM "http://www.example.com/"') 90 | 91 | def test_namespaced_system_doctype(self): 92 | # We can handle a namespaced doctype with a system ID. 93 | self.assertDoctypeHandled('xsl:stylesheet SYSTEM "htmlent.dtd"') 94 | 95 | def test_namespaced_public_doctype(self): 96 | # Test a namespaced doctype with a public id. 97 | self.assertDoctypeHandled('xsl:stylesheet PUBLIC "htmlent.dtd"') 98 | 99 | def test_real_xhtml_document(self): 100 | """A real XHTML document should come out more or less the same as it went in.""" 101 | markup = b""" 102 | 103 | 104 | Hello. 105 | Goodbye. 106 | """ 107 | soup = self.soup(markup) 108 | self.assertEqual( 109 | soup.encode("utf-8").replace(b"\n", b""), 110 | markup.replace(b"\n", b"")) 111 | 112 | def test_deepcopy(self): 113 | """Make sure you can copy the tree builder. 114 | 115 | This is important because the builder is part of a 116 | BeautifulSoup object, and we want to be able to copy that. 117 | """ 118 | copy.deepcopy(self.default_builder) 119 | 120 | def test_p_tag_is_never_empty_element(self): 121 | """A

tag is never designated as an empty-element tag. 122 | 123 | Even if the markup shows it as an empty-element tag, it 124 | shouldn't be presented that way. 125 | """ 126 | soup = self.soup("

") 127 | self.assertFalse(soup.p.is_empty_element) 128 | self.assertEqual(str(soup.p), "

") 129 | 130 | def test_unclosed_tags_get_closed(self): 131 | """A tag that's not closed by the end of the document should be closed. 132 | 133 | This applies to all tags except empty-element tags. 134 | """ 135 | self.assertSoupEquals("

", "

") 136 | self.assertSoupEquals("", "") 137 | 138 | self.assertSoupEquals("
", "
") 139 | 140 | def test_br_is_always_empty_element_tag(self): 141 | """A
tag is designated as an empty-element tag. 142 | 143 | Some parsers treat

as one
tag, some parsers as 144 | two tags, but it should always be an empty-element tag. 145 | """ 146 | soup = self.soup("

") 147 | self.assertTrue(soup.br.is_empty_element) 148 | self.assertEqual(str(soup.br), "
") 149 | 150 | def test_nested_formatting_elements(self): 151 | self.assertSoupEquals("") 152 | 153 | def test_comment(self): 154 | # Comments are represented as Comment objects. 155 | markup = "

foobaz

" 156 | self.assertSoupEquals(markup) 157 | 158 | soup = self.soup(markup) 159 | comment = soup.find(text="foobar") 160 | self.assertEqual(comment.__class__, Comment) 161 | 162 | def test_preserved_whitespace_in_pre_and_textarea(self): 163 | """Whitespace must be preserved in
 and ")
166 | 
167 |     def test_nested_inline_elements(self):
168 |         """Inline elements can be nested indefinitely."""
169 |         b_tag = "Inside a B tag"
170 |         self.assertSoupEquals(b_tag)
171 | 
172 |         nested_b_tag = "

A nested tag

" 173 | self.assertSoupEquals(nested_b_tag) 174 | 175 | double_nested_b_tag = "

A doubly nested tag

" 176 | self.assertSoupEquals(nested_b_tag) 177 | 178 | def test_nested_block_level_elements(self): 179 | """Block elements can be nested.""" 180 | soup = self.soup('

Foo

') 181 | blockquote = soup.blockquote 182 | self.assertEqual(blockquote.p.b.string, 'Foo') 183 | self.assertEqual(blockquote.b.string, 'Foo') 184 | 185 | def test_correctly_nested_tables(self): 186 | """One table can go inside another one.""" 187 | markup = ('' 188 | '' 189 | "') 193 | 194 | self.assertSoupEquals( 195 | markup, 196 | '
Here's another table:" 190 | '' 191 | '' 192 | '
foo
Here\'s another table:' 197 | '
foo
' 198 | '
') 199 | 200 | self.assertSoupEquals( 201 | "" 202 | "" 203 | "
Foo
Bar
Baz
") 204 | 205 | def test_angle_brackets_in_attribute_values_are_escaped(self): 206 | self.assertSoupEquals('', '') 207 | 208 | def test_entities_in_attributes_converted_to_unicode(self): 209 | expect = u'

' 210 | self.assertSoupEquals('

', expect) 211 | self.assertSoupEquals('

', expect) 212 | self.assertSoupEquals('

', expect) 213 | 214 | def test_entities_in_text_converted_to_unicode(self): 215 | expect = u'

pi\N{LATIN SMALL LETTER N WITH TILDE}ata

' 216 | self.assertSoupEquals("

piñata

", expect) 217 | self.assertSoupEquals("

piñata

", expect) 218 | self.assertSoupEquals("

piñata

", expect) 219 | 220 | def test_quot_entity_converted_to_quotation_mark(self): 221 | self.assertSoupEquals("

I said "good day!"

", 222 | '

I said "good day!"

') 223 | 224 | def test_out_of_range_entity(self): 225 | expect = u"\N{REPLACEMENT CHARACTER}" 226 | self.assertSoupEquals("�", expect) 227 | self.assertSoupEquals("�", expect) 228 | self.assertSoupEquals("�", expect) 229 | 230 | def test_basic_namespaces(self): 231 | """Parsers don't need to *understand* namespaces, but at the 232 | very least they should not choke on namespaces or lose 233 | data.""" 234 | 235 | markup = b'4' 236 | soup = self.soup(markup) 237 | self.assertEqual(markup, soup.encode()) 238 | html = soup.html 239 | self.assertEqual('http://www.w3.org/1999/xhtml', soup.html['xmlns']) 240 | self.assertEqual( 241 | 'http://www.w3.org/1998/Math/MathML', soup.html['xmlns:mathml']) 242 | self.assertEqual( 243 | 'http://www.w3.org/2000/svg', soup.html['xmlns:svg']) 244 | 245 | def test_multivalued_attribute_value_becomes_list(self): 246 | markup = b'' 247 | soup = self.soup(markup) 248 | self.assertEqual(['foo', 'bar'], soup.a['class']) 249 | 250 | # 251 | # Generally speaking, tests below this point are more tests of 252 | # Beautiful Soup than tests of the tree builders. But parsers are 253 | # weird, so we run these tests separately for every tree builder 254 | # to detect any differences between them. 
255 | # 256 | 257 | def test_soupstrainer(self): 258 | """Parsers should be able to work with SoupStrainers.""" 259 | strainer = SoupStrainer("b") 260 | soup = self.soup("A bold statement", 261 | parse_only=strainer) 262 | self.assertEqual(soup.decode(), "bold") 263 | 264 | def test_single_quote_attribute_values_become_double_quotes(self): 265 | self.assertSoupEquals("", 266 | '') 267 | 268 | def test_attribute_values_with_nested_quotes_are_left_alone(self): 269 | text = """a""" 270 | self.assertSoupEquals(text) 271 | 272 | def test_attribute_values_with_double_nested_quotes_get_quoted(self): 273 | text = """a""" 274 | soup = self.soup(text) 275 | soup.foo['attr'] = 'Brawls happen at "Bob\'s Bar"' 276 | self.assertSoupEquals( 277 | soup.foo.decode(), 278 | """a""") 279 | 280 | def test_ampersand_in_attribute_value_gets_escaped(self): 281 | self.assertSoupEquals('', 282 | '') 283 | 284 | self.assertSoupEquals( 285 | 'foo', 286 | 'foo') 287 | 288 | def test_escaped_ampersand_in_attribute_value_is_left_alone(self): 289 | self.assertSoupEquals('') 290 | 291 | def test_entities_in_strings_converted_during_parsing(self): 292 | # Both XML and HTML entities are converted to Unicode characters 293 | # during parsing. 294 | text = "

<<sacré bleu!>>

" 295 | expected = u"

<<sacr\N{LATIN SMALL LETTER E WITH ACUTE} bleu!>>

" 296 | self.assertSoupEquals(text, expected) 297 | 298 | def test_smart_quotes_converted_on_the_way_in(self): 299 | # Microsoft smart quotes are converted to Unicode characters during 300 | # parsing. 301 | quote = b"

\x91Foo\x92

" 302 | soup = self.soup(quote) 303 | self.assertEqual( 304 | soup.p.string, 305 | u"\N{LEFT SINGLE QUOTATION MARK}Foo\N{RIGHT SINGLE QUOTATION MARK}") 306 | 307 | def test_non_breaking_spaces_converted_on_the_way_in(self): 308 | soup = self.soup("  ") 309 | self.assertEqual(soup.a.string, u"\N{NO-BREAK SPACE}" * 2) 310 | 311 | def test_entities_converted_on_the_way_out(self): 312 | text = "

<<sacré bleu!>>

" 313 | expected = u"

<<sacr\N{LATIN SMALL LETTER E WITH ACUTE} bleu!>>

".encode("utf-8") 314 | soup = self.soup(text) 315 | self.assertEqual(soup.p.encode("utf-8"), expected) 316 | 317 | def test_real_iso_latin_document(self): 318 | # Smoke test of interrelated functionality, using an 319 | # easy-to-understand document. 320 | 321 | # Here it is in Unicode. Note that it claims to be in ISO-Latin-1. 322 | unicode_html = u'

Sacr\N{LATIN SMALL LETTER E WITH ACUTE} bleu!

' 323 | 324 | # That's because we're going to encode it into ISO-Latin-1, and use 325 | # that to test. 326 | iso_latin_html = unicode_html.encode("iso-8859-1") 327 | 328 | # Parse the ISO-Latin-1 HTML. 329 | soup = self.soup(iso_latin_html) 330 | # Encode it to UTF-8. 331 | result = soup.encode("utf-8") 332 | 333 | # What do we expect the result to look like? Well, it would 334 | # look like unicode_html, except that the META tag would say 335 | # UTF-8 instead of ISO-Latin-1. 336 | expected = unicode_html.replace("ISO-Latin-1", "utf-8") 337 | 338 | # And, of course, it would be in UTF-8, not Unicode. 339 | expected = expected.encode("utf-8") 340 | 341 | # Ta-da! 342 | self.assertEqual(result, expected) 343 | 344 | def test_real_shift_jis_document(self): 345 | # Smoke test to make sure the parser can handle a document in 346 | # Shift-JIS encoding, without choking. 347 | shift_jis_html = ( 348 | b'
'
349 |             b'\x82\xb1\x82\xea\x82\xcdShift-JIS\x82\xc5\x83R\x81[\x83f'
350 |             b'\x83B\x83\x93\x83O\x82\xb3\x82\xea\x82\xbd\x93\xfa\x96{\x8c'
351 |             b'\xea\x82\xcc\x83t\x83@\x83C\x83\x8b\x82\xc5\x82\xb7\x81B'
352 |             b'
') 353 | unicode_html = shift_jis_html.decode("shift-jis") 354 | soup = self.soup(unicode_html) 355 | 356 | # Make sure the parse tree is correctly encoded to various 357 | # encodings. 358 | self.assertEqual(soup.encode("utf-8"), unicode_html.encode("utf-8")) 359 | self.assertEqual(soup.encode("euc_jp"), unicode_html.encode("euc_jp")) 360 | 361 | def test_real_hebrew_document(self): 362 | # A real-world test to make sure we can convert ISO-8859-9 (a 363 | # Hebrew encoding) to UTF-8. 364 | hebrew_document = b'Hebrew (ISO 8859-8) in Visual Directionality

Hebrew (ISO 8859-8) in Visual Directionality

\xed\xe5\xec\xf9' 365 | soup = self.soup( 366 | hebrew_document, from_encoding="iso8859-8") 367 | self.assertEqual(soup.original_encoding, 'iso8859-8') 368 | self.assertEqual( 369 | soup.encode('utf-8'), 370 | hebrew_document.decode("iso8859-8").encode("utf-8")) 371 | 372 | def test_meta_tag_reflects_current_encoding(self): 373 | # Here's the tag saying that a document is 374 | # encoded in Shift-JIS. 375 | meta_tag = ('') 377 | 378 | # Here's a document incorporating that meta tag. 379 | shift_jis_html = ( 380 | '\n%s\n' 381 | '' 382 | 'Shift-JIS markup goes here.') % meta_tag 383 | soup = self.soup(shift_jis_html) 384 | 385 | # Parse the document, and the charset is seemingly unaffected. 386 | parsed_meta = soup.find('meta', {'http-equiv': 'Content-type'}) 387 | content = parsed_meta['content'] 388 | self.assertEqual('text/html; charset=x-sjis', content) 389 | 390 | # But that value is actually a ContentMetaAttributeValue object. 391 | self.assertTrue(isinstance(content, ContentMetaAttributeValue)) 392 | 393 | # And it will take on a value that reflects its current 394 | # encoding. 395 | self.assertEqual('text/html; charset=utf8', content.encode("utf8")) 396 | 397 | # For the rest of the story, see TestSubstitutions in 398 | # test_tree.py. 399 | 400 | def test_html5_style_meta_tag_reflects_current_encoding(self): 401 | # Here's the tag saying that a document is 402 | # encoded in Shift-JIS. 403 | meta_tag = ('') 404 | 405 | # Here's a document incorporating that meta tag. 406 | shift_jis_html = ( 407 | '\n%s\n' 408 | '' 409 | 'Shift-JIS markup goes here.') % meta_tag 410 | soup = self.soup(shift_jis_html) 411 | 412 | # Parse the document, and the charset is seemingly unaffected. 413 | parsed_meta = soup.find('meta', id="encoding") 414 | charset = parsed_meta['charset'] 415 | self.assertEqual('x-sjis', charset) 416 | 417 | # But that value is actually a CharsetMetaAttributeValue object. 
418 | self.assertTrue(isinstance(charset, CharsetMetaAttributeValue)) 419 | 420 | # And it will take on a value that reflects its current 421 | # encoding. 422 | self.assertEqual('utf8', charset.encode("utf8")) 423 | 424 | def test_tag_with_no_attributes_can_have_attributes_added(self): 425 | data = self.soup("<a>text</a>") 426 | data.a['foo'] = 'bar' 427 | self.assertEqual('<a foo="bar">text</a>', data.a.decode()) 428 | 429 | class XMLTreeBuilderSmokeTest(object): 430 | 431 | def test_docstring_generated(self): 432 | soup = self.soup("<root/>") 433 | self.assertEqual( 434 | soup.encode(), b'<?xml version="1.0" encoding="utf-8"?>\n<root/>') 435 | 436 | def test_real_xhtml_document(self): 437 | """A real XHTML document should come out *exactly* the same as it went in.""" 438 | markup = b"""<?xml version="1.0" encoding="utf-8"?> 439 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"> 440 | <html xmlns="http://www.w3.org/1999/xhtml"> 441 | <head><title>Hello.</title></head> 442 | <body>Goodbye.</body> 443 | </html>""" 444 | soup = self.soup(markup) 445 | self.assertEqual( 446 | soup.encode("utf-8"), markup) 447 | 448 | 449 | def test_docstring_includes_correct_encoding(self): 450 | soup = self.soup("<root/>") 451 | self.assertEqual( 452 | soup.encode("latin1"), 453 | b'<?xml version="1.0" encoding="latin1"?>\n<root/>') 454 | 455 | def test_large_xml_document(self): 456 | """A large XML document should come out the same as it went in.""" 457 | markup = (b'<?xml version="1.0" encoding="utf-8"?>\n<root>' 458 | + b'0' * (2**12) 459 | + b'</root>') 460 | soup = self.soup(markup) 461 | self.assertEqual(soup.encode("utf-8"), markup) 462 | 463 | 464 | def test_tags_are_empty_element_if_and_only_if_they_are_empty(self): 465 | self.assertSoupEquals("<p>", "<p/>") 466 | self.assertSoupEquals("<p>foo</p>") 467 | 468 | def test_namespaces_are_preserved(self): 469 | markup = '<root xmlns:a="http://example.com/" xmlns:b="http://example.net/"><a:tag>This tag is in the a namespace</a:tag><b:tag>This tag is in the b namespace</b:tag></root>' 470 | soup = self.soup(markup) 471 | root = soup.root 472 | self.assertEqual("http://example.com/", root['xmlns:a']) 473 | self.assertEqual("http://example.net/", root['xmlns:b']) 474 | 475 | 476 | class HTML5TreeBuilderSmokeTest(HTMLTreeBuilderSmokeTest): 477 | """Smoke test for a tree builder that supports HTML5.""" 478 | 479 | def test_real_xhtml_document(self): 480 | # Since XHTML is not HTML5, HTML5 parsers are not tested to handle 481 | # XHTML documents in any particular way. 482 | pass 483 | 484 | def test_html_tags_have_namespace(self): 485 | markup = "<a>" 486 | soup = self.soup(markup) 487 | self.assertEqual("http://www.w3.org/1999/xhtml", soup.a.namespace) 488 | 489 | def test_svg_tags_have_namespace(self): 490 | markup = '<svg><circle/></svg>' 491 | soup = self.soup(markup) 492 | namespace = "http://www.w3.org/2000/svg" 493 | self.assertEqual(namespace, soup.svg.namespace) 494 | self.assertEqual(namespace, soup.circle.namespace) 495 | 496 | 497 | def test_mathml_tags_have_namespace(self): 498 | markup = '<math><msqrt>5</msqrt></math>' 499 | soup = self.soup(markup) 500 | namespace = 'http://www.w3.org/1998/Math/MathML' 501 | self.assertEqual(namespace, soup.math.namespace) 502 | self.assertEqual(namespace, soup.msqrt.namespace) 503 | 504 | 505 | def skipIf(condition, reason): 506 | def nothing(test, *args, **kwargs): 507 | return None 508 | 509 | def decorator(test_item): 510 | if condition: 511 | return nothing 512 | else: 513 | return test_item 514 | 515 | return decorator 516 | -------------------------------------------------------------------------------- /source/bs4/tests/__init__.py: -------------------------------------------------------------------------------- 1 | "The beautifulsoup tests."
2 | -------------------------------------------------------------------------------- /source/bs4/tests/test_builder_registry.py: -------------------------------------------------------------------------------- 1 | """Tests of the builder registry.""" 2 | 3 | import unittest 4 | 5 | from bs4 import BeautifulSoup 6 | from bs4.builder import ( 7 | builder_registry as registry, 8 | HTMLParserTreeBuilder, 9 | TreeBuilderRegistry, 10 | ) 11 | 12 | try: 13 | from bs4.builder import HTML5TreeBuilder 14 | HTML5LIB_PRESENT = True 15 | except ImportError: 16 | HTML5LIB_PRESENT = False 17 | 18 | try: 19 | from bs4.builder import ( 20 | LXMLTreeBuilderForXML, 21 | LXMLTreeBuilder, 22 | ) 23 | LXML_PRESENT = True 24 | except ImportError: 25 | LXML_PRESENT = False 26 | 27 | 28 | class BuiltInRegistryTest(unittest.TestCase): 29 | """Test the built-in registry with the default builders registered.""" 30 | 31 | def test_combination(self): 32 | if LXML_PRESENT: 33 | self.assertEqual(registry.lookup('fast', 'html'), 34 | LXMLTreeBuilder) 35 | 36 | if LXML_PRESENT: 37 | self.assertEqual(registry.lookup('permissive', 'xml'), 38 | LXMLTreeBuilderForXML) 39 | self.assertEqual(registry.lookup('strict', 'html'), 40 | HTMLParserTreeBuilder) 41 | if HTML5LIB_PRESENT: 42 | self.assertEqual(registry.lookup('html5lib', 'html'), 43 | HTML5TreeBuilder) 44 | 45 | def test_lookup_by_markup_type(self): 46 | if LXML_PRESENT: 47 | self.assertEqual(registry.lookup('html'), LXMLTreeBuilder) 48 | self.assertEqual(registry.lookup('xml'), LXMLTreeBuilderForXML) 49 | else: 50 | self.assertEqual(registry.lookup('xml'), None) 51 | if HTML5LIB_PRESENT: 52 | self.assertEqual(registry.lookup('html'), HTML5TreeBuilder) 53 | else: 54 | self.assertEqual(registry.lookup('html'), HTMLParserTreeBuilder) 55 | 56 | def test_named_library(self): 57 | if LXML_PRESENT: 58 | self.assertEqual(registry.lookup('lxml', 'xml'), 59 | LXMLTreeBuilderForXML) 60 | self.assertEqual(registry.lookup('lxml', 'html'), 61 | 
LXMLTreeBuilder) 62 | if HTML5LIB_PRESENT: 63 | self.assertEqual(registry.lookup('html5lib'), 64 | HTML5TreeBuilder) 65 | 66 | self.assertEqual(registry.lookup('html.parser'), 67 | HTMLParserTreeBuilder) 68 | 69 | def test_beautifulsoup_constructor_does_lookup(self): 70 | # You can pass in a string. 71 | BeautifulSoup("", features="html") 72 | # Or a list of strings. 73 | BeautifulSoup("", features=["html", "fast"]) 74 | 75 | # You'll get an exception if BS can't find an appropriate 76 | # builder. 77 | self.assertRaises(ValueError, BeautifulSoup, 78 | "", features="no-such-feature") 79 | 80 | class RegistryTest(unittest.TestCase): 81 | """Test the TreeBuilderRegistry class in general.""" 82 | 83 | def setUp(self): 84 | self.registry = TreeBuilderRegistry() 85 | 86 | def builder_for_features(self, *feature_list): 87 | cls = type('Builder_' + '_'.join(feature_list), 88 | (object,), {'features' : feature_list}) 89 | 90 | self.registry.register(cls) 91 | return cls 92 | 93 | def test_register_with_no_features(self): 94 | builder = self.builder_for_features() 95 | 96 | # Since the builder advertises no features, you can't find it 97 | # by looking up features. 98 | self.assertEqual(self.registry.lookup('foo'), None) 99 | 100 | # But you can find it by doing a lookup with no features, if 101 | # this happens to be the only registered builder. 
102 | self.assertEqual(self.registry.lookup(), builder) 103 | 104 | def test_register_with_features_makes_lookup_succeed(self): 105 | builder = self.builder_for_features('foo', 'bar') 106 | self.assertEqual(self.registry.lookup('foo'), builder) 107 | self.assertEqual(self.registry.lookup('bar'), builder) 108 | 109 | def test_lookup_fails_when_no_builder_implements_feature(self): 110 | builder = self.builder_for_features('foo', 'bar') 111 | self.assertEqual(self.registry.lookup('baz'), None) 112 | 113 | def test_lookup_gets_most_recent_registration_when_no_feature_specified(self): 114 | builder1 = self.builder_for_features('foo') 115 | builder2 = self.builder_for_features('bar') 116 | self.assertEqual(self.registry.lookup(), builder2) 117 | 118 | def test_lookup_fails_when_no_tree_builders_registered(self): 119 | self.assertEqual(self.registry.lookup(), None) 120 | 121 | def test_lookup_gets_most_recent_builder_supporting_all_features(self): 122 | has_one = self.builder_for_features('foo') 123 | has_the_other = self.builder_for_features('bar') 124 | has_both_early = self.builder_for_features('foo', 'bar', 'baz') 125 | has_both_late = self.builder_for_features('foo', 'bar', 'quux') 126 | lacks_one = self.builder_for_features('bar') 127 | has_the_other = self.builder_for_features('foo') 128 | 129 | # There are two builders featuring 'foo' and 'bar', but 130 | # the one that also features 'quux' was registered later. 131 | self.assertEqual(self.registry.lookup('foo', 'bar'), 132 | has_both_late) 133 | 134 | # There is only one builder featuring 'foo', 'bar', and 'baz'. 
135 | self.assertEqual(self.registry.lookup('foo', 'bar', 'baz'), 136 | has_both_early) 137 | 138 | def test_lookup_fails_when_cannot_reconcile_requested_features(self): 139 | builder1 = self.builder_for_features('foo', 'bar') 140 | builder2 = self.builder_for_features('foo', 'baz') 141 | self.assertEqual(self.registry.lookup('bar', 'baz'), None) 142 | -------------------------------------------------------------------------------- /source/bs4/tests/test_docs.py: -------------------------------------------------------------------------------- 1 | "Test harness for doctests." 2 | 3 | # pylint: disable-msg=E0611,W0142 4 | 5 | __metaclass__ = type 6 | __all__ = [ 7 | 'additional_tests', 8 | ] 9 | 10 | import atexit 11 | import doctest 12 | import os 13 | #from pkg_resources import ( 14 | # resource_filename, resource_exists, resource_listdir, cleanup_resources) 15 | import unittest 16 | 17 | DOCTEST_FLAGS = ( 18 | doctest.ELLIPSIS | 19 | doctest.NORMALIZE_WHITESPACE | 20 | doctest.REPORT_NDIFF) 21 | 22 | 23 | # def additional_tests(): 24 | # "Run the doc tests (README.txt and docs/*, if any exist)" 25 | # doctest_files = [ 26 | # os.path.abspath(resource_filename('bs4', 'README.txt'))] 27 | # if resource_exists('bs4', 'docs'): 28 | # for name in resource_listdir('bs4', 'docs'): 29 | # if name.endswith('.txt'): 30 | # doctest_files.append( 31 | # os.path.abspath( 32 | # resource_filename('bs4', 'docs/%s' % name))) 33 | # kwargs = dict(module_relative=False, optionflags=DOCTEST_FLAGS) 34 | # atexit.register(cleanup_resources) 35 | # return unittest.TestSuite(( 36 | # doctest.DocFileSuite(*doctest_files, **kwargs))) 37 | -------------------------------------------------------------------------------- /source/bs4/tests/test_html5lib.py: -------------------------------------------------------------------------------- 1 | """Tests to ensure that the html5lib tree builder generates good trees.""" 2 | 3 | import warnings 4 | 5 | try: 6 | from bs4.builder import 
HTML5TreeBuilder 7 | HTML5LIB_PRESENT = True 8 | except ImportError, e: 9 | HTML5LIB_PRESENT = False 10 | from bs4.element import SoupStrainer 11 | from bs4.testing import ( 12 | HTML5TreeBuilderSmokeTest, 13 | SoupTest, 14 | skipIf, 15 | ) 16 | 17 | @skipIf( 18 | not HTML5LIB_PRESENT, 19 | "html5lib seems not to be present, not testing its tree builder.") 20 | class HTML5LibBuilderSmokeTest(SoupTest, HTML5TreeBuilderSmokeTest): 21 | """See ``HTML5TreeBuilderSmokeTest``.""" 22 | 23 | @property 24 | def default_builder(self): 25 | return HTML5TreeBuilder() 26 | 27 | def test_soupstrainer(self): 28 | # The html5lib tree builder does not support SoupStrainers. 29 | strainer = SoupStrainer("b") 30 | markup = "<p>A <b>bold</b> statement.</p>" 31 | with warnings.catch_warnings(record=True) as w: 32 | soup = self.soup(markup, parse_only=strainer) 33 | self.assertEqual( 34 | soup.decode(), self.document_for(markup)) 35 | 36 | self.assertTrue( 37 | "the html5lib tree builder doesn't support parse_only" in 38 | str(w[0].message)) 39 | 40 | def test_correctly_nested_tables(self): 41 | """html5lib inserts <tbody> tags where other parsers don't.""" 42 | markup = ('<table id="1">' 43 | '<tr>' 44 | "<td>Here's another table:" 45 | '<table id="2">' 46 | '<tr><td>foo</td></tr>' 47 | '</table></td>') 48 | 49 | self.assertSoupEquals( 50 | markup, 51 | '<table id="1"><tbody><tr><td>Here\'s another table:' 52 | '<table id="2"><tbody><tr><td>foo</td></tr></tbody></table>' 53 | '</td></tr></tbody></table>') 54 | 55 | self.assertSoupEquals( 56 | "<table><thead><tr><td>Foo</td></tr></thead>" 57 | "<tbody><tr><td>Bar</td></tr></tbody>" 58 | "<tfoot><tr><td>Baz</td></tr></tfoot></table>") 59 | -------------------------------------------------------------------------------- /source/bs4/tests/test_htmlparser.py: -------------------------------------------------------------------------------- 1 | """Tests to ensure that the html.parser tree builder generates good 2 | trees.""" 3 | 4 | from bs4.testing import SoupTest, HTMLTreeBuilderSmokeTest 5 | from bs4.builder import HTMLParserTreeBuilder 6 | 7 | class HTMLParserTreeBuilderSmokeTest(SoupTest, HTMLTreeBuilderSmokeTest): 8 | 9 | @property 10 | def default_builder(self): 11 | return HTMLParserTreeBuilder() 12 | 13 | def test_namespaced_system_doctype(self): 14 | # html.parser can't handle namespaced doctypes, so skip this one. 15 | pass 16 | 17 | def test_namespaced_public_doctype(self): 18 | # html.parser can't handle namespaced doctypes, so skip this one. 19 | pass 20 | -------------------------------------------------------------------------------- /source/bs4/tests/test_lxml.py: -------------------------------------------------------------------------------- 1 | """Tests to ensure that the lxml tree builder generates good trees.""" 2 | 3 | import re 4 | import warnings 5 | 6 | try: 7 | from bs4.builder import LXMLTreeBuilder, LXMLTreeBuilderForXML 8 | LXML_PRESENT = True 9 | except ImportError, e: 10 | LXML_PRESENT = False 11 | 12 | from bs4 import ( 13 | BeautifulSoup, 14 | BeautifulStoneSoup, 15 | ) 16 | from bs4.element import Comment, Doctype, SoupStrainer 17 | from bs4.testing import skipIf 18 | from bs4.tests import test_htmlparser 19 | from bs4.testing import ( 20 | HTMLTreeBuilderSmokeTest, 21 | XMLTreeBuilderSmokeTest, 22 | SoupTest, 23 | skipIf, 24 | ) 25 | 26 | @skipIf( 27 | not LXML_PRESENT, 28 | "lxml seems not to be present, not testing its tree builder.") 29 | class LXMLTreeBuilderSmokeTest(SoupTest, HTMLTreeBuilderSmokeTest): 30 | """See ``HTMLTreeBuilderSmokeTest``.""" 31 | 32 | @property 33 | def default_builder(self): 34 | return LXMLTreeBuilder() 35 | 36 | def
test_out_of_range_entity(self): 37 | self.assertSoupEquals( 38 | "<p>foo&#10000000000000;bar</p>", "<p>foobar</p>") 39 | self.assertSoupEquals( 40 | "<p>foo&#x10000000000000;bar</p>", "<p>foobar</p>") 41 | self.assertSoupEquals( 42 | "<p>foo&#1000000000;bar</p>", "<p>foobar</p>") 43 | 44 | def test_beautifulstonesoup_is_xml_parser(self): 45 | # Make sure that the deprecated BSS class uses an xml builder 46 | # if one is installed. 47 | with warnings.catch_warnings(record=False) as w: 48 | soup = BeautifulStoneSoup("<b />") 49 | self.assertEqual(u"<b/>", unicode(soup.b)) 50 | 51 | def test_real_xhtml_document(self): 52 | """lxml strips the XML definition from an XHTML doc, which is fine.""" 53 | markup = b"""<?xml version="1.0" encoding="utf-8"?> 54 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"> 55 | <html xmlns="http://www.w3.org/1999/xhtml"> 56 | <head><title>Hello.</title></head> 57 | <body>Goodbye.</body> 58 | </html>""" 59 | soup = self.soup(markup) 60 | self.assertEqual( 61 | soup.encode("utf-8").replace(b"\n", b''), 62 | markup.replace(b'\n', b'').replace( 63 | b'<?xml version="1.0" encoding="utf-8"?>', b'')) 64 | 65 | 66 | @skipIf( 67 | not LXML_PRESENT, 68 | "lxml seems not to be present, not testing its XML tree builder.") 69 | class LXMLXMLTreeBuilderSmokeTest(SoupTest, XMLTreeBuilderSmokeTest): 70 | """See ``HTMLTreeBuilderSmokeTest``.""" 71 | 72 | @property 73 | def default_builder(self): 74 | return LXMLTreeBuilderForXML() 75 | 76 | -------------------------------------------------------------------------------- /source/bs4/tests/test_soup.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Tests of Beautiful Soup as a whole.""" 3 | 4 | import unittest 5 | from bs4 import ( 6 | BeautifulSoup, 7 | BeautifulStoneSoup, 8 | ) 9 | from bs4.element import ( 10 | CharsetMetaAttributeValue, 11 | ContentMetaAttributeValue, 12 | SoupStrainer, 13 | NamespacedAttribute, 14 | ) 15 | import bs4.dammit 16 | from bs4.dammit import EntitySubstitution, UnicodeDammit 17 | from bs4.testing import ( 18 | SoupTest, 19 | skipIf, 20 | ) 21 | import warnings 22 | 23 | try: 24 | from bs4.builder import LXMLTreeBuilder, LXMLTreeBuilderForXML 25 | LXML_PRESENT = True 26 | except ImportError, e: 27 | LXML_PRESENT = False 28 | 29 | class TestDeprecatedConstructorArguments(SoupTest): 30 | 31 | def test_parseOnlyThese_renamed_to_parse_only(self): 32 | with warnings.catch_warnings(record=True) as w: 33 |
soup = self.soup("<a><b></b></a>", parseOnlyThese=SoupStrainer("b")) 34 | msg = str(w[0].message) 35 | self.assertTrue("parseOnlyThese" in msg) 36 | self.assertTrue("parse_only" in msg) 37 | self.assertEqual(b"<b></b>", soup.encode()) 38 | 39 | def test_fromEncoding_renamed_to_from_encoding(self): 40 | with warnings.catch_warnings(record=True) as w: 41 | utf8 = b"\xc3\xa9" 42 | soup = self.soup(utf8, fromEncoding="utf8") 43 | msg = str(w[0].message) 44 | self.assertTrue("fromEncoding" in msg) 45 | self.assertTrue("from_encoding" in msg) 46 | self.assertEqual("utf8", soup.original_encoding) 47 | 48 | def test_unrecognized_keyword_argument(self): 49 | self.assertRaises( 50 | TypeError, self.soup, "<a>", no_such_argument=True) 51 | 52 | @skipIf( 53 | not LXML_PRESENT, 54 | "lxml not present, not testing BeautifulStoneSoup.") 55 | def test_beautifulstonesoup(self): 56 | with warnings.catch_warnings(record=True) as w: 57 | soup = BeautifulStoneSoup("<markup>") 58 | self.assertTrue(isinstance(soup, BeautifulSoup)) 59 | self.assertTrue("BeautifulStoneSoup class is deprecated") 60 | 61 | class TestSelectiveParsing(SoupTest): 62 | 63 | def test_parse_with_soupstrainer(self): 64 | markup = "No<b>Yes</b>No<b>Yes <c>Yes</c></b>" 65 | strainer = SoupStrainer("b") 66 | soup = self.soup(markup, parse_only=strainer) 67 | self.assertEqual(soup.encode(), b"<b>Yes</b><b>Yes <c>Yes</c></b>") 68 | 69 | 70 | class TestEntitySubstitution(unittest.TestCase): 71 | """Standalone tests of the EntitySubstitution class.""" 72 | def setUp(self): 73 | self.sub = EntitySubstitution 74 | 75 | def test_simple_html_substitution(self): 76 | # Unicode characters corresponding to named HTML entities 77 | # are substituted, and no others. 78 | s = u"foo\u2200\N{SNOWMAN}\u00f5bar" 79 | self.assertEqual(self.sub.substitute_html(s), 80 | u"foo&forall;\N{SNOWMAN}&otilde;bar") 81 | 82 | def test_smart_quote_substitution(self): 83 | # MS smart quotes are a common source of frustration, so we 84 | # give them a special test.
85 | quotes = b"\x91\x92foo\x93\x94" 86 | dammit = UnicodeDammit(quotes) 87 | self.assertEqual(self.sub.substitute_html(dammit.markup), 88 | "&lsquo;&rsquo;foo&ldquo;&rdquo;") 89 | 90 | def test_xml_conversion_includes_no_quotes_if_make_quoted_attribute_is_false(self): 91 | s = 'Welcome to "my bar"' 92 | self.assertEqual(self.sub.substitute_xml(s, False), s) 93 | 94 | def test_xml_attribute_quoting_normally_uses_double_quotes(self): 95 | self.assertEqual(self.sub.substitute_xml("Welcome", True), 96 | '"Welcome"') 97 | self.assertEqual(self.sub.substitute_xml("Bob's Bar", True), 98 | '"Bob\'s Bar"') 99 | 100 | def test_xml_attribute_quoting_uses_single_quotes_when_value_contains_double_quotes(self): 101 | s = 'Welcome to "my bar"' 102 | self.assertEqual(self.sub.substitute_xml(s, True), 103 | "'Welcome to \"my bar\"'") 104 | 105 | def test_xml_attribute_quoting_escapes_single_quotes_when_value_contains_both_single_and_double_quotes(self): 106 | s = 'Welcome to "Bob\'s Bar"' 107 | self.assertEqual( 108 | self.sub.substitute_xml(s, True), 109 | '"Welcome to &quot;Bob\'s Bar&quot;"') 110 | 111 | def test_xml_quotes_arent_escaped_when_value_is_not_being_quoted(self): 112 | quoted = 'Welcome to "Bob\'s Bar"' 113 | self.assertEqual(self.sub.substitute_xml(quoted), quoted) 114 | 115 | def test_xml_quoting_handles_angle_brackets(self): 116 | self.assertEqual( 117 | self.sub.substitute_xml("foo<bar>"), 118 | "foo&lt;bar&gt;") 119 | 120 | def test_xml_quoting_handles_ampersands(self): 121 | self.assertEqual(self.sub.substitute_xml("AT&T"), "AT&amp;T") 122 | 123 | def test_xml_quoting_ignores_ampersands_when_they_are_part_of_an_entity(self): 124 | self.assertEqual( 125 | self.sub.substitute_xml("&Aacute;T&T"), 126 | "&Aacute;T&amp;T") 127 | 128 | def test_quotes_not_html_substituted(self): 129 | """There's no need to do this except inside attribute values.""" 130 | text = 'Bob\'s "bar"' 131 | self.assertEqual(self.sub.substitute_html(text), text) 132 | 133 | 134 | class TestEncodingConversion(SoupTest): 135 | # Test Beautiful Soup's ability
to decode and encode from various 136 | # encodings. 137 | 138 | def setUp(self): 139 | super(TestEncodingConversion, self).setUp() 140 | self.unicode_data = u"<html><head></head><body><foo>Sacr\N{LATIN SMALL LETTER E WITH ACUTE} bleu!</foo></body></html>" 141 | self.utf8_data = self.unicode_data.encode("utf-8") 142 | # Just so you know what it looks like. 143 | self.assertEqual( 144 | self.utf8_data, 145 | b"<html><head></head><body><foo>Sacr\xc3\xa9 bleu!</foo></body></html>") 146 | 147 | def test_ascii_in_unicode_out(self): 148 | # ASCII input is converted to Unicode. The original_encoding 149 | # attribute is set. 150 | ascii = b"<foo>a</foo>" 151 | soup_from_ascii = self.soup(ascii) 152 | unicode_output = soup_from_ascii.decode() 153 | self.assertTrue(isinstance(unicode_output, unicode)) 154 | self.assertEqual(unicode_output, self.document_for(ascii.decode())) 155 | self.assertEqual(soup_from_ascii.original_encoding, "ascii") 156 | 157 | def test_unicode_in_unicode_out(self): 158 | # Unicode input is left alone. The original_encoding attribute 159 | # is not set. 160 | soup_from_unicode = self.soup(self.unicode_data) 161 | self.assertEqual(soup_from_unicode.decode(), self.unicode_data) 162 | self.assertEqual(soup_from_unicode.foo.string, u'Sacr\xe9 bleu!') 163 | self.assertEqual(soup_from_unicode.original_encoding, None) 164 | 165 | def test_utf8_in_unicode_out(self): 166 | # UTF-8 input is converted to Unicode. The original_encoding 167 | # attribute is set. 168 | soup_from_utf8 = self.soup(self.utf8_data) 169 | self.assertEqual(soup_from_utf8.decode(), self.unicode_data) 170 | self.assertEqual(soup_from_utf8.foo.string, u'Sacr\xe9 bleu!') 171 | 172 | def test_utf8_out(self): 173 | # The internal data structures can be encoded as UTF-8.
174 | soup_from_unicode = self.soup(self.unicode_data) 175 | self.assertEqual(soup_from_unicode.encode('utf-8'), self.utf8_data) 176 | 177 | 178 | class TestUnicodeDammit(unittest.TestCase): 179 | """Standalone tests of Unicode, Dammit.""" 180 | 181 | def test_smart_quotes_to_unicode(self): 182 | markup = b"<foo>\x91\x92\x93\x94</foo>" 183 | dammit = UnicodeDammit(markup) 184 | self.assertEqual( 185 | dammit.unicode_markup, u"<foo>\u2018\u2019\u201c\u201d</foo>") 186 | 187 | def test_smart_quotes_to_xml_entities(self): 188 | markup = b"<foo>\x91\x92\x93\x94</foo>" 189 | dammit = UnicodeDammit(markup, smart_quotes_to="xml") 190 | self.assertEqual( 191 | dammit.unicode_markup, "<foo>&#x2018;&#x2019;&#x201C;&#x201D;</foo>") 192 | 193 | def test_smart_quotes_to_html_entities(self): 194 | markup = b"<foo>\x91\x92\x93\x94</foo>" 195 | dammit = UnicodeDammit(markup, smart_quotes_to="html") 196 | self.assertEqual( 197 | dammit.unicode_markup, "<foo>&lsquo;&rsquo;&ldquo;&rdquo;</foo>") 198 | 199 | def test_smart_quotes_to_ascii(self): 200 | markup = b"<foo>\x91\x92\x93\x94</foo>" 201 | dammit = UnicodeDammit(markup, smart_quotes_to="ascii") 202 | self.assertEqual( 203 | dammit.unicode_markup, """<foo>''""</foo>""") 204 | 205 | def test_detect_utf8(self): 206 | utf8 = b"\xc3\xa9" 207 | dammit = UnicodeDammit(utf8) 208 | self.assertEqual(dammit.unicode_markup, u'\xe9') 209 | self.assertEqual(dammit.original_encoding, 'utf-8') 210 | 211 | def test_convert_hebrew(self): 212 | hebrew = b"\xed\xe5\xec\xf9" 213 | dammit = UnicodeDammit(hebrew, ["iso-8859-8"]) 214 | self.assertEqual(dammit.original_encoding, 'iso-8859-8') 215 | self.assertEqual(dammit.unicode_markup, u'\u05dd\u05d5\u05dc\u05e9') 216 | 217 | def test_dont_see_smart_quotes_where_there_are_none(self): 218 | utf_8 = b"\343\202\261\343\203\274\343\202\277\343\202\244 Watch" 219 | dammit = UnicodeDammit(utf_8) 220 | self.assertEqual(dammit.original_encoding, 'utf-8') 221 | self.assertEqual(dammit.unicode_markup.encode("utf-8"), utf_8) 222 | 223 | def test_ignore_inappropriate_codecs(self): 224 | utf8_data = u"Räksmörgås".encode("utf-8") 225 | dammit =
UnicodeDammit(utf8_data, ["iso-8859-8"]) 226 | self.assertEqual(dammit.original_encoding, 'utf-8') 227 | 228 | def test_ignore_invalid_codecs(self): 229 | utf8_data = u"Räksmörgås".encode("utf-8") 230 | for bad_encoding in ['.utf8', '...', 'utF---16.!']: 231 | dammit = UnicodeDammit(utf8_data, [bad_encoding]) 232 | self.assertEqual(dammit.original_encoding, 'utf-8') 233 | 234 | def test_detect_html5_style_meta_tag(self): 235 | 236 | for data in ( 237 | b'<html><meta charset="euc-jp" /></html>', 238 | b"<html><meta charset='euc-jp' /></html>", 239 | b"<html><meta charset=euc-jp /></html>", 240 | b"<html><meta charset=euc-jp/></html>"): 241 | dammit = UnicodeDammit(data, is_html=True) 242 | self.assertEqual( 243 | "euc-jp", dammit.original_encoding) 244 | 245 | def test_last_ditch_entity_replacement(self): 246 | # This is a UTF-8 document that contains bytestrings 247 | # completely incompatible with UTF-8 (ie. encoded with some other 248 | # encoding). 249 | # 250 | # Since there is no consistent encoding for the document, 251 | # Unicode, Dammit will eventually encode the document as UTF-8 252 | # and encode the incompatible characters as REPLACEMENT 253 | # CHARACTER. 254 | # 255 | # If chardet is installed, it will detect that the document 256 | # can be converted into ISO-8859-1 without errors. This happens 257 | # to be the wrong encoding, but it is a consistent encoding, so the 258 | # code we're testing here won't run. 259 | # 260 | # So we temporarily disable chardet if it's present.
261 | doc = b"""\357\273\277<?xml version="1.0" encoding="UTF-8"?> 262 | <html><b>\330\250\330\252\330\261</b> 263 | <i>\310\322\321\220\312\321\355\344</i></html>""" 264 | chardet = bs4.dammit.chardet 265 | try: 266 | bs4.dammit.chardet = None 267 | with warnings.catch_warnings(record=True) as w: 268 | dammit = UnicodeDammit(doc) 269 | self.assertEqual(True, dammit.contains_replacement_characters) 270 | self.assertTrue(u"\ufffd" in dammit.unicode_markup) 271 | 272 | soup = BeautifulSoup(doc, "html.parser") 273 | self.assertTrue(soup.contains_replacement_characters) 274 | 275 | msg = w[0].message 276 | self.assertTrue(isinstance(msg, UnicodeWarning)) 277 | self.assertTrue("Some characters could not be decoded" in str(msg)) 278 | finally: 279 | bs4.dammit.chardet = chardet 280 | 281 | def test_sniffed_xml_encoding(self): 282 | # A document written in UTF-16LE will be converted by a different 283 | # code path that sniffs the byte order markers. 284 | data = b'\xff\xfe<\x00a\x00>\x00\xe1\x00\xe9\x00<\x00/\x00a\x00>\x00' 285 | dammit = UnicodeDammit(data) 286 | self.assertEqual(u"<a>áé</a>", dammit.unicode_markup) 287 | self.assertEqual("utf-16le", dammit.original_encoding) 288 | 289 | def test_detwingle(self): 290 | # Here's a UTF8 document. 291 | utf8 = (u"\N{SNOWMAN}" * 3).encode("utf8") 292 | 293 | # Here's a Windows-1252 document. 294 | windows_1252 = ( 295 | u"\N{LEFT DOUBLE QUOTATION MARK}Hi, I like Windows!" 296 | u"\N{RIGHT DOUBLE QUOTATION MARK}").encode("windows_1252") 297 | 298 | # Through some unholy alchemy, they've been stuck together.
299 | doc = utf8 + windows_1252 + utf8 300 | 301 | # The document can't be turned into UTF-8: 302 | self.assertRaises(UnicodeDecodeError, doc.decode, "utf8") 303 | 304 | # Unicode, Dammit thinks the whole document is Windows-1252, 305 | # and decodes it into "☃☃☃“Hi, I like Windows!”☃☃☃" 306 | 307 | # But if we run it through UnicodeDammit.detwingle, it's fixed: 308 | 309 | fixed = UnicodeDammit.detwingle(doc) 310 | self.assertEqual( 311 | u"☃☃☃“Hi, I like Windows!”☃☃☃", fixed.decode("utf8")) 312 | 313 | def test_detwingle_ignores_multibyte_characters(self): 314 | # Each of these characters has a UTF-8 representation ending 315 | # in \x93. \x93 is a smart quote if interpreted as 316 | # Windows-1252. But our code knows to skip over multibyte 317 | # UTF-8 characters, so they'll survive the process unscathed. 318 | for tricky_unicode_char in ( 319 | u"\N{LATIN SMALL LIGATURE OE}", # 2-byte char '\xc5\x93' 320 | u"\N{LATIN SUBSCRIPT SMALL LETTER X}", # 3-byte char '\xe2\x82\x93' 321 | u"\xf0\x90\x90\x93", # This is a CJK character, not sure which one. 322 | ): 323 | input = tricky_unicode_char.encode("utf8") 324 | self.assertTrue(input.endswith(b'\x93')) 325 | output = UnicodeDammit.detwingle(input) 326 | self.assertEqual(output, input) 327 | 328 | class TestNamespacedAttribute(SoupTest): 329 | 330 | def test_name_may_be_none(self): 331 | a = NamespacedAttribute("xmlns", None) 332 | self.assertEqual(a, "xmlns") 333 | 334 | def test_attribute_is_equivalent_to_colon_separated_string(self): 335 | a = NamespacedAttribute("a", "b") 336 | self.assertEqual("a:b", a) 337 | 338 | def test_attributes_are_equivalent_if_prefix_and_name_identical(self): 339 | a = NamespacedAttribute("a", "b", "c") 340 | b = NamespacedAttribute("a", "b", "c") 341 | self.assertEqual(a, b) 342 | 343 | # The actual namespace is not considered. 344 | c = NamespacedAttribute("a", "b", None) 345 | self.assertEqual(a, c) 346 | 347 | # But name and prefix are important.
348 | d = NamespacedAttribute("a", "z", "c") 349 | self.assertNotEqual(a, d) 350 | 351 | e = NamespacedAttribute("z", "b", "c") 352 | self.assertNotEqual(a, e) 353 | 354 | 355 | class TestAttributeValueWithCharsetSubstitution(unittest.TestCase): 356 | 357 | def test_charset_meta_attribute_value(self): 358 | value = CharsetMetaAttributeValue("euc-jp") 359 | self.assertEqual("euc-jp", value) 360 | self.assertEqual("euc-jp", value.original_value) 361 | self.assertEqual("utf8", value.encode("utf8")) 362 | 363 | 364 | def test_content_meta_attribute_value(self): 365 | value = ContentMetaAttributeValue("text/html; charset=euc-jp") 366 | self.assertEqual("text/html; charset=euc-jp", value) 367 | self.assertEqual("text/html; charset=euc-jp", value.original_value) 368 | self.assertEqual("text/html; charset=utf8", value.encode("utf8")) 369 | -------------------------------------------------------------------------------- /source/gank.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/gank.png -------------------------------------------------------------------------------- /source/gank.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # encoding: utf-8 3 | 4 | import sys 5 | from workflow import Workflow, web 6 | 7 | logger = None 8 | 9 | # Different icon images for different types of gank 10 | ICON_DEFAULT = 'icon.png' 11 | ICON_GANK = 'gank.png' 12 | ICON_ANDROID = 'android.png' 13 | ICON_IOS = 'apple.png' 14 | ICON_HTML5 = 'html5.png' 15 | ICON_PICTURE = 'picture.png' 16 | ICON_VIDEO = 'video.png' 17 | ICON_OTHER = 'other.png' 18 | 19 | # Load today's list of ganks 20 | def today(): 21 | import datetime 22 | now = datetime.datetime.now() 23 | # load today ganks from gank.io 24 | url = 'http://gank.io/api/day/' + now.strftime("%Y/%m/%d") 25 | # url = 'http://gank.io/api/day/2016/06/22' 26
| r = web.get(url) 27 | 28 | # throw an error if request failed, Workflow will catch this and show it to the user 29 | r.raise_for_status() 30 | 31 | ganks = [] 32 | data = r.json() 33 | categories = data['category'] 34 | results = data['results'] 35 | for category in categories: 36 | for gank in results[category]: 37 | ganks.append(gank) 38 | 39 | return ganks 40 | 41 | # Search ganks for a given keyword -> second version of the search interface, provided officially by gank.io 42 | # http://gank.io/api/search/query/android animation/category/all/count/50/page/1 43 | def search(query): 44 | # search the ganks from gank.io 45 | url = 'http://gank.io/api/search/query/%s/category/all/count/50/page/1' % query 46 | r = web.get(url) 47 | 48 | # throw an error if request failed, Workflow will catch this and show it to the user 49 | r.raise_for_status() 50 | 51 | ganks = [] 52 | data = r.json() 53 | results = data['results'] 54 | for gank in results: 55 | ganks.append(gank) 56 | 57 | return ganks 58 | 59 | # Search ganks for a given keyword using gank.io's http://gank.io/search?q= interface, parsing the returned HTML page for results 60 | # def search_v2(query): 61 | # from bs4 import BeautifulSoup 62 | # from bs4 import UnicodeDammit 63 | # # search the ganks from gank.io 64 | # # url = 'http://gank.io/search?%s=%s' % ('q', query) 65 | # url = 'http://gank.io/search' 66 | # params = dict(q=query) 67 | # r = web.get(url, params) 68 | 69 | # # throw an error if request failed, Workflow will catch this and show it to the user 70 | # r.raise_for_status() 71 | 72 | # ganks = [] 73 | # soup = BeautifulSoup(r.text, 'html.parser', from_encoding = 'unicode') 74 | # table = soup.body.find('div', 'content').find('ol') 75 | # for row in table.find_all('li'): 76 | # gank = {} 77 | # gank['url'] = row.a.get('href') 78 | # gank['desc'] = row.a.string 79 | # gank['type'] = row.find_all('small')[0].string 80 | # gank['who'] = row.find_all('small')[1].string 81 | # ganks.append(gank) 82 | 83 | # return ganks 84 | 85 | # Search ganks for a given keyword -> first version of the search interface, provided by my own project Ganks-for-gank.io 86 | def search_v1(query): 87 | # search the
ganks from gankio.herokuapp.com 88 | url = 'http://gankio.herokuapp.com/search' 89 | params = dict(keyword=query) 90 | r = web.post(url, params) 91 | 92 | # throw an error if request failed, Workflow will catch this and show it to the user 93 | r.raise_for_status() 94 | 95 | return r.json() 96 | 97 | # determine the image 98 | def icon(type): 99 | if 'Android' in type: 100 | return ICON_ANDROID 101 | elif 'iOS' in type: 102 | return ICON_IOS 103 | elif u'前端' in type: 104 | return ICON_HTML5 105 | elif u'休息视频' in type: 106 | return ICON_VIDEO 107 | elif u'福利' in type: 108 | return ICON_PICTURE 109 | elif u'拓展资源' in type: 110 | return ICON_OTHER 111 | elif u'瞎推荐' in type: 112 | return ICON_OTHER 113 | elif 'App' in type: 114 | return ICON_GANK 115 | else: 116 | return ICON_GANK 117 | 118 | def main(wf): 119 | # The Workflow instance will be passed to the function you call from `Workflow.run`. 120 | # Not so useful, as the `wf` object created in `if __name__ ...` below is global. 121 | 122 | # Get query from Workflow 123 | if len(wf.args): 124 | query = ' '.join(wf.args) 125 | else: 126 | query = None 127 | 128 | # If query is None, load today ganks, else search with given query string 129 | if query: 130 | # Search ganks or load from cached data, cache for 10 mins 131 | def wrapper(): 132 | return search(query) 133 | ganks = wf.cached_data(query, wrapper, max_age=600) 134 | if len(ganks) <= 0: 135 | wf.add_item(title=u'没有搜索到干货', valid=True, icon=ICON_DEFAULT) 136 | else: 137 | # Load today ganks or load from cached data, cache for 1 min 138 | ganks = wf.cached_data('today', today, max_age=60) 139 | if len(ganks) <= 0: 140 | wf.add_item(title=u'今天还没发干货', valid=True, icon=ICON_DEFAULT) 141 | 142 | # add result items to workflow 143 | for gank in ganks: 144 | wf.add_item(title=gank['desc'], 145 | subtitle=gank['who'], 146 | arg=gank['url'], 147 | valid=True, 148 | icon=icon(gank['type'])) 149 | 150 | # Send output to Alfred. You can only call this once. 
151 | # Well, you *can* call it multiple times, but Alfred won't be listening any more... 152 | wf.send_feedback() 153 | #print(wf._items) 154 | 155 | 156 | if __name__ == '__main__': 157 | # Create a global `Workflow` object 158 | wf = Workflow() 159 | logger = wf.logger 160 | # wf = Workflow(update_settings={ 161 | # 'github_slug': 'hujiaweibujidao/Gank-Alfred-Workflow', 162 | # 'frequency': 7 163 | # }) 164 | 165 | # Call your entry function via `Workflow.run()` to enable its helper 166 | # functions, like exception catching, ARGV normalization, magic arguments etc. 167 | sys.exit(wf.run(main)) 168 | 169 | # if wf.update_available: 170 | # wf.start_update() 171 | 172 | -------------------------------------------------------------------------------- /source/html5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/html5.png -------------------------------------------------------------------------------- /source/icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/icon.png -------------------------------------------------------------------------------- /source/info.plist: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | bundleid 6 | search.gank.io 7 | category 8 | Tools 9 | connections 10 | 11 | 161947FA-07D7-46BC-AE79-EFF41BBA87EB 12 | 13 | 14 | destinationuid 15 | DCE01299-BCDC-442F-AA5F-13FC1A803AD3 16 | modifiers 17 | 0 18 | modifiersubtext 19 | 20 | 21 | 22 | 23 | createdby 24 | hujiawei 25 | description 26 | Gank Searcher for gank.io (干货集中营的干货搜索器) 27 | disabled 28 | 29 | name 30 | Gank Searcher 31 | objects 32 | 33 | 34 | config 35 | 36 | plusspaces 37 | 38 | url 39 | {query} 40 | utf8 41 | 42 | 43 | type 44 | 
alfred.workflow.action.openurl 45 | uid 46 | DCE01299-BCDC-442F-AA5F-13FC1A803AD3 47 | version 48 | 0 49 | 50 | 51 | config 52 | 53 | argumenttype 54 | 1 55 | escaping 56 | 102 57 | keyword 58 | gs 59 | queuedelaycustom 60 | 3 61 | queuedelayimmediatelyinitially 62 | 63 | queuedelaymode 64 | 0 65 | queuemode 66 | 1 67 | runningsubtext 68 | 客官,请稍等...... 69 | script 70 | python gank.py "{query}" 71 | subtext 72 | 根据您的关键词,搜索最相关的干货! 73 | title 74 | 干货搜索器 75 | type 76 | 0 77 | withspace 78 | 79 | 80 | type 81 | alfred.workflow.input.scriptfilter 82 | uid 83 | 161947FA-07D7-46BC-AE79-EFF41BBA87EB 84 | version 85 | 0 86 | 87 | 88 | readme 89 | 90 | uidata 91 | 92 | 161947FA-07D7-46BC-AE79-EFF41BBA87EB 93 | 94 | ypos 95 | 110 96 | 97 | DCE01299-BCDC-442F-AA5F-13FC1A803AD3 98 | 99 | ypos 100 | 10 101 | 102 | 103 | webaddress 104 | https://github.com/hujiaweibujidao/Gank-Alfred-Workflow 105 | 106 | 107 | -------------------------------------------------------------------------------- /source/other.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/other.png -------------------------------------------------------------------------------- /source/picture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/picture.png -------------------------------------------------------------------------------- /source/version: -------------------------------------------------------------------------------- 1 | 2.0.0 -------------------------------------------------------------------------------- /source/video.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/video.png -------------------------------------------------------------------------------- /source/workflow/Notify.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/javayhu/Gank-Alfred-Workflow/aca39bd0c7bc0c494eee204e10bca61dab760ab7/source/workflow/Notify.tgz -------------------------------------------------------------------------------- /source/workflow/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2014 Dean Jackson 5 | # 6 | # MIT Licence. See http://opensource.org/licenses/MIT 7 | # 8 | # Created on 2014-02-15 9 | # 10 | 11 | """ 12 | A Python helper library for `Alfred 2 `_ Workflow 13 | authors. 14 | """ 15 | 16 | import os 17 | 18 | __title__ = 'Alfred-Workflow' 19 | __version__ = open(os.path.join(os.path.dirname(__file__), 'version')).read() 20 | __author__ = 'Dean Jackson' 21 | __licence__ = 'MIT' 22 | __copyright__ = 'Copyright 2014 Dean Jackson' 23 | 24 | 25 | # Workflow objects 26 | from .workflow import Workflow, manager 27 | 28 | # Exceptions 29 | from .workflow import PasswordNotFound, KeychainError 30 | 31 | # Icons 32 | from .workflow import ( 33 | ICON_ACCOUNT, 34 | ICON_BURN, 35 | ICON_CLOCK, 36 | ICON_COLOR, 37 | ICON_COLOUR, 38 | ICON_EJECT, 39 | ICON_ERROR, 40 | ICON_FAVORITE, 41 | ICON_FAVOURITE, 42 | ICON_GROUP, 43 | ICON_HELP, 44 | ICON_HOME, 45 | ICON_INFO, 46 | ICON_NETWORK, 47 | ICON_NOTE, 48 | ICON_SETTINGS, 49 | ICON_SWIRL, 50 | ICON_SWITCH, 51 | ICON_SYNC, 52 | ICON_TRASH, 53 | ICON_USER, 54 | ICON_WARNING, 55 | ICON_WEB, 56 | ) 57 | 58 | # Filter matching rules 59 | from .workflow import ( 60 | MATCH_ALL, 61 | MATCH_ALLCHARS, 62 | MATCH_ATOM, 63 | MATCH_CAPITALS, 64 | MATCH_INITIALS, 65 | MATCH_INITIALS_CONTAIN, 66 | 
MATCH_INITIALS_STARTSWITH, 67 | MATCH_STARTSWITH, 68 | MATCH_SUBSTRING, 69 | ) 70 | 71 | __all__ = [ 72 | 'Workflow', 73 | 'manager', 74 | 'PasswordNotFound', 75 | 'KeychainError', 76 | 'ICON_ACCOUNT', 77 | 'ICON_BURN', 78 | 'ICON_CLOCK', 79 | 'ICON_COLOR', 80 | 'ICON_COLOUR', 81 | 'ICON_EJECT', 82 | 'ICON_ERROR', 83 | 'ICON_FAVORITE', 84 | 'ICON_FAVOURITE', 85 | 'ICON_GROUP', 86 | 'ICON_HELP', 87 | 'ICON_HOME', 88 | 'ICON_INFO', 89 | 'ICON_NETWORK', 90 | 'ICON_NOTE', 91 | 'ICON_SETTINGS', 92 | 'ICON_SWIRL', 93 | 'ICON_SWITCH', 94 | 'ICON_SYNC', 95 | 'ICON_TRASH', 96 | 'ICON_USER', 97 | 'ICON_WARNING', 98 | 'ICON_WEB', 99 | 'MATCH_ALL', 100 | 'MATCH_ALLCHARS', 101 | 'MATCH_ATOM', 102 | 'MATCH_CAPITALS', 103 | 'MATCH_INITIALS', 104 | 'MATCH_INITIALS_CONTAIN', 105 | 'MATCH_INITIALS_STARTSWITH', 106 | 'MATCH_STARTSWITH', 107 | 'MATCH_SUBSTRING', 108 | ] 109 | -------------------------------------------------------------------------------- /source/workflow/background.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2014 deanishe@deanishe.net 5 | # 6 | # MIT Licence. 
See http://opensource.org/licenses/MIT 7 | # 8 | # Created on 2014-04-06 9 | # 10 | 11 | """ 12 | Run background tasks 13 | """ 14 | 15 | from __future__ import print_function, unicode_literals 16 | 17 | import sys 18 | import os 19 | import subprocess 20 | import pickle 21 | 22 | from workflow import Workflow 23 | 24 | __all__ = ['is_running', 'run_in_background'] 25 | 26 | _wf = None 27 | 28 | 29 | def wf(): 30 | global _wf 31 | if _wf is None: 32 | _wf = Workflow() 33 | return _wf 34 | 35 | 36 | def _arg_cache(name): 37 | """Return path to pickle cache file for arguments 38 | 39 | :param name: name of task 40 | :type name: ``unicode`` 41 | :returns: Path to cache file 42 | :rtype: ``unicode`` filepath 43 | 44 | """ 45 | 46 | return wf().cachefile('{0}.argcache'.format(name)) 47 | 48 | 49 | def _pid_file(name): 50 | """Return path to PID file for ``name`` 51 | 52 | :param name: name of task 53 | :type name: ``unicode`` 54 | :returns: Path to PID file for task 55 | :rtype: ``unicode`` filepath 56 | 57 | """ 58 | 59 | return wf().cachefile('{0}.pid'.format(name)) 60 | 61 | 62 | def _process_exists(pid): 63 | """Check if a process with PID ``pid`` exists 64 | 65 | :param pid: PID to check 66 | :type pid: ``int`` 67 | :returns: ``True`` if process exists, else ``False`` 68 | :rtype: ``Boolean`` 69 | """ 70 | 71 | try: 72 | os.kill(pid, 0) 73 | except OSError: # not running 74 | return False 75 | return True 76 | 77 | 78 | def is_running(name): 79 | """ 80 | Test whether task is running under ``name`` 81 | 82 | :param name: name of task 83 | :type name: ``unicode`` 84 | :returns: ``True`` if task with name ``name`` is running, else ``False`` 85 | :rtype: ``Boolean`` 86 | 87 | """ 88 | pidfile = _pid_file(name) 89 | if not os.path.exists(pidfile): 90 | return False 91 | 92 | with open(pidfile, 'rb') as file_obj: 93 | pid = int(file_obj.read().strip()) 94 | 95 | if _process_exists(pid): 96 | return True 97 | 98 | elif os.path.exists(pidfile): 99 | os.unlink(pidfile) 100 
| 101 | return False 102 | 103 | 104 | def _background(stdin='/dev/null', stdout='/dev/null', 105 | stderr='/dev/null'): # pragma: no cover 106 | """Fork the current process into a background daemon. 107 | 108 | :param stdin: where to read input 109 | :type stdin: filepath 110 | :param stdout: where to write stdout output 111 | :type stdout: filepath 112 | :param stderr: where to write stderr output 113 | :type stderr: filepath 114 | 115 | """ 116 | 117 | # Do first fork. 118 | try: 119 | pid = os.fork() 120 | if pid > 0: 121 | sys.exit(0) # Exit first parent. 122 | except OSError as e: 123 | wf().logger.critical("fork #1 failed: ({0:d}) {1}".format( 124 | e.errno, e.strerror)) 125 | sys.exit(1) 126 | # Decouple from parent environment. 127 | os.chdir(wf().workflowdir) 128 | os.umask(0) 129 | os.setsid() 130 | # Do second fork. 131 | try: 132 | pid = os.fork() 133 | if pid > 0: 134 | sys.exit(0) # Exit second parent. 135 | except OSError as e: 136 | wf().logger.critical("fork #2 failed: ({0:d}) {1}".format( 137 | e.errno, e.strerror)) 138 | sys.exit(1) 139 | # Now I am a daemon! 140 | # Redirect standard file descriptors. 141 | si = file(stdin, 'r', 0) 142 | so = file(stdout, 'a+', 0) 143 | se = file(stderr, 'a+', 0) 144 | if hasattr(sys.stdin, 'fileno'): 145 | os.dup2(si.fileno(), sys.stdin.fileno()) 146 | if hasattr(sys.stdout, 'fileno'): 147 | os.dup2(so.fileno(), sys.stdout.fileno()) 148 | if hasattr(sys.stderr, 'fileno'): 149 | os.dup2(se.fileno(), sys.stderr.fileno()) 150 | 151 | 152 | def run_in_background(name, args, **kwargs): 153 | """Pickle arguments to cache file, then call this script again via 154 | :func:`subprocess.call`. 
155 | 156 | :param name: name of task 157 | :type name: ``unicode`` 158 | :param args: arguments passed as first argument to :func:`subprocess.call` 159 | :param \**kwargs: keyword arguments to :func:`subprocess.call` 160 | :returns: exit code of sub-process 161 | :rtype: ``int`` 162 | 163 | When you call this function, it caches its arguments and then calls 164 | ``background.py`` in a subprocess. The Python subprocess will load the 165 | cached arguments, fork into the background, and then run the command you 166 | specified. 167 | 168 | This function will return as soon as the ``background.py`` subprocess has 169 | forked, returning the exit code of *that* process (i.e. not of the command 170 | you're trying to run). 171 | 172 | If that process fails, an error will be written to the log file. 173 | 174 | If a process is already running under the same name, this function will 175 | return immediately and will not run the specified command. 176 | 177 | """ 178 | 179 | if is_running(name): 180 | wf().logger.info('Task `{0}` is already running'.format(name)) 181 | return 182 | 183 | argcache = _arg_cache(name) 184 | 185 | # Cache arguments 186 | with open(argcache, 'wb') as file_obj: 187 | pickle.dump({'args': args, 'kwargs': kwargs}, file_obj) 188 | wf().logger.debug('Command arguments cached to `{0}`'.format(argcache)) 189 | 190 | # Call this script 191 | cmd = ['/usr/bin/python', __file__, name] 192 | wf().logger.debug('Calling {0!r} ...'.format(cmd)) 193 | retcode = subprocess.call(cmd) 194 | if retcode: # pragma: no cover 195 | wf().logger.error('Failed to call task in background') 196 | else: 197 | wf().logger.debug('Executing task `{0}` in background...'.format(name)) 198 | return retcode 199 | 200 | 201 | def main(wf): # pragma: no cover 202 | """ 203 | Load cached arguments, fork into background, then call 204 | :meth:`subprocess.call` with cached arguments 205 | 206 | """ 207 | 208 | name = wf.args[0] 209 | argcache = _arg_cache(name) 210 | if not 
os.path.exists(argcache): 211 | wf.logger.critical('No arg cache found : {0!r}'.format(argcache)) 212 | return 1 213 | 214 | # Load cached arguments 215 | with open(argcache, 'rb') as file_obj: 216 | data = pickle.load(file_obj) 217 | 218 | # Cached arguments 219 | args = data['args'] 220 | kwargs = data['kwargs'] 221 | 222 | # Delete argument cache file 223 | os.unlink(argcache) 224 | 225 | pidfile = _pid_file(name) 226 | 227 | # Fork to background 228 | _background() 229 | 230 | # Write PID to file 231 | with open(pidfile, 'wb') as file_obj: 232 | file_obj.write('{0}'.format(os.getpid())) 233 | 234 | # Run the command 235 | try: 236 | wf.logger.debug('Task `{0}` running'.format(name)) 237 | wf.logger.debug('cmd : {0!r}'.format(args)) 238 | 239 | retcode = subprocess.call(args, **kwargs) 240 | 241 | if retcode: 242 | wf.logger.error('Command failed with [{0}] : {1!r}'.format( 243 | retcode, args)) 244 | 245 | finally: 246 | if os.path.exists(pidfile): 247 | os.unlink(pidfile) 248 | wf.logger.debug('Task `{0}` finished'.format(name)) 249 | 250 | 251 | if __name__ == '__main__': # pragma: no cover 252 | wf().run(main) 253 | -------------------------------------------------------------------------------- /source/workflow/notify.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2015 deanishe@deanishe.net 5 | # 6 | # MIT Licence. See http://opensource.org/licenses/MIT 7 | # 8 | # Created on 2015-11-26 9 | # 10 | 11 | # TODO: Exclude this module from test and code coverage in py2.6 12 | 13 | """ 14 | Post notifications via the OS X Notification Center. This feature 15 | is only available on Mountain Lion (10.8) and later. It will 16 | silently fail on older systems. 17 | 18 | The main API is a single function, :func:`~workflow.notify.notify`. 19 | 20 | It works by copying a simple application to your workflow's data 21 | directory. 
It replaces the application's icon with your workflow's 22 | icon and then calls the application to post notifications. 23 | """ 24 | 25 | from __future__ import print_function, unicode_literals 26 | 27 | import os 28 | import plistlib 29 | import shutil 30 | import subprocess 31 | import sys 32 | import tarfile 33 | import tempfile 34 | import uuid 35 | 36 | import workflow 37 | 38 | 39 | _wf = None 40 | _log = None 41 | 42 | 43 | #: Available system sounds from System Preferences > Sound > Sound Effects 44 | SOUNDS = ( 45 | 'Basso', 46 | 'Blow', 47 | 'Bottle', 48 | 'Frog', 49 | 'Funk', 50 | 'Glass', 51 | 'Hero', 52 | 'Morse', 53 | 'Ping', 54 | 'Pop', 55 | 'Purr', 56 | 'Sosumi', 57 | 'Submarine', 58 | 'Tink', 59 | ) 60 | 61 | 62 | def wf(): 63 | """Return `Workflow` object for this module. 64 | 65 | Returns: 66 | workflow.Workflow: `Workflow` object for current workflow. 67 | """ 68 | global _wf 69 | if _wf is None: 70 | _wf = workflow.Workflow() 71 | return _wf 72 | 73 | 74 | def log(): 75 | """Return logger for this module. 76 | 77 | Returns: 78 | logging.Logger: Logger for this module. 79 | """ 80 | global _log 81 | if _log is None: 82 | _log = wf().logger 83 | return _log 84 | 85 | 86 | def notifier_program(): 87 | """Return path to notifier applet executable. 88 | 89 | Returns: 90 | unicode: Path to Notify.app `applet` executable. 91 | """ 92 | return wf().datafile('Notify.app/Contents/MacOS/applet') 93 | 94 | 95 | def notifier_icon_path(): 96 | """Return path to icon file in installed Notify.app. 97 | 98 | Returns: 99 | unicode: Path to `applet.icns` within the app bundle. 100 | """ 101 | return wf().datafile('Notify.app/Contents/Resources/applet.icns') 102 | 103 | 104 | def install_notifier(): 105 | """Extract `Notify.app` from the workflow to data directory. 106 | 107 | Changes the bundle ID of the installed app and gives it the 108 | workflow's icon. 
109 | """ 110 | archive = os.path.join(os.path.dirname(__file__), 'Notify.tgz') 111 | destdir = wf().datadir 112 | app_path = os.path.join(destdir, 'Notify.app') 113 | n = notifier_program() 114 | log().debug("Installing Notify.app to %r ...", destdir) 115 | # z = zipfile.ZipFile(archive, 'r') 116 | # z.extractall(destdir) 117 | tgz = tarfile.open(archive, 'r:gz') 118 | tgz.extractall(destdir) 119 | assert os.path.exists(n), ( 120 | "Notify.app could not be installed in {0!r}.".format(destdir)) 121 | 122 | # Replace applet icon 123 | icon = notifier_icon_path() 124 | workflow_icon = wf().workflowfile('icon.png') 125 | if os.path.exists(icon): 126 | os.unlink(icon) 127 | 128 | png_to_icns(workflow_icon, icon) 129 | 130 | # Set file icon 131 | # PyObjC isn't available for 2.6, so this is 2.7 only. Actually, 132 | # none of this code will "work" on pre-10.8 systems. Let it run 133 | # until I figure out a better way of excluding this module 134 | # from coverage in py2.6. 135 | if sys.version_info >= (2, 7): # pragma: no cover 136 | from AppKit import NSWorkspace, NSImage 137 | 138 | ws = NSWorkspace.sharedWorkspace() 139 | img = NSImage.alloc().init() 140 | img.initWithContentsOfFile_(icon) 141 | ws.setIcon_forFile_options_(img, app_path, 0) 142 | 143 | # Change bundle ID of installed app 144 | ip_path = os.path.join(app_path, 'Contents/Info.plist') 145 | bundle_id = '{0}.{1}'.format(wf().bundleid, uuid.uuid4().hex) 146 | data = plistlib.readPlist(ip_path) 147 | log().debug('Changing bundle ID to {0!r}'.format(bundle_id)) 148 | data['CFBundleIdentifier'] = bundle_id 149 | plistlib.writePlist(data, ip_path) 150 | 151 | 152 | def validate_sound(sound): 153 | """Coerce `sound` to valid sound name. 154 | 155 | Returns `None` for invalid sounds. Sound names can be found 156 | in `System Preferences > Sound > Sound Effects`. 157 | 158 | Args: 159 | sound (str): Name of system sound. 160 | 161 | Returns: 162 | str: Proper name of sound or `None`. 
163 | """ 164 | if not sound: 165 | return None 166 | 167 | # Case-insensitive comparison of `sound` 168 | if sound.lower() in [s.lower() for s in SOUNDS]: 169 | # Title-case is correct for all system sounds as of OS X 10.11 170 | return sound.title() 171 | return None 172 | 173 | 174 | def notify(title='', text='', sound=None): 175 | """Post notification via Notify.app helper. 176 | 177 | Args: 178 | title (str, optional): Notification title. 179 | text (str, optional): Notification body text. 180 | sound (str, optional): Name of sound to play. 181 | 182 | Raises: 183 | ValueError: Raised if both `title` and `text` are empty. 184 | 185 | Returns: 186 | bool: `True` if notification was posted, else `False`. 187 | """ 188 | if title == text == '': 189 | raise ValueError('Empty notification') 190 | 191 | sound = validate_sound(sound) or '' 192 | 193 | n = notifier_program() 194 | 195 | if not os.path.exists(n): 196 | install_notifier() 197 | 198 | env = os.environ.copy() 199 | enc = 'utf-8' 200 | env['NOTIFY_TITLE'] = title.encode(enc) 201 | env['NOTIFY_MESSAGE'] = text.encode(enc) 202 | env['NOTIFY_SOUND'] = sound.encode(enc) 203 | cmd = [n] 204 | retcode = subprocess.call(cmd, env=env) 205 | if retcode == 0: 206 | return True 207 | 208 | log().error('Notify.app exited with status {0}.'.format(retcode)) 209 | return False 210 | 211 | 212 | def convert_image(inpath, outpath, size): 213 | """Convert an image file using `sips`. 214 | 215 | Args: 216 | inpath (str): Path of source file. 217 | outpath (str): Path to destination file. 218 | size (int): Width and height of destination image in pixels. 219 | 220 | Raises: 221 | RuntimeError: Raised if `sips` exits with non-zero status. 
222 | """ 223 | cmd = [ 224 | b'sips', 225 | b'-z', b'{0}'.format(size), b'{0}'.format(size), 226 | inpath, 227 | b'--out', outpath] 228 | # log().debug(cmd) 229 | with open(os.devnull, 'w') as pipe: 230 | retcode = subprocess.call(cmd, stdout=pipe, stderr=subprocess.STDOUT) 231 | 232 | if retcode != 0: 233 | raise RuntimeError('sips exited with {0}'.format(retcode)) 234 | 235 | 236 | def png_to_icns(png_path, icns_path): 237 | """Convert PNG file to ICNS using `iconutil`. 238 | 239 | Create an iconset from the source PNG file. Generate PNG files 240 | in each size required by OS X, then call `iconutil` to turn 241 | them into a single ICNS file. 242 | 243 | Args: 244 | png_path (str): Path to source PNG file. 245 | icns_path (str): Path to destination ICNS file. 246 | 247 | Raises: 248 | RuntimeError: Raised if `iconutil` or `sips` fail. 249 | """ 250 | tempdir = tempfile.mkdtemp(prefix='aw-', dir=wf().datadir) 251 | 252 | try: 253 | iconset = os.path.join(tempdir, 'Icon.iconset') 254 | 255 | assert not os.path.exists(iconset), ( 256 | "Iconset path already exists : {0!r}".format(iconset)) 257 | os.makedirs(iconset) 258 | 259 | # Copy source icon to icon set and generate all the other 260 | # sizes needed 261 | configs = [] 262 | for i in (16, 32, 128, 256, 512): 263 | configs.append(('icon_{0}x{0}.png'.format(i), i)) 264 | configs.append((('icon_{0}x{0}@2x.png'.format(i), i*2))) 265 | 266 | shutil.copy(png_path, os.path.join(iconset, 'icon_256x256.png')) 267 | shutil.copy(png_path, os.path.join(iconset, 'icon_128x128@2x.png')) 268 | 269 | for name, size in configs: 270 | outpath = os.path.join(iconset, name) 271 | if os.path.exists(outpath): 272 | continue 273 | convert_image(png_path, outpath, size) 274 | 275 | cmd = [ 276 | b'iconutil', 277 | b'-c', b'icns', 278 | b'-o', icns_path, 279 | iconset] 280 | 281 | retcode = subprocess.call(cmd) 282 | if retcode != 0: 283 | raise RuntimeError("iconset exited with {0}".format(retcode)) 284 | 285 | assert 
os.path.exists(icns_path), ( 286 | "Generated ICNS file not found : {0!r}".format(icns_path)) 287 | finally: 288 | try: 289 | shutil.rmtree(tempdir) 290 | except OSError: # pragma: no cover 291 | pass 292 | 293 | 294 | # def notify_native(title='', text='', sound=''): 295 | # """Post notification via the native API (via pyobjc). 296 | 297 | # At least one of `title` or `text` must be specified. 298 | 299 | # This method will *always* show the Python launcher icon (i.e. the 300 | # rocket with the snakes on it). 301 | 302 | # Args: 303 | # title (str, optional): Notification title. 304 | # text (str, optional): Notification body text. 305 | # sound (str, optional): Name of sound to play. 306 | 307 | # """ 308 | 309 | # if title == text == '': 310 | # raise ValueError('Empty notification') 311 | 312 | # import Foundation 313 | 314 | # sound = sound or Foundation.NSUserNotificationDefaultSoundName 315 | 316 | # n = Foundation.NSUserNotification.alloc().init() 317 | # n.setTitle_(title) 318 | # n.setInformativeText_(text) 319 | # n.setSoundName_(sound) 320 | # nc = Foundation.NSUserNotificationCenter.defaultUserNotificationCenter() 321 | # nc.deliverNotification_(n) 322 | 323 | 324 | if __name__ == '__main__': # pragma: nocover 325 | # Simple command-line script to test module with 326 | # This won't work on 2.6, as `argparse` isn't available 327 | # by default. 
328 | import argparse 329 | 330 | from unicodedata import normalize 331 | 332 | def uni(s): 333 | """Coerce `s` to normalised Unicode.""" 334 | ustr = s.decode('utf-8') 335 | return normalize('NFD', ustr) 336 | 337 | p = argparse.ArgumentParser() 338 | p.add_argument('-p', '--png', help="PNG image to convert to ICNS.") 339 | p.add_argument('-l', '--list-sounds', help="Show available sounds.", 340 | action='store_true') 341 | p.add_argument('-t', '--title', 342 | help="Notification title.", type=uni, 343 | default='') 344 | p.add_argument('-s', '--sound', type=uni, 345 | help="Optional notification sound.", default='') 346 | p.add_argument('text', type=uni, 347 | help="Notification body text.", default='', nargs='?') 348 | o = p.parse_args() 349 | 350 | # List available sounds 351 | if o.list_sounds: 352 | for sound in SOUNDS: 353 | print(sound) 354 | sys.exit(0) 355 | 356 | # Convert PNG to ICNS 357 | if o.png: 358 | icns = os.path.join( 359 | os.path.dirname(o.png), 360 | b'{0}{1}'.format(os.path.splitext(os.path.basename(o.png))[0], 361 | '.icns')) 362 | 363 | print('Converting {0!r} to {1!r} ...'.format(o.png, icns), 364 | file=sys.stderr) 365 | 366 | assert not os.path.exists(icns), ( 367 | "Destination file already exists : {0}".format(icns)) 368 | 369 | png_to_icns(o.png, icns) 370 | sys.exit(0) 371 | 372 | # Post notification 373 | if o.title == o.text == '': 374 | print('ERROR: Empty notification.', file=sys.stderr) 375 | sys.exit(1) 376 | else: 377 | notify(o.title, o.text, o.sound) 378 | -------------------------------------------------------------------------------- /source/workflow/update.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # 4 | # Copyright (c) 2014 Fabio Niephaus , 5 | # Dean Jackson 6 | # 7 | # MIT Licence. See http://opensource.org/licenses/MIT 8 | # 9 | # Created on 2014-08-16 10 | # 11 | 12 | """ 13 | Self-updating from GitHub 14 | 15 | .. 
versionadded:: 1.9 16 | 17 | .. note:: 18 | 19 | This module is not intended to be used directly. Automatic updates 20 | are controlled by the ``update_settings`` :class:`dict` passed to 21 | :class:`~workflow.workflow.Workflow` objects. 22 | 23 | """ 24 | 25 | from __future__ import print_function, unicode_literals 26 | 27 | import os 28 | import tempfile 29 | import re 30 | import subprocess 31 | 32 | import workflow 33 | import web 34 | 35 | # __all__ = [] 36 | 37 | 38 | RELEASES_BASE = 'https://api.github.com/repos/{0}/releases' 39 | 40 | 41 | _wf = None 42 | 43 | 44 | def wf(): 45 | global _wf 46 | if _wf is None: 47 | _wf = workflow.Workflow() 48 | return _wf 49 | 50 | 51 | class Version(object): 52 | """Mostly semantic versioning 53 | 54 | The main difference to proper :ref:`semantic versioning ` 55 | is that this implementation doesn't require a minor or patch version. 56 | """ 57 | 58 | #: Match version and pre-release/build information in version strings 59 | match_version = re.compile(r'([0-9\.]+)(.+)?').match 60 | 61 | def __init__(self, vstr): 62 | self.vstr = vstr 63 | self.major = 0 64 | self.minor = 0 65 | self.patch = 0 66 | self.suffix = '' 67 | self.build = '' 68 | self._parse(vstr) 69 | 70 | def _parse(self, vstr): 71 | if vstr.startswith('v'): 72 | m = self.match_version(vstr[1:]) 73 | else: 74 | m = self.match_version(vstr) 75 | if not m: 76 | raise ValueError('Invalid version number: {0}'.format(vstr)) 77 | 78 | version, suffix = m.groups() 79 | parts = self._parse_dotted_string(version) 80 | self.major = parts.pop(0) 81 | if len(parts): 82 | self.minor = parts.pop(0) 83 | if len(parts): 84 | self.patch = parts.pop(0) 85 | if not len(parts) == 0: 86 | raise ValueError('Invalid version (too long) : {0}'.format(vstr)) 87 | 88 | if suffix: 89 | # Build info 90 | idx = suffix.find('+') 91 | if idx > -1: 92 | self.build = suffix[idx+1:] 93 | suffix = suffix[:idx] 94 | if suffix: 95 | if not suffix.startswith('-'): 96 | raise ValueError( 97 | 
'Invalid suffix : `{0}`. Must start with `-`'.format( 98 | suffix)) 99 | self.suffix = suffix[1:] 100 | 101 | # wf().logger.debug('version str `{}` -> {}'.format(vstr, repr(self))) 102 | 103 | def _parse_dotted_string(self, s): 104 | """Parse string ``s`` into list of ints and strings""" 105 | parsed = [] 106 | parts = s.split('.') 107 | for p in parts: 108 | if p.isdigit(): 109 | p = int(p) 110 | parsed.append(p) 111 | return parsed 112 | 113 | @property 114 | def tuple(self): 115 | """Version number as a tuple of major, minor, patch, pre-release""" 116 | 117 | return (self.major, self.minor, self.patch, self.suffix) 118 | 119 | def __lt__(self, other): 120 | if not isinstance(other, Version): 121 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 122 | t = self.tuple[:3] 123 | o = other.tuple[:3] 124 | if t < o: 125 | return True 126 | if t == o: # We need to compare suffixes 127 | if self.suffix and not other.suffix: 128 | return True 129 | if other.suffix and not self.suffix: 130 | return False 131 | return (self._parse_dotted_string(self.suffix) < 132 | self._parse_dotted_string(other.suffix)) 133 | # t > o 134 | return False 135 | 136 | def __eq__(self, other): 137 | if not isinstance(other, Version): 138 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 139 | return self.tuple == other.tuple 140 | 141 | def __ne__(self, other): 142 | return not self.__eq__(other) 143 | 144 | def __gt__(self, other): 145 | if not isinstance(other, Version): 146 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 147 | return other.__lt__(self) 148 | 149 | def __le__(self, other): 150 | if not isinstance(other, Version): 151 | raise ValueError('Not a Version instance: {0!r}'.format(other)) 152 | return not other.__lt__(self) 153 | 154 | def __ge__(self, other): 155 | return not self.__lt__(other) 156 | 157 | def __str__(self): 158 | vstr = '{0}.{1}.{2}'.format(self.major, self.minor, self.patch) 159 | if self.suffix: 160 | vstr += 
'-{0}'.format(self.suffix) 161 | if self.build: 162 | vstr += '+{0}'.format(self.build) 163 | return vstr 164 | 165 | def __repr__(self): 166 | return "Version('{0}')".format(str(self)) 167 | 168 | 169 | def download_workflow(url): 170 | """Download workflow at ``url`` to a local temporary file 171 | 172 | :param url: URL to .alfredworkflow file in GitHub repo 173 | :returns: path to downloaded file 174 | 175 | """ 176 | 177 | filename = url.split("/")[-1] 178 | 179 | if (not url.endswith('.alfredworkflow') or 180 | not filename.endswith('.alfredworkflow')): 181 | raise ValueError('Attachment `{0}` not a workflow'.format(filename)) 182 | 183 | local_path = os.path.join(tempfile.gettempdir(), filename) 184 | 185 | wf().logger.debug( 186 | 'Downloading updated workflow from `{0}` to `{1}` ...'.format( 187 | url, local_path)) 188 | 189 | response = web.get(url) 190 | 191 | with open(local_path, 'wb') as output: 192 | output.write(response.content) 193 | 194 | return local_path 195 | 196 | 197 | def build_api_url(slug): 198 | """Generate releases URL from GitHub slug 199 | 200 | :param slug: Repo name in form ``username/repo`` 201 | :returns: URL to the API endpoint for the repo's releases 202 | 203 | """ 204 | 205 | if len(slug.split('/')) != 2: 206 | raise ValueError('Invalid GitHub slug : {0}'.format(slug)) 207 | 208 | return RELEASES_BASE.format(slug) 209 | 210 | 211 | def get_valid_releases(github_slug, prereleases=False): 212 | """Return list of all valid releases 213 | 214 | :param github_slug: ``username/repo`` for workflow's GitHub repo 215 | :param prereleases: Whether to include pre-releases. 216 | :returns: list of dicts. Each :class:`dict` has the form 217 | ``{'version': '1.1', 'download_url': 'http://github.com/...', 218 | 'prerelease': False }`` 219 | 220 | 221 | A valid release is one that contains one ``.alfredworkflow`` file. 222 | 223 | If the GitHub version (i.e. tag) is of the form ``v1.1``, the leading 224 | ``v`` will be stripped. 
225 | 226 | """ 227 | 228 | api_url = build_api_url(github_slug) 229 | releases = [] 230 | 231 | wf().logger.debug('Retrieving releases list from `{0}` ...'.format( 232 | api_url)) 233 | 234 | def retrieve_releases(): 235 | wf().logger.info( 236 | 'Retrieving releases for `{0}` ...'.format(github_slug)) 237 | return web.get(api_url).json() 238 | 239 | slug = github_slug.replace('/', '-') 240 | for release in wf().cached_data('gh-releases-{0}'.format(slug), 241 | retrieve_releases): 242 | version = release['tag_name'] 243 | download_urls = [] 244 | for asset in release.get('assets', []): 245 | url = asset.get('browser_download_url') 246 | if not url or not url.endswith('.alfredworkflow'): 247 | continue 248 | download_urls.append(url) 249 | 250 | # Validate release 251 | if release['prerelease'] and not prereleases: 252 | wf().logger.warning( 253 | 'Invalid release {0} : pre-release detected'.format(version)) 254 | continue 255 | if not download_urls: 256 | wf().logger.warning( 257 | 'Invalid release {0} : No workflow file'.format(version)) 258 | continue 259 | if len(download_urls) > 1: 260 | wf().logger.warning( 261 | 'Invalid release {0} : multiple workflow files'.format(version)) 262 | continue 263 | 264 | wf().logger.debug('Release `{0}` : {1}'.format(version, url)) 265 | releases.append({ 266 | 'version': version, 267 | 'download_url': download_urls[0], 268 | 'prerelease': release['prerelease'] 269 | }) 270 | 271 | return releases 272 | 273 | 274 | def check_update(github_slug, current_version, prereleases=False): 275 | """Check whether a newer release is available on GitHub 276 | 277 | :param github_slug: ``username/repo`` for workflow's GitHub repo 278 | :param current_version: the currently installed version of the 279 | workflow. :ref:`Semantic versioning ` is required. 280 | :param prereleases: Whether to include pre-releases. 
281 | :type current_version: ``unicode`` 282 | :returns: ``True`` if an update is available, else ``False`` 283 | 284 | If an update is available, its version number and download URL will 285 | be cached. 286 | 287 | """ 288 | 289 | releases = get_valid_releases(github_slug, prereleases) 290 | 291 | wf().logger.info('{0} releases for {1}'.format(len(releases), 292 | github_slug)) 293 | 294 | if not len(releases): 295 | raise ValueError('No valid releases for {0}'.format(github_slug)) 296 | 297 | # GitHub returns releases newest-first 298 | latest_release = releases[0] 299 | 300 | # (latest_version, download_url) = get_latest_release(releases) 301 | vr = Version(latest_release['version']) 302 | vl = Version(current_version) 303 | wf().logger.debug('Latest : {0!r} Installed : {1!r}'.format(vr, vl)) 304 | if vr > vl: 305 | 306 | wf().cache_data('__workflow_update_status', { 307 | 'version': latest_release['version'], 308 | 'download_url': latest_release['download_url'], 309 | 'available': True 310 | }) 311 | 312 | return True 313 | 314 | wf().cache_data('__workflow_update_status', { 315 | 'available': False 316 | }) 317 | return False 318 | 319 | 320 | def install_update(github_slug, current_version): 321 | """If a newer release is available, download and install it 322 | 323 | :param github_slug: ``username/repo`` for workflow's GitHub repo 324 | :param current_version: the currently installed version of the 325 | workflow. :ref:`Semantic versioning ` is required. 326 | :type current_version: ``unicode`` 327 | 328 | If an update is available, it will be downloaded and installed. 329 | 330 | :returns: ``True`` if an update is installed, else ``False`` 331 | 332 | """ 333 | # TODO: `github_slug` and `current_version` are both unusued. 
334 | 335 | update_data = wf().cached_data('__workflow_update_status', max_age=0) 336 | 337 | if not update_data or not update_data.get('available'): 338 | wf().logger.info('No update available') 339 | return False 340 | 341 | local_file = download_workflow(update_data['download_url']) 342 | 343 | wf().logger.info('Installing updated workflow ...') 344 | subprocess.call(['open', local_file]) 345 | 346 | update_data['available'] = False 347 | wf().cache_data('__workflow_update_status', update_data) 348 | return True 349 | 350 | 351 | if __name__ == '__main__': # pragma: nocover 352 | import sys 353 | 354 | def show_help(): 355 | print('Usage : update.py (check|install) github_slug version [--prereleases]') 356 | sys.exit(1) 357 | 358 | argv = sys.argv[:] 359 | prereleases = '--prereleases' in argv 360 | 361 | if prereleases: 362 | argv.remove('--prereleases') 363 | 364 | if len(argv) != 4: 365 | show_help() 366 | 367 | action, github_slug, version = argv[1:] 368 | 369 | if action not in ('check', 'install'): 370 | show_help() 371 | 372 | if action == 'check': 373 | check_update(github_slug, version, prereleases) 374 | elif action == 'install': 375 | install_update(github_slug, version) 376 | -------------------------------------------------------------------------------- /source/workflow/version: -------------------------------------------------------------------------------- 1 | 1.17.2 --------------------------------------------------------------------------------