├── .gitignore ├── .travis.yml ├── DESIGN_THOUGHTS.md ├── MANIFEST ├── MANIFEST.in ├── Makefile ├── README.md ├── TODO.md ├── googleapi ├── __init__.py ├── google.py ├── modules │ ├── __init__.py │ ├── calculator.py │ ├── currency.py │ ├── images.py │ ├── shopping_search.py │ ├── standard_search.py │ └── utils.py └── tests │ ├── __init__.py │ ├── html_files │ ├── test_calculator.html │ ├── test_convert_currency.html │ ├── test_exchange_rate.html │ ├── test_search_images.html │ ├── test_shopping_search.html │ └── test_standard_search.html │ ├── test_google.py │ ├── test_utils.py │ └── vcr_cassetes │ ├── test_convert_currency.yaml │ └── test_standard_search.yaml ├── requirements.py ├── requirements.txt ├── setup.cfg ├── setup.py └── test_requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | ### Linux template 2 | *~ 3 | 4 | # KDE directory preferences 5 | .directory 6 | 7 | # Linux trash folder which might appear on any partition or disk 8 | .Trash-* 9 | ### Python template 10 | # Byte-compiled / optimized / DLL files 11 | __pycache__/ 12 | *.py[cod] 13 | *$py.class 14 | 15 | # C extensions 16 | *.so 17 | 18 | # Distribution / packaging 19 | .Python 20 | env/ 21 | build/ 22 | develop-eggs/ 23 | dist/ 24 | downloads/ 25 | eggs/ 26 | .eggs/ 27 | lib/ 28 | lib64/ 29 | parts/ 30 | sdist/ 31 | var/ 32 | *.egg-info/ 33 | .installed.cfg 34 | *.egg 35 | 36 | # PyInstaller 37 | # Usually these files are written by a python script from a template 38 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 39 | *.manifest 40 | *.spec 41 | 42 | # Installer logs 43 | pip-log.txt 44 | pip-delete-this-directory.txt 45 | 46 | # Unit test / coverage reports 47 | htmlcov/ 48 | .tox/ 49 | .coverage 50 | .coverage.* 51 | .cache 52 | nosetests.xml 53 | coverage.xml 54 | *,cover 55 | 56 | # Translations 57 | *.mo 58 | *.pot 59 | 60 | # Django stuff: 61 | *.log 62 | 63 | # Sphinx documentation 64 | docs/_build/ 65 | 66 | # PyBuilder 67 | target/ 68 | 69 | *.pyc 70 | *.egg 71 | *.egg-info 72 | *.dmp 73 | *.zip 74 | .DS_Store 75 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | sudo: false 3 | python: 4 | - '3.6' 5 | - '3.7' 6 | matrix: 7 | allow_failures: 8 | - python: '3.7' 9 | deploy: 10 | provider: pypi 11 | user: beni 12 | password: 13 | secure: XfUbc5Tnjq8mUHnv/rrQvcQ5m+k7mvk2sAwhS1Hzi2NFXaiPQF0YR2er0BDDQOFYba+MBd57l4zHdyti8Y39uVI2ZfY10c5KYio3VzXDU2doycLf7hY8cqKs8UioabVehrPU96GErVEUyA2Jj1cqrsIUX7Smj8qby0DfX+igJtM= 14 | on: 15 | tags: true 16 | repo: abenassi/Google-Search-API 17 | install: 18 | - pip install -r requirements.txt 19 | - python setup.py install 20 | - pip install coveralls 21 | script: 22 | - nosetests 23 | after_success: coveralls 24 | os: 25 | - linux 26 | -------------------------------------------------------------------------------- /DESIGN_THOUGHTS.md: -------------------------------------------------------------------------------- 1 | Design thoughts about the package 2 | ==== 3 | 4 | *This is a list of design thoughts that have appeared in building/modifying this package and their provisional resolution. 
They should be taken as useful indicators of what the developer maintaining this package was thinking when making design decisions.*

* **Relocating methods of the public Google class into separate modules** The idea is to encapsulate the logic of each kind of search method outside the public class the user works with, so that each method can be maintained and repaired individually without touching the main module and its interface. (Is this more maintainable, or is it overkill?)
    - As the project grows or the logic of the methods changes, it is more maintainable to modularize. This shouldn't add complexity to the package, since developers are used to dealing with many files, as long as the structure of the modules makes sense.

* **Duplicated docstrings between methods in the main class and methods in the modules** Updates to the main module (google) docstrings would have to be mirrored by duplicated updates in the auxiliary modules' docstrings. (Should I put docstrings only in the main module, or maintain both, even in duplicate?)
    - It is better to make the methods in the main public module plain references to the methods held in the auxiliary modules, so that only the docstring and signature written in each module need to be taken care of (see the sketch at the end of this document).

* **Private methods "floating" in a module as global functions** It feels uncomfortable to have private methods not wrapped in a class, but adding classes inside the auxiliary modules complicates their interfaces unnecessarily. (Should I keep them like this, without "wrapping" classes?)
    - There is nothing wrong with having public and private functions in a module. The most important thing here is to build a clear user interface for the module.

* **Order of methods in the modules** Private methods before public methods, or vice versa? The first approach shows the bricks of the wall first, and then the wall. The second shows first the methods the user is expected to call, delaying the reading of some implementation details.
    - I'm not really sure one approach is more convenient than the other. Provisionally I'm taking the second approach, because I think it improves readability.

* **Top-down approach in currency: overkill?** How far is it desirable to go in breaking down public methods into private ones to make the algorithm followed by the main public method clearer?
    - Private methods with declarative names make the algorithms clearer, encapsulate issues so that they can be dealt with separately, and avoid the need for in-line comments. However, tightly related private methods should not be split apart if it's unclear that they can truly be handled separately.

* **Closely related tests between the main module and the auxiliary modules** Some tests in the google module seem to test exactly the same things as some tests in the auxiliary modules. To some extent this duplicates code and work (especially when tests must be changed), but they exercise different compartments of the package.
    - I am not really sure about this one. The safest position is that those tests are really testing different things, even if the tests of the main class only verify that the references to the methods written in the auxiliary modules work fine.

* **Is it OK to upload something as informal as these DESIGN_THOUGHTS to GitHub?** Is this the proper place to state these thoughts? Are these thoughts something I should rather keep to myself in a private file?
    - The design thoughts a developer has when building a package (even, as in this case, a newbie developer) can be useful for someone trying to understand, improve or discuss productive changes to the package. They may sound silly sometimes, and some may even be plain wrong, but they offer (I think) a quick insight into what this person was thinking when making these design decisions.
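As an illustration of the resolution in the second thought above, this is roughly what `googleapi/google.py` does today (simplified excerpt):

```python
# googleapi/google.py -- the public module keeps only thin references;
# implementations, docstrings and signatures live in the auxiliary modules.
from .modules import standard_search, currency, calculator

search = standard_search.search
convert_currency = currency.convert
exchange_rate = currency.exchange_rate
calculate = calculator.calculate
```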
--------------------------------------------------------------------------------
/MANIFEST:
--------------------------------------------------------------------------------
# file GENERATED by distutils, do NOT edit
requirements.txt
setup.cfg
setup.py
google/__init__.py
google/google.py
google/modules/__init__.py
google/modules/calculator.py
google/modules/currency.py
google/modules/images.py
google/modules/shopping_search.py
google/modules/standard_search.py
google/modules/utils.py
google/tests/__init__.py
google/tests/test_google.py
google/tests/test_utils.py
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
include requirements.txt
include test_requirements.txt
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
.PHONY: clean clean-test clean-pyc clean-build docs help
.DEFAULT_GOAL := help
define BROWSER_PYSCRIPT
import os, webbrowser, sys
try:
	from urllib import pathname2url
except:
	from urllib.request import pathname2url

webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1])))
endef
export BROWSER_PYSCRIPT

define PRINT_HELP_PYSCRIPT
import re, sys

for line in sys.stdin:
	match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line)
	if match:
		target, help = match.groups()
		print("%-20s %s" % (target, help))
endef
export PRINT_HELP_PYSCRIPT
BROWSER := python -c "$$BROWSER_PYSCRIPT"

help:
	@python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)

clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts


clean-build: ## remove build artifacts
	rm -fr build/
	rm -fr dist/
	rm -fr .eggs/
	find . -name '*.egg-info' -exec rm -fr {} +
	find . -name '*.egg' -exec rm -f {} +

clean-pyc: ## remove Python file artifacts
	find . -name '*.pyc' -exec rm -f {} +
	find . -name '*.pyo' -exec rm -f {} +
	find . -name '*~' -exec rm -f {} +
	find . -name '__pycache__' -exec rm -fr {} +
clean-test: ## remove test and coverage artifacts
	rm -fr .tox/
	rm -f .coverage
	rm -fr htmlcov/

lint: ## check style with flake8
	flake8 googleapi

test: ## run tests quickly with the default Python
	nosetests

test-all: ## run tests on every Python version with tox
	tox

coverage: ## check code coverage quickly with the default Python
	coverage run --source googleapi setup.py test
	coverage report -m
	coverage html
	$(BROWSER) htmlcov/index.html

docs: ## generate Sphinx HTML documentation, including API docs
	cp README.md docs/README.md
	cp HISTORY.md docs/HISTORY.md
	rm -f docs/googleapi.rst
	rm -f docs/modules.rst
	sphinx-apidoc -o docs/ googleapi
	$(MAKE) -C docs clean
	$(MAKE) -C docs html
	$(BROWSER) docs/_build/html/index.html

servedocs: docs ## compile the docs watching for changes
	watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D .

release: dist ## package and upload a release
	twine upload dist/*

dist: clean ## builds source and wheel package
	python setup.py sdist
	python setup.py bdist_wheel
	ls -l dist

install: clean ## install the package to the active Python's site-packages
	python setup.py install

pypi: ## register the package on PyPI and get travis ready to deploy to pip
	make dist
	twine upload dist/*
	# python travis_pypi_setup.py

doctoc: ## generate table of contents, doctoc command line tool required
## https://github.com/thlorenz/doctoc
	doctoc --title "## Indice" README.md
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

Google Search API
=====

[![Coverage Status](https://coveralls.io/repos/abenassi/Google-Search-API/badge.svg?branch=master)](https://coveralls.io/r/abenassi/Google-Search-API?branch=master)
[![Build Status](https://travis-ci.org/abenassi/Google-Search-API.svg?branch=master)](https://travis-ci.org/abenassi/Google-Search-API)
[![](https://img.shields.io/pypi/v/Google-Search-API.svg)](https://pypi.python.org/pypi/Google-Search-API)

*The original package was developed by Anthony Casagrande and can be downloaded at https://github.com/BirdAPI. This is a forked package that I will continue maintaining for the foreseeable future. I will try to keep a strongly modularized design, so that when something breaks anyone can quickly repair it. All contributions are very welcome.*

Google Search API is a Python library for querying the various search functionalities of Google. It uses screen scraping to retrieve the results, and is thus unreliable if the way Google returns its web pages changes in the future. The package is currently under heavy refactoring, so changes in the user interface should be expected for the time being.
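For a quick taste (each method is described in detail in the sections below), a basic web search looks like this:

```python
from googleapi import google

search_results = google.search("This is my query")
for result in search_results:
    print(result.name, result.link)
```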

*Disclaimer: This software uses screen scraping to retrieve search results from google.com, and therefore it may stop working at any given time. Use this software at your own risk. I assume no responsibility for how this software API is used by others.*


**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*

- [Google Search API](#google-search-api)
- [Development current status](#development-current-status)
- [Installation](#installation)
- [Google Web Search](#google-web-search)
- [Google Calculator](#google-calculator)
- [Google Image Search](#google-image-search)
- [Google Currency Converter (Exchange Rates)](#google-currency-converter-exchange-rates)
- [Contributions](#contributions)


Development current status
--------------------------

All methods are currently functioning and returning their primary target data, although some of the secondary data that is supposed to be collected in the result objects is not working yet.

The redesign of the package is still a work in progress. Once it is completed, I will attempt to repair the gathering of secondary data. Contributions are welcome!

Installation
------------

The repo is structured like a package, so it can be installed with pip using the GitHub clone URL. From the command line, type:

```
pip install git+https://github.com/abenassi/Google-Search-API
```

To upgrade the package if you have already installed it:

```
pip install git+https://github.com/abenassi/Google-Search-API --upgrade
```

Please note that you should also install the **Firefox browser** in order to use image search.

You could also just download or clone the repo and import the package from the Google-Search-API folder.

```python
import os
os.chdir(r"C:\Path_where_repo_is")
from googleapi import google
```

## Google Web Search
You can run a google web search in the following way:

```python
from googleapi import google
num_page = 3
search_results = google.search("This is my query", num_page)
```

`search_results` will contain a list of `GoogleResult` objects. The `num_page` parameter is optional (the default is 1 page).

```python
GoogleResult:
    self.name  # The title of the link
    self.link  # The external link
    self.google_link  # The google link
    self.description  # The description of the link
    self.thumb  # The link to a thumbnail of the website (NOT implemented yet)
    self.cached  # A link to the cached version of the page
    self.page  # What page this result was on (when searching more than one page)
    self.index  # What index on this page it was on
    self.number_of_results  # The total number of results the query returned
```

*Description text parsing has some encoding problems to be resolved.*
*Only the google link of the search is being parsed right now; parsing the external link is an implementation priority.*


## Google Calculator
Attempts to search google calculator for the result of an expression. Returns a `CalculatorResult` if successful or `None` if it fails.

```python
from googleapi import google
google.calculate("157.3kg in grams")
```

```python
CalculatorResult
    value = None  # Result value (eg. 157300.0)
    from_value = None  # Initial value (eg. 157.3)
    unit = None  # Result unit (eg. u'grams') (NOT implemented yet)
    from_unit = None  # Initial unit (eg. u'kilograms') (NOT implemented yet)
    expr = None  # Initial expression (eg. u'157.3 kilograms') (NOT implemented yet)
    result = None  # Result expression (eg. u'157300 grams') (NOT implemented yet)
    fullstring = None  # Complete expression (eg. u'157.3 kilograms = 157300 grams') (NOT implemented yet)
```

*Parsing of the units must be implemented. The rest of the data members of CalculatorResult can be built from the values and units of the calculation.*

## Google Image Search
Searches google images for a list of images. Searches can be filtered to produce better results, and the images found can be downloaded.

### Requirement
Image search uses selenium with the Firefox driver, therefore you MUST have [Firefox installed](https://www.mozilla.org/en-US/firefox/new/) to use it.

Perform a google image search on "banana" and filter it:

```python
from googleapi import google, images
options = images.ImageOptions()
options.image_type = images.ImageType.CLIPART
options.larger_than = images.LargerThan.MP_4
options.color = "green"
results = google.search_images("banana", options)
```

Sample result:

```python
{'domain': 'shop.tradedoubler.com',
 'filesize': None,
 'format': None,
 'height': '2000',
 'index': 15,
 'link': 'http://tesco.scene7.com/is/image/tesco/210-8446_PI_1000013MN%3Fwid%3D2000%26hei%3D2000',
 'name': None,
 'page': 1,
 'site': 'http://shop.tradedoubler.com/shop/uk-01/a/2058674/productName/banana/sortBy/price/sortReverse/false',
 'thumb': 'https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcS8JPH_bgyvvyf5X67k32ZZYjf9MlWlxHIEXXxi91TVrNafpokI',
 'thumb_height': '199px',
 'thumb_width': '199px',
 'width': '2000'}
```

*filesize is yet to be implemented. format works, but sometimes the link of the image doesn't show the format.
Google images doesn't currently seem to expose image names, so the method for that is not implemented.*

Filter options:

```python
ImageOptions:
    image_type  # face, body, clipart, line drawing
    size_category  # large, small, icon
    larger_than  # the well known name of the smallest image size you want
    exact_width  # the exact width of the image you want
    exact_height  # the exact height of the image you want
    color_type  # color, b&w, specific
    color  # blue, green, red
```

Enums of values that can be used to filter image searches:

```python
class ImageType:
    NONE = None
    FACE = "face"
    PHOTO = "photo"
    CLIPART = "clipart"
    LINE_DRAWING = "lineart"

class SizeCategory:
    NONE = None
    ICON = "i"
    LARGE = "l"
    MEDIUM = "m"
    SMALL = "s"
    LARGER_THAN = "lt"
    EXACTLY = "ex"

class LargerThan:
    NONE = None
    QSVGA = "qsvga"  # 400 x 300
    VGA = "vga"      # 640 x 480
    SVGA = "svga"    # 800 x 600
    XGA = "xga"      # 1024 x 768
    MP_2 = "2mp"     # 2 MP (1600 x 1200)
    MP_4 = "4mp"     # 4 MP (2272 x 1704)
    MP_6 = "6mp"     # 6 MP (2816 x 2112)
    MP_8 = "8mp"     # 8 MP (3264 x 2448)
    MP_10 = "10mp"   # 10 MP (3648 x 2736)
    MP_12 = "12mp"   # 12 MP (4096 x 3072)
    MP_15 = "15mp"   # 15 MP (4480 x 3360)
    MP_20 = "20mp"   # 20 MP (5120 x 3840)
    MP_40 = "40mp"   # 40 MP (7216 x 5412)
    MP_70 = "70mp"   # 70 MP (9600 x 7200)

class ColorType:
    NONE = None
    COLOR = "color"
    BLACK_WHITE = "gray"
    SPECIFIC = "specific"

class License:
    NONE = None
    REUSE = "fc"
    REUSE_WITH_MOD = "fmc"
    REUSE_NON_COMMERCIAL = "f"
    REUSE_WITH_MOD_NON_COMMERCIAL = "fm"
```

You can download a list of images:

```python
images.download(image_results, path="path/to/download/images")
```

Path is an optional argument; if you don't specify a path, images will be downloaded to an "images" folder inside the working directory.

If you want to download a large list of images, the previous method could be slow. A better method using multithreading is provided for this case:

```python
images.fast_download(image_results, path="path/to/download/images", threads=12)
```

You may change the number of threads; 12 is the number that offered the best speed in a number of informal tests I've done.

## Google Currency Converter (Exchange Rates)
Convert between one currency and another using google calculator. Results are real time and can change at any time based on the current exchange rate according to google.

Convert 5 US Dollars to Euros using the official 3-letter currency acronym ([ISO 4217](https://en.wikipedia.org/wiki/ISO_4217)):

```python
from googleapi import google
euros = google.convert_currency(5.0, "USD", "EUR")
print("5.0 USD = {0} EUR".format(euros))
```

```python
5.0 USD = 3.82350692 EUR
```

Convert 1000 Japanese Yen to US Dollars:

```python
yen = google.convert_currency(1000, "yen", "us dollars")
print("1000 yen = {0} us dollars".format(yen))
```

```python
1000 yen = 12.379 us dollars
```

Alternatively, you can get the exchange rate, which returns what 1 `from_currency` equals in `to_currency`, and do your own math:

```python
rate = google.exchange_rate("dollars", "pesos")
print("dollars -> pesos exchange rate = {0}".format(rate))
```

```python
dollars -> pesos exchange rate = 13.1580679
```

Perform your own math. The following two statements are equivalent:

```python
5.0 * google.exchange_rate("USD", "EUR")
```

```python
google.convert_currency(5.0, "USD", "EUR")
```

As a side note, `convert_currency` is always more accurate than performing your own math on `exchange_rate`, because of possible rounding errors. However, if you have more than one value to convert, it is best to call `exchange_rate`, cache the result, and use it for multiple calculations instead of querying the google server for each one.
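For instance, a minimal sketch of that pattern (the amounts and currencies are just illustrative):

```python
from googleapi import google

# one request to google, reused for every conversion
rate = google.exchange_rate("USD", "EUR")

amounts = [5.0, 10.0, 99.99]
in_euros = [amount * rate for amount in amounts]
```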


## Contributions

All contributions are very welcome! As you have seen, there are still some methods that are not implemented. The structure of the package is intended to make it easy to contribute by implementing or improving any method without changing other code.

Another interesting thing you could do is build a good command line interface for the package. You can also take a look at the [TODO list](https://github.com/abenassi/Google-Search-API/blob/master/TODO.md).

For all contributions, we intend to follow the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html).
--------------------------------------------------------------------------------
/TODO.md:
--------------------------------------------------------------------------------
Todo list for Google-Search-API
====

## Next tasks

- [x] Mock html tests for all search methods - https://pypi.python.org/pypi/mock/
- [x] Test and implement the use of image options in the search images method
- [x] Re-factor and split the main module into separate ones
- [ ] Implement fast download using concurrent futures (see the sketch at the end of this file) - https://gist.github.com/mangecoeur/9540178 - https://docs.python.org/3/library/concurrent.futures.html
- [ ] Implement google search external link scraping
- [ ] Be able to manage both Chrome and Firefox, and maybe IE too, depending on what browser the user has
- [ ] Implement PhantomJS to avoid opening a visible browser window
- [ ] Write a full suite of doctests and unit tests
- [ ] Reconvert all comments following the google python style guide - https://google-styleguide.googlecode.com/svn/trunk/pyguide.html
- [ ] Add support for Google Scholar search (check out the Py3 library google-scholar)

## Stuff to check out later on

* vcrpy to record web requests and make more automated mocks
* SauceLabs to do UI tests
* Sphinx to generate documentation - readthedocs style
* PhantomJS to use browsers without actually opening them (works with selenium) - http://stackoverflow.com/questions/13287490/is-there-a-way-to-use-phantomjs-in-python
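A rough sketch of how the `concurrent.futures` version of `images.fast_download` could look (hypothetical, not implemented yet; names and defaults mirror the current threaded implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def fast_download(image_results, path="images", threads=10):
    # each ImageResult already knows how to download itself;
    # the executor waits for all downloads when the block exits
    with ThreadPoolExecutor(max_workers=threads) as executor:
        for image_result in image_results:
            executor.submit(image_result.download, path)
```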
--------------------------------------------------------------------------------
/googleapi/__init__.py:
--------------------------------------------------------------------------------
from __future__ import absolute_import
from .modules import calculator, currency, images, utils
from .modules import standard_search, shopping_search
--------------------------------------------------------------------------------
/googleapi/google.py:
--------------------------------------------------------------------------------
from __future__ import unicode_literals
from __future__ import absolute_import

from .modules import images
from .modules import currency
from .modules import calculator
from .modules import standard_search
# from modules import shopping_search

__author__ = "Anthony Casagrande , " + \
    "Agustin Benassi "
__version__ = "1.1.0"


"""Defines the public interface of the API."""

search = standard_search.search
search_images = images.search
convert_currency = currency.convert
exchange_rate = currency.exchange_rate
calculate = calculator.calculate

# TODO: This method is not working anymore! There is a new GET
# link for this kind of search
# shopping = shopping_search.shopping

if __name__ == "__main__":
    import doctest
    doctest.testmod()
--------------------------------------------------------------------------------
/googleapi/modules/__init__.py:
--------------------------------------------------------------------------------
from __future__ import print_function
from . import calculator
from . import currency
from . import images
from . import shopping_search
from . import standard_search
--------------------------------------------------------------------------------
/googleapi/modules/calculator.py:
--------------------------------------------------------------------------------
from __future__ import unicode_literals
from __future__ import absolute_import
from builtins import object
from unidecode import unidecode

from .utils import get_html_from_dynamic_site
from .utils import _get_search_url
from bs4 import BeautifulSoup


class CalculatorResult(object):

    """Represents a result returned from google calculator."""

    def __init__(self):
        self.value = None  # Result value (eg. 157300.0)
        self.from_value = None  # Initial value (eg. 157.3)
        self.unit = None  # Result unit (eg. u'grams') (NOT implemented yet)
        # Initial unit (eg. u'kilograms') (NOT implemented yet)
        self.from_unit = None
        # Initial expression (eg. u'157.3 kilograms') (NOT implemented yet)
        self.expr = None
        # Result expression (eg. u'157300 grams') (NOT implemented yet)
        self.result = None
        # Complete expression (eg. u'157.3 kilograms = 157300 grams') (NOT
        # implemented yet)
        self.fullstring = None

    def __repr__(self):
        # value is a float, so stringify it before passing it to unidecode
        return unidecode(str(self.value))


# PUBLIC
def calculate(expr):
    """Search for a calculation expression in google.

    Attempts to search google calculator for the result of an expression.
    Returns a `CalculatorResult` if successful or `None` if it fails.

    Args:
        expr: Calculation expression (eg. "cos(25 pi) / 17.4" or
            "157.3kg in grams")

    Returns:
        CalculatorResult object."""

    url = _get_search_url(expr)
    html = get_html_from_dynamic_site(url)
    bs = BeautifulSoup(html, "html.parser")

    cr = CalculatorResult()
    cr.value = _get_to_value(bs)
    cr.from_value = _get_from_value(bs)
    cr.unit = _get_to_unit(bs)
    cr.from_unit = _get_from_unit(bs)
    cr.expr = _get_expr(bs)
    cr.result = _get_result(bs)
    cr.fullstring = _get_fullstring(bs)

    return cr


# PRIVATE
def _get_to_value(bs):
    input_node = bs.find("div", {"id": "_Cif"})
    return float(input_node.find("input")["value"])


def _get_from_value(bs):
    input_node = bs.find("div", {"id": "_Aif"})
    return float(input_node.find("input")["value"])


def _get_to_unit(bs):
    return None


def _get_from_unit(bs):
    return None


def _get_expr(bs):
    return None


def _get_result(bs):
    return None


def _get_fullstring(bs):
    return None
--------------------------------------------------------------------------------
/googleapi/modules/currency.py:
--------------------------------------------------------------------------------
from __future__ import unicode_literals
from __future__ import absolute_import

from .utils import get_html
from bs4 import BeautifulSoup


# PUBLIC
def convert(amount, from_currency, to_currency):
    """Convert an amount of one currency into another.

    Args:
        amount: numeric amount to convert
        from_currency: currency denomination of the amount to convert
        to_currency: target currency denomination to convert to
    """

    # same currency, no conversion
    if from_currency == to_currency:
        return amount * 1.0

    req_url = _get_currency_req_url(amount,
                                    from_currency, to_currency)
    response = get_html(req_url)
    rate = _parse_currency_response(response, to_currency)

    return rate


def exchange_rate(from_currency, to_currency):
    """Gets the exchange rate of one currency to another.

    Args:
        from_currency: starting currency denomination (1)
        to_currency: target currency denomination to convert to (rate)

    Returns:
        The rate such that 1 from_currency equals `rate` to_currency.
    """
    return convert(1, from_currency, to_currency)


# PRIVATE
def _get_currency_req_url(amount, from_currency, to_currency):
    return "https://www.google.com/finance/converter?a={0}&from={1}&to={2}".format(
        amount, from_currency.replace(" ", "%20"),
        to_currency.replace(" ", "%20"))


def _parse_currency_response(response, to_currency):
    bs = BeautifulSoup(response, "html.parser")
    str_rate = bs.find(id="currency_converter_result").span.get_text()
    rate = float(str_rate.replace(to_currency, "").strip())
    return rate
--------------------------------------------------------------------------------
/googleapi/modules/images.py:
--------------------------------------------------------------------------------
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import absolute_import
from future import standard_library
standard_library.install_aliases()
from builtins import str
from builtins import range
from builtins import object
from unidecode import unidecode

from .utils import get_browser_with_url, write_html_to_file, measure_time
from .utils import get_html  # used by the legacy search_old below
from bs4 import BeautifulSoup
import urllib.parse
import re
import sys
import requests
import shutil
import os
import threading
import queue


IMAGE_FORMATS = ["bmp", "gif", "jpg", "png", "psd", "pspimage", "thm",
                 "tif", "yuv", "ai", "drw", "eps", "ps", "svg", "tiff",
                 "jpeg", "jif", "jfif", "jp2", "jpx", "j2k", "j2c", "fpx",
                 "pcd", "png", "pdf"]


# AUXILIARY CLASSES
class ImageType(object):
    NONE = None
    FACE = "face"
    PHOTO = "photo"
    CLIPART = "clipart"
    LINE_DRAWING = "lineart"


class SizeCategory(object):
    NONE = None
    ICON = "i"
    LARGE = "l"
    MEDIUM = "m"
    SMALL = "s"
    LARGER_THAN = "lt"
    EXACTLY = "ex"


class LargerThan(object):
    NONE = None
    QSVGA = "qsvga"  # 400 x 300
    VGA = "vga"      # 640 x 480
    SVGA = "svga"    # 800 x 600
    XGA = "xga"      # 1024 x 768
    MP_2 = "2mp"     # 2 MP (1600 x 1200)
    MP_4 = "4mp"     # 4 MP (2272 x 1704)
    MP_6 = "6mp"     # 6 MP (2816 x 2112)
    MP_8 = "8mp"     # 8 MP (3264 x 2448)
    MP_10 = "10mp"   # 10 MP (3648 x 2736)
    MP_12 = "12mp"   # 12 MP (4096 x 3072)
    MP_15 = "15mp"   # 15 MP (4480 x 3360)
    MP_20 = "20mp"   # 20 MP (5120 x 3840)
    MP_40 = "40mp"   # 40 MP (7216 x 5412)
    MP_70 = "70mp"   # 70 MP (9600 x 7200)


class ColorType(object):
    NONE = None
    COLOR = "color"
    BLACK_WHITE = "gray"
    SPECIFIC = "specific"


class License(object):
    NONE = None
    REUSE = "fc"
    REUSE_WITH_MOD = "fmc"
    REUSE_NON_COMMERCIAL = "f"
    REUSE_WITH_MOD_NON_COMMERCIAL = "fm"


class ImageOptions(object):

    """Allows passing options to filter a google images search."""

    def __init__(self):
        self.image_type = None
        self.size_category = None
        self.larger_than = None
        self.exact_width = None
        self.exact_height = None
        self.color_type = None
        self.color = None
        self.license = None

    def __repr__(self):
        # __dict__ is a dict, so stringify it before passing it to unidecode
        return unidecode(str(self.__dict__))

    def get_tbs(self):
        tbs = None
        if self.image_type:
            # clipart
            tbs = self._add_to_tbs(tbs, "itp", self.image_type)
        if self.size_category and not (self.larger_than or (self.exact_width and self.exact_height)):
            # i = icon, l = large, m = medium, lt = larger than, ex = exact
            tbs = self._add_to_tbs(tbs, "isz", self.size_category)
        if self.larger_than:
            # qsvga,4mp
            tbs = self._add_to_tbs(tbs, "isz", SizeCategory.LARGER_THAN)
            tbs = self._add_to_tbs(tbs, "islt", self.larger_than)
        if self.exact_width and self.exact_height:
            tbs = self._add_to_tbs(tbs, "isz", SizeCategory.EXACTLY)
            tbs = self._add_to_tbs(tbs, "iszw", self.exact_width)
            tbs = self._add_to_tbs(tbs, "iszh", self.exact_height)
        if self.color_type and not self.color:
            # color = color, gray = black and white, specific = user defined
            tbs = self._add_to_tbs(tbs, "ic", self.color_type)
        if self.color:
            tbs = self._add_to_tbs(tbs, "ic", ColorType.SPECIFIC)
            tbs = self._add_to_tbs(tbs, "isc", self.color)
        if self.license:
            tbs = self._add_to_tbs(tbs, "sur", self.license)
        return tbs

    def _add_to_tbs(self, tbs, name, value):
        if tbs:
            return "%s,%s:%s" % (tbs, name, value)
        else:
            return "&tbs=%s:%s" % (name, value)


class ImageResult(object):

    """Represents a google image search result."""

    ROOT_FILENAME = "img"
    DEFAULT_FORMAT = "jpg"

    def __init__(self):
        self.name = None
        self.file_name = None
        self.link = None
        self.thumb = None
        self.thumb_width = None
        self.thumb_height = None
        self.width = None
        self.height = None
        self.filesize = None
        self.format = None
        self.domain = None
        self.page = None
        self.index = None
        self.site = None

    def __eq__(self, other):
        return self.link == other.link

    def __hash__(self):
        # use hash() rather than id() so that two results with equal links
        # also hash equal, as set deduplication (together with __eq__) requires
        return hash(self.link)

    def __repr__(self):
        string = "ImageResult(index={i}, page={p}, domain={d}, link={l})".format(
            i=str(self.index),
            p=str(self.page),
            d=unidecode(self.domain) if self.domain else None,
            l=unidecode(self.link) if self.link else None
        )
        return string

    def download(self, path="images"):
        """Download an image to a given path."""

        self._create_path(path)

        try:
            response = requests.get(self.link, stream=True)
            # request a protected image (adding a referer to the request)
            # referer = self.domain
            # image = self.link

            # req = urllib2.Request(image)
            # req.add_header('Referer', referer)  # here is the trick
            # response = urllib2.urlopen(req)

            if "image" in response.headers['content-type']:
                path_filename = self._get_path_filename(path)
                with open(path_filename, 'wb') as output_file:
                    shutil.copyfileobj(response.raw, output_file)
                    # output_file.write(response.content)
            else:
                print("\r\rskipped! cached image")

            del response

        except Exception as inst:
            print(self.link, "has failed:")
            print(inst)

    def _get_path_filename(self, path):
        """Build the filename to download.

        Checks that the filename is not already in path; otherwise it looks
        for another name.

        >>> ir = ImageResult()
        >>> ir._get_path_filename("test")
        'test\\\img3.jpg'
        >>> ir.file_name = "pirulo.jpg"
        >>> ir._get_path_filename("test")
        'test\\\pirulo.jpg'
        """

        path_filename = None

        # preserve the original name
        if self.file_name:
            original_filename = self.file_name
            path_filename = os.path.join(path, original_filename)

        # create a default name if there is no original name
        if not path_filename or os.path.isfile(path_filename):

            # take the format of the file, or use default
            if self.format:
                file_format = self.format
            else:
                file_format = self.DEFAULT_FORMAT

            # create root of file, until reaching a non existent one
            i = 1
            default_filename = self.ROOT_FILENAME + str(i) + "." + file_format
            path_filename = os.path.join(path, default_filename)
            while os.path.isfile(path_filename):
                i += 1
                default_filename = self.ROOT_FILENAME + str(i) + "." + \
                    file_format
                path_filename = os.path.join(path, default_filename)

        return path_filename

    def _create_path(self, path):
        """Create a path, if it doesn't exist."""

        if not os.path.isdir(path):
            os.mkdir(path)


# PRIVATE
def _parse_image_format(image_link):
    """Parse an image format from a download link.

    Args:
        image_link: link to download an image.
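    Returns:
        The image format as a lowercase string (eg. 'jpg'), or None if
        no known image format can be recognized in the link.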

    >>> link = "http://blogs.elpais.com/.a/6a00d8341bfb1653ef01a73dbb4a78970d-pi"
    >>> _parse_image_format(link)

    >>> link = "http://minionslovebananas.com/images/gallery/preview/Chiquita-DM2-minion-banana-3.jpg%3Fw%3D300%26h%3D429"
    >>> _parse_image_format(link)
    'jpg'
    """
    parsed_format = image_link[image_link.rfind(".") + 1:]

    # OLD: identify formats even with weird final characters
    if parsed_format not in IMAGE_FORMATS:
        for image_format in IMAGE_FORMATS:
            if image_format in parsed_format:
                parsed_format = image_format
                break

    if parsed_format not in IMAGE_FORMATS:
        parsed_format = None

    return parsed_format


def _get_images_req_url(query, image_options=None, page=0,
                        per_page=20):
    query = query.strip().replace(":", "%3A").replace(
        "+", "%2B").replace("&", "%26").replace(" ", "+")

    url = "https://www.google.com.ar/search?q={}".format(query) + \
        "&es_sm=122&source=lnms" + \
        "&tbm=isch&sa=X&ei=DDdUVL-fE4SpNq-ngPgK&ved=0CAgQ_AUoAQ" + \
        "&biw=1024&bih=719&dpr=1.25"

    if image_options:
        tbs = image_options.get_tbs()
        if tbs:
            url = url + tbs

    return url


def _find_divs_with_images(soup):

    try:
        div_container = soup.find("div", {"id": "rg_s"})
        divs = div_container.find_all("div", {"class": "rg_di"})
    except AttributeError:
        # soup.find() returned None: the page has no image container
        divs = None
    return divs


def _get_file_name(link):

    temp_name = link.rsplit('/', 1)[-1]
    image_format = _parse_image_format(link)

    if image_format and temp_name.rsplit(".", 1)[-1] != image_format:
        file_name = temp_name.rsplit(".", 1)[0] + "." + image_format

    else:
        file_name = temp_name

    return file_name


def _get_name():
    pass


def _get_filesize():
    pass


def _get_image_data(res, a):
    """Parse image data and write it to an ImageResult object.

    Args:
        res: An ImageResult object.
        a: An "a" html tag.
    """
    google_middle_link = a["href"]
    url_parsed = urllib.parse.urlparse(google_middle_link)
    qry_parsed = urllib.parse.parse_qs(url_parsed.query)
    res.name = _get_name()
    res.link = qry_parsed["imgurl"][0]
    res.file_name = _get_file_name(res.link)
    res.format = _parse_image_format(res.link)
    res.width = qry_parsed["w"][0]
    res.height = qry_parsed["h"][0]
    res.site = qry_parsed["imgrefurl"][0]
    res.domain = urllib.parse.urlparse(res.site).netloc
    res.filesize = _get_filesize()


def _get_thumb_data(res, img):
    """Parse thumb data and write it to an ImageResult object.

    Args:
        res: An ImageResult object.
        img: A list of "img" html tags.
    """
    try:
        res.thumb = img[0]["src"]
    except KeyError:
        res.thumb = img[0]["data-src"]

    try:
        img_style = img[0]["style"].split(";")
        img_style_dict = {i.split(":")[0]: i.split(":")[-1] for i in img_style}
        res.thumb_width = img_style_dict["width"]
        res.thumb_height = img_style_dict["height"]
    except Exception:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        print(exc_type, exc_value, "index=", res.index)


# PUBLIC
def search_old(query, image_options=None, pages=1):
    # NOTE: legacy implementation kept for reference; get_image_search_url
    # and Google.DEBUG_MODE are no longer defined, so this function is broken.
    results = []
    for i in range(pages):
        url = get_image_search_url(query, image_options, i)
        html = get_html(url)
        if html:
            if Google.DEBUG_MODE:
                write_html_to_file(
                    html, "images_{0}_{1}.html".format(query.replace(" ", "_"), i))
            j = 0
            soup = BeautifulSoup(html, "html.parser")
            match = re.search("dyn.setResults\((.+)\);", html)
            if match:
                init = str(match.group(1), errors="ignore")
                tokens = init.split('],[')
                for token in tokens:
                    res = ImageResult()
                    res.page = i
                    res.index = j
                    toks = token.split(",")

                    # should be 32 or 33, but seems to change, so just make sure no exceptions
                    # will be thrown by the indexing
                    if (len(toks) > 22):
                        for t in range(len(toks)):
                            toks[t] = toks[t].replace('\\x3cb\\x3e', '').replace(
                                '\\x3c/b\\x3e', '').replace('\\x3d', '=').replace('\\x26', '&')
                        match = re.search(
                            "imgurl=(?P<link>[^&]+)&imgrefurl", toks[0])
                        if match:
                            res.link = match.group("link")
                        res.name = toks[6].replace('"', '')
                        res.thumb = toks[21].replace('"', '')
                        res.format = toks[10].replace('"', '')
                        res.domain = toks[11].replace('"', '')
                        match = re.search(
                            "(?P<width>[0-9]+) × (?P<height>[0-9]+) - (?P<size>[^ ]+)", toks[9].replace('"', ''))
                        if match:
                            res.width = match.group("width")
                            res.height = match.group("height")
                            res.filesize = match.group("size")
                        results.append(res)
                    j = j + 1
    return results


def search(query, image_options=None, num_images=50):
    """Search images in google.

    Search images in google filtering by image type, size category, resolution,
    exact width, exact height, color type or color. A simple search can be
    performed without passing options. To filter the search, an ImageOptions
    object must be built with the different filter categories and passed.
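    Note that this method drives a real Firefox browser through selenium
    to load the results pages, so Firefox must be installed (see the image
    search requirement in the README).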

    Args:
        query: string to search in google images
        image_options: an ImageOptions object to filter the search
        num_images: number of images to be scraped

    Returns:
        A list of ImageResult objects
    """

    results = set()
    curr_num_img = 1
    page = 0
    browser = get_browser_with_url("about:home")
    while curr_num_img <= num_images:

        page += 1
        url = _get_images_req_url(query, image_options, page)
        # html = get_html_from_dynamic_site(url)
        browser.get(url)
        html = browser.page_source

        if html:
            soup = BeautifulSoup(html, "html.parser")

            # iterate over the divs containing images in one page
            divs = _find_divs_with_images(soup)

            # empty search result page case
            if not divs:
                break

            for div in divs:

                res = ImageResult()

                # store indexing parameters
                res.page = page
                res.index = curr_num_img

                # get url of image and its secondary data
                a = div.find("a")
                if a:
                    _get_image_data(res, a)

                    # get url of thumb and its size parameters
                    # (guarded by "if a", since the thumb lives inside the "a" tag)
                    img = a.find_all("img")
                    if img:
                        _get_thumb_data(res, img)

                # increment image counter only if a new image was added
                prev_num_results = len(results)
                results.add(res)
                curr_num_results = len(results)

                if curr_num_results > prev_num_results:
                    curr_num_img += 1

                # break the loop when the limit of images is reached
                if curr_num_img >= num_images:
                    break

    browser.quit()

    return list(results)


def _download_image(image_result, path):

    if image_result.format:
        if path:
            image_result.download(path)
        else:
            image_result.download()


@measure_time
def download(image_results, path=None):
    """Download a list of images.

    Args:
        image_results: a list of ImageResult instances
        path: path to store downloaded images.
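            If None, images are saved to an "images" folder inside the
            current working directory (the default of ImageResult.download).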
    """

    total_images = len(image_results)
    i = 1
    for image_result in image_results:

        progress = "".join(["Downloading image ", str(i),
                            " (", str(total_images), ")"])
        print(progress)
        sys.stdout.flush()

        _download_image(image_result, path)

        i += 1


class ThreadUrl(threading.Thread):

    """Threaded url grabber that downloads images from a shared queue."""

    def __init__(self, queue, path, total):
        threading.Thread.__init__(self)
        self.queue = queue
        self.path = path
        self.total = total

    def run(self):
        while True:
            # grabs an image result from the queue
            image_result = self.queue.get()

            counter = self.total - self.queue.qsize()
            progress = "".join(["Downloading image ", str(counter),
                                " (", str(self.total), ")"])
            print(progress)
            sys.stdout.flush()
            _download_image(image_result, self.path)

            # signals to the queue that the job is done
            self.queue.task_done()


@measure_time
def fast_download(image_results, path=None, threads=10):
    # the local variable must not shadow the imported "queue" module
    download_queue = queue.Queue()
    total = len(image_results)

    for image_result in image_results:
        download_queue.put(image_result)

    # spawn a pool of threads, and pass them the queue instance
    for i in range(threads):
        t = ThreadUrl(download_queue, path, total)
        t.daemon = True
        t.start()

    # wait on the queue until everything has been processed
    download_queue.join()
--------------------------------------------------------------------------------
/googleapi/modules/shopping_search.py:
--------------------------------------------------------------------------------
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import absolute_import

from builtins import range
from builtins import object

from .utils import get_html, normalize_query
from bs4 import BeautifulSoup
import re
from unidecode import unidecode


class ShoppingResult(object):

    """Represents a shopping result."""

    def __init__(self):
        self.name = None
        self.link = None
        self.thumb = None
        self.subtext = None
        self.description = None
        self.compare_url = None
        self.store_count = None
        self.min_price = None

    def __repr__(self):
        return unidecode(self.name)


def shopping(query, pages=1):
    results = []
    for i in range(pages):
        url = _get_shopping_url(query, i)
        html = get_html(url)
        if html:
            j = 0
            soup = BeautifulSoup(html, "html.parser")

            products = soup.findAll("div", "g")
            for prod in products:
                res = ShoppingResult()

                divs = prod.findAll("div")
                for div in divs:
                    match = re.search(
                        "from (?P<count>[0-9]+) stores", div.text.strip())
                    if match:
                        res.store_count = match.group("count")
                        break

                h3 = prod.find("h3", "r")
                if h3:
                    a = h3.find("a")
                    if a:
                        res.compare_url = a["href"]
                    res.name = h3.text.strip()

                psliimg = prod.find("div", "psliimg")
                if psliimg:
                    img = psliimg.find("img")
                    if img:
                        res.thumb = img["src"]

                f = prod.find("div", "f")
                if f:
                    res.subtext = f.text.strip()

                price = prod.find("div", "psliprice")
                if price:
                    res.min_price = price.text.strip()

                results.append(res)
                j = j + 1
    return results


def _get_shopping_url(query, page=0, per_page=10):
    return "http://www.google.com/search?hl=en&q={0}&tbm=shop&start={1}&num={2}".format(normalize_query(query), page * per_page, per_page)
--------------------------------------------------------------------------------
/googleapi/modules/standard_search.py:
--------------------------------------------------------------------------------
from __future__ import unicode_literals
from __future__ import absolute_import

from future import standard_library
standard_library.install_aliases()
from builtins import range
from builtins import object
from .utils import _get_search_url, get_html
from bs4 import BeautifulSoup
import urllib.parse
from urllib.parse import unquote, parse_qs, urlparse
from unidecode import unidecode
from re import match, findall


class GoogleResult(object):

    """Represents a google search result."""

    def __init__(self):
        self.name = None  # The title of the link
        self.link = None  # The external link
        self.google_link = None  # The google link
        self.description = None  # The description of the link
        self.thumb = None  # Thumbnail link of website (NOT implemented yet)
        self.cached = None  # Cached version link of page
        self.page = None  # Results page this one was on
        self.index = None  # What index on this page it was on
        self.number_of_results = None  # The total number of results the query returned
        self.is_pdf = None  # True if google thinks this result leads to a PDF file

    def __repr__(self):
        name = self._limit_str_size(self.name, 55)
        description = self._limit_str_size(self.description, 49)

        list_google = ["GoogleResult(",
                       "name={}".format(name), "\n", " " * 13,
                       "description={}".format(description)]

        return "".join(list_google)

    def _limit_str_size(self, str_element, size_limit):
        """Limit the characters of the string, adding .. at the end."""
        if not str_element:
            return None

        elif len(str_element) > size_limit:
            return unidecode(str_element[:size_limit]) + ".."

        else:
            return unidecode(str_element)


# PUBLIC
def search(query, pages=1, lang='en', area='com', ncr=False, void=True, time_period=False, sort_by_date=False, first_page=0):
    """Returns a list of GoogleResult objects from a google search.

    Args:
        query: String to search in google.
        pages: Number of pages of results to fetch.
        lang: Language code for the google interface (eg. 'en').
        area: Country domain of the google homepage to use (eg. 'com', 'uk').
        ncr: If True, search with No Country Redirect (google.com).
        void: If True, skip results that have no description.
        time_period: One of 'hour', 'week', 'month' or 'year' to restrict
            results by date, or False for no restriction.
        sort_by_date: If True, sort the results by date.
        first_page: First page of results to fetch.

    TODO: add support to get the google results.
    Returns:
        A list of GoogleResult objects."""

    results = []
    for i in range(first_page, first_page + pages):
        url = _get_search_url(query, i, lang=lang, area=area, ncr=ncr, time_period=time_period, sort_by_date=sort_by_date)
        html = get_html(url)

        if html:
            soup = BeautifulSoup(html, "html.parser")
            divs = soup.findAll("div", attrs={"class": "g"})

            results_div = soup.find("div", attrs={"id": "resultStats"})
            number_of_results = _get_number_of_results(results_div)

            j = 0
            for li in divs:
                res = GoogleResult()

                res.page = i
                res.index = j

                res.name = _get_name(li)
                res.link = _get_link(li)
                res.google_link = _get_google_link(li)
                res.description = _get_description(li)
                res.thumb = _get_thumb()
                res.cached = _get_cached(li)
                res.number_of_results = number_of_results
                res.is_pdf = _get_is_pdf(li)

                if void is True:
                    if res.description is None:
                        continue
                results.append(res)
                j += 1
    return results


# PRIVATE
def _get_name(li):
    """Return the name of a google search result."""
    a = li.find("a")
    # return a.text.encode("utf-8").strip()
    if a is not None:
        return a.text.strip()
    return None


def _filter_link(link):
    '''Filter links found in the Google result pages HTML code.
    Returns None if the link doesn't yield a valid result.
    '''
    try:
        # Valid results are absolute URLs not pointing to a Google domain
        # like images.google.com or googleusercontent.com
        o = urlparse(link, 'http')
        # link type-1
        # >>> "https://www.gitbook.com/book/ljalphabeta/python-"
        if o.netloc and 'google' not in o.netloc:
            return link
        # link type-2
        # >>> "http://www.google.com/url?url=http://python.jobbole.com/84108/&rct=j&frm=1&q=&esrc=s&sa=U&ved=0ahUKEwj3quDH-Y7UAhWG6oMKHdQ-BQMQFggUMAA&usg=AFQjCNHPws5Buru5Z71wooRLHT6mpvnZlA"
        if o.netloc and o.path.startswith('/url'):
            try:
                link = parse_qs(o.query)['url'][0]
                o = urlparse(link, 'http')
                if o.netloc and 'google' not in o.netloc:
                    return link
            except KeyError:
                pass
        # Decode hidden URLs.
        if link.startswith('/url?'):
            try:
                # link type-3
                # >>> "/url?q=http://python.jobbole.com/84108/&sa=U&ved=0ahUKEwjFw6Txg4_UAhVI5IMKHfqVAykQFggUMAA&usg=AFQjCNFOTLpmpfqctpIn0sAfaj5U5gAU9A"
                link = parse_qs(o.query)['q'][0]
                # Valid results are absolute URLs not pointing to a Google domain
                # like images.google.com or googleusercontent.com
                o = urlparse(link, 'http')
                if o.netloc and 'google' not in o.netloc:
                    return link
            except KeyError:
                # link type-4
                # >>> "/url?url=https://machine-learning-python.kspax.io/&rct=j&frm=1&q=&esrc=s&sa=U&ved=0ahUKEwj3quDH-Y7UAhWG6oMKHdQ-BQMQFggfMAI&usg=AFQjCNEfkUI0RP_RlwD3eI22rSfqbYM_nA"
                link = parse_qs(o.query)['url'][0]
                o = urlparse(link, 'http')
                if o.netloc and 'google' not in o.netloc:
                    return link

        # Otherwise, or on error, return None.
156 | except Exception: 157 | pass 158 | return None 159 | 160 | 161 | def _get_link(li): 162 | """Return external link from a search.""" 163 | try: 164 | a = li.find("a") 165 | link = a["href"] 166 | except Exception: 167 | return None 168 | return _filter_link(link) 169 | 170 | 171 | def _get_google_link(li): 172 | """Return google link from a search.""" 173 | try: 174 | a = li.find("a") 175 | link = a["href"] 176 | except Exception: 177 | return None 178 | 179 | if link.startswith("/url?") or link.startswith("/search?"): 180 | return urllib.parse.urljoin("http://www.google.com", link) 181 | 182 | else: 183 | return None 184 | 185 | 186 | def _get_description(li): 187 | """Return the description of a google search. 188 | 189 | TODO: There are some text encoding problems to resolve.""" 190 | 191 | sdiv = li.find("div", attrs={"class": "IsZvec"}) 192 | if sdiv: 193 | stspan = sdiv.find("span", attrs={"class": "aCOpRe"}) 194 | if stspan is not None: 195 | # return stspan.text.encode("utf-8").strip() 196 | return stspan.text.strip() 197 | else: 198 | return None 199 | 200 | 201 | def _get_thumb(): 202 | """Return the link to a thumbnail of the website.""" 203 | pass 204 | 205 | 206 | def _get_cached(li): 207 | """Return a link to the cached version of the page.""" 208 | links = li.find_all("a") 209 | if len(links) > 1 and links[1].text == "Cached": 210 | link = links[1]["href"] 211 | if link.startswith("/url?") or link.startswith("/search?"): 212 | return urllib.parse.urljoin("http://www.google.com", link) 213 | return None 214 | 215 | def _get_is_pdf(li): 216 | """Return if the link is marked by google as PDF""" 217 | sdiv = li.find("span", attrs={"class": "ZGwO7 C0kchf NaCKVc"}) 218 | return True if sdiv else False 219 | 220 | def _get_number_of_results(results_div): 221 | """Return the total number of results of the google search. 222 | Note that the returned value will be the same for all the GoogleResult 223 | objects from a specific query.""" 224 | try: 225 | results_div_text = results_div.get_text() 226 | if results_div_text: 227 | regex = r"((?:\d+[,\.])*\d+)" 228 | m = findall(regex, results_div_text) 229 | 230 | # Clean up the number. 
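            # (google may use ',' or '.' as thousands separators
            # depending on the locale of the results page)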
231 |             num = m[0].replace(",", "").replace(".", "")
232 | 
233 |             results = int(num)
234 |             return results
235 |     except Exception:
236 |         return 0
237 | 
--------------------------------------------------------------------------------
/googleapi/modules/utils.py:
--------------------------------------------------------------------------------
1 | from __future__ import unicode_literals
2 | from __future__ import print_function
3 | from __future__ import division
4 | 
5 | from future import standard_library
6 | standard_library.install_aliases()
7 | from builtins import range
8 | from past.utils import old_div
9 | import time
10 | from selenium import webdriver
11 | import urllib.request
12 | import urllib.error
13 | import urllib.parse
14 | from functools import wraps
15 | from urllib.parse import urlencode
16 | from fake_useragent import UserAgent
17 | import sys
18 | 
19 | 
20 | class AreaError(KeyError):
21 |     pass
22 | 
23 | 
24 | def measure_time(fn):
25 | 
26 |     @wraps(fn)
27 |     def decorator(*args, **kwargs):
28 |         start = time.time()
29 | 
30 |         res = fn(*args, **kwargs)
31 | 
32 |         elapsed = time.time() - start
33 |         print(fn.__name__, "took", elapsed, "seconds")
34 | 
35 |         return res
36 | 
37 |     return decorator
38 | 
39 | 
40 | def normalize_query(query):
41 |     return query.strip().replace(":", "%3A").replace("+", "%2B").replace("&", "%26").replace(" ", "+")
42 | 
43 | 
44 | # Google country-code domains, keyed by the short area codes used across
45 | # this package. The value is the piece between "www.google." and "/search?"
46 | # (e.g. 'co.uk' for the United Kingdom). A lookup table replaces the long
47 | # if/elif chain that used to build these URLs one country at a time.
48 | AREA_DOMAINS = {
49 |     'com': 'com', 'is': 'is', 'dk': 'dk', 'no': 'no', 'se': 'se',
50 |     'fi': 'fi', 'ee': 'ee', 'lv': 'lv', 'lt': 'lt', 'ie': 'ie',
51 |     'uk': 'co.uk', 'gg': 'gg', 'je': 'je', 'im': 'im', 'fr': 'fr',
52 |     'nl': 'nl', 'be': 'be', 'lu': 'lu', 'de': 'de', 'at': 'at',
53 |     'ch': 'ch', 'li': 'li', 'pt': 'pt', 'es': 'es', 'gi': 'com.gi',
54 |     'ad': 'ad', 'it': 'it', 'mt': 'com.mt', 'sm': 'sm', 'gr': 'gr',
55 |     'ru': 'ru', 'by': 'com.by', 'ua': 'com.ua', 'pl': 'pl', 'cz': 'cz',
56 |     'sk': 'sk', 'hu': 'hu', 'si': 'si', 'hr': 'hr', 'ba': 'ba',
57 |     'me': 'me', 'rs': 'rs', 'mk': 'mk', 'bg': 'bg', 'ro': 'ro',
58 |     'md': 'md', 'hk': 'com.hk', 'mn': 'mn', 'kr': 'co.kr', 'jp': 'co.jp',
59 |     'vn': 'com.vn', 'la': 'la', 'kh': 'com.kh', 'th': 'co.th', 'my': 'com.my',
60 |     'sg': 'com.sg', 'bn': 'com.bn', 'ph': 'com.ph', 'id': 'co.id', 'tp': 'tp',
61 |     'kz': 'kz', 'kg': 'kg', 'tj': 'com.tj', 'uz': 'co.uz', 'tm': 'tm',
62 |     'af': 'com.af', 'pk': 'com.pk', 'np': 'com.np', 'in': 'co.in', 'bd': 'com.bd',
63 |     'lk': 'lk', 'mv': 'mv', 'kw': 'com.kw', 'sa': 'com.sa', 'bh': 'com.bh',
64 |     'ae': 'ae', 'om': 'com.om', 'jo': 'jo', 'il': 'co.il', 'lb': 'com.lb',
65 |     'tr': 'com.tr', 'az': 'az', 'am': 'am', 'ls': 'co.ls', 'eg': 'com.eg',
66 |     'ly': 'com.ly', 'dz': 'dz', 'ma': 'co.ma', 'sn': 'sn', 'gm': 'gm',
67 |     'ml': 'ml', 'bf': 'bf', 'sl': 'com.sl', 'ci': 'ci', 'gh': 'com.gh',
68 |     'tg': 'tg', 'bj': 'bj', 'ne': 'ne', 'ng': 'com.ng', 'sh': 'sh',
69 |     'cm': 'cm', 'td': 'td', 'cf': 'cf', 'ga': 'ga', 'cg': 'cg',
70 |     'cd': 'cd', 'ao': 'it.ao', 'et': 'com.et', 'dj': 'dj', 'ke': 'co.ke',
71 |     'ug': 'co.ug', 'tz': 'co.tz', 'rw': 'rw', 'bi': 'bi', 'mw': 'mw',
72 |     'mz': 'co.mz', 'mg': 'mg', 'sc': 'sc', 'mu': 'mu', 'zm': 'co.zm',
73 |     'zw': 'co.zw', 'bw': 'co.bw', 'na': 'com.na', 'za': 'co.za', 'au': 'com.au',
74 |     'nf': 'com.nf', 'nz': 'co.nz', 'sb': 'com.sb', 'fj': 'com.fj', 'fm': 'fm',
75 |     'ki': 'ki', 'nr': 'nr', 'tk': 'tk', 'ws': 'ws', 'as': 'as',
76 |     'to': 'to', 'nu': 'nu', 'ck': 'co.ck', 'do': 'com.do', 'tt': 'tt',
77 |     'co': 'com.co', 'ec': 'com.ec', 've': 'co.ve', 'gy': 'gy', 'pe': 'com.pe',
78 |     'bo': 'com.bo', 'py': 'com.py', 'br': 'com.br', 'uy': 'com.uy', 'ar': 'com.ar',
79 |     'cl': 'cl', 'gl': 'gl', 'ca': 'ca', 'mx': 'com.mx', 'gt': 'com.gt',
80 |     'bz': 'com.bz', 'sv': 'com.sv', 'hn': 'hn', 'ni': 'com.ni', 'cr': 'co.cr',
81 |     'pa': 'com.pa', 'bs': 'bs', 'cu': 'com.cu', 'jm': 'com.jm', 'ht': 'ht',
82 | }
83 | 
84 | 
85 | def _get_search_url(query, page=0, per_page=10, lang='en', area='com', ncr=False, time_period=False, sort_by_date=False):
86 |     # note: num per page might not be supported by google anymore (because of
87 |     # google instant)
88 | 
89 |     params = {
90 |         'hl': lang,  # interface language ('hl' is the parameter google expects)
91 |         'q': query.encode('utf8'),
92 |         'start': page * per_page,
93 |         'num': per_page
94 |     }
95 | 
96 |     time_mapping = {
97 |         'hour': 'qdr:h',
98 |         'week': 'qdr:w',
99 |         'month': 'qdr:m',
100 |         'year': 'qdr:y'
101 |     }
102 | 
103 |     tbs_param = []
104 |     # Set the time period for the query if one was given.
105 |     if time_period and time_period in time_mapping:
106 |         tbs_param.append(time_mapping[time_period])
107 | 
108 |     if sort_by_date:
109 |         tbs_param.append('sbd:1')
110 |     params['tbs'] = ','.join(tbs_param)
111 | 
112 |     # These parameters search Google with No Country Redirect.
113 |     if ncr:
114 |         params['gl'] = 'us'  # Geographic Location: US
115 |         params['pws'] = '0'  # 'pws' = '0' disables personalised search
116 |         params['gws_rd'] = 'cr'  # Google Web Server ReDirect: CountRy.
117 | 
118 |     params = urlencode(params)
119 | 
120 |     # @author JuaniFilardo:
121 |     # Workaround to switch between http and https, since this maneuver
122 |     # seems to avoid the 503 error when performing a lot of queries.
123 |     # Weird, but it works.
124 |     # You may also wanna wait some time between queries, say, randint(50,65)
125 |     # between each query, and randint(180,240) every 100 queries, which is
126 |     # what I found useful.
127 |     https = int(time.time()) % 2 == 0
128 |     bare_url = u"https://www.google.com/search?" if https else u"http://www.google.com/search?"
129 |     url = bare_url + params
130 | 
131 |     # Without No Country Redirect, search through the country-specific
132 |     # google domain requested by `area` instead.
133 |     if not ncr:
134 |         try:
135 |             url = 'http://www.google.%s/search?' % AREA_DOMAINS[area] + params
136 |         except KeyError:
137 |             raise AreaError('invalid name, no area found')
138 | 
139 |     return url
140 | 
141 | 
142 | def get_html(url):
143 |     ua = UserAgent()
144 |     header = ua.random
145 | 
146 |     try:
147 |         request = urllib.request.Request(url)
148 |         request.add_header("User-Agent", header)
149 |         html = urllib.request.urlopen(request).read()
150 |         return html
151 |     except urllib.error.HTTPError as e:
152 |         print("Error accessing:", url)
153 |         print(e)
154 |         if e.code == 503 and b'CaptchaRedirect' in e.read():
155 |             print("Google is requiring a Captcha. "
156 |                   "For more information check: 'https://support.google.com/websearch/answer/86640'")
157 |         if e.code == 503:
158 |             sys.exit("503 Error: service is currently unavailable. Program will exit.")
159 |         return None
160 |     except Exception as e:
161 |         print("Error accessing:", url)
162 |         print(e)
163 |         return None
164 | 
165 | 
166 | def write_html_to_file(html, filename):
167 |     # get_html returns raw bytes, so write in binary mode and only encode
168 |     # when we were given text.
169 |     with open(filename, "wb") as of:
170 |         of.write(html if isinstance(html, bytes) else html.encode("utf-8"))
171 | 
172 | 
173 | def get_browser_with_url(url, timeout=120, driver="firefox"):
174 |     """Returns an open browser with a given url."""
175 | 
176 |     # choose a browser
177 |     if driver == "firefox":
178 |         browser = webdriver.Firefox()
179 |     elif driver == "ie":
180 |         browser = webdriver.Ie()
181 |     elif driver == "chrome":
182 |         browser = webdriver.Chrome()
183 |     else:
184 |         # failing fast beats the NameError that would follow below
185 |         raise ValueError("Driver chosen is not recognized: %s" % driver)
186 | 
187 |     # set maximum load time
188 |     browser.set_page_load_timeout(timeout)
189 | 
190 |     # open a browser with given url
191 |     browser.get(url)
192 | 
193 |     time.sleep(0.5)
194 | 
195 |     return browser
196 | 
197 | 
198 | def get_html_from_dynamic_site(url, timeout=120,
199 |                                driver="firefox", attempts=10):
200 |     """Returns html from a dynamic site, opening it in a browser."""
201 | 
202 |     RV = ""
203 | 
204 |     # try several attempts
205 |     for i in range(attempts):
206 |         try:
207 |             # load browser
208 |             browser = get_browser_with_url(url, timeout, driver)
209 | 
210 |             # get html
211 |             time.sleep(2)
212 |             content = browser.page_source
213 | 
214 |             # try again if there is no content
215 |             if not content:
216 |                 browser.quit()
217 |                 raise Exception("No content!")
218 | 
219 |             # if there is content, get out
220 |             browser.quit()
221 |             RV = content
222 |             break
223 | 
224 |         except Exception:
225 |             print("\nTry ", i, " of ", attempts, "\n")
226 |             time.sleep(5)
227 | 
228 |     return RV
229 | 
230 | 
231 | def timeit(func=None, loops=1, verbose=False):
232 |     if func:
233 |         def inner(*args, **kwargs):
234 | 
235 |             sums = 0.0
236 |             mins = 1.7976931348623157e+308  # sys.float_info.max
237 |             maxs = 0.0
238 |             print('====%s Timing====' % func.__name__)
239 |             for i in range(0, loops):
240 |                 t0 = time.time()
241 |                 result = func(*args, **kwargs)
242 |                 dt = time.time() - t0
243 |                 mins = dt if dt < mins else mins
244 |                 maxs = dt if dt > maxs else maxs
245 |                 sums += dt
246 |                 if verbose:
247 |                     print('\t%r ran in %2.9f sec on run %s' %
248 |                           (func.__name__, dt, i))
249 |             print('%r min run time was %2.9f sec' % (func.__name__, mins))
250 |             print('%r max run time was %2.9f sec' % (func.__name__, maxs))
251 |             print('%r avg run time was %2.9f sec in %s runs' %
252 |                   (func.__name__, old_div(sums, loops), loops))
253 |             print('==== end ====')
254 |             return result
255 | 
256 |         return inner
257 |     else:
258 |         def partial_inner(func):
259 |             return timeit(func, loops, verbose)
260 |         return partial_inner
261 | 
262 | 
263 | def timing(f):
264 |     @wraps(f)
265 |     def wrap(*args, **kw):
266 |         ts = time.time()
267 |         result = f(*args, **kw)
268 |         te = time.time()
269 |         print('func:%r args:[%r, %r] took: %2.4f sec' %
270 |               (f.__name__, args, kw, te - ts))
271 |         return result
272 |     return wrap
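273 | 
274 | 
275 | # Example (illustrative): the URL _get_search_url builds for a UK search
276 | # restricted to the last month and sorted by date. The query-string order
277 | # follows the params dict above, and urlencode escapes ':' and ','.
278 | #
279 | #   >>> _get_search_url("python", page=1, per_page=10, lang="en",
280 | #   ...                 area="uk", time_period="month", sort_by_date=True)
281 | #   'http://www.google.co.uk/search?hl=en&q=python&start=10&num=10&tbs=qdr%3Am%2Csbd%3A1'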
--------------------------------------------------------------------------------
/googleapi/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abenassi/Google-Search-API/546a59cc22d3260c60a2faf9afe6477168ead627/googleapi/tests/__init__.py
--------------------------------------------------------------------------------
/googleapi/tests/html_files/test_convert_currency.html:
--------------------------------------------------------------------------------
[HTML fixture for the currency-conversion test: a "Currency Converter - Google Finance" page. The page markup did not survive this dump; the recoverable content is the converter form and the result line "5 USD = 4.4585 EUR".]
--------------------------------------------------------------------------------
/googleapi/tests/html_files/test_exchange_rate.html:
--------------------------------------------------------------------------------
[HTML fixture for the exchange-rate test, with the same "Currency Converter - Google Finance" structure as above; the recoverable result line is "1 USD = 0.8913 EUR".]
--------------------------------------------------------------------------------
/googleapi/tests/html_files/test_standard_search.html:
--------------------------------------------------------------------------------
[HTML fixture for the standard-search test: a "github - Google Search" results page; the recoverable text is the hit count "About 123,000,000 results".]
-------------------------------------------------------------------------------- /googleapi/tests/test_google.py: -------------------------------------------------------------------------------- 1 | from builtins import object 2 | import unittest 3 | import nose 4 | from googleapi import google 5 | from googleapi import currency, images, shopping_search 6 | from mock import Mock 7 | import os 8 | import vcr 9 | 10 | BASE_DIR = os.path.dirname(__file__) 11 | 12 | 13 | def load_html_file(path): 14 | """Call test with a html file of the same name. 15 | 16 | Args: 17 | path: Relative path where the html file is located.""" 18 | 19 | def test_decorator(fn): 20 | base_path = os.path.join(os.path.dirname(__file__), path) 21 | file_name = fn.__name__ + ".html" 22 | file_path = os.path.join(base_path, file_name) 23 | 24 | html_f = open(file_path, "r") 25 | 26 | def test_decorated(self): 27 | fn(self, html_f) 28 | 29 | return test_decorated 30 | return test_decorator 31 | 32 | 33 | # HELPERS 34 | def get_dir_vcr(name): 35 | return os.path.join(BASE_DIR, "vcr_cassetes", name) 36 | 37 | 38 | class GoogleTest(unittest.TestCase): 39 | 40 | @load_html_file("html_files") 41 | # @unittest.skip("skip") 42 | def test_search_images(self, html_f): 43 | """Test method to search images.""" 44 | 45 | class MockBrowser(object): 46 | 47 | """Mock browser to replace selenium driver.""" 48 | 49 | def __init__(self, html_f): 50 | self.page_source = html_f.read() 51 | self.page_source = self.page_source.decode('utf8') if 'decode' in dir(self.page_source) else self.page_source 52 | 53 | def get(self, url): 54 | pass 55 | 56 | def quit(self): 57 | pass 58 | 59 | google.images.get_browser_with_url = \ 60 | Mock(return_value=MockBrowser(html_f)) 61 | 62 | res = google.search_images("apple", num_images=10) 63 | self.assertEqual(len(res), 10) 64 | 65 | # @load_html_file("html_files") 66 | # def test_calculator(self, html_f): 67 | @unittest.skip("skip") 68 | def test_calculator(self): 69 | """Test method to calculate in google.""" 70 | 71 | # replace method to get html from a test html file 72 | # google.calculator.get_html_from_dynamic_site = \ 73 | # Mock(return_value=html_f.read().decode('utf8')) 74 | 75 | calc = google.calculate("157.3kg in grams") 76 | self.assertEqual(calc.value, 157300) 77 | 78 | # @load_html_file("html_files") 79 | @vcr.use_cassette(get_dir_vcr("test_exchange_rate.yaml")) 80 | def test_exchange_rate(self): 81 | """Test method to get an exchange rate in google.""" 82 | 83 | usd_to_eur = google.exchange_rate("USD", "EUR") 84 | self.assertGreater(usd_to_eur, 0.0) 85 | 86 | # @load_html_file("html_files") 87 | @vcr.use_cassette(get_dir_vcr("test_convert_currency.yaml")) 88 | def test_convert_currency(self): 89 | """Test method to convert currency in google.""" 90 | 91 | euros = google.convert_currency(5.0, "USD", "EUR") 92 | self.assertGreater(euros, 0.0) 93 | 94 | # @load_html_file("html_files") 95 | @vcr.use_cassette(get_dir_vcr("test_standard_search.yaml")) 96 | def test_standard_search(self): 97 | """Test method to search in google.""" 98 | 99 | search = google.search("github") 100 | self.assertNotEqual(len(search), 0) 101 | 102 | # @load_html_file("html_files") 103 | @vcr.use_cassette(get_dir_vcr("test_shopping_search.yaml")) 104 | @unittest.skip("skip") 105 | def test_shopping_search(self): 106 | """Test method for google shopping.""" 107 | 108 | shop = google.shopping_search("Disgaea 4") 109 | self.assertNotEqual(len(shop), 0) 110 | 111 | 112 | class ConvertCurrencyTest(unittest.TestCase): 113 | 
114 |     def test_get_currency_req_url(self):
115 |         """Test method to get currency conversion request url."""
116 | 
117 |         amount = 10
118 |         from_currency = "USD"
119 |         to_currency = "EUR"
120 |         req_url = currency._get_currency_req_url(amount, from_currency,
121 |                                                  to_currency)
122 | 
123 |         exp_req_url = "https://www.google.com/finance/converter?a=10&from=USD&to=EUR"
124 | 
125 |         self.assertEqual(req_url, exp_req_url)
126 | 
127 |     # @unittest.skip("skip")
128 |     def test_parse_currency_response(self):
129 |         """Test method to parse currency response. TODO!"""
130 |         pass
131 | 
132 |     # @unittest.skip("skip")
133 | 
134 | 
135 | class SearchImagesTest(unittest.TestCase):
136 | 
137 |     def test_get_images_req_url(self):
138 | 
139 |         query = "banana"
140 |         options = images.ImageOptions()
141 |         options.image_type = images.ImageType.CLIPART
142 |         options.larger_than = images.LargerThan.MP_4
143 |         options.color = "green"
144 |         options.license = images.License.REUSE_WITH_MOD
145 | 
146 |         req_url = images._get_images_req_url(query, options)
147 | 
148 |         exp_req_url = 'https://www.google.com.ar/search?q=banana&es_sm=122&source=lnms&tbm=isch&sa=X&ei=DDdUVL-fE4SpNq-ngPgK&ved=0CAgQ_AUoAQ&biw=1024&bih=719&dpr=1.25&tbs=itp:clipart,isz:lt,islt:4mp,ic:specific,isc:green,sur:fmc'
149 | 
150 |         self.assertEqual(req_url, exp_req_url)
151 | 
152 |     def test_repr(self):
153 |         res = images.ImageResult()
154 |         assert repr(
155 |             res) == 'ImageResult(index=None, page=None, domain=None, link=None)'
156 |         res.page = 1
157 |         res.index = 11
158 |         res.name = 'test'
159 |         res.thumb = 'test'
160 |         res.format = 'test'
161 |         res.domain = 'test'
162 |         res.link = 'http://aa.com'
163 |         assert repr(
164 |             res) == 'ImageResult(index=11, page=1, domain=test, link=http://aa.com)'
165 | 
166 |     def test_download(self):
167 |         pass
168 | 
169 |     def test_fast_download(self):
170 |         pass
171 | 
172 | 
173 | if __name__ == '__main__':
174 |     # nose.main()
175 |     nose.run(defaultTest=__name__)
--------------------------------------------------------------------------------
/googleapi/tests/test_utils.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | """Tests helper methods."""
5 | 
6 | from __future__ import print_function
7 | from __future__ import with_statement
8 | import unittest
9 | import nose
10 | 
11 | from googleapi.modules.utils import _get_search_url
12 | 
13 | 
14 | class UtilsTestCase(unittest.TestCase):
15 |     """Tests for helper methods."""
16 |     @unittest.skip("Don't know why, but it does not work. Skipping for now")
17 |     def test_get_search_url(self):
18 |         url = _get_search_url("apple", 0, 10, "en", "jp")
19 |         exp_url = "http://www.google.co.jp/search?q=apple&start=0&num=10&hl=en"
20 |         self.assertEqual(url, exp_url)
21 | 
22 | 
23 | if __name__ == '__main__':
24 |     nose.run(defaultTest=__name__)
--------------------------------------------------------------------------------
/requirements.py:
--------------------------------------------------------------------------------
1 | import os
2 | import re
3 | import logging
4 | import warnings
5 | from pkg_resources import Requirement as Req
6 | 
7 | try:
8 |     from urllib.parse import urlparse
9 | except ImportError:
10 |     from urlparse import urlparse
11 | 
12 | 
13 | __version__ = '0.1.0'
14 | 
15 | logging.basicConfig(level=logging.WARNING)
16 | 
17 | VCS = ['git', 'hg', 'svn', 'bzr']
18 | 
19 | 
20 | class Requirement(object):
21 |     """
22 |     This class is inspired by
23 |     https://github.com/davidfischer/requirements-parser/blob/master/requirements/requirement.py#L30
24 |     License: BSD
25 |     """
26 | 
27 |     def __init__(self, line):
28 |         self.line = line
29 |         self.is_editable = False
30 |         self.is_local_file = False
31 |         self.is_specifier = False
32 |         self.vcs = None
33 |         self.name = None
34 |         self.uri = None
35 |         self.full_uri = None
36 |         self.path = None
37 |         self.revision = None
38 |         self.scheme = None
39 |         self.login = None
40 |         self.extras = []
41 |         self.specs = []
42 | 
43 |     def __repr__(self):
44 |         return '<Requirement: {0}>'.format(self.line)
45 | 
46 |     @classmethod
47 |     def parse(cls, line, editable=False):
48 |         """
49 |         Parses a Requirement from a requirement line, which is either a
50 |         specifier, a local project path or a VCS project URI.
51 | 
52 |         See: pip/req.py:from_editable()
53 | 
54 |         :param line: a requirement line, without any "-e" prefix
55 |         :returns: a Requirement instance for the given line
56 |         :raises: ValueError on an invalid requirement
57 |         """
58 |         if editable:
59 |             req = cls('-e {0}'.format(line))
60 |             req.is_editable = True
61 |         else:
62 |             req = cls(line)
63 | 
64 |         url = urlparse(line)
65 |         req.uri = None
66 |         if url.scheme:
67 |             req.scheme = url.scheme
68 |             req.uri = url.scheme + '://' + url.netloc + url.path
69 |             fragment = url.fragment.split(' ')[0].strip()
70 |             req.name = fragment.split('egg=')[-1] or None
71 |             req.path = url.path
72 |             if fragment:
73 |                 req.uri += '#{}'.format(fragment)
74 |             if url.username or url.password:
75 |                 username = url.username or ''
76 |                 password = url.password or ''
77 |                 req.login = username + ':' + password
78 |             if '@' in url.path:
79 |                 req.revision = url.path.split('@')[-1]
80 | 
81 |         for vcs in VCS:
82 |             if req.uri and req.uri.startswith(vcs):
83 |                 req.vcs = vcs
84 |         if req.scheme == 'file':
85 |             req.is_local_file = True
86 | 
87 |         if not req.vcs and not req.is_local_file and 'egg=' not in line:
88 |             # This is a requirement specifier.
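A specifier is a bare distribution name plus optional version constraints, e.g. "requests>=2.0,<3.0".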
89 |             # Delegate to pkg_resources and hope for the best
90 |             req.is_specifier = True
91 |             pkg_req = Req.parse(line)
92 |             req.name = pkg_req.unsafe_name
93 |             req.extras = list(pkg_req.extras)
94 |             req.specs = pkg_req.specs
95 |             if req.specs:
96 |                 req.specs = sorted(req.specs)
97 | 
98 |         return req
99 | 
100 | 
101 | class Requirements:
102 | 
103 |     def __init__(
104 |             self,
105 |             requirements="requirements.txt",
106 |             tests_requirements="test_requirements.txt"):
107 |         self.requirements_path = requirements
108 |         self.tests_requirements_path = tests_requirements
109 | 
110 |     def format_specifiers(self, requirement):
111 |         return ', '.join(
112 |             ['{} {}'.format(s[0], s[1]) for s in requirement.specs])
113 | 
114 |     @property
115 |     def install_requires(self):
116 |         dependencies = []
117 |         for requirement in self.parse(self.requirements_path):
118 |             if not requirement.is_editable and not requirement.uri \
119 |                     and not requirement.vcs:
120 |                 full_name = requirement.name
121 |                 specifiers = self.format_specifiers(requirement)
122 |                 if specifiers:
123 |                     full_name = "{} {}".format(full_name, specifiers)
124 |                 dependencies.append(full_name)
125 |         for requirement in self.get_dependency_links():
126 |             print(":: (base:install_requires) {}".format(requirement.name))
127 |             dependencies.append(requirement.name)
128 |         return dependencies
129 | 
130 |     @property
131 |     def tests_require(self):
132 |         dependencies = []
133 |         for requirement in self.parse(self.tests_requirements_path):
134 |             if not requirement.is_editable and not requirement.uri \
135 |                     and not requirement.vcs:
136 |                 full_name = requirement.name
137 |                 specifiers = self.format_specifiers(requirement)
138 |                 if specifiers:
139 |                     full_name = "{} {}".format(full_name, specifiers)
140 |                 print(":: (tests:tests_require) {}".format(full_name))
141 |                 dependencies.append(full_name)
142 |         return dependencies
143 | 
144 |     @property
145 |     def dependency_links(self):
146 |         dependencies = []
147 |         for requirement in self.parse(self.requirements_path):
148 |             if requirement.uri or requirement.vcs or requirement.path:
149 |                 print(":: (base:dependency_links) {}".format(
150 |                     requirement.uri))
151 |                 dependencies.append(requirement.uri)
152 |         return dependencies
153 | 
154 |     @property
155 |     def dependencies(self):
156 |         install_requires = self.install_requires
157 |         dependency_links = self.dependency_links
158 |         tests_require = self.tests_require
159 |         if dependency_links:
160 |             print(
161 |                 "\n"
162 |                 "!! Some dependencies are linked to a repository or local path.")
163 |             print(
164 |                 "!! You'll need to run pip with the following option: "
165 |                 "`--process-dependency-links`"
166 |                 "\n")
167 |         return {
168 |             'install_requires': install_requires,
169 |             'dependency_links': dependency_links,
170 |             'tests_require': tests_require}
171 | 
172 |     def get_dependency_links(self):
173 |         dependencies = []
174 |         for requirement in self.parse(self.requirements_path):
175 |             if requirement.uri or requirement.vcs or requirement.path:
176 |                 dependencies.append(requirement)
177 |         return dependencies
178 | 
179 |     def parse(self, path=None):
180 |         path = path or self.requirements_path
181 |         path = os.path.abspath(path)
182 |         base_directory = os.path.dirname(path)
183 | 
184 |         if not os.path.exists(path):
185 |             warnings.warn(
186 |                 'Requirements file: {} does not exist.'.format(path))
187 |             return
188 | 
189 |         with open(path) as requirements:
190 |             for index, line in enumerate(requirements.readlines()):
191 |                 index += 1
192 |                 line = line.strip()
193 |                 if not line:
194 |                     logging.debug('Empty line (line {} from {})'.format(
195 |                         index, path))
196 |                     continue
197 |                 elif line.startswith('#'):
198 |                     logging.debug(
199 |                         'Comment line (line {} from {})'.format(index, path))
200 |                 elif line.startswith('-f') or \
201 |                         line.startswith('--find-links') or \
202 |                         line.startswith('-i') or \
203 |                         line.startswith('--index-url') or \
204 |                         line.startswith('--extra-index-url') or \
205 |                         line.startswith('--no-index'):
206 |                     warnings.warn('Private repos not supported. Skipping.')
207 |                     continue
208 |                 elif line.startswith('-Z') or line.startswith(
209 |                         '--always-unzip'):
210 |                     warnings.warn('Unused option --always-unzip. Skipping.')
211 |                     continue
212 |                 elif line.startswith('-r') or line.startswith('--requirement'):
213 |                     logging.debug(
214 |                         'Pointing to another requirements file '
215 |                         '(line {} from {})'.format(index, path))
216 |                     for _line in self.parse(path=os.path.join(
217 |                             base_directory, line.split()[1])):
218 |                         yield _line
219 |                 elif line.startswith('-e') or line.startswith('--editable'):
220 |                     # Editable installs are either a local project path
221 |                     # or a VCS project URI
222 |                     yield Requirement.parse(
223 |                         re.sub(r'^(-e|--editable=?)\s*', '', line),
224 |                         editable=True)
225 |                 else:
226 |                     logging.debug('Found "{}" (line {} from {})'.format(
227 |                         line, index, path))
228 |                     yield Requirement.parse(line, editable=False)
229 | 
230 | r = Requirements()
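231 | 
232 | # Example (illustrative; the repository URL is hypothetical) of what
233 | # Requirement.parse extracts from an editable VCS line:
234 | #
235 | #   >>> req = Requirement.parse("git+https://github.com/user/repo.git#egg=repo", editable=True)
236 | #   >>> (req.vcs, req.name, req.is_editable)
237 | #   ('git', 'repo', True)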
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4
2 | selenium>=2.44.0,<3.0.0
3 | requests
4 | unidecode
5 | vcrpy
6 | future
7 | fake-useragent
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [nosetests]
2 | verbosity=1
3 | detailed-errors=1
4 | with-coverage=1
5 | cover-package=googleapi
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | 
5 | try:
6 |     from setuptools import setup
7 | except ImportError:
8 |     from distutils.core import setup
9 | 
10 | # from requirements import r
11 | 
12 | with open("requirements.txt") as f:
13 |     requirements = [req.strip() for req in f.readlines()]
14 | 
15 | with open("test_requirements.txt") as f:
16 |     test_requirements = [req.strip() for req in f.readlines()]
17 | 
18 | setup(name='Google-Search-API',
19 |       version='1.1.14',
20 |       url='https://github.com/abenassi/Google-Search-API',
21 |       description='Search in Google',
22 |       author='Anthony Casagrande, Agustin Benassi',
23 |       author_email='birdapi@gmail.com, agusbenassi@gmail.com',
24 |       maintainer="Agustin Benassi",
25 |       maintainer_email='agusbenassi@gmail.com',
26 |       license='MIT',
27 |       packages=[
28 |           'googleapi',
29 |           'googleapi.modules',
30 |           'googleapi.tests'
31 |       ],
32 |       package_dir={'googleapi': 'googleapi'},
33 |       include_package_data=True,
34 |       install_requires=requirements,
35 |       keywords="google search images api",
36 |       classifiers=[
37 |           'Development Status :: 3 - Alpha',
38 |           'Intended Audience :: Developers',
39 |           'Natural Language :: English',
40 |           'Programming Language :: Python :: 3',
41 |           'Programming Language :: Python :: 3.5',
42 |           'Programming Language :: Python :: 3.6',
43 |           'Programming Language :: Python :: 3.7',
44 |       ],
45 |       setup_requires=['nose>=1.0'],
46 |       test_suite='nose.collector',
47 |       tests_require=test_requirements
48 |       # **r.requirements
49 |       )
--------------------------------------------------------------------------------
/test_requirements.txt:
--------------------------------------------------------------------------------
1 | vcrpy
2 | nose
3 | nose-cov
4 | twine
5 | mock
--------------------------------------------------------------------------------