├── publish_selenium_layer.sh
├── README.md
└── scraping.py
--------------------------------------------------------------------------------
/publish_selenium_layer.sh:
--------------------------------------------------------------------------------
mkdir -p python/bin/
curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
unzip headless-chromium.zip -d python/bin/
curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > chromedriver.zip
unzip chromedriver.zip -d python/bin/
rm -rf chromedriver.zip headless-chromium.zip
docker run --rm -v "$(pwd)":/var/task -w /var/task lambci/lambda:build-python3.7 pip install selenium -t ./python
zip -r layer.zip python
aws lambda publish-layer-version --layer-name selenium --zip-file fileb://layer.zip --compatible-runtimes python3.7
rm -rf layer.zip python
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
AWS Lambda + Selenium + Python is a powerful combination, but the dependencies are sensitive to version conflicts.
Package them once as a Lambda layer and you no longer need to worry about them or upload them with every deployment ;)

+ run [publish_selenium_layer.sh](https://github.com/umihico/selenium-lambda-layer/blob/master/publish_selenium_layer.sh)
+ open your Lambda function and use [scraping.py](https://github.com/umihico/selenium-lambda-layer/blob/master/scraping.py) as an example.
+ make sure you extend the Lambda function's timeout (the default is 3 seconds).
+ import the layer the script above created.

That's it!

These Selenium binaries work with Python 3.7,
**NOT with 3.8 :(**

If you want Python 3.8, you need to deploy a Docker container image instead of a Lambda layer; please visit [docker-selenium-lambda](https://github.com/umihico/docker-selenium-lambda)

If you don't want to create a function and import this layer for every scraping purpose, please visit my project [pythonista-chromeless](https://github.com/umihico/pythonista-chromeless/)
--------------------------------------------------------------------------------
/scraping.py:
--------------------------------------------------------------------------------
import json

from selenium import webdriver


def selenium(event, context):
    chrome = create_chrome()
    chrome.get('https://google.com')
    title = chrome.title
    chrome.quit()
    return {
        'statusCode': 200,
        'body': json.dumps(title)
    }


def create_chrome():
    # Point Selenium at the binaries the layer mounts under /opt,
    # and pass the flags headless Chromium needs inside Lambda.
    options = webdriver.ChromeOptions()
    options.binary_location = "/opt/python/bin/headless-chromium"
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1280x1696")
    options.add_argument("--disable-application-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--homedir=/tmp")
    # Selenium 3 API (the layer installs selenium via pip at build time).
    chrome = webdriver.Chrome(
        executable_path="/opt/python/bin/chromedriver", chrome_options=options)
    return chrome
--------------------------------------------------------------------------------
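For reference, a caller receiving the handler's response would decode the JSON body like this. This is a local sketch that needs no Lambda or browser; `parse_response` is a hypothetical helper for illustration, not part of the repo:

```python
import json


def parse_response(response):
    """Extract the page title from the dict returned by the selenium handler."""
    assert response["statusCode"] == 200
    # The handler JSON-encodes the title, so decode it back to a string.
    return json.loads(response["body"])


# Simulate the handler's return value without launching a browser.
sample = {"statusCode": 200, "body": json.dumps("Google")}
print(parse_response(sample))  # → Google
```

The same `json.loads` step applies when invoking the real function via `aws lambda invoke` and reading the saved payload.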
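The README's last two steps (extending the timeout and attaching the layer) can also be done from the CLI. A sketch under stated assumptions: the function name `my-scraper`, the region, and the account id are placeholders — substitute your own values and the layer ARN printed by `aws lambda publish-layer-version`:

```shell
# Hypothetical function name, region, and account id -- replace with yours.
aws lambda update-function-configuration \
  --function-name my-scraper \
  --timeout 60 \
  --layers arn:aws:lambda:us-east-1:123456789012:layer:selenium:1
```

Note that `--layers` replaces the function's whole layer list, so include any other layers the function already uses in the same call.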