├── publish_selenium_layer.sh
├── README.md
└── scraping.py
--------------------------------------------------------------------------------
/publish_selenium_layer.sh:
--------------------------------------------------------------------------------
mkdir -p python/bin/
curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
unzip headless-chromium.zip -d python/bin/
curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > chromedriver.zip
unzip chromedriver.zip -d python/bin/
rm -rf chromedriver.zip headless-chromium.zip
docker run --rm -v "$(pwd)":/var/task -w /var/task lambci/lambda:build-python3.7 pip install selenium -t ./python
zip -r layer.zip python
aws lambda publish-layer-version --layer-name selenium --zip-file fileb://layer.zip --compatible-runtimes python3.7
rm -rf layer.zip python
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
AWS Lambda + Selenium + Python is a powerful combination, but the dependencies are sensitive to version conflicts.
Package them once as a Lambda layer and you no longer need to worry about them or upload them with every deployment ;)

+ run [publish_selenium_layer.sh](https://github.com/umihico/selenium-lambda-layer/blob/master/publish_selenium_layer.sh)
+ open your Lambda function and use [scraping.py](https://github.com/umihico/selenium-lambda-layer/blob/master/scraping.py) as an example.
+ make sure you extend the Lambda function's timeout (the default is 3 seconds).
+ import the layer the script above created.

That's it!

These Selenium binaries work with Python 3.7,
**NOT with 3.8 :(**

If you want Python 3.8, you need to deploy a Docker container image instead of a Lambda layer; please visit [docker-selenium-lambda](https://github.com/umihico/docker-selenium-lambda)

If you don't want to create a function and import this layer for every scraping purpose, please visit my project [pythonista-chromeless](https://github.com/umihico/pythonista-chromeless/)
--------------------------------------------------------------------------------
/scraping.py:
--------------------------------------------------------------------------------
import json

from selenium import webdriver


def selenium(event, context):
    chrome = create_chrome()
    chrome.get('https://google.com')
    title = chrome.title
    chrome.quit()
    return {
        'statusCode': 200,
        'body': json.dumps(title)
    }


def create_chrome():
    # Point Selenium at the binaries the layer mounts under /opt,
    # and pass the flags headless Chromium needs inside Lambda.
    options = webdriver.ChromeOptions()
    options.binary_location = "/opt/python/bin/headless-chromium"
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1280x1696")
    options.add_argument("--disable-application-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--homedir=/tmp")
    # Selenium 3 API (the layer installs selenium via pip at build time).
    chrome = webdriver.Chrome(
        executable_path="/opt/python/bin/chromedriver", chrome_options=options)
    return chrome
--------------------------------------------------------------------------------
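For reference, a caller receiving the handler's response would decode the JSON body like this. This is a local sketch that needs no Lambda or browser; `parse_response` is a hypothetical helper for illustration, not part of the repo:

```python
import json


def parse_response(response):
    """Extract the page title from the dict returned by the selenium handler."""
    assert response["statusCode"] == 200
    # The handler JSON-encodes the title, so decode it back to a string.
    return json.loads(response["body"])


# Simulate the handler's return value without launching a browser.
sample = {"statusCode": 200, "body": json.dumps("Google")}
print(parse_response(sample))  # → Google
```

The same `json.loads` step applies when invoking the real function via `aws lambda invoke` and reading the saved payload.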
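The README's last two steps (extending the timeout and attaching the layer) can also be done from the CLI. A sketch under stated assumptions: the function name `my-scraper`, the region, and the account id are placeholders — substitute your own values and the layer ARN printed by `aws lambda publish-layer-version`:

```shell
# Hypothetical function name, region, and account id -- replace with yours.
aws lambda update-function-configuration \
  --function-name my-scraper \
  --timeout 60 \
  --layers arn:aws:lambda:us-east-1:123456789012:layer:selenium:1
```

Note that `--layers` replaces the function's whole layer list, so include any other layers the function already uses in the same call.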