├── README.md
└── scraping.py


/README.md:
--------------------------------------------------------------------------------
[![Python 3.x](https://img.shields.io/badge/python-3.x-blue.svg)](https://www.python.org/downloads/release/python-385/)
![pypi](https://img.shields.io/pypi/v/pybadges.svg)

# 📚 Libraries
- ## Included in the Python 3 standard library
  - [datetime](https://docs.python.org/3/library/datetime.html)
  - [json](https://docs.python.org/3/library/json.html)
  - [urllib.request](https://docs.python.org/3/library/urllib.request.html)
- ## Installed from PyPI
  - [BeautifulSoup](https://pypi.org/project/beautifulsoup4/) (`pip install beautifulsoup4`)

# 💸 Investing

[Investing](https://investing.com/) is a well-known website dedicated to the financial markets. The section chosen for scraping is its economic calendar, which lists the day's most important news and events expected to affect specific currencies. The script reads the Brazilian edition of the calendar, groups the events by currency and time, and counts how many events of each intensity (1 to 3 bull icons) fall into each slot.
--------------------------------------------------------------------------------


/scraping.py:
--------------------------------------------------------------------------------
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
from datetime import datetime
import json

CALENDAR_URL = 'https://br.investing.com/economic-calendar/'


def news_verification():
    # A browser-like User-Agent is sent because the site may reject
    # requests that use urllib's default one
    request = Request(CALENDAR_URL, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(request) as response:
        html = response.read()

    soup = BeautifulSoup(html, "html.parser")
    # Each row of the calendar table carries the class "js-event-item"
    rows = soup.find_all(class_="js-event-item")

    # Events grouped by "<currency>_<time>" (for example "USD_09:30")
    base = {}

    for row in rows:
        time = row.find(class_="first left time js-time").get_text(strip=True)
        # event = row.find(class_="left event").get_text(strip=True)  # event name, not used yet
        currency = row.find(class_="left flagCur noWrap").get_text(strip=True)
        sentiment_cells = row.find_all(class_="left textNum sentiment noWrap")
        id_hour = currency + '_' + time

        if id_hour not in base:
            base[id_hour] = {
                'currency': currency,
                'time': time,
                'intensity': {'1': 0, '2': 0, '3': 0},
            }

        intensity = base[id_hour]['intensity']

        for cell in sentiment_cells:
            # The number of filled bull icons (1 to 3) is the event's expected impact
            bulls = len(cell.find_all(class_="grayFullBullishIcon"))
            if 1 <= bulls <= 3:
                intensity[str(bulls)] += 1

    return list(base.values())


if __name__ == '__main__':
    news = news_verification()
    print(json.dumps(news, indent=2))
--------------------------------------------------------------------------------
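The grouping step of `news_verification()` can be exercised without any network access. The sketch below assumes that the currency, the time and the number of filled bull icons have already been extracted from each `js-event-item` row, with one sentiment cell per row; `group_by_slot` and the sample tuples are hypothetical and not part of the repository.

```python
def group_by_slot(events):
    """Group (currency, time, bulls) tuples into the same shape news_verification() returns.

    Hypothetical helper: in scraping.py these values come from the calendar's
    js-event-item rows; here they are passed in directly.
    """
    base = {}
    for currency, time, bulls in events:
        key = currency + '_' + time
        if key not in base:
            base[key] = {
                'currency': currency,
                'time': time,
                'intensity': {'1': 0, '2': 0, '3': 0},
            }
        # Only rows with 1 to 3 filled bull icons are counted, as in the scraper
        if 1 <= bulls <= 3:
            base[key]['intensity'][str(bulls)] += 1
    # Dicts keep insertion order, so slots appear in the order they were first seen
    return list(base.values())


# Made-up sample rows, for illustration only
sample = [('USD', '09:30', 3), ('USD', '09:30', 1), ('EUR', '05:00', 2)]
print(group_by_slot(sample))
```

Two USD events at 09:30 collapse into one slot whose `intensity` records one 1-bull and one 3-bull event, which is the same aggregation the scraper performs per `<currency>_<time>` key.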
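One way to consume the list returned by `news_verification()` is to keep only the slots that contain at least one 3-bull (highest-intensity) event and sort them chronologically with `datetime`. This is a minimal sketch: `high_impact_slots` and the sample entries are hypothetical, and it assumes the `time` field is an `HH:MM` string, placing any other label at midnight.

```python
from datetime import datetime


def high_impact_slots(news):
    """Return the entries with at least one 3-bull event, sorted by time.

    Assumes `news` has the shape produced by news_verification() and that
    times are "HH:MM" strings; any other label sorts as 00:00.
    """
    def sort_key(entry):
        try:
            return datetime.strptime(entry['time'], '%H:%M').time()
        except ValueError:
            # Non-clock labels (for example an all-day marker) go first
            return datetime.min.time()

    selected = [entry for entry in news if entry['intensity']['3'] > 0]
    return sorted(selected, key=sort_key)


# Made-up sample entries, for illustration only
sample = [
    {'currency': 'USD', 'time': '14:00', 'intensity': {'1': 0, '2': 0, '3': 1}},
    {'currency': 'EUR', 'time': '05:00', 'intensity': {'1': 2, '2': 0, '3': 0}},
    {'currency': 'BRL', 'time': '09:00', 'intensity': {'1': 0, '2': 1, '3': 2}},
]
for entry in high_impact_slots(sample):
    print(entry['time'], entry['currency'])  # prints "09:00 BRL", then "14:00 USD"
```

Parsing the time instead of sorting the raw strings keeps the ordering correct even if a label is not zero-padded.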