├── README.md
└── extract.py

/README.md:
--------------------------------------------------------------------------------
Web Scraping in Python
======================

extract.py:

- Uses the BeautifulSoup library to extract the links from a web page.

- The user enters the website to extract links from. Enter the host name only (for example, `example.com`); the script prepends `http://`.

- Finds every "a" tag in the page's HTML and prints its `href` attribute, which lists all the links embedded in the page.

--------------------------------------------------------------------------------
/extract.py:
--------------------------------------------------------------------------------
# Taken from http://www.pythonforbeginners.com/python-on-the-web/web-scraping-with-beautifulsoup/

from bs4 import BeautifulSoup
import requests

# Ask for a bare host name such as "example.com"; the scheme is added below.
# (Python 3: input() replaces Python 2's raw_input().)
url = input("Enter a website to extract the URLs from: ")

# Fetch the page over HTTP and keep the decoded body text.
r = requests.get("http://" + url)
data = r.text

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(data, "html.parser")

# Print the href of every <a> tag (None when a tag has no href attribute).
for link in soup.find_all('a'):
    print(link.get('href'))
--------------------------------------------------------------------------------
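
extract.py prints each `href` exactly as it appears in the page, so relative links stay relative and tags without an `href` print `None`. The sketch below illustrates the same "a"-tag idea as a reusable function that resolves links against a base URL. It is not part of the repository: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without third-party packages, and the names `LinkCollector` and `extract_links` as well as the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors with a missing or empty href
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs for all <a href=...> tags found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Hypothetical sample page; no network access is needed.
    sample = (
        '<p><a href="/about">About</a> '
        '<a name="top">Top</a> '
        '<a href="http://example.org/x">X</a></p>'
    )
    for link in extract_links(sample, "http://example.com/index.html"):
        print(link)
```

Keeping the parsing in a function that takes an HTML string, rather than fetching inside it, lets the link extraction be exercised on fixed input. The text returned by `requests.get(...)` in extract.py could be passed in the same way.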