├── .gitignore
├── CONTRIBUTING.md
├── Procfile
├── README.md
├── SUMMARY.md
├── app.py
├── chromedriver.exe
├── docs
│   ├── README.md
│   ├── get_search.md
│   └── get_search_platform.md
├── requirements.txt
├── spiders
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-37.pyc
│   │   ├── __init__.cpython-38.pyc
│   │   ├── coursera.cpython-37.pyc
│   │   ├── coursera.cpython-38.pyc
│   │   ├── pluralsight.cpython-37.pyc
│   │   ├── pluralsight.cpython-38.pyc
│   │   ├── udacity.cpython-37.pyc
│   │   ├── udacity.cpython-38.pyc
│   │   ├── udemy.cpython-37.pyc
│   │   └── udemy.cpython-38.pyc
│   ├── chromedriver.exe
│   ├── coursera.py
│   ├── pluralsight.py
│   ├── udacity.py
│   └── udemy.py
├── static
│   ├── css
│   │   └── styles.css
│   └── images
│       ├── 1.png
│       ├── 2.png
│       ├── 3.png
│       ├── Coursearch.png
│       ├── books.png
│       ├── circle-cropped.png
│       ├── home-img1.svg
│       └── newicon.png
└── templates
    ├── docs.html
    ├── home.html
    └── results.html
/.gitignore:
--------------------------------------------------------------------------------
1 | venv
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 | 
3 | * Log in to your GitHub account, and navigate to this repository
4 | * Click the fork button to fork this repository
5 | * Head to your forked repository on GitHub
6 | * Click the "Code" button, and under the "Clone" section, copy the url
7 | * Inside your terminal app, type `git clone <copied-url> && cd Coursearch`
8 | * Once inside the Coursearch directory, make a new branch of the project and switch to it by typing `git checkout -b <branch-name>`
9 | * Once you have made your contribution, type `git add .` to add your work
10 | * Once you've added your work, type `git commit -m "your-commit-message-here"` to commit the change
11 | * Once committed, type `git push origin <branch-name>` to push the changes to GitHub
12 | * Navigate to this repository on GitHub
13 | * Hit the "Compare & pull request" button
14 | * Submit a pull request to the
master branch
15 | 
16 | # Making a Pull Request
17 | 
18 | When making a pull request, be sure to follow [these guidelines for writing a good commit message](https://dev.to/chrissiemhrk/git-commit-message-5e21).
19 | 
20 | Also, please request a review for the pull request from the project maintainer.
21 | 
--------------------------------------------------------------------------------
/Procfile:
--------------------------------------------------------------------------------
1 | web: gunicorn app:app
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Coursearch 💻 📝 📚
2 | ============
3 | 
4 | [![](https://img.shields.io/badge/Made_with-Python3-blue?style=for-the-badge&logo=python)]()
5 | [![](https://img.shields.io/badge/Made_with-flask-blue?style=for-the-badge&logo=flask)]()
6 | [![](https://img.shields.io/badge/Made_with-pandas-blue?style=for-the-badge&logo=pandas)]()
7 | [![](https://img.shields.io/badge/Made_with-selenium-blue?style=for-the-badge&logo=selenium)]()
8 | [![](https://img.shields.io/badge/Made_with-crochet-blue?style=for-the-badge&logo=crochet)]()
9 | [![](https://img.shields.io/badge/Made_with-scrapy-blue?style=for-the-badge&logo=scrapy)]()
10 | [![](https://img.shields.io/badge/Made_with-material_design-blue?style=for-the-badge&logo=material-design)]()
11 | [![](https://img.shields.io/badge/deployed_on-heroku-blue?style=for-the-badge&logo=heroku)]()
12 | 
13 | A one-stop solution to navigate the endless sea of online courses.
14 | 
15 | This is our submission for the HackChennai Hackathon 2020. It was built in under 36 hours with no prior preparation.
16 | 
17 | ---
18 | 
19 | ## The problem:
20 | - In the past few years, the popularity of MOOC platforms has skyrocketed, prompting many other such platforms to crop up.
21 | - However, users are often left unsatisfied with the courses on offer.
22 | - Sometimes the level of difficulty is too high, or the hands-on labs are outdated, or the instructor hides behind technical jargon, or the videos lack contextual clarity, and so on.
23 | - With so many courses coming up, it is essential for the learner to save time and money by choosing only the best course available.
24 | 
25 | ---
26 | 
27 | ## The solution:
28 | - Why not let the user select something they want to learn and leave choosing the course to us?
29 | - Why not gather data from multiple MOOC platforms?
30 | - Why not use this ratings-and-reviews data to rank courses with a unique method?
31 | - Why not let the user view this combined information in one place where they can easily search, sort and visit these courses?
32 | 
33 | ---
34 | 
35 | ## Features:
36 | 
37 | - Crawls information from Coursera, Udemy, Pluralsight and Udacity (in progress) to gather MOOC data for any search term.
38 | - Ranks the courses using a weighted average of their number of reviews and the average rating (out of five).
39 | - Allows the user to further filter the results on any basis they want: difficulty, ratings, instructor, etc.
40 | - An additional search option is provided within the results table too.
41 | - Users can find direct links to any MOOC.
42 | - The `/api` endpoint can be used by anyone to query data using parameters defined in the documentation.
43 | 
44 | ---
45 | 
46 | ## Tech stack:
47 | 
48 | - `flask:` web framework.
49 | - `python:` backend routing and web crawlers.
50 | - `jinja2:` templating and rendering frontend.
51 | - `scrapy`, `selenium` and `beautifulsoup:` data mining.
52 | - `crochet:` running the reactor for consecutive searches.
53 | - `pandas` and `json:` formatting and cleaning the dataframe.
54 | - `Material Design` and `Bootstrap:` styling the frontend.
55 | 
56 | ---
57 | 
58 | ## Deployment:
59 | 
60 | The live project is deployed on https://coursearch.herokuapp.com/.
You may use it to search MOOCs, use the API for your projects, or run it locally using the instructions below.
61 | 
62 | ---
63 | 
64 | ## Local installation:
65 | 
66 | **You must have Python 3.6 or higher to run the application.**
67 | 
68 | - Create a new virtual environment for running the application. You can follow the instructions [here](https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/26/python-virtual-env/).
69 | - Navigate to the virtual environment and activate it.
70 | - Install the dependencies using `pip install -r requirements.txt`
71 | - Run the `app.py` file with `python app.py`
72 | 
73 | **Note:** to run it locally, you must have a version of `chromedriver.exe` that matches the version of Google Chrome installed on your device.
74 | 
75 | ---
76 | 
77 | ## Future scope:
78 | - Fix the Udacity web crawler so that it works.
79 | - Convert the website into a PWA.
80 | - Explore and mine data from other platforms too.
81 | - Add a variety of parameters to the `/api` endpoint.
82 | We are open to enhancements and bug fixes: fork and clone this repository, submit a pull request and we'll test your changes before merging them!
83 | ---
84 | 
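The ranking described under Features (a weighted combination, 0.7 × review count + 0.3 × average rating, as computed by `sort_df()` in `app.py`) can be sketched as follows. The rows are made-up sample data:

```python
import pandas as pd

# Made-up sample rows, shaped like the records the spiders yield.
courses = pd.DataFrame([
    {"course_name": "A", "rating_count": 100,  "rating_out_of_five": 4.2},
    {"course_name": "B", "rating_count": 2500, "rating_out_of_five": 4.8},
    {"course_name": "C", "rating_count": 40,   "rating_out_of_five": 5.0},
])

# Weighted rank, as in sort_df() in app.py, highest first.
courses["rank"] = 0.7 * courses["rating_count"] + 0.3 * courses["rating_out_of_five"]
ranked = courses.sort_values(by=["rank"], ascending=False)
print(ranked["course_name"].tolist())  # → ['B', 'A', 'C']
```

Note that the two terms sit on different scales, so in practice the raw review count dominates this score.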
Made with ❤️ by Team Infinity.
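For anyone consuming the `/api` endpoint described above, here is a hypothetical client-side sketch. The parameter names (`searchTerm`, `site`) follow what `app.py` reads from the query string, and the record keys match the fields the spiders yield:

```python
from urllib.parse import urlencode

BASE_URL = "https://coursearch.herokuapp.com/api"

def build_query(search_term, site=None):
    """Build a Coursearch API URL; `site` optionally restricts the platform."""
    params = {"searchTerm": search_term}
    if site is not None:
        params["site"] = site  # one of: coursera, udemy, pluralsight, udacity
    return f"{BASE_URL}?{urlencode(params)}"

print(build_query("deep learning", site="udemy"))
# → https://coursearch.herokuapp.com/api?searchTerm=deep+learning&site=udemy

# Keys present in each JSON record of the response:
RECORD_KEYS = {"course_name", "partner_name", "image_link", "rating_out_of_five",
               "rating_count", "difficulty_level", "link_to_course", "offered_by"}
```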
85 | 
--------------------------------------------------------------------------------
/SUMMARY.md:
--------------------------------------------------------------------------------
1 | # Table of contents
2 | 
3 | * [🏁 Coursearch](README.md)
4 | * [API Docs](docs/README.md)
5 | * [Retrieve by search term](docs/get_search.md)
6 | * [Retrieve by search term and platform](docs/get_search_platform.md)
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | import crochet
2 | crochet.setup()
3 | 
4 | from flask import Flask, render_template, jsonify, request, redirect, url_for, make_response, send_from_directory
5 | from scrapy import signals
6 | from scrapy.crawler import CrawlerRunner
7 | from scrapy.signalmanager import dispatcher
8 | import time
9 | from spiders import *
10 | import pandas as pd
11 | 
12 | app = Flask(__name__)
13 | 
14 | output_data = []
15 | crawl_runner = CrawlerRunner()
16 | 
17 | def scrape(searchTerm):
18 |     scrape_with_crochet(searchTerm=searchTerm, spider=udemy.UdemySpider)
19 |     scrape_with_crochet(searchTerm=searchTerm, spider=udacity.UdacitySpider)
20 |     output_data.extend(coursera.func(searchTerm))
21 |     time.sleep(5)
22 |     output_data.extend(pluralsight.func(searchTerm))
23 |     df = pd.DataFrame()
24 |     for i in output_data:
25 |         df = df.append(i, ignore_index=True)  # DataFrame.append exists on the pinned pandas 1.1.1
26 |     output_data.clear()
27 |     return df
28 | 
29 | def sort_df(df):
30 |     df = df.dropna()
31 |     df['rank'] = 0.7*df["rating_count"] + 0.3*df["rating_out_of_five"]
32 |     df = df.sort_values(by=['rank'], ascending=False)  # sort by the weighted rank computed above
33 |     print(df)
34 |     return df
35 | 
36 | @app.route('/')
37 | def home():
38 |     return render_template('home.html')
39 | 
40 | @app.route('/docs')
41 | def docs():
42 |     return render_template('docs.html')
43 | 
44 | @app.route('/results', methods=['POST','GET'])
45 | def get_query():
46 |     if request.method == "POST":
47 |         query =
str(request.form['query']).title()
48 |     else:
49 |         args = request.args
50 |         query = str(args['query']).title()
51 |     df = scrape(query)
52 |     df = sort_df(df)
53 |     return render_template('results.html', query=query, df=df, l=df.shape[0])
54 | 
55 | @app.route('/api', methods=['GET'])
56 | def api():
57 |     args = request.args
58 |     if(len(args)==2):
59 |         if(args["site"]=="udemy"):
60 |             scrape_with_crochet(searchTerm=args["searchTerm"], spider=udemy.UdemySpider)
61 |             time.sleep(2)
62 |         elif(args["site"]=="coursera"):
63 |             output_data.extend(coursera.func(args["searchTerm"]))  # func() creates and tears down its own driver
64 |             time.sleep(2)
65 |         elif(args["site"]=="pluralsight"):
66 |             output_data.extend(pluralsight.func(args["searchTerm"]))
67 |         elif(args["site"]=="udacity"):
68 |             scrape_with_crochet(searchTerm=args["searchTerm"], spider=udacity.UdacitySpider)  # udacity is a Scrapy spider, not a driver-based scraper
69 |             time.sleep(2)
70 |     else:
71 |         scrape_with_crochet(searchTerm=args["searchTerm"], spider=udemy.UdemySpider)
72 |         scrape_with_crochet(searchTerm=args["searchTerm"], spider=udacity.UdacitySpider)
73 |         output_data.extend(coursera.func(args["searchTerm"]))
74 |         time.sleep(5)
75 |         output_data.extend(pluralsight.func(args["searchTerm"]))
76 | 
77 |     res = jsonify(output_data)
78 |     output_data.clear()
79 |     return res
80 | 
81 | 
82 | 
83 | @crochet.run_in_reactor
84 | def scrape_with_crochet(searchTerm, spider):
85 |     dispatcher.connect(_crawler_result, signal=signals.item_scraped)
86 | 
87 |     eventual = crawl_runner.crawl(spider, category=searchTerm)
88 |     return eventual
89 | 
90 | def _crawler_result(item, response, spider):
91 |     output_data.append(dict(item))
92 | 
93 | if __name__ == '__main__':
94 |     app.run(debug=True)
--------------------------------------------------------------------------------
/chromedriver.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/chromedriver.exe
--------------------------------------------------------------------------------
/docs/README.md:
--------------------------------------------------------------------------------
1 | ## Welcome to the Coursearch API reference!
2 | 
3 | This is the place to find official information on how to use the Coursearch API in your projects.
4 | 
5 | The Coursearch API serves well-formatted data about MOOCs across a few well-known platforms. It can be used without any API token, membership, registration or payment. It supports a few parameters that can be applied to get just the data you need. Usage is simple and requires only basic knowledge of HTTP requests and JSON.
--------------------------------------------------------------------------------
/docs/get_search.md:
--------------------------------------------------------------------------------
1 | ## Retrieve MOOCs by search term
2 | 
3 | `GET` /api?searchTerm={searchTerm}
4 | 
5 | ### Parameters:
6 | 
7 | | Name | Type | In | Description |
8 | | ------------- | ------------- | ------------- | ------------- |
9 | | searchTerm* | string | query | Indicates the topic / MOOC name to search the API for across all platforms. Good examples for this parameter would be deep learning, graphic design, writing. |
--------------------------------------------------------------------------------
/docs/get_search_platform.md:
--------------------------------------------------------------------------------
1 | ## Retrieve MOOCs by search term and platform
2 | 
3 | `GET` /api?searchTerm={searchTerm}&site={site}
4 | 
5 | ### Parameters:
6 | 
7 | | Name | Type | In | Description |
8 | | ------------- | ------------- | ------------- | ------------- |
9 | | searchTerm* | string | query | Indicates the topic / MOOC name to search the API for. Good examples for this parameter would be deep learning, graphic design, writing. |
10 | | site* | string | query | Indicates the platform to mine MOOC data from.
Can take any of the following values: `coursera`, `udemy `, `pluralsight` or `udacity` | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | appdirs==1.4.4 2 | attrs==19.3.0 3 | Automat==20.2.0 4 | beautifulsoup4==4.9.1 5 | bs4==0.0.1 6 | certifi==2020.6.20 7 | cffi==1.14.2 8 | chardet==3.0.4 9 | click==7.1.2 10 | constantly==15.1.0 11 | crochet==1.12.0 12 | cryptography==3.0 13 | cssselect==1.1.0 14 | fake-useragent==0.1.11 15 | Flask==1.1.2 16 | gunicorn==20.0.4 17 | hyperlink==20.0.1 18 | idna==2.10 19 | incremental==17.5.0 20 | itemadapter==0.1.0 21 | itemloaders==1.0.2 22 | itsdangerous==1.1.0 23 | Jinja2==2.11.2 24 | jmespath==0.10.0 25 | lxml==4.5.2 26 | MarkupSafe==1.1.1 27 | numpy==1.19.1 28 | pandas==1.1.1 29 | parse==1.16.0 30 | parsel==1.6.0 31 | Protego==0.1.16 32 | pyasn1==0.4.8 33 | pyasn1-modules==0.2.8 34 | pycparser==2.20 35 | PyDispatcher==2.0.5 36 | pyee==7.0.2 37 | PyHamcrest==2.0.2 38 | pyOpenSSL==19.1.0 39 | pyppeteer==0.2.2 40 | pyquery==1.4.1 41 | python-dateutil==2.8.1 42 | pytz==2020.1 43 | queuelib==1.5.0 44 | requests==2.24.0 45 | requests-html==0.10.0 46 | Scrapy==2.3.0 47 | selenium==3.141.0 48 | service-identity==18.1.0 49 | six==1.15.0 50 | soupsieve==2.0.1 51 | tqdm==4.48.2 52 | Twisted==20.3.0 53 | urllib3==1.25.10 54 | w3lib==1.22.0 55 | websockets==8.1 56 | Werkzeug==1.0.1 57 | wrapt==1.12.1 58 | zope.interface==5.1.0 -------------------------------------------------------------------------------- /spiders/__init__.py: -------------------------------------------------------------------------------- 1 | # This package contains the spiders of your Scrapy project 2 | __all__=['coursera','udemy','udacity','pluralsight'] -------------------------------------------------------------------------------- /spiders/__pycache__/__init__.cpython-37.pyc: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/__init__.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/__init__.cpython-38.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/coursera.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/coursera.cpython-37.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/coursera.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/coursera.cpython-38.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/pluralsight.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/pluralsight.cpython-37.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/pluralsight.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/pluralsight.cpython-38.pyc 
-------------------------------------------------------------------------------- /spiders/__pycache__/udacity.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/udacity.cpython-37.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/udacity.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/udacity.cpython-38.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/udemy.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/udemy.cpython-37.pyc -------------------------------------------------------------------------------- /spiders/__pycache__/udemy.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/__pycache__/udemy.cpython-38.pyc -------------------------------------------------------------------------------- /spiders/chromedriver.exe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/spiders/chromedriver.exe -------------------------------------------------------------------------------- /spiders/coursera.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | from bs4 import BeautifulSoup 4 | from selenium import webdriver 5 | from 
selenium.webdriver.chrome.options import Options
6 | from selenium.webdriver.support.ui import WebDriverWait
7 | from selenium.common.exceptions import TimeoutException
8 | 
9 | def configure_driver():
10 |     chrome_options = Options()
11 |     chrome_options.add_argument("--headless")
12 |     # the flags below are needed when running inside a container (e.g. on Heroku)
13 |     chrome_options.add_argument("--disable-dev-shm-usage")
14 |     chrome_options.add_argument("--no-sandbox")
15 |     # driver = webdriver.Chrome(executable_path="chromedriver.exe", options=chrome_options)  # local runs
16 |     driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=chrome_options)
17 |     return driver
18 | 
19 | def scan(s):
20 |     if s:
21 |         return int(s.replace(',', '').replace('(', '').replace(')', ''))  # e.g. "(1,234)" -> 1234
22 | 
23 | def decimals(n):
24 |     if n:
25 |         return float(n)
26 | 
27 | def getCourses(driver, search_keyword):
28 | 
29 |     mylist = []
30 | 
31 |     for page in range(1, 4):
32 |         driver.get(f"https://www.coursera.org/search?query={search_keyword}&page={page}")  # '&' joins query parameters
33 |         soup = BeautifulSoup(driver.page_source, "lxml")
34 |         for course in soup.select("li.ais-InfiniteHits-item"):
35 |             mydict = {}
36 |             NAME_SELECTOR = ".card-title"
37 |             PARTNER_SELECTOR = 'span.partner-name'
38 |             IMAGE_SELECTOR = 'div.card-content div.cds-grid-item img'
39 |             RATING_SELECTOR = ".ratings-text"
40 |             NUM_RATINGS_SELECTOR = "span.ratings-count span"
41 |             DIFFICULTY_SELECTOR = ".difficulty"
42 |             LINK_SELECTOR = "div a"
43 |             mydict["course_name"] = course.select_one(NAME_SELECTOR).text
44 |             mydict["partner_name"] = course.select_one(PARTNER_SELECTOR).text
45 |             mydict["image_link"] = course.select_one(IMAGE_SELECTOR)['src']
46 |             mydict["rating_out_of_five"] = decimals(course.select_one(RATING_SELECTOR).text)
47 |             mydict["rating_count"] = scan(course.select_one(NUM_RATINGS_SELECTOR).text)
48 |             mydict["difficulty_level"] = course.select_one(DIFFICULTY_SELECTOR).text
49 |             mydict["link_to_course"] = 'https://www.coursera.org' + course.select_one(LINK_SELECTOR)['href']
50 | 
mydict["offered_by"] = "Coursera"
51 |             mylist.append(mydict)
52 |     return mylist
53 | 
54 | def func(search_keyword):
55 |     driver = configure_driver()
56 |     courses = getCourses(driver, search_keyword)  # avoid shadowing the built-in `list`
57 |     driver.quit()  # quit() also stops the chromedriver process; close() only closes the window
58 |     return courses
59 | 
--------------------------------------------------------------------------------
/spiders/pluralsight.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | from bs4 import BeautifulSoup
4 | from selenium import webdriver
5 | from selenium.webdriver.chrome.options import Options
6 | from selenium.webdriver.support.ui import WebDriverWait
7 | from selenium.common.exceptions import TimeoutException
8 | 
9 | def configure_driver():
10 |     chrome_options = Options()
11 |     chrome_options.add_argument("--headless")
12 |     # the flags below are needed when running inside a container (e.g. on Heroku)
13 |     chrome_options.add_argument("--disable-dev-shm-usage")
14 |     chrome_options.add_argument("--no-sandbox")
15 |     # driver = webdriver.Chrome(executable_path="chromedriver.exe", options=chrome_options)  # local runs
16 |     driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=chrome_options)
17 |     return driver
18 | 
19 | def formatreviews(x):
20 |     if x:
21 |         n = ''.join([i for i in str(x) if i.isdigit()])  # keep only the digits of e.g. "(1,234)"
22 |     else:
23 |         n = 0
24 |     return n
25 | 
26 | def count_stars(s):
27 |     s = str(s)
28 |     ones = s.count('')    # full-star icon markup (the HTML literal was lost in this extract)
29 |     halves = s.count('')  # half-star icon markup (the HTML literal was lost in this extract)
30 |     total = ones + halves * 0.5
31 |     return total
32 | 
33 | 
34 | def getCourses(driver, search_keyword):
35 | 
36 |     driver.get(f"https://www.pluralsight.com/search?q={search_keyword}&categories=course")
37 |     try:
38 |         WebDriverWait(driver, 5).until(lambda s: s.find_element_by_id("search-results-category-target").is_displayed())
39 |     except TimeoutException:
40 |         print("TimeoutException: Element not found")
41 |         return []  # empty list, so callers can extend() safely
42 | 
43 |     soup = BeautifulSoup(driver.page_source, "lxml")
44 |     mylist = []
45 | 
46 |     for course_page in soup.select("div.search-results-page"):
47 |         for course in
course_page.select("div.search-result"):
48 |             mydict = {}
49 |             TITLE_SELECTOR = "div.search-result__info div.search-result__title a"
50 |             AUTHOR_SELECTOR = "div.search-result__details div.search-result__author"
51 |             LEVEL_SELECTOR = "div.search-result__details div.search-result__level"
52 |             RATING_SELECTOR = "div.search-result__details div.search-result__rating"
53 |             REVIEWS_SELECTOR = "div.search-result__details div.search-result__rating"
54 |             IMAGE_SELECTOR = "div.search-result__icon img"
55 |             LINK_SELECTOR = "div.search-result__info div.search-result__title a"
56 |             if course.select_one(RATING_SELECTOR) is not None:  # skip cards without a rating block
57 |                 try:
58 |                     mydict["course_name"] = course.select_one(TITLE_SELECTOR).text
59 |                     mydict["partner_name"] = (course.select_one(AUTHOR_SELECTOR).text).replace('by ', '')
60 |                     mydict["difficulty_level"] = course.select_one(LEVEL_SELECTOR).text
61 |                     mydict["rating_out_of_five"] = count_stars(course.select_one(RATING_SELECTOR))
62 |                     mydict["rating_count"] = int(formatreviews(course.select_one(REVIEWS_SELECTOR)))
63 |                     mydict["image_link"] = course.select_one(IMAGE_SELECTOR)['src']
64 |                     mydict["offered_by"] = "Pluralsight"
65 |                     mydict["link_to_course"] = course.select_one(LINK_SELECTOR)['href']
66 |                     mylist.append(mydict)
67 |                 except (AttributeError, TypeError, KeyError):  # skip result cards missing an expected field
68 |                     pass
69 |     return mylist
70 | 
71 | def func(search_keyword):
72 |     driver = configure_driver()
73 |     courses = getCourses(driver, search_keyword)  # avoid shadowing the built-in `list`
74 |     driver.quit()
75 |     return courses or []
76 | 
--------------------------------------------------------------------------------
/spiders/udacity.py:
--------------------------------------------------------------------------------
1 | import json
2 | import requests
3 | import scrapy
4 | from bs4 import BeautifulSoup
5 | 
6 | BASE_URL = "https://www.udacity.com/courses/all"
7 | 
8 | class UdacitySpider(scrapy.Spider):
9 |     name = "udacity_spider"
10 | 
11 |     def __init__(self, *args, **kwargs):
12 |         super(UdacitySpider, self).__init__(*args, **kwargs)
13 | 
self.search_term = kwargs.get('category') 14 | self.start_urls = [BASE_URL] 15 | 16 | def parse(self, response): 17 | page = BeautifulSoup(response.body, 'lxml') 18 | courses = page.findAll("div", {"class": "catalog-component__card"}) 19 | for course in courses: 20 | link_to_course = "https://www.udacity.com" + course.find('a', {'class': 'card__top'})["href"] 21 | 22 | skills = course.find("p", {"class": "text-content__text"}) 23 | course_name = course.find("h2", {"class": "card__title__nd-name"}).text 24 | if self.search_term.lower() in course_name.lower() or skills and self.search_term.lower() in skills.text.lower(): 25 | course_id = link_to_course.split("--")[-1] 26 | 27 | sections = list(course.find('div' , {'class': 'card__text-content'}).children) 28 | if len(sections) == 2: 29 | partner_name = sections[1].find('p').text 30 | else: 31 | partner_name = None 32 | 33 | image_link = "https://d20vrrgs8k4bvw.cloudfront.net/images/open-graph/udacity.png" 34 | 35 | difficulty_level = course.find('div', {'class': 'difficulty'}).find('small').text 36 | 37 | ratings_page = requests.get(f"https://ratings-api.udacity.com/api/v1/reviews?node={course_id}") 38 | rating_count = 0 39 | stats = json.loads(ratings_page.content)["stats"] 40 | avg_ratings = 0 41 | for stat in stats: 42 | rating_count += stat["count"] 43 | avg_ratings += stat["count"] * stat["rating"] 44 | if rating_count > 0: 45 | avg_ratings = avg_ratings / rating_count 46 | rating_out_of_five = round(avg_ratings, 1) 47 | yield { 48 | 'course_name': course_name, 49 | 'partner_name': partner_name, 50 | 'image_link': image_link, 51 | 'rating_out_of_five': rating_out_of_five, 52 | 'rating_count': rating_count, 53 | 'difficulty_level': difficulty_level, 54 | 'link_to_course': link_to_course, 55 | 'offered_by': 'Udacity' 56 | } 57 | -------------------------------------------------------------------------------- /spiders/udemy.py: -------------------------------------------------------------------------------- 1 | 
import scrapy 2 | from scrapy.crawler import CrawlerProcess 3 | import json 4 | import pandas as pd 5 | 6 | # df1=pd.DataFrame() 7 | 8 | class UdemySpider(scrapy.Spider): 9 | 10 | name = 'udemy_spider' 11 | def __init__(self, *args, **kwargs): 12 | super(UdemySpider, self).__init__(*args, **kwargs) 13 | searchterm = kwargs.get('category') 14 | searchterm = ('%20').join(searchterm.lower().split()) 15 | self.start_urls = [f'https://www.udemy.com/api-2.0/search-courses/?p='+str(i)+'&q='+searchterm+'&skip_price=true' for i in (1, 2, 3)] 16 | self.headers={"accept": "application/json, text/plain, */*", 17 | "accept-encoding": "gzip, deflate, br", 18 | "accept-language": "en-US,en;q=0.9", 19 | "cache-control": "no-cache", 20 | "pragma": "no-cache", 21 | "referer": "https://www.udemy.com/courses/search/?q="+searchterm, 22 | "sec-fetch-mode": "cors", 23 | "sec-fetch-site": "same-origin", 24 | "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36", 25 | "x-requested-with": "XMLHttpRequest", 26 | "x-udemy-cache-brand": "UAen_US", 27 | "x-udemy-cache-campaign-code": "UDEMYBASICS0720", 28 | "x-udemy-cache-device": "desktop", 29 | "x-udemy-cache-language": "en", 30 | "x-udemy-cache-logged-in": "0", 31 | "x-udemy-cache-marketplace-country": "UA", 32 | "x-udemy-cache-modern-browser": "1", 33 | "x-udemy-cache-price-country": "UA", 34 | "x-udemy-cache-release": "a50355e6f369173f712c", 35 | "x-udemy-cache-user": "", 36 | "x-udemy-cache-version": "1" 37 | } 38 | 39 | # params = {'course_name':[], 40 | # 'link_to_course':[], 41 | # 'partner_name':[], 42 | # 'rating_out_of_five':[], 43 | # 'rating_count':[], 44 | # 'image_link':[], 45 | # 'difficulty_level':[], 46 | # 'offered_by':[]} 47 | 48 | # df = pd.DataFrame(params) 49 | 50 | def start_requests(self): 51 | for url in self.start_urls: 52 | yield scrapy.Request(url=url, callback=self.parse, headers=self.headers ) 53 | 54 | def parse(self, response): 55 | data = 
json.loads(response.text) 56 | for i in data['courses']: 57 | # self.df = self.df.append({'course_name':i['title'], 58 | # 'link_to_course':"https://www.udemy.com"+i['url'], 59 | # 'partner_name':i['visible_instructors'][0]['title'], 60 | # 'rating_out_of_five':i['rating'], 61 | # 'rating_count':i['num_reviews'], 62 | # 'image_link':i['image_240x135'], 63 | # 'difficulty_level':i['instructional_level'], 64 | # 'offered_by':'Udemy'}, 65 | # ignore_index = True 66 | # ) 67 | 68 | yield{ 69 | 'course_name':i['title'], 70 | 'link_to_course':"https://www.udemy.com"+i['url'], 71 | 'partner_name':i['visible_instructors'][0]['title'], 72 | 'rating_out_of_five':i['rating'], 73 | 'rating_count':i['num_reviews'], 74 | 'image_link':i['image_240x135'], 75 | 'difficulty_level':i['instructional_level'], 76 | 'offered_by':'Udemy' 77 | } 78 | # # global df1 79 | # # df1 = self.df 80 | 81 | # if __name__=="__main__": 82 | # process =CrawlerProcess() 83 | # process.crawl(UdemySpider) 84 | # process.start() 85 | 86 | # def run(searchTerm): 87 | # process =CrawlerProcess() 88 | # process.crawl(UdemySpider, category = searchTerm) 89 | # process.start() 90 | # return df1 -------------------------------------------------------------------------------- /static/css/styles.css: -------------------------------------------------------------------------------- 1 | :root { 2 | --sky: #BDD6DB; 3 | --pale-white: #f8f8f8; 4 | --darkblue: #1d1b40; 5 | --coral: #ff6464; 6 | } 7 | 8 | .main { 9 | background-color: var(--sky); 10 | width: 80%; 11 | margin: auto; 12 | } 13 | 14 | /* nav bar */ 15 | 16 | .navbar-brand { 17 | font-size: 2rem; 18 | font-family: 'Roboto', sans-serif; 19 | font-weight: 500; 20 | color: var(--darkblue); 21 | } 22 | 23 | .fa { 24 | color: var(--darkblue) !important; 25 | font-size: 28px; 26 | } 27 | 28 | .nav-item { 29 | padding: 0 5px; 30 | } 31 | 32 | .nav-link { 33 | font-size: 0.8rem; 34 | font-family: 'Roboto', sans-serif; 35 | font-weight: 300; 36 | color: 
var(--darkblue);
37 | }
38 | 
39 | .nav-link:hover {
40 |   color: var(--coral);
41 | }
42 | 
43 | .nav-icon{
44 |   height: 2.5rem;
45 |   border-radius: 50%;
46 |   margin-right: 1.2rem;
47 | }
48 | 
49 | /* home */
50 | 
51 | .home {
52 |   padding: 2rem;
53 | }
54 | 
55 | .header {
56 |   font-family: 'Roboto', sans-serif;
57 |   font-weight: 700;
58 | }
59 | 
60 | .title {
61 |   color: var(--coral);
62 |   font-size: 3.2rem;
63 | }
64 | 
65 | .row-manip {
66 |   padding: 3.5rem 0 0 0;
67 |   margin: auto;
68 | }
69 | 
70 | .col-manip {
71 |   padding: 2rem 1.5rem;
72 |   margin: auto;
73 |   text-align: left;
74 | }
75 | 
76 | .desc {
77 |   color: var(--darkblue);
78 |   font-family: 'Roboto', sans-serif;
79 |   font-weight: 400;
80 |   font-size: 0.9rem;
81 |   line-height: 1.5;
82 | }
83 | 
84 | .searchbar {
85 |   margin-top: 2em;
86 |   border: none;
87 |   width: 80%;
88 |   color: var(--coral);
89 |   border-bottom: 1px var(--coral) solid;
90 |   background: transparent;
91 |   text-transform: uppercase;
92 |   padding-top: 1.5rem;
93 | }
94 | 
95 | ::placeholder {
96 |   color: var(--coral);
97 |   opacity: .7;
98 |   padding-left: 1%;
99 | }
100 | 
101 | *:focus {
102 |   outline: none;
103 | }
104 | 
105 | .go {
106 |   margin-top: 2em;
107 |   color: var(--coral);
108 |   border: none;
109 |   cursor: pointer;
110 |   background: transparent;
111 | }
112 | 
113 | .go:hover {
114 |   color: var(--coral);
115 | }
116 | 
117 | .img-home {
118 |   height: 15rem;
119 |   padding: 0;
120 | }
121 | 
122 | 
123 | /* Results page */
124 | 
125 | .manip-search{
126 |   width:50%;
127 |   margin: auto;
128 | }
129 | 
130 | .nav-adjust {
131 |   width: 80%;
132 |   margin: auto;
133 | }
134 | 
135 | .results {
136 |   text-align: center;
137 |   margin: auto;
138 |   background-color: var(--pale-white);
139 | }
140 | 
141 | .section-head {
142 |   background-color: var(--sky);
143 | }
144 | 
145 | .result-header {
146 |   padding: 2rem 0 4rem 0;
147 | }
148 | 
149 | .head {
150 |   font-family: 'Roboto', sans-serif; /* the 'font' shorthand is invalid without a size */
151 |   font-weight: 700;
152 |   color: var(--coral);
153 |   font-size:
4rem; 154 | } 155 | 156 | .contents { 157 | background: var(--pale-white); 158 | margin: auto; 159 | padding: 2rem 0; 160 | /* font-family: 'Roboto', sans-serif; */ 161 | font-weight: 500; 162 | color: var(--sky); 163 | font-size: 1.2rem; 164 | } 165 | 166 | .img-course { 167 | height: 8rem; 168 | width: 8rem; 169 | margin: 1rem 0; 170 | border-radius: 50%; 171 | } 172 | 173 | .table { 174 | text-align: left; 175 | width: 90%; 176 | margin: auto; 177 | } 178 | 179 | .table-header { 180 | color: var(--darkblue) !important; 181 | } 182 | 183 | .visit-link { 184 | background-color: var(--darkblue) !important; 185 | border-width: 0 !important; 186 | color: var(--pale-white) !important; 187 | width: 2rem; 188 | height: 2rem; 189 | border-radius: 50%; 190 | } 191 | 192 | .icon { 193 | color: var(--pale-white) !important; 194 | } 195 | 196 | a.page-link { 197 | background-color: var(--pale-white); 198 | color: var(--darkblue); 199 | } 200 | 201 | .page-link.active { 202 | background-color: var(--pale-white); 203 | color: var(--coral) !important; 204 | } 205 | 206 | 207 | /* footer */ 208 | 209 | .footer { 210 | font-family: 'Roboto', sans-serif; 211 | font-weight: 300; 212 | color: transparent; 213 | font-size: 0.8rem; 214 | text-align: center; 215 | padding-top: 2rem; 216 | } 217 | 218 | 219 | /* footer */ 220 | 221 | .res-footer{ 222 | background-color: transparent; 223 | padding-bottom: 0.3rem; 224 | } 225 | 226 | .footer { 227 | font-family: 'Roboto', sans-serif; 228 | font-weight: 300; 229 | color: var(--darkblue); 230 | font-size: 0.8rem; 231 | text-align: center; 232 | padding-top: 2rem; 233 | } 234 | 235 | .footer .fa-heart, .res-footer .fa-heart { 236 | color: var(--coral); 237 | } 238 | 239 | .footer-new{ 240 | font-family: 'Roboto', sans-serif; 241 | font-weight: 300; 242 | color: var(--darkblue); 243 | font-size: 0.8rem; 244 | text-align: center; 245 | padding: 1rem 0 0 0; 246 | } 247 | 248 | label{ 249 | margin-right: 200px; 250 | } 251 | 252 | 
.dataTables_wrapper 253 | { 254 | padding: 4rem; 255 | } 256 | a.page-link, .previous 257 | { 258 | background-color: var(--pale-white) !important; 259 | color: var(--darkblue) !important; 260 | } 261 | .active>a 262 | { 263 | background-color: var(--sky) !important; 264 | text-decoration: none; 265 | } 266 | 267 | 268 | /* Card section */ 269 | 270 | .mobile{ 271 | background: var(--pale-white); 272 | margin: auto; 273 | padding: 2rem 0; 274 | font-family: 'Roboto', sans-serif; 275 | font-weight: 300; 276 | color: var(--darkblue); 277 | font-size: 1rem; 278 | } 279 | .card{ 280 | margin: 2rem 0; 281 | } 282 | .card-image{ 283 | margin: 3rem 0; 284 | } 285 | 286 | .card-body{ 287 | font-family: 'Roboto'; 288 | } 289 | 290 | .course-desc{ 291 | text-align: left; 292 | padding-left: 2rem; 293 | } 294 | 295 | .visit-link-mobile{ 296 | background-color: var(--darkblue) !important; 297 | border-width: 0 !important; 298 | color: var(--pale-white) !important; 299 | width: 2rem; 300 | height: 2rem; 301 | border-radius: 50%; 302 | } 303 | 304 | .course-name{ 305 | color: var(--coral); 306 | } 307 | 308 | 309 | @media (max-width: 992px) { 310 | .contents { 311 | display: none; 312 | } 313 | } 314 | 315 | @media (min-width: 992px) { 316 | .mobile { 317 | display: none; 318 | } 319 | } 320 | /*2c1654 ff6464 00be65 fdf550 */ 321 | 322 | 323 | 324 | /* docs css */ 325 | 326 | 327 | 328 | /** 329 | * Copyright 2015 Google Inc. All Rights Reserved. 330 | * 331 | * Licensed under the Apache License, Version 2.0 (the "License"); 332 | * you may not use this file except in compliance with the License. 333 | * You may obtain a copy of the License at 334 | * 335 | * http://www.apache.org/licenses/LICENSE-2.0 336 | * 337 | * Unless required by applicable law or agreed to in writing, software 338 | * distributed under the License is distributed on an "AS IS" BASIS, 339 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
340 | * See the License for the specific language governing permissions and 341 | * limitations under the License. 342 | */ 343 | 344 | .nav-docs{ 345 | margin: auto; 346 | } 347 | .demo-ribbon { 348 | width: 100%; 349 | height: 50vh; 350 | background-color:var(--sky) !important; 351 | -webkit-flex-shrink: 0; 352 | -ms-flex-negative: 0; 353 | flex-shrink: 0; 354 | } 355 | 356 | .demo-main { 357 | margin-top: -35vh; 358 | -webkit-flex-shrink: 0; 359 | -ms-flex-negative: 0; 360 | flex-shrink: 0; 361 | padding-top:3rem; 362 | } 363 | 364 | .demo-header .mdl-layout__header-row { 365 | padding-left: 40px; 366 | } 367 | 368 | .demo-container { 369 | max-width: 1600px; 370 | width: calc(100% - 16px); 371 | margin: 0 auto; 372 | } 373 | 374 | .demo-content { 375 | border-radius: 2px; 376 | padding: 80px 56px; 377 | margin-bottom: 80px; 378 | } 379 | 380 | .demo-layout.is-small-screen .demo-content { 381 | padding: 40px 28px; 382 | } 383 | 384 | .demo-content h3 { 385 | margin-top: 48px; 386 | } 387 | 388 | .demo-footer { 389 | padding-left: 40px; 390 | } 391 | 392 | .demo-footer .mdl-mini-footer--link-list a { 393 | font-size: 13px; 394 | } -------------------------------------------------------------------------------- /static/images/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/static/images/1.png -------------------------------------------------------------------------------- /static/images/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/static/images/2.png -------------------------------------------------------------------------------- /static/images/3.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/static/images/3.png -------------------------------------------------------------------------------- /static/images/Coursearch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/static/images/Coursearch.png -------------------------------------------------------------------------------- /static/images/books.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/static/images/books.png -------------------------------------------------------------------------------- /static/images/circle-cropped.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/static/images/circle-cropped.png -------------------------------------------------------------------------------- /static/images/home-img1.svg: -------------------------------------------------------------------------------- 1 | teacher -------------------------------------------------------------------------------- /static/images/newicon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RitikMody/Coursearch/ed5d0e2fb43c6a4e7277c445dff8bd5c2860977d/static/images/newicon.png -------------------------------------------------------------------------------- /templates/docs.html: -------------------------------------------------------------------------------- 1 | 2 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | Coursearch API 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 71 
| 72 | 73 |
74 | 94 |
95 |
96 |
97 |
98 |
99 | Coursearch » API reference 100 |
101 |
102 |

103 | The Coursearch API serves well-formatted data about MOOCs across a few well-known platforms. It can be used without any API token, membership, registration, or payment. 104 | It supports a few parameters that can be applied to get exactly the data you need. 105 | Usage is simple and requires only basic knowledge of HTTP requests and JSON, XML, YAML, or plain text. 106 |

107 |

1. Filter by search term

108 |
/api?searchTerm={searchTerm}
109 |

110 | Any typical search term can be passed to the searchTerm parameter to scrape course data across the supported MOOC platforms. Good examples for searchTerm are deep learning, graphic design, writing, etc., since broad terms surface results from every platform. 111 |
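For illustration, a request URL for this endpoint can be assembled in Python. This is a minimal sketch: the base URL http://localhost:5000 (a locally running instance of app.py) and the search_url helper are assumptions for the example, not part of the documented API.

```python
from urllib.parse import urlencode

# Assumed base URL of a locally running instance; adjust for the deployed app.
BASE_URL = "http://localhost:5000"

def search_url(search_term):
    """Build the /api URL for a given searchTerm (hypothetical helper)."""
    return f"{BASE_URL}/api?{urlencode({'searchTerm': search_term})}"

print(search_url("deep learning"))
# → http://localhost:5000/api?searchTerm=deep+learning
```

urlencode percent-encodes the query string (spaces become `+`), which the server decodes back into the original search term.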

112 |

2. Filter by search term and platform

113 |
/api?searchTerm={searchTerm}&site={site}
114 |

115 | Any typical search term can be passed to the searchTerm parameter to scrape course data across the supported MOOC platforms. Good examples for searchTerm would be deep learning, graphic design, writing, etc. 116 |
The site parameter takes any one of the following values: coursera, udemy, pluralsight, or udacity. This returns MOOC data mined only from the selected platform. 117 |
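A short sketch of combining both parameters follows; the local base URL, the helper name, and the client-side validation are illustrative assumptions, since the API itself only documents searchTerm and site.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # assumed local dev server, not part of the API docs
VALID_SITES = {"coursera", "udemy", "pluralsight", "udacity"}

def platform_search_url(search_term, site=None):
    """Build the /api URL, optionally restricted to a single platform."""
    params = {"searchTerm": search_term}
    if site is not None:
        if site not in VALID_SITES:
            # Fail fast on a typo rather than scraping nothing.
            raise ValueError(f"site must be one of {sorted(VALID_SITES)}")
        params["site"] = site
    return f"{BASE_URL}/api?{urlencode(params)}"

print(platform_search_url("graphic design", site="udemy"))
# → http://localhost:5000/api?searchTerm=graphic+design&site=udemy
```

Rejecting unknown site values on the client gives a clearer error than an empty result set from the scraper.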

119 |
120 | 121 |
122 |
123 | 124 |
125 |
126 |
127 | 128 | 129 | 130 | 131 | 132 | -------------------------------------------------------------------------------- /templates/home.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Coursearch 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 44 | 45 |
46 |
47 | 48 | 51 |
52 |
53 | image 54 |
55 |
56 |

57 | During the lockdown, have you too been confused about which courses to choose? Stuck with too many options from Coursera, Udemy, and other such sites?

58 | If so, this app is your solution! Just enter something you want to learn, and we'll fetch courses from the top platforms, ranked by our own algorithm, so you no longer have to agonize over picking the best option. 59 |

60 |
65 |
66 |
67 | 68 |
69 | 70 |
71 | 72 | 73 | 74 | -------------------------------------------------------------------------------- /templates/results.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Coursearch-Results 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 30 | 31 | 32 | 33 | 34 | 35 |
36 | 54 | 55 | 66 |
67 | 68 |
69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | {% for i in range(l) %} 82 | {% if df["offered_by"][i]|length %} 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | {% endif %} 92 | {% endfor %} 93 | 94 |
Logo | Course Name | Offered by | Partner Name | Rating | Rating count
image{{ df["course_name"][i] }}

{{ df["offered_by"][i] }}{{ df["partner_name"][i] }}{{ df["rating_out_of_five"][i] }}{{ df["rating_count"][i] }}
95 |
96 | 97 |
98 |
99 | {% for i in range(l) %} 100 |
101 |
102 |
103 | image 104 |
105 |
106 |
107 |
108 |
{{ df["course_name"][i] }}
109 |
110 |
111 |
112 | {{ df["offered_by"][i] }}
113 | {{ df["partner_name"][i] }} 114 |
115 |
116 | 117 |
118 |
119 |
120 |
121 |
Rating
{{ df["rating_out_of_five"][i] }}
122 |
Rating Count
{{ df["rating_count"][i] }}
123 |
124 |
125 |
126 |
127 |
128 | {% endfor %} 129 |
130 |
131 | 132 | 133 | 136 | 137 | 138 | 139 | --------------------------------------------------------------------------------