├── db
    └── Dockerfile
├── web
    ├── requirements.txt
    ├── Dockerfile
    └── app.py
├── docker-compose.yml
└── README.md


/db/Dockerfile:
--------------------------------------------------------------------------------
FROM mongo:3.6.13


--------------------------------------------------------------------------------
/web/requirements.txt:
--------------------------------------------------------------------------------
Flask
flask_restful
pymongo
bcrypt  # hashes passwords before they are stored
spacy


--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
version: '3'

services:
  web:
    build: './web'
    ports:
      - "5000:5000"
    links:
      - db
  db:
    build: './db'

--------------------------------------------------------------------------------
/web/Dockerfile:
--------------------------------------------------------------------------------
FROM python:3
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN python -m spacy download en_core_web_sm
CMD [ "python", "app.py"]

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# RESTful API For Similarity Check Using Natural Language Processing and Docker Compose

## INTRODUCTION

- Build a similarity-check API using NLP, then run and deploy it with Docker and Docker Compose.

-----------------

### Documents similarity
Document similarity (or distance between documents) is one of the central themes in Information Retrieval. How do humans usually decide how similar two documents are? Usually, documents are treated as similar if they are semantically close and describe similar concepts.
On the other hand, “similarity” can also be used in the context of duplicate detection. We will review several common approaches.

![imageSimilarity](https://miro.medium.com/max/1838/1*l-BZLW3JUHd1MZbNq1MjQA.png)

-----------------------

#### OBJECTIVE

`The objective of this API is to measure the similarity of two texts (PLAGIARISM CHECK)`

## API ARCHITECTURE
| RESOURCE | URL (PATH) | METHOD | PARAMETERS | STATUS CODES |
|----------|------------|--------|------------|--------------|
| Register a user | /register | POST | username, password | 200: OK, 301: INVALID USERNAME |
| Detect similarity of docs | /detect | POST | username, password, text1, text2 | 200: OK (returns similarity), 301: INVALID USERNAME, 302: INVALID PASSWORD, 303: OUT OF TOKENS |
| Refill | /refill | POST | username, admin_pw, refill | 200: OK, 301: INVALID USERNAME, 304: INVALID ADMIN_PW |

------------------


## REQUIREMENTS

- [spacy.io](https://spacy.io/models/en): spaCy is an open-source library for advanced Natural Language Processing, written in Python and Cython, and it is easy to use from Python.

**Download the spacy model from [here](https://github.com/explosion/spacy-models/releases//tag/en_core_web_sm-2.1.0)**

- Flask framework: see how to install and run Flask [here](https://github.com/pallets/flask); for more, see these [details](https://www.fullstackpython.com/flask.html).

- pymongo: PyMongo is a Python distribution containing tools for working with MongoDB. Download and install it from [here](https://api.mongodb.com/python/current/).

- [Docker](https://www.docker.com/)

- Docker Compose (see `docker-compose.yml`)

-----------------------

## Contributing

Please feel free to fork this package and contribute by submitting a pull request to enhance its functionality.

-------------------

## How can I thank you?

Why not star the GitHub repo? I'd love the attention! Why not share the link for this repository on Twitter, Hacker News or Destructoid? Spread the word!


Thanks! Ore-Aruwaji Tola.



--------------------------------------------------------------------------------
/web/app.py:
--------------------------------------------------------------------------------
from flask import Flask, jsonify, request
from flask_restful import Api, Resource
from pymongo import MongoClient
import bcrypt

app = Flask(__name__)
api = Api(app)

# "db" is the MongoDB service name defined in docker-compose.yml
client = MongoClient("mongodb://db:27017")
db = client.SimilarityDB
users = db["Users"]


def UserExist(username):
    """Return True if a user with this username is already stored."""
    count = users.count_documents({"Username": username})
    return count > 0

class Register(Resource):
    def post(self):
        # Step 1: get the data posted by the user
        postedData = request.get_json()

        # Step 2: read the credentials
        username = postedData["username"]
        password = postedData["password"]  # e.g. "123xyz"

        if UserExist(username):
            retJson = {
                'status': 301,
                'msg': 'Invalid Username'
            }
            return jsonify(retJson)

        hashed_pw = bcrypt.hashpw(password.encode('utf8'), bcrypt.gensalt())

        # Store the username and the hashed password in the database
        users.insert_one({
            "Username": username,
            "Password": hashed_pw,
            "Tokens": 6
        })

        retJson = {
            "status": 200,
            "msg": "You successfully signed up for the API"
        }
        return jsonify(retJson)

def verifyPw(username, password):
    """Return True if the password matches the stored hash for this user."""
    if not UserExist(username):
        return False

    hashed_pw = users.find({
        "Username": username
    })[0]["Password"]

    # checkpw hashes the candidate with the stored salt and compares the result
    return bcrypt.checkpw(password.encode('utf8'), hashed_pw)

def countTokens(username):
    """Return the number of tokens the user has left."""
    tokens = users.find({
        "Username": username
    })[0]["Tokens"]
    return tokens

# Load the spaCy model once at import time instead of on every request
import spacy
nlp = spacy.load('en_core_web_sm')

class Detect(Resource):
    def post(self):
        # Step 1: get the posted data
        postedData = request.get_json()

        # Step 2: read the data
        username = postedData["username"]
        password = postedData["password"]
        text1 = postedData["text1"]
        text2 = postedData["text2"]

        if not UserExist(username):
            retJson = {
                'status': 301,
                'msg': "Invalid Username"
            }
            return jsonify(retJson)

        # Step 3: verify that the username and password match
        correct_pw = verifyPw(username, password)

        if not correct_pw:
            retJson = {
                "status": 302,
                "msg": "Incorrect Password"
            }
            return jsonify(retJson)

        # Step 4: verify that the user has enough tokens
        num_tokens = countTokens(username)
        if num_tokens <= 0:
            retJson = {
                "status": 303,
                "msg": "You are out of tokens, please refill!"
            }
            return jsonify(retJson)

        # Step 5: compute the spaCy similarity score between text1 and text2
        doc1 = nlp(text1)
        doc2 = nlp(text2)

        ratio = doc1.similarity(doc2)

        retJson = {
            "status": 200,
            "ratio": ratio,
            "msg": "Similarity score calculated successfully"
        }

        # Step 6: take away 1 token from the user in a single atomic update
        users.update_one(
            {"Username": username},
            {"$inc": {"Tokens": -1}}
        )

        return jsonify(retJson)

class Refill(Resource):
    def post(self):
        postedData = request.get_json()

        username = postedData["username"]
        password = postedData["admin_pw"]
        refill_amount = postedData["refill"]

        if not UserExist(username):
            retJson = {
                "status": 301,
                "msg": "Invalid Username"
            }
            return jsonify(retJson)

        # The admin password is hard-coded here
        correct_pw = "abc123"
        if password != correct_pw:
            retJson = {
                "status": 304,
                "msg": "Invalid Admin Password"
            }
            return jsonify(retJson)

        # Set the user's token balance to the refill amount
        users.update_one(
            {"Username": username},
            {"$set": {"Tokens": refill_amount}}
        )

        retJson = {
            "status": 200,
            "msg": "Refilled successfully"
        }
        return jsonify(retJson)


api.add_resource(Register, '/register')
api.add_resource(Detect, '/detect')
api.add_resource(Refill, '/refill')


if __name__ == "__main__":
    app.run(host='0.0.0.0')

--------------------------------------------------------------------------------
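
The repository does not include a client, so here is a minimal sketch of how the three endpoints might be called, using only the Python standard library. It assumes the Compose stack is running locally with port 5000 published, as in `docker-compose.yml`. The helper names `build_request`, `call_api` and `demo`, and the sample credentials and texts, are hypothetical and not part of the project.

```python
import json
import urllib.request

# Assumption: the stack runs locally and publishes port 5000.
BASE_URL = "http://localhost:5000"


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request for one of the API paths."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(path, payload, base_url=BASE_URL):
    """Send the request and return the decoded JSON reply."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Register a user, then compare two texts (needs a running server)."""
    # Hypothetical credentials and texts, for illustration only.
    creds = {"username": "alice", "password": "secret"}
    print(call_api("/register", creds))

    reply = call_api("/detect", {
        **creds,
        "text1": "The cat sat on the mat.",
        "text2": "A cat was sitting on the mat.",
    })
    # The app reports its result code in the JSON body, so check
    # reply["status"] rather than the HTTP status line.
    if reply["status"] == 200:
        print("similarity:", reply["ratio"])
    else:
        print("error:", reply["msg"])
```

Once the containers are up, calling `demo()` from a Python shell should exercise `/register` and `/detect`. A `/refill` call would follow the same pattern with `username`, `admin_pw` and `refill` in the payload.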