├── .gitignore ├── LICENSE ├── MANIFEST.in ├── README.md ├── analyse_sentence_example.py ├── classify_example.py ├── data_cluster_example.py ├── groups_example.py ├── pysemantics │   ├── NlpClient.py │   ├── __init__.py │   ├── __version__.py │   └── requirements.txt ├── resources │   └── classify_text_in ├── setup.py └── similarity_example.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/** 2 | pysemantics/__pycache__/** 3 | pysemantics.egg-info/** 4 | build/** 5 | dist/** 6 | 7 | 8 | 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2019 Borislav Stoilov 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
8 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README.md LICENSE 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DigitalOwl NlpClient 2 | Python client that utilizes the [digitalowl.org](https://digitalowl.org) NLP API. 3 | 4 | Take advantage of some modern NLP techniques in an easy, fast and accessible way. Most of the time you won't need more than 10 lines of code to integrate this into your pipeline. 5 | 6 | **The API is free to use.** 7 | ## Install using pip 8 | ```pip install pysemantics``` 9 | 10 | 11 | 12 | ## So what can it do? 13 | 14 | #### In a few words, this is a script/client that performs semantic analysis of text, or in other words, analyses the text's meaning. 15 | 16 | ## Functionalities 17 | 18 | ### Text classification 19 | 20 | Classify text or a URL into a set of user-defined categories. 21 | 22 | **`client.classify(input='https://en.wikipedia.org/wiki/2020_United_States_presidential_election')`** 23 | 24 | Output: 25 | 26 | {'tags': ['politics', 'law'], 'originalTags': ['2012 democratic national convention']} 27 | 28 | The URL is downloaded, meaningful text is extracted and classified. If you already have the text available, you can pass it directly as input. 29 | 30 | Full working code, with more explanations: [classify_example.py](https://github.com/bstoilov/digitalowl-pysemantics/blob/master/classify_example.py) 31 | 32 | 33 | ### Phrase/Word analysis 34 | 35 | The underlying logic is based on an NLP model called Word2Vec; given the right training data, it can pick up contextual relations between words. 36 | Words that are often used together, or used in a similar way, are close in contextual meaning (contextual synonyms). 
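To make "close in contextual meaning" concrete, here is a toy offline sketch that ranks words by cosine similarity to a query vector. The vectors are hand-made for illustration only; real model vectors come from the API and have many more dimensions.

```python
from math import sqrt

# Hand-made toy "word vectors" -- illustration only, NOT real model output.
vectors = {
    'apricot': [0.9, 0.1, 0.0],
    'peach':   [0.8, 0.2, 0.1],
    'mango':   [0.7, 0.3, 0.0],
    'engine':  [0.0, 0.1, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def neighbours(word):
    """Rank every other word by cosine similarity to `word`."""
    scores = {w: cosine(vectors[word], v) for w, v in vectors.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)

print(neighbours('apricot'))  # the fruits rank above 'engine'
```

Contextually related words get similar vectors and therefore sort to the top; that is the same shape of output `analyse_sentence` returns below.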
37 | 38 | **`client.analyse_sentence(sentence='apricot')`** 39 | 40 | Output: 41 | 42 | {'pistachio': 0.7594164609909058, 'overripe': 0.7523329257965088, 'mango': 0.7421437501907349, 43 | 'peach': 0.7410970330238342, 'rhubarb': 0.7401571273803711, 'pecan': 0.7379646897315979, 44 | 'persimmon': 0.7368103265762329, 'strawberry': 0.731874942779541, 'unripe': 0.7294522523880005, 45 | 'sorbet': 0.7278781533241272, 'walnut': 0.7244322299957275, 'tart': 0.7223066687583923, 46 | 'beetroot': 0.7216348648071289, 'okra': 0.7172538042068481, 'pumpkin': 0.7165997624397278, 47 | 'pineapple': 0.7146158814430237, 'lemongrass': 0.7138402462005615, 'papaya': 0.7137945294380188, 48 | 'blueberry': 0.7127506136894226, 'marmalade': 0.7100027799606323} 49 | 50 | The words close to 'apricot' are other fruits and foods; these relations can be used in various NLP tasks. 51 | Similar relations can be extracted for whole paragraphs. Full working code with more explanations: 52 | [analyse_sentence_example.py](https://github.com/bstoilov/digitalowl-pysemantics/blob/master/analyse_sentence_example.py) 53 | 54 | 55 | 56 | ### Semantic Similarity 57 | 58 | Given two documents, words or just phrases, you can compare how close they are in meaning. 59 | 60 | `first = 'https://en.wikipedia.org/wiki/Impeachment_inquiry_against_Donald_Trump'` 61 | 62 | `second = 'https://news.sky.com/story/ex-trump-adviser-fiona-hill-says-russia-gearing-up-to-interfere-in-2020-election-11866422' 63 | ` 64 | 65 | `client.similarity(first=first, second=second)` 66 | 67 | {'similarity': 0.9516085802597031} 68 | 69 | Full working example with documentation: [similarity_example.py](https://github.com/bstoilov/digitalowl-pysemantics/blob/master/similarity_example.py) 70 | 71 | ### Text Clusters 72 | 73 | Automatically group documents, words or sentences. 
74 | 75 | Using the vectors we obtain from the API and the KMeans algorithm integrated into this client 76 | we can group pieces of text or documents based on their meaning. 77 | 78 | Full working example can be found here: [data_cluster_example.py](https://github.com/bstoilov/digitalowl-pysemantics/blob/master/data_cluster_example.py) 79 | 80 | 81 | ### Belong to group check 82 | 83 | Using the client you can define a group of objects and then determine whether a certain object belongs to that group. 84 | 85 | Define a group of animals: 86 | 87 | `group = ['cat', 'dog', 'fox', 'horse', 'rhino']` 88 | 89 | Pick some random words, some of which are animals: 90 | 91 | `targets = ['carrot', 'animal', 'monkey', 'ship', 'Canada', 'buffalo', 'crow', 'news', 'government', 'murder', 'chariot']` 92 | 93 | `client.belong(group=group, targets=targets)` 94 | 95 | Output: 96 | 97 | ['animal', 'monkey', 'buffalo', 'crow', 'chariot'] 98 | 99 | Full working example: [groups_example.py](https://github.com/bstoilov/digitalowl-pysemantics/blob/master/groups_example.py) 100 | 101 | 102 | 103 | ##### In case you find any issues, please report them as a GitHub issue. 
Any feedback is welcome, don't hesitate to contact me at borislav.stoilov@digitalowl.org 104 | 105 | 106 | -------------------------------------------------------------------------------- /analyse_sentence_example.py: -------------------------------------------------------------------------------- 1 | from pysemantics.NlpClient import NlpClient 2 | 3 | 4 | def contextual_synonyms(): 5 | client = NlpClient() 6 | 7 | word = 'apricot' 8 | resp = client.analyse_sentence(sentence=word) 9 | print(resp['words']) 10 | # {'pistachio': 0.7594164609909058, 'overripe': 0.7523329257965088, 'mango': 0.7421437501907349, 11 | # 'peach': 0.7410970330238342, 'rhubarb': 0.7401571273803711, 'pecan': 0.7379646897315979, 12 | # 'persimmon': 0.7368103265762329, 'strawberry': 0.731874942779541, 'unripe': 0.7294522523880005, 13 | # 'sorbet': 0.7278781533241272, 'walnut': 0.7244322299957275, 'tart': 0.7223066687583923, 14 | # 'beetroot': 0.7216348648071289, 'okra': 0.7172538042068481, 'pumpkin': 0.7165997624397278, 15 | # 'pineapple': 0.7146158814430237, 'lemongrass': 0.7138402462005615, 'papaya': 0.7137945294380188, 16 | # 'blueberry': 0.7127506136894226, 'marmalade': 0.7100027799606323} 17 | 18 | # in the response we see other words that are often used in the same context and in a similar way as 'apricot' 19 | # these are the so-called contextual synonyms of 'apricot' 20 | 21 | 22 | def analyse_sentence_example(): 23 | client = NlpClient() 24 | 25 | sentence = 'Taking your pets to the vet should be made mandatory to all pet owners' 26 | resp = client.analyse_sentence(sentence=sentence) 27 | 28 | print(resp['words']) 29 | # output: 30 | # {'them': 0.5691992044448853, 'you': 0.5044294595718384, 'yours': 0.49882641434669495, 'theirs': 0.49810361862182617, 31 | # 'they': 0.49736160039901733, 'dog': 0.4872986376285553, 'himher': 0.4866429567337036, 'cats': 0.4748264253139496, 32 | # 'dogs': 0.47481757402420044, 'puppy': 0.4733749330043793, 'landlord': 0.4688832461833954, 33 | # 'parent': 
0.4664549231529236, 'fleas': 0.4609288275241852, 'spouse': 0.45495110750198364, 34 | # 'themselves': 0.4534609317779541, 'cat': 0.45328179001808167, 'yourself': 0.4528433084487915, 35 | # 'insured': 0.45103758573532104, 'YOU': 0.45088881254196167, 'patient': 0.44211727380752563} 36 | 37 | # the output is an attempt to find the words that best describe the sentence contextually 38 | # this can be used to classify short phrases 39 | 40 | 41 | # analyse_sentence_example() 42 | # contextual_synonyms() 43 | -------------------------------------------------------------------------------- /classify_example.py: -------------------------------------------------------------------------------- 1 | from pysemantics.NlpClient import NlpClient 2 | 3 | 4 | # classification works best with larger texts; if an article has too few words, there might not be enough context to pick up 5 | # if you need to deal with short, few-word phrases you should use the 'analyse_sentence' method 6 | def classify_url_example(): 7 | target_urls = [ 8 | 'https://en.wikipedia.org/wiki/2020_United_States_presidential_election', 9 | 'https://pinchofyum.com/freezer-meals', 10 | 'https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array', 11 | 'https://www.digitaltrends.com/cars/best-cars/', 12 | 'https://www.cnbc.com/2019/11/22/us-china-economic-conflict-could-be-worse-than-wwi-henry-kissinger-says.html' 13 | ] 14 | 15 | client = NlpClient() 16 | 17 | # the API will classify a URL based entirely on the web page contents 18 | # all of the URLs are downloaded, meaningful text is extracted, and that is what is fed to the algorithm 19 | for url in target_urls: 20 | classification = client.classify(input=url) 21 | print('url:{} -> {}'.format(url, classification)) 22 | # Output 23 | # url: https://en.wikipedia.org/wiki/2020_United_States_presidential_election -> {'tags': ['politics', 'law'], 'originalTags': ['2012 democratic national 
convention']} 24 | # url: https://pinchofyum.com/freezer-meals -> {'tags': ['food'], 'originalTags': ['easy recipes', 'cooking']} 25 | # url: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array -> {'tags': ['software', 'language'], 'originalTags': ['ansi c', 'c++ (programming language)', 'c (programming language)']} 26 | # url: https://www.digitaltrends.com/cars/best-cars/ -> {'tags': ['vehicles', 'retail'], 'originalTags': ['sport utility vehicles (suvs)', 'car buying', 'cars and automobiles']} 27 | # url: https://www.cnbc.com/2019/11/22/us-china-economic-conflict-could-be-worse-than-wwi-henry-kissinger-says.html -> {'tags': ['politics', 'law', 'war'], 'originalTags': ['foreign policy', 'foreign policy of india', 'indian army']} 28 | 29 | # in the response you will notice two results for each url 30 | # originalTags -> the algorithm chooses from a huge set of user-defined tags (around 120k) and picks the ones that are most applicable according to the word vectors 31 | # these tags are often too specific to be reliable 32 | 33 | # tags -> these are tags obtained by 'generalizing' the original tags; for example, in this case 'c++ (programming language)' and 'c (programming language)' 34 | # both belong to the more general 'software' tag, and 'easy recipes', 'cooking' both belong to 'food', and so on 35 | 36 | 37 | def classify_text_example(): 38 | # some text about car reviews 39 | with open('resources/classify_text_in') as f: text = f.read() 40 | 41 | # the idea here is the same as with classify_url_example, just the download and extract text step is skipped. 
42 | # if you already have the text documents available, this will be a lot faster 43 | client = NlpClient() 44 | result = client.classify(input=text) 45 | print(result) 46 | # OUTPUT: 47 | # {'tags': ['vehicles', 'media'], 'originalTags': ['cars and automobiles', 'survey question']} 48 | 49 | classify_url_example() 50 | # classify_text_example() 51 | -------------------------------------------------------------------------------- /data_cluster_example.py: -------------------------------------------------------------------------------- 1 | from pysemantics.NlpClient import NlpClient 2 | from random import shuffle 3 | 4 | def example_clusters(): 5 | # the idea is to automatically group phrases (words in this case) 6 | # into a previously specified number of groups, by the meaning of each phrase 7 | # For this we will use sentence vectors from digitalowl.org and the KMeans algorithm from sklearn 8 | 9 | # in this example we will choose 3 groups of objects, namely vehicles, vegetables and sport 10 | cluster_count = 3 11 | vegetables = ['greens', 'leaves', 'asparagus', 'stems', 'beans', 'seeds', 'beetroot', 'broccoli', 'flowers', 12 | 'brussels', 'cabbages', 'capsicums', 'carrots', 'cauliflower', 'celeriac', 'celery', 'chilli', 13 | 'peppers', 'chokos', 'cucumber', 'eggplant'] 14 | 15 | vehicles = ['engine', 'wheels', 'car', 'truck', 'tires', 'gasoline', 'fuel'] 16 | 17 | sport = ['tennis', 'football', 'cricket', 'chess', 'baseball', 'swimming'] 18 | 19 | # merge and shuffle the data into one single array ('items' avoids shadowing the built-in 'all') 20 | items = vegetables + vehicles + sport 21 | shuffle(items) 22 | 23 | # the client will help us process the vectors, but you can directly call '_obtain_vectors' and use another method 24 | # the clusters function uses kmeans to build 3 groups of data out of the data we provided 25 | client = NlpClient() 26 | cluster_data = client.clusters(sentences=items, cluster_count=cluster_count) 27 | 28 | for cluster_id in cluster_data: 29 | print('Cluster {} - 
{}'.format(cluster_id, cluster_data[cluster_id])) 30 | 31 | 32 | # OUTPUT: 33 | # Cluster 2 - ['greens', 'cucumber', 'peppers', 'capsicums', 'beetroot', 'cauliflower', 'cabbages', 'chilli', 'stems', 'seeds', 'celeriac', 'flowers', 'leaves', 'beans', 'asparagus', 'broccoli', 'celery', 'brussels', 'carrots', 'eggplant'] 34 | # Cluster 1 - ['tennis', 'baseball', 'football', 'swimming', 'chess', 'cricket'] 35 | # Cluster 0 - ['wheels', 'engine', 'tires', 'fuel', 'car', 'truck', 'gasoline'] 36 | 37 | # we can see that we recovered the initial arrays; the word 'chokos' is missing from the vegetables group 38 | # this is because it is not recognized by the digitalowl.org API; it simply wasn't mentioned in its training data, 39 | # so this word is ignored. 40 | 41 | example_clusters() -------------------------------------------------------------------------------- /groups_example.py: -------------------------------------------------------------------------------- 1 | from pysemantics.NlpClient import NlpClient 2 | 3 | 4 | def belongs_to_group_example(): 5 | # We define a group of words or sentences and then check which of the targets belong to the group 6 | # Each word from targets is compared to the words in group; if there is a 7 | # pair that is similar enough, the target is included in the result 8 | group = ['cat', 'dog', 'fox', 'horse', 'rhino'] 9 | targets = ['carrot', 'animal', 'monkey', 'ship', 'canada', 'buffalo', 'crow', 'news', 'government', 'murder', 10 | 'chariot'] 11 | 12 | client = NlpClient() 13 | # belonging is the array of items that belong to the specified group 14 | # you can pass the sim_factor arg to control the similarity of items; it can range from 0 to 1, 15 | # 0 meaning 'nothing in common', 1 meaning 'exactly the same', default is 0.5 16 | belonging = client.belong(group=group, targets=targets) 17 | 18 | print(belonging) 19 | # OUTPUT: 20 | # ['animal', 'monkey', 'buffalo', 'crow', 'chariot'] 21 | 22 | # In a sense we defined a group that can recognize 
animals; most of the words that are not animals are excluded 23 | # NOTE: don't expect accurate results at all times, because the algorithm relies on contextual similarity 24 | # for example, 'chariot' shows up in the result because the phrase 'horse chariot' is very common 25 | 26 | 27 | belongs_to_group_example() 28 | -------------------------------------------------------------------------------- /pysemantics/NlpClient.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | from sklearn.cluster import KMeans 4 | from scipy import spatial 5 | import numpy as np 6 | 7 | COMMON_HEADERS = {'content-type': 'application/json'} 8 | 9 | 10 | class NlpClient: 11 | def __init__(self, base_url='http://46.4.143.163:11111/'): 12 | self.base_url = base_url 13 | 14 | def classify(self, input=''): 15 | data = { 16 | 'input': input 17 | } 18 | 19 | return requests.post(self.base_url + 'classify/topic', headers=COMMON_HEADERS, data=json.dumps(data)).json() 20 | 21 | def similarity(self, first='', second=''): 22 | data = { 23 | 'first': first, 24 | 'second': second, 25 | 'lang': 'en' 26 | } 27 | 28 | return requests.post(self.base_url + 'classify/similarity', headers=COMMON_HEADERS, 29 | data=json.dumps(data)).json() 30 | 31 | def analyse_sentence(self, sentence): 32 | data = { 33 | 'positive': sentence.split(' '), 34 | 'negative': [], 35 | 'lang': 'en' 36 | } 37 | 38 | return requests.post(self.base_url + 'wv/eval', headers=COMMON_HEADERS, 39 | data=json.dumps(data)).json() 40 | 41 | def _obtain_vectors(self, sentences): 42 | data = { 43 | "sentances": sentences  # (sic) the API expects this spelling of the key 44 | } 45 | 46 | res = requests.post(self.base_url + 'wv/vectors', headers=COMMON_HEADERS, data=json.dumps(data)).json() 47 | return res['vectors'] 48 | 49 | def clusters(self, sentences, cluster_count): 50 | vectors = self._obtain_vectors(sentences) 51 | 52 | i = 0 53 | while i < len(vectors): 54 | if vectors[i] is None: 
55 | sentences.pop(i) 56 | vectors.pop(i) 57 | else: 58 | i += 1  # only advance when nothing was removed, so shifted items are not skipped 59 | X = np.array(vectors) 60 | kmeans = KMeans(n_clusters=cluster_count, random_state=0).fit(X) 61 | line_count = len(sentences) 62 | result = {} 63 | i = 0 64 | while i < line_count: 65 | cluster = kmeans.labels_[i] 66 | if cluster not in result: 67 | result[cluster] = [] 68 | result[cluster].append(sentences[i]) 69 | i += 1 70 | return result 71 | 72 | def belong(self, group, targets, sim_factor=0.5): 73 | group_vectors = self._obtain_vectors(group) 74 | target_vectors = self._obtain_vectors(targets) 75 | belonging = [] 76 | i = 0 77 | while i < len(target_vectors): 78 | target_vec = target_vectors[i] 79 | 80 | if target_vec is not None: 81 | j = 0 82 | while j < len(group_vectors): 83 | cosine_dist = spatial.distance.cosine(target_vec, group_vectors[j]) 84 | sim = 1 - cosine_dist 85 | if sim >= sim_factor: 86 | belonging.append(targets[i]) 87 | break 88 | j += 1 89 | i += 1 90 | 91 | return belonging 92 | -------------------------------------------------------------------------------- /pysemantics/__init__.py: -------------------------------------------------------------------------------- 1 | from pysemantics.NlpClient import NlpClient -------------------------------------------------------------------------------- /pysemantics/__version__.py: -------------------------------------------------------------------------------- 1 | __version__ = '1.0.3' 2 | -------------------------------------------------------------------------------- /pysemantics/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | scikit-learn 3 | scipy 4 | requests -------------------------------------------------------------------------------- /resources/classify_text_in: -------------------------------------------------------------------------------- 1 | Fun to drive. The phrase gets blasted from seemingly every car commercial, magazine ad, and influencer account – so overused that it has lost all meaning. 
So when Mazda, a small firm that actually does make cars that are fun to drive, talks about their most compelling trait, it gets lost in the cacophony of ad spends. However, we're here to tell you that yes, while it's difficult to quantify, some cars are objectively more fun to drive than others, and the all-new 2019 Mazda3 is — and this is a very technical term — a freakin' blast. At Mazda's behest, we took a 2019 sedan up Angeles Crest Highway just outside of L.A. With plenty of yellow signs, tight sequences of banked curves and elevation changes, it's the platonic ideal of those serpentine mountain roads you see in car commercials. The instant the Mazda3 reaches the windy roads, it glides in like an otter diving into the sea. Lively and graceful, it dances along a ribbon of asphalt more naturally than any compact sedan we've driven since the advent of drive-by-wire. The steering is not only direct and true, but possesses an extraordinary ability to maintain trajectory. From the moment you turn in, you never need to make adjustments to the steering wheel until the front tires are straight again. The car goes exactly where you intend, always. That's not hyperbole, but an amazing feat of engineering. In nearly every other vehicle, even those that purport to be sports cars, unless you're incredibly familiar with the machine and know the road like the back of your hand, minor mid-corner corrections are an inevitability. With the 3, you get it right on the first try. Now imagine you're on a strip of canyon pavement with lots of short switchbacks in varying radii coming up fast, one right after another. The 3 links them all together with pure ease, and soon you're developing a rhythm through the curves. While other cars charge, the Mazda flows. The car's poise is particularly evident as momentum shifts from one direction to another, what Mazda chassis engineer Dave Coleman termed 'transience.' 
In most cars passengers are tossed around the cabin like mannequins, but the 3 cuts out the turbulence, its body engineered to move in a smooth undulation. At the midpoint of the transition, there's even a moment of weightlessness before the car tucks into the next turn and the seat seems to scoop you up and carry you onward. You don't have to be Yoshimi Katayama to finesse this Mazda, and some might think that means it's somehow lesser than a hardcore sports car. Quite the contrary. It's actually much more difficult to engineer a car that does what the Mazda3 does because they're not just throwing horsepower and grip at it to meet benchmarks. Perhaps the best example of this philosophy was illustrated by a piece of Japanese exercise equipment. To demonstrate their human-centric doctrine, Mazda had us sit in two cars, a current-generation 3 and the 2019 edition. Both had their passenger seats removed, and in their place was bolted a flat seating cushion mounted atop a single spring. With no armrests or seatbacks to stabilize yourself with, simply sitting upright took substantial balance and constant working of your torso, thigh and back muscles. Then a Mazda engineer drove us around in each car at speeds of no more than 5 mph to demonstrate just how much smoother the new 3 was compared to the old one, how much harder you had to work to hold yourself up in the old car against each input of the driver. 'This was a conscious decision by Mazda,' Director of R&D Engineering Kelvin Hiraishi told us. 'Because if you can't enjoy the drive over the shouts of passengers to take it easy, what's the point of the car?' Of course, there's also the matter of the 3's rear suspension, and the replacement of a multi-link with that of a torsion beam for 2019. On paper it looks like a regression, but in the real world, the handling doesn't seem to be affected one iota. 
'If you believe Mazda is truly committed to our philosophy of jinba ittai [horse and rider as one],' Hiraishi continued, 'You know that we'd never sacrifice our dynamics. It's not who we are.' 'With the torsion beam, we were able to increase lateral rigidity by a good margin,' Hiraishi added, 'There are fewer arms and bushings deflecting in various directions as the car turns.' Indeed, if it works well, there's no reason to get hung up on what's underneath. In fact, the handling almost overshadows the 3's other aspects, which are also very good. The sole powertrain is a naturally aspirated version of Mazda's 2.5-liter SkyActiv engine, the same one found in the CX-5, mated to a 6-speed automatic. That's for the sedan, which is the only version available to drive at this point (the hatchback will offer a manual option on its highest trim level). We've found this engine, which produces 186-horsepower and 186 pound-foot of torque and features cylinder deactivation, impressive in other Mazdas, and it continues to serve as a superb powerplant in the 3. Mazda still stubbornly refuses to add more than 6 gears to the transmission, but as Coleman reasons, 'the fewer the gears, the less it will hunt.' Activating Sport Mode holds the gears longer for peppy acceleration — and it's smart enough that, in a rare turn of events, we didn't find ourselves aching for manual — but doesn't change the steering or throttle response, as it does in many cars. 'That,' Coleman explained with a wry grin, 'is because we've already found the ideal setting.' The theme of ideal settings continues in the Mazda's cabin. Completely redesigned, it's a big departure from previous Mazdas. And there's one detail that differs significantly from most other cars on the market for that matter: The 8.8-inch 'infotainment' display is no longer a touchscreen. Mazda engineers have acknowledged what many other companies don't: Hunting for virtual buttons is a dangerous and unnecessary distraction. 
Instead, a command dial on the center console navigates the menu, while the screen itself now angles toward the driver and moves closer to the base of the windshield. That not only keeps the display closer to the driver's field of vision, but also requires less time to refocus the eyes when directing attention from something distant to something inside the cabin. Likewise, an available heads-up display is projected at a virtual distance of 2.3 meters ahead, which HMI engineer Matthew Valbuena says is the ideal span to minimize eyeball refocusing time. We found the command dial extremely intuitive, operable by nothing but feel. The menus have helpfully been redesigned to include a visual shorthand as well — if you're trying to change a setting in the instrument cluster, for instance, an image of the speedometer appears next to the menu text for easy identification. The cupholders have been relocated to fore of the shifter so while operating the command dial your forearm has more support from a larger center armrest cushion, which doubles as the lid to a large storage compartment. Next to the command dial sits a smaller knob for volume control, which also moves like a joystick for track selection or scanning. The stereo sounds clear and crisp with no bass distortion on the 6-speaker base system, and even better with the optional Bose 10-speaker. Furthermore, Mazda has worked hard to maintain a sense of consistency throughout the cabin. Every dial and knob turns with the same click. Every button, whether on the steering wheel, door armrest, or center console has an identical detent feel. All interior lighting emits the same hue of white. It's all part of a grander mission to move upscale, and the rest of the cabin does not disappoint. Deluxe materials clothe most surfaces, save the hard plasticky-ness of the steering wheel. 
The seats provide excellent support without restrictive bolstering, keeping us in place on those twisty roads without restricting torso or arm movement, and add thigh support to the driver's side. Mazda doesn't skimp on the safety tech, either, adding front cross-traffic alerts for 2019. A traffic jam assist can either lane trace or follow the vehicle ahead below 40 mph. The cabin is supremely quiet for a C-segment offering as well. The base 3 Sedan starts at $21,895, and the only thing that lands anywhere in the same zip code in terms of driving experience is the Honda Civic. However, the Civic Sport Sedan starts at $21,150 and comes with just 158 horsepower. The 3 is more closely aligned with the Civic Si, but then you're already spending $24,300 at a minimum and it looks like it was designed by a robot. The Mazda, on the other hand, is stunning. If it wore a European badge, they could probably charge $10,000 more on each model. We say that because that's exactly what Audi does with the A3, and the Mazda is every bit its equal. But see, there we go comparing figures and specs again. At the end of the day, the Mazda might not win on every metric, but it doesn't matter. As a driver, you owe it to yourself to put it on your list. Just make sure your salesperson lets you take it beyond a few city blocks (Mazda still has a ways to go in dealership training), and thoughts about cubic feet will melt away immediately. We felt our mountain frolic too short and wanted more than anything to drive it more. Few cars in this class and price range can generate that kind of emotion – or any kind of emotion. If you're going to spend time in your car, why not make it a source of joy for both you and your passengers? It's the only way we'll truly understand the meaning of 'fun to drive' again. 
-------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | # Note: To use the 'upload' functionality of this file, you must: 5 | # $ pip install twine 6 | 7 | import io 8 | import os 9 | import sys 10 | from shutil import rmtree 11 | 12 | from setuptools import setup, Command 13 | 14 | NAME = 'pysemantics' 15 | PACKAGE = 'pysemantics' 16 | DESCRIPTION = 'NLP client for python' 17 | URL = 'https://github.com/bstoilov/digitalowl-pysemantics' 18 | EMAIL = 'borislav.stoilov@digitalowl.org' 19 | AUTHOR = 'Borislav Stoilov' 20 | REQUIRES_PYTHON = '>=3.5.0' 21 | REQUIRED = ['requests', 'scikit-learn', 'scipy', 'numpy'] 22 | 23 | here = os.path.abspath(os.path.dirname(__file__)) 24 | 25 | try: 26 | with io.open(os.path.join(here, 'README.md'), encoding='utf-8') as f: 27 | long_description = '\n' + f.read() 28 | except FileNotFoundError: 29 | long_description = DESCRIPTION 30 | 31 | about = {} 32 | with open(os.path.join(here, PACKAGE, '__version__.py')) as f: 33 | exec(f.read(), about) 34 | 35 | 36 | class UploadCommand(Command): 37 | """Support setup.py upload.""" 38 | 39 | description = 'Build and publish the package.' 
40 | user_options = [] 41 | 42 | @staticmethod 43 | def status(s): 44 | """Prints things in bold.""" 45 | print('\033[1m{0}\033[0m'.format(s)) 46 | 47 | def initialize_options(self): 48 | pass 49 | 50 | def finalize_options(self): 51 | pass 52 | 53 | def run(self): 54 | try: 55 | self.status('Removing previous builds…') 56 | rmtree(os.path.join(here, 'dist')) 57 | except OSError: 58 | pass 59 | 60 | self.status('Building Source and Wheel (universal) distribution…') 61 | os.system('{0} setup.py sdist bdist_wheel --universal'.format(sys.executable)) 62 | 63 | self.status('Uploading the package to PyPI via Twine…') 64 | os.system('twine upload dist/*') 65 | 66 | self.status('Pushing git tags…') 67 | os.system('git tag v{0}'.format(about['__version__'])) 68 | os.system('git push --tags') 69 | 70 | sys.exit() 71 | 72 | 73 | setup( 74 | name=NAME, 75 | version=about['__version__'], 76 | description=DESCRIPTION, 77 | long_description=long_description, 78 | long_description_content_type='text/markdown', 79 | author=AUTHOR, 80 | author_email=EMAIL, 81 | python_requires=REQUIRES_PYTHON, 82 | url=URL, 83 | install_requires=REQUIRED, 84 | include_package_data=True, 85 | packages=[PACKAGE], 86 | zip_safe=False, 87 | license='MIT', 88 | classifiers=[ 89 | 'Development Status :: 4 - Beta', 90 | 'License :: OSI Approved :: MIT License', 91 | 'Intended Audience :: Developers', 92 | 'Natural Language :: English', 93 | 'Programming Language :: Python', 94 | 'Programming Language :: Python :: 3', 95 | 'Programming Language :: Python :: 3.6', 96 | 'Programming Language :: Python :: 3.7', 97 | 'Programming Language :: Python :: Implementation :: CPython', 98 | ], 99 | cmdclass={ 100 | 'upload': UploadCommand, 101 | }, 102 | ) 103 | -------------------------------------------------------------------------------- /similarity_example.py: -------------------------------------------------------------------------------- 1 | from pysemantics.NlpClient import NlpClient 2 | 3 | 4 | def 
similarity_example(): 5 | # first and second can be two URLs, two texts, or one URL and one text 6 | first = 'https://en.wikipedia.org/wiki/Impeachment_inquiry_against_Donald_Trump' 7 | second = 'https://news.sky.com/story/ex-trump-adviser-fiona-hill-says-russia-gearing-up-to-interfere-in-2020-election-11866422' 8 | 9 | client = NlpClient() 10 | similarity = client.similarity(first=first, second=second) 11 | 12 | print(similarity) 13 | 14 | # Output: 15 | # {'similarity': 0.9516085802597031} 16 | 17 | # two different articles, written by different authors, but about the same topic 18 | # we don't compare syntactic similarity here, we compare semantic similarity 19 | # the meaning of both texts is quite similar; in fact, anything with similarity above 20 | # 0.6 is quite similar 21 | 22 | 23 | similarity_example() 24 | --------------------------------------------------------------------------------
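The 0.6 rule of thumb above can be applied as a simple threshold on a cosine-style score. A minimal offline sketch with hand-made toy vectors (illustrative only; `are_similar` is a hypothetical helper, not part of the client, and real document vectors come from the API):

```python
from math import sqrt

def cosine_similarity(a, b):
    # Dot product over the product of magnitudes, in [-1, 1] for real vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def are_similar(vec_a, vec_b, threshold=0.6):
    # Mirrors the rule of thumb above: anything scoring over ~0.6
    # can be treated as "about the same topic".
    return cosine_similarity(vec_a, vec_b) >= threshold

# Toy "document vectors": two near-parallel, one pointing elsewhere.
impeachment_article = [0.9, 0.4, 0.1]
election_article    = [0.8, 0.5, 0.2]
recipe_page         = [0.1, 0.2, 0.9]

print(are_similar(impeachment_article, election_article))  # True
print(are_similar(impeachment_article, recipe_page))       # False
```

This is the same thresholding idea the client's `belong` method uses internally with `sim_factor`.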