├── .gitignore ├── .travis.yml ├── CITATION.R ├── LICENSES.txt ├── MANIFEST.in ├── README.rst ├── __init__.py ├── bin └── sentimentpy_logo │ ├── py_sentimentpy.png │ ├── py_sentimentpya.png │ ├── py_sentimentpyb.png │ ├── py_sentimentr.pptx │ └── resize_icon.txt ├── sentimentpy.Rproj ├── sentimentpy ├── __init__.py └── split_sentences.py └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | 3 | sudo: false 4 | 5 | python: 6 | - "3.5" 7 | - "3.6" 8 | 9 | 10 | install: 11 | - "./travis.sh" 12 | 13 | 14 | script: 15 | - pytest 16 | 17 | notifications: 18 | email: 19 | on_success: change 20 | on_failure: change 21 | 22 | cache: pip -------------------------------------------------------------------------------- /CITATION.R: -------------------------------------------------------------------------------- 1 | @Manual{, 2 | title = {{sentimentpy}: Calculate Text Polarity Sentiment}, 3 | author = {Tyler W. Rinker}, 4 | address = {Buffalo, New York}, 5 | note = {version 2.7.0}, 6 | year = {2018}, 7 | url = {http://github.com/trinker/sentimentpy}, 8 | } 9 | -------------------------------------------------------------------------------- /LICENSES.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2018 Tyler W. 
Rinker 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include *.txt *.py *.rst 2 | recursive-include sentimentpy *.txt *.py 3 | recursive-include additional_resources *.tar.gz -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | sentimentpy 2 | =========== 3 | 4 | .. image:: https://www.repostatus.org/badges/latest/wip.svg 5 | :alt: Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public. 6 | :target: https://www.repostatus.org/#wip 7 | 8 | .. 
image:: https://img.shields.io/travis/trinker/sentimentpy/master.svg?style=flat-square&logo=travis
 9 |    :target: https://travis-ci.org/trinker/sentimentpy
10 |    :alt: Build Status
11 | 
12 | .. image:: bin/sentimentpy_logo/py_sentimentpyb.png
13 |    :alt: Module Logo
14 | 
15 | 
16 | 
17 | 
18 | **sentimentpy** is designed to quickly calculate text polarity sentiment at the sentence level. The user can aggregate these scores by grouping variable(s) using built-in aggregate functions.
19 | 
20 | 
21 | **sentimentpy** (a Python port of the R `sentimentr package `_) is a response to my own sentiment-detection needs that were not addressed by the current **R** tools. My own `polarity` function in the R **qdap** package is slower on larger data sets. It is a dictionary-lookup approach that tries to incorporate weighting for valence shifters (negation and amplifiers/deamplifiers). Matthew Jockers created the `syuzhet `_ R package, which utilizes dictionary lookups for the Bing, NRC, and Afinn methods as well as a custom dictionary. He also provides a wrapper for the `Stanford coreNLP `_ parser, which uses much more sophisticated analysis. Jockers's dictionary methods are fast but are more prone to error in the case of valence shifters. Jockers `addressed these critiques `_, explaining that the method works well for analyzing general sentiment in a piece of literature; he points to the accuracy of the Stanford detection as well. In my own work I need better accuracy than a simple dictionary lookup provides: something that considers valence shifters yet optimizes speed, which the Stanford parser does not. This leads to a trade-off between speed and accuracy. Simply put, **sentimentpy** attempts to balance the two.
22 | 
23 | 
24 | Installation
25 | ============
26 | 
27 | 
28 | Currently, this is a GitHub package.
To install use: 29 | 30 | ``pip install git+https://github.com/trinker/sentimentpy`` 31 | 32 | 33 | Sentence Splitting 34 | ================== 35 | 36 | :: 37 | 38 | import sentimentpy.split_sentences as ss 39 | 40 | s = [ 41 | ' I like you. P.S. I like carrots too mrs. dunbar. Well let\'s go to 100th st. around the corner. ', 42 | 'Hello Dr. Livingstone. How are you?', 43 | 'This is sill an incomplete thou.' 44 | 45 | ] 46 | 47 | ss.split_sentences(s) 48 | 49 | :: 50 | 51 | ['I like you.', 52 | 'P.S. I like carrots too mrs. dunbar.', 53 | "Well let's go to 100th st. around the corner.", 54 | 'Hello Dr. Livingstone.', 55 | 'How are you?', 56 | 'This is sill an incomplete thou.'] 57 | 58 | :: 59 | 60 | x = [ 61 | " ".join( 62 | ["Mr. Brown comes! He says hello. i give him coffee. i will ", 63 | "go at 5 p. m. eastern time. Or somewhere in between!go there" 64 | ]), 65 | " ".join( 66 | ["Marvin K. Mooney Will You Please Go Now!", "The time has come.", 67 | "The time has come. The time is now. Just go. Go. GO!", 68 | "I don't care how." 69 | ]) 70 | ] 71 | 72 | ss.split_sentences(x) 73 | 74 | :: 75 | 76 | ['Mr. Brown comes!', 77 | 'He says hello.', 78 | 'i give him coffee.', 79 | 'i will go at 5 p.m. eastern time.', 80 | 'Or somewhere in between!', 81 | 'go there', 82 | 'Marvin K. 
Mooney Will You Please Go Now!', 83 | 'The time has come.', 84 | 'The time has come.', 85 | 'The time is now.', 86 | 'Just go.', 87 | 'Go.', 88 | 'GO!', 89 | "I don't care how."] -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trinker/sentimentpy/ce960456d5d9ac4c211e910dd3d379fc895d2d9b/__init__.py -------------------------------------------------------------------------------- /bin/sentimentpy_logo/py_sentimentpy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trinker/sentimentpy/ce960456d5d9ac4c211e910dd3d379fc895d2d9b/bin/sentimentpy_logo/py_sentimentpy.png -------------------------------------------------------------------------------- /bin/sentimentpy_logo/py_sentimentpya.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trinker/sentimentpy/ce960456d5d9ac4c211e910dd3d379fc895d2d9b/bin/sentimentpy_logo/py_sentimentpya.png -------------------------------------------------------------------------------- /bin/sentimentpy_logo/py_sentimentpyb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trinker/sentimentpy/ce960456d5d9ac4c211e910dd3d379fc895d2d9b/bin/sentimentpy_logo/py_sentimentpyb.png -------------------------------------------------------------------------------- /bin/sentimentpy_logo/py_sentimentr.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trinker/sentimentpy/ce960456d5d9ac4c211e910dd3d379fc895d2d9b/bin/sentimentpy_logo/py_sentimentr.pptx -------------------------------------------------------------------------------- /bin/sentimentpy_logo/resize_icon.txt: 
-------------------------------------------------------------------------------- 1 | ffmpeg -i py_sentimentpya.png -vf scale=150:-1 py_sentimentpy.png 2 | 3 | convert py_sentimentpya.png -transparent white -resize 25% -crop 0x0-30-30 py_sentimentpy.png 4 | convert py_sentimentpya.png -transparent white -resize 16% -crop 0x0-18-19 py_sentimentpyb.png 5 | -------------------------------------------------------------------------------- /sentimentpy.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 4 10 | Encoding: UTF-8 11 | 12 | RnwWeave: knitr 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /sentimentpy/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trinker/sentimentpy/ce960456d5d9ac4c211e910dd3d379fc895d2d9b/sentimentpy/__init__.py -------------------------------------------------------------------------------- /sentimentpy/split_sentences.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon Nov 5 19:07:23 2018 4 | 5 | @author: trinker 6 | """ 7 | 8 | import re 9 | import numpy as np 10 | 11 | 12 | abbr_rep_1_json = { 13 | "Titles": [ 14 | "[mM]r", 15 | "[mM]rs", 16 | "[mM]s", 17 | "[dD]r", 18 | "[pP]rof", 19 | "[sS]en", 20 | "[rR]ep", 21 | "[rR]ev", 22 | "[gG]ov", 23 | "[aA]tty", 24 | "[sS]upt", 25 | "[dD]et", 26 | "[rR]ev", 27 | "[cC]ol", 28 | "[gG]en", 29 | "[lL]t", 30 | "[cC]mdr", 31 | "[aA]dm", 32 | "[cC]apt", 33 | "[sS]gt", 34 | "[cC]pl", 35 | "[mM]aj" 36 | ], 37 | "Entities": [ 38 | "[dD]ept", 39 | "[uU]niv", 40 | "[uU]ni", 41 | "[aA]ssn" 42 | ], 43 | "Misc": [ 44 | "[vV]s", 45 | "[mM]t" 46 | ], 47 | "Streets": [ 
48 | "[sS]t" 49 | ] 50 | } 51 | 52 | 53 | abbr_rep_2_json = { 54 | "Titles": [ 55 | "[jJ]r", 56 | "[sS]r" 57 | ], 58 | "Entities": [ 59 | "[bB]ros", 60 | "[iI]nc", 61 | "[lL]td", 62 | "[cC]o", 63 | "[cC]orp", 64 | "[pP]lc" 65 | ], 66 | "Months": [ 67 | "[jJ]an", 68 | "[fF]eb", 69 | "[mM]ar", 70 | "[aA]pr", 71 | "[mM]ay", 72 | "[jJ]un", 73 | "[jJ]ul", 74 | "[aA]ug", 75 | "[sS]ep", 76 | "[oO]ct", 77 | "[nN]ov", 78 | "[dD]ec", 79 | "[sS]ept" 80 | ], 81 | "Days": [ 82 | "[mM]on", 83 | "[tT]ue", 84 | "[wW]ed", 85 | "[tT]hu", 86 | "[fF]ri", 87 | "[sS]at", 88 | "[sS]un" 89 | ], 90 | "Misc": [ 91 | "[eE]tc", 92 | "[eE]sp", 93 | "[cC]f", 94 | "[aA]l" 95 | ], 96 | "Streets": [ 97 | "[aA]ve", 98 | "[bB]ld", 99 | "[bB]lvd", 100 | "[cC]l", 101 | "[cC]t", 102 | "[cC]res", 103 | "[rR]d" 104 | ], 105 | "Measurement": [ 106 | "[fF]t", 107 | "[gG]al", 108 | "[mM]i", 109 | "[tT]bsp", 110 | "[tT]sp", 111 | "[yY]d", 112 | "[qQ]t", 113 | "[sS]q", 114 | "[pP]t", 115 | "[lL]b", 116 | "[lL]bs" 117 | ] 118 | } 119 | 120 | 121 | 122 | period_reg = '{}|{}|{}|{}'.format( 123 | r"(?:(?<=[a-z])\.\s(?=[a-z]\.))", 124 | r"(?:(?<=([ .][a-z]))\.)(?!(?:\s[A-Z]|$)|(?:\s\s))", 125 | r"(?:(?<=[A-Z])\.(?=\s??[A-Z]\.))", 126 | r"(?:(?<=[A-Z])\.(?!\s+[A-Z][A-Za-z]))" 127 | ) 128 | 129 | 130 | 131 | abbr_rep_1 = [item for sublist in list(abbr_rep_1_json.values()) for item in sublist] 132 | abbr_rep_1_results = [] 133 | 134 | for i in range(len(abbr_rep_1)): 135 | abbr_rep_1_results.append(r"((?<=\b({}))\.)".format(abbr_rep_1[i])) 136 | 137 | 138 | 139 | abbr_rep_2 = [item for sublist in list(abbr_rep_2_json.values()) for item in sublist] 140 | abbr_rep_2_results = [] 141 | 142 | for i in range(len(abbr_rep_2)): 143 | abbr_rep_2_results.append(r"((?<=\b({}))\.(?!\s+[A-Z]))".format(abbr_rep_2[i])) 144 | 145 | 146 | sent_regex = "{}|{}|{}|({})".format( 147 | "|".join(abbr_rep_1_results), 148 | "|".join(abbr_rep_2_results), 149 | period_reg, 150 | r'\.(?=\d+)' 151 | ) 152 | 153 | 154 | 155 | 156 | ## This works 
on a single string. Need to loop through and apply.
157 | def break_sentence(x):
158 | 
159 |     y = re.sub(
160 |         pattern = r'([Pp])(\.)(\s*[Ss])(\.)',
161 |         repl = r'\1<<<DOT>>>\3<<<DOT>>>',
162 |         string = x.strip()
163 |     )
164 | 
165 |     y = re.sub(
166 |         pattern = sent_regex,
167 |         repl = "<<<DOT>>>",
168 |         string = y
169 |     )
170 | 
171 |     y = re.sub(
172 |         pattern = r'(\b[Nn]o)(\.)(\s+\d)',
173 |         repl = r'\1<<<DOT>>>\3',
174 |         string = y
175 |     )
176 | 
177 |     y = re.sub(
178 |         pattern = r'(\b\d+\s+in)(\.)(\s[a-z])',
179 |         repl = r'\1<<<DOT>>>\3',
180 |         string = y
181 |     )
182 | 
183 |     y = re.sub(
184 |         pattern = r'([?.!]+)([\'])([^,])',
185 |         repl = r'<<<SQ>>>\1 \3',
186 |         string = y
187 |     )
188 | 
189 |     y = re.sub(
190 |         pattern = r'([?.!]+)(["])([^,])',
191 |         repl = r'<<<DQ>>>\1 \3',
192 |         string = y
193 |     )
194 | 
195 |     ## middle name handling
196 |     y = re.sub(
197 |         pattern = r'(\b[A-Z][a-z]+\s[A-Z])(\.)(\s[A-Z][a-z]+\b)',
198 |         repl = r'\1<<<DOT>>>\3',
199 |         string = y
200 |     )
201 | 
202 |     ## 2 middle names
203 |     y = re.sub(
204 |         pattern = r'(\b[A-Z][a-z]+\s[A-Z])(\.)(\s[A-Z])(\.)(\s[A-Z][a-z]+\b)',
205 |         repl = r'\1<<<DOT>>>\3<<<DOT>>>\5',
206 |         string = y
207 |     )
208 | 
209 |     y = re.split(
210 |         pattern = r"{}{}".format(
211 |             r"(?:(?<=[?.!])|(?<=[?.!]['\"]))",
212 |             r"\s*"
213 |         ),
214 |         string = y
215 |     )
216 | 
217 |     ## the zero-width split also breaks where the space after the
218 |     ## terminal punctuation is missing; drop any empty pieces
219 |     return [i for i in y if i.strip()]
220 | 
221 | 
222 | def break_sentences(x):
223 |     """Apply break_sentence to each string in a list of strings."""
224 |     return [break_sentence(i) for i in x]
225 | 
226 | 
227 | ## ---- swap the placeholder tokens back for the original characters ----
228 | 
229 | 
230 | def restore_sentence(x):
231 |     """Restore the <<<DOT>>>, <<<SQ>>>, and <<<DQ>>> placeholders to the
232 |     period, single-quote, and double-quote characters that break_sentence
233 |     masked prior to splitting.
234 |     """
235 | 
236 |     y = re.sub(
237 |         pattern = r'<<<DOT>>>',
238 |         repl = r'.',
239 |         string = x.strip()
240 |     )
241 | 
242 |     y = re.sub(
243 |         pattern = r'(<<<SQ>>>)([?.!]+)',
244 |         repl = r"\2'",
245 |         string = y
246 |     )
247 | 
248 |     y = re.sub(
249 |         pattern = r'(<<<DQ>>>)([?.!]+)',
250 |         repl = r'\2"',
251 |         string = y
252 |     )
253 | 
254 |     return y
255 | 
256 | 
257 | 
258 | 
259 | def split_sentences(x):
260 | 
261 |     y = break_sentences(x)
262 | 
263 |     element_id = np.repeat(range(len(y)), [len(i) for i in y])
264 |     sentence_id = [range(len(i)) for i in y]
265 |     sentence_id = [item for sublist in sentence_id for item in sublist]
266 | 
267 |     # locs = (np.cumsum([len(x) for x in y]) + 1)[:-1]
268 |     ## TO DO: this should return a pandas object with an element_id, sentence_id, and the text
269 | 
270 |     sents = [restore_sentence(sentence).strip() for element in
y for sentence in element]
271 |     return sents
272 | 
273 | ## Identical to ^^^
274 | # =============================================================================
275 | # list_of_words = []
276 | # for element in y:
277 | #     for sentence in element:
278 | #         list_of_words.append(restore_sentence(sentence))
279 | # 
280 | # list_of_words
281 | # =============================================================================
282 | 
283 | 
284 | if __name__ == '__main__':
285 |     # --- examples -------
286 |     s = [
287 |         ' I like you. P.S. I like carrots too mrs. dunbar. Well let\'s go to 100th st. around the corner. ',
288 |         'Hello Dr. Livingstone. How are you?',
289 |         'This is sill an incomplete thou.'
290 | 
291 |     ]
292 | 
293 |     print(split_sentences(s))
294 | 
295 |     x = [
296 |         " ".join(
297 |             ["Mr. Brown comes! He says hello. i give him coffee. i will ",
298 |              "go at 5 p. m. eastern time. Or somewhere in between!go there"
299 |             ]),
300 |         " ".join(
301 |             ["Marvin K. Mooney Will You Please Go Now!", "The time has come.",
302 |              "The time has come. The time is now. Just go. Go. GO!",
303 |              "I don't care how."
304 |             ])
305 |     ]
306 | 
307 |     print(split_sentences(x))
308 | 
309 | 
310 | 
311 | 
312 | 
313 | 
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | import codecs
2 | import os
3 | from setuptools import setup, find_packages
4 | 
5 | HERE = os.path.abspath(os.path.dirname(__file__))
6 | def read(*parts):
7 |     """
8 |     Build an absolute path from *parts* and return the contents of the
9 |     resulting file. Assume UTF-8 encoding.
10 |     """
11 |     with codecs.open(os.path.join(HERE, *parts), "rb", "utf-8") as f:
12 |         return f.read()
13 | 
14 | 
15 | setup(
16 |     name = 'sentimentpy',
17 |     packages = find_packages(exclude=['tests*']),  # discover packages automatically to avoid typo/transposition errors
18 |     include_package_data=True,
19 |     version = '2.7.0',
20 |     description = 'sentimentpy: Calculate Text Polarity Sentiment',
21 |     long_description = read("README.rst"),
22 |     long_description_content_type = 'text/x-rst',
23 |     author = 'Tyler W. Rinker',
24 |     author_email = 'tyler.rinker@gmail.com',
25 |     license = 'MIT License: http://opensource.org/licenses/MIT',
26 |     url = 'https://github.com/trinker/sentimentpy',
27 |     download_url = 'https://github.com/trinker/sentimentpy/archive/master.zip',
28 |     keywords = ['sentiment'],
29 |     classifiers = ['Development Status :: 4 - Beta', 'Intended Audience :: Science/Research',
30 |                    'License :: OSI Approved :: MIT License', 'Natural Language :: English',
31 |                    'Programming Language :: Python :: 3.6', 'Topic :: Scientific/Engineering :: Artificial Intelligence',
32 |                    'Topic :: Scientific/Engineering :: Information Analysis', 'Topic :: Text Processing :: Linguistic',
33 |                    'Topic :: Text Processing :: General'],
34 | )
35 | 
--------------------------------------------------------------------------------
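
The protect/split/restore strategy that ``sentimentpy/split_sentences.py`` builds up (mask abbreviation periods with a placeholder token, split on terminal punctuation, then swap the token back) can be sketched in a few self-contained lines. This is an illustrative miniature only, not the package's API; the abbreviation subset, the placeholder token, and the name ``split_sentences_mini`` are hypothetical:

```python
import re

# Tiny subset of abbreviations purely for illustration; the real module builds
# a much larger alternation from its abbreviation tables.
ABBREVIATIONS = r"(?:[mM]r|[mM]rs|[dD]r|[sS]t)"
PLACEHOLDER = "<<<DOT>>>"  # hypothetical placeholder token


def split_sentences_mini(text):
    """Protect abbreviation periods, split on terminal punctuation, restore."""
    # 1. protect: swap periods that follow a known abbreviation for a token
    protected = re.sub(r"(\b{})\.".format(ABBREVIATIONS), r"\1" + PLACEHOLDER, text)
    # 2. split: break after ., !, or ? followed by whitespace
    pieces = re.split(r"(?<=[.!?])\s+", protected.strip())
    # 3. restore: put the protected periods back
    return [p.replace(PLACEHOLDER, ".") for p in pieces if p]


print(split_sentences_mini("Hello Dr. Livingstone. How are you?"))
# → ['Hello Dr. Livingstone.', 'How are you?']
```

The real module goes further than this sketch: its split is zero-width so sentences still break when the space after the punctuation is missing ("between!go"), and quotes, initials, and measurements each get their own placeholder passes.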