` tag I will be looking for `<div>` and `<section>` tags with commonly used `id`s.
153 |
154 | ```python
155 | # These names commonly hold the article text.
156 | common_names = ["artic", "summary", "cont", "note", "cuerpo", "body"]
157 |
158 | # If the article is too short we look somewhere else.
159 | if len(article) <= 650:
160 |
161 | for tag in soup.find_all(["div", "section"]):
162 |
163 |         tag_id = tag.get("id", "").lower()  # .get() avoids a KeyError on tags without an id.
164 |
165 | for item in common_names:
166 | if item in tag_id:
167 |                 # We make sure to keep the longest match.
168 | if len(tag.text) >= len(article):
169 | article = tag.text
170 | ```
171 |
172 | That increased the accuracy quite a bit. I then repeated the code, but this time looking for the `class` attribute instead of the `id` attribute.
173 |
174 | ```python
175 | # The article is still too short, let's try one more time.
176 | if len(article) <= 650:
177 |
178 | for tag in soup.find_all(["div", "section"]):
179 |
180 |         tag_class = "".join(tag.get("class", [])).lower()  # .get() avoids a KeyError on tags without a class.
181 |
182 | for item in common_names:
183 | if item in tag_class:
184 |                 # We make sure to keep the longest match.
185 | if len(tag.text) >= len(article):
186 | article = tag.text
187 | ```
188 |
189 | Using all the previous methods greatly increased the overall accuracy of the scraper. In some cases I used partial words that share the same letters in English and Spanish (artic -> article/articulo). The scraper was now compatible with all the URLs I tested.
190 |
191 | We make a final check: if the article is still too short we abort the process and move to the next URL, otherwise we move on to the summary algorithm.
192 |
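A sketch of that final gate, assuming the same 650-character threshold the scraper uses (the real control flow lives inside the bot's submissions loop):

```python
# If the article is still too short we give up on this URL.
if len(article_body) <= 650:
    continue  # Move on to the next submission.

summary_dict = summary.get_summary(article_body)
```
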
193 | ## Summary Algorithm
194 |
195 | This algorithm was designed to work primarily on articles written in Spanish. It consists of several steps:
196 |
197 | 1. Reformat and clean the original article by removing all excess whitespace.
198 | 2. Make a copy of the original article and remove all commonly used words from it.
199 | 3. Split the copied article into words and score each word.
200 | 4. Split the original article into sentences and score each sentence using the scores from the words.
201 | 5. Take the top 5 sentences and top 5 words and return them in chronological order.
202 |
203 | Before starting out we need to initialize the `spaCy` library.
204 |
205 | ```python
206 | NLP = spacy.load("es_core_news_sm")
207 | ```
208 |
209 | That line of code loads the `Spanish` model, which is the one I use the most. If you are using another language please refer to the `Requirements` section to learn how to install the appropriate model.
210 |
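For example, for English articles you would load the small English model instead (assuming it was installed beforehand with `python -m spacy download en_core_web_sm`):

```python
NLP = spacy.load("en_core_web_sm")
```
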
211 | ### Clean the Article
212 |
213 | When extracting the text from the article we usually get a lot of whitespace, mostly from line breaks (`\n`).
214 |
215 | We split the text by that character, strip the whitespace from each line and join the text back together. This is not strictly required but it helps a lot when debugging the whole process.
216 |
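A minimal sketch of that cleanup, written from the description above (the project's own `clean_article()` may differ slightly):

```python
def clean_article(article):
    """Strips whitespace from every line and joins the text back together."""
    lines = [line.strip() for line in article.split("\n")]
    return " ".join(line for line in lines if line)
```
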
217 | ### Remove Common and Stop Words
218 |
219 | At the top of the script we declare the paths of the stop words text files. These stop words will be added to a `set`, guaranteeing no duplicates.
220 |
221 | I also added a list with some Spanish and English words that are not stop words but don't add anything substantial to the article. My personal preference was to hardcode them in lowercase form.
222 |
223 | Then I added a copy of each word in uppercase and title case, which means the `set` will grow to up to 3 times its original size.
224 |
225 | ```python
226 | with open(ES_STOPWORDS_FILE, "r", encoding="utf-8") as temp_file:
227 | for word in temp_file.read().splitlines():
228 | COMMON_WORDS.add(word)
229 |
230 | with open(EN_STOPWORDS_FILE, "r", encoding="utf-8") as temp_file:
231 | for word in temp_file.read().splitlines():
232 | COMMON_WORDS.add(word)
233 |
234 | extra_words = list()
235 |
236 | for word in COMMON_WORDS:
237 | extra_words.append(word.title())
238 | extra_words.append(word.upper())
239 |
240 | for word in extra_words:
241 | COMMON_WORDS.add(word)
242 | ```
243 |
244 | ### Scoring Words
245 |
246 | Before we start tokenizing our words we must first pass the cleaned article into the `NLP` pipeline, which is done with one line of code.
247 |
248 | ```python
249 | doc = NLP(cleaned_article)
250 | ```
251 |
252 | This `doc` object exposes several iterators; the 2 we will use are the tokens (obtained by iterating the `doc` itself) and `sents` (the sentences).
253 |
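For example, a quick way to inspect both:

```python
for token in doc[:10]:  # The doc itself iterates over tokens.
    print(token.text)

for sent in doc.sents:  # Sentences detected by the model.
    print(sent.text)
```
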
254 | At this point I added a personal touch to the algorithm. First I made a copy of the article and then removed all common words from it.
255 |
256 | Afterwards I used a `collections.Counter` object to do the initial scoring.
257 |
258 | Then I applied a multiplier bonus to words that start with an uppercase letter and are 4 or more characters long. Most of the time those words are names of places, people or organizations.
259 |
260 | Finally, I set the score of all words that are actually numbers to zero.
261 |
262 | ```python
263 | words_of_interest = [
264 | token.text for token in doc if token.text not in COMMON_WORDS]
265 |
266 | scored_words = Counter(words_of_interest)
267 |
268 | for word in scored_words:
269 |
270 | if word[0].isupper() and len(word) >= 4:
271 | scored_words[word] *= 3
272 |
273 | if word.isdigit():
274 | scored_words[word] = 0
275 | ```
276 |
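Since `scored_words` is a `Counter`, we can peek at the highest scored words at any point with `most_common()`; the top 5 words returned with the summary can be picked the same way:

```python
print(scored_words.most_common(5))
```
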
277 | ### Scoring Sentences
278 |
279 | Now that we have the final scores for each word it is time to score each sentence from the article.
280 |
281 | To do this we first need to split the article into sentences. I tried various approaches, including `RegEx`, but the one that worked best was the `spaCy` library.
282 |
283 | We will iterate again over the `doc` object we defined in the previous step, but this time we will iterate over its `sents` property.
284 |
285 | Something to note is that `doc.sents` yields sentence spans, and we can retrieve each sentence's text by accessing its `text` property.
286 |
287 | ```python
288 | article_sentences = [sent for sent in doc.sents]
289 |
290 | scored_sentences = list()
291 |
292 | for index, sent in enumerate(article_sentences):
293 |
294 |     # In some edge cases we get duplicated sentences; we make sure that doesn't happen.
295 |     if sent.text not in [text for score, index, text in scored_sentences]:
296 | scored_sentences.append(
297 | [score_line(sent, scored_words), index, sent.text])
298 | ```
299 |
300 | `scored_sentences` is a list of lists. Each inner list contains 3 values: the sentence score, its index and the sentence text. Those values will be used in the next step.
301 |
302 | The code below shows how the lines are scored.
303 |
304 | ```python
305 | def score_line(line, scored_words):
306 |
307 | # We remove the common words.
308 | cleaned_line = [
309 | token.text for token in line if token.text not in COMMON_WORDS]
310 |
311 |     # We now sum the total number of occurrences for all words.
312 | temp_score = 0
313 |
314 | for word in cleaned_line:
315 | temp_score += scored_words[word]
316 |
317 | # We apply a bonus score to sentences that contain financial information.
318 | line_lowercase = line.text.lower()
319 |
320 | for word in FINANCIAL_WORDS:
321 | if word in line_lowercase:
322 | temp_score *= 1.5
323 | break
324 |
325 | return temp_score
326 | ```
327 |
328 | We apply a multiplier to sentences that contain any word that refers to money or finance.
329 |
330 | ### Chronological Order
331 |
332 | This is the final part of the algorithm: we make use of the `sorted()` function to get the top sentences and then reorder them into their original positions.
333 |
334 | We sort `scored_sentences` in reverse order, which gives us the top scored sentences first. We use a small counter variable to break the loop once it hits 5. We also discard all sentences shorter than 3 characters (sometimes there are sneaky zero-width characters).
335 |
336 | ```python
337 | top_sentences = list()
338 | counter = 0
339 |
340 | for score, index, sentence in sorted(scored_sentences, reverse=True):
341 |
342 | if counter >= 5:
343 | break
344 |
345 | # When the article is too small the sentences may come empty.
346 | if len(sentence) >= 3:
347 |
348 | # We append the sentence and its index so we can sort in chronological order.
349 | top_sentences.append([index, sentence])
350 | counter += 1
351 |
352 | return [sentence for index, sentence in sorted(top_sentences)]
353 | ```
354 |
355 | At the end we use a list comprehension to return only the sentences, now sorted back into chronological order.
356 |
357 | ### Word Cloud
358 |
359 | Just for fun I added a word cloud to each article. To do so I used the `wordcloud` library. This library is very easy to use: you only need to declare a `WordCloud` object and call its `generate` method with a string of text as its parameter.
360 |
361 | ```python
362 | wc = wordcloud.WordCloud() # See cloud.py for full parameters.
363 | wc.generate(prepared_article)
364 | wc.to_file("./temp.png")
365 | ```
366 |
367 | After generating the image I uploaded it to `Imgur`, got back the URL and added it to the `Markdown` message.
368 |
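The upload itself is a single authenticated POST to the Imgur API; this is a condensed version of the `upload_image()` function from `cloud.py`:

```python
import requests

import config

url = "https://api.imgur.com/3/image"
headers = {"Authorization": "Client-ID " + config.IMGUR_CLIENT_ID}
files = {"image": open("./temp.png", "rb")}

with requests.post(url, headers=headers, files=files) as response:
    image_url = response.json()["data"]["link"]
```
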
369 | 
370 |
371 | ## Conclusion
372 |
373 | This was a very fun and interesting project to work on. I may have reinvented the wheel but at least I learned a few cool things.
374 |
375 | I'm satisfied with the overall quality of the results and I will keep tweaking the algorithm and applying compatibility enhancements.
376 |
377 | As a side note, when testing the script I accidentally requested Tweets, Facebook posts and articles written in English. All of them got acceptable outputs, but since those sites were not the target I removed them from the whitelist.
378 |
379 | After some weeks of feedback I decided to add support for the English language. This required a bit of refactoring.
380 |
381 | To make it work with other languages you only need a text file containing all the stop words of that language and to copy a few lines of code (see the Remove Common and Stop Words section).
382 |
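For example, wiring in a hypothetical third language takes a new stop words file and the same loop (`stopwords-fr.txt` is an assumption, not a file shipped with the project):

```python
# Hypothetical French stop words file; not part of this repository.
FR_STOPWORDS_FILE = "./assets/stopwords-fr.txt"

with open(FR_STOPWORDS_FILE, "r", encoding="utf-8") as temp_file:
    for word in temp_file.read().splitlines():
        COMMON_WORDS.add(word)
```
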
383 | [](https://www.patreon.com/bePatron?u=20521425)
384 |
--------------------------------------------------------------------------------
/assets/cloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PhantomInsights/summarizer/d8b4d7745ca9ba4309fc9707b7c98ae143b97a10/assets/cloud.png
--------------------------------------------------------------------------------
/assets/font.txt:
--------------------------------------------------------------------------------
1 | The name of the font used by this project is Sofia Pro Light
2 |
3 | The font is free, to download it you need to purchase it from the following link:
4 |
5 | https://www.fontspring.com/fonts/mostardesign/sofia-pro
--------------------------------------------------------------------------------
/assets/stopwords-en.txt:
--------------------------------------------------------------------------------
1 | 'll
2 | 'tis
3 | 'twas
4 | 've
5 | 10
6 | 39
7 | a
8 | a's
9 | able
10 | ableabout
11 | about
12 | above
13 | abroad
14 | abst
15 | accordance
16 | according
17 | accordingly
18 | across
19 | act
20 | actually
21 | ad
22 | added
23 | adj
24 | adopted
25 | ae
26 | af
27 | affected
28 | affecting
29 | affects
30 | after
31 | afterwards
32 | ag
33 | again
34 | against
35 | ago
36 | ah
37 | ahead
38 | ai
39 | ain't
40 | aint
41 | al
42 | all
43 | allow
44 | allows
45 | almost
46 | alone
47 | along
48 | alongside
49 | already
50 | also
51 | although
52 | always
53 | am
54 | amid
55 | amidst
56 | among
57 | amongst
58 | amoungst
59 | amount
60 | an
61 | and
62 | announce
63 | another
64 | any
65 | anybody
66 | anyhow
67 | anymore
68 | anyone
69 | anything
70 | anyway
71 | anyways
72 | anywhere
73 | ao
74 | apart
75 | apparently
76 | appear
77 | appreciate
78 | appropriate
79 | approximately
80 | aq
81 | ar
82 | are
83 | area
84 | areas
85 | aren
86 | aren't
87 | arent
88 | arise
89 | around
90 | arpa
91 | as
92 | aside
93 | ask
94 | asked
95 | asking
96 | asks
97 | associated
98 | at
99 | au
100 | auth
101 | available
102 | aw
103 | away
104 | awfully
105 | az
106 | b
107 | ba
108 | back
109 | backed
110 | backing
111 | backs
112 | backward
113 | backwards
114 | bb
115 | bd
116 | be
117 | became
118 | because
119 | become
120 | becomes
121 | becoming
122 | been
123 | before
124 | beforehand
125 | began
126 | begin
127 | beginning
128 | beginnings
129 | begins
130 | behind
131 | being
132 | beings
133 | believe
134 | below
135 | beside
136 | besides
137 | best
138 | better
139 | between
140 | beyond
141 | bf
142 | bg
143 | bh
144 | bi
145 | big
146 | bill
147 | billion
148 | biol
149 | bj
150 | bm
151 | bn
152 | bo
153 | both
154 | bottom
155 | br
156 | brief
157 | briefly
158 | bs
159 | bt
160 | but
161 | buy
162 | bv
163 | bw
164 | by
165 | bz
166 | c
167 | c'mon
168 | c's
169 | ca
170 | call
171 | came
172 | can
173 | can't
174 | cannot
175 | cant
176 | caption
177 | case
178 | cases
179 | cause
180 | causes
181 | cc
182 | cd
183 | certain
184 | certainly
185 | cf
186 | cg
187 | ch
188 | changes
189 | ci
190 | ck
191 | cl
192 | clear
193 | clearly
194 | click
195 | cm
196 | cmon
197 | cn
198 | co
199 | co.
200 | com
201 | come
202 | comes
203 | computer
204 | con
205 | concerning
206 | consequently
207 | consider
208 | considering
209 | contain
210 | containing
211 | contains
212 | copy
213 | corresponding
214 | could
215 | could've
216 | couldn
217 | couldn't
218 | couldnt
219 | course
220 | cr
221 | cry
222 | cs
223 | cu
224 | currently
225 | cv
226 | cx
227 | cy
228 | cz
229 | d
230 | dare
231 | daren't
232 | darent
233 | date
234 | de
235 | dear
236 | definitely
237 | describe
238 | described
239 | despite
240 | detail
241 | did
242 | didn
243 | didn't
244 | didnt
245 | differ
246 | different
247 | differently
248 | directly
249 | dj
250 | dk
251 | dm
252 | do
253 | does
254 | doesn
255 | doesn't
256 | doesnt
257 | doing
258 | don
259 | don't
260 | done
261 | dont
262 | doubtful
263 | down
264 | downed
265 | downing
266 | downs
267 | downwards
268 | due
269 | during
270 | dz
271 | e
272 | each
273 | early
274 | ec
275 | ed
276 | edu
277 | ee
278 | effect
279 | eg
280 | eh
281 | eight
282 | eighty
283 | either
284 | eleven
285 | else
286 | elsewhere
287 | empty
288 | end
289 | ended
290 | ending
291 | ends
292 | enough
293 | entirely
294 | er
295 | es
296 | especially
297 | et
298 | et-al
299 | etc
300 | even
301 | evenly
302 | ever
303 | evermore
304 | every
305 | everybody
306 | everyone
307 | everything
308 | everywhere
309 | ex
310 | exactly
311 | example
312 | except
313 | f
314 | face
315 | faces
316 | fact
317 | facts
318 | fairly
319 | far
320 | farther
321 | felt
322 | few
323 | fewer
324 | ff
325 | fi
326 | fifteen
327 | fifth
328 | fifty
329 | fify
330 | fill
331 | find
332 | finds
333 | fire
334 | first
335 | five
336 | fix
337 | fj
338 | fk
339 | fm
340 | fo
341 | followed
342 | following
343 | follows
344 | for
345 | forever
346 | former
347 | formerly
348 | forth
349 | forty
350 | forward
351 | found
352 | four
353 | fr
354 | free
355 | from
356 | front
357 | full
358 | fully
359 | further
360 | furthered
361 | furthering
362 | furthermore
363 | furthers
364 | fx
365 | g
366 | ga
367 | gave
368 | gb
369 | gd
370 | ge
371 | general
372 | generally
373 | get
374 | gets
375 | getting
376 | gf
377 | gg
378 | gh
379 | gi
380 | give
381 | given
382 | gives
383 | giving
384 | gl
385 | gm
386 | gmt
387 | gn
388 | go
389 | goes
390 | going
391 | gone
392 | good
393 | goods
394 | got
395 | gotten
396 | gov
397 | gp
398 | gq
399 | gr
400 | great
401 | greater
402 | greatest
403 | greetings
404 | group
405 | grouped
406 | grouping
407 | groups
408 | gs
409 | gt
410 | gu
411 | gw
412 | gy
413 | h
414 | had
415 | hadn't
416 | hadnt
417 | half
418 | happens
419 | hardly
420 | has
421 | hasn
422 | hasn't
423 | hasnt
424 | have
425 | haven
426 | haven't
427 | havent
428 | having
429 | he
430 | he'd
431 | he'll
432 | he's
433 | hed
434 | hell
435 | hello
436 | help
437 | hence
438 | her
439 | here
440 | here's
441 | hereafter
442 | hereby
443 | herein
444 | heres
445 | hereupon
446 | hers
447 | herself
448 | herse”
449 | hes
450 | hi
451 | hid
452 | high
453 | higher
454 | highest
455 | him
456 | himself
457 | himse”
458 | his
459 | hither
460 | hk
461 | hm
462 | hn
463 | home
464 | homepage
465 | hopefully
466 | how
467 | how'd
468 | how'll
469 | how's
470 | howbeit
471 | however
472 | hr
473 | ht
474 | htm
475 | html
476 | http
477 | hu
478 | hundred
479 | i
480 | i'd
481 | i'll
482 | i'm
483 | i've
484 | i.e.
485 | id
486 | ie
487 | if
488 | ignored
489 | ii
490 | il
491 | ill
492 | im
493 | immediate
494 | immediately
495 | importance
496 | important
497 | in
498 | inasmuch
499 | inc
500 | inc.
501 | indeed
502 | index
503 | indicate
504 | indicated
505 | indicates
506 | information
507 | inner
508 | inside
509 | insofar
510 | instead
511 | int
512 | interest
513 | interested
514 | interesting
515 | interests
516 | into
517 | invention
518 | inward
519 | io
520 | iq
521 | ir
522 | is
523 | isn
524 | isn't
525 | isnt
526 | it
527 | it'd
528 | it'll
529 | it's
530 | itd
531 | itll
532 | its
533 | itself
534 | itse”
535 | ive
536 | j
537 | je
538 | jm
539 | jo
540 | join
541 | jp
542 | just
543 | k
544 | ke
545 | keep
546 | keeps
547 | kept
548 | keys
549 | kg
550 | kh
551 | ki
552 | kind
553 | km
554 | kn
555 | knew
556 | know
557 | known
558 | knows
559 | kp
560 | kr
561 | kw
562 | ky
563 | kz
564 | l
565 | la
566 | large
567 | largely
568 | last
569 | lately
570 | later
571 | latest
572 | latter
573 | latterly
574 | lb
575 | lc
576 | least
577 | length
578 | less
579 | lest
580 | let
581 | let's
582 | lets
583 | li
584 | like
585 | liked
586 | likely
587 | likewise
588 | line
589 | little
590 | lk
591 | ll
592 | long
593 | longer
594 | longest
595 | look
596 | looking
597 | looks
598 | low
599 | lower
600 | lr
601 | ls
602 | lt
603 | ltd
604 | lu
605 | lv
606 | ly
607 | m
608 | ma
609 | made
610 | mainly
611 | make
612 | makes
613 | making
614 | man
615 | many
616 | may
617 | maybe
618 | mayn't
619 | maynt
620 | mc
621 | md
622 | me
623 | mean
624 | means
625 | meantime
626 | meanwhile
627 | member
628 | members
629 | men
630 | merely
631 | mg
632 | mh
633 | microsoft
634 | might
635 | might've
636 | mightn't
637 | mightnt
638 | mil
639 | mill
640 | million
641 | mine
642 | minus
643 | miss
644 | mk
645 | ml
646 | mm
647 | mn
648 | mo
649 | more
650 | moreover
651 | most
652 | mostly
653 | move
654 | mp
655 | mq
656 | mr
657 | mrs
658 | ms
659 | msie
660 | mt
661 | mu
662 | much
663 | mug
664 | must
665 | must've
666 | mustn't
667 | mustnt
668 | mv
669 | mw
670 | mx
671 | my
672 | myself
673 | myse”
674 | mz
675 | n
676 | na
677 | name
678 | namely
679 | nay
680 | nc
681 | nd
682 | ne
683 | near
684 | nearly
685 | necessarily
686 | necessary
687 | need
688 | needed
689 | needing
690 | needn't
691 | neednt
692 | needs
693 | neither
694 | net
695 | netscape
696 | never
697 | neverf
698 | neverless
699 | nevertheless
700 | new
701 | newer
702 | newest
703 | next
704 | nf
705 | ng
706 | ni
707 | nine
708 | ninety
709 | nl
710 | no
711 | no-one
712 | nobody
713 | non
714 | none
715 | nonetheless
716 | noone
717 | nor
718 | normally
719 | nos
720 | not
721 | noted
722 | nothing
723 | notwithstanding
724 | novel
725 | now
726 | nowhere
727 | np
728 | nr
729 | nu
730 | null
731 | number
732 | numbers
733 | nz
734 | o
735 | obtain
736 | obtained
737 | obviously
738 | of
739 | off
740 | often
741 | oh
742 | ok
743 | okay
744 | old
745 | older
746 | oldest
747 | om
748 | omitted
749 | on
750 | once
751 | one
752 | one's
753 | ones
754 | only
755 | onto
756 | open
757 | opened
758 | opening
759 | opens
760 | opposite
761 | or
762 | ord
763 | order
764 | ordered
765 | ordering
766 | orders
767 | org
768 | other
769 | others
770 | otherwise
771 | ought
772 | oughtn't
773 | oughtnt
774 | our
775 | ours
776 | ourselves
777 | out
778 | outside
779 | over
780 | overall
781 | owing
782 | own
783 | p
784 | pa
785 | page
786 | pages
787 | part
788 | parted
789 | particular
790 | particularly
791 | parting
792 | parts
793 | past
794 | pe
795 | per
796 | perhaps
797 | pf
798 | pg
799 | ph
800 | pk
801 | pl
802 | place
803 | placed
804 | places
805 | please
806 | plus
807 | pm
808 | pmid
809 | pn
810 | point
811 | pointed
812 | pointing
813 | points
814 | poorly
815 | possible
816 | possibly
817 | potentially
818 | pp
819 | pr
820 | predominantly
821 | present
822 | presented
823 | presenting
824 | presents
825 | presumably
826 | previously
827 | primarily
828 | probably
829 | problem
830 | problems
831 | promptly
832 | proud
833 | provided
834 | provides
835 | pt
836 | put
837 | puts
838 | pw
839 | py
840 | q
841 | qa
842 | que
843 | quickly
844 | quite
845 | qv
846 | r
847 | ran
848 | rather
849 | rd
850 | re
851 | readily
852 | really
853 | reasonably
854 | recent
855 | recently
856 | ref
857 | refs
858 | regarding
859 | regardless
860 | regards
861 | related
862 | relatively
863 | research
864 | reserved
865 | respectively
866 | resulted
867 | resulting
868 | results
869 | right
870 | ring
871 | ro
872 | room
873 | rooms
874 | round
875 | ru
876 | run
877 | rw
878 | s
879 | sa
880 | said
881 | same
882 | saw
883 | say
884 | saying
885 | says
886 | sb
887 | sc
888 | sd
889 | se
890 | sec
891 | second
892 | secondly
893 | seconds
894 | section
895 | see
896 | seeing
897 | seem
898 | seemed
899 | seeming
900 | seems
901 | seen
902 | sees
903 | self
904 | selves
905 | sensible
906 | sent
907 | serious
908 | seriously
909 | seven
910 | seventy
911 | several
912 | sg
913 | sh
914 | shall
915 | shan't
916 | shant
917 | she
918 | she'd
919 | she'll
920 | she's
921 | shed
922 | shell
923 | shes
924 | should
925 | should've
926 | shouldn
927 | shouldn't
928 | shouldnt
929 | show
930 | showed
931 | showing
932 | shown
933 | showns
934 | shows
935 | si
936 | side
937 | sides
938 | significant
939 | significantly
940 | similar
941 | similarly
942 | since
943 | sincere
944 | site
945 | six
946 | sixty
947 | sj
948 | sk
949 | sl
950 | slightly
951 | sm
952 | small
953 | smaller
954 | smallest
955 | sn
956 | so
957 | some
958 | somebody
959 | someday
960 | somehow
961 | someone
962 | somethan
963 | something
964 | sometime
965 | sometimes
966 | somewhat
967 | somewhere
968 | soon
969 | sorry
970 | specifically
971 | specified
972 | specify
973 | specifying
974 | sr
975 | st
976 | state
977 | states
978 | still
979 | stop
980 | strongly
981 | su
982 | sub
983 | substantially
984 | successfully
985 | such
986 | sufficiently
987 | suggest
988 | sup
989 | sure
990 | sv
991 | sy
992 | system
993 | sz
994 | t
995 | t's
996 | take
997 | taken
998 | taking
999 | tc
1000 | td
1001 | tell
1002 | ten
1003 | tends
1004 | test
1005 | text
1006 | tf
1007 | tg
1008 | th
1009 | than
1010 | thank
1011 | thanks
1012 | thanx
1013 | that
1014 | that'll
1015 | that's
1016 | that've
1017 | thatll
1018 | thats
1019 | thatve
1020 | the
1021 | their
1022 | theirs
1023 | them
1024 | themselves
1025 | then
1026 | thence
1027 | there
1028 | there'd
1029 | there'll
1030 | there're
1031 | there's
1032 | there've
1033 | thereafter
1034 | thereby
1035 | thered
1036 | therefore
1037 | therein
1038 | therell
1039 | thereof
1040 | therere
1041 | theres
1042 | thereto
1043 | thereupon
1044 | thereve
1045 | these
1046 | they
1047 | they'd
1048 | they'll
1049 | they're
1050 | they've
1051 | theyd
1052 | theyll
1053 | theyre
1054 | theyve
1055 | thick
1056 | thin
1057 | thing
1058 | things
1059 | think
1060 | thinks
1061 | third
1062 | thirty
1063 | this
1064 | thorough
1065 | thoroughly
1066 | those
1067 | thou
1068 | though
1069 | thoughh
1070 | thought
1071 | thoughts
1072 | thousand
1073 | three
1074 | throug
1075 | through
1076 | throughout
1077 | thru
1078 | thus
1079 | til
1080 | till
1081 | tip
1082 | tis
1083 | tj
1084 | tk
1085 | tm
1086 | tn
1087 | to
1088 | today
1089 | together
1090 | too
1091 | took
1092 | top
1093 | toward
1094 | towards
1095 | tp
1096 | tr
1097 | tried
1098 | tries
1099 | trillion
1100 | truly
1101 | try
1102 | trying
1103 | ts
1104 | tt
1105 | turn
1106 | turned
1107 | turning
1108 | turns
1109 | tv
1110 | tw
1111 | twas
1112 | twelve
1113 | twenty
1114 | twice
1115 | two
1116 | tz
1117 | u
1118 | ua
1119 | ug
1120 | uk
1121 | um
1122 | un
1123 | under
1124 | underneath
1125 | undoing
1126 | unfortunately
1127 | unless
1128 | unlike
1129 | unlikely
1130 | until
1131 | unto
1132 | up
1133 | upon
1134 | ups
1135 | upwards
1136 | us
1137 | use
1138 | used
1139 | useful
1140 | usefully
1141 | usefulness
1142 | uses
1143 | using
1144 | usually
1145 | uucp
1146 | uy
1147 | uz
1148 | v
1149 | va
1150 | value
1151 | various
1152 | vc
1153 | ve
1154 | versus
1155 | very
1156 | vg
1157 | vi
1158 | via
1159 | viz
1160 | vn
1161 | vol
1162 | vols
1163 | vs
1164 | vu
1165 | w
1166 | want
1167 | wanted
1168 | wanting
1169 | wants
1170 | was
1171 | wasn
1172 | wasn't
1173 | wasnt
1174 | way
1175 | ways
1176 | we
1177 | we'd
1178 | we'll
1179 | we're
1180 | we've
1181 | web
1182 | webpage
1183 | website
1184 | wed
1185 | welcome
1186 | well
1187 | wells
1188 | went
1189 | were
1190 | weren
1191 | weren't
1192 | werent
1193 | weve
1194 | wf
1195 | what
1196 | what'd
1197 | what'll
1198 | what's
1199 | what've
1200 | whatever
1201 | whatll
1202 | whats
1203 | whatve
1204 | when
1205 | when'd
1206 | when'll
1207 | when's
1208 | whence
1209 | whenever
1210 | where
1211 | where'd
1212 | where'll
1213 | where's
1214 | whereafter
1215 | whereas
1216 | whereby
1217 | wherein
1218 | wheres
1219 | whereupon
1220 | wherever
1221 | whether
1222 | which
1223 | whichever
1224 | while
1225 | whilst
1226 | whim
1227 | whither
1228 | who
1229 | who'd
1230 | who'll
1231 | who's
1232 | whod
1233 | whoever
1234 | whole
1235 | wholl
1236 | whom
1237 | whomever
1238 | whos
1239 | whose
1240 | why
1241 | why'd
1242 | why'll
1243 | why's
1244 | widely
1245 | width
1246 | will
1247 | willing
1248 | wish
1249 | with
1250 | within
1251 | without
1252 | won
1253 | won't
1254 | wonder
1255 | wont
1256 | words
1257 | work
1258 | worked
1259 | working
1260 | works
1261 | world
1262 | would
1263 | would've
1264 | wouldn
1265 | wouldn't
1266 | wouldnt
1267 | ws
1268 | www
1269 | x
1270 | y
1271 | ye
1272 | year
1273 | years
1274 | yes
1275 | yet
1276 | you
1277 | you'd
1278 | you'll
1279 | you're
1280 | you've
1281 | youd
1282 | youll
1283 | young
1284 | younger
1285 | youngest
1286 | your
1287 | youre
1288 | yours
1289 | yourself
1290 | yourselves
1291 | youve
1292 | yt
1293 | yu
1294 | z
1295 | za
1296 | zero
1297 | zm
1298 | zr
--------------------------------------------------------------------------------
/assets/stopwords-es.txt:
--------------------------------------------------------------------------------
1 | 0
2 | 1
3 | 2
4 | 3
5 | 4
6 | 5
7 | 6
8 | 7
9 | 8
10 | 9
11 | _
12 | a
13 | actualmente
14 | acuerdo
15 | adelante
16 | ademas
17 | además
18 | adrede
19 | afirmó
20 | agregó
21 | ahi
22 | ahora
23 | ahí
24 | al
25 | algo
26 | alguna
27 | algunas
28 | alguno
29 | algunos
30 | algún
31 | alli
32 | allí
33 | alrededor
34 | ambos
35 | ampleamos
36 | antano
37 | antaño
38 | ante
39 | anterior
40 | antes
41 | apenas
42 | aproximadamente
43 | aquel
44 | aquella
45 | aquellas
46 | aquello
47 | aquellos
48 | aqui
49 | aquél
50 | aquélla
51 | aquéllas
52 | aquéllos
53 | aquí
54 | arriba
55 | arribaabajo
56 | aseguró
57 | asi
58 | así
59 | atras
60 | aun
61 | aunque
62 | ayer
63 | añadió
64 | aún
65 | b
66 | bajo
67 | bastante
68 | bien
69 | breve
70 | buen
71 | buena
72 | buenas
73 | bueno
74 | buenos
75 | c
76 | cada
77 | casi
78 | cerca
79 | cierta
80 | ciertas
81 | cierto
82 | ciertos
83 | cinco
84 | claro
85 | comentó
86 | como
87 | con
88 | conmigo
89 | conocer
90 | conseguimos
91 | conseguir
92 | considera
93 | consideró
94 | consigo
95 | consigue
96 | consiguen
97 | consigues
98 | contigo
99 | contra
100 | cosas
101 | creo
102 | cual
103 | cuales
104 | cualquier
105 | cuando
106 | cuanta
107 | cuantas
108 | cuanto
109 | cuantos
110 | cuatro
111 | cuenta
112 | cuál
113 | cuáles
114 | cuándo
115 | cuánta
116 | cuántas
117 | cuánto
118 | cuántos
119 | cómo
120 | d
121 | da
122 | dado
123 | dan
124 | dar
125 | de
126 | debajo
127 | debe
128 | deben
129 | debido
130 | decir
131 | dejó
132 | del
133 | delante
134 | demasiado
135 | demás
136 | dentro
137 | deprisa
138 | desde
139 | despacio
140 | despues
141 | después
142 | detras
143 | detrás
144 | dia
145 | dias
146 | dice
147 | dicen
148 | dicho
149 | dieron
150 | diferente
151 | diferentes
152 | dijeron
153 | dijo
154 | dio
155 | donde
156 | dos
157 | durante
158 | día
159 | días
160 | dónde
161 | e
162 | ejemplo
163 | el
164 | ella
165 | ellas
166 | ello
167 | ellos
168 | embargo
169 | empleais
170 | emplean
171 | emplear
172 | empleas
173 | empleo
174 | en
175 | encima
176 | encuentra
177 | enfrente
178 | enseguida
179 | entonces
180 | entre
181 | era
182 | erais
183 | eramos
184 | eran
185 | eras
186 | eres
187 | es
188 | esa
189 | esas
190 | ese
191 | eso
192 | esos
193 | esta
194 | estaba
195 | estabais
196 | estaban
197 | estabas
198 | estad
199 | estada
200 | estadas
201 | estado
202 | estados
203 | estais
204 | estamos
205 | estan
206 | estando
207 | estar
208 | estaremos
209 | estará
210 | estarán
211 | estarás
212 | estaré
213 | estaréis
214 | estaría
215 | estaríais
216 | estaríamos
217 | estarían
218 | estarías
219 | estas
220 | este
221 | estemos
222 | esto
223 | estos
224 | estoy
225 | estuve
226 | estuviera
227 | estuvierais
228 | estuvieran
229 | estuvieras
230 | estuvieron
231 | estuviese
232 | estuvieseis
233 | estuviesen
234 | estuvieses
235 | estuvimos
236 | estuviste
237 | estuvisteis
238 | estuviéramos
239 | estuviésemos
240 | estuvo
241 | está
242 | estábamos
243 | estáis
244 | están
245 | estás
246 | esté
247 | estéis
248 | estén
249 | estés
250 | ex
251 | excepto
252 | existe
253 | existen
254 | explicó
255 | expresó
256 | f
257 | fin
258 | final
259 | fue
260 | fuera
261 | fuerais
262 | fueran
263 | fueras
264 | fueron
265 | fuese
266 | fueseis
267 | fuesen
268 | fueses
269 | fui
270 | fuimos
271 | fuiste
272 | fuisteis
273 | fuéramos
274 | fuésemos
275 | g
276 | general
277 | gran
278 | grandes
279 | gueno
280 | h
281 | ha
282 | haber
283 | habia
284 | habida
285 | habidas
286 | habido
287 | habidos
288 | habiendo
289 | habla
290 | hablan
291 | habremos
292 | habrá
293 | habrán
294 | habrás
295 | habré
296 | habréis
297 | habría
298 | habríais
299 | habríamos
300 | habrían
301 | habrías
302 | habéis
303 | había
304 | habíais
305 | habíamos
306 | habían
307 | habías
308 | hace
309 | haceis
310 | hacemos
311 | hacen
312 | hacer
313 | hacerlo
314 | haces
315 | hacia
316 | haciendo
317 | hago
318 | han
319 | has
320 | hasta
321 | hay
322 | haya
323 | hayamos
324 | hayan
325 | hayas
326 | hayáis
327 | he
328 | hecho
329 | hemos
330 | hicieron
331 | hizo
332 | horas
333 | hoy
334 | hube
335 | hubiera
336 | hubierais
337 | hubieran
338 | hubieras
339 | hubieron
340 | hubiese
341 | hubieseis
342 | hubiesen
343 | hubieses
344 | hubimos
345 | hubiste
346 | hubisteis
347 | hubiéramos
348 | hubiésemos
349 | hubo
350 | i
351 | igual
352 | incluso
353 | indicó
354 | informo
355 | informó
356 | intenta
357 | intentais
358 | intentamos
359 | intentan
360 | intentar
361 | intentas
362 | intento
363 | ir
364 | j
365 | junto
366 | k
367 | l
368 | la
369 | lado
370 | largo
371 | las
372 | le
373 | lejos
374 | les
375 | llegó
376 | lleva
377 | llevar
378 | lo
379 | los
380 | luego
381 | lugar
382 | m
383 | mal
384 | manera
385 | manifestó
386 | mas
387 | mayor
388 | me
389 | mediante
390 | medio
391 | mejor
392 | mencionó
393 | menos
394 | menudo
395 | mi
396 | mia
397 | mias
398 | mientras
399 | mio
400 | mios
401 | mis
402 | misma
403 | mismas
404 | mismo
405 | mismos
406 | modo
407 | momento
408 | mucha
409 | muchas
410 | mucho
411 | muchos
412 | muy
413 | más
414 | mí
415 | mía
416 | mías
417 | mío
418 | míos
419 | n
420 | nada
421 | nadie
422 | ni
423 | ninguna
424 | ningunas
425 | ninguno
426 | ningunos
427 | ningún
428 | no
429 | nos
430 | nosotras
431 | nosotros
432 | nuestra
433 | nuestras
434 | nuestro
435 | nuestros
436 | nueva
437 | nuevas
438 | nuevo
439 | nuevos
440 | nunca
441 | o
442 | ocho
443 | os
444 | otra
445 | otras
446 | otro
447 | otros
448 | p
449 | pais
450 | para
451 | parece
452 | parte
453 | partir
454 | pasada
455 | pasado
456 | paìs
457 | peor
458 | pero
459 | pesar
460 | poca
461 | pocas
462 | poco
463 | pocos
464 | podeis
465 | podemos
466 | poder
467 | podria
468 | podriais
469 | podriamos
470 | podrian
471 | podrias
472 | podrá
473 | podrán
474 | podría
475 | podrían
476 | poner
477 | por
478 | por qué
479 | porque
480 | posible
481 | primer
482 | primera
483 | primero
484 | primeros
485 | principalmente
486 | pronto
487 | propia
488 | propias
489 | propio
490 | propios
491 | proximo
492 | próximo
493 | próximos
494 | pudo
495 | pueda
496 | puede
497 | pueden
498 | puedo
499 | pues
500 | q
501 | qeu
502 | que
503 | quedó
504 | queremos
505 | quien
506 | quienes
507 | quiere
508 | quiza
509 | quizas
510 | quizá
511 | quizás
512 | quién
513 | quiénes
514 | qué
515 | r
516 | raras
517 | realizado
518 | realizar
519 | realizó
520 | repente
521 | respecto
522 | s
523 | sabe
524 | sabeis
525 | sabemos
526 | saben
527 | saber
528 | sabes
529 | sal
530 | salvo
531 | se
532 | sea
533 | seamos
534 | sean
535 | seas
536 | segun
537 | segunda
538 | segundo
539 | según
540 | seis
541 | ser
542 | sera
543 | seremos
544 | será
545 | serán
546 | serás
547 | seré
548 | seréis
549 | sería
550 | seríais
551 | seríamos
552 | serían
553 | serías
554 | seáis
555 | señaló
556 | si
557 | sido
558 | siempre
559 | siendo
560 | siete
561 | sigue
562 | siguiente
563 | sin
564 | sino
565 | sobre
566 | sois
567 | sola
568 | solamente
569 | solas
570 | solo
571 | solos
572 | somos
573 | son
574 | soy
575 | soyos
576 | su
577 | supuesto
578 | sus
579 | suya
580 | suyas
581 | suyo
582 | suyos
583 | sé
584 | sí
585 | sólo
586 | t
587 | tal
588 | tambien
589 | también
590 | tampoco
591 | tan
592 | tanto
593 | tarde
594 | te
595 | temprano
596 | tendremos
597 | tendrá
598 | tendrán
599 | tendrás
600 | tendré
601 | tendréis
602 | tendría
603 | tendríais
604 | tendríamos
605 | tendrían
606 | tendrías
607 | tened
608 | teneis
609 | tenemos
610 | tener
611 | tenga
612 | tengamos
613 | tengan
614 | tengas
615 | tengo
616 | tengáis
617 | tenida
618 | tenidas
619 | tenido
620 | tenidos
621 | teniendo
622 | tenéis
623 | tenía
624 | teníais
625 | teníamos
626 | tenían
627 | tenías
628 | tercera
629 | ti
630 | tiempo
631 | tiene
632 | tienen
633 | tienes
634 | toda
635 | todas
636 | todavia
637 | todavía
638 | todo
639 | todos
640 | total
641 | trabaja
642 | trabajais
643 | trabajamos
644 | trabajan
645 | trabajar
646 | trabajas
647 | trabajo
648 | tras
649 | trata
650 | través
651 | tres
652 | tu
653 | tus
654 | tuve
655 | tuviera
656 | tuvierais
657 | tuvieran
658 | tuvieras
659 | tuvieron
660 | tuviese
661 | tuvieseis
662 | tuviesen
663 | tuvieses
664 | tuvimos
665 | tuviste
666 | tuvisteis
667 | tuviéramos
668 | tuviésemos
669 | tuvo
670 | tuya
671 | tuyas
672 | tuyo
673 | tuyos
674 | tú
675 | u
676 | ultimo
677 | un
678 | una
679 | unas
680 | uno
681 | unos
682 | usa
683 | usais
684 | usamos
685 | usan
686 | usar
687 | usas
688 | uso
689 | usted
690 | ustedes
691 | v
692 | va
693 | vais
694 | valor
695 | vamos
696 | van
697 | varias
698 | varios
699 | vaya
700 | veces
701 | ver
702 | verdad
703 | verdadera
704 | verdadero
705 | vez
706 | vosotras
707 | vosotros
708 | voy
709 | vuestra
710 | vuestras
711 | vuestro
712 | vuestros
713 | w
714 | x
715 | y
716 | ya
717 | yo
718 | z
719 | él
720 | éramos
721 | ésa
722 | ésas
723 | ése
724 | ésos
725 | ésta
726 | éstas
727 | éste
728 | éstos
729 | última
730 | últimas
731 | último
732 | últimos
--------------------------------------------------------------------------------
/assets/whitelist.txt:
--------------------------------------------------------------------------------
1 | rotativo.com.mx
2 | excelsior.com.mx
3 | yogonet.com
4 | eluniversal.com.mx
5 | nyti.ms
6 | unocero.com
7 | mexico.com
8 | thecoinrepublic.com
9 | costumbres.de
10 | bbc.com
11 | avclub.com
12 | infobae.com
13 | news24.com
14 | nasa.gov
15 | sdpnoticias.com
16 | jetnews.com.mx
17 | razon.com.mx
18 | elceo.com
19 | arenapublica.com
20 | diarioelindependiente.mx
21 | pscp.tv
22 | plumasatomicas.com
23 | regeneracion.mx
24 | mvsnoticias.com
25 | publimetro.com.mx
26 | themexico.news
27 | aristeguinoticias.com
28 | pulsoslp.com.mx
29 | diputados.gob.mx
30 | diariodequeretaro.com.mx
31 | nnc.mx
32 | frontera.info
33 | bloomberg.com
34 | lopezobrador.org.mx
35 | asisucedegto.mx
36 | xeu.mx
37 | xevt.com
38 | 24-horas.mx
39 | politico.mx
40 | festivosmexico.com.mx
41 | lavozdechile.com
42 | noticiaslapaz.com
43 | milenio.com
44 | theconservativetreehouse.com
45 | chalenoticias.mx
46 | breaking.com.mx
47 | miamiherald.com
48 | economiahoy.mx
49 | argumentopolitico.com
50 | elfinanciero.com.mx
51 | reporteroshoy.mx
52 | vanguardia.com.mx
53 | laopcion.com.mx
54 | elexpres.com
55 | elindependientedehidalgo.com.mx
56 | canalsonora.com
57 | diariocambio.com.mx
58 | nexos.com.mx
59 | newsweek.com
60 | xataka.com.mx
61 | ampproject.org
62 | zetatijuana.com
63 | brainwala.com
64 | tumblr.com
65 | sipse.com
66 | periodicocorreo.com.mx
67 | imparcialoaxaca.mx
68 | ejecentral.com.mx
69 | mas-mexico.com.mx
70 | elsoldepuebla.com.mx
71 | lasestrellas.tv
72 | coachesvoice.com
73 | psicologiaymente.com
74 | reportur.com
75 | themazatlanpost.com
76 | sg.com.mx
77 | superrucos.com
78 | elsoldeacapulco.com.mx
79 | elpais.com
80 | elmercurio.com.mx
81 | taringa.net
82 | oilandgasmagazine.com.mx
83 | proceso.com.mx
84 | lanetanoticias.com
85 | suracapulco.mx
86 | bancomundial.org
87 | cletofilia.com
88 | aztecanoticias.com.mx
89 | periodicoelmexicano.com.mx
90 | imagenradio.com.mx
91 | animalpolitico.com
92 | tiempo.com.mx
93 | forbes.com.mx
94 | eia.gov
95 | casede.org
96 | eleconomista.com.mx
97 | sinembargo.mx
98 | huffingtonpost.com.mx
99 | zocalo.com.mx
100 | www.gob.mx
101 | aem.gob.mx
102 | clipperdata.com
103 | expreso.com.mx
104 | elsoldemexico.com.mx
105 | streamable.com
106 | lacronica.com
107 | televisa.com
108 | am.com.mx
109 | mexnewz.mx
110 | beeg1.net
111 | moreloshabla.com
112 | washingtonpost.com
113 | dailywire.com
114 | soyhomosensual.com
115 | ft.com
116 | wsj.com
117 | blogspot.com
118 | wsws.org
119 | nacionunida.com
120 | society6.com
121 | telesurenglish.net
122 | independent.co.uk
123 | revistaei.cl
124 | amazon.com.mx
125 | escapadeland.com
126 | elnuevoherald.com
127 | mxcity.mx
128 | tribuna.com.mx
129 | lasillarota.com
130 | tabascohoy.com
131 | bitcoinrealcash.com
132 | informador.mx
133 | netnoticias.mx
134 | heraldodemexico.com.mx
135 | businessinsider.com
136 | sapiens.org
137 | monitoreconomico.org
138 | forbes.com
139 | elsoldelbajio.com.mx
140 | sputniknews.com
141 | versiones.com.mx
142 | quadratin.com.mx
143 | omnia.com.mx
144 | wordpress.com
145 | theregister.co.uk
146 | bbc.co.uk
147 | novedadesaca.mx
148 | dineroenimagen.com
149 | elhorizonte.mx
150 | opensocietyfoundations.org
151 | unimexicali.com
152 | asistepemex.org
153 | radioformula.com.mx
154 | reforma.com
155 | ibb.co
156 | laverdadnoticias.com
157 | nytimes.com
158 | notisistema.com
159 | reliefweb.int
160 | lavozdeperu.com
161 | abcnoticias.mx
162 | itam.mx
163 | jornada.com.mx
164 | parametria.com.mx
165 | unomasuno.com.mx
166 | commondreams.org
167 | theguardian.com
168 | sptnkne.ws
169 | josecardenas.com
170 | rascamapas.com
171 | segundoasegundo.com
172 | reporteindigo.com
173 | globo.com
174 | rasnoticias.mx
175 | maritimeherald.com
176 | jetbrains.com
177 | lopezdoriga.com
178 | cns.gob.mx
179 | livejournal.com
180 | desastre.mx
181 | mexicodesconocido.com.mx
182 | yahoo.com
183 | allerorts.de
184 | diario.mx
185 | bcsnoticias.mx
186 | noticiasdequeretaro.com.mx
187 | expansion.mx
188 | elimparcial.com
189 | cargonewsmex.com
190 | contrareplica.mx
191 | unam.mx
192 | lavozdelafrontera.com.mx
193 | terceravia.mx
194 | latercera.com
195 | acustiknoticias.com
196 | riodoce.mx
197 | adnpolitico.com
198 | fayerwayer.com
199 | horizontal.mx
200 | wradio.com.mx
201 | diariodecolima.com
202 | noticiaszmg.com
203 | elmanana.com
204 | altonivel.com.mx
205 | elsiglodetorreon.com.mx
206 | eldiariodechihuahua.mx
207 | declarenews.com
208 | reuters.com
209 | thelocal.se
210 | hongkongfp.com
211 | canoe.com
212 | indiatimes.com
213 | faroutmagazine.co.uk
214 | els5ra.com
215 | physicalfitnesscare.com
216 | bients.com
217 | udefense.info
218 | wildwechsel.de
219 | viralnewsdrift.com
220 | chinaro.ir
221 | dankpupper.com
222 | mightykingseo.com
223 | thesun.co.uk
224 | theverge.com
225 | thenewcivilrightsmovement.com
226 | truthdig.com
227 | channelnewsasia.com
228 | dailytelegraph.com.au
229 | centicsystems.com
230 | abracadabranoticias.com
231 | rawstory.com
232 | evolutionalblogs.com
233 | euronews.com
234 | haaretz.com
235 | loversofcats.com
236 | ndtv.com
237 | livetrendynews.com
238 | pakthought.com
239 | scmp.com
240 | euractiv.com
241 | aviralupdate.com
242 | france24.com
243 | theindianwire.com
244 | aljazeera.com
245 | wnobserver.com
246 | tessyinfohub.com
247 | newyorker.com
248 | kartiavelino.com
249 | newrightnetwork.com
250 | atusocialscience.ir
251 | samrattailors.com
252 | trendnewsworld.com
253 | zdnet.com
254 | nicepatogh.ir
255 | news18.com
256 | apsense.com
257 | virapars.com
258 | newsunbox.com
259 | nationalpost.com
260 | trendnewsweb.com
261 | globalnews.ca
262 | huffpost.com
263 | thenewsobservers.com
264 | bestnewsviral.com
265 | rt.com
266 | madamasr.com
267 | standard.co.uk
268 | local10.com
269 | telegraph.co.uk
270 | time8.in
271 | thehill.com
272 | timesofisrael.com
273 | dailymail.co.uk
274 | kyivpost.com
275 | indiatvnews.com
276 | talkingpointsmemo.com
277 | livemint.com
278 | sabq.org
279 | veteranstoday.com
280 | isibcase.ir
281 | dailytimes.com.pk
282 | thedailybeast.com
283 | total-croatia-news.com
284 | articlescad.com
285 | writeup.co.in
286 | gonewsviral.com
287 | thewire.in
288 | npr.org
289 | theglobeandmail.com
290 | nbcnews.com
291 | viralspicynews.com
292 | app.link
293 | trtworld.com
294 | unionjournalism.com
295 | winnaijatv.com
296 | jpost.com
297 | politico.com
298 | cnn.com
299 | walesonline.co.uk
300 | highmarksecurity.com
301 | indiatoday.in
302 | medium.com
303 | viralreportnow.com
304 | thegrowthop.com
305 | sky.com
306 | gappoo.com
307 | nymag.com
308 | viraltopiczone.com
309 | rfa.org
310 | apnews.com
311 | newindianexpress.com
312 | dailykos.com
313 | dw.com
314 | middleeastmonitor.com
315 | msn.com
316 | truescoopnews.com
317 | sbs.com.au
318 | discountbook.ir
319 | tnewst.com
320 | birminghammail.co.uk
321 | dailytrendshunter.com
322 | usnews.com
323 | dawn.com
324 | abc.net.au
325 | trendynewstime.com
326 | hindustantimes.com
327 | spacenews.com
328 | acneuro.co.uk
329 | washingtonexaminer.com
330 | cbc.ca
331 | rightsanddissent.org
332 | tribune.com.pk
333 | dailysabah.com
334 | today.com
335 | faitmain.ma
336 | dailyhive.com
337 | trendyupdatenews.com
338 | ctvnews.ca
339 | businessinsider.co.za
340 | usatoday.com
341 | viralupfeed.com
342 | indianexpress.com
343 | batask.ir
344 | foxnews.com
345 | thenews.com.pk
346 | cnbc.com
347 | newsdaily.today
348 | firstpost.com
349 | morningstaronline.co.uk
350 | ntrguadalajara.com
351 | nationalgeographic.com
352 | cronica.com.mx
353 | debate.com.mx
354 | guanajuatoinforma.com
355 | yucatan.com.mx
356 | sinlineamx.com
357 | sintesistv.com.mx
358 | laprensademonclova.com
359 | sandiegored.com
360 | turquesanews.mx
361 | enlapolitika.com
362 | lineadirectaportal.com
363 | hoyestado.com
364 | alcalorpolitico.com
365 | cafenegroportal.com
366 | noroeste.com.mx
367 | lineadirectaportal.com
368 | mediotiempo.com
369 | unotv.com
370 | criteriohidalgo.com
371 | xeva.com.mx
372 | quintafuerza.mx
373 | latinus.us
374 | verificado.com.mx
375 | lavanguardia.com
376 | nature.com
377 |
--------------------------------------------------------------------------------
/bot.py:
--------------------------------------------------------------------------------
1 | """
2 | Inits the summary bot. It starts a Reddit instance using PRAW, gets the latest posts
3 | and filters out those that have already been processed.
4 | """
5 |
6 | import praw
7 | import requests
8 | import tldextract
9 |
10 | import cloud
11 | import config
12 | import scraper
13 | import summary
14 |
15 | # We don't reply to posts which have a very small or very high reduction.
16 | MINIMUM_REDUCTION_THRESHOLD = 20
17 | MAXIMUM_REDUCTION_THRESHOLD = 68
18 |
19 | # File locations
20 | POSTS_LOG = "./processed_posts.txt"
21 | WHITELIST_FILE = "./assets/whitelist.txt"
22 | ERROR_LOG = "./error.log"
23 |
24 | # Templates.
25 | TEMPLATE = open("./templates/es.txt", "r", encoding="utf-8").read()
26 |
27 |
28 | HEADERS = {"User-Agent": "Summarizer v2.0"}
29 |
30 |
31 | def load_whitelist():
32 |     """Reads the whitelist file of domains confirmed to work with the scraper.
33 |
34 | Returns
35 | -------
36 | list
37 | A list of domains that are confirmed to have an 'article' tag.
38 |
39 | """
40 |
41 | with open(WHITELIST_FILE, "r", encoding="utf-8") as log_file:
42 | return log_file.read().splitlines()
43 |
44 |
45 | def load_log():
46 | """Reads the processed posts log file and creates it if it doesn't exist.
47 |
48 | Returns
49 | -------
50 | list
51 | A list of Reddit posts ids.
52 |
53 | """
54 |
55 | try:
56 | with open(POSTS_LOG, "r", encoding="utf-8") as log_file:
57 | return log_file.read().splitlines()
58 |
59 | except FileNotFoundError:
60 | with open(POSTS_LOG, "a", encoding="utf-8") as log_file:
61 | return []
62 |
63 |
64 | def update_log(post_id):
65 | """Updates the processed posts log with the given post id.
66 |
67 | Parameters
68 | ----------
69 | post_id : str
70 | A Reddit post id.
71 |
72 | """
73 |
74 | with open(POSTS_LOG, "a", encoding="utf-8") as log_file:
75 | log_file.write("{}\n".format(post_id))
76 |
77 |
78 | def log_error(error_message):
79 | """Updates the error log.
80 |
81 | Parameters
82 | ----------
83 | error_message : str
84 | A string containing the faulty url and the exception message.
85 |
86 | """
87 |
88 | with open(ERROR_LOG, "a", encoding="utf-8") as log_file:
89 | log_file.write("{}\n".format(error_message))
90 |
91 |
92 | def init():
93 | """Inits the bot."""
94 |
95 | reddit = praw.Reddit(client_id=config.APP_ID, client_secret=config.APP_SECRET,
96 | user_agent=config.USER_AGENT, username=config.REDDIT_USERNAME,
97 | password=config.REDDIT_PASSWORD)
98 |
99 | processed_posts = load_log()
100 | whitelist = load_whitelist()
101 |
102 | for subreddit in config.SUBREDDITS:
103 |
104 | for submission in reddit.subreddit(subreddit).new(limit=50):
105 |
106 | if submission.id not in processed_posts:
107 |
108 | clean_url = submission.url.replace("amp.", "")
109 | ext = tldextract.extract(clean_url)
110 | domain = "{}.{}".format(ext.domain, ext.suffix)
111 |
112 | if domain in whitelist:
113 |
114 | try:
115 | with requests.get(clean_url, headers=HEADERS, timeout=10) as response:
116 |
117 |                         # Most of the time the encoding is utf-8 but in edge cases
118 | # we set it to ISO-8859-1 when it is present in the HTML header.
119 | if "iso-8859-1" in response.text.lower():
120 | response.encoding = "iso-8859-1"
121 | elif response.encoding == "ISO-8859-1":
122 | response.encoding = "utf-8"
123 |
124 | html_source = response.text
125 |
126 | article_title, article_date, article_body = scraper.scrape_html(
127 | html_source)
128 |
129 | summary_dict = summary.get_summary(article_body)
130 | except Exception as e:
131 | log_error("{},{}".format(clean_url, e))
132 | update_log(submission.id)
133 | print("Failed:", submission.id)
134 | continue
135 |
136 | # To reduce low quality submissions, we only process those that made a meaningful summary.
137 | if summary_dict["reduction"] >= MINIMUM_REDUCTION_THRESHOLD and summary_dict["reduction"] <= MAXIMUM_REDUCTION_THRESHOLD:
138 |
139 | # Create a wordcloud, upload it to Imgur and get back the url.
140 | image_url = cloud.generate_word_cloud(
141 | summary_dict["article_words"])
142 |
143 | # We start creating the comment body.
144 | post_body = "\n\n".join(
145 | ["> " + item for item in summary_dict["top_sentences"]])
146 |
147 | top_words = ""
148 |
149 | for index, word in enumerate(summary_dict["top_words"]):
150 | top_words += "{}^#{} ".format(word, index+1)
151 |
152 | post_message = TEMPLATE.format(
153 | article_title, clean_url, summary_dict["reduction"], article_date, post_body, image_url, top_words)
154 |
155 | reddit.submission(submission.id).reply(post_message)
156 | update_log(submission.id)
157 | print("Replied to:", submission.id)
158 | else:
159 | update_log(submission.id)
160 | print("Skipped:", submission.id)
161 |
162 |
163 | if __name__ == "__main__":
164 |
165 | init()
166 |
--------------------------------------------------------------------------------
/cloud.py:
--------------------------------------------------------------------------------
1 | """
2 | This script generates a word cloud from the article words, uploads it to Imgur and returns the url.
3 | """
4 |
5 | import os
6 | import random
7 |
8 | import numpy as np
9 | import requests
10 | import wordcloud
11 | from PIL import Image
12 |
13 | import config
14 |
15 | MASK_FILE = "./assets/cloud.png"
16 | FONT_FILE = "./assets/sofiapro-light.otf"
17 | IMAGE_PATH = "./temp.png"
18 |
19 | COLORMAPS = ["spring", "summer", "autumn", "Wistia"]
20 |
21 | mask = np.array(Image.open(MASK_FILE))
22 |
23 |
24 | def generate_word_cloud(text):
25 | """Generates a word cloud and uploads it to Imgur.
26 |
27 | Parameters
28 | ----------
29 | text : str
30 | The text to be converted into a word cloud.
31 |
32 | Returns
33 | -------
34 | str
35 | The url generated from the Imgur API.
36 | """
37 |
38 | wc = wordcloud.WordCloud(background_color="#222222",
39 | max_words=2000,
40 | mask=mask,
41 | contour_width=2,
42 | colormap=random.choice(COLORMAPS),
43 | font_path=FONT_FILE,
44 | contour_color="white")
45 |
46 | wc.generate(text)
47 | wc.to_file(IMAGE_PATH)
48 | image_link = upload_image(IMAGE_PATH)
49 | os.remove(IMAGE_PATH)
50 |
51 | return image_link
52 |
53 |
54 | def upload_image(image_path):
55 | """Uploads an image to Imgur and returns the permanent link url.
56 |
57 | Parameters
58 | ----------
59 | image_path : str
60 | The path of the file to be uploaded.
61 |
62 | Returns
63 | -------
64 | str
65 | The url generated from the Imgur API.
66 | """
67 |
68 | url = "https://api.imgur.com/3/image"
69 | headers = {"Authorization": "Client-ID " + config.IMGUR_CLIENT_ID}
70 | files = {"image": open(IMAGE_PATH, "rb")}
71 |
72 | with requests.post(url, headers=headers, files=files) as response:
73 |
74 | # We extract the new link from the response.
75 | image_link = response.json()["data"]["link"]
76 |
77 | return image_link
78 |
--------------------------------------------------------------------------------
/cloud_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PhantomInsights/summarizer/d8b4d7745ca9ba4309fc9707b7c98ae143b97a10/cloud_example.png
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | """Required constants for the Reddit API."""
2 |
3 | # The following constants are used by the bot.
4 | REDDIT_USERNAME = ""
5 | REDDIT_PASSWORD = ""
6 |
7 | APP_ID = ""
8 | APP_SECRET = ""
9 | USER_AGENT = ""
10 |
11 | SUBREDDITS = ["mexico"]
12 |
13 | IMGUR_CLIENT_ID = ""
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PhantomInsights/summarizer/d8b4d7745ca9ba4309fc9707b7c98ae143b97a10/requirements.txt
--------------------------------------------------------------------------------
/scraper.py:
--------------------------------------------------------------------------------
1 | """
2 | This script tries to extract the article title, date and body from an HTML string.
3 | """
4 |
5 | from datetime import datetime
6 |
7 | from bs4 import BeautifulSoup
8 |
9 | # We don't process articles that have fewer characters than this.
10 | ARTICLE_MINIMUM_LENGTH = 650
11 |
12 |
13 | def scrape_html(html_source):
14 | """Tries to scrape the article from the given HTML source.
15 |
16 | Parameters
17 | ----------
18 | html_source : str
19 | The html source of the article.
20 |
21 | Returns
22 | -------
23 | tuple
24 | The article title, date and body.
25 |
26 | """
27 |
28 | # Very often the text between tags comes together, we add an artificial newline to each common tag.
29 |     for item in ["</p>", "</blockquote>", "<br>", "</div>", "<br/>"]:
30 | html_source = html_source.replace(item, item+"\n")
31 |
32 |     # We create a BeautifulSoup object and remove the unnecessary tags.
33 | soup = BeautifulSoup(html_source, "html5lib")
34 |
35 | # Then we extract the title and the article tags.
36 | article_title = soup.find("title").text.replace("\n", " ").strip()
37 |
38 | # If our title is too short we fallback to the first h1 tag.
39 | if len(article_title) <= 5:
40 | article_title = soup.find("h1").text.replace("\n", " ").strip()
41 |
42 | article_date = ""
43 |
44 | # We look for the first meta tag that has the word 'time' in it.
45 | for item in soup.find_all("meta"):
46 |
47 | if "time" in item.get("property", ""):
48 |
49 | clean_date = item["content"].split("+")[0].replace("Z", "")
50 |
51 | # Use your preferred time formatting.
52 | article_date = "{:%d-%m-%Y a las %H:%M:%S}".format(
53 | datetime.fromisoformat(clean_date))
54 | break
55 |
56 | # If we didn't find any meta tag with a datetime we look for a 'time' tag.
57 | if len(article_date) <= 5:
58 | try:
59 | article_date = soup.find("time").text.strip()
60 | except:
61 | pass
62 |
63 | # We remove some tags that add noise.
64 | [tag.extract() for tag in soup.find_all(
65 | ["script", "img", "ol", "ul", "time", "h1", "h2", "h3", "iframe", "style", "form", "footer", "figcaption"])]
66 |
67 | # These class names/ids are known to add noise or duplicate text to the article.
68 | noisy_names = ["image", "img", "video", "subheadline", "editor", "fondea", "resumen", "tags", "sidebar", "comment",
69 | "entry-title", "breaking_content", "pie", "tract", "caption", "tweet", "expert", "previous", "next",
70 | "compartir", "rightbar", "mas", "copyright", "instagram-media", "cookie", "paywall", "mainlist", "sitelist"]
71 |
72 | for tag in soup.find_all("div"):
73 |
74 | try:
75 | tag_id = tag["id"].lower()
76 |
77 | for item in noisy_names:
78 | if item in tag_id:
79 | tag.extract()
80 | except:
81 | pass
82 |
83 | for tag in soup.find_all(["div", "p", "blockquote"]):
84 |
85 | try:
86 | tag_class = "".join(tag["class"]).lower()
87 |
88 | for item in noisy_names:
89 | if item in tag_class:
90 | tag.extract()
91 | except:
92 | pass
93 |
94 | # These names commonly hold the article text.
95 | common_names = ["artic", "summary", "cont", "note", "cuerpo", "body"]
96 |
97 | article_body = ""
98 |
99 | # Sometimes we have more than one article tag. We are going to grab the longest one.
100 | for article_tag in soup.find_all("article"):
101 |
102 | if len(article_tag.text) >= len(article_body):
103 | article_body = article_tag.text
104 |
105 | # The article is too short, let's try to find it in another tag.
106 | if len(article_body) <= ARTICLE_MINIMUM_LENGTH:
107 |
108 | for tag in soup.find_all(["div", "section"]):
109 |
110 | try:
111 | tag_id = tag["id"].lower()
112 |
113 | for item in common_names:
114 | if item in tag_id:
115 | # We guarantee to get the longest div.
116 | if len(tag.text) >= len(article_body):
117 | article_body = tag.text
118 | except:
119 | pass
120 |
121 | # The article is still too short, let's try one more time.
122 | if len(article_body) <= ARTICLE_MINIMUM_LENGTH:
123 |
124 | for tag in soup.find_all(["div", "section"]):
125 |
126 | try:
127 | tag_class = "".join(tag["class"]).lower()
128 |
129 | for item in common_names:
130 | if item in tag_class:
131 | # We guarantee to get the longest div.
132 | if len(tag.text) >= len(article_body):
133 | article_body = tag.text
134 | except:
135 | pass
136 |
137 | return article_title, article_date, article_body
138 |
--------------------------------------------------------------------------------
/summary.py:
--------------------------------------------------------------------------------
1 | """
2 | This script extracts and ranks the sentences and words of an article.
3 |
4 | It is inspired by the tf-idf algorithm.
5 | """

from collections import Counter

import spacy

# The stop words files.
ES_STOPWORDS_FILE = "./assets/stopwords-es.txt"
EN_STOPWORDS_FILE = "./assets/stopwords-en.txt"

# The number of sentences we need.
NUMBER_OF_SENTENCES = 5

# The number of top words we need.
NUMBER_OF_TOP_WORDS = 5

# Multiplier for uppercase and long words.
IMPORTANT_WORDS_MULTIPLIER = 2.5

# Financial sentences are often more important than others.
FINANCIAL_SENTENCE_MULTIPLIER = 1.5

# The minimum number of characters needed for a line to be valid.
LINE_LENGTH_THRESHOLD = 150

# These symbols, filler words, and whitespace tokens don't add value to the article.
# They are compared against whole lowercase tokens, so partial words are not affected.
COMMON_WORDS = {
    " ", "  ", "\xa0", "#", ",", "|", "-", "‘", "’", ";", "(", ")", ".", ":", "¿", "?", '“', "/",
    '”', '"', "'", "%", "•", "«", "»", "foto", "photo", "video", "redacción", "nueve", "diez", "cien",
    "mil", "miles", "ciento", "cientos", "millones", "vale"
}

# These words increase the score of a sentence. They don't require whitespaces around them.
FINANCIAL_WORDS = ["$", "€", "£", "pesos", "dólar", "libras", "euros",
                   "dollar", "pound", "mdp", "mdd"]


# Don't forget to specify the correct model for your language.
NLP = spacy.load("es_core_news_sm")
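# For English articles you would load "en_core_web_sm" instead; spaCy's
# documentation lists the models available for other languages.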


def add_extra_words():
    """Adds the title and uppercase forms of all the stop words to COMMON_WORDS.

    We parse local copies of stop words downloaded from the following repositories:

    https://github.com/stopwords-iso/stopwords-es
    https://github.com/stopwords-iso/stopwords-en
    """

    with open(ES_STOPWORDS_FILE, "r", encoding="utf-8") as temp_file:
        for word in temp_file.read().splitlines():
            COMMON_WORDS.add(word)
            COMMON_WORDS.add(word.title())
            COMMON_WORDS.add(word.upper())

    with open(EN_STOPWORDS_FILE, "r", encoding="utf-8") as temp_file:
        for word in temp_file.read().splitlines():
            COMMON_WORDS.add(word)
            COMMON_WORDS.add(word.title())
            COMMON_WORDS.add(word.upper())


add_extra_words()
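
# For example, the stop word "ellos" also enters the set as "Ellos" and "ELLOS",
# which is why COMMON_WORDS ends up roughly three times the size of the source lists.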


def get_summary(article):
    """Generates the top words and sentences from the article text.

    Parameters
    ----------
    article : str
        The article text.

    Returns
    -------
    dict
        A dict containing the top words, the top scored sentences, the reduction
        percentage, and the words of interest joined into one string.

    """

    # Now we prepare the article for scoring.
    cleaned_article = clean_article(article)

    # We start the NLP process.
    doc = NLP(cleaned_article)

    article_sentences = list(doc.sents)

    words_of_interest = [
        token.text for token in doc if token.lower_ not in COMMON_WORDS]

    # We use the Counter class to count all word occurrences.
    scored_words = Counter(words_of_interest)

    for word in scored_words:

        # We add bonus points to words that start with an uppercase letter
        # and are at least 4 characters long.
        if word[0].isupper() and len(word) >= 4:
            scored_words[word] *= IMPORTANT_WORDS_MULTIPLIER

        # If the word is a number we punish it by setting its score to 0.
        if word.isdigit():
            scored_words[word] = 0

    top_sentences = get_top_sentences(article_sentences, scored_words)
    top_sentences_length = sum(len(sentence) for sentence in top_sentences)
    reduction = 100 - (top_sentences_length / len(cleaned_article)) * 100

    summary_dict = {
        "top_words": get_top_words(scored_words),
        "top_sentences": top_sentences,
        "reduction": reduction,
        "article_words": " ".join(words_of_interest)
    }

    return summary_dict
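
# Illustrative call (the output values are hypothetical, they depend on the
# article and the loaded model):
#
#   summary = get_summary(article)
#   summary["top_words"]   -> e.g. ["Banxico", "inflación", ...]
#   summary["reduction"]   -> e.g. 73.5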


def clean_article(article_text):
    """Cleans and reformats the article text.

    Parameters
    ----------
    article_text : str
        The article string.

    Returns
    -------
    str
        The cleaned up article.

    """

    # We split the article into lines so we can remove unnecessary whitespace.
    lines_list = list()

    for line in article_text.split("\n"):

        # We remove whitespaces.
        stripped_line = line.strip()

        # If the line is too short we ignore it.
        if len(stripped_line) >= LINE_LENGTH_THRESHOLD:
            lines_list.append(stripped_line)

    # Now we have the article fully cleaned.
    return " ".join(lines_list)
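
# For example, a short navigation crumb such as "Política" is dropped, while any
# paragraph of at least LINE_LENGTH_THRESHOLD characters survives the cleanup.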


def get_top_words(scored_words):
    """Gets the top scored words from the prepared article.

    Parameters
    ----------
    scored_words : collections.Counter
        A Counter containing the article words and their scores.

    Returns
    -------
    list
        An ordered list with the top words.

    """

    # Once we have our words scored it's time to get the top ones.
    top_words = list()

    for word, score in scored_words.most_common():

        add_to_list = True

        # We avoid duplicates by checking if the word is already in the top_words list.
        if word.upper() not in [item.upper() for item in top_words]:

            # Sometimes we have the same word but in plural form; we skip the word when that happens.
            for item in top_words:
                if word.upper() in item.upper() or item.upper() in word.upper():
                    add_to_list = False

            if add_to_list:
                top_words.append(word)

    return top_words[:NUMBER_OF_TOP_WORDS]
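
# For example, if "peso" is already in top_words, "pesos" is skipped because
# one is a substring of the other once both are uppercased.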


def get_top_sentences(article_sentences, scored_words):
    """Gets the top scored sentences from the cleaned article.

    Parameters
    ----------
    article_sentences : list
        The article sentences as spaCy spans.

    scored_words : collections.Counter
        A Counter containing the article words and their scores.

    Returns
    -------
    list
        An ordered list with the top sentences.

    """

    # Now it's time to score each sentence.
    scored_sentences = list()

    # We keep the index of each sentence, it will be used later to restore the chronological order.
    for index, sent in enumerate(article_sentences):

        # In some edge cases we have duplicated sentences, we make sure that doesn't happen.
        if sent.text not in [text for _, _, text in scored_sentences]:
            scored_sentences.append(
                [score_line(sent, scored_words), index, sent.text])

    top_sentences = list()
    counter = 0

    for score, index, sentence in sorted(scored_sentences, reverse=True):

        if counter >= NUMBER_OF_SENTENCES:
            break

        # When the article is too small the sentences may come out empty.
        if len(sentence) >= 3:

            # We keep the sentence and its index so we can sort in chronological order.
            top_sentences.append([index, sentence])
            counter += 1

    return [sentence for index, sentence in sorted(top_sentences)]


def score_line(line, scored_words):
    """Calculates the score of the given line using the word scores.

    Parameters
    ----------
    line : spacy.tokens.span.Span
        A tokenized sentence from the article.

    scored_words : collections.Counter
        A Counter containing the article words and their scores.

    Returns
    -------
    float
        The total score of all the words in the sentence.

    """

    # We remove the common words.
    cleaned_line = [
        token.text for token in line if token.lower_ not in COMMON_WORDS]

    # We now sum the total number of occurrences for all words.
    temp_score = 0

    for word in cleaned_line:
        temp_score += scored_words[word]

    # We apply a bonus score to sentences that contain financial information.
    line_lowercase = line.text.lower()

    for word in FINANCIAL_WORDS:
        if word in line_lowercase:
            temp_score *= FINANCIAL_SENTENCE_MULTIPLIER
            break

    return temp_score
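# Worked example (hypothetical scores): if a sentence's remaining words score
# 4 + 6 + 2 = 12 and the sentence mentions "euros", the final score becomes
# 12 * FINANCIAL_SENTENCE_MULTIPLIER = 18.0.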
--------------------------------------------------------------------------------
/templates/es.txt:
--------------------------------------------------------------------------------
### {}

[Nota Original]({}) | Reducido en un {:.2f}% | {}

*****

{}

*****

*^Este ^bot ^solo ^responde ^cuando ^logra ^resumir ^en ^un ^mínimo ^del ^20%. ^Tus ^reportes, ^sugerencias ^y ^comentarios ^son ^bienvenidos. *

[FAQ](https://redd.it/arkxlg) | [GitHub](https://git.io/fhQkC) | [☁️]({}) | {}

--------------------------------------------------------------------------------