├── .DS_Store
├── 5pages.pdf
├── Be_Good.pdf
├── SAMPLE-OF-ENV-FILE.txt
├── Street_Tree_List.csv
├── _100 AI Startups__ 100 LLM Apps that have earned $500,000 before their first year of existence.html
├── be-good-and-how-not-to-die.txt
├── be-good.txt
├── como_podemos_ayudarte.pdf
├── good.txt
├── state_of_the_union.txt
├── street_tree_db.sqlite
├── thefuzz-master
    ├── .DS_Store
    ├── .editorconfig
    ├── .github
    │   └── workflows
    │   │   └── ci.yml
    ├── .gitignore
    ├── CHANGES.rst
    ├── LICENSE.txt
    ├── MANIFEST.in
    ├── README.rst
    ├── benchmarks.py
    ├── data
    │   └── titledata.csv
    ├── release
    ├── setup.py
    ├── test_thefuzz.py
    ├── test_thefuzz_hypothesis.py
    ├── test_thefuzz_pytest.py
    ├── thefuzz
    │   ├── __init__.py
    │   ├── fuzz.py
    │   ├── fuzz.pyi
    │   ├── process.py
    │   ├── process.pyi
    │   ├── py.typed
    │   ├── utils.py
    │   └── utils.pyi
    └── tox.ini
├── thefuzz
    ├── __init__.py
    ├── fuzz.py
    ├── fuzz.pyi
    ├── process.py
    ├── process.pyi
    ├── py.typed
    ├── utils.py
    └── utils.pyi
└── youtube
    └── LLM Apps： Professional Opportunities for LLM App Developers..m4a


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI-LLM-Bootcamp/data/f776b47268f2d7152ac1accf3ce47472ec1a59e7/.DS_Store


--------------------------------------------------------------------------------
/5pages.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI-LLM-Bootcamp/data/f776b47268f2d7152ac1accf3ce47472ec1a59e7/5pages.pdf


--------------------------------------------------------------------------------
/Be_Good.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI-LLM-Bootcamp/data/f776b47268f2d7152ac1accf3ce47472ec1a59e7/Be_Good.pdf


--------------------------------------------------------------------------------
/SAMPLE-OF-ENV-FILE.txt:
--------------------------------------------------------------------------------
 1 | (remember: the name of this file should be .env)
 2 | (this file should not be in the data folder, but in the root folder)
 3 | (replace … with your confidential key)
 4 | (remove the keys you are not using)
 5 | 
 6 | 
 7 | OPENAI_API_KEY=…
 8 | 
 9 | 
10 | LANGCHAIN_TRACING_V2=true
11 | LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
12 | LANGCHAIN_API_KEY=…
13 | 
14 | 
15 | SERPAPI_API_KEY=…
16 | HUGGINGFACEHUB_API_TOKEN=…
17 | COHERE_API_KEY=…
18 | DEEPLAKE_API_KEY=…
19 | GOOGLE_API_KEY=…
20 | GOOGLE_CSE_ID=…
21 | ACTIVELOOP_ORG_ID=…
22 | REPLICATE_API_TOKEN=…
23 | PALM_API_KEY=…
24 | PALM_REGION=…
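
The sample above is meant to be saved as `.env` in the project root. As a minimal sketch of how such a file could be read, the following uses only the Python standard library. The function name `load_env_file` is hypothetical and not part of this repository, and the parsing rules are assumptions: parenthesised note lines are skipped, and placeholder values (`…` or `..`) are treated as unset. Projects often use a dedicated package for this instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and export them to os.environ.

    Hypothetical helper for illustration; returns the pairs found in the file.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    # "utf-8-sig" drops a leading byte-order mark if the file has one.
    for raw_line in env_path.read_text(encoding="utf-8-sig").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and the parenthesised notes used in the sample.
        if not line or line.startswith(("#", "(")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Assumption: values made only of dots or ellipses are unfilled placeholders.
        if not value or set(value) <= {"…", "."}:
            continue
        # Variables already set in the real environment are not overwritten.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded
```

Calling `load_env_file()` once at program start, before any client library reads its key, is usually enough. Keys left as placeholders stay out of the environment.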


--------------------------------------------------------------------------------
/be-good-and-how-not-to-die.txt:
--------------------------------------------------------------------------------
  1 | Be good
  2 | 
  3 | April 2008  (This essay is derived from a talk at the 2008 Startup School.)  About a month after we started Y Combinator we came up with the
  4 | phrase that became our motto: Make something people want.  We've
  5 | learned a lot since then, but if I were choosing now that's still
  6 | the one I'd pick.  Another thing we tell founders is not to worry too much about the
  7 | business model, at least at first.  Not because making money is
  8 | unimportant, but because it's so much easier than building something
  9 | great.  A couple weeks ago I realized that if you put those two ideas
 10 | together, you get something surprising.  Make something people want.
 11 | Don't worry too much about making money.  What you've got is a
 12 | description of a charity.  When you get an unexpected result like this, it could either be a
 13 | bug or a new discovery.  Either businesses aren't supposed to be
 14 | like charities, and we've proven by reductio ad absurdum that one
 15 | or both of the principles we began with is false.  Or we have a new
 16 | idea.  I suspect it's the latter, because as soon as this thought occurred
 17 | to me, a whole bunch of other things fell into place.  Examples.  For example, Craigslist.  It's not a charity, but they run it like
 18 | one.  And they're astoundingly successful.  When you scan down the
 19 | list of most popular web sites, the number of employees at Craigslist
 20 | looks like a misprint. Their revenues aren't as high as they could
 21 | be, but most startups would be happy to trade places with them.  In Patrick O'Brian's novels, his captains always try to get upwind
 22 | of their opponents.  If you're upwind, you decide when and if to
 23 | engage the other ship.  Craigslist is effectively upwind of enormous
 24 | revenues.  They'd face some challenges if they wanted to make more,
 25 | but not the sort you face when you're tacking upwind, trying to
 26 | force a crappy product on ambivalent users by spending ten times
 27 | as much on sales as on development.  [1]  I'm not saying startups should aim to end up like Craigslist.
 28 | They're a product of unusual circumstances.  But they're a good
 29 | model for the early phases.  Google looked a lot like a charity in the beginning. They didn't
 30 | have ads for over a year.  At year 1, Google was indistinguishable
 31 | from a nonprofit.  If a nonprofit or government organization had
 32 | started a project to index the web, Google at year 1 is the limit
 33 | of what they'd have produced.  Back when I was working on spam filters I thought it would be a
 34 | good idea to have a web-based email service with good spam filtering.
 35 | I wasn't thinking of it as a company.  I just wanted to keep people
 36 | from getting spammed.  But as I thought more about this project, I
 37 | realized it would probably have to be a company.  It would cost
 38 | something to run, and it would be a pain to fund with grants and
 39 | donations.  That was a surprising realization.  Companies often claim to be
 40 | benevolent, but it was surprising to realize there were purely
 41 | benevolent projects that had to be embodied as companies to work.  I didn't want to start another company, so I didn't do it.  But if
 42 | someone had, they'd probably be quite rich now.  There was a window
 43 | of about two years when spam was increasing rapidly but all the big
 44 | email services had terrible filters.  If someone had launched a
 45 | new, spam-free mail service, users would have flocked to it.  Notice the pattern here?  From either direction we get to the same
 46 | spot.  If you start from successful startups, you find they often
 47 | behaved like nonprofits.  And if you start from ideas for nonprofits,
 48 | you find they'd often make good startups.  Power.  How wide is this territory?  Would all good nonprofits be good
 49 | companies?  Possibly not.  What makes Google so valuable is that
 50 | their users have money.  If you make people with money love you,
 51 | you can probably get some of it.  But could you also base a successful
 52 | startup on behaving like a nonprofit to people who don't have money?
 53 | Could you, for example, grow a successful startup out of curing an
 54 | unfashionable but deadly disease like malaria?  I'm not sure, but I suspect that if you pushed this idea, you'd be
 55 | surprised how far it would go.  For example, people who apply to Y
 56 | Combinator don't generally have much money, and yet we can profit
 57 | by helping them, because with our help they could make money.  Maybe
 58 | the situation is similar with malaria.  Maybe an organization that
 59 | helped lift its weight off a country could benefit from the resulting
 60 | growth.  I'm not proposing this is a serious idea.  I don't know anything
 61 | about malaria.  But I've been kicking ideas around long enough to
 62 | know when I come across a powerful one.  One way to guess how far an idea extends is to ask yourself at what
 63 | point you'd bet against it.  The thought of betting against benevolence
 64 | is alarming in the same way as saying that something is technically
 65 | impossible.  You're just asking to be made a fool of, because these
 66 | are such powerful forces.  [2]  For example, initially I thought maybe this principle only applied
 67 | to Internet startups.  Obviously it worked for Google, but what
 68 | about Microsoft?  Surely Microsoft isn't benevolent?  But when I
 69 | think back to the beginning, they were.  Compared to IBM they were
 70 | like Robin Hood.  When IBM introduced the PC, they thought they
 71 | were going to make money selling hardware at high prices.  But by
 72 | gaining control of the PC standard, Microsoft opened up the market
 73 | to any manufacturer.  Hardware prices plummeted, and lots of people
 74 | got to have computers who couldn't otherwise have afforded them.
 75 | It's the sort of thing you'd expect Google to do.  Microsoft isn't so benevolent now.  Now when one thinks of what
 76 | Microsoft does to users, all the verbs that come to mind begin with
 77 | F.  [3] And yet it doesn't seem to pay.
 78 | Their stock price has been flat for years.  Back when they were
 79 | Robin Hood, their stock price rose like Google's.  Could there be
 80 | a connection?  You can see how there would be.  When you're small, you can't bully
 81 | customers, so you have to charm them.  Whereas when you're big you
 82 | can maltreat them at will, and you tend to, because it's easier
 83 | than satisfying them.  You grow big by being nice, but you can stay
 84 | big by being mean.  You get away with it till the underlying conditions change, and
 85 | then all your victims escape.  So "Don't be evil" may be the most
 86 | valuable thing Paul Buchheit made for Google, because it may turn
 87 | out to be an elixir of corporate youth.  I'm sure they find it
 88 | constraining, but think how valuable it will be if it saves them
 89 | from lapsing into the fatal laziness that afflicted Microsoft and
 90 | IBM.  The curious thing is, this elixir is freely available to any other
 91 | company.  Anyone can adopt "Don't be evil."  The catch is that
 92 | people will hold you to it.  So I don't think you're going to see
 93 | record labels or tobacco companies using this discovery.  Morale.  There's a lot of external evidence that benevolence works.  But how
 94 | does it work?  One advantage of investing in a large number of
 95 | startups is that you get a lot of data about how they work.  From
 96 | what we've seen, being good seems to help startups in three ways:
 97 | it improves their morale, it makes other people want to help them,
 98 | and above all, it helps them be decisive.  Morale is tremendously important to a startup, so important
 99 | that morale alone is almost enough to determine success.  Startups
100 | are often described as emotional roller-coasters. One minute you're
101 | going to take over the world, and the next you're doomed.  The
102 | problem with feeling you're doomed is not just that it makes you
103 | unhappy, but that it makes you stop working.  So the downhills
104 | of the roller-coaster are more of a self fulfilling prophecy than
105 | the uphills.  If feeling you're going to succeed makes you work
106 | harder, that probably improves your chances of succeeding, but if
107 | feeling you're going to fail makes you stop working, that practically
108 | guarantees you'll fail.  Here's where benevolence comes in.  If you feel you're really helping
109 | people, you'll keep working even when it seems like your startup
110 | is doomed.  Most of us have some amount of natural benevolence.
111 | The mere fact that someone needs you makes you want to help them.
112 | So if you start the kind of startup where users come back each day,
113 | you've basically built yourself a giant tamagotchi.  You've made
114 | something you need to take care of.  Blogger is a famous example of a startup that went through really
115 | low lows and survived.  At one point they ran out of money and
116 | everyone left. Evan Williams came in to work the next day, and there
117 | was no one but him.  What kept him going?  Partly that users needed
118 | him.  He was hosting thousands of people's blogs. He couldn't just
119 | let the site die.  There are many advantages of launching quickly, but the most important
120 | may be that once you have users, the tamagotchi effect kicks in.
121 | Once you have users to take care of, you're forced to figure out
122 | what will make them happy, and that's actually very valuable
123 | information.  The added confidence that comes from trying to help people can
124 | also help you with investors. One of the founders of 
125 | Chatterous told 
126 | me recently that he and his cofounder had decided that this service
127 | was something the world needed, so they were going to keep working
128 | on it no matter what, even if they had to move back to Canada and live
129 | in their parents' basements.  Once they realized this, they stopped caring so much what investors thought
130 | about them.  They still met with them, but they weren't going to
131 | die if they didn't get their money.  And you know what?  The investors
132 | got a lot more interested.  They could sense that the Chatterouses
133 | were going to do this startup with or without them.  If you're really committed and your startup is cheap to run, you
134 | become very hard to kill.  And practically all startups, even the
135 | most successful, come close to death at some point.  So if doing
136 | good for people gives you a sense of mission that makes you harder
137 | to kill, that alone more than compensates for whatever you lose by
138 | not choosing a more selfish project.  Help.  Another advantage of being good is that it makes other people want
139 | to help you.  This too seems to be an inborn trait in humans.  One of the startups we've funded, Octopart, is currently locked in
140 | a classic battle of good versus evil.  They're a search site for
141 | industrial components.  A lot of people need to search for components,
142 | and before Octopart there was no good way to do it.  That, it turned
143 | out, was no coincidence.  Octopart built the right way to search for components.  Users like
144 | it and they've been growing rapidly.  And yet for most of Octopart's
145 | life, the biggest distributor, Digi-Key, has been trying to force
146 | them to take their prices off the site.  Octopart is sending them
147 | customers for free, and yet Digi-Key is trying to make that traffic
148 | stop.  Why?  Because their current business model depends on
149 | overcharging people who have incomplete information about prices.
150 | They don't want search to work.  The Octoparts are the nicest guys in the world.  They dropped out
151 | of the PhD program in physics at Berkeley to do this.  They just
152 | wanted to fix a problem they encountered in their research.  Imagine
153 | how much time you could save the world's engineers if they could
154 | do searches online.  So when I hear that a big, evil company is
155 | trying to stop them in order to keep search broken, it makes me
156 | really want to help them. It makes me spend more time on the Octoparts
157 | than I do with most of the other startups we've funded.  It just
158 | made me spend several minutes telling you how great they are.  Why?
159 | Because they're good guys and they're trying to help the world.  If you're benevolent, people will rally around you: investors,
160 | customers, other companies, and potential employees.  In the long
161 | term the most important may be the potential employees.  I think
162 | everyone knows now that 
163 | good hackers are much better than mediocre
164 | ones.  If you can attract the best hackers to work for you, as
165 | Google has, you have a big advantage.  And the very best hackers
166 | tend to be idealistic.  They're not desperate for a job.  They can
167 | work wherever they want.  So most want to work on things that will
168 | make the world better.  Compass.  But the most important advantage of being good is that it acts as
169 | a compass.  One of the hardest parts of doing a startup is that you
170 | have so many choices.  There are just two or three of you, and a
171 | thousand things you could do. How do you decide?  Here's the answer: Do whatever's best for your users.  You can hold
172 | onto this like a rope in a hurricane, and it will save you if
173 | anything can.  Follow it and it will take you through everything
174 | you need to do.  It's even the answer to questions that seem unrelated, like how to
175 | convince investors to give you money.  If you're a good salesman,
176 | you could try to just talk them into it.  But the more reliable
177 | route is to convince them through your users: if you make something
178 | users love enough to tell their friends, you grow exponentially,
179 | and that will convince any investor.  Being good is a particularly useful strategy for making decisions
180 | in complex situations because it's stateless.  It's like telling
181 | the truth.  The trouble with lying is that you have to remember
182 | everything you've said in the past to make sure you don't contradict
183 | yourself.  If you tell the truth you don't have to remember anything,
184 | and that's a really useful property in domains where things happen
185 | fast.  For example, Y Combinator has now invested in 80 startups, 57 of
186 | which are still alive.  (The rest have died or merged or been
187 | acquired.)  When you're trying to advise 57 startups, it turns out
188 | you have to have a stateless algorithm.  You can't have ulterior
189 | motives when you have 57 things going on at once, because you can't
190 | remember them.  So our rule is just to do whatever's best for the
191 | founders.  Not because we're particularly benevolent, but because
192 | it's the only algorithm that works on that scale.  When you write something telling people to be good, you seem to be
193 | claiming to be good yourself.  So I want to say explicitly that I
194 | am not a particularly good person.  When I was a kid I was firmly
195 | in the camp of bad.  The way adults used the word good, it seemed
196 | to be synonymous with quiet, so I grew up very suspicious of it.  You know how there are some people whose names come up in conversation
197 | and everyone says "He's such a great guy?"  People never say
198 | that about me.  The best I get is "he means well."  I am not claiming
199 | to be good.  At best I speak good as a second language.  So I'm not suggesting you be good in the usual sanctimonious way.
200 | I'm suggesting it because it works.  It will work not just as a
201 | statement of "values," but as a guide to strategy,
202 | and even a design spec for software.  Don't just not be evil.  Be
203 | good.  Notes.  [1] Fifty years ago
204 | it would have seemed shocking for a public company not to pay
205 | dividends.  Now many tech companies don't.  The markets seem to
206 | have figured out how to value potential dividends.  Maybe that isn't
207 | the last step in this evolution.  Maybe markets will eventually get
208 | comfortable with potential earnings. (VCs already are, and at least
209 | some of them consistently make money.)  I realize this sounds like the stuff one used to hear about the
210 | "new economy" during the Bubble.  Believe me, I was not drinking
211 | that kool-aid at the time.  But I'm convinced there were some 
212 | good
213 | ideas buried in Bubble thinking.  For example, it's ok to focus on
214 | growth instead of profits—but only if the growth is genuine.
215 | You can't be buying users; that's a pyramid scheme.   But a company
216 | with rapid, genuine growth is valuable, and eventually markets learn
217 | how to value valuable things.  [2] The idea of starting
218 | a company with benevolent aims is currently undervalued, because
219 | the kind of people who currently make that their explicit goal don't
220 | usually do a very good job.  It's one of the standard career paths of trustafarians to start
221 | some vaguely benevolent business.  The problem with most of them
222 | is that they either have a bogus political agenda or are feebly
223 | executed.  The trustafarians' ancestors didn't get rich by preserving
224 | their traditional culture; maybe people in Bolivia don't want to
225 | either.  And starting an organic farm, though it's at least
226 | straightforwardly benevolent, doesn't help people on the scale that
227 | Google does.  Most explicitly benevolent projects don't hold themselves sufficiently
228 | accountable.  They act as if having good intentions were enough to
229 | guarantee good effects.  [3] Users dislike their
230 | new operating system so much that they're starting petitions to
231 | save the old one.  And the old one was nothing special.  The hackers
232 | within Microsoft must know in their hearts that if the company
233 | really cared about users they'd just advise them to switch to OSX.  Thanks to Trevor Blackwell, Paul Buchheit, Jessica Livingston,
234 | and Robert Morris for reading drafts of this.
235 | 
236 | How not to die
237 | 
238 | August 2007
239 | 
240 | (This is a talk I gave at the last Y Combinator dinner of the summer. Usually we don't have a speaker at the last dinner; it's more of a party. But it seemed worth spoiling the atmosphere if I could save some of the startups from preventable deaths. So at the last minute I cooked up this rather grim talk. I didn't mean this as an essay; I wrote it down because I only had two hours before dinner and think fastest while writing.)
241 | 
242 | A couple days ago I told a reporter that we expected about a third of the companies we funded to succeed. Actually I was being conservative. I'm hoping it might be as much as a half. Wouldn't it be amazing if we could achieve a 50% success rate?
243 | 
244 | Another way of saying that is that half of you are going to die. Phrased that way, it doesn't sound good at all. In fact, it's kind of weird when you think about it, because our definition of success is that the founders get rich. If half the startups we fund succeed, then half of you are going to get rich and the other half are going to get nothing.
245 | 
246 | If you can just avoid dying, you get rich. That sounds like a joke, but it's actually a pretty good description of what happens in a typical startup. It certainly describes what happened in Viaweb. We avoided dying till we got rich.
247 | 
248 | It was really close, too. When we were visiting Yahoo to talk about being acquired, we had to interrupt everything and borrow one of their conference rooms to talk down an investor who was about to back out of a new funding round we needed to stay alive. So even in the middle of getting rich we were fighting off the grim reaper.
249 | 
250 | You may have heard that quote about luck consisting of opportunity meeting preparation. You've now done the preparation. The work you've done so far has, in effect, put you in a position to get lucky: you can now get rich by not letting your company die. That's more than most people have. So let's talk about how not to die.
251 | 
252 | We've done this five times now, and we've seen a bunch of startups die. About 10 of them so far. We don't know exactly what happens when they die, because they generally don't die loudly and heroically. Mostly they crawl off somewhere and die.
253 | 
254 | For us the main indication of impending doom is when we don't hear from you. When we haven't heard from, or about, a startup for a couple months, that's a bad sign. If we send them an email asking what's up, and they don't reply, that's a really bad sign. So far that is a 100% accurate predictor of death.
255 | 
256 | Whereas if a startup regularly does new deals and releases and either sends us mail or shows up at YC events, they're probably going to live.
257 | 
258 | I realize this will sound naive, but maybe the linkage works in both directions. Maybe if you can arrange that we keep hearing from you, you won't die.
259 | 
260 | That may not be so naive as it sounds. You've probably noticed that having dinners every Tuesday with us and the other founders causes you to get more done than you would otherwise, because every dinner is a mini Demo Day. Every dinner is a kind of a deadline. So the mere constraint of staying in regular contact with us will push you to make things happen, because otherwise you'll be embarrassed to tell us that you haven't done anything new since the last time we talked.
261 | 
262 | If this works, it would be an amazing hack. It would be pretty cool if merely by staying in regular contact with us you could get rich. It sounds crazy, but there's a good chance that would work.
263 | 
264 | A variant is to stay in touch with other YC-funded startups. There is now a whole neighborhood of them in San Francisco. If you move there, the peer pressure that made you work harder all summer will continue to operate.
265 | 
266 | When startups die, the official cause of death is always either running out of money or a critical founder bailing. Often the two occur simultaneously. But I think the underlying cause is usually that they've become demoralized. You rarely hear of a startup that's working around the clock doing deals and pumping out new features, and dies because they can't pay their bills and their ISP unplugs their server.
267 | 
268 | Startups rarely die in mid keystroke. So keep typing!
269 | 
270 | If so many startups get demoralized and fail when merely by hanging on they could get rich, you have to assume that running a startup can be demoralizing. That is certainly true. I've been there, and that's why I've never done another startup. The low points in a startup are just unbelievably low. I bet even Google had moments where things seemed hopeless.
271 | 
272 | Knowing that should help. If you know it's going to feel terrible sometimes, then when it feels terrible you won't think "ouch, this feels terrible, I give up." It feels that way for everyone. And if you just hang on, things will probably get better. The metaphor people use to describe the way a startup feels is at least a roller coaster and not drowning. You don't just sink and sink; there are ups after the downs.
273 | 
274 | Another feeling that seems alarming but is in fact normal in a startup is the feeling that what you're doing isn't working. The reason you can expect to feel this is that what you do probably won't work. Startups almost never get it right the first time. Much more commonly you launch something, and no one cares. Don't assume when this happens that you've failed. That's normal for startups. But don't sit around doing nothing. Iterate.
275 | 
276 | I like Paul Buchheit's suggestion of trying to make something that at least someone really loves. As long as you've made something that a few users are ecstatic about, you're on the right track. It will be good for your morale to have even a handful of users who really love you, and startups run on morale. But also it will tell you what to focus on. What is it about you that they love? Can you do more of that? Where can you find more people who love that sort of thing? As long as you have some core of users who love you, all you have to do is expand it. It may take a while, but as long as you keep plugging away, you'll win in the end. Both Blogger and Delicious did that. Both took years to succeed. But both began with a core of fanatically devoted users, and all Evan and Joshua had to do was grow that core incrementally. Wufoo is on the same trajectory now.
277 | 
278 | So when you release something and it seems like no one cares, look more closely. Are there zero users who really love you, or is there at least some little group that does? It's quite possible there will be zero. In that case, tweak your product and try again. Every one of you is working on a space that contains at least one winning permutation somewhere in it. If you just keep trying, you'll find it.
279 | 
280 | Let me mention some things not to do. The number one thing not to do is other things. If you find yourself saying a sentence that ends with "but we're going to keep working on the startup," you are in big trouble. Bob's going to grad school, but we're going to keep working on the startup. We're moving back to Minnesota, but we're going to keep working on the startup. We're taking on some consulting projects, but we're going to keep working on the startup. You may as well just translate these to "we're giving up on the startup, but we're not willing to admit that to ourselves," because that's what it means most of the time. A startup is so hard that working on it can't be preceded by "but."
281 | 
282 | In particular, don't go to graduate school, and don't start other projects. Distraction is fatal to startups. Going to (or back to) school is a huge predictor of death because in addition to the distraction it gives you something to say you're doing. If you're only doing a startup, then if the startup fails, you fail. If you're in grad school and your startup fails, you can say later "Oh yeah, we had this startup on the side when I was in grad school, but it didn't go anywhere."
283 | 
284 | You can't use euphemisms like "didn't go anywhere" for something that's your only occupation. People won't let you.
285 | 
286 | One of the most interesting things we've discovered from working on Y Combinator is that founders are more motivated by the fear of looking bad than by the hope of getting millions of dollars. So if you want to get millions of dollars, put yourself in a position where failure will be public and humiliating.
287 | 
288 | When we first met the founders of Octopart, they seemed very smart, but not a great bet to succeed, because they didn't seem especially committed. One of the two founders was still in grad school. It was the usual story: he'd drop out if it looked like the startup was taking off. Since then he has not only dropped out of grad school, but appeared full length in Newsweek with the word "Billionaire" printed across his chest. He just cannot fail now. Everyone he knows has seen that picture. Girls who dissed him in high school have seen it. His mom probably has it on the fridge. It would be unthinkably humiliating to fail now. At this point he is committed to fight to the death.
289 | 
290 | I wish every startup we funded could appear in a Newsweek article describing them as the next generation of billionaires, because then none of them would be able to give up. The success rate would be 90%. I'm not kidding.
291 | 
292 | When we first knew the Octoparts they were lighthearted, cheery guys. Now when we talk to them they seem grimly determined. The electronic parts distributors are trying to squash them to keep their monopoly pricing. (If it strikes you as odd that people still order electronic parts out of thick paper catalogs in 2007, there's a reason for that. The distributors want to prevent the transparency that comes from having prices online.) I feel kind of bad that we've transformed these guys from lighthearted to grimly determined. But that comes with the territory. If a startup succeeds, you get millions of dollars, and you don't get that kind of money just by asking for it. You have to assume it takes some amount of pain.
293 | 
294 | And however tough things get for the Octoparts, I predict they'll succeed. They may have to morph themselves into something totally different, but they won't just crawl off and die. They're smart; they're working in a promising field; and they just cannot give up.
295 | 
296 | All of you guys already have the first two. You're all smart and working on promising ideas. Whether you end up among the living or the dead comes down to the third ingredient, not giving up.
297 | 
298 | So I'll tell you now: bad shit is coming. It always is in a startup. The odds of getting from launch to liquidity without some kind of disaster happening are one in a thousand. So don't get demoralized. When the disaster strikes, just say to yourself, ok, this was what Paul was talking about. What did he say to do? Oh, yeah. Don't give up.


--------------------------------------------------------------------------------
/be-good.txt:
--------------------------------------------------------------------------------
  1 | Be good
  2 | 
  3 | April 2008
    |
    | (This essay is derived from a talk at the 2008 Startup School.)
    |
    | About a month after we started Y Combinator we came up with the
  4 | phrase that became our motto: Make something people want.  We've
  5 | learned a lot since then, but if I were choosing now that's still
  6 | the one I'd pick.
    |
    | Another thing we tell founders is not to worry too much about the
  7 | business model, at least at first.  Not because making money is
  8 | unimportant, but because it's so much easier than building something
  9 | great.
    |
    | A couple weeks ago I realized that if you put those two ideas
 10 | together, you get something surprising.  Make something people want.
 11 | Don't worry too much about making money.  What you've got is a
 12 | description of a charity.
    |
    | When you get an unexpected result like this, it could either be a
 13 | bug or a new discovery.  Either businesses aren't supposed to be
 14 | like charities, and we've proven by reductio ad absurdum that one
 15 | or both of the principles we began with is false.  Or we have a new
 16 | idea.
    |
    | I suspect it's the latter, because as soon as this thought occurred
 17 | to me, a whole bunch of other things fell into place.
    |
    | Examples
    |
    | For example, Craigslist.  It's not a charity, but they run it like
 18 | one.  And they're astoundingly successful.  When you scan down the
 19 | list of most popular web sites, the number of employees at Craigslist
 20 | looks like a misprint. Their revenues aren't as high as they could
 21 | be, but most startups would be happy to trade places with them.
    |
    | In Patrick O'Brian's novels, his captains always try to get upwind
 22 | of their opponents.  If you're upwind, you decide when and if to
 23 | engage the other ship.  Craigslist is effectively upwind of enormous
 24 | revenues.  They'd face some challenges if they wanted to make more,
 25 | but not the sort you face when you're tacking upwind, trying to
 26 | force a crappy product on ambivalent users by spending ten times
 27 | as much on sales as on development.  [1]
    |
    | I'm not saying startups should aim to end up like Craigslist.
 28 | They're a product of unusual circumstances.  But they're a good
 29 | model for the early phases.
    |
    | Google looked a lot like a charity in the beginning. They didn't
 30 | have ads for over a year.  At year 1, Google was indistinguishable
 31 | from a nonprofit.  If a nonprofit or government organization had
 32 | started a project to index the web, Google at year 1 is the limit
 33 | of what they'd have produced.
    |
    | Back when I was working on spam filters I thought it would be a
 34 | good idea to have a web-based email service with good spam filtering.
 35 | I wasn't thinking of it as a company.  I just wanted to keep people
 36 | from getting spammed.  But as I thought more about this project, I
 37 | realized it would probably have to be a company.  It would cost
 38 | something to run, and it would be a pain to fund with grants and
 39 | donations.
    |
    | That was a surprising realization.  Companies often claim to be
 40 | benevolent, but it was surprising to realize there were purely
 41 | benevolent projects that had to be embodied as companies to work.
    |
    | I didn't want to start another company, so I didn't do it.  But if
 42 | someone had, they'd probably be quite rich now.  There was a window
 43 | of about two years when spam was increasing rapidly but all the big
 44 | email services had terrible filters.  If someone had launched a
 45 | new, spam-free mail service, users would have flocked to it.
    |
    | Notice the pattern here?  From either direction we get to the same
 46 | spot.  If you start from successful startups, you find they often
 47 | behaved like nonprofits.  And if you start from ideas for nonprofits,
 48 | you find they'd often make good startups.
    |
    | Power
    |
    | How wide is this territory?  Would all good nonprofits be good
 49 | companies?  Possibly not.  What makes Google so valuable is that
 50 | their users have money.  If you make people with money love you,
 51 | you can probably get some of it.  But could you also base a successful
 52 | startup on behaving like a nonprofit to people who don't have money?
 53 | Could you, for example, grow a successful startup out of curing an
 54 | unfashionable but deadly disease like malaria?
    |
    | I'm not sure, but I suspect that if you pushed this idea, you'd be
 55 | surprised how far it would go.  For example, people who apply to Y
 56 | Combinator don't generally have much money, and yet we can profit
 57 | by helping them, because with our help they could make money.  Maybe
 58 | the situation is similar with malaria.  Maybe an organization that
 59 | helped lift its weight off a country could benefit from the resulting
 60 | growth.
    |
    | I'm not proposing this is a serious idea.  I don't know anything
 61 | about malaria.  But I've been kicking ideas around long enough to
 62 | know when I come across a powerful one.
    |
    | One way to guess how far an idea extends is to ask yourself at what
 63 | point you'd bet against it.  The thought of betting against benevolence
 64 | is alarming in the same way as saying that something is technically
 65 | impossible.  You're just asking to be made a fool of, because these
 66 | are such powerful forces.  [2]
    |
    | For example, initially I thought maybe this principle only applied
 67 | to Internet startups.  Obviously it worked for Google, but what
 68 | about Microsoft?  Surely Microsoft isn't benevolent?  But when I
 69 | think back to the beginning, they were.  Compared to IBM they were
 70 | like Robin Hood.  When IBM introduced the PC, they thought they
 71 | were going to make money selling hardware at high prices.  But by
 72 | gaining control of the PC standard, Microsoft opened up the market
 73 | to any manufacturer.  Hardware prices plummeted, and lots of people
 74 | got to have computers who couldn't otherwise have afforded them.
 75 | It's the sort of thing you'd expect Google to do.
    |
    | Microsoft isn't so benevolent now.  Now when one thinks of what
 76 | Microsoft does to users, all the verbs that come to mind begin with
 77 | F.  [3] And yet it doesn't seem to pay.
 78 | Their stock price has been flat for years.  Back when they were
 79 | Robin Hood, their stock price rose like Google's.  Could there be
 80 | a connection?
    |
    | You can see how there would be.  When you're small, you can't bully
 81 | customers, so you have to charm them.  Whereas when you're big you
 82 | can maltreat them at will, and you tend to, because it's easier
 83 | than satisfying them.  You grow big by being nice, but you can stay
 84 | big by being mean.
    |
    | You get away with it till the underlying conditions change, and
 85 | then all your victims escape.  So "Don't be evil" may be the most
 86 | valuable thing Paul Buchheit made for Google, because it may turn
 87 | out to be an elixir of corporate youth.  I'm sure they find it
 88 | constraining, but think how valuable it will be if it saves them
 89 | from lapsing into the fatal laziness that afflicted Microsoft and
 90 | IBM.
    |
    | The curious thing is, this elixir is freely available to any other
 91 | company.  Anyone can adopt "Don't be evil."  The catch is that
 92 | people will hold you to it.  So I don't think you're going to see
 93 | record labels or tobacco companies using this discovery.
    |
    | Morale
    |
    | There's a lot of external evidence that benevolence works.  But how
 94 | does it work?  One advantage of investing in a large number of
 95 | startups is that you get a lot of data about how they work.  From
 96 | what we've seen, being good seems to help startups in three ways:
 97 | it improves their morale, it makes other people want to help them,
 98 | and above all, it helps them be decisive.
    |
    | Morale is tremendously important to a startup, so important
 99 | that morale alone is almost enough to determine success.  Startups
100 | are often described as emotional roller-coasters. One minute you're
101 | going to take over the world, and the next you're doomed.  The
102 | problem with feeling you're doomed is not just that it makes you
103 | unhappy, but that it makes you stop working.  So the downhills
104 | of the roller-coaster are more of a self-fulfilling prophecy than
105 | the uphills.  If feeling you're going to succeed makes you work
106 | harder, that probably improves your chances of succeeding, but if
107 | feeling you're going to fail makes you stop working, that practically
108 | guarantees you'll fail.
    |
    | Here's where benevolence comes in.  If you feel you're really helping
109 | people, you'll keep working even when it seems like your startup
110 | is doomed.  Most of us have some amount of natural benevolence.
111 | The mere fact that someone needs you makes you want to help them.
112 | So if you start the kind of startup where users come back each day,
113 | you've basically built yourself a giant tamagotchi.  You've made
114 | something you need to take care of.
    |
    | Blogger is a famous example of a startup that went through really
115 | low lows and survived.  At one point they ran out of money and
116 | everyone left. Evan Williams came in to work the next day, and there
117 | was no one but him.  What kept him going?  Partly that users needed
118 | him.  He was hosting thousands of people's blogs. He couldn't just
119 | let the site die.
    |
    | There are many advantages of launching quickly, but the most important
120 | may be that once you have users, the tamagotchi effect kicks in.
121 | Once you have users to take care of, you're forced to figure out
122 | what will make them happy, and that's actually very valuable
123 | information.
    |
    | The added confidence that comes from trying to help people can
124 | also help you with investors. One of the founders of 
125 | Chatterous told 
126 | me recently that he and his cofounder had decided that this service
127 | was something the world needed, so they were going to keep working
128 | on it no matter what, even if they had to move back to Canada and live
129 | in their parents' basements.
    |
    | Once they realized this, they stopped caring so much what investors thought
130 | about them.  They still met with them, but they weren't going to
131 | die if they didn't get their money.  And you know what?  The investors
132 | got a lot more interested.  They could sense that the Chatterouses
133 | were going to do this startup with or without them.
    |
    | If you're really committed and your startup is cheap to run, you
134 | become very hard to kill.  And practically all startups, even the
135 | most successful, come close to death at some point.  So if doing
136 | good for people gives you a sense of mission that makes you harder
137 | to kill, that alone more than compensates for whatever you lose by
138 | not choosing a more selfish project.
    |
    | Help
    |
    | Another advantage of being good is that it makes other people want
139 | to help you.  This too seems to be an inborn trait in humans.
    |
    | One of the startups we've funded, Octopart, is currently locked in
140 | a classic battle of good versus evil.  They're a search site for
141 | industrial components.  A lot of people need to search for components,
142 | and before Octopart there was no good way to do it.  That, it turned
143 | out, was no coincidence.
    |
    | Octopart built the right way to search for components.  Users like
144 | it and they've been growing rapidly.  And yet for most of Octopart's
145 | life, the biggest distributor, Digi-Key, has been trying to force
146 | them to take their prices off the site.  Octopart is sending them
147 | customers for free, and yet Digi-Key is trying to make that traffic
148 | stop.  Why?  Because their current business model depends on
149 | overcharging people who have incomplete information about prices.
150 | They don't want search to work.
    |
    | The Octoparts are the nicest guys in the world.  They dropped out
151 | of the PhD program in physics at Berkeley to do this.  They just
152 | wanted to fix a problem they encountered in their research.  Imagine
153 | how much time you could save the world's engineers if they could
154 | do searches online.  So when I hear that a big, evil company is
155 | trying to stop them in order to keep search broken, it makes me
156 | really want to help them. It makes me spend more time on the Octoparts
157 | than I do with most of the other startups we've funded.  It just
158 | made me spend several minutes telling you how great they are.  Why?
159 | Because they're good guys and they're trying to help the world.
    |
    | If you're benevolent, people will rally around you: investors,
160 | customers, other companies, and potential employees.  In the long
161 | term the most important may be the potential employees.  I think
162 | everyone knows now that 
163 | good hackers are much better than mediocre
164 | ones.  If you can attract the best hackers to work for you, as
165 | Google has, you have a big advantage.  And the very best hackers
166 | tend to be idealistic.  They're not desperate for a job.  They can
167 | work wherever they want.  So most want to work on things that will
168 | make the world better.
    |
    | Compass
    |
    | But the most important advantage of being good is that it acts as
169 | a compass.  One of the hardest parts of doing a startup is that you
170 | have so many choices.  There are just two or three of you, and a
171 | thousand things you could do. How do you decide?
    |
    | Here's the answer: Do whatever's best for your users.  You can hold
172 | onto this like a rope in a hurricane, and it will save you if
173 | anything can.  Follow it and it will take you through everything
174 | you need to do.
    |
    | It's even the answer to questions that seem unrelated, like how to
175 | convince investors to give you money.  If you're a good salesman,
176 | you could try to just talk them into it.  But the more reliable
177 | route is to convince them through your users: if you make something
178 | users love enough to tell their friends, you grow exponentially,
179 | and that will convince any investor.
    |
    | Being good is a particularly useful strategy for making decisions
180 | in complex situations because it's stateless.  It's like telling
181 | the truth.  The trouble with lying is that you have to remember
182 | everything you've said in the past to make sure you don't contradict
183 | yourself.  If you tell the truth you don't have to remember anything,
184 | and that's a really useful property in domains where things happen
185 | fast.
    |
    | For example, Y Combinator has now invested in 80 startups, 57 of
186 | which are still alive.  (The rest have died or merged or been
187 | acquired.)  When you're trying to advise 57 startups, it turns out
188 | you have to have a stateless algorithm.  You can't have ulterior
189 | motives when you have 57 things going on at once, because you can't
190 | remember them.  So our rule is just to do whatever's best for the
191 | founders.  Not because we're particularly benevolent, but because
192 | it's the only algorithm that works on that scale.
    |
    | When you write something telling people to be good, you seem to be
193 | claiming to be good yourself.  So I want to say explicitly that I
194 | am not a particularly good person.  When I was a kid I was firmly
195 | in the camp of bad.  The way adults used the word good, it seemed
196 | to be synonymous with quiet, so I grew up very suspicious of it.
    |
    | You know how there are some people whose names come up in conversation
197 | and everyone says "He's such a great guy?"  People never say
198 | that about me.  The best I get is "he means well."  I am not claiming
199 | to be good.  At best I speak good as a second language.
    |
    | So I'm not suggesting you be good in the usual sanctimonious way.
200 | I'm suggesting it because it works.  It will work not just as a
201 | statement of "values," but as a guide to strategy,
202 | and even a design spec for software.  Don't just not be evil.  Be
203 | good.
    |
    | Notes
    |
    | [1] Fifty years ago
204 | it would have seemed shocking for a public company not to pay
205 | dividends.  Now many tech companies don't.  The markets seem to
206 | have figured out how to value potential dividends.  Maybe that isn't
207 | the last step in this evolution.  Maybe markets will eventually get
208 | comfortable with potential earnings. (VCs already are, and at least
209 | some of them consistently make money.)
    |
    | I realize this sounds like the stuff one used to hear about the
210 | "new economy" during the Bubble.  Believe me, I was not drinking
211 | that kool-aid at the time.  But I'm convinced there were some 
212 | good
213 | ideas buried in Bubble thinking.  For example, it's ok to focus on
214 | growth instead of profits—but only if the growth is genuine.
215 | You can't be buying users; that's a pyramid scheme.   But a company
216 | with rapid, genuine growth is valuable, and eventually markets learn
217 | how to value valuable things.
    |
    | [2] The idea of starting
218 | a company with benevolent aims is currently undervalued, because
219 | the kind of people who currently make that their explicit goal don't
220 | usually do a very good job.
    |
    | It's one of the standard career paths of trustafarians to start
221 | some vaguely benevolent business.  The problem with most of them
222 | is that they either have a bogus political agenda or are feebly
223 | executed.  The trustafarians' ancestors didn't get rich by preserving
224 | their traditional culture; maybe people in Bolivia don't want to
225 | either.  And starting an organic farm, though it's at least
226 | straightforwardly benevolent, doesn't help people on the scale that
227 | Google does.
    |
    | Most explicitly benevolent projects don't hold themselves sufficiently
228 | accountable.  They act as if having good intentions were enough to
229 | guarantee good effects.
    |
    | [3] Users dislike their
230 | new operating system so much that they're starting petitions to
231 | save the old one.  And the old one was nothing special.  The hackers
232 | within Microsoft must know in their hearts that if the company
233 | really cared about users they'd just advise them to switch to OSX.
    |
    | Thanks to Trevor Blackwell, Paul Buchheit, Jessica Livingston,
234 | and Robert Morris for reading drafts of this.


--------------------------------------------------------------------------------
/como_podemos_ayudarte.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI-LLM-Bootcamp/data/f776b47268f2d7152ac1accf3ce47472ec1a59e7/como_podemos_ayudarte.pdf


--------------------------------------------------------------------------------
/good.txt:
--------------------------------------------------------------------------------
  1 | April 2008
    |
    | (This essay is derived from a talk at the 2008 Startup School.)
    |
    | About a month after we started Y Combinator we came up with the
  2 | phrase that became our motto: Make something people want.  We've
  3 | learned a lot since then, but if I were choosing now that's still
  4 | the one I'd pick.
    |
    | Another thing we tell founders is not to worry too much about the
  5 | business model, at least at first.  Not because making money is
  6 | unimportant, but because it's so much easier than building something
  7 | great.
    |
    | A couple weeks ago I realized that if you put those two ideas
  8 | together, you get something surprising.  Make something people want.
  9 | Don't worry too much about making money.  What you've got is a
 10 | description of a charity.
    |
    | When you get an unexpected result like this, it could either be a
 11 | bug or a new discovery.  Either businesses aren't supposed to be
 12 | like charities, and we've proven by reductio ad absurdum that one
 13 | or both of the principles we began with is false.  Or we have a new
 14 | idea.
    |
    | I suspect it's the latter, because as soon as this thought occurred
 15 | to me, a whole bunch of other things fell into place.
    |
    | Examples
    |
    | For example, Craigslist.  It's not a charity, but they run it like
 16 | one.  And they're astoundingly successful.  When you scan down the
 17 | list of most popular web sites, the number of employees at Craigslist
 18 | looks like a misprint. Their revenues aren't as high as they could
 19 | be, but most startups would be happy to trade places with them.
    |
    | In Patrick O'Brian's novels, his captains always try to get upwind
 20 | of their opponents.  If you're upwind, you decide when and if to
 21 | engage the other ship.  Craigslist is effectively upwind of enormous
 22 | revenues.  They'd face some challenges if they wanted to make more,
 23 | but not the sort you face when you're tacking upwind, trying to
 24 | force a crappy product on ambivalent users by spending ten times
 25 | as much on sales as on development.  [1]
    |
    | I'm not saying startups should aim to end up like Craigslist.
 26 | They're a product of unusual circumstances.  But they're a good
 27 | model for the early phases.
    |
    | Google looked a lot like a charity in the beginning. They didn't
 28 | have ads for over a year.  At year 1, Google was indistinguishable
 29 | from a nonprofit.  If a nonprofit or government organization had
 30 | started a project to index the web, Google at year 1 is the limit
 31 | of what they'd have produced.
    |
    | Back when I was working on spam filters I thought it would be a
 32 | good idea to have a web-based email service with good spam filtering.
 33 | I wasn't thinking of it as a company.  I just wanted to keep people
 34 | from getting spammed.  But as I thought more about this project, I
 35 | realized it would probably have to be a company.  It would cost
 36 | something to run, and it would be a pain to fund with grants and
 37 | donations.
    |
    | That was a surprising realization.  Companies often claim to be
 38 | benevolent, but it was surprising to realize there were purely
 39 | benevolent projects that had to be embodied as companies to work.
    |
    | I didn't want to start another company, so I didn't do it.  But if
 40 | someone had, they'd probably be quite rich now.  There was a window
 41 | of about two years when spam was increasing rapidly but all the big
 42 | email services had terrible filters.  If someone had launched a
 43 | new, spam-free mail service, users would have flocked to it.
    |
    | Notice the pattern here?  From either direction we get to the same
 44 | spot.  If you start from successful startups, you find they often
 45 | behaved like nonprofits.  And if you start from ideas for nonprofits,
 46 | you find they'd often make good startups.
    |
    | Power
    |
    | How wide is this territory?  Would all good nonprofits be good
 47 | companies?  Possibly not.  What makes Google so valuable is that
 48 | their users have money.  If you make people with money love you,
 49 | you can probably get some of it.  But could you also base a successful
 50 | startup on behaving like a nonprofit to people who don't have money?
 51 | Could you, for example, grow a successful startup out of curing an
 52 | unfashionable but deadly disease like malaria?
    |
    | I'm not sure, but I suspect that if you pushed this idea, you'd be
 53 | surprised how far it would go.  For example, people who apply to Y
 54 | Combinator don't generally have much money, and yet we can profit
 55 | by helping them, because with our help they could make money.  Maybe
 56 | the situation is similar with malaria.  Maybe an organization that
 57 | helped lift its weight off a country could benefit from the resulting
 58 | growth.
    |
    | I'm not proposing this is a serious idea.  I don't know anything
 59 | about malaria.  But I've been kicking ideas around long enough to
 60 | know when I come across a powerful one.
    |
    | One way to guess how far an idea extends is to ask yourself at what
 61 | point you'd bet against it.  The thought of betting against benevolence
 62 | is alarming in the same way as saying that something is technically
 63 | impossible.  You're just asking to be made a fool of, because these
 64 | are such powerful forces.  [2]
    |
    | For example, initially I thought maybe this principle only applied
 65 | to Internet startups.  Obviously it worked for Google, but what
 66 | about Microsoft?  Surely Microsoft isn't benevolent?  But when I
 67 | think back to the beginning, they were.  Compared to IBM they were
 68 | like Robin Hood.  When IBM introduced the PC, they thought they
 69 | were going to make money selling hardware at high prices.  But by
 70 | gaining control of the PC standard, Microsoft opened up the market
 71 | to any manufacturer.  Hardware prices plummeted, and lots of people
 72 | got to have computers who couldn't otherwise have afforded them.
 73 | It's the sort of thing you'd expect Google to do.
    |
    | Microsoft isn't so benevolent now.  Now when one thinks of what
 74 | Microsoft does to users, all the verbs that come to mind begin with
 75 | F.  [3] And yet it doesn't seem to pay.
 76 | Their stock price has been flat for years.  Back when they were
 77 | Robin Hood, their stock price rose like Google's.  Could there be
 78 | a connection?
    |
    | You can see how there would be.  When you're small, you can't bully
 79 | customers, so you have to charm them.  Whereas when you're big you
 80 | can maltreat them at will, and you tend to, because it's easier
 81 | than satisfying them.  You grow big by being nice, but you can stay
 82 | big by being mean.
    |
    | You get away with it till the underlying conditions change, and
 83 | then all your victims escape.  So "Don't be evil" may be the most
 84 | valuable thing Paul Buchheit made for Google, because it may turn
 85 | out to be an elixir of corporate youth.  I'm sure they find it
 86 | constraining, but think how valuable it will be if it saves them
 87 | from lapsing into the fatal laziness that afflicted Microsoft and
 88 | IBM.
    |
    | The curious thing is, this elixir is freely available to any other
 89 | company.  Anyone can adopt "Don't be evil."  The catch is that
 90 | people will hold you to it.  So I don't think you're going to see
 91 | record labels or tobacco companies using this discovery.
    |
    | Morale
    |
    | There's a lot of external evidence that benevolence works.  But how
 92 | does it work?  One advantage of investing in a large number of
 93 | startups is that you get a lot of data about how they work.  From
 94 | what we've seen, being good seems to help startups in three ways:
 95 | it improves their morale, it makes other people want to help them,
 96 | and above all, it helps them be decisive.
    |
    | Morale is tremendously important to a startup, so important
 97 | that morale alone is almost enough to determine success.  Startups
 98 | are often described as emotional roller-coasters. One minute you're
 99 | going to take over the world, and the next you're doomed.  The
100 | problem with feeling you're doomed is not just that it makes you
101 | unhappy, but that it makes you stop working.  So the downhills
102 | of the roller-coaster are more of a self-fulfilling prophecy than
103 | the uphills.  If feeling you're going to succeed makes you work
104 | harder, that probably improves your chances of succeeding, but if
105 | feeling you're going to fail makes you stop working, that practically
106 | guarantees you'll fail.
    |
    | Here's where benevolence comes in.  If you feel you're really helping
107 | people, you'll keep working even when it seems like your startup
108 | is doomed.  Most of us have some amount of natural benevolence.
109 | The mere fact that someone needs you makes you want to help them.
110 | So if you start the kind of startup where users come back each day,
111 | you've basically built yourself a giant tamagotchi.  You've made
112 | something you need to take care of.
    |
    | Blogger is a famous example of a startup that went through really
113 | low lows and survived.  At one point they ran out of money and
114 | everyone left. Evan Williams came in to work the next day, and there
115 | was no one but him.  What kept him going?  Partly that users needed
116 | him.  He was hosting thousands of people's blogs. He couldn't just
117 | let the site die.
    |
    | There are many advantages of launching quickly, but the most important
118 | may be that once you have users, the tamagotchi effect kicks in.
119 | Once you have users to take care of, you're forced to figure out
120 | what will make them happy, and that's actually very valuable
121 | information.
    |
    | The added confidence that comes from trying to help people can
122 | also help you with investors. One of the founders of 
123 | Chatterous told 
124 | me recently that he and his cofounder had decided that this service
125 | was something the world needed, so they were going to keep working
126 | on it no matter what, even if they had to move back to Canada and live
127 | in their parents' basements.
    |
    | Once they realized this, they stopped caring so much what investors thought
128 | about them.  They still met with them, but they weren't going to
129 | die if they didn't get their money.  And you know what?  The investors
130 | got a lot more interested.  They could sense that the Chatterouses
131 | were going to do this startup with or without them.
    |
    | If you're really committed and your startup is cheap to run, you
132 | become very hard to kill.  And practically all startups, even the
133 | most successful, come close to death at some point.  So if doing
134 | good for people gives you a sense of mission that makes you harder
135 | to kill, that alone more than compensates for whatever you lose by
136 | not choosing a more selfish project.
    |
    | Help
    |
    | Another advantage of being good is that it makes other people want
137 | to help you.  This too seems to be an inborn trait in humans.
    |
    | One of the startups we've funded, Octopart, is currently locked in
138 | a classic battle of good versus evil.  They're a search site for
139 | industrial components.  A lot of people need to search for components,
140 | and before Octopart there was no good way to do it.  That, it turned
141 | out, was no coincidence.
    |
    | Octopart built the right way to search for components.  Users like
142 | it and they've been growing rapidly.  And yet for most of Octopart's
143 | life, the biggest distributor, Digi-Key, has been trying to force
144 | them to take their prices off the site.  Octopart is sending them
145 | customers for free, and yet Digi-Key is trying to make that traffic
146 | stop.  Why?  Because their current business model depends on
147 | overcharging people who have incomplete information about prices.
148 | They don't want search to work.
    |
    | The Octoparts are the nicest guys in the world.  They dropped out
149 | of the PhD program in physics at Berkeley to do this.  They just
150 | wanted to fix a problem they encountered in their research.  Imagine
151 | how much time you could save the world's engineers if they could
152 | do searches online.  So when I hear that a big, evil company is
153 | trying to stop them in order to keep search broken, it makes me
154 | really want to help them. It makes me spend more time on the Octoparts
155 | than I do with most of the other startups we've funded.  It just
156 | made me spend several minutes telling you how great they are.  Why?
157 | Because they're good guys and they're trying to help the world.
    |
    | If you're benevolent, people will rally around you: investors,
158 | customers, other companies, and potential employees.  In the long
159 | term the most important may be the potential employees.  I think
160 | everyone knows now that 
161 | good hackers are much better than mediocre
162 | ones.  If you can attract the best hackers to work for you, as
163 | Google has, you have a big advantage.  And the very best hackers
164 | tend to be idealistic.  They're not desperate for a job.  They can
165 | work wherever they want.  So most want to work on things that will
166 | make the world better.
    |
    | Compass
    |
    | But the most important advantage of being good is that it acts as
167 | a compass.  One of the hardest parts of doing a startup is that you
168 | have so many choices.  There are just two or three of you, and a
169 | thousand things you could do. How do you decide?  Here's the answer: Do whatever's best for your users.  You can hold
170 | onto this like a rope in a hurricane, and it will save you if
171 | anything can.  Follow it and it will take you through everything
172 | you need to do.  It's even the answer to questions that seem unrelated, like how to
173 | convince investors to give you money.  If you're a good salesman,
174 | you could try to just talk them into it.  But the more reliable
175 | route is to convince them through your users: if you make something
176 | users love enough to tell their friends, you grow exponentially,
177 | and that will convince any investor.  Being good is a particularly useful strategy for making decisions
178 | in complex situations because it's stateless.  It's like telling
179 | the truth.  The trouble with lying is that you have to remember
180 | everything you've said in the past to make sure you don't contradict
181 | yourself.  If you tell the truth you don't have to remember anything,
182 | and that's a really useful property in domains where things happen
183 | fast.  For example, Y Combinator has now invested in 80 startups, 57 of
184 | which are still alive.  (The rest have died or merged or been
185 | acquired.)  When you're trying to advise 57 startups, it turns out
186 | you have to have a stateless algorithm.  You can't have ulterior
187 | motives when you have 57 things going on at once, because you can't
188 | remember them.  So our rule is just to do whatever's best for the
189 | founders.  Not because we're particularly benevolent, but because
190 | it's the only algorithm that works on that scale.  When you write something telling people to be good, you seem to be
191 | claiming to be good yourself.  So I want to say explicitly that I
192 | am not a particularly good person.  When I was a kid I was firmly
193 | in the camp of bad.  The way adults used the word good, it seemed
194 | to be synonymous with quiet, so I grew up very suspicious of it.  You know how there are some people whose names come up in conversation
195 | and everyone says "He's such a great guy?"  People never say
196 | that about me.  The best I get is "he means well."  I am not claiming
197 | to be good.  At best I speak good as a second language.  So I'm not suggesting you be good in the usual sanctimonious way.
198 | I'm suggesting it because it works.  It will work not just as a
199 | statement of "values," but as a guide to strategy,
200 | and even a design spec for software.  Don't just not be evil.  Be
201 | good.  Notes.  [1] Fifty years ago
202 | it would have seemed shocking for a public company not to pay
203 | dividends.  Now many tech companies don't.  The markets seem to
204 | have figured out how to value potential dividends.  Maybe that isn't
205 | the last step in this evolution.  Maybe markets will eventually get
206 | comfortable with potential earnings. (VCs already are, and at least
207 | some of them consistently make money.)  I realize this sounds like the stuff one used to hear about the
208 | "new economy" during the Bubble.  Believe me, I was not drinking
209 | that kool-aid at the time.  But I'm convinced there were some 
210 | good
211 | ideas buried in Bubble thinking.  For example, it's ok to focus on
212 | growth instead of profits—but only if the growth is genuine.
213 | You can't be buying users; that's a pyramid scheme.   But a company
214 | with rapid, genuine growth is valuable, and eventually markets learn
215 | how to value valuable things.  [2] The idea of starting
216 | a company with benevolent aims is currently undervalued, because
217 | the kind of people who currently make that their explicit goal don't
218 | usually do a very good job.  It's one of the standard career paths of trustafarians to start
219 | some vaguely benevolent business.  The problem with most of them
220 | is that they either have a bogus political agenda or are feebly
221 | executed.  The trustafarians' ancestors didn't get rich by preserving
222 | their traditional culture; maybe people in Bolivia don't want to
223 | either.  And starting an organic farm, though it's at least
224 | straightforwardly benevolent, doesn't help people on the scale that
225 | Google does.  Most explicitly benevolent projects don't hold themselves sufficiently
226 | accountable.  They act as if having good intentions were enough to
227 | guarantee good effects.  [3] Users dislike their
228 | new operating system so much that they're starting petitions to
229 | save the old one.  And the old one was nothing special.  The hackers
230 | within Microsoft must know in their hearts that if the company
231 | really cared about users they'd just advise them to switch to OSX.  Thanks to Trevor Blackwell, Paul Buchheit, Jessica Livingston,
232 | and Robert Morris for reading drafts of this.


--------------------------------------------------------------------------------
/street_tree_db.sqlite:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI-LLM-Bootcamp/data/f776b47268f2d7152ac1accf3ce47472ec1a59e7/street_tree_db.sqlite


--------------------------------------------------------------------------------
/thefuzz-master/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI-LLM-Bootcamp/data/f776b47268f2d7152ac1accf3ce47472ec1a59e7/thefuzz-master/.DS_Store


--------------------------------------------------------------------------------
/thefuzz-master/.editorconfig:
--------------------------------------------------------------------------------
 1 | # .editorconfig
 2 | # http://editorconfig.org/
 3 | root = true
 4 | 
 5 | [*]
 6 | charset = utf-8
 7 | end_of_line = lf
 8 | indent_size = 2
 9 | indent_style = space
10 | insert_final_newline = true
11 | trim_trailing_whitespace = true
12 | 
13 | [*.bat]
14 | end_of_line = crlf
15 | 
16 | [*.go]
17 | indent_size = 4
18 | indent_style = tab
19 | 
20 | [*.html]
21 | indent_size = 4
22 | 
23 | [*Makefile]
24 | indent_size = 4
25 | indent_style = tab
26 | 
27 | [*.php]
28 | indent_size = 4
29 | 
30 | [*.py]
31 | indent_size = 4
32 | 
33 | [*.xml]
34 | indent_size = 4
35 | 


--------------------------------------------------------------------------------
/thefuzz-master/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
 1 | name: The Fuzz
 2 | 
 3 | on: [push, pull_request, workflow_dispatch]
 4 | 
 5 | jobs:
 6 |   build:
 7 |     runs-on: ubuntu-latest
 8 |     strategy:
 9 |       fail-fast: false
10 |       matrix:
11 |         python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
12 |         test-cmd: [pytest]
13 |         include:
14 |           - python-version: "3.8"
15 |             test-cmd: python setup.py check --restructuredtext --strict --metadata
16 |           - python-version: "3.11"
17 |             test-cmd: python setup.py check --restructuredtext --strict --metadata
18 |     steps:
19 |       - uses: actions/checkout@v4
20 |       - name: Set up Python ${{ matrix.python-version }}
21 |         uses: actions/setup-python@v4
22 |         with:
23 |           python-version: ${{ matrix.python-version }}
24 |           allow-prereleases: true
25 |       - name: Install dependencies
26 |         run: |
27 |           python -m pip install --upgrade pip setuptools wheel
28 |           pip install pytest pycodestyle docutils Pygments hypothesis
29 | 
30 |       - name: Install project
31 |         run: pip install .
32 | 
33 |       - name: Test with pytest
34 |         run: ${{ matrix.test-cmd }}
35 | 


--------------------------------------------------------------------------------
/thefuzz-master/.gitignore:
--------------------------------------------------------------------------------
 1 | *.py[oc]
 2 | 
 3 | # Temp files
 4 | *~
 5 | ~*
 6 | .*~
 7 | \#*
 8 | .#*
 9 | *#
10 | 
11 | # Build files
12 | build
13 | dist
14 | pkg
15 | *.egg
16 | *.egg-info
17 | 
18 | # Debian Files
19 | debian/files
20 | debian/python-beaver*
21 | 
22 | # Sphinx build
23 | doc/_build
24 | 
25 | # Generated man page
26 | doc/aws_hostname.1
27 | 
28 | # tox
29 | .tox
30 | 
31 | # Hypothesis - keep the examples database
32 | .hypothesis/tmp
33 | .hypothesis/unicodedata
34 | .hypothesis
35 | 
36 | # pytest
37 | .cache/
38 | .pytest_cache
39 | __pycache__
40 | 
41 | # Pycharm
42 | .idea/
43 | 
44 | # vscode
45 | .vscode/
46 | 


--------------------------------------------------------------------------------
/thefuzz-master/CHANGES.rst:
--------------------------------------------------------------------------------
  1 | Changelog
  2 | =========
  3 | 
  4 | 0.17.0 (2018-08-20)
  5 | -------------------
  6 | 
  7 | - Make benchmarks script Py3 compatible. [Stefan Behnel]
  8 | 
  9 | - Add Go lang port. [iddober]
 10 | 
 11 | - Add reference to C# port. [ericcoleman]
 12 | 
 13 | - Chore: remove license header from files. [Jose Diaz-Gonzalez]
 14 | 
 15 |   The files should all inherit the project's license.
 16 | 
 17 | 
 18 | - Fix README title style. [Thomas Grainger]
 19 | 
 20 | - Add readme check. [Thomas Grainger]
 21 | 
 22 |   install docutils and Pygments
 23 | 
 24 | 
 25 | - Cache pip. [Thomas Grainger]
 26 | 
 27 | - Upgrade pip/setuptools for hypothesis. [Thomas Grainger]
 28 | 
 29 | - Feat: drop py26 and py33 support from tox. [Jose Diaz-Gonzalez]
 30 | 
 31 | - Feat: drop support for 2.6 in test_thefuzz.py. [Jose Diaz-Gonzalez]
 32 | 
 33 | - Feat: drop reference to 2.4 from readme. [Jose Diaz-Gonzalez]
 34 | 
 35 | - Feat: drop py2.6 and py3.3 classifiers. [Jose Diaz-Gonzalez]
 36 | 
 37 | - Feat: drop 2.6 and 3.3 support. [Jose Diaz-Gonzalez]
 38 | 
 39 |   These are no longer supported. Please upgrade your python version if you are using either version.
 40 | 
 41 | - Fuzz: _token_sort: check for equivalence. [Ralf Ramsauer]
 42 | 
 43 |   If we don't have to full_process the strings, we can safely
 44 |   return 100 when both candidates are equal.
 45 | 
 46 |   Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
 47 | 
 48 | 
 49 | - Test: add more test cases. [Ralf Ramsauer]
 50 | 
 51 |   Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
 52 | 
 53 | 
 54 | - Utils: add and use check_for_equivalence decorator. [Ralf Ramsauer]
 55 | 
 56 |   And decorate basic scoring functions.
 57 | 
 58 |   The check_for_equivalence decorator MUST be used after the
 59 |   check_for_none decorator, as otherwise ratio(None, None) will get a
 60 |   score of 100.
 61 | 
 62 |   This fixes the first part of the recently introduced changes in the test
 63 |   set.
 64 | 
 65 |   Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
 66 | 
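The ordering constraint in the entry above can be illustrated with a small sketch. The two decorators below are simplified stand-ins written for this example, not the library's actual code, and `_score` is a hypothetical placeholder scorer. The sketch reads "used after" as "listed below `check_for_none`", so that the None check wraps the equivalence check and runs first.

```python
import functools


def check_for_none(func):
    """Illustrative stand-in: return 0 when either argument is None."""
    @functools.wraps(func)
    def decorator(*args, **kwargs):
        if args[0] is None or args[1] is None:
            return 0
        return func(*args, **kwargs)
    return decorator


def check_for_equivalence(func):
    """Illustrative stand-in: return 100 when both arguments are equal."""
    @functools.wraps(func)
    def decorator(*args, **kwargs):
        if args[0] == args[1]:
            return 100
        return func(*args, **kwargs)
    return decorator


def _score(s1, s2):
    # Hypothetical placeholder; a real scorer would compute a similarity.
    return 50


# None check is outermost, so it runs first: (None, None) scores 0.
@check_for_none
@check_for_equivalence
def ratio(s1, s2):
    return _score(s1, s2)


# Equivalence check is outermost: None == None short-circuits to 100.
@check_for_equivalence
@check_for_none
def ratio_wrong_order(s1, s2):
    return _score(s1, s2)


print(ratio(None, None), ratio_wrong_order(None, None), ratio("{", "{"))
```

With the None check outermost, equal non-None inputs such as `'{'` and `'{'` still reach the equivalence check and score 100, while a pair of `None` values is rejected before it can be mistaken for a perfect match.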
 67 | 
 68 | - Tests: add some corner cases. [Ralf Ramsauer]
 69 | 
 70 |   '' and '' are equal, so are '{' and '{'. Test if thefuzz gives them a
 71 |   score of 100.
 72 | 
 73 |   For the moment, this patch breaks tests, fixes in thefuzz follow.
 74 | 
 75 |   Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
 76 | 
 77 | 
 78 | - Utils: remove superfluous check. [Ralf Ramsauer]
 79 | 
 80 |   Decorators make sure that only non None-values are passed. We can safely
 81 |   assume that None will never get here.
 82 | 
 83 |   Other than that, None's shouldn't simply be ignored and erroneously
 84 |   changed to empty strings. Better let users fail.
 85 | 
 86 |   This commit doesn't break any tests.
 87 | 
 88 |   Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
 89 | 
 90 | 
 91 | - README: add missing requirements. [Ralf Ramsauer]
 92 | 
 93 |   pycodestyle and hypothesis are required for automatic testing. Add them
 94 |   to README's requirement section.
 95 | 
 96 |   Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
 97 | 
 98 | 
 99 | - Remove empty document. [Ralf Ramsauer]
100 | 
101 |   Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
102 | 
103 | 
104 | 0.16.0 (2017-12-18)
105 | -------------------
106 | 
107 | - Add punctuation characters back in so process does something.
108 |   [davidcellis]
109 | 
110 | - Simpler alphabet and even fewer examples. [davidcellis]
111 | 
112 | - Fewer examples and larger deadlines for Hypothesis. [davidcellis]
113 | 
114 | - Slightly more examples. [davidcellis]
115 | 
116 | - Attempt to fix the failing 2.7 and 3.6 python tests. [davidcellis]
117 | 
118 | - Readme: add link to C++ port. [Lizard]
119 | 
120 | - Fix tests on Python 3.3. [Jon Banafato]
121 | 
122 |   Modify tox.ini and .travis.yml to install enum34 when running with
123 |   Python 3.3 to allow hypothesis tests to pass.
124 | 
125 | 
126 | - Normalize Python versions. [Jon Banafato]
127 | 
128 |   - Enable Travis-CI tests for Python 3.6
129 |   - Enable tests for all supported Python versions in tox.ini
130 |   - Add Trove classifiers for Python 3.4 - 3.6 to setup.py
131 | 
132 |   ---
133 | 
134 |   Note: Python 2.6 and 3.3 are no longer supported by the Python core
135 |   team. Support for these can likely be dropped, but that's out of scope
136 |   for this change set.
137 | 
138 | 
139 | - Fix typos. [Sven-Hendrik Haase]
140 | 
141 | 0.15.1 (2017-07-19)
142 | -------------------
143 | 
144 | - Fix setup.py (addresses #155) [Paul O'Leary McCann]
145 | 
146 | - Merge remote-tracking branch 'upstream/master' into
147 |   extract_optimizations. [nolan]
148 | 
149 | - Seed random before generating benchmark strings. [nolan]
150 | 
151 | - Cleaner implementation of same idea without new param, but adding
152 |   existing full_process param to Q,W,UQ,UW. [nolan]
153 | 
154 | - Fix benchmark only generate list once. [nolan]
155 | 
156 | - Only run util.full_process once on query when using extract functions,
157 |   add new benchmarks. [nolan]
158 | 
159 | 0.15.0 (2017-02-20)
160 | -------------------
161 | 
162 | - Add extras require to install python-levenshtein optionally. [Rolando
163 |   Espinoza]
164 | 
165 |   This allows installing python-levenshtein as a dependency.
166 | 
167 | 
168 | - Fix link formatting in the README. [Alex Chan]
169 | 
170 | - Add fuzzball.js JavaScript port link. [nolan]
171 | 
172 | - Added Rust Port link. [Logan Collins]
173 | 
174 | - Validate_string docstring. [davidcellis]
175 | 
176 | - For full comparisons test that ONLY exact matches (after processing)
177 |   are added. [davidcellis]
178 | 
179 | - Add detailed docstrings to WRatio and QRatio comparisons.
180 |   [davidcellis]
181 | 
182 | 0.14.0 (2016-11-04)
183 | -------------------
184 | 
185 | - Possible PEP-8 fix + make pep-8 warnings appear in test. [davidcellis]
186 | 
187 | - Possible PEP-8 fix. [davidcellis]
188 | 
189 | - Possible PEP-8 fix. [davidcellis]
190 | 
191 | - Test for stderr log instead of warning. [davidcellis]
192 | 
193 | - Convert warning.warn to logging.warning. [davidcellis]
194 | 
195 | - Additional details for empty string warning from process.
196 |   [davidcellis]
197 | 
198 |   String formatting fix for python 2.6
199 | 
200 | 
201 | - Enclose warnings.simplefilter() inside a with statement. [samkennerly]
202 | 
203 | 0.13.0 (2016-11-01)
204 | -------------------
205 | 
206 | - Support alternate git status output. [Jose Diaz-Gonzalez]
207 | 
208 | - Split warning test into new test file, added to travis execution on
209 |   2.6 / pypy3. [davidcellis]
210 | 
211 | - Remove hypothesis examples database from gitignore. [davidcellis]
212 | 
213 | - Add check for warning to tests. [davidcellis]
214 | 
215 |   Reordered test imports
216 | 
217 | 
218 | - Check processor and warn before scorer may remove processor.
219 |   [davidcellis]
220 | 
221 | - Renamed test - tidied docstring. [davidcellis]
222 | 
223 | - Add token ratios to the list of scorers that skip running full_process
224 |   as a processor. [davidcellis]
225 | 
226 | - Added token_sort, token_set to test. [davidcellis]
227 | 
228 | - Test docstrings/comments. [davidcellis]
229 | 
230 |   Removed redundant check from test.
231 | 
232 | 
233 | - Added py.test .cache/ removed duplicated build from gitignore.
234 |   [davidcellis]
235 | 
236 | - Added default_scorer, default_processor parameters to make it easier
237 |   to change in the future. [davidcellis]
238 | 
239 |   Added warning if the processor reduces the input query to an empty string.
240 | 
241 | 
242 | - Rewrote extracts to explicitly use default values for processor and
243 |   scorer. [davidcellis]
244 | 
245 | - Changed Hypothesis tests to use pytest parameters. [davidcellis]
246 | 
247 | - Added Hypothesis based tests for identical strings. [Ducksual]
248 | 
249 |   Added support for hypothesis to travis config.
250 |   Hypothesis based tests are skipped on Python 2.6 and pypy3.
251 | 
252 |   Added .hypothesis/ folder to gitignore
253 | 
254 | 
255 | - Added test for simple 'a, b' string on process.extractOne. [Ducksual]
256 | 
257 | - Process the query in process.extractWithoutOrder when using a scorer
258 |   which does not do so. [Ducksual]
259 | 
260 |   Closes #139
261 | 
262 | 
263 | - Mention that difflib and levenshtein results may differ. [Jose Diaz-
264 |   Gonzalez]
265 | 
266 |   Closes #128
267 | 
268 | 0.12.0 (2016-09-14)
269 | -------------------
270 | 
271 | - Declare support for universal wheels. [Thomas Grainger]
272 | 
273 | - Clarify that license is GPLv2. [Gareth Tan]
274 | 
275 | 0.11.1 (2016-07-27)
276 | -------------------
277 | 
278 | - Add editorconfig. [Jose Diaz-Gonzalez]
279 | 
280 | - Added tox.ini config file for easy local multi-environment testing;
281 |   changed travis config to use py.test like tox; updated use of pep8
282 |   module to pycodestyle. [Pedro Rodrigues]
283 | 
284 | 0.11.0 (2016-06-30)
285 | -------------------
286 | 
287 | - Clean-up. [desmaisons_david]
288 | 
289 | - Improving performance. [desmaisons_david]
290 | 
291 | - Performance Improvement. [desmaisons_david]
292 | 
293 | - Fix link to Levenshtein. [Brian J. McGuirk]
294 | 
295 | - Fix readme links. [Brian J. McGuirk]
296 | 
297 | - Add license to StringMatcher.py. [Jose Diaz-Gonzalez]
298 | 
299 |   Closes #113
300 | 
301 | 0.10.0 (2016-03-14)
302 | -------------------
303 | 
304 | - Handle None inputs same as empty string (Issue #94) [Nick Miller]
305 | 
306 | 0.9.0 (2016-03-07)
307 | ------------------
308 | 
309 | - Pull down all keys when updating local copy. [Jose Diaz-Gonzalez]
310 | 
311 | 0.8.2 (2016-02-26)
312 | ------------------
313 | 
314 | - Remove the warning for "slow" sequence matcher on PyPy. [Julian
315 |   Berman]
316 | 
317 |   where it's preferable to use the pure-python implementation.
318 | 
319 | 0.8.1 (2016-01-25)
320 | ------------------
321 | 
322 | - Minor release changes. [Jose Diaz-Gonzalez]
323 | 
324 | - Clean up wiki link in readme. [Ewan Oglethorpe]
325 | 
326 | 0.8.0 (2015-11-16)
327 | ------------------
328 | 
329 | - Refer to Levenshtein distance in readme. Closes #88. [Jose Diaz-
330 |   Gonzalez]
331 | 
332 | - Added install step for travis to have pep8 available. [Pedro
333 |   Rodrigues]
334 | 
335 | - Added a pep8 test. The way I add the error 501 to the ignore tuple is
336 |   probably wrong but from the docs and source code of pep8 I could not
337 |   find any other way. [Pedro Rodrigues]
338 | 
339 |   I also went ahead and removed the pep8 call from the release file.
340 | 
341 | 
342 | - Added python 3.5, pypy, and pypy3 to the travis config file. [Pedro
343 |   Rodrigues]
344 | 
345 | - Added another step to the release file to run the tests before
346 |   releasing. [Pedro Rodrigues]
347 | 
348 | - Fixed a few pep8 errors. Added a verification step in the release
349 |   automation file. This step should probably be somewhere at git level.
350 |   [Pedro Rodrigues]
351 | 
352 | - Pep8. [Pedro Rodrigues]
353 | 
354 | - Leaving TODOs in the code was never a good idea. [Pedro Rodrigues]
355 | 
356 | - Changed return values to be rounded integers. [Pedro Rodrigues]
357 | 
358 | - Added a test with the recovered data file. [Pedro Rodrigues]
359 | 
360 | - Recovered titledata.csv. [Pedro Rodrigues]
361 | 
362 | - Move extract test methods into the process test. [Shale Craig]
363 | 
364 |   Somehow, they ended up in the `RatioTest`, despite asserting that the
365 |   `ProcessTest` works.
366 | 
367 | 
368 | 0.7.0 (2015-10-02)
369 | ------------------
370 | 
371 | - Use portable syntax for catching exception on tests. [Luis Madrigal]
372 | 
373 | - [Fix] test against correct variable. [Luis Madrigal]
374 | 
375 | - Add unit tests for validator decorators. [Luis Madrigal]
376 | 
377 | - Move validators to decorator functions. [Luis Madrigal]
378 | 
379 |   This allows easier composition and IMO makes the functions more readable
380 | 
381 | 
382 | - Fix typo: dictionery -> dictionary. [shale]
383 | 
384 | - FizzyWuzzy -> TheFuzz typo correction. [shale]
385 | 
386 | - Add check for gitchangelog. [Jose Diaz-Gonzalez]
387 | 
388 | 0.6.2 (2015-09-03)
389 | ------------------
390 | 
391 | - Ensure the rst-lint binary is available. [Jose Diaz-Gonzalez]
392 | 
393 | 0.6.1 (2015-08-07)
394 | ------------------
395 | 
396 | - Minor whitespace changes for PEP8. [Jose Diaz-Gonzalez]
397 | 
398 | 0.6.0 (2015-07-20)
399 | ------------------
400 | 
401 | - Added link to a java port. [Andriy Burkov]
402 | 
403 | - Patched "name 'unicode' is not defined" python3. [Carlos Garay]
404 | 
405 |   https://github.com/seatgeek/thefuzz/issues/80
406 | 
407 | - Make process.extract accept {dict, list}-like choices. [Nathan
408 |   Typanski]
409 | 
410 |   Previously, process.extract expected lists or dictionaries, and tested
411 |   this with isinstance() calls. In keeping with the spirit of Python (duck
412 |   typing and all that), this change enables one to use extract() on any
413 |   dict-like object for dict-like results, or any list-like object for
414 |   list-like results.
415 | 
416 |   So now we can (and, indeed, I've added tests for these uses) call
417 |   extract() on things like:
418 | 
419 |   - a generator of strings ("any iterable")
420 |   - a UserDict
421 |   - custom user-made classes that "look like" dicts
422 |     (or, really, anything with a .items() method that behaves like a dict)
423 |   - plain old lists and dicts
424 | 
425 |   The behavior is exactly the same for previous use cases of
426 |   lists-and-dicts.
427 | 
428 |   This change goes along nicely with PR #68, since those docs suggest
429 |   dict-like behavior is valid, and this change makes that true.
430 | 
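The duck-typing idea described in the entry above can be sketched as follows. This is not the library's implementation: `extract_sketch` is a hypothetical name, `difflib.SequenceMatcher` stands in for the real scorer, and the tuple shapes returned for list-like versus dict-like input are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def extract_sketch(query, choices, limit=5):
    """Rank choices against query; accepts dict-like objects or any iterable."""
    try:
        # Dict-like: anything exposing .items() that yields (key, value).
        pairs = [(value, key) for key, value in choices.items()]
        dict_like = True
    except AttributeError:
        # List-like: any iterable of strings, including generators.
        pairs = [(value, None) for value in choices]
        dict_like = False

    scored = []
    for value, key in pairs:
        score = round(100 * SequenceMatcher(None, query, value).ratio())
        scored.append((value, score, key) if dict_like else (value, score))

    # Highest score first; sort() is stable, so ties keep input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


class LooksLikeDict:
    """Minimal custom class that only provides .items()."""

    def items(self):
        return [("a", "apple"), ("g", "grape")]


print(extract_sketch("apple", (s for s in ["grape", "apple"])))
print(extract_sketch("apple", LooksLikeDict()))
```

Probing for `.items()` instead of calling `isinstance()` is what lets generators, `UserDict` instances, and custom classes pass through the same code path as plain lists and dicts.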
431 | 
432 | - Merge conflict. [Adam Cohen]
433 | 
434 | - Improve docs for thefuzz.process. [Nathan Typanski]
435 | 
436 |   The documentation for this module was dated and sometimes inaccurate.
437 |   This overhauls the docs to accurately describe the current module,
438 |   including detailing optional arguments that were not previously
439 |   explained - e.g., limit argument to extract().
440 | 
441 |   This change follows the Google Python Style Guide, which may be found
442 |   at:
443 | 
444 |   <https://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Comments#Comments>
445 | 
446 | 
447 | 0.5.0 (2015-02-04)
448 | ------------------
449 | 
450 | - FIX: 0.4.0 is released, no need to specify 0.3.1 in README. [Josh
451 |   Warner (Mac)]
452 | 
453 | - Fixed a small typo. [Rostislav Semenov]
454 | 
455 | - Reset `processor` and `scorer` defaults to None with argument
456 |   checking. [foxxyz]
457 | 
458 | - Catch generators without lengths. [Jeremiah Lowin]
459 | 
460 | - Fixed python3 issue and deprecated assertion method. [foxxyz]
461 | 
462 | - Fixed some docstrings, typos, python3 string method compatibility,
463 |   some errors that crept in during rebase. [foxxyz]
464 | 
465 | - [mod] The lambda in extract is not needed. [Olivier Le Thanh Duong]
466 | 
467 |   [mod] Pass directly the defaults functions in the args
468 | 
469 |   [mod] itertools.takewhile() can handle empty list just fine no need to test for it
470 | 
471 |   [mod] Shorten extractOne by removing double if
472 | 
473 |   [mod] Use a list comprehension in extract()
474 | 
475 |   [mod] Autopep8 on process.py
476 | 
477 |   [doc] Document make_type_consistent
478 | 
479 |   [mod] bad_chars shortened
480 | 
481 |   [enh] Move regex compilation outside the method, otherwise we don't get the benefit from it
482 | 
483 |   [mod] Don't need all the blah just to redefine method from string module
484 | 
485 |   [mod] Remove unused import
486 | 
487 |   [mod] Autopep8 on string_processing.py
488 | 
489 |   [mod] Rewrote asciidammit without recursion to make it more readable
490 | 
491 |   [mod] Autopep8 on utils.py
492 | 
493 |   [mod] Remove unused import
494 | 
495 |   [doc] Add some doc to fuzz.py
496 | 
497 |   [mod] Move the code to sort string in a separate function
498 | 
499 |   [doc] Docstrings for WRatio, UWRatio
500 | 
501 | 
502 | - Add note on which package to install. Closes #67. [Jose Diaz-Gonzalez]
503 | 
504 | 0.4.0 (2014-10-31)
505 | ------------------
506 | 
507 | - In extractBests() and extractOne() use '>=' instead of '>' [Юрий
508 |   Пайков]
509 | 
510 | - Fixed python3 issue with SequenceMatcher import. [Юрий Пайков]
511 | 
512 | 0.3.3 (2014-10-22)
513 | ------------------
514 | 
515 | - Fixed issue #59 - "partial" parameter for `_token_set()` is now
516 |   honored. [Юрий Пайков]
517 | 
518 | - Catch generators without lengths. [Jeremiah Lowin]
519 | 
520 | - Remove explicit check for lists. [Jeremiah Lowin]
521 | 
522 |   The logic in `process.extract()` should support any Python sequence/iterable. The explicit check for lists is unnecessary and limiting (for example, it forces conversion of generators and other iterable classes to lists).
523 | 
524 | 0.3.2 (2014-09-12)
525 | ------------------
526 | 
527 | - Make release command an executable. [Jose Diaz-Gonzalez]
528 | 
529 | - Simplify MANIFEST.in. [Jose Diaz-Gonzalez]
530 | 
531 | - Add a release script. [Jose Diaz-Gonzalez]
532 | 
533 | - Fix readme codeblock. [Jose Diaz-Gonzalez]
534 | 
535 | - Minor formatting. [Jose Diaz-Gonzalez]
536 | 
537 | - Use __version__ from thefuzz package. [Jose Diaz-Gonzalez]
538 | 
539 | - Set __version__ constant in __init__.py. [Jose Diaz-Gonzalez]
540 | 
541 | - Rename LICENSE to LICENSE.txt. [Jose Diaz-Gonzalez]
542 | 
543 | 0.3.0 (2014-08-24)
544 | ------------------
545 | 
546 | - Test dict input to extractOne() [jamesnunn]
547 | 
548 | - Remove whitespace. [jamesnunn]
549 | 
550 | - Choices parameter for extract() accepts both dict and list objects.
551 |   [jamesnunn]
552 | 
553 | - Enable automated testing with Python 3.4. [Corey Farwell]
554 | 
555 | - Fixed typo: lettters -> letters. [Tal Einat]
556 | 
557 | - Fixing LICENSE and README's license info. [Dallas Gutauckis]
558 | 
559 | - Proper ordered list. [Jeff Paine]
560 | 
561 | - Convert README to rst. [Jeff Paine]
562 | 
563 | - Add requirements.txt per discussion in #44. [Jeff Paine]
564 | 
565 | - Add LICENSE TO MANIFEST.in. [Jeff Paine]
566 | 
567 | - Rename tests.py to more common test_thefuzz.py. [Jeff Paine]
568 | 
569 | - Add proper MANIFEST template. [Jeff Paine]
570 | 
571 | - Remove MANIFEST file Not meant to be kept in version control. [Jeff
572 |   Paine]
573 | 
574 | - Remove unused file. [Jeff Paine]
575 | 
576 | - Pep8. [Jeff Paine]
577 | 
578 | - Pep8 formatting. [Jeff Paine]
579 | 
580 | - Pep8 formatting. [Jeff Paine]
581 | 
582 | - Pep8 indentations. [Jeff Paine]
583 | 
584 | - Pep8 cleanup. [Jeff Paine]
585 | 
586 | - Pep8. [Jeff Paine]
587 | 
588 | - Pep8 cleanup. [Jeff Paine]
589 | 
590 | - Pep8 cleanup. [Jeff Paine]
591 | 
592 | - Pep8 import style. [Jeff Paine]
593 | 
594 | - Pep8 import ordering. [Jeff Paine]
595 | 
596 | - Pep8 import ordering. [Jeff Paine]
597 | 
598 | - Remove unused module. [Jeff Paine]
599 | 
600 | - Pep8 import ordering. [Jeff Paine]
601 | 
602 | - Remove unused module. [Jeff Paine]
603 | 
604 | - Pep8 import ordering. [Jeff Paine]
605 | 
606 | - Remove unused imports. [Jeff Paine]
607 | 
608 | - Remove unused module. [Jeff Paine]
609 | 
610 | - Remove import * where present. [Jeff Paine]
611 | 
612 | - Avoid import * [Jeff Paine]
613 | 
614 | - Add Travis CI badge. [Jeff Paine]
615 | 
616 | - Remove python 2.4, 2.5 from Travis (not supported) [Jeff Paine]
617 | 
618 | - Add python 2.4 and 2.5 to Travis. [Jeff Paine]
619 | 
620 | - Add all supported python versions to travis. [Jeff Paine]
621 | 
622 | - Bump minor version number. [Jeff Paine]
623 | 
624 | - Add classifiers for python versions. [Jeff Paine]
625 | 
626 | - Added note about python-Levenshtein speedup. Closes #34. [Jose Diaz-
627 |   Gonzalez]
628 | 
629 | - Fixed tests on 2.6. [Grigi]
630 | 
631 | - Fixed py2.6. [Grigi]
632 | 
633 | - Force bad_chars to ascii. [Grigi]
634 | 
635 | - Since importing unicode_literals, u decorator not required on strings
636 |   from py2.6 and up. [Grigi]
637 | 
638 | - Py3 support without 2to3. [Grigi]
639 | 
640 | - Created: Added .travis.yml. [futoase]
641 | 
642 | - [enh] Add docstrings to process.py. [Olivier Le Thanh Duong]
643 | 
644 |   Turn the existing comments into docstrings so they can be seen via introspection
645 | 
646 | 
647 | - Don't condense multiple punctuation characters to a single whitespace.
648 |   this is a behavioral change. [Adam Cohen]
649 | 
650 | - UQRatio and UWRatio shorthands. [Adam Cohen]
651 | 
652 | - Version 0.2. [Adam Cohen]
653 | 
654 | - Unicode/string comparison bug. [Adam Cohen]
655 | 
656 | - To maintain backwards compatibility, default is to force_ascii as
657 |   before. [Adam Cohen]
658 | 
659 | - Fix merge conflict. [Adam Cohen]
660 | 
661 | - New process function: extractBests. [Flávio Juvenal]
662 | 
663 | - More readable reverse sorting. [Flávio Juvenal]
664 | 
665 | - Further honoring of force_ascii. [Adam Cohen]
666 | 
667 | - Indentation fix. [Adam Cohen]
668 | 
669 | - Handle force_ascii in fuzz methods. [Adam Cohen]
670 | 
671 | - Add back relevant tests. [Adam Cohen]
672 | 
673 | - Utility method to make things consistent. [Adam Cohen]
674 | 
675 | - Re-commit asciidammit and add a parameter to full_process to determine
676 |   behavior. [Adam Cohen]
677 | 
678 | - Added a test for non letters/digits replacements. [Tristan Launay]
679 | 
680 | - ENG-741 fixed benchmark line length. [Laurent Erignoux]
681 | 
682 | - Fixed Unicode flag for tests. [Tristan Launay]
683 | 
684 | - ENG-741 commented code removed not erased for review from creator.
685 |   [Laurent Erignoux]
686 | 
687 | - ENG-741 cut long lines in fuzzy wizzy benchmark. [Laurent Erignoux]
688 | 
689 | - Re-upped the limit on benchmark, now that performance is not an issue
690 |   anymore. [Tristan Launay]
691 | 
692 | - Fixed comment. [Tristan Launay]
693 | 
694 | - Simplified processing of strings with built-in regex code in python.
695 |   Also fixed empty string detection in token_sort_ratio. [Tristan
696 |   Launay]
697 | 
698 | - Proper benchmark display. Introduce methods to explicitly do all the
699 |   unicode preprocessing *before* using fuzz lib. [Tristan Launay]
700 | 
701 | - ENG-741: having a true benchmark, to see when we improve stuff.
702 |   [Benjamin Combourieu]
703 | 
704 | - Unicode support in benchmark.py. [Benjamin Combourieu]
705 | 
706 | - Added file for processing strings. [Tristan Launay]
707 | 
708 | - Uniform treatment of strings in Unicode. Non-ASCII chars are now
709 |   considered in strings, which allows for matches in Cyrillic, Chinese,
710 |   Greek, etc. [Tristan Launay]
711 | 
712 | - Fixed bug in _token_set. [Michael Edward]
713 | 
714 | - Removed reference to PR. [Jose Diaz-Gonzalez]
715 | 
716 | - Sdist build and virtualenv dirs are not part of the project. [Pedro
717 |   Rodrigues]
718 | 
719 | - Fixes https://github.com/seatgeek/thefuzz/issues/10 and correctly
720 |   points to README.textile. [Pedro Rodrigues]
721 | 
722 | - Info on the pull request. [Pedro Rodrigues]
723 | 
724 | - Pullstat.us button. [Pedro Rodrigues]
725 | 
726 | - Fuzzywuzzy really needs better benchmarks. [Pedro Rodrigues]
727 | 
728 | - Moved tests and benchmarks out of the package. [Pedro Rodrigues]
729 | 
730 | - Report better ratio()s redundant import try. [Pedro Rodrigues]
731 | 
732 | - AssertGreater did not exist in python 2.4. [Pedro Rodrigues]
733 | 
734 | - Remove debug output. [Adam Cohen]
735 | 
736 | - Looks for python-Levenshtein package, and if present, uses that
737 |   instead of difflib. 10x speedup if present. add benchmarks. [Adam
738 |   Cohen]
739 | 
740 | - Add gitignore. [Adam Cohen]
741 | 
742 | - Fix a bug in WRatio, as well as an issue in full_process, which was
743 |   failing on strings with all unicode characters. [Adam Cohen]
744 | 
745 | - Error in partial_ratio. closes #7. [Adam Cohen]
746 | 
747 | - Adding some real-life event data for benchmarking. [Adam Cohen]
748 | 
749 | - Cleaned up utils.py. [Pedro Rodrigues]
750 | 
751 | - Optimized speed for full_process() [Pedro Rodrigues]
752 | 
753 | - Speed improvements to asciidammit. [Pedro Rodrigues]
754 | 
755 | - Removed old versions of validate_string() and remove_ponctuation()
756 |   kept from previous commits. [Pedro Rodrigues]
757 | 
758 | - Issue #6 from github updated license headers to match MIT license.
759 |   [Pedro Rodrigues]
760 | 
761 | - Clean up. [Pedro Rodrigues]
762 | 
763 | - Changes to utils.validate_string() and benchmarks. [Pedro Rodrigues]
764 | 
765 | - Some benchmarks to test the changes made to remove_punctuation. [Pedro
766 |   Rodrigues]
767 | 
768 | - Faster remove_punctuation. [Pedro Rodrigues]
769 | 
770 | - AssertIsNone did not exist in Python 2.4. [Pedro Rodrigues]
771 | 
772 | - Just adding some simple install instructions for pip. [Chris Dary]
773 | 
774 | - Check for null/empty strings in QRatio and WRatio. Add tests. Closes
775 |   #3. [Adam Cohen]
776 | 
777 | - More README. [Adam Cohen]
778 | 
779 | - README. [Adam Cohen]
780 | 
781 | - README. [Adam Cohen]
782 | 
783 | - Slight change to README. [Adam Cohen]
784 | 
785 | - Some readme. [Adam Cohen]
786 | 
787 | - Distutils. [Adam Cohen]
788 | 
789 | - Change directory structure. [Adam Cohen]
790 | 
791 | - Initial commit. [Adam Cohen]
792 | 
793 | 
794 | 


--------------------------------------------------------------------------------
/thefuzz-master/LICENSE.txt:
--------------------------------------------------------------------------------
  1 | 
  2 |                     GNU GENERAL PUBLIC LICENSE
  3 |                        Version 2, June 1991
  4 | 
  5 |  Copyright (C) 1989, 1991 Free Software Foundation, Inc.
  6 |     59 Temple Place, Suite 330, Boston, MA 02111 USA
  7 |  Everyone is permitted to copy and distribute verbatim copies
  8 |  of this license document, but changing it is not allowed.
  9 | 
 10 |                             Preamble
 11 | 
 12 |   The licenses for most software are designed to take away your
 13 | freedom to share and change it.  By contrast, the GNU General Public
 14 | License is intended to guarantee your freedom to share and change free
 15 | software--to make sure the software is free for all its users.  This
 16 | General Public License applies to most of the Free Software
 17 | Foundation's software and to any other program whose authors commit to
 18 | using it.  (Some other Free Software Foundation software is covered by
 19 | the GNU Library General Public License instead.)  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | this service if you wish), that you receive source code or can get it
 26 | if you want it, that you can change the software or use pieces of it
 27 | in new free programs; and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to make restrictions that forbid
 30 | anyone to deny you these rights or to ask you to surrender the rights.
 31 | These restrictions translate to certain responsibilities for you if you
 32 | distribute copies of the software, or if you modify it.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must give the recipients all the rights that
 36 | you have.  You must make sure that they, too, receive or can get the
 37 | source code.  And you must show them these terms so they know their
 38 | rights.
 39 | 
 40 |   We protect your rights with two steps: (1) copyright the software, and
 41 | (2) offer you this license which gives you legal permission to copy,
 42 | distribute and/or modify the software.
 43 | 
 44 |   Also, for each author's protection and ours, we want to make certain
 45 | that everyone understands that there is no warranty for this free
 46 | software.  If the software is modified by someone else and passed on, we
 47 | want its recipients to know that what they have is not the original, so
 48 | that any problems introduced by others will not reflect on the original
 49 | authors' reputations.
 50 | 
 51 |   Finally, any free program is threatened constantly by software
 52 | patents.  We wish to avoid the danger that redistributors of a free
 53 | program will individually obtain patent licenses, in effect making the
 54 | program proprietary.  To prevent this, we have made it clear that any
 55 | patent must be licensed for everyone's free use or not licensed at all.
 56 | 
 57 |   The precise terms and conditions for copying, distribution and
 58 | modification follow.
 59 | 
 60 |                     GNU GENERAL PUBLIC LICENSE
 61 |    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 62 | 
 63 |   0. This License applies to any program or other work which contains
 64 | a notice placed by the copyright holder saying it may be distributed
 65 | under the terms of this General Public License.  The "Program", below,
 66 | refers to any such program or work, and a "work based on the Program"
 67 | means either the Program or any derivative work under copyright law:
 68 | that is to say, a work containing the Program or a portion of it,
 69 | either verbatim or with modifications and/or translated into another
 70 | language.  (Hereinafter, translation is included without limitation in
 71 | the term "modification".)  Each licensee is addressed as "you".
 72 | 
 73 | Activities other than copying, distribution and modification are not
 74 | covered by this License; they are outside its scope.  The act of
 75 | running the Program is not restricted, and the output from the Program
 76 | is covered only if its contents constitute a work based on the
 77 | Program (independent of having been made by running the Program).
 78 | Whether that is true depends on what the Program does.
 79 | 
 80 |   1. You may copy and distribute verbatim copies of the Program's
 81 | source code as you receive it, in any medium, provided that you
 82 | conspicuously and appropriately publish on each copy an appropriate
 83 | copyright notice and disclaimer of warranty; keep intact all the
 84 | notices that refer to this License and to the absence of any warranty;
 85 | and give any other recipients of the Program a copy of this License
 86 | along with the Program.
 87 | 
 88 | You may charge a fee for the physical act of transferring a copy, and
 89 | you may at your option offer warranty protection in exchange for a fee.
 90 | 
 91 |   2. You may modify your copy or copies of the Program or any portion
 92 | of it, thus forming a work based on the Program, and copy and
 93 | distribute such modifications or work under the terms of Section 1
 94 | above, provided that you also meet all of these conditions:
 95 | 
 96 |     a) You must cause the modified files to carry prominent notices
 97 |     stating that you changed the files and the date of any change.
 98 | 
 99 |     b) You must cause any work that you distribute or publish, that in
100 |     whole or in part contains or is derived from the Program or any
101 |     part thereof, to be licensed as a whole at no charge to all third
102 |     parties under the terms of this License.
103 | 
104 |     c) If the modified program normally reads commands interactively
105 |     when run, you must cause it, when started running for such
106 |     interactive use in the most ordinary way, to print or display an
107 |     announcement including an appropriate copyright notice and a
108 |     notice that there is no warranty (or else, saying that you provide
109 |     a warranty) and that users may redistribute the program under
110 |     these conditions, and telling the user how to view a copy of this
111 |     License.  (Exception: if the Program itself is interactive but
112 |     does not normally print such an announcement, your work based on
113 |     the Program is not required to print an announcement.)
114 | 
115 | These requirements apply to the modified work as a whole.  If
116 | identifiable sections of that work are not derived from the Program,
117 | and can be reasonably considered independent and separate works in
118 | themselves, then this License, and its terms, do not apply to those
119 | sections when you distribute them as separate works.  But when you
120 | distribute the same sections as part of a whole which is a work based
121 | on the Program, the distribution of the whole must be on the terms of
122 | this License, whose permissions for other licensees extend to the
123 | entire whole, and thus to each and every part regardless of who wrote it.
124 | 
125 | Thus, it is not the intent of this section to claim rights or contest
126 | your rights to work written entirely by you; rather, the intent is to
127 | exercise the right to control the distribution of derivative or
128 | collective works based on the Program.
129 | 
130 | In addition, mere aggregation of another work not based on the Program
131 | with the Program (or with a work based on the Program) on a volume of
132 | a storage or distribution medium does not bring the other work under
133 | the scope of this License.
134 | 
135 |   3. You may copy and distribute the Program (or a work based on it,
136 | under Section 2) in object code or executable form under the terms of
137 | Sections 1 and 2 above provided that you also do one of the following:
138 | 
139 |     a) Accompany it with the complete corresponding machine-readable
140 |     source code, which must be distributed under the terms of Sections
141 |     1 and 2 above on a medium customarily used for software interchange; or,
142 | 
143 |     b) Accompany it with a written offer, valid for at least three
144 |     years, to give any third party, for a charge no more than your
145 |     cost of physically performing source distribution, a complete
146 |     machine-readable copy of the corresponding source code, to be
147 |     distributed under the terms of Sections 1 and 2 above on a medium
148 |     customarily used for software interchange; or,
149 | 
150 |     c) Accompany it with the information you received as to the offer
151 |     to distribute corresponding source code.  (This alternative is
152 |     allowed only for noncommercial distribution and only if you
153 |     received the program in object code or executable form with such
154 |     an offer, in accord with Subsection b above.)
155 | 
156 | The source code for a work means the preferred form of the work for
157 | making modifications to it.  For an executable work, complete source
158 | code means all the source code for all modules it contains, plus any
159 | associated interface definition files, plus the scripts used to
160 | control compilation and installation of the executable.  However, as a
161 | special exception, the source code distributed need not include
162 | anything that is normally distributed (in either source or binary
163 | form) with the major components (compiler, kernel, and so on) of the
164 | operating system on which the executable runs, unless that component
165 | itself accompanies the executable.
166 | 
167 | If distribution of executable or object code is made by offering
168 | access to copy from a designated place, then offering equivalent
169 | access to copy the source code from the same place counts as
170 | distribution of the source code, even though third parties are not
171 | compelled to copy the source along with the object code.
172 | 
173 |   4. You may not copy, modify, sublicense, or distribute the Program
174 | except as expressly provided under this License.  Any attempt
175 | otherwise to copy, modify, sublicense or distribute the Program is
176 | void, and will automatically terminate your rights under this License.
177 | However, parties who have received copies, or rights, from you under
178 | this License will not have their licenses terminated so long as such
179 | parties remain in full compliance.
180 | 
181 |   5. You are not required to accept this License, since you have not
182 | signed it.  However, nothing else grants you permission to modify or
183 | distribute the Program or its derivative works.  These actions are
184 | prohibited by law if you do not accept this License.  Therefore, by
185 | modifying or distributing the Program (or any work based on the
186 | Program), you indicate your acceptance of this License to do so, and
187 | all its terms and conditions for copying, distributing or modifying
188 | the Program or works based on it.
189 | 
190 |   6. Each time you redistribute the Program (or any work based on the
191 | Program), the recipient automatically receives a license from the
192 | original licensor to copy, distribute or modify the Program subject to
193 | these terms and conditions.  You may not impose any further
194 | restrictions on the recipients' exercise of the rights granted herein.
195 | You are not responsible for enforcing compliance by third parties to
196 | this License.
197 | 
198 |   7. If, as a consequence of a court judgment or allegation of patent
199 | infringement or for any other reason (not limited to patent issues),
200 | conditions are imposed on you (whether by court order, agreement or
201 | otherwise) that contradict the conditions of this License, they do not
202 | excuse you from the conditions of this License.  If you cannot
203 | distribute so as to satisfy simultaneously your obligations under this
204 | License and any other pertinent obligations, then as a consequence you
205 | may not distribute the Program at all.  For example, if a patent
206 | license would not permit royalty-free redistribution of the Program by
207 | all those who receive copies directly or indirectly through you, then
208 | the only way you could satisfy both it and this License would be to
209 | refrain entirely from distribution of the Program.
210 | 
211 | If any portion of this section is held invalid or unenforceable under
212 | any particular circumstance, the balance of the section is intended to
213 | apply and the section as a whole is intended to apply in other
214 | circumstances.
215 | 
216 | It is not the purpose of this section to induce you to infringe any
217 | patents or other property right claims or to contest validity of any
218 | such claims; this section has the sole purpose of protecting the
219 | integrity of the free software distribution system, which is
220 | implemented by public license practices.  Many people have made
221 | generous contributions to the wide range of software distributed
222 | through that system in reliance on consistent application of that
223 | system; it is up to the author/donor to decide if he or she is willing
224 | to distribute software through any other system and a licensee cannot
225 | impose that choice.
226 | 
227 | This section is intended to make thoroughly clear what is believed to
228 | be a consequence of the rest of this License.
229 | 
230 |   8. If the distribution and/or use of the Program is restricted in
231 | certain countries either by patents or by copyrighted interfaces, the
232 | original copyright holder who places the Program under this License
233 | may add an explicit geographical distribution limitation excluding
234 | those countries, so that distribution is permitted only in or among
235 | countries not thus excluded.  In such case, this License incorporates
236 | the limitation as if written in the body of this License.
237 | 
238 |   9. The Free Software Foundation may publish revised and/or new versions
239 | of the General Public License from time to time.  Such new versions will
240 | be similar in spirit to the present version, but may differ in detail to
241 | address new problems or concerns.
242 | 
243 | Each version is given a distinguishing version number.  If the Program
244 | specifies a version number of this License which applies to it and "any
245 | later version", you have the option of following the terms and conditions
246 | either of that version or of any later version published by the Free
247 | Software Foundation.  If the Program does not specify a version number of
248 | this License, you may choose any version ever published by the Free Software
249 | Foundation.
250 | 
251 |   10. If you wish to incorporate parts of the Program into other free
252 | programs whose distribution conditions are different, write to the author
253 | to ask for permission.  For software which is copyrighted by the Free
254 | Software Foundation, write to the Free Software Foundation; we sometimes
255 | make exceptions for this.  Our decision will be guided by the two goals
256 | of preserving the free status of all derivatives of our free software and
257 | of promoting the sharing and reuse of software generally.
258 | 
259 |                             NO WARRANTY
260 | 
261 |   11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
262 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
263 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
264 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
265 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
266 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
267 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
268 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
269 | REPAIR OR CORRECTION.
270 | 
271 |   12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
272 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
273 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
274 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
275 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
276 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
277 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
278 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
279 | POSSIBILITY OF SUCH DAMAGES.
280 | 
281 |                      END OF TERMS AND CONDITIONS
282 | 
283 |         Appendix: How to Apply These Terms to Your New Programs
284 | 
285 |   If you develop a new program, and you want it to be of the greatest
286 | possible use to the public, the best way to achieve this is to make it
287 | free software which everyone can redistribute and change under these terms.
288 | 
289 |   To do so, attach the following notices to the program.  It is safest
290 | to attach them to the start of each source file to most effectively
291 | convey the exclusion of warranty; and each file should have at least
292 | the "copyright" line and a pointer to where the full notice is found.
293 | 
294 |     <one line to give the program's name and a brief idea of what it does.>
295 |     Copyright (C) 19yy  <name of author>
296 | 
297 |     This program is free software; you can redistribute it and/or modify
298 |     it under the terms of the GNU General Public License as published by
299 |     the Free Software Foundation; either version 2 of the License, or
300 |     (at your option) any later version.
301 | 
302 |     This program is distributed in the hope that it will be useful,
303 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
304 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
305 |     GNU General Public License for more details.
306 | 
307 |     You should have received a copy of the GNU General Public License
308 |     along with this program; if not, write to the Free Software
309 |     Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111 USA
310 | 
311 | Also add information on how to contact you by electronic and paper mail.
312 | 
313 | If the program is interactive, make it output a short notice like this
314 | when it starts in an interactive mode:
315 | 
316 |     Gnomovision version 69, Copyright (C) 19yy name of author
317 |     Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
318 |     This is free software, and you are welcome to redistribute it
319 |     under certain conditions; type `show c' for details.
320 | 
321 | The hypothetical commands `show w' and `show c' should show the appropriate
322 | parts of the General Public License.  Of course, the commands you use may
323 | be called something other than `show w' and `show c'; they could even be
324 | mouse-clicks or menu items--whatever suits your program.
325 | 
326 | You should also get your employer (if you work as a programmer) or your
327 | school, if any, to sign a "copyright disclaimer" for the program, if
328 | necessary.  Here is a sample; alter the names:
329 | 
330 |   Yoyodyne, Inc., hereby disclaims all copyright interest in the program
331 |   `Gnomovision' (which makes passes at compilers) written by James Hacker.
332 | 
333 |   <signature of Ty Coon>, 1 April 1989
334 |   Ty Coon, President of Vice
335 | 
336 | This General Public License does not permit incorporating your program into
337 | proprietary programs.  If your program is a subroutine library, you may
338 | consider it more useful to permit linking proprietary applications with the
339 | library.  If this is what you want to do, use the GNU Library General
340 | Public License instead of this License.


--------------------------------------------------------------------------------
/thefuzz-master/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include *.txt
2 | include *.rst
3 | include test_thefuzz.py
4 | 


--------------------------------------------------------------------------------
/thefuzz-master/README.rst:
--------------------------------------------------------------------------------
  1 | .. image:: https://github.com/seatgeek/thefuzz/actions/workflows/ci.yml/badge.svg
  2 |     :target: https://github.com/seatgeek/thefuzz
  3 | 
  4 | TheFuzz
  5 | =======
  6 | 
  7 | Fuzzy string matching like a boss. It uses `Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`_ to calculate the differences between sequences in a simple-to-use package.
  8 | 
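To make the Levenshtein idea concrete, here is a minimal pure-Python sketch. It is an illustration only and not the code this package runs, since scoring is delegated to rapidfuzz. The names `edit_distance` and `similarity` are hypothetical. Scaling the distance to a 0-100 score by charging 2 for a substitution and dividing by the combined length of both strings is an assumption about how such a ratio can be normalized.

```python
def edit_distance(a, b, substitution_cost=1):
    """Edit distance between a and b, computed one row at a time.

    With substitution_cost=1 this is the classic Levenshtein distance.
    With substitution_cost=2 a substitution costs the same as a delete
    followed by an insert.
    """
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else substitution_cost
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 100 means identical, 0 means nothing shared."""
    total = len(a) + len(b)
    if total == 0:
        return 100
    distance = edit_distance(a, b, substitution_cost=2)
    return round(100 * (1 - distance / total))


print(edit_distance("kitten", "sitting"))                 # 3
print(similarity("this is a test", "this is a test!"))    # 97
```

Under these assumptions the sketch returns 97 for the Simple Ratio pair and 91 for the first Token Sort Ratio pair shown in the Usage section. Treat that agreement as a sanity check rather than a guarantee about the library's internals.
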
  9 | Requirements
 10 | ============
 11 | 
 12 | -  Python 3.8 or higher
 13 | -  `rapidfuzz <https://github.com/maxbachmann/RapidFuzz/>`_
 14 | 
 15 | For testing
 16 | ~~~~~~~~~~~
 17 | -  pycodestyle
 18 | -  hypothesis
 19 | -  pytest
 20 | 
 21 | Installation
 22 | ============
 23 | 
 24 | Using pip via PyPI
 25 | 
 26 | .. code:: bash
 27 | 
 28 |     pip install thefuzz
 29 | 
 30 | 
 31 | Using pip via GitHub
 32 | 
 33 | .. code:: bash
 34 | 
 35 |     pip install git+https://github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz
 36 | 
 37 | Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)
 38 | 
 39 | .. code:: bash
 40 | 
 41 |     git+ssh://git@github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz
 42 | 
 43 | Manually via GIT
 44 | 
 45 | .. code:: bash
 46 | 
 47 |     git clone https://github.com/seatgeek/thefuzz.git thefuzz
 48 |     cd thefuzz
 49 |     python setup.py install
 50 | 
 51 | 
 52 | Usage
 53 | =====
 54 | 
 55 | .. code:: python
 56 | 
 57 |     >>> from thefuzz import fuzz
 58 |     >>> from thefuzz import process
 59 | 
 60 | Simple Ratio
 61 | ~~~~~~~~~~~~
 62 | 
 63 | .. code:: python
 64 | 
 65 |     >>> fuzz.ratio("this is a test", "this is a test!")
 66 |         97
 67 | 
 68 | Partial Ratio
 69 | ~~~~~~~~~~~~~
 70 | 
 71 | .. code:: python
 72 | 
 73 |     >>> fuzz.partial_ratio("this is a test", "this is a test!")
 74 |         100
 75 | 
 76 | Token Sort Ratio
 77 | ~~~~~~~~~~~~~~~~
 78 | 
 79 | .. code:: python
 80 | 
 81 |     >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
 82 |         91
 83 |     >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
 84 |         100
 85 | 
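The jump from 91 to 100 in the example above comes from ignoring word order. A rough sketch of that idea, using only the standard library: normalize each string, sort its tokens, then compare the rebuilt strings. The helper names `sort_tokens` and `token_sort_similarity` are hypothetical, and `difflib.SequenceMatcher` stands in for the real scorer, so scores for non-identical strings may differ from what the package reports.

```python
import re
from difflib import SequenceMatcher


def sort_tokens(text):
    """Lowercase, turn punctuation into spaces, and sort the words."""
    cleaned = re.sub(r"[\W_]+", " ", text.lower())
    return " ".join(sorted(cleaned.split()))


def token_sort_similarity(a, b):
    """Compare two strings after token sorting, on a 0-100 scale."""
    matcher = SequenceMatcher(None, sort_tokens(a), sort_tokens(b))
    return round(100 * matcher.ratio())


print(sort_tokens("fuzzy wuzzy was a bear"))  # a bear fuzzy was wuzzy
print(token_sort_similarity("fuzzy wuzzy was a bear",
                            "wuzzy fuzzy was a bear"))  # 100
```

Both inputs collapse to the same sorted string, which is why reordering words no longer lowers the score.
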
 86 | Token Set Ratio
 87 | ~~~~~~~~~~~~~~~
 88 | 
 89 | .. code:: python
 90 | 
 91 |     >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
 92 |         84
 93 |     >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
 94 |         100
 95 | 
 96 | Partial Token Sort Ratio
 97 | ~~~~~~~~~~~~~~~~~~~~~~~~
 98 | 
 99 | .. code:: python
100 | 
101 |     >>> fuzz.token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
102 |         84
103 |     >>> fuzz.partial_token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
104 |         100
105 | 
106 | Process
107 | ~~~~~~~
108 | 
109 | .. code:: python
110 | 
111 |     >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
112 |     >>> process.extract("new york jets", choices, limit=2)
113 |         [('New York Jets', 100), ('New York Giants', 78)]
114 |     >>> process.extractOne("cowboys", choices)
115 |         ("Dallas Cowboys", 90)
116 | 
117 | You can also pass additional parameters to the ``extractOne`` method to make it use a specific scorer. A typical use case is matching file paths (here ``songs`` is assumed to be a list of paths):
118 | 
119 | .. code:: python
120 | 
121 |     >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
122 |         ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
123 |     >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
124 |         ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
125 | 
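The role of the ``scorer`` argument can be sketched as a plain "score every choice, sort, truncate" loop. This is a simplified stand-in and not the package's implementation: `extract_best`, `simple_ratio`, and `exact_match` are hypothetical names, and `difflib.SequenceMatcher` replaces the real scoring backend.

```python
from difflib import SequenceMatcher


def simple_ratio(query, choice):
    """Assumed default scorer: case-insensitive similarity from 0 to 100."""
    return round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio())


def extract_best(query, choices, scorer=simple_ratio, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def exact_match(query, choice):
    """A stricter scorer: 100 only when the strings match ignoring case."""
    return 100 if query.lower() == choice.lower() else 0


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_best("new york jets", choices, limit=2))
print(extract_best("dallas cowboys", choices, scorer=exact_match, limit=1))
```

Swapping the scorer changes the ranking without touching the extraction loop, which is the design point the ``scorer`` parameter illustrates.
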
126 | .. |Build Status| image:: https://github.com/seatgeek/thefuzz/actions/workflows/ci.yml/badge.svg
127 |    :target: https://github.com/seatgeek/thefuzz
128 | 


--------------------------------------------------------------------------------
/thefuzz-master/benchmarks.py:
--------------------------------------------------------------------------------
  1 | from timeit import timeit
  2 | import math
  3 | import csv
  4 | 
  5 | iterations = 100000
  6 | 
  7 | 
  8 | with open('data/titledata.csv', newline='') as f:
  9 |     titles = [i['custom_title'] for i in csv.DictReader(f, delimiter='|')]
 10 | title_blob = '\n'.join(titles)
 11 | 
 12 | 
 13 | cirque_strings = [
 14 |     "cirque du soleil - zarkana - las vegas",
 15 |     "cirque du soleil ",
 16 |     "cirque du soleil las vegas",
 17 |     "zarkana las vegas",
 18 |     "las vegas cirque du soleil at the bellagio",
 19 |     "zarakana - cirque du soleil - bellagio"
 20 | ]
 21 | 
 22 | choices = [
 23 |     "",
 24 |     "new york yankees vs boston red sox",
 25 |     "",
 26 |     "zarakana - cirque du soleil - bellagio",
 27 |     None,
 28 |     "cirque du soleil las vegas",
 29 |     None
 30 | ]
 31 | 
 32 | mixed_strings = [
 33 |     "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
 34 |     "C\\'est la vie",
 35 |     "Ça va?",
 36 |     "Cães danados",
 37 |     "\xacCamarões assados",
 38 |     "a\xac\u1234\u20ac\U00008000"
 39 | ]
 40 | 
 41 | common_setup = "from thefuzz import fuzz, utils; "
 42 | 
 43 | 
 44 | def print_result_from_timeit(stmt='pass', setup='pass', number=1000000):
 45 |     """
 46 |     Time one statement with timeit and print the total and average run time.
 47 |     """
 48 |     units = ["s", "ms", "us", "ns"]
 49 |     duration = timeit(stmt, setup, number=int(number))
 50 |     avg_duration = duration / float(number)
 51 |     thousands = int(math.floor(math.log(avg_duration, 1000)))
 52 | 
 53 |     print("Total time: {:f}s. Average run: {:.3f}{}.".format(
 54 |         duration, avg_duration * (1000 ** -thousands), units[-thousands]))
 55 | 
 56 | 
 57 | for s in mixed_strings + cirque_strings + choices:
 58 |     print('Test full_process for: "%s"' % s)
 59 |     print_result_from_timeit('utils.full_process(u\'%s\')' % s,
 60 |                              common_setup, number=iterations)
 61 | 
 62 | # benchmarking the core matching methods...
 63 | 
 64 | for s in cirque_strings:
 65 |     print('Test fuzz.ratio for string: "%s"' % s)
 66 |     print('-------------------------------')
 67 |     print_result_from_timeit('fuzz.ratio(u\'cirque du soleil\', u\'%s\')' % s,
 68 |                              common_setup, number=iterations / 100)
 69 | 
 70 | for s in cirque_strings:
 71 |     print('Test fuzz.partial_ratio for string: "%s"' % s)
 72 |     print('-------------------------------')
 73 |     print_result_from_timeit('fuzz.partial_ratio(u\'cirque du soleil\', u\'%s\')'
 74 |                              % s, common_setup, number=iterations / 100)
 75 | 
 76 | for s in cirque_strings:
 77 |     print('Test fuzz.WRatio for string: "%s"' % s)
 78 |     print('-------------------------------')
 79 |     print_result_from_timeit('fuzz.WRatio(u\'cirque du soleil\', u\'%s\')' % s,
 80 |                              common_setup, number=iterations / 100)
 81 | 
 82 | print('Test process.extract(scorer =  fuzz.QRatio) for string: "cirque du soleil"')
 83 | print('-------------------------------')
 84 | print_result_from_timeit('process.extract(u\'cirque du soleil\', choices, scorer =  fuzz.QRatio)',
 85 |                              common_setup + " from thefuzz import process; import string,random; random.seed(18);"
 86 |                              " choices = [\'\'.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(30)) for s in range(5000)]",
 87 |                               number=10)
 88 | 
 89 | print('Test process.extract(scorer =  fuzz.WRatio) for string: "cirque du soleil"')
 90 | print('-------------------------------')
 91 | print_result_from_timeit('process.extract(u\'cirque du soleil\', choices, scorer =  fuzz.WRatio)',
 92 |                              common_setup + " from thefuzz import process; import string,random; random.seed(18);"
 93 |                              " choices = [\'\'.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(30)) for s in range(5000)]",
 94 |                               number=10)
 95 | 
 96 | 
 97 | # let me show you something
 98 | 
 99 | s = 'New York Yankees'
100 | 
101 | test = 'import functools\n'
102 | test += 'title_blob = """%s"""\n' % title_blob
103 | test += 'title_blob = title_blob.strip()\n'
104 | test += 'titles = title_blob.split("\\n")\n'
105 | 
106 | print('Real world ratio(): "%s"' % s)
107 | print('-------------------------------')
108 | test += 'prepared_ratio = functools.partial(fuzz.ratio, "%s")\n' % s
109 | test += 'titles.sort(key=prepared_ratio)\n'
110 | print_result_from_timeit(test, common_setup, number=100)
111 | 
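The unit selection inside `print_result_from_timeit` relies on a base-1000 logarithm to decide whether an average should be shown in s, ms, us, or ns. The standalone sketch below isolates that step. `format_duration` is a hypothetical helper, and the clamping of the exponent is an addition of this sketch (the benchmark script does not clamp), made on the assumption that averages outside the nanosecond-to-second range should still print something sensible.

```python
import math

UNITS = ["s", "ms", "us", "ns"]


def format_duration(seconds):
    """Format a duration in seconds using the largest unit that keeps it >= 1."""
    if seconds <= 0:
        return "0.000s"
    # floor(log_1000(x)) is 0 for seconds, -1 for milliseconds, -2 for microseconds...
    exponent = int(math.floor(math.log(seconds, 1000)))
    # Clamp so very large or very small values still map to a valid unit.
    exponent = max(-(len(UNITS) - 1), min(0, exponent))
    return "{:.3f}{}".format(seconds * 1000 ** -exponent, UNITS[-exponent])


print(format_duration(1.5))        # 1.500s
print(format_duration(0.25))       # 250.000ms
print(format_duration(0.0000025))  # 2.500us
```

Values that sit exactly on a unit boundary (for example 0.001) depend on floating-point rounding in `math.log`, so they may land in either neighboring unit.
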


--------------------------------------------------------------------------------
/thefuzz-master/release:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env bash
  2 | set -eo pipefail; [[ $RELEASE_TRACE ]] && set -x
  3 | 
  4 | PACKAGE_NAME='thefuzz'
  5 | INIT_PACKAGE_NAME='thefuzz'
  6 | PUBLIC="true"
  7 | 
  8 | # Colors
  9 | COLOR_OFF="\033[0m"   # unsets color to term fg color
 10 | RED="\033[0;31m"      # red
 11 | GREEN="\033[0;32m"    # green
 12 | YELLOW="\033[0;33m"   # yellow
 13 | MAGENTA="\033[0;35m"  # magenta
 14 | CYAN="\033[0;36m"     # cyan
 15 | 
 16 | # ensure wheel is available
 17 | pip install wheel > /dev/null
 18 | 
 19 | # ensure Pygments is available
 20 | pip install Pygments > /dev/null
 21 | 
 22 | command -v gitchangelog >/dev/null 2>&1 || {
 23 |     echo -e "${RED}WARNING: Missing gitchangelog binary, please run: pip install gitchangelog==2.2.0${COLOR_OFF}\n"
 24 |     exit 1
 25 | }
 26 | 
 27 | command -v rst-lint > /dev/null || {
 28 |     echo -e "${RED}WARNING: Missing rst-lint binary, please run: pip install restructuredtext_lint${COLOR_OFF}\n"
 29 |     exit 1
 30 | }
 31 | 
 32 | set +e;
 33 | python test_thefuzz.py &> /dev/null  # run the tests
 34 | if [ ! $? -eq 0 ]; then
 35 |     echo -e "${RED}WARNING: The tests are failing.${COLOR_OFF}"
 36 |     exit 1
 37 | fi
 38 | set -e;
 39 | 
 40 | if [[ "$@" != "major" ]] && [[ "$@" != "minor" ]] && [[ "$@" != "patch" ]]; then
 41 |     echo -e "${RED}WARNING: Invalid release type, must specify 'major', 'minor', or 'patch'${COLOR_OFF}\n"
 42 |     exit 1
 43 | fi
 44 | 
 45 | echo -e "\n${GREEN}STARTING RELEASE PROCESS${COLOR_OFF}\n"
 46 | 
 47 | set +e;
 48 | git status | grep -Eo "working (directory|tree) clean" &> /dev/null
 49 | if [ ! $? -eq 0 ]; then # working directory is NOT clean
 50 |     echo -e "${RED}WARNING: You have uncommitted changes, you may have forgotten something${COLOR_OFF}\n"
 51 |     exit 1
 52 | fi
 53 | set -e;
 54 | 
 55 | echo -e "${YELLOW}--->${COLOR_OFF} Updating local copy"
 56 | git pull -q origin master
 57 | git fetch --tags > /dev/null
 58 | 
 59 | echo -e "${YELLOW}--->${COLOR_OFF} Retrieving release versions"
 60 | 
 61 | current_version=$(cat ${INIT_PACKAGE_NAME}/__init__.py |grep '__version__ ='|sed 's/[^0-9.]//g')
 62 | major=$(echo $current_version | awk '{split($0,a,"."); print a[1]}')
 63 | minor=$(echo $current_version | awk '{split($0,a,"."); print a[2]}')
 64 | patch=$(echo $current_version | awk '{split($0,a,"."); print a[3]}')
 65 | 
 66 | if [[ "$@" == "major" ]]; then
 67 |     major=$(($major + 1));
 68 |     minor="0"
 69 |     patch="0"
 70 | elif [[ "$@" == "minor" ]]; then
 71 |     minor=$(($minor + 1));
 72 |     patch="0"
 73 | elif [[ "$@" == "patch" ]]; then
 74 |     patch=$(($patch + 1));
 75 | fi
 76 | 
 77 | next_version="${major}.${minor}.${patch}"
 78 | 
 79 | echo -e  "${YELLOW}   >${COLOR_OFF} ${MAGENTA}${current_version}${COLOR_OFF} -> ${MAGENTA}${next_version}${COLOR_OFF}"
 80 | 
 81 | echo -e "${YELLOW}--->${COLOR_OFF} Ensuring readme passes lint checks (if this fails, run rst-lint)"
 82 | rst-lint README.rst > /dev/null
 83 | 
 84 | echo -e "${YELLOW}--->${COLOR_OFF} Creating necessary temp file"
 85 | tempfoo=$(basename $0)
 86 | TMPFILE=$(mktemp /tmp/${tempfoo}.XXXXXX) || {
 87 |     echo -e "${RED}WARNING: Cannot create temp file using mktemp in /tmp dir ${COLOR_OFF}\n"
 88 |     exit 1
 89 | }
 90 | 
 91 | find_this="__version__ = '$current_version'"
 92 | replace_with="__version__ = '$next_version'"
 93 | 
 94 | echo -e "${YELLOW}--->${COLOR_OFF} Updating ${INIT_PACKAGE_NAME}/__init__.py"
 95 | sed "s/$find_this/$replace_with/" ${INIT_PACKAGE_NAME}/__init__.py > $TMPFILE && mv $TMPFILE ${INIT_PACKAGE_NAME}/__init__.py
 96 | 
 97 | echo -e "${YELLOW}--->${COLOR_OFF} Updating README.rst"
 98 | find_this="${PACKAGE_NAME}.git@$current_version"
 99 | replace_with="${PACKAGE_NAME}.git@$next_version"
100 | sed "s/$find_this/$replace_with/" README.rst > $TMPFILE && mv $TMPFILE README.rst
101 | find_this="${PACKAGE_NAME}==$current_version"
102 | replace_with="${PACKAGE_NAME}==$next_version"
103 | sed "s/$find_this/$replace_with/" README.rst > $TMPFILE && mv $TMPFILE README.rst
104 | 
105 | if [ -f docs/conf.py ]; then
106 |     echo -e "${YELLOW}--->${COLOR_OFF} Updating docs"
107 |     find_this="version = '${current_version}'"
108 |     replace_with="version = '${next_version}'"
109 |     sed "s/$find_this/$replace_with/" docs/conf.py > $TMPFILE && mv $TMPFILE docs/conf.py
110 | 
111 |     find_this="release = '${current_version}'"
112 |     replace_with="release = '${next_version}'"
113 |     sed "s/$find_this/$replace_with/" docs/conf.py > $TMPFILE && mv $TMPFILE docs/conf.py
114 | fi
115 | 
116 | echo -e "${YELLOW}--->${COLOR_OFF} Updating CHANGES.rst for new release"
117 | version_header="$next_version ($(date +%F))"
118 | set +e; dashes=$(yes '-'|head -n ${#version_header}|tr -d '\n') ; set -e
119 | gitchangelog |sed "4s/.*/$version_header/"|sed "5s/.*/$dashes/" > $TMPFILE && mv $TMPFILE CHANGES.rst
120 | 
121 | echo -e "${YELLOW}--->${COLOR_OFF} Adding changed files to git"
122 | git add CHANGES.rst README.rst ${INIT_PACKAGE_NAME}/__init__.py
123 | if [ -f docs/conf.py ]; then git add docs/conf.py; fi
124 | 
125 | echo -e "${YELLOW}--->${COLOR_OFF} Creating release"
126 | git commit -q -m "Release version $next_version"
127 | 
128 | echo -e "${YELLOW}--->${COLOR_OFF} Tagging release"
129 | git tag -a $next_version -m "Release version $next_version"
130 | 
131 | echo -e "${YELLOW}--->${COLOR_OFF} Pushing release and tags to github"
132 | git push -q origin master && git push -q --tags
133 | 
134 | if [[ "$PUBLIC" == "true" ]]; then
135 |     echo -e "${YELLOW}--->${COLOR_OFF} Creating python release"
136 |     cp README.rst README
137 |     python setup.py sdist bdist_wheel upload > /dev/null
138 |     rm README
139 | else
140 |     echo -e "${YELLOW}--->${COLOR_OFF} Creating local python dist and wheel for manual release"
141 |     python setup.py sdist bdist_wheel > /dev/null
142 | fi
143 | 
144 | echo -e "\n${CYAN}RELEASED VERSION ${next_version}${COLOR_OFF}\n"
145 | 
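
The version arithmetic in the release script (split `major.minor.patch`, increment one field, reset the lower ones) can be sketched in Python as follows. This is only an illustration of the bump logic; `bump_version` is a hypothetical helper and is not part of the repository.

```python
def bump_version(current, release_type):
    """Return the next 'major.minor.patch' string for a release type."""
    major, minor, patch = (int(part) for part in current.split("."))
    if release_type == "major":
        # A major bump resets both minor and patch.
        return "%d.0.0" % (major + 1)
    if release_type == "minor":
        # A minor bump resets patch only.
        return "%d.%d.0" % (major, minor + 1)
    if release_type == "patch":
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("release type must be 'major', 'minor', or 'patch'")


print(bump_version("1.4.2", "minor"))
```

As in the script, any release type other than the three accepted values is rejected rather than guessed.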


--------------------------------------------------------------------------------
/thefuzz-master/setup.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | # Copyright (c) 2014 SeatGeek
 4 | 
 5 | # This file is part of thefuzz.
 6 | 
 7 | from thefuzz import __version__
 8 | from setuptools import setup
 9 | 
10 | with open('README.rst') as f:
11 |     long_description = f.read()
12 | 
13 | setup(
14 |     name='thefuzz',
15 |     version=__version__,
16 |     author='Adam Cohen',
17 |     author_email='adam@seatgeek.com',
18 |     packages=['thefuzz'],
19 |     # keep for backwards compatibility of projects depending on `thefuzz[speedup]`
20 |     extras_require={'speedup': []},
21 |     install_requires=['rapidfuzz>=3.0.0, < 4.0.0'],
22 |     url='https://github.com/seatgeek/thefuzz',
23 |     license="GPLv2",
24 |     classifiers=[
25 |         'Intended Audience :: Developers',
26 |         'License :: OSI Approved :: GNU General Public License v2 (GPLv2)',
27 |         'Programming Language :: Python',
28 |         'Programming Language :: Python :: 3',
29 |         'Programming Language :: Python :: 3.8',
30 |         'Programming Language :: Python :: 3.9',
31 |         'Programming Language :: Python :: 3.10',
32 |         'Programming Language :: Python :: 3.11',
33 |         'Programming Language :: Python :: 3.12',
34 |         'Programming Language :: Python :: 3 :: Only',
35 |     ],
36 |     description='Fuzzy string matching in python',
37 |     long_description=long_description,
38 |     zip_safe=True,
39 |     python_requires='>=3.8'
40 | )
41 | 


--------------------------------------------------------------------------------
/thefuzz-master/test_thefuzz.py:
--------------------------------------------------------------------------------
  1 | import unittest
  2 | import re
  3 | import pycodestyle
  4 | 
  5 | from thefuzz import fuzz
  6 | from thefuzz import process
  7 | from thefuzz import utils
  8 | 
  9 | scorers = [
 10 |     fuzz.ratio,
 11 |     fuzz.partial_ratio,
 12 |     fuzz.token_sort_ratio,
 13 |     fuzz.token_set_ratio,
 14 |     fuzz.partial_token_sort_ratio,
 15 |     fuzz.partial_token_set_ratio,
 16 |     fuzz.QRatio,
 17 |     fuzz.UQRatio,
 18 |     fuzz.WRatio,
 19 |     fuzz.UWRatio,
 20 | ]
 21 | 
 22 | class StringProcessingTest(unittest.TestCase):
 23 |     def test_replace_non_letters_non_numbers_with_whitespace(self):
 24 |         strings = ["new york mets - atlanta braves", "Cães danados",
 25 |                    "New York //// Mets $$$", "Ça va?"]
 26 |         for string in strings:
 27 |             proc_string = utils.full_process(string)
 28 |             regex = re.compile(r"(?ui)[\W]")
 29 |             for expr in regex.finditer(proc_string):
 30 |                 self.assertEqual(expr.group(), " ")
 31 | 
 32 |     def test_dont_condense_whitespace(self):
 33 |         s1 = "new york mets - atlanta braves"
 34 |         s2 = "new york mets atlanta braves"
 35 |         s3 = "new york mets   atlanta braves"
 36 |         p1 = utils.full_process(s1)
 37 |         p2 = utils.full_process(s2)
 38 |         p3 = utils.full_process(s3)
 39 |         self.assertEqual(p1, s3)
 40 |         self.assertEqual(p2, s2)
 41 |         self.assertEqual(p3, s3)
 42 | 
 43 | 
 44 | class UtilsTest(unittest.TestCase):
 45 |     def setUp(self):
 46 |         self.s1 = "new york mets"
 47 |         self.s1a = "new york mets"
 48 |         self.s2 = "new YORK mets"
 49 |         self.s3 = "the wonderful new york mets"
 50 |         self.s4 = "new york mets vs atlanta braves"
 51 |         self.s5 = "atlanta braves vs new york mets"
 52 |         self.s6 = "new york mets - atlanta braves"
 53 |         self.mixed_strings = [
 54 |             "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
 55 |             "C'est la vie",
 56 |             "Ça va?",
 57 |             "Cães danados",
 58 |             "\xacCamarões assados",
 59 |             "a\xac\u1234\u20ac\U00008000",
 60 |             "\u00C1"
 61 |         ]
 62 | 
 63 |     def tearDown(self):
 64 |         pass
 65 | 
 66 |     def test_ascii_only(self):
 67 |         for s in self.mixed_strings:
 68 |             utils.ascii_only(s)
 69 | 
 70 |     def test_fullProcess(self):
 71 |         for s in self.mixed_strings:
 72 |             utils.full_process(s)
 73 | 
 74 |     def test_fullProcessForceAscii(self):
 75 |         for s in self.mixed_strings:
 76 |             utils.full_process(s, force_ascii=True)
 77 | 
 78 | 
 79 | class RatioTest(unittest.TestCase):
 80 | 
 81 |     def setUp(self):
 82 |         self.s1 = "new york mets"
 83 |         self.s1a = "new york mets"
 84 |         self.s2 = "new YORK mets"
 85 |         self.s3 = "the wonderful new york mets"
 86 |         self.s4 = "new york mets vs atlanta braves"
 87 |         self.s5 = "atlanta braves vs new york mets"
 88 |         self.s6 = "new york mets - atlanta braves"
 89 |         self.s7 = 'new york city mets - atlanta braves'
 90 |         # test silly corner cases
 91 |         self.s8 = '{'
 92 |         self.s8a = '{'
 93 |         self.s9 = '{a'
 94 |         self.s9a = '{a'
 95 |         self.s10 = 'a{'
 96 |         self.s10a = '{b'
 97 | 
 98 |         self.cirque_strings = [
 99 |             "cirque du soleil - zarkana - las vegas",
100 |             "cirque du soleil ",
101 |             "cirque du soleil las vegas",
102 |             "zarkana las vegas",
103 |             "las vegas cirque du soleil at the bellagio",
104 |             "zarakana - cirque du soleil - bellagio"
105 |         ]
106 | 
107 |         self.baseball_strings = [
108 |             "new york mets vs chicago cubs",
109 |             "chicago cubs vs chicago white sox",
110 |             "philladelphia phillies vs atlanta braves",
111 |             "braves vs mets",
112 |         ]
113 | 
114 |     def tearDown(self):
115 |         pass
116 | 
117 |     def testEqual(self):
118 |         self.assertEqual(fuzz.ratio(self.s1, self.s1a), 100)
119 |         self.assertEqual(fuzz.ratio(self.s8, self.s8a), 100)
120 |         self.assertEqual(fuzz.ratio(self.s9, self.s9a), 100)
121 | 
122 |     def testCaseInsensitive(self):
123 |         self.assertNotEqual(fuzz.ratio(self.s1, self.s2), 100)
124 |         self.assertEqual(fuzz.ratio(utils.full_process(self.s1), utils.full_process(self.s2)), 100)
125 | 
126 |     def testPartialRatio(self):
127 |         self.assertEqual(fuzz.partial_ratio(self.s1, self.s3), 100)
128 | 
129 |     def testTokenSortRatio(self):
130 |         self.assertEqual(fuzz.token_sort_ratio(self.s1, self.s1a), 100)
131 | 
132 |     def testPartialTokenSortRatio(self):
133 |         self.assertEqual(fuzz.partial_token_sort_ratio(self.s1, self.s1a), 100)
134 |         self.assertEqual(fuzz.partial_token_sort_ratio(self.s4, self.s5), 100)
135 |         self.assertEqual(fuzz.partial_token_sort_ratio(self.s8, self.s8a, full_process=False), 100)
136 |         self.assertEqual(fuzz.partial_token_sort_ratio(self.s9, self.s9a, full_process=True), 100)
137 |         self.assertEqual(fuzz.partial_token_sort_ratio(self.s9, self.s9a, full_process=False), 100)
138 |         self.assertEqual(fuzz.partial_token_sort_ratio(self.s10, self.s10a, full_process=False), 67)
139 |         self.assertEqual(fuzz.partial_token_sort_ratio(self.s10a, self.s10, full_process=False), 67)
140 | 
141 |     def testTokenSetRatio(self):
142 |         self.assertEqual(fuzz.token_set_ratio(self.s4, self.s5), 100)
143 |         self.assertEqual(fuzz.token_set_ratio(self.s8, self.s8a, full_process=False), 100)
144 |         self.assertEqual(fuzz.token_set_ratio(self.s9, self.s9a, full_process=True), 100)
145 |         self.assertEqual(fuzz.token_set_ratio(self.s9, self.s9a, full_process=False), 100)
146 |         self.assertEqual(fuzz.token_set_ratio(self.s10, self.s10a, full_process=False), 50)
147 | 
148 |     def testPartialTokenSetRatio(self):
149 |         self.assertEqual(fuzz.partial_token_set_ratio(self.s4, self.s7), 100)
150 | 
151 |     def testQuickRatioEqual(self):
152 |         self.assertEqual(fuzz.QRatio(self.s1, self.s1a), 100)
153 | 
154 |     def testQuickRatioCaseInsensitive(self):
155 |         self.assertEqual(fuzz.QRatio(self.s1, self.s2), 100)
156 | 
157 |     def testQuickRatioNotEqual(self):
158 |         self.assertNotEqual(fuzz.QRatio(self.s1, self.s3), 100)
159 | 
160 |     def testWRatioEqual(self):
161 |         self.assertEqual(fuzz.WRatio(self.s1, self.s1a), 100)
162 | 
163 |     def testWRatioCaseInsensitive(self):
164 |         self.assertEqual(fuzz.WRatio(self.s1, self.s2), 100)
165 | 
166 |     def testWRatioPartialMatch(self):
167 |         # a partial match is scaled by .9
168 |         self.assertEqual(fuzz.WRatio(self.s1, self.s3), 90)
169 | 
170 |     def testWRatioMisorderedMatch(self):
171 |         # misordered full matches are scaled by .95
172 |         self.assertEqual(fuzz.WRatio(self.s4, self.s5), 95)
173 | 
174 |     def testWRatioStr(self):
175 |         self.assertEqual(fuzz.WRatio(str(self.s1), str(self.s1a)), 100)
176 | 
177 |     def testQRatioStr(self):
178 |         self.assertEqual(fuzz.QRatio(str(self.s1), str(self.s1a)), 100)
179 | 
180 |     def testEmptyStringsScore100(self):
181 |         self.assertEqual(fuzz.ratio("", ""), 100)
182 |         self.assertEqual(fuzz.partial_ratio("", ""), 100)
183 | 
184 |     def testIssueSeven(self):
185 |         s1 = "HSINCHUANG"
186 |         s2 = "SINJHUAN"
187 |         s3 = "LSINJHUANG DISTRIC"
188 |         s4 = "SINJHUANG DISTRICT"
189 | 
190 |         self.assertGreater(fuzz.partial_ratio(s1, s2), 75)
191 |         self.assertGreater(fuzz.partial_ratio(s1, s3), 75)
192 |         self.assertGreater(fuzz.partial_ratio(s1, s4), 75)
193 | 
194 |     def testRatioUnicodeString(self):
195 |         s1 = "\u00C1"
196 |         s2 = "ABCD"
197 |         score = fuzz.ratio(s1, s2)
198 |         self.assertEqual(0, score)
199 | 
200 |     def testPartialRatioUnicodeString(self):
201 |         s1 = "\u00C1"
202 |         s2 = "ABCD"
203 |         score = fuzz.partial_ratio(s1, s2)
204 |         self.assertEqual(0, score)
205 | 
206 |     def testWRatioUnicodeString(self):
207 |         s1 = "\u00C1"
208 |         s2 = "ABCD"
209 |         score = fuzz.WRatio(s1, s2)
210 |         self.assertEqual(0, score)
211 | 
212 |         # Cyrillic.
213 |         s1 = "\u043f\u0441\u0438\u0445\u043e\u043b\u043e\u0433"
214 |         s2 = "\u043f\u0441\u0438\u0445\u043e\u0442\u0435\u0440\u0430\u043f\u0435\u0432\u0442"
215 |         score = fuzz.WRatio(s1, s2, force_ascii=False)
216 |         self.assertNotEqual(0, score)
217 | 
218 |         # Chinese.
219 |         s1 = "\u6211\u4e86\u89e3\u6570\u5b66"
220 |         s2 = "\u6211\u5b66\u6570\u5b66"
221 |         score = fuzz.WRatio(s1, s2, force_ascii=False)
222 |         self.assertNotEqual(0, score)
223 | 
224 |     def testQRatioUnicodeString(self):
225 |         s1 = "\u00C1"
226 |         s2 = "ABCD"
227 |         score = fuzz.QRatio(s1, s2)
228 |         self.assertEqual(0, score)
229 | 
230 |         # Cyrillic.
231 |         s1 = "\u043f\u0441\u0438\u0445\u043e\u043b\u043e\u0433"
232 |         s2 = "\u043f\u0441\u0438\u0445\u043e\u0442\u0435\u0440\u0430\u043f\u0435\u0432\u0442"
233 |         score = fuzz.QRatio(s1, s2, force_ascii=False)
234 |         self.assertNotEqual(0, score)
235 | 
236 |         # Chinese.
237 |         s1 = "\u6211\u4e86\u89e3\u6570\u5b66"
238 |         s2 = "\u6211\u5b66\u6570\u5b66"
239 |         score = fuzz.QRatio(s1, s2, force_ascii=False)
240 |         self.assertNotEqual(0, score)
241 | 
242 |     def testQratioForceAscii(self):
243 |         s1 = "ABCD\u00C1"
244 |         s2 = "ABCD"
245 | 
246 |         score = fuzz.QRatio(s1, s2, force_ascii=True)
247 |         self.assertEqual(score, 100)
248 | 
249 |         score = fuzz.QRatio(s1, s2, force_ascii=False)
250 |         self.assertLess(score, 100)
251 | 
252 |     def testWRatioForceAscii(self):
253 |         s1 = "ABCD\u00C1"
254 |         s2 = "ABCD"
255 | 
256 |         score = fuzz.WRatio(s1, s2, force_ascii=True)
257 |         self.assertEqual(score, 100)
258 | 
259 |         score = fuzz.WRatio(s1, s2, force_ascii=False)
260 |         self.assertLess(score, 100)
261 | 
262 |     def testPartialTokenSetRatioForceAscii(self):
263 |         s1 = "ABCD\u00C1 HELP\u00C1"
264 |         s2 = "ABCD HELP"
265 | 
266 |         score = fuzz.partial_token_set_ratio(s1, s2, force_ascii=True)
267 |         self.assertEqual(score, 100)
268 | 
269 |         score = fuzz.partial_token_set_ratio(s1, s2, force_ascii=False)
270 |         self.assertLess(score, 100)
271 | 
272 |     def testPartialTokenSortRatioForceAscii(self):
273 |         s1 = "ABCD\u00C1 HELP\u00C1"
274 |         s2 = "ABCD HELP"
275 | 
276 |         score = fuzz.partial_token_sort_ratio(s1, s2, force_ascii=True)
277 |         self.assertEqual(score, 100)
278 | 
279 |         score = fuzz.partial_token_sort_ratio(s1, s2, force_ascii=False)
280 |         self.assertLess(score, 100)
281 | 
282 |     def testCheckForNone(self):
283 |         for scorer in scorers:
284 |             self.assertEqual(scorer(None, None), 0)
285 |             self.assertEqual(scorer('Some', None), 0)
286 |             self.assertEqual(scorer(None, 'Some'), 0)
287 | 
288 |             self.assertNotEqual(scorer('Some', 'Some'), 0)
289 | 
290 |     def testCheckEmptyString(self):
291 |         for scorer in scorers:
292 |             if scorer in {fuzz.token_set_ratio, fuzz.partial_token_set_ratio, fuzz.WRatio, fuzz.UWRatio, fuzz.QRatio, fuzz.UQRatio}:
293 |                 self.assertEqual(scorer('', ''), 0)
294 |             else:
295 |                 self.assertEqual(scorer('', ''), 100)
296 | 
297 |             self.assertEqual(scorer('Some', ''), 0)
298 |             self.assertEqual(scorer('', 'Some'), 0)
299 |             self.assertNotEqual(scorer('Some', 'Some'), 0)
300 | 
301 | 
302 | class ProcessTest(unittest.TestCase):
303 | 
304 |     def setUp(self):
305 |         self.s1 = "new york mets"
306 |         self.s1a = "new york mets"
307 |         self.s2 = "new YORK mets"
308 |         self.s3 = "the wonderful new york mets"
309 |         self.s4 = "new york mets vs atlanta braves"
310 |         self.s5 = "atlanta braves vs new york mets"
311 |         self.s6 = "new york mets - atlanta braves"
312 | 
313 |         self.cirque_strings = [
314 |             "cirque du soleil - zarkana - las vegas",
315 |             "cirque du soleil ",
316 |             "cirque du soleil las vegas",
317 |             "zarkana las vegas",
318 |             "las vegas cirque du soleil at the bellagio",
319 |             "zarakana - cirque du soleil - bellagio"
320 |         ]
321 | 
322 |         self.baseball_strings = [
323 |             "new york mets vs chicago cubs",
324 |             "chicago cubs vs chicago white sox",
325 |             "philladelphia phillies vs atlanta braves",
326 |             "braves vs mets",
327 |         ]
328 | 
329 |     def testGetBestChoice1(self):
330 |         query = "new york mets at atlanta braves"
331 |         best = process.extractOne(query, self.baseball_strings)
332 |         self.assertEqual(best[0], "braves vs mets")
333 | 
334 |     def testGetBestChoice2(self):
335 |         query = "philadelphia phillies at atlanta braves"
336 |         best = process.extractOne(query, self.baseball_strings)
337 |         self.assertEqual(best[0], self.baseball_strings[2])
338 | 
339 |     def testGetBestChoice3(self):
340 |         query = "atlanta braves at philadelphia phillies"
341 |         best = process.extractOne(query, self.baseball_strings)
342 |         self.assertEqual(best[0], self.baseball_strings[2])
343 | 
344 |     def testGetBestChoice4(self):
345 |         query = "chicago cubs vs new york mets"
346 |         best = process.extractOne(query, self.baseball_strings)
347 |         self.assertEqual(best[0], self.baseball_strings[0])
348 | 
349 |     def testWithProcessor(self):
350 |         events = [
351 |             ["chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm"],
352 |             ["new york yankees vs boston red sox", "Fenway Park", "2011-05-11", "8pm"],
353 |             ["atlanta braves vs pittsburgh pirates", "PNC Park", "2011-05-11", "8pm"],
354 |         ]
355 |         query = ["new york mets vs chicago cubs", "CitiField", "2017-03-19", "8pm"]
356 | 
357 |         best = process.extractOne(query, events, processor=lambda event: event[0])
358 |         self.assertEqual(best[0], events[0])
359 | 
360 |     def testIssue57(self):
361 |         """
362 |         account for force_ascii
363 |         """
364 |         query = str(("test", "test"))
365 |         choices = [("test", "test")]
366 |         assert process.extract(query, choices)[0][1] == 100
367 | 
368 |     def testWithScorer(self):
369 |         choices = [
370 |             "new york mets vs chicago cubs",
371 |             "chicago cubs at new york mets",
372 |             "atlanta braves vs pittsbugh pirates",
373 |             "new york yankees vs boston red sox"
374 |         ]
375 | 
376 |         choices_dict = {
377 |             1: "new york mets vs chicago cubs",
378 |             2: "chicago cubs vs chicago white sox",
379 |             3: "philladelphia phillies vs atlanta braves",
380 |             4: "braves vs mets"
381 |         }
382 | 
383 |         # in this hypothetical example we care about ordering, so we use quick ratio
384 |         query = "new york mets at chicago cubs"
385 |         scorer = fuzz.QRatio
386 | 
387 |         # first, as an example, the default scorer would select the more
388 |         # "complete" match, choices[1]
389 | 
390 |         best = process.extractOne(query, choices)
391 |         self.assertEqual(best[0], choices[1])
392 | 
393 |         # now, use the custom scorer
394 | 
395 |         best = process.extractOne(query, choices, scorer=scorer)
396 |         self.assertEqual(best[0], choices[0])
397 | 
398 |         best = process.extractOne(query, choices_dict)
399 |         self.assertEqual(best[0], choices_dict[1])
400 | 
401 |     def testWithCutoff(self):
402 |         choices = [
403 |             "new york mets vs chicago cubs",
404 |             "chicago cubs at new york mets",
405 |             "atlanta braves vs pittsbugh pirates",
406 |             "new york yankees vs boston red sox"
407 |         ]
408 | 
409 |         query = "los angeles dodgers vs san francisco giants"
410 | 
411 |         # in this situation, this is an event that does not exist in the list
412 |         # we don't want to randomly match to something, so we use a reasonable cutoff
413 | 
414 |         best = process.extractOne(query, choices, score_cutoff=50)
415 |         self.assertIsNone(best)
416 | 
417 |         # however if we had no cutoff, something would get returned
418 | 
419 |         # best = process.extractOne(query, choices)
420 |         # self.assertIsNotNone(best)
421 | 
422 |     def testWithCutoff2(self):
423 |         choices = [
424 |             "new york mets vs chicago cubs",
425 |             "chicago cubs at new york mets",
426 |             "atlanta braves vs pittsbugh pirates",
427 |             "new york yankees vs boston red sox"
428 |         ]
429 | 
430 |         query = "new york mets vs chicago cubs"
431 |         # Only find 100-score cases
432 |         res = process.extractOne(query, choices, score_cutoff=100)
433 |         self.assertIsNotNone(res)
434 |         best_match, score = res
435 |         self.assertIs(best_match, choices[0])
436 | 
437 |     def testEmptyStrings(self):
438 |         choices = [
439 |             "",
440 |             "new york mets vs chicago cubs",
441 |             "new york yankees vs boston red sox",
442 |             "",
443 |             ""
444 |         ]
445 | 
446 |         query = "new york mets at chicago cubs"
447 | 
448 |         best = process.extractOne(query, choices)
449 |         self.assertEqual(best[0], choices[1])
450 | 
451 |     def testNullStrings(self):
452 |         choices = [
453 |             None,
454 |             "new york mets vs chicago cubs",
455 |             "new york yankees vs boston red sox",
456 |             None,
457 |             None
458 |         ]
459 | 
460 |         query = "new york mets at chicago cubs"
461 | 
462 |         best = process.extractOne(query, choices)
463 |         self.assertEqual(best[0], choices[1])
464 | 
465 |     def test_list_like_extract(self):
466 |         """We should be able to use a list-like object for choices."""
467 |         def generate_choices():
468 |             choices = ['a', 'Bb', 'CcC']
469 |             yield from choices
470 |         search = 'aaa'
471 |         result = [(value, confidence) for value, confidence in
472 |                   process.extract(search, generate_choices())]
473 |         self.assertGreater(len(result), 0)
474 | 
475 |     def test_dict_like_extract(self):
476 |         """We should be able to use a dict-like object for choices, not only a
477 |         dict, and still get dict-like output.
478 |         """
479 |         try:
480 |             from UserDict import UserDict
481 |         except ImportError:
482 |             from collections import UserDict
483 |         choices = UserDict({'aa': 'bb', 'a1': None})
484 |         search = 'aaa'
485 |         result = process.extract(search, choices)
486 |         self.assertGreater(len(result), 0)
487 |         for value, confidence, key in result:
488 |             self.assertIn(value, choices.values())
489 | 
490 |     def test_dedupe(self):
491 |         """We should be able to use a list-like object for contains_dupes
492 |         """
493 |         # Test 1
494 |         contains_dupes = ['Frodo Baggins', 'Tom Sawyer', 'Bilbo Baggin', 'Samuel L. Jackson', 'F. Baggins', 'Frody Baggins', 'Bilbo Baggins']
495 | 
496 |         result = process.dedupe(contains_dupes)
497 |         self.assertLess(len(result), len(contains_dupes))
498 | 
499 |         # Test 2
500 |         contains_dupes = ['Tom', 'Dick', 'Harry']
501 | 
502 |         # we should end up with the same list since it contains no duplicates (i.e. the original list is returned)
503 |         deduped_list = ['Tom', 'Dick', 'Harry']
504 | 
505 |         result = process.dedupe(contains_dupes)
506 |         self.assertEqual(result, deduped_list)
507 | 
508 |     def test_simplematch(self):
509 |         basic_string = 'a, b'
510 |         match_strings = ['a, b']
511 | 
512 |         result = process.extractOne(basic_string, match_strings, scorer=fuzz.ratio)
513 |         part_result = process.extractOne(basic_string, match_strings, scorer=fuzz.partial_ratio)
514 | 
515 |         self.assertEqual(result, ('a, b', 100))
516 |         self.assertEqual(part_result, ('a, b', 100))
517 | 
518 | 
519 | class TestCodeFormat(unittest.TestCase):
520 |     def test_pep8_conformance(self):
521 |         pep8style = pycodestyle.StyleGuide(quiet=False)
522 |         pep8style.options.ignore = pep8style.options.ignore + tuple(['E501'])
523 |         pep8style.input_dir('thefuzz')
524 |         result = pep8style.check_files()
525 |         self.assertEqual(result.total_errors, 0, "PEP8 POLICE - WOOOOOWOOOOOOOOOO")
526 | 
527 | if __name__ == '__main__':
528 |     unittest.main()         # run all tests
529 | 
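
The `StringProcessingTest` cases above describe the preprocessing contract: non-alphanumeric characters become spaces one for one (whitespace is not condensed), the text is lowercased, and the ends are stripped. A rough approximation with the standard library is sketched below. `approx_full_process` is a hypothetical name, and this is an assumption about the behavior the tests check, not the library's implementation; in particular, `\W` leaves underscores untouched, which the real `utils.full_process` may handle differently.

```python
import re


def approx_full_process(s):
    """Approximate the preprocessing that the tests above describe."""
    # Replace each non-word character with exactly one space, so runs of
    # punctuation become runs of spaces instead of being condensed.
    spaced = re.sub(r"\W", " ", s)
    # Lowercase, then trim leading and trailing whitespace.
    return spaced.lower().strip()


print(repr(approx_full_process("new york mets - atlanta braves")))
print(repr(approx_full_process("New York //// Mets $$$")))
```

The first call yields three spaces between "mets" and "atlanta", which mirrors the expectation in `test_dont_condense_whitespace`.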


--------------------------------------------------------------------------------
/thefuzz-master/test_thefuzz_hypothesis.py:
--------------------------------------------------------------------------------
  1 | from itertools import product
  2 | from functools import partial
  3 | from string import ascii_letters, digits, punctuation
  4 | 
  5 | from hypothesis import given, assume, settings, HealthCheck
  6 | import hypothesis.strategies as st
  7 | import pytest
  8 | 
  9 | from thefuzz import fuzz, process, utils
 10 | 
 11 | 
 12 | HYPOTHESIS_ALPHABET = ascii_letters + digits + punctuation
 13 | 
 14 | 
 15 | def scorers_processors():
 16 |     """
 17 |     Generate a list of (scorer, processor) pairs for testing
 18 | 
 19 |     :return: [(scorer, processor), ...]
 20 |     """
 21 |     scorers = [fuzz.ratio,
 22 |                fuzz.partial_ratio]
 23 |     processors = [lambda x: x,
 24 |                   partial(utils.full_process, force_ascii=False),
 25 |                   partial(utils.full_process, force_ascii=True)]
 26 |     splist = list(product(scorers, processors))
 27 |     splist.extend(
 28 |         [(fuzz.WRatio, partial(utils.full_process, force_ascii=True)),
 29 |          (fuzz.QRatio, partial(utils.full_process, force_ascii=True)),
 30 |          (fuzz.UWRatio, partial(utils.full_process, force_ascii=False)),
 31 |          (fuzz.UQRatio, partial(utils.full_process, force_ascii=False)),
 32 |          (fuzz.token_set_ratio, partial(utils.full_process, force_ascii=True)),
 33 |          (fuzz.token_sort_ratio, partial(utils.full_process, force_ascii=True)),
 34 |          (fuzz.partial_token_set_ratio, partial(utils.full_process, force_ascii=True)),
 35 |          (fuzz.partial_token_sort_ratio, partial(utils.full_process, force_ascii=True))]
 36 |     )
 37 | 
 38 |     return splist
 39 | 
 40 | 
 41 | def full_scorers_processors():
 42 |     """
 43 |     Generate a list of (scorer, processor) pairs for testing for scorers that use the full string only
 44 | 
 45 |     :return: [(scorer, processor), ...]
 46 |     """
 47 |     scorers = [fuzz.ratio]
 48 |     processors = [lambda x: x,
 49 |                   partial(utils.full_process, force_ascii=False),
 50 |                   partial(utils.full_process, force_ascii=True)]
 51 |     splist = list(product(scorers, processors))
 52 |     splist.extend(
 53 |         [(fuzz.WRatio, partial(utils.full_process, force_ascii=True)),
 54 |          (fuzz.QRatio, partial(utils.full_process, force_ascii=True)),
 55 |          (fuzz.UWRatio, partial(utils.full_process, force_ascii=False)),
 56 |          (fuzz.UQRatio, partial(utils.full_process, force_ascii=False))]
 57 |     )
 58 | 
 59 |     return splist
 60 | 
 61 | 
 62 | @pytest.mark.parametrize('scorer,processor',
 63 |                          scorers_processors())
 64 | @given(data=st.data())
 65 | @settings(max_examples=20, deadline=5000, suppress_health_check=[HealthCheck.data_too_large])
 66 | def test_identical_strings_extracted(scorer, processor, data):
 67 |     """
 68 |     Test that identical strings will always return a perfect match.
 69 | 
 70 |     :param scorer:
 71 |     :param processor:
 72 |     :param data:
 73 |     :return:
 74 |     """
 75 |     # Draw a list of random strings
 76 |     strings = data.draw(
 77 |         st.lists(
 78 |             st.text(min_size=10, max_size=100, alphabet=HYPOTHESIS_ALPHABET),
 79 |             min_size=1,
 80 |             max_size=10
 81 |         )
 82 |     )
 83 |     # Draw a random integer for the index in that list
 84 |     choiceidx = data.draw(st.integers(min_value=0, max_value=(len(strings) - 1)))
 85 | 
 86 |     # Extract our choice from the list
 87 |     choice = strings[choiceidx]
 88 | 
 89 |     # Check process doesn't make our choice the empty string
 90 |     assume(processor(choice) != '')
 91 | 
 92 |     # Extract all perfect matches
 93 |     result = process.extractBests(choice,
 94 |                                   strings,
 95 |                                   scorer=scorer,
 96 |                                   processor=processor,
 97 |                                   score_cutoff=100,
 98 |                                   limit=None)
 99 | 
100 |     # Check we get a result
101 |     assert result != []
102 | 
103 |     # Check the original is in the list
104 |     assert (choice, 100) in result
105 | 
106 | 
107 | @pytest.mark.parametrize('scorer,processor',
108 |                          full_scorers_processors())
109 | @given(data=st.data())
110 | @settings(max_examples=20, deadline=5000)
111 | def test_only_identical_strings_extracted(scorer, processor, data):
112 |     """
113 |     Test that only identical (post processing) strings score 100 on the test.
114 | 
115 |     If two strings are not identical then using full comparison methods they should
116 |     not be a perfect (100) match.
117 | 
118 |     :param scorer:
119 |     :param processor:
120 |     :param data:
121 |     :return:
122 |     """
123 |     # Draw a list of random strings
124 |     strings = data.draw(
125 |         st.lists(
126 |             st.text(min_size=10, max_size=100, alphabet=HYPOTHESIS_ALPHABET),
127 |             min_size=1,
128 |             max_size=10)
129 |     )
130 |     # Draw a random integer for the index in that list
131 |     choiceidx = data.draw(st.integers(min_value=0, max_value=(len(strings) - 1)))
132 | 
133 |     # Extract our choice from the list
134 |     choice = strings[choiceidx]
135 | 
136 |     # Check process doesn't make our choice the empty string
137 |     assume(processor(choice) != '')
138 | 
139 |     # Extract all perfect matches
140 |     result = process.extractBests(choice,
141 |                                   strings,
142 |                                   scorer=scorer,
143 |                                   processor=processor,
144 |                                   score_cutoff=100,
145 |                                   limit=None)
146 | 
147 |     # Check we get a result
148 |     assert result != []
149 | 
150 |     # Check THE ONLY result(s) we get are a perfect match for the (processed) original data
151 |     pchoice = processor(choice)
152 |     for r in result:
153 |         assert pchoice == processor(r[0])
154 | 


--------------------------------------------------------------------------------
/thefuzz-master/test_thefuzz_pytest.py:
--------------------------------------------------------------------------------
 1 | from thefuzz import process
 2 | 
 3 | 
 4 | def test_process_warning(caplog):
 5 |     """Check that a query the processor reduces to an empty string logs a warning."""
 6 | 
 7 |     query = ':::::::'
 8 |     choices = [':::::::']
 9 | 
10 |     _ = process.extractOne(query, choices)
11 | 
12 |     logstr = ("Applied processor reduces "
13 |               "input query to empty string, "
14 |               "all comparisons will have score 0. "
15 |               "[Query: ':::::::']")
16 | 
17 |     assert 1 == len(caplog.records)
18 |     log = caplog.records[0]
19 | 
20 |     assert log.levelname == "WARNING"
21 |     assert log.name == "thefuzz.process"
22 |     assert logstr == log.message
23 | 


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/__init__.py:
--------------------------------------------------------------------------------
1 | __version__ = '0.21.0'
2 | 


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/fuzz.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | 
  3 | from rapidfuzz.fuzz import (
  4 |     ratio as _ratio,
  5 |     partial_ratio as _partial_ratio,
  6 |     token_set_ratio as _token_set_ratio,
  7 |     token_sort_ratio as _token_sort_ratio,
  8 |     partial_token_set_ratio as _partial_token_set_ratio,
  9 |     partial_token_sort_ratio as _partial_token_sort_ratio,
 10 |     WRatio as _WRatio,
 11 |     QRatio as _QRatio,
 12 | )
 13 | 
 14 | from . import utils
 15 | 
 16 | ###########################
 17 | # Basic Scoring Functions #
 18 | ###########################
 19 | 
 20 | 
 21 | def _rapidfuzz_scorer(scorer, s1, s2, force_ascii, full_process):
 22 |     """
 23 |     wrapper around rapidfuzz function to be compatible with the API of thefuzz
 24 |     """
 25 |     if full_process:
 26 |         if s1 is None or s2 is None:
 27 |             return 0
 28 | 
 29 |         s1 = utils.full_process(s1, force_ascii=force_ascii)
 30 |         s2 = utils.full_process(s2, force_ascii=force_ascii)
 31 | 
 32 |     return int(round(scorer(s1, s2)))
 33 | 
 34 | 
 35 | def ratio(s1, s2):
 36 |     return _rapidfuzz_scorer(_ratio, s1, s2, False, False)
 37 | 
 38 | 
 39 | def partial_ratio(s1, s2):
 40 |     """
 41 |     Return the ratio of the most similar substring
 42 |     as a number between 0 and 100.
 43 |     """
 44 |     return _rapidfuzz_scorer(_partial_ratio, s1, s2, False, False)
 45 | 
 46 | 
 47 | ##############################
 48 | # Advanced Scoring Functions #
 49 | ##############################
 50 | 
 51 | # Sorted Token
 52 | #   find all alphanumeric tokens in the string
 53 | #   sort those tokens and take ratio of resulting joined strings
 54 | #   controls for unordered string elements
 55 | def token_sort_ratio(s1, s2, force_ascii=True, full_process=True):
 56 |     """
 57 |     Return a measure of the sequences' similarity between 0 and 100
 58 |     but sorting the tokens before comparing.
 59 |     """
 60 |     return _rapidfuzz_scorer(_token_sort_ratio, s1, s2, force_ascii, full_process)
 61 | 
 62 | 
 63 | def partial_token_sort_ratio(s1, s2, force_ascii=True, full_process=True):
 64 |     """
 65 |     Return the ratio of the most similar substring as a number between
 66 |     0 and 100 but sorting the tokens before comparing.
 67 |     """
 68 |     return _rapidfuzz_scorer(
 69 |         _partial_token_sort_ratio, s1, s2, force_ascii, full_process
 70 |     )
 71 | 
 72 | 
 73 | def token_set_ratio(s1, s2, force_ascii=True, full_process=True):
 74 |     return _rapidfuzz_scorer(_token_set_ratio, s1, s2, force_ascii, full_process)
 75 | 
 76 | 
 77 | def partial_token_set_ratio(s1, s2, force_ascii=True, full_process=True):
 78 |     return _rapidfuzz_scorer(
 79 |         _partial_token_set_ratio, s1, s2, force_ascii, full_process
 80 |     )
 81 | 
 82 | 
 83 | ###################
 84 | # Combination API #
 85 | ###################
 86 | 
 87 | # q is for quick
 88 | def QRatio(s1, s2, force_ascii=True, full_process=True):
 89 |     """
 90 |     Quick ratio comparison between two strings.
 91 | 
 92 |     Runs full_process from utils on both strings
 93 |     Short circuits if either of the strings is empty after processing.
 94 | 
 95 |     :param s1:
 96 |     :param s2:
 97 |     :param force_ascii: Allow only ASCII characters (Default: True)
 98 |     :param full_process: Process inputs, used here to avoid double processing in extract functions (Default: True)
 99 |     :return: similarity ratio
100 |     """
101 |     return _rapidfuzz_scorer(_QRatio, s1, s2, force_ascii, full_process)
102 | 
103 | 
104 | def UQRatio(s1, s2, full_process=True):
105 |     """
106 |     Unicode quick ratio
107 | 
108 |     Calls QRatio with force_ascii set to False
109 | 
110 |     :param s1:
111 |     :param s2:
112 |     :return: similarity ratio
113 |     """
114 |     return QRatio(s1, s2, force_ascii=False, full_process=full_process)
115 | 
116 | 
117 | # w is for weighted
118 | def WRatio(s1, s2, force_ascii=True, full_process=True):
119 |     """
120 |     Return a measure of the sequences' similarity between 0 and 100, using different algorithms.
121 | 
122 |     **Steps in the order they occur**
123 | 
124 |     #. Run full_process from utils on both strings
125 |     #. Short circuit if this makes either string empty
126 |     #. Take the ratio of the two processed strings (fuzz.ratio)
127 |     #. Run checks to compare the length of the strings
128 |         * If one of the strings is more than 1.5 times as long as the other
129 |           use partial_ratio comparisons - scale partial results by 0.9
130 |           (this makes sure only full results can return 100)
131 |         * If one of the strings is over 8 times as long as the other
132 |           instead scale by 0.6
133 | 
134 |     #. Run the other ratio functions
135 |         * if using partial ratio functions call partial_ratio,
136 |           partial_token_sort_ratio and partial_token_set_ratio
137 |           scale all of these by the ratio based on length
138 |         * otherwise call token_sort_ratio and token_set_ratio
139 |         * all token based comparisons are scaled by 0.95
140 |           (on top of any partial scalars)
141 | 
142 |     #. Take the highest value from these results
143 |        round it and return it as an integer.
144 | 
145 |     :param s1:
146 |     :param s2:
147 |     :param force_ascii: Allow only ascii characters
148 |     :type force_ascii: bool
149 |     :param full_process: Process inputs, used here to avoid double processing in extract functions (Default: True)
150 |     :return:
151 |     """
152 |     return _rapidfuzz_scorer(_WRatio, s1, s2, force_ascii, full_process)
153 | 
154 | 
155 | def UWRatio(s1, s2, full_process=True):
156 |     """
157 |     Return a measure of the sequences' similarity between 0 and 100,
158 |     using different algorithms. Same as WRatio but preserving unicode.
159 |     """
160 |     return WRatio(s1, s2, force_ascii=False, full_process=full_process)
161 | 
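
The token-sorting idea and the length-based scaling described in the WRatio docstring above can be illustrated with a small standalone sketch. It is not the rapidfuzz implementation that thefuzz delegates to: `difflib.SequenceMatcher` stands in for the base ratio, the names `base_ratio`, `token_sort_sketch` and `partial_scale` are hypothetical, and the thresholds follow the wording of the docstring (more than 1.5 times as long, over 8 times as long).

```python
from difflib import SequenceMatcher
from typing import Optional


def base_ratio(s1: str, s2: str) -> float:
    """Similarity in [0, 100]; difflib is an assumed stand-in for rapidfuzz."""
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


def token_sort_sketch(s1: str, s2: str) -> int:
    """Sort whitespace-separated tokens, then compare the joined strings."""
    a = " ".join(sorted(s1.lower().split()))
    b = " ".join(sorted(s2.lower().split()))
    return int(round(base_ratio(a, b)))


def partial_scale(s1: str, s2: str) -> Optional[float]:
    """Return the scale for partial comparisons, or None if they are not used.

    Mirrors the docstring wording only: 0.9 when one string is more than
    1.5 times as long as the other, 0.6 when it is over 8 times as long.
    """
    if not s1 or not s2:
        return None  # the real scorer short-circuits on empty input
    len_ratio = max(len(s1), len(s2)) / min(len(s1), len(s2))
    if len_ratio > 8:
        return 0.6
    if len_ratio > 1.5:
        return 0.9
    return None


# Word order no longer matters once the tokens are sorted.
print(token_sort_sketch("new york mets", "mets new york"))
print(partial_scale("ab", "abcdef"))
```

Because both inputs sort to the same token sequence, the first call yields a perfect score, while any partial comparison would be scaled below 100, which is how the docstring ensures only full matches can return 100.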


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/fuzz.pyi:
--------------------------------------------------------------------------------
 1 | def ratio(s1: str, s2: str) -> int: ...
 2 | def partial_ratio(s1: str, s2: str) -> int: ...
 3 | def token_sort_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 4 | def partial_token_sort_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 5 | def token_set_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 6 | def partial_token_set_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 7 | def QRatio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 8 | def UQRatio(s1: str, s2: str, full_process: bool = ...) -> int: ...
 9 | def WRatio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
10 | def UWRatio(s1: str, s2: str, full_process: bool = ...) -> int: ...
11 | 


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/process.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | from . import fuzz
  3 | from . import utils
  4 | import logging
  5 | from rapidfuzz import fuzz as rfuzz
  6 | from rapidfuzz import process as rprocess
  7 | from functools import partial
  8 | 
  9 | _logger = logging.getLogger(__name__)
 10 | 
 11 | default_scorer = fuzz.WRatio
 12 | default_processor = utils.full_process
 13 | 
 14 | 
 15 | def _get_processor(processor, scorer):
 16 |     """
 17 |     thefuzz runs both the default preprocessing of the function and the preprocessing
 18 |     function passed into process.* while rapidfuzz only runs the one passed into
 19 |     process.*. This function wraps the processor to mimic this behavior
 20 |     """
 21 |     if scorer not in (fuzz.WRatio, fuzz.QRatio,
 22 |                       fuzz.token_set_ratio, fuzz.token_sort_ratio,
 23 |                       fuzz.partial_token_set_ratio, fuzz.partial_token_sort_ratio,
 24 |                       fuzz.UWRatio, fuzz.UQRatio):
 25 |         return processor
 26 | 
 27 |     force_ascii = scorer not in [fuzz.UWRatio, fuzz.UQRatio]
 28 |     pre_processor = partial(utils.full_process, force_ascii=force_ascii)
 29 | 
 30 |     if not processor or processor == utils.full_process:
 31 |         return pre_processor
 32 | 
 33 |     def wrapper(s):
 34 |         return pre_processor(processor(s))
 35 | 
 36 |     return wrapper
 37 | 
 38 | 
 39 | # this allows lowering the scorers back to the scorers used in rapidfuzz
 40 | # this allows rapidfuzz to perform more optimizations behind the scenes.
 41 | # These mapped scorers are the same with two exceptions
 42 | # - default processor
 43 | # - result is not rounded
 44 | # these two exceptions need to be taken into account in the implementation
 45 | _scorer_lowering = {
 46 |     fuzz.ratio: rfuzz.ratio,
 47 |     fuzz.partial_ratio: rfuzz.partial_ratio,
 48 |     fuzz.token_set_ratio: rfuzz.token_set_ratio,
 49 |     fuzz.token_sort_ratio: rfuzz.token_sort_ratio,
 50 |     fuzz.partial_token_set_ratio: rfuzz.partial_token_set_ratio,
 51 |     fuzz.partial_token_sort_ratio: rfuzz.partial_token_sort_ratio,
 52 |     fuzz.WRatio: rfuzz.WRatio,
 53 |     fuzz.QRatio: rfuzz.QRatio,
 54 |     fuzz.UWRatio: rfuzz.WRatio,
 55 |     fuzz.UQRatio: rfuzz.QRatio,
 56 | }
 57 | 
 58 | 
 59 | def _get_scorer(scorer):
 60 |     """
 61 |     rapidfuzz scorers require the score_cutoff argument to be available
 62 |     This generates a compatible wrapper function
 63 |     """
 64 |     def wrapper(s1, s2, score_cutoff=0):
 65 |         return scorer(s1, s2)
 66 | 
 67 |     return _scorer_lowering.get(scorer, wrapper)
 68 | 
 69 | 
 70 | def _preprocess_query(query, processor):
 71 |     processed_query = processor(query) if processor else query
 72 |     if len(processed_query) == 0:
 73 |         _logger.warning("Applied processor reduces input query to empty string, "
 74 |                         "all comparisons will have score 0. "
 75 |                         f"[Query: \'{query}\']")
 76 | 
 77 |     return processed_query
 78 | 
 79 | 
 80 | def extractWithoutOrder(query, choices, processor=default_processor, scorer=default_scorer, score_cutoff=0):
 81 |     """
 82 |     Select the best match in a list or dictionary of choices.
 83 | 
 84 |     Find best matches in a list or dictionary of choices, return a
 85 |     generator of tuples containing the match and its score. If a dictionary
 86 |     is used, also returns the key for each match.
 87 | 
 88 |     Arguments:
 89 |         query: An object representing the thing we want to find.
 90 |         choices: An iterable or dictionary-like object containing choices
 91 |             to be matched against the query. Dictionary arguments of
 92 |             {key: value} pairs will attempt to match the query against
 93 |             each value.
 94 |         processor: Optional function of the form f(a) -> b, where a is the query or
 95 |             individual choice and b is the choice to be used in matching.
 96 | 
 97 |             This can be used to match against, say, the first element of
 98 |             a list:
 99 | 
100 |             lambda x: x[0]
101 | 
102 |             Defaults to thefuzz.utils.full_process().
103 |         scorer: Optional function for scoring matches between the query and
104 |             an individual processed choice. This should be a function
105 |             of the form f(query, choice) -> int.
106 | 
107 |             By default, fuzz.WRatio() is used and expects both query and
108 |             choice to be strings.
109 |         score_cutoff: Optional argument for score threshold. No matches with
110 |             a score less than this number will be returned. Defaults to 0.
111 | 
112 |     Returns:
113 |         Generator of tuples containing the match and its score.
114 | 
115 |         If a list is used for choices, then the result will be 2-tuples.
116 |         If a dictionary is used, then the result will be 3-tuples containing
117 |         the key for each match.
118 | 
119 |         For example, searching for 'bird' in the dictionary
120 | 
121 |         {'bard': 'train', 'dog': 'man'}
122 | 
123 |         may return
124 | 
125 |         ('train', 22, 'bard'), ('man', 0, 'dog')
126 |     """
127 |     is_mapping = hasattr(choices, "items")
128 |     is_lowered = scorer in _scorer_lowering
129 | 
130 |     query = _preprocess_query(query, processor)
131 |     it = rprocess.extract_iter(
132 |         query, choices,
133 |         processor=_get_processor(processor, scorer),
134 |         scorer=_get_scorer(scorer),
135 |         score_cutoff=score_cutoff
136 |     )
137 | 
138 |     for choice, score, key in it:
139 |         if is_lowered:
140 |             score = int(round(score))
141 | 
142 |         yield (choice, score, key) if is_mapping else (choice, score)
143 | 
144 | 
145 | def extract(query, choices, processor=default_processor, scorer=default_scorer, limit=5):
146 |     """
147 |     Select the best match in a list or dictionary of choices.
148 | 
149 |     Find best matches in a list or dictionary of choices, return a
150 |     list of tuples containing the match and its score. If a dictionary
151 |     is used, also returns the key for each match.
152 | 
153 |     Arguments:
154 |         query: An object representing the thing we want to find.
155 |         choices: An iterable or dictionary-like object containing choices
156 |             to be matched against the query. Dictionary arguments of
157 |             {key: value} pairs will attempt to match the query against
158 |             each value.
159 |         processor: Optional function of the form f(a) -> b, where a is the query or
160 |             individual choice and b is the choice to be used in matching.
161 | 
162 |             This can be used to match against, say, the first element of
163 |             a list:
164 | 
165 |             lambda x: x[0]
166 | 
167 |             Defaults to thefuzz.utils.full_process().
168 |         scorer: Optional function for scoring matches between the query and
169 |             an individual processed choice. This should be a function
170 |             of the form f(query, choice) -> int.
171 |             By default, fuzz.WRatio() is used and expects both query and
172 |             choice to be strings.
173 |         limit: Optional maximum for the number of elements returned. Defaults
174 |             to 5.
175 | 
176 |     Returns:
177 |         List of tuples containing the match and its score.
178 | 
179 |         If a list is used for choices, then the result will be 2-tuples.
180 |         If a dictionary is used, then the result will be 3-tuples containing
181 |         the key for each match.
182 | 
183 |         For example, searching for 'bird' in the dictionary
184 | 
185 |         {'bard': 'train', 'dog': 'man'}
186 | 
187 |         may return
188 | 
189 |         [('train', 22, 'bard'), ('man', 0, 'dog')]
190 |     """
191 |     return extractBests(query, choices, processor=processor, scorer=scorer, limit=limit)
192 | 
193 | 
194 | def extractBests(query, choices, processor=default_processor, scorer=default_scorer, score_cutoff=0, limit=5):
195 |     """
196 |     Get a list of the best matches to a collection of choices.
197 | 
198 |     Convenience function for getting the choices with best scores.
199 | 
200 |     Args:
201 |         query: A string to match against
202 |         choices: A list or dictionary of choices, suitable for use with
203 |             extract().
204 |         processor: Optional function for transforming choices before matching.
205 |             See extract().
206 |         scorer: Scoring function for extract().
207 |         score_cutoff: Optional argument for score threshold. No matches with
208 |             a score less than this number will be returned. Defaults to 0.
209 |         limit: Optional maximum for the number of elements returned. Defaults
210 |             to 5.
211 | 
212 |     Returns: A list of (match, score) tuples.
213 |     """
214 |     is_mapping = hasattr(choices, "items")
215 |     is_lowered = scorer in _scorer_lowering
216 | 
217 |     query = _preprocess_query(query, processor)
218 |     results = rprocess.extract(
219 |         query, choices,
220 |         processor=_get_processor(processor, scorer),
221 |         scorer=_get_scorer(scorer),
222 |         score_cutoff=score_cutoff,
223 |         limit=limit
224 |     )
225 | 
226 |     for i, (choice, score, key) in enumerate(results):
227 |         if is_lowered:
228 |             score = int(round(score))
229 | 
230 |         results[i] = (choice, score, key) if is_mapping else (choice, score)
231 | 
232 |     return results
233 | 
234 | 
235 | def extractOne(query, choices, processor=default_processor, scorer=default_scorer, score_cutoff=0):
236 |     """
237 |     Find the single best match above a score in a list of choices.
238 | 
239 |     This is a convenience method which returns the single best choice.
240 |     See extract() for the full arguments list.
241 | 
242 |     Args:
243 |         query: A string to match against
244 |         choices: A list or dictionary of choices, suitable for use with
245 |             extract().
246 |         processor: Optional function for transforming choices before matching.
247 |             See extract().
248 |         scorer: Scoring function for extract().
249 |         score_cutoff: Optional argument for score threshold. If the best
250 |             match is found, but its score is below this number, then
251 |             return None anyway ("not a good enough match").  Defaults to 0.
252 | 
253 |     Returns:
254 |         A tuple containing a single match and its score, if a match
255 |         was found that scored at least score_cutoff. Otherwise, returns None.
256 |     """
257 |     is_mapping = hasattr(choices, "items")
258 |     is_lowered = scorer in _scorer_lowering
259 | 
260 |     query = _preprocess_query(query, processor)
261 |     res = rprocess.extractOne(
262 |         query, choices,
263 |         processor=_get_processor(processor, scorer),
264 |         scorer=_get_scorer(scorer),
265 |         score_cutoff=score_cutoff
266 |     )
267 | 
268 |     if res is None:
269 |         return res
270 | 
271 |     choice, score, key = res
272 | 
273 |     if is_lowered:
274 |         score = int(round(score))
275 | 
276 |     return (choice, score, key) if is_mapping else (choice, score)
277 | 
278 | 
279 | def dedupe(contains_dupes, threshold=70, scorer=fuzz.token_set_ratio):
280 |     """
281 |     This convenience function takes a list of strings containing duplicates and uses fuzzy matching to identify
282 |     and remove duplicates. Specifically, it uses process.extract to identify duplicates that
283 |     score greater than a user defined threshold. Then, it looks for the longest item in the duplicate list
284 |     since we assume this item contains the most entity information and returns that. It breaks string
285 |     length ties on an alphabetical sort.
286 | 
287 |     Note: as the threshold DECREASES the number of duplicates that are found INCREASES. This means that the
288 |         returned deduplicated list will likely be shorter. Raise the threshold for dedupe to be less
289 |         sensitive.
290 | 
291 |     Args:
292 |         contains_dupes: A list of strings that we would like to dedupe.
293 |         threshold: Score between 0 and 100 at or above which two strings are
294 |             treated as duplicates. Defaults to 70 out of 100
295 |         scorer: Optional function for scoring matches between the query and
296 |             an individual processed choice. This should be a function
297 |             of the form f(query, choice) -> int.
298 |             By default, fuzz.token_set_ratio() is used and expects both query and
299 |             choice to be strings.
300 | 
301 |     Returns:
302 |         A deduplicated list. For example:
303 | 
304 |             In: contains_dupes = ['Frodo Baggin', 'Frodo Baggins', 'F. Baggins', 'Samwise G.', 'Gandalf', 'Bilbo Baggins']
305 |             In: dedupe(contains_dupes)
306 |             Out: ['Frodo Baggins', 'Samwise G.', 'Bilbo Baggins', 'Gandalf']
307 |     """
308 |     deduped = set()
309 |     for item in contains_dupes:
310 |         matches = extractBests(item, contains_dupes, scorer=scorer, score_cutoff=threshold, limit=None)
311 |         deduped.add(max(matches, key=lambda x: (len(x[0]), x[0]))[0])
312 | 
313 |     return list(deduped) if len(deduped) != len(contains_dupes) else contains_dupes
314 | 
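
The dedupe strategy described in the docstring above (collect every item that scores at or above the threshold, keep the longest, break length ties alphabetically) can be sketched without thefuzz installed. This is an illustration under stated assumptions, not the library code: `difflib.SequenceMatcher` replaces `fuzz.token_set_ratio`, the names `similarity` and `dedupe_sketch` are hypothetical, and the result is sorted only to make the output deterministic.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 100] (assumed stand-in scorer)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe_sketch(items: List[str], threshold: float = 70) -> List[str]:
    kept = set()
    for item in items:
        # Every item matches itself, so `matches` is never empty.
        matches = [other for other in items if similarity(item, other) >= threshold]
        # Prefer the longest string; break ties alphabetically.
        kept.add(max(matches, key=lambda s: (len(s), s)))
    # If nothing was merged, hand back the input order unchanged.
    return sorted(kept) if len(kept) != len(items) else list(items)


print(dedupe_sketch(["Frodo Baggins", "Frodo Baggin", "Gandalf"]))
```

Lowering `threshold` makes more pairs count as duplicates, so the returned list tends to shrink, which matches the note in the docstring.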


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/process.pyi:
--------------------------------------------------------------------------------
 1 | from collections.abc import Mapping
 2 | import typing
 3 | from typing import Any, Callable, Union, Tuple, Generator, TypeVar, Sequence
 4 | 
 5 | 
 6 | ChoicesT = Union[Mapping[str, str], Sequence[str]]
 7 | T = TypeVar('T')
 8 | ProcessorT = Union[Callable[[str, bool], str], Callable[[Any], Any]]
 9 | ScorerT = Callable[[str, str, bool, bool], int]
10 | 
11 | 
12 | @typing.overload
13 | def extractWithoutOrder(query: str, choices: Mapping[str, str], processor: ProcessorT, scorer: ScorerT, score_cutoff: int = ...) -> Generator[Tuple[str, int, str], None, None]: ...
14 | 
15 | 
16 | @typing.overload
17 | def extractWithoutOrder(query: str, choices: Sequence[str], processor: ProcessorT, scorer: ScorerT, score_cutoff: int = ...) -> Generator[Tuple[str, int], None, None]: ...
18 | 


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/py.typed:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/utils.py:
--------------------------------------------------------------------------------
 1 | from rapidfuzz.utils import default_process as _default_process
 2 | 
 3 | translation_table = {i: None for i in range(128, 256)}  # ascii dammit!
 4 | 
 5 | 
 6 | def ascii_only(s):
 7 |     return s.translate(translation_table)
 8 | 
 9 | 
10 | def full_process(s, force_ascii=False):
11 |     """
12 |     Process string by
13 |     -- removing all but letters and numbers
14 |     -- trim whitespace
15 |     -- force to lower case
16 |     if force_ascii == True, force convert to ascii
17 |     """
18 | 
19 |     if force_ascii:
20 |         s = ascii_only(str(s))
21 | 
22 |     return _default_process(s)
23 | 
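
The processing steps listed in the `full_process` docstring can be approximated with the standard library, which also shows why a query such as ':::::::' ends up empty and triggers the warning tested in test_thefuzz_pytest.py. This is a rough sketch, not rapidfuzz's `default_process`: the name `full_process_sketch` is hypothetical, and using the regex class `\W` (which keeps underscores) is an assumption about edge characters.

```python
import re

# Same idea as translation_table above: drop code points 128..255.
_DROP_HIGH = {i: None for i in range(128, 256)}


def full_process_sketch(s: str, force_ascii: bool = False) -> str:
    if force_ascii:
        s = str(s).translate(_DROP_HIGH)
    # Replace anything that is not a letter, digit or underscore with a space.
    s = re.sub(r"\W", " ", s)
    # Trim surrounding whitespace and force lower case.
    return s.strip().lower()


print(repr(full_process_sketch("  Hello, World!  ")))
print(repr(full_process_sketch(":::::::")))
print(repr(full_process_sketch("Caf\u00e9", force_ascii=True)))
```

Punctuation becomes whitespace rather than being deleted, so interior runs of spaces can remain after trimming.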


--------------------------------------------------------------------------------
/thefuzz-master/thefuzz/utils.pyi:
--------------------------------------------------------------------------------
1 | 
2 | def ascii_only(s: str) -> str: ...
3 | def full_process(s: str, force_ascii: bool = ...) -> str: ...
4 | 


--------------------------------------------------------------------------------
/thefuzz-master/tox.ini:
--------------------------------------------------------------------------------
 1 | [tox]
 2 | envlist = py{38, 39, 310, 311, 312, py3}
 3 | skip_missing_interpreters = True
 4 | 
 5 | [testenv]
 6 | deps = pytest
 7 |        pycodestyle
 8 |        hypothesis
 9 | commands = pytest
10 | 


--------------------------------------------------------------------------------
/thefuzz/__init__.py:
--------------------------------------------------------------------------------
1 | __version__ = '0.21.0'
2 | 


--------------------------------------------------------------------------------
/thefuzz/fuzz.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | 
  3 | from rapidfuzz.fuzz import (
  4 |     ratio as _ratio,
  5 |     partial_ratio as _partial_ratio,
  6 |     token_set_ratio as _token_set_ratio,
  7 |     token_sort_ratio as _token_sort_ratio,
  8 |     partial_token_set_ratio as _partial_token_set_ratio,
  9 |     partial_token_sort_ratio as _partial_token_sort_ratio,
 10 |     WRatio as _WRatio,
 11 |     QRatio as _QRatio,
 12 | )
 13 | 
 14 | from . import utils
 15 | 
 16 | ###########################
 17 | # Basic Scoring Functions #
 18 | ###########################
 19 | 
 20 | 
 21 | def _rapidfuzz_scorer(scorer, s1, s2, force_ascii, full_process):
 22 |     """
 23 |     wrapper around rapidfuzz function to be compatible with the API of thefuzz
 24 |     """
 25 |     if full_process:
 26 |         if s1 is None or s2 is None:
 27 |             return 0
 28 | 
 29 |         s1 = utils.full_process(s1, force_ascii=force_ascii)
 30 |         s2 = utils.full_process(s2, force_ascii=force_ascii)
 31 | 
 32 |     return int(round(scorer(s1, s2)))
 33 | 
 34 | 
 35 | def ratio(s1, s2):
 36 |     return _rapidfuzz_scorer(_ratio, s1, s2, False, False)
 37 | 
 38 | 
 39 | def partial_ratio(s1, s2):
 40 |     """
 41 |     Return the ratio of the most similar substring
 42 |     as a number between 0 and 100.
 43 |     """
 44 |     return _rapidfuzz_scorer(_partial_ratio, s1, s2, False, False)
 45 | 
 46 | 
 47 | ##############################
 48 | # Advanced Scoring Functions #
 49 | ##############################
 50 | 
 51 | # Sorted Token
 52 | #   find all alphanumeric tokens in the string
 53 | #   sort those tokens and take ratio of resulting joined strings
 54 | #   controls for unordered string elements
 55 | def token_sort_ratio(s1, s2, force_ascii=True, full_process=True):
 56 |     """
 57 |     Return a measure of the sequences' similarity between 0 and 100
 58 |     but sorting the tokens before comparing.
 59 |     """
 60 |     return _rapidfuzz_scorer(_token_sort_ratio, s1, s2, force_ascii, full_process)
 61 | 
 62 | 
 63 | def partial_token_sort_ratio(s1, s2, force_ascii=True, full_process=True):
 64 |     """
 65 |     Return the ratio of the most similar substring as a number between
 66 |     0 and 100 but sorting the tokens before comparing.
 67 |     """
 68 |     return _rapidfuzz_scorer(
 69 |         _partial_token_sort_ratio, s1, s2, force_ascii, full_process
 70 |     )
 71 | 
 72 | 
 73 | def token_set_ratio(s1, s2, force_ascii=True, full_process=True):
 74 |     return _rapidfuzz_scorer(_token_set_ratio, s1, s2, force_ascii, full_process)
 75 | 
 76 | 
 77 | def partial_token_set_ratio(s1, s2, force_ascii=True, full_process=True):
 78 |     return _rapidfuzz_scorer(
 79 |         _partial_token_set_ratio, s1, s2, force_ascii, full_process
 80 |     )
 81 | 
 82 | 
 83 | ###################
 84 | # Combination API #
 85 | ###################
 86 | 
 87 | # q is for quick
 88 | def QRatio(s1, s2, force_ascii=True, full_process=True):
 89 |     """
 90 |     Quick ratio comparison between two strings.
 91 | 
 92 |     Runs full_process from utils on both strings
 93 |     Short circuits if either of the strings is empty after processing.
 94 | 
 95 |     :param s1:
 96 |     :param s2:
 97 |     :param force_ascii: Allow only ASCII characters (Default: True)
 98 |     :param full_process: Process inputs, used here to avoid double processing in extract functions (Default: True)
 99 |     :return: similarity ratio
100 |     """
101 |     return _rapidfuzz_scorer(_QRatio, s1, s2, force_ascii, full_process)
102 | 
103 | 
104 | def UQRatio(s1, s2, full_process=True):
105 |     """
106 |     Unicode quick ratio
107 | 
108 |     Calls QRatio with force_ascii set to False
109 | 
110 |     :param s1:
111 |     :param s2:
112 |     :return: similarity ratio
113 |     """
114 |     return QRatio(s1, s2, force_ascii=False, full_process=full_process)
115 | 
116 | 
117 | # w is for weighted
118 | def WRatio(s1, s2, force_ascii=True, full_process=True):
119 |     """
120 |     Return a measure of the sequences' similarity between 0 and 100, using different algorithms.
121 | 
122 |     **Steps in the order they occur**
123 | 
124 |     #. Run full_process from utils on both strings
125 |     #. Short circuit if this makes either string empty
126 |     #. Take the ratio of the two processed strings (fuzz.ratio)
127 |     #. Run checks to compare the length of the strings
128 |         * If one of the strings is more than 1.5 times as long as the other
129 |           use partial_ratio comparisons - scale partial results by 0.9
130 |           (this makes sure only full results can return 100)
131 |         * If one of the strings is over 8 times as long as the other
132 |           instead scale by 0.6
133 | 
134 |     #. Run the other ratio functions
135 |         * if using partial ratio functions call partial_ratio,
136 |           partial_token_sort_ratio and partial_token_set_ratio
137 |           scale all of these by the ratio based on length
138 |         * otherwise call token_sort_ratio and token_set_ratio
139 |         * all token based comparisons are scaled by 0.95
140 |           (on top of any partial scalars)
141 | 
142 |     #. Take the highest value from these results
143 |        round it and return it as an integer.
144 | 
145 |     :param s1:
146 |     :param s2:
147 |     :param force_ascii: Allow only ascii characters
148 |     :type force_ascii: bool
149 |     :param full_process: Process inputs, used here to avoid double processing in extract functions (Default: True)
150 |     :return:
151 |     """
152 |     return _rapidfuzz_scorer(_WRatio, s1, s2, force_ascii, full_process)
153 | 
154 | 
155 | def UWRatio(s1, s2, full_process=True):
156 |     """
157 |     Return a measure of the sequences' similarity between 0 and 100,
158 |     using different algorithms. Same as WRatio but preserving unicode.
159 |     """
160 |     return WRatio(s1, s2, force_ascii=False, full_process=full_process)
161 | 
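
The sorted-token idea described in the comments above token_sort_ratio can be illustrated without rapidfuzz. What follows is a minimal sketch, not the library's implementation: it assumes difflib.SequenceMatcher as a stand-in scorer and plain whitespace splitting in place of utils.full_process, and the helper names simple_ratio, sorted_tokens, and token_sort_sketch are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Stand-in scorer (assumption): difflib similarity scaled to 0..100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def sorted_tokens(s: str) -> str:
    """Lowercase, split on whitespace, sort the tokens, and rejoin them."""
    return " ".join(sorted(s.lower().split()))


def token_sort_sketch(a: str, b: str) -> int:
    """Score the two strings after sorting their tokens."""
    return simple_ratio(sorted_tokens(a), sorted_tokens(b))


plain_score = simple_ratio("new york mets", "mets new york")
sorted_score = token_sort_sketch("new york mets", "mets new york")

# Token order no longer matters once both sides are sorted.
print(plain_score < sorted_score)  # True
print(sorted_score)                # 100
```

Sorting the tokens first is what "controls for unordered string elements": both inputs collapse to the same joined string, so any character-level scorer sees them as identical.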


--------------------------------------------------------------------------------
/thefuzz/fuzz.pyi:
--------------------------------------------------------------------------------
 1 | def ratio(s1: str, s2: str) -> int: ...
 2 | def partial_ratio(s1: str, s2: str) -> int: ...
 3 | def token_sort_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 4 | def partial_token_sort_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 5 | def token_set_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 6 | def partial_token_set_ratio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 7 | def QRatio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
 8 | def UQRatio(s1: str, s2: str, full_process: bool = ...) -> int: ...
 9 | def WRatio(s1: str, s2: str, force_ascii: bool = ..., full_process: bool = ...) -> int: ...
10 | def UWRatio(s1: str, s2: str, full_process: bool = ...) -> int: ...
11 | 


--------------------------------------------------------------------------------
/thefuzz/process.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | from . import fuzz
  3 | from . import utils
  4 | import logging
  5 | from rapidfuzz import fuzz as rfuzz
  6 | from rapidfuzz import process as rprocess
  7 | from functools import partial
  8 | 
  9 | _logger = logging.getLogger(__name__)
 10 | 
 11 | default_scorer = fuzz.WRatio
 12 | default_processor = utils.full_process
 13 | 
 14 | 
 15 | def _get_processor(processor, scorer):
 16 |     """
 17 |     thefuzz runs both the default preprocessing of the function and the preprocessing
 18 |     function passed into process.* while rapidfuzz only runs the one passed into
 19 |     process.*. This function wraps the processor to mimic this behavior
 20 |     """
 21 |     if scorer not in (fuzz.WRatio, fuzz.QRatio,
 22 |                       fuzz.token_set_ratio, fuzz.token_sort_ratio,
 23 |                       fuzz.partial_token_set_ratio, fuzz.partial_token_sort_ratio,
 24 |                       fuzz.UWRatio, fuzz.UQRatio):
 25 |         return processor
 26 | 
 27 |     force_ascii = scorer not in [fuzz.UWRatio, fuzz.UQRatio]
 28 |     pre_processor = partial(utils.full_process, force_ascii=force_ascii)
 29 | 
 30 |     if not processor or processor == utils.full_process:
 31 |         return pre_processor
 32 | 
 33 |     def wrapper(s):
 34 |         return pre_processor(processor(s))
 35 | 
 36 |     return wrapper
 37 | 
 38 | 
 39 | # This maps thefuzz scorers back to the scorers used in rapidfuzz,
 40 | # which allows rapidfuzz to perform more optimizations behind the scenes.
 41 | # The mapped scorers behave the same, with two exceptions:
 42 | # - default processor
 43 | # - result is not rounded
 44 | # These two exceptions need to be taken into account in the implementation.
 45 | _scorer_lowering = {
 46 |     fuzz.ratio: rfuzz.ratio,
 47 |     fuzz.partial_ratio: rfuzz.partial_ratio,
 48 |     fuzz.token_set_ratio: rfuzz.token_set_ratio,
 49 |     fuzz.token_sort_ratio: rfuzz.token_sort_ratio,
 50 |     fuzz.partial_token_set_ratio: rfuzz.partial_token_set_ratio,
 51 |     fuzz.partial_token_sort_ratio: rfuzz.partial_token_sort_ratio,
 52 |     fuzz.WRatio: rfuzz.WRatio,
 53 |     fuzz.QRatio: rfuzz.QRatio,
 54 |     fuzz.UWRatio: rfuzz.WRatio,
 55 |     fuzz.UQRatio: rfuzz.QRatio,
 56 | }
 57 | 
 58 | 
 59 | def _get_scorer(scorer):
 60 |     """
 61 |     rapidfuzz scorers require the score_cutoff argument to be available
 62 |     This generates a compatible wrapper function
 63 |     """
 64 |     def wrapper(s1, s2, score_cutoff=0):
 65 |         return scorer(s1, s2)
 66 | 
 67 |     return _scorer_lowering.get(scorer, wrapper)
 68 | 
 69 | 
 70 | def _preprocess_query(query, processor):
 71 |     processed_query = processor(query) if processor else query
 72 |     if len(processed_query) == 0:
 73 |         _logger.warning("Applied processor reduces input query to empty string, "
 74 |                         "all comparisons will have score 0. "
 75 |                         f"[Query: \'{query}\']")
 76 | 
 77 |     return processed_query
 78 | 
 79 | 
 80 | def extractWithoutOrder(query, choices, processor=default_processor, scorer=default_scorer, score_cutoff=0):
 81 |     """
 82 |     Select the best match in a list or dictionary of choices.
 83 | 
 84 |     Find best matches in a list or dictionary of choices, return a
 85 |     generator of tuples containing the match and its score. If a dictionary
 86 |     is used, also returns the key for each match.
 87 | 
 88 |     Arguments:
 89 |         query: An object representing the thing we want to find.
 90 |         choices: An iterable or dictionary-like object containing choices
 91 |             to be matched against the query. Dictionary arguments of
 92 |             {key: value} pairs will attempt to match the query against
 93 |             each value.
 94 |         processor: Optional function of the form f(a) -> b, where a is the query or
 95 |             individual choice and b is the choice to be used in matching.
 96 | 
 97 |             This can be used to match against, say, the first element of
 98 |             a list:
 99 | 
100 |             lambda x: x[0]
101 | 
102 |             Defaults to thefuzz.utils.full_process().
103 |         scorer: Optional function for scoring matches between the query and
104 |             an individual processed choice. This should be a function
105 |             of the form f(query, choice) -> int.
106 | 
107 |             By default, fuzz.WRatio() is used and expects both query and
108 |             choice to be strings.
109 |         score_cutoff: Optional argument for score threshold. No matches with
110 |             a score less than this number will be returned. Defaults to 0.
111 | 
112 |     Returns:
113 |         Generator of tuples containing the match and its score.
114 | 
115 |         If a list is used for choices, then the result will be 2-tuples.
116 |         If a dictionary is used, then the result will be 3-tuples containing
117 |         the key for each match.
118 | 
119 |         For example, searching for 'bird' in the dictionary
120 | 
121 |         {'bard': 'train', 'dog': 'man'}
122 | 
123 |         may return
124 | 
125 |         ('train', 22, 'bard'), ('man', 0, 'dog')
126 |     """
127 |     is_mapping = hasattr(choices, "items")
128 |     is_lowered = scorer in _scorer_lowering
129 | 
130 |     query = _preprocess_query(query, processor)
131 |     it = rprocess.extract_iter(
132 |         query, choices,
133 |         processor=_get_processor(processor, scorer),
134 |         scorer=_get_scorer(scorer),
135 |         score_cutoff=score_cutoff
136 |     )
137 | 
138 |     for choice, score, key in it:
139 |         if is_lowered:
140 |             score = int(round(score))
141 | 
142 |         yield (choice, score, key) if is_mapping else (choice, score)
143 | 
144 | 
145 | def extract(query, choices, processor=default_processor, scorer=default_scorer, limit=5):
146 |     """
147 |     Select the best match in a list or dictionary of choices.
148 | 
149 |     Find best matches in a list or dictionary of choices, return a
150 |     list of tuples containing the match and its score. If a dictionary
151 |     is used, also returns the key for each match.
152 | 
153 |     Arguments:
154 |         query: An object representing the thing we want to find.
155 |         choices: An iterable or dictionary-like object containing choices
156 |             to be matched against the query. Dictionary arguments of
157 |             {key: value} pairs will attempt to match the query against
158 |             each value.
159 |         processor: Optional function of the form f(a) -> b, where a is the query or
160 |             individual choice and b is the choice to be used in matching.
161 | 
162 |             This can be used to match against, say, the first element of
163 |             a list:
164 | 
165 |             lambda x: x[0]
166 | 
167 |             Defaults to thefuzz.utils.full_process().
168 |         scorer: Optional function for scoring matches between the query and
169 |             an individual processed choice. This should be a function
170 |             of the form f(query, choice) -> int.
171 |             By default, fuzz.WRatio() is used and expects both query and
172 |             choice to be strings.
173 |         limit: Optional maximum for the number of elements returned. Defaults
174 |             to 5.
175 | 
176 |     Returns:
177 |         List of tuples containing the match and its score.
178 | 
179 |         If a list is used for choices, then the result will be 2-tuples.
180 |         If a dictionary is used, then the result will be 3-tuples containing
181 |         the key for each match.
182 | 
183 |         For example, searching for 'bird' in the dictionary
184 | 
185 |         {'bard': 'train', 'dog': 'man'}
186 | 
187 |         may return
188 | 
189 |         [('train', 22, 'bard'), ('man', 0, 'dog')]
190 |     """
191 |     return extractBests(query, choices, processor=processor, scorer=scorer, limit=limit)
192 | 
193 | 
194 | def extractBests(query, choices, processor=default_processor, scorer=default_scorer, score_cutoff=0, limit=5):
195 |     """
196 |     Get a list of the best matches to a collection of choices.
197 | 
198 |     Convenience function for getting the choices with best scores.
199 | 
200 |     Args:
201 |         query: A string to match against
202 |         choices: A list or dictionary of choices, suitable for use with
203 |             extract().
204 |         processor: Optional function for transforming choices before matching.
205 |             See extract().
206 |         scorer: Scoring function for extract().
207 |         score_cutoff: Optional argument for score threshold. No matches with
208 |             a score less than this number will be returned. Defaults to 0.
209 |         limit: Optional maximum for the number of elements returned. Defaults
210 |             to 5.
211 | 
212 |     Returns: A list of (match, score) tuples.
213 |     """
214 |     is_mapping = hasattr(choices, "items")
215 |     is_lowered = scorer in _scorer_lowering
216 | 
217 |     query = _preprocess_query(query, processor)
218 |     results = rprocess.extract(
219 |         query, choices,
220 |         processor=_get_processor(processor, scorer),
221 |         scorer=_get_scorer(scorer),
222 |         score_cutoff=score_cutoff,
223 |         limit=limit
224 |     )
225 | 
226 |     for i, (choice, score, key) in enumerate(results):
227 |         if is_lowered:
228 |             score = int(round(score))
229 | 
230 |         results[i] = (choice, score, key) if is_mapping else (choice, score)
231 | 
232 |     return results
233 | 
234 | 
235 | def extractOne(query, choices, processor=default_processor, scorer=default_scorer, score_cutoff=0):
236 |     """
237 |     Find the single best match above a score in a list of choices.
238 | 
239 |     This is a convenience method which returns the single best choice.
240 |     See extract() for the full arguments list.
241 | 
242 |     Args:
243 |         query: A string to match against
244 |         choices: A list or dictionary of choices, suitable for use with
245 |             extract().
246 |         processor: Optional function for transforming choices before matching.
247 |             See extract().
248 |         scorer: Scoring function for extract().
249 |         score_cutoff: Optional argument for score threshold. If the best
250 |             match is found, but it is not greater than this number, then
251 |             return None anyway ("not a good enough match").  Defaults to 0.
252 | 
253 |     Returns:
254 |         A tuple containing a single match and its score, if a match
255 |         was found that was above score_cutoff. Otherwise, returns None.
256 |     """
257 |     is_mapping = hasattr(choices, "items")
258 |     is_lowered = scorer in _scorer_lowering
259 | 
260 |     query = _preprocess_query(query, processor)
261 |     res = rprocess.extractOne(
262 |         query, choices,
263 |         processor=_get_processor(processor, scorer),
264 |         scorer=_get_scorer(scorer),
265 |         score_cutoff=score_cutoff
266 |     )
267 | 
268 |     if res is None:
269 |         return res
270 | 
271 |     choice, score, key = res
272 | 
273 |     if is_lowered:
274 |         score = int(round(score))
275 | 
276 |     return (choice, score, key) if is_mapping else (choice, score)
277 | 
278 | 
279 | def dedupe(contains_dupes, threshold=70, scorer=fuzz.token_set_ratio):
280 |     """
281 |     This convenience function takes a list of strings containing duplicates and uses fuzzy matching to identify
282 |     and remove duplicates. Specifically, it uses process.extract to identify duplicates that
283 |     score greater than a user defined threshold. Then, it looks for the longest item in the duplicate list
284 |     since we assume this item contains the most entity information and returns that. It breaks string
285 |     length ties on an alphabetical sort.
286 | 
287 |     Note: as the threshold DECREASES the number of duplicates that are found INCREASES. This means that the
288 |         returned deduplicated list will likely be shorter. Raise the threshold for dedupe to be less
289 |         sensitive.
290 | 
291 |     Args:
292 |         contains_dupes: A list of strings that we would like to dedupe.
293 |         threshold: the numerical value (0,100) point at which we expect to find duplicates.
294 |             Defaults to 70 out of 100
295 |         scorer: Optional function for scoring matches between the query and
296 |             an individual processed choice. This should be a function
297 |             of the form f(query, choice) -> int.
298 |             By default, fuzz.token_set_ratio() is used and expects both query and
299 |             choice to be strings.
300 | 
301 |     Returns:
302 |         A deduplicated list. For example:
303 | 
304 |             In: contains_dupes = ['Frodo Baggin', 'Frodo Baggins', 'F. Baggins', 'Samwise G.', 'Gandalf', 'Bilbo Baggins']
305 |             In: dedupe(contains_dupes)
306 |             Out: ['Frodo Baggins', 'Samwise G.', 'Bilbo Baggins', 'Gandalf']
307 |     """
308 |     deduped = set()
309 |     for item in contains_dupes:
310 |         matches = extractBests(item, contains_dupes, scorer=scorer, score_cutoff=threshold, limit=None)
311 |         deduped.add(max(matches, key=lambda x: (len(x[0]), x[0]))[0])
312 | 
313 |     return list(deduped) if len(deduped) != len(contains_dupes) else contains_dupes
314 | 
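
The selection rule in dedupe (group everything at or above the threshold, keep the longest entry, break length ties alphabetically) can be sketched in isolation. This is an illustrative approximation under stated assumptions: simple_ratio is a hypothetical difflib-based stand-in for fuzz.token_set_ratio, and dedupe_sketch is a hypothetical name that does not call extractBests.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Stand-in scorer (assumption): difflib similarity scaled to 0..100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def dedupe_sketch(items, threshold=70, scorer=simple_ratio):
    """Keep one representative per group of near-duplicates."""
    kept = set()
    for item in items:
        # Every entry scoring at or above the threshold is treated as a
        # duplicate of `item`; `item` itself always qualifies with 100.
        group = [other for other in items if scorer(item, other) >= threshold]
        # Prefer the longest string; break length ties alphabetically.
        kept.add(max(group, key=lambda s: (len(s), s)))
    # Mirror the original: return the input unchanged if nothing was merged.
    return list(kept) if len(kept) != len(items) else items


names = ["Frodo Baggin", "Frodo Baggins", "Gandalf"]
result = dedupe_sketch(names)
print(sorted(result))  # ['Frodo Baggins', 'Gandalf']
```

Because the representatives are collected in a set, the order of the returned list is not guaranteed, which is why the usage line sorts the result before printing.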


--------------------------------------------------------------------------------
/thefuzz/process.pyi:
--------------------------------------------------------------------------------
 1 | from collections.abc import Mapping
 2 | import typing
 3 | from typing import Any, Callable, Union, Tuple, Generator, TypeVar, Sequence
 4 | 
 5 | 
 6 | ChoicesT = Union[Mapping[str, str], Sequence[str]]
 7 | T = TypeVar('T')
 8 | ProcessorT = Union[Callable[[str, bool], str], Callable[[Any], Any]]
 9 | ScorerT = Callable[[str, str, bool, bool], int]
10 | 
11 | 
12 | @typing.overload
13 | def extractWithoutOrder(query: str, choices: Mapping[str, str], processor: ProcessorT, scorer: ScorerT, score_cutoff: int = ...) -> Generator[Tuple[str, int, str], None, None]: ...
14 | 
15 | 
16 | @typing.overload
17 | def extractWithoutOrder(query: str, choices: Sequence[str], processor: ProcessorT, scorer: ScorerT, score_cutoff: int = ...) -> Generator[Tuple[str, int], None, None]: ...
18 | 


--------------------------------------------------------------------------------
/thefuzz/py.typed:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/thefuzz/utils.py:
--------------------------------------------------------------------------------
 1 | from rapidfuzz.utils import default_process as _default_process
 2 | 
 3 | translation_table = {i: None for i in range(128, 256)}  # ascii dammit!
 4 | 
 5 | 
 6 | def ascii_only(s):
 7 |     return s.translate(translation_table)
 8 | 
 9 | 
10 | def full_process(s, force_ascii=False):
11 |     """
12 |     Process string by
13 |     -- removing all but letters and numbers
14 |     -- trim whitespace
15 |     -- force to lower case
16 |     if force_ascii == True, force convert to ascii
17 |     """
18 | 
19 |     if force_ascii:
20 |         s = ascii_only(str(s))
21 | 
22 |     return _default_process(s)
23 | 
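
To make the effect of force_ascii concrete, here is a small sketch that runs without rapidfuzz. It reuses the translation table exactly as defined in utils.py, while default_process_sketch is a hypothetical regex-based approximation of rapidfuzz's default_process (an assumption; the real function may differ in details such as how separators are replaced).

```python
import re

# Same table as in utils.py: it drops code points 128..255 only.
translation_table = {i: None for i in range(128, 256)}


def default_process_sketch(s: str) -> str:
    """Approximation (assumption) of rapidfuzz's default_process."""
    # Replace each non-alphanumeric character with a space,
    # then trim surrounding whitespace and lowercase the result.
    return re.sub(r"[\W_]", " ", s).strip().lower()


def full_process_sketch(s: str, force_ascii: bool = False) -> str:
    """Optionally strip code points 128..255, then normalize the string."""
    if force_ascii:
        s = str(s).translate(translation_table)
    return default_process_sketch(s)


print(full_process_sketch("Hello-World!"))                 # hello world
print(full_process_sketch("caf\u00e9", force_ascii=True))  # caf
```

Note that the table covers only code points 128 through 255, so characters above that range pass through ascii_only untouched.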


--------------------------------------------------------------------------------
/thefuzz/utils.pyi:
--------------------------------------------------------------------------------
1 | 
2 | def ascii_only(s: str) -> str: ...
3 | def full_process(s: str, force_ascii: bool = ...) -> str: ...
4 | 


--------------------------------------------------------------------------------
/youtube/LLM Apps： Professional Opportunities for LLM App Developers..m4a:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI-LLM-Bootcamp/data/f776b47268f2d7152ac1accf3ce47472ec1a59e7/youtube/LLM Apps： Professional Opportunities for LLM App Developers..m4a


--------------------------------------------------------------------------------