├── README.md
├── PatentAdvisorBlockChain.xlsx
├── License.txt
├── GBQUploadJoinDownload.ipynb
├── gp-search-20180428-150343.csv
└── PatentAnalysisExample.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # PatentAnalysisNotebooks
2 | Public examples of using Python to analyze patents
3 |
--------------------------------------------------------------------------------
/PatentAdvisorBlockChain.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DeltaCharlieAlpha/PatentAnalysisNotebooks/HEAD/PatentAdvisorBlockChain.xlsx
--------------------------------------------------------------------------------
/License.txt:
--------------------------------------------------------------------------------
1 | Copyright 2018 David Andrews
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4 |
5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6 |
7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8 |
--------------------------------------------------------------------------------
/GBQUploadJoinDownload.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Google BigQuery\n",
8 | "### Sample: upload a list to GBQ, join it to GBQ data, download the results\n",
9 | "David Andrews\n",
10 | "Legal Analytics"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "This sample connects to Google BigQuery to retrieve patent data. If you are not already using Google BigQuery, you need to do some setup in Google Cloud Platform first. Any of the many getting-started tutorials for Google BigQuery will walk you through installing the libraries, caching credentials, and setting up a project ID.\n",
18 | "\n",
19 | "#### Caveat\n",
20 | "As-is. Use at your own risk. Joining data can be hard and you might think you have the right results, but double-check before you rely on this data, and check with other data sources before you litigate!"
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 1,
26 | "metadata": {},
27 | "outputs": [],
28 | "source": [
29 | "# import the BigQuery client, pandas, and the Google API and display helpers\n",
30 | "from google.cloud import bigquery\n",
31 | "import pandas as pd\n",
32 | "from oauth2client.client import GoogleCredentials\n",
33 | "from googleapiclient import discovery\n",
34 | "from IPython.display import display, HTML"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "# Upload data to GBQ"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 2,
47 | "metadata": {},
48 | "outputs": [
49 | {
50 | "data": {
98 | "text/plain": [
99 | " Publication_Number\n",
100 | "0 US-2003004827-A1\n",
101 | "1 US-2003097331-A1\n",
102 | "2 US-2003191719-A1\n",
103 | "3 US-2004123129-A1\n",
104 | "4 US-2005177716-A1"
105 | ]
106 | },
107 | "metadata": {},
108 | "output_type": "display_data"
109 | }
110 | ],
111 | "source": [
112 | "#load in the patents we want to join\n",
113 | "#these files are just flat lists of patent numbers. You can also load a local file by passing its path, e.g. r\"c:\\data\\portfolio.csv\"\n",
114 | "#use read_excel instead of read_csv if you have an xlsx file; like read_csv, it accepts either a local path or a URL.\n",
115 | "example_url = \"https://docs.google.com/spreadsheets/d/e/2PACX-1vTwNjPYeJV6l0lOTjMnI65rE4i_Prtc4Gnku3HupqBzuZ5v9wzhYWAA26AivTkFPw_AbwGuiuqoj_lq/pub?output=csv\"\n",
116 | "df_patent_numbers = pd.read_csv(example_url)\n",
117 | "\n",
118 | "#patent numbers come in lots of different flavors, like \"US9000000B2\" \"9000000\" \"US9000000\" \"US-9000000-B2\" or \"9,000,000\" \n",
119 | "#you can use RegEx101.com to try out regular expressions to match your format and reformat to \"US-9000000-B2\" to match GBQ\n",
120 | "#Sample format is US1234567B2 so we change it to US-1234567-B2\n",
121 | "df_patent_numbers['Publication_Number'] = df_patent_numbers['Publication_Number'].str.replace(r\"(\\D*)(\\d*)(\\D\\d?)\", \"\\\\1-\\\\2-\\\\3\", regex=True)\n",
122 | "#US application publication numbers need one more fix: Google Patents writes them as the year plus 7 digits\n",
123 | "#(e.g. US20030004827A1), while GBQ drops the zero right after the year (US-2003004827-A1). The sample data was\n",
124 | "#downloaded from Google Patents, so strip that zero to match the format in Google BigQuery.\n",
125 | "df_patent_numbers['Publication_Number'] = df_patent_numbers['Publication_Number'].str.replace(r\"(\\D*-)(\\d{4})0(\\d{6})(-A1)\",\n",
126 | "                                                                                           \"\\\\1\\\\2\\\\3\\\\4\", regex=True)\n",
127 | "#if the Regular expressions in the replace function are challenging to understand, try plugging them into \n",
128 | "#regex101.com along with some sample numbers and using the explanation to understand what each portion is doing.\n",
129 | "#unpacking the above REGEXP a little:\n",
130 | "# (\\D*) = grab all the non-digits at the front e.g. US\n",
131 | "# (\\d*) = grab all the digits, e.g. 1234567\n",
132 | "# (\\D\\d?) = grab a trailing non-digit optionally followed by a digit e.g. A1 or B2 or A \n",
133 | "\n",
134 | "#drop duplicates. Duplicate numbers would produce duplicate rows in the join.\n",
135 | "df_patent_numbers.drop_duplicates(['Publication_Number'], inplace=True)\n",
136 | "#show the training set to see what it looks like. Make sure it imported correctly.\n",
137 | "display(df_patent_numbers.head())\n"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "metadata": {},
143 | "source": [
144 | "# Connect to Google BigQuery and upload patent numbers"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 3,
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "name": "stdout",
154 | "output_type": "stream",
155 | "text": [
156 | "Inserting 791 rows into: my_new_dataset.patent_numbers_temp\n",
157 | "\n",
158 | "\n",
159 | "\n",
160 | "Streaming Insert is 100.0% Complete\n",
161 | "\n",
162 | "\n"
163 | ]
164 | }
165 | ],
166 | "source": [
167 | "#load the dataset to Google BigQuery so we can join against the public patents data\n",
168 | "\n",
169 | "# Variables to be used to access GBQ, replace with your project id, and optionally change the table and\n",
170 | "# and dataset name.\n",
171 | "PROJECT_ID = 'patenttest-182300' #change this to your project ID\n",
172 | "DEST_DATASET = 'my_new_dataset'\n",
173 | "table_name = 'patent_numbers_temp'\n",
174 | "\n",
175 | "# Create a python client we can use for executing table creation queries\n",
176 | "client = bigquery.Client(project=PROJECT_ID)\n",
177 | "# Create an HTTP client for additional functionality.\n",
178 | "credentials = GoogleCredentials.get_application_default()\n",
179 | "http_client = discovery.build('bigquery', 'v2', credentials=credentials)\n",
180 | "\n",
181 | "#attach to the dataset\n",
182 | "dataset = client.dataset(DEST_DATASET)\n",
183 | "\n",
184 | "#create the table by having Pandas push up the dataframe as a table\n",
185 | "full_table_path = '{}.{}'.format(DEST_DATASET, table_name)\n",
186 | "print(\"Inserting\", len(df_patent_numbers), \"rows into:\",full_table_path)\n",
187 | "df_patent_numbers.to_gbq(destination_table=full_table_path,\n",
188 | " project_id=PROJECT_ID,\n",
189 | " if_exists='replace')"
190 | ]
191 | },
192 | {
193 | "cell_type": "markdown",
194 | "metadata": {},
195 | "source": [
196 | "# Query BigQuery to get training text"
197 | ]
198 | },
199 | {
200 | "cell_type": "code",
201 | "execution_count": 4,
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "name": "stdout",
206 | "output_type": "stream",
207 | "text": [
208 | "Requesting query... ok.\n",
209 | "Job ID: job_uQs6PF_CPEuOLP6nVAl5qGTNUEm6\n",
210 | "Query running...\n",
211 | "Query done.\n",
212 | "Processed: 8.7 GB\n",
213 | "Standard price: $0.04 USD\n",
214 | "\n",
215 | "Retrieving results...\n",
216 | "Got 791 rows.\n",
217 | "\n",
218 | "Total time taken 2.9 s.\n",
219 | "Finished at 2018-01-20 18:23:05.\n"
220 | ]
221 | },
222 | {
223 | "data": {
313 | "text/plain": [
314 | " publication_number title \\\n",
315 | "0 US-2006218651-A1 Trusted infrastructure support systems, method... \n",
316 | "1 US-2004123129-A1 Trusted infrastructure support systems, method... \n",
317 | "2 US-2003097331-A1 Systems for financial and electronic commerce \n",
318 | "3 US-2003004827-A1 Payment system \n",
319 | "4 US-2005197957-A1 Parcel manager for distributed electronic bill... \n",
320 | "5 US-8140567-B2 Measuring entity extraction complexity \n",
321 | "6 US-7734631-B2 Associating information with an electronic doc... \n",
322 | "7 US-7734945-B1 Automated recovery of unbootable systems \n",
323 | "8 US-7324671-B2 System and method for multi-view face detection \n",
324 | "9 US-7478326-B2 Window information switching system \n",
325 | "\n",
326 | " grant_date \n",
327 | "0 0 \n",
328 | "1 0 \n",
329 | "2 0 \n",
330 | "3 0 \n",
331 | "4 0 \n",
332 | "5 20120320 \n",
333 | "6 20100608 \n",
334 | "7 20100608 \n",
335 | "8 20080129 \n",
336 | "9 20090113 "
337 | ]
338 | },
339 | "metadata": {},
340 | "output_type": "display_data"
341 | }
342 | ],
343 | "source": [
344 | "#create our query string. The query joins the BigQuery patent data using our training set on\n",
345 | "# publication number. Because we formatted in Python to match the BigQuery format, we can just use equality\n",
346 | "# in the \"on\" clause of the join, but you could also do manipulations here to the format of the publication number\n",
347 | "# The query just adds the title and grant date.\n",
348 | "query = \"\"\"\n",
349 | "select pubs.publication_number, \n",
350 | " (SELECT text from UNNEST(pubs.title_localized) LIMIT 1) as title,\n",
351 | " pubs.grant_date\n",
352 | "from\n",
353 | "  `patents-public-data.patents.publications` as pubs\n",
354 | " JOIN `\"\"\" + full_table_path + \"\"\"` as input\n",
355 | " on pubs.publication_number = input.Publication_Number\n",
356 | "\"\"\"\n",
357 | "df_patent_data = pd.read_gbq(query, project_id=PROJECT_ID, dialect='standard')\n",
358 | "#check to make sure we got back a dataset that looks right.\n",
359 | "display(df_patent_data.head(10))\n",
360 | "\n"
361 | ]
362 | },
363 | {
364 | "cell_type": "code",
365 | "execution_count": 7,
366 | "metadata": {},
367 | "outputs": [
368 | {
369 | "data": {
459 | "text/plain": [
460 | " publication_number title \\\n",
461 | "0 US-2006218651-A1 Trusted infrastructure support systems, method... \n",
462 | "1 US-2004123129-A1 Trusted infrastructure support systems, method... \n",
463 | "2 US-2003097331-A1 Systems for financial and electronic commerce \n",
464 | "3 US-2003004827-A1 Payment system \n",
465 | "4 US-2005197957-A1 Parcel manager for distributed electronic bill... \n",
466 | "5 US-8140567-B2 Measuring entity extraction complexity \n",
467 | "6 US-7734631-B2 Associating information with an electronic doc... \n",
468 | "7 US-7734945-B1 Automated recovery of unbootable systems \n",
469 | "8 US-7324671-B2 System and method for multi-view face detection \n",
470 | "9 US-7478326-B2 Window information switching system \n",
471 | "\n",
472 | " grant_date \n",
473 | "0 0 \n",
474 | "1 0 \n",
475 | "2 0 \n",
476 | "3 0 \n",
477 | "4 0 \n",
478 | "5 2012-03-20 \n",
479 | "6 2010-06-08 \n",
480 | "7 2010-06-08 \n",
481 | "8 2008-01-29 \n",
482 | "9 2009-01-13 "
483 | ]
484 | },
485 | "metadata": {},
486 | "output_type": "display_data"
487 | }
488 | ],
489 | "source": [
490 | "#Note on grant date: GBQ Patents stores it as an integer in YYYYMMDD form (e.g. 20120320), with 0 when there is no grant date.\n",
491 | "#This converts it to a YYYY-MM-DD string that is easier to read and to import into e.g. Excel.\n",
492 | "repl = lambda m: m.group(0)[:4]+\"-\"+m.group(0)[4:6]+\"-\"+m.group(0)[6:]\n",
493 | "df_patent_data['grant_date'] = df_patent_data['grant_date'].astype(str).str.replace(r\"(\\d{8})\", repl, regex=True)\n",
494 | "display(df_patent_data.head(10))"
495 | ]
496 | },
497 | {
498 | "cell_type": "code",
499 | "execution_count": null,
500 | "metadata": {},
501 | "outputs": [],
502 | "source": []
503 | }
504 | ],
505 | "metadata": {
506 | "kernelspec": {
507 | "display_name": "Python 3",
508 | "language": "python",
509 | "name": "python3"
510 | },
511 | "language_info": {
512 | "codemirror_mode": {
513 | "name": "ipython",
514 | "version": 3
515 | },
516 | "file_extension": ".py",
517 | "mimetype": "text/x-python",
518 | "name": "python",
519 | "nbconvert_exporter": "python",
520 | "pygments_lexer": "ipython3",
521 | "version": "3.6.1"
522 | }
523 | },
524 | "nbformat": 4,
525 | "nbformat_minor": 2
526 | }
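The regex cell in `GBQUploadJoinDownload.ipynb` converts one input format (US1234567B2) and then patches application numbers in a second pass. If a list mixes the flavors that cell mentions, a single helper may be easier to audit. The sketch below is hypothetical: `to_gbq_publication_number`, `default_country`, and `default_kind` are not part of the notebook. It assumes US numbers when no country code is given, and it refuses to guess a kind code (B1 vs. B2) for bare numbers, since an equality join on `publication_number` needs the exact kind.

```python
import re

# Hypothetical pattern: optional 2-letter country, digits, optional kind code
# (letter plus optional digit), with optional hyphens between the parts.
_PUB_NUMBER = re.compile(r"^([A-Z]{2})?-?(\d+)-?([A-Z]\d?)?$")


def to_gbq_publication_number(raw, default_country="US", default_kind=None):
    """Convert e.g. "US9000000B2" or "9,000,000" to BigQuery's "US-9000000-B2" form."""
    cleaned = raw.strip().upper().replace(",", "")
    match = _PUB_NUMBER.match(cleaned)
    if match is None:
        raise ValueError(f"Unrecognized publication number: {raw!r}")
    country, number, kind = match.groups()
    kind = kind or default_kind
    if kind is None:
        # A bare number does not say whether it is B1, B2, etc., so make the caller decide.
        raise ValueError(f"No kind code in {raw!r}; pass default_kind")
    # Google Patents writes US application publications as year + 7 digits
    # (US20030004827A1); BigQuery drops the zero after the year (US-2003004827-A1).
    # Only 11-digit numbers are touched, mirroring the notebook's second regex.
    if len(number) == 11 and number[4] == "0":
        number = number[:4] + number[5:]
    return f"{country or default_country}-{number}-{kind}"


for raw in ["US9000000B2", "US-9000000-B2", "US20030004827A1"]:
    print(raw, "->", to_gbq_publication_number(raw))
print("9,000,000 ->", to_gbq_publication_number("9,000,000", default_kind="B1"))
```

Applied with `df_patent_numbers['Publication_Number'].map(to_gbq_publication_number)`, it stops at the first number it cannot parse instead of silently passing a malformed key to the join.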
527 |
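The notebook's last cell rewrites `grant_date` (an integer in YYYYMMDD form, 0 when there is no grant date, as in the A1 rows of the output) as a string. If you would rather keep real dates, one minimal alternative, assuming pandas' `to_datetime` with an explicit format, is to parse the same digits and let `errors="coerce"` turn the 0 placeholders into `NaT`. `grant_date_to_datetime` is a hypothetical name, not something the notebook defines.

```python
import pandas as pd


def grant_date_to_datetime(grant_date):
    """Parse BigQuery's integer YYYYMMDD grant_date into pandas datetimes.

    Hypothetical helper: 0 (no grant date) and anything else that is not a
    valid YYYYMMDD value becomes NaT instead of raising.
    """
    return pd.to_datetime(grant_date.astype(str), format="%Y%m%d", errors="coerce")


# Values shaped like the notebook output: one A1 publication (0) and two grants
dates = grant_date_to_datetime(pd.Series([0, 20120320, 20100608]))
print(dates)
```

In the notebook this would be `df_patent_data['grant_date'] = grant_date_to_datetime(df_patent_data['grant_date'])`; calling `.dt.strftime('%Y-%m-%d')` afterwards gives text again if Excel needs it, with missing dates left empty rather than shown as 0.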
--------------------------------------------------------------------------------
/gp-search-20180428-150343.csv:
--------------------------------------------------------------------------------
1 | search URL:,https://patents.google.com/?q=blockchain&q=block-chain&country=US&status=GRANT
2 | id,title,assignee,inventor/author,priority date,filing/creation date,publication date,grant date,result link,representative figure link
3 | US-9569771-B2,Method and system for storage and retrieval of blockchain blocks using galois fields ,"Stephen Lesavich, Zachary C. LESAVICH","Stephen Lesavich, Zachary C. LESAVICH",2011-04-29,2016-06-06,2017-02-14,2017-02-14,https://patents.google.com/patent/US9569771B2/en,https://patentimages.storage.googleapis.com/6a/88/4c/76089aeab0fabb/US09569771-20170214-D00000.png
4 | US-9635000-B1,Blockchain identity management system based on public identities ledger ,Sead Muftic,Sead Muftic,2016-05-25,2016-05-25,2017-04-25,2017-04-25,https://patents.google.com/patent/US9635000B1/en,https://patentimages.storage.googleapis.com/77/26/cf/3a04a6e42eadb5/US09635000-20170425-D00000.png
5 | US-9608829-B2,System and method for creating a multi-branched blockchain with configurable protocol rules ,Blockchain Technologies Corporation,"Nikolaos Spanos, Andrew R. Martin, Eric T. Dixon, Asterios Steven Geros",2014-07-25,2015-07-24,2017-03-28,2017-03-28,https://patents.google.com/patent/US9608829B2/en,https://patentimages.storage.googleapis.com/d7/f5/97/688b360071c952/US09608829-20170328-D00000.png
6 | US-9807106-B2,Mitigating blockchain attack ,British Telecommunications Public Limited Company,"Joshua DANIEL, Gery Ducatel, Theo Dimitrakos",2015-07-31,2016-07-29,2017-10-31,2017-10-31,https://patents.google.com/patent/US9807106B2/en,https://patentimages.storage.googleapis.com/c8/49/aa/2dc147d5b8b9ec/US09807106-20171031-D00000.png
7 | US-9774578-B1,Distributed key secret for rewritable blockchain ,"Accenture Global Solutions Limited, GSC Secrypt, LLC","Giuseppe Ateniese, Michael T. Chiaramonte, David Treat, Bernardo Magri, Daniele Venturi",2016-05-23,2017-05-16,2017-09-26,2017-09-26,https://patents.google.com/patent/US9774578B1/en,https://patentimages.storage.googleapis.com/4b/92/76/d74289fa3d5f76/US09774578-20170926-D00000.png
8 | US-9853819-B2,"Blockchain-supported, node ID-augmented digital record signature method ",Guardtime Ip Holdings Ltd.,"Ahto Truu, Andres Kroonmaa, Michael GAULT, Jeffrey Pearce",2013-08-05,2016-10-17,2017-12-26,2017-12-26,https://patents.google.com/patent/US9853819B2/en,https://patentimages.storage.googleapis.com/d9/92/db/34f28b08c0e198/US09853819-20171226-D00000.png
9 | US-9722790-B2,Identity management service using a blockchain providing certifying transactions between devices ,"ShoCard, Inc.",Armin Ebrahimi,2015-05-05,2016-05-04,2017-08-01,2017-08-01,https://patents.google.com/patent/US9722790B2/en,https://patentimages.storage.googleapis.com/3c/8f/6f/bb2448787162e5/US09722790-20170801-D00000.png
10 | US-9882918-B1,User behavior profile in a blockchain ,"Forcepoint, LLC","Richard Anthony Ford, Brandon L. Swafford, Christopher Brian Shirey, Matthew P. Moynahan, Richard Heath Thompson",2017-05-15,2017-09-29,2018-01-30,2018-01-30,https://patents.google.com/patent/US9882918B1/en,https://patentimages.storage.googleapis.com/06/74/c7/54e3deeecc51bd/US09882918-20180130-D00000.png
11 | US-9934138-B1,Application testing on a blockchain ,International Business Machines Corporation,"Vijay Kumar Ananthapur Bache, Jhilam Bera, Arvind Kumar, Bidhu Sahoo",2016-12-07,2016-12-07,2018-04-03,2018-04-03,https://patents.google.com/patent/US9934138B1/en,https://patentimages.storage.googleapis.com/5f/fd/41/3a3cde0c7e7090/US09934138-20180403-D00000.png
12 | US-9870591-B2,Distributed electronic document review in a blockchain system and computerized scoring based on textual and visual feedback ,Netspective Communications Llc,Shahid N. Shah,2013-09-12,2016-12-21,2018-01-16,2018-01-16,https://patents.google.com/patent/US9870591B2/en,https://patentimages.storage.googleapis.com/e7/6f/d4/7c3657234c4836/US09870591-20180116-D00000.png
13 | US-9849364-B2,Smart device ,Bao Tran,"Bao Tran, Ha Tran",2016-02-02,2017-05-03,2017-12-26,2017-12-26,https://patents.google.com/patent/US9849364B2/en,https://patentimages.storage.googleapis.com/70/09/f3/187df1b29d4bea/US09849364-20171226-D00000.png
14 | US-9836908-B2,System and method for securely receiving and counting votes in an election ,Blockchain Technologies Corporation,"Nikolaos Spanos, Andrew R. Martin, Eric T. Dixon",2014-07-25,2015-08-06,2017-12-05,2017-12-05,https://patents.google.com/patent/US9836908B2/en,https://patentimages.storage.googleapis.com/62/76/2c/cda0758bec4454/US09836908-20171205-D00000.png
15 | US-9298806-B1,System and method for analyzing transactions in a distributed ledger ,"Coinlab, Inc.","Peter Joseph Vessenes, Robert Beach Seidensticker, III",2015-07-08,2015-07-08,2016-03-29,2016-03-29,https://patents.google.com/patent/US9298806B1/en,https://patentimages.storage.googleapis.com/cb/33/62/7b28d3e287b8ab/US09298806-20160329-D00000.png
16 | US-9397985-B1,System and method for providing a cryptographic platform for exchanging information ,"Manifold Technology, Inc.","A. Seger II Robert, Christopher T. Finan",2015-04-14,2015-04-14,2016-07-19,2016-07-19,https://patents.google.com/patent/US9397985B1/en,https://patentimages.storage.googleapis.com/86/bf/bc/f1db3498df32ef/US09397985-20160719-D00000.png
17 | US-9749297-B2,Manicoding for communication verification ,Yaron Gvili,Yaron Gvili,2014-11-12,2014-11-12,2017-08-29,2017-08-29,https://patents.google.com/patent/US9749297B2/en,https://patentimages.storage.googleapis.com/ae/bd/94/a753aa68e5aa1e/US09749297-20170829-D00000.png
18 | US-9898782-B1,"Systems, methods, and program products for operating exchange traded products holding digital math-based assets ","Winklevoss Ip, Llc","Cameron Howard Winklevoss, Tyler Howard Winklevoss, Evan Louis Greebel, Kathleen Hill Moriarty, Gregory Elias Xethalis",2013-06-28,2014-06-27,2018-02-20,2018-02-20,https://patents.google.com/patent/US9898782B1/en,https://patentimages.storage.googleapis.com/dc/50/09/75b1fc72eec3bc/US09898782-20180220-D00000.png
19 | US-9794074-B2,Systems and methods for storing and sharing transactional data using distributed computing systems ,Nasdaq Technology Ab,"Johan TOLL, Fredrik SJÖBLOM",2016-02-04,2017-02-03,2017-10-17,2017-10-17,https://patents.google.com/patent/US9794074B2/en,https://patentimages.storage.googleapis.com/19/e2/9f/309dc344718519/US09794074-20171017-D00000.png
20 | US-9513627-B1,Autonomous coordination of resources amongst robots ,"inVia Robotics, LLC","Lior Elazary, Randolph Charles Voorhies, Frank Parks II Daniel",2016-04-25,2016-04-25,2016-12-06,2016-12-06,https://patents.google.com/patent/US9513627B1/en,https://patentimages.storage.googleapis.com/e7/b2/5e/0fb1b906c01547/US09513627-20161206-D00000.png
21 | US-9667600-B2,Decentralized and distributed secure home subscriber server device ,"At&T Intellectual Property I, L.P.","Roger Piqueras Jover, Joshua Lackey",2015-04-06,2015-04-06,2017-05-30,2017-05-30,https://patents.google.com/patent/US9667600B2/en,https://patentimages.storage.googleapis.com/c3/0b/4e/7460247efcdfd8/US09667600-20170530-D00000.png
22 | US-9703986-B1,Decentralized reputation service for synthetic identities ,"Anonyome Labs, Inc.","Paul Ashley, Steve Shillingford, Greg Clark, Dennis Wilkins, Mike Neuenschwander, Stephen Sartor",2015-05-13,2015-08-13,2017-07-11,2017-07-11,https://patents.google.com/patent/US9703986B1/en,https://patentimages.storage.googleapis.com/e3/99/b7/c88b5b05c9548c/US09703986-20170711-D00000.png
23 | US-9792101-B2,Capacity and automated de-install of linket mobile apps with deep links ,Wesley John Boudville,Wesley John Boudville,2015-11-10,2015-11-10,2017-10-17,2017-10-17,https://patents.google.com/patent/US9792101B2/en,https://patentimages.storage.googleapis.com/f7/38/47/9d4b4bab450809/US09792101-20171017-D00000.png
24 | US-9509690-B2,Methods and systems for managing network activity using biometrics ,Eyelock Llc,"Samuel J. Carter, Christopher L. Ream, Sarvesh Makthal, Stephen Charles Gerber",2015-03-12,2016-03-11,2016-11-29,2016-11-29,https://patents.google.com/patent/US9509690B2/en,https://patentimages.storage.googleapis.com/5c/06/da/48769ab4455da6/US09509690-20161129-D00000.png
25 | US-9760827-B1,Neural network applications in resource constrained environments ,"Alpine Electronics of Silicon Valley, Inc.","Rocky Chau-Hsiung Lin, Thomas Yamasaki, Koichiro Kanda, Diego Rodriguez Risco, Alexander Joseph Ryan",2016-07-22,2017-01-03,2017-09-12,2017-09-12,https://patents.google.com/patent/US9760827B1/en,https://patentimages.storage.googleapis.com/e3/68/48/d87d9232fde640/US09760827-20170912-D00000.png
26 | US-9881176-B2,Fragmenting data for the purposes of persistent storage across multiple immutable data structures ,"ALTR Solutions, Inc.","Scott Nathaniel Goldfarb, James Douglas Beecham, Christopher Edward Struttmann",2015-06-02,2017-08-11,2018-01-30,2018-01-30,https://patents.google.com/patent/US9881176B2/en,https://patentimages.storage.googleapis.com/1f/72/5a/5d834961046ce4/US09881176-20180130-D00000.png
27 | US-9858781-B1,Architecture for access management ,"Tyco Integrated Security, LLC","Richard Campero, Sean DAVIS, Graeme Jarvis, Terezinha Rumble",2016-09-09,2017-05-16,2018-01-02,2018-01-02,https://patents.google.com/patent/US9858781B1/en,https://patentimages.storage.googleapis.com/04/9f/a6/bda52789049e5a/US09858781-20180102-D00000.png
28 | US-9667427-B2,Systems and methods for managing digital identities ,"Cambridge Blockchain, LLC","Alex Oberhauser, Matthew Commons, Alok Bhargava",2015-10-14,2016-10-14,2017-05-30,2017-05-30,https://patents.google.com/patent/US9667427B2/en,https://patentimages.storage.googleapis.com/73/34/36/f6cb6f27b2adba/US09667427-20170530-D00000.png
29 | US-9870562-B2,Method and system for integration of market exchange and issuer processing for blockchain-based transactions ,Mastercard International Incorporated,"Steven Charles DAVIS, Ashish Raghavendra Tetali",2015-05-21,2015-05-21,2018-01-16,2018-01-16,https://patents.google.com/patent/US9870562B2/en,https://patentimages.storage.googleapis.com/2c/16/9f/76f56ea6c76ebc/US09870562-20180116-D00000.png
30 | US-9876775-B2,Generalized entity network translation (GENT) ,"Ent Technologies, Inc.",Timothy Mossbarger,2012-11-09,2015-03-27,2018-01-23,2018-01-23,https://patents.google.com/patent/US9876775B2/en,https://patentimages.storage.googleapis.com/7b/7e/1e/c83d5b102fc12a/US09876775-20180123-D00000.png
31 | US-9818092-B2,System and method for executing financial transactions ,Antti Pennanen,Antti Pennanen,2014-06-04,2014-06-04,2017-11-14,2017-11-14,https://patents.google.com/patent/US9818092B2/en,https://patentimages.storage.googleapis.com/0d/dd/c4/65562c038cbfb6/US09818092-20171114-D00000.png
32 | US-9862222-B1,Digitally encoded seal for document verification ,"Uipco, Llc","Alexander B. Nagelberg, Michael Justin Cairns",2016-04-04,2017-09-14,2018-01-09,2018-01-09,https://patents.google.com/patent/US9862222B1/en,https://patentimages.storage.googleapis.com/6b/6b/50/d096013b2e876a/US09862222-20180109-D00000.png
33 | US-9824031-B1,Efficient clearinghouse transactions with trusted and un-trusted entities ,International Business Machines Corporation,"Raghu K. Ganti, Mudhakar Srivatsa, Dinesh C. Verma",2016-10-28,2016-10-28,2017-11-21,2017-11-21,https://patents.google.com/patent/US9824031B1/en,https://patentimages.storage.googleapis.com/d1/b3/ba/559b9f336b9655/US09824031-20171121-D00000.png
34 | US-9338148-B2,Secure distributed information and password management ,"Verizon Patent And Licensing Inc., Cellco Partnership","Donna L. Polehn, Lalit R. KOTECHA, Patricia R. Chang, Deepak Kakadia, John F. MACIAS, Priscilla Lau, Arda Aksu",2013-11-05,2013-11-05,2016-05-10,2016-05-10,https://patents.google.com/patent/US9338148B2/en,https://patentimages.storage.googleapis.com/b0/5e/26/105473d2622b1c/US09338148-20160510-D00000.png
35 | US-9852427-B2,Systems and methods for sanction screening ,"Idm Global, Inc.",Jose Caldera,2015-11-11,2015-11-11,2017-12-26,2017-12-26,https://patents.google.com/patent/US9852427B2/en,https://patentimages.storage.googleapis.com/af/5c/b3/3f50132e521fb5/US09852427-20171226-D00000.png
36 | US-9928290-B2,Trust framework for platform data ,Accenture Global Solutions Limited,Steven C. Tiell,2015-08-17,2016-04-11,2018-03-27,2018-03-27,https://patents.google.com/patent/US9928290B2/en,https://patentimages.storage.googleapis.com/99/b7/16/eef6046503a074/US09928290-20180327-D00000.png
37 | US-9747586-B1,System and method for issuance of electronic currency substantiated by a reserve of assets ,Cpn Gold B.V.,"Vladimir Nikolayevich Frolov, Damir Nasibullovich Gaynanov, Aleksey Petrovich Romanchuk, Anatoliy Anatolievich Vatolin",2016-06-28,2016-09-12,2017-08-29,2017-08-29,https://patents.google.com/patent/US9747586B1/en,https://patentimages.storage.googleapis.com/70/9c/7d/7bc26301a49cf4/US09747586-20170829-D00000.png
38 | US-9818116-B2,Systems and methods for detecting relations between unknown merchants and merchants with a known connection to fraud ,"Idm Global, Inc.",Jose Caldera,2015-11-11,2015-11-23,2017-11-14,2017-11-14,https://patents.google.com/patent/US9818116B2/en,https://patentimages.storage.googleapis.com/d8/e9/ee/9fd0041cc8006f/US09818116-20171114-D00000.png
39 | US-9870508-B1,Securely authenticating a recording file from initial collection through post-production and distribution ,"Unveiled Labs, Inc.","Roderick Neil Hodgson, Shamir Allibhai",2017-06-01,2017-06-01,2018-01-16,2018-01-16,https://patents.google.com/patent/US9870508B1/en,https://patentimages.storage.googleapis.com/30/12/aa/69a46ccdab8ba7/US09870508-20180116-D00000.png
40 | US-9351124-B1,Location detection and communication through latent dynamic network interactions ,Cognizant Business Services Limited,Edward Martin Shelton,2015-06-29,2015-06-29,2016-05-24,2016-05-24,https://patents.google.com/patent/US9351124B1/en,https://patentimages.storage.googleapis.com/8d/ca/af/91edec93a79e1c/US09351124-20160524-D00000.png
41 | US-9935772-B1,Methods and systems for operating secure digital management aware applications ,"Vijay K Madisetti, Arshdeep Bahga, Michael Richter","Vijay K Madisetti, Arshdeep Bahga, Michael Richter",2016-02-19,2017-08-15,2018-04-03,2018-04-03,https://patents.google.com/patent/US9935772B1/en,https://patentimages.storage.googleapis.com/8a/92/08/008bd5cad493af/US09935772-20180403-D00000.png
42 | US-6938039-B1,Concurrent file across at a target file server during migration of file systems between file servers using a network file system access protocol ,Emc Corporation,"Paul M. Bober, Uresh Vahalia, Aju John, Jeffrey L. Alexander, Uday K. Gupta",2000-06-30,2000-06-30,2005-08-30,2005-08-30,https://patents.google.com/patent/US6938039B1/en,https://patentimages.storage.googleapis.com/US6938039B1/US06938039-20050830-D00000.png
43 | US-9641338-B2,"Method and apparatus for providing a universal deterministically reproducible cryptographic key-pair representation for all SKUs, shipping cartons, and items ","Skuchain, Inc.","Srinivasan Sriram, Zaki N. MANIAN",2015-03-12,2015-03-12,2017-05-02,2017-05-02,https://patents.google.com/patent/US9641338B2/en,https://patentimages.storage.googleapis.com/c3/12/e2/f77192dd25abcd/US09641338-20170502-D00000.png
44 | US-9847997-B2,Server based biometric authentication ,Visa International Service Association,Kim Wagner,2015-11-11,2015-11-11,2017-12-19,2017-12-19,https://patents.google.com/patent/US9847997B2/en,https://patentimages.storage.googleapis.com/db/8b/5c/a54363e3fbcb40/US09847997-20171219-D00000.png
45 | US-676727-A,Match-making machine. ,Fed Match Company,Morris San,1899-10-24,1899-10-24,1901-06-18,1901-06-18,https://patents.google.com/patent/US676727A/en,https://patentimages.storage.googleapis.com/pages/US676727-0.png
46 | US-9760574-B1,Managing I/O requests in file systems ,EMC IP Holding Company LLC,"Jia Zhai, Yingchao Zhou, Ivan Bassov",2014-06-30,2014-06-30,2017-09-12,2017-09-12,https://patents.google.com/patent/US9760574B1/en,https://patentimages.storage.googleapis.com/a9/2e/4b/2e45b62dc5b8d0/US09760574-20170912-D00000.png
47 | US-9825931-B2,System for tracking and validation of an entity in a process data network ,Bank Of America Corporation,"Darrell Johnsrud, Manu Jacob Kurian, Michael Wuehler",2016-01-26,2016-02-22,2017-11-21,2017-11-21,https://patents.google.com/patent/US9825931B2/en,https://patentimages.storage.googleapis.com/f5/33/d9/87118c4008265c/US09825931-20171121-D00000.png
48 | US-9875510-B1,Consensus system for tracking peer-to-peer digital records ,Lance Kasper,Lance Kasper,2015-02-03,2015-05-07,2018-01-23,2018-01-23,https://patents.google.com/patent/US9875510B1/en,https://patentimages.storage.googleapis.com/47/3e/1a/969ae1c8fb5324/US09875510-20180123-D00000.png
49 | US-9436935-B2,Computer system for making a payment using a tip button ,"Coinbase, Inc.",James Bradley Hudon,2014-03-17,2015-03-17,2016-09-06,2016-09-06,https://patents.google.com/patent/US9436935B2/en,https://patentimages.storage.googleapis.com/99/4d/1b/376c170c6a4560/US09436935-20160906-D00000.png
50 | US-318621-A,William heckeet ,,,,,1885-05-19,1885-05-19,https://patents.google.com/patent/US318621A/en,https://patentimages.storage.googleapis.com/pages/US318621-0.png
51 | US-9413735-B1,Managing distribution and retrieval of security key fragments among proxy storage devices ,"Ca, Inc.",Geoffrey R. Hird,2015-01-20,2015-01-20,2016-08-09,2016-08-09,https://patents.google.com/patent/US9413735B1/en,https://patentimages.storage.googleapis.com/58/f1/b0/aab3cb86e3364d/US09413735-20160809-D00000.png
52 | US-9665734-B2,Uniform-frequency records with obscured context ,"Q Bio, Inc.","Jeffrey Howard Kaditz, Andrew Gettings STEVENS, David Grijalva",2015-09-12,2016-09-11,2017-05-30,2017-05-30,https://patents.google.com/patent/US9665734B2/en,https://patentimages.storage.googleapis.com/f9/e8/e2/5ce2a2dd6f7bda/US09665734-20170530-D00000.png
53 | US-9135787-B1,Bitcoin kiosk/ATM device and system integrating enrollment protocol and method of using the same ,"Mark Russell, John W. Russell","Mark Russell, John W. Russell",2014-04-04,2014-04-04,2015-09-15,2015-09-15,https://patents.google.com/patent/US9135787B1/en,https://patentimages.storage.googleapis.com/d9/af/96/21e1a59132dbb6/US09135787-20150915-D00000.png
54 | US-702775-A,Protection of driving-chains. ,Ernest Catchpool,Ernest Catchpool,1902-02-27,1902-02-27,1902-06-17,1902-06-17,https://patents.google.com/patent/US702775A/en,https://patentimages.storage.googleapis.com/pages/US702775-0.png
55 | US-371437-A,Elevator ,,,,,1887-10-11,1887-10-11,https://patents.google.com/patent/US371437A/en,https://patentimages.storage.googleapis.com/pages/US371437-0.png
56 | US-9679276-B1,"Systems and methods for using a block chain to certify the existence, integrity, and/or ownership of a file or communication ","Stampery, Inc.",Luis Iván Cuende,2016-01-26,2016-01-26,2017-06-13,2017-06-13,https://patents.google.com/patent/US9679276B1/en,https://patentimages.storage.googleapis.com/6e/a0/d3/ea1f17d52d3221/US09679276-20170613-D00000.png
57 | US-9875592-B1,Drone used for authentication and authorization for restricted access via an electronic lock ,International Business Machines Corporation,"Thomas D. Erickson, Kala K. Fleming, Clifford A. Pickover, Komminist Weldemariam",2016-08-30,2016-08-30,2018-01-23,2018-01-23,https://patents.google.com/patent/US9875592B1/en,https://patentimages.storage.googleapis.com/2b/b3/9d/e8cd3a35fe7646/US09875592-20180123-D00000.png
58 | US-9480188-B2,Use of computationally generated thermal energy ,LO3 Energy Inc.,"Lawrence Orsini, Yun Wei",2014-11-04,2015-11-04,2016-10-25,2016-10-25,https://patents.google.com/patent/US9480188B2/en,https://patentimages.storage.googleapis.com/82/68/ea/bf72722934974e/US09480188-20161025-D00000.png
59 | US-9942304-B2,Remote control authority and authentication ,N99 Llc,Steven K. Gold,,2017-09-20,2018-04-10,2018-04-10,https://patents.google.com/patent/US9942304B2/en,https://patentimages.storage.googleapis.com/f1/27/53/a0e753a409b29c/US09942304-20180410-D00000.png
60 | US-9705682-B2,Extending DNSSEC trust chains to objects outside the DNS ,"Verisign, Inc.","Burton S. Kaliski, Jr., Eric Osterweil, Glen Wiley",2015-07-06,2015-12-04,2017-07-11,2017-07-11,https://patents.google.com/patent/US9705682B2/en,https://patentimages.storage.googleapis.com/6c/19/1d/30dda33c6b96ed/US09705682-20170711-D00000.png
61 | US-9014661-B2,Mobile security technology ,Christopher deCharms,Christopher deCharms,2013-05-04,2014-09-03,2015-04-21,2015-04-21,https://patents.google.com/patent/US9014661B2/en,https://patentimages.storage.googleapis.com/f8/39/c0/e63ac4922f7811/US09014661-20150421-D00000.png
62 | US-9705851-B2,Extending DNSSEC trust chains to objects outside the DNS ,"Verisign, Inc.","Burton S. Kaliski, Jr., Eric Osterweil, Glen Wiley",2015-07-06,2015-07-31,2017-07-11,2017-07-11,https://patents.google.com/patent/US9705851B2/en,https://patentimages.storage.googleapis.com/86/7f/87/5e13a8e7d87db0/US09705851-20170711-D00000.png
63 | US-9716595-B1,System and method for internet of things (IOT) security and management ,"T-Central, Inc.","David W. Kravitz, Donald Houston Graham, III, Josselyn L. Boudett, Russell S. Dietz",2010-04-30,2017-03-24,2017-07-25,2017-07-25,https://patents.google.com/patent/US9716595B1/en,https://patentimages.storage.googleapis.com/1e/29/56/74e28619ecdce6/US09716595-20170725-D00000.png
64 | US-9832026-B2,System and method from Internet of Things (IoT) security and management ,"T-Central, Inc.","David W. Kravitz, Donald Houston Graham, III, Josselyn L. Boudett, Russell S. Dietz",2010-04-30,2017-06-13,2017-11-28,2017-11-28,https://patents.google.com/patent/US9832026B2/en,https://patentimages.storage.googleapis.com/04/9b/14/49dbc2738aaf63/US09832026-20171128-D00000.png
65 | US-8453219-B2,Systems and methods of assessing permissions in virtual worlds ,"Brian Shuster, Aaron Burch, Dirk Herling, Gary Shuster","Brian Shuster, Aaron Burch, Dirk Herling, Gary Shuster",2011-08-18,2012-08-20,2013-05-28,2013-05-28,https://patents.google.com/patent/US8453219B2/en,https://patentimages.storage.googleapis.com/f2/64/2d/c5733e7b6783b9/US08453219-20130528-D00000.png
66 | US-9888007-B2,Systems and methods to authenticate users and/or control access made by users on a computer network using identity services ,"Idm Global, Inc.","Jose Caldera, Kieran Sherlock, Garrett Gafke",2016-05-13,2017-03-20,2018-02-06,2018-02-06,https://patents.google.com/patent/US9888007B2/en,https://patentimages.storage.googleapis.com/e2/41/95/9f09465dfa96ca/US09888007-20180206-D00000.png
67 | US-9331856-B1,Systems and methods for validating digital signatures ,Symantec Corporation,Qu Bo Song,2014-02-10,2014-02-10,2016-05-03,2016-05-03,https://patents.google.com/patent/US9331856B1/en,https://patentimages.storage.googleapis.com/36/e8/35/114db385fbe668/US09331856-20160503-D00000.png
68 | US-9411976-B2,Communication system and method ,Maidsafe Foundation,David Irvine,2006-12-01,2014-04-24,2016-08-09,2016-08-09,https://patents.google.com/patent/US9411976B2/en,https://patentimages.storage.googleapis.com/75/dc/2b/c9f3131860f84c/US09411976-20160809-D00000.png
69 | US-9659104-B2,Link association analysis systems and methods ,"Nant Holdings Ip, Llc","Luke Soon-Shiong, Patrick Soon-Shiong",2013-02-25,2014-02-18,2017-05-23,2017-05-23,https://patents.google.com/patent/US9659104B2/en,https://patentimages.storage.googleapis.com/53/0a/d8/285d0d0c8f146a/US09659104-20170523-D00000.png
70 | US-8449378-B2,"Gaming system, gaming device and method for utilizing bitcoins ",Igt,"Richard E. Michaelson, Kehl T. LeSourd",2011-09-13,2011-09-13,2013-05-28,2013-05-28,https://patents.google.com/patent/US8449378B2/en,https://patentimages.storage.googleapis.com/df/b8/fc/e3374b7389f334/US08449378-20130528-D00000.png
71 | US-9853977-B1,"System, method, and program product for processing secure transactions within a cloud computing system ","Winklevoss Ip, Llc","Andrew Laucius, Cem Paya, Eric Winer",2015-01-26,2016-01-26,2017-12-26,2017-12-26,https://patents.google.com/patent/US9853977B1/en,https://patentimages.storage.googleapis.com/01/27/25/217f10449d1d59/US09853977-20171226-D00000.png
72 | US-9311640-B2,Methods and arrangements for smartphone payments and transactions ,Digimarc Corporation,Tomas Filler,2014-02-11,2014-02-13,2016-04-12,2016-04-12,https://patents.google.com/patent/US9311640B2/en,https://patentimages.storage.googleapis.com/e0/7b/c3/bc28fc512ba183/US09311640-20160412-D00000.png
73 | US-9514293-B1,Behavioral profiling method and system to authenticate a user ,United Services Automobile Association,"Karen M. Moritz, Stephen Seyler Aultman, Joseph James Albert Campbell, Debra R. Casillas, Jonathan Edward Neuse, Sara Teresa Alonzo, Thomas Bret Buckingham, Gabriel Carlos Fernandez, Maland Keith Mortensen, Hudson Reid Jameson, Michael Frank Morris",2012-03-20,2015-10-12,2016-12-06,2016-12-06,https://patents.google.com/patent/US9514293B1/en,https://patentimages.storage.googleapis.com/d3/c9/48/8463b8c2bd2260/US09514293-20161206-D00000.png
74 | US-8756156-B1,Online management portal ,"HouseTab, LLC","Patrick G. Campi, Nikolaos Plaitakis, Andrew Tauber",2013-02-27,2014-02-27,2014-06-17,2014-06-17,https://patents.google.com/patent/US8756156B1/en,https://patentimages.storage.googleapis.com/49/f1/c4/58f13a9548d0bc/US08756156-20140617-D00000.png
75 | US-9558524-B2,Risk assessment using social networking data ,Socure Inc.,"Sunil Madhu, Giacomo Pallotti, Edward J. Romano, Alexander K. Chavez",2013-03-15,2016-03-23,2017-01-31,2017-01-31,https://patents.google.com/patent/US9558524B2/en,https://patentimages.storage.googleapis.com/cb/5c/74/9b89e908c3fb12/US09558524-20170131-D00000.png
76 | US-9672499-B2,Data analytic and security mechanism for implementing a hot wallet service ,"Modernity Financial Holdings, Ltd.","Danny Yang, Liqin Kou, Alex Liu",2014-04-02,2014-04-18,2017-06-06,2017-06-06,https://patents.google.com/patent/US9672499B2/en,https://patentimages.storage.googleapis.com/03/97/d6/ac37161eb23331/US09672499-20170606-D00000.png
77 | US-9740906-B2,Wearable device ,"Practech, Inc.","Khalid A. AlNasser, Ibrahim O. AlGwaiz, Mohammad A. AlGassim",2013-07-11,2016-05-26,2017-08-22,2017-08-22,https://patents.google.com/patent/US9740906B2/en,https://patentimages.storage.googleapis.com/dd/ec/e0/47c49e10fe2a1c/US09740906-20170822-D00000.png
78 | US-8523657-B2,"Gaming system, gaming device and method for utilizing bitcoins ",Igt,"Richard E. Michaelson, Kehl T. LeSourd",2011-09-13,2011-09-13,2013-09-03,2013-09-03,https://patents.google.com/patent/US8523657B2/en,https://patentimages.storage.googleapis.com/19/d0/12/12b35710179bed/US08523657-20130903-D00000.png
79 | US-9516035-B1,Behavioral profiling method and system to authenticate a user ,United Services Automobile Association,"Karen M. Moritz, Stephen Seyler Aultman, Joseph James Albert Campbell, Debra R. Casillas, Jonathan Edward Neuse, Sara Teresa Alonzo, Thomas Bret Buckingham, Gabriel Carlos Fernandez, Maland Keith Mortensen",2012-03-20,2015-09-16,2016-12-06,2016-12-06,https://patents.google.com/patent/US9516035B1/en,https://patentimages.storage.googleapis.com/c7/76/16/7a6db02a903329/US09516035-20161206-D00000.png
80 | US-9426151-B2,Determining identity of individuals using authenticators ,Ncluud Corporation,"Ronald F. Richards, Bradley N. Rotter, Pavan K. Muddana",2013-11-01,2014-10-31,2016-08-23,2016-08-23,https://patents.google.com/patent/US9426151B2/en,https://patentimages.storage.googleapis.com/38/42/52/fa5f4b8cec33b6/US09426151-20160823-D00000.png
81 | US-9866545-B2,Credential-free user login to remotely executed applications ,"ALTR Solutions, Inc.",James Douglas Beecham,2015-06-02,2017-08-11,2018-01-09,2018-01-09,https://patents.google.com/patent/US9866545B2/en,https://patentimages.storage.googleapis.com/ca/72/7b/6f8aae3f9912d3/US09866545-20180109-D00000.png
82 | US-9710808-B2,Direct digital cash system and method ,Igor V. SLEPININ,Igor V. SLEPININ,2013-09-16,2014-09-08,2017-07-18,2017-07-18,https://patents.google.com/patent/US9710808B2/en,https://patentimages.storage.googleapis.com/f8/5c/cd/95bd6a79bb41fe/US09710808-20170718-D00000.png
83 | US-9852426-B2,Method and system for secure transactions ,Collective Dynamics LLC,Steven V. Bacastow,2008-02-20,2014-12-23,2017-12-26,2017-12-26,https://patents.google.com/patent/US9852426B2/en,https://patentimages.storage.googleapis.com/b2/ea/e5/cb58ebff137851/US09852426-20171226-D00000.png
84 | US-9934502-B1,Contacts for misdirected payments and user authentication ,"Square, Inc.","Brian Grassadonia, Ayokunle Omojola, Michael Moring, Robert Andersen, Daniele Perito, Kristopher Stipech",2017-01-30,2017-01-30,2018-04-03,2018-04-03,https://patents.google.com/patent/US9934502B1/en,https://patentimages.storage.googleapis.com/3f/40/58/138ffe3ccdb970/US09934502-20180403-D00000.png
85 | US-9792742-B2,Decentralized virtual trustless ledger for access control ,"Live Nation Entertainment, Inc.","David Johnson, Joseph Mulkey",2016-02-02,2017-02-02,2017-10-17,2017-10-17,https://patents.google.com/patent/US9792742B2/en,https://patentimages.storage.googleapis.com/a0/70/85/f1a168ef635690/US09792742-20171017-D00000.png
86 | US-9912659-B1,Locking systems with multifactor authentication and changing passcodes ,Matt Widdows,Matt Widdows,2017-04-14,2017-04-14,2018-03-06,2018-03-06,https://patents.google.com/patent/US9912659B1/en,https://patentimages.storage.googleapis.com/db/f5/89/12b098329f066e/US09912659-20180306-D00000.png
87 | US-9702582-B2,Connected thermostat for controlling a climate system based on a desired usage profile in comparison to other connected thermostats controlling other climate systems ,"Ikorongo Technology, LLC",Hugh Blake Svendsen,2015-10-12,2016-07-18,2017-07-11,2017-07-11,https://patents.google.com/patent/US9702582B2/en,https://patentimages.storage.googleapis.com/d2/18/c0/8b08a41a99cef2/US09702582-20170711-D00000.png
88 | US-9830593-B2,Cryptographic currency user directory data and enhanced peer-verification ledger synthesis through multi-modal cryptographic key-address mapping ,"Ss8 Networks, Inc.",Michael Myers,2014-04-26,2014-04-26,2017-11-28,2017-11-28,https://patents.google.com/patent/US9830593B2/en,https://patentimages.storage.googleapis.com/20/ca/bb/1a6fc13b81e5a2/US09830593-20171128-D00000.png
89 | US-9735958-B2,Key ceremony of a security system forming part of a host computer for cryptographic transactions ,"Coinbase, Inc.","Andrew E. Alness, James Bradley Hudon",2015-05-19,2015-05-19,2017-08-15,2017-08-15,https://patents.google.com/patent/US9735958B2/en,https://patentimages.storage.googleapis.com/84/c4/fd/f86235eef8f646/US09735958-20170815-D00000.png
90 | US-9660627-B1,System and techniques for repeating differential signals ,"Bitfury Group Limited, Valerijs Vavilovs",Valerii Nebesnyi,2016-01-05,2016-01-05,2017-05-23,2017-05-23,https://patents.google.com/patent/US9660627B1/en,https://patentimages.storage.googleapis.com/05/28/6a/f2aa9012b3846e/US09660627-20170523-D00000.png
91 | US-9922381-B2,System and method for providing a payment handler API and a browser payment request API for processing a payment ,Monticello Enterprises LLC,"Thomas M. Isaacson, Ryan Connell Durham",2014-03-31,2017-05-23,2018-03-20,2018-03-20,https://patents.google.com/patent/US9922381B2/en,https://patentimages.storage.googleapis.com/da/1b/87/d1c1b4b0ba9212/US09922381-20180320-D00000.png
92 | US-9762562-B2,Techniques for multi-standard peer-to-peer connection ,"Facebook, Inc.","Yael Maguire, Damian Kowalewski, Bin Liu, Wai Davidgeolim Lim, Caitlin Elizabeth Kalinowski",2013-09-13,2014-06-20,2017-09-12,2017-09-12,https://patents.google.com/patent/US9762562B2/en,https://patentimages.storage.googleapis.com/1a/2f/1b/3a5e6e07cbbad0/US09762562-20170912-D00000.png
93 | US-9933790-B2,Peer-to-peer air analysis and treatment ,"Lunatech, Llc",Jonathan Seamus Blackley,2015-06-15,2016-06-13,2018-04-03,2018-04-03,https://patents.google.com/patent/US9933790B2/en,https://patentimages.storage.googleapis.com/ba/6a/70/9409f15ba5b6d6/US09933790-20180403-D00000.png
94 | US-9805360-B1,Location based device flagging and interface ,"Philz Coffee, Inc.",Jacob Jaber,2017-03-17,2017-05-24,2017-10-31,2017-10-31,https://patents.google.com/patent/US9805360B1/en,https://patentimages.storage.googleapis.com/f1/67/73/c84248294c3818/US09805360-20171031-D00000.png
95 | US-9436455-B2,Logging operating system updates of a secure element of an electronic device ,Apple Inc.,"Mehdi ZIAT, Kyle A. Diebolt",2014-01-06,2015-01-06,2016-09-06,2016-09-06,https://patents.google.com/patent/US9436455B2/en,https://patentimages.storage.googleapis.com/48/ac/b0/0e8809fbae805c/US09436455-20160906-D00000.png
96 | US-9852305-B2,Method for provably secure erasure of data ,Nec Corporation,Sebastian GAJEK,2014-10-21,2015-10-21,2017-12-26,2017-12-26,https://patents.google.com/patent/US9852305B2/en,https://patentimages.storage.googleapis.com/cb/41/52/383192c593fdba/US09852305-20171226-D00000.png
97 | US-9633513-B2,Method and system for gaming revenue ,Jackpot Rising Inc.,William Garrett Webb,2014-12-17,2016-07-26,2017-04-25,2017-04-25,https://patents.google.com/patent/US9633513B2/en,https://patentimages.storage.googleapis.com/18/1d/de/dec4ff9c31d9f1/US09633513-20170425-D00000.png
98 | US-9373223-B1,Method and system for gaming revenue ,Jackpot Rising Inc.,William Webb,2014-12-17,2014-12-17,2016-06-21,2016-06-21,https://patents.google.com/patent/US9373223B1/en,https://patentimages.storage.googleapis.com/df/ee/cc/a37b5c1ce5cc24/US09373223-20160621-D00000.png
99 | US-9646029-B1,Methods and apparatus for a distributed database within a network ,"Swirlds, Inc.","Leemon C. Baird, III",2016-06-02,2016-12-21,2017-05-09,2017-05-09,https://patents.google.com/patent/US9646029B1/en,https://patentimages.storage.googleapis.com/46/28/7f/11896c39e41e15/US09646029-20170509-D00000.png
100 | US-9876646-B2,User identification management system and method ,"ShoCard, Inc.","Armin Ebrahimi, Jeff Weitzman",2015-05-05,2016-05-05,2018-01-23,2018-01-23,https://patents.google.com/patent/US9876646B2/en,https://patentimages.storage.googleapis.com/76/46/dd/b2508967e8735c/US09876646-20180123-D00000.png
101 | US-9836790-B2,Cryptocurrency transformation system ,Bank Of America Corporation,"James G. Ronca, Joseph B. Castinado, Heather Dolan, Thomas E. Durbin, Richard H. Thomas",2014-06-16,2014-06-16,2017-12-05,2017-12-05,https://patents.google.com/patent/US9836790B2/en,https://patentimages.storage.googleapis.com/7b/43/4a/cfb9e40b43512e/US09836790-20171205-D00000.png
102 | US-9258307-B2,Decentralized electronic transfer system ,Alcatel Lucent,"Fabio Pianese, Noah Evans",2012-03-02,2013-02-25,2016-02-09,2016-02-09,https://patents.google.com/patent/US9258307B2/en,https://patentimages.storage.googleapis.com/ac/2f/10/cf0b25be388415/US09258307-20160209-D00000.png
103 | US-9552615-B2,Automated database analysis to detect malfeasance ,Palantir Technologies Inc.,"Shivam Mathura, Lucas Lemanowicz, Tim Vergenz",2013-12-20,2015-03-05,2017-01-24,2017-01-24,https://patents.google.com/patent/US9552615B2/en,https://patentimages.storage.googleapis.com/d7/e5/97/49d4a17a8d1d3c/US09552615-20170124-D00000.png
104 | US-9645604-B1,Circuits and techniques for mesochronous processing ,"Bitfury Group Limited, Valerijs Vavilovs",Valerii Nebesnyi,2016-01-05,2016-01-05,2017-05-09,2017-05-09,https://patents.google.com/patent/US9645604B1/en,https://patentimages.storage.googleapis.com/86/fe/1a/547593f5e22eba/US09645604-20170509-D00000.png
105 | US-9935948-B2,"Biometric data hashing, verification and security ","Case Wallet, Inc.","Stephen L. Schultz, David R. Nilosek, John Dvorak",2015-09-18,2016-09-19,2018-04-03,2018-04-03,https://patents.google.com/patent/US9935948B2/en,https://patentimages.storage.googleapis.com/e1/77/e2/4a91480e03a419/US09935948-20180403-D00000.png
106 | US-9872050-B2,"Method for generating, providing and reproducing digital contents in conjunction with digital currency, and terminal and computer readable recording medium using same ","Joon Sun Uhr, Jay Wu Hong","Joon Sun Uhr, Jay Wu Hong, Richard Ho Yun",2014-06-17,2015-06-12,2018-01-16,2018-01-16,https://patents.google.com/patent/US9872050B2/en,https://patentimages.storage.googleapis.com/61/5a/94/1a7492242ed01a/US09872050-20180116-D00000.png
107 | US-9595034-B2,System and method for monitoring third party access to a restricted item ,Stellenbosch University,"Gert-Jan VAN ROOYEN, Frederick Johannes LUTZ, Herman Arnold ENGELBRECHT",2013-10-25,2014-05-22,2017-03-14,2017-03-14,https://patents.google.com/patent/US9595034B2/en,https://patentimages.storage.googleapis.com/bd/c3/f4/e571a7dcc057c8/US09595034-20170314-D00000.png
108 | US-140937-A,Improvement in stone and glass polishing machines ,,,,,1873-07-15,1873-07-15,https://patents.google.com/patent/US140937A/en,https://patentimages.storage.googleapis.com/pages/US140937-0.png
109 | US-9704143-B2,Cryptographic currency for securities settlement ,Goldman Sachs & Co. LLC,"Paul Walker, Phil J. Venables",2014-05-16,2014-10-30,2017-07-11,2017-07-11,https://patents.google.com/patent/US9704143B2/en,https://patentimages.storage.googleapis.com/a3/56/de/4a36cbd7693479/US09704143-20170711-D00000.png
110 | US-9364950-B2,Trainable modular robotic methods ,Brain Corporation,"Eugene Izhikevich, Dimitry Fisher, Jean-Baptiste Passot, Heathcliff Hatcher, Vadim Polonichko",2014-03-13,2014-03-13,2016-06-14,2016-06-14,https://patents.google.com/patent/US9364950B2/en,https://patentimages.storage.googleapis.com/90/7e/69/f8e217eda2d8fc/US09364950-20160614-D00000.png
111 | US-9533413-B2,Trainable modular robotic apparatus and methods ,Brain Corporation,"Eugene Izhikevich, Dimitry Fisher, Jean-Baptiste Passot, Heathcliff Hatcher, Vadim Polonichko",2014-03-13,2014-03-13,2017-01-03,2017-01-03,https://patents.google.com/patent/US9533413B2/en,https://patentimages.storage.googleapis.com/c8/d1/1c/df948d12455d9d/US09533413-20170103-D00000.png
112 | US-9824408-B2,Browser payment request API ,Monticello Enterprises LLC,"Thomas M. Isaacson, Ryan C. Durham",2014-03-31,2016-09-12,2017-11-21,2017-11-21,https://patents.google.com/patent/US9824408B2/en,https://patentimages.storage.googleapis.com/ea/10/5d/6b4afe75eae27b/US09824408-20171121-D00000.png
113 | US-9425954-B1,Device and method for resonant cryptography ,Global Risk Advisors,Kevin Chalker,2015-09-15,2016-01-04,2016-08-23,2016-08-23,https://patents.google.com/patent/US9425954B1/en,https://patentimages.storage.googleapis.com/05/63/d3/68da0f4906f70d/US09425954-20160823-D00000.png
114 | US-9398018-B2,Virtual currency system ,nTrust Technology Solutions Corp.,"Robert Scott MacGregor, Milagrino Jose C. Ong",2014-03-18,2014-03-18,2016-07-19,2016-07-19,https://patents.google.com/patent/US9398018B2/en,https://patentimages.storage.googleapis.com/11/ee/5c/54f60bb5e56f26/US09398018-20160719-D00000.png
115 | US-9830580-B2,Virtual currency system ,nChain Holdings Limited,"Robert Scott MacGregor, Milagrino Jose C. Ong",2014-03-18,2014-03-18,2017-11-28,2017-11-28,https://patents.google.com/patent/US9830580B2/en,https://patentimages.storage.googleapis.com/75/3c/e0/c6fdddd29a9abd/US09830580-20171128-D00000.png
116 | US-9553982-B2,System and methods for tamper proof interaction recording and timestamping ,"Newvoicemedia, Ltd.",Ashley Unitt,2013-07-06,2015-12-16,2017-01-24,2017-01-24,https://patents.google.com/patent/US9553982B2/en,https://patentimages.storage.googleapis.com/88/34/47/76fd9526faf7a2/US09553982-20170124-D00000.png
117 |
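The rows above come from a Google Patents search export, and the notebook's caveat says the positive (blockchain) samples were gathered with exactly this kind of keyword search. Below is a minimal sketch of turning the export's `id` column into labeled rows shaped like the `Publication_Number`/`Label` table the notebook loads. It rests on some assumptions: the export begins with one preamble line (a "search URL:" line) followed by a header row containing an `id` column, and negatives arrive as a separate list of publication numbers. The helper names `read_patent_ids` and `build_training_set` are hypothetical and do not come from the notebook. The sketch uses only the standard-library `csv` module, whereas the notebook itself works in pandas.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple


def read_patent_ids(handle: TextIO, skip_lines: int = 1) -> List[str]:
    """Return publication numbers from the 'id' column of a Google Patents CSV export.

    Assumption: the export begins with `skip_lines` preamble line(s), such as a
    "search URL:" line, followed by a header row that includes an 'id' column.
    """
    for _ in range(skip_lines):
        handle.readline()  # discard preamble lines before the header row
    reader = csv.DictReader(handle)  # handles quoted fields such as "Idm Global, Inc."
    # Skip rows whose 'id' cell is missing or empty.
    return [row["id"].strip() for row in reader if row.get("id")]


def build_training_set(
    positive_ids: Iterable[str], negative_ids: Iterable[str]
) -> List[Tuple[str, int]]:
    """Label positives 1 and negatives 0, keeping the first label seen for each id."""
    rows: List[Tuple[str, int]] = []
    seen = set()
    for label, ids in ((1, positive_ids), (0, negative_ids)):
        for pub in ids:
            if pub not in seen:  # a patent found in both lists stays a positive
                seen.add(pub)
                rows.append((pub, label))
    return rows


# Usage with an in-memory stand-in for the export: two rows from above,
# one title shortened; the preamble URL is a placeholder.
sample = io.StringIO(
    "search URL:,https://patents.google.com/\n"
    "id,title,assignee\n"
    "US-9928290-B2,Trust framework for platform data ,Accenture Global Solutions Limited\n"
    'US-9818116-B2,Detecting relations between merchants,"Idm Global, Inc."\n'
)
positives = read_patent_ids(sample)
training = build_training_set(positives, ["US-8489331-B2", "US-9928290-B2"])
print(training)
# → [('US-9928290-B2', 1), ('US-9818116-B2', 1), ('US-8489331-B2', 0)]
```

To read the real file, open it with `newline=""` and `encoding="utf-8"` (some inventor names are non-ASCII). You can then wrap the result in `pd.DataFrame(rows, columns=["Publication_Number", "Label"])` to match the notebook's table. Deduplicating on publication number, with positives taking precedence, is one reasonable choice and may not match how the notebook's own training file was built.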
--------------------------------------------------------------------------------
/PatentAnalysisExample.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Patent Analysis in Python\n",
8 | "### Sample using a training dataset directed at blockchain patents and predicting relevance in the IBM portfolio\n",
9 | "David Andrews\n",
10 | "Legal Analytics"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "This sample connects to Google BigQuery to retrieve patent data. If you are not already using Google BigQuery, you will need to do some setup on Google Cloud Platform first. Any of the many BigQuery setup tutorials will walk you through installing the client libraries, caching your credentials, and setting up a project ID.\n",
18 | "\n",
19 | "#### Caveat\n",
20 | "This sample uses a very naive training set. The 'positive' samples (those labeled as related to blockchain) were gathered with a simple keyword search on Google Patents. The results were not reviewed by hand, and no effort was made to be inclusive. As a result, the final results are interesting but not representative of what a fully trained ML model can do when predicting results. The 'negative' samples came from a random patent search, which means they are probably, though not certainly, unrelated to blockchain. Results could be improved immediately by providing negative samples that weed out near misses, such as financial-transaction patents that do not use blockchain or similar technology. If you improve the training set, I would appreciate the opportunity to improve these results by including your updated training data. You can email me at david@analytics.legal and I can re-publish the training sets."
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 1,
26 | "metadata": {},
27 | "outputs": [],
28 | "source": [
29 | "%matplotlib inline\n",
30 | "# Imports: BigQuery client, pandas, plotting, and Google API helpers\n",
31 | "from google.cloud import bigquery\n",
32 | "import pandas as pd\n",
33 | "import matplotlib.pyplot as plt\n",
34 | "from oauth2client.client import GoogleCredentials\n",
35 | "from googleapiclient import discovery\n",
36 | "from IPython.display import display, HTML\n"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "# Load Training Data"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": 2,
49 | "metadata": {},
50 | "outputs": [
51 | {
52 | "data": {
53 | "text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Publication_Number</th>\n",
"      <th>Label</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>US-2015310424-A1</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>US-2015120567-A1</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>US-2015170112-A1</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>US-2015379510-A1</td><td>1</td></tr>\n",
"    <tr><th>4</th><td>US-2015287026-A1</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>US-2015356555-A1</td><td>1</td></tr>\n",
"    <tr><th>6</th><td>US-2015332256-A1</td><td>1</td></tr>\n",
"    <tr><th>7</th><td>US-2015371224-A1</td><td>1</td></tr>\n",
"    <tr><th>8</th><td>US-2015332283-A1</td><td>1</td></tr>\n",
"    <tr><th>9</th><td>US-2015206106-A1</td><td>1</td></tr>\n",
"    <tr><th>10</th><td>US-2015356524-A1</td><td>1</td></tr>\n",
"    <tr><th>11</th><td>US-2016321654-A1</td><td>1</td></tr>\n",
"    <tr><th>12</th><td>US-2017011460-A1</td><td>1</td></tr>\n",
"    <tr><th>13</th><td>US-9397985-B1</td><td>1</td></tr>\n",
"    <tr><th>14</th><td>US-2016261411-A1</td><td>1</td></tr>\n",
"    <tr><th>15</th><td>US-2015294308-A1</td><td>1</td></tr>\n",
"    <tr><th>16</th><td>US-2015324764-A1</td><td>1</td></tr>\n",
"    <tr><th>17</th><td>US-2016342977-A1</td><td>1</td></tr>\n",
"    <tr><th>18</th><td>US-2016300234-A1</td><td>1</td></tr>\n",
"    <tr><th>19</th><td>US-2017177855-A1</td><td>1</td></tr>\n",
"    <tr><th>20</th><td>US-9298806-B1</td><td>1</td></tr>\n",
"    <tr><th>21</th><td>US-2017178237-A1</td><td>1</td></tr>\n",
"    <tr><th>22</th><td>US-2016012424-A1</td><td>1</td></tr>\n",
"    <tr><th>23</th><td>US-2015244690-A1</td><td>1</td></tr>\n",
"    <tr><th>24</th><td>US-2017046651-A1</td><td>1</td></tr>\n",
"    <tr><th>25</th><td>US-2017132630-A1</td><td>1</td></tr>\n",
"    <tr><th>26</th><td>US-2015278820-A1</td><td>1</td></tr>\n",
"    <tr><th>27</th><td>US-2016218879-A1</td><td>1</td></tr>\n",
"    <tr><th>28</th><td>US-2015227890-A1</td><td>1</td></tr>\n",
"    <tr><th>29</th><td>US-2016350749-A1</td><td>1</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td></tr>\n",
"    <tr><th>1120</th><td>US-8489331-B2</td><td>0</td></tr>\n",
"    <tr><th>1121</th><td>US-8494215-B2</td><td>0</td></tr>\n",
"    <tr><th>1122</th><td>US-8498100-B1</td><td>0</td></tr>\n",
"    <tr><th>1123</th><td>US-8521513-B2</td><td>0</td></tr>\n",
"    <tr><th>1124</th><td>US-8538960-B2</td><td>0</td></tr>\n",
"    <tr><th>1125</th><td>US-8539384-B2</td><td>0</td></tr>\n",
"    <tr><th>1126</th><td>US-8548494-B2</td><td>0</td></tr>\n",
"    <tr><th>1127</th><td>US-8560959-B2</td><td>0</td></tr>\n",
"    <tr><th>1128</th><td>US-8576276-B2</td><td>0</td></tr>\n",
"    <tr><th>1129</th><td>US-8578486-B2</td><td>0</td></tr>\n",
"    <tr><th>1130</th><td>US-8582206-B2</td><td>0</td></tr>\n",
"    <tr><th>1131</th><td>US-8584094-B2</td><td>0</td></tr>\n",
"    <tr><th>1132</th><td>US-8594467-B2</td><td>0</td></tr>\n",
"    <tr><th>1133</th><td>US-8612550-B2</td><td>0</td></tr>\n",
"    <tr><th>1134</th><td>US-8639625-B1</td><td>0</td></tr>\n",
"    <tr><th>1135</th><td>US-8670183-B2</td><td>0</td></tr>\n",
"    <tr><th>1136</th><td>US-8675067-B2</td><td>0</td></tr>\n",
"    <tr><th>1137</th><td>US-8687023-B2</td><td>0</td></tr>\n",
"    <tr><th>1138</th><td>US-8713535-B2</td><td>0</td></tr>\n",
"    <tr><th>1139</th><td>US-8719603-B2</td><td>0</td></tr>\n",
"    <tr><th>1140</th><td>US-8752963-B2</td><td>0</td></tr>\n",
"    <tr><th>1141</th><td>US-8793304-B2</td><td>0</td></tr>\n",
"    <tr><th>1142</th><td>US-8814691-B2</td><td>0</td></tr>\n",
"    <tr><th>1143</th><td>US-8830270-B2</td><td>0</td></tr>\n",
"    <tr><th>1144</th><td>US-8873227-B2</td><td>0</td></tr>\n",
"    <tr><th>1145</th><td>US-8896594-B2</td><td>0</td></tr>\n",
"    <tr><th>1146</th><td>US-8903430-B2</td><td>0</td></tr>\n",
"    <tr><th>1147</th><td>US-8964298-B2</td><td>0</td></tr>\n",
"    <tr><th>1148</th><td>US-9128281-B2</td><td>0</td></tr>\n",
"    <tr><th>1149</th><td>US-9129295-B2</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>1148 rows × 2 columns</p>\n",
"</div>"
386 | ],
387 | "text/plain": [
388 | " Publication_Number Label\n",
389 | "0 US-2015310424-A1 1\n",
390 | "1 US-2015120567-A1 1\n",
391 | "2 US-2015170112-A1 1\n",
392 | "3 US-2015379510-A1 1\n",
393 | "4 US-2015287026-A1 1\n",
394 | "5 US-2015356555-A1 1\n",
395 | "6 US-2015332256-A1 1\n",
396 | "7 US-2015371224-A1 1\n",
397 | "8 US-2015332283-A1 1\n",
398 | "9 US-2015206106-A1 1\n",
399 | "10 US-2015356524-A1 1\n",
400 | "11 US-2016321654-A1 1\n",
401 | "12 US-2017011460-A1 1\n",
402 | "13 US-9397985-B1 1\n",
403 | "14 US-2016261411-A1 1\n",
404 | "15 US-2015294308-A1 1\n",
405 | "16 US-2015324764-A1 1\n",
406 | "17 US-2016342977-A1 1\n",
407 | "18 US-2016300234-A1 1\n",
408 | "19 US-2017177855-A1 1\n",
409 | "20 US-9298806-B1 1\n",
410 | "21 US-2017178237-A1 1\n",
411 | "22 US-2016012424-A1 1\n",
412 | "23 US-2015244690-A1 1\n",
413 | "24 US-2017046651-A1 1\n",
414 | "25 US-2017132630-A1 1\n",
415 | "26 US-2015278820-A1 1\n",
416 | "27 US-2016218879-A1 1\n",
417 | "28 US-2015227890-A1 1\n",
418 | "29 US-2016350749-A1 1\n",
419 | "... ... ...\n",
420 | "1120 US-8489331-B2 0\n",
421 | "1121 US-8494215-B2 0\n",
422 | "1122 US-8498100-B1 0\n",
423 | "1123 US-8521513-B2 0\n",
424 | "1124 US-8538960-B2 0\n",
425 | "1125 US-8539384-B2 0\n",
426 | "1126 US-8548494-B2 0\n",
427 | "1127 US-8560959-B2 0\n",
428 | "1128 US-8576276-B2 0\n",
429 | "1129 US-8578486-B2 0\n",
430 | "1130 US-8582206-B2 0\n",
431 | "1131 US-8584094-B2 0\n",
432 | "1132 US-8594467-B2 0\n",
433 | "1133 US-8612550-B2 0\n",
434 | "1134 US-8639625-B1 0\n",
435 | "1135 US-8670183-B2 0\n",
436 | "1136 US-8675067-B2 0\n",
437 | "1137 US-8687023-B2 0\n",
438 | "1138 US-8713535-B2 0\n",
439 | "1139 US-8719603-B2 0\n",
440 | "1140 US-8752963-B2 0\n",
441 | "1141 US-8793304-B2 0\n",
442 | "1142 US-8814691-B2 0\n",
443 | "1143 US-8830270-B2 0\n",
444 | "1144 US-8873227-B2 0\n",
445 | "1145 US-8896594-B2 0\n",
446 | "1146 US-8903430-B2 0\n",
447 | "1147 US-8964298-B2 0\n",
448 | "1148 US-9128281-B2 0\n",
449 | "1149 US-9129295-B2 0\n",
450 | "\n",
451 | "[1148 rows x 2 columns]"
452 | ]
453 | },
454 | "metadata": {},
455 | "output_type": "display_data"
456 | }
457 | ],
458 | "source": [
459 | "#load the sample patents we will train on\n",
460 | "#these files are just flat lists of patent numbers. Positives are examples of blockchain patents and negatives \n",
461 | "#are random non-blockchain patents. Both lists are stored as public Google Sheets shares.\n",
462 | "negative_url = \"https://docs.google.com/spreadsheets/d/e/2PACX-1vTwNjPYeJV6l0lOTjMnI65rE4i_Prtc4Gnku3HupqBzuZ5v9wzhYWAA26AivTkFPw_AbwGuiuqoj_lq/pub?output=csv\"\n",
463 | "positive_url = \"https://docs.google.com/spreadsheets/d/e/2PACX-1vSD-tiMAUTosTzEz9jiqK0JMhFLr3s_Jeb7J5Ry39NoIaEHE-Iqr1M7etvDJBTZA0ilgmUjb6KU-TCs/pub?output=csv\"\n",
464 | "df_positive_samples = pd.read_csv(positive_url)\n",
465 | "#load samples of patents that are counter-examples\n",
466 | "df_negative_samples = pd.read_csv(negative_url)\n",
467 | "\n",
468 | "#Create Label column to hold the labels\n",
469 | "#label blockchain patents 1\n",
470 | "df_positive_samples['Label'] = 1\n",
471 | "#label non-blockchain patents 0\n",
472 | "df_negative_samples['Label'] = 0\n",
473 | "\n",
474 | "#combine labeled data into a training set\n",
475 | "df_training_set = pd.concat([df_positive_samples, df_negative_samples], ignore_index=True)\n",
476 | "\n",
477 | "#change the format of the publication number from US1234567B2 to US-1234567-B2 to match the format in the database\n",
478 | "df_training_set['Publication_Number'] = df_training_set['Publication_Number'].str.replace(r\"(\\D*)(\\d*)(\\D\\d?)\", \"\\\\1-\\\\2-\\\\3\", regex=True)\n",
479 | "#some application pub numbers in the input have an extra 0 after the 4-digit year (e.g. US20150310424A1).\n",
480 | "#strip it so they match the format in Google BigQuery (e.g. US-2015310424-A1)\n",
481 | "df_training_set['Publication_Number'] = df_training_set['Publication_Number'].str.replace(r\"(\\D*-)(\\d{4})0(\\d{6})(-A1)\",\n",
482 | " \"\\\\1\\\\2\\\\3\\\\4\", regex=True)\n",
483 | "\n",
484 | "#unpacking the first REGEXP above a little:\n",
485 | "# (\\D*) = grab all the non-digits at the front e.g. US\n",
486 | "# (\\d*) = grab all the digits, e.g. 1234567\n",
487 | "# (\\D\\d?) = grab a trailing non-digit optionally followed by a digit e.g. A1 or B2 or A \n",
488 | "\n",
489 | "\n",
490 | "df_training_set.drop_duplicates(['Publication_Number'], inplace=True)\n",
491 | "#show the training set to see what it looks like. Make sure it imported correctly.\n",
492 | "display(df_training_set)\n"
493 | ]
494 | },
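{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two `str.replace` calls above can be checked on their own. Below is a minimal sketch that runs both regular expressions on a few hand-written publication numbers (illustrative inputs, not rows from the training sheets). It assumes pandas 0.23 or later, where `str.replace` accepts `regex=`; from pandas 2.0 that flag defaults to `False`, so it has to be passed explicitly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"#hypothetical inputs: a granted patent, an application with the extra 0 after the year, another grant\n",
"raw = pd.Series([\"US9397985B1\", \"US20150310424A1\", \"US8489331B2\"])\n",
"\n",
"#step 1: US1234567B2 -> US-1234567-B2\n",
"step1 = raw.str.replace(r\"(\\D*)(\\d*)(\\D\\d?)\", r\"\\1-\\2-\\3\", regex=True)\n",
"#step 2: drop the 0 after the 4-digit year in A1 application numbers\n",
"step2 = step1.str.replace(r\"(\\D*-)(\\d{4})0(\\d{6})(-A1)\", r\"\\1\\2\\3\\4\", regex=True)\n",
"\n",
"print(step2.tolist())\n",
"#prints ['US-9397985-B1', 'US-2015310424-A1', 'US-8489331-B2']"
]
},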
495 | {
496 | "cell_type": "markdown",
497 | "metadata": {},
498 | "source": [
499 | "Note that some of the negative training examples are non-US patents. The SQL join below weeds those out, and it also keeps only publications whose title, abstract, and claims are in English, so the row count drops from this step (1148 rows) to the next (1101 rows)."
500 | ]
501 | },
502 | {
503 | "cell_type": "markdown",
504 | "metadata": {},
505 | "source": [
506 | "# Connect to Google BigQuery and upload patent numbers & labels"
507 | ]
508 | },
509 | {
510 | "cell_type": "code",
511 | "execution_count": 3,
512 | "metadata": {},
513 | "outputs": [
514 | {
515 | "name": "stdout",
516 | "output_type": "stream",
517 | "text": [
518 | "\n",
519 | "\n",
520 | "\n",
521 | "Streaming Insert is 100.0% Complete\n",
522 | "\n",
523 | "\n"
524 | ]
525 | }
526 | ],
527 | "source": [
528 | "#load the dataset to Google BigQuery so we can join against the public patents data\n",
529 | "\n",
530 | "#If you don't want to try to get GBQ working, you can download the results of this step here:\n",
531 | "#combined_url = \"https://docs.google.com/spreadsheets/d/e/2PACX-1vSfXq_QpXjL3eskKnbezF33GgcBM7O1KB-TAPrfcZk1cYXdQWQWU5X_oAICJr5ipXearXbJ19_Rp4PY/pub?output=csv\"\n",
532 | "#df_training_set_finished = pd.read_csv(combined_url)\n",
533 | "#then skip to Prepare Machine Learning Pipeline below\n",
534 | "\n",
535 | "\n",
536 | "# Variables used to access GBQ. Replace PROJECT_ID with your project ID, and optionally change the table\n",
537 | "# and dataset names.\n",
538 | "PROJECT_ID = 'patenttest-182300' #change this to your project ID\n",
539 | "DEST_DATASET = 'my_new_dataset'\n",
540 | "samples_table = 'training_patents'\n",
541 | "\n",
542 | "# Create a python client we can use for executing table creation queries\n",
543 | "client = bigquery.Client(project=PROJECT_ID)\n",
544 | "# Create an HTTP client for additional functionality.\n",
545 | "credentials = GoogleCredentials.get_application_default()\n",
546 | "http_client = discovery.build('bigquery', 'v2', credentials=credentials)\n",
547 | "\n",
548 | "#attach to the dataset\n",
549 | "dataset = client.dataset(DEST_DATASET)\n",
550 | "\n",
551 | "#create the table by having Pandas push up the dataframe as a table\n",
552 | "full_table_path = '{}.{}'.format(DEST_DATASET, samples_table)\n",
553 | "df_training_set.to_gbq(destination_table=full_table_path,\n",
554 | " project_id=PROJECT_ID,\n",
555 | " if_exists='replace')\n"
556 | ]
557 | },
558 | {
559 | "cell_type": "markdown",
560 | "metadata": {},
561 | "source": [
562 | "# Query BigQuery to get training text"
563 | ]
564 | },
565 | {
566 | "cell_type": "code",
567 | "execution_count": 4,
568 | "metadata": {},
569 | "outputs": [
570 | {
571 | "name": "stdout",
572 | "output_type": "stream",
573 | "text": [
574 | "Requesting query... ok.\n",
575 | "Job ID: job_Ic0ZpX_SFT1RjExOrNCJOgc8D63N\n",
576 | "Query running...\n",
577 | "Query done.\n",
578 | "Processed: 155.0 GB\n",
579 | "Standard price: $0.76 USD\n",
580 | "\n",
581 | "Retrieving results...\n",
582 | "Got 1101 rows.\n",
583 | "\n",
584 | "Total time taken 6.62 s.\n",
585 | "Finished at 2018-01-15 16:18:33.\n",
586 | "Shape: (1101, 3)\n"
587 | ]
588 | }
589 | ],
590 | "source": [
591 | "#create our query string. The query joins the BigQuery patent data to our training set on\n",
592 | "# publication number. Because we formatted the numbers in Python to match the BigQuery format, we can use straight equality\n",
593 | "# in the \"on\" clause of the join.\n",
594 | "# The CONCAT combines the fields we are interested in: the title, abstract, claims, and CPC codes (both the full codes\n",
595 | "# and the main groups with everything after the '/' stripped) go into one text block to feed into the machine learning code.\n",
596 | "\n",
597 | "query = \"\"\"\n",
598 | "select pubs.publication_number, Label,\n",
599 | " CONCAT(\n",
600 | " IFNULL(\n",
601 | " (SELECT text from UNNEST(pubs.title_localized)), \" \"), \" \",\n",
602 | " IFNULL(\n",
603 | " (SELECT text from UNNEST(pubs.abstract_localized)), \" \"), \" \",\n",
604 | " IFNULL( \n",
605 | " (SELECT text from UNNEST(pubs.claims_localized)), \" \"), \" \",\n",
606 | " IFNULL(\n",
607 | " ARRAY_TO_STRING( ARRAY(SELECT code from UNNEST(pubs.cpc)), \" \"), \" \" ), \" \",\n",
608 | " IFNULL(\n",
609 | " ARRAY_TO_STRING( ARRAY(SELECT REGEXP_REPLACE( code, \"/.*\", \"\") from UNNEST(pubs.cpc)), \" \"), \" \")) as text\n",
610 | "from\n",
611 | " `patents-public-data.patents.publications` as pubs, UNNEST(title_localized) as title\n",
612 | " JOIN `\"\"\" + full_table_path + \"\"\"` as input\n",
613 | " on pubs.publication_number = input.Publication_Number \n",
614 | "where (SELECT language from UNNEST(pubs.title_localized) LIMIT 1) = 'en' and\n",
615 | "(SELECT language from UNNEST(pubs.abstract_localized) LIMIT 1) = 'en' and\n",
616 | "(SELECT language from UNNEST(pubs.claims_localized) LIMIT 1) = 'en'\n",
617 | "\"\"\"\n",
618 | "df_training_set_finished = pd.read_gbq(query, project_id=PROJECT_ID, dialect='standard')\n",
619 | "#check to make sure we got back a dataset that looks right.\n",
620 | "print(\"Shape:\", df_training_set_finished.shape)"
621 | ]
622 | },
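{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `CONCAT` in the query above joins five fields with single spaces. As a rough illustration only, here is a hypothetical Python helper (`build_text` is not part of the notebook or of BigQuery) that mirrors that logic for one record: a missing title, abstract, or claims field becomes a single space, and the CPC codes appear twice, once in full and once truncated to the main group by dropping everything from the `/` onward, as the query's `REGEXP_REPLACE` does."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"def build_text(title, abstract, claims, cpc_codes):\n",
"    #hypothetical helper mirroring the SQL CONCAT; None stands in for SQL NULL\n",
"    cpc_codes = cpc_codes or []\n",
"    parts = [\n",
"        title if title is not None else \" \",\n",
"        abstract if abstract is not None else \" \",\n",
"        claims if claims is not None else \" \",\n",
"        \" \".join(cpc_codes),  #full CPC codes, e.g. G06Q20/065\n",
"        \" \".join(re.sub(r\"/.*\", \"\", code) for code in cpc_codes),  #main groups, e.g. G06Q20\n",
"    ]\n",
"    return \" \".join(parts)\n",
"\n",
"#made-up record with no abstract\n",
"print(build_text(\"Ledger system\", None, \"1. A method comprising...\", [\"G06Q20/065\", \"H04L9/32\"]))"
]
},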
623 | {
624 | "cell_type": "markdown",
625 | "metadata": {},
626 | "source": [
627 | "# Prepare Machine Learning Pipeline"
628 | ]
629 | },
630 | {
631 | "cell_type": "code",
632 | "execution_count": 5,
633 | "metadata": {},
634 | "outputs": [],
635 | "source": [
636 | "from sklearn.pipeline import Pipeline\n",
637 | "from sklearn.linear_model import SGDClassifier\n",
638 | "from sklearn.model_selection import GridSearchCV\n",
639 | "from sklearn.feature_extraction.text import TfidfTransformer\n",
640 | "from sklearn.feature_extraction.text import CountVectorizer\n",
641 | "from sklearn.model_selection import cross_val_score\n",
642 | "from sklearn.utils import shuffle\n",
643 | "import joblib  #sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package\n",
644 | "from sklearn.model_selection import train_test_split\n",
645 | "from sklearn.metrics import roc_curve, auc\n",
646 | "from sklearn.metrics import confusion_matrix\n"
647 | ]
648 | },
649 | {
650 | "cell_type": "code",
651 | "execution_count": 6,
652 | "metadata": {},
653 | "outputs": [],
654 | "source": [
655 | "#now that the data is prepared, get ready to do the ML training.\n",
656 | "features, labels = shuffle( df_training_set_finished['text'].values, df_training_set_finished['Label'])\n",
657 | "\n",
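658 | "#Grid-search parameters for the SGDClassifier. There are lots of algorithms and parameters to try; these have worked best\n",
659 | "#for me in the past on similar data. NOTE: these names match the older scikit-learn (pre-0.21) this was run with; newer\n",
660 | "#releases use 'max_iter' instead of 'n_iter', loss='log_loss' instead of 'log', and penalty=None instead of 'none'.\n",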
661 | "parameters = {\n",
662 | " 'loss': ['log'],\n",
663 | " 'penalty': ['none', 'l2', 'l1', 'elasticnet'],\n",
664 | " 'n_iter':[3, 8, 12],\n",
665 | " 'alpha': [0.001, 0.01, 0.1 ],\n",
666 | " 'learning_rate': ['constant','optimal','invscaling'],\n",
667 | " 'eta0':[.5,1],\n",
668 | " 'class_weight': ['balanced', None]\n",
669 | "}\n",
670 | "\n",
671 | "#Grid search will find the best combination of parameters to use in our ML model\n",
672 | "grid_search = GridSearchCV(SGDClassifier(), parameters )\n",
673 | "\n",
674 | "#Stick a CountVectorizer, which will convert the text into word counts in the pipeline\n",
675 | "#Next, use a TF-IDF transformer to convert the word counts to TFIDF\n",
676 | "#Last, do the grid search.\n",
677 | "#store this all in a pipeline so we can do it over and over.\n",
678 | "text_clf = Pipeline( [('vect', CountVectorizer(stop_words='english')),\n",
679 | " ('tfidf',TfidfTransformer()),\n",
680 | " ('clf',grid_search)])"
681 | ]
682 | },
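{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pipeline above chains a `CountVectorizer` and a `TfidfTransformer`. With default settings, that pair should produce the same matrix as scikit-learn's single-step `TfidfVectorizer`; the sketch below checks this on three made-up documents. It only illustrates what the first two pipeline stages compute and does not change the notebook's pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer\n",
"from sklearn.pipeline import Pipeline\n",
"\n",
"#made-up documents standing in for the concatenated patent text\n",
"demo_docs = [\"distributed ledger block chain transaction\",\n",
"             \"digital coupon transaction system\",\n",
"             \"block chain consensus node ledger\"]\n",
"\n",
"two_step = Pipeline([('vect', CountVectorizer(stop_words='english')),\n",
"                     ('tfidf', TfidfTransformer())])\n",
"one_step = TfidfVectorizer(stop_words='english')\n",
"\n",
"demo_a = two_step.fit_transform(demo_docs).toarray()\n",
"demo_b = one_step.fit_transform(demo_docs).toarray()\n",
"print(np.allclose(demo_a, demo_b))  #True: same vocabulary, idf smoothing, and l2 normalization by default"
]
},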
683 | {
684 | "cell_type": "markdown",
685 | "metadata": {},
686 | "source": [
687 | "# Search the grid for the best trained model"
688 | ]
689 | },
690 | {
691 | "cell_type": "code",
692 | "execution_count": 7,
693 | "metadata": {},
694 | "outputs": [
695 | {
696 | "name": "stdout",
697 | "output_type": "stream",
698 | "text": [
699 | "Grid Searching\n",
700 | "Best Score 0.932788374205\n",
701 | "Best parameters set:\n",
702 | "\talpha: 0.001\n",
703 | "\tclass_weight: None\n",
704 | "\teta0: 1\n",
705 | "\tlearning_rate: 'constant'\n",
706 | "\tloss: 'log'\n",
707 | "\tn_iter: 12\n",
708 | "\tpenalty: 'none'\n"
709 | ]
710 | }
711 | ],
712 | "source": [
713 | "#Fit the data\n",
714 | "print(\"Grid Searching\")\n",
715 | "text_clf.fit( features, labels)\n",
716 | "print(\"Best Score\", grid_search.best_score_)\n",
717 | "print( \"Best parameters set:\" )\n",
718 | "best_parameters = grid_search.best_estimator_.get_params()\n",
719 | "for param_name in sorted(parameters.keys()):\n",
720 | " print (\"\\t%s: %r\" % (param_name, best_parameters[param_name]))\n",
721 | " \n"
722 | ]
723 | },
724 | {
725 | "cell_type": "markdown",
726 | "metadata": {},
727 | "source": [
728 | "# Build the final pipeline from the best model and score it on a held-out test set"
729 | ]
730 | },
731 | {
732 | "cell_type": "code",
733 | "execution_count": 8,
734 | "metadata": {},
735 | "outputs": [
736 | {
737 | "name": "stdout",
738 | "output_type": "stream",
739 | "text": [
740 | "Accuracy: 0.906735751295\n"
741 | ]
742 | }
743 | ],
744 | "source": [
745 | "#grab the best estimator from the grid search\n",
746 | "estimator = grid_search.best_estimator_\n",
747 | "#create a reusable pipeline to do the predictions\n",
748 | "final_pipeline = Pipeline( [('vect', CountVectorizer(stop_words='english')),\n",
749 | " ('tfidf',TfidfTransformer()),\n",
750 | " ('estimator', estimator)])\n",
751 | "\n",
752 | "# split data 65%-35% into training set and test set\n",
753 | "features_train, features_test, labels_train, labels_test = train_test_split(df_training_set_finished['text'].values,\n",
754 | " df_training_set_finished['Label'].values, test_size=0.35)\n",
755 | "final_pipeline.fit( features_train, labels_train )\n",
756 | "\n",
757 | "accuracy = final_pipeline.score(features_test, labels_test)\n",
758 | "print(\"Accuracy: \", accuracy)\n",
759 | "\n",
760 | "#uncomment here to do a cross val score instead of the test/train split above\n",
761 | "#score = cross_val_score( final_pipeline, features, labels)\n",
762 | "#print(\"cross val score:\", score)\n"
763 | ]
764 | },
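{
"cell_type": "markdown",
"metadata": {},
"source": [
"`train_test_split` above shuffles at random, so the share of blockchain patents can differ between the training and test sets. If that matters, passing `stratify=` keeps the class ratio the same in both. A minimal sketch with made-up labels; the `demo` names are hypothetical and chosen so the real split is not overwritten, and `random_state` is set only to make the run repeatable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"#made-up data: 200 negatives (0) and 100 positives (1)\n",
"y_demo = np.array([0] * 200 + [1] * 100)\n",
"X_demo = np.arange(len(y_demo))\n",
"\n",
"_, _, y_train_demo, y_test_demo = train_test_split(X_demo, y_demo, test_size=0.35,\n",
"                                                   stratify=y_demo, random_state=0)\n",
"\n",
"#the fraction of positives stays at about one third in both halves, as in the full data\n",
"print(y_train_demo.mean(), y_test_demo.mean())"
]
},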
765 | {
766 | "cell_type": "code",
767 | "execution_count": 9,
768 | "metadata": {},
769 | "outputs": [
770 | {
771 | "name": "stdout",
772 | "output_type": "stream",
773 | "text": [
774 | "Confusion Matrix:\n",
775 | "\t\t\t Pred True \t Pred False\n",
776 | "BlockChain True: \t 254 \t\t 8\n",
777 | "BlockChain False: \t 28 \t\t 96\n",
778 | "False Negatives: 8\n",
779 | "False Positives: 28\n"
780 | ]
781 | },
782 | {
783 | "data": {
784 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAUUAAAEUCAYAAAC8piQPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALiAAAC4gB5Y4pSQAAGgZJREFUeJzt3XmYVOWZ/vHv3Q0NsimiooI4ruAWcU1QYnSCmqhBHMZE\nR2NUTERxxRElo1kc18Qk4280Kj8YcSNORKIYE/eoCa4oaAQM7oJAVCTsKE0/80cd2tNK06egT1dV\nc3+4zmXVqVPnfaC87ut9z/IeRQRmZlZQVeoCzMzKiUPRzCzFoWhmluJQNDNLcSiamaU4FM3MUhyK\nZmYpDkUzsxSHoplZikPRzCylTakLSFObqqB9danLsPXQp9eOpS7BmsFrM15bHBFd1mcf6touWFmX\n/QtLa5+MiIMb3Z+0DTAW2BqoAyZGxEhJBwP3A28mm74dEcck3+kB/AbYEpgDHB8Rc9dWRlmFIu2r\noV/3Uldh62HKgy+WugRrBhu16Th7vXeysg4O2DL79o/M3qKJLWqBiyJisqQa4DFJRwMLgeciYsAa\nvnMNcEdEjJJ0JnAlcMraGvHw2czyU1XE0oSImBsRk5PXnwJTgF5NfO0o4Lbk9a3A0VlKNjPLh5R9\nga6SpqeWYY3vVpsCg4BHklX7SJoi6SlJhyfbdAOWRsQKgIhYCqyUtPHaSi6v4bOZtR4CqlTMNxZE\nxK5N7rYwdB4PXBcRr0maA2wbEYsk7QY8KOkgYPG6lO2eopnlpxmHzwCSqoFxwNSI+AVARCyKiEXJ\n62nAJGBvYD7QUVL75LsdgZqIWNhUyWZmOShi6KzMPcpRFHqAF9S3Im0lFXaQnG3uB0yLwgzaDwAn\nJZt+D5jYVAMePptZfooaPTexK+lA4FTgVWBKkoP/Q+HynDMkrUw2vSQiXkteXwyMk3QhMBc4vql2\nHIpmlo/ijymuVURMovGYvb6R78wCvlpMOw5FM8tPM/YUW4pD0czyk/1YYdlwKJpZfiovEx2KZpYT\nAdWVl4oORTPLj4fPZmYpzXj2uaU4FM0sP5WXiQ5FM8tLUXeqlA2HopnlQ7inaGbWgI8pmpmlVF4m\nOhTNLCe+TtHM7HN8osXMLKXyMtGhaGY5ck/RzCylAuf2dyiaWT6Ee4pmZg1UXiY6FM0sL4Lqyhs/\nOxTNLD/uKZqZfUZFHFOMHOsohkPRzPIhh6KZWQMVePLZoWhm+akqIhVX5VhHMRyKZpabYobP5cKh\naGa5EHIompnVE1R5klkzs8+4p2hmluJQNDNLUQXe0uJQNLNcFCbJcSiamdWrwEx0KJpZfoq5eLtc\nOBTNLB/ydYpmZvWEr1M0M2vAPUUzs5RKDMXKmyvczCqGkuOKWZYM+9pG0mOSZkiaJumq1GdXS3pD\n0kxJg1Prd5f0oqTXJd0rqVNT7TgUzSwfat5QBGqBiyJiF2AvoL+koyUNAA4AegOHAL9Khd9NwMiI\n2AmYCVzQVCMOxWZQ07aGUef/jLdue5pF973GjDFPcMrh36n//E/X3s2KB95k8cS/1S9bdev+hf1s\nsclmzL/nVabc9FBLlm9r8Mknn3Dm6cPos+OubL5Jd/bcbS9uveXW+s9nTJ/BNw89gq0268E/9diO\nYUPPYtmyZSWsuDxJ2ZemRMTciJicvP4UmAL0AgYDYyNiVUS8D0wCDpPUHegVEQ8nuxiTbLtWDsVm\n0Ka6mrkff8CAi46ny9F9OPna4fzi9Es5dJ+D6re5aPSVdB7Yu36ZO//vX9jP9WdfzpQ3X23J0q0R\ntbW1bLnllvzhod/zwYJ5jBpzMxdf+EMeffhRAE7+7ins3Htn3p3zNi9MfZ5XXv4rV11+dYmrLi+r\npw4roqfYVdL01DKs0X1LmwKDgEeAnsCs1MfvAdusZf1aORSbwbIVy/nxrdfy1tx3AXhuxkv86eVn\n6L/7/pn3MbDfYWzaeRNuf/SevMq0In
Ts2JEf/fRStt9heyTx5a/sz0EHH8TTk54B4O233uG4fzuO\nmpoaNt98c4761pFMe3VaiasuP9VVVZkXYEFE7JpabljTPiXVAOOB6yLiNRp/ZuA6neVxKOagXdt2\n7N+7L6+8NaN+3SUnnMv8e17lpRsf5LsDGvbgu3TozC+H/oih141s6VItoxUrVjD5hcns/qXdAThv\n+LmMu2Mcy5cvZ968eUy8byJHHHVEiassP805fC7sT9XAOGBqRPwiWT2Lhj3AXsDsZFnT+rVyKOZg\n9PCf8/r7bzPhL38AYOSYq9jhpAPp/u2+XDzmKv77rP9k0IHfqN/+Z9//D8Y+fDdvvP92qUq2tYgI\nzvjBmey4444MOuZoAA77xmE8PelpNt+kO9v13IEePXvyvVNOKnGlZab5T7QAjAIW0/CEyQTgZEnV\nknoA/YGHI2IeMEvSYcl2Q5Jt1yrXUJR0cHLq/A1Jo5OUb9V+fc6V9N5mewb9eAgRhYc2PjvjJRYt\nW0ztqloenvwkN//+Tr7ztW8B0H/3/Tlwt3255n9/XcqyrRERwblnncfMma/z2wl3UVVVxYIFCzjy\n8KM4ZcgpfLz4I+Z8OJuOHTtwyklDSl1uWRHJccWMf5rcn3QgcCqwLzBF0lRJ50TEI8CzFM4uPwEM\nj4jFydfOAK6R9DrQB7i2qXZyu3hbUhUwGhgYEdMl/RY4Ebh17d+sXDecfQVf7rMXXx9xHIuWLW50\nu7qoq3/99b36s/1W2zLnrhcBaNe2ho3atefD8a+wxw8GMO/jD3Kv29YsIjjv7PN54fkX+MPDD7Dx\nxhsD8Nabb7N8+XKGnX0mkqipqWHI94cw6KhjSlxx+WnOi7cjYhKNHCeMiBHAiDWsf4XC5TuZ5dlT\n3A+YExHTk/eZTodXquvPvpwDd9uPQy86nn8sWVi/fuOOXfjm/v/MRu3aU1VVxT/vdSBDjzqRe5Kh\n9S/vGcXOpxxE36GH03fo4fzo1mv526w36Tv0cD74x0el+usYcP45w3nm6Wf4/YP307Vr1/r1vfvs\nTKdOnbj5xlHU1tayePFibhlzC3v23bOE1Zajos8+l4U8b/Nr8nR4csr9s9PuNZV5iLPXFj0YNvBk\nVny6gnfvfK5+/R2PTeDSsT/nxyeez10/LJxIe+fvsxl+02WMf+oBABYvW8LiZUvqv7NgyUJWrqrl\n/Y/mtuxfwhp49933uPnGUbRr147e2+9Sv/74E47jv3/9/xh/791cMvJSfnLpT6murqbfAV9h9C2j\nSlhxeSqjrMtMq497NfuOpX8FjomIE5L3uwDjIqLRrqw6tQ36ffGiZqscyx+cWeoSrBls1KbjjIjY\ndX32UdO9U/S68IDM27954SPr3WZzyLOn2NhpcjPbAEhQVVV5o788K54M9JS0OvkznQ43s9ajua9T\nbAm59RQjYpWk04DxktoBTwK359WemZWfcjqBklWu8ylGxONAyY8RmFlpOBTNzOqV16U2WTkUzSw3\nFZiJDkUzy4fk4bOZWQMORTOzFD/i1MwsxT1FM7M0h6KZWaLMZr/JyqFoZrkQFdlRdCiaWX7cUzQz\nS3Eompml+JIcM7PVfEeLmVlDDkUzs4Q8S46ZWUMORTOzlArMRIeimeXHPUUzsxSHoplZovCIU4ei\nmVk99xTNzOqV2QOdM3Iomllu3FM0M0upwEOKDkUzy4nvfTYz+4yAKoeimdln3FM0M0sIaONQNDNb\nzbPkmJk1UInHFKtKXYCZtVLJ2eesS5O7k66TNFtSbWrdwZIWS5qaLL9LfdZD0lOSZkp6QtJWWcp2\nKJpZLkQhYLIuGdwN7LuG9c9FRN9kOSa1/hrgjojYGfgtcGWWRjx8NrPcNOfwOSL+AkWd0T4KOC15\nfStweZYvuadoZrkpcvjcVdL01DIsYzP7SJqSDJUPT9rtBiyNiBUAEbEUWClp46Z25p6imeVCQHVx\nPc
UFEbFrkc28BGwbEYsk7QY8KOkgYHGR+6nnnqKZ5URUKfuyLiJiUUQsSl5PAyYBewPzgY6S2gNI\n6gjURMTCpvbpUDSz3OQdipK2UjL2ltQD6AdMi4gAHgBOSjb9HjAxyz49fDazfDTzhBCSbgaOBKol\nzQbuA2YAZ0hamWx2SUS8lry+GBgn6UJgLnB8lnbWGopJQ7Gmj4CIiJosjZjZhqe5J4SIiNMb+ej6\nRrafBXy12HbWGooR0bbYHZqZrVZ597MUOXyWtCnQfvX7iJjT7BWZWatRibf5ZQpFSYcCNwI9gSVA\nV+A9YLv8SjOzStba51O8BjgIeCAi9pL0r8CB+ZVlZpVPVFdV3gUuWSuOZKjcJnkzHtg/t6rMrFVQ\nEUu5yNpTXC6pGpguaQTwPoUhtJnZGkmVOXzO2lM8B9gIOA/oA/wLn10UaWa2RnlfvJ2HTD3FiHgp\nebkEODW/csysNWm1M29LuoU1XMQdEQ5IM2tU5Z1myX5M8dHU6/bAIODt5i/HzFqTVttTjIg70+8l\njQUezKMgM2sdWvt1ip+3KbB9cxYCsPM22zHp/qeae7fWgn739t2lLsHKhSrzOsWsxxRf57NjitXA\nJsAleRVlZpWv8IyW1ttTHJB6XQv8PSJqG9vYzAwq85hi1r7tTyPi3WR5PyJqk+OKZmaNarXXKQJf\nWsO6NT1q0Mysnlrb8FnS2RTuZukpaWbqo87AQ3kWZmaVLttD7stNUz3F24D7gV8C56fWL46Ij3Or\nyswqXqu8JCd58tVCSRcDH0bEMgBJHSTtFBGvt0SRZlaZqlVd6hKKlvVEy2+Alan3tcC45i/HzFqN\n5MFVWZdykfVES3VE1IdiRHwqyQ+tMrNGKflTabL2FBdK6r/6jaSDgEX5lGRmrUVrviTnXGC8pAUU\njp9uDAzOrSozaxXKaVicVdYJIaZK6gP0BrpReF7LncAeOdZmZhWuqgInD8tUsaTOwHcpPHT6IaAL\nMCTHusyswonKPNGy1lCUdJSku4DXgf7A5RTuex4REc+3RIFmVqmyB2I5hWJTw+eJwJPAfhExC0BS\nXe5VmVmrUK3KGz43FYpfAU4EnpE0lcJxxMr7W5pZi2uVT/OLiOcj4hxgW+BGYCDQTdKdkga1RIFm\nVrlUxJ9ykanXFxGrIuKBiDge2JrCM1vOyrUyM6twokpVmZdyUfTjCCJiMXBLspiZNaqcTqBkta7P\naDEza1I5DYuzciiaWS5a5dRhZmbrwz1FM7N6orqq8uZTdCiaWS6Ee4pmZg1U4jHF8rk4yMxal2ae\neVvSdZJmS6r93PqrJb0haaakwan1u0t6UdLrku6V1ClL2Q5FM8tNFcq8ZHA3n3u0sqQBwAEUpjU8\nBPhVKvxuAkZGxE7ATOCCbDWbmeVAzTxLTkT8JSLmfW71YGBsctfd+8Ak4DBJ3YFeEfFwst0YMk6M\n7VA0s9xIVZkXoKuk6allWIYmegKzUu/fA7ZZy/om+USLmeWmyKnDFkTErkU20VgXc53P8LinaGa5\naYFZcmbRsAfYC5idLGta3ySHopnlpEVm3p4AnCypWlIPCk8IeDg59jhL0mHJdkOSbZvk4bOZ5UKQ\n9axytv1JNwNHAtWSZgP3RcQwSYdSOLtcBwxPZvICOAO4VdINwAzghCztOBTNLDdqxnkSI+L0RtaP\nAEasYf0rwF7FtuNQNLPc+DY/M7PV5ElmzcwacE/RzCwh1CofcWpmts48fDYzS1EFXgrtUDSz3Lin\naGaWKLeH3GflUDSz3FTizNsORTPLjXuKZmYpPqZoZpaQRJX8iFMzs3rNOUtOS3EomlluPHy2Rs19\nfy4XnTeSZ59+Hgn6f60/1/zXlWy2+WalLs0a8fd3P2DsZbfz+tQ3abdRDd846TC+9YMj6j//02+f\n5Pej/8jH8z6m86adOemSE9h3wN4lrLj8+ESLNeqi80YC8NLfXoAI
hp4yjB9ecCmjbruxxJXZmtSt\nquPaof/FvofuzQU3ncsHsz7kqpN/zqZbduXAgf147K4n+OMtD3H2r85g2117sWj+IlYs+6TUZZed\nSuwpVt49OBXq3Xfe4+jBA+nUqSOdOndi0OCBzJg2o9RlWSPmvDWXuW/PY/BZg2jTtg1bb78VBx97\nEI//7xPUrapj/HUTOOmSE/in3bZFEhtvtjHde21R6rLLSuHS7exPfi4X7im2kKHnnM7ECfdz6DcH\nEBFMuPteDjvi0FKXZY2IiAb/BYi64L2/zWbOW3NZ+NEi5r49l9GX3kJdbR17fm0PTrj4eDp03qhU\nJZelSrx4u3ziuZXbv99+fPjhfHbcqg87bb0L/1iwkPMuPKfUZVkjttpuSzbvsRnjr/sdKz9ZyezX\n3+eJ8X9m+ZLlLFm4FIAXH5vK5RN+wpUTL+OD2R9xx5XjSlx1+alSVealXORaiaSDJU2T9Iak0VIF\nXrTUDOrq6jj2yO/w5X778c5Hb/DOR2/w5X77cexRx5W6NGtEm7ZtuOCmc3ln+rsM++r53DD8Jr42\nuD+dNulE+w7tABh4+pF02bQzXTbtzNGnH8VLj08tcdXlpwUecdrscgtFFZ5YMxo4NiJ2BLoAJ+bV\nXjlb8PECZr03m++fOYQOHTrQoUMHTjvjVF584SXmfzS/1OVZI3ru1IORYy9k1PPXc9X9/0ntp7Xs\nsn9vtt5+K9q2a1vq8sqeoCUecdrs8uwp7gfMiYjpyfsxwOAc2ytb3TbrxnY7bMeYm29hxYoVrFix\ngjE3j2XrHlvTbbNupS7PGvHea7NYsewTaj+t5fmHJvPE+D9zzJkDqWlfQ/+j+3H/qAdYsnApSxct\n5f5RD7DPgKIfHNfKiaoi/pSLPE+09ARmpd6/B2yT3kDSMGDY6vdbdN88x3JK6/a7b+GSET/mSzvs\nTV1dHXvsuTu3jx9b6rJsLZ79w/M8+pvHWfnJSnr16cXwG8+hV5/C/8Lf/Y8TGPuT2zjvkH+nTU1b\n9vl6X04ceXyJKy4zfnDVFzT5rxERNwA3rH7fe5edYy2bV7Teu/Tm7vvvKnUZVoRvDx/Mt4eveXDT\nvkM7hv7s+y1cUeUpp2OFWeUZirNo2DPsBczOsT0zKzOV2FPMcyA/Gegpadfk/RBgQo7tmVkZEZV5\n9jm3nmJErJJ0GjBeUjvgSeD2vNozs3IjVEbXH2aV6x0tEfE4sGuTG5pZq1ROPcCsfJufmeWmEo8p\nOhTNLDfuKZqZpTgUzcwSorxu38vKoWhmuXFP0cwsxaFoZpZSTvMkZuVQNLN8eEIIM7O08rp9L6vK\n69uaWUXI495nSe8ks/lPTZY9kvVXJzP8z5S0XvO2uqdoZrnJafh8eETUz7glaQBwANAb2BJ4RtJD\nEbFkXXbunqKZ5aaFZskZDIyNiFUR8T4wCThsXXfmUDSz3BQZil0lTU8twxrZ7f3J0PkKSW3JMMt/\nMTx8NrPcFDl12IKIaGpWra9GxCxJHYFbgX8nwyz/xXBP0cxykv1JflmPPUbErOS/Syk8LfQAmnmW\nf4eimeWiuc8+S+ooqUvyuprCscRXKMzof7Kkakk9gP7Aw+tat4fPZpabZr5OsTswIXmmfDXwDHBF\nRCyTdCgwE6gDhkfE4nVtxKFoZrlpzktyIuItoG8jn40ARjRHOw5FM8tNJd7R4lA0s9w4FM3M6nmS\nWTOzepKnDjMza8DDZzOzBhyKZmb1Ki8SHYpmliOfaDEzqycqsa/oUDSz3FReJDoUzSxXlReLDkUz\ny4XwMUUzswZ8naKZWUolhmLl3YNjZpYj9xTNLDeVeEzRPUUzsxT3FM0sJ+v9POeScCiaWW4cimZm\nCV+naGb2BQ5FM7N6lReJDkUzy1XlxaJD0cxyU4nHFH2doplZinuKZpYTX6doZtaAQ9HMbDVV5jFF\nh6KZ5cihaGYGVOpjqxyKZpar
yotFh6KZ5aYSjyn6OkUzsxRFRKlrqCdpETC71HXkrCuwoNRF2HrZ\nEH7DnhHRZX12IOkJYIsivvJBRBy8Pm02h7IKxQ2BpOkRsWup67B159+wdfPw2cwsxaFoZpbiUGx5\nN5S6AFtv/g1bMR9TNDNLcU/RzCzFoWhmluJQbCGSDpY0TdIbkkZLqi51TVY8/46tn0OxBUiqAkYD\nx0bEjkAX4MTSVmXF8u+4YXAotoz9gDkRMT15PwYYXMJ6bN34d9wAOBRbRk9gVur9e8A2JarF1p1/\nxw2AQ7FlVN5UIbYm/h03AA7FljGLhj2KXrT+iS9aI/+OGwCHYsuYDPSUtHoSgSHAhBLWY+vGv+MG\nwKHYAiJiFXAaMF7Sm8AS4PbSVmXF8u+4YfBtfmZmKe4pmpmlOBTNzFIcimZmKQ5FM7MUh6KZWYpD\ncQMkKSRNlfSqpCcl7bCe+ztY0qPJ64GSftTE9oMkfWkd2jlZ0uh1rdMsC4fihmlVRPSNiN2BZ4Ff\nfH4DSW3WZccRMTEiLmtis0FA0aFo1hIcivYEsBMUntMr6Yrkeb1XS6pK3j8v6RVJV67+kqQTJM2U\n9Gfg6NT6+t5c6vt/lfSypF9KOgQYCFyR9FYPkLSRpJuSdv4q6azU/i5M2vkT0K8l/kFsw7ZOvQFr\nHSSJQkC9nFrdEzgkIkLSqQARsX8yl+C9kr4JTAF+DuwDzAPGN9LEEGAvYJ+I+FRSt4iYL2ki8GhE\n3JHUcRnwUkQMldQOmCTpcaAdhTtI9gE+pRDg05rxn8DsCxyKG6ZqSVMpzPryGnBe6rNx8dltTkcA\ne0o6MnnfkUKvsgb4S0TMBZB0G3D2Gto5HLgxIj4FiIj5jdRzBLCRpDOT912A3hQmXLgvIhYn7dwF\n7FHsX9asGA7FDdOqiOjbyGdLU68FXBgR96Y3kHQ02WSdakvACREx9XPtnJvx+2bNxscUbW3+CJwh\nqT2ApK0lbQk8BxwoactkCN7YlPwPJt+vSb7fLVm/mEJvMN3OuaufdyJpJ0ldgKeAb0nqlOzj2838\n9zP7Aoeirc0Y4BlgsqS/Upgma5OImAeMoBBaTwEz1/L9qcCUZLg+Mlk/Djhr9YkW4HIKM868LOlV\n4P8DNRExBfgf4CXgIeD5HP6OZg14lhwzsxT3FM3MUhyKZmYpDkUzsxSHoplZikPRzCzFoWhmluJQ\nNDNLcSiamaX8H30FogZjwc5QAAAAAElFTkSuQmCC\n",
785 | "text/plain": [
786 | ""
787 | ]
788 | },
789 | "metadata": {},
790 | "output_type": "display_data"
791 | }
792 | ],
793 | "source": [
794 | "#compute the confusion matrix on the held-out test set\n",
795 | "labels_predicted = final_pipeline.predict(features_test)\n",
796 | "cm = confusion_matrix(labels_test, labels_predicted)\n",
797 | "#sklearn sorts the labels, so rows (actual) and columns (predicted) are ordered [0 = not blockchain, 1 = blockchain]\n",
798 | "tn, fp, fn, tp = cm.ravel()\n",
799 | "\n",
800 | "#print in text format\n",
801 | "print(\"Confusion Matrix:\\n\\t\\t\\t\", \" Pred True \\t Pred False\")\n",
802 | "print(\"BlockChain True:\", \"\\t\", tp, \"\\t\\t\", fn)\n",
803 | "print(\"BlockChain False:\", \"\\t\", fp, \"\\t\\t\", tn)\n",
804 | "print(\"False Negatives:\", fn)\n",
805 | "print(\"False Positives:\", fp)\n",
806 | "\n",
807 | "#now plot the confusion matrix with matplotlib to get all fancy\n",
808 | "import matplotlib.pyplot as plt\n",
809 | "\n",
810 | "class_names = ['Not BlockChain', 'BlockChain']\n",
811 | "fig, ax = plt.subplots(figsize=(6, 4), dpi=75)\n",
812 | "im = ax.imshow(cm, interpolation=\"nearest\", cmap=plt.cm.Greens)\n",
813 | "fig.colorbar(im)\n",
814 | "tick_marks = [0, 1]\n",
815 | "ax.set_xticks(tick_marks)\n",
816 | "ax.set_xticklabels(class_names)\n",
817 | "ax.set_yticks(tick_marks)\n",
818 | "ax.set_yticklabels(class_names)\n",
819 | "ax.set_xlabel(\"Predicted\")\n",
820 | "ax.set_ylabel(\"Actual\")\n",
821 | "#imshow draws cm[row][col] at x=col, y=row; label each cell with its count, white on dark cells\n",
822 | "for row in range(2):\n",
823 | "    for col in range(2):\n",
824 | "        ax.text(col, row, str(cm[row][col]), fontsize=12, horizontalalignment='center',\n",
825 | "                color='white' if cm[row][col] > cm.max() / 2 else 'black')\n",
826 | "\n",
827 | "plt.show()\n"
828 | ]
829 | },
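{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `confusion_matrix` sorts the labels, row 0 and column 0 are the non-blockchain class and row 1 and column 1 are the blockchain class, so `ravel()` unpacks the matrix as true negatives, false positives, false negatives, true positives. A small check with made-up labels (the `demo` names are hypothetical, chosen so the real `cm` is not overwritten):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"\n",
"#made-up labels: 1 = blockchain, 0 = not blockchain\n",
"demo_actual = [1, 1, 1, 0, 0, 0, 0]\n",
"demo_predicted = [1, 0, 0, 1, 0, 0, 0]\n",
"\n",
"demo_cm = confusion_matrix(demo_actual, demo_predicted)\n",
"#rows are actual labels and columns are predicted labels, both in sorted order [0, 1]\n",
"demo_tn, demo_fp, demo_fn, demo_tp = demo_cm.ravel()\n",
"print(\"TN\", demo_tn, \"FP\", demo_fp, \"FN\", demo_fn, \"TP\", demo_tp)\n",
"#prints TN 3 FP 1 FN 2 TP 1"
]
},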
830 | {
831 | "cell_type": "markdown",
832 | "metadata": {},
833 | "source": [
834 | "# Pull target portfolio\n",
835 | "Note: this is a very unsophisticated way to pull the target portfolio. In my experience, the best way is to use a third-party database that includes all reassignment data; you can upload those patent numbers to GBQ and do a join much as we did with the training data above. The technique below, however, works reasonably well for large portfolios like those of IBM, Google, and Microsoft, and gives you a rough idea before you spend a lot of time making sure you have the correct portfolio to search. Companies like Intellectual Ventures, which uses many different holding companies, can be notoriously hard to search this way."
836 | ]
837 | },
838 | {
839 | "cell_type": "code",
840 | "execution_count": 10,
841 | "metadata": {},
842 | "outputs": [
843 | {
844 | "name": "stdout",
845 | "output_type": "stream",
846 | "text": [
847 | "Requesting query... ok.\n",
848 | "Job ID: job_D9gakZHCX35BshdCyHhI-Zu_JKcA\n",
849 | "Query running...\n",
850 | "Query done.\n",
851 | "Cache hit.\n",
852 | "\n",
853 | "Retrieving results...\n",
854 | " Got page: 3; 34% done. Elapsed 7.72 s.\n",
855 | " Got page: 4; 45% done. Elapsed 10.42 s.\n",
856 | " Got page: 5; 56% done. Elapsed 13.32 s.\n",
857 | " Got page: 6; 67% done. Elapsed 16.02 s.\n",
858 | " Got page: 7; 78% done. Elapsed 18.23 s.\n",
859 | " Got page: 8; 89% done. Elapsed 20.65 s.\n",
860 | " Got page: 9; 100% done. Elapsed 23.1 s.\n",
861 | "Got 8216 rows.\n",
862 | "\n",
863 | "Total time taken 23.19 s.\n",
864 | "Finished at 2018-01-15 16:20:04.\n",
865 | "(8216, 3)\n"
866 | ]
867 | }
868 | ],
869 | "source": [
870 | "#get the target portfolio\n",
871 | "#pull publications (applications and grants) whose harmonized assignee on the face of the patent is Amazon (this might not be the current assignee)\n",
872 | "#The query creates the same text as above, but this time we will use it for prediction instead of training.\n",
873 | "#Large portfolios can take a while; Amazon should have about 8,000 rows, so it will fetch and process quickly.\n",
874 | "#The query pares the data down by keeping only patents published after 2000. If you want a larger dataset, try uncommenting\n",
875 | "#one of the other assignees below.\n",
876 | "\n",
877 | "#if you don't want to use Google BigQuery, you can load the results from here:\n",
878 | "#df_target = pd.read_csv()\n",
879 | "\n",
880 | "#assignee = \"IBM%\"\n",
881 | "#assignee = \"GOOGLE%\"\n",
882 | "#assignee = \"MICROSOFT%\"\n",
883 | "assignee = \"AMAZON TECH%\"\n",
884 | "\n",
885 | "query = \"\"\"\n",
886 | "select pubs.publication_number, --pubs.assignee_harmonized,\n",
887 | " (SELECT text from UNNEST(pubs.title_localized)) as title,\n",
888 | " CONCAT(IFNULL(\n",
889 | " (SELECT text from UNNEST(pubs.title_localized)), \" \"), \" \",\n",
890 | " IFNULL(\n",
891 | " (SELECT text from UNNEST(pubs.abstract_localized)), \" \"), \" \",\n",
892 | " IFNULL( \n",
893 | " (SELECT text from UNNEST(pubs.claims_localized)), \" \"), \" \",\n",
894 | " IFNULL(\n",
895 | " ARRAY_TO_STRING( ARRAY(SELECT code from UNNEST(pubs.cpc)), \" \"), \" \" ), \" \",\n",
896 | " IFNULL(\n",
897 | " ARRAY_TO_STRING( ARRAY(SELECT REGEXP_REPLACE( code, \"/.*\", \"\") from UNNEST(pubs.cpc)), \" \"), \" \")) as text\n",
898 | "from\n",
899 | " `patents-public-data.patents.publications` as pubs, UNNEST(title_localized) as title\n",
900 | "where \n",
901 | "country_code = 'US' and --US only\n",
902 | "application_kind = 'A' and --patents only\n",
903 | "publication_date > 20000000 and --publication_date is an integer of the form YYYYMMDD, so this keeps patents published on or after Jan 1, 2000.\n",
904 | "EXISTS( SELECT 1 name from UNNEST(assignee_harmonized) where name LIKE '\"\"\"+ assignee + \"\"\"')\"\"\"\n",
905 | "\n",
906 | "df_target = pd.read_gbq(query, project_id=PROJECT_ID, dialect='standard')\n",
907 | "\n",
908 | "print(df_target.shape)\n"
909 | ]
910 | },
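{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `publication_date > 20000000` filter works because the patents table stores dates as integers of the form YYYYMMDD: any date from Jan 1, 2000 (20000101) onward is larger than 20000000, and any 1999 date is smaller. A quick illustration in Python; `to_patents_date` is a hypothetical helper, not part of any library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import date\n",
"\n",
"def to_patents_date(d):\n",
"    #hypothetical helper: encode a date as the YYYYMMDD integer used by the patents table\n",
"    return int(d.strftime(\"%Y%m%d\"))\n",
"\n",
"for d in [date(1999, 12, 31), date(2000, 1, 1)]:\n",
"    print(d, to_patents_date(d), to_patents_date(d) > 20000000)\n",
"#prints:\n",
"#1999-12-31 19991231 False\n",
"#2000-01-01 20000101 True"
]
},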
911 | {
912 | "cell_type": "code",
913 | "execution_count": 11,
914 | "metadata": {},
915 | "outputs": [],
916 | "source": [
917 | "# Use the pipeline and run the predictions.\n",
918 | "#using predict_proba gives us the probabilty rather than the label. This helps in sorting later.\n",
919 | "#this might also take a while\n",
920 | "predictions = final_pipeline.predict_proba(df_target.text.values)\n",
921 | "\n",
922 | "#add the predictions to the dataframe\n",
923 | "# predict_proba returns one row per patent, holding the probability of being in category 0 and the probability of being in category 1.\n",
924 | "# The list comprehension below keeps just the probability of the row being in Category 1.\n",
925 | "df_target['BlockChainPredictions'] = [j for i,j in predictions]\n",
926 | "\n"
927 | ]
928 | },
929 | {
930 | "cell_type": "code",
931 | "execution_count": 12,
932 | "metadata": {},
933 | "outputs": [
934 | {
935 | "data": {
1085 | "text/plain": [
1086 | " publication_number \\\n",
1087 | "518 US-9311500-B2 \n",
1088 | "5980 US-2016217290-A1 \n",
1089 | "3798 US-8600886-B2 \n",
1090 | "5974 US-2012136710-A1 \n",
1091 | "7828 US-2012136707-A1 \n",
1092 | "2393 US-2012136712-A1 \n",
1093 | "4759 US-2013238504-A1 \n",
1094 | "5570 US-2015089244-A1 \n",
1095 | "1477 US-2016191241-A1 \n",
1096 | "1316 US-2017195283-A1 \n",
1097 | "5532 US-2012136706-A1 \n",
1098 | "525 US-2017171219-A1 \n",
1099 | "7470 US-9087187-B1 \n",
1100 | "4436 US-9674162-B1 \n",
1101 | "5666 US-9286608-B1 \n",
1102 | "6161 US-2016248589-A1 \n",
1103 | "5534 US-7814229-B1 \n",
1104 | "4508 US-9699146-B1 \n",
1105 | "305 US-9258120-B1 \n",
1106 | "4532 US-2017093569-A1 \n",
1107 | "\n",
1108 | " title \\\n",
1109 | "518 Data security using request-supplied keys \n",
1110 | "5980 Data security using request-supplied keys \n",
1111 | "3798 Managing transaction accounts \n",
1112 | "5974 Digital Coupon System \n",
1113 | "7828 Digital Coupon System \n",
1114 | "2393 Digital Coupon System \n",
1115 | "4759 Performing automatically authorized programmatic transactions \n",
1116 | "5570 Data security using request-supplied keys \n",
1117 | "1477 Distributed public key revocation \n",
1118 | "1316 Allocating identifiers with minimal fragmentation \n",
1119 | "5532 Digital Coupon System \n",
1120 | "525 Signed envelope encryption \n",
1121 | "7470 Unique credentials verification \n",
1122 | "4436 Updating encrypted cryptographic key pair \n",
1123 | "5666 System and method for predictive payment authorizations \n",
1124 | "6161 Cryptographically verified repeatable virtualized computing \n",
1125 | "5534 Constraint-based domain name system \n",
1126 | "4508 Secure access to user data \n",
1127 | "305 Distributed public key revocation \n",
1128 | "4532 Supporting a fixed transaction rate with a variably-backed logical cryptographic key \n",
1129 | "\n",
1130 | " BlockChainPredictions \n",
1131 | "518 0.970489 \n",
1132 | "5980 0.968580 \n",
1133 | "3798 0.959393 \n",
1134 | "5974 0.957294 \n",
1135 | "7828 0.956874 \n",
1136 | "2393 0.952514 \n",
1137 | "4759 0.947606 \n",
1138 | "5570 0.945723 \n",
1139 | "1477 0.945159 \n",
1140 | "1316 0.937665 \n",
1141 | "5532 0.936986 \n",
1142 | "525 0.927431 \n",
1143 | "7470 0.924442 \n",
1144 | "4436 0.919130 \n",
1145 | "5666 0.916734 \n",
1146 | "6161 0.911653 \n",
1147 | "5534 0.911245 \n",
1148 | "4508 0.908116 \n",
1149 | "305 0.906655 \n",
1150 | "4532 0.905095 "
1151 | ]
1152 | },
1153 | "metadata": {},
1154 | "output_type": "display_data"
1155 | }
1156 | ],
1157 | "source": [
1158 | "#let's look at the top 20 patents owned by the selected assignee that might be relevant to BlockChain:\n",
1159 | "df_display = df_target.nlargest(20, 'BlockChainPredictions').sort_values('BlockChainPredictions', ascending=False)\n",
1160 | "#Why use nlargest instead of sorting everything and slicing? nlargest only does a partial selection, and sorting 20 rows afterwards is trivial\n",
1161 | "#(nlargest already returns them in descending order, so the sort is just a safeguard). Sorting every row in df_target is wasteful when we only want the top 20.\n",
1162 | "\n",
1163 | "#show full column contents (wrapped) instead of truncating them; newer pandas versions expect None here instead of -1:\n",
1164 | "pd.set_option('display.max_colwidth', -1)\n",
1165 | "#drop the text column on display so it doesn't clutter everything, then display\n",
1166 | "display(df_display.drop(\"text\", axis=1))\n"
1167 | ]
1168 | },
1169 | {
1170 | "cell_type": "markdown",
1171 | "metadata": {},
1172 | "source": [
1173 | "# Conclusion\n",
1174 | "The results are interesting. If you look at the Amazon results, you can see that some patent titles in the top 20 look highly related and some look way off. From here, the magic is in improving your training data by adding positive and negative examples. Additionally, \"Block Chain\" may be too broad a topic. Do you want applications of block chain, or the underlying mechanism that drives block chain? Are you looking for the component parts of Block Chain, like cryptographic functions, or are you only interested in patents that refine the overall Block Chain system?\n",
1175 | "\n",
1176 | "This is a toy example meant to get people started doing actual data science experiments on real patent data. It is aimed at relative beginners to machine learning.\n",
1177 | "\n",
1178 | "I hope you have learned something by reading this walkthrough. Find me on LinkedIn at https://www.linkedin.com/in/davidandrewsjd/.\n",
1179 | "\n",
1180 | "David Andrews\n",
1181 | "Founder\n",
1182 | "Legal Analytics"
1183 | ]
1184 | }
1185 | ],
1186 | "metadata": {
1187 | "kernelspec": {
1188 | "display_name": "Python 3",
1189 | "language": "python",
1190 | "name": "python3"
1191 | },
1192 | "language_info": {
1193 | "codemirror_mode": {
1194 | "name": "ipython",
1195 | "version": 3
1196 | },
1197 | "file_extension": ".py",
1198 | "mimetype": "text/x-python",
1199 | "name": "python",
1200 | "nbconvert_exporter": "python",
1201 | "pygments_lexer": "ipython3",
1202 | "version": "3.6.1"
1203 | }
1204 | },
1205 | "nbformat": 4,
1206 | "nbformat_minor": 2
1207 | }
1208 |
--------------------------------------------------------------------------------