├── .gitignore
├── LICENSE.txt
├── README.md
├── WPJsonScraper.py
├── doc
    ├── Interactive.md
    └── WPJsonScraperCapture.png
├── lib
    ├── __init__.py
    ├── console.py
    ├── exceptions.py
    ├── exporter.py
    ├── infodisplayer.py
    ├── interactive.py
    ├── plugins
    │   └── plugin_list.csv
    ├── requestsession.py
    ├── utils.py
    └── wpapi.py
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | */__pycache__/*
2 | .venv/*


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
 2 | 
 3 | Permission is hereby granted, free of charge, to any person obtaining a copy
 4 | of this software and associated documentation files (the "Software"), to deal
 5 | in the Software without restriction, including without limitation the rights
 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 7 | copies of the Software, and to permit persons to whom the Software is
 8 | furnished to do so, subject to the following conditions:
 9 | 
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 | 
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.
20 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # WPJsonScraper
  2 | 
  3 | ## Introduction
  4 | 
  5 | ![WPJsonScraper capture](doc/WPJsonScraperCapture.png)
  6 | 
  7 | WPJsonScraper is a tool for dumping as much of the content available on a
  8 | WordPress installation as possible. It uses the wp-json API to retrieve all
  9 | important information and enumerate every user, post, comment, media and more.
 10 | 
 11 | This can reveal sensitive files or pages that are not adequately protected
 12 | from external access.
 13 | 
 14 | WPJsonScraper has two operation modes: command line arguments and interactive.
 15 | The latter offers a command prompt for performing more complex operations on
 16 | the WP-JSON API.
 17 | 
 18 | ## Prerequisites
 19 | 
 20 | WPJsonScraper is written in Python and should work in any Python 3
 21 | environment provided that the following are installed:
 22 | 
 23 | * Python 3
 24 | * requests
 25 | 
 26 | ## Installation
 27 | 
 28 | Just clone the repository with git and run `pip install -r requirements.txt`.
 29 | 
 30 | You may want to use a virtualenv for keeping your dependencies consistent across 
 31 | Python projects.
 32 | 
 33 | ## Usage
 34 | 
 35 | ### Interactive mode
 36 | 
 37 | See [Interactive mode](doc/Interactive.md) for more details.
 38 | 
 39 | ### Command line arguments mode
 40 | 
 41 | The tool needs a target WordPress installation and one or more flags
 42 | specifying which actions to perform.
 43 | 
 44 | You can retrieve all available information with the -a flag. As this may be
 45 | quite verbose, you can instead select only the categories of information you
 46 | need from the following:
 47 | 
 48 | * -h, --help: display the help and exit
 49 | * -v, --version: display the version number and exit
 50 | * -a, --all: display all data available
 51 | * -i, --info: dump basic information about the target
 52 | * -e, --endpoints: dump full endpoint documentation
 53 | * -p, --posts: list all published posts
 54 | * -u, --users: list all users
 55 | * -t, --tags: list all tags
 56 | * -c, --categories: list all categories
 57 | * -m, --media: list all public media objects
 58 | * --download-media MEDIA_FOLDER: download media to the designated folder
 59 | * -g, --pages: list all public pages
 60 | * -o, --comments: list comments along with the posts (use with -p or -a)
 61 | * -S, --search SEARCH_TERMS: perform a search for SEARCH_TERMS
 62 | * -r, --crawl-ns NAMESPACE: crawl a plugin namespace for collections. Set it
 63 | to all to crawl all namespaces
 64 | * --proxy PROXY_URL: route the requests through the specified proxy server
 65 | * --auth CREDENTIALS: use the specified credentials (username:password) for
 66 | HTTP basic authentication on the server
 67 | * --cookies COOKIES: add the specified cookies ("cookie1=foo; cookie2=bar") to the requests
 68 | * --no-color: remove color (for example to redirect the output to a file)
 69 | * --interactive: start an interactive session
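
The --auth and --cookies values are plain strings. As a rough illustration, the sketch below shows how such strings could be turned into the structures an HTTP client such as requests expects. `parse_credentials` mirrors the splitting logic visible in WPJsonScraper.py (only the first colon separates the username). `parse_cookies` is a hypothetical helper, since the parsing done in lib/requestsession.py is not shown here; it relies on the standard library's `http.cookies.SimpleCookie`.

```python
from http.cookies import SimpleCookie


def parse_credentials(credentials):
    """Split "username:password" into a (username, password) tuple.

    Same behaviour as the splitting code in WPJsonScraper.py: only the first
    colon separates the username, so the password may itself contain colons,
    and a missing password becomes an empty string.
    """
    username, _, password = credentials.partition(":")
    return (username, password)


def parse_cookies(cookie_string):
    """Turn "cookie1=foo; cookie2=bar" into a {name: value} dict.

    Hypothetical helper for illustration. SimpleCookie is fairly strict and
    may silently drop input it considers malformed.
    """
    jar = SimpleCookie()
    jar.load(cookie_string)
    return {name: morsel.value for name, morsel in jar.items()}


print(parse_credentials("admin:s3cr:et"))  # ('admin', 's3cr:et')
print(parse_credentials("admin"))          # ('admin', '')
print(parse_cookies("cookie1=foo; cookie2=bar"))
```

With requests, the tuple can be passed as the `auth=` argument and the dict as the `cookies=` argument of a request.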
 70 | 
 71 | Moreover, you can export the contents of pages, posts and comments to a folder,
 72 | as separate files:
 73 | 
 74 | * --export-pages PAGE_EXPORT_FOLDER
 75 | * --export-posts POST_EXPORT_FOLDER
 76 | * --export-comments COMMENT_EXPORT_FOLDER
 77 | 
 78 | You can set the proxy server with the --proxy flag. It can be an HTTP or HTTPS
 79 | proxy URL, as described in the Python requests documentation. By default, the
 80 | system's proxy settings are used.
 81 | 
 82 | Example:
 83 | 
 84 |     http://user:password@example.com:8080/
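
As a rough sketch of what the proxy URL format above means in requests terms: requests takes a `proxies` mapping from URL scheme to proxy URL, and credentials embedded in the proxy URL are used to authenticate to the proxy. The helper name `build_proxies` is hypothetical, and WPJsonScraper's RequestSession may build its mapping differently.

```python
from urllib.parse import urlsplit


def build_proxies(proxy_url):
    """Return a requests-style proxies mapping that sends all traffic to proxy_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(proxy_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("expected an http:// or https:// proxy URL")
    # requests chooses the entry matching the scheme of the *target* URL,
    # so the same proxy is registered for both schemes.
    return {"http": proxy_url, "https": proxy_url}


proxy = "http://user:password@example.com:8080/"
parts = urlsplit(proxy)
print(parts.username, parts.password, parts.hostname, parts.port)  # user password example.com 8080
print(build_proxies(proxy))
```

The resulting mapping can be passed as, e.g., `requests.get(url, proxies=build_proxies(proxy))`.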
 85 | 
 86 | Using the -r option, you can crawl the collections of the specified namespace.
 87 | This lets you retrieve sets of objects from the API, and possibly confidential data ;)
 88 | 
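
To give an idea of what crawling a namespace involves: the WP-JSON index at /wp-json/ lists the registered namespaces and, for every route, its namespace and allowed HTTP methods. The sketch below is a hypothetical helper, not WPJsonScraper's own `crawl_namespaces()`. It keeps the GET routes that need no URL parameter, i.e. the ones that can be requested directly as collections. The index in the example is made up, and `myplugin/v1` is a placeholder namespace.

```python
def crawlable_routes(index, namespace="all"):
    """List GET routes without URL parameters from a parsed /wp-json/ index.

    Hypothetical helper for illustration.
    """
    if namespace != "all" and namespace not in index.get("namespaces", []):
        raise KeyError(f"namespace not found: {namespace}")
    routes = []
    for route, info in index.get("routes", {}).items():
        if namespace != "all" and info.get("namespace") != namespace:
            continue
        # Routes containing regex groups such as (?P<id>...) need an ID,
        # so they cannot be requested as plain collections.
        if "(?P<" in route or "GET" not in info.get("methods", []):
            continue
        routes.append(route)
    return routes


# Made-up index for illustration; a real one comes from GET <target>/wp-json/
sample_index = {
    "namespaces": ["wp/v2", "myplugin/v1"],
    "routes": {
        "/": {"namespace": "", "methods": ["GET"]},
        "/myplugin/v1": {"namespace": "myplugin/v1", "methods": ["GET"]},
        "/myplugin/v1/items": {"namespace": "myplugin/v1", "methods": ["GET", "POST"]},
        "/myplugin/v1/items/(?P<id>[\\d]+)": {"namespace": "myplugin/v1", "methods": ["GET"]},
        "/myplugin/v1/import": {"namespace": "myplugin/v1", "methods": ["POST"]},
    },
}
print(crawlable_routes(sample_index, "myplugin/v1"))  # ['/myplugin/v1', '/myplugin/v1/items']
```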
 89 | #### Search feature
 90 | 
 91 | The WordPress WP-JSON API supports searching posts, pages, media objects, tags,
 92 | categories, comments and users.
 93 | 
 94 | The -S (--search) option exposes this functionality in
 95 | WPJsonScraper.
 96 | 
 97 | It can be used on a specific item type or on several at once.
 98 | 
 99 | Examples:
100 | 
101 |     # Search for "lorem" in all supported item types
102 |     ./WPJsonScraper.py -S lorem https://demo.wp-api.org/
103 |     # Search for "hello world" in posts, users and pages only
104 |     ./WPJsonScraper.py -S "hello world" -p -u -g https://demo.wp-api.org/
105 | 
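
Under the hood, a WP-JSON search is an ordinary GET request on a collection endpoint with a `search` query parameter. The sketch below builds such URLs for the core `wp/v2` collections. `search_urls` is a hypothetical helper (WPJsonScraper's WPApi may build its requests differently), and `per_page=100` is simply the largest page size the core API accepts.

```python
from urllib.parse import urlencode, urljoin

# Core WP-JSON v2 collections that accept a "search" query parameter.
SEARCHABLE = ("posts", "pages", "media", "tags", "categories", "comments", "users")


def search_urls(base_url, terms, item_types=SEARCHABLE, page=1, per_page=100):
    """Return one search URL per item type under <base_url>/wp-json/wp/v2/.

    Hypothetical helper for illustration. Results are paginated: a real client
    would read the X-WP-TotalPages response header to fetch the other pages.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    query = urlencode({"search": terms, "page": page, "per_page": per_page})
    return [urljoin(base_url, f"wp-json/wp/v2/{kind}?{query}") for kind in item_types]


# Roughly the requests behind: ./WPJsonScraper.py -S "hello world" -p -u -g https://demo.wp-api.org/
for url in search_urls("https://demo.wp-api.org/", "hello world", ("posts", "users", "pages")):
    print(url)
```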
106 | ## Features to implement
107 | 
108 | WPJsonScraper is not a mature project yet and its features are pretty basic for
109 | the moment. Some of the features that could be implemented in the future are:
110 | 
111 | * Posts revisions retrieval
112 | * Plugins support
113 | * Authentication support with NTLM
114 | * WordPress instance save as JSON (limited to the accessible scope) and restore?
115 | * Password-protected content handling
116 | * Support new endpoints added in version 5.0: autosaves, block type, blocks, block_renderer, themes (authenticated access required but WTF?)
117 | * Write tests duh!


--------------------------------------------------------------------------------
/WPJsonScraper.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | """
  4 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
  5 | 
  6 | Permission is hereby granted, free of charge, to any person obtaining a copy
  7 | of this software and associated documentation files (the "Software"), to deal
  8 | in the Software without restriction, including without limitation the rights
  9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 10 | copies of the Software, and to permit persons to whom the Software is
 11 | furnished to do so, subject to the following conditions:
 12 | 
 13 | The above copyright notice and this permission notice shall be included in all
 14 | copies or substantial portions of the Software.
 15 | 
 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 22 | SOFTWARE.
 23 | """
 24 | 
 25 | import argparse
 26 | import requests
 27 | import re
 28 | import os
 29 | 
 30 | from lib.console import Console
 31 | from lib.wpapi import WPApi
 32 | from lib.infodisplayer import InfoDisplayer
 33 | from lib.exceptions import NoWordpressApi, WordPressApiNotV2, \
 34 |                             NSNotFoundException
 35 | from lib.exporter import Exporter
 36 | from lib.requestsession import RequestSession
 37 | from lib.interactive import start_interactive
 38 | 
 39 | version = '0.5'
 40 | 
 41 | def main():
 42 |     parser = argparse.ArgumentParser(description=
 43 | """Reads a WP-JSON API on a WordPress installation to retrieve as much
 44 | publicly available information as possible. This information includes, among
 45 | others, posts, comments, pages, media and users. As this tool could give access
 46 | to confidential (but poorly protected) data, it is recommended that you first
 47 | get written permission from the site owner. The author accepts no liability
 48 | for misuse of this software""",
 49 |     epilog=
 50 | """(c) 2018-2020 Mickaël "Kilawyn" Walter. This program is licensed under the MIT
 51 | license, check LICENSE.txt for more information""")
 52 |     parser.add_argument('-v',
 53 |                         '--version',
 54 |                         action='version',
 55 |                         version='%(prog)s ' + version)
 56 |     parser.add_argument('target',
 57 |                         type=str,
 58 |                         help='the base path of the WordPress installation to '
 59 |                         'examine')
 60 |     parser.add_argument('-i',
 61 |                         '--info',
 62 |                         dest='info',
 63 |                         action='store_true',
 64 |                         help='dumps basic information about the WordPress '
 65 |                         'installation')
 66 |     parser.add_argument('-e',
 67 |                         '--endpoints',
 68 |                         dest='endpoints',
 69 |                         action='store_true',
 70 |                         help='dumps full endpoint documentation')
 71 |     parser.add_argument('-p',
 72 |                         '--posts',
 73 |                         dest='posts',
 74 |                         action='store_true',
 75 |                         help='lists published posts')
 76 |     parser.add_argument('--export-posts',
 77 |                         dest='post_export_folder',
 78 |                         action='store',
 79 |                         help='export posts to a specified destination folder')
 80 |     parser.add_argument('-u',
 81 |                         '--users',
 82 |                         dest='users',
 83 |                         action='store_true',
 84 |                         help='lists users')
 85 |     parser.add_argument('-t',
 86 |                         '--tags',
 87 |                         dest='tags',
 88 |                         action='store_true',
 89 |                         help='lists tags')
 90 |     parser.add_argument('-c',
 91 |                         '--categories',
 92 |                         dest='categories',
 93 |                         action='store_true',
 94 |                         help='lists categories')
 95 |     parser.add_argument('-m',
 96 |                         '--media',
 97 |                         dest='media',
 98 |                         action='store_true',
 99 |                         help='lists media objects')
100 |     parser.add_argument('-g',
101 |                         '--pages',
102 |                         dest='pages',
103 |                         action='store_true',
104 |                         help='lists pages')
105 |     parser.add_argument('-o',
106 |                         '--comments',
107 |                         dest='comments',
108 |                         action='store_true',
109 |                         help="lists comments")
110 |     parser.add_argument('--export-pages',
111 |                         dest='page_export_folder',
112 |                         action='store',
113 |                         help='export pages to a specified destination folder')
114 |     parser.add_argument('--export-comments',
115 |                         dest='comment_export_folder',
116 |                         action='store',
117 |                         help='export comments to a specified destination folder')
118 |     parser.add_argument('--download-media',
119 |                         dest='media_folder',
120 |                         action='store',
121 |                         help='download media to the designated folder')
122 |     parser.add_argument('-r',
123 |                         '--crawl-ns',
124 |                         dest='crawl_ns',
125 |                         action='store',
126 |                         help='crawl all GET routes of the specified namespace '
127 |                         'or all namespaces if all is specified')
128 |     parser.add_argument('-a',
129 |                         '--all',
130 |                         dest='all',
131 |                         action='store_true',
132 |                         help='dumps all available information from the '
133 |                         'target API')
134 |     parser.add_argument('-S',
135 |                         '--search',
136 |                         dest='search',
137 |                         action='store',
138 |                         help='search for a string on the WordPress instance. '
139 |                         'If one or more of the -a, -g, -p, -m, -c, -t, -u '
140 |                         'flags are set, search only on these')
141 |     parser.add_argument('--proxy',
142 |                         dest='proxy_server',
143 |                         action='store',
144 |                         help='define a proxy server to use, e.g. for '
145 |                         'enterprise network or debugging')
146 |     parser.add_argument('--auth',
147 |                         dest='credentials',
148 |                         action='store',
149 |                         help='define a username and a password separated by '
150 |                         'a colon to use them as basic authentication')
151 |     parser.add_argument('--cookies',
152 |                         dest='cookies',
153 |                         action='store',
154 |                         help='define specific cookies to send with the request '
155 |                         'in the format cookie1=foo; cookie2=bar')
156 |     parser.add_argument('--no-color',
157 |                         dest='nocolor',
158 |                         action='store_true',
159 |                         help='remove color in the output (e.g. to pipe it)')
160 |     parser.add_argument('--interactive',
161 |                         dest='interactive',
162 |                         action='store_true',
163 |                         help='start an interactive session')
164 | 
165 | 
166 |     args = parser.parse_args()
167 | 
168 |     motd = """
169 |  _    _______  ___                  _____
170 | | |  | | ___ \\|_  |                /  ___|
171 | | |  | | |_/ /  | | ___  ___  _ __ \\ `--.  ___ _ __ __ _ _ __   ___ _ __
172 | | |/\\| |  __/   | |/ __|/ _ \\| '_ \\ `--. \\/ __| '__/ _` | '_ \\ / _ \\ '__|
173 | \\  /\\  / |  /\\__/ /\\__ \\ (_) | | | /\\__/ / (__| | | (_| | |_) |  __/ |
174 |  \\/  \\/\\_|  \\____/ |___/\\___/|_| |_\\____/ \\___|_|  \\__,_| .__/ \\___|_|
175 |                                                         | |
176 |                                                         |_|
177 |     WPJsonScraper v%s
178 |     By Mickaël \"Kilawyn\" Walter
179 | 
180 |     Make sure you use this tool with the approval of the site owner. Even if
181 |     this information is public or available with proper authentication, this
182 |     could be considered as an intrusion.
183 | 
184 |     Target: %s
185 | 
186 |     """ % (version, args.target)
187 | 
188 |     print(motd)
189 | 
190 |     if args.nocolor:
191 |         Console.wipe_color()
192 | 
193 |     Console.log_info("Testing connectivity with the server")
194 | 
195 |     target = args.target
196 |     if re.match(r'^https?://.*$', target) is None:
197 |         target = "http://" + target
198 |     if re.match(r'^.+/$', target) is None:
199 |         target += "/"
200 | 
201 |     proxy = None
202 |     if args.proxy_server is not None:
203 |         proxy = args.proxy_server
204 |     cookies = None
205 |     if args.cookies is not None:
206 |         cookies = args.cookies
207 |     authorization = None
208 |     if args.credentials is not None:
209 |         authorization_list = args.credentials.split(':')
210 |         if len(authorization_list) == 1:
211 |             authorization = (authorization_list[0], '')
212 |         elif len(authorization_list) >= 2:
213 |             authorization = (authorization_list[0],
214 |               ':'.join(authorization_list[1:]))
215 |     session = RequestSession(proxy=proxy, cookies=cookies,
216 |       authorization=authorization)
217 |     try:
218 |         session.get(target)
219 |         Console.log_success("Connection OK")
220 |     except Exception as e:
221 |         Console.log_error("Failed to connect to the server: %s" % e)
222 |         exit(1)
223 |     
224 |     # Quite an ugly check to launch a search on all eligible item types
225 |     # Should find something better (maybe in the argparse doc?)
226 |     if args.search is not None and not any((args.all, args.posts, args.pages,
227 |             args.users, args.categories, args.tags, args.media)):
228 |         Console.log_info("Searching on all available sources")
229 |         args.posts = True
230 |         args.pages = True
231 |         args.users = True
232 |         args.categories = True
233 |         args.tags = True
234 |         args.media = True
235 | 
236 |     if args.interactive:
237 |         start_interactive(target, session, version)
238 |         return
239 | 
240 |     scanner = WPApi(target, session=session, search_terms=args.search)
241 |     if args.info or args.all:
242 |         try:
243 |             basic_info = scanner.get_basic_info()
244 |             Console.log_info("General information on the target")
245 |             InfoDisplayer.display_basic_info(basic_info)
246 |         except NoWordpressApi:
247 |             Console.log_error("No WordPress API available at the given URL "
248 |             "(too old WordPress or not WordPress?)")
249 |             exit(1)
250 |     
251 |     if args.posts or args.all:
252 |         try:
253 |             if args.comments:
254 |                 Console.log_info("Post list with comments")
255 |             else:
256 |                 Console.log_info("Post list")
257 |             posts_list = scanner.get_posts(args.comments)
258 |             InfoDisplayer.display_posts(posts_list, scanner.get_orphans_comments())
259 |         except WordPressApiNotV2:
260 |             Console.log_error("The API does not support WP V2")
261 | 
262 |     if args.pages or args.all:
263 |         try:
264 |             Console.log_info("Page list")
265 |             pages_list = scanner.get_pages()
266 |             InfoDisplayer.display_pages(pages_list)
267 |         except WordPressApiNotV2:
268 |             Console.log_error("The API does not support WP V2")
269 | 
270 |     if args.users or args.all:
271 |         try:
272 |             Console.log_info("User list")
273 |             users_list = scanner.get_users()
274 |             InfoDisplayer.display_users(users_list)
275 |         except WordPressApiNotV2:
276 |             Console.log_error("The API does not support WP V2")
277 | 
278 |     if args.endpoints or args.all:
279 |         try:
280 |             Console.log_info("API endpoints")
281 |             basic_info = scanner.get_basic_info()
282 |             InfoDisplayer.display_endpoints(basic_info)
283 |         except NoWordpressApi:
284 |             Console.log_error("No WordPress API available at the given URL "
285 |             "(too old WordPress or not WordPress?)")
286 |             exit(1)
287 | 
288 |     if args.categories or args.all:
289 |         try:
290 |             Console.log_info("Category list")
291 |             categories_list = scanner.get_categories()
292 |             InfoDisplayer.display_categories(categories_list)
293 |         except WordPressApiNotV2:
294 |             Console.log_error("The API does not support WP V2")
295 | 
296 |     if args.tags or args.all:
297 |         try:
298 |             Console.log_info("Tags list")
299 |             tags_list = scanner.get_tags()
300 |             InfoDisplayer.display_tags(tags_list)
301 |         except WordPressApiNotV2:
302 |             Console.log_error("The API does not support WP V2")
303 | 
304 |     media_list = None
305 |     if args.media or args.all:
306 |         try:
307 |             Console.log_info("Media list")
308 |             media_list = scanner.get_media()
309 |             InfoDisplayer.display_media(media_list)
310 |         except WordPressApiNotV2:
311 |             Console.log_error("The API does not support WP V2")
312 | 
313 |     if args.crawl_ns is None and args.all:
314 |         args.crawl_ns = "all"
315 | 
316 |     if args.crawl_ns is not None:
317 |         try:
318 |             if args.crawl_ns == "all":
319 |                 Console.log_info("Crawling all namespaces")
320 |             else:
321 |                 Console.log_info("Crawling %s namespace" % args.crawl_ns)
322 |             ns_data = scanner.crawl_namespaces(args.crawl_ns)
323 |             InfoDisplayer.display_crawled_ns(ns_data)
324 |         except NSNotFoundException:
325 |             Console.log_error("The specified namespace was not found")
326 |         except Exception as e:
327 |             Console.log_error("Crawling failed: %s" % e)
328 | 
329 |     if args.post_export_folder is not None:
330 |         try:
331 |             posts_list = scanner.get_posts()
332 |             tags_list = scanner.get_tags()
333 |             categories_list = scanner.get_categories()
334 |             users_list = scanner.get_users()
335 |             print()
336 |             post_number = Exporter.export_posts_html(posts_list,
337 |              args.post_export_folder,
338 |              tags_list,
339 |              categories_list,
340 |              users_list)
341 |             if post_number > 0:
342 |                 Console.log_success("Exported %d posts to %s" %
343 |                 (post_number, args.post_export_folder))
344 |         except WordPressApiNotV2:
345 |             Console.log_error("The API does not support WP V2")
346 | 
347 |     if args.page_export_folder is not None:
348 |         try:
349 |             pages_list = scanner.get_pages()
350 |             users_list = scanner.get_users()
351 |             print()
352 |             page_number = Exporter.export_posts_html(pages_list,
353 |              args.page_export_folder,
354 |              None,
355 |              None,
356 |              users_list)
357 |             if page_number > 0:
358 |                 Console.log_success("Exported %d pages to %s" %
359 |                 (page_number, args.page_export_folder))
360 |         except WordPressApiNotV2:
361 |             Console.log_error("The API does not support WP V2")
362 |     
363 |     if args.comment_export_folder is not None:
364 |         try:
365 |             post_list = scanner.get_posts(True)
366 |             orphan_list = scanner.get_orphans_comments()
367 |             print()
368 |             comment_number = Exporter.export_comments(post_list, orphan_list, args.comment_export_folder)
369 |             if comment_number > 0:
370 |                 Console.log_success("Exported %d comments to %s" %
371 |                 (comment_number, args.comment_export_folder))
372 |         except WordPressApiNotV2:
373 |             Console.log_error("The API does not support WP V2")
374 | 
375 |     if args.media_folder is not None:
376 |         Console.log_info("Downloading media files")
377 |         if not os.path.isdir(args.media_folder):
378 |             Console.log_error("The destination is not a folder or does not exist")
379 |         else:
380 |             print("Pulling the media URLs")
381 | 
382 |             media, _ = scanner.get_media_urls('all', True)
383 |             if len(media) == 0:
384 |                 Console.log_error("No media found")
385 |                 return
386 |             print("%d media URLs found" % len(media))
387 | 
388 |             print("Note: Only files over 10MB are logged here")
389 |             number_downloaded = Exporter.download_media(media, args.media_folder)
390 |             Console.log_success('Downloaded %d media to %s' % (number_downloaded, args.media_folder))
391 | 
392 | 
393 | if __name__ == "__main__":
394 |     main()
395 | 


--------------------------------------------------------------------------------
/doc/Interactive.md:
--------------------------------------------------------------------------------
  1 | # Interactive mode
  2 | 
  3 | To help with more complex interactions with WP-JSON API, WPJsonScraper implements an interactive mode.
  4 | 
  5 | In interactive mode, the same session is used across requests, so all cookies set by the server and other
  6 | parameters are kept from one request to the next.
  7 | 
  8 | Typing `command -h` or `command --help` will bring a detailed help message for specific commands.
  9 | 
 10 | Tab autocompletes the command name; the up and down arrow keys browse the command history.
 11 | 
 12 | ## Commands
 13 | 
 14 | ### help
 15 | 
 16 | Lists commands and displays a brief help message about specified commands.
 17 | 
 18 | Example 1: display the command list
 19 | 
 20 |     help
 21 | 
 22 | Example 2: display a brief help message about the command goals.
 23 | 
 24 |     help show
 25 | 
 26 | ### exit
 27 | 
 28 | Exits the interactive mode and goes back to the user's shell.
 29 | 
 30 | ### show
 31 | 
 32 | Shows details about global parameters stored in WPJsonScraper memory.
 33 | 
 34 | Example: show all parameters
 35 | 
 36 |     show all
 37 | 
 38 | ### set
 39 | 
 40 | Sets a specific global parameter.
 41 | 
 42 | Note that for proxies and cookies, the command adds or updates individual entries.
 43 | Check the resulting values with `show` if in doubt.
 44 | 
 45 | **Note:** changing the target resets the cache but keeps proxies, cookies and authorization headers. Be aware 
 46 | of data leakage risks. If you need to keep things apart between targets, relaunch WPJsonScraper or make sure 
 47 | everything is correctly set up with the `show all` command.
 48 | 
 49 | Example 1: change the target
 50 | 
 51 |     set target http://example.com
 52 | 
 53 | Example 2: add or modify the cookies PHPSESSID and JSESSIONID (because why not?)
 54 | 
 55 |     set cookie "PHPSESSID=deadbeef; JSESSIONID=badc0ffee"
 56 | 
 57 | ### list
 58 | 
 59 | Lists specified data from the server.
 60 | 
 61 | This command gets data from the server and displays it as a simple list (with no details).
 62 | 
 63 | It can also export the scraped data to a specified file (see the --json and --csv options); JSON exports
 64 | contain the full data with all available details. If no file extension is specified, WPJsonScraper appends one.
 65 | The export options try to join the data with data from other API endpoints (e.g. users with posts). CSV
 66 | exports drop most of the data to keep the output human-readable, so use this option only to export a list of
 67 | posts.
 68 | 
 69 | **Note:** to avoid generating too much noise on the target, WPJsonScraper won't automatically fetch any other
 70 | endpoint to complete the exported data. If you want all information to be gathered, you have to build the 
 71 | cache first by requesting the data beforehand (for example, getting the user list before exporting the posts).
 72 | 
 73 | By default, WPJsonScraper caches data to avoid querying the server too often. To get the latest updates,
 74 | run this command with the --no-cache option.
 75 | 
 76 | Use the --limit and --start options to retrieve only a subset of the selected data.
 77 | 
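
As an illustration of --start and --limit combined, the sketch below selects a window of results. It assumes, based only on the wording of Example 2 below ("starting at page 15"), that --start is a 1-based position. `select_subset` is a hypothetical helper, and the actual semantics in lib/interactive.py may differ (for instance, a 0-based offset).

```python
def select_subset(items, start=None, limit=None):
    """Return at most `limit` items, beginning at the 1-based position `start`.

    Hypothetical helper: assumes --start is 1-based; either option may be omitted.
    """
    begin = 0 if start is None else max(start - 1, 0)
    end = None if limit is None else begin + limit
    return items[begin:end]


# Made-up list standing in for 30 pages returned by the API.
pages = [f"page-{n}" for n in range(1, 31)]
subset = select_subset(pages, start=15, limit=10)
print(subset[0], subset[-1], len(subset))  # page-15 page-24 10
```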
 78 | In the case of media files, the files themselves **are not downloaded**.
 79 | 
 80 | Example 1: get all posts
 81 | 
 82 |     list posts
 83 | 
 84 | Example 2: get maximum 10 pages starting at page 15
 85 | 
 86 |     list pages --start 15 --limit 10
 87 | 
 88 | Example 3: export all listable content to JSON files (including, for example, all-data-posts.json)
 89 | 
 90 |     list all --json all-data
 91 | 
 92 | Example 4: list namespaces
 93 | 
 94 |     list namespaces
 95 | 
 96 | ### fetch
 97 | 
 98 | Fetches a specific piece of data from the server given its type and its ID. By default, if the data is cached, 
 99 | the data is returned from the cache. Use the --no-cache argument to force its retrieval from the server.
100 | 
101 | The displayed data is more complete than the output of the list command, but some metadata is still not
102 | displayed. Only the JSON export is a full data dump (with additional mapping when relevant).
103 | 
104 | **Note:** as with the list command, the data that could complete the displayed information is not automatically
105 | fetched. You have to load it into the cache first or fetch it separately by its ID. Moreover, the data
106 | retrieved by ID is not yet pushed into the cache. This may change in a later version.
107 | 
108 | Example 1: display the post with the ID 1
109 | 
110 |     fetch post 1
111 | 
112 | Example 2: display the page with the ID 42, bypassing the cache
113 | 
114 |     fetch page 42 --no-cache
115 | 
116 | ### search
117 | 
118 | Looks for data based on the specified keywords. This command doesn't use the cache and systematically uses the 
119 | WordPress API to do searches. One or several object types may be provided to narrow the search scope.
120 | 
121 | Example 1: look for keyword test in all object types
122 | 
123 |     search test
124 | 
125 | Example 2: look for keyword foo in posts and pages
126 | 
127 |     search --type post --type page foo
128 | 
129 | Example 3: --limit and --start also work for search results
130 | 
131 |     search --limit 5 --start 4 bar
132 | 
133 | ### dl
134 | 
135 | Downloads media based on the provided ID. The ID can be specified as an integer (or a comma-separated list of
136 | integers), `all` or `cache`. In the first case, only media with the specified IDs will be downloaded. `all` will
137 | trigger a fetch from the API to list all media, then a download for each file. `cache` will get media URLs from
138 | the cache and then download the files.
139 | 
140 | Note that if all the IDs specified are in the cache, no lookup will be made on the API. If you want to override 
141 | this behaviour, set the `--no-cache` flag.
142 | 
143 | Example 1: download the media with the IDs 42 and 63 to the current folder
144 | 
145 |     dl 42,63 .
146 | 
147 | Example 2: download all media to the user's home folder
148 | 
149 |     dl all /home/user
150 | 
151 | Example 3: only media present in the cache (e.g. previously requested with list or fetch) are downloaded
152 | 
153 |     dl cache .
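
The ID argument of `dl` accepts three forms. The sketch below is a hypothetical parser for it, together with a simplified version of the cache rule described above (no API lookup when every requested ID is already cached, unless --no-cache is set). The function names are illustrative and not taken from lib/interactive.py. The cache is modelled as a plain dict from media ID to URL, and this sketch only looks up the missing IDs, which the real tool may or may not do.

```python
def parse_media_ids(spec):
    """Classify the ID argument of dl.

    Returns ("all", None), ("cache", None) or ("ids", [int, ...]) for a
    comma-separated list such as "42,63".
    """
    spec = spec.strip()
    if spec in ("all", "cache"):
        return (spec, None)
    ids = []
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk.isdecimal():
            raise ValueError(f"invalid media ID: {chunk!r}")
        ids.append(int(chunk))
    return ("ids", ids)


def ids_needing_lookup(ids, cache, no_cache=False):
    """Return the IDs whose URL must be requested from the API.

    `cache` maps media IDs to URLs. With no_cache=True every ID is looked up;
    otherwise only the IDs missing from the cache are.
    """
    if no_cache:
        return list(ids)
    return [media_id for media_id in ids if media_id not in cache]


cache = {42: "https://example.com/wp-content/uploads/a.png"}  # made-up cache entry
mode, ids = parse_media_ids("42,63")
print(mode, ids)                                      # ids [42, 63]
print(ids_needing_lookup(ids, cache))                 # [63]
print(ids_needing_lookup(ids, cache, no_cache=True))  # [42, 63]
```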


--------------------------------------------------------------------------------
/doc/WPJsonScraperCapture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MickaelWalter/wp-json-scraper/677ddeea6437f24302855652756e11c89ebeaf84/doc/WPJsonScraperCapture.png


--------------------------------------------------------------------------------
/lib/__init__.py:
--------------------------------------------------------------------------------
1 | pass
2 | 


--------------------------------------------------------------------------------
/lib/console.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
 3 | 
 4 | Permission is hereby granted, free of charge, to any person obtaining a copy
 5 | of this software and associated documentation files (the "Software"), to deal
 6 | in the Software without restriction, including without limitation the rights
 7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 8 | copies of the Software, and to permit persons to whom the Software is
 9 | furnished to do so, subject to the following conditions:
10 | 
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 | 
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 | 
23 | 
24 | class Console:
25 |     """
26 |     A little helper class to allow console management (like color)
27 |     """
28 |     normal = "\033[0m"
29 |     blue = "\033[94m"
30 |     green = "\033[92m"
31 |     red = "\033[31m"
32 | 
33 |     @staticmethod
34 |     def wipe_color():
35 |         """
36 |         Deactivates color in terminal
37 |         """
38 |         Console.normal = ""
39 |         Console.blue = ""
40 |         Console.green = ""
41 |         Console.red = ""
42 | 
43 |     @staticmethod
44 |     def log_info(text):
45 |         """
46 |         Prints information log to the console
47 |         param text: the text to display
48 |         """
49 |         print()
50 |         print(Console.blue + "[*] " + text + Console.normal)
51 | 
52 |     @staticmethod
53 |     def log_error(text):
54 |         """
55 |         Prints error log to the console
56 |         param text: the text to display
57 |         """
58 |         print()
59 |         print(Console.red + "[!] " + text + Console.normal)
60 | 
61 |     @staticmethod
62 |     def log_success(text):
63 |         """
64 |         Prints success log to the console
65 |         param text: the text to display
66 |         """
67 |         print(Console.green + "[+] " + text + Console.normal)
68 | 
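
Because `Console` stores its color codes as class attributes, a single call to `wipe_color()` affects every later log call in the process. The sketch below shows one way to use that to drop ANSI escape codes when output is not a terminal (for example when redirected to a file). It re-declares a trimmed copy of the class so that it runs on its own; `configure_colors` is a hypothetical helper, not part of `lib/console.py`.

```python
import sys


class Console:
    """Trimmed copy of lib/console.py for a standalone run."""
    normal = "\033[0m"
    green = "\033[92m"

    @staticmethod
    def wipe_color():
        Console.normal = ""
        Console.green = ""

    @staticmethod
    def log_success(text):
        print(Console.green + "[+] " + text + Console.normal)


def configure_colors(stream=None):
    """Hypothetical helper: disable colors unless `stream` is a terminal."""
    # Resolve sys.stdout at call time so later redirections are honored
    stream = sys.stdout if stream is None else stream
    if not stream.isatty():
        Console.wipe_color()


configure_colors()
Console.log_success("done")
```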


--------------------------------------------------------------------------------
/lib/exceptions.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
 3 | 
 4 | Permission is hereby granted, free of charge, to any person obtaining a copy
 5 | of this software and associated documentation files (the "Software"), to deal
 6 | in the Software without restriction, including without limitation the rights
 7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 8 | copies of the Software, and to permit persons to whom the Software is
 9 | furnished to do so, subject to the following conditions:
10 | 
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 | 
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 | 
23 | class NoWordpressApi (Exception):
24 |     """
25 |     No API is available at the given URL
26 |     """
27 |     pass
28 | 
29 | class WordPressApiNotV2 (Exception):
30 |     """
31 |     The WordPress V2 API is not available
32 |     """
33 |     pass
34 | 
35 | class NSNotFoundException (Exception):
36 |     """
37 |     The specified namespace does not exist
38 |     """
39 |     pass
40 | 
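
The three exceptions map naturally onto checks against the API index that WordPress serves at `/wp-json/`, whose JSON body lists the available namespaces (e.g. `wp/v2`). The sketch below assumes that detection rule; `check_index` and `require_namespace` are hypothetical helpers, and the exception classes are re-declared so the snippet runs on its own. It works on an already-decoded JSON object, so no HTTP request is made.

```python
class NoWordpressApi(Exception):
    """No API is available at the given URL."""


class WordPressApiNotV2(Exception):
    """The WordPress V2 API is not available."""


class NSNotFoundException(Exception):
    """The specified namespace does not exist."""


def check_index(index):
    """Validate the decoded JSON of GET <site>/wp-json/ (hypothetical helper)."""
    # Anything that is not an object with a "namespaces" list is treated as "no API"
    if not isinstance(index, dict) or not isinstance(index.get("namespaces"), list):
        raise NoWordpressApi()
    if "wp/v2" not in index["namespaces"]:
        raise WordPressApiNotV2()


def require_namespace(index, namespace):
    """Raise NSNotFoundException unless the index advertises `namespace`."""
    if namespace not in index.get("namespaces", []):
        raise NSNotFoundException(namespace)


index = {"namespaces": ["oembed/1.0", "wp/v2"]}
check_index(index)  # passes silently
try:
    require_namespace(index, "wc/v3")
except NSNotFoundException as exc:
    print("missing namespace:", exc)
```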


--------------------------------------------------------------------------------
/lib/exporter.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy
  5 | of this software and associated documentation files (the "Software"), to deal
  6 | in the Software without restriction, including without limitation the rights
  7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  8 | copies of the Software, and to permit persons to whom the Software is
  9 | furnished to do so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import os
 24 | import copy
 25 | import html
 26 | import json
 27 | import csv
 28 | from datetime import datetime
 29 | from urllib import parse as urlparse
 30 | import mimetypes
 31 | import requests
 32 | 
 33 | from lib.console import Console
 34 | from lib.utils import get_by_id, print_progress_bar
 35 | 
 36 | class Exporter:
 37 |     """
 38 |         Utility functions to export data
 39 |     """
 40 |     JSON = 1
 41 |     """
 42 |         Represents the JSON format for format choice
 43 |     """
 44 |     CSV = 2
 45 |     """
 46 |         Represents the CSV format for format choice
 47 |     """
 48 |     CHUNK_SIZE = 2048
 49 |     """
 50 |         The size of chunks to download large files
 51 |     """
 52 | 
 53 |     @staticmethod
 54 |     def download_media(media, output_folder, slugs=None):
 55 |         """
 56 |             Downloads the media files based on the given URLs
 57 |             
 58 |             :param media: the URLs as a list
 59 |             :param output_folder: the path to the folder where the files are saved; it is assumed to exist
 60 |             :param slugs: list of slugs to associate with media. The list must be ordered the same as media and have the same size
 61 |             :return: the number of files written
 62 |         """
 63 |         files_number = 0
 64 |         media_length = len(media)
 65 |         progress = 0
 66 |         for m in media:
 67 |             r = requests.get(m, stream=True)
 68 |             if r.status_code == 200:
 69 |                 http_path = urlparse.urlparse(m).path.split("/")
 70 |                 local_path = output_folder
 71 |                 if len(http_path) > 1:
 72 |                     for el in http_path[:-1]:
 73 |                         local_path = os.path.join(local_path, el)
 74 |                         if not os.path.isdir(local_path):
 75 |                             os.mkdir(local_path)
 76 |                 if slugs is None:
 77 |                     local_path = os.path.join(local_path, http_path[-1])
 78 |                 else:
 79 |                     ext = mimetypes.guess_extension(r.headers.get('Content-Type', '').split(';')[0].strip())
 80 |                     local_path = os.path.join(local_path, slugs[progress])
 81 |                     if ext is not None:
 82 |                         local_path += ext
 83 |                 with open(local_path, "wb") as f:
 84 |                     i = 0
 85 |                     content_size = int(r.headers.get('Content-Length', 0))
 86 |                     for chunk in r.iter_content(Exporter.CHUNK_SIZE):
 87 |                         if content_size > 10485760: # 10 MiB
 88 |                             print_progress_bar(i*Exporter.CHUNK_SIZE, content_size, prefix=http_path[-1], length=70)
 89 |                         f.write(chunk)
 90 |                         i += 1
 91 |                     if content_size > 10485760: # 10 MiB
 92 |                         print_progress_bar(content_size, content_size, prefix=http_path[-1], length=70)
 93 |                 files_number += 1
 94 |             progress += 1
 95 |             if progress % 10 == 1:
 96 |                 print("Downloaded file %d of %d" % (progress, media_length))
 97 |         return files_number
 98 | 
 99 |     @staticmethod
100 |     def map_params(el, parameters_to_map):
101 |         """
102 |             Maps params to ids recursively.
103 | 
104 |             This method automatically maps IDs to the corresponding objects given in parameters_to_map.
105 |             The mapping is done in place: el is modified directly rather than copied.
106 | 
107 |             :param el: the element that has ID references
108 |             :param parameters_to_map: a dict containing lists of elements to map by ids with el
109 |         """
110 |         for key, value in el.items():
111 |             if key in parameters_to_map.keys() and parameters_to_map[key] is not None:
112 |                 if type(value) is int: # Only one ID to map
113 |                     obj = get_by_id(parameters_to_map[key], value)
114 |                     if obj is not None:
115 |                         el[key] = {
116 |                             'id': value,
117 |                             'details': obj
118 |                         }
119 |                 elif type(value) is list: # The object is a list of IDs, we map each one
120 |                     vlist = []
121 |                     for v in value:
122 |                         obj = get_by_id(parameters_to_map[key], v)
123 |                         vlist.append(obj)
124 |                     el[key] = {
125 |                         'ids': value,
126 |                         'details': vlist
127 |                     }
128 |             elif isinstance(value, dict):
129 |                 Exporter.map_params(value, parameters_to_map)
130 | 
131 |     @staticmethod
132 |     def setup_export(vlist, parameters_to_unescape, parameters_to_map):
133 |         """
134 |             Sets up the right values for a list export.
135 | 
136 |             This function prepares a list of objects for serialization in the expected format.
137 |             It also makes a deepcopy to ensure that the original vlist is not altered.
138 | 
139 |             :param vlist: the list to prepare for exporting
140 |             :param parameters_to_unescape: parameters to unescape (e.g. ["param1", ["param2", "rendered"]])
141 |             :param parameters_to_map: parameters to map to another (e.g. {"param_to_map": param_values_list})
142 |         """
143 |         exported_list = []
144 | 
145 |         for el in vlist:
146 |             if el is not None:
147 |                 # First copy the object
148 |                 exported_el = copy.deepcopy(el)
149 |                 # Look for parameters to HTML unescape
150 |                 for key in parameters_to_unescape:
151 |                     if type(key) is str: # If the parameter is at the root
152 |                         exported_el[key] = html.unescape(exported_el[key])
153 |                     elif type(key) is list: # If the parameter is nested
154 |                         selected = exported_el
155 |                         siblings = []
156 |                         fullpath = {}
157 |                         # We look for the leaf first, not forgetting sibling branches for rebuilding the tree later
158 |                         for k in key:
159 |                             if type(selected) is dict and k in selected.keys():
160 |                                 sib = {}
161 |                                 for e in selected.keys():
162 |                                     if e != k:
163 |                                         sib[e] = selected[e]
164 |                                 selected = selected[k]
165 |                                 siblings.append(sib)
166 |                             else:
167 |                                 selected = None
168 |                                 break
169 |                         # If we can unescape the parameter, we do it and rebuild the tree starting from the leaf
170 |                         if selected is not None and type(selected) is str:
171 |                             selected = html.unescape(selected)
172 |                             key.reverse()
173 |                             fullpath[key[0]] = selected
174 |                             s = len(siblings) - 1
175 |                             for e in siblings[s].keys():
176 |                                 fullpath[e] = siblings[s][e]
177 |                             for k in key[1:]:
178 |                                 fullpath = {k: fullpath}
179 |                                 s -= 1
180 |                                 for e in siblings[s].keys():
181 |                                     fullpath[e] = siblings[s][e]
182 |                             key.reverse()
183 |                             exported_el[key[0]] = fullpath[key[0]]
184 |                 # If there is any parameter to map, we do it here
185 |                 Exporter.map_params(exported_el, parameters_to_map)
186 |                 # The resulting element is appended to the list of exported elements
187 |                 exported_list.append(exported_el)
188 | 
189 |         return exported_list
190 | 
191 |     @staticmethod
192 |     def prepare_filename(filename, fmt):
193 |         """
194 |             Returns a filename with the proper extension according to the given format
195 | 
196 |             :param filename: the filename to clean
197 |             :param fmt: the file format
198 |             :return: the cleaned filename
199 |         """
200 |         if filename[-5:] != ".json" and fmt == Exporter.JSON:
201 |             filename += ".json"
202 |         elif filename[-4:] != ".csv" and fmt == Exporter.CSV:
203 |             filename += ".csv"
204 |         return filename
205 | 
206 |     @staticmethod
207 |     def write_file(filename, fmt, csv_keys, data, details=None):
208 |         """
209 |             Writes content to the given file using the given format.
210 | 
211 |             The key mapping must be a dict of keys or lists of keys to ensure proper mapping.
212 | 
213 |             :param filename: the path of the file
214 |             :param fmt: the format of the file
215 |             :param csv_keys: the key mapping
216 |             :param data: the actual data to export
217 |             :param details: the details keys to look for
218 |         """
219 |         with open(filename, "w", encoding="utf-8", newline="") as f:
220 |             if fmt == Exporter.JSON:
221 |                 # The JSON format is straightforward, we dump the flattened objects to JSON
222 |                 json.dump(data, f, ensure_ascii=False, indent=4)
223 |             else:
224 |                 # The CSV format requires some work, to select the most relevant information
225 |                 fieldnames = csv_keys.keys()
226 |                 w = csv.DictWriter(f, fieldnames=fieldnames)
227 |                 w.writeheader()
228 |                 for el in data:
229 |                     el_csv = {}
230 |                     for key in csv_keys:
231 |                         # First we look for the key specified by csv_keys and select the corresponding leaf
232 |                         k = csv_keys[key]
233 |                         selected = None
234 |                         last_key = None
235 |                         if type(k) is str:
236 |                             last_key = k
237 |                             k = [k] 
238 |                         if k[0] in el.keys():
239 |                             selected = el[k[0]]
240 |                         else:
241 |                             el_csv[key] = ""
242 |                             continue
243 |                         if len(k) > 1:
244 |                             for subkey in k[1:]:
245 |                                 if subkey in selected.keys():
246 |                                     selected = selected[subkey]
247 |                                     last_key = subkey
248 |                         # Once the leaf is selected, we verify if there is any kind of ID mapping and act accordingly
249 |                         if type(selected) is dict and 'id' in selected.keys() and 'details' in selected.keys() and details is not None and last_key in details.keys():
250 |                             el_csv[key] = "%s (%d)" % (selected["details"][details[last_key]], selected["id"])
251 |                         elif type(selected) is not dict and type(selected) is not list:
252 |                             el_csv[key] = selected
253 |                         else:
254 |                             el_csv[key] = "unknown"
255 |                     # And we write the row
256 |                     w.writerow(el_csv)
257 | 
258 |     @staticmethod
259 |     def export_posts(posts, fmt, filename, tags_list=None, categories_list=None, users_list=None):
260 |         """
261 |             Exports posts in specified format to specified file
262 | 
263 |             :param posts: the posts to export
264 |             :param fmt: the export format (JSON or CSV)
265 |             :param tags_list: a list of tags to associate them with tag ids
266 |             :param categories_list: a list of categories to associate them with
267 |             category ids
268 |             :param users_list: a list of users to associate them with author ids
269 |             :return: the length of the list written to the file
270 |         """
271 |         exported_posts = Exporter.setup_export(posts, 
272 |             [['title', 'rendered'], ['content', 'rendered'], ['excerpt', 'rendered']],
273 |             {
274 |                 'author': users_list,
275 |                 'categories': categories_list,
276 |                 'tags': tags_list,
277 |             })
278 |         
279 |         filename = Exporter.prepare_filename(filename, fmt)
280 |         csv_keys = {
281 |             'id': 'id',
282 |             'date': 'date',
283 |             'modified': 'modified',
284 |             'status': 'status',
285 |             'link': 'link',
286 |             'title': ['title', 'rendered'],
287 |             'author': 'author'
288 |         }
289 |         details = {
290 |             'author': 'name',
291 |         }
292 |         Exporter.write_file(filename, fmt, csv_keys, exported_posts, details)
293 |         return len(exported_posts)
294 | 
295 |     @staticmethod
296 |     def export_categories(categories, fmt, filename, category_list=None):
297 |         """
298 |             Exports categories in specified format to specified file.
299 | 
300 |             :param categories: the categories to export
301 |             :param fmt: the export format (JSON or CSV)
302 |             :param filename: the path to the file to write
303 |             :param category_list: the list of categories to be used as parents
304 |             :return: the length of the list written to the file
305 |         """
306 |         exported_categories = Exporter.setup_export(categories, # TODO
307 |             [],
308 |             {
309 |                 'parent': category_list,
310 |             })
311 |         
312 |         filename = Exporter.prepare_filename(filename, fmt)
313 | 
314 |         csv_keys = {
315 |             'id': 'id',
316 |             'name': 'name',
317 |             'post_count': 'count',
318 |             'description': 'description',
319 |             'parent': 'parent'
320 |         }
321 |         details = {
322 |             'parent': 'name'
323 |         }
324 |         Exporter.write_file(filename, fmt, csv_keys, exported_categories, details)
325 |         return len(exported_categories)
326 |     
327 |     @staticmethod
328 |     def export_tags(tags, fmt, filename):
329 |         """
330 |             Exports tags in specified format to specified file
331 | 
332 |             :param tags: the tags to export
333 |             :param fmt: the export format (JSON or CSV)
334 |             :param filename: the path to the file to write
335 |             :return: the length of the list written to the file
336 |         """
337 |         filename = Exporter.prepare_filename(filename, fmt)
338 |         
339 |         exported_tags = tags # It seems that no modification will be done for this one, so no deepcopy
340 |         csv_keys = {
341 |             'id': 'id',
342 |             'name': 'name',
343 |             'post_count': 'count',
344 |             'description': 'description'
345 |         }
346 |         Exporter.write_file(filename, fmt, csv_keys, exported_tags)
347 |         return len(exported_tags)
348 | 
349 |     @staticmethod
350 |     def export_users(users, fmt, filename):
351 |         """
352 |             Exports users in specified format to specified file.
353 | 
354 |             :param users: the users to export
355 |             :param fmt: the export format (JSON or CSV)
356 |             :param filename: the path to the file to write
357 |             :return: the length of the list written to the file
358 |         """
359 |         filename = Exporter.prepare_filename(filename, fmt)
360 |         
361 |         exported_users = users # It seems that no modification will be done for this one, so no deepcopy
362 |         csv_keys = {
363 |             'id': 'id',
364 |             'name': 'name', 
365 |             'link': 'link', 
366 |             'description': 'description'
367 |         }
368 |         Exporter.write_file(filename, fmt, csv_keys, exported_users)
369 |         return len(exported_users)
370 | 
371 |     @staticmethod
372 |     def export_pages(pages, fmt, filename, parent_pages=None, users=None):
373 |         """
374 |             Exports pages in specified format to specified file.
375 |         
376 |             :param pages: the pages to export
377 |             :param fmt: the export format (JSON or CSV)
378 |             :param filename: the path to the file to write
379 |             :param parent_pages: the list of all cached pages, to get parents
380 |             :param users: the list of all cached users, to get users
381 |             :return: the length of the list written to the file
382 |         """
383 |         exported_pages = Exporter.setup_export(pages,
384 |             [["guid", "rendered"], ["title", "rendered"], ["content", "rendered"], ["excerpt", "rendered"]],
385 |             {
386 |                 'parent': parent_pages,
387 |                 'author': users,
388 |             })
389 |         
390 |         filename = Exporter.prepare_filename(filename, fmt)
391 |         csv_keys = {
392 |             'id': 'id',
393 |             'title': ['title', 'rendered'],
394 |             'date': 'date',
395 |             'modified': 'modified',
396 |             'status': 'status',
397 |             'link': 'link',
398 |             'author': 'author',
399 |             'protected': ['content', 'protected']
400 |         }
401 |         details = {
402 |             'author': 'name'
403 |         }
404 |         Exporter.write_file(filename, fmt, csv_keys, exported_pages, details)
405 |         return len(exported_pages)
406 | 
407 |     @staticmethod
408 |     def export_media(media, fmt, filename, users=None):
409 |         """
410 |             Exports media in specified format to specified file.
411 | 
412 |             :param media: the media to export
413 |             :param fmt: the export format (JSON or CSV)
414 |             :param users: a list of users to associate them with author ids
415 |             :return: the length of the list written to the file
416 |         """
417 |         exported_media = Exporter.setup_export(media, 
418 |             [
419 |                 ['guid', 'rendered'],
420 |                 ['title', 'rendered'],
421 |                 ['description', 'rendered'],
422 |                 ['caption', 'rendered'],
423 |             ],
424 |             {
425 |                 'author': users,
426 |             })
427 |         
428 |         filename = Exporter.prepare_filename(filename, fmt)
429 |         csv_keys = {
430 |             'id': 'id',
431 |             'title': ['title', 'rendered'],
432 |             'date': 'date',
433 |             'modified': 'modified',
434 |             'status': 'status',
435 |             'link': 'link',
436 |             'author': 'author',
437 |             'media_type': 'media_type'
438 |         }
439 |         details = {
440 |             'author': 'name'
441 |         }
442 |         Exporter.write_file(filename, fmt, csv_keys, exported_media, details)
443 |         return len(exported_media)
444 | 
445 |     @staticmethod
446 |     def export_namespaces(namespaces, fmt, filename):
447 |         """
448 |             **NOT IMPLEMENTED** Exports namespaces in specified format to specified file.
449 | 
450 |             :param namespaces: the namespaces to export
451 |             :param fmt: the export format (JSON or CSV)
452 |             :return: the length of the list written to the file
453 |         """
454 |         Console.log_info("Namespaces export not available yet")
455 |         return 0
456 | 
457 |     # FIXME to be refactored
458 |     @staticmethod
459 |     def export_comments_interactive(comments, fmt, filename, parent_posts=None, users=None):
460 |         """
461 |             Exports comments in specified format to specified file.
462 | 
463 |             :param comments: the comments to export
464 |             :param fmt: the export format (JSON or CSV)
465 |             :param filename: the path to the file to write
466 |             :param parent_posts: the list of all cached posts, to get parent posts (not used yet because this could be too verbose)
467 |             :param users: the list of all cached users, to get users
468 |             :return: the length of the list written to the file
469 |         """
470 |         exported_comments = Exporter.setup_export(comments,
471 |             [["content", "rendered"]],
472 |             {
473 |                 'post': parent_posts,
474 |                 'author': users,
475 |             })
476 |         
477 |         # FIXME replacing the post ID by the post title in CSV mode doesn't work yet (nested keys)
478 |         filename = Exporter.prepare_filename(filename, fmt)
479 |         csv_keys = {
480 |             'id': 'id',
481 |             'post': 'post',
482 |             'date': 'date',
483 |             'status': 'status',
484 |             'link': 'link',
485 |             'author': 'author_name',
486 |         }
487 |         details = {
488 |             'post': ['title', 'rendered'] 
489 |         }
490 |         Exporter.write_file(filename, fmt, csv_keys, exported_comments, details)
491 |         return len(exported_comments)
492 | 
493 |     # TODO deprecated, to be moved to export_posts when HTML will be supported
494 |     @staticmethod
495 |     def export_posts_html(posts, folder, tags_list=None, categories_list=None,
496 |     users_list=None):
497 |         """
498 |             Exports posts as HTML to specified export folder.
499 |         
500 |             :param posts: the posts to export
501 |             :param folder: the export folder
502 |             :param tags_list: a list of tags to associate them with tag ids
503 |             :param categories_list: a list of categories to associate them with category ids
504 |             :param users_list: a list of users to associate them with author ids
505 |             :return: the length of the list written to the file
506 |         """
507 |         exported_posts = 0
508 | 
509 |         date_format = "%Y-%m-%dT%H:%M:%S-%Z"
510 | 
511 |         if not os.path.isdir(folder):
512 |             os.makedirs(folder)
513 |         for post in posts:
514 |             post_file = None
515 |             if 'slug' in post.keys():
516 |                 post_file = open(os.path.join(folder, post['slug'])+".html",
517 |                 "wt", encoding="utf-8")
518 |             else:
519 |                 post_file = open(os.path.join(folder, str(post['id']))+".html",
520 |                 "wt", encoding="utf-8")
521 | 
522 |             title = "Unknown"
523 |             if 'title' in post.keys() and 'rendered' in post['title'].keys():
524 |                 title = post['title']['rendered']
525 | 
526 |             date_gmt = "Unknown"
527 |             if 'date_gmt' in post.keys():
528 |                 date_gmt = datetime.strptime(post['date_gmt'] +
529 |                                              "-GMT", date_format)
530 |             modified_gmt = "Unknown"
531 |             if 'modified_gmt' in post.keys():
532 |                 modified_gmt = datetime.strptime(post['modified_gmt'] +
533 |                                                  "-GMT", date_format)
534 |             status = "Unknown"
535 |             if 'status' in post.keys():
536 |                 status = post['status']
537 | 
538 |             post_type = "Unknown"
539 |             if 'type' in post.keys():
540 |                 post_type = post['type']
541 | 
542 |             link = "Unknown"
543 |             if 'link' in post.keys():
544 |                 link = html.escape(post['link'])
545 | 
546 |             comments = "Unknown"
547 |             if 'comment_status' in post.keys():
548 |                 comments = html.escape(post['comment_status'])
549 | 
550 |             content = "Unknown"
551 |             if 'content' in post.keys() and 'rendered' in \
552 |                     post['content'].keys():
553 |                 content = post['content']['rendered']
554 | 
555 |             excerpt = "Unknown"
556 |             if 'excerpt' in post.keys() and 'rendered' in \
557 |                     post['excerpt'].keys():
558 |                 excerpt = post['excerpt']['rendered']
559 | 
560 |             author = "Unknown"
561 |             if 'author' in post.keys() and users_list is not None:
562 |                 author_obj = get_by_id(users_list, post['author'])
563 |                 author = "%d: " % post['author']
564 |                 if author_obj is not None:
565 |                     if 'name' in author_obj.keys():
566 |                         author += author_obj['name']
567 |                     if 'slug' in author_obj.keys():
568 |                         author += " (%s)" % author_obj['slug']
569 |                     if 'link' in author_obj.keys():
570 |                         author += " - <a href=\"%s\">%s</a>" % \
571 |                                   (html.escape(author_obj['link']), html.escape(author_obj['link']))
572 |             elif 'author' in post.keys():
573 |                 author = str(post['author'])
574 | 
575 |             categories = "<li>Unknown</li>"
576 |             if 'categories' in post.keys() and categories_list is not None:
577 |                 categories = ""
578 |                 for cat in post['categories']:
579 |                     cat_obj = get_by_id(categories_list, cat)
580 |                     categories += "<li>%d: " % cat
581 |                     if cat_obj is not None:
582 |                         if 'name' in cat_obj.keys():
583 |                             categories += html.escape(cat_obj['name'])
584 |                         if 'link' in cat_obj.keys():
585 |                             categories += " - <a href=\"%s\">%s</a>" % \
586 |                                           (html.escape(cat_obj['link']),
587 |                                            html.escape(cat_obj['link']))
588 |                     categories += "</li>"
589 |             elif 'categories' in post.keys():
590 |                 categories = ""
591 |                 for cat in post['categories']:
592 |                     categories += "<li>" + str(cat) + "</li>"
593 | 
594 |             tags = "<li>Unknown</li>"
595 |             if 'tags' in post.keys() and tags_list is not None:
596 |                 tags = ""
597 |                 for tag in post['tags']:
598 |                     tag_obj = get_by_id(tags_list, tag)
599 |                     tags += "<li>%d: " % tag
600 |                     if tag_obj is not None:
601 |                         if 'name' in tag_obj.keys():
602 |                             tags += html.escape(tag_obj['name'])
603 |                         if 'link' in tag_obj.keys():
604 |                             tags += " - <a href=\"%s\">%s</a>" % \
605 |                                     (html.escape(tag_obj['link']),
606 |                                      html.escape(tag_obj['link']))
607 |                     tags += "</li>"
608 |             elif 'tags' in post.keys():
609 |                 tags = ""
610 |                 for tag in post['tags']:
611 |                     tags += "<li>" + str(tag) + "</li>"
612 | 
613 |             buffer = \
614 | """<!DOCTYPE html>
615 | <html>
616 |     <head>
617 |         <title>{title}</title>
618 |     </head>
619 |     <body>
620 |         <div>
621 |             <h1>Metadata</h1>
622 |             <ul>
623 |                 <li><strong>Date (GMT):</strong> {date_gmt}</li>
624 |                 <li><strong>Date modified (GMT):</strong> {modified_gmt}</li>
625 |                 <li><strong>Status:</strong> {status}</li>
626 |                 <li><strong>Type:</strong> {post_type}</li>
627 |                 <li><strong>Link:</strong> <a href=\"{link}\">{link}</a></li>
628 |                 <li><strong>Author:</strong> {author}</li>
629 |                 <li><strong>Comment status:</strong> {comments}</li>
630 |                 <li>
631 |                     <strong>Categories:</strong>
632 |                     <ul>
633 |                         {categories}
634 |                     </ul>
635 |                 </li>
636 |                 <li>
637 |                     <strong>Tags:</strong>
638 |                     <ul>
639 |                         {tags}
640 |                     </ul>
641 |                 </li>
642 |             </ul>
643 |         </div>
644 |         <div>
645 |             <h1>Excerpt</h1>
646 |             {excerpt}
647 |         </div>
648 |         <div>
649 |             <h1>{title}</h1>
650 |             {content}
651 |         </div>
652 |     </body>
653 | </html>
654 | """
655 |             buffer = buffer.format(
656 |                 title=title,
657 |                 date_gmt=date_gmt.strftime("%d/%m/%Y %H:%M:%S"),
658 |                 modified_gmt=modified_gmt.strftime("%d/%m/%Y %H:%M:%S"),
659 |                 status=status,
660 |                 post_type=post_type,
661 |                 link=link,
662 |                 author=author,
663 |                 comments=comments,
664 |                 categories=categories,
665 |                 tags=tags,
666 |                 excerpt=excerpt,
667 |                 content=content
668 |             )
669 | 
670 |             post_file.write(buffer)
671 |             post_file.close()
672 |             exported_posts += 1
673 | 
674 |         return exported_posts
675 | 
676 |     @staticmethod
677 |     def export_comments(posts, orphan_comments, export_folder):
678 |         """
679 |         Exports comments attached to posts, then the orphan comments
680 |         """
681 |         exported_comments = 0
682 |         for post in posts:
683 |             if 'comments' in post.keys() and len(post['comments']) > 0:
684 |                 for comment in post['comments']:
685 |                     if 'slug' in post.keys() and len(post['slug']) > 0:
686 |                         Exporter.export_comments_helper(comment, post['slug'], export_folder)
687 |                     else:
688 |                         Exporter.export_comments_helper(comment, str(post['id']), export_folder)
689 |                     exported_comments += 1
690 |         for comment in orphan_comments:
691 |             Exporter.export_comments_helper(comment, '__orphan_comments', export_folder)
692 |             exported_comments += 1
693 |         return exported_comments
694 | 
695 |     @staticmethod
696 |     def export_comments_helper(comment, post, export_folder):
697 |         date_format = "%Y-%m-%dT%H:%M:%S-%Z"
698 |         if not os.path.isdir(export_folder):
699 |             os.mkdir(export_folder)
700 |         if not os.path.isdir(os.path.join(export_folder, post)):
701 |             os.mkdir(os.path.join(export_folder, post))
702 |         out_file = open(os.path.join(export_folder, post, "%04d.html" % comment['id']), "wt", encoding="utf-8")
703 |         date_gmt = "Unknown"
704 |         if 'date_gmt' in comment.keys():
705 |             date_gmt = datetime.strptime(comment['date_gmt'] +
706 |                                             "-GMT", date_format)
707 |         post_link = "None"
708 |         if '_links' in comment.keys() and 'up' in comment['_links'].keys() and len(comment['_links']['up']) > 0 and 'href' in comment['_links']['up'][0].keys():
709 |             post_link = html.escape(comment['_links']['up'][0]['href'])
710 |         buffer = \
711 | """<!DOCTYPE html>
712 | <html>
713 |     <head>
714 |         <title>{author}</title>
715 |     </head>
716 |     <body>
717 |         <div>
718 |             <h1>Metadata</h1>
719 |             <ul>
720 |                 <li><strong>Date (GMT):</strong> {date_gmt}</li>
721 |                 <li><strong>Status:</strong> {status}</li>
722 |                 <li><strong>Link:</strong> <a href=\"{link}\">{link}</a></li>
723 |                 <li><strong>Author:</strong> {author}</li>
724 |                 <li><strong>Author URL:</strong> {author_url}</li>
725 |                 <li><strong>Post ID:</strong> {post_id}</li>
726 |                 <li><strong>Post link:</strong> <a href=\"{post_link}\">{post_link}</a></li>
727 |             </ul>
728 |         </div>
729 |         <div>
730 |             <h1>{author} on {post_title}</h1>
731 |             {content}
732 |         </div>
733 |     </body>
734 | </html>
735 |         """
736 |         buffer = buffer.format(
737 |             author=html.escape(comment["author_name"]),
738 |             author_url=html.escape(comment['author_url']),
739 |             date_gmt=date_gmt.strftime("%d/%m/%Y %H:%M:%S") if isinstance(date_gmt, datetime) else date_gmt,
740 |             status=html.escape(comment['status']),
741 |             link=html.escape(comment['link']),
742 |             content=html.escape(comment['content']['rendered']),
743 |             post_title=html.escape(post),
744 |             post_id=int(comment['post']),
745 |             post_link=post_link
746 |         )
747 |         out_file.write(buffer)
748 |         out_file.close()
749 | 
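
When the API omits `date_gmt`, the comment exporter above falls back to the string "Unknown", so any formatting step has to cope with a value that is not a `datetime`. A minimal sketch of a defensive formatter follows; `format_gmt` and `WP_DATE_FORMAT` are hypothetical names, not part of WPJsonScraper, and the sketch assumes WordPress REST `date_gmt` values look like `2020-01-31T12:34:56` (GMT, no offset suffix).

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical helper, not part of WPJsonScraper. Assumes WordPress REST
# 'date_gmt' values such as '2020-01-31T12:34:56' (GMT, no offset suffix).
WP_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_gmt(raw: Optional[str], out_format: str = "%d/%m/%Y %H:%M:%S") -> str:
    """Return raw reformatted with out_format, or "Unknown" if missing/invalid."""
    if not raw:
        return "Unknown"
    try:
        # Attach UTC explicitly instead of appending "-GMT" and parsing %Z,
        # which strptime accepts but does not turn into a tzinfo.
        parsed = datetime.strptime(raw, WP_DATE_FORMAT).replace(tzinfo=timezone.utc)
    except ValueError:
        return "Unknown"
    return parsed.strftime(out_format)


if __name__ == "__main__":
    print(format_gmt("2020-01-31T12:34:56"))  # 31/01/2020 12:34:56
    print(format_gmt(None))                   # Unknown
```

Keeping the fallback in one place means the post and comment exporters could both call such a helper instead of formatting each `datetime` inline.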


--------------------------------------------------------------------------------
/lib/infodisplayer.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy
  5 | of this software and associated documentation files (the "Software"), to deal
  6 | in the Software without restriction, including without limitation the rights
  7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  8 | copies of the Software, and to permit persons to whom the Software is
  9 | furnished to do so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import html
 24 | import csv
 25 | from datetime import datetime
 26 | 
 27 | from lib.console import Console
 28 | 
 29 | class InfoDisplayer:
 30 |     """
 31 |     Static class to display information for different categories
 32 |     """
 33 | 
 34 |     @staticmethod
 35 |     def display_basic_info(information):
 36 |         """
 37 |         Displays basic information about the WordPress instance
 38 |         :param information: information as a JSON object
 39 |         """
 40 |         print()
 41 | 
 42 |         if 'name' in information.keys():
 43 |             print("Site name: %s" % html.unescape(information['name']))
 44 | 
 45 |         if 'description' in information.keys():
 46 |             print("Site description: %s" %
 47 |                   html.unescape(information['description']))
 48 | 
 49 |         if 'home' in information.keys():
 50 |             print("Site home: %s" % html.unescape(information['home']))
 51 | 
 52 |         if 'gmt_offset' in information.keys():
 53 |             timezone_string = ""
 54 |             gmt_offset = str(information['gmt_offset'])
 55 |             if '-' not in gmt_offset:
 56 |                 gmt_offset = '+' + gmt_offset
 57 |             if 'timezone_string' in information.keys():
 58 |                 timezone_string = information['timezone_string']
 59 |             print("Site Timezone: %s (GMT%s)" % (timezone_string, gmt_offset))
 60 | 
 61 |         if 'namespaces' in information.keys():
 62 |             print('Namespaces (API provided by addons):')
 63 |             ns_ref = {}
 64 |             try:
 65 |                 ns_ref_file = open("lib/plugins/plugin_list.csv", "rt")
 66 |                 ns_ref_reader = csv.reader(ns_ref_file)
 67 |                 for row in ns_ref_reader:
 68 |                     desc = None
 69 |                     url = None
 70 |                     if len(row) > 1 and len(row[1]) > 0:
 71 |                         desc = row[1]
 72 |                     if len(row) > 2 and len(row[2]) > 0:
 73 |                         url = row[2]
 74 |                     ns_ref[row[0]] = {"desc": desc, "url": url}
 75 |                 ns_ref_file.close()
 76 |             except Exception:
 77 |                 Console.log_error("Could not load namespaces reference file")
 78 |             for ns in information['namespaces']:
 79 |                 tip = ""
 80 |                 if ns in ns_ref.keys():
 81 |                     if ns_ref[ns]['desc'] is not None:
 82 |                         if tip == "":
 83 |                             tip += " - "
 84 |                         tip += ns_ref[ns]['desc']
 85 |                     if ns_ref[ns]['url'] is not None:
 86 |                         # Always prefix the URL: the separator either starts
 87 |                         # the tip or follows the description
 88 |                         tip += " - " + ns_ref[ns]['url']
 89 |                 print('    %s%s' % (ns, tip))
 90 | 
 91 |         # TODO: dive into authentication
 92 |         print()
 93 | 
 94 |     @staticmethod
 95 |     def display_namespaces(information, details=False):
 96 |         """
 97 |             Displays namespace list of the WordPress API
 98 | 
 99 |             :param information: information as a JSON object
100 |             :param details: unused, available for compatibility purposes
101 |         """
102 |         print()
103 |         if information is not None:
104 |             for ns in information:
105 |                 print("* %s" % ns)
106 |         print()
107 | 
108 |     @staticmethod
109 |     def display_endpoints(information):
110 |         """
111 |         Displays endpoint documentation of the WordPress API
112 |         :param information: information as a JSON object
113 |         """
114 |         print()
115 | 
116 |         if 'routes' not in information.keys():
117 |             Console.log_error("Did not find the routes for endpoint discovery")
118 |             return None
119 | 
120 |         for url, route in information['routes'].items():
121 |             print("%s (Namespace: %s)" % (url, route['namespace']))
122 |             for endpoint in route['endpoints']:
123 |                 methods = "    "
124 |                 first = True
125 |                 for method in endpoint['methods']:
126 |                     if first:
127 |                         methods += method
128 |                         first = False
129 |                     else:
130 |                         methods += ", " + method
131 |                 print(methods)
132 |                 if len(endpoint['args']) > 0:
133 |                     for arg, props in endpoint['args'].items():
134 |                         required = ""
135 |                         if props.get('required', False):
136 |                             required = " (required)"
137 |                         print("        " + arg + required)
138 |                         if 'type' in props.keys():
139 |                             print("            type: " + str(props['type']))
140 |                         if 'default' in props.keys():
141 |                             print("            default: " +
142 |                                   str(props['default']))
143 |                         if 'enum' in props.keys():
144 |                             allowed = "            allowed values: "
145 |                             first = True
146 |                             for val in props['enum']:
147 |                                 if first:
148 |                                     allowed += str(val)
149 |                                     first = False
150 |                                 else:
151 |                                     allowed += ", " + str(val)
152 |                             print(allowed)
153 |                         if 'description' in props.keys():
154 |                             print("            " + str(props['description']))
155 |             print()
156 | 
157 |     @staticmethod
158 |     def display_posts(information, orphan_comments=(), details=False):
159 |         """
160 |         Displays posts published on the WordPress instance
161 |         :param information: information as a JSON object
162 |         """
163 |         print()
164 |         date_format = "%Y-%m-%dT%H:%M:%S-%Z"
165 |         for post in information:
166 |             if post is not None:
167 |                 line = ""
168 |                 if 'id' in post.keys():
169 |                     line += "ID: %d" % post['id']
170 |                 if 'title' in post.keys():
171 |                     line += " - " + html.unescape(post['title']['rendered'])
172 |                 if 'date_gmt' in post.keys():
173 |                     date_gmt = datetime.strptime(post['date_gmt'] +
174 |                                                 "-GMT", date_format)
175 |                     line += " on %s" % \
176 |                             date_gmt.strftime("%d/%m/%Y at %H:%M:%S")
177 |                 if 'link' in post.keys():
178 |                     line += " - " + post['link']
179 |                 if details:
180 |                     if 'slug' in post.keys():
181 |                         line += "\nSlug: " + post['slug']
182 |                     if 'status' in post.keys():
183 |                         line += "\nStatus: " + post['status']
184 |                     if 'author' in post.keys():
185 |                         line += "\nAuthor ID: %d" % post['author']
186 |                     if 'comment_status' in post.keys():
187 |                         line += "\nComment status: " + post['comment_status']
188 |                     if 'template' in post.keys() and len(post['template']) > 0:
189 |                         line += "\nTemplate: " + post['template']
190 |                     if 'categories' in post.keys() and len(post['categories']) > 0:
191 |                         line += "\nCategory IDs: "
192 |                         for cat in post['categories']:
193 |                             line += "%d, " % cat
194 |                         line = line[:-2]
195 |                     if 'excerpt' in post.keys():
196 |                         line += "\nExcerpt: "
197 |                         if 'protected' in post['excerpt'].keys() and post['excerpt']['protected']:
198 |                             line += "<post is protected>"
199 |                         elif 'rendered' in post['excerpt'].keys():
200 |                             line += "\n" + html.unescape(post['excerpt']['rendered'])
201 |                     if 'content' in post.keys():
202 |                         line += "\nContent: "
203 |                         if 'protected' in post['content'].keys() and post['content']['protected']:
204 |                             line += "<post is protected>"
205 |                         elif 'rendered' in post['content'].keys():
206 |                             line += "\n" + html.unescape(post['content']['rendered'])
207 |                 if 'comments' in post.keys():
208 |                     for comment in post['comments']:
209 |                         line += "\n\t * Comment by %s from (%s) - %s" % (comment['author_name'], comment['author_url'], comment['link'])
210 |                 print(line)
211 | 
212 |         if len(orphan_comments) > 0:
213 |             # TODO: untested, may never run; it is unclear how the REST API and WordPress link posts and comments in the back-end
214 |             print()
215 |             print("Found orphan comments! Check them right below:")
216 |             for comment in orphan_comments:
217 |                 print("\t * Comment by %s from (%s) on post ID %d - %s" % (comment['author_name'], comment['author_url'], comment['post'], comment['link']))
218 |         print()
219 | 
220 |     @staticmethod
221 |     def display_comments(information, details=False):
222 |         """
223 |             Displays comments published on the WordPress instance.
224 | 
225 |             :param information: information as a JSON object
226 |             :param details: if the details should be displayed
227 |         """
228 |         print()
229 |         date_format = "%Y-%m-%dT%H:%M:%S-%Z"
230 |         for comment in information:
231 |             if comment is not None:
232 |                 line = ""
233 |                 if 'id' in comment.keys():
234 |                     line += "ID: %d" % comment['id']
235 |                 if 'post' in comment.keys():
236 |                     line += " - Post ID: %d" % comment['post']
237 |                 if 'author_name' in comment.keys():
238 |                     line += " - By %s" % comment['author_name']
239 |                 if 'date_gmt' in comment.keys():
240 |                     date_gmt = datetime.strptime(comment['date_gmt'] +
241 |                                                 "-GMT", date_format)
242 |                     line += " on %s" % \
243 |                             date_gmt.strftime("%d/%m/%Y at %H:%M:%S")
244 |                 if details:
245 |                     if 'parent' in comment.keys() and comment['parent'] != 0:
246 |                         line += "\nParent ID: %d" % comment['parent']
247 |                     if 'link' in comment.keys():
248 |                         line += "\nLink: " + comment['link']
249 |                     if 'status' in comment.keys():
250 |                         line += "\nStatus: " + comment['status']
251 |                     if 'author_url' in comment.keys() and len(comment['author_url']) > 0:
252 |                         line += "\nAuthor URL: " + comment['author_url']
253 |                     if 'content' in comment.keys():
254 |                         line += "\nContent: \n" + html.unescape(comment['content']['rendered'])
255 |                 print(line)
256 |         print()
257 | 
258 |     @staticmethod
259 |     def display_users(information, details=False):
260 |         """
261 |             Displays users on the WordPress instance
262 | 
263 |             :param information: information as a JSON object
264 |             :param details: display more details about the user
265 |         """
266 |         print()
267 |         for user in information:
268 |             if user is not None:
269 |                 line = ""
270 |                 if 'id' in user.keys():
271 |                     line += "User ID: %d\n" % user['id']
272 |                 if 'name' in user.keys():
273 |                     line += "    Display name: %s\n" % user['name']
274 |                 if 'slug' in user.keys():
275 |                     line += "    User name (probable): %s\n" % user['slug']
276 |                 if 'description' in user.keys():
277 |                     line += "    User description: %s\n" % user['description']
278 |                 if 'url' in user.keys():
279 |                     line += "    User website: %s\n" % user['url']
280 |                 if 'link' in user.keys():
281 |                     line += "    User personal page: %s\n" % user['link']
282 |                 if details:
283 |                     if "avatar_urls" in user.keys() and type(user["avatar_urls"]) is dict and len(user["avatar_urls"].keys()) > 0:
284 |                         line += "    Avatars: \n"
285 |                         for key, value in user["avatar_urls"].items():
286 |                             line += "        * %s: %s\n" % (key, value)
287 |                 print(line)
288 |         print()
289 | 
290 |     @staticmethod
291 |     def display_categories(information, details=False):
292 |         """
293 |         Displays categories of the WordPress instance
294 |         param information: information as a JSON object
295 |         """
296 |         print()
297 |         for category in information:
298 |             if category is not None:
299 |                 line = ""
300 |                 if 'id' in category.keys():
301 |                     line += "Category ID: %d\n" % category['id']
302 |                 if 'name' in category.keys():
303 |                     line += "    Name: %s\n" % category['name']
304 |                 if 'description' in category.keys():
305 |                     line += "    Description: %s\n" % category['description']
306 |                 if 'count' in category.keys():
307 |                     line += "    Number of posts: %d\n" % category['count']
308 |                 if 'link' in category.keys():
309 |                     line += "    Page: %s\n" % category['link']
310 |                 if details:
311 |                     if 'slug' in category.keys():
312 |                         line += "    Slug: %s\n" % category['slug']
313 |                     if 'taxonomy' in category.keys():
314 |                         line += "    Taxonomy: %s\n" % category['taxonomy']
315 |                     if 'parent' in category.keys():
316 |                         line += "    Parent category: "
317 |                         if type(category['parent']) is str:
318 |                             line += category['parent']
319 |                         elif type(category['parent']) is int:
320 |                             line += "%d" % category['parent']
321 |                         else:
322 |                             line += "Unknown"
323 |                         line += "\n"
324 |                 print(line)
325 |         print()
326 | 
327 |     @staticmethod
328 |     def display_tags(information, details=False):
329 |         """
330 |         Displays tags of the WordPress instance
331 |         :param information: information as a JSON object
332 |         """
333 |         print()
334 |         for tag in information:
335 |             if tag is not None:
336 |                 line = ""
337 |                 if 'id' in tag.keys():
338 |                     line += "Tag ID: %d\n" % tag['id']
339 |                 if 'name' in tag.keys():
340 |                     line += "    Name: %s\n" % tag['name']
341 |                 if 'description' in tag.keys():
342 |                     line += "    Description: %s\n" % tag['description']
343 |                 if 'count' in tag.keys():
344 |                     line += "    Number of posts: %d\n" % tag['count']
345 |                 if 'link' in tag.keys():
346 |                     line += "    Page: %s\n" % tag['link']
347 |                 if details:
348 |                     if 'slug' in tag.keys():
349 |                         line += "    Slug: %s\n" % tag['slug']
350 |                     if 'taxonomy' in tag.keys():
351 |                         line += "    Taxonomy: %s\n" % tag['taxonomy']
352 |                 print(line)
353 |         print()
354 | 
355 |     @staticmethod
356 |     def display_media(information, details=False):
357 |         """
358 |             Displays media objects of the WordPress instance
359 | 
360 |             :param information: information as a JSON object
361 |             :param details: if the details should be displayed
362 |         """
363 |         print()
364 |         date_format = "%Y-%m-%dT%H:%M:%S-%Z"
365 |         for media in information:
366 |             if media is not None:
367 |                 line = ""
368 |                 if 'id' in media.keys():
369 |                     line += "Media ID: %d\n" % media['id']
370 |                 if 'title' in media.keys() and 'rendered' in media['title']:
371 |                     line += "    Media title: %s\n" % \
372 |                             html.unescape(media['title']['rendered'])
373 |                 if 'date_gmt' in media.keys():
374 |                     date_gmt = datetime.strptime(media['date_gmt'] +
375 |                                                 "-GMT", date_format)
376 |                     line += "    Upload date (GMT): %s\n" % \
377 |                             date_gmt.strftime("%d/%m/%Y %H:%M:%S")
378 |                 if 'media_type' in media.keys():
379 |                     line += "    Media type: %s\n" % media['media_type']
380 |                 if 'mime_type' in media.keys():
381 |                     line += "    Mime type: %s\n" % media['mime_type']
382 |                 if 'link' in media.keys():
383 |                     line += "    Page: %s\n" % media['link']
384 |                 if 'source_url' in media.keys():
385 |                     line += "    Source URL: %s\n" % media['source_url']
386 |                 if details:
387 |                     if 'slug' in media.keys():
388 |                         line += "    Slug: " + media['slug'] + "\n"
389 |                     if 'status' in media.keys():
390 |                         line += "    Status: " + media['status'] + "\n"
391 |                     if 'type' in media.keys():
392 |                         line += "    Type: " + media['type'] + "\n"
393 |                     if 'author' in media.keys():
394 |                         line += "    Author ID: %d\n" % media['author']
395 |                     if 'alt_text' in media.keys():
396 |                         line += "    Alt text: " + media['alt_text'] + "\n"
397 |                     if 'comment_status' in media.keys():
398 |                         line += "    Comment status: " + media['comment_status'] + "\n"
399 |                     if 'post' in media.keys():
400 |                         line += "    Post or page ID: %d\n" % media['post']
401 |                     if 'description' in media.keys() and media['description']['rendered']:
402 |                         line += "    Description: \n" + html.unescape(media['description']['rendered']) + "\n"
403 |                     if 'caption' in media.keys() and media['caption']['rendered']:
404 |                         line += "    Caption: \n" + html.unescape(media['caption']['rendered']) + "\n"
405 |                 print(line)
406 |         print()
407 | 
408 |     @staticmethod
409 |     def display_pages(information, details=False):
410 |         """
411 |             Displays pages published on the WordPress instance
412 | 
413 |             :param information: information as a JSON object
414 |             :param details: if the details should be displayed
415 |         """
416 |         print()
417 |         for page in information:
418 |             if page is not None:
419 |                 line = ""
420 |                 if 'id' in page.keys():
421 |                     line += "ID: %d" % page['id']
422 |                 if 'title' in page.keys() and 'rendered' in page['title']:
423 |                     line += " - " + html.unescape(page['title']['rendered'])
424 |                 if 'link' in page.keys():
425 |                     line += " - " + page['link']
426 |                 if details:
427 |                     if 'slug' in page.keys():
428 |                         line += "\nSlug: " + page['slug']
429 |                     if 'status' in page.keys():
430 |                         line += "\nStatus: " + page['status']
431 |                     if 'author' in page.keys():
432 |                         line += "\nAuthor ID: %d" % page['author']
433 |                     if 'comment_status' in page.keys():
434 |                         line += "\nComment status: " + page['comment_status']
435 |                     if 'template' in page.keys() and len(page['template']) > 0:
436 |                         line += "\nTemplate: " + page['template']
437 |                     if 'parent' in page.keys():
438 |                         if page['parent'] == 0:
439 |                             line += "\nParent: none"
440 |                         else:
441 |                             line += "\nParent ID: %d" % page['parent']
442 |                     if 'excerpt' in page.keys():
443 |                         line += "\nExcerpt: "
444 |                         if 'protected' in page['excerpt'].keys() and page['excerpt']['protected']:
445 |                             line += "<page is protected>"
446 |                         elif 'rendered' in page['excerpt'].keys():
447 |                             line += "\n" + html.unescape(page['excerpt']['rendered'])
448 |                     if 'content' in page.keys():
449 |                         line += "\nContent: "
450 |                         if 'protected' in page['content'].keys() and page['content']['protected']:
451 |                             line += "<page is protected>"
452 |                         elif 'rendered' in page['content'].keys():
453 |                             line += "\n" + html.unescape(page['content']['rendered'])
454 |                 print(line)
455 |         print()
456 | 
457 |     @staticmethod
458 |     def recurse_list_or_dict(data, tab):
459 |         """
460 |         Helper function to generate recursive display of API data
461 |         """
462 |         if type(data) is not dict and type(data) is not list:
463 |             return tab + str(data)
464 | 
465 |         line = ""
466 |         if type(data) is list:
467 |             i = 0
468 |             length = len(data)
469 |             for value in data:
470 |                 do_jmp = True
471 |                 if type(value) is dict or type(value) is list:
472 |                     line += InfoDisplayer.recurse_list_or_dict(value, tab+"\t")
473 |                 elif type(value) is str:
474 |                     if "\n" in value:
475 |                         line += "\n" + tab + "\t"
476 |                         line += value.replace("\n", "\n"+tab+"\t")
477 |                     else:
478 |                         line += " "
479 |                         line += value.replace("\n", "\n"+tab)
480 |                         do_jmp = False
481 |                 else:
482 |                     line += " " + str(value)
483 |                 if i < length - 1 and do_jmp:
484 |                     line += "\n"
485 |                 i += 1
486 |         else:
487 |             for key,value in data.items():
488 |                 line += "\n" + tab + key
489 |                 if type(value) is dict or type(value) is list:
490 |                     line += InfoDisplayer.recurse_list_or_dict(value, tab+"\t")
491 |                 elif type(value) is str:
492 |                     if "\n" in value:
493 |                         line += "\n" + tab + "\t"
494 |                         line += value.replace("\n", "\n"+tab+"\t")
495 |                     else:
496 |                         line += " "
497 |                         line += value.replace("\n", "\n"+tab)
498 |                 else:
499 |                     line += " " + str(value)
500 |         return line
501 | 
502 |     @staticmethod
503 |     def display_crawled_ns(information):
504 |         """
505 |         Displays endpoints details published on the WordPress instance
506 |         param information: information as a JSON object
507 |         """
508 |         print()
509 |         for url,data in information.items():
510 |             line = "\n"
511 |             line += url
512 |             tab = "\t"
513 |             line += InfoDisplayer.recurse_list_or_dict(data, tab)
514 |             print(line)
515 |         print()
516 | 
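`InfoDisplayer.recurse_list_or_dict` walks the decoded JSON returned by the API and indents each nesting level with a tab. The following is a simplified, standalone sketch of that recursive technique, not the project's code. It assumes plain JSON-decoded input (dicts, lists, strings, numbers, booleans), returns a list of lines instead of one string, and does not re-indent multi-line strings. The name `render_tree` is hypothetical.

```python
import json


def render_tree(data, indent=0):
    """Return the lines of an indented rendering of JSON-decoded data.

    Hypothetical helper: each dict key and each scalar list item gets its
    own line, nested containers are indented one tab deeper, and scalar
    dict values are printed after their key on the same line.
    """
    pad = "\t" * indent
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)):
                lines.append(pad + str(key))
                lines.extend(render_tree(value, indent + 1))
            else:
                lines.append("%s%s %s" % (pad, key, value))
    elif isinstance(data, list):
        for value in data:
            if isinstance(value, (dict, list)):
                lines.extend(render_tree(value, indent + 1))
            else:
                lines.append(pad + str(value))
    else:
        # A bare scalar at the top level.
        lines.append(pad + str(data))
    return lines


if __name__ == "__main__":
    endpoint = json.loads(
        '{"namespace": "wp/v2", "methods": ["GET", "POST"],'
        ' "args": {"context": {"required": false}}}'
    )
    print("\n".join(render_tree(endpoint)))
```

Returning a list of lines keeps the recursion free of string-concatenation bookkeeping such as the "is this the last element?" newline check; the caller joins the lines once.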


--------------------------------------------------------------------------------
/lib/interactive.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy
  5 | of this software and associated documentation files (the "Software"), to deal
  6 | in the Software without restriction, including without limitation the rights
  7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  8 | copies of the Software, and to permit persons to whom the Software is
  9 | furnished to do so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import cmd
 24 | import argparse
 25 | import shlex
 26 | import sys
 27 | import re
 28 | import copy
 29 | import os
 30 | 
 31 | from lib.wpapi import WPApi, WordPressApiNotV2
 32 | from lib.requestsession import RequestSession
 33 | from lib.console import Console
 34 | from lib.infodisplayer import InfoDisplayer
 35 | from lib.exporter import Exporter
 36 | from lib.utils import get_by_id
 37 | 
 38 | class ArgumentParser(argparse.ArgumentParser):
 39 |     """
 40 |     Wrapper for argparse.ArgumentParser that keeps the help and error handlers from quitting the application after display
 41 |     """
 42 |     def __init__(self, prog="", description=""):
 43 |         argparse.ArgumentParser.__init__(self, prog=prog, add_help=False, description=description)
 44 |         self.add_argument("--help", "-h", help="print this help", action="store_true")
 45 |         self.should_help = True
 46 | 
 47 |     def custom_parse_args(self, args):
 48 |         args = self.parse_args(shlex.split(args))
 49 |         if args.help:
 50 |             if self.should_help:
 51 |                 self.print_help(sys.stdout)
 52 |                 print()
 53 |             self.should_help = False
 54 |             return None
 55 |         if self.should_help:
 56 |             return args
 57 |         else:
 58 |             return None
 59 | 
 60 |     def error(self, message):
 61 |         if self.should_help:
 62 |             self.print_help(sys.stdout)
 63 |             print()
 64 |             self.should_help = False
 65 | 
 66 | class InteractiveShell(cmd.Cmd):
 67 |     """
 68 |     The interactive shell for the application
 69 |     """
 70 |     intro = """
 71 |     Entering interactive session
 72 |     Use the 'help' command to get a list of available commands and parameters, 'exit' to quit
 73 |     `command -h` gives more details about a command
 74 |     """
 75 |     prompt = "> "
 76 | 
 77 |     def __init__(self, target, session, version):
 78 |         cmd.Cmd.__init__(self)
 79 |         self.target = target
 80 |         InteractiveShell.prompt = Console.red + target + Console.normal + " > "
 81 |         self.session = session
 82 |         self.version = version
 83 |         self.scanner = WPApi(self.target, session=session)
 84 | 
 85 |     @staticmethod
 86 |     def export_decorator(export_func, is_all, export_str, json, csv, values, kwargs = {}):
 87 |         if json is not None:
 88 |             json_file = json
 89 |             if is_all:
 90 |                 json_file = json + "-" + export_str
 91 |             args = [values]
 92 |             args.append(Exporter.JSON)
 93 |             args.append(json_file)
 94 |             export_func(*args, **kwargs)
 95 |         if csv is not None:
 96 |             csv_file = csv
 97 |             if is_all:
 98 |                 csv_file = csv + "-" + export_str
 99 |             args = [values]
100 |             args.append(Exporter.CSV)
101 |             args.append(csv_file)
102 |             export_func(*args, **kwargs)
103 |     
104 |     def get_fetch_or_list_type(self, obj_type, plural=False):
105 |         """
106 |             Returns a dict containing all necessary metadata
107 |              about the obj_type to list and fetch data
108 | 
109 |             :param obj_type: the type of the object
110 |             :param plural: whether the name must be plural or not
111 |         """
112 |         display_func = None
113 |         export_func = None
114 |         additional_info = {}
115 |         obj_name = ""
116 |         if obj_type == WPApi.USER:
117 |             display_func = InfoDisplayer.display_users
118 |             export_func = Exporter.export_users
119 |             additional_info = {}
120 |             obj_name = "Users" if plural else "User"
121 |         elif obj_type == WPApi.TAG:
122 |             display_func = InfoDisplayer.display_tags
123 |             export_func = Exporter.export_tags
124 |             additional_info = {}
125 |             obj_name = "Tags" if plural else "Tag"
126 |         elif obj_type == WPApi.CATEGORY:
127 |             display_func = InfoDisplayer.display_categories
128 |             export_func = Exporter.export_categories
129 |             additional_info = {
130 |                 'category_list': self.scanner.categories
131 |             }
132 |             obj_name = "Categories" if plural else "Category"
133 |         elif obj_type == WPApi.POST:
134 |             display_func = InfoDisplayer.display_posts
135 |             export_func = Exporter.export_posts
136 |             additional_info = {
137 |                 'tags_list': self.scanner.tags,
138 |                 'categories_list': self.scanner.categories,
139 |                 'users_list': self.scanner.users
140 |             }
141 |             obj_name = "Posts" if plural else "Post"
142 |         elif obj_type == WPApi.PAGE:
143 |             display_func = InfoDisplayer.display_pages
144 |             export_func = Exporter.export_pages
145 |             additional_info = {
146 |                 'parent_pages': self.scanner.pages,
147 |                 'users': self.scanner.users
148 |             }
149 |             obj_name = "Pages" if plural else "Page"
150 |         elif obj_type == WPApi.COMMENT:
151 |             display_func = InfoDisplayer.display_comments
152 |             export_func = Exporter.export_comments_interactive
153 |             additional_info = {
154 |                 #'parent_posts': self.scanner.posts, # May be too verbose
155 |                 'users': self.scanner.users
156 |             }
157 |             obj_name = "Comments" if plural else "Comment"
158 |         elif obj_type == WPApi.MEDIA:
159 |             display_func = InfoDisplayer.display_media
160 |             export_func = Exporter.export_media
161 |             additional_info = {'users': self.scanner.users}
162 |             obj_name = "Media"
163 |         elif obj_type == WPApi.NAMESPACE:
164 |             display_func = InfoDisplayer.display_namespaces
165 |             export_func = Exporter.export_media
166 |             additional_info = {}
167 |             obj_name = "Namespaces" if plural else "Namespace"
168 | 
169 |         return {
170 |             "display_func": display_func,
171 |             "export_func": export_func,
172 |             "additional_info": additional_info,
173 |             "obj_name": obj_name
174 |         }
175 | 
176 |     def fetch_obj(self, obj_type, obj_id, cache=True, json=None, csv=None):
177 |         """
178 |             Displays and exports (if relevant) the object fetched by ID
179 | 
180 |             :param obj_type: the type of the object
181 |             :param obj_id: the ID of the obj
182 |             :param cache: whether to use the cache or not
183 |             :param json: json export filename
184 |             :param csv: csv export filename
185 |         """
186 |         prop = self.get_fetch_or_list_type(obj_type)
187 |         print(prop["obj_name"] + " details")
188 |         try:
189 |             obj = self.scanner.get_obj_by_id(obj_type, obj_id, use_cache=cache)
190 |             if len(obj) == 0:
191 |                 Console.log_info(prop["obj_name"] + " not found\n")
192 |             else:
193 |                 prop["display_func"](obj, details=True)
194 |                 if len(prop["additional_info"].keys()) > 0:
195 |                     InteractiveShell.export_decorator(prop["export_func"], False, "", json, csv, obj, prop["additional_info"])
196 |                 else:
197 |                     InteractiveShell.export_decorator(prop["export_func"], False, "", json, csv, obj)
198 |         except WordPressApiNotV2:
199 |             Console.log_error("The API does not support WP V2")
200 |         except IOError as e:
201 |             Console.log_error("Could not open %s for writing" % e.filename)
202 |         print()
203 |     
204 |     def list_obj(self, obj_type, start, limit, is_all=False, cache=True, json=None, csv=None):
205 |         """
206 |             Displays and exports (if relevant) the object list
207 | 
208 |             :param obj_type: the type of the object
209 |             :param start: the offset of the first object
210 |             :param limit: the maximum number of objects to list
211 |             :param is_all: are all object types requested?
212 |             :param cache: whether to use the cache or not
213 |             :param json: json export filename
214 |             :param csv: csv export filename
215 |         """
216 |         prop = self.get_fetch_or_list_type(obj_type, plural=True)
217 |         print(prop["obj_name"] + " details")
218 |         try:
219 |             kwargs = {}
220 |             if obj_type == WPApi.POST:
221 |                 kwargs = {"comments": False}
222 |             obj_list = self.scanner.get_obj_list(obj_type, start, limit, cache, kwargs=kwargs)
223 |             prop["display_func"](obj_list)
224 |             InteractiveShell.export_decorator(prop["export_func"], is_all, prop["obj_name"].lower(), json, csv, obj_list)
225 |         except WordPressApiNotV2:
226 |             Console.log_error("The API does not support WP V2")
227 |         except IOError as e:
228 |             Console.log_error("Could not open %s for writing" % e.filename)
229 |         print()
230 | 
231 |     def do_exit(self, arg):
232 |         'Exit wp-json-scraper'
233 |         return True
234 |     
235 |     def do_show(self, arg):
236 |         'Shows information about parameters in memory'
237 |         parser = ArgumentParser(prog='show', description='show information about global parameters')
238 |         parser.add_argument("what", nargs='?', choices=['all', 'target', 'proxy', 'cookies', 'credentials', 'version'],
239 |         help='choose the information to be displayed', default='all')
240 |         args = parser.custom_parse_args(arg)
241 |         if args is None:
242 |             return
243 |         if args.what == 'all' or args.what == 'target':
244 |             print("Target: %s" % self.target)
245 |         if args.what == 'all' or args.what == 'proxy':
246 |             proxies = self.session.get_proxies()
247 |             if proxies is not None and len(proxies) > 0:
248 |                 print("Proxies:")
249 |                 for key, value in proxies.items():
250 |                     print("\t%s: %s" % (key, value))
251 |             else:
252 |                 print("Proxy: none")
253 |         if args.what == 'all' or args.what == 'cookies':
254 |             cookies = self.session.get_cookies()
255 |             if len(cookies) > 0:
256 |                 print("Cookies:")
257 |                 for key, value in cookies.items():
258 |                     print("\t%s: %s" % (key, value))
259 |             else:
260 |                 print("Cookies: none")
261 |         if args.what == 'all' or args.what == 'credentials':
262 |             credentials = self.session.get_creds()
263 |             if credentials is not None:
264 |                 creds_str = "Credentials: "
265 |                 for el in credentials:
266 |                     creds_str += el + ":"
267 |                 print(creds_str[:-1])
268 |             else:
269 |                 print("Credentials: none")
270 |         if args.what == 'all' or args.what == 'version':
271 |             print("WPJsonScraper version: %s" % self.version)
272 |         print()
273 |     
274 |     def do_set(self, arg):
275 |         'Sets a global parameter of WPJsonScraper'
276 |         parser = ArgumentParser(prog='set', description='sets global parameters for WPJsonScraper')
277 |         parser.add_argument("what", choices=['target', 'proxy', 'cookies', 'credentials'],
278 |         help='the parameter to set')
279 |         parser.add_argument("value", type=str, help='the new value of the parameter (for cookies, set as cookie string: "n1=v1; n2=v2")')
280 |         args = parser.custom_parse_args(arg)
281 |         if args is None:
282 |             return
283 |         if args.what == 'target':
284 |             self.target = args.value
285 |             if re.match(r'^https?://.*$', self.target) is None:
286 |                 self.target = "http://" + self.target
287 |             if re.match(r'^.+/$', self.target) is None:
288 |                 self.target += "/"
289 |             InteractiveShell.prompt = Console.red + self.target + Console.normal + " > "
290 |             print("target = %s" % self.target)
291 |             self.scanner = WPApi(self.target, session=self.session)
292 |             Console.log_info("Cache is erased but session stays the same (with cookies and authorization)")
293 |         elif args.what == 'proxy':
294 |             self.session.set_proxy(args.value)
295 |             print("proxy = %s" % args.value)
296 |         elif args.what == 'cookies':
297 |             self.session.set_cookies(args.value)
298 |             print("Cookies set!")
299 |         elif args.what == "credentials":
300 |             authorization_list = args.value.split(':')
301 |             if len(authorization_list) == 1:
302 |                 authorization = (authorization_list[0], '')
303 |             elif len(authorization_list) >= 2:
304 |                 authorization = (authorization_list[0],
305 |                 ':'.join(authorization_list[1:]))
306 |             self.session.set_creds(authorization)
307 |             print("Credentials set!")
308 |         print()
309 | 
310 |     def do_list(self, arg):
311 |         'Gets the list of something from the server'
312 |         parser = ArgumentParser(prog='list', description='gets a list of something from the server')
313 |         parser.add_argument("what", choices=[
314 |             'posts', 
315 |             #'post-revisions', 
316 |             #'wp-blocks', 
317 |             'categories',
318 |             'tags',
319 |             'pages',
320 |             'comments',
321 |             'media',
322 |             'users',
323 |             #'themes',
324 |             #'search-results',
325 |             'namespaces',
326 |             'all',
327 |             ],
328 |             help='what to list')
329 |         parser.add_argument("--json", "-j", help="list and store as json to the specified file")
330 |         parser.add_argument("--csv", "-c", help="list and store as csv to the specified file")
331 |         parser.add_argument("--limit", "-l", type=int, help="limit the number of results")
332 |         parser.add_argument("--start", "-s", type=int, help="start at the given index")
333 |         parser.add_argument("--no-cache", dest="cache", action="store_false", help="don't look up in the cache and ask the server directly")
334 |         args = parser.custom_parse_args(arg)
335 |         if args is None:
336 |             return
337 |         # The checks must be ordered by dependencies
338 |         kwargs = {
339 |             "start": args.start, 
340 |             "limit": args.limit, 
341 |             "is_all": args.what == "all", 
342 |             "cache": args.cache, 
343 |             "json": args.json, 
344 |             "csv": args.csv
345 |         }
346 |         if args.what == "all" or args.what == "users":
347 |             self.list_obj(WPApi.USER, **kwargs)
348 |         if args.what == "all" or args.what == "tags":
349 |             self.list_obj(WPApi.TAG, **kwargs)
350 |         if args.what == "all" or args.what == "categories":
351 |             self.list_obj(WPApi.CATEGORY, **kwargs)
352 |         if args.what == "all" or args.what == "posts":
353 |             self.list_obj(WPApi.POST, **kwargs)
354 |         if args.what == "all" or args.what == "pages":
355 |             self.list_obj(WPApi.PAGE, **kwargs)
356 |         if args.what == "all" or args.what == "comments":
357 |             self.list_obj(WPApi.COMMENT, **kwargs)
358 |         if args.what == "all" or args.what == "media":
359 |             self.list_obj(WPApi.MEDIA, **kwargs)
360 |         if args.what == "all" or args.what == "namespaces":
361 |             self.list_obj(WPApi.NAMESPACE, **kwargs)
362 | 
363 |     def do_fetch(self, arg):
364 |         'Fetches a specific content specified by ID'
365 |         parser = ArgumentParser(prog='fetch', description='fetches something from the server or the cache by ID')
366 |         parser.add_argument("what", choices=[
367 |             'post', 
368 |             #'post-revision', 
369 |             #'wp-block', 
370 |             'category',
371 |             'tag',
372 |             'page',
373 |             'comment',
374 |             'media',
375 |             'user',
376 |             #'theme',
377 |             #'search-result',
378 |             ],
379 |             help='what to fetch')
380 |         parser.add_argument("id", type=int, help='the ID of the content to fetch')
381 |         parser.add_argument("--json", "-j", help="fetch and store as json to the specified file")
382 |         parser.add_argument("--csv", "-c", help="fetch and store as csv to the specified file")
383 |         parser.add_argument("--no-cache", dest="cache", action="store_false", help="don't look up in the cache and ask the server directly")
384 |         args = parser.custom_parse_args(arg)
385 |         what_type = None
386 |         if args is None:
387 |             return
388 |         what_type = WPApi.str_type_to_native(args.what)
389 |         
390 |         if what_type is not None:
391 |             self.fetch_obj(what_type, args.id, cache=args.cache, json=args.json, csv=args.csv)
392 |         else:
393 |             print("Not implemented")
394 |             print()
395 |     
396 |     def do_search(self, arg):
397 |         'Looks for specific keywords in the WordPress API'
398 |         parser = ArgumentParser(prog='search', description='searches something from the server')
399 |         parser.add_argument("--type", "-t", action="append", choices=[
400 |             'all',
401 |             'post', 
402 |             #'post-revision', 
403 |             #'wp-block', 
404 |             'category',
405 |             'tag',
406 |             'page',
407 |             'comment',
408 |             'media',
409 |             'user',
410 |             #'theme',
411 |             #'search-result',
412 |             ],
413 |             help='the types to look for (default all)',
414 |             dest='what'
415 |             )
416 |         parser.add_argument("keywords", help='the keywords to look for')
417 |         parser.add_argument("--json", "-j", help="list and store as json to the specified file(s)")
418 |         parser.add_argument("--csv", "-c", help="list and store as csv to the specified file(s)")
419 |         parser.add_argument("--limit", "-l", type=int, help="limit the number of results")
420 |         parser.add_argument("--start", "-s", type=int, help="start at the given index")
421 |         args = parser.custom_parse_args(arg)
422 |         if args is None:
423 |             return
424 |         what_types = WPApi.convert_obj_types_to_list(args.what)
425 |         results = self.scanner.search(what_types, args.keywords, args.start, args.limit)
426 |         print()
427 |         for k, v in results.items():
428 |             prop = self.get_fetch_or_list_type(k, plural=True)
429 |             print(prop["obj_name"] + " details")
430 |             if len(v) == 0:
431 |                 Console.log_info("No result")
432 |             else:
433 |                 try:
434 |                     prop["display_func"](v)
435 |                     InteractiveShell.export_decorator(
436 |                         prop["export_func"],
437 |                         len(what_types) > 1 or WPApi.ALL_TYPES in what_types,
438 |                         prop["obj_name"].lower(),
439 |                         args.json,
440 |                         args.csv,
441 |                         v
442 |                     )
443 |                 except WordPressApiNotV2:
444 |                     Console.log_error("The API does not support WP V2")
445 |                 except IOError as e:
446 |                     Console.log_error("Could not open %s for writing" % e.filename)
447 |             print()
448 | 
449 |     def do_dl(self, arg):
450 |         'Downloads a media file (e.g. from /wp-content/uploads/) based on its ID'
451 | 
452 |         parser = ArgumentParser(prog='dl', description='downloads a media from the server')
453 |         parser.add_argument("ids", help='ids to look for (comma separated), "all" or "cache"')
454 |         parser.add_argument("dest", help='destination folder')
455 |         parser.add_argument("--no-cache", dest="cache", action="store_false", help="don't look up in the cache and ask the server directly")
456 |         parser.add_argument("--use-slug", dest="slug", action="store_true", help="use the slug as filename and not the source URL name")
457 |         args = parser.custom_parse_args(arg)
458 |         if args is None:
459 |             return
460 |         
461 |         if not os.path.isdir(args.dest):
462 |             Console.log_error("The destination is not a folder or does not exist")
463 |             return
464 | 
465 |         print("Pulling the media URLs")
466 |         media, slugs = self.scanner.get_media_urls(args.ids, args.cache)
467 |         if len(media) == 0:
468 |             Console.log_error("No media found corresponding to the criteria")
469 |             return
470 |         print("%d media URLs found" % len(media))
471 |         answer = input("Do you wish to proceed to download? (y/N) ")
472 |         if answer.lower() != "y":
473 |             return
474 |         print("Note: Only files over 10MB are logged here")
475 | 
476 |         number_downloaded = 0
477 |         if args.slug:
478 |             number_downloaded = Exporter.download_media(media, args.dest, slugs)
479 |         else:
480 |             number_downloaded = Exporter.download_media(media, args.dest)
481 |         print('Downloaded %d media to %s' % (number_downloaded, args.dest))
482 | 
483 | def start_interactive(target, session, version):
484 |     """
485 |     Starts a new interactive session
486 |     """
487 |     InteractiveShell(target, session, version).cmdloop()
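The `ArgumentParser` wrapper in `interactive.py` overrides `error()` and tracks a `should_help` flag so that bad input inside the `cmd.Cmd` loop prints help instead of terminating the process. A common alternative, sketched below under the assumption that raising an exception is acceptable, is to make `error()` raise and to catch that exception (plus the `SystemExit` raised by the built-in `-h` action) where the command line is parsed. `ShellArgumentError`, `ShellArgumentParser`, `parse_command` and `split_credentials` are hypothetical names; `split_credentials` reproduces the `user:password` splitting done in `do_set`, using `str.partition`.

```python
import argparse
import shlex


class ShellArgumentError(Exception):
    """Raised instead of exiting when a shell command has bad arguments."""


class ShellArgumentParser(argparse.ArgumentParser):
    def error(self, message):
        # The default error() prints usage and calls sys.exit(2);
        # raising instead keeps the interactive loop alive.
        raise ShellArgumentError(message)


def parse_command(parser, line):
    """Parse one shell line; return a Namespace, or None on error or help."""
    try:
        return parser.parse_args(shlex.split(line))
    except (ShellArgumentError, ValueError) as err:
        # ValueError comes from shlex.split on unbalanced quotes.
        parser.print_usage()
        print("error: %s" % err)
        return None
    except SystemExit:
        # The built-in -h/--help action prints help, then calls exit().
        return None


def split_credentials(value):
    """Split 'user:password' at the first colon; the password may contain ':'."""
    user, _, password = value.partition(":")
    return (user, password)


if __name__ == "__main__":
    parser = ShellArgumentParser(prog="show")
    parser.add_argument("what", nargs="?", default="all",
                        choices=["all", "target", "proxy"])
    print(parse_command(parser, "target"))   # Namespace(what='target')
    print(parse_command(parser, "bogus"))    # usage and error, then None
    print(split_credentials("admin:p:ss"))   # ('admin', 'p:ss')
```

Because `error()` raises, parsing stops at the first problem, so no flag is needed to avoid printing the help text twice; each `do_*` method can simply return when `parse_command` gives back `None`.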


--------------------------------------------------------------------------------
/lib/plugins/plugin_list.csv:
--------------------------------------------------------------------------------
  1 | oembed/1.0,Allows embedded representation of a URL,
  2 | contact-form-7/v1,Manages multiple contact forms,https://wordpress.org/plugins/contact-form-7/
  3 | wc/v1,WooCommerce is a free eCommerce plugin that allows you to sell anything,https://wordpress.org/plugins/woocommerce/
  4 | wc/v2,WooCommerce is a free eCommerce plugin that allows you to sell anything,https://wordpress.org/plugins/woocommerce/
  5 | facebook/v1,,
  6 | regenerate-thumbnails/v1,Regenerate Thumbnails allows you to regenerate all thumbnail sizes for one or more images,https://wordpress.org/plugins/regenerate-thumbnails/
  7 | wp/v2,The default API integrated since WordPress 4.7,https://developer.wordpress.org/rest-api/
  8 | akismet/v1,Akismet checks comments and contact form submissions against a global database of spam,https://wordpress.org/plugins/akismet/
  9 | yoast/v1,Yoast SEO is a WordPress SEO plugin,https://wordpress.org/plugins/wordpress-seo/
 10 | wp-super-cache/v1,This plugin generates static html files from your dynamic WordPress blog,https://wordpress.org/plugins/wp-super-cache/
 11 | script-manager/v1,,
 12 | jetpack/v4,Hassle-free design and marketing,https://wordpress.org/plugins/jetpack/
 13 | redirection/v1,Redirection is the most popular redirect manager for WordPress,https://wordpress.org/plugins/redirection/
 14 | tribe/events/v1,Create and manage an events calendar,https://wordpress.org/plugins/the-events-calendar/
 15 | 2fa/v1,,
 16 | wpsc/v1,,
 17 | v1/products/,,
 18 | v1/cart/,,
 19 | v1/,,
 20 | post-views-counter,Counts views of posts of the website,https://wordpress.org/plugins/post-views-counter/
 21 | frm-admin/v1,,
 22 | listo/v1,Listo is a simple plugin that supplies other plugins and themes with commonly used lists,https://wordpress.org/plugins/listo/
 23 | themeisle-sdk/v1,,
 24 | bogo/v1,Bogo is a straight-forward multilingual plugin for WordPress,https://wordpress.org/plugins/bogo/
 25 | envira/v1,Responsive Image Gallery for WordPress,https://wordpress.org/plugins/envira-gallery-lite/
 26 | disqus/v1,Disqus is the web’s most popular commenting system,https://wordpress.org/plugins/disqus-comment-system/
 27 | invitations-for-slack/v1,Invitations for Slack allows you to show “Join us on Slack.” buttons,https://wordpress.org/plugins/invitations-for-slack/
 28 | rop/v1,Revive Old Posts helps to keep the old posts alive by automatically sharing them on Social Networks,https://wordpress.org/plugins/tweet-old-post/
 29 | cf-api/v2,,
 30 | thrive,,
 31 | om-cc,,
 32 | om/fiw,,
 33 | tatsu/v1,,
 34 | semplice/v1/editor,,
 35 | semplice/v1/admin,,
 36 | semplice/v1/frontend,,
 37 | jwt-auth/v1,,
 38 | pum/v1,,
 39 | deliciousbrains/v1,,
 40 | sportspress/v2,Creates a professional sports website,https://wordpress.org/plugins/sportspress/
 41 | content-forms/v1,,
 42 | wp_live_chat_support/v1,Fully functional Live Chat plugin,https://wordpress.org/plugins/wp-live-chat-support/
 43 | if-menu/v1,Control what menu items visitors see based on visibility rules,https://wordpress.org/plugins/if-menu/
 44 | iowd/v1,,
 45 | save,,
 46 | facetwp/v1/,,
 47 | slimstat/v1,A web analytics plugin for WordPress,https://wordpress.org/plugins/wp-slimstat/
 48 | social-share/v1,,
 49 | social-counts/v1,,
 50 | swp_api,,
 51 | app/v2,,
 52 | alids/v1/,,
 53 | template-directory,,
 54 | customify/v1,With Customify developers can easily create advanced theme-specific options inside the WordPress Customizer,https://wordpress.org/plugins/customify/
 55 | pixcare/v1,,
 56 | codepinch/v1,A website error corrector?,https://wordpress.org/plugins/wp-error-fix/
 57 | blc/v1,Broken Link Checker?,https://wordpress.org/plugins/broken-link-checker/
 58 | visualizer/v1,,
 59 | td-composer,,
 60 | tdw,,
 61 | mpp/v1,,
 62 | wooketing/v1,,
 63 | gf/v2,,
 64 | wpcsp/v1,Sets the CSP settings and adds them to the page the visitor requested,https://wordpress.org/plugins/wp-content-security-policy/
 65 | instant-images,One click uploads of Unsplash photos,https://wordpress.org/plugins/instant-images/
 66 | api,,
 67 | templates-directory,,
 68 | rollbar/v1,Rollbar collects errors and allows you to analyze them,https://wordpress.org/plugins/rollbar/
 69 | liveblog/v1,Quick and simple blogging for following fast-paced events,https://wordpress.org/plugins/liveblog/
 70 | integrity-checker/v1,Verifies that all installed code is identical to its original version and more,https://wordpress.org/plugins/integrity-checker/
 71 | pll/v1,,
 72 | wp-post-modal/v1,,
 73 | quiz-survey-master/v1,Creates surveys for the users,https://wordpress.org/plugins/quiz-master-next/
 74 | rp-wapi/v1,,
 75 | wc-product-add-ons/v1,WooCommerce PPOM (Personalized Product Option Manager) Plugin adds input fields on product page,https://wordpress.org/plugins/search/wc+products/
 76 | wpglib/v1,,
 77 | tcm/v1,,
 78 | affwp/v1,,
 79 | custom-api/v1,,
 80 | wplr/v1,"Synchronizes photos, collections, keywords and metadata between Lightroom and WordPress",https://wordpress.org/plugins/search/wplr/
 81 | acf/v3,Exposes Advanced Custom Fields Endpoints in the WordPress REST API,https://wordpress.org/plugins/acf-to-rest-api/
 82 | pp/v1,,
 83 | dooplay,,
 84 | dbmovies,,
 85 | pciextranet/v2,,
 86 | cloozi/rest,,
 87 | store-locator-plus/v1,Maps locations on Google Maps,https://wordpress.org/plugins/store-locator-le/
 88 | store-locator-plus/v2,Maps locations on Google Maps,https://wordpress.org/plugins/store-locator-le/
 89 | joinzee-wp/v1,,
 90 | ccf/v1,Custom Contact Forms?,https://wordpress.org/plugins/custom-contact-forms/
 91 | keremiya,,
 92 | pageviews/1.0,A simple and lightweight pageviews counter,https://wordpress.org/plugins/pageviews/
 93 | watchful/v1,,
 94 | shortcode-change,,
 95 | shortcode-insert,,
 96 | upload/,,
 97 | sync/,,
 98 | download/,,
 99 | agroopwoo,,
100 | rest-routes/v2,Building custom endpoints for WP REST API made easy,https://wordpress.org/plugins/rest-routes/
101 | pvc/v1,,
102 | ee/v4.8.29,,
103 | ee/v4.8.33,,
104 | ee/v4.8.34,,
105 | ee/v4.8.36,,
106 | vegashero/v1,,
107 | ml-api/v2,,
108 | mwl/v1,,
109 | envira-background/v1,,
110 | api/v1,,
111 | rta,,
112 | stec/v2,,
113 | erp/v1,,
114 | autofill/v1,,
115 | /autofill/v1,,
116 | rest/v1,,
117 | wp/v2/acf,,
118 | ms/api,,
119 | siso/v1,,
120 | dp/v1,,
121 | indieauth/1.0,IndieAuth is a way for doing Web sign-in,https://wordpress.org/plugins/indieauth/
122 | sloc_geo/1.0,,
123 | link-preview/1.0,Display a preview for a URL similar to sharing a link on Facebook,https://wordpress.org/plugins/wp-link-preview/
124 | webmention/1.0,Enable conversation across the web,https://wordpress.org/plugins/webmention/
125 | bballs,,
126 | logbook/v1,This plugin is for logging users' activities,https://wordpress.org/plugins/search/logbook/
127 | child-themify/v1,Create child themes with the click of a button,https://wordpress.org/plugins/child-themify/
128 | versionpress,,
129 | keliron/api/v3,,
130 | bablic,Translate WP with this multilingual plugin,https://wordpress.org/plugins/bablic/
131 | eum/v1,,
132 | tvo/v1,,
133 | frm/v2,,
134 | app-mobile,,
135 | ap3/v1,,
136 | diets/v1,,
137 | manage-customers/v1,,
138 | leads,WordPress Leads?,https://wordpress.org/plugins/leads/
139 | commentcava/v1.0,CommentCaVa disables the comment field for a certain amount of time,https://wordpress.org/plugins/commentcava/
140 | lscf_rest,Advanced WordPress Filter Plugin,https://wordpress.org/plugins/live-search-custom-fields-lite/
141 | wpv/v1,,
142 | tho/v1,,
143 | aghigh/v1,,
144 | spnl/v1,A Newsletter Plugin for WordPress,https://wordpress.org/plugins/search/spnl/
145 | task_manager/v1,Task manager,https://wordpress.org/plugins/task-manager/
146 | customfiy/v1,Theme Customizer Booster,https://wordpress.org/plugins/customify/
147 | CHifcoRegCardPluginV2/v1,,
148 | CHifcoFireBasePlugin/v1,,
149 | CHifcoFireBaseVII/v2,,
150 | wp-api-menus/v2,,
151 | envira-lightroom/v3,Envira Gallery allows you to create photo galleries and video galleries,https://wordpress.org/plugins/envira-gallery-lite/
152 | comments/v1,,
153 | addcomment/v1,,
154 | pf/v1,,
155 | postmatic/v1,,
156 | ivole/v1,Customer Reviews for WooCommerce?,https://wordpress.org/plugins/customer-reviews-woocommerce/
157 | shwcp/v1,,
158 | wp-rest-api-log,WordPress plugin to log REST API requests and responses,https://wordpress.org/plugins/wp-rest-api-log/
159 | wk/v1,,
160 | sfp-live-search/v1,,
161 | csco/v1,,
162 | caos/v1,A plugin that inserts the Analytics tracking code,https://wordpress.org/plugins/host-analyticsjs-local/
163 | rest/events,,
164 | obfx-google-analytics,,
165 | shariff/v1,Shariff provides share buttons that respect the privacy of visitors,https://wordpress.org/plugins/shariff/
166 | wp-discourse/v1,This plugin allows you to use Discourse as a community engine,https://wordpress.org/plugins/wp-discourse/
167 | dbmvs,,
168 | wp-crm/v1/form,This plugin is intended to significantly improve user management,https://wordpress.org/plugins/wp-crm/
169 | gutenberg/v1,A new editing experience for WordPress,https://wordpress.org/plugins/gutenberg/
170 | tribe_events/v2,,
171 | rnet/v1,,
172 | eklo/v2,,
173 | menus/v1,,
174 | sow/v1,,
175 | wpbooklist/v1,"Used to sell books, record and catalog a library",https://wordpress.org/plugins/wpbooklist/
176 | tabulate,This plugin provides a simple user-friendly interface to tables in the database,https://wordpress.org/plugins/tabulate/
177 | geoblog/v1,,
178 | acf/v2,,
179 | mobilegate/v2,,
180 | jamtrap/v1,,
181 | paf,,
182 | in-cron/v1,,
183 | awb/v1,"AWB allows you to use parallax backgrounds with images, videos, YouTube and Vimeo",https://wordpress.org/plugins/advanced-backgrounds/
184 | wctofb/v1,WooCommerce to facebook shop,https://wordpress.org/plugins/woo-to-facebook-shop/
185 | weekly-class/v1,Generate a weekly schedule of classes,https://wordpress.org/plugins/weekly-class-schedule/
186 | be-to-tatsu/v1,,
187 | braintree-gateway/v1/,A payment gateway,
188 | bfwc/settings/kount/,,
189 | gembloong/,,
190 | 
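Each row of the plugin list maps a REST namespace to an optional description and an optional plugin URL. A minimal sketch of loading such a file with Python's `csv` module, assuming the three-column layout shown above (the helper `load_plugin_list` and the inline sample are illustrative, not the project's own loader). Descriptions that contain commas must be quoted to stay in one field:

```python
import csv
import io

# Hypothetical sample rows in the namespace,description,url layout used above
SAMPLE = '''wplr/v1,"Synchronizes photos, collections, keywords and metadata between Lightroom and WordPress",https://wordpress.org/plugins/search/wplr/
pll/v1,,
gutenberg/v1,A new editing experience for WordPress,https://wordpress.org/plugins/gutenberg/
'''


def load_plugin_list(text):
    """Returns a dict mapping a namespace to a (description, url) tuple."""
    plugins = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # Pad short rows so every entry yields exactly three fields
        namespace, description, url = (row + ['', '', ''])[:3]
        plugins[namespace.strip()] = (description.strip(), url.strip())
    return plugins


plugins = load_plugin_list(SAMPLE)
print(plugins['wplr/v1'][0])
```

Without the quotes, a description such as `Used to sell books, record and catalog a library` would be split into two fields by the CSV reader, pushing the URL into a fourth column.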


--------------------------------------------------------------------------------
/lib/requestsession.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy
  5 | of this software and associated documentation files (the "Software"), to deal
  6 | in the Software without restriction, including without limitation the rights
  7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  8 | copies of the Software, and to permit persons to whom the Software is
  9 | furnished to do so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | from http.cookies import SimpleCookie
 24 | import requests
 25 | 
 26 | from lib.console import Console
 27 | 
 28 | class ConnectionCouldNotResolve(Exception):
 29 |     pass
 30 | 
 31 | class ConnectionReset(Exception):
 32 |     pass
 33 | 
 34 | class ConnectionRefused(Exception):
 35 |     pass
 36 | 
 37 | class ConnectionTimeout(Exception):
 38 |     pass
 39 | 
 40 | class HTTPError400(Exception):
 41 |     pass
 42 | 
 43 | class HTTPError401(Exception):
 44 |     pass
 45 | 
 46 | class HTTPError403(Exception):
 47 |     pass
 48 | 
 49 | class HTTPError404(Exception):
 50 |     pass
 51 | 
 52 | class HTTPError500(Exception):
 53 |     pass
 54 | 
 55 | class HTTPError502(Exception):
 56 |     pass
 57 | 
 58 | class HTTPError(Exception):
 59 |     pass
 60 | 
 61 | class RequestSession:
 62 |     """
 63 |     Wrapper to handle the requests library with session support
 64 |     """
 65 | 
 66 |     def __init__(self, proxy=None, cookies=None, authorization=None):
 67 |         """
 68 |         Creates a new RequestSession instance
 69 |         param proxy: a string containing the proxy server URL, applied
 70 |         through set_proxy()
 71 |         param cookies: a string in the format of the Cookie header
 72 |         param authorization: a tuple containing login and password or
 73 |         requests.auth.HTTPBasicAuth for basic authentication or
 74 |         requests.auth.HTTPDigestAuth for HTTP Digest authentication
 75 |         """
 76 |         self.s = requests.Session()
 77 |         if proxy is not None:
 78 |             self.set_proxy(proxy)
 79 |         if cookies is not None:
 80 |             self.set_cookies(cookies)
 81 |         if authorization is not None and (
 82 |             type(authorization) is tuple and len(authorization) == 2 or
 83 |             type(authorization) is requests.auth.HTTPBasicAuth or
 84 |             type(authorization) is requests.auth.HTTPDigestAuth):
 85 |             self.s.auth = authorization
 86 | 
 87 |     def get(self, url):
 88 |         """
 89 |         Calls requests' get function through do_request, which turns errors
 90 |         into the exception matching the context
 91 |         """
 92 |         return self.do_request("get", url)
 93 | 
 94 | 
 95 |     def post(self, url, data=None):
 96 |         """
 97 |         Calls the post function from requests but handles errors to raise proper
 98 |         exception following the context
 99 |         """
100 |         return self.do_request("post", url, data)
101 | 
102 |     def do_request(self, method, url, data=None):
103 |         """
104 |         Helper method that groups the requests and handles their exceptions
105 |         in a single place
106 |         """
107 |         response = None
108 |         try:
109 |             if method == "post":
110 |                 response = self.s.post(url, data)
111 |             else:
112 |                 response = self.s.get(url)
113 |         except requests.ConnectionError as e:
114 |             if "Errno -5" in str(e) or "Errno -2" in str(e)\
115 |               or "Errno -3" in str(e):
116 |                 Console.log_error("Could not resolve host %s" % url)
117 |                 raise ConnectionCouldNotResolve
118 |             elif "Errno 111" in str(e):
119 |                 Console.log_error("Connection refused by %s" % url)
120 |                 raise ConnectionRefused
121 |             elif "RemoteDisconnected" in str(e):
122 |                 Console.log_error("Connection reset by %s" % url)
123 |                 raise ConnectionReset
124 |             else:
125 |                 print(e)
126 |                 raise e
127 |         except Exception as e:
128 |             raise e
129 | 
130 |         if response.status_code == 400:
131 |             raise HTTPError400
132 |         elif response.status_code == 401:
133 |             Console.log_error("Error 401 (Unauthorized) while trying to fetch"
134 |             " the API")
135 |             raise HTTPError401
136 |         elif response.status_code == 403:
137 |             Console.log_error("Error 403 (Forbidden) while trying"
138 |             " to fetch the API")
139 |             raise HTTPError403
140 |         elif response.status_code == 404:
141 |             raise HTTPError404
142 |         elif response.status_code == 500:
143 |             Console.log_error("Error 500 (Internal Server Error) while trying"
144 |             " to fetch the API")
145 |             raise HTTPError500
146 |         elif response.status_code == 502:
147 |             Console.log_error("Error 502 (Bad Gateway) while trying"
148 |             " to fetch the API")
149 |             raise HTTPError502
150 |         elif response.status_code > 400:
151 |             Console.log_error("Error %d while trying to fetch the API" %
152 |             response.status_code)
153 |             raise HTTPError
154 | 
155 |         return response
156 |     
157 |     def set_cookies(self, cookies):
158 |         """
159 |         Sets new cookies from a string
160 |         """
161 |         c = SimpleCookie()
162 |         c.load(cookies)
163 |         for key, m in c.items():
164 |             self.s.cookies.set(key, m.value)
165 |     
166 |     def get_cookies(self):
167 |         return self.s.cookies.get_dict()
168 |     
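169 |     def set_proxy(self, proxy):
170 |         # requests picks a proxy by the scheme of the *target* URL, so the
171 |         # proxy is registered for both HTTP and HTTPS targets; the scheme of
172 |         # the proxy URL itself only tells requests how to reach the proxy
173 |         self.s.proxies = {'http': proxy, 'https': proxy}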
174 |     
175 |     def get_proxies(self):
176 |         return self.s.proxies
177 |     
178 |     def set_creds(self, credentials):
179 |         self.s.auth = credentials
180 | 
181 |     def get_creds(self):
182 |         return self.s.auth
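do_request() above translates HTTP status codes into dedicated exceptions with an if/elif chain. A table-driven sketch of the same idea, using hypothetical stand-in exception classes so that it runs on its own (an alternative layout, not the module's code). Unlike the module, the sketch makes every specific class a subclass of the generic `HTTPError`, so a caller can catch all HTTP failures with a single `except HTTPError`:

```python
class HTTPError(Exception):
    """Generic HTTP error (stand-in for lib.requestsession.HTTPError)."""


# Hypothetical stand-ins mirroring the per-status classes of the module
class HTTPError400(HTTPError): pass
class HTTPError401(HTTPError): pass
class HTTPError403(HTTPError): pass
class HTTPError404(HTTPError): pass
class HTTPError500(HTTPError): pass
class HTTPError502(HTTPError): pass


STATUS_EXCEPTIONS = {
    400: HTTPError400,
    401: HTTPError401,
    403: HTTPError403,
    404: HTTPError404,
    500: HTTPError500,
    502: HTTPError502,
}


def check_status(status_code):
    """Raises the exception matching status_code; other 4xx/5xx codes raise
    the generic HTTPError, and anything below 400 passes silently."""
    if status_code in STATUS_EXCEPTIONS:
        raise STATUS_EXCEPTIONS[status_code]("HTTP %d" % status_code)
    if status_code >= 400:
        raise HTTPError("HTTP %d" % status_code)
```

Adding support for a new status code then only requires a new class and one dictionary entry.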


--------------------------------------------------------------------------------
/lib/utils.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy
  5 | of this software and associated documentation files (the "Software"), to deal
  6 | in the Software without restriction, including without limitation the rights
  7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  8 | copies of the Software, and to permit persons to whom the Software is
  9 | furnished to do so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import json
 24 | 
 25 | from urllib.parse import urlsplit, urlunsplit
 26 | 
 27 | def get_by_id(value, id):
 28 |     """
 29 |     Utility function to retrieve a value by its ID in a list of dicts;
 30 |     returns None if no match is found
 31 |     param value: the list of dicts to process
 32 |     param id: the id to get
 33 |     """
 34 |     if value is None:
 35 |         return None
 36 |     for val in value:
 37 |         if 'id' in val.keys() and val['id'] == id:
 38 |             return val
 39 |     return None
 40 | 
 41 | # Neat code part from
 42 | # https://codereview.stackexchange.com/questions/13027/joining-url-path-components-intelligently
 43 | def url_path_join(*parts):
 44 |     """Normalize url parts and join them with a slash."""
 45 |     schemes, netlocs, paths, queries, fragments = \
 46 |     zip(*(urlsplit(part) for part in parts))
 47 |     scheme = first(schemes)
 48 |     netloc = first(netlocs)
 49 |     path = '/'.join(x.strip('/') for x in paths if x)
 50 |     query = first(queries)
 51 |     fragment = first(fragments)
 52 |     return urlunsplit((scheme, netloc, path, query, fragment))
 53 | 
 54 | def first(sequence, default=''):
 55 |     return next((x for x in sequence if x), default)
 56 | 
 57 | # Code from
 58 | # https://stackoverflow.com/questions/3173320/text-progress-bar-in-the-console
 59 | 
 60 | def print_progress_bar (iteration, total, prefix = '', suffix = '', decimals = 1,\
 61 |  length = 100, fill = '█'):
 62 |     """
 63 |     Call in a loop to create terminal progress bar
 64 |     @params:
 65 |         iteration   - Required  : current iteration (Int)
 66 |         total       - Required  : total iterations (Int)
 67 |         prefix      - Optional  : prefix string (Str)
 68 |         suffix      - Optional  : suffix string (Str)
 69 |         decimals    - Optional  : positive number of decimals in percent \
 70 |         complete (Int)
 71 |         length      - Optional  : character length of bar (Int)
 72 |         fill        - Optional  : bar fill character (Str)
 73 |     """
 74 |     try:
 75 |         percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / \
 76 |         float(total)))
 77 |         filledLength = int(length * iteration // total)
 78 |     except Exception:  # e.g. ZeroDivisionError when total is 0
 79 |         percent = 0
 80 |         filledLength = 0
 81 | 
 82 |     bar = fill * filledLength + '-' * (length - filledLength)
 83 |     print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = '\r')
 84 |     # Print New Line on Complete
 85 |     if iteration == total: 
 86 |         print()
 87 | 
 88 | def get_content_as_json (response_obj):
 89 |     """
 90 |     When a BOM is present (see issue #2), UTF-8 is not properly decoded by
 91 |     the Response.json() method. This helper function returns a json value
 92 |     even if a BOM is present in the UTF-8 text
 93 |     @params:
 94 |         response_obj: a requests Response instance
 95 |     @returns: a decoded json object (list or dict)
 96 |     """
 97 |     if response_obj.content[:3]== b'\xef\xbb\xbf': # UTF-8 BOM
 98 |         content = response_obj.content.decode("utf-8-sig")
 99 |         return json.loads(content)
100 |     else:
101 |         try:
102 |             return response_obj.json()
103 |         except ValueError:  # invalid JSON, including requests' JSONDecodeError
104 |             return {}
105 | 
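Both helpers can be exercised without any network access. A sketch that applies the same logic to plain bytes and strings: `decode_json_bytes` is a hypothetical name, while `first` and `url_path_join` are condensed copies of the functions listed above. Unlike get_content_as_json, the sketch always decodes with `utf-8-sig`, which strips a leading BOM when present and behaves like plain UTF-8 otherwise:

```python
import json
from urllib.parse import urlsplit, urlunsplit

UTF8_BOM = b'\xef\xbb\xbf'


def decode_json_bytes(raw):
    """Decodes JSON from bytes, tolerating a leading UTF-8 BOM."""
    # 'utf-8-sig' removes the BOM if there is one and is a no-op otherwise
    return json.loads(raw.decode('utf-8-sig'))


def first(sequence, default=''):
    return next((x for x in sequence if x), default)


def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = \
        zip(*(urlsplit(part) for part in parts))
    path = '/'.join(x.strip('/') for x in paths if x)
    return urlunsplit((first(schemes), first(netlocs), path,
                       first(queries), first(fragments)))


print(decode_json_bytes(UTF8_BOM + b'{"name": "Blog"}'))
print(url_path_join('https://example.com/', 'wp-json/', 'wp/v2/posts?page=1'))
```

The second call shows why WPApi builds every endpoint URL this way: duplicate slashes between the target, the API path and the route are collapsed, and the query string of the last part is preserved.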

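WPApi.crawl_pages in lib/wpapi.py below converts an absolute entry offset `start` into a 1-based page number and an offset inside that page, with a fixed `per_page` of 10. A standalone sketch of that arithmetic (the function `page_window` is hypothetical; the `last_page` value is an addition for illustration, since crawl_pages reaches it implicitly by counting down `entries_left`):

```python
def page_window(start, num, per_page=10):
    """Returns (first_page, offset_in_first_page, last_page) for the entries
    start .. start + num - 1, using 1-based page numbers like the WP REST API."""
    if start < 0 or num < 1:
        raise ValueError("start must be >= 0 and num >= 1")
    # Equivalent to math.floor(start / per_page) + 1 in crawl_pages
    first_page = start // per_page + 1
    # Number of entries to skip on the first page (json_content[start % per_page:])
    offset = start % per_page
    last_page = (start + num - 1) // per_page + 1
    return first_page, offset, last_page


print(page_window(25, 12))
```

For `start=25` and `num=12`, entries 25 to 36 are needed: page 3 holds entries 20 to 29 (skip the first 5), and page 4 holds entries 30 to 39.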

--------------------------------------------------------------------------------
/lib/wpapi.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy
  5 | of this software and associated documentation files (the "Software"), to deal
  6 | in the Software without restriction, including without limitation the rights
  7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  8 | copies of the Software, and to permit persons to whom the Software is
  9 | furnished to do so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import math
 24 | import copy
 25 | 
 26 | import requests
 27 | from urllib.parse import urlencode
 28 | 
 29 | from json.decoder import JSONDecodeError
 30 | 
 31 | from lib.exceptions import NoWordpressApi, WordPressApiNotV2, \
 32 |                             NSNotFoundException
 33 | from lib.requestsession import RequestSession, HTTPError400, HTTPError404
 34 | from lib.utils import url_path_join, print_progress_bar, get_content_as_json, get_by_id
 35 | 
 36 | class WPApi:
 37 |     """
 38 |     Queries the WordPress API to retrieve information
 39 |     """
 40 | 
 41 |     # Object types
 42 |     POST = 0
 43 |     """
 44 |         The post type
 45 |     """
 46 |     POST_REVISION = 1
 47 |     """
 48 |         The post revision type
 49 |     """
 50 |     WP_BLOCK = 2
 51 |     """
 52 |         The Gutenberg block type
 53 |     """
 54 |     CATEGORY = 3
 55 |     """
 56 |         The category type
 57 |     """
 58 |     TAG = 4
 59 |     """
 60 |         The tag type
 61 |     """
 62 |     PAGE = 5
 63 |     """
 64 |         The page type
 65 |     """
 66 |     COMMENT = 6
 67 |     """
 68 |         The comment type
 69 |     """
 70 |     MEDIA = 7
 71 |     """
 72 |         The media type
 73 |     """
 74 |     USER = 8
 75 |     """
 76 |         The user type
 77 |     """
 78 |     THEME = 9
 79 |     """
 80 |         The theme type
 81 |     """
 82 |     NAMESPACE = 10
 83 |     """
 84 |         The namespace type
 85 |     """
 86 |     #SEARCH_RESULT = 10
 87 |     ALL_TYPES = 20
 88 |     """
 89 |         Constant representing all types
 90 |     """
 91 | 
 92 |     def __init__(self, target, api_path="wp-json/", session=None,
 93 |                  search_terms=None):
 94 |         """
 95 |         Creates a new instance of WPApi
 96 |         param target: the target of the scan
 97 |         param api_path: the api path, if non-default
 98 |         param session: the RequestSession to use for HTTP requests, if any
 99 |         param search_terms: the terms of the keyword search, if any
100 |         """
101 |         self.api_path = api_path
102 |         self.search_terms = search_terms
103 |         self.has_v2 = None
104 |         self.name = None
105 |         self.description = None
106 |         self.url = target
107 |         self.basic_info = None
108 |         self.posts = None
109 |         self.tags = None
110 |         self.categories = None
111 |         self.users = None
112 |         self.media = None
113 |         self.pages = None
114 |         self.s = None
115 |         self.comments_loaded = False
116 |         self.orphan_comments = []
117 |         self.comments = None
118 | 
119 |         if session is not None:
120 |             self.s = session
121 |         else:
122 |             self.s = RequestSession()
123 | 
124 |     @staticmethod 
125 |     def str_type_to_native(str_type):
126 |         """
127 |             Converts a single object type as str to its corresponding native type.
128 |             If the object type is unknown, this returns None as a fallback.
129 |             This may have to be modified in cases of bugs.
130 | 
131 |             :param str_type: the object type as string
132 |             :return: the object type as native constant
133 | 
134 |             ```
135 |             WPApi.str_type_to_native("post") # returns WPApi.POST
136 |             ```
137 |         """
138 |         if str_type == "user":
139 |             return WPApi.USER
140 |         elif str_type == "tag":
141 |             return WPApi.TAG
142 |         elif str_type == "category":
143 |             return WPApi.CATEGORY
144 |         elif str_type == "post":
145 |             return WPApi.POST
146 |         elif str_type == "page":
147 |             return WPApi.PAGE
148 |         elif str_type == "comment":
149 |             return WPApi.COMMENT
150 |         elif str_type == "media":
151 |             return WPApi.MEDIA
152 |         elif str_type == "post_revision":
153 |             return WPApi.POST_REVISION
154 |         elif str_type == "block":
155 |             return WPApi.WP_BLOCK
156 |         elif str_type == "theme":
157 |             return WPApi.THEME
158 |         elif str_type == "namespace":
159 |             return WPApi.NAMESPACE
160 |         return None
161 | 
162 |     @staticmethod
163 |     def convert_obj_types_to_list(str_types):
164 |         """
165 |             Converts a list of object types given as strings to a list of
166 |             native constants ([WPApi.ALL_TYPES] if empty, None or with 'all')
167 |         """
168 |         out = []
169 |         if str_types is None or len(str_types) == 0 or 'all' in str_types:
170 |             return [WPApi.ALL_TYPES]
171 |         for el in str_types:
172 |             current = WPApi.str_type_to_native(el)
173 |             if current is not None:
174 |                 out.append(current)
175 |         return out
176 | 
177 |     def get_orphans_comments(self):
178 |         """
179 |         Returns the list of comments for which a post hasn't been found
180 |         """
181 |         return self.orphan_comments
182 | 
183 |     def get_basic_info(self):
184 |         """
185 |         Collects and stores basic information about the target
186 |         """
187 |         rest_url = url_path_join(self.url, self.api_path)
188 |         if self.basic_info is not None:
189 |             return self.basic_info
190 | 
191 |         try:
192 |             req = self.s.get(rest_url)
193 |         except Exception:
194 |             raise NoWordpressApi
195 |         if req.status_code >= 400:
196 |             raise NoWordpressApi
197 |         self.basic_info = get_content_as_json(req)
198 | 
199 |         if 'name' in self.basic_info.keys():
200 |             self.name = self.basic_info['name']
201 | 
202 |         if 'description' in self.basic_info.keys():
203 |             self.description = self.basic_info['description']
204 | 
205 |         if 'namespaces' in self.basic_info.keys() and 'wp/v2' in \
206 |                 self.basic_info['namespaces']:
207 |             self.has_v2 = True
208 | 
209 |         return self.basic_info
210 | 
211 |     def crawl_pages(self, url, start=None, num=None, search_terms=None, display_progress=True):
212 |         """
213 |         Crawls the paginated endpoint url (containing a %d page placeholder)
214 |         until no result is left, or fetches only the entries set by start/num
215 |         """
216 |         if search_terms is None:
217 |             search_terms = self.search_terms
218 |         page = 1
219 |         total_entries = 0
220 |         total_pages = 0
221 |         more_entries = True
222 |         entries = []
223 |         base_url = url
224 |         entries_left = 1
225 |         per_page = 10
226 |         if search_terms is not None:
227 |             if '?' in base_url:
228 |                 base_url += '&' + urlencode({'search': search_terms})
229 |             else:
230 |                 base_url += '?' + urlencode({'search': search_terms})
231 |         if start is not None:
232 |             page = math.floor(start/per_page) + 1
233 |         if num is not None:
234 |             entries_left = num
235 |         while more_entries and entries_left > 0:
236 |             rest_url = url_path_join(self.url, self.api_path, (base_url % page))
237 |             if start is not None:
238 |                 rest_url += "&per_page=%d" % per_page
239 |             try:
240 |                 req = self.s.get(rest_url)
241 |                 if (page == 1 or start is not None and page == math.floor(start/per_page) + 1) and 'X-WP-Total' in req.headers:
242 |                     total_entries = int(req.headers['X-WP-Total'])
243 |                     total_pages = int(req.headers['X-WP-TotalPages'])
244 |                     print("Total number of entries: %d" % total_entries)
245 |                     if start is not None and total_entries < start:
246 |                         start = total_entries - 1
247 |             except HTTPError400:
248 |                 break
249 |             except Exception:
250 |                 raise WordPressApiNotV2
251 |             try:
252 |                 json_content = get_content_as_json(req)
253 |                 if type(json_content) is list and len(json_content) > 0:
254 |                     if (start is None or start is not None and page > math.floor(start/per_page) + 1) and num is None:
255 |                         entries += json_content
256 |                         if start is not None:
257 |                             entries_left -= len(json_content)
258 |                     elif start is not None and page == math.floor(start/per_page) + 1:
259 |                         if num is None or num is not None and len(json_content[start % per_page:]) < num:
260 |                             entries += json_content[start % per_page:]
261 |                             if num is not None:
262 |                                 entries_left -= len(json_content[start % per_page:])
263 |                         else:
264 |                             entries += json_content[start % per_page:(start % per_page) + num]
265 |                             entries_left = 0
266 |                     else:
267 |                         if num is not None and entries_left > len(json_content):
268 |                             entries += json_content
269 |                             entries_left -= len(json_content)
270 |                         else:
271 |                             entries += json_content[:entries_left]
272 |                             entries_left = 0
273 |                     
274 |                     if display_progress:
275 |                         if num is None and start is None and total_entries >= 0:
276 |                             print_progress_bar(page, total_pages,
277 |                             length=70)
278 |                         elif num is None and start is not None and total_entries >= 0:
279 |                             print_progress_bar(total_entries-start-entries_left, total_entries-start,
280 |                             length=70)
281 |                         elif num is not None and total_entries > 0:
282 |                             print_progress_bar(num-entries_left, num,
283 |                             length=70)
284 |                 else:
285 |                     more_entries = False
286 |             except JSONDecodeError:
287 |                 more_entries = False
288 | 
289 |             page += 1
290 | 
291 |         return (entries, total_entries)
292 |     
293 |     def crawl_single_page(self, url):
294 |         """
295 |             Crawls a single URL
296 |         """
297 |         content = None
298 |         rest_url = url_path_join(self.url, self.api_path, url)
299 |         try:
300 |             req = self.s.get(rest_url)
301 |         except HTTPError400:
302 |             return None
303 |         except HTTPError404:
304 |             return None
305 |         except Exception:
306 |             raise WordPressApiNotV2
307 |         try:
308 |             content = get_content_as_json(req)
309 |         except JSONDecodeError:
310 |             pass
311 | 
312 |         return content
313 | 
314 |     def get_from_cache(self, cache, start=None, num=None, force=False):
315 |         """
316 |             Tries to fetch data from the given cache (None on a miss), after verifying that WP-JSON v2 is supported
317 |         """
318 |         if self.has_v2 is None:
319 |             self.get_basic_info()
320 |         if not self.has_v2:
321 |             raise WordPressApiNotV2
322 |         if cache is not None and start is not None and len(cache) <= start:
323 |             start = len(cache) - 1
324 |         if cache is not None and not force:
325 |             if start is not None and num is None and len(cache) > start and None not in cache[start:]:
326 |                 # If start is specified and not num, we want to return the posts in cache only if they were already cached
327 |                 return cache[start:]
328 |             elif start is None and num is not None and len(cache) > num and None not in cache[:num]:
329 |                 # If num is specified and not start, we want to do something similar to the above
330 |                 return cache[:num]
331 |             elif start is not None and num is not None and len(cache) > start + num and None not in cache[start:start+num]:
332 |                 return cache[start:start+num]
333 |             elif (start is None and (num is None or num > len(cache))) and None not in cache:
334 |                 return cache
335 |         
336 |         return None
337 | 
338 |     def update_cache(self, cache, values, total_entries, start=None, num=None):
339 |         if cache is None:
340 |             cache = values
341 |         elif len(values) > 0:
342 |             # s: absolute cache index of the first fetched value
343 |             s = start
344 |             if start is None:
345 |                 s = 0
346 |             if s >= total_entries:
347 |                 s = total_entries - 1
348 |             # end: exclusive cache index after the last value to store
349 |             end = total_entries
350 |             if num is not None and s + num < total_entries:
351 |                 end = s + num
352 |             if end > len(cache):
353 |                 cache += [None] * (end - len(cache))
354 |             for el in values:
355 |                 cache[s] = el
356 |                 s += 1
357 |                 if s == end:
358 |                     break
359 |         if len(cache) != total_entries:
360 |             if start is not None and start < total_entries:
361 |                 cache = [None] * start + cache
362 |             if num is not None:
363 |                 cache += [None] * (total_entries - len(cache))
364 |         return cache
365 | 
366 |     def get_comments(self, start=None, num=None, force=False):
367 |         """
368 |         Retrieves all comments
369 |         """
370 |         comments = self.get_from_cache(self.comments, start, num, force)
371 |         if comments is not None:
372 |             return comments
373 | 
374 |         comments, total_entries = self.crawl_pages('wp/v2/comments?page=%d', start, num)
375 |         self.comments = self.update_cache(self.comments, comments, total_entries, start, num)
376 |         return comments
377 | 
378 |     def get_posts(self, comments=False, start=None, num=None, force=False):
379 |         """
380 |         Retrieves all posts or the specified ones
381 |         """
382 |         if self.has_v2 is None:
383 |             self.get_basic_info()
384 |         if not self.has_v2:
385 |             raise WordPressApiNotV2
386 |         if self.posts is not None and start is not None and len(self.posts) < start:
387 |             start = len(self.posts) - 1
388 |         if self.posts is not None and (self.comments_loaded or not comments) and not force:
389 |             posts = self.get_from_cache(self.posts, start, num)
390 |             if posts is not None:
391 |                 return posts
392 |         posts, total_entries = self.crawl_pages('wp/v2/posts?page=%d', start=start, num=num)
393 | 
394 |         self.posts = self.update_cache(self.posts, posts, total_entries, start, num)
395 | 
396 |         if not self.comments_loaded and comments:
397 |             # Load comments
398 |             comment_list = self.crawl_pages('wp/v2/comments?page=%d')[0]
399 |             for comment in comment_list:
400 |                 found_post = False
401 |                 for i in range(0, len(self.posts)):
402 |                     if self.posts[i]['id'] == comment['post']:
403 |                         if "comments" not in self.posts[i]:
404 |                             self.posts[i]['comments'] = []
405 |                         self.posts[i]["comments"].append(comment)
406 |                         found_post = True
407 |                         break
408 |                 if not found_post:
409 |                     self.orphan_comments.append(comment)
410 |             self.comments_loaded = True
411 |         
412 |         return_posts = self.posts
413 |         if start is not None and start < len(return_posts):
414 |             return_posts = return_posts[start:]
415 |         if num is not None and num < len(return_posts):
416 |             return_posts = return_posts[:num]
417 |         return return_posts
418 | 
419 |     def get_tags(self, start=None, num=None, force=False):
420 |         """
421 |         Retrieves all tags, or `num` of them starting at offset `start`
422 |         """
423 |         tags = self.get_from_cache(self.tags, start, num, force)
424 |         if tags is not None:
425 |             return tags
426 | 
427 |         tags, total_entries = self.crawl_pages('wp/v2/tags?page=%d', start, num)
428 |         self.tags = self.update_cache(self.tags, tags, total_entries, start, num)
429 |         return tags
430 | 
431 |     def get_categories(self, start=None, num=None, force=False):
432 |         """
433 |         Retrieves all categories, or `num` of them starting at offset `start`
434 |         """
435 |         categories = self.get_from_cache(self.categories, start, num, force)
436 |         if categories is not None:
437 |             return categories
438 |         
439 |         categories, total_entries = self.crawl_pages('wp/v2/categories?page=%d', start=start, num=num)
440 |         self.categories = self.update_cache(self.categories, categories, total_entries, start, num)
441 |         return categories
442 | 
443 |     def get_users(self, start=None, num=None, force=False):
444 |         """
445 |         Retrieves all users, or `num` of them starting at offset `start`
446 |         """
447 |         users = self.get_from_cache(self.users, start, num, force)
448 |         if users is not None:
449 |             return users
450 | 
451 |         users, total_entries = self.crawl_pages('wp/v2/users?page=%d', start=start, num=num)
452 |         self.users = self.update_cache(self.users, users, total_entries, start, num)
453 |         return users
454 | 
455 |     def get_media(self, start=None, num=None, force=False):
456 |         """
457 |         Retrieves all media objects, or `num` of them starting at offset `start`
458 |         """
459 |         media = self.get_from_cache(self.media, start, num, force)
460 |         if media is not None:
461 |             return media
462 | 
463 |         media, total_entries = self.crawl_pages('wp/v2/media?page=%d', start=start, num=num)
464 |         self.media = self.update_cache(self.media, media, total_entries, start, num)
465 |         return media
466 |     
467 |     def get_media_urls(self, ids, cache=True):
468 |         """
469 |         Returns (urls, slugs) for the media whose IDs are listed in ids (comma-separated), for all media ('all') or for the cached media ('cache')
470 |         """
471 |         media = []
472 |         if ids == 'all':
473 |             media = self.get_media(force=(not cache))
474 |         elif ids == 'cache':
475 |             media = self.get_from_cache(self.media, force=(not cache))
476 |         else:
477 |             id_list = ids.split(',')
478 |             media = []
479 |             for i in id_list:
480 |                 try:
481 |                     if int(i) > 0:
482 |                         m = self.get_obj_by_id(WPApi.MEDIA, int(i), cache)
483 |                         if m is not None and len(m) > 0 and type(m[0]) is dict:
484 |                             media.append(m[0])
485 |                 except ValueError:
486 |                     pass
487 |         urls = []
488 |         slugs = []
489 |         if media is None:
490 |             return [], []
491 |         for m in media:
492 |             if m is not None and type(m) is dict and "source_url" in m.keys() and 'slug' in m.keys():
493 |                 urls.append(m["source_url"])
494 |                 slugs.append(m['slug'])
495 |         return urls, slugs
496 |             
497 | 
498 |     def get_pages(self, start=None, num=None, force=False):
499 |         """
500 |         Retrieves all pages, or `num` of them starting at offset `start`
501 |         """
502 |         pages = self.get_from_cache(self.pages, start, num, force)
503 |         if pages is not None:
504 |             return pages
505 | 
506 |         pages, total_entries = self.crawl_pages('wp/v2/pages?page=%d', start=start, num=num)
507 |         self.pages = self.update_cache(self.pages, pages, total_entries, start, num)
508 |         return pages
509 | 
510 |     def get_namespaces(self, start=None, num=None, force=False):
511 |         """
512 |         Retrieves the list of namespaces, or `num` of them starting at offset `start`
513 |         """
514 |         if self.has_v2 is None or force:
515 |             self.get_basic_info()
516 |         if 'namespaces' in self.basic_info.keys():
517 |             if start is None and num is None:
518 |                 return self.basic_info['namespaces']
519 |             namespaces = copy.deepcopy(self.basic_info['namespaces'])
520 |             if start is not None and start < len(namespaces):
521 |                 namespaces = namespaces[start:]
522 |             if num is not None and num < len(namespaces):
523 |                 namespaces = namespaces[:num]
524 |             return namespaces
525 |         return []
526 | 
527 |     def get_routes(self):
528 |         """
529 |         Retrieves the routes advertised by the API index, as a dict keyed by route URL
530 |         """
531 |         if self.has_v2 is None:
532 |             self.get_basic_info()
533 |         if 'routes' in self.basic_info.keys():
534 |             return self.basic_info['routes']
535 |         return {}
536 | 
537 |     def crawl_namespaces(self, ns):
538 |         """
539 |         Crawls every accessible GET route of the specified namespace (or of all namespaces if ns is "all"), skipping wp/v2 and routes with required arguments.
540 |         """
541 |         namespaces = self.get_namespaces()
542 |         routes = self.get_routes()
543 |         ns_data = {}
544 |         if ns != "all" and ns not in namespaces:
545 |             raise NSNotFoundException
546 |         for url, route in routes.items():
547 |             if 'namespace' not in route.keys() \
548 |                or 'endpoints' not in route.keys():
549 |                 continue
550 |             url_as_ns = url.lstrip('/')
551 |             if '(?P<' in url or url_as_ns in namespaces:
552 |                 continue
553 |             if ns != 'all' and route['namespace'] != ns or \
554 |                route['namespace'] in ['wp/v2', '']:
555 |                 continue
556 |             for endpoint in route['endpoints']:
557 |                 if 'GET' not in endpoint['methods']:
558 |                     continue
559 |                 keep = True
560 |                 if type(endpoint['args']) is dict:
561 |                     for arg in endpoint['args'].values():
562 |                         if arg.get('required', False):
563 |                             keep = False
564 |                 if keep:
565 |                     rest_url = url_path_join(self.url, self.api_path, url)
566 |                     try:
567 |                         ns_request = self.s.get(rest_url)
568 |                         ns_data[url] = get_content_as_json(ns_request)
569 |                     except Exception:
570 |                         continue
571 |         return ns_data
572 | 
573 |     def get_obj_by_id_helper(self, cache, obj_id, url, use_cache=True):
574 |         if use_cache and cache is not None:
575 |             obj = get_by_id(cache, obj_id)
576 |             if obj is not None:
577 |                 return [obj]
578 |         obj = self.crawl_single_page(url % obj_id)
579 |         if type(obj) is dict:
580 |             return [obj]
581 |         return []
582 |     
583 |     def get_obj_by_id(self, obj_type, obj_id, use_cache=True):
584 |         """
585 |             Returns a list containing at most one object, identified by its type and ID.
586 | 
587 |             Returns an empty list if the ID does not exist.
588 | 
589 |             :param obj_type: the type of the object (e.g. POST)
590 |             :param obj_id: the ID of the object to fetch
591 |             :param use_cache: whether the cache should be used to avoid redundant requests
592 |         """
593 |         if obj_type == WPApi.USER:
594 |             return self.get_obj_by_id_helper(self.users, obj_id, 'wp/v2/users/%d', use_cache)
595 |         if obj_type == WPApi.TAG:
596 |             return self.get_obj_by_id_helper(self.tags, obj_id, 'wp/v2/tags/%d', use_cache)
597 |         if obj_type == WPApi.CATEGORY:
598 |             return self.get_obj_by_id_helper(self.categories, obj_id, 'wp/v2/categories/%d', use_cache)
599 |         if obj_type == WPApi.POST:
600 |             return self.get_obj_by_id_helper(self.posts, obj_id, 'wp/v2/posts/%d', use_cache)
601 |         if obj_type == WPApi.PAGE:
602 |             return self.get_obj_by_id_helper(self.pages, obj_id, 'wp/v2/pages/%d', use_cache)
603 |         if obj_type == WPApi.COMMENT:
604 |             return self.get_obj_by_id_helper(self.comments, obj_id, 'wp/v2/comments/%d', use_cache)
605 |         if obj_type == WPApi.MEDIA:
606 |             return self.get_obj_by_id_helper(self.media, obj_id, 'wp/v2/media/%d', use_cache)
607 |         return []
608 |     
609 |     def get_obj_list(self, obj_type, start, limit, cache, kwargs={}):
610 |         """
611 |             Returns a list of at most `limit` objects of the given type, starting at offset `start`.
612 | 
613 |             :param obj_type: the type of the object (e.g. POST)
614 |             :param start: the offset of the first object to return
615 |             :param limit: the maximum number of objects to return
616 |             :param cache: whether the cache should be used to avoid redundant requests
617 |             :param kwargs: additional keyword arguments passed to get_posts (POST only)
618 |         """
619 |         get_func = None
620 |         if obj_type == WPApi.USER:
621 |             get_func = self.get_users
622 |         elif obj_type == WPApi.TAG:
623 |             get_func = self.get_tags
624 |         elif obj_type == WPApi.CATEGORY:
625 |             get_func = self.get_categories
626 |         elif obj_type == WPApi.PAGE:
627 |             get_func = self.get_pages
628 |         elif obj_type == WPApi.COMMENT:
629 |             get_func = self.get_comments
630 |         elif obj_type == WPApi.MEDIA:
631 |             get_func = self.get_media
632 |         elif obj_type == WPApi.NAMESPACE:
633 |             get_func = self.get_namespaces
634 |         
635 |         if get_func is not None:
636 |             return get_func(start=start, num=limit, force=not cache)
637 |         elif obj_type == WPApi.POST:
638 |             return self.get_posts(start=start, num=limit, force=not cache, **kwargs)
639 |         return []
640 |     
641 |     def search(self, obj_types, keywords, start, limit):
642 |         """
643 |             Searches objects of the given types for the specified keywords.
644 | 
645 |             :param obj_types: a list of the desired object types to look for
646 |             :param keywords: the keywords to look for
647 |             :param start: a start index
648 |             :param limit: the max number to return
649 |             :return: a dict mapping each object type to its list of matching objects
650 |         """
651 |         out = {}
652 |         if WPApi.ALL_TYPES in obj_types or len(obj_types) == 0:
653 |             obj_types = [
654 |                 WPApi.POST, WPApi.CATEGORY, WPApi.TAG, WPApi.PAGE,
655 |                 WPApi.COMMENT, WPApi.MEDIA, WPApi.USER
656 |             ] # All supported types for search
657 |         for t in obj_types:
658 |             if t == WPApi.POST:
659 |                 out[t] = self.crawl_pages('wp/v2/posts?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
660 |             elif t == WPApi.CATEGORY:
661 |                 out[t] = self.crawl_pages('wp/v2/categories?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
662 |             elif t == WPApi.TAG:
663 |                 out[t] = self.crawl_pages('wp/v2/tags?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
664 |             elif t == WPApi.PAGE:
665 |                 out[t] = self.crawl_pages('wp/v2/pages?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
666 |             elif t == WPApi.COMMENT:
667 |                 out[t] = self.crawl_pages('wp/v2/comments?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
668 |             elif t == WPApi.MEDIA:
669 |                 out[t] = self.crawl_pages('wp/v2/media?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
670 |             elif t == WPApi.USER:
671 |                 out[t] = self.crawl_pages('wp/v2/users?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
672 |         return out


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | certifi==2022.12.7
2 | chardet==3.0.4
3 | idna==2.9
4 | requests==2.23.0
5 | urllib3==1.26.5
6 | 


--------------------------------------------------------------------------------
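In `lib/wpapi.py`, `update_cache` stores each collection (posts, tags, users, ...) as a list indexed by position in the full result set, with `None` in the slots that have not been fetched yet, so that a `start`/`num` window crawled later can be written into place. The following is a minimal standalone sketch of that idea, assuming results arrive as plain lists and that `total_entries` is the total entry count returned by `crawl_pages`. `splice_into_cache` is a hypothetical name and this is a simplification for illustration, not the tool's actual code.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def splice_into_cache(
    cache: Optional[List[Optional[T]]],
    values: List[T],
    total_entries: int,
    start: Optional[int] = None,
) -> List[Optional[T]]:
    """Write `values` into a sparse cache sized to the whole collection.

    Hypothetical helper (not part of WPJsonScraper): unfetched slots hold None.
    The cache may be updated in place and is also returned.
    """
    if cache is None:
        # First fetch: allocate one empty slot per entry in the collection
        cache = [None] * total_entries
    elif len(cache) < total_entries:
        # The collection grew since the last fetch: pad with empty slots
        cache = cache + [None] * (total_entries - len(cache))
    offset = 0 if start is None else start
    # Never write past the last entry of the collection
    end = min(offset + len(values), total_entries)
    for index in range(offset, end):
        cache[index] = values[index - offset]
    return cache


# Fetch entries 2-3 first, then entries 0-1, into a 6-entry collection
cache = splice_into_cache(None, ["p3", "p4"], total_entries=6, start=2)
# → [None, None, "p3", "p4", None, None]
cache = splice_into_cache(cache, ["p1", "p2"], total_entries=6, start=0)
# → ["p1", "p2", "p3", "p4", None, None]
```

Keeping the list at full collection size means a later lookup can tell, from the presence of `None` in a window, whether that window still needs to be requested from the API.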