├── .gitignore
├── LICENSE.txt
├── README.md
├── WPJsonScraper.py
├── doc
│   ├── Interactive.md
│   └── WPJsonScraperCapture.png
├── lib
│   ├── __init__.py
│   ├── console.py
│   ├── exceptions.py
│   ├── exporter.py
│   ├── infodisplayer.py
│   ├── interactive.py
│   ├── plugins
│   │   └── plugin_list.csv
│   ├── requestsession.py
│   ├── utils.py
│   └── wpapi.py
└── requirements.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | */__pycache__/*
2 | .venv/*
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 |
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 |
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.
20 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # WPJsonScraper
2 |
3 | ## Introduction
4 |
5 | 
6 |
7 | WPJsonScraper is a tool for dumping as much content as possible from a
8 | WordPress installation. It uses the wp-json API to retrieve all important
9 | information and enumerate every user, post, comment, media object and more.
10 |
11 | This can reveal sensitive files or pages that are not sufficiently protected
12 | from external access.
13 |
14 | WPJsonScraper has two operation modes: command line arguments and interactive.
15 | The latter offers a command prompt for performing more complex operations on
16 | the WP-JSON API.
17 |
18 | ## Prerequisites
19 |
20 | WPJsonScraper is written in Python and should work with any Python 3
21 | environment given that the following packages are installed:
22 |
23 | * Python 3
24 | * requests
25 |
26 | ## Installation
27 |
28 | Just clone the repository with git and run `pip install -r requirements.txt`.
29 |
30 | You may want to use a virtualenv for keeping your dependencies consistent across
31 | Python projects.
32 |
33 | ## Usage
34 |
35 | ### Interactive mode
36 |
37 | See [Interactive mode](doc/Interactive.md) for more details.
38 |
39 | ### Command line arguments mode
40 |
41 | The tool needs the definition of a target WordPress installation and a flag
42 | instructing which action to do.
43 |
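The target does not have to be a fully qualified URL. As a minimal sketch of the normalization done in `WPJsonScraper.py` (the helper name `normalize_target` is hypothetical, introduced here for illustration only), a missing scheme defaults to `http://` and a trailing slash is appended:

```python
import re


def normalize_target(target: str) -> str:
    """Return the target with a scheme and a trailing slash."""
    # Default to plain HTTP when no http:// or https:// scheme is given.
    if re.match(r'^https?://.*$', target) is None:
        target = "http://" + target
    # Make sure the base path ends with a slash.
    if not target.endswith("/"):
        target += "/"
    return target


print(normalize_target("demo.wp-api.org"))
print(normalize_target("https://demo.wp-api.org/"))
```

Note that a bare host name is contacted over plain HTTP, so pass an explicit `https://` URL when that matters.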
44 | You can retrieve all available information with the -a flag. As this can be
45 | quite verbose, you can instead select the categories of information you need
46 | with the following flags:
47 |
48 | * -h, --help: display the help and exit
49 | * -v, --version: display the version number and exit
50 | * -a, --all: display all data available
51 | * -i, --info: dump basic information about the target
52 | * -e, --endpoints: dump full endpoint documentation
53 | * -p, --posts: list all published posts
54 | * -u, --users: list all users
55 | * -t, --tags: list all tags
56 | * -c, --categories: list all categories
57 | * -m, --media: list all public media objects
58 | * --download-media MEDIA_FOLDER: download media to the designated folder
59 | * -g, --pages: list all public pages
60 | * -o, --comments: list comments
61 | * -S, --search SEARCH_TERMS: perform a search on SEARCH_TERMS
62 | * -r, --crawl-ns: crawl plugin namespaces for collections. Set it to all to
63 | crawl all namespaces
64 | * --proxy PROXY_URL: force the data to pass through a specified proxy server
65 | * --auth CREDENTIALS: use the specified credentials as basic HTTP auth for the
66 | server
67 | * --cookies COOKIES: add the specified cookies to the requests
68 | * --no-color: remove color (for example to redirect the output to a file)
69 | * --interactive: start an interactive session
70 |
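The `--auth` value is a username and a password separated by a colon. The following sketch mirrors how the script splits it: everything after the first colon is the password, and a value without a colon yields an empty password. The function name `parse_credentials` is an assumption made for this example.

```python
from typing import Optional, Tuple


def parse_credentials(credentials: Optional[str]) -> Optional[Tuple[str, str]]:
    """Split 'user:password' into a (user, password) tuple for basic auth."""
    if credentials is None:
        return None
    # partition() splits on the first colon only, so passwords that
    # contain colons are kept intact.
    username, _, password = credentials.partition(":")
    return (username, password)


print(parse_credentials("admin:s3cr3t:with:colons"))
```

A `(user, password)` tuple of this shape is what HTTP client libraries such as requests accept for basic authentication.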
71 | Moreover, you can export contents of pages and posts to a folder in separate
72 | files:
73 |
74 | * --export-pages PAGE_EXPORT_FOLDER
75 | * --export-posts POST_EXPORT_FOLDER
76 | * --export-comments COMMENT_EXPORT_FOLDER
77 |
78 | You can set the proxy server with the --proxy flag. It can be an HTTP or HTTPS
79 | proxy, as described in the Python requests documentation. By default, the
80 | system proxy servers are used.
81 |
82 | Example:
83 |
84 |     http://user:password@example.com:8080/
85 |
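For reference, requests expects proxies as a mapping from URL scheme to proxy URL. The sketch below only uses the standard library to validate a proxy URL and build such a mapping. The helper name `build_proxies` is hypothetical, and how `lib/requestsession.py` actually applies the proxy is not shown in this README.

```python
from urllib.parse import urlsplit


def build_proxies(proxy_url: str) -> dict:
    """Build a requests-style proxies mapping from a single proxy URL."""
    parts = urlsplit(proxy_url)
    # Only HTTP and HTTPS proxies with a host name are accepted here.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("unsupported proxy URL: %r" % proxy_url)
    # Route both plain and TLS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("http://user:password@example.com:8080/")
print(proxies)
```

The resulting dictionary can be passed as the `proxies` argument of a requests call or merged into the `proxies` attribute of a `requests.Session`.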
86 | Using the -r option, you can crawl collections of the specified namespace. This
87 | allows you to get a set of objects from the API and maybe confidential data ;)
88 |
89 | #### Search feature
90 |
91 | The WordPress WP-JSON API allows searching in posts, pages, media objects, tags,
92 | categories, comments and users.
93 |
94 | The -S (--search) option exposes this functionality in
95 | wp-json-scraper.
96 |
97 | It can be used on a specific item type or on several at once.
98 |
99 | Examples:
100 |
101 |     # Search for "lorem" in all item types
102 |     ./WPJsonScraper.py -S lorem https://demo.wp-api.org/
103 |     # Search for "hello world" in posts, users and pages only
104 |     ./WPJsonScraper.py -S "hello world" -p -u -g https://demo.wp-api.org/
105 |
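Under the hood, the WordPress REST API exposes search through the `search` query parameter of its collection endpoints. The following sketch builds the corresponding URLs for the core `wp/v2` routes. It illustrates the API rather than the tool's actual implementation in `lib/wpapi.py`, and `build_search_urls` is a hypothetical name.

```python
from urllib.parse import urlencode, urljoin

# Core wp/v2 collections that accept a "search" query parameter.
SEARCHABLE = ("posts", "pages", "media", "tags", "categories",
              "comments", "users")


def build_search_urls(target, terms, item_types=None):
    """Return one search URL per item type; target must end with a slash."""
    kinds = item_types or SEARCHABLE
    query = urlencode({"search": terms})
    return [urljoin(target, "wp-json/wp/v2/%s?%s" % (kind, query))
            for kind in kinds]


for url in build_search_urls("https://demo.wp-api.org/", "hello world",
                             ["posts", "users", "pages"]):
    print(url)
```

When no item type is given, the sketch queries every searchable collection, which matches the behaviour described for `-S` without other flags.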
106 | ## Features to implement
107 |
108 | WPJsonScraper is not a mature project yet and its features are pretty basic for
109 | the moment. Some of the features that could be implemented in the future are:
110 |
111 | * Posts revisions retrieval
112 | * Plugins support
113 | * Authentication support with NTLM
114 | * WordPress instance save as JSON (limited to the accessible scope) and restore?
115 | * Password-protected content handling
116 | * Support new endpoints added in version 5.0: autosaves, block type, blocks, block_renderer, themes (authenticated access required but WTF?)
117 | * Write tests duh!
--------------------------------------------------------------------------------
/WPJsonScraper.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
5 |
6 | Permission is hereby granted, free of charge, to any person obtaining a copy
7 | of this software and associated documentation files (the "Software"), to deal
8 | in the Software without restriction, including without limitation the rights
9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10 | copies of the Software, and to permit persons to whom the Software is
11 | furnished to do so, subject to the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be included in all
14 | copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22 | SOFTWARE.
23 | """
24 |
25 | import argparse
26 | import requests
27 | import re
28 | import os
29 |
30 | from lib.console import Console
31 | from lib.wpapi import WPApi
32 | from lib.infodisplayer import InfoDisplayer
33 | from lib.exceptions import NoWordpressApi, WordPressApiNotV2, \
34 | NSNotFoundException
35 | from lib.exporter import Exporter
36 | from lib.requestsession import RequestSession
37 | from lib.interactive import start_interactive
38 |
39 | version = '0.5'
40 |
41 | def main():
42 |     parser = argparse.ArgumentParser(description=
43 | """Reads a WP-JSON API on a WordPress installation to retrieve as much
44 | publicly available information as possible. This information includes, among
45 | others: posts, comments, pages, media or users. As this tool could give access
46 | to confidential (but not well-protected) data, it is recommended that you first
47 | get written permission from the site owner. The author accepts no
48 | liability for misuse of this software""",
49 |     epilog=
50 | """(c) 2018-2020 Mickaël "Kilawyn" Walter. This program is licensed under the MIT
51 | license, check LICENSE.txt for more information""")
52 |     parser.add_argument('-v',
53 |                         '--version',
54 |                         action='version',
55 |                         version='%(prog)s ' + version)
56 |     parser.add_argument('target',
57 |                         type=str,
58 |                         help='the base path of the WordPress installation to '
59 |                              'examine')
60 |     parser.add_argument('-i',
61 |                         '--info',
62 |                         dest='info',
63 |                         action='store_true',
64 |                         help='dumps basic information about the WordPress '
65 |                              'installation')
66 |     parser.add_argument('-e',
67 |                         '--endpoints',
68 |                         dest='endpoints',
69 |                         action='store_true',
70 |                         help='dumps full endpoint documentation')
71 |     parser.add_argument('-p',
72 |                         '--posts',
73 |                         dest='posts',
74 |                         action='store_true',
75 |                         help='lists published posts')
76 |     parser.add_argument('--export-posts',
77 |                         dest='post_export_folder',
78 |                         action='store',
79 |                         help='export posts to a specified destination folder')
80 |     parser.add_argument('-u',
81 |                         '--users',
82 |                         dest='users',
83 |                         action='store_true',
84 |                         help='lists users')
85 |     parser.add_argument('-t',
86 |                         '--tags',
87 |                         dest='tags',
88 |                         action='store_true',
89 |                         help='lists tags')
90 |     parser.add_argument('-c',
91 |                         '--categories',
92 |                         dest='categories',
93 |                         action='store_true',
94 |                         help='lists categories')
95 |     parser.add_argument('-m',
96 |                         '--media',
97 |                         dest='media',
98 |                         action='store_true',
99 |                         help='lists media objects')
100 |     parser.add_argument('-g',
101 |                         '--pages',
102 |                         dest='pages',
103 |                         action='store_true',
104 |                         help='lists pages')
105 |     parser.add_argument('-o',
106 |                         '--comments',
107 |                         dest='comments',
108 |                         action='store_true',
109 |                         help="lists comments")
110 |     parser.add_argument('--export-pages',
111 |                         dest='page_export_folder',
112 |                         action='store',
113 |                         help='export pages to a specified destination folder')
114 |     parser.add_argument('--export-comments',
115 |                         dest='comment_export_folder',
116 |                         action='store',
117 |                         help='export comments to a specified destination folder')
118 |     parser.add_argument('--download-media',
119 |                         dest='media_folder',
120 |                         action='store',
121 |                         help='download media to the designated folder')
122 |     parser.add_argument('-r',
123 |                         '--crawl-ns',
124 |                         dest='crawl_ns',
125 |                         action='store',
126 |                         help='crawl all GET routes of the specified namespace '
127 |                              'or all namespaces if all is specified')
128 |     parser.add_argument('-a',
129 |                         '--all',
130 |                         dest='all',
131 |                         action='store_true',
132 |                         help='dumps all available information from the '
133 |                              'target API')
134 |     parser.add_argument('-S',
135 |                         '--search',
136 |                         dest='search',
137 |                         action='store',
138 |                         help='search for a string on the WordPress instance. '
139 |                              'If one or several flags in agpmctu are set, search '
140 |                              'only on these')
141 |     parser.add_argument('--proxy',
142 |                         dest='proxy_server',
143 |                         action='store',
144 |                         help='define a proxy server to use, e.g. for '
145 |                              'enterprise network or debugging')
146 |     parser.add_argument('--auth',
147 |                         dest='credentials',
148 |                         action='store',
149 |                         help='define a username and a password separated by '
150 |                              'a colon to use them as basic authentication')
151 |     parser.add_argument('--cookies',
152 |                         dest='cookies',
153 |                         action='store',
154 |                         help='define specific cookies to send with the request '
155 |                              'in the format cookie1=foo; cookie2=bar')
156 |     parser.add_argument('--no-color',
157 |                         dest='nocolor',
158 |                         action='store_true',
159 |                         help='remove color in the output (e.g. to pipe it)')
160 |     parser.add_argument('--interactive',
161 |                         dest='interactive',
162 |                         action='store_true',
163 |                         help='start an interactive session')
164 |
165 |
166 |     args = parser.parse_args()
167 |
168 |     motd = """
169 | _ _______ ___ _____
170 | | | | | ___ \\|_ | / ___|
171 | | | | | |_/ / | | ___ ___ _ __ \\ `--. ___ _ __ __ _ _ __ ___ _ __
172 | | |/\\| | __/ | |/ __|/ _ \\| '_ \\ `--. \\/ __| '__/ _` | '_ \\ / _ \\ '__|
173 | \\ /\\ / | /\\__/ /\\__ \\ (_) | | | /\\__/ / (__| | | (_| | |_) | __/ |
174 | \\/ \\/\\_| \\____/ |___/\\___/|_| |_\\____/ \\___|_| \\__,_| .__/ \\___|_|
175 | | |
176 | |_|
177 | WPJsonScraper v%s
178 | By Mickaël \"Kilawyn\" Walter
179 |
180 | Make sure you use this tool with the approval of the site owner. Even if
181 | these information are public or available with proper authentication, this
182 | could be considered as an intrusion.
183 |
184 | Target: %s
185 |
186 | """ % (version, args.target)
187 |
188 |     print(motd)
189 |
190 |     if args.nocolor:
191 |         Console.wipe_color()
192 |
193 |     Console.log_info("Testing connectivity with the server")
194 |
195 |     target = args.target
196 |     if re.match(r'^https?://.*$', target) is None:
197 |         target = "http://" + target
198 |     if re.match(r'^.+/$', target) is None:
199 |         target += "/"
200 |
201 |     proxy = None
202 |     if args.proxy_server is not None:
203 |         proxy = args.proxy_server
204 |     cookies = None
205 |     if args.cookies is not None:
206 |         cookies = args.cookies
207 |     authorization = None
208 |     if args.credentials is not None:
209 |         authorization_list = args.credentials.split(':')
210 |         if len(authorization_list) == 1:
211 |             authorization = (authorization_list[0], '')
212 |         elif len(authorization_list) >= 2:
213 |             authorization = (authorization_list[0],
214 |                              ':'.join(authorization_list[1:]))
215 |     session = RequestSession(proxy=proxy, cookies=cookies,
216 |                              authorization=authorization)
217 |     try:
218 |         session.get(target)
219 |         Console.log_success("Connection OK")
220 |     except Exception:
221 |         Console.log_error("Failed to connect to the server")
222 |         exit(1)
223 |
224 |     # Quite an ugly check to launch a search on all eligible parameters
225 |     # Should find something better (maybe in argparser doc?)
226 |     if args.search is not None and not (args.all or args.posts or args.pages or
227 |         args.users or args.categories or args.tags or args.media):
228 |         Console.log_info("Searching on all available sources")
229 |         args.posts = True
230 |         args.pages = True
231 |         args.users = True
232 |         args.categories = True
233 |         args.tags = True
234 |         args.media = True
235 |
236 |     if args.interactive:
237 |         start_interactive(target, session, version)
238 |         return
239 |
240 |     scanner = WPApi(target, session=session, search_terms=args.search)
241 |     if args.info or args.all:
242 |         try:
243 |             basic_info = scanner.get_basic_info()
244 |             Console.log_info("General information on the target")
245 |             InfoDisplayer.display_basic_info(basic_info)
246 |         except NoWordpressApi:
247 |             Console.log_error("No WordPress API available at the given URL "
248 |                               "(too old WordPress or not WordPress?)")
249 |             exit()
250 |
251 |     if args.posts or args.all:
252 |         try:
253 |             if args.comments:
254 |                 Console.log_info("Post list with comments")
255 |             else:
256 |                 Console.log_info("Post list")
257 |             posts_list = scanner.get_posts(args.comments)
258 |             InfoDisplayer.display_posts(posts_list, scanner.get_orphans_comments())
259 |         except WordPressApiNotV2:
260 |             Console.log_error("The API does not support WP V2")
261 |
262 |     if args.pages or args.all:
263 |         try:
264 |             Console.log_info("Page list")
265 |             pages_list = scanner.get_pages()
266 |             InfoDisplayer.display_pages(pages_list)
267 |         except WordPressApiNotV2:
268 |             Console.log_error("The API does not support WP V2")
269 |
270 |     if args.users or args.all:
271 |         try:
272 |             Console.log_info("User list")
273 |             users_list = scanner.get_users()
274 |             InfoDisplayer.display_users(users_list)
275 |         except WordPressApiNotV2:
276 |             Console.log_error("The API does not support WP V2")
277 |
278 |     if args.endpoints or args.all:
279 |         try:
280 |             Console.log_info("API endpoints")
281 |             basic_info = scanner.get_basic_info()
282 |             InfoDisplayer.display_endpoints(basic_info)
283 |         except NoWordpressApi:
284 |             Console.log_error("No WordPress API available at the given URL "
285 |                               "(too old WordPress or not WordPress?)")
286 |             exit()
287 |
288 |     if args.categories or args.all:
289 |         try:
290 |             Console.log_info("Category list")
291 |             categories_list = scanner.get_categories()
292 |             InfoDisplayer.display_categories(categories_list)
293 |         except WordPressApiNotV2:
294 |             Console.log_error("The API does not support WP V2")
295 |
296 |     if args.tags or args.all:
297 |         try:
298 |             Console.log_info("Tags list")
299 |             tags_list = scanner.get_tags()
300 |             InfoDisplayer.display_tags(tags_list)
301 |         except WordPressApiNotV2:
302 |             Console.log_error("The API does not support WP V2")
303 |
304 |     media_list = None
305 |     if args.media or args.all:
306 |         try:
307 |             Console.log_info("Media list")
308 |             media_list = scanner.get_media()
309 |             InfoDisplayer.display_media(media_list)
310 |         except WordPressApiNotV2:
311 |             Console.log_error("The API does not support WP V2")
312 |
313 |     if args.crawl_ns is None and args.all:
314 |         args.crawl_ns = "all"
315 |
316 |     if args.crawl_ns is not None:
317 |         try:
318 |             if args.crawl_ns == "all":
319 |                 Console.log_info("Crawling all namespaces")
320 |             else:
321 |                 Console.log_info("Crawling %s namespace" % args.crawl_ns)
322 |             ns_data = scanner.crawl_namespaces(args.crawl_ns)
323 |             InfoDisplayer.display_crawled_ns(ns_data)
324 |         except NSNotFoundException:
325 |             Console.log_error("The specified namespace was not found")
326 |         except Exception as e:
327 |             print(e)
328 |
329 |     if args.post_export_folder is not None:
330 |         try:
331 |             posts_list = scanner.get_posts()
332 |             tags_list = scanner.get_tags()
333 |             categories_list = scanner.get_categories()
334 |             users_list = scanner.get_users()
335 |             print()
336 |             post_number = Exporter.export_posts_html(posts_list,
337 |                                                      args.post_export_folder,
338 |                                                      tags_list,
339 |                                                      categories_list,
340 |                                                      users_list)
341 |             if post_number > 0:
342 |                 Console.log_success("Exported %d posts to %s" %
343 |                                     (post_number, args.post_export_folder))
344 |         except WordPressApiNotV2:
345 |             Console.log_error("The API does not support WP V2")
346 |
347 |     if args.page_export_folder is not None:
348 |         try:
349 |             pages_list = scanner.get_pages()
350 |             users_list = scanner.get_users()
351 |             print()
352 |             page_number = Exporter.export_posts_html(pages_list,
353 |                                                      args.page_export_folder,
354 |                                                      None,
355 |                                                      None,
356 |                                                      users_list)
357 |             if page_number > 0:
358 |                 Console.log_success("Exported %d pages to %s" %
359 |                                     (page_number, args.page_export_folder))
360 |         except WordPressApiNotV2:
361 |             Console.log_error("The API does not support WP V2")
362 |
363 |     if args.comment_export_folder is not None:
364 |         try:
365 |             post_list = scanner.get_posts(True)
366 |             orphan_list = scanner.get_orphans_comments()
367 |             print()
368 |             comment_number = Exporter.export_comments(post_list, orphan_list, args.comment_export_folder)
369 |             if comment_number > 0:
370 |                 Console.log_success("Exported %d comments to %s" %
371 |                                     (comment_number, args.comment_export_folder))
372 |         except WordPressApiNotV2:
373 |             Console.log_error("The API does not support WP V2")
374 |
375 |     if args.media_folder is not None:
376 |         Console.log_info("Downloading media files")
377 |         if not os.path.isdir(args.media_folder):
378 |             Console.log_error("The destination is not a folder or does not exist")
379 |         else:
380 |             print("Pulling the media URLs")
381 |
382 |             media, _ = scanner.get_media_urls('all', True)
383 |             if len(media) == 0:
384 |                 Console.log_error("No media found")
385 |                 return
386 |             print("%d media URLs found" % len(media))
387 |
388 |             print("Note: Only files over 10MB are logged here")
389 |             number_downloaded = Exporter.download_media(media, args.media_folder)
390 |             Console.log_success('Downloaded %d media to %s' % (number_downloaded, args.media_folder))
391 |
392 |
393 | if __name__ == "__main__":
394 |     main()
395 |
--------------------------------------------------------------------------------
/doc/Interactive.md:
--------------------------------------------------------------------------------
1 | # Interactive mode
2 |
3 | To help with more complex interactions with WP-JSON API, WPJsonScraper implements an interactive mode.
4 |
5 | In interactive mode, the same session is used between requests, so cookies set by the server and other parameters are kept
6 | from one request to another.
7 |
8 | Typing `command -h` or `command --help` will bring a detailed help message for specific commands.
9 |
10 | Tab autocompletes the command name, up and down browse the command history.
11 |
12 | ## Commands
13 |
14 | ### help
15 |
16 | Lists commands and displays a brief help message about specified commands.
17 |
18 | Example 1: display the command list
19 |
20 |     help
21 |
22 | Example 2: display a brief help message about the show command
23 |
24 |     help show
25 |
26 | ### exit
27 |
28 | Exits the interactive mode and goes back to the user's shell.
29 |
30 | ### show
31 |
32 | Shows details about global parameters stored in WPJsonScraper memory.
33 |
34 | Example: show all parameters
35 |
36 |     show all
37 |
38 | ### set
39 |
40 | Sets a specific global parameter.
41 |
42 | Note that in cases of proxy and cookies, the command updates the entries.
43 | Check the resulting parameter with show if you don't know what that means.
44 |
45 | **Note:** changing the target resets the cache but keeps proxies, cookies and authorization headers. Be aware
46 | of data leakage risks. If you need to keep things apart between targets, relaunch WPJsonScraper or make sure
47 | all is correctly set up with the `show all` command.
48 |
49 | Example 1: change the target
50 |
51 |     set target http://example.com
52 |
53 | Example 2: add or modify the cookies PHPSESSID and JSESSIONID (because why not?)
54 |
55 |     set cookie "PHPSESSID=deadbeef; JSESSIONID=badc0ffee"
56 |
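To illustrate what "updates the entries" means for cookies, here is a small sketch that parses a cookie string with the standard library and merges it into an existing set of cookies. Existing names are overwritten, new names are added, and the others are kept. The helper name `parse_cookie_string` and the starting values are assumptions made for this example, not the tool's internal code.

```python
from http.cookies import SimpleCookie


def parse_cookie_string(raw: str) -> dict:
    """Parse 'name1=value1; name2=value2' into a plain dictionary."""
    jar = SimpleCookie()
    jar.load(raw)
    return {name: morsel.value for name, morsel in jar.items()}


# Cookies already stored in the session before running "set cookie".
cookies = {"PHPSESSID": "old-value", "lang": "en"}
cookies.update(parse_cookie_string("PHPSESSID=deadbeef; JSESSIONID=badc0ffee"))
print(cookies)
```

After the update, `PHPSESSID` holds the new value, `JSESSIONID` is added and `lang` is untouched, which is why checking the result with `show` is recommended.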
57 | ### list
58 |
59 | Lists specified data from the server.
60 |
61 | This command gets data from the server and displays it as a simple list (with no details).
62 |
63 | It can also export the full scraped data (with all details available) to a specified JSON or CSV file
64 | (see --csv and --json options). If a file extension is not specified, WPJsonScraper will append one.
65 | The export options will try to join data with other API endpoint data (e.g. users with posts). CSV exports
66 | drop most of the data to stay human-readable, so use the CSV option only to export a simple list (for
67 | example a list of posts).
68 |
69 | **Note:** to avoid having too much noise on the target, WPJsonScraper won't fetch automatically any other
70 | endpoint to complete the exported data. If you want all information to be gathered, you have to build the
71 | cache first by requesting the data beforehand (for example, getting the user list before exporting the posts).
72 |
73 | By default, WPJsonScraper caches data to avoid requesting the server too often. To get the latest updates,
74 | run this command with the --no-cache option.
75 |
76 | Use the --limit and --start options to retrieve a subset of all data selected.
77 |
78 | In the case of media files, the files themselves **are not downloaded**.
79 |
80 | Example 1: get all posts
81 |
82 |     list posts
83 |
84 | Example 2: get at most 10 pages, starting at page 15
85 |
86 |     list pages --start 15 --limit 10
87 |
88 | Example 3: export all listable content to JSON files (including for example all-data-posts.json)
89 |
90 |     list all --json all-data
91 |
92 | Example 4: list namespaces
93 |
94 |     list namespaces
95 |
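As a rough illustration of how a `--start`/`--limit` pair can translate into API requests: WordPress collection endpoints accept `offset` and `per_page` query parameters, and `per_page` is capped at 100. The sketch below splits a window into request parameters under the assumption that `--start` is a zero-based item offset. That assumption, and the function name `paging_params`, are not confirmed by this documentation.

```python
from typing import Dict, List

MAX_PER_PAGE = 100  # WordPress rejects per_page values above 100


def paging_params(start: int, limit: int) -> List[Dict[str, int]]:
    """Split a (start, limit) window into offset/per_page request params."""
    params = []
    offset, remaining = start, limit
    while remaining > 0:
        per_page = min(remaining, MAX_PER_PAGE)
        params.append({"offset": offset, "per_page": per_page})
        offset += per_page
        remaining -= per_page
    return params


print(paging_params(15, 10))
print(paging_params(0, 250))
```

A window larger than 100 items needs several requests, which is one reason a limit helps keep the noise on the target low.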
96 | ### fetch
97 |
98 | Fetches a specific piece of data from the server given its type and its ID. By default, if the data is cached,
99 | the data is returned from the cache. Use the --no-cache argument to force its retrieval from the server.
100 |
101 | The data displayed is more complete than the data displayed by the list command. But some metadata is still not
102 | displayed. Only the JSON export is a full data dump (with additional mapping when relevant).
103 |
104 | **Note:** like in the list function, the data that could complete the displayed information is not automatically
105 | fetched. You have to get it into cache first or to fetch it separately based on its ID. Moreover, the data
106 | retrieved by ID is not yet pushed into the cache. It may be in a later version.
107 |
108 | Example 1: display the post with the ID 1
109 |
110 |     fetch post 1
111 |
112 | Example 2: display the page with the ID 42 without using the cache
113 |
114 |     fetch page 42 --no-cache
115 |
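The caching rules described above can be summarized with a small sketch: a lookup first checks the cache unless `--no-cache` is set, and an item retrieved by ID is returned without being stored. The class `FetchCache` and its `loader` callback are hypothetical names used for illustration and do not reflect the actual code in `lib/`.

```python
from typing import Callable, Dict, Tuple


class FetchCache:
    """Cache-first lookup that mirrors the documented fetch behaviour."""

    def __init__(self, loader: Callable[[str, int], dict]):
        self._loader = loader  # called to retrieve an item from the server
        self._items: Dict[Tuple[str, int], dict] = {}

    def put(self, kind: str, item_id: int, data: dict) -> None:
        # Filled by list-like operations in this sketch.
        self._items[(kind, item_id)] = data

    def fetch(self, kind: str, item_id: int, use_cache: bool = True) -> dict:
        key = (kind, item_id)
        if use_cache and key in self._items:
            return self._items[key]
        # Data retrieved by ID is not pushed into the cache.
        return self._loader(kind, item_id)


cache = FetchCache(lambda kind, item_id: {"id": item_id, "from": "server"})
cache.put("post", 1, {"id": 1, "from": "cache"})
print(cache.fetch("post", 1))
print(cache.fetch("post", 1, use_cache=False))
```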
116 | ### search
117 |
118 | Looks for data based on the specified keywords. This command doesn't use the cache and systematically uses the
119 | WordPress API to do searches. One or several object types may be provided to narrow the search scope.
120 |
121 | Example 1: look for keyword test in all object types
122 |
123 |     search test
124 |
125 | Example 2: look for keyword foo in posts and pages
126 |
127 |     search --type post --type page foo
128 |
129 | Example 3: --limit and --start also work for search results
130 |
131 |     search --limit 5 --start 4 bar
132 |
133 | ### dl
134 |
135 | Downloads media based on the provided ID. The ID can be specified as an integer (or list of integers), `all` or
136 | `cache`. In the first case, only media with the specified IDs will be downloaded. `all` will trigger a fetch from
137 | the API to list all media, then a download session for each file. `cache` will get media URLs from the cache and
138 | then download the files.
139 |
140 | Note that if all the IDs specified are in the cache, no lookup will be made on the API. If you want to override
141 | this behaviour, set the `--no-cache` flag.
142 |
143 | Example 1: download the media with the IDs 42 and 63 to the current folder
144 |
145 |     dl 42,63 .
146 |
147 | Example 2: download all media to user's home folder
148 |
149 |     dl all /home/user
150 |
151 | Example 3: only media present in the cache (e.g. previously requested with list or fetch) are downloaded
152 |
153 |     dl cache .
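
The three forms of the ID argument can be told apart with a few lines of parsing. This is a minimal sketch under the assumption that IDs are separated by commas, as in the first example. `parse_media_ids` is a hypothetical helper, not the function used by the interactive mode.

```python
from typing import List, Union


def parse_media_ids(spec: str) -> Union[str, List[int]]:
    """Return 'all', 'cache', or a list of integer media IDs."""
    if spec in ("all", "cache"):
        return spec
    # int() raises ValueError on anything that is not a valid integer.
    return [int(part) for part in spec.split(",") if part.strip()]


print(parse_media_ids("42,63"))
print(parse_media_ids("cache"))
```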
--------------------------------------------------------------------------------
/doc/WPJsonScraperCapture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MickaelWalter/wp-json-scraper/677ddeea6437f24302855652756e11c89ebeaf84/doc/WPJsonScraperCapture.png
--------------------------------------------------------------------------------
/lib/__init__.py:
--------------------------------------------------------------------------------
1 | pass
2 |
--------------------------------------------------------------------------------
/lib/console.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 |
24 | class Console:
25 |     """
26 |     A little helper class to allow console management (like color)
27 |     """
28 |     normal = "\033[0m"
29 |     blue = "\033[94m"
30 |     green = "\033[92m"
31 |     red = "\033[31m"
32 |
33 |     @staticmethod
34 |     def wipe_color():
35 |         """
36 |         Deactivates color in terminal
37 |         """
38 |         Console.normal = ""
39 |         Console.blue = ""
40 |         Console.green = ""
41 |         Console.red = ""
42 |
43 |     @staticmethod
44 |     def log_info(text):
45 |         """
46 |         Prints information log to the console
47 |         param text: the text to display
48 |         """
49 |         print()
50 |         print(Console.blue + "[*] " + text + Console.normal)
51 |
52 |     @staticmethod
53 |     def log_error(text):
54 |         """
55 |         Prints error log to the console
56 |         param text: the text to display
57 |         """
58 |         print()
59 |         print(Console.red + "[!] " + text + Console.normal)
60 |
61 |     @staticmethod
62 |     def log_success(text):
63 |         """
64 |         Prints success log to the console
65 |         param text: the text to display
66 |         """
67 |         print(Console.green + "[+] " + text + Console.normal)
68 |
--------------------------------------------------------------------------------
/lib/exceptions.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 | class NoWordpressApi (Exception):
24 |     """
25 |     No API is available at the given URL
26 |     """
27 |     pass
28 |
29 | class WordPressApiNotV2 (Exception):
30 |     """
31 |     The WordPress V2 API is not available
32 |     """
33 |     pass
34 |
35 | class NSNotFoundException (Exception):
36 |     """
37 |     The specified namespace does not exist
38 |     """
39 |     pass
40 |
--------------------------------------------------------------------------------
/lib/exporter.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 | import os
24 | import copy
25 | import html
26 | import json
27 | import csv
28 | from datetime import datetime
29 | from urllib import parse as urlparse
30 | import mimetypes
31 | import requests
32 |
33 | from lib.console import Console
34 | from lib.utils import get_by_id, print_progress_bar
35 |
36 | class Exporter:
37 | """
38 | Utility functions to export data
39 | """
40 | JSON = 1
41 | """
42 | Represents the JSON format for format choice
43 | """
44 | CSV = 2
45 | """
46 | Represents the CSV format for format choice
47 | """
48 | CHUNK_SIZE = 2048
49 | """
50 | The size of chunks to download large files
51 | """
52 |
53 | @staticmethod
54 | def download_media(media, output_folder, slugs=None):
55 | """
56 | Downloads the media files based on the given URLs
57 |
58 | :param media: the URLs as a list
59 | :param output_folder: the path to the folder where the files are being saved, it is assumed as existing
60 | :param slugs: list of slugs to associate with media. The list must be ordered the same as media and should be the same size
61 | :return: the number of files written
62 | """
63 | files_number = 0
64 | media_length = len(media)
65 | progress = 0
66 | for m in media:
67 | r = requests.get(m, stream=True)
68 | if r.status_code == 200:
69 | http_path = urlparse.urlparse(m).path.split("/")
70 | local_path = output_folder
71 | if len(http_path) > 1:
72 | for el in http_path[:-1]:
73 | local_path = os.path.join(local_path, el)
74 | if not os.path.isdir(local_path):
75 | os.mkdir(local_path)
76 | if slugs is None:
77 | local_path = os.path.join(local_path, http_path[-1])
78 | else:
79 | ext = mimetypes.guess_extension(r.headers.get('Content-Type', '').split(';')[0].strip())
80 | local_path = os.path.join(local_path, slugs[progress])
81 | if ext is not None:
82 | local_path += ext
83 | with open(local_path, "wb") as f:
84 | i = 0
85 | content_size = int(r.headers.get('Content-Length', 0))
86 | for chunk in r.iter_content(Exporter.CHUNK_SIZE):
87 | if content_size > 10485760: # 10 MiB
88 | print_progress_bar(i*Exporter.CHUNK_SIZE, content_size, prefix=http_path[-1], length=70)
89 | f.write(chunk)
90 | i += 1
91 | if content_size > 10485760: # 10 MiB
92 | print_progress_bar(content_size, content_size, prefix=http_path[-1], length=70)
93 | files_number += 1
94 | progress += 1
95 | if progress % 10 == 1:
96 | print("Downloaded file %d of %d" % (progress, media_length))
97 | return files_number
98 |
99 | @staticmethod
100 | def map_params(el, parameters_to_map):
101 | """
102 | Maps params to ids recursively.
103 |
104 | This method automatically maps IDs with the corresponding objects given in parameters_to_map.
105 | The mapping is made in place as el is passed as a reference.
106 |
107 | :param el: the element that has ID references
108 | :param parameters_to_map: a dict containing lists of elements to map by ids with el
109 | """
110 | for key, value in el.items():
111 | if key in parameters_to_map.keys() and parameters_to_map[key] is not None:
112 | if type(value) is int: # Only one ID to map
113 | obj = get_by_id(parameters_to_map[key], value)
114 | if obj is not None:
115 | el[key] = {
116 | 'id': value,
117 | 'details': obj
118 | }
119 | elif type(value) is list: # The object is a list of IDs, we map each one
120 | vlist = []
121 | for v in value:
122 | obj = get_by_id(parameters_to_map[key], v)
123 | vlist.append(obj)
124 | el[key] = {
125 | 'ids': value,
126 | 'details': vlist
127 | }
128 | elif type(value) is dict: # Nested object, look for IDs to map inside it
129 | Exporter.map_params(value, parameters_to_map)
130 |
131 | @staticmethod
132 | def setup_export(vlist, parameters_to_unescape, parameters_to_map):
133 | """
134 | Sets up the right values for a list export.
135 |
136 | This function flattens a list of objects before its serialization in the expected format.
137 | It also makes a deepcopy to ensure that the original vlist is not altered.
138 |
139 | :param vlist: the list to prepare for exporting
140 | :param parameters_to_unescape: parameters to unescape (ex. ["param1", ["param2", "rendered"]])
141 | :param parameters_to_map: parameters to map to another (ex. {"param_to_map": param_values_list})
142 | """
143 | exported_list = []
144 |
145 | for el in vlist:
146 | if el is not None:
147 | # First copy the object
148 | exported_el = copy.deepcopy(el)
149 | # Look for parameters to HTML unescape
150 | for key in parameters_to_unescape:
151 | if type(key) is str: # If the parameter is at the root
152 | exported_el[key] = html.unescape(exported_el[key])
153 | elif type(key) is list: # If the parameter is nested
154 | selected = exported_el
155 | siblings = []
156 | fullpath = {}
157 | # We look for the leaf first, not forgetting sibling branches for rebuilding the tree later
158 | for k in key:
159 | if type(selected) is dict and k in selected.keys():
160 | sib = {}
161 | for e in selected.keys():
162 | if e != k:
163 | sib[e] = selected[e]
164 | selected = selected[k]
165 | siblings.append(sib)
166 | else:
167 | selected = None
168 | break
169 | # If we can unescape the parameter, we do it and rebuild the tree starting from the leaf
170 | if selected is not None and type(selected) is str:
171 | selected = html.unescape(selected)
172 | key.reverse()
173 | fullpath[key[0]] = selected
174 | s = len(siblings) - 1
175 | for e in siblings[s].keys():
176 | fullpath[e] = siblings[s][e]
177 | for k in key[1:]:
178 | fullpath = {k: fullpath}
179 | s -= 1
180 | for e in siblings[s].keys():
181 | fullpath[e] = siblings[s][e]
182 | key.reverse()
183 | exported_el[key[0]] = fullpath[key[0]]
184 | # If there is any parameter to map, we do it here
185 | Exporter.map_params(exported_el, parameters_to_map)
186 | # The resulting element is appended to the list of exported elements
187 | exported_list.append(exported_el)
188 |
189 | return exported_list
190 |
191 | @staticmethod
192 | def prepare_filename(filename, fmt):
193 | """
194 | Returns a filename with the proper extension according to the given format
195 |
196 | :param filename: the filename to clean
197 | :param fmt: the file format
198 | :return: the cleaned filename
199 | """
200 | if filename[-5:] != ".json" and fmt == Exporter.JSON:
201 | filename += ".json"
202 | elif filename[-4:] != ".csv" and fmt == Exporter.CSV:
203 | filename += ".csv"
204 | return filename
205 |
206 | @staticmethod
207 | def write_file(filename, fmt, csv_keys, data, details=None):
208 | """
209 | Writes content to the given file using the given format.
210 |
211 | The key mapping must be a dict of keys or lists of keys to ensure proper mapping.
212 |
213 | :param filename: the path of the file
214 | :param fmt: the format of the file
215 | :param csv_keys: the key mapping
216 | :param data: the actual data to export
217 | :param details: the details keys to look for
218 | """
219 | with open(filename, "w", encoding="utf-8") as f:
220 | if fmt == Exporter.JSON:
221 | # The JSON format is straightforward, we dump the flattened objects to JSON
222 | json.dump(data, f, ensure_ascii=False, indent=4)
223 | else:
224 | # The CSV format requires some work, to select the most relevant information
225 | fieldnames = csv_keys.keys()
226 | w = csv.DictWriter(f, fieldnames=fieldnames)
227 | w.writeheader()
228 | for el in data:
229 | el_csv = {}
230 | for key in csv_keys:
231 | # First we look for the key specified by csv_keys and select the corresponding leaf
232 | k = csv_keys[key]
233 | selected = None
234 | last_key = None
235 | if type(k) is str:
236 | last_key = k
237 | k = [k]
238 | if k[0] in el.keys():
239 | selected = el[k[0]]
240 | else:
241 | el_csv[key] = ""
242 | continue
243 | if len(k) > 1:
244 | for subkey in k[1:]:
245 | if subkey in selected.keys():
246 | selected = selected[subkey]
247 | last_key = subkey
248 | # Once the leaf is selected, we verify if there is any kind of ID mapping and act accordingly
249 | if type(selected) is dict and 'id' in selected.keys() and 'details' in selected.keys() and last_key in details.keys():
250 | el_csv[key] = "%s (%d)" % (selected["details"][details[last_key]], selected["id"])
251 | elif type(selected) is not dict and type(selected) is not list:
252 | el_csv[key] = selected
253 | else:
254 | el_csv[key] = "unknown"
255 | # And we write the row
256 | w.writerow(el_csv)
257 |
258 | @staticmethod
259 | def export_posts(posts, fmt, filename, tags_list=None, categories_list=None, users_list=None):
260 | """
261 | Exports posts in specified format to specified file
262 |
263 | :param posts: the posts to export
264 | :param fmt: the export format (JSON or CSV)
265 | :param tags_list: a list of tags to associate them with tag ids
266 | :param categories_list: a list of categories to associate them with
267 | category ids
268 | :param users_list: a list of users to associate them with author id
269 | :return: the length of the list written to the file
270 | """
271 | exported_posts = Exporter.setup_export(posts,
272 | [['title', 'rendered'], ['content', 'rendered'], ['excerpt', 'rendered']],
273 | {
274 | 'author': users_list,
275 | 'categories': categories_list,
276 | 'tags': tags_list,
277 | })
278 |
279 | filename = Exporter.prepare_filename(filename, fmt)
280 | csv_keys = {
281 | 'id': 'id',
282 | 'date': 'date',
283 | 'modified': 'modified',
284 | 'status': 'status',
285 | 'link': 'link',
286 | 'title': ['title', 'rendered'],
287 | 'author': 'author'
288 | }
289 | details = {
290 | 'author': 'name',
291 | }
292 | Exporter.write_file(filename, fmt, csv_keys, exported_posts, details)
293 | return len(exported_posts)
294 |
295 | @staticmethod
296 | def export_categories(categories, fmt, filename, category_list=None):
297 | """
298 | Exports categories in specified format to specified file.
299 |
300 | :param categories: the categories to export
301 | :param fmt: the export format (JSON or CSV)
302 | :param filename: the path to the file to write
303 | :param category_list: the list of categories to be used as parents
304 | :return: the length of the list written to the file
305 | """
306 | exported_categories = Exporter.setup_export(categories, # TODO
307 | [],
308 | {
309 | 'parent': category_list,
310 | })
311 |
312 | filename = Exporter.prepare_filename(filename, fmt)
313 |
314 | csv_keys = {
315 | 'id': 'id',
316 | 'name': 'name',
317 | 'post_count': 'count',
318 | 'description': 'description',
319 | 'parent': 'parent'
320 | }
321 | details = {
322 | 'parent': 'name'
323 | }
324 | Exporter.write_file(filename, fmt, csv_keys, exported_categories, details)
325 | return len(exported_categories)
326 |
327 | @staticmethod
328 | def export_tags(tags, fmt, filename):
329 | """
330 | Exports tags in specified format to specified file
331 |
332 | :param tags: the tags to export
333 | :param fmt: the export format (JSON or CSV)
334 | :param filename: the path to the file to write
335 | :return: the length of the list written to the file
336 | """
337 | filename = Exporter.prepare_filename(filename, fmt)
338 |
339 | exported_tags = tags # It seems that no modification will be done for this one, so no deepcopy
340 | csv_keys = {
341 | 'id': 'id',
342 | 'name': 'name',
343 | 'post_count': 'count',
344 | 'description': 'description'
345 | }
346 | Exporter.write_file(filename, fmt, csv_keys, exported_tags)
347 | return len(exported_tags)
348 |
349 | @staticmethod
350 | def export_users(users, fmt, filename):
351 | """
352 | Exports users in specified format to specified file.
353 |
354 | :param users: the users to export
355 | :param fmt: the export format (JSON or CSV)
356 | :param filename: the path to the file to write
357 | :return: the length of the list written to the file
358 | """
359 | filename = Exporter.prepare_filename(filename, fmt)
360 |
361 | exported_users = users # It seems that no modification will be done for this one, so no deepcopy
362 | csv_keys = {
363 | 'id': 'id',
364 | 'name': 'name',
365 | 'link': 'link',
366 | 'description': 'description'
367 | }
368 | Exporter.write_file(filename, fmt, csv_keys, exported_users)
369 | return len(exported_users)
370 |
371 | @staticmethod
372 | def export_pages(pages, fmt, filename, parent_pages=None, users=None):
373 | """
374 | Exports pages in specified format to specified file.
375 |
376 | :param pages: the pages to export
377 | :param fmt: the export format (JSON or CSV)
378 | :param filename: the path to the file to write
379 | :param parent_pages: the list of all cached pages, to get parents
380 | :param users: the list of all cached users, to get users
381 | :return: the length of the list written to the file
382 | """
383 | exported_pages = Exporter.setup_export(pages,
384 | [["guid", "rendered"], ["title", "rendered"], ["content", "rendered"], ["excerpt", "rendered"]],
385 | {
386 | 'parent': parent_pages,
387 | 'author': users,
388 | })
389 |
390 | filename = Exporter.prepare_filename(filename, fmt)
391 | csv_keys = {
392 | 'id': 'id',
393 | 'title': ['title', 'rendered'],
394 | 'date': 'date',
395 | 'modified': 'modified',
396 | 'status': 'status',
397 | 'link': 'link',
398 | 'author': 'author',
399 | 'protected': ['content', 'protected']
400 | }
401 | details = {
402 | 'author': 'name'
403 | }
404 | Exporter.write_file(filename, fmt, csv_keys, exported_pages, details)
405 | return len(exported_pages)
406 |
407 | @staticmethod
408 | def export_media(media, fmt, filename, users=None):
409 | """
410 | Exports media in specified format to specified file.
411 |
412 | :param media: the media to export
413 | :param fmt: the export format (JSON or CSV)
414 | :param users: a list of users to associate them with author ids
415 | :return: the length of the list written to the file
416 | """
417 | exported_media = Exporter.setup_export(media,
418 | [
419 | ['guid', 'rendered'],
420 | ['title', 'rendered'],
421 | ['description', 'rendered'],
422 | ['caption', 'rendered'],
423 | ],
424 | {
425 | 'author': users,
426 | })
427 |
428 | filename = Exporter.prepare_filename(filename, fmt)
429 | csv_keys = {
430 | 'id': 'id',
431 | 'title': ['title', 'rendered'],
432 | 'date': 'date',
433 | 'modified': 'modified',
434 | 'status': 'status',
435 | 'link': 'link',
436 | 'author': 'author',
437 | 'media_type': 'media_type'
438 | }
439 | details = {
440 | 'author': 'name'
441 | }
442 | Exporter.write_file(filename, fmt, csv_keys, exported_media, details)
443 | return len(exported_media)
444 |
445 | @staticmethod
446 | def export_namespaces(namespaces, fmt, filename):
447 | """
448 | **NOT IMPLEMENTED** Exports namespaces in specified format to specified file.
449 |
450 | :param namespaces: the namespaces to export
451 | :param fmt: the export format (JSON or CSV)
452 | :return: the length of the list written to the file
453 | """
454 | Console.log_info("Namespaces export not available yet")
455 | return 0
456 |
457 | # FIXME to be refactored
458 | @staticmethod
459 | def export_comments_interactive(comments, fmt, filename, parent_posts=None, users=None):
460 | """
461 | Exports comments in specified format to specified file.
462 |
463 | :param comments: the comments to export
464 | :param fmt: the export format (JSON or CSV)
465 | :param filename: the path to the file to write
466 | :param parent_posts: the list of all cached posts, to get parent posts (not used yet because this could be too verbose)
467 | :param users: the list of all cached users, to get users
468 | :return: the length of the list written to the file
469 | """
470 | exported_comments = Exporter.setup_export(comments,
471 | [["content", "rendered"]],
472 | {
473 | 'post': parent_posts,
474 | 'author': users,
475 | })
476 |
477 | # FIXME replacing the post ID by the post title in CSV mode doesn't work yet (nested keys)
478 | filename = Exporter.prepare_filename(filename, fmt)
479 | csv_keys = {
480 | 'id': 'id',
481 | 'post': 'post',
482 | 'date': 'date',
483 | 'status': 'status',
484 | 'link': 'link',
485 | 'author': 'author_name',
486 | }
487 | details = {
488 | 'post': ['title', 'rendered']
489 | }
490 | Exporter.write_file(filename, fmt, csv_keys, exported_comments, details)
491 | return len(exported_comments)
492 |
493 | # TODO deprecated, to be moved to export_posts when HTML will be supported
494 | @staticmethod
495 | def export_posts_html(posts, folder, tags_list=None, categories_list=None,
496 | users_list=None):
497 | """
498 | Exports posts as HTML to specified export folder.
499 |
500 | :param posts: the posts to export
501 | :param folder: the export folder
502 | :param tags_list: a list of tags to associate them with tag ids
503 | :param categories_list: a list of categories to associate them with category ids
504 | :param users_list: a list of users to associate them with author id
505 | :return: the length of the list written to the file
506 | """
507 | exported_posts = 0
508 |
509 | date_format = "%Y-%m-%dT%H:%M:%S-%Z"
510 |
511 | if not os.path.isdir(folder):
512 | os.makedirs(folder)
513 | for post in posts:
514 | post_file = None
515 | if 'slug' in post.keys():
516 | post_file = open(os.path.join(folder, post['slug'])+".html",
517 | "wt", encoding="utf-8")
518 | else:
519 | post_file = open(os.path.join(folder, str(post['id']))+".html",
520 | "wt", encoding="utf-8")
521 |
522 | title = "Unknown"
523 | if 'title' in post.keys() and 'rendered' in post['title'].keys():
524 | title = post['title']['rendered']
525 |
526 | date_gmt = "Unknown"
527 | if 'date_gmt' in post.keys():
528 | date_gmt = datetime.strptime(post['date_gmt'] +
529 | "-GMT", date_format)
530 | modified_gmt = "Unknown"
531 | if 'modified_gmt' in post.keys():
532 | modified_gmt = datetime.strptime(post['modified_gmt'] +
533 | "-GMT", date_format)
534 | status = "Unknown"
535 | if 'status' in post.keys():
536 | status = post['status']
537 |
538 | post_type = "Unknown"
539 | if 'type' in post.keys():
540 | post_type = post['type']
541 |
542 | link = "Unknown"
543 | if 'link' in post.keys():
544 | link = html.escape(post['link'])
545 |
546 | comments = "Unknown"
547 | if 'comment_status' in post.keys():
548 | comments = html.escape(post['comment_status'])
549 |
550 | content = "Unknown"
551 | if 'content' in post.keys() and 'rendered' in \
552 | post['content'].keys():
553 | content = post['content']['rendered']
554 |
555 | excerpt = "Unknown"
556 | if 'excerpt' in post.keys() and 'rendered' in \
557 | post['excerpt'].keys():
558 | excerpt = post['excerpt']['rendered']
559 |
560 | author = "Unknown"
561 | if 'author' in post.keys() and users_list is not None:
562 | author_obj = get_by_id(users_list, post['author'])
563 | author = "%d: " % post['author']
564 | if author_obj is not None:
565 | if 'name' in author_obj.keys():
566 | author += author_obj['name']
567 | if 'slug' in author_obj.keys():
568 | author += "(%s)" % author_obj['slug']
569 | if 'link' in author_obj.keys():
570 | author += " - <a href=\"%s\">%s</a>" % \
571 | (author_obj['link'], author_obj['link'])
572 | elif 'author' in post.keys():
573 | author = str(post['author'])
574 |
575 | categories = "<li>Unknown</li>"
576 | if 'categories' in post.keys() and categories_list is not None:
577 | categories = ""
578 | for cat in post['categories']:
579 | cat_obj = get_by_id(categories_list, cat)
580 | categories += "<li>%d: " % cat
581 | if cat_obj is not None:
582 | if 'name' in cat_obj.keys():
583 | categories += cat_obj['name']
584 | if 'link' in cat_obj.keys():
585 | categories += " - <a href=\"%s\">%s</a>" % \
586 | (html.escape(cat_obj['link']),
587 | html.escape(cat_obj['link']))
588 | categories += "</li>"
589 | elif 'categories' in post.keys():
590 | categories = ""
591 | for cat in post['categories']:
592 | categories += "<li>" + str(cat) + "</li>"
593 |
594 | tags = "<li>Unknown</li>"
595 | if 'tags' in post.keys() and tags_list is not None:
596 | tags = ""
597 | for tag in post['tags']:
598 | tag_obj = get_by_id(tags_list, tag)
599 | tags += "<li>%d: " % tag
600 | if tag_obj is not None:
601 | if 'name' in tag_obj.keys():
602 | tags += tag_obj['name']
603 | if 'link' in tag_obj.keys():
604 | tags += " - <a href=\"%s\">%s</a>" % \
605 | (html.escape(tag_obj['link']),
606 | html.escape(tag_obj['link']))
607 | tags += "</li>"
608 | elif 'tags' in post.keys():
609 | tags = ""
610 | for tag in post['tags']:
611 | tags += "<li>" + str(tag) + "</li>"
612 |
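613 | buffer = \
614 | """<!DOCTYPE html>
615 | <html>
616 | <head>
617 | <title>{title}</title>
618 | </head>
619 | <body>
620 | <div>
621 | <h1>Metadata</h1>
622 | <ul>
623 | <li>Date (GMT): {date_gmt}</li>
624 | <li>Date modified (GMT): {modified_gmt}</li>
625 | <li>Status: {status}</li>
626 | <li>Type: {post_type}</li>
627 | <li>Link: {link}</li>
628 | <li>Author: {author}</li>
629 | <li>Comment status: {comments}</li>
630 | <li>
631 | Categories:
632 | <ul>
633 | {categories}
634 | </ul>
635 | </li>
636 | <li>
637 | Tags:
638 | <ul>
639 | {tags}
640 | </ul>
641 | </li>
642 | </ul>
643 | </div>
644 | <div>
645 | <h1>Excerpt</h1>
646 | {excerpt}
647 | </div>
648 | <div>
649 | <h1>{title}</h1>
650 | {content}
651 | </div>
652 | </body>
653 | </html>
654 | """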
655 | buffer = buffer.format(
656 | title=title,
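657 | date_gmt=date_gmt.strftime("%d/%m/%Y %H:%M:%S") if isinstance(date_gmt, datetime) else date_gmt,
658 | modified_gmt=modified_gmt.strftime("%d/%m/%Y %H:%M:%S") if isinstance(modified_gmt, datetime) else modified_gmt,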
659 | status=status,
660 | post_type=post_type,
661 | link=link,
662 | author=author,
663 | comments=comments,
664 | categories=categories,
665 | tags=tags,
666 | excerpt=excerpt,
667 | content=content
668 | )
669 |
670 | post_file.write(buffer)
671 | post_file.close()
672 | exported_posts += 1
673 |
674 | return exported_posts
675 |
676 | @staticmethod
677 | def export_comments(posts, orphan_comments, export_folder):
678 | """
679 | Exports comments from posts and from orphans list
680 | """
681 | exported_comments = 0
682 | for post in posts:
683 | if 'comments' in post.keys() and len(post['comments']) > 0:
684 | for comment in post['comments']:
685 | if 'slug' in post.keys() and len(post['slug']) > 0:
686 | Exporter.export_comments_helper(comment, post['slug'], export_folder)
687 | else:
688 | Exporter.export_comments_helper(comment, str(post['id']), export_folder)
689 | exported_comments += 1
690 | for comment in orphan_comments:
691 | Exporter.export_comments_helper(comment, '__orphan_comments', export_folder)
692 | exported_comments += 1
693 | return exported_comments
694 |
695 | @staticmethod
696 | def export_comments_helper(comment, post, export_folder):
697 | date_format = "%Y-%m-%dT%H:%M:%S-%Z"
698 | if not os.path.isdir(export_folder):
699 | os.mkdir(export_folder)
700 | if not os.path.isdir(os.path.join(export_folder, post)):
701 | os.mkdir(os.path.join(export_folder, post))
702 | out_file = open(os.path.join(export_folder, post, "%04d.html" % comment['id']), "wt", encoding="utf-8")
703 | date_gmt = "Unknown"
704 | if 'date_gmt' in comment.keys():
705 | date_gmt = datetime.strptime(comment['date_gmt'] +
706 | "-GMT", date_format)
707 | post_link = "None"
708 | if '_links' in comment.keys() and 'up' in comment['_links'].keys() and len(comment['_links']['up']) > 0 and 'href' in comment['_links']['up'][0].keys():
709 | post_link = html.escape(comment['_links']['up'][0]['href'])
710 | buffer = """
711 | <!DOCTYPE html>
712 | <html>
713 | <head>
714 | <title>{author}</title>
715 | </head>
716 | <body>
717 | <div>
718 | <h1>Metadata</h1>
719 | <ul>
720 | <li>Date (GMT): {date_gmt}</li>
721 | <li>Status: {status}</li>
722 | <li>Link: {link}</li>
723 | <li>Author: {author}</li>
724 | <li>Author URL: {author_url}</li>
725 | <li>Post ID: {post_id}</li>
726 | <li>Post link: {post_link}</li>
727 | </ul>
728 | </div>
729 | <div>
730 | <h1>{author} on {post_title}</h1>
731 | {content}
732 | </div>
733 | </body>
734 | </html>
735 | """
736 | buffer = buffer.format(
737 | author=html.escape(comment["author_name"]),
738 | author_url=html.escape(comment['author_url']),
739 | date_gmt=date_gmt.strftime("%d/%m/%Y %H:%M:%S") if isinstance(date_gmt, datetime) else date_gmt,
740 | status=html.escape(comment['status']),
741 | link=html.escape(comment['link']),
742 | content=html.escape(comment['content']['rendered']),
743 | post_title=html.escape(post),
744 | post_id=int(comment['post']),
745 | post_link=post_link
746 | )
747 | out_file.write(buffer)
748 | out_file.close()
749 |
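The ID mapping described in the `map_params` docstring can be illustrated in isolation. This is a minimal sketch under stated assumptions, not the module's code: `map_ids` is a hypothetical name, and `get_by_id` here is a stand-in whose behavior (return the first dict with a matching `id`, else `None`) is assumed, since `lib/utils.py` is not shown in this section.

```python
import copy


def get_by_id(values, obj_id):
    """Stand-in for lib.utils.get_by_id (assumed behavior, not the real helper)."""
    for obj in values:
        if obj is not None and obj.get("id") == obj_id:
            return obj
    return None


def map_ids(el, lookups):
    """Replace ID references in el, in place, with {'id'/'ids', 'details'} objects."""
    # Iterate over a snapshot so freshly mapped values are not visited again
    for key, value in list(el.items()):
        candidates = lookups.get(key)
        if candidates is not None and isinstance(value, int) and not isinstance(value, bool):
            obj = get_by_id(candidates, value)
            if obj is not None:
                el[key] = {"id": value, "details": obj}
        elif candidates is not None and isinstance(value, list):
            el[key] = {"ids": value, "details": [get_by_id(candidates, v) for v in value]}
        elif isinstance(value, dict):
            # Nested object: look for ID references inside it
            map_ids(value, lookups)


# Usage: work on a deep copy so the cached objects stay untouched
post = {"id": 7, "author": 2, "tags": [1, 9], "meta": {"author": 2}}
users = [{"id": 2, "name": "alice"}]
tags = [{"id": 1, "name": "news"}]
mapped = copy.deepcopy(post)
map_ids(mapped, {"author": users, "tags": tags})
```

Unknown IDs in a list are kept as `None` entries in `details`, so the positions in `ids` and `details` stay aligned.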
--------------------------------------------------------------------------------
/lib/infodisplayer.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 | import html
24 | import csv
25 | from datetime import datetime
26 |
27 | from lib.console import Console
28 |
29 | class InfoDisplayer:
30 | """
31 | Static class to display information for different categories
32 | """
33 |
34 | @staticmethod
35 | def display_basic_info(information):
36 | """
37 | Displays basic information about the WordPress instance
38 | :param information: information as a JSON object
39 | """
40 | print()
41 |
42 | if 'name' in information.keys():
43 | print("Site name: %s" % html.unescape(information['name']))
44 |
45 | if 'description' in information.keys():
46 | print("Site description: %s" %
47 | html.unescape(information['description']))
48 |
49 | if 'home' in information.keys():
50 | print("Site home: %s" % html.unescape(information['home']))
51 |
52 | if 'gmt_offset' in information.keys():
53 | timezone_string = ""
54 | gmt_offset = str(information['gmt_offset'])
55 | if '-' not in gmt_offset:
56 | gmt_offset = '+' + gmt_offset
57 | if 'timezone_string' in information.keys():
58 | timezone_string = information['timezone_string']
59 | print("Site Timezone: %s (GMT%s)" % (timezone_string, gmt_offset))
60 |
61 | if 'namespaces' in information.keys():
62 | print('Namespaces (API provided by addons):')
63 | ns_ref = {}
64 | try:
65 | ns_ref_file = open("lib/plugins/plugin_list.csv", "rt")
66 | ns_ref_reader = csv.reader(ns_ref_file)
67 | for row in ns_ref_reader:
68 | desc = None
69 | url = None
70 | if len(row) > 1 and len(row[1]) > 0:
71 | desc = row[1]
72 | if len(row) > 2 and len(row[2]) > 0:
73 | url = row[2]
74 | ns_ref[row[0]] = {"desc": desc, "url": url}
75 | ns_ref_file.close()
76 | except Exception:
77 | Console.log_error("Could not load namespaces reference file")
78 | for ns in information['namespaces']:
79 | tip = ""
80 | if ns in ns_ref.keys():
81 | if ns_ref[ns]['desc'] is not None:
82 | if tip == "":
83 | tip += " - "
84 | tip += ns_ref[ns]['desc']
85 | if ns_ref[ns]['url'] is not None:
86 | # A single separator, whether or not a description precedes the URL
87 | tip += " - "
88 | tip += ns_ref[ns]['url']
89 | print(' %s%s' % (ns, tip))
90 |
91 | # TODO, dive into authentication
92 | print()
93 |
94 | @staticmethod
95 | def display_namespaces(information, details=False):
96 | """
97 | Displays namespace list of the WordPress API
98 |
99 | :param information: information as a JSON object
100 | :param details: unused, available for compatibility purposes
101 | """
102 | print()
103 | if information is not None:
104 | for ns in information:
105 | print("* %s" % ns)
106 | print()
107 |
108 | @staticmethod
109 | def display_endpoints(information):
110 | """
111 | Displays endpoint documentation of the WordPress API
112 | :param information: information as a JSON object
113 | """
114 | print()
115 |
116 | if 'routes' not in information.keys():
117 | Console.log_error("Did not find the routes for endpoint discovery")
118 | return None
119 |
120 | for url, route in information['routes'].items():
121 | print("%s (Namespace: %s)" % (url, route['namespace']))
122 | for endpoint in route['endpoints']:
123 | methods = " "
124 | first = True
125 | for method in endpoint['methods']:
126 | if first:
127 | methods += method
128 | first = False
129 | else:
130 | methods += ", " + method
131 | print(methods)
132 | if len(endpoint['args']) > 0:
133 | for arg, props in endpoint['args'].items():
134 | required = ""
135 | if props['required']:
136 | required = " (required)"
137 | print(" " + arg + required)
138 | if 'type' in props.keys():
139 | print(" type: " + str(props['type']))
140 | if 'default' in props.keys():
141 | print(" default: " +
142 | str(props['default']))
143 | if 'enum' in props.keys():
144 | allowed = " allowed values: "
145 | first = True
146 | for val in props['enum']:
147 | if first:
148 | allowed += str(val)
149 | first = False
150 | else:
151 | allowed += ", " + str(val)
152 | print(allowed)
153 | if 'description' in props.keys():
154 | print(" " + str(props['description']))
155 | print()
156 |
157 | @staticmethod
158 | def display_posts(information, orphan_comments=[], details=False):
159 | """
160 | Displays posts published on the WordPress instance
161 | :param information: information as a JSON object
162 | """
163 | print()
164 | date_format = "%Y-%m-%dT%H:%M:%S-%Z"
165 | for post in information:
166 | if post is not None:
167 | line = ""
168 | if 'id' in post.keys():
169 | line += "ID: %d" %post['id']
170 | if 'title' in post.keys():
171 | line += " - " + html.unescape(post['title']['rendered'])
172 | if 'date_gmt' in post.keys():
173 | date_gmt = datetime.strptime(post['date_gmt'] +
174 | "-GMT", date_format)
175 | line += " on %s" % \
176 | date_gmt.strftime("%d/%m/%Y at %H:%M:%S")
177 | if 'link' in post.keys():
178 | line += " - " + post['link']
179 | if details:
180 | if 'slug' in post.keys():
181 | line += "\nSlug: " + post['slug']
182 | if 'status' in post.keys():
183 | line += "\nStatus: " + post['status']
184 | if 'author' in post.keys():
185 | line += "\nAuthor ID: %d" % post['author']
186 | if 'comment_status' in post.keys():
187 | line += "\nComment status: " + post['comment_status']
188 | if 'template' in post.keys() and len(post['template']) > 0:
189 | line += "\nTemplate: " + post['template']
190 | if 'categories' in post.keys() and len(post['categories']) > 0:
191 | line += "\nCategory IDs: "
192 | for cat in post['categories']:
193 | line += "%d, " % cat
194 | line = line[:-2]
195 | if 'excerpt' in post.keys():
196 | line += "\nExcerpt: "
197 | if 'protected' in post['excerpt'].keys() and post['excerpt']['protected']:
198 | line += ""
199 | elif 'rendered' in post['excerpt'].keys():
200 | line += "\n" + html.unescape(post['excerpt']['rendered'])
201 | if 'content' in post.keys():
202 | line += "\nContent: "
203 | if 'protected' in post['content'].keys() and post['content']['protected']:
204 | line += ""
205 | elif 'rendered' in post['content'].keys():
206 | line += "\n" + html.unescape(post['content']['rendered'])
207 | if 'comments' in post.keys():
208 | for comment in post['comments']:
209 | line += "\n\t * Comment by %s from (%s) - %s" % (comment['author_name'], comment['author_url'], comment['link'])
210 | print(line)
211 |
212 | if len(orphan_comments) > 0:
213 | # TODO: Untested code, may never be executed, I don't know how the REST API and WordPress handle post/comment link in back-end
214 | print()
215 | print("Found orphan comments! Check them right below:")
216 | for comment in orphan_comments:
217 | print("\t * Comment by %s from (%s) on post ID %d - %s" % (comment['author_name'], comment['author_url'], comment['post'], comment['link']))
218 | print()
219 |
220 | @staticmethod
221 | def display_comments(information, details=False):
222 | """
223 | Displays comments published on the WordPress instance.
224 |
225 | :param information: information as a JSON object
226 | :param details: if the details should be displayed
227 | """
228 | print()
229 | date_format = "%Y-%m-%dT%H:%M:%S-%Z"
230 | for comment in information:
231 | if comment is not None:
232 | line = ""
233 | if 'id' in comment.keys():
234 | line += "ID: %d" % comment['id']
235 | if 'post' in comment.keys():
236 | line += " - Post ID: %d" % comment['post'] #html.unescape(post['title']['rendered'])
237 | if 'author_name' in comment.keys():
238 | line += " - By %s" % comment['author_name']
239 | if 'date_gmt' in comment.keys():
240 | date_gmt = datetime.strptime(comment['date_gmt'] +
241 | "-GMT", date_format)
242 | line += " on %s" % \
243 | date_gmt.strftime("%d/%m/%Y at %H:%M:%S")
244 | if details:
245 | if 'parent' in comment.keys() and comment['parent'] != 0:
246 | line += "\nParent ID: %d" % comment['parent']
247 | if 'link' in comment.keys():
248 | line += "\nLink: " + comment['link']
249 | if 'status' in comment.keys():
250 | line += "\nStatus: " + comment['status']
251 | if 'author_url' in comment.keys() and len(comment['author_url']) > 0:
252 | line += "\nAuthor URL: " + comment['author_url']
253 | if 'content' in comment.keys():
254 | line += "\nContent: \n" + html.unescape(comment['content']['rendered'])
255 | print(line)
256 | print()
257 |
258 | @staticmethod
259 | def display_users(information, details=False):
260 | """
261 | Displays users on the WordPress instance
262 |
263 | :param information: information as a JSON object
264 | :param details: display more details about the user
265 | """
266 | print()
267 | for user in information:
268 | if user is not None:
269 | line = ""
270 | if 'id' in user.keys():
271 | line += "User ID: %d\n" % user['id']
272 | if 'name' in user.keys():
273 | line += " Display name: %s\n" % user['name']
274 | if 'slug' in user.keys():
275 | line += " User name (probable): %s\n" % user['slug']
276 | if 'description' in user.keys():
277 | line += " User description: %s\n" % user['description']
278 | if 'url' in user.keys():
279 | line += " User website: %s\n" % user['url']
280 | if 'link' in user.keys():
281 | line += " User personal page: %s\n" % user['link']
282 | if details:
283 | if "avatar_urls" in user.keys() and type(user["avatar_urls"]) is dict and len(user["avatar_urls"].keys()) > 0:
284 | line += " Avatars: \n"
285 | for key, value in user["avatar_urls"].items():
286 | line += " * %s: %s\n" % (key, value)
287 | print(line)
288 | print()
289 |
290 | @staticmethod
291 | def display_categories(information, details=False):
292 | """
293 | Displays categories of the WordPress instance
294 | :param information: information as a JSON object
295 | """
296 | print()
297 | for category in information:
298 | if category is not None:
299 | line = ""
300 | if 'id' in category.keys():
301 | line += "Category ID: %d\n" % category['id']
302 | if 'name' in category.keys():
303 | line += " Name: %s\n" % category['name']
304 | if 'description' in category.keys():
305 | line += " Description: %s\n" % category['description']
306 | if 'count' in category.keys():
307 | line += " Number of posts: %d\n" % category['count']
308 | if 'link' in category.keys():
309 | line += " Page: %s\n" % category['link']
310 | if details:
311 | if 'slug' in category.keys():
312 | line += " Slug: %s\n" % category['slug']
313 | if 'taxonomy' in category.keys():
314 | line += " Taxonomy: %s\n" % category['taxonomy']
315 | if 'parent' in category.keys():
316 | line += " Parent category: "
317 | if type(category['parent']) is str:
318 | line += category['parent']
319 | elif type(category['parent']) is int:
320 | line += "%d" % category['parent']
321 | else:
322 | line += "Unknown"
323 | line += "\n"
324 | print(line)
325 | print()
326 |
327 | @staticmethod
328 | def display_tags(information, details=False):
329 | """
330 | Displays tags of the WordPress instance
331 | :param information: information as a JSON object
332 | """
333 | print()
334 | for tag in information:
335 | if tag is not None:
336 | line = ""
337 | if 'id' in tag.keys():
338 | line += "Tag ID: %d\n" % tag['id']
339 | if 'name' in tag.keys():
340 | line += " Name: %s\n" % tag['name']
341 | if 'description' in tag.keys():
342 | line += " Description: %s\n" % tag['description']
343 | if 'count' in tag.keys():
344 | line += " Number of posts: %d\n" % tag['count']
345 | if 'link' in tag.keys():
346 | line += " Page: %s\n" % tag['link']
347 | if details:
348 | if 'slug' in tag.keys():
349 | line += " Slug: %s\n" % tag['slug']
350 | if 'taxonomy' in tag.keys():
351 | line += " Taxonomy: %s\n" % tag['taxonomy']
352 | print(line)
353 | print()
354 |
355 | @staticmethod
356 | def display_media(information, details=False):
357 | """
358 | Displays media objects of the WordPress instance
359 |
360 | :param information: information as a JSON object
361 | :param details: if the details should be displayed
362 | """
363 | print()
364 | date_format = "%Y-%m-%dT%H:%M:%S-%Z"
365 | for media in information:
366 | if media is not None:
367 | line = ""
368 | if 'id' in media.keys():
369 | line += "Media ID: %d\n" % media['id']
370 | if 'title' in media.keys() and 'rendered' in media['title']:
371 | line += " Media title: %s\n" % \
372 | html.unescape(media['title']['rendered'])
373 | if 'date_gmt' in media.keys():
374 | date_gmt = datetime.strptime(media['date_gmt'] +
375 | "-GMT", date_format)
376 | line += " Upload date (GMT): %s\n" % \
377 | date_gmt.strftime("%d/%m/%Y %H:%M:%S")
378 | if 'media_type' in media.keys():
379 | line += " Media type: %s\n" % media['media_type']
380 | if 'mime_type' in media.keys():
381 | line += " Mime type: %s\n" % media['mime_type']
382 | if 'link' in media.keys():
383 | line += " Page: %s\n" % media['link']
384 | if 'source_url' in media.keys():
385 | line += " Source URL: %s\n" % media['source_url']
386 | if details:
387 | if 'slug' in media.keys():
388 | line += "Slug: " + media['slug'] + "\n"
389 | if 'status' in media.keys():
390 | line += "Status: " + media['status'] + "\n"
391 | if 'type' in media.keys():
392 | line += "Type: " + media['type'] + "\n"
393 | if 'author' in media.keys():
394 | line += "Author ID: %d\n" % media['author']
395 | if 'alt_text' in media.keys():
396 | line += "Alt text: " + media['alt_text'] + "\n"
397 | if 'comment_status' in media.keys():
398 | line += "Comment status: " + media['comment_status'] + "\n"
399 | if 'post' in media.keys():
400 | line += "Post or page ID: %d\n" % media['post']
401 | if 'description' in media.keys() and media['description']['rendered']:
402 | line += "Description: \n" + html.unescape(media['description']['rendered']) + "\n"
403 | if 'caption' in media.keys() and media['caption']['rendered']:
404 | line += "Caption: \n" + html.unescape(media['caption']['rendered']) + "\n"
405 | print(line)
406 | print()
407 |
408 | @staticmethod
409 | def display_pages(information, details=False):
410 | """
411 | Displays pages published on the WordPress instance
412 |
413 | :param information: information as a JSON object
414 | :param details: if the details should be displayed
415 | """
416 | print()
417 | for page in information:
418 | if page is not None:
419 | line = ""
420 | if 'id' in page.keys():
421 | line += "ID: %d" % page['id']
422 | if 'title' in page.keys() and 'rendered' in page['title']:
423 | line += " - " + html.unescape(page['title']['rendered'])
424 | if 'link' in page.keys():
425 | line += " - " + page['link']
426 | if details:
427 | if 'slug' in page.keys():
428 | line += "\nSlug: " + page['slug']
429 | if 'status' in page.keys():
430 | line += "\nStatus: " + page['status']
431 | if 'author' in page.keys():
432 | line += "\nAuthor ID: %d" % page['author']
433 | if 'comment_status' in page.keys():
434 | line += "\nComment status: " + page['comment_status']
435 | if 'template' in page.keys() and len(page['template']) > 0:
436 | line += "\nTemplate: " + page['template']
437 | if 'parent' in page.keys():
438 | if page['parent'] == 0:
439 | line += "\nParent: none"
440 | else:
441 | line += "\nParent ID: %d" % page['parent']
442 | if 'excerpt' in page.keys():
443 | line += "\nExcerpt: "
444 | if 'protected' in page['excerpt'].keys() and page['excerpt']['protected']:
445 | line += ""
446 | elif 'rendered' in page['excerpt'].keys():
447 | line += "\n" + html.unescape(page['excerpt']['rendered'])
448 | if 'content' in page.keys():
449 | line += "\nContent: "
450 | if 'protected' in page['content'].keys() and page['content']['protected']:
451 | line += ""
452 | elif 'rendered' in page['content'].keys():
453 | line += "\n" + html.unescape(page['content']['rendered'])
454 | print(line)
455 | print()
456 |
457 | @staticmethod
458 | def recurse_list_or_dict(data, tab):
459 | """
460 | Helper function to generate recursive display of API data
461 | """
462 | if type(data) is not dict and type(data) is not list:
463 | return tab + str(data)
464 |
465 | line = ""
466 | if type(data) is list:
467 | i = 0
468 | length = len(data)
469 | for value in data:
470 | do_jmp = True
471 | if type(value) is dict or type(value) is list:
472 | line += InfoDisplayer.recurse_list_or_dict(value, tab+"\t")
473 | elif type(value) is str:
474 | if "\n" in value:
475 | line += "\n" + tab + "\t"
476 | line += value.replace("\n", "\n"+tab+"\t")
477 | else:
478 | line += " "
479 | line += value.replace("\n", "\n"+tab)
480 | do_jmp = False
481 | else:
482 | line += " " + str(value)
483 | if i < length and do_jmp:
484 | line += "\n"
485 | i += 1
486 | else:
487 | for key,value in data.items():
488 | line += "\n" + tab + key
489 | if type(value) is dict or type(value) is list:
490 | line += InfoDisplayer.recurse_list_or_dict(value, tab+"\t")
491 | elif type(value) is str:
492 | if "\n" in value:
493 | line += "\n" + tab + "\t"
494 | line += value.replace("\n", "\n"+tab+"\t")
495 | else:
496 | line += " "
497 | line += value.replace("\n", "\n"+tab)
498 | else:
499 | line += " " + str(value)
500 | return line
501 |
502 | @staticmethod
503 | def display_crawled_ns(information):
504 | """
505 | Displays endpoints details published on the WordPress instance
506 | :param information: information as a JSON object
507 | """
508 | print()
509 | for url,data in information.items():
510 | line = "\n"
511 | line += url
512 | tab = "\t"
513 | line += InfoDisplayer.recurse_list_or_dict(data, tab)
514 | print(line)
515 | print()
516 |
--------------------------------------------------------------------------------
/lib/interactive.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 | import cmd
24 | import argparse
25 | import shlex
26 | import sys
27 | import re
28 | import copy
29 | import os
30 |
31 | from lib.wpapi import WPApi, WordPressApiNotV2
32 | from lib.requestsession import RequestSession
33 | from lib.console import Console
34 | from lib.infodisplayer import InfoDisplayer
35 | from lib.exporter import Exporter
36 | from lib.utils import get_by_id
37 |
38 | class ArgumentParser(argparse.ArgumentParser):
39 | """
40 | Wrapper for argparse.ArgumentParser (especially the help function that quits the application after display)
41 | """
42 | def __init__(self, prog="", description=""):
43 | argparse.ArgumentParser.__init__(self, prog=prog, add_help=False, description=description)
44 | self.add_argument("--help", "-h", help="print this help", action="store_true")
45 | self.should_help = True
46 |
47 | def custom_parse_args(self, args):
48 | args = self.parse_args(shlex.split(args))
49 | if args.help:
50 | if self.should_help:
51 | self.print_help(sys.stdout)
52 | print()
53 | self.should_help = False
54 | return None
55 | if self.should_help:
56 | return args
57 | else:
58 | return None
59 |
60 | def error(self, message):
61 | if self.should_help:
62 | self.print_help(sys.stdout)
63 | print()
64 | self.should_help = False
65 |
66 | class InteractiveShell(cmd.Cmd):
67 | """
68 | The interactive shell for the application
69 | """
70 | intro = """
71 | Entering interactive session
72 | Use the 'help' command to get a list of available commands and parameters, 'exit' to quit
73 | `command -h` gives more details about a command
74 | """
75 | prompt = "> "
76 |
77 | def __init__(self, target, session, version):
78 | cmd.Cmd.__init__(self)
79 | self.target = target
80 | InteractiveShell.prompt = Console.red + target + Console.normal + " > "
81 | self.session = session
82 | self.version = version
83 | self.scanner = WPApi(self.target, session=session)
84 |
85 | @staticmethod
86 | def export_decorator(export_func, is_all, export_str, json, csv, values, kwargs = {}):
87 | if json is not None:
88 | json_file = json
89 | if is_all:
90 | json_file = json + "-" + export_str
91 | args = [values]
92 | args.append(Exporter.JSON)
93 | args.append(json_file)
94 | export_func(*args, **kwargs)
95 | if csv is not None:
96 | csv_file = csv
97 | if is_all:
98 | csv_file = csv + "-" + export_str
99 | args = [values]
100 | args.append(Exporter.CSV)
101 | args.append(csv_file)
102 | export_func(*args, **kwargs)
103 |
104 | def get_fetch_or_list_type(self, obj_type, plural=False):
105 | """
106 | Returns a dict containing all necessary metadata
107 | about the obj_type to list and fetch data
108 |
109 | :param obj_type: the type of the object
110 | :param plural: whether the name must be plural or not
111 | """
112 | display_func = None
113 | export_func = None
114 | additional_info = {}
115 | obj_name = ""
116 | if obj_type == WPApi.USER:
117 | display_func = InfoDisplayer.display_users
118 | export_func = Exporter.export_users
119 | additional_info = {}
120 | obj_name = "Users" if plural else "User"
121 | elif obj_type == WPApi.TAG:
122 | display_func = InfoDisplayer.display_tags
123 | export_func = Exporter.export_tags
124 | additional_info = {}
125 | obj_name = "Tags" if plural else "Tag"
126 | elif obj_type == WPApi.CATEGORY:
127 | display_func = InfoDisplayer.display_categories
128 | export_func = Exporter.export_categories
129 | additional_info = {
130 | 'category_list': self.scanner.categories
131 | }
132 | obj_name = "Categories" if plural else "Category"
133 | elif obj_type == WPApi.POST:
134 | display_func = InfoDisplayer.display_posts
135 | export_func = Exporter.export_posts
136 | additional_info = {
137 | 'tags_list': self.scanner.tags,
138 | 'categories_list': self.scanner.categories,
139 | 'users_list': self.scanner.users
140 | }
141 | obj_name = "Posts" if plural else "Post"
142 | elif obj_type == WPApi.PAGE:
143 | display_func = InfoDisplayer.display_pages
144 | export_func = Exporter.export_pages
145 | additional_info = {
146 | 'parent_pages': self.scanner.pages,
147 | 'users': self.scanner.users
148 | }
149 | obj_name = "Pages" if plural else "Page"
150 | elif obj_type == WPApi.COMMENT:
151 | display_func = InfoDisplayer.display_comments
152 | export_func = Exporter.export_comments_interactive
153 | additional_info = {
154 | #'parent_posts': self.scanner.posts, # May be too verbose
155 | 'users': self.scanner.users
156 | }
157 | obj_name = "Comments" if plural else "Comment"
158 | elif obj_type == WPApi.MEDIA:
159 | display_func = InfoDisplayer.display_media
160 | export_func = Exporter.export_media
161 | additional_info = {'users': self.scanner.users}
162 | obj_name = "Media"
163 | elif obj_type == WPApi.NAMESPACE:
164 | display_func = InfoDisplayer.display_namespaces
165 | export_func = Exporter.export_media # FIXME: looks copied from the MEDIA branch; namespaces probably need a dedicated exporter
166 | additional_info = {}
167 | obj_name = "Namespaces" if plural else "Namespace"
168 |
169 | return {
170 | "display_func": display_func,
171 | "export_func": export_func,
172 | "additional_info": additional_info,
173 | "obj_name": obj_name
174 | }
175 |
176 | def fetch_obj(self, obj_type, obj_id, cache=True, json=None, csv=None):
177 | """
178 | Displays and exports (if relevant) the object fetched by ID
179 |
180 | :param obj_type: the type of the object
181 | :param obj_id: the ID of the obj
182 | :param cache: whether to use the cache or not
183 | :param json: json export filename
184 | :param csv: csv export filename
185 | """
186 | prop = self.get_fetch_or_list_type(obj_type)
187 | print(prop["obj_name"] + " details")
188 | try:
189 | obj = self.scanner.get_obj_by_id(obj_type, obj_id, use_cache=cache)
190 | if len(obj) == 0:
191 | Console.log_info(prop["obj_name"] + " not found\n")
192 | else:
193 | prop["display_func"](obj, details=True)
194 | if len(prop["additional_info"].keys()) > 0:
195 | InteractiveShell.export_decorator(prop["export_func"], False, "", json, csv, obj, prop["additional_info"])
196 | else:
197 | InteractiveShell.export_decorator(prop["export_func"], False, "", json, csv, obj)
198 | except WordPressApiNotV2:
199 | Console.log_error("The API does not support WP V2")
200 | except IOError as e:
201 | Console.log_error("Could not open %s for writing" % e.filename)
202 | print()
203 |
204 | def list_obj(self, obj_type, start, limit, is_all=False, cache=True, json=None, csv=None):
205 | """
206 | Displays and exports (if relevant) the object list
207 |
208 | :param obj_type: the type of the object
209 | :param start: the offset of the first object
210 | :param limit: the maximum number of objects to list
211 | :param is_all: are all object types requested?
212 | :param cache: whether to use the cache or not
213 | :param json: json export filename
214 | :param csv: csv export filename
215 | """
216 | prop = self.get_fetch_or_list_type(obj_type, plural=True)
217 | print(prop["obj_name"] + " details")
218 | try:
219 | kwargs = {}
220 | if obj_type == WPApi.POST:
221 | kwargs = {"comments": False}
222 | obj_list = self.scanner.get_obj_list(obj_type, start, limit, cache, kwargs=kwargs)
223 | prop["display_func"](obj_list)
224 | InteractiveShell.export_decorator(prop["export_func"], is_all, prop["obj_name"].lower(), json, csv, obj_list)
225 | except WordPressApiNotV2:
226 | Console.log_error("The API does not support WP V2")
227 | except IOError as e:
228 | Console.log_error("Could not open %s for writing" % e.filename)
229 | print()
230 |
231 | def do_exit(self, arg):
232 | 'Exit wp-json-scraper'
233 | return True
234 |
235 | def do_show(self, arg):
236 | 'Shows information about parameters in memory'
237 | parser = ArgumentParser(prog='show', description='show information about global parameters')
238 | parser.add_argument("what", nargs='?', choices=['all', 'target', 'proxy', 'cookies', 'credentials', 'version'],
239 | help='choose the information to be displayed', default='all')
240 | args = parser.custom_parse_args(arg)
241 | if args is None:
242 | return
243 | if args.what == 'all' or args.what == 'target':
244 | print("Target: %s" % self.target)
245 | if args.what == 'all' or args.what == 'proxy':
246 | proxies = self.session.get_proxies()
247 | if proxies is not None and len(proxies) > 0:
248 | print ("Proxies:")
249 | for key, value in proxies.items():
250 | print("\t%s: %s" % (key, value))
251 | else:
252 | print ("Proxy: none")
253 | if args.what == 'all' or args.what == 'cookies':
254 | cookies = self.session.get_cookies()
255 | if len(cookies) > 0:
256 | print("Cookies:")
257 | for key, value in cookies.items():
258 | print("\t%s: %s" % (key, value))
259 | else:
260 | print("Cookies: none")
261 | if args.what == 'all' or args.what == 'credentials':
262 | credentials = self.session.get_creds()
263 | if credentials is not None:
264 | creds_str = "Credentials: "
265 | for el in credentials:
266 | creds_str += el + ":"
267 | print(creds_str[:-1])
268 | else:
269 | print("Credentials: none")
270 | if args.what == 'all' or args.what == 'version':
271 | print("WPJsonScraper version: %s" % self.version)
272 | print()
273 |
274 | def do_set(self, arg):
275 | 'Sets a global parameter of WPJsonScraper'
276 | parser = ArgumentParser(prog='set', description='sets global parameters for WPJsonScraper')
277 | parser.add_argument("what", choices=['target', 'proxy', 'cookies', 'credentials'],
278 | help='the parameter to set')
279 | parser.add_argument("value", type=str, help='the new value of the parameter (for cookies, set as cookie string: "n1=v1; n2=v2")')
280 | args = parser.custom_parse_args(arg)
281 | if args is None:
282 | return
283 | if args.what == 'target':
284 | self.target = args.value
285 | if re.match(r'^https?://.*$', self.target) is None:
286 | self.target = "http://" + self.target
287 | if re.match(r'^.+/$', self.target) is None:
288 | self.target += "/"
289 | InteractiveShell.prompt = Console.red + self.target + Console.normal + " > "
290 | print("target = %s" % self.target)
291 | self.scanner = WPApi(self.target, session=self.session)
292 | Console.log_info("Cache is erased but session stays the same (with cookies and authorization)")
293 | elif args.what == 'proxy':
294 | self.session.set_proxy(args.value)
295 | print("proxy = %s" % args.value)
296 | elif args.what == 'cookies':
297 | self.session.set_cookies(args.value)
298 | print("Cookies set!")
299 | elif args.what == "credentials":
300 | authorization_list = args.value.split(':')
301 | if len(authorization_list) == 1:
302 | authorization = (authorization_list[0], '')
303 | elif len(authorization_list) >= 2:
304 | authorization = (authorization_list[0],
305 | ':'.join(authorization_list[1:]))
306 | self.session.set_creds(authorization)
307 | print("Credentials set!")
308 | print()
309 |
310 | def do_list(self, arg):
311 | 'Gets the list of something from the server'
312 | parser = ArgumentParser(prog='list', description='gets a list of something from the server')
313 | parser.add_argument("what", choices=[
314 | 'posts',
315 | #'post-revisions',
316 | #'wp-blocks',
317 | 'categories',
318 | 'tags',
319 | 'pages',
320 | 'comments',
321 | 'media',
322 | 'users',
323 | #'themes',
324 | #'search-results',
325 | 'namespaces',
326 | 'all',
327 | ],
328 | help='what to list')
329 | parser.add_argument("--json", "-j", help="list and store as json to the specified file")
330 | parser.add_argument("--csv", "-c", help="list and store as csv to the specified file")
331 | parser.add_argument("--limit", "-l", type=int, help="limit the number of results")
332 | parser.add_argument("--start", "-s", type=int, help="start at the given index")
333 | parser.add_argument("--no-cache", dest="cache", action="store_false", help="don't lookup in cache and ask the server")
334 | args = parser.custom_parse_args(arg)
335 | if args is None:
336 | return
337 | # The checks must be ordered by dependencies
338 | kwargs = {
339 | "start": args.start,
340 | "limit": args.limit,
341 | "is_all": args.what == "all",
342 | "cache": args.cache,
343 | "json": args.json,
344 | "csv": args.csv
345 | }
346 | if args.what == "all" or args.what == "users":
347 | self.list_obj(WPApi.USER, **kwargs)
348 | if args.what == "all" or args.what == "tags":
349 | self.list_obj(WPApi.TAG, **kwargs)
350 | if args.what == "all" or args.what == "categories":
351 | self.list_obj(WPApi.CATEGORY, **kwargs)
352 | if args.what == "all" or args.what == "posts":
353 | self.list_obj(WPApi.POST, **kwargs)
354 | if args.what == "all" or args.what == "pages":
355 | self.list_obj(WPApi.PAGE, **kwargs)
356 | if args.what == "all" or args.what == "comments":
357 | self.list_obj(WPApi.COMMENT, **kwargs)
358 | if args.what == "all" or args.what == "media":
359 | self.list_obj(WPApi.MEDIA, **kwargs)
360 | if args.what == "all" or args.what == "namespaces":
361 | self.list_obj(WPApi.NAMESPACE, **kwargs)
362 |
363 | def do_fetch(self, arg):
364 | 'Fetches a specific content specified by ID'
365 | parser = ArgumentParser(prog='fetch', description='fetches something from the server or the cache by ID')
366 | parser.add_argument("what", choices=[
367 | 'post',
368 | #'post-revision',
369 | #'wp-block',
370 | 'category',
371 | 'tag',
372 | 'page',
373 | 'comment',
374 | 'media',
375 | 'user',
376 | #'theme',
377 | #'search-result',
378 | ],
379 | help='what to fetch')
380 | parser.add_argument("id", type=int, help='the ID of the content to fetch')
381 | parser.add_argument("--json", "-j", help="fetch and store as json to the specified file")
382 | parser.add_argument("--csv", "-c", help="fetch and store as csv to the specified file")
383 | parser.add_argument("--no-cache", dest="cache", action="store_false", help="don't lookup in cache and ask the server")
384 | args = parser.custom_parse_args(arg)
385 | what_type = None
386 | if args is None:
387 | return
388 | what_type = WPApi.str_type_to_native(args.what)
389 |
390 | if what_type is not None:
391 | self.fetch_obj(what_type, args.id, cache=args.cache, json=args.json, csv=args.csv)
392 | else:
393 | print("Not implemented")
394 | print()
395 |
396 | def do_search(self, arg):
397 | 'Looks for specific keywords in the WordPress API'
398 | parser = ArgumentParser(prog='search', description='searches something from the server')
399 | parser.add_argument("--type", "-t", action="append", choices=[
400 | 'all',
401 | 'post',
402 | #'post-revision',
403 | #'wp-block',
404 | 'category',
405 | 'tag',
406 | 'page',
407 | 'comment',
408 | 'media',
409 | 'user',
410 | #'theme',
411 | #'search-result',
412 | ],
413 | help='the types to look for (default all)',
414 | dest='what'
415 | )
416 | parser.add_argument("keywords", help='the keywords to look for')
417 | parser.add_argument("--json", "-j", help="list and store as json to the specified file(s)")
418 | parser.add_argument("--csv", "-c", help="list and store as csv to the specified file(s)")
419 | parser.add_argument("--limit", "-l", type=int, help="limit the number of results")
420 | parser.add_argument("--start", "-s", type=int, help="start at the given index")
421 | args = parser.custom_parse_args(arg)
422 | if args is None:
423 | return
424 | what_types = WPApi.convert_obj_types_to_list(args.what)
425 | results = self.scanner.search(what_types, args.keywords, args.start, args.limit)
426 | print()
427 | for k, v in results.items():
428 | prop = self.get_fetch_or_list_type(k, plural=True)
429 | print(prop["obj_name"] + " details")
430 | if len(v) == 0:
431 | Console.log_info("No result")
432 | else:
433 | try:
434 | prop["display_func"](v)
435 | InteractiveShell.export_decorator(
436 | prop["export_func"],
437 | len(what_types) > 1 or WPApi.ALL_TYPES in what_types,
438 | prop["obj_name"].lower(),
439 | args.json,
440 | args.csv,
441 | v
442 | )
443 | except WordPressApiNotV2:
444 | Console.log_error("The API does not support WP V2")
445 | except IOError as e:
446 | Console.log_error("Could not open %s for writing" % e.filename)
447 | print()
448 |
449 | def do_dl(self, arg):
450 | 'Downloads a media file (e.g. from /wp-content/uploads/) based on its ID'
451 |
452 | parser = ArgumentParser(prog='dl', description='downloads media files from the server')
453 | parser.add_argument("ids", help='ids to look for (comma separated), "all" or "cache"')
454 | parser.add_argument("dest", help='destination folder')
455 | parser.add_argument("--no-cache", dest="cache", action="store_false", help="don't lookup in cache and ask the server")
456 | parser.add_argument("--use-slug", dest="slug", action="store_true", help="use the slug as filename and not the source URL name")
457 | args = parser.custom_parse_args(arg)
458 | if args is None:
459 | return
460 |
461 | if not os.path.isdir(args.dest):
462 | Console.log_error("The destination is not a folder or does not exist")
463 | return
464 |
465 | print("Pulling the media URLs")
466 | media, slugs = self.scanner.get_media_urls(args.ids, args.cache)
467 | if len(media) == 0:
468 | Console.log_error("No media found corresponding to the criteria")
469 | return
470 | print("%d media URLs found" % len(media))
471 | answer = input("Do you wish to proceed to download? (y/N) ")
472 | if answer.lower() != "y":
473 | return
474 | print("Note: Only files over 10MB are logged here")
475 |
476 | number_downloaded = 0
477 | if args.slug:
478 | number_downloaded = Exporter.download_media(media, args.dest, slugs)
479 | else:
480 | number_downloaded = Exporter.download_media(media, args.dest)
481 | print('Downloaded %d media to %s' % (number_downloaded, args.dest))
482 |
483 | def start_interactive(target, session, version):
484 | """
485 | Starts a new interactive session
486 | """
487 | InteractiveShell(target, session, version).cmdloop()
--------------------------------------------------------------------------------
/lib/plugins/plugin_list.csv:
--------------------------------------------------------------------------------
1 | oembed/1.0,Allows embedded representation of a URL,
2 | contact-form-7/v1,Manages multiple contact forms,https://wordpress.org/plugins/contact-form-7/
3 | wc/v1,WooCommerce is a free eCommerce plugin that lets you sell anything,https://wordpress.org/plugins/woocommerce/
4 | wc/v2,WooCommerce is a free eCommerce plugin that lets you sell anything,https://wordpress.org/plugins/woocommerce/
5 | facebook/v1,,
6 | regenerate-thumbnails/v1,Regenerate Thumbnails regenerates all thumbnail sizes for one or more images,https://wordpress.org/plugins/regenerate-thumbnails/
7 | wp/v2,The default API integrated since WordPress 4.7,https://developer.wordpress.org/rest-api/
8 | akismet/v1,Akismet checks comments and contact form submissions against a global database of spam,https://wordpress.org/plugins/akismet/
9 | yoast/v1,Yoast SEO is a WordPress SEO plugin,https://wordpress.org/plugins/wordpress-seo/
10 | wp-super-cache/v1,This plugin generates static html files from your dynamic WordPress blog,https://wordpress.org/plugins/wp-super-cache/
11 | script-manager/v1,,
12 | jetpack/v4,Hassle-free design and marketing,https://wordpress.org/plugins/jetpack/
13 | redirection/v1,Redirection is the most popular redirect manager for WordPress,https://wordpress.org/plugins/redirection/
14 | tribe/events/v1,Create and manage an events calendar,https://wordpress.org/plugins/the-events-calendar/
15 | 2fa/v1,,
16 | wpsc/v1,,
17 | v1/products/,,
18 | v1/cart/,,
19 | v1/,,
20 | post-views-counter,Counts views of posts of the website,https://wordpress.org/plugins/post-views-counter/
21 | frm-admin/v1,,
22 | listo/v1,Listo is a simple plugin that supplies other plugins and themes with commonly used lists,https://wordpress.org/plugins/listo/
23 | themeisle-sdk/v1,,
24 | bogo/v1,Bogo is a straight-forward multilingual plugin for WordPress,https://wordpress.org/plugins/bogo/
25 | envira/v1,Responsive Image Gallery for WordPress,https://wordpress.org/plugins/envira-gallery-lite/
26 | disqus/v1,Disqus is the web’s most popular commenting system,https://wordpress.org/plugins/disqus-comment-system/
27 | invitations-for-slack/v1,Invitations for Slack lets you show “Join us on Slack.” buttons,https://wordpress.org/plugins/invitations-for-slack/
28 | rop/v1,Revive Old Posts helps to keep the old posts alive by automatically sharing them on Social Networks,https://wordpress.org/plugins/tweet-old-post/
29 | cf-api/v2,,
30 | thrive,,
31 | om-cc,,
32 | om/fiw,,
33 | tatsu/v1,,
34 | semplice/v1/editor,,
35 | semplice/v1/admin,,
36 | semplice/v1/frontend,,
37 | jwt-auth/v1,,
38 | pum/v1,,
39 | deliciousbrains/v1,,
40 | sportspress/v2,Creates a professional sports website,https://wordpress.org/plugins/sportspress/
41 | content-forms/v1,,
42 | wp_live_chat_support/v1,Fully functional Live Chat plugin,https://wordpress.org/plugins/wp-live-chat-support/
43 | if-menu/v1,Control what menu items visitors see based on visibility rules,https://wordpress.org/plugins/if-menu/
44 | iowd/v1,,
45 | save,,
46 | facetwp/v1/,,
47 | slimstat/v1,A web analytics plugin for WordPress,https://wordpress.org/plugins/wp-slimstat/
48 | social-share/v1,,
49 | social-counts/v1,,
50 | swp_api,,
51 | app/v2,,
52 | alids/v1/,,
53 | template-directory,,
54 | customify/v1,With Customify developers can easily create advanced theme-specific options inside the WordPress Customizer,https://wordpress.org/plugins/customify/
55 | pixcare/v1,,
56 | codepinch/v1,A website error correcter?,https://wordpress.org/plugins/wp-error-fix/
57 | blc/v1,Broken Link Checker?,https://wordpress.org/plugins/broken-link-checker/
58 | visualizer/v1,,
59 | td-composer,,
60 | tdw,,
61 | mpp/v1,,
62 | wooketing/v1,,
63 | gf/v2,,
64 | wpcsp/v1,Set the CSP settings and will add them to the page the visitor requested,https://wordpress.org/plugins/wp-content-security-policy/
65 | instant-images,One click uploads of Unsplash photos,https://wordpress.org/plugins/instant-images/
66 | api,,
67 | templates-directory,,
68 | rollbar/v1,Rollbar collects errors and allows to analyze them,https://wordpress.org/plugins/rollbar/
69 | liveblog/v1,Quick and simple blogging for following fast-paced events,https://wordpress.org/plugins/liveblog/
70 | integrity-checker/v1,Verifies that all installed code is identical to its original version and more,https://wordpress.org/plugins/integrity-checker/
71 | pll/v1,,
72 | wp-post-modal/v1,,
73 | quiz-survey-master/v1,Creates surveys for the users,https://wordpress.org/plugins/quiz-master-next/
74 | rp-wapi/v1,,
75 | wc-product-add-ons/v1,WooCommerce PPOM (Personalized Product Option Manager) Plugin adds input fields on product page,https://wordpress.org/plugins/search/wc+products/
76 | wpglib/v1,,
77 | tcm/v1,,
78 | affwp/v1,,
79 | custom-api/v1,,
80 | wplr/v1,"Synchronizes photos, collections, keywords and metadata between Lightroom and WordPress",https://wordpress.org/plugins/search/wplr/
81 | acf/v3,Exposes Advanced Custom Fields Endpoints in the WordPress REST API,https://wordpress.org/plugins/acf-to-rest-api/
82 | pp/v1,,
83 | dooplay,,
84 | dbmovies,,
85 | pciextranet/v2,,
86 | cloozi/rest,,
87 | store-locator-plus/v1,Maps locations on Google Maps,https://wordpress.org/plugins/store-locator-le/
88 | store-locator-plus/v2,Maps locations on Google Maps,https://wordpress.org/plugins/store-locator-le/
89 | joinzee-wp/v1,,
90 | ccf/v1,Custom Contact Forms?,https://wordpress.org/plugins/custom-contact-forms/
91 | keremiya,,
92 | pageviews/1.0,A simple and lightweight pageviews counter,https://wordpress.org/plugins/pageviews/
93 | watchful/v1,,
94 | shortcode-change,,
95 | shortcode-insert,,
96 | upload/,,
97 | sync/,,
98 | download/,,
99 | agroopwoo,,
100 | rest-routes/v2,Building custom endpoints for WP REST API made easy,https://wordpress.org/plugins/rest-routes/
101 | pvc/v1,,
102 | ee/v4.8.29,,
103 | ee/v4.8.33,,
104 | ee/v4.8.34,,
105 | ee/v4.8.36,,
106 | vegashero/v1,,
107 | ml-api/v2,,
108 | mwl/v1,,
109 | envira-background/v1,,
110 | api/v1,,
111 | rta,,
112 | stec/v2,,
113 | erp/v1,,
114 | autofill/v1,,
115 | /autofill/v1,,
116 | rest/v1,,
117 | wp/v2/acf,,
118 | ms/api,,
119 | siso/v1,,
120 | dp/v1,,
121 | indieauth/1.0,IndieAuth is a way for doing Web sign-in,https://wordpress.org/plugins/indieauth/
122 | sloc_geo/1.0,,
123 | link-preview/1.0,Display a preview for a URL similar to sharing a link on Facebook,https://wordpress.org/plugins/wp-link-preview/
124 | webmention/1.0,Enable conversation across the web,https://wordpress.org/plugins/webmention/
125 | bballs,,
126 | logbook/v1,This plugin is for logging users' activities,https://wordpress.org/plugins/search/logbook/
127 | child-themify/v1,Create child themes with the click of a button,https://wordpress.org/plugins/child-themify/
128 | versionpress,,
129 | keliron/api/v3,,
130 | bablic,Translate WP with this multilingual plugin,https://wordpress.org/plugins/bablic/
131 | eum/v1,,
132 | tvo/v1,,
133 | frm/v2,,
134 | app-mobile,,
135 | ap3/v1,,
136 | diets/v1,,
137 | manage-customers/v1,,
138 | leads,WordPress Leads?,https://wordpress.org/plugins/leads/
139 | commentcava/v1.0,CommentCaVa disables the comment field for a certain amount of time,https://wordpress.org/plugins/commentcava/
140 | lscf_rest,Advanced WordPress Filter Plugin,https://wordpress.org/plugins/live-search-custom-fields-lite/
141 | wpv/v1,,
142 | tho/v1,,
143 | aghigh/v1,,
144 | spnl/v1,A Newsletter Plugin for WordPress,https://wordpress.org/plugins/search/spnl/
145 | task_manager/v1,Task manager,https://wordpress.org/plugins/task-manager/
146 | customfiy/v1,Theme Customizer Booster,https://wordpress.org/plugins/customify/
147 | CHifcoRegCardPluginV2/v1,,
148 | CHifcoFireBasePlugin/v1,,
149 | CHifcoFireBaseVII/v2,,
150 | wp-api-menus/v2,,
151 | envira-lightroom/v3,Envira Gallery allows you to create photo galleries and video galleries,https://wordpress.org/plugins/envira-gallery-lite/
152 | comments/v1,,
153 | addcomment/v1,,
154 | pf/v1,,
155 | postmatic/v1,,
156 | ivole/v1,Customer Reviews for WooCommerce?,https://wordpress.org/plugins/customer-reviews-woocommerce/
157 | shwcp/v1,,
158 | wp-rest-api-log,WordPress plugin to log REST API requests and responses,https://wordpress.org/plugins/wp-rest-api-log/
159 | wk/v1,,
160 | sfp-live-search/v1,,
161 | csco/v1,,
162 | caos/v1,A plugin that inserts the Analytics tracking code,https://wordpress.org/plugins/host-analyticsjs-local/
163 | rest/events,,
164 | obfx-google-analytics,,
165 | shariff/v1,Shariff provides share buttons that respect the privacy of visitors,https://wordpress.org/plugins/shariff/
166 | wp-discourse/v1,This plugin allows to use Discourse as a community engine,https://wordpress.org/plugins/wp-discourse/
167 | dbmvs,,
168 | wp-crm/v1/form,This plugin is intended to significantly improve user management,https://wordpress.org/plugins/wp-crm/
169 | gutenberg/v1,A new editing experience for WordPress,https://wordpress.org/plugins/gutenberg/
170 | tribe_events/v2,,
171 | rnet/v1,,
172 | eklo/v2,,
173 | menus/v1,,
174 | sow/v1,,
175 | wpbooklist/v1,"Used to sell books, record and catalog a library",https://wordpress.org/plugins/wpbooklist/
176 | tabulate,This plugin provides a simple user-friendly interface to tables in the database,https://wordpress.org/plugins/tabulate/
177 | geoblog/v1,,
178 | acf/v2,,
179 | mobilegate/v2,,
180 | jamtrap/v1,,
181 | paf,,
182 | in-cron/v1,,
183 | awb/v1,"AWB allows to use parallax backgrounds with images, videos, youtube and vimeo",https://wordpress.org/plugins/advanced-backgrounds/
184 | wctofb/v1,WooCommerce to facebook shop,https://wordpress.org/plugins/woo-to-facebook-shop/
185 | weekly-class/v1,Generate a weekly schedule of classes,https://wordpress.org/plugins/weekly-class-schedule/
186 | be-to-tatsu/v1,,
187 | braintree-gateway/v1/,A payment gateway,
188 | bfwc/settings/kount/,,
189 | gembloong/,,
190 |
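Each row of the list above has three comma-separated fields: the REST namespace, a description and a URL. A description that itself contains commas has to be quoted to stay in one field. The sketch below shows one way such rows could be read with the standard `csv` module; `load_plugin_list` and `SAMPLE` are hypothetical names used for illustration and are not taken from the project.

```python
import csv
import io

# Three sample rows copied from the list above. The quoted description
# contains commas and must still be parsed as a single field.
SAMPLE = (
    'wp/v2,The default API integrated since WordPress 4.7,'
    'https://developer.wordpress.org/rest-api/\n'
    'wplr/v1,"Synchronizes photos, collections, keywords and metadata '
    'between Lightroom and WordPress",'
    'https://wordpress.org/plugins/search/wplr/\n'
    'script-manager/v1,,\n'
)


def load_plugin_list(text):
    """Hypothetical helper: map each namespace to (description, url)."""
    plugins = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        namespace, description, url = row
        plugins[namespace] = (description, url)
    return plugins


plugins = load_plugin_list(SAMPLE)
print(plugins["wplr/v1"][0])
```

Rows with an unknown plugin keep empty strings for the description and the URL, so a caller can still match the namespace.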
--------------------------------------------------------------------------------
/lib/requestsession.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 | from http.cookies import SimpleCookie
24 | import requests
25 |
26 | from lib.console import Console
27 |
28 | class ConnectionCouldNotResolve(Exception):
29 |     pass
30 |
31 | class ConnectionReset(Exception):
32 |     pass
33 |
34 | class ConnectionRefused(Exception):
35 |     pass
36 |
37 | class ConnectionTimeout(Exception):
38 |     pass
39 |
40 | class HTTPError400(Exception):
41 |     pass
42 |
43 | class HTTPError401(Exception):
44 |     pass
45 |
46 | class HTTPError403(Exception):
47 |     pass
48 |
49 | class HTTPError404(Exception):
50 |     pass
51 |
52 | class HTTPError500(Exception):
53 |     pass
54 |
55 | class HTTPError502(Exception):
56 |     pass
57 |
58 | class HTTPError(Exception):
59 |     pass
60 |
61 | class RequestSession:
62 |     """
63 |     Wrapper to handle the requests library with session support
64 |     """
65 |
66 |     def __init__(self, proxy=None, cookies=None, authorization=None):
67 |         """
68 |         Creates a new RequestSession instance
69 |         param proxy: a proxy server URL string; its scheme (http or https)
70 |         selects which connections are proxied
71 |         param cookies: a string in the format of the Cookie header
72 |         param authorization: a tuple containing login and password or
73 |         requests.auth.HTTPBasicAuth for basic authentication or
74 |         requests.auth.HTTPDigestAuth for digest authentication
75 |         """
76 |         self.s = requests.Session()
77 |         if proxy is not None:
78 |             self.set_proxy(proxy)
79 |         if cookies is not None:
80 |             self.set_cookies(cookies)
81 |         if authorization is not None and (
82 |             type(authorization) is tuple and len(authorization) == 2 or
83 |             type(authorization) is requests.auth.HTTPBasicAuth or
84 |             type(authorization) is requests.auth.HTTPDigestAuth):
85 |             self.s.auth = authorization
86 |
87 |     def get(self, url):
88 |         """
89 |         Calls the get function from requests but handles errors to raise proper
90 |         exception following the context
91 |         """
92 |         return self.do_request("get", url)
93 |
94 |
95 |     def post(self, url, data=None):
96 |         """
97 |         Calls the post function from requests but handles errors to raise proper
98 |         exception following the context
99 |         """
100 |         return self.do_request("post", url, data)
101 |
102 |     def do_request(self, method, url, data=None):
103 |         """
104 |         Helper method that groups requests and handles exceptions in a
105 |         single location
106 |         """
107 |         response = None
108 |         try:
109 |             if method == "post":
110 |                 response = self.s.post(url, data)
111 |             else:
112 |                 response = self.s.get(url)
113 |         except requests.ConnectionError as e:
114 |             if "Errno -5" in str(e) or "Errno -2" in str(e)\
115 |                 or "Errno -3" in str(e):
116 |                 Console.log_error("Could not resolve host %s" % url)
117 |                 raise ConnectionCouldNotResolve
118 |             elif "Errno 111" in str(e):
119 |                 Console.log_error("Connection refused by %s" % url)
120 |                 raise ConnectionRefused
121 |             elif "RemoteDisconnected" in str(e):
122 |                 Console.log_error("Connection reset by %s" % url)
123 |                 raise ConnectionReset
124 |             else:
125 |                 print(e)
126 |                 raise e
127 |         except Exception as e:
128 |             raise e
129 |
130 |         if response.status_code == 400:
131 |             raise HTTPError400
132 |         elif response.status_code == 401:
133 |             Console.log_error("Error 401 (Unauthorized) while trying to fetch"
134 |                 " the API")
135 |             raise HTTPError401
136 |         elif response.status_code == 403:
137 |             Console.log_error("Error 403 (Forbidden) while trying"
138 |                 " to fetch the API")
139 |             raise HTTPError403
140 |         elif response.status_code == 404:
141 |             raise HTTPError404
142 |         elif response.status_code == 500:
143 |             Console.log_error("Error 500 (Internal Server Error) while trying"
144 |                 " to fetch the API")
145 |             raise HTTPError500
146 |         elif response.status_code == 502:
147 |             Console.log_error("Error 502 (Bad Gateway) while trying"
148 |                 " to fetch the API")
149 |             raise HTTPError502
150 |         elif response.status_code > 400:
151 |             Console.log_error("Error %d while trying to fetch the API" %
152 |                 response.status_code)
153 |             raise HTTPError
154 |
155 |         return response
156 |
157 |     def set_cookies(self, cookies):
158 |         """
159 |         Sets new cookies from a string
160 |         """
161 |         c = SimpleCookie()
162 |         c.load(cookies)
163 |         for key, m in c.items():
164 |             self.s.cookies.set(key, m.value)
165 |
166 |     def get_cookies(self):
167 |         return self.s.cookies.get_dict()
168 |
169 |     def set_proxy(self, proxy):
170 |         prot = 'http'
171 |         if proxy[:5].lower() == 'https':
172 |             prot = 'https'
173 |         self.s.proxies = {prot: proxy}
174 |
175 |     def get_proxies(self):
176 |         return self.s.proxies
177 |
178 |     def set_creds(self, credentials):
179 |         self.s.auth = credentials
180 |
181 |     def get_creds(self):
182 |         return self.s.auth
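`set_cookies` and `set_proxy` above both take plain strings. The following standalone sketch mirrors their parsing logic with the standard library only, so it runs without `requests`. The helper names `parse_cookie_header` and `proxy_mapping` and the sample values are assumptions made for illustration, not part of the module.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(header):
    """Split a Cookie-header style string into a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {key: morsel.value for key, morsel in cookie.items()}


def proxy_mapping(proxy):
    """Build the single-entry proxies dict the same way set_proxy does."""
    scheme = 'https' if proxy[:5].lower() == 'https' else 'http'
    return {scheme: proxy}


cookies = parse_cookie_header("session=abc123; wp-settings-1=foo")
https_proxy = proxy_mapping("https://127.0.0.1:8080")
other_proxy = proxy_mapping("socks5://127.0.0.1:9050")
print(cookies, https_proxy, other_proxy)
```

Note that any proxy string that does not start with `https` is stored under the `http` key, so with this logic a SOCKS proxy URL would only be registered for plain HTTP targets.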
--------------------------------------------------------------------------------
/lib/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 | import json
24 |
25 | from urllib.parse import urlsplit, urlunsplit
26 |
27 | def get_by_id(value, id):
28 |     """
29 |     Utility function to retrieve a value by its ID in a list of dicts, returns
30 |     None if no match is found
31 |     param value: the list of dicts to process
32 |     param id: the id to get
33 |     """
34 |     if value is None:
35 |         return None
36 |     for val in value:
37 |         if 'id' in val.keys() and val['id'] == id:
38 |             return val
39 |     return None
40 |
41 | # Neat code part from https://codereview.stackexchange.com/questions/13027/joini
42 | # ng-url-path-components-intelligently
43 | def url_path_join(*parts):
44 |     """Normalize url parts and join them with a slash."""
45 |     schemes, netlocs, paths, queries, fragments = \
46 |         zip(*(urlsplit(part) for part in parts))
47 |     scheme = first(schemes)
48 |     netloc = first(netlocs)
49 |     path = '/'.join(x.strip('/') for x in paths if x)
50 |     query = first(queries)
51 |     fragment = first(fragments)
52 |     return urlunsplit((scheme, netloc, path, query, fragment))
53 |
54 | def first(sequence, default=''):
55 |     return next((x for x in sequence if x), default)
56 |
57 | # Code from https://stackoverflow.com/questions/3173320/text-progress-bar-in-th
58 | # e-console
59 |
60 | def print_progress_bar (iteration, total, prefix = '', suffix = '', decimals = 1,\
61 |     length = 100, fill = '█'):
62 |     """
63 |     Call in a loop to create terminal progress bar
64 |     @params:
65 |         iteration - Required : current iteration (Int)
66 |         total - Required : total iterations (Int)
67 |         prefix - Optional : prefix string (Str)
68 |         suffix - Optional : suffix string (Str)
69 |         decimals - Optional : positive number of decimals in percent \
70 |             complete (Int)
71 |         length - Optional : character length of bar (Int)
72 |         fill - Optional : bar fill character (Str)
73 |     """
74 |     try:
75 |         percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / \
76 |             float(total)))
77 |         filledLength = int(length * iteration // total)
78 |     except:
79 |         percent = 0
80 |         filledLength = 0
81 |
82 |     bar = fill * filledLength + '-' * (length - filledLength)
83 |     print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = '\r')
84 |     # Print New Line on Complete
85 |     if iteration == total:
86 |         print()
87 |
88 | def get_content_as_json (response_obj):
89 |     """
90 |     When a BOM is present (see issue #2), UTF-8 is not properly decoded by
91 |     Response.json() method. This is a helper function that returns a json value
92 |     even if a BOM is present in UTF-8 text
93 |     @params:
94 |         response_obj: a requests Response instance
95 |     @returns: a decoded json object (list or dict)
96 |     """
97 |     if response_obj.content[:3] == b'\xef\xbb\xbf': # UTF-8 BOM
98 |         content = response_obj.content.decode("utf-8-sig")
99 |         return json.loads(content)
100 |     else:
101 |         try:
102 |             return response_obj.json()
103 |         except:
104 |             return {}
105 |
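The two helpers that the API client relies on most, `url_path_join` and the BOM handling in `get_content_as_json`, can be tried in isolation. The sketch below restates `url_path_join` and `first` from this file and adds `decode_json_bytes`, a hypothetical stand-in that works on raw bytes instead of a `requests` response; the example URL is an assumption.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def first(sequence, default=''):
    """Return the first truthy item of the sequence, or the default."""
    return next((x for x in sequence if x), default)


def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = \
        zip(*(urlsplit(part) for part in parts))
    path = '/'.join(x.strip('/') for x in paths if x)
    return urlunsplit((first(schemes), first(netlocs), path,
                       first(queries), first(fragments)))


def decode_json_bytes(raw):
    """Hypothetical helper: the utf-8-sig codec drops a leading BOM if any."""
    return json.loads(raw.decode("utf-8-sig"))


joined = url_path_join("http://example.com/", "wp-json/", "wp/v2/posts?page=1")
decoded = decode_json_bytes(b'\xef\xbb\xbf{"name": "demo"}')
print(joined)
print(decoded)
```

Only the first non-empty scheme, host, query string and fragment are kept, which is why the query can be attached to the last part. Decoding the same BOM-prefixed bytes as plain `utf-8` would leave the BOM in the text and make `json.loads` raise a `JSONDecodeError`.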
--------------------------------------------------------------------------------
/lib/wpapi.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) 2018-2020 Mickaël "Kilawyn" Walter
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this software and associated documentation files (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 |
23 | import math
24 | import copy
25 |
26 | import requests
27 | from urllib.parse import urlencode
28 |
29 | from json.decoder import JSONDecodeError
30 |
31 | from lib.exceptions import NoWordpressApi, WordPressApiNotV2, \
32 | NSNotFoundException
33 | from lib.requestsession import RequestSession, HTTPError400, HTTPError404
34 | from lib.utils import url_path_join, print_progress_bar, get_content_as_json, get_by_id
35 |
36 | class WPApi:
37 |     """
38 |     Queries the WordPress API to retrieve information
39 |     """
40 |
41 |     # Object types
42 |     POST = 0
43 |     """
44 |     The post type
45 |     """
46 |     POST_REVISION = 1
47 |     """
48 |     The post revision type
49 |     """
50 |     WP_BLOCK = 2
51 |     """
52 |     The Gutenberg block type
53 |     """
54 |     CATEGORY = 3
55 |     """
56 |     The category type
57 |     """
58 |     TAG = 4
59 |     """
60 |     The tag type
61 |     """
62 |     PAGE = 5
63 |     """
64 |     The page type
65 |     """
66 |     COMMENT = 6
67 |     """
68 |     The comment type
69 |     """
70 |     MEDIA = 7
71 |     """
72 |     The media type
73 |     """
74 |     USER = 8
75 |     """
76 |     The user type
77 |     """
78 |     THEME = 9
79 |     """
80 |     The theme type
81 |     """
82 |     NAMESPACE = 10
83 |     """
84 |     The namespace type
85 |     """
86 |     #SEARCH_RESULT = 10
87 |     ALL_TYPES = 20
88 |     """
89 |     Constant representing all types
90 |     """
91 |
92 |     def __init__(self, target, api_path="wp-json/", session=None,
93 |                  search_terms=None):
94 |         """
95 |         Creates a new instance of WPApi
96 |         param target: the target of the scan
97 |         param api_path: the api path, if non-default
98 |         param session: the requests session object to use for HTTP requests
99 |         param search_terms : the terms of the keyword search, if any
100 |         """
101 |         self.api_path = api_path
102 |         self.search_terms = search_terms
103 |         self.has_v2 = None
104 |         self.name = None
105 |         self.description = None
106 |         self.url = target
107 |         self.basic_info = None
108 |         self.posts = None
109 |         self.tags = None
110 |         self.categories = None
111 |         self.users = None
112 |         self.media = None
113 |         self.pages = None
114 |         self.s = None
115 |         self.comments_loaded = False
116 |         self.orphan_comments = []
117 |         self.comments = None
118 |
119 |         if session is not None:
120 |             self.s = session
121 |         else:
122 |             self.s = RequestSession()
123 |
124 |     @staticmethod
125 |     def str_type_to_native(str_type):
126 |         """
127 |         Converts a single object type as str to its corresponding native type.
128 |         If the object type is unknown, this returns None as a fallback.
129 |         This may have to be modified in cases of bugs.
130 |
131 |         :param str_type: the object type as string
132 |         :return: the object type as native constant
133 |
134 |         ```
135 |         str_type_to_native("post") # returns WPApi.POST
136 |         ```
137 |         """
138 |         if str_type == "user":
139 |             return WPApi.USER
140 |         elif str_type == "tag":
141 |             return WPApi.TAG
142 |         elif str_type == "category":
143 |             return WPApi.CATEGORY
144 |         elif str_type == "post":
145 |             return WPApi.POST
146 |         elif str_type == "page":
147 |             return WPApi.PAGE
148 |         elif str_type == "comment":
149 |             return WPApi.COMMENT
150 |         elif str_type == "media":
151 |             return WPApi.MEDIA
152 |         elif str_type == "post_revision":
153 |             return WPApi.POST_REVISION
154 |         elif str_type == "block":
155 |             return WPApi.WP_BLOCK
156 |         elif str_type == "theme":
157 |             return WPApi.THEME
158 |         elif str_type == "namespace":
159 |             return WPApi.NAMESPACE
160 |         return None
161 |
162 |     @staticmethod
163 |     def convert_obj_types_to_list(str_types):
164 |         """
165 |         Converts a list of object types given as strings to a list of native
166 |         constants representing the object types.
167 |         """
168 |         out = []
169 |         if str_types is None or len(str_types) == 0 or 'all' in str_types:
170 |             return [WPApi.ALL_TYPES]
171 |         for el in str_types:
172 |             current = WPApi.str_type_to_native(el)
173 |             if current is not None:
174 |                 out.append(current)
175 |         return out
176 |
177 |     def get_orphans_comments(self):
178 |         """
179 |         Returns the list of comments for which a post hasn't been found
180 |         """
181 |         return self.orphan_comments
182 |
183 |     def get_basic_info(self):
184 |         """
185 |         Collects and stores basic information about the target
186 |         """
187 |         rest_url = url_path_join(self.url, self.api_path)
188 |         if self.basic_info is not None:
189 |             return self.basic_info
190 |
191 |         try:
192 |             req = self.s.get(rest_url)
193 |         except Exception:
194 |             raise NoWordpressApi
195 |         if req.status_code >= 400:
196 |             raise NoWordpressApi
197 |         self.basic_info = get_content_as_json(req)
198 |
199 |         if 'name' in self.basic_info.keys():
200 |             self.name = self.basic_info['name']
201 |
202 |         if 'description' in self.basic_info.keys():
203 |             self.description = self.basic_info['description']
204 |
205 |         if 'namespaces' in self.basic_info.keys() and 'wp/v2' in \
206 |             self.basic_info['namespaces']:
207 |             self.has_v2 = True
208 |
209 |         return self.basic_info
210 |
211 |     def crawl_pages(self, url, start=None, num=None, search_terms=None, display_progress=True):
212 |         """
213 |         Crawls all pages while there is at least one result for the given
214 |         endpoint or tries to get pages from start to end
215 |         """
216 |         if search_terms is None:
217 |             search_terms = self.search_terms
218 |         page = 1
219 |         total_entries = 0
220 |         total_pages = 0
221 |         more_entries = True
222 |         entries = []
223 |         base_url = url
224 |         entries_left = 1
225 |         per_page = 10
226 |         if search_terms is not None:
227 |             if '?' in base_url:
228 |                 base_url += '&' + urlencode({'search': search_terms})
229 |             else:
230 |                 base_url += '?' + urlencode({'search': search_terms})
231 |         if start is not None:
232 |             page = math.floor(start/per_page) + 1
233 |         if num is not None:
234 |             entries_left = num
235 |         while more_entries and entries_left > 0:
236 |             rest_url = url_path_join(self.url, self.api_path, (base_url % page))
237 |             if start is not None:
238 |                 rest_url += "&per_page=%d" % per_page
239 |             try:
240 |                 req = self.s.get(rest_url)
241 |                 if (page == 1 or start is not None and page == math.floor(start/per_page) + 1) and 'X-WP-Total' in req.headers:
242 |                     total_entries = int(req.headers['X-WP-Total'])
243 |                     total_pages = int(req.headers['X-WP-TotalPages'])
244 |                     print("Total number of entries: %d" % total_entries)
245 |                     if start is not None and total_entries < start:
246 |                         start = total_entries - 1
247 |             except HTTPError400:
248 |                 break
249 |             except Exception:
250 |                 raise WordPressApiNotV2
251 |             try:
252 |                 json_content = get_content_as_json(req)
253 |                 if type(json_content) is list and len(json_content) > 0:
254 |                     if (start is None or start is not None and page > math.floor(start/per_page) + 1) and num is None:
255 |                         entries += json_content
256 |                         if start is not None:
257 |                             entries_left -= len(json_content)
258 |                     elif start is not None and page == math.floor(start/per_page) + 1:
259 |                         if num is None or num is not None and len(json_content[start % per_page:]) < num:
260 |                             entries += json_content[start % per_page:]
261 |                             if num is not None:
262 |                                 entries_left -= len(json_content[start % per_page:])
263 |                         else:
264 |                             entries += json_content[start % per_page:(start % per_page) + num]
265 |                             entries_left = 0
266 |                     else:
267 |                         if num is not None and entries_left > len(json_content):
268 |                             entries += json_content
269 |                             entries_left -= len(json_content)
270 |                         else:
271 |                             entries += json_content[:entries_left]
272 |                             entries_left = 0
273 |
274 |                     if display_progress:
275 |                         if num is None and start is None and total_entries >= 0:
276 |                             print_progress_bar(page, total_pages,
277 |                                 length=70)
278 |                         elif num is None and start is not None and total_entries >= 0:
279 |                             print_progress_bar(total_entries-start-entries_left, total_entries-start,
280 |                                 length=70)
281 |                         elif num is not None and total_entries > 0:
282 |                             print_progress_bar(num-entries_left, num,
283 |                                 length=70)
284 |                 else:
285 |                     more_entries = False
286 |             except JSONDecodeError:
287 |                 more_entries = False
288 |
289 |             page += 1
290 |
291 |         return (entries, total_entries)
292 |
293 |     def crawl_single_page(self, url):
294 |         """
295 |         Crawls a single URL
296 |         """
297 |         content = None
298 |         rest_url = url_path_join(self.url, self.api_path, url)
299 |         try:
300 |             req = self.s.get(rest_url)
301 |         except HTTPError400:
302 |             return None
303 |         except HTTPError404:
304 |             return None
305 |         except Exception:
306 |             raise WordPressApiNotV2
307 |         try:
308 |             content = get_content_as_json(req)
309 |         except JSONDecodeError:
310 |             pass
311 |
312 |         return content
313 |
314 |     def get_from_cache(self, cache, start=None, num=None, force=False):
315 |         """
316 |         Tries to fetch data from the given cache, also verifies first if WP-JSON is supported
317 |         """
318 |         if self.has_v2 is None:
319 |             self.get_basic_info()
320 |         if not self.has_v2:
321 |             raise WordPressApiNotV2
322 |         if cache is not None and start is not None and len(cache) <= start:
323 |             start = len(cache) - 1
324 |         if cache is not None and not force:
325 |             if start is not None and num is None and len(cache) > start and None not in cache[start:]:
326 |                 # If start is specified and not num, we want to return the posts in cache only if they were already cached
327 |                 return cache[start:]
328 |             elif start is None and num is not None and len(cache) > num and None not in cache[:num]:
329 |                 # If num is specified and not start, we want to do something similar to the above
330 |                 return cache[:num]
331 |             elif start is not None and num is not None and len(cache) > start + num and None not in cache[start:start+num]:
332 |                 return cache[start:start+num]
333 |             elif (start is None and (num is None or num > len(cache))) and None not in cache:
334 |                 return cache
335 |
336 |         return None
337 |
338 |     def update_cache(self, cache, values, total_entries, start=None, num=None):
339 |         if cache is None:
340 |             cache = values
341 |         elif len(values) > 0:
342 |             s = start
343 |             if start is None:
344 |                 s = 0
345 |             if s >= total_entries:
346 |                 s = total_entries - 1
347 |             n = num
348 |             if n is not None and s + n > total_entries:
349 |                 n = total_entries - s
350 |             if num is None:
351 |                 n = total_entries
352 |             if n > len(cache):
353 |                 cache += [None] * (n - len(cache))
354 |             for el in values:
355 |                 cache[s] = el
356 |                 s += 1
357 |                 if s == n:
358 |                     break
359 |         if len(cache) != total_entries:
360 |             if start is not None and start < total_entries:
361 |                 cache = [None] * start + cache
362 |             if num is not None:
363 |                 cache += [None] * (total_entries - len(cache))
364 |         return cache
365 |
366 |     def get_comments(self, start=None, num=None, force=False):
367 |         """
368 |         Retrieves all comments
369 |         """
370 |         comments = self.get_from_cache(self.comments, start, num, force)
371 |         if comments is not None:
372 |             return comments
373 |
374 |         comments, total_entries = self.crawl_pages('wp/v2/comments?page=%d', start, num)
375 |         self.comments = self.update_cache(self.comments, comments, total_entries, start, num)
376 |         return comments
377 |
378 |     def get_posts(self, comments=False, start=None, num=None, force=False):
379 |         """
380 |         Retrieves all posts or the specified ones
381 |         """
382 |         if self.has_v2 is None:
383 |             self.get_basic_info()
384 |         if not self.has_v2:
385 |             raise WordPressApiNotV2
386 |         if self.posts is not None and start is not None and len(self.posts) < start:
387 |             start = len(self.posts) - 1
388 |         if self.posts is not None and (self.comments_loaded and comments or not comments) and not force:
389 |             posts = self.get_from_cache(self.posts, start, num)
390 |             if posts is not None:
391 |                 return posts
392 |         posts, total_entries = self.crawl_pages('wp/v2/posts?page=%d', start=start, num=num)
393 |
394 |         self.posts = self.update_cache(self.posts, posts, total_entries, start, num)
395 |
396 |         if not self.comments_loaded and comments:
397 |             # Load comments
398 |             comment_list = self.crawl_pages('wp/v2/comments?page=%d')[0]
399 |             for comment in comment_list:
400 |                 found_post = False
401 |                 for i in range(0, len(self.posts)):
402 |                     if self.posts[i]['id'] == comment['post']:
403 |                         if "comments" not in self.posts[i]:
404 |                             self.posts[i]['comments'] = []
405 |                         self.posts[i]["comments"].append(comment)
406 |                         found_post = True
407 |                         break
408 |                 if not found_post:
409 |                     self.orphan_comments.append(comment)
410 |             self.comments_loaded = True
411 |
412 |         return_posts = self.posts
413 |         if start is not None and start < len(return_posts):
414 |             return_posts = return_posts[start:]
415 |         if num is not None and num < len(return_posts):
416 |             return_posts = return_posts[:num]
417 |         return return_posts
418 |
419 |     def get_tags(self, start=None, num=None, force=False):
420 |         """
421 |         Retrieves all tags
422 |         """
423 |         tags = self.get_from_cache(self.tags, start, num, force)
424 |         if tags is not None:
425 |             return tags
426 |
427 |         tags, total_entries = self.crawl_pages('wp/v2/tags?page=%d', start, num)
428 |         self.tags = self.update_cache(self.tags, tags, total_entries, start, num)
429 |         return tags
430 |
431 |     def get_categories(self, start=None, num=None, force=False):
432 |         """
433 |         Retrieves all categories or the specified ones
434 |         """
435 |         categories = self.get_from_cache(self.categories, start, num, force)
436 |         if categories is not None:
437 |             return categories
438 |
439 |         categories, total_entries = self.crawl_pages('wp/v2/categories?page=%d', start=start, num=num)
440 |         self.categories = self.update_cache(self.categories, categories, total_entries, start, num)
441 |         return categories
442 |
443 |     def get_users(self, start=None, num=None, force=False):
444 |         """
445 |         Retrieves all users or the specified ones
446 |         """
447 |         users = self.get_from_cache(self.users, start, num, force)
448 |         if users is not None:
449 |             return users
450 |
451 |         users, total_entries = self.crawl_pages('wp/v2/users?page=%d', start=start, num=num)
452 |         self.users = self.update_cache(self.users, users, total_entries, start, num)
453 |         return users
454 |
455 |     def get_media(self, start=None, num=None, force=False):
456 |         """
457 |         Retrieves all media objects
458 |         """
459 |         media = self.get_from_cache(self.media, start, num, force)
460 |         if media is not None:
461 |             return media
462 |
463 |         media, total_entries = self.crawl_pages('wp/v2/media?page=%d', start=start, num=num)
464 |         self.media = self.update_cache(self.media, media, total_entries, start, num)
465 |         return media
466 |
467 | def get_media_urls(self, ids, cache=True):
468 | """
469 | Retrieves the download URLs and slugs of media selected by comma-separated IDs, of all media ('all'), or of cached media only ('cache')
470 | """
471 | media = []
472 | if ids == 'all':
473 | media = self.get_media(force=(not cache))
474 | elif ids == 'cache':
475 | media = self.get_from_cache(self.media, force=(not cache))
476 | else:
477 | id_list = ids.split(',')
478 | media = []
479 | for i in id_list:
480 | try:
481 | if int(i) > 0:
482 | m = self.get_obj_by_id(WPApi.MEDIA, int(i), cache)
483 | if m is not None and len(m) > 0 and type(m[0]) is dict:
484 | media.append(m[0])
485 | except ValueError:
486 | pass
487 | urls = []
488 | slugs = []
489 | if media is None:
490 | return [], []  # keep the (urls, slugs) return shape when nothing is available
491 | for m in media:
492 | if isinstance(m, dict) and "source_url" in m and 'slug' in m:
493 | urls.append(m["source_url"])
494 | slugs.append(m['slug'])
495 | return urls, slugs
496 |
497 |
498 | def get_pages(self, start=None, num=None, force=False):
499 | """
500 | Retrieves all pages
501 | """
502 | pages = self.get_from_cache(self.pages, start, num, force)
503 | if pages is not None:
504 | return pages
505 |
506 | pages, total_entries = self.crawl_pages('wp/v2/pages?page=%d', start=start, num=num)
507 | self.pages = self.update_cache(self.pages, pages, total_entries, start, num)
508 | return pages
509 |
510 | def get_namespaces(self, start=None, num=None, force=False):
511 | """
512 | Retrieves an array of namespaces
513 | """
514 | if self.has_v2 is None or force:
515 | self.get_basic_info()
516 | if 'namespaces' in self.basic_info.keys():
517 | if start is None and num is None:
518 | return self.basic_info['namespaces']
519 | namespaces = copy.deepcopy(self.basic_info['namespaces'])
520 | if start is not None and start < len(namespaces):
521 | namespaces = namespaces[start:]
522 | if num is not None and num <= len(namespaces):
523 | namespaces = namespaces[:num]
524 | return namespaces
525 | return []
526 |
527 | def get_routes(self):
528 | """
529 | Retrieves an array of routes
530 | """
531 | if self.has_v2 is None:
532 | self.get_basic_info()
533 | if 'routes' in self.basic_info.keys():
534 | return self.basic_info['routes']
535 | return []
536 |
537 | def crawl_namespaces(self, ns):
538 | """
539 | Crawls all accessible get routes defined for the specified namespace.
540 | """
541 | namespaces = self.get_namespaces()
542 | routes = self.get_routes()
543 | ns_data = {}
544 | if ns != "all" and ns not in namespaces:
545 | raise NSNotFoundException
546 | for url, route in routes.items():
547 | if 'namespace' not in route.keys() \
548 | or 'endpoints' not in route.keys():
549 | continue
550 | url_as_ns = url.lstrip('/')
551 | if '(?P<' in url or url_as_ns in namespaces:
552 | continue
553 | if (ns != 'all' and route['namespace'] != ns) or \
554 | route['namespace'] in ['wp/v2', '']:
555 | continue
556 | for endpoint in route['endpoints']:
557 | if 'GET' not in endpoint['methods']:
558 | continue
559 | keep = True
560 | if len(endpoint['args']) > 0 and type(endpoint['args']) is dict:
561 | for name,arg in endpoint['args'].items():
562 | if arg['required']:
563 | keep = False
564 | if keep:
565 | rest_url = url_path_join(self.url, self.api_path, url)
566 | try:
567 | ns_request = self.s.get(rest_url)
568 | ns_data[url] = get_content_as_json(ns_request)
569 | except Exception:
570 | continue
571 | return ns_data
572 |
573 | def get_obj_by_id_helper(self, cache, obj_id, url, use_cache=True):
574 | if use_cache and cache is not None:
575 | obj = get_by_id(cache, obj_id)
576 | if obj is not None:
577 | return [obj]
578 | obj = self.crawl_single_page(url % obj_id)
579 | if type(obj) is dict:
580 | return [obj]
581 | return []
582 |
583 | def get_obj_by_id(self, obj_type, obj_id, use_cache=True):
584 | """
585 | Returns a list containing at most one object, identified by its type and ID.
586 |
587 | Returns an empty list if the ID does not exist or the type is unknown.
588 |
589 | :param obj_type: the type of the object (ex. POST)
590 | :param obj_id: the ID of the object to fetch
591 | :param use_cache: if the cache should be used to avoid useless requests
592 | """
593 | if obj_type == WPApi.USER:
594 | return self.get_obj_by_id_helper(self.users, obj_id, 'wp/v2/users/%d', use_cache)
595 | if obj_type == WPApi.TAG:
596 | return self.get_obj_by_id_helper(self.tags, obj_id, 'wp/v2/tags/%d', use_cache)
597 | if obj_type == WPApi.CATEGORY:
598 | return self.get_obj_by_id_helper(self.categories, obj_id, 'wp/v2/categories/%d', use_cache)
599 | if obj_type == WPApi.POST:
600 | return self.get_obj_by_id_helper(self.posts, obj_id, 'wp/v2/posts/%d', use_cache)
601 | if obj_type == WPApi.PAGE:
602 | return self.get_obj_by_id_helper(self.pages, obj_id, 'wp/v2/pages/%d', use_cache)
603 | if obj_type == WPApi.COMMENT:
604 | return self.get_obj_by_id_helper(self.comments, obj_id, 'wp/v2/comments/%d', use_cache)
605 | if obj_type == WPApi.MEDIA:
606 | return self.get_obj_by_id_helper(self.media, obj_id, 'wp/v2/media/%d', use_cache)
607 | return []
608 |
609 | def get_obj_list(self, obj_type, start, limit, cache, kwargs={}):
610 | """
611 | Returns a list of at most `limit` objects, starting at the offset `start`.
612 |
613 | :param obj_type: the type of the object (ex. POST)
614 | :param start: the offset of the first object to return
615 | :param limit: the maximum number of objects to return
616 | :param cache: if the cache should be used to avoid useless requests
617 | :param kwargs: additional parameters to pass to the function (for POST only)
618 | """
619 | get_func = None
620 | if obj_type == WPApi.USER:
621 | get_func = self.get_users
622 | elif obj_type == WPApi.TAG:
623 | get_func = self.get_tags
624 | elif obj_type == WPApi.CATEGORY:
625 | get_func = self.get_categories
626 | elif obj_type == WPApi.PAGE:
627 | get_func = self.get_pages
628 | elif obj_type == WPApi.COMMENT:
629 | get_func = self.get_comments
630 | elif obj_type == WPApi.MEDIA:
631 | get_func = self.get_media
632 | elif obj_type == WPApi.NAMESPACE:
633 | get_func = self.get_namespaces
634 |
635 | if get_func is not None:
636 | return get_func(start=start, num=limit, force=not cache)
637 | elif obj_type == WPApi.POST:
638 | return self.get_posts(start=start, num=limit, force=not cache, **kwargs)
639 | return []
640 |
641 | def search(self, obj_types, keywords, start, limit):
642 | """
643 | Looks for data with the specified keywords of the given types.
644 |
645 | :param obj_types: a list of the desired object types to look for
646 | :param keywords: the keywords to look for
647 | :param start: a start index
648 | :param limit: the max number to return
649 | :return: a dict of lists of objects sorted by types
650 | """
651 | out = {}
652 | if WPApi.ALL_TYPES in obj_types or len(obj_types) == 0:
653 | obj_types = [
654 | WPApi.POST, WPApi.CATEGORY, WPApi.TAG, WPApi.PAGE,
655 | WPApi.COMMENT, WPApi.MEDIA, WPApi.USER
656 | ] # All supported types for search
657 | for t in obj_types:
658 | if t == WPApi.POST:
659 | out[t] = self.crawl_pages('wp/v2/posts?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
660 | elif t == WPApi.CATEGORY:
661 | out[t] = self.crawl_pages('wp/v2/categories?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
662 | elif t == WPApi.TAG:
663 | out[t] = self.crawl_pages('wp/v2/tags?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
664 | elif t == WPApi.PAGE:
665 | out[t] = self.crawl_pages('wp/v2/pages?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
666 | elif t == WPApi.COMMENT:
667 | out[t] = self.crawl_pages('wp/v2/comments?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
668 | elif t == WPApi.MEDIA:
669 | out[t] = self.crawl_pages('wp/v2/media?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
670 | elif t == WPApi.USER:
671 | out[t] = self.crawl_pages('wp/v2/users?page=%d', start=start, num=limit, search_terms=keywords, display_progress=False)[0]
672 | return out
--------------------------------------------------------------------------------
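Note on the comment-matching loop in get_posts (lib/wpapi.py): the method scans the whole post list for every comment and stores comments with no matching post in orphan_comments. The sketch below shows the same matching done with a one-time index of posts by ID. It is only an illustration, not code from the repository: the function name attach_comments and the sample data are hypothetical, and the dict shapes assume just the 'id' and 'post' fields that get_posts already reads.

```python
def attach_comments(posts, comments):
    """Attach each comment to its parent post and return the orphan comments.

    Hypothetical helper: `posts` and `comments` are lists of dicts shaped
    like the JSON objects used in get_posts ('id' on posts, 'post' on comments).
    """
    # Index posts by ID once; setdefault keeps the first post seen for an ID,
    # which matches the first-match behavior of a linear scan with `break`.
    posts_by_id = {}
    for post in posts:
        posts_by_id.setdefault(post['id'], post)

    orphans = []
    for comment in comments:
        post = posts_by_id.get(comment['post'])
        if post is None:
            # No known post has this ID, so the comment is an orphan
            orphans.append(comment)
        else:
            post.setdefault('comments', []).append(comment)
    return orphans


# Made-up sample data for illustration only
posts = [{'id': 1}, {'id': 2}]
comments = [
    {'id': 10, 'post': 1},
    {'id': 11, 'post': 99},
    {'id': 12, 'post': 1},
]
orphans = attach_comments(posts, comments)
```

With this layout each comment costs one dictionary lookup instead of a pass over every post, which may matter on sites with many posts and comments. The posts are modified in place, as in get_posts.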
/requirements.txt:
--------------------------------------------------------------------------------
1 | certifi==2022.12.7
2 | chardet==3.0.4
3 | idna==2.9
4 | requests==2.25.1
5 | urllib3==1.26.5
6 |
--------------------------------------------------------------------------------