├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── LICENSE.txt
├── MANIFEST.in
├── README.md
├── bookcut
│   ├── Settings.ini
│   ├── __init__.py
│   ├── article.py
│   ├── bibliography.py
│   ├── book.py
│   ├── book_details.py
│   ├── bookcut.py
│   ├── booklist.py
│   ├── downloader.py
│   ├── libgen.py
│   ├── mirror_checker.py
│   ├── organise.py
│   ├── repositories.py
│   ├── search.py
│   └── settings.py
├── conftest.py
├── pytest.ini
├── setup.py
└── tests
    ├── __init__.py
    ├── test_book.py
    ├── test_bookcut.py
    ├── test_main.py
    └── test_mirror_checker.py
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | bookdb.db
3 | BookCat.egg-info/
4 | build/
5 | dist/
6 | BookCut.egg-info/
7 | resources.json
8 |
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 |   - repo: https://github.com/psf/black
3 |     rev: 21.6b0
4 |     hooks:
5 |       - id: black
6 |         language_version: python3
7 |
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # Changelog
2 | All notable changes to this project will be documented in this file.
3 |
4 |
5 | ## [1.3.7]
6 |
7 | ### Add
8 | - Black autoformatting
9 |
10 | ### Fix
11 | - Issue#7
12 |
13 |
14 | ## [1.3.6]
15 | - Repos option added. ArXiv is now included.
16 | - Fixed known bugs
17 |
18 | ## [1.3.5] - 06_Jun_2021
19 | - Fixed Issue#9
20 |
21 | ### Fixed
22 | - Organise options known bug that could not move some files fixed.
23 |
24 | ## [1.3.1] - 11_Aug_2020
25 |
26 | ### Fixed
27 | - Organise options known bug that could not move some files fixed.
28 |
29 | ## [1.3.0] - 10_Aug_2020
30 |
31 | ### Added
32 | - Configuration mode: the user can now change some basic BookCut settings, such as
33 | the destination folder and the clear-screen option, and can add more LibGen mirrors.
34 |
35 | ### Fixed
36 | - Some raise errors and some known bugs.
37 |
38 | ## [1.2.3] - 07_Aug_2020
39 |
40 | ### Fixed
41 | - [Bug] Issue#6 Forced list error.
42 |
43 |
44 | ## [1.2.2] - 03_Aug_2020
45 |
46 | ### Fixed
47 | - [Bug] Issue with FileNotFoundError, when no home/Documents directory existed.
48 |
49 |
50 | ## [1.2.1] - 03_Aug_2020
51 |
52 | ### Added
53 | - Version option added ('bookcut --version')
54 | - List forced flag, for downloading the found books automatically.
55 | - Added this file (CHANGELOG.md).
56 |
57 | ### Fixed
58 | - Search option bug, when no results found.
59 | - PEP8 fixes
60 |
61 | ## [1.2.0] - 03_Aug_2020
62 | ### Added
63 | - Details option
64 | - Bibliography option which returns a list with all the books from an author, with the option to save to .txt file
65 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) [2020] [Costis94]
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include bookcut/Settings.ini
2 | include README.md
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
6 |
7 |
8 | BookCut is a Python command-line tool that helps the user download **free e-books**,
9 | **organise** them in folders by genre, **retrieve** book details by *ISBN* or *title*,
10 | and get a list of **all the books by a writer** and save it to a .txt file.
11 |
12 | *With the help of LibGen, ArXiv and OpenLibrary.*
13 |
14 |
15 | ## REQUIREMENTS
16 |
17 | * Python 3
18 | * python3-pip
19 |
20 |
21 | ## Installation
22 |
23 | * **Install with pip:**
24 |
25 | ```bash
26 | pip install bookcut
27 | # or, if you also have Python 2
28 | pip3 install bookcut
29 | ```
30 |
31 |
32 | ## Usage
33 |
34 | ### Searching and downloading books:
35 |
36 | * Download a **single** book:
37 |
38 | ```bash
39 | bookcut book -b "White Fang" -a "Jack London"
40 | ```
41 |
42 | * Download a **list** of books:
43 |
44 | ```bash
45 | bookcut list -f "FreeEbooksToDownload.txt"
46 | ```
47 |
48 | * Organise a **folder** full of e-books to folders according to genre:
49 |
50 | ```bash
51 | bookcut organise -d "full/path/to/folder"
52 | ```
53 | ***
54 | * Search **LibGen**, output the results and download e-book:
55 |
56 | ```bash
57 | bookcut search -t 'Homer Odyssey'
58 | ```
59 |
60 | * Search more book repositories with the **--repos** option:
61 | ``` bash
62 | bookcut search -t 'Homer Odyssey' --repos 'libgen,arxiv'
63 | ```
64 | **Available book repositories: Libgen, ArXiv**
65 | ***
66 |
67 | * Get the **details** of a book by **title and author**, or simply **ISBN**.
68 |
69 | ```bash
70 | bookcut details -b 'Homer Iliad'
71 | ```
72 |
73 | * Get a list of *all the books* by an **author**, with an option to save it to a .txt file:
74 |
75 | ```bash
76 | bookcut all-books --author 'Stephen King'
77 | ```
78 | ***
79 | ### Searching and downloading articles:
80 | Now you can use bookcut to search and download **scientific articles**.
81 |
82 | - Search with the Digital Object Identifier:
83 | ```
84 | bookcut article --doi "10.1126/science.196.4287.293"
85 | ```
86 | - Search with the exact title:
87 | ```
88 | bookcut article --title "Ribulose Bisphosphate Carboxylase A Two-Layered, Square-Shaped Molecule of Symmetry"
89 | ```
90 | ****
91 | ### Configuration
92 | * You can also change some of BookCut's basic settings. For details, run:
93 |
94 | ```bash
95 | bookcut config --help
96 | ```
97 |
98 | ## TO-DO
99 | * Add Tests
100 | * Add documentation
101 | * Add more sources with free e-books
102 | * Fix organiser so it can use all types of files
103 | * Add a logger.
104 |
105 | ## Copyrights
106 | Please use the bookcut app to download **only free e-books** that are legally distributed through *BookCut repositories.*
107 | BookCut contributors accept no responsibility for how the tool is used.
108 | ## Contributing
109 | Pull requests are welcome. This is my first project, so be kind.
110 | For major changes, please open an issue first to discuss what you would like to change.
111 |
112 | Please make sure to update tests as appropriate.
113 |
114 | ## License
115 | [MIT](https://choosealicense.com/licenses/mit/)
116 |
--------------------------------------------------------------------------------
/bookcut/Settings.ini:
--------------------------------------------------------------------------------
1 | [LibGen]
2 | mirrors = https://libgen.lc/,http://libgen.li/,http://185.39.10.101/,http://genesis.lib/
3 |
4 | [Repositories]
5 | available_repos = arxiv,libgen
6 |
7 | [ArticleRepositories]
8 | article_repos = openaccessbutton
9 |
10 | [Settings]
11 | clean_screen = True
12 | destination = None
13 |
--------------------------------------------------------------------------------
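The `mirrors` option in Settings.ini holds several URLs in one comma-separated string, so any reader has to split it before use. Below is a minimal sketch using the standard-library `configparser`, assuming the file content shown above. `parse_mirrors` and `SETTINGS_TEXT` are hypothetical names for illustration and are not part of the BookCut package.

```python
import configparser

# Inline copy of the relevant sections, so the sketch runs without a file.
SETTINGS_TEXT = """
[LibGen]
mirrors = https://libgen.lc/,http://libgen.li/,http://185.39.10.101/,http://genesis.lib/

[Settings]
clean_screen = True
destination = None
"""


def parse_mirrors(config):
    """Split the comma-separated mirrors option into a list of URLs."""
    raw = config.get("LibGen", "mirrors")
    return [m.strip() for m in raw.split(",") if m.strip()]


config = configparser.ConfigParser()
config.read_string(SETTINGS_TEXT)
mirrors = parse_mirrors(config)
# getboolean understands "True"/"False", so no string comparison is needed.
clean_screen = config.getboolean("Settings", "clean_screen")
print(mirrors, clean_screen)
```

Reading the flag with `getboolean` avoids comparing against the literal string `"True"`, which is one possible design choice rather than what the package necessarily does.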
/bookcut/__init__.py:
--------------------------------------------------------------------------------
1 | from importlib import metadata
2 |
3 | __version__ = metadata.version("bookcut")
4 |
--------------------------------------------------------------------------------
/bookcut/article.py:
--------------------------------------------------------------------------------
1 | from bookcut.repositories import open_access_button
2 | from bookcut.downloader import filename_refubrished
3 | from bookcut.search import search_downloader
4 | from click import confirm
5 |
6 | """
7 | article.py is used by the article command and searches repositories for
8 | published articles.
9 | """
10 |
11 |
12 | def article_search(doi, title):
13 | try:
14 | article_json_data = open_access_button(doi, title)
15 | url = article_json_data["url"]
16 | metadata = article_json_data["metadata"]
17 | title = metadata["title"]
18 | filename = filename_refubrished(title)
19 | filename = filename + ".pdf"
20 | ask_for_downloading(filename, url)
21 | except KeyError:
22 | print("\nCan not find the given article.\nPlease try another search!")
23 |
24 |
25 | def ask_for_downloading(articlefilename, url):
26 | ask = confirm(f"Do you want to download:\n {articlefilename}")
27 | if ask is True:
28 | search_downloader(articlefilename, url)
29 | else:
30 | print("Aborted!")
31 |
--------------------------------------------------------------------------------
/bookcut/bibliography.py:
--------------------------------------------------------------------------------
1 | import requests
2 | import json
3 | import re
4 | from difflib import SequenceMatcher
5 | import os
6 | from bookcut.mirror_checker import pageStatus
7 |
8 | """This file is used by the all-books command.
9 | It searches OpenLibrary for all books written by an
10 | author and lets the user save the list to a .txt file."""
11 |
12 | OPEN_LIBRARY_URL = "http://www.openlibrary.org"
13 |
14 |
15 | def main(author, similarity):
16 | # returns all the books written by an author from OpenLibrary
17 | # using similarity for filtering the results
18 | status = pageStatus(OPEN_LIBRARY_URL)
19 | if status is not False:
20 | search_url = "http://openlibrary.org/search.json?author=" + author
21 | jason = requests.get(search_url)
22 | jason = jason.text
23 | data = json.loads(jason)
24 | data = data["docs"]
25 | if data != []:
26 | # Collect the title of every result; the earlier
27 | # range(0, len(data) - 1) loop skipped the last entry.
28 | books = []
29 | for doc in data:
30 | title = doc["title"]
31 | books.append(title)
32 | mylist = list(dict.fromkeys(books))
33 |
34 | # Filtering results: trying to erase similar titles
35 | words = [
36 | " the ",
37 | "The ",
38 | " THE ",
39 | " The", " a ",
40 | " A ",
41 | " and ",
42 | " of ",
43 | " from ",
44 | "on",
45 | "The",
46 | "in",
47 | ]
48 |
49 | noise_re = re.compile(
50 | "\\b(%s)\\W" % ("|".join(map(re.escape, words))), re.I
51 | )
52 | clean_mylist = [noise_re.sub("", p) for p in mylist]
53 |
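54 | # Keep a title only if it is not a near-duplicate of one already kept.
55 | unique_titles = []
56 | for candidate in clean_mylist:
57 | if not any(similar(candidate, kept, similarity) for kept in unique_titles):
58 | unique_titles.append(candidate)
59 | clean_mylist = unique_titles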
60 | clean_mylist.sort()
61 | print(" ~Books found in the OpenLibrary database:\n")
62 | for i in clean_mylist:
63 | print(i)
64 | return clean_mylist
65 | else:
66 | print("(!) No valid author name, or bad internet connection.")
67 | print("Please try again!")
68 | return None
69 |
70 |
71 | def similar(a, b, similarity):
72 | """function which check similarity between two strings"""
73 | ratio = SequenceMatcher(None, a, b).ratio()
74 | if ratio > similarity and ratio < 1:
75 | return True
76 | else:
77 | return False
78 |
79 |
80 | def save_to_txt(lista, path, author):
81 |     # Save the book list to a .txt file, one title per line.
82 |     name = f"{author}_bibliography.txt"
83 |     full_path = os.path.join(path, name)
84 |     with open(full_path, "a", encoding="utf-8") as f1:
85 |         for content in lista:
86 |             f1.write(content + " " + author + "\n")
87 |     print("\nList saved at: ", full_path, "\n")
88 |
--------------------------------------------------------------------------------
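The filtering step in bibliography.py aims to drop titles that are nearly identical to one another. A minimal sketch of that idea follows, assuming the same `SequenceMatcher` ratio test as `similar`. The helper names and the sample titles are hypothetical, and the 0.7 threshold mirrors the default `--ratio` of the `all-books` command.

```python
from difflib import SequenceMatcher


def is_near_duplicate(a, b, threshold):
    """True when two different strings are more similar than the threshold."""
    ratio = SequenceMatcher(None, a, b).ratio()
    return threshold < ratio < 1


def drop_near_duplicates(titles, threshold=0.7):
    """Keep the first title of each group of near-duplicates."""
    kept = []
    for title in titles:
        # Compare only against titles already kept, so the input list
        # is never modified while it is being iterated.
        if not any(is_near_duplicate(title, k, threshold) for k in kept):
            kept.append(title)
    return kept


titles = ["White Fang", "White Fang.", "The Call of the Wild"]
unique = drop_near_duplicates(titles)
print(unique)
```

Exact duplicates have a ratio of 1 and are not removed here; the module already handles those earlier with `dict.fromkeys`.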
/bookcut/book.py:
--------------------------------------------------------------------------------
1 | from bookcut.mirror_checker import settingParser
2 | import mechanize
3 | from bs4 import BeautifulSoup as Soup
4 | from bookcut.libgen import file_name
5 | from click import confirm
6 | from bookcut.downloader import downloading
7 | from bookcut.repositories import arxiv, libgen_repo
8 | import pandas as pd
9 | from bookcut.search import choose_a_book
10 |
11 |
12 | def libgen_book_find(
13 | title, author, publisher, destination, extension, force, libgenurl
14 | ):
15 | """searching @ LibGen for a single book"""
16 | try:
17 | book = Booksearch(title, author, publisher, extension, libgenurl)
18 | result = book.search()
19 | extensions = result["extensions"]
20 | tb = result["table_data"]
21 | mirrors = result["mirrors"]
22 | file_details = book.give_result(extensions, tb, mirrors, extension)
23 | if file_details is not None:
24 | book.cursor(file_details["url"], destination, file_details["file"], force)
25 | except TypeError:
26 | # TODO add logger error
27 | pass
28 |
29 |
30 | def book_searching_in_repos(term, repos):
31 | # search a book in various Repositories
32 | if repos is None:
33 | libgen_data = libgen_repo(term)
34 | return libgen_data
35 | repos = repos.split(",")
36 | repos = [i.strip(" ") for i in repos]
37 | available_repos = settingParser("Repositories", "available_repos")
38 | df = pd.DataFrame({"Author(s)": [], "Title": [], "Size": [], "Extension": []})
39 | for i in repos:
40 | if i in available_repos:
41 | if i == "arxiv":
42 | arxiv_data = arxiv(term)
43 | df = pd.concat([df, arxiv_data], ignore_index=True)
44 | if i == "libgen":
45 | libgen_data = libgen_repo(term)
46 | df = pd.concat([df, libgen_data], ignore_index=True)
47 | choose_a_book(df)
48 |
49 |
50 | class Booksearch:
51 | """searching libgen original page and returns book details and mirror link"""
52 |
53 | def __init__(self, title, author, publisher, filetype, libgenurl):
54 | self.title = title
55 | self.author = author
56 | self.publisher = publisher
57 | self.filetype = filetype
58 | self.mirror = None
59 | self.libgenurl = libgenurl
60 |
61 | def search(self):
62 | """searching libgen and returns table data, extensions and links"""
63 | br = mechanize.Browser()
64 | br.set_handle_robots(False) # ignore robots
65 | br.set_handle_refresh(False)  # do not follow refresh redirects
66 | br.addheaders = [("User-agent", "Firefox")]
67 |
68 | br.open(self.libgenurl)
69 | br.select_form("libgen")
70 | input_form = self.title + " " + self.author + " " + self.publisher
71 | br.form["req"] = input_form
72 | ac = br.submit()
73 | html_from_page = ac
74 | soup = Soup(html_from_page, "html.parser")
75 | links_table = soup.find_all("table")[3]
76 | table_data = []
77 | mirrors = []
78 | extensions = []
79 | for i in links_table:
80 | try:
81 | td = i.find_all("td")
82 | for tr in td:
83 | # scrape mirror links
84 | temp = tr.find("a", href=True)
85 | mirror_page = temp["href"]
86 | # add also mirror link
87 | if mirror_page.startswith("http") is False:
88 | mirror_page = self.libgenurl + temp["href"]
89 | else:
90 | mirror_page = temp["href"]
91 | mirrors.append(mirror_page)
92 | except Exception as e:
93 | print(e)
94 |
95 | # Parse Details from table_data
96 | table = soup.find_all("table")[2]
97 | for i in table:
98 | try:
99 | td = i.find_all("td")
100 | row = tr.find_all("tr")
101 | row = [tr.text for tr in td]
102 | table_data.append(row)
103 | extensions.append(row[8])
104 | table_details = dict()
105 | table_details["extensions"] = extensions
106 | table_details["table_data"] = table_data
107 | table_details["mirrors"] = mirrors
108 | return table_details
109 | except Exception as e:
110 | pass
111 |
112 | def give_result(self, extensions, table_data, mirrors, filetype):
113 | try:
114 | if filetype is not None:
115 | temp = 0
116 | for i in extensions:
117 | if filetype == i:
118 | result = dict()
119 | result["url"] = mirrors[temp]
120 | result["file"] = extensions[temp]
121 | print("\nDownloading Link: FOUND")
122 | return result
123 | temp = temp + 1
124 | else:
125 | # return the first result
126 | result = dict()
127 | result["url"] = mirrors[0]
128 | result["file"] = extensions[0]
129 | print("\nDownloading Link: FOUND")
130 | return result
131 | except IndexError:
132 | print("Downloading Link:NOT FOUND\n")
133 | print("================================")
134 |
135 | def cursor(self, url, destination_folder, extension, forced):
136 | """asking the user to download a chosen book or to abort"""
137 | title = str(self.title)
138 | author = str(self.author)
139 | nameofbook = file_name(url)
140 | if nameofbook is None:
141 | nameofbook = (
142 | title.replace("\n", "") + author.replace("\n", "") + "." + extension
143 | )
144 | if forced is not True:
145 | ask = confirm(f"Do you want to download:\n {nameofbook}")
146 | if ask is True:
147 | downloading(
148 | url, title, author, nameofbook, destination_folder, extension
149 | )
150 | else:
151 | downloading(url, title, author, nameofbook, destination_folder, extension)
152 |
--------------------------------------------------------------------------------
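`Booksearch.give_result` walks the scraped extensions with a manual counter to find the first row of the requested file type. The same selection can be sketched with `zip`, assuming `extensions` and `mirrors` are parallel lists as in `search()`. `pick_result` is a hypothetical stand-in, not the method itself, and the URLs in the usage lines are placeholders.

```python
def pick_result(extensions, mirrors, filetype=None):
    """Return {"url", "file"} for the first matching row, or None."""
    for ext, url in zip(extensions, mirrors):
        # With no requested file type, the first row wins.
        if filetype is None or ext == filetype:
            return {"url": url, "file": ext}
    return None


rows_ext = ["pdf", "epub"]
rows_url = ["http://example.invalid/1", "http://example.invalid/2"]
print(pick_result(rows_ext, rows_url, "epub"))
print(pick_result(rows_ext, rows_url))
```

Returning `None` for "no match" and for empty input keeps the caller's existing `if file_details is not None` check valid without relying on an `IndexError`.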
/bookcut/book_details.py:
--------------------------------------------------------------------------------
1 | import requests
2 | import json
3 | from requests import ConnectionError
4 | from bookcut.mirror_checker import pageStatus
5 |
6 | """This file is used by the details command.
7 | Its main use is to search OpenLibrary for a book's details.
8 | Its input can be the name of the book or the ISBN.
9 | """
10 |
11 | OPEN_LIBRARY_URL = "http://www.openlibrary.org"
12 |
13 |
14 | def main(term):
15 | # searching OpenLibrary and prints the details of a book
16 | try:
17 | if term is None:
18 | term = input(
19 | "Please enter the book and the author, or the ISBN of the book."
20 | )
21 | term = term.replace(" ", "+")
22 | pageStatus(OPEN_LIBRARY_URL)
23 | search_url = "http://openlibrary.org/search.json?q=" + term
24 | jason = requests.get(search_url)
25 | jason = jason.text
26 | data = json.loads(jason)
27 | try:
28 | data = data["docs"][0]
29 | except IndexError:
30 | data = None
31 | print("Invalid search, please try again.")
32 |
33 | if data is not None:
34 | author = data["author_name"][0]
35 | title = data["title_suggest"]
36 | isbn = data["isbn"]
37 | first_publish_year = data["first_publish_year"]
38 | try:
39 | lang = data["language"]
40 | except KeyError:
41 | lang = None
42 |
43 | print("Results for search: ", term, "\n")
44 | print("Title:", title)
45 | print("Author(s):", author, "\n")
46 | print("ISBN(s):", isbn, "\n")
47 | if lang is not None:
48 | print(
49 | "Language(s): ", lang
50 | )
51 | print("\nFirst published: ", first_publish_year)
52 | except ConnectionError:
53 | url = OPEN_LIBRARY_URL
54 | print(
55 | "Unable to connect to:",
56 | url,
57 | "\nPlease check your internet connection and try again later.",
58 | )
59 | except json.decoder.JSONDecodeError:
60 | print("An error occurred while retrieving data.")
61 | print("Please try again later.")
62 |
--------------------------------------------------------------------------------
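book_details.py builds the search URL by replacing spaces with `+` and then indexes the first result directly, so a missing field such as `isbn` raises `KeyError`. The sketch below shows one way to encode the term and read fields defensively. It assumes the JSON field names used in the module (`author_name`, `title_suggest`, `isbn`, `first_publish_year`, `language`); `build_search_url`, `summarise` and the sample record are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://openlibrary.org/search.json"


def build_search_url(term):
    """URL-encode the search term instead of replacing spaces by hand."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": term})


def summarise(doc):
    """Pull the printed fields out of one search result, tolerating gaps."""
    authors = doc.get("author_name") or [None]
    return {
        "title": doc.get("title_suggest") or doc.get("title"),
        "author": authors[0],
        "isbn": doc.get("isbn"),
        "first_publish_year": doc.get("first_publish_year"),
        "language": doc.get("language"),
    }


# Made-up record for illustration only, not real OpenLibrary output.
sample = {"title": "Iliad", "author_name": ["Homer"]}
print(build_search_url("Homer Iliad"))
print(summarise(sample))
```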
/bookcut/bookcut.py:
--------------------------------------------------------------------------------
1 | import click
2 | import pyfiglet
3 | from os import name, system
4 | from bookcut import __version__
5 | from bookcut.mirror_checker import main as mirror_checker, settingParser
6 | from bookcut.book import libgen_book_find, book_searching_in_repos
7 | from bookcut.organise import main_organiser
8 | from bookcut.search import choose_a_book
9 | from bookcut.book_details import main as detailing
10 | from bookcut.bibliography import main as allbooks
11 | from bookcut.bibliography import save_to_txt
12 | from bookcut.article import article_search
13 | from bookcut.settings import initial_config, mirrors_append, read_settings
14 | from bookcut.settings import (
15 | screen_setting,
16 | print_settings,
17 | set_destination,
18 | path_checker,
19 | )
20 | from bookcut.booklist import booklist_main
21 | from bookcut.repositories import libgen_repo
22 | from bookcut.libgen import md5_search
23 |
24 |
25 | @click.group(name="commands")
26 | @click.version_option(version=__version__)
27 | def entry():
28 | """
29 | for a single book download you can \n
30 | bookcut.py book --book "White Fang" --author "Jack London"
31 | \nor bookcut.py book -b "White Fang" -a "Jack London" \n
32 | *For a more complete help: bookcut.py [COMMAND] --help\n
33 | *For example: bookcut.py list --help
34 | """
35 | # read the settings ini file and check what value for clean screen
36 | settings = read_settings()
37 | clean_screen(settings[0])
38 | title = pyfiglet.figlet_format("BookCut")
39 | click.echo(title)
40 | click.echo("**********************************")
41 | print("Welcome to BookCut! I'm here to" "\nhelp you to read your favourite books!")
42 | print("**********************************")
43 |
44 |
45 | @entry.command(name="list", help="Download a list of ebooks from a .txt file")
46 | @click.option(
47 | "--file",
48 | "-f",
49 | help="A .txt file in which each book is written on a separate line",
50 | required=True,
51 | )
52 | @click.option(
53 | "--destination",
54 | "-d",
55 | help="The destination folder of the downloaded books",
56 | default=path_checker(),
57 | )
58 | @click.option(
59 | "--forced", help="Forced option, accepts all books for downloading", is_flag=True
60 | )
61 | @click.option("--extension", "-ext", help="File type of e-book.")
62 | def download_from_txt(file, destination, forced, extension):
63 | click.echo("Importing of book list:Started.")
64 | if forced:
65 | click.echo(click.style("(!) Forced list downloading:Enabled", fg="green"))
66 | booklist_main(file, destination, forced, extension)
67 |
68 |
69 | @entry.command(
70 | name="book",
71 | help="Download a book in epub format, by inserting" "\n the title and the author",
72 | )
73 | @click.option("--book", "-b", help="Title of Book", default=" ")
74 | @click.option("--author", "-a", help="The author of the Book", default=" ")
75 | @click.option("--publisher", "-p", default="")
76 | @click.option(
77 | "--destination",
78 | "-d",
79 | help="The destination folder of the downloaded books",
80 | default=path_checker(),
81 | )
82 | @click.option("--extension", "-ext", help="Filetype of e-book for example:pdf")
83 | @click.option("--forced", is_flag=True)
84 | @click.option("--md5", help="Md5 search for a specific book version.", default=None)
85 | def book(book, author, publisher, destination, extension, forced, md5):
86 | if book == " " and md5 is None:
87 | print("Invalid input! Run 'bookcut book --help' for more.")
88 | elif author != " " and book != " ":
89 | click.echo(f"\nSearching for {book.capitalize()} by {author.capitalize()}")
90 | elif book != " ":
91 | click.echo(f"\nSearching for {book.capitalize()}")
92 | url = mirror_checker()
93 | if url is not None:
94 | if md5 is not None:
95 | print("\nSearching for book with md5: ", md5)
96 | md5_search(md5, url, destination)
97 | else:
98 | libgen_book_find(
99 | book, author, publisher, destination, extension, forced, url
100 | )
101 |
102 |
103 | def clean_screen(setting):
104 | """Cleans the terminal screen"""
105 | if setting == "True":
106 | if name == "nt":
107 | _ = system("cls")
108 | else:
109 | _ = system("clear")
110 |
111 |
112 | @entry.command(
113 | name="organise", help="Organise the ebooks in folders according\n to genre"
114 | )
115 | @click.option(
116 | "--directory",
117 | "-d",
118 | help="Directory of source ",
119 | required=True,
120 | default=path_checker(),
121 | )
122 | @click.option(
123 | "--output",
124 | "-o",
125 | help="The destination folder of organised books",
126 | default=path_checker(),
127 | )
128 | def organiser(directory, output):
129 | print("\nBookCut is starting to \norganise your books!")
130 | main_organiser(directory)
131 |
132 |
133 | @entry.command(name="all-books", help="Search and return all the books from an author")
134 | @click.option("--author", "-a", required=True, help="Author name")
135 | @click.option(
136 | "--ratio", "-r", help="Ratio for filtering book results", default="0.7", type=float
137 | )
138 | def bibliography(author, ratio):
139 | print(f"\nStart searching for all books by {author.capitalize()}:")
140 | lista = allbooks(author, ratio)
141 | if lista is not None:
142 | print("**********************************")
143 | choice = "y or n"
144 | while choice not in ("Y", "N"):
145 | choice = input("\nDo you wish to save the list? [Y/n]: ")
146 | choice = choice.capitalize()
147 | if choice == "Y":
148 | save_to_txt(lista, path_checker(), author)
149 | break
150 | elif choice == "N":
151 | print("Aborted.")
152 | break
153 |
154 |
155 | @entry.command(
156 | name="search",
157 | help="Search LibGen or other repositories and choose a book to download",
158 | )
159 | @click.option("--term", "-t", help="Term for searching")
160 | @click.option("--repos", default=None)
161 | def searching(term, repos):
162 | print("Searching for:", term.capitalize())
163 | # set default libgen search
164 | if repos is None:
165 | libgen_data = libgen_repo(term)
166 | choose_a_book(libgen_data)
167 | else:
168 | book_searching_in_repos(term, repos)
169 |
170 |
171 | @entry.command(name="details", help="Search the details of a book")
172 | @click.option(
173 | "--book",
174 | "-b",
175 | help="Enter book & author or the ISBN number.",
176 | required=True,
177 | default=None,
178 | )
179 | def details(book):
180 | detailing(book)
181 |
182 |
183 | @entry.command(name="article", help="Search for an article")
184 | @click.option("--doi", "-d", help="Enter D.O.I. of the article", default=None)
185 | @click.option("--title", "-t", help="Enter title of article", default=None)
186 | def article(doi, title):
187 | if doi is not None or title is not None:
188 | article_search(doi, title)
189 | else:
190 | print("Not correct input. \nPlease use: bookcut article --help")
191 |
192 |
193 | @entry.command(name="config", help="BookCut configuration settings")
194 | @click.option("--libgen_add", help="Add a Libgen mirror to mirrors list", default=None)
195 | @click.option(
196 | "--restore", help="Restores the settings file to initial state", is_flag=True
197 | )
198 | @click.option("--settings", help="Prints the current BookCut settings", is_flag=True)
199 | @click.option(
200 | "--clean_screen",
201 | help="You can choose if BookCut will" " clean terminal screen",
202 | is_flag=True,
203 | )
204 | @click.option("--download_folder", help="Set BookCut's download folder", default=None)
205 | def configure_mode(restore, libgen_add, settings, clean_screen, download_folder):
206 | if restore:
207 | prompt = click.confirm("\n Are you sure you want to restore Settings?")
208 | if prompt is True:
209 | initial_config()
210 | else:
211 | click.echo("Aborted!")
212 | elif libgen_add is not None:
213 | click.echo(f"Adding {libgen_add} to mirrors list")
214 | mirrors_append(libgen_add)
215 | elif settings:
216 | print_settings()
217 | elif clean_screen:
218 | prompt = click.confirm("\nDo you want Bookcut to clean command line?")
219 | if prompt is True:
220 | screen_setting("True")
221 | else:
222 | screen_setting("False")
223 | elif download_folder is not None:
224 | set_destination(download_folder)
225 | else:
226 | print(
227 | "Usage: bookcut config [OPTIONS]",
228 | "\nTry 'bookcut config --help' for help.\n",
229 | "\nError: Missing option or flag.",
230 | )
231 |
232 |
233 | if __name__ == "__main__":
234 | entry()
235 |
--------------------------------------------------------------------------------
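The `all-books` command repeats its save question until the user answers Y or N. A small sketch of that loop as a reusable function is shown below, with the reader passed in so it can be exercised without a terminal. `ask_yes_no` is a hypothetical helper and is not part of bookcut.py.

```python
def ask_yes_no(prompt, read=input):
    """Repeat the question until the answer is Y or N (case-insensitive)."""
    while True:
        answer = read(prompt).strip().upper()
        if answer in ("Y", "N"):
            return answer == "Y"


# Simulated answers: the first one is rejected, the second one is accepted.
answers = iter(["maybe", "n"])
result = ask_yes_no("Do you wish to save the list? [Y/n]: ", read=lambda _: next(answers))
print(result)
```

Passing `read` as a parameter is a testing convenience; in the command itself the default `input` would be used.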
/bookcut/booklist.py:
--------------------------------------------------------------------------------
1 | from bookcut.book import libgen_book_find
2 | from bookcut.mirror_checker import main as mirror_checker
3 |
4 |
5 | def file_list(filename):
6 | """Checks that the input file is a .txt file and collects each
7 | non-empty line as a book in the list 'Lines',
8 | which is then returned to booklist_main.
9 | """
10 |
11 | if filename.endswith(".txt"):
12 | try:
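13 | with open(filename, "r", encoding="utf-8") as file1:
14 | Lines = file1.readlines()
15 | # Build a new list rather than removing items while
16 | # iterating, which can skip consecutive blank lines.
17 | Lines = [i for i in Lines if i != "\n"]
18 | return Lines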
19 | except FileNotFoundError:
20 | print("Error:No such file or directory:", filename)
21 | else:
22 | print("\nError:Not correct file type. Please insert a '.txt' file")
23 |
24 |
25 | def booklist_main(file, destination, forced, extension):
26 | """executes with the command --list"""
27 | Lines = file_list(file)
28 | if Lines is not None:
29 | print("List imported successfully!")
30 | url = mirror_checker()
31 | if url is not None:
32 | temp = 1
33 | many = len(Lines)
34 | for a in Lines:
35 | if a != "":
36 | print(
37 | f"~[{temp}/{many}] Searching for:",
38 | a,
39 | )
40 | temp = temp + 1
41 | libgen_book_find(a, "", "", destination, extension, forced, url)
42 |
--------------------------------------------------------------------------------
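`file_list` reads one book per line from a .txt file and discards blank lines. The sketch below shows the same idea with a context manager and a list comprehension, under the assumption that trailing newlines should also be stripped from each entry. `read_book_list` is a hypothetical name, and the temporary file exists only to make the example runnable.

```python
import os
import tempfile


def read_book_list(filename):
    """Return the non-blank lines of a .txt file, without trailing newlines."""
    if not filename.endswith(".txt"):
        raise ValueError("expected a .txt file")
    with open(filename, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a throwaway file that contains blank lines.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("White Fang\n\n\nHomer Odyssey\n")

books = read_book_list(tmp.name)
os.remove(tmp.name)
print(books)
```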
/bookcut/downloader.py:
--------------------------------------------------------------------------------
1 | import requests
2 | from tqdm import tqdm
3 | import os
4 | from bs4 import BeautifulSoup as Soup
5 |
6 |
7 | def downloading(link, name, author, file, destination_folder, type):
8 | """finds the first available book and sends the link to file_downloader"""
9 | page = requests.get(link)
10 | soup = Soup(page.content, "html.parser")
11 |
12 | searcher = [a["href"] for a in soup.find_all(href=True) if a.text]
13 | searcher_link = searcher[0]
14 | if searcher_link.startswith("http") is False:
15 | until_dot = link.split("//")
16 | searcher_link = until_dot[0] + "//" + until_dot[1] + searcher_link
17 | file_downloader(searcher_link, name, author, file, destination_folder, type)
18 |
19 |
20 | def file_downloader(href, name, author, file, destination_folder, type):
21 | """Downloads the book file to users folder"""
22 | response = requests.get(href, stream=True)
23 | total_size = int(response.headers.get("content-length", 0))
24 | inMb = total_size / 1000000
25 | inMb = round(inMb, 2)
26 | print("\nDownloading...", "\nTotal file size:", inMb, "MB")
27 |
28 | # Folder to download books
29 | filename = file
30 | if filename != "":
31 | pass
32 | else:
33 | filename = name + " - " + author + "." + type
34 | path = destination_folder
35 |
36 | filename = os.path.join(path, filename)
37 |
38 | try:
39 | with open(filename, "wb") as f:
40 | """For progress bar"""
41 | with tqdm(total=total_size, unit="iB", unit_scale=True) as pbar:
42 | for ch in response.iter_content(chunk_size=1024):
43 | if ch:
44 | f.write(ch)
45 | pbar.update(len(ch))
46 |
47 | print("================================\nFile saved as:", filename)
48 | except FileNotFoundError:
49 | print("ERROR! Does the destination folder exist? ")
50 |
51 |
52 | def pathfinder():
53 | path = os.path.expanduser("~/Documents/BookCut")
54 | if os.path.isdir(path):
55 | pass
56 | else:
57 | os.makedirs(path)
58 | return path
59 |
60 |
61 | def filename_refubrished(filename):
62 | # for valid filenames without special characters
63 | special_char = [":", "/", '"', "?", "*", "<", ">", "|"]
64 | for i in special_char:
65 | filename = filename.replace(i, " ")
66 | return filename
67 |
--------------------------------------------------------------------------------
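`filename_refubrished` replaces characters that are invalid in file names with spaces, one `str.replace` call per character. A regular-expression variant of the same idea is sketched below. `safe_filename` is a hypothetical helper; including the backslash in the character set and collapsing repeated spaces are assumptions beyond what the module does.

```python
import re

# Characters replaced by the module's list, plus the backslash (an assumption).
INVALID_CHARS = re.compile(r'[:/\\"?*<>|]')


def safe_filename(title, extension="pdf"):
    """Replace invalid characters with spaces and collapse repeated spaces."""
    cleaned = INVALID_CHARS.sub(" ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return f"{cleaned}.{extension}"


print(safe_filename('Who: "A/B"?'))
```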
/bookcut/libgen.py:
--------------------------------------------------------------------------------
1 | from bs4 import BeautifulSoup as soupa
2 | import requests
3 | from bookcut.downloader import file_downloader
4 | from click import confirm
5 | from bookcut.search import RESULT_ERROR
6 |
7 |
8 | def epub_finder(soup):
9 | table = soup.find("table", attrs={"class": "c"})
10 | tb = table.find_all("tr")
11 | data = []
12 | epub = "epub"
13 | for row in tb:
14 | col = row.find_all("td")
15 | col = [ele.text.strip() for ele in col]
16 | xxx = [ele for ele in col if ele]
17 |
18 | false_results = ["[1]", "[2]", "[3]", "[4]", "[5]"]
19 | if false_results == xxx:
20 | pass
21 | else:
22 | data.append(xxx)
23 | del data[0]
24 | count = 0
25 | for a in data:
26 | if epub in a:
27 | break
28 | else:
29 | count = count + 1
30 | return count
31 |
32 |
33 | def file_name(url):
34 | print("URL: ", url)
35 | page = requests.get(url)
36 | try:
37 | soup = soupa(page.content, "html.parser")
38 | r = soup.find("input")["value"]
39 | r = r.replace("\n", "")
40 | return r
41 | except TypeError:
42 | return None
43 |
44 |
45 | def md5_search(md5, url, destination):
46 | try:
47 | # used by the book command: searches LibGen for a specific book by its md5 value
48 | mirror_url = url + "/ads.php?md5=" + md5
49 | req = requests.get(mirror_url)
50 | soup = soupa(req.content, "html.parser")
51 | html = soup.find("input", attrs={"id": "textarea-example"})
52 | filename = html["value"]
53 | url_soup = soup.findAll("table", attrs={"id": "main"})
54 |
55 | urls = []
56 | for j in url_soup:
57 | a = j.findAll("a", href=True)
58 | for i in a:
59 | urls.append(i["href"])
60 | download_url = url + urls[0]
61 | question = confirm(f"Do you want to download:\n{filename}")
62 | if question is True:
63 | file_downloader(download_url, "", "", filename, destination, "")
64 | else:
65 | print("Aborted!")
66 | except TypeError:
67 | print(RESULT_ERROR)
68 |
--------------------------------------------------------------------------------
/bookcut/mirror_checker.py:
--------------------------------------------------------------------------------
1 | import requests
2 | from requests import ConnectionError
3 | import configparser
4 | import os
5 |
6 |
7 | CONNECTION_ERROR_MESSAGE = (
8 | "\nUnable to connect to: {} "
9 | "\nPlease check your internet connection and try again later."
10 | )
11 |
12 |
13 | def settingParser(section, value):
14 | "Parsing data from Settings.ini"
15 | config = configparser.ConfigParser()
16 | module_path = os.path.dirname(os.path.realpath(__file__))
17 | settings_ini = os.path.join(module_path, "Settings.ini")
18 | config.read(settings_ini)
19 | mirrors = config.get(section, value)
20 | mirrors = mirrors.split(",")
21 | return mirrors
22 |
23 |
24 | def main(verbose=True):
25 |     """Check which LibGen mirror is available"""
26 |
27 |     mirrors = settingParser("LibGen", "mirrors")
28 |     for url in mirrors:
29 |         try:
30 |             r = requests.head(url)
31 |         except requests.RequestException:
32 |             # mirror is unreachable, try the next one
33 |             continue
34 |         # 200 (OK) and 301 (redirect) both count as an available mirror
35 |         if r.status_code == 200 or r.status_code == 301:
36 |             if verbose is True:
37 |                 print("Connected to:", url)
38 |             return url
39 |     # no mirror answered with 200 or 301
40 |     print("No mirrors available or no Internet Connection!")
41 |     return None
42 |
43 |
44 | def pageStatus(url, verbose=True):
45 | try:
46 | request = requests.head(url)
47 | if request.status_code == 200 or request.status_code == 301:
48 | if verbose is True:
49 | print("Connected to:", url)
50 | return True
51 | except ConnectionError:
52 | pass
53 | print(CONNECTION_ERROR_MESSAGE.format(url))
54 | return False
55 |
56 |
57 | if __name__ == "__main__":
58 | main()
59 |
--------------------------------------------------------------------------------
/bookcut/organise.py:
--------------------------------------------------------------------------------
1 | from bookcut.mirror_checker import pageStatus
2 | import os
3 | import shutil
4 | import requests
5 | import json
6 |
7 | OPEN_LIBRARY_URL = "http://www.openlibrary.org"
8 |
9 |
10 | def main_organiser(directory):
11 | status = pageStatus(OPEN_LIBRARY_URL)
12 | if status is not False:
13 | book_list = get_books(directory)
14 | # lists only the files in the given directory
15 | namepath = []
16 | with os.scandir(directory) as entries:
17 | for entry in entries:
18 | if entry.is_file():
19 | namepath.append(entry.name)
20 | for i in range(0, len(book_list)):
21 | print("File:", namepath[i])
22 | try:
23 | """splitting file name to author and book title for using as
24 | searching terms to OpenLibrary"""
25 | a = book_list[i].split("by")
26 | book = a[1]
27 | author = a[0]
28 | a = scraper(book, author)
29 | print("\n***", book, " ", author)
30 | a = a["genre"]
31 | filename = namepath[i]
32 | cutpaste(directory, a, filename)
33 | except IndexError:
34 | try:
35 | a = book_list[i].split("-")
36 | book = a[1]
37 | author = a[0]
38 | a = scraper(book, author)
39 | print("\n***", book, " ", author)
40 | a = a["genre"]
41 | filename = namepath[i]
42 | cutpaste(directory, a, filename)
43 | except IndexError:
44 | print("Unable to organise this file.\n")
45 | pass
46 |
47 |
48 | def get_books(dir):
49 | """filtering epub, pdf, txt, mobi, djvu files in the given directory
50 | and return a list with all filenames"""
51 | epub_list = []
52 | for file in os.listdir(dir):
53 | if file.endswith(".epub"):
54 | renamed = file.replace(".epub", "")
55 | renamed = renamed.replace("_", " ")
56 | epub_list.append(renamed)
57 | elif file.endswith(".pdf"):
58 | renamed = file.replace(".pdf", "")
59 | renamed = renamed.replace("_", " ")
60 | epub_list.append(renamed)
61 | elif file.endswith(".txt"):
62 | renamed = file.replace(".txt", "")
63 | renamed = renamed.replace("_", " ")
64 | epub_list.append(renamed)
65 | elif file.endswith(".mobi"):
66 | renamed = file.replace(".mobi", "")
67 | renamed = renamed.replace("_", " ")
68 | epub_list.append(renamed)
69 | elif file.endswith(".djvu"):
70 | renamed = file.replace(".djvu", "")
71 | renamed = renamed.replace("_", " ")
72 | epub_list.append(renamed)
73 | return epub_list
74 |
75 |
76 | def scraper(book, author):
77 | """parsing the book category from OpenLibrary"""
78 | try:
79 | book = book.replace(" ", "+")
80 | author = author.replace(" ", "+")
81 |
82 | search_url = "http://openlibrary.org/search.json?q=" + book + "+" + author
83 | jason = requests.get(search_url)
84 | jason = jason.text
85 | data = json.loads(jason)
86 | json_formatted_str = json.dumps(data, indent=2)
87 |
88 | book_values = {}
89 | isbn = None
90 | author_name = None
91 | title = None
92 | subject = None
93 | try:
94 | # TODO: to add feature to check all docs
95 |
96 | data = data["docs"][0]
97 | except IndexError:
98 | data = None
99 | if data is not None:
100 | try:
101 | isbn = data["isbn"][0]
102 | except KeyError:
103 | pass
104 | try:
105 | author_name = data["author_name"][0]
106 | except KeyError:
107 | pass
108 | try:
109 | title = data["title_suggest"]
110 | except KeyError:
111 | pass
112 | try:
113 | subject = data["subject"]
114 | except KeyError:
115 | pass
116 |
117 | book_values.update([("isbn", isbn), ("author", author_name), ("title", title)])
118 | if subject is not None:
119 | for a in subject:
120 | x = genre_finder(a)
121 | if x is not None:
122 | subject = x
123 | break
124 | else:
125 | subject = "Uncategorized"
126 | else:
127 | subject = "Uncategorized"
128 | book_values.update({"genre": subject})
129 | return book_values
130 | except requests.ConnectionError:
131 | url = "http://www.openlibrary.org"
132 | print(
133 | "Unable to connect to:",
134 | url,
135 | "\nPlease check your internet connection and try again later.",
136 | )
137 | return None
138 |
139 |
140 | def genre_finder(sub):
141 | genres = [
142 | "Classics",
143 | "Literary",
144 | "Fiction",
145 | "Historical Fiction",
146 | "Romance",
147 | "Horror",
148 | "Mystery",
149 | "Suspense",
150 | "Fantasy",
151 | "Action",
152 | "Adventure",
153 | "Science Fiction",
154 | "History",
155 | "Biography",
156 | "Autobiography",
157 | "Poetry",
158 | "Art",
159 | "Music",
160 | "Humor",
161 | "Religion",
162 | "Mythology",
163 | "Philosophy",
164 | "Health",
165 | "Science",
166 | "Social Science",
167 | "Psychology",
168 | "Self-help",
169 | "Nonfiction",
170 | ]
171 |
172 | if sub in genres:
173 | return sub
174 | else:
175 | return None
176 |
177 |
178 | def cutpaste(dir, genre, file):
179 | """Create the genre folder if it does not exist, then move the file into it"""
180 | path = os.path.join(dir, genre)
181 | if os.path.isdir(path):
182 | pass
183 | else:
184 | os.mkdir(path)
185 | print("Created folder:", genre)
186 | filepath = os.path.join(path, file)
187 |
188 | from_path = os.path.join(dir, file)
189 | dest_path = os.path.join(dir, genre, file)
190 | shutil.move(from_path, dest_path)
191 | print("File moved to: ", genre, "\n", "\n", "********************")
192 |
--------------------------------------------------------------------------------
/bookcut/repositories.py:
--------------------------------------------------------------------------------
1 | from bs4 import BeautifulSoup as soup
2 | from bookcut.mirror_checker import (
3 | pageStatus,
4 | main as mirror_checker,
5 | CONNECTION_ERROR_MESSAGE,
6 | )
7 | import mechanize
8 | import pandas as pd
9 | import requests
10 |
11 | ARCHIV_URL = "https://export.arxiv.org/find/grp_cs,grp_econ,grp_eess,grp_math,grp_physics,grp_q-bio,grp_q-fin,grp_stat"
12 | ARCHIV_BASE = "https://export.arxiv.org"
13 | OPEN_ACCESS_BUTTON = "https://api.openaccessbutton.org/find"
14 |
15 |
16 | def arxiv(term):
17 | # Searches arXiv.org and returns a DataFrame with the results found.
18 | status = pageStatus(ARCHIV_URL)
19 | if status:
20 | br = mechanize.Browser()
21 | br.set_handle_robots(False) # ignore robots
22 | br.set_handle_refresh(False) #
23 | br.addheaders = [("User-agent", "Firefox")]
24 |
25 | br.open(ARCHIV_URL)
26 | br.select_form(nr=0)
27 | input_form = term
28 | br.form["query"] = input_form
29 | ac = br.submit()
30 | html_from_page = ac
31 | html_soup = soup(html_from_page, "html.parser")
32 |
33 | t = html_soup.findAll("div", {"class": "list-title mathjax"})
34 | titles = []
35 | for i in t:
36 | raw = i.text
37 | raw = raw.replace("Title: ", "")
38 | raw = raw.replace("\n", "")
39 | titles.append(raw)
40 | authors = []
41 | auth_soup = html_soup.findAll("div", {"class": "list-authors"})
42 | for i in auth_soup:
43 | raw = i.text
44 | raw = raw.replace("Authors:", "")
45 | raw = raw.replace("\n", "")
46 | authors.append(raw)
47 | extensions = []
48 | urls = []
49 | ext = html_soup.findAll("span", {"class": "list-identifier"})
50 | for i in ext:
51 | a = i.findAll("a")
52 | link = a[1]["href"]
53 | extensions.append(str(a[1].text))
54 | urls.append(ARCHIV_BASE + link)
55 |
56 | arxiv_df = pd.DataFrame(
57 | {
58 | "Title": titles,
59 | "Author(s)": authors,
60 | "Url": urls,
61 | "Extension": extensions,
62 | }
63 | )
64 |
65 | return arxiv_df
66 | else:
67 | print(CONNECTION_ERROR_MESSAGE.format("ArXiv"))
68 | return None
69 |
70 |
71 | def libgen_repo(term):
72 | # Searches LibGen and returns a DataFrame with the results
73 | try:
74 | url = mirror_checker()
75 | if url is not None:
76 | br = mechanize.Browser()
77 | br.set_handle_robots(False) # ignore robots
78 | br.set_handle_refresh(False) #
79 | br.addheaders = [("User-agent", "Firefox")]
80 |
81 | br.open(url)
82 | br.select_form("libgen")
83 | input_form = term
84 | br.form["req"] = input_form
85 | ac = br.submit()
86 | html_from_page = ac
87 | html_soup = soup(html_from_page, "html.parser")
88 | table = html_soup.find_all("table")[2]
89 |
90 | table_data = []
91 | mirrors = []
92 | extensions = []
93 |
94 | for i in table:
95 | j = 0
96 | try:
97 | td = i.find_all("td")
98 | for tr in td:
99 | # scrape mirror links
100 | if j == 9:
101 | temp = tr.find("a", href=True)
102 | mirrors.append(temp["href"])
103 | j = j + 1
104 | row = [tr.text for tr in td]
105 | table_data.append(row)
106 | extensions.append(row[8])
107 | except:
108 | pass
109 |
110 | # Clean result page
111 | for j in table_data:
112 | j.pop(0)
113 | del j[8:15]
114 | headers = [
115 | "Author(s)",
116 | "Title",
117 | "Publisher",
118 | "Year",
119 | "Pages",
120 | "Language",
121 | "Size",
122 | "Extension",
123 | ]
124 |
125 | tabular = pd.DataFrame(table_data)
126 | tabular.columns = headers
127 | tabular["Url"] = mirrors
128 | return tabular
129 | except ValueError:
130 | # create emptyDataframe
131 | df = pd.DataFrame()
132 | return df
133 |
134 |
135 | def open_access_button(doi, title):
136 | status = pageStatus(OPEN_ACCESS_BUTTON)
137 | if status:
138 | if doi is not None:
139 | query = {"doi": doi}
140 | else:
141 | query = {"title": title}
142 | req = requests.get(OPEN_ACCESS_BUTTON, params=query)
143 | response = req.json()
144 | return response
145 | else:
146 | print(CONNECTION_ERROR_MESSAGE.format("Open Access Button"))
147 |
--------------------------------------------------------------------------------
/bookcut/search.py:
--------------------------------------------------------------------------------
1 | from bookcut.mirror_checker import main as mirror_checker
2 | from bookcut.downloader import filename_refubrished
3 | from bookcut.settings import path_checker
4 | from bs4 import BeautifulSoup as Soup
5 | import mechanize
6 | import pandas as pd
7 | import os
8 | import requests
9 | from tqdm import tqdm
10 |
11 | RESULT_ERROR = "\nNo results found or bad Internet connection.\nPlease try again!"
12 |
13 |
14 | def search_downloader(file, href):
15 |     # search_downloader downloads the book
16 |     response = requests.get(href, stream=True)
17 |     # content-length can be missing; fall back to 0 rather than int(None)
18 |     total_size = int(response.headers.get("content-length", 0))
19 |     inMb = round(total_size / 1000000, 2)
20 |     filename = file
21 |     print("\nDownloading...\n", "Total file size:", inMb, "MB")
22 |
23 |     path = path_checker()
24 |
25 |     filename = os.path.join(path, filename)
26 |     # progress bar, updated manually with the size of each chunk
27 |     buffer_size = 1024
28 |     progress = tqdm(
29 |         desc=f"{file}",
30 |         total=total_size or None,
31 |         unit="B",
32 |         unit_scale=True,
33 |         unit_divisor=1024,
34 |     )
35 |     with open(filename, "wb") as f:
36 |         for data in response.iter_content(buffer_size):
37 |             # write data read to the file
38 |             f.write(data)
39 |             # count the bytes just written
40 |             progress.update(len(data))
41 |     progress.close()
42 |     print("================================\nFile saved as:", filename)
43 |
44 |
45 | def link_finder(link, mirror_used):
46 | # link_finder searches LibGen for the download link and filename
47 | page = requests.get(link)
48 | soup = Soup(page.content, "html.parser")
49 | searcher = [a["href"] for a in soup.find_all(href=True) if a.text]
50 | try:
51 | filename = soup.find("input")["value"]
52 | except TypeError:
53 | filename = None
54 | if searcher[0].startswith("http") is False:
55 | searcher[0] = mirror_used + searcher[0]
56 | results = [filename, searcher[0]]
57 | return results
58 |
59 |
60 | def search(term):
61 | # This function is used when searching LibGen with the command
62 | # bookcut search -t "keyword"
63 |
64 | url = mirror_checker()
65 | if url is not None:
66 | br = mechanize.Browser()
67 | br.set_handle_robots(False) # ignore robots
68 | br.set_handle_refresh(False) #
69 | br.addheaders = [("User-agent", "Firefox")]
70 |
71 | br.open(url)
72 | br.select_form("libgen")
73 | input_form = term
74 | br.form["req"] = input_form
75 | ac = br.submit()
76 | html_from_page = ac
77 | soup = Soup(html_from_page, "html.parser")
78 | table = soup.find_all("table")[2]
79 |
80 | table_data = []
81 | mirrors = []
82 | extensions = []
83 |
84 | for i in table:
85 | j = 0
86 | try:
87 | td = i.find_all("td")
88 | for tr in td:
89 | # scrape mirror links
90 | if j == 9:
91 | temp = tr.find("a", href=True)
92 | mirrors.append(temp["href"])
93 | j = j + 1
94 | row = [tr.text for tr in td]
95 | table_data.append(row)
96 | extensions.append(row[8])
97 |
98 | except:
99 | pass
100 |
101 | # Clean result page
102 | for j in table_data:
103 | j.pop(0)
104 | del j[8:15]
105 | headers = [
106 | "Author(s)",
107 | "Title",
108 | "Publisher",
109 | "Year",
110 | "Pages",
111 | "Language",
112 | "Size",
113 | "Extension",
114 | ]
115 |
116 | try:
117 | tabular = pd.DataFrame(table_data)
118 | tabular.index += 1
119 | tabular.columns = headers
120 | print(tabular)
121 | choices = []
122 | temp = len(mirrors) + 1
123 | for i in range(1, temp):
124 | choices.append(str(i))
125 | choices.append("C")
126 | choices.append("c")
127 | while True:
128 | tell_me = str(
129 | input(
130 | "\n\nPlease enter a number from 1 to {number}"
131 | ' to download a book or press "C" to abort'
132 | " search: ".format(number=len(extensions))
133 | )
134 | )
135 | if tell_me in choices:
136 | if tell_me == "C" or tell_me == "c":
137 | print("Aborted!")
138 | return None
139 | else:
140 | c = int(tell_me) - 1
141 | results = [mirrors[c], extensions[c]]
142 | return results
143 | except ValueError:
144 | print("\nNo results found or bad Internet connection.")
145 | print("Please, try again.")
146 | return None
147 | else:
148 | print("\nNo results found or bad Internet connection.")
149 | print("Please, try again.")
150 |
151 |
152 | def single_search():
153 | def search(term):
154 | # This function is used when searching LibGen with the command
155 | # bookcut search -t "keyword"
156 |
157 | url = mirror_checker()
158 | if url is not None:
159 | br = mechanize.Browser()
160 | br.set_handle_robots(False) # ignore robots
161 | br.set_handle_refresh(False) #
162 | br.addheaders = [("User-agent", "Firefox")]
163 |
164 | br.open(url)
165 | br.select_form("libgen")
166 | input_form = term
167 | br.form["req"] = input_form
168 | ac = br.submit()
169 | html_from_page = ac
170 | soup = Soup(html_from_page, "html.parser")
171 | table = soup.find_all("table")[2]
172 |
173 | table_data = []
174 | mirrors = []
175 | extensions = []
176 |
177 | for i in table:
178 | j = 0
179 | try:
180 | td = i.find_all("td")
181 | for tr in td:
182 | # scrape mirror links
183 | if j == 9:
184 | temp = tr.find("a", href=True)
185 | mirrors.append(temp["href"])
186 | j = j + 1
187 | row = [tr.text for tr in td]
188 | table_data.append(row)
189 | extensions.append(row[8])
190 |
191 | except:
192 | pass
193 |
194 | # Clean result page
195 | for j in table_data:
196 | j.pop(0)
197 | del j[8:15]
198 | headers = [
199 | "Author(s)",
200 | "Title",
201 | "Publisher",
202 | "Year",
203 | "Pages",
204 | "Language",
205 | "Size",
206 | "Extension",
207 | ]
208 |
209 | try:
210 | tabular = pd.DataFrame(table_data)
211 | tabular.index += 1
212 | tabular.columns = headers
213 | print(tabular)
214 | choices = []
215 | temp = len(mirrors) + 1
216 | for i in range(1, temp):
217 | choices.append(str(i))
218 | choices.append("C")
219 | choices.append("c")
220 | while True:
221 | tell_me = str(
222 | input(
223 | "\n\nPlease enter a number from 1 to {number}"
224 | ' to download a book or press "C" to abort'
225 | " search: ".format(number=len(extensions))
226 | )
227 | )
228 | if tell_me in choices:
229 | if tell_me == "C" or tell_me == "c":
230 | print("Aborted!")
231 | return None
232 | else:
233 | c = int(tell_me) - 1
234 | print(mirrors[c], " ", extensions[c])
235 | results = [mirrors[c], extensions[c]]
236 | return results
237 | except ValueError:
238 | print("\nNo results found or bad Internet connection.")
239 | print("Please, try again.")
240 | return None
241 | else:
242 | print("\nNo results found or bad Internet connection.")
243 | print("Please, try again.")
244 |
245 |
246 | def choose_a_book(dataframe):
247 | # asks the user which book to download from the printed DataFrame
248 | if dataframe.empty is False:
249 | dataframe.index += 1
250 | print(dataframe[["Author(s)", "Title", "Size", "Extension"]])
251 |
252 | urls = dataframe["Url"].to_list()
253 | titles = dataframe["Title"].to_list()
254 | extensions = dataframe["Extension"].to_list()
255 | choices = []
256 | temp = len(urls) + 1
257 | for i in range(1, temp):
258 | choices.append(str(i))
259 | choices.append("C")
260 | choices.append("c")
261 | try:
262 | while True:
263 | tell_me = str(
264 | input(
265 | "\n\nPlease enter a number from 1 to {number}"
266 | ' to download a book or press "C" to abort'
267 | " search: ".format(number=len(urls))
268 | )
269 | )
270 | if tell_me in choices:
271 | if tell_me == "C" or tell_me == "c":
272 | print("Aborted!")
273 | return None
274 | else:
275 | c = int(tell_me) - 1
276 | filename = titles[c] + "." + extensions[c]
277 | filename = filename_refubrished(filename)
278 | if urls[c].startswith("https://export.arxiv.org/"):
279 | search_downloader(filename, urls[c])
280 | return False
281 | else:
282 | mirror_used = mirror_checker(False)
283 | link = mirror_used + urls[c]
284 | details = link_finder(link, mirror_used)
285 | file_link = details[1]
286 | search_downloader(filename, file_link)
287 | return False
288 | except ValueError:
289 | print(RESULT_ERROR)
290 | print("Please, try again.")
291 | return None
292 | else:
293 | print(RESULT_ERROR)
294 |
--------------------------------------------------------------------------------
/bookcut/settings.py:
--------------------------------------------------------------------------------
1 | import configparser
2 | import os
3 | from bookcut.downloader import pathfinder
4 |
5 |
6 | def initial_config():
7 | """function to create settings .ini file, used also for restore settings"""
8 | try:
9 | write_config = configparser.ConfigParser()
10 | module_path = os.path.dirname(os.path.realpath(__file__))
11 | settings_ini = os.path.join(module_path, "Settings.ini")
12 |
13 | write_config.add_section("LibGen")
14 | write_config.add_section("Settings")
15 | mirrors = "https://libgen.lc/,http://libgen.li/,http://185.39.10.101/,http://genesis.lib/"
16 | write_config.set("LibGen", "mirrors", mirrors)
17 | write_config.set("Settings", "clean_screen", "True")
18 | write_config.set("Settings", "destination", "None")
19 |
20 | cfgfile = open(settings_ini, "w")
21 | write_config.write(cfgfile)
22 | cfgfile.close()
23 | except PermissionError as error:
24 | print("\n", error)
25 | print("You have to be administrator to change BookCut settings. ")
26 |
27 |
28 | def mirrors_append(url):
29 | """function to append the LibGen mirrors list"""
30 |
31 | try:
32 |
33 | # READ EXISTING LIST
34 | config = configparser.ConfigParser()
35 | module_path = os.path.dirname(os.path.realpath(__file__))
36 | settings_ini = os.path.join(module_path, "Settings.ini")
37 |
38 | config.read(settings_ini)
39 | mirrors = config.get("LibGen", "mirrors")
40 | mirrors = mirrors + "," + url
41 |
42 | # APPEND LIST
43 | mirrors = str(mirrors)
44 | config.set("LibGen", "mirrors", mirrors)
45 |
46 | # WRITE TO INI FILE
47 | cfgfile = open(settings_ini, "w")
48 | config.write(cfgfile)
49 | cfgfile.close()
50 |
51 | # Success message
52 | print("\nSuccessfully added to the list:")
53 | mirrors = mirrors.split(",")
54 | for i in mirrors:
55 | print(i)
56 | except PermissionError as error:
57 | print("\n", error)
58 | print("You have to be administrator to change BookCut settings. ")
59 |
60 |
61 | def read_settings():
62 | # reads the settings from the config file and returns them as a list
63 |
64 | # get ini file path
65 | config = configparser.ConfigParser()
66 | module_path = os.path.dirname(os.path.realpath(__file__))
67 | settings_ini = os.path.join(module_path, "Settings.ini")
68 |
69 | # get values
70 | config.read(settings_ini)
71 | clean_screen = config.get("Settings", "clean_screen")
72 | destination = config.get("Settings", "destination")
73 | settings = [clean_screen, destination]
74 | return settings
75 |
76 |
77 | def print_settings():
78 | """Prints settings"""
79 | settings = read_settings()
80 |
81 | print("\nBookCut Settings:\n")
82 | print("1.Clean Screen Option Enabled: ", settings[0])
83 | print("2.Destination Folder Path: ", settings[1])
84 |
85 |
86 | def screen_setting(input):
87 | """Adjust the clean screen setting"""
88 | try:
89 | config = configparser.ConfigParser()
90 | module_path = os.path.dirname(os.path.realpath(__file__))
91 | settings_ini = os.path.join(module_path, "Settings.ini")
92 |
93 | config.read(settings_ini)
94 | config.set("Settings", "clean_screen", input)
95 | cfgfile = open(settings_ini, "w")
96 | config.write(cfgfile)
97 | cfgfile.close()
98 | except PermissionError as error:
99 | print("\n", error)
100 | print("You have to be administrator to change BookCut settings. ")
101 |
102 |
103 | def set_destination(path):
104 | try:
105 | if os.path.isdir(path):
106 | module_path = os.path.dirname(os.path.realpath(__file__))
107 | settings_ini = os.path.join(module_path, "Settings.ini")
108 |
109 | config = configparser.ConfigParser()
110 | config.read(settings_ini)
111 | config.set("Settings", "destination", path)
112 | cfgfile = open(settings_ini, "w")
113 | config.write(cfgfile)
114 | cfgfile.close()
115 | print("Destination path changed!\n", path)
116 | else:
117 | try:
118 | os.makedirs(path)
119 | print("Created folder: ", path)
120 | except FileNotFoundError as error:
121 | print("\n", error)
122 | print("(!) Not a valid path, please try again!")
123 |
124 | except PermissionError as error:
125 | print("\n", error)
126 | print("(!) You have to be administrator to change BookCut settings!")
127 |
128 |
129 | def path_checker():
130 | settings = read_settings()
131 | if settings[1] != "None":
132 | return settings[1]
133 | else:
134 | path = pathfinder()
135 | return path
136 |
137 |
138 | if __name__ == "__main__":
139 | initial_config()
140 |
--------------------------------------------------------------------------------
/conftest.py:
--------------------------------------------------------------------------------
1 | def pytest_addoption(parser):
2 | parser.addoption(
3 | "--web",
4 | action="store_true",
5 | dest="web",
6 | default=False,
7 | help="enable tests requiring an internet connection",
8 | )
9 |
10 |
11 | def pytest_configure(config):
12 | if not config.option.web:
13 | setattr(config.option, "markexpr", "not web")
14 |
--------------------------------------------------------------------------------
/pytest.ini:
--------------------------------------------------------------------------------
1 | [pytest]
2 | minversion = 6.0
3 | testpaths =
4 | tests
5 | markers =
6 | web: mark tests which require an internet connection
7 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | import setuptools
2 | import sys
3 | import pathlib
4 |
5 | if sys.version_info.major < 3:
6 | print("\nPython 2 is not supported! \nPlease upgrade to Python 3.\n")
7 | print(
8 | "Installation of BookCut stopped, please try again with\n"
9 | "a newer version of Python!"
10 | )
11 | sys.exit(1)
12 |
13 | # The directory containing this file
14 | HERE = pathlib.Path(__file__).parent
15 |
16 | # The text of the README file
17 | README = (HERE / "README.md").read_text()
18 |
19 | setuptools.setup(
20 | name="BookCut",
21 | python_requires=">3.5.2",
22 | version="1.3.7",
23 | author="Costis94",
24 | author_email="gravitymusician@gmail.com",
25 | description="Command Line Interface app to download ebooks",
26 | long_description_content_type="text/markdown",
27 | long_description=README,
28 | url="https://github.com/costis94/bookcut",
29 | packages=setuptools.find_packages(),
30 | classifiers=[
31 | "Programming Language :: Python :: 3",
32 | "License :: OSI Approved :: MIT License",
33 | "Operating System :: OS Independent",
34 | ],
35 | install_requires=[
36 | "pandas",
37 | "click>=7.1.2",
38 | "requests",
39 | "beautifulsoup4",
40 | "pyfiglet",
41 | "tqdm",
42 | "mechanize",
43 | ],
44 | extras_require={
45 | "dev": [
46 | "pytest",
47 | "pytest-cov",
48 | "pre-commit",
49 | "black",
50 | ]
51 | },
52 | include_package_data=True,
53 | entry_points="""
54 | [console_scripts]
55 | bookcut=bookcut.bookcut:entry
56 | """,
57 | )
58 |
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/costis94/bookcut/88a06bf6e7962f6b013b9f45d23886e255d7a9f2/tests/__init__.py
--------------------------------------------------------------------------------
/tests/test_book.py:
--------------------------------------------------------------------------------
1 | import pytest
2 | from bookcut.mirror_checker import main as mirror_checker
3 | from bookcut.book import Booksearch
4 |
5 |
6 | @pytest.mark.web
7 | def test_single_book_download():
8 | title = "Iliad"
9 | author = "Homer"
10 | publisher = " "
11 | type_format = " "
12 | book = Booksearch(title, author, publisher, type_format, mirror_checker())
13 | result = book.search()
14 | extensions = result["extensions"]
15 | print("extensions: ", extensions)
16 | tb = result["table_data"]
17 | mirrors = result["mirrors"]
18 | assert mirrors[0].startswith("http"), "Not correct format of Mirror URL."
19 | assert type(extensions) is list, "Wrong format of extension details."
20 | file_details = book.give_result(extensions, tb, mirrors, extensions[0])
21 |
--------------------------------------------------------------------------------
/tests/test_bookcut.py:
--------------------------------------------------------------------------------
1 | import pytest
2 |
3 | from click.testing import CliRunner
4 | from bookcut import __version__
5 | from bookcut.bookcut import entry
6 |
7 |
8 | def test_entry_with_version_option():
9 | cli_output = CliRunner().invoke(entry, ["--version"])
10 | assert cli_output.exit_code == 0
11 | assert cli_output.output == f"commands, version {__version__}\n"
12 |
--------------------------------------------------------------------------------
/tests/test_main.py:
--------------------------------------------------------------------------------
1 | import unittest
2 |
--------------------------------------------------------------------------------
/tests/test_mirror_checker.py:
--------------------------------------------------------------------------------
1 | import pytest
2 | from bookcut.mirror_checker import pageStatus, main as mirror_checker
3 | from bookcut.mirror_checker import requests, CONNECTION_ERROR_MESSAGE
4 | from requests import ConnectionError
5 |
6 | TEST_URL = "http://www.sometesturl.com"
7 |
8 |
9 | @pytest.mark.web
10 | def test_mirror_availability():
11 | available_mirror = mirror_checker()
12 | assert type(available_mirror) is str, "Not correct type of LibGen Url"
13 | assert available_mirror.startswith("http"), "Not correct LibGen Url."
14 |
15 |
16 | @pytest.mark.parametrize("status_code", [200, 301])
17 | def test_openLibraryStatus_output_if_it_can_connect(monkeypatch, capsys, status_code):
18 | def mock_requests_head(_):
19 | return type("_", (), {"status_code": status_code})
20 |
21 | monkeypatch.setattr(requests, "head", mock_requests_head)
22 | assert pageStatus(TEST_URL)
23 | captured = capsys.readouterr()
24 | assert captured.out == f"Connected to: {TEST_URL}\n"
25 |
26 |
27 | def test_openLibraryStatus_output_for_wrong_status_code(monkeypatch, capsys):
28 | def mock_requests_head(_):
29 | return type("_", (), {"status_code": 42})
30 |
31 | monkeypatch.setattr(requests, "head", mock_requests_head)
32 | assert not pageStatus(TEST_URL)
33 | captured = capsys.readouterr()
34 | assert captured.out == CONNECTION_ERROR_MESSAGE.format(TEST_URL) + "\n"
35 |
36 |
37 | def test_openLibraryStatus_output_on_connection_error(monkeypatch, capsys):
38 | def mock_requests_head(_):
39 | raise ConnectionError
40 |
41 | monkeypatch.setattr(requests, "head", mock_requests_head)
42 | assert not pageStatus(TEST_URL)
43 | captured = capsys.readouterr()
44 | assert captured.out == CONNECTION_ERROR_MESSAGE.format(TEST_URL) + "\n"
45 |
46 |
47 | @pytest.mark.web
48 | def test_open_libraryStatus():
49 | status = pageStatus(url="http://www.openlibrary.org")
50 | assert status is not False, "OpenLibrary status != 200"
51 |
52 |
53 | @pytest.mark.web
54 | def test_archiv_Status():
55 | status = pageStatus(url="http://export.arxiv.org/")
56 | assert status is not False, "arXiv status != 200"
57 |
--------------------------------------------------------------------------------