├── LICENSE.txt
├── README.md
└── scopy.py


/LICENSE.txt:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 narimiran

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Scopy

Python script for searching through your digital books and cataloguing them in an easy-to-share list of files.

&nbsp;

## How To Use

### Basic usage:

```bash
$ python scopy.py
```
Searches current folder and all its subfolders for .epub, .mobi, and .pdf files.
Outputs the results to the console:

```
Filename:                 Ext:     Size:    Relative path:
Epub In First Sub         .epub    515  B   /first_subdirectory
Epub In Second Sub        .epub      3 KB   /second_subdirectory
Mobi In First Sub         .mobi      2 KB   /first_subdirectory
Pdf In Dir                .pdf      63 KB
Pdf In Second Sub         .pdf       1 KB   /second_subdirectory
```

To save the results in an easy-to-share file, provide the `--outfile` (`-o`) argument:

```bash
$ python scopy.py -o my_books.txt
```
---

The list of all options can be seen by calling `help` with:
```bash
$ python scopy.py -h
```

```
usage: scopy.py [-h] [-d DIR] [-e [EXT [EXT ...]]] [-c] [-f [F [F ...]]]
                [-m N] [-i [DIR [DIR ...]]] [-r] [-s [S [S ...]]] [-z] [-v]
                [-o FILE]

Catalogue your digital books (and more)

optional arguments:
  -h, --help            show this help message and exit
  -d DIR, --directory DIR
                        Path to the directory you want to scan. Default:
                        current directory
  -e [EXT [EXT ...]], --ext [EXT [EXT ...]]
                        Choose wanted file extensions. Default: ['pdf',
                        'epub', 'mobi']
  -c, --current         Scan just the current directory, without subfolders.
                        Default: False
  -f [F [F ...]], --filter [F [F ...]]
                        Filter results to include only the filenames
                        containing these words. Default: None
  -m N, --minsize N     Include only the files larger than the provided size
                        (in bytes). Can use suffixes `k`, `m`, and `g` for
                        kilo-, mega-, and giga-bytes. For example: 64k.
                        Default: 0
  -i [DIR [DIR ...]], --ignore [DIR [DIR ...]]
                        Ignores subdirectories containing these words.
                        Default: None
  -r, --raw             Keep the original filenames, don't change to Title
                        Case, and don't replace symbols such as -, _, +, etc.
                        Default: False
  -s [S [S ...]], --sort [S [S ...]]
                        Sort files by: [n]ame, [e]xtension, [s]ize,
                        [d]irectory, or their combination. Default: by Name
  -z, --descending      Sort file descending: from Z to A, from larger to
                        smaller. Default: False
  -v, --verbose         Output summary statistics at the top. Default: False
  -o FILE, --outfile FILE
                        Choose an output file to save the results. Default:
                        None, prints to console
```


### More examples

```bash
$ python scopy.py -e pdf -i first
```
Searches current folder and all its subfolders for files with `.pdf` extension (`-e`), ignoring subdirectories (`-i`) containing the word `first`:

```
Filename:                 Ext:     Size:    Relative path:
Pdf In Dir                .pdf      63 KB
Pdf In Second Sub         .pdf       1 KB   /second_subdirectory
```

---

```bash
$ python scopy.py -f sub -s d e n -r -v
```
Filter (`-f`) the results to only the filenames including the word `sub`, sort by (`-s`) directory (`d`), then extension (`e`), then name (`n`). Keep raw (`-r`) filenames (without Title Case and without replacing symbols). Make verbose (`-v`) output:

```
Scanned directory:                  [absolute path]/scopy/scopy_example
Looking for files containing:       sub
With extensions:                    .epub, .pdf, .mobi
Found:                              4 files


Filename:                 Ext:     Size:    Relative path:
epub_in_first_sub         .epub    515  B   /first_subdirectory
mobi_in_first_sub         .mobi      2 KB   /first_subdirectory
epub_in_second_sub        .epub      3 KB   /second_subdirectory
pdf_in_second_sub         .pdf       1 KB   /second_subdirectory
```

---

```bash
$ python scopy.py -d D:/Documents/Books -c -o book_list.txt
```

Scan `D:/Documents/Books` folder (both absolute and relative paths can be used), without subfolders (`-c`), and save the results in the output file (`-o`) called `book_list.txt`.

&nbsp;

## Installation

### Requirements

Python 3.4+

No other dependencies.
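
Since the only requirement is the interpreter version, a wrapper script could check it before doing anything else. The snippet below is a minimal sketch of such a guard, not something Scopy ships: `MIN_VERSION` and `meets_requirement` are hypothetical names introduced here for illustration.

```python
import sys

MIN_VERSION = (3, 4)  # assumption: mirrors the "Python 3.4+" requirement


def meets_requirement(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= minimum


if __name__ == '__main__':
    if not meets_requirement():
        sys.exit('Python {}.{} or newer is required.'.format(*MIN_VERSION))
    print('Python version is fine.')
```

The guard uses `str.format` rather than f-strings so that it still parses on the old interpreters it is meant to detect.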

### Install

Clone this repo:
```bash
git clone https://github.com/narimiran/scopy.git
```
or just manually download the file [`scopy.py`](scopy.py).

&nbsp;

## FAQ

> Why the name Scopy?

From the Greek verb σκοπέω (skopéō), meaning "I search". The suffix `py` is, of course, because of Python.

> Can't I just use X or Y, to get the same (or better) result?

You probably can. Scopy was written as a fun weekend project to practice my Python skills. It wasn't meant to be groundbreaking.

> Is there really a limit to search only for digital books? Can't I just search for any extension?

Scopy was started because I wanted to catalogue my .pdf collection, but as you figured out, it can be used to search any format you like.

&nbsp;

## License

MIT License.
See the details at [LICENSE](/LICENSE.txt).

--------------------------------------------------------------------------------
/scopy.py:
--------------------------------------------------------------------------------
import argparse
import os
import re


class Scopy:
    """Object which holds parsed arguments, and depending on them searches for files.

    Public methods:
        run: runs all other public methods
        get_results: searches for files satisfying the provided criteria
        sort_results: sorts the results based on the `sort_by` argument
        format_results: formats content into columns, trims too long columns
        output_results: outputs formatted content
    """
    def __init__(self, args):
        self.directory = args.directory.replace('\\', '/')
        self.ext = self._set_extensions(args.ext)
        self.no_subs = args.no_subs
        self.filters = args.filters
        self.ignore = args.ignore
        self.minsize = self._set_minsize(args.minsize)
        self.raw = args.raw
        self.sort_by = args.sort_by
        self.descending = args.descending
        self.verbose = args.verbose
        self.outfile = self._set_outfile(args.outfile)

    def run(self):
        """The main method, used to run all other methods.

        Checks if the provided directory is valid and gets the results, based on
        the given criteria.
        Sorts the results based on `sort_by` attribute.
        Formats the sorted results in the columns.
        Outputs the formatted content either to console or to provided outfile.

        See other public methods for more information.
        """
        results = self.get_results()
        if results:
            sorted_results = self.sort_results(results)
            formatted_content = self.format_results(sorted_results)
            self.output_results(formatted_content)

    def get_results(self):
        """Gets the results for the provided arguments.

        Checks if the provided directory is valid.
        If it is not, prints the warning to the console and returns None.

        Otherwise:
        Depending on `no_subs` attribute, either gets the results only from
        the current directory or the current directory and all its subdirectories.
        Returns the results satisfying the provided criteria (extensions, filenames),
        as the list of tuples (filename, extension, size, path).
        """
        if os.path.isdir(self.directory):
            return self._search_directory(self.directory) if self.no_subs else self._search_all()
        else:
            print('{} is not a valid directory!'.format(self.directory))

    def sort_results(self, results):
        """Sorts the results based on the `sort_by` attribute.

        Args:
            results: list of 4-tuples (filename, extension, size, path)
                provided by `get_results` method

        Columns can be sorted by name, extension, size, directory, or any
        combination of those. For example: first sort by directory, then by name.
        Returns the sorted list of tuples.
        """
        # Maps each `--sort` letter to its index in the result tuple.
        translate = {
            'n': 0,
            'e': 1,
            's': 2,
            'd': 3,
        }
        sort_order = lambda x: [x[translate[column]] for column in self.sort_by]
        return sorted(results, key=sort_order, reverse=self.descending)

    def format_results(self, results):
        """Formats the results in four columns. Trims too long columns.

        Args:
            results: list of 4-tuples (filename, extension, size, path)
                provided by `sort_results` (or `get_results`) method

        Too long filenames are trimmed to fit in the `MAX_WIDTH`.
        Subdirectories in too long paths are stripped until path can fit in
        the `MAX_WIDTH`.
        The value for `MAX_WIDTH` is chosen in such way that the total width of
        the output is less than 120 characters, so two outputs can be shown
        side by side on a typical computer screen.

        Returns a string containing all the results.
        """
        def _trimmed(filename, path):
            relative_path = path[HOME_LENGTH:].replace('\\', '/')
            if not self.raw:
                filename = self._replace_symbols(filename)

            if len(filename) > MAX_WIDTH:
                filename = filename[:MAX_WIDTH-3] + '...'
            if len(relative_path) > MAX_WIDTH:
                # Drop leading subdirectories until the path (plus '...') fits.
                while len(relative_path) > MAX_WIDTH-3:
                    relative_path = relative_path[relative_path.find('/', 1):]
                relative_path = '...' + relative_path
            return filename, relative_path

        MAX_WIDTH = 50
        COLUMNS = '{0:<{WIDTH}} {2:<8} {3:<8} {1}'
        HOME_LENGTH = len(self.directory)

        column_names = COLUMNS.format(
            'Filename:', 'Relative path:', 'Ext:', 'Size:',
            WIDTH=MAX_WIDTH
        )
        # The arguments are joined into one tuple before unpacking, because
        # positional arguments after `*args` are a SyntaxError before Python 3.5.
        results_string = '\n'.join(
            COLUMNS.format(
                *(_trimmed(filename, path) + (ext, self._convert_bytes(size))),
                WIDTH=MAX_WIDTH)
            for filename, ext, size, path in results)

        if self.verbose:
            header = self._get_header(results)
            return '\n\n\n'.join([header, '\n'.join([column_names, results_string])])
        else:
            return '\n'.join(['', column_names, results_string])

    def output_results(self, contents):
        """Outputs given content, either to a file or to the console.

        Args:
            contents: string with all the results
                provided by `format_results` method

        If `outfile` attribute is provided, writes to a file.
        Otherwise, prints to the console.
        """
        if self.outfile:
            self._write_to_file(contents)
        else:
            print(contents)


    def _search_directory(self, directory):
        results = []
        for file in os.listdir(directory):
            filepath = os.path.join(directory, file)
            if os.path.isfile(filepath):
                filename, ext = self._split(file)
                filesize = os.path.getsize(filepath)
                if self._satisfies_filters(filename, ext, filesize):
                    results.append((filename, ext, filesize, directory))
        return results

    def _search_all(self):
        def _ignore_directories(dirs):
            # Prune in place (slice assignment) so os.walk skips ignored
            # subdirectories. Removing items while iterating over the same
            # list would skip the entry following each removed one.
            dirs[:] = [d for d in dirs
                       if not any(word.lower() in d.lower() for word in self.ignore)]

        results = []
        for path, dirs, _ in os.walk(self.directory):
            if self.ignore:
                _ignore_directories(dirs)
            results.extend(self._search_directory(path))
        return results

    def _satisfies_filters(self, filename, ext, filesize):
        is_valid_file = any(filt.lower() in filename.lower()
                            for filt in self.filters) if self.filters else True
        is_valid_ext = ext in self.ext
        is_valid_size = filesize >= self.minsize
        return is_valid_file and is_valid_ext and is_valid_size

    def _get_header(self, results):
        header = [('', '')]
        header.append(('Scanned directory:', os.path.abspath(self.directory).replace('\\', '/')))
        if self.ignore:
            header.append(('Ignoring subdirectories containing:', ', '.join(self.ignore)))
        if self.filters:
            header.append(('Looking for files containing:', ', '.join(self.filters)))
        header.append(('With extensions:', ', '.join(self.ext)))
        header.append(('Found:', '{} files'.format(len(results))))
        return '\n'.join('{0:<36}{1}'.format(*line) for line in header)

    def _write_to_file(self, contents):
        with open(self.outfile, 'w') as f:
            f.write(contents)
            f.write('\n\n\nCreated with Scopy. https://github.com/narimiran/scopy \n')
        print('Results saved in {}'.format(self.outfile))


    @staticmethod
    def _set_extensions(extensions):
        # Normalise to a set of extensions with a leading dot: 'pdf' -> '.pdf'.
        return {'.{}'.format(ext) if not ext.startswith('.') else ext
                for ext in extensions}

    @staticmethod
    def _set_minsize(minsize):
        def parse_size(size, multi=1):
            try:
                size = multi * int(size)
            except ValueError:
                print('WARNING: Not a valid format for file size!')
                print('Using the default value: 0\n')
                size = 0
            return size

        translate = {
            'k': 1024,
            'm': 1024**2,
            'g': 1024**3
        }
        # An empty string falls through to `parse_size`, which warns and uses 0.
        if minsize and minsize[-1].isalpha():
            multiplier = translate.get(minsize[-1].lower(), 1)
            return parse_size(minsize[:-1], multiplier)
        else:
            return parse_size(minsize)

    @staticmethod
    def _set_outfile(outfile):
        if outfile:
            if outfile.find('.') == -1:
                return outfile + '.txt'
            return outfile

    @staticmethod
    def _split(file):
        ext_index = file.rfind('.')
        if ext_index == -1:
            # No dot in the name: the whole name is the filename, no extension.
            return file, ''
        return file[:ext_index], file[ext_index:]

    @staticmethod
    def _replace_symbols(filename):
        # `+` is included so the behaviour matches the `--raw` help text.
        return ' '.join(re.sub(r'[.$%+_-]', ' ', filename).split()).title()

    @staticmethod
    def _convert_bytes(size):
        for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
            if size < 1024:
                return '{:3.0f} {:>2}'.format(size, unit)
            size /= 1024
        # Anything of 1024 TB or more is reported in petabytes.
        return '{:3.0f} {:>2}'.format(size, 'PB')


def arg_parser():
    parser = argparse.ArgumentParser(
        description="Catalogue your digital books (and more)"
    )

    parser.add_argument(
        '-d', '--directory',
        default='.',
        metavar='DIR',
        help='Path to the directory you want to scan. Default: current directory'
    )
    parser.add_argument(
        '-e', '--ext',
        default=['pdf', 'epub', 'mobi'],
        nargs='*',
        help="Choose wanted file extensions. Default: ['pdf', 'epub', 'mobi']",
    )
    parser.add_argument(
        '-c', '--current',
        action='store_true',
        dest='no_subs',
        help='Scan just the current directory, without subfolders. Default: False',
    )
    parser.add_argument(
        '-f', '--filter',
        default=None,
        metavar='F',
        nargs='*',
        dest='filters',
        help='Filter results to include only the filenames containing these words. '
             'Default: None',
    )
    parser.add_argument(
        '-m', '--minsize',
        default='0',
        metavar='N',
        help='Include only the files larger than the provided size (in bytes). '
             'Can use suffixes `k`, `m`, and `g` for kilo-, mega-, and giga-bytes. '
             'For example: 64k. Default: 0',
    )
    parser.add_argument(
        '-i', '--ignore',
        default=None,
        nargs='*',
        metavar='DIR',
        help='Ignores subdirectories containing these words. Default: None',
    )
    parser.add_argument(
        '-r', '--raw',
        action='store_true',
        help="Keep the original filenames, don't change to Title Case, and "
             "don't replace symbols such as -, _, +, etc. Default: False",
    )
    parser.add_argument(
        '-s', '--sort',
        default=['n'],
        choices=['n', 'e', 's', 'd'],
        nargs='*',
        metavar='S',
        dest='sort_by',
        help='Sort files by: [n]ame, [e]xtension, [s]ize, [d]irectory, '
             'or their combination. Default: by Name',
    )
    parser.add_argument(
        '-z', '--descending',
        action='store_true',
        help='Sort file descending: from Z to A, from larger to smaller. '
             'Default: False'
    )
    parser.add_argument(
        '-v', '--verbose',
        action='store_true',
        help='Output summary statistics at the top. Default: False',
    )
    parser.add_argument(
        '-o', '--outfile',
        default=None,
        metavar='FILE',
        help='Choose an output file to save the results. Default: None, prints to console',
    )
    return parser.parse_args()


def main():
    args = arg_parser()
    sc = Scopy(args)
    sc.run()


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
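
The `--ignore` option skips subdirectories whose names contain any of the given words. With `os.walk`, one way to get that behaviour is to prune the `dirs` list in place while walking top-down, so the ignored folders are never entered. The standalone sketch below illustrates the idea on a throwaway directory tree. `walk_ignoring` and `demo` are hypothetical names used only for this illustration and are not part of `scopy.py`.

```python
import os
import tempfile


def walk_ignoring(root, ignore_words):
    """Yield every directory under `root`, skipping ignored subdirectories."""
    lowered = [word.lower() for word in ignore_words]
    for path, dirs, _files in os.walk(root):
        # Slice assignment edits the very list os.walk holds on to, so the
        # pruned subdirectories are never descended into.
        dirs[:] = [d for d in dirs
                   if not any(word in d.lower() for word in lowered)]
        yield path


def demo():
    """Build a temporary tree and return the visited paths relative to it."""
    with tempfile.TemporaryDirectory() as root:
        for sub in ('first_subdirectory/nested', 'second_subdirectory'):
            os.makedirs(os.path.join(root, *sub.split('/')))
        return sorted(os.path.relpath(path, root).replace(os.sep, '/')
                      for path in walk_ignoring(root, ['first']))


if __name__ == '__main__':
    print(demo())
```

Rebinding the name (`dirs = [...]`) would not have the same effect, because `os.walk` would keep using its original list. Removing items from `dirs` inside a `for d in dirs` loop is also unreliable, since each removal shifts the remaining entries and the next one is skipped.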