├── LICENSE.txt
├── README.md
└── scopy.py


/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 narimiran
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Scopy
  2 | 
  3 | Python script for searching through your digital books and cataloguing them in an easy-to-share list of files.
  4 | 
  5 | &nbsp;
  6 | 
  7 | ## How To Use
  8 | 
  9 | ### Basic usage:
 10 | 
 11 | ```bash
 12 | $ python scopy.py
 13 | ```
 14 | Searches the current folder and all its subfolders for `.epub`, `.mobi`, and `.pdf` files and prints the results to the console:
 15 | 
 16 | ```
 17 | Filename:                                          Ext:     Size:    Relative path:
 18 | Epub In First Sub                                  .epub    515  B   /first_subdirectory
 19 | Epub In Second Sub                                 .epub      3 KB   /second_subdirectory
 20 | Mobi In First Sub                                  .mobi      2 KB   /first_subdirectory
 21 | Pdf In Dir                                         .pdf      63 KB
 22 | Pdf In Second Sub                                  .pdf       1 KB   /second_subdirectory
 23 | ```
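The names and sizes in the listing above are derived from the raw file names and byte counts. Below is a minimal sketch of both transformations, modeled on `_replace_symbols` and `_convert_bytes` in `scopy.py`. Some assumptions:

- `prettify` and `human_size` are hypothetical names.
- The symbol set includes `+`, as the `--raw` help text describes.
- Sizes use 1024-based units, as in the script.

```python
import re


def prettify(name):
    """Replace separator symbols with spaces, collapse runs of spaces, Title Case."""
    spaced = re.sub(r'[.$%_+-]', ' ', name)
    return ' '.join(spaced.split()).title()


def human_size(size):
    """Format a byte count with 1024-based units (3-wide number, 2-wide unit)."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        # Stop at TB so very large sizes still return a string.
        if size < 1024 or unit == 'TB':
            return '{:3.0f} {:>2}'.format(size, unit)
        size /= 1024


print(prettify('pdf_in_second_sub'))  # Pdf In Second Sub
print(human_size(515))                # 515  B
print(human_size(3 * 1024))           #   3 KB
```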
 24 | 
 25 | To save the results to an easy-to-share file, provide the `--outfile` (`-o`) argument (`.txt` is appended if the file name has no extension):
 26 | 
 27 | ```bash
 28 | $ python scopy.py -o my_books.txt
 29 | ```
 30 | ---
 31 | 
 32 | To list all available options, run:
 33 | ```bash
 34 | $ python scopy.py -h
 35 | ```
 36 | 
 37 | ```
 38 | usage: scopy.py [-h] [-d DIR] [-e [EXT [EXT ...]]] [-c] [-f [F [F ...]]]
 39 |                 [-m N] [-i [DIR [DIR ...]]] [-r] [-s [S [S ...]]] [-z] [-v]
 40 |                 [-o FILE]
 41 | 
 42 | Catalogue your digital books (and more)
 43 | 
 44 | optional arguments:
 45 |   -h, --help            show this help message and exit
 46 |   -d DIR, --directory DIR
 47 |                         Path to the directory you want to scan. Default:
 48 |                         current directory
 49 |   -e [EXT [EXT ...]], --ext [EXT [EXT ...]]
 50 |                         Choose wanted file extensions. Default: ['pdf',
 51 |                         'epub', 'mobi']
 52 |   -c, --current         Scan just the current directory, without subfolders.
 53 |                         Default: False
 54 |   -f [F [F ...]], --filter [F [F ...]]
 55 |                         Filter results to include only the filenames
 56 |                         containing these words. Default: None
 57 |   -m N, --minsize N     Include only the files larger than the provided size
 58 |                         (in bytes). Can use suffixes `k`, `m`, and `g` for
 59 |                         kilo-, mega-, and giga-bytes. For example: 64k.
 60 |                         Default: 0
 61 |   -i [DIR [DIR ...]], --ignore [DIR [DIR ...]]
 62 |                         Ignores subdirectories containing these words.
 63 |                         Default: None
 64 |   -r, --raw             Keep the original filenames, don't change to Title
 65 |                         Case, and don't replace symbols such as -, _, +, etc.
 66 |                         Default: False
 67 |   -s [S [S ...]], --sort [S [S ...]]
 68 |                         Sort files by: [n]ame, [e]xtension, [s]ize,
 69 |                         [d]irectory, or their combination. Default: by Name
 70 |   -z, --descending      Sort file descending: from Z to A, from larger to
 71 |                         smaller. Default: False
 72 |   -v, --verbose         Output summary statistics at the top. Default: False
 73 |   -o FILE, --outfile FILE
 74 |                         Choose an output file to save the results. Default:
 75 |                         None, prints to console
 76 | ```
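The `-m` option accepts an optional `k`, `m`, or `g` suffix. Below is a minimal sketch of that conversion, following the logic of `_set_minsize`; the name `parse_minsize` is hypothetical. Like the script, it falls back to 0 on invalid input. `scopy.py` then keeps files whose size is at least that many bytes.

```python
UNITS = {'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}


def parse_minsize(text):
    """Turn '64k', '2M', or '500' into a byte count; invalid input gives 0."""
    suffix = text[-1:].lower()          # '' for empty input, so no IndexError
    multiplier = UNITS.get(suffix, 1)
    number = text[:-1] if suffix in UNITS else text
    try:
        return multiplier * int(number)
    except ValueError:
        return 0  # scopy.py also prints a warning before falling back to 0


print(parse_minsize('64k'))   # 65536
print(parse_minsize('2M'))    # 2097152
print(parse_minsize('oops'))  # 0
```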
 77 | 
 78 | 
 79 | ### More examples
 80 | 
 81 | ```bash
 82 | $ python scopy.py -e pdf -i first
 83 | ```
 84 | Searches the current folder and all its subfolders for files with the `.pdf` extension (`-e`), ignoring subdirectories (`-i`) whose names contain the word `first`:
 85 | 
 86 | ```
 87 | Filename:                                          Ext:     Size:    Relative path:
 88 | Pdf In Dir                                         .pdf      63 KB
 89 | Pdf In Second Sub                                  .pdf       1 KB   /second_subdirectory
 90 | ```
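Ignoring subdirectories works by pruning the directory list that `os.walk` yields, so ignored folders are never entered. Below is a sketch of the technique. It assumes a hypothetical generator `walk_ignoring` that matches words case-insensitively, as `scopy.py` does.

```python
import os


def walk_ignoring(root, ignore_words):
    """Yield (path, filenames) for root and every subdirectory whose name
    does not contain any of ignore_words (case-insensitive match)."""
    words = [w.lower() for w in ignore_words]
    for path, dirs, files in os.walk(root):
        # Slice assignment mutates the list os.walk is using, so pruned
        # directories (and everything below them) are never visited.
        dirs[:] = [d for d in dirs if not any(w in d.lower() for w in words)]
        yield path, files
```

Usage: `for path, files in walk_ignoring('.', ['first']): print(path, files)`.

The slice assignment `dirs[:] = ...` matters:

- Rebinding `dirs` to a new list would not affect the walk.
- Calling `dirs.remove()` while looping over `dirs` skips the entry right after each removed one.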
 91 | 
 92 | ---
 93 | 
 94 | ```bash
 95 | $ python scopy.py -f sub -s d e n -r -v
 96 | ```
 97 | Filters (`-f`) the results to filenames containing the word `sub` and sorts (`-s`) them by directory (`d`), then extension (`e`), then name (`n`). Keeps the raw (`-r`) filenames (no Title Case, no symbol replacement) and prints a summary header (`-v`):
 98 | 
 99 | ```
100 | Scanned directory:                  [absolute path]/scopy/scopy_example
101 | Looking for files containing:       sub
102 | With extensions:                    .epub, .pdf, .mobi
103 | Found:                              4 files
104 | 
105 | 
106 | Filename:                                          Ext:     Size:    Relative path:
107 | epub_in_first_sub                                  .epub    515  B   /first_subdirectory
108 | mobi_in_first_sub                                  .mobi      2 KB   /first_subdirectory
109 | epub_in_second_sub                                 .epub      3 KB   /second_subdirectory
110 | pdf_in_second_sub                                  .pdf       1 KB   /second_subdirectory
111 | ```
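Multi-column sorting such as `-s d e n` can be expressed as one key function that returns a list of column values, which Python compares element by element. Below is a sketch mirroring `sort_results`; `sort_rows` is a hypothetical name. The rows are sample values loosely based on the example above, and the sizes are illustrative.

```python
COLUMN = {'n': 0, 'e': 1, 's': 2, 'd': 3}  # position in (name, ext, size, dir)


def sort_rows(rows, keys, descending=False):
    """Sort (name, ext, size, dir) tuples by the given column letters, in order."""
    return sorted(rows, key=lambda row: [row[COLUMN[k]] for k in keys],
                  reverse=descending)


rows = [
    ('pdf_in_second_sub', '.pdf', 1100, './second_subdirectory'),
    ('mobi_in_first_sub', '.mobi', 2300, './first_subdirectory'),
    ('epub_in_second_sub', '.epub', 3100, './second_subdirectory'),
    ('epub_in_first_sub', '.epub', 515, './first_subdirectory'),
]
# Prints epub_in_first_sub, mobi_in_first_sub, epub_in_second_sub,
# pdf_in_second_sub (one per line), matching the order shown above.
for row in sort_rows(rows, ['d', 'e', 'n']):
    print(row[0])
```

Because `reverse` applies to the whole key, `-z` flips every column at once. Mixing ascending and descending columns would need a different approach.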
112 | 
113 | ---
114 | 
115 | ```bash
116 | $ python scopy.py -d D:/Documents/Books -c -o book_list.txt
117 | ```
118 | 
119 | Scans the `D:/Documents/Books` folder (both absolute and relative paths work) without its subfolders (`-c`) and saves the results to the output file `book_list.txt` (`-o`).
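With `-c`, only the files directly inside the given folder are considered. Below is a sketch of that non-recursive scan; `scan_current` is a hypothetical name. It swaps in `os.scandir` for the `os.listdir` and `os.path.isfile` calls used in `scopy.py`, and the `with` form requires Python 3.6+.

```python
import os


def scan_current(directory, extensions):
    """List (name, ext, size) for matching files directly inside directory only."""
    wanted = {e if e.startswith('.') else '.' + e for e in extensions}
    results = []
    with os.scandir(directory) as entries:   # top level only, no recursion
        for entry in entries:
            name, ext = os.path.splitext(entry.name)
            if entry.is_file() and ext in wanted:
                results.append((name, ext, entry.stat().st_size))
    return sorted(results)
```

Calling `scan_current('D:/Documents/Books', ['pdf', 'epub', 'mobi'])` would return the matching top-level files sorted by name. Files in subfolders are not visited.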
120 | 
121 | &nbsp;
122 | 
123 | ## Installation
124 | 
125 | ### Requirements
126 | 
127 | Python 3.4+
128 | 
129 | No other dependencies.
130 | 
131 | ### Install
132 | 
133 | Clone this repo:
134 | ```bash
135 | git clone https://github.com/narimiran/scopy.git
136 | ```
137 | or just manually download the file [`scopy.py`](scopy.py).
138 | 
139 | &nbsp;
140 | 
141 | ## FAQ
142 | 
143 | > Why the name Scopy?
144 | 
145 | From the Greek verb σκοπέω (skopéō), meaning "I search". The suffix `py` is, of course, because of Python.
146 | 
147 | > Can't I just use X or Y to get the same (or better) result?
148 | 
149 | You probably can. Scopy was written as a fun weekend project to practice my Python skills. It wasn't meant to be groundbreaking.
150 | 
151 | > Is Scopy really limited to digital books? Can't I search for any extension?
152 | 
153 | Scopy was started because I wanted to catalogue my .pdf collection, but, as you have figured out, it can search for any format you like: just pass the extensions you want to `-e`.
154 | 
155 | &nbsp;
156 | 
157 | ## License
158 | 
159 | MIT License.  
160 | See the details at [LICENSE](/LICENSE.txt).
161 | 


--------------------------------------------------------------------------------
/scopy.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import os
  3 | import re
  4 | 
  5 | 
  6 | class Scopy:
  7 |     """Holds the parsed command-line arguments and searches for files matching them.
  8 | 
  9 |     Public methods:
 10 |         run: runs all other public methods
 11 |         get_results: searches for files satisfying the provided criteria
 12 |         sort_results: sorts the results based on the `sort_by` argument
 13 |         format_results: formats content into columns, trims too long columns
 14 |         output_results: outputs formatted content
 15 |     """
 16 |     def __init__(self, args):
 17 |         self.directory = args.directory.replace('\\', '/')
 18 |         self.ext = self._set_extensions(args.ext)
 19 |         self.no_subs = args.no_subs
 20 |         self.filters = args.filters
 21 |         self.ignore = args.ignore
 22 |         self.minsize = self._set_minsize(args.minsize)
 23 |         self.raw = args.raw
 24 |         self.sort_by = args.sort_by
 25 |         self.descending = args.descending
 26 |         self.verbose = args.verbose
 27 |         self.outfile = self._set_outfile(args.outfile)
 28 | 
 29 |     def run(self):
 30 |         """The main method, used to run all other methods.
 31 | 
 32 |         Checks if the provided directory is valid and gets the results, based on
 33 |         the given criteria.
 34 |         Sorts the results based on `sort_by` attribute.
 35 |         Formats the sorted results in the columns.
 36 |         Outputs the formatted content either to console or to provided outfile.
 37 | 
 38 |         See other public methods for more information.
 39 |         """
 40 |         results = self.get_results()
 41 |         if results:
 42 |             sorted_results = self.sort_results(results)
 43 |             formatted_content = self.format_results(sorted_results)
 44 |             self.output_results(formatted_content)
 45 | 
 46 |     def get_results(self):
 47 |         """Gets the results for the provided arguments.
 48 | 
 49 |         Checks if the provided directory is valid.
 50 |         If it is not, prints the warning to the console and returns None.
 51 | 
 52 |         Otherwise:
 53 |         Depending on `no_subs` attribute, either gets the results only from
 54 |         the current directory or the current directory and all its subdirectories.
 55 |         Returns the results satisfying the provided criteria (extensions, filename filters, minimum size),
 56 |         as the list of tuples (filename, extension, size, path).
 57 |         """
 58 |         if os.path.isdir(self.directory):
 59 |             return self._search_directory(self.directory) if self.no_subs else self._search_all()
 60 |         else:
 61 |             print('{} is not a valid directory!'.format(self.directory))
 62 | 
 63 |     def sort_results(self, results):
 64 |         """Sorts the results based on the `sort_by` attribute.
 65 | 
 66 |         Args:
 67 |             results: list of 4-tuples (filename, extension, size, path)
 68 |                      provided by `get_results` method
 69 | 
 70 |         Columns can be sorted by name, extension, size, directory, or any combination
 71 |         of those. For example: first sort by directory, then by name.
 72 |         Returns the sorted list of tuples.
 73 |         """
 74 |         translate = {
 75 |             'n': 0,
 76 |             'e': 1,
 77 |             's': 2,
 78 |             'd': 3,
 79 |         }
 80 |         sort_order = lambda x: [x[translate[column]] for column in self.sort_by]
 81 |         return sorted(results, key=sort_order, reverse=self.descending)
 82 | 
 83 |     def format_results(self, results):
 84 |         """Formats the results in four columns. Trims too long columns.
 85 | 
 86 |         Args:
 87 |             results: list of 4-tuples (filename, extension, size, path)
 88 |                      provided by `sort_results` (or `get_results`) method
 89 | 
 90 |         Too long filenames are trimmed to fit in `MAX_WIDTH`.
 91 |         Leading subdirectories are stripped from too long paths until the
 92 |         path fits in `MAX_WIDTH`.
 93 |         `MAX_WIDTH` is chosen so that the total width of the output is less
 94 |         than 120 characters, letting two outputs sit side by side on a
 95 |         typical computer screen.
 96 | 
 97 |         Returns a string containing all the results.
 98 |         """
 99 |         def _trimmed(filename, path):
100 |             relative_path = path[HOME_LENGTH:].replace('\\', '/')
101 |             if not self.raw:
102 |                 filename = self._replace_symbols(filename)
103 | 
104 |             if len(filename) > MAX_WIDTH:
105 |                 filename = filename[:MAX_WIDTH-3] + '...'
106 |             if len(relative_path) > MAX_WIDTH:
107 |                 while len(relative_path) > MAX_WIDTH-3 and relative_path.find('/', 1) != -1:
108 |                     relative_path = relative_path[relative_path.find('/', 1):]
109 |                 relative_path = '...' + relative_path[-(MAX_WIDTH-3):]
110 |             return filename, relative_path
111 | 
112 |         MAX_WIDTH = 50
113 |         COLUMNS = '{0:<{WIDTH}} {2:<8} {3:<8} {1}'
114 |         HOME_LENGTH = len(self.directory)
115 | 
116 |         column_names = COLUMNS.format(
117 |             'Filename:', 'Relative path:', 'Ext:', 'Size:',
118 |             WIDTH=MAX_WIDTH
119 |         )
120 |         results_string = '\n'.join(
121 |             COLUMNS.format(
122 |                 *_trimmed(filename, path),
123 |                 ext,
124 |                 self._convert_bytes(size),
125 |                 WIDTH=MAX_WIDTH)
126 |             for filename, ext, size, path in results)
127 | 
128 |         if self.verbose:
129 |             header = self._get_header(results)
130 |             return '\n\n\n'.join([header, '\n'.join([column_names, results_string])])
131 |         else:
132 |             return '\n'.join(['', column_names, results_string])
133 | 
134 |     def output_results(self, contents):
135 |         """Outputs given content, either to a file or to the console.
136 | 
137 |         Args:
138 |             contents: string with all the results
139 |                       provided by `format_results` method
140 | 
141 |         If `outfile` attribute is provided, writes to a file.
142 |         Otherwise, prints to the console.
143 |         """
144 |         self._write_to_file(contents) if self.outfile else print(contents)
145 | 
146 | 
147 |     def _search_directory(self, directory):
148 |         results = []
149 |         for file in os.listdir(directory):
150 |             filepath = os.path.join(directory, file)
151 |             if os.path.isfile(filepath):
152 |                 filename, ext = self._split(file)
153 |                 filesize = os.path.getsize(filepath)
154 |                 if self._satisfies_filters(filename, ext, filesize):
155 |                     results.append((filename, ext, filesize, directory))
156 |         return results
157 | 
158 |     def _search_all(self):
159 |         def _ignore_directories(dirs):
160 |             # Slice assignment prunes in place, so os.walk skips ignored subtrees.
161 |             dirs[:] = [d for d in dirs
162 |                        if not any(word.lower() in d.lower() for word in self.ignore)]
163 |             return dirs
164 | 
165 |         results = []
166 |         for path, dirs, _ in os.walk(self.directory):
167 |             if self.ignore:
168 |                 dirs = _ignore_directories(dirs)
169 |             results.extend(self._search_directory(path))
170 |         return results
171 | 
172 |     def _satisfies_filters(self, filename, ext, filesize):
173 |         is_valid_file = any(filt.lower() in filename.lower()
174 |                             for filt in self.filters) if self.filters else True
175 |         is_valid_ext = ext in self.ext
176 |         is_valid_size = filesize >= self.minsize
177 |         return is_valid_file and is_valid_ext and is_valid_size
178 | 
179 |     def _get_header(self, results):
180 |         header = [('', '')]
181 |         header.append(('Scanned directory:', os.path.abspath(self.directory).replace('\\', '/')))
182 |         if self.ignore:
183 |             header.append(('Ignoring subdirectories containing:', ', '.join(self.ignore)))
184 |         if self.filters:
185 |             header.append(('Looking for files containing:', ', '.join(self.filters)))
186 |         header.append(('With extensions:', ', '.join(self.ext)))
187 |         header.append(('Found:', '{} files'.format(len(results))))
188 |         return '\n'.join('{0:<36}{1}'.format(*line) for line in header)
189 | 
190 |     def _write_to_file(self, contents):
191 |         with open(self.outfile, 'w', encoding='utf-8') as f:
192 |             f.write(contents)
193 |             f.write('\n\n\nCreated with Scopy. https://github.com/narimiran/scopy \n')
194 |         print('Results saved in {}'.format(self.outfile))
195 | 
196 | 
197 |     @staticmethod
198 |     def _set_extensions(extensions):
199 |         return {'.{}'.format(ext) if not ext.startswith('.') else ext
200 |                 for ext in extensions}
201 | 
202 |     @staticmethod
203 |     def _set_minsize(minsize):
204 |         def parse_size(size, multi=1):
205 |             try:
206 |                 size = multi * int(size)
207 |             except ValueError:
208 |                 print('WARNING: Not a valid format for file size!')
209 |                 print('Using the default value: 0\n')
210 |                 size = 0
211 |             return size
212 | 
213 |         translate = {
214 |             'k': 1024,
215 |             'm': 1024**2,
216 |             'g': 1024**3
217 |         }
218 |         suffix = minsize[-1:].lower()  # '' for an empty string, so no IndexError
219 |         if suffix in translate:
220 |             return parse_size(minsize[:-1], translate[suffix])
221 |         else:
222 |             return parse_size(minsize)
223 | 
224 |     @staticmethod
225 |     def _set_outfile(outfile):
226 |         # Append .txt only if the file name itself has no extension.
227 |         if outfile and not os.path.splitext(outfile)[1]:
228 |             return outfile + '.txt'
229 |         return outfile
230 | 
231 |     @staticmethod
232 |     def _split(file):
233 |         # os.path.splitext also handles names without a dot: 'README' -> ('README', '')
234 |         return os.path.splitext(file)
235 | 
236 |     @staticmethod
237 |     def _replace_symbols(filename):
238 |         return ' '.join(re.sub(r'[.$%_+-]', ' ', filename).split()).title()
239 | 
240 |     @staticmethod
241 |     def _convert_bytes(size):
242 |         for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
243 |             if size < 1024 or unit == 'TB':  # never fall through and return None
244 |                 return '{:3.0f} {:>2}'.format(size, unit)
245 |             size /= 1024
246 | 
247 | 
248 | def arg_parser():
249 |     parser = argparse.ArgumentParser(
250 |         description="Catalogue your digital books (and more)"
251 |     )
252 | 
253 |     parser.add_argument(
254 |         '-d', '--directory',
255 |         default='.',
256 |         metavar='DIR',
257 |         help='Path to the directory you want to scan. Default: current directory'
258 |     )
259 |     parser.add_argument(
260 |         '-e', '--ext',
261 |         default=['pdf', 'epub', 'mobi'],
262 |         nargs='*',
263 |         help="Choose wanted file extensions. Default: ['pdf', 'epub', 'mobi']",
264 |     )
265 |     parser.add_argument(
266 |         '-c', '--current',
267 |         action='store_true',
268 |         dest='no_subs',
269 |         help='Scan just the current directory, without subfolders. Default: False',
270 |     )
271 |     parser.add_argument(
272 |         '-f', '--filter',
273 |         default=None,
274 |         metavar='F',
275 |         nargs='*',
276 |         dest='filters',
277 |         help='Filter results to include only the filenames containing these words. '
278 |              'Default: None',
279 |     )
280 |     parser.add_argument(
281 |         '-m', '--minsize',
282 |         default='0',
283 |         metavar='N',
284 |         help='Include only the files larger than the provided size (in bytes). '
285 |              'Can use suffixes `k`, `m`, and `g` for kilo-, mega-, and giga-bytes. '
286 |              'For example: 64k. Default: 0',
287 |     )
288 |     parser.add_argument(
289 |         '-i', '--ignore',
290 |         default=None,
291 |         nargs='*',
292 |         metavar='DIR',
293 |         help='Ignores subdirectories containing these words. Default: None',
294 |     )
295 |     parser.add_argument(
296 |         '-r', '--raw',
297 |         action='store_true',
298 |         help="Keep the original filenames, don't change to Title Case, and "
299 |              "don't replace symbols such as -, _, +, etc. Default: False",
300 |     )
301 |     parser.add_argument(
302 |         '-s', '--sort',
303 |         default=['n'],
304 |         choices=['n', 'e', 's', 'd'],
305 |         nargs='*',
306 |         metavar='S',
307 |         dest='sort_by',
308 |         help='Sort files by: [n]ame, [e]xtension, [s]ize, [d]irectory, '
309 |              'or their combination. Default: by Name',
310 |     )
311 |     parser.add_argument(
312 |         '-z', '--descending',
313 |         action='store_true',
314 |         help='Sort file descending: from Z to A, from larger to smaller. '
315 |              'Default: False'
316 |     )
317 |     parser.add_argument(
318 |         '-v', '--verbose',
319 |         action='store_true',
320 |         help='Output summary statistics at the top. Default: False',
321 |     )
322 |     parser.add_argument(
323 |         '-o', '--outfile',
324 |         default=None,
325 |         metavar='FILE',
326 |         help='Choose an output file to save the results. Default: None, prints to console',
327 |     )
328 |     return parser.parse_args()
329 | 
330 | 
331 | def main():
332 |     args = arg_parser()
333 |     sc = Scopy(args)
334 |     sc.run()
335 | 
336 | 
337 | if __name__ == '__main__':
338 |     main()
339 | 


--------------------------------------------------------------------------------