├── .hgignore
├── Package.ini
├── README.md
├── changelog.txt
├── constants.py
├── dmBookWrapper.py
├── dmParser.py
├── dmrules.dat.demo
├── duplicatesmanager.png
├── duplicatesmanager.py
├── duplicatesmanager.xcf
├── duplicatesmanager_small.png
├── getcvdb.py
├── processfunctions.py
├── re.py
├── traceback.py
└── utilsbycory.py

/.hgignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/.hgignore
--------------------------------------------------------------------------------
/Package.ini:
--------------------------------------------------------------------------------
 1 | Name=Duplicates Manager
 2 | Author=Perezmu and Pescuma
 3 | Version=0.9
 4 | Description=A manager to sort and remove duplicate ecomics from the library according to given rules.
 5 | Image=duplicatesmanager_small.png
 6 | KeepFiles=dmrules.dat
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # ![http://i750.photobucket.com/albums/xx149/perezmu/duplicatesmanager-2.png](http://i750.photobucket.com/albums/xx149/perezmu/duplicatesmanager-2.png) DUPLICATES MANAGER FOR [COMICRACK](http://comicrack.cyolito.com) #
 2 | 
 3 | ---
 4 | 
 5 | ## NEW VERSION 0.9 ##
 6 | 
 7 | ### Updated with NEW RULES, see the [rules wiki](http://code.google.com/p/comicrack-duplicates-manager/wiki/RulesFileSyntax) ###
 8 | 
 9 | ```
10 | v0.9 ->
11 | 
12 | Added: - New rules:
13 |            - scan keep/remove
14 |        - New toolbar icon
15 |        - Fix for typos and rare crash
16 | ```
17 | [Complete CHANGELOG](http://code.google.com/p/comicrack-duplicates-manager/wiki/Changelog)
18 | 
19 | ---
20 | 
21 | 
22 | ### IMPORTANT NOTICE (UPDATED AS OF VERSION 0.5) ###
23 | Since I do not want to mess with your files & library before we are sure this thing works right,
24 | **out of the box the script will not move or remove any comic; it only logs what it would have done to the logfile. To enable the actual processing of files you need to set the variables `MOVEFILES` and `REMOVEFROMLIB` to `true`** in the **dmrules.dat** file. See the [Configuration Options](http://code.google.com/p/comicrack-duplicates-manager/wiki/UserConfiguration?ts=1297328384&updated=UserConfiguration) Wiki page for details.
25 | 
26 | 
27 | 
28 | ---
29 | 
30 | 
31 | This script is an add-on to ComicRack that identifies duplicate ecomics and follows a set of user-defined rules to remove unwanted dupes. It is designed with 0-day releases in mind, but should prove useful in other scenarios.
32 | 
33 | The script reads a file (**dmrules.dat**) from the directory where it is installed that contains both user options and a series of rules to manage the duplicate files. Duplicate files that meet the criteria expressed in the rules are **moved** to a dump directory (not deleted) and removed from the ComicRack library (the default dump directory is `C:\__dupes__`). This directory also holds a logfile (**logfile.log**) that details the process followed on your comics.
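A minimal **dmrules.dat** could look like the snippet below (a sketch only — the `@` options and rule keywords are taken from the bundled **dmrules.dat.demo**; tune the rules to your own library before enabling file moves):

```
# keep simulating until you trust your rules
@ MOVEFILES False
@ REMOVEFROMLIB False

# prefer real files over fileless entries, then keep the biggest scans
pagecount remove fileless
filesize keep largest 10%
keep first
```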
34 | 
35 | So, the first thing you want to do is read the rules (see wiki) and edit your custom **dmrules.dat** file.
36 | 
37 | Wiki Index:
38 | 
39 | * [Overview](http://code.google.com/p/comicrack-duplicates-manager/wiki/Overview?ts=1297338327&updated=Overview): You need to read this first!!!!
40 | * Rules are explained in detail in the Wiki: [Rules Syntax](http://code.google.com/p/comicrack-duplicates-manager/wiki/RulesFileSyntax).
41 | * Then you can go to the [Tips and Examples](http://code.google.com/p/comicrack-duplicates-manager/wiki/TipsAndTricks?ts=1297265175&updated=TipsAndTricks) page.
42 | * User-defined variables are described in the [Configuration Options](http://code.google.com/p/comicrack-duplicates-manager/wiki/UserConfiguration?ts=1297328384&updated=UserConfiguration) page.
43 | * Credits for people who have (not necessarily knowingly) contributed to this project are on [the credits](http://code.google.com/p/comicrack-duplicates-manager/wiki/FellowCredits) page.
44 | * Screenshots and example files are on [this wiki page](http://code.google.com/p/comicrack-duplicates-manager/wiki/ScreenshotExample).
45 | * The changelog is on [this other page](http://code.google.com/p/comicrack-duplicates-manager/wiki/Changelog).
46 | 
47 | Discussion takes place in the [ComicRack support forum](http://comicrack.cyolito.com/forum/13-scripts/12076-duplicates-manager#12076)
48 | 
49 | 
50 | ---
51 | 
52 | 
53 | Cheers!!!!! ![http://comicrack.cyolito.com/media/kunena/avatars/resized/size72/users/avatar195.jpg](http://comicrack.cyolito.com/media/kunena/avatars/resized/size72/users/avatar195.jpg)
--------------------------------------------------------------------------------
/changelog.txt:
--------------------------------------------------------------------------------
 1 | v0.9 ->
 2 | 
 3 | Added: - New rules:
 4 |            - scan keep/remove
 5 |        - New toolbar icon
 6 |        - Fix for typos and rare crash
 7 | 
 8 | 
 9 | v0.8 ->
10 | 
11 | Added: - New rules:
12 |            - pagesize keep/remove largest/smallest
13 |        - Comics with different formats are no longer treated as dupes
14 | 
15 | 
16 | v0.7 ->
17 | 
18 | Added: - New rules:
19 |            - pagecount remove largest
20 |            - pagecount remove smallest
21 |            - filesize remove largest
22 |            - filesize remove smallest
23 |        - Now it copies some comic information from deleted comics. It is disabled by default.
24 |          Enable in constants.py : UPDATEINFO
25 |        - Fix for series with multiple volumes
26 | 
27 | 
28 | v0.6 -> New Features Release
29 | 
30 | Added: - New parser (texts with more than one word can be surrounded by ")
31 |        - Percentage option for filesize keep/remove
32 |        - Added percentage option to pagecount keep/remove largest/smallest
33 |        - Added keep first (to remove remaining identical files)
34 |        - Allow filtering on multiple words (matching any of them) in texts
35 | 
36 | 
37 | v0.5 ->
38 | 
39 | Fixed: - Major bug found in the 'text' rules!!!!!
40 | 
41 | Changed: - 'pagecount keep noads' now skips comics with COVERPAGES or fewer pages
42 |          - Added "@ OPTION VALUE" syntax to the dmrules.dat file
43 |          - Added new options:
44 |              - COVERPAGES (int)
45 |              - SIZEMARGIN (int) (part of issue 9, still not operative)
46 |          - Allow more than one word as part of the text-based rules (issue 8)
47 | 
48 | v0.4 -> Bug fix Release
49 | 
50 | Fixed: - Issue 7 finally solved (I hope!)
51 |        - Issue 6 finally solved (I hope!)
52 |        - Correctly changed version number in Package.ini
53 | 
54 | 
55 | v0.3 -> Bug fix Release
56 | 
57 | Fixed: - Doesn't break the Series Info Panel (issue 6) anymore
58 | 
59 |        - No longer throws an exception when there are no dupes (issue 4)
60 | 
61 | 
62 | Changed: - Remove leading 0's in comic numbers to improve duplicate discovery
63 | 
64 |          - 'pagecount remove fileless' will remove all fileless dupes but one when
65 |            a group of only fileless comics is found. One with a thumbnail will be kept
66 |            (issue 7)
67 | 
68 | 
69 | v0.2 -> Bug fix Release
70 | 
71 | Fixed: - Rules of the form '[text] remove word' are now correctly parsed.
72 | 
73 | v0.1 -> Initial Release
--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
 1 | #####################################################################################################
 2 | ##
 3 | ##  constants.py - part of duplicatemanager, a script for comicrack
 4 | ##
 5 | ##  Author: perezmu
 6 | ##
 7 | ##  Copyleft perezmu 2011.
 8 | ##
 9 | ######################################################################################################
10 | 
11 | 
12 | 
13 | ##########
14 | #
15 | #   DEFINITIONS
16 | 
17 | 
18 | import re
19 | import clr
20 | import System
21 | import System.IO
22 | from System.IO import Path, Directory, File, FileInfo
23 | 
24 | #
25 | ############# **** USER CONFIGURABLE VARIABLES *** ###########################################
26 | #
27 | # see http://code.google.com/p/comicrack-duplicates-manager/wiki/UserConfiguration for details
28 | #
29 | # These may also be set in the "dmrules.dat" rules file using this syntax: "@ OPTION VALUE". Values
30 | # found in the "dmrules.dat" file override the defaults set in this file.
31 | 
32 | 
33 | MOVEFILES = False
34 | REMOVEFROMLIB = False
35 | UPDATEINFO = False
36 | 
37 | DUPESDIRECTORY = Path.Combine("C:\\", "__dupes__")
38 | 
39 | C2C_NOADS_GAP = 5   # Difference in pages between c2c and noads
40 | SIZEMARGIN = 0      # Preserve comics within SIZEMARGIN % of the size
41 | COVERPAGES = 4      # Maximum number of pages to be considered "covers only"
42 | 
43 | VERBOSE = False     # Logging level (true/false)
44 | DEBUG = False       # Logging level (true/false)
45 | 
46 | 
47 | #
48 | ############ DON'T MODIFY BELOW THIS LINE ######
49 | #
50 | 
51 | VERSION = "0.9"
52 | 
53 | SCRIPTDIRECTORY = __file__[0:-len("constants.py")]
54 | RULESFILE = Path.Combine(SCRIPTDIRECTORY, "dmrules.dat")
55 | LOGFILE = Path.Combine(DUPESDIRECTORY, "logfile.log")
56 | (SERIES,NUMBER,VOLUME,FILENAME,PAGECOUNT,FILESIZE,ID,CVDB_ID,FILEPATH,TAGS,NOTES,FILETYPE,SCAN,BOOK) = range(14)
57 | FIELD_NAMES = ['series','number','volume','filename','pages','size','id','cvdb_id','path','tags','notes','type','scan','book']
58 | FIELDS_TO_UPDATE_INFO = [
59 |     [ 'AlternateCount',  lambda x: int(x) ],
60 |     [ 'AlternateNumber', lambda x: x ],
61 |     [ 'AlternateSeries', lambda x: x ],
62 |     [ 'Count',           lambda x: int(x) ],
63 |     [ 'Title',           lambda x: x ],
64 | ]
65 | 
66 | #
67 | #
68 | ###########
--------------------------------------------------------------------------------
/dmBookWrapper.py:
--------------------------------------------------------------------------------
 1 | #####################################################################################################
 2 | ##
 3 | ##  BookWrapper.py - part of duplicatemanager, a script for comicrack
 4 | ##
 5 | ##  Author: Ricardo Pescuma Domenecci, modified by perezmu
 6 | ##
 7 | ##  Originally from pescuma's Series Info Panel script.
8 | ## 9 | ###################################################################################################### 10 | 11 | 12 | ### original credits 13 | """ 14 | Copyright (C) 2010 Ricardo Pescuma Domenecci 15 | 16 | This is free software; you can redistribute it and/or 17 | modify it under the terms of the GNU Library General Public 18 | License as published by the Free Software Foundation; either 19 | version 2 of the License, or (at your option) any later version. 20 | 21 | This is distributed in the hope that it will be useful, 22 | but WITHOUT ANY WARRANTY; without even the implied warranty of 23 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 24 | Library General Public License for more details. 25 | 26 | You should have received a copy of the GNU Library General Public 27 | License along with this file; see the file license.txt. If 28 | not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, 29 | Boston, MA 02111-1307, USA. 30 | """ 31 | ########## 32 | 33 | """ 34 | Modified by apm 35 | """ 36 | 37 | import clr 38 | clr.AddReference('System.Drawing') 39 | import sys 40 | import System 41 | from getcvdb import extract_issue_ref 42 | from utilsbycory import * 43 | 44 | 45 | 46 | 47 | class dmBookWrapper: 48 | _emptyVals = { 49 | 'Count' : '-1', 50 | 'Year' : '-1', 51 | 'Month' : '-1', 52 | 'AlternateCount' : '-1', 53 | 'Rating' : '0.0', 54 | 'CommunityRating' : '0.0' 55 | } 56 | _dontConvert = set([ 57 | 'Pages', 58 | 'PageCount', 59 | 'FrontCoverPageIndex', 60 | 'FirstNonCoverPageIndex', 61 | 'LastPageRead', 62 | 'ReadPercentage', 63 | 'OpenedCount' 64 | ]) 65 | 66 | def __init__(self, book): 67 | self.raw = book 68 | self._pages = {} 69 | 70 | def __dir__(self): 71 | ret = set() 72 | ret.update(set(self.__dict__)) 73 | ret.update(dir(self.raw)) 74 | ret.update(self._getterFields) 75 | return list(ret) 76 | 77 | def _safeget(self, name): 78 | try: 79 | return self._get(name) 80 | except: 81 | return '' 82 | 83 | def _get(self, name): 84 | return ToString(getattr(self.raw, name)).strip() 85 | 86 | def __getattr__(self, name): 87 | if name in self._dontConvert: 88 | return getattr(self.raw, name) 89 | 90 | if name in self._emptyVals: 91 | emptVal = self._emptyVals[name] 92 | else: 93 | emptVal = '' 94 | 95 | ret = self._get(name) 96 | if ret == '' or ret == emptVal: 97 | ret = self._safeget('Shadow' + name) 98 | if ret == '' or ret == emptVal: 99 | ret = '' 100 | return ret 101 | 102 | def GetCover(self, width = 0, height = 0): 103 | coverIndex = 0 104 | if self.raw.FrontCoverPageIndex > 0: 105 | coverIndex = self.raw.FrontCoverPageIndex 106 | return self.GetPage(coverIndex, width, height) 107 | 108 | def GetPage(self, page, width = 0, height = 0): 109 | global _oldTmpFiles, _ComicRack 110 | 111 | if not self.raw.FilePath: 112 | if page > 0: 113 | return '' 114 | elif page >= self.raw.PageCount: 115 | return '' 116 | 117 | hash = str(page) + '_' + str(width) + '_' + str(height) 118 | 119 | if hash in self._pages: 120 | return self._pages[hash] 121 | 122 | self._pages[hash] = '' 123 | 124 | #image = _ComicRack.App.GetComicPage(self.raw, page) 125 | image = _ComicRack.App.GetComicThumbnail(self.raw, page) 126 | if image is None: 127 | return '' 128 | 129 | tmpFile = System.IO.Path.GetTempFileName() 130 | _oldTmpFiles.append(tmpFile) 131 | 132 | # We need a jpg 133 | imageFile = tmpFile + '.jpg' 134 | _oldTmpFiles.append(imageFile) 135 | #print imageFile 136 | 137 | try: 138 | if width > 0 or height > 0: 139 | image = ResizeImage(image, width, 
height)
140 | 
141 |             image.Save(imageFile, System.Drawing.Imaging.ImageFormat.Jpeg)
142 | 
143 |             self._pages[hash] = imageFile
144 | 
145 |             return imageFile
146 | 
147 |         except Exception, e:
148 |             print '[SeriesInfoPanel] Exception when saving image: ', e
149 |             return ''
150 | 
151 |     def GetSeries(self):
152 |         ret = self.raw.Series
153 |         if ret:
154 |             return ToString(ret)
155 |         ret = self.raw.ShadowSeries
156 |         if ret:
157 |             return ToString(ret)
158 |         return ''
159 | 
160 |     def GetVolume(self):
161 |         ret = self.raw.Volume
162 |         if ret != -1:
163 |             return ToString(ret)
164 |         ret = self.raw.ShadowVolume
165 |         if ret != -1:
166 |             return ToString(ret)
167 |         return ''
168 | 
169 |     def GetNumber(self):
170 |         ret = self.raw.Number
171 |         if ret:
172 |             return ret
173 |         ret = self.raw.ShadowNumber
174 |         if ret:
175 |             return ret
176 |         return ''
177 | 
178 |     def GetFormat(self):
179 |         ret = self.raw.Format
180 |         if ret:
181 |             return ret
182 |         ret = self.raw.ShadowFormat
183 |         if ret:
184 |             return ret
185 |         return 'Series'
186 | 
187 |     def GetFileFormat(self):
188 |         if not self.raw.FilePath:
189 |             return 'Fileless'
190 |         ret = self.raw.FileFormat
191 |         if ret:
192 |             return ret
193 |         ret = self.raw.ShadowFileFormat
194 |         if ret:
195 |             return ret
196 |         return Translate('Unknown')
197 | 
198 |     def GetFilePath(self):
199 |         if not self.raw.FilePath:
200 |             return 'Fileless'
201 |         ret = self.raw.FilePath
202 |         return ret
203 | 
204 |     def GetFileName(self):
205 |         if not self.raw.FileNameWithExtension:
206 |             return 'Fileless'
207 |         ret = self.raw.FileNameWithExtension
208 |         return ret
209 | 
210 | 
211 |     # these were added by apm
212 | 
213 | 
214 |     def GetId(self):
215 |         return ToString(self.raw.Id)
216 | 
217 |     def GetPageCount(self):
218 |         return self.raw.PageCount
219 | 
220 |     def GetFileSize(self):
221 |         return self.raw.FileSize
222 | 
223 |     def GetCVDB_ID(self):
224 |         return ToString(extract_issue_ref(self.raw))
225 | 
226 | 
227 |     # Properties
228 | 
229 |     Cover = property(GetCover)
230 |     Series = property(GetSeries)
231 |     Volume = property(GetVolume)
232 |     Number = property(GetNumber)
233 |     Format = property(GetFormat)
234 |     FileFormat = property(GetFileFormat)
235 |     ID = property(GetId)
236 |     PageCount = property(GetPageCount)
237 |     FilePath = property(GetFilePath)
238 |     FileName = property(GetFileName)
239 | 
240 |     FileSize = property(GetFileSize)
241 |     CVDB_ID = property(GetCVDB_ID)
242 | 
243 | 
244 | 
--------------------------------------------------------------------------------
/dmParser.py:
--------------------------------------------------------------------------------
 1 | def _ParseLine(line_num, line):
 2 |     insideQuote = False
 3 |     result = []
 4 |     word = ''
 5 | 
 6 |     i = 0
 7 |     while i < len(line):
 8 |         c = line[i]
 9 | 
10 |         if not insideQuote:
11 |             if c == '#':
12 |                 # Found start of comment
13 |                 line = line[:i].strip()
14 |                 break
15 | 
16 |             elif c == '"':
17 |                 if word == '':
18 |                     # Found start of a quote
19 |                     insideQuote = True
20 |                 else:
21 |                     # Found quote inside word.
Keep it there 22 | word += c 23 | 24 | elif c.isspace(): 25 | if word != '': 26 | result.append(word) 27 | word = '' 28 | 29 | else: 30 | word += c 31 | 32 | else: # insideQuote 33 | if c == '\\' and line[i + 1] == '"': 34 | word += '"' 35 | i += 1 36 | 37 | elif c == '"': 38 | result.append(word) 39 | word = '' 40 | insideQuote = False 41 | 42 | else: 43 | word += c 44 | 45 | i += 1 46 | 47 | if word != '': 48 | result.append(word) 49 | 50 | if len(result) > 0: 51 | result.insert(0, line_num) 52 | result.insert(1, line) 53 | 54 | return result 55 | 56 | 57 | def Parse(lines): 58 | rules = [] 59 | 60 | line_num = 0 61 | for line in lines: 62 | line = line.strip('\r\n').strip() 63 | line_num += 1 64 | 65 | if line == '': 66 | continue 67 | 68 | rule = _ParseLine(line_num, line) 69 | if rule: 70 | rules.append(rule) 71 | 72 | return rules 73 | -------------------------------------------------------------------------------- /dmrules.dat.demo: -------------------------------------------------------------------------------- 1 | # All possible rules at the end of the file 2 | 3 | # this example removes fileless and 4 | # selects the noads files with largest filesize 5 | 6 | @ MOVEFILES True 7 | @ REMOVEFROMLIB True 8 | @ C2C_NOADS_GAP 120 9 | 10 | pagecount remove fileless 11 | covers keep some 12 | filename keep edit 13 | filename remove "cover only" 14 | filename keep noads 15 | pagecount keep noads 16 | text keep Bchry 17 | filesize keep largest 10% 18 | # 19 | # 20 | # filename keep c2c 21 | # filename remove c2c 22 | # filetype keep zip rar 23 | # filetype remove pdf 24 | # filetype remove fileless 25 | # filepath keep c2c 26 | # filepath remove c2c 27 | # tags remove c2c 28 | # tags keep c2c 29 | # notes keep c2c 30 | # notes remove c2c 31 | # text keep c2c 32 | # text remove c2c 33 | # scan keep abc 34 | # scan remove abc 35 | # covers keep all 36 | # covers keep some 37 | # filesize keep largest 38 | # filesize keep largest 10% 39 | # filesize remove largest 40 | # filesize remove largest 10% 41 | # filesize keep smallest 42 | # filesize keep smallest 10% 43 | # filesize remove smallest 44 | # filesize remove smallest 10% 45 | # pagecount keep largest 46 | # pagecount remove largest 47 | # pagecount keep smallest 48 | # pagecount remove smallest 49 | # pagecount keep fileless 50 | # pagecount remove fileless 51 | # pagecount keep noads 52 | # pagecount keep c2c 53 | # keep first -------------------------------------------------------------------------------- /duplicatesmanager.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/duplicatesmanager.png -------------------------------------------------------------------------------- /duplicatesmanager.py: -------------------------------------------------------------------------------- 1 | ##################################################################################################### 2 | ## 3 | ## duplicatesmanager.py - a script for comicrack 4 | ## 5 | ## Author: perezmu, pescuma 6 | ## 7 | ## Copyleft perezmu 2011. 
8 | ## 9 | ## Detailed credits: "http://code.google.com/p/comicrack-duplicates-manager/wiki/FellowCredits" 10 | ## 11 | ###################################################################################################### 12 | 13 | 14 | 15 | ######### 16 | # 17 | # Import section 18 | 19 | import sys, traceback 20 | import re 21 | import clr 22 | import System 23 | import System.IO 24 | from System.IO import Path, Directory, File, FileInfo 25 | 26 | clr.AddReference("System.Windows.Forms") 27 | from System.Windows.Forms import DialogResult, MessageBox, MessageBoxButtons, MessageBoxIcon 28 | 29 | 30 | from itertools import groupby 31 | from dmBookWrapper import * 32 | from utilsbycory import cleanupseries 33 | from processfunctions import * 34 | from dmParser import * 35 | from utilsbycory import * 36 | 37 | from constants import * 38 | 39 | 40 | # 41 | # 42 | ########## 43 | 44 | 45 | ##'''---------------------------------------------------------''' 46 | 47 | 48 | ############ 49 | # 50 | # MAIN FUNCTION 51 | 52 | 53 | #@Name DuplicatesManager 54 | #@Hook Books 55 | #@Image duplicatesmanager.png 56 | 57 | def DuplicatesManager(books): 58 | 59 | ######################################## 60 | # 61 | # Starting log file 62 | # 63 | 64 | if not Directory.Exists(DUPESDIRECTORY): 65 | try: 66 | Directory.CreateDirectory(DUPESDIRECTORY) 67 | except Exception, ex: 68 | MessageBox.Show('ERROR: '+ str(ex), "ERROR creating dump directory" + DUPESDIRECTORY, MessageBoxButtons.OK, MessageBoxIcon.Exclamation) 69 | return 70 | 71 | logfile = open(LOGFILE,'w') 72 | logfile.write('COMICRACK DUPLICATES MANAGER V '+VERSION+'\n\n') 73 | ''' Logfile initialized ''' 74 | 75 | # 76 | # 77 | ######################################### 78 | 79 | try: 80 | ProcessDuplicates(books, logfile) 81 | 82 | except Exception, ex: 83 | logfile.write('\n\nSTOPPED PROCESSING BECAUSE OF EXCEPTION:\n') 84 | traceback.print_exc(None, logfile, False) 85 | raise ex 86 | 87 | finally: 88 | logfile.close() 89 | 90 | 91 | 92 | def ProcessDuplicates(books, logfile): 93 | 94 | ######################################### 95 | # 96 | # Getting comics info 97 | 98 | comiclist = [] 99 | for book in books: 100 | 101 | b = dmBookWrapper(book) 102 | # re.sub(r'^0+','',b.Number) -> removes leading 0's 103 | series = b.Series 104 | if b.Volume: 105 | series += ' Vol.' 
+ b.Volume 106 | series += ' ' + b.Format 107 | comiclist.append((cleanupseries(series),re.sub(r'^0+','',b.Number),b.Volume,b.FileName,b.PageCount,b.FileSize/1048576.0,b.ID,b.CVDB_ID,b.FilePath,book.Tags,book.Notes,b.FileFormat,b.ScanInformation,book)) 108 | 109 | logfile.write('Parsing '+str(len(comiclist))+ ' ecomics\n') 110 | 111 | 112 | # 113 | # 114 | ######################################## 115 | 116 | ######################################### 117 | # 118 | # Setting intial options values 119 | 120 | options = {"movefiles":MOVEFILES, 121 | "removefromlib":REMOVEFROMLIB, 122 | "updateinfo":UPDATEINFO, 123 | "verbose":VERBOSE, 124 | "debug":DEBUG, 125 | "sizemargin":SIZEMARGIN, 126 | "coverpages":COVERPAGES, 127 | "c2c_noads_gap":C2C_NOADS_GAP} 128 | 129 | ######################################## 130 | # 131 | # Main Loop 132 | # 133 | 134 | try: 135 | 136 | ########################################### 137 | # 138 | # Load rules file 139 | 140 | rules = LoadRules(logfile, options) 141 | 142 | # 143 | ############################################ 144 | 145 | 146 | 147 | ############################################ 148 | # 149 | # Massage comics to get a list of dupes groups 150 | 151 | ''' Now we group books looking for dupes! ''' 152 | comiclist.sort() 153 | ''' begin sorting and sort the list ''' 154 | 155 | # TODO: I need to cleanup the series names and issues 1/2, 0.5, etc... 156 | # TODO: Also, check for CVDB items first! 157 | 158 | cl = {} 159 | ''' temp dictionary''' 160 | for key, group in groupby(comiclist, lambda x: x[SERIES]): 161 | cl[key] = list(group) 162 | '''groups by series''' 163 | ''' cl is a dictionary that now has 'series' as keys''' 164 | ''' we remove series with only one ecomic ''' 165 | 166 | logfile.write('============= Begining dupes identification ==================================\n\n') 167 | 168 | logfile.write('Parsing '+str(len(comiclist))+ ' ecomics\n') 169 | logfile.write('Found '+str(len(cl))+ ' different series\n') 170 | 171 | if options["verbose"]: 172 | for series in sorted(cl.keys()): 173 | logfile.write('\t'+series+'\n') 174 | 175 | remove = [] 176 | for series in cl.keys(): 177 | if len(cl[series])==1: 178 | remove.append(series) 179 | for series in remove: 180 | del cl[series] 181 | logfile.write('Found '+str(len(cl))+ ' different series with more than one issue\n') 182 | 183 | if options["verbose"]: 184 | for series in sorted(cl.keys()): 185 | logfile.write('\t'+series+'\n') 186 | 187 | ''' we now regroup each series looking for dupe issues ''' 188 | # We need to use a different list or sometimes python duplicates entries 189 | temp_cl = {} 190 | for series in cl.keys(): 191 | cl[series].sort() 192 | temp_dict = {} 193 | for key, group in groupby(cl[series], lambda x: x[NUMBER]): 194 | temp_dict[key] = list(group) 195 | temp_cl[series] = temp_dict 196 | cl = temp_cl 197 | 198 | ''' cleaning issues without dupes ''' 199 | remove = [] 200 | for series in cl.keys(): 201 | for number in cl[series]: 202 | if len(cl[series][number])==1: 203 | remove.append([series,number]) 204 | 205 | for a in remove: 206 | del cl[a[0]][a[1]] 207 | 208 | 209 | ''' now a second go for series without issues after non-dupe removal ''' 210 | remove = [] 211 | for i in cl: 212 | if len(cl[i])==0: 213 | remove.append(i) 214 | for i in remove: 215 | del cl[i] 216 | 217 | logfile.write('Found '+str(len(cl))+ ' different series with dupes\n') 218 | if options["verbose"]: 219 | for series in sorted(cl.keys()): 220 | logfile.write('\t'+series+'\t('+str(cl[series].keys())+')\n') 
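        # Illustrative shape of 'cl' at this point (series names and issue
        # numbers are hypothetical): each series maps to an issue-number map,
        # whose values are lists of comic tuples, e.g.
        #   cl = {'Batman Vol.1 Series': {'1': [comicA, comicB],
        #                                 '2': [comicC, comicD]}}
        # Singles have already been pruned above, so every remaining list is
        # a genuine group of dupes.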
221 | 
222 |         ''' Now I have them sorted, I convert them to a simple list of lists (groups)...
223 |             each item in this list is a list of dupes '''
224 | 
225 |         dupe_groups = []
226 |         for i in cl:
227 |             for j in cl[i]:
228 |                 dupe_groups.append(cl[i][j])
229 | 
230 |         logfile.write('Found '+str(len(dupe_groups)) +' groups of dupes, with a total of '+ str(len(reduce(list.__add__, dupe_groups, [])))+ ' ecomics.\n')
231 |         if options["verbose"]:
232 |             for group in sorted(dupe_groups):
233 |                 logfile.write('\t'+group[0][SERIES]+' #'+group[0][NUMBER]+'\n')
234 |                 for comic in group:
235 |                     logfile.write('\t\t'+comic[FILENAME]+'\n')
236 | 
237 |         dupe_groups.sort()
238 | 
239 |         logfile.write('\n============= End of dupes identification ====================================\n\n\n\n')
240 |         logfile.write('============= Beginning dupes processing =====================================\n\n')
241 | 
242 |         del cl
243 | 
244 |     #
245 |     ##########################################################
246 | 
247 |     #
248 |     # Exception handling
249 |     #
250 | 
251 |     except NoRulesFileException, ex:
252 |         MessageBox.Show('ERROR: '+ str(ex), "ERROR in Rules File", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
253 |         logfile.write('\n\nERROR in Rules File:\n')
254 |         traceback.print_exc(None, logfile, False)
255 |         return
256 | 
257 | 
258 |     ###################### processing ########################################
259 | 
260 |     movedcomics = 0
261 | 
262 |     new_groups = []
263 | 
264 |     # fix for issue 4 - if there are no dupes, end gracefully
265 |     if len(dupe_groups) == 0:
266 |         MessageBox.Show('Script execution completed: No duplicates found in the comics selected', 'Success', MessageBoxButtons.OK, MessageBoxIcon.Information)
267 |         logfile.write('\n\n\n ########################################################### \n\n\n')
268 |         logfile.write('Script execution completed: No duplicates found in the comics selected')
269 | 
270 |         del dupe_groups
271 |         del new_groups
272 |         return
273 | 
274 |     for group in dupe_groups:
275 | 
276 |         t_group = group[:]
277 | 
278 |         logfile.write('\n= PROCESSING GROUP_____\n')
279 |         logfile.write('= '+ t_group[0][SERIES] + ' #'+str(t_group[0][NUMBER])+'\n')
280 | 
281 |         i_rules = 0
282 | 
283 |         while (len(t_group) > 1) and (i_rules < len(rules)):
284 |             t_rule = rules[i_rules][:]
285 | 
286 |             line = t_rule[0]
287 |             t_rule = t_rule[1:]
288 | 
289 |             logfile.write('\n_________________ ')
290 |             logfile.write(line)
291 |             logfile.write(' _________________\n')
292 |             logfile.flush()
293 | 
294 |             if options["debug"]:
295 |                 logfile.write('  ' + str(t_rule) + '\n')
296 | 
297 |             t_rule.append(t_group[:])
298 |             t_rule.append(logfile)
299 |             t_rule.insert(1,ComicRack)
300 |             t_rule.insert(1,options)
301 | 
302 |             t_group = globals()[t_rule[0]](*t_rule[1:])   ### this is the trick to call a function using a string with its name
303 | 
304 |             i_rules = i_rules+1
305 | 
306 |         new_groups.append(t_group)
307 | 
308 | 
309 |     dupe_groups = new_groups[:]
310 | 
311 |     remain_comics = len(reduce(list.__add__, new_groups))
312 | 
313 |     for group in dupe_groups:
314 |         if len(group) == 1: new_groups.remove(group)
315 | 
316 |     # new_groups now holds the remaining groups for logging purposes
317 | 
318 |     #if len(dupe_groups)>=1:
319 |     #    print 'Found ',len(dupe_groups), ' groups of dupes, with a total of ', len(reduce(list.__add__, dupe_groups)), ' comics.'
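    # For reference, how the dispatch above unpacks (hypothetical values): a
    # parsed rule ['filesize keep largest 10%', 'keep_filesize_largest', 10]
    # is rebuilt into ['keep_filesize_largest', options, ComicRack, 10,
    # t_group, logfile], so globals()[t_rule[0]](*t_rule[1:]) amounts to
    # calling keep_filesize_largest(options, ComicRack, 10, t_group, logfile).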
320 | 
321 |     #
322 |     # End of Main Loop
323 |     #
324 |     ###########################################################
325 | 
326 |     #### End report
327 | 
328 |     MessageBox.Show('Script execution completed correctly on: '+ str(len(books))+ ' books.\n - '+str(len(dupe_groups))+' duplicated groups processed.\n - '+str(len(new_groups))+' duplicated groups remain.\n - '+str(remain_comics)+' comics remain', 'Success', MessageBoxButtons.OK, MessageBoxIcon.Information)
329 |     logfile.write('\n\n\n ########################################################### \n\n\n')
330 |     logfile.write('Script execution completed correctly on: '+ str(len(books))+ ' books.\n'+str(len(dupe_groups))+' duplicated groups processed.\n'+str(len(new_groups))+' duplicated groups remain.\n'+str(remain_comics)+' comics remain')
331 | 
332 |     #### Garbage collecting
333 | 
334 |     del dupe_groups
335 |     del new_groups
336 | 
337 |     return
338 | 
339 | 
340 | #### ============================================================================================================================
341 | 
342 | ###############################
343 | #
344 | # Read and parse the rules file dmrules.dat
345 | #
346 | 
347 | def LoadRules(logfile, options):
348 | 
349 |     # Check if file exists
350 |     if not File.Exists(RULESFILE):
351 |         raise NoRulesFileException('Rules File (dmrules.dat) could not be found in the script directory ('+ SCRIPTDIRECTORY +')')
352 | 
353 |     # Read file
354 |     f = open(RULESFILE, 'r')
355 |     all_lines = f.readlines()
356 |     f.close()
357 | 
358 |     # Parse rules and filter out options
359 |     options_list = []
360 |     rules = []
361 |     for line in Parse(all_lines):
362 |         if line[2] == '@':
363 |             options_list.append(line)
364 |         elif line[2][0] == '@':
365 |             options_list.append(line[:2] + ['@', line[2][1:]] + line[3:])
366 |         else:
367 |             rules.append(line)
368 | 
369 |     logfile.write('\n\n============= Beginning options reading ==================================\n\n')
370 |     logfile.write('Successfully read the following options: \n\n')
371 |     for option in options_list:
372 |         logfile.write('\tLine ' + str(option[0]) + ': ' + str(option[3:]) + '\n')
373 |     logfile.write('\n')
374 | 
375 |     logfile.write('\n\n============= Beginning rules reading ==================================\n\n')
376 |     logfile.write('Successfully read the following rules: \n\n')
377 |     for rule in rules:
378 |         logfile.write('\tLine ' + str(rule[0]) + ': ' + str(rule[2:]) + '\n')
379 |     logfile.write('\n')
380 | 
381 |     bool_options = ("movefiles", "removefromlib", "updateinfo", "verbose", "debug")
382 |     int_options = ("sizemargin", "coverpages", "c2c_noads_gap")
383 | 
384 | 
385 |     #
386 |     # Parse options
387 |     #
388 |     # Checks that options which need a boolean or integer value in fact have one
389 | 
390 |     bDict = {"false":False, "true":True}
391 | 
392 |     for option in options_list:
393 |         opLineNum = option[0]
394 |         opLine = option[1]
395 | 
396 |         if len(option) != 5:
397 |             raise NoRulesFileException('Line ' + str(opLineNum) + ': Option "' + opLine + '" has wrong format')
398 | 
399 |         opName = option[3].lower()
400 |         opVal = option[4].lower()
401 | 
402 |         # boolean option
403 |         if opName in bool_options:
404 | 
405 |             if opVal in bDict.keys():
406 |                 options[opName] = bDict[opVal]
407 |             else:
408 |                 raise NoRulesFileException('Line ' + str(opLineNum) + ': Option "'+ opLine +'" value is invalid ("True" or "False" required)')
409 | 
410 |         # integer option
411 |         elif opName in int_options:
412 |             try:
413 |                 options[opName] = int(opVal)
414 |             except:
415 |                 raise NoRulesFileException('Line '
+ str(opLineNum) + ': Option "'+ opLine +'" value is invalid (integer required)') 416 | # failure 417 | else: 418 | raise NoRulesFileException('Line ' + str(opLineNum) + ': Option "'+ opLine +'" not recognized (' + str(opName) + ')') 419 | 420 | 421 | logfile.write('\n\n============= Beginning options parsing ==================================\n\n') 422 | logfile.write('Using the following options: \n\n') 423 | for option in options: 424 | logfile.write('\t'+option.upper() + " = " + str(options[option]).upper()+'\n') 425 | 426 | # 427 | # Parse rules 428 | # 429 | 430 | logfile.write('\n\n============= Beginning rules parsing ==================================\n\n') 431 | 432 | parsed_rules = [] 433 | 434 | for rule in rules: 435 | parsed_rules.append(ParseRule(rule)) 436 | 437 | if VERBOSE: 438 | logfile.write('\nParsed rules:\n\n') 439 | for rule in parsed_rules: 440 | logfile.write('\t\t'+str(rule)+'\n') 441 | logfile.write('\n============= End of rules parsing ======================================\n\n\n\n') 442 | 443 | return parsed_rules 444 | 445 | 446 | 447 | def AsPercentage(args, index, defVal): 448 | if index < len(args): 449 | text = args[index] 450 | else: 451 | text = defVal 452 | 453 | if text[-1] == '%': 454 | num = text[:-1] 455 | else: 456 | num = text 457 | 458 | try: 459 | return int(num) 460 | except: 461 | raise Exception('Invalid percentage value: ' + text) 462 | 463 | 464 | 465 | known_rules = [ 466 | [ ["pagecount", "keep", "fileless"], lambda args: ["keep_pagecount_fileless"] ], 467 | [ ["pagecount", "remove", "fileless"],lambda args: ["remove_pagecount_fileless"] ], 468 | [ ["pagecount", "keep", "largest"], lambda args: ["keep_pagecount_largest", AsPercentage(args, 0, "0%")] ], 469 | [ ["pagecount", "remove", "largest"], lambda args: ["remove_pagecount_largest", AsPercentage(args, 0, "0%")] ], 470 | [ ["pagecount", "keep", "smallest"], lambda args: ["keep_pagecount_smallest", AsPercentage(args, 0, "0%")] ], 471 | [ ["pagecount", "remove", "smallest"],lambda args: ["remove_pagecount_smallest", AsPercentage(args, 0, "0%")] ], 472 | [ ["pagecount", "keep", "noads"], lambda args: ["keep_pagecount_noads"] ], 473 | [ ["pagecount", "keep", "c2c"], lambda args: ["keep_pagecount_c2c"] ], 474 | [ ["filesize", "keep", "largest"], lambda args: ["keep_filesize_largest", AsPercentage(args, 0, "0%")] ], 475 | [ ["filesize", "remove", "largest"], lambda args: ["remove_filesize_largest", AsPercentage(args, 0, "0%")] ], 476 | [ ["filesize", "keep", "smallest"], lambda args: ["keep_filesize_smallest", AsPercentage(args, 0, "0%")] ], 477 | [ ["filesize", "remove", "smallest"], lambda args: ["remove_filesize_smallest", AsPercentage(args, 0, "0%")] ], 478 | [ ["pagesize", "keep", "largest"], lambda args: ["keep_pagesize_largest", AsPercentage(args, 0, "0%")] ], 479 | [ ["pagesize", "remove", "largest"], lambda args: ["remove_pagesize_largest", AsPercentage(args, 0, "0%")] ], 480 | [ ["pagesize", "keep", "smallest"], lambda args: ["keep_pagesize_smallest", AsPercentage(args, 0, "0%")] ], 481 | [ ["pagesize", "remove", "smallest"], lambda args: ["remove_pagesize_smallest", AsPercentage(args, 0, "0%")] ], 482 | [ ["covers", "keep", "some"], lambda args: ["keep_covers_all", False] ], 483 | [ ["covers", "keep", "all"], lambda args: ["keep_covers_all", True] ], 484 | [ ["filename", "keep"], lambda args: ["keep_with_words", args, [FILENAME]] ], 485 | [ ["filename", "remove"], lambda args: ["remove_with_words", args, [FILENAME]] ], 486 | [ ["filepath", "keep"], lambda args: 
["keep_with_words", args, [FILEPATH]] ], 487 | [ ["filepath", "remove"], lambda args: ["remove_with_words", args, [FILEPATH]] ], 488 | [ ["tags", "keep"], lambda args: ["keep_with_words", args, [TAGS]] ], 489 | [ ["tags", "remove"], lambda args: ["remove_with_words", args, [TAGS]] ], 490 | [ ["notes", "keep"], lambda args: ["keep_with_words", args, [NOTES]] ], 491 | [ ["notes", "remove"], lambda args: ["remove_with_words", args, [NOTES]] ], 492 | [ ["text", "keep"], lambda args: ["keep_with_words", args, [FILENAME, FILEPATH, TAGS, NOTES, SCAN]] ], 493 | [ ["text", "remove"], lambda args: ["remove_with_words", args, [FILENAME, FILEPATH, TAGS, NOTES, SCAN]] ], 494 | [ ["scan", "keep"], lambda args: ["keep_with_words", args, [SCAN]] ], 495 | [ ["scan", "remove"], lambda args: ["remove_with_words", args, [SCAN]] ], 496 | [ ["filetype", "keep"], lambda args: ["keep_with_words", args, [FILETYPE]] ], 497 | [ ["filetype", "remove"], lambda args: ["remove_with_words", args, [FILETYPE]] ], 498 | [ ["keep", "first"], lambda args: ["keep_first"] ], 499 | ] 500 | 501 | 502 | def ParseRule(rule): 503 | line_num = rule[0] 504 | line = rule[1] 505 | rule_tokens = rule[2:] 506 | 507 | # Try to match to a known command 508 | for cmd in known_rules: 509 | tokens = cmd[0] 510 | action = cmd[1] 511 | 512 | if len(rule_tokens) < len(tokens): 513 | continue 514 | 515 | # Check it the rule matches the command 516 | matches = True 517 | for i in range(len(tokens)): 518 | if tokens[i] != rule_tokens[i]: 519 | matches = False 520 | break; 521 | 522 | if not matches: 523 | continue 524 | 525 | args = rule_tokens[len(tokens):] 526 | 527 | try: 528 | result = [line] 529 | result.extend(action(args)) 530 | return result 531 | except Exception, ex: 532 | raise NoRulesFileException('Line ' + str(line_num) + ': ' + str(ex) + '\n' + line) 533 | 534 | # If got here not command was matched 535 | raise NoRulesFileException('Line ' + str(line_num) + ': Rule could not be parsed:\n' + line) 536 | 537 | 538 | 539 | class NoRulesFileException(Exception): 540 | pass 541 | 542 | -------------------------------------------------------------------------------- /duplicatesmanager.xcf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/duplicatesmanager.xcf -------------------------------------------------------------------------------- /duplicatesmanager_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/duplicatesmanager_small.png -------------------------------------------------------------------------------- /getcvdb.py: -------------------------------------------------------------------------------- 1 | ##################################################################################################### 2 | ## 3 | ## getcvdb.py - part of duplicatemanager, a script for comicrack 4 | ## 5 | ## Author: cbanack, for his Comic Vine Scraper module 6 | ## 7 | ## 8 | ###################################################################################################### 9 | 10 | ### original credits 11 | ''' 12 | This module contains utility methods for working with ComicRack 13 | ComicBook objects (i.e. 'book' objects). 
14 | 15 | @author: Cory Banack 16 | ''' 17 | ######### 18 | 19 | 20 | import re 21 | 22 | # ============================================================================= 23 | def extract_issue_ref(book): 24 | ''' 25 | This method looks in the Tags and Notes fields of the given book for 26 | evidence that the given ComicBook has been scraped before. If possible, 27 | it will construct an IssueRef based on that evidence, and return it. 28 | If not, it will return None. 29 | 30 | If the user has manually added a "skip" flag to one of those fields, this 31 | method will return the string "skip", which should be interpreted as 32 | "never scrape this book". 33 | ''' 34 | 35 | 36 | tag_found = re.search(r'(?i)CVDB(\d{1,})', book.Tags) 37 | if not tag_found: 38 | tag_found = re.search(r'(?i)CVDB(\d{1,})', book.Notes) 39 | if not tag_found: 40 | tag_found = re.search(r'(?i)ComicVine.?\[(\d{1,})', book.Notes) 41 | 42 | retval = None 43 | if tag_found: 44 | retval = tag_found.group(1).lower() 45 | 46 | return retval 47 | -------------------------------------------------------------------------------- /processfunctions.py: -------------------------------------------------------------------------------- 1 | ##################################################################################################### 2 | ## 3 | ## processfunctions.py - part of duplicatemanager, a script for comicrack 4 | ## 5 | ## Author: perezmu, pescuma 6 | ## 7 | ## Copyleft perezmu 2011. 8 | ## 9 | ###################################################################################################### 10 | 11 | ######### 12 | # 13 | # Import section 14 | 15 | 16 | import constants 17 | 18 | import re 19 | import clr 20 | import System 21 | import System.IO 22 | from System.IO import Path, Directory, File, FileInfo 23 | 24 | clr.AddReference("System.Windows.Forms") 25 | from System.Windows.Forms import DialogResult, MessageBox, MessageBoxButtons, MessageBoxIcon 26 | 27 | 28 | from itertools import groupby 29 | from dmBookWrapper import * 30 | from utilsbycory import * 31 | 32 | from constants import * 33 | 34 | PAGESIZE = -1 35 | 36 | # 37 | # 38 | ########## 39 | 40 | 41 | 42 | ################################################################################################# 43 | 44 | 45 | # ================ PAGECOUNT FUNCTIONS ========================================================== 46 | 47 | 48 | def keep_pagecount_noads(options, cr, dgroup, logfile): 49 | ''' Keeps from the 'group' the ones that seem to be 'noads' (less pages) 50 | dgroup -> list of duplicate comics 51 | logfile -> file object ''' 52 | 53 | to_keep = [] 54 | to_remove =[] 55 | 56 | by_size = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=False) # sorts by filesize of covers 57 | 58 | for comic in dgroup: 59 | if comic[PAGECOUNT] <= options["coverpages"]: 60 | by_size.remove(comic) 61 | to_keep.append(comic) 62 | if comic[PAGECOUNT] == 0: logfile.write('skipping... '+ comic[SERIES]+' #' + comic[NUMBER] + ' (fileless)\n') 63 | else: logfile.write('skipping... '+ comic[FILENAME]+' #' + comic[NUMBER] +' (pages '+str(comic[PAGECOUNT])+')\n') 64 | 65 | i=0 #keeps the first one 66 | to_keep.append(by_size[i]) 67 | logfile.write('keeping... 
'+ by_size[i][FILENAME]+' (pages '+str(by_size[i][PAGECOUNT])+')\n')
 68 | 
 69 | 
 70 |     while (i < len(by_size)-1) and (int(by_size[i+1][PAGECOUNT]) < (int(by_size[i][PAGECOUNT]) + int(options["c2c_noads_gap"]))):
 71 |         to_keep.append(by_size[i+1])
 72 |         logfile.write('keeping... '+ by_size[i+1][FILENAME]+' (pages '+str(by_size[i+1][PAGECOUNT])+')\n')
 73 |         i = i+1
 74 |     for j in range (i+1,len(by_size)):
 75 |         to_remove.append(by_size[j])
 76 |         logfile.write('removing... '+ by_size[j][FILENAME]+' (pages '+str(by_size[j][PAGECOUNT])+')\n')
 77 | 
 78 |     if to_remove != []:
 79 |         updateinfo(options, to_remove, to_keep, logfile)
 80 |         deletecomics(options, cr, to_remove, logfile)
 81 | 
 82 |     return to_keep[:]
 83 | 
 84 | 
 85 | def keep_pagecount_c2c(options, cr, dgroup, logfile):
 86 |     ''' Keeps from the 'group' the ones that seem to be 'c2c' (more pages)
 87 |         dgroup -> list of duplicate comics
 88 |         logfile -> file object '''
 89 | 
 90 |     to_keep = []
 91 |     to_remove = []
 92 | 
 93 |     by_size = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=True)   # sorts by page count, descending
 94 | 
 95 |     i=0   # keeps the first one
 96 |     to_keep.append(by_size[i])
 97 |     logfile.write('keeping... '+ by_size[i][FILENAME]+' (pages '+str(by_size[i][PAGECOUNT])+')\n')
 98 | 
 99 |     while (i < len(by_size)-1) and (int(by_size[i+1][PAGECOUNT]) > (int(by_size[i][PAGECOUNT]) - int(options["c2c_noads_gap"]))):
100 |         to_keep.append(by_size[i+1])
101 |         logfile.write('keeping... '+ by_size[i+1][FILENAME]+' (pages '+str(by_size[i+1][PAGECOUNT])+')\n')
102 |         i = i+1
103 |     for j in range (i+1,len(by_size)):
104 |         to_remove.append(by_size[j])
105 |         logfile.write('removing... '+ by_size[j][FILENAME]+' (pages '+str(by_size[j][PAGECOUNT])+')\n')
106 | 
107 |     if to_remove != []:
108 |         updateinfo(options, to_remove, to_keep, logfile)
109 |         deletecomics(options, cr, to_remove, logfile)
110 | 
111 |     return to_keep[:]
112 | 
113 | 
114 | def process_pagecount_largest(options, cr, percentage, dgroup, logfile, test_to_keep):
115 |     ''' Keeps from the 'dgroup' the ones with most pages
116 |         dgroup -> list of duplicate comics
117 |         logfile -> file object
118 |         percentage -> a percentage over the page count that is used to keep more comics
119 |         test_to_keep -> True to keep the largest, False to remove
120 |     '''
121 | 
122 |     by_pages = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=True)   # sorts by number of pages
123 |     min_pages = by_pages[0][PAGECOUNT] * (1 - percentage/100.0)
124 | 
125 |     if options["verbose"]:
126 |         logfile.write('Filtering all files with at least ' + str(min_pages) + ' pages\n')
127 | 
128 |     def IsToKeep(comic):
129 |         return comic[PAGECOUNT] >= min_pages
130 | 
131 |     return process_dups(options, cr, IsToKeep, test_to_keep, [PAGECOUNT], dgroup, logfile)
132 | 
133 | def keep_pagecount_largest(options, cr, percentage, dgroup, logfile):
134 |     return process_pagecount_largest(options, cr, percentage, dgroup, logfile, True)
135 | 
136 | def remove_pagecount_largest(options, cr, percentage, dgroup, logfile):
137 |     return process_pagecount_largest(options, cr, percentage, dgroup, logfile, False)
138 | 
139 | 
140 | def process_pagecount_smallest(options, cr, percentage, dgroup, logfile, test_to_keep):
141 |     ''' Keeps from the 'group' the ones with fewest pages
142 |         dgroup -> list of duplicate comics
143 |         logfile -> file object
144 |         percentage -> a percentage over the page count that is used to keep more comics
145 |         test_to_keep -> True to keep the smallest, False to remove
146 |     '''
147 | 
148 |     by_pages = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=False)   # sorts by number of pages, ascending
149 | 
150 |     # drop fileless comics from the baseline (iterate over a copy so removal is safe)
151 |     for comic in by_pages[:]:
152 |         if comic[PAGECOUNT] == 0:
153 |             by_pages.remove(comic)
154 | 
155 |     if len(by_pages) < 1:
156 |         max_pages = 0
157 |     else:
158 |         max_pages = by_pages[0][PAGECOUNT] * (1 + percentage/100.0)
159 | 
160 |     if options["verbose"]:
161 |         logfile.write('Filtering all files with at max ' + str(max_pages) + ' pages\n')
162 | 
163 |     def IsToKeep(comic):
164 |         return comic[PAGECOUNT] <= max_pages
165 | 
166 |     return process_dups(options, cr, IsToKeep, test_to_keep, [PAGECOUNT], dgroup, logfile)
167 | 
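# Worked example for the smallest-pagecount filter above (hypothetical
# numbers): page counts [0, 20, 22, 40] with percentage=10 -> the fileless
# (0-page) entry is skipped when picking the baseline, so max_pages =
# 20 * 1.1 = 22.0; the 20- and 22-page books (and the fileless entry) pass
# IsToKeep, while the 40-page book fails and is kept or removed according
# to test_to_keep.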
168 | def keep_pagecount_smallest(options, cr, percentage, dgroup, logfile):
169 |     return process_pagecount_smallest(options, cr, percentage, dgroup, logfile, True)
170 | 
171 | def remove_pagecount_smallest(options, cr, percentage, dgroup, logfile):
172 |     return process_pagecount_smallest(options, cr, percentage, dgroup, logfile, False)
173 | 
174 | 
175 | def keep_pagecount_fileless(options, cr, dgroup, logfile):
176 |     ''' Keeps only fileless comics
177 |         dgroup -> list of duplicate comics
178 |         logfile -> file object '''
179 | 
180 |     def IsToKeep(comic):
181 |         return comic[FILENAME] == "Fileless"
182 | 
183 |     return process_dups(options, cr, IsToKeep, True, [PAGECOUNT], dgroup, logfile)
184 | 
185 | 
186 | def remove_pagecount_fileless(options, cr, dgroup, logfile):
187 |     ''' Removes fileless comics
188 |         dgroup -> list of duplicate comics
189 |         logfile -> file object '''
190 | 
191 |     to_keep = dgroup[:]
192 |     to_remove = []
193 |     fileless_thumb = []
194 |     fileless_nothumb = []
195 | 
196 |     # First separate all fileless
197 |     for comic in dgroup:
198 | 
199 |         if comic[FILENAME] == "Fileless":
200 |             to_keep.remove(comic)
201 |             if comic[BOOK].CustomThumbnailKey == None:
202 |                 fileless_nothumb.append(comic)
203 |             else:
204 |                 fileless_thumb.append(comic)
205 | 
206 |     if len(to_keep) == 0:                                # all are fileless
207 |         if len(fileless_nothumb) == len(dgroup):         # none has a custom thumb
208 |             to_keep.append(fileless_nothumb[0])          # keep the first one
209 |             fileless_nothumb.pop(0)
210 |             to_remove = fileless_nothumb[:]
211 |         elif len(fileless_thumb) == 1:                   # only one with a custom thumb
212 |             to_keep = fileless_thumb[:]
213 |             to_remove = fileless_nothumb[:]
214 |         else:                                            # more than one with a custom thumb
215 |             to_keep.append(fileless_thumb[0])            # keep the first one
216 |             fileless_thumb.pop(0)
217 |             to_remove = fileless_thumb[:]
218 |             to_remove.extend(fileless_nothumb)
219 | 
220 |     else:    # if there were non-fileless comics, remove all fileless
221 |         to_remove.extend(fileless_thumb)
222 |         to_remove.extend(fileless_nothumb)
223 | 
224 | 
225 |     for comic in fileless_nothumb:
226 |         logfile.write('removing... '+ comic[SERIES]+' #' + comic[NUMBER] + ' (fileless + no cover)\n')
227 |     for comic in fileless_thumb:
228 |         logfile.write('removing... '+ comic[SERIES]+' #' + comic[NUMBER] + ' (fileless + custom cover)\n')
229 |     for comic in to_keep:
230 |         logfile.write('keeping...
'+ comic[FILENAME]+' (pages '+str(comic[PAGECOUNT])+')\n') 231 | 232 | if to_remove != []: 233 | updateinfo(options, to_remove, to_keep, logfile) 234 | deletecomics(options, cr, to_remove, logfile) 235 | 236 | return to_keep[:] 237 | 238 | 239 | 240 | # =================== FILESIZE FUNCTIONS ======================================================== 241 | 242 | 243 | def process_filesize_largest(options, cr, percentage, dgroup, logfile, test_to_keep): 244 | ''' Keeps from the 'group' the largest comic 245 | dgroup -> list of duplicate comics 246 | logfile -> file object 247 | percentage -> a percentage over the size that is used to keep more comics 248 | test_to_keep -> True to keep largest, False to remove 249 | ''' 250 | 251 | by_size = sorted(dgroup, key=lambda dgroup: dgroup[FILESIZE], reverse=True) # sorts by filesize of covers 252 | min_size = by_size[0][FILESIZE] * (1 - percentage/100.0) 253 | 254 | if options["verbose"]: 255 | logfile.write('Filtering all files with size at least ' + str(min_size) + '\n') 256 | 257 | def IsToKeep(comic): 258 | return comic[FILESIZE] >= min_size 259 | 260 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE], dgroup, logfile) 261 | 262 | def keep_filesize_largest(options, cr, percentage, dgroup, logfile): 263 | return process_filesize_largest(options, cr, percentage, dgroup, logfile, True) 264 | 265 | def remove_filesize_largest(options, cr, percentage, dgroup, logfile): 266 | return process_filesize_largest(options, cr, percentage, dgroup, logfile, False) 267 | 268 | 269 | def process_filesize_smallest(options, cr, percentage, dgroup, logfile, test_to_keep): 270 | ''' Keeps from the 'group' the smallest comic 271 | dgroup -> list of duplicate comics 272 | logfile -> file object 273 | percentage -> a percentage over the size that is used to keep more comics 274 | test_to_keep -> True to keep smallest, False to remove 275 | ''' 276 | 277 | by_size = sorted(dgroup, key=lambda dgroup: dgroup[FILESIZE], reverse=False) # sorts by filesize of covers 278 | 279 | # keep fileless 280 | for comic in by_size: 281 | if comic[PAGECOUNT] == 0: 282 | by_size.remove(comic) 283 | 284 | if len(by_size) < 1: 285 | max_size = 0 286 | else: 287 | max_size = by_size[0][FILESIZE] * (1 + percentage/100.0) 288 | 289 | if options["verbose"]: 290 | logfile.write('Filtering all files with size at max ' + str(max_size) + '\n') 291 | 292 | def IsToKeep(comic): 293 | return comic[FILESIZE] <= max_size 294 | 295 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE], dgroup, logfile) 296 | 297 | def keep_filesize_smallest(options, cr, percentage, dgroup, logfile): 298 | return process_filesize_smallest(options, cr, percentage, dgroup, logfile, True) 299 | 300 | def remove_filesize_smallest(options, cr, percentage, dgroup, logfile): 301 | return process_filesize_smallest(options, cr, percentage, dgroup, logfile, False) 302 | 303 | 304 | 305 | # =================== PAGESIZE FUNCTIONS ======================================================== 306 | 307 | def pagesize(comic): 308 | if comic[PAGECOUNT] == 0: 309 | return 0 310 | else: 311 | return comic[FILESIZE] / comic[PAGECOUNT] 312 | 313 | def process_pagesize_largest(options, cr, percentage, dgroup, logfile, test_to_keep): 314 | ''' Keeps from the 'group' the comic with largest page size 315 | dgroup -> list of duplicate comics 316 | logfile -> file object 317 | percentage -> a percentage over the size that is used to keep more comics 318 | test_to_keep -> True to keep largest, False to remove 319 | ''' 
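    # Worked example (hypothetical numbers): per-page sizes of 2.0, 1.9 and
    # 1.2 MB/page with percentage=10 give min_size = 2.0 * 0.9 = 1.8, so the
    # 2.0 and 1.9 MB/page books pass IsToKeep while the 1.2 MB/page one fails,
    # and test_to_keep decides whether passing means keeping or removing.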
320 | 321 | by_size = sorted(dgroup, key=lambda dgroup: pagesize(dgroup), reverse=True) # sorts by filesize of covers 322 | min_size = pagesize(by_size[0]) * (1 - percentage/100.0) 323 | 324 | if options["verbose"]: 325 | logfile.write('Filtering all files with page size at least ' + str(min_size) + '\n') 326 | 327 | def IsToKeep(comic): 328 | return pagesize(comic) >= min_size 329 | 330 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE, PAGECOUNT, PAGESIZE], dgroup, logfile) 331 | 332 | def keep_pagesize_largest(options, cr, percentage, dgroup, logfile): 333 | return process_pagesize_largest(options, cr, percentage, dgroup, logfile, True) 334 | 335 | def remove_pagesize_largest(options, cr, percentage, dgroup, logfile): 336 | return process_pagesize_largest(options, cr, percentage, dgroup, logfile, False) 337 | 338 | 339 | def process_pagesize_smallest(options, cr, percentage, dgroup, logfile, test_to_keep): 340 | ''' Keeps from the 'group' the comic with smallest page size 341 | dgroup -> list of duplicate comics 342 | logfile -> file object 343 | percentage -> a percentage over the size that is used to keep more comics 344 | test_to_keep -> True to keep smallest, False to remove 345 | ''' 346 | 347 | by_size = sorted(dgroup, key=lambda dgroup: pagesize(dgroup), reverse=False) # sorts by filesize of covers 348 | 349 | # keep fileless 350 | for comic in by_size: 351 | if comic[PAGECOUNT] == 0: 352 | by_size.remove(comic) 353 | 354 | if len(by_size) < 1: 355 | max_size = 0 356 | else: 357 | max_size = pagesize(by_size[0]) * (1 + percentage/100.0) 358 | 359 | if options["verbose"]: 360 | logfile.write('Filtering all files with page size at max ' + str(max_size) + '\n') 361 | 362 | def IsToKeep(comic): 363 | return pagesize(comic) <= max_size 364 | 365 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE, PAGECOUNT, PAGESIZE], dgroup, logfile) 366 | 367 | def keep_pagesize_smallest(options, cr, percentage, dgroup, logfile): 368 | return process_pagesize_smallest(options, cr, percentage, dgroup, logfile, True) 369 | 370 | def remove_pagesize_smallest(options, cr, percentage, dgroup, logfile): 371 | return process_pagesize_smallest(options, cr, percentage, dgroup, logfile, False) 372 | 373 | 374 | 375 | # =================== COVERS FUNCTIONS ============================================================ 376 | 377 | 378 | def keep_covers_all(options, cr, option, dgroup, logfile): 379 | ''' Keeps from the 'group' the comics with largest number of '(n covers)' in the file name 380 | dgroup -> list of duplicate comics 381 | logfile -> file object 382 | option -> boolean: True means all comics with or without 'covers' are considered, meaning that 383 | if there is a single comic with 'covers' the rest will be deleted. 
False means 384 | that only those comics with the 'covers' word will be considered in the process ''' 385 | 386 | with_covers = [] 387 | to_keep = [] 388 | to_remove = [] 389 | 390 | for comic in dgroup: 391 | searchstring = convertnumberwords(comic[FILENAME],False) 392 | searchstring = searchstring.replace("(both","(2") 393 | searchstring = searchstring.lower() 394 | 395 | 396 | m = re.search('\((\d*) +covers\)', searchstring) 397 | if m: 398 | with_covers.append((comic, int(m.groups(0)[0]))) 399 | else: 400 | if option == True: 401 | with_covers.append((comic,1)) 402 | else: 403 | to_keep.append(comic) 404 | 405 | if with_covers != []: 406 | dgroup = [] 407 | with_covers = sorted(with_covers, key=lambda to_keep: to_keep[1], reverse=True) # sorts by number of covers 408 | max = with_covers[0][1] # max number of covers found 409 | 410 | temp_with_covers = with_covers[:] 411 | for (comic,covers) in temp_with_covers: 412 | if covers < max: 413 | with_covers.remove((comic,covers)) 414 | to_remove.append(comic) 415 | logfile.write('removing... '+ comic[FILENAME]+'\n') 416 | else: 417 | # logfile.write('keeping... '+ comic[FILENAME]+'\n') 418 | to_keep.append(comic) 419 | 420 | 421 | for comic in to_keep: 422 | dgroup.append(comic) 423 | logfile.write('keeping... '+ comic[FILENAME]+'\n') 424 | 425 | if to_remove != []: 426 | updateinfo(options, to_remove, dgroup, logfile) 427 | deletecomics(options, cr, to_remove, logfile) 428 | 429 | del with_covers 430 | del to_keep 431 | del to_remove 432 | 433 | return dgroup 434 | 435 | 436 | 437 | # =================== WORD SEARCH FUNCTIONS ======================================================== 438 | 439 | def fix_words_for_testing(words): 440 | wordlist = [] 441 | 442 | for word in words: 443 | word = word.lower() 444 | 445 | ''' some common substitutions .... 
436 | 
437 | # ===================  WORD SEARCH FUNCTIONS  ========================================================
438 | 
439 | def fix_words_for_testing(words):
440 |     wordlist = []
441 | 
442 |     for word in words:
443 |         word = word.lower()
444 | 
445 |         ''' some common substitutions .... more can be added '''
446 |         if word in ('c2c', 'ctc', 'fiche'):
447 |             wordlist.extend(['c2c', 'ctc', 'fiche'])
448 |         elif word in ('noads',):   # tuple, not bare string: 'in' on a string would also match substrings like 'no'
449 |             wordlist.extend(['noads', 'no ads'])
450 |         elif word in ('(f)', 'fixed'):
451 |             wordlist.extend(['(f)', 'fixed'])
452 |         elif word in ('(f)', 'fiche'):   # unreachable: both words are already caught by the branches above
453 |             wordlist.extend(['(f)', 'fiche'])
454 |         elif word in ('zip', 'cbz'):
455 |             wordlist.extend(['zip', 'cbz'])
456 |         elif word in ('rar', 'cbr'):
457 |             wordlist.extend(['rar', 'cbr'])
458 |         else:
459 |             wordlist.append(cleanupseries(word))
460 | 
461 |     return wordlist
462 | 
463 | 
464 | def process_with_words(options, cr, words, items, dgroup, logfile, test_to_keep):
465 |     ''' Removes from the 'group' all comics that do not include any of the 'words'
466 |         in the fields 'items'
467 |         dgroup -> list of duplicate comics
468 |         logfile -> file object
469 |         words -> text strings to be searched for
470 |         items -> LIST of fields to search in '''
471 | 
472 |     wordlist = fix_words_for_testing(words)
473 | 
474 |     def IsToKeep(comic):
475 |         searchstring = ""
476 |         for item in items:
477 |             searchstring = searchstring + " " + cleanupseries(comic[item])
478 |             ''' adds all search strings together '''
479 | 
480 |         for word in wordlist:
481 |             if searchstring.find(word) != -1:   # word found
482 |                 return True
483 | 
484 |         return False
485 | 
486 |     return process_dups(options, cr, IsToKeep, test_to_keep, items, dgroup, logfile)
487 | 
488 | def keep_with_words(options, cr, words, items, dgroup, logfile):
489 |     return process_with_words(options, cr, words, items, dgroup, logfile, True)
490 | 
491 | def remove_with_words(options, cr, words, items, dgroup, logfile):
492 |     return process_with_words(options, cr, words, items, dgroup, logfile, False)
493 | 
494 | 
495 | def keep_first(options, cr, dgroup, logfile):
496 |     ''' Keeps only the first comic in the group
497 |         dgroup -> list of duplicate comics
498 |         logfile -> file object '''
499 | 
500 |     to_keep = dgroup[0]
501 | 
502 |     def IsToKeep(book):
503 |         return book == to_keep
504 | 
505 |     return process_dups(options, cr, IsToKeep, True, [], dgroup, logfile)
506 | 
507 | 
508 | 
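# --- Illustrative sketch (annotation, not part of the original file) ---
# fix_words_for_testing expands a rule word into its synonym list (given the
# tuple fix above), so a rule like "keep with noads" also matches files
# tagged 'no ads':
def _example_word_expansion():
    assert fix_words_for_testing(['noads']) == ['noads', 'no ads']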
509 | ###################################################################################################
510 | 
511 | # ================ BASE FUNCTION TO HANDLE THE DUPS ================================================
512 | 
513 | 
514 | def process_dups(options, cr, test_to_keep, keep_if_test_is_true, fields, dgroup, logfile):
515 |     ''' Removes from the 'group' all comics for which test_to_keep(comic) != keep_if_test_is_true
516 |         dgroup -> list of duplicate comics
517 |         logfile -> file object
518 |         test_to_keep -> function that does the testing '''
519 | 
520 |     to_keep = []
521 |     to_remove = []
522 | 
523 |     for comic in dgroup:
524 |         if test_to_keep(comic) == keep_if_test_is_true:
525 |             if comic not in to_keep:
526 |                 to_keep.append(comic)
527 |             continue
528 | 
529 |         if comic not in to_keep:
530 |             to_remove.append(comic)
531 | 
532 |     # Make sure at least 1 book remains!!!!
533 |     if len(to_keep) < 1:
534 |         logfile.write('Filter would remove all items, so it will be ignored\n')
535 |         to_keep = dgroup[:]
536 |         to_remove = []
537 | 
538 |     # Log comic actions
539 |     for comic in dgroup:
540 |         if comic in to_keep:
541 |             logfile.write('keeping... ')
542 |         else:
543 |             logfile.write('removing... ')
544 | 
545 |         logfile.write(comic[FILENAME])
546 | 
547 |         if options["verbose"] and len(fields) > 0:
548 |             logfile.write(' (')
549 |             for i in range(len(fields)):
550 |                 if i > 0:
551 |                     logfile.write(' ')
552 |                 f = fields[i]
553 |                 if f == PAGESIZE:
554 |                     logfile.write('pagesize=' + ToString(pagesize(comic)))
555 |                 else:
556 |                     logfile.write(FIELD_NAMES[f] + '=' + ToString(comic[f]))
557 |             logfile.write(')')
558 | 
559 |         logfile.write('\n')
560 |         logfile.flush()
561 | 
562 |     # Delete books
563 |     if to_remove != []:
564 |         updateinfo(options, to_remove, to_keep, logfile)
565 |         deletecomics(options, cr, to_remove, logfile)
566 | 
567 |     del to_remove
568 | 
569 |     return to_keep
570 | 
571 | 
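# --- Illustrative sketch (annotation, not part of the original file) ---
# Every rule funnels into process_dups: it splits the group with a predicate,
# refuses to empty it, and returns the survivors, which the next rule in
# dmrules.dat then receives. A simplified stand-alone analogue:
def _example_rule_pipeline():
    group = [{'name': 'a', 'pages': 24}, {'name': 'b', 'pages': 0}]
    survivors = [c for c in group if c['pages'] > 0] or group   # 'or group' mirrors the keep-at-least-one guard
    assert [c['name'] for c in survivors] == ['a']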
572 | # ================ UPDATE INFO FUNCTION ============================================================
573 | 
574 | # Copy missing data from removed files to kept files
575 | def updateinfo(options, to_remove_files, to_keep_files, logfile):
576 |     to_remove = []
577 |     for book in to_remove_files:
578 |         to_remove.append(dmBookWrapper(book[BOOK]))
579 |     to_keep = []
580 |     for book in to_keep_files:
581 |         to_keep.append(dmBookWrapper(book[BOOK]))
582 | 
583 |     for field in FIELDS_TO_UPDATE_INFO:
584 |         data = None
585 | 
586 |         # Get available data
587 |         for book in to_remove:
588 |             book_data = getattr(book, field[0])
589 | 
590 |             if options["debug"]:
591 |                 logfile.write(' rem: ' + book.FileName + ': ' + field[0] + ' = ' + ToString(book_data) + '\n')
592 | 
593 |             if book_data:
594 |                 data = book_data
595 |                 break
596 | 
597 |         if not data:
598 |             continue
599 | 
600 |         try:
601 |             data = field[1](data)
602 |         except:
603 |             if options["verbose"]:
604 |                 logfile.write('updating... Could not convert data to correct type ' + field[0] + ' = ' + ToString(data) + '\n')
605 |             continue
606 | 
607 |         # Set in missing books
608 |         for book in to_keep:
609 |             book_data = getattr(book, field[0])
610 | 
611 |             if options["debug"]:
612 |                 logfile.write(' keep: ' + book.FileName + ': ' + field[0] + ' = ' + ToString(book_data) + '\n')
613 | 
614 |             if not book_data:
615 |                 if not options["updateinfo"]:
616 |                     logfile.write('[simulation] ')
617 |                 logfile.write('updating... ' + book.FileName + ': ' + field[0] + ' = ' + ToString(data) + '\n')
618 | 
619 |                 if options["updateinfo"]:
620 |                     setattr(book.raw, field[0], data)
621 | 
622 | 
623 | # ================ DELETE COMICS FUNCTION ==========================================================
624 | 
625 | def deletecomics(options, cr, deletelist, logfile):
626 |     ''' Moves or deletes the specified comics and removes them from the library '''
627 | 
628 |     ''' Mostly ripped from StonePawn's Library Organizer script '''
629 | 
630 |     if not Directory.Exists(DUPESDIRECTORY):
631 |         try:
632 |             Directory.CreateDirectory(DUPESDIRECTORY)
633 |         except Exception, ex:
634 |             MessageBox.Show('ERROR: ' + str(ex), "ERROR creating dump directory " + DUPESDIRECTORY, MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
635 |             logfile.write('ERROR: ' + str(ex) + '\n')
636 |             return
637 | 
638 |     for comic in deletelist:
639 | 
640 |         if options["movefiles"]:
641 |             fullpath = Path.Combine(DUPESDIRECTORY, comic[FILENAME])
642 | 
643 |             # Check if the file currently exists at all
644 | 
645 |             if comic[FILENAME] != 'Fileless' and File.Exists(comic[FILEPATH]):
646 |                 # If the book is already in the location we don't have to do anything
647 |                 if fullpath == comic[FILEPATH]:
648 | 
649 |                     #print "books path is the same"
650 |                     logfile.write("\n\nSkipped moving book " + comic[FILEPATH] + " because it is already located at the calculated path")
651 |                     dmCleanDirectories(DirectoryInfo(Path.GetDirectoryName(comic[FILEPATH])))   # assumed intent (the original referenced an undefined 'path'): prune the book's folder if it was left empty
652 | 
653 |             if comic[FILENAME] != 'Fileless' and not File.Exists(fullpath):
654 |                 try:
655 |                     File.Move(comic[FILEPATH], fullpath)
656 |                     comic[BOOK].FilePath = fullpath   # update new file path
657 |                     logfile.write('---MOVED... ' + comic[FILENAME] + '\n')   # logged only after a successful move
658 |                 except Exception, ex:
659 |                     MessageBox.Show('ERROR: ' + str(ex) + ' while trying to move ' + comic[FILENAME], 'MOVE ERROR', MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
660 |                     logfile.write('ERROR: ' + str(ex) + '\n')
661 |             else:
662 |                 logfile.write('WARNING: ' + comic[FILENAME] + ' could not be moved\n')
663 | 
664 | 
665 |         if options["removefromlib"]:
666 |             try:
667 |                 cr.App.RemoveBook(comic[BOOK])
668 |                 logfile.write('---REMOVED FROM LIBRARY... ' + comic[FILENAME] + '\n')
669 |             except:
670 |                 logfile.write('---COULD NOT REMOVE FROM LIBRARY... ' + comic[FILENAME] + '\n')
671 | 
672 | 
673 |     return
674 | 
675 | 
676 | def dmCleanDirectories(directory):
677 |     # 'directory' should be a DirectoryInfo object
678 |     if not directory.Exists:
679 |         return
680 |     if len(directory.GetFiles()) == 0 and len(directory.GetDirectories()) == 0:
681 |         parent = directory.Parent
682 |         directory.Delete()
683 |         dmCleanDirectories(parent)   # recurse upwards (the original called an undefined 'CleanDirectories')
684 | 
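# --- Illustrative sketch (annotation, not part of the original file) ---
# dmCleanDirectories prunes a folder and then its parents for as long as they
# are empty; the same idea with the standard os module instead of the .NET
# DirectoryInfo API used above:
def _example_prune_empty_dirs(path):
    import os
    while path and os.path.isdir(path) and not os.listdir(path):
        parent = os.path.dirname(path)
        os.rmdir(path)    # the folder is empty, so this succeeds
        path = parent     # then try the parent, until one is non-empty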
--------------------------------------------------------------------------------
/re.py:
--------------------------------------------------------------------------------
1 | #
2 | # Secret Labs' Regular Expression Engine
3 | #
4 | # re-compatible interface for the sre matching engine
5 | #
6 | # Copyright (c) 1998-2001 by Secret Labs AB.  All rights reserved.
7 | #
8 | # This version of the SRE library can be redistributed under CNRI's
9 | # Python 1.6 license.  For any other use, please contact Secret Labs
10 | # AB (info@pythonware.com).
11 | #
12 | # Portions of this engine have been developed in cooperation with
13 | # CNRI.  Hewlett-Packard provided funding for 1.6 integration and
14 | # other compatibility work.
15 | #
16 | 
17 | r"""Support for regular expressions (RE).
18 | 
19 | This module provides regular expression matching operations similar to
20 | those found in Perl.  It supports both 8-bit and Unicode strings; both
21 | the pattern and the strings being processed can contain null bytes and
22 | characters outside the US ASCII range.
23 | 
24 | Regular expressions can contain both special and ordinary characters.
25 | Most ordinary characters, like "A", "a", or "0", are the simplest
26 | regular expressions; they simply match themselves.  You can
27 | concatenate ordinary characters, so last matches the string 'last'.
28 | 
29 | The special characters are:
30 |     "."      Matches any character except a newline.
31 |     "^"      Matches the start of the string.
32 |     "$"      Matches the end of the string or just before the newline at
33 |              the end of the string.
34 |     "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
35 |              Greedy means that it will match as many repetitions as possible.
36 |     "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
37 |     "?"      Matches 0 or 1 (greedy) of the preceding RE.
38 |     *?,+?,?? Non-greedy versions of the previous three special characters.
39 |     {m,n}    Matches from m to n repetitions of the preceding RE.
40 |     {m,n}?   Non-greedy version of the above.
41 |     "\\"     Either escapes special characters or signals a special sequence.
42 |     []       Indicates a set of characters.
43 |              A "^" as the first character indicates a complementing set.
44 |     "|"      A|B, creates an RE that will match either A or B.
45 |     (...)    Matches the RE inside the parentheses.
46 |              The contents can be retrieved or matched later in the string.
47 |     (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
48 |     (?:...)  Non-grouping version of regular parentheses.
49 |     (?P<name>...) The substring matched by the group is accessible by name.
50 |     (?P=name)     Matches the text matched earlier by the group named name.
51 |     (?#...)  A comment; ignored.
52 |     (?=...)  Matches if ... matches next, but doesn't consume the string.
53 |     (?!...)  Matches if ... doesn't match next.
54 |     (?<=...) Matches if preceded by ... (must be fixed length).
55 |     (?<!...) Matches if not preceded by ... (must be fixed length).
179 | if sys.hexversion >= 0x02020000:
180 |     __all__.append("finditer")
181 |     def finditer(pattern, string, flags=0):
182 |         """Return an iterator over all non-overlapping matches in the
183 |         string.  For each match, the iterator returns a match object.
184 | 
185 |         Empty matches are included in the result."""
186 |         return _compile(pattern, flags).finditer(string)
187 | 
188 | def compile(pattern, flags=0):
189 |     "Compile a regular expression pattern, returning a pattern object."
190 |     return _compile(pattern, flags)
191 | 
192 | def purge():
193 |     "Clear the regular expression cache"
194 |     _cache.clear()
195 |     _cache_repl.clear()
196 | 
197 | def template(pattern, flags=0):
198 |     "Compile a template pattern, returning a pattern object"
199 |     return _compile(pattern, flags|T)
200 | 
201 | _alphanum = {}
202 | for c in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890':
203 |     _alphanum[c] = 1
204 | del c
205 | 
206 | def escape(pattern):
207 |     "Escape all non-alphanumeric characters in pattern."
208 |     s = list(pattern)
209 |     alphanum = _alphanum
210 |     for i in range(len(pattern)):
211 |         c = pattern[i]
212 |         if c not in alphanum:
213 |             if c == "\000":
214 |                 s[i] = "\\000"
215 |             else:
216 |                 s[i] = "\\" + c
217 |     return pattern[:0].join(s)
218 | 
219 | # --------------------------------------------------------------------
220 | # internals
221 | 
222 | _cache = {}
223 | _cache_repl = {}
224 | 
225 | _pattern_type = type(sre_compile.compile("", 0))
226 | 
227 | _MAXCACHE = 100
228 | 
229 | def _compile(*key):
230 |     # internal: compile pattern
231 |     cachekey = (type(key[0]),) + key
232 |     p = _cache.get(cachekey)
233 |     if p is not None:
234 |         return p
235 |     pattern, flags = key
236 |     if isinstance(pattern, _pattern_type):
237 |         if flags:
238 |             raise ValueError('Cannot process flags argument with a compiled pattern')
239 |         return pattern
240 |     if not sre_compile.isstring(pattern):
241 |         raise TypeError, "first argument must be string or compiled pattern"
242 |     try:
243 |         p = sre_compile.compile(pattern, flags)
244 |     except error, v:
245 |         raise error, v # invalid expression
246 |     if len(_cache) >= _MAXCACHE:
247 |         _cache.clear()
248 |     _cache[cachekey] = p
249 |     return p
250 | 
251 | def _compile_repl(*key):
252 |     # internal: compile replacement pattern
253 |     p = _cache_repl.get(key)
254 |     if p is not None:
255 |         return p
256 |     repl, pattern = key
257 |     try:
258 |         p = sre_parse.parse_template(repl, pattern)
259 |     except error, v:
260 |         raise error, v # invalid expression
261 |     if len(_cache_repl) >= _MAXCACHE:
262 |         _cache_repl.clear()
263 |     _cache_repl[key] = p
264 |     return p
265 | 
266 | def _expand(pattern, match, template):
267 |     # internal: match.expand implementation hook
268 |     template = sre_parse.parse_template(template, pattern)
269 |     return sre_parse.expand_template(template, match)
270 | 
271 | def _subx(pattern, template):
272 |     # internal: pattern.sub/subn implementation helper
273 |     template = _compile_repl(template, pattern)
274 |     if not template[0] and len(template[1]) == 1:
275 |         # literal replacement
276 |         return template[1][0]
277 |     def filter(match, template=template):
278 |         return sre_parse.expand_template(template, match)
279 |     return filter
280 | 
281 | # register myself for pickling
282 | 
283 | import copy_reg
284 | 
285 | def _pickle(p):
286 |     return _compile, (p.pattern, p.flags)
287 | 
288 | copy_reg.pickle(_pattern_type, _pickle, _compile)
289 | 
290 | # --------------------------------------------------------------------
291 | # experimental stuff (see python-dev discussions for details)
292 | 
293 | class Scanner:
294 |     def __init__(self, lexicon, flags=0):
295 |         from sre_constants import BRANCH, SUBPATTERN
296 |         self.lexicon = lexicon
297 |         # combine phrases into a compound pattern
298 |         p = []
299 |         s = sre_parse.Pattern()
300 |         s.flags = flags
301 |         for phrase, action in lexicon:
302 |             p.append(sre_parse.SubPattern(s, [
303 |                 (SUBPATTERN, (len(p)+1, sre_parse.parse(phrase, flags))),
304 |                 ]))
305 |         s.groups = len(p)+1
306 |         p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
307 |         self.scanner = sre_compile.compile(p)
308 |     def scan(self, string):
309 |         result = []
310 |         append = result.append
311 |         match = self.scanner.scanner(string).match
312 |         i = 0
313 |         while 1:
314 |             m = match()
315 |             if not m:
316 |                 break
317 |             j = m.end()
318 |             if i == j:
319 |                 break
320 |             action = self.lexicon[m.lastindex-1][1]
321 |             if hasattr(action, '__call__'):
322 |                 self.match = m
323 |                 action = action(self, m.group())
324 |             if action is not None:
325 |                 append(action)
326 |             i = j
327 |         return result, string[i:]
328 | 
--------------------------------------------------------------------------------
/traceback.py:
--------------------------------------------------------------------------------
1 | """Extract, format and print information about Python stack traces."""
2 | # Modified to remove linecache dependency
3 | 
4 | #import linecache
5 | import sys
6 | 
7 | __all__ = ['extract_stack', 'extract_tb', 'format_exception',
8 |            'format_exception_only', 'format_list', 'format_stack',
9 |            'format_tb', 'print_exc', 'format_exc', 'print_exception',
10 |            'print_last', 'print_stack', 'print_tb']
11 | 
12 | def _print(file, str='', terminator='\n'):
13 |     file.write(str+terminator)
14 | 
15 | 
16 | def print_list(extracted_list, file=None):
17 |     """Print the list of tuples as returned by extract_tb() or
18 |     extract_stack() as a formatted stack trace to the given file."""
19 |     if file is None:
20 |         file = sys.stderr
21 |     for filename, lineno, name, line in extracted_list:
22 |         _print(file,
23 |                '  File "%s", line %d, in %s' % (filename,lineno,name))
24 |         if line:
25 |             _print(file, '    %s' % line.strip())
26 | 
27 | def format_list(extracted_list):
28 |     """Format a list of traceback entry tuples for printing.
29 | 
30 |     Given a list of tuples as returned by extract_tb() or
31 |     extract_stack(), return a list of strings ready for printing.
32 |     Each string in the resulting list corresponds to the item with the
33 |     same index in the argument list.  Each string ends in a newline;
34 |     the strings may contain internal newlines as well, for those items
35 |     whose source text line is not None.
36 |     """
37 |     list = []
38 |     for filename, lineno, name, line in extracted_list:
39 |         item = '  File "%s", line %d, in %s\n' % (filename,lineno,name)
40 |         if line:
41 |             item = item + '    %s\n' % line.strip()
42 |         list.append(item)
43 |     return list
44 | 
45 | 
46 | def print_tb(tb, limit=None, file=None):
47 |     """Print up to 'limit' stack trace entries from the traceback 'tb'.
48 | 
49 |     If 'limit' is omitted or None, all entries are printed.  If 'file'
50 |     is omitted or None, the output goes to sys.stderr; otherwise
51 |     'file' should be an open file or file-like object with a write()
52 |     method.
53 |     """
54 |     if file is None:
55 |         file = sys.stderr
56 |     if limit is None:
57 |         if hasattr(sys, 'tracebacklimit'):
58 |             limit = sys.tracebacklimit
59 |     n = 0
60 |     while tb is not None and (limit is None or n < limit):
61 |         f = tb.tb_frame
62 |         lineno = tb.tb_lineno
63 |         co = f.f_code
64 |         filename = co.co_filename
65 |         name = co.co_name
66 |         _print(file,
67 |                '  File "%s", line %d, in %s' % (filename, lineno, name))
68 |         #linecache.checkcache(filename)
69 |         #line = linecache.getline(filename, lineno, f.f_globals)
70 |         #if line: _print(file, '    ' + line.strip())
71 |         tb = tb.tb_next
72 |         n = n+1
73 | 
74 | def format_tb(tb, limit = None):
75 |     """A shorthand for 'format_list(extract_tb(tb, limit))'."""
76 |     return format_list(extract_tb(tb, limit))
77 | 
78 | def extract_tb(tb, limit = None):
79 |     """Return list of up to limit pre-processed entries from traceback.
80 | 
81 |     This is useful for alternate formatting of stack traces.  If
82 |     'limit' is omitted or None, all entries are extracted.  A
83 |     pre-processed stack trace entry is a quadruple (filename, line
84 |     number, function name, text) representing the information that is
85 |     usually printed for a stack trace.  The text is a string with
86 |     leading and trailing whitespace stripped; if the source is not
87 |     available it is None.
88 |     """
89 |     if limit is None:
90 |         if hasattr(sys, 'tracebacklimit'):
91 |             limit = sys.tracebacklimit
92 |     list = []
93 |     n = 0
94 |     while tb is not None and (limit is None or n < limit):
95 |         f = tb.tb_frame
96 |         lineno = tb.tb_lineno
97 |         co = f.f_code
98 |         filename = co.co_filename
99 |         name = co.co_name
100 |         #linecache.checkcache(filename)
101 |         #line = linecache.getline(filename, lineno, f.f_globals)
102 |         #if line: line = line.strip()
103 |         #else:
104 |         line = None
105 |         list.append((filename, lineno, name, line))
106 |         tb = tb.tb_next
107 |         n = n+1
108 |     return list
109 | 
110 | 
111 | _cause_message = (
112 |     "\nThe above exception was the direct cause "
113 |     "of the following exception:\n")
114 | 
115 | _context_message = (
116 |     "\nDuring handling of the above exception, "
117 |     "another exception occurred:\n")
118 | 
119 | def _iter_chain(exc, custom_tb=None, seen=None):
120 |     if seen is None:
121 |         seen = set()
122 |     seen.add(exc)
123 |     its = []
124 |     cause = exc.__cause__
125 |     context = exc.__context__
126 |     if cause is not None and cause not in seen:
127 |         its.append(_iter_chain(cause, None, seen))
128 |         its.append([(_cause_message, None)])
129 |     if context is not None and context is not cause and context not in seen:
130 |         its.append(_iter_chain(context, None, seen))
131 |         its.append([(_context_message, None)])
132 |     its.append([(exc, custom_tb or exc.__traceback__)])
133 |     # itertools.chain is in an extension module and may be unavailable
134 |     for it in its:
135 |         for x in it:
136 |             yield x
137 | 
138 | 
139 | def print_exception(etype, value, tb, limit=None, file=None, chain=True):
140 |     """Print exception up to 'limit' stack trace entries from 'tb' to 'file'.
141 | 
142 |     This differs from print_tb() in the following ways: (1) if
143 |     traceback is not None, it prints a header "Traceback (most recent
144 |     call last):"; (2) it prints the exception type and value after the
145 |     stack trace; (3) if type is SyntaxError and value has the
146 |     appropriate format, it prints the line where the syntax error
147 |     occurred with a caret on the next line indicating the approximate
148 |     position of the error.
149 |     """
150 |     if file is None:
151 |         file = sys.stderr
152 |     if chain:
153 |         values = _iter_chain(value, tb)
154 |     else:
155 |         values = [(value, tb)]
156 |     for value, tb in values:
157 |         if isinstance(value, str):
158 |             _print(file, value)
159 |             continue
160 |         if tb:
161 |             _print(file, 'Traceback (most recent call last):')
162 |             print_tb(tb, limit, file)
163 |         lines = format_exception_only(type(value), value)
164 |         for line in lines:
165 |             _print(file, line, '')
166 | 
167 | def format_exception(etype, value, tb, limit=None, chain=True):
168 |     """Format a stack trace and the exception information.
169 | 
170 |     The arguments have the same meaning as the corresponding arguments
171 |     to print_exception().  The return value is a list of strings, each
172 |     ending in a newline and some containing internal newlines.  When
173 |     these lines are concatenated and printed, exactly the same text is
174 |     printed as does print_exception().
175 |     """
176 |     list = []
177 |     if chain:
178 |         values = _iter_chain(value, tb)
179 |     else:
180 |         values = [(value, tb)]
181 |     for value, tb in values:
182 |         if isinstance(value, str):
183 |             list.append(value + '\n')
184 |             continue
185 |         if tb:
186 |             list.append('Traceback (most recent call last):\n')
187 |         list.extend(format_tb(tb, limit))
188 |         list.extend(format_exception_only(type(value), value))
189 |     return list
190 | 
191 | def format_exception_only(etype, value):
192 |     """Format the exception part of a traceback.
193 | 
194 |     The arguments are the exception type and value such as given by
195 |     sys.last_type and sys.last_value. The return value is a list of
196 |     strings, each ending in a newline.
197 | 
198 |     Normally, the list contains a single string; however, for
199 |     SyntaxError exceptions, it contains several lines that (when
200 |     printed) display detailed information about where the syntax
201 |     error occurred.
202 | 
203 |     The message indicating which exception occurred is always the last
204 |     string in the list.
205 | 
206 |     """
207 |     # Gracefully handle (the way Python 2.4 and earlier did) the case of
208 |     # being called with (None, None).
209 |     if etype is None:
210 |         return [_format_final_exc_line(etype, value)]
211 | 
212 |     stype = etype.__name__
213 |     smod = etype.__module__
214 |     if smod not in ("__main__", "builtins"):
215 |         stype = smod + '.' + stype
216 | 
217 |     if not issubclass(etype, SyntaxError):
218 |         return [_format_final_exc_line(stype, value)]
219 | 
220 |     # It was a syntax error; show exactly where the problem was found.
221 |     lines = []
222 |     filename = value.filename or "<string>"
223 |     lineno = str(value.lineno) or '?'
224 |     lines.append('  File "%s", line %s\n' % (filename, lineno))
225 |     badline = value.text
226 |     offset = value.offset
227 |     if badline is not None:
228 |         lines.append('    %s\n' % badline.strip())
229 |         if offset is not None:
230 |             caretspace = badline.rstrip('\n')[:offset].lstrip()
231 |             # non-space whitespace (likes tabs) must be kept for alignment
232 |             caretspace = ((c.isspace() and c or ' ') for c in caretspace)
233 |             # only three spaces to account for offset1 == pos 0
234 |             lines.append('   %s^\n' % ''.join(caretspace))
235 |     msg = value.msg or ""
236 |     lines.append("%s: %s\n" % (stype, msg))
237 |     return lines
238 | 
239 | def _format_final_exc_line(etype, value):
240 |     valuestr = _some_str(value)
241 |     if value is None or not valuestr:
242 |         line = "%s\n" % etype
243 |     else:
244 |         line = "%s: %s\n" % (etype, valuestr)
245 |     return line
246 | 
247 | def _some_str(value):
248 |     try:
249 |         return str(value)
250 |     except:
251 |         return '<unprintable %s object>' % type(value).__name__
252 | 
253 | 
254 | def print_exc(limit=None, file=None, chain=True):
255 |     """Shorthand for 'print_exception(*sys.exc_info(), limit, file)'."""
256 |     if file is None:
257 |         file = sys.stderr
258 |     try:
259 |         etype, value, tb = sys.exc_info()
260 |         print_exception(etype, value, tb, limit, file, chain)
261 |     finally:
262 |         etype = value = tb = None
263 | 
264 | 
265 | def format_exc(limit=None, chain=True):
266 |     """Like print_exc() but return a string."""
267 |     try:
268 |         etype, value, tb = sys.exc_info()
269 |         return ''.join(
270 |             format_exception(etype, value, tb, limit, chain))
271 |     finally:
272 |         etype = value = tb = None
273 | 
274 | 
275 | def print_last(limit=None, file=None, chain=True):
276 |     """This is a shorthand for 'print_exception(sys.last_type,
277 |     sys.last_value, sys.last_traceback, limit, file)'."""
278 |     if not hasattr(sys, "last_type"):
279 |         raise ValueError("no last exception")
280 |     if file is None:
281 |         file = sys.stderr
282 |     print_exception(sys.last_type, sys.last_value, sys.last_traceback,
283 |                     limit, file, chain)
284 | 
285 | 
286 | def print_stack(f=None, limit=None, file=None):
287 |     """Print a stack trace from its invocation point.
288 | 
289 |     The optional 'f' argument can be used to specify an alternate
290 |     stack frame at which to start. The optional 'limit' and 'file'
291 |     arguments have the same meaning as for print_exception().
292 |     """
293 |     if f is None:
294 |         try:
295 |             raise ZeroDivisionError
296 |         except ZeroDivisionError:
297 |             f = sys.exc_info()[2].tb_frame.f_back
298 |     print_list(extract_stack(f, limit), file)
299 | 
300 | def format_stack(f=None, limit=None):
301 |     """Shorthand for 'format_list(extract_stack(f, limit))'."""
302 |     if f is None:
303 |         try:
304 |             raise ZeroDivisionError
305 |         except ZeroDivisionError:
306 |             f = sys.exc_info()[2].tb_frame.f_back
307 |     return format_list(extract_stack(f, limit))
308 | 
309 | def extract_stack(f=None, limit = None):
310 |     """Extract the raw traceback from the current stack frame.
311 | 
312 |     The return value has the same format as for extract_tb().  The
313 |     optional 'f' and 'limit' arguments have the same meaning as for
314 |     print_stack().  Each item in the list is a quadruple (filename,
315 |     line number, function name, text), and the entries are in order
316 |     from oldest to newest stack frame.
317 |     """
318 |     if f is None:
319 |         try:
320 |             raise ZeroDivisionError
321 |         except ZeroDivisionError:
322 |             f = sys.exc_info()[2].tb_frame.f_back
323 |     if limit is None:
324 |         if hasattr(sys, 'tracebacklimit'):
325 |             limit = sys.tracebacklimit
326 |     list = []
327 |     n = 0
328 |     while f is not None and (limit is None or n < limit):
329 |         lineno = f.f_lineno
330 |         co = f.f_code
331 |         filename = co.co_filename
332 |         name = co.co_name
333 |         #linecache.checkcache(filename)
334 |         #line = linecache.getline(filename, lineno, f.f_globals)
335 |         #if line: line = line.strip()
336 |         #else:
337 |         line = None
338 |         list.append((filename, lineno, name, line))
339 |         f = f.f_back
340 |         n = n+1
341 |     list.reverse()
342 |     return list
343 | 
--------------------------------------------------------------------------------
/utilsbycory.py:
--------------------------------------------------------------------------------
1 | #####################################################################################################
2 | ##
3 | ##   utilsbycory.py  -  part of duplicatemanager, a script for comicrack
4 | ##
5 | ##   Author:  perezmu after cbanack
6 | ##
7 | ##   Copyleft perezmu 2011.
8 | ##
9 | ######################################################################################################
10 | 
11 | #### Original declarations
12 | '''
13 | This module contains utility methods for working with ComicRack
14 | ComicBook objects (i.e. 'book' objects).
15 | 
16 | @author: Cory Banack
17 | '''
18 | ####
19 | 
20 | 
21 | import re
22 | 
23 | def cleanupseries(series_name):
24 | 
25 |     # All of the symbols below cause inconsistency in title searches
26 |     series_name = series_name.lower()
27 |     series_name = series_name.replace('.', '')
28 |     series_name = series_name.replace('_', ' ')
29 |     series_name = series_name.replace('-', ' ')
30 |     series_name = series_name.replace("'", ' ')
31 |     series_name = series_name.replace(":", ' ')
32 |     series_name = re.sub(r'\b(vs\.?|versus|and|or|the|an|of|a|is)\b', '', series_name)
33 |     series_name = re.sub(r'giantsize', r'giant size', series_name)
34 |     series_name = re.sub(r'giant[- ]*sized', r'giant size', series_name)
35 |     series_name = re.sub(r'kingsize', r'king size', series_name)
36 |     series_name = re.sub(r'king[- ]*sized', r'king size', series_name)
37 |     series_name = re.sub(r"directors", r"director's", series_name)   # note: apostrophes were already replaced by spaces above, so "director's" itself never reaches this pattern
38 |     series_name = re.sub(r"\bvolume\b", "vol", series_name)   # replacement must be plain 'vol': '\b' inside a replacement string is a backspace character, not a word boundary
39 |     series_name = re.sub(r"\bvol\.\b", "vol", series_name)
40 | 
41 |     series_name = re.sub(r'[ ]*', r'', series_name)   # finally, strip all spaces
42 | 
43 |     return series_name
44 | 
45 | 
46 | def convertnumberwords(phrase_s, expand_b):
47 |     """
48 |     Converts all of the number words (as defined by regular expression 'words')
49 |     in the given phrase, either expanding or contracting them as specified.
50 |     When expanding, words like '1' and '2nd' will be transformed into 'one'
51 |     and 'second' in the returned string.  When contracting, the transformation
52 |     goes in reverse.
53 | 
54 |     This method only works for numbers up to 20, and it only works properly
55 |     on lower case strings.
56 |     """
57 |     number_map = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three',\
58 |        '4': 'four', '5': 'five', '6': 'six', '7': 'seven', '8': 'eight',\
59 |        '9': 'nine', '10': 'ten', '11': 'eleven', '12': 'twelve',\
60 |        '13': 'thirteen', '14': 'fourteen', '15': 'fifteen',\
61 |        '16': 'sixteen', '17': 'seventeen', '18': 'eighteen', '19': 'nineteen',\
62 |        '20': 'twenty', '0th': 'zeroth', '1rst': 'first', '2nd': 'second',\
63 |        '3rd': 'third', '4th': 'fourth', '5th': 'fifth', '6th': 'sixth',\
64 |        '7th': 'seventh', '8th': 'eighth', '9th': 'ninth', '10th': 'tenth',\
65 |        '11th': 'eleventh', '12th': 'twelveth', '13th': 'thirteenth',\
66 |        '14th': 'fourteenth', '15th': 'fifteenth', '16th': 'sixteenth',\
67 |        '17th': 'seventeenth', '18th': 'eighteenth', '19th': 'nineteenth',\
68 |        '20th': 'twentieth'}
69 | 
70 |     b = r'\b'
71 |     if expand_b:
72 |         for (x, y) in number_map.iteritems():
73 |             phrase_s = re.sub(b+x+b, y, phrase_s)
74 |         phrase_s = re.sub(r'\b1st\b', 'first', phrase_s)
75 |     else:
76 |         for (x, y) in number_map.iteritems():
77 |             phrase_s = re.sub(b+y+b, x, phrase_s)
78 |         phrase_s = re.sub(r'\btwelfth\b', '12th', phrase_s)   # catch the spellings the map does not cover
79 |         phrase_s = re.sub(r'\beightteenth\b', '18th', phrase_s)
80 |     return phrase_s
81 | 
82 | 
83 | 
84 | ### Other util methods
85 | 
86 | import System
87 | 
88 | 
89 | def ToString(v):
90 |     if v is None:
91 |         return ''
92 |     return unicode(v).encode(System.Text.Encoding.Default.BodyName, 'replace')
93 | 
94 | 
--------------------------------------------------------------------------------
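# --- Illustrative sketch (annotation, not part of the repository files) ---
# How the utilsbycory.py helpers normalize names before comparison; the
# expected values assume the corrected cleanupseries() above.
def _example_normalization():
    assert convertnumberwords('two covers', False) == '2 covers'
    assert cleanupseries('Giant-Sized X-Men Volume 2') == cleanupseries('giantsize xmen vol 2')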