├── .hgignore
├── Package.ini
├── README.md
├── changelog.txt
├── constants.py
├── dmBookWrapper.py
├── dmParser.py
├── dmrules.dat.demo
├── duplicatesmanager.png
├── duplicatesmanager.py
├── duplicatesmanager.xcf
├── duplicatesmanager_small.png
├── getcvdb.py
├── processfunctions.py
├── re.py
├── traceback.py
└── utilsbycory.py

/.hgignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/.hgignore
--------------------------------------------------------------------------------
/Package.ini:
--------------------------------------------------------------------------------
 1 | Name=Duplicates Manager
 2 | Author=Perezmu and Pescuma
 3 | Version=0.9
 4 | Description=A manager to sort and remove duplicate ecomics from the library according to given rules.
 5 | Image=duplicatesmanager_small.png
 6 | KeepFiles=dmrules.dat
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # ![http://i750.photobucket.com/albums/xx149/perezmu/duplicatesmanager-2.png](http://i750.photobucket.com/albums/xx149/perezmu/duplicatesmanager-2.png) DUPLICATES MANAGER FOR [COMICRACK](http://comicrack.cyolito.com) #
 2 | 
 3 | ---
 4 | 
 5 | ## NEW VERSION 0.9 ##
 6 | 
 7 | ### Updated with NEW RULES, see the [rules wiki](http://code.google.com/p/comicrack-duplicates-manager/wiki/RulesFileSyntax) ###
 8 | 
 9 | ```
10 | v0.9 ->
11 | 
12 | Added: - New rules:
13 |            - scan keep/remove
14 |        - New toolbar icon
15 |        - Fix for typos and rare crash
16 | ```
17 | [Complete CHANGELOG](http://code.google.com/p/comicrack-duplicates-manager/wiki/Changelog)
18 | 
19 | ---
20 | 
21 | 
22 | ### IMPORTANT NOTICE (UPDATED AS OF VERSION 0.5) ###
23 | Since I do not want to mess with your files & library before we are sure this thing works right,
24 | **out of the box the script will not move or remove any comic; it only logs what it would have done to the logfile. To enable the actual processing of files you need to set the variables `MOVEFILES` and `REMOVEFROMLIB` to `true`** in the **dmrules.dat** file. See the [Configuration Options](http://code.google.com/p/comicrack-duplicates-manager/wiki/UserConfiguration?ts=1297328384&updated=UserConfiguration) Wiki page for details.
25 | 
26 | 
27 | 
28 | ---
29 | 
30 | 
31 | This script is an add-on to ComicRack that identifies duplicate ecomics and follows a set of user-defined rules to remove unwanted dupes. It is designed with 0-day releases in mind, but should prove useful in other scenarios.
32 | 
33 | The script reads a file (**dmrules.dat**) from the directory where it is installed that contains both user options and a series of rules to manage the duplicate files. Duplicate files that meet the criteria expressed in the rules are **moved** to a dump directory (not deleted) and removed from the ComicRack library (the default dump directory is `C:\__dupes__`). This directory also holds a logfile (**logfile.log**) that details the process followed on your comics.
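A minimal **dmrules.dat** could look like the snippet below (a sketch only — the `@` options and rule keywords are taken from the bundled **dmrules.dat.demo**; tune the rules to your own library before enabling file moves):

```
# keep simulating until you trust your rules
@ MOVEFILES False
@ REMOVEFROMLIB False

# prefer real files over fileless entries, then keep the biggest scans
pagecount remove fileless
filesize keep largest 10%
keep first
```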
34 | 
35 | So, the first thing you want to do is read the rules (see wiki) and edit your custom **dmrules.dat** file.
36 | 
37 | Wiki Index:
38 | 
39 | * [Overview](http://code.google.com/p/comicrack-duplicates-manager/wiki/Overview?ts=1297338327&updated=Overview): You need to read this first!!!!
40 | * Rules are explained in detail in the Wiki: [Rules Syntax](http://code.google.com/p/comicrack-duplicates-manager/wiki/RulesFileSyntax).
41 | * Then you can go to the [Tips and Examples](http://code.google.com/p/comicrack-duplicates-manager/wiki/TipsAndTricks?ts=1297265175&updated=TipsAndTricks) page.
42 | * User-defined variables are described in the [Configuration Options](http://code.google.com/p/comicrack-duplicates-manager/wiki/UserConfiguration?ts=1297328384&updated=UserConfiguration) page.
43 | * Credits for people who have (not necessarily knowingly) contributed to this project are on [the credits](http://code.google.com/p/comicrack-duplicates-manager/wiki/FellowCredits) page.
44 | * Screenshots and example files are on [this wiki page](http://code.google.com/p/comicrack-duplicates-manager/wiki/ScreenshotExample).
45 | * The changelog is on [this other page](http://code.google.com/p/comicrack-duplicates-manager/wiki/Changelog).
46 | 
47 | Discussion takes place in the [ComicRack support forum](http://comicrack.cyolito.com/forum/13-scripts/12076-duplicates-manager#12076)
48 | 
49 | 
50 | ---
51 | 
52 | 
53 | Cheers!!!!! ![http://comicrack.cyolito.com/media/kunena/avatars/resized/size72/users/avatar195.jpg](http://comicrack.cyolito.com/media/kunena/avatars/resized/size72/users/avatar195.jpg)
--------------------------------------------------------------------------------
/changelog.txt:
--------------------------------------------------------------------------------
 1 | v0.9 ->
 2 | 
 3 | Added: - New rules:
 4 |            - scan keep/remove
 5 |        - New toolbar icon
 6 |        - Fix for typos and rare crash
 7 | 
 8 | 
 9 | v0.8 ->
10 | 
11 | Added: - New rules:
12 |            - pagesize keep/remove largest/smallest
13 |        - Comics with different formats are no longer treated as dupes
14 | 
15 | 
16 | v0.7 ->
17 | 
18 | Added: - New rules:
19 |            - pagecount remove largest
20 |            - pagecount remove smallest
21 |            - filesize remove largest
22 |            - filesize remove smallest
23 |        - Now it copies some comic information from deleted comics. It is disabled by default.
24 |          Enable in constants.py : UPDATEINFO
25 |        - Fix for series with multiple volumes
26 | 
27 | 
28 | v0.6 -> New Features Release
29 | 
30 | Added: - New parser (texts with more than one word can be surrounded by ")
31 |        - Percentage option for filesize keep/remove
32 |        - Added percentage option to pagecount keep/remove largest/smallest
33 |        - Added keep first (to remove remaining identical files)
34 |        - Allow filtering on multiple words (matching any of them) in texts
35 | 
36 | 
37 | v0.5 ->
38 | 
39 | Fixed: - Major bug found in the 'text' rules!!!!!
40 | 
41 | Changed: - 'pagecount keep noads' now skips comics with COVERPAGES or fewer pages
42 |          - Added "@ OPTION VALUE" syntax to the dmrules.dat file
43 |          - Added new options:
44 |              - COVERPAGES (int)
45 |              - SIZEMARGIN (int) (part of issue 9, still not operative)
46 |          - Allow more than one word as part of the text-based rules (issue 8)
47 | 
48 | v0.4 -> Bug fix Release
49 | 
50 | Fixed: - Issue 7 finally solved (I hope!)
51 |        - Issue 6 finally solved (I hope!)
52 |        - Correctly changed version number in Package.ini
53 | 
54 | 
55 | v0.3 -> Bug fix Release
56 | 
57 | Fixed: - Doesn't break the Series Info Panel (issue 6) anymore
58 | 
59 |        - No longer throws an exception when there are no dupes (issue 4)
60 | 
61 | 
62 | Changed: - Remove leading 0's in comic numbers to improve duplicate discovery
63 | 
64 |          - 'pagecount remove fileless' will remove all fileless dupes but one when
65 |            a group of only fileless comics is found. One with a thumbnail will be kept
66 |            (issue 7)
67 | 
68 | 
69 | v0.2 -> Bug fix Release
70 | 
71 | Fixed: - Rules of the form '[text] remove word' are now correctly parsed.
72 | 
73 | v0.1 -> Initial Release
--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
 1 | #####################################################################################################
 2 | ##
 3 | ##  constants.py - part of duplicatemanager, a script for comicrack
 4 | ##
 5 | ##  Author: perezmu
 6 | ##
 7 | ##  Copyleft perezmu 2011.
 8 | ##
 9 | ######################################################################################################
10 | 
11 | 
12 | 
13 | ##########
14 | #
15 | #   DEFINITIONS
16 | 
17 | 
18 | import re
19 | import clr
20 | import System
21 | import System.IO
22 | from System.IO import Path, Directory, File, FileInfo
23 | 
24 | #
25 | ############# **** USER CONFIGURABLE VARIABLES *** ###########################################
26 | #
27 | # see http://code.google.com/p/comicrack-duplicates-manager/wiki/UserConfiguration for details
28 | #
29 | # These may also be set in the "dmrules.dat" rules file using this syntax: "@ OPTION VALUE". Values
30 | # found in the "dmrules.dat" file override the defaults set in this file.
31 | 
32 | 
33 | MOVEFILES = False
34 | REMOVEFROMLIB = False
35 | UPDATEINFO = False
36 | 
37 | DUPESDIRECTORY = Path.Combine("C:\\", "__dupes__")
38 | 
39 | C2C_NOADS_GAP = 5   # Difference in pages between c2c and noads
40 | SIZEMARGIN = 0      # Preserve comics within SIZEMARGIN % of the size
41 | COVERPAGES = 4      # Maximum number of pages to be considered "covers only"
42 | 
43 | VERBOSE = False     # Logging level (true/false)
44 | DEBUG = False       # Logging level (true/false)
45 | 
46 | 
47 | #
48 | ############ DON'T MODIFY BELOW THIS LINE ######
49 | #
50 | 
51 | VERSION = "0.9"
52 | 
53 | SCRIPTDIRECTORY = __file__[0:-len("constants.py")]
54 | RULESFILE = Path.Combine(SCRIPTDIRECTORY, "dmrules.dat")
55 | LOGFILE = Path.Combine(DUPESDIRECTORY, "logfile.log")
56 | (SERIES,NUMBER,VOLUME,FILENAME,PAGECOUNT,FILESIZE,ID,CVDB_ID,FILEPATH,TAGS,NOTES,FILETYPE,SCAN,BOOK) = range(14)
57 | FIELD_NAMES = ['series','number','volume','filename','pages','size','id','cvdb_id','path','tags','notes','type','scan','book']
58 | FIELDS_TO_UPDATE_INFO = [
59 |     [ 'AlternateCount',  lambda x: int(x) ],
60 |     [ 'AlternateNumber', lambda x: x ],
61 |     [ 'AlternateSeries', lambda x: x ],
62 |     [ 'Count',           lambda x: int(x) ],
63 |     [ 'Title',           lambda x: x ],
64 | ]
65 | 
66 | #
67 | #
68 | ###########
--------------------------------------------------------------------------------
/dmBookWrapper.py:
--------------------------------------------------------------------------------
 1 | #####################################################################################################
 2 | ##
 3 | ##  BookWrapper.py - part of duplicatemanager, a script for comicrack
 4 | ##
 5 | ##  Author: Ricardo Pescuma Domenecci, modified by perezmu
 6 | ##
 7 | ##  Originally from pescuma's Series Info Panel script.
8 | ## 9 | ###################################################################################################### 10 | 11 | 12 | ### original credits 13 | """ 14 | Copyright (C) 2010 Ricardo Pescuma Domenecci 15 | 16 | This is free software; you can redistribute it and/or 17 | modify it under the terms of the GNU Library General Public 18 | License as published by the Free Software Foundation; either 19 | version 2 of the License, or (at your option) any later version. 20 | 21 | This is distributed in the hope that it will be useful, 22 | but WITHOUT ANY WARRANTY; without even the implied warranty of 23 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 24 | Library General Public License for more details. 25 | 26 | You should have received a copy of the GNU Library General Public 27 | License along with this file; see the file license.txt. If 28 | not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, 29 | Boston, MA 02111-1307, USA. 30 | """ 31 | ########## 32 | 33 | """ 34 | Modified by apm 35 | """ 36 | 37 | import clr 38 | clr.AddReference('System.Drawing') 39 | import sys 40 | import System 41 | from getcvdb import extract_issue_ref 42 | from utilsbycory import * 43 | 44 | 45 | 46 | 47 | class dmBookWrapper: 48 | _emptyVals = { 49 | 'Count' : '-1', 50 | 'Year' : '-1', 51 | 'Month' : '-1', 52 | 'AlternateCount' : '-1', 53 | 'Rating' : '0.0', 54 | 'CommunityRating' : '0.0' 55 | } 56 | _dontConvert = set([ 57 | 'Pages', 58 | 'PageCount', 59 | 'FrontCoverPageIndex', 60 | 'FirstNonCoverPageIndex', 61 | 'LastPageRead', 62 | 'ReadPercentage', 63 | 'OpenedCount' 64 | ]) 65 | 66 | def __init__(self, book): 67 | self.raw = book 68 | self._pages = {} 69 | 70 | def __dir__(self): 71 | ret = set() 72 | ret.update(set(self.__dict__)) 73 | ret.update(dir(self.raw)) 74 | ret.update(self._getterFields) 75 | return list(ret) 76 | 77 | def _safeget(self, name): 78 | try: 79 | return self._get(name) 80 | except: 81 | return '' 82 | 83 | def _get(self, name): 84 | return ToString(getattr(self.raw, name)).strip() 85 | 86 | def __getattr__(self, name): 87 | if name in self._dontConvert: 88 | return getattr(self.raw, name) 89 | 90 | if name in self._emptyVals: 91 | emptVal = self._emptyVals[name] 92 | else: 93 | emptVal = '' 94 | 95 | ret = self._get(name) 96 | if ret == '' or ret == emptVal: 97 | ret = self._safeget('Shadow' + name) 98 | if ret == '' or ret == emptVal: 99 | ret = '' 100 | return ret 101 | 102 | def GetCover(self, width = 0, height = 0): 103 | coverIndex = 0 104 | if self.raw.FrontCoverPageIndex > 0: 105 | coverIndex = self.raw.FrontCoverPageIndex 106 | return self.GetPage(coverIndex, width, height) 107 | 108 | def GetPage(self, page, width = 0, height = 0): 109 | global _oldTmpFiles, _ComicRack 110 | 111 | if not self.raw.FilePath: 112 | if page > 0: 113 | return '' 114 | elif page >= self.raw.PageCount: 115 | return '' 116 | 117 | hash = str(page) + '_' + str(width) + '_' + str(height) 118 | 119 | if hash in self._pages: 120 | return self._pages[hash] 121 | 122 | self._pages[hash] = '' 123 | 124 | #image = _ComicRack.App.GetComicPage(self.raw, page) 125 | image = _ComicRack.App.GetComicThumbnail(self.raw, page) 126 | if image is None: 127 | return '' 128 | 129 | tmpFile = System.IO.Path.GetTempFileName() 130 | _oldTmpFiles.append(tmpFile) 131 | 132 | # We need a jpg 133 | imageFile = tmpFile + '.jpg' 134 | _oldTmpFiles.append(imageFile) 135 | #print imageFile 136 | 137 | try: 138 | if width > 0 or height > 0: 139 | image = ResizeImage(image, width, 
height)
140 | 
141 |             image.Save(imageFile, System.Drawing.Imaging.ImageFormat.Jpeg)
142 | 
143 |             self._pages[hash] = imageFile
144 | 
145 |             return imageFile
146 | 
147 |         except Exception, e:
148 |             print '[SeriesInfoPanel] Exception when saving image: ', e
149 |             return ''
150 | 
151 |     def GetSeries(self):
152 |         ret = self.raw.Series
153 |         if ret:
154 |             return ToString(ret)
155 |         ret = self.raw.ShadowSeries
156 |         if ret:
157 |             return ToString(ret)
158 |         return ''
159 | 
160 |     def GetVolume(self):
161 |         ret = self.raw.Volume
162 |         if ret != -1:
163 |             return ToString(ret)
164 |         ret = self.raw.ShadowVolume
165 |         if ret != -1:
166 |             return ToString(ret)
167 |         return ''
168 | 
169 |     def GetNumber(self):
170 |         ret = self.raw.Number
171 |         if ret:
172 |             return ret
173 |         ret = self.raw.ShadowNumber
174 |         if ret:
175 |             return ret
176 |         return ''
177 | 
178 |     def GetFormat(self):
179 |         ret = self.raw.Format
180 |         if ret:
181 |             return ret
182 |         ret = self.raw.ShadowFormat
183 |         if ret:
184 |             return ret
185 |         return 'Series'
186 | 
187 |     def GetFileFormat(self):
188 |         if not self.raw.FilePath:
189 |             return 'Fileless'
190 |         ret = self.raw.FileFormat
191 |         if ret:
192 |             return ret
193 |         ret = self.raw.ShadowFileFormat
194 |         if ret:
195 |             return ret
196 |         return Translate('Unknown')
197 | 
198 |     def GetFilePath(self):
199 |         if not self.raw.FilePath:
200 |             return 'Fileless'
201 |         ret = self.raw.FilePath
202 |         return ret
203 | 
204 |     def GetFileName(self):
205 |         if not self.raw.FileNameWithExtension:
206 |             return 'Fileless'
207 |         ret = self.raw.FileNameWithExtension
208 |         return ret
209 | 
210 | 
211 |     # these were added by apm
212 | 
213 | 
214 |     def GetId(self):
215 |         return ToString(self.raw.Id)
216 | 
217 |     def GetPageCount(self):
218 |         return self.raw.PageCount
219 | 
220 |     def GetFileSize(self):
221 |         return self.raw.FileSize
222 | 
223 |     def GetCVDB_ID(self):
224 |         return ToString(extract_issue_ref(self.raw))
225 | 
226 | 
227 |     # Properties
228 | 
229 |     Cover = property(GetCover)
230 |     Series = property(GetSeries)
231 |     Volume = property(GetVolume)
232 |     Number = property(GetNumber)
233 |     Format = property(GetFormat)
234 |     FileFormat = property(GetFileFormat)
235 |     ID = property(GetId)
236 |     PageCount = property(GetPageCount)
237 |     FilePath = property(GetFilePath)
238 |     FileName = property(GetFileName)
239 | 
240 |     FileSize = property(GetFileSize)
241 |     CVDB_ID = property(GetCVDB_ID)
242 | 
243 | 
244 | 
--------------------------------------------------------------------------------
/dmParser.py:
--------------------------------------------------------------------------------
 1 | def _ParseLine(line_num, line):
 2 |     insideQuote = False
 3 |     result = []
 4 |     word = ''
 5 | 
 6 |     i = 0
 7 |     while i < len(line):
 8 |         c = line[i]
 9 | 
10 |         if not insideQuote:
11 |             if c == '#':
12 |                 # Found start of comment
13 |                 line = line[:i].strip()
14 |                 break
15 | 
16 |             elif c == '"':
17 |                 if word == '':
18 |                     # Found start of a quote
19 |                     insideQuote = True
20 |                 else:
21 |                     # Found quote inside word.
Keep it there 22 | word += c 23 | 24 | elif c.isspace(): 25 | if word != '': 26 | result.append(word) 27 | word = '' 28 | 29 | else: 30 | word += c 31 | 32 | else: # insideQuote 33 | if c == '\\' and line[i + 1] == '"': 34 | word += '"' 35 | i += 1 36 | 37 | elif c == '"': 38 | result.append(word) 39 | word = '' 40 | insideQuote = False 41 | 42 | else: 43 | word += c 44 | 45 | i += 1 46 | 47 | if word != '': 48 | result.append(word) 49 | 50 | if len(result) > 0: 51 | result.insert(0, line_num) 52 | result.insert(1, line) 53 | 54 | return result 55 | 56 | 57 | def Parse(lines): 58 | rules = [] 59 | 60 | line_num = 0 61 | for line in lines: 62 | line = line.strip('\r\n').strip() 63 | line_num += 1 64 | 65 | if line == '': 66 | continue 67 | 68 | rule = _ParseLine(line_num, line) 69 | if rule: 70 | rules.append(rule) 71 | 72 | return rules 73 | -------------------------------------------------------------------------------- /dmrules.dat.demo: -------------------------------------------------------------------------------- 1 | # All possible rules at the end of the file 2 | 3 | # this example removes fileless and 4 | # selects the noads files with largest filesize 5 | 6 | @ MOVEFILES True 7 | @ REMOVEFROMLIB True 8 | @ C2C_NOADS_GAP 120 9 | 10 | pagecount remove fileless 11 | covers keep some 12 | filename keep edit 13 | filename remove "cover only" 14 | filename keep noads 15 | pagecount keep noads 16 | text keep Bchry 17 | filesize keep largest 10% 18 | # 19 | # 20 | # filename keep c2c 21 | # filename remove c2c 22 | # filetype keep zip rar 23 | # filetype remove pdf 24 | # filetype remove fileless 25 | # filepath keep c2c 26 | # filepath remove c2c 27 | # tags remove c2c 28 | # tags keep c2c 29 | # notes keep c2c 30 | # notes remove c2c 31 | # text keep c2c 32 | # text remove c2c 33 | # scan keep abc 34 | # scan remove abc 35 | # covers keep all 36 | # covers keep some 37 | # filesize keep largest 38 | # filesize keep largest 10% 39 | # filesize remove largest 40 | # filesize remove largest 10% 41 | # filesize keep smallest 42 | # filesize keep smallest 10% 43 | # filesize remove smallest 44 | # filesize remove smallest 10% 45 | # pagecount keep largest 46 | # pagecount remove largest 47 | # pagecount keep smallest 48 | # pagecount remove smallest 49 | # pagecount keep fileless 50 | # pagecount remove fileless 51 | # pagecount keep noads 52 | # pagecount keep c2c 53 | # keep first -------------------------------------------------------------------------------- /duplicatesmanager.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/duplicatesmanager.png -------------------------------------------------------------------------------- /duplicatesmanager.py: -------------------------------------------------------------------------------- 1 | ##################################################################################################### 2 | ## 3 | ## duplicatesmanager.py - a script for comicrack 4 | ## 5 | ## Author: perezmu, pescuma 6 | ## 7 | ## Copyleft perezmu 2011. 
8 | ## 9 | ## Detailed credits: "http://code.google.com/p/comicrack-duplicates-manager/wiki/FellowCredits" 10 | ## 11 | ###################################################################################################### 12 | 13 | 14 | 15 | ######### 16 | # 17 | # Import section 18 | 19 | import sys, traceback 20 | import re 21 | import clr 22 | import System 23 | import System.IO 24 | from System.IO import Path, Directory, File, FileInfo 25 | 26 | clr.AddReference("System.Windows.Forms") 27 | from System.Windows.Forms import DialogResult, MessageBox, MessageBoxButtons, MessageBoxIcon 28 | 29 | 30 | from itertools import groupby 31 | from dmBookWrapper import * 32 | from utilsbycory import cleanupseries 33 | from processfunctions import * 34 | from dmParser import * 35 | from utilsbycory import * 36 | 37 | from constants import * 38 | 39 | 40 | # 41 | # 42 | ########## 43 | 44 | 45 | ##'''---------------------------------------------------------''' 46 | 47 | 48 | ############ 49 | # 50 | # MAIN FUNCTION 51 | 52 | 53 | #@Name DuplicatesManager 54 | #@Hook Books 55 | #@Image duplicatesmanager.png 56 | 57 | def DuplicatesManager(books): 58 | 59 | ######################################## 60 | # 61 | # Starting log file 62 | # 63 | 64 | if not Directory.Exists(DUPESDIRECTORY): 65 | try: 66 | Directory.CreateDirectory(DUPESDIRECTORY) 67 | except Exception, ex: 68 | MessageBox.Show('ERROR: '+ str(ex), "ERROR creating dump directory" + DUPESDIRECTORY, MessageBoxButtons.OK, MessageBoxIcon.Exclamation) 69 | return 70 | 71 | logfile = open(LOGFILE,'w') 72 | logfile.write('COMICRACK DUPLICATES MANAGER V '+VERSION+'\n\n') 73 | ''' Logfile initialized ''' 74 | 75 | # 76 | # 77 | ######################################### 78 | 79 | try: 80 | ProcessDuplicates(books, logfile) 81 | 82 | except Exception, ex: 83 | logfile.write('\n\nSTOPPED PROCESSING BECAUSE OF EXCEPTION:\n') 84 | traceback.print_exc(None, logfile, False) 85 | raise ex 86 | 87 | finally: 88 | logfile.close() 89 | 90 | 91 | 92 | def ProcessDuplicates(books, logfile): 93 | 94 | ######################################### 95 | # 96 | # Getting comics info 97 | 98 | comiclist = [] 99 | for book in books: 100 | 101 | b = dmBookWrapper(book) 102 | # re.sub(r'^0+','',b.Number) -> removes leading 0's 103 | series = b.Series 104 | if b.Volume: 105 | series += ' Vol.' 
+ b.Volume 106 | series += ' ' + b.Format 107 | comiclist.append((cleanupseries(series),re.sub(r'^0+','',b.Number),b.Volume,b.FileName,b.PageCount,b.FileSize/1048576.0,b.ID,b.CVDB_ID,b.FilePath,book.Tags,book.Notes,b.FileFormat,b.ScanInformation,book)) 108 | 109 | logfile.write('Parsing '+str(len(comiclist))+ ' ecomics\n') 110 | 111 | 112 | # 113 | # 114 | ######################################## 115 | 116 | ######################################### 117 | # 118 | # Setting intial options values 119 | 120 | options = {"movefiles":MOVEFILES, 121 | "removefromlib":REMOVEFROMLIB, 122 | "updateinfo":UPDATEINFO, 123 | "verbose":VERBOSE, 124 | "debug":DEBUG, 125 | "sizemargin":SIZEMARGIN, 126 | "coverpages":COVERPAGES, 127 | "c2c_noads_gap":C2C_NOADS_GAP} 128 | 129 | ######################################## 130 | # 131 | # Main Loop 132 | # 133 | 134 | try: 135 | 136 | ########################################### 137 | # 138 | # Load rules file 139 | 140 | rules = LoadRules(logfile, options) 141 | 142 | # 143 | ############################################ 144 | 145 | 146 | 147 | ############################################ 148 | # 149 | # Massage comics to get a list of dupes groups 150 | 151 | ''' Now we group books looking for dupes! ''' 152 | comiclist.sort() 153 | ''' begin sorting and sort the list ''' 154 | 155 | # TODO: I need to cleanup the series names and issues 1/2, 0.5, etc... 156 | # TODO: Also, check for CVDB items first! 157 | 158 | cl = {} 159 | ''' temp dictionary''' 160 | for key, group in groupby(comiclist, lambda x: x[SERIES]): 161 | cl[key] = list(group) 162 | '''groups by series''' 163 | ''' cl is a dictionary that now has 'series' as keys''' 164 | ''' we remove series with only one ecomic ''' 165 | 166 | logfile.write('============= Begining dupes identification ==================================\n\n') 167 | 168 | logfile.write('Parsing '+str(len(comiclist))+ ' ecomics\n') 169 | logfile.write('Found '+str(len(cl))+ ' different series\n') 170 | 171 | if options["verbose"]: 172 | for series in sorted(cl.keys()): 173 | logfile.write('\t'+series+'\n') 174 | 175 | remove = [] 176 | for series in cl.keys(): 177 | if len(cl[series])==1: 178 | remove.append(series) 179 | for series in remove: 180 | del cl[series] 181 | logfile.write('Found '+str(len(cl))+ ' different series with more than one issue\n') 182 | 183 | if options["verbose"]: 184 | for series in sorted(cl.keys()): 185 | logfile.write('\t'+series+'\n') 186 | 187 | ''' we now regroup each series looking for dupe issues ''' 188 | # We need to use a different list or sometimes python duplicates entries 189 | temp_cl = {} 190 | for series in cl.keys(): 191 | cl[series].sort() 192 | temp_dict = {} 193 | for key, group in groupby(cl[series], lambda x: x[NUMBER]): 194 | temp_dict[key] = list(group) 195 | temp_cl[series] = temp_dict 196 | cl = temp_cl 197 | 198 | ''' cleaning issues without dupes ''' 199 | remove = [] 200 | for series in cl.keys(): 201 | for number in cl[series]: 202 | if len(cl[series][number])==1: 203 | remove.append([series,number]) 204 | 205 | for a in remove: 206 | del cl[a[0]][a[1]] 207 | 208 | 209 | ''' now a second go for series without issues after non-dupe removal ''' 210 | remove = [] 211 | for i in cl: 212 | if len(cl[i])==0: 213 | remove.append(i) 214 | for i in remove: 215 | del cl[i] 216 | 217 | logfile.write('Found '+str(len(cl))+ ' different series with dupes\n') 218 | if options["verbose"]: 219 | for series in sorted(cl.keys()): 220 | logfile.write('\t'+series+'\t('+str(cl[series].keys())+')\n') 
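        # Illustrative shape of 'cl' at this point (series names and issue
        # numbers are hypothetical): each series maps to an issue-number map,
        # whose values are lists of comic tuples, e.g.
        #   cl = {'Batman Vol.1 Series': {'1': [comicA, comicB],
        #                                 '2': [comicC, comicD]}}
        # Singles have already been pruned above, so every remaining list is
        # a genuine group of dupes.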
221 | 
222 |         ''' Now I have them sorted, I convert them to a simple list of lists (groups)...
223 |             each item in this list is a list of dupes '''
224 | 
225 |         dupe_groups = []
226 |         for i in cl:
227 |             for j in cl[i]:
228 |                 dupe_groups.append(cl[i][j])
229 | 
230 |         logfile.write('Found '+str(len(dupe_groups)) +' groups of dupes, with a total of '+ str(len(reduce(list.__add__, dupe_groups, [])))+ ' ecomics.\n')
231 |         if options["verbose"]:
232 |             for group in sorted(dupe_groups):
233 |                 logfile.write('\t'+group[0][SERIES]+' #'+group[0][NUMBER]+'\n')
234 |                 for comic in group:
235 |                     logfile.write('\t\t'+comic[FILENAME]+'\n')
236 | 
237 |         dupe_groups.sort()
238 | 
239 |         logfile.write('\n============= End of dupes identification ====================================\n\n\n\n')
240 |         logfile.write('============= Beginning dupes processing =====================================\n\n')
241 | 
242 |         del cl
243 | 
244 |     #
245 |     ##########################################################
246 | 
247 |     #
248 |     # Exception handling
249 |     #
250 | 
251 |     except NoRulesFileException, ex:
252 |         MessageBox.Show('ERROR: '+ str(ex), "ERROR in Rules File", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
253 |         logfile.write('\n\nERROR in Rules File:\n')
254 |         traceback.print_exc(None, logfile, False)
255 |         return
256 | 
257 | 
258 |     ###################### processing ########################################
259 | 
260 |     movedcomics = 0
261 | 
262 |     new_groups = []
263 | 
264 |     # fix for issue 4 - if there are no dupes, end gracefully
265 |     if len(dupe_groups) == 0:
266 |         MessageBox.Show('Script execution completed: No duplicates found in the comics selected', 'Success', MessageBoxButtons.OK, MessageBoxIcon.Information)
267 |         logfile.write('\n\n\n ########################################################### \n\n\n')
268 |         logfile.write('Script execution completed: No duplicates found in the comics selected')
269 | 
270 |         del dupe_groups
271 |         del new_groups
272 |         return
273 | 
274 |     for group in dupe_groups:
275 | 
276 |         t_group = group[:]
277 | 
278 |         logfile.write('\n= PROCESSING GROUP_____\n')
279 |         logfile.write('= '+ t_group[0][SERIES] + ' #'+str(t_group[0][NUMBER])+'\n')
280 | 
281 |         i_rules = 0
282 | 
283 |         while (len(t_group) > 1) and (i_rules < len(rules)):
284 |             t_rule = rules[i_rules][:]
285 | 
286 |             line = t_rule[0]
287 |             t_rule = t_rule[1:]
288 | 
289 |             logfile.write('\n_________________ ')
290 |             logfile.write(line)
291 |             logfile.write(' _________________\n')
292 |             logfile.flush()
293 | 
294 |             if options["debug"]:
295 |                 logfile.write('  ' + str(t_rule) + '\n')
296 | 
297 |             t_rule.append(t_group[:])
298 |             t_rule.append(logfile)
299 |             t_rule.insert(1,ComicRack)
300 |             t_rule.insert(1,options)
301 | 
302 |             t_group = globals()[t_rule[0]](*t_rule[1:])   ### this is the trick to call a function using a string with its name
303 | 
304 |             i_rules = i_rules+1
305 | 
306 |         new_groups.append(t_group)
307 | 
308 | 
309 |     dupe_groups = new_groups[:]
310 | 
311 |     remain_comics = len(reduce(list.__add__, new_groups))
312 | 
313 |     for group in dupe_groups:
314 |         if len(group) == 1: new_groups.remove(group)
315 | 
316 |     # new_groups now holds the remaining groups for logging purposes
317 | 
318 |     #if len(dupe_groups)>=1:
319 |     #    print 'Found ',len(dupe_groups), ' groups of dupes, with a total of ', len(reduce(list.__add__, dupe_groups)), ' comics.'
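    # For reference, how the dispatch above unpacks (hypothetical values): a
    # parsed rule ['filesize keep largest 10%', 'keep_filesize_largest', 10]
    # is rebuilt into ['keep_filesize_largest', options, ComicRack, 10,
    # t_group, logfile], so globals()[t_rule[0]](*t_rule[1:]) amounts to
    # calling keep_filesize_largest(options, ComicRack, 10, t_group, logfile).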
320 | 
321 |     #
322 |     # End of Main Loop
323 |     #
324 |     ###########################################################
325 | 
326 |     #### End report
327 | 
328 |     MessageBox.Show('Script execution completed correctly on: '+ str(len(books))+ ' books.\n - '+str(len(dupe_groups))+' duplicated groups processed.\n - '+str(len(new_groups))+' duplicated groups remain.\n - '+str(remain_comics)+' comics remain', 'Success', MessageBoxButtons.OK, MessageBoxIcon.Information)
329 |     logfile.write('\n\n\n ########################################################### \n\n\n')
330 |     logfile.write('Script execution completed correctly on: '+ str(len(books))+ ' books.\n'+str(len(dupe_groups))+' duplicated groups processed.\n'+str(len(new_groups))+' duplicated groups remain.\n'+str(remain_comics)+' comics remain')
331 | 
332 |     #### Garbage collecting
333 | 
334 |     del dupe_groups
335 |     del new_groups
336 | 
337 |     return
338 | 
339 | 
340 | #### ============================================================================================================================
341 | 
342 | ###############################
343 | #
344 | # Read and parse the rules file dmrules.dat
345 | #
346 | 
347 | def LoadRules(logfile, options):
348 | 
349 |     # Check if file exists
350 |     if not File.Exists(RULESFILE):
351 |         raise NoRulesFileException('Rules File (dmrules.dat) could not be found in the script directory ('+ SCRIPTDIRECTORY +')')
352 | 
353 |     # Read file
354 |     f = open(RULESFILE, 'r')
355 |     all_lines = f.readlines()
356 |     f.close()
357 | 
358 |     # Parse rules and filter out options
359 |     options_list = []
360 |     rules = []
361 |     for line in Parse(all_lines):
362 |         if line[2] == '@':
363 |             options_list.append(line)
364 |         elif line[2][0] == '@':
365 |             options_list.append(line[:2] + ['@', line[2][1:]] + line[3:])
366 |         else:
367 |             rules.append(line)
368 | 
369 |     logfile.write('\n\n============= Beginning options reading ==================================\n\n')
370 |     logfile.write('Successfully read the following options: \n\n')
371 |     for option in options_list:
372 |         logfile.write('\tLine ' + str(option[0]) + ': ' + str(option[3:]) + '\n')
373 |     logfile.write('\n')
374 | 
375 |     logfile.write('\n\n============= Beginning rules reading ==================================\n\n')
376 |     logfile.write('Successfully read the following rules: \n\n')
377 |     for rule in rules:
378 |         logfile.write('\tLine ' + str(rule[0]) + ': ' + str(rule[2:]) + '\n')
379 |     logfile.write('\n')
380 | 
381 |     bool_options = ("movefiles", "removefromlib", "updateinfo", "verbose", "debug")
382 |     int_options = ("sizemargin", "coverpages", "c2c_noads_gap")
383 | 
384 | 
385 |     #
386 |     # Parse options
387 |     #
388 |     # Checks that options which need a boolean or integer value in fact have one
389 | 
390 |     bDict = {"false":False, "true":True}
391 | 
392 |     for option in options_list:
393 |         opLineNum = option[0]
394 |         opLine = option[1]
395 | 
396 |         if len(option) != 5:
397 |             raise NoRulesFileException('Line ' + str(opLineNum) + ': Option "' + opLine + '" has wrong format')
398 | 
399 |         opName = option[3].lower()
400 |         opVal = option[4].lower()
401 | 
402 |         # boolean option
403 |         if opName in bool_options:
404 | 
405 |             if opVal in bDict.keys():
406 |                 options[opName] = bDict[opVal]
407 |             else:
408 |                 raise NoRulesFileException('Line ' + str(opLineNum) + ': Option "'+ opLine +'" value is invalid ("True" or "False" required)')
409 | 
410 |         # integer option
411 |         elif opName in int_options:
412 |             try:
413 |                 options[opName] = int(opVal)
414 |             except:
415 |                 raise NoRulesFileException('Line '
+ str(opLineNum) + ': Option "'+ opLine +'" value is invalid (integer required)') 416 | # failure 417 | else: 418 | raise NoRulesFileException('Line ' + str(opLineNum) + ': Option "'+ opLine +'" not recognized (' + str(opName) + ')') 419 | 420 | 421 | logfile.write('\n\n============= Beginning options parsing ==================================\n\n') 422 | logfile.write('Using the following options: \n\n') 423 | for option in options: 424 | logfile.write('\t'+option.upper() + " = " + str(options[option]).upper()+'\n') 425 | 426 | # 427 | # Parse rules 428 | # 429 | 430 | logfile.write('\n\n============= Beginning rules parsing ==================================\n\n') 431 | 432 | parsed_rules = [] 433 | 434 | for rule in rules: 435 | parsed_rules.append(ParseRule(rule)) 436 | 437 | if VERBOSE: 438 | logfile.write('\nParsed rules:\n\n') 439 | for rule in parsed_rules: 440 | logfile.write('\t\t'+str(rule)+'\n') 441 | logfile.write('\n============= End of rules parsing ======================================\n\n\n\n') 442 | 443 | return parsed_rules 444 | 445 | 446 | 447 | def AsPercentage(args, index, defVal): 448 | if index < len(args): 449 | text = args[index] 450 | else: 451 | text = defVal 452 | 453 | if text[-1] == '%': 454 | num = text[:-1] 455 | else: 456 | num = text 457 | 458 | try: 459 | return int(num) 460 | except: 461 | raise Exception('Invalid percentage value: ' + text) 462 | 463 | 464 | 465 | known_rules = [ 466 | [ ["pagecount", "keep", "fileless"], lambda args: ["keep_pagecount_fileless"] ], 467 | [ ["pagecount", "remove", "fileless"],lambda args: ["remove_pagecount_fileless"] ], 468 | [ ["pagecount", "keep", "largest"], lambda args: ["keep_pagecount_largest", AsPercentage(args, 0, "0%")] ], 469 | [ ["pagecount", "remove", "largest"], lambda args: ["remove_pagecount_largest", AsPercentage(args, 0, "0%")] ], 470 | [ ["pagecount", "keep", "smallest"], lambda args: ["keep_pagecount_smallest", AsPercentage(args, 0, "0%")] ], 471 | [ ["pagecount", "remove", "smallest"],lambda args: ["remove_pagecount_smallest", AsPercentage(args, 0, "0%")] ], 472 | [ ["pagecount", "keep", "noads"], lambda args: ["keep_pagecount_noads"] ], 473 | [ ["pagecount", "keep", "c2c"], lambda args: ["keep_pagecount_c2c"] ], 474 | [ ["filesize", "keep", "largest"], lambda args: ["keep_filesize_largest", AsPercentage(args, 0, "0%")] ], 475 | [ ["filesize", "remove", "largest"], lambda args: ["remove_filesize_largest", AsPercentage(args, 0, "0%")] ], 476 | [ ["filesize", "keep", "smallest"], lambda args: ["keep_filesize_smallest", AsPercentage(args, 0, "0%")] ], 477 | [ ["filesize", "remove", "smallest"], lambda args: ["remove_filesize_smallest", AsPercentage(args, 0, "0%")] ], 478 | [ ["pagesize", "keep", "largest"], lambda args: ["keep_pagesize_largest", AsPercentage(args, 0, "0%")] ], 479 | [ ["pagesize", "remove", "largest"], lambda args: ["remove_pagesize_largest", AsPercentage(args, 0, "0%")] ], 480 | [ ["pagesize", "keep", "smallest"], lambda args: ["keep_pagesize_smallest", AsPercentage(args, 0, "0%")] ], 481 | [ ["pagesize", "remove", "smallest"], lambda args: ["remove_pagesize_smallest", AsPercentage(args, 0, "0%")] ], 482 | [ ["covers", "keep", "some"], lambda args: ["keep_covers_all", False] ], 483 | [ ["covers", "keep", "all"], lambda args: ["keep_covers_all", True] ], 484 | [ ["filename", "keep"], lambda args: ["keep_with_words", args, [FILENAME]] ], 485 | [ ["filename", "remove"], lambda args: ["remove_with_words", args, [FILENAME]] ], 486 | [ ["filepath", "keep"], lambda args: 
["keep_with_words", args, [FILEPATH]] ], 487 | [ ["filepath", "remove"], lambda args: ["remove_with_words", args, [FILEPATH]] ], 488 | [ ["tags", "keep"], lambda args: ["keep_with_words", args, [TAGS]] ], 489 | [ ["tags", "remove"], lambda args: ["remove_with_words", args, [TAGS]] ], 490 | [ ["notes", "keep"], lambda args: ["keep_with_words", args, [NOTES]] ], 491 | [ ["notes", "remove"], lambda args: ["remove_with_words", args, [NOTES]] ], 492 | [ ["text", "keep"], lambda args: ["keep_with_words", args, [FILENAME, FILEPATH, TAGS, NOTES, SCAN]] ], 493 | [ ["text", "remove"], lambda args: ["remove_with_words", args, [FILENAME, FILEPATH, TAGS, NOTES, SCAN]] ], 494 | [ ["scan", "keep"], lambda args: ["keep_with_words", args, [SCAN]] ], 495 | [ ["scan", "remove"], lambda args: ["remove_with_words", args, [SCAN]] ], 496 | [ ["filetype", "keep"], lambda args: ["keep_with_words", args, [FILETYPE]] ], 497 | [ ["filetype", "remove"], lambda args: ["remove_with_words", args, [FILETYPE]] ], 498 | [ ["keep", "first"], lambda args: ["keep_first"] ], 499 | ] 500 | 501 | 502 | def ParseRule(rule): 503 | line_num = rule[0] 504 | line = rule[1] 505 | rule_tokens = rule[2:] 506 | 507 | # Try to match to a known command 508 | for cmd in known_rules: 509 | tokens = cmd[0] 510 | action = cmd[1] 511 | 512 | if len(rule_tokens) < len(tokens): 513 | continue 514 | 515 | # Check it the rule matches the command 516 | matches = True 517 | for i in range(len(tokens)): 518 | if tokens[i] != rule_tokens[i]: 519 | matches = False 520 | break; 521 | 522 | if not matches: 523 | continue 524 | 525 | args = rule_tokens[len(tokens):] 526 | 527 | try: 528 | result = [line] 529 | result.extend(action(args)) 530 | return result 531 | except Exception, ex: 532 | raise NoRulesFileException('Line ' + str(line_num) + ': ' + str(ex) + '\n' + line) 533 | 534 | # If got here not command was matched 535 | raise NoRulesFileException('Line ' + str(line_num) + ': Rule could not be parsed:\n' + line) 536 | 537 | 538 | 539 | class NoRulesFileException(Exception): 540 | pass 541 | 542 | -------------------------------------------------------------------------------- /duplicatesmanager.xcf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/duplicatesmanager.xcf -------------------------------------------------------------------------------- /duplicatesmanager_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pescuma/comicrack-duplicates-manager/162d6e10225b9e9eb3b4e618c3343eec08c92887/duplicatesmanager_small.png -------------------------------------------------------------------------------- /getcvdb.py: -------------------------------------------------------------------------------- 1 | ##################################################################################################### 2 | ## 3 | ## getcvdb.py - part of duplicatemanager, a script for comicrack 4 | ## 5 | ## Author: cbanack, for his Comic Vine Scraper module 6 | ## 7 | ## 8 | ###################################################################################################### 9 | 10 | ### original credits 11 | ''' 12 | This module contains utility methods for working with ComicRack 13 | ComicBook objects (i.e. 'book' objects). 
14 | 15 | @author: Cory Banack 16 | ''' 17 | ######### 18 | 19 | 20 | import re 21 | 22 | # ============================================================================= 23 | def extract_issue_ref(book): 24 | ''' 25 | This method looks in the Tags and Notes fields of the given book for 26 | evidence that the given ComicBook has been scraped before. If possible, 27 | it will construct an IssueRef based on that evidence, and return it. 28 | If not, it will return None. 29 | 30 | If the user has manually added a "skip" flag to one of those fields, this 31 | method will return the string "skip", which should be interpreted as 32 | "never scrape this book". 33 | ''' 34 | 35 | 36 | tag_found = re.search(r'(?i)CVDB(\d{1,})', book.Tags) 37 | if not tag_found: 38 | tag_found = re.search(r'(?i)CVDB(\d{1,})', book.Notes) 39 | if not tag_found: 40 | tag_found = re.search(r'(?i)ComicVine.?\[(\d{1,})', book.Notes) 41 | 42 | retval = None 43 | if tag_found: 44 | retval = tag_found.group(1).lower() 45 | 46 | return retval 47 | -------------------------------------------------------------------------------- /processfunctions.py: -------------------------------------------------------------------------------- 1 | ##################################################################################################### 2 | ## 3 | ## processfunctions.py - part of duplicatemanager, a script for comicrack 4 | ## 5 | ## Author: perezmu, pescuma 6 | ## 7 | ## Copyleft perezmu 2011. 8 | ## 9 | ###################################################################################################### 10 | 11 | ######### 12 | # 13 | # Import section 14 | 15 | 16 | import constants 17 | 18 | import re 19 | import clr 20 | import System 21 | import System.IO 22 | from System.IO import Path, Directory, File, FileInfo 23 | 24 | clr.AddReference("System.Windows.Forms") 25 | from System.Windows.Forms import DialogResult, MessageBox, MessageBoxButtons, MessageBoxIcon 26 | 27 | 28 | from itertools import groupby 29 | from dmBookWrapper import * 30 | from utilsbycory import * 31 | 32 | from constants import * 33 | 34 | PAGESIZE = -1 35 | 36 | # 37 | # 38 | ########## 39 | 40 | 41 | 42 | ################################################################################################# 43 | 44 | 45 | # ================ PAGECOUNT FUNCTIONS ========================================================== 46 | 47 | 48 | def keep_pagecount_noads(options, cr, dgroup, logfile): 49 | ''' Keeps from the 'group' the ones that seem to be 'noads' (less pages) 50 | dgroup -> list of duplicate comics 51 | logfile -> file object ''' 52 | 53 | to_keep = [] 54 | to_remove =[] 55 | 56 | by_size = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=False) # sorts by filesize of covers 57 | 58 | for comic in dgroup: 59 | if comic[PAGECOUNT] <= options["coverpages"]: 60 | by_size.remove(comic) 61 | to_keep.append(comic) 62 | if comic[PAGECOUNT] == 0: logfile.write('skipping... '+ comic[SERIES]+' #' + comic[NUMBER] + ' (fileless)\n') 63 | else: logfile.write('skipping... '+ comic[FILENAME]+' #' + comic[NUMBER] +' (pages '+str(comic[PAGECOUNT])+')\n') 64 | 65 | i=0 #keeps the first one 66 | to_keep.append(by_size[i]) 67 | logfile.write('keeping... 
'+ by_size[i][FILENAME]+' (pages '+str(by_size[i][PAGECOUNT])+')\n')
 68 | 
 69 | 
 70 |     while (i < len(by_size)-1) and (int(by_size[i+1][PAGECOUNT]) < (int(by_size[i][PAGECOUNT]) + int(options["c2c_noads_gap"]))):
 71 |         to_keep.append(by_size[i+1])
 72 |         logfile.write('keeping... '+ by_size[i+1][FILENAME]+' (pages '+str(by_size[i+1][PAGECOUNT])+')\n')
 73 |         i = i+1
 74 |     for j in range (i+1,len(by_size)):
 75 |         to_remove.append(by_size[j])
 76 |         logfile.write('removing... '+ by_size[j][FILENAME]+' (pages '+str(by_size[j][PAGECOUNT])+')\n')
 77 | 
 78 |     if to_remove != []:
 79 |         updateinfo(options, to_remove, to_keep, logfile)
 80 |         deletecomics(options, cr, to_remove, logfile)
 81 | 
 82 |     return to_keep[:]
 83 | 
 84 | 
 85 | def keep_pagecount_c2c(options, cr, dgroup, logfile):
 86 |     ''' Keeps from the 'group' the ones that seem to be 'c2c' (more pages)
 87 |         dgroup -> list of duplicate comics
 88 |         logfile -> file object '''
 89 | 
 90 |     to_keep = []
 91 |     to_remove = []
 92 | 
 93 |     by_size = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=True)   # sorts by page count, descending
 94 | 
 95 |     i=0   # keeps the first one
 96 |     to_keep.append(by_size[i])
 97 |     logfile.write('keeping... '+ by_size[i][FILENAME]+' (pages '+str(by_size[i][PAGECOUNT])+')\n')
 98 | 
 99 |     while (i < len(by_size)-1) and (int(by_size[i+1][PAGECOUNT]) > (int(by_size[i][PAGECOUNT]) - int(options["c2c_noads_gap"]))):
100 |         to_keep.append(by_size[i+1])
101 |         logfile.write('keeping... '+ by_size[i+1][FILENAME]+' (pages '+str(by_size[i+1][PAGECOUNT])+')\n')
102 |         i = i+1
103 |     for j in range (i+1,len(by_size)):
104 |         to_remove.append(by_size[j])
105 |         logfile.write('removing... '+ by_size[j][FILENAME]+' (pages '+str(by_size[j][PAGECOUNT])+')\n')
106 | 
107 |     if to_remove != []:
108 |         updateinfo(options, to_remove, to_keep, logfile)
109 |         deletecomics(options, cr, to_remove, logfile)
110 | 
111 |     return to_keep[:]
112 | 
113 | 
114 | def process_pagecount_largest(options, cr, percentage, dgroup, logfile, test_to_keep):
115 |     ''' Keeps from the 'dgroup' the ones with most pages
116 |         dgroup -> list of duplicate comics
117 |         logfile -> file object
118 |         percentage -> a percentage over the page count that is used to keep more comics
119 |         test_to_keep -> True to keep the largest, False to remove
120 |     '''
121 | 
122 |     by_pages = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=True)   # sorts by number of pages
123 |     min_pages = by_pages[0][PAGECOUNT] * (1 - percentage/100.0)
124 | 
125 |     if options["verbose"]:
126 |         logfile.write('Filtering all files with at least ' + str(min_pages) + ' pages\n')
127 | 
128 |     def IsToKeep(comic):
129 |         return comic[PAGECOUNT] >= min_pages
130 | 
131 |     return process_dups(options, cr, IsToKeep, test_to_keep, [PAGECOUNT], dgroup, logfile)
132 | 
133 | def keep_pagecount_largest(options, cr, percentage, dgroup, logfile):
134 |     return process_pagecount_largest(options, cr, percentage, dgroup, logfile, True)
135 | 
136 | def remove_pagecount_largest(options, cr, percentage, dgroup, logfile):
137 |     return process_pagecount_largest(options, cr, percentage, dgroup, logfile, False)
138 | 
139 | 
140 | def process_pagecount_smallest(options, cr, percentage, dgroup, logfile, test_to_keep):
141 |     ''' Keeps from the 'group' the ones with fewest pages
142 |         dgroup -> list of duplicate comics
143 |         logfile -> file object
144 |         percentage -> a percentage over the page count that is used to keep more comics
145 |         test_to_keep -> True to keep the smallest, False to remove
146 |     '''
147 | 
148 |     by_pages = sorted(dgroup, key=lambda dgroup: dgroup[PAGECOUNT], reverse=False)   # sorts by number of pages, ascending
149 | 
150 |     # drop fileless comics from the baseline (iterate over a copy so removal is safe)
151 |     for comic in by_pages[:]:
152 |         if comic[PAGECOUNT] == 0:
153 |             by_pages.remove(comic)
154 | 
155 |     if len(by_pages) < 1:
156 |         max_pages = 0
157 |     else:
158 |         max_pages = by_pages[0][PAGECOUNT] * (1 + percentage/100.0)
159 | 
160 |     if options["verbose"]:
161 |         logfile.write('Filtering all files with at max ' + str(max_pages) + ' pages\n')
162 | 
163 |     def IsToKeep(comic):
164 |         return comic[PAGECOUNT] <= max_pages
165 | 
166 |     return process_dups(options, cr, IsToKeep, test_to_keep, [PAGECOUNT], dgroup, logfile)
167 | 
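# Worked example for the smallest-pagecount filter above (hypothetical
# numbers): page counts [0, 20, 22, 40] with percentage=10 -> the fileless
# (0-page) entry is skipped when picking the baseline, so max_pages =
# 20 * 1.1 = 22.0; the 20- and 22-page books (and the fileless entry) pass
# IsToKeep, while the 40-page book fails and is kept or removed according
# to test_to_keep.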
168 | def keep_pagecount_smallest(options, cr, percentage, dgroup, logfile):
169 |     return process_pagecount_smallest(options, cr, percentage, dgroup, logfile, True)
170 | 
171 | def remove_pagecount_smallest(options, cr, percentage, dgroup, logfile):
172 |     return process_pagecount_smallest(options, cr, percentage, dgroup, logfile, False)
173 | 
174 | 
175 | def keep_pagecount_fileless(options, cr, dgroup, logfile):
176 |     ''' Keeps only fileless comics
177 |         dgroup -> list of duplicate comics
178 |         logfile -> file object '''
179 | 
180 |     def IsToKeep(comic):
181 |         return comic[FILENAME] == "Fileless"
182 | 
183 |     return process_dups(options, cr, IsToKeep, True, [PAGECOUNT], dgroup, logfile)
184 | 
185 | 
186 | def remove_pagecount_fileless(options, cr, dgroup, logfile):
187 |     ''' Removes fileless comics
188 |         dgroup -> list of duplicate comics
189 |         logfile -> file object '''
190 | 
191 |     to_keep = dgroup[:]
192 |     to_remove = []
193 |     fileless_thumb = []
194 |     fileless_nothumb = []
195 | 
196 |     # First separate all fileless
197 |     for comic in dgroup:
198 | 
199 |         if comic[FILENAME] == "Fileless":
200 |             to_keep.remove(comic)
201 |             if comic[BOOK].CustomThumbnailKey == None:
202 |                 fileless_nothumb.append(comic)
203 |             else:
204 |                 fileless_thumb.append(comic)
205 | 
206 |     if len(to_keep) == 0:                                # all are fileless
207 |         if len(fileless_nothumb) == len(dgroup):         # none has a custom thumb
208 |             to_keep.append(fileless_nothumb[0])          # keep the first one
209 |             fileless_nothumb.pop(0)
210 |             to_remove = fileless_nothumb[:]
211 |         elif len(fileless_thumb) == 1:                   # only one with a custom thumb
212 |             to_keep = fileless_thumb[:]
213 |             to_remove = fileless_nothumb[:]
214 |         else:                                            # more than one with a custom thumb
215 |             to_keep.append(fileless_thumb[0])            # keep the first one
216 |             fileless_thumb.pop(0)
217 |             to_remove = fileless_thumb[:]
218 |             to_remove.extend(fileless_nothumb)
219 | 
220 |     else:    # if there were non-fileless comics, remove all fileless
221 |         to_remove.extend(fileless_thumb)
222 |         to_remove.extend(fileless_nothumb)
223 | 
224 | 
225 |     for comic in fileless_nothumb:
226 |         logfile.write('removing... '+ comic[SERIES]+' #' + comic[NUMBER] + ' (fileless + no cover)\n')
227 |     for comic in fileless_thumb:
228 |         logfile.write('removing... '+ comic[SERIES]+' #' + comic[NUMBER] + ' (fileless + custom cover)\n')
229 |     for comic in to_keep:
230 |         logfile.write('keeping...
'+ comic[FILENAME]+' (pages '+str(comic[PAGECOUNT])+')\n') 231 | 232 | if to_remove != []: 233 | updateinfo(options, to_remove, to_keep, logfile) 234 | deletecomics(options, cr, to_remove, logfile) 235 | 236 | return to_keep[:] 237 | 238 | 239 | 240 | # =================== FILESIZE FUNCTIONS ======================================================== 241 | 242 | 243 | def process_filesize_largest(options, cr, percentage, dgroup, logfile, test_to_keep): 244 | ''' Keeps from the 'group' the largest comic 245 | dgroup -> list of duplicate comics 246 | logfile -> file object 247 | percentage -> a percentage over the size that is used to keep more comics 248 | test_to_keep -> True to keep largest, False to remove 249 | ''' 250 | 251 | by_size = sorted(dgroup, key=lambda dgroup: dgroup[FILESIZE], reverse=True) # sorts by filesize of covers 252 | min_size = by_size[0][FILESIZE] * (1 - percentage/100.0) 253 | 254 | if options["verbose"]: 255 | logfile.write('Filtering all files with size at least ' + str(min_size) + '\n') 256 | 257 | def IsToKeep(comic): 258 | return comic[FILESIZE] >= min_size 259 | 260 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE], dgroup, logfile) 261 | 262 | def keep_filesize_largest(options, cr, percentage, dgroup, logfile): 263 | return process_filesize_largest(options, cr, percentage, dgroup, logfile, True) 264 | 265 | def remove_filesize_largest(options, cr, percentage, dgroup, logfile): 266 | return process_filesize_largest(options, cr, percentage, dgroup, logfile, False) 267 | 268 | 269 | def process_filesize_smallest(options, cr, percentage, dgroup, logfile, test_to_keep): 270 | ''' Keeps from the 'group' the smallest comic 271 | dgroup -> list of duplicate comics 272 | logfile -> file object 273 | percentage -> a percentage over the size that is used to keep more comics 274 | test_to_keep -> True to keep smallest, False to remove 275 | ''' 276 | 277 | by_size = sorted(dgroup, key=lambda dgroup: dgroup[FILESIZE], reverse=False) # sorts by filesize of covers 278 | 279 | # keep fileless 280 | for comic in by_size: 281 | if comic[PAGECOUNT] == 0: 282 | by_size.remove(comic) 283 | 284 | if len(by_size) < 1: 285 | max_size = 0 286 | else: 287 | max_size = by_size[0][FILESIZE] * (1 + percentage/100.0) 288 | 289 | if options["verbose"]: 290 | logfile.write('Filtering all files with size at max ' + str(max_size) + '\n') 291 | 292 | def IsToKeep(comic): 293 | return comic[FILESIZE] <= max_size 294 | 295 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE], dgroup, logfile) 296 | 297 | def keep_filesize_smallest(options, cr, percentage, dgroup, logfile): 298 | return process_filesize_smallest(options, cr, percentage, dgroup, logfile, True) 299 | 300 | def remove_filesize_smallest(options, cr, percentage, dgroup, logfile): 301 | return process_filesize_smallest(options, cr, percentage, dgroup, logfile, False) 302 | 303 | 304 | 305 | # =================== PAGESIZE FUNCTIONS ======================================================== 306 | 307 | def pagesize(comic): 308 | if comic[PAGECOUNT] == 0: 309 | return 0 310 | else: 311 | return comic[FILESIZE] / comic[PAGECOUNT] 312 | 313 | def process_pagesize_largest(options, cr, percentage, dgroup, logfile, test_to_keep): 314 | ''' Keeps from the 'group' the comic with largest page size 315 | dgroup -> list of duplicate comics 316 | logfile -> file object 317 | percentage -> a percentage over the size that is used to keep more comics 318 | test_to_keep -> True to keep largest, False to remove 319 | ''' 
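    # Worked example (hypothetical numbers): per-page sizes of 2.0, 1.9 and
    # 1.2 MB/page with percentage=10 give min_size = 2.0 * 0.9 = 1.8, so the
    # 2.0 and 1.9 MB/page books pass IsToKeep while the 1.2 MB/page one fails,
    # and test_to_keep decides whether passing means keeping or removing.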
320 | 321 | by_size = sorted(dgroup, key=lambda dgroup: pagesize(dgroup), reverse=True) # sorts by filesize of covers 322 | min_size = pagesize(by_size[0]) * (1 - percentage/100.0) 323 | 324 | if options["verbose"]: 325 | logfile.write('Filtering all files with page size at least ' + str(min_size) + '\n') 326 | 327 | def IsToKeep(comic): 328 | return pagesize(comic) >= min_size 329 | 330 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE, PAGECOUNT, PAGESIZE], dgroup, logfile) 331 | 332 | def keep_pagesize_largest(options, cr, percentage, dgroup, logfile): 333 | return process_pagesize_largest(options, cr, percentage, dgroup, logfile, True) 334 | 335 | def remove_pagesize_largest(options, cr, percentage, dgroup, logfile): 336 | return process_pagesize_largest(options, cr, percentage, dgroup, logfile, False) 337 | 338 | 339 | def process_pagesize_smallest(options, cr, percentage, dgroup, logfile, test_to_keep): 340 | ''' Keeps from the 'group' the comic with smallest page size 341 | dgroup -> list of duplicate comics 342 | logfile -> file object 343 | percentage -> a percentage over the size that is used to keep more comics 344 | test_to_keep -> True to keep smallest, False to remove 345 | ''' 346 | 347 | by_size = sorted(dgroup, key=lambda dgroup: pagesize(dgroup), reverse=False) # sorts by filesize of covers 348 | 349 | # keep fileless 350 | for comic in by_size: 351 | if comic[PAGECOUNT] == 0: 352 | by_size.remove(comic) 353 | 354 | if len(by_size) < 1: 355 | max_size = 0 356 | else: 357 | max_size = pagesize(by_size[0]) * (1 + percentage/100.0) 358 | 359 | if options["verbose"]: 360 | logfile.write('Filtering all files with page size at max ' + str(max_size) + '\n') 361 | 362 | def IsToKeep(comic): 363 | return pagesize(comic) <= max_size 364 | 365 | return process_dups(options, cr, IsToKeep, test_to_keep, [FILESIZE, PAGECOUNT, PAGESIZE], dgroup, logfile) 366 | 367 | def keep_pagesize_smallest(options, cr, percentage, dgroup, logfile): 368 | return process_pagesize_smallest(options, cr, percentage, dgroup, logfile, True) 369 | 370 | def remove_pagesize_smallest(options, cr, percentage, dgroup, logfile): 371 | return process_pagesize_smallest(options, cr, percentage, dgroup, logfile, False) 372 | 373 | 374 | 375 | # =================== COVERS FUNCTIONS ============================================================ 376 | 377 | 378 | def keep_covers_all(options, cr, option, dgroup, logfile): 379 | ''' Keeps from the 'group' the comics with largest number of '(n covers)' in the file name 380 | dgroup -> list of duplicate comics 381 | logfile -> file object 382 | option -> boolean: True means all comics with or without 'covers' are considered, meaning that 383 | if there is a single comic with 'covers' the rest will be deleted. 
False means 384 | that only those comics with the 'covers' word will be considered in the process ''' 385 | 386 | with_covers = [] 387 | to_keep = [] 388 | to_remove = [] 389 | 390 | for comic in dgroup: 391 | searchstring = convertnumberwords(comic[FILENAME],False) 392 | searchstring = searchstring.replace("(both","(2") 393 | searchstring = searchstring.lower() 394 | 395 | 396 | m = re.search('\((\d*) +covers\)', searchstring) 397 | if m: 398 | with_covers.append((comic, int(m.groups(0)[0]))) 399 | else: 400 | if option == True: 401 | with_covers.append((comic,1)) 402 | else: 403 | to_keep.append(comic) 404 | 405 | if with_covers != []: 406 | dgroup = [] 407 | with_covers = sorted(with_covers, key=lambda to_keep: to_keep[1], reverse=True) # sorts by number of covers 408 | max = with_covers[0][1] # max number of covers found 409 | 410 | temp_with_covers = with_covers[:] 411 | for (comic,covers) in temp_with_covers: 412 | if covers < max: 413 | with_covers.remove((comic,covers)) 414 | to_remove.append(comic) 415 | logfile.write('removing... '+ comic[FILENAME]+'\n') 416 | else: 417 | # logfile.write('keeping... '+ comic[FILENAME]+'\n') 418 | to_keep.append(comic) 419 | 420 | 421 | for comic in to_keep: 422 | dgroup.append(comic) 423 | logfile.write('keeping... '+ comic[FILENAME]+'\n') 424 | 425 | if to_remove != []: 426 | updateinfo(options, to_remove, dgroup, logfile) 427 | deletecomics(options, cr, to_remove, logfile) 428 | 429 | del with_covers 430 | del to_keep 431 | del to_remove 432 | 433 | return dgroup 434 | 435 | 436 | 437 | # =================== WORD SEARCH FUNCTIONS ======================================================== 438 | 439 | def fix_words_for_testing(words): 440 | wordlist = [] 441 | 442 | for word in words: 443 | word = word.lower() 444 | 445 | ''' some common substitutions .... 
436 | 
437 | # ===================  WORD SEARCH FUNCTIONS  ========================================================
438 | 
439 | def fix_words_for_testing(words):
440 |     wordlist = []
441 | 
442 |     for word in words:
443 |         word = word.lower()
444 | 
445 |         ''' some common substitutions .... more can be added '''
446 |         if word in ('c2c', 'ctc', 'fiche'):
447 |             wordlist.extend(['c2c', 'ctc', 'fiche'])
448 |         elif word in ('noads',):   # tuple, not bare string: 'in' on a string would also match substrings like 'no'
449 |             wordlist.extend(['noads', 'no ads'])
450 |         elif word in ('(f)', 'fixed'):
451 |             wordlist.extend(['(f)', 'fixed'])
452 |         elif word in ('(f)', 'fiche'):   # unreachable: both words are already caught by the branches above
453 |             wordlist.extend(['(f)', 'fiche'])
454 |         elif word in ('zip', 'cbz'):
455 |             wordlist.extend(['zip', 'cbz'])
456 |         elif word in ('rar', 'cbr'):
457 |             wordlist.extend(['rar', 'cbr'])
458 |         else:
459 |             wordlist.append(cleanupseries(word))
460 | 
461 |     return wordlist
462 | 
463 | 
464 | def process_with_words(options, cr, words, items, dgroup, logfile, test_to_keep):
465 |     ''' Removes from the 'group' all comics that do not include any of the 'words'
466 |         in the fields 'items'
467 |         dgroup -> list of duplicate comics
468 |         logfile -> file object
469 |         words -> text strings to be searched for
470 |         items -> LIST of fields to search in '''
471 | 
472 |     wordlist = fix_words_for_testing(words)
473 | 
474 |     def IsToKeep(comic):
475 |         searchstring = ""
476 |         for item in items:
477 |             searchstring = searchstring + " " + cleanupseries(comic[item])
478 |             ''' adds all search strings together '''
479 | 
480 |         for word in wordlist:
481 |             if searchstring.find(word) != -1:   # word found
482 |                 return True
483 | 
484 |         return False
485 | 
486 |     return process_dups(options, cr, IsToKeep, test_to_keep, items, dgroup, logfile)
487 | 
488 | def keep_with_words(options, cr, words, items, dgroup, logfile):
489 |     return process_with_words(options, cr, words, items, dgroup, logfile, True)
490 | 
491 | def remove_with_words(options, cr, words, items, dgroup, logfile):
492 |     return process_with_words(options, cr, words, items, dgroup, logfile, False)
493 | 
494 | 
495 | def keep_first(options, cr, dgroup, logfile):
496 |     ''' Keeps only the first comic in the group
497 |         dgroup -> list of duplicate comics
498 |         logfile -> file object '''
499 | 
500 |     to_keep = dgroup[0]
501 | 
502 |     def IsToKeep(book):
503 |         return book == to_keep
504 | 
505 |     return process_dups(options, cr, IsToKeep, True, [], dgroup, logfile)
506 | 
507 | 
508 | 
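# --- Illustrative sketch (annotation, not part of the original file) ---
# fix_words_for_testing expands a rule word into its synonym list (given the
# tuple fix above), so a rule like "keep with noads" also matches files
# tagged 'no ads':
def _example_word_expansion():
    assert fix_words_for_testing(['noads']) == ['noads', 'no ads']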
509 | ###################################################################################################
510 | 
511 | # ================ BASE FUNCTION TO HANDLE THE DUPS ================================================
512 | 
513 | 
514 | def process_dups(options, cr, test_to_keep, keep_if_test_is_true, fields, dgroup, logfile):
515 |     ''' Removes from the 'group' all comics for which test_to_keep(comic) != keep_if_test_is_true
516 |         dgroup -> list of duplicate comics
517 |         logfile -> file object
518 |         test_to_keep -> function that does the testing '''
519 | 
520 |     to_keep = []
521 |     to_remove = []
522 | 
523 |     for comic in dgroup:
524 |         if test_to_keep(comic) == keep_if_test_is_true:
525 |             if comic not in to_keep:
526 |                 to_keep.append(comic)
527 |             continue
528 | 
529 |         if comic not in to_keep:
530 |             to_remove.append(comic)
531 | 
532 |     # Make sure at least 1 book remains!!!!
533 |     if len(to_keep) < 1:
534 |         logfile.write('Filter would remove all items, so it will be ignored\n')
535 |         to_keep = dgroup[:]
536 |         to_remove = []
537 | 
538 |     # Log comic actions
539 |     for comic in dgroup:
540 |         if comic in to_keep:
541 |             logfile.write('keeping... ')
542 |         else:
543 |             logfile.write('removing... ')
544 | 
545 |         logfile.write(comic[FILENAME])
546 | 
547 |         if options["verbose"] and len(fields) > 0:
548 |             logfile.write(' (')
549 |             for i in range(len(fields)):
550 |                 if i > 0:
551 |                     logfile.write(' ')
552 |                 f = fields[i]
553 |                 if f == PAGESIZE:
554 |                     logfile.write('pagesize=' + ToString(pagesize(comic)))
555 |                 else:
556 |                     logfile.write(FIELD_NAMES[f] + '=' + ToString(comic[f]))
557 |             logfile.write(')')
558 | 
559 |         logfile.write('\n')
560 |         logfile.flush()
561 | 
562 |     # Delete books
563 |     if to_remove != []:
564 |         updateinfo(options, to_remove, to_keep, logfile)
565 |         deletecomics(options, cr, to_remove, logfile)
566 | 
567 |     del to_remove
568 | 
569 |     return to_keep
570 | 
571 | 
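# --- Illustrative sketch (annotation, not part of the original file) ---
# Every rule funnels into process_dups: it splits the group with a predicate,
# refuses to empty it, and returns the survivors, which the next rule in
# dmrules.dat then receives. A simplified stand-alone analogue:
def _example_rule_pipeline():
    group = [{'name': 'a', 'pages': 24}, {'name': 'b', 'pages': 0}]
    survivors = [c for c in group if c['pages'] > 0] or group   # 'or group' mirrors the keep-at-least-one guard
    assert [c['name'] for c in survivors] == ['a']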
572 | # ================ UPDATE INFO FUNCTION ============================================================
573 | 
574 | # Copy missing data from removed files to kept files
575 | def updateinfo(options, to_remove_files, to_keep_files, logfile):
576 |     to_remove = []
577 |     for book in to_remove_files:
578 |         to_remove.append(dmBookWrapper(book[BOOK]))
579 |     to_keep = []
580 |     for book in to_keep_files:
581 |         to_keep.append(dmBookWrapper(book[BOOK]))
582 | 
583 |     for field in FIELDS_TO_UPDATE_INFO:
584 |         data = None
585 | 
586 |         # Get available data
587 |         for book in to_remove:
588 |             book_data = getattr(book, field[0])
589 | 
590 |             if options["debug"]:
591 |                 logfile.write(' rem: ' + book.FileName + ': ' + field[0] + ' = ' + ToString(book_data) + '\n')
592 | 
593 |             if book_data:
594 |                 data = book_data
595 |                 break
596 | 
597 |         if not data:
598 |             continue
599 | 
600 |         try:
601 |             data = field[1](data)
602 |         except:
603 |             if options["verbose"]:
604 |                 logfile.write('updating... Could not convert data to correct type ' + field[0] + ' = ' + ToString(data) + '\n')
605 |             continue
606 | 
607 |         # Set in missing books
608 |         for book in to_keep:
609 |             book_data = getattr(book, field[0])
610 | 
611 |             if options["debug"]:
612 |                 logfile.write(' keep: ' + book.FileName + ': ' + field[0] + ' = ' + ToString(book_data) + '\n')
613 | 
614 |             if not book_data:
615 |                 if not options["updateinfo"]:
616 |                     logfile.write('[simulation] ')
617 |                 logfile.write('updating... ' + book.FileName + ': ' + field[0] + ' = ' + ToString(data) + '\n')
618 | 
619 |                 if options["updateinfo"]:
620 |                     setattr(book.raw, field[0], data)
621 | 
622 | 
623 | # ================ DELETE COMICS FUNCTION ==========================================================
624 | 
625 | def deletecomics(options, cr, deletelist, logfile):
626 |     ''' Moves or deletes the specified comics and removes them from the library '''
627 | 
628 |     ''' Mostly ripped from StonePawn's Library Organizer script '''
629 | 
630 |     if not Directory.Exists(DUPESDIRECTORY):
631 |         try:
632 |             Directory.CreateDirectory(DUPESDIRECTORY)
633 |         except Exception, ex:
634 |             MessageBox.Show('ERROR: ' + str(ex), "ERROR creating dump directory " + DUPESDIRECTORY, MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
635 |             logfile.write('ERROR: ' + str(ex) + '\n')
636 |             return
637 | 
638 |     for comic in deletelist:
639 | 
640 |         if options["movefiles"]:
641 |             fullpath = Path.Combine(DUPESDIRECTORY, comic[FILENAME])
642 | 
643 |             # Check if the file currently exists at all
644 | 
645 |             if comic[FILENAME] != 'Fileless' and File.Exists(comic[FILEPATH]):
646 |                 # If the book is already in the location we don't have to do anything
647 |                 if fullpath == comic[FILEPATH]:
648 | 
649 |                     #print "books path is the same"
650 |                     logfile.write("\n\nSkipped moving book " + comic[FILEPATH] + " because it is already located at the calculated path")
651 |                     dmCleanDirectories(DirectoryInfo(Path.GetDirectoryName(comic[FILEPATH])))   # assumed intent (the original referenced an undefined 'path'): prune the book's folder if it was left empty
652 | 
653 |             if comic[FILENAME] != 'Fileless' and not File.Exists(fullpath):
654 |                 try:
655 |                     File.Move(comic[FILEPATH], fullpath)
656 |                     comic[BOOK].FilePath = fullpath   # update new file path
657 |                     logfile.write('---MOVED... ' + comic[FILENAME] + '\n')   # logged only after a successful move
658 |                 except Exception, ex:
659 |                     MessageBox.Show('ERROR: ' + str(ex) + ' while trying to move ' + comic[FILENAME], 'MOVE ERROR', MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
660 |                     logfile.write('ERROR: ' + str(ex) + '\n')
661 |             else:
662 |                 logfile.write('WARNING: ' + comic[FILENAME] + ' could not be moved\n')
663 | 
664 | 
665 |         if options["removefromlib"]:
666 |             try:
667 |                 cr.App.RemoveBook(comic[BOOK])
668 |                 logfile.write('---REMOVED FROM LIBRARY... ' + comic[FILENAME] + '\n')
669 |             except:
670 |                 logfile.write('---COULD NOT REMOVE FROM LIBRARY... ' + comic[FILENAME] + '\n')
671 | 
672 | 
673 |     return
674 | 
675 | 
676 | def dmCleanDirectories(directory):
677 |     # 'directory' should be a DirectoryInfo object
678 |     if not directory.Exists:
679 |         return
680 |     if len(directory.GetFiles()) == 0 and len(directory.GetDirectories()) == 0:
681 |         parent = directory.Parent
682 |         directory.Delete()
683 |         dmCleanDirectories(parent)   # recurse upwards (the original called an undefined 'CleanDirectories')
684 | 
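# --- Illustrative sketch (annotation, not part of the original file) ---
# dmCleanDirectories prunes a folder and then its parents for as long as they
# are empty; the same idea with the standard os module instead of the .NET
# DirectoryInfo API used above:
def _example_prune_empty_dirs(path):
    import os
    while path and os.path.isdir(path) and not os.listdir(path):
        parent = os.path.dirname(path)
        os.rmdir(path)    # the folder is empty, so this succeeds
        path = parent     # then try the parent, until one is non-empty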
--------------------------------------------------------------------------------
/re.py:
--------------------------------------------------------------------------------
1 | #
2 | # Secret Labs' Regular Expression Engine
3 | #
4 | # re-compatible interface for the sre matching engine
5 | #
6 | # Copyright (c) 1998-2001 by Secret Labs AB.  All rights reserved.
7 | #
8 | # This version of the SRE library can be redistributed under CNRI's
9 | # Python 1.6 license.  For any other use, please contact Secret Labs
10 | # AB (info@pythonware.com).
11 | #
12 | # Portions of this engine have been developed in cooperation with
13 | # CNRI.  Hewlett-Packard provided funding for 1.6 integration and
14 | # other compatibility work.
15 | #
16 | 
17 | r"""Support for regular expressions (RE).
18 | 
19 | This module provides regular expression matching operations similar to
20 | those found in Perl.  It supports both 8-bit and Unicode strings; both
21 | the pattern and the strings being processed can contain null bytes and
22 | characters outside the US ASCII range.
23 | 
24 | Regular expressions can contain both special and ordinary characters.
25 | Most ordinary characters, like "A", "a", or "0", are the simplest
26 | regular expressions; they simply match themselves.  You can
27 | concatenate ordinary characters, so last matches the string 'last'.
28 | 
29 | The special characters are:
30 |     "."      Matches any character except a newline.
31 |     "^"      Matches the start of the string.
32 |     "$"      Matches the end of the string or just before the newline at
33 |              the end of the string.
34 |     "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
35 |              Greedy means that it will match as many repetitions as possible.
36 |     "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
37 |     "?"      Matches 0 or 1 (greedy) of the preceding RE.
38 |     *?,+?,?? Non-greedy versions of the previous three special characters.
39 |     {m,n}    Matches from m to n repetitions of the preceding RE.
40 |     {m,n}?   Non-greedy version of the above.
41 |     "\\"     Either escapes special characters or signals a special sequence.
42 |     []       Indicates a set of characters.
43 |              A "^" as the first character indicates a complementing set.
44 |     "|"      A|B, creates an RE that will match either A or B.
45 |     (...)    Matches the RE inside the parentheses.
46 |              The contents can be retrieved or matched later in the string.
47 |     (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
48 |     (?:...)  Non-grouping version of regular parentheses.
49 |     (?P<name>...) The substring matched by the group is accessible by name.
50 |     (?P=name)     Matches the text matched earlier by the group named name.
51 |     (?#...)  A comment; ignored.
52 |     (?=...)  Matches if ... matches next, but doesn't consume the string.
53 |     (?!...)  Matches if ... doesn't match next.
54 |     (?<=...) Matches if preceded by ... (must be fixed length).
55 |     (?<!...) Matches if not preceded by ... (must be fixed length).
179 | if sys.hexversion >= 0x02020000:
180 |     __all__.append("finditer")
181 |     def finditer(pattern, string, flags=0):
182 |         """Return an iterator over all non-overlapping matches in the
183 |         string.  For each match, the iterator returns a match object.
184 | 
185 |         Empty matches are included in the result."""
186 |         return _compile(pattern, flags).finditer(string)
187 | 
188 | def compile(pattern, flags=0):
189 |     "Compile a regular expression pattern, returning a pattern object."
190 |     return _compile(pattern, flags)
191 | 
192 | def purge():
193 |     "Clear the regular expression cache"
194 |     _cache.clear()
195 |     _cache_repl.clear()
196 | 
197 | def template(pattern, flags=0):
198 |     "Compile a template pattern, returning a pattern object"
199 |     return _compile(pattern, flags|T)
200 | 
201 | _alphanum = {}
202 | for c in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890':
203 |     _alphanum[c] = 1
204 | del c
205 | 
206 | def escape(pattern):
207 |     "Escape all non-alphanumeric characters in pattern."
208 |     s = list(pattern)
209 |     alphanum = _alphanum
210 |     for i in range(len(pattern)):
211 |         c = pattern[i]
212 |         if c not in alphanum:
213 |             if c == "\000":
214 |                 s[i] = "\\000"
215 |             else:
216 |                 s[i] = "\\" + c
217 |     return pattern[:0].join(s)
218 | 
219 | # --------------------------------------------------------------------
220 | # internals
221 | 
222 | _cache = {}
223 | _cache_repl = {}
224 | 
225 | _pattern_type = type(sre_compile.compile("", 0))
226 | 
227 | _MAXCACHE = 100
228 | 
229 | def _compile(*key):
230 |     # internal: compile pattern
231 |     cachekey = (type(key[0]),) + key
232 |     p = _cache.get(cachekey)
233 |     if p is not None:
234 |         return p
235 |     pattern, flags = key
236 |     if isinstance(pattern, _pattern_type):
237 |         if flags:
238 |             raise ValueError('Cannot process flags argument with a compiled pattern')
239 |         return pattern
240 |     if not sre_compile.isstring(pattern):
241 |         raise TypeError, "first argument must be string or compiled pattern"
242 |     try:
243 |         p = sre_compile.compile(pattern, flags)
244 |     except error, v:
245 |         raise error, v # invalid expression
246 |     if len(_cache) >= _MAXCACHE:
247 |         _cache.clear()
248 |     _cache[cachekey] = p
249 |     return p
250 | 
251 | def _compile_repl(*key):
252 |     # internal: compile replacement pattern
253 |     p = _cache_repl.get(key)
254 |     if p is not None:
255 |         return p
256 |     repl, pattern = key
257 |     try:
258 |         p = sre_parse.parse_template(repl, pattern)
259 |     except error, v:
260 |         raise error, v # invalid expression
261 |     if len(_cache_repl) >= _MAXCACHE:
262 |         _cache_repl.clear()
263 |     _cache_repl[key] = p
264 |     return p
265 | 
266 | def _expand(pattern, match, template):
267 |     # internal: match.expand implementation hook
268 |     template = sre_parse.parse_template(template, pattern)
269 |     return sre_parse.expand_template(template, match)
270 | 
271 | def _subx(pattern, template):
272 |     # internal: pattern.sub/subn implementation helper
273 |     template = _compile_repl(template, pattern)
274 |     if not template[0] and len(template[1]) == 1:
275 |         # literal replacement
276 |         return template[1][0]
277 |     def filter(match, template=template):
278 |         return sre_parse.expand_template(template, match)
279 |     return filter
280 | 
281 | # register myself for pickling
282 | 
283 | import copy_reg
284 | 
285 | def _pickle(p):
286 |     return _compile, (p.pattern, p.flags)
287 | 
288 | copy_reg.pickle(_pattern_type, _pickle, _compile)
289 | 
290 | # --------------------------------------------------------------------
291 | # experimental stuff (see python-dev discussions for details)
292 | 
293 | class Scanner:
294 |     def __init__(self, lexicon, flags=0):
295 |         from sre_constants import BRANCH, SUBPATTERN
296 |         self.lexicon = lexicon
297 |         # combine phrases into a compound pattern
298 |         p = []
299 |         s = sre_parse.Pattern()
300 |         s.flags = flags
301 |         for phrase, action in lexicon:
302 |             p.append(sre_parse.SubPattern(s, [
303 |                 (SUBPATTERN, (len(p)+1, sre_parse.parse(phrase, flags))),
304 |                 ]))
305 |         s.groups = len(p)+1
306 |         p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
307 |         self.scanner = sre_compile.compile(p)
308 |     def scan(self, string):
309 |         result = []
310 |         append = result.append
311 |         match = self.scanner.scanner(string).match
312 |         i = 0
313 |         while 1:
314 |             m = match()
315 |             if not m:
316 |                 break
317 |             j = m.end()
318 |             if i == j:
319 |                 break
320 |             action = self.lexicon[m.lastindex-1][1]
321 |             if hasattr(action, '__call__'):
322 |                 self.match = m
323 |                 action = action(self, m.group())
324 |             if action is not None:
325 |                 append(action)
326 |             i = j
327 |         return result, string[i:]
328 | 
--------------------------------------------------------------------------------
/traceback.py:
--------------------------------------------------------------------------------
1 | """Extract, format and print information about Python stack traces."""
2 | # Modified to remove linecache dependency
3 | 
4 | #import linecache
5 | import sys
6 | 
7 | __all__ = ['extract_stack', 'extract_tb', 'format_exception',
8 |            'format_exception_only', 'format_list', 'format_stack',
9 |            'format_tb', 'print_exc', 'format_exc', 'print_exception',
10 |            'print_last', 'print_stack', 'print_tb']
11 | 
12 | def _print(file, str='', terminator='\n'):
13 |     file.write(str+terminator)
14 | 
15 | 
16 | def print_list(extracted_list, file=None):
17 |     """Print the list of tuples as returned by extract_tb() or
18 |     extract_stack() as a formatted stack trace to the given file."""
19 |     if file is None:
20 |         file = sys.stderr
21 |     for filename, lineno, name, line in extracted_list:
22 |         _print(file,
23 |                '  File "%s", line %d, in %s' % (filename,lineno,name))
24 |         if line:
25 |             _print(file, '    %s' % line.strip())
26 | 
27 | def format_list(extracted_list):
28 |     """Format a list of traceback entry tuples for printing.
29 | 
30 |     Given a list of tuples as returned by extract_tb() or
31 |     extract_stack(), return a list of strings ready for printing.
32 |     Each string in the resulting list corresponds to the item with the
33 |     same index in the argument list.  Each string ends in a newline;
34 |     the strings may contain internal newlines as well, for those items
35 |     whose source text line is not None.
36 |     """
37 |     list = []
38 |     for filename, lineno, name, line in extracted_list:
39 |         item = '  File "%s", line %d, in %s\n' % (filename,lineno,name)
40 |         if line:
41 |             item = item + '    %s\n' % line.strip()
42 |         list.append(item)
43 |     return list
44 | 
45 | 
46 | def print_tb(tb, limit=None, file=None):
47 |     """Print up to 'limit' stack trace entries from the traceback 'tb'.
48 | 
49 |     If 'limit' is omitted or None, all entries are printed.  If 'file'
50 |     is omitted or None, the output goes to sys.stderr; otherwise
51 |     'file' should be an open file or file-like object with a write()
52 |     method.
53 |     """
54 |     if file is None:
55 |         file = sys.stderr
56 |     if limit is None:
57 |         if hasattr(sys, 'tracebacklimit'):
58 |             limit = sys.tracebacklimit
59 |     n = 0
60 |     while tb is not None and (limit is None or n < limit):
61 |         f = tb.tb_frame
62 |         lineno = tb.tb_lineno
63 |         co = f.f_code
64 |         filename = co.co_filename
65 |         name = co.co_name
66 |         _print(file,
67 |                '  File "%s", line %d, in %s' % (filename, lineno, name))
68 |         #linecache.checkcache(filename)
69 |         #line = linecache.getline(filename, lineno, f.f_globals)
70 |         #if line: _print(file, '    ' + line.strip())
71 |         tb = tb.tb_next
72 |         n = n+1
73 | 
74 | def format_tb(tb, limit = None):
75 |     """A shorthand for 'format_list(extract_tb(tb, limit))'."""
76 |     return format_list(extract_tb(tb, limit))
77 | 
78 | def extract_tb(tb, limit = None):
79 |     """Return list of up to limit pre-processed entries from traceback.
80 | 
81 |     This is useful for alternate formatting of stack traces.  If
82 |     'limit' is omitted or None, all entries are extracted.  A
83 |     pre-processed stack trace entry is a quadruple (filename, line
84 |     number, function name, text) representing the information that is
85 |     usually printed for a stack trace.  The text is a string with
86 |     leading and trailing whitespace stripped; if the source is not
87 |     available it is None.
88 |     """
89 |     if limit is None:
90 |         if hasattr(sys, 'tracebacklimit'):
91 |             limit = sys.tracebacklimit
92 |     list = []
93 |     n = 0
94 |     while tb is not None and (limit is None or n < limit):
95 |         f = tb.tb_frame
96 |         lineno = tb.tb_lineno
97 |         co = f.f_code
98 |         filename = co.co_filename
99 |         name = co.co_name
100 |         #linecache.checkcache(filename)
101 |         #line = linecache.getline(filename, lineno, f.f_globals)
102 |         #if line: line = line.strip()
103 |         #else:
104 |         line = None
105 |         list.append((filename, lineno, name, line))
106 |         tb = tb.tb_next
107 |         n = n+1
108 |     return list
109 | 
110 | 
111 | _cause_message = (
112 |     "\nThe above exception was the direct cause "
113 |     "of the following exception:\n")
114 | 
115 | _context_message = (
116 |     "\nDuring handling of the above exception, "
117 |     "another exception occurred:\n")
118 | 
119 | def _iter_chain(exc, custom_tb=None, seen=None):
120 |     if seen is None:
121 |         seen = set()
122 |     seen.add(exc)
123 |     its = []
124 |     cause = exc.__cause__
125 |     context = exc.__context__
126 |     if cause is not None and cause not in seen:
127 |         its.append(_iter_chain(cause, None, seen))
128 |         its.append([(_cause_message, None)])
129 |     if context is not None and context is not cause and context not in seen:
130 |         its.append(_iter_chain(context, None, seen))
131 |         its.append([(_context_message, None)])
132 |     its.append([(exc, custom_tb or exc.__traceback__)])
133 |     # itertools.chain is in an extension module and may be unavailable
134 |     for it in its:
135 |         for x in it:
136 |             yield x
137 | 
138 | 
139 | def print_exception(etype, value, tb, limit=None, file=None, chain=True):
140 |     """Print exception up to 'limit' stack trace entries from 'tb' to 'file'.
141 | 
142 |     This differs from print_tb() in the following ways: (1) if
143 |     traceback is not None, it prints a header "Traceback (most recent
144 |     call last):"; (2) it prints the exception type and value after the
145 |     stack trace; (3) if type is SyntaxError and value has the
146 |     appropriate format, it prints the line where the syntax error
147 |     occurred with a caret on the next line indicating the approximate
148 |     position of the error.
149 |     """
150 |     if file is None:
151 |         file = sys.stderr
152 |     if chain:
153 |         values = _iter_chain(value, tb)
154 |     else:
155 |         values = [(value, tb)]
156 |     for value, tb in values:
157 |         if isinstance(value, str):
158 |             _print(file, value)
159 |             continue
160 |         if tb:
161 |             _print(file, 'Traceback (most recent call last):')
162 |             print_tb(tb, limit, file)
163 |         lines = format_exception_only(type(value), value)
164 |         for line in lines:
165 |             _print(file, line, '')
166 | 
167 | def format_exception(etype, value, tb, limit=None, chain=True):
168 |     """Format a stack trace and the exception information.
169 | 
170 |     The arguments have the same meaning as the corresponding arguments
171 |     to print_exception().  The return value is a list of strings, each
172 |     ending in a newline and some containing internal newlines.  When
173 |     these lines are concatenated and printed, exactly the same text is
174 |     printed as does print_exception().
175 |     """
176 |     list = []
177 |     if chain:
178 |         values = _iter_chain(value, tb)
179 |     else:
180 |         values = [(value, tb)]
181 |     for value, tb in values:
182 |         if isinstance(value, str):
183 |             list.append(value + '\n')
184 |             continue
185 |         if tb:
186 |             list.append('Traceback (most recent call last):\n')
187 |         list.extend(format_tb(tb, limit))
188 |         list.extend(format_exception_only(type(value), value))
189 |     return list
190 | 
191 | def format_exception_only(etype, value):
192 |     """Format the exception part of a traceback.
193 | 
194 |     The arguments are the exception type and value such as given by
195 |     sys.last_type and sys.last_value. The return value is a list of
196 |     strings, each ending in a newline.
197 | 
198 |     Normally, the list contains a single string; however, for
199 |     SyntaxError exceptions, it contains several lines that (when
200 |     printed) display detailed information about where the syntax
201 |     error occurred.
202 | 
203 |     The message indicating which exception occurred is always the last
204 |     string in the list.
205 | 
206 |     """
207 |     # Gracefully handle (the way Python 2.4 and earlier did) the case of
208 |     # being called with (None, None).
209 |     if etype is None:
210 |         return [_format_final_exc_line(etype, value)]
211 | 
212 |     stype = etype.__name__
213 |     smod = etype.__module__
214 |     if smod not in ("__main__", "builtins"):
215 |         stype = smod + '.' + stype
216 | 
217 |     if not issubclass(etype, SyntaxError):
218 |         return [_format_final_exc_line(stype, value)]
219 | 
220 |     # It was a syntax error; show exactly where the problem was found.
221 |     lines = []
222 |     filename = value.filename or "<string>"
223 |     lineno = str(value.lineno) or '?'
224 |     lines.append('  File "%s", line %s\n' % (filename, lineno))
225 |     badline = value.text
226 |     offset = value.offset
227 |     if badline is not None:
228 |         lines.append('    %s\n' % badline.strip())
229 |         if offset is not None:
230 |             caretspace = badline.rstrip('\n')[:offset].lstrip()
231 |             # non-space whitespace (likes tabs) must be kept for alignment
232 |             caretspace = ((c.isspace() and c or ' ') for c in caretspace)
233 |             # only three spaces to account for offset1 == pos 0
234 |             lines.append('   %s^\n' % ''.join(caretspace))
235 |     msg = value.msg or ""
236 |     lines.append("%s: %s\n" % (stype, msg))
237 |     return lines
238 | 
239 | def _format_final_exc_line(etype, value):
240 |     valuestr = _some_str(value)
241 |     if value is None or not valuestr:
242 |         line = "%s\n" % etype
243 |     else:
244 |         line = "%s: %s\n" % (etype, valuestr)
245 |     return line
246 | 
247 | def _some_str(value):
248 |     try:
249 |         return str(value)
250 |     except:
251 |         return '<unprintable %s object>' % type(value).__name__
252 | 
253 | 
254 | def print_exc(limit=None, file=None, chain=True):
255 |     """Shorthand for 'print_exception(*sys.exc_info(), limit, file)'."""
256 |     if file is None:
257 |         file = sys.stderr
258 |     try:
259 |         etype, value, tb = sys.exc_info()
260 |         print_exception(etype, value, tb, limit, file, chain)
261 |     finally:
262 |         etype = value = tb = None
263 | 
264 | 
265 | def format_exc(limit=None, chain=True):
266 |     """Like print_exc() but return a string."""
267 |     try:
268 |         etype, value, tb = sys.exc_info()
269 |         return ''.join(
270 |             format_exception(etype, value, tb, limit, chain))
271 |     finally:
272 |         etype = value = tb = None
273 | 
274 | 
275 | def print_last(limit=None, file=None, chain=True):
276 |     """This is a shorthand for 'print_exception(sys.last_type,
277 |     sys.last_value, sys.last_traceback, limit, file)'."""
278 |     if not hasattr(sys, "last_type"):
279 |         raise ValueError("no last exception")
280 |     if file is None:
281 |         file = sys.stderr
282 |     print_exception(sys.last_type, sys.last_value, sys.last_traceback,
283 |                     limit, file, chain)
284 | 
285 | 
286 | def print_stack(f=None, limit=None, file=None):
287 |     """Print a stack trace from its invocation point.
288 | 
289 |     The optional 'f' argument can be used to specify an alternate
290 |     stack frame at which to start. The optional 'limit' and 'file'
291 |     arguments have the same meaning as for print_exception().
292 |     """
293 |     if f is None:
294 |         try:
295 |             raise ZeroDivisionError
296 |         except ZeroDivisionError:
297 |             f = sys.exc_info()[2].tb_frame.f_back
298 |     print_list(extract_stack(f, limit), file)
299 | 
300 | def format_stack(f=None, limit=None):
301 |     """Shorthand for 'format_list(extract_stack(f, limit))'."""
302 |     if f is None:
303 |         try:
304 |             raise ZeroDivisionError
305 |         except ZeroDivisionError:
306 |             f = sys.exc_info()[2].tb_frame.f_back
307 |     return format_list(extract_stack(f, limit))
308 | 
309 | def extract_stack(f=None, limit = None):
310 |     """Extract the raw traceback from the current stack frame.
311 | 
312 |     The return value has the same format as for extract_tb().  The
313 |     optional 'f' and 'limit' arguments have the same meaning as for
314 |     print_stack().  Each item in the list is a quadruple (filename,
315 |     line number, function name, text), and the entries are in order
316 |     from oldest to newest stack frame.
317 |     """
318 |     if f is None:
319 |         try:
320 |             raise ZeroDivisionError
321 |         except ZeroDivisionError:
322 |             f = sys.exc_info()[2].tb_frame.f_back
323 |     if limit is None:
324 |         if hasattr(sys, 'tracebacklimit'):
325 |             limit = sys.tracebacklimit
326 |     list = []
327 |     n = 0
328 |     while f is not None and (limit is None or n < limit):
329 |         lineno = f.f_lineno
330 |         co = f.f_code
331 |         filename = co.co_filename
332 |         name = co.co_name
333 |         #linecache.checkcache(filename)
334 |         #line = linecache.getline(filename, lineno, f.f_globals)
335 |         #if line: line = line.strip()
336 |         #else:
337 |         line = None
338 |         list.append((filename, lineno, name, line))
339 |         f = f.f_back
340 |         n = n+1
341 |     list.reverse()
342 |     return list
343 | 
--------------------------------------------------------------------------------
/utilsbycory.py:
--------------------------------------------------------------------------------
1 | #####################################################################################################
2 | ##
3 | ##   utilsbycory.py  -  part of duplicatemanager, a script for comicrack
4 | ##
5 | ##   Author:  perezmu after cbanack
6 | ##
7 | ##   Copyleft perezmu 2011.
8 | ##
9 | ######################################################################################################
10 | 
11 | #### Original declarations
12 | '''
13 | This module contains utility methods for working with ComicRack
14 | ComicBook objects (i.e. 'book' objects).
15 | 
16 | @author: Cory Banack
17 | '''
18 | ####
19 | 
20 | 
21 | import re
22 | 
23 | def cleanupseries(series_name):
24 | 
25 |     # All of the symbols below cause inconsistency in title searches
26 |     series_name = series_name.lower()
27 |     series_name = series_name.replace('.', '')
28 |     series_name = series_name.replace('_', ' ')
29 |     series_name = series_name.replace('-', ' ')
30 |     series_name = series_name.replace("'", ' ')
31 |     series_name = series_name.replace(":", ' ')
32 |     series_name = re.sub(r'\b(vs\.?|versus|and|or|the|an|of|a|is)\b', '', series_name)
33 |     series_name = re.sub(r'giantsize', r'giant size', series_name)
34 |     series_name = re.sub(r'giant[- ]*sized', r'giant size', series_name)
35 |     series_name = re.sub(r'kingsize', r'king size', series_name)
36 |     series_name = re.sub(r'king[- ]*sized', r'king size', series_name)
37 |     series_name = re.sub(r"directors", r"director's", series_name)   # note: apostrophes were already replaced by spaces above, so "director's" itself never reaches this pattern
38 |     series_name = re.sub(r"\bvolume\b", "vol", series_name)   # replacement must be plain 'vol': '\b' inside a replacement string is a backspace character, not a word boundary
39 |     series_name = re.sub(r"\bvol\.\b", "vol", series_name)
40 | 
41 |     series_name = re.sub(r'[ ]*', r'', series_name)   # finally, strip all spaces
42 | 
43 |     return series_name
44 | 
45 | 
46 | def convertnumberwords(phrase_s, expand_b):
47 |     """
48 |     Converts all of the number words (as defined by regular expression 'words')
49 |     in the given phrase, either expanding or contracting them as specified.
50 |     When expanding, words like '1' and '2nd' will be transformed into 'one'
51 |     and 'second' in the returned string.  When contracting, the transformation
52 |     goes in reverse.
53 | 
54 |     This method only works for numbers up to 20, and it only works properly
55 |     on lower case strings.
56 |     """
57 |     number_map = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three',\
58 |        '4': 'four', '5': 'five', '6': 'six', '7': 'seven', '8': 'eight',\
59 |        '9': 'nine', '10': 'ten', '11': 'eleven', '12': 'twelve',\
60 |        '13': 'thirteen', '14': 'fourteen', '15': 'fifteen',\
61 |        '16': 'sixteen', '17': 'seventeen', '18': 'eighteen', '19': 'nineteen',\
62 |        '20': 'twenty', '0th': 'zeroth', '1rst': 'first', '2nd': 'second',\
63 |        '3rd': 'third', '4th': 'fourth', '5th': 'fifth', '6th': 'sixth',\
64 |        '7th': 'seventh', '8th': 'eighth', '9th': 'ninth', '10th': 'tenth',\
65 |        '11th': 'eleventh', '12th': 'twelveth', '13th': 'thirteenth',\
66 |        '14th': 'fourteenth', '15th': 'fifteenth', '16th': 'sixteenth',\
67 |        '17th': 'seventeenth', '18th': 'eighteenth', '19th': 'nineteenth',\
68 |        '20th': 'twentieth'}
69 | 
70 |     b = r'\b'
71 |     if expand_b:
72 |         for (x, y) in number_map.iteritems():
73 |             phrase_s = re.sub(b+x+b, y, phrase_s)
74 |         phrase_s = re.sub(r'\b1st\b', 'first', phrase_s)
75 |     else:
76 |         for (x, y) in number_map.iteritems():
77 |             phrase_s = re.sub(b+y+b, x, phrase_s)
78 |         phrase_s = re.sub(r'\btwelfth\b', '12th', phrase_s)   # catch the spellings the map does not cover
79 |         phrase_s = re.sub(r'\beightteenth\b', '18th', phrase_s)
80 |     return phrase_s
81 | 
82 | 
83 | 
84 | ### Other util methods
85 | 
86 | import System
87 | 
88 | 
89 | def ToString(v):
90 |     if v is None:
91 |         return ''
92 |     return unicode(v).encode(System.Text.Encoding.Default.BodyName, 'replace')
93 | 
94 | 
--------------------------------------------------------------------------------
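# --- Illustrative sketch (annotation, not part of the repository files) ---
# How the utilsbycory.py helpers normalize names before comparison; the
# expected values assume the corrected cleanupseries() above.
def _example_normalization():
    assert convertnumberwords('two covers', False) == '2 covers'
    assert cleanupseries('Giant-Sized X-Men Volume 2') == cleanupseries('giantsize xmen vol 2')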