├── LICENSE ├── README.md └── mb2md.pl /LICENSE: -------------------------------------------------------------------------------- 1 | Public Domain 2 | This work (mb2md.pl, by Robin Whittle, Juri Haberland, and Richard Bullington-McGuire), identified by The Obscure Organization, is free of known copyright restrictions. 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | mb2md.pl 2 | ======== 3 | A conversion script that converts mbox format files to Maildir folders, modified by The Obscure Organization to fit its purposes. 4 | 5 | ## Directions 6 | See the comments in [mb2md.pl](mb2md.pl) for detailed instructions. 7 | 8 | ## Acknowledgements 9 | Thanks go to Scott Hanselman for the [suggestion and instructions on switching the git default branch from master to main](https://www.hanselman.com/blog/EasilyRenameYourGitDefaultBranchFromMasterToMain.aspx). This repository transitioned to using `main` as its default branch on 2020-06-14. 10 | 11 | ## License 12 | This software is in the public domain, see [`LICENSE`](LICENSE) for details. 13 | -------------------------------------------------------------------------------- /mb2md.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl -w 2 | # 3 | # $Id: mb2md.pl,v 1.26 2004/03/28 00:09:46 juri Exp $ 4 | # 5 | # mb2md-3.20.pl Converts Mbox mailboxes to Maildir format. 6 | # 7 | 8 | # !!!! This version of mb2md.pl has been modified by The Obscure Organization 9 | # !!!! to aid in its mail conversion from mbox to maildir. 10 | # !!!! Send questions about the '-t' switch to: 11 | # !!!! Richard Bullington-McGuire 12 | # !!!! @obscurerichard on twitter and github 13 | # 14 | # !! This is a version modified for Dovecot. Use Dovecot mailing list 15 | # !! for questions, patches, etc. You don't have to be 16 | # !! subscribed to send mail there. Do not send mail directly to people 17 | # !! listed below. 18 | 19 | # Public domain. 20 | # 21 | # currently maintained by: 22 | # Juri Haberland 23 | # initially wrote by: 24 | # Robin Whittle 25 | # 26 | # This script's web abode is http://batleth.sapienti-sat.org/projects/mb2md/ . 27 | # For a changelog see http://batleth.sapienti-sat.org/projects/mb2md/changelog.txt 28 | # 29 | # The Mbox -> Maildir inner loop is based on qmail's script mbox2maildir, which 30 | # was kludged by Ivan Kohler in 1997 from convertandcreate (public domain) 31 | # by Russel Nelson. Both these convert a single mailspool file. 32 | # 33 | # The qmail distribution has a maildir2mbox.c program. 34 | # 35 | # What is does: 36 | # ============= 37 | # 38 | # Reads a directory full of Mbox format mailboxes and creates a set of 39 | # Maildir format mailboxes. Some details of this are to suit Courier 40 | # IMAP's naming conventions for Maildir mailboxes. 41 | # 42 | # http://www.inter7.com/courierimap/ 43 | # 44 | # This is intended to automate the conversion of the old 45 | # /var/spool/mail/blah file - with one call of this script - and to 46 | # convert one or more mailboxes in a specifed directory with separate 47 | # calls with other command line arguments. 48 | # 49 | # Run this as the user - in these examples "blah". 50 | 51 | # This version supports conversion of: 52 | # 53 | # Date The date-time in the "From " line of the message in the 54 | # Mbox format is the date when the message was *received*. 55 | # This is transformed into the date-time of the file which 56 | # contains the message in the Maildir mailbox. 57 | # 58 | # This relies on the Date::Parse perl module and the utime 59 | # perl function. 60 | # 61 | # The script tries to cope with errant forms of the 62 | # Mbox "From " line which it may encounter, but if 63 | # there is something really screwy in a From line, 64 | # then perhaps the script will fail when "touch" 65 | # is given an invalid date. Please report the 66 | # exact nature of any such "From " line! 67 | # 68 | # 69 | # Flagged 70 | # Replied 71 | # Read = Seen 72 | # Tagged for Deletion 73 | # 74 | # In the Mbox message, flags for these are found in the 75 | # "Status: N" or "X-Status: N" headers, where "N" is 0 76 | # or more of the following characters in the left column. 77 | # 78 | # They are converted to characters in the right column, 79 | # which become the last characters of the file name, 80 | # following the ":2," which indicates IMAP message status. 81 | # 82 | # 83 | # F -> F Flagged 84 | # A -> R Replied 85 | # R -> S Read = Seen 86 | # D -> T Tagged for Deletion (Trash) 87 | # 88 | # This is based on the work of Philip Mak who wrote a 89 | # completely separate Mbox -> Maildir converter called 90 | # perfect_maildir and posted it to the Mutt-users mailing 91 | # list on 25 December 2001: 92 | # 93 | # http://www.mail-archive.com/mutt-users@mutt.org/msg21872.html 94 | # 95 | # Michael Best originally integrated those changes into mb2md. 96 | # 97 | # UIDs (Dovecot and Courier) 98 | # Using the -U or -u options will cause this program to maintain 99 | # UIDVALIDITY and UIDLAST for folders and UIDs for individual 100 | # messages. The X-IMAP:, X-IMAPbase:, and X-UID: headers are 101 | # examined and appropriate files generated for Dovecot or Courier 102 | # in the destination Maildir to ensure these values are all kept. 103 | # 104 | # UID support added by Julian Fitzell June, 2008 105 | # 106 | # Message Keywords (Dovecot only) 107 | # Using the -K option will cause this program to maintain message 108 | # keywords (also known by other names such as tags). This is 109 | # currently only supported for Dovecot and involves looking at 110 | # the X-IMAP:, X-IMAPbase:, and X-Keywords: headers. The keywords 111 | # are written to a file in the Maildir which maps them to flags. 112 | # The flags are then appended the message filenames. 113 | # 114 | # Keyword support added by Julian Fitzell June, 2008 115 | # 116 | # In addition, the names of the message files in the Maildir are of a 117 | # regular length and are of the form: 118 | # 119 | # 7654321.000123.mbox:2,xxx 120 | # 121 | # Where "7654321" is the Unix time in seconds when the script was 122 | # run and "000123" is the six zeroes padded message number as 123 | # messages are converted from the Mbox file. "xxx" represents zero or 124 | # more of the above flags F, R, S or T. 125 | # 126 | # Message Size Tags 127 | # 128 | # Additionally, there is optional support for including ,S= and ,W= tags 129 | # before the colon. These message names are still valid Maildir filenames 130 | # and the tags are used by mail programs to speed up calculation of quotas 131 | # and the return of message sizes to IMAP clients. ,S= is part of the 132 | # Maildir++ standard. 133 | # (See: http://www.inter7.com/courierimap/README.maildirquota.html ) 134 | # As far as I can tell, ,W= is probably only used by Dovecot. 135 | # (See: http://wiki.dovecot.org/MailboxFormat/Maildir ) 136 | # 137 | # Size Tags added by Julian Fitzell June, 2008 138 | # 139 | # --------------------------------------------------------------------- 140 | # 141 | # 142 | # USAGE 143 | # ===== 144 | # 145 | # Run this as the user of the mailboxes, not as root. 146 | # 147 | # 148 | # mb2md -h 149 | # mb2md [-c] [-K] [-U|-u] [-S] [-t] [-W] -m [-d destdir] 150 | # mb2md [-c] [-K] [-U|-u] [-S] [-t] [-W] -s sourcefile [-d destdir] 151 | # mb2md [-c] [-K] [-U|-u] [-S] [-t] [-W] -s sourcedir [-l wu-mailboxlist] [-R|-f somefolder] [-d destdir] [-r strip_extension] 152 | # 153 | # -c use the Content-Length: headers (if present) to find the 154 | # beginning of the next message 155 | # Use with caution! Results may be unreliable. I recommend to do 156 | # a run without "-c" first and only use it if you are certain, 157 | # that the mbox in question really needs the "-c" option 158 | # 159 | # -K Preserve message keywords in a Dovecot-compatible way. This 160 | # looks for X-Keywords: tags and X-IMAP: and X-IMAPbase: tags 161 | # to determine keywords for messages and creates a Dovecot- 162 | # compatible "dovecot-keywords" file in "destdir" 163 | # NOTE: NO LOCKING IS DONE AND THE FILE MUST NOT ALREADY EXIST. 164 | # IF YOU USE THIS OPTION ON A MAILDIR THAT MAY BE ACCESSED BY 165 | # ANOTHER PROGRAM AT THE SAME TIME, STRANGE THINGS MAY HAPPEN. 166 | # 167 | # -U Preserve message UIDs in a Dovecot-compatible way 168 | # Looks for X-UID:, X-IMAP:, and X-IMAPbase: headers and 169 | # creates a Dovecot-compatible dovecot-uidlist file in 170 | # "destdir" 171 | # NOTE: NO LOCKING IS DONE AND THE FILE MUST NOT ALREADY EXIST. 172 | # IF YOU USE THIS OPTION ON A MAILDIR THAT MAY BE ACCESSED BY 173 | # ANOTHER PROGRAM AT THE SAME TIME, STRANGE THINGS MAY HAPPEN. 174 | # 175 | # -u Same as -U above, except creates a Courier IMAP-compatible 176 | # courierimapuiddb file instead. The only difference according 177 | # to http://wiki.dovecot.org/MailboxFormat/Maildir is that 178 | # Courier IMAP only stores the maildir file's basename 179 | # (everything before the colon) 180 | # NOTE: NO LOCKING IS DONE AND THE FILE MUST NOT ALREADY EXIST. 181 | # IF YOU USE THIS OPTION ON A MAILDIR THAT MAY BE ACCESSED BY 182 | # ANOTHER PROGRAM AT THE SAME TIME, STRANGE THINGS MAY HAPPEN. 183 | # 184 | # -S Add Maildir++ standard ,S= tag to the message filenames 185 | # indicating the size of the message on disk. This can be used 186 | # by Courier and Dovecot in calculating quotas. 187 | # I think Dovecot always uses this but not sure about Courier. 188 | # For Exim, see the quota_size_regex and maildir_tag config 189 | # statements. 190 | # 191 | # -t Name the converted mail files using a timestamp derived 192 | # from the envelope dates. Using this will preserve the 193 | # arrival order of the messages with some mail clients, 194 | # including alpine and the iOS 4.x Mail application. 195 | # 196 | # -W Add ,W= tag to the message filename indicating the RFC822.SIZE 197 | # of the message. This is the size of the message when actually 198 | # sent to an IMAP client with LF characters converted to CRLF 199 | # pairs as per the spec. Dovecot uses this to speed up returning 200 | # these sizes. Not sure if any other applications use it. 201 | # 202 | # -m If this is used then the source will 203 | # be the single mailbox at /var/spool/mail/blah for 204 | # user blah and the destination mailbox will be the 205 | # "destdir" mailbox itself. 206 | # 207 | # 208 | # -s source Directory or file relative to the user's home directory, 209 | # which is where the the "somefolders" directories are located. 210 | # Or if starting with a "/" it is taken as a 211 | # absolute path, e.g. /mnt/oldmail/user 212 | # 213 | # or 214 | # 215 | # A single mbox file which will be converted to 216 | # the destdir. 217 | # 218 | # -R If defined, do not skip directories found in a mailbox 219 | # directory, but runs recursively into each of them, 220 | # creating all wanted folders in Maildir. 221 | # Incompatible with '-f' 222 | # 223 | # -f somefolder Directories, relative to "sourcedir" where the Mbox files 224 | # are. All mailboxes in the "sourcedir" 225 | # directory will be converted and placed in the 226 | # "destdir" directory. (Typically the Inbox directory 227 | # which in this instance is also functioning as a 228 | # folder for other mailboxes.) 229 | # 230 | # The "somefolder" directory 231 | # name will be encoded into the new mailboxes' names. 232 | # See the examples below. 233 | # 234 | # This does not save an UW IMAP dummy message file 235 | # at the start of the Mbox file. Small changes 236 | # in the code could adapt it for looking for 237 | # other distinctive patterns of dummy messages too. 238 | # 239 | # Don't let the source directory you give as "somefolders" 240 | # contain any "."s in its name, unless you want to 241 | # create subfolders from the IMAP user's point of 242 | # view. See the example below. 243 | # 244 | # Incompatible with '-f' 245 | # 246 | # 247 | # -d destdir Directory where the Maildir format directories will be created. 248 | # If not given, then the destination will be ~/Maildir . 249 | # Typically, this is what the IMAP server sees as the 250 | # Inbox and the folder for all user mailboxes. 251 | # If this begins with a '/' the path is considered to be 252 | # absolute, otherwise it is relative to the users 253 | # home directory. 254 | # 255 | # -r strip_ext If defined this extension will be stripped from 256 | # the original mailbox file name before creating 257 | # the corresponding maildir. The extension must be 258 | # given without the leading dot ("."). See the example below. 259 | # 260 | # -l WU-file File containing the list of subscribed folders. If 261 | # migrating from WU-IMAP the list of subscribed folders will 262 | # be found in the file called .mailboxlist in the users 263 | # home directory. This will convert all subscribed folders 264 | # for a single user: 265 | # /bin/mb2md -s mail -l .mailboxlist -R -d Maildir 266 | # and for all users in a directory as root you can do the 267 | # following: 268 | # for i in *; do echo $i;su - $i -c "/bin/mb2md -s mail -l .mailboxlist -R -d Maildir";done 269 | # 270 | # 271 | # Example 272 | # ======= 273 | # 274 | # We have a bunch of directories of Mbox mailboxes located at 275 | # /home/blah/oldmail/ 276 | # 277 | # /home/blah/oldmail/fffff 278 | # /home/blah/oldmail/ggggg 279 | # /home/blah/oldmail/xxx/aaaa 280 | # /home/blah/oldmail/xxx/bbbb 281 | # /home/blah/oldmail/xxx/cccc 282 | # /home/blah/oldmail/xxx/dddd 283 | # /home/blah/oldmail/yyyy/huey 284 | # /home/blah/oldmail/yyyy/duey 285 | # /home/blah/oldmail/yyyy/louie 286 | # 287 | # With the UW IMAP server, fffff and ggggg would have appeared in the root 288 | # of this mail server, along with the Inbox. aaaa, bbbb etc, would have 289 | # appeared in a folder called xxx from that root, and xxx was just a folder 290 | # not a mailbox for storing messages. 291 | # 292 | # We also have the mailspool Inbox at: 293 | # 294 | # /var/spool/mail/blah 295 | # 296 | # 297 | # To convert these, as user blah, we give the first command: 298 | # 299 | # mb2md -m 300 | # 301 | # The main Maildir directory will be created if it does not exist. 302 | # (This is true of any argument options, not just "-m".) 303 | # 304 | # /home/blah/Maildir/ 305 | # 306 | # It has the following subdirectories: 307 | # 308 | # /home/blah/Maildir/tmp/ 309 | # /home/blah/Maildir/new/ 310 | # /home/blah/Maildir/cur/ 311 | # 312 | # Then /var/spool/blah file is read, split into individual files and 313 | # written into /home/blah/Maildir/cur/ . 314 | # 315 | # Now we give the second command: 316 | # 317 | # mb2md -s oldmail -R 318 | # 319 | # This reads recursively all Mbox mailboxes and creates: 320 | # 321 | # /home/blah/Maildir/.fffff/ 322 | # /home/blah/Maildir/.ggggg/ 323 | # /home/blah/Maildir/.xxx/ 324 | # /home/blah/Maildir/.xxx.aaaa/ 325 | # /home/blah/Maildir/.xxx.bbbb/ 326 | # /home/blah/Maildir/.xxx.cccc/ 327 | # /home/blah/Maildir/.xxx.aaaa/ 328 | # /home/blah/Maildir/.yyyy/ 329 | # /home/blah/Maildir/.yyyy.huey/ 330 | # /home/blah/Maildir/.yyyy.duey/ 331 | # /home/blah/Maildir/.yyyy.louie/ 332 | # 333 | # The result, from the IMAP client's point of view is: 334 | # 335 | # Inbox ----------------- 336 | # | 337 | # | fffff ----------- 338 | # | ggggg ----------- 339 | # | 340 | # - xxx ------------- 341 | # | | aaaa -------- 342 | # | | bbbb -------- 343 | # | | cccc -------- 344 | # | | dddd -------- 345 | # | 346 | # - yyyy ------------ 347 | # | huey ------- 348 | # | duey ------- 349 | # | louie ------ 350 | # 351 | # Note that although ~/Maildir/.xxx/ and ~/Maildir/.yyyy may appear 352 | # as folders to the IMAP client the above commands to not generate 353 | # any Maildir folders of these names. These are simply elements 354 | # of the names of other Maildir directories. (if you used '-R', they 355 | # whill be able to act as normal folders, containing messages AND folders) 356 | # 357 | # With a separate run of this script, using just the "-s" option 358 | # without "-f" nor "-R", it would be possible to create mailboxes which 359 | # appear at the same location as far as the IMAP client is 360 | # concerned. By having Mbox mailboxes in some directory: 361 | # ~/oldmail/nnn/ of the form: 362 | # 363 | # /home/blah/oldmail/nn/xxxx 364 | # /home/blah/oldmail/nn/yyyyy 365 | # 366 | # then the command: 367 | # 368 | # mb2md -s oldmail/nn 369 | # 370 | # will create two new Maildirs: 371 | # 372 | # /home/blah/Maildir/.xxx/ 373 | # /home/blah/Maildir/.yyyy/ 374 | # 375 | # Then what used to be the xxx and yyyy folders now function as 376 | # mailboxes too. Netscape 4.77 needed to be put to sleep and given ECT 377 | # to recognise this - deleting the contents of (Win2k example): 378 | # 379 | # C:\Program Files\Netscape\Users\uu\ImapMail\aaa.bbb.ccc\ 380 | # 381 | # where "uu" is the user and "aaa.bbb.ccc" is the IMAP server 382 | # 383 | # I often find that deleting all this directory's contents, except 384 | # "rules.dat", forces Netscape back to reality after its IMAP innards 385 | # have become twisted. Then maybe use File > Subscribe - but this 386 | # seems incapable of subscribing to folders. 387 | # 388 | # For Outlook Express, select the mail server, then click the 389 | # "IMAP Folders" button and use "Reset list". In the "All" 390 | # window, select the mailboxes you want to see in normal 391 | # usage. 392 | # 393 | # 394 | # This script did not recurse subdirectories or delete old mailboxes, before addition of the '-R' parameter :) 395 | # 396 | # Be sure not to be accessing the Mbox mailboxes while running this 397 | # script. It does not attempt to lock them. Likewise, don't run two 398 | # copies of this script either. 399 | # 400 | # 401 | # Trickier usage . . . 402 | # ==================== 403 | # 404 | # If you have a bunch of mailboxes in a directory ~/oldmail/doors/ 405 | # and you want them to appear in folders such as: 406 | # 407 | # ~/Maildir/.music.bands.doors.Jim 408 | # ~/Maildir/.music.bands.doors.John 409 | # 410 | # etc. so they appear in an IMAP folder: 411 | # 412 | # Inbox ----------------- 413 | # | music 414 | # | bands 415 | # | doors 416 | # | Jim 417 | # | John 418 | # | Robbie 419 | # | Ray 420 | # 421 | # Then you could rename the source directory to: 422 | # 423 | # ~/oldmail/music.bands.doors/ 424 | # 425 | # then use: 426 | # 427 | # mb2md -s oldmail -f music.bands.doors 428 | # 429 | # 430 | # Or simply use '-R' switch with: 431 | # mb2md -s oldmail -R 432 | # 433 | # 434 | # Stripping mailbox extensions: 435 | # ============================= 436 | # 437 | # If you want to convert mailboxes that came for example from 438 | # a Windows box than you might want to strip the extension of 439 | # the mailbox name so that it won't create a subfolder in your 440 | # mail clients view. 441 | # 442 | # Example: 443 | # You have several mailboxes named Trash.mbx, Sent.mbx, Drafts.mbx 444 | # If you don't strip the extension "mbx" you will get the following 445 | # hierarchy: 446 | # 447 | # Inbox 448 | # | 449 | # - Trash 450 | # | | mbx 451 | # | 452 | # - Sent 453 | # | | mbx 454 | # | 455 | # - Drafts 456 | # | mbx 457 | # 458 | # This is more than ugly! 459 | # Just use: 460 | # mb2md -s oldmail -r mbx 461 | # 462 | # Note: don't specify the dot! It will be stripped off 463 | # automagically ;) 464 | # 465 | #------------------------------------------------------------------------------ 466 | 467 | 468 | use strict; 469 | use Getopt::Std; 470 | use Date::Parse; 471 | use IO::Handle; 472 | use Fcntl; 473 | 474 | # print the usage message 475 | sub usage() { 476 | print "Usage:\n"; 477 | print " mb2md -h\n"; 478 | print " mb2md [-c] [-K] [-U|-u] [-S] [-t] [-W] -m [-d destdir]\n"; 479 | print " mb2md [-c] [-K] [-U|-u] [-S] [-t] [-W] -s sourcefile [-d destdir]\n"; 480 | die " mb2md [-c] [-K] [-U|-u] [-S] [-t] [-W] -s sourcedir [-l wu-mailboxlist] [-R|-f somefolder] [-d destdir] [-r strip_extension]\n"; 481 | } 482 | # get options 483 | my %opts; 484 | getopts('d:f:chms:r:l:RUuKStW', \%opts) || usage(); 485 | usage() if ( defined($opts{h}) 486 | || (!defined($opts{m}) && !defined($opts{s})) ); 487 | 488 | # Get uid, username and home dir 489 | my ($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $homedir, $shell) = getpwuid($<); 490 | 491 | # Get arguments and determine source 492 | # and target directories. 493 | my $mbroot = undef; # this is the base directory for the mboxes 494 | my $mbdir = undef; # this is an mbox dir relative to the $mbroot 495 | my $mbfile = undef; # this is an mbox file 496 | my $dest = undef; 497 | my $strip_ext = undef; 498 | my $use_cl = undef; # defines whether we use the Content-Length: header if present 499 | my $use_file_ts_in_names = 0; # defines whether we use the message timestamp as part of generated filenames or not 500 | my $create_dovecot_keywords = 0; # defines whether we generate a Dovecot-compatible keywords file 501 | my $create_dovecot_uidlist = 0; # defines whether we generate a Dovecot-compatible uidlist UID file 502 | my $create_courier_uidlist = 0; # defines whether we generate a Courier IMAP-compatible courierimapuiddb UID file 503 | my $note_message_size = 0; # Whether we should add the ,S= message size tag 504 | my $note_rfc822_size = 0; # Whether we should add the ,W= RFC822.SIZE tag 505 | 506 | # if option "-c" is given, we use the Content-Length: header if present 507 | # dangerous! may be unreliable, as the whole CL stuff is a bad idea 508 | if (defined($opts{c})) 509 | { 510 | $use_cl = 1; 511 | } else { 512 | $use_cl = 0; 513 | } 514 | 515 | # The -U and -u options cannot be specified together 516 | if (defined($opts{U}) && defined($opts{u})) 517 | { 518 | die("Options -U and -u cannot be specified together"); 519 | } 520 | 521 | # if option "-K" is given, we will generate a Dovecot-compatible 522 | # dovecot-keywords file in each Maildir 523 | if (defined($opts{K})) 524 | { 525 | $create_dovecot_keywords = 1; 526 | } 527 | 528 | if (defined($opts{t})) 529 | { 530 | $use_file_ts_in_names = 1; 531 | } 532 | 533 | # if option "-U" is given, we will generate a Dovecot-compatible 534 | # dovecot-uidlist file in each Maildir 535 | if (defined($opts{U})) 536 | { 537 | $create_dovecot_uidlist = 1; 538 | } 539 | 540 | # if option "-u" is given, we will generate a Courier IMAP-compatible 541 | # courierimapuiddb file in each Maildir 542 | if (defined($opts{u})) 543 | { 544 | $create_courier_uidlist = 1; 545 | } 546 | 547 | if (defined($opts{S})) 548 | { 549 | $note_message_size = 1; 550 | } 551 | 552 | if (defined($opts{W})) 553 | { 554 | $note_rfc822_size = 1; 555 | } 556 | 557 | # first, if the user has gone the -m option 558 | # we simply convert their mailfile 559 | if (defined($opts{m})) 560 | { 561 | if (defined($ENV{'MAIL'})) { 562 | $mbfile = $ENV{'MAIL'}; 563 | } elsif ( -f "/var/spool/mail/$name" ) { 564 | $mbfile = "/var/spool/mail/$name" 565 | } elsif ( -f "/var/mail/$name" ) { 566 | $mbfile = "/var/mail/$name" 567 | } else { 568 | die("I searched \$MAIL, /var/spool/mail/$name and /var/mail/$name, ". 569 | "but I couldn't find your mail spool file - "); 570 | } 571 | } 572 | # see if the user has specified a source directory 573 | elsif (defined($opts{s})) 574 | { 575 | # if opts{s} doesn't start with a "/" then 576 | # it is a subdir of the users $home 577 | # if it does start with a "/" then 578 | # let's take $mbroot as a absolut path 579 | $opts{s} = "$homedir/$opts{s}" if ($opts{s} !~ /^\//); 580 | 581 | # check if the given source is a mbox file 582 | if (-f $opts{s}) 583 | { 584 | $mbfile = $opts{s}; 585 | } 586 | 587 | # otherwise check if it is a directory 588 | elsif (-d $opts{s}) 589 | { 590 | $mbroot = $opts{s}; 591 | # get rid of trailing /'s 592 | $mbroot =~ s/\/$//; 593 | 594 | # check if we have a specified sub directory, 595 | # otherwise the sub directory is '.' 596 | if (defined($opts{f})) 597 | { 598 | $mbdir = $opts{f}; 599 | # get rid of trailing /'s 600 | $mbdir =~ s/\/$//; 601 | } 602 | } 603 | 604 | # otherwise we have an error 605 | else 606 | { 607 | die("Fatal: Source is not an mbox file or a directory!\n"); 608 | } 609 | } 610 | 611 | 612 | # get the dest 613 | defined($opts{d}) && ($dest = $opts{d}) || ($dest = "Maildir"); 614 | # see if we have anything to strip 615 | defined($opts{r}) && ($strip_ext = $opts{r}); 616 | # No '-f' with '-R' 617 | if((defined($opts{R}))&&(defined($opts{f}))) { die "No recursion with \"-f\"";} 618 | # Get list of folders 619 | my @flist; 620 | if(defined($opts{l})) 621 | { 622 | open (LIST,$opts{l}) or die "Could not open mailbox list $opts{l}: $!"; 623 | @flist=; 624 | close LIST; 625 | } 626 | 627 | # if the destination is relative to the home dir, 628 | # check that the home dir exists 629 | die("Fatal: home dir $homedir doesn't exist.\n") if ($dest !~ /^\// && ! -e $homedir); 630 | 631 | # 632 | # form the destination value 633 | # slap the home dir on the front of the dest if the dest does not begin 634 | # with a '/' 635 | $dest = "$homedir/$dest" if ($dest !~ /^\//); 636 | # get rid of trailing /'s 637 | $dest =~ s/\/$//; 638 | 639 | 640 | # Count the number of mailboxes, or 641 | # at least files, we found. 642 | my $mailboxcount = 0; 643 | 644 | # Since we'll be making sub directories of the main 645 | # Maildir, we need to make sure that the main maildir 646 | # exists 647 | &maildirmake($dest); 648 | 649 | # Now we do different things depending on whether we convert one mbox 650 | # file or a directory of mbox files 651 | if (defined($mbfile)) 652 | { 653 | if (!isamailboxfile($mbfile)) 654 | { 655 | print "Skipping $mbfile: not a mbox file\n"; 656 | } 657 | else 658 | { 659 | print "Converting $mbfile to maildir: $dest\n"; 660 | # this is easy, we just run the convert function 661 | &convert($mbfile, $dest); 662 | } 663 | } 664 | # if '-f' was used ... 665 | elsif (defined($mbdir)) 666 | { 667 | print "Converting mboxdir/mbdir: $mbroot/$mbdir to maildir: $dest/\n"; 668 | 669 | # Now set our source directory 670 | my $sourcedir = "$mbroot/$mbdir"; 671 | 672 | # check that the directory we are supposed to be finding mbox 673 | # files in, exists and is a directory 674 | -e $sourcedir or die("Fatal: MBDIR directory $sourcedir/ does not exist.\n"); 675 | -d $sourcedir or die("Fatal: MBDIR $sourcedir is not a directory.\n"); 676 | 677 | 678 | &convertit($mbdir,""); 679 | } 680 | # Else, let's work in $mbroot 681 | else 682 | { 683 | opendir(SDIR, $mbroot) 684 | or die("Fatal: Cannot open source directory $mbroot/ \n"); 685 | 686 | 687 | while (my $sourcefile = readdir(SDIR)) 688 | { 689 | if (-d "$mbroot/$sourcefile") { 690 | # Recurse only if requested (to be changed ?) 691 | if (defined($opts{R})) { 692 | print "convertit($sourcefile,\"\")\n"; 693 | &convertit($sourcefile,""); 694 | } else { 695 | print("$sourcefile is a directory, but '-R' was not used... skipping\n"); 696 | } 697 | } 698 | elsif (!-f "$mbroot/$sourcefile") 699 | { 700 | print "Skipping $mbroot/$sourcefile : not a file nor a dir\n"; 701 | next; 702 | } 703 | elsif (!isamailboxfile("$mbroot/$sourcefile")) 704 | { 705 | print "Skipping $mbroot/$sourcefile : not a mbox file\n"; 706 | next; 707 | } 708 | else 709 | { 710 | &convertit($sourcefile,""); 711 | } 712 | } # end of "while ($sfile = readdir(SDIR))" loop. 713 | closedir(SDIR); 714 | printf("$mailboxcount files processed.\n"); 715 | } 716 | # 717 | 718 | exit 0; 719 | 720 | # My debbugging placeholder I can put somewhere to show how far the script ran. 721 | # die("So far so good.\n\n"); 722 | 723 | # The isamailboxfile function 724 | # ---------------------- 725 | # 726 | # Here we check if the file is a mailbox file, not an address-book or 727 | # something else. 728 | # If file is empty, we say it is a mbox, to create it empty. 729 | # 730 | # Returns 1 if file is said mbox, 0 else. 731 | sub isamailboxfile { 732 | my ($mbxfile) = @_; 733 | return 1 if(-z $mbxfile); 734 | sysopen(MBXFILE, "$mbxfile", O_RDONLY) or die "Could not open $mbxfile ! \n"; 735 | while() { 736 | if (/^From/) { 737 | close(MBXFILE); 738 | return 1; 739 | } 740 | else { 741 | close(MBXFILE); 742 | return 0; 743 | } 744 | } 745 | } 746 | 747 | # The convertit function 748 | # ----------------------- 749 | # 750 | # This function creates all subdirs in maildir, and calls convert() 751 | # for each mbox file. 752 | # Yes, it becomes the 'main loop' :) 753 | sub convertit 754 | { 755 | # Get subdir as argument 756 | my ($dir,$oldpath) = @_; 757 | 758 | $oldpath =~ s/\/\///; 759 | 760 | # Skip files beginning with '.' since they are 761 | # not normally mbox files nor dirs (includes '.' and '..') 762 | if ($dir =~ /^\./) 763 | { 764 | print "Skipping $dir : name begins with a '.'\n"; 765 | return; 766 | } 767 | my $destinationdir = $dir; 768 | my $temppath = $oldpath; 769 | 770 | # We don't want to have .'s in the $targetfile file 771 | # name because they will become directories in the 772 | # Maildir. Therefore we convert them to _'s 773 | $temppath =~ s/\./\_/g; 774 | $destinationdir =~ s/\./\_/g; 775 | 776 | # Appending $oldpath => path is only missing $dest 777 | $destinationdir = "$temppath.$destinationdir"; 778 | 779 | # Converting '/' to '.' in $destinationdir 780 | $destinationdir =~s/\/+/\./g; 781 | 782 | # source dir 783 | my $srcdir="$mbroot/$oldpath/$dir"; 784 | 785 | print("convertit(): Converting $dir in $mbroot/$oldpath to $dest/$destinationdir\n"); 786 | &maildirmake("$dest/$destinationdir"); 787 | 788 | # Subfolders are Maildir++ folders and should be marked by the 789 | # presence of an empty "maildirfolder" file 790 | sysopen(F, "$dest/$destinationdir/maildirfolder", O_CREAT|O_WRONLY, 0600) && close F; 791 | 792 | print("destination = $destinationdir\n"); 793 | if (-d $srcdir) { 794 | opendir(SUBDIR, "$srcdir") or die "can't open $srcdir !\n"; 795 | my @subdirlist=readdir(SUBDIR); 796 | closedir(SUBDIR); 797 | foreach (@subdirlist) { 798 | next if (/^\.+$/); 799 | print("Sub: $_\n"); 800 | print("convertit($_,\"$oldpath/$dir\")\n"); 801 | &convertit($_,"$oldpath/$dir"); 802 | } 803 | } else { 804 | # Source file verifs .... 805 | # 806 | return if(defined($opts{l}) && !inlist("$oldpath/$dir",@flist)); 807 | 808 | if (!isamailboxfile("$mbroot/$oldpath/$dir")) 809 | { 810 | print "Skipping $dir (is not mbox)\n"; 811 | return; 812 | } 813 | 814 | # target file verifs... 815 | # 816 | # if $strip_extension is defined, 817 | # strip it off the $targetfile 818 | defined($strip_ext) && ($destinationdir =~ s/\.$strip_ext$//); 819 | &convert("$mbroot/$oldpath/$dir","$dest/$destinationdir"); 820 | $mailboxcount++; 821 | } 822 | } 823 | # The maildirmake function 824 | # ------------------------ 825 | # 826 | # It does the same thing that the maildirmake binary that 827 | # comes with courier-imap distribution 828 | # 829 | sub maildirmake 830 | { 831 | foreach(@_) { 832 | -d $_ or mkdir $_,0700 or die("Fatal: Directory $_ doesn't exist and can't be created.\n"); 833 | 834 | -d "$_/tmp" or mkdir("$_/tmp",0700) or die("Fatal: Unable to make $_/tmp/ subdirectory.\n"); 835 | -d "$_/new" or mkdir("$_/new",0700) or die("Fatal: Unable to make $_/new/ subdirectory.\n"); 836 | -d "$_/cur" or mkdir("$_/cur",0700) or die("Fatal: Unable to make $_/cur/ subdirectory.\n"); 837 | } 838 | } 839 | 840 | # The inlist function 841 | # ------------------------ 842 | # 843 | # It checks that the folder to be converted is in the list of subscribed 844 | # folders in WU-IMAP 845 | # 846 | sub inlist 847 | { 848 | my ($file,@flist) = @_; 849 | my $valid = 0; 850 | # Get rid of the first / if any 851 | $file =~ s/^\///; 852 | foreach my $folder (@flist) { 853 | chomp $folder; 854 | if ($file eq $folder) { 855 | $valid = 1; 856 | last; 857 | } 858 | } 859 | if (!$valid) { 860 | print "$file is not in list\n"; 861 | } 862 | else { 863 | print "$file is in list\n"; 864 | } 865 | 866 | return $valid; 867 | } 868 | 869 | # 870 | 871 | # The convert function 872 | # --------------------- 873 | # 874 | # This function does the down and dirty work of 875 | # actually converting the mbox to a maildir 876 | # 877 | sub convert 878 | { 879 | # get the source and destination as arguments 880 | my ($mbox, $maildir) = @_; 881 | 882 | print("Source Mbox is $mbox\n"); 883 | print("Target Maildir is $maildir \n") ; 884 | 885 | # create the directories for the new maildir 886 | # 887 | # if it is the root maildir (ie. converting the inbox) 888 | # these already exist but thats not a big issue 889 | 890 | &maildirmake($maildir); 891 | 892 | # Change to the target mailbox directory. 893 | 894 | chdir "$maildir" ; 895 | 896 | # Converts a Mbox to multiple files 897 | # in a Maildir. 898 | # This is adapted from mbox2maildir. 899 | # 900 | # Open the Mbox mailbox file. 901 | 902 | 903 | if (sysopen(MBOX, "$mbox", O_RDONLY)) 904 | { 905 | #printf("Converting Mbox $mbox . . . \n"); 906 | } 907 | else 908 | { 909 | die("Fatal: unable to open input mailbox file: $mbox ! \n"); 910 | } 911 | 912 | # This loop scans the input mailbox for 913 | # a line starting with "From ". The 914 | # "^" before it is pattern-matching 915 | # lingo for it being at the start of a 916 | # line. 917 | # 918 | # Each email in Mbox mailbox starts 919 | # with such a line, which is why any 920 | # such line in the body of the email 921 | # has to have a ">" put in front of it. 922 | # 923 | # This is not required in a Maildir 924 | # mailbox, and some majik below 925 | # finds any such quoted "> From"s and 926 | # gets rid of the "> " quote. 927 | # 928 | # Each email is put in a file 929 | # in the cur/ subdirectory with a 930 | # name of the form: 931 | # 932 | # nnnnnnnnn.cccc.mbox:2,XXXX 933 | # 934 | # where: 935 | # "nnnnnnnnn" is the Unix time since 936 | # 1970 when this script started 937 | # running, incremented by 1 for 938 | # every email. This is to ensure 939 | # unique names for each message 940 | # file. 941 | # 942 | # ".cccc" is the message count of 943 | # messages from this mbox. 944 | # 945 | # ".mbox" is just to indicate that 946 | # this message was converted from 947 | # an Mbox mailbox. 948 | # 949 | # ":2," is the start of potentially 950 | # multiple IMAP flag characters 951 | # "XXXX", but may be followed by 952 | # nothing. 953 | # 954 | # This is sort-of compliant with 955 | # the Maildir naming conventions 956 | # specified at: 957 | # 958 | # http://www.qmail.org/man/man5/maildir.html 959 | # 960 | # This approach does not involve the 961 | # process ID or the hostname, but it is 962 | # probably good enough. 963 | # 964 | # When the IMAP server looks at this 965 | # mailbox, it will move the files to 966 | # the cur/ directory and change their 967 | # names as it pleases. In the case 968 | # of Courier IMAP, the names will 969 | # become like: 970 | # 971 | # 995096541.25351.mbox:2,S 972 | # 973 | # with 25351 being Courier IMAP's 974 | # process ID. The :2, is the start 975 | # of the flags, and the "S" means 976 | # that this one has been seen by 977 | # the user. (But is this the same 978 | # meaning as the user actually 979 | # having opened the message to see 980 | # its contents, rather than just the 981 | # IMAP server having been asked to 982 | # list the message's Subject etc. 983 | # so the client could list it in the 984 | # visible Inbox?) 985 | # 986 | # This contrasts with a message 987 | # created by Courier IMAP, say with 988 | # a message copy, which is like: 989 | # 990 | # 995096541.25351.zair,S=14285:2,S 991 | # 992 | # where ",S=14285" is the size of the 993 | # message in bytes. 994 | # 995 | # Courier Maildrop's names are similar 996 | # but lack the ":2,XXXX" flags . . . 997 | # except for my modified Maildrop 998 | # which can deliver them with a 999 | # ":2,T" - flagged for deletion. 1000 | # 1001 | # I have extended the logic of the 1002 | # per-message inner loop to stop 1003 | # saving a file for a message with: 1004 | # 1005 | # Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA 1006 | # 1007 | # This is the dummy message, always 1008 | # at the start of an Mbox format 1009 | # mailbox file - and is put there 1010 | # by UW IMAPD. Since quite a few 1011 | # people will use this for 1012 | # converting from a UW system, 1013 | # I figure it is worth it. 1014 | # 1015 | # I will not save any such message 1016 | # file for the dummy message. 1017 | # 1018 | # Plan 1019 | # ---- 1020 | # 1021 | # We want to read the entire Mbox file, whilst 1022 | # going through a loop for each message we find. 1023 | # 1024 | # We want to read all the headers of the message, 1025 | # starting with the "From " line. For that "From " 1026 | # line we want to get a date. 1027 | # 1028 | # For all other header lines, we want to store them 1029 | # in $headers whilst parsing them to find: 1030 | # 1031 | # 1 - Any flags in the "Status: " or "X-Status: " or 1032 | # "X-Mozilla-Status: " lines. 1033 | # 1034 | # 2 - A subject line indicating this is the dummy message 1035 | # at the start (typically, but not necessarily) of 1036 | # the Mbox. 1037 | # 1038 | # Once we reach the end of the headers, we will crunch any 1039 | # flags we found to create a file name. Then, unless this is 1040 | # the dummy message we create that file and write all the 1041 | # headers to it. 1042 | # 1043 | # Then we continue reading the Mbox, converting ">From " to 1044 | # "From " and writing it to the file, until we reach one of: 1045 | # 1046 | # 1 - Another "From " line (indicating the start of another 1047 | # message). 1048 | # 1049 | # or 1050 | # 1051 | # 2 - The end of the Mbox. 1052 | # 1053 | # In the former case, which we detect at the start of the loop 1054 | # we need to close the file and touch it to alter its date-time. 1055 | # 1056 | # In the later case, we also need to close the file and touch 1057 | # it to alter its date-time - but this is beyond the end of the 1058 | # loop. 1059 | 1060 | 1061 | # Variables 1062 | # --------- 1063 | 1064 | my $messagecount = 0; 1065 | 1066 | # For generating unique filenames for 1067 | # each message. Initialise it here with 1068 | # numeric time in seconds since 1970. 1069 | my $unique = time; 1070 | 1071 | 1072 | # If we get to 1073 | # each message. Initialise it here with 1074 | # numeric time in seconds since 1970. 1075 | my $filebase = sprintf "%d", $unique; 1076 | 1077 | # Name of message file to delete if we found that 1078 | # it was created by reading the Mbox dummy message. 1079 | 1080 | my $deletedummy = ''; 1081 | 1082 | # To store the complete "From (address) (date-time) 1083 | # which delineates the start of each message 1084 | # in the Mbox 1085 | my $fromline = ''; 1086 | 1087 | 1088 | # Set to 1 when we are reading the header lines, 1089 | # including the "From " line. 1090 | # 1091 | # 0 means we are reading the message body and looking 1092 | # for another "From " line. 1093 | 1094 | my $inheaders = 0; 1095 | 1096 | # Variable to hold all headers (apart from 1097 | # the first line "From ...." which is not 1098 | # part of the message itself. 1099 | my $headers = ''; 1100 | 1101 | # Variable to hold the accumulated characters 1102 | # we find in header lines of the type: 1103 | # 1104 | # Status: 1105 | # X-Status: 1106 | # X-Mozilla-Status: 1107 | # X-Evolution: 1108 | my $flags = ''; 1109 | 1110 | # To build the file name for the message in. 1111 | my $messagefn = ''; 1112 | 1113 | 1114 | # The date string from the "From " line of each 1115 | # message will be written here - and used by 1116 | # touch to alter the date-time of each message 1117 | # file. Put non-date text here to make it 1118 | # spit the dummy if my code fails to find a 1119 | # date to write into this. 1120 | 1121 | my $receivedate = 'Bogus'; 1122 | 1123 | # The subject of the message 1124 | my $subject = ''; 1125 | 1126 | my $previous_line_was_empty = 1; 1127 | 1128 | # We record the message start line here, for error 1129 | # reporting. 1130 | my $startline; 1131 | 1132 | # If defined, we use this as the number of bytes in the 1133 | # message body rather than looking for a /^From / line. 1134 | my $contentlength; 1135 | 1136 | # A From lines can either occur as the first 1137 | # line of a file, or after an empty line. 1138 | # Most mail systems will quote all From lines 1139 | # appearing in the message, but some will only 1140 | # do it when necessary. 1141 | # Since we initialise the variable to true, 1142 | # we don't need to check for beginning of file. 1143 | 1144 | # The path to the UID list file 1145 | my $uidlistfile; 1146 | if ($create_dovecot_uidlist) 1147 | { 1148 | $uidlistfile = "${maildir}/dovecot-uidlist"; 1149 | } else { 1150 | $uidlistfile = "${maildir}/courierimapuiddb"; 1151 | } 1152 | # Store the UIDVALIDITY and UIDLAST from the X-IMAP: 1153 | # header 1154 | my $uidvalidity; 1155 | my $uidlast = 0; 1156 | 1157 | # Store the UID for the current message 1158 | my $uidcurr = 0; 1159 | 1160 | # Array to hold all the UIDs and filenames for outputing 1161 | # into a uidlist file 1162 | my @uidlist; 1163 | my $douidlist = $create_dovecot_uidlist || $create_courier_uidlist; 1164 | if ($douidlist && scalar(stat($uidlistfile))) 1165 | { 1166 | $douidlist = 0; 1167 | printf("WARNING: Skipping UIDs for this folder. %s already exists.\n", $uidlistfile); 1168 | } 1169 | 1170 | # The path to the Dovecot keywords list 1171 | my $keywordsfile = "$maildir/dovecot-keywords"; 1172 | # Hash to hold a list of all valid keywords for the folder. 1173 | # We use a hash to make looking up keywords in there fast. 1174 | my %validkeywords; 1175 | # A list of already encountered keys. The index of each key 1176 | # is used when generating message filenames and they get 1177 | # written to the dovecot-keywords file. We also have a 1178 | # hash that maps from the keyword to the array index to 1179 | # facilitate checking if we already have an index for the 1180 | # keyword 1181 | my @keywords; 1182 | my %keywordshash; 1183 | 1184 | # List of keyword flags used by Dovecot. The dovecot-keyword 1185 | # file maintains a 0-based index of keywords in use in the 1186 | # folder. The message filenames use the flags a-z to mark 1187 | # messages as having keywords (a=0, b=1, etc). Note that 1188 | # this means Dovecot only supports 26 different keywords 1189 | # per mail folder. This array maps the numeric indexes to 1190 | # the letter flags (in case Dovecot begins to use other 1191 | # flags in the future). 1192 | my @keywordflags = ('a'..'z'); 1193 | 1194 | # Store the keyword header found for the current message 1195 | my $messagekeywords; 1196 | 1197 | # If there already exists a dovecot-keywords file then 1198 | # we can't deal with keywords even if the user wants us to. 1199 | # It's not technically impossible, just more than this code 1200 | # can be bothered to deal with. 1201 | my $dokeywords = $create_dovecot_keywords; 1202 | if ($dokeywords && scalar(stat($keywordsfile))) 1203 | { 1204 | $dokeywords = 0; 1205 | printf("WARNING: Skipping keywords for this folder. %s already exists.\n", $keywordsfile); 1206 | } 1207 | 1208 | my $postclose = sub 1209 | { 1210 | if ($messagefn ne '' && $messagefn ne $deletedummy) 1211 | { 1212 | if ($note_message_size || $note_rfc822_size) 1213 | { 1214 | my $params = ""; 1215 | my $realsize = -s $messagefn; 1216 | 1217 | if ($note_message_size) 1218 | { 1219 | $params .= ",S=$realsize"; 1220 | } 1221 | 1222 | if ($note_rfc822_size && open(MSG, "<$messagefn")) 1223 | { 1224 | my $lfs = 0; 1225 | my $line; 1226 | while ($line = ) 1227 | { 1228 | $lfs += ($line =~ m/(?) 1254 | { 1255 | # exchange possible Windows EOL (CRLF) with Unix EOL (LF) 1256 | $_ =~ s/\r\n$/\n/; 1257 | 1258 | if ( /^From / 1259 | && $previous_line_was_empty 1260 | && (!defined $contentlength) 1261 | ) 1262 | { 1263 | # We are reading the "From " line which has an 1264 | # email address followed by a receive date. 1265 | # Turn on the $inheaders flag until we reach 1266 | # the end of the headers. 1267 | 1268 | $inheaders = 1; 1269 | 1270 | # In case we don't find an X-UID: header, set 1271 | # the UID for the current message to 1 higher 1272 | # than the previous message 1273 | $uidcurr += 1; 1274 | 1275 | # This is a new message so we need to undefine 1276 | # the message keyword header before looking at 1277 | # the new message (which may not have one) 1278 | undef($messagekeywords); 1279 | 1280 | # record the message start line 1281 | 1282 | $startline = $.; 1283 | 1284 | # If this is not the first run through the loop 1285 | # then this means we have already been working 1286 | # on a message. 1287 | 1288 | if ($messagecount > 0) 1289 | { 1290 | # If so, then close that message file and then 1291 | # use utime to change its date-time. 1292 | # 1293 | # Note this code should be duplicated to do 1294 | # the same thing at the end of the while loop 1295 | # since we must close and touch the final message 1296 | # file we were writing when we hit the end of the 1297 | # Mbox file. 1298 | 1299 | close (OUT); 1300 | &$postclose(); 1301 | } 1302 | 1303 | # Because we opened the Mbox file without any 1304 | # variable, I think this means that we have its 1305 | # current line in Perl's default variable "$_". 1306 | # So all sorts of pattern matching magic works 1307 | # directly on it. 1308 | 1309 | # We are currently reading the first line starting with 1310 | # "From " which contains the date we want. 1311 | # 1312 | # This will be of the form: 1313 | # 1314 | # From dduck@test.org Wed Nov 24 11:05:35 1999 1315 | # 1316 | # at least with UW-IMAP. 1317 | # 1318 | # However, I did find a nasty exception to this in my 1319 | # tests, of the form: 1320 | # 1321 | # "bounce-MusicNewsletter 5-rw=test.org"@announce2.mp3.com 1322 | # 1323 | # This makes it trickier to get rid of the email address, 1324 | # but I did find a way. I can't rule out that there would 1325 | # be some address like this with an "@" in the quoted 1326 | # portion too. 1327 | # 1328 | # Unfortunately, testing with an old Inbox Mbox file, 1329 | # I also found an instance where the email address 1330 | # had no @ sign at all. It was just an email 1331 | # account name, with no host. 1332 | # 1333 | # I could search for the day of the week. If I skipped 1334 | # at least one word of non-whitespace (1 or more contiguous 1335 | # non-whitespace characters) then searched for a day of 1336 | # the week, then I should be able to avoid almost 1337 | # every instance of a day of the week appearing in 1338 | # the email address. 1339 | # 1340 | # Do I need a failsafe arrangement to provide some 1341 | # other date to touch if I don't get what seems like 1342 | # a date in my resulting string? For now, no. 1343 | # 1344 | # I will take one approach if there is an @ in the 1345 | # "From " line and another (just skip the first word 1346 | # after "From ") if there is no @ in the line. 1347 | # 1348 | # If I knew more about Perl I would probably do it in 1349 | # a more elegant way. 1350 | 1351 | # Copy the current line into $fromline. 1352 | 1353 | $fromline = $_; 1354 | 1355 | # Now get rid of the "From ". " =~ s" means substitute. 1356 | # Find the word "From " at the start of the line and 1357 | # replace it with nothing. The nothing is what is 1358 | # between the second and third slash. 1359 | 1360 | $fromline =~ s/^From // ; 1361 | 1362 | 1363 | # Likewise get rid of the email address. 1364 | # This first section is if we determine there is one 1365 | # (or more . . . ) "@" characters in the line, which 1366 | # would normally be the case. 1367 | 1368 | if ($fromline =~ m/@/) 1369 | { 1370 | # The line has at least one "@" in it, so we assume 1371 | # this is in the middle of an email address. 1372 | # 1373 | # If the email address had no spaces, then we could 1374 | # get rid of the whole thing by searching for any number 1375 | # of non-whitespace characters (\S) contiguously, and 1376 | # then I think a space. Subsitute nothing for this. 1377 | # 1378 | # $fromline =~ s/(\S)+ // ; 1379 | # 1380 | # But we need something to match any number of non-@ 1381 | # characters, then the "@" and then all the non-whitespace 1382 | # characters from there (which takes us to the end of 1383 | # "test.org") and then the space following that. 1384 | # 1385 | # A tutorial on regular expressions is: 1386 | # 1387 | # http://www.perldoc.com/perl5.6.1/pod/perlretut.html 1388 | # 1389 | # Get rid of all non-@ characters up to the first "@": 1390 | 1391 | $fromline =~ s/[^@]+//; 1392 | 1393 | 1394 | # Get rid of the "@". 1395 | 1396 | $fromline =~ s/@//; 1397 | } 1398 | # If there was an "@" in the line, then we have now 1399 | # removed the first one (lets hope there aren't more!) 1400 | # and everything which preceded it. 1401 | # 1402 | # we now remove either something like 1403 | # '(foo bar)'. eg. '(no mail address)', 1404 | # or everything after the '@' up to the trailing 1405 | # timezone 1406 | # 1407 | # FIXME: all those regexp should be combined to just one single one 1408 | 1409 | # If the first character is a quote, remove everything up to 1410 | # the next quote. 1411 | if ($fromline =~ m/^\s*"/) 1412 | { 1413 | $fromline =~ s/"[^"]*"//; 1414 | } else { 1415 | $fromline =~ s/(\((\S*| )+\)|\S+) *//; 1416 | } 1417 | 1418 | chomp $fromline; 1419 | 1420 | # Stash the date-time for later use. We will use it 1421 | # to touch the file after we have closed it. 1422 | 1423 | $receivedate = $fromline; 1424 | 1425 | # Debugging lines: 1426 | # 1427 | # print "$receivedate is the receivedate of message $messagecount.\n"; 1428 | # $receivedate = "Wed Nov 24 11:05:35 1999"; 1429 | # 1430 | # To look at the exact date-time of files: 1431 | # 1432 | # ls -lFa --full-time 1433 | # 1434 | # End of handling the "From " line. 1435 | } 1436 | 1437 | 1438 | # Now process header lines which are not the "From " line. 1439 | 1440 | if ( ($inheaders eq 1) 1441 | && (! /^From /) 1442 | ) 1443 | { 1444 | # Now we are reading the header lines after the "From " line. 1445 | # Keep looking for the blank line which indicates the end of the 1446 | # headers. 1447 | 1448 | 1449 | # ".=" means append the current line to the $headers 1450 | # variable. 1451 | # 1452 | # For some reason, I was getting two blank lines 1453 | # at the end of the headers, rather than one, 1454 | # so I decided not to read in the blank line 1455 | # which terminates the headers. 1456 | # 1457 | # Delete the "unless ($_ eq "\n")" to get rid 1458 | # of this kludge. 1459 | # 1460 | # Don't copy status headers, etc. if we've used 1461 | # the info in them already for something. 1462 | 1463 | $headers .= $_ unless ( ($_ eq "\n") || 1464 | (/^Status: /) || 1465 | (/^X-Status: /) || 1466 | (/^X-Mozilla-Status: /i) || 1467 | (/^X\-Evolution:\s+/oi) || 1468 | (/^X-IMAP(?:base)?: /) || 1469 | (/^X-UID: /) || 1470 | (/^X-Keywords:\s+/)); 1471 | 1472 | if (/^X-IMAP(?:base)?: (\d+)\s+(\d+)\s*([^\s].*)?\s*$/) 1473 | { 1474 | if (defined($uidvalidity)) 1475 | { 1476 | printf("WARNING: Second X-IMAP: header found. Ignoring it (line %d, msg %d).\n", $., $messagecount); 1477 | } else { 1478 | $uidvalidity = $1; 1479 | $uidlast = $2; 1480 | } 1481 | 1482 | # Valid keywords for the mailbox are stored 1483 | # in the X-IMAP: or X-IMAPbase: header. Any 1484 | # keywords in messages that are not in this 1485 | # list should be ignored 1486 | if (defined($3)) 1487 | { 1488 | foreach my $keyword (split(/\s+/, $3)) 1489 | { 1490 | $validkeywords{$keyword} = 1; 1491 | } 1492 | } 1493 | } 1494 | 1495 | if (/^X-UID: (\d+)/) 1496 | { 1497 | # UIDs must increase; we must have a UID at least 1 1498 | # greater than the previous message 1499 | if ($1 < $uidcurr) 1500 | { 1501 | printf("WARNING: UID from X-UID: header too low. Ignoring it (line %d, msg %d).\n", $., $messagecount); 1502 | } else { 1503 | $uidcurr = $1; 1504 | } 1505 | } 1506 | 1507 | if (/^X-Keywords:\s+(.*)\s*$/) 1508 | { 1509 | # Grab the keywords for use when we generate the 1510 | # message filename below 1511 | $messagekeywords = $1; 1512 | } 1513 | 1514 | # Now scan the line for various status flags 1515 | # and to fine the Subject line. 1516 | 1517 | $flags .= $1 if /^Status: ([A-Z]+)/; 1518 | $flags .= $1 if /^X-Status: ([A-Z]+)/; 1519 | if (/^X-Mozilla-Status: ([0-9a-f]{4})/i) 1520 | { 1521 | $flags .= 'R' if (hex($1) & 0x0001); 1522 | $flags .= 'A' if (hex($1) & 0x0002); 1523 | $flags .= 'D' if (hex($1) & 0x0008); 1524 | } 1525 | if(/^X\-Evolution:\s+\w{8}\-(\w{4})/oi) 1526 | { 1527 | $b = pack("H4", $1); #pack it as 4 digit hex (0x0000) 1528 | $b = unpack("B32", $b); #unpack into bit string 1529 | 1530 | # "usually" only the right most six bits are used 1531 | # however, I have come across a seventh bit in 1532 | # about 15 (out of 10,000) messages with this bit 1533 | # activated. 1534 | # I have not found any documentation in the source. 1535 | # If you find out what it does, please let me know. 1536 | 1537 | # Notes: 1538 | # Evolution 1.4 does mark forwarded messages. 1539 | # The sixth bit is to denote an attachment 1540 | 1541 | $flags .= 'A' if($b =~ /[01]{15}1/); #replied 1542 | $flags .= 'D' if($b =~ /[01]{14}1[01]{1}/); #deleted 1543 | $flags .= 'T' if($b =~ /[01]{13}1[01]{2}/); #draft 1544 | $flags .= 'F' if($b =~ /[01]{12}1[01]{3}/); #flagged 1545 | $flags .= 'R' if($b =~ /[01]{11}1[01]{4}/); #seen/read 1546 | } 1547 | $subject = $1 if /^Subject: (.*)$/; 1548 | if ($use_cl eq 1) 1549 | { 1550 | $contentlength = $1 if /^Content-Length: (\d+)$/; 1551 | } 1552 | 1553 | # Now look out for the end of the headers - a blank 1554 | # line. When we find it, create the file name and 1555 | # analyse the Subject line. 1556 | 1557 | if ($_ eq "\n") 1558 | { 1559 | # We are at the end of the headers. Set the 1560 | # $inheaders flag back to 0. 1561 | 1562 | $inheaders = 0; 1563 | 1564 | # Include the current newline in the content length 1565 | 1566 | ++$contentlength if defined $contentlength; 1567 | 1568 | # Create the file name for the current message. 1569 | # 1570 | # A simple version of this would be: 1571 | # 1572 | # $messagefn = "cur/$unique.$messagecount.mbox:2,"; 1573 | # 1574 | # This would create names with $messagecount values of 1575 | # 1, 2, etc. But for neatness when looking at a 1576 | # directory of such messages, sorted by filename, 1577 | # I want to have leading zeroes on message count, so 1578 | # that they would be 000001 etc. This makes them 1579 | # appear in message order rather than 1 being after 1580 | # 19 etc. So this is good for up to 999,999 messages 1581 | # in a mailbox. It is a cosmetic matter for a person 1582 | # looking into the Maildir directory manually. 1583 | # To do this, use sprintf instead with "%06d" for 1584 | # 6 characters of zero-padding: 1585 | 1586 | $messagefn = sprintf ("cur/%s.%06d.mbox:2,", $filebase, $messagecount) ; 1587 | 1588 | # If the message has not been flagged as Opened 1589 | # then it should be put in the new/ folder. This 1590 | # Works with Exim/UW-IMAP folders but is otherwise 1591 | # untested. 1592 | $messagefn =~ s/^cur/new/ unless $flags =~ /O/; 1593 | 1594 | # Append flag characters to the end of the 1595 | # filename, according to flag characters 1596 | # collected from the message headers 1597 | 1598 | $messagefn .= 'F' if $flags =~ /F/; # Flagged. 1599 | $messagefn .= 'R' if $flags =~ /A/; # Replied to. 1600 | $messagefn .= 'S' if $flags =~ /R/; # Seen or Read. 1601 | $messagefn .= 'T' if $flags =~ /D/; # Tagged for deletion. 1602 | 1603 | # If the user has asked us to generate Dovecot- 1604 | # compatible keyword listings, let's give it a go 1605 | if ($dokeywords && 1606 | defined($messagekeywords) && 1607 | scalar(keys(%validkeywords))) 1608 | { 1609 | foreach my $keyword (split(/\s+/, $messagekeywords)) 1610 | { 1611 | # Only keywords listed in the X-IMAP(base): header 1612 | # are valid for this folder 1613 | next unless $validkeywords{$keyword}; 1614 | 1615 | # Check if we've already used this keyword and 1616 | # assigned it an index. Try to assign one if not 1617 | unless (defined($keywordshash{$keyword})) 1618 | { 1619 | unless (scalar(@keywords) < scalar(@keywordflags)) 1620 | { 1621 | printf("WARNING: Too many keywords (%d max). Ignoring keyword '%s' for message %d\n", scalar(@keywordflags), $keyword, $messagecount); 1622 | next; 1623 | } 1624 | 1625 | # Add the keyword to the array 1626 | push(@keywords, $keyword); 1627 | # Update the keyword to index hash 1628 | $keywordshash{$keyword} = scalar(@keywords)-1; 1629 | } 1630 | 1631 | $messagefn .= $keywordflags[$keywordshash{$keyword}]; 1632 | } 1633 | } 1634 | 1635 | 1636 | # Opens filename $messagefn for output (>) with filehandle OUT. 1637 | 1638 | open(OUT, ">$messagefn") or die("Fatal: unable to create new message $messagefn"); 1639 | 1640 | # Count the messages. 1641 | 1642 | $messagecount++; 1643 | 1644 | # If the current UID is higher than UIDLAST, we 1645 | # need to update UIDLAST 1646 | $uidlast = $uidcurr if ($uidcurr > $uidlast); 1647 | 1648 | # Only for the first message, 1649 | # check to see if it is a dummy. 1650 | # Delete the message file we 1651 | # just created if it was for the 1652 | # dummy message at the start 1653 | # of the Mbox. 1654 | # 1655 | # Add search terms as required. 1656 | # The last 2 lines are for rent. 1657 | # 1658 | # "m" means match the regular expression, 1659 | # but we can do without it. 1660 | # 1661 | # Do I need to escape the ' in "DON'T"? 1662 | # I didn't in the original version. 1663 | 1664 | if ( (($messagecount == 1) && defined($subject)) 1665 | && ($subject =~ m/^DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA/) 1666 | ) 1667 | { 1668 | # Stash the file name of the dummy message so we 1669 | # can delete it later. 1670 | 1671 | $deletedummy = "$messagefn"; 1672 | 1673 | # If there was a dummy message, we still want 1674 | # the next message to be able to use UID 1 1675 | $uidcurr = $uidlast = 0; 1676 | } else { 1677 | # If this is not a dummy message then store 1678 | # the UID and message filename for outputing 1679 | # into the uidlist file at the end (dropping 1680 | # "cur/" from the beginning) 1681 | push(@uidlist, "$uidcurr ". substr($messagefn, 4)); 1682 | } 1683 | 1684 | # Print the collected headers to the message file. 1685 | 1686 | print OUT "$headers"; 1687 | 1688 | 1689 | # Clear $headers and $flags ready for the next message. 1690 | 1691 | $headers = ''; 1692 | $flags = ''; 1693 | 1694 | # End of processing the headers once we found the 1695 | # blank line which terminated them 1696 | } 1697 | 1698 | # End of dealing with the headers. 1699 | } 1700 | 1701 | 1702 | if ( $inheaders eq 0) 1703 | { 1704 | 1705 | # We are now processing the message body. 1706 | # 1707 | # Now we have passed the headers to the 1708 | # output file, we scan until the while 1709 | # loop finds another "From " line. 1710 | 1711 | # Decrement our content length if we're 1712 | # using it to find the end of the message 1713 | # body 1714 | 1715 | if (defined $contentlength) { 1716 | 1717 | # Decrement our $contentlength variable 1718 | 1719 | $contentlength -= length($_); 1720 | 1721 | # The proper end for a message with Content-Length 1722 | # specified is the $contentlength variable should 1723 | # be exactly -1 and we should be on a bare 1724 | # newline. Note that the bare newline is not 1725 | # printed to the end of the current message as 1726 | # it's actually a message separator in the mbox 1727 | # format rather than part of the message. The 1728 | # next line _should_ be a From_ line, but just in 1729 | # case the Content-Length header is incorrect 1730 | # (e.g. a corrupt mailbox), we just continue 1731 | # putting lines into the current message until we 1732 | # see the next From_ line. 1733 | 1734 | if ($contentlength < 0) { 1735 | if ($contentlength == -1 && $_ eq "\n") { 1736 | $contentlength = undef; 1737 | next; 1738 | } 1739 | $contentlength = undef; 1740 | } 1741 | } 1742 | 1743 | # 1744 | # We want to copy every part of the message 1745 | # body to the output file, except for the 1746 | # quoted ">From " lines, which was the 1747 | # way the IMAP server encoded body lines 1748 | # starting with "From ". 1749 | # 1750 | # Pattern matching Perl majik to 1751 | # get rid of an Mbox quoted From. 1752 | # 1753 | # This works on the default variable "$_" which 1754 | # contains the text from the Mbox mailbox - I 1755 | # guess this is the case because of our 1756 | # (open(MBOX ....) line above, which did not 1757 | # assign this to anything else, so it would go 1758 | # to the default variable. This enables 1759 | # inscrutably terse Perlisms to follow. 1760 | # 1761 | # "s" means "Subsitute" and it looks for any 1762 | # occurrence of ">From" starting at the start 1763 | # of the line. When it finds this, it replaces 1764 | # it with "From". 1765 | # 1766 | # So this finds all instances in the Mbox message 1767 | # where the original line started with the word 1768 | # "From" but was converted to ">From" in order to 1769 | # not be mistaken for the "From ..." line which 1770 | # is used to demark each message in the Mbox. 1771 | # This was was a destructive conversion because 1772 | # any message which originally had ">From" at the 1773 | # start of the line, before being put into the 1774 | # Mbox, will now have that line without the ">". 1775 | 1776 | s/^>From /From /; 1777 | 1778 | # Glorious tersness here. Thanks Simon for 1779 | # explaining this. 1780 | # 1781 | # "print OUT" means print the default variable to 1782 | # the file of file handle OUT. This is where 1783 | # the bulk of the message text is written to 1784 | # the output file. 1785 | 1786 | print OUT or die("Fatal: unable to write to new message to $messagefn"); 1787 | 1788 | 1789 | # End of the if statement dealing with message body. 1790 | } 1791 | 1792 | $previous_line_was_empty = ( $_ eq "\n" ); 1793 | 1794 | # End of while (MBOX) loop. 1795 | } 1796 | # Close the input file. 1797 | 1798 | close(MBOX); 1799 | 1800 | # Close the output file, and duplicate the code 1801 | # from the start of the while loop which touches 1802 | # the date-time of the most recent message file. 1803 | 1804 | close(OUT); 1805 | &$postclose(); 1806 | 1807 | # After all the messages have been 1808 | # converted, check to see if the 1809 | # first one was a dummy. 1810 | # if so, delete it and make 1811 | # the message count one less. 1812 | 1813 | if ($deletedummy ne "") 1814 | { 1815 | printf("Dummy mail system first message detected and not saved.\n"); 1816 | unlink $deletedummy; 1817 | 1818 | $messagecount--; 1819 | 1820 | } 1821 | 1822 | # If the user asked for a Dovecot keywords file and 1823 | # we found any keywords in this folder then write 1824 | # the file out. 1825 | if ($dokeywords && scalar(@keywords)) 1826 | { 1827 | 1828 | # $dokeywords should be false if the file already exists 1829 | # but we open it in O_EXCL mode to be sure. 1830 | # NOTE: NO LOCKING IS PERFORMED so beware running this 1831 | # on an active Maildir folder 1832 | if (sysopen(KEYWORDS, $keywordsfile, O_WRONLY|O_CREAT|O_EXCL, 0600)) 1833 | { 1834 | for (my $i = 0;$i < scalar(@keywords);$i++) 1835 | { 1836 | printf(KEYWORDS "%d %s\n", $i, $keywords[$i]); 1837 | } 1838 | close(KEYWORDS); 1839 | printf("Created keywords list: %s\n", $keywordsfile); 1840 | } 1841 | } 1842 | 1843 | # If the user asked for a uidlist file 1844 | # and we found an X-IMAP: or X-IMAPbase: header, then 1845 | # let's generate the file. 1846 | if ($douidlist && defined($uidvalidity)) 1847 | { 1848 | if ($create_courier_uidlist) 1849 | { 1850 | # Courier IMAP only wants the basename of the 1851 | # maildir file (up to the colon) so let's strip 1852 | # the endings off. 1853 | grep(s/:.*$//,@uidlist); 1854 | } 1855 | 1856 | # If there's already a uid list file, we don't 1857 | # know how to deal with the old UIDVALIDITY or 1858 | # whether the UIDs from the incoming messages 1859 | # are valid or unique. So we use O_EXCL and just 1860 | # bail out if the file exists and let the mail 1861 | # system update the index with new UIDs for 1862 | # these messages 1863 | # NOTE: NO LOCKING IS DONE SO DON'T RUN THIS ON 1864 | # AN ACTIVE MAILDIR 1865 | if (sysopen(UIDLIST, $uidlistfile, O_WRONLY|O_CREAT|O_EXCL, 0600)) 1866 | { 1867 | # The first 1 is the file format version number 1868 | # The second number is the UIDVALIDITY value 1869 | # The last number is the next number to be given 1870 | # to a new message (one higher than UIDLAST) 1871 | printf(UIDLIST "1 %d %d\n", $uidvalidity, $uidlast+1); 1872 | print(UIDLIST join("\n", @uidlist)); 1873 | print(UIDLIST "\n") if (scalar(@uidlist) > 0); 1874 | close(UIDLIST); 1875 | printf("Created UID list: %s\n", $uidlistfile); 1876 | } else { 1877 | printf("WARNING: Unable to create %s. Does it already exist?\n", $uidlistfile); 1878 | } 1879 | } 1880 | 1881 | printf("$messagecount messages.\n\n"); 1882 | } 1883 | --------------------------------------------------------------------------------