├── .gitignore ├── LICENSE ├── Makefile ├── README.md ├── imap.asd ├── imap.cl ├── imap.html ├── imap.txt ├── load.cl ├── mime-api.cl ├── mime-parse.cl ├── mime-parse.txt ├── mime-transfer-encoding.cl ├── rfc1939.txt ├── rfc2060.txt ├── rfc2822.cl ├── rfc2822.txt ├── rfc3696.txt ├── smtp.cl └── t-imap.cl /.gitignore: -------------------------------------------------------------------------------- 1 | .git-branch-name 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) Franz Inc., Lafayette, CA - All rights reserved. 2 | 3 | This code is free software; you can redistribute it and/or modify it 4 | under the terms of the version 2.1 of the GNU Lesser General Public 5 | License as published by the Free Software Foundation, as clarified by 6 | the AllegroServe prequel found in license-allegroserve.txt. 7 | 8 | This code is distributed in the hope that it will be useful, but 9 | without any warranty; without even the implied warranty of 10 | merchantability or fitness for a particular purpose. See the GNU 11 | Lesser General Public License for more details. 12 | 13 | Version 2.1 of the GNU Lesser General Public License is in the file 14 | license-lgpl.txt that was distributed with this file. If it is not 15 | present, you can access it from http://www.gnu.org/copyleft/lesser.txt 16 | (until superseded by a newer version) or write to the Free Software 17 | Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 18 | USA 19 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | 2 | SHELL = sh 3 | 4 | default: FORCE 5 | @echo no default rule 6 | 7 | clean: FORCE 8 | rm -f *.fasl 9 | 10 | FORCE: 11 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | imap: A common lisp library for sending and receiving email. 2 | ============================================================ 3 | 4 | Table of contents 5 | ----------------- 6 | 7 | * Description 8 | * Author 9 | * Author comments 10 | * Documentation 11 | * Platforms 12 | * Dependencies 13 | * Installation 14 | * Configuration 15 | * Licence 16 | * Notes 17 | * Examples 18 | * Open Source 19 | 20 | Description 21 | ----------- 22 | 23 | A client library for sending and receiving email. 24 | 25 | Author 26 | ------ 27 | 28 | John Foderaro, Franz Inc. 29 | 30 | Author comments 31 | --------------- 32 | 33 | The most popular protocol for accessing a mailbox was the Post Office 34 | Protocol (POP) defined in rfc1939. While it is popular, pop has very 35 | few features. It doesn't allow you to manage the mail on the server 36 | itself, instead you usually just download all mail to your local 37 | machine. A much more powerful protocol called the Internet Message 38 | Access Protocol (IMAP) was defined in the 1996 document rfc2060. With 39 | imap you can work with your mail while it's on the server and can 40 | create folders on the server in which to archive your mail. Thus the 41 | server itself can be the message store which is useful if you want to 42 | access your mail from more than one machine. You are not required to 43 | use the server to archive your mail with imap, you can use it like pop 44 | and download all the mail to your local machine. 45 | 46 | Sending email is done via the Simple Mail Transfer Protocol 47 | (SMTP). You can use smtp to send mail directly to the destination but 48 | typically this is not done because the destination machine may be down 49 | or unreachable at the time you wish to send the mail. Most 50 | organizations have a local mail server that is up and reachable all of 51 | the time. You can use smtp to send your letter to that local mail 52 | server and it will then take over the job of getting the mail to the 53 | destination (which may involve queueing the message and retrying to 54 | send it over a period of days). 55 | 56 | Platforms 57 | --------- 58 | 59 | Allegro Common Lisp 7.0 and newer on all platforms. 60 | 61 | Dependencies 62 | ------------ 63 | 64 | None, but for the test suite, [tester](http://opensource.franz.com) 65 | is required. 66 | 67 | Installation 68 | ------------ 69 | 70 | Start Allegro Common Lisp and load the load.cl file 71 | :ld /path/to/load.cl 72 | 73 | Configuration 74 | ------------- 75 | 76 | Set the following variables to true for extra debugging information: 77 | 78 | (setq net.post-office::*debug-imap* t 79 | net.post-office::*smtp-debug* t) 80 | 81 | Documentation 82 | ------------- 83 | 84 | * [pop and imap interfaces] 85 | (http://franz.com/support/documentation/current/doc/imap.htm) 86 | * [smtp interface] 87 | (http://franz.com/support/documentation/current/doc/imap.htm#smtp-1) 88 | * Also see the imap.html file that is included with this source code. 89 | 90 | License 91 | ------- 92 | 93 | The aserve source code is licensed under the terms of the 94 | [Lisp Lesser GNU Public License](http://opensource.franz.com/preamble.html), 95 | known as the LLGPL. The LLGPL consists of a preamble and the LGPL. Where these 96 | conflict, the preamble takes precedence. This project is referenced in the 97 | preamble as the LIBRARY. 98 | 99 | Notes 100 | ----- 101 | 102 | For reference please see rfc1939 (pop) and rfc2060 (imap). 103 | 104 | Examples and Information 105 | ------------------------ 106 | 107 | See the first link in the documenation section above for examples. 108 | 109 | Franz Inc. Open Source Info 110 | --------------------------- 111 | 112 | This project's homepage is . There is an 113 | informal community support and development mailing list 114 | [opensource@franz.com](http://opensource.franz.com/mailinglist.html) 115 | for these open source projects. We encourage you to take advantage by 116 | subscribing to the list. Once you're subscribed, email to 117 | with your questions, comments, suggestions, 118 | and patches. 119 | -------------------------------------------------------------------------------- /imap.asd: -------------------------------------------------------------------------------- 1 | ;;; ASD file contributed by james anderson 2 | 3 | (in-package :cl-user) 4 | 5 | (unless (find-class 'asdf::cl-file nil) 6 | (defclass asdf::cl-file (asdf:cl-source-file) ()) 7 | (defmethod asdf:source-file-type ((c asdf::cl-file) (s asdf:module)) "cl")) 8 | 9 | (asdf:defsystem :imap 10 | :serial t 11 | :components 12 | ((:file "imap") 13 | (:file "smtp"))) 14 | -------------------------------------------------------------------------------- /imap.txt: -------------------------------------------------------------------------------- 1 | imap plan: 2 | 3 | flags: 4 | system flags are described in the imap protocol as 5 | words like: \Answered 6 | the backslash is annoying to type and I think that the 7 | text is case insensitive anyway. 8 | we'll represent these flags by keywords (e.g :answered). 9 | 10 | 11 | 12 | 13 | 14 | todo: 15 | uid (modifier to copy fetch and store commands) 16 | 17 | 18 | 19 | search sexpressions (things that can be passwd to the search-mailbox 20 | function). 21 | kwda - keywords that don't take any arguments 22 | kwdb - keywords that do take arguments 23 | 24 | 25 | ssexp := nil | ssexp1 26 | ssexp1 := kwda | (kwdb oprnd+) (and ssexp1*) | (or ssexp1*) | (not ssexp1) 27 | | set 28 | set := number-seq | ( number-seq+ ) 29 | number-seq := integer | (:seq integer integer) 30 | kwda := :all | :answered ... 31 | kwdb := :bcc | :before .... 32 | oprnd := 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | implemented: 42 | 43 | connecting: 44 | make-imap-connection 45 | 46 | after connecting: 47 | 48 | fetch-letter 49 | mailbox-list 50 | noop 51 | search-mailbox 52 | select-mailbox 53 | 54 | -------------------------------------------------------------------------------- /load.cl: -------------------------------------------------------------------------------- 1 | (load (compile-file-if-needed "imap")) 2 | (load (compile-file-if-needed "smtp")) 3 | 4 | (defun test () 5 | (setq *xx* (net.post-office::make-imap-connection "tiger.franz.com" 6 | :user "jkfmail" 7 | :password "jkf.imap" 8 | )) 9 | (net.post-office::select-mailbox *xx* "inbox")) 10 | 11 | 12 | (defun testp () 13 | (setq *xx* (net.post-office::make-pop-connection "tiger.franz.com" 14 | :user "jkfmail" 15 | :password "jkf.imap" 16 | ))) 17 | -------------------------------------------------------------------------------- /mime-api.cl: -------------------------------------------------------------------------------- 1 | #+(version= 8 1) 2 | (sys:defpatch "mime" 4 3 | "v1: changes to internal/undocumented portions of module; 4 | v2: better parse-mime-structure behavior in the face of malformatted headers; 5 | v3: Improved performance when converting charset to external-format; 6 | v4: decode-header-text: handle underscore & remove soft newlines." 7 | :type :system 8 | :post-loadable t) 9 | 10 | #+(version= 8 0) 11 | (sys:defpatch "mime" 3 12 | "v0: New module. See documentation.; 13 | v1: Improve default transfer encoding determination; 14 | v2: make-mime-part: Default external-format is :utf8; 15 | v3: add mime structure parsing support." 16 | :type :system 17 | :post-loadable t) 18 | 19 | ;; -*- mode: common-lisp; package: net.post-office -*- 20 | ;; 21 | ;; imap.cl 22 | ;; imap and pop interface 23 | ;; 24 | ;; See the file LICENSE for the full license governing this code. 25 | ;; 26 | 27 | (defpackage :net.post-office 28 | (:use #:lisp #:excl) 29 | (:export 30 | ;; functions/methods 31 | #:make-mime-part 32 | #:mime-part-writer 33 | #:mime-part-p 34 | #:mime-part-constructed-p 35 | #:map-over-parts 36 | #:decode-header-text 37 | 38 | ;; macros 39 | #:mime-get-header 40 | #:with-mime-part-constructed-stream 41 | 42 | ;; classes 43 | #:mime-part-constructed 44 | 45 | ;; slot accessors 46 | #:mime-part-type 47 | #:mime-part-subtype 48 | #:mime-part-parameters 49 | #:mime-part-id 50 | #:mime-part-description 51 | #:mime-part-encoding 52 | #:mime-part-headers 53 | #:mime-part-parts 54 | #:mime-part-boundary)) 55 | 56 | (provide :mime) 57 | 58 | (in-package :net.post-office) 59 | 60 | (eval-when (compile load eval) 61 | (require :osi) 62 | (require :regexp2)) 63 | 64 | (defclass mime-part () 65 | ( 66 | (type :accessor mime-part-type :initform nil) 67 | (subtype :accessor mime-part-subtype :initform nil) 68 | (parameters :accessor mime-part-parameters :initform nil) ;; alist 69 | (id :accessor mime-part-id :initform nil) 70 | (description :accessor mime-part-description :initform nil) 71 | (encoding :accessor mime-part-encoding :initform nil) 72 | (headers ;; parsed headers alist 73 | :accessor mime-part-headers :initform nil) 74 | (parts ;; list of subparts (for multipart types) 75 | :accessor mime-part-parts :initform nil) 76 | (boundary :accessor mime-part-boundary :initform nil))) 77 | 78 | (defclass mime-part-constructed (mime-part) 79 | ( 80 | (source-type :accessor source-type) 81 | (source :accessor source) 82 | (disposition :accessor disposition :initform nil) 83 | (disposition-name :accessor disposition-name :initform nil))) 84 | 85 | (defmacro mime-get-header (header part) 86 | `(cdr (assoc ,header (mime-part-headers ,part) :test #'equalp))) 87 | 88 | (defun message-rfc822-p (type subtype) 89 | (and (equalp type "message") (equalp subtype "rfc822"))) 90 | 91 | (defun multipart-p (part) 92 | (equalp (mime-part-type part) "multipart")) 93 | 94 | (defun multipart-mixed-p (part) 95 | (and (equalp (mime-part-type part) "multipart") 96 | (equalp (mime-part-subtype part) "mixed"))) 97 | 98 | (defun mime-part-p (thing) 99 | (typep thing 'mime-part)) 100 | 101 | (defun mime-part-constructed-p (thing) 102 | (typep thing 'mime-part-constructed)) 103 | 104 | (defun generate-boundary () 105 | (declare (optimize (speed 3))) 106 | (let ((hex "01234567890abcdef")) 107 | (with-output-to-string (s) 108 | (write-string "----------_" s) 109 | (dotimes (n 32) 110 | (write-char (schar hex (random 16)) s))))) 111 | 112 | (defun make-mime-part (&key content-type encoding headers 113 | (attachmentp nil attachmentp-supplied) 114 | name text (start 0) end file 115 | subparts (external-format :utf8) 116 | parameters charset id description) 117 | (let ((part (make-instance 'mime-part-constructed)) 118 | type subtype multipart textp filepath orig-text) 119 | 120 | (if* (and text file) 121 | then (error "Only one of :text or :file may be specified")) 122 | 123 | (if* (and text (null end)) 124 | then (setf end (length text))) 125 | 126 | (when file 127 | (if* (streamp file) 128 | then (setf filepath (ignore-errors (namestring file))) 129 | else (setf filepath file))) 130 | 131 | ;; Select default content-type 132 | (when (null content-type) 133 | (if* filepath 134 | then (setf content-type (lookup-mime-type filepath))) 135 | 136 | (when (null content-type) 137 | (setf content-type 138 | (if* subparts 139 | then "multipart/mixed" 140 | elseif file 141 | then "application/octet-stream" 142 | elseif (and text (stringp text)) 143 | then "text/plain" 144 | else "application/octet-stream")))) 145 | 146 | (let ((pos (position #\/ content-type))) 147 | (if (null pos) 148 | (error "Invalid content-type: ~s" content-type)) 149 | 150 | (setf type (subseq content-type 0 pos)) 151 | (setf subtype (subseq content-type (1+ pos)))) 152 | 153 | (setf multipart (equalp type "multipart")) 154 | (setf textp (or (equalp type "text") (message-rfc822-p type subtype))) 155 | 156 | (if* (and subparts (not multipart)) 157 | then (error "subparts may not be specified for non-multipart parts")) 158 | 159 | (if* (and (not multipart) (null text) (null file)) 160 | then (error "One of :text or :file must be specified")) 161 | 162 | 163 | ;; Select default charset 164 | (if* (and (null charset) textp) 165 | then (setf charset 166 | (or 167 | (guess-charset-from-ef (find-external-format external-format)) 168 | "us-ascii"))) 169 | 170 | ;; For :text, break down to the final usb8. 171 | (when text 172 | (setf orig-text text) 173 | (setf text (if* (stringp text) 174 | then (string-to-octets text :null-terminate nil 175 | :external-format external-format 176 | :start start 177 | :end end) 178 | else (subseq text start end)))) 179 | 180 | ;; Select default encoding, which is always base64 except for 181 | ;; when :text was supplied as a string, in which case we scan to 182 | ;; choose between 7bit and base64. 183 | (when (and (not multipart) (null encoding)) 184 | (setf encoding 185 | (if* (and (stringp orig-text) (not (8-bit-array-p text))) 186 | then "7bit" 187 | else "base64"))) 188 | 189 | (setf (mime-part-type part) type) 190 | (setf (mime-part-subtype part) subtype) 191 | (setf (mime-part-parameters part) parameters) 192 | (if* charset 193 | then (push (cons "charset" charset) (mime-part-parameters part))) 194 | (setf (mime-part-encoding part) encoding) 195 | (setf (mime-part-id part) id) 196 | (setf (mime-part-description part) description) 197 | (setf (mime-part-headers part) headers) 198 | 199 | (if* file 200 | then (setf (source part) file) 201 | (if* (streamp file) 202 | then (setf (source-type part) :stream) 203 | else (with-open-file (f file)) ;; make sure we can read it. 204 | (setf (source-type part) :file)) 205 | (if* (not attachmentp-supplied) 206 | then (setf attachmentp t)) 207 | else (setf (source-type part) :usb8) 208 | (setf (source part) text)) 209 | 210 | (if* (and (not textp) (not attachmentp) (not multipart)) 211 | then (setf (disposition part) "inline")) 212 | 213 | (when attachmentp 214 | (setf (disposition part) "attachment") 215 | (if* (and (null name) file) 216 | then (setf name (excl.osi:basename filepath)))) 217 | 218 | (if* name 219 | then (setf (disposition-name part) name)) 220 | 221 | (if* multipart 222 | then (let ((boundary (generate-boundary))) 223 | (setf (mime-part-boundary part) boundary) 224 | (push (cons "boundary" boundary) 225 | (mime-part-parameters part)))) 226 | 227 | (setf (mime-part-parts part) subparts) 228 | 229 | part)) 230 | 231 | (defun 8-bit-array-p (usb8) 232 | (declare (optimize (speed 3) (safety 0)) 233 | ((simple-array (unsigned-byte 8) (*)) usb8)) 234 | (dotimes (n (length usb8)) 235 | (declare (fixnum n)) 236 | (if (> (aref usb8 n) 127) 237 | (return t)))) 238 | 239 | (defparameter *ef-nick-to-mime-charset* 240 | '((:ascii . "us-ascii") 241 | (:iso-2022-jp . "iso-2022-jp") 242 | (:koi8-r . "koi8-r") 243 | (:shiftjis . "shift_jis") 244 | (:euc . "euc-jp") 245 | (:gb2312 . "gb2312") 246 | (:big5 . "big5") 247 | (:utf8 . "utf-8"))) 248 | 249 | (defun guess-charset-from-ef (ef) 250 | (dolist (nick (ef-nicknames (find-external-format ef))) 251 | (let ((charset (cdr (assoc nick *ef-nick-to-mime-charset*)))) 252 | (if charset (return-from guess-charset-from-ef charset)))) 253 | (let ((ef-name (string-downcase (symbol-name (ef-name (crlf-base-ef ef)))))) 254 | ;; Try iso-8559-x names. 255 | (multiple-value-bind (found ignore suffix) 256 | (match-re "^iso8859-(\\d+)-base" ef-name) 257 | (declare (ignore ignore)) 258 | (if found 259 | (return-from guess-charset-from-ef 260 | (format nil "iso-8859-~a" suffix)))) 261 | 262 | ;; Try windows- names. 263 | (multiple-value-bind (found whole value) 264 | (match-re "^(\\d+)-base$" ef-name) 265 | (declare (ignore whole)) 266 | (if found 267 | (return-from guess-charset-from-ef 268 | (format nil "windows-~a" value)))))) 269 | 270 | (defmethod mime-part-writer ((part mime-part-constructed) 271 | &key (stream *terminal-io*)) 272 | (mime-part-constructed-writer part stream t)) 273 | 274 | (defun mime-part-constructed-writer (part stream top-level) 275 | (if* top-level 276 | then (format stream "MIME-Version: 1.0~%")) 277 | 278 | ;; First dump user-supplied headers. 279 | (dolist (h (mime-part-headers part)) 280 | (format stream "~a: ~a~%" (car h) (cdr h))) 281 | 282 | ;; Now dump headers that are based on class fields. 283 | 284 | (let* ((type (mime-part-type part)) 285 | (multipart (equalp type "multipart"))) 286 | (format stream "Content-Type: ~a/~a" type (mime-part-subtype part)) 287 | (dolist (param (mime-part-parameters part)) 288 | (format stream ";~% ~a=~s" (car param) (cdr param))) 289 | (format stream "~%") 290 | 291 | (if* (mime-part-encoding part) 292 | then (format stream "Content-Transfer-Encoding: ~a~%" 293 | (mime-part-encoding part))) 294 | 295 | (if* (mime-part-id part) 296 | then (format stream "Content-Id: ~a~%" (mime-part-id part))) 297 | 298 | (if* (mime-part-description part) 299 | then (format stream "Content-Description: ~a~%" 300 | (mime-part-description part))) 301 | 302 | 303 | (if* (disposition part) 304 | then (format stream "Content-Disposition: ~a" (disposition part)) 305 | (if* (disposition-name part) 306 | then (format stream ";~% filename=~s" (disposition-name part))) 307 | (format stream "~%")) 308 | 309 | (format stream "~%") ;; terminate headers 310 | 311 | (if* multipart 312 | then (let ((boundary (mime-part-boundary part))) 313 | (if top-level 314 | (format stream "~ 315 | This is a multi-part message in MIME format.~%")) 316 | (dolist (subpart (mime-part-parts part)) 317 | (format stream "~%--~a~%" boundary) 318 | (mime-part-constructed-writer subpart stream nil)) 319 | (format stream "~%--~a--~%" boundary)) 320 | else (let ((instream (if* (eq (source-type part) :stream) 321 | then (source part) 322 | elseif (eq (source-type part) :file) 323 | then (open (source part)) 324 | else (make-buffer-input-stream (source part))))) 325 | (unwind-protect 326 | (let ((encoding (mime-part-encoding part))) 327 | (if* (equalp encoding "base64") 328 | then (excl::base64-encode-stream instream stream) 329 | elseif (equalp encoding "quoted-printable") 330 | then (qp-encode-stream instream stream) 331 | else (raw-encode-stream instream stream))) 332 | ;; cleanup 333 | (if* (not (eq (source-type part) :stream)) 334 | then (close instream))))))) 335 | 336 | (defun mime-part-writer-1 (stream part) 337 | (mime-part-writer part :stream stream)) 338 | 339 | (defmacro with-mime-part-constructed-stream ((stream part) &body body) 340 | `(excl::with-function-input-stream (,stream #'mime-part-writer-1 ,part) 341 | ,@body)) 342 | 343 | 344 | ;; misc 345 | 346 | (defun map-over-parts (part function) 347 | (funcall function part) 348 | (if* (multipart-p part) 349 | then (dolist (p (mime-part-parts part)) 350 | (map-over-parts p function)) 351 | elseif (message-rfc822-p (mime-part-type part) (mime-part-subtype part)) 352 | then (map-over-parts (mime-part-message part) function))) 353 | 354 | ;; 355 | 356 | (defparameter *default-charset-to-ef* 357 | '(("us-ascii" . :latin1) 358 | ("ansi_x3.4-1968" . :latin1) 359 | ("shift-jis" . :shiftjis) 360 | ("gbk" . :936) 361 | #+ignore("euc-kr" :iso-2022-kr))) 362 | 363 | (defparameter *charset-to-ef* nil) 364 | 365 | (defparameter *charset-to-ef-lock* (mp:make-process-lock)) 366 | 367 | (defparameter *debug-charset-to-ef* nil) 368 | 369 | (defun init-charset-to-ef () 370 | (let ((ht (make-hash-table :test #'equal))) 371 | (dolist (pair *default-charset-to-ef*) 372 | (setf (gethash (car pair) ht) (find-external-format (cdr pair) :errorp nil))) 373 | (setf *charset-to-ef* ht))) 374 | 375 | (defun charset-to-external-format (charset) 376 | (setf charset (string-downcase charset)) 377 | (mp:with-process-lock (*charset-to-ef-lock*) 378 | (if (null *charset-to-ef*) 379 | (init-charset-to-ef)) 380 | 381 | (macrolet ((save-and-return (ef) 382 | (let ((ef-x (gensym))) 383 | `(let ((,ef-x ,ef)) 384 | (progn (setf (gethash charset *charset-to-ef*) ,ef-x) 385 | (return-from charset-to-external-format ,ef-x)))))) 386 | 387 | (let ((ef (gethash charset *charset-to-ef*))) 388 | (if ef 389 | (return-from charset-to-external-format ef)) ;; Use cached result 390 | 391 | (if (setf ef (find-external-format charset :errorp nil)) 392 | (save-and-return ef)) 393 | 394 | (multiple-value-bind (matched x inner) 395 | (match-re "^windows-(\\d+)$" charset) 396 | (declare (ignore x)) 397 | (if (and matched (setf ef (find-external-format inner :errorp nil))) 398 | (save-and-return ef))) 399 | 400 | (multiple-value-bind (matched x dig) 401 | (match-re "^iso-8859-(\\d+)(?:-[ie])?$" charset) 402 | (declare (ignore x)) 403 | (if (and matched (setf ef (find-external-format (format nil "iso8859-~a" dig) :errorp nil))) 404 | (save-and-return ef))) 405 | 406 | (if *debug-charset-to-ef* 407 | (format t "no external found for ~a~%" charset)) 408 | 409 | ;; No luck 410 | nil)))) 411 | 412 | (defun decode-header-text (text) 413 | (declare (optimize (speed 3)) 414 | (string text)) 415 | (when (null text) (error "first argument expected to be non-nil.")) 416 | (let ((pos 0) 417 | (len (length text)) 418 | last-tail) 419 | (declare (fixnum pos len)) 420 | (with-output-to-string (res) 421 | (while (< pos len) 422 | (multiple-value-bind (matched whole charset encoding encoded tail) 423 | (match-re "=\\?([^?]+)\\?(q|b)\\?(.*?)\\?=(\\s+)?" text 424 | :start pos 425 | :case-fold t 426 | :return :index) 427 | 428 | (when (null matched) 429 | (when last-tail 430 | (write-string text res 431 | :start (car last-tail) :end (cdr last-tail))) 432 | (return)) 433 | 434 | ;; Write out the "before" stuff. 435 | (write-string text res :start pos :end (car whole)) 436 | 437 | (let* ((charset (subseq text (car charset) (cdr charset))) 438 | (ef (charset-to-external-format charset))) 439 | (if (null ef) 440 | (error "No external format found for MIME charset ~s" charset)) 441 | (write-string 442 | (if* (char-equal (char text (car encoding)) #\q) 443 | then (qp-decode-string text 444 | :start (car encoded) 445 | :end (cdr encoded) 446 | :external-format ef 447 | :underscores-are-spaces t) 448 | else ;; FIXME: Clean this up with/if rfe6174 is completed. 449 | (octets-to-string 450 | (base64-string-to-usb8-array 451 | (subseq text (car encoded) (cdr encoded))) 452 | :external-format ef)) 453 | res)) 454 | 455 | (setf pos (cdr whole)) 456 | (setf last-tail tail))) 457 | 458 | ;; Write out the remaining portion. 459 | (write-string text res :start pos)))) 460 | 461 | 462 | ;; Stuff ripped off from aserve 463 | 464 | (defun split-namestring (file) 465 | ;; split the namestring into root and tail and then the tail 466 | ;; into name and type 467 | ;; 468 | ;; any of the return value can be nil if the corresponding item 469 | ;; isn't present. 470 | ;; 471 | ;; rules for splitting the tail into name and type components: 472 | ;; if the last period in the tail is at the beginning or end of the 473 | ;; tail, then the name is exactly the tail and type is nil. 474 | ;; Thus .foo and bar. are just names, no type 475 | ;; but .foo.c has a name of ".foo" and a type of "c" 476 | ;; Thus if there is a non-nil type then it means that 477 | ;; 1. there will be a non nil name as well 478 | ;; 2. to reconstruct the filename you need to add a period between 479 | ;; the name and type. 480 | ;; 481 | (let ((pos (min (or (or (position #\/ file :from-end t) most-positive-fixnum) 482 | #+mswindows (position #\\ file :from-end t)))) 483 | root 484 | tail) 485 | 486 | (if* (equal file "") then (return-from split-namestring nil)) 487 | 488 | (if* (and pos (< pos most-positive-fixnum)) 489 | then ; we have root and tail 490 | (if* (eql pos (1- (length file))) 491 | then ; just have root 492 | (return-from split-namestring 493 | (values file nil nil nil))) 494 | 495 | 496 | (setq root (subseq file 0 (1+ pos)) 497 | tail (subseq file (1+ pos))) 498 | else (setq tail file)) 499 | 500 | 501 | ; split the tail 502 | (let ((pos (position #\. tail :from-end t))) 503 | (if* (or (null pos) 504 | (zerop pos) 505 | (equal pos (1- (length tail)))) 506 | then ; name begins or ends with . so it's not 507 | ; a type separator 508 | (values root tail tail nil) 509 | else ; have all pieces 510 | (values root tail 511 | (subseq tail 0 pos) 512 | (subseq tail (1+ pos))))))) 513 | 514 | 515 | ; we can specify either an exact url or one that handles all 516 | ; urls with a common prefix. 517 | ;; 518 | ;; if the prefix is given as a list: e.g. ("ReadMe") then it says that 519 | ;; this mime type applie to file named ReadMe. Note that file types 520 | ;; are checked first and if no match then a filename match is done. 521 | ; 522 | (defparameter *file-type-to-mime-type* 523 | ;; this list constructed by generate-mime-table in parse.cl 524 | '(("application/EDI-Consent") ("application/EDI-X12") ("application/EDIFACT") 525 | ("application/activemessage") ("application/andrew-inset" "ez") 526 | ("application/applefile") ("application/atomicmail") 527 | ("application/batch-SMTP") ("application/beep+xml") ("application/cals-1840") 528 | ("application/commonground") ("application/cybercash") 529 | ("application/dca-rft") ("application/dec-dx") ("application/dvcs") 530 | ("application/eshop") ("application/http") ("application/hyperstudio") 531 | ("application/iges") ("application/index") ("application/index.cmd") 532 | ("application/index.obj") ("application/index.response") 533 | ("application/index.vnd") ("application/iotp") ("application/ipp") 534 | ("application/isup") ("application/font-tdpfr") 535 | ("application/mac-binhex40" "hqx") ("application/mac-compactpro" "cpt") 536 | ("application/macwriteii") ("application/marc") ("application/mathematica") 537 | ("application/mathematica-old") ("application/msword" "doc") 538 | ("application/news-message-id") ("application/news-transmission") 539 | ("application/ocsp-request") ("application/ocsp-response") 540 | ("application/octet-stream" "bin" "dms" "lha" "lzh" "exe" "class" "so" "dll" 541 | "img" "iso") 542 | ("application/ogg" "ogg") ("application/parityfec") ("application/pdf" "pdf") 543 | ("application/pgp-encrypted") ("application/pgp-keys") 544 | ("application/pgp-signature") ("application/pkcs10") 545 | ("application/pkcs7-mime") ("application/pkcs7-signature") 546 | ("application/pkix-cert") ("application/pkix-crl") ("application/pkixcmp") 547 | ("application/postscript" "ai" "eps" "ps") 548 | ("application/prs.alvestrand.titrax-sheet") ("application/prs.cww") 549 | ("application/prs.nprend") ("application/qsig") 550 | ("application/remote-printing") ("application/riscos") 551 | ("application/rtf" "rtf") ("application/sdp") ("application/set-payment") 552 | ("application/set-payment-initiation") ("application/set-registration") 553 | ("application/set-registration-initiation") ("application/sgml") 554 | ("application/sgml-open-catalog") ("application/sieve") ("application/slate") 555 | ("application/smil" "smi" "smil") ("application/timestamp-query") 556 | ("application/timestamp-reply") ("application/vemmi") 557 | ("application/vnd.3M.Post-it-Notes") ("application/vnd.FloGraphIt") 558 | ("application/vnd.accpac.simply.aso") ("application/vnd.accpac.simply.imp") 559 | ("application/vnd.acucobol") ("application/vnd.aether.imp") 560 | ("application/vnd.anser-web-certificate-issue-initiation") 561 | ("application/vnd.anser-web-funds-transfer-initiation") 562 | ("application/vnd.audiograph") ("application/vnd.businessobjects") 563 | ("application/vnd.bmi") ("application/vnd.canon-cpdl") 564 | ("application/vnd.canon-lips") ("application/vnd.claymore") 565 | ("application/vnd.commerce-battelle") ("application/vnd.commonspace") 566 | ("application/vnd.comsocaller") ("application/vnd.contact.cmsg") 567 | ("application/vnd.cosmocaller") ("application/vnd.cups-postscript") 568 | ("application/vnd.cups-raster") ("application/vnd.cups-raw") 569 | ("application/vnd.ctc-posml") ("application/vnd.cybank") 570 | ("application/vnd.dna") ("application/vnd.dpgraph") ("application/vnd.dxr") 571 | ("application/vnd.ecdis-update") ("application/vnd.ecowin.chart") 572 | ("application/vnd.ecowin.filerequest") ("application/vnd.ecowin.fileupdate") 573 | ("application/vnd.ecowin.series") ("application/vnd.ecowin.seriesrequest") 574 | ("application/vnd.ecowin.seriesupdate") ("application/vnd.enliven") 575 | ("application/vnd.epson.esf") ("application/vnd.epson.msf") 576 | ("application/vnd.epson.quickanime") ("application/vnd.epson.salt") 577 | ("application/vnd.epson.ssf") ("application/vnd.ericsson.quickcall") 578 | ("application/vnd.eudora.data") ("application/vnd.fdf") 579 | ("application/vnd.ffsns") ("application/vnd.framemaker") 580 | ("application/vnd.fsc.weblaunch") ("application/vnd.fujitsu.oasys") 581 | ("application/vnd.fujitsu.oasys2") ("application/vnd.fujitsu.oasys3") 582 | ("application/vnd.fujitsu.oasysgp") ("application/vnd.fujitsu.oasysprs") 583 | ("application/vnd.fujixerox.ddd") ("application/vnd.fujixerox.docuworks") 584 | ("application/vnd.fujixerox.docuworks.binder") ("application/vnd.fut-misnet") 585 | ("application/vnd.grafeq") ("application/vnd.groove-account") 586 | ("application/vnd.groove-identity-message") 587 | ("application/vnd.groove-injector") ("application/vnd.groove-tool-message") 588 | ("application/vnd.groove-tool-template") ("application/vnd.groove-vcard") 589 | ("application/vnd.hhe.lesson-player") ("application/vnd.hp-HPGL") 590 | ("application/vnd.hp-PCL") ("application/vnd.hp-PCLXL") 591 | ("application/vnd.hp-hpid") ("application/vnd.hp-hps") 592 | ("application/vnd.httphone") ("application/vnd.hzn-3d-crossword") 593 | ("application/vnd.ibm.afplinedata") ("application/vnd.ibm.MiniPay") 594 | ("application/vnd.ibm.modcap") ("application/vnd.informix-visionary") 595 | ("application/vnd.intercon.formnet") ("application/vnd.intertrust.digibox") 596 | ("application/vnd.intertrust.nncp") ("application/vnd.intu.qbo") 597 | ("application/vnd.intu.qfx") ("application/vnd.irepository.package+xml") 598 | ("application/vnd.is-xpr") ("application/vnd.japannet-directory-service") 599 | ("application/vnd.japannet-jpnstore-wakeup") 600 | ("application/vnd.japannet-payment-wakeup") 601 | ("application/vnd.japannet-registration") 602 | ("application/vnd.japannet-registration-wakeup") 603 | ("application/vnd.japannet-setstore-wakeup") 604 | ("application/vnd.japannet-verification") 605 | ("application/vnd.japannet-verification-wakeup") ("application/vnd.koan") 606 | ("application/vnd.lotus-1-2-3") ("application/vnd.lotus-approach") 607 | ("application/vnd.lotus-freelance") ("application/vnd.lotus-notes") 608 | ("application/vnd.lotus-organizer") ("application/vnd.lotus-screencam") 609 | ("application/vnd.lotus-wordpro") ("application/vnd.mcd") 610 | ("application/vnd.mediastation.cdkey") ("application/vnd.meridian-slingshot") 611 | ("application/vnd.mif" "mif") ("application/vnd.minisoft-hp3000-save") 612 | ("application/vnd.mitsubishi.misty-guard.trustweb") 613 | ("application/vnd.mobius.daf") ("application/vnd.mobius.dis") 614 | ("application/vnd.mobius.msl") ("application/vnd.mobius.plc") 615 | ("application/vnd.mobius.txf") ("application/vnd.motorola.flexsuite") 616 | ("application/vnd.motorola.flexsuite.adsi") 617 | ("application/vnd.motorola.flexsuite.fis") 618 | ("application/vnd.motorola.flexsuite.gotap") 619 | ("application/vnd.motorola.flexsuite.kmr") 620 | ("application/vnd.motorola.flexsuite.ttc") 621 | ("application/vnd.motorola.flexsuite.wem") 622 | ("application/vnd.mozilla.xul+xml") ("application/vnd.ms-artgalry") 623 | ("application/vnd.ms-asf") ("application/vnd.ms-excel" "xls") 624 | ("application/vnd.ms-lrm") ("application/vnd.ms-powerpoint" "ppt") 625 | ("application/vnd.ms-project") ("application/vnd.ms-tnef") 626 | ("application/vnd.ms-works") ("application/vnd.mseq") 627 | ("application/vnd.msign") ("application/vnd.music-niff") 628 | ("application/vnd.musician") ("application/vnd.netfpx") 629 | ("application/vnd.noblenet-directory") ("application/vnd.noblenet-sealer") 630 | ("application/vnd.noblenet-web") ("application/vnd.novadigm.EDM") 631 | ("application/vnd.novadigm.EDX") ("application/vnd.novadigm.EXT") 632 | ("application/vnd.osa.netdeploy") ("application/vnd.palm") 633 | ("application/vnd.pg.format") ("application/vnd.pg.osasli") 634 | ("application/vnd.powerbuilder6") ("application/vnd.powerbuilder6-s") 635 | ("application/vnd.powerbuilder7") ("application/vnd.powerbuilder7-s") 636 | ("application/vnd.powerbuilder75") ("application/vnd.powerbuilder75-s") 637 | ("application/vnd.previewsystems.box") 638 | ("application/vnd.publishare-delta-tree") ("application/vnd.pvi.ptid1") 639 | ("application/vnd.pwg-xhtml-print+xml") ("application/vnd.rapid") 640 | ("application/vnd.s3sms") ("application/vnd.seemail") 641 | ("application/vnd.shana.informed.formdata") 642 | ("application/vnd.shana.informed.formtemplate") 643 | ("application/vnd.shana.informed.interchange") 644 | ("application/vnd.shana.informed.package") ("application/vnd.sss-cod") 645 | ("application/vnd.sss-dtf") ("application/vnd.sss-ntf") 646 | ("application/vnd.sun.xml.writer" "sxw") 647 | ("application/vnd.sun.xml.writer.template" "stw") 648 | ("application/vnd.sun.xml.calc" "sxc") 649 | ("application/vnd.sun.xml.calc.template" "stc") 650 | ("application/vnd.sun.xml.draw" "sxd") 651 | ("application/vnd.sun.xml.draw.template" "std") 652 | ("application/vnd.sun.xml.impress" "sxi") 653 | ("application/vnd.sun.xml.impress.template" "sti") 654 | ("application/vnd.sun.xml.writer.global" "sxg") 655 | ("application/vnd.sun.xml.math" "sxm") ("application/vnd.street-stream") 656 | ("application/vnd.svd") ("application/vnd.swiftview-ics") 657 | ("application/vnd.triscape.mxs") ("application/vnd.trueapp") 658 | ("application/vnd.truedoc") ("application/vnd.tve-trigger") 659 | ("application/vnd.ufdl") ("application/vnd.uplanet.alert") 660 | ("application/vnd.uplanet.alert-wbxml") 661 | ("application/vnd.uplanet.bearer-choice-wbxml") 662 | ("application/vnd.uplanet.bearer-choice") ("application/vnd.uplanet.cacheop") 663 | ("application/vnd.uplanet.cacheop-wbxml") ("application/vnd.uplanet.channel") 664 | ("application/vnd.uplanet.channel-wbxml") ("application/vnd.uplanet.list") 665 | ("application/vnd.uplanet.list-wbxml") ("application/vnd.uplanet.listcmd") 666 | ("application/vnd.uplanet.listcmd-wbxml") ("application/vnd.uplanet.signal") 667 | ("application/vnd.vcx") ("application/vnd.vectorworks") 668 | ("application/vnd.vidsoft.vidconference") ("application/vnd.visio") 669 | ("application/vnd.vividence.scriptfile") ("application/vnd.wap.sic") 670 | ("application/vnd.wap.slc") ("application/vnd.wap.wbxml" "wbxml") 671 | ("application/vnd.wap.wmlc" "wmlc") 672 | ("application/vnd.wap.wmlscriptc" "wmlsc") ("application/vnd.webturbo") 673 | ("application/vnd.wrq-hp3000-labelled") ("application/vnd.wt.stf") 674 | ("application/vnd.xara") ("application/vnd.xfdl") 675 | ("application/vnd.yellowriver-custom-menu") ("application/whoispp-query") 676 | ("application/whoispp-response") ("application/wita") 677 | ("application/wordperfect5.1") ("application/x-bcpio" "bcpio") 678 | ("application/x-bittorrent" "torrent") ("application/x-bzip2" "bz2") 679 | ("application/x-cdlink" "vcd") ("application/x-chess-pgn" "pgn") 680 | ("application/x-compress") ("application/x-cpio" "cpio") 681 | ("application/x-csh" "csh") ("application/x-director" "dcr" "dir" "dxr") 682 | ("application/x-dvi" "dvi") ("application/x-futuresplash" "spl") 683 | ("application/x-gtar" "gtar") ("application/x-gzip" "gz" "tgz") 684 | ("application/x-hdf" "hdf") ("application/x-javascript" "js") 685 | ("application/x-kword" "kwd" "kwt") ("application/x-kspread" "ksp") 686 | ("application/x-kpresenter" "kpr" "kpt") ("application/x-kchart" "chrt") 687 | ("application/x-killustrator" "kil") 688 | ("application/x-koan" "skp" "skd" "skt" "skm") 689 | ("application/x-latex" "latex") ("application/x-netcdf" "nc" "cdf") 690 | ("application/x-rpm" "rpm") ("application/x-sh" "sh") 691 | ("application/x-shar" "shar") ("application/x-shockwave-flash" "swf") 692 | ("application/x-stuffit" "sit") ("application/x-sv4cpio" "sv4cpio") 693 | ("application/x-sv4crc" "sv4crc") ("application/x-tar" "tar") 694 | ("application/x-tcl" "tcl") ("application/x-tex" "tex") 695 | ("application/x-texinfo" "texinfo" "texi") 696 | ("application/x-troff" "t" "tr" "roff") ("application/x-troff-man" "man") 697 | ("application/x-troff-me" "me") ("application/x-troff-ms" "ms") 698 | ("application/x-ustar" "ustar") ("application/x-wais-source" "src") 699 | ("application/x400-bp") ("application/xhtml+xml" "xhtml" "xht") 700 | ("application/xml") ("application/xml-dtd") 701 | ("application/xml-external-parsed-entity") ("application/zip" "zip") 702 | ("audio/32kadpcm") ("audio/basic" "au" "snd") ("audio/g.722.1") ("audio/l16") 703 | ("audio/midi" "mid" "midi" "kar") ("audio/mp4a-latm") ("audio/mpa-robust") 704 | ("audio/mpeg" "mpga" "mp2" "mp3") ("audio/parityfec") ("audio/prs.sid") 705 | ("audio/telephone-event") ("audio/tone") ("audio/vnd.cisco.nse") 706 | ("audio/vnd.cns.anp1") ("audio/vnd.cns.inf1") ("audio/vnd.digital-winds") 707 | ("audio/vnd.everad.plj") ("audio/vnd.lucent.voice") ("audio/vnd.nortel.vbk") 708 | ("audio/vnd.nuera.ecelp4800") ("audio/vnd.nuera.ecelp7470") 709 | ("audio/vnd.nuera.ecelp9600") ("audio/vnd.octel.sbc") ("audio/vnd.qcelp") 710 | ("audio/vnd.rhetorex.32kadpcm") ("audio/vnd.vmx.cvsd") 711 | ("audio/x-aiff" "aif" "aiff" "aifc") ("audio/x-mpegurl" "m3u") 712 | ("audio/x-pn-realaudio" "ram" "rm") ("audio/x-realaudio" "ra") 713 | ("audio/x-wav" "wav") ("chemical/x-pdb" "pdb") ("chemical/x-xyz" "xyz") 714 | ("image/bmp" "bmp") ("image/cgm") ("image/g3fax") ("image/gif" "gif") 715 | ("image/ief" "ief") ("image/jpeg" "jpeg" "jpg" "jpe") ("image/naplps") 716 | ("image/png" "png") ("image/prs.btif") ("image/prs.pti") 717 | ("image/tiff" "tiff" "tif") ("image/vnd.cns.inf2") 718 | ("image/vnd.djvu" "djvu" "djv") ("image/vnd.dwg") ("image/vnd.dxf") 719 | ("image/vnd.fastbidsheet") ("image/vnd.fpx") ("image/vnd.fst") 720 | ("image/vnd.fujixerox.edmics-mmr") ("image/vnd.fujixerox.edmics-rlc") 721 | ("image/vnd.mix") ("image/vnd.net-fpx") ("image/vnd.svf") 722 | ("image/vnd.wap.wbmp" "wbmp") ("image/vnd.xiff") ("image/x-cmu-raster" "ras") 723 | ("image/x-portable-anymap" "pnm") ("image/x-portable-bitmap" "pbm") 724 | ("image/x-portable-graymap" "pgm") ("image/x-portable-pixmap" "ppm") 725 | ("image/x-rgb" "rgb") ("image/x-xbitmap" "xbm") ("image/x-xpixmap" "xpm") 726 | ("image/x-xwindowdump" "xwd") ("message/delivery-status") 727 | ("message/disposition-notification") ("message/external-body") 728 | ("message/http") ("message/news") ("message/partial") ("message/rfc822") 729 | ("message/s-http") ("model/iges" "igs" "iges") 730 | ("model/mesh" "msh" "mesh" "silo") ("model/vnd.dwf") 731 | ("model/vnd.flatland.3dml") ("model/vnd.gdl") ("model/vnd.gs-gdl") 732 | ("model/vnd.gtw") ("model/vnd.mts") ("model/vnd.vtu") 733 | ("model/vrml" "wrl" "vrml") ("multipart/alternative") 734 | ("multipart/appledouble") ("multipart/byteranges") ("multipart/digest") 735 | ("multipart/encrypted") ("multipart/form-data") ("multipart/header-set") 736 | ("multipart/mixed") ("multipart/parallel") ("multipart/related") 737 | ("multipart/report") ("multipart/signed") ("multipart/voice-message") 738 | ("text/calendar") ("text/css" "css") ("text/directory") ("text/enriched") 739 | ("text/html" "html" "htm") ("text/parityfec") ("text/plain" "asc" "txt") 740 | ("text/prs.lines.tag") ("text/rfc822-headers") ("text/richtext" "rtx") 741 | ("text/rtf" "rtf") ("text/sgml" "sgml" "sgm") 742 | ("text/tab-separated-values" "tsv") ("text/t140") ("text/uri-list") 743 | ("text/vnd.DMClientScript") ("text/vnd.IPTC.NITF") ("text/vnd.IPTC.NewsML") 744 | ("text/vnd.abc") ("text/vnd.curl") ("text/vnd.flatland.3dml") 745 | ("text/vnd.fly") ("text/vnd.fmi.flexstor") ("text/vnd.in3d.3dml") 746 | ("text/vnd.in3d.spot") ("text/vnd.latex-z") ("text/vnd.motorola.reflex") 747 | ("text/vnd.ms-mediapackage") ("text/vnd.wap.si") ("text/vnd.wap.sl") 748 | ("text/vnd.wap.wml" "wml") ("text/vnd.wap.wmlscript" "wmls") 749 | ("text/x-setext" "etx") ("text/xml" "xml" "xsl") 750 | ("text/xml-external-parsed-entity") ("video/mp4v-es") 751 | ("video/mpeg" "mpeg" "mpg" "mpe") ("video/parityfec") ("video/pointer") 752 | ("video/quicktime" "qt" "mov") ("video/vnd.fvt") ("video/vnd.motorola.video") 753 | ("video/vnd.motorola.videop") ("video/vnd.mpegurl" "mxu") ("video/vnd.mts") 754 | ("video/vnd.nokia.interleaved-multimedia") ("video/vnd.vivo") 755 | ("video/x-msvideo" "avi") ("video/x-sgi-movie" "movie") 756 | ("x-conference/x-cooltalk" "ice"))) 757 | 758 | (defvar *mime-types* nil) 759 | 760 | (defun build-mime-types-table () 761 | (if* (null *mime-types*) 762 | then (setf *mime-types* (make-hash-table :test #'equalp)) 763 | (dolist (ent *file-type-to-mime-type*) 764 | (dolist (type (cdr ent)) 765 | (setf (gethash type *mime-types*) (car ent)))))) 766 | 767 | 768 | (build-mime-types-table) ;; build the table now 769 | 770 | ;; return mime type if known 771 | (defmethod lookup-mime-type (filename) 772 | (if* (pathnamep filename) 773 | then (setq filename (namestring filename))) 774 | (multiple-value-bind (root tail name type) 775 | (split-namestring filename) 776 | (declare (ignore root name)) 777 | (if* (and type (gethash type *mime-types*)) 778 | thenret 779 | elseif (gethash (list tail) *mime-types*) 780 | thenret))) 781 | -------------------------------------------------------------------------------- /mime-parse.cl: -------------------------------------------------------------------------------- 1 | ;; -*- mode: common-lisp; package: net.post-office -*- 2 | ;; 3 | ;; See the file LICENSE for the full license governing this code. 4 | ;; 5 | 6 | (defpackage :net.post-office 7 | (:use #:lisp #:excl) 8 | (:export 9 | #:parse-mime-structure 10 | #:mime-dequote 11 | #:with-part-stream 12 | 13 | ;; accessors 14 | #:mime-part-headers-size 15 | #:mime-part-body-size 16 | #:mime-part-lines 17 | #:mime-part-position 18 | #:mime-part-body-position 19 | #:mime-part-message 20 | 21 | ;; class name 22 | #:mime-part-parsed 23 | 24 | )) 25 | 26 | (in-package :net.post-office) 27 | 28 | (eval-when (compile) 29 | (declaim (optimize (speed 3)))) 30 | 31 | (eval-when (compile load eval) 32 | (require :streamp)) 33 | 34 | ;;; MIME structure parser. 35 | ;;; Ref: RFC2045/2046 36 | 37 | (defclass mime-part-parsed (mime-part) 38 | ( 39 | (headers-size ;; in bytes. Includes the bytes for the blank line 40 | :accessor mime-part-headers-size :initform nil) 41 | (body-size ;; in bytes. 42 | :accessor mime-part-body-size :initform nil) 43 | (lines ;; line count of body (for non-multipart types) 44 | :accessor mime-part-lines :initform nil) 45 | (position ;; file position of start of headers 46 | :accessor mime-part-position :initform nil) 47 | (body-position ;; file position of start of body 48 | :accessor mime-part-body-position :initform nil) 49 | (message ;; for message/rfc822 encapsulated message. 50 | ;; This will be a mime-part 51 | :accessor mime-part-message :initform nil))) 52 | 53 | (defmacro get-header (name headers) 54 | `(cdr (assoc ,name ,headers :test #'equalp))) 55 | 56 | (defvar *mime-read-line-unread*) 57 | 58 | (defun parse-mime-structure (stream &key mbox) 59 | (let ((*mime-read-line-unread* nil)) 60 | (multiple-value-bind (part stop-reason newpos) 61 | (parse-mime-structure-1 stream nil nil 0 mbox :outer t) 62 | (when (and part mbox (not (eq stop-reason :eof))) 63 | ;;(format t "advancing to next mbox boundary~%") 64 | (multiple-value-bind (x y z newpos2) 65 | (read-until-boundary stream nil newpos t) 66 | (declare (ignore x y z)) 67 | (setf stop-reason :eof) 68 | (setf newpos newpos2))) 69 | (values part stop-reason newpos)))) 70 | 71 | ;; Returns values: 72 | ;; 1) The part (or nil if EOF while reading readers) 73 | ;; 2) The stop reason (:eof, :close-boundary, nil (meaning regular boundary)) 74 | ;; 3) The new position 75 | 76 | ;: mime-parse-message-rfc822, parse-mime-structure, mime-parse-multipart 77 | ;: 78 | (defun parse-mime-structure-1 (stream boundary digest pos mbox &key outer) 79 | (let ((part (make-instance 'mime-part-parsed))) 80 | (setf (mime-part-position part) pos) 81 | (setf (mime-part-boundary part) boundary) 82 | (multiple-value-bind (headers bytes) 83 | (parse-headers stream mbox) 84 | (if (and (null headers) outer) 85 | (return-from parse-mime-structure-1)) 86 | (setf (mime-part-headers-size part) bytes) 87 | (incf pos bytes) 88 | (setf (mime-part-body-position part) pos) 89 | (setf (mime-part-headers part) headers) 90 | 91 | (let ((content-type (get-header "content-type" headers))) 92 | (setf (mime-part-id part) (get-header "Content-Id" headers)) 93 | (setf (mime-part-description part) 94 | (get-header "Content-description" headers)) 95 | (setf (mime-part-encoding part) 96 | (or (get-header "Content-transfer-encoding" headers) 97 | "7bit")) 98 | 99 | (multiple-value-bind (type subtype params) 100 | (parse-content-type content-type) 101 | 102 | (if* (null type) 103 | then (if* digest 104 | then (setf (mime-part-type part) "message") 105 | (setf (mime-part-subtype part) "rfc822") 106 | (setf (mime-part-parameters part) 107 | '(("charset" . "us-ascii"))) 108 | (mime-parse-message-rfc822 part stream boundary pos 109 | mbox) 110 | else (setup-text-plain-part part stream boundary pos 111 | mbox)) 112 | else (setf (mime-part-type part) type) 113 | (setf (mime-part-subtype part) subtype) 114 | (setf (mime-part-parameters part) params) 115 | 116 | (cond 117 | ((equalp type "multipart") 118 | (mime-parse-multipart part stream boundary pos 119 | mbox)) 120 | ((message-rfc822-p type subtype) 121 | (mime-parse-message-rfc822 part stream boundary pos 122 | mbox)) 123 | (t 124 | (mime-parse-non-multipart part stream boundary pos 125 | mbox))))))))) 126 | 127 | ;: skip-whitespace, parse-header-line, parse-headers 128 | ;: 129 | (defmacro whitespace-char-p (char) 130 | (let ((c (gensym))) 131 | `(let ((,c ,char)) 132 | (or (char= ,c #\space) (char= ,c #\tab) (char= ,c #\newline))))) 133 | 134 | ;; OK if 'string' is nil. 135 | ;; Might return nil 136 | ;; called by parse-mime-structure-1 137 | ;: parse-mime-structure-1 138 | ;: 139 | (defun parse-content-type (string) 140 | (block nil 141 | (if (null string) 142 | (return)) 143 | (let ((max (length string)) 144 | pos type subtype) 145 | (multiple-value-setq (type pos) 146 | (mime-get-token string 0 max)) 147 | (if (string= type "") 148 | (return)) 149 | 150 | (setf pos (skip-whitespace string pos max)) 151 | 152 | (if (or (>= pos max) (char/= (schar string pos) #\/)) 153 | (return)) ;; bogus input 154 | 155 | (multiple-value-setq (subtype pos) 156 | (mime-get-token string (1+ pos) max)) 157 | 158 | (if (string= subtype "") 159 | (return)) ;; bogus input 160 | 161 | (values type subtype (parse-parameters string pos max))))) 162 | 163 | 164 | 165 | 166 | 167 | ;; called by parse-content-type. 168 | ;: parse-content-type 169 | ;: 170 | (defun parse-parameters (string pos max) 171 | (let (char pairs param value) 172 | (while (< pos max) 173 | (setf pos (skip-whitespace string pos max)) 174 | (setf char (schar string pos)) 175 | 176 | (if (char/= char #\;) 177 | (return)) 178 | 179 | (multiple-value-setq (param pos) 180 | (mime-get-token string (1+ pos) max)) 181 | (setf pos (skip-whitespace string pos max)) 182 | (if (or (>= pos max) (char/= (schar string pos) #\=)) 183 | (return)) 184 | (multiple-value-setq (value pos) 185 | (mime-get-parameter-value string (1+ pos) max)) 186 | 187 | (push (cons param value) pairs)) 188 | (values (nreverse pairs) pos))) 189 | 190 | 191 | (defconstant *mime-tspecials* 192 | '(#\( #\) #\< #\> #\@ 193 | #\, #\; #\: #\\ #\" 194 | #\/ #\[ #\] #\? #\=)) 195 | 196 | ;: parse-content-type, parse-parameters, mime-get-parameter-value 197 | ;: mime-get-token, blank-line-p, parse-header-line 198 | ;: 199 | (defun skip-whitespace (string pos max) 200 | (declare (optimize (speed 3) (safety 0)) 201 | (fixnum pos max)) 202 | (while (< pos max) 203 | (if (not (whitespace-char-p (schar string pos))) 204 | (return)) 205 | (incf pos)) 206 | pos) 207 | 208 | ;: parse-parameters 209 | ;: 210 | (defun mime-get-parameter-value (string pos max) 211 | (setf pos (skip-whitespace string pos max)) 212 | (if* (>= pos max) 213 | then (values "" pos) 214 | else (if (char= (schar string pos) #\") 215 | (mime-get-quoted-string string pos max) 216 | (mime-get-token string pos max)))) 217 | 218 | ;: parse-content-type, parse-parameters, mime-get-parameter-value 219 | ;: 220 | (defun mime-get-token (string pos max) 221 | (setf pos (skip-whitespace string pos max)) 222 | (let ((startpos pos) 223 | char) 224 | (while (< pos max) 225 | (setf char (schar string pos)) 226 | (if (or (char= #\space char) (member char *mime-tspecials*)) 227 | (return)) 228 | (incf pos)) 229 | (values (subseq string startpos pos) pos))) 230 | 231 | ;; Doesn't attempt to dequote 232 | ;: mime-get-parameter-value 233 | ;: 234 | (defun mime-get-quoted-string (string pos max) 235 | (let ((res (make-string (- max pos))) 236 | (outpos 0) 237 | char inquote inbackslash) 238 | (while (< pos max) 239 | (setf char (schar string pos)) 240 | 241 | (when (and (char= char #\") (not inbackslash)) 242 | (if* inquote 243 | then (setf (schar res outpos) char) 244 | (incf outpos) 245 | (incf pos) 246 | (return)) 247 | (setf inquote t)) 248 | 249 | (if* inbackslash 250 | then (setf inbackslash nil) 251 | else (if (char= char #\\) 252 | (setf inbackslash t))) 253 | 254 | (setf (schar res outpos) char) 255 | (incf outpos) 256 | (incf pos)) 257 | 258 | (values (subseq res 0 outpos) pos))) 259 | 260 | ;; mime-parse-multipart 261 | ;: 262 | (defun mime-dequote (string) 263 | (block nil 264 | (if (or (string= string "") (char/= (schar string 0) #\")) 265 | (return string)) 266 | 267 | (let* ((max (length string)) 268 | (pos 1) 269 | (res (make-string max)) 270 | (outpos 0) 271 | char inbackslash) 272 | 273 | (while (< pos max) 274 | (setf char (schar string pos)) 275 | 276 | (if (and (char= char #\") (not inbackslash)) 277 | (return)) 278 | 279 | (if* (and (not inbackslash) (char= char #\\)) 280 | then (setf inbackslash t) 281 | (incf pos) 282 | else (setf (schar res outpos) char) 283 | (incf outpos) 284 | (incf pos) 285 | (setf inbackslash nil))) 286 | 287 | (subseq res 0 outpos)))) 288 | 289 | ;: parse-mime-structure-1 290 | ;: 291 | (defun setup-text-plain-part (part stream boundary pos mbox) 292 | (setf (mime-part-type part) "text") 293 | (setf (mime-part-subtype part) "plain") 294 | (setf (mime-part-parameters part) '(("charset" . "us-ascii"))) 295 | (mime-parse-non-multipart part stream boundary pos mbox)) 296 | 297 | ;: setup-text-plain-part, parse-mime-structure-1 298 | ;: 299 | (defun mime-parse-non-multipart (part stream boundary pos mbox) 300 | (let ((startpos pos)) 301 | (multiple-value-bind (size lines eof pos) 302 | (read-until-boundary stream boundary pos mbox) 303 | 304 | (setf (mime-part-lines part) lines) 305 | (setf (mime-part-body-position part) startpos) 306 | (setf (mime-part-body-size part) size) 307 | 308 | (values part eof pos)))) 309 | 310 | ;: parse-mime-structure-1 311 | ;: 312 | (defun mime-parse-message-rfc822 (part stream boundary pos mbox) 313 | (let ((startpos pos)) 314 | (multiple-value-bind (message eof pos) 315 | (parse-mime-structure-1 stream boundary nil pos mbox) 316 | 317 | (setf (mime-part-message part) message) 318 | 319 | (setf (mime-part-body-position part) startpos) 320 | (setf (mime-part-body-size part) (- pos startpos)) 321 | 322 | (values part eof pos)))) 323 | 324 | 325 | ;: parse-mime-structure-1 326 | ;: 327 | (defun mime-parse-multipart (part stream parent-boundary pos mbox) 328 | (let* ((params (mime-part-parameters part)) 329 | (boundary (cdr (assoc "boundary" params :test #'equalp))) 330 | (startpos pos) 331 | parts eof newpart) 332 | 333 | (setf (mime-part-boundary part) parent-boundary) 334 | 335 | ;; If boundary isn't specified.. try to compensate by using our 336 | ;; parent's boundary. 337 | (if* (null boundary) 338 | then (setf boundary parent-boundary) 339 | else (setf boundary (mime-dequote boundary))) 340 | 341 | ;; Locate the first boundary. 342 | (multiple-value-bind (ignore1 ignore2 ignore3 newpos) 343 | (read-until-boundary stream boundary pos mbox) 344 | (declare (ignore ignore1 ignore2 ignore3)) 345 | (setf pos newpos)) 346 | 347 | (until eof 348 | (multiple-value-setq (newpart eof pos) 349 | (parse-mime-structure-1 stream boundary 350 | (equalp (mime-part-subtype part) "digest") 351 | pos mbox)) 352 | (push newpart parts)) 353 | 354 | (setf (mime-part-parts part) (nreverse parts)) 355 | 356 | (setf (mime-part-body-size part) (- pos startpos)) 357 | 358 | ;; Discard everything that follows until we reach the parent-boundary. 359 | (multiple-value-bind (ignore1 ignore2 eof pos) 360 | (read-until-boundary stream parent-boundary pos mbox) 361 | (declare (ignore ignore1 ignore2)) 362 | (values part eof pos)))) 363 | 364 | 365 | ;; support 366 | 367 | (defconstant *whitespace* '(#\space #\tab #\return #\newline)) 368 | 369 | 370 | ;: parse-headers 371 | ;: 372 | (defun parse-header-line (line len) 373 | (declare (optimize (speed 3) (safety 0))) 374 | (let ((pos 0) 375 | colonpos 376 | spacepos) 377 | (declare (fixnum len pos spacepos)) 378 | 379 | (while (< pos len) 380 | (let ((char (schar line pos))) 381 | (when (char= char #\:) 382 | (setf colonpos pos) 383 | (return)) 384 | 385 | (if (and (null spacepos) (whitespace-char-p char)) 386 | (setf spacepos pos))) 387 | 388 | (incf pos)) 389 | 390 | (if (null colonpos) ;; bogus header line 391 | (return-from parse-header-line)) 392 | 393 | (if (null spacepos) 394 | (setf spacepos colonpos)) 395 | 396 | (if (= 0 spacepos) ;; bogus header line (no name) 397 | (return-from parse-header-line)) 398 | 399 | (values (subseq line 0 spacepos) 400 | (subseq line (skip-whitespace line (1+ colonpos) len) len)))) 401 | 402 | ;; Returns offset of end of line in buffer. Or nil if EOF 403 | ;; Second value is the number of characters read (including EOL chars) 404 | 405 | ;: parse-headers, read-until-boundary, collect-message-data-from-mbox 406 | ;: 407 | (defun mime-read-line (stream buffer) 408 | (declare (optimize (speed 3) (safety 0))) 409 | 410 | (if* *mime-read-line-unread* 411 | then (let* ((line (car *mime-read-line-unread*)) 412 | (bytes (cdr *mime-read-line-unread*)) 413 | (len (length line))) 414 | (declare (simple-string line)) 415 | (setf *mime-read-line-unread* nil) 416 | (dotimes (n len) 417 | (setf (schar buffer n) (schar line n))) 418 | (values len bytes)) 419 | else (let ((pos 0) 420 | (end (length buffer)) 421 | (count 0) 422 | char) 423 | (declare (fixnum pos end count)) 424 | 425 | (while (and (< pos end) (setf char (read-char stream nil nil))) 426 | (incf count) 427 | (if (char= char #\newline) 428 | (return)) 429 | (setf (schar buffer pos) char) 430 | (incf pos)) 431 | 432 | (if* (= count 0) 433 | then nil ;; EOF 434 | else ;; Check for CR/LF combo 435 | (if (and (> pos 0) 436 | (char= (schar buffer (1- pos)) #\return)) 437 | (decf pos)) 438 | 439 | (values pos count))))) 440 | 441 | (defun mime-unread-line (line end bytes) 442 | ;; This should never happen 443 | (if *mime-read-line-unread* 444 | (error "Unread buffer is full.")) 445 | (setf *mime-read-line-unread* 446 | (cons (subseq line 0 end) bytes))) 447 | 448 | (eval-when (compile) 449 | (defconstant *parse-headers-line-len* 1024)) 450 | 451 | ;; Returns: 452 | ;; 1) headers alist 453 | ;; 2) # of characters composing the header and terminator. 454 | ;: 455 | ;: parse-mime-structure-1 456 | ;: 457 | (defun parse-headers (stream mbox) 458 | (declare (optimize (speed 3) (safety 0))) 459 | (let ((count 0) 460 | (line (make-array #.*parse-headers-line-len* :element-type 'character)) 461 | headers lastcons current incomplete lastincomplete) 462 | (declare (fixnum count) 463 | (dynamic-extent line)) 464 | 465 | (loop 466 | (multiple-value-bind (end bytes) 467 | (mime-read-line stream line) 468 | (declare (fixnum end bytes)) 469 | 470 | (if (null end) ;; EOF 471 | (return)) 472 | 473 | (setf incomplete (= end #.*parse-headers-line-len*)) 474 | 475 | (if (and mbox (not lastincomplete) (my-prefixp "From " line end)) 476 | (return)) 477 | 478 | (incf count bytes) 479 | 480 | (cond 481 | (lastincomplete ;; rest of a long line 482 | (setf (car lastcons) 483 | (concatenate 'string (car lastcons) (subseq line 0 end)))) 484 | 485 | ((zerop end) ;; blank line 486 | (return)) 487 | 488 | ((whitespace-char-p (schar line 0)) ;; Continuation line 489 | (when (null current) ;; Malformed header line 490 | (decf count bytes) 491 | (mime-unread-line line end bytes) 492 | (return)) 493 | 494 | (let ((newcons (cons (subseq line 0 end) nil))) 495 | (setf (cdr lastcons) newcons) 496 | (setf lastcons newcons))) 497 | 498 | (t ;; Fresh header line 499 | (multiple-value-bind (name value) 500 | (parse-header-line line end) 501 | (when (null name) 502 | ;; Malformed header line. Unread it (so that it 503 | ;; will be treated as part of the body) and 504 | ;; consider the headers terminated. 505 | (decf count bytes) 506 | (mime-unread-line line end bytes) 507 | (return)) 508 | 509 | (setf lastcons (cons value nil)) 510 | (setf current (cons name lastcons)) 511 | (push current headers)))) 512 | 513 | (setf lastincomplete incomplete))) 514 | 515 | ;; Finalize strings. 516 | (dolist (header headers) 517 | (setf (cdr header) (coalesce-header header))) 518 | 519 | (values (nreverse headers) count))) 520 | 521 | ;: parse-headers 522 | ;: 523 | (defun coalesce-header (header) 524 | (declare (optimize (speed 3) (safety 0))) 525 | (let ((stringlist (cdr header))) 526 | (if* (= (length stringlist) 1) 527 | then (first stringlist) 528 | else (let ((len 0)) 529 | (declare (fixnum len)) 530 | (dolist (string stringlist) 531 | (incf len (1+ (the fixnum (length string))))) 532 | (decf len) 533 | (let ((res (make-string len)) 534 | (pos 0) 535 | (first t)) 536 | (declare (fixnum pos)) 537 | (dolist (string stringlist) 538 | (if* first 539 | then (setf first nil) 540 | else (setf (schar res pos) #\newline) 541 | (incf pos)) 542 | (dotimes (n (length string)) 543 | (declare (fixnum n)) 544 | (setf (schar res pos) (schar string n)) 545 | (incf pos))) 546 | res))))) 547 | 548 | ;; Returns: (1) size of part 549 | ;; (2) number of lines read 550 | ;; (3) stop reason (:eof, :close-boundary, or nil (meaning regular 551 | ;; boundary) 552 | ;; (4) new stream position (post boundary read) 553 | ;: mime-parse-multipart, mime-parse-non-multipart 554 | ;: 555 | (defun read-until-boundary (stream boundary pos mbox) 556 | (declare (optimize (speed 3) (safety 0)) 557 | (fixnum pos)) 558 | (if* (and (null boundary) (null mbox)) 559 | then 560 | (multiple-value-bind (lines count) 561 | (count-lines-to-eof stream) 562 | (declare (fixnum count)) 563 | (values count lines :eof (+ pos count))) 564 | else 565 | (let ((line (make-array 16000 :element-type 'character)) 566 | (size 0) 567 | (lines 0) 568 | (stop-reason :eof) 569 | delimiter close-delimiter) 570 | (declare (dynamic-extent line) 571 | (fixnum size lines)) 572 | 573 | (when boundary 574 | (setf delimiter (concatenate 'string "--" boundary)) 575 | (setf close-delimiter (concatenate 'string delimiter "--"))) 576 | 577 | (loop 578 | (multiple-value-bind (end bytes) 579 | (mime-read-line stream line) 580 | (declare (fixnum end bytes)) 581 | 582 | (if (or (null end) 583 | (and mbox (my-prefixp "From " line end))) 584 | (return)) 585 | 586 | (incf pos bytes) 587 | 588 | (when (and delimiter (my-prefixp delimiter line end)) 589 | (if* (my-prefixp close-delimiter line end) 590 | then (setf stop-reason :close-boundary) 591 | else (setf stop-reason nil)) 592 | (return)) 593 | 594 | (incf size bytes) 595 | (incf lines))) 596 | 597 | (values size lines stop-reason pos)))) 598 | 599 | ;; Returns: 600 | ;; 1) number of lines 601 | ;; 2) number of bytes read 602 | ;: read-until-boundary 603 | ;: 604 | (defun count-lines-to-eof (stream) 605 | (declare (optimize (speed 3) (safety 0))) 606 | (let ((buffer (make-array 65536 :element-type '(unsigned-byte 8))) 607 | (lines 0) 608 | (pos 0) 609 | (lastbyte -1) 610 | (count 0) 611 | end) 612 | (declare (dynamic-extent buffer) 613 | (fixnum lines pos end lastbyte count)) 614 | ;; count 10's 615 | ;; XXX: The count will be off if the file has CR/LF convention and 616 | ;; there are bare LFs. 617 | (loop 618 | (setf end (read-vector buffer stream)) 619 | (incf count end) 620 | 621 | (if (= end 0) 622 | (return)) 623 | 624 | (while (< pos end) 625 | (if (= (aref buffer pos) 10) 626 | (incf lines)) 627 | (incf pos)) 628 | 629 | (setf lastbyte (aref buffer (1- pos)))) 630 | 631 | ;; Count last partial line. 632 | (if (and (> lastbyte 0) (/= lastbyte 10)) 633 | (incf lines)) 634 | 635 | (values lines count))) 636 | 637 | (defun my-prefixp (prefix string &optional end) 638 | (declare (optimize (speed 3) (safety 0))) 639 | (let ((lenprefix (length prefix)) 640 | (end (or end (length string)))) 641 | (declare (fixnum lenprefix end)) 642 | (when (>= end lenprefix) 643 | (dotimes (n lenprefix) 644 | (declare (fixnum n)) 645 | (if (char/= (schar prefix n) (schar string n)) 646 | (return-from my-prefixp))) 647 | t))) 648 | 649 | ;;; misc 650 | 651 | (defun stream-to-stream-copy (outstream instream count) 652 | (declare (optimize (speed 3)) 653 | (fixnum count)) 654 | (let ((buf (make-array 4096 :element-type '(unsigned-byte 8)))) 655 | (declare (dynamic-extent buf)) 656 | (while (> count 0) 657 | (let ((got (read-sequence buf instream :end (min count 4096)))) 658 | (declare (fixnum got)) 659 | (if (zerop got) 660 | (error "Unexpected EOF while reading from ~a" instream)) 661 | (write-sequence buf outstream :end got) 662 | (decf count got))))) 663 | 664 | ;; 'instream' must be positioned appropriately by the caller beforehand. 665 | (defmacro with-part-stream ((sym part instream &key (header t)) &body body) 666 | (let ((p (gensym)) 667 | (stream (gensym)) 668 | (count (gensym))) 669 | `(let* ((,p ,part) 670 | (,stream ,instream) 671 | (,count (mime-part-body-size ,p))) 672 | (if ,header 673 | (incf ,count (mime-part-headers-size ,p))) 674 | (excl:with-function-input-stream 675 | (,sym #'stream-to-stream-copy ,stream ,count) 676 | ,@body)))) 677 | 678 | 679 | 680 | ;;; testing 681 | 682 | #| 683 | (defun test-parse-mime (file &key (pretty t)) 684 | (with-open-file (f file) 685 | (let ((res (parse-mime-structure f))) 686 | (if pretty 687 | (pprint-rfc822-part res) 688 | res)))) 689 | 690 | (defun pprint-rfc822-part (thing &optional (prefix "")) 691 | (if (null thing) 692 | (error "Strange. something called pprint-rfc822-part with nil")) 693 | (let ((type (mime-part-type thing)) 694 | (subtype (mime-part-subtype thing))) 695 | (format t "~AHEADER ([RFC-2822] header of the message)~%" prefix) 696 | (format t "~ATEXT ([RFC-2822] text body of the message) ~A/~A~%" prefix type subtype) 697 | 698 | (if* (message-rfc822-p type subtype) 699 | then ;; XXX .. what should the new prefix be? 700 | (pprint-rfc822-part (mime-part-message thing) prefix) 701 | elseif (equalp type "multipart") 702 | then (pprint-multipart thing prefix)))) 703 | 704 | (defun pprint-multipart (thing prefix) 705 | (let ((id 1)) 706 | (dolist (part (mime-part-parts thing)) 707 | (let ((type (mime-part-type part)) 708 | (subtype (mime-part-subtype part))) 709 | (format t "~a~a ~a/~a~%" prefix id type subtype) 710 | (if* (message-rfc822-p type subtype) 711 | then (pprint-rfc822-part (mime-part-message part) 712 | (format nil "~a~a." prefix id)) 713 | elseif (equalp type "multipart") 714 | then (pprint-multipart part (format nil "~a~a." prefix id))) 715 | (incf id))))) 716 | |# 717 | -------------------------------------------------------------------------------- /mime-parse.txt: -------------------------------------------------------------------------------- 1 | Preliminary documentation. 2 | 3 | Class: mime-part-parsed 4 | Subclass of: mime-part 5 | 6 | Slot accessors: 7 | 8 | * mime-part-type 9 | 10 | * mime-part-subtype 11 | 12 | * mime-part-parameters 13 | * mime-part-id 14 | 15 | * mime-part-description 16 | 17 | * mime-part-encoding 18 | 19 | * mime-part-headers 20 | * mime-part-parts 21 | * mime-part-boundary 22 | 23 | 24 | * mime-part-headers-size 25 | This is the size, in bytes, of the header portion of the part 26 | (including the blank line terminator) 27 | 28 | * mime-part-body-size 29 | This is the size, in bytes, of the body of the part. 30 | 31 | * mime-part-lines 32 | For non-multipart types, this is the number of lines that make 33 | up the part body. 34 | 35 | * mime-part-position 36 | This is the file positon of the start of the part headers, 37 | relative to the beginning of the topmost part. 38 | 39 | * mime-part-body-position 40 | This is the file position of the start of the part body, 41 | relative to the beginning of the topmost part. 42 | 43 | * mime-part-message 44 | For message/rfc822 parts, this slot contains the 45 | mime-part-parsed object which represents the encapsulated message. 46 | 47 | Function: parse-mime-structure 48 | 49 | Arguments: stream &key mbox 50 | 51 | Return values: 52 | The function returns two values: 53 | 54 | 1) A mime-part-parsed object which represents the topmost MIME part 55 | (which may possibly contain subparts). If there was no message to 56 | parse (such as if the stream is at EOF), then nil is returned. 57 | 58 | 2) The number of bytes that were read. 59 | 60 | 'stream' should be a stream that is positioned at the first header of 61 | a MIME-compliant email message. 62 | 63 | If 'mbox' is true, then parsing will terminate at EOF or when a line 64 | which begins with "From " is read. 65 | 66 | MIME messages always have a topmost part and may possibly have 67 | multiple subparts which may recursively have their own subparts. This 68 | function reads 'stream' and creates a mime-part-parsed object which 69 | contains information about these parts and subparts. For each part, 70 | the following information is collected: 71 | 72 | * The major content type, (e.g, "text", if the Content-Type header 73 | is "text/html"). 74 | * The content subtype, (e.g., "html", if the Content-Type header is 75 | "text/html"). 76 | * Any parameter information that was supplied in the Content-Type 77 | header. 78 | * The part id, as determined by the Content-Id header, if there was 79 | one. 80 | * The part description, as determined by the Content-Description 81 | header, if there was one. 82 | * The part encoding, as determined by the Content-Transfer-Encoding 83 | header. 84 | * The boundary string, for multipart types. 85 | * The list of subparts (which are also mime-part-parsed objects) for 86 | multipart parts. 87 | * The encapsulated message (which is a mime-part-parsed object) for 88 | message/rfc822 parts. 89 | * The size of the part headers (in bytes). 90 | * The size of the part body (in bytes). 91 | * The number of lines comprising the part body. 92 | * The file position of the beginning of the part headers, relative to the 93 | position of the topmost part. For the topmost part, this value will 94 | always be zero. 95 | * The file position of the beginning of the part body, relative to the 96 | position of the topmost part. 97 | 98 | See also: . 100 | 101 | 102 | Function: map-over-parts 103 | Arguments: part function 104 | 105 | 'part' must be a mime-part. 106 | 107 | 'function' must be a function (or symbol naming a function) which 108 | takes a single argument, a mime-part. 109 | 110 | map-over-parts calls 'function' on 'part', then, if part contains 111 | subparts (or an encapsulated message in the case of a message/rfc822 112 | part), map-over-parts is called recursively for each subpart (or 113 | encapsulated message). 114 | 115 | Function: qp-encode-stream 116 | Arguments: instream outstream &key wrap-at 117 | 118 | This function reads bytes from instream and writes them in 119 | quoted-printable format to outstream. Lines in the output are wrapped 120 | approximately every 'wrap-at' output characters (which defaults to 121 | 72). Wrapping may be late by up to 3 characters under some 122 | circumstances (e.g., when using the default 'wrap-at' value of 72, 123 | some lines may be 75 characters long). 124 | 125 | 'instream' is read until EOF is seen. 126 | 127 | 'instream' must be a stream capable of being read in an octet-oriented 128 | manner. In particular, it cannot be a string stream. 129 | 130 | See also: qp-decode-stream, base64-encode-stream, base64-decode-stream 131 | 132 | 133 | 134 | Function: qp-decode-stream 135 | Arguments: instream outstream 136 | 137 | This function reads quoted-printable encoded text from 'instream' and 138 | writes the decoded text to 'outstream'. Reading continues until 139 | end-of-file is seen on 'instream'. 140 | 141 | See also: qp-encode-stream, base64-encode-stream, base64-decoed-stream 142 | 143 | 144 | Macro: with-part-stream 145 | Arguments: (sym part instream &key (header t)) &body body 146 | 147 | with-part-stream evaluates 'body' with 'sym' bound to an input stream 148 | which, when read, supplies bytes from 'instream'. 'instream' must be 149 | positioned either at the beginning of the part headers (if keyword 150 | argument 'header' is true) or at the beginning of the part body (if 151 | keyword argument 'header' is false). The 'part' argument is used by 152 | the macro to determine how many bytes from 'instream' will be used. 153 | 154 | The primary purpose of this macro is to create a stream that will 155 | generate an end-of-file indicator when the contents of a part have 156 | been completely read. Such a stream is useful for passing to 157 | functions which expect to read a stream until EOF (such as 158 | 'decode-quoted-printable' or 'excl:base64-decode-stream'). 159 | 160 | See also: with-decoded-part-body-stream 161 | 162 | Macro: with-decoded-part-body-stream 163 | Arguments: (sym part instream) &body body 164 | 165 | with-decoded-part-body-stream evaluates 'body' with 'sym' bound to an 166 | input stream which, when read, supplies decoded bytes. The encoded 167 | bytes are read from 'instream', which should be an input stream whose 168 | file position is at the beginning of the part body. The 'part' 169 | argument is used by the macro to determine the size of the part body 170 | and also to determine the content transfer encoding of the part. The 171 | input stream bound to 'sym' will signal end-of-file when the part body 172 | has been exhausted. 173 | 174 | Example: 175 | 176 | (use-package :net.post-office) 177 | 178 | (defun extract-all-jpegs (filename) 179 | (with-open-file (f filename) 180 | (let ((toppart (parse-mime-structure f)) 181 | (count 0)) 182 | 183 | (flet ((extract-jpeg (part) 184 | (if* (and (equalp (mime-part-type part) "image") 185 | (equalp (mime-part-subtype part) "jpeg")) 186 | then (incf count) 187 | (let ((filename (format nil "image~d.jpg" count))) 188 | (format t "Saving ~a...~%" filename) 189 | (with-open-file (out filename :direction :output) 190 | ;; Position source file pointer to the beginning 191 | ;; of the part body. 192 | (file-position f (mime-part-body-position part)) 193 | (with-decoded-part-body-stream (bod part f) 194 | (sys:copy-file bod out))))))) 195 | 196 | (map-over-parts toppart #'extract-jpeg))))) 197 | 198 | See also: mime-part-body-position slot accessor. 199 | -------------------------------------------------------------------------------- /mime-transfer-encoding.cl: -------------------------------------------------------------------------------- 1 | ;; -*- mode: common-lisp; package: net.post-office -*- 2 | ;; 3 | ;; See the file LICENSE for the full license governing this code. 4 | ;; 5 | 6 | (defpackage :net.post-office 7 | (:use #:lisp #:excl) 8 | (:import-from #:excl #:base64-encode-stream 9 | #+(or (version= 7 0) 10 | (version= 8 0) 11 | (version>= 8 1 pre-beta 5)) 12 | #:base64-decode-stream) 13 | (:export 14 | #:base64-encode-stream 15 | #:base64-decode-stream 16 | #:qp-encode-stream 17 | #:qp-decode-stream 18 | #:qp-decode-usb8 19 | #:qp-decode-string 20 | #:with-decoded-part-body-stream)) 21 | 22 | (in-package :net.post-office) 23 | 24 | ;;; Supported transfer encodings 25 | 26 | ;; encoders 27 | 28 | (defun raw-encode-stream (instream outstream) 29 | (declare (optimize (speed 3) (safety 0))) 30 | (let ((buf (make-array 4096 :element-type '(unsigned-byte 8))) 31 | got) 32 | (declare (dynamic-extent buf) 33 | (fixnum got)) 34 | (while (/= 0 (setf got (read-vector buf instream))) 35 | (write-vector buf outstream :end got)))) 36 | 37 | (defconstant *qp-hex-digits* "0123456789ABCDEF") 38 | 39 | ;; wrap-at is not a hard limit but more of a suggestion. it may be 40 | ;; late by by 3 characters. 41 | (defun qp-encode-stream (instream outstream &key (wrap-at 72)) 42 | (declare (optimize (speed 3))) 43 | (let ((prev 0) 44 | (outcol 0) 45 | byte) 46 | (declare (fixnum byte prev)) 47 | 48 | (macrolet ((whitespace (x) 49 | (let ((xx (gensym))) 50 | `(let ((,xx ,x)) 51 | (or (= ,xx 9) (= ,xx 32)))))) 52 | 53 | (labels ((check-linewrap () 54 | (if* (and wrap-at (>= outcol wrap-at)) 55 | then (format outstream "=~%" outstream) 56 | (setf outcol 0))) 57 | (check-deferred () 58 | (if* (and (= prev 13) (/= byte 10)) 59 | then ;; previous byte was bare CR. Handle 60 | (check-linewrap) 61 | (write-string "=0D" outstream) 62 | (incf outcol 3)) 63 | 64 | (if* (whitespace prev) 65 | then (if* (or (= byte 0) (= byte 10) (= byte 13)) 66 | then ;; EOF, EOL, probable EOL. Encode. 67 | (check-linewrap) 68 | (format outstream "=20") 69 | (incf outcol 3) 70 | else ;; Safe to print deferred whitespace 71 | (check-linewrap) 72 | (write-char (code-char prev) outstream) 73 | (incf outcol 1))))) 74 | 75 | (while (setf byte (read-byte instream nil nil)) 76 | (check-deferred) 77 | 78 | (if* (or (and (>= byte 33) (<= byte 60)) 79 | (and (>= byte 62) (<= byte 126))) 80 | then (check-linewrap) 81 | (write-char (code-char byte) outstream) 82 | (incf outcol) 83 | elseif (or (= byte 13) (whitespace byte)) 84 | thenret ;; defer handling 85 | elseif (= byte 10) ;; LF 86 | then (write-char #\newline outstream) 87 | (setf outcol 0) 88 | else (check-linewrap) 89 | (format outstream "=~c~c" 90 | (schar *qp-hex-digits* 91 | (ash byte -4)) 92 | (schar *qp-hex-digits* 93 | (logand byte #xf))) 94 | (incf outcol 3)) 95 | 96 | (setf prev byte)) 97 | 98 | ;; Handle final deferred data 99 | (setf byte 0) 100 | (check-deferred))))) 101 | 102 | ;; Decoding stuff 103 | 104 | 105 | ;; Used by qp-decode-stream 106 | (eval-when (compile) 107 | (defconstant *qp-digit-values* 108 | #.(let ((arr (make-array 257 :element-type 'fixnum))) 109 | (dotimes (n 256) 110 | (setf (aref arr n) 111 | (if* (<= (char-code #\0) n (char-code #\9)) 112 | then (- n (char-code #\0)) 113 | elseif (<= (char-code #\A) n (char-code #\F)) 114 | then (- n (- (char-code #\A) 10)) 115 | elseif (<= (char-code #\a) n (char-code #\f)) 116 | then (- n (- (char-code #\a) 10)) 117 | else -1))) 118 | (setf (aref arr 256) -2) 119 | arr))) 120 | 121 | (defun qp-decode-stream (instream outstream &key count) 122 | (declare (optimize (speed 3))) 123 | 124 | (let (unread-buf) 125 | 126 | (macrolet ((unread (byte) 127 | `(progn 128 | (setf unread-buf ,byte) 129 | (if count 130 | (incf count)))) 131 | (get-byte (&key eof-value) 132 | `(block get-byte 133 | (if* count 134 | then (if (zerop count) 135 | (return-from get-byte ,eof-value)) 136 | (decf count)) 137 | (if* unread-buf 138 | then (prog1 unread-buf 139 | (setf unread-buf nil)) 140 | else (read-byte instream nil ,eof-value)))) 141 | (out (byte) 142 | `(write-byte ,byte outstream)) 143 | (eol-p (byte) 144 | `(or (eq ,byte 10) (eq ,byte 13)))) 145 | 146 | (let (byte) 147 | (while (setf byte (get-byte)) 148 | (if* (eq byte #.(char-code #\=)) 149 | then (let ((nextbyte (get-byte))) 150 | (if* (null nextbyte) 151 | then ;; stray equal sign. just dump and terminate. 152 | (out byte) 153 | (return)) 154 | (if* (eol-p nextbyte) 155 | then ;; soft line break. 156 | (if (eq nextbyte 13) ;; CR 157 | (setf nextbyte (get-byte))) 158 | (if (not (eq nextbyte 10)) ;; LF 159 | (unread nextbyte)) 160 | else ;; =XY encoding 161 | (let* ((byte3 (get-byte :eof-value 256)) 162 | (high (aref #.*qp-digit-values* nextbyte)) 163 | (low (aref #.*qp-digit-values* byte3)) 164 | (value (logior (the fixnum (ash high 4)) low))) 165 | (declare (fixnum byte3 high low value)) 166 | (if* (< value 0) 167 | then ;; Invalid or truncated encoding. just dump it. 168 | (out byte) 169 | (out nextbyte) 170 | (if* (eq low -2) ;; EOF 171 | then (return) 172 | else (out byte3)) 173 | else (out value))))) 174 | else (out byte))) 175 | 176 | t)))) 177 | 178 | ;; 'out' should be at least the size of 'in'. If it is nil, 179 | ;; a usb8 array will be allocated and used. It is okay if 'out' is the 180 | ;; same buffer as 'in'. 181 | ;; Returns: 182 | ;; 1) the supplied or allocated array 183 | ;; 2) the just past the last byte populated in the array. 184 | (defun qp-decode-usb8 (in out &key (start1 0) (end1 (length in)) 185 | (start2 0) end2 186 | underscores-are-spaces) 187 | (declare (optimize (speed 3)) 188 | ((simple-array (unsigned-byte 8) (*)) in out) 189 | (fixnum start1 end1 start2 end2)) 190 | 191 | (if (null out) 192 | (setf out (make-array (length in) :element-type '(unsigned-byte 8)))) 193 | 194 | (if (null end2) 195 | (setf end2 (length out))) 196 | 197 | (let ((count (- end1 start1))) 198 | (declare (fixnum count)) 199 | 200 | (if (< count 0) 201 | (error "start1 must be less than end1")) 202 | 203 | (if (> start2 end2) 204 | (error "start2 must be less than end2")) 205 | 206 | (if (< (the fixnum (- end2 start2)) count) 207 | (error "Not enough room in output array")) 208 | 209 | (macrolet ((unread (byte) 210 | (declare (ignore byte)) 211 | `(decf start1)) 212 | (get-byte (&key eof-value) 213 | `(if* (>= start1 end1) 214 | then ,eof-value 215 | else (prog1 (aref in start1) 216 | (incf start1)))) 217 | (out (byte) 218 | `(prog1 (setf (aref out start2) ,byte) 219 | (incf start2))) 220 | (eol-p (byte) 221 | `(or (eq ,byte 10) (eq ,byte 13)))) 222 | 223 | (let (byte) 224 | (while (setf byte (get-byte)) 225 | (if* (eq byte #.(char-code #\=)) 226 | then (let ((nextbyte (get-byte))) 227 | (if* (null nextbyte) 228 | then ;; stray equal sign. just dump and terminate. 229 | (out byte) 230 | (return)) 231 | (if* (eol-p nextbyte) 232 | then ;; soft line break. 233 | (if (eq nextbyte 13) ;; CR 234 | (setf nextbyte (get-byte))) 235 | (if (not (eq nextbyte 10)) ;; LF 236 | (unread nextbyte)) 237 | else ;; =XY encoding 238 | (let* ((byte3 (get-byte :eof-value 256)) 239 | (high (aref #.*qp-digit-values* nextbyte)) 240 | (low (aref #.*qp-digit-values* byte3)) 241 | (value (logior (the fixnum (ash high 4)) low))) 242 | (declare (fixnum byte3 high low value)) 243 | (if* (< value 0) 244 | then ;; Invalid or truncated encoding. just dump it. 245 | (out byte) 246 | (out nextbyte) 247 | (if* (eq low -2) ;; EOF 248 | then (return) 249 | else (out byte3)) 250 | else (out value))))) 251 | elseif (and underscores-are-spaces (eq byte #.(char-code #\_))) 252 | then ;; See the discussion in bug18636 about why this is 253 | ;; done. 254 | (out #.(char-code #\space)) 255 | else (out byte))) 256 | 257 | (values out start2))))) 258 | 259 | (defun qp-decode-string (string &key (start 0) (end (length string)) 260 | (return :string) 261 | (external-format :default) 262 | underscores-are-spaces) 263 | (multiple-value-bind (vec len) 264 | (string-to-octets string :start start :end end :null-terminate nil 265 | :external-format :latin1) 266 | (multiple-value-setq (vec len) 267 | (qp-decode-usb8 vec vec :end1 len 268 | :underscores-are-spaces underscores-are-spaces)) 269 | (ecase return 270 | (:string 271 | (octets-to-string vec :end len :external-format external-format)) 272 | (:usb8 273 | (subseq vec 0 len))))) 274 | 275 | ;; 'instream' must be positioned at the beginning of the part body 276 | ;; by the caller beforehand. 277 | (defmacro with-decoded-part-body-stream ((sym part instream) &body body) 278 | (let ((p (gensym)) 279 | (encoding (gensym)) 280 | (count (gensym)) 281 | (charset (gensym)) 282 | (ef (gensym))) 283 | `(let* ((,p ,part) 284 | (,encoding (mime-part-encoding ,p)) 285 | (,count (mime-part-body-size ,p))) 286 | (excl:with-function-input-stream (,sym #'mime-decode-transfer-encoding 287 | ,instream 288 | ,encoding 289 | ,count) 290 | (let* ((,charset (or (cdr (assoc "charset" (mime-part-parameters ,p) 291 | :test #'equalp)) 292 | "us-ascii")) 293 | (,ef (or (charset-to-external-format ,charset) :latin1))) 294 | (setf (stream-external-format ,sym) ,ef)) 295 | ,@body)))) 296 | 297 | (defun mime-decode-transfer-encoding (outstream instream encoding count) 298 | (cond 299 | ((equalp encoding "quoted-printable") 300 | (qp-decode-stream instream outstream :count count)) 301 | #+(or (version= 7 0) 302 | (version= 8 0) 303 | (version>= 8 1 pre-beta 5)) 304 | ((equalp encoding "base64") 305 | (excl:base64-decode-stream instream outstream :count count :error-p nil)) 306 | (t 307 | ;; defined in mime-parse.cl 308 | (stream-to-stream-copy outstream instream count)))) 309 | -------------------------------------------------------------------------------- /rfc2822.cl: -------------------------------------------------------------------------------- 1 | ;; -*- mode: common-lisp; package: net.mail -*- 2 | ;; 3 | ;; See the file LICENSE for the full license governing this code. 4 | 5 | #+(version= 10 1) 6 | (sys:defpatch "rfc2822" 1 7 | "v1: dns-record-exists-p: find correct cname entry to query." 8 | :type :system 9 | :post-loadable t) 10 | 11 | #+(version= 9 0) 12 | (sys:defpatch "rfc2822" 1 13 | "v1: parse-email-address: check length of local-part" 14 | :type :system 15 | :post-loadable t) 16 | 17 | (defpackage :net.mail 18 | (:use #:lisp #:excl) 19 | (:export #:parse-email-address 20 | #:extract-email-addresses 21 | #:valid-email-domain-p)) 22 | 23 | (in-package :net.mail) 24 | 25 | #| 26 | Email address parser. If parsing succeeds, then the email address 27 | has valid syntax. 28 | 29 | The parser should be RFC2822 compliant except: 30 | 31 | * It optionally allows for domain-less addresses. 32 | * By default, it requires the domain part to have two components (not 33 | actually required by the spec). 34 | * It does not allow domain literals (e.g., "joe-user@[192.132.95.23]") 35 | * It does not allow quoted strings. 36 | 37 | Exports: 38 | 39 | Function: parse-email-address 40 | Args: string &key require-domain require-dotted-domain 41 | 42 | Parses an email address string and returns two values: the username 43 | part of the address and the domain part of the address. 44 | 45 | Keyword arguments: 46 | 47 | :require-domain 48 | 49 | defaults to true. If true, then the @domain part of the email address 50 | is required. If nil, then the @domain part of the email address is 51 | not required. If it is not found, then the second return value of 52 | this function will be nil. 53 | 54 | :require-dotted-domain 55 | 56 | defaults to true. If true, then the domain part of the email address 57 | must have two dotted components (e.g., "franz.com"). If nil, then a 58 | single-component domain part is accepted (e.g., "com"). 59 | 60 | --- 61 | 62 | Function: valid-email-domain-p 63 | Args: domain 64 | 65 | Returns information on whether or not the DNS configuration for 66 | 'domain' is configured properly for Internet email reception. 67 | 68 | Possible return values: 69 | 70 | nil 71 | 72 | This means that the DNS records for 'domain' are not properly 73 | configured for Internet email. 74 | 75 | :unknown 76 | 77 | This means that no information was successfully collected. No 78 | conclusion can be drawn. 79 | 80 | t 81 | 82 | This means that 'domain' has DNS records that are suitable for 83 | Internet email reception. This does not necessarily mean that email 84 | delivery will succeed. 85 | 86 | Note: This function is more useful for its negative response (nil) 87 | than any other response. If it returns nil, it means that no standard 88 | mail transfer agent would be able to locate the mail server for the 89 | domain. 90 | 91 | |# 92 | 93 | (eval-when (compile eval) 94 | (defconstant *controls* "\\x0-\\x1f") 95 | 96 | (defconstant *specials* "()<>\\[\\]:;@\\,.\"") 97 | 98 | (defconstant *atext* 99 | (format nil "[^\\s~a~a]" *controls* *specials*)) 100 | 101 | (defconstant *atom* (format nil "^~a+" *atext*)) 102 | 103 | (defconstant *dot-atom-text* (format nil "~a+(?:\\.~a+)*" *atext* *atext*)) 104 | 105 | ;; More strict than the RFC, but good for verifying syntax of email 106 | ;; addresses that a user supplies. 107 | 108 | (defconstant *email-address-re* 109 | (format nil "^\\s*(~a)(?:@(~a))?\\s*$" *dot-atom-text* *dot-atom-text*)) 110 | 111 | ) 112 | 113 | (defun parse-email-address (string &key (require-domain t) 114 | (require-dotted-domain t)) 115 | (multiple-value-bind (matched x local-part domain) 116 | (match-re #.*email-address-re* string) 117 | (declare (ignore x)) 118 | (if* (or 119 | ;; Failure cases 120 | (not matched) 121 | (and require-domain (null domain)) 122 | (and require-dotted-domain domain (zerop (count #\. domain))) 123 | ;; From rfc3696 124 | (> (length local-part) 64)) 125 | then nil 126 | else (values local-part domain)))) 127 | 128 | ;; Returns a list of entries like so: 129 | ;; (:mailbox display-name user domain) 130 | ;; or 131 | ;; (:group display-name mailbox-list) 132 | ;; or, if 'compact' keyword arg is true, returns a flattened list of 133 | ;; user@domain strings. 134 | 135 | (defun extract-email-addresses (string &key (start 0) (end (length string)) 136 | (require-domain t) (errorp t) 137 | compact) 138 | (declare (optimize (speed 3)) 139 | (fixnum start end)) 140 | (with-underlying-simple-vector (string string disp) 141 | (declare (simple-string string) 142 | (fixnum disp)) 143 | (incf start disp) 144 | (incf end disp) 145 | 146 | ;; Unfold. 147 | (when (match-re "\\r?\\n\\s" string :start start :end end) 148 | (setf string (replace-re string "\\r?\\n\\s" " " 149 | :start start :end end)) 150 | (setf start 0) 151 | (setf end (length string))) 152 | 153 | (let ((res 154 | (catch 'syntax-error 155 | (parse-address-list string start end require-domain)))) 156 | (if* (stringp res) 157 | then (if errorp (error res)) 158 | elseif (null res) 159 | then (if errorp 160 | (error "Failed to parse: ~s" (subseq string start end))) 161 | nil 162 | elseif compact 163 | then (compact-extracted-addresses res) 164 | else res)))) 165 | 166 | (defun compact-extracted-addresses (list) 167 | (declare (optimize (speed 3))) 168 | (let (res) 169 | (dolist (entry list) 170 | (let ((type (car entry))) 171 | (ecase type 172 | (:mailbox 173 | (let ((user (third entry)) 174 | (domain (fourth entry))) 175 | (push (if* domain 176 | then (concatenate 'string user "@" domain) 177 | else user) 178 | res))) 179 | (:group 180 | (dolist (addr (compact-extracted-addresses (third entry))) 181 | (push addr res)))))) 182 | (nreverse res))) 183 | 184 | (macrolet ((parse-special (char skip-ws) 185 | `(multiple-value-bind (type value newpos) 186 | (rfc2822-lex string start end ,skip-ws) 187 | (declare (ignore type)) 188 | (when (eq value ,char) 189 | (setf start newpos))))) 190 | 191 | ;; Supports obsolete format which allows for null members in the list. 192 | (defun parse-address-list (string start end require-domain) 193 | (let (res) 194 | (loop 195 | (while (parse-special #\, t)) 196 | (multiple-value-bind (addr newpos) 197 | (parse-address string start end require-domain) 198 | (if (null addr) 199 | (return)) 200 | (setf start newpos) 201 | (push addr res))) 202 | (values (nreverse res) start))) 203 | 204 | (defun parse-address (string start end require-domain) 205 | (multiple-value-bind (mb newpos) 206 | (parse-mailbox string start end require-domain) 207 | (if* mb 208 | then (values mb newpos) 209 | else (parse-group string start end require-domain)))) 210 | 211 | 212 | (defun parse-mailbox (string start end require-domain) 213 | (multiple-value-bind (ok display-name localpart domain newpos) 214 | (parse-name-addr string start end require-domain) 215 | (if ok 216 | (return-from parse-mailbox 217 | (values 218 | (list :mailbox display-name localpart domain) 219 | newpos)))) 220 | (multiple-value-bind (localpart domain newpos) 221 | (parse-addr-spec string start end require-domain) 222 | (when localpart 223 | (setf start newpos) 224 | ;; Check for a trailing comment and use that as the display name 225 | (multiple-value-bind (display-name newpos) 226 | (grab-next-comment string start end) 227 | (if display-name 228 | (setf start newpos)) 229 | (values 230 | (list :mailbox display-name localpart domain) 231 | start))))) 232 | 233 | (defun grab-next-comment (string start end) 234 | (loop 235 | (multiple-value-bind (type value newpos) 236 | (rfc2822-lex string start end nil) 237 | (if (eq type :comment) 238 | (return (values (replace-re value "^\\((.*)\\)$" "\\1") newpos))) 239 | (if* (eq type :wsp) 240 | then (setf start newpos) 241 | else (return))))) 242 | 243 | (defun parse-group (string start end require-domain) 244 | (multiple-value-bind (display-name newpos) 245 | (parse-phrase string start end) 246 | (when display-name 247 | (setf start newpos) 248 | (when (parse-special #\: t) 249 | (multiple-value-bind (mailbox-list newpos) 250 | (parse-mailbox-list string start end require-domain) 251 | (setf start newpos) 252 | (when (parse-special #\; t) 253 | (values (list :group display-name mailbox-list) newpos))))))) 254 | 255 | (defun parse-mailbox-list (string start end require-domain) 256 | (let (res) 257 | (loop 258 | (multiple-value-bind (mailbox newpos) 259 | (parse-mailbox string start end require-domain) 260 | (if (null mailbox) 261 | (return)) 262 | (push mailbox res) 263 | (setf start newpos) 264 | (if (not (parse-special #\, t)) 265 | (return)))) 266 | (values (nreverse res) start))) 267 | 268 | (defun parse-name-addr (string start end require-domain) 269 | (multiple-value-bind (display-name newpos) 270 | (parse-phrase string start end) 271 | (if display-name 272 | (setf start newpos)) 273 | (multiple-value-bind (localpart domain newpos) 274 | (parse-angle-addr string start end require-domain) 275 | (when localpart 276 | (values t display-name localpart domain newpos))))) 277 | 278 | ;; This is obs-phrase, which is seen often. For example: 279 | ;; From: Mr. T 280 | (defun parse-phrase (string start end) 281 | (let ((first t) 282 | res type value newpos) 283 | (loop 284 | (multiple-value-setq (type value newpos) 285 | (rfc2822-lex string start end first)) 286 | (if* (or (eq type :atom) 287 | (eq type :quoted-string) 288 | (and (not first) (or (eq value #\.) (eq type :wsp)))) 289 | then (push value res) 290 | (setf first nil) 291 | (setf start newpos) 292 | else (return))) 293 | ;; Dump any trailing whitespace we collected 294 | (if (and (stringp res) (match-re "^\\s" (first res))) 295 | (pop res)) 296 | (if res 297 | (values (list-to-delimited-string (nreverse res) "") start)))) 298 | 299 | (defun parse-angle-addr (string start end require-domain) 300 | (when (parse-special #\< t) 301 | (multiple-value-bind (localpart domain newpos) 302 | (parse-addr-spec string start end require-domain) 303 | (setf start newpos) 304 | (when (and localpart (parse-special #\> t)) 305 | (values localpart domain start))))) 306 | 307 | (defun parse-addr-spec (string start end require-domain) 308 | (multiple-value-bind (localpart newpos) 309 | (parse-local-part string start end) 310 | (when localpart 311 | (setf start newpos) 312 | (when (not (parse-special #\@ t)) 313 | (if* require-domain 314 | then (return-from parse-addr-spec) 315 | else (return-from parse-addr-spec 316 | (values localpart nil start)))) 317 | (multiple-value-bind (domain newpos) 318 | (parse-dot-atom string start end) 319 | (when domain 320 | (values localpart domain newpos))))))) 321 | 322 | (defun parse-local-part (string start end) 323 | (multiple-value-bind (type value newpos) 324 | (rfc2822-lex string start end t) 325 | (if* (eq type :quoted-string) 326 | then (values value newpos) 327 | elseif (eq type :atom) 328 | then (parse-dot-atom string start end)))) 329 | 330 | (defun parse-dot-atom (string start end) 331 | (let ((first t) 332 | res) 333 | (loop 334 | (multiple-value-bind (type value newpos) 335 | (rfc2822-lex string start end first) 336 | (setf first nil) 337 | (if (null type) 338 | (return)) 339 | (if* (eq type :atom) 340 | then (push value res) 341 | elseif (not (eq value #\.)) 342 | then (return)) 343 | (setf start newpos))) 344 | (if res 345 | (values (list-to-delimited-string (nreverse res) #\.) start)))) 346 | 347 | (eval-when (compile) 348 | (defconstant *max-comment-nesting* 3) 349 | 350 | (defparameter *cchar* "(?:[^()\\\\]|\\\\.)") 351 | (defparameter *comment* nil) 352 | 353 | (dotimes (n *max-comment-nesting*) 354 | (if* *comment* 355 | then (setf *comment* (format nil "(?:\\((?:~a|~a)*\\))" 356 | *cchar* *comment*)) 357 | else (setf *comment* (format nil "(?:\\(~a*\\))" *cchar*)))) 358 | 359 | (setf *comment* (format nil "^~a" *comment*))) 360 | 361 | (defun rfc2822-lex (string start end skip-ws) 362 | (declare (optimize (speed 3)) 363 | (simple-string string) 364 | (fixnum start end)) 365 | (when (< start end) 366 | (let ((char (schar string start))) 367 | (if* (eq char #\") 368 | then ;; quoted string. 369 | (multiple-value-bind (matched whole) 370 | (match-re "^\"((?:[^\\\\\"]|\\\\.)*)\"" string 371 | :start start :end end 372 | :return :index) 373 | (if (not matched) 374 | (throw 'syntax-error "Unterminated quoted string")) 375 | (values :quoted-string 376 | (subseq string (car whole) (cdr whole)) 377 | (cdr whole))) 378 | elseif (or (eq char #\space) (eq char #\tab) 379 | (eq char #\return) (eq char #\newline)) 380 | then ;; whitespace 381 | (multiple-value-bind (x match) 382 | (match-re "^\\s+" string 383 | :start start :end end :return :index) 384 | (declare (ignore x)) 385 | (if* skip-ws 386 | then (rfc2822-lex string (cdr match) end t) 387 | else (values :wsp 388 | (subseq string (car match) (cdr match)) 389 | (cdr match)))) 390 | elseif (eq char #\() 391 | then ;; comment 392 | (multiple-value-bind (matched whole) 393 | (match-re #.*comment* string 394 | :start start :end end :return :index) 395 | (if (not matched) 396 | (throw 'syntax-error 397 | "Unterminated comment or nesting too deep")) 398 | (if* skip-ws 399 | then (rfc2822-lex string (cdr whole) end t) 400 | else (values :comment 401 | (subseq string (car whole) (cdr whole)) 402 | (cdr whole)))) 403 | else (multiple-value-bind (matched whole) 404 | (match-re *atom* string :start start :end end 405 | :return :index) 406 | (if* (not matched) 407 | then ;; must be a special 408 | (values :special 409 | char 410 | (1+ start)) 411 | else ;; atom 412 | (values :atom 413 | (subseq string (car whole) (cdr whole)) 414 | (cdr whole)))))))) 415 | 416 | #+ignore 417 | (defun test (file &key errorp (compact t) temp) 418 | (let ((seen-addrs (make-hash-table :test #'equal))) 419 | (with-open-file (f file) 420 | (let* ((part (net.post-office:parse-mime-structure f)) 421 | (hdrs (net.post-office:mime-part-headers part))) 422 | (dolist (type '("From" "To" "Cc")) 423 | (let ((hdr (cdr (assoc type hdrs :test #'equalp)))) 424 | (when (and hdr 425 | (string/= hdr "") 426 | (not (gethash hdr seen-addrs))) 427 | (setf (gethash hdr seen-addrs) t) 428 | (if* (setq temp 429 | (extract-email-addresses hdr :require-domain nil 430 | :errorp errorp 431 | :compact compact)) 432 | then (format t "have:~{ ~a~}~%" temp) 433 | else (format t "Failed to parse: ~s~%" hdr))))))))) 434 | 435 | ;; Ripped from maild:dns.cl and modified. 436 | 437 | (eval-when (compile load eval) 438 | (require :acldns)) 439 | 440 | ;; Only follows one CNAME lookup. If there is any more than that, the 441 | ;; domain has a really jacked up setup. 442 | 443 | ;; possible answers 444 | ;; t -- yes, there exists a record of that type. 445 | ;; nil -- no record of that type exists 446 | ;; :nxdomain -- the domain itself doesn't exist 447 | ;; :unknown -- couldn't get any answers. 448 | (defun dns-record-exists-p (domain type &key (try-cname t)) 449 | (block nil 450 | (let ((resp (socket:dns-query domain :decode nil :type type)) 451 | cname-item) 452 | (if (null resp) 453 | (return :unknown)) 454 | (let ((flags (socket:dns-response-flags resp)) 455 | (answer (socket:dns-response-answer resp))) 456 | (cond 457 | ((member :nameserver-internal-error flags) 458 | (return :unknown)) 459 | ((member :no-such-domain flags) 460 | (return :nxdomain)) 461 | ((null answer) 462 | (return nil)) ;; no records of that type for that name 463 | ((setq cname-item 464 | (find :cname answer :test #'eq :key #'socket:dns-rr-type)) 465 | (if* (not try-cname) 466 | then (return nil) 467 | else ;; There should only be one cname answer. 468 | (return (dns-record-exists-p (socket:dns-rr-answer cname-item) 469 | type :try-cname nil)))) 470 | (t 471 | t)))))) 472 | 473 | ;; A valid email domain is one that has an MX record or an A record 474 | ;; [or a CNAME to an MX or A record (illegal, but people do it)] 475 | 476 | ;; possible answers: 477 | ;; t -- there is either an MX or A record for that domain 478 | ;; nil -- there is neither an MX nor A record for that domain 479 | ; (possibly because the domain does not exist at all) 480 | ;; :unknown -- couldn't get answers 481 | (defun valid-email-domain-p (domain) 482 | (block nil 483 | (let ((res (dns-record-exists-p domain :mx))) 484 | (cond 485 | ((eq res t) 486 | (return t)) 487 | ((eq res :nxdomain) 488 | (return nil)) 489 | ((eq res :unknown) 490 | (return :unknown))) 491 | (setf res (dns-record-exists-p domain :a)) 492 | (cond 493 | ((eq res t) 494 | (return t)) 495 | ((eq res :nxdomain) 496 | (return nil)) 497 | ((eq res :unknown) 498 | (return :unknown))) 499 | nil))) 500 | 501 | (provide :rfc2822) 502 | -------------------------------------------------------------------------------- /rfc3696.txt: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Network Working Group J. Klensin 8 | Request for Comments: 3696 February 2004 9 | Category: Informational 10 | 11 | 12 | Application Techniques for Checking and Transformation of Names 13 | 14 | Status of this Memo 15 | 16 | This memo provides information for the Internet community. It does 17 | not specify an Internet standard of any kind. Distribution of this 18 | memo is unlimited. 19 | 20 | Copyright Notice 21 | 22 | Copyright (C) The Internet Society (2004). All Rights Reserved. 23 | 24 | Abstract 25 | 26 | Many Internet applications have been designed to deduce top-level 27 | domains (or other domain name labels) from partial information. The 28 | introduction of new top-level domains, especially non-country-code 29 | ones, has exposed flaws in some of the methods used by these 30 | applications. These flaws make it more difficult, or impossible, for 31 | users of the applications to access the full Internet. This memo 32 | discusses some of the techniques that have been used and gives some 33 | guidance for minimizing their negative impact as the domain name 34 | environment evolves. This document draws summaries of the applicable 35 | rules together in one place and supplies references to the actual 36 | standards. 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | Klensin Informational [Page 1] 59 | 60 | RFC 3696 Checking and Transformation of Names February 2004 61 | 62 | 63 | Table of Contents 64 | 65 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2 66 | 2. Restrictions on domain (DNS) names . . . . . . . . . . . . . . 3 67 | 3. Restrictions on email addresses . . . . . . . . . . . . . . . 5 68 | 4. URLs and URIs . . . . . . . . . . . . . . . . . . . . . . . . 7 69 | 4.1. URI syntax definitions and issues . . . . . . . . . . . 7 70 | 4.2. The HTTP URL . . . . . . . . . . . . . . . . . . . . . . 8 71 | 4.3. The MAILTO URL . . . . . . . . . . . . . . . . . . . . . 9 72 | 4.4. Guessing domain names in web contexts . . . . . . . . . 11 73 | 5. Implications of internationalization . . . . . . . . . . . . . 11 74 | 6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 75 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 76 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 77 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 78 | 9.1. Normative References . . . . . . . . . . . . . . . . . . 14 79 | 9.2. Informative References . . . . . . . . . . . . . . . . . 15 80 | 10. Author's Address . . . . . . . . . . . . . . . . . . . . . . . 15 81 | 11. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 16 82 | 83 | 1. Introduction 84 | 85 | Designers of user interfaces to Internet applications have often 86 | found it useful to examine user-provided values for validity before 87 | passing them to the Internet tools themselves. This type of test, 88 | most commonly involving syntax checks or application of other rules 89 | to domain names, email addresses, or "web addresses" (URLs or, 90 | occasionally, extended URI forms (see Section 4)) may enable better- 91 | quality diagnostics for the user than might be available from the 92 | protocol itself. Local validity tests on values are also thought to 93 | improve the efficiency of back-office processing programs and to 94 | reduce the load on the protocols themselves. Certainly, they are 95 | consistent with the well-established principle that it is better to 96 | detect errors as early as possible. 97 | 98 | The tests must, however, be made correctly or at least safely. If 99 | criteria are applied that do not match the protocols, users will be 100 | inconvenienced, addresses and sites will effectively become 101 | inaccessible to some groups, and business and communications 102 | opportunities will be lost. Experience in recent years indicates 103 | that syntax tests are often performed incorrectly and that tests for 104 | top-level domain names are applied using obsolete lists and 105 | conventions. We assume that most of these incorrect tests are the 106 | result of the inability to conveniently locate exact definitions for 107 | the criteria to be applied. This document draws summaries of the 108 | applicable rules together in one place and supplies references to the 109 | 110 | 111 | 112 | 113 | 114 | Klensin Informational [Page 2] 115 | 116 | RFC 3696 Checking and Transformation of Names February 2004 117 | 118 | 119 | actual standards. It does not add anything to those standards; it 120 | merely draws the information together into a form that may be more 121 | accessible. 122 | 123 | Many experts on Internet protocols believe that tests and rules of 124 | these sorts should be avoided in applications and that the tests in 125 | the protocols and back-office systems should be relied on instead. 126 | Certainly implementations of the protocols cannot assume that the 127 | data passed to them will be valid. Unless the standards specify 128 | particular behavior, this document takes no position on whether or 129 | not the testing is desirable. It only identifies the correct tests 130 | to be made if tests are to be applied. 131 | 132 | The sections that follow discuss domain names, email addresses, and 133 | URLs. 134 | 135 | 2. Restrictions on domain (DNS) names 136 | 137 | The authoritative definitions of the format and syntax of domain 138 | names appear in RFCs 1035 [RFC1035], 1123 [RFC1123], and 2181 139 | [RFC2181]. 140 | 141 | Any characters, or combination of bits (as octets), are permitted in 142 | DNS names. However, there is a preferred form that is required by 143 | most applications. This preferred form has been the only one 144 | permitted in the names of top-level domains, or TLDs. In general, it 145 | is also the only form permitted in most second-level names registered 146 | in TLDs, although some names that are normally not seen by users obey 147 | other rules. It derives from the original ARPANET rules for the 148 | naming of hosts (i.e., the "hostname" rule) and is perhaps better 149 | described as the "LDH rule", after the characters that it permits. 150 | The LDH rule, as updated, provides that the labels (words or strings 151 | separated by periods) that make up a domain name must consist of only 152 | the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen. 153 | No other symbols or punctuation characters are permitted, nor is 154 | blank space. If the hyphen is used, it is not permitted to appear at 155 | either the beginning or end of a label. There is an additional rule 156 | that essentially requires that top-level domain names not be all- 157 | numeric. 158 | 159 | When it is necessary to express labels with non-character octets, or 160 | to embed periods within labels, there is a mechanism for keying them 161 | in that utilizes an escape sequence. RFC 1035 [RFC1035] should be 162 | consulted if that mechanism is needed (most common applications, 163 | including email and the Web, will generally not permit those escaped 164 | strings). A special encoding is now available for non-ASCII 165 | characters, see the brief discussion in Section 5. 166 | 167 | 168 | 169 | 170 | Klensin Informational [Page 3] 171 | 172 | RFC 3696 Checking and Transformation of Names February 2004 173 | 174 | 175 | Most internet applications that reference other hosts or systems 176 | assume they will be supplied with "fully-qualified" domain names, 177 | i.e., ones that include all of the labels leading to the root, 178 | including the TLD name. Those fully-qualified domain names are then 179 | passed to either the domain name resolution protocol itself or to the 180 | remote systems. Consequently, purported DNS names to be used in 181 | applications and to locate resources generally must contain at least 182 | one period (".") character. Those that do not are either invalid or 183 | require the application to supply additional information. Of course, 184 | this principle does not apply when the purpose of the application is 185 | to process or query TLD names themselves. The DNS specification also 186 | permits a trailing period to be used to denote the root, e.g., 187 | "a.b.c" and "a.b.c." are equivalent, but the latter is more explicit 188 | and is required to be accepted by applications. This convention is 189 | especially important when a TLD name is being referred to directly. 190 | For example, while ".COM" has become the popular terminology for 191 | referring to that top-level domain, "COM." would be strictly and 192 | technically correct in talking about the DNS, since it shows that 193 | "COM" is a top-level domain name. 194 | 195 | There is a long history of applications moving beyond the "one or 196 | more periods" test in an attempt to verify that a valid TLD name is 197 | actually present. They have done this either by applying some 198 | heuristics to the form of the name or by consulting a local list of 199 | valid names. The historical heuristics are no longer effective. If 200 | one is to keep a local list, much more effort must be devoted to 201 | keeping it up-to-date than was the case several years ago. 202 | 203 | The heuristics were based on the observation that, since the DNS was 204 | first deployed, all top-level domain names were two, three, or four 205 | characters in length. All two-character names were associated with 206 | "country code" domains, with the specific labels (with a few early 207 | exceptions) drawn from the ISO list of codes for countries and 208 | similar entities [IS3166]. The three-letter names were "generic" 209 | TLDs, whose function was not country-specific, and there was exactly 210 | one four-letter TLD, the infrastructure domain "ARPA." [RFC1591]. 211 | However, these length-dependent rules were conventions, rather than 212 | anything on which the protocols depended. 213 | 214 | Before the mid-1990s, lists of valid top-level domain names changed 215 | infrequently. New country codes were gradually, and then more 216 | rapidly, added as the Internet expanded, but the list of generic 217 | domains did not change at all between the establishment of the "INT." 218 | domain in 1988 and ICANN's allocation of new generic TLDs in 2000. 219 | Some application developers responded by assuming that any two-letter 220 | domain name could be valid as a TLD, but the list of generic TLDs was 221 | fixed and could be kept locally and tested. Several of these 222 | assumptions changed as ICANN started to allocate new top-level 223 | 224 | 225 | 226 | Klensin Informational [Page 4] 227 | 228 | RFC 3696 Checking and Transformation of Names February 2004 229 | 230 | 231 | domains: one two-letter domain that does not appear in the ISO 3166-1 232 | table [ISO.3166.1988] was tentatively approved, and new domains were 233 | created with three, four, and even six letter codes. 234 | 235 | As of the first quarter of 2003, the list of valid, non-country, 236 | top-level domains was .AERO, .BIZ, .COM, .COOP, .EDU, .GOV, .INFO, 237 | .INT, .MIL, .MUSEUM, .NAME, .NET, .ORG, .PRO, and .ARPA. ICANN is 238 | expected to expand that list at regular intervals, so the list that 239 | appears here should not be used in testing. Instead, systems that 240 | filter by testing top-level domain names should regularly update 241 | their local tables of TLDs (both "generic" and country-code-related) 242 | by polling the list published by IANA [DomainList]. It is 243 | likely that the better strategy has now become to make the "at least 244 | one period" test, to verify LDH conformance (including verification 245 | that the apparent TLD name is not all-numeric), and then to use the 246 | DNS to determine domain name validity, rather than trying to maintain 247 | a local list of valid TLD names. 248 | 249 | A DNS label may be no more than 63 octets long. This is in the form 250 | actually stored; if a non-ASCII label is converted to encoded 251 | "punycode" form (see Section 5), the length of that form may restrict 252 | the number of actual characters (in the original character set) that 253 | can be accommodated. A complete, fully-qualified, domain name must 254 | not exceed 255 octets. 255 | 256 | Some additional mechanisms for guessing correct domain names when 257 | incomplete information is provided have been developed for use with 258 | the web and are discussed in Section 4.4. 259 | 260 | 3. Restrictions on email addresses 261 | 262 | Reference documents: RFC 2821 [RFC2821] and RFC 2822 [RFC2822] 263 | 264 | Contemporary email addresses consist of a "local part" separated from 265 | a "domain part" (a fully-qualified domain name) by an at-sign ("@"). 266 | The syntax of the domain part corresponds to that in the previous 267 | section. The concerns identified in that section about filtering and 268 | lists of names apply to the domain names used in an email context as 269 | well. The domain name can also be replaced by an IP address in 270 | square brackets, but that form is strongly discouraged except for 271 | testing and troubleshooting purposes. 272 | 273 | The local part may appear using the quoting conventions described 274 | below. The quoted forms are rarely used in practice, but are 275 | required for some legitimate purposes. Hence, they should not be 276 | rejected in filtering routines but, should instead be passed to the 277 | email system for evaluation by the destination host. 278 | 279 | 280 | 281 | 282 | Klensin Informational [Page 5] 283 | 284 | RFC 3696 Checking and Transformation of Names February 2004 285 | 286 | 287 | The exact rule is that any ASCII character, including control 288 | characters, may appear quoted, or in a quoted string. When quoting 289 | is needed, the backslash character is used to quote the following 290 | character. For example 291 | 292 | Abc\@def@example.com 293 | 294 | is a valid form of an email address. Blank spaces may also appear, 295 | as in 296 | 297 | Fred\ Bloggs@example.com 298 | 299 | The backslash character may also be used to quote itself, e.g., 300 | 301 | Joe.\\Blow@example.com 302 | 303 | In addition to quoting using the backslash character, conventional 304 | double-quote characters may be used to surround strings. For example 305 | 306 | "Abc@def"@example.com 307 | 308 | "Fred Bloggs"@example.com 309 | 310 | are alternate forms of the first two examples above. These quoted 311 | forms are rarely recommended, and are uncommon in practice, but, as 312 | discussed above, must be supported by applications that are 313 | processing email addresses. In particular, the quoted forms often 314 | appear in the context of addresses associated with transitions from 315 | other systems and contexts; those transitional requirements do still 316 | arise and, since a system that accepts a user-provided email address 317 | cannot "know" whether that address is associated with a legacy 318 | system, the address forms must be accepted and passed into the email 319 | environment. 320 | 321 | Without quotes, local-parts may consist of any combination of 322 | alphabetic characters, digits, or any of the special characters 323 | 324 | ! # $ % & ' * + - / = ? ^ _ ` . { | } ~ 325 | 326 | period (".") may also appear, but may not be used to start or end the 327 | local part, nor may two or more consecutive periods appear. Stated 328 | differently, any ASCII graphic (printing) character other than the 329 | at-sign ("@"), backslash, double quote, comma, or square brackets may 330 | appear without quoting. If any of that list of excluded characters 331 | are to appear, they must be quoted. Forms such as 332 | 333 | user+mailbox@example.com 334 | 335 | 336 | 337 | 338 | Klensin Informational [Page 6] 339 | 340 | RFC 3696 Checking and Transformation of Names February 2004 341 | 342 | 343 | customer/department=shipping@example.com 344 | 345 | $A12345@example.com 346 | 347 | !def!xyz%abc@example.com 348 | 349 | _somename@example.com 350 | 351 | are valid and are seen fairly regularly, but any of the characters 352 | listed above are permitted. In the context of local parts, 353 | apostrophe ("'") and acute accent ("`") are ordinary characters, not 354 | quoting characters. Some of the characters listed above are used in 355 | conventions about routing or other types of special handling by some 356 | receiving hosts. But, since there is no way to know whether the 357 | remote host is using those conventions or just treating these 358 | characters as normal text, sending programs (and programs evaluating 359 | address validity) must simply accept the strings and pass them on. 360 | 361 | In addition to restrictions on syntax, there is a length limit on 362 | email addresses. That limit is a maximum of 64 characters (octets) 363 | in the "local part" (before the "@") and a maximum of 255 characters 364 | (octets) in the domain part (after the "@") for a total length of 320 365 | characters. Systems that handle email should be prepared to process 366 | addresses which are that long, even though they are rarely 367 | encountered. 368 | 369 | 4. URLs and URIs 370 | 371 | 4.1. URI syntax definitions and issues 372 | 373 | The syntax for URLs (Uniform Resource Locators) is specified in 374 | [RFC1738]. The syntax for the more general "URI" (Uniform Resource 375 | Identifier) is specified in [RFC2396]. The URI syntax is extremely 376 | general, with considerable variations permitted according to the type 377 | of "scheme" (e.g., "http", "ftp", "mailto") that is being used. 378 | While it is possible to use the general syntax rules of RFC 2396 to 379 | perform syntax checks, they are general enough --essentially only 380 | specifying the separation of the scheme name and "scheme specific 381 | part" with a colon (":") and excluding some characters that must be 382 | escaped if used-- to provide little significant filtering or 383 | validation power. 384 | 385 | The following characters are reserved in many URIs -- they must be 386 | used for either their URI-intended purpose or must be encoded. Some 387 | particular schemes may either broaden or relax these restrictions 388 | (see the following sections for URLs applicable to "web pages" and 389 | electronic mail), or apply them only to particular URI component 390 | parts. 391 | 392 | 393 | 394 | Klensin Informational [Page 7] 395 | 396 | RFC 3696 Checking and Transformation of Names February 2004 397 | 398 | 399 | ; / ? : @ & = + $ , ? 400 | 401 | In addition, control characters, the space character, the double- 402 | quote (") character, and the following special characters 403 | 404 | < > # % 405 | 406 | are generally forbidden and must either be avoided or escaped, as 407 | discussed below. 408 | 409 | The colon after the scheme name, and the percent sign used to escape 410 | characters, are specifically reserved for those purposes, although 411 | ":" may also be used elsewhere in some schemes. 412 | 413 | When it is necessary to encode these, or other, characters, the 414 | method used is to replace it with a percent-sign ("%") followed by 415 | two hexidecimal digits representing its octet value. See section 416 | 2.4.1 of [RFC2396] for an exact definition. Unless it is used as a 417 | delimiter of the URI scheme itself, any character may optionally be 418 | encoded this way; systems that are testing URI syntax should be 419 | prepared for these encodings to appear in any component of the URI 420 | except the scheme name itself. 421 | 422 | A "generic URI" syntax is specified and is more restrictive, but 423 | using it to test URI strings requires that one know whether or not 424 | the particular scheme in use obeys that syntax. Consequently, 425 | applications that intend to check or validate URIs should normally 426 | identify the scheme name and then apply scheme-specific tests. The 427 | rules for two of those -- HTTP [RFC1738] and MAILTO [RFC2368] URLs -- 428 | are discussed below, but the author of an application which intends 429 | to make very precise checks, or to reject particular syntax rather 430 | than just warning the user, should consult the relevant scheme- 431 | definition documents for precise syntax and relationships. 432 | 433 | 4.2. The HTTP URL 434 | 435 | Absolute HTTP URLs consist of the scheme name, a host name (expressed 436 | as a domain name or IP address), and optional port number, and then, 437 | optionally, a path, a search part, and a fragment identifier. These 438 | are separated, respectively, by a colon and the two slashes that 439 | precede the host name, a colon, a slash, a question mark, and a hash 440 | mark ("#"). So we have 441 | 442 | http://host:port/path?search#fragment 443 | 444 | http://host/path/ 445 | 446 | http://host/path#fragment 447 | 448 | 449 | 450 | Klensin Informational [Page 8] 451 | 452 | RFC 3696 Checking and Transformation of Names February 2004 453 | 454 | 455 | http://host/path?search 456 | 457 | http://host 458 | 459 | and other variations on that form. There is also a "relative" form, 460 | but it almost never appears in text that a user might, e.g., enter 461 | into a form. See [RFC2616] for details. 462 | 463 | The characters 464 | 465 | / ; ? 466 | 467 | are reserved within the path and search parts and must be encoded; 468 | the first of these may be used unencoded, and is often used within 469 | the path, to designate hierarchy. 470 | 471 | 4.3. The MAILTO URL 472 | 473 | MAILTO is a URL type whose content is an email address. It can be 474 | used to encode any of the email address formats discussed in Section 475 | 3 above. It can also support multiple addresses and the inclusion of 476 | headers (e.g., Subject lines) within the body of the URL. MAILTO is 477 | authoritatively defined in RFC 2368 [RFC2368]; anyone expecting to 478 | accept and test multiple addresses or mail header or body formats 479 | should consult that document carefully. 480 | 481 | In accepting text for, or validating, a MAILTO URL, it is important 482 | to note that, while it can be used to encode any valid email address, 483 | it is not sufficient to copy an email address into a MAILTO URL since 484 | email addresses may include a number of characters that are invalid 485 | in, or have reserved uses for, URLs. Those characters must be 486 | encoded, as outlined in Section 4.1 above, when the addresses are 487 | mapped into the URL form. Conversely, addresses in MAILTO URLs 488 | cannot, in general, be copied directly into email contexts, since few 489 | email programs will reverse the decodings (and doing so might be 490 | interpreted as a protocol violation). 491 | 492 | The following characters may appear in MAILTO URLs only with the 493 | specific defined meanings given. If they appear in an email address 494 | (i.e., for some other purpose), they must be encoded: 495 | 496 | : The colon in "mailto:" 497 | 498 | < > # " % { } | \ ^ ~ ` 499 | 500 | These characters are "unsafe" in any URL, and must always be 501 | encoded. 502 | 503 | 504 | 505 | 506 | Klensin Informational [Page 9] 507 | 508 | RFC 3696 Checking and Transformation of Names February 2004 509 | 510 | 511 | The following characters must also be encoded if they appear in a 512 | MAILTO URL 513 | 514 | ? & = 515 | Used to delimit headers and their values when these are encoded 516 | into URLs. 517 | 518 | Some examples may be helpful: 519 | 520 | +-------------------------+-----------------------------+-----------+ 521 | | Email address | MAILTO URL | Notes | 522 | +-------------------------+-----------------------------+-----------+ 523 | | Joe@example.com | mailto:joe@example.com | 1 | 524 | | | | | 525 | | user+mailbox@example | mailto: | 2 | 526 | | .com | user%2Bmailbox@example | | 527 | | | .com | | 528 | | | | | 529 | | customer/department= | mailto:customer%2F | 3 | 530 | | shipping@example.com | department=shipping@example | | 531 | | | .com | | 532 | | | | | 533 | | $A12345@example.com | mailto:$A12345@example | 4 | 534 | | | .com | | 535 | | | | | 536 | | !def!xyz%abc@example | mailto:!def!xyz%25abc | 5 | 537 | | .com | @example.com | | 538 | | | | | 539 | | _somename@example.com | mailto:_somename@example | 4 | 540 | | | .com | | 541 | +-------------------------+-----------------------------+-----------+ 542 | 543 | Table 1 544 | 545 | Notes on Table 546 | 547 | 1. No characters appear in the email address that require escaping, 548 | so the body of the MAILTO URL is identical to the email address. 549 | 550 | 2. There is actually some uncertainty as to whether or not the "+" 551 | characters requires escaping in MAILTO URLs (the standards are 552 | not precisely clear). But, since any character in the address 553 | specification may optionally be encoded, it is probably safer to 554 | encode it. 555 | 556 | 3. The "/" character is generally reserved in URLs, and must be 557 | encoded as %2F. 558 | 559 | 560 | 561 | 562 | Klensin Informational [Page 10] 563 | 564 | RFC 3696 Checking and Transformation of Names February 2004 565 | 566 | 567 | 4. Neither the "$" nor the "_" character are given any special 568 | interpretation in MAILTO URLs, so need not be encoded. 569 | 570 | 5. While the "!" character has no special interpretation, the "%" 571 | character is used to introduce encoded sequences and hence it 572 | must always be encoded. 573 | 574 | 4.4. Guessing domain names in web contexts 575 | 576 | Several web browsers have adopted a practice that permits an 577 | incomplete domain name to be used as input instead of a complete URL. 578 | This has, for example, permitted users to type "microsoft" and have 579 | the browser interpret the input as "http://www.microsoft.com/". 580 | Other browser versions have gone even further, trying to build DNS 581 | names up through a series of heuristics, testing each variation in 582 | turn to see if it appears in the DNS, and accepting the first one 583 | found as the intended domain name. Still, others automatically 584 | invoke search engines if no period appears or if the reference fails. 585 | If any of these approaches are to be used, it is often critical that 586 | the browser recognize the complete list of TLDs. If an incomplete 587 | list is used, complete domain names may not be recognized as such and 588 | the system may try to turn them into completely different names. For 589 | example, "example.aero" is a fully-qualified name, since "AERO." is a 590 | TLD name. But, if the system doesn't recognize "AERO" as a TLD name, 591 | it is likely to try to look up "example.aero.com" and 592 | "www.example.aero.com" (and then fail or find the wrong host), rather 593 | than simply looking up the user-supplied name. 594 | 595 | As discussed in Section 2 above, there are dangers associated with 596 | software that attempts to "know" the list of top-level domain names 597 | locally and take advantage of that knowledge. These name-guessing 598 | heuristics are another example of that situation: if the lists are 599 | up-to-date and used carefully, the systems in which they are embedded 600 | may provide an easier, and more attractive, experience for at least 601 | some users. But finding the wrong host, or being unable to find a 602 | host even when its name is precisely known, constitute bad 603 | experiences by any measure. 604 | 605 | More generally, there have been bad experiences with attempts to 606 | "complete" domain names by adding additional information to them. 607 | These issues are described in some detail in RFC 1535 [RFC1535]. 608 | 609 | 5. Implications of internationalization 610 | 611 | The IETF has adopted a series of proposals ([RFC3490] - [RFC3492]) 612 | whose purpose is to permit encoding internationalized (i.e., non- 613 | ASCII) names in the DNS. The primary standard, and the group 614 | generically, are known as "IDNA". The actual strings stored in the 615 | 616 | 617 | 618 | Klensin Informational [Page 11] 619 | 620 | RFC 3696 Checking and Transformation of Names February 2004 621 | 622 | 623 | DNS are in an encoded form: the labels begin with the characters 624 | "xn--" followed by the encoded string. Applications should be 625 | prepared to accept and process the encoded form (those strings are 626 | consistent with the "LDH rule" (see Section 2) so should not raise 627 | any separate issues) and the use of local, and potentially other, 628 | characters as appropriate to local systems and circumstances. 629 | 630 | The IDNA specification describes the exact process to be used to 631 | validate a name or encoded string. The process is sufficiently 632 | complex that shortcuts or heuristics, especially for versions of 633 | labels written directly in Unicode or other coded character sets, are 634 | likely to fail and cause problems. In particular, the strings cannot 635 | be validated with syntax or semantic rules of any of the usual sorts: 636 | syntax validity is defined only in terms of the result of executing a 637 | particular function. 638 | 639 | In addition to the restrictions imposed by the protocols themselves, 640 | many domains are implementing rules about just which non-ASCII names 641 | they will permit to be registered (see, e.g., [JET], [RegRestr]). 642 | This work is still relatively new, and the rules and conventions are 643 | likely to be different for each domain, or at least each language or 644 | script group. Attempting to test for those rules in a client program 645 | to see if a user-supplied name might possibly exist in the relevant 646 | domain would almost certainly be ill-advised. 647 | 648 | One quick local test however, may be reasonable: as of the time of 649 | this writing, there should be no instances of labels in the DNS that 650 | start with two characters, followed by two hyphens, where the two 651 | characters are not "xn" (in, of course, either upper or lower case). 652 | Such label strings, if they appear, are probably erroneous or 653 | obsolete, and it may be reasonable to at least warn the user about 654 | them. 655 | 656 | There is ongoing work in the IETF and elsewhere to define 657 | internationalized formats for use in other protocols, including email 658 | addresses. Those forms may or may not conform to existing rules for 659 | ASCII-only identifiers; anyone designing evaluators or filters should 660 | watch that work closely. 661 | 662 | 6. Summary 663 | 664 | When an application accepts a string from the user and ultimately 665 | passes it on to an API for a protocol, the desirability of testing or 666 | filtering the text in any way not required by the protocol itself is 667 | hotly debated. If it must divide the string into its components, or 668 | otherwise interpret it, it obviously must make at least enough tests 669 | to validate that process. With, e.g., domain names or email 670 | addresses that can be passed on untouched, the appropriateness of 671 | 672 | 673 | 674 | Klensin Informational [Page 12] 675 | 676 | RFC 3696 Checking and Transformation of Names February 2004 677 | 678 | 679 | trying to figure out which ones are valid and which ones are not 680 | requires a more complex decision, one that should include 681 | considerations of how to make exactly the correct tests and to keep 682 | information that changes and evolves up-to-date. A test containing 683 | obsolete information, can be extremely frustrating for potential 684 | correspondents or customers and may harm desired relationships. 685 | 686 | 7. Security Considerations 687 | 688 | Since this document merely summarizes the requirements of existing 689 | standards, it does not introduce any new security issues. However, 690 | many of the techniques that motivate the document raise important 691 | security concerns of their own. Rejecting valid forms of domain 692 | names, email addresses, or URIs often denies service to the user of 693 | those entities. Worse, guessing at the user's intent when an 694 | incomplete address, or other string, is given can result in 695 | compromises to privacy or accuracy of reference if the wrong target 696 | is found and returned. From a security standpoint, the optimum 697 | behavior is probably to never guess, but instead, to force the user 698 | to specify exactly what is wanted. When that position involves a 699 | tradeoff with an acceptable user experience, good judgment should be 700 | used and the fact that it is a tradeoff recognized. 701 | 702 | Some characters have special or privileged meanings on some systems 703 | (i.e., ` on Unix). Applications should be careful to escape those 704 | locally if necessary. By the same token, they are valid, and should 705 | not be disallowed locally, or escaped when transmitted through 706 | Internet protocols, for such reasons if a remote site chooses to use 707 | them. 708 | 709 | The presence of local checking does not permit remote checking to be 710 | bypassed. Note that this can apply to a single machine; in 711 | particular, a local MTA should not assume that a local MUA has 712 | properly escaped locally-significant special characters. 713 | 714 | 8. Acknowledgements 715 | 716 | The author would like to express his appreciation for helpful 717 | comments from Harald Alvestrand, Eric A. Hall, and the RFC Editor, 718 | and for partial support of this work from SITA. Responsibility for 719 | any errors remains, of course, with the author. 720 | 721 | The first Internet-Draft on this subject was posted in February 2003. 722 | The document was submitted to the RFC Editor on 20 June 2003, 723 | returned for revisions on 19 August, and resubmitted on 5 September 724 | 2003. 725 | 726 | 727 | 728 | 729 | 730 | Klensin Informational [Page 13] 731 | 732 | RFC 3696 Checking and Transformation of Names February 2004 733 | 734 | 735 | 9. References 736 | 737 | 9.1. Normative References 738 | 739 | [RFC1035] Mockapetris, P., "Domain names - implementation and 740 | specification", STD 13, RFC 1035, November 1987. 741 | 742 | [RFC1123] Braden, R., Ed., "Requirements for Internet Hosts - 743 | Application and Support", STD 3, RFC 1123, October 744 | 1989. 745 | 746 | [RFC1535] Gavron, E., "A Security Problem and Proposed 747 | Correction With Widely Deployed DNS Software", RFC 748 | 1535, October 1993. 749 | 750 | [RFC1738] Berners-Lee, T., Masinter, L. and M. McCahill, 751 | "Uniform Resource Locators (URL)", RFC 1738, December 752 | 1994. 753 | 754 | [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS 755 | Specification", RFC 2181, July 1997. 756 | 757 | [RFC2368] Hoffman, P., Masinter, L. and J. Zawinski, "The 758 | mailto URL scheme", RFC 2368, July 1998. 759 | 760 | [RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, 761 | "Uniform Resource Identifiers (URI): Generic Syntax", 762 | RFC 2396, August 1998. 763 | 764 | [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 765 | Masinter, L., Leach, P. and T. Berners-Lee, 766 | "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, 767 | June 1999. 768 | 769 | [RFC2821] Klensin, J., Ed., "Simple Mail Transfer Protocol", 770 | RFC 2821, April 2001. 771 | 772 | [RFC2822] Resnick, P., Ed., "Internet Message Format", RFC 773 | 2822, April 2001. 774 | 775 | [RFC3490] Faltstrom, P., Hoffman, P. and A. Costello, 776 | "Internationalizing Domain Names in Applications 777 | (IDNA)", RFC 3490, March 2003. 778 | 779 | [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 780 | Profile for Internationalized Domain Names (IDN)", 781 | RFC 3491, March 2003. 782 | 783 | 784 | 785 | 786 | Klensin Informational [Page 14] 787 | 788 | RFC 3696 Checking and Transformation of Names February 2004 789 | 790 | 791 | [RFC3492] Costello, A., "Punycode: A Bootstring encoding of 792 | Unicode for Internationalized Domain Names in 793 | Applications (IDNA)", RFC 3492, March 2003. 794 | 795 | [ASCII] American National Standards Institute (formerly 796 | United States of America Standards Institute), "USA 797 | Code for Information Interchange", ANSI X3.4-1968. 798 | ANSI X3.4-1968 has been replaced by newer versions 799 | with slight modifications, but the 1968 version 800 | remains definitive for the Internet. 801 | 802 | [DomainList] Internet Assigned Numbers Authority (IANA), Untitled 803 | alphabetical list of current top-level domains. 804 | http://data.iana.org/TLD/tlds-alpha-by-domain.txt 805 | ftp://data.iana.org/TLD/tlds-alpha-by-domain.txt 806 | 807 | 9.2. Informative References 808 | 809 | [ISO.3166.1988] International Organization for Standardization, 810 | "Codes for the representation of names of countries, 811 | 3rd edition", ISO Standard 3166, August 1988. 812 | 813 | [JET] Konishi, K., et al., "Internationalized Domain Names 814 | Registration and Administration Guideline for 815 | Chinese, Japanese and Korean", Work in Progress. 816 | 817 | [RFC1591] Postel, J., "Domain Name System Structure and 818 | Delegation", RFC 1591, March 1994. 819 | 820 | [RegRestr] Klensin, J., "Registration of Internationalized 821 | Domain Names: Overview and Method", Work in Progress, 822 | February 2004. 823 | 824 | 10. Author's Address 825 | 826 | John C Klensin 827 | 1770 Massachusetts Ave, #322 828 | Cambridge, MA 02140 829 | USA 830 | 831 | Phone: +1 617 491 5735 832 | EMail: john-ietf@jck.com 833 | 834 | 835 | 836 | 837 | 838 | 839 | 840 | 841 | 842 | Klensin Informational [Page 15] 843 | 844 | RFC 3696 Checking and Transformation of Names February 2004 845 | 846 | 847 | 11. Full Copyright Statement 848 | 849 | Copyright (C) The Internet Society (2004). This document is subject 850 | to the rights, licenses and restrictions contained in BCP 78 and 851 | except as set forth therein, the authors retain all their rights. 852 | 853 | This document and the information contained herein are provided on an 854 | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 855 | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 856 | ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 857 | INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 858 | INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 859 | WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 860 | 861 | Intellectual Property 862 | 863 | The IETF takes no position regarding the validity or scope of any 864 | Intellectual Property Rights or other rights that might be claimed to 865 | pertain to the implementation or use of the technology described in 866 | this document or the extent to which any license under such rights 867 | might or might not be available; nor does it represent that it has 868 | made any independent effort to identify any such rights. Information 869 | on the procedures with respect to rights in RFC documents can be 870 | found in BCP 78 and BCP 79. 871 | 872 | Copies of IPR disclosures made to the IETF Secretariat and any 873 | assurances of licenses to be made available, or the result of an 874 | attempt made to obtain a general license or permission for the use of 875 | such proprietary rights by implementers or users of this 876 | specification can be obtained from the IETF on-line IPR repository at 877 | http://www.ietf.org/ipr. 878 | 879 | The IETF invites any interested party to bring to its attention any 880 | copyrights, patents or patent applications, or other proprietary 881 | rights that may cover technology that may be required to implement 882 | this standard. Please address the information to the IETF at ietf- 883 | ipr@ietf.org. 884 | 885 | Acknowledgement 886 | 887 | Funding for the RFC Editor function is currently provided by the 888 | Internet Society. 889 | 890 | 891 | 892 | 893 | 894 | 895 | 896 | 897 | 898 | Klensin Informational [Page 16] 899 | 900 | -------------------------------------------------------------------------------- /smtp.cl: -------------------------------------------------------------------------------- 1 | ;; -*- mode: common-lisp; package: net.post-office -*- 2 | ;; send mail to an smtp server. 3 | ;; Originally, we used rfc821, but rfc5321/rfc5322 are now the definitive 4 | ;; resource for this code. 5 | ;; 6 | ;; See the file LICENSE for the full license governing this code. 7 | 8 | #+(version= 10 1) 9 | (sys:defpatch "smtp" 2 10 | "v2: allow SSL options to connect-to-mail-server to be changed, default to :tlsv1.2; 11 | v1: send-letter: fold header lines per rfc5322." 12 | :type :system 13 | :post-loadable t) 14 | 15 | (eval-when (compile eval load) 16 | (require :osi)) 17 | 18 | (defpackage :net.post-office 19 | (:use #:lisp #:excl #:excl.osi) 20 | (:export 21 | #:send-letter 22 | #:send-smtp 23 | #:send-smtp-auth 24 | #:test-email-address)) 25 | 26 | (in-package :net.post-office) 27 | 28 | (eval-when (compile load eval) 29 | (require :streamp) 30 | (require :sasl) 31 | (require :mime)) 32 | 33 | ;; the exported functions: 34 | 35 | ;; (send-letter "mail-server" "from" "to" "message" 36 | ;; &key cc bcc subject reply-to headers) 37 | ;; 38 | ;; 39 | ;; sends a message to the mail server (which may be a relay server 40 | ;; or the final destination). "from" is the address to be given 41 | ;; as the sender. "to" can be a string or a list of strings naming 42 | ;; recipients. 43 | ;; "message" is the message to be sent. It can be a string or a stream. 44 | ;; cc and bcc can be either be a string or a list of strings 45 | ;; naming recipients. All cc's and bcc's are sent the message 46 | ;; but the bcc's aren't included in the header created. 47 | ;; reply-to's value is a string and in cases a Reply-To header 48 | ;; to be created. 49 | ;; headers is a string or list of stings. These are raw header lines 50 | ;; added to the header build to send out. 51 | ;; 52 | ;; This builds a header and inserts the optional cc, bcc, 53 | ;; subject and reply-to lines. 54 | ;; 55 | ;; (send-smtp "mail-server" "from" "to" &rest messages) 56 | ;; this is like send-letter except that it doesn't build a header. 57 | ;; the messages should contain a header (and if not then sendmail 58 | ;; notices this and builds one -- other MTAs may not be that smart). 59 | ;; The messages ia list of strings or streams to be concatenated together 60 | ;; and sent as one message 61 | ;; 62 | ;; 63 | ;; (test-email-address "user@machine.com") 64 | ;; return t is this could be a valid email address on the machine 65 | ;; named. Do this by contacting the mail server and using the VRFY 66 | ;; command from smtp. Since some mail servers don't implement VRFY 67 | ;; we return t if VRFY doesn't work. 68 | ;; nil means that this address is bad (or we can't make contact with 69 | ;; the mail server, which could of course be a transient problem). 70 | ;; 71 | 72 | 73 | 74 | 75 | 76 | (defmacro response-case ((smtp-stream &optional smtp-response response-code) &rest case-clauses) 77 | ;; get a response from the smtp server and dispatch in a 'case' like 78 | ;; fashion to a clause based on the first digit of the return 79 | ;; code of the response. 80 | ;; smtp-response, if given, will be bound to string that is 81 | ;; the actual response 82 | ;; 83 | (let ((response-class (gensym))) 84 | `(multiple-value-bind (,response-class 85 | ,@(if* smtp-response then (list smtp-response)) 86 | ,@(if* response-code then (list response-code))) 87 | (progn (force-output ,smtp-stream) 88 | (wait-for-response ,smtp-stream)) 89 | ;;(declare (ignorable smtp-response)) 90 | (case ,response-class 91 | ,@case-clauses)))) 92 | 93 | (defmacro smtp-send-recv ((smtp-stream cmd smtp-response &optional response-code) &rest case-clauses) 94 | (let ((stream (gensym)) 95 | (sent (gensym))) 96 | `(let ((,stream ,smtp-stream) 97 | (,sent ,cmd)) 98 | (if* *smtp-debug* 99 | then (format *smtp-debug* "to smtp command: ~s~%" ,sent) 100 | (force-output *smtp-debug*)) 101 | (write-string ,sent ,stream) 102 | (write-char #\return ,stream) 103 | (write-char #\newline ,stream) 104 | (force-output ,stream) 105 | (macrolet ((smtp-transaction-error () 106 | (list 107 | 'error 108 | "SMTP transaction failed. We said: ~s, and the server replied: ~s" 109 | (quote ,sent) 110 | (quote ,smtp-response)))) 111 | 112 | (response-case (,stream ,smtp-response ,response-code) 113 | ,@case-clauses))))) 114 | 115 | (defvar *smtp-debug* nil) 116 | 117 | 118 | 119 | (defun send-letter (server from to message 120 | &key cc bcc subject reply-to headers 121 | login password attachments) 122 | ;; 123 | ;; see documentation at the head of this file 124 | ;; 125 | 126 | (if* (mime-part-constructed-p message) 127 | then (if* (and (not (multipart-mixed-p message)) attachments) 128 | then (error "~ 129 | attachments are not allowed for non-multipart/mixed messages.")) 130 | else (let ((part 131 | (if* (streamp message) 132 | then 133 | (make-mime-part :file message) 134 | elseif (stringp message) 135 | then (make-mime-part :text message) 136 | else (error "~ 137 | message must be a string, stream, or mime-part-constructed, not ~s" message)))) 138 | 139 | (setf message 140 | (if* attachments 141 | then (make-mime-part :subparts (list part)) 142 | else part)))) 143 | 144 | (let ((hdrs nil) 145 | (user-headers "") 146 | (tos (if* (stringp to) 147 | then (list to) 148 | elseif (consp to) 149 | then to 150 | else (error "to should be a string or list, not ~s" to))) 151 | (ccs 152 | (if* (null cc) 153 | then nil 154 | elseif (stringp cc) 155 | then (list cc) 156 | elseif (consp cc) 157 | then cc 158 | else (error "cc should be a string or list, not ~s" cc))) 159 | (bccs (if* (null bcc) 160 | then nil 161 | elseif (stringp bcc) 162 | then (list bcc) 163 | elseif (consp bcc) 164 | then bcc 165 | else (error "bcc should be a string or list, not ~s" bcc)))) 166 | 167 | (setf hdrs 168 | (with-output-to-string (hdrs) 169 | (macrolet ((already-have (name) 170 | `(mime-get-header ,name message))) 171 | 172 | ;; Give priority to headers already provided in a mime-part. 173 | (if* (not (already-have "From")) 174 | then (format hdrs "From: ~a~%" from)) 175 | 176 | (if* (not (already-have "To")) 177 | then (format hdrs "To: ~a~%" (fold-addresses tos))) 178 | 179 | (if* (and ccs (not (already-have "Cc"))) 180 | then (format hdrs "Cc: ~a~%" (fold-addresses ccs))) 181 | 182 | (if* (and subject (not (already-have "Subject"))) 183 | then (format hdrs "Subject: ~a~%" subject)) 184 | 185 | (if* (and reply-to (not (already-have "Reply-To"))) 186 | then (format hdrs "Reply-To: ~a~%" reply-to))))) 187 | 188 | (if* headers 189 | then (if* (stringp headers) 190 | then (setq headers (list headers)) 191 | elseif (consp headers) 192 | thenret 193 | else (error "Unknown headers format: ~s." headers)) 194 | (setf user-headers 195 | (with-output-to-string (header) 196 | (dolist (h headers) 197 | (format header "~a~%" h))))) 198 | 199 | ;; Temporarily modifies 'message', which may be user-provided. 200 | (let ((parts-save (mime-part-parts message))) 201 | (if* attachments 202 | then (if (not (consp attachments)) 203 | (setf attachments (list attachments))) 204 | 205 | (let (res) 206 | (dolist (attachment attachments) 207 | (if* (mime-part-constructed-p attachment) 208 | thenret 209 | elseif (or (streamp attachment) (stringp attachment) 210 | (pathnamep attachment)) 211 | then (setf attachment (make-mime-part :file attachment)) 212 | else (error "~ 213 | Attachments must be filenames, streams, or mime-part-constructed, not ~s" 214 | attachment)) 215 | (push attachment res)) 216 | 217 | (setf (mime-part-parts message) (append parts-save res)))) 218 | 219 | (with-mime-part-constructed-stream (s message) 220 | (if* (and (consp server) (eq :program (car server))) 221 | then (send-external-program (cdr server) hdrs user-headers s) 222 | else (send-smtp-auth server from (append tos ccs bccs) 223 | login password 224 | hdrs 225 | user-headers 226 | s))) 227 | 228 | (setf (mime-part-parts message) parts-save) 229 | t))) 230 | 231 | (defun send-external-program (program &rest messages 232 | &aux (external-format :default)) 233 | (multiple-value-bind (stdout stderr exit-status) 234 | (command-output 235 | (if* (stringp program) 236 | then program 237 | elseif (consp program) 238 | then #+mswindows program 239 | #-mswindows (apply #'vector (car program) program) 240 | else (error "Bad program argument: ~s." program)) 241 | :input (lambda (stream) 242 | (create-message stream messages external-format))) 243 | (when (/= 0 exit-status) 244 | (error "external program failed to send email (~s, ~s)." 245 | stdout stderr)))) 246 | 247 | (defun create-message (output-stream messages external-format) 248 | (let ((at-bol t) 249 | (prev-ch nil) 250 | ch input-stream) 251 | (dolist (message messages) 252 | (when message 253 | (setq input-stream 254 | (if* (streamp message) 255 | then message 256 | else (make-buffer-input-stream 257 | (string-to-octets 258 | message 259 | :null-terminate nil 260 | :external-format external-format)))) 261 | 262 | (while (setf ch (read-byte input-stream nil)) 263 | (if* (and at-bol (eq ch #.(char-code #\.))) 264 | then ;; to prevent . from being interpreted as eol 265 | (write-char #\. output-stream)) 266 | (if* (eq ch #.(char-code #\newline)) 267 | then (setq at-bol t) 268 | (if* (not (eq prev-ch #.(char-code #\return))) 269 | then (write-char #\return output-stream)) 270 | else (setq at-bol nil)) 271 | (write-byte ch output-stream) 272 | (setq prev-ch ch))))) 273 | (write-char #\return output-stream) 274 | (write-char #\linefeed output-stream) 275 | (write-char #\. output-stream) 276 | (write-char #\return output-stream) 277 | (write-char #\linefeed output-stream)) 278 | 279 | (defun send-smtp (server from to &rest messages) 280 | (send-smtp-1 server from to nil nil messages)) 281 | 282 | (defun send-smtp-auth (server from to login password &rest messages) 283 | (send-smtp-1 server from to login password messages)) 284 | 285 | (defun send-smtp-1 (server from to login password messages 286 | &key (external-format 287 | ;; Never used, this might as well be an &aux 288 | ;; variable 289 | :default)) 290 | ;; send the effective concatenation of the messages via 291 | ;; smtp to the mail server 292 | ;; Each message should be a string or a stream. 293 | ;; 294 | ;; 'to' can be a single string or a list of strings. 295 | ;; each string should be in the official rfc822 format "foo@bar.com" 296 | ;; 297 | 298 | (let ((sock (connect-to-mail-server server login password))) 299 | 300 | (unwind-protect 301 | (progn 302 | 303 | (smtp-send-recv (sock (format nil "MAIL from:<~a>" from) msg) 304 | (2 ;; cool 305 | nil 306 | ) 307 | (t (smtp-transaction-error))) 308 | 309 | (let ((tos (if* (stringp to) 310 | then (list to) 311 | elseif (consp to) 312 | then to 313 | else (error "to should be a string or list, not ~s" 314 | to)))) 315 | (dolist (to tos) 316 | (smtp-send-recv (sock (format nil "RCPT to:<~a>" to) msg) 317 | (2 ;; cool 318 | nil 319 | ) 320 | (t (smtp-transaction-error))))) 321 | 322 | (smtp-send-recv (sock "DATA" msg) 323 | (3 ;; cool 324 | nil) 325 | (t (smtp-transaction-error))) 326 | 327 | 328 | (create-message sock messages external-format) 329 | 330 | (response-case (sock msg) 331 | (2 nil ; (format t "Message sent to ~a~%" to) 332 | ) 333 | 334 | (t (error "message not sent: ~s" msg))) 335 | 336 | (smtp-send-recv (sock "QUIT" msg) 337 | (2 ;; cool 338 | nil) 339 | (t (smtp-transaction-error)))) 340 | ;; Cleanup 341 | (close sock)))) 342 | 343 | 344 | (defun connect-to-mail-server (server login password) 345 | ;; make that initial connection to the mail server 346 | ;; returning a socket connected to it and 347 | ;; signaling an error if it can't be made. 348 | 349 | (let ((use-port 25) ;; standard SMTP port 350 | ssl-args 351 | ssl 352 | starttls) 353 | (if* (consp server) 354 | then (if* (consp (cdr server)) 355 | then ;; long form 356 | (setq ssl-args (cdr server)) 357 | (setf server (car server)) 358 | (setq ssl (getf ssl-args :ssl)) 359 | (remf ssl-args :ssl) 360 | (setq use-port (or (getf ssl-args :port) 361 | (if ssl 465 use-port))) 362 | (remf ssl-args :port) 363 | (setq starttls (getf ssl-args :starttls)) 364 | (remf ssl-args :starttls) 365 | else ;; short form 366 | (setf use-port (cdr server)) 367 | (setf server (car server))) 368 | elseif (stringp server) 369 | then (multiple-value-bind (match whole m1 m2) 370 | (match-re "^([^:]+):([0-9]+)$" server) 371 | (declare (ignore whole)) 372 | (if* match 373 | then (setf server m1) 374 | (setf use-port (parse-integer m2))))) 375 | 376 | (let ((ipaddr (determine-mail-server server)) 377 | (sock) 378 | (ok)) 379 | 380 | (if* (null ipaddr) 381 | then (error "Can't determine ip address for mail server ~s" server)) 382 | 383 | (setq sock (socket:make-socket :remote-host ipaddr 384 | :remote-port use-port 385 | )) 386 | (when ssl 387 | (setq sock (apply #'acl-socket:make-ssl-client-stream sock ssl-args))) 388 | 389 | (unwind-protect 390 | (tagbody 391 | (response-case (sock msg) 392 | (2 ;; to the initial connect 393 | nil) 394 | (t (error "initial connect failed: ~s" msg))) 395 | ehlo 396 | ;; now that we're connected we can compute our hostname 397 | (let ((hostname (socket:ipaddr-to-hostname 398 | (socket:local-host sock)))) 399 | (if* (null hostname) 400 | then (setq hostname 401 | (format nil "[~a]" (socket:ipaddr-to-dotted 402 | (socket:local-host sock))))) 403 | (let ((mechs (smtp-ehlo sock hostname)) 404 | auth-mechs) 405 | (if* (and mechs starttls (member "STARTTLS" mechs :test #'string=)) 406 | then (smtp-send-recv (sock (format nil "STARTTLS") msg) 407 | (2 ;; ok 408 | (setq sock (acl-socket:make-ssl-client-stream sock :method (getf ssl-args :method :tlsv1.2)))) 409 | (t (smtp-transaction-error))) 410 | (go ehlo) 411 | elseif (and mechs login password 412 | (setq auth-mechs (car (member "LOGIN" mechs 413 | :test #'(lambda (x y) (search x y)))))) 414 | then (setf sock 415 | (smtp-authenticate sock server auth-mechs login password))))) 416 | 417 | ;; all is good 418 | (setq ok t)) 419 | 420 | ;; cleanup: 421 | (if* (null ok) 422 | then (close sock :abort t) 423 | (setq sock nil))) 424 | 425 | ;; return: 426 | sock 427 | ))) 428 | 429 | 430 | ;; Returns string with mechanisms, or nil if none. 431 | ;; This may need to be expanded in the future as we support 432 | ;; more of the features that EHLO responds with. 433 | (defun smtp-ehlo (sock our-name) 434 | (smtp-send-recv (sock (format nil "EHLO ~A" our-name) msg) 435 | (2 ;; ok 436 | ;; Collect the auth mechanisms. 437 | (let (mechs) 438 | (multiple-value-bind (found whole mech) 439 | (match-re "250[- ]AUTH (.*)" msg) 440 | (declare (ignore whole)) 441 | (if found (push mech mechs))) 442 | (multiple-value-bind (found whole mech) 443 | (match-re "250[- ](STARTTLS)" msg) 444 | (declare (ignore whole)) 445 | (if found (push mech mechs))) 446 | mechs)) 447 | (t 448 | (smtp-send-recv (sock (format nil "HELO ~A" our-name) msg) 449 | (2 ;; ok 450 | nil) 451 | (t (smtp-transaction-error)))))) 452 | 453 | (defun smtp-authenticate (sock server mechs login password) 454 | (let ((ctx (net.sasl:sasl-client-new "smtp" server 455 | :user login 456 | :pass password)) 457 | (first-server-response t)) 458 | (multiple-value-bind (res selected-mech response) 459 | (net.sasl:sasl-client-start ctx mechs) 460 | (if (not (eq res :continue)) 461 | (error "sasl-client-start unexpectedly returned: ~s" res)) 462 | (smtp-command sock "AUTH ~a" selected-mech) 463 | (loop 464 | (response-case (sock msg) 465 | (3 ;; need more interaction 466 | ;; [rfe12276] Some SMTP servers (notably The Amazon SES 467 | ;; SMTP endpoint (email-smtp.us-east-1.amazonaws.com)) 468 | ;; violate the protocol rules on the first server response. 469 | ;; Apparently other SMTP clients are tolerant of this, so 470 | ;; we try to be as well. 471 | 472 | (multiple-value-bind (decoded-server-response err) 473 | (ignore-errors (base64-string-to-usb8-array (subseq msg 4))) 474 | (when (null decoded-server-response) 475 | (if* first-server-response 476 | then ;; Ignore initial server response if it's 477 | ;; bogus. 478 | ;;;(warn "Bogus server initial response: ~s~%" (subseq msg 4)) 479 | (setf first-server-response nil) 480 | else ;; We tolerate a bogus initial response, but no others 481 | (error "Failed to decode server response of ~s: ~a" 482 | (subseq msg 4) 483 | err))) 484 | 485 | (multiple-value-setq (res response) 486 | (net.sasl:sasl-step ctx decoded-server-response)) 487 | 488 | (smtp-command sock "~a" 489 | (usb8-array-to-base64-string response nil)))) 490 | (2 ;; server is satisfied. 491 | ;; Make sure the auth process really completed 492 | (if (not (net.sasl:sasl-conn-auth-complete-p ctx)) 493 | (error "SMTP server indicated authentication complete before mechanisms was satisfied")) 494 | ;; It's all good. 495 | (return)) ;; break from loop 496 | (t 497 | (error "SMTP authentication failed: ~a" msg))))) 498 | 499 | ;; Reach here if authentication completed. 500 | ;; If a security layer was negotiated, return an encapsulated sock, 501 | ;; otherwise just return the original sock. 502 | (if (net.sasl:sasl-conn-security-layer-p ctx) 503 | (net.sasl:sasl-make-stream ctx sock :close-base t) 504 | sock))) 505 | 506 | 507 | 508 | (defun test-email-address (address) 509 | ;; test to see if we can determine if the address is valid 510 | ;; return nil if the address is bogus 511 | ;; return t if the address may or may not be bogus 512 | (if* (or (not (stringp address)) 513 | (zerop (length address))) 514 | then (error "mail address should be a non-empty string: ~s" address)) 515 | 516 | ; split on the @ sign 517 | (let (name hostname) 518 | (let ((pos (position #\@ address))) 519 | (if* (null pos) 520 | then (setq name address 521 | hostname "localhost") 522 | elseif (or (eql pos 0) 523 | (eql pos (1- (length address)))) 524 | then ; @ at beginning or end, bogus since we don't do route addrs 525 | (return-from test-email-address nil) 526 | else (setq name (subseq address 0 pos) 527 | hostname (subseq address (1+ pos))))) 528 | 529 | (let ((sock (ignore-errors (connect-to-mail-server hostname nil nil)))) 530 | (if* (null sock) then (return-from test-email-address nil)) 531 | 532 | (unwind-protect 533 | (progn 534 | (smtp-send-recv (sock (format nil "VRFY ~a" name) msg code) 535 | (5 536 | (if* (eq code 550) 537 | then ; no such user 538 | msg ; to remove unused warning 539 | nil 540 | else ;; otherwise we don't know 541 | (return-from test-email-address t))) 542 | (t (return-from test-email-address t))) 543 | (smtp-send-recv (sock (format nil "VRFY ~a" address) msg code) 544 | (5 545 | (if* (eq code 550) 546 | then ; no such user 547 | msg ; to remove unused warning 548 | nil 549 | else t)) 550 | (t t))) 551 | (close sock :abort t))))) 552 | 553 | 554 | 555 | 556 | 557 | 558 | 559 | 560 | 561 | 562 | 563 | 564 | 565 | 566 | 567 | (defun wait-for-response (stream) 568 | ;; read the response of the smtp server. 569 | ;; collect it all in a string. 570 | ;; Return two values: 571 | ;; response class 572 | ;; whole string 573 | ;; The string should begin with a decimal digit, and that is converted 574 | ;; into a number which is returned as the response class. 575 | ;; If the string doesn't begin with a decimal digit then the 576 | ;; response class is -1. 577 | ;; 578 | (flet ((match-chars (string pos1 pos2 count) 579 | ;; like strncmp 580 | (dotimes (i count t) 581 | (if* (not (eq (aref string (+ pos1 i)) 582 | (aref string (+ pos2 i)))) 583 | then (return nil))))) 584 | 585 | (let ((res (make-array 20 :element-type 'character 586 | :adjustable t 587 | :fill-pointer 0))) 588 | (if* (null (read-a-line stream res)) 589 | then ; eof encountered before end of line 590 | (return-from wait-for-response (values -1 res))) 591 | 592 | ;; a multi-line response begins with line containing 593 | ;; a hyphen in the 4th column: 594 | ;; xyz- some text 595 | ;; 596 | ;; and ends with a line containing the same reply code but no 597 | ;; hyphen. 598 | ;; xyz some text 599 | ;; 600 | 601 | (if* (and (>= (length res) 4) (eq #\- (aref res 3))) 602 | then ;; multi line response 603 | (let ((old-length (length res)) 604 | (new-length nil)) 605 | (loop 606 | (if* (null (read-a-line stream res)) 607 | then ; eof encountered before end of line 608 | (return-from wait-for-response (values -1 res))) 609 | (setq new-length (length res)) 610 | ;; see if this is the last line 611 | (if* (and (>= (- new-length old-length) 4) 612 | (eq (aref res (+ old-length 3)) #\space) 613 | (match-chars res 0 old-length 3)) 614 | then (return)) 615 | 616 | (setq old-length new-length)))) 617 | 618 | ;; complete response is in res 619 | ;; compute class and return the whole thing 620 | (let ((class (or (and (> (length res) 0) 621 | (digit-char-p (aref res 0))) 622 | -1))) 623 | (values class res 624 | (if* (>= (length res) 3) 625 | then ; compute the whole response value 626 | (+ (* (or (digit-char-p (aref res 0)) 0) 100) 627 | (* (or (digit-char-p (aref res 1)) 0) 10) 628 | (or (digit-char-p (aref res 2)) 0)))))))) 629 | 630 | (defun smtp-command (stream &rest format-args) 631 | ;; send a command to the smtp server 632 | (let ((command (apply #'format nil format-args))) 633 | (if* *smtp-debug* 634 | then (format *smtp-debug* "to smtp command: ~s~%" command) 635 | (force-output *smtp-debug*)) 636 | (write-string command stream) 637 | (write-char #\return stream) 638 | (write-char #\newline stream) 639 | (force-output stream))) 640 | 641 | (defun read-a-line (stream res) 642 | ;; read from stream and put the result in the adjust able array res 643 | ;; if line ends in cr-lf, only put a newline in res. 644 | ;; If we get an eof before the line finishes, return nil, 645 | ;; else return t if all is ok 646 | (let (ch last-ch) 647 | (loop 648 | (setq ch (read-char stream nil nil)) 649 | (if* (null ch) 650 | then ; premature eof 651 | (return nil)) 652 | 653 | (if* *smtp-debug* 654 | then (format *smtp-debug* "~c" ch) 655 | (force-output *smtp-debug*) 656 | ) 657 | 658 | (if* (eq last-ch #\return) 659 | then (if* (eq ch #\linefeed) 660 | then (vector-push-extend #\newline res) 661 | (return t) 662 | else (vector-push-extend last-ch res)) 663 | elseif (eq ch #\linefeed) 664 | then ; line ends with just lf, not cr-lf 665 | (vector-push-extend #\newline res) 666 | (return t) 667 | elseif (not (eq ch #\return)) 668 | then (vector-push-extend ch res)) 669 | 670 | (setq last-ch ch)))) 671 | 672 | (eval-when (compile eval) 673 | (defmacro ipaddrp (obj) 674 | #+(version>= 8 0) `(socket:ipaddrp ,obj) 675 | #-(version>= 8 0) `(and (integerp ,obj) (<= 0 ,obj #.(1- (expt 2 32))))) 676 | ) 677 | 678 | (defun determine-mail-server (name) 679 | ;; return the ipaddress to be used to connect to the 680 | ;; the mail server. 681 | ;; name is any method for naming a machine: 682 | ;; ip address (binary) 683 | ;; string with dotted ip address 684 | ;; string naming a domain 685 | ;; we can only do the mx lookup for the third case, the rest 686 | ;; we just return the ipaddress for what we were given 687 | ;; 688 | (let (ipaddr) 689 | (if* (ipaddrp name) 690 | then name 691 | elseif (ipaddrp (setq ipaddr (socket:dotted-to-ipaddr name :errorp nil))) 692 | then ipaddr 693 | else ; do mx lookup if acldns is being used 694 | (if* (or (eq socket:*dns-mode* :acldns) 695 | (and (consp socket:*dns-mode*) 696 | (member :acldns socket:*dns-mode* :test #'eq))) 697 | then (let ((res (socket:dns-query name :type :mx))) 698 | (if* (and (consp res) (cadr res)) 699 | then (cadr res) ; the ip address 700 | else (dolist (suffix socket::*domain-search-list* 701 | (socket:dns-lookup-hostname name)) 702 | (declare (special socket:*domain-search-list*)) 703 | (let ((name 704 | (concatenate 'string name "." suffix))) 705 | (setq res (socket:dns-query name :type :mx)) 706 | (if* (and res (cadr res)) 707 | then (return (cadr res))))))) 708 | 709 | 710 | else ; just do a hostname lookup 711 | (ignore-errors (socket:lookup-hostname name)))))) 712 | 713 | (defun fold-addresses (addresses) 714 | ;; Convert ADDRESSES into a string, being mindful of rfc5321 715 | ;; and section 4.5.3.1.6: 716 | ;; Text Line 717 | ;; The maximum total length of a text line including the is 718 | ;; 1000 octets (not counting the leading dot duplicated for 719 | ;; transparency). This number may be increased by the use of SMTP 720 | ;; Service Extensions. 721 | ;; and rfc5322 section 2.1.1: 722 | ;; Line Length Limits 723 | ;; There are two limits that this specification places on the number 724 | ;; of characters in a line. Each line of characters MUST be no more 725 | ;; than 998 characters, and SHOULD be no more than 78 characters, 726 | ;; excluding the CRLF. 727 | ;; The latter rfc defines "unfolding" as: 728 | ;; The process of moving from this folded multiple-line representation 729 | ;; of a header field to its single line representation is called 730 | ;; "unfolding". Unfolding is accomplished by simply removing any CRLF 731 | ;; that is immediately followed by WSP. 732 | ;; 733 | ;; So, the continued lines just need some whitespace. We will use 4 734 | ;; spaces after the CRLF. 735 | (do* ((break-at 736 | ;; after this many characters on line, insert a newline 737 | 70) 738 | (spaces #.(make-string 4 :initial-element #\space)) 739 | (addrs addresses (cdr addrs)) 740 | (addr #1=(car addrs) #1#) 741 | (lastp #1=(not (cdr addrs)) #1#) 742 | (buf (make-string-output-stream :element-type 'character)) 743 | (lines '())) 744 | ((null addrs) 745 | (when (> (file-position buf) 0) 746 | (push (get-output-stream-string buf) lines)) 747 | (apply #'concatenate 'simple-string (nreverse lines))) 748 | (princ addr buf) 749 | (when (not lastp) 750 | (princ "," buf) 751 | (write-char #\space buf) 752 | (when (>= (file-position buf) break-at) 753 | (push (get-output-stream-string buf) lines) 754 | (fresh-line buf) 755 | (write-string spaces buf))))) 756 | 757 | 758 | (provide :smtp) 759 | -------------------------------------------------------------------------------- /t-imap.cl: -------------------------------------------------------------------------------- 1 | ;; See the file LICENSE for the full license governing this code. 2 | 3 | ;; imap testing 4 | ;; requires smtp module too 5 | 6 | (eval-when (compile load eval) 7 | (require :rfc2822) 8 | (require :smtp) 9 | (require :imap) 10 | (require :test)) 11 | 12 | 13 | (in-package :test) 14 | 15 | 16 | (defparameter *test-machine* "tiger.franz.com") 17 | (defparameter *test-account* "jkfmail") 18 | (defparameter *test-password* "jkf.imap") 19 | 20 | 21 | (defparameter *test-email* (format nil "~a@~a" *test-account* *test-machine*)) 22 | 23 | 24 | (defun test-connect () 25 | ;; test connecting and disconnecting from the server 26 | 27 | (let ((mb (net.post-office:make-imap-connection *test-machine* 28 | :user *test-account* 29 | :password *test-password*))) 30 | (unwind-protect 31 | (progn 32 | 33 | (test-t (not (null mb))) ; make sure we got a mailbox object 34 | 35 | ; check that we've stored resonable values in the mb object 36 | (test-equal "/" (net.post-office:mailbox-separator mb)) 37 | 38 | (test-t (net.post-office::select-mailbox mb "inbox")) 39 | 40 | (test-t (> (net.post-office:mailbox-uidvalidity mb) 0)) 41 | (test-t (not (null (net.post-office:mailbox-flags mb))))) 42 | 43 | (test-t (net.post-office:close-connection mb))))) 44 | 45 | 46 | (defun test-sends () 47 | ;; test sending and reading mail 48 | (let ((mb (net.post-office:make-imap-connection *test-machine* 49 | :user *test-account* 50 | :password *test-password*))) 51 | (unwind-protect 52 | (progn 53 | (test-t (not (null mb))) ; make sure we got a mailbox object 54 | 55 | ;; go through the mailboxes and delete all letters 56 | (dolist (mblist (net.post-office:mailbox-list mb :pattern "*")) 57 | (if* (not (member :\\noselect (net.post-office:mailbox-list-flags mblist))) 58 | then (net.post-office:select-mailbox mb (net.post-office:mailbox-list-name mblist)) 59 | (let ((count (net.post-office:mailbox-message-count mb))) 60 | ; remove all old mail 61 | (if* (> count 0) 62 | then (net.post-office:alter-flags mb `(:seq 1 ,count) :add-flags :\\deleted) 63 | (net.post-office:expunge-mailbox mb) 64 | (test-eql 0 (net.post-office:mailbox-message-count mb))) 65 | ; remove mailbox (except inbox) 66 | (if* (not (equalp "inbox" (net.post-office:mailbox-list-name mblist))) 67 | then ; must not be selected if we want to del 68 | (net.post-office:select-mailbox mb "inbox") 69 | (net.post-office:delete-mailbox mb (net.post-office:mailbox-list-name mblist))) 70 | 71 | ))) 72 | 73 | 74 | ;; send five letters 75 | (dotimes (i 5) 76 | (net.post-office:send-smtp *test-machine* 77 | *test-email* 78 | *test-email* 79 | (format nil "message number ~d" (1+ i)))) 80 | 81 | ; test to see if imap figures out that the letters are there 82 | (net.post-office:select-mailbox mb "inbox") 83 | 84 | ; wait a bit for the mail to be delivered 85 | (dotimes (i 5) 86 | (if* (not (eql 5 (net.post-office:mailbox-message-count mb))) 87 | then (sleep 1) 88 | (net.post-office: noop mb))) 89 | 90 | (test-eql 5 (net.post-office:mailbox-message-count mb)) 91 | 92 | ; test the search facility 93 | ; look for the message number we put in each message. 94 | ; I hope the letters get delivered in order... 95 | (dotimes (i 5) 96 | (let ((mn (1+ i))) 97 | (test-equal (list mn) 98 | (net.post-office:search-mailbox mb 99 | `(:body ,(format nil "~d" mn)))))) 100 | 101 | ; test getting data from mail message 102 | (let ((fetch-info (net.post-office:fetch-parts mb 103 | 1 104 | "(envelope body[1])"))) 105 | (let ((envelope (net.post-office:fetch-field 1 "envelope" fetch-info)) 106 | (body (net.post-office:fetch-field 1 "body[1]" fetch-info))) 107 | (test-equal "jkfmail" (net.post-office:address-mailbox 108 | (car (net.post-office:envelope-from envelope)))) 109 | (test-nil (net.post-office:address-mailbox 110 | (car (net.post-office:envelope-to envelope)))) 111 | 112 | (test-equal (format nil "message number 1~c" #\newline) 113 | body)))) 114 | (test-t (net.post-office:close-connection mb))))) 115 | 116 | 117 | 118 | (defun test-flags () 119 | ;; test setting and getting flags 120 | ;; 121 | ;; assume we have 5 messages in inbox at this time 122 | ;; 123 | (let ((mb (net.post-office:make-imap-connection *test-machine* 124 | :user *test-account* 125 | :password *test-password*))) 126 | (unwind-protect 127 | (progn 128 | (net.post-office:select-mailbox mb "inbox") 129 | 130 | (let ((flags (net.post-office:fetch-field 3 131 | "flags" 132 | (net.post-office:fetch-parts 133 | mb 3 "flags")))) 134 | (test-nil flags)) 135 | 136 | ;; add flags 137 | (let ((info (net.post-office:alter-flags mb 3 :add-flags :\\deleted))) 138 | (test-equal '(:\\deleted) 139 | (net.post-office:fetch-field 3 "flags" info))) 140 | 141 | ; good bye message 142 | (test-equal '(3) (net.post-office:expunge-mailbox mb)) 143 | 144 | (net.post-office:alter-flags mb 4 :add-flags ':\\bbbb) 145 | (test-equal '(:\\bbbb) 146 | (net.post-office:fetch-field 4 "flags" 147 | (net.post-office:fetch-parts mb 4 148 | "flags"))) 149 | 150 | 151 | ) 152 | (test-t (net.post-office:close-connection mb))))) 153 | 154 | (defun test-mailboxes () 155 | ;; should be 4 messages now in inbox 156 | ;; let's create 4 mailboxes, one for each letter 157 | (let ((mb (net.post-office:make-imap-connection *test-machine* 158 | :user *test-account* 159 | :password *test-password*))) 160 | (unwind-protect 161 | (progn 162 | (net.post-office:select-mailbox mb "inbox") 163 | (dotimes (i 4) 164 | (let ((mbname (format nil "temp/mb~d" i))) 165 | (test-t (net.post-office:create-mailbox mb mbname)) 166 | (net.post-office:copy-to-mailbox mb (1+ i) mbname))) 167 | 168 | ; now check that each new mailbox has one message 169 | (dotimes (i 4) 170 | (let ((mbname (format nil "temp/mb~d" i))) 171 | (net.post-office:select-mailbox mb mbname) 172 | (test-eql 1 (net.post-office:mailbox-message-count mb))))) 173 | (test-t (net.post-office:close-connection mb))))) 174 | 175 | 176 | (defun test-pop () 177 | ;; test out the pop interface to the mailbox. 178 | 179 | (let ((pb (net.post-office:make-pop-connection *test-machine* 180 | :user *test-account* 181 | :password *test-password*))) 182 | ; still from before 183 | (test-eql 4 (net.post-office:mailbox-message-count pb)) 184 | 185 | (test-eql 4 (length (net.post-office:unique-id pb))) 186 | 187 | (net.post-office:delete-letter pb '(:seq 2 3)) 188 | 189 | (test-eql 2 (length (net.post-office:unique-id pb))) 190 | 191 | (test-eql 4 (and :second (net.post-office:mailbox-message-count pb))) 192 | 193 | (net.post-office:noop pb) 194 | 195 | (test-eql 2 (and :third (net.post-office:mailbox-message-count pb))) 196 | 197 | (net.post-office:fetch-letter pb 1) 198 | (test-err (net.post-office:fetch-letter pb 2)) 199 | (test-err (net.post-office:fetch-letter pb 3)) 200 | (net.post-office:fetch-letter pb 4) 201 | 202 | (net.post-office:close-connection pb) 203 | 204 | (setq pb (net.post-office:make-pop-connection *test-machine* 205 | :user *test-account* 206 | :password *test-password*)) 207 | 208 | (test-eql 2 (and :fourth (net.post-office:mailbox-message-count pb))) 209 | 210 | (net.post-office:fetch-letter pb 1) ; just make sure there's no error 211 | 212 | (net.post-office:top-lines pb 1 1) ; just make sure there's no error 213 | (net.post-office:make-envelope-from-text (net.post-office:top-lines pb 1 0)) 214 | 215 | (net.post-office:close-connection pb))) 216 | 217 | 218 | (defun test-mime () 219 | (test-equal 220 | "foobar baz" 221 | (net.post-office:decode-header-text "=?utf-8?q?foo?= 222 | =?utf-8?q?bar?= baz")) 223 | (test-equal 224 | "before brucejones hello" 225 | (net.post-office:decode-header-text "before =?utf-8?q?bruce?= =?utf-8?q?jones?= hello")) 226 | (test-equal 227 | "[Franz Wiki] Update of \"Office/EmployeeDirectory\" by SteveHaflich" 228 | (net.post-office:decode-header-text "=?utf-8?q?=5BFranz_Wiki=5D_Update_of_=22Office/EmployeeDirectory=22_by_St?= 229 | =?utf-8?q?eveHaflich?=")) 230 | ) 231 | 232 | (defun test-parse-email-address () 233 | (dolist (good `(("foo@bar.com" "foo" "bar.com") 234 | ("layer@franz.com" "layer" "franz.com") 235 | (" 236 | 237 | layer@franz.com" "layer" "franz.com") 238 | (,(replace-re "XXlayer@franz.comX X" 239 | "X" 240 | (format nil "~c" #\newline) 241 | :single-line t) 242 | "layer" "franz.com") 243 | (,(replace-re "XXlayer@franz.comX X" 244 | "X" 245 | (format nil "~c" #\return) 246 | :single-line t) 247 | "layer" "franz.com") 248 | ;; local-part length = 64 249 | ("1234567890123456789012345678901234567890123456789012345678901234@foo.com" 250 | "1234567890123456789012345678901234567890123456789012345678901234" 251 | "foo.com") 252 | )) 253 | (multiple-value-bind (local-part domain) 254 | (net.mail:parse-email-address (first good)) 255 | (test-equal (second good) local-part) 256 | (test-equal (third good) domain))) 257 | (dolist (bad (list "@foo.com" 258 | ;; local-part length = 65 259 | "12345678901234567890123456789012345678901234567890123456789012345@foo.com" 260 | )) 261 | (test-nil (net.mail:parse-email-address bad))) 262 | ) 263 | 264 | (defun test-rfc2822 () 265 | (test-t (net.mail:valid-email-domain-p "mail.upb.de")) 266 | ) 267 | 268 | (defparameter *folded-reference-value* 269 | "First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 270 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 271 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 272 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 273 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 274 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 275 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 276 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 277 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 278 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 279 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 280 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 281 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 282 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 283 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 284 | First.Last@domain.com, First.Last@domain.com, First.Last@domain.com, 285 | First.Last@domain.com, First.Last@domain.com") 286 | 287 | (defun test-send-letter () 288 | ;; really a test of fold-addresses 289 | (let* ((test-addresses 290 | (loop for i from 0 to 50 collect "First.Last@domain.com")) 291 | (folded (net.post-office::fold-addresses test-addresses))) 292 | (test-equal *folded-reference-value* folded))) 293 | 294 | 295 | (defun test-imap () 296 | (handler-bind ((net.post-office:po-condition 297 | #'(lambda (con) 298 | (format t "Got imap condition: ~a~%" con)))) 299 | (test-mime) 300 | (test-parse-email-address) 301 | (test-rfc2822) 302 | (test-send-letter) 303 | ;;;; Only jkf is setup to run the tests. 304 | (when (string= "jkf" (sys:getenv "USER")) 305 | (test-connect) 306 | (test-sends) 307 | (test-flags) 308 | (test-mailboxes) 309 | (test-pop) 310 | ))) 311 | 312 | 313 | (if* *do-test* then (do-test :imap #'test-imap)) 314 | --------------------------------------------------------------------------------