├── LICENSE ├── NEWS ├── README ├── TODO ├── diff.asd ├── diff.lisp ├── package.lisp ├── patch.lisp ├── svndiff.lisp ├── test-files ├── test-dst-1.txt ├── test-dst-2.txt ├── test-dst-3.txt ├── test-src-1.txt ├── test-src-2.txt └── test-src-3.txt └── vdelta.lisp /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2004, Nathan Froyd. All rights reserved. 2 | 3 | Redistribution and use in source and binary forms, with or without 4 | modification, are permitted provided that the following conditions are 5 | met: 6 | 7 | * Redistributions of source code must retain the above copyright 8 | notice, this list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright 11 | notice, this list of conditions and the following disclaimer in the 12 | documentation and/or other materials provided with the distribution. 13 | 14 | * Neither the name of Nathan Froyd nor the names of the contributors to 15 | this software may be used to endorse or promote products derived from 16 | this software without specific prior written permission. 17 | 18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS 19 | IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED 20 | TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A 21 | PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER 22 | OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 23 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 24 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 25 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 26 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 27 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | -------------------------------------------------------------------------------- /NEWS: -------------------------------------------------------------------------------- 1 | -*- mode: outline -*- 2 | 3 | * Version 0.4, released 10-05-2005 4 | 5 | ** bugfixes 6 | 7 | Context diff printing is "more correct". 8 | 9 | When printing unified diff windows, the header line ("@@ ... @@") is no 10 | longer printed twice. 11 | 12 | ** changes 13 | 14 | DIFF now depends on CL-PPCRE. 15 | 16 | ** new features 17 | 18 | A patch-reading interface has been added. It currently supports unified 19 | and "new-style" context diffs. Applying patches will be added in a 20 | future release. See DIFF::READ-PATCHES-FROM-FILE for a preliminary 21 | interface to reading patches. 22 | 23 | * Version 0.3, released 02-01-2005 24 | 25 | ** new features 26 | 27 | Supports generation of "context"-style diffs (diff -c). As with 28 | unified-style diffs, the diffs may not exactly match the diffs generated 29 | by GNU diff, but GNU patch (or similar) should be able to use the diffs. 30 | 31 | The internal interface has been reorganized somewhat. There is now a 32 | DIFF class and PRINT-OBJECT is used in preference to specially-written 33 | PRINT-FOO functions. 34 | 35 | An interface is actually exported from package DIFF. GENERATE-DIFF 36 | seems to be a reasonable interface for making diffs of all kinds; 37 | suggestions for a better interface are always appreciated (in 38 | particular, there should be an interface for diffing arbitrary streams). 39 | 40 | ** incompatible changes 41 | 42 | DIFF::*UNIFIED-DIFF-CONTEXT-LINES* has been renamed to 43 | DIFF::*DIFF-CONTEXT-LINES*, as it can be used for both unified and 44 | context diffs. 45 | 46 | DIFF::PRINT-DIFF has been removed in favor of PRINT-OBJECT functionality. 47 | 48 | * Version 0.2, released 27-05-2004 49 | 50 | ** new features 51 | 52 | An ASDF packaging file has been added. 53 | 54 | This package creates and uses the DIFF package. 55 | 56 | An implementation of the vdelta binary differencing algorithm has been 57 | added. 58 | 59 | Unified diff generation now produces Lisp objects to sling around rather 60 | than writing the diff to a stream as it is being generated (change made 61 | as per #lisp suggestions). 62 | 63 | ** tests 64 | 65 | A small set of test files have been included with the distribution. 66 | 67 | * Version 0.1, released 24-05-2004 68 | 69 | Initial release 70 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | DIFF is a package for computing various forms of differences between 2 | blobs of data and then doing neat things with those differences. 3 | Currently diff knows how to compute three common forms of differences: 4 | 5 | * "unified" format diffs, suitable for inspecting changes between 6 | different versions of files; 7 | * "context" format diffs, suitable for inspecting changes between 8 | different versions of files; 9 | * "vdelta" format binary diffs, which might be useful in a version 10 | control system for compact storage of deltas. 11 | 12 | An ASDF system is provided; there are no symbols exported from the DIFF 13 | package, as a good interface has not yet been decided upon. 14 | Documentation is fairly sparse. 15 | 16 | Nathan Froyd 17 | -------------------------------------------------------------------------------- /TODO: -------------------------------------------------------------------------------- 1 | * implement patch(1) functionality 2 | * more options for context/unified diff output; in particular, supporting 3 | something similar to GNU diff's -F option would be useful. 4 | * vcdiff encoding of copy/insert deltas 5 | * structural differencing algorithms 6 | * diff3 functionality 7 | * applying svndiffs to files/streams 8 | * identifying a useful interface for CL:EXPORT 9 | * docstrings and such 10 | -------------------------------------------------------------------------------- /diff.asd: -------------------------------------------------------------------------------- 1 | ;;;; diff.asd - the ASDF system definition for diff -*- lisp -*- 2 | (cl:defpackage #:diff-system 3 | (:use :cl)) 4 | 5 | (cl:in-package #:diff-system) 6 | 7 | (asdf:defsystem :diff 8 | :version "0.4" 9 | :author "Nathan Froyd " 10 | :maintainer "Nathan Froyd " 11 | :depends-on (:cl-ppcre :trivial-gray-streams) 12 | :components ((:file "package") 13 | (:file "diff" :depends-on ("package")) 14 | (:file "patch" :depends-on ("diff")) 15 | (:file "vdelta" :depends-on ("package")) 16 | (:file "svndiff" :depends-on ("package" "vdelta")) 17 | (:static-file "README") 18 | (:static-file "TODO") 19 | (:static-file "NEWS") 20 | (:static-file "LICENSE"))) -------------------------------------------------------------------------------- /diff.lisp: -------------------------------------------------------------------------------- 1 | ;;;; diff.lisp - producing unified style diffs from Common Lisp 2 | 3 | ;;; The diffs generated by this package may not match the diffs 4 | ;;; generated by GNU diff (or other diff programs of which I have 5 | ;;; not heard). However, the diffs produced should still apply 6 | ;;; cleanly with `patch' or something similar. If you find files 7 | ;;; where this does not hold true, please contact me. 8 | ;;; 9 | ;;; Differences between line ending conventions are pretty much 10 | ;;; ignored in the current implementation, since READ-LINE is used 11 | ;;; to snarf the lines from the file. GNU diff indicates whether 12 | ;;; or not the last line in the file ended in a newline; the last 13 | ;;; line of our diffs always end in newlines, which may or may not 14 | ;;; be acceptable to some people. 15 | ;;; 16 | ;;; No effort has been made to optimize for speed. Nonetheless, 17 | ;;; the implementation is probably "fast enough", since generating 18 | ;;; diffs is not really a speed-critical task. 19 | ;;; 20 | ;;; Compared to GNU diff, the set of options is pretty small. 21 | 22 | (in-package :diff) 23 | 24 | (declaim (optimize (debug 3))) 25 | 26 | 27 | ;;; interning arbitrary objects for equality 28 | 29 | ;;; We've made this fairly general, but we really only use this 30 | ;;; machinery to intern strings. I suppose we could have used CL:INTERN 31 | ;;; for this purpose, but I doubt the symbol implementations in a lot 32 | ;;; of Common Lisp implementations could handle our requirements. Plus 33 | ;;; it's not clear that symbols are ever garbage-collected. 34 | 35 | (defclass interner () 36 | ((object-to-code-table :initarg :object-to-code-table 37 | :reader object-to-code-table) 38 | (code-to-object-table :initarg :code-to-object-table 39 | :accessor code-to-object-table) 40 | (last-code :initform 0 :type integer :accessor last-code))) 41 | 42 | (defun make-interner (&key (test #'equal)) 43 | (let ((object-to-code-table (make-hash-table :test test)) 44 | (code-to-object-table (make-array 128))) 45 | (make-instance 'interner :object-to-code-table object-to-code-table 46 | :code-to-object-table code-to-object-table))) 47 | 48 | (defun interned-object (interner code) 49 | (aref (code-to-object-table interner) code)) 50 | 51 | (defun intern-string (interner string) 52 | (multiple-value-bind (code presentp) 53 | (gethash string (object-to-code-table interner)) 54 | (if presentp 55 | code ; string already exists 56 | (let ((object-code (last-code interner)) 57 | (otc-table (object-to-code-table interner)) 58 | (cto-table (code-to-object-table interner))) 59 | ;; grow table if necessary 60 | (when (= object-code (length cto-table)) 61 | (let ((new-cto-table (make-array (* (length cto-table) 2)))) 62 | (replace new-cto-table cto-table) 63 | (setf cto-table new-cto-table) 64 | (setf (code-to-object-table interner) new-cto-table))) 65 | (setf (gethash string otc-table) object-code 66 | (aref cto-table object-code) string) 67 | (incf (last-code interner)) 68 | object-code)))) 69 | 70 | (defmacro do-stream-lines ((line-var stream-var &optional result) &body body) 71 | `(loop for ,line-var = (read-line ,stream-var nil nil) 72 | while ,line-var 73 | do ,@body 74 | finally (return ,result))) 75 | 76 | (defmacro do-file-lines ((line-var pathname-var &optional result) &body body) 77 | (let ((stream-var (gensym))) 78 | `(with-open-file (,stream-var ,pathname-var :direction :input 79 | :element-type 'character) 80 | (do-stream-lines (,line-var ,stream-var ,result) 81 | ,@body)))) 82 | 83 | (defun intern-files (&rest files) 84 | (let ((interner (make-interner)) 85 | (interned-files nil)) 86 | (dolist (file files (values interner (nreverse interned-files))) 87 | (let ((interned-file nil)) 88 | (do-file-lines (line file) 89 | (let ((code (intern-string interner line))) 90 | (push code interned-file))) 91 | (push (coerce (nreverse interned-file) 'simple-vector) interned-files))))) 92 | 93 | (defun intern-seqs (&rest seqs) 94 | (let ((interner (make-interner)) 95 | (interned-seqs nil)) 96 | (dolist (seq seqs (values interner (nreverse interned-seqs))) 97 | (let ((interned-seq nil)) 98 | (loop :for line :in seq :do 99 | (let ((code (intern-string interner line))) 100 | (push code interned-seq))) 101 | (push (coerce (nreverse interned-seq) 'simple-vector) interned-seqs))))) 102 | 103 | 104 | ;;; Computing longest common subsequences between two sequences whose 105 | ;;; elements compare equal via EQL. The algorithm used here is based 106 | ;;; on _An O(NP) Sequence Comparison Algorithm_ by Sun Wu, Udi Manber, 107 | ;;; and Gene Meyers. 108 | 109 | (defclass snake () 110 | ((original-offset :accessor original-offset :initarg :original-offset) 111 | (modified-offset :accessor modified-offset :initarg :modified-offset) 112 | (length :accessor snake-length :initarg :length) 113 | (lcs :accessor lcs :initform nil))) 114 | 115 | (defmethod print-object ((snake snake) stream) 116 | (print-unreadable-object (snake stream) 117 | (format stream "Snake ~A ~A ~A" 118 | (original-offset snake) 119 | (modified-offset snake) 120 | (snake-length snake)))) 121 | 122 | (defun snake (lcs original modified k y) 123 | (let* ((x (- y k)) 124 | (y y) 125 | (x-start x) 126 | (y-start y)) 127 | (loop while (and (< -1 x (length original)) 128 | (< -1 y (length modified)) 129 | (eql (svref original x) (svref modified y))) 130 | do (incf x) (incf y)) 131 | (let ((snake (make-instance 'snake 132 | :original-offset x-start 133 | :modified-offset y-start 134 | :length (- y y-start)))) 135 | (if (= y-start y) 136 | (setf (lcs snake) lcs) 137 | (setf (lcs snake) (cons snake lcs))) 138 | snake))) 139 | 140 | (defun modified-end (snake) 141 | (+ (modified-offset snake) (snake-length snake))) 142 | 143 | (defun compute-lcs* (original modified) 144 | (let* ((m (length original)) 145 | (n (length modified)) 146 | (fp (make-array (+ m n 3) 147 | :initial-element (make-instance 'snake 148 | :original-offset -1 149 | :modified-offset -1 150 | :length 0))) 151 | (delta (- n m))) 152 | ;; There ought to be a good way to let the lengths be arbitrary, 153 | ;; but right now, we're just going to enforce that the modified 154 | ;; always be longer and let the caller do the fixing if need be. 155 | (when (minusp delta) 156 | (error "Length of MODIFIED less than length of ORIGINAL.")) 157 | (macrolet ((fpref (index) 158 | `(svref fp (+ ,index m 1)))) 159 | (flet ((do-snake (index) 160 | ;; ugh, FIXME 161 | (let ((snake (let ((lower-snake (fpref (1- index))) 162 | (upper-snake (fpref (1+ index)))) 163 | (if (> (1+ (modified-end lower-snake)) 164 | (modified-end upper-snake)) 165 | (snake (lcs lower-snake) 166 | original modified index 167 | (1+ (modified-end lower-snake))) 168 | (snake (lcs upper-snake) 169 | original modified index 170 | (modified-end upper-snake)))))) 171 | (setf (fpref index) snake)))) 172 | (do ((p 0 (1+ p))) 173 | ((= (modified-end (fpref delta)) n) 174 | ;; Add one last snake to easily detect EOF. 175 | (nreverse (cons (make-instance 'snake :original-offset m 176 | :modified-offset n 177 | :length 0) 178 | (lcs (fpref delta))))) 179 | (loop for k from (- p) upto (1- delta) 180 | do (do-snake k)) 181 | (loop for k from (+ delta p) downto (1+ delta) 182 | do (do-snake k)) 183 | (do-snake delta)))))) 184 | 185 | (defun compute-lcs (original modified) 186 | (let* ((original-length (length original)) 187 | (modified-length (length modified)) 188 | (modified-longer-p (> modified-length original-length)) 189 | ;; The algorithm given in the paper only works when the 190 | ;; modified is at least as long as the original. So we let 191 | ;; the algorithm follow that assumption and then fix it 192 | ;; at a later point. The lcs between two originals is 193 | ;; always the same--but the *diff* between them will be 194 | ;; different depending on the ordering. 195 | (lcs (if modified-longer-p 196 | (compute-lcs* original modified) 197 | (compute-lcs* modified original)))) 198 | (unless modified-longer-p 199 | ;; Go through and fix the lcs to have the right references. 200 | (dolist (snake lcs) 201 | (rotatef (original-offset snake) (modified-offset snake)))) 202 | lcs)) 203 | 204 | ;;; actually producing diffs from longest common subsequences 205 | 206 | (defclass diff-region () 207 | ((original-start :initarg :original-start :reader original-start) 208 | (original-length :initarg :original-length :reader original-length) 209 | (modified-start :initarg :modified-start :reader modified-start) 210 | (modified-length :initarg :modified-length :reader modified-length))) 211 | 212 | (defclass common-diff-region (diff-region) ()) 213 | 214 | (defclass modified-diff-region (diff-region) ()) 215 | 216 | (defmethod print-object ((diff-region diff-region) stream) 217 | (print-unreadable-object (diff-region stream) 218 | (format stream "~A src:~A/~A mod:~A/~A" 219 | (class-name (class-of diff-region)) 220 | (original-start diff-region) (original-length diff-region) 221 | (modified-start diff-region) (modified-length diff-region)))) 222 | 223 | (defun convert-lcs-to-diff (lcs &key (want-common t) 224 | (original-start 0) 225 | (modified-start 0)) 226 | (let ((diff-regions nil)) 227 | (loop 228 | (let ((snake (first lcs))) 229 | (when (or (< original-start (original-offset snake)) 230 | (< modified-start (modified-offset snake))) 231 | (push (make-instance 'modified-diff-region 232 | :original-start original-start 233 | :original-length (- (original-offset snake) 234 | original-start) 235 | :modified-start modified-start 236 | :modified-length (- (modified-offset snake) 237 | modified-start)) 238 | diff-regions)) 239 | (when (zerop (snake-length snake)) 240 | ;; Party's over, let's go home. 241 | (return-from convert-lcs-to-diff (nreverse diff-regions))) 242 | (setf original-start (original-offset snake) 243 | modified-start (modified-offset snake)) 244 | (when want-common 245 | (push (make-instance 'common-diff-region 246 | :original-start original-start 247 | :original-length (snake-length snake) 248 | :modified-start modified-start 249 | :modified-length (snake-length snake)) 250 | diff-regions)) 251 | (incf original-start (snake-length snake)) 252 | (incf modified-start (snake-length snake)) 253 | (pop lcs))))) 254 | 255 | (defun compute-raw-diff (origin modified) 256 | (convert-lcs-to-diff (compute-lcs origin modified))) 257 | 258 | (defun compute-raw-seq-diff (original-seq modified-seq) 259 | (multiple-value-bind (interner interned-seqs) 260 | (intern-seqs original-seq modified-seq) 261 | (declare (ignorable interner)) 262 | (convert-lcs-to-diff (apply #'compute-lcs interned-seqs)))) 263 | 264 | 265 | ;;; producing diffs in "unified diff" format 266 | 267 | (defparameter *diff-context-lines* 3 268 | "The number of lines of context to include for unified and context style 269 | diffs. The 'patch' program will have a hard time with less than two lines 270 | of context; the default of three should be good enough for most situations.") 271 | 272 | (defclass diff () 273 | ((original-pathname :initarg :original-pathname :accessor original-pathname) 274 | (modified-pathname :initarg :modified-pathname :accessor modified-pathname) 275 | (window-class :initarg :window-class :reader diff-window-class) 276 | (windows :initform nil :accessor diff-windows))) 277 | 278 | (defclass unified-diff (diff) () 279 | (:default-initargs 280 | :window-class 'unified-diff-window)) 281 | 282 | (defclass context-diff (diff) () 283 | (:default-initargs 284 | :window-class 'context-diff-window)) 285 | 286 | (defclass diff-generator () 287 | ((interned-lines :initarg :interned-lines :reader interner) 288 | (original-lines :initarg :original-lines :reader original-lines) 289 | (modified-lines :initarg :modified-lines :reader modified-lines) 290 | (current-window :initform nil :accessor current-window) 291 | (diff :initarg :diff :reader diff))) 292 | 293 | (defun original-line (context index) 294 | (let ((interner (interner context)) 295 | (original-lines (original-lines context))) 296 | (interned-object interner (aref original-lines index)))) 297 | 298 | (defun modified-line (context index) 299 | (let ((interner (interner context)) 300 | (modified-lines (modified-lines context))) 301 | (interned-object interner (aref modified-lines index)))) 302 | 303 | ;;; Some other diff implementations call this a "hunk". 304 | (defclass diff-window () 305 | ((original-start-line :initarg :original-start-line 306 | :accessor original-start-line) 307 | (modified-start-line :initarg :modified-start-line 308 | :accessor modified-start-line) 309 | (original-length :initarg :original-length 310 | :initform 0 311 | :accessor original-length) 312 | (modified-length :initarg :modified-length 313 | :initform 0 314 | :accessor modified-length) 315 | (window-chunks :initform nil 316 | :accessor window-chunks))) 317 | 318 | (defun apply-seq-window (original-seq window &key (offset 0)) 319 | "Apply the edits encoded in WINDOW to the ORIGINAL-SEQ." 320 | (multiple-value-bind (interner interned-seqs) 321 | (apply #'intern-seqs original-seq 322 | (mapcar #'chunk-lines (window-chunks window))) 323 | (let ((index (original-start-line window)) 324 | (result (coerce (first interned-seqs) 'list))) 325 | (flet ((ind () (+ index offset)) 326 | (back (line) (interned-object interner line))) 327 | (loop 328 | for chunk in (window-chunks window) 329 | for lines in (mapcar (lambda (l) (coerce l 'list)) (cdr interned-seqs)) 330 | do (case (chunk-kind chunk) 331 | (:common 332 | (mapc (lambda (line) 333 | (assert (eql line (nth (ind) result)) 334 | (line result index) 335 | "window does not apply at ~d, ~s!=~s " 336 | (ind) (back line) (back (nth (ind) result))) 337 | (incf index)) 338 | lines)) 339 | ((:replace :delete) 340 | (setf result 341 | (append (subseq result 0 (ind)) 342 | (subseq result (+ (ind) (length lines))))) 343 | (incf index (length lines)) 344 | (decf offset (length lines))) 345 | ((:insert :create) 346 | (setf result (append (subseq result 0 (ind)) 347 | lines 348 | (subseq result (ind)))) 349 | (incf offset (length lines))))) 350 | (values (mapcar #'back result) offset))))) 351 | 352 | (defun apply-seq-diff (original-seq diff) 353 | "Apply DIFF to the sequence ORIGINAL-SEQ." 354 | (apply #'values 355 | (reduce 356 | (lambda (accumulator window) 357 | (destructuring-bind (seq offset) accumulator 358 | (multiple-value-call #'list 359 | (apply-seq-window seq window :offset offset)))) 360 | (diff-windows diff) :initial-value (list original-seq 0)))) 361 | 362 | (deftype chunk-kind () '(member :common :delete :replace :insert :create)) 363 | 364 | (defclass chunk () 365 | ((kind :initarg :kind :reader chunk-kind :type chunk-kind) 366 | (lines :initarg :lines :reader chunk-lines :type list))) 367 | 368 | (defun modified-chunk-p (chunk) 369 | (let ((kind (chunk-kind chunk))) 370 | (or (eq kind :insert) (eq kind :create)))) 371 | 372 | (defun original-chunk-p (chunk) 373 | (let ((kind (chunk-kind chunk))) 374 | (or (eq kind :delete) (eq kind :replace)))) 375 | 376 | (defmethod print-object ((object chunk) stream) 377 | (print-unreadable-object (object stream) 378 | (format stream "Chunk ~A / ~A" (chunk-kind object) 379 | (length (chunk-lines object))))) 380 | 381 | (defclass unified-diff-window (diff-window) ()) 382 | (defclass context-diff-window (diff-window) ()) 383 | 384 | (defun create-window (generator) 385 | (create-window-for-diff (diff generator))) 386 | 387 | (defun create-window-for-diff (diff) 388 | (make-instance (diff-window-class diff))) 389 | 390 | (defun original-window-length (window) 391 | (reduce #'+ (window-chunks window) 392 | :key #'(lambda (chunk) 393 | (ecase (chunk-kind chunk) 394 | ((:common :delete :replace) (length (chunk-lines chunk))) 395 | ((:insert :create) 0))) 396 | :initial-value 0)) 397 | 398 | (defun modified-window-length (window) 399 | (reduce #'+ (window-chunks window) 400 | :key #'(lambda (chunk) 401 | (ecase (chunk-kind chunk) 402 | ((:common :insert :create) (length (chunk-lines chunk))) 403 | ((:delete :replace) 0))) 404 | :initial-value 0)) 405 | 406 | (defun add-window (context window) 407 | (setf (window-chunks window) (nreverse (window-chunks window))) 408 | (push window (diff-windows (diff context))) 409 | (setf (current-window context) nil)) 410 | 411 | (defgeneric process-region (context region)) 412 | 413 | (defun last-region-p (context region) 414 | (and (= (length (original-lines context)) 415 | (+ (original-start region) (original-length region))) 416 | (= (length (modified-lines context)) 417 | (+ (modified-start region) (modified-length region))))) 418 | 419 | (defmethod process-region ((context diff-generator) 420 | (region common-diff-region)) 421 | (when (current-window context) 422 | (do ((i 0 (1+ i)) 423 | (common-lines nil)) 424 | ;; I'm not entirely convinced that this is right, but the logic 425 | ;; embodied in this complicated expression can be summarized 426 | ;; as follows: 427 | ;; 428 | ;; * if this is the last region in the diff, then we should 429 | ;; only add a few lines of context; 430 | ;; * if this is a "large" region, then we should only add a 431 | ;; few lines of context 432 | ;; * otherwise, we have a "small" connecting region and we 433 | ;; should attempt to add the entire thing. 434 | ((or (if (or (> (original-length region) 435 | (* *diff-context-lines* 2)) 436 | (last-region-p context region)) 437 | (= i *diff-context-lines*) 438 | (= i (* *diff-context-lines* 2))) 439 | (= i (original-length region))) 440 | ;; Add the lines to the current window. 441 | (push (make-instance 'chunk 442 | :kind :common 443 | :lines (nreverse common-lines)) 444 | (window-chunks (current-window context))) 445 | ;; If this is the last chunk of common lines to add, then we 446 | ;; need to reverse the chunks of lines in the window, add 447 | ;; the current window to the window list and null out the 448 | ;; current window. 449 | (when (> (original-length region) 450 | (* *diff-context-lines* 2)) 451 | (add-window context (current-window context)))) 452 | (push (original-line context (+ i (original-start region))) 453 | common-lines)))) 454 | 455 | (defmethod process-region ((context diff-generator) 456 | (region modified-diff-region)) 457 | (let ((window (current-window context))) 458 | (unless window 459 | ;; This is the ugly case, because there was some common region 460 | ;; which we don't know about, yet we have to add lines from said 461 | ;; region. We do know, however, that the common region which 462 | ;; preceeded this one must have been longer than 463 | ;; *DIFF-CONTEXT-LINES* lines, or else the window would not 464 | ;; be NIL. We therefore walk backwards from ourself, adding 465 | ;; common lines as we go, and then we add the lines we contribute. 466 | (do ((i 0 (1+ i)) 467 | (new-window (create-window context)) 468 | (common-lines nil)) 469 | ((or (= i *diff-context-lines*) 470 | ;; Edge case of when the common region begins the file 471 | ;; and is quite small. 472 | (= 0 (- (original-start region) i))) 473 | (push (make-instance 'chunk 474 | :kind :common 475 | :lines common-lines) 476 | (window-chunks new-window)) 477 | ;; Make a note of where this window started for posterity. 478 | (setf (original-start-line new-window) (- (original-start region) i) 479 | (modified-start-line new-window) (- (modified-start region) i)) 480 | (setf (current-window context) new-window) 481 | (setf window new-window)) 482 | (push (original-line context (- (original-start region) i 1)) 483 | common-lines))) 484 | ;; Deletes come first. 485 | (when (plusp (original-length region)) 486 | (loop for index from (original-start region) 487 | below (+ (original-start region) (original-length region)) 488 | collect (original-line context index) into deleted-lines 489 | finally (push (make-instance 'chunk 490 | :kind (if (plusp (modified-length region)) 491 | :replace 492 | :delete) 493 | :lines deleted-lines) 494 | (window-chunks window)))) 495 | ;; Now for the inserts. 496 | (when (plusp (modified-length region)) 497 | (loop for index from (modified-start region) 498 | below (+ (modified-start region) (modified-length region)) 499 | collect (modified-line context index) into inserted-lines 500 | finally (push (make-instance 'chunk 501 | :kind (if (plusp (original-length region)) 502 | :insert 503 | :create) 504 | :lines inserted-lines) 505 | (window-chunks window)))))) 506 | 507 | ;;; Actually producing diffs. 508 | (defun walk-diff-regions (context diff-regions) 509 | (dolist (region diff-regions) 510 | (process-region context region)) 511 | ;; Pick off any stragglers. FIXME: is this appropriate for a generic 512 | ;; operation like WALK-DIFF-REGIONS? Maybe there should be a function 513 | ;; like FINALIZE-CONTEXT or some such. 514 | (when (current-window context) 515 | (add-window context (current-window context))) 516 | (let ((diff (diff context))) 517 | (setf (diff-windows diff) (nreverse (diff-windows diff))) 518 | diff)) 519 | 520 | (defun create-diff-generator (diff-kind interner 521 | original-pathname original-lines 522 | modified-pathname modified-lines) 523 | (make-instance 'diff-generator 524 | :interned-lines interner 525 | :original-lines original-lines 526 | :modified-lines modified-lines 527 | :diff (make-instance diff-kind 528 | :original-pathname original-pathname 529 | :modified-pathname modified-pathname))) 530 | 531 | (defgeneric render-diff (diff stream) 532 | (:documentation "Print DIFF object to STREAM")) 533 | 534 | (defgeneric render-diff-window (window stream) 535 | (:documentation "Print WINDOW to STREAM")) 536 | 537 | (defun generate-diff (diff-kind original-pathname modified-pathname) 538 | "Compute a diff between ORIGINAL-PATHNAME and MODIFIED-PATHNAME. 539 | DIFF-KIND indicates the type of DIFF generated and should be the symbol 540 | DIFF:UNIFIED-DIFF or DIFF:CONTEXT-DIFF." 541 | (multiple-value-bind (interner interned-files) 542 | (intern-files original-pathname modified-pathname) 543 | (let* ((original (first interned-files)) 544 | (modified (second interned-files)) 545 | (lcs (compute-lcs original modified))) 546 | (let ((diff-regions (convert-lcs-to-diff lcs)) 547 | (context (create-diff-generator diff-kind interner 548 | original-pathname original 549 | modified-pathname modified))) 550 | (walk-diff-regions context diff-regions))))) 551 | 552 | (defun generate-seq-diff (diff-kind original-seq modified-seq) 553 | "Compute a diff between ORIGINAL-PATHNAME and MODIFIED-PATHNAME." 554 | (multiple-value-bind (interner interned-seqs) 555 | (intern-seqs original-seq modified-seq) 556 | (let* ((original (first interned-seqs)) 557 | (modified (second interned-seqs)) 558 | (lcs (compute-lcs original modified))) 559 | (let ((diff-regions (convert-lcs-to-diff lcs)) 560 | (context (create-diff-generator diff-kind interner 561 | "original" original 562 | "modified" modified))) 563 | (walk-diff-regions context diff-regions))))) 564 | 565 | (defun format-diff (diff-kind original-pathname modified-pathname &optional (stream *standard-output*)) 566 | (render-diff (generate-diff diff-kind 567 | original-pathname 568 | modified-pathname) 569 | stream)) 570 | 571 | (defun format-diff-string (diff-kind original-pathname modified-pathname) 572 | (with-output-to-string (out) 573 | (format-diff diff-kind original-pathname modified-pathname out))) 574 | 575 | ;;; printing diffs on streams 576 | 577 | 578 | (defmethod render-diff-window :before ((window unified-diff-window) stream) 579 | (let ((original-length (original-window-length window)) 580 | (modified-length (modified-window-length window))) 581 | (format stream "@@ -~A" (1+ (original-start-line window))) 582 | (unless (zerop original-length) 583 | (format stream ",~A" original-length)) 584 | (format stream " +~A" (1+ (modified-start-line window))) 585 | (unless (zerop modified-length) 586 | (format stream ",~A" modified-length)) 587 | (write-string " @@" stream) 588 | (terpri stream))) 589 | 590 | (defmethod render-diff-window :before ((window context-diff-window) stream) 591 | (format stream "***************~%")) 592 | 593 | (defmethod render-diff-window ((object unified-diff-window) stream) 594 | (dolist (chunk (window-chunks object)) 595 | (let ((prefix (ecase (chunk-kind chunk) 596 | (:common #\Space) 597 | ((:delete :replace) #\-) 598 | ((:insert :create) #\+)))) 599 | (dolist (line (chunk-lines chunk)) 600 | (write-char prefix stream) 601 | (write-string line stream) 602 | (terpri stream))))) 603 | 604 | (defun window-contains-deletes-p (window) 605 | (some #'original-chunk-p (window-chunks window))) 606 | 607 | (defun window-contains-inserts-p (window) 608 | (some #'modified-chunk-p (window-chunks window))) 609 | 610 | (defmethod render-diff-window ((window context-diff-window) stream) 611 | (let ((original-length (1- (original-window-length window))) 612 | (original-start-line (1+ (original-start-line window))) 613 | (modified-length (1- (modified-window-length window))) 614 | (modified-start-line (1+ (modified-start-line window)))) 615 | ;; FIXME: lots of duplicated code, but factoring it out would result 616 | ;; in a function with about ten parameters...which is of dubious 617 | ;; usefulness. Still, good style dictates that it should be done. 618 | ;; the original file comes first 619 | (format stream "*** ~A,~A ****~%" original-start-line 620 | (+ original-start-line original-length)) 621 | (when (and (plusp original-length) (window-contains-deletes-p window)) 622 | (dolist (chunk (window-chunks window)) 623 | (unless (modified-chunk-p chunk) 624 | (let ((prefix (ecase (chunk-kind chunk) 625 | (:common #\Space) 626 | (:replace #\!) 627 | (:delete #\-)))) 628 | (dolist (line (chunk-lines chunk)) 629 | (write-char prefix stream) 630 | (write-string line stream) 631 | (terpri stream)))))) 632 | ;; now the modified file 633 | (format stream "--- ~A,~A ----~%" modified-start-line 634 | (+ modified-start-line modified-length)) 635 | (when (and (plusp modified-length) (window-contains-inserts-p window)) 636 | (dolist (chunk (window-chunks window)) 637 | (unless (original-chunk-p chunk) 638 | (let ((prefix (ecase (chunk-kind chunk) 639 | (:common #\Space) 640 | (:insert #\!) 641 | (:create #\+)))) 642 | (dolist (line (chunk-lines chunk)) 643 | (write-char prefix stream) 644 | (write-string line stream) 645 | (terpri stream)))))))) 646 | 647 | (defmethod render-diff :before ((diff unified-diff) stream) 648 | (format stream "--- ~A~%+++ ~A~%" 649 | (namestring (original-pathname diff)) 650 | (namestring (modified-pathname diff)))) 651 | 652 | (defmethod render-diff :before ((diff context-diff) stream) 653 | (format stream "*** ~A~%--- ~A~%" 654 | (namestring (original-pathname diff)) 655 | (namestring (modified-pathname diff)))) 656 | 657 | (defmethod render-diff ((object diff) stream) 658 | (dolist (window (diff-windows object)) 659 | (render-diff-window window stream))) 660 | 661 | -------------------------------------------------------------------------------- /package.lisp: -------------------------------------------------------------------------------- 1 | ;;;; package.lisp 2 | 3 | (defpackage :diff 4 | (:use :cl) 5 | (:export #:*diff-context-lines* 6 | #:generate-diff 7 | #:generate-seq-diff 8 | #:unified-diff #:context-diff 9 | 10 | #:apply-seq-window 11 | #:apply-seq-diff 12 | #:apply-seq-patch 13 | #:apply-patch 14 | 15 | #:render-diff 16 | #:render-diff-window 17 | #:format-diff 18 | #:format-diff-string 19 | 20 | #:diff 21 | #:original-pathname 22 | #:modified-pathname 23 | #:diff-window-class 24 | #:diff-windows 25 | 26 | #:diff-window 27 | #:original-start-line 28 | #:original-length 29 | #:modified-start-line 30 | #:modified-length 31 | #:window-chunks 32 | 33 | #:chunk-kind 34 | #:chunk-lines 35 | 36 | #:compute-raw-diff 37 | #:compute-raw-seq-diff 38 | #:common-diff-region 39 | #:modified-diff-region 40 | #:original-start 41 | #:original-length 42 | #:modified-start 43 | #:modified-length)) 44 | -------------------------------------------------------------------------------- /patch.lisp: -------------------------------------------------------------------------------- 1 | ;;;; patch.lisp -- reading patch files and applying them 2 | 3 | (in-package :diff) 4 | 5 | 6 | ;;; Some people, when confronted with a problem think, "I know, I'll 7 | ;;; use regular expressions." Now they have two problems. 8 | ;;; 9 | ;;; --Jamie Zawinski 10 | (defparameter *number-regex* '(:greedy-repetition 1 nil :digit-class)) 11 | 12 | 13 | (eval-when (:compile-toplevel :load-toplevel :execute) 14 | (defun anchored-line (regex) 15 | (if (consp regex) 16 | `(:sequence :start-anchor ,@regex :end-anchor) 17 | `(:sequence :start-anchor ,regex :end-anchor))) 18 | ) 19 | 20 | (defparameter *unified-diff-window-header* 21 | (cl-ppcre:create-scanner 22 | (anchored-line 23 | `("@@ -" 24 | (:register ,*number-regex*) 25 | (:greedy-repetition 0 1 (:sequence #\, (:register ,*number-regex*))) 26 | " +" 27 | (:register ,*number-regex*) 28 | (:greedy-repetition 0 1 (:sequence #\, (:register ,*number-regex*))) 29 | " @@")))) 30 | 31 | (defparameter *context-diff-window-header* 32 | (cl-ppcre:create-scanner (anchored-line "***************"))) 33 | 34 | (defparameter *context-diff-window-original-line* 35 | (cl-ppcre:create-scanner 36 | (anchored-line 37 | `("*** " 38 | (:register ,*number-regex*) 39 | #\, 40 | (:register ,*number-regex*) 41 | " ****")))) 42 | 43 | (defparameter *context-diff-window-modified-line* 44 | (cl-ppcre:create-scanner 45 | (anchored-line 46 | `("--- " 47 | (:register ,*number-regex*) 48 | #\, 49 | (:register ,*number-regex*) 50 | " ----")))) 51 | 52 | (defparameter *index-header-line* 53 | (cl-ppcre:create-scanner 54 | (anchored-line 55 | '("Index: " (:register (:greedy-repetition 0 nil :everything)))))) 56 | 57 | (defparameter *prereq-header-line* 58 | (cl-ppcre:create-scanner 59 | (anchored-line 60 | '("Prereq: " (:register (:greedy-repetition 0 nil :everything)))))) 61 | 62 | (defparameter *unified-diff-line* 63 | (cl-ppcre:create-scanner 64 | (anchored-line 65 | `((:register (:alternation #\+ #\- #\Space)) 66 | (:register (:greedy-repetition 0 nil :everything)))))) 67 | 68 | (defparameter *context-diff-line* 69 | (cl-ppcre:create-scanner 70 | (anchored-line 71 | `((:register (:alternation #\+ #\- #\! #\Space)) 72 | (:register (:greedy-repetition 0 nil :everything)))))) 73 | 74 | (defparameter *unified-diff-header-original-line* 75 | (cl-ppcre:create-scanner 76 | '(:sequence :start-anchor 77 | "--- " (:register (:greedy-repetition 1 nil :non-whitespace-char-class))))) 78 | 79 | (defparameter *unified-diff-header-modified-line* 80 | (cl-ppcre:create-scanner 81 | '(:sequence :start-anchor 82 | "+++ " (:register (:greedy-repetition 1 nil :non-whitespace-char-class))))) 83 | 84 | (defparameter *context-diff-header-original-line* 85 | (cl-ppcre:create-scanner 86 | '(:sequence :start-anchor 87 | "*** " (:register (:greedy-repetition 1 nil :non-whitespace-char-class))))) 88 | 89 | (defparameter *context-diff-header-modified-line* 90 | (cl-ppcre:create-scanner 91 | '(:sequence :start-anchor 92 | "--- " (:register (:greedy-repetition 1 nil :non-whitespace-char-class))))) 93 | 94 | (defun collect-window-lines (stream test) 95 | (loop for line = (read-line stream) 96 | while (funcall test line) 97 | collect line into lines 98 | finally (return (values lines line)))) 99 | 100 | 101 | ;;; abstraction for reading straight lines from a stream 102 | 103 | (defclass line-generator () 104 | ((peeked-line :accessor peeked-line :initform nil) 105 | (stream :accessor line-stream :initarg :stream))) 106 | 107 | (defun make-line-generator (stream) 108 | (make-instance 'line-generator :stream stream)) 109 | 110 | (defun yield-line (line-generator) 111 | (cond 112 | ((peeked-line line-generator) 113 | (prog1 (peeked-line line-generator) 114 | (setf (peeked-line line-generator) nil))) 115 | (t 116 | (read-line (line-stream line-generator) nil nil)))) 117 | 118 | (defun peek-line (line-generator) 119 | (cond 120 | ((peeked-line line-generator) 121 | (peeked-line line-generator)) 122 | (t 123 | (setf (peeked-line line-generator) 124 | (read-line (line-stream line-generator) nil nil))))) 125 | 126 | ;;; reading diff windows ("hunks") 127 | 128 | (defparameter *default-lead-char-alist* '((#\Space . :common) 129 | (#\+ . :create) 130 | (#\- . :delete))) 131 | 132 | (defparameter *context-original-lead-char-alist* 133 | (acons #\! :replace *default-lead-char-alist*)) 134 | 135 | (defparameter *context-modified-lead-char-alist* 136 | (acons #\! :insert *default-lead-char-alist*)) 137 | 138 | (defun line-to-chunk (line &optional (table *default-lead-char-alist*)) 139 | (let* ((char (aref line 0)) 140 | (text-line (subseq line 1)) 141 | (chunk-kind (cdr (assoc char table :test #'char=)))) 142 | (make-instance 'chunk 143 | :kind chunk-kind 144 | :lines (list text-line)))) 145 | 146 | (defgeneric read-diff-window (stream diff)) 147 | 148 | (defmethod read-diff-window (linegen (diff unified-diff)) 149 | (let ((window (create-window-for-diff diff)) 150 | (line (peek-line linegen))) 151 | (cond 152 | ((cl-ppcre:register-groups-bind ((#'parse-integer original-start 153 | original-length 154 | modified-start 155 | modified-length)) 156 | (*unified-diff-window-header* line) 157 | (setf (original-start-line window) (1- original-start) 158 | (original-length window) (if original-length original-length 1) 159 | (modified-start-line window) (1- modified-start) 160 | (modified-length window) (if modified-length modified-length 1)) 161 | (yield-line linegen))) 162 | (t 163 | (return-from read-diff-window nil))) 164 | (loop for line = (peek-line linegen) 165 | while (and line (cl-ppcre:scan *unified-diff-line* line)) 166 | collect (line-to-chunk (yield-line linegen)) into chunks 167 | finally (setf (window-chunks window) (consolidate-chunks chunks))) 168 | window)) 169 | 170 | ;;; We want to convert sequences of chunks with the same kind to a single 171 | ;;; chunk containing the lines of the sequence. 172 | (defun consolidate-chunks (chunk-list) 173 | (declare (type list chunk-list)) 174 | (do ((list chunk-list) 175 | (consolidated nil)) 176 | ((null list) (nreverse consolidated)) 177 | ;; consolidate chunks which match the kind of the head chunk 178 | (let* ((head-chunk (first list)) 179 | (kind (chunk-kind head-chunk))) 180 | (loop for rest on list 181 | for candidate = (car rest) 182 | while (eq kind (chunk-kind candidate)) 183 | collect (first (chunk-lines candidate)) into lines 184 | finally (let ((new-chunk (make-instance 'chunk 185 | :kind kind 186 | :lines lines))) 187 | (push new-chunk consolidated) 188 | (setf list rest)))))) 189 | 190 | (defmethod read-diff-window (linegen (diff context-diff)) 191 | (let ((window (create-window-for-diff diff)) 192 | (original-chunks nil) 193 | (modified-chunks nil) 194 | (line (yield-line linegen))) 195 | ;; read the "original" part 196 | (unless line 197 | (return-from read-diff-window nil)) 198 | (and (cl-ppcre:scan *context-diff-window-header* line) 199 | (setf line (yield-line linegen))) 200 | (unless (cl-ppcre:register-groups-bind ((#'parse-integer original-start 201 | original-end)) 202 | (*context-diff-window-original-line* line) 203 | (setf (original-start-line window) (1- original-start) 204 | (original-length window) (1+ (- original-end 205 | original-start)))) 206 | (error "Non-matching original window header ~A" line)) 207 | (loop for i from 0 below (original-length window) 208 | until (cl-ppcre:scan *context-diff-window-modified-line* (peek-line linegen)) 209 | collect (line-to-chunk (yield-line linegen) 210 | *context-original-lead-char-alist*) into chunks 211 | finally (setf original-chunks chunks)) 212 | ;; read the "modified" part 213 | (setf line (yield-line linegen)) 214 | (unless (cl-ppcre:register-groups-bind ((#'parse-integer modified-start 215 | modified-end)) 216 | (*context-diff-window-modified-line* line) 217 | (setf (modified-start-line window) (1- modified-start) 218 | (modified-length window) (1+ (- modified-end 219 | modified-start)))) 220 | (error "Non-matching modified window header ~A" line)) 221 | (loop for i from 0 below (modified-length window) 222 | until (let ((maybe-line (peek-line linegen))) 223 | (or (not maybe-line) ; EOF 224 | (cl-ppcre:scan *context-diff-window-header* maybe-line))) 225 | collect (line-to-chunk (yield-line linegen) 226 | *context-modified-lead-char-alist*) into chunks 227 | finally (setf modified-chunks chunks)) 228 | (setf (window-chunks window) 229 | (merge-context-chunks (consolidate-chunks original-chunks) 230 | (consolidate-chunks modified-chunks))) 231 | window)) 232 | 233 | (defun merge-context-chunks (original modified) 234 | (do ((original original) 235 | (modified modified) 236 | (merged-chunks nil)) 237 | ((and (null original) (null modified)) 238 | (nreverse merged-chunks)) 239 | (cond 240 | ((null original) 241 | (push (pop modified) merged-chunks)) 242 | ((null modified) 243 | (push (pop original) merged-chunks)) 244 | (t 245 | (let ((orig-kind (chunk-kind (first original))) 246 | (mod-kind (chunk-kind (first modified)))) 247 | (cond 248 | ((and (eq orig-kind :common) (eq mod-kind :common)) 249 | (push (pop original) merged-chunks) 250 | (pop modified)) 251 | ((eq orig-kind :common) 252 | (push (pop modified) merged-chunks)) 253 | ((eq mod-kind :common) 254 | (push (pop original) merged-chunks)) 255 | (t 256 | (push (pop original) merged-chunks) 257 | (push (pop modified) merged-chunks)))))))) 258 | 259 | ;;; reading patches 260 | 261 | (defclass patch () 262 | ((diff :initarg :diff :reader diff) 263 | (index-file :initarg :index-file :reader index-file) 264 | (prereq-string :initarg :prereq-string :reader prereq-string))) 265 | 266 | (defun read-patch (linegen) 267 | (let ((diff nil) 268 | (index-filename nil) 269 | (prereq-string nil) 270 | (line "")) 271 | (tagbody 272 | UNKNOWN-STATE 273 | (setf line (yield-line linegen)) 274 | (unless line 275 | (go RETURN-VALUE)) 276 | (cond 277 | ((cl-ppcre:register-groups-bind (original-pathname) 278 | (*unified-diff-header-original-line* line) 279 | (setf diff (make-instance 'unified-diff)) 280 | (setf (original-pathname diff) original-pathname)) 281 | (go UNIFIED-MODIFIED-LINE)) 282 | ((cl-ppcre:register-groups-bind (original-pathname) 283 | (*context-diff-header-original-line* line) 284 | (setf diff (make-instance 'context-diff)) 285 | (setf (original-pathname diff) original-pathname)) 286 | (go CONTEXT-MODIFIED-LINE)) 287 | ((cl-ppcre:register-groups-bind (index-pathname) 288 | (*index-header-line* line) 289 | (setf index-filename index-pathname)) 290 | (go UNKNOWN-STATE)) 291 | ((cl-ppcre:register-groups-bind (prereq) 292 | (*prereq-header-line* line) 293 | (setf prereq-string prereq)) 294 | (go UNKNOWN-STATE)) 295 | (t 296 | (go UNKNOWN-STATE))) 297 | UNIFIED-MODIFIED-LINE 298 | (setf line (yield-line linegen)) 299 | (unless line 300 | (go RETURN-VALUE)) 301 | (cond 302 | ((cl-ppcre:register-groups-bind (modified-pathname) 303 | (*unified-diff-header-modified-line* line) 304 | (setf (modified-pathname diff) modified-pathname)) 305 | (go WINDOW-STATE)) 306 | (t 307 | (error "Could not read unified modified-line"))) 308 | CONTEXT-MODIFIED-LINE 309 | (setf line (yield-line linegen)) 310 | (unless line 311 | (go RETURN-VALUE)) 312 | (cond 313 | ((cl-ppcre:register-groups-bind (modified-pathname) 314 | (*context-diff-header-modified-line* line) 315 | (setf (modified-pathname diff) modified-pathname)) 316 | (go WINDOW-STATE)) 317 | (t 318 | (error "Could not read context modified-line"))) 319 | WINDOW-STATE 320 | (loop for window = (read-diff-window linegen diff) 321 | while window 322 | collect window into windows 323 | finally (progn 324 | (setf (diff-windows diff) windows) 325 | (go RETURN-VALUE))) 326 | RETURN-VALUE 327 | (return-from read-patch (values diff index-filename prereq-string))))) 328 | 329 | (defun read-patches-from-file (pathname) 330 | (let ((patches nil)) 331 | (with-open-file (stream pathname :direction :input 332 | :if-does-not-exist :error) 333 | (let ((linegen (make-line-generator stream))) 334 | (loop 335 | (multiple-value-bind (diff index prereq) 336 | (read-patch linegen) 337 | (unless diff 338 | (return-from read-patches-from-file (nreverse patches))) 339 | (push (make-instance 'patch 340 | :diff diff 341 | :index-file (and index (pathname index)) 342 | :prereq-string prereq) 343 | patches))))))) 344 | 345 | (defun apply-seq-patch (original-seq patch) 346 | "Apply PATCH to the sequence ORIGINAL-SEQ." 347 | (apply-seq-diff original-seq (diff patch))) 348 | 349 | (defun apply-patch (patch &aux original) 350 | "Apply PATCH." 351 | (do-file-lines (line (original-pathname (diff patch))) (push line original)) 352 | (with-open-file (out (original-pathname (diff patch)) 353 | :direction :output :if-exists :supersede) 354 | (format out "~{~a~^~%~}~%" (apply-seq-patch (nreverse original) patch)))) 355 | -------------------------------------------------------------------------------- /svndiff.lisp: -------------------------------------------------------------------------------- 1 | ;;;; svndiff.lisp - encoding copy/insert deltas in svndiff format 2 | (in-package :diff) 3 | 4 | (declaim (optimize (debug 3))) 5 | 6 | (defun instruction-buffer-length (instlist) 7 | "Compute the length of the instruction buffer for a window containing 8 | the instructions in INSTLIST." 9 | (reduce #'+ instlist :initial-value 0 :key #'instruction-length)) 10 | 11 | (defun new-data-buffer-length (instlist) 12 | "Compute the length of the new data buffer for a window containing 13 | the instructions in INSTLIST." 14 | (reduce #'+ instlist :initial-value 0 15 | :key #'(lambda (op) 16 | (ecase (svndiff-op-kind op) 17 | (:new-data (svndiff-op-bytes op)) 18 | ((:copy-source :copy-target) 0))))) 19 | 20 | (defun write-svndiff-op (op target instruction-stream data-stream) 21 | (declare (type bytebuf target)) 22 | (let ((kind (svndiff-op-kind op)) 23 | (offset (svndiff-op-offset op)) 24 | (bytes (svndiff-op-bytes op))) 25 | (flet ((encode-copy (insn-byte) 26 | (when (< bytes 64) 27 | (setf insn-byte (logior insn-byte bytes))) 28 | (write-byte insn-byte instruction-stream) 29 | (when (>= bytes 64) 30 | (write-svndiff-integer bytes instruction-stream)) 31 | (write-svndiff-integer offset instruction-stream))) 32 | (ecase kind 33 | (:copy-target (encode-copy #x40)) 34 | (:copy-source (encode-copy #x00)) 35 | (:new-data 36 | (let ((insn-byte #x80)) ; first two bits `10' 37 | (when (< bytes 64) 38 | (setf insn-byte (logior insn-byte bytes))) 39 | (write-byte insn-byte instruction-stream) 40 | (when (>= bytes 64) 41 | (write-svndiff-integer bytes instruction-stream)) 42 | (dotimes (i bytes) 43 | (write-byte (aref target (+ i offset)) 44 | data-stream)))))))) 45 | 46 | (defun construct-svndiff-window (source-offset source-len target-len 47 | ops target) 48 | (declare (type bytebuf target)) 49 | (let* ((ops-buf-length (instruction-buffer-length ops)) 50 | (data-buf-length (new-data-buffer-length ops)) 51 | (ops-buffer (make-array ops-buf-length 52 | :element-type '(unsigned-byte 8) 53 | :initial-element 0)) 54 | (data-buffer (make-array data-buf-length 55 | :element-type '(unsigned-byte 8) 56 | :initial-element 0)) 57 | (ops-stream (make-instance 'byte-buffer-stream :buffer ops-buffer)) 58 | (data-stream (make-instance 'byte-buffer-stream :buffer data-buffer))) 59 | (declare (type bytebuf ops-buffer data-buffer)) 60 | (dolist (op ops) 61 | (write-svndiff-op op target ops-stream data-stream)) 62 | (make-svndiff-window :source-offset source-offset 63 | :source-len source-len 64 | :target-len target-len 65 | :ops ops-buffer 66 | :new-data data-buffer))) 67 | 68 | (defun svndiff-integer-length (num) 69 | (if (zerop num) 70 | 1 71 | (nth-value 0 (ceiling (integer-length num) 7)))) 72 | 73 | (defun instruction-length (op) 74 | "Compute the number of bytes needed to represent OP when it is svndiff 75 | encoded." 76 | (let ((kind (svndiff-op-kind op)) 77 | (offset (svndiff-op-offset op)) 78 | (bytes (svndiff-op-bytes op))) 79 | (ecase kind 80 | ((:copy-source :copy-target) 81 | (if (< bytes 64) 82 | (1+ (svndiff-integer-length offset)) 83 | (+ 1 84 | (svndiff-integer-length offset) 85 | (svndiff-integer-length bytes)))) 86 | (:new-data 87 | (if (< bytes 64) 88 | 1 89 | (1+ (svndiff-integer-length bytes))))))) 90 | 91 | (defun read-svndiff-integer (stream) 92 | "Reads a svndiff-encoded integer from STREAM." 93 | (let ((int 0) 94 | (byte (read-byte stream)) 95 | (data-byte-spec (byte 7 0))) 96 | (setf int (ldb data-byte-spec byte)) 97 | (loop while (> byte 127) 98 | do (setf byte (read-byte stream) 99 | int (logior (ash int 7) 100 | (ldb data-byte-spec byte)))) 101 | int)) 102 | 103 | (defun write-svndiff-integer (integer stream) 104 | "Writes the non-negative INTEGER to STREAM using svndiff encoding." 105 | (when (zerop integer) 106 | (write-byte #x00 stream) 107 | (return-from write-svndiff-integer (values))) 108 | (let ((blocks (svndiff-integer-length integer))) 109 | (loop for i downfrom blocks above 0 110 | do (let ((part (ldb (byte 7 (* 7 (1- i))) integer))) 111 | (assert (< part 128)) 112 | (unless (= i 1) 113 | (setf part (logior #x80 part))) 114 | (write-byte part stream))) 115 | (values))) 116 | 117 | (defun read-svndiff-window (stream) 118 | "Reads a svndiff window from STREAM." 119 | (let ((source-offset (read-svndiff-integer stream)) 120 | (source-len (read-svndiff-integer stream)) 121 | (target-len (read-svndiff-integer stream)) 122 | (instrs-len (read-svndiff-integer stream)) 123 | (new-data-len (read-svndiff-integer stream))) 124 | (let ((instr-bytes (make-array instrs-len 125 | :element-type '(unsigned-byte 8) 126 | :initial-element 0)) 127 | (new-data (make-array new-data-len 128 | :element-type '(unsigned-byte 8) 129 | :initial-element 0)) 130 | (bytes-read 0)) 131 | (declare (type bytebuf instr-bytes new-data)) 132 | (setf bytes-read (read-sequence instr-bytes stream)) 133 | (unless (= bytes-read instrs-len) 134 | (error "Could not read instructions for svndiff window.")) 135 | (setf bytes-read (read-sequence new-data stream)) 136 | (unless (= bytes-read new-data-len) 137 | (error "Could not read new data for svndiff window.")) 138 | (make-svndiff-window :source-offset source-offset 139 | :source-len source-len 140 | :target-len target-len 141 | :ops instr-bytes 142 | :new-data new-data)))) 143 | 144 | (defun write-svndiff-window (window stream) 145 | "Writes the svndiff window WINDOW to STREAM." 146 | (let ((source-offset (svndiff-window-source-offset window)) 147 | (source-len (svndiff-window-source-len window)) 148 | (target-len (svndiff-window-target-len window)) 149 | (instr-bytes (svndiff-window-ops window)) 150 | (new-data (svndiff-window-new-data window))) 151 | (declare (type (simple-array (unsigned-byte 8) (*)) instr-bytes new-data)) 152 | (write-svndiff-integer source-offset stream) 153 | (write-svndiff-integer source-len stream) 154 | (write-svndiff-integer target-len stream) 155 | (write-svndiff-integer (length instr-bytes) stream) 156 | (write-svndiff-integer (length new-data) stream) 157 | (write-sequence instr-bytes stream) 158 | (write-sequence new-data stream) 159 | (values))) 160 | 161 | (defun write-svndiff (windows stream) 162 | "Writes a svndiff document to STREAM using the information in WINDOWS, 163 | which is a list of SVNDIFF-WINDOWs." 164 | ;; "SVN\0" 165 | (write-byte #x83 stream) 166 | (write-byte #x86 stream) 167 | (write-byte #x78 stream) 168 | (write-byte #x00 stream) 169 | (dolist (window windows) 170 | (write-svndiff-window window stream))) 171 | 172 | 173 | ;;; high level driver for the whole shebang 174 | 175 | (defun compare-files (source-filename target-filename) 176 | "Return a list of svndiff windows describing the differences between 177 | SOURCE-FILENAME and TARGET-FILENAME." 178 | (with-binary-file (source-stream source-filename :input) 179 | (with-binary-file (target-stream target-filename :input) 180 | (let ((context (make-instance 'vdelta-context)) 181 | (windows nil)) 182 | (do* ((buffer (buffer context)) 183 | (source-offset 0) 184 | (source-length (read-sequence buffer source-stream 185 | :start 0 :end +buffer-size+) 186 | (read-sequence buffer source-stream 187 | :start 0 :end +buffer-size+)) 188 | (target-length (read-sequence buffer target-stream 189 | :start source-length) 190 | (read-sequence buffer target-stream 191 | :start source-length))) 192 | ((zerop target-length) (nreverse windows)) 193 | (clrhash (table context)) 194 | (setf (source-length context) source-length 195 | (target-start context) source-length 196 | (target-length context) (- target-length source-length)) 197 | (initialize-match-table context) 198 | (let* ((ops (calculate-svndiff-ops context))) 199 | (push ops windows)) 200 | (incf source-offset source-length)))))) 201 | 202 | #| 203 | ;;; vcdiff encoding stuff 204 | 205 | ;;; managing the cache. these algorithms are taken directly from the RFC, 206 | ;;; with small adjustments from the C-style used therein 207 | (defclass address-cache () 208 | ((near-cache :initarg :near-cache :reader near-cache) 209 | (next-near-slot :initform 0 :reader next-near-slot) 210 | (same-cache :initarg :same-cache :reader same-cache))) 211 | 212 | (defun make-address-cache (near-size same-size) 213 | (let ((near-cache (make-array near-size :initial-element 0)) 214 | (same-cache (make-array (* 256 same-size :initial-element 0)))) 215 | (make-instance 'address-cache 216 | :near-cache near-cache :same-cache same-cache))) 217 | 218 | (defun update-cache (address-cache address) 219 | (let ((near-cache-length (length (near-cache address-cache))) 220 | (same-cache-length (length (same-cache address-cache)))) 221 | (when (plusp near-cache-length) 222 | (setf (aref (near-cache address-cache) (next-near-slot address-cache)) 223 | address) 224 | (setf (next-near-slot address-cache) 225 | (truncate (1+ (next-near-slot address-cache)) near-cache-length))) 226 | (when (plusp same-cache-length) 227 | (setf (aref (same-cache address-cache) 228 | (nth-value 1 (truncate address same-cache-length))) 229 | address)) 230 | (values))) 231 | 232 | ;;; from the RFC: "attempt to find the address mode that yields the 233 | ;;; smallest integer value for the encoded address value, thereby 234 | ;;; minimizing the encoded size of the address. the RFC goes on to 235 | ;;; note that this best address value is "local" and suggests that more 236 | ;;; complex schemes might be able to do better. 237 | (defun encode-address (address-cache address here-p) 238 | (let ((best-encoded-address address) 239 | ;; I think when the RFC uses `here', it means `target', and 240 | ;; when it uses `self', it means `source'. check this to 241 | ;; make sure our intuition is right. why don't they use 242 | ;; self-descriptive terms? and can't we already figure this 243 | ;; out in an earlier phase? 244 | (best-mode (if here-p :vcd-here :vcd-self))) 245 | (dotimes (i (length (near-cache address-cache))) 246 | (let ((d (- address (aref (near-cache address-cache) i)))) 247 | (when (and (not (minusp d)) (< d best-encoded-address)) 248 | (setf best-encoded-address d 249 | ;; ugh 250 | best-mode (+ i 2))))) 251 | (let ((d (nth-value 1 (truncate address 252 | (* 256 (length (same-cache address-cache))))))) 253 | (when (and (plusp (length (same-cache address-cache))) 254 | (= addr 255 | (aref (same-cache address-cache) d))) 256 | (setf best-encoded-address d 257 | best-mode (+ (length (near-cache address-cache)) 2 (truncate d 256)))) 258 | (update-cache address-cache address) 259 | (values best-encoded-address best-mode)))) 260 | 261 | (defun decode-address (address-cache address mode) 262 | (let ((decoded-address 0)) 263 | (cond 264 | ((eq mode :vcd-self) (setf decoded-address 265 | (fetch-copy-address-integer #|wherever|#))) 266 | ((eq mode :vcd-here) (setf decoded-address 267 | (- address (fetch-copy-address-integer #|wherever|#)))) 268 | ((= 269 | ))))) 270 | |# -------------------------------------------------------------------------------- /test-files/test-dst-1.txt: -------------------------------------------------------------------------------- 1 | a 2 | b 3 | z 4 | d 5 | e 6 | f 7 | g 8 | h 9 | i 10 | j 11 | k 12 | y 13 | w 14 | n 15 | o 16 | p 17 | -------------------------------------------------------------------------------- /test-files/test-dst-2.txt: -------------------------------------------------------------------------------- 1 | a 2 | c 3 | e 4 | b 5 | d 6 | a 7 | b 8 | b 9 | a 10 | b 11 | e 12 | d 13 | -------------------------------------------------------------------------------- /test-files/test-dst-3.txt: -------------------------------------------------------------------------------- 1 | d 2 | d 3 | d 4 | b 5 | b 6 | b 7 | b 8 | b 9 | b 10 | b 11 | e 12 | e 13 | e 14 | f 15 | f 16 | f 17 | f 18 | f 19 | f 20 | f 21 | f 22 | f 23 | -------------------------------------------------------------------------------- /test-files/test-src-1.txt: -------------------------------------------------------------------------------- 1 | a 2 | b 3 | c 4 | d 5 | e 6 | f 7 | g 8 | h 9 | i 10 | j 11 | k 12 | n 13 | o 14 | p 15 | -------------------------------------------------------------------------------- /test-files/test-src-2.txt: -------------------------------------------------------------------------------- 1 | a 2 | c 3 | b 4 | d 5 | e 6 | a 7 | c 8 | b 9 | e 10 | d 11 | -------------------------------------------------------------------------------- /test-files/test-src-3.txt: -------------------------------------------------------------------------------- 1 | a 2 | a 3 | a 4 | b 5 | b 6 | b 7 | b 8 | b 9 | b 10 | b 11 | c 12 | c 13 | c 14 | f 15 | f 16 | f 17 | f 18 | f 19 | f 20 | f 21 | f 22 | f 23 | -------------------------------------------------------------------------------- /vdelta.lisp: -------------------------------------------------------------------------------- 1 | ;;;; vdelta.lisp - computing copy/insert deltas with the vdelta algorithm 2 | 3 | (in-package :diff) 4 | 5 | (defconstant +buffer-size+ 1024) 6 | 7 | (deftype op-kind () '(member :copy-target :copy-source :new-data)) 8 | (deftype bytebuf () '(simple-array (unsigned-byte 8) (*))) 9 | 10 | ;;; a convenience class for writing byte buffers 11 | (defclass byte-buffer-stream (trivial-gray-streams:fundamental-binary-output-stream) 12 | ((buffer :accessor buffer :initarg :buffer :type bytebuf) 13 | (index :accessor index :initform 0 :type fixnum))) 14 | 15 | (defmethod trivial-gray-streams:stream-write-byte ((stream byte-buffer-stream) byte) 16 | (with-slots (buffer index) stream 17 | (when (>= index (length buffer)) 18 | (error "Cannot write any more data to stream!")) 19 | (setf (aref buffer index) byte) 20 | (incf index) 21 | byte)) 22 | 23 | (defmacro with-binary-file ((stream-name pathname direction) &body body) 24 | "A wrapper for WITH-OPEN-FILE that opens the stream with an element-type 25 | of (UNSIGNED-BYTE 8). DIRECTION is passed as the argument to :DIRECTION." 26 | `(with-open-file (,stream-name ,pathname 27 | :direction ,direction 28 | :element-type '(unsigned-byte 8)) 29 | ,@body)) 30 | 31 | (defstruct svndiff-window 32 | (source-offset 0 :type (unsigned-byte 32)) 33 | (source-len 0 :type (unsigned-byte 32)) 34 | (target-len 0 :type (unsigned-byte 32)) 35 | (ops (error "required argument") :type bytebuf) 36 | (new-data (error "required argument") :type bytebuf)) 37 | 38 | (defstruct (svndiff-op 39 | (:constructor make-svndiff-op (kind offset bytes))) 40 | (kind :new-data :type op-kind) 41 | (offset 0 :type (unsigned-byte 32)) 42 | (bytes 0 :type (unsigned-byte 32))) 43 | 44 | (defun match-length (seq1 start1 end1 seq2 start2 end2) 45 | (declare (type bytebuf seq1 seq2)) 46 | (declare (type fixnum start1 end1 start2 end2)) 47 | (do ((index1 start1 (1+ index1)) 48 | (index2 start2 (1+ index2))) 49 | ((or (= index1 end1) (= index2 end2)) 50 | (- index1 start1)) 51 | (declare (type fixnum index1 index2)) 52 | (when (/= (aref seq1 index1) (aref seq2 index2)) 53 | (return (- index1 start1))))) 54 | 55 | (defun string-to-byteseq (string) 56 | "Convert STRING to an array of (UNSIGNED-BYTE 8). Assumes that the 57 | Common Lisp implementation is underpinned with 8-bit characters." 58 | (let ((buf (make-array (length string) 59 | :element-type '(unsigned-byte 8)))) 60 | (map-into buf #'char-code string))) 61 | 62 | 63 | ;;; vdelta calculation 64 | 65 | (declaim (inline combine-bytes)) 66 | (defun combine-bytes (byteseq start) 67 | (declare (type bytebuf byteseq)) 68 | (let ((int 0)) 69 | (dotimes (i 4) 70 | (setf int (logior int 71 | (ash (aref byteseq (+ start i)) 72 | (* 8 i))))) 73 | int)) 74 | 75 | (defclass vdelta-context () 76 | ((buffer :initform (make-array (* 2 +buffer-size+) :element-type '(unsigned-byte 8)) 77 | :reader buffer) 78 | (source-start :initform 0 79 | :accessor source-start) 80 | (source-length :initform 0 81 | :accessor source-length) 82 | (target-start :initform 0 83 | :accessor target-start) 84 | (target-length :initform 0 85 | :accessor target-length) 86 | (table :initform (make-hash-table :test #'eql) 87 | :accessor table))) 88 | 89 | (defun key-to-chars (key) 90 | (with-output-to-string (stream) 91 | (flet ((char-at (pos) 92 | (code-char (ldb (byte 8 pos) key)))) 93 | (format stream "~A~A~A~A" 94 | (char-at 0) (char-at 8) (char-at 16) (char-at 24))))) 95 | 96 | (defun print-match-table (table) 97 | (with-hash-table-iterator (fn table) 98 | (loop 99 | (multiple-value-bind (more-p key value) (fn) 100 | (unless more-p (return)) 101 | (format t "~A: ~A~%" (key-to-chars key) value))))) 102 | 103 | (defun initialize-match-table (context) 104 | (let ((buffer (buffer context)) 105 | (start (source-start context)) 106 | (length (source-length context)) 107 | (table (table context))) 108 | (declare (type bytebuf buffer)) 109 | (do ((i start (1+ i))) 110 | ((>= i (- length 3)) (print-match-table table)) 111 | (let ((key (combine-bytes buffer i))) 112 | (multiple-value-bind (index presentp) (gethash key table) 113 | (if (not presentp) 114 | (setf (gethash key table) i) 115 | (let ((match-length (match-length buffer i length 116 | buffer index length))) 117 | (dotimes (j 3) 118 | (let ((insert-index (+ start (- match-length 1 j)))) 119 | (setf (gethash (combine-bytes buffer insert-index) 120 | table) 121 | insert-index))) 122 | (assert (> match-length 0)) 123 | (incf i match-length)))))))) 124 | 125 | (defun calculate-svndiff-ops (context) 126 | (let ((buffer (buffer context)) 127 | (source-start (source-start context)) 128 | (source-length (source-length context)) 129 | (target-start (target-start context)) 130 | (target-length (target-length context)) 131 | (target-end (+ (target-start context) (target-length context))) 132 | (table (table context)) 133 | (add-start -1) 134 | (add-length 0) 135 | (instructions nil)) 136 | (declare (type bytebuf buffer) 137 | (type fixnum source-length target-length add-start add-length)) 138 | (flet ((push-new-data (index) 139 | (when (= -1 add-start) 140 | (setf add-start index)) 141 | (incf add-length)) 142 | (add-data-insn () 143 | (when (not (= -1 add-start)) 144 | (push (make-svndiff-op :new-data add-start add-length) 145 | instructions) 146 | (setf add-start -1 add-length 0))) 147 | (add-copy-insn (index len) 148 | (let* ((targetp (>= index source-length)) 149 | (op-kind (if targetp :copy-target :copy-source)) 150 | (offset index)) 151 | (push (make-svndiff-op op-kind offset len) 152 | instructions)))) 153 | (do ((i target-start (1+ i))) 154 | ((>= i target-end)) 155 | ;; pick up any stragglers at the end of the string 156 | (cond 157 | ((<= (- target-end 3) i) (push-new-data i)) 158 | (t 159 | (let ((key (combine-bytes buffer i))) 160 | (format t "Key: ~A~%" (key-to-chars key)) 161 | (multiple-value-bind (index presentp) (gethash key table) 162 | (if (not presentp) 163 | (progn 164 | (push-new-data i) 165 | ;; record a new position index 166 | (setf (gethash key table) i)) 167 | (flet ((find-diff-loc () 168 | (if (>= index source-length) 169 | ;; a index in version2 170 | (values (match-length buffer i target-end 171 | buffer index target-end) 172 | t) 173 | (values (match-length buffer index source-length 174 | buffer i target-end) 175 | nil)))) 176 | (format t "i: ~A, index: ~A~%" i index) 177 | ;; clear any pending additions 178 | (add-data-insn) 179 | (multiple-value-bind (match-length targetp) (find-diff-loc) 180 | (format t "match-length: ~A, ~A~%" match-length targetp) 181 | (assert (> match-length 0)) 182 | (when targetp 183 | (dotimes (j 3) 184 | (let ((insert-index (+ target-start 185 | (- match-length j)))) 186 | (assert (>= insert-index target-start)) 187 | (setf (gethash (combine-bytes buffer insert-index) 188 | table) 189 | insert-index)))) 190 | (add-copy-insn index match-length) 191 | (incf i (1- match-length)))))))))) 192 | (add-data-insn) 193 | (nreverse instructions)))) --------------------------------------------------------------------------------