├── .gitignore
├── README.md
├── UNLICENSE
├── app.py
├── images
    ├── 1.png
    ├── 2.png
    ├── 3.png
    ├── 4.png
    ├── 5.png
    ├── 6.png
    ├── 7.png
    └── title.png
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
secret
venv


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# █ Blackout █

For [NaNoGenMo 2016](https://github.com/NaNoGenMo/2016).

_Blackout_ generates pages of text from book or newspaper scans in the style of [Newspaper Blackout Poetry](http://newspaperblackout.com/), popularized by [Austin Kleon](https://twitter.com/austinkleon) (related to work like [A Humument](http://tomphillipshumument.tumblr.com/) by Tom Phillips).

_Blackout_ does the following:

1. Take, as input, an image of text, from a newspaper or book.
2. Run [OCR](https://github.com/jflesch/pyocr) against the image, identifying the words and their bounding boxes.
3. Feed the extracted text into a [natural language parser](https://spacy.io/), categorizing each part of speech.
4. Given one of many randomly selected [Tracery](https://github.com/aparrish/pytracery) grammars, select words from the current page that match the parts of speech of that grammar.
5. Draw around those words and "scribble" out all other text on the page image.
6. Output the final page as a new image.

Pen width, line length, line direction, number of strokes, and stroke opacity are all randomly fuzzed. The pen color is always black, except in rare cases, when it is blood red.
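
As a rough illustration of the word-selection step described above, here is a minimal sketch that assumes the words on the page have already been tagged with parts of speech. `select_words`, `TaggedWord`, and the sample page are hypothetical names made up for this example; the real `app.py` also skips candidates at random and checks singular/plural agreement, which this sketch leaves out.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for an OCR word plus its part-of-speech tag.
TaggedWord = Tuple[str, str]  # (text, part-of-speech tag)


def select_words(words: List[TaggedWord], pattern: List[str]) -> Optional[List[int]]:
    """Return indices of words matching `pattern` in reading order, or None."""
    picks: List[int] = []
    index = 0
    for pos in pattern:
        # Scan forward from the previous pick for the next word with this tag.
        while index < len(words) and words[index][1] != pos:
            index += 1
        if index == len(words):
            return None  # ran out of page before completing the pattern
        picks.append(index)
        index += 1
    return picks


page = [("the", "DET"), ("rights", "NOUN"), ("of", "ADP"), ("woman", "NOUN"),
        ("demand", "VERB"), ("a", "DET"), ("vindication", "NOUN")]
pattern = ["DET", "NOUN", "VERB", "NOUN"]

picks = select_words(page, pattern)
kept = [page[i][0] for i in picks] if picks is not None else []
print(" ".join(kept))  # the rights demand vindication
```

Every word whose index is not in `picks` would be scribbled out. Always taking the first match makes the sketch deterministic, which is easier to test but less varied than picking at random.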

_Blackout_ has produced two creative works: "The Days Left Forebodings and Water" (this one) and, in a version modified by @samplereality, ["A Great Intimate Unmistakeable More"](https://github.com/NaNoGenMo/2021/issues/87).

## "The Days Left Forebodings and Water"

The source material is [A Vindication of the Rights of Woman](https://en.wikipedia.org/wiki/A_Vindication_of_the_Rights_of_Woman) by Mary Wollstonecraft (1792).

Read [The Days Left Forebodings and Water](https://lizadaly.com/projects/blackout/lizadaly-blackout-nanogenmo-2016.pdf). It is 45 pages long and consists of entries that were generated randomly, then hand-picked and ordered on November 9, 2016.

(The full NaNoGenMo entry of ~50,000 words was a [9.3GB PDF] of nearly 10,000 pages. If for some reason you want it, just ask.)

--------------------------------------------------------------------------------
/UNLICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to

--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import os
import random
from statistics import mean
import string
import uuid
import sys

import tracery
import spacy

import pyocr
import pyocr.builders
from PIL import Image, ImageDraw, ImageFilter

BOUND_PADDING = 50
BOX_PADDING = 50 # 10
WOBBLE_MAX = 2

# NOTE: 'en' is the model shortcut from older spaCy releases; newer
# releases (3.x) expect a full model package name such as 'en_core_web_sm'.
nlp = spacy.load('en')

def draw_vertical_lines(draw, boxes, doc_bounding_box, line_width):
    """Draw vertical strokes across the page, leaving gaps around the selected word boxes"""
    line_weight_factor = random.triangular(0.005, 1.2)
    current_x = doc_bounding_box[0] - line_width / 2
    color = get_color()

    while current_x < doc_bounding_box[2]:
        start_x = current_x
        start_y = doc_bounding_box[1] - line_width / 2
        end_x = start_x
        end_y = doc_bounding_box[3] - line_width / 2

        bx0 = start_x
        bx1 = start_x + line_width

        # Collect the selected boxes that this column of ink would cross
        select_boxes = []
        for box in boxes:
            wx0 = box.position[0][0] - BOUND_PADDING
            wx1 = box.position[1][0] + BOUND_PADDING
            if bx0 < wx0 and wx1 < bx1 or \
               wx0 < bx1 and bx1 < wx1 or \
               wx0 < bx0 and bx0 < wx1:
                select_boxes.append(box)

        if select_boxes:
            y0 = start_y
            y1 = end_y
            for box in select_boxes:
                y1 = box.position[0][1] - BOX_PADDING
                draw_line(draw, [start_x, y0, end_x, y1],
                          line_width=line_width, color=color,
                          line_weight_factor=line_weight_factor, dir='v')
                y0 = box.position[1][1] + BOX_PADDING
            draw_line(draw, [start_x, y0, end_x, end_y], line_width=line_width, color=color,
                      line_weight_factor=line_weight_factor, dir='v')
        else:
            draw_line(draw, [start_x, start_y, end_x, end_y], line_width=line_width, color=color,
                      line_weight_factor=line_weight_factor, dir='v')

        current_x = start_x + line_width

def get_color():
    if random.randint(0, 100) == 0:
        color = (179, 27, 27)
    else:
        color = (int(random.triangular(0, 10, 1)),
                 int(random.triangular(0, 10, 1)),
                 int(random.triangular(0, 10, 1)),
                 )
    return color

def draw_horizontal_lines(draw, boxes, doc_bounding_box, line_width):
    """Draw black horizontal lines across the page _except_ for that word"""
    line_weight_factor = random.triangular(0.005, 1.2)
    color = get_color()
    start_x = doc_bounding_box[0]
    current_y = doc_bounding_box[1]
    end_x = doc_bounding_box[2]
    end_y = doc_bounding_box[3] - line_width / 2

    while current_y < doc_bounding_box[3]:
        by0 = current_y
        by1 = current_y + line_width

        select_boxes = []
        for box in boxes:
            wy0 = box.position[0][1]
            wy1 = box.position[1][1]
            if by0 <= wy0 and wy1 <= by1 or \
               wy0 <= by1 and by1 <= wy1 or \
               wy0 <= by0 and by0 <= wy1:
                select_boxes.append(box)

        if select_boxes:
            x0 = start_x
            x1 = end_x
            for box in select_boxes:
                x1 = box.position[0][0] - BOX_PADDING
                draw_line(draw, [x0, current_y, x1, current_y],
                          line_width=line_width,
                          line_weight_factor=line_weight_factor, color=color,
                          dir="h")
                x0 = box.position[1][0] + BOX_PADDING
            draw_line(draw, [x0 + BOX_PADDING, current_y, end_x, current_y],
                      line_width=line_width, line_weight_factor=line_weight_factor, dir="h", color=color)
        else:
            draw_line(draw,
                      [start_x, current_y, end_x, current_y],
                      line_width=line_width, color=color,
                      line_weight_factor=line_weight_factor,
                      dir="h")
        current_y = by1



def draw_line(draw, pos, line_width, dir="h", color=(0, 0, 0), line_weight_factor=1):
    # Draw a fuzzy line of randomish width repeat times
    repeat = random.randint(10, 20)
    width = int(line_width) * line_weight_factor
    default_padding = line_width / 3

    margin_extent = 20 # random.randint(1, 20)
    # Slide the center of the line down width/2 based on dir
    if dir == 'h':
        pos[1] += width / 2
        pos[3] += width / 2
        # Introduce some randomness into the margins
        pos[0] -= random.triangular(width / margin_extent, width * margin_extent)
        pos[2] += random.triangular(width / margin_extent, width * margin_extent)
    else:
        pos[0] -= width / 2
        pos[2] -= width / 2
        # Introduce some randomness into the margins
        pos[1] -= random.triangular(width / margin_extent, width * margin_extent)
        pos[3] += random.triangular(width / margin_extent, width * margin_extent)

    for i in range(0, repeat):

        width = int(random.uniform(line_width - default_padding, line_width))
        padding = default_padding * 4

        # Jitter both endpoints a little more on every pass
        pos[0] = random.triangular(pos[0] - padding, pos[0] + padding)
        pos[1] = random.triangular(pos[1] - padding, pos[1] + padding)
        pos[2] = random.triangular(pos[2] - padding, pos[2] + padding)
        pos[3] = random.triangular(pos[3] - padding, pos[3] + padding)

        opacity = 240 + i
        width_factor = random.triangular(1, 10, 1)
        draw.line(pos, width=int(width / width_factor), fill=(*color, opacity))

def get_boxes(imagefile, tool):
    num_words = 5
    boxes = tool.image_to_string(
        Image.open(imagefile), lang="eng",
        builder=pyocr.builders.WordBoxBuilder()
    )
    return boxes

def image_filter(img):
    for i in range(10):
        img = img.filter(ImageFilter.SMOOTH_MORE)
    return img


def parse_words(boxes):
    words = []
    for box in boxes:
        word = box.content.strip()
        # Strip punctuation so the text can be compared against the parser's tokens
        word = word.translate(str.maketrans({a:None for a in string.punctuation}))
        words.append({'text': word, 'box': box})
    sent = ' '.join([w['box'].content for w in words])
    doc = nlp(sent)
    for token in doc:
        for word in words:
            text = word['text']
            if token.text == text:
                word['token'] = token
                word['pos'] = token.pos_
    return words

def find_boxes_for_grammar(boxes):
    words = parse_words(boxes)
    grammars = [
        ['DET', 'NOUN', 'VERB', 'NOUN'],
        ['ADJ', 'NOUN', 'VERB', 'NOUN'],
        ['ADJ', 'NOUN', 'VERB', 'ADV'],
        ['DET', 'NOUN', 'VERB', 'NOUN', 'CONJ', 'NOUN'],
        ['VERB', 'DET', 'NOUN'],
        ['ADV', 'VERB', 'NOUN', 'CONJ', 'NOUN']
    ]
    grammar = random.choice(grammars)
    picks = []
    word_index = 0
    prev_word = None
    prev_pos = None

    for pos in grammar:
        while True:
            # Raises IndexError when the page runs out of words; draw() retries
            word = words[word_index]
            if len(picks) > 0:
                prev_word = picks[-1]
                prev_pos = prev_word['pos']
            pick_this = True
            if prev_pos == 'DET':
                if prev_word['text'] == 'a' or prev_word['text'] == 'an':
                    # Pick this if it's singular
                    pick_this = not is_plural(word)
                if prev_word['text'] == 'a':
                    # Pick this if it doesn't start with a vowel
                    pick_this = not starts_with_vowel(word) and pick_this
                if prev_word['text'] == 'an':
                    pick_this = starts_with_vowel(word) and pick_this
                if prev_word['text'] == 'this':
                    pick_this = not is_plural(word) and pick_this
                if prev_word['text'] == 'these':
                    pick_this = is_plural(word) and pick_this
            if prev_pos == 'NOUN':
                # If the previous noun was plural, the verb must be plural
                if is_plural(prev_word):
                    pick_this = is_plural_verb(word) and pick_this
                if not is_plural(prev_word):
                    pick_this = not is_plural_verb(word) and pick_this
            if prev_pos == 'VERB':
                # If the verb was plural, the noun must be
                if is_plural_verb(prev_word):
                    pick_this = is_plural(word) and pick_this
                if not is_plural_verb(prev_word):
                    pick_this = not is_plural(word) and pick_this
            if pos == 'VERB':
                # Don't pick auxiliary verbs as they won't have a helper
                if 'token' in word:
                    pick_this = word['token'].dep_ != 'aux' and pick_this

            # Accept a matching word only about 1 time in 31
            if 'pos' in word and word['pos'] == pos and pick_this and random.randint(0, 30) == 0:
                #print("Picking ", word['text'], " ", word['token'].dep_)
                picks.append(word)
                prev_pos = pos
                word_index += 1
                break

            word_index += 1
    return [p['box'] for p in picks]

def is_plural(word):
    if word['text'] == 'men' or word['text'] == 'women': # Special case this since one comes up a lot
        return True
    return word['text'][-1] == 's'

def is_plural_verb(word):
    if word['text'] == 'have':
        return True
    return word['text'][-1] != 's'

def is_present(word):
    return word['text'][-1] == 's'

def starts_with_vowel(word):
    vowels = set(['a', 'e', 'i', 'o', 'u'])
    return word['text'][0] in vowels

def setup(imagefile):
    tool = pyocr.get_available_tools()[0]
    boxes = get_boxes(imagefile, tool)
    return boxes

def draw(imagefile, boxes):

    while True:
        try:
            select_boxes = find_boxes_for_grammar(boxes)
            break
        except IndexError:
            #print("Retrying...")
            pass

    # Get the line height by taking the average of all the box heights
    box_heights = []
    margin_lefts = []
    margin_rights = []
    margin_top = boxes[0].position[0][1]
    margin_bottom = boxes[-1].position[1][1]

    for box in boxes:
        margin_lefts.append(box.position[0][0])
        margin_rights.append(box.position[1][0])
        box_heights.append(box.position[1][1] - box.position[0][1])

    margin_left = min(margin_lefts)
    margin_right = max(margin_rights)

    line_width = mean(box_heights)
    line_spaces = [0]
    last_y_pos = boxes[0].position[1][1]

    src = Image.open(imagefile)
    src = src.convert('RGBA')
    img = Image.new('RGBA', (src.size[0], src.size[1]))
    draw = ImageDraw.Draw(img)


    doc_bounding_box = (margin_left, margin_top, margin_right, margin_bottom)

    line_choices = random.choice(('v', 'h', 'a'))
    # NOTE: this override forces vertical strokes, so the random choice above is ignored
    line_choices = 'v'
    if line_choices == 'v':
        draw_vertical_lines(draw, select_boxes, doc_bounding_box=doc_bounding_box, line_width=line_width)
    elif line_choices == 'h':
        draw_horizontal_lines(draw, select_boxes,
                              doc_bounding_box=doc_bounding_box,
                              line_width=line_width)
    else:
        draw_vertical_lines(draw, select_boxes, doc_bounding_box=doc_bounding_box, line_width=line_width)
        draw_horizontal_lines(draw, select_boxes,
                              doc_bounding_box=doc_bounding_box,
                              line_width=line_width)



    img = image_filter(img)
    out = Image.alpha_composite(src, img)

    repeat = 10
    f = 10

    for box in select_boxes:
        pad = BOX_PADDING
        d = ImageDraw.Draw(out)
        p0 = [box.position[0][0] - pad, box.position[0][1] - pad]
        p1 = [box.position[1][0] + pad, box.position[0][1] - pad]
        p2 = [box.position[1][0] + pad, box.position[1][1] + pad]
        p3 = [box.position[0][0] - pad, box.position[1][1] + pad]
        # Restore the original pixels of the kept word over the scribbles
        b = (*p0, *p2)
        crop = src.crop(box=b)
        out.paste(crop, box=b)
        w = 10 + int(random.uniform(-5, 5))
        # Trace a wobbly hand-drawn outline around the word
        for i in range(0, repeat):
            fuzz = random.uniform(-f, f)
            p0 = [p + fuzz for p in p0]
            fuzz = random.uniform(-f, f)
            p1 = [p + fuzz for p in p1]
            fuzz = random.uniform(-f, f)
            p2 = [p + fuzz for p in p2]
            fuzz = random.uniform(-f, f)
            p3 = [p + fuzz for p in p3]
            fuzz = random.uniform(-f, f)
            d.line(p0 + p1, width=w, fill="black")
            d.line(p1 + p2, width=w, fill="black")
            d.line(p2 + p3, width=w, fill="black")
            d.line(p3 + p0, width=w, fill="black")


    final = Image.new('RGBA', (src.size[0], src.size[1]))
    canvas = ImageDraw.Draw(final)
    canvas.rectangle([0, 0, final.size[0], final.size[1]], fill='white')
    final = Image.alpha_composite(final, out)
    outfile = str(uuid.uuid4())[0:5] + '.png' # os.path.basename(imagefile)

    # Make sure the output directory exists before saving
    os.makedirs("build", exist_ok=True)
    final.save("build/" + outfile)

if __name__ == '__main__':
    path = sys.argv[1]
    pages = []
    for f in os.listdir(path):
        pages.append(f)
    num_generations_per_page = 100
    # Runs until interrupted: pick a random page, then render many variations of it
    while True:
        f = random.choice(pages)
        imagefile = os.path.join(path, f)
        print("Processing " + imagefile)
        boxes = setup(imagefile)
        for i in range(0, num_generations_per_page):
            draw(imagefile, boxes)

--------------------------------------------------------------------------------
/images/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/1.png

--------------------------------------------------------------------------------
/images/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/2.png

--------------------------------------------------------------------------------
/images/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/3.png

--------------------------------------------------------------------------------
/images/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/4.png

--------------------------------------------------------------------------------
/images/5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/5.png

--------------------------------------------------------------------------------
/images/6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/6.png

--------------------------------------------------------------------------------
/images/7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/7.png

--------------------------------------------------------------------------------
/images/title.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/title.png

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
pyocr
pillow
spacy
git+https://github.com/aparrish/pytracery.git#egg=tracery

--------------------------------------------------------------------------------