├── .gitignore
├── README.md
├── UNLICENSE
├── app.py
├── images
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   ├── 4.png
│   ├── 5.png
│   ├── 6.png
│   ├── 7.png
│   └── title.png
└── requirements.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | secret
2 | venv
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # █ Blackout █
2 |
3 | For [NaNoGenMo 2016](https://github.com/NaNoGenMo/2016).
4 |
5 | _Blackout_ generates pages of text from book or newspaper scans in the style of [Newspaper Blackout Poetry](http://newspaperblackout.com/), popularized by [Austin Kleon](https://twitter.com/austinkleon) (related to work like [A Humument](http://tomphillipshumument.tumblr.com/) by Tom Phillips).
6 |
7 | _Blackout_ does the following:
8 |
9 | 1. Take, as input, an image of text, from a newspaper or book.
10 | 2. Run [OCR](https://github.com/jflesch/pyocr) against the image, identifying the words and their bounding boxes.
11 | 3. Feed the extracted text into a [natural language parser](https://spacy.io/), categorizing each part of speech.
12 | 4. Given one of many randomly selected [Tracery](https://github.com/aparrish/pytracery) grammars, select words from the current page that match the parts of speech of that grammar.
13 | 5. Draw around those words and "scribble" out all other text on the page image.
14 | 6. Output the final page as a new image.
15 |
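The part-of-speech matching step above can be sketched without the OCR and parsing dependencies. This is a minimal illustration, not the code in `app.py`: it assumes the words already carry part-of-speech tags, and the names `pick_words` and `TAGGED_PAGE` are hypothetical. It takes the first word that matches each slot of the grammar, whereas `app.py` accepts matching words only at random and also applies agreement checks.

```python
from typing import List, Optional, Tuple

# Hypothetical input: (word, part-of-speech) pairs in reading order.
TAGGED_PAGE: List[Tuple[str, str]] = [
    ("the", "DET"), ("rights", "NOUN"), ("of", "ADP"),
    ("woman", "NOUN"), ("demand", "VERB"), ("a", "DET"),
    ("fair", "ADJ"), ("hearing", "NOUN"),
]


def pick_words(tagged: List[Tuple[str, str]],
               grammar: List[str]) -> Optional[List[int]]:
    """Return the indices of words matching `grammar` in order, or None."""
    picks = []
    start = 0
    for pos in grammar:
        # Scan forward from the last pick so reading order is preserved.
        for i in range(start, len(tagged)):
            if tagged[i][1] == pos:
                picks.append(i)
                start = i + 1
                break
        else:
            # Ran off the end of the page without filling this slot.
            return None
    return picks


picks = pick_words(TAGGED_PAGE, ["DET", "NOUN", "VERB", "NOUN"])
kept = [TAGGED_PAGE[i][0] for i in picks]
print(" ".join(kept))
```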
16 | Pen width, line length, line direction, number of strokes, and stroke opacity are all randomly fuzzed. The pen color is almost always black; in rare cases it is blood red.
17 |
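The fuzzing described above can be illustrated with the standard library alone. This is a sketch modelled on `get_color` and `draw_line` in `app.py`, not a replacement for them: the `stroke_params` helper and its dictionary keys are hypothetical, and a `random.Random` instance is passed in (an assumption made here) so that results can be reproduced from a seed.

```python
import random

BLOOD_RED = (179, 27, 27)


def pick_color(rng: random.Random) -> tuple:
    """Near-black most of the time; blood red roughly 1 time in 101."""
    if rng.randint(0, 100) == 0:
        return BLOOD_RED
    # triangular(low, high, mode): values cluster near 1, so close to black.
    return tuple(int(rng.triangular(0, 10, 1)) for _ in range(3))


def stroke_params(rng: random.Random, line_width: float) -> dict:
    """Hypothetical helper: one set of randomized pen settings."""
    return {
        "color": pick_color(rng),
        # Number of overlaid strokes per line.
        "strokes": rng.randint(10, 20),
        # Stroke width between two thirds of the line height and the full height.
        "width": int(rng.uniform(line_width * 2 / 3, line_width)),
        # 'v' vertical, 'h' horizontal, 'a' both.
        "direction": rng.choice(("v", "h", "a")),
    }


rng = random.Random(2016)
params = stroke_params(rng, line_width=30)
print(params)
```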
18 | _Blackout_ has been used to produce two creative works: "The Days Left Forebodings and Water" (this one) and ["A Great Intimate Unmistakeable More"](https://github.com/NaNoGenMo/2021/issues/87), for which @samplereality modified _Blackout_.
19 |
20 | ## "The Days Left Forebodings and Water"
21 |
22 |
23 |
24 | The source material is [A Vindication of the Rights of Woman](https://en.wikipedia.org/wiki/A_Vindication_of_the_Rights_of_Woman) by Mary Wollstonecraft (1792).
25 |
26 | Read [The Days Left Forebodings and Water](https://lizadaly.com/projects/blackout/lizadaly-blackout-nanogenmo-2016.pdf). It is 45 pages long and consists of entries that were generated randomly, then hand-picked and ordered on November 9, 2016.
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 |
41 |
42 | (The full NaNoGenMo entry of ~50,000 words was a [9.3GB PDF] of nearly 10,000 pages. If for some reason you want it, just ask.)
43 |
--------------------------------------------------------------------------------
/UNLICENSE:
--------------------------------------------------------------------------------
1 | This is free and unencumbered software released into the public domain.
2 |
3 | Anyone is free to copy, modify, publish, use, compile, sell, or
4 | distribute this software, either in source code form or as a compiled
5 | binary, for any purpose, commercial or non-commercial, and by any
6 | means.
7 |
8 | In jurisdictions that recognize copyright laws, the author or authors
9 | of this software dedicate any and all copyright interest in the
10 | software to the public domain. We make this dedication for the benefit
11 | of the public at large and to the detriment of our heirs and
12 | successors. We intend this dedication to be an overt act of
13 | relinquishment in perpetuity of all present and future rights to this
14 | software under copyright law.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22 | OTHER DEALINGS IN THE SOFTWARE.
23 |
24 | For more information, please refer to
25 |
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | import os
2 | import random
3 | from statistics import mean
4 | import string
5 | import uuid
6 | import sys
7 |
8 | import tracery
9 | import spacy
10 |
11 | import pyocr
12 | import pyocr.builders
13 | from PIL import Image, ImageDraw, ImageFilter
14 |
15 | BOUND_PADDING = 50
16 | BOX_PADDING = 50 # 10
17 | WOBBLE_MAX = 2
18 |
19 | nlp = spacy.load('en')
20 |
21 | def draw_vertical_lines(draw, boxes, doc_bounding_box, line_width):
22 |     line_weight_factor = random.triangular(0.005, 1.2)
23 |     current_x = doc_bounding_box[0] - line_width / 2
24 |     color = get_color()
25 |
26 |     while current_x < doc_bounding_box[2]:
27 |         start_x = current_x
28 |         start_y = doc_bounding_box[1] - line_width / 2
29 |         end_x = start_x
30 |         end_y = doc_bounding_box[3] - line_width / 2
31 |
32 |         bx0 = start_x
33 |         bx1 = start_x + line_width
34 |
35 |         select_boxes = []
36 |         for box in boxes:
37 |             wx0 = box.position[0][0] - BOUND_PADDING
38 |             wx1 = box.position[1][0] + BOUND_PADDING
39 |             if bx0 < wx0 and wx1 < bx1 or \
40 |                wx0 < bx1 and bx1 < wx1 or \
41 |                wx0 < bx0 and bx0 < wx1:
42 |                 select_boxes.append(box)
43 |
44 |         if select_boxes:
45 |             y0 = start_y
46 |             y1 = end_y
47 |             for box in select_boxes:
48 |                 y1 = box.position[0][1] - BOX_PADDING
49 |                 draw_line(draw, [start_x, y0, end_x, y1], line_width=line_width, color=color,
50 |                           line_weight_factor=line_weight_factor, dir='v')
51 |                 y0 = box.position[1][1] + BOX_PADDING
52 |             draw_line(draw, [start_x, y0, end_x, end_y], line_width=line_width, color=color,
53 |                       line_weight_factor=line_weight_factor, dir='v')
54 |         else:
55 |             draw_line(draw, [start_x, start_y, end_x, end_y], line_width=line_width, color=color,
56 |                       line_weight_factor=line_weight_factor, dir='v')
57 |
58 |         current_x = start_x + line_width
59 |
60 | def get_color():
61 |     if random.randint(0, 100) == 0:
62 |         color = (179, 27, 27)
63 |     else:
64 |         color = (int(random.triangular(0, 10, 1)),
65 |                  int(random.triangular(0, 10, 1)),
66 |                  int(random.triangular(0, 10, 1)),
67 |                  )
68 |     return color
69 |
70 | def draw_horizontal_lines(draw, boxes, doc_bounding_box, line_width):
71 |     """Draw black horizontal lines across the page _except_ over the selected words"""
72 |     line_weight_factor = random.triangular(0.005, 1.2)
73 |     color = get_color()
74 |     start_x = doc_bounding_box[0]
75 |     current_y = doc_bounding_box[1]
76 |     end_x = doc_bounding_box[2]
77 |     end_y = doc_bounding_box[3] - line_width / 2
78 |
79 |     while current_y < doc_bounding_box[3]:
80 |         by0 = current_y
81 |         by1 = current_y + line_width
82 |
83 |         select_boxes = []
84 |         for box in boxes:
85 |             wy0 = box.position[0][1]
86 |             wy1 = box.position[1][1]
87 |             if by0 <= wy0 and wy1 <= by1 or \
88 |                wy0 <= by1 and by1 <= wy1 or \
89 |                wy0 <= by0 and by0 <= wy1:
90 |                 select_boxes.append(box)
91 |
92 |         if select_boxes:
93 |             x0 = start_x
94 |             x1 = end_x
95 |             for box in select_boxes:
96 |                 x1 = box.position[0][0] - BOX_PADDING
97 |                 draw_line(draw, [x0, current_y, x1, current_y],
98 |                           line_width=line_width,
99 |                           line_weight_factor=line_weight_factor, color=color,
100 |                           dir="h")
101 |                 x0 = box.position[1][0] + BOX_PADDING
102 |             draw_line(draw, [x0 + BOX_PADDING, current_y, end_x, current_y],
103 |                       line_width=line_width, line_weight_factor=line_weight_factor, dir="h", color=color)
104 |         else:
105 |             draw_line(draw, [start_x, current_y, end_x, current_y],
106 |                       line_width=line_width, color=color,
107 |                       line_weight_factor=line_weight_factor,
108 |                       dir="h")
109 |         current_y = by1
110 |
111 |
112 |
113 | def draw_line(draw, pos, line_width, dir="h", color=(0, 0, 0), line_weight_factor=1):
114 |     # Draw a fuzzy line by overlaying `repeat` jittered strokes of random width
115 |     repeat = random.randint(10, 20)
116 |     width = int(line_width) * line_weight_factor
117 |     default_padding = line_width / 3
118 |
119 |     margin_extent = 20 # random.randint(1, 20)
120 |     # Slide the center of the line down width/2 based on dir
121 |     if dir == 'h':
122 |         pos[1] += width / 2
123 |         pos[3] += width / 2
124 |         # Introduce some randomness into the margins
125 |         pos[0] -= random.triangular(width / margin_extent, width * margin_extent)
126 |         pos[2] += random.triangular(width / margin_extent, width * margin_extent)
127 |     else:
128 |         pos[0] -= width / 2
129 |         pos[2] -= width / 2
130 |         # Introduce some randomness into the margins
131 |         pos[1] -= random.triangular(width / margin_extent, width * margin_extent)
132 |         pos[3] += random.triangular(width / margin_extent, width * margin_extent)
133 |
134 |     for i in range(0, repeat):
135 |
136 |         width = int(random.uniform(line_width - default_padding, line_width))
137 |         padding = default_padding * 4
138 |
139 |         pos[0] = random.triangular(pos[0] - padding, pos[0] + padding)
140 |         pos[1] = random.triangular(pos[1] - padding, pos[1] + padding)
141 |         pos[2] = random.triangular(pos[2] - padding, pos[2] + padding)
142 |         pos[3] = random.triangular(pos[3] - padding, pos[3] + padding)
143 |
144 |         opacity = 240 + i
145 |         width_factor = random.triangular(1, 10, 1)
146 |         draw.line(pos, width=int(width / width_factor), fill=(*color, opacity))
147 |
148 | def get_boxes(imagefile, tool):
149 |     num_words = 5
150 |     boxes = tool.image_to_string(
151 |         Image.open(imagefile), lang="eng",
152 |         builder=pyocr.builders.WordBoxBuilder()
153 |     )
154 |     return boxes
155 |
156 | def image_filter(img):
157 |     for i in range(10):
158 |         img = img.filter(ImageFilter.SMOOTH_MORE)
159 |     return img
160 |
161 |
162 | def parse_words(boxes):
163 |     words = []
164 |     for box in boxes:
165 |         word = box.content.strip()
166 |         word = word.translate(str.maketrans({a:None for a in string.punctuation}))
167 |         words.append({'text': word, 'box': box})
168 |     sent = ' '.join([w['box'].content for w in words])
169 |     doc = nlp(sent)
170 |     for token in doc:
171 |         for word in words:
172 |             text = word['text']
173 |             if token.text == text:
174 |                 word['token'] = token
175 |                 word['pos'] = token.pos_
176 |     return words
177 |
178 | def find_boxes_for_grammar(boxes):
179 |     words = parse_words(boxes)
180 |     grammars = [
181 |         ['DET', 'NOUN', 'VERB', 'NOUN'],
182 |         ['ADJ', 'NOUN', 'VERB', 'NOUN'],
183 |         ['ADJ', 'NOUN', 'VERB', 'ADV'],
184 |         ['DET', 'NOUN', 'VERB', 'NOUN', 'CONJ', 'NOUN'],
185 |         ['VERB', 'DET', 'NOUN'],
186 |         ['ADV', 'VERB', 'NOUN', 'CONJ', 'NOUN']
187 |     ]
188 |     grammar = random.choice(grammars)
189 |     picks = []
190 |     word_index = 0
191 |     prev_word = None
192 |     prev_pos = None
193 |
194 |     for pos in grammar:
195 |         while True:
196 |             word = words[word_index]
197 |             if len(picks) > 0:
198 |                 prev_word = picks[-1]
199 |                 prev_pos = prev_word['pos']
200 |             pick_this = True
201 |             if prev_pos == 'DET':
202 |                 if prev_word['text'] == 'a' or prev_word['text'] == 'an':
203 |                     # Pick this if it's singular
204 |                     pick_this = not is_plural(word)
205 |                 if prev_word['text'] == 'a':
206 |                     # Pick this if it doesn't start with a vowel
207 |                     pick_this = not starts_with_vowel(word) and pick_this
208 |                 if prev_word['text'] == 'an':
209 |                     pick_this = starts_with_vowel(word) and pick_this
210 |                 if prev_word['text'] == 'this':
211 |                     pick_this = not is_plural(word) and pick_this
212 |                 if prev_word['text'] == 'these':
213 |                     pick_this = is_plural(word) and pick_this
214 |             if prev_pos == 'NOUN':
215 |                 # If the previous noun was plural, the verb must be plural
216 |                 if is_plural(prev_word):
217 |                     pick_this = is_plural_verb(word) and pick_this
218 |                 if not is_plural(prev_word):
219 |                     pick_this = not is_plural_verb(word) and pick_this
220 |             if prev_pos == 'VERB':
221 |                 # If the verb was plural, the noun must be
222 |                 if is_plural_verb(prev_word):
223 |                     pick_this = is_plural(word) and pick_this
224 |                 if not is_plural_verb(prev_word):
225 |                     pick_this = not is_plural(word) and pick_this
226 |             if pos == 'VERB':
227 |                 # Don't pick auxiliary verbs as they won't have a helper
228 |                 if 'token' in word:
229 |                     pick_this = word['token'].dep_ != 'aux' and pick_this
230 |
231 |             if 'pos' in word and word['pos'] == pos and pick_this and random.randint(0, 30) == 0:
232 |                 #print("Picking ", word['text'], " ", word['token'].dep_)
233 |                 picks.append(word)
234 |                 prev_pos = pos
235 |                 word_index += 1
236 |                 break
237 |
238 |             word_index += 1
239 |     return [p['box'] for p in picks]
240 |
241 | def is_plural(word):
242 |     if word['text'] == 'men' or word['text'] == 'women': # Special case this since one comes up a lot
243 |         return True
244 |     return word['text'][-1] == 's'
245 |
246 | def is_plural_verb(word):
247 |     if word['text'] == 'have':
248 |         return True
249 |     return word['text'][-1] != 's'
250 |
251 | def is_present(word):
252 |     return word['text'][-1] == 's'
253 |
254 | def starts_with_vowel(word):
255 |     vowels = set(['a', 'e', 'i', 'o', 'u'])
256 |     return word['text'][0] in vowels
257 |
258 | def setup(imagefile):
259 |     tool = pyocr.get_available_tools()[0]
260 |     boxes = get_boxes(imagefile, tool)
261 |     return boxes
262 |
263 | def draw(imagefile, boxes):
264 |
265 |     while True:
266 |         try:
267 |             select_boxes = find_boxes_for_grammar(boxes)
268 |             break
269 |         except IndexError:
270 |             #print("Retrying...")
271 |             pass
272 |
273 |     # Get the line height by taking the average of all the box heights
274 |     box_heights = []
275 |     margin_lefts = []
276 |     margin_rights = []
277 |     margin_top = boxes[0].position[0][1]
278 |     margin_bottom = boxes[-1].position[1][1]
279 |
280 |     for box in boxes:
281 |         margin_lefts.append(box.position[0][0])
282 |         margin_rights.append(box.position[1][0])
283 |         box_heights.append(box.position[1][1] - box.position[0][1])
284 |
285 |     margin_left = min(margin_lefts)
286 |     margin_right = max(margin_rights)
287 |
288 |     line_width = mean(box_heights)
289 |     line_spaces = [0]
290 |     last_y_pos = boxes[0].position[1][1]
291 |
292 |     src = Image.open(imagefile)
293 |     src = src.convert('RGBA')
294 |     img = Image.new('RGBA', (src.size[0], src.size[1]))
295 |     draw = ImageDraw.Draw(img)
296 |
297 |
298 |     doc_bounding_box = (margin_left, margin_top, margin_right, margin_bottom)
299 |
300 |     line_choices = random.choice(('v', 'h', 'a'))
301 |     line_choices = 'v'  # Overrides the random choice above, so only vertical lines are drawn
302 |     if line_choices == 'v':
303 |         draw_vertical_lines(draw, select_boxes, doc_bounding_box=doc_bounding_box, line_width=line_width)
304 |     elif line_choices == 'h':
305 |         draw_horizontal_lines(draw, select_boxes,
306 |                               doc_bounding_box=doc_bounding_box,
307 |                               line_width=line_width)
308 |     else:
309 |         draw_vertical_lines(draw, select_boxes, doc_bounding_box=doc_bounding_box, line_width=line_width)
310 |         draw_horizontal_lines(draw, select_boxes,
311 |                               doc_bounding_box=doc_bounding_box,
312 |                               line_width=line_width)
313 |
314 |
315 |
316 |     img = image_filter(img)
317 |     out = Image.alpha_composite(src, img)
318 |
319 |     repeat = 10
320 |     f = 10
321 |
322 |     for box in select_boxes:
323 |         pad = BOX_PADDING
324 |         d = ImageDraw.Draw(out)
325 |         p0 = [box.position[0][0] - pad, box.position[0][1] - pad]
326 |         p1 = [box.position[1][0] + pad, box.position[0][1] - pad]
327 |         p2 = [box.position[1][0] + pad, box.position[1][1] + pad]
328 |         p3 = [box.position[0][0] - pad, box.position[1][1] + pad]
329 |         b = (*p0, *p2)
330 |         crop = src.crop(box=b)
331 |         out.paste(crop, box=b)
332 |         w = 10 + int(random.uniform(-5, 5))
333 |         for i in range(0, repeat):
334 |             fuzz = random.uniform(-f, f)
335 |             p0 = [p + fuzz for p in p0]
336 |             fuzz = random.uniform(-f, f)
337 |             p1 = [p + fuzz for p in p1]
338 |             fuzz = random.uniform(-f, f)
339 |             p2 = [p + fuzz for p in p2]
340 |             fuzz = random.uniform(-f, f)
341 |             p3 = [p + fuzz for p in p3]
342 |             fuzz = random.uniform(-f, f)
343 |             d.line(p0 + p1, width=w, fill="black")
344 |             d.line(p1 + p2, width=w, fill="black")
345 |             d.line(p2 + p3, width=w, fill="black")
346 |             d.line(p3 + p0, width=w, fill="black")
347 |
348 |
349 |     final = Image.new('RGBA', (src.size[0], src.size[1]))
350 |     canvas = ImageDraw.Draw(final)
351 |     canvas.rectangle([0, 0, final.size[0], final.size[1]], fill='white')
352 |     final = Image.alpha_composite(final, out)
353 |     outfile = str(uuid.uuid4())[0:5] + '.png' # os.path.basename(imagefile)
354 |
355 |     final.save("build/" + outfile)
356 |
357 | if __name__ == '__main__':
358 |     path = sys.argv[1]
359 |     pages = []
360 |     for f in os.listdir(path):
361 |         pages.append(f)
362 |     num_generations_per_page = 100
363 |     while True:
364 |         f = random.choice(pages)
365 |         imagefile = os.path.join(path, f)
366 |         print("Processing " + imagefile)
367 |         boxes = setup(imagefile)
368 |         for i in range(0, num_generations_per_page):
369 |             draw(imagefile, boxes)
370 |
--------------------------------------------------------------------------------
/images/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/1.png
--------------------------------------------------------------------------------
/images/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/2.png
--------------------------------------------------------------------------------
/images/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/3.png
--------------------------------------------------------------------------------
/images/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/4.png
--------------------------------------------------------------------------------
/images/5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/5.png
--------------------------------------------------------------------------------
/images/6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/6.png
--------------------------------------------------------------------------------
/images/7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/7.png
--------------------------------------------------------------------------------
/images/title.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizadaly/blackout/82614781a025faddcf5d447c20908f0fdc97e674/images/title.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pyocr
2 | pillow
3 | spacy
4 | git+https://github.com/aparrish/pytracery.git#egg=tracery
5 |
--------------------------------------------------------------------------------