├── .gitignore
├── README.md
└── vimcrypt.py


/.gitignore:
--------------------------------------------------------------------------------
1 | vimcrypted.txt
2 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # vimcryption
 2 | Liberating ASCII characters from their cramped 1-byte prisons by abusing vim's incorrect UTF-8 decoding.
 3 | 
 4 | Simply pipe the script some ASCII text to balloon it up to, on average, 3.5x the size! If this isn't enough for you, as an added bonus, only other vim users will be able to read it. Bask in the smugness that can only come from using standard-defying software.
 5 | 
 6 | ## Usage
 7 | `(python2 | python3) vimcrypt.py inputfile > outputfile && vim outputfile`
 8 | 
 9 | `some-command | (python2 | python3) vimcrypt.py > outputfile && vim outputfile`
10 | 
11 | ## How?
12 | UTF-8 encodes codepoints in the following way (using 🍓 U+1F353 as an example):
13 | 
14 | * If the codepoint is < 128 (ASCII), simply encode it as such.
15 | * Otherwise, taking your codepoint as bits, fill in the gaps in the following pattern from the right:
16 | 
17 | > 10------ 10------ 10------ 10------ 10------
18 | >
19 | > 0x1F353 == **11111001101010011**
20 | >
21 | > 10------ 10------ 10-**11111** 10**001101** 10**010011**
22 | 
23 | * Delete any unused bytes:
24 | 
25 | > 100**11111** 10**001101** 10**010011**
26 | 
27 | * Prepend a byte consisting of as many 1's as bytes you have used (+1 to include itself) followed by 0's. We've used 3 bytes so we should have four 1's followed by four 0's:
28 | 
29 | > **11110000** 10011111 10001101 10010011
30 | 
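31 | * If you are wasting space, i.e. the leftover topmost bits of your codepoint would fit in the length byte's 0's, then shuffle them down into the length byte, delete the byte they came from, and remove one of the 1's to reflect the new length. (For 🍓 the five leftover bits 11111 do not fit, so nothing changes.)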
32 | 
33 | * We're done! Therefore U+1F353 is encoded by UTF-8 as **F0 9F 8D 93**
34 | 
35 | * If you're still confused, Tom Scott explains it much more elegantly than I ever could in [this video](https://www.youtube.com/watch?v=MijmeoH9LT4).
36 | 
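As a rough illustration of the steps above, here is a minimal Python sketch of a shortest-form encoder. The name `utf8_encode` is a hypothetical helper, not part of vimcrypt.py, and it picks the byte count up front rather than trimming unused bytes afterwards, which should arrive at the same result. It makes no attempt to reject surrogates or other reserved codepoints.

```python
def utf8_encode(codepoint):
    """Encode one codepoint as shortest-form UTF-8 (hypothetical helper)."""
    if codepoint < 0x80:
        # ASCII: a single byte, stored as-is
        return bytes(bytearray([codepoint]))
    # Pick the shortest length whose payload can hold the codepoint:
    # 2 bytes carry 11 bits, 3 bytes carry 16, 4 bytes carry 21
    for length, payload_bits in ((2, 11), (3, 16), (4, 21)):
        if codepoint < (1 << payload_bits):
            break
    else:
        raise ValueError("codepoint too large for 4-byte UTF-8")
    out = bytearray(length)
    # Fill the continuation bytes from the right, 6 bits at a time
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (codepoint & 0x3F)
        codepoint >>= 6
    # Lead byte: `length` ones, then the remaining high bits of the codepoint
    out[0] = ((0xFF << (8 - length)) & 0xFF) | codepoint
    return bytes(out)


encoded = utf8_encode(0x1F353)
print(" ".join("%02X" % b for b in bytearray(encoded)))  # F0 9F 8D 93
```
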
37 | You will notice that once we've decided to use multiple bytes, we only have 6 bits of space in our least significant byte, but most ASCII characters require 7, so the seventh bit spills over into the byte before it. Encoding a character in more bytes than it needs (a "non-shortest form") is an invalid use of UTF-8 according to http://unicode.org/versions/corrigendum1.html:
38 | 
39 | >_the Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP characters, and clarified some of the conformance clauses_
40 | 
41 | ...but vim ignores this and parses non-shortest forms of characters regardless. As a result, if you write the bytes **C1 A1** (binary 110**00001** 10**100001**, whose payload bits 00001100001 equal 0x61) to a file and open it in vim, the letter "a" is displayed despite the fact that this is technically an invalid encoding. This script just takes this to another level by re-encoding each character given to it to use a random length of between 2 and 5 bytes.
42 | 
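To make the **C1 A1** example concrete, here is a small Python sketch of padding an ASCII character into a non-shortest form and reading it back. Both `overlong_encode` and `lenient_decode` are hypothetical helpers written for illustration. The decoder simply skips the shortest-form check and is an assumption about the general idea, not vim's actual implementation.

```python
def overlong_encode(char, byte_count):
    """Pad one ASCII character out to byte_count bytes (hypothetical helper)."""
    code = ord(char)
    if code >= 0x80 or not 2 <= byte_count <= 5:
        raise ValueError("need an ASCII character and a length of 2-5")
    out = bytearray([0x80] * byte_count)
    # Lead byte: byte_count ones followed by zeros
    out[0] = (0xFF << (8 - byte_count)) & 0xFF
    # Low 6 bits go in the last byte, the 7th bit in the byte before it
    out[-1] |= code & 0x3F
    out[-2] |= code >> 6
    return bytes(out)


def lenient_decode(data):
    """Decode one sequence with no shortest-form check (hypothetical helper)."""
    data = bytearray(data)
    lead = data[0]
    # Count the leading ones of the lead byte to get the sequence length
    length = 0
    while lead & (0x80 >> length):
        length += 1
    # Keep the lead byte's payload bits, then shift in 6 bits per continuation byte
    value = lead & (0xFF >> (length + 1))
    for byte in data[1:length]:
        value = (value << 6) | (byte & 0x3F)
    return value


padded = overlong_encode("a", 2)
print(" ".join("%02X" % b for b in bytearray(padded)))  # C1 A1
print(chr(lenient_decode(padded)))  # a
try:
    padded.decode("utf-8")
except UnicodeDecodeError:
    print("a strict decoder rejects the sequence")
```

Python's own UTF-8 codec is strict about this, which is why the last step raises `UnicodeDecodeError` instead of returning "a".
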
43 | ## Why?
44 | Boredom, mostly.
45 | 
46 | ## This isn't real encryption. You suck.
47 | :(
48 | 


--------------------------------------------------------------------------------
/vimcrypt.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | from random import randint
 3 | 
 4 | if len(sys.argv) >= 2:
 5 |     # Prefer an explicit file argument over whatever stdin happens to be
 6 |     try:
 7 |         input_file = open(sys.argv[1])
 8 |     except IOError:
 9 |         sys.exit("Couldn't open file: %s" % sys.argv[1])
10 | elif not sys.stdin.isatty():
11 |     input_file = sys.stdin
12 | else:
13 |     # sys.exit with a string prints to stderr, keeping redirected output clean
14 |     sys.exit("Usage: python vimcrypt.py <file to liberate>")
15 | 
16 | text_input = input_file.read()
17 | # Validate input
18 | try:
19 |     text_input.encode("ascii")
20 | # So we're compatible with python 2 and 3
21 | except (UnicodeDecodeError, UnicodeEncodeError):
22 |     # Report on stderr so the message never ends up in the redirected output
23 |     sys.exit("Input text must be ascii")
24 | 
25 | out_bytes = bytearray()
26 | for char in text_input:
27 |     # Don't vimcrypt linefeeds
28 |     if ord(char) == 0x0a:
29 |         out_bytes.append(0x0a)
30 |         continue
31 | 
32 |     # Give each ascii byte some room to breathe by expanding it to 2-5 bytes randomly
33 |     byte_count = randint(2, 5)  # randint is inclusive at both ends
34 |     first = int("1"*byte_count + "0"*(8-byte_count), 2)
35 |     out_bytes.append(first)
36 | 
37 |     # Append n-1 continuation bytes
38 |     for _ in range(byte_count-1):
39 |         out_bytes.append(0x80)
40 |     out_bytes[-1] |= (ord(char) & 0b00111111)
41 |     out_bytes[-2] |= (ord(char) & 0b11000000) >> 6
42 | 
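43 | # Python 3's sys.stdout only accepts str, so write raw bytes through its .buffer (absent on Python 2)
44 | binary_stdout = getattr(sys.stdout, "buffer", sys.stdout)
45 | binary_stdout.write(bytes(out_bytes))
46 | binary_stdout.flush()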


--------------------------------------------------------------------------------