├── LICENSE.md
└── README.md


/LICENSE.md:
--------------------------------------------------------------------------------

The MIT License (MIT)

Copyright (c) 2021 Bottom Software Foundation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Official Bottom specification
##### v0.2.0

Bottom is a lightweight encoding format used by Discord and Tumblr users from all around the world.
This document aims to detail the Bottom specification officially, so that implementing it correctly is as easy as possible.

## Character table
Each character in Bottom holds a purpose of some sort.
These are detailed here for your convenience, and will be referred to in depth below.

### Value characters
| Unicode escape(s)     | Character  | Value        |
|-----------------------|------------|--------------|
| `U+1FAC2`             | 🫂         | Integer 200  |
| `U+1F496`             | 💖         | Integer 50   |
| `U+2728`              | ✨         | Integer 10   |
| `U+1F97A`             | 🥺         | Integer 5    |
| `U+002C`              | ,          | Integer 1    |
| `U+2764`, `U+FE0F`    | ❤️         | Integer 0    |

### Special characters
| Unicode escape(s)     | Character  | Purpose          |
|-----------------------|------------|------------------|
| `U+1F449`, `U+1F448`  | 👉👈       | Byte terminator  |

## Notes on encoding
- The input stream must be valid UTF-8 encoded text. Encoding invalid UTF-8 is illegal.
- The output stream will be a sequence of groups of value characters (see table above) with each group terminated by the byte terminator character, e.g.
  ```
  💖✨✨✨👉👈💖💖🥺,,,👉👈💖💖,👉👈💖✨✨✨✨🥺,,👉👈💖💖✨🥺👉👈💖💖,👉👈💖✨,,,👉👈
  ```
- The total numerical value of each group must equal the decimal value of the corresponding input byte.
  - For example, the numerical value of `💖💖,,,,`, according to the character table above, is
    `50 + 50 + 1 + 1 + 1 + 1`, or 104. This sequence would thus represent `U+0068` or `h`,
    which has a decimal value of `104`.
  - Note the ordering of characters within groups. Groups of value characters **must** be in descending order.
    While character order within a group does not affect the decoded value,
    arbitrary ordering can significantly slow decoding and is considered both illegal and bad form.
- The encoding can be represented succinctly in EBNF:
  ```
  bottom -> values (BYTE_TERMINATOR values)* BYTE_TERMINATOR
  values -> value_character+ | null_value
  value_character -> 🫂 | 💖 | ✨ | 🥺 | ,
  null_value -> ❤️
  BYTE_TERMINATOR -> 👉👈
  ```
  Note that EBNF fails to capture any notion of semantic validity, i.e. character ordering.
  It's technically possible to encode character ordering rules into the grammar, but that is not shown here
  for the sake of brevity and simplicity.
- Byte terminators that do not follow a group of value characters are illegal, e.g. `💖💖,,,,👉👈👉👈`
  or `👉👈💖💖,,,,👉👈`. As such, `👉👈` alone is illegal.
- Groups of value characters must be followed by a byte terminator. `💖💖,,,,` alone is illegal, but `💖💖,,,,👉👈` is valid.
- The null value must be followed by a byte terminator. `💖💖,,,,👉👈❤️👉👈💖💖,,,,👉👈` and `💖💖,,,,👉👈❤️👉👈` are valid, but `💖💖,,,,👉👈❤️` alone is illegal.

## Notes on decoding
- Decoding is quite simple and there aren't many special considerations to be made.
  If you find it difficult, consider reading the source of one of the existing Bottom decoders.
- If speed is a priority, you may want to generate a hashmap (or similar) mapping each possible encoded byte to
  its decoded form. This drastically improves the decode speed of correctly encoded text.


## Example encoding implementation
Let `o` be a buffer of Unicode scalar values. Then, for each byte `b` of the input stream:
- Let `v` be the decimal value of `b`.
- If `v` is zero, push ❤️ (`U+2764`, `U+FE0F`) to `o`.
- If `v` is non-zero, repeat the below until `v` is zero:
  - Find the largest value character (see table above) whose value does not exceed `v`. Let `character_value` be its value.
  - Push the Unicode scalar value of that character to `o`.
  - Subtract `character_value` from `v`.
- Push the Unicode scalar values representing the byte terminator to `o`.
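
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `VALUE_CHARACTERS`, `encode_byte`, and `encode` are assumptions made for this example, and `divmod` is used as a shortcut that yields the same result as subtracting the largest fitting value repeatedly.

```python
# Value characters from the character table, largest first.
VALUE_CHARACTERS = [
    (200, "\U0001FAC2"),  # 🫂
    (50, "\U0001F496"),   # 💖
    (10, "\u2728"),       # ✨
    (5, "\U0001F97A"),    # 🥺
    (1, ","),
]
NULL_VALUE = "\u2764\ufe0f"               # ❤️ (U+2764, U+FE0F)
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def encode_byte(b: int) -> str:
    """Encode one byte (0..255) as a terminated group of value characters."""
    if b == 0:
        return NULL_VALUE + BYTE_TERMINATOR
    parts = []
    v = b
    for value, char in VALUE_CHARACTERS:
        # How many times this character fits into what is left of v.
        count, v = divmod(v, value)
        parts.append(char * count)
    return "".join(parts) + BYTE_TERMINATOR


def encode(text: str) -> str:
    """Encode text by encoding each byte of its UTF-8 representation."""
    return "".join(encode_byte(b) for b in text.encode("utf-8"))
```

Because the table is walked from largest to smallest value, every group comes out in descending order, as the encoding notes require.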

An implementation can thus be expressed as the following pseudo-code:
```
let o = new string
for b in input_stream:
    let v = b as number

    if v is 0:
        o.append("❤️")
    else:
        loop:
            if v >= 200:
                o.append("🫂")
                v = v - 200
            else if v >= 50:
                o.append("💖")
                v = v - 50
            else if v >= 10:
                o.append("✨")
                v = v - 10
            else if v >= 5:
                o.append("🥺")
                v = v - 5
            else if v >= 1:
                o.append(",")
                v = v - 1
            else:
                break

    o.append("👉👈")

return o
```

--------------------------------------------------------------------------------
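
The README gives no decoding example, so here is a hedged Python sketch of what a decoder following the notes on encoding and decoding might look like. The function name `decode` and the error messages are assumptions for this illustration. It sums the value characters of each group directly and does not use the lookup-table optimisation mentioned in the decoding notes.

```python
# Values taken from the character table in the README.
VALUES = {
    "\U0001FAC2": 200,  # 🫂
    "\U0001F496": 50,   # 💖
    "\u2728": 10,       # ✨
    "\U0001F97A": 5,    # 🥺
    ",": 1,
}
NULL_VALUE = "\u2764\ufe0f"               # ❤️ (U+2764, U+FE0F)
BYTE_TERMINATOR = "\U0001F449\U0001F448"  # 👉👈


def decode(text: str) -> str:
    """Decode Bottom text to a string, rejecting input the spec calls illegal."""
    if not text.endswith(BYTE_TERMINATOR):
        raise ValueError("input must end with a byte terminator")
    out = bytearray()
    # Splitting leaves one empty string after the final terminator; drop it.
    for group in text.split(BYTE_TERMINATOR)[:-1]:
        if group == NULL_VALUE:
            out.append(0)
            continue
        if not group:
            raise ValueError("byte terminator without a preceding group")
        try:
            values = [VALUES[ch] for ch in group]
        except KeyError as exc:
            raise ValueError(f"unexpected character {exc.args[0]!r}") from None
        if values != sorted(values, reverse=True):
            raise ValueError("value characters must be in descending order")
        total = sum(values)
        if total > 255:
            raise ValueError("group value does not fit in one byte")
        out.append(total)
    return out.decode("utf-8")
```

For example, `decode("💖💖,,,,👉👈")` returns `"h"`, matching the worked example in the notes on encoding. A faster variant could precompute a dictionary from each of the 256 encoded groups to its byte value, as the decoding notes suggest.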