├── .formatter.exs ├── .gitignore ├── .travis.yml ├── LICENSE ├── README.md ├── examples.exs ├── iex.exs ├── images ├── EntropyBits.png ├── HashCollision.png └── NBitCollision.png ├── lib ├── charset.ex └── entropy_string.ex ├── mix.exs ├── mix.lock └── test ├── bits_test.exs ├── charset_test.exs ├── entropy_string_test.exs └── test_helper.exs /.formatter.exs: -------------------------------------------------------------------------------- 1 | [ 2 | inputs: ["mix.exs", "examples.exs", "{lib,test}/**/*.{ex,exs}"] 3 | ] 4 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # The directory Mix will write compiled artifacts to. 2 | /_build/ 3 | 4 | # If you run "mix test --cover", coverage assets end up here. 5 | /cover/ 6 | 7 | # The directory Mix downloads your dependencies sources to. 8 | /deps/ 9 | 10 | # Where 3rd-party dependencies like ExDoc output generated docs. 11 | /doc/ 12 | 13 | # Ignore .fetch files in case you like to edit your project deps locally. 14 | /.fetch 15 | 16 | # If the VM crashes, it generates a dump, let's ignore it too. 17 | erl_crash.dump 18 | 19 | # Also ignore archive artifacts (built via "mix archive.build"). 
20 | *.ez 21 | 22 | .atomignore 23 | .elixir_ls 24 | .vscode 25 | 26 | # emacs 27 | *~ 28 | 29 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: elixir 2 | sudo: false 3 | 4 | env: 5 | - ELIXIR_ASSERT_TIMEOUT=2000 6 | 7 | elixir: 8 | - '1.8' 9 | otp_release: 10 | - '21.2' 11 | 12 | script: 13 | - mix test 14 | 15 | after_script: 16 | - mix deps.get --only docs 17 | - MIX_ENV=docs mix inch.report 18 | 19 | notifications: 20 | recipients: 21 | - paul@knoxen.com 22 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Knoxen 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # EntropyString for Elixir 2 | 3 | Efficiently generate cryptographically strong random strings of specified entropy from various character sets. 4 | 5 | [![Build Status](https://travis-ci.org/EntropyString/Elixir.svg?branch=master)](https://travis-ci.org/EntropyString/Elixir)   [![Hex Version](https://img.shields.io/hexpm/v/entropy_string.svg "Hex Version")](https://hex.pm/packages/entropy_string)   [![License: MIT](https://img.shields.io/npm/l/express.svg)]() 6 | 7 | 8 | ##
NOTICE
9 | 10 |
11 | 12 | `EntropyString` is now superseded by [Puid](https://github.com/puid/Elixir). 13 | 14 | `Puid` is based on the same logic as `EntropyString` but is [significantly faster](https://puid.github.io/Elixir/#EntropyString) for pre-defined character sets (which has been expanded to a total of [16 sets](https://hexdocs.pm/puid/Puid.CharSet.html#module-charsets)), as well as somewhat faster for custom characters, which can now be of any character count, as well as Unicode. 15 | 16 | The `Puid` [generated API](https://puid.github.io/Elixir/#ModuleAPI) for modules has been simplified to two functions: `generate/0` and `info/0`. 17 | 18 | `EntropyString` will not be removed. Although switching to `Puid` shouldn't be difficult, contact [Paul Rogers](https://hex.pm/users/dingosky) with any questions or need of assistance. 19 | 20 |
21 | 22 | ### TOC 23 | - [Installation](#Installation) 24 | - [Usage](#Usage) 25 | - [Overview](#Overview) 26 | - [Real Need](#RealNeed) 27 | - [Character Sets](#CharacterSets) 28 | - [Custom Characters](#CustomCharacters) 29 | - [Efficiency](#Efficiency) 30 | - [Custom Bytes](#CustomBytes) 31 | - [Entropy Bits](#EntropyBits) 32 | - [Why You Don't Need UUIDs](#UUID) 33 | - [Take Away](#TakeAway) 34 | 35 | ### Installation 36 | 37 | Add `entropy_string` to `mix.exs` dependencies: 38 | 39 | ```elixir 40 | def deps, 41 | do: [ 42 | {:entropy_string, "~> 1.3"} 43 | ] 44 | ``` 45 | 46 | Update dependencies 47 | 48 | ```bash 49 | mix deps.get 50 | ``` 51 | 52 | [TOC](#TOC) 53 | 54 | ### Usage 55 | 56 | Generate a potential of _10 million_ random strings with _1 in a trillion_ chance of repeat: 57 | 58 | ```elixir 59 | iex> import EntropyString 60 | EntropyString 61 | iex> defmodule(Id, do: use(EntropyString, total: 10.0e6, risk: 1.0e12)) 62 | iex> Id.random() 63 | "JhD7L4343P34TTL9NQ" 64 | ``` 65 | 66 | `EntropyString` uses predefined `charset32` characters by default (reference [Character Sets](#CharacterSets)). To get a random hexadecimal string with the same entropy bits as above (see [Real Need](#RealNeed) for a description of how `total` and `risk` determine entropy bits): 67 | 68 | ```elixir 69 | iex> defmodule(Hex, do: use(EntropyString, total: 10.0e6, risk: 1.0e12, charset: charset16)) 70 | iex> Hex.random() 71 | "03a4d43502c45b0f87fb3c" 72 | ``` 73 | 74 | Custom characters may be specified. Using uppercase hexadecimal characters: 75 | 76 | ```elixir 77 | iex> defmodule(UpperHex, do: use(EntropyString, charset: "0123456789ABCDEF")) 78 | iex> UpperHex.random() 79 | "C0B861E48CFB270738A4B6D54DA8E768" 80 | ``` 81 | 82 | Note that in the absence of specifying `total` and `risk` a default of 128 bits of entropy is used. 83 | 84 | Convenience functions exists for a variety of random string needs. 
For example, to create OWASP session ID using predefined base 32 characters: 85 | 86 | ```elixir 87 | iex> defmodule(Server, do: use(EntropyString)) 88 | iex> Server.session() 89 | "rp7D4hGp2QNPT2FP9q3rG8tt29" 90 | ``` 91 | 92 | Or a 256 bit token using [RFC 4648](https://tools.ietf.org/html/rfc4648#section-5) file system and URL safe characters: 93 | 94 | ```elixir 95 | iex> defmodule(Generator, do: use(EntropyString)) 96 | iex> Generator.token() 97 | "X2AZRHuNN3mFUhsYzHSE_r2SeZJ_9uqdw-j9zvPqU2O" 98 | ``` 99 | 100 | The function `bits/0` reveals the entropy bits in use by a module: 101 | 102 | ```elixir 103 | iex> defmodule(Id, do: use(EntropyString, total: 1.0e9, risk: 1.0e15)) 104 | iex> Id.bits() 105 | "108.6" 106 | iex> Id.string() 107 | "9LtmpbG2TPq9NGjdq99BpQ" 108 | ``` 109 | 110 | The function `string/0` is an alias for `random/0`. 111 | 112 | The function `chars/0` reveals the characters in use by the module: 113 | 114 | ```elixir 115 | iex> defmodule(Id, do: use(EntropyString, bits: 96, charset: charset32)) 116 | EntropyString 117 | iex> Id.chars() 118 | "2346789bdfghjmnpqrtBDFGHJLMNPQRT" 119 | iex> Id.string() 120 | "JrftFNmJ8gHhRBp9f7dJ" 121 | ``` 122 | 123 | #### Examples 124 | 125 | The `examples.exs` file contains a smattering of example uses: 126 | 127 | ```bash 128 | > iex --dot-iex iex.exs 129 | Erlang/OTP ... 
130 | EntropyString Loaded 131 | 132 | Results of executing examples.exs file 133 | -------------------------------------- 134 | 135 | Id: Predefined base 32 CharSet 136 | Bits: 128 137 | Characters: 2346789bdfghjmnpqrtBDFGHJLMNPQRT 138 | Random ID: L42P32Ldj6L8JdTTdt2HtHnp68 139 | 140 | Hex: Predefined hex characters 141 | Bits: 128 142 | Characters: 0123456789abcdef 143 | Random ID: 75f5758c1225a8417f186e66a4778188 144 | 145 | Base64Id: Predefined URL and file system safe character session id 146 | Characters: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_ 147 | Session ID: KdGziqwcDFZJJL43boV85J 148 | 149 | UpperHex: Uppercase hex characters 150 | Bits: 64 151 | Characters: 0123456789ABCDEF 152 | ID: CCDBFD3A05C087D4 153 | 154 | DingoSky: Custom characters for 10 million IDs with a 1 in a billion chance of repeat 155 | Bits: 75.4 156 | Characters: dingosky 157 | Random ID: yodynyykgskgoodoiyidnkssnd 158 | 159 | Server: 256 entropy bit token 160 | Characters: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_ 161 | Token: RtJosJEgOmA0oy8wPyUGju6SeJhCDJslTPUlVbRJgRM 162 | ``` 163 | 164 | Further investigations can use the modules defined in `examples.exs`: 165 | 166 | ```elixir 167 | ES-iex> Hex.medium() 168 | "e092b3e3e13704681f" 169 | ES-iex> DingoSky.medium() 170 | "ynssinoiosgignoiokgsogk" 171 | ES-iex> WebServer.token() 172 | "mT2vN607xeJy8qzVElnFbCpCyYpuWrYRRKbtTsNI6RN" 173 | ``` 174 | 175 | [TOC](#TOC) 176 | 177 | ### Overview 178 | 179 | `EntropyString` provides easy creation of randomly generated strings of specific entropy using various character sets. Such strings are needed as unique identifiers when generating, for example, random IDs and you don't want the overkill of a UUID. 180 | 181 | A key concern when generating such strings is that they be unique. 
Guaranteed uniqueness, however, requires either deterministic generation (e.g., a counter) that is not random, or that each newly created random string be compared against all existing strings. When randomness is required, the overhead of storing and comparing strings is often too onerous and a different tack is chosen. 182 | 183 | A common strategy is to replace the **_guarantee of uniqueness_** with a weaker but often sufficient one of **_probabilistic uniqueness_**. Specifically, rather than being absolutely sure of uniqueness, we settle for a statement such as *"there is less than a 1 in a billion chance that two of my strings are the same"*. We use an implicit version of this very strategy every time we use a hash set, where the keys are formed from taking the hash of some value. We *assume* there will be no hash collision using our values, but we **do not** have any true guarantee of uniqueness per se. 184 | 185 | Fortunately, a probabilistic uniqueness strategy requires much less overhead than guaranteed uniqueness. But it does require we have some manner of qualifying what we mean by *"there is less than a 1 in a billion chance that 1 million strings of this form will have a repeat"*. 186 | 187 | Understanding probabilistic uniqueness of random strings requires an understanding of [*entropy*](https://en.wikipedia.org/wiki/Entropy_(information_theory)) and of estimating the probability of a [*collision*](https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem) (i.e., the probability that two strings in a set of randomly generated strings might be the same). The blog post [Hash Collision Probabilities](http://preshing.com/20110504/hash-collision-probabilities/) provides an excellent overview of deriving an expression for calculating the probability of a collision in some number of hashes using a perfect hash with an N-bit output. 
This is sufficient for understanding the probability of collision given a hash with a **fixed** output of N-bits, but does not provide an answer to qualifying what we mean by *"there is less than a 1 in a billion chance that 1 million strings of this form will have a repeat"*. The [Entropy Bits](#EntropyBits) section below describes how `EntropyString` provides this qualifying measure. 188 | 189 | We'll begin investigating `EntropyString` by considering the [Real Need](#RealNeed) when generating random strings. 190 | 191 | [TOC](#TOC) 192 | 193 | ### Real Need 194 | 195 | Let's start by reflecting on the common statement: *I need random strings 16 characters long.* 196 | 197 | Okay. There are libraries available that address that exact need. But first, there are some questions that arise from the need as stated, such as: 198 | 199 | 1. What characters do you want to use? 200 | 2. How many of these strings do you need? 201 | 3. Why do you need these strings? 202 | 203 | The available libraries often let you specify the characters to use. So we can assume for now that question 1 is answered with: 204 | 205 | *Hexadecimal will do fine*. 206 | 207 | As for question 2, the developer might respond: 208 | 209 | *I need 10,000 of these things*. 210 | 211 | Ah, now we're getting somewhere. The answer to question 3 might lead to a further qualification: 212 | 213 | *I need to generate 10,000 random, unique IDs*. 214 | 215 | And the cat's out of the bag. We're getting at the real need, and it's not the same as the original statement. The developer needs *uniqueness* across a total of some number of strings. The length of the string is a by-product of the uniqueness, not the goal, and should not be the primary specification for the random string. 216 | 217 | As noted in the [Overview](#Overview), guaranteeing uniqueness is difficult, so we'll replace that declaration with one of *probabilistic uniqueness* by asking a fourth question: 218 | 219 |
    220 |
  4. What risk of a repeat are you willing to accept?
221 |
222 | 223 | Probabilistic uniqueness carries risk. That's the price we pay for giving up on the stronger declaration of guaranteed uniqueness. But the developer can quantify an appropriate risk for a particular scenario with a statement like: 224 | 225 | *I guess I can live with a 1 in a million chance of a repeat*. 226 | 227 | So now we've finally gotten to the developer's real need: 228 | 229 | *I need 10,000 random hexadecimal IDs with less than 1 in a million chance of any repeats*. 230 | 231 | Not only is this statement more specific, there is no mention of string length. The developer needs probabilistic uniqueness, and strings are to be used to capture randomness for this purpose. As such, the length of the string is simply a by-product of the encoding used to represent the required uniqueness as a string. 232 | 233 | How do you address this need using a library designed to generate strings of specified length? Well, you don't, because that library was designed to answer the originally stated need, not the real need we've uncovered. We need a library that deals with probabilistic uniqueness across some total number of strings. And that's exactly what `EntropyString` does. 234 | 235 | Let's use `EntropyString` to help this developer generate 5 hexadecimal IDs from a pool of a potential 10,000 IDs with a 1 in a million chance of a repeat: 236 | 237 | ```elixir 238 | iex> defmodule(Id, do: use(EntropyString, total: 10_000, risk: 1.0e6, charset: charset16)) 239 | iex> Id.bits() 240 | 45.5 241 | iex> for _ <- :lists.seq(1,5), do: Id.random() 242 | ["85e442fa0e83", "a74dc126af1e", "368cd13b1f6e", "81bf94e1278d", "fe7dec099ac9"] 243 | ``` 244 | 245 | Examining the above code, the `total` and `risk` values determine the amount of entropy needed, which is about 45.5 bits, and a `charset` of `charset16` specifies the use of hexadecimal characters. The Ids are then generated using `Id.random/0`. 
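As a rough check on the 45.5 figure, the sketch below computes entropy bits from `total` and `risk`, assuming the closed form `N ≈ 2*log2(total) + log2(risk) - 1`, which is consistent with the values quoted in this README. `BitsSketch`, `bits/2` and `string_length/2` are hypothetical names used only for illustration; they are not part of the `EntropyString` API, and the library's own calculation may differ in detail.

```elixir
defmodule BitsSketch do
  # Assumed closed form: N ≈ 2*log2(total) + log2(risk) - 1
  def bits(total, risk) do
    2 * :math.log2(total) + :math.log2(risk) - 1
  end

  # Smallest number of characters whose combined capacity covers `bits`,
  # where each character carries log2(charset_size) bits.
  def string_length(bits, charset_size) do
    bits_per_char = :math.log2(charset_size)
    trunc(Float.ceil(bits / bits_per_char))
  end
end

bits = BitsSketch.bits(10_000, 1.0e6)
IO.puts(Float.round(bits, 1))                  # 45.5
IO.puts(BitsSketch.string_length(bits, 16))    # 12
```

With hexadecimal characters carrying 4 bits each, 45.5 bits rounds up to 12 characters, which is the length of the Ids generated above.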
246 | 247 | Looking at the output, we can see each Id is 12 characters long. Again, the string length is a by-product of the characters (hex) used to represent the entropy (45.5 bits) we needed. And it seems the developer didn't really need 16 characters after all. 248 | 249 | [TOC](#TOC) 250 | 251 | ### Character Sets 252 | 253 | As we've seen in the previous sections, `EntropyString` provides predefined characters for each of the supported character set lengths. Let's see what's under the hood. The predefined `CharSet`s are *charset64*, *charset32*, *charset16*, *charset8*, *charset4* and *charset2*. The characters for each were chosen as follows: 254 | 255 | - CharSet 64: **ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_** 256 | * The file system and URL safe char set from [RFC 4648](https://tools.ietf.org/html/rfc4648#section-5). 257 |   258 | - CharSet 32: **2346789bdfghjmnpqrtBDFGHJLMNPQRT** 259 | * Remove all upper and lower case vowels (including y) 260 | * Remove all numbers that look like letters 261 | * Remove all letters that look like numbers 262 | * Remove all letters that have poor distinction between upper and lower case values. 263 | The resulting strings don't look like English words and are easy to parse visually. 264 |   265 | - CharSet 16: **0123456789abcdef** 266 | * Hexadecimal 267 |   268 | - CharSet 8: **01234567** 269 | * Octal 270 |   271 | - CharSet 4: **ATCG** 272 | * DNA alphabet. No good reason; just wanted to get away from the obvious. 273 |   274 | - CharSet 2: **01** 275 | * Binary 276 | 277 | You may, of course, want to choose the characters used, which is covered next in [Custom Characters](#CustomCharacters). 278 | 279 | [TOC](#TOC) 280 | 281 | ### Custom Characters 282 | 283 | Being able to easily generate random strings is great, but what if you want to specify your own characters? For example, suppose you want to visualize flipping a coin to produce 10 bits of entropy. 
284 | 285 | ```elixir 286 | iex> defmodule Coin do 287 | ...> use EntropyString, charset: :charset2 288 | ...> def flip(flips), do: Coin.random(flips) 289 | ...> end 290 | {:module, Coin, 291 | ... 292 | 293 | iex> Coin.flip(10) 294 | "0100101011" 295 | ``` 296 | 297 | The resulting string of __0__'s and __1__'s doesn't look quite right. Perhaps you want to use the characters __H__ and __T__ instead. 298 | 299 | ```elixir 300 | iex> defmodule Coin do 301 | ...> use EntropyString, charset: "HT" 302 | ...> def flip(flips), do: Coin.random(flips) 303 | ...> end 304 | {:module, Coin, 305 | ... 306 | 307 | iex> Coin.flip(10) 308 | "HTTTHHTTHH" 309 | ``` 310 | 311 | As another example, we saw in [Character Sets](#CharacterSets) that the predefined hex characters for `charset16` are lowercase. Suppose you like uppercase hexadecimal letters instead. 312 | 313 | ```elixir 314 | iex> defmodule(Hex, do: use(EntropyString, charset: "0123456789ABCDEF", bits: 192)) 315 | {:module, Hex, 316 | ... 317 | iex> Hex.string() 318 | "73057082B6039721275A0F07A253EDD40FD7AB511DF0C44A" 319 | ``` 320 | 321 | To facilitate [efficient](#Efficiency) generation of strings, `EntropyString` limits character set lengths to powers of 2 from 2 to 64. Attempting to use a character set of an invalid length returns an error. 322 | 323 | ```elixir 324 | iex> EntropyString.random(:medium, "123456789ABCDEF") 325 | {:error, "Invalid char count: must be one of 2,4,8,16,32,64"} 326 | ``` 327 | 328 | Likewise, since calculating entropy requires specification of the probability of each symbol, `EntropyString` requires all characters in a set be unique. (This also maximizes entropy per string.) 
329 | 330 | ```elixir 331 | iex> EntropyString.random(:medium, "123456789ABCDEF1") 332 | {:error, "Chars not unique"} 333 | ``` 334 | 335 | [TOC](#TOC) 336 | 337 | ### Efficiency 338 | 339 | To efficiently create random strings, `EntropyString` generates the number of random bytes needed for each string and uses those bytes in a binary pattern matching scheme to index into a character set. For example, to generate strings from the __32__ characters in the *charset32* character set, each index needs to be an integer in the range `[0,31]`. Generating a random string of *charset32* characters is thus reduced to generating random indices in the range `[0,31]`. 340 | 341 | To generate the indices, `EntropyString` slices just enough bits from the random bytes to create each index. In the example at hand, 5 bits are needed to create an index in the range `[0,31]`. `EntropyString` processes the random bytes 5 bits at a time to create the indices. The first index comes from the first 5 bits of the first byte, the second index comes from the last 3 bits of the first byte combined with the first 2 bits of the second byte, and so on as the bytes are systematically sliced to form indices into the character set. And since binary pattern matching is really efficient, this scheme is quite fast. 342 | 343 | The `EntropyString` scheme is also efficient with regard to the amount of randomness used. Consider the following possible Elixir solution to generating random strings. To generate a character, an index into the available characters is created using `Enum.random`. 
The code looks something like: 344 | 345 | ```elixir 346 | iex> defmodule MyString do 347 | ...> @chars "abcdefghijklmnopqrstuvwxyz012345" 348 | ...> @max String.length(@chars)-1 349 | ...> 350 | ...> defp random_char do 351 | ...> ndx = Enum.random 0..@max 352 | ...> String.slice @chars, ndx..ndx 353 | ...> end 354 | ...> 355 | ...> def random_string(len) do 356 | ...> list = for _ <- :lists.seq(1,len), do: random_char() 357 | ...> List.foldl(list, "", fn(e,acc) -> acc <> e end) 358 | ...> end 359 | ...> end 360 | {:module, MyString, 361 | ... 362 | iex> MyString.random_string 16 363 | "j0jaxxnoipdgksxi" 364 | ``` 365 | 366 | In the code above, `Enum.random` generates a value used to index into the 32-character set. The Elixir docs for `Enum.random` indicate it uses the Erlang `rand` module, which in turn indicates that each random value has 58 bits of precision. Suppose we're creating strings with **len = 16**. Generating each string character consumes 58 bits of randomness while only injecting 5 bits (`log2(32)`) of entropy into the resulting random string. The resulting string has an information carrying capacity of 16 * 5 = 80 bits, so creating each string requires a *total* of 16 * 58 = 928 bits of randomness while only actually *carrying* 80 bits of that entropy forward in the string itself. That means 848 bits (91%) of the generated randomness is simply wasted. 367 | 368 | Compare that to the `EntropyString` scheme. For the example above, plucking 5 bits at a time requires a total of 80 bits (10 bytes) be available. Creating the same strings as above, `EntropyString` uses 80 bits of randomness per string with no wasted bits. In general, the `EntropyString` scheme can waste up to 7 bits per string, but that's the worst case scenario and that's *per string*, not *per character*! 369 | 370 | There is, however, a potentially bigger issue at play in the above code. 
Erlang `rand`, and therefore Elixir `Enum.random`, does not use a cryptographically strong pseudo-random number generator. So the above code should not be used for session IDs or any other purpose that requires secure properties. 371 | 372 | There are certainly other popular ways to create random strings, including secure ones. For example, generating secure random hex strings can be done by 373 | 374 | ```elixir 375 | iex> Base.encode16(:crypto.strong_rand_bytes(8)) 376 | "389B363BB7FD6227" 377 | ``` 378 | 379 | Or, to generate file system and URL safe strings 380 | 381 | ```elixir 382 | iex> Base.url_encode64(:crypto.strong_rand_bytes(8)) 383 | "5PLujtDieyA=" 384 | ``` 385 | 386 | Since Base64 encoding is concerned with decoding as well, you would have to strip any padding characters. That's the price we pay for using a function for something it wasn't designed for. 387 | 388 | These two solutions each have limitations. You can't alter the characters, but more importantly, each lacks a clear specification of how random the resulting strings actually are. Each specifies a number of bytes as opposed to specifying the entropy bits sufficient to represent some total number of strings with an explicit declaration of an associated risk of repeat using whatever encoding characters you want. That's a bit of a mouthful, but the important point is that with `EntropyString` you _explicitly_ declare your intent. 389 | 390 | Fortunately you don't need to really understand how secure random bytes are efficiently sliced and diced to use `EntropyString`. But you may want to provide your own [Custom Bytes](#CustomBytes), which is the next topic. 391 | 392 | [TOC](#TOC) 393 | 394 | ### Custom Bytes 395 | 396 | As previously described, `EntropyString` automatically generates cryptographically strong random bytes to generate strings. You may, however, have a need to provide your own bytes, for deterministic testing or perhaps to use a specialized random byte generator. 
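To make the role of the bytes concrete, here is a minimal sketch of the 5-bit slicing described in the [Efficiency](#Efficiency) section. `SliceSketch` and `encode/2` are hypothetical names for illustration; this is not the library's implementation, only the same idea expressed with a bitstring comprehension over the predefined 32 characters.

```elixir
defmodule SliceSketch do
  # The predefined charset32 characters listed in the Character Sets section
  @charset32 "2346789bdfghjmnpqrtBDFGHJLMNPQRT"

  # Take the first char_count * 5 bits of `bytes`, then read them 5 bits
  # at a time; each 5-bit value (0..31) indexes one character.
  # Raises a MatchError if `bytes` holds too few bits.
  def encode(bytes, char_count) do
    bits = char_count * 5
    <<chunk::bitstring-size(bits), _rest::bitstring>> = bytes
    for <<ndx::5 <- chunk>>, into: "", do: String.at(@charset32, ndx)
  end
end

IO.puts(SliceSketch.encode(<<0xFA, 0xC8, 0x96, 0x64>>, 6))   # Th7fjL
```

For these four bytes the first six 5-bit values are 31, 11, 4, 9, 12 and 25, which index to `Th7fjL`; the final two bits are left unused.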
397 | 398 | Suppose we want 30 strings with no more than a 1 in a million chance of repeat while using 32 characters. We can specify the bytes to use during string generation by 399 | 400 | ```elixir 401 | iex> bytes = <<0xfa, 0xc8, 0x96, 0x64>> 402 | <<250, 200, 150, 100>> 403 | iex> EntropyString.random(:small, :charset32, bytes) 404 | "Th7fjL" 405 | ``` 406 | 407 | The __bytes__ provided can come from any source. However, an error is returned if the number of bytes is insufficient to generate the string as described in the [Efficiency](#Efficiency) section: 408 | 409 | ```elixir 410 | iex> EntropyString.random(:large, :charset32, bytes) 411 | {:error, "Insufficient bytes: need 14 and got 4"} 412 | ``` 413 | 414 | `EntropyString.CharSet.bytes_needed/2` can be used to determine the number of bytes needed to cover a specified amount of entropy for a given character set. 415 | 416 | ```elixir 417 | iex> EntropyString.CharSet.bytes_needed(:large, :charset32) 418 | 13 419 | ``` 420 | 421 | [TOC](#TOC) 422 | 423 | ### Entropy Bits 424 | 425 | Thus far we've avoided the mathematics behind the calculation of the entropy bits required to keep the risk of a repeat among some number of random strings below a specified level. As noted in the [Overview](#Overview), the posting [Hash Collision Probabilities](http://preshing.com/20110504/hash-collision-probabilities/) derives an expression, based on the well-known [Birthday Problem](https://en.wikipedia.org/wiki/Birthday_problem#Approximations), for calculating the probability of a collision in some number of hashes (denoted by `k`) using a perfect hash with `M` possible outputs: 426 | 427 | ![Hash Collision Probability](images/HashCollision.png) 428 | 429 | There are two slight tweaks to this equation as compared to the one in the referenced posting. `M` is used for the total number of possible hashes and an equation is formed by explicitly specifying that the expression in the posting is approximately equal to `1/n`. 
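Since the equation images may not render everywhere, here is a LaTeX reconstruction of the relationship. It assumes the simplest form of the birthday approximation, `k^2 / (2M)`, for the collision probability; the images in this README remain the authoritative form.

```latex
\begin{align*}
\frac{1}{n} &\approx \frac{k^2}{2M}
  && \text{risk of 1 in } n \text{ for } k \text{ strings out of } M \text{ possible} \\
\frac{1}{n} &\approx \frac{k^2}{2^{N+1}}
  && \text{substituting } M = 2^N \\
N &\approx 2\log_2 k + \log_2 n - 1
  && \text{solving for the entropy bits } N
\end{align*}
```

As a check, `k = 10,000` and `n = 10^6` give `N ≈ 26.6 + 19.9 - 1 ≈ 45.5`, matching the bits reported in [Real Need](#RealNeed). The first line corresponds to the equation pictured above; the remaining lines preview the rearrangement described next.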
430 | 431 | More importantly, the above equation isn't in a form conducive to our entropy string needs. The equation was derived for a set number of possible hashes and yields a probability, which is fine for hash collisions but isn't quite right for calculating the bits of entropy needed for our random strings. 432 | 433 | The first thing we'll change is to use `M = 2^N`, where `N` is the number of entropy bits. This simply states that the number of possible strings is equal to the number of possible values using `N` bits: 434 | 435 | ![N-Bit Collision Probability](images/NBitCollision.png) 436 | 437 | Now we massage the equation to represent `N` as a function of `k` and `n`: 438 | 439 | ![Entropy Bits Equation](images/EntropyBits.png) 440 | 441 | The final line represents the number of entropy bits `N` as a function of the number of potential strings `k` and the risk of repeat of 1 in `n`, exactly what we want. Furthermore, the equation is in a form that avoids really large numbers in calculating `N` since we immediately take a logarithm of each large value `k` and `n`. 442 | 443 | [TOC](#TOC) 444 | 445 | ### Why You Don't Need UUIDs 446 | 447 | It is quite common in most (all?) programming languages to simply use string representations of UUIDs as random strings. While this isn't necessarily wrong, it is not efficient. It's somewhat akin to using a BigInt library to do math with small integers. The answers might be right, but the process seems wrong. 448 | 449 | By UUID, we almost always mean the version 4 string representation, which looks like this: 450 | 451 | ``` 452 | hhhhhhhh-hhhh-4hhh-Mhhh-hhhhhhhhhhhh 453 | ``` 454 | 455 | Per [Section 4.4 of RFC 4122](https://tools.ietf.org/html/rfc4122#section-4.4), the algorithm for creating 128-bit (16-byte) version 4 UUIDs is: 456 | 457 | - Set bits 49-52 to the 4-bit version number, **0100** 458 | - The 13th hex char will always be **4** 459 | - Set bits 65-66 to **10**. 
460 | - The 17th hex char will be one of **8**, **9**, **A** or **B** 461 | - Set all the other bits to randomly (or pseudo-randomly) chosen values 462 | 463 | The algorithm designates how to create the 128-bit (16-byte) UUID. The string representation shown above is specified in Section 3 of the RFC. 464 | 465 | The ramifications of the algorithm and string representation are: 466 | 467 | - The specification does not require the use of a cryptographically strong pseudo-random number generator. That's fine, but if using the IDs for security purposes, be sure a CSPRNG is being used to generate the random bytes for the UUID. 468 | - Because certain bits are fixed values, the entropy of the UUID is reduced from 128 bits to 122 bits. This may not be a significant issue in some cases, but regardless of how often you read otherwise, a version 4 UUID **_does not have_** 128 bits of randomness. And if you use version 4 UUIDs for session IDs, that does not cover the OWASP recommendation of using 128-bit IDs. 469 | - The string representation with hyphens adds overhead without adding any bits of entropy. 470 | 471 | As a quick aside, let me emphasize that a string **_does not_** inherently possess any given amount of entropy. For example, how many bits of entropy does the version 4 UUID string **7416179b-62f4-4ea1-9201-6aa4ef920c12** have? Given the structure of version 4 UUIDs, we know it represents **_at most_** 122 bits of entropy. But without knowing how the bits were actually generated, **_we can't know_** how much entropy has actually been captured. Consider that statement carefully if you ever look at one of the many libraries that claim to calculate the entropy of a given string. The underlying assumption of how the string characters are generated is crucial (and often glossed over). Buyer beware. 472 | 473 | Now, back to why you don't need to use version 4 UUIDs. The string representation is fixed, and uses 36 characters. 
Suppose we define efficiency as the ratio of entropy bits to the number of bits in the string representation (8 bits per character). Then for a version 4 UUID we have: 474 | 475 | - UUID 476 | - Entropy bits: 122 477 | - String length: 36 478 | - String bits: 288 479 | - Efficiency: 42% 480 | 481 | Let's create a 122 entropy bit string using `charset64`: 482 | ```elixir 483 | iex> defmodule(Id, do: use(EntropyString, bits: 122, charset: charset64)) 484 | {:module, Id, 485 | iex> string = Id.string() 486 | "94N04YtQH7JeK-cMdnG00" 487 | ``` 488 | 489 | - Entropy String: 490 | - Entropy bits: 126 491 | - String length: 21 492 | - String bits: 168 493 | - Efficiency: 75% 494 | 495 | Using `charset64` characters, we create a string representation with 75% efficiency vs. the 42% achieved in using version 4 UUIDs. Given that generating random strings using `EntropyString` is as easy as using a UUID library, I'll take 75% efficiency over 42% any day. 496 | 497 | (Note that the actual number of entropy bits in the string is 126. Each character in `charset64` carries 6 bits of entropy, and so in this case we can only have a total entropy of a multiple of 6. The `EntropyString` library ensures the number of entropy bits will meet or exceed the designated bits.) 498 | 499 | But that's not the primary reason for using `EntropyString` over UUIDs. With version 4 UUIDs, the number of entropy bits is fixed at 122, and you should ask yourself, "why do I need 122 bits"? And how often do you unquestioningly use one-size-fits-all solutions anyway? 500 | 501 | What you should actually ask is, "how many strings do I need and what level of risk of a repeat am I willing to accept"? Rather than one-size-fits-all solutions, you should seek understanding and explicit control. Rather than swallowing 122 bits without thinking, investigate your real need and act accordingly. 
If you need IDs for a database table that could have 1 million entries, explicitly declare how much risk of repeat you're willing to accept. 1 in a million? Then you need 59 bits. 1 in a billion? 69 bits. 1 in a trillion? 79 bits. But **_openly declare_** and quit using UUIDs just because you didn't think about it! Now you know better, so do better :) 502 | 503 | And finally, don't say you use version 4 UUIDs because you don't **_ever_** want a repeat. The term 'unique' in the name is misleading. Perhaps we should call them PUID for probabilistically unique identifiers. (I left out "universal" since that designation never really made sense anyway.) Regardless, there is a chance of repeat. It just depends on how many UUIDs you produce in a given "collision" context. Granted, it may be small, but it **_is not zero_**! 504 | 505 | [TOC](#TOC) 506 | 507 | ### Take Away 508 | 509 | - Don't specify randomness using string length 510 | - String length is a by-product, not a goal 511 | - Don't require true uniqueness 512 | - You'll do fine with probabilistic uniqueness 513 | - Probabilistic uniqueness involves risk 514 | - Risk is specified as *"1 in __n__ chance of generating a repeat"* 515 | - Explicitly specify your intent 516 | - Specify entropy as the risk of repeat in a total number of strings 517 | - Characters used are arbitrary 518 | - You need `EntropyString`, not UUIDs 519 | 520 | ##### 10 million potential IDs with a 1 in a trillion chance of a repeat: 521 | 522 | ```elixir 523 | iex> defmodule(MyId, do: use(EntropyString, total: 1.0e7, risk: 1.0e12)) 524 | {:module, MyId, 525 | ... 
526 | ES-iex> MyId.random() 527 | "4LbdRPfn7bdGfjqQmt" 528 | ``` 529 | 530 | [TOC](#TOC) 531 | -------------------------------------------------------------------------------- /examples.exs: -------------------------------------------------------------------------------- 1 | # 2 | # Compile library before running this example file: 3 | # 4 | # > mix compile 5 | # 6 | # Launching the Elixir shell from the project base directory loads EntropyString and runs 7 | # the examples 8 | # 9 | # > iex --dot-iex iex.exs 10 | # 11 | 12 | alias EntropyString.CharSet 13 | 14 | # -------------------------------------------------------------------------------------------------- 15 | # Id 16 | # Predefined base 32 characters 17 | # -------------------------------------------------------------------------------------------------- 18 | defmodule(Id, do: use(EntropyString)) 19 | 20 | IO.puts("Id: Predefined base 32 CharSet") 21 | IO.puts(" Bits: #{Id.bits()}") 22 | IO.puts(" Characters: #{Id.chars()}") 23 | IO.puts(" Random ID: #{Id.string()}\n") 24 | 25 | # -------------------------------------------------------------------------------------------------- 26 | # Hex Id 27 | # Predefined hex characters 28 | # -------------------------------------------------------------------------------------------------- 29 | defmodule(Hex, do: use(EntropyString, charset: :charset16)) 30 | 31 | IO.puts("Hex: Predefined hex character session id") 32 | IO.puts(" Bits: #{Hex.bits()}") 33 | IO.puts(" Characters: #{Hex.chars()}") 34 | IO.puts(" Random ID: #{Hex.string()}\n") 35 | 36 | # -------------------------------------------------------------------------------------------------- 37 | # Base64 Id 38 | # Predefined URL and file system safe characters 39 | # -------------------------------------------------------------------------------------------------- 40 | defmodule(Base64Id, do: use(EntropyString, charset: charset64)) 41 | 42 | IO.puts("Base64Id: Predefined URL and file system safe characters") 43 | 
IO.puts(" Bits: #{Base64Id.bits()}") 44 | IO.puts(" Characters: #{Base64Id.chars()}") 45 | IO.puts(" Session ID: #{Base64Id.session()}\n") 46 | 47 | # -------------------------------------------------------------------------------------------------- 48 | # Uppercase Hex Id 49 | # Uppercase hex characters 50 | # -------------------------------------------------------------------------------------------------- 51 | defmodule(UpperHex, do: use(EntropyString, bits: 64, charset: "0123456789ABCDEF")) 52 | 53 | IO.puts("UpperHex: Upper case hex CharSet") 54 | IO.puts(" Bits: #{UpperHex.bits()}") 55 | IO.puts(" Characters: #{UpperHex.chars()}") 56 | IO.puts(" Random ID: #{UpperHex.string()}\n") 57 | 58 | # -------------------------------------------------------------------------------------------------- 59 | # DingoSky 60 | # Ten million strings with a 1 in a billion chance of repeat 61 | # -------------------------------------------------------------------------------------------------- 62 | defmodule(DingoSky, do: use(EntropyString, charset: "dingosky", total: 1.0e7, risk: 1.0e9)) 63 | 64 | IO.puts("DingoSky: Custom characters for ten million IDs with a 1 in a billion chance of repeat") 65 | IO.puts(" Bits: #{DingoSky.bits()}") 66 | IO.puts(" Characters: #{DingoSky.chars()}") 67 | IO.puts(" DingoSky ID: #{DingoSky.string()}\n") 68 | 69 | # -------------------------------------------------------------------------------------------------- 70 | # Server 71 | # 256 entropy bit token 72 | # -------------------------------------------------------------------------------------------------- 73 | defmodule(Server, do: use(EntropyString, charset: charset64)) 74 | 75 | IO.puts("Server: 256 entropy bit token") 76 | IO.puts(" Characters: #{Server.chars()}") 77 | IO.puts(" Token: #{Server.token()}\n") 78 | -------------------------------------------------------------------------------- /iex.exs: -------------------------------------------------------------------------------- 1 | 
Application.put_env(:elixir, :ansi_enabled, true) 2 | IEx.configure( 3 | colors: [enabled: true], 4 | default_prompt: [ 5 | "\e[G", # ANSI CHA, move cursor to column 1 6 | :magenta, 7 | "ES-%prefix", # IEx prompt variable 8 | ">", # plain string 9 | :reset 10 | ] |> IO.ANSI.format |> IO.chardata_to_string 11 | ) 12 | 13 | import_file "./lib/charset.ex" 14 | import_file "./lib/entropy_string.ex" 15 | IO.puts "\nEntropyString Loaded" 16 | 17 | IO.puts "\n------------------------" 18 | IO.puts " Execute examples.exs" 19 | IO.puts "------------------------\n" 20 | import_file "./examples.exs" 21 | -------------------------------------------------------------------------------- /images/EntropyBits.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EntropyString/Elixir/fb90c6dddd8dd7026c4762d138bea7ce0c81fe8a/images/EntropyBits.png -------------------------------------------------------------------------------- /images/HashCollision.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EntropyString/Elixir/fb90c6dddd8dd7026c4762d138bea7ce0c81fe8a/images/HashCollision.png -------------------------------------------------------------------------------- /images/NBitCollision.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EntropyString/Elixir/fb90c6dddd8dd7026c4762d138bea7ce0c81fe8a/images/NBitCollision.png -------------------------------------------------------------------------------- /lib/charset.ex: -------------------------------------------------------------------------------- 1 | # MIT License 2 | # 3 | # Copyright (c) 2017-2018 Knoxen 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy 6 | # of this software and associated documentation files (the "Software"), to deal 7 | # in the Software without restriction, including 
without limitation the rights 8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | # copies of the Software, and to permit persons to whom the Software is 10 | # furnished to do so, subject to the following conditions: 11 | # 12 | # The above copyright notice and this permission notice shall be included in all 13 | # copies or substantial portions of the Software. 14 | # 15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | # SOFTWARE. 22 | 23 | defmodule EntropyString.CharSet do 24 | @moduledoc """ 25 | EntropyString CharSet functionality. 26 | 27 | To generate random strings, **_EntropyString_** plucks characters out of a specified 28 | **_CharSet_**. To facilitate efficient generation of these strings, **_EntropyString_** uses 29 | **_CharSet_**s with character counts that are powers of two: 2, 4, 8, 16, 32 and 64. Pre-defined 30 | character sets are provided and custom character sets are supported. 
31 | 32 | ## Examples 33 | 34 | Pre-defined CharSet with 32 characters 35 | 36 | iex> EntropyString.CharSet.charset32 37 | "2346789bdfghjmnpqrtBDFGHJLMNPQRT" 38 | 39 | Entropy bits per character for **_charset64_** 40 | 41 | iex> charset = EntropyString.CharSet.charset64 42 | iex> EntropyString.CharSet.bits_per_char(charset) 43 | 6 44 | 45 | Custom bytes needed to produce a string of 48 entropy bits using **_charset32_** 46 | 47 | iex> charset = EntropyString.CharSet.charset32 48 | iex> EntropyString.CharSet.bytes_needed(48, charset) 49 | 7 50 | 51 | Validate custom CharSet 52 | 53 | iex> EntropyString.CharSet.validate(<<"HT">>) 54 | true 55 | 56 | iex> EntropyString.CharSet.validate(<<"012345">>) 57 | {:error, "Invalid char count: must be one of 2,4,8,16,32,64"} 58 | 59 | iex> EntropyString.CharSet.validate(<<"ABCB">>) 60 | {:error, "Chars not unique"} 61 | 62 | """ 63 | 64 | ## =============================================================================================== 65 | ## 66 | ## Module Constants 67 | ## 68 | ## =============================================================================================== 69 | @bitsPerByte 8 70 | 71 | ## =============================================================================================== 72 | ## 73 | ## PreDefined CharSets 74 | ## 75 | ## =============================================================================================== 76 | @charset64 "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_" 77 | @charset32 "2346789bdfghjmnpqrtBDFGHJLMNPQRT" 78 | @charset16 "0123456789abcdef" 79 | @charset8 "01234567" 80 | @charset4 "ATCG" 81 | @charset2 "01" 82 | 83 | ## =============================================================================================== 84 | ## 85 | ## Accessors for PreDefined CharSets 86 | ## 87 | ## =============================================================================================== 88 | @doc """ 89 | [RFC 4648](https://tools.ietf.org/html/rfc4648#section-5) 
file system and URL safe character set 90 | 91 | "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_" 92 | """ 93 | def charset64, do: @charset64 94 | 95 | @doc """ 96 | Strings that don't look like English words and are easy to parse visually 97 | 98 | "2346789bdfghjmnpqrtBDFGHJLMNPQRT" 99 | 100 | - remove all upper and lower case vowels (including y) 101 | - remove all numbers that look like letters 102 | - remove all letters that look like numbers 103 | - remove all letters that have poor distinction between upper and lower case values 104 | """ 105 | def charset32, do: @charset32 106 | 107 | @doc """ 108 | Lowercase hexadecimal 109 | 110 | "0123456789abcdef" 111 | """ 112 | def charset16, do: @charset16 113 | 114 | @doc """ 115 | Octal characters 116 | 117 | "01234567" 118 | """ 119 | def charset8, do: @charset8 120 | 121 | @doc """ 122 | DNA alphabet 123 | 124 | "ATCG" 125 | 126 | No good reason; just wanted to get away from the obvious 127 | """ 128 | def charset4, do: @charset4 129 | 130 | @doc """ 131 | Binary characters 132 | 133 | "01" 134 | """ 135 | def charset2, do: @charset2 136 | 137 | ## =============================================================================================== 138 | ## 139 | ## bits_per_char/1 140 | ## 141 | ## =============================================================================================== 142 | @doc """ 143 | Entropy bits per character for **_charset_** 144 | 145 | - **_charset_** - CharSet in use 146 | 147 | ## Example 148 | 149 | iex> charset = EntropyString.CharSet.charset32 150 | "2346789bdfghjmnpqrtBDFGHJLMNPQRT" 151 | iex> EntropyString.CharSet.bits_per_char(charset) 152 | 5 153 | """ 154 | def bits_per_char(@charset64), do: 6 155 | def bits_per_char(@charset32), do: 5 156 | def bits_per_char(@charset16), do: 4 157 | def bits_per_char(@charset8), do: 3 158 | def bits_per_char(@charset4), do: 2 159 | def bits_per_char(@charset2), do: 1 160 | 161 | def bits_per_char(charset) when 
is_binary(charset) do 162 | round(:math.log2(byte_size(charset))) 163 | end 164 | 165 | ## =============================================================================================== 166 | ## 167 | ## bytes_needed/2 168 | ## 169 | ## =============================================================================================== 170 | @doc """ 171 | Bytes needed to form a string of entropy **_bits_** from characters in **_charset_** 172 | 173 | - **_bits_** - entropy bits for string 174 | - **_charset_** - CharSet in use 175 | 176 | Returns number of bytes needed to form strings with entropy **_bits_** using characters from 177 | **_charset_**; or 178 | 179 | - `{:error, reason}` if `EntropyString.CharSet.validate(charset)` is not `true`. 180 | 181 | ## Example 182 | 183 | iex> charset = EntropyString.CharSet.charset16 184 | "0123456789abcdef" 185 | iex> EntropyString.CharSet.bytes_needed(48, charset) 186 | 6 187 | """ 188 | def bytes_needed(bits, _charset) when bits < 0, do: {:error, "Negative entropy"} 189 | 190 | def bytes_needed(bits, charset) when is_atom(bits) do 191 | bytes_needed(bits(bits), charset) 192 | end 193 | 194 | def bytes_needed(bits, charset) when is_atom(charset) do 195 | bytes_needed(bits, charset_from_atom(charset)) 196 | end 197 | 198 | def bytes_needed(bits, charset) do 199 | bitsPerChar = bits_per_char(charset) 200 | charCount = round(Float.ceil(bits / bitsPerChar)) 201 | round(Float.ceil(charCount * bitsPerChar / @bitsPerByte)) 202 | end 203 | 204 | ## =============================================================================================== 205 | ## 206 | ## validate/1 207 | ## 208 | ## =============================================================================================== 209 | @doc """ 210 | Validate **_charset_** 211 | 212 | - **_charset_** - CharSet to use 213 | 214 | ### Validations 215 | 216 | - **_charset_** must have 2, 4, 8, 16, 32, or 64 characters 217 | - characters must be unique 218 | 219 | ## Examples 
220 | 221 | iex> EntropyString.CharSet.validate(<<"0123">>) 222 | true 223 | 224 | iex> EntropyString.CharSet.validate(<<"01234567890abcdef">>) 225 | {:error, "Invalid char count: must be one of 2,4,8,16,32,64"} 226 | 227 | iex> EntropyString.CharSet.validate(<<"01234566">>) 228 | {:error, "Chars not unique"} 229 | 230 | """ 231 | def validate(charset) when is_binary(charset) do 232 | length = byte_size(charset) 233 | 234 | case :lists.member(length, [64, 32, 16, 8, 4, 2]) do 235 | true -> 236 | unique(charset) 237 | 238 | false -> 239 | {:error, "Invalid char count: must be one of 2,4,8,16,32,64"} 240 | end 241 | end 242 | 243 | ## ----------------------------------------------------------------------------------------------- 244 | ## 245 | ## unique/1 246 | ## 247 | ## ----------------------------------------------------------------------------------------------- 248 | defp unique(charset), do: unique(true, charset) 249 | 250 | ## ----------------------------------------------------------------------------------------------- 251 | ## 252 | ## unique/2 253 | ## 254 | ## ----------------------------------------------------------------------------------------------- 255 | defp unique(result, <<>>), do: result 256 | 257 | defp unique(true, <<head::8, tail::binary>>) do 258 | case :binary.match(tail, [<<head>>]) do 259 | :nomatch -> 260 | unique(true, tail) 261 | 262 | _ -> 263 | {:error, "Chars not unique"} 264 | end 265 | end 266 | 267 | defp unique(error, _), do: error 268 | 269 | ## These 2 functions are repeated in entropy_string.ex. I don't want these functions to be 270 | ## public. CxTBD Is there a way to DRY these functions and not be public? 
271 | 272 | ## ----------------------------------------------------------------------------------------------- 273 | ## bits/1 274 | ## ----------------------------------------------------------------------------------------------- 275 | defp bits(:small), do: 29 276 | defp bits(:medium), do: 69 277 | defp bits(:large), do: 99 278 | defp bits(:session), do: 128 279 | defp bits(:token), do: 256 280 | 281 | ## ----------------------------------------------------------------------------------------------- 282 | ## Convert charset atom to EntropyString.CharSet 283 | ## ----------------------------------------------------------------------------------------------- 284 | defp charset_from_atom(:charset2), do: @charset2 285 | defp charset_from_atom(:charset4), do: @charset4 286 | defp charset_from_atom(:charset8), do: @charset8 287 | defp charset_from_atom(:charset16), do: @charset16 288 | defp charset_from_atom(:charset32), do: @charset32 289 | defp charset_from_atom(:charset64), do: @charset64 290 | end 291 | -------------------------------------------------------------------------------- /lib/entropy_string.ex: -------------------------------------------------------------------------------- 1 | # MIT License 2 | # 3 | # Copyright (c) 2017-2018 Knoxen 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy 6 | # of this software and associated documentation files (the "Software"), to deal 7 | # in the Software without restriction, including without limitation the rights 8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | # copies of the Software, and to permit persons to whom the Software is 10 | # furnished to do so, subject to the following conditions: 11 | # 12 | # The above copyright notice and this permission notice shall be included in all 13 | # copies or substantial portions of the Software. 
14 | # 15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | # SOFTWARE. 22 | 23 | defmodule EntropyString.Error do 24 | @moduledoc """ 25 | Errors raised when defining an EntropyString module with invalid options 26 | """ 27 | defexception message: "EntropyString error" 28 | end 29 | 30 | defmodule EntropyString do 31 | alias EntropyString.CharSet 32 | 33 | @moduledoc """ 34 | Efficiently generate cryptographically strong random strings of specified entropy from various 35 | character sets. 36 | 37 | ## Example 38 | 39 | Ten thousand potential hexadecimal strings with a 1 in 10 million chance of repeat 40 | 41 | bits = EntropyString.bits(10000, 10000000) 42 | EntropyString.random(bits, :charset16) 43 | "9e9b34d6f69ea" 44 | 45 | """ 46 | 47 | @doc false 48 | defmacro __using__(opts) do 49 | quote do 50 | import EntropyString 51 | import CharSet 52 | 53 | bitLen = unquote(opts)[:bits] 54 | total = unquote(opts)[:total] 55 | risk = unquote(opts)[:risk] 56 | 57 | bits = 58 | cond do 59 | is_number(bitLen) -> 60 | bitLen 61 | 62 | is_number(total) and is_number(risk) -> 63 | EntropyString.bits(total, risk) 64 | 65 | true -> 66 | 128 67 | end 68 | 69 | @entropy_string_bits bits 70 | 71 | charset = 72 | case unquote(opts)[:charset] do 73 | nil -> 74 | CharSet.charset32() 75 | 76 | :charset64 -> 77 | CharSet.charset64() 78 | 79 | :charset32 -> 80 | CharSet.charset32() 81 | 82 | :charset16 -> 83 | CharSet.charset16() 84 | 85 | :charset8 -> 86 | CharSet.charset8() 87 | 88 | :charset4 -> 89 | CharSet.charset4() 90 | 91 | :charset2 -> 92 | 
CharSet.charset2() 93 | 94 | charset when is_binary(charset) -> 95 | case validate(charset) do 96 | true -> charset 97 | {_, reason} -> raise EntropyString.Error, message: reason 98 | end 99 | 100 | charset -> 101 | raise EntropyString.Error, message: "Invalid predefined charset: #{charset}" 102 | end 103 | 104 | @entropy_string_charset charset 105 | 106 | @before_compile EntropyString 107 | end 108 | end 109 | 110 | @doc false 111 | defmacro __before_compile__(_env) do 112 | quote do 113 | @doc """ 114 | Default entropy bits for random strings 115 | """ 116 | def bits, do: @entropy_string_bits 117 | 118 | @doc """ 119 | Module **_EntropyString.CharSet_** 120 | """ 121 | def charset, do: @entropy_string_charset 122 | 123 | @doc """ 124 | Random string using module **_charset_** with a 1 in a million chance of repeat in 125 | 30 strings. 126 | 127 | ## Example 128 | MyModule.small() 129 | "nGrqnt" 130 | """ 131 | def small, do: small(@entropy_string_charset) 132 | 133 | @doc """ 134 | Random string using module **_charset_** with a 1 in a billion chance of repeat for a million 135 | potential strings. 136 | 137 | ## Example 138 | MyModule.medium() 139 | "nndQjL7FLR9pDd" 140 | """ 141 | def medium, do: medium(@entropy_string_charset) 142 | 143 | @doc """ 144 | Random string using module **_charset_** with a 1 in a trillion chance of repeat for a billion 145 | potential strings. 146 | 147 | ## Example 148 | MyModule.large() 149 | "NqJLbG8htr4t64TQmRDB" 150 | """ 151 | def large, do: large(@entropy_string_charset) 152 | 153 | @doc """ 154 | Random string using module **_charset_** suitable for 128-bit OWASP Session ID 155 | 156 | ## Example 157 | MyModule.session() 158 | "6pLfLgfL8MgTn7tQDN8tqPFR4b" 159 | """ 160 | def session, do: session(@entropy_string_charset) 161 | 162 | @doc """ 163 | Random string using module **_charset_** with 256 bits of entropy. 
164 | 165 | ## Example 166 | MyModule.token() 167 | "zHZ278Pv_GaOsmRYdBIR5uO8Tt0OWSESZbVuQye6grt" 168 | """ 169 | def token, do: token(@entropy_string_charset) 170 | 171 | @doc """ 172 | Random string of entropy **_bits_** using module **_charset_** 173 | 174 | - **_bits_** - entropy bits for string 175 | - non-negative integer 176 | - predefined atom 177 | - Defaults to module **_bits_** 178 | 179 | Returns string of at least entropy **_bits_** using module characters; or 180 | 181 | - `{:error, "Negative entropy"}` if **_bits_** is negative. 182 | - `{:error, reason}` if `EntropyString.CharSet.validate(charset)` is not `true`. 183 | 184 | Since the generated random strings carry an entropy that is a multiple of the bits per module 185 | characters, the returned entropy is the minimum that equals or exceeds the specified 186 | **_bits_**. 187 | 188 | ## Example 189 | 190 | A million potential strings (assuming :charset32 characters) with a 1 in a billion chance 191 | of a repeat 192 | 193 | bits = EntropyString.bits(1.0e6, 1.0e9) 194 | 195 | MyModule.random(bits) 196 | "NbMbLrj9fBbQP6" 197 | 198 | MyModule.random(:session) 199 | "CeElDdo7HnNDuiWwlFPPq0" 200 | 201 | """ 202 | def random(bits \\ @entropy_string_bits), do: random(bits, @entropy_string_charset) 203 | 204 | @doc """ 205 | Random string of module entropy **_bits_** and **_charset_** 206 | 207 | ## Example 208 | 209 | Define a module for 10 billion strings with a 1 in a decillion chance of a repeat 210 | 211 | defmodule Rare, do: use EntropyString, total: 1.0e10, risk: 1.0e33 212 | 213 | Rare.string() 214 | "H2Mp8MPT7F3Pp2bmHm" 215 | 216 | Define a module for strings with 122 bits of entropy 217 | 218 | defmodule MyId, do: use EntropyString, bits: 122, charset: charset64 219 | 220 | MyId.string() 221 | "aj2_kMH64P2QDRBlOkz7Z" 222 | 223 | """ 224 | @since "1.3" 225 | def string(), do: random(@entropy_string_bits, @entropy_string_charset) 226 | 227 | @doc """ 228 | Module characters 229 | """ 230 | @since 
"1.3" 231 | def chars(), do: @entropy_string_charset 232 | end 233 | end 234 | 235 | ## ----------------------------------------------------------------------------------------------- 236 | ## bits/2 237 | ## ----------------------------------------------------------------------------------------------- 238 | @doc """ 239 | Bits of entropy required for **_total_** number of strings with a given **_risk_** 240 | 241 | - **_total_** - potential number of strings 242 | - **_risk_** - risk of repeat in **_total_** strings 243 | 244 | ## Example 245 | 246 | Bits of entropy for **_30_** strings with a **_1 in a million_** chance of repeat 247 | 248 | iex> import EntropyString, only: [bits: 2] 249 | iex> bits = bits(30, 1000000) 250 | iex> round(bits) 251 | 29 252 | """ 253 | def bits(0, _), do: 0 254 | def bits(_, 0), do: 0 255 | def bits(total, _) when total < 0, do: NaN 256 | def bits(_, risk) when risk < 0, do: NaN 257 | 258 | def bits(total, risk) when is_number(total) and is_number(risk) do 259 | n = 260 | cond do 261 | total < 1000 -> 262 | :math.log2(total) + :math.log2(total - 1) 263 | 264 | true -> 265 | 2 * :math.log2(total) 266 | end 267 | 268 | n + :math.log2(risk) - 1 269 | end 270 | 271 | def bits(_, _), do: NaN 272 | 273 | ## ----------------------------------------------------------------------------------------------- 274 | ## small/1 275 | ## ----------------------------------------------------------------------------------------------- 276 | @doc """ 277 | Random string using **_charset_** characters with a 1 in a million chance of repeat in 30 strings. 278 | 279 | Default **_CharSet_** is `charset32`. 
280 | 281 | ## Example 282 | EntropyString.small() 283 | "nGrqnt" 284 | 285 | EntropyString.small(:charset16) 286 | "7bc250e5" 287 | 288 | """ 289 | def small(charset \\ :charset32) 290 | 291 | def small(charset) when is_atom(charset) do 292 | random(bits_from_atom(:small), charset_from_atom(charset)) 293 | end 294 | 295 | def small(charset), do: random(bits_from_atom(:small), charset) 296 | 297 | ## ----------------------------------------------------------------------------------------------- 298 | ## medium/1 299 | ## ----------------------------------------------------------------------------------------------- 300 | @doc """ 301 | Random string using **_charset_** characters with a 1 in a billion chance of repeat for a million 302 | potential strings. 303 | 304 | Default **_CharSet_** is `charset32`. 305 | 306 | ## Example 307 | EntropyString.medium() 308 | "nndQjL7FLR9pDd" 309 | 310 | EntropyString.medium(:charset16) 311 | "b95d23b299eeb9bbe6" 312 | 313 | """ 314 | def medium(charset \\ :charset32) 315 | 316 | def medium(charset) when is_atom(charset) do 317 | random(bits_from_atom(:medium), charset_from_atom(charset)) 318 | end 319 | 320 | def medium(charset), do: random(bits_from_atom(:medium), charset) 321 | 322 | ## ----------------------------------------------------------------------------------------------- 323 | ## large/1 324 | ## ----------------------------------------------------------------------------------------------- 325 | @doc """ 326 | Random string using **_charset_** characters with a 1 in a trillion chance of repeat for a billion 327 | potential strings. 328 | 329 | Default **_CharSet_** is `charset32`. 
330 | 331 | ## Example 332 | 333 | EntropyString.large() 334 | "NqJLbG8htr4t64TQmRDB" 335 | 336 | EntropyString.large(:charset16) 337 | "f6c4d04cef266a5c3a7950f90" 338 | """ 339 | def large(charset \\ :charset32) 340 | 341 | def large(charset) when is_atom(charset) do 342 | random(bits_from_atom(:large), charset_from_atom(charset)) 343 | end 344 | 345 | def large(charset), do: random(bits_from_atom(:large), charset) 346 | 347 | ## ----------------------------------------------------------------------------------------------- 348 | ## session/1 349 | ## ----------------------------------------------------------------------------------------------- 350 | @doc """ 351 | Random string using **_charset_** characters suitable for 128-bit OWASP Session ID 352 | 353 | Default **_CharSet_** is `charset32`. 354 | 355 | ## Example 356 | 357 | EntropyString.session() 358 | "6pLfLgfL8MgTn7tQDN8tqPFR4b" 359 | 360 | EntropyString.session(:charset64) 361 | "VzhprMROlM6Iy2Pk1IRCqR" 362 | """ 363 | def session(charset \\ :charset32) 364 | 365 | def session(charset) when is_atom(charset) do 366 | random(bits_from_atom(:session), charset_from_atom(charset)) 367 | end 368 | 369 | def session(charset), do: random(bits_from_atom(:session), charset) 370 | 371 | ## ----------------------------------------------------------------------------------------------- 372 | ## token/1 373 | ## ----------------------------------------------------------------------------------------------- 374 | @doc """ 375 | Random string using **_charset_** characters with 256 bits of entropy. 376 | 377 | Default **_CharSet_** is the base 64 URL and file system safe character set. 
378 | 379 | ## Example 380 | 381 | EntropyString.token() 382 | "zHZ278Pv_GaOsmRYdBIR5uO8Tt0OWSESZbVuQye6grt" 383 | 384 | EntropyString.token(:charset32) 385 | "7fRgrB4JtqQB8gphhf8T7bppttJQqJ3PTPFjMjGQbhgJNR9FNNHD" 386 | """ 387 | def token(charset \\ CharSet.charset64()) 388 | 389 | def token(charset) when is_atom(charset) do 390 | random(bits_from_atom(:token), charset_from_atom(charset)) 391 | end 392 | 393 | def token(charset), do: random(bits_from_atom(:token), charset) 394 | 395 | ## ----------------------------------------------------------------------------------------------- 396 | ## random/2 397 | ## ----------------------------------------------------------------------------------------------- 398 | @doc """ 399 | Random string of entropy **_bits_** using **_charset_** characters 400 | 401 | - **_bits_** - entropy bits for string 402 | - non-negative integer 403 | - predefined atom 404 | - **_charset_** - CharSet to use 405 | - `EntropyString.CharSet` 406 | - predefined atom 407 | - Valid `String` representing the characters for the `EntropyString.CharSet` 408 | 409 | Returns string of at least entropy **_bits_** using characters from **_charset_**; or 410 | 411 | - `{:error, "Negative entropy"}` if **_bits_** is negative. 412 | - `{:error, reason}` if `EntropyString.CharSet.validate(charset)` is not `true`. 413 | 414 | Since the generated random strings carry an entropy that is a multiple of the bits per character 415 | for **_charset_**, the returned entropy is the minimum that equals or exceeds the specified 416 | **_bits_**. 
417 | 418 | ## Examples 419 | 420 | A million potential base32 strings with a 1 in a billion chance of a repeat 421 | 422 | bits = EntropyString.bits(1.0e6, 1.0e9) 423 | EntropyString.random(bits) 424 | "NbMbLrj9fBbQP6" 425 | 426 | A million potential hex strings with a 1 in a billion chance of a repeat 427 | 428 | EntropyString.random(bits, :charset16) 429 | "0746ae8fbaa2fb4d36" 430 | 431 | A random session ID using URL and File System safe characters 432 | 433 | EntropyString.random(:session, :charset64) 434 | "txSdE3qBK2etQtLyCFNHGD" 435 | 436 | """ 437 | def random(bits \\ 128, charset \\ :charset32) 438 | 439 | ## ----------------------------------------------------------------------------------------------- 440 | ## Invalid bits 441 | ## ----------------------------------------------------------------------------------------------- 442 | def random(bits, _charset) when bits < 0, do: {:error, "Negative entropy"} 443 | 444 | def random(bits, charset) when is_atom(bits), do: random(bits_from_atom(bits), charset) 445 | 446 | def random(bits, charset) when is_atom(charset), do: random(bits, charset_from_atom(charset)) 447 | 448 | def random(bits, charset) do 449 | with_charset(charset, fn -> 450 | byteCount = CharSet.bytes_needed(bits, charset) 451 | bytes = :crypto.strong_rand_bytes(byteCount) 452 | _random_string_bytes(bits, charset, bytes) 453 | end) 454 | end 455 | 456 | ## ----------------------------------------------------------------------------------------------- 457 | ## random/3 458 | ## ----------------------------------------------------------------------------------------------- 459 | @doc """ 460 | Random string of entropy **_bits_** using **_charset_** characters and specified **_bytes_** 461 | 462 | - **_bits_** - entropy bits 463 | - non-negative integer 464 | - predefined atom 465 | - **_charset_** - CharSet to use 466 | - `EntropyString.CharSet` 467 | - predefined atom 468 | - Valid `String` representing the characters for the 
`EntropyString.CharSet` 469 | - **_bytes_** - Bytes to use 470 | 471 | Returns random string of at least entropy **_bits_**; or 472 | 473 | - `{:error, "Negative entropy"}` if **_bits_** is negative. 474 | - `{:error, reason}` if `EntropyString.CharSet.validate(charset)` is not `true`. 475 | - `{:error, reason}` if `validate_byte_count(bits, charset, bytes)` is not `true`. 476 | 477 | Since the generated random strings carry an entropy that is a multiple of the bits per character 478 | for **_charset_**, the returned entropy is the minimum that equals or exceeds the specified 479 | **_bits_**. 480 | 481 | ## Example 482 | 483 | 30 potential random hex strings with a 1 in a million chance of a repeat 484 | 485 | iex> bits = EntropyString.bits(30, 1000000) 486 | iex> bytes = <<14, 201, 32, 143>> 487 | iex> EntropyString.random(bits, :charset16, bytes) 488 | "0ec9208f" 489 | 490 | Use `EntropyString.CharSet.bytes_needed(bits, charset)` to determine how many **_bytes_** are 491 | actually needed. 
492 | """ 493 | def random(bits, charset, bytes) when is_atom(bits) do 494 | random(bits_from_atom(bits), charset, bytes) 495 | end 496 | 497 | def random(bits, charset, bytes) when is_atom(charset) do 498 | random(bits, charset_from_atom(charset), bytes) 499 | end 500 | 501 | def random(bits, charset, bytes) do 502 | with_charset(charset, fn -> 503 | case validate_byte_count(bits, charset, bytes) do 504 | true -> _random_string_bytes(bits, charset, bytes) 505 | error -> error 506 | end 507 | end) 508 | end 509 | 510 | defp _random_string_bytes(bits, charset, bytes) do 511 | bitsPerChar = CharSet.bits_per_char(charset) 512 | ndxFn = ndx_fn(charset) 513 | charCount = trunc(Float.ceil(bits / bitsPerChar)) 514 | _random_string_count(charCount, ndxFn, charset, bytes, <<>>) 515 | end 516 | 517 | defp _random_string_count(0, _, _, _, chars), do: chars 518 | 519 | defp _random_string_count(charCount, ndxFn, charset, bytes, chars) do 520 | slice = charCount - 1 521 | ndx = ndxFn.(slice, bytes) 522 | char = :binary.part(charset, ndx, 1) 523 | _random_string_count(slice, ndxFn, charset, bytes, <<char::binary, chars::binary>>) 524 | end 525 | 526 | ## ----------------------------------------------------------------------------------------------- 527 | ## validate_byte_count/3 528 | ## ----------------------------------------------------------------------------------------------- 529 | @doc """ 530 | Validate number of **_bytes_** is sufficient to generate random strings with entropy **_bits_** 531 | using **_charset_** 532 | 533 | - **_bits_** - entropy bits for random string 534 | - **_charset_** - characters in use 535 | - **_bytes_** - bytes to validate 536 | 537 | ### Validations 538 | 539 | - **_bytes_** count must be sufficient to generate entropy **_bits_** string from **_charset_** 540 | 541 | Use `EntropyString.CharSet.bytes_needed(bits, charset)` to determine how many **_bytes_** are 542 | needed 543 | """ 544 | def validate_byte_count(bits, charset, bytes) when is_binary(bytes) do 545 | need =
CharSet.bytes_needed(bits, charset) 546 | got = byte_size(bytes) 547 | 548 | case need <= got do 549 | true -> 550 | true 551 | 552 | _ -> 553 | reason = :io_lib.format("Insufficient bytes: need ~p and got ~p", [need, got]) 554 | {:error, :binary.list_to_bin(reason)} 555 | end 556 | end 557 | 558 | ## ----------------------------------------------------------------------------------------------- 559 | ## ndx_fn/1 560 | ## Return function to pull charset bits_per_char bits at position slice of bytes 561 | ## ----------------------------------------------------------------------------------------------- 562 | defp ndx_fn(charset) do 563 | bitsPerChar = CharSet.bits_per_char(charset) 564 | 565 | fn slice, bytes -> 566 | offset = slice * bitsPerChar 567 | <<_skip::size(offset), ndx::size(bitsPerChar), _rest::bits>> = bytes 568 | ndx 569 | end 570 | end 571 | 572 | ## ----------------------------------------------------------------------------------------------- 573 | ## with_charset/2 574 | ## For pre-defined CharSet, skip charset validation 575 | ## ----------------------------------------------------------------------------------------------- 576 | defp with_charset(charset, doFn) do 577 | # Pre-defined charset does not require validation 578 | case is_predefined_charset(charset) do 579 | true -> 580 | doFn.() 581 | 582 | _ -> 583 | case CharSet.validate(charset) do 584 | true -> doFn.() 585 | error -> error 586 | end 587 | end 588 | end 589 | 590 | defp is_predefined_charset(:charset2), do: true 591 | defp is_predefined_charset(:charset4), do: true 592 | defp is_predefined_charset(:charset8), do: true 593 | defp is_predefined_charset(:charset16), do: true 594 | defp is_predefined_charset(:charset32), do: true 595 | defp is_predefined_charset(:charset64), do: true 596 | 597 | defp is_predefined_charset(charset) do 598 | charset == CharSet.charset64() or charset == CharSet.charset32() or 599 | charset == CharSet.charset16() or charset == CharSet.charset8() or
charset == CharSet.charset4() or charset == CharSet.charset2() 601 | end 602 | 603 | ## ----------------------------------------------------------------------------------------------- 604 | ## Convert bits atom to bits integer 605 | ## ----------------------------------------------------------------------------------------------- 606 | defp bits_from_atom(:small), do: 29 607 | defp bits_from_atom(:medium), do: 69 608 | defp bits_from_atom(:large), do: 99 609 | defp bits_from_atom(:session), do: 128 610 | defp bits_from_atom(:token), do: 256 611 | 612 | ## ----------------------------------------------------------------------------------------------- 613 | ## Convert charset atom to EntropyString.CharSet 614 | ## ----------------------------------------------------------------------------------------------- 615 | defp charset_from_atom(:charset2), do: CharSet.charset2() 616 | defp charset_from_atom(:charset4), do: CharSet.charset4() 617 | defp charset_from_atom(:charset8), do: CharSet.charset8() 618 | defp charset_from_atom(:charset16), do: CharSet.charset16() 619 | defp charset_from_atom(:charset32), do: CharSet.charset32() 620 | defp charset_from_atom(:charset64), do: CharSet.charset64() 621 | end 622 | -------------------------------------------------------------------------------- /mix.exs: -------------------------------------------------------------------------------- 1 | defmodule EntropyString.Mixfile do 2 | use Mix.Project 3 | 4 | def project do 5 | [ 6 | app: :entropy_string, 7 | version: "1.3.4", 8 | elixir: "~> 1.8", 9 | deps: deps(), 10 | description: description(), 11 | package: package() 12 | ] 13 | end 14 | 15 | defp deps, 16 | do: [ 17 | {:earmark, "~> 1.2", only: :dev}, 18 | {:ex_doc, "~> 0.19", only: :dev} 19 | ] 20 | 21 | defp description do 22 | """ 23 | Efficiently generate cryptographically strong random strings of specified entropy from various character sets. `EntropyString` is superseded by `Puid` (https://hex.pm/packages/puid). 
24 | """ 25 | end 26 | 27 | defp package do 28 | [ 29 | maintainers: ["Paul Rogers"], 30 | licenses: ["MIT"], 31 | links: %{"GitHub" => "https://github.com/EntropyString/Elixir"} 32 | ] 33 | end 34 | end 35 | -------------------------------------------------------------------------------- /mix.lock: -------------------------------------------------------------------------------- 1 | %{ 2 | "earmark": {:hex, :earmark, "1.3.1", "73812f447f7a42358d3ba79283cfa3075a7580a3a2ed457616d6517ac3738cb9", [:mix], [], "hexpm"}, 3 | "ex_doc": {:hex, :ex_doc, "0.19.3", "3c7b0f02851f5fc13b040e8e925051452e41248f685e40250d7e40b07b9f8c10", [:mix], [{:earmark, "~> 1.2", [hex: :earmark, repo: "hexpm", optional: false]}, {:makeup_elixir, "~> 0.10", [hex: :makeup_elixir, repo: "hexpm", optional: false]}], "hexpm"}, 4 | "makeup": {:hex, :makeup, "0.8.0", "9cf32aea71c7fe0a4b2e9246c2c4978f9070257e5c9ce6d4a28ec450a839b55f", [:mix], [{:nimble_parsec, "~> 0.5.0", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm"}, 5 | "makeup_elixir": {:hex, :makeup_elixir, "0.13.0", "be7a477997dcac2e48a9d695ec730b2d22418292675c75aa2d34ba0909dcdeda", [:mix], [{:makeup, "~> 0.8", [hex: :makeup, repo: "hexpm", optional: false]}], "hexpm"}, 6 | "nimble_parsec": {:hex, :nimble_parsec, "0.5.0", "90e2eca3d0266e5c53f8fbe0079694740b9c91b6747f2b7e3c5d21966bba8300", [:mix], [], "hexpm"}, 7 | } 8 | -------------------------------------------------------------------------------- /test/bits_test.exs: -------------------------------------------------------------------------------- 1 | defmodule EntropypString.Bits.Test do 2 | use ExUnit.Case, async: true 3 | 4 | import EntropyString, only: [bits: 2] 5 | 6 | test "bits for zero entropy" do 7 | assert bits(0, 0) == 0 8 | assert bits(100, 0) == 0 9 | assert bits(0, 100) == 0 10 | assert bits(0, -1) == 0 11 | assert bits(-1, 0) == 0 12 | end 13 | 14 | test "bits for integer total and risk" do 15 | assert round(bits(10, 1000)) == 15 16 | assert round(bits(10, 
10000)) == 19 17 | assert round(bits(10, 100_000)) == 22 18 | 19 | assert round(bits(100, 1000)) == 22 20 | assert round(bits(100, 10000)) == 26 21 | assert round(bits(100, 100_000)) == 29 22 | 23 | assert round(bits(1000, 1000)) == 29 24 | assert round(bits(1000, 10000)) == 32 25 | assert round(bits(1000, 100_000)) == 36 26 | 27 | assert round(bits(10000, 1000)) == 36 28 | assert round(bits(10000, 10000)) == 39 29 | assert round(bits(10000, 100_000)) == 42 30 | 31 | assert round(bits(100_000, 1000)) == 42 32 | assert round(bits(100_000, 10000)) == 46 33 | assert round(bits(100_000, 100_000)) == 49 34 | end 35 | 36 | test "bits powers" do 37 | assert round(bits(1.0e5, 1.0e3)) == 42 38 | assert round(bits(1.0e5, 1.0e4)) == 46 39 | assert round(bits(1.0e5, 1.0e5)) == 49 40 | end 41 | 42 | test "preshing 32-bit" do 43 | assert round(bits(30084, 1.0e01)) == 32 44 | assert round(bits(9292, 1.0e02)) == 32 45 | assert round(bits(2932, 1.0e03)) == 32 46 | assert round(bits(927, 1.0e04)) == 32 47 | assert round(bits(294, 1.0e05)) == 32 48 | assert round(bits(93, 1.0e06)) == 32 49 | assert round(bits(30, 1.0e07)) == 32 50 | assert round(bits(10, 1.0e08)) == 32 51 | end 52 | 53 | test "preshing 64-bit" do 54 | assert round(bits(1.97e09, 1.0e01)) == 64 55 | assert round(bits(6.09e08, 1.0e02)) == 64 56 | assert round(bits(1.92e08, 1.0e03)) == 64 57 | assert round(bits(6.07e07, 1.0e04)) == 64 58 | assert round(bits(1.92e07, 1.0e05)) == 64 59 | assert round(bits(6.07e06, 1.0e06)) == 64 60 | assert round(bits(1.92e06, 1.0e07)) == 64 61 | assert round(bits(607_401, 1.0e08)) == 64 62 | assert round(bits(192_077, 1.0e09)) == 64 63 | assert round(bits(60704, 1.0e10)) == 64 64 | assert round(bits(19208, 1.0e11)) == 64 65 | assert round(bits(6074, 1.0e12)) == 64 66 | assert round(bits(1921, 1.0e13)) == 64 67 | assert round(bits(608, 1.0e14)) == 64 68 | assert round(bits(193, 1.0e15)) == 64 69 | assert round(bits(61, 1.0e16)) == 64 70 | assert round(bits(20, 1.0e17)) == 64 71 | assert 
round(bits(7, 1.0e18)) == 64 72 | end 73 | 74 | test "preshing 160-bit" do 75 | assert round(bits(1.42e24, 2)) == 160 76 | assert round(bits(5.55e23, 10)) == 160 77 | assert round(bits(1.71e23, 100)) == 160 78 | assert round(bits(5.41e22, 1000)) == 160 79 | assert round(bits(1.71e22, 1.0e04)) == 160 80 | assert round(bits(5.41e21, 1.0e05)) == 160 81 | assert round(bits(1.71e21, 1.0e06)) == 160 82 | assert round(bits(5.41e20, 1.0e07)) == 160 83 | assert round(bits(1.71e20, 1.0e08)) == 160 84 | assert round(bits(5.41e19, 1.0e09)) == 160 85 | assert round(bits(1.71e19, 1.0e10)) == 160 86 | assert round(bits(5.41e18, 1.0e11)) == 160 87 | assert round(bits(1.71e18, 1.0e12)) == 160 88 | assert round(bits(5.41e17, 1.0e13)) == 160 89 | assert round(bits(1.71e17, 1.0e14)) == 160 90 | assert round(bits(5.41e16, 1.0e15)) == 160 91 | assert round(bits(1.71e16, 1.0e16)) == 160 92 | assert round(bits(5.41e15, 1.0e17)) == 160 93 | assert round(bits(1.71e15, 1.0e18)) == 160 94 | end 95 | 96 | test "NaN entropy bits" do 97 | assert bits(-1, 100) == NaN 98 | assert bits(100, -1) == NaN 99 | assert bits(-1, -1) == NaN 100 | end 101 | 102 | test "module entropy total/risk" do 103 | defmodule(TotalRiskId, do: use(EntropyString, total: 10_000_000, risk: 10.0e12)) 104 | assert round(TotalRiskId.bits()) == 89 105 | end 106 | 107 | test "module entropy bits" do 108 | defmodule(BitsId, do: use(EntropyString, bits: 96)) 109 | assert BitsId.bits() == 96 110 | end 111 | end 112 | -------------------------------------------------------------------------------- /test/charset_test.exs: -------------------------------------------------------------------------------- 1 | defmodule EntropyString.CharSet.Test do 2 | use ExUnit.Case, async: true 3 | 4 | doctest EntropyString.CharSet 5 | 6 | alias EntropyString.CharSet, as: CharSet 7 | 8 | @charsets [ 9 | CharSet.charset64(), 10 | CharSet.charset32(), 11 | CharSet.charset16(), 12 | CharSet.charset8(), 13 | CharSet.charset4(), 14 | CharSet.charset2() 15 
| ] 16 | @bits_per_byte 8 17 | 18 | test "module charset" do 19 | defmodule(HexChars, do: use(EntropyString, charset: CharSet.charset16())) 20 | assert CharSet.charset16() === HexChars.charset() 21 | end 22 | 23 | test "bits per char" do 24 | testCharSet = fn charset -> 25 | actual = CharSet.bits_per_char(charset) 26 | expected = round(:math.log2(byte_size(charset))) 27 | assert actual == expected 28 | end 29 | 30 | Enum.each(@charsets, testCharSet) 31 | end 32 | 33 | test "bytes needed" do 34 | Enum.each(:lists.seq(0, 17), fn bits -> 35 | Enum.each(@charsets, fn charset -> 36 | bytesNeeded = CharSet.bytes_needed(bits, charset) 37 | atLeast = Float.ceil(bits / @bits_per_byte) 38 | assert atLeast <= bytesNeeded 39 | assert bytesNeeded <= atLeast + 1 40 | end) 41 | end) 42 | end 43 | 44 | test "charset length" do 45 | assert byte_size(CharSet.charset64()) == 64 46 | assert byte_size(CharSet.charset32()) == 32 47 | assert byte_size(CharSet.charset16()) == 16 48 | assert byte_size(CharSet.charset8()) == 8 49 | assert byte_size(CharSet.charset4()) == 4 50 | assert byte_size(CharSet.charset2()) == 2 51 | end 52 | 53 | test "invalid charset" do 54 | Enum.each( 55 | :lists.filter( 56 | fn len -> round(:math.log2(len)) != :math.log2(len) end, 57 | :lists.seq(1, 65) 58 | ), 59 | fn n -> 60 | {:error, _} = EntropyString.random(16, :binary.list_to_bin(:lists.seq(1, n))) 61 | end 62 | ) 63 | end 64 | end 65 | -------------------------------------------------------------------------------- /test/entropy_string_test.exs: -------------------------------------------------------------------------------- 1 | defmodule EntropyString.Test do 2 | use ExUnit.Case, async: true 3 | 4 | doctest EntropyString 5 | 6 | alias EntropyString.CharSet, as: CharSet 7 | 8 | import EntropyString 9 | 10 | test "Module total and risk entropy bits" do 11 | defmodule(TotalRisk, do: use(EntropyString, total: 1.0e9, risk: 1.0e12)) 12 | assert round(TotalRisk.bits()) == 99 13 | assert 
String.length(TotalRisk.string()) == 20 14 | assert String.length(TotalRisk.random()) == 20 15 | end 16 | 17 | test "Module entropy bits" do 18 | defmodule(Bits, do: use(EntropyString, bits: 122)) 19 | assert round(Bits.bits()) == 122 20 | assert String.length(Bits.string()) == String.length(Bits.random()) 21 | end 22 | 23 | test "CharSet.charset64()", do: with_64(CharSet.charset64()) 24 | test ":charset64", do: with_64(:charset64) 25 | 26 | defp with_64(charset) do 27 | assert random(6, charset, <<0xDD>>) == "3" 28 | assert random(12, charset, <<0x78, 0xFC>>) == "eP" 29 | assert random(18, charset, <<0xC5, 0x6F, 0x21>>) == "xW8" 30 | assert random(24, charset, <<0xC9, 0x68, 0xC7>>) == "yWjH" 31 | assert random(30, charset, <<0xA5, 0x62, 0x20, 0x87>>) == "pWIgh" 32 | assert random(36, charset, <<0x39, 0x51, 0xCA, 0xCC, 0x8B>>) == "OVHKzI" 33 | assert random(42, charset, <<0x83, 0x89, 0x00, 0xC7, 0xF4, 0x02>>) == "g4kAx_Q" 34 | assert random(48, charset, <<0x51, 0xBC, 0xA8, 0xC7, 0xC9, 0x17>>) == "Ubyox8kX" 35 | assert random(54, charset, <<0xD2, 0xE3, 0xE9, 0xDA, 0x19, 0x97, 0x52>>) == "0uPp2hmXU" 36 | assert random(60, charset, <<0xD9, 0x39, 0xC1, 0xAF, 0x1E, 0x2E, 0x69, 0x48>>) == "2TnBrx4uaU" 37 | 38 | assert random(66, charset, <<0x78, 0x3F, 0xFD, 0x93, 0xD1, 0x06, 0x90, 0x4B, 0xD6>>) == 39 | "eD_9k9EGkEv" 40 | 41 | assert random(72, charset, <<0x9D, 0x99, 0x4E, 0xA5, 0xD2, 0x3F, 0x8C, 0x86, 0x80>>) == 42 | "nZlOpdI_jIaA" 43 | end 44 | 45 | test "CharSet.charset32()", do: with_32(CharSet.charset32()) 46 | test ":charset32", do: with_32(:charset32) 47 | 48 | defp with_32(charset) do 49 | assert random(5, charset, <<0xDD>>) == "N" 50 | assert random(10, charset, <<0x78, 0xFC>>) == "p6" 51 | assert random(15, charset, <<0x78, 0xFC>>) == "p6R" 52 | assert random(20, charset, <<0xC5, 0x6F, 0x21>>) == "JFHt" 53 | assert random(25, charset, <<0xA5, 0x62, 0x20, 0x87>>) == "DFr43" 54 | assert random(30, charset, <<0xA5, 0x62, 0x20, 0x87>>) == "DFr433" 55 | assert 
random(35, charset, <<0x39, 0x51, 0xCA, 0xCC, 0x8B>>) == "b8dPFB7" 56 | assert random(40, charset, <<0x39, 0x51, 0xCA, 0xCC, 0x8B>>) == "b8dPFB7h" 57 | assert random(45, charset, <<0x83, 0x89, 0x00, 0xC7, 0xF4, 0x02>>) == "qn7q3rTD2" 58 | 59 | assert random(50, charset, <<0xD2, 0xE3, 0xE9, 0xDA, 0x19, 0x97, 0x52>>) == "MhrRBGqLtQ" 60 | 61 | assert random(55, charset, <<0xD2, 0xE3, 0xE9, 0xDA, 0x19, 0x97, 0x52>>) == "MhrRBGqLtQf" 62 | end 63 | 64 | test "CharSet.charset16()", do: with_16(CharSet.charset16()) 65 | test ":charset16", do: with_16(:charset16) 66 | 67 | defp with_16(charset) do 68 | assert random(4, charset, <<0x9D>>) == "9" 69 | assert random(8, charset, <<0xAE>>) == "ae" 70 | assert random(12, charset, <<0x01, 0xF2>>) == "01f" 71 | assert random(16, charset, <<0xC7, 0xC9>>) == "c7c9" 72 | assert random(20, charset, <<0xC7, 0xC9, 0x00>>) == "c7c90" 73 | end 74 | 75 | test "CharSet.charset8()", do: with_8(CharSet.charset8()) 76 | test ":charset8", do: with_8(:charset8) 77 | 78 | defp with_8(charset) do 79 | assert random(3, charset, <<0x5A>>) == "2" 80 | assert random(6, charset, <<0x5A>>) == "26" 81 | assert random(9, charset, <<0x21, 0xA4>>) == "103" 82 | assert random(12, charset, <<0x21, 0xA4>>) == "1032" 83 | assert random(15, charset, <<0xDA, 0x19>>) == "66414" 84 | assert random(18, charset, <<0xFD, 0x93, 0xD1>>) == "773117" 85 | assert random(21, charset, <<0xFD, 0x93, 0xD1>>) == "7731172" 86 | assert random(24, charset, <<0xFD, 0x93, 0xD1>>) == "77311721" 87 | assert random(27, charset, <<0xC7, 0xC9, 0x07, 0xC9>>) == "617444076" 88 | assert random(30, charset, <<0xC7, 0xC9, 0x07, 0xC9>>) == "6174440762" 89 | end 90 | 91 | test "CharSet.charset4()", do: with_4(CharSet.charset4()) 92 | test ":charset4", do: with_4(:charset4) 93 | 94 | defp with_4(charset) do 95 | assert random(2, charset, <<0x5A>>) == "T" 96 | assert random(4, charset, <<0x5A>>) == "TT" 97 | assert random(6, charset, <<0x93>>) == "CTA" 98 | assert random(8, charset, <<0x93>>) == 
"CTAG" 99 | assert random(10, charset, <<0x20, 0xF1>>) == "ACAAG" 100 | assert random(12, charset, <<0x20, 0xF1>>) == "ACAAGG" 101 | assert random(14, charset, <<0x20, 0xF1>>) == "ACAAGGA" 102 | assert random(16, charset, <<0x20, 0xF1>>) == "ACAAGGAT" 103 | end 104 | 105 | test "CharSet.charset2()", do: with_2(CharSet.charset2()) 106 | test ":charset2", do: with_2(:charset2) 107 | 108 | defp with_2(charset) do 109 | assert random(1, charset, <<0x27>>) == "0" 110 | assert random(2, charset, <<0x27>>) == "00" 111 | assert random(3, charset, <<0x27>>) == "001" 112 | assert random(4, charset, <<0x27>>) == "0010" 113 | assert random(5, charset, <<0x27>>) == "00100" 114 | assert random(6, charset, <<0x27>>) == "001001" 115 | assert random(7, charset, <<0x27>>) == "0010011" 116 | assert random(8, charset, <<0x27>>) == "00100111" 117 | assert random(9, charset, <<0xE3, 0xE9>>) == "111000111" 118 | assert random(16, charset, <<0xE3, 0xE9>>) == "1110001111101001" 119 | end 120 | 121 | test "small" do 122 | assert byte_size(small()) == 6 123 | 124 | assert byte_size(small(:charset64)) == 5 125 | assert byte_size(small(:charset32)) == 6 126 | assert byte_size(small(:charset16)) == 8 127 | assert byte_size(small(:charset8)) == 10 128 | assert byte_size(small(:charset4)) == 15 129 | assert byte_size(small(:charset2)) == 29 130 | end 131 | 132 | test "medium" do 133 | assert byte_size(medium()) == 14 134 | 135 | assert byte_size(medium(:charset64)) == 12 136 | assert byte_size(medium(:charset32)) == 14 137 | assert byte_size(medium(:charset16)) == 18 138 | assert byte_size(medium(:charset8)) == 23 139 | assert byte_size(medium(:charset4)) == 35 140 | assert byte_size(medium(:charset2)) == 69 141 | end 142 | 143 | test "large" do 144 | assert byte_size(large()) == 20 145 | 146 | assert byte_size(large(:charset64)) == 17 147 | assert byte_size(large(:charset32)) == 20 148 | assert byte_size(large(:charset16)) == 25 149 | assert byte_size(large(:charset8)) == 33 150 | assert 
byte_size(large(:charset4)) == 50 151 | assert byte_size(large(:charset2)) == 99 152 | end 153 | 154 | test "session" do 155 | assert byte_size(session()) == 26 156 | 157 | assert byte_size(session(:charset64)) == 22 158 | assert byte_size(session(:charset32)) == 26 159 | assert byte_size(session(:charset16)) == 32 160 | assert byte_size(session(:charset8)) == 43 161 | assert byte_size(session(:charset4)) == 64 162 | assert byte_size(session(:charset2)) == 128 163 | end 164 | 165 | test "token" do 166 | assert byte_size(token()) == 43 167 | 168 | assert byte_size(token(:charset64)) == 43 169 | assert byte_size(token(:charset32)) == 52 170 | assert byte_size(token(:charset16)) == 64 171 | assert byte_size(token(:charset8)) == 86 172 | assert byte_size(token(:charset4)) == 128 173 | assert byte_size(token(:charset2)) == 256 174 | end 175 | 176 | test "invalid byte count" do 177 | {:error, _} = random(7, :charset64, <<1>>) 178 | {:error, _} = random(13, :charset64, <<1, 2>>) 179 | {:error, _} = random(25, :charset64, <<1, 2, 3>>) 180 | {:error, _} = random(31, :charset64, <<1, 2, 3, 4>>) 181 | 182 | {:error, _} = random(6, :charset32, <<1>>) 183 | {:error, _} = random(16, :charset32, <<1, 2>>) 184 | {:error, _} = random(21, :charset32, <<1, 2, 3>>) 185 | {:error, _} = random(31, :charset32, <<1, 2, 3, 4>>) 186 | {:error, _} = random(32, :charset32, <<1, 2, 3, 4>>) 187 | {:error, _} = random(41, :charset32, <<1, 2, 3, 4, 5>>) 188 | {:error, _} = random(46, :charset32, <<1, 2, 3, 4, 5, 6>>) 189 | 190 | {:error, _} = random(9, :charset16, <<1>>) 191 | {:error, _} = random(17, :charset16, <<1, 2>>) 192 | 193 | {:error, _} = random(7, :charset8, <<1>>) 194 | {:error, _} = random(16, :charset8, <<1, 2>>) 195 | {:error, _} = random(25, :charset8, <<1, 2, 3>>) 196 | {:error, _} = random(31, :charset8, <<1, 2, 3, 4>>) 197 | 198 | {:error, _} = random(9, :charset4, <<1>>) 199 | {:error, _} = random(17, :charset4, <<1, 2>>) 200 | 201 | {:error, _} = random(9, :charset2, <<1>>) 
202 | {:error, _} = random(17, :charset2, <<1, 2>>) 203 | end 204 | 205 | test "custom characters safe64" do 206 | charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ9876543210_-" 207 | bytes = <<0x9D, 0x99, 0x4E, 0xA5, 0xD2, 0x3F, 0x8C, 0x86, 0x80>> 208 | assert random(72, charset, bytes) == "NzLoPDi-JiAa" 209 | 210 | charset = "2346789BDFGHJMNPQRTbdfghjlmnpqrt" 211 | bytes = <<0xD2, 0xE3, 0xE9, 0xDA, 0x19, 0x97, 0x52>> 212 | assert random(55, charset, bytes) == "mHRrbgQlTqF" 213 | 214 | charset = "0123456789ABCDEF" 215 | bytes = <<0xC7, 0xC9, 0x00>> 216 | assert random(20, charset, bytes) == "C7C90" 217 | 218 | charset = "abcdefgh" 219 | bytes = <<0xC7, 0xC9, 0x07, 0xC9>> 220 | assert random(30, charset, bytes) == "gbheeeahgc" 221 | 222 | charset = "atcg" 223 | bytes = <<0x20, 0xF1>> 224 | assert random(16, charset, bytes) == "acaaggat" 225 | 226 | charset = "HT" 227 | bytes = <<0xE3, 0xE9>> 228 | assert random(16, charset, bytes) == "TTTHHHTTTTTHTHHT" 229 | end 230 | 231 | test "custom characters dingosky" do 232 | chars = "dingosky" 233 | defmodule(DingoSky, do: use(EntropyString, bits: 48, charset: chars)) 234 | 235 | DingoSky.random() 236 | |> String.graphemes() 237 | |> Enum.each(fn char -> assert String.contains?(chars, char) end) 238 | end 239 | 240 | test "invalid charset for random" do 241 | {:error, _} = random(10, <<"H20">>) 242 | {:error, _} = random(10, <<"H202">>) 243 | end 244 | 245 | test "invalid charset for module" do 246 | assert_raise EntropyString.Error, fn -> 247 | defmodule(Vowel, do: use(EntropyString, charset: "aeiou")) 248 | end 249 | end 250 | end 251 | -------------------------------------------------------------------------------- /test/test_helper.exs: -------------------------------------------------------------------------------- 1 | ExUnit.start() 2 | --------------------------------------------------------------------------------
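The fixed-byte tests above (for example `random(12, charset, <<0x01, 0xF2>>) == "01f"`) rely on how `random/3` slices the supplied bytes into character indexes. The following is a minimal standalone sketch of that slicing, not the library's implementation: the module name `SliceSketch` and its functions `bits_per_char/1`, `bytes_needed/2` and `string_from_bytes/3` are hypothetical names chosen for illustration, the charset is assumed to be a binary of single-byte characters whose length is a power of two, and the byte-count formula is an assumption inferred from the "invalid byte count" test cases.

```elixir
# Hypothetical illustration module; it is not part of EntropyString.
defmodule SliceSketch do
  # Bits consumed per character, assuming the charset length is a power of two.
  def bits_per_char(charset), do: round(:math.log2(byte_size(charset)))

  # Bytes required to cover the characters needed for `bits` of entropy.
  # Assumed formula, inferred from the "invalid byte count" tests.
  def bytes_needed(bits, charset) when is_integer(bits) and bits > 0 do
    bpc = bits_per_char(charset)
    char_count = ceil(bits / bpc)
    ceil(char_count * bpc / 8)
  end

  # Build the string by reading consecutive bits_per_char-sized slices of
  # `bytes`, left to right, and using each slice as an index into the charset.
  def string_from_bytes(bits, charset, bytes) when is_integer(bits) and bits > 0 do
    bpc = bits_per_char(charset)
    char_count = ceil(bits / bpc)

    for slice <- 0..(char_count - 1), into: "" do
      offset = slice * bpc
      # Skip `offset` bits, take the next `bpc` bits as the index.
      <<_skip::size(offset), ndx::size(bpc), _rest::bits>> = bytes
      :binary.part(charset, ndx, 1)
    end
  end
end

# 12 bits of hex need 3 characters, read from the first 12 bits of 0x01F2.
IO.inspect(SliceSketch.string_from_bytes(12, "0123456789abcdef", <<0x01, 0xF2>>))
# prints "01f"
IO.inspect(SliceSketch.bytes_needed(12, "0123456789abcdef"))
# prints 2
```

The sketch walks the slices in ascending order with a comprehension, whereas the library code counts down and prepends each character; both orders yield the same string. Bits left over after the last character (the low nibble `0x2` in the example) are ignored.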