├── .gitignore
├── LICENSE.md
├── README.md
├── benchmarks
│   ├── benchmark.log
│   └── benchmark.sh
├── core.py
├── decoder.py
├── distributions.py
├── encoder.py
├── lt_codes.py
├── md5_checker.sh
└── requirements.txt

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | files/
2 | __pycache__/
3 | .ipynb_checkpoints/
4 | dummy
5 | dummy-*
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2018 François Andrieux
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Fountain Code: Efficient Python Implementation of LT Codes
2 | 
3 | This project is a Python implementation of the iterative encoding and iterative decoding algorithms of [LT Codes](https://en.wikipedia.org/wiki/LT_codes),
4 | an error correction code based on the principles of [Fountain Codes](https://en.wikipedia.org/wiki/Fountain_code) by Michael Luby.
5 | I have written a full article on LT Codes and this implementation, which you can find here: [franpapers.com](https://franpapers.com/en/algorithmic/2018-introduction-to-fountain-codes-lt-codes-with-python/)
6 | 
7 | The encoder and decoder are optimized to handle large transfers, for files from 1 MB to 1 GB, at high speed.
8 | 
9 | ### Installation
10 | 
11 | This implementation requires Python 3.
12 | Some required packages are not built-in; to install them with `pip`:
13 | 
14 | ```
15 | $ pip install -r requirements.txt
16 | ```
17 | 
18 | ## Usage
19 | 
20 | An example showing how to use the implementation is in `lt_codes.py`; you can use it to encode/decode a file on the fly (it creates a copy of the file):
21 | ```
22 | $ python lt_codes.py filename [-h] [-r REDUNDANCY] [--systematic] [--verbose] [--x86]
23 | ```
24 | 
25 | As an example, here is a basic test to ensure the integrity of the final file:
26 | ```
27 | $ echo "Hello!" > test.txt
28 | $ python lt_codes.py test.txt --systematic
29 | ```
30 | A new file `test-copy.txt` should be created with the same content.
31 | 
32 | ### Content
33 | 
34 | * `core.py` contains the Symbol class, constants and functions that are used in both encoding and decoding.
35 | * `distributions.py` contains the two functions that generate degrees based on the ideal soliton and robust soliton distributions.
36 | * `encoder.py` contains the encoding algorithm.
37 | * `decoder.py` contains the decoding algorithm.
38 | * `md5_checker.sh` calls `lt_codes.py` and then compares the integrity of the original file with the newly created one. The integrity check is made with `md5sum`; add the ".exe" suffix if you work on Windows. Replace it with `md5 -r` if you work on macOS, or run `brew install md5sha1sum`.
39 | 
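Beyond the command line, the encoder and decoder can also be used programmatically. Here is a minimal round-trip sketch (illustrative only: the block count, the redundancy of 1.5 and the systematic mode are arbitrary choices here, mirroring what `lt_codes.py` does):

```python
import numpy as np
import core
from encoder import encode
from decoder import decode

core.SYSTEMATIC = True  # the first k drops are then the k source blocks, which guarantees decoding here

# Four blocks of PACKET_SIZE random bytes, encoded into six symbols (redundancy 1.5)
blocks = [np.frombuffer(np.random.bytes(core.PACKET_SIZE), dtype=core.NUMPY_TYPE)
          for _ in range(4)]
symbols = list(encode(blocks, drops_quantity=6))

# ...the symbols would normally travel over a lossy channel here...

recovered, solved = decode(symbols, blocks_quantity=4)
assert solved == 4 and np.array_equal(recovered[0], blocks[0])
```
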
40 | ## Benchmarks
41 | 
42 | The time consumed by the encoding and decoding process depends directly on the size of the file to encode and on the wanted redundancy.
43 | Here are some measurements made on an Intel i5 @ 2.30GHz with a redundancy of 1.5:
44 | 
45 | | Size (MB) | Blocks | Symbols | Encoding time (s) | Encoding speed (MB/s) | Decoding time (s) | Decoding speed (MB/s) |
46 | |----------:|-------:|--------:|------------------:|----------------------:|------------------:|----------------------:|
47 | | 1         | 16     | 24      | 0.00              | -                      | 0.00              | -                      |
48 | | 100       | 1600   | 2400    | 0.21              | 476.1                  | 0.31              | 322.5                  |
49 | | 1200      | 19200  | 28800   | 3.86              | 310.8                  | 39.82             | 30.1                   |
50 | | 2000      | 32000  | 48000   | 6.44              | 310.5                  | 104.10            | 19.2                   |
51 | | 3600      | 57600  | 86400   | 23.14             | 155.5                  | 426.36            | 8.4                    |
108 | 
109 | 
110 | 
111 | Note: `PACKET_SIZE` is set to 65536 for these tests. Lowering it results in lower speeds, but a smaller packet size can be necessary to split small files into enough chunks.
112 | 
113 | 
114 | ## References
115 | 
116 | > M. Luby, "LT Codes", The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.
117 | 
118 | ## License
119 | 
120 | MIT License
121 | Copyright (c) 2018 François Andrieux
122 | 
123 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
124 | 
125 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
126 | 
127 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
128 | 
129 | 
--------------------------------------------------------------------------------
/benchmarks/benchmark.log:
--------------------------------------------------------------------------------
1 | Creating dummy data for size=100
2 | ======================================
3 | Redundancy: 1.5
4 | Systematic: True
5 | Filesize: 100 bytes
6 | Blocks: 1
7 | Drops: 1
8 | 
9 | Generating graph...
10 | Ready for encoding.
11 | -- Encoding: 1/1 - 100.00% symbols at 625.00 MB/s ~0.00s
12 | ----- Correctly dropped 1 symbols (packet size=65536)
13 | Graph built back. Ready for decoding.
14 | 
15 | ----- Solved Blocks 1/ 1 --
16 | Wrote 100 bytes in dummy-copy
17 | 
18 | Creating dummy data for size=102400
19 | ======================================
20 | Redundancy: 1.5
21 | Systematic: True
22 | Filesize: 102400 bytes
23 | Blocks: 2
24 | Drops: 3
25 | 
26 | Generating graph...
27 | Ready for encoding.
28 | -- Encoding: 3/3 - 100.00% symbols at 170.80 MB/s ~0.00s
29 | ----- Correctly dropped 3 symbols (packet size=65536)
30 | Graph built back. Ready for decoding.
31 | -- Decoding: 2/2 - 100.00% symbols at 1250.00 MB/s ~0.00s
32 | ----- Solved Blocks 2/ 2 --
33 | Wrote 102400 bytes in dummy-copy
34 | 
35 | Creating dummy data for size=1048576
36 | ======================================
37 | Redundancy: 1.5
38 | Systematic: True
39 | Filesize: 1048576 bytes
40 | Blocks: 16
41 | Drops: 24
42 | 
43 | Generating graph...
44 | Ready for encoding.
45 | -- Encoding: 24/24 - 100.00% symbols at 715.47 MB/s ~0.00s
46 | ----- Correctly dropped 24 symbols (packet size=65536)
47 | Graph built back. Ready for decoding.
48 | -- Decoding: 16/16 - 100.00% symbols at 10000.00 MB/s ~0.00s
49 | ----- Solved Blocks 16/16 --
50 | Wrote 983040 bytes in dummy-copy
51 | 
52 | Creating dummy data for size=104857600
53 | ======================================
54 | Redundancy: 1.5
55 | Systematic: True
56 | Filesize: 104857600 bytes
57 | Blocks: 1600
58 | Drops: 2400
59 | 
60 | Generating graph...
61 | Ready for encoding.
62 | -- Encoding: 2400/2400 - 100.00% symbols at 724.93 MB/s ~0.21s 63 | ----- Correctly dropped 2400 symbols (packet size=65536) 64 | Graph built back. Ready for decoding. 65 | -- Decoding: 1600/1600 - 100.00% symbols at 323.91 MB/s ~0.31s 66 | ----- Solved Blocks 1600/1600 -- 67 | Wrote 104792064 bytes in dummy-copy 68 | 69 | Creating dummy data for size=419430400 70 | ====================================== 71 | Redundancy: 1.5 72 | Systematic: True 73 | Filesize: 419430400 bytes 74 | Blocks: 6400 75 | Drops: 9600 76 | 77 | Generating graph... 78 | Ready for encoding. 79 | -- Encoding: 9600/9600 - 100.00% symbols at 674.76 MB/s ~0.89s 80 | ----- Correctly dropped 9600 symbols (packet size=65536) 81 | Graph built back. Ready for decoding. 82 | -- Decoding: 6400/6400 - 100.00% symbols at 99.50 MB/s ~4.02s 83 | ----- Solved Blocks 6400/6400 -- 84 | Wrote 419364864 bytes in dummy-copy 85 | 86 | Creating dummy data for size=838860800 87 | ====================================== 88 | Redundancy: 1.5 89 | Systematic: True 90 | Filesize: 838860800 bytes 91 | Blocks: 12800 92 | Drops: 19200 93 | 94 | Generating graph... 95 | Ready for encoding. 96 | -- Encoding: 19200/19200 - 100.00% symbols at 525.49 MB/s ~2.28s 97 | ----- Correctly dropped 19200 symbols (packet size=65536) 98 | Graph built back. Ready for decoding. 99 | -- Decoding: 12800/12800 - 100.00% symbols at 48.95 MB/s ~16.34s 100 | ----- Solved Blocks 12800/12800 -- 101 | Wrote 838795264 bytes in dummy-copy 102 | 103 | Creating dummy data for size=1258291200 104 | ====================================== 105 | Redundancy: 1.5 106 | Systematic: True 107 | Filesize: 1258291200 bytes 108 | Blocks: 19200 109 | Drops: 28800 110 | 111 | Generating graph... 112 | Ready for encoding. 113 | -- Encoding: 28800/28800 - 100.00% symbols at 465.99 MB/s ~3.86s 114 | ----- Correctly dropped 28800 symbols (packet size=65536) 115 | Graph built back. Ready for decoding. 116 | -- Decoding: 19200/19200 - 100.00% symbols at 30.13 MB/s ~39.82s 117 | ----- Solved Blocks 19200/19200 -- 118 | Wrote 1258225664 bytes in dummy-copy 119 | 120 | Creating dummy data for size=1264867868 121 | ====================================== 122 | Redundancy: 1.5 123 | Systematic: True 124 | Filesize: 1264867868 bytes 125 | Blocks: 19301 126 | Drops: 28951 127 | 128 | Generating graph... 129 | Ready for encoding. 130 | -- Encoding: 28951/28951 - 100.00% symbols at 543.55 MB/s ~3.33s 131 | ----- Correctly dropped 28951 symbols (packet size=65536) 132 | Graph built back. Ready for decoding. 133 | -- Decoding: 19301/19301 - 100.00% symbols at 30.67 MB/s ~39.33s 134 | ----- Solved Blocks 19301/19301 -- 135 | Wrote 1264867868 bytes in dummy-copy 136 | 137 | Creating dummy data for size=2097152000 138 | ====================================== 139 | Redundancy: 1.5 140 | Systematic: True 141 | Filesize: 2097152000 bytes 142 | Blocks: 32000 143 | Drops: 48000 144 | 145 | Generating graph... 146 | Ready for encoding. 147 | -- Encoding: 48000/48000 - 100.00% symbols at 466.18 MB/s ~6.44s 148 | ----- Correctly dropped 48000 symbols (packet size=65536) 149 | Graph built back. Ready for decoding. 150 | -- Decoding: 32000/32000 - 100.00% symbols at 19.21 MB/s ~104.10s 151 | ----- Solved Blocks 32000/32000 -- 152 | Wrote 2097086464 bytes in dummy-copy 153 | 154 | Creating dummy data for size=2516582400 155 | ====================================== 156 | Redundancy: 1.5 157 | Systematic: True 158 | Filesize: 2516582400 bytes 159 | Blocks: 38400 160 | Drops: 57600 161 | 162 | Generating graph... 
163 | Ready for encoding.
164 | -- Encoding: 57600/57600 - 100.00% symbols at 420.50 MB/s ~8.56s
165 | ----- Correctly dropped 57600 symbols (packet size=65536)
166 | Graph built back. Ready for decoding.
167 | -- Decoding: 38400/38400 - 100.00% symbols at 15.61 MB/s ~153.71s
168 | ----- Solved Blocks 38400/38400 --
169 | Wrote 2516516864 bytes in dummy-copy
170 | 
171 | Creating dummy data for size=2931315179
172 | ======================================
173 | Redundancy: 1.5
174 | Systematic: True
175 | Filesize: 2931315179 bytes
176 | Blocks: 44729
177 | Drops: 67093
178 | 
179 | Generating graph...
180 | Ready for encoding.
181 | -- Encoding: 67093/67093 - 100.00% symbols at 495.22 MB/s ~8.47s
182 | ----- Correctly dropped 67093 symbols (packet size=65536)
183 | Graph built back. Ready for decoding.
184 | -- Decoding: 44729/44729 - 100.00% symbols at 10.41 MB/s ~268.49s
185 | ----- Solved Blocks 44729/44729 --
186 | Wrote 2931315179 bytes in dummy-copy
187 | 
188 | Creating dummy data for size=3774873600
189 | ======================================
190 | Redundancy: 1.5
191 | Systematic: True
192 | Filesize: 3774873600 bytes
193 | Blocks: 57600
194 | Drops: 86400
195 | 
196 | Generating graph...
197 | Ready for encoding.
198 | -- Encoding: 86400/86400 - 100.00% symbols at 233.35 MB/s ~23.14s
199 | ----- Correctly dropped 86400 symbols (packet size=65536)
200 | Graph built back. Ready for decoding.
201 | -- Decoding: 57600/57600 - 100.00% symbols at 8.44 MB/s ~426.36s
202 | 
203 | ----- Solved Blocks 57600/57600 --
204 | Wrote 3774808064 bytes in dummy-copy
205 | 
--------------------------------------------------------------------------------
/benchmarks/benchmark.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # Sizes: 100B, 100KB, 1MB, 100MB, 400MB, 800MB, 1.17GB, 1.56GB, 1.95GB, 2.34GB, 2.73GB, 3.51GB
4 | sizes="100 102400 1048576 104857600 419430400 838860800 1258291200 1264867868 2097152000 2516582400 2931315179 3774873600"
5 | 
6 | # Start from an empty log; each run below appends to it
7 | # (truncating the log inside the loop would wipe the previous runs)
8 | : > benchmark.log
9 | 
10 | for SIZE in $sizes
11 | do
12 |     echo -e "\nCreating dummy data for size=$SIZE" | tee -a benchmark.log
13 |     echo "======================================" | tee -a benchmark.log
14 |     head -c $SIZE /dev/urandom > dummy
15 |     python ../lt_codes.py dummy -r 1.5 --systematic | tee -a benchmark.log
16 | done
17 | 
18 | rm dummy
19 | rm dummy-copy
--------------------------------------------------------------------------------
/core.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import math
4 | import time
5 | import numpy as np
6 | import random
7 | from random import choices
8 | 
9 | SYSTEMATIC = False
10 | VERBOSE = False
11 | PACKET_SIZE = 65536
12 | # PACKET_SIZE = 32768
13 | # PACKET_SIZE = 16384
14 | # PACKET_SIZE = 4096
15 | # PACKET_SIZE = 1024
16 | # PACKET_SIZE = 512
17 | # PACKET_SIZE = 128
18 | ROBUST_FAILURE_PROBABILITY = 0.01
19 | NUMPY_TYPE = np.uint64
20 | # NUMPY_TYPE = np.uint32
21 | # NUMPY_TYPE = np.uint16
22 | # NUMPY_TYPE = np.uint8
23 | EPSILON = 0.0001
24 | 
25 | class Symbol:
26 |     __slots__ = ["index", "degree", "data", "neighbors"]  # fixing the attribute set may reduce memory usage
27 | 
28 |     def __init__(self, index, degree, data):
29 |         self.index = index
30 |         self.degree = degree
31 |         self.data = data
32 | 
33 |     def log(self, blocks_quantity):
34 |         neighbors, _ = generate_indexes(self.index, self.degree, blocks_quantity)
35 |         print("symbol_{} degree={}\t {}".format(self.index, self.degree, neighbors))
36 | 
37 | def generate_indexes(symbol_index, degree, blocks_quantity):
38 |     """Randomly get `degree` indexes, given the symbol index as a seed.
39 | 
40 |     Generating with a seed allows saving only the seed (and the degree)
41 |     and not the whole array of indexes. That saves memory, but also bandwidth when packets are sent.
42 | 
43 |     The random indexes need to be unique because the decoding process uses dictionaries for performance enhancements.
44 |     Additionally, even if XORing one block with itself along with others is not a problem for the algorithm,
45 |     it is better to avoid ineffective operations like that.
46 | 
47 |     To be sure to get the same random indexes on both sides, the encoder and the decoder must seed the PRNG with the same symbol index.
48 |     """
49 |     if SYSTEMATIC and symbol_index < blocks_quantity:
50 |         indexes = [symbol_index]
51 |         degree = 1
52 |     else:
53 |         random.seed(symbol_index)
54 |         indexes = random.sample(range(blocks_quantity), degree)
55 | 
56 |     return indexes, degree
57 | 
58 | def log(process, iteration, total, start_time):
59 |     """Log the progress in a gentle way, at most once per second"""
60 |     global log_actual_time
61 | 
62 |     if "log_actual_time" not in globals():
63 |         log_actual_time = time.time()
64 | 
65 |     if time.time() - log_actual_time > 1 or iteration == total - 1:
66 | 
67 |         log_actual_time = time.time()
68 |         elapsed = log_actual_time - start_time + EPSILON
69 |         speed = (iteration + 1) / elapsed * PACKET_SIZE / (1024 * 1024)
70 | 
71 |         print("-- {}: {}/{} - {:.2%} symbols at {:.2f} MB/s ~{:.2f}s".format(
72 |             process, iteration + 1, total, (iteration + 1) / total, speed, elapsed), end="\r", flush=True)
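
The seeding trick above is what lets the encoder and the decoder agree on the graph without ever transmitting neighbor lists. A quick illustrative check (the values 42, 3 and 100 are arbitrary; non-systematic mode is assumed):

```python
import core

core.SYSTEMATIC = False

# The symbol index seeds the PRNG, so the same (index, degree) pair
# always regenerates the same neighbor set on both sides.
a, _ = core.generate_indexes(42, 3, blocks_quantity=100)
b, _ = core.generate_indexes(42, 3, blocks_quantity=100)
assert a == b  # only the index and the degree need to be transmitted
```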
--------------------------------------------------------------------------------
/decoder.py:
--------------------------------------------------------------------------------
1 | from core import *
2 | 
3 | def recover_graph(symbols, blocks_quantity):
4 |     """ Get back the same random indexes (or neighbors), thanks to the symbol index used as a seed.
5 |     For ease of implementation, we register the indexes as a property of the Symbol objects.
6 |     """
7 | 
8 |     for symbol in symbols:
9 | 
10 |         neighbors, deg = generate_indexes(symbol.index, symbol.degree, blocks_quantity)
11 |         symbol.neighbors = {x for x in neighbors}
12 |         symbol.degree = deg
13 | 
14 |         if VERBOSE:
15 |             symbol.log(blocks_quantity)
16 | 
17 |     return symbols
18 | 
19 | def reduce_neighbors(block_index, blocks, symbols):
20 |     """ Loop over the remaining symbols to find a common link between
21 |     each symbol and the last solved block `block_index`.
22 | 
23 |     To avoid increasing complexity with another for loop, the neighbors are stored as a set,
24 |     which enables deleting an entry directly after XORing back.
25 |     """
26 | 
27 |     for other_symbol in symbols:
28 |         if other_symbol.degree > 1 and block_index in other_symbol.neighbors:
29 | 
30 |             # XOR the data and remove the index from the neighbors
31 |             other_symbol.data = np.bitwise_xor(blocks[block_index], other_symbol.data)
32 |             other_symbol.neighbors.remove(block_index)
33 | 
34 |             other_symbol.degree -= 1
35 | 
36 |             if VERBOSE:
37 |                 print("XOR block_{} with symbol_{} :".format(block_index, other_symbol.index), list(other_symbol.neighbors))
38 | 
39 | 
40 | def decode(symbols, blocks_quantity):
41 |     """ Iterative decoding - decodes all the passed symbols to rebuild the data as blocks.
42 |     The function returns the blocks at the end of the process.
43 | 
44 |     1. Search for an output symbol of degree one.
45 |        (a) If such an output symbol y exists, move to step 2.
46 |        (b) If no output symbol of degree one exists, iterative decoding exits and decoding fails.
47 | 
48 |     2. Output symbol y has degree one. Thus, denoting its only neighbor as v, the
49 |        value of v is recovered by setting v = y.
50 | 
51 |     3. Update: XOR y into every remaining symbol that has v among its neighbors,
52 |        remove v from their neighbor sets, and decrease their degrees by one.
53 | 
54 |     4. If all k input symbols have been recovered, decoding is successful and iterative
55 |        decoding ends. Otherwise, go to step 1.
56 |     """
57 | 
58 |     symbols_n = len(symbols)
59 |     assert symbols_n > 0, "There are no symbols to decode."
60 | 
61 |     # We keep the `blocks_n` notation and create the empty list
62 |     blocks_n = blocks_quantity
63 |     blocks = [None] * blocks_n
64 | 
65 |     # Recover the degrees and associated neighbors using the seed (the index, cf. encoding)
66 |     symbols = recover_graph(symbols, blocks_n)
67 |     print("Graph built back. Ready for decoding.", flush=True)
68 | 
69 |     solved_blocks_count = 0
70 |     iteration_solved_count = 0
71 |     start_time = time.time()
72 | 
73 |     while iteration_solved_count > 0 or solved_blocks_count == 0:
74 | 
75 |         iteration_solved_count = 0
76 | 
77 |         # Search for solvable symbols
78 |         for i, symbol in enumerate(symbols):
79 | 
80 |             # Check the current degree. If it's 1 then we can recover data
81 |             if symbol.degree == 1:
82 | 
83 |                 iteration_solved_count += 1
84 |                 block_index = next(iter(symbol.neighbors))
85 |                 # Note: popping while enumerating skips the element shifted into slot i;
86 |                 # it is simply picked up again on the next pass of the outer while loop.
87 |                 symbols.pop(i)
88 | 
89 |                 # This symbol is redundant: another one already solved the same block
90 |                 if blocks[block_index] is not None:
91 |                     continue
92 | 
93 |                 blocks[block_index] = symbol.data
94 | 
95 |                 if VERBOSE:
96 |                     print("Solved block_{} with symbol_{}".format(block_index, symbol.index))
97 | 
98 |                 # Update the count and log the processing
99 |                 solved_blocks_count += 1
100 |                 log("Decoding", solved_blocks_count, blocks_n, start_time)
101 | 
102 |                 # Reduce the degrees of the other symbols that contain the solved block as a neighbor
103 |                 reduce_neighbors(block_index, blocks, symbols)
104 | 
105 |     print("\n----- Solved Blocks {:2}/{:2} --".format(solved_blocks_count, blocks_n))
106 | 
107 |     return np.asarray(blocks), solved_blocks_count
--------------------------------------------------------------------------------
/distributions.py:
--------------------------------------------------------------------------------
1 | from core import *
2 | 
3 | def ideal_distribution(N):
4 |     """ Create the ideal soliton distribution.
5 |     In practice, this distribution does not give the best results.
6 |     Cf. https://en.wikipedia.org/wiki/Soliton_distribution
7 |     """
8 | 
9 |     probabilities = [0, 1 / N]
10 |     probabilities += [1 / (k * (k - 1)) for k in range(2, N+1)]
11 |     probabilities_sum = sum(probabilities)
12 | 
13 |     assert probabilities_sum >= 1 - EPSILON and probabilities_sum <= 1 + EPSILON, "The ideal distribution should be standardized"
14 |     return probabilities
15 | 
16 | def robust_distribution(N):
17 |     """ Create the robust soliton distribution.
18 |     This fixes the problems of the ideal distribution.
19 |     Cf. https://en.wikipedia.org/wiki/Soliton_distribution
20 |     """
21 | 
22 |     # The choice of M is not part of the distribution; it may be improved.
23 |     # We take the median and add +1 to avoid a possible division by zero.
24 |     M = N // 2 + 1
25 |     R = N / M
26 | 
27 |     extra_proba = [0] + [1 / (i * M) for i in range(1, M)]
28 |     extra_proba += [math.log(R / ROBUST_FAILURE_PROBABILITY) / M]  # Spike at M
29 |     extra_proba += [0 for k in range(M+1, N+1)]
30 | 
31 |     probabilities = np.add(extra_proba, ideal_distribution(N))
32 |     probabilities /= np.sum(probabilities)
33 |     probabilities_sum = np.sum(probabilities)
34 | 
35 |     assert probabilities_sum >= 1 - EPSILON and probabilities_sum <= 1 + EPSILON, "The robust distribution should be standardized"
36 |     return probabilities
37 | 
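
As a quick sanity check of the distribution shape (illustrative; N = 16 is an arbitrary choice), the robust soliton sums to 1 after normalization and has its spike at M = N // 2 + 1:

```python
import numpy as np
from distributions import robust_distribution

probs = robust_distribution(16)       # probabilities for degrees 0..16 (degree 0 has probability 0)
print(round(float(np.sum(probs)), 6)) # -> 1.0, the distribution is normalized
print(int(np.argmax(probs)))          # -> 9, the spike sits at M = 16 // 2 + 1
```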
32 | """ 33 | 34 | # Display statistics 35 | blocks_n = len(blocks) 36 | assert blocks_n <= drops_quantity, "Because of the unicity in the random neighbors, it is need to drop at least the same amount of blocks" 37 | 38 | print("Generating graph...") 39 | start_time = time.time() 40 | 41 | # Generate random indexes associated to random degrees, seeded with the symbol id 42 | random_degrees = get_degrees_from("robust", blocks_n, k=drops_quantity) 43 | 44 | print("Ready for encoding.", flush=True) 45 | 46 | for i in range(drops_quantity): 47 | 48 | # Get the random selection, generated precedently (for performance) 49 | selection_indexes, deg = generate_indexes(i, random_degrees[i], blocks_n) 50 | 51 | # Xor each selected array within each other gives the drop (or just take one block if there is only one selected) 52 | drop = blocks[selection_indexes[0]] 53 | for n in range(1, deg): 54 | drop = np.bitwise_xor(drop, blocks[selection_indexes[n]]) 55 | # drop = drop ^ blocks[selection_indexes[n]] # according to my tests, this has the same performance 56 | 57 | # Create symbol, then log the process 58 | symbol = Symbol(index=i, degree=deg, data=drop) 59 | 60 | if VERBOSE: 61 | symbol.log(blocks_n) 62 | 63 | log("Encoding", i, drops_quantity, start_time) 64 | 65 | yield symbol 66 | 67 | print("\n----- Correctly dropped {} symbols (packet size={})".format(drops_quantity, PACKET_SIZE)) 68 | -------------------------------------------------------------------------------- /lt_codes.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import os 5 | import math 6 | import argparse 7 | import numpy as np 8 | import core 9 | from encoder import encode 10 | from decoder import decode 11 | 12 | def blocks_read(file, filesize): 13 | """ Read the given file by blocks of `core.PACKET_SIZE` and use np.frombuffer() improvement. 14 | 15 | Byt default, we store each octet into a np.uint8 array space, but it is also possible 16 | to store up to 8 octets together in a np.uint64 array space. 17 | 18 | This process is not saving memory but it helps reduce dimensionnality, especially for the 19 | XOR operation in the encoding. 
--------------------------------------------------------------------------------
/lt_codes.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import os
5 | import math
6 | import argparse
7 | import numpy as np
8 | import core
9 | from encoder import encode
10 | from decoder import decode
11 | 
12 | def blocks_read(file, filesize):
13 |     """ Read the given file by blocks of `core.PACKET_SIZE` and use np.frombuffer() for speed.
14 | 
15 |     Each octet can be stored in a np.uint8 array cell, but it is also possible
16 |     to pack up to 8 octets together in a np.uint64 array cell (the default here).
17 | 
18 |     This process does not save memory, but it helps reduce dimensionality, especially for the
19 |     XOR operation in the encoding. Example:
20 |     * np.frombuffer(b'\x01\x02', dtype=np.uint8) => array([1, 2], dtype=uint8)
21 |     * np.frombuffer(b'\x01\x02', dtype=np.uint16) => array([513], dtype=uint16)
22 |     """
23 | 
24 |     blocks_n = math.ceil(filesize / core.PACKET_SIZE)
25 |     blocks = []
26 | 
27 |     # Read data by blocks of size core.PACKET_SIZE
28 |     for i in range(blocks_n):
29 | 
30 |         data = bytearray(file.read(core.PACKET_SIZE))
31 | 
32 |         if not data:
33 |             raise RuntimeError("Unexpected end of file while reading block #{}".format(i))
34 | 
35 |         # The last read bytes need a right padding to be XORed in the future
36 |         if len(data) != core.PACKET_SIZE:
37 |             assert i == blocks_n - 1, "Packet #{} has an unhandled size of {} bytes".format(i, len(data))
38 |             data = data + bytearray(core.PACKET_SIZE - len(data))
39 | 
40 |         # Packets are condensed into the right array type
41 |         blocks.append(np.frombuffer(data, dtype=core.NUMPY_TYPE))
42 | 
43 |     return blocks
44 | 
45 | def blocks_write(blocks, file, filesize):
46 |     """ Write the given blocks into a file
47 |     """
48 | 
49 |     count = 0
50 |     for data in blocks[:-1]:
51 |         file.write(data)
52 |         count += len(data)
53 | 
54 |     # Convert the last block back to bytes and shrink it to the actual file tail
55 |     last_bytes = bytes(blocks[-1])
56 |     shrinked_data = last_bytes[:filesize % core.PACKET_SIZE or core.PACKET_SIZE]  # keep the whole block when filesize is an exact multiple
57 |     file.write(shrinked_data)
58 | 
59 | #########################################################
60 | 
61 | if __name__ == "__main__":
62 | 
63 |     parser = argparse.ArgumentParser(description="Robust implementation of the LT Codes encoding/decoding process.")
64 |     parser.add_argument("filename", help="file path of the file to split in blocks")
65 |     parser.add_argument("-r", "--redundancy", help="the wanted redundancy.", default=2.0, type=float)
66 |     parser.add_argument("--systematic", help="ensure that the k first drops are exactly the k first blocks (systematic LT Codes)", action="store_true")
67 |     parser.add_argument("--verbose", help="increase output verbosity", action="store_true")
68 |     parser.add_argument("--x86", help="avoid using np.uint64 on x86-32bit systems", action="store_true")
69 |     args = parser.parse_args()
70 | 
71 |     core.NUMPY_TYPE = np.uint32 if args.x86 else core.NUMPY_TYPE
72 |     core.SYSTEMATIC = True if args.systematic else core.SYSTEMATIC
73 |     core.VERBOSE = True if args.verbose else core.VERBOSE
74 | 
75 |     with open(args.filename, "rb") as file:
76 | 
77 |         print("Redundancy: {}".format(args.redundancy))
78 |         print("Systematic: {}".format(core.SYSTEMATIC))
79 | 
80 |         filesize = os.path.getsize(args.filename)
81 |         print("Filesize: {} bytes".format(filesize))
82 | 
83 |         # Split the file in blocks & compute the number of drops
84 |         file_blocks = blocks_read(file, filesize)
85 |         file_blocks_n = len(file_blocks)
86 |         drops_quantity = int(file_blocks_n * args.redundancy)
87 | 
88 |         print("Blocks: {}".format(file_blocks_n))
89 |         print("Drops: {}\n".format(drops_quantity))
90 | 
91 |         # Generating symbols (or drops) from the blocks
92 |         file_symbols = []
93 |         for curr_symbol in encode(file_blocks, drops_quantity=drops_quantity):
94 |             file_symbols.append(curr_symbol)
95 | 
96 |         # HERE: Simulating the loss of packets?
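        # An illustrative sketch (not part of the original flow): a lossy channel
        # could be emulated here by randomly discarding a fraction of the symbols, e.g.
        #
        #     import random
        #     file_symbols = [s for s in file_symbols if random.random() > 0.1]
        #
        # With enough redundancy, the decoding below should still succeed.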
97 | 
98 |         # Recovering the blocks from symbols
99 |         recovered_blocks, recovered_n = decode(file_symbols, blocks_quantity=file_blocks_n)
100 | 
101 |         if core.VERBOSE:
102 |             print(recovered_blocks)
103 |             print("------ Blocks : \t-----------")
104 |             print(file_blocks)
105 | 
106 |         if recovered_n != file_blocks_n:
107 |             print("Not all blocks were recovered, the file cannot be written")
108 |             exit()
109 | 
110 |         splitted = args.filename.split(".")
111 |         if len(splitted) > 1:
112 |             filename_copy = ".".join(splitted[:-1]) + "-copy." + splitted[-1]
113 |         else:
114 |             filename_copy = args.filename + "-copy"
115 | 
116 |         # Write down the recovered blocks in a copy
117 |         with open(filename_copy, "wb") as file_copy:
118 |             blocks_write(recovered_blocks, file_copy, filesize)
119 | 
120 |         print("Wrote {} bytes in {}".format(os.path.getsize(filename_copy), filename_copy))
121 | 
--------------------------------------------------------------------------------
/md5_checker.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | FILE=$1
3 | python lt_codes.py "$FILE" --systematic
4 | md5sum "${FILE}"
5 | md5sum "${FILE%%.*}-copy.${FILE#*.}"
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.16.4
--------------------------------------------------------------------------------
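
To try the whole pipeline end to end (assuming a Unix-like shell with `md5sum` available; cf. the note about Windows and macOS in the README):

```
$ pip install -r requirements.txt
$ echo "Hello!" > test.txt
$ ./md5_checker.sh test.txt
```

The two printed checksums should be identical if decoding succeeded.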