├── .gitignore
├── LICENSE.md
├── README.md
├── benchmarks
    ├── benchmark.log
    └── benchmark.sh
├── core.py
├── decoder.py
├── distributions.py
├── encoder.py
├── lt_codes.py
├── md5_checker.sh
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | files/
2 | __pycache__/
3 | .ipynb_checkpoints/
4 | dummy
5 | dummy-*


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 François Andrieux
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Fountain Code: Efficient Python Implementation of LT Codes
  2 | 
  3 | This project is the implementation in Python of the iterative encoding and iterative decoding algorithms of the [LT Codes](https://en.wikipedia.org/wiki/LT_codes),
  4 | an error correction code based on the principles of [Fountain Codes](https://en.wikipedia.org/wiki/Fountain_code) by Michael Luby.
  5 | I have written a whole article on LT Codes and this snippet that you can find here : [franpapers.com](https://franpapers.com/en/algorithmic/2018-introduction-to-fountain-codes-lt-codes-with-python/)
  6 | 
  7 | The encoder and decoder are optimized to handle big transfers for files between 1MB to 1GB at high speed.
  8 | 
  9 | ### Installation
 10 | 
 11 | This implementation requires at least python 3.x.
 12 | Some packages are not built-in. To install them with `pip` you can do:
 13 | 
 14 | ```
 15 | $ pip install -r requirements.txt
 16 | ```
 17 | 
 18 | ## Usage
 19 | 
 20 | An example describing how to use the implementation is in `lt_codes.py`, and you can use it to encode/decode a file on the fly (creates a file copy):
 21 | ```
 22 | $ python lt_codes.py filename [-h] [-r REDUNDANCY] [--systematic] [--verbose] [--x86]
 23 | ```
 24 | 
 25 | As an example, here is a basic test to ensure the integrity of the final file:
 26 | ```
 27 | $ echo "Hello!" > test.txt
 28 | $ python lt_codes.py test.txt --systematic
 29 | ```
 30 | A new file test-copy.txt should be created with the same content.
 31 | 
 32 | ### Content
 33 | 
 34 | * `core.py` contains the Symbol class, constants and functions that are used in both encoding and decoding.
 35 | * `distributions.py` contains the two functions that generate degrees based on the ideal soliton and robust soliton distributions
 36 | * `encoder.py` contains the encoding algorithm
 37 | * `decoder.py` contains the decoding algorithm
 38 | * `md5_checker.sh` calls `lt_codes.py` and then compare the integrity of the original file with the newly created file. The integrity check is made with `md5sum`, add the ".exe" if you work on Window. Replace it by `md5 -r` if you work on Mac, or run `brew install md5sha1sum`.
 39 | 
 40 | ## Benchmarks
 41 | The time consumed by the encoding and decoding process is completely related to the size of the file to encode and the wanted redundancy.
 42 | I have made some measure on an Intel i5 @ 2.30GHz with a 1.5 redundancy :
 43 | 
 44 | <table>
 45 | <thead>
 46 | <tr>
 47 | <td rowspan="2"><strong>Size (MB)</strong></td>
 48 | <td rowspan="2"><strong>Blocks</strong></td>
 49 | <td rowspan="2"><strong>Symbols</strong></td>
 50 | <td colspan="2"><strong>Encoding</strong></td>
 51 | <td colspan="2"><strong>Decoding</strong></td>
 52 | </tr>
 53 | <tr>
 54 | <td><strong>Time (s)</strong></td>
 55 | <td><strong>Speed (MB/s)</strong></td>
 56 | <td><strong>Time (s)</strong></td>
 57 | <td><strong>Speed (MB/s)</strong></td>
 58 | </tr>
 59 | </thead>
 60 | <tbody>
 61 | <tr>
 62 | <td>1</td>
 63 | <td>16</td>
 64 | <td>24</td>
 65 | <td>0.00</td>
 66 | <td>-</td>
 67 | <td>0.00</td>
 68 | <td>-</td>
 69 | </tr>
 70 | <tr>
 71 | <td>100</td>
 72 | <td>1600</td>
 73 | <td>2400</td>
 74 | <td>0.21</td>
 75 | <td>476.1</td>
 76 | <td>0.31</td>
 77 | <td>322.5</td>
 78 | </tr>
 79 | <tr>
 80 | <td>1200</td>
 81 | <td>19200</td>
 82 | <td>28800</td>
 83 | <td>3.86</td>
 84 | <td>310.8</td>
 85 | <td>39.82</td>
 86 | <td>30.1</td>
 87 | </tr>
 88 | <tr>
 89 | <td>2000</td>
 90 | <td>32000</td>
 91 | <td>48000</td>
 92 | <td>6.44</td>
 93 | <td>310.5</td>
 94 | <td>104.10</td>
 95 | <td>19.2</td>
 96 | </tr>
 97 | <tr>
 98 | <td>3600</td>
 99 | <td>57600</td>
100 | <td>86400</td>
101 | <td>23.14</td>
102 | <td>155.5</td>
103 | <td>426.36</td>
104 | <td>8.4</td>
105 | </tr>
106 | </tbody>
107 | </table>
108 | 
109 | <img src="https://franpapers.com/wp-content/uploads/2018/06/word-image-18.png" width=500 />
110 | 
111 | Note: `PACKET_SIZE` is set to 65536 for theses tests. Lowering it will result in lower speeds but it might be necessary to send small files in many chunks.
112 | 
113 | 
114 | ## References
115 | 
116 | > M.Luby, "LT Codes", The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.
117 | 
118 | ## License
119 | 
120 | MIT License
121 | Copyright (c) 2018 François Andrieux
122 | 
123 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
124 | 
125 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
126 | 
127 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
128 | 
129 | 


--------------------------------------------------------------------------------
/benchmarks/benchmark.log:
--------------------------------------------------------------------------------
  1 | Creating dummy data for size=100
  2 | ======================================
  3 | Redundancy: 1.5
  4 | Systematic: True
  5 | Filesize: 100 bytes
  6 | Blocks: 1
  7 | Drops: 1
  8 | 
  9 | Generating graph...
 10 | Ready for encoding.
 11 | -- Encoding: 1/1 - 100.00% symbols at 625.00 MB/s       ~0.00s
 12 | ----- Correctly dropped 1 symbols (packet size=65536)
 13 | Graph built back. Ready for decoding.
 14 | 
 15 | ----- Solved Blocks  1/ 1 --
 16 | Wrote 100 bytes in dummy-copy
 17 | 
 18 | Creating dummy data for size=102400
 19 | ======================================
 20 | Redundancy: 1.5
 21 | Systematic: True
 22 | Filesize: 102400 bytes
 23 | Blocks: 2
 24 | Drops: 3
 25 | 
 26 | Generating graph...
 27 | Ready for encoding.
 28 | -- Encoding: 3/3 - 100.00% symbols at 170.80 MB/s       ~0.00s
 29 | ----- Correctly dropped 3 symbols (packet size=65536)
 30 | Graph built back. Ready for decoding.
 31 | -- Decoding: 2/2 - 100.00% symbols at 1250.00 MB/s       ~0.00s
 32 | ----- Solved Blocks  2/ 2 --
 33 | Wrote 102400 bytes in dummy-copy
 34 | 
 35 | Creating dummy data for size=1048576
 36 | ======================================
 37 | Redundancy: 1.5
 38 | Systematic: True
 39 | Filesize: 1048576 bytes
 40 | Blocks: 16
 41 | Drops: 24
 42 | 
 43 | Generating graph...
 44 | Ready for encoding.
 45 | -- Encoding: 24/24 - 100.00% symbols at 715.47 MB/s       ~0.00s
 46 | ----- Correctly dropped 24 symbols (packet size=65536)
 47 | Graph built back. Ready for decoding.
 48 | -- Decoding: 16/16 - 100.00% symbols at 10000.00 MB/s       ~0.00s
 49 | ----- Solved Blocks 16/16 --
 50 | Wrote 983040 bytes in dummy-copy
 51 | 
 52 | Creating dummy data for size=104857600
 53 | ======================================
 54 | Redundancy: 1.5
 55 | Systematic: True
 56 | Filesize: 104857600 bytes
 57 | Blocks: 1600
 58 | Drops: 2400
 59 | 
 60 | Generating graph...
 61 | Ready for encoding.
 62 | -- Encoding: 2400/2400 - 100.00% symbols at 724.93 MB/s       ~0.21s
 63 | ----- Correctly dropped 2400 symbols (packet size=65536)
 64 | Graph built back. Ready for decoding.
 65 | -- Decoding: 1600/1600 - 100.00% symbols at 323.91 MB/s       ~0.31s
 66 | ----- Solved Blocks 1600/1600 --
 67 | Wrote 104792064 bytes in dummy-copy
 68 | 
 69 | Creating dummy data for size=419430400
 70 | ======================================
 71 | Redundancy: 1.5
 72 | Systematic: True
 73 | Filesize: 419430400 bytes
 74 | Blocks: 6400
 75 | Drops: 9600
 76 | 
 77 | Generating graph...
 78 | Ready for encoding.
 79 | -- Encoding: 9600/9600 - 100.00% symbols at 674.76 MB/s       ~0.89s
 80 | ----- Correctly dropped 9600 symbols (packet size=65536)
 81 | Graph built back. Ready for decoding.
 82 | -- Decoding: 6400/6400 - 100.00% symbols at 99.50 MB/s       ~4.02s
 83 | ----- Solved Blocks 6400/6400 --
 84 | Wrote 419364864 bytes in dummy-copy
 85 | 
 86 | Creating dummy data for size=838860800
 87 | ======================================
 88 | Redundancy: 1.5
 89 | Systematic: True
 90 | Filesize: 838860800 bytes
 91 | Blocks: 12800
 92 | Drops: 19200
 93 | 
 94 | Generating graph...
 95 | Ready for encoding.
 96 | -- Encoding: 19200/19200 - 100.00% symbols at 525.49 MB/s       ~2.28s
 97 | ----- Correctly dropped 19200 symbols (packet size=65536)
 98 | Graph built back. Ready for decoding.
 99 | -- Decoding: 12800/12800 - 100.00% symbols at 48.95 MB/s       ~16.34s
100 | ----- Solved Blocks 12800/12800 --
101 | Wrote 838795264 bytes in dummy-copy
102 | 
103 | Creating dummy data for size=1258291200
104 | ======================================
105 | Redundancy: 1.5
106 | Systematic: True
107 | Filesize: 1258291200 bytes
108 | Blocks: 19200
109 | Drops: 28800
110 | 
111 | Generating graph...
112 | Ready for encoding.
113 | -- Encoding: 28800/28800 - 100.00% symbols at 465.99 MB/s       ~3.86s
114 | ----- Correctly dropped 28800 symbols (packet size=65536)
115 | Graph built back. Ready for decoding.
116 | -- Decoding: 19200/19200 - 100.00% symbols at 30.13 MB/s       ~39.82s
117 | ----- Solved Blocks 19200/19200 --
118 | Wrote 1258225664 bytes in dummy-copy
119 | 
120 | Creating dummy data for size=1264867868
121 | ======================================
122 | Redundancy: 1.5
123 | Systematic: True
124 | Filesize: 1264867868 bytes
125 | Blocks: 19301
126 | Drops: 28951
127 | 
128 | Generating graph...
129 | Ready for encoding.
130 | -- Encoding: 28951/28951 - 100.00% symbols at 543.55 MB/s       ~3.33s
131 | ----- Correctly dropped 28951 symbols (packet size=65536)
132 | Graph built back. Ready for decoding.
133 | -- Decoding: 19301/19301 - 100.00% symbols at 30.67 MB/s       ~39.33s
134 | ----- Solved Blocks 19301/19301 --
135 | Wrote 1264867868 bytes in dummy-copy
136 | 
137 | Creating dummy data for size=2097152000
138 | ======================================
139 | Redundancy: 1.5
140 | Systematic: True
141 | Filesize: 2097152000 bytes
142 | Blocks: 32000
143 | Drops: 48000
144 | 
145 | Generating graph...
146 | Ready for encoding.
147 | -- Encoding: 48000/48000 - 100.00% symbols at 466.18 MB/s       ~6.44s
148 | ----- Correctly dropped 48000 symbols (packet size=65536)
149 | Graph built back. Ready for decoding.
150 | -- Decoding: 32000/32000 - 100.00% symbols at 19.21 MB/s       ~104.10s
151 | ----- Solved Blocks 32000/32000 --
152 | Wrote 2097086464 bytes in dummy-copy
153 | 
154 | Creating dummy data for size=2516582400
155 | ======================================
156 | Redundancy: 1.5
157 | Systematic: True
158 | Filesize: 2516582400 bytes
159 | Blocks: 38400
160 | Drops: 57600
161 | 
162 | Generating graph...
163 | Ready for encoding.
164 | -- Encoding: 57600/57600 - 100.00% symbols at 420.50 MB/s       ~8.56s
165 | ----- Correctly dropped 57600 symbols (packet size=65536)
166 | Graph built back. Ready for decoding.
167 | -- Decoding: 38400/38400 - 100.00% symbols at 15.61 MB/s       ~153.71s
168 | ----- Solved Blocks 38400/38400 --
169 | Wrote 2516516864 bytes in dummy-copy
170 | 
171 | Creating dummy data for size=2931315179
172 | ======================================
173 | Redundancy: 1.5
174 | Systematic: True
175 | Filesize: 2931315179 bytes
176 | Blocks: 44729
177 | Drops: 67093
178 | 
179 | Generating graph...
180 | Ready for encoding.
181 | -- Encoding: 67093/67093 - 100.00% symbols at 495.22 MB/s       ~8.47s
182 | ----- Correctly dropped 67093 symbols (packet size=65536)
183 | Graph built back. Ready for decoding.
184 | -- Decoding: 44729/44729 - 100.00% symbols at 10.41 MB/s       ~268.49s
185 | ----- Solved Blocks 44729/44729 --
186 | Wrote 2931315179 bytes in dummy-copy
187 | 
188 | Creating dummy data for size=3774873600
189 | ======================================
190 | Redundancy: 1.5
191 | Systematic: True
192 | Filesize: 3774873600 bytes
193 | Blocks: 57600
194 | Drops: 86400
195 | 
196 | Generating graph...
197 | Ready for encoding.
198 | -- Encoding: 86400/86400 - 100.00% symbols at 233.35 MB/s       ~23.14s
199 | ----- Correctly dropped 86400 symbols (packet size=65536)
200 | Graph built back. Ready for decoding.
201 | -- Decoding: 57600/57600 - 100.00% symbols at 8.44 MB/s       ~426.36s
202 | 
203 | ----- Solved Blocks 57600/57600 --
204 | Wrote 3774808064 bytes in dummy-copy
205 | 


--------------------------------------------------------------------------------
/benchmarks/benchmark.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | # Sizes: 100B, 100KB, 1MB, 100MB, 400MB, 800MB, 1.17GB, 1.56GB, 1.95GB, 2.34GB, 2.73GB, 3.51GB
 4 | sizes="100 102400 1048576 104857600 419430400 838860800 1258291200 1264867868 2097152000 2516582400 2931315179 3774873600"
 5 | for SIZE in $sizes
 6 | do
 7 |     echo -e "\nCreating dummy data for size=$SIZE"
 8 |     echo "======================================"
 9 |     echo "" > benchmark.log
10 |     head -c $SIZE /dev/urandom > dummy
11 |     python ../lt_codes.py dummy -r 1.5 --systematic
12 | done
13 | 
14 | rm dummy
15 | rm dummy-copy
16 | 


--------------------------------------------------------------------------------
/core.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import sys
 3 | import math
 4 | import time
 5 | import numpy as np
 6 | import random
 7 | from random import choices
 8 | 
 9 | SYSTEMATIC = False
10 | VERBOSE = False
11 | PACKET_SIZE = 65536
12 | # PACKET_SIZE = 32768
13 | # PACKET_SIZE = 16384
14 | # PACKET_SIZE = 4096
15 | # PACKET_SIZE = 1024
16 | # PACKET_SIZE = 512
17 | # PACKET_SIZE = 128
18 | ROBUST_FAILURE_PROBABILITY = 0.01
19 | NUMPY_TYPE = np.uint64
20 | # NUMPY_TYPE = np.uint32
21 | # NUMPY_TYPE = np.uint16
22 | # NUMPY_TYPE = np.uint8
23 | EPSILON = 0.0001
24 | 
25 | class Symbol:
26 |     __slots__ = ["index", "degree", "data", "neighbors"] # fixing attributes may reduce memory usage
27 | 
28 |     def __init__(self, index, degree, data):
29 |         self.index = index
30 |         self.degree = degree
31 |         self.data = data
32 | 
33 |     def log(self, blocks_quantity):
34 |         neighbors, _ = generate_indexes(self.index, self.degree, blocks_quantity)
35 |         print("symbol_{} degree={}\t {}".format(self.index, self.degree, neighbors))
36 | 
37 | def generate_indexes(symbol_index, degree, blocks_quantity):
38 |     """Randomly get `degree` indexes, given the symbol index as a seed
39 | 
40 |     Generating with a seed allows saving only the seed (and the amount of degrees) 
41 |     and not the whole array of indexes. That saves memory, but also bandwidth when paquets are sent.
42 | 
43 |     The random indexes need to be unique because the decoding process uses dictionnaries for performance enhancements.
44 |     Additionnally, even if XORing one block with itself among with other is not a problem for the algorithm, 
45 |     it is better to avoid uneffective operations like that.
46 | 
47 |     To be sure to get the same random indexes, we need to pass 
48 |     """
49 |     if SYSTEMATIC and symbol_index < blocks_quantity:
50 |         indexes = [symbol_index]               
51 |         degree = 1     
52 |     else:
53 |         random.seed(symbol_index)
54 |         indexes = random.sample(range(blocks_quantity), degree)
55 | 
56 |     return indexes, degree
57 | 
58 | def log(process, iteration, total, start_time):
59 |     """Log the processing in a gentle way, each seconds"""
60 |     global log_actual_time
61 |     
62 |     if "log_actual_time" not in globals():
63 |         log_actual_time = time.time()
64 | 
65 |     if time.time() - log_actual_time > 1 or iteration == total - 1:
66 |         
67 |         log_actual_time = time.time()
68 |         elapsed = log_actual_time - start_time + EPSILON
69 |         speed = (iteration + 1) / elapsed * PACKET_SIZE / (1024 * 1024)
70 | 
71 |         print("-- {}: {}/{} - {:.2%} symbols at {:.2f} MB/s       ~{:.2f}s".format(
72 |             process, iteration + 1, total, (iteration + 1) / total, speed, elapsed), end="\r", flush=True)


--------------------------------------------------------------------------------
/decoder.py:
--------------------------------------------------------------------------------
  1 | from core import *
  2 | 
  3 | def recover_graph(symbols, blocks_quantity):
  4 |     """ Get back the same random indexes (or neighbors), thanks to the symbol id as seed.
  5 |     For an easy implementation purpose, we register the indexes as property of the Symbols objects.
  6 |     """
  7 | 
  8 |     for symbol in symbols:
  9 |         
 10 |         neighbors, deg = generate_indexes(symbol.index, symbol.degree, blocks_quantity)
 11 |         symbol.neighbors = {x for x in neighbors}
 12 |         symbol.degree = deg
 13 | 
 14 |         if VERBOSE:
 15 |             symbol.log(blocks_quantity)
 16 | 
 17 |     return symbols
 18 | 
 19 | def reduce_neighbors(block_index, blocks, symbols):
 20 |     """ Loop over the remaining symbols to find for a common link between 
 21 |     each symbol and the last solved block `block`
 22 | 
 23 |     To avoid increasing complexity and another for loop, the neighbors are stored as dictionnary
 24 |     which enable to directly delete the entry after XORing back.
 25 |     """
 26 |     
 27 |     for other_symbol in symbols:
 28 |         if other_symbol.degree > 1 and block_index in other_symbol.neighbors:
 29 |         
 30 |             # XOR the data and remove the index from the neighbors
 31 |             other_symbol.data = np.bitwise_xor(blocks[block_index], other_symbol.data)
 32 |             other_symbol.neighbors.remove(block_index)
 33 | 
 34 |             other_symbol.degree -= 1
 35 |             
 36 |             if VERBOSE:
 37 |                 print("XOR block_{} with symbol_{} :".format(block_index, other_symbol.index), list(other_symbol.neighbors.keys())) 
 38 | 
 39 | 
 40 | def decode(symbols, blocks_quantity):
 41 |     """ Iterative decoding - Decodes all the passed symbols to build back the data as blocks. 
 42 |     The function returns the data at the end of the process.
 43 |     
 44 |     1. Search for an output symbol of degree one
 45 |         (a) If such an output symbol y exists move to step 2.
 46 |         (b) If no output symbols of degree one exist, iterative decoding exits and decoding fails.
 47 |     
 48 |     2. Output symbol y has degree one. Thus, denoting its only neighbour as v, the
 49 |         value of v is recovered by setting v = y.
 50 | 
 51 |     3. Update.
 52 | 
 53 |     4. If all k input symbols have been recovered, decoding is successful and iterative
 54 |         decoding ends. Otherwise, go to step 1.
 55 |     """
 56 | 
 57 |     symbols_n = len(symbols)
 58 |     assert symbols_n > 0, "There are no symbols to decode."
 59 | 
 60 |     # We keep `blocks_n` notation and create the empty list
 61 |     blocks_n = blocks_quantity
 62 |     blocks = [None] * blocks_n
 63 | 
 64 |     # Recover the degrees and associated neighbors using the seed (the index, cf. encoding).
 65 |     symbols = recover_graph(symbols, blocks_n)
 66 |     print("Graph built back. Ready for decoding.", flush=True)
 67 |     
 68 |     solved_blocks_count = 0
 69 |     iteration_solved_count = 0
 70 |     start_time = time.time()
 71 |     
 72 |     while iteration_solved_count > 0 or solved_blocks_count == 0:
 73 |     
 74 |         iteration_solved_count = 0
 75 | 
 76 |         # Search for solvable symbols
 77 |         for i, symbol in enumerate(symbols):
 78 | 
 79 |             # Check the current degree. If it's 1 then we can recover data
 80 |             if symbol.degree == 1: 
 81 | 
 82 |                 iteration_solved_count += 1 
 83 |                 block_index = next(iter(symbol.neighbors)) 
 84 |                 symbols.pop(i)
 85 | 
 86 |                 # This symbol is redundant: another already helped decoding the same block
 87 |                 if blocks[block_index] is not None:
 88 |                     continue
 89 | 
 90 |                 blocks[block_index] = symbol.data
 91 | 
 92 |                 if VERBOSE:
 93 |                     print("Solved block_{} with symbol_{}".format(block_index, symbol.index))
 94 |               
 95 |                 # Update the count and log the processing
 96 |                 solved_blocks_count += 1
 97 |                 log("Decoding", solved_blocks_count, blocks_n, start_time)
 98 | 
 99 |                 # Reduce the degrees of other symbols that contains the solved block as neighbor             
100 |                 reduce_neighbors(block_index, blocks, symbols)
101 | 
102 |     print("\n----- Solved Blocks {:2}/{:2} --".format(solved_blocks_count, blocks_n))
103 | 
104 |     return np.asarray(blocks), solved_blocks_count


--------------------------------------------------------------------------------
/distributions.py:
--------------------------------------------------------------------------------
 1 | from core import *
 2 | 
 3 | def ideal_distribution(N):
 4 |     """ Create the ideal soliton distribution. 
 5 |     In practice, this distribution gives not the best results
 6 |     Cf. https://en.wikipedia.org/wiki/Soliton_distribution
 7 |     """
 8 | 
 9 |     probabilities = [0, 1 / N]
10 |     probabilities += [1 / (k * (k - 1)) for k in range(2, N+1)]
11 |     probabilities_sum = sum(probabilities)
12 | 
13 |     assert probabilities_sum >= 1 - EPSILON and probabilities_sum <= 1 + EPSILON, "The ideal distribution should be standardized"
14 |     return probabilities
15 | 
16 | def robust_distribution(N):
17 |     """ Create the robust soliton distribution. 
18 |     This fixes the problems of the ideal distribution
19 |     Cf. https://en.wikipedia.org/wiki/Soliton_distribution
20 |     """
21 | 
22 |     # The choice of M is not a part of the distribution ; it may be improved
23 |     # We take the median and add +1 to avoid possible division by zero 
24 |     M = N // 2 + 1 
25 |     R = N / M
26 | 
27 |     extra_proba = [0] + [1 / (i * M) for i in range(1, M)]
28 |     extra_proba += [math.log(R / ROBUST_FAILURE_PROBABILITY) / M]  # Spike at M
29 |     extra_proba += [0 for k in range(M+1, N+1)]
30 | 
31 |     probabilities = np.add(extra_proba, ideal_distribution(N))
32 |     probabilities /= np.sum(probabilities)
33 |     probabilities_sum = np.sum(probabilities)
34 | 
35 |     assert probabilities_sum >= 1 - EPSILON and probabilities_sum <= 1 + EPSILON, "The robust distribution should be standardized"
36 |     return probabilities
37 | 


--------------------------------------------------------------------------------
/encoder.py:
--------------------------------------------------------------------------------
 1 | from core import *
 2 | from distributions import *
 3 | 
 4 | def get_degrees_from(distribution_name, N, k):
 5 |     """ Returns the random degrees from a given distribution of probabilities.
 6 |     The degrees distribution must look like a Poisson distribution and the 
 7 |     degree of the first drop is 1 to ensure the start of decoding.
 8 |     """
 9 | 
10 |     if distribution_name == "ideal":
11 |         probabilities = ideal_distribution(N)
12 |     elif distribution_name == "robust":
13 |         probabilities = robust_distribution(N)
14 |     else:
15 |         probabilities = None
16 |     
17 |     population = list(range(0, N+1))
18 |     return [1] + choices(population, probabilities, k=k-1)
19 |    
20 | def encode(blocks, drops_quantity):
21 |     """ Iterative encoding - Encodes new symbols and yield them.
22 |     Encoding one symbol is described as follow:
23 | 
24 |     1.  Randomly choose a degree according to the degree distribution, save it into "deg"
25 |         Note: below we prefer to randomly choose all the degrees at once for our symbols.
26 | 
27 |     2.  Choose uniformly at random 'deg' distinct input blocs. 
28 |         These blocs are also called "neighbors" in graph theory.
29 |     
30 |     3.  Compute the output symbol as the combination of the neighbors.
31 |         In other means, we XOR the chosen blocs to produce the symbol.
32 |     """
33 | 
34 |     # Display statistics
35 |     blocks_n = len(blocks)
36 |     assert blocks_n <= drops_quantity, "Because of the unicity in the random neighbors, it is need to drop at least the same amount of blocks"
37 | 
38 |     print("Generating graph...")
39 |     start_time = time.time()
40 | 
41 |     # Generate random indexes associated to random degrees, seeded with the symbol id
42 |     random_degrees = get_degrees_from("robust", blocks_n, k=drops_quantity)
43 | 
44 |     print("Ready for encoding.", flush=True)
45 | 
46 |     for i in range(drops_quantity):
47 |         
48 |         # Get the random selection, generated precedently (for performance)
49 |         selection_indexes, deg = generate_indexes(i, random_degrees[i], blocks_n)
50 | 
51 |         # Xor each selected array within each other gives the drop (or just take one block if there is only one selected)
52 |         drop = blocks[selection_indexes[0]]
53 |         for n in range(1, deg):
54 |             drop = np.bitwise_xor(drop, blocks[selection_indexes[n]])
55 |             # drop = drop ^ blocks[selection_indexes[n]] # according to my tests, this has the same performance
56 | 
57 |         # Create symbol, then log the process
58 |         symbol = Symbol(index=i, degree=deg, data=drop)
59 |         
60 |         if VERBOSE:
61 |             symbol.log(blocks_n)
62 | 
63 |         log("Encoding", i, drops_quantity, start_time)
64 | 
65 |         yield symbol
66 | 
67 |     print("\n----- Correctly dropped {} symbols (packet size={})".format(drops_quantity, PACKET_SIZE))
68 | 


--------------------------------------------------------------------------------
/lt_codes.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # -*- coding: utf-8 -*-
  3 | 
  4 | import os
  5 | import math
  6 | import argparse
  7 | import numpy as np
  8 | import core
  9 | from encoder import encode
 10 | from decoder import decode
 11 | 
 12 | def blocks_read(file, filesize):
 13 |     """ Read the given file by blocks of `core.PACKET_SIZE` and use np.frombuffer() improvement.
 14 | 
 15 |     Byt default, we store each octet into a np.uint8 array space, but it is also possible
 16 |     to store up to 8 octets together in a np.uint64 array space.  
 17 |     
 18 |     This process is not saving memory but it helps reduce dimensionnality, especially for the 
 19 |     XOR operation in the encoding. Example:
 20 |     * np.frombuffer(b'\x01\x02', dtype=np.uint8) => array([1, 2], dtype=uint8)
 21 |     * np.frombuffer(b'\x01\x02', dtype=np.uint16) => array([513], dtype=uint16)
 22 |     """
 23 | 
 24 |     blocks_n = math.ceil(filesize / core.PACKET_SIZE)
 25 |     blocks = []
 26 | 
 27 |     # Read data by blocks of size core.PACKET_SIZE
 28 |     for i in range(blocks_n):
 29 |             
 30 |         data = bytearray(file.read(core.PACKET_SIZE))
 31 | 
 32 |         if not data:
 33 |             raise "stop"
 34 | 
 35 |         # The last read bytes needs a right padding to be XORed in the future
 36 |         if len(data) != core.PACKET_SIZE:
 37 |             data = data + bytearray(core.PACKET_SIZE - len(data))
 38 |             assert i == blocks_n-1, "Packet #{} has a not handled size of {} bytes".format(i, len(blocks[i]))
 39 | 
 40 |         # Paquets are condensed in the right array type
 41 |         blocks.append(np.frombuffer(data, dtype=core.NUMPY_TYPE))
 42 | 
 43 |     return blocks
 44 | 
 45 | def blocks_write(blocks, file, filesize):
 46 |     """ Write the given blocks into a file
 47 |     """
 48 | 
 49 |     count = 0
 50 |     for data in recovered_blocks[:-1]:
 51 |         file_copy.write(data)
 52 |         count += len(data)
 53 | 
 54 |     # Convert back the bytearray to bytes and shrink back 
 55 |     last_bytes = bytes(recovered_blocks[-1])
 56 |     shrinked_data = last_bytes[:filesize % core.PACKET_SIZE]
 57 |     file_copy.write(shrinked_data)
 58 | 
 59 | #########################################################
 60 |     
 61 | if __name__ == "__main__":
 62 |     
 63 |     parser = argparse.ArgumentParser(description="Robust implementation of LT Codes encoding/decoding process.")
 64 |     parser.add_argument("filename", help="file path of the file to split in blocks")
 65 |     parser.add_argument("-r", "--redundancy", help="the wanted redundancy.", default=2.0, type=float)
 66 |     parser.add_argument("--systematic", help="ensure that the k first drops are exactaly the k first blocks (systematic LT Codes)", action="store_true")
 67 |     parser.add_argument("--verbose", help="increase output verbosity", action="store_true")
 68 |     parser.add_argument("--x86", help="avoid using np.uint64 for x86-32bits systems", action="store_true")
 69 |     args = parser.parse_args()
 70 | 
 71 |     core.NUMPY_TYPE = np.uint32 if args.x86 else core.NUMPY_TYPE
 72 |     core.SYSTEMATIC = True if args.systematic else core.SYSTEMATIC 
 73 |     core.VERBOSE = True if args.verbose else core.VERBOSE    
 74 | 
 75 |     with open(args.filename, "rb") as file:
 76 | 
 77 |         print("Redundancy: {}".format(args.redundancy))
 78 |         print("Systematic: {}".format(core.SYSTEMATIC))
 79 | 
 80 |         filesize = os.path.getsize(args.filename)
 81 |         print("Filesize: {} bytes".format(filesize))
 82 | 
 83 |         # Splitting the file in blocks & compute drops
 84 |         file_blocks = blocks_read(file, filesize)
 85 |         file_blocks_n = len(file_blocks)
 86 |         drops_quantity = int(file_blocks_n * args.redundancy)
 87 | 
 88 |         print("Blocks: {}".format(file_blocks_n))
 89 |         print("Drops: {}\n".format(drops_quantity))
 90 | 
 91 |         # Generating symbols (or drops) from the blocks
 92 |         file_symbols = []
 93 |         for curr_symbol in encode(file_blocks, drops_quantity=drops_quantity):
 94 |             file_symbols.append(curr_symbol)
 95 | 
 96 |         # HERE: Simulating the loss of packets?
 97 | 
 98 |         # Recovering the blocks from symbols
 99 |         recovered_blocks, recovered_n = decode(file_symbols, blocks_quantity=file_blocks_n)
100 |         
101 |         if core.VERBOSE:
102 |             print(recovered_blocks)
103 |             print("------ Blocks :  \t-----------")
104 |             print(file_blocks)
105 | 
106 |         if recovered_n != file_blocks_n:
107 |             print("All blocks are not recovered, we cannot proceed the file writing")
108 |             exit()
109 | 
110 |         splitted = args.filename.split(".")
111 |         if len(splitted) > 1:
112 |             filename_copy = "".join(splitted[:-1]) + "-copy." + splitted[-1] 
113 |         else:
114 |             filename_copy = args.filename + "-copy"
115 | 
116 |         # Write down the recovered blocks in a copy 
117 |         with open(filename_copy, "wb") as file_copy:
118 |             blocks_write(recovered_blocks, file_copy, filesize)
119 | 
120 |         print("Wrote {} bytes in {}".format(os.path.getsize(filename_copy), filename_copy))
121 | 
122 | 
123 | 


--------------------------------------------------------------------------------
/md5_checker.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | FILE=$1
3 | python lt_codes.py $FILE --systematic
4 | md5sum "${FILE}"
5 | md5sum "${FILE%%.*}-copy.${FILE#*.}"


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.16.4


--------------------------------------------------------------------------------