├── .gitignore
├── README.md
└── filefilter.py


/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Create Small Docker Images of Dynamically Linked Binaries

*For the impatient:*
`docker run perarneng/fortune` *(3.831 MB)*


There have been some blog posts about people generating very small Docker images from statically linked binaries. I wanted to see if I could do the same with dynamically linked binaries. Theory: as long as you provide the files that a process needs, it will run.

To find out which files a program uses, we can start it with `strace`. This gives us the runtime view of the files it needs. To find all the shared libraries that it is dynamically linked with, we can use the `ldd` tool.

First we need to find a suitable program.
For this example I chose `fortune`, which prints a fortune cookie to the command line. I picked it because it is a dynamically linked binary and it also reads data files.

If we capture all the files that the program depends on, we can create a Docker image that contains only those files. The capturing is done inside an existing Debian image, and the resulting image is based on Docker's empty `scratch` base image. The fortune image is less than 4 MB, compared to the Debian image, which weighs in at 85.1 MB.


*Note: Some of the steps are done inside a Docker container, which is why I run them as root.*

### Step 1: Install the program
`$ apt-get install fortune`

### Step 2: Run the program with strace
```shell
root:/# strace -f -o fortune.strace.out /usr/games/fortune
Never be led astray onto the path of virtue.
```
This writes a trace file called `fortune.strace.out` (the `-o` option). The `-f` option tells strace to follow any child processes.

### Step 3: Check for any dynamically linked libraries
```shell
root:/# ldd /usr/games/fortune > fortune.ldd.out
```

### Step 4: Generate a unified list of files
```shell
root:/# ./filefilter.py fortune.strace.out fortune.ldd.out > fortune.file.lst
```
The `filefilter.py` script, which is also included in this repository, parses the output from strace and ldd and prints a single list of paths to the files that the program depends on.

### Step 5: Create an archive of the files
```shell
root:/# cat fortune.file.lst | zip -@ fortune.zip
```
I want a `tar.gz`, but I have not yet found out how to create one that includes the symbolic links as physical files. That's why I run the list through zip first and unpack it in a separate folder, then run tar and compress the result to a file called `fortune.tar.gz`. I want it as a `tar.gz` because it works best with Docker's `ADD` command, which automatically unpacks a local tar archive into the image.
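For reference, GNU tar's `-h` (`--dereference`) option archives the files that symbolic links point to, which may make the zip round trip unnecessary. The same idea in Python, as a minimal sketch assuming Python 3's standard `tarfile` module and the `fortune.file.lst` list from Step 4 (`build_archive` is a hypothetical helper, not part of this repository, and it has not been tested against the image built here):

```python
import os
import tarfile


def build_archive(list_file, archive_path):
    """Pack every path listed in list_file (one per line) into a .tar.gz.

    Hypothetical helper: dereference=True makes tarfile store the file a
    symbolic link points to as a regular file, instead of the link itself.
    """
    with open(list_file) as handle:
        paths = [line.strip() for line in handle if line.strip()]

    with tarfile.open(archive_path, "w:gz", dereference=True) as archive:
        for path in paths:
            # Store "/usr/games/fortune" as "usr/games/fortune" so that
            # "ADD fortune.tar.gz /" unpacks it at the same location.
            archive.add(path, arcname=os.path.relpath(path, "/"))
    return paths
```

Inside the container it would be called as `build_archive("fortune.file.lst", "fortune.tar.gz")`. The zip-based commands actually used for this image follow below.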
```shell
root:/# mkdir fortune_tmp ; cd fortune_tmp; unzip ../fortune.zip
root:/# tar cvfz ../fortune.tar.gz * ; cd .. ; rm -rf fortune_tmp
root:/# tar tvfz fortune.tar.gz
drwxr-xr-x root/root       0 2014-11-15 18:56 etc/
-rw-r--r-- root/root    9021 2014-11-15 16:33 etc/ld.so.cache
drwxr-xr-x root/root       0 2014-11-15 18:56 lib/
drwxr-xr-x root/root       0 2014-11-15 18:56 lib/x86_64-linux-gnu/
-rwxr-xr-x root/root 1603600 2014-10-16 22:45 lib/x86_64-linux-gnu/libc.so.6
drwxr-xr-x root/root       0 2014-11-15 18:56 lib64/
-rwxr-xr-x root/root  136936 2014-10-16 22:45 lib64/ld-linux-x86-64.so.2
drwxr-xr-x root/root       0 2014-11-15 18:56 usr/
drwxr-xr-x root/root       0 2014-11-15 18:56 usr/games/
-rwxr-xr-x root/root   22240 2009-10-01 05:47 usr/games/fortune
drwxr-xr-x root/root       0 2014-11-15 18:56 usr/share/
drwxr-xr-x root/root       0 2014-11-15 18:56 usr/share/games/
drwxr-xr-x root/root       0 2014-11-15 15:56 usr/share/games/fortunes/
-rw-r--r-- root/root     540 2009-10-01 05:47 usr/share/games/fortunes/riddles.dat
-rw-r--r-- root/root   24516 2009-10-01 05:47 usr/share/games/fortunes/fortunes
-rw-r--r-- root/root   53589 2009-10-01 05:47 usr/share/games/fortunes/literature.u8
-rw-r--r-- root/root    1752 2009-10-01 05:47 usr/share/games/fortunes/fortunes.dat
-rw-r--r-- root/root   24516 2009-10-01 05:47 usr/share/games/fortunes/fortunes.u8
-rw-r--r-- root/root    1076 2009-10-01 05:47 usr/share/games/fortunes/literature.dat
-rw-r--r-- root/root   20294 2009-10-01 05:47 usr/share/games/fortunes/riddles
-rw-r--r-- root/root   53589 2009-10-01 05:47 usr/share/games/fortunes/literature
-rw-r--r-- root/root   20294 2009-10-01 05:47 usr/share/games/fortunes/riddles.u8
drwxr-xr-x root/root       0 2014-11-15 18:56 usr/lib/
drwxr-xr-x root/root       0 2014-11-15 18:56 usr/lib/x86_64-linux-gnu/
-rw-r--r-- root/root 1859144 2012-06-06 11:38 usr/lib/x86_64-linux-gnu/librecode.so.0
```
### Step 6: Copy the files out of the running container
```shell
$ docker cp 5d9ffe1ba9ee:/fortune.tar.gz .
```
Here `5d9ffe1ba9ee` is the ID of the Debian container used in the previous steps; `docker ps` lists the IDs of running containers.

### Step 7: Create the Dockerfile
```Dockerfile
FROM scratch
ADD fortune.tar.gz /
CMD ["/usr/games/fortune"]
```
### Step 8: Build the image
```shell
$ docker build -t fortune .
```
**MAKE SURE THAT THE DIRECTORY ONLY CONTAINS:** `fortune.tar.gz` and `Dockerfile`. Everything in the directory is sent to the Docker daemon as the build context.

### Step 9: Run the image
```shell
$ docker run fortune
Expect the worst, it's the least you can do.
```

## Final Notes
The resulting image is about 4 MB, which is much better than the close to 100 MB of the Debian image. Since `strace` only observes the program at runtime, some code paths may not have been executed and some files may never have been opened during the trace, so relying on strace output alone is a little risky. You should probably know what the program uses, depend mostly on ldd, and manually include any other files.
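A minimal sketch of that ldd-first approach, assuming Python 3 and glibc-style ldd output: library paths are taken from `ldd`, and everything else is listed by hand. The helpers `ldd_paths`, `expand_extras` and `build_file_list` are hypothetical and not part of this repository.

```python
import os
import re

# Assumed glibc ldd line formats:
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
#   /lib64/ld-linux-x86-64.so.2 (0x00007f...)
#   linux-vdso.so.1 (0x00007ffd...)   <- virtual, no path, so it never matches
LDD_PATH = re.compile(r"(/\S+) \(0x[0-9a-fA-F]+\)$")


def ldd_paths(ldd_output):
    """Return the absolute library paths mentioned in ldd output."""
    paths = []
    for line in ldd_output.splitlines():
        match = LDD_PATH.search(line.strip())
        if match:
            paths.append(match.group(1))
    return paths


def expand_extras(extras):
    """Turn manually chosen files and directories into a list of files.

    Entries that are neither an existing file nor a directory are skipped.
    """
    files = []
    for extra in extras:
        if os.path.isdir(extra):
            # Include every file below a directory, e.g. a data directory.
            for root, _dirs, names in os.walk(extra):
                files.extend(os.path.join(root, name) for name in names)
        elif os.path.isfile(extra):
            files.append(extra)
    return files


def build_file_list(binary, ldd_output, extras):
    """Binary + its ldd libraries + hand-picked extras, deduplicated and sorted."""
    return sorted(set([binary] + ldd_paths(ldd_output) + expand_extras(extras)))
```

For `fortune`, the extras would presumably be the `/usr/share/games/fortunes` directory plus files such as `/etc/ld.so.cache`, which appear in the archive listing above but not in ldd output.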

The Docker repo is located here: https://registry.hub.docker.com/u/perarneng/fortune/


--------------------------------------------------------------------------------
/filefilter.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
"""Print one deduplicated list of existing files named in strace/ldd output."""

import os
import re
import sys


def is_file(path):
    """Return True if path is an existing regular file (symlinks are followed)."""
    if path is None:
        return False
    return os.path.isfile(path)


def extract_path(pattern, line, group_nr):
    """Return the path captured by group_nr if it names an existing file."""
    match = pattern.search(line)
    if match:
        path = match.group(group_nr)
        if is_file(path):
            return path
    return None


def main():
    if len(sys.argv) < 2:
        print("usage: filefilter.py <strace or ldd output file>...", file=sys.stderr)
        sys.exit(1)

    # strace: open("/etc/ld.so.cache", ...) = 3, or on newer systems
    # openat(AT_FDCWD, "/etc/ld.so.cache", ...) = 3
    strace_pattern = re.compile(
        r'\b(?:open|openat|stat|lstat|newfstatat|execve)\((?:AT_FDCWD, )?"([^"]+)"')
    # ldd: libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
    ldd_pattern = re.compile(r'(/.*) \(0x\w+\)$')

    # A single set across all input files gives one unified, duplicate-free list.
    path_set = set()

    for arg_file in sys.argv[1:]:
        with open(arg_file) as output_file:
            for line in output_file:
                for pattern in (strace_pattern, ldd_pattern):
                    path = extract_path(pattern, line, 1)
                    if path is not None:
                        path_set.add(path)

    for path in sorted(path_set):
        print(path)


if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------