├── .gitignore
├── README.md
├── dat
│   ├── firmware_download_list.csv
│   └── firmware_ftp_list.csv
├── figs
│   ├── firmware_arch_distribution.jpg
│   └── firmware_os_distribution.jpg
└── src
    ├── fw_downloader.py
    ├── fw_unpacker.py
    └── main.py
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | fws
3 | binwalk
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Firmware-Dataset
2 |
3 |
4 | ## Introduction
5 | We collected 16.9 TB of firmware images from official vendor websites, open FTP sites, and open-source repositories. So far, 157,141 firmware images (about 6 TB) from 204 vendors have been pre-processed. These images belong to products that are common in consumer markets, such as networking devices, cameras, and smart home devices. Pre-processing of the remaining firmware images is still in progress because it requires a large amount of computation. The pre-processed firmware images are openly available for research purposes; Fig.1 shows their distribution by OS type and architecture type. We will continue to update this repository as we collect more firmware images.
6 |
7 |
8 |
9 |
10 | Fig.1. Firmware distribution in terms of OS (left) and architecture (right).
11 |
12 |
13 |
14 | ## Usage
15 |
16 | - Firmware download links can be found in *[firmware_download_list.csv](dat/firmware_download_list.csv)* and *[firmware_ftp_list.csv](dat/firmware_ftp_list.csv)*.
17 |
18 | - Firmware images can be downloaded using tools like *wget* or using our script *[fw_downloader.py](src/fw_downloader.py)*.
19 |
20 | - Downloaded firmware images can be unpacked with *[binwalk](https://github.com/ReFirmLabs/binwalk/tree/master)*; see *[fw_unpacker.py](src/fw_unpacker.py)* for an example invocation.
21 |
22 | - *[main.py](src/main.py)* downloads the firmware images and then unpacks them in a single run.
23 |
24 |
25 | ## Project Structure
26 | ```
27 | ├── README.md
28 | ├── figs
29 | │   ├── firmware_arch_distribution.jpg
30 | │   └── firmware_os_distribution.jpg
31 | ├── dat
32 | │   ├── firmware_download_list.csv
33 | │   └── firmware_ftp_list.csv
34 | └── src
35 |     ├── fw_downloader.py
36 |     ├── fw_unpacker.py
37 |     └── main.py
38 | ```
39 |
40 |
41 | ## Citation
42 | If you find our dataset helpful, please consider citing our papers.
43 |
44 | ```
45 | @inproceedings{wu2024firmware,
46 | title={Your Firmware Has Arrived: A Study of Firmware Update Vulnerabilities},
47 | author={Wu, Yuhao and Wang, Jinwen and Wang, Yujie and Zhai, Shixuan and Li, Zihan and He, Yi and Sun, Kun and Li, Qi and Zhang, Ning},
48 | booktitle={USENIX Security Symposium},
49 | year={2024}
50 | }
51 | ```
52 |
53 | ```
54 | @inproceedings{wu2022measuring,
55 | title={Work-in-Progress: Measuring Security Protection in Real-time Embedded Firmware},
56 | author={Wu, Yuhao and Wang, Yujie and Zhai, Shixuan and Li, Zihan and Li, Ao and Wang, Jinwen and Zhang, Ning},
57 | booktitle={IEEE Real-Time Systems Symposium (RTSS)},
58 | year={2022}
59 | }
60 | ```
61 |
62 |
--------------------------------------------------------------------------------
/dat/firmware_ftp_list.csv:
--------------------------------------------------------------------------------
1 | vendor,url
2 | zyxel,ftp.zyxel.lv
3 | zyxel,ftp.zyxel.com.tr
4 | zyxel,ftp.zyxel.com
5 | weintek,ftp.weintek.com
6 | tyan,ftp.tyan.com
7 | tiger,tiger.satsale.net
8 | simet,ftp.simet.com.tr
9 | sangoma,ftp.sangoma.com
10 | rinotel,rinotel.com
11 | ral,ftp.ral.ro
12 | proinit,188.138.149.64
13 | pctvsystems,ftp.pctvsystems.com
14 | partner-tech,ftp.partner-tech.eu
15 | netgear,downloads.netgear.com
16 | multitech,ftp.multitech.com
17 | luis,ftp.luis.ru
18 | loks,ftp.loks.lv
19 | infinet,ftp.infinet.ru
20 | geoteam,ftp.geoteam.dk
21 | eutronix,ftp.eutronix.be
22 | epson,download.epson-europe.com
23 | draytek,ftp.draytek.com
24 | dlink,ftp.dlink.ca
25 | dlink,ftp.dlink.by
26 | detewe,aux.detewe.ru
27 | depo-computers,ftp.depo.ru
28 | dd-wrt,ftp.dd-wrt.com
29 | d-link,ftp2.dlink.com
30 | d-link,ftp.d-link.co.za
31 | d-link,ftp.dlink.ru
32 | d-link,ftp.dlink.eu
33 | avm,ftp.avm.de
34 | mamont (FTP search engine),https://www.mmnt.ru/int/get?in=f&st=.bin&qw=firmware
--------------------------------------------------------------------------------
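The FTP list above is a two-column `vendor,url` CSV in which one vendor can appear on several rows. As a minimal sketch of how it could be consumed, the hypothetical helper `load_ftp_hosts` below (not part of the repository) groups hosts by vendor. It assumes the header row is always present and silently ignores any fields beyond the second column.

```python
import csv
import io
from collections import defaultdict


def load_ftp_hosts(csv_text):
    """Map each vendor to its list of hosts from 'vendor,url' CSV text."""
    hosts = defaultdict(list)
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the "vendor,url" header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete rows
        vendor, url = row[0].strip(), row[1].strip()
        if url not in hosts[vendor]:  # drop repeated hosts for the same vendor
            hosts[vendor].append(url)
    return dict(hosts)


# A few rows copied from firmware_ftp_list.csv
sample = """vendor,url
zyxel,ftp.zyxel.lv
zyxel,ftp.zyxel.com
avm,ftp.avm.de
"""
print(load_ftp_hosts(sample))
```

To process the real file, read it with `open("dat/firmware_ftp_list.csv", newline="").read()` and pass the text to the helper. Note that the list uses both `dlink` and `d-link` as vendor names, so those rows end up under separate keys unless the names are normalized first.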
/figs/firmware_arch_distribution.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WUSTL-CSPL/Firmware-Dataset/f1e6897f92ba552de0c97d9bc48a6215d7b6533c/figs/firmware_arch_distribution.jpg
--------------------------------------------------------------------------------
/figs/firmware_os_distribution.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WUSTL-CSPL/Firmware-Dataset/f1e6897f92ba552de0c97d9bc48a6215d7b6533c/figs/firmware_os_distribution.jpg
--------------------------------------------------------------------------------
/src/fw_downloader.py:
--------------------------------------------------------------------------------
import os

import requests


def download_firmware(url_list, save_path):
    """Download every URL in url_list into the save_path directory."""
    os.makedirs(save_path, exist_ok=True)  # Create the folder if it doesn't exist

    for url in url_list:
        try:
            # Extract the original file name from the URL
            original_filename = os.path.basename(url)
            full_save_path = os.path.join(save_path, original_filename)

            # Stream the download so large images are not held in memory.
            # requests has no default timeout, so set one to avoid hanging
            # forever on a stalled server.
            response = requests.get(url, stream=True, timeout=60)
            response.raise_for_status()  # Check for HTTP errors

            with open(full_save_path, 'wb') as file:
                for chunk in response.iter_content(chunk_size=8192):
                    if chunk:  # Filter out keep-alive new chunks
                        file.write(chunk)
            print(f"Firmware downloaded successfully and saved to {full_save_path}")

        except requests.exceptions.RequestException as e:
            print(f"Failed to download the firmware from {url}: {e}")


if __name__ == '__main__':
    firmware_urls = [
        "https://static.tp-link.com/TL-WR940N(US)_V4_160617_1476690524248q.zip",
        "https://static.tp-link.com/resources/software/TL-WR1043ND_V1_140319.zip",
        "https://static.tp-link.com/resources/software/TL-WA801ND_V1_130131_beta.zip"
    ]
    save_path = "../fws"
    download_firmware(firmware_urls, save_path)
--------------------------------------------------------------------------------
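`download_firmware` names each file with `os.path.basename(url)`, which keeps any query string in the name (for a URL such as `https://www.mmnt.ru/int/get?in=f&st=.bin&qw=firmware` it yields `get?in=f&st=.bin&qw=firmware`). One possible way to harden this is sketched below. The helper `filename_from_url` and its `default` fallback name are hypothetical and not part of the repository.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="firmware.bin"):
    """Return the last path segment of url, ignoring query and fragment."""
    path = urlsplit(url).path                 # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))  # decode %XX escapes, keep last segment
    return name or default                    # fall back when the path ends in "/"


print(filename_from_url(
    "https://static.tp-link.com/resources/software/TL-WR1043ND_V1_140319.zip?v=2"
))
```

The helper only cleans up the name. Two different URLs can still map to the same file name, so a caller that downloads from many vendors may want to store files in per-vendor subdirectories.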
/src/fw_unpacker.py:
--------------------------------------------------------------------------------
import os
import subprocess


def unpack_firmware(save_path):
    """Run binwalk on every file found under save_path."""
    for root, _, files in os.walk(save_path):
        for file in files:
            file_path = os.path.join(root, file)
            try:
                # -M: scan extracted files recursively, -r: delete carved files
                # after extraction, -e: extract known file types
                subprocess.run(['binwalk', '-Mre', '--directory', save_path, file_path], check=True)  # Run binwalk for unpacking
                print(f"Unpacked file using binwalk: {file_path}")
            except FileNotFoundError:
                # subprocess raises this when the binwalk executable is missing
                print("binwalk was not found on PATH; install it before unpacking")
                return
            except subprocess.CalledProcessError as e:
                print(f"Failed to unpack file {file_path} using binwalk: {e}")


if __name__ == '__main__':
    save_path = "../fws"
    unpack_firmware(save_path)
--------------------------------------------------------------------------------
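Because binwalk writes its output into `save_path`, running `unpack_firmware` a second time walks into the results of the first run and tries to unpack them again. The sketch below shows one way to list only the original images. It assumes extraction directories end in `.extracted` (the `_<name>.extracted` naming used by binwalk 2.x; other versions may differ), and `iter_firmware_files` is a hypothetical helper that is not part of the repository.

```python
import os


def iter_firmware_files(save_path, extracted_suffix=".extracted"):
    """Yield files under save_path, skipping earlier extraction directories."""
    for root, dirs, files in os.walk(save_path):
        # Prune in place so os.walk does not descend into extraction output
        dirs[:] = [d for d in dirs if not d.endswith(extracted_suffix)]
        for name in sorted(files):
            yield os.path.join(root, name)
```

The loop in `unpack_firmware` could then iterate over `iter_firmware_files(save_path)` instead of calling `os.walk` directly.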
/src/main.py:
--------------------------------------------------------------------------------
from fw_downloader import download_firmware
from fw_unpacker import unpack_firmware


def main(firmware_urls, save_path):
    """Download the firmware images, then unpack everything in save_path."""
    download_firmware(firmware_urls, save_path)
    unpack_firmware(save_path)


if __name__ == '__main__':
    # Three firmware samples
    firmware_urls = [
        "https://static.tp-link.com/TL-WR940N(US)_V4_160617_1476690524248q.zip",
        "https://static.tp-link.com/resources/software/TL-WR1043ND_V1_140319.zip",
        "https://static.tp-link.com/resources/software/TL-WA801ND_V1_130131_beta.zip"
    ]
    save_path = "../fws"
    main(firmware_urls, save_path)
--------------------------------------------------------------------------------