├── .gitignore
├── .gitmodules
├── README.md
├── doc
│   └── VMR_Francesco_Areoluci_Presentation.pdf
├── environment.yml
├── environment_cuda.yml
├── src
│   ├── cl_parser.py
│   ├── dir_handler.py
│   ├── ec_utils.py
│   ├── encoders.py
│   ├── event_converter.py
│   ├── rt_detection.py
│   ├── tbe.py
│   └── test_gen1.py
├── tools
│   ├── change_dataset_path.sh
│   ├── get_bbox_classes.sh
│   └── get_gen1_bboxes.py
└── yolo_config
    ├── gen1-test.data
    ├── gen1.data
    ├── yolov3-gen1.cfg
    └── yolov3-tiny.cfg

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | data
2 | train_events
3 | valid_events
4 | test_events
5 | */__pycache__
6 | rt_detections
7 | 
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "PyTorch-YOLOv3"]
2 |     path = PyTorch-YOLOv3
3 |     url = https://github.com/eriklindernoren/PyTorch-YOLOv3.git
4 | [submodule "prophesee-automotive-dataset-toolbox"]
5 |     path = prophesee-automotive-dataset-toolbox
6 |     url = https://github.com/prophesee-ai/prophesee-automotive-dataset-toolbox.git
7 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Temporal Binary Represented Event Object Detection
2 | 
3 | This repository contains a framework that can be used to perform object detection on events acquired from event cameras (https://en.wikipedia.org/wiki/Event_camera).
4 | To perform the tests, the following technologies and tools have been employed:
5 | 
6 | * Prophesee's GEN1 Automotive Detection Dataset. This dataset contains events and their annotated bounding boxes for two classes: pedestrians and vehicles (https://www.prophesee.ai/2020/01/24/prophesee-gen1-automotive-detection-dataset/)
7 | 
8 | * Temporal Binary Representation. This encoding has been developed to encode events into frames that can be fed to an object detector along with the annotated bounding boxes. For further details, check this paper: https://arxiv.org/pdf/2010.08946.pdf (a minimal sketch of the encoding is shown right after this list)
9 | 
10 | * YOLOv3 as Object Detector
11 | 
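The sketch below illustrates how Temporal Binary Representation collapses N consecutive binary event frames into a single grayscale frame, mirroring the logic of src/tbe.py shown later in this repository; the function name `temporal_binary_encode` is only illustrative and is not part of the codebase.

``` python
import numpy as np

def temporal_binary_encode(stack: np.ndarray) -> np.ndarray:
    """Collapse a (N, H, W) stack of binary event frames into one frame:
    each pixel's N-bit history is read as a binary number and normalized
    to [0, 1], as done by TemporalBinaryEncoding in src/tbe.py."""
    n = stack.shape[0]
    weights = 2 ** np.arange(n).reshape(n, 1, 1)   # frame i carries weight 2^i
    return np.sum(stack * weights, axis=0) / (2 ** n)

# Toy example: 4 accumulation windows of a 2x2 sensor
stack = np.random.randint(0, 2, size=(4, 2, 2))
frame = temporal_binary_encode(stack)              # values in [0, 1)
```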
12 | ## Submodules
13 | 
14 | This repository uses the following repositories as submodules:
15 | * PyTorch YOLOv3 implementation: https://github.com/eriklindernoren/PyTorch-YOLOv3.git
16 | * Prophesee Toolbox: https://github.com/prophesee-ai/prophesee-automotive-dataset-toolbox.git
17 | 
18 | Once this repository has been cloned, run:
19 | > git submodule update --init
20 | 
21 | ## Requirements
22 | 
23 | To run the conversion and the object detection, use the environment.yml file to create a dedicated Conda environment.
24 | Note: if you have an NVIDIA graphics card compatible with CUDA, use the environment_cuda.yml environment so that YOLO can run on the GPU.
25 | 
26 | ## Convert events to frames
27 | 
28 | Events from Prophesee's GEN1 dataset can be converted to frames and bounding box labels using the code inside the src/ folder. The code converts all the events listed in a given directory and organizes the data in a folder layout compliant with what the YOLOv3 implementation expects. Given a destination directory, the following directory tree is generated:
29 | ``` bash
30 | .
31 | └── data
32 |     ├── completed_videos
33 |     ├── custom
34 |     │   ├── classes.names
35 |     │   ├── images
36 |     │   ├── labels
37 |     │   ├── test.txt
38 |     │   ├── train.txt
39 |     │   └── valid.txt
40 |     └── evaluated_tbe
41 | 
42 | ```
43 | 
44 | Inside the custom folder, the converted frames and the bounding box annotations are stored in the images and labels folders. The train.txt, valid.txt and test.txt files list which images belong to the train, validation and test splits. Events that have already been converted are listed in the completed_videos text file. The Temporal Binary Represented arrays can be stored in npy format in the evaluated_tbe folder to avoid performing the same conversion multiple times.
45 | Moreover, other types of conversion have been implemented (Polarity and Surface Active Events encodings) in order to compare their results with Temporal Binary Represented event object detection.
46 | 
47 | ### Conversion
48 | 
49 | The conversion can be executed using the src/event_converter.py file:
50 | > python event_converter.py -h
51 | ``` bash
52 | Event to frame converter
53 | usage: event_converter.py [-h] [--use_stored_enc] [--save_enc] [--show_video]
54 |                           [--tbr_bits TBR_BITS] [--src_video SRC_VIDEO]
55 |                           [--dest_path DEST_PATH] [--event_type EVENT_TYPE]
56 |                           [--save_bb_img SAVE_BB_IMG]
57 |                           [--accumulation_time ACCUMULATION_TIME]
58 |                           [--encoder ENCODER]
59 |                           [--export_all_frames EXPORT_ALL_FRAMES]
60 | 
61 | Convert events to frames and associates bboxes
62 | 
63 | optional arguments:
64 |   -h, --help            show this help message and exit
65 |   --use_stored_enc, -l  use_stored_enc: instead of evaluates TBR or other
66 |                         encodings, uses pre-evaluated encoded array. Default:
67 |                         false
68 |   --save_enc, -s        save_enc: save the intermediate TBR or other encodings
69 |                         frame array. Default: false
70 |   --show_video, -v      show_video: show video with evaluated TBR frames and
71 |                         their bboxes during processing. Default: false
72 |   --tbr_bits TBR_BITS, -n TBR_BITS
73 |                         tbr_bits: set the number of bits for Temporal Binary
74 |                         Representation. Default: 8
75 |   --src_video SRC_VIDEO, -t SRC_VIDEO
76 |                         src_video: path to event videos
77 |   --dest_path DEST_PATH, -d DEST_PATH
78 |                         dest_path: path where images and bboxes will be stored
79 |   --event_type EVENT_TYPE, -e EVENT_TYPE
80 |                         event_type: specify data type: <train, validation,
81 |                         test>
82 |   --save_bb_img SAVE_BB_IMG, -b SAVE_BB_IMG
83 |                         save_bb_img: save frame with bboxes to path
84 |   --accumulation_time ACCUMULATION_TIME, -a ACCUMULATION_TIME
85 |                         accumulation_time: set the quantization time of events
86 |                         (microseconds). Default: 2500
87 |   --encoder ENCODER, -c ENCODER
88 |                         encoder: set the encoder: <tbe, polarity, sae>.
89 |                         Default: tbe
90 |   --export_all_frames EXPORT_ALL_FRAMES
91 |                         export_all_frames: export all encoded frames from an
92 |                         event video to path
93 | 
94 | ```
95 | 
96 | For example, to convert events from directory /dataset/train, store results in /dest/folder and label them as train data, run the following:
97 | > python event_converter.py --src_video /dataset/train --dest_path /dest/folder
98 | 
99 | To convert events from directory /dataset/validation, store results in /dest/folder and label them as validation data, run the following:
100 | > python event_converter.py --src_video /dataset/validation --dest_path /dest/folder --event_type validation
101 | 
102 | To convert events from directory /dataset/test, store results in /dest/folder and label them as test data, run the following:
103 | > python event_converter.py --src_video /dataset/test --dest_path /dest/folder --event_type test
104 | 
105 | Additional options are available in order to:
106 | * Change the number of bits that should be used in TBR - Option: -n X
107 | * Save converted frames with bboxes as images in a directory during processing - Option: -b /path/to/folder
108 | * Save the resulting encoded array in npy format - Option: -s
109 | * Load a previously saved encoded array - Option: -l
110 | * Show a video of converted frames and bboxes during processing - Option: -v
111 | * Change the accumulation time - Option: -a
112 | * Change the encoder in order to store frames in other formats - Option: -c
113 | 
114 | ### Training the object detector
115 | 
116 | Once the dataset has been built, the object detector can be trained.
117 | To set up the detector, modify the gen1.data and gen1-test.data files inside the yolo_config folder with the absolute path of the dataset.
118 | Two configuration files are available in order to use the tiny (yolo_config/yolov3-tiny.cfg) or the full (yolo_config/yolov3-gen1.cfg) YOLO implementation.
119 | 
120 | To train the detector, from the PyTorch-YOLOv3 folder launch the following command:
121 | > python3 train.py --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --data_config ../yolo_config/gen1.data
122 | 
123 | To test the detector:
124 | 
125 | > python3 test.py --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --data_config ../yolo_config/gen1-test.data --weights_path checkpoints/preferred_ckpt.pth
126 | 
127 | To use the detector against real images:
128 | 
129 | > python3 detect.py --image_folder /path/to/images --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --weights_path checkpoints/preferred_ckpt.pth --class_path /path/to/dataset/data/custom/classes.names
130 | 
131 | Further information is available in the PyTorch-YOLOv3 repository README.
132 | 
133 | ### Use the Prophesee COCO metric evaluation
134 | 
135 | To use the Prophesee evaluator script, compliant npy array files must be created. These arrays contain the bounding boxes detected on the test images along with their timestamps.
136 | This is done by forking the YOLOv3 test.py script. The new script creates, for each event video (i.e. the frames that belong to the same video), an array of tuples, where each tuple is a detected bounding box. The timestamp associated with a bounding box is approximated as starting_accumulation_time + (total_accumulation_time / 2), where total_accumulation_time for Temporal Binary Encoded frames is the accumulation time multiplied by the number of bits used.
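As a quick worked example of that approximation (a sketch only; `approx_bbox_timestamp` is an illustrative helper, not part of the repository): with the default 2500 µs accumulation time and 8 TBR bits, total_accumulation_time is 2500 * 8 = 20000 µs, which is also the value passed to --total_acc_time in the command below.

``` python
def approx_bbox_timestamp(start_ts_us: int, acc_time_us: int = 2500, tbr_bits: int = 8) -> int:
    """Timestamp assigned to boxes detected on a Temporal Binary Encoded frame:
    start of the frame's accumulation window plus half of its total length."""
    total_acc_time_us = acc_time_us * tbr_bits      # 2500 * 8 = 20000 us
    return start_ts_us + total_acc_time_us // 2

print(approx_bbox_timestamp(100_000))  # 100000 + 10000 = 110000 us
```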
137 | To output the npy files, enter the src/ folder and run the following command:
138 | 
139 | > python3 test_gen1.py --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --data_config ../yolo_config/gen1-test.data --weights_path ../PyTorch-YOLOv3/checkpoints/preferred_ckpt.pth --gen1_output /path/to/output/folder --total_acc_time 20000
140 | 
141 | Once the npy files have been created, the Prophesee evaluator can be used. Enter the prophesee-automotive-dataset-toolbox repository and run the following command:
142 | 
143 | > python3 psee_evaluator.py --camera GEN1 /path/to/gen1/events/npy /path/to/detection/events/npy
144 | 
145 | Note: to use the psee_evaluator script, it must be moved to the root directory of that repository. Otherwise the following error will be produced:
146 | 
147 | ``` bash
148 | Traceback (most recent call last):
149 |   File "psee_evaluator.py", line 5, in <module>
150 |     from src.metrics.coco_eval import evaluate_detection
151 | ModuleNotFoundError: No module named 'src'
152 | ```
153 | 
154 | ### Real time object detection
155 | 
156 | The YOLOv3 detect.py script has been forked to create a demo script that converts an event video to encoded frames and detects objects on them at the same time. To do that, enter the src/ folder and run the following command:
157 | 
158 | > python3 rt_detection.py --class_path /path/to/dataset/data/custom/classes.names --event_video /path/to/event/dat --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --encoder tbr --accumulation_time 10000 --tbr_bits 8 --show_video --weights_path ../PyTorch-YOLOv3/checkpoints/preferred_ckpt.pth --conf_thres 0.8
159 | 
160 | All three developed encoders can be used; the encoder is selected with the option:
161 | * --encoder <tbr, polarity, sae>, default: tbr
162 | 
--------------------------------------------------------------------------------
/doc/VMR_Francesco_Areoluci_Presentation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/francescoareoluci/tbr-event-object-detection/e1de8c47fedbe1ea5ab57ae936f1673ff4fb7272/doc/VMR_Francesco_Areoluci_Presentation.pdf
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: pytorch_env
2 | channels:
3 | - conda-forge
4 | - defaults
5 | dependencies:
6 | - _libgcc_mutex=0.1=main
7 | - _pytorch_select=0.1=cpu_0
8 | - absl-py=0.12.0=pyhd8ed1ab_0
9 | - aiohttp=3.7.4=py37h27cfd23_1
10 | - async-timeout=3.0.1=py_1000
11 | - attrs=20.3.0=pyhd3deb0d_0
12 | - blas=1.0=mkl
13 | - blinker=1.4=py_1
14 | - blosc=1.21.0=h8c45485_0
15 | - brotli=1.0.9=he6710b0_2
16 | - brotlipy=0.7.0=py37hb5d75c8_1001
17 | - brunsli=0.1=h2531618_0
18 | - bzip2=1.0.8=h7b6447c_0
19 | - c-ares=1.17.1=h36c2ea0_0
20 | - ca-certificates=2020.12.5=ha878542_0
21 | - cachetools=4.2.1=pyhd8ed1ab_0
22 | - cairo=1.14.12=h8948797_3
23 | - certifi=2020.12.5=py37h89c1867_1
24 | - cffi=1.14.5=py37h261ae71_0
25 | - chardet=3.0.4=py37he5f6b98_1008
26 | - charls=2.1.0=he6710b0_2
27 | - click=7.1.2=pyh9f0ad1d_0
28 | - cloudpickle=1.6.0=py_0
29 | - colorama=0.4.4=pyh9f0ad1d_0
30 | - cryptography=3.4.7=py37h5d9358c_0
31 | - cycler=0.10.0=py37_0
32 | - cytoolz=0.11.0=py37h7b6447c_0
33 | - dask-core=2021.3.0=pyhd3eb1b0_0
34 | - dbus=1.13.18=hb2f20db_0
35 | - decorator=4.4.2=pyhd3eb1b0_0
36 | - expat=2.2.10=he6710b0_2
37 | - ffmpeg=4.0=hcdf2ecd_0
38 | - fontconfig=2.13.1=h6c09931_0
39 | - freeglut=3.0.0=hf484d3e_5
40 | - 
freetype=2.10.4=h5ab3b9f_0 41 | - geos=3.8.0=he6710b0_0 42 | - giflib=5.1.4=h14c3975_1 43 | - glib=2.66.1=h92f7085_0 44 | - google-auth=1.26.1=pyh44b312d_0 45 | - google-auth-oauthlib=0.4.1=py_2 46 | - graphite2=1.3.14=h23475e2_0 47 | - grpcio=1.33.2=py37haffed2e_2 48 | - gst-plugins-base=1.14.0=h8213a91_2 49 | - gstreamer=1.14.0=h28cd5cc_2 50 | - harfbuzz=1.8.8=hffaf4a1_0 51 | - hdf5=1.10.2=hba1933b_1 52 | - icu=58.2=he6710b0_3 53 | - idna=2.10=pyh9f0ad1d_0 54 | - imagecodecs=2021.1.11=py37h581e88b_1 55 | - imageio=2.9.0=py_0 56 | - imgaug=0.4.0=pyhd3eb1b0_0 57 | - importlib-metadata=3.9.1=py37h89c1867_0 58 | - intel-openmp=2019.4=243 59 | - jasper=2.0.14=h07fcdf6_1 60 | - jpeg=9b=h024ee3a_2 61 | - jxrlib=1.1=h7b6447c_2 62 | - kiwisolver=1.3.1=py37h2531618_0 63 | - lcms2=2.11=h396b838_0 64 | - ld_impl_linux-64=2.33.1=h53a641e_7 65 | - lerc=2.2.1=h2531618_0 66 | - libaec=1.0.4=he6710b0_1 67 | - libdeflate=1.7=h27cfd23_5 68 | - libedit=3.1.20191231=h14c3975_1 69 | - libffi=3.3=he6710b0_2 70 | - libgcc-ng=9.1.0=hdf63c60_0 71 | - libgfortran-ng=7.3.0=hdf63c60_0 72 | - libglu=9.0.0=hf484d3e_1 73 | - libopencv=3.4.2=hb342d67_1 74 | - libopus=1.3.1=h7b6447c_0 75 | - libpng=1.6.37=hbc83047_0 76 | - libprotobuf=3.14.0=h8c45485_0 77 | - libstdcxx-ng=9.1.0=hdf63c60_0 78 | - libtiff=4.1.0=h2733197_1 79 | - libuuid=1.0.3=h1bed415_2 80 | - libvpx=1.7.0=h439df22_0 81 | - libwebp=1.0.1=h8e7db2f_0 82 | - libxcb=1.14=h7b6447c_0 83 | - libxml2=2.9.10=hb55368b_3 84 | - libzopfli=1.0.3=he6710b0_0 85 | - lz4-c=1.9.3=h2531618_0 86 | - markdown=3.3.4=pyhd8ed1ab_0 87 | - matplotlib=3.3.2=h06a4308_0 88 | - matplotlib-base=3.3.2=py37h817c723_0 89 | - mkl=2019.4=243 90 | - mkl-service=2.3.0=py37he8ac12f_0 91 | - mkl_fft=1.2.0=py37h23d657b_0 92 | - mkl_random=1.0.4=py37hd81dba3_0 93 | - multidict=5.1.0=py37h27cfd23_2 94 | - ncurses=6.2=he6710b0_1 95 | - networkx=2.5=py_0 96 | - ninja=1.10.2=py37hff7bd54_0 97 | - numpy=1.19.2=py37h54aff64_0 98 | - numpy-base=1.19.2=py37hfa32c7d_0 99 | - oauthlib=3.0.1=py_0 100 | - olefile=0.46=py37_0 101 | - opencv=3.4.2=py37h6fd60c2_1 102 | - openjpeg=2.3.0=h05c96fa_1 103 | - openssl=1.1.1k=h27cfd23_0 104 | - pcre=8.44=he6710b0_0 105 | - pillow=8.1.0=py37he98fc37_0 106 | - pip=20.3.3=py37h06a4308_0 107 | - pixman=0.40.0=h7b6447c_0 108 | - protobuf=3.14.0=py37h2531618_1 109 | - py-opencv=3.4.2=py37hb342d67_1 110 | - pyasn1=0.4.8=py_0 111 | - pyasn1-modules=0.2.7=py_0 112 | - pycparser=2.20=py_2 113 | - pyjwt=2.0.1=pyhd8ed1ab_0 114 | - pyopenssl=20.0.1=pyhd8ed1ab_0 115 | - pyparsing=2.4.7=pyhd3eb1b0_0 116 | - pyqt=5.9.2=py37h05f1152_2 117 | - pysocks=1.7.1=py37h89c1867_3 118 | - python=3.7.9=h7579374_0 119 | - python-dateutil=2.8.1=pyhd3eb1b0_0 120 | - python_abi=3.7=1_cp37m 121 | - pytorch=1.3.1=cpu_py37h62f834f_0 122 | - pywavelets=1.1.1=py37h7b6447c_2 123 | - pyyaml=5.4.1=py37h27cfd23_1 124 | - qt=5.9.7=h5867ecd_1 125 | - readline=8.1=h27cfd23_0 126 | - requests=2.25.1=pyhd3deb0d_0 127 | - requests-oauthlib=1.3.0=pyh9f0ad1d_0 128 | - rsa=4.7.2=pyh44b312d_0 129 | - scikit-image=0.17.2=py37hdf5156a_0 130 | - scipy=1.6.2=py37h91f5cce_0 131 | - setuptools=52.0.0=py37h06a4308_0 132 | - shapely=1.7.1=py37h98ec03d_0 133 | - sip=4.19.8=py37hf484d3e_0 134 | - six=1.15.0=py37h06a4308_0 135 | - snappy=1.1.8=he6710b0_0 136 | - sqlite=3.33.0=h62c20be_0 137 | - tensorboard=2.4.1=pyhd8ed1ab_0 138 | - tensorboard-plugin-wit=1.8.0=pyh44b312d_0 139 | - termcolor=1.1.0=py_2 140 | - terminaltables=3.1.0=py_0 141 | - tifffile=2021.3.17=pyhd3eb1b0_1 142 | - tk=8.6.10=hbc83047_0 143 | - 
toolz=0.11.1=pyhd3eb1b0_0 144 | - torchvision=0.4.2=cpu_py37h9ec355b_0 145 | - tornado=6.1=py37h27cfd23_0 146 | - tqdm=4.59.0=pyhd3eb1b0_1 147 | - typing-extensions=3.7.4.3=0 148 | - typing_extensions=3.7.4.3=py_0 149 | - tzdata=2020f=h52ac0ba_0 150 | - urllib3=1.26.4=pyhd8ed1ab_0 151 | - werkzeug=1.0.1=pyh9f0ad1d_0 152 | - wheel=0.36.2=pyhd3eb1b0_0 153 | - xz=5.2.5=h7b6447c_0 154 | - yaml=0.2.5=h7b6447c_0 155 | - yarl=1.6.3=py37h4abf009_0 156 | - zfp=0.5.5=h2531618_4 157 | - zipp=3.4.1=pyhd8ed1ab_0 158 | - zlib=1.2.11=h7b6447c_3 159 | - zstd=1.4.5=h9ceee32_0 160 | prefix: /home/magenta/anaconda3/envs/pytorch_env 161 | 162 | -------------------------------------------------------------------------------- /environment_cuda.yml: -------------------------------------------------------------------------------- 1 | name: pytorch_env 2 | channels: 3 | - pytorch 4 | - conda-forge 5 | - defaults 6 | dependencies: 7 | - _libgcc_mutex=0.1=main 8 | - absl-py=0.12.0=pyhd8ed1ab_0 9 | - blas=1.0=mkl 10 | - bzip2=1.0.8=h7b6447c_0 11 | - c-ares=1.17.1=h36c2ea0_0 12 | - ca-certificates=2020.12.5=ha878542_0 13 | - cairo=1.16.0=hf32fb01_1 14 | - certifi=2020.12.5=py37h89c1867_1 15 | - cloudpickle=1.6.0=py_0 16 | - colorama=0.4.4=pyh9f0ad1d_0 17 | - cudatoolkit=11.0.221=h6bb024c_0 18 | - cycler=0.10.0=py37_0 19 | - cython=0.29.17=py37h3340039_0 20 | - cytoolz=0.11.0=py37h7b6447c_0 21 | - dask-core=2021.4.0=pyhd3eb1b0_0 22 | - dbus=1.13.18=hb2f20db_0 23 | - decorator=5.0.6=pyhd3eb1b0_0 24 | - expat=2.3.0=h2531618_2 25 | - ffmpeg=4.0=hcdf2ecd_0 26 | - fontconfig=2.13.1=h6c09931_0 27 | - freeglut=3.0.0=hf484d3e_5 28 | - freetype=2.10.4=h5ab3b9f_0 29 | - fsspec=0.9.0=pyhd3eb1b0_0 30 | - geos=3.8.0=he6710b0_0 31 | - glib=2.68.1=h36276a3_0 32 | - graphite2=1.3.14=h23475e2_0 33 | - grpcio=1.33.2=py37haffed2e_2 34 | - gst-plugins-base=1.14.0=h8213a91_2 35 | - gstreamer=1.14.0=h28cd5cc_2 36 | - harfbuzz=1.8.8=hffaf4a1_0 37 | - hdf5=1.10.2=hba1933b_1 38 | - icu=58.2=he6710b0_3 39 | - imageio=2.9.0=pyhd3eb1b0_0 40 | - imgaug=0.4.0=pyhd3eb1b0_0 41 | - importlib-metadata=3.10.1=py37h89c1867_0 42 | - intel-openmp=2020.2=254 43 | - jasper=2.0.14=h07fcdf6_1 44 | - jpeg=9b=h024ee3a_2 45 | - kiwisolver=1.3.1=py37h2531618_0 46 | - lcms2=2.12=h3be6417_0 47 | - ld_impl_linux-64=2.33.1=h53a641e_7 48 | - libffi=3.3=he6710b0_2 49 | - libgcc-ng=9.1.0=hdf63c60_0 50 | - libgfortran-ng=7.3.0=hdf63c60_0 51 | - libglu=9.0.0=hf484d3e_1 52 | - libopencv=3.4.2=hb342d67_1 53 | - libopus=1.3.1=h7b6447c_0 54 | - libpng=1.6.37=hbc83047_0 55 | - libprotobuf=3.14.0=h8c45485_0 56 | - libstdcxx-ng=9.1.0=hdf63c60_0 57 | - libtiff=4.1.0=h2733197_1 58 | - libuuid=1.0.3=h1bed415_2 59 | - libuv=1.40.0=h7b6447c_0 60 | - libvpx=1.7.0=h439df22_0 61 | - libxcb=1.14=h7b6447c_0 62 | - libxml2=2.9.10=hb55368b_3 63 | - locket=0.2.1=py37h06a4308_1 64 | - lz4-c=1.9.3=h2531618_0 65 | - markdown=3.3.4=pyhd8ed1ab_0 66 | - matplotlib=3.3.4=py37h06a4308_0 67 | - matplotlib-base=3.3.4=py37h62a2d02_0 68 | - mkl=2020.2=256 69 | - mkl-service=2.3.0=py37he8ac12f_0 70 | - mkl_fft=1.3.0=py37h54f3939_0 71 | - mkl_random=1.1.1=py37h0573a6f_0 72 | - ncurses=6.2=he6710b0_1 73 | - networkx=2.2=py37_1 74 | - ninja=1.10.2=hff7bd54_1 75 | - numpy=1.19.2=py37h54aff64_0 76 | - numpy-base=1.19.2=py37hfa32c7d_0 77 | - olefile=0.46=py37_0 78 | - opencv=3.4.2=py37h6fd60c2_1 79 | - openssl=1.1.1k=h27cfd23_0 80 | - partd=1.2.0=pyhd3eb1b0_0 81 | - pcre=8.44=he6710b0_0 82 | - pillow=8.2.0=py37he98fc37_0 83 | - pip=21.0.1=py37h06a4308_0 84 | - pixman=0.40.0=h7b6447c_0 85 | - 
protobuf=3.14.0=py37h2531618_1 86 | - py-opencv=3.4.2=py37hb342d67_1 87 | - pycocotools=2.0.2=py37h8f50634_1 88 | - pyparsing=2.4.7=pyhd3eb1b0_0 89 | - pyqt=5.9.2=py37h05f1152_2 90 | - python=3.7.10=hdb3f193_0 91 | - python-dateutil=2.8.1=pyhd3eb1b0_0 92 | - python_abi=3.7=1_cp37m 93 | - pytorch=1.7.1=py3.7_cuda11.0.221_cudnn8.0.5_0 94 | - pywavelets=1.1.1=py37h7b6447c_2 95 | - pyyaml=5.4.1=py37h27cfd23_1 96 | - qt=5.9.7=h5867ecd_1 97 | - readline=8.1=h27cfd23_0 98 | - scikit-image=0.18.1=py37ha9443f7_0 99 | - scipy=1.6.2=py37h91f5cce_0 100 | - setuptools=52.0.0=py37h06a4308_0 101 | - shapely=1.7.1=py37h98ec03d_0 102 | - sip=4.19.8=py37hf484d3e_0 103 | - six=1.15.0=py37h06a4308_0 104 | - sqlite=3.35.4=hdfb4753_0 105 | - tensorboard=1.15.0=py37_0 106 | - termcolor=1.1.0=py_2 107 | - terminaltables=3.1.0=py_0 108 | - tifffile=2020.10.1=py37hdd07704_2 109 | - tk=8.6.10=hbc83047_0 110 | - toolz=0.11.1=pyhd3eb1b0_0 111 | - torchvision=0.8.2=py37_cu110 112 | - tornado=6.1=py37h27cfd23_0 113 | - tqdm=4.59.0=pyhd3eb1b0_1 114 | - typing_extensions=3.7.4.3=pyha847dfd_0 115 | - werkzeug=1.0.1=pyh9f0ad1d_0 116 | - wheel=0.36.2=pyhd3eb1b0_0 117 | - xz=5.2.5=h7b6447c_0 118 | - yaml=0.2.5=h7b6447c_0 119 | - zipp=3.4.1=pyhd8ed1ab_0 120 | - zlib=1.2.11=h7b6447c_3 121 | - zstd=1.4.9=haebb681_0 122 | prefix: /home/fareoluci/miniconda3/envs/pytorch_env 123 | -------------------------------------------------------------------------------- /src/cl_parser.py: -------------------------------------------------------------------------------- 1 | """ 2 | cl_parser.py: Command Line Parser, parses user commands 3 | """ 4 | 5 | import argparse 6 | 7 | class CLParser: 8 | """ 9 | @brief: Class to manage command line arguments 10 | """ 11 | 12 | def __init__(self): 13 | self._parser = argparse.ArgumentParser(description='Convert events to frames and associates bboxes') 14 | self._parser.add_argument('--use_stored_enc', '-l', action='count', default=0, 15 | help='use_stored_enc: instead of evaluates TBR or other encodings, uses pre-evaluated encoded array. Default: false') 16 | self._parser.add_argument('--save_enc', '-s', action='count', default=0, 17 | help='save_enc: save the intermediate TBR or other encodings frame array. Default: false') 18 | self._parser.add_argument('--show_video', '-v', action='count', default=0, 19 | help='show_video: show video with evaluated TBR frames and their bboxes during processing. Default: false') 20 | self._parser.add_argument('--tbr_bits', '-n', type=int, default=8, 21 | help='tbr_bits: set the number of bits for Temporal Binary Representation. Default: 8') 22 | self._parser.add_argument('--src_video', '-t', type=str, nargs=1, 23 | help='src_video: path to event videos') 24 | self._parser.add_argument('--dest_path', '-d', type=str, nargs=1, 25 | help='dest_path: path where images and bboxes will be stored') 26 | self._parser.add_argument('--event_type', '-e', type=str, nargs=1, 27 | help='event_type: specify data type: ') 28 | self._parser.add_argument('--save_bb_img', '-b', type=str, nargs=1, 29 | help='save_bb_img: save frame with bboxes to path') 30 | self._parser.add_argument('--accumulation_time', '-a', type=int, default=2500, 31 | help='accumulation_time: set the quantization time of events (microseconds). Default: 2500') 32 | self._parser.add_argument('--encoder', '-c', type=str, nargs=1, 33 | help='encoder: set the encoder: . 
Default: tbe') 34 | self._parser.add_argument('--export_all_frames', type=str, nargs=1, 35 | help='export_all_frames: export all encoded frames from an event video to path') 36 | 37 | def parse(self): 38 | """ 39 | @brief: parse the command line arguments 40 | @return: parsed arguments 41 | """ 42 | 43 | return self._parser.parse_args() -------------------------------------------------------------------------------- /src/dir_handler.py: -------------------------------------------------------------------------------- 1 | """ 2 | dir_handlers.py: module that manages the input/output directories 3 | """ 4 | 5 | import os 6 | 7 | def setupDirectories(root_dir: str) -> dict: 8 | """ 9 | @brief: Setup directories as requested in YOLOV3 10 | implementation. 11 | @param: root_dir - Root directory where the files should be 12 | saved. Must be a valid folder 13 | @return: Dict of useful directories: 14 | "images": images_path, 15 | "labels": labels_path, 16 | "train_file": train_file_path, 17 | "valid_file": valid_file_path, 18 | "test_file": test_file_path, 19 | "list": list_path, 20 | "completed": completed_file_path, 21 | "enc": evaluated_enc_path 22 | """ 23 | 24 | start_folder = 'data' 25 | data_path = root_dir + '/' + start_folder 26 | custom_path = data_path + '/custom' 27 | images_path = custom_path + '/images' 28 | labels_path = custom_path + '/labels' 29 | classes_file_path = custom_path + '/classes.names' 30 | train_file_path = custom_path + '/train.txt' 31 | valid_file_path = custom_path + '/valid.txt' 32 | test_file_path = custom_path + '/test.txt' 33 | completed_file_path = data_path + "/completed_videos" 34 | evaluated_enc_path = data_path + "/evaluated_enc" 35 | 36 | if not os.path.isdir(data_path): 37 | os.mkdir(data_path) 38 | start_folder_abs = os.path.abspath(data_path) 39 | list_path = start_folder_abs + '/custom/images/' 40 | 41 | if not os.path.isdir(custom_path): 42 | os.mkdir(custom_path) 43 | 44 | if not os.path.isdir(images_path): 45 | os.mkdir(images_path) 46 | 47 | if not os.path.isdir(labels_path): 48 | os.mkdir(labels_path) 49 | 50 | if not os.path.isdir(evaluated_enc_path): 51 | os.mkdir(evaluated_enc_path) 52 | 53 | # Setup classes 54 | f = open(classes_file_path, "w") 55 | f.write("vehicle\n") 56 | f.write("pedestrian\n") 57 | f.close 58 | 59 | if not os.path.isfile(completed_file_path): 60 | f = open(completed_file_path, "x") 61 | f.close() 62 | 63 | if not os.path.isfile(train_file_path): 64 | f = open(train_file_path, "x") 65 | f.close() 66 | 67 | if not os.path.isfile(valid_file_path): 68 | f = open(valid_file_path, "x") 69 | f.close() 70 | 71 | if not os.path.isfile(test_file_path): 72 | f = open(test_file_path, "x") 73 | f.close() 74 | 75 | return { 76 | "images": images_path, 77 | "labels": labels_path, 78 | "train_file": train_file_path, 79 | "valid_file": valid_file_path, 80 | "test_file": test_file_path, 81 | "list": list_path, 82 | "completed": completed_file_path, 83 | "enc": evaluated_enc_path 84 | } 85 | 86 | 87 | def getEventList(directory: str) -> list: 88 | """ 89 | @brief: Check in directory for events and bbox annotations. 90 | An event is valid if annotation file 91 | with same basename exists. 92 | @param: directory - Directory where the .dat and .npy files 93 | are stored 94 | @return: List of basenames of valid event files. 
95 | """ 96 | 97 | file_list_npy = [file for file in os.listdir(directory) if os.path.isfile(os.path.join(directory, file)) and 98 | os.path.splitext(os.path.join(directory, file))[1] == '.npy'] 99 | file_list_dat = [file for file in os.listdir(directory) if os.path.isfile(os.path.join(directory, file)) and 100 | os.path.splitext(os.path.join(directory, file))[1] == '.dat'] 101 | filtered_file_list = [] 102 | for td in file_list_dat: 103 | if "cut" in td: 104 | # Avoid files with same name but different 'cut' 105 | # @TODO: change split policy to handle these files 106 | print("Skipping video {:s}: filename not compliant".format(td)) 107 | continue 108 | 109 | td_split = td.split('_') 110 | td = td_split[0] + "_" + td_split[1] + "_" + td_split[2] + "_" + td_split[3] 111 | for bbox in file_list_npy: 112 | bbox_split = bbox.split('_') 113 | bbox = bbox_split[0] + "_" + bbox_split[1] + "_" + bbox_split[2] + "_" + bbox_split[3] 114 | if td == bbox: 115 | filtered_file_list.append(td) 116 | 117 | return filtered_file_list -------------------------------------------------------------------------------- /src/ec_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | ec_utils.py: Event Converter Utils - Utility functions 3 | """ 4 | 5 | import numpy as np 6 | import matplotlib.pyplot as plt 7 | from matplotlib.patches import Rectangle 8 | from matplotlib.ticker import NullLocator 9 | 10 | 11 | def show_image(frame: np.array, bboxes: np.array, max_value: int = 1): 12 | """ 13 | @brief: show video of encoded frames and their bboxes 14 | during processing 15 | @param: frame - A np array containing pixel informations 16 | @param: bboxes - np array with the bboxes associated to the frame. 17 | As loaded from the GEN1 .npy array 18 | """ 19 | 20 | plt.figure(1) 21 | plt.clf() 22 | plt.axis("off") 23 | plt.imshow(frame, animated=True, cmap='gray', vmin=0, vmax=max_value) 24 | #plt.colorbar() 25 | 26 | # Get the current reference 27 | ax = plt.gca() 28 | 29 | # Create Rectangle boxes 30 | for b in bboxes: 31 | predicted_class = b[5] 32 | x = b[1] 33 | y = b[2] 34 | w = b[3] 35 | h = b[4] 36 | bbox_color = 'g' if predicted_class == 1 else 'r' 37 | # Create Rectangle 38 | rect = Rectangle((x, y), w, h, linewidth=2, edgecolor=bbox_color, facecolor='none') 39 | # Add the patch to the Axes 40 | ax.add_patch(rect) 41 | # Add label 42 | plt.text( 43 | b[1], 44 | b[2], 45 | s='Pedestrian' if predicted_class == 1 else 'Vehicle', 46 | color="white", 47 | verticalalignment="top", 48 | bbox={"color": bbox_color, "pad": 0}, 49 | ) 50 | 51 | 52 | def save_bb_image(frame: np.array, 53 | bboxes: np.array, 54 | save_path: str, 55 | only_detection: bool = True, 56 | max_value: int = 1): 57 | """ 58 | @brief: save encoded frames with their bboxes 59 | @param: frame - A np array containing pixel informations 60 | @param: bboxes - np array with the bboxes associated to the frame. 
61 | As loaded from the GEN1 .npy array 62 | @param: save_path - Existing path where the resulting images should 63 | be saved 64 | """ 65 | 66 | plt.imshow(frame, cmap='gray', vmin=0, vmax=max_value) 67 | plt.axis('off') 68 | 69 | # Get the current reference 70 | ax = plt.gca() 71 | 72 | # Create Rectangle boxes 73 | for b in bboxes: 74 | predicted_class = b[5] 75 | x = b[1] 76 | y = b[2] 77 | w = b[3] 78 | h = b[4] 79 | bbox_color = 'g' if predicted_class == 1 else 'r' 80 | # Create Rectangle 81 | rect = Rectangle((x, y), w, h, linewidth=2, edgecolor=bbox_color, facecolor='none') 82 | # Add the patch to the Axes 83 | ax.add_patch(rect) 84 | # Add label 85 | plt.text( 86 | b[1], 87 | b[2], 88 | s='Pedestrian' if predicted_class == 1 else 'Vehicle', 89 | color="white", 90 | verticalalignment="top", 91 | bbox={"color": bbox_color, "pad": 0}, 92 | ) 93 | 94 | if not only_detection: 95 | # Save all frames if requested 96 | plt.savefig(save_path, bbox_inches='tight') 97 | plt.close() 98 | elif len(bboxes) != 0: 99 | # Save only frames with bboxes associated 100 | plt.savefig(save_path, bbox_inches='tight') 101 | plt.close() 102 | 103 | 104 | def convertBBoxCoords(bbox: np.array, image_width: int, image_height: int) -> np.array: 105 | """ 106 | @brief: Converts top-left starting coordinates to 107 | rectangle-centered coordinates. Moreover, 108 | coordinates and size are normalized. 109 | @param: bbox - A bbox as loaded from the GEN1 .npy array 110 | @param: image_width 111 | @param: image_height 112 | @return: np array compliant to YOLOV3 implementation. 113 | """ 114 | 115 | top_left_x = bbox[1] 116 | top_left_y = bbox[2] 117 | width = bbox[3] 118 | height = bbox[4] 119 | norm_center_x = float((top_left_x + (width / 2)) / image_width) 120 | norm_center_y = float((top_left_y + (height / 2)) / image_height) 121 | norm_width = float(width / image_width) 122 | norm_height = float(height / image_height) 123 | 124 | new_bbox = np.array([int(bbox[5]), norm_center_x, norm_center_y, norm_width, norm_height]) 125 | 126 | return new_bbox 127 | -------------------------------------------------------------------------------- /src/encoders.py: -------------------------------------------------------------------------------- 1 | """ 2 | encoders.py: encode frames from event videos using 3 | Temporal Binary Representation, Polarity, Surface Active Events encodings 4 | """ 5 | 6 | import math 7 | import sys 8 | import numpy as np 9 | from tqdm import tqdm 10 | 11 | from tbe import TemporalBinaryEncoding 12 | import sys 13 | sys.path.insert(0, '../prophesee-automotive-dataset-toolbox/') 14 | from src.io.psee_loader import PSEELoader 15 | 16 | 17 | def encode_video_sae(width: int, 18 | height: int, 19 | video: PSEELoader, 20 | delta: int = 2500) -> np.array: 21 | """ 22 | @brief: Encode video in a sequence of frames using 23 | Surface Active Event (SAE) encoding 24 | @param: width 25 | @param: height 26 | @param: video - loaded from PSEELoader 27 | @param: delta - accumulation time 28 | @return: encoded frames as a Numpy array with the following data type: 29 | [('startTs', np.uint16), 30 | ('endTs', np.uint16), 31 | ('frame', np.float32, (height, width))] 32 | """ 33 | 34 | print("Starting Surface Active Event encoding...") 35 | 36 | # Each encoded frame will have a start/end timestamp (ms) in order 37 | # to associate bounding boxes later. 38 | # Note: If videos are longer than 1 minutes, 16 bits per ts are not sufficient. 
39 | data_type = np.dtype([('startTs', np.uint16), 40 | ('endTs', np.uint16), 41 | ('frame', np.float32, (height, width))]) 42 | 43 | samplePerVideo = math.ceil(video.total_time() / delta) 44 | sae_array = np.zeros(samplePerVideo, dtype=data_type) 45 | 46 | i = 0 47 | startTimestamp = 0 # milliseconds 48 | endTimestamp = 0 # milliseconds 49 | 50 | pbar = tqdm(total=samplePerVideo, file=sys.stdout) 51 | while not video.done: 52 | events = video.load_delta_t(delta) 53 | f = np.zeros(video.get_size()) 54 | for e in events: 55 | # Evaluate polarity of an event 56 | # for a certain pixel 57 | t_p = e['t'] # microseconds 58 | t_0 = startTimestamp * 1000 # microseconds 59 | f[e['y'], e['x']] = 255 * ((t_p - t_0) / delta) 60 | 61 | endTimestamp += delta / 1000 62 | sae_array[i]['startTs'] = startTimestamp 63 | sae_array[i]['endTs'] = endTimestamp 64 | sae_array[i]['frame'] = f 65 | startTimestamp += delta / 1000 66 | i += 1 67 | 68 | pbar.update(1) 69 | 70 | pbar.close() 71 | return sae_array 72 | 73 | 74 | def encode_video_polarity(width: int, 75 | height: int, 76 | video: PSEELoader, 77 | delta: int = 2500) -> np.array: 78 | """ 79 | @brief: Encode video in a sequence of frames using 80 | Polarity encoding 81 | @param: width 82 | @param: height 83 | @param: video - loaded from PSEELoader 84 | @param: delta - accumulation time 85 | @return: encoded frames as a Numpy array with the following data type: 86 | [('startTs', np.uint16), 87 | ('endTs', np.uint16), 88 | ('frame', np.float32, (height, width))] 89 | """ 90 | 91 | print("Starting Polarity Encoding...") 92 | 93 | # Each encoded frame will have a start/end timestamp (ms) in order 94 | # to associate bounding boxes later. 95 | # Note: If videos are longer than 1 minutes, 16 bits per ts are not sufficient. 
96 | data_type = np.dtype([('startTs', np.uint16), 97 | ('endTs', np.uint16), 98 | ('frame', np.float32, (height, width))]) 99 | 100 | samplePerVideo = math.ceil(video.total_time() / delta) 101 | polarity_array = np.zeros(samplePerVideo, dtype=data_type) 102 | 103 | i = 0 104 | startTimestamp = 0 # milliseconds 105 | endTimestamp = 0 # milliseconds 106 | 107 | pbar = tqdm(total=samplePerVideo, file=sys.stdout) 108 | while not video.done: 109 | events = video.load_delta_t(delta) 110 | f = np.full(video.get_size(), 0.5) 111 | for e in events: 112 | # Evaluate polarity of an event 113 | # for a certain pixel 114 | if e['p'] == 1: 115 | f[e['y'], e['x']] = 1 116 | else: 117 | f[e['y'], e['x']] = 0 118 | 119 | endTimestamp += delta / 1000 120 | polarity_array[i]['startTs'] = startTimestamp 121 | polarity_array[i]['endTs'] = endTimestamp 122 | polarity_array[i]['frame'] = f 123 | startTimestamp += delta / 1000 124 | i += 1 125 | 126 | pbar.update(1) 127 | 128 | pbar.close() 129 | return polarity_array 130 | 131 | 132 | def encode_video_tbe(N: int, 133 | width: int, 134 | height: int, 135 | video: PSEELoader, 136 | encoder: TemporalBinaryEncoding, 137 | delta: int = 2500) -> np.array: 138 | """ 139 | @brief: Encode an event video in a sequence of frame 140 | using the Temporal Binary Representation 141 | @param: N - number of bits to be used 142 | @param: width 143 | @param: height 144 | @param: video - loaded from PSEELoader 145 | @param: encoded - TBE encoder 146 | @param: delta - accumulation time 147 | @return: encoded frames as a Numpy array with the following data type: 148 | [('startTs', np.uint16), 149 | ('endTs', np.uint16), 150 | ('frame', np.float32, (height, width))] 151 | """ 152 | 153 | print("Starting Temporal Binary Encoding...") 154 | 155 | # Each encoded frame will have a start/end timestamp (ms) in order 156 | # to associate bounding boxes later. 157 | # Note: If videos are longer than 1 minutes, 16 bits per ts are not sufficient. 158 | data_type = np.dtype([('startTs', np.uint16), 159 | ('endTs', np.uint16), 160 | ('frame', np.float32, (height, width))]) 161 | 162 | samplePerVideo = math.ceil((video.total_time() / delta) / N) 163 | accumulation_mat = np.zeros((N, height, width)) 164 | tbe_array = np.zeros(samplePerVideo, dtype=data_type) 165 | 166 | i = 0 167 | j = 0 168 | startTimestamp = 0 # milliseconds 169 | endTimestamp = 0 # milliseconds 170 | 171 | pbar = tqdm(total = samplePerVideo, file = sys.stdout) 172 | while not video.done: 173 | i = (i + 1) % N 174 | # Load next 1ms events from the video 175 | events = video.load_delta_t(delta) 176 | f = np.zeros(video.get_size()) 177 | for e in events: 178 | # Evaluate presence/absence of event for 179 | # a certain pixel 180 | f[e['y'], e['x']] = 1 181 | 182 | accumulation_mat[i, ...] = f 183 | 184 | if i == N - 1: 185 | endTimestamp += (N * delta) / 1000 186 | tbe = encoder.encode(accumulation_mat) 187 | tbe_array[j]['startTs'] = startTimestamp 188 | tbe_array[j]['endTs'] = endTimestamp 189 | tbe_array[j]['frame'] = tbe 190 | j += 1 191 | startTimestamp += (N * delta) / 1000 192 | pbar.update(1) 193 | 194 | pbar.close() 195 | return tbe_array 196 | 197 | 198 | def get_frame_BB(frame: np.array, BB_array: np.array) -> np.array: 199 | """ 200 | @brief: Associates to an encoded video frame 201 | a list of bounding boxes with timestamp included in 202 | start/end timestamp of the frame. 203 | @param: frame - Encoded frame with the following structure: 204 | [{'startTs': startTs}, {'endTs': endTs}, {'frame': frame}] 205 | (i.e. 
as the one returned from the encoders fuctions) 206 | @param: BB_array - Bounding Boxes array, 207 | loaded from the GEN1 .npy arrays 208 | @return: The associated BBoxes. 209 | """ 210 | 211 | associated_bb = [] 212 | for bb in BB_array: 213 | # Convert timestamp to milliseconds 214 | timestamp = bb[0] / 1000 215 | startTime = frame['startTs'] 216 | endTime = frame['endTs'] 217 | if timestamp >= startTime and timestamp <= endTime: 218 | associated_bb.append(bb) 219 | # Avoid useless iterations 220 | if timestamp > endTime: 221 | break 222 | 223 | return np.array(associated_bb) -------------------------------------------------------------------------------- /src/event_converter.py: -------------------------------------------------------------------------------- 1 | """ 2 | event_converter.py: convert event videos to frames with Temporal Binary Encoding 3 | Main module 4 | """ 5 | 6 | import os 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | from PIL import Image 10 | 11 | from encoders import * 12 | from ec_utils import * 13 | from dir_handler import * 14 | from tbe import TemporalBinaryEncoding 15 | from cl_parser import CLParser 16 | 17 | import sys 18 | sys.path.insert(0, '../prophesee-automotive-dataset-toolbox/') 19 | from src.io.psee_loader import PSEELoader 20 | 21 | 22 | if __name__ == "__main__": 23 | print("Event to frame converter") 24 | 25 | # Parsing arguments 26 | parser = CLParser() 27 | args = parser.parse() 28 | save_encoding = True if args.save_enc > 0 else False 29 | use_stored_encoding = True if args.use_stored_enc > 0 else False 30 | show_video = True if args.show_video > 0 else False 31 | tbr_bits_requested = True if args.tbr_bits != None else False 32 | src_video_requested = True if args.src_video != None else False 33 | dest_path_requested = True if args.dest_path != None else False 34 | event_type_requested = True if args.event_type != None else False 35 | save_path_requested = True if args.save_bb_img != None else False 36 | accumulation_time_requested = True if args.accumulation_time != None else False 37 | encoder_type_requested = True if args.encoder != None else False 38 | export_all_frames_requested = True if args.export_all_frames != None else False 39 | 40 | dest_root_folder = '..' 41 | if dest_path_requested: 42 | dest_root_folder = args.dest_path[0] 43 | 44 | save_path = "" 45 | if save_path_requested: 46 | save_path = args.save_bb_img[0] + '/' 47 | 48 | export_frames_path = "" 49 | if export_all_frames_requested: 50 | export_frames_path = args.export_all_frames[0] + '/' 51 | 52 | video_dir = "../train_events/" 53 | if src_video_requested: 54 | video_dir = args.src_video[0] + '/' 55 | 56 | data_type = 'train' 57 | if event_type_requested: 58 | if args.event_type[0] == 'train' or args.event_type[0] == 'validation' or args.event_type[0] == 'test': 59 | data_type = args.event_type[0] 60 | else: 61 | print("Invalid event type requested. 
Supported: .") 62 | exit() 63 | 64 | # Encoder 65 | requested_encoder = 'tbe' 66 | if encoder_type_requested: 67 | if args.encoder[0] == 'tbe' or args.encoder[0] == 'polarity' or args.encoder[0] == 'sae': 68 | requested_encoder = args.encoder[0] 69 | else: 70 | print("Invalid encoder requested") 71 | exit() 72 | 73 | # Number of bits to be used in Temporal Binary Encoding 74 | tbr_bits = args.tbr_bits 75 | 76 | # Accumulation time (microseconds) 77 | delta_t = args.accumulation_time 78 | 79 | # Print some info 80 | print("===============================") 81 | print("Encoder: " + requested_encoder) 82 | print("Requested encoded array saving: " + str(save_encoding)) 83 | print("Requested saved encoded array loading: " + str(use_stored_encoding)) 84 | print("Requested video show during processing: " + str(show_video)) 85 | if requested_encoder == 'tbe': 86 | print("Using {:d} bits to represent events".format(tbr_bits)) 87 | print("Accumulation time: " + str(delta_t)) 88 | print("Source event path: " + video_dir) 89 | print("Destination path: " + dest_root_folder + '/data') 90 | print("Event data type: " + data_type) 91 | print("===============================") 92 | 93 | if data_type == "train": 94 | txt_list_file = 'train_file' 95 | elif data_type == 'validation': 96 | txt_list_file = 'valid_file' 97 | else: 98 | txt_list_file = 'test_file' 99 | 100 | # Setup data directory to save files (images, bboxes and labels) 101 | dir_paths = setupDirectories(dest_root_folder) 102 | 103 | # Iterate through videos in video_dir to get list 104 | video_names = getEventList(video_dir) 105 | 106 | # Max pixel value to display frames 107 | max_pixel_value = 1 108 | 109 | # Iterate videos 110 | for video_name in video_names: 111 | video_path = video_dir + video_name 112 | 113 | with open(dir_paths['completed']) as completed_videos: 114 | if video_name in completed_videos.read(): 115 | print("Skipping completed video: " + video_name) 116 | continue 117 | 118 | print("Processing video: " + video_name) 119 | 120 | gen1_bboxes = np.load(video_path + "_bbox.npy") 121 | 122 | # Load video 123 | gen1_video = PSEELoader(video_path + "_td.dat") 124 | 125 | width = gen1_video.get_size()[1] 126 | height = gen1_video.get_size()[0] 127 | encoder = TemporalBinaryEncoding(tbr_bits, width, height) 128 | 129 | if not use_stored_encoding: 130 | # Convert event video to a Temporal Binary Encoded frames array 131 | if requested_encoder == 'tbe': 132 | encoded_array = encode_video_tbe(tbr_bits, width, height, gen1_video, encoder, delta_t) 133 | elif requested_encoder == 'polarity': 134 | encoded_array = encode_video_polarity(width, height, gen1_video, delta_t) 135 | else: 136 | encoded_array = encode_video_sae(width, height, gen1_video, delta_t) 137 | max_pixel_value = 255 138 | 139 | if save_encoding: 140 | np.save(dir_paths["enc"] + video_name + "_enc.npy", encoded_array) 141 | else: 142 | # Use the pre-evaluated encoded (tbe or else) array 143 | encoded_array = np.load(dir_paths["enc"] + video_name + "_enc.npy") 144 | 145 | # Iterate through video frames 146 | img_count = 0 147 | bbox_count = 0 148 | print("Saving encoded frames and bounding boxes...") 149 | for f in encoded_array: 150 | bboxes = get_frame_BB(f, gen1_bboxes) 151 | 152 | filename = video_name + str("_" + str(f["startTs"])) 153 | # Save images that have at least a bbox 154 | if len(bboxes) != 0: 155 | # Save image 156 | plt.imsave(dir_paths["images"] + "/" + filename + ".jpg", f['frame'], vmin=0, vmax=1, cmap='gray') 157 | 158 | # Update train or validation 
txt file (append if not existing) 159 | with open(dir_paths[txt_list_file], "r+") as list_txt_file: 160 | file_string = dir_paths["list"] + filename + ".jpg" 161 | for line in list_txt_file: 162 | # Search for image file path in this file 163 | if file_string in line: 164 | break 165 | else: # Note: this indentation is intentional 166 | # If entered, the string does not exist in this file 167 | # Append file path 168 | list_txt_file.write(file_string + "\n") 169 | 170 | # Write BBoxes in labels 171 | label_file = open(dir_paths["labels"] + "/" + filename + ".txt", "w") 172 | for b in bboxes: 173 | conv_bbox = convertBBoxCoords(b, width, height) 174 | label_file.write(str("%d" % conv_bbox[0]) + " ") 175 | label_file.write(str("%.8f" % conv_bbox[1]) + " ") 176 | label_file.write(str("%.8f" % conv_bbox[2]) + " ") 177 | label_file.write(str("%.8f" % conv_bbox[3]) + " ") 178 | label_file.write(str("%.8f" % conv_bbox[4]) + "\n") 179 | bbox_count += 1 180 | label_file.close() 181 | 182 | if save_path_requested: 183 | save_bb_image(f['frame'], bboxes, save_path + filename + "_bb.jpg") 184 | 185 | img_count += 1 186 | 187 | if show_video: 188 | show_image(f['frame'], bboxes, max_pixel_value) 189 | plt.pause(0.05) 190 | 191 | if export_all_frames_requested: 192 | save_bb_image(f['frame'], np.array([]), export_frames_path + filename + "_" + requested_encoder + ".jpg", False, max_pixel_value) 193 | 194 | print("Saved {:d} encoded frames in path: {:s}".format(img_count, dir_paths["images"])) 195 | print("Saved {:d} bounding boxes annotations in path: {:s}".format(bbox_count, dir_paths["labels"])) 196 | 197 | completed_file = open(dir_paths['completed'], "a") 198 | completed_file.write(video_name + "\n") 199 | completed_file.close() 200 | 201 | print("Done") 202 | -------------------------------------------------------------------------------- /src/rt_detection.py: -------------------------------------------------------------------------------- 1 | """ 2 | rt_detection.py: Real Time Detection, uses YOLOv3 implementation 3 | in order to detect objects on an GEN1 event video (.dat) using 4 | Temporal Binary, Polarity and Surface Active Events encodings 5 | """ 6 | 7 | from __future__ import division 8 | 9 | import os 10 | import sys 11 | import time 12 | import datetime 13 | import argparse 14 | 15 | from PIL import Image 16 | 17 | import torch 18 | import torchvision.transforms as transforms 19 | from torch.utils.data import DataLoader 20 | from torchvision import datasets 21 | from torch.autograd import Variable 22 | 23 | import matplotlib.pyplot as plt 24 | import matplotlib.patches as patches 25 | from matplotlib.ticker import NullLocator 26 | 27 | from encoders import * 28 | from ec_utils import * 29 | from tbe import TemporalBinaryEncoding 30 | 31 | import sys 32 | sys.path.insert(0, '../PyTorch-YOLOv3/') 33 | from models import * 34 | from utils.utils import * 35 | from utils.datasets import * 36 | from utils.augmentations import * 37 | from utils.transforms import * 38 | 39 | sys.path.insert(0, '../prophesee-automotive-dataset-toolbox/') 40 | from src.io.psee_loader import PSEELoader 41 | 42 | 43 | def rescaleAndHandleFrame(detections, 44 | frame, 45 | img_size, 46 | show_video, 47 | save_frames, 48 | batch_count, 49 | is_sae: bool = False): 50 | bbox_list = [] 51 | if detections[0] is not None: 52 | for d in detections: 53 | d = d.cpu() 54 | to_list = d.tolist() 55 | if len(to_list) != 0: 56 | bbox_list.append(to_list) 57 | 58 | bbox_list = np.array(bbox_list) 59 | bboxes = [] 60 | if 
len(bbox_list) != 0: 61 | # Rescale boxes to original image 62 | bbox_list = rescale_boxes(bbox_list[0], img_size, frame.shape) 63 | print(bbox_list) 64 | for x1, y1, x2, y2, conf, cls_conf, cls_pred in bbox_list: 65 | 66 | print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item())) 67 | 68 | box_w = x2 - x1 69 | box_h = y2 - y1 70 | bbox = [0, x1, y1, box_w, box_h, cls_pred] 71 | bboxes.append(bbox) 72 | 73 | if show_video > 0: 74 | show_image(frame, np.array(bboxes), 255 if is_sae else 1) 75 | plt.pause(0.001) 76 | 77 | if save_frames > 0: 78 | save_bb_image(frame, np.array(bboxes), output_path + "/" + str(batch_count) + ".png", False, 255 if is_sae else 1) 79 | 80 | 81 | def tbr_detection(gen1_video, 82 | tbr_bits, 83 | delta_t, 84 | output_path, 85 | show_video, 86 | save_frames, 87 | img_size, 88 | conf_thres, 89 | nms_thres): 90 | # Set up TBE vars 91 | accumulation_mat = np.zeros((tbr_bits, gen1_video.get_size()[0], gen1_video.get_size()[1])) 92 | tbe_frame = np.zeros(gen1_video.get_size()) 93 | encoder = TemporalBinaryEncoding(tbr_bits, gen1_video.get_size()[1], gen1_video.get_size()[0]) 94 | 95 | i = 0 96 | batch_count = 0 97 | prev_time = time.time() 98 | # Parse events and build TBE frames 99 | while not gen1_video.done: 100 | i = (i + 1) % tbr_bits 101 | # Load next 1ms events from the video 102 | events = gen1_video.load_delta_t(delta_t) 103 | f = np.zeros(gen1_video.get_size()) 104 | for e in events: 105 | # Evaluate presence/absence of event for 106 | # a certain pixel 107 | f[e['y'], e['x']] = 1 108 | 109 | accumulation_mat[i, ...] = f 110 | 111 | if i == tbr_bits - 1: 112 | # Encode frame 113 | tbe_frame = encoder.encode(accumulation_mat) 114 | 115 | transform = transforms.Compose([ 116 | ToTensor(), 117 | Resize(img_size) 118 | ]) 119 | 120 | # Implemented transformations expect bbox array. Use a fake array 121 | # @TODO: find a better solution... 
122 | input_img, bbox = transform([Image.fromarray(255 * tbe_frame).convert('RGB'), np.zeros((1,1))]) 123 | # Add batch size (1) 124 | input_img = torch.unsqueeze(input_img, 0) 125 | input_img = input_img.to(device) 126 | 127 | detect_prev_time = time.time() 128 | # Detect objects on TBE frame 129 | with torch.no_grad(): 130 | detections = model(input_img) 131 | detections = non_max_suppression(detections, conf_thres, nms_thres) 132 | detection_time = datetime.timedelta(seconds=time.time() - detect_prev_time) 133 | print("\t+ Detection Time: %s" % (detection_time)) 134 | 135 | # Log progress 136 | current_time = time.time() 137 | inference_time = datetime.timedelta(seconds=current_time - prev_time) 138 | prev_time = current_time 139 | print("\t+ Batch %d, Inference Time: %s" % (batch_count, inference_time)) 140 | batch_count += 1 141 | 142 | rescaleAndHandleFrame(detections, tbe_frame, img_size, show_video, save_frames, batch_count) 143 | 144 | 145 | def polarity_detection(gen1_video, 146 | delta_t, 147 | output_path, 148 | show_video, 149 | save_frames, 150 | img_size, 151 | conf_thres, 152 | nms_thres): 153 | batch_count = 0 154 | prev_time = time.time() 155 | # Parse events and build Polarity frames 156 | while not gen1_video.done: 157 | # Load next events from the video 158 | events = gen1_video.load_delta_t(delta_t) 159 | p_frame = np.full(gen1_video.get_size(), 0.5) 160 | for e in events: 161 | # Evaluate polarity of an event 162 | # for a certain pixel 163 | if e['p'] == 1: 164 | p_frame[e['y'], e['x']] = 1 165 | else: 166 | p_frame[e['y'], e['x']] = 0 167 | 168 | transform = transforms.Compose([ 169 | ToTensor(), 170 | Resize(img_size) 171 | ]) 172 | 173 | # Implemented transformations expect bbox array. Use a fake array 174 | # @TODO: find a better solution... 
175 | input_img, bbox = transform([Image.fromarray(255 * p_frame).convert('RGB'), np.zeros((1,1))]) 176 | # Add batch size (1) 177 | input_img = torch.unsqueeze(input_img, 0) 178 | input_img = input_img.to(device) 179 | 180 | detect_prev_time = time.time() 181 | # Detect objects on Polarity frame 182 | with torch.no_grad(): 183 | detections = model(input_img) 184 | detections = non_max_suppression(detections, conf_thres, nms_thres) 185 | detection_time = datetime.timedelta(seconds=time.time() - detect_prev_time) 186 | print("\t+ Detection Time: %s" % (detection_time)) 187 | 188 | # Log progress 189 | current_time = time.time() 190 | inference_time = datetime.timedelta(seconds=current_time - prev_time) 191 | prev_time = current_time 192 | print("\t+ Batch %d, Inference Time: %s" % (batch_count, inference_time)) 193 | batch_count += 1 194 | 195 | rescaleAndHandleFrame(detections, p_frame, img_size, show_video, save_frames, batch_count) 196 | 197 | 198 | def sae_detection(gen1_video, 199 | delta_t, 200 | output_path, 201 | show_video, 202 | save_frames, 203 | img_size, 204 | conf_thres, 205 | nms_thres): 206 | batch_count = 0 207 | prev_time = time.time() 208 | startTimestamp = 0 # microseconds 209 | # Parse events and build SAE frames 210 | while not gen1_video.done: 211 | # Load next events from the video 212 | events = gen1_video.load_delta_t(delta_t) 213 | sae_frame = np.zeros(gen1_video.get_size()) 214 | for e in events: 215 | # Evaluate sae of an event 216 | # for a certain pixel 217 | t_p = e['t'] # microseconds 218 | t_0 = startTimestamp # microseconds 219 | sae_frame[e['y'], e['x']] = 255 * ((t_p - t_0) / delta_t) 220 | 221 | startTimestamp += delta_t 222 | 223 | transform = transforms.Compose([ 224 | ToTensor(), 225 | Resize(img_size) 226 | ]) 227 | 228 | # Implemented transformations expect bbox array. Use a fake array 229 | # @TODO: find a better solution... 230 | input_img, bbox = transform([Image.fromarray(sae_frame).convert('RGB'), np.zeros((1,1))]) 231 | # Add batch size (1) 232 | input_img = torch.unsqueeze(input_img, 0) 233 | input_img = input_img.to(device) 234 | 235 | detect_prev_time = time.time() 236 | # Detect objects on SAE frame 237 | with torch.no_grad(): 238 | detections = model(input_img) 239 | detections = non_max_suppression(detections, conf_thres, nms_thres) 240 | detection_time = datetime.timedelta(seconds=time.time() - detect_prev_time) 241 | print("\t+ Detection Time: %s" % (detection_time)) 242 | 243 | # Log progress 244 | current_time = time.time() 245 | inference_time = datetime.timedelta(seconds=current_time - prev_time) 246 | prev_time = current_time 247 | print("\t+ Batch %d, Inference Time: %s" % (batch_count, inference_time)) 248 | batch_count += 1 249 | 250 | rescaleAndHandleFrame(detections, sae_frame, img_size, show_video, save_frames, batch_count, True) 251 | 252 | 253 | if __name__ == "__main__": 254 | parser = argparse.ArgumentParser() 255 | parser.add_argument("--encoder", type=str, default='tbr', 256 | help="encoder: encode method ") 257 | parser.add_argument("--event_video", type=str, default="../data/event.dat", 258 | help="event_video: path to event video (.dat)") 259 | parser.add_argument('--tbr_bits', '-n', type=int, default=8, 260 | help='tbr_bits: set the number of bits for Temporal Binary Representation. Default: 8') 261 | parser.add_argument('--accumulation_time', '-a', type=int, default=2500, 262 | help='accumulation_time: set the quantization time of events (microseconds). 
Default: 2500') 263 | parser.add_argument("--model_def", type=str, default="../PyTorch-YOLOv3/config/yolov3.cfg", 264 | help="model_def: path to model definition file") 265 | parser.add_argument("--weights_path", type=str, default="../PyTorch-YOLOv3/weights/yolov3.weights", 266 | help="weights_path: path to weights file") 267 | parser.add_argument("--class_path", type=str, default="../data/classes.names", 268 | help="class_path: path to class label file") 269 | parser.add_argument("--conf_thres", type=float, default=0.8, 270 | help="conf_thres: object confidence threshold") 271 | parser.add_argument("--nms_thres", type=float, default=0.4, 272 | help="nms_thres: iou thresshold for non-maximum suppression") 273 | parser.add_argument("--batch_size", type=int, default=1, 274 | help="batch_size: size of the batches") 275 | parser.add_argument("--n_cpu", type=int, default=0, 276 | help="m_cpu: number of cpu threads to use during batch generation") 277 | parser.add_argument("--img_size", type=int, default=416, 278 | help="img_size: size of each image dimension") 279 | parser.add_argument('--show_video', action='count', default=0, 280 | help='show_video: show video with evaluated TBR frames and their bboxes during processing. Default: false') 281 | parser.add_argument('--save_frames', action='count', default=0, 282 | help='save_frames: save TBE frames and their detection as images') 283 | 284 | opt = parser.parse_args() 285 | print(opt) 286 | 287 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 288 | 289 | output_path = "../rt_detections" 290 | if opt.save_frames > 0: 291 | os.makedirs(output_path, exist_ok=True) 292 | 293 | # Set up model 294 | model = Darknet(opt.model_def, img_size=opt.img_size).to(device) 295 | 296 | if opt.weights_path.endswith(".weights"): 297 | # Load darknet weights 298 | model.load_darknet_weights(opt.weights_path) 299 | else: 300 | # Load checkpoint weights 301 | model.load_state_dict(torch.load(opt.weights_path)) 302 | 303 | model.eval() # Set in evaluation mode 304 | 305 | # Encoder 306 | encoder = opt.encoder 307 | if encoder != "tbr" and encoder != "polarity" and encoder != "sae": 308 | print("Invalid encoder specified. Available encoders: . 
Exiting...") 309 | exit() 310 | 311 | # Number of bits to be used in Temporal Binary Encoding 312 | tbr_bits = opt.tbr_bits 313 | 314 | # Accumulation time (microseconds) 315 | delta_t = opt.accumulation_time 316 | 317 | gen1_video = PSEELoader(opt.event_video) 318 | 319 | classes = load_classes(opt.class_path) # Extracts class labels from file 320 | 321 | if encoder == "tbr": 322 | tbr_detection(gen1_video, 323 | tbr_bits, 324 | delta_t, 325 | output_path, 326 | opt.show_video, 327 | opt.save_frames, 328 | opt.img_size, 329 | opt.conf_thres, 330 | opt.nms_thres) 331 | elif encoder == "polarity": 332 | polarity_detection(gen1_video, 333 | delta_t, 334 | output_path, 335 | opt.show_video, 336 | opt.save_frames, 337 | opt.img_size, 338 | opt.conf_thres, 339 | opt.nms_thres) 340 | else: 341 | sae_detection(gen1_video, 342 | delta_t, 343 | output_path, 344 | opt.show_video, 345 | opt.save_frames, 346 | opt.img_size, 347 | opt.conf_thres, 348 | opt.nms_thres) 349 | 350 | -------------------------------------------------------------------------------- /src/tbe.py: -------------------------------------------------------------------------------- 1 | """ 2 | tbe.py: class that manages the Temporal Binary Encoding 3 | """ 4 | 5 | import numpy as np 6 | 7 | class TemporalBinaryEncoding: 8 | """ 9 | @brief: Class for Temporal Binary Encoding using N bits 10 | """ 11 | 12 | def __init__(self, N: int, width: int, height: int): 13 | self.N = N 14 | self.width = width 15 | self.height = height 16 | self._mask = np.ones((self.N, self.height, self.width)) 17 | 18 | # Build the mask 19 | for i in range(N): 20 | self._mask[i, :, :] = 2 ** i 21 | 22 | def encode(self, mat: np.array) -> np.array: 23 | """ 24 | @brief: Encode events using binary encoding 25 | @param: mat 26 | @return: Encoded frame 27 | """ 28 | 29 | frame = np.sum((mat * self._mask), 0) / (2 ** self.N) 30 | return frame -------------------------------------------------------------------------------- /src/test_gen1.py: -------------------------------------------------------------------------------- 1 | """ 2 | test_gen1.py: Fork of the YOLOv3 implementation. 3 | This script will also output a GEN1 compliant 4 | npy array for each test event in order to use 5 | the prophesee COCO evaluation. 
6 | """ 7 | 8 | from __future__ import division 9 | 10 | import sys 11 | sys.path.insert(0, '../PyTorch-YOLOv3/') 12 | from models import * 13 | from utils.utils import * 14 | from utils.datasets import * 15 | from utils.augmentations import * 16 | from utils.transforms import * 17 | from utils.parse_config import * 18 | 19 | import os 20 | import sys 21 | import time 22 | import datetime 23 | import argparse 24 | import tqdm 25 | 26 | import torch 27 | from torch.utils.data import DataLoader 28 | from torchvision import datasets 29 | from torchvision import transforms 30 | from torch.autograd import Variable 31 | import torch.optim as optim 32 | 33 | 34 | def extract_bboxes(tensor, timestamp, img_size): 35 | bboxes = [] 36 | if tensor is None: 37 | return bboxes 38 | 39 | gen1_img_width = 304 40 | gen1_img_height = 240 41 | clone_tensor = tensor.clone() 42 | clone_tensor = clone_tensor.numpy() 43 | # Rescale boxes to original image 44 | bbox_list = rescale_boxes(clone_tensor, img_size, (gen1_img_height, gen1_img_width)) 45 | for b in bbox_list: 46 | x1 = b[0] 47 | y1 = b[1] 48 | x2 = b[2] 49 | y2 = b[3] 50 | 51 | w = x2 - x1 52 | h = y2 - y1 53 | 54 | pred_cls = int(b[6]) 55 | conf = float(b[5]) 56 | 57 | bbox = [timestamp, int(x1), int(y1), int(w), int(h), pred_cls, conf, 0] 58 | bboxes.append(tuple(bbox)) 59 | 60 | return bboxes 61 | 62 | def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size, gen1_output, acc_time): 63 | model.eval() 64 | 65 | # Get dataloader 66 | dataset = ListDataset(path, 67 | img_size=img_size, 68 | multiscale=False, 69 | transform=DEFAULT_TRANSFORMS) 70 | dataloader = torch.utils.data.DataLoader( 71 | dataset, 72 | batch_size=batch_size, 73 | shuffle=False, 74 | num_workers=1, 75 | collate_fn=dataset.collate_fn 76 | ) 77 | 78 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor 79 | 80 | labels = [] 81 | sample_metrics = [] # List of tuples (TP, confs, pred) 82 | event_npy = [] 83 | gen1_data_type= np.dtype([('ts', ' 5 | 6 | src_folder="$1" 7 | new_path="$2" 8 | 9 | src_relative_path="$src_folder"/data/custom 10 | dest_relative_path="$new_path"/data/custom/images 11 | train_txt_file="$src_relative_path"/train.txt 12 | valid_txt_file="$src_relative_path"/valid.txt 13 | test_txt_file="$src_relative_path"/test.txt 14 | 15 | check_txt_files() { 16 | local txt_file_path="$1" 17 | if [ ! -f "$txt_file_path" ] 18 | then 19 | echo "File $txt_file_path does not exists, exiting..." 20 | exit 1 21 | fi 22 | } 23 | 24 | change_paths() { 25 | local tmp="$1"/swp.txt 26 | local src="$2" 27 | 28 | touch "$tmp" 29 | 30 | while read line; do 31 | filename=$(basename "$line") 32 | new_line="$dest_relative_path"/"$filename" 33 | echo "$new_line" >> "$tmp" 34 | done < "$src" 35 | 36 | mv "$tmp" "$src" 37 | } 38 | 39 | check_txt_files "$train_txt_file" 40 | check_txt_files "$valid_txt_file" 41 | check_txt_files "$test_txt_file" 42 | change_paths "$src_relative_path" "$train_txt_file" 43 | change_paths "$src_relative_path" "$valid_txt_file" 44 | change_paths "$src_relative_path" "$test_txt_file" 45 | -------------------------------------------------------------------------------- /tools/get_bbox_classes.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Get the number of bboxes saved for each class. 
3 | # Usage: /bin/bash get_bbox_classes.sh 4 | 5 | pedestrians=0 6 | vehicles=0 7 | images=0 8 | 9 | readarray -t a < "$1" 10 | 11 | for txt in "${a[@]}" 12 | do 13 | images=$((images+1)) 14 | filename=$(basename "$txt") 15 | filename="${filename%.*}.txt" 16 | while read line; do 17 | # Read each label line 18 | class=$(echo $line | head -n1 | awk '{print $1;}') 19 | if [ "$class" == 1 ] 20 | then 21 | pedestrians=$((pedestrians+1)) 22 | elif [ "$class" == 0 ] 23 | then 24 | vehicles=$((vehicles+1)) 25 | fi 26 | done < "$2"/"$filename" 27 | done 28 | 29 | echo "Images: $images" 30 | echo "Vehicles: $vehicles" 31 | echo "Pedestrians: $pedestrians" -------------------------------------------------------------------------------- /tools/get_gen1_bboxes.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy as np 4 | 5 | def printGen1BBoxes(directory: str): 6 | """ 7 | @brief: Print GEN1 bboxes (.npy files) 8 | @param: directory - Directory where GEN1 .npy files are stored 9 | """ 10 | 11 | file_list_npy = [file for file in os.listdir(directory) if os.path.isfile(os.path.join(directory, file)) and 12 | os.path.splitext(os.path.join(directory, file))[1] == '.npy'] 13 | for npy_arr in file_list_npy: 14 | pedestrian_bboxes = 0 15 | vehicle_bboxes = 0 16 | 17 | gen1_bboxes = np.load(directory + "/" + npy_arr) 18 | for bbox in gen1_bboxes: 19 | if bbox[5] == 1: 20 | pedestrian_bboxes += 1 21 | else: 22 | vehicle_bboxes += 1 23 | 24 | print("==================================") 25 | print("Filename: " + npy_arr) 26 | print("Pedestrian bboxes: " + str(pedestrian_bboxes)) 27 | print("Vehicle bboxes: " + str(vehicle_bboxes)) 28 | 29 | 30 | if __name__ == "__main__": 31 | if len(sys.argv) < 2: 32 | print("Usage: python get_gen1_bboxes.py path") 33 | exit() 34 | 35 | printGen1BBoxes(str(sys.argv[1])) -------------------------------------------------------------------------------- /yolo_config/gen1-test.data: -------------------------------------------------------------------------------- 1 | classes= 2 2 | valid=/path/to/dataset/data/custom/test.txt 3 | names=/path/to/dataset/data/custom/classes.names 4 | eval=coco 5 | -------------------------------------------------------------------------------- /yolo_config/gen1.data: -------------------------------------------------------------------------------- 1 | classes=2 2 | train=/path/to/dataset/data/custom/train.txt 3 | valid=/path/to/dataset/data/custom/valid.txt 4 | names=/path/to/dataset/data/custom/classes.names 5 | -------------------------------------------------------------------------------- /yolo_config/yolov3-gen1.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Training 3 | batch=64 4 | subdivisions=8 5 | width=320 6 | height=320 7 | channels=3 8 | momentum=0.9 9 | decay=0.0005 10 | angle=0 11 | saturation = 1.5 12 | exposure = 1.5 13 | hue=.1 14 | 15 | learning_rate=0.001 16 | burn_in=1000 17 | max_batches = 500200 18 | policy=steps 19 | steps=400000,450000 20 | scales=.1,.1 21 | 22 | [convolutional] 23 | batch_normalize=1 24 | filters=32 25 | size=3 26 | stride=1 27 | pad=1 28 | activation=leaky 29 | 30 | # Downsample 31 | 32 | [convolutional] 33 | batch_normalize=1 34 | filters=64 35 | size=3 36 | stride=2 37 | pad=1 38 | activation=leaky 39 | 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=1 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | [convolutional] 49 | batch_normalize=1 50 | 
filters=64 51 | size=3 52 | stride=1 53 | pad=1 54 | activation=leaky 55 | 56 | [shortcut] 57 | from=-3 58 | activation=linear 59 | 60 | # Downsample 61 | 62 | [convolutional] 63 | batch_normalize=1 64 | filters=128 65 | size=3 66 | stride=2 67 | pad=1 68 | activation=leaky 69 | 70 | [convolutional] 71 | batch_normalize=1 72 | filters=64 73 | size=1 74 | stride=1 75 | pad=1 76 | activation=leaky 77 | 78 | [convolutional] 79 | batch_normalize=1 80 | filters=128 81 | size=3 82 | stride=1 83 | pad=1 84 | activation=leaky 85 | 86 | [shortcut] 87 | from=-3 88 | activation=linear 89 | 90 | [convolutional] 91 | batch_normalize=1 92 | filters=64 93 | size=1 94 | stride=1 95 | pad=1 96 | activation=leaky 97 | 98 | [convolutional] 99 | batch_normalize=1 100 | filters=128 101 | size=3 102 | stride=1 103 | pad=1 104 | activation=leaky 105 | 106 | [shortcut] 107 | from=-3 108 | activation=linear 109 | 110 | # Downsample 111 | 112 | [convolutional] 113 | batch_normalize=1 114 | filters=256 115 | size=3 116 | stride=2 117 | pad=1 118 | activation=leaky 119 | 120 | [convolutional] 121 | batch_normalize=1 122 | filters=128 123 | size=1 124 | stride=1 125 | pad=1 126 | activation=leaky 127 | 128 | [convolutional] 129 | batch_normalize=1 130 | filters=256 131 | size=3 132 | stride=1 133 | pad=1 134 | activation=leaky 135 | 136 | [shortcut] 137 | from=-3 138 | activation=linear 139 | 140 | [convolutional] 141 | batch_normalize=1 142 | filters=128 143 | size=1 144 | stride=1 145 | pad=1 146 | activation=leaky 147 | 148 | [convolutional] 149 | batch_normalize=1 150 | filters=256 151 | size=3 152 | stride=1 153 | pad=1 154 | activation=leaky 155 | 156 | [shortcut] 157 | from=-3 158 | activation=linear 159 | 160 | [convolutional] 161 | batch_normalize=1 162 | filters=128 163 | size=1 164 | stride=1 165 | pad=1 166 | activation=leaky 167 | 168 | [convolutional] 169 | batch_normalize=1 170 | filters=256 171 | size=3 172 | stride=1 173 | pad=1 174 | activation=leaky 175 | 176 | [shortcut] 177 | from=-3 178 | activation=linear 179 | 180 | [convolutional] 181 | batch_normalize=1 182 | filters=128 183 | size=1 184 | stride=1 185 | pad=1 186 | activation=leaky 187 | 188 | [convolutional] 189 | batch_normalize=1 190 | filters=256 191 | size=3 192 | stride=1 193 | pad=1 194 | activation=leaky 195 | 196 | [shortcut] 197 | from=-3 198 | activation=linear 199 | 200 | 201 | [convolutional] 202 | batch_normalize=1 203 | filters=128 204 | size=1 205 | stride=1 206 | pad=1 207 | activation=leaky 208 | 209 | [convolutional] 210 | batch_normalize=1 211 | filters=256 212 | size=3 213 | stride=1 214 | pad=1 215 | activation=leaky 216 | 217 | [shortcut] 218 | from=-3 219 | activation=linear 220 | 221 | [convolutional] 222 | batch_normalize=1 223 | filters=128 224 | size=1 225 | stride=1 226 | pad=1 227 | activation=leaky 228 | 229 | [convolutional] 230 | batch_normalize=1 231 | filters=256 232 | size=3 233 | stride=1 234 | pad=1 235 | activation=leaky 236 | 237 | [shortcut] 238 | from=-3 239 | activation=linear 240 | 241 | [convolutional] 242 | batch_normalize=1 243 | filters=128 244 | size=1 245 | stride=1 246 | pad=1 247 | activation=leaky 248 | 249 | [convolutional] 250 | batch_normalize=1 251 | filters=256 252 | size=3 253 | stride=1 254 | pad=1 255 | activation=leaky 256 | 257 | [shortcut] 258 | from=-3 259 | activation=linear 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=128 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=leaky 268 | 269 | [convolutional] 270 | batch_normalize=1 271 | filters=256 
272 | size=3 273 | stride=1 274 | pad=1 275 | activation=leaky 276 | 277 | [shortcut] 278 | from=-3 279 | activation=linear 280 | 281 | # Downsample 282 | 283 | [convolutional] 284 | batch_normalize=1 285 | filters=512 286 | size=3 287 | stride=2 288 | pad=1 289 | activation=leaky 290 | 291 | [convolutional] 292 | batch_normalize=1 293 | filters=256 294 | size=1 295 | stride=1 296 | pad=1 297 | activation=leaky 298 | 299 | [convolutional] 300 | batch_normalize=1 301 | filters=512 302 | size=3 303 | stride=1 304 | pad=1 305 | activation=leaky 306 | 307 | [shortcut] 308 | from=-3 309 | activation=linear 310 | 311 | 312 | [convolutional] 313 | batch_normalize=1 314 | filters=256 315 | size=1 316 | stride=1 317 | pad=1 318 | activation=leaky 319 | 320 | [convolutional] 321 | batch_normalize=1 322 | filters=512 323 | size=3 324 | stride=1 325 | pad=1 326 | activation=leaky 327 | 328 | [shortcut] 329 | from=-3 330 | activation=linear 331 | 332 | 333 | [convolutional] 334 | batch_normalize=1 335 | filters=256 336 | size=1 337 | stride=1 338 | pad=1 339 | activation=leaky 340 | 341 | [convolutional] 342 | batch_normalize=1 343 | filters=512 344 | size=3 345 | stride=1 346 | pad=1 347 | activation=leaky 348 | 349 | [shortcut] 350 | from=-3 351 | activation=linear 352 | 353 | 354 | [convolutional] 355 | batch_normalize=1 356 | filters=256 357 | size=1 358 | stride=1 359 | pad=1 360 | activation=leaky 361 | 362 | [convolutional] 363 | batch_normalize=1 364 | filters=512 365 | size=3 366 | stride=1 367 | pad=1 368 | activation=leaky 369 | 370 | [shortcut] 371 | from=-3 372 | activation=linear 373 | 374 | [convolutional] 375 | batch_normalize=1 376 | filters=256 377 | size=1 378 | stride=1 379 | pad=1 380 | activation=leaky 381 | 382 | [convolutional] 383 | batch_normalize=1 384 | filters=512 385 | size=3 386 | stride=1 387 | pad=1 388 | activation=leaky 389 | 390 | [shortcut] 391 | from=-3 392 | activation=linear 393 | 394 | 395 | [convolutional] 396 | batch_normalize=1 397 | filters=256 398 | size=1 399 | stride=1 400 | pad=1 401 | activation=leaky 402 | 403 | [convolutional] 404 | batch_normalize=1 405 | filters=512 406 | size=3 407 | stride=1 408 | pad=1 409 | activation=leaky 410 | 411 | [shortcut] 412 | from=-3 413 | activation=linear 414 | 415 | 416 | [convolutional] 417 | batch_normalize=1 418 | filters=256 419 | size=1 420 | stride=1 421 | pad=1 422 | activation=leaky 423 | 424 | [convolutional] 425 | batch_normalize=1 426 | filters=512 427 | size=3 428 | stride=1 429 | pad=1 430 | activation=leaky 431 | 432 | [shortcut] 433 | from=-3 434 | activation=linear 435 | 436 | [convolutional] 437 | batch_normalize=1 438 | filters=256 439 | size=1 440 | stride=1 441 | pad=1 442 | activation=leaky 443 | 444 | [convolutional] 445 | batch_normalize=1 446 | filters=512 447 | size=3 448 | stride=1 449 | pad=1 450 | activation=leaky 451 | 452 | [shortcut] 453 | from=-3 454 | activation=linear 455 | 456 | # Downsample 457 | 458 | [convolutional] 459 | batch_normalize=1 460 | filters=1024 461 | size=3 462 | stride=2 463 | pad=1 464 | activation=leaky 465 | 466 | [convolutional] 467 | batch_normalize=1 468 | filters=512 469 | size=1 470 | stride=1 471 | pad=1 472 | activation=leaky 473 | 474 | [convolutional] 475 | batch_normalize=1 476 | filters=1024 477 | size=3 478 | stride=1 479 | pad=1 480 | activation=leaky 481 | 482 | [shortcut] 483 | from=-3 484 | activation=linear 485 | 486 | [convolutional] 487 | batch_normalize=1 488 | filters=512 489 | size=1 490 | stride=1 491 | pad=1 492 | activation=leaky 493 | 
494 | [convolutional] 495 | batch_normalize=1 496 | filters=1024 497 | size=3 498 | stride=1 499 | pad=1 500 | activation=leaky 501 | 502 | [shortcut] 503 | from=-3 504 | activation=linear 505 | 506 | [convolutional] 507 | batch_normalize=1 508 | filters=512 509 | size=1 510 | stride=1 511 | pad=1 512 | activation=leaky 513 | 514 | [convolutional] 515 | batch_normalize=1 516 | filters=1024 517 | size=3 518 | stride=1 519 | pad=1 520 | activation=leaky 521 | 522 | [shortcut] 523 | from=-3 524 | activation=linear 525 | 526 | [convolutional] 527 | batch_normalize=1 528 | filters=512 529 | size=1 530 | stride=1 531 | pad=1 532 | activation=leaky 533 | 534 | [convolutional] 535 | batch_normalize=1 536 | filters=1024 537 | size=3 538 | stride=1 539 | pad=1 540 | activation=leaky 541 | 542 | [shortcut] 543 | from=-3 544 | activation=linear 545 | 546 | ###################### 547 | 548 | [convolutional] 549 | batch_normalize=1 550 | filters=512 551 | size=1 552 | stride=1 553 | pad=1 554 | activation=leaky 555 | 556 | [convolutional] 557 | batch_normalize=1 558 | size=3 559 | stride=1 560 | pad=1 561 | filters=1024 562 | activation=leaky 563 | 564 | [convolutional] 565 | batch_normalize=1 566 | filters=512 567 | size=1 568 | stride=1 569 | pad=1 570 | activation=leaky 571 | 572 | [convolutional] 573 | batch_normalize=1 574 | size=3 575 | stride=1 576 | pad=1 577 | filters=1024 578 | activation=leaky 579 | 580 | [convolutional] 581 | batch_normalize=1 582 | filters=512 583 | size=1 584 | stride=1 585 | pad=1 586 | activation=leaky 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=1 592 | pad=1 593 | filters=1024 594 | activation=leaky 595 | 596 | [convolutional] 597 | size=1 598 | stride=1 599 | pad=1 600 | filters=255 601 | activation=linear 602 | 603 | 604 | [yolo] 605 | mask = 6,7,8 606 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 607 | classes=80 608 | num=9 609 | jitter=.3 610 | ignore_thresh = .7 611 | truth_thresh = 1 612 | random=1 613 | 614 | 615 | [route] 616 | layers = -4 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=256 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=leaky 625 | 626 | [upsample] 627 | stride=2 628 | 629 | [route] 630 | layers = -1, 61 631 | 632 | 633 | 634 | [convolutional] 635 | batch_normalize=1 636 | filters=256 637 | size=1 638 | stride=1 639 | pad=1 640 | activation=leaky 641 | 642 | [convolutional] 643 | batch_normalize=1 644 | size=3 645 | stride=1 646 | pad=1 647 | filters=512 648 | activation=leaky 649 | 650 | [convolutional] 651 | batch_normalize=1 652 | filters=256 653 | size=1 654 | stride=1 655 | pad=1 656 | activation=leaky 657 | 658 | [convolutional] 659 | batch_normalize=1 660 | size=3 661 | stride=1 662 | pad=1 663 | filters=512 664 | activation=leaky 665 | 666 | [convolutional] 667 | batch_normalize=1 668 | filters=256 669 | size=1 670 | stride=1 671 | pad=1 672 | activation=leaky 673 | 674 | [convolutional] 675 | batch_normalize=1 676 | size=3 677 | stride=1 678 | pad=1 679 | filters=512 680 | activation=leaky 681 | 682 | [convolutional] 683 | size=1 684 | stride=1 685 | pad=1 686 | filters=255 687 | activation=linear 688 | 689 | 690 | [yolo] 691 | mask = 3,4,5 692 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 693 | classes=80 694 | num=9 695 | jitter=.3 696 | ignore_thresh = .7 697 | truth_thresh = 1 698 | random=1 699 | 700 | 701 | 702 | [route] 703 | layers = -4 704 | 705 | [convolutional] 706 | batch_normalize=1 707 | 
filters=128 708 | size=1 709 | stride=1 710 | pad=1 711 | activation=leaky 712 | 713 | [upsample] 714 | stride=2 715 | 716 | [route] 717 | layers = -1, 36 718 | 719 | 720 | 721 | [convolutional] 722 | batch_normalize=1 723 | filters=128 724 | size=1 725 | stride=1 726 | pad=1 727 | activation=leaky 728 | 729 | [convolutional] 730 | batch_normalize=1 731 | size=3 732 | stride=1 733 | pad=1 734 | filters=256 735 | activation=leaky 736 | 737 | [convolutional] 738 | batch_normalize=1 739 | filters=128 740 | size=1 741 | stride=1 742 | pad=1 743 | activation=leaky 744 | 745 | [convolutional] 746 | batch_normalize=1 747 | size=3 748 | stride=1 749 | pad=1 750 | filters=256 751 | activation=leaky 752 | 753 | [convolutional] 754 | batch_normalize=1 755 | filters=128 756 | size=1 757 | stride=1 758 | pad=1 759 | activation=leaky 760 | 761 | [convolutional] 762 | batch_normalize=1 763 | size=3 764 | stride=1 765 | pad=1 766 | filters=256 767 | activation=leaky 768 | 769 | [convolutional] 770 | size=1 771 | stride=1 772 | pad=1 773 | filters=255 774 | activation=linear 775 | 776 | 777 | [yolo] 778 | mask = 0,1,2 779 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 780 | classes=80 781 | num=9 782 | jitter=.3 783 | ignore_thresh = .7 784 | truth_thresh = 1 785 | random=1 786 | -------------------------------------------------------------------------------- /yolo_config/yolov3-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Training 3 | batch=64 4 | subdivisions=8 5 | width=320 6 | height=320 7 | channels=3 8 | momentum=0.9 9 | decay=0.0005 10 | angle=0 11 | saturation = 1.5 12 | exposure = 1.5 13 | hue=.1 14 | 15 | learning_rate=0.001 16 | burn_in=1000 17 | max_batches = 500200 18 | policy=steps 19 | steps=400000,450000 20 | scales=.1,.1 21 | 22 | # 0 23 | [convolutional] 24 | batch_normalize=1 25 | filters=16 26 | size=3 27 | stride=1 28 | pad=1 29 | activation=leaky 30 | 31 | # 1 32 | [maxpool] 33 | size=2 34 | stride=2 35 | 36 | # 2 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | # 3 46 | [maxpool] 47 | size=2 48 | stride=2 49 | 50 | # 4 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | # 5 60 | [maxpool] 61 | size=2 62 | stride=2 63 | 64 | # 6 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=1 70 | pad=1 71 | activation=leaky 72 | 73 | # 7 74 | [maxpool] 75 | size=2 76 | stride=2 77 | 78 | # 8 79 | [convolutional] 80 | batch_normalize=1 81 | filters=256 82 | size=3 83 | stride=1 84 | pad=1 85 | activation=leaky 86 | 87 | # 9 88 | [maxpool] 89 | size=2 90 | stride=2 91 | 92 | # 10 93 | [convolutional] 94 | batch_normalize=1 95 | filters=512 96 | size=3 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | # 11 102 | [maxpool] 103 | size=2 104 | stride=1 105 | 106 | # 12 107 | [convolutional] 108 | batch_normalize=1 109 | filters=1024 110 | size=3 111 | stride=1 112 | pad=1 113 | activation=leaky 114 | 115 | ########### 116 | 117 | # 13 118 | [convolutional] 119 | batch_normalize=1 120 | filters=256 121 | size=1 122 | stride=1 123 | pad=1 124 | activation=leaky 125 | 126 | # 14 127 | [convolutional] 128 | batch_normalize=1 129 | filters=512 130 | size=3 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | # 15 136 | [convolutional] 137 | size=1 138 | stride=1 139 | pad=1 140 | filters=255 141 | 
activation=linear 142 | 143 | 144 | 145 | # 16 146 | [yolo] 147 | mask = 3,4,5 148 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 149 | classes=80 150 | num=6 151 | jitter=.3 152 | ignore_thresh = .7 153 | truth_thresh = 1 154 | random=1 155 | 156 | # 17 157 | [route] 158 | layers = -4 159 | 160 | # 18 161 | [convolutional] 162 | batch_normalize=1 163 | filters=128 164 | size=1 165 | stride=1 166 | pad=1 167 | activation=leaky 168 | 169 | # 19 170 | [upsample] 171 | stride=2 172 | 173 | # 20 174 | [route] 175 | layers = -1, 8 176 | 177 | # 21 178 | [convolutional] 179 | batch_normalize=1 180 | filters=256 181 | size=3 182 | stride=1 183 | pad=1 184 | activation=leaky 185 | 186 | # 22 187 | [convolutional] 188 | size=1 189 | stride=1 190 | pad=1 191 | filters=255 192 | activation=linear 193 | 194 | # 23 195 | [yolo] 196 | mask = 1,2,3 197 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 198 | classes=80 199 | num=6 200 | jitter=.3 201 | ignore_thresh = .7 202 | truth_thresh = 1 203 | random=1 204 | --------------------------------------------------------------------------------