├── .gitignore
├── .gitmodules
├── README.md
├── doc
│   └── VMR_Francesco_Areoluci_Presentation.pdf
├── environment.yml
├── environment_cuda.yml
├── src
│   ├── cl_parser.py
│   ├── dir_handler.py
│   ├── ec_utils.py
│   ├── encoders.py
│   ├── event_converter.py
│   ├── rt_detection.py
│   ├── tbe.py
│   └── test_gen1.py
├── tools
│   ├── change_dataset_path.sh
│   ├── get_bbox_classes.sh
│   └── get_gen1_bboxes.py
└── yolo_config
    ├── gen1-test.data
    ├── gen1.data
    ├── yolov3-gen1.cfg
    └── yolov3-tiny.cfg

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | data
2 | train_events
3 | valid_events
4 | test_events
5 | */__pycache__
6 | rt_detections
7 | 
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "PyTorch-YOLOv3"]
2 |     path = PyTorch-YOLOv3
3 |     url = https://github.com/eriklindernoren/PyTorch-YOLOv3.git
4 | [submodule "prophesee-automotive-dataset-toolbox"]
5 |     path = prophesee-automotive-dataset-toolbox
6 |     url = https://github.com/prophesee-ai/prophesee-automotive-dataset-toolbox.git
7 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Temporal Binary Represented Event Object Detection
2 | 
3 | This repository contains a framework that can be used to perform object detection on events acquired from event cameras (https://en.wikipedia.org/wiki/Event_camera).
4 | To perform the tests, the following technologies and tools have been employed:
5 | 
6 | * Prophesee's GEN1 Automotive Detection Dataset. This dataset contains events and their annotated bounding boxes for two classes: pedestrians and vehicles (https://www.prophesee.ai/2020/01/24/prophesee-gen1-automotive-detection-dataset/)
7 | 
8 | * Temporal Binary Representation. This encoding has been developed to encode events into frames that can be fed to an object detector along with the annotated bounding boxes. For further details, check this paper: https://arxiv.org/pdf/2010.08946.pdf (a minimal sketch of the encoding is shown right after this list)
9 | 
10 | * YOLOv3 as Object Detector
11 | 
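The sketch below illustrates how Temporal Binary Representation collapses N consecutive binary event frames into a single grayscale frame, mirroring the logic of src/tbe.py shown later in this repository; the function name `temporal_binary_encode` is only illustrative and is not part of the codebase.

``` python
import numpy as np

def temporal_binary_encode(stack: np.ndarray) -> np.ndarray:
    """Collapse a (N, H, W) stack of binary event frames into one frame:
    each pixel's N-bit history is read as a binary number and normalized
    to [0, 1], as done by TemporalBinaryEncoding in src/tbe.py."""
    n = stack.shape[0]
    weights = 2 ** np.arange(n).reshape(n, 1, 1)   # frame i carries weight 2^i
    return np.sum(stack * weights, axis=0) / (2 ** n)

# Toy example: 4 accumulation windows of a 2x2 sensor
stack = np.random.randint(0, 2, size=(4, 2, 2))
frame = temporal_binary_encode(stack)              # values in [0, 1)
```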
12 | ## Submodules
13 | 
14 | This repository uses the following repositories as submodules:
15 | * PyTorch YOLOv3 implementation: https://github.com/eriklindernoren/PyTorch-YOLOv3.git
16 | * Prophesee Toolbox: https://github.com/prophesee-ai/prophesee-automotive-dataset-toolbox.git
17 | 
18 | Once this repository has been cloned, run:
19 | > git submodule update --init
20 | 
21 | ## Requirements
22 | 
23 | To run the conversion and the object detection, use the environment.yml file to create a dedicated Conda environment.
24 | Note: if you have an NVIDIA graphics card compatible with CUDA, use the environment_cuda.yml environment so that YOLO can run on the GPU.
25 | 
26 | ## Convert events to frames
27 | 
28 | Events from Prophesee's GEN1 dataset can be converted to frames and bounding box labels using the code inside the src/ folder. The code converts all the events listed in a given directory and organizes the data in a folder layout compliant with what the YOLOv3 implementation expects. Given a destination directory, the following directory tree is generated:
29 | ``` bash
30 | .
31 | └── data
32 |     ├── completed_videos
33 |     ├── custom
34 |     │   ├── classes.names
35 |     │   ├── images
36 |     │   ├── labels
37 |     │   ├── test.txt
38 |     │   ├── train.txt
39 |     │   └── valid.txt
40 |     └── evaluated_tbe
41 | 
42 | ```
43 | 
44 | Inside the custom folder, the converted frames and the bounding box annotations are stored in the images and labels folders. The train.txt, valid.txt and test.txt files list which images belong to the train, validation and test splits. Events that have already been converted are listed in the completed_videos text file. The Temporal Binary Represented arrays can be stored in npy format in the evaluated_tbe folder to avoid performing the same conversion multiple times.
45 | Moreover, other types of conversion have been implemented (Polarity and Surface Active Events encodings) in order to compare their results with Temporal Binary Represented event object detection.
46 | 
47 | ### Conversion
48 | 
49 | The conversion can be executed using the src/event_converter.py file:
50 | > python event_converter.py -h
51 | ``` bash
52 | Event to frame converter
53 | usage: event_converter.py [-h] [--use_stored_enc] [--save_enc] [--show_video]
54 |                           [--tbr_bits TBR_BITS] [--src_video SRC_VIDEO]
55 |                           [--dest_path DEST_PATH] [--event_type EVENT_TYPE]
56 |                           [--save_bb_img SAVE_BB_IMG]
57 |                           [--accumulation_time ACCUMULATION_TIME]
58 |                           [--encoder ENCODER]
59 |                           [--export_all_frames EXPORT_ALL_FRAMES]
60 | 
61 | Convert events to frames and associates bboxes
62 | 
63 | optional arguments:
64 |   -h, --help            show this help message and exit
65 |   --use_stored_enc, -l  use_stored_enc: instead of evaluates TBR or other
66 |                         encodings, uses pre-evaluated encoded array. Default:
67 |                         false
68 |   --save_enc, -s        save_enc: save the intermediate TBR or other encodings
69 |                         frame array. Default: false
70 |   --show_video, -v      show_video: show video with evaluated TBR frames and
71 |                         their bboxes during processing. Default: false
72 |   --tbr_bits TBR_BITS, -n TBR_BITS
73 |                         tbr_bits: set the number of bits for Temporal Binary
74 |                         Representation. Default: 8
75 |   --src_video SRC_VIDEO, -t SRC_VIDEO
76 |                         src_video: path to event videos
77 |   --dest_path DEST_PATH, -d DEST_PATH
78 |                         dest_path: path where images and bboxes will be stored
79 |   --event_type EVENT_TYPE, -e EVENT_TYPE
80 |                         event_type: specify data type: <train, validation,
81 |                         test>
82 |   --save_bb_img SAVE_BB_IMG, -b SAVE_BB_IMG
83 |                         save_bb_img: save frame with bboxes to path
84 |   --accumulation_time ACCUMULATION_TIME, -a ACCUMULATION_TIME
85 |                         accumulation_time: set the quantization time of events
86 |                         (microseconds). Default: 2500
87 |   --encoder ENCODER, -c ENCODER
88 |                         encoder: set the encoder: <tbe, polarity, sae>.
89 |                         Default: tbe
90 |   --export_all_frames EXPORT_ALL_FRAMES
91 |                         export_all_frames: export all encoded frames from an
92 |                         event video to path
93 | 
94 | ```
95 | 
96 | For example, to convert events from directory /dataset/train, store results in /dest/folder and label them as train data, run the following:
97 | > python event_converter.py --src_video /dataset/train --dest_path /dest/folder
98 | 
99 | To convert events from directory /dataset/validation, store results in /dest/folder and label them as validation data, run the following:
100 | > python event_converter.py --src_video /dataset/validation --dest_path /dest/folder --event_type validation
101 | 
102 | To convert events from directory /dataset/test, store results in /dest/folder and label them as test data, run the following:
103 | > python event_converter.py --src_video /dataset/test --dest_path /dest/folder --event_type test
104 | 
105 | Additional options are available in order to:
106 | * Change the number of bits that should be used in TBR - Option: -n X
107 | * Save converted frames with bboxes as images in a directory during processing - Option: -b /path/to/folder
108 | * Save the resulting encoded array in npy format - Option: -s
109 | * Load a previously saved encoded array - Option: -l
110 | * Show a video of converted frames and bboxes during processing - Option: -v
111 | * Change the accumulation time - Option: -a
112 | * Change the encoder in order to store frames in other formats - Option: -c
113 | 
114 | ### Training the object detector
115 | 
116 | Once the dataset has been built, the object detector can be trained.
117 | To set up the detector, modify the gen1.data and gen1-test.data files inside the yolo_config folder with the absolute path of the dataset.
118 | Two configuration files are available in order to use the tiny (yolo_config/yolov3-tiny.cfg) or the full (yolo_config/yolov3-gen1.cfg) YOLO implementation.
119 | 
120 | To train the detector, from the PyTorch-YOLOv3 folder launch the following command:
121 | > python3 train.py --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --data_config ../yolo_config/gen1.data
122 | 
123 | To test the detector:
124 | 
125 | > python3 test.py --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --data_config ../yolo_config/gen1-test.data --weights_path checkpoints/preferred_ckpt.pth
126 | 
127 | To use the detector against real images:
128 | 
129 | > python3 detect.py --image_folder /path/to/images --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --weights_path checkpoints/preferred_ckpt.pth --class_path /path/to/dataset/data/custom/classes.names
130 | 
131 | Further information is available in the PyTorch-YOLOv3 repository README.
132 | 
133 | ### Use the Prophesee COCO metric evaluation
134 | 
135 | To use the Prophesee evaluator script, compliant npy array files must be created. These arrays contain the bounding boxes detected on the test images along with their timestamps.
136 | This is done by forking the YOLOv3 test.py script. The new script creates, for each event video (i.e. the frames that belong to the same video), an array of tuples, where each tuple is a detected bounding box. The timestamp associated with a bounding box is approximated as starting_accumulation_time + (total_accumulation_time / 2), where total_accumulation_time for Temporal Binary Encoded frames is the accumulation time multiplied by the number of bits used.
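As a quick worked example of that approximation (a sketch only; `approx_bbox_timestamp` is an illustrative helper, not part of the repository): with the default 2500 µs accumulation time and 8 TBR bits, total_accumulation_time is 2500 * 8 = 20000 µs, which is also the value passed to --total_acc_time in the command below.

``` python
def approx_bbox_timestamp(start_ts_us: int, acc_time_us: int = 2500, tbr_bits: int = 8) -> int:
    """Timestamp assigned to boxes detected on a Temporal Binary Encoded frame:
    start of the frame's accumulation window plus half of its total length."""
    total_acc_time_us = acc_time_us * tbr_bits      # 2500 * 8 = 20000 us
    return start_ts_us + total_acc_time_us // 2

print(approx_bbox_timestamp(100_000))  # 100000 + 10000 = 110000 us
```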
137 | To output the npy files, enter the src/ folder and run the following command:
138 | 
139 | > python3 test_gen1.py --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --data_config ../yolo_config/gen1-test.data --weights_path ../PyTorch-YOLOv3/checkpoints/preferred_ckpt.pth --gen1_output /path/to/output/folder --total_acc_time 20000
140 | 
141 | Once the npy files have been created, the Prophesee evaluator can be used. Enter the prophesee-automotive-dataset-toolbox repository and run the following command:
142 | 
143 | > python3 psee_evaluator.py --camera GEN1 /path/to/gen1/events/npy /path/to/detection/events/npy
144 | 
145 | Note: to use the psee_evaluator script, it must be moved to the root directory of that repository. Otherwise the following error will be produced:
146 | 
147 | ``` bash
148 | Traceback (most recent call last):
149 |   File "psee_evaluator.py", line 5, in <module>
150 |     from src.metrics.coco_eval import evaluate_detection
151 | ModuleNotFoundError: No module named 'src'
152 | ```
153 | 
154 | ### Real time object detection
155 | 
156 | The YOLOv3 detect.py script has been forked to create a demo script that converts an event video to encoded frames and detects objects on them at the same time. To do that, enter the src/ folder and run the following command:
157 | 
158 | > python3 rt_detection.py --class_path /path/to/dataset/data/custom/classes.names --event_video /path/to/event/dat --model_def ../yolo_config/yolov3-<tiny|gen1>.cfg --encoder tbr --accumulation_time 10000 --tbr_bits 8 --show_video --weights_path ../PyTorch-YOLOv3/checkpoints/preferred_ckpt.pth --conf_thres 0.8
159 | 
160 | All three developed encoders can be used; the encoder is selected with the option:
161 | * --encoder <tbr, polarity, sae>, default: tbr
162 | 
--------------------------------------------------------------------------------
/doc/VMR_Francesco_Areoluci_Presentation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/francescoareoluci/tbr-event-object-detection/e1de8c47fedbe1ea5ab57ae936f1673ff4fb7272/doc/VMR_Francesco_Areoluci_Presentation.pdf
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: pytorch_env
2 | channels:
3 | - conda-forge
4 | - defaults
5 | dependencies:
6 | - _libgcc_mutex=0.1=main
7 | - _pytorch_select=0.1=cpu_0
8 | - absl-py=0.12.0=pyhd8ed1ab_0
9 | - aiohttp=3.7.4=py37h27cfd23_1
10 | - async-timeout=3.0.1=py_1000
11 | - attrs=20.3.0=pyhd3deb0d_0
12 | - blas=1.0=mkl
13 | - blinker=1.4=py_1
14 | - blosc=1.21.0=h8c45485_0
15 | - brotli=1.0.9=he6710b0_2
16 | - brotlipy=0.7.0=py37hb5d75c8_1001
17 | - brunsli=0.1=h2531618_0
18 | - bzip2=1.0.8=h7b6447c_0
19 | - c-ares=1.17.1=h36c2ea0_0
20 | - ca-certificates=2020.12.5=ha878542_0
21 | - cachetools=4.2.1=pyhd8ed1ab_0
22 | - cairo=1.14.12=h8948797_3
23 | - certifi=2020.12.5=py37h89c1867_1
24 | - cffi=1.14.5=py37h261ae71_0
25 | - chardet=3.0.4=py37he5f6b98_1008
26 | - charls=2.1.0=he6710b0_2
27 | - click=7.1.2=pyh9f0ad1d_0
28 | - cloudpickle=1.6.0=py_0
29 | - colorama=0.4.4=pyh9f0ad1d_0
30 | - cryptography=3.4.7=py37h5d9358c_0
31 | - cycler=0.10.0=py37_0
32 | - cytoolz=0.11.0=py37h7b6447c_0
33 | - dask-core=2021.3.0=pyhd3eb1b0_0
34 | - dbus=1.13.18=hb2f20db_0
35 | - decorator=4.4.2=pyhd3eb1b0_0
36 | - expat=2.2.10=he6710b0_2
37 | - ffmpeg=4.0=hcdf2ecd_0
38 | - fontconfig=2.13.1=h6c09931_0
39 | - freeglut=3.0.0=hf484d3e_5
40 | - 
freetype=2.10.4=h5ab3b9f_0 41 | - geos=3.8.0=he6710b0_0 42 | - giflib=5.1.4=h14c3975_1 43 | - glib=2.66.1=h92f7085_0 44 | - google-auth=1.26.1=pyh44b312d_0 45 | - google-auth-oauthlib=0.4.1=py_2 46 | - graphite2=1.3.14=h23475e2_0 47 | - grpcio=1.33.2=py37haffed2e_2 48 | - gst-plugins-base=1.14.0=h8213a91_2 49 | - gstreamer=1.14.0=h28cd5cc_2 50 | - harfbuzz=1.8.8=hffaf4a1_0 51 | - hdf5=1.10.2=hba1933b_1 52 | - icu=58.2=he6710b0_3 53 | - idna=2.10=pyh9f0ad1d_0 54 | - imagecodecs=2021.1.11=py37h581e88b_1 55 | - imageio=2.9.0=py_0 56 | - imgaug=0.4.0=pyhd3eb1b0_0 57 | - importlib-metadata=3.9.1=py37h89c1867_0 58 | - intel-openmp=2019.4=243 59 | - jasper=2.0.14=h07fcdf6_1 60 | - jpeg=9b=h024ee3a_2 61 | - jxrlib=1.1=h7b6447c_2 62 | - kiwisolver=1.3.1=py37h2531618_0 63 | - lcms2=2.11=h396b838_0 64 | - ld_impl_linux-64=2.33.1=h53a641e_7 65 | - lerc=2.2.1=h2531618_0 66 | - libaec=1.0.4=he6710b0_1 67 | - libdeflate=1.7=h27cfd23_5 68 | - libedit=3.1.20191231=h14c3975_1 69 | - libffi=3.3=he6710b0_2 70 | - libgcc-ng=9.1.0=hdf63c60_0 71 | - libgfortran-ng=7.3.0=hdf63c60_0 72 | - libglu=9.0.0=hf484d3e_1 73 | - libopencv=3.4.2=hb342d67_1 74 | - libopus=1.3.1=h7b6447c_0 75 | - libpng=1.6.37=hbc83047_0 76 | - libprotobuf=3.14.0=h8c45485_0 77 | - libstdcxx-ng=9.1.0=hdf63c60_0 78 | - libtiff=4.1.0=h2733197_1 79 | - libuuid=1.0.3=h1bed415_2 80 | - libvpx=1.7.0=h439df22_0 81 | - libwebp=1.0.1=h8e7db2f_0 82 | - libxcb=1.14=h7b6447c_0 83 | - libxml2=2.9.10=hb55368b_3 84 | - libzopfli=1.0.3=he6710b0_0 85 | - lz4-c=1.9.3=h2531618_0 86 | - markdown=3.3.4=pyhd8ed1ab_0 87 | - matplotlib=3.3.2=h06a4308_0 88 | - matplotlib-base=3.3.2=py37h817c723_0 89 | - mkl=2019.4=243 90 | - mkl-service=2.3.0=py37he8ac12f_0 91 | - mkl_fft=1.2.0=py37h23d657b_0 92 | - mkl_random=1.0.4=py37hd81dba3_0 93 | - multidict=5.1.0=py37h27cfd23_2 94 | - ncurses=6.2=he6710b0_1 95 | - networkx=2.5=py_0 96 | - ninja=1.10.2=py37hff7bd54_0 97 | - numpy=1.19.2=py37h54aff64_0 98 | - numpy-base=1.19.2=py37hfa32c7d_0 99 | - oauthlib=3.0.1=py_0 100 | - olefile=0.46=py37_0 101 | - opencv=3.4.2=py37h6fd60c2_1 102 | - openjpeg=2.3.0=h05c96fa_1 103 | - openssl=1.1.1k=h27cfd23_0 104 | - pcre=8.44=he6710b0_0 105 | - pillow=8.1.0=py37he98fc37_0 106 | - pip=20.3.3=py37h06a4308_0 107 | - pixman=0.40.0=h7b6447c_0 108 | - protobuf=3.14.0=py37h2531618_1 109 | - py-opencv=3.4.2=py37hb342d67_1 110 | - pyasn1=0.4.8=py_0 111 | - pyasn1-modules=0.2.7=py_0 112 | - pycparser=2.20=py_2 113 | - pyjwt=2.0.1=pyhd8ed1ab_0 114 | - pyopenssl=20.0.1=pyhd8ed1ab_0 115 | - pyparsing=2.4.7=pyhd3eb1b0_0 116 | - pyqt=5.9.2=py37h05f1152_2 117 | - pysocks=1.7.1=py37h89c1867_3 118 | - python=3.7.9=h7579374_0 119 | - python-dateutil=2.8.1=pyhd3eb1b0_0 120 | - python_abi=3.7=1_cp37m 121 | - pytorch=1.3.1=cpu_py37h62f834f_0 122 | - pywavelets=1.1.1=py37h7b6447c_2 123 | - pyyaml=5.4.1=py37h27cfd23_1 124 | - qt=5.9.7=h5867ecd_1 125 | - readline=8.1=h27cfd23_0 126 | - requests=2.25.1=pyhd3deb0d_0 127 | - requests-oauthlib=1.3.0=pyh9f0ad1d_0 128 | - rsa=4.7.2=pyh44b312d_0 129 | - scikit-image=0.17.2=py37hdf5156a_0 130 | - scipy=1.6.2=py37h91f5cce_0 131 | - setuptools=52.0.0=py37h06a4308_0 132 | - shapely=1.7.1=py37h98ec03d_0 133 | - sip=4.19.8=py37hf484d3e_0 134 | - six=1.15.0=py37h06a4308_0 135 | - snappy=1.1.8=he6710b0_0 136 | - sqlite=3.33.0=h62c20be_0 137 | - tensorboard=2.4.1=pyhd8ed1ab_0 138 | - tensorboard-plugin-wit=1.8.0=pyh44b312d_0 139 | - termcolor=1.1.0=py_2 140 | - terminaltables=3.1.0=py_0 141 | - tifffile=2021.3.17=pyhd3eb1b0_1 142 | - tk=8.6.10=hbc83047_0 143 | - 
toolz=0.11.1=pyhd3eb1b0_0 144 | - torchvision=0.4.2=cpu_py37h9ec355b_0 145 | - tornado=6.1=py37h27cfd23_0 146 | - tqdm=4.59.0=pyhd3eb1b0_1 147 | - typing-extensions=3.7.4.3=0 148 | - typing_extensions=3.7.4.3=py_0 149 | - tzdata=2020f=h52ac0ba_0 150 | - urllib3=1.26.4=pyhd8ed1ab_0 151 | - werkzeug=1.0.1=pyh9f0ad1d_0 152 | - wheel=0.36.2=pyhd3eb1b0_0 153 | - xz=5.2.5=h7b6447c_0 154 | - yaml=0.2.5=h7b6447c_0 155 | - yarl=1.6.3=py37h4abf009_0 156 | - zfp=0.5.5=h2531618_4 157 | - zipp=3.4.1=pyhd8ed1ab_0 158 | - zlib=1.2.11=h7b6447c_3 159 | - zstd=1.4.5=h9ceee32_0 160 | prefix: /home/magenta/anaconda3/envs/pytorch_env 161 | 162 | -------------------------------------------------------------------------------- /environment_cuda.yml: -------------------------------------------------------------------------------- 1 | name: pytorch_env 2 | channels: 3 | - pytorch 4 | - conda-forge 5 | - defaults 6 | dependencies: 7 | - _libgcc_mutex=0.1=main 8 | - absl-py=0.12.0=pyhd8ed1ab_0 9 | - blas=1.0=mkl 10 | - bzip2=1.0.8=h7b6447c_0 11 | - c-ares=1.17.1=h36c2ea0_0 12 | - ca-certificates=2020.12.5=ha878542_0 13 | - cairo=1.16.0=hf32fb01_1 14 | - certifi=2020.12.5=py37h89c1867_1 15 | - cloudpickle=1.6.0=py_0 16 | - colorama=0.4.4=pyh9f0ad1d_0 17 | - cudatoolkit=11.0.221=h6bb024c_0 18 | - cycler=0.10.0=py37_0 19 | - cython=0.29.17=py37h3340039_0 20 | - cytoolz=0.11.0=py37h7b6447c_0 21 | - dask-core=2021.4.0=pyhd3eb1b0_0 22 | - dbus=1.13.18=hb2f20db_0 23 | - decorator=5.0.6=pyhd3eb1b0_0 24 | - expat=2.3.0=h2531618_2 25 | - ffmpeg=4.0=hcdf2ecd_0 26 | - fontconfig=2.13.1=h6c09931_0 27 | - freeglut=3.0.0=hf484d3e_5 28 | - freetype=2.10.4=h5ab3b9f_0 29 | - fsspec=0.9.0=pyhd3eb1b0_0 30 | - geos=3.8.0=he6710b0_0 31 | - glib=2.68.1=h36276a3_0 32 | - graphite2=1.3.14=h23475e2_0 33 | - grpcio=1.33.2=py37haffed2e_2 34 | - gst-plugins-base=1.14.0=h8213a91_2 35 | - gstreamer=1.14.0=h28cd5cc_2 36 | - harfbuzz=1.8.8=hffaf4a1_0 37 | - hdf5=1.10.2=hba1933b_1 38 | - icu=58.2=he6710b0_3 39 | - imageio=2.9.0=pyhd3eb1b0_0 40 | - imgaug=0.4.0=pyhd3eb1b0_0 41 | - importlib-metadata=3.10.1=py37h89c1867_0 42 | - intel-openmp=2020.2=254 43 | - jasper=2.0.14=h07fcdf6_1 44 | - jpeg=9b=h024ee3a_2 45 | - kiwisolver=1.3.1=py37h2531618_0 46 | - lcms2=2.12=h3be6417_0 47 | - ld_impl_linux-64=2.33.1=h53a641e_7 48 | - libffi=3.3=he6710b0_2 49 | - libgcc-ng=9.1.0=hdf63c60_0 50 | - libgfortran-ng=7.3.0=hdf63c60_0 51 | - libglu=9.0.0=hf484d3e_1 52 | - libopencv=3.4.2=hb342d67_1 53 | - libopus=1.3.1=h7b6447c_0 54 | - libpng=1.6.37=hbc83047_0 55 | - libprotobuf=3.14.0=h8c45485_0 56 | - libstdcxx-ng=9.1.0=hdf63c60_0 57 | - libtiff=4.1.0=h2733197_1 58 | - libuuid=1.0.3=h1bed415_2 59 | - libuv=1.40.0=h7b6447c_0 60 | - libvpx=1.7.0=h439df22_0 61 | - libxcb=1.14=h7b6447c_0 62 | - libxml2=2.9.10=hb55368b_3 63 | - locket=0.2.1=py37h06a4308_1 64 | - lz4-c=1.9.3=h2531618_0 65 | - markdown=3.3.4=pyhd8ed1ab_0 66 | - matplotlib=3.3.4=py37h06a4308_0 67 | - matplotlib-base=3.3.4=py37h62a2d02_0 68 | - mkl=2020.2=256 69 | - mkl-service=2.3.0=py37he8ac12f_0 70 | - mkl_fft=1.3.0=py37h54f3939_0 71 | - mkl_random=1.1.1=py37h0573a6f_0 72 | - ncurses=6.2=he6710b0_1 73 | - networkx=2.2=py37_1 74 | - ninja=1.10.2=hff7bd54_1 75 | - numpy=1.19.2=py37h54aff64_0 76 | - numpy-base=1.19.2=py37hfa32c7d_0 77 | - olefile=0.46=py37_0 78 | - opencv=3.4.2=py37h6fd60c2_1 79 | - openssl=1.1.1k=h27cfd23_0 80 | - partd=1.2.0=pyhd3eb1b0_0 81 | - pcre=8.44=he6710b0_0 82 | - pillow=8.2.0=py37he98fc37_0 83 | - pip=21.0.1=py37h06a4308_0 84 | - pixman=0.40.0=h7b6447c_0 85 | - 
protobuf=3.14.0=py37h2531618_1 86 | - py-opencv=3.4.2=py37hb342d67_1 87 | - pycocotools=2.0.2=py37h8f50634_1 88 | - pyparsing=2.4.7=pyhd3eb1b0_0 89 | - pyqt=5.9.2=py37h05f1152_2 90 | - python=3.7.10=hdb3f193_0 91 | - python-dateutil=2.8.1=pyhd3eb1b0_0 92 | - python_abi=3.7=1_cp37m 93 | - pytorch=1.7.1=py3.7_cuda11.0.221_cudnn8.0.5_0 94 | - pywavelets=1.1.1=py37h7b6447c_2 95 | - pyyaml=5.4.1=py37h27cfd23_1 96 | - qt=5.9.7=h5867ecd_1 97 | - readline=8.1=h27cfd23_0 98 | - scikit-image=0.18.1=py37ha9443f7_0 99 | - scipy=1.6.2=py37h91f5cce_0 100 | - setuptools=52.0.0=py37h06a4308_0 101 | - shapely=1.7.1=py37h98ec03d_0 102 | - sip=4.19.8=py37hf484d3e_0 103 | - six=1.15.0=py37h06a4308_0 104 | - sqlite=3.35.4=hdfb4753_0 105 | - tensorboard=1.15.0=py37_0 106 | - termcolor=1.1.0=py_2 107 | - terminaltables=3.1.0=py_0 108 | - tifffile=2020.10.1=py37hdd07704_2 109 | - tk=8.6.10=hbc83047_0 110 | - toolz=0.11.1=pyhd3eb1b0_0 111 | - torchvision=0.8.2=py37_cu110 112 | - tornado=6.1=py37h27cfd23_0 113 | - tqdm=4.59.0=pyhd3eb1b0_1 114 | - typing_extensions=3.7.4.3=pyha847dfd_0 115 | - werkzeug=1.0.1=pyh9f0ad1d_0 116 | - wheel=0.36.2=pyhd3eb1b0_0 117 | - xz=5.2.5=h7b6447c_0 118 | - yaml=0.2.5=h7b6447c_0 119 | - zipp=3.4.1=pyhd8ed1ab_0 120 | - zlib=1.2.11=h7b6447c_3 121 | - zstd=1.4.9=haebb681_0 122 | prefix: /home/fareoluci/miniconda3/envs/pytorch_env 123 | -------------------------------------------------------------------------------- /src/cl_parser.py: -------------------------------------------------------------------------------- 1 | """ 2 | cl_parser.py: Command Line Parser, parses user commands 3 | """ 4 | 5 | import argparse 6 | 7 | class CLParser: 8 | """ 9 | @brief: Class to manage command line arguments 10 | """ 11 | 12 | def __init__(self): 13 | self._parser = argparse.ArgumentParser(description='Convert events to frames and associates bboxes') 14 | self._parser.add_argument('--use_stored_enc', '-l', action='count', default=0, 15 | help='use_stored_enc: instead of evaluates TBR or other encodings, uses pre-evaluated encoded array. Default: false') 16 | self._parser.add_argument('--save_enc', '-s', action='count', default=0, 17 | help='save_enc: save the intermediate TBR or other encodings frame array. Default: false') 18 | self._parser.add_argument('--show_video', '-v', action='count', default=0, 19 | help='show_video: show video with evaluated TBR frames and their bboxes during processing. Default: false') 20 | self._parser.add_argument('--tbr_bits', '-n', type=int, default=8, 21 | help='tbr_bits: set the number of bits for Temporal Binary Representation. Default: 8') 22 | self._parser.add_argument('--src_video', '-t', type=str, nargs=1, 23 | help='src_video: path to event videos') 24 | self._parser.add_argument('--dest_path', '-d', type=str, nargs=1, 25 | help='dest_path: path where images and bboxes will be stored') 26 | self._parser.add_argument('--event_type', '-e', type=str, nargs=1, 27 | help='event_type: specify data type: ') 28 | self._parser.add_argument('--save_bb_img', '-b', type=str, nargs=1, 29 | help='save_bb_img: save frame with bboxes to path') 30 | self._parser.add_argument('--accumulation_time', '-a', type=int, default=2500, 31 | help='accumulation_time: set the quantization time of events (microseconds). Default: 2500') 32 | self._parser.add_argument('--encoder', '-c', type=str, nargs=1, 33 | help='encoder: set the encoder: . 
Default: tbe') 34 | self._parser.add_argument('--export_all_frames', type=str, nargs=1, 35 | help='export_all_frames: export all encoded frames from an event video to path') 36 | 37 | def parse(self): 38 | """ 39 | @brief: parse the command line arguments 40 | @return: parsed arguments 41 | """ 42 | 43 | return self._parser.parse_args() -------------------------------------------------------------------------------- /src/dir_handler.py: -------------------------------------------------------------------------------- 1 | """ 2 | dir_handlers.py: module that manages the input/output directories 3 | """ 4 | 5 | import os 6 | 7 | def setupDirectories(root_dir: str) -> dict: 8 | """ 9 | @brief: Setup directories as requested in YOLOV3 10 | implementation. 11 | @param: root_dir - Root directory where the files should be 12 | saved. Must be a valid folder 13 | @return: Dict of useful directories: 14 | "images": images_path, 15 | "labels": labels_path, 16 | "train_file": train_file_path, 17 | "valid_file": valid_file_path, 18 | "test_file": test_file_path, 19 | "list": list_path, 20 | "completed": completed_file_path, 21 | "enc": evaluated_enc_path 22 | """ 23 | 24 | start_folder = 'data' 25 | data_path = root_dir + '/' + start_folder 26 | custom_path = data_path + '/custom' 27 | images_path = custom_path + '/images' 28 | labels_path = custom_path + '/labels' 29 | classes_file_path = custom_path + '/classes.names' 30 | train_file_path = custom_path + '/train.txt' 31 | valid_file_path = custom_path + '/valid.txt' 32 | test_file_path = custom_path + '/test.txt' 33 | completed_file_path = data_path + "/completed_videos" 34 | evaluated_enc_path = data_path + "/evaluated_enc" 35 | 36 | if not os.path.isdir(data_path): 37 | os.mkdir(data_path) 38 | start_folder_abs = os.path.abspath(data_path) 39 | list_path = start_folder_abs + '/custom/images/' 40 | 41 | if not os.path.isdir(custom_path): 42 | os.mkdir(custom_path) 43 | 44 | if not os.path.isdir(images_path): 45 | os.mkdir(images_path) 46 | 47 | if not os.path.isdir(labels_path): 48 | os.mkdir(labels_path) 49 | 50 | if not os.path.isdir(evaluated_enc_path): 51 | os.mkdir(evaluated_enc_path) 52 | 53 | # Setup classes 54 | f = open(classes_file_path, "w") 55 | f.write("vehicle\n") 56 | f.write("pedestrian\n") 57 | f.close 58 | 59 | if not os.path.isfile(completed_file_path): 60 | f = open(completed_file_path, "x") 61 | f.close() 62 | 63 | if not os.path.isfile(train_file_path): 64 | f = open(train_file_path, "x") 65 | f.close() 66 | 67 | if not os.path.isfile(valid_file_path): 68 | f = open(valid_file_path, "x") 69 | f.close() 70 | 71 | if not os.path.isfile(test_file_path): 72 | f = open(test_file_path, "x") 73 | f.close() 74 | 75 | return { 76 | "images": images_path, 77 | "labels": labels_path, 78 | "train_file": train_file_path, 79 | "valid_file": valid_file_path, 80 | "test_file": test_file_path, 81 | "list": list_path, 82 | "completed": completed_file_path, 83 | "enc": evaluated_enc_path 84 | } 85 | 86 | 87 | def getEventList(directory: str) -> list: 88 | """ 89 | @brief: Check in directory for events and bbox annotations. 90 | An event is valid if annotation file 91 | with same basename exists. 92 | @param: directory - Directory where the .dat and .npy files 93 | are stored 94 | @return: List of basenames of valid event files. 
95 | """ 96 | 97 | file_list_npy = [file for file in os.listdir(directory) if os.path.isfile(os.path.join(directory, file)) and 98 | os.path.splitext(os.path.join(directory, file))[1] == '.npy'] 99 | file_list_dat = [file for file in os.listdir(directory) if os.path.isfile(os.path.join(directory, file)) and 100 | os.path.splitext(os.path.join(directory, file))[1] == '.dat'] 101 | filtered_file_list = [] 102 | for td in file_list_dat: 103 | if "cut" in td: 104 | # Avoid files with same name but different 'cut' 105 | # @TODO: change split policy to handle these files 106 | print("Skipping video {:s}: filename not compliant".format(td)) 107 | continue 108 | 109 | td_split = td.split('_') 110 | td = td_split[0] + "_" + td_split[1] + "_" + td_split[2] + "_" + td_split[3] 111 | for bbox in file_list_npy: 112 | bbox_split = bbox.split('_') 113 | bbox = bbox_split[0] + "_" + bbox_split[1] + "_" + bbox_split[2] + "_" + bbox_split[3] 114 | if td == bbox: 115 | filtered_file_list.append(td) 116 | 117 | return filtered_file_list -------------------------------------------------------------------------------- /src/ec_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | ec_utils.py: Event Converter Utils - Utility functions 3 | """ 4 | 5 | import numpy as np 6 | import matplotlib.pyplot as plt 7 | from matplotlib.patches import Rectangle 8 | from matplotlib.ticker import NullLocator 9 | 10 | 11 | def show_image(frame: np.array, bboxes: np.array, max_value: int = 1): 12 | """ 13 | @brief: show video of encoded frames and their bboxes 14 | during processing 15 | @param: frame - A np array containing pixel informations 16 | @param: bboxes - np array with the bboxes associated to the frame. 17 | As loaded from the GEN1 .npy array 18 | """ 19 | 20 | plt.figure(1) 21 | plt.clf() 22 | plt.axis("off") 23 | plt.imshow(frame, animated=True, cmap='gray', vmin=0, vmax=max_value) 24 | #plt.colorbar() 25 | 26 | # Get the current reference 27 | ax = plt.gca() 28 | 29 | # Create Rectangle boxes 30 | for b in bboxes: 31 | predicted_class = b[5] 32 | x = b[1] 33 | y = b[2] 34 | w = b[3] 35 | h = b[4] 36 | bbox_color = 'g' if predicted_class == 1 else 'r' 37 | # Create Rectangle 38 | rect = Rectangle((x, y), w, h, linewidth=2, edgecolor=bbox_color, facecolor='none') 39 | # Add the patch to the Axes 40 | ax.add_patch(rect) 41 | # Add label 42 | plt.text( 43 | b[1], 44 | b[2], 45 | s='Pedestrian' if predicted_class == 1 else 'Vehicle', 46 | color="white", 47 | verticalalignment="top", 48 | bbox={"color": bbox_color, "pad": 0}, 49 | ) 50 | 51 | 52 | def save_bb_image(frame: np.array, 53 | bboxes: np.array, 54 | save_path: str, 55 | only_detection: bool = True, 56 | max_value: int = 1): 57 | """ 58 | @brief: save encoded frames with their bboxes 59 | @param: frame - A np array containing pixel informations 60 | @param: bboxes - np array with the bboxes associated to the frame. 
61 | As loaded from the GEN1 .npy array 62 | @param: save_path - Existing path where the resulting images should 63 | be saved 64 | """ 65 | 66 | plt.imshow(frame, cmap='gray', vmin=0, vmax=max_value) 67 | plt.axis('off') 68 | 69 | # Get the current reference 70 | ax = plt.gca() 71 | 72 | # Create Rectangle boxes 73 | for b in bboxes: 74 | predicted_class = b[5] 75 | x = b[1] 76 | y = b[2] 77 | w = b[3] 78 | h = b[4] 79 | bbox_color = 'g' if predicted_class == 1 else 'r' 80 | # Create Rectangle 81 | rect = Rectangle((x, y), w, h, linewidth=2, edgecolor=bbox_color, facecolor='none') 82 | # Add the patch to the Axes 83 | ax.add_patch(rect) 84 | # Add label 85 | plt.text( 86 | b[1], 87 | b[2], 88 | s='Pedestrian' if predicted_class == 1 else 'Vehicle', 89 | color="white", 90 | verticalalignment="top", 91 | bbox={"color": bbox_color, "pad": 0}, 92 | ) 93 | 94 | if not only_detection: 95 | # Save all frames if requested 96 | plt.savefig(save_path, bbox_inches='tight') 97 | plt.close() 98 | elif len(bboxes) != 0: 99 | # Save only frames with bboxes associated 100 | plt.savefig(save_path, bbox_inches='tight') 101 | plt.close() 102 | 103 | 104 | def convertBBoxCoords(bbox: np.array, image_width: int, image_height: int) -> np.array: 105 | """ 106 | @brief: Converts top-left starting coordinates to 107 | rectangle-centered coordinates. Moreover, 108 | coordinates and size are normalized. 109 | @param: bbox - A bbox as loaded from the GEN1 .npy array 110 | @param: image_width 111 | @param: image_height 112 | @return: np array compliant to YOLOV3 implementation. 113 | """ 114 | 115 | top_left_x = bbox[1] 116 | top_left_y = bbox[2] 117 | width = bbox[3] 118 | height = bbox[4] 119 | norm_center_x = float((top_left_x + (width / 2)) / image_width) 120 | norm_center_y = float((top_left_y + (height / 2)) / image_height) 121 | norm_width = float(width / image_width) 122 | norm_height = float(height / image_height) 123 | 124 | new_bbox = np.array([int(bbox[5]), norm_center_x, norm_center_y, norm_width, norm_height]) 125 | 126 | return new_bbox 127 | -------------------------------------------------------------------------------- /src/encoders.py: -------------------------------------------------------------------------------- 1 | """ 2 | encoders.py: encode frames from event videos using 3 | Temporal Binary Representation, Polarity, Surface Active Events encodings 4 | """ 5 | 6 | import math 7 | import sys 8 | import numpy as np 9 | from tqdm import tqdm 10 | 11 | from tbe import TemporalBinaryEncoding 12 | import sys 13 | sys.path.insert(0, '../prophesee-automotive-dataset-toolbox/') 14 | from src.io.psee_loader import PSEELoader 15 | 16 | 17 | def encode_video_sae(width: int, 18 | height: int, 19 | video: PSEELoader, 20 | delta: int = 2500) -> np.array: 21 | """ 22 | @brief: Encode video in a sequence of frames using 23 | Surface Active Event (SAE) encoding 24 | @param: width 25 | @param: height 26 | @param: video - loaded from PSEELoader 27 | @param: delta - accumulation time 28 | @return: encoded frames as a Numpy array with the following data type: 29 | [('startTs', np.uint16), 30 | ('endTs', np.uint16), 31 | ('frame', np.float32, (height, width))] 32 | """ 33 | 34 | print("Starting Surface Active Event encoding...") 35 | 36 | # Each encoded frame will have a start/end timestamp (ms) in order 37 | # to associate bounding boxes later. 38 | # Note: If videos are longer than 1 minutes, 16 bits per ts are not sufficient. 
39 | data_type = np.dtype([('startTs', np.uint16), 40 | ('endTs', np.uint16), 41 | ('frame', np.float32, (height, width))]) 42 | 43 | samplePerVideo = math.ceil(video.total_time() / delta) 44 | sae_array = np.zeros(samplePerVideo, dtype=data_type) 45 | 46 | i = 0 47 | startTimestamp = 0 # milliseconds 48 | endTimestamp = 0 # milliseconds 49 | 50 | pbar = tqdm(total=samplePerVideo, file=sys.stdout) 51 | while not video.done: 52 | events = video.load_delta_t(delta) 53 | f = np.zeros(video.get_size()) 54 | for e in events: 55 | # Evaluate polarity of an event 56 | # for a certain pixel 57 | t_p = e['t'] # microseconds 58 | t_0 = startTimestamp * 1000 # microseconds 59 | f[e['y'], e['x']] = 255 * ((t_p - t_0) / delta) 60 | 61 | endTimestamp += delta / 1000 62 | sae_array[i]['startTs'] = startTimestamp 63 | sae_array[i]['endTs'] = endTimestamp 64 | sae_array[i]['frame'] = f 65 | startTimestamp += delta / 1000 66 | i += 1 67 | 68 | pbar.update(1) 69 | 70 | pbar.close() 71 | return sae_array 72 | 73 | 74 | def encode_video_polarity(width: int, 75 | height: int, 76 | video: PSEELoader, 77 | delta: int = 2500) -> np.array: 78 | """ 79 | @brief: Encode video in a sequence of frames using 80 | Polarity encoding 81 | @param: width 82 | @param: height 83 | @param: video - loaded from PSEELoader 84 | @param: delta - accumulation time 85 | @return: encoded frames as a Numpy array with the following data type: 86 | [('startTs', np.uint16), 87 | ('endTs', np.uint16), 88 | ('frame', np.float32, (height, width))] 89 | """ 90 | 91 | print("Starting Polarity Encoding...") 92 | 93 | # Each encoded frame will have a start/end timestamp (ms) in order 94 | # to associate bounding boxes later. 95 | # Note: If videos are longer than 1 minutes, 16 bits per ts are not sufficient. 
96 | data_type = np.dtype([('startTs', np.uint16), 97 | ('endTs', np.uint16), 98 | ('frame', np.float32, (height, width))]) 99 | 100 | samplePerVideo = math.ceil(video.total_time() / delta) 101 | polarity_array = np.zeros(samplePerVideo, dtype=data_type) 102 | 103 | i = 0 104 | startTimestamp = 0 # milliseconds 105 | endTimestamp = 0 # milliseconds 106 | 107 | pbar = tqdm(total=samplePerVideo, file=sys.stdout) 108 | while not video.done: 109 | events = video.load_delta_t(delta) 110 | f = np.full(video.get_size(), 0.5) 111 | for e in events: 112 | # Evaluate polarity of an event 113 | # for a certain pixel 114 | if e['p'] == 1: 115 | f[e['y'], e['x']] = 1 116 | else: 117 | f[e['y'], e['x']] = 0 118 | 119 | endTimestamp += delta / 1000 120 | polarity_array[i]['startTs'] = startTimestamp 121 | polarity_array[i]['endTs'] = endTimestamp 122 | polarity_array[i]['frame'] = f 123 | startTimestamp += delta / 1000 124 | i += 1 125 | 126 | pbar.update(1) 127 | 128 | pbar.close() 129 | return polarity_array 130 | 131 | 132 | def encode_video_tbe(N: int, 133 | width: int, 134 | height: int, 135 | video: PSEELoader, 136 | encoder: TemporalBinaryEncoding, 137 | delta: int = 2500) -> np.array: 138 | """ 139 | @brief: Encode an event video in a sequence of frame 140 | using the Temporal Binary Representation 141 | @param: N - number of bits to be used 142 | @param: width 143 | @param: height 144 | @param: video - loaded from PSEELoader 145 | @param: encoded - TBE encoder 146 | @param: delta - accumulation time 147 | @return: encoded frames as a Numpy array with the following data type: 148 | [('startTs', np.uint16), 149 | ('endTs', np.uint16), 150 | ('frame', np.float32, (height, width))] 151 | """ 152 | 153 | print("Starting Temporal Binary Encoding...") 154 | 155 | # Each encoded frame will have a start/end timestamp (ms) in order 156 | # to associate bounding boxes later. 157 | # Note: If videos are longer than 1 minutes, 16 bits per ts are not sufficient. 158 | data_type = np.dtype([('startTs', np.uint16), 159 | ('endTs', np.uint16), 160 | ('frame', np.float32, (height, width))]) 161 | 162 | samplePerVideo = math.ceil((video.total_time() / delta) / N) 163 | accumulation_mat = np.zeros((N, height, width)) 164 | tbe_array = np.zeros(samplePerVideo, dtype=data_type) 165 | 166 | i = 0 167 | j = 0 168 | startTimestamp = 0 # milliseconds 169 | endTimestamp = 0 # milliseconds 170 | 171 | pbar = tqdm(total = samplePerVideo, file = sys.stdout) 172 | while not video.done: 173 | i = (i + 1) % N 174 | # Load next 1ms events from the video 175 | events = video.load_delta_t(delta) 176 | f = np.zeros(video.get_size()) 177 | for e in events: 178 | # Evaluate presence/absence of event for 179 | # a certain pixel 180 | f[e['y'], e['x']] = 1 181 | 182 | accumulation_mat[i, ...] = f 183 | 184 | if i == N - 1: 185 | endTimestamp += (N * delta) / 1000 186 | tbe = encoder.encode(accumulation_mat) 187 | tbe_array[j]['startTs'] = startTimestamp 188 | tbe_array[j]['endTs'] = endTimestamp 189 | tbe_array[j]['frame'] = tbe 190 | j += 1 191 | startTimestamp += (N * delta) / 1000 192 | pbar.update(1) 193 | 194 | pbar.close() 195 | return tbe_array 196 | 197 | 198 | def get_frame_BB(frame: np.array, BB_array: np.array) -> np.array: 199 | """ 200 | @brief: Associates to an encoded video frame 201 | a list of bounding boxes with timestamp included in 202 | start/end timestamp of the frame. 203 | @param: frame - Encoded frame with the following structure: 204 | [{'startTs': startTs}, {'endTs': endTs}, {'frame': frame}] 205 | (i.e. 
as the one returned from the encoders fuctions) 206 | @param: BB_array - Bounding Boxes array, 207 | loaded from the GEN1 .npy arrays 208 | @return: The associated BBoxes. 209 | """ 210 | 211 | associated_bb = [] 212 | for bb in BB_array: 213 | # Convert timestamp to milliseconds 214 | timestamp = bb[0] / 1000 215 | startTime = frame['startTs'] 216 | endTime = frame['endTs'] 217 | if timestamp >= startTime and timestamp <= endTime: 218 | associated_bb.append(bb) 219 | # Avoid useless iterations 220 | if timestamp > endTime: 221 | break 222 | 223 | return np.array(associated_bb) -------------------------------------------------------------------------------- /src/event_converter.py: -------------------------------------------------------------------------------- 1 | """ 2 | event_converter.py: convert event videos to frames with Temporal Binary Encoding 3 | Main module 4 | """ 5 | 6 | import os 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | from PIL import Image 10 | 11 | from encoders import * 12 | from ec_utils import * 13 | from dir_handler import * 14 | from tbe import TemporalBinaryEncoding 15 | from cl_parser import CLParser 16 | 17 | import sys 18 | sys.path.insert(0, '../prophesee-automotive-dataset-toolbox/') 19 | from src.io.psee_loader import PSEELoader 20 | 21 | 22 | if __name__ == "__main__": 23 | print("Event to frame converter") 24 | 25 | # Parsing arguments 26 | parser = CLParser() 27 | args = parser.parse() 28 | save_encoding = True if args.save_enc > 0 else False 29 | use_stored_encoding = True if args.use_stored_enc > 0 else False 30 | show_video = True if args.show_video > 0 else False 31 | tbr_bits_requested = True if args.tbr_bits != None else False 32 | src_video_requested = True if args.src_video != None else False 33 | dest_path_requested = True if args.dest_path != None else False 34 | event_type_requested = True if args.event_type != None else False 35 | save_path_requested = True if args.save_bb_img != None else False 36 | accumulation_time_requested = True if args.accumulation_time != None else False 37 | encoder_type_requested = True if args.encoder != None else False 38 | export_all_frames_requested = True if args.export_all_frames != None else False 39 | 40 | dest_root_folder = '..' 41 | if dest_path_requested: 42 | dest_root_folder = args.dest_path[0] 43 | 44 | save_path = "" 45 | if save_path_requested: 46 | save_path = args.save_bb_img[0] + '/' 47 | 48 | export_frames_path = "" 49 | if export_all_frames_requested: 50 | export_frames_path = args.export_all_frames[0] + '/' 51 | 52 | video_dir = "../train_events/" 53 | if src_video_requested: 54 | video_dir = args.src_video[0] + '/' 55 | 56 | data_type = 'train' 57 | if event_type_requested: 58 | if args.event_type[0] == 'train' or args.event_type[0] == 'validation' or args.event_type[0] == 'test': 59 | data_type = args.event_type[0] 60 | else: 61 | print("Invalid event type requested. 
Supported: .") 62 | exit() 63 | 64 | # Encoder 65 | requested_encoder = 'tbe' 66 | if encoder_type_requested: 67 | if args.encoder[0] == 'tbe' or args.encoder[0] == 'polarity' or args.encoder[0] == 'sae': 68 | requested_encoder = args.encoder[0] 69 | else: 70 | print("Invalid encoder requested") 71 | exit() 72 | 73 | # Number of bits to be used in Temporal Binary Encoding 74 | tbr_bits = args.tbr_bits 75 | 76 | # Accumulation time (microseconds) 77 | delta_t = args.accumulation_time 78 | 79 | # Print some info 80 | print("===============================") 81 | print("Encoder: " + requested_encoder) 82 | print("Requested encoded array saving: " + str(save_encoding)) 83 | print("Requested saved encoded array loading: " + str(use_stored_encoding)) 84 | print("Requested video show during processing: " + str(show_video)) 85 | if requested_encoder == 'tbe': 86 | print("Using {:d} bits to represent events".format(tbr_bits)) 87 | print("Accumulation time: " + str(delta_t)) 88 | print("Source event path: " + video_dir) 89 | print("Destination path: " + dest_root_folder + '/data') 90 | print("Event data type: " + data_type) 91 | print("===============================") 92 | 93 | if data_type == "train": 94 | txt_list_file = 'train_file' 95 | elif data_type == 'validation': 96 | txt_list_file = 'valid_file' 97 | else: 98 | txt_list_file = 'test_file' 99 | 100 | # Setup data directory to save files (images, bboxes and labels) 101 | dir_paths = setupDirectories(dest_root_folder) 102 | 103 | # Iterate through videos in video_dir to get list 104 | video_names = getEventList(video_dir) 105 | 106 | # Max pixel value to display frames 107 | max_pixel_value = 1 108 | 109 | # Iterate videos 110 | for video_name in video_names: 111 | video_path = video_dir + video_name 112 | 113 | with open(dir_paths['completed']) as completed_videos: 114 | if video_name in completed_videos.read(): 115 | print("Skipping completed video: " + video_name) 116 | continue 117 | 118 | print("Processing video: " + video_name) 119 | 120 | gen1_bboxes = np.load(video_path + "_bbox.npy") 121 | 122 | # Load video 123 | gen1_video = PSEELoader(video_path + "_td.dat") 124 | 125 | width = gen1_video.get_size()[1] 126 | height = gen1_video.get_size()[0] 127 | encoder = TemporalBinaryEncoding(tbr_bits, width, height) 128 | 129 | if not use_stored_encoding: 130 | # Convert event video to a Temporal Binary Encoded frames array 131 | if requested_encoder == 'tbe': 132 | encoded_array = encode_video_tbe(tbr_bits, width, height, gen1_video, encoder, delta_t) 133 | elif requested_encoder == 'polarity': 134 | encoded_array = encode_video_polarity(width, height, gen1_video, delta_t) 135 | else: 136 | encoded_array = encode_video_sae(width, height, gen1_video, delta_t) 137 | max_pixel_value = 255 138 | 139 | if save_encoding: 140 | np.save(dir_paths["enc"] + video_name + "_enc.npy", encoded_array) 141 | else: 142 | # Use the pre-evaluated encoded (tbe or else) array 143 | encoded_array = np.load(dir_paths["enc"] + video_name + "_enc.npy") 144 | 145 | # Iterate through video frames 146 | img_count = 0 147 | bbox_count = 0 148 | print("Saving encoded frames and bounding boxes...") 149 | for f in encoded_array: 150 | bboxes = get_frame_BB(f, gen1_bboxes) 151 | 152 | filename = video_name + str("_" + str(f["startTs"])) 153 | # Save images that have at least a bbox 154 | if len(bboxes) != 0: 155 | # Save image 156 | plt.imsave(dir_paths["images"] + "/" + filename + ".jpg", f['frame'], vmin=0, vmax=1, cmap='gray') 157 | 158 | # Update train or validation 
txt file (append if not existing) 159 | with open(dir_paths[txt_list_file], "r+") as list_txt_file: 160 | file_string = dir_paths["list"] + filename + ".jpg" 161 | for line in list_txt_file: 162 | # Search for image file path in this file 163 | if file_string in line: 164 | break 165 | else: # Note: this indentation is intentional 166 | # If entered, the string does not exist in this file 167 | # Append file path 168 | list_txt_file.write(file_string + "\n") 169 | 170 | # Write BBoxes in labels 171 | label_file = open(dir_paths["labels"] + "/" + filename + ".txt", "w") 172 | for b in bboxes: 173 | conv_bbox = convertBBoxCoords(b, width, height) 174 | label_file.write(str("%d" % conv_bbox[0]) + " ") 175 | label_file.write(str("%.8f" % conv_bbox[1]) + " ") 176 | label_file.write(str("%.8f" % conv_bbox[2]) + " ") 177 | label_file.write(str("%.8f" % conv_bbox[3]) + " ") 178 | label_file.write(str("%.8f" % conv_bbox[4]) + "\n") 179 | bbox_count += 1 180 | label_file.close() 181 | 182 | if save_path_requested: 183 | save_bb_image(f['frame'], bboxes, save_path + filename + "_bb.jpg") 184 | 185 | img_count += 1 186 | 187 | if show_video: 188 | show_image(f['frame'], bboxes, max_pixel_value) 189 | plt.pause(0.05) 190 | 191 | if export_all_frames_requested: 192 | save_bb_image(f['frame'], np.array([]), export_frames_path + filename + "_" + requested_encoder + ".jpg", False, max_pixel_value) 193 | 194 | print("Saved {:d} encoded frames in path: {:s}".format(img_count, dir_paths["images"])) 195 | print("Saved {:d} bounding boxes annotations in path: {:s}".format(bbox_count, dir_paths["labels"])) 196 | 197 | completed_file = open(dir_paths['completed'], "a") 198 | completed_file.write(video_name + "\n") 199 | completed_file.close() 200 | 201 | print("Done") 202 | -------------------------------------------------------------------------------- /src/rt_detection.py: -------------------------------------------------------------------------------- 1 | """ 2 | rt_detection.py: Real Time Detection, uses YOLOv3 implementation 3 | in order to detect objects on an GEN1 event video (.dat) using 4 | Temporal Binary, Polarity and Surface Active Events encodings 5 | """ 6 | 7 | from __future__ import division 8 | 9 | import os 10 | import sys 11 | import time 12 | import datetime 13 | import argparse 14 | 15 | from PIL import Image 16 | 17 | import torch 18 | import torchvision.transforms as transforms 19 | from torch.utils.data import DataLoader 20 | from torchvision import datasets 21 | from torch.autograd import Variable 22 | 23 | import matplotlib.pyplot as plt 24 | import matplotlib.patches as patches 25 | from matplotlib.ticker import NullLocator 26 | 27 | from encoders import * 28 | from ec_utils import * 29 | from tbe import TemporalBinaryEncoding 30 | 31 | import sys 32 | sys.path.insert(0, '../PyTorch-YOLOv3/') 33 | from models import * 34 | from utils.utils import * 35 | from utils.datasets import * 36 | from utils.augmentations import * 37 | from utils.transforms import * 38 | 39 | sys.path.insert(0, '../prophesee-automotive-dataset-toolbox/') 40 | from src.io.psee_loader import PSEELoader 41 | 42 | 43 | def rescaleAndHandleFrame(detections, 44 | frame, 45 | img_size, 46 | show_video, 47 | save_frames, 48 | batch_count, 49 | is_sae: bool = False): 50 | bbox_list = [] 51 | if detections[0] is not None: 52 | for d in detections: 53 | d = d.cpu() 54 | to_list = d.tolist() 55 | if len(to_list) != 0: 56 | bbox_list.append(to_list) 57 | 58 | bbox_list = np.array(bbox_list) 59 | bboxes = [] 60 | if 
len(bbox_list) != 0: 61 | # Rescale boxes to original image 62 | bbox_list = rescale_boxes(bbox_list[0], img_size, frame.shape) 63 | print(bbox_list) 64 | for x1, y1, x2, y2, conf, cls_conf, cls_pred in bbox_list: 65 | 66 | print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item())) 67 | 68 | box_w = x2 - x1 69 | box_h = y2 - y1 70 | bbox = [0, x1, y1, box_w, box_h, cls_pred] 71 | bboxes.append(bbox) 72 | 73 | if show_video > 0: 74 | show_image(frame, np.array(bboxes), 255 if is_sae else 1) 75 | plt.pause(0.001) 76 | 77 | if save_frames > 0: 78 | save_bb_image(frame, np.array(bboxes), output_path + "/" + str(batch_count) + ".png", False, 255 if is_sae else 1) 79 | 80 | 81 | def tbr_detection(gen1_video, 82 | tbr_bits, 83 | delta_t, 84 | output_path, 85 | show_video, 86 | save_frames, 87 | img_size, 88 | conf_thres, 89 | nms_thres): 90 | # Set up TBE vars 91 | accumulation_mat = np.zeros((tbr_bits, gen1_video.get_size()[0], gen1_video.get_size()[1])) 92 | tbe_frame = np.zeros(gen1_video.get_size()) 93 | encoder = TemporalBinaryEncoding(tbr_bits, gen1_video.get_size()[1], gen1_video.get_size()[0]) 94 | 95 | i = 0 96 | batch_count = 0 97 | prev_time = time.time() 98 | # Parse events and build TBE frames 99 | while not gen1_video.done: 100 | i = (i + 1) % tbr_bits 101 | # Load next 1ms events from the video 102 | events = gen1_video.load_delta_t(delta_t) 103 | f = np.zeros(gen1_video.get_size()) 104 | for e in events: 105 | # Evaluate presence/absence of event for 106 | # a certain pixel 107 | f[e['y'], e['x']] = 1 108 | 109 | accumulation_mat[i, ...] = f 110 | 111 | if i == tbr_bits - 1: 112 | # Encode frame 113 | tbe_frame = encoder.encode(accumulation_mat) 114 | 115 | transform = transforms.Compose([ 116 | ToTensor(), 117 | Resize(img_size) 118 | ]) 119 | 120 | # Implemented transformations expect bbox array. Use a fake array 121 | # @TODO: find a better solution... 
122 | input_img, bbox = transform([Image.fromarray(255 * tbe_frame).convert('RGB'), np.zeros((1,1))]) 123 | # Add batch size (1) 124 | input_img = torch.unsqueeze(input_img, 0) 125 | input_img = input_img.to(device) 126 | 127 | detect_prev_time = time.time() 128 | # Detect objects on TBE frame 129 | with torch.no_grad(): 130 | detections = model(input_img) 131 | detections = non_max_suppression(detections, conf_thres, nms_thres) 132 | detection_time = datetime.timedelta(seconds=time.time() - detect_prev_time) 133 | print("\t+ Detection Time: %s" % (detection_time)) 134 | 135 | # Log progress 136 | current_time = time.time() 137 | inference_time = datetime.timedelta(seconds=current_time - prev_time) 138 | prev_time = current_time 139 | print("\t+ Batch %d, Inference Time: %s" % (batch_count, inference_time)) 140 | batch_count += 1 141 | 142 | rescaleAndHandleFrame(detections, tbe_frame, img_size, show_video, save_frames, batch_count) 143 | 144 | 145 | def polarity_detection(gen1_video, 146 | delta_t, 147 | output_path, 148 | show_video, 149 | save_frames, 150 | img_size, 151 | conf_thres, 152 | nms_thres): 153 | batch_count = 0 154 | prev_time = time.time() 155 | # Parse events and build Polarity frames 156 | while not gen1_video.done: 157 | # Load next events from the video 158 | events = gen1_video.load_delta_t(delta_t) 159 | p_frame = np.full(gen1_video.get_size(), 0.5) 160 | for e in events: 161 | # Evaluate polarity of an event 162 | # for a certain pixel 163 | if e['p'] == 1: 164 | p_frame[e['y'], e['x']] = 1 165 | else: 166 | p_frame[e['y'], e['x']] = 0 167 | 168 | transform = transforms.Compose([ 169 | ToTensor(), 170 | Resize(img_size) 171 | ]) 172 | 173 | # Implemented transformations expect bbox array. Use a fake array 174 | # @TODO: find a better solution... 
175 | input_img, bbox = transform([Image.fromarray(255 * p_frame).convert('RGB'), np.zeros((1,1))]) 176 | # Add batch size (1) 177 | input_img = torch.unsqueeze(input_img, 0) 178 | input_img = input_img.to(device) 179 | 180 | detect_prev_time = time.time() 181 | # Detect objects on Polarity frame 182 | with torch.no_grad(): 183 | detections = model(input_img) 184 | detections = non_max_suppression(detections, conf_thres, nms_thres) 185 | detection_time = datetime.timedelta(seconds=time.time() - detect_prev_time) 186 | print("\t+ Detection Time: %s" % (detection_time)) 187 | 188 | # Log progress 189 | current_time = time.time() 190 | inference_time = datetime.timedelta(seconds=current_time - prev_time) 191 | prev_time = current_time 192 | print("\t+ Batch %d, Inference Time: %s" % (batch_count, inference_time)) 193 | batch_count += 1 194 | 195 | rescaleAndHandleFrame(detections, p_frame, img_size, show_video, save_frames, batch_count) 196 | 197 | 198 | def sae_detection(gen1_video, 199 | delta_t, 200 | output_path, 201 | show_video, 202 | save_frames, 203 | img_size, 204 | conf_thres, 205 | nms_thres): 206 | batch_count = 0 207 | prev_time = time.time() 208 | startTimestamp = 0 # microseconds 209 | # Parse events and build SAE frames 210 | while not gen1_video.done: 211 | # Load next events from the video 212 | events = gen1_video.load_delta_t(delta_t) 213 | sae_frame = np.zeros(gen1_video.get_size()) 214 | for e in events: 215 | # Evaluate sae of an event 216 | # for a certain pixel 217 | t_p = e['t'] # microseconds 218 | t_0 = startTimestamp # microseconds 219 | sae_frame[e['y'], e['x']] = 255 * ((t_p - t_0) / delta_t) 220 | 221 | startTimestamp += delta_t 222 | 223 | transform = transforms.Compose([ 224 | ToTensor(), 225 | Resize(img_size) 226 | ]) 227 | 228 | # Implemented transformations expect bbox array. Use a fake array 229 | # @TODO: find a better solution... 230 | input_img, bbox = transform([Image.fromarray(sae_frame).convert('RGB'), np.zeros((1,1))]) 231 | # Add batch size (1) 232 | input_img = torch.unsqueeze(input_img, 0) 233 | input_img = input_img.to(device) 234 | 235 | detect_prev_time = time.time() 236 | # Detect objects on SAE frame 237 | with torch.no_grad(): 238 | detections = model(input_img) 239 | detections = non_max_suppression(detections, conf_thres, nms_thres) 240 | detection_time = datetime.timedelta(seconds=time.time() - detect_prev_time) 241 | print("\t+ Detection Time: %s" % (detection_time)) 242 | 243 | # Log progress 244 | current_time = time.time() 245 | inference_time = datetime.timedelta(seconds=current_time - prev_time) 246 | prev_time = current_time 247 | print("\t+ Batch %d, Inference Time: %s" % (batch_count, inference_time)) 248 | batch_count += 1 249 | 250 | rescaleAndHandleFrame(detections, sae_frame, img_size, show_video, save_frames, batch_count, True) 251 | 252 | 253 | if __name__ == "__main__": 254 | parser = argparse.ArgumentParser() 255 | parser.add_argument("--encoder", type=str, default='tbr', 256 | help="encoder: encode method ") 257 | parser.add_argument("--event_video", type=str, default="../data/event.dat", 258 | help="event_video: path to event video (.dat)") 259 | parser.add_argument('--tbr_bits', '-n', type=int, default=8, 260 | help='tbr_bits: set the number of bits for Temporal Binary Representation. Default: 8') 261 | parser.add_argument('--accumulation_time', '-a', type=int, default=2500, 262 | help='accumulation_time: set the quantization time of events (microseconds). 
Default: 2500') 263 | parser.add_argument("--model_def", type=str, default="../PyTorch-YOLOv3/config/yolov3.cfg", 264 | help="model_def: path to model definition file") 265 | parser.add_argument("--weights_path", type=str, default="../PyTorch-YOLOv3/weights/yolov3.weights", 266 | help="weights_path: path to weights file") 267 | parser.add_argument("--class_path", type=str, default="../data/classes.names", 268 | help="class_path: path to class label file") 269 | parser.add_argument("--conf_thres", type=float, default=0.8, 270 | help="conf_thres: object confidence threshold") 271 | parser.add_argument("--nms_thres", type=float, default=0.4, 272 | help="nms_thres: iou thresshold for non-maximum suppression") 273 | parser.add_argument("--batch_size", type=int, default=1, 274 | help="batch_size: size of the batches") 275 | parser.add_argument("--n_cpu", type=int, default=0, 276 | help="m_cpu: number of cpu threads to use during batch generation") 277 | parser.add_argument("--img_size", type=int, default=416, 278 | help="img_size: size of each image dimension") 279 | parser.add_argument('--show_video', action='count', default=0, 280 | help='show_video: show video with evaluated TBR frames and their bboxes during processing. Default: false') 281 | parser.add_argument('--save_frames', action='count', default=0, 282 | help='save_frames: save TBE frames and their detection as images') 283 | 284 | opt = parser.parse_args() 285 | print(opt) 286 | 287 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 288 | 289 | output_path = "../rt_detections" 290 | if opt.save_frames > 0: 291 | os.makedirs(output_path, exist_ok=True) 292 | 293 | # Set up model 294 | model = Darknet(opt.model_def, img_size=opt.img_size).to(device) 295 | 296 | if opt.weights_path.endswith(".weights"): 297 | # Load darknet weights 298 | model.load_darknet_weights(opt.weights_path) 299 | else: 300 | # Load checkpoint weights 301 | model.load_state_dict(torch.load(opt.weights_path)) 302 | 303 | model.eval() # Set in evaluation mode 304 | 305 | # Encoder 306 | encoder = opt.encoder 307 | if encoder != "tbr" and encoder != "polarity" and encoder != "sae": 308 | print("Invalid encoder specified. Available encoders: . 
Exiting...") 309 | exit() 310 | 311 | # Number of bits to be used in Temporal Binary Encoding 312 | tbr_bits = opt.tbr_bits 313 | 314 | # Accumulation time (microseconds) 315 | delta_t = opt.accumulation_time 316 | 317 | gen1_video = PSEELoader(opt.event_video) 318 | 319 | classes = load_classes(opt.class_path) # Extracts class labels from file 320 | 321 | if encoder == "tbr": 322 | tbr_detection(gen1_video, 323 | tbr_bits, 324 | delta_t, 325 | output_path, 326 | opt.show_video, 327 | opt.save_frames, 328 | opt.img_size, 329 | opt.conf_thres, 330 | opt.nms_thres) 331 | elif encoder == "polarity": 332 | polarity_detection(gen1_video, 333 | delta_t, 334 | output_path, 335 | opt.show_video, 336 | opt.save_frames, 337 | opt.img_size, 338 | opt.conf_thres, 339 | opt.nms_thres) 340 | else: 341 | sae_detection(gen1_video, 342 | delta_t, 343 | output_path, 344 | opt.show_video, 345 | opt.save_frames, 346 | opt.img_size, 347 | opt.conf_thres, 348 | opt.nms_thres) 349 | 350 | -------------------------------------------------------------------------------- /src/tbe.py: -------------------------------------------------------------------------------- 1 | """ 2 | tbe.py: class that manages the Temporal Binary Encoding 3 | """ 4 | 5 | import numpy as np 6 | 7 | class TemporalBinaryEncoding: 8 | """ 9 | @brief: Class for Temporal Binary Encoding using N bits 10 | """ 11 | 12 | def __init__(self, N: int, width: int, height: int): 13 | self.N = N 14 | self.width = width 15 | self.height = height 16 | self._mask = np.ones((self.N, self.height, self.width)) 17 | 18 | # Build the mask 19 | for i in range(N): 20 | self._mask[i, :, :] = 2 ** i 21 | 22 | def encode(self, mat: np.array) -> np.array: 23 | """ 24 | @brief: Encode events using binary encoding 25 | @param: mat 26 | @return: Encoded frame 27 | """ 28 | 29 | frame = np.sum((mat * self._mask), 0) / (2 ** self.N) 30 | return frame -------------------------------------------------------------------------------- /src/test_gen1.py: -------------------------------------------------------------------------------- 1 | """ 2 | test_gen1.py: Fork of the YOLOv3 implementation. 3 | This script will also output a GEN1 compliant 4 | npy array for each test event in order to use 5 | the prophesee COCO evaluation. 
6 | """ 7 | 8 | from __future__ import division 9 | 10 | import sys 11 | sys.path.insert(0, '../PyTorch-YOLOv3/') 12 | from models import * 13 | from utils.utils import * 14 | from utils.datasets import * 15 | from utils.augmentations import * 16 | from utils.transforms import * 17 | from utils.parse_config import * 18 | 19 | import os 20 | import sys 21 | import time 22 | import datetime 23 | import argparse 24 | import tqdm 25 | 26 | import torch 27 | from torch.utils.data import DataLoader 28 | from torchvision import datasets 29 | from torchvision import transforms 30 | from torch.autograd import Variable 31 | import torch.optim as optim 32 | 33 | 34 | def extract_bboxes(tensor, timestamp, img_size): 35 | bboxes = [] 36 | if tensor is None: 37 | return bboxes 38 | 39 | gen1_img_width = 304 40 | gen1_img_height = 240 41 | clone_tensor = tensor.clone() 42 | clone_tensor = clone_tensor.numpy() 43 | # Rescale boxes to original image 44 | bbox_list = rescale_boxes(clone_tensor, img_size, (gen1_img_height, gen1_img_width)) 45 | for b in bbox_list: 46 | x1 = b[0] 47 | y1 = b[1] 48 | x2 = b[2] 49 | y2 = b[3] 50 | 51 | w = x2 - x1 52 | h = y2 - y1 53 | 54 | pred_cls = int(b[6]) 55 | conf = float(b[5]) 56 | 57 | bbox = [timestamp, int(x1), int(y1), int(w), int(h), pred_cls, conf, 0] 58 | bboxes.append(tuple(bbox)) 59 | 60 | return bboxes 61 | 62 | def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size, gen1_output, acc_time): 63 | model.eval() 64 | 65 | # Get dataloader 66 | dataset = ListDataset(path, 67 | img_size=img_size, 68 | multiscale=False, 69 | transform=DEFAULT_TRANSFORMS) 70 | dataloader = torch.utils.data.DataLoader( 71 | dataset, 72 | batch_size=batch_size, 73 | shuffle=False, 74 | num_workers=1, 75 | collate_fn=dataset.collate_fn 76 | ) 77 | 78 | Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor 79 | 80 | labels = [] 81 | sample_metrics = [] # List of tuples (TP, confs, pred) 82 | event_npy = [] 83 | gen1_data_type= np.dtype([('ts', ' 5 | 6 | src_folder="$1" 7 | new_path="$2" 8 | 9 | src_relative_path="$src_folder"/data/custom 10 | dest_relative_path="$new_path"/data/custom/images 11 | train_txt_file="$src_relative_path"/train.txt 12 | valid_txt_file="$src_relative_path"/valid.txt 13 | test_txt_file="$src_relative_path"/test.txt 14 | 15 | check_txt_files() { 16 | local txt_file_path="$1" 17 | if [ ! -f "$txt_file_path" ] 18 | then 19 | echo "File $txt_file_path does not exists, exiting..." 20 | exit 1 21 | fi 22 | } 23 | 24 | change_paths() { 25 | local tmp="$1"/swp.txt 26 | local src="$2" 27 | 28 | touch "$tmp" 29 | 30 | while read line; do 31 | filename=$(basename "$line") 32 | new_line="$dest_relative_path"/"$filename" 33 | echo "$new_line" >> "$tmp" 34 | done < "$src" 35 | 36 | mv "$tmp" "$src" 37 | } 38 | 39 | check_txt_files "$train_txt_file" 40 | check_txt_files "$valid_txt_file" 41 | check_txt_files "$test_txt_file" 42 | change_paths "$src_relative_path" "$train_txt_file" 43 | change_paths "$src_relative_path" "$valid_txt_file" 44 | change_paths "$src_relative_path" "$test_txt_file" 45 | -------------------------------------------------------------------------------- /tools/get_bbox_classes.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Get the number of bboxes saved for each class. 
3 | # Usage: /bin/bash get_bbox_classes.sh 4 | 5 | pedestrians=0 6 | vehicles=0 7 | images=0 8 | 9 | readarray -t a < "$1" 10 | 11 | for txt in "${a[@]}" 12 | do 13 | images=$((images+1)) 14 | filename=$(basename "$txt") 15 | filename="${filename%.*}.txt" 16 | while read line; do 17 | # Read each label line 18 | class=$(echo $line | head -n1 | awk '{print $1;}') 19 | if [ "$class" == 1 ] 20 | then 21 | pedestrians=$((pedestrians+1)) 22 | elif [ "$class" == 0 ] 23 | then 24 | vehicles=$((vehicles+1)) 25 | fi 26 | done < "$2"/"$filename" 27 | done 28 | 29 | echo "Images: $images" 30 | echo "Vehicles: $vehicles" 31 | echo "Pedestrians: $pedestrians" -------------------------------------------------------------------------------- /tools/get_gen1_bboxes.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy as np 4 | 5 | def printGen1BBoxes(directory: str): 6 | """ 7 | @brief: Print GEN1 bboxes (.npy files) 8 | @param: directory - Directory where GEN1 .npy files are stored 9 | """ 10 | 11 | file_list_npy = [file for file in os.listdir(directory) if os.path.isfile(os.path.join(directory, file)) and 12 | os.path.splitext(os.path.join(directory, file))[1] == '.npy'] 13 | for npy_arr in file_list_npy: 14 | pedestrian_bboxes = 0 15 | vehicle_bboxes = 0 16 | 17 | gen1_bboxes = np.load(directory + "/" + npy_arr) 18 | for bbox in gen1_bboxes: 19 | if bbox[5] == 1: 20 | pedestrian_bboxes += 1 21 | else: 22 | vehicle_bboxes += 1 23 | 24 | print("==================================") 25 | print("Filename: " + npy_arr) 26 | print("Pedestrian bboxes: " + str(pedestrian_bboxes)) 27 | print("Vehicle bboxes: " + str(vehicle_bboxes)) 28 | 29 | 30 | if __name__ == "__main__": 31 | if len(sys.argv) < 2: 32 | print("Usage: python get_gen1_bboxes.py path") 33 | exit() 34 | 35 | printGen1BBoxes(str(sys.argv[1])) -------------------------------------------------------------------------------- /yolo_config/gen1-test.data: -------------------------------------------------------------------------------- 1 | classes= 2 2 | valid=/path/to/dataset/data/custom/test.txt 3 | names=/path/to/dataset/data/custom/classes.names 4 | eval=coco 5 | -------------------------------------------------------------------------------- /yolo_config/gen1.data: -------------------------------------------------------------------------------- 1 | classes=2 2 | train=/path/to/dataset/data/custom/train.txt 3 | valid=/path/to/dataset/data/custom/valid.txt 4 | names=/path/to/dataset/data/custom/classes.names 5 | -------------------------------------------------------------------------------- /yolo_config/yolov3-gen1.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Training 3 | batch=64 4 | subdivisions=8 5 | width=320 6 | height=320 7 | channels=3 8 | momentum=0.9 9 | decay=0.0005 10 | angle=0 11 | saturation = 1.5 12 | exposure = 1.5 13 | hue=.1 14 | 15 | learning_rate=0.001 16 | burn_in=1000 17 | max_batches = 500200 18 | policy=steps 19 | steps=400000,450000 20 | scales=.1,.1 21 | 22 | [convolutional] 23 | batch_normalize=1 24 | filters=32 25 | size=3 26 | stride=1 27 | pad=1 28 | activation=leaky 29 | 30 | # Downsample 31 | 32 | [convolutional] 33 | batch_normalize=1 34 | filters=64 35 | size=3 36 | stride=2 37 | pad=1 38 | activation=leaky 39 | 40 | [convolutional] 41 | batch_normalize=1 42 | filters=32 43 | size=1 44 | stride=1 45 | pad=1 46 | activation=leaky 47 | 48 | [convolutional] 49 | batch_normalize=1 50 | 
filters=64 51 | size=3 52 | stride=1 53 | pad=1 54 | activation=leaky 55 | 56 | [shortcut] 57 | from=-3 58 | activation=linear 59 | 60 | # Downsample 61 | 62 | [convolutional] 63 | batch_normalize=1 64 | filters=128 65 | size=3 66 | stride=2 67 | pad=1 68 | activation=leaky 69 | 70 | [convolutional] 71 | batch_normalize=1 72 | filters=64 73 | size=1 74 | stride=1 75 | pad=1 76 | activation=leaky 77 | 78 | [convolutional] 79 | batch_normalize=1 80 | filters=128 81 | size=3 82 | stride=1 83 | pad=1 84 | activation=leaky 85 | 86 | [shortcut] 87 | from=-3 88 | activation=linear 89 | 90 | [convolutional] 91 | batch_normalize=1 92 | filters=64 93 | size=1 94 | stride=1 95 | pad=1 96 | activation=leaky 97 | 98 | [convolutional] 99 | batch_normalize=1 100 | filters=128 101 | size=3 102 | stride=1 103 | pad=1 104 | activation=leaky 105 | 106 | [shortcut] 107 | from=-3 108 | activation=linear 109 | 110 | # Downsample 111 | 112 | [convolutional] 113 | batch_normalize=1 114 | filters=256 115 | size=3 116 | stride=2 117 | pad=1 118 | activation=leaky 119 | 120 | [convolutional] 121 | batch_normalize=1 122 | filters=128 123 | size=1 124 | stride=1 125 | pad=1 126 | activation=leaky 127 | 128 | [convolutional] 129 | batch_normalize=1 130 | filters=256 131 | size=3 132 | stride=1 133 | pad=1 134 | activation=leaky 135 | 136 | [shortcut] 137 | from=-3 138 | activation=linear 139 | 140 | [convolutional] 141 | batch_normalize=1 142 | filters=128 143 | size=1 144 | stride=1 145 | pad=1 146 | activation=leaky 147 | 148 | [convolutional] 149 | batch_normalize=1 150 | filters=256 151 | size=3 152 | stride=1 153 | pad=1 154 | activation=leaky 155 | 156 | [shortcut] 157 | from=-3 158 | activation=linear 159 | 160 | [convolutional] 161 | batch_normalize=1 162 | filters=128 163 | size=1 164 | stride=1 165 | pad=1 166 | activation=leaky 167 | 168 | [convolutional] 169 | batch_normalize=1 170 | filters=256 171 | size=3 172 | stride=1 173 | pad=1 174 | activation=leaky 175 | 176 | [shortcut] 177 | from=-3 178 | activation=linear 179 | 180 | [convolutional] 181 | batch_normalize=1 182 | filters=128 183 | size=1 184 | stride=1 185 | pad=1 186 | activation=leaky 187 | 188 | [convolutional] 189 | batch_normalize=1 190 | filters=256 191 | size=3 192 | stride=1 193 | pad=1 194 | activation=leaky 195 | 196 | [shortcut] 197 | from=-3 198 | activation=linear 199 | 200 | 201 | [convolutional] 202 | batch_normalize=1 203 | filters=128 204 | size=1 205 | stride=1 206 | pad=1 207 | activation=leaky 208 | 209 | [convolutional] 210 | batch_normalize=1 211 | filters=256 212 | size=3 213 | stride=1 214 | pad=1 215 | activation=leaky 216 | 217 | [shortcut] 218 | from=-3 219 | activation=linear 220 | 221 | [convolutional] 222 | batch_normalize=1 223 | filters=128 224 | size=1 225 | stride=1 226 | pad=1 227 | activation=leaky 228 | 229 | [convolutional] 230 | batch_normalize=1 231 | filters=256 232 | size=3 233 | stride=1 234 | pad=1 235 | activation=leaky 236 | 237 | [shortcut] 238 | from=-3 239 | activation=linear 240 | 241 | [convolutional] 242 | batch_normalize=1 243 | filters=128 244 | size=1 245 | stride=1 246 | pad=1 247 | activation=leaky 248 | 249 | [convolutional] 250 | batch_normalize=1 251 | filters=256 252 | size=3 253 | stride=1 254 | pad=1 255 | activation=leaky 256 | 257 | [shortcut] 258 | from=-3 259 | activation=linear 260 | 261 | [convolutional] 262 | batch_normalize=1 263 | filters=128 264 | size=1 265 | stride=1 266 | pad=1 267 | activation=leaky 268 | 269 | [convolutional] 270 | batch_normalize=1 271 | filters=256 
272 | size=3 273 | stride=1 274 | pad=1 275 | activation=leaky 276 | 277 | [shortcut] 278 | from=-3 279 | activation=linear 280 | 281 | # Downsample 282 | 283 | [convolutional] 284 | batch_normalize=1 285 | filters=512 286 | size=3 287 | stride=2 288 | pad=1 289 | activation=leaky 290 | 291 | [convolutional] 292 | batch_normalize=1 293 | filters=256 294 | size=1 295 | stride=1 296 | pad=1 297 | activation=leaky 298 | 299 | [convolutional] 300 | batch_normalize=1 301 | filters=512 302 | size=3 303 | stride=1 304 | pad=1 305 | activation=leaky 306 | 307 | [shortcut] 308 | from=-3 309 | activation=linear 310 | 311 | 312 | [convolutional] 313 | batch_normalize=1 314 | filters=256 315 | size=1 316 | stride=1 317 | pad=1 318 | activation=leaky 319 | 320 | [convolutional] 321 | batch_normalize=1 322 | filters=512 323 | size=3 324 | stride=1 325 | pad=1 326 | activation=leaky 327 | 328 | [shortcut] 329 | from=-3 330 | activation=linear 331 | 332 | 333 | [convolutional] 334 | batch_normalize=1 335 | filters=256 336 | size=1 337 | stride=1 338 | pad=1 339 | activation=leaky 340 | 341 | [convolutional] 342 | batch_normalize=1 343 | filters=512 344 | size=3 345 | stride=1 346 | pad=1 347 | activation=leaky 348 | 349 | [shortcut] 350 | from=-3 351 | activation=linear 352 | 353 | 354 | [convolutional] 355 | batch_normalize=1 356 | filters=256 357 | size=1 358 | stride=1 359 | pad=1 360 | activation=leaky 361 | 362 | [convolutional] 363 | batch_normalize=1 364 | filters=512 365 | size=3 366 | stride=1 367 | pad=1 368 | activation=leaky 369 | 370 | [shortcut] 371 | from=-3 372 | activation=linear 373 | 374 | [convolutional] 375 | batch_normalize=1 376 | filters=256 377 | size=1 378 | stride=1 379 | pad=1 380 | activation=leaky 381 | 382 | [convolutional] 383 | batch_normalize=1 384 | filters=512 385 | size=3 386 | stride=1 387 | pad=1 388 | activation=leaky 389 | 390 | [shortcut] 391 | from=-3 392 | activation=linear 393 | 394 | 395 | [convolutional] 396 | batch_normalize=1 397 | filters=256 398 | size=1 399 | stride=1 400 | pad=1 401 | activation=leaky 402 | 403 | [convolutional] 404 | batch_normalize=1 405 | filters=512 406 | size=3 407 | stride=1 408 | pad=1 409 | activation=leaky 410 | 411 | [shortcut] 412 | from=-3 413 | activation=linear 414 | 415 | 416 | [convolutional] 417 | batch_normalize=1 418 | filters=256 419 | size=1 420 | stride=1 421 | pad=1 422 | activation=leaky 423 | 424 | [convolutional] 425 | batch_normalize=1 426 | filters=512 427 | size=3 428 | stride=1 429 | pad=1 430 | activation=leaky 431 | 432 | [shortcut] 433 | from=-3 434 | activation=linear 435 | 436 | [convolutional] 437 | batch_normalize=1 438 | filters=256 439 | size=1 440 | stride=1 441 | pad=1 442 | activation=leaky 443 | 444 | [convolutional] 445 | batch_normalize=1 446 | filters=512 447 | size=3 448 | stride=1 449 | pad=1 450 | activation=leaky 451 | 452 | [shortcut] 453 | from=-3 454 | activation=linear 455 | 456 | # Downsample 457 | 458 | [convolutional] 459 | batch_normalize=1 460 | filters=1024 461 | size=3 462 | stride=2 463 | pad=1 464 | activation=leaky 465 | 466 | [convolutional] 467 | batch_normalize=1 468 | filters=512 469 | size=1 470 | stride=1 471 | pad=1 472 | activation=leaky 473 | 474 | [convolutional] 475 | batch_normalize=1 476 | filters=1024 477 | size=3 478 | stride=1 479 | pad=1 480 | activation=leaky 481 | 482 | [shortcut] 483 | from=-3 484 | activation=linear 485 | 486 | [convolutional] 487 | batch_normalize=1 488 | filters=512 489 | size=1 490 | stride=1 491 | pad=1 492 | activation=leaky 493 | 
494 | [convolutional] 495 | batch_normalize=1 496 | filters=1024 497 | size=3 498 | stride=1 499 | pad=1 500 | activation=leaky 501 | 502 | [shortcut] 503 | from=-3 504 | activation=linear 505 | 506 | [convolutional] 507 | batch_normalize=1 508 | filters=512 509 | size=1 510 | stride=1 511 | pad=1 512 | activation=leaky 513 | 514 | [convolutional] 515 | batch_normalize=1 516 | filters=1024 517 | size=3 518 | stride=1 519 | pad=1 520 | activation=leaky 521 | 522 | [shortcut] 523 | from=-3 524 | activation=linear 525 | 526 | [convolutional] 527 | batch_normalize=1 528 | filters=512 529 | size=1 530 | stride=1 531 | pad=1 532 | activation=leaky 533 | 534 | [convolutional] 535 | batch_normalize=1 536 | filters=1024 537 | size=3 538 | stride=1 539 | pad=1 540 | activation=leaky 541 | 542 | [shortcut] 543 | from=-3 544 | activation=linear 545 | 546 | ###################### 547 | 548 | [convolutional] 549 | batch_normalize=1 550 | filters=512 551 | size=1 552 | stride=1 553 | pad=1 554 | activation=leaky 555 | 556 | [convolutional] 557 | batch_normalize=1 558 | size=3 559 | stride=1 560 | pad=1 561 | filters=1024 562 | activation=leaky 563 | 564 | [convolutional] 565 | batch_normalize=1 566 | filters=512 567 | size=1 568 | stride=1 569 | pad=1 570 | activation=leaky 571 | 572 | [convolutional] 573 | batch_normalize=1 574 | size=3 575 | stride=1 576 | pad=1 577 | filters=1024 578 | activation=leaky 579 | 580 | [convolutional] 581 | batch_normalize=1 582 | filters=512 583 | size=1 584 | stride=1 585 | pad=1 586 | activation=leaky 587 | 588 | [convolutional] 589 | batch_normalize=1 590 | size=3 591 | stride=1 592 | pad=1 593 | filters=1024 594 | activation=leaky 595 | 596 | [convolutional] 597 | size=1 598 | stride=1 599 | pad=1 600 | filters=255 601 | activation=linear 602 | 603 | 604 | [yolo] 605 | mask = 6,7,8 606 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 607 | classes=80 608 | num=9 609 | jitter=.3 610 | ignore_thresh = .7 611 | truth_thresh = 1 612 | random=1 613 | 614 | 615 | [route] 616 | layers = -4 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=256 621 | size=1 622 | stride=1 623 | pad=1 624 | activation=leaky 625 | 626 | [upsample] 627 | stride=2 628 | 629 | [route] 630 | layers = -1, 61 631 | 632 | 633 | 634 | [convolutional] 635 | batch_normalize=1 636 | filters=256 637 | size=1 638 | stride=1 639 | pad=1 640 | activation=leaky 641 | 642 | [convolutional] 643 | batch_normalize=1 644 | size=3 645 | stride=1 646 | pad=1 647 | filters=512 648 | activation=leaky 649 | 650 | [convolutional] 651 | batch_normalize=1 652 | filters=256 653 | size=1 654 | stride=1 655 | pad=1 656 | activation=leaky 657 | 658 | [convolutional] 659 | batch_normalize=1 660 | size=3 661 | stride=1 662 | pad=1 663 | filters=512 664 | activation=leaky 665 | 666 | [convolutional] 667 | batch_normalize=1 668 | filters=256 669 | size=1 670 | stride=1 671 | pad=1 672 | activation=leaky 673 | 674 | [convolutional] 675 | batch_normalize=1 676 | size=3 677 | stride=1 678 | pad=1 679 | filters=512 680 | activation=leaky 681 | 682 | [convolutional] 683 | size=1 684 | stride=1 685 | pad=1 686 | filters=255 687 | activation=linear 688 | 689 | 690 | [yolo] 691 | mask = 3,4,5 692 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 693 | classes=80 694 | num=9 695 | jitter=.3 696 | ignore_thresh = .7 697 | truth_thresh = 1 698 | random=1 699 | 700 | 701 | 702 | [route] 703 | layers = -4 704 | 705 | [convolutional] 706 | batch_normalize=1 707 | 
filters=128 708 | size=1 709 | stride=1 710 | pad=1 711 | activation=leaky 712 | 713 | [upsample] 714 | stride=2 715 | 716 | [route] 717 | layers = -1, 36 718 | 719 | 720 | 721 | [convolutional] 722 | batch_normalize=1 723 | filters=128 724 | size=1 725 | stride=1 726 | pad=1 727 | activation=leaky 728 | 729 | [convolutional] 730 | batch_normalize=1 731 | size=3 732 | stride=1 733 | pad=1 734 | filters=256 735 | activation=leaky 736 | 737 | [convolutional] 738 | batch_normalize=1 739 | filters=128 740 | size=1 741 | stride=1 742 | pad=1 743 | activation=leaky 744 | 745 | [convolutional] 746 | batch_normalize=1 747 | size=3 748 | stride=1 749 | pad=1 750 | filters=256 751 | activation=leaky 752 | 753 | [convolutional] 754 | batch_normalize=1 755 | filters=128 756 | size=1 757 | stride=1 758 | pad=1 759 | activation=leaky 760 | 761 | [convolutional] 762 | batch_normalize=1 763 | size=3 764 | stride=1 765 | pad=1 766 | filters=256 767 | activation=leaky 768 | 769 | [convolutional] 770 | size=1 771 | stride=1 772 | pad=1 773 | filters=255 774 | activation=linear 775 | 776 | 777 | [yolo] 778 | mask = 0,1,2 779 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 780 | classes=80 781 | num=9 782 | jitter=.3 783 | ignore_thresh = .7 784 | truth_thresh = 1 785 | random=1 786 | -------------------------------------------------------------------------------- /yolo_config/yolov3-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Training 3 | batch=64 4 | subdivisions=8 5 | width=320 6 | height=320 7 | channels=3 8 | momentum=0.9 9 | decay=0.0005 10 | angle=0 11 | saturation = 1.5 12 | exposure = 1.5 13 | hue=.1 14 | 15 | learning_rate=0.001 16 | burn_in=1000 17 | max_batches = 500200 18 | policy=steps 19 | steps=400000,450000 20 | scales=.1,.1 21 | 22 | # 0 23 | [convolutional] 24 | batch_normalize=1 25 | filters=16 26 | size=3 27 | stride=1 28 | pad=1 29 | activation=leaky 30 | 31 | # 1 32 | [maxpool] 33 | size=2 34 | stride=2 35 | 36 | # 2 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | # 3 46 | [maxpool] 47 | size=2 48 | stride=2 49 | 50 | # 4 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | # 5 60 | [maxpool] 61 | size=2 62 | stride=2 63 | 64 | # 6 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=1 70 | pad=1 71 | activation=leaky 72 | 73 | # 7 74 | [maxpool] 75 | size=2 76 | stride=2 77 | 78 | # 8 79 | [convolutional] 80 | batch_normalize=1 81 | filters=256 82 | size=3 83 | stride=1 84 | pad=1 85 | activation=leaky 86 | 87 | # 9 88 | [maxpool] 89 | size=2 90 | stride=2 91 | 92 | # 10 93 | [convolutional] 94 | batch_normalize=1 95 | filters=512 96 | size=3 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | # 11 102 | [maxpool] 103 | size=2 104 | stride=1 105 | 106 | # 12 107 | [convolutional] 108 | batch_normalize=1 109 | filters=1024 110 | size=3 111 | stride=1 112 | pad=1 113 | activation=leaky 114 | 115 | ########### 116 | 117 | # 13 118 | [convolutional] 119 | batch_normalize=1 120 | filters=256 121 | size=1 122 | stride=1 123 | pad=1 124 | activation=leaky 125 | 126 | # 14 127 | [convolutional] 128 | batch_normalize=1 129 | filters=512 130 | size=3 131 | stride=1 132 | pad=1 133 | activation=leaky 134 | 135 | # 15 136 | [convolutional] 137 | size=1 138 | stride=1 139 | pad=1 140 | filters=255 141 | 
activation=linear 142 | 143 | 144 | 145 | # 16 146 | [yolo] 147 | mask = 3,4,5 148 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 149 | classes=80 150 | num=6 151 | jitter=.3 152 | ignore_thresh = .7 153 | truth_thresh = 1 154 | random=1 155 | 156 | # 17 157 | [route] 158 | layers = -4 159 | 160 | # 18 161 | [convolutional] 162 | batch_normalize=1 163 | filters=128 164 | size=1 165 | stride=1 166 | pad=1 167 | activation=leaky 168 | 169 | # 19 170 | [upsample] 171 | stride=2 172 | 173 | # 20 174 | [route] 175 | layers = -1, 8 176 | 177 | # 21 178 | [convolutional] 179 | batch_normalize=1 180 | filters=256 181 | size=3 182 | stride=1 183 | pad=1 184 | activation=leaky 185 | 186 | # 22 187 | [convolutional] 188 | size=1 189 | stride=1 190 | pad=1 191 | filters=255 192 | activation=linear 193 | 194 | # 23 195 | [yolo] 196 | mask = 1,2,3 197 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 198 | classes=80 199 | num=6 200 | jitter=.3 201 | ignore_thresh = .7 202 | truth_thresh = 1 203 | random=1 204 | --------------------------------------------------------------------------------