├── demo_dataset
│   ├── annotations.txt
│   ├── jumping
│   │   ├── 0001
│   │   │   ├── img_00001.jpg
│   │   │   ├── img_00002.jpg
│   │   │   ├── img_00003.jpg
│   │   │   ├── img_00004.jpg
│   │   │   ├── img_00005.jpg
│   │   │   ├── img_00006.jpg
│   │   │   ├── img_00007.jpg
│   │   │   ├── img_00008.jpg
│   │   │   ├── img_00009.jpg
│   │   │   ├── img_00010.jpg
│   │   │   ├── img_00011.jpg
│   │   │   ├── img_00012.jpg
│   │   │   ├── img_00013.jpg
│   │   │   ├── img_00014.jpg
│   │   │   ├── img_00015.jpg
│   │   │   ├── img_00016.jpg
│   │   │   └── img_00017.jpg
│   │   └── 0002
│   │       ├── img_00001.jpg
│   │       ├── img_00002.jpg
│   │       ├── img_00003.jpg
│   │       ├── img_00004.jpg
│   │       ├── img_00005.jpg
│   │       ├── img_00006.jpg
│   │       ├── img_00007.jpg
│   │       ├── img_00008.jpg
│   │       ├── img_00009.jpg
│   │       ├── img_00010.jpg
│   │       ├── img_00011.jpg
│   │       ├── img_00012.jpg
│   │       ├── img_00013.jpg
│   │       ├── img_00014.jpg
│   │       ├── img_00015.jpg
│   │       ├── img_00016.jpg
│   │       ├── img_00017.jpg
│   │       └── img_00018.jpg
│   └── running
│       ├── 0001
│       │   ├── img_00001.jpg
│       │   ├── img_00002.jpg
│       │   ├── img_00003.jpg
│       │   ├── img_00004.jpg
│       │   ├── img_00005.jpg
│       │   ├── img_00006.jpg
│       │   ├── img_00007.jpg
│       │   ├── img_00008.jpg
│       │   ├── img_00009.jpg
│       │   ├── img_00010.jpg
│       │   ├── img_00011.jpg
│       │   ├── img_00012.jpg
│       │   ├── img_00013.jpg
│       │   ├── img_00014.jpg
│       │   └── img_00015.jpg
│       └── 0002
│           ├── img_00001.jpg
│           ├── img_00002.jpg
│           ├── img_00003.jpg
│           ├── img_00004.jpg
│           ├── img_00005.jpg
│           ├── img_00006.jpg
│           ├── img_00007.jpg
│           ├── img_00008.jpg
│           ├── img_00009.jpg
│           ├── img_00010.jpg
│           ├── img_00011.jpg
│           ├── img_00012.jpg
│           ├── img_00013.jpg
│           ├── img_00014.jpg
│           └── img_00015.jpg
├── docs
│   ├── source
│   │   ├── modules.rst
│   │   ├── demo.rst
│   │   ├── VideoDataset.rst
│   │   ├── conf.py
│   │   ├── README.rst
│   │   └── index.rst
│   ├── Makefile
│   └── make.bat
├── demo_dataset_multilabel
│   ├── annotations.txt
│   ├── jumping
│   │   ├── 0001
│   │   │   ├── img_00001.jpg
│   │   │   ├── img_00002.jpg
│   │   │   ├── img_00003.jpg
│   │   │   ├── img_00004.jpg
│   │   │   ├── img_00005.jpg
│   │   │   ├── img_00006.jpg
│   │   │   ├── img_00007.jpg
│   │   │   ├── img_00008.jpg
│   │   │   ├── img_00009.jpg
│   │   │   ├── img_00010.jpg
│   │   │   ├── img_00011.jpg
│   │   │   ├── img_00012.jpg
│   │   │   ├── img_00013.jpg
│   │   │   ├── img_00014.jpg
│   │   │   ├── img_00015.jpg
│   │   │   ├── img_00016.jpg
│   │   │   └── img_00017.jpg
│   │   └── 0002
│   │       ├── img_00001.jpg
│   │       ├── img_00002.jpg
│   │       ├── img_00003.jpg
│   │       ├── img_00004.jpg
│   │       ├── img_00005.jpg
│   │       ├── img_00006.jpg
│   │       ├── img_00007.jpg
│   │       ├── img_00008.jpg
│   │       ├── img_00009.jpg
│   │       ├── img_00010.jpg
│   │       ├── img_00011.jpg
│   │       ├── img_00012.jpg
│   │       ├── img_00013.jpg
│   │       ├── img_00014.jpg
│   │       ├── img_00015.jpg
│   │       ├── img_00016.jpg
│   │       ├── img_00017.jpg
│   │       └── img_00018.jpg
│   └── running
│       ├── 0001
│       │   ├── img_00001.jpg
│       │   ├── img_00002.jpg
│       │   ├── img_00003.jpg
│       │   ├── img_00004.jpg
│       │   ├── img_00005.jpg
│       │   ├── img_00006.jpg
│       │   ├── img_00007.jpg
│       │   ├── img_00008.jpg
│       │   ├── img_00009.jpg
│       │   ├── img_00010.jpg
│       │   ├── img_00011.jpg
│       │   ├── img_00012.jpg
│       │   ├── img_00013.jpg
│       │   ├── img_00014.jpg
│       │   └── img_00015.jpg
│       └── 0002
│           ├── img_00001.jpg
│           ├── img_00002.jpg
│           ├── img_00003.jpg
│           ├── img_00004.jpg
│           ├── img_00005.jpg
│           ├── img_00006.jpg
│           ├── img_00007.jpg
│           ├── img_00008.jpg
│           ├── img_00009.jpg
│           ├── img_00010.jpg
│           ├── img_00011.jpg
│           ├── img_00012.jpg
│           ├── img_00013.jpg
│           ├── img_00014.jpg
│           └── img_00015.jpg
├── requirements.txt
├── EpicKitchens100
│   ├── original_annotations_to_processed_annotations.py
│   └── README.md
├── LICENSE
├── Kinetics400
│   ├── process_annotation_file.py
│   ├── README.md
│   ├── videos_to_frames.py
│   └── labels_to_id.csv
├── SomethingSomethingV2
│   ├── README.md
│   ├── original_annotations_to_processed_annotations.py
│   └── videos_to_frames.py
├── demo.py
├── video_dataset.py
└── README.md
/demo_dataset/annotations.txt:
--------------------------------------------------------------------------------
jumping/0001 1 17 0
jumping/0002 1 18 0
running/0001 1 15 1
running/0002 1 15 1
--------------------------------------------------------------------------------
/docs/source/modules.rst:
--------------------------------------------------------------------------------
DataLoading
===========

.. toctree::
   :maxdepth: 4

   demo
   video_dataset
--------------------------------------------------------------------------------
/demo_dataset_multilabel/annotations.txt:
--------------------------------------------------------------------------------
jumping/0001 1 17 0 2 4
jumping/0002 1 18 0 1 3
running/0001 1 15 1 1 2
running/0002 1 15 1 3 3
--------------------------------------------------------------------------------
/docs/source/demo.rst:
--------------------------------------------------------------------------------
demo module
===========

.. automodule:: demo
   :members:
   :undoc-members:
   :show-inheritance:
--------------------------------------------------------------------------------
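Each row in the annotation files above has the form PATH START_FRAME END_FRAME LABEL [LABEL_2 ... LABEL_N], where the frame indices are inclusive and the multilabel variant simply appends extra integer labels per row. A minimal parsing sketch for illustration — the names below are ours, not the library's (the repository's own loader handles this internally via its VideoRecord class):

import os
from typing import List, NamedTuple

class AnnotationRow(NamedTuple):
    path: str          # video folder relative to the dataset root, e.g. 'jumping/0001'
    start_frame: int   # index of the clip's first frame
    end_frame: int     # index of the clip's last frame (inclusive)
    labels: List[int]  # one or more integer class labels

def parse_annotations(annotation_file: str) -> List[AnnotationRow]:
    rows = []
    with open(annotation_file) as f:
        for line in f:
            if not line.strip():
                continue
            path, start, end, *labels = line.split()
            rows.append(AnnotationRow(path, int(start), int(end),
                                      [int(x) for x in labels]))
    return rows

# parse_annotations('demo_dataset/annotations.txt')[0]
# -> AnnotationRow(path='jumping/0001', start_frame=1, end_frame=17, labels=[0])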
/demo_dataset/jumping/**, /demo_dataset/running/**:
--------------------------------------------------------------------------------
[65 binary JPEG frame files, listed individually in the directory tree above; each
resolves to https://raw.githubusercontent.com/RaivoKoot/Video-Dataset-Loading-Pytorch/HEAD/demo_dataset/<class>/<clip>/img_XXXXX.jpg]
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# python >= 3.6
torch >= 1.7.0
torchvision >= 0.8.0

# for demo
matplotlib

# for author
sphinx == 3.3.1
--------------------------------------------------------------------------------
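With these requirements installed, the demo dataset above can be loaded end to end, along the lines of demo.py. A minimal sketch; the constructor parameters shown (root_path, annotationfile_path, num_segments, frames_per_segment, imagefile_template, transform, test_mode) are assumptions based on the library's documented API — check video_dataset.py for the exact signature:

import os
from video_dataset import VideoFrameDataset

demo_root = os.path.join(os.getcwd(), 'demo_dataset')

dataset = VideoFrameDataset(
    root_path=demo_root,
    annotationfile_path=os.path.join(demo_root, 'annotations.txt'),
    num_segments=5,                       # split each video into 5 segments
    frames_per_segment=1,                 # sample 1 frame from each segment
    imagefile_template='img_{:05d}.jpg',  # matches img_00001.jpg etc.
    transform=None,                       # return frames as PIL images
    test_mode=False)

frames, label = dataset[0]  # frames: list of 5 PIL images; label: 0 ('jumping')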
/demo_dataset_multilabel/jumping/**, /demo_dataset_multilabel/running/**:
--------------------------------------------------------------------------------
[65 binary JPEG frame files mirroring demo_dataset, listed individually in the
directory tree above; each resolves to https://raw.githubusercontent.com/RaivoKoot/Video-Dataset-Loading-Pytorch/HEAD/demo_dataset_multilabel/<class>/<clip>/img_XXXXX.jpg]
--------------------------------------------------------------------------------
/docs/source/VideoDataset.rst:
--------------------------------------------------------------------------------
VideoDataset module
===================

.. automodule:: video_dataset
   :members:
   :exclude-members: VideoRecord
   :show-inheritance:
--------------------------------------------------------------------------------
/docs/Makefile:
--------------------------------------------------------------------------------
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
# After the requested build, also run the coverage builder. (Note: this must be
# a separate "-M coverage" invocation; combining "-M $@" with "-b coverage" in
# one call is not valid sphinx-build usage.)
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
	@$(SPHINXBUILD) -M coverage "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--------------------------------------------------------------------------------
/docs/make.bat:
--------------------------------------------------------------------------------
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
--------------------------------------------------------------------------------
/EpicKitchens100/original_annotations_to_processed_annotations.py:
--------------------------------------------------------------------------------
import os
import pandas as pd

"""
This script converts the original EPIC-KITCHENS-100 annotation file
into an annotations.txt file that is compatible with this repository's
dataloader VideoFrameDataset.

Modify the two filepaths below and then run this script. Run it once for
the training annotations and once for the validation annotations.
"""

if __name__ == '__main__':
    # filepath to where you have stored the original annotation file
    annotation_file = os.path.join(os.path.expanduser('~'), 'homedata', 'EPICKITCHENS', 'annotations', 'EPIC_100_train_subset.csv')

    # the output path and file name that you want to use
    out_file = os.path.join(os.path.expanduser('~'), 'data', 'EPICKITCHENS', 'annotations', 'EPIC_100_validation_new.txt')

    data = pd.read_csv(annotation_file, header=0)

    # open in 'w' mode so that re-running the script does not append duplicate rows
    with open(out_file, 'w') as file:
        for index, row in data.iterrows():
            path = os.path.join(row['participant_id'], 'rgb_frames', row['video_id'])
            start_frame = row['start_frame']
            last_frame = row['stop_frame']
            verb_class = row['verb_class']
            noun_class = row['noun_class']

            file.write(f"{path} {start_frame} {last_frame} {verb_class} {noun_class}\n")
--------------------------------------------------------------------------------
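To make the mapping concrete, a hypothetical input row (all values made up) with participant_id=P01, video_id=P01_01, start_frame=8, stop_frame=202, verb_class=5 and noun_class=10 would be written out as:

    P01/rgb_frames/P01_01 8 202 5 10

i.e. PATH START_FRAME END_FRAME VERB_CLASS NOUN_CLASS — a two-label row in the annotations.txt format shown earlier.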
/LICENSE:
--------------------------------------------------------------------------------
BSD 2-Clause License

Copyright (c) 2020, Raivo Eli Koot
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/EpicKitchens100/README.md:
--------------------------------------------------------------------------------
# Using Epic Kitchens 100
This directory contains pre-made annotation files to use the [Epic Kitchens 100](https://epic-kitchens.github.io/2021) dataset with this
repository's VideoFrameDataset dataloader. The two `.txt` files in this directory are the training and validation annotation
files that you can use for Epic Kitchens 100 with VideoFrameDataset. That's it! Reading `1. Dataset Overview` below can also
help you understand the Epic Kitchens 100 files.

If you need or want to recreate these processed annotation files yourself, read the rest of this README below.

### 1. Dataset Overview
When you download the Epic Kitchens 100 dataset, it comes in the following format:
- A folder containing JPEG RGB frames for each video
- A `.csv` file each for the training annotations and the validation annotations

To use VideoFrameDataset with Epic Kitchens 100, we need to
1. Turn each `.csv` file into an `annotations.txt` file, as described in the main README of this repository.

### 2. Processing
Doing (1) from above is easy if you use the Python script provided in this directory.
- For (1), run the script `original_annotations_to_processed_annotations.py` and make sure that you
set the file paths correctly inside the script. Run this script once for the training and once for the validation
annotations.

### 3. Done
That's it! You now have all you need to use VideoFrameDataset and start training
on Epic Kitchens 100:
- two annotation text files
- a folder called `RGB` that contains the frames of all videos
--------------------------------------------------------------------------------
/Kinetics400/process_annotation_file.py:
--------------------------------------------------------------------------------
import pandas as pd
from typing import Dict
import os

label_to_id_file = 'labels_to_id.csv'
annotation_in_file = 'train.csv'
annotation_out_file = 'train_processed.txt'
rgb_path = 'rgb/'


""" Read in mapping from class label to class ID """
label_to_id_df = pd.read_csv(label_to_id_file)
label_to_id_dict: Dict[str, int] = dict()
for index, row in label_to_id_df.iterrows():
    label_to_id_dict[row['name']] = int(row['id'])


""" Read in original annotations and convert class label to class ID """
annotations = pd.read_csv(annotation_in_file)
annotations_processed = []
for index, row in annotations.iterrows():
    label, video_id, start_time, end_time = row['label'], row['youtube_id'], row['time_start'], row['time_end']
    annotations_processed.append((video_id, label_to_id_dict[label], start_time, end_time))


""" Find out how many RGB frames each video has and finally write the new annotation file """
with open(annotation_out_file, 'w') as f:
    for annotation in annotations_processed:
        video_id, class_id, start_time, end_time = annotation
        video_id = str(video_id) + '_{:06d}_{:06d}'.format(start_time, end_time)
        start_frame = 0
        frame_path = os.path.join(rgb_path, video_id)
        try:
            num_frames = len(os.listdir(frame_path))
            if num_frames == 0:
                print(f'{video_id} - no frames')
                continue
        except FileNotFoundError as e:
            print(e)
            continue

        annotation_string = "{} {} {} {}\n".format(video_id, start_frame, num_frames - 1, class_id)
        f.write(annotation_string)
--------------------------------------------------------------------------------
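For a concrete (made-up) example of what this script emits: a video with youtube_id=abcdefghijk, time_start=10, time_end=20, 250 extracted frames and class ID 87 becomes:

    abcdefghijk_000010_000020 0 249 87

Frame indices here are zero-based, so the last-frame index is num_frames - 1.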
5 | ### 1. Dataset Overview
6 | When you download the Something Something V2 dataset, it comes in the following format:
7 | - A `.webm` video file for every video
8 | - A `.json` file for the training annotations and the validation annotations
9 | 
10 | To use VideoFrameDataset with Something Something V2, we need to
11 | 1. Create a folder for every `.webm` file that contains the RGB frames of that video.
12 | 2. Turn the `.json` file into an `annotations.txt` file, as described in the main README of this repository.
13 | 
14 | ### 2. Processing
15 | Doing (1) and (2) from above is very easy if you use the Python scripts provided in this directory.
16 | - For (1), run the script `videos_to_frames.py` and make sure that you set the file paths
17 | correctly inside of the script.
18 | - For (2), run the script `original_annotations_to_processed_annotations.py` and make sure that you
19 | set the file paths correctly inside of the script. You must have completed step (1) before you can
20 | run this script. Run this script once for training and once for validation
21 | annotations.
22 | 
23 | NOTE: The processed training and validation files that step (2) outputs are uploaded here as well.
24 | You can use these directly and skip step (2). However, after completing step (1), you might still
25 | need to run (2) yourself, in case `videos_to_frames.py` extracts RGB frames differently on your
26 | machine than it did on mine (this happened to me once).
27 | 
28 | ### 3. Done
29 | That's it! You should then have a folder on your disk `RGB` that contains all videos in individual RGB
30 | frames, and the two annotation files. This is all you need to use VideoFrameDataset and start training
31 | on Something Something V2!
32 | 
--------------------------------------------------------------------------------
/SomethingSomethingV2/original_annotations_to_processed_annotations.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | 
4 | """
5 | This script takes in the original json annotation files for
6 | SomethingSomethingV2 and turns them into annotation.txt files
7 | that are compatible with this repository's dataloader VideoFrameDataset.
8 | 
9 | Running this script requires that you already have the SomethingSomethingV2
10 | videos on disk as RGB frames, where each video has its own folder, containing
11 | the RGB frames of that video. For this, you can use
12 | the script videos_to_frames.py.
13 | 
14 | Modify the three filepaths below and then run this script.
15 | """
16 | 
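# For reference, each entry of the official annotations .json looks roughly
# like this (illustrative, abbreviated):
#     {"id": "74225", "template": "Throwing [something]", ...}
# and the labels .json maps the bracket-free label text to a class id string,
# e.g. "Throwing something" -> "132". This is why the brackets are stripped
# from 'template' below before the label lookup.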
17 | # the official Something Something V2 annotations file, either training or validation.
18 | raw_annotations = 'something-something-v2-validation.json'
19 | # the name of the output file
20 | out_file = 'something-something-v2-validation-processed.txt'
21 | # the official Something Something V2 label file, that specifies a mapping from TEXT_LABEL -> CLASS_ID
22 | labels_file = 'something-something-v2-labels.json'
23 | 
24 | rgb_root = '../rgb/'
25 | 
26 | annotations = None
27 | 
28 | """ list containing the annotations for each sample """
29 | with open(raw_annotations) as file:
30 |     annotations = json.load(file)
31 | 
32 | """ dictionary to go from [text label] -> [integer label] """
33 | with open(labels_file) as file:
34 |     labels_to_ids_dict = json.load(file)
35 | 
36 | 
37 | with open(out_file, 'w') as file:
38 |     for sample in annotations:
39 |         sample_id = sample['id']
40 | 
41 |         """ find out the number of frames for this sample """
42 |         sample_rgb_directory = os.path.join(rgb_root, sample_id)
43 |         num_frames = len(os.listdir(sample_rgb_directory)) - 1  # frames are indexed from 0, so the inclusive last index is count - 1
44 | 
45 |         """ convert [text label] -> [integer label] """
46 |         text_label = sample['template'].replace('[', '').replace(']', '')
47 |         label_id = labels_to_ids_dict[text_label]
48 | 
49 |         """ write to processed file """
50 |         annotation_string = "{} {} {} {}\n".format(sample_id, 0, num_frames, label_id)
51 |         file.write(annotation_string)
52 | 
--------------------------------------------------------------------------------
/Kinetics400/README.md:
--------------------------------------------------------------------------------
1 | # Using Kinetics 400
2 | This directory contains helpers to use the [Kinetics 400](https://github.com/cvdfoundation/kinetics-dataset) dataset with this
3 | repository's VideoFrameDataset dataloader. Download it from [this URL](https://github.com/cvdfoundation/kinetics-dataset).
4 | 
5 | ### 1. Dataset Overview
6 | When you download the Kinetics 400 dataset, it comes in the following format:
7 | - An `.mp4` video file for every video
8 | - A `.csv` file for the training, validation, and testing annotations
9 | 
10 | To use VideoFrameDataset with Kinetics 400, we need to
11 | 1. Create a folder for every `.mp4` file that contains the RGB frames of that video.
12 | 2. Turn each `.csv` file into an `annotations.txt` file, as described in the main README of this repository.
13 | 
14 | ### 2. Processing
15 | Doing (1) and (2) from above is very easy if you use the Python scripts provided in this directory.
16 | - For (1), make sure that all `.mp4` files (training, validation, and testing) are located in one and the same
17 | directory. Run the script `videos_to_frames.py` and make sure that you set the file paths
18 | correctly inside of the script. This will probably take ~10 hours for Kinetics 400.
19 | - For (2), run the script `process_annotation_file.py` once for each annotation `.csv` and make sure that you
20 | set the file paths correctly inside of the script. You must have completed step (1) before you can
21 | run this script.
22 | 
23 | NOTE: The processed training, validation, and testing files that step (2) outputs are uploaded here as well.
24 | You can use these directly and skip step (2). However, after completing step (1), you might still need to
25 | run (2) yourself, in case `videos_to_frames.py` extracts RGB frames differently on your machine than it
26 | did on mine (this is very likely; I recommend running step (2) yourself).
27 | 
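Each row of a processed annotation file then has the form `VIDEO_FOLDER START_FRAME END_FRAME CLASS_ID`, where `VIDEO_FOLDER` is the YouTube id with the zero-padded start and end seconds appended. The values below are illustrative:

```
abcdefghijk_000012_000022 0 249 31
```
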
28 | ### 3. Done
29 | That's it! You should then have a folder on your disk `RGB` that contains all videos in individual RGB
30 | frames, and the three annotation files. This is all you need to use VideoFrameDataset and start training
31 | on Kinetics 400!
32 | 
--------------------------------------------------------------------------------
/docs/source/conf.py:
--------------------------------------------------------------------------------
1 | # Configuration file for the Sphinx documentation builder.
2 | #
3 | # This file only contains a selection of the most common options. For a full
4 | # list see the documentation:
5 | # https://www.sphinx-doc.org/en/master/usage/configuration.html
6 | 
7 | # -- Path setup --------------------------------------------------------------
8 | 
9 | # If extensions (or modules to document with autodoc) are in another directory,
10 | # add these directories to sys.path here. If the directory is relative to the
11 | # documentation root, use os.path.abspath to make it absolute, like shown here.
12 | #
13 | import os
14 | import sys
15 | sys.path.insert(0, os.path.abspath('../..'))
16 | 
17 | 
18 | # -- Project information -----------------------------------------------------
19 | 
20 | project = 'Video Dataset Loading PyTorch'
21 | copyright = '2020, Raivo Koot'
22 | author = 'Raivo Koot'
23 | 
24 | # The full version, including alpha/beta/rc tags
25 | release = '1.0'
26 | 
27 | 
28 | # -- General configuration ---------------------------------------------------
29 | 
30 | # Add any Sphinx extension module names here, as strings. They can be
31 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
32 | # ones.
33 | extensions = [
34 |     'sphinx.ext.napoleon',
35 |     'sphinx.ext.autodoc',
36 |     'sphinx.ext.viewcode',
37 |     'sphinx.ext.coverage',
38 | ]
39 | 
40 | # Add any paths that contain templates here, relative to this directory.
41 | templates_path = ['_templates']
42 | 
43 | # List of patterns, relative to source directory, that match files and
44 | # directories to ignore when looking for source files.
45 | # This pattern also affects html_static_path and html_extra_path.
46 | exclude_patterns = []
47 | 
48 | 
49 | # -- Options for HTML output -------------------------------------------------
50 | 
51 | # The theme to use for HTML and HTML Help pages. See the documentation for
52 | # a list of builtin themes.
53 | #
54 | html_theme = 'sphinx_rtd_theme'
55 | 
56 | # Add any paths that contain custom static files (such as style sheets) here,
57 | # relative to this directory. They are copied after the builtin static files,
58 | # so a file named "default.css" will overwrite the builtin "default.css".
59 | html_static_path = ['_static']
--------------------------------------------------------------------------------
/SomethingSomethingV2/videos_to_frames.py:
--------------------------------------------------------------------------------
1 | import os
2 | import cv2
3 | import threading
4 | from queue import Queue, Empty
5 | 
6 | """
7 | Given individual video files (mp4, webm) on disk, creates a folder for
8 | every video file and saves the video's RGB frames as jpeg files in that
9 | folder.
10 | 
11 | It can be used to turn SomethingSomethingV2, which comes as
12 | many ".webm" files, into an RGB folder for each ".webm" file.
13 | Uses multithreading to extract frames faster.
14 | 
15 | Modify the two filepaths at the bottom and then run this script.
16 | """ 17 | 18 | 19 | def video_to_rgb(video_filename, out_dir, resize_shape): 20 | file_template = 'frame_{0:012d}.jpg' 21 | reader = cv2.VideoCapture(video_filename) 22 | success, frame, = reader.read() # read first frame 23 | 24 | count = 0 25 | while success: 26 | out_filepath = os.path.join(out_dir, file_template.format(count)) 27 | frame = cv2.resize(frame, resize_shape) 28 | cv2.imwrite(out_filepath, frame) 29 | success, frame = reader.read() 30 | count += 1 31 | 32 | def process_videofile(video_filename, video_path, rgb_out_path, file_extension: str ='.mp4'): 33 | filepath = os.path.join(video_path, video_filename) 34 | video_filename = video_filename.replace(file_extension, '') 35 | 36 | out_dir = os.path.join(rgb_out_path, video_filename) 37 | os.mkdir(out_dir) 38 | video_to_rgb(filepath, out_dir, resize_shape=(224, 224)) 39 | 40 | def thread_job(queue, video_path, rgb_out_path, file_extension='.webm'): 41 | while not queue.empty(): 42 | video_filename = queue.get() 43 | process_videofile(video_filename, video_path, rgb_out_path, file_extension=file_extension) 44 | queue.task_done() 45 | 46 | 47 | if __name__ == '__main__': 48 | # the path to the folder which contains all video files (mp4, webm, or other) 49 | video_path = 'videos' 50 | # the root output path where RGB frame folders should be created 51 | rgb_out_path = 'rgb' 52 | # the file extension that the videos have 53 | file_extension = '.webm' 54 | 55 | video_filenames = os.listdir(video_path) 56 | queue = Queue() 57 | [queue.put(video_filename) for video_filename in video_filenames] 58 | 59 | NUM_THREADS = 30 60 | for i in range(NUM_THREADS): 61 | worker = threading.Thread(target=thread_job, args=(queue, video_path, rgb_out_path, file_extension)) 62 | worker.start() 63 | 64 | print('waiting for all videos to be completed.', queue.qsize(), 'videos') 65 | print('This can take an hour or two depending on dataset size') 66 | queue.join() 67 | print('all done') 68 | -------------------------------------------------------------------------------- /Kinetics400/videos_to_frames.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import threading 4 | from queue import Queue 5 | 6 | """ 7 | Given individual video files (mp4, webm) on disk, creates a folder for 8 | every video file and saves the video's RGB frames as jpeg files in that 9 | folder. 10 | 11 | It can be used to turn Kinetics 400, which comes as 12 | many ".mp4" files, into an RGB folder for each ".mp4" file. 13 | Uses multithreading to extract frames faster. 14 | 15 | Modify the two filepaths at the bottom and then run this script. 
16 | """ 17 | 18 | 19 | def video_to_rgb(video_filename, out_dir, resize_shape): 20 | file_template = 'frame_{0:012d}.jpg' 21 | reader = cv2.VideoCapture(video_filename) 22 | success, frame, = reader.read() # read first frame 23 | 24 | count = 0 25 | while success: 26 | out_filepath = os.path.join(out_dir, file_template.format(count)) 27 | frame = cv2.resize(frame, resize_shape) 28 | cv2.imwrite(out_filepath, frame) 29 | success, frame = reader.read() 30 | count += 1 31 | 32 | def process_videofile(video_filename, video_path, rgb_out_path, file_extension: str ='.mp4'): 33 | filepath = os.path.join(video_path, video_filename) 34 | video_filename = video_filename.replace(file_extension, '') 35 | 36 | out_dir = os.path.join(rgb_out_path, video_filename) 37 | os.mkdir(out_dir) 38 | video_to_rgb(filepath, out_dir, resize_shape=OUT_HEIGHT_WIDTH) 39 | 40 | def thread_job(queue, video_path, rgb_out_path, file_extension='.webm'): 41 | while not queue.empty(): 42 | video_filename = queue.get() 43 | process_videofile(video_filename, video_path, rgb_out_path, file_extension=file_extension) 44 | queue.task_done() 45 | 46 | 47 | if __name__ == '__main__': 48 | # the path to the folder which contains all video files (mp4, webm, or other) 49 | video_path = '/home/raivo/data1/kinetics/videos/all' 50 | # the root output path where RGB frame folders should be created 51 | rgb_out_path = 'rgb' 52 | # the file extension that the videos have 53 | file_extension = '.mp4' 54 | # hight and width to resize RGB frames to 55 | OUT_HEIGHT_WIDTH = (224, 224) 56 | 57 | video_filenames = os.listdir(video_path) 58 | queue = Queue() 59 | [queue.put(video_filename) for video_filename in video_filenames] 60 | 61 | NUM_THREADS = 15 62 | for i in range(NUM_THREADS): 63 | worker = threading.Thread(target=thread_job, args=(queue, video_path, rgb_out_path, file_extension)) 64 | worker.start() 65 | 66 | print('waiting for all videos to be completed.', queue.qsize(), 'videos') 67 | print('This can take an hour or two depending on dataset size') 68 | queue.join() 69 | print('all done') 70 | -------------------------------------------------------------------------------- /docs/source/README.rst: -------------------------------------------------------------------------------- 1 | Efficient Video Dataset Loading, Preprocessing, and Augmentation 2 | ================================================================ 3 | 4 | Author: `Raivo Koot `__ 5 | 6 | If you are completely unfamiliar with loading datasets in PyTorch using 7 | ``torch.utils.data.Dataset`` and ``torch.utils.data.DataLoader``, I 8 | recommend getting familiar with these first through 9 | `this `__ 10 | or 11 | `this `__. 12 | 13 | Overview: This example demonstrates the use of ``VideoFrameDataset`` 14 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 15 | 16 | The VideoFrameDataset class serves to ``easily``, ``efficiently`` and 17 | ``effectively`` load video samples from video datasets in PyTorch. 1) 18 | Easily because this dataset class can be used with custom datasets with 19 | minimum effort and no modification. The class merely expects the video 20 | dataset to have a certain structure on disk and expects a .txt 21 | annotation file that enumerates each video sample. Details on this can 22 | be found below and at 23 | ``https://pykale.readthedocs.io/en/latest/kale.loaddata.html#kale-loaddata-video_dataset-module``. 24 | 2) Efficiently because the video loading pipeline that this class 25 | implements is very fast. 
This minimizes GPU waiting time during training
26 | by eliminating input bottlenecks that can slow down training
27 | severalfold. 3) Effectively because the implemented sampling strategy
28 | for video frames is very strong. Video training using the entire
29 | sequence of video frames (often several hundred) is too memory and
30 | compute intense. Therefore, this implementation samples frames evenly
31 | from the video (sparse temporal sampling) so that the loaded frames
32 | represent every part of the video, with support for arbitrary and
33 | differing video lengths within the same dataset. This approach has been
34 | shown to be very effective and is taken from `"Temporal Segment Networks
35 | (ECCV2016)" <https://arxiv.org/abs/1608.00859>`__ with modifications.
36 | 
37 | In conjunction with PyTorch's DataLoader, the VideoFrameDataset class
38 | returns video batch tensors of size
39 | ``BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH``.
40 | 
41 | For a demo, visit ``demo.py``.
42 | 

QuickDemo (demo.py)
~~~~~~~~~~~~~~~~~~~

43 | .. code:: python
44 | 
45 |     root = os.path.join(os.getcwd(), 'demo_dataset')  # Folder in which all videos lie in a specific structure
46 |     annotation_file = os.path.join(root, 'annotations.txt')  # A row for each video sample as: (VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX)
47 | 
48 |     """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
49 |     dataset = VideoFrameDataset(
50 |         root_path=root,
51 |         annotationfile_path=annotation_file,
52 |         num_segments=5,
53 |         frames_per_segment=1,
54 |         imagefile_template='img_{:05d}.jpg',
55 |         transform=None,
57 |         test_mode=False
58 |     )
59 | 
60 |     sample = dataset[0]  # take first sample of dataset
61 |     frames = sample[0]   # list of PIL images
62 |     label = sample[1]    # integer label
63 | 
64 |     for image in frames:
65 |         plt.imshow(image)
66 |         plt.title(label)
67 |         plt.show()
68 |         plt.pause(1)
69 | 
70 | Table of Contents
71 | =================
72 | 
73 | - `1. Requirements <#1-requirements>`__
74 | - `2. Custom Dataset <#2-custom-dataset>`__
75 | - `3. Video Frame Sampling Method <#3-video-frame-sampling-method>`__
76 | - `4. Using VideoFrameDataset for
77 |   Training <#4-using-videoframedataset-for-training>`__
78 | - `5. Conclusion <#5-conclusion>`__
79 | - `6. Acknowledgements <#6-acknowledgements>`__
80 | 
81 | 1. Requirements
82 | ~~~~~~~~~~~~~~~
83 | 
84 | ::
85 | 
86 |     # Without these three, VideoFrameDataset will not work.
87 |     torchvision >= 0.8.0
88 |     torch >= 1.7.0
89 |     python >= 3.6
90 | 
91 | 2. Custom Dataset
92 | ~~~~~~~~~~~~~~~~~
93 | 
94 | To use any dataset, two conditions must be met. 1) The video data must
95 | be supplied as RGB frames, each frame saved as an image file. Each video
96 | must have its own folder, in which the frames of that video lie. The
97 | frames of a video inside its folder must be named uniformly as
98 | ``img_00001.jpg`` ... ``img_00120.jpg``, if there are 120 frames. The
99 | filename template for frames is then "img\_{:05d}.jpg" (python string
100 | formatting, specifying 5 digits after the underscore), and must be
101 | supplied to the constructor of VideoFrameDataset as a parameter. Each
102 | video folder lies inside a ``root`` folder of this dataset. 2) To
103 | enumerate all video samples in the dataset and their required metadata,
104 | a ``.txt`` annotation file must be manually created that contains a row
105 | for each video sample in the dataset. The training, validation, and
106 | testing datasets must have separate annotation files. Each row must be a
107 | space-separated list that contains
108 | ``VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX``, where ``START_FRAME`` and
``END_FRAME`` are the indices of the first and the (inclusive) last frame.
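For example, the row for a 17-frame video of class ``0`` could read ``jumping/0001 1 17 0``.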
The ``VIDEO_PATH`` of a video
109 | sample should be provided without the ``root`` prefix of this dataset.
110 | 
111 | This example project demonstrates this using a dummy dataset inside of
112 | ``demo_dataset/``, which is the ``root`` dataset folder of this example.
113 | The folder structure looks as follows:
114 | 
115 | ::
116 | 
117 |     demo_dataset
118 |     │
119 |     ├───annotations.txt
120 |     ├───jumping # arbitrary class folder naming
121 |     │   ├───0001 # arbitrary video folder naming
122 |     │   │   ├───img_00001.jpg
123 |     │   │   .
124 |     │   │   └───img_00017.jpg
125 |     │   └───0002
126 |     │       ├───img_00001.jpg
127 |     │       .
128 |     │       └───img_00018.jpg
129 |     │
130 |     └───running # arbitrary folder naming
131 |         ├───0001 # arbitrary video folder naming
132 |         │   ├───img_00001.jpg
133 |         │   .
134 |         │   └───img_00015.jpg
135 |         └───0002
136 |             ├───img_00001.jpg
137 |             .
138 |             └───img_00015.jpg
139 | 
140 | 
141 | 
142 | The accompanying annotation ``.txt`` file contains the following rows
143 | 
144 | ::
145 | 
146 |     jumping/0001 1 17 0
147 |     jumping/0002 1 18 0
148 |     running/0001 1 15 1
149 |     running/0002 1 15 1
150 | 
151 | Instantiating a VideoFrameDataset with the ``root_path`` parameter
152 | pointing to ``demo_dataset``, the ``annotationfile_path`` parameter
153 | pointing to the annotation file, and the ``imagefile_template``
154 | parameter as "img\_{:05d}.jpg", is all that it takes to start using the
155 | VideoFrameDataset class.
156 | 
157 | 3. Video Frame Sampling Method
158 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
159 | 
160 | When loading a video, only a number of its frames are loaded. They are
161 | chosen in the following way: 1. The frame indices [1,N] are divided into
162 | NUM\_SEGMENTS even segments. From each segment, FRAMES\_PER\_SEGMENT
163 | consecutive indices are chosen at random. This results in
164 | NUM\_SEGMENTS\*FRAMES\_PER\_SEGMENT chosen indices, whose frames are
165 | loaded as PIL images and put into a list and returned when calling
166 | ``dataset[i]``. A short sketch of this computation follows below.
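For reference, the random-shift branch of this computation in
``video_dataset.py`` boils down to the following minimal sketch (it assumes
the video has at least NUM\_SEGMENTS\*FRAMES\_PER\_SEGMENT frames):

.. code:: python

    import numpy as np

    def choose_start_indices(num_frames, num_segments, frames_per_segment):
        # Divide the frames into num_segments blocks of equal length and pick
        # one random start index inside each block; frames_per_segment
        # consecutive frames are then loaded from each start index.
        max_valid_start_index = (num_frames - frames_per_segment + 1) // num_segments
        return np.arange(num_segments) * max_valid_start_index + \
            np.random.randint(max_valid_start_index, size=num_segments)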
167 | 
168 | 4. Using VideoFrameDataset for training
169 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
170 | 
171 | As demonstrated in ``demo.py``, we can use PyTorch's
172 | ``torch.utils.data.DataLoader`` class with VideoFrameDataset to take
173 | care of shuffling, batching, and more. To turn the lists of PIL images
174 | returned by VideoFrameDataset into tensors, the transform
175 | ``video_dataset.ImglistToTensor()`` can be supplied as the
176 | ``transform`` parameter to VideoFrameDataset. This turns a list of N PIL
177 | images into a batch of images/frames of shape
178 | ``N x CHANNELS x HEIGHT x WIDTH``. We can further chain preprocessing
179 | and augmentation functions that act on batches of images onto the end of
180 | ``ImglistToTensor()``.
181 | 
182 | As of ``torchvision 0.8.0``, all torchvision transforms can now also
183 | operate on batches of images, and they apply deterministic or random
184 | transformations on the batch identically on all images of the batch.
185 | Therefore, any torchvision transform can be used here to apply
186 | video-uniform preprocessing and augmentation.
187 | 
188 | 5. Conclusion
189 | ~~~~~~~~~~~~~
190 | 
191 | A proper code-based explanation on how to use VideoFrameDataset for
192 | training is provided in ``demo.py``
193 | 
194 | 6. Acknowledgements
195 | ~~~~~~~~~~~~~~~~~~~
196 | 
197 | We thank the authors of TSN for their
198 | `codebase <https://github.com/yjxiong/tsn-pytorch>`__, from which we
199 | took VideoFrameDataset and adapted it.
200 | 
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
1 | from video_dataset import VideoFrameDataset, ImglistToTensor
2 | from torchvision import transforms
3 | import torch
4 | import matplotlib.pyplot as plt
5 | from mpl_toolkits.axes_grid1 import ImageGrid
6 | import os
7 | 
8 | """
9 | Ignore this function and look at "main" below.
10 | """
11 | def plot_video(rows, cols, frame_list, plot_width, plot_height, title: str):
12 |     fig = plt.figure(figsize=(plot_width, plot_height))
13 |     grid = ImageGrid(fig, 111,  # similar to subplot(111)
14 |                      nrows_ncols=(rows, cols),  # creates a (rows x cols) grid of axes
15 |                      axes_pad=0.3,  # pad between axes in inches
16 |                      )
17 | 
18 |     for index, (ax, im) in enumerate(zip(grid, frame_list)):
19 |         # Iterating over the grid returns the Axes.
20 |         ax.imshow(im)
21 |         ax.set_title(index)
22 |     plt.suptitle(title)
23 |     plt.show()
24 | 
25 | if __name__ == '__main__':
26 |     """
27 |     This demo uses the dummy dataset inside of the folder "demo_dataset".
28 |     It is structured just like a real dataset would need to be structured.
29 | 
30 |     TABLE OF CODE CONTENTS:
31 |     1. Minimal demo without image transforms
32 |     2. Minimal demo without sparse temporal sampling for single continuous frame clips, without image transforms
33 |     3. Demo with image transforms
34 |     4. Demo 3 continued with PyTorch dataloader
35 |     5. Demo of using a dataset where samples have multiple separate class labels
36 | 
37 |     """
38 |     videos_root = os.path.join(os.getcwd(), 'demo_dataset')
39 |     annotation_file = os.path.join(videos_root, 'annotations.txt')
40 | 
41 | 
42 |     """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
43 |     dataset = VideoFrameDataset(
44 |         root_path=videos_root,
45 |         annotationfile_path=annotation_file,
46 |         num_segments=5,
47 |         frames_per_segment=1,
48 |         imagefile_template='img_{:05d}.jpg',
49 |         transform=None,
50 |         test_mode=False
51 |     )
52 | 
53 |     sample = dataset[0]
54 |     frames = sample[0]  # list of PIL images
55 |     label = sample[1]   # integer label
56 | 
57 |     plot_video(rows=1, cols=5, frame_list=frames, plot_width=15., plot_height=3.,
58 |                title='Evenly Sampled Frames, No Video Transform')
59 | 
60 | 
61 | 
62 |     """ DEMO 2 SINGLE CONTINUOUS FRAME CLIP INSTEAD OF SAMPLED FRAMES, WITHOUT TRANSFORMS """
63 |     # If you do not want to use sparse temporal sampling, and instead
64 |     # want to just load N consecutive frames starting from a random
65 |     # start index, this is easy. Simply set NUM_SEGMENTS=1 and
66 |     # FRAMES_PER_SEGMENT=N. Each time a sample is loaded, N
67 |     # frames will be loaded from a new random start index.
68 |     dataset = VideoFrameDataset(
69 |         root_path=videos_root,
70 |         annotationfile_path=annotation_file,
71 |         num_segments=1,
72 |         frames_per_segment=9,
73 |         imagefile_template='img_{:05d}.jpg',
74 |         transform=None,
75 |         test_mode=False
76 |     )
77 | 
78 |     sample = dataset[3]
79 |     frames = sample[0]  # list of PIL images
80 |     label = sample[1]   # integer label
81 | 
82 |     plot_video(rows=3, cols=3, frame_list=frames, plot_width=10., plot_height=5.,
83 |                title='Continuous Sampled Frame Clip, No Video Transform')
84 | 
85 | 
86 | 
87 |     """ DEMO 3 WITH TRANSFORMS """
88 |     # As of torchvision 0.8.0, torchvision transforms support batches of images
89 |     # of size (BATCH x CHANNELS x HEIGHT x WIDTH) and apply deterministic or random
90 |     # transformations on the batch identically on all images of the batch.
Any torchvision
91 |     # transform for image augmentation can thus also be used for video augmentation.
92 |     preprocess = transforms.Compose([
93 |         ImglistToTensor(),  # list of PIL images to (FRAMES x CHANNELS x HEIGHT x WIDTH) tensor
94 |         transforms.Resize(299),  # image batch, resize smaller edge to 299
95 |         transforms.CenterCrop(299),  # image batch, center crop to square 299x299
96 |         transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
97 |     ])
98 | 
99 |     dataset = VideoFrameDataset(
100 |         root_path=videos_root,
101 |         annotationfile_path=annotation_file,
102 |         num_segments=5,
103 |         frames_per_segment=1,
104 |         imagefile_template='img_{:05d}.jpg',
105 |         transform=preprocess,
106 |         test_mode=False
107 |     )
108 | 
109 |     sample = dataset[2]
110 |     frame_tensor = sample[0]  # tensor of shape (NUM_SEGMENTS*FRAMES_PER_SEGMENT) x CHANNELS x HEIGHT x WIDTH
111 |     label = sample[1]  # integer label
112 | 
113 |     print('Video Tensor Size:', frame_tensor.size())
114 | 
115 |     def denormalize(video_tensor):
116 |         """
117 |         Undoes mean/standard deviation normalization, zero to one scaling,
118 |         and channel rearrangement for a batch of images.
119 |         args:
120 |             video_tensor: a (FRAMES x CHANNELS x HEIGHT x WIDTH) tensor
121 |         """
122 |         inverse_normalize = transforms.Normalize(
123 |             mean=[-0.485 / 0.229, -0.456 / 0.224, -0.406 / 0.225],
124 |             std=[1 / 0.229, 1 / 0.224, 1 / 0.225]
125 |         )
126 |         return (inverse_normalize(video_tensor) * 255.).type(torch.uint8).permute(0, 2, 3, 1).numpy()
127 | 
128 | 
129 |     frame_tensor = denormalize(frame_tensor)
130 |     plot_video(rows=1, cols=5, frame_list=frame_tensor, plot_width=15., plot_height=3.,
131 |                title='Evenly Sampled Frames, + Video Transform')
132 | 
133 | 
134 | 
135 |     """ DEMO 3 CONTINUED: DATALOADER """
136 |     dataloader = torch.utils.data.DataLoader(
137 |         dataset=dataset,
138 |         batch_size=2,
139 |         shuffle=True,
140 |         num_workers=4,
141 |         pin_memory=True
142 |     )
143 | 
144 |     for epoch in range(10):
145 |         for video_batch, labels in dataloader:
146 |             """
147 |             Insert Training Code Here
148 |             """
149 |             print(labels)
150 |             print("\nVideo Batch Tensor Size:", video_batch.size())
151 |             print("Batch Labels Size:", labels.size())
152 |             break
153 |         break
154 | 
155 | 
156 |     """ DEMO 5: SAMPLES WITH MULTIPLE LABELS """
157 |     """
158 |     Apart from supporting just a single label per sample, VideoFrameDataset also supports multi-label samples,
159 |     where a sample can be associated with more than just one label. EPIC-KITCHENS, for example, associates a
160 |     noun, verb, and action with each video clip. To support this, instead of each row in annotations.txt
161 |     being (VIDEO_PATH, START_FRAME, END_FRAME, LABEL_ID), each row can also be
162 |     (VIDEO_PATH, START_FRAME, END_FRAME, LABEL_1_ID, ..., LABEL_N_ID). An example of this can be seen in the
163 |     directory `demo_dataset_multilabel`.
164 | 
165 |     Each sample returned by VideoFrameDataset is then ((FRAMESxCHANNELSxHEIGHTxWIDTH), (LABEL_1, ..., LABEL_N)).
166 |     When paired with the `torch.utils.data.DataLoader`, instead of yielding each batch as
167 |     ((BATCHxFRAMESxCHANNELSxHEIGHTxWIDTH), (BATCH)) where the second tuple item is the labels of the batch,
168 |     `torch.utils.data.DataLoader` returns a batch as ((BATCHxFRAMESxCHANNELSxHEIGHTxWIDTH), ((BATCH),...,(BATCH)))
169 |     where the second tuple item is itself a tuple, with N BATCH-sized tensors of labels, where N is the
170 |     number of labels assigned to each sample.
171 | """ 172 | videos_root = os.path.join(os.getcwd(), 'demo_dataset_multilabel') 173 | annotation_file = os.path.join(videos_root, 'annotations.txt') 174 | 175 | dataset = VideoFrameDataset( 176 | root_path=videos_root, 177 | annotationfile_path=annotation_file, 178 | num_segments=5, 179 | frames_per_segment=1, 180 | imagefile_template='img_{:05d}.jpg', 181 | transform=preprocess, 182 | test_mode=False 183 | ) 184 | 185 | dataloader = torch.utils.data.DataLoader( 186 | dataset=dataset, 187 | batch_size=3, 188 | shuffle=True, 189 | num_workers=2, 190 | pin_memory=True 191 | ) 192 | 193 | print("\nMulti-Label Example") 194 | for epoch in range(10): 195 | for batch in dataloader: 196 | """ 197 | Insert Training Code Here 198 | """ 199 | video_batch, (labels1, labels2, labels3) = batch 200 | 201 | print("Video Batch Tensor Size:", video_batch.size()) 202 | print("Labels1 Size:", labels1.size()) # == batch_size 203 | print("Labels2 Size:", labels2.size()) # == batch_size 204 | print("Labels3 Size:", labels3.size()) # == batch_size 205 | 206 | break 207 | break 208 | -------------------------------------------------------------------------------- /Kinetics400/labels_to_id.csv: -------------------------------------------------------------------------------- 1 | id,name 2 | 0,abseiling 3 | 1,air drumming 4 | 2,answering questions 5 | 3,applauding 6 | 4,applying cream 7 | 5,archery 8 | 6,arm wrestling 9 | 7,arranging flowers 10 | 8,assembling computer 11 | 9,auctioning 12 | 10,baby waking up 13 | 11,baking cookies 14 | 12,balloon blowing 15 | 13,bandaging 16 | 14,barbequing 17 | 15,bartending 18 | 16,beatboxing 19 | 17,bee keeping 20 | 18,belly dancing 21 | 19,bench pressing 22 | 20,bending back 23 | 21,bending metal 24 | 22,biking through snow 25 | 23,blasting sand 26 | 24,blowing glass 27 | 25,blowing leaves 28 | 26,blowing nose 29 | 27,blowing out candles 30 | 28,bobsledding 31 | 29,bookbinding 32 | 30,bouncing on trampoline 33 | 31,bowling 34 | 32,braiding hair 35 | 33,breading or breadcrumbing 36 | 34,breakdancing 37 | 35,brush painting 38 | 36,brushing hair 39 | 37,brushing teeth 40 | 38,building cabinet 41 | 39,building shed 42 | 40,bungee jumping 43 | 41,busking 44 | 42,canoeing or kayaking 45 | 43,capoeira 46 | 44,carrying baby 47 | 45,cartwheeling 48 | 46,carving pumpkin 49 | 47,catching fish 50 | 48,catching or throwing baseball 51 | 49,catching or throwing frisbee 52 | 50,catching or throwing softball 53 | 51,celebrating 54 | 52,changing oil 55 | 53,changing wheel 56 | 54,checking tires 57 | 55,cheerleading 58 | 56,chopping wood 59 | 57,clapping 60 | 58,clay pottery making 61 | 59,clean and jerk 62 | 60,cleaning floor 63 | 61,cleaning gutters 64 | 62,cleaning pool 65 | 63,cleaning shoes 66 | 64,cleaning toilet 67 | 65,cleaning windows 68 | 66,climbing a rope 69 | 67,climbing ladder 70 | 68,climbing tree 71 | 69,contact juggling 72 | 70,cooking chicken 73 | 71,cooking egg 74 | 72,cooking on campfire 75 | 73,cooking sausages 76 | 74,counting money 77 | 75,country line dancing 78 | 76,cracking neck 79 | 77,crawling baby 80 | 78,crossing river 81 | 79,crying 82 | 80,curling hair 83 | 81,cutting nails 84 | 82,cutting pineapple 85 | 83,cutting watermelon 86 | 84,dancing ballet 87 | 85,dancing charleston 88 | 86,dancing gangnam style 89 | 87,dancing macarena 90 | 88,deadlifting 91 | 89,decorating the christmas tree 92 | 90,digging 93 | 91,dining 94 | 92,disc golfing 95 | 93,diving cliff 96 | 94,dodgeball 97 | 95,doing aerobics 98 | 96,doing laundry 99 | 97,doing nails 100 | 98,drawing 101 | 
99,dribbling basketball 102 | 100,drinking 103 | 101,drinking beer 104 | 102,drinking shots 105 | 103,driving car 106 | 104,driving tractor 107 | 105,drop kicking 108 | 106,drumming fingers 109 | 107,dunking basketball 110 | 108,dying hair 111 | 109,eating burger 112 | 110,eating cake 113 | 111,eating carrots 114 | 112,eating chips 115 | 113,eating doughnuts 116 | 114,eating hotdog 117 | 115,eating ice cream 118 | 116,eating spaghetti 119 | 117,eating watermelon 120 | 118,egg hunting 121 | 119,exercising arm 122 | 120,exercising with an exercise ball 123 | 121,extinguishing fire 124 | 122,faceplanting 125 | 123,feeding birds 126 | 124,feeding fish 127 | 125,feeding goats 128 | 126,filling eyebrows 129 | 127,finger snapping 130 | 128,fixing hair 131 | 129,flipping pancake 132 | 130,flying kite 133 | 131,folding clothes 134 | 132,folding napkins 135 | 133,folding paper 136 | 134,front raises 137 | 135,frying vegetables 138 | 136,garbage collecting 139 | 137,gargling 140 | 138,getting a haircut 141 | 139,getting a tattoo 142 | 140,giving or receiving award 143 | 141,golf chipping 144 | 142,golf driving 145 | 143,golf putting 146 | 144,grinding meat 147 | 145,grooming dog 148 | 146,grooming horse 149 | 147,gymnastics tumbling 150 | 148,hammer throw 151 | 149,headbanging 152 | 150,headbutting 153 | 151,high jump 154 | 152,high kick 155 | 153,hitting baseball 156 | 154,hockey stop 157 | 155,holding snake 158 | 156,hopscotch 159 | 157,hoverboarding 160 | 158,hugging 161 | 159,hula hooping 162 | 160,hurdling 163 | 161,hurling (sport) 164 | 162,ice climbing 165 | 163,ice fishing 166 | 164,ice skating 167 | 165,ironing 168 | 166,javelin throw 169 | 167,jetskiing 170 | 168,jogging 171 | 169,juggling balls 172 | 170,juggling fire 173 | 171,juggling soccer ball 174 | 172,jumping into pool 175 | 173,jumpstyle dancing 176 | 174,kicking field goal 177 | 175,kicking soccer ball 178 | 176,kissing 179 | 177,kitesurfing 180 | 178,knitting 181 | 179,krumping 182 | 180,laughing 183 | 181,laying bricks 184 | 182,long jump 185 | 183,lunge 186 | 184,making a cake 187 | 185,making a sandwich 188 | 186,making bed 189 | 187,making jewelry 190 | 188,making pizza 191 | 189,making snowman 192 | 190,making sushi 193 | 191,making tea 194 | 192,marching 195 | 193,massaging back 196 | 194,massaging feet 197 | 195,massaging legs 198 | 196,massaging person's head 199 | 197,milking cow 200 | 198,mopping floor 201 | 199,motorcycling 202 | 200,moving furniture 203 | 201,mowing lawn 204 | 202,news anchoring 205 | 203,opening bottle 206 | 204,opening present 207 | 205,paragliding 208 | 206,parasailing 209 | 207,parkour 210 | 208,passing American football (in game) 211 | 209,passing American football (not in game) 212 | 210,peeling apples 213 | 211,peeling potatoes 214 | 212,petting animal (not cat) 215 | 213,petting cat 216 | 214,picking fruit 217 | 215,planting trees 218 | 216,plastering 219 | 217,playing accordion 220 | 218,playing badminton 221 | 219,playing bagpipes 222 | 220,playing basketball 223 | 221,playing bass guitar 224 | 222,playing cards 225 | 223,playing cello 226 | 224,playing chess 227 | 225,playing clarinet 228 | 226,playing controller 229 | 227,playing cricket 230 | 228,playing cymbals 231 | 229,playing didgeridoo 232 | 230,playing drums 233 | 231,playing flute 234 | 232,playing guitar 235 | 233,playing harmonica 236 | 234,playing harp 237 | 235,playing ice hockey 238 | 236,playing keyboard 239 | 237,playing kickball 240 | 238,playing monopoly 241 | 239,playing organ 242 | 240,playing paintball 243 | 241,playing 
piano 244 | 242,playing poker 245 | 243,playing recorder 246 | 244,playing saxophone 247 | 245,playing squash or racquetball 248 | 246,playing tennis 249 | 247,playing trombone 250 | 248,playing trumpet 251 | 249,playing ukulele 252 | 250,playing violin 253 | 251,playing volleyball 254 | 252,playing xylophone 255 | 253,pole vault 256 | 254,presenting weather forecast 257 | 255,pull ups 258 | 256,pumping fist 259 | 257,pumping gas 260 | 258,punching bag 261 | 259,punching person (boxing) 262 | 260,push up 263 | 261,pushing car 264 | 262,pushing cart 265 | 263,pushing wheelchair 266 | 264,reading book 267 | 265,reading newspaper 268 | 266,recording music 269 | 267,riding a bike 270 | 268,riding camel 271 | 269,riding elephant 272 | 270,riding mechanical bull 273 | 271,riding mountain bike 274 | 272,riding mule 275 | 273,riding or walking with horse 276 | 274,riding scooter 277 | 275,riding unicycle 278 | 276,ripping paper 279 | 277,robot dancing 280 | 278,rock climbing 281 | 279,rock scissors paper 282 | 280,roller skating 283 | 281,running on treadmill 284 | 282,sailing 285 | 283,salsa dancing 286 | 284,sanding floor 287 | 285,scrambling eggs 288 | 286,scuba diving 289 | 287,setting table 290 | 288,shaking hands 291 | 289,shaking head 292 | 290,sharpening knives 293 | 291,sharpening pencil 294 | 292,shaving head 295 | 293,shaving legs 296 | 294,shearing sheep 297 | 295,shining shoes 298 | 296,shooting basketball 299 | 297,shooting goal (soccer) 300 | 298,shot put 301 | 299,shoveling snow 302 | 300,shredding paper 303 | 301,shuffling cards 304 | 302,side kick 305 | 303,sign language interpreting 306 | 304,singing 307 | 305,situp 308 | 306,skateboarding 309 | 307,ski jumping 310 | 308,skiing (not slalom or crosscountry) 311 | 309,skiing crosscountry 312 | 310,skiing slalom 313 | 311,skipping rope 314 | 312,skydiving 315 | 313,slacklining 316 | 314,slapping 317 | 315,sled dog racing 318 | 316,smoking 319 | 317,smoking hookah 320 | 318,snatch weight lifting 321 | 319,sneezing 322 | 320,sniffing 323 | 321,snorkeling 324 | 322,snowboarding 325 | 323,snowkiting 326 | 324,snowmobiling 327 | 325,somersaulting 328 | 326,spinning poi 329 | 327,spray painting 330 | 328,spraying 331 | 329,springboard diving 332 | 330,squat 333 | 331,sticking tongue out 334 | 332,stomping grapes 335 | 333,stretching arm 336 | 334,stretching leg 337 | 335,strumming guitar 338 | 336,surfing crowd 339 | 337,surfing water 340 | 338,sweeping floor 341 | 339,swimming backstroke 342 | 340,swimming breast stroke 343 | 341,swimming butterfly stroke 344 | 342,swing dancing 345 | 343,swinging legs 346 | 344,swinging on something 347 | 345,sword fighting 348 | 346,tai chi 349 | 347,taking a shower 350 | 348,tango dancing 351 | 349,tap dancing 352 | 350,tapping guitar 353 | 351,tapping pen 354 | 352,tasting beer 355 | 353,tasting food 356 | 354,testifying 357 | 355,texting 358 | 356,throwing axe 359 | 357,throwing ball 360 | 358,throwing discus 361 | 359,tickling 362 | 360,tobogganing 363 | 361,tossing coin 364 | 362,tossing salad 365 | 363,training dog 366 | 364,trapezing 367 | 365,trimming or shaving beard 368 | 366,trimming trees 369 | 367,triple jump 370 | 368,tying bow tie 371 | 369,tying knot (not on a tie) 372 | 370,tying tie 373 | 371,unboxing 374 | 372,unloading truck 375 | 373,using computer 376 | 374,using remote controller (not gaming) 377 | 375,using segway 378 | 376,vault 379 | 377,waiting in line 380 | 378,walking the dog 381 | 379,washing dishes 382 | 380,washing feet 383 | 381,washing hair 384 | 382,washing hands 385 
| 383,water skiing
386 | 384,water sliding
387 | 385,watering plants
388 | 386,waxing back
389 | 387,waxing chest
390 | 388,waxing eyebrows
391 | 389,waxing legs
392 | 390,weaving basket
393 | 391,welding
394 | 392,whistling
395 | 393,windsurfing
396 | 394,wrapping present
397 | 395,wrestling
398 | 396,writing
399 | 397,yawning
400 | 398,yoga
401 | 399,zumba
402 | 
--------------------------------------------------------------------------------
/docs/source/index.rst:
--------------------------------------------------------------------------------
1 | .. Video Dataset Loading PyTorch documentation master file, created by
2 |    sphinx-quickstart on Fri Nov 13 02:54:35 2020.
3 |    You can adapt this file completely to your liking, but it should at least
4 |    contain the root `toctree` directive.
5 | 
6 | Video Dataset Loading in PyTorch!
7 | ==================================
8 | 
9 | .. toctree::
10 |    :maxdepth: 2
11 |    :caption: Contents
12 | 
13 |    VideoDataset
14 |    Github Demo, Readme & Code <https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch>
15 |    README
16 | 
17 | 
18 | Efficient Video Dataset Loading, Preprocessing, and Augmentation
19 | ========================================================================
20 | To get the most up-to-date README, please visit `Github: Video Dataset Loading Pytorch <https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch>`__
21 | 
22 | Author: `Raivo Koot <https://github.com/RaivoKoot>`__
23 | 
24 | If you are completely unfamiliar with loading datasets in PyTorch using
25 | ``torch.utils.data.Dataset`` and ``torch.utils.data.DataLoader``, I
26 | recommend getting familiar with these first through
27 | `this `__
28 | or
29 | `this `__.
30 | 
31 | Overview: This example demonstrates the use of ``VideoFrameDataset``
32 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
33 | 
34 | The VideoFrameDataset class serves to ``easily``, ``efficiently`` and
35 | ``effectively`` load video samples from video datasets in PyTorch.
36 | 
37 | 1) Easily because this dataset class can be used with custom datasets with
38 |    minimum effort and no modification. The class merely expects the video
39 |    dataset to have a certain structure on disk and expects a .txt
40 |    annotation file that enumerates each video sample. Details on this can
41 |    be found below and at
42 |    ``https://video-dataset-loading-pytorch.readthedocs.io/``.
43 | 
44 | 2) Efficiently because the video loading pipeline that this class
45 |    implements is very fast. This minimizes GPU waiting time during training
46 |    by eliminating input bottlenecks that can slow down training
47 |    severalfold.
48 | 
49 | 3) Effectively because the implemented sampling strategy
50 |    for video frames is very strong. Video training using the entire
51 |    sequence of video frames (often several hundred) is too memory and
52 |    compute intense. Therefore, this implementation samples frames evenly
53 |    from the video (sparse temporal sampling) so that the loaded frames
54 |    represent every part of the video, with support for arbitrary and
55 |    differing video lengths within the same dataset. This approach has been
56 |    shown to be very effective and is taken from `"Temporal Segment Networks
57 |    (ECCV2016)" <https://arxiv.org/abs/1608.00859>`__ with modifications.
58 | 
59 | In conjunction with PyTorch's DataLoader, the VideoFrameDataset class
60 | returns video batch tensors of size
61 | ``BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH``.
62 | 
63 | For a demo, visit ``https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch``.
64 | 
65 | QuickDemo (demo.py)
66 | ~~~~~~~~~~~~~~~~~~~
67 | 
68 | ..
code:: python
69 | 
70 |     root = os.path.join(os.getcwd(), 'demo_dataset')  # Folder in which all videos lie in a specific structure
71 |     annotation_file = os.path.join(root, 'annotations.txt')  # A row for each video sample as: (VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX)
72 | 
73 |     """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
74 |     dataset = VideoFrameDataset(
75 |         root_path=root,
76 |         annotationfile_path=annotation_file,
77 |         num_segments=5,
78 |         frames_per_segment=1,
79 |         imagefile_template='img_{:05d}.jpg',
80 |         transform=None,
82 |         test_mode=False
83 |     )
84 | 
85 |     sample = dataset[0]  # take first sample of dataset
86 |     frames = sample[0]   # list of PIL images
87 |     label = sample[1]    # integer label
88 | 
89 |     for image in frames:
90 |         plt.imshow(image)
91 |         plt.title(label)
92 |         plt.show()
93 |         plt.pause(1)
94 | 
95 | Table of Contents
96 | =================
97 | 
98 | - `1. Requirements <#1-requirements>`__
99 | - `2. Custom Dataset <#2-custom-dataset>`__
100 | - `3. Video Frame Sampling Method <#3-video-frame-sampling-method>`__
101 | - `4. Using VideoFrameDataset for
102 |   Training <#4-using-videoframedataset-for-training>`__
103 | - `5. Conclusion <#5-conclusion>`__
104 | - `6. Acknowledgements <#6-acknowledgements>`__
105 | 
106 | 1. Requirements
107 | ~~~~~~~~~~~~~~~
108 | 
109 | ::
110 | 
111 |     # Without these three, VideoFrameDataset will not work.
112 |     torchvision >= 0.8.0
113 |     torch >= 1.7.0
114 |     python >= 3.6
115 | 
116 | 2. Custom Dataset
117 | ~~~~~~~~~~~~~~~~~
118 | 
119 | To use any dataset, two conditions must be met. 1) The video data must
120 | be supplied as RGB frames, each frame saved as an image file. Each video
121 | must have its own folder, in which the frames of that video lie. The
122 | frames of a video inside its folder must be named uniformly as
123 | ``img_00001.jpg`` ... ``img_00120.jpg``, if there are 120 frames. The
124 | filename template for frames is then "img\_{:05d}.jpg" (python string
125 | formatting, specifying 5 digits after the underscore), and must be
126 | supplied to the constructor of VideoFrameDataset as a parameter. Each
127 | video folder lies inside a ``root`` folder of this dataset. 2) To
128 | enumerate all video samples in the dataset and their required metadata,
129 | a ``.txt`` annotation file must be manually created that contains a row
130 | for each video sample in the dataset. The training, validation, and
131 | testing datasets must have separate annotation files. Each row must be a
132 | space-separated list that contains
133 | ``VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX``. The ``VIDEO_PATH`` of a video
134 | sample should be provided without the ``root`` prefix of this dataset.
135 | 
136 | This example project demonstrates this using a dummy dataset inside of
137 | ``demo_dataset/``, which is the ``root`` dataset folder of this example.
138 | The folder structure looks as follows:
139 | 
140 | ::
141 | 
142 |     demo_dataset
143 |     │
144 |     ├───annotations.txt
145 |     ├───jumping # arbitrary class folder naming
146 |     │   ├───0001 # arbitrary video folder naming
147 |     │   │   ├───img_00001.jpg
148 |     │   │   .
149 |     │   │   └───img_00017.jpg
150 |     │   └───0002
151 |     │       ├───img_00001.jpg
152 |     │       .
153 |     │       └───img_00018.jpg
154 |     │
155 |     └───running # arbitrary folder naming
156 |         ├───0001 # arbitrary video folder naming
157 |         │   ├───img_00001.jpg
158 |         │   .
159 |         │   └───img_00015.jpg
160 |         └───0002
161 |             ├───img_00001.jpg
162 |             .
163 |             └───img_00015.jpg
164 | 
165 | 
166 | 
167 | The accompanying annotation ``.txt`` file contains the following rows
168 | 
169 | ::
170 | 
171 |     jumping/0001 1 17 0
172 |     jumping/0002 1 18 0
173 |     running/0001 1 15 1
174 |     running/0002 1 15 1
175 | 
176 | Instantiating a VideoFrameDataset with the ``root_path`` parameter
177 | pointing to ``demo_dataset``, the ``annotationfile_path`` parameter
178 | pointing to the annotation file, and the ``imagefile_template``
179 | parameter as "img\_{:05d}.jpg", is all that it takes to start using the
180 | VideoFrameDataset class.
181 | 
182 | 3. Video Frame Sampling Method
183 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
184 | 
185 | When loading a video, only a number of its frames are loaded. They are
186 | chosen in the following way: 1. The frame indices [1,N] are divided into
187 | NUM\_SEGMENTS even segments. From each segment, FRAMES\_PER\_SEGMENT
188 | consecutive indices are chosen at random. This results in
189 | NUM\_SEGMENTS\*FRAMES\_PER\_SEGMENT chosen indices, whose frames are
190 | loaded as PIL images and put into a list and returned when calling
191 | ``dataset[i]``.
192 | 
193 | 4. Using VideoFrameDataset for training
194 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
195 | 
196 | As demonstrated in ``https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch/blob/main/demo.py``, we can use PyTorch's
197 | ``torch.utils.data.DataLoader`` class with VideoFrameDataset to take
198 | care of shuffling, batching, and more. To turn the lists of PIL images
199 | returned by VideoFrameDataset into tensors, the transform
200 | ``video_dataset.ImglistToTensor()`` can be supplied as the
201 | ``transform`` parameter to VideoFrameDataset. This turns a list of N PIL
202 | images into a batch of images/frames of shape
203 | ``N x CHANNELS x HEIGHT x WIDTH``. We can further chain preprocessing
204 | and augmentation functions that act on batches of images onto the end of
205 | ``ImglistToTensor()``.
206 | 
207 | As of ``torchvision 0.8.0``, all torchvision transforms can now also
208 | operate on batches of images, and they apply deterministic or random
209 | transformations on the batch identically on all images of the batch.
210 | Therefore, any torchvision transform can be used here to apply
211 | video-uniform preprocessing and augmentation.
212 | 
213 | 5. Conclusion
214 | ~~~~~~~~~~~~~
215 | 
216 | A proper code-based explanation on how to use VideoFrameDataset for
217 | training is provided in ``https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch/blob/main/demo.py``
218 | 
219 | 6. Acknowledgements
220 | ~~~~~~~~~~~~~~~~~~~
221 | 
222 | We thank the authors of TSN for their
223 | `codebase <https://github.com/yjxiong/tsn-pytorch>`__, from which we
224 | took VideoFrameDataset and adapted it.
225 | 
--------------------------------------------------------------------------------
/video_dataset.py:
--------------------------------------------------------------------------------
1 | import os
2 | import os.path
3 | import numpy as np
4 | from PIL import Image
5 | from torchvision import transforms
6 | import torch
7 | from typing import List, Union, Tuple, Any
8 | 
9 | 
10 | class VideoRecord(object):
11 |     """
12 |     Helper class for class VideoFrameDataset. This class
13 |     represents a video sample's metadata.
14 | 
15 |     Args:
16 |         root_datapath: the system path to the root folder
17 |                        of the videos.
18 | row: A list with four or more elements where 1) The first 19 | element is the path to the video sample's frames excluding 20 | the root_datapath prefix 2) The second element is the starting frame id of the video 21 | 3) The third element is the inclusive ending frame id of the video 22 | 4) The fourth element is the label index. 23 | 5) any following elements are labels in the case of multi-label classification 24 | """ 25 | def __init__(self, row, root_datapath): 26 | self._data = row 27 | self._path = os.path.join(root_datapath, row[0]) 28 | 29 | 30 | @property 31 | def path(self) -> str: 32 | return self._path 33 | 34 | @property 35 | def num_frames(self) -> int: 36 | return self.end_frame - self.start_frame + 1 # +1 because end frame is inclusive 37 | @property 38 | def start_frame(self) -> int: 39 | return int(self._data[1]) 40 | 41 | @property 42 | def end_frame(self) -> int: 43 | return int(self._data[2]) 44 | 45 | @property 46 | def label(self) -> Union[int, List[int]]: 47 | # just one label_id 48 | if len(self._data) == 4: 49 | return int(self._data[3]) 50 | # sample associated with multiple labels 51 | else: 52 | return [int(label_id) for label_id in self._data[3:]] 53 | 54 | class VideoFrameDataset(torch.utils.data.Dataset): 55 | r""" 56 | A highly efficient and adaptable dataset class for videos. 57 | Instead of loading every frame of a video, 58 | loads x RGB frames of a video (sparse temporal sampling) and evenly 59 | chooses those frames from start to end of the video, returning 60 | a list of x PIL images or ``FRAMES x CHANNELS x HEIGHT x WIDTH`` 61 | tensors where FRAMES=x if the ``ImglistToTensor()`` 62 | transform is used. 63 | 64 | More specifically, the frame range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS 65 | segments and FRAMES_PER_SEGMENT consecutive frames are taken from each segment. 66 | 67 | Note: 68 | A demonstration of using this class can be seen 69 | in ``demo.py`` 70 | https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch 71 | 72 | Note: 73 | This dataset broadly corresponds to the frame sampling technique 74 | introduced in ``Temporal Segment Networks`` at ECCV2016 75 | https://arxiv.org/abs/1608.00859. 76 | 77 | 78 | Note: 79 | This class relies on receiving video data in a structure where 80 | inside a ``ROOT_DATA`` folder, each video lies in its own folder, 81 | where each video folder contains the frames of the video as 82 | individual files with a naming convention such as 83 | img_001.jpg ... img_059.jpg. 84 | For enumeration and annotations, this class expects to receive 85 | the path to a .txt file where each video sample has a row with four 86 | (or more in the case of multi-label, see README on Github) 87 | space separated values: 88 | ``VIDEO_FOLDER_PATH START_FRAME END_FRAME LABEL_INDEX``. 89 | ``VIDEO_FOLDER_PATH`` is expected to be the path of a video folder 90 | excluding the ``ROOT_DATA`` prefix. For example, ``ROOT_DATA`` might 91 | be ``home\data\datasetxyz\videos\``, inside of which a ``VIDEO_FOLDER_PATH`` 92 | might be ``jumping\0052\`` or ``sample1\`` or ``00053\``. 93 | 94 | Args: 95 | root_path: The root path in which video folders lie. 96 | this is ROOT_DATA from the description above. 97 | annotationfile_path: The .txt annotation file containing 98 | one row per video sample as described above. 99 | num_segments: The number of segments the video should 100 | be divided into to sample frames from. 101 | frames_per_segment: The number of frames that should 102 | be loaded per segment. 
For each segment's 103 | frame-range, a random start index or the 104 | center is chosen, from which frames_per_segment 105 | consecutive frames are loaded. 106 | imagefile_template: The image filename template that video frame files 107 | have inside of their video folders as described above. 108 | transform: Transform pipeline that receives a list of PIL images/frames. 109 | test_mode: If True, frames are taken from the center of each 110 | segment, instead of a random location in each segment. 111 | 112 | """ 113 | def __init__(self, 114 | root_path: str, 115 | annotationfile_path: str, 116 | num_segments: int = 3, 117 | frames_per_segment: int = 1, 118 | imagefile_template: str='img_{:05d}.jpg', 119 | transform = None, 120 | test_mode: bool = False): 121 | super(VideoFrameDataset, self).__init__() 122 | 123 | self.root_path = root_path 124 | self.annotationfile_path = annotationfile_path 125 | self.num_segments = num_segments 126 | self.frames_per_segment = frames_per_segment 127 | self.imagefile_template = imagefile_template 128 | self.transform = transform 129 | self.test_mode = test_mode 130 | 131 | self._parse_annotationfile() 132 | self._sanity_check_samples() 133 | 134 | def _load_image(self, directory: str, idx: int) -> Image.Image: 135 | return Image.open(os.path.join(directory, self.imagefile_template.format(idx))).convert('RGB') 136 | 137 | def _parse_annotationfile(self): 138 | self.video_list = [VideoRecord(x.strip().split(), self.root_path) for x in open(self.annotationfile_path)] 139 | 140 | def _sanity_check_samples(self): 141 | for record in self.video_list: 142 | if record.num_frames <= 0 or record.start_frame == record.end_frame: 143 | print(f"\nDataset Warning: video {record.path} seems to have zero RGB frames on disk!\n") 144 | 145 | elif record.num_frames < (self.num_segments * self.frames_per_segment): 146 | print(f"\nDataset Warning: video {record.path} has {record.num_frames} frames " 147 | f"but the dataloader is set up to load " 148 | f"(num_segments={self.num_segments})*(frames_per_segment={self.frames_per_segment})" 149 | f"={self.num_segments * self.frames_per_segment} frames. Dataloader will throw an " 150 | f"error when trying to load this video.\n") 151 | 152 | def _get_start_indices(self, record: VideoRecord) -> 'np.ndarray[int]': 153 | """ 154 | For each segment, choose a start index from where frames 155 | are to be loaded from. 156 | 157 | Args: 158 | record: VideoRecord denoting a video sample. 159 | Returns: 160 | List of indices of where the frames of each 161 | segment are to be loaded from. 162 | """ 163 | # choose start indices that are perfectly evenly spread across the video frames. 164 | if self.test_mode: 165 | distance_between_indices = (record.num_frames - self.frames_per_segment + 1) / float(self.num_segments) 166 | 167 | start_indices = np.array([int(distance_between_indices / 2.0 + distance_between_indices * x) 168 | for x in range(self.num_segments)]) 169 | # randomly sample start indices that are approximately evenly spread across the video frames. 
170 |         else:
171 |             max_valid_start_index = (record.num_frames - self.frames_per_segment + 1) // self.num_segments
172 | 
173 |             start_indices = np.multiply(list(range(self.num_segments)), max_valid_start_index) + \
174 |                             np.random.randint(max_valid_start_index, size=self.num_segments)
175 | 
176 |         return start_indices
177 | 
178 |     def __getitem__(self, idx: int) -> Union[
179 |         Tuple[List[Image.Image], Union[int, List[int]]],
180 |         Tuple['torch.Tensor[num_frames, channels, height, width]', Union[int, List[int]]],
181 |         Tuple[Any, Union[int, List[int]]],
182 |     ]:
183 |         """
184 |         For video with id idx, loads self.NUM_SEGMENTS * self.FRAMES_PER_SEGMENT
185 |         frames from evenly chosen locations across the video.
186 | 
187 |         Args:
188 |             idx: Video sample index.
189 |         Returns:
190 |             A tuple of (video, label). Label is either a single
191 |             integer or a list of integers in the case of multiple labels.
192 |             Video is either 1) a list of PIL images if no transform is used,
193 |             2) a batch of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0,1]
194 |             if the transform "ImglistToTensor" is used,
195 |             3) or anything else if a custom transform is used.
196 |         """
197 |         record: VideoRecord = self.video_list[idx]
198 | 
199 |         frame_start_indices: 'np.ndarray[int]' = self._get_start_indices(record)
200 | 
201 |         return self._get(record, frame_start_indices)
202 | 
203 |     def _get(self, record: VideoRecord, frame_start_indices: 'np.ndarray[int]') -> Union[
204 |         Tuple[List[Image.Image], Union[int, List[int]]],
205 |         Tuple['torch.Tensor[num_frames, channels, height, width]', Union[int, List[int]]],
206 |         Tuple[Any, Union[int, List[int]]],
207 |     ]:
208 |         """
209 |         Loads the frames of a video at the corresponding
210 |         indices.
211 | 
212 |         Args:
213 |             record: VideoRecord denoting a video sample.
214 |             frame_start_indices: Indices from which to load consecutive frames.
215 |         Returns:
216 |             A tuple of (video, label). Label is either a single
217 |             integer or a list of integers in the case of multiple labels.
218 |             Video is either 1) a list of PIL images if no transform is used,
219 |             2) a batch of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0,1]
220 |             if the transform "ImglistToTensor" is used,
221 |             3) or anything else if a custom transform is used.
222 |         """
223 | 
224 |         frame_start_indices = frame_start_indices + record.start_frame
225 |         images = list()
226 | 
227 |         # from each start_index, load self.frames_per_segment
228 |         # consecutive frames
229 |         for start_index in frame_start_indices:
230 |             frame_index = int(start_index)
231 | 
232 |             # load self.frames_per_segment consecutive frames
233 |             for _ in range(self.frames_per_segment):
234 |                 image = self._load_image(record.path, frame_index)
235 |                 images.append(image)
236 | 
237 |                 if frame_index < record.end_frame:
238 |                     frame_index += 1
239 | 
240 |         if self.transform is not None:
241 |             images = self.transform(images)
242 | 
243 |         return images, record.label
244 | 
245 |     def __len__(self):
246 |         return len(self.video_list)
247 | 
248 | class ImglistToTensor(torch.nn.Module):
249 |     """
250 |     Converts a list of PIL images in the range [0,255] to a torch.FloatTensor
251 |     of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0,1].
252 |     Can be used as first transform for ``VideoFrameDataset``.
253 |     """
254 |     @staticmethod
255 |     def forward(img_list: List[Image.Image]) -> 'torch.Tensor[NUM_IMAGES, CHANNELS, HEIGHT, WIDTH]':
256 |         """
257 |         Converts each PIL image in a list to
258 |         a torch Tensor and stacks them into
259 |         a single tensor.
260 | 
261 |         Args:
262 |             img_list: list of PIL images.
263 |         Returns:
264 |             tensor of size ``NUM_IMAGES x CHANNELS x HEIGHT x WIDTH``
265 |         """
266 |         return torch.stack([transforms.functional.to_tensor(pic) for pic in img_list])
267 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Efficient Video Dataset Loading and Augmentation in PyTorch
2 | Author: [Raivo Koot](https://github.com/RaivoKoot)
3 | https://video-dataset-loading-pytorch.readthedocs.io/en/latest/VideoDataset.html
4 | If you find the code useful, please star the repository.
5 | 
6 | If you are completely unfamiliar with loading datasets in PyTorch using `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`, I recommend
7 | getting familiar with these first through [this](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) or
8 | [this](https://github.com/utkuozbulak/pytorch-custom-dataset-examples).
9 | 
10 | ### In a Nutshell
11 | Video-Dataset-Loading-Pytorch provides the lowest entry barrier for setting up deep learning training loops on video data. It makes working with video datasets easy and accessible (also efficient!). It only requires you to have your video dataset in a certain format on disk and takes care of the rest. It has no complicated dependencies and supports native Torchvision video data augmentation.
12 | 
13 | ### Overview: This small library solely provides the class `VideoFrameDataset`
14 | The VideoFrameDataset class (an implementation of `torch.utils.data.Dataset`) serves to `easily`, `efficiently` and `effectively` load video samples from video datasets in PyTorch.
15 | 1) Easily because this dataset class can be used with custom datasets with minimal effort and no modification. The class merely expects the
16 | video dataset to have a certain structure on disk and expects a .txt annotation file that enumerates each video sample. Details on this
17 | can be found below. Pre-made annotation files and preparation scripts are also provided for [Kinetics 400](https://github.com/cvdfoundation/kinetics-dataset), [Something Something V2](https://20bn.com/datasets/something-something) and [Epic Kitchens 100](https://epic-kitchens.github.io/2021).
18 | 2) Efficiently because the video loading pipeline that this class implements is very fast. This minimizes GPU waiting time during training by eliminating CPU input bottlenecks that can slow training down several-fold.
19 | 3) Effectively because the implemented sampling strategy for video frames is very representative. Video training using the entire sequence of
20 | video frames (often several hundred) is too memory- and compute-intensive. Therefore, this implementation samples frames evenly from the video (sparse temporal sampling)
21 | so that the loaded frames represent every part of the video, with support for arbitrary and differing video lengths within the same dataset.
22 | This approach has been shown to be very effective and is taken from
23 | ["Temporal Segment Networks (ECCV2016)"](https://arxiv.org/abs/1608.00859) with modifications.
24 | 
25 | In conjunction with PyTorch's DataLoader, the VideoFrameDataset class returns video batch tensors of size `BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH`.
26 | 
27 | For a demo, visit `demo.py`.
28 | 
29 | ### QuickDemo (demo.py)
30 | ```python
31 | root = os.path.join(os.getcwd(), 'demo_dataset')  # Folder in which all videos lie in a specific structure
32 | annotation_file = os.path.join(root, 'annotations.txt')  # A row for each video sample as: (VIDEO_PATH START_FRAME END_FRAME CLASS_ID)
33 | 
34 | """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
35 | dataset = VideoFrameDataset(
36 |     root_path=root,
37 |     annotationfile_path=annotation_file,
38 |     num_segments=5,
39 |     frames_per_segment=1,
40 |     imagefile_template='img_{:05d}.jpg',
41 |     transform=None,
42 |     test_mode=False
43 | )
44 | 
45 | sample = dataset[0]  # take first sample of dataset
46 | frames = sample[0]   # list of PIL images
47 | label = sample[1]    # integer label
48 | 
49 | for image in frames:
50 |     plt.imshow(image)
51 |     plt.title(label)
52 |     plt.show()
53 |     plt.pause(1)
54 | ```
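The snippet above is an excerpt from `demo.py` and omits its imports. A minimal set of imports that makes it runnable on its own (assuming `video_dataset.py` from this repository is in your working directory) would be:
```python
import os
import matplotlib.pyplot as plt
from video_dataset import VideoFrameDataset
```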
55 | ![alt text](https://github.com/RaivoKoot/images/blob/main/Action_Video.jpg "Action Video")
56 | # Table of Contents
57 | - [1. Requirements](#1-requirements)
58 | - [2. Custom Dataset](#2-custom-dataset)
59 | - [3. Video Frame Sampling Method](#3-video-frame-sampling-method)
60 | - [4. Alternate Video Frame Sampling Methods](#4-alternate-video-frame-sampling-methods)
61 | - [5. Using VideoFrameDataset for Training](#5-using-videoframedataset-for-training)
62 | - [6. Allowing Multiple Labels per Sample](#6-allowing-multiple-labels-per-sample)
63 | - [7. Conclusion](#7-conclusion)
64 | - [8. Kinetics 400 & Something Something V2 & EPIC-KITCHENS-100](#8-kinetics-400--something-something-v2--epic-kitchens-100)
65 | - [9. Upcoming Features](#9-upcoming-features)
66 | - [10. Acknowledgements](#10-acknowledgements)
67 | 
68 | ### 1. Requirements
69 | ```
70 | # Without these three, VideoFrameDataset will not work.
71 | torchvision >= 0.8.0
72 | torch >= 1.7.0
73 | python >= 3.6
74 | ```
75 | ### 2. Custom Dataset
76 | (This description explains using custom datasets where each sample has a single class label. If you want to know how to
77 | use a dataset where a sample can have more than a single class label, read this anyway and then read `6.` below.)
78 | 
79 | To use any dataset, two conditions must be met.
80 | 1) The video data must be supplied as RGB frames, each frame saved as an image file. Each video must have its own folder, in which the frames of
81 | that video lie. The frames of a video inside its folder must be named uniformly with consecutive indices such as `img_00001.jpg` ... `img_00120.jpg`, if there are 120 frames.
82 | Indices can start at zero or any other number and the exact file name template can be chosen freely. The filename template
83 | for frames in this example is "img_{:05d}.jpg" (python string formatting, specifying 5 digits after the underscore), and must be supplied to the
84 | constructor of VideoFrameDataset as a parameter. Each video folder must lie inside some `root` folder.
85 | 2) To enumerate all video samples in the dataset and their required metadata, a `.txt` annotation file must be created that contains a row for each
86 | video clip sample in the dataset. The training, validation, and testing datasets must have separate annotation files. Each row must be a space-separated list that contains
87 | `VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX`. The `VIDEO_PATH` of a video sample should be provided without the `root` prefix of this dataset. A small script, like the sketch below, can generate this file for you.
88 | 
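Writing the annotation file by hand quickly gets tedious for real datasets. The following is a minimal sketch (not part of this repository) that generates `annotations.txt` for the class-folder layout shown below; it assumes frames are numbered consecutively starting at 1 and that alphabetical class-folder order defines the label ids:
```python
# Sketch: auto-generate annotations.txt for a root/class_name/video_id/ layout.
import os

root = 'demo_dataset'
class_names = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))

with open(os.path.join(root, 'annotations.txt'), 'w') as f:
    for label_id, class_name in enumerate(class_names):
        class_dir = os.path.join(root, class_name)
        for video_id in sorted(os.listdir(class_dir)):
            # END_FRAME is inclusive, so with frames starting at 1 it equals the frame count
            num_frames = len(os.listdir(os.path.join(class_dir, video_id)))
            f.write(f'{class_name}/{video_id} 1 {num_frames} {label_id}\n')
```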
89 | This example project demonstrates this using a dummy dataset inside of `demo_dataset/`, which is the `root` dataset folder of this example. The folder
90 | structure looks as follows:
91 | ```
92 | demo_dataset
93 | │
94 | ├───annotations.txt
95 | ├───jumping        # arbitrary class folder naming
96 | │   ├───0001       # arbitrary video folder naming
97 | │   │   ├───img_00001.jpg
98 | │   │   .
99 | │   │   └───img_00017.jpg
100 | │   └───0002
101 | │       ├───img_00001.jpg
102 | │       .
103 | │       └───img_00018.jpg
104 | │
105 | └───running        # arbitrary folder naming
106 |     ├───0001       # arbitrary video folder naming
107 |     │   ├───img_00001.jpg
108 |     │   .
109 |     │   └───img_00015.jpg
110 |     └───0002
111 |         ├───img_00001.jpg
112 |         .
113 |         └───img_00015.jpg
114 | 
115 | 
116 | ```
117 | The accompanying annotation `.txt` file contains the following rows (PATH, START_FRAME, END_FRAME, LABEL_ID):
118 | ```
119 | jumping/0001 1 17 0
120 | jumping/0002 1 18 0
121 | running/0001 1 15 1
122 | running/0002 1 15 1
123 | ```
124 | Another annotation file that uses multiple clips from each video could be
125 | ```
126 | jumping/0001 1 8 0
127 | jumping/0001 5 17 0
128 | jumping/0002 1 18 0
129 | running/0001 10 15 1
130 | running/0001 5 10 1
131 | running/0002 1 15 1
132 | ```
133 | (END_FRAME is inclusive.)
134 | 
135 | Another, simpler example of the way your dataset's RGB frames can be organized on disk is the following:
136 | ```
137 | demo_dataset
138 | │
139 | ├───annotations.txt
140 | └───rgb
141 |     ├───video_1
142 |     │   ├───img_00001.jpg
143 |     │   .
144 |     │   └───img_00017.jpg
145 |     ├───video_2
146 |     │   ├───img_00001.jpg
147 |     │   .
148 |     │   └───img_00044.jpg
149 |     └───video_3
150 |         ├───img_00001.jpg
151 |         .
152 |         └───img_00023.jpg
153 | 
154 | 
155 | ```
156 | The accompanying annotation `.txt` file contains the following rows (PATH, START_FRAME, END_FRAME, LABEL_ID):
157 | ```
158 | video_1 1 17 1
159 | video_2 1 44 0
160 | video_3 1 23 0
161 | ```
162 | 
163 | Instantiating a VideoFrameDataset with the `root_path` parameter pointing to `demo_dataset/rgb/`, the `annotationfile_path` parameter pointing to the annotation file `demo_dataset/annotations.txt`, and
164 | the `imagefile_template` parameter as "img_{:05d}.jpg", is all that it takes to start using the VideoFrameDataset class.
165 | 
166 | ### 3. Video Frame Sampling Method
167 | When loading a video, only a number of its frames are loaded. They are chosen in the following way:
168 | 1. The frame index range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS even segments. From each segment, a random start index is sampled, from which FRAMES_PER_SEGMENT consecutive indices are loaded.
169 | This results in NUM_SEGMENTS*FRAMES_PER_SEGMENT chosen indices, whose frames are loaded as PIL images and put into a list and returned when calling
170 | `dataset[i]`.
171 | ![alt text](https://github.com/RaivoKoot/images/blob/main/Sparse_Temporal_Sampling.jpg "Sparse-Temporal-Sampling-Strategy")
172 | 
173 | ### 4. Alternate Video Frame Sampling Methods
174 | If you do not want to use sparse temporal sampling and instead want to sample a single N-frame continuous
175 | clip from a video, this is possible. Set `NUM_SEGMENTS=1` and `FRAMES_PER_SEGMENT=N`. Because VideoFrameDataset
176 | will choose a random start index per segment and take `FRAMES_PER_SEGMENT` continuous frames from each sampled start
177 | index, this will result in a single N-frame continuous clip per video that starts at a random index.
178 | An example of this is in `demo.py`, and a short sketch follows below.
179 | 
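For instance, a minimal sketch (reusing the `root` and `annotation_file` paths from the QuickDemo above) that samples one continuous 9-frame clip per video:
```python
N = 9  # length of the continuous clip

dataset = VideoFrameDataset(
    root_path=root,
    annotationfile_path=annotation_file,
    num_segments=1,          # a single segment spanning the whole video
    frames_per_segment=N,    # N consecutive frames from one random start index
    imagefile_template='img_{:05d}.jpg',
    test_mode=False,         # True would always take the center clip instead
)
```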
180 | ### 5. Using VideoFrameDataset for Training
181 | As demonstrated in `demo.py`, we can use PyTorch's `torch.utils.data.DataLoader` class with VideoFrameDataset to take care of shuffling, batching, and more.
182 | To turn the lists of PIL images returned by VideoFrameDataset into tensors, the transform `video_dataset.ImglistToTensor()` can be supplied
183 | as the `transform` parameter to VideoFrameDataset. This turns a list of N PIL images into a batch of images/frames of shape `N x CHANNELS x HEIGHT x WIDTH`.
184 | We can further chain preprocessing and augmentation functions that act on batches of images onto the end of `ImglistToTensor()`, as seen in `demo.py`.
185 | 
186 | As of `torchvision 0.8.0`, all torchvision transforms can also operate on batches of images, and they apply deterministic or random transformations
187 | identically to all images of the batch. Because a single video tensor (FRAMES x CHANNELS x HEIGHT x WIDTH)
188 | has the same shape as an image batch tensor (BATCH x CHANNELS x HEIGHT x WIDTH), any torchvision transform can be used here to apply video-uniform preprocessing and augmentation.
189 | 
190 | REMEMBER:
191 | PyTorch transforms are applied to individual dataset samples (in this case a list of PIL images of a video, or a video-frame tensor after `ImglistToTensor()`) before
192 | batching. So, any transforms used here must expect their input to be a frame tensor of shape `FRAMES x CHANNELS x HEIGHT x WIDTH`, or a list of PIL images if `ImglistToTensor()` is not used.
193 | 
194 | ### 6. Allowing Multiple Labels per Sample
195 | Your dataset labels might be more complicated than just a single label id per sample. For example, in the EPIC-KITCHENS dataset
196 | each video clip has a verb class, noun class, and action class. In this case, each sample is associated with three label ids.
197 | To accommodate datasets where a sample can have N integer labels, `annotations.txt` files can be used where each row
198 | is space-separated `PATH, FRAME_START, FRAME_END, LABEL_1_ID, ..., LABEL_N_ID`, instead of
199 | `PATH, FRAME_START, FRAME_END, LABEL_ID`. The VideoFrameDataset class
200 | can handle this type of annotation file too, without changing anything apart from the rows in your `annotations.txt`.
201 | 
202 | The `annotations.txt` file for a dataset where multiple clip samples can come from the same video and each sample has
203 | three labels would have rows like `PATH, START_FRAME, END_FRAME, LABEL1, LABEL2, LABEL3`, as seen below:
204 | ```
205 | jumping/0001 1 8 0 2 1
206 | jumping/0001 5 17 0 10 3
207 | jumping/0002 1 18 0 5 3
208 | running/0001 10 15 1 3 3
209 | running/0001 5 10 1 1 0
210 | running/0002 1 15 1 12 4
211 | ```
212 | 
213 | When you use `torch.utils.data.DataLoader` with VideoFrameDataset to retrieve your batches during
214 | training, the dataloader no longer returns batches as a `( (BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH), (BATCH) )` tuple, where the second item is
215 | just a list/tensor of the batch's labels. Instead, the second item is replaced with the tuple
216 | `( (BATCH) ... (BATCH) )`, where the first BATCH-sized list gives label_1 for the whole batch, and the last BATCH-sized
217 | list gives label_n for the whole batch, as sketched below.
218 | 
219 | A demo of this can be found at the end of `demo.py`. It uses the dummy dataset in directory `demo_dataset_multilabel`.
220 | 
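Concretely, here is a minimal sketch (assuming the three-label annotation rows above and the default sampling parameters) of iterating over such batches:
```python
import torch
from video_dataset import VideoFrameDataset, ImglistToTensor

dataset = VideoFrameDataset(
    root_path='demo_dataset_multilabel',
    annotationfile_path='demo_dataset_multilabel/annotations.txt',
    transform=ImglistToTensor(),  # so default collation can stack the videos
)
loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)

for video_batch, (labels1, labels2, labels3) in loader:
    # video_batch: BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH
    # labels1, labels2, labels3: one BATCH-sized tensor per label position
    print(video_batch.shape, labels1, labels2, labels3)
    break
```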
221 | ### 7. Conclusion
222 | A proper code-based explanation of how to use VideoFrameDataset for training is provided in `demo.py`.
223 | 
224 | ### 8. Kinetics 400 & Something Something V2 & EPIC-KITCHENS-100
225 | After you have read Sections 1 to 7, you can use the pre-made conversion scripts and annotation files in this repository to get instantly started with the Kinetics 400, Something Something V2, and EPIC-KITCHENS-100 datasets. To get started with any of them, read the README inside the `Kinetics400`, `EpicKitchens100` or `SomethingSomethingV2` directory.
226 | 
227 | ### 9. Upcoming Features
228 | - [x] Include compatible annotation files for common datasets, such as Something-Something-V2, EPIC-KITCHENS-100 and Kinetics, so that users do not need to spend their own time converting those datasets' annotation files to be compatible with this repository.
229 | - [x] Add demo for sampling a single continuous-frame clip from videos.
230 | - [x] Add support for arbitrary labels that are more than just a single integer.
231 | - [x] Add support for specifying START_FRAME and END_FRAME for a video instead of NUM_FRAMES.
232 | - [x] Improve the handling of edge cases where NUM_SEGMENTS*FRAMES_PER_SEGMENT (or similar) might be larger than the number of frames in a video (a warning message is printed now).
233 | - [x] Clean up some of the internal code that is still very messy, which was taken from the codebase below.
234 | - [ ] Create a version of this implementation that uses OpenCV instead of PIL for frame loading, so that you can use Albumentations transforms instead of Torchvision transforms.
235 | 
236 | ### 10. Acknowledgements
237 | We thank the authors of TSN for their [codebase](https://github.com/yjxiong/tsn-pytorch), from which we took VideoFrameDataset and adapted it
238 | for general use and compatibility.
239 | ```
240 | @InProceedings{wang2016_TemporalSegmentNetworks,
241 |     title={Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
242 |     author={Limin Wang and Yuanjun Xiong and Zhe Wang and Yu Qiao and Dahua Lin and
243 |             Xiaoou Tang and Luc {Van Gool}},
244 |     booktitle={The European Conference on Computer Vision (ECCV)},
245 |     year={2016}
246 | }
247 | ```
248 | 
--------------------------------------------------------------------------------