├── demo_dataset
│   ├── annotations.txt
│   ├── jumping
│   │   ├── 0001
│   │   │   ├── img_00001.jpg
│   │   │   ├── img_00002.jpg
│   │   │   ├── img_00003.jpg
│   │   │   ├── img_00004.jpg
│   │   │   ├── img_00005.jpg
│   │   │   ├── img_00006.jpg
│   │   │   ├── img_00007.jpg
│   │   │   ├── img_00008.jpg
│   │   │   ├── img_00009.jpg
│   │   │   ├── img_00010.jpg
│   │   │   ├── img_00011.jpg
│   │   │   ├── img_00012.jpg
│   │   │   ├── img_00013.jpg
│   │   │   ├── img_00014.jpg
│   │   │   ├── img_00015.jpg
│   │   │   ├── img_00016.jpg
│   │   │   └── img_00017.jpg
│   │   └── 0002
│   │       ├── img_00001.jpg
│   │       ├── img_00002.jpg
│   │       ├── img_00003.jpg
│   │       ├── img_00004.jpg
│   │       ├── img_00005.jpg
│   │       ├── img_00006.jpg
│   │       ├── img_00007.jpg
│   │       ├── img_00008.jpg
│   │       ├── img_00009.jpg
│   │       ├── img_00010.jpg
│   │       ├── img_00011.jpg
│   │       ├── img_00012.jpg
│   │       ├── img_00013.jpg
│   │       ├── img_00014.jpg
│   │       ├── img_00015.jpg
│   │       ├── img_00016.jpg
│   │       ├── img_00017.jpg
│   │       └── img_00018.jpg
│   └── running
│       ├── 0001
│       │   ├── img_00001.jpg
│       │   ├── img_00002.jpg
│       │   ├── img_00003.jpg
│       │   ├── img_00004.jpg
│       │   ├── img_00005.jpg
│       │   ├── img_00006.jpg
│       │   ├── img_00007.jpg
│       │   ├── img_00008.jpg
│       │   ├── img_00009.jpg
│       │   ├── img_00010.jpg
│       │   ├── img_00011.jpg
│       │   ├── img_00012.jpg
│       │   ├── img_00013.jpg
│       │   ├── img_00014.jpg
│       │   └── img_00015.jpg
│       └── 0002
│           ├── img_00001.jpg
│           ├── img_00002.jpg
│           ├── img_00003.jpg
│           ├── img_00004.jpg
│           ├── img_00005.jpg
│           ├── img_00006.jpg
│           ├── img_00007.jpg
│           ├── img_00008.jpg
│           ├── img_00009.jpg
│           ├── img_00010.jpg
│           ├── img_00011.jpg
│           ├── img_00012.jpg
│           ├── img_00013.jpg
│           ├── img_00014.jpg
│           └── img_00015.jpg
├── docs
│   ├── source
│   │   ├── modules.rst
│   │   ├── demo.rst
│   │   ├── VideoDataset.rst
│   │   ├── conf.py
│   │   ├── README.rst
│   │   └── index.rst
│   ├── Makefile
│   └── make.bat
├── demo_dataset_multilabel
│   ├── annotations.txt
│   ├── jumping
│   │   ├── 0001
│   │   │   ├── img_00001.jpg
│   │   │   ├── img_00002.jpg
│   │   │   ├── img_00003.jpg
│   │   │   ├── img_00004.jpg
│   │   │   ├── img_00005.jpg
│   │   │   ├── img_00006.jpg
│   │   │   ├── img_00007.jpg
│   │   │   ├── img_00008.jpg
│   │   │   ├── img_00009.jpg
│   │   │   ├── img_00010.jpg
│   │   │   ├── img_00011.jpg
│   │   │   ├── img_00012.jpg
│   │   │   ├── img_00013.jpg
│   │   │   ├── img_00014.jpg
│   │   │   ├── img_00015.jpg
│   │   │   ├── img_00016.jpg
│   │   │   └── img_00017.jpg
│   │   └── 0002
│   │       ├── img_00001.jpg
│   │       ├── img_00002.jpg
│   │       ├── img_00003.jpg
│   │       ├── img_00004.jpg
│   │       ├── img_00005.jpg
│   │       ├── img_00006.jpg
│   │       ├── img_00007.jpg
│   │       ├── img_00008.jpg
│   │       ├── img_00009.jpg
│   │       ├── img_00010.jpg
│   │       ├── img_00011.jpg
│   │       ├── img_00012.jpg
│   │       ├── img_00013.jpg
│   │       ├── img_00014.jpg
│   │       ├── img_00015.jpg
│   │       ├── img_00016.jpg
│   │       ├── img_00017.jpg
│   │       └── img_00018.jpg
│   └── running
│       ├── 0001
│       │   ├── img_00001.jpg
│       │   ├── img_00002.jpg
│       │   ├── img_00003.jpg
│       │   ├── img_00004.jpg
│       │   ├── img_00005.jpg
│       │   ├── img_00006.jpg
│       │   ├── img_00007.jpg
│       │   ├── img_00008.jpg
│       │   ├── img_00009.jpg
│       │   ├── img_00010.jpg
│       │   ├── img_00011.jpg
│       │   ├── img_00012.jpg
│       │   ├── img_00013.jpg
│       │   ├── img_00014.jpg
│       │   └── img_00015.jpg
│       └── 0002
│           ├── img_00001.jpg
│           ├── img_00002.jpg
│           ├── img_00003.jpg
│           ├── img_00004.jpg
│           ├── img_00005.jpg
│           ├── img_00006.jpg
│           ├── img_00007.jpg
│           ├── img_00008.jpg
│           ├── img_00009.jpg
│           ├── img_00010.jpg
│           ├── img_00011.jpg
│           ├── img_00012.jpg
│           ├── img_00013.jpg
│           ├── img_00014.jpg
│           └── img_00015.jpg
├── requirements.txt
├── EpicKitchens100
│   ├── original_annotations_to_processed_annotations.py
│   └── README.md
├── LICENSE
├── Kinetics400
│   ├── process_annotation_file.py
│   ├── README.md
│   ├── videos_to_frames.py
│   └── labels_to_id.csv
├── SomethingSomethingV2
│   ├── README.md
│   ├── original_annotations_to_processed_annotations.py
│   └── videos_to_frames.py
├── demo.py
├── video_dataset.py
└── README.md
/demo_dataset/annotations.txt:
--------------------------------------------------------------------------------
jumping/0001 1 17 0
jumping/0002 1 18 0
running/0001 1 15 1
running/0002 1 15 1
--------------------------------------------------------------------------------
/docs/source/modules.rst:
--------------------------------------------------------------------------------
DataLoading
===========

.. toctree::
   :maxdepth: 4

   demo
   video_dataset
--------------------------------------------------------------------------------
/demo_dataset_multilabel/annotations.txt:
--------------------------------------------------------------------------------
jumping/0001 1 17 0 2 4
jumping/0002 1 18 0 1 3
running/0001 1 15 1 1 2
running/0002 1 15 1 3 3
--------------------------------------------------------------------------------
/docs/source/demo.rst:
--------------------------------------------------------------------------------
demo module
===========

.. automodule:: demo
   :members:
   :undoc-members:
   :show-inheritance:
--------------------------------------------------------------------------------
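Each row in the annotation files above has the form PATH START_FRAME END_FRAME LABEL [LABEL_2 ... LABEL_N], where the frame indices are inclusive and the multilabel variant simply appends extra integer labels per row. A minimal parsing sketch for illustration — the names below are ours, not the library's (the repository's own loader handles this internally via its VideoRecord class):

import os
from typing import List, NamedTuple

class AnnotationRow(NamedTuple):
    path: str          # video folder relative to the dataset root, e.g. 'jumping/0001'
    start_frame: int   # index of the clip's first frame
    end_frame: int     # index of the clip's last frame (inclusive)
    labels: List[int]  # one or more integer class labels

def parse_annotations(annotation_file: str) -> List[AnnotationRow]:
    rows = []
    with open(annotation_file) as f:
        for line in f:
            if not line.strip():
                continue
            path, start, end, *labels = line.split()
            rows.append(AnnotationRow(path, int(start), int(end),
                                      [int(x) for x in labels]))
    return rows

# parse_annotations('demo_dataset/annotations.txt')[0]
# -> AnnotationRow(path='jumping/0001', start_frame=1, end_frame=17, labels=[0])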
/demo_dataset/jumping/**, /demo_dataset/running/**:
--------------------------------------------------------------------------------
[65 binary JPEG frame files, listed individually in the directory tree above; each
resolves to https://raw.githubusercontent.com/RaivoKoot/Video-Dataset-Loading-Pytorch/HEAD/demo_dataset/<class>/<clip>/img_XXXXX.jpg]
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# python >= 3.6
torch >= 1.7.0
torchvision >= 0.8.0

# for demo
matplotlib

# for author
sphinx == 3.3.1
--------------------------------------------------------------------------------
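With these requirements installed, the demo dataset above can be loaded end to end, along the lines of demo.py. A minimal sketch; the constructor parameters shown (root_path, annotationfile_path, num_segments, frames_per_segment, imagefile_template, transform, test_mode) are assumptions based on the library's documented API — check video_dataset.py for the exact signature:

import os
from video_dataset import VideoFrameDataset

demo_root = os.path.join(os.getcwd(), 'demo_dataset')

dataset = VideoFrameDataset(
    root_path=demo_root,
    annotationfile_path=os.path.join(demo_root, 'annotations.txt'),
    num_segments=5,                       # split each video into 5 segments
    frames_per_segment=1,                 # sample 1 frame from each segment
    imagefile_template='img_{:05d}.jpg',  # matches img_00001.jpg etc.
    transform=None,                       # return frames as PIL images
    test_mode=False)

frames, label = dataset[0]  # frames: list of 5 PIL images; label: 0 ('jumping')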
/demo_dataset_multilabel/jumping/**, /demo_dataset_multilabel/running/**:
--------------------------------------------------------------------------------
[65 binary JPEG frame files mirroring demo_dataset, listed individually in the
directory tree above; each resolves to https://raw.githubusercontent.com/RaivoKoot/Video-Dataset-Loading-Pytorch/HEAD/demo_dataset_multilabel/<class>/<clip>/img_XXXXX.jpg]
--------------------------------------------------------------------------------
/docs/source/VideoDataset.rst:
--------------------------------------------------------------------------------
VideoDataset module
===================

.. automodule:: video_dataset
   :members:
   :exclude-members: VideoRecord
   :show-inheritance:
--------------------------------------------------------------------------------
/docs/Makefile:
--------------------------------------------------------------------------------
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
# After the requested build, also run the coverage builder. (Note: this must be
# a separate "-M coverage" invocation; combining "-M $@" with "-b coverage" in
# one call is not valid sphinx-build usage.)
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
	@$(SPHINXBUILD) -M coverage "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--------------------------------------------------------------------------------
/docs/make.bat:
--------------------------------------------------------------------------------
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
--------------------------------------------------------------------------------
/EpicKitchens100/original_annotations_to_processed_annotations.py:
--------------------------------------------------------------------------------
import os
import pandas as pd

"""
This script converts the original EPIC-KITCHENS-100 annotation file
into an annotations.txt file that is compatible with this repository's
dataloader VideoFrameDataset.

Modify the two filepaths below and then run this script. Run it once for
the training annotations and once for the validation annotations.
"""

if __name__ == '__main__':
    # filepath to where you have stored the original annotation file
    annotation_file = os.path.join(os.path.expanduser('~'), 'homedata', 'EPICKITCHENS', 'annotations', 'EPIC_100_train_subset.csv')

    # the output path and file name that you want to use
    out_file = os.path.join(os.path.expanduser('~'), 'data', 'EPICKITCHENS', 'annotations', 'EPIC_100_validation_new.txt')

    data = pd.read_csv(annotation_file, header=0)

    # open in 'w' mode so that re-running the script does not append duplicate rows
    with open(out_file, 'w') as file:
        for index, row in data.iterrows():
            path = os.path.join(row['participant_id'], 'rgb_frames', row['video_id'])
            start_frame = row['start_frame']
            last_frame = row['stop_frame']
            verb_class = row['verb_class']
            noun_class = row['noun_class']

            file.write(f"{path} {start_frame} {last_frame} {verb_class} {noun_class}\n")
--------------------------------------------------------------------------------
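To make the mapping concrete, a hypothetical input row (all values made up) with participant_id=P01, video_id=P01_01, start_frame=8, stop_frame=202, verb_class=5 and noun_class=10 would be written out as:

    P01/rgb_frames/P01_01 8 202 5 10

i.e. PATH START_FRAME END_FRAME VERB_CLASS NOUN_CLASS — a two-label row in the annotations.txt format shown earlier.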
/LICENSE:
--------------------------------------------------------------------------------
BSD 2-Clause License

Copyright (c) 2020, Raivo Eli Koot
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/EpicKitchens100/README.md:
--------------------------------------------------------------------------------
# Using Epic Kitchens 100
This directory contains pre-made annotation files to use the [Epic Kitchens 100](https://epic-kitchens.github.io/2021) dataset with this
repository's VideoFrameDataset dataloader. The two `.txt` files in this directory are the training and validation annotation
files that you can use for Epic Kitchens 100 with VideoFrameDataset. That's it! Reading `1. Dataset Overview` below can also
help you understand the Epic Kitchens 100 files.

If you need or want to recreate these processed annotation files yourself, read the rest of this README below.

### 1. Dataset Overview
When you download the Epic Kitchens 100 dataset, it comes in the following format:
- A folder containing JPEG RGB frames for each video
- A `.csv` file each for the training annotations and the validation annotations

To use VideoFrameDataset with Epic Kitchens 100, we need to
1. Turn each `.csv` file into an `annotations.txt` file, as described in the main README of this repository.

### 2. Processing
Doing (1) from above is easy if you use the Python script provided in this directory.
- For (1), run the script `original_annotations_to_processed_annotations.py` and make sure that you
set the file paths correctly inside the script. Run this script once for the training and once for the validation
annotations.

### 3. Done
That's it! You now have all you need to use VideoFrameDataset and start training
on Epic Kitchens 100:
- two annotation text files
- a folder called `RGB` that contains the frames of all videos
--------------------------------------------------------------------------------
/Kinetics400/process_annotation_file.py:
--------------------------------------------------------------------------------
import pandas as pd
from typing import Dict
import os

label_to_id_file = 'labels_to_id.csv'
annotation_in_file = 'train.csv'
annotation_out_file = 'train_processed.txt'
rgb_path = 'rgb/'


""" Read in mapping from class label to class ID """
label_to_id_df = pd.read_csv(label_to_id_file)
label_to_id_dict: Dict[str, int] = dict()
for index, row in label_to_id_df.iterrows():
    label_to_id_dict[row['name']] = int(row['id'])


""" Read in original annotations and convert class label to class ID """
annotations = pd.read_csv(annotation_in_file)
annotations_processed = []
for index, row in annotations.iterrows():
    label, video_id, start_time, end_time = row['label'], row['youtube_id'], row['time_start'], row['time_end']
    annotations_processed.append((video_id, label_to_id_dict[label], start_time, end_time))


""" Find out how many RGB frames each video has and finally write the new annotation file """
with open(annotation_out_file, 'w') as f:
    for annotation in annotations_processed:
        video_id, class_id, start_time, end_time = annotation
        video_id = str(video_id) + '_{:06d}_{:06d}'.format(start_time, end_time)
        start_frame = 0
        frame_path = os.path.join(rgb_path, video_id)
        try:
            num_frames = len(os.listdir(frame_path))
            if num_frames == 0:
                print(f'{video_id} - no frames')
                continue
        except FileNotFoundError as e:
            print(e)
            continue

        annotation_string = "{} {} {} {}\n".format(video_id, start_frame, num_frames - 1, class_id)
        f.write(annotation_string)
--------------------------------------------------------------------------------
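For a concrete (made-up) example of what this script emits: a video with youtube_id=abcdefghijk, time_start=10, time_end=20, 250 extracted frames and class ID 87 becomes:

    abcdefghijk_000010_000020 0 249 87

Frame indices here are zero-based, so the last-frame index is num_frames - 1.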
5 | ### 1. Dataset Overview
6 | When you download the Something Something V2 dataset, it comes in the following format:
7 | - A `.webm` video file for every video
8 | - A `.json` file for the training annotations and the validation annotations
9 | 
10 | To use VideoFrameDataset with Something Something V2, we need to
11 | 1. Create a folder for every `.webm` file that contains the RGB frames of that video.
12 | 2. Turn the `.json` file into an `annotations.txt` file, as described in the main README of this repository.
13 | 
14 | ### 2. Processing
15 | Doing (1) and (2) from above is very easy if you use the Python scripts provided in this directory.
16 | - For (1), run the script `videos_to_frames.py` and make sure that you set the file paths
17 | correctly inside of the script.
18 | - For (2), run the script `original_annotations_to_processed_annotations.py` and make sure that you
19 | set the file paths correctly inside of the script. You must have completed step (1) before you can
20 | run this script. Run this script once for training and once for validation
21 | annotations.
22 | 
23 | NOTE: The processed training and validation files that step (2) outputs are uploaded here as well.
24 | You can use these directly and skip step (2). However, after completing step (1), you might still
25 | need to run (2) yourself, in case `videos_to_frames.py` extracts RGB frames differently on your
26 | machine than it did on mine (this happened to me once).
27 | 
28 | ### 3. Done
29 | That's it! You should then have a folder on your disk `RGB` that contains all videos in individual RGB
30 | frames, and the two annotation files. This is all you need to use VideoFrameDataset and start training
31 | on Something Something V2!
32 | 
--------------------------------------------------------------------------------
/SomethingSomethingV2/original_annotations_to_processed_annotations.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | 
4 | """
5 | This script takes in the original json annotation files for
6 | SomethingSomethingV2 and turns them into annotation.txt files
7 | that are compatible with this repository's dataloader VideoFrameDataset.
8 | 
9 | Running this script requires that you already have the SomethingSomethingV2
10 | videos on disk as RGB frames, where each video has its own folder, containing
11 | the RGB frames of that video. For this, you can use
12 | the script videos_to_frames.py.
13 | 
14 | Modify the three filepaths below and then run this script.
15 | """
16 | 
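# For reference, each entry of the official annotations .json looks roughly
# like this (illustrative, abbreviated):
#     {"id": "74225", "template": "Throwing [something]", ...}
# and the labels .json maps the bracket-free label text to a class id string,
# e.g. "Throwing something" -> "132". This is why the brackets are stripped
# from 'template' below before the label lookup.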
17 | # the official Something Something V2 annotations file, either training or validation.
18 | raw_annotations = 'something-something-v2-validation.json'
19 | # the name of the output file
20 | out_file = 'something-something-v2-validation-processed.txt'
21 | # the official Something Something V2 label file, that specifies a mapping from TEXT_LABEL -> CLASS_ID
22 | labels_file = 'something-something-v2-labels.json'
23 | 
24 | rgb_root = '../rgb/'
25 | 
26 | annotations = None
27 | 
28 | """ list containing the annotations for each sample """
29 | with open(raw_annotations) as file:
30 |     annotations = json.load(file)
31 | 
32 | """ dictionary to go from [text label] -> [integer label] """
33 | with open(labels_file) as file:
34 |     labels_to_ids_dict = json.load(file)
35 | 
36 | 
37 | with open(out_file, 'w') as file:
38 |     for sample in annotations:
39 |         sample_id = sample['id']
40 | 
41 |         """ find out the number of frames for this sample """
42 |         sample_rgb_directory = os.path.join(rgb_root, sample_id)
43 |         num_frames = len(os.listdir(sample_rgb_directory)) - 1  # frames are indexed from 0, so the inclusive last index is count - 1
44 | 
45 |         """ convert [text label] -> [integer label] """
46 |         text_label = sample['template'].replace('[', '').replace(']', '')
47 |         label_id = labels_to_ids_dict[text_label]
48 | 
49 |         """ write to processed file """
50 |         annotation_string = "{} {} {} {}\n".format(sample_id, 0, num_frames, label_id)
51 |         file.write(annotation_string)
52 | 
--------------------------------------------------------------------------------
/Kinetics400/README.md:
--------------------------------------------------------------------------------
1 | # Using Kinetics 400
2 | This directory contains helpers to use the [Kinetics 400](https://github.com/cvdfoundation/kinetics-dataset) dataset with this
3 | repository's VideoFrameDataset dataloader. Download it from [this URL](https://github.com/cvdfoundation/kinetics-dataset).
4 | 
5 | ### 1. Dataset Overview
6 | When you download the Kinetics 400 dataset, it comes in the following format:
7 | - An `.mp4` video file for every video
8 | - A `.csv` file for the training, validation, and testing annotations
9 | 
10 | To use VideoFrameDataset with Kinetics 400, we need to
11 | 1. Create a folder for every `.mp4` file that contains the RGB frames of that video.
12 | 2. Turn each `.csv` file into an `annotations.txt` file, as described in the main README of this repository.
13 | 
14 | ### 2. Processing
15 | Doing (1) and (2) from above is very easy if you use the Python scripts provided in this directory.
16 | - For (1), make sure that all `.mp4` files (training, validation, and testing) are located in one and the same
17 | directory. Run the script `videos_to_frames.py` and make sure that you set the file paths
18 | correctly inside of the script. This will probably take ~10 hours for Kinetics 400.
19 | - For (2), run the script `process_annotation_file.py` once for each annotation `.csv` and make sure that you
20 | set the file paths correctly inside of the script. You must have completed step (1) before you can
21 | run this script.
22 | 
23 | NOTE: The processed training, validation, and testing files that step (2) outputs are uploaded here as well.
24 | You can use these directly and skip step (2). However, after completing step (1), you might still need to
25 | run (2) yourself, in case `videos_to_frames.py` extracts RGB frames differently on your machine than it
26 | did on mine (this is very likely; I recommend running step (2) yourself).
27 | 
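Each row of a processed annotation file then has the form `VIDEO_FOLDER START_FRAME END_FRAME CLASS_ID`, where `VIDEO_FOLDER` is the YouTube id with the zero-padded start and end seconds appended. The values below are illustrative:

```
abcdefghijk_000012_000022 0 249 31
```
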
28 | ### 3. Done
29 | That's it! You should then have a folder on your disk `RGB` that contains all videos in individual RGB
30 | frames, and the three annotation files. This is all you need to use VideoFrameDataset and start training
31 | on Kinetics 400!
32 | 
--------------------------------------------------------------------------------
/docs/source/conf.py:
--------------------------------------------------------------------------------
1 | # Configuration file for the Sphinx documentation builder.
2 | #
3 | # This file only contains a selection of the most common options. For a full
4 | # list see the documentation:
5 | # https://www.sphinx-doc.org/en/master/usage/configuration.html
6 | 
7 | # -- Path setup --------------------------------------------------------------
8 | 
9 | # If extensions (or modules to document with autodoc) are in another directory,
10 | # add these directories to sys.path here. If the directory is relative to the
11 | # documentation root, use os.path.abspath to make it absolute, like shown here.
12 | #
13 | import os
14 | import sys
15 | sys.path.insert(0, os.path.abspath('../..'))
16 | 
17 | 
18 | # -- Project information -----------------------------------------------------
19 | 
20 | project = 'Video Dataset Loading PyTorch'
21 | copyright = '2020, Raivo Koot'
22 | author = 'Raivo Koot'
23 | 
24 | # The full version, including alpha/beta/rc tags
25 | release = '1.0'
26 | 
27 | 
28 | # -- General configuration ---------------------------------------------------
29 | 
30 | # Add any Sphinx extension module names here, as strings. They can be
31 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
32 | # ones.
33 | extensions = [
34 |     'sphinx.ext.napoleon',
35 |     'sphinx.ext.autodoc',
36 |     'sphinx.ext.viewcode',
37 |     'sphinx.ext.coverage',
38 | ]
39 | 
40 | # Add any paths that contain templates here, relative to this directory.
41 | templates_path = ['_templates']
42 | 
43 | # List of patterns, relative to source directory, that match files and
44 | # directories to ignore when looking for source files.
45 | # This pattern also affects html_static_path and html_extra_path.
46 | exclude_patterns = []
47 | 
48 | 
49 | # -- Options for HTML output -------------------------------------------------
50 | 
51 | # The theme to use for HTML and HTML Help pages. See the documentation for
52 | # a list of builtin themes.
53 | #
54 | html_theme = 'sphinx_rtd_theme'
55 | 
56 | # Add any paths that contain custom static files (such as style sheets) here,
57 | # relative to this directory. They are copied after the builtin static files,
58 | # so a file named "default.css" will overwrite the builtin "default.css".
59 | html_static_path = ['_static']
--------------------------------------------------------------------------------
/SomethingSomethingV2/videos_to_frames.py:
--------------------------------------------------------------------------------
1 | import os
2 | import cv2
3 | import threading
4 | from queue import Queue, Empty
5 | 
6 | """
7 | Given individual video files (mp4, webm) on disk, creates a folder for
8 | every video file and saves the video's RGB frames as jpeg files in that
9 | folder.
10 | 
11 | It can be used to turn SomethingSomethingV2, which comes as
12 | many ".webm" files, into an RGB folder for each ".webm" file.
13 | Uses multithreading to extract frames faster.
14 | 
15 | Modify the two filepaths at the bottom and then run this script.
16 | """ 17 | 18 | 19 | def video_to_rgb(video_filename, out_dir, resize_shape): 20 | file_template = 'frame_{0:012d}.jpg' 21 | reader = cv2.VideoCapture(video_filename) 22 | success, frame, = reader.read() # read first frame 23 | 24 | count = 0 25 | while success: 26 | out_filepath = os.path.join(out_dir, file_template.format(count)) 27 | frame = cv2.resize(frame, resize_shape) 28 | cv2.imwrite(out_filepath, frame) 29 | success, frame = reader.read() 30 | count += 1 31 | 32 | def process_videofile(video_filename, video_path, rgb_out_path, file_extension: str ='.mp4'): 33 | filepath = os.path.join(video_path, video_filename) 34 | video_filename = video_filename.replace(file_extension, '') 35 | 36 | out_dir = os.path.join(rgb_out_path, video_filename) 37 | os.mkdir(out_dir) 38 | video_to_rgb(filepath, out_dir, resize_shape=(224, 224)) 39 | 40 | def thread_job(queue, video_path, rgb_out_path, file_extension='.webm'): 41 | while not queue.empty(): 42 | video_filename = queue.get() 43 | process_videofile(video_filename, video_path, rgb_out_path, file_extension=file_extension) 44 | queue.task_done() 45 | 46 | 47 | if __name__ == '__main__': 48 | # the path to the folder which contains all video files (mp4, webm, or other) 49 | video_path = 'videos' 50 | # the root output path where RGB frame folders should be created 51 | rgb_out_path = 'rgb' 52 | # the file extension that the videos have 53 | file_extension = '.webm' 54 | 55 | video_filenames = os.listdir(video_path) 56 | queue = Queue() 57 | [queue.put(video_filename) for video_filename in video_filenames] 58 | 59 | NUM_THREADS = 30 60 | for i in range(NUM_THREADS): 61 | worker = threading.Thread(target=thread_job, args=(queue, video_path, rgb_out_path, file_extension)) 62 | worker.start() 63 | 64 | print('waiting for all videos to be completed.', queue.qsize(), 'videos') 65 | print('This can take an hour or two depending on dataset size') 66 | queue.join() 67 | print('all done') 68 | -------------------------------------------------------------------------------- /Kinetics400/videos_to_frames.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import threading 4 | from queue import Queue 5 | 6 | """ 7 | Given individual video files (mp4, webm) on disk, creates a folder for 8 | every video file and saves the video's RGB frames as jpeg files in that 9 | folder. 10 | 11 | It can be used to turn Kinetics 400, which comes as 12 | many ".mp4" files, into an RGB folder for each ".mp4" file. 13 | Uses multithreading to extract frames faster. 14 | 15 | Modify the two filepaths at the bottom and then run this script. 
16 | """ 17 | 18 | 19 | def video_to_rgb(video_filename, out_dir, resize_shape): 20 | file_template = 'frame_{0:012d}.jpg' 21 | reader = cv2.VideoCapture(video_filename) 22 | success, frame, = reader.read() # read first frame 23 | 24 | count = 0 25 | while success: 26 | out_filepath = os.path.join(out_dir, file_template.format(count)) 27 | frame = cv2.resize(frame, resize_shape) 28 | cv2.imwrite(out_filepath, frame) 29 | success, frame = reader.read() 30 | count += 1 31 | 32 | def process_videofile(video_filename, video_path, rgb_out_path, file_extension: str ='.mp4'): 33 | filepath = os.path.join(video_path, video_filename) 34 | video_filename = video_filename.replace(file_extension, '') 35 | 36 | out_dir = os.path.join(rgb_out_path, video_filename) 37 | os.mkdir(out_dir) 38 | video_to_rgb(filepath, out_dir, resize_shape=OUT_HEIGHT_WIDTH) 39 | 40 | def thread_job(queue, video_path, rgb_out_path, file_extension='.webm'): 41 | while not queue.empty(): 42 | video_filename = queue.get() 43 | process_videofile(video_filename, video_path, rgb_out_path, file_extension=file_extension) 44 | queue.task_done() 45 | 46 | 47 | if __name__ == '__main__': 48 | # the path to the folder which contains all video files (mp4, webm, or other) 49 | video_path = '/home/raivo/data1/kinetics/videos/all' 50 | # the root output path where RGB frame folders should be created 51 | rgb_out_path = 'rgb' 52 | # the file extension that the videos have 53 | file_extension = '.mp4' 54 | # hight and width to resize RGB frames to 55 | OUT_HEIGHT_WIDTH = (224, 224) 56 | 57 | video_filenames = os.listdir(video_path) 58 | queue = Queue() 59 | [queue.put(video_filename) for video_filename in video_filenames] 60 | 61 | NUM_THREADS = 15 62 | for i in range(NUM_THREADS): 63 | worker = threading.Thread(target=thread_job, args=(queue, video_path, rgb_out_path, file_extension)) 64 | worker.start() 65 | 66 | print('waiting for all videos to be completed.', queue.qsize(), 'videos') 67 | print('This can take an hour or two depending on dataset size') 68 | queue.join() 69 | print('all done') 70 | -------------------------------------------------------------------------------- /docs/source/README.rst: -------------------------------------------------------------------------------- 1 | Efficient Video Dataset Loading, Preprocessing, and Augmentation 2 | ================================================================ 3 | 4 | Author: `Raivo Koot `__ 5 | 6 | If you are completely unfamiliar with loading datasets in PyTorch using 7 | ``torch.utils.data.Dataset`` and ``torch.utils.data.DataLoader``, I 8 | recommend getting familiar with these first through 9 | `this `__ 10 | or 11 | `this `__. 12 | 13 | Overview: This example demonstrates the use of ``VideoFrameDataset`` 14 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 15 | 16 | The VideoFrameDataset class serves to ``easily``, ``efficiently`` and 17 | ``effectively`` load video samples from video datasets in PyTorch. 1) 18 | Easily because this dataset class can be used with custom datasets with 19 | minimum effort and no modification. The class merely expects the video 20 | dataset to have a certain structure on disk and expects a .txt 21 | annotation file that enumerates each video sample. Details on this can 22 | be found below and at 23 | ``https://pykale.readthedocs.io/en/latest/kale.loaddata.html#kale-loaddata-video_dataset-module``. 24 | 2) Efficiently because the video loading pipeline that this class 25 | implements is very fast. 
This minimizes GPU waiting time during training
26 | by eliminating input bottlenecks that can slow down training
27 | severalfold. 3) Effectively because the implemented sampling strategy
28 | for video frames is very strong. Video training using the entire
29 | sequence of video frames (often several hundred) is too memory and
30 | compute intense. Therefore, this implementation samples frames evenly
31 | from the video (sparse temporal sampling) so that the loaded frames
32 | represent every part of the video, with support for arbitrary and
33 | differing video lengths within the same dataset. This approach has been
34 | shown to be very effective and is taken from `"Temporal Segment Networks
35 | (ECCV2016)" <https://arxiv.org/abs/1608.00859>`__ with modifications.
36 | 
37 | In conjunction with PyTorch's DataLoader, the VideoFrameDataset class
38 | returns video batch tensors of size
39 | ``BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH``.
40 | 
41 | For a demo, visit ``demo.py``.
42 | 

QuickDemo (demo.py)
~~~~~~~~~~~~~~~~~~~

43 | .. code:: python
44 | 
45 |     root = os.path.join(os.getcwd(), 'demo_dataset')  # Folder in which all videos lie in a specific structure
46 |     annotation_file = os.path.join(root, 'annotations.txt')  # A row for each video sample as: (VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX)
47 | 
48 |     """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
49 |     dataset = VideoFrameDataset(
50 |         root_path=root,
51 |         annotationfile_path=annotation_file,
52 |         num_segments=5,
53 |         frames_per_segment=1,
54 |         imagefile_template='img_{:05d}.jpg',
55 |         transform=None,
57 |         test_mode=False
58 |     )
59 | 
60 |     sample = dataset[0]  # take first sample of dataset
61 |     frames = sample[0]   # list of PIL images
62 |     label = sample[1]    # integer label
63 | 
64 |     for image in frames:
65 |         plt.imshow(image)
66 |         plt.title(label)
67 |         plt.show()
68 |         plt.pause(1)
69 | 
70 | Table of Contents
71 | =================
72 | 
73 | - `1. Requirements <#1-requirements>`__
74 | - `2. Custom Dataset <#2-custom-dataset>`__
75 | - `3. Video Frame Sampling Method <#3-video-frame-sampling-method>`__
76 | - `4. Using VideoFrameDataset for
77 |   Training <#4-using-videoframedataset-for-training>`__
78 | - `5. Conclusion <#5-conclusion>`__
79 | - `6. Acknowledgements <#6-acknowledgements>`__
80 | 
81 | 1. Requirements
82 | ~~~~~~~~~~~~~~~
83 | 
84 | ::
85 | 
86 |     # Without these three, VideoFrameDataset will not work.
87 |     torchvision >= 0.8.0
88 |     torch >= 1.7.0
89 |     python >= 3.6
90 | 
91 | 2. Custom Dataset
92 | ~~~~~~~~~~~~~~~~~
93 | 
94 | To use any dataset, two conditions must be met. 1) The video data must
95 | be supplied as RGB frames, each frame saved as an image file. Each video
96 | must have its own folder, in which the frames of that video lie. The
97 | frames of a video inside its folder must be named uniformly as
98 | ``img_00001.jpg`` ... ``img_00120.jpg``, if there are 120 frames. The
99 | filename template for frames is then "img\_{:05d}.jpg" (python string
100 | formatting, specifying 5 digits after the underscore), and must be
101 | supplied to the constructor of VideoFrameDataset as a parameter. Each
102 | video folder lies inside a ``root`` folder of this dataset. 2) To
103 | enumerate all video samples in the dataset and their required metadata,
104 | a ``.txt`` annotation file must be manually created that contains a row
105 | for each video sample in the dataset. The training, validation, and
106 | testing datasets must have separate annotation files. Each row must be a
107 | space-separated list that contains
108 | ``VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX``, where ``START_FRAME`` and
``END_FRAME`` are the indices of the first and the (inclusive) last frame.
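For example, the row for a 17-frame video of class ``0`` could read ``jumping/0001 1 17 0``.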
The ``VIDEO_PATH`` of a video
109 | sample should be provided without the ``root`` prefix of this dataset.
110 | 
111 | This example project demonstrates this using a dummy dataset inside of
112 | ``demo_dataset/``, which is the ``root`` dataset folder of this example.
113 | The folder structure looks as follows:
114 | 
115 | ::
116 | 
117 |     demo_dataset
118 |     │
119 |     ├───annotations.txt
120 |     ├───jumping # arbitrary class folder naming
121 |     │   ├───0001 # arbitrary video folder naming
122 |     │   │   ├───img_00001.jpg
123 |     │   │   .
124 |     │   │   └───img_00017.jpg
125 |     │   └───0002
126 |     │       ├───img_00001.jpg
127 |     │       .
128 |     │       └───img_00018.jpg
129 |     │
130 |     └───running # arbitrary folder naming
131 |         ├───0001 # arbitrary video folder naming
132 |         │   ├───img_00001.jpg
133 |         │   .
134 |         │   └───img_00015.jpg
135 |         └───0002
136 |             ├───img_00001.jpg
137 |             .
138 |             └───img_00015.jpg
139 | 
140 | 
141 | 
142 | The accompanying annotation ``.txt`` file contains the following rows
143 | 
144 | ::
145 | 
146 |     jumping/0001 1 17 0
147 |     jumping/0002 1 18 0
148 |     running/0001 1 15 1
149 |     running/0002 1 15 1
150 | 
151 | Instantiating a VideoFrameDataset with the ``root_path`` parameter
152 | pointing to ``demo_dataset``, the ``annotationfile_path`` parameter
153 | pointing to the annotation file, and the ``imagefile_template``
154 | parameter as "img\_{:05d}.jpg", is all that it takes to start using the
155 | VideoFrameDataset class.
156 | 
157 | 3. Video Frame Sampling Method
158 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
159 | 
160 | When loading a video, only a number of its frames are loaded. They are
161 | chosen in the following way: 1. The frame indices [1,N] are divided into
162 | NUM\_SEGMENTS even segments. From each segment, FRAMES\_PER\_SEGMENT
163 | consecutive indices are chosen at random. This results in
164 | NUM\_SEGMENTS\*FRAMES\_PER\_SEGMENT chosen indices, whose frames are
165 | loaded as PIL images and put into a list and returned when calling
166 | ``dataset[i]``. A short sketch of this computation follows below.
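For reference, the random-shift branch of this computation in
``video_dataset.py`` boils down to the following minimal sketch (it assumes
the video has at least NUM\_SEGMENTS\*FRAMES\_PER\_SEGMENT frames):

.. code:: python

    import numpy as np

    def choose_start_indices(num_frames, num_segments, frames_per_segment):
        # Divide the frames into num_segments blocks of equal length and pick
        # one random start index inside each block; frames_per_segment
        # consecutive frames are then loaded from each start index.
        max_valid_start_index = (num_frames - frames_per_segment + 1) // num_segments
        return np.arange(num_segments) * max_valid_start_index + \
            np.random.randint(max_valid_start_index, size=num_segments)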
167 | 
168 | 4. Using VideoFrameDataset for training
169 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
170 | 
171 | As demonstrated in ``demo.py``, we can use PyTorch's
172 | ``torch.utils.data.DataLoader`` class with VideoFrameDataset to take
173 | care of shuffling, batching, and more. To turn the lists of PIL images
174 | returned by VideoFrameDataset into tensors, the transform
175 | ``video_dataset.ImglistToTensor()`` can be supplied as the
176 | ``transform`` parameter to VideoFrameDataset. This turns a list of N PIL
177 | images into a batch of images/frames of shape
178 | ``N x CHANNELS x HEIGHT x WIDTH``. We can further chain preprocessing
179 | and augmentation functions that act on batches of images onto the end of
180 | ``ImglistToTensor()``.
181 | 
182 | As of ``torchvision 0.8.0``, all torchvision transforms can now also
183 | operate on batches of images, and they apply deterministic or random
184 | transformations on the batch identically on all images of the batch.
185 | Therefore, any torchvision transform can be used here to apply
186 | video-uniform preprocessing and augmentation.
187 | 
188 | 5. Conclusion
189 | ~~~~~~~~~~~~~
190 | 
191 | A proper code-based explanation on how to use VideoFrameDataset for
192 | training is provided in ``demo.py``
193 | 
194 | 6. Acknowledgements
195 | ~~~~~~~~~~~~~~~~~~~
196 | 
197 | We thank the authors of TSN for their
198 | `codebase <https://github.com/yjxiong/tsn-pytorch>`__, from which we
199 | took VideoFrameDataset and adapted it.
200 | 
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
1 | from video_dataset import VideoFrameDataset, ImglistToTensor
2 | from torchvision import transforms
3 | import torch
4 | import matplotlib.pyplot as plt
5 | from mpl_toolkits.axes_grid1 import ImageGrid
6 | import os
7 | 
8 | """
9 | Ignore this function and look at "main" below.
10 | """
11 | def plot_video(rows, cols, frame_list, plot_width, plot_height, title: str):
12 |     fig = plt.figure(figsize=(plot_width, plot_height))
13 |     grid = ImageGrid(fig, 111,  # similar to subplot(111)
14 |                      nrows_ncols=(rows, cols),  # creates a (rows x cols) grid of axes
15 |                      axes_pad=0.3,  # pad between axes in inches
16 |                      )
17 | 
18 |     for index, (ax, im) in enumerate(zip(grid, frame_list)):
19 |         # Iterating over the grid returns the Axes.
20 |         ax.imshow(im)
21 |         ax.set_title(index)
22 |     plt.suptitle(title)
23 |     plt.show()
24 | 
25 | if __name__ == '__main__':
26 |     """
27 |     This demo uses the dummy dataset inside of the folder "demo_dataset".
28 |     It is structured just like a real dataset would need to be structured.
29 | 
30 |     TABLE OF CODE CONTENTS:
31 |     1. Minimal demo without image transforms
32 |     2. Minimal demo without sparse temporal sampling for single continuous frame clips, without image transforms
33 |     3. Demo with image transforms
34 |     4. Demo 3 continued with PyTorch dataloader
35 |     5. Demo of using a dataset where samples have multiple separate class labels
36 | 
37 |     """
38 |     videos_root = os.path.join(os.getcwd(), 'demo_dataset')
39 |     annotation_file = os.path.join(videos_root, 'annotations.txt')
40 | 
41 | 
42 |     """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
43 |     dataset = VideoFrameDataset(
44 |         root_path=videos_root,
45 |         annotationfile_path=annotation_file,
46 |         num_segments=5,
47 |         frames_per_segment=1,
48 |         imagefile_template='img_{:05d}.jpg',
49 |         transform=None,
50 |         test_mode=False
51 |     )
52 | 
53 |     sample = dataset[0]
54 |     frames = sample[0]  # list of PIL images
55 |     label = sample[1]   # integer label
56 | 
57 |     plot_video(rows=1, cols=5, frame_list=frames, plot_width=15., plot_height=3.,
58 |                title='Evenly Sampled Frames, No Video Transform')
59 | 
60 | 
61 | 
62 |     """ DEMO 2 SINGLE CONTINUOUS FRAME CLIP INSTEAD OF SAMPLED FRAMES, WITHOUT TRANSFORMS """
63 |     # If you do not want to use sparse temporal sampling, and instead
64 |     # want to just load N consecutive frames starting from a random
65 |     # start index, this is easy. Simply set NUM_SEGMENTS=1 and
66 |     # FRAMES_PER_SEGMENT=N. Each time a sample is loaded, N
67 |     # frames will be loaded from a new random start index.
68 |     dataset = VideoFrameDataset(
69 |         root_path=videos_root,
70 |         annotationfile_path=annotation_file,
71 |         num_segments=1,
72 |         frames_per_segment=9,
73 |         imagefile_template='img_{:05d}.jpg',
74 |         transform=None,
75 |         test_mode=False
76 |     )
77 | 
78 |     sample = dataset[3]
79 |     frames = sample[0]  # list of PIL images
80 |     label = sample[1]   # integer label
81 | 
82 |     plot_video(rows=3, cols=3, frame_list=frames, plot_width=10., plot_height=5.,
83 |                title='Continuous Sampled Frame Clip, No Video Transform')
84 | 
85 | 
86 | 
87 |     """ DEMO 3 WITH TRANSFORMS """
88 |     # As of torchvision 0.8.0, torchvision transforms support batches of images
89 |     # of size (BATCH x CHANNELS x HEIGHT x WIDTH) and apply deterministic or random
90 |     # transformations on the batch identically on all images of the batch.
Any torchvision
91 |     # transform for image augmentation can thus also be used for video augmentation.
92 |     preprocess = transforms.Compose([
93 |         ImglistToTensor(),  # list of PIL images to (FRAMES x CHANNELS x HEIGHT x WIDTH) tensor
94 |         transforms.Resize(299),  # image batch, resize smaller edge to 299
95 |         transforms.CenterCrop(299),  # image batch, center crop to square 299x299
96 |         transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
97 |     ])
98 | 
99 |     dataset = VideoFrameDataset(
100 |         root_path=videos_root,
101 |         annotationfile_path=annotation_file,
102 |         num_segments=5,
103 |         frames_per_segment=1,
104 |         imagefile_template='img_{:05d}.jpg',
105 |         transform=preprocess,
106 |         test_mode=False
107 |     )
108 | 
109 |     sample = dataset[2]
110 |     frame_tensor = sample[0]  # tensor of shape (NUM_SEGMENTS*FRAMES_PER_SEGMENT) x CHANNELS x HEIGHT x WIDTH
111 |     label = sample[1]  # integer label
112 | 
113 |     print('Video Tensor Size:', frame_tensor.size())
114 | 
115 |     def denormalize(video_tensor):
116 |         """
117 |         Undoes mean/standard deviation normalization, zero to one scaling,
118 |         and channel rearrangement for a batch of images.
119 |         args:
120 |             video_tensor: a (FRAMES x CHANNELS x HEIGHT x WIDTH) tensor
121 |         """
122 |         inverse_normalize = transforms.Normalize(
123 |             mean=[-0.485 / 0.229, -0.456 / 0.224, -0.406 / 0.225],
124 |             std=[1 / 0.229, 1 / 0.224, 1 / 0.225]
125 |         )
126 |         return (inverse_normalize(video_tensor) * 255.).type(torch.uint8).permute(0, 2, 3, 1).numpy()
127 | 
128 | 
129 |     frame_tensor = denormalize(frame_tensor)
130 |     plot_video(rows=1, cols=5, frame_list=frame_tensor, plot_width=15., plot_height=3.,
131 |                title='Evenly Sampled Frames, + Video Transform')
132 | 
133 | 
134 | 
135 |     """ DEMO 3 CONTINUED: DATALOADER """
136 |     dataloader = torch.utils.data.DataLoader(
137 |         dataset=dataset,
138 |         batch_size=2,
139 |         shuffle=True,
140 |         num_workers=4,
141 |         pin_memory=True
142 |     )
143 | 
144 |     for epoch in range(10):
145 |         for video_batch, labels in dataloader:
146 |             """
147 |             Insert Training Code Here
148 |             """
149 |             print(labels)
150 |             print("\nVideo Batch Tensor Size:", video_batch.size())
151 |             print("Batch Labels Size:", labels.size())
152 |             break
153 |         break
154 | 
155 | 
156 |     """ DEMO 5: SAMPLES WITH MULTIPLE LABELS """
157 |     """
158 |     Apart from supporting just a single label per sample, VideoFrameDataset also supports multi-label samples,
159 |     where a sample can be associated with more than just one label. EPIC-KITCHENS, for example, associates a
160 |     noun, verb, and action with each video clip. To support this, instead of each row in annotations.txt
161 |     being (VIDEO_PATH, START_FRAME, END_FRAME, LABEL_ID), each row can also be
162 |     (VIDEO_PATH, START_FRAME, END_FRAME, LABEL_1_ID, ..., LABEL_N_ID). An example of this can be seen in the
163 |     directory `demo_dataset_multilabel`.
164 | 
165 |     Each sample returned by VideoFrameDataset is then ((FRAMESxCHANNELSxHEIGHTxWIDTH), (LABEL_1, ..., LABEL_N)).
166 |     When paired with the `torch.utils.data.DataLoader`, instead of yielding each batch as
167 |     ((BATCHxFRAMESxCHANNELSxHEIGHTxWIDTH), (BATCH)) where the second tuple item is the labels of the batch,
168 |     `torch.utils.data.DataLoader` returns a batch as ((BATCHxFRAMESxCHANNELSxHEIGHTxWIDTH), ((BATCH),...,(BATCH)))
169 |     where the second tuple item is itself a tuple, with N BATCH-sized tensors of labels, where N is the
170 |     number of labels assigned to each sample.
171 | """ 172 | videos_root = os.path.join(os.getcwd(), 'demo_dataset_multilabel') 173 | annotation_file = os.path.join(videos_root, 'annotations.txt') 174 | 175 | dataset = VideoFrameDataset( 176 | root_path=videos_root, 177 | annotationfile_path=annotation_file, 178 | num_segments=5, 179 | frames_per_segment=1, 180 | imagefile_template='img_{:05d}.jpg', 181 | transform=preprocess, 182 | test_mode=False 183 | ) 184 | 185 | dataloader = torch.utils.data.DataLoader( 186 | dataset=dataset, 187 | batch_size=3, 188 | shuffle=True, 189 | num_workers=2, 190 | pin_memory=True 191 | ) 192 | 193 | print("\nMulti-Label Example") 194 | for epoch in range(10): 195 | for batch in dataloader: 196 | """ 197 | Insert Training Code Here 198 | """ 199 | video_batch, (labels1, labels2, labels3) = batch 200 | 201 | print("Video Batch Tensor Size:", video_batch.size()) 202 | print("Labels1 Size:", labels1.size()) # == batch_size 203 | print("Labels2 Size:", labels2.size()) # == batch_size 204 | print("Labels3 Size:", labels3.size()) # == batch_size 205 | 206 | break 207 | break 208 | -------------------------------------------------------------------------------- /Kinetics400/labels_to_id.csv: -------------------------------------------------------------------------------- 1 | id,name 2 | 0,abseiling 3 | 1,air drumming 4 | 2,answering questions 5 | 3,applauding 6 | 4,applying cream 7 | 5,archery 8 | 6,arm wrestling 9 | 7,arranging flowers 10 | 8,assembling computer 11 | 9,auctioning 12 | 10,baby waking up 13 | 11,baking cookies 14 | 12,balloon blowing 15 | 13,bandaging 16 | 14,barbequing 17 | 15,bartending 18 | 16,beatboxing 19 | 17,bee keeping 20 | 18,belly dancing 21 | 19,bench pressing 22 | 20,bending back 23 | 21,bending metal 24 | 22,biking through snow 25 | 23,blasting sand 26 | 24,blowing glass 27 | 25,blowing leaves 28 | 26,blowing nose 29 | 27,blowing out candles 30 | 28,bobsledding 31 | 29,bookbinding 32 | 30,bouncing on trampoline 33 | 31,bowling 34 | 32,braiding hair 35 | 33,breading or breadcrumbing 36 | 34,breakdancing 37 | 35,brush painting 38 | 36,brushing hair 39 | 37,brushing teeth 40 | 38,building cabinet 41 | 39,building shed 42 | 40,bungee jumping 43 | 41,busking 44 | 42,canoeing or kayaking 45 | 43,capoeira 46 | 44,carrying baby 47 | 45,cartwheeling 48 | 46,carving pumpkin 49 | 47,catching fish 50 | 48,catching or throwing baseball 51 | 49,catching or throwing frisbee 52 | 50,catching or throwing softball 53 | 51,celebrating 54 | 52,changing oil 55 | 53,changing wheel 56 | 54,checking tires 57 | 55,cheerleading 58 | 56,chopping wood 59 | 57,clapping 60 | 58,clay pottery making 61 | 59,clean and jerk 62 | 60,cleaning floor 63 | 61,cleaning gutters 64 | 62,cleaning pool 65 | 63,cleaning shoes 66 | 64,cleaning toilet 67 | 65,cleaning windows 68 | 66,climbing a rope 69 | 67,climbing ladder 70 | 68,climbing tree 71 | 69,contact juggling 72 | 70,cooking chicken 73 | 71,cooking egg 74 | 72,cooking on campfire 75 | 73,cooking sausages 76 | 74,counting money 77 | 75,country line dancing 78 | 76,cracking neck 79 | 77,crawling baby 80 | 78,crossing river 81 | 79,crying 82 | 80,curling hair 83 | 81,cutting nails 84 | 82,cutting pineapple 85 | 83,cutting watermelon 86 | 84,dancing ballet 87 | 85,dancing charleston 88 | 86,dancing gangnam style 89 | 87,dancing macarena 90 | 88,deadlifting 91 | 89,decorating the christmas tree 92 | 90,digging 93 | 91,dining 94 | 92,disc golfing 95 | 93,diving cliff 96 | 94,dodgeball 97 | 95,doing aerobics 98 | 96,doing laundry 99 | 97,doing nails 100 | 98,drawing 101 | 
99,dribbling basketball 102 | 100,drinking 103 | 101,drinking beer 104 | 102,drinking shots 105 | 103,driving car 106 | 104,driving tractor 107 | 105,drop kicking 108 | 106,drumming fingers 109 | 107,dunking basketball 110 | 108,dying hair 111 | 109,eating burger 112 | 110,eating cake 113 | 111,eating carrots 114 | 112,eating chips 115 | 113,eating doughnuts 116 | 114,eating hotdog 117 | 115,eating ice cream 118 | 116,eating spaghetti 119 | 117,eating watermelon 120 | 118,egg hunting 121 | 119,exercising arm 122 | 120,exercising with an exercise ball 123 | 121,extinguishing fire 124 | 122,faceplanting 125 | 123,feeding birds 126 | 124,feeding fish 127 | 125,feeding goats 128 | 126,filling eyebrows 129 | 127,finger snapping 130 | 128,fixing hair 131 | 129,flipping pancake 132 | 130,flying kite 133 | 131,folding clothes 134 | 132,folding napkins 135 | 133,folding paper 136 | 134,front raises 137 | 135,frying vegetables 138 | 136,garbage collecting 139 | 137,gargling 140 | 138,getting a haircut 141 | 139,getting a tattoo 142 | 140,giving or receiving award 143 | 141,golf chipping 144 | 142,golf driving 145 | 143,golf putting 146 | 144,grinding meat 147 | 145,grooming dog 148 | 146,grooming horse 149 | 147,gymnastics tumbling 150 | 148,hammer throw 151 | 149,headbanging 152 | 150,headbutting 153 | 151,high jump 154 | 152,high kick 155 | 153,hitting baseball 156 | 154,hockey stop 157 | 155,holding snake 158 | 156,hopscotch 159 | 157,hoverboarding 160 | 158,hugging 161 | 159,hula hooping 162 | 160,hurdling 163 | 161,hurling (sport) 164 | 162,ice climbing 165 | 163,ice fishing 166 | 164,ice skating 167 | 165,ironing 168 | 166,javelin throw 169 | 167,jetskiing 170 | 168,jogging 171 | 169,juggling balls 172 | 170,juggling fire 173 | 171,juggling soccer ball 174 | 172,jumping into pool 175 | 173,jumpstyle dancing 176 | 174,kicking field goal 177 | 175,kicking soccer ball 178 | 176,kissing 179 | 177,kitesurfing 180 | 178,knitting 181 | 179,krumping 182 | 180,laughing 183 | 181,laying bricks 184 | 182,long jump 185 | 183,lunge 186 | 184,making a cake 187 | 185,making a sandwich 188 | 186,making bed 189 | 187,making jewelry 190 | 188,making pizza 191 | 189,making snowman 192 | 190,making sushi 193 | 191,making tea 194 | 192,marching 195 | 193,massaging back 196 | 194,massaging feet 197 | 195,massaging legs 198 | 196,massaging person's head 199 | 197,milking cow 200 | 198,mopping floor 201 | 199,motorcycling 202 | 200,moving furniture 203 | 201,mowing lawn 204 | 202,news anchoring 205 | 203,opening bottle 206 | 204,opening present 207 | 205,paragliding 208 | 206,parasailing 209 | 207,parkour 210 | 208,passing American football (in game) 211 | 209,passing American football (not in game) 212 | 210,peeling apples 213 | 211,peeling potatoes 214 | 212,petting animal (not cat) 215 | 213,petting cat 216 | 214,picking fruit 217 | 215,planting trees 218 | 216,plastering 219 | 217,playing accordion 220 | 218,playing badminton 221 | 219,playing bagpipes 222 | 220,playing basketball 223 | 221,playing bass guitar 224 | 222,playing cards 225 | 223,playing cello 226 | 224,playing chess 227 | 225,playing clarinet 228 | 226,playing controller 229 | 227,playing cricket 230 | 228,playing cymbals 231 | 229,playing didgeridoo 232 | 230,playing drums 233 | 231,playing flute 234 | 232,playing guitar 235 | 233,playing harmonica 236 | 234,playing harp 237 | 235,playing ice hockey 238 | 236,playing keyboard 239 | 237,playing kickball 240 | 238,playing monopoly 241 | 239,playing organ 242 | 240,playing paintball 243 | 241,playing 
piano 244 | 242,playing poker 245 | 243,playing recorder 246 | 244,playing saxophone 247 | 245,playing squash or racquetball 248 | 246,playing tennis 249 | 247,playing trombone 250 | 248,playing trumpet 251 | 249,playing ukulele 252 | 250,playing violin 253 | 251,playing volleyball 254 | 252,playing xylophone 255 | 253,pole vault 256 | 254,presenting weather forecast 257 | 255,pull ups 258 | 256,pumping fist 259 | 257,pumping gas 260 | 258,punching bag 261 | 259,punching person (boxing) 262 | 260,push up 263 | 261,pushing car 264 | 262,pushing cart 265 | 263,pushing wheelchair 266 | 264,reading book 267 | 265,reading newspaper 268 | 266,recording music 269 | 267,riding a bike 270 | 268,riding camel 271 | 269,riding elephant 272 | 270,riding mechanical bull 273 | 271,riding mountain bike 274 | 272,riding mule 275 | 273,riding or walking with horse 276 | 274,riding scooter 277 | 275,riding unicycle 278 | 276,ripping paper 279 | 277,robot dancing 280 | 278,rock climbing 281 | 279,rock scissors paper 282 | 280,roller skating 283 | 281,running on treadmill 284 | 282,sailing 285 | 283,salsa dancing 286 | 284,sanding floor 287 | 285,scrambling eggs 288 | 286,scuba diving 289 | 287,setting table 290 | 288,shaking hands 291 | 289,shaking head 292 | 290,sharpening knives 293 | 291,sharpening pencil 294 | 292,shaving head 295 | 293,shaving legs 296 | 294,shearing sheep 297 | 295,shining shoes 298 | 296,shooting basketball 299 | 297,shooting goal (soccer) 300 | 298,shot put 301 | 299,shoveling snow 302 | 300,shredding paper 303 | 301,shuffling cards 304 | 302,side kick 305 | 303,sign language interpreting 306 | 304,singing 307 | 305,situp 308 | 306,skateboarding 309 | 307,ski jumping 310 | 308,skiing (not slalom or crosscountry) 311 | 309,skiing crosscountry 312 | 310,skiing slalom 313 | 311,skipping rope 314 | 312,skydiving 315 | 313,slacklining 316 | 314,slapping 317 | 315,sled dog racing 318 | 316,smoking 319 | 317,smoking hookah 320 | 318,snatch weight lifting 321 | 319,sneezing 322 | 320,sniffing 323 | 321,snorkeling 324 | 322,snowboarding 325 | 323,snowkiting 326 | 324,snowmobiling 327 | 325,somersaulting 328 | 326,spinning poi 329 | 327,spray painting 330 | 328,spraying 331 | 329,springboard diving 332 | 330,squat 333 | 331,sticking tongue out 334 | 332,stomping grapes 335 | 333,stretching arm 336 | 334,stretching leg 337 | 335,strumming guitar 338 | 336,surfing crowd 339 | 337,surfing water 340 | 338,sweeping floor 341 | 339,swimming backstroke 342 | 340,swimming breast stroke 343 | 341,swimming butterfly stroke 344 | 342,swing dancing 345 | 343,swinging legs 346 | 344,swinging on something 347 | 345,sword fighting 348 | 346,tai chi 349 | 347,taking a shower 350 | 348,tango dancing 351 | 349,tap dancing 352 | 350,tapping guitar 353 | 351,tapping pen 354 | 352,tasting beer 355 | 353,tasting food 356 | 354,testifying 357 | 355,texting 358 | 356,throwing axe 359 | 357,throwing ball 360 | 358,throwing discus 361 | 359,tickling 362 | 360,tobogganing 363 | 361,tossing coin 364 | 362,tossing salad 365 | 363,training dog 366 | 364,trapezing 367 | 365,trimming or shaving beard 368 | 366,trimming trees 369 | 367,triple jump 370 | 368,tying bow tie 371 | 369,tying knot (not on a tie) 372 | 370,tying tie 373 | 371,unboxing 374 | 372,unloading truck 375 | 373,using computer 376 | 374,using remote controller (not gaming) 377 | 375,using segway 378 | 376,vault 379 | 377,waiting in line 380 | 378,walking the dog 381 | 379,washing dishes 382 | 380,washing feet 383 | 381,washing hair 384 | 382,washing hands 385 
| 383,water skiing
386 | 384,water sliding
387 | 385,watering plants
388 | 386,waxing back
389 | 387,waxing chest
390 | 388,waxing eyebrows
391 | 389,waxing legs
392 | 390,weaving basket
393 | 391,welding
394 | 392,whistling
395 | 393,windsurfing
396 | 394,wrapping present
397 | 395,wrestling
398 | 396,writing
399 | 397,yawning
400 | 398,yoga
401 | 399,zumba
402 | 
--------------------------------------------------------------------------------
/docs/source/index.rst:
--------------------------------------------------------------------------------
1 | .. Video Dataset Loading PyTorch documentation master file, created by
2 |    sphinx-quickstart on Fri Nov 13 02:54:35 2020.
3 |    You can adapt this file completely to your liking, but it should at least
4 |    contain the root `toctree` directive.
5 | 
6 | Video Dataset Loading in PyTorch!
7 | ==================================
8 | 
9 | .. toctree::
10 |    :maxdepth: 2
11 |    :caption: Contents
12 | 
13 |    VideoDataset
14 |    Github Demo, Readme & Code <https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch>
15 |    README
16 | 
17 | 
18 | Efficient Video Dataset Loading, Preprocessing, and Augmentation
19 | ========================================================================
20 | To get the most up-to-date README, please visit `Github: Video Dataset Loading Pytorch <https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch>`__
21 | 
22 | Author: `Raivo Koot <https://github.com/RaivoKoot>`__
23 | 
24 | If you are completely unfamiliar with loading datasets in PyTorch using
25 | ``torch.utils.data.Dataset`` and ``torch.utils.data.DataLoader``, I
26 | recommend getting familiar with these first through
27 | `this `__
28 | or
29 | `this `__.
30 | 
31 | Overview: This example demonstrates the use of ``VideoFrameDataset``
32 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
33 | 
34 | The VideoFrameDataset class serves to ``easily``, ``efficiently`` and
35 | ``effectively`` load video samples from video datasets in PyTorch.
36 | 
37 | 1) Easily because this dataset class can be used with custom datasets with
38 |    minimum effort and no modification. The class merely expects the video
39 |    dataset to have a certain structure on disk and expects a .txt
40 |    annotation file that enumerates each video sample. Details on this can
41 |    be found below and at
42 |    ``https://video-dataset-loading-pytorch.readthedocs.io/``.
43 | 
44 | 2) Efficiently because the video loading pipeline that this class
45 |    implements is very fast. This minimizes GPU waiting time during training
46 |    by eliminating input bottlenecks that can slow down training
47 |    severalfold.
48 | 
49 | 3) Effectively because the implemented sampling strategy
50 |    for video frames is very strong. Video training using the entire
51 |    sequence of video frames (often several hundred) is too memory and
52 |    compute intense. Therefore, this implementation samples frames evenly
53 |    from the video (sparse temporal sampling) so that the loaded frames
54 |    represent every part of the video, with support for arbitrary and
55 |    differing video lengths within the same dataset. This approach has been
56 |    shown to be very effective and is taken from `"Temporal Segment Networks
57 |    (ECCV2016)" <https://arxiv.org/abs/1608.00859>`__ with modifications.
58 | 
59 | In conjunction with PyTorch's DataLoader, the VideoFrameDataset class
60 | returns video batch tensors of size
61 | ``BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH``.
62 | 
63 | For a demo, visit ``https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch``.
64 | 
65 | QuickDemo (demo.py)
66 | ~~~~~~~~~~~~~~~~~~~
67 | 
68 | ..
code:: python
69 | 
70 |     root = os.path.join(os.getcwd(), 'demo_dataset')  # Folder in which all videos lie in a specific structure
71 |     annotation_file = os.path.join(root, 'annotations.txt')  # A row for each video sample as: (VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX)
72 | 
73 |     """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
74 |     dataset = VideoFrameDataset(
75 |         root_path=root,
76 |         annotationfile_path=annotation_file,
77 |         num_segments=5,
78 |         frames_per_segment=1,
79 |         imagefile_template='img_{:05d}.jpg',
80 |         transform=None,
82 |         test_mode=False
83 |     )
84 | 
85 |     sample = dataset[0]  # take first sample of dataset
86 |     frames = sample[0]   # list of PIL images
87 |     label = sample[1]    # integer label
88 | 
89 |     for image in frames:
90 |         plt.imshow(image)
91 |         plt.title(label)
92 |         plt.show()
93 |         plt.pause(1)
94 | 
95 | Table of Contents
96 | =================
97 | 
98 | - `1. Requirements <#1-requirements>`__
99 | - `2. Custom Dataset <#2-custom-dataset>`__
100 | - `3. Video Frame Sampling Method <#3-video-frame-sampling-method>`__
101 | - `4. Using VideoFrameDataset for
102 |   Training <#4-using-videoframedataset-for-training>`__
103 | - `5. Conclusion <#5-conclusion>`__
104 | - `6. Acknowledgements <#6-acknowledgements>`__
105 | 
106 | 1. Requirements
107 | ~~~~~~~~~~~~~~~
108 | 
109 | ::
110 | 
111 |     # Without these three, VideoFrameDataset will not work.
112 |     torchvision >= 0.8.0
113 |     torch >= 1.7.0
114 |     python >= 3.6
115 | 
116 | 2. Custom Dataset
117 | ~~~~~~~~~~~~~~~~~
118 | 
119 | To use any dataset, two conditions must be met. 1) The video data must
120 | be supplied as RGB frames, each frame saved as an image file. Each video
121 | must have its own folder, in which the frames of that video lie. The
122 | frames of a video inside its folder must be named uniformly as
123 | ``img_00001.jpg`` ... ``img_00120.jpg``, if there are 120 frames. The
124 | filename template for frames is then "img\_{:05d}.jpg" (python string
125 | formatting, specifying 5 digits after the underscore), and must be
126 | supplied to the constructor of VideoFrameDataset as a parameter. Each
127 | video folder lies inside a ``root`` folder of this dataset. 2) To
128 | enumerate all video samples in the dataset and their required metadata,
129 | a ``.txt`` annotation file must be manually created that contains a row
130 | for each video sample in the dataset. The training, validation, and
131 | testing datasets must have separate annotation files. Each row must be a
132 | space-separated list that contains
133 | ``VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX``. The ``VIDEO_PATH`` of a video
134 | sample should be provided without the ``root`` prefix of this dataset.
135 | 
136 | This example project demonstrates this using a dummy dataset inside of
137 | ``demo_dataset/``, which is the ``root`` dataset folder of this example.
138 | The folder structure looks as follows:
139 | 
140 | ::
141 | 
142 |     demo_dataset
143 |     │
144 |     ├───annotations.txt
145 |     ├───jumping # arbitrary class folder naming
146 |     │   ├───0001 # arbitrary video folder naming
147 |     │   │   ├───img_00001.jpg
148 |     │   │   .
149 |     │   │   └───img_00017.jpg
150 |     │   └───0002
151 |     │       ├───img_00001.jpg
152 |     │       .
153 |     │       └───img_00018.jpg
154 |     │
155 |     └───running # arbitrary folder naming
156 |         ├───0001 # arbitrary video folder naming
157 |         │   ├───img_00001.jpg
158 |         │   .
159 |         │   └───img_00015.jpg
160 |         └───0002
161 |             ├───img_00001.jpg
162 |             .
163 |             └───img_00015.jpg
164 | 
165 | 
166 | 
167 | The accompanying annotation ``.txt`` file contains the following rows
168 | 
169 | ::
170 | 
171 |     jumping/0001 1 17 0
172 |     jumping/0002 1 18 0
173 |     running/0001 1 15 1
174 |     running/0002 1 15 1
175 | 
176 | Instantiating a VideoFrameDataset with the ``root_path`` parameter
177 | pointing to ``demo_dataset``, the ``annotationfile_path`` parameter
178 | pointing to the annotation file, and the ``imagefile_template``
179 | parameter as "img\_{:05d}.jpg", is all that it takes to start using the
180 | VideoFrameDataset class.
181 | 
182 | 3. Video Frame Sampling Method
183 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
184 | 
185 | When loading a video, only a number of its frames are loaded. They are
186 | chosen in the following way: 1. The frame indices [1,N] are divided into
187 | NUM\_SEGMENTS even segments. From each segment, FRAMES\_PER\_SEGMENT
188 | consecutive indices are chosen at random. This results in
189 | NUM\_SEGMENTS\*FRAMES\_PER\_SEGMENT chosen indices, whose frames are
190 | loaded as PIL images and put into a list and returned when calling
191 | ``dataset[i]``.
192 | 
193 | 4. Using VideoFrameDataset for training
194 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
195 | 
196 | As demonstrated in ``https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch/blob/main/demo.py``, we can use PyTorch's
197 | ``torch.utils.data.DataLoader`` class with VideoFrameDataset to take
198 | care of shuffling, batching, and more. To turn the lists of PIL images
199 | returned by VideoFrameDataset into tensors, the transform
200 | ``video_dataset.ImglistToTensor()`` can be supplied as the
201 | ``transform`` parameter to VideoFrameDataset. This turns a list of N PIL
202 | images into a batch of images/frames of shape
203 | ``N x CHANNELS x HEIGHT x WIDTH``. We can further chain preprocessing
204 | and augmentation functions that act on batches of images onto the end of
205 | ``ImglistToTensor()``.
206 | 
207 | As of ``torchvision 0.8.0``, all torchvision transforms can now also
208 | operate on batches of images, and they apply deterministic or random
209 | transformations on the batch identically on all images of the batch.
210 | Therefore, any torchvision transform can be used here to apply
211 | video-uniform preprocessing and augmentation.
212 | 
213 | 5. Conclusion
214 | ~~~~~~~~~~~~~
215 | 
216 | A proper code-based explanation on how to use VideoFrameDataset for
217 | training is provided in ``https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch/blob/main/demo.py``
218 | 
219 | 6. Acknowledgements
220 | ~~~~~~~~~~~~~~~~~~~
221 | 
222 | We thank the authors of TSN for their
223 | `codebase <https://github.com/yjxiong/tsn-pytorch>`__, from which we
224 | took VideoFrameDataset and adapted it.
225 | 
--------------------------------------------------------------------------------
/video_dataset.py:
--------------------------------------------------------------------------------
1 | import os
2 | import os.path
3 | import numpy as np
4 | from PIL import Image
5 | from torchvision import transforms
6 | import torch
7 | from typing import List, Union, Tuple, Any
8 | 
9 | 
10 | class VideoRecord(object):
11 |     """
12 |     Helper class for class VideoFrameDataset. This class
13 |     represents a video sample's metadata.
14 | 
15 |     Args:
16 |         root_datapath: the system path to the root folder
17 |                        of the videos.
18 | row: A list with four or more elements where 1) The first 19 | element is the path to the video sample's frames excluding 20 | the root_datapath prefix 2) The second element is the starting frame id of the video 21 | 3) The third element is the inclusive ending frame id of the video 22 | 4) The fourth element is the label index. 23 | 5) any following elements are labels in the case of multi-label classification 24 | """ 25 | def __init__(self, row, root_datapath): 26 | self._data = row 27 | self._path = os.path.join(root_datapath, row[0]) 28 | 29 | 30 | @property 31 | def path(self) -> str: 32 | return self._path 33 | 34 | @property 35 | def num_frames(self) -> int: 36 | return self.end_frame - self.start_frame + 1 # +1 because end frame is inclusive 37 | @property 38 | def start_frame(self) -> int: 39 | return int(self._data[1]) 40 | 41 | @property 42 | def end_frame(self) -> int: 43 | return int(self._data[2]) 44 | 45 | @property 46 | def label(self) -> Union[int, List[int]]: 47 | # just one label_id 48 | if len(self._data) == 4: 49 | return int(self._data[3]) 50 | # sample associated with multiple labels 51 | else: 52 | return [int(label_id) for label_id in self._data[3:]] 53 | 54 | class VideoFrameDataset(torch.utils.data.Dataset): 55 | r""" 56 | A highly efficient and adaptable dataset class for videos. 57 | Instead of loading every frame of a video, 58 | loads x RGB frames of a video (sparse temporal sampling) and evenly 59 | chooses those frames from start to end of the video, returning 60 | a list of x PIL images or ``FRAMES x CHANNELS x HEIGHT x WIDTH`` 61 | tensors where FRAMES=x if the ``ImglistToTensor()`` 62 | transform is used. 63 | 64 | More specifically, the frame range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS 65 | segments and FRAMES_PER_SEGMENT consecutive frames are taken from each segment. 66 | 67 | Note: 68 | A demonstration of using this class can be seen 69 | in ``demo.py`` 70 | https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch 71 | 72 | Note: 73 | This dataset broadly corresponds to the frame sampling technique 74 | introduced in ``Temporal Segment Networks`` at ECCV2016 75 | https://arxiv.org/abs/1608.00859. 76 | 77 | 78 | Note: 79 | This class relies on receiving video data in a structure where 80 | inside a ``ROOT_DATA`` folder, each video lies in its own folder, 81 | where each video folder contains the frames of the video as 82 | individual files with a naming convention such as 83 | img_001.jpg ... img_059.jpg. 84 | For enumeration and annotations, this class expects to receive 85 | the path to a .txt file where each video sample has a row with four 86 | (or more in the case of multi-label, see README on Github) 87 | space separated values: 88 | ``VIDEO_FOLDER_PATH START_FRAME END_FRAME LABEL_INDEX``. 89 | ``VIDEO_FOLDER_PATH`` is expected to be the path of a video folder 90 | excluding the ``ROOT_DATA`` prefix. For example, ``ROOT_DATA`` might 91 | be ``home\data\datasetxyz\videos\``, inside of which a ``VIDEO_FOLDER_PATH`` 92 | might be ``jumping\0052\`` or ``sample1\`` or ``00053\``. 93 | 94 | Args: 95 | root_path: The root path in which video folders lie. 96 | this is ROOT_DATA from the description above. 97 | annotationfile_path: The .txt annotation file containing 98 | one row per video sample as described above. 99 | num_segments: The number of segments the video should 100 | be divided into to sample frames from. 101 | frames_per_segment: The number of frames that should 102 | be loaded per segment. 
For each segment's 103 | frame-range, a random start index or the 104 | center is chosen, from which frames_per_segment 105 | consecutive frames are loaded. 106 | imagefile_template: The image filename template that video frame files 107 | have inside of their video folders as described above. 108 | transform: Transform pipeline that receives a list of PIL images/frames. 109 | test_mode: If True, frames are taken from the center of each 110 | segment, instead of a random location in each segment. 111 | 112 | """ 113 | def __init__(self, 114 | root_path: str, 115 | annotationfile_path: str, 116 | num_segments: int = 3, 117 | frames_per_segment: int = 1, 118 | imagefile_template: str='img_{:05d}.jpg', 119 | transform = None, 120 | test_mode: bool = False): 121 | super(VideoFrameDataset, self).__init__() 122 | 123 | self.root_path = root_path 124 | self.annotationfile_path = annotationfile_path 125 | self.num_segments = num_segments 126 | self.frames_per_segment = frames_per_segment 127 | self.imagefile_template = imagefile_template 128 | self.transform = transform 129 | self.test_mode = test_mode 130 | 131 | self._parse_annotationfile() 132 | self._sanity_check_samples() 133 | 134 | def _load_image(self, directory: str, idx: int) -> Image.Image: 135 | return Image.open(os.path.join(directory, self.imagefile_template.format(idx))).convert('RGB') 136 | 137 | def _parse_annotationfile(self): 138 | self.video_list = [VideoRecord(x.strip().split(), self.root_path) for x in open(self.annotationfile_path)] 139 | 140 | def _sanity_check_samples(self): 141 | for record in self.video_list: 142 | if record.num_frames <= 0 or record.start_frame == record.end_frame: 143 | print(f"\nDataset Warning: video {record.path} seems to have zero RGB frames on disk!\n") 144 | 145 | elif record.num_frames < (self.num_segments * self.frames_per_segment): 146 | print(f"\nDataset Warning: video {record.path} has {record.num_frames} frames " 147 | f"but the dataloader is set up to load " 148 | f"(num_segments={self.num_segments})*(frames_per_segment={self.frames_per_segment})" 149 | f"={self.num_segments * self.frames_per_segment} frames. Dataloader will throw an " 150 | f"error when trying to load this video.\n") 151 | 152 | def _get_start_indices(self, record: VideoRecord) -> 'np.ndarray[int]': 153 | """ 154 | For each segment, choose a start index from where frames 155 | are to be loaded from. 156 | 157 | Args: 158 | record: VideoRecord denoting a video sample. 159 | Returns: 160 | List of indices of where the frames of each 161 | segment are to be loaded from. 162 | """ 163 | # choose start indices that are perfectly evenly spread across the video frames. 164 | if self.test_mode: 165 | distance_between_indices = (record.num_frames - self.frames_per_segment + 1) / float(self.num_segments) 166 | 167 | start_indices = np.array([int(distance_between_indices / 2.0 + distance_between_indices * x) 168 | for x in range(self.num_segments)]) 169 | # randomly sample start indices that are approximately evenly spread across the video frames. 
170 |         else:
171 |             max_valid_start_index = (record.num_frames - self.frames_per_segment + 1) // self.num_segments
172 | 
173 |             start_indices = np.multiply(list(range(self.num_segments)), max_valid_start_index) + \
174 |                             np.random.randint(max_valid_start_index, size=self.num_segments)
175 | 
176 |         return start_indices
177 | 
178 |     def __getitem__(self, idx: int) -> Union[
179 |         Tuple[List[Image.Image], Union[int, List[int]]],
180 |         Tuple['torch.Tensor[num_frames, channels, height, width]', Union[int, List[int]]],
181 |         Tuple[Any, Union[int, List[int]]],
182 |     ]:
183 |         """
184 |         For video with id idx, loads self.NUM_SEGMENTS * self.FRAMES_PER_SEGMENT
185 |         frames from evenly chosen locations across the video.
186 | 
187 |         Args:
188 |             idx: Video sample index.
189 |         Returns:
190 |             A tuple of (video, label). Label is either a single
191 |             integer or a list of integers in the case of multiple labels.
192 |             Video is either 1) a list of PIL images if no transform is used,
193 |             2) a batch of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0,1]
194 |             if the transform "ImglistToTensor" is used,
195 |             3) or anything else if a custom transform is used.
196 |         """
197 |         record: VideoRecord = self.video_list[idx]
198 | 
199 |         frame_start_indices: 'np.ndarray[int]' = self._get_start_indices(record)
200 | 
201 |         return self._get(record, frame_start_indices)
202 | 
203 |     def _get(self, record: VideoRecord, frame_start_indices: 'np.ndarray[int]') -> Union[
204 |         Tuple[List[Image.Image], Union[int, List[int]]],
205 |         Tuple['torch.Tensor[num_frames, channels, height, width]', Union[int, List[int]]],
206 |         Tuple[Any, Union[int, List[int]]],
207 |     ]:
208 |         """
209 |         Loads the frames of a video at the corresponding
210 |         indices.
211 | 
212 |         Args:
213 |             record: VideoRecord denoting a video sample.
214 |             frame_start_indices: Indices from which to load consecutive frames.
215 |         Returns:
216 |             A tuple of (video, label). Label is either a single
217 |             integer or a list of integers in the case of multiple labels.
218 |             Video is either 1) a list of PIL images if no transform is used,
219 |             2) a batch of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0,1]
220 |             if the transform "ImglistToTensor" is used,
221 |             3) or anything else if a custom transform is used.
222 |         """
223 | 
224 |         frame_start_indices = frame_start_indices + record.start_frame
225 |         images = list()
226 | 
227 |         # from each start_index, load self.frames_per_segment
228 |         # consecutive frames
229 |         for start_index in frame_start_indices:
230 |             frame_index = int(start_index)
231 | 
232 |             # load self.frames_per_segment consecutive frames
233 |             for _ in range(self.frames_per_segment):
234 |                 image = self._load_image(record.path, frame_index)
235 |                 images.append(image)
236 | 
237 |                 if frame_index < record.end_frame:
238 |                     frame_index += 1
239 | 
240 |         if self.transform is not None:
241 |             images = self.transform(images)
242 | 
243 |         return images, record.label
244 | 
245 |     def __len__(self):
246 |         return len(self.video_list)
247 | 
248 | class ImglistToTensor(torch.nn.Module):
249 |     """
250 |     Converts a list of PIL images in the range [0,255] to a torch.FloatTensor
251 |     of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0,1].
252 |     Can be used as first transform for ``VideoFrameDataset``.
253 |     """
254 |     @staticmethod
255 |     def forward(img_list: List[Image.Image]) -> 'torch.Tensor[NUM_IMAGES, CHANNELS, HEIGHT, WIDTH]':
256 |         """
257 |         Converts each PIL image in a list to
258 |         a torch Tensor and stacks them into
259 |         a single tensor.
260 | 
261 |         Args:
262 |             img_list: list of PIL images.
263 |         Returns:
264 |             tensor of size ``NUM_IMAGES x CHANNELS x HEIGHT x WIDTH``
265 |         """
266 |         return torch.stack([transforms.functional.to_tensor(pic) for pic in img_list])
267 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Efficient Video Dataset Loading and Augmentation in PyTorch
2 | Author: [Raivo Koot](https://github.com/RaivoKoot)
3 | https://video-dataset-loading-pytorch.readthedocs.io/en/latest/VideoDataset.html
4 | If you find the code useful, please star the repository.
5 | 
6 | If you are completely unfamiliar with loading datasets in PyTorch using `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`, I recommend
7 | getting familiar with these first through [this](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) or
8 | [this](https://github.com/utkuozbulak/pytorch-custom-dataset-examples).
9 | 
10 | ### In a Nutshell
11 | Video-Dataset-Loading-Pytorch provides the lowest entry barrier for setting up deep learning training loops on video data. It makes working with video datasets easy and accessible (also efficient!). It only requires you to have your video dataset in a certain format on disk and takes care of the rest. It has no complicated dependencies and supports native Torchvision video data augmentation.
12 | 
13 | ### Overview: This small library solely provides the class `VideoFrameDataset`
14 | The VideoFrameDataset class (an implementation of `torch.utils.data.Dataset`) serves to `easily`, `efficiently` and `effectively` load video samples from video datasets in PyTorch.
15 | 1) Easily because this dataset class can be used with custom datasets with minimal effort and no modification. The class merely expects the
16 | video dataset to have a certain structure on disk and expects a .txt annotation file that enumerates each video sample. Details on this
17 | can be found below. Pre-made annotation files and preparation scripts are also provided for [Kinetics 400](https://github.com/cvdfoundation/kinetics-dataset), [Something Something V2](https://20bn.com/datasets/something-something) and [Epic Kitchens 100](https://epic-kitchens.github.io/2021).
18 | 2) Efficiently because the video loading pipeline that this class implements is very fast. This minimizes GPU waiting time during training by eliminating CPU input bottlenecks that can slow training down several-fold.
19 | 3) Effectively because the implemented sampling strategy for video frames is very representative. Video training using the entire sequence of
20 | video frames (often several hundred) is too memory- and compute-intensive. Therefore, this implementation samples frames evenly from the video (sparse temporal sampling)
21 | so that the loaded frames represent every part of the video, with support for arbitrary and differing video lengths within the same dataset.
22 | This approach has been shown to be very effective and is taken from
23 | ["Temporal Segment Networks (ECCV2016)"](https://arxiv.org/abs/1608.00859) with modifications.
24 | 
25 | In conjunction with PyTorch's DataLoader, the VideoFrameDataset class returns video batch tensors of size `BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH`.
26 | 
27 | For a demo, visit `demo.py`.
28 | 
29 | ### QuickDemo (demo.py)
30 | ```python
31 | root = os.path.join(os.getcwd(), 'demo_dataset')  # Folder in which all videos lie in a specific structure
32 | annotation_file = os.path.join(root, 'annotations.txt')  # A row for each video sample as: (VIDEO_PATH START_FRAME END_FRAME CLASS_ID)
33 | 
34 | """ DEMO 1 WITHOUT IMAGE TRANSFORMS """
35 | dataset = VideoFrameDataset(
36 |     root_path=root,
37 |     annotationfile_path=annotation_file,
38 |     num_segments=5,
39 |     frames_per_segment=1,
40 |     imagefile_template='img_{:05d}.jpg',
41 |     transform=None,
42 |     test_mode=False
43 | )
44 | 
45 | sample = dataset[0]  # take first sample of dataset
46 | frames = sample[0]   # list of PIL images
47 | label = sample[1]    # integer label
48 | 
49 | for image in frames:
50 |     plt.imshow(image)
51 |     plt.title(label)
52 |     plt.show()
53 |     plt.pause(1)
54 | ```
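The snippet above is an excerpt from `demo.py` and omits its imports. A minimal set of imports that makes it runnable on its own (assuming `video_dataset.py` from this repository is in your working directory) would be:
```python
import os
import matplotlib.pyplot as plt
from video_dataset import VideoFrameDataset
```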
55 | ![alt text](https://github.com/RaivoKoot/images/blob/main/Action_Video.jpg "Action Video")
56 | # Table of Contents
57 | - [1. Requirements](#1-requirements)
58 | - [2. Custom Dataset](#2-custom-dataset)
59 | - [3. Video Frame Sampling Method](#3-video-frame-sampling-method)
60 | - [4. Alternate Video Frame Sampling Methods](#4-alternate-video-frame-sampling-methods)
61 | - [5. Using VideoFrameDataset for Training](#5-using-videoframedataset-for-training)
62 | - [6. Allowing Multiple Labels per Sample](#6-allowing-multiple-labels-per-sample)
63 | - [7. Conclusion](#7-conclusion)
64 | - [8. Kinetics 400 & Something Something V2 & EPIC-KITCHENS-100](#8-kinetics-400--something-something-v2--epic-kitchens-100)
65 | - [9. Upcoming Features](#9-upcoming-features)
66 | - [10. Acknowledgements](#10-acknowledgements)
67 | 
68 | ### 1. Requirements
69 | ```
70 | # Without these three, VideoFrameDataset will not work.
71 | torchvision >= 0.8.0
72 | torch >= 1.7.0
73 | python >= 3.6
74 | ```
75 | ### 2. Custom Dataset
76 | (This description explains using custom datasets where each sample has a single class label. If you want to know how to
77 | use a dataset where a sample can have more than a single class label, read this anyway and then read `6.` below.)
78 | 
79 | To use any dataset, two conditions must be met.
80 | 1) The video data must be supplied as RGB frames, each frame saved as an image file. Each video must have its own folder, in which the frames of
81 | that video lie. The frames of a video inside its folder must be named uniformly with consecutive indices such as `img_00001.jpg` ... `img_00120.jpg`, if there are 120 frames.
82 | Indices can start at zero or any other number and the exact file name template can be chosen freely. The filename template
83 | for frames in this example is "img_{:05d}.jpg" (python string formatting, specifying 5 digits after the underscore), and must be supplied to the
84 | constructor of VideoFrameDataset as a parameter. Each video folder must lie inside some `root` folder.
85 | 2) To enumerate all video samples in the dataset and their required metadata, a `.txt` annotation file must be created that contains a row for each
86 | video clip sample in the dataset. The training, validation, and testing datasets must have separate annotation files. Each row must be a space-separated list that contains
87 | `VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX`. The `VIDEO_PATH` of a video sample should be provided without the `root` prefix of this dataset. A small script, like the sketch below, can generate this file for you.
88 | 
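Writing the annotation file by hand quickly gets tedious for real datasets. The following is a minimal sketch (not part of this repository) that generates `annotations.txt` for the class-folder layout shown below; it assumes frames are numbered consecutively starting at 1 and that alphabetical class-folder order defines the label ids:
```python
# Sketch: auto-generate annotations.txt for a root/class_name/video_id/ layout.
import os

root = 'demo_dataset'
class_names = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))

with open(os.path.join(root, 'annotations.txt'), 'w') as f:
    for label_id, class_name in enumerate(class_names):
        class_dir = os.path.join(root, class_name)
        for video_id in sorted(os.listdir(class_dir)):
            # END_FRAME is inclusive, so with frames starting at 1 it equals the frame count
            num_frames = len(os.listdir(os.path.join(class_dir, video_id)))
            f.write(f'{class_name}/{video_id} 1 {num_frames} {label_id}\n')
```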
89 | This example project demonstrates this using a dummy dataset inside of `demo_dataset/`, which is the `root` dataset folder of this example. The folder
90 | structure looks as follows:
91 | ```
92 | demo_dataset
93 | │
94 | ├───annotations.txt
95 | ├───jumping        # arbitrary class folder naming
96 | │   ├───0001       # arbitrary video folder naming
97 | │   │   ├───img_00001.jpg
98 | │   │   .
99 | │   │   └───img_00017.jpg
100 | │   └───0002
101 | │       ├───img_00001.jpg
102 | │       .
103 | │       └───img_00018.jpg
104 | │
105 | └───running        # arbitrary folder naming
106 |     ├───0001       # arbitrary video folder naming
107 |     │   ├───img_00001.jpg
108 |     │   .
109 |     │   └───img_00015.jpg
110 |     └───0002
111 |         ├───img_00001.jpg
112 |         .
113 |         └───img_00015.jpg
114 | 
115 | 
116 | ```
117 | The accompanying annotation `.txt` file contains the following rows (PATH, START_FRAME, END_FRAME, LABEL_ID):
118 | ```
119 | jumping/0001 1 17 0
120 | jumping/0002 1 18 0
121 | running/0001 1 15 1
122 | running/0002 1 15 1
123 | ```
124 | Another annotation file that uses multiple clips from each video could be
125 | ```
126 | jumping/0001 1 8 0
127 | jumping/0001 5 17 0
128 | jumping/0002 1 18 0
129 | running/0001 10 15 1
130 | running/0001 5 10 1
131 | running/0002 1 15 1
132 | ```
133 | (END_FRAME is inclusive.)
134 | 
135 | Another, simpler example of the way your dataset's RGB frames can be organized on disk is the following:
136 | ```
137 | demo_dataset
138 | │
139 | ├───annotations.txt
140 | └───rgb
141 |     ├───video_1
142 |     │   ├───img_00001.jpg
143 |     │   .
144 |     │   └───img_00017.jpg
145 |     ├───video_2
146 |     │   ├───img_00001.jpg
147 |     │   .
148 |     │   └───img_00044.jpg
149 |     └───video_3
150 |         ├───img_00001.jpg
151 |         .
152 |         └───img_00023.jpg
153 | 
154 | 
155 | ```
156 | The accompanying annotation `.txt` file contains the following rows (PATH, START_FRAME, END_FRAME, LABEL_ID):
157 | ```
158 | video_1 1 17 1
159 | video_2 1 44 0
160 | video_3 1 23 0
161 | ```
162 | 
163 | Instantiating a VideoFrameDataset with the `root_path` parameter pointing to `demo_dataset/rgb/`, the `annotationfile_path` parameter pointing to the annotation file `demo_dataset/annotations.txt`, and
164 | the `imagefile_template` parameter as "img_{:05d}.jpg", is all that it takes to start using the VideoFrameDataset class.
165 | 
166 | ### 3. Video Frame Sampling Method
167 | When loading a video, only a number of its frames are loaded. They are chosen in the following way:
168 | 1. The frame index range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS even segments. From each segment, a random start index is sampled, from which FRAMES_PER_SEGMENT consecutive indices are loaded.
169 | This results in NUM_SEGMENTS*FRAMES_PER_SEGMENT chosen indices, whose frames are loaded as PIL images and put into a list and returned when calling
170 | `dataset[i]`.
171 | ![alt text](https://github.com/RaivoKoot/images/blob/main/Sparse_Temporal_Sampling.jpg "Sparse-Temporal-Sampling-Strategy")
172 | 
173 | ### 4. Alternate Video Frame Sampling Methods
174 | If you do not want to use sparse temporal sampling and instead want to sample a single N-frame continuous
175 | clip from a video, this is possible. Set `NUM_SEGMENTS=1` and `FRAMES_PER_SEGMENT=N`. Because VideoFrameDataset
176 | will choose a random start index per segment and take `FRAMES_PER_SEGMENT` continuous frames from each sampled start
177 | index, this will result in a single N-frame continuous clip per video that starts at a random index.
178 | An example of this is in `demo.py`, and a short sketch follows below.
179 | 
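For instance, a minimal sketch (reusing the `root` and `annotation_file` paths from the QuickDemo above) that samples one continuous 9-frame clip per video:
```python
N = 9  # length of the continuous clip

dataset = VideoFrameDataset(
    root_path=root,
    annotationfile_path=annotation_file,
    num_segments=1,          # a single segment spanning the whole video
    frames_per_segment=N,    # N consecutive frames from one random start index
    imagefile_template='img_{:05d}.jpg',
    test_mode=False,         # True would always take the center clip instead
)
```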
180 | ### 5. Using VideoFrameDataset for Training
181 | As demonstrated in `demo.py`, we can use PyTorch's `torch.utils.data.DataLoader` class with VideoFrameDataset to take care of shuffling, batching, and more.
182 | To turn the lists of PIL images returned by VideoFrameDataset into tensors, the transform `video_dataset.ImglistToTensor()` can be supplied
183 | as the `transform` parameter to VideoFrameDataset. This turns a list of N PIL images into a batch of images/frames of shape `N x CHANNELS x HEIGHT x WIDTH`.
184 | We can further chain preprocessing and augmentation functions that act on batches of images onto the end of `ImglistToTensor()`, as seen in `demo.py`.
185 | 
186 | As of `torchvision 0.8.0`, all torchvision transforms can also operate on batches of images, and they apply deterministic or random transformations
187 | identically to all images of the batch. Because a single video tensor (FRAMES x CHANNELS x HEIGHT x WIDTH)
188 | has the same shape as an image batch tensor (BATCH x CHANNELS x HEIGHT x WIDTH), any torchvision transform can be used here to apply video-uniform preprocessing and augmentation.
189 | 
190 | REMEMBER:
191 | PyTorch transforms are applied to individual dataset samples (in this case a list of PIL images of a video, or a video-frame tensor after `ImglistToTensor()`) before
192 | batching. So, any transforms used here must expect their input to be a frame tensor of shape `FRAMES x CHANNELS x HEIGHT x WIDTH`, or a list of PIL images if `ImglistToTensor()` is not used.
193 | 
194 | ### 6. Allowing Multiple Labels per Sample
195 | Your dataset labels might be more complicated than just a single label id per sample. For example, in the EPIC-KITCHENS dataset
196 | each video clip has a verb class, noun class, and action class. In this case, each sample is associated with three label ids.
197 | To accommodate datasets where a sample can have N integer labels, `annotations.txt` files can be used where each row
198 | is space-separated `PATH, FRAME_START, FRAME_END, LABEL_1_ID, ..., LABEL_N_ID`, instead of
199 | `PATH, FRAME_START, FRAME_END, LABEL_ID`. The VideoFrameDataset class
200 | can handle this type of annotation file too, without changing anything apart from the rows in your `annotations.txt`.
201 | 
202 | The `annotations.txt` file for a dataset where multiple clip samples can come from the same video and each sample has
203 | three labels would have rows like `PATH, START_FRAME, END_FRAME, LABEL1, LABEL2, LABEL3`, as seen below:
204 | ```
205 | jumping/0001 1 8 0 2 1
206 | jumping/0001 5 17 0 10 3
207 | jumping/0002 1 18 0 5 3
208 | running/0001 10 15 1 3 3
209 | running/0001 5 10 1 1 0
210 | running/0002 1 15 1 12 4
211 | ```
212 | 
213 | When you use `torch.utils.data.DataLoader` with VideoFrameDataset to retrieve your batches during
214 | training, the dataloader no longer returns batches as a `( (BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH), (BATCH) )` tuple, where the second item is
215 | just a list/tensor of the batch's labels. Instead, the second item is replaced with the tuple
216 | `( (BATCH) ... (BATCH) )`, where the first BATCH-sized list gives label_1 for the whole batch, and the last BATCH-sized
217 | list gives label_n for the whole batch, as sketched below.
218 | 
219 | A demo of this can be found at the end of `demo.py`. It uses the dummy dataset in directory `demo_dataset_multilabel`.
220 | 
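Concretely, here is a minimal sketch (assuming the three-label annotation rows above and the default sampling parameters) of iterating over such batches:
```python
import torch
from video_dataset import VideoFrameDataset, ImglistToTensor

dataset = VideoFrameDataset(
    root_path='demo_dataset_multilabel',
    annotationfile_path='demo_dataset_multilabel/annotations.txt',
    transform=ImglistToTensor(),  # so default collation can stack the videos
)
loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)

for video_batch, (labels1, labels2, labels3) in loader:
    # video_batch: BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH
    # labels1, labels2, labels3: one BATCH-sized tensor per label position
    print(video_batch.shape, labels1, labels2, labels3)
    break
```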
221 | ### 7. Conclusion
222 | A proper code-based explanation of how to use VideoFrameDataset for training is provided in `demo.py`.
223 | 
224 | ### 8. Kinetics 400 & Something Something V2 & EPIC-KITCHENS-100
225 | After you have read Sections 1 to 7, you can use the pre-made conversion scripts and annotation files in this repository to get instantly started with the Kinetics 400, Something Something V2, and EPIC-KITCHENS-100 datasets. To get started with any of them, read the README inside the `Kinetics400`, `EpicKitchens100` or `SomethingSomethingV2` directory.
226 | 
227 | ### 9. Upcoming Features
228 | - [x] Include compatible annotation files for common datasets, such as Something-Something-V2, EPIC-KITCHENS-100 and Kinetics, so that users do not need to spend their own time converting those datasets' annotation files to be compatible with this repository.
229 | - [x] Add demo for sampling a single continuous-frame clip from videos.
230 | - [x] Add support for arbitrary labels that are more than just a single integer.
231 | - [x] Add support for specifying START_FRAME and END_FRAME for a video instead of NUM_FRAMES.
232 | - [x] Improve the handling of edge cases where NUM_SEGMENTS*FRAMES_PER_SEGMENT (or similar) might be larger than the number of frames in a video (a warning message is printed now).
233 | - [x] Clean up some of the internal code that is still very messy, which was taken from the codebase below.
234 | - [ ] Create a version of this implementation that uses OpenCV instead of PIL for frame loading, so that you can use Albumentations transforms instead of Torchvision transforms.
235 | 
236 | ### 10. Acknowledgements
237 | We thank the authors of TSN for their [codebase](https://github.com/yjxiong/tsn-pytorch), from which we took VideoFrameDataset and adapted it
238 | for general use and compatibility.
239 | ```
240 | @InProceedings{wang2016_TemporalSegmentNetworks,
241 |     title={Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
242 |     author={Limin Wang and Yuanjun Xiong and Zhe Wang and Yu Qiao and Dahua Lin and
243 |             Xiaoou Tang and Luc {Van Gool}},
244 |     booktitle={The European Conference on Computer Vision (ECCV)},
245 |     year={2016}
246 | }
247 | ```
248 | 
--------------------------------------------------------------------------------