├── FEATURE_ZOO.md
├── GUIDELINES.md
├── LICENSE
├── MODEL_ZOO.md
├── README.md
├── configs
├── pool
│ ├── backbone
│ │ ├── csn.yaml
│ │ ├── localization-conv.yaml
│ │ ├── r2d3ds.yaml
│ │ ├── r2p1d.yaml
│ │ ├── s3dg.yaml
│ │ ├── slowfast_4x16.yaml
│ │ ├── slowfast_8x8.yaml
│ │ ├── tada2d.yaml
│ │ ├── tadaconvnextv2_base.yaml
│ │ ├── tadaconvnextv2_small.yaml
│ │ ├── tadaconvnextv2_tiny.yaml
│ │ ├── tadaformer_b16.yaml
│ │ ├── tadaformer_l14.yaml
│ │ ├── timesformer.yaml
│ │ ├── vivit.yaml
│ │ └── vivit_fac_enc.yaml
│ ├── base.yaml
│ └── run
│ │ └── training
│ │ ├── finetune.yaml
│ │ ├── from_scratch.yaml
│ │ ├── from_scratch_large.yaml
│ │ ├── localization.yaml
│ │ └── mosi.yaml
└── projects
│ ├── epic-kitchen-ar
│ ├── csn_ek100.yaml
│ ├── csn_ek100_submission.yaml
│ ├── ek100
│ │ ├── csn.yaml
│ │ ├── csn_submit.yaml
│ │ ├── csn_test.yaml
│ │ ├── vivit_fac_enc.yaml
│ │ ├── vivit_fac_enc_submit.yaml
│ │ └── vivit_fac_enc_test.yaml
│ ├── k400
│ │ ├── vivit_fac_enc_b16x2.yaml
│ │ └── vivit_fac_enc_b16x2_test.yaml
│ ├── vivit_fac_enc_ek100.yaml
│ ├── vivit_fac_enc_ek100_submission.yaml
│ └── vivit_fac_enc_k400.yaml
│ ├── epic-kitchen-tal
│ ├── bmn-epic
│ │ └── vivit-os-local.yaml
│ └── bmn_epic.yaml
│ ├── mosi
│ ├── baselines
│ │ ├── r2d3ds_hmdb.yaml
│ │ ├── r2d3ds_ucf.yaml
│ │ ├── r2p1d_hmdb.yaml
│ │ └── r2p1d_ucf.yaml
│ ├── ft-hmdb
│ │ ├── r2d3ds.yaml
│ │ ├── r2d3ds_test.yaml
│ │ ├── r2p1d.yaml
│ │ └── r2p1d_test.yaml
│ ├── ft-ucf
│ │ ├── r2d3ds.yaml
│ │ ├── r2d3ds_test.yaml
│ │ ├── r2p1d.yaml
│ │ └── r2p1d_test.yaml
│ ├── ft_r2d3ds_hmdb.yaml
│ ├── ft_r2d3ds_ucf.yaml
│ ├── ft_r2p1d_hmdb.yaml
│ ├── ft_r2p1d_ucf.yaml
│ ├── mosi_r2d3ds_hmdb.yaml
│ ├── mosi_r2d3ds_imagenet.yaml
│ ├── mosi_r2d3ds_ucf.yaml
│ ├── mosi_r2p1d_hmdb.yaml
│ ├── mosi_r2p1d_ucf.yaml
│ ├── pt-hmdb
│ │ ├── r2d3ds.yaml
│ │ └── r2p1d.yaml
│ ├── pt-imagenet
│ │ └── r2d3ds.yaml
│ └── pt-ucf
│ │ ├── r2d3ds.yaml
│ │ └── r2p1d.yaml
│ ├── tada
│ ├── k400
│ │ ├── tada2d_16x5.yaml
│ │ └── tada2d_8x8.yaml
│ ├── ssv2
│ │ ├── tada2d_16f.yaml
│ │ └── tada2d_8f.yaml
│ ├── tada2d_k400.yaml
│ └── tada2d_ssv2.yaml
│ ├── tadaconvnextv2
│ ├── tadaconvnextv2_base_k400_16f.yaml
│ ├── tadaconvnextv2_base_ssv2_16f.yaml
│ ├── tadaconvnextv2_small_k400_16f.yaml
│ ├── tadaconvnextv2_small_ssv2_16f.yaml
│ ├── tadaconvnextv2_tiny_k400_16f.yaml
│ └── tadaconvnextv2_tiny_ssv2_16f.yaml
│ └── tadaformer
│ ├── tadaformer_b16_k400_16f.yaml
│ ├── tadaformer_b16_ssv2_16f.yaml
│ ├── tadaformer_l14_k400_16f.yaml
│ └── tadaformer_l14_ssv2_16f.yaml
├── projects
├── epic-kitchen-ar
│ └── README.md
├── epic-kitchen-tal
│ └── README.md
├── mosi
│ ├── MoSI.png
│ └── README.md
├── tada
│ ├── README.md
│ └── TAda2D.png
└── tadaconvv2
│ ├── README.md
│ └── TAdaConvV2.png
├── runs
├── run.py
├── submission_test.py
├── test.py
├── test_epic_localization.py
└── train.py
└── tadaconv
├── datasets
├── __init__.py
├── base
│ ├── __init__.py
│ ├── base_dataset.py
│ ├── builder.py
│ ├── epickitchen100.py
│ ├── epickitchen100_feature.py
│ ├── hmdb51.py
│ ├── imagenet.py
│ ├── kinetics400.py
│ ├── ssv2.py
│ └── ucf101.py
└── utils
│ ├── __init__.py
│ ├── auto_augment.py
│ ├── collate_functions.py
│ ├── mixup.py
│ ├── preprocess_ssv2.py
│ ├── random_erasing.py
│ └── transformations.py
├── models
├── __init__.py
├── base
│ ├── __init__.py
│ ├── backbone.py
│ ├── base_blocks.py
│ ├── builder.py
│ ├── models.py
│ ├── slowfast.py
│ └── transformer.py
├── module_zoo
│ ├── __init__.py
│ ├── branches
│ │ ├── __init__.py
│ │ ├── csn_branch.py
│ │ ├── non_local.py
│ │ ├── r2d3d_branch.py
│ │ ├── r2plus1d_branch.py
│ │ ├── s3dg_branch.py
│ │ ├── slowfast_branch.py
│ │ ├── tada_branch.py
│ │ ├── tadaconvnextv2.py
│ │ └── tadaformer.py
│ ├── heads
│ │ ├── __init__.py
│ │ ├── bmn_head.py
│ │ ├── mosi_head.py
│ │ ├── slowfast_head.py
│ │ └── transformer_head.py
│ ├── ops
│ │ ├── __init__.py
│ │ ├── misc.py
│ │ ├── tadaconv.py
│ │ └── tadaconv_v2.py
│ └── stems
│ │ ├── __init__.py
│ │ ├── downsample_stem.py
│ │ ├── embedding_stem.py
│ │ └── r2plus1d_stem.py
└── utils
│ ├── init_helper.py
│ ├── lars.py
│ ├── localization_losses.py
│ ├── losses.py
│ ├── lr_policy.py
│ ├── model_ema.py
│ ├── optimizer.py
│ └── params.py
├── sslgenerators
├── __init__.py
├── builder.py
└── mosi
│ └── mosi_generator.py
└── utils
├── __init__.py
├── bboxes_1d.py
├── bucket.py
├── checkpoint.py
├── config.py
├── distributed.py
├── eval_tal
├── eval_epic_detection.py
└── eval_tal.py
├── launcher.py
├── logging.py
├── meters.py
├── metrics.py
├── misc.py
├── registry.py
├── sampler.py
├── tal_tools.py
├── tensor.py
├── timer.py
└── val_dist_sampler.py
/FEATURE_ZOO.md:
--------------------------------------------------------------------------------
1 | # FEATURE ZOO
2 |
3 | Here, we provide strong features for temporal action localization on HACS and Epic-Kitchens-100.
4 |
5 | | dataset | model | resolution | features | classification | average mAP |
6 | | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
7 | | EK100 | TAda2D | 8 x 8 | [features:code dc05](https://pan.baidu.com/s/1YS9yj_O21HedIxyh2PMrqw) | [classification:code 2j51](https://pan.baidu.com/s/1z7h7OAFR2UO_Q7t8dA6YbQ) | 13.18 (A) |
8 | | HACS | TAda2D | 8 x 8 | [features:code 23kv](https://pan.baidu.com/s/1FHkRFvJldtEmD8kzYw_yMQ) | - | 32.3 |
9 | | EK100 | ViViT Fact. Enc.-B16x2 | 32 x 2 | coming soon | coming soon | 18.30 (A) |
10 |
11 | Annotations used for temporal action localization with our codebase can be found [here:code r30w](https://pan.baidu.com/s/16CtY0zTIzgDpm7sjhCAA6w).
12 |
13 | Pre-trained localization models using these features can be found in the [MODEL_ZOO.md](MODEL_ZOO.md).
14 |
15 | ## Guideline
16 |
17 | ### Feature preparation
18 | After downloading the compressed feature files, first extract the `.pkl` files as follows. For example, for TAda2D HACS features:
19 |
20 | ```bash
21 | cat features_s16_fps30_val_2G.tar.gz?? | tar zx
22 | cat features_s16_fps30_train_2G.tar.gz?? | tar zx
23 | ```
24 |
25 | After running the above commands, you should have two folders, `features_s16_fps30_train` and `features_s16_fps30_val`, each containing `.pkl` files. Each `.pkl` file corresponds to one video.
26 |
27 | ### Feature loading
28 | To load the features, please use the `load_feature` function in `datasets/base/epickitchen100_feature.py`:
29 |
30 | ```python
31 | def load_feature(path):
32 | if type(path) is str:
33 | with open(path, 'rb') as f:
34 | data = torch.load(f)
35 | else:
36 | data = torch.load(path)
37 | return data
38 | ```
39 |
40 | ### Feature concatenation
41 | For **Epic-Kitchens-100**, we divide each video into multiple clips, each 5 seconds long. To perform action localization, features are first concatenated using the `_transform_feature_scale` function in `datasets/base/epickitchen100_feature.py`. For example, during training, if the action segment is `[8.5, 16.1]`, three clip features are required: `[[5.0, 10.0], [10.0, 15.0], [15.0, 20.0]]`. From these clip features, we obtain the features for the ground-truth action segment. For more details, please refer to [epickitchen100_feature.py](datasets/base/epickitchen100_feature.py).
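
The mapping from an action segment to its covering 5-second clips can be sketched as follows. This is only an illustration of the interval arithmetic (the function name and clip-boundary convention here are assumptions; the actual logic lives in `_transform_feature_scale` in `datasets/base/epickitchen100_feature.py`):

```python
import math

CLIP_LEN = 5.0  # each Epic-Kitchens-100 clip covers 5 seconds

def covering_clips(start, end, clip_len=CLIP_LEN):
    """Return the [start, end] boundaries of the clips covering a segment."""
    first = math.floor(start / clip_len)
    last = math.ceil(end / clip_len)
    return [[i * clip_len, (i + 1) * clip_len] for i in range(first, last)]

# For the segment [8.5, 16.1] above, this yields
# [[5.0, 10.0], [10.0, 15.0], [15.0, 20.0]].
```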
--------------------------------------------------------------------------------
/GUIDELINES.md:
--------------------------------------------------------------------------------
1 | # Guidelines for pytorch-video-understanding
2 |
3 | ## Installation
4 |
5 | Requirements:
6 | - Python>=3.6
7 | - torch>=1.5
8 | - torchvision (version corresponding with torch)
9 | - simplejson==3.11.1
10 | - decord>=0.6.0
11 | - pyyaml
12 | - einops
13 | - oss2
14 | - psutil
15 | - tqdm
16 | - pandas
17 |
18 | Optional requirements:
19 | - fvcore (for flops calculation)
20 |
21 | ## Data preparation
22 |
23 | For each dataset available in `datasets/base`, the name of its dataset list file is specified in the `_get_dataset_list_name` function.
24 | The table below summarizes the list file names and formats for all datasets.
25 |
26 | | dataset | split | list file name | format |
27 | | ------- | ----- | -------------- | ------ |
28 | | epic-kitchens-100 | train | EPIC_100_train.csv | as downloaded |
29 | | epic-kitchens-100 | val | EPIC_100_validation.csv | as downloaded |
30 | | epic-kitchens-100 | test | EPIC_100_test_timestamps.csv | as downloaded |
31 | | hmdb51 | train/val | hmdb51_train_list.txt/hmdb51_val_list.txt | "video_path, supervised_label" |
32 | | imagenet | train/val | imagenet_train.txt/imagenet_val.txt | "image_path, supervised_label" |
33 | | kinetics 400 | train/val | kinetics400_train_list.txt/kinetics400_val_list.txt | "video_path, supervised_label" |
34 | | ssv2 | train | something-something-v2-train-with-label.json | json file with "label_idx" specifying the class and "id" specifying the name |
35 | | ssv2 | val | something-something-v2-val-with-label.json | json file with "label_idx" specifying the class and "id" specifying the name |
36 | | ucf101 | train/val | ucf101_train_list.txt/ucf101_val_list.txt | "video_path, supervised_label" |
37 |
38 | For epic-kitchens-features, the file name is specified in the respective configs in `configs/projects/epic-kitchen-tal`.
39 |
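For the `"video_path, supervised_label"` formats above, a list file could be parsed roughly as follows. This is a hedged sketch (the delimiter and label handling are assumptions; the actual parsing lives in the dataset classes under `datasets/base`):

```python
def parse_list_file(lines):
    """Parse lines of the form "video_path, supervised_label"."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, label = line.rsplit(",", 1)
        samples.append((path.strip(), int(label)))
    return samples
```
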
40 | ### Preprocessing Something-Something-V2 dataset
41 |
42 | We found that the video decoder we use, [decord](https://github.com/dmlc/decord), has difficulty decoding the original `.webm` files. Therefore, we provide a script for converting the `.webm` files in the original something-something-v2 dataset to `.mp4` files. To do this, simply run:
43 |
44 | ```bash
45 | python datasets/utils/preprocess_ssv2_annos.py --anno --anno_path path_to_your_annotation
46 | python datasets/utils/preprocess_ssv2_annos.py --data --data_path path_to_your_ssv2_videos --data_out_path path_to_put_output_videos
47 | ```
48 |
49 | Make sure the annotation files are organized as follows:
50 | ```
51 | -- path_to_your_annotation
52 | -- something-something-v2-train.json
53 | -- something-something-v2-validation.json
54 | -- something-something-v2-labels.json
55 | ```
56 |
57 | ## Running
58 |
59 | The entry point for all runs is `runs/run.py`.
60 |
61 | Before running, some settings need to be configured in the config file.
62 | To support rapid development of new models and representation-learning approaches, the config files are organized hierarchically.
63 |
64 | Take TAda2D as an example: each experiment (such as TAda2D_8x8 on Kinetics 400, `configs/projects/tada/k400/tada2d_8x8.yaml`) inherits its config from the following hierarchy.
65 | ```
66 | --- base config file [configs/pool/base.yaml]
67 | --- base run config [configs/pool/run/training/from_scratch_large.yaml]
68 | --- base backbone config [configs/pool/backbone/tada2d.yaml]
69 | --- base experiment config [configs/projects/tada/tada2d_k400.yaml]
70 | --- current experiment config [configs/projects/tada/k400/tada2d_8x8.yaml]
71 | ```
72 | Generally, the base config file `configs/pool/base.yaml` contains all the keys used in this codebase, and a lower-level config overrides its base config whenever the same key appears in both files.
73 | A good practice is to set parameters shared across all experiments in the base experiment config, and parameters that differ between experiments in the current experiment config.
74 |
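The overriding behavior described above can be sketched as a recursive dictionary merge. This is only a minimal illustration, not the codebase's implementation (the actual configuration handling lives in `tadaconv/utils/config.py`):

```python
def merge_configs(base, override):
    """Merge two config dicts: keys in `override` win, nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)  # recurse into sub-configs
        else:
            merged[key] = value  # bottom config overwrites its base
    return merged
```
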
75 | For an example run, open `configs/projects/tada/tada2d_k400.yaml`:
76 | A. Set `DATA.DATA_ROOT_DIR` and `DATA.ANNO_DIR` to point to the Kinetics 400 data and annotations.
77 | B. Set `NUM_GPUS` to the number of available GPUs.
78 | Then the codebase can be run with:
79 | ```bash
80 | python runs/run.py --cfg configs/projects/tada/k400/tada2d_8x8.yaml
81 | ```
--------------------------------------------------------------------------------
/configs/pool/backbone/csn.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: irCSN
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: 152
6 | META_ARCH: ResNet3D
7 | NUM_FILTERS: [64, 256, 512, 1024, 2048]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 2048
10 | KERNEL_SIZE: [
11 | [3, 7, 7],
12 | [3, 3, 3],
13 | [3, 3, 3],
14 | [3, 3, 3],
15 | [3, 3, 3]
16 | ]
17 | DOWNSAMPLING: [true, false, true, true, true]
18 | DOWNSAMPLING_TEMPORAL: [false, false, true, true, true]
19 | NUM_STREAMS: 1
20 | EXPANSION_RATIO: 4
21 | BRANCH:
22 | NAME: CSNBranch
23 | STEM:
24 | NAME: DownSampleStem
25 | NONLOCAL:
26 | ENABLE: false
27 | STAGES: [5]
28 | MASK_ENABLE: false
29 | HEAD:
30 | NAME: BaseHead
31 | ACTIVATION: softmax
32 | DROPOUT_RATE: 0
33 | NUM_CLASSES: # !!!
34 |
--------------------------------------------------------------------------------
/configs/pool/backbone/localization-conv.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: BaseVideoModel
3 | VIDEO:
4 | DIM1D: 256
5 | DIM2D: 128
6 | DIM3D: 512
7 | BACKBONE_LAYER: 2
8 | BACKBONE_GROUPS_NUM: 4
9 | BACKBONE:
10 | META_ARCH: SimpleLocalizationConv
--------------------------------------------------------------------------------
/configs/pool/backbone/r2d3ds.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: R2D3D
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: 18
6 | META_ARCH: ResNet3D
7 | NUM_FILTERS: [64, 64, 128, 256, 256]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 256
10 | KERNEL_SIZE: [
11 | [1, 7, 7],
12 | [1, 3, 3],
13 | [1, 3, 3],
14 | [3, 3, 3],
15 | [3, 3, 3]
16 | ]
17 | DOWNSAMPLING: [true, false, true, true, true]
18 | DOWNSAMPLING_TEMPORAL: [false, false, false, true, true]
19 | NUM_STREAMS: 1
20 | EXPANSION_RATIO: 2
21 | BRANCH:
22 | NAME: R2D3DBranch
23 | STEM:
24 | NAME: DownSampleStem
25 | NONLOCAL:
26 | ENABLE: false
27 | STAGES: [5]
28 | MASK_ENABLE: false
29 | HEAD:
30 | NAME: BaseHead
31 | ACTIVATION: softmax
32 | DROPOUT_RATE: 0
33 | NUM_CLASSES: # !!!
34 |
--------------------------------------------------------------------------------
/configs/pool/backbone/r2p1d.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: R2Plus1D
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: 10
6 | META_ARCH: ResNet3D
7 | NUM_INPUT_CHANNELS: 3
8 | NUM_FILTERS: [64, 64, 128, 256, 512]
9 | NUM_OUT_FEATURES: 512
10 | KERNEL_SIZE: [
11 | [3, 7, 7],
12 | [3, 3, 3],
13 | [3, 3, 3],
14 | [3, 3, 3],
15 | [3, 3, 3]
16 | ]
17 | DOWNSAMPLING: [true, false, true, true, true]
18 | DOWNSAMPLING_TEMPORAL: [false, false, true, true, true]
19 | NUM_STREAMS: 1
20 | EXPANSION_RATIO: 2
21 | BRANCH:
22 | NAME: R2Plus1DBranch
23 | STEM:
24 | NAME: R2Plus1DStem
25 | NONLOCAL:
26 | ENABLE: false
27 | STAGES: [5]
28 | MASK_ENABLE: false
29 | HEAD:
30 | NAME: BaseHead
31 | ACTIVATION: softmax
32 | DROPOUT_RATE: 0
33 | NUM_CLASSES: # !!!
34 |
--------------------------------------------------------------------------------
/configs/pool/backbone/s3dg.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: S3DG
3 | VIDEO:
4 | BACKBONE:
5 | META_ARCH: Inception3D
6 | NUM_OUT_FEATURES: 1024
7 | NUM_STREAMS: 1
8 | BRANCH:
9 | NAME: STConv3d
10 | GATING: true
11 | STEM:
12 | NAME: STConv3d
13 | NONLOCAL:
14 | ENABLE: false
15 | STAGES: [5]
16 | MASK_ENABLE: false
17 | HEAD:
18 | NAME: BaseHead
19 | ACTIVATION: softmax
20 | DROPOUT_RATE: 0
21 | NUM_CLASSES: # !!!
--------------------------------------------------------------------------------
/configs/pool/backbone/slowfast_4x16.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: SlowFast_4x16
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: 50
6 | META_ARCH: Slowfast
7 | NUM_FILTERS: [64, 256, 512, 1024, 2048]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 2048
10 | KERNEL_SIZE: [
11 | [
12 | [1, 7, 7],
13 | [1, 3, 3],
14 | [1, 3, 3],
15 | [1, 3, 3],
16 | [1, 3, 3],
17 | ],
18 | [
19 | [5, 7, 7],
20 | [1, 3, 3],
21 | [1, 3, 3],
22 | [1, 3, 3],
23 | [1, 3, 3],
24 | ],
25 | ]
26 | DOWNSAMPLING: [true, false, true, true, true]
27 | DOWNSAMPLING_TEMPORAL: [false, false, false, false, false]
28 | TEMPORAL_CONV_BOTTLENECK:
29 | [
30 | [false, false, false, true, true], # slow branch,
31 | [false, true, true, true, true] # fast branch
32 | ]
33 | NUM_STREAMS: 1
34 | EXPANSION_RATIO: 4
35 | BRANCH:
36 | NAME: SlowfastBranch
37 | STEM:
38 | NAME: DownSampleStem
39 | SLOWFAST:
40 | MODE: slowfast
41 | ALPHA: 8
42 | BETA: 8 # slow fast channel ratio
43 | CONV_CHANNEL_RATIO: 2
44 | KERNEL_SIZE: 5
45 | FUSION_CONV_BIAS: false
46 | FUSION_BN: true
47 | FUSION_RELU: true
48 | NONLOCAL:
49 | ENABLE: false
50 | STAGES: [5]
51 | MASK_ENABLE: false
52 | HEAD:
53 | NAME: SlowFastHead
54 | ACTIVATION: softmax
55 | DROPOUT_RATE: 0
56 | NUM_CLASSES: # !!!
57 | DATA:
58 | NUM_INPUT_FRAMES: 32
59 | SAMPLING_RATE: 2
60 |
--------------------------------------------------------------------------------
/configs/pool/backbone/slowfast_8x8.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: SlowFast_8x8
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: 50
6 | META_ARCH: Slowfast
7 | NUM_FILTERS: [64, 256, 512, 1024, 2048]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 2048
10 | KERNEL_SIZE: [
11 | [
12 | [1, 7, 7],
13 | [1, 3, 3],
14 | [1, 3, 3],
15 | [1, 3, 3],
16 | [1, 3, 3],
17 | ],
18 | [
19 | [5, 7, 7],
20 | [1, 3, 3],
21 | [1, 3, 3],
22 | [1, 3, 3],
23 | [1, 3, 3],
24 | ],
25 | ]
26 | DOWNSAMPLING: [true, false, true, true, true]
27 | DOWNSAMPLING_TEMPORAL: [false, false, false, false, false]
28 | TEMPORAL_CONV_BOTTLENECK:
29 | [
30 | [false, false, false, true, true], # slow branch,
31 | [false, true, true, true, true] # fast branch
32 | ]
33 | NUM_STREAMS: 1
34 | EXPANSION_RATIO: 4
35 | BRANCH:
36 | NAME: SlowfastBranch
37 | STEM:
38 | NAME: DownSampleStem
39 | SLOWFAST:
40 | MODE: slowfast
41 | ALPHA: 4
42 | BETA: 8 # slow fast channel ratio
43 | CONV_CHANNEL_RATIO: 2
44 | KERNEL_SIZE: 7
45 | FUSION_CONV_BIAS: false
46 | FUSION_BN: true
47 | FUSION_RELU: true
48 | NONLOCAL:
49 | ENABLE: false
50 | STAGES: [5]
51 | MASK_ENABLE: false
52 | HEAD:
53 | NAME: SlowFastHead
54 | ACTIVATION: softmax
55 | DROPOUT_RATE: 0
56 | NUM_CLASSES: # !!!
57 | DATA:
58 | NUM_INPUT_FRAMES: 32
59 | SAMPLING_RATE: 2
--------------------------------------------------------------------------------
/configs/pool/backbone/tada2d.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: TAda2D
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: 50
6 | META_ARCH: ResNet3D
7 | NUM_FILTERS: [64, 256, 512, 1024, 2048]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 2048
10 | KERNEL_SIZE: [
11 | [1, 7, 7],
12 | [1, 3, 3],
13 | [1, 3, 3],
14 | [1, 3, 3],
15 | [1, 3, 3]
16 | ]
17 | DOWNSAMPLING: [true, true, true, true, true]
18 | DOWNSAMPLING_TEMPORAL: [false, false, false, false, false]
19 | NUM_STREAMS: 1
20 | EXPANSION_RATIO: 4
21 | INITIALIZATION: kaiming
22 | STEM:
23 | NAME: Base2DStem
24 | BRANCH:
25 | NAME: TAda2DBlock
26 | ROUTE_FUNC_K: [3, 3]
27 | ROUTE_FUNC_R: 4
28 | POOL_K: [3, 1, 1]
29 | NONLOCAL:
30 | ENABLE: false
31 | STAGES: [5]
32 | MASK_ENABLE: false
33 | HEAD:
34 | NAME: BaseHead
35 | ACTIVATION: softmax
36 | DROPOUT_RATE: 0
37 | NUM_CLASSES: # !!!
38 |
--------------------------------------------------------------------------------
/configs/pool/backbone/tadaconvnextv2_base.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: TAdaConvNeXtV2-Base
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: [3, 3, 27, 3]
6 | META_ARCH: ConvNeXt
7 | NUM_FILTERS: [128, 256, 512, 1024]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 1024
10 | DROP_PATH: 0.6
11 | LARGE_SCALE_INIT_VALUE: 1e-6
12 | STEM:
13 | T_KERNEL_SIZE: 3
14 | T_STRIDE: 2
15 | BRANCH:
16 | NAME: TAdaConvNeXtV2Block
17 | ROUTE_FUNC_K: [3, 3]
18 | ROUTE_FUNC_R: 2
19 | HEAD_DIM: 64
20 | HEAD:
21 | NAME: BaseHead
22 | ACTIVATION: softmax
23 | DROPOUT_RATE: 0
24 | NUM_CLASSES: # !!!
25 |
26 |
--------------------------------------------------------------------------------
/configs/pool/backbone/tadaconvnextv2_small.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: TAdaConvNeXtV2-Small
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: [3, 3, 27, 3]
6 | META_ARCH: ConvNeXt
7 | NUM_FILTERS: [96, 192, 384, 768]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 768
10 | DROP_PATH: 0.4
11 | LARGE_SCALE_INIT_VALUE: 1e-6
12 | STEM:
13 | T_KERNEL_SIZE: 3
14 | T_STRIDE: 2
15 | BRANCH:
16 | NAME: TAdaConvNeXtV2Block
17 | ROUTE_FUNC_K: [3, 3]
18 | ROUTE_FUNC_R: 2
19 | HEAD_DIM: 48
20 | HEAD:
21 | NAME: BaseHead
22 | ACTIVATION: softmax
23 | DROPOUT_RATE: 0
24 | NUM_CLASSES: # !!!
25 |
26 |
--------------------------------------------------------------------------------
/configs/pool/backbone/tadaconvnextv2_tiny.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: TAdaConvNeXtV2-Tiny
3 | VIDEO:
4 | BACKBONE:
5 | DEPTH: [3, 3, 9, 3]
6 | META_ARCH: ConvNeXt
7 | NUM_FILTERS: [96, 192, 384, 768]
8 | NUM_INPUT_CHANNELS: 3
9 | NUM_OUT_FEATURES: 768
10 | DROP_PATH: 0.2
11 | LARGE_SCALE_INIT_VALUE: 1e-6
12 | STEM:
13 | T_KERNEL_SIZE: 3
14 | T_STRIDE: 2
15 | BRANCH:
16 | NAME: TAdaConvNeXtV2Block
17 | ROUTE_FUNC_K: [3, 3]
18 | ROUTE_FUNC_R: 2
19 | HEAD_DIM: 48
20 | HEAD:
21 | NAME: BaseHead
22 | ACTIVATION: softmax
23 | DROPOUT_RATE: 0
24 | NUM_CLASSES: # !!!
25 |
26 |
--------------------------------------------------------------------------------
/configs/pool/backbone/tadaformer_b16.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: TAdaFormer_B16
3 |
4 | VIDEO:
5 | BACKBONE:
6 | META_ARCH: VisionTransformer
7 | INPUT_RES: 224
8 | PATCH_SIZE: 16
9 | TUBLET_SIZE: 3
10 | TUBLET_STRIDE: 2
11 | NUM_FEATURES: 768
12 | NUM_OUT_FEATURES: 768
13 | DEPTH: 12
14 | NUM_HEADS: 12
15 | DROP_PATH: 0.0
16 | ATTN_DROPOUT: 0.0
17 | REQUIRE_PROJ: false
18 | ATTN_MASK_ENABLE: false
19 | DOUBLE_TADA: false
20 | FREEZE: false
21 | REDUCTION: 2
22 | BRANCH:
23 | NAME: TAdaFormerBlock
24 | ROUTE_FUNC_K: [3, 3]
25 | ROUTE_FUNC_R: 2
26 | TEMP_ENHANCE: false
27 | HEAD:
28 | NAME: BaseHead
29 | OUTPUT_DIM: 512
--------------------------------------------------------------------------------
/configs/pool/backbone/tadaformer_l14.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: TAdaFormer_L14
3 |
4 | VIDEO:
5 | BACKBONE:
6 | META_ARCH: VisionTransformer
7 | INPUT_RES: 224
8 | PATCH_SIZE: 14
9 | TUBLET_SIZE: 3
10 | TUBLET_STRIDE: 2
11 | NUM_FEATURES: 1024
12 | NUM_OUT_FEATURES: 1024
13 | DEPTH: 24
14 | NUM_HEADS: 16
15 | DROP_PATH: 0.0
16 | ATTN_DROPOUT: 0.0
17 | REQUIRE_PROJ: false
18 | ATTN_MASK_ENABLE: false
19 | DOUBLE_TADA: false
20 | FREEZE: false
21 | REDUCTION: 2
22 | BRANCH:
23 | NAME: TAdaFormerBlock
24 | ROUTE_FUNC_K: [3, 3]
25 | ROUTE_FUNC_R: 2
26 | TEMP_ENHANCE: false
27 | HEAD:
28 | NAME: BaseHead
29 | OUTPUT_DIM: 512
--------------------------------------------------------------------------------
/configs/pool/backbone/timesformer.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: timesformer
3 | VIDEO:
4 | BACKBONE:
5 | META_ARCH: Transformer
6 | NUM_FEATURES: 768
7 | NUM_OUT_FEATURES: 768
8 | PATCH_SIZE: 16
9 | DEPTH: 12
10 | NUM_HEADS: 12
11 | DIM_HEAD: 64
12 | ATTN_DROPOUT: 0.1
13 | FF_DROPOUT: 0.1
14 | DROP_PATH: 0.0
15 | PRE_LOGITS: false
16 | STEM:
17 | NAME: PatchEmbedStem
18 | BRANCH:
19 | NAME: TimesformerLayer
20 | NONLOCAL:
21 | ENABLE: false
22 | STAGES: [5]
23 | MASK_ENABLE: false
24 | HEAD:
25 | NAME: TransformerHead
26 | ACTIVATION: softmax
27 | DROPOUT_RATE: 0
28 | NUM_CLASSES: # !!!
--------------------------------------------------------------------------------
/configs/pool/backbone/vivit.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: vivit
3 | VIDEO:
4 | BACKBONE:
5 | META_ARCH: Transformer
6 | NUM_FEATURES: 768
7 | NUM_OUT_FEATURES: 768
8 | PATCH_SIZE: 16
9 | TUBELET_SIZE: 2
10 | DEPTH: 12
11 | NUM_HEADS: 12
12 | DIM_HEAD: 64
13 | ATTN_DROPOUT: 0.0
14 | FF_DROPOUT: 0.0
15 | DROP_PATH: 0.1
16 | MLP_MULT: 4
17 | STEM:
18 | NAME: TubeletEmbeddingStem
19 | BRANCH:
20 | NAME: BaseTransformerLayer
21 | HEAD:
22 | NAME: TransformerHead
23 | ACTIVATION: softmax
24 | DROPOUT_RATE: 0
25 | NUM_CLASSES: # !!!
26 | PRE_LOGITS: false
27 | TRAIN:
28 | CHECKPOINT_PRE_PROCESS:
29 | ENABLE: true
30 | POP_HEAD: true
31 | POS_EMBED: repeat
32 | PATCH_EMBD: central_frame
--------------------------------------------------------------------------------
/configs/pool/backbone/vivit_fac_enc.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | NAME: vivit
3 | VIDEO:
4 | BACKBONE:
5 | META_ARCH: FactorizedTransformer
6 | NUM_FEATURES: 768
7 | NUM_OUT_FEATURES: 768
8 | PATCH_SIZE: 16
9 | TUBELET_SIZE: 2
10 | DEPTH: 12
11 | DEPTH_TEMP: 4
12 | NUM_HEADS: 12
13 | DIM_HEAD: 64
14 | ATTN_DROPOUT: 0.0
15 | FF_DROPOUT: 0.0
16 | DROP_PATH: 0.1
17 | MLP_MULT: 4
18 | STEM:
19 | NAME: TubeletEmbeddingStem
20 | BRANCH:
21 | NAME: BaseTransformerLayer
22 | HEAD:
23 | NAME: TransformerHead
24 | ACTIVATION: softmax
25 | DROPOUT_RATE: 0
26 | NUM_CLASSES: # !!!
27 | PRE_LOGITS: false
28 | TRAIN:
29 | CHECKPOINT_PRE_PROCESS:
30 | ENABLE: true
31 | POP_HEAD: true
32 | POS_EMBED:
33 | PATCH_EMBD: central_frame
--------------------------------------------------------------------------------
/configs/pool/base.yaml:
--------------------------------------------------------------------------------
1 | TASK_TYPE: classification
2 | PRETRAIN:
3 | ENABLE: false
4 | LOCALIZATION:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: false
8 | DATASET:
9 | BATCH_SIZE: 128
10 | LOG_FILE: training_log.log
11 | EVAL_PERIOD: 10
12 | NUM_FOLDS: 1
13 | AUTO_RESUME: true
14 | CHECKPOINT_PERIOD: 10
15 | INIT: ""
16 | CHECKPOINT_FILE_PATH: ""
17 | CHECKPOINT_TYPE: pytorch
18 | CHECKPOINT_INFLATE: false
19 | CHECKPOINT_PRE_PROCESS:
20 | ENABLE: false
21 | FINE_TUNE: false
22 | ONLY_LINEAR: false
23 | LR_REDUCE: false
24 | TRAIN_VAL_COMBINE: false
25 | TEST:
26 | ENABLE: false
27 | DATASET:
28 | BATCH_SIZE: 100
29 | NUM_SPATIAL_CROPS: 1
30 | SPATIAL_CROPS: cc
31 | NUM_ENSEMBLE_VIEWS: 1
32 | LOG_FILE: val.log
33 | CHECKPOINT_FILE_PATH: ""
34 | CHECKPOINT_TYPE: pytorch
35 | AUTOMATIC_MULTI_SCALE_TEST: true
36 | VISUALIZATION:
37 | ENABLE: false
38 | NAME: ""
39 | FEATURE_MAPS:
40 | ENABLE: false
41 | BASE_OUTPUT_DIR: ""
42 | SUBMISSION:
43 | ENABLE: false
44 | SAVE_RESULTS_PATH: "test.json"
45 | DATA:
46 | DATA_ROOT_DIR: /data_root/
47 | ANNO_DIR: /anno_dir/
48 | NUM_INPUT_FRAMES: 16
49 | NUM_INPUT_CHANNELS: 3
50 | SAMPLING_MODE: interval_based
51 | SAMPLING_RATE: 4
52 | TRAIN_JITTER_SCALES: [168, 224]
53 | TRAIN_CROP_SIZE: 112
54 | TEST_SCALE: 224
55 | TEST_CROP_SIZE: 112
56 | MEAN: [0.45, 0.45, 0.45]
57 | STD: [0.225, 0.225, 0.225]
58 | MULTI_LABEL: false
59 | ENSEMBLE_METHOD: sum
60 | TARGET_FPS: 30
61 | MINUS_INTERVAL: false
62 | MODEL:
63 | NAME:
64 | EMA:
65 | ENABLE: false
66 | DECAY: 0.99996
67 | VIDEO:
68 | BACKBONE:
69 | DEPTH:
70 | META_ARCH:
71 | NUM_FILTERS:
72 | NUM_INPUT_CHANNELS: 3
73 | NUM_OUT_FEATURES:
74 | KERNEL_SIZE:
75 | DOWNSAMPLING:
76 | DOWNSAMPLING_TEMPORAL:
77 | NUM_STREAMS: 1
78 | EXPANSION_RATIO: 2
79 | BRANCH:
80 | NAME:
81 | STEM:
82 | NAME:
83 | NONLOCAL:
84 | ENABLE: false
85 | STAGES: [5]
86 | MASK_ENABLE: false
87 | INITIALIZATION:
88 | HEAD:
89 | NAME: BaseHead
90 | ACTIVATION: softmax
91 | DROPOUT_RATE: 0
92 | NUM_CLASSES:
93 | OPTIMIZER:
94 | ADJUST_LR: false
95 | BASE_LR: 0.002
96 | LR_POLICY: cosine
97 | MAX_EPOCH: 300
98 | MOMENTUM: 0.9
99 | WEIGHT_DECAY: 1e-3
100 | WARMUP_EPOCHS: 10
101 | WARMUP_START_LR: 0.0002
102 | OPTIM_METHOD: adam
103 | DAMPENING: 0.0
104 | NESTEROV: true
105 | BIAS_DOUBLE: false
106 | NEW_PARAMS: []
107 | NEW_PARAMS_MULT: 10
108 | NEW_PARAMS_WD_MULT: 1
109 | LAYER_WISE_LR_DECAY: 1.0
110 | COSINE_AFTER_WARMUP: false
111 | COSINE_END_LR: 1e-6
112 | BN:
113 | WB_LOCK: false
114 | FREEZE: false
115 | WEIGHT_DECAY: 0.0
116 | MOMENTUM: 0.1
117 | EPS: 1e-5
118 | SYNC: false
119 | DATA_LOADER:
120 | NUM_WORKERS: 4
121 | PIN_MEMORY: false
122 | ENABLE_MULTI_THREAD_DECODE: true
123 | COLLATE_FN:
124 | NUM_GPUS: 8
125 | SHARD_ID: 0
126 | NUM_SHARDS: 1
127 | RANDOM_SEED: 0
128 | OUTPUT_DIR: output/
129 | OUTPUT_CFG_FILE: configuration.log
130 | LOG_PERIOD: 10
131 | DIST_BACKEND: nccl
132 | LOG_MODEL_INFO: true
133 | LOG_CONFIG_INFO: true
134 | OSS:
135 | ENABLE: false
136 | KEY:
137 | SECRET:
138 | ENDPOINT:
139 | CHECKPOINT_OUTPUT_PATH: # !!@7
140 | SECONDARY_DATA_OSS:
141 | ENABLE: false
142 | KEY:
143 | SECRET:
144 | ENDPOINT:
145 | BUCKETS: [""]
146 | AUGMENTATION:
147 | COLOR_AUG: false
148 | BRIGHTNESS: 0.5
149 | CONTRAST: 0.5
150 | SATURATION: 0.5
151 | HUE: 0.25
152 | GRAYSCALE: 0.3
153 | CONSISTENT: true
154 | SHUFFLE: true
155 | GRAY_FIRST: true
156 | RATIO: [0.857142857142857, 1.1666666666666667]
157 | USE_GPU: false
158 | MIXUP:
159 | ENABLE: false
160 | ALPHA: 0.0
161 | PROB: 1.0
162 | MODE: batch
163 | SWITCH_PROB: 0.5
164 | CUTMIX:
165 | ENABLE: false
166 | ALPHA: 0.0
167 | MINMAX:
168 | RANDOM_ERASING:
169 | ENABLE: false
170 | PROB: 0.25
171 | MODE: const
172 | COUNT: [1, 1]
173 | NUM_SPLITS: 0
174 | AREA_RANGE: [0.02, 0.33]
175 | MIN_ASPECT: 0.3
176 | LABEL_SMOOTHING: 0.0
177 | SSV2_FLIP: false
178 | PAI: false
179 | USE_MULTISEG_VAL_DIST: false
--------------------------------------------------------------------------------
/configs/pool/run/training/finetune.yaml:
--------------------------------------------------------------------------------
1 | PRETRAIN:
2 | ENABLE: false
3 | TRAIN:
4 | ENABLE: true
5 | DATASET: # !!@1
6 | BATCH_SIZE: 1024
7 | LOG_FILE: training_log.log
8 | LOSS_FUNC: cross_entropy
9 | EVAL_PERIOD: 5
10 | NUM_FOLDS: 30
11 | AUTO_RESUME: true
12 | CHECKPOINT_PERIOD: 10
13 | CHECKPOINT_FILE_PATH: "" # !!@2
14 | CHECKPOINT_TYPE: pytorch
15 | CHECKPOINT_INFLATE: false
16 | FINE_TUNE: true
17 | ONLY_LINEAR: false
18 | TEST:
19 | ENABLE: true # !!@3
20 | DATASET: # !!@3
21 | BATCH_SIZE: 1024
22 | NUM_SPATIAL_CROPS: 1
23 | SPATIAL_CROPS: cc
24 | NUM_ENSEMBLE_VIEWS: 1
25 | LOG_FILE: val.log
26 | CHECKPOINT_FILE_PATH: ""
27 | CHECKPOINT_TYPE: pytorch
28 | AUTOMATIC_MULTI_SCALE_TEST: true
29 | DATA:
30 | DATA_ROOT_DIR:
31 | ANNO_DIR:
32 | NUM_INPUT_FRAMES: 16
33 | NUM_INPUT_CHANNELS: 3
34 | SAMPLING_MODE: interval_based
35 | SAMPLING_RATE: 4
36 | TRAIN_JITTER_SCALES: [168, 224]
37 | TRAIN_CROP_SIZE: 112
38 | TEST_SCALE: 224
39 | TEST_CROP_SIZE: 112
40 | MEAN: [0.45, 0.45, 0.45]
41 | STD: [0.225, 0.225, 0.225]
42 | MULTI_LABEL: false
43 | ENSEMBLE_METHOD: sum
44 | FPS: 30
45 | TARGET_FPS: 30
46 | OPTIMIZER:
47 | BASE_LR: 0.002
48 | LR_POLICY: cosine
49 | MAX_EPOCH: 300
50 | MOMENTUM: 0.9
51 | WEIGHT_DECAY: 1e-3
52 | WARMUP_EPOCHS: 10
53 | WARMUP_START_LR: 0.0002
54 | OPTIM_METHOD: adam
55 | DAMPENING: 0.0
56 | NESTEROV: true
57 | BN:
58 | WEIGHT_DECAY: 0.0
59 | EPS: 1e-3
60 | DATA_LOADER:
61 | NUM_WORKERS: 4
62 | PIN_MEMORY: false
63 | ENABLE_MULTI_THREAD_DECODE: true
64 | NUM_GPUS: 8
65 | SHARD_ID: 0
66 | NUM_SHARDS: 1
67 | RANDOM_SEED: 0
68 | OUTPUT_DIR:
69 | OUTPUT_CFG_FILE: configuration.log
70 | LOG_PERIOD: 10
71 | DIST_BACKEND: nccl
72 | LOG_MODEL_INFO: true
73 | LOG_CONFIG_INFO: true
74 | AUGMENTATION:
75 | COLOR_AUG: true
76 | BRIGHTNESS: 0.5
77 | CONTRAST: 0.5
78 | SATURATION: 0.5
79 | HUE: 0.25
80 | GRAYSCALE: 0.3
81 | CONSISTENT: true
82 | SHUFFLE: true
83 | GRAY_FIRST: true
84 | RATIO: [0.857142857142857, 1.1666666666666667]
85 | USE_GPU: true
86 | PAI: false
87 |
88 |
--------------------------------------------------------------------------------
/configs/pool/run/training/from_scratch.yaml:
--------------------------------------------------------------------------------
1 | PRETRAIN:
2 | ENABLE: false
3 | TRAIN:
4 | ENABLE: true
5 | DATASET: # !!@1
6 | BATCH_SIZE: 1024
7 | LOG_FILE: training_log.log
8 | LOSS_FUNC: cross_entropy
9 | EVAL_PERIOD: 5
10 | NUM_FOLDS: 30
11 | AUTO_RESUME: true
12 | CHECKPOINT_PERIOD: 10
13 | CHECKPOINT_FILE_PATH: "" # !!@2
14 | CHECKPOINT_TYPE: pytorch
15 | CHECKPOINT_INFLATE: false
16 | FINE_TUNE: false
17 | ONLY_LINEAR: false
18 | TEST:
19 | ENABLE: false # !!@3
20 | DATASET: # !!@3
21 | BATCH_SIZE: 1024
22 | NUM_SPATIAL_CROPS: 1
23 | SPATIAL_CROPS: cc
24 | NUM_ENSEMBLE_VIEWS: 1
25 | LOG_FILE: val.log
26 | CHECKPOINT_FILE_PATH: ""
27 | CHECKPOINT_TYPE: pytorch
28 | AUTOMATIC_MULTI_SCALE_TEST: true
29 | DATA:
30 | DATA_ROOT_DIR:
31 | ANNO_DIR:
32 | NUM_INPUT_FRAMES: 16
33 | NUM_INPUT_CHANNELS: 3
34 | SAMPLING_MODE: interval_based
35 | SAMPLING_RATE: 4
36 | TRAIN_JITTER_SCALES: [168, 224]
37 | TRAIN_CROP_SIZE: 112
38 | TEST_SCALE: 224
39 | TEST_CROP_SIZE: 112
40 | MEAN: [0.45, 0.45, 0.45]
41 | STD: [0.225, 0.225, 0.225]
42 | MULTI_LABEL: false
43 | ENSEMBLE_METHOD: sum
44 | FPS: 30
45 | TARGET_FPS: 30
46 | OPTIMIZER:
47 | BASE_LR: 0.002
48 | LR_POLICY: cosine
49 | MAX_EPOCH: 300
50 | MOMENTUM: 0.9
51 | WEIGHT_DECAY: 1e-3
52 | WARMUP_EPOCHS: 10
53 | WARMUP_START_LR: 0.0002
54 | OPTIM_METHOD: adam
55 | DAMPENING: 0.0
56 | NESTEROV: true
57 | BN:
58 | WEIGHT_DECAY: 0.0
59 | EPS: 1e-3
60 | DATA_LOADER:
61 | NUM_WORKERS: 4
62 | PIN_MEMORY: false
63 | ENABLE_MULTI_THREAD_DECODE: true
64 | NUM_GPUS: 8
65 | SHARD_ID: 0
66 | NUM_SHARDS: 1
67 | RANDOM_SEED: 0
68 | OUTPUT_DIR:
69 | OUTPUT_CFG_FILE: configuration.log
70 | LOG_PERIOD: 10
71 | DIST_BACKEND: nccl
72 | LOG_MODEL_INFO: true
73 | LOG_CONFIG_INFO: true
74 | AUGMENTATION:
75 | COLOR_AUG: true
76 | BRIGHTNESS: 0.5
77 | CONTRAST: 0.5
78 | SATURATION: 0.5
79 | HUE: 0.25
80 | GRAYSCALE: 0.3
81 | CONSISTENT: true
82 | SHUFFLE: true
83 | GRAY_FIRST: true
84 | RATIO: [0.857142857142857, 1.1666666666666667]
85 | USE_GPU: true
86 | PAI: false
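The OPTIMIZER block in `from_scratch.yaml` pairs a linear warmup (WARMUP_START_LR up to BASE_LR over WARMUP_EPOCHS) with a cosine decay over the remaining epochs. A minimal sketch of that schedule under the usual warmup-then-cosine convention (function name and epoch handling are illustrative, not the repo's actual scheduler):

```python
import math

def lr_at_epoch(epoch, base_lr=0.002, warmup_start_lr=0.0002,
                warmup_epochs=10, max_epoch=300):
    """LR at a (possibly fractional) epoch: linear warmup, then cosine decay to 0."""
    if epoch < warmup_epochs:
        # linear ramp from WARMUP_START_LR to BASE_LR
        alpha = epoch / warmup_epochs
        return warmup_start_lr + (base_lr - warmup_start_lr) * alpha
    # cosine decay from BASE_LR down to 0 over the remaining epochs
    progress = (epoch - warmup_epochs) / (max_epoch - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With the values above, the LR starts at 0.0002, reaches 0.002 at epoch 10, and decays to 0 by epoch 300.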
--------------------------------------------------------------------------------
/configs/pool/run/training/from_scratch_large.yaml:
--------------------------------------------------------------------------------
1 | PRETRAIN:
2 | ENABLE: false
3 | TRAIN:
4 | ENABLE: true
5 | DATASET: # !!@1
6 | BATCH_SIZE: 256 # 256 for 32 gpus
7 | LOG_FILE: training_log.log
8 | LOSS_FUNC: cross_entropy
9 | EVAL_PERIOD: 5
10 | NUM_FOLDS: 1
11 | AUTO_RESUME: true
12 | CHECKPOINT_PERIOD: 5
13 | CHECKPOINT_FILE_PATH: "" # !!@2
14 | CHECKPOINT_TYPE: pytorch
15 | CHECKPOINT_INFLATE: false
16 | FINE_TUNE: false
17 | ONLY_LINEAR: false
18 | TEST:
19 | ENABLE: true # !!@3
20 | DATASET: # !!@3
21 | BATCH_SIZE: 256
22 | NUM_SPATIAL_CROPS: 1
23 | SPATIAL_CROPS: cc
24 | NUM_ENSEMBLE_VIEWS: 1
25 | LOG_FILE: val.log
26 | CHECKPOINT_FILE_PATH: ""
27 | CHECKPOINT_TYPE: pytorch
28 | AUTOMATIC_MULTI_SCALE_TEST: true
29 | AUTOMATIC_MULTI_SCALE_TEST_SPATIAL: true
30 | DATA:
31 | DATA_ROOT_DIR:
32 | ANNO_DIR:
33 | NUM_INPUT_FRAMES: 16
34 | NUM_INPUT_CHANNELS: 3
35 | SAMPLING_MODE: interval_based
36 | SAMPLING_RATE: 4
37 | TRAIN_JITTER_SCALES: [256, 320]
38 | TRAIN_CROP_SIZE: 224
39 | TEST_SCALE: 224
40 | TEST_CROP_SIZE: 224
41 | MEAN: [0.45, 0.45, 0.45]
42 | STD: [0.225, 0.225, 0.225]
43 | MULTI_LABEL: false
44 | ENSEMBLE_METHOD: sum
45 | FPS: 30
46 | TARGET_FPS: 30
47 | OPTIMIZER:
48 | BASE_LR: 0.001
49 | ADJUST_LR: false
50 | LR_POLICY: cosine
51 | MAX_EPOCH: 100
52 | MOMENTUM: 0.9
53 | WEIGHT_DECAY: 1e-4
54 | WARMUP_EPOCHS: 10
55 | WARMUP_START_LR: 0.0001
56 | OPTIM_METHOD: adam
57 | DAMPENING: 0.0
58 | NESTEROV: true
59 | BN:
60 | WEIGHT_DECAY: 0.0
61 | DATA_LOADER:
62 | NUM_WORKERS: 8
63 | PIN_MEMORY: false
64 | ENABLE_MULTI_THREAD_DECODE: true
65 | NUM_GPUS: 32
66 | SHARD_ID: 0
67 | NUM_SHARDS: 1
68 | RANDOM_SEED: 0
69 | OUTPUT_DIR:
70 | OUTPUT_CFG_FILE: configuration.log
71 | LOG_PERIOD: 10
72 | DIST_BACKEND: nccl
73 | LOG_MODEL_INFO: true
74 | LOG_CONFIG_INFO: true
75 | AUGMENTATION:
76 | COLOR_AUG: false
77 | BRIGHTNESS: 0.5
78 | CONTRAST: 0.5
79 | SATURATION: 0.5
80 | HUE: 0.25
81 | GRAYSCALE: 0.3
82 | CONSISTENT: true
83 | SHUFFLE: true
84 | GRAY_FIRST: true
85 | RATIO: [0.857142857142857, 1.1666666666666667]
86 | USE_GPU: false
87 | PAI: false
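`from_scratch_large.yaml` adds an ADJUST_LR switch next to BASE_LR. In many video codebases such a flag applies the linear-scaling rule, rescaling the LR by the global batch size relative to a reference size; that semantics, and the reference value of 256, are assumptions here, not confirmed behavior of this repo:

```python
def adjust_lr(base_lr, global_batch_size, reference_batch_size=256):
    """Linear-scaling rule: LR grows proportionally with the global batch size."""
    return base_lr * global_batch_size / reference_batch_size
```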
--------------------------------------------------------------------------------
/configs/pool/run/training/localization.yaml:
--------------------------------------------------------------------------------
1 | TASK_TYPE: localization
2 | LOCALIZATION:
3 | ENABLE: true
4 | LOSS: Tem+PemReg+PemCls
5 | LOSS_WEIGHTS: [1,10,1]
6 | POS_CLS_THRES: 0.9
7 | POS_REG_THRES: 0.7
8 | NEG_REG_THRES: 0.3
9 |
10 | TEST_OUTPUT_DIR: ./output/
11 | PROPS_DIR: prop_results
12 | PROPS_REGRESSION_LOSS: smoothl1
13 | RESULT_FILE: localization_detection_res
14 | CLASSIFIER_FILE: ""
15 | POST_PROCESS:
16 | THREAD: 32
17 | SOFT_NMS_ALPHA: 0.4
18 | SOFT_NMS_LOW_THRES: 0.0
19 | SOFT_NMS_HIGH_THRES: 0.0
20 | PROP_NUM: 100
21 | SELECT_SCORE: 0.0001
22 | SCORE_TYPE: 'cr'
23 | CLR_POWER: 1.2
24 | REG_POWER: 1.2
25 | IOU_POWER: 2.0
26 | TCA_POWER: 1.0
27 | ACTION_SCORE_POWER: 1.0
28 | VIDEO_SCORES_WEIGHT: 1.0
29 |
30 | TRAIN:
31 | ENABLE: true
32 | DATASET: Epickitchen100Localization # !!@1
33 | BATCH_SIZE: 64
34 | LOG_FILE: training_log.log
35 | EVAL_PERIOD: 1
36 | NUM_FOLDS: 1
37 | AUTO_RESUME: true
38 | CHECKPOINT_PERIOD: 1
39 | CHECKPOINT_FILE_PATH: "" # !!@2
40 | CHECKPOINT_TYPE: pytorch
41 | CHECKPOINT_INFLATE: false
42 | FINE_TUNE: false
43 | LR_REDUCE: false
44 | TEST:
45 | ENABLE: false # !!@3
46 | OUTPUT_TEST: false
47 | FORCE_FORWARD: false
48 | DATASET: Epickitchen100Localization # !!@3
49 | BATCH_SIZE: 128
50 | LOG_FILE: val.log
51 | TEST_SET: val
52 | CHECKPOINT_FILE_PATH: ""
53 | SAVE_RESULTS_PATH: "preds.log"
54 | CHECKPOINT_TYPE: pytorch
55 | AUTOMATIC_MULTI_SCALE_TEST: false
56 | TEST_CHECKPOINT: [7,8,9,10]
57 |
58 | DATA:
59 | DATA_ROOT_DIR:
60 | ANNO_DIR:
61 | TEMPORAL_SCALE: 200
62 | DURATION_SCALE: -1
63 | TEMPORAL_MODE: resize
64 | NUM_INPUT_CHANNELS: 2304
65 | TEMPORAL_INTERVAL: 0.53333333
66 | NORM_FEATURE: true
67 | ANNO_NAME: ""
68 | LABELS_TYPE: bmn
69 |
70 | SOLVER:
71 | BASE_LR: 0.001
72 | ADJUST_LR: true
73 | LR_POLICY: cosine
74 | MAX_EPOCH: 10
75 | MOMENTUM: 0.9
76 | WEIGHT_DECAY: 1e-4
77 | WARMUP_EPOCHS: 1
78 | WARMUP_START_LR: 0.0001
79 | OPTIM_METHOD: adam
80 | DAMPENING: 0.0
81 | NESTEROV: true
82 | BN:
83 | USE_BN: false
84 | WEIGHT_DECAY: 0.0
85 | DATA_LOADER:
86 | NUM_WORKERS: 8
87 | PIN_MEMORY: true
88 |
89 | NUM_GPUS: 8
90 | SHARD_ID: 0
91 | NUM_SHARDS: 1
92 | RANDOM_SEED: 0
93 | OUTPUT_DIR: output/test
94 | OUTPUT_CFG_FILE: configuration.log
95 | LOG_PERIOD: 10
96 | DIST_BACKEND: nccl
97 | DEBUG_MODE: false
98 | LOG_MODEL_INFO: true
99 | LOG_CONFIG_INFO: true
100 | OSS:
101 | ENABLE: false
102 | PAI: true
103 |
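The POST_PROCESS block in `localization.yaml` configures soft-NMS over temporal proposals (SOFT_NMS_ALPHA, PROP_NUM). A sketch of Gaussian soft-NMS for 1-D segments, assuming the standard score-decay formulation; the low/high thresholds in the config gate which overlaps are decayed and are reduced to a single threshold here:

```python
import math

def temporal_iou(a, b):
    """IoU of two 1-D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(proposals, alpha=0.4, thres=0.0, top_k=100):
    """Gaussian soft-NMS: decay scores of overlapping proposals instead of
    discarding them. proposals: list of (start, end, score)."""
    props = sorted(proposals, key=lambda p: p[2], reverse=True)
    kept = []
    while props and len(kept) < top_k:
        best = props.pop(0)
        kept.append(best)
        decayed = []
        for s, e, sc in props:
            iou = temporal_iou((best[0], best[1]), (s, e))
            if iou > thres:  # decay only sufficiently overlapping proposals
                sc = sc * math.exp(-(iou * iou) / alpha)
            decayed.append((s, e, sc))
        props = sorted(decayed, key=lambda p: p[2], reverse=True)
    return kept
```

A duplicate of a kept proposal keeps a (heavily decayed) score rather than being dropped outright, which helps recall at high proposal counts.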
--------------------------------------------------------------------------------
/configs/pool/run/training/mosi.yaml:
--------------------------------------------------------------------------------
1 | PRETRAIN:
2 | ENABLE: true
3 | GENERATOR: MoSIGenerator
4 | LOSS: MoSIJoint
5 | LOSS_WEIGHTS: [1]
6 | DISTANCE_JITTER: [1, 1]
7 | SCALE_JITTER: false
8 | NUM_FRAMES: 16
9 | DATA_MODE: xy
10 | DECOUPLE: true
11 | FRAME_SIZE_STANDARDIZE_ENABLE: true
12 | STANDARD_SIZE: 320
13 | LABEL_MODE: joint # separate / joint
14 | ZERO_OUT: false
15 | STATIC_MASK: true
16 | ASPECT_RATIO: [1, 1]
17 | MASK_SIZE_RATIO: [0.3, 0.5]
18 | NUM_CLIPS_PER_VIDEO: 1
19 | TRAIN:
20 | ENABLE: true
21 | DATASET: # !!@1
22 | BATCH_SIZE: 80 # 80 for 8 gpus
23 | LOG_FILE: training_log.log
24 | EVAL_PERIOD: 5
25 | NUM_FOLDS: 1
26 | AUTO_RESUME: true
27 | CHECKPOINT_PERIOD: 10
28 | CHECKPOINT_FILE_PATH: "" # !!@2
29 | CHECKPOINT_TYPE: pytorch
30 | CHECKPOINT_INFLATE: false
31 | FINE_TUNE: false
32 | ONLY_LINEAR: false
33 | TEST:
34 | ENABLE: false # !!@3
35 | DATASET: # !!@3
36 | BATCH_SIZE: 80 # 80 for 8 gpus
37 | NUM_SPATIAL_CROPS: 1
38 | SPATIAL_CROPS: cc
39 | NUM_ENSEMBLE_VIEWS: 1
40 | LOG_FILE: val.log
41 | CHECKPOINT_FILE_PATH: ""
42 | CHECKPOINT_TYPE: pytorch
43 | AUTOMATIC_MULTI_SCALE_TEST: false
44 | DATA:
45 | DATA_ROOT_DIR:
46 | ANNO_DIR:
47 | NUM_INPUT_FRAMES: 1
48 | NUM_INPUT_CHANNELS: 3
49 | SAMPLING_MODE: interval_based
50 | SAMPLING_RATE: 4
51 | TRAIN_JITTER_SCALES: [168, 224]
52 | TRAIN_CROP_SIZE: 112
53 | TEST_SCALE: 224
54 | TEST_CROP_SIZE: 112
55 | MEAN: [0.45, 0.45, 0.45]
56 | STD: [0.225, 0.225, 0.225]
57 | MULTI_LABEL: false
58 | ENSEMBLE_METHOD: sum
59 | FPS: 30
60 | TARGET_FPS: 30
61 | OPTIMIZER:
62 | BASE_LR: 0.001
63 | LR_POLICY: cosine
64 | MAX_EPOCH: 100
65 | MOMENTUM: 0.9
66 | WEIGHT_DECAY: 1e-4
67 | WARMUP_EPOCHS: 10
68 | WARMUP_START_LR: 0.0001
69 | OPTIM_METHOD: adam
70 | DAMPENING: 0.0
71 | NESTEROV: true
72 | BN:
73 | WEIGHT_DECAY: 0.0
74 | EPS: 1e-3
75 | DATA_LOADER:
76 | NUM_WORKERS: 4
77 | PIN_MEMORY: false
78 | ENABLE_MULTI_THREAD_DECODE: true
79 | NUM_GPUS: 8
80 | SHARD_ID: 0
81 | NUM_SHARDS: 1
82 | RANDOM_SEED: 0
83 | OUTPUT_DIR:
84 | OUTPUT_CFG_FILE: configuration.log
85 | LOG_PERIOD: 10
86 | DIST_BACKEND: nccl
87 | LOG_MODEL_INFO: true
88 | LOG_CONFIG_INFO: true
89 | AUGMENTATION:
90 | COLOR_AUG: true
91 | BRIGHTNESS: 0.5
92 | CONTRAST: 0.5
93 | SATURATION: 0.5
94 | HUE: 0.25
95 | GRAYSCALE: 0.3
96 | CONSISTENT: false
97 | SHUFFLE: true
98 | GRAY_FIRST: true
99 | RATIO: [0.857142857142857, 1.1666666666666667]
100 | USE_GPU: true
101 | PAI: false
102 |
103 | MODEL:
104 | NAME: MoSINet
105 | VIDEO:
106 | HEAD:
107 | NAME: MoSIHeadJoint
108 | NUM_CLASSES: 5
109 | DROPOUT_RATE: 0.5
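The PRETRAIN block above configures MoSIGenerator, which builds pseudo-motion clips from single frames (standardized to STANDARD_SIZE: 320, cropped to TRAIN_CROP_SIZE: 112) and labels them by the direction of the synthetic motion. A rough sketch of the sliding-crop idea, assuming linear motion across the image; distance jitter, static masking, and the joint x/y label layout of the actual generator are omitted:

```python
def mosi_crop_positions(image_size, crop_size, num_frames, direction):
    """Top-left corners of a crop window sliding across one image, producing
    a pseudo-motion clip. direction: (dx, dy) with each in {-1, 0, 1};
    (0, 0) yields a static clip centered in the frame."""
    dx, dy = direction
    span = image_size - crop_size  # maximum travel along each axis
    positions = []
    for t in range(num_frames):
        frac = t / (num_frames - 1) if num_frames > 1 else 0.0
        x = int(span * (frac if dx > 0 else (1 - frac) if dx < 0 else 0.5))
        y = int(span * (frac if dy > 0 else (1 - frac) if dy < 0 else 0.5))
        positions.append((x, y))
    return positions
```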
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/csn_ek100.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/csn.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: epickitchen100
9 | BATCH_SIZE: 256
10 | CHECKPOINT_FILE_PATH: ""
11 | TEST:
12 | ENABLE: true
13 | DATASET: epickitchen100
14 | BATCH_SIZE: 256
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/clips_512/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/annos/epic-kitchens-100-annotations-master/
18 | NUM_INPUT_FRAMES: 32
19 | SAMPLING_RATE: 2
20 | TEST_SCALE: 256
21 | TEST_CROP_SIZE: 256
22 | MULTI_LABEL: true
23 | TARGET_FPS: 60
24 | VIDEO:
25 | HEAD:
26 | NAME: BaseHeadx2
27 | NUM_CLASSES: [97, 300]
28 | DROPOUT_RATE: 0.5
29 | DATA_LOADER:
30 | NUM_WORKERS: 4
31 | OPTIMIZER:
32 | BASE_LR: 0.0001
33 | ADJUST_LR: false
34 | LR_POLICY: cosine
35 | MAX_EPOCH: 50
36 | MOMENTUM: 0.9
37 | WEIGHT_DECAY: 0.05
38 | WARMUP_EPOCHS: 5
39 | WARMUP_START_LR: 0.000001
40 | OPTIM_METHOD: adamw
41 | DAMPENING: 0.0
42 | NESTEROV: true
43 | NUM_GPUS: 32
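Configs like `csn_ek100.yaml` layer project-specific keys on top of `_BASE_RUN` and `_BASE_MODEL` files. The overlay can be sketched as a recursive dict merge where child values win and nested mappings merge key by key (the repo's actual loader may resolve paths and multiple bases differently):

```python
def merge_configs(base, child):
    """Overlay a child config dict onto its _BASE config: child values win;
    nested dicts are merged recursively instead of replaced wholesale."""
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, overriding only `OPTIMIZER.BASE_LR` leaves the base's `OPTIMIZER.MAX_EPOCH` intact.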
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/csn_ek100_submission.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/csn.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: false
8 | DATASET: epickitchen100
9 | BATCH_SIZE: 256
10 | CHECKPOINT_FILE_PATH: ""
11 | TEST:
12 | ENABLE: false
13 | DATASET: epickitchen100
14 | BATCH_SIZE: 256
15 | SUBMISSION:
16 | ENABLE: true
17 | ACTION_CLASS_ENSUMBLE_METHOD: "sum" # sum or calculate
18 | TASK_TYPE: submission
19 | DATA:
20 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/clips_512/
21 | ANNO_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/annos/epic-kitchens-100-annotations-master/
22 | NUM_INPUT_FRAMES: 32
23 | SAMPLING_RATE: 2
24 | TEST_SCALE: 256
25 | TEST_CROP_SIZE: 256
26 | MULTI_LABEL: true
27 | TARGET_FPS: 60
28 | VIDEO:
29 | HEAD:
30 | NAME: BaseHeadx2
31 | NUM_CLASSES: [97, 300]
32 | DROPOUT_RATE: 0.5
33 | DATA_LOADER:
34 | NUM_WORKERS: 4
35 | NUM_GPUS: 32
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/ek100/csn.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../csn_ek100.yaml
2 | TRAIN:
3 | CHECKPOINT_PERIOD: 1
4 | CHECKPOINT_FILE_PATH: "" # pretrained weights from K400/K700/IG65M...
5 | FINE_TUNE: true
6 | CHECKPOINT_PRE_PROCESS:
7 | ENABLE: true
8 | POP_HEAD: true
9 | POS_EMBED:
10 | PATCH_EMBD:
11 | AUGMENTATION:
12 | COLOR_AUG: true
13 | BRIGHTNESS: 0.5
14 | CONTRAST: 0.5
15 | SATURATION: 0.5
16 | HUE: 0.25
17 | GRAYSCALE: 0.0
18 | CONSISTENT: true
19 | SHUFFLE: false
20 | GRAY_FIRST: false
21 | USE_GPU: false
22 | MIXUP:
23 | ENABLE: true
24 | ALPHA: 0.2
25 | PROB: 1.0
26 | MODE: batch
27 | SWITCH_PROB: 0.5
28 | CUTMIX:
29 | ENABLE: true
30 | ALPHA: 1.0
31 | MINMAX:
32 | RANDOM_ERASING:
33 | ENABLE: true
34 | PROB: 0.25
35 | MODE: pixel
36 | COUNT: [1, 1]
37 | NUM_SPLITS: 0
38 | AREA_RANGE: [0.02, 0.33]
39 | MIN_ASPECT: 0.3
40 | LABEL_SMOOTHING: 0.2
41 | BN:
42 | WB_LOCK: false
43 | FREEZE: true
44 | OUTPUT_DIR: output/csn_ek100
45 |
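The MIXUP block in `ek100/csn.yaml` (ALPHA: 0.2, MODE: batch) blends each sample with another sample from the same batch using a Beta-distributed coefficient, and blends the labels identically. A pure-Python sketch of batch-mode mixup (CutMix switching and per-element mode are left out):

```python
import random

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Batch-mode mixup: pair each sample with a permuted partner and blend
    features and one-hot labels with lam ~ Beta(alpha, alpha).
    x: list of feature vectors; y: list of one-hot label vectors."""
    rng = rng or random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(x)))
    rng.shuffle(perm)
    mixed_x = [[lam * a + (1 - lam) * b for a, b in zip(x[i], x[j])]
               for i, j in enumerate(perm)]
    mixed_y = [[lam * a + (1 - lam) * b for a, b in zip(y[i], y[j])]
               for i, j in enumerate(perm)]
    return mixed_x, mixed_y, lam
```

With a small ALPHA such as 0.2, lam concentrates near 0 or 1, so most mixed samples stay close to one of the two originals.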
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/ek100/csn_submit.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../csn_ek100_submission.yaml
2 | TRAIN:
3 | CHECKPOINT_FILE_PATH: ./checkpoints/csn152_pt_k700_ft_ek100_32x224x224_4452_public.pyth
4 | BATCH_SIZE: 256
5 | TEST:
6 | BATCH_SIZE: 256
7 | OUTPUT_DIR: output/csn_ek100_submit
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/ek100/csn_test.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../csn_ek100.yaml
2 | TRAIN:
3 | ENABLE: false
4 | CHECKPOINT_FILE_PATH: ./checkpoints/csn152_pt_k700_ft_ek100_32x224x224_4452_public.pyth
5 | BN:
6 | WB_LOCK: false
7 | FREEZE: true
8 | OUTPUT_DIR: output/csn_ek100_test
9 |
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/ek100/vivit_fac_enc.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../vivit_fac_enc_ek100.yaml
2 | TRAIN:
3 | CHECKPOINT_PERIOD: 1
4 | EVAL_PERIOD: 1
5 | CHECKPOINT_FILE_PATH: "" # path to the pretrained weights
6 | FINE_TUNE: true
7 | BATCH_SIZE: 128
8 | CHECKPOINT_PRE_PROCESS:
9 | ENABLE: true
10 | POP_HEAD: true
11 | POS_EMBED: super-resolution
12 | PATCH_EMBD:
13 |
14 | DATA:
15 | TRAIN_JITTER_SCALES: [336, 448]
16 | TRAIN_CROP_SIZE: 320
17 | TEST_SCALE: 320
18 | TEST_CROP_SIZE: 320
19 |
20 | AUGMENTATION:
21 | COLOR_AUG: true
22 | BRIGHTNESS: 0.5
23 | CONTRAST: 0.5
24 | SATURATION: 0.5
25 | HUE: 0.25
26 | GRAYSCALE: 0.0
27 | CONSISTENT: true
28 | SHUFFLE: false
29 | GRAY_FIRST: false
30 | USE_GPU: false
31 | MIXUP:
32 | ENABLE: true
33 | ALPHA: 0.2
34 | PROB: 1.0
35 | MODE: batch
36 | SWITCH_PROB: 0.5
37 | CUTMIX:
38 | ENABLE: true
39 | ALPHA: 1.0
40 | MINMAX:
41 | RANDOM_ERASING:
42 | ENABLE: true
43 | PROB: 0.25
44 | MODE: pixel
45 | COUNT: [1, 1]
46 | NUM_SPLITS: 0
47 | AREA_RANGE: [0.02, 0.33]
48 | MIN_ASPECT: 0.3
49 | LABEL_SMOOTHING: 0.2
50 |
51 | VIDEO:
52 | BACKBONE:
53 | DROP_PATH: 0.2
54 | HEAD:
55 | DROPOUT_RATE: 0.0
56 |
57 | DATA_LOADER:
58 | NUM_WORKERS: 8
59 |
60 | OUTPUT_DIR: output/vivit_fac_enc_ek100
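The fine-tuning config above sets LABEL_SMOOTHING: 0.2, i.e. the cross-entropy target is softened rather than one-hot. A sketch under one common convention (off-value `smoothing / num_classes`, on-value `1 - smoothing + off`); the repo may distribute the smoothing mass slightly differently:

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.2):
    """Cross-entropy against a label-smoothed target distribution."""
    n = len(logits)
    m = max(logits)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    off = smoothing / n
    soft = [off + (1.0 - smoothing) * (1.0 if i == target else 0.0)
            for i in range(n)]
    return -sum(s * lp for s, lp in zip(soft, log_probs))
```

Smoothing penalizes over-confident predictions: a model that puts all its mass on the true class still pays a small loss for the off-target mass.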
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/ek100/vivit_fac_enc_submit.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../vivit_fac_enc_ek100_submission.yaml
2 | TRAIN:
3 | CHECKPOINT_PERIOD: 1
4 | EVAL_PERIOD: 1
5 | CHECKPOINT_FILE_PATH: ./checkpoints/vivit_fac_enc_b16x2_pt_k700_ft_ek100_32x224x224_4630_public.pyth
6 | FINE_TUNE: true
7 | BATCH_SIZE: 256
8 |
9 | DATA:
10 | TRAIN_JITTER_SCALES: [336, 448]
11 | TRAIN_CROP_SIZE: 320
12 | TEST_SCALE: 320
13 | TEST_CROP_SIZE: 320
14 |
15 | DATA_LOADER:
16 | NUM_WORKERS: 8
17 |
18 | OUTPUT_DIR: output/vivit_fac_enc_ek100_submit
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/ek100/vivit_fac_enc_test.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../vivit_fac_enc_ek100.yaml
2 | TRAIN:
3 | ENABLE: false
4 | CHECKPOINT_FILE_PATH: ./checkpoints/vivit_fac_enc_b16x2_pt_k700_ft_ek100_32x224x224_4630_public.pyth
5 | CHECKPOINT_PRE_PROCESS:
6 | ENABLE: true
7 | POP_HEAD: true
8 | POS_EMBED: super-resolution
9 | PATCH_EMBD:
10 |
11 | DATA:
12 | TRAIN_JITTER_SCALES: [336, 448]
13 | TRAIN_CROP_SIZE: 320
14 | TEST_SCALE: 320
15 | TEST_CROP_SIZE: 320
16 |
17 | DATA_LOADER:
18 | NUM_WORKERS: 8
19 |
20 | OUTPUT_DIR: output/vivit_fac_enc_ek100_test
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/k400/vivit_fac_enc_b16x2.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../vivit_fac_enc_k400.yaml
2 | TRAIN:
3 | CHECKPOINT_PERIOD: 1
4 | EVAL_PERIOD: 1
5 | CHECKPOINT_FILE_PATH: "" # path to the ImageNet-pretrained ViT-B/16 (224) weights
6 | FINE_TUNE: true
7 | OPTIMIZER:
8 | BASE_LR: 0.0001
9 | ADJUST_LR: false
10 | LR_POLICY: cosine
11 | MAX_EPOCH: 30
12 | MOMENTUM: 0.9
13 | WEIGHT_DECAY: 0.1
14 | WARMUP_EPOCHS: 2.5
15 | WARMUP_START_LR: 0.000001
16 | OPTIM_METHOD: adamw
17 | DAMPENING: 0.0
18 | NESTEROV: true
19 | MODEL:
20 | EMA:
21 | ENABLE: true
22 | DECAY: 0.999
23 |
24 | AUGMENTATION:
25 | COLOR_AUG: true
26 | BRIGHTNESS: 0.5
27 | CONTRAST: 0.5
28 | SATURATION: 0.5
29 | HUE: 0.25
30 | GRAYSCALE: 0.3
31 | CONSISTENT: true
32 | SHUFFLE: true
33 | GRAY_FIRST: true
34 | USE_GPU: false
35 | MIXUP:
36 | ENABLE: true
37 | ALPHA: 0.2
38 | PROB: 1.0
39 | MODE: batch
40 | SWITCH_PROB: 0.5
41 | LABEL_SMOOTHING: 0.1
42 |
43 | VIDEO:
44 | HEAD:
45 | DROPOUT_RATE: 0.0
46 |
47 | OUTPUT_DIR: output/vivit_fac_enc_k400
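`vivit_fac_enc_b16x2.yaml` enables a model EMA with DECAY: 0.999, i.e. an exponential moving average of the weights is maintained alongside the trained model and typically used for evaluation. One update step can be sketched as:

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step per named parameter: ema <- decay * ema + (1 - decay) * model."""
    return {name: decay * ema_params[name] + (1.0 - decay) * value
            for name, value in model_params.items()}
```

With decay 0.999 the EMA averages over roughly the last 1000 updates, smoothing out per-step noise.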
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/k400/vivit_fac_enc_b16x2_test.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../vivit_fac_enc_k400.yaml
2 | TRAIN:
3 | ENABLE: false
4 | CHECKPOINT_FILE_PATH: "./checkpoints/vivit_fac_enc_b16x2_k400_32x224x224_7935_public.pyth"
5 |
6 | OUTPUT_DIR: output/vivit_fac_enc_k400_test
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/vivit_fac_enc_ek100.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/vivit_fac_enc.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: epickitchen100
9 | BATCH_SIZE: 256
10 | CHECKPOINT_FILE_PATH: ""
11 | TEST:
12 | ENABLE: true
13 | DATASET: epickitchen100
14 | BATCH_SIZE: 256
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/clips_512/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/annos/epic-kitchens-100-annotations-master/
18 | NUM_INPUT_FRAMES: 32
19 | SAMPLING_RATE: 2
20 | MULTI_LABEL: true
21 | TARGET_FPS: 60
22 | VIDEO:
23 | HEAD:
24 | NAME: TransformerHeadx2
25 | NUM_CLASSES: [97, 300]
26 | DROPOUT_RATE: 0.5
27 |
28 | DATA_LOADER:
29 | NUM_WORKERS: 4
30 |
31 | OPTIMIZER:
32 | BASE_LR: 0.0001
33 | ADJUST_LR: false
34 | LR_POLICY: cosine
35 | MAX_EPOCH: 50
36 | MOMENTUM: 0.9
37 | WEIGHT_DECAY: 0.05
38 | WARMUP_EPOCHS: 5
39 | WARMUP_START_LR: 0.000001
40 | OPTIM_METHOD: adamw
41 | DAMPENING: 0.0
42 | NESTEROV: true
43 | NUM_GPUS: 32
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/vivit_fac_enc_ek100_submission.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/vivit_fac_enc.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: false
8 | DATASET: epickitchen100
9 | BATCH_SIZE: 256
10 | CHECKPOINT_FILE_PATH: ""
11 | TEST:
12 | ENABLE: false
13 | DATASET: epickitchen100
14 | BATCH_SIZE: 256
15 | SUBMISSION:
16 | ENABLE: true
17 | ACTION_CLASS_ENSUMBLE_METHOD: "sum" # sum or calculate
18 | TASK_TYPE: submission
19 | DATA:
20 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/clips_512/
21 | ANNO_DIR: /mnt/ziyuan/ziyuan/EPIC-KITCHENS-100/annos/epic-kitchens-100-annotations-master/
22 | NUM_INPUT_FRAMES: 32
23 | SAMPLING_RATE: 2
24 | MULTI_LABEL: true
25 | TARGET_FPS: 60
26 | VIDEO:
27 | HEAD:
28 | NAME: TransformerHeadx2
29 | NUM_CLASSES: [97, 300]
30 | DROPOUT_RATE: 0.5
31 |
32 | DATA_LOADER:
33 | NUM_WORKERS: 10
34 | NUM_GPUS: 32
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-ar/vivit_fac_enc_k400.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/vivit_fac_enc.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: kinetics400
9 | BATCH_SIZE: 256
10 | CHECKPOINT_FILE_PATH: ""
11 | TEST:
12 | ENABLE: true
13 | DATASET: kinetics400
14 | BATCH_SIZE: 256
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/kinetics400/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/kinetics400/
18 | SAMPLING_RATE: 2
19 | NUM_INPUT_FRAMES: 32
20 | VIDEO:
21 | HEAD:
22 | NUM_CLASSES: 400
23 | DROPOUT_RATE: 0.5
24 |
25 | DATA_LOADER:
26 | NUM_WORKERS: 4
27 | NUM_GPUS: 32
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-tal/bmn-epic/vivit-os-local.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../bmn_epic.yaml
2 | TRAIN:
3 | ENABLE: true
4 | BATCH_SIZE: 4
5 | CHECKPOINT_FILE_PATH: ""
6 | TEST:
7 | ENABLE: true
8 | BATCH_SIZE: 4
9 | TEST_CHECKPOINT: [9]
10 | CHECKPOINT_FILE_PATH: ""
11 | OUTPUT_DIR: /mnt/data-nas/qingzhiwu/results/checkpoints/epic_tal/vvt-os/
12 |
13 |
--------------------------------------------------------------------------------
/configs/projects/epic-kitchen-tal/bmn_epic.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/localization.yaml
2 | _BASE_MODEL: ../../pool/backbone/localization-conv.yaml
3 |
4 | TRAIN:
5 | ENABLE: true
6 | BATCH_SIZE: 16
7 | DATASET: Epickitchen100Localization
8 | CHECKPOINT_FILE_PATH: # !!@2
9 | TEST:
10 | ENABLE: true
11 | BATCH_SIZE: 16
12 | DATASET: Epickitchen100Localization
13 |
14 | LOCALIZATION:
15 | ENABLE: true
16 | LOSS: Tem+PemReg+PemCls
17 | LOSS_WEIGHTS: [1,10,1,1]
18 | TEST_OUTPUT_DIR: ./output/
19 | PROPS_DIR: prop_results
20 | RESULT_FILE: tal_detection_res
21 | CLASSIFIER_FILE:
22 | POST_PROCESS:
23 | # PROP_NUM_RATIO: 2  # duplicate key, overridden by PROP_NUM_RATIO: 1.0 below (most YAML loaders keep the later value)
24 | THREAD: 32
25 | SOFT_NMS_ALPHA: 0.4
26 | SOFT_NMS_LOW_THRES: 0.25
27 | SOFT_NMS_HIGH_THRES: 0.9
28 | PROP_NUM_RATIO: 1.0
29 | SELECT_SCORE: 0.0
30 | SCORE_TYPE: 'cr'
31 | CLR_POWER: 1.2
32 | REG_POWER: 1.0
33 | IOU_POWER: 2.0
34 | ACTION_SCORE_POWER: 1.0
35 | VIDEO_SCORES_WEIGHT: 1.0
36 |
37 | DATA:
38 | DATA_ROOT_DIR: [/mnt/data-nas/qingzhiwu/dataset/epic-tal/features/features_s8_fps60_320_-1_train/]
39 | ANNO_DIR: /mnt/data-nas/qingzhiwu/dataset/epic-tal/annotations/
40 | VIDEO_LENGTH_FILE: epic_videos_len.txt
41 | ANNO_NAME: "EPIC_100_validation.json"
42 | TEMPORAL_SCALE: 200
43 | DURATION_SCALE: 100
44 | NUM_INPUT_CHANNELS: 6912
45 | NORM_FEATURE: false
46 | LABELS_TYPE: bmn
47 | LOAD_TYPE: torch
48 | CLIPS_LIST_FILE: 5s_clips.txt
49 | TARGET_FPS: 60
50 | NUM_INPUT_FRAMES: 32
51 | SAMPLING_RATE: 2
52 | CLIP_INTERVAL: 8
53 | MULTI_LABEL: true
54 | CLASSIFIER_ROOT_DIR: /mnt/data-nas/qingzhiwu/dataset/epic-tal/features/cls_res_s8_fps60_320_-1_train/
55 | LOAD_CLASSIFIER_RES: true
56 |
57 | OPTIMIZER:
58 | BASE_LR: 0.002
59 | ADJUST_LR: true
60 | LR_POLICY: cosine
61 | MAX_EPOCH: 10
62 | MOMENTUM: 0.9
63 | WEIGHT_DECAY: 1e-4
64 | WARMUP_EPOCHS: 1
65 | WARMUP_START_LR: 0.00001
66 | OPTIM_METHOD: adamw
67 | DAMPENING: 0.0
68 | NESTEROV: true
69 |
70 | VIDEO:
71 | HEAD:
72 | NAME: BaseBMN
73 | ACTIVATION: sigmoid
74 | DROPOUT_RATE: 0
75 | NUM_SAMPLE: 32
76 | NUM_SAMPLE_PERBIN: 3
77 | BOUNDARY_RATIO: 0.5
78 | USE_BMN_REGRESSION: false
79 |
80 | LOG_PERIOD: 50
81 | USE_MULTISEG_VAL_DIST: true
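`bmn_epic.yaml` ranks BMN proposals by fusing several confidences (SCORE_TYPE: 'cr' with CLR_POWER, REG_POWER, IOU_POWER). One plausible reading, assumed here rather than confirmed from the repo, is a product of the powered components, with the powers acting as per-score temperature knobs:

```python
def proposal_score(clr, reg, iou, clr_power=1.2, reg_power=1.0, iou_power=2.0):
    """Fuse per-proposal confidences (classification, regression, IoU) into a
    single ranking score by multiplying powered components."""
    return (clr ** clr_power) * (reg ** reg_power) * (iou ** iou_power)
```

Raising IOU_POWER above 1 (here 2.0) makes the ranking more sensitive to the predicted overlap quality than to the raw confidence scores.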
--------------------------------------------------------------------------------
/configs/projects/mosi/baselines/r2d3ds_hmdb.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../../pool/run/training/from_scratch.yaml
2 | _BASE_MODEL: ../../../pool/backbone/r2d3ds.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: HMDB51
9 | CHECKPOINT_FILE_PATH: ""
10 | BATCH_SIZE: 1024
11 | TEST:
12 | ENABLE: true
13 | DATASET: HMDB51
14 | BATCH_SIZE: 1024
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/hmdb51/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/hmdb51/anno_lists/
18 | VIDEO:
19 | HEAD:
20 | NUM_CLASSES: 51
21 | DROPOUT_RATE: 0.5
22 | OUTPUT_DIR: output/r2d3ds_hmdb_from_scratch
23 | NUM_GPUS: 8
--------------------------------------------------------------------------------
/configs/projects/mosi/baselines/r2d3ds_ucf.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../../pool/run/training/from_scratch.yaml
2 | _BASE_MODEL: ../../../pool/backbone/r2d3ds.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: UCF101
9 | CHECKPOINT_FILE_PATH: ""
10 | BATCH_SIZE: 1024
11 | TEST:
12 | ENABLE: true
13 | DATASET: UCF101
14 | BATCH_SIZE: 1024
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ucf101/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/ucf101/annotations/
18 | VIDEO:
19 | HEAD:
20 | NUM_CLASSES: 101
21 | DROPOUT_RATE: 0.5
22 | OUTPUT_DIR: output/r2d3ds_ucf_from_scratch
23 | NUM_GPUS: 8
24 |
--------------------------------------------------------------------------------
/configs/projects/mosi/baselines/r2p1d_hmdb.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../../pool/run/training/from_scratch.yaml
2 | _BASE_MODEL: ../../../pool/backbone/r2p1d.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: HMDB51
9 | CHECKPOINT_FILE_PATH: ""
10 | BATCH_SIZE: 384
11 | TEST:
12 | ENABLE: true
13 | DATASET: HMDB51
14 | BATCH_SIZE: 384
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/hmdb51/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/hmdb51/anno_lists/
18 | VIDEO:
19 | HEAD:
20 | NUM_CLASSES: 51
21 | DROPOUT_RATE: 0.5
22 | OUTPUT_DIR: output/r2p1d_hmdb_from_scratch
23 | NUM_GPUS: 8
--------------------------------------------------------------------------------
/configs/projects/mosi/baselines/r2p1d_ucf.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../../pool/run/training/from_scratch.yaml
2 | _BASE_MODEL: ../../../pool/backbone/r2p1d.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: UCF101
9 | CHECKPOINT_FILE_PATH: ""
10 | BATCH_SIZE: 384
11 | TEST:
12 | ENABLE: true
13 | DATASET: UCF101
14 | BATCH_SIZE: 384
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ucf101/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/ucf101/annotations/
18 | VIDEO:
19 | HEAD:
20 | NUM_CLASSES: 101
21 | DROPOUT_RATE: 0.5
22 | OUTPUT_DIR: output/r2p1d_ucf_from_scratch
23 | NUM_GPUS: 8
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-hmdb/r2d3ds.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2d3ds_hmdb.yaml
2 | TRAIN:
3 | CHECKPOINT_FILE_PATH: ./checkpoints/r2d3ds_pt_hmdb_mosi_public.pyth
4 | OUTPUT_DIR: output/r2d3ds_mosi_ft_hmdb
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-hmdb/r2d3ds_test.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2d3ds_hmdb.yaml
2 | TRAIN:
3 | ENABLE: false
4 | CHECKPOINT_FILE_PATH: ./checkpoints/r2d3ds_pt_hmdb_ft_hmdb_4693_public.pyth
5 | OUTPUT_DIR: output/r2d3ds_mosi_ft_hmdb_test
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-hmdb/r2p1d.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2p1d_hmdb.yaml
2 | TRAIN:
3 | CHECKPOINT_FILE_PATH: ./checkpoints/r2p1d_pt_hmdb_mosi_public.pyth
4 | OUTPUT_DIR: output/r2p1d_mosi_ft_hmdb
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-hmdb/r2p1d_test.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2p1d_hmdb.yaml
2 | TRAIN:
3 | ENABLE: false
4 | CHECKPOINT_FILE_PATH: ./checkpoints/r2p1d_pt_hmdb_ft_hmdb_5183_public.pyth
5 | OUTPUT_DIR: output/r2p1d_mosi_ft_hmdb_test
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-ucf/r2d3ds.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2d3ds_ucf.yaml
2 | TRAIN:
3 | CHECKPOINT_FILE_PATH: ./checkpoints/r2d3ds_pt_ucf_mosi_public.pyth
4 | OUTPUT_DIR: output/r2d3ds_mosi_ft_ucf
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-ucf/r2d3ds_test.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2d3ds_ucf.yaml
2 | TRAIN:
3 | ENABLE: false
4 | CHECKPOINT_FILE_PATH: ./checkpoints/r2d3ds_pt_ucf_ft_ucf_7175_public.pyth
5 | OUTPUT_DIR: output/r2d3ds_mosi_ft_ucf_test
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-ucf/r2p1d.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2p1d_ucf.yaml
2 | TRAIN:
3 | CHECKPOINT_FILE_PATH: ./checkpoints/r2p1d_pt_ucf_mosi_public.pyth
4 | OUTPUT_DIR: output/r2p1d_mosi_ft_ucf
5 |
--------------------------------------------------------------------------------
/configs/projects/mosi/ft-ucf/r2p1d_test.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../ft_r2p1d_ucf.yaml
2 | TRAIN:
3 | ENABLE: false
4 | CHECKPOINT_FILE_PATH: ./checkpoints/r2p1d_pt_ucf_ft_ucf_8279_public.pyth
5 | OUTPUT_DIR: output/r2p1d_mosi_ft_ucf_test
6 |
--------------------------------------------------------------------------------
/configs/projects/mosi/ft_r2d3ds_hmdb.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/finetune.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2d3ds.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: HMDB51
9 | CHECKPOINT_FILE_PATH: "" # !!@2
10 | BATCH_SIZE: 1024
11 | TEST:
12 | ENABLE: true
13 | DATASET: HMDB51
14 | BATCH_SIZE: 1024
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/hmdb51/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/hmdb51/anno_lists/
18 | MINUS_INTERVAL: false
19 | VIDEO:
20 | HEAD:
21 | NUM_CLASSES: 51
22 | DROPOUT_RATE: 0.5
23 | OPTIMIZER:
24 | BASE_LR: 0.002
25 | WARMUP_START_LR: 0.0002
26 | NUM_GPUS: 8
--------------------------------------------------------------------------------
/configs/projects/mosi/ft_r2d3ds_ucf.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/finetune.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2d3ds.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: UCF101
9 | CHECKPOINT_FILE_PATH: "" # !!@2
10 | BATCH_SIZE: 1024
11 | TEST:
12 | ENABLE: true
13 | DATASET: UCF101
14 | BATCH_SIZE: 1024
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ucf101/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/ucf101/annotations/
18 | MINUS_INTERVAL: false
19 | VIDEO:
20 | HEAD:
21 | NUM_CLASSES: 101
22 | DROPOUT_RATE: 0.5
23 | OPTIMIZER:
24 | BASE_LR: 0.004
25 | WARMUP_START_LR: 0.0004
26 | NUM_GPUS: 8
--------------------------------------------------------------------------------
/configs/projects/mosi/ft_r2p1d_hmdb.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/finetune.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2p1d.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: HMDB51
9 | CHECKPOINT_FILE_PATH: "" # path to the pretrained checkpoint
10 | BATCH_SIZE: 384
11 | TEST:
12 | ENABLE: true
13 | DATASET: HMDB51
14 | BATCH_SIZE: 384
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/hmdb51/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/hmdb51/anno_lists/
18 | MINUS_INTERVAL: false
19 | VIDEO:
20 | HEAD:
21 | NUM_CLASSES: 51
22 | DROPOUT_RATE: 0.5
23 | OPTIMIZER:
24 | BASE_LR: 0.00075
25 | WARMUP_START_LR: 0.000075
26 | NUM_GPUS: 8
--------------------------------------------------------------------------------
/configs/projects/mosi/ft_r2p1d_ucf.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/finetune.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2p1d.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: UCF101
9 | CHECKPOINT_FILE_PATH: "" # path to the pretrained checkpoint
10 | BATCH_SIZE: 384
11 | TEST:
12 | ENABLE: true
13 | DATASET: UCF101
14 | BATCH_SIZE: 384
15 | DATA:
16 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ucf101/videos/
17 | ANNO_DIR: /mnt/ziyuan/ziyuan/ucf101/annotations/
18 | MINUS_INTERVAL: false
19 | VIDEO:
20 | HEAD:
21 | NUM_CLASSES: 101
22 | DROPOUT_RATE: 0.5
23 | OPTIMIZER:
24 | BASE_LR: 0.0015
25 | WARMUP_START_LR: 0.00015
26 | NUM_GPUS: 8
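The `_BASE`, `_BASE_RUN`, and `_BASE_MODEL` keys in the configs above compose files by recursive key-wise override: the base files are loaded first, and keys in the project config win. A minimal Python sketch of that merge semantics (the helper name is hypothetical; the repo's actual loader may resolve paths and validate keys differently):

```python
def merge_cfg(base: dict, override: dict) -> dict:
    """Recursively override `base` with keys from `override`, without mutating `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = value  # leaf values from the project config win
    return merged

# e.g. a run base (finetune.yaml-style defaults) overridden by ft_r2p1d_ucf.yaml values
base = {"TRAIN": {"ENABLE": True, "BATCH_SIZE": 1024}, "NUM_GPUS": 8}
project = {"TRAIN": {"BATCH_SIZE": 384, "DATASET": "UCF101"}}
cfg = merge_cfg(base, project)
print(cfg["TRAIN"])  # {'ENABLE': True, 'BATCH_SIZE': 384, 'DATASET': 'UCF101'}
```

This is why the project files only need to list the keys they change (e.g. `BATCH_SIZE`, `DATASET`), inheriting everything else from the pool configs.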
--------------------------------------------------------------------------------
/configs/projects/mosi/mosi_r2d3ds_hmdb.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/mosi.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2d3ds.yaml
3 |
4 | TRAIN:
5 | ENABLE: true
6 | DATASET: HMDB51
7 | BATCH_SIZE: 10 # 10 per gpu
8 | LOG_FILE: training_log.log
9 | EVAL_PERIOD: 5
10 | NUM_FOLDS: 20
11 | AUTO_RESUME: true
12 | CHECKPOINT_PERIOD: 10
13 | CHECKPOINT_FILE_PATH: "" # path to checkpoint for initialization, if any
14 | CHECKPOINT_TYPE: pytorch
15 | CHECKPOINT_INFLATE: false
16 | FINE_TUNE: false
17 | ONLY_LINEAR: false
18 | TEST:
19 | ENABLE: false
20 | DATASET: HMDB51
21 | BATCH_SIZE: 10
22 | NUM_SPATIAL_CROPS: 1
23 | SPATIAL_CROPS: cc
24 | NUM_ENSEMBLE_VIEWS: 1
25 | LOG_FILE: val.log
26 | CHECKPOINT_FILE_PATH: ""
27 | CHECKPOINT_TYPE: pytorch
28 | AUTOMATIC_MULTI_SCALE_TEST: false
29 | DATA:
30 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/hmdb51/videos/
31 | ANNO_DIR: /mnt/ziyuan/ziyuan/hmdb51/anno_lists/
32 | NUM_GPUS: 16
--------------------------------------------------------------------------------
/configs/projects/mosi/mosi_r2d3ds_imagenet.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/mosi.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2d3ds.yaml
3 |
4 | PRETRAIN:
5 | IMAGENET_DATA_SIZE:
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: imagenet
9 | BATCH_SIZE: 10 # 10 per gpu
10 | LOG_FILE: training_log.log
11 | EVAL_PERIOD: 5
12 | NUM_FOLDS: 20
13 | AUTO_RESUME: true
14 | CHECKPOINT_PERIOD: 10
15 | CHECKPOINT_FILE_PATH: "" # path to checkpoint for initialization, if any
16 | CHECKPOINT_TYPE: pytorch
17 | CHECKPOINT_INFLATE: false
18 | FINE_TUNE: false
19 | ONLY_LINEAR: false
20 | TEST:
21 | ENABLE: false
22 | DATASET: imagenet
23 | BATCH_SIZE: 10
24 | NUM_SPATIAL_CROPS: 1
25 | SPATIAL_CROPS: cc
26 | NUM_ENSEMBLE_VIEWS: 1
27 | LOG_FILE: val.log
28 | CHECKPOINT_FILE_PATH: ""
29 | CHECKPOINT_TYPE: pytorch
30 | AUTOMATIC_MULTI_SCALE_TEST: false
31 | DATA:
32 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/imagenet/
33 | ANNO_DIR: /mnt/ziyuan/ziyuan/imagenet/
34 | MEAN: [0.485, 0.456, 0.406]
35 | STD: [0.229, 0.224, 0.225]
36 | NUM_GPUS: 16
--------------------------------------------------------------------------------
/configs/projects/mosi/mosi_r2d3ds_ucf.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/mosi.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2d3ds.yaml
3 |
4 | TRAIN:
5 | ENABLE: true
6 | DATASET: UCF101
7 | BATCH_SIZE: 10 # 10 per gpu
8 | LOG_FILE: training_log.log
9 | EVAL_PERIOD: 5
10 | NUM_FOLDS: 20
11 | AUTO_RESUME: true
12 | CHECKPOINT_PERIOD: 10
13 | CHECKPOINT_FILE_PATH: "" # path to checkpoint for initialization, if any
14 | CHECKPOINT_TYPE: pytorch
15 | CHECKPOINT_INFLATE: false
16 | FINE_TUNE: false
17 | ONLY_LINEAR: false
18 | TEST:
19 | ENABLE: false
20 | DATASET: UCF101
21 | BATCH_SIZE: 10
22 | NUM_SPATIAL_CROPS: 1
23 | SPATIAL_CROPS: cc
24 | NUM_ENSEMBLE_VIEWS: 1
25 | LOG_FILE: val.log
26 | CHECKPOINT_FILE_PATH: ""
27 | CHECKPOINT_TYPE: pytorch
28 | AUTOMATIC_MULTI_SCALE_TEST: false
29 | DATA:
30 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ucf101/videos/
31 | ANNO_DIR: /mnt/ziyuan/ziyuan/ucf101/annotations/
32 | NUM_GPUS: 16
--------------------------------------------------------------------------------
/configs/projects/mosi/mosi_r2p1d_hmdb.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/mosi.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2p1d.yaml
3 |
4 | TRAIN:
5 | ENABLE: true
6 | DATASET: HMDB51
7 | BATCH_SIZE: 5 # 5 per gpu
8 | LOG_FILE: training_log.log
9 | EVAL_PERIOD: 5
10 | NUM_FOLDS: 20
11 | AUTO_RESUME: true
12 | CHECKPOINT_PERIOD: 10
13 | CHECKPOINT_FILE_PATH: "" # path to checkpoint for initialization, if any
14 | CHECKPOINT_TYPE: pytorch
15 | CHECKPOINT_INFLATE: false
16 | FINE_TUNE: false
17 | ONLY_LINEAR: false
18 | TEST:
19 | ENABLE: false
20 | DATASET: HMDB51
21 | BATCH_SIZE: 5
22 | NUM_SPATIAL_CROPS: 1
23 | SPATIAL_CROPS: cc
24 | NUM_ENSEMBLE_VIEWS: 1
25 | LOG_FILE: val.log
26 | CHECKPOINT_FILE_PATH: ""
27 | CHECKPOINT_TYPE: pytorch
28 | AUTOMATIC_MULTI_SCALE_TEST: false
29 | DATA:
30 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/hmdb51/videos/
31 | ANNO_DIR: /mnt/ziyuan/ziyuan/hmdb51/anno_lists/
32 | NUM_GPUS: 16
--------------------------------------------------------------------------------
/configs/projects/mosi/mosi_r2p1d_ucf.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/mosi.yaml
2 | _BASE_MODEL: ../../pool/backbone/r2p1d.yaml
3 |
4 | TRAIN:
5 | ENABLE: true
6 | DATASET: UCF101
7 | BATCH_SIZE: 5 # 5 per gpu
8 | LOG_FILE: training_log.log
9 | EVAL_PERIOD: 5
10 | NUM_FOLDS: 20
11 | AUTO_RESUME: true
12 | CHECKPOINT_PERIOD: 10
13 | CHECKPOINT_FILE_PATH: "" # path to checkpoint for initialization, if any
14 | CHECKPOINT_TYPE: pytorch
15 | CHECKPOINT_INFLATE: false
16 | FINE_TUNE: false
17 | ONLY_LINEAR: false
18 | TEST:
19 | ENABLE: false
20 | DATASET: UCF101
21 | BATCH_SIZE: 5
22 | NUM_SPATIAL_CROPS: 1
23 | SPATIAL_CROPS: cc
24 | NUM_ENSEMBLE_VIEWS: 1
25 | LOG_FILE: val.log
26 | CHECKPOINT_FILE_PATH: ""
27 | CHECKPOINT_TYPE: pytorch
28 | AUTOMATIC_MULTI_SCALE_TEST: false
29 | DATA:
30 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ucf101/videos/
31 | ANNO_DIR: /mnt/ziyuan/ziyuan/ucf101/annotations/
32 | NUM_GPUS: 16
--------------------------------------------------------------------------------
/configs/projects/mosi/pt-hmdb/r2d3ds.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../mosi_r2d3ds_hmdb.yaml
2 | TRAIN:
3 | EVAL_PERIOD: 10
4 | OUTPUT_DIR: output/r2d3ds_pt_hmdb
--------------------------------------------------------------------------------
/configs/projects/mosi/pt-hmdb/r2p1d.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../mosi_r2p1d_hmdb.yaml
2 | TRAIN:
3 | EVAL_PERIOD: 10
4 | OUTPUT_DIR: output/r2p1d_pt_hmdb
5 |
--------------------------------------------------------------------------------
/configs/projects/mosi/pt-imagenet/r2d3ds.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../mosi_r2d3ds_imagenet.yaml
2 | TRAIN:
3 | EVAL_PERIOD: 10
4 | PRETRAIN:
5 | IMAGENET_DATA_SIZE: 5
6 | OUTPUT_DIR: output/r2d3ds_pt_imagenet
--------------------------------------------------------------------------------
/configs/projects/mosi/pt-ucf/r2d3ds.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../mosi_r2d3ds_ucf.yaml
2 | TRAIN:
3 | EVAL_PERIOD: 10
4 | OUTPUT_DIR: output/r2d3ds_pt_ucf
--------------------------------------------------------------------------------
/configs/projects/mosi/pt-ucf/r2p1d.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../mosi_r2p1d_ucf.yaml
2 | TRAIN:
3 | EVAL_PERIOD: 10
4 | OUTPUT_DIR: output/r2p1d_pt_ucf
5 |
--------------------------------------------------------------------------------
/configs/projects/tada/k400/tada2d_16x5.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../tada2d_k400.yaml
2 | TRAIN:
3 | FINE_TUNE: true
4 | BATCH_SIZE: 64
5 | INIT: in1k
6 | CHECKPOINT_FILE_PATH: "" # pretrained imagenet weights
7 | OPTIMIZER:
8 | BASE_LR: 0.12
9 | DATA:
10 | SAMPLING_RATE: 5
11 | NUM_INPUT_FRAMES: 16
12 | OUTPUT_DIR: output/tada2d_16x5_k400
--------------------------------------------------------------------------------
/configs/projects/tada/k400/tada2d_8x8.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../tada2d_k400.yaml
2 | TRAIN:
3 | FINE_TUNE: true
4 | INIT: in1k
5 | CHECKPOINT_FILE_PATH: "" # pretrained imagenet weights
6 | DATA:
7 | SAMPLING_RATE: 8
8 | NUM_INPUT_FRAMES: 8
9 | OUTPUT_DIR: output/tada2d_8x8_k400
--------------------------------------------------------------------------------
/configs/projects/tada/ssv2/tada2d_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../tada2d_ssv2.yaml
2 | TRAIN:
3 | FINE_TUNE: true
4 | BATCH_SIZE: 64
5 | INIT: in1k
6 | CHECKPOINT_FILE_PATH: "" # pretrained imagenet weights
7 | DATA:
8 | NUM_INPUT_FRAMES: 16
9 | OPTIMIZER:
10 | BASE_LR: 0.24
11 | OUTPUT_DIR: output/tada2d_ssv2_16f
--------------------------------------------------------------------------------
/configs/projects/tada/ssv2/tada2d_8f.yaml:
--------------------------------------------------------------------------------
1 | _BASE: ../tada2d_ssv2.yaml
2 | TRAIN:
3 | FINE_TUNE: true
4 | INIT: in1k
5 | CHECKPOINT_FILE_PATH: "" # pretrained imagenet weights
6 | DATA:
7 | NUM_INPUT_FRAMES: 8
8 | OUTPUT_DIR: output/tada2d_ssv2_8f
--------------------------------------------------------------------------------
/configs/projects/tada/tada2d_k400.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tada2d.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: kinetics400
9 | BATCH_SIZE: 128
10 | FINE_TUNE: true
11 | INIT: in1k
12 | CHECKPOINT_FILE_PATH: "" # pretrained imagenet weights
13 | TEST:
14 | ENABLE: true
15 | DATASET: kinetics400
16 | BATCH_SIZE: 128
17 | DATA:
18 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/kinetics400/
19 | ANNO_DIR: /mnt/ziyuan/ziyuan/kinetics400/
20 | SAMPLING_RATE: 8
21 | NUM_INPUT_FRAMES: 8
22 | TRAIN_JITTER_SCALES: [224, 340]
23 | TRAIN_CROP_SIZE: 224
24 | TEST_SCALE: 256
25 | TEST_CROP_SIZE: 256
26 | VIDEO:
27 | HEAD:
28 | NUM_CLASSES: 400
29 | DROPOUT_RATE: 0.5
30 | DATA_LOADER:
31 | NUM_WORKERS: 8
32 | OPTIMIZER:
33 | BASE_LR: 0.24
34 | ADJUST_LR: false
35 | LR_POLICY: cosine
36 | MAX_EPOCH: 100
37 | MOMENTUM: 0.9
38 | WEIGHT_DECAY: 1e-4
39 | WARMUP_EPOCHS: 8
40 | WARMUP_START_LR: 0.01
41 | OPTIM_METHOD: sgd
42 | DAMPENING: 0.0
43 | NESTEROV: true
44 | NUM_GPUS: 8
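The optimizer block above describes a linear warmup from `WARMUP_START_LR` to `BASE_LR` over `WARMUP_EPOCHS`, followed by `LR_POLICY: cosine` until `MAX_EPOCH`. A generic sketch of that schedule under those assumptions (the repo's exact formula, e.g. whether the cosine spans the full run or only the post-warmup epochs, may differ):

```python
import math

def lr_at_epoch(epoch: float, base_lr: float, warmup_start_lr: float,
                warmup_epochs: float, max_epoch: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay toward 0."""
    if epoch < warmup_epochs:
        alpha = epoch / warmup_epochs
        return warmup_start_lr + alpha * (base_lr - warmup_start_lr)
    # half-cosine over the remaining epochs: base_lr at warmup end, 0 at max_epoch
    progress = (epoch - warmup_epochs) / (max_epoch - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# values from tada2d_k400.yaml above
lr0 = lr_at_epoch(0, base_lr=0.24, warmup_start_lr=0.01, warmup_epochs=8, max_epoch=100)   # 0.01
lr8 = lr_at_epoch(8, base_lr=0.24, warmup_start_lr=0.01, warmup_epochs=8, max_epoch=100)   # 0.24
```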
--------------------------------------------------------------------------------
/configs/projects/tada/tada2d_ssv2.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tada2d.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: ssv2
9 | BATCH_SIZE: 128
10 | FINE_TUNE: true
11 | INIT: in1k
12 | CHECKPOINT_FILE_PATH: ""
13 | TEST:
14 | ENABLE: true
15 | DATASET: ssv2
16 | BATCH_SIZE: 128
17 | DATA:
18 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ssv2/
19 | ANNO_DIR: /mnt/ziyuan/ziyuan/ssv2/
20 | NUM_INPUT_FRAMES: 8
21 | SAMPLING_MODE: segment_based
22 | TRAIN_JITTER_SCALES: [224, 340]
23 | TRAIN_CROP_SIZE: 224
24 | TEST_SCALE: 256
25 | TEST_CROP_SIZE: 256
26 | VIDEO:
27 | HEAD:
28 | NUM_CLASSES: 174
29 | DROPOUT_RATE: 0.5
30 | DATA_LOADER:
31 | NUM_WORKERS: 8
32 | OPTIMIZER:
33 | BASE_LR: 0.48
34 | ADJUST_LR: false
35 | LR_POLICY: cosine
36 | MAX_EPOCH: 64
37 | MOMENTUM: 0.9
38 | WEIGHT_DECAY: 1e-4
39 | WARMUP_EPOCHS: 4
40 | WARMUP_START_LR: 0.0001
41 | OPTIM_METHOD: sgd
42 | DAMPENING: 0.0
43 | NESTEROV: true
44 | AUGMENTATION:
45 | SSV2_FLIP: true
46 | NUM_GPUS: 8
--------------------------------------------------------------------------------
/configs/projects/tadaconvnextv2/tadaconvnextv2_base_k400_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaconvnextv2_base.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: kinetics400
9 | BATCH_SIZE: 64 #total batch size: 64x4=256
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: in1k
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: kinetics400
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/kinetics400/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/kinetics400/
21 | SAMPLING_RATE: 5
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 256
26 | TEST_CROP_SIZE: 256
27 | VIDEO:
28 | BACKBONE:
29 | DROP_PATH: 0.6
30 | HEAD:
31 | NUM_CLASSES: 400
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaconvnextv2_base_k400_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 2.5e-4
38 | ADJUST_LR: false
39 | LR_POLICY: cosine
40 | MAX_EPOCH: 100
41 | MOMENTUM: 0.9
42 | WEIGHT_DECAY: 0.02
43 | WARMUP_EPOCHS: 8
44 | WARMUP_START_LR: 1e-6
45 | OPTIM_METHOD: adamw
46 | DAMPENING: 0.0
47 | NESTEROV: true
48 | HEAD_LRMULT: 10
49 | NEW_PARAMS: ["dwconv_rf", "norm_avgpool"]
50 | NEW_PARAMS_MULT: 10
51 | AUGMENTATION:
52 | COLOR_AUG: true
53 | GRAYSCALE: 0.2
54 | COLOR_P: 0.0
55 | CONSISTENT: true
56 | SHUFFLE: true
57 | GRAY_FIRST: false
58 | IS_SPLIT: false
59 | USE_GPU: false
60 | SSV2_FLIP: true
61 | RATIO: [0.75, 1.333]
62 | MIXUP:
63 | ENABLE: false
64 | CUTMIX:
65 | ENABLE: false
66 | RANDOM_ERASING:
67 | ENABLE: false
68 | LABEL_SMOOTHING: 0.0
69 | AUTOAUGMENT:
70 | ENABLE: true
71 | BEFORE_CROP: true
72 | TYPE: rand-m9-n4-mstd0.5-inc1
73 | NUM_GPUS: 8
74 | DATA_LOADER:
75 | NUM_WORKERS: 12
76 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaconvnextv2/tadaconvnextv2_base_ssv2_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaconvnextv2_base.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: ssv2
9 | BATCH_SIZE: 64 #total batch size: 64x4=256
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: in1k # by default, the initialization is from the Kinetics-400 pretrained model
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: ssv2
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ssv2/videos_mp4/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/ssv2/labels/
21 | SAMPLING_MODE: segment_based
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 256
26 | TEST_CROP_SIZE: 256
27 | VIDEO:
28 | BACKBONE:
29 | DROP_PATH: 0.6
30 | HEAD:
31 | NUM_CLASSES: 174
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaconvnextv2_base_ssv2_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 2.5e-4
38 | ADJUST_LR: false
39 | LR_POLICY: cosine
40 | MAX_EPOCH: 64
41 | MOMENTUM: 0.9
42 | WEIGHT_DECAY: 0.02
43 | WARMUP_EPOCHS: 2.5
44 | WARMUP_START_LR: 1e-6
45 | OPTIM_METHOD: adamw
46 | DAMPENING: 0.0
47 | NESTEROV: true
48 | HEAD_LRMULT: 10
49 | NEW_PARAMS: ["dwconv_rf", "norm_avgpool"]
50 | NEW_PARAMS_MULT: 10
51 | AUGMENTATION:
52 | COLOR_AUG: true
53 | GRAYSCALE: 0.2
54 | COLOR_P: 0.0
55 | CONSISTENT: true
56 | SHUFFLE: true
57 | GRAY_FIRST: false
58 | IS_SPLIT: false
59 | USE_GPU: false
60 | SSV2_FLIP: true
61 | RATIO: [0.75, 1.333]
62 | MIXUP:
63 | ENABLE: false
64 | CUTMIX:
65 | ENABLE: false
66 | RANDOM_ERASING:
67 | ENABLE: false
68 | LABEL_SMOOTHING: 0.0
69 | AUTOAUGMENT:
70 | ENABLE: true
71 | BEFORE_CROP: true
72 | TYPE: rand-m9-n4-mstd0.5-inc1
73 | NUM_GPUS: 8
74 | DATA_LOADER:
75 | NUM_WORKERS: 12
76 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaconvnextv2/tadaconvnextv2_small_k400_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaconvnextv2_small.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: kinetics400
9 | BATCH_SIZE: 64 #total batch size: 64x4=256
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: in1k
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: kinetics400
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/kinetics400/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/kinetics400/
21 | SAMPLING_RATE: 5
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 256
26 | TEST_CROP_SIZE: 256
27 | VIDEO:
28 | BACKBONE:
29 | DROP_PATH: 0.4
30 | HEAD:
31 | NUM_CLASSES: 400
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaconvnextv2_small_k400_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 2.5e-4
38 | ADJUST_LR: false
39 | LR_POLICY: cosine
40 | MAX_EPOCH: 100
41 | MOMENTUM: 0.9
42 | WEIGHT_DECAY: 0.02
43 | WARMUP_EPOCHS: 8
44 | WARMUP_START_LR: 1e-6
45 | OPTIM_METHOD: adamw
46 | DAMPENING: 0.0
47 | NESTEROV: true
48 | HEAD_LRMULT: 10
49 | NEW_PARAMS: ["dwconv_rf", "norm_avgpool"]
50 | NEW_PARAMS_MULT: 10
51 | AUGMENTATION:
52 | COLOR_AUG: true
53 | GRAYSCALE: 0.2
54 | COLOR_P: 0.0
55 | CONSISTENT: true
56 | SHUFFLE: true
57 | GRAY_FIRST: false
58 | IS_SPLIT: false
59 | USE_GPU: false
60 | SSV2_FLIP: true
61 | RATIO: [0.75, 1.333]
62 | MIXUP:
63 | ENABLE: false
64 | CUTMIX:
65 | ENABLE: false
66 | RANDOM_ERASING:
67 | ENABLE: false
68 | LABEL_SMOOTHING: 0.0
69 | AUTOAUGMENT:
70 | ENABLE: true
71 | BEFORE_CROP: true
72 | TYPE: rand-m9-n4-mstd0.5-inc1
73 | NUM_GPUS: 8
74 | DATA_LOADER:
75 | NUM_WORKERS: 12
76 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaconvnextv2/tadaconvnextv2_small_ssv2_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaconvnextv2_small.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: ssv2
9 | BATCH_SIZE: 64 #total batch size: 64x4=256
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: in1k # by default, the initialization is from the Kinetics-400 pretrained model
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: ssv2
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ssv2/videos_mp4/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/ssv2/labels/
21 | SAMPLING_MODE: segment_based
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 256
26 | TEST_CROP_SIZE: 256
27 | VIDEO:
28 | BACKBONE:
29 | DROP_PATH: 0.5
30 | HEAD:
31 | NUM_CLASSES: 174
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaconvnextv2_small_ssv2_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 2.5e-4
38 | ADJUST_LR: false
39 | LR_POLICY: cosine
40 | MAX_EPOCH: 64
41 | MOMENTUM: 0.9
42 | WEIGHT_DECAY: 0.02
43 | WARMUP_EPOCHS: 2.5
44 | WARMUP_START_LR: 1e-6
45 | OPTIM_METHOD: adamw
46 | DAMPENING: 0.0
47 | NESTEROV: true
48 | HEAD_LRMULT: 10
49 | NEW_PARAMS: ["dwconv_rf", "norm_avgpool"]
50 | NEW_PARAMS_MULT: 10
51 | AUGMENTATION:
52 | COLOR_AUG: true
53 | GRAYSCALE: 0.2
54 | COLOR_P: 0.0
55 | CONSISTENT: true
56 | SHUFFLE: true
57 | GRAY_FIRST: false
58 | IS_SPLIT: false
59 | USE_GPU: false
60 | SSV2_FLIP: true
61 | RATIO: [0.75, 1.333]
62 | MIXUP:
63 | ENABLE: false
64 | CUTMIX:
65 | ENABLE: false
66 | RANDOM_ERASING:
67 | ENABLE: false
68 | LABEL_SMOOTHING: 0.0
69 | AUTOAUGMENT:
70 | ENABLE: true
71 | BEFORE_CROP: true
72 | TYPE: rand-m9-n4-mstd0.5-inc1
73 | NUM_GPUS: 8
74 | DATA_LOADER:
75 | NUM_WORKERS: 12
76 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaconvnextv2/tadaconvnextv2_tiny_k400_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaconvnextv2_tiny.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: kinetics400
9 | BATCH_SIZE: 128 #total batch size: 128x4=512
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: in1k
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: kinetics400
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/kinetics400/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/kinetics400/
21 | SAMPLING_RATE: 5
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 256
26 | TEST_CROP_SIZE: 256
27 | VIDEO:
28 | BACKBONE:
29 | DROP_PATH: 0.2
30 | HEAD:
31 | NUM_CLASSES: 400
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaconvnextv2_tiny_k400_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 5e-4
38 | ADJUST_LR: false
39 | LR_POLICY: cosine
40 | MAX_EPOCH: 100
41 | MOMENTUM: 0.9
42 | WEIGHT_DECAY: 0.02
43 | WARMUP_EPOCHS: 8
44 | WARMUP_START_LR: 1e-6
45 | OPTIM_METHOD: adamw
46 | DAMPENING: 0.0
47 | NESTEROV: true
48 | HEAD_LRMULT: 10
49 | NEW_PARAMS: ["dwconv_rf", "norm_avgpool"]
50 | NEW_PARAMS_MULT: 10
51 | AUGMENTATION:
52 | COLOR_AUG: true
53 | GRAYSCALE: 0.2
54 | COLOR_P: 0.0
55 | CONSISTENT: true
56 | SHUFFLE: true
57 | GRAY_FIRST: false
58 | IS_SPLIT: false
59 | USE_GPU: false
60 | SSV2_FLIP: true
61 | RATIO: [0.75, 1.333]
62 | MIXUP:
63 | ENABLE: false
64 | CUTMIX:
65 | ENABLE: false
66 | RANDOM_ERASING:
67 | ENABLE: false
68 | LABEL_SMOOTHING: 0.0
69 | AUTOAUGMENT:
70 | ENABLE: true
71 | BEFORE_CROP: true
72 | TYPE: rand-m9-n4-mstd0.5-inc1
73 | NUM_GPUS: 8
74 | DATA_LOADER:
75 | NUM_WORKERS: 12
76 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaconvnextv2/tadaconvnextv2_tiny_ssv2_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaconvnextv2_tiny.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: ssv2
9 | BATCH_SIZE: 128 #total batch size: 128x4=512
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: in1k # by default, the initialization is from the Kinetics-400 pretrained model
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: ssv2
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ssv2/videos_mp4/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/ssv2/labels/
21 | SAMPLING_MODE: segment_based
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 256
26 | TEST_CROP_SIZE: 256
27 | VIDEO:
28 | BACKBONE:
29 | DROP_PATH: 0.3
30 | HEAD:
31 | NUM_CLASSES: 174
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaconvnextv2_tiny_ssv2_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 5e-4
38 | ADJUST_LR: false
39 | LR_POLICY: cosine
40 | MAX_EPOCH: 64
41 | MOMENTUM: 0.9
42 | WEIGHT_DECAY: 0.02
43 | WARMUP_EPOCHS: 2.5
44 | WARMUP_START_LR: 1e-6
45 | OPTIM_METHOD: adamw
46 | DAMPENING: 0.0
47 | NESTEROV: true
48 | HEAD_LRMULT: 10
49 | NEW_PARAMS: ["dwconv_rf", "norm_avgpool"]
50 | NEW_PARAMS_MULT: 10
51 | AUGMENTATION:
52 | COLOR_AUG: true
53 | GRAYSCALE: 0.2
54 | COLOR_P: 0.0
55 | CONSISTENT: true
56 | SHUFFLE: true
57 | GRAY_FIRST: false
58 | IS_SPLIT: false
59 | USE_GPU: false
60 | SSV2_FLIP: true
61 | RATIO: [0.75, 1.333]
62 | MIXUP:
63 | ENABLE: false
64 | CUTMIX:
65 | ENABLE: false
66 | RANDOM_ERASING:
67 | ENABLE: false
68 | LABEL_SMOOTHING: 0.0
69 | AUTOAUGMENT:
70 | ENABLE: true
71 | BEFORE_CROP: true
72 | TYPE: rand-m9-n4-mstd0.5-inc1
73 | NUM_GPUS: 8
74 | DATA_LOADER:
75 | NUM_WORKERS: 12
76 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaformer/tadaformer_b16_k400_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaformer_b16.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: kinetics400
9 | BATCH_SIZE: 256
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: clip
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: kinetics400
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/kinetics400/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/kinetics400/
21 | SAMPLING_MODE: segment_based
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 224
26 | TEST_CROP_SIZE: 224
27 | MEAN: [0.48145466, 0.4578275, 0.40821073]
28 | STD: [0.26862954, 0.26130258, 0.27577711]
29 | VIDEO:
30 | HEAD:
31 | NUM_CLASSES: 400
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaformer_b16_k400_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 5e-5
38 | ADJUST_LR: false
39 | LR_POLICY: cosine_v2
40 | COSINE_END_LR: 1e-6
41 | COSINE_AFTER_WARMUP: true
42 | MAX_EPOCH: 30
43 | MOMENTUM: 0.9
44 | WEIGHT_DECAY: 0.05
45 | WARMUP_EPOCHS: 5
46 | WARMUP_START_LR: 1e-6
47 | OPTIM_METHOD: adamw
48 | DAMPENING: 0.0
49 | NESTEROV: true
50 | HEAD_LRMULT: 10
51 | NEW_PARAMS: ["tada"]
52 | NEW_PARAMS_MULT: 10
53 | LAYER_WISE_LR_DECAY: 0.7
54 | AUGMENTATION:
55 | COLOR_AUG: true
56 | GRAYSCALE: 0.2
57 | COLOR_P: 0.0
58 | CONSISTENT: true
59 | SHUFFLE: true
60 | GRAY_FIRST: false
61 | IS_SPLIT: false
62 | USE_GPU: false
63 | SSV2_FLIP: true
64 | RATIO: [0.75, 1.333]
65 | MIXUP:
66 | ENABLE: false
67 | CUTMIX:
68 | ENABLE: false
69 | RANDOM_ERASING:
70 | ENABLE: false
71 | LABEL_SMOOTHING: 0.1
72 | AUTOAUGMENT:
73 | ENABLE: true
74 | BEFORE_CROP: true
75 | TYPE: rand-m9-n4-mstd0.5-inc1
76 | NUM_GPUS: 8
77 | DATA_LOADER:
78 | NUM_WORKERS: 12
79 | PIN_MEMORY: true
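`LAYER_WISE_LR_DECAY: 0.7` above scales each transformer block's learning rate so that later blocks train with a larger rate than earlier ones, in the style of BEiT-like fine-tuning. A hedged sketch of the multipliers (how the repo groups embeddings and the head into this scheme is an assumption and may differ):

```python
def layerwise_lr_mults(num_layers: int, decay: float) -> list:
    """LR multiplier per transformer block, 0-indexed from the input side.
    Block i gets decay ** (num_layers - 1 - i): the last block gets 1.0."""
    return [decay ** (num_layers - 1 - i) for i in range(num_layers)]

# ViT-B/16 (tadaformer_b16) has 12 blocks; decay 0.7 as in the config above
mults = layerwise_lr_mults(12, 0.7)
```

The per-block LR is then `BASE_LR * mult`, on top of which `HEAD_LRMULT` and `NEW_PARAMS_MULT` further scale the head and the newly introduced `tada` parameters.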
--------------------------------------------------------------------------------
/configs/projects/tadaformer/tadaformer_b16_ssv2_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaformer_b16.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: ssv2
9 | BATCH_SIZE: 256
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: clip
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: ssv2
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ssv2/videos_mp4/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/ssv2/labels/
21 | SAMPLING_MODE: segment_based
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 224
26 | TEST_CROP_SIZE: 224
27 | MEAN: [0.48145466, 0.4578275, 0.40821073]
28 | STD: [0.26862954, 0.26130258, 0.27577711]
29 | VIDEO:
30 | BACKBONE:
31 | TEMP_ENHANCE: true
32 | DOUBLE_TADA: true
33 | HEAD:
34 | NUM_CLASSES: 174
35 | DROPOUT_RATE: 0.5
36 |
37 | OUTPUT_DIR: output/tadaformer_b16_ssv2_16f
38 |
39 | OPTIMIZER:
40 | BASE_LR: 5e-4
41 | ADJUST_LR: false
42 | LR_POLICY: cosine_v2
43 | COSINE_END_LR: 1e-6
44 | COSINE_AFTER_WARMUP: true
45 | MAX_EPOCH: 24
46 | MOMENTUM: 0.9
47 | WEIGHT_DECAY: 0.05
48 | WARMUP_EPOCHS: 4
49 | WARMUP_START_LR: 1e-8
50 | OPTIM_METHOD: adamw
51 | DAMPENING: 0.0
52 | NESTEROV: true
53 | HEAD_LRMULT: 10
54 | NEW_PARAMS: ["tada"]
55 | NEW_PARAMS_MULT: 10
56 | LAYER_WISE_LR_DECAY: 0.7
57 | AUGMENTATION:
58 | COLOR_AUG: true
59 | GRAYSCALE: 0.2
60 | COLOR_P: 0.0
61 | CONSISTENT: true
62 | SHUFFLE: true
63 | GRAY_FIRST: false
64 | IS_SPLIT: false
65 | USE_GPU: false
66 | SSV2_FLIP: true
67 | RATIO: [0.75, 1.333]
68 | MIXUP:
69 | ENABLE: false
70 | CUTMIX:
71 | ENABLE: false
72 | RANDOM_ERASING:
73 | ENABLE: false
74 | LABEL_SMOOTHING: 0.1
75 | AUTOAUGMENT:
76 | ENABLE: true
77 | BEFORE_CROP: true
78 | TYPE: rand-m9-n4-mstd0.5-inc1
79 | NUM_GPUS: 8
80 | DATA_LOADER:
81 | NUM_WORKERS: 12
82 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaformer/tadaformer_l14_k400_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaformer_l14.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: kinetics400
9 | BATCH_SIZE: 64
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: clip
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: kinetics400
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/kinetics400/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/kinetics400/
21 | SAMPLING_MODE: segment_based
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 224
26 | TEST_CROP_SIZE: 224
27 | MEAN: [0.48145466, 0.4578275, 0.40821073]
28 | STD: [0.26862954, 0.26130258, 0.27577711]
29 | VIDEO:
30 | HEAD:
31 | NUM_CLASSES: 400
32 | DROPOUT_RATE: 0.5
33 |
34 | OUTPUT_DIR: output/tadaformer_l14_k400_16f
35 |
36 | OPTIMIZER:
37 | BASE_LR: 2e-5
38 | ADJUST_LR: false
39 | LR_POLICY: cosine_v2
40 | COSINE_END_LR: 1e-6
41 | COSINE_AFTER_WARMUP: true
42 | MAX_EPOCH: 24
43 | MOMENTUM: 0.9
44 | WEIGHT_DECAY: 0.05
45 | WARMUP_EPOCHS: 5
46 | WARMUP_START_LR: 1e-6
47 | OPTIM_METHOD: adamw
48 | DAMPENING: 0.0
49 | NESTEROV: true
50 | HEAD_LRMULT: 10
51 | NEW_PARAMS: ["tada"]
52 | NEW_PARAMS_MULT: 10
53 | LAYER_WISE_LR_DECAY: 0.85
54 | AUGMENTATION:
55 | COLOR_AUG: true
56 | GRAYSCALE: 0.2
57 | COLOR_P: 0.0
58 | CONSISTENT: true
59 | SHUFFLE: true
60 | GRAY_FIRST: false
61 | IS_SPLIT: false
62 | USE_GPU: false
63 | SSV2_FLIP: true
64 | RATIO: [0.75, 1.333]
65 | MIXUP:
66 | ENABLE: false
67 | CUTMIX:
68 | ENABLE: false
69 | RANDOM_ERASING:
70 | ENABLE: false
71 | LABEL_SMOOTHING: 0.1
72 | AUTOAUGMENT:
73 | ENABLE: true
74 | BEFORE_CROP: true
75 | TYPE: rand-m9-n4-mstd0.5-inc1
76 | NUM_GPUS: 16
77 | DATA_LOADER:
78 | NUM_WORKERS: 12
79 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/configs/projects/tadaformer/tadaformer_l14_ssv2_16f.yaml:
--------------------------------------------------------------------------------
1 | _BASE_RUN: ../../pool/run/training/from_scratch_large.yaml
2 | _BASE_MODEL: ../../pool/backbone/tadaformer_l14.yaml
3 |
4 | PRETRAIN:
5 | ENABLE: false
6 | TRAIN:
7 | ENABLE: true
8 | DATASET: ssv2
9 | BATCH_SIZE: 128
10 | FINE_TUNE: true
11 | LR_REDUCE: true
12 | INIT: clip
13 | CHECKPOINT_FILE_PATH: ""
14 | TEST:
15 | ENABLE: true
16 | DATASET: ssv2
17 | BATCH_SIZE: 256
18 | DATA:
19 | DATA_ROOT_DIR: /mnt/ziyuan/ziyuan/ssv2/videos_mp4/
20 | ANNO_DIR: /mnt/ziyuan/ziyuan/ssv2/labels/
21 | SAMPLING_MODE: segment_based
22 | NUM_INPUT_FRAMES: 16
23 | TRAIN_JITTER_SCALES: [0.08, 1.0]
24 | TRAIN_CROP_SIZE: 224
25 | TEST_SCALE: 224
26 | TEST_CROP_SIZE: 224
27 | MEAN: [0.48145466, 0.4578275, 0.40821073]
28 | STD: [0.26862954, 0.26130258, 0.27577711]
29 | VIDEO:
30 | BACKBONE:
31 | DROP_PATH: 0.2
32 | TEMP_ENHANCE: true
33 | DOUBLE_TADA: true
34 | HEAD:
35 | NUM_CLASSES: 174
36 | DROPOUT_RATE: 0.5
37 |
38 | OUTPUT_DIR: output/tadaformer_l14_ssv2_16f
39 |
40 | OPTIMIZER:
41 | BASE_LR: 2.5e-4
42 | ADJUST_LR: false
43 | LR_POLICY: cosine_v2
44 | COSINE_END_LR: 1e-6
45 | COSINE_AFTER_WARMUP: true
46 | MAX_EPOCH: 24
47 | MOMENTUM: 0.9
48 | WEIGHT_DECAY: 0.05
49 | WARMUP_EPOCHS: 4
50 | WARMUP_START_LR: 1e-8
51 | OPTIM_METHOD: adamw
52 | DAMPENING: 0.0
53 | NESTEROV: true
54 | HEAD_LRMULT: 10
55 | NEW_PARAMS: ["tada"]
56 | NEW_PARAMS_MULT: 10
57 | LAYER_WISE_LR_DECAY: 0.85
58 | AUGMENTATION:
59 | COLOR_AUG: true
60 | GRAYSCALE: 0.2
61 | COLOR_P: 0.0
62 | CONSISTENT: true
63 | SHUFFLE: true
64 | GRAY_FIRST: false
65 | IS_SPLIT: false
66 | USE_GPU: false
67 | SSV2_FLIP: true
68 | RATIO: [0.75, 1.333]
69 | MIXUP:
70 | ENABLE: false
71 | CUTMIX:
72 | ENABLE: false
73 | RANDOM_ERASING:
74 | ENABLE: false
75 | LABEL_SMOOTHING: 0.1
76 | AUTOAUGMENT:
77 | ENABLE: true
78 | BEFORE_CROP: true
79 | TYPE: rand-m9-n4-mstd0.5-inc1
80 | NUM_GPUS: 8
81 | DATA_LOADER:
82 | NUM_WORKERS: 12
83 | PIN_MEMORY: true
--------------------------------------------------------------------------------
/projects/epic-kitchen-ar/README.md:
--------------------------------------------------------------------------------
1 | # Towards training stronger video vision transformers for epic-kitchens-100 action recognition (CVPR 2021 Workshop)
2 | [Ziyuan Huang](https://huang-ziyuan.github.io/), [Zhiwu Qing](https://scholar.google.com/citations?user=q9refl4AAAAJ&hl=zh-CN), Xiang Wang, Yutong Feng, [Shiwei Zhang](https://scholar.google.com/citations?user=ZO3OQ-8AAAAJ&hl=zh-CN&authuser=1), Jianwen Jiang, Zhurong Xia, Mingqian Tang, Nong Sang, and [Marcelo Ang](https://www.eng.nus.edu.sg/me/staff/ang-jr-marcelo-h/).
3 | In arXiv, 2021. [[Paper]](https://arxiv.org/pdf/2106.05058).
4 |
5 | # Running instructions
6 | Action recognition on Epic-Kitchens-100 shares the same pipeline as classification. Refer to `configs/projects/epic-kitchen-ar/vivit_fac_enc_ek100.yaml` for more details. We also include some trained weights in the [MODEL ZOO](MODEL_ZOO.md).
7 |
8 | For an example run, set the `DATA_ROOT_DIR`, `ANNO_DIR` and `NUM_GPUS` in `configs/projects/epic-kitchen-ar/vivit_fac_enc_ek100.yaml`, and run the command
9 |
10 | ```
11 | python runs/run.py --cfg configs/projects/epic-kitchen-ar/ek100/vivit_fac_enc.yaml
12 | ```
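The fields to set look like the fragment below. The paths are placeholders (substitute your own dataset location and GPU count); the key names match the config files in this repository:

```yaml
# Placeholder values; replace with your own paths and GPU count.
DATA:
  DATA_ROOT_DIR: /path/to/ek100/videos/
  ANNO_DIR: /path/to/ek100/annotations/
NUM_GPUS: 8
```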
13 |
14 | # Citing this report
15 | If you find the training setting useful, please consider citing the paper as follows:
16 | ```BibTeX
17 | @article{huang2021towards,
18 | title={Towards training stronger video vision transformers for epic-kitchens-100 action recognition},
19 | author={Huang, Ziyuan and Qing, Zhiwu and Wang, Xiang and Feng, Yutong and Zhang, Shiwei and Jiang, Jianwen and Xia, Zhurong and Tang, Mingqian and Sang, Nong and Ang Jr, Marcelo H},
20 | journal={arXiv preprint arXiv:2106.05058},
21 | year={2021}
22 | }
23 | ```
--------------------------------------------------------------------------------
/projects/epic-kitchen-tal/README.md:
--------------------------------------------------------------------------------
1 |
2 | # A Stronger Baseline for Ego-Centric Action Detection (CVPR 2021 Workshop)
3 |
4 |
5 | # Running instructions
6 | To train the action localization model, set the `_BASE_RUN` to point to `configs/pool/run/training/localization.yaml`. See `configs/projects/epic-kitchen-tal/bmn_epic.yaml` for more details. Alternatively, you can also find some pre-trained models in `MODEL_ZOO.md`.
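The `_BASE_RUN` and `_BASE_MODEL` keys mean a project config is merged over the pool configs, with the project's values winning on conflicts. A minimal sketch of that override semantics, assuming a recursive dictionary merge (the actual loader in this repository may differ):

```python
# Sketch of hierarchical config override: project values win over base values.
# This illustrates the _BASE_RUN / _BASE_MODEL mechanism; it is not the
# repository's actual loader.

def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; `override` wins on conflicts."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # descend into nested sections
        else:
            out[key] = value  # leaf or new key: project value replaces base value
    return out

# Hypothetical values mirroring the configs in this repository.
base = {"OPTIMIZER": {"BASE_LR": 1e-3, "MAX_EPOCH": 100}, "NUM_GPUS": 8}
project = {"OPTIMIZER": {"BASE_LR": 2e-5}, "NUM_GPUS": 16}
merged = merge(base, project)
# merged keeps MAX_EPOCH from the base but takes BASE_LR and NUM_GPUS
# from the project config.
```

Under this scheme, values in `configs/projects/epic-kitchen-tal/bmn_epic.yaml` would override anything inherited from `localization.yaml`.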
7 |
8 | For detailed explanations on the approach itself, please refer to the [paper](https://arxiv.org/pdf/2106.06942).
9 |
10 | To prepare the dataset, please download the [features](), [classification results]() and [dataset annotations]().
11 |
12 |
13 | For an example run, set the `DATA_ROOT_DIR`, `ANNO_DIR`, `CLASSIFIER_ROOT_DIR` and `NUM_GPUS` in `configs/projects/epic-kitchen-tal/bmn_epic.yaml`, and run the command
14 |
15 | ```
16 | python runs/run.py --cfg configs/projects/epic-kitchen-tal/bmn-epic/vivit-os-local.yaml
17 | ```
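The fields to set look like the fragment below. All values are placeholders, and the exact nesting of `CLASSIFIER_ROOT_DIR` follows `configs/projects/epic-kitchen-tal/bmn_epic.yaml` (it is shown top-level here only for illustration):

```yaml
# Placeholder values; replace with your own paths and GPU count.
DATA:
  DATA_ROOT_DIR: /path/to/ek100/features/
  ANNO_DIR: /path/to/ek100/annotations/
CLASSIFIER_ROOT_DIR: /path/to/classification/results/
NUM_GPUS: 8
```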
18 |
19 |
20 | # Citing this report
21 | If you find this report useful for your research, please consider citing the paper as follows:
22 | ```BibTeX
23 | @article{qing2021stronger,
24 | title={A Stronger Baseline for Ego-Centric Action Detection},
25 | author={Qing, Zhiwu and Huang, Ziyuan and Wang, Xiang and Feng, Yutong and Zhang, Shiwei and Jiang, Jianwen and Tang, Mingqian and Gao, Changxin and Ang Jr, Marcelo H and Sang, Nong},
26 | journal={arXiv preprint arXiv:2106.06942},
27 | year={2021}
28 | }
29 | ```
30 |
--------------------------------------------------------------------------------
/projects/mosi/MoSI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alibaba-mmai-research/TAdaConv/75b7839b37fc94d98d4fe5f2aff4b3df4e347dfb/projects/mosi/MoSI.png
--------------------------------------------------------------------------------
/projects/mosi/README.md:
--------------------------------------------------------------------------------
1 | # Self-supervised Motion Learning from Static Images (CVPR 2021)
2 | [Ziyuan Huang](https://huang-ziyuan.github.io/), [Shiwei Zhang](https://scholar.google.com/citations?user=ZO3OQ-8AAAAJ&hl=zh-CN&authuser=1), Jianwen Jiang, Mingqian Tang,
3 | [Rong Jin](https://www.cse.msu.edu/~rongjin/), [Marcelo Ang](https://www.eng.nus.edu.sg/me/staff/ang-jr-marcelo-h/).
4 | In CVPR, 2021.
5 |
6 | [[Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Huang_Self-Supervised_Motion_Learning_From_Static_Images_CVPR_2021_paper.pdf)]
7 |
8 | # Running instructions
9 | To train the model with MoSI, set the `_BASE_RUN` to point to `configs/pool/run/training/mosi.yaml`. See `configs/projects/mosi/mosi_*.yaml` for more details. Alternatively, you can also find some pre-trained models in `MODEL_ZOO.md`.
10 |
11 | For detailed explanations on the approach itself, please refer to the [paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Huang_Self-Supervised_Motion_Learning_From_Static_Images_CVPR_2021_paper.pdf).
12 |
13 | For an example run, set the `DATA_ROOT_DIR`, `ANNO_DIR` and `NUM_GPUS` in `configs/projects/mosi/mosi_r2d3ds_hmdb.yaml`, and run the command
14 |
15 | ```
16 | python runs/run.py --cfg configs/projects/mosi/pt-hmdb/r2d3ds.yaml
17 | ```
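A typical two-stage workflow (pre-train with MoSI, then fine-tune on the same dataset) would use the paired configs under `pt-hmdb` and `ft-hmdb`; both paths exist in this repository. Set the data paths in each yaml first, and note that the fine-tuning config will likely need to point at the checkpoint produced by pre-training:

```
python runs/run.py --cfg configs/projects/mosi/pt-hmdb/r2d3ds.yaml
python runs/run.py --cfg configs/projects/mosi/ft-hmdb/r2d3ds.yaml
```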
18 |
19 |
20 |
21 |
22 |