├── .gitignore
├── README.md
├── _config.yml
├── dataset
    ├── dev.json
    ├── test.json
    └── train.json
├── env.yaml
├── index.md
├── requirements.txt
├── video_converter.py
└── video_downloader.py


/.gitignore:
--------------------------------------------------------------------------------
1 | videos


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # MLSLT: Towards Multilingual Sign Language Translation
 2 | 
 3 | This repository contains the SP-10 dataset described in "MLSLT: Towards Multilingual Sign Language Translation".
 4 | The copyright of the data is owned by the non-profit association European Sign Language Centre. Please make sure you have obtained permission before any academic or commercial use.
 5 | 
 6 | ## Download videos
 7 | 
 8 | 1. Clone the repository.
 9 | ```
10 | git clone https://github.com/MLSLT/SP-10
11 | 
12 | ```
13 | 2. Install dependencies
14 | 
15 | If you use conda, you can create a virtual environment with the following commands:
16 | ```
17 | conda env create -f env.yaml
18 | conda activate sp-10
19 | ```
20 | Alternatively, you can use pip to install the required packages:
21 | ```
22 | pip install -r requirements.txt
23 | ```
24 | 3. Install [ffmpeg](https://ffmpeg.org/) to process the videos. On Ubuntu, you can install it with the following commands:
25 | ```
26 | sudo apt update
27 | sudo apt install ffmpeg
28 | ```
29 | 
30 | 4. Download videos.
31 | ```
32 | python video_downloader.py [--save_path ${SAVE_PATH}]
33 | ```
34 | By default, the videos are saved to ./videos. Use --save_path to choose a different location.
35 | 
36 | 5. Convert videos to a frame rate of 25 fps.
37 | If you customized the save path, make sure that --save_path in this step matches the one used in the previous step.
38 | ```
39 | python video_converter.py [--save_path ${SAVE_PATH}]
40 | ```
41 | ## Folder structure
42 | The storage structure of video data in folders is shown below.
43 | ```
44 | .
45 | ├── dev
46 | │   └── 635
47 | │       ├── bg.mp4
48 | │       ├── de.mp4
49 | │       ├── en.mp4
50 | │       ...
51 | ├── test
52 | │   └── 1296
53 | │       ├── bg.mp4
54 | │       ├── de.mp4
55 | │       ├── en.mp4
56 | │       ...
57 | └── train
58 |     ├── 410
59 |     │   ├── bg.mp4
60 |     │   ├── de.mp4
61 |     │   ├── en.mp4
62 |     │   ...
63 | 
64 | ```
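
The folder structure above can be walked programmatically once the videos are downloaded. The following is a minimal sketch, assuming the layout `<save_path>/<split>/<id>/<lang>.mp4` shown in the tree; the function name `index_videos` is hypothetical and not part of this repository.

```python
from collections import defaultdict
from pathlib import Path


def index_videos(save_path="./videos"):
    """Map each split to {sample_id: {language_code: video_path}}."""
    index = defaultdict(dict)
    root = Path(save_path)
    # Layout assumed from the tree above: <save_path>/<split>/<id>/<lang>.mp4
    for video in sorted(root.glob("*/*/*.mp4")):
        split, sample_id, lang = video.parts[-3], video.parts[-2], video.stem
        index[split].setdefault(sample_id, {})[lang] = video
    return dict(index)


if __name__ == "__main__":
    # Print a short summary per split (prints nothing if no videos are found).
    for split, samples in index_videos().items():
        n_videos = sum(len(langs) for langs in samples.values())
        print(f"{split}: {len(samples)} samples, {n_videos} videos")
```

Keying the index by split and sample id makes it easy to look up all available sign languages for one sample, for example `index_videos()["dev"]["635"]`.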


--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman


--------------------------------------------------------------------------------
/env.yaml:
--------------------------------------------------------------------------------
 1 | name: sp-10
 2 | channels:
 3 |   - defaults
 4 | dependencies:
 5 |   - _libgcc_mutex=0.1=main
 6 |   - _openmp_mutex=4.5=1_gnu
 7 |   - ca-certificates=2022.2.1=h06a4308_0
 8 |   - certifi=2021.10.8=py37h06a4308_2
 9 |   - ld_impl_linux-64=2.35.1=h7274673_9
10 |   - libffi=3.3=he6710b0_2
11 |   - libgcc-ng=9.3.0=h5101ec6_17
12 |   - libgomp=9.3.0=h5101ec6_17
13 |   - libstdcxx-ng=9.3.0=hd4cf53a_17
14 |   - ncurses=6.3=h7f8727e_2
15 |   - openssl=1.1.1m=h7f8727e_0
16 |   - pip=21.2.2=py37h06a4308_0
17 |   - python=3.7.11=h12debd9_0
18 |   - readline=8.1.2=h7f8727e_1
19 |   - setuptools=58.0.4=py37h06a4308_0
20 |   - sqlite=3.38.0=hc218d9a_0
21 |   - tk=8.6.11=h1ccaba5_0
22 |   - wheel=0.37.1=pyhd3eb1b0_0
23 |   - xz=5.2.5=h7b6447c_0
24 |   - zlib=1.2.11=h7f8727e_4
25 |   - pip:
26 |     - numpy==1.21.5
27 |     - opencv-python==4.5.5.64
28 | prefix: /home/yinaoxiong/.miniconda/envs/sp-10
29 | 


--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
 1 | ## Welcome to GitHub Pages
 2 | 
 3 | You can use the [editor on GitHub](https://github.com/MLSLT/SP-10/edit/gh-pages/index.md) to maintain and preview the content for your website in Markdown files.
 4 | 
 5 | Whenever you commit to this repository, GitHub Pages will run [Jekyll](https://jekyllrb.com/) to rebuild the pages in your site, from the content in your Markdown files.
 6 | 
 7 | ### Markdown
 8 | 
 9 | Markdown is a lightweight and easy-to-use syntax for styling your writing. It includes conventions for
10 | 
11 | ```markdown
12 | Syntax highlighted code block
13 | 
14 | # Header 1
15 | ## Header 2
16 | ### Header 3
17 | 
18 | - Bulleted
19 | - List
20 | 
21 | 1. Numbered
22 | 2. List
23 | 
24 | **Bold** and _Italic_ and `Code` text
25 | 
26 | [Link](url) and ![Image](src)
27 | ```
28 | 
29 | For more details see [Basic writing and formatting syntax](https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax).
30 | 
31 | ### Jekyll Themes
32 | 
33 | Your Pages site will use the layout and styles from the Jekyll theme you have selected in your [repository settings](https://github.com/MLSLT/SP-10/settings/pages). The name of this theme is saved in the Jekyll `_config.yml` configuration file.
34 | 
35 | ### Support or Contact
36 | 
37 | Having trouble with Pages? Check out our [documentation](https://docs.github.com/categories/github-pages-basics/) or [contact support](https://support.github.com/contact) and we’ll help you sort it out.
38 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.21.5
2 | opencv-python==4.5.5.64
3 | 


--------------------------------------------------------------------------------
/video_converter.py:
--------------------------------------------------------------------------------
 1 | import logging
 2 | import argparse
 3 | from pathlib import Path
 4 | import subprocess
 5 | import cv2
 6 | import shutil
 7 | 
 8 | logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
 9 | 
10 | 
11 | def main(args):
12 |     root_path = Path(args.save_path)
13 |     for video_path in root_path.rglob("*.mp4"):
14 |         # Read the frame rate reported by the container, then release the handle.
15 |         cap = cv2.VideoCapture(str(video_path))
16 |         frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
17 |         cap.release()
18 |         if frame_rate == 25:
19 |             continue
20 |         logging.info(f"Start converting {video_path}, its frame rate is {frame_rate}")
21 |         video_path = video_path.resolve()
22 |         # Pass the arguments as a list so that paths containing spaces or quotes are safe.
23 |         command = ["ffmpeg", "-y", "-i", str(video_path), "-r", "25", "/tmp/sign.mp4"]
24 |         # check=True stops on an ffmpeg failure instead of moving a stale temporary file.
25 |         subprocess.run(command, check=True)
26 |         shutil.move("/tmp/sign.mp4", str(video_path))
27 | 
28 | 
29 | if __name__ == "__main__":
30 |     parse = argparse.ArgumentParser()
31 |     parse.add_argument(
32 |         "--save_path", help="Path to save the video", default="./videos", type=str
33 |     )
34 |     args = parse.parse_args()
35 |     logging.info("Start converting SP-10 dataset")
36 |     main(args)
37 |     logging.info("Converting SP-10 dataset finished")
38 | 
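
The converter above truncates the reported frame rate to an integer before comparing it with 25. As a hedged alternative, the decision and the ffmpeg invocation can be factored into two small helpers that are testable without OpenCV or ffmpeg installed. This is only a sketch: the names `needs_conversion` and `build_ffmpeg_command`, the tolerance value, and the choice to skip files whose frame rate is reported as 0 (which OpenCV typically returns when a file cannot be opened) are assumptions, not part of this repository.

```python
import math


def needs_conversion(fps, target_fps=25.0, tolerance=0.01):
    """Return True when the reported frame rate differs from the target."""
    # Assumption: a rate of 0 or NaN means the file could not be read, so skip it.
    if not fps or math.isnan(fps):
        return False
    return abs(fps - target_fps) > tolerance


def build_ffmpeg_command(src, dst, target_fps=25):
    """Build the ffmpeg argument list that resamples src to target_fps."""
    # -y overwrites dst without prompting; -r sets the output frame rate.
    return ["ffmpeg", "-y", "-i", str(src), "-r", str(target_fps), str(dst)]
```

The returned list can be passed directly to `subprocess.run(..., check=True)`, which avoids shell quoting problems for paths that contain spaces or quotes.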


--------------------------------------------------------------------------------
/video_downloader.py:
--------------------------------------------------------------------------------
 1 | import urllib.request
 2 | import json
 3 | import logging
 4 | import argparse
 5 | from pathlib import Path
 6 | 
 7 | logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
 8 | 
 9 | 
10 | def main(args):
11 |     # user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36'
12 |     # headers = {'User-Agent': user_agent}
13 |     parts = ["train", "dev", "test"]
14 |     for part in parts:
15 |         logging.info(f"Downloading {part} set")
16 |         with open(f"dataset/{part}.json", "r") as f:
17 |             data = json.load(f)
18 |         for obj in data:
19 |             sample_id = obj["id"]
20 |             sign_list = obj["sign_list"]
21 |             for sign_obj in sign_list:
22 |                 video_url = sign_obj["video_url"]
23 |                 if video_url is None:
24 |                     continue
25 |                 file_path = Path(args.save_path) / part / str(sample_id) / f"{sign_obj['lang']}.mp4"
26 |                 if file_path.exists():
27 |                     logging.info(f"{file_path} already exists")
28 |                     continue
29 |                 file_path.parent.mkdir(parents=True, exist_ok=True)
30 |                 logging.info(f"Downloading {file_path}")
31 |                 logging.info(f"Requesting {video_url}")
32 |                 urllib.request.urlretrieve(video_url, file_path)
33 |         logging.info(f"Downloading {part} set finished")
34 | 
35 | 
36 | 
37 | 
38 | 
39 | if __name__ == "__main__":
40 |     parse = argparse.ArgumentParser()
41 |     parse.add_argument("--save_path", help="Path to save the video", default="./videos", type=str)
42 |     args = parse.parse_args()
43 |     logging.info("Start downloading SP-10 dataset")
44 |     main(args)
45 |     logging.info("Downloading SP-10 dataset finished")
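
The downloader above skips any file that already exists, and `urlretrieve` writes directly to the final path. If a download is interrupted, the partially written file would therefore be treated as complete on the next run. One possible way to guard against this, sketched under assumptions, is to download to a temporary name and rename it only after the transfer succeeds. The helper name `download_atomically` and the `.part` suffix are hypothetical and not part of this repository.

```python
import os
import urllib.request
from pathlib import Path


def download_atomically(url, file_path):
    """Download url to file_path; return False if file_path already exists."""
    file_path = Path(file_path)
    if file_path.exists():
        return False
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: write to "<name>.part" first.
    part_path = file_path.with_name(file_path.name + ".part")
    try:
        urllib.request.urlretrieve(url, part_path)
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(part_path, file_path)
    finally:
        # Remove a leftover partial file if the download failed midway.
        if part_path.exists():
            part_path.unlink()
    return True
```

With this approach, only fully downloaded videos ever appear under their final `<lang>.mp4` name, so the existence check remains a reliable signal that a video does not need to be fetched again.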


--------------------------------------------------------------------------------