├── .gitignore
├── README.md
├── _config.yml
├── dataset
    ├── dev.json
    ├── test.json
    └── train.json
├── env.yaml
├── index.md
├── requirements.txt
├── video_converter.py
└── video_downloader.py

/.gitignore:
--------------------------------------------------------------------------------
videos

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# MLSLT: Towards Multilingual Sign Language Translation

This repository contains the SP-10 dataset described in "MLSLT: Towards Multilingual Sign Language Translation".
The copyright of the data is owned by the non-profit association European Sign Language Centre. Please ensure you have obtained permission before making any academic or commercial use of it.

## Download videos

1. Clone the repository.
```
git clone https://github.com/MLSLT/SP-10
```
2. Install dependencies.

If you use conda, you can create a virtual environment with the following commands:
```
conda env create -f env.yaml
conda activate sp-10
```
Alternatively, you can use pip to install the required packages.
```
pip install -r requirements.txt
```
3. Install [ffmpeg](https://ffmpeg.org/) to process the videos. On Ubuntu, you can install it with the following commands.
```
sudo apt update
sudo apt install ffmpeg
```

4. Download videos.
```
python video_downloader.py [--save_path ${SAVE_PATH}]
```
The dataset is saved to ./videos by default; use --save_path to choose a different location.

5. Convert videos to 25 fps.
If you customized the save path, make sure that the save_path of this step is the same as in the previous step.
```
python video_converter.py [--save_path ${SAVE_PATH}]
```
## Folder structure
The downloaded videos are stored in the following folder structure.
```
.
├── dev
│   └── 635
│       ├── bg.mp4
│       ├── de.mp4
│       ├── en.mp4
│       ...
├── test
│   └── 1296
│       ├── bg.mp4
│       ├── de.mp4
│       ├── en.mp4
│       ...
└── train
    ├── 410
    │   ├── bg.mp4
    │   ├── de.mp4
    │   ├── en.mp4
    ...

```

--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
theme: jekyll-theme-cayman

--------------------------------------------------------------------------------
/env.yaml:
--------------------------------------------------------------------------------
name: sp-10
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=4.5=1_gnu
  - ca-certificates=2022.2.1=h06a4308_0
  - certifi=2021.10.8=py37h06a4308_2
  - ld_impl_linux-64=2.35.1=h7274673_9
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.3.0=h5101ec6_17
  - libgomp=9.3.0=h5101ec6_17
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - ncurses=6.3=h7f8727e_2
  - openssl=1.1.1m=h7f8727e_0
  - pip=21.2.2=py37h06a4308_0
  - python=3.7.11=h12debd9_0
  - readline=8.1.2=h7f8727e_1
  - setuptools=58.0.4=py37h06a4308_0
  - sqlite=3.38.0=hc218d9a_0
  - tk=8.6.11=h1ccaba5_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7f8727e_4
  - pip:
    - numpy==1.21.5
    - opencv-python==4.5.5.64
prefix: /home/yinaoxiong/.miniconda/envs/sp-10

--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
## Welcome to GitHub Pages

You can use the [editor on GitHub](https://github.com/MLSLT/SP-10/edit/gh-pages/index.md) to maintain and preview the content for your website in Markdown files.

Whenever you commit to this repository, GitHub Pages will run [Jekyll](https://jekyllrb.com/) to rebuild the pages in your site, from the content in your Markdown files.

### Markdown

Markdown is a lightweight and easy-to-use syntax for styling your writing. It includes conventions for

```markdown
Syntax highlighted code block

# Header 1
## Header 2
### Header 3

- Bulleted
- List

1. Numbered
2. List

**Bold** and _Italic_ and `Code` text

[Link](url) and ![Image](src)
```

For more details see [Basic writing and formatting syntax](https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax).

### Jekyll Themes

Your Pages site will use the layout and styles from the Jekyll theme you have selected in your [repository settings](https://github.com/MLSLT/SP-10/settings/pages). The name of this theme is saved in the Jekyll `_config.yml` configuration file.

### Support or Contact

Having trouble with Pages? Check out our [documentation](https://docs.github.com/categories/github-pages-basics/) or [contact support](https://support.github.com/contact) and we’ll help you sort it out.

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
numpy==1.21.5
opencv-python==4.5.5.64

--------------------------------------------------------------------------------
/video_converter.py:
--------------------------------------------------------------------------------
import argparse
import logging
import shutil
import subprocess
import tempfile
from pathlib import Path

import cv2

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

TARGET_FPS = 25


def main(args):
    root_path = Path(args.save_path)
    for video_path in root_path.rglob("*.mp4"):
        # Read the frame rate, then release the capture so the file handle is not leaked.
        cap = cv2.VideoCapture(str(video_path))
        frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
        cap.release()
        if frame_rate == TARGET_FPS:
            continue
        logging.info(f"Start converting {video_path}, its frame rate is {frame_rate}")
        video_path = video_path.resolve()
        # Re-encode into a private temporary directory rather than a fixed /tmp/sign.mp4,
        # and pass the arguments as a list so paths containing quotes or spaces are safe.
        with tempfile.TemporaryDirectory() as tmp_dir:
            tmp_path = Path(tmp_dir) / "sign.mp4"
            subprocess.run(
                ["ffmpeg", "-y", "-i", str(video_path), "-r", str(TARGET_FPS), str(tmp_path)],
                check=True,
            )
            # Overwrite the original only after ffmpeg has exited successfully.
            shutil.move(str(tmp_path), str(video_path))


if __name__ == "__main__":
    parse = argparse.ArgumentParser()
    parse.add_argument(
        "--save_path", help="Path to save the video", default="./videos", type=str
    )
    args = parse.parse_args()
    logging.info("Start converting SP-10 dataset")
    main(args)
    logging.info("Converting SP-10 dataset finished")

--------------------------------------------------------------------------------
/video_downloader.py:
--------------------------------------------------------------------------------
import urllib.request
import json
import logging
import argparse
from pathlib import Path

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')


def main(args):
    # user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36'
    # headers = {'User-Agent': user_agent}
    parts = ["train", "dev", "test"]
    for part in parts:
        logging.info(f"Downloading {part} set")
        with open(f"dataset/{part}.json", "r", encoding="utf-8") as f:
            data = json.load(f)
        for obj in data:
            # Named sample_id rather than id to avoid shadowing the built-in.
            sample_id = obj["id"]
            sign_list = obj["sign_list"]
            for sign_obj in sign_list:
                video_url = sign_obj["video_url"]
                # Entries without a video URL are skipped.
                if video_url is None:
                    continue
                file_path = Path(args.save_path) / part / str(sample_id) / f"{sign_obj['lang']}.mp4"
                # Skip files that are already on disk so the script can be re-run.
                if file_path.exists():
                    logging.info(f"{file_path} already exists")
                    continue
                file_path.parent.mkdir(parents=True, exist_ok=True)
                logging.info(f"Downloading {file_path}")
                logging.info(f"Requesting {video_url}")
                # Download to a .part file first so an interrupted transfer is not
                # mistaken for a complete video on the next run.
                tmp_path = file_path.with_suffix(".mp4.part")
                urllib.request.urlretrieve(video_url, tmp_path)
                tmp_path.replace(file_path)
        logging.info(f"Downloading {part} set finished")


if __name__ == "__main__":
    parse = argparse.ArgumentParser()
    parse.add_argument("--save_path", help="Path to save the video", default="./videos", type=str)
    args = parse.parse_args()
    logging.info("Start downloading SP-10 dataset")
    main(args)
    logging.info("Downloading SP-10 dataset finished")

--------------------------------------------------------------------------------
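
Usage sketch (not part of the repository): once the videos are downloaded, the `<save_path>/<split>/<id>/<lang>.mp4` layout written by video_downloader.py can be walked to see which languages are available for each sample. The helper name `index_videos` and its return shape are assumptions made for illustration; only the directory layout comes from the scripts above.

```python
from pathlib import Path
from typing import Dict


def index_videos(save_path: str) -> Dict[str, Dict[str, Dict[str, Path]]]:
    """Map split -> sample id -> language code -> video path.

    Hypothetical helper: assumes the <save_path>/<split>/<id>/<lang>.mp4
    layout produced by video_downloader.py. Missing splits are skipped.
    """
    index: Dict[str, Dict[str, Dict[str, Path]]] = {}
    for split in ("train", "dev", "test"):
        split_dir = Path(save_path) / split
        if not split_dir.is_dir():
            continue
        samples = index.setdefault(split, {})
        # Each sample directory holds one .mp4 per language.
        for video in sorted(split_dir.glob("*/*.mp4")):
            samples.setdefault(video.parent.name, {})[video.stem] = video
    return index
```

For example, `index_videos("./videos")["train"]["410"]` would give a dictionary keyed by language code (such as `en` or `de`) for sample 410, provided that sample has been downloaded.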