├── .gitignore
├── LICENSE
├── README.md
├── README.rst
├── assets
    ├── example1.png
    ├── example2.png
    ├── example3.png
    └── logo.png
├── requirements.txt
├── scripts
    ├── showdata
    └── showdata-bak
├── setup.py
└── showdata
    ├── __init__.py
    ├── server.py
    ├── showdata.py
    └── static
        └── template.html

/.gitignore:
--------------------------------------------------------------------------------
.vscode/*

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2018 The Python Packaging Authority

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
![logo](assets/logo.png)
# ShowData: Show your dataset in web browser!

ShowData is a Python tool for visualizing and managing multimedia files on a remote server.
It provides command-line tools and a fully customizable API for generating HTML pages from multimedia files.

## Examples
It supports filtering rows by text, sorting by column values, and pagination.

VeRiWild dataset
![example](assets/example1.png)
![example](assets/example2.png)

ReID Strong baseline Results
![example](assets/example3.png)

## Install

```
pip install -U git+https://github.com/silverbulletmdc/showdata
```

## Command Line Tools

### Basic usage
Start a file server (a more capable alternative to `python -m http.server`):
```
showdata server -p <port> -h <host>
```

Compare images that share a file name across different folders:
```
showdata compare -o <output.html> <folder1> <folder2> ...
```

String values ending in `.png`, `.jpg`, `.jpeg`, or `.gif` are rendered as images, and values ending in `.mp4` or `.webm` are rendered as videos. All other values are rendered as text.

## API
```python
from showdata import generate_html_table
data = [
    {
        "idx": 1,
        "label": 'cat',
        # A dict value renders an image; optional keys: text, style, alt, title, width, height
        "img": {
            "src": "images/cat.jpg",
            "text": "This text is shown above the image",
            "style": "border: 2mm solid green"
        },
        "mask": 'images/cat_mask.png',
    },
    {
        "idx": 2,
        "label": 'dog',
        "img": 'images/dog.jpg',
        "mask": 'images/dog_mask.png',
    },
]
generate_html_table(data, output_path='index.html')
```

--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
============================================
Show Data: Show your dataset in web browser!
============================================

Show Data generates HTML tables for large-scale image datasets, especially datasets stored on a remote server.
It provides command-line tools and a fully customizable API for generating HTML tables for different tasks.

--------------------------------------------------------------------------------
/assets/example1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silverbulletmdc/showdata/333d0b4aa90a69f5e5c98d6fc441fe46d59997c2/assets/example1.png

--------------------------------------------------------------------------------
/assets/example2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silverbulletmdc/showdata/333d0b4aa90a69f5e5c98d6fc441fe46d59997c2/assets/example2.png

--------------------------------------------------------------------------------
/assets/example3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silverbulletmdc/showdata/333d0b4aa90a69f5e5c98d6fc441fe46d59997c2/assets/example3.png

--------------------------------------------------------------------------------
/assets/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silverbulletmdc/showdata/333d0b4aa90a69f5e5c98d6fc441fe46d59997c2/assets/logo.png

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
click
pandas
tqdm
flask
flask-cors

--------------------------------------------------------------------------------
/scripts/showdata:
--------------------------------------------------------------------------------
#!/usr/bin/env python
from showdata import generate_html_table, load_dir
import showdata.server as showdata_server
import click
import os

@click.group()
def main():
    pass

@main.command()
@click.option('-i', '--input-path', required=True, type=click.Path(exists=True),
              help='Input path. It can be a folder, a pandas pkl/csv file, or a JSON file.')
@click.option('-o', '--output-path', default='./index.html', help='Output path. It must be in a parent directory of all images. Default is ./index.html.')
@click.option('-w', '--width', default='400', help='Width of all the images. Default is 400')
@click.option('-h', '--height', default='auto', help='Height of all the images. Default is auto')
@click.option('-l', '--level', default=1, type=click.IntRange(1, 2), help='Depth of the folder to load (1 or 2). Default is 1')
@click.option('-r', '--rel-path', default=True, type=bool, help='Whether to make image paths relative to the output file. Default is True')
@click.option('-f', '--float-precision', default=2, type=int, help='Float precision. Default is 2')
@click.option('-m', '--max-str-len', default=50, type=int, help='Max string length. Default is 50')
def show(input_path, output_path, width, height, level, rel_path, float_precision, max_str_len):
    data_table = []
    if os.path.isdir(input_path):
        data_table = load_dir(input_path, level)

    elif input_path.endswith('pkl'):
        import pandas as pd
        data_table = pd.read_pickle(input_path).to_dict('records')

    elif input_path.endswith('csv'):
        import pandas as pd
        data_table = pd.read_csv(input_path, index_col=0).to_dict('records')

    elif input_path.endswith('json'):
        import json
        with open(input_path, 'r') as f:
            data_table = json.load(f)

    else:
        # Fail early instead of passing an empty table to generate_html_table
        raise click.BadParameter('Expected a folder or a .pkl, .csv, or .json file.', param_hint='--input-path')

    generate_html_table(data_table,
                        image_width=width,
                        image_height=height,
                        output_path=output_path,
                        rel_path=rel_path,
                        float_precision=float_precision,
                        max_str_len=max_str_len)


@main.command()
@click.option('-o', '--output-path', default='index.html')
@click.option('-w', '--width', default=400)
@click.option('-e', '--exts', default='jpg,png,mp4,jpeg,gif', help='Comma-separated file extensions to compare.')
@click.argument('compare_folders', nargs=-1, required=True)
def compare(compare_folders, output_path, width, exts):
    exts = exts.split(',')
    table = []
    # Files are matched by name against the first folder
    for fname in sorted(os.listdir(compare_folders[0])):
        if fname.split('.')[-1] in exts:
            row = {}
            for folder in compare_folders:
                row[folder] = os.path.join(folder, fname)
            table.append(row)
    generate_html_table(table, image_width=width, output_path=output_path)


@main.command()
@click.option('-p', '--port', default='8000')
@click.option('-h', '--host', default='0.0.0.0')
@click.option('-d', '--debug', default=False, type=bool, help="Whether to run the Flask app in debug mode.")
@click.option('--allow-modify', default=False, type=bool, help="Whether to allow users to modify (upload or delete) your files.")
@click.option('--show-delete-button', default=False, type=bool, help="Whether to show the delete button.")
@click.option('--index-hide', default=True, type=bool, help="Whether to hide a folder's listing if the folder contains an index.html.")
@click.option('--password', default="1234", type=str, help="Password required to delete files.")
@click.option('--root-url', default="/", type=str, help="Root URL used for redirects.")
def server(port, host, debug, allow_modify, show_delete_button, index_hide, password, root_url):
    showdata_server.allow_modify = allow_modify
    showdata_server.show_delete_button = show_delete_button
    showdata_server.index_hide = index_hide
    showdata_server.password = password
    showdata_server.root_url = root_url
    showdata_server.app.run(host, port, debug=debug)


if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/scripts/showdata-bak:
--------------------------------------------------------------------------------
#!/usr/bin/env python
from showdata import generate_html_table, load_dir
import click
import os


@click.command()
@click.option('-i', '--input-path', required=True, type=click.Path(exists=True),
              help='Input path. It can be a folder, a pandas pkl/csv file, or a JSON file.')
@click.option('-o', '--output-path', default='./index.html', help='Output path. It must be in a parent directory of all images. Default is ./index.html.')
@click.option('-w', '--width', default='400', help='Width of all the images. Default is 400')
@click.option('-h', '--height', default='auto', help='Height of all the images. Default is auto')
@click.option('-l', '--level', default=1, type=click.IntRange(1, 2), help='Depth of the folder to load (1 or 2). Default is 1')
@click.option('-r', '--rel-path', default=True, type=bool, help='Whether to make image paths relative to the output file. Default is True')
@click.option('-f', '--float-precision', default=2, type=int, help='Float precision. Default is 2')
@click.option('-m', '--max-str-len', default=50, type=int, help='Max string length. Default is 50')
def cmd(input_path, output_path, width, height, level, rel_path, float_precision, max_str_len):
    data_table = []
    if os.path.isdir(input_path):
        data_table = load_dir(input_path, level)

    elif input_path.endswith('pkl'):
        import pandas as pd
        data_table = pd.read_pickle(input_path).to_dict('records')

    elif input_path.endswith('csv'):
        import pandas as pd
        data_table = pd.read_csv(input_path, index_col=0).to_dict('records')

    elif input_path.endswith('json'):
        import json
        with open(input_path, 'r') as f:
            data_table = json.load(f)

    else:
        raise click.BadParameter('Expected a folder or a .pkl, .csv, or .json file.', param_hint='--input-path')

    generate_html_table(data_table,
                        image_width=width,
                        image_height=height,
                        output_path=output_path,
                        rel_path=rel_path,
                        float_precision=float_precision,
                        max_str_len=max_str_len)


if __name__ == "__main__":
    cmd()

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup, find_packages
import glob
import os

this_directory = os.path.abspath(os.path.dirname(__file__))

def read_file(filename):
    with open(os.path.join(this_directory, filename), encoding='utf-8') as f:
        long_description = f.read()
    return long_description

setup(
    name="showdata",
    version="1.4.9",
    author="Dechao Meng",
    url="https://github.com/silverbulletmdc/showdata",
    author_email="dechao.meng@vipl.ict.ac.cn",
    description="Remote file system visualizer and manager.",
    long_description_content_type="text/markdown",
    # README.md matches the declared markdown content type; README.rst does not
    long_description=read_file("README.md"),
    packages=find_packages(exclude=('examples', 'examples.*')),
    scripts=glob.glob('scripts/*'),
    # tqdm is imported by showdata/showdata.py
    install_requires=['click', 'pandas', 'tqdm', 'flask', 'flask-cors'],
    package_data={
        'showdata': ['static/*']
    }
)

--------------------------------------------------------------------------------
/showdata/__init__.py:
--------------------------------------------------------------------------------
from .showdata import *

--------------------------------------------------------------------------------
/showdata/server.py:
--------------------------------------------------------------------------------
from flask import Flask, request, Response, redirect
import mimetypes
import os
from showdata import generate_html_table
from urllib.parse import quote
from flask_cors import CORS
import time


allow_modify = False
show_delete_button = False
show_upload_button = False
index_hide = True
password = "1234"
root_url = "/"
app = Flask(__name__)
CORS(app)


def get_head_div(full_path):
    github_div = ''
    path_div = get_path_div(full_path)
    return f"""

{path_div}

{github_div}

"""

def get_path_div(full_path):
    sub_path = os.path.relpath(full_path, os.getcwd())
    sub_paths = sub_path.split('/')

    head_div = '"
    return head_div

def all_images(files):
    """Return True if every file has an image extension (case-insensitive)."""
    exts = ('.jpg', '.jpeg', '.png')
    return all(os.path.splitext(file)[1].lower() in exts for file in files)


def grid_image(files, full_path, head_div):
    cols = 6
    table = []
    row = {"row\\col": 0}
    for i, img_path in enumerate(files):
        row[i % cols] = {'src': str(img_path), 'text': str(img_path)+' '}
        if len(row)-1 == cols:
            table.append(row)
            row = {"row\\col": i // cols+1}

    # Append the last, partially filled row; skip it if it holds only the row index
    if len(row) > 1:
        table.append(row)

    return generate_html_table(table,
                               image_width=256,
                               save=False,
                               rel_path=False,
                               title=full_path,
                               max_str_len=-1,
                               page_size=50,
                               head_div=head_div)


def parse_folder(full_path):
    files = sorted(os.listdir(full_path))
    # Filter out source-code, notebook, and Markdown files
    excluded_exts = (".py", ".cpp", ".ipynb", ".c", ".md")
    files = [file for file in files if not file.endswith(excluded_exts)]
    # Filter out hidden files and folders
    files = [file for file in files if not file.startswith('.')]
    head_div = get_head_div(full_path)

    if len(files) > 10000:
        head_div += f"Total {len(files)} files; only the first 10000 are shown for speed."
        files = files[:10000]

    if len(files) > 20 and all_images(files):
        return grid_image(files, full_path, head_div)

    table = []

    row = {}
    print(full_path)
    # row["filename"] = f' .. '
    # row["type"] = f"parent folder"
    # row["size"] = f""
    # row["time"] = ''
    # row["content"] = ".."

    # if allow_modify and show_delete_button:
    #     row["delete"] = f''
    # table.append(row)

    row = {}
    if allow_modify and show_upload_button:
        row["filename"] = "upload file"
        # Empty timestamp so this row does not break the sort by 'time' further down
        row["time"] = ''
        row["type"] = f"""
"""
        row["size"] = f""
        row["content"] = f"""



"""
        if show_delete_button:
            row["delete"] = f''
        table.append(row)

    for i, file in enumerate(files):
        row = {}
        if os.path.isdir(full_path + '/' + file):
            file = file + '/'
        row["filename"] = f' {file} '
        row['time'] = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(os.path.getmtime(full_path + '/' + file)))
        row["size"] = f"{os.path.getsize(full_path + '/' + file) / 1024 / 1024:.2f}M"
        row["type"] = f"{os.path.splitext(file)[-1]}"
        row["content"] = file
        if allow_modify and show_delete_button:
            # Folders may not be deleted
            if os.path.isdir(full_path + '/' + file):
                row["delete"] = ""
            else:
                row["delete"] = f""""""
        table.append(row)

    # Newest first
    table = sorted(table, key=lambda x: x['time'], reverse=True)
    return generate_html_table(table,
                               image_width=256,
                               save=False,
                               rel_path=False,
                               title=full_path,
                               max_str_len=-1,
                               page_size=50,
                               head_div=head_div)


def safety_check(path):
    """Refuse to serve hidden files and SSH private keys."""
    folder, file = os.path.split(path)

    if file.startswith('.'):
        return False
    if 'id_rsa' in file:
        return False

    return True

@app.route('/', defaults={"subpath": "./"})
@app.route('/<path:subpath>', methods=['GET', 'POST'])
def server(subpath):
    # Reject any path that could escape the served directory
    if ".." in subpath or '..' in os.path.relpath(subpath, '.'):
        return 'File Not Found.', 404
    full_path = f'{subpath.strip()}'
    print(full_path)
    if request.method == 'GET':
        action = request.args.get('action', default='download', type=str)
        # Download a file, or render a folder listing
        if action == 'download':
            if os.path.exists(full_path):
                if os.path.isdir(full_path):
                    if full_path[-1] != '/':
                        return redirect('/' + full_path + '/')
                    if index_hide and os.path.exists(full_path + 'index.html'):
                        return redirect('/' + full_path + 'index.html')
                    return parse_folder(full_path)
                else:
                    if not safety_check(full_path):
                        return 'Permission Error', 404
                    with open(full_path, 'rb') as f:
                        data = f.read()
                    return Response(data, mimetype=mimetypes.guess_type(subpath)[0])
            else:
                return 'File not Found', 404

        # Delete a file
        elif action == 'delete':
            # No default: a request without a password must never match the server password
            user_password = request.args.get('password', type=str)
            if allow_modify and password == user_password:
                if os.path.isfile(full_path):
                    # os.remove avoids passing a user-controlled path to a shell
                    os.remove(full_path)
                    return f'Deleted {full_path}', 200
                else:
                    return f'{full_path} does not exist or is not a file', 200
            else:
                return "Deleting files is not allowed.", 405

        else:
            return 'Invalid Action', 404

    elif request.method == 'POST':
        # Upload a file
        if 'file' in request.files:
            if not allow_modify:
                return "Uploading files is not allowed.", 405
            file = request.files['file']
            # Keep only the base name so the upload cannot escape full_path
            filename = os.path.basename(file.filename or '')
            if filename in ('', '.', '..'):
                return 'Invalid file name', 400

            os.makedirs(full_path, exist_ok=True)

            file.save(os.path.join(full_path, filename))
            return 'Successfully uploaded!', 200
        else:
            return 'Need file', 404

--------------------------------------------------------------------------------
/showdata/showdata.py:
--------------------------------------------------------------------------------
import time
import os
from urllib.parse import urlencode, quote
from tqdm import tqdm

template_path = os.path.join(os.path.split(__file__)[0], 'static/template.html')

def time_it(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        print(f'Start {func.__name__}')
        output = func(*args, **kwargs)
        end = time.time()
        print(f'End {func.__name__}. Elapsed {end-start} seconds')
        return output
    return wrapper


def load_dir(input_path, level):
    content_table = []
    # One-level directory: every entry of input_path is an image
    if level == 1:
        for idx, img_name in enumerate(sorted(os.listdir(input_path))):
            content = {'idx': idx+1, 'img_name': img_name +
                       ' ', 'img': f"{input_path}/{img_name}"}
            content_table.append(content)

    # Two-level directory: input_path/class_dir/image
    elif level == 2:
        idx = 1
        for class_dir in sorted(os.listdir(input_path)):
            if not os.path.isdir(f'{input_path}/{class_dir}'):
                continue

            for img_name in sorted(os.listdir(f"{input_path}/{class_dir}")):
                img_path = f"{input_path}/{class_dir}/{img_name}"
                content = {"idx": idx, "class": class_dir,
                           'img_name': img_name+' ', 'img': img_path}
                content_table.append(content)
                idx += 1

    return content_table


def handle_src(src, output_dir, rel_path=True):
    if rel_path:
        src = os.path.relpath(src, output_dir)
        assert '..' not in src, 'The HTML file must be in a parent folder of all images.'
    return quote(src)


github_head = ''

def generate_html_table(content_table,
                        image_width='auto',
                        image_height='auto',
                        output_path='',
                        float_precision=3,
                        max_str_len=30,
                        rel_path=True,
                        save=True,
                        title="Showdata",
                        head_div=github_head,
                        page_size=10):
    """Generate an HTML table.

    Args:
        content_table: List of row dicts; the keys of the first row are used as the column headers.
        image_width: Image width.
        image_height: Image height.
        output_path: Output HTML path.
        float_precision: Max precision of float values.
73 | max_str_len: Max string length. 74 | rel_path: Whether to use the relative path of input image and output path. 75 | save: Whether to save output file. 76 | head_div: You can append some custom info in the top of the page. 77 | page_size: default page size. 78 | """ 79 | output_dir = os.path.split(output_path)[0] 80 | html = '' 81 | html += '' 82 | 83 | html += f""" 84 | {title} 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | """ 96 | 97 | html += '' 98 | html += '' 99 | if head_div: 100 | html += head_div 101 | html += f""" 102 | 124 | """ 125 | 126 | html += '' 127 | html += '' 128 | heads = content_table[0].keys() 129 | 130 | for i, h in enumerate(heads): 131 | html += f'' 132 | 133 | html += "" 134 | html += "" 135 | html += '
{h}
' 136 | 137 | html += """ 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | """ 147 | 148 | width = image_width 149 | height = image_height 150 | all_content_dict = [] 151 | for content_row in tqdm(content_table, desc='Generating rows...'): 152 | content_dict = {} 153 | for i, head in enumerate(heads): 154 | content = 'None' if not head in content_row else content_row[head] 155 | subhtml = '' 156 | 157 | if type(content) == dict: # 图片,支持更丰富的样式 158 | src = handle_src(content['src'], output_dir, rel_path) 159 | alt = '' if not "alt" in content else content['alt'] 160 | title = '' if not "title" in content else content['title'] 161 | item_width = width if not "width" in content else content['width'] 162 | item_height = height if not "height" in content else content['height'] 163 | text = '' if not "text" in content else content['text'] 164 | style = '' if not "style" in content else content['style'] 165 | if text != '': 166 | subhtml += f"
{text}
" 167 | subhtml += f"\"{alt}\"" 168 | 169 | # 图片 170 | elif type(content) == str and os.path.splitext(content)[-1].lower() in ['.jpg', '.png', '.jpeg', '.gif']: 171 | src = handle_src(content, output_dir, rel_path) 172 | subhtml += f"\"{src}\"" 173 | 174 | # 视频 175 | elif type(content) == str and os.path.splitext(content)[-1].lower() in ['.mp4', '.webm']: 176 | src = handle_src(content, output_dir, rel_path) 177 | subhtml += f"