├── README.md
├── cli
    ├── __init__.py
    ├── downloader.py
    ├── parser.py
    ├── pipeline.py
    └── reader.py
├── config
    ├── LANGS_322.tsv
    ├── TAGS_HTML_HEADERS.tsv
    ├── __init__.py
    └── config.py
├── core
    ├── __init__.py
    ├── downloader.py
    ├── parse_wikitable_html.py
    ├── utils
    │   ├── __init__.py
    │   └── io_worker.py
    └── wikitable_to_image.py
├── data
    └── dump
    │   └── crwiki-NS0-20220301-ENTERPRISE-HTML.json.tar.gz
├── requirements.txt
├── run.py
└── wtabhtml.py


/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/README.md

--------------------------------------------------------------------------------
/cli/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/cli/__init__.py

--------------------------------------------------------------------------------
/cli/downloader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/cli/downloader.py

--------------------------------------------------------------------------------
/cli/parser.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/cli/parser.py

--------------------------------------------------------------------------------
/cli/pipeline.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/cli/pipeline.py

--------------------------------------------------------------------------------
/cli/reader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/cli/reader.py

--------------------------------------------------------------------------------
/config/LANGS_322.tsv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/config/LANGS_322.tsv

--------------------------------------------------------------------------------
/config/TAGS_HTML_HEADERS.tsv:
--------------------------------------------------------------------------------
h1
h2
h3
h4
h5
h6

--------------------------------------------------------------------------------
/config/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/config/config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/config/config.py

--------------------------------------------------------------------------------
/core/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/core/downloader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/core/downloader.py

--------------------------------------------------------------------------------
/core/parse_wikitable_html.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/core/parse_wikitable_html.py

--------------------------------------------------------------------------------
/core/utils/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/core/utils/io_worker.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/core/utils/io_worker.py

--------------------------------------------------------------------------------
/core/wikitable_to_image.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/core/wikitable_to_image.py

--------------------------------------------------------------------------------
/data/dump/crwiki-NS0-20220301-ENTERPRISE-HTML.json.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/data/dump/crwiki-NS0-20220301-ENTERPRISE-HTML.json.tar.gz

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/requirements.txt

--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/run.py

--------------------------------------------------------------------------------
/wtabhtml.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phucty/wtabhtml/HEAD/wtabhtml.py

--------------------------------------------------------------------------------