├── .gitignore
├── LICENSE
├── README.md
├── poetry.lock
├── pyproject.toml
├── src
    └── spacy_html_tokenizer
    │   ├── __init__.py
    │   └── html_tokenizer.py
└── tests
    ├── __init__.py
    └── test_spacy_html_tokenizer.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/README.md
--------------------------------------------------------------------------------
/poetry.lock:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/poetry.lock
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/pyproject.toml
--------------------------------------------------------------------------------
/src/spacy_html_tokenizer/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/src/spacy_html_tokenizer/__init__.py
--------------------------------------------------------------------------------
/src/spacy_html_tokenizer/html_tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/src/spacy_html_tokenizer/html_tokenizer.py
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tests/test_spacy_html_tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmbaumgartner/spacy-html-tokenizer/HEAD/tests/test_spacy_html_tokenizer.py
--------------------------------------------------------------------------------