├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── Readme.rst
├── docs
    ├── Makefile
    ├── conf.py
    ├── index.rst
    └── make.bat
├── requirements.txt
├── setup.py
├── test_data
    ├── alexa_short_header.arc.gz
    └── crlf_at_1k_boundary.warc.gz
├── warc
    ├── __init__.py
    ├── arc.py
    ├── tests
    │   ├── __init__.py
    │   ├── test_arc.py
    │   ├── test_common.py
    │   ├── test_utils.py
    │   └── test_warc.py
    ├── utils.py
    └── warc.py
└── warcscrape.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/.gitignore
--------------------------------------------------------------------------------

/.travis.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/.travis.yml
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/LICENSE
--------------------------------------------------------------------------------

/MANIFEST.in:
--------------------------------------------------------------------------------
include Readme.rst LICENSE
--------------------------------------------------------------------------------

/Readme.rst:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/Readme.rst
--------------------------------------------------------------------------------

/docs/Makefile:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/docs/Makefile
--------------------------------------------------------------------------------

/docs/conf.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/docs/conf.py
--------------------------------------------------------------------------------

/docs/index.rst:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/docs/index.rst
--------------------------------------------------------------------------------

/docs/make.bat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/docs/make.bat
--------------------------------------------------------------------------------

/requirements.txt:
--------------------------------------------------------------------------------
nose
--------------------------------------------------------------------------------

/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/setup.py
--------------------------------------------------------------------------------

/test_data/alexa_short_header.arc.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/test_data/alexa_short_header.arc.gz
--------------------------------------------------------------------------------

/test_data/crlf_at_1k_boundary.warc.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/test_data/crlf_at_1k_boundary.warc.gz
--------------------------------------------------------------------------------

/warc/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/__init__.py
--------------------------------------------------------------------------------

/warc/arc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/arc.py
--------------------------------------------------------------------------------

/warc/tests/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

/warc/tests/test_arc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/tests/test_arc.py
--------------------------------------------------------------------------------

/warc/tests/test_common.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/tests/test_common.py
--------------------------------------------------------------------------------

/warc/tests/test_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/tests/test_utils.py
--------------------------------------------------------------------------------

/warc/tests/test_warc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/tests/test_warc.py
--------------------------------------------------------------------------------

/warc/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/utils.py
--------------------------------------------------------------------------------

/warc/warc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warc/warc.py
--------------------------------------------------------------------------------

/warcscrape.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jpbruinsslot/warc3/HEAD/warcscrape.py
--------------------------------------------------------------------------------