├── README.md
├── data
│   ├── paper01.pdf
│   └── paper02.pdf
├── notebooks
│   ├── 01_blog.ipynb
│   └── pdf01_snippet.png
└── requirements.txt

/README.md:
--------------------------------------------------------------------------------
# Introduction
This repo demonstrates how to use the [Unstructured library](https://unstructured-io.github.io/unstructured/getting_started.html) with Weaviate. The Unstructured library parses a variety of data sources and extracts structured text from them. Supported inputs include, but are not limited to, PDFs, PowerPoint presentations, and JPEG files.

The included dataset consists of two publicly available research papers. One paper uses a single-column layout, and the other uses a two-column layout. The notebook starts with a basic approach to using Unstructured and finishes with an end-to-end example: connecting to your Weaviate instance, defining your schema, uploading the data, and then running two queries.

Read the [blog post](https://weaviate.io/blog/ingesting-pdfs-into-weaviate) for more information!
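
The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: `load_elements`, `extract_section`, and `to_data_object` are hypothetical helper names, and the `source` and `abstract` property names are assumptions chosen for the example. It relies on Unstructured's `partition_pdf` returning elements that expose `category` and `text`. The upload to Weaviate is left out because the client calls differ between client versions.

```python
from typing import Any, Dict, Iterable, List


def load_elements(pdf_path: str) -> List[Any]:
    """Parse a PDF into Unstructured elements.

    The import is done lazily so the pure-Python helpers below can be
    used (and tested) without the `unstructured` package installed.
    """
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(filename=pdf_path)


def extract_section(elements: Iterable[Any], heading: str) -> str:
    """Join the NarrativeText that follows a Title equal to `heading`.

    Collection stops at the next Title element. Returns an empty string
    if no Title matches. (Hypothetical helper for illustration.)
    """
    collected: List[str] = []
    inside = False
    for element in elements:
        category = getattr(element, "category", None)
        text = (getattr(element, "text", "") or "").strip()
        if category == "Title":
            if inside:
                break  # reached the heading of the next section
            inside = text.lower() == heading.lower()
            continue
        if inside and category == "NarrativeText":
            collected.append(text)
    return " ".join(collected)


def to_data_object(source: str, elements: Iterable[Any]) -> Dict[str, str]:
    """Build a dict that could be uploaded as one Weaviate object.

    The property names `source` and `abstract` are assumptions.
    """
    return {"source": source, "abstract": extract_section(elements, "Abstract")}
```

With Unstructured installed, `to_data_object("paper01.pdf", load_elements("data/paper01.pdf"))` would produce one dictionary per paper. How well the heading match works depends on how Unstructured categorizes each PDF's layout, so the result is worth inspecting for both the single-column and two-column papers.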
--------------------------------------------------------------------------------
/data/paper01.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weaviate-tutorials/how-to-ingest-pdfs-with-unstructured/d56d157d3b6da7bdff308d61e3207e5e72685ddc/data/paper01.pdf

--------------------------------------------------------------------------------
/data/paper02.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weaviate-tutorials/how-to-ingest-pdfs-with-unstructured/d56d157d3b6da7bdff308d61e3207e5e72685ddc/data/paper02.pdf

--------------------------------------------------------------------------------
/notebooks/pdf01_snippet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weaviate-tutorials/how-to-ingest-pdfs-with-unstructured/d56d157d3b6da7bdff308d61e3207e5e72685ddc/notebooks/pdf01_snippet.png

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# To ensure app dependencies are ported from your virtual environment/host machine into your container, run 'pip freeze > requirements.txt' in the terminal to overwrite this file
weaviate-client
pytest
pytest-cov
black[jupyter]
ipykernel
textwrap3
--------------------------------------------------------------------------------