├── figures
    ├── web_search.gif
    ├── Soda.png
    ├── soda_title.png
    ├── test_pics
    │   ├── 1.jpg
    │   ├── 2.jpg
    │   ├── 3.jpg
    │   ├── 4.jpg
    │   ├── 5.jpg
    │   ├── 6.jpg
    │   ├── 7.jpg
    │   ├── 8.jpg
    │   └── 9.jpg
    ├── web_search.mp4
    ├── SODAtitle_crop.pdf
    ├── image_retrieve.mp4
    ├── text_retrieve.mp4
    └── soda_architecture.png
├── web_ui
    ├── database
    │   ├── tmp7hym5i1j
    │   ├── 1713354159_521_database
    │   │   ├── 936abf26-b455-4d19-a124-b9ffeb5be592
    │   │   │   ├── link_lists.bin
    │   │   │   ├── header.bin
    │   │   │   └── length.bin
    │   │   └── chroma.sqlite3
    │   ├── 1713354159_521
    │   │   └── RAR_arXiv.pdf
    │   └── ff122a0aed7ded202b4cdad150eb8aff35500a8c
    │   │   └── soda_title.png
    ├── soda_title.png
    └── web_ui.py
├── RAG
    ├── test_img.jpg
    ├── artwork_img
    │   ├── 1072.jpg
    │   ├── 2436.jpg
    │   ├── 2545.jpg
    │   ├── 2550.jpg
    │   ├── 283.jpg
    │   ├── 3100.jpg
    │   ├── 3118.jpg
    │   ├── 3483.jpg
    │   ├── 3764.jpg
    │   ├── 3917.jpg
    │   ├── 3945.jpg
    │   ├── 4073.jpg
    │   ├── 4195.jpg
    │   ├── 466.jpg
    │   ├── 4880.jpg
    │   ├── 5522.jpg
    │   ├── 5570.jpg
    │   ├── 7440.jpg
    │   ├── 7460.jpg
    │   ├── 7852.jpg
    │   ├── 8106.jpg
    │   ├── 8154.jpg
    │   ├── 8308.jpg
    │   ├── 8432.jpg
    │   ├── 8439.jpg
    │   ├── 8657.jpg
    │   ├── 9086.jpg
    │   ├── 9512.jpg
    │   ├── 9740.jpg
    │   ├── 10161.jpg
    │   ├── 10601.jpg
    │   ├── 10671.jpg
    │   ├── 10896.jpg
    │   ├── 12020.jpg
    │   ├── 13577.jpg
    │   ├── 13778.jpg
    │   ├── 13939.jpg
    │   ├── 14283.jpg
    │   ├── 14427.jpg
    │   ├── 15167.jpg
    │   ├── 15870.jpg
    │   ├── 16101.jpg
    │   ├── 16215.jpg
    │   ├── 16715.jpg
    │   ├── 16970.jpg
    │   ├── 17669.jpg
    │   ├── 17760.jpg
    │   ├── 17960.jpg
    │   ├── 18130.jpg
    │   ├── 18149.jpg
    │   ├── 19159.jpg
    │   ├── 19341.jpg
    │   ├── 20326.jpg
    │   ├── 21230.jpg
    │   ├── 21905.jpg
    │   ├── 22186.jpg
    │   ├── 22651.jpg
    │   ├── 22740.jpg
    │   ├── 23438.jpg
    │   ├── 23588.jpg
    │   ├── 23901.jpg
    │   ├── 24321.jpg
    │   ├── 25400.jpg
    │   ├── 25427.jpg
    │   ├── 25484.jpg
    │   ├── 25609.jpg
    │   ├── 26587.jpg
    │   ├── 27400.jpg
    │   ├── 27493.jpg
    │   ├── 27521.jpg
    │   ├── 27782.jpg
    │   ├── 28167.jpg
    │   ├── 29519.jpg
    │   ├── 29537.jpg
    │   ├── 30119.jpg
    │   ├── 30738.jpg
    │   ├── 30878.jpg
    │   ├── 31182.jpg
    │   ├── 32091.jpg
    │   ├── 32275.jpg
    │   ├── 32845.jpg
    │   ├── 34486.jpg
    │   ├── 34602.jpg
    │   ├── 34926.jpg
    │   ├── 35186.jpg
    │   ├── 35574.jpg
    │   ├── 36035.jpg
    │   ├── 36711.jpg
    │   ├── 37121.jpg
    │   ├── 37504.jpg
    │   ├── 38682.jpg
    │   ├── 39561.jpg
    │   ├── 40334.jpg
    │   ├── 40827.jpg
    │   ├── 40945.jpg
    │   ├── 41303.jpg
    │   ├── 41805.jpg
    │   ├── 42039.jpg
    │   ├── 43111.jpg
    │   ├── 43121.jpg
    │   └── .ipynb_checkpoints
    │   │   ├── 10161-checkpoint.jpg
    │   │   ├── 35574-checkpoint.jpg
    │   │   ├── 37121-checkpoint.jpg
    │   │   ├── 38682-checkpoint.jpg
    │   │   ├── 42039-checkpoint.jpg
    │   │   └── 43121-checkpoint.jpg
    ├── __pycache__
    │   └── utils.cpython-310.pyc
    ├── text_rag.ipynb
    ├── image_rag.ipynb
    ├── utils.py
    └── artwork_data.tsv
├── mllm
    ├── test_img.jpg
    ├── __pycache__
    │   └── soda_mllm.cpython-310.pyc
    ├── IXC2.py
    └── soda_mllm.py
├── service
    ├── __pycache__
    │   └── utils.cpython-310.pyc
    ├── utils.py
    └── rerank.ipynb
├── web_search
    ├── __pycache__
    │   └── utils.cpython-310.pyc
    ├── Google_API.ipynb
    ├── utils.py
    ├── Bing_API.ipynb
    └── Serper_API.ipynb
├── requirements.txt
├── README_zh.md
├── README.md
└── LICENSE
/RAG/test_img.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/test_img.jpg
--------------------------------------------------------------------------------
/figures/Soda.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/Soda.png
--------------------------------------------------------------------------------
/mllm/test_img.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/mllm/test_img.jpg
--------------------------------------------------------------------------------
/web_ui/soda_title.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/web_ui/soda_title.png
--------------------------------------------------------------------------------
/RAG/artwork_img/1072.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/1072.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/2436.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/2436.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/2545.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/2545.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/2550.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/2550.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/283.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/283.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/3100.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/3100.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/3118.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/3118.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/3483.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/3483.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/3764.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/3764.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/3917.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/3917.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/3945.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/3945.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/4073.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/4073.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/4195.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/4195.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/466.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/466.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/4880.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/4880.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/5522.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/5522.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/5570.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/5570.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/7440.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/7440.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/7460.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/7460.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/7852.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/7852.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/8106.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/8106.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/8154.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/8154.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/8308.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/8308.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/8432.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/8432.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/8439.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/8439.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/8657.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/8657.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/9086.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/9086.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/9512.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/9512.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/9740.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/9740.jpg
--------------------------------------------------------------------------------
/figures/soda_title.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/soda_title.png
--------------------------------------------------------------------------------
/figures/test_pics/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/1.jpg
--------------------------------------------------------------------------------
/figures/test_pics/2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/2.jpg
--------------------------------------------------------------------------------
/figures/test_pics/3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/3.jpg
--------------------------------------------------------------------------------
/figures/test_pics/4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/4.jpg
--------------------------------------------------------------------------------
/figures/test_pics/5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/5.jpg
--------------------------------------------------------------------------------
/figures/test_pics/6.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/6.jpg
--------------------------------------------------------------------------------
/figures/test_pics/7.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/7.jpg
--------------------------------------------------------------------------------
/figures/test_pics/8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/8.jpg
--------------------------------------------------------------------------------
/figures/test_pics/9.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/test_pics/9.jpg
--------------------------------------------------------------------------------
/figures/web_search.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/web_search.mp4
--------------------------------------------------------------------------------
/RAG/artwork_img/10161.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/10161.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/10601.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/10601.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/10671.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/10671.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/10896.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/10896.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/12020.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/12020.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/13577.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/13577.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/13778.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/13778.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/13939.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/13939.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/14283.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/14283.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/14427.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/14427.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/15167.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/15167.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/15870.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/15870.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/16101.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/16101.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/16215.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/16215.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/16715.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/16715.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/16970.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/16970.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/17669.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/17669.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/17760.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/17760.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/17960.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/17960.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/18130.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/18130.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/18149.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/18149.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/19159.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/19159.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/19341.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/19341.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/20326.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/20326.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/21230.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/21230.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/21905.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/21905.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/22186.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/22186.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/22651.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/22651.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/22740.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/22740.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/23438.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/23438.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/23588.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/23588.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/23901.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/23901.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/24321.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/24321.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/25400.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/25400.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/25427.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/25427.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/25484.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/25484.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/25609.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/25609.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/26587.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/26587.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/27400.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/27400.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/27493.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/27493.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/27521.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/27521.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/27782.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/27782.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/28167.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/28167.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/29519.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/29519.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/29537.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/29537.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/30119.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/30119.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/30738.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/30738.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/30878.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/30878.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/31182.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/31182.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/32091.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/32091.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/32275.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/32275.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/32845.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/32845.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/34486.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/34486.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/34602.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/34602.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/34926.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/34926.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/35186.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/35186.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/35574.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/35574.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/36035.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/36035.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/36711.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/36711.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/37121.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/37121.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/37504.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/37504.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/38682.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/38682.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/39561.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/39561.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/40334.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/40334.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/40827.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/40827.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/40945.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/40945.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/41303.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/41303.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/41805.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/41805.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/42039.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/42039.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/43111.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/43111.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/43121.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/43121.jpg
--------------------------------------------------------------------------------
/figures/SODAtitle_crop.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/SODAtitle_crop.pdf
--------------------------------------------------------------------------------
/figures/image_retrieve.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/image_retrieve.mp4
--------------------------------------------------------------------------------
/figures/text_retrieve.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/text_retrieve.mp4
--------------------------------------------------------------------------------
/figures/soda_architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/figures/soda_architecture.png
--------------------------------------------------------------------------------
/RAG/__pycache__/utils.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/__pycache__/utils.cpython-310.pyc
--------------------------------------------------------------------------------
/service/__pycache__/utils.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/service/__pycache__/utils.cpython-310.pyc
--------------------------------------------------------------------------------
/mllm/__pycache__/soda_mllm.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/mllm/__pycache__/soda_mllm.cpython-310.pyc
--------------------------------------------------------------------------------
/web_search/__pycache__/utils.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/web_search/__pycache__/utils.cpython-310.pyc
--------------------------------------------------------------------------------
/web_ui/database/1713354159_521/RAR_arXiv.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/web_ui/database/1713354159_521/RAR_arXiv.pdf
--------------------------------------------------------------------------------
/web_ui/database/1713354159_521_database/chroma.sqlite3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/web_ui/database/1713354159_521_database/chroma.sqlite3
--------------------------------------------------------------------------------
/RAG/artwork_img/.ipynb_checkpoints/10161-checkpoint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/.ipynb_checkpoints/10161-checkpoint.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/.ipynb_checkpoints/35574-checkpoint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/.ipynb_checkpoints/35574-checkpoint.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/.ipynb_checkpoints/37121-checkpoint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/.ipynb_checkpoints/37121-checkpoint.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/.ipynb_checkpoints/38682-checkpoint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/.ipynb_checkpoints/38682-checkpoint.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/.ipynb_checkpoints/42039-checkpoint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/.ipynb_checkpoints/42039-checkpoint.jpg
--------------------------------------------------------------------------------
/RAG/artwork_img/.ipynb_checkpoints/43121-checkpoint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/RAG/artwork_img/.ipynb_checkpoints/43121-checkpoint.jpg
--------------------------------------------------------------------------------
/web_ui/database/ff122a0aed7ded202b4cdad150eb8aff35500a8c/soda_title.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/web_ui/database/ff122a0aed7ded202b4cdad150eb8aff35500a8c/soda_title.png
--------------------------------------------------------------------------------
/web_ui/database/1713354159_521_database/936abf26-b455-4d19-a124-b9ffeb5be592/header.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/web_ui/database/1713354159_521_database/936abf26-b455-4d19-a124-b9ffeb5be592/header.bin
--------------------------------------------------------------------------------
/web_ui/database/1713354159_521_database/936abf26-b455-4d19-a124-b9ffeb5be592/length.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Liuziyu77/Soda/HEAD/web_ui/database/1713354159_521_database/936abf26-b455-4d19-a124-b9ffeb5be592/length.bin
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp==3.9.3
2 | chromadb==0.4.24
3 | clip==1.0
4 | gradio==4.26.0
5 | html2text==2024.2.26
6 | ipython==8.12.3
7 | langchain==0.1.16
8 | langchain_community==0.0.32
9 | llama_index==0.10.28
10 | llava==0.0.1.dev0
11 | matplotlib==3.8.3
12 | numpy==1.26.4
13 | openai==1.17.0
14 | pandas==2.2.2
15 | peft==0.10.0
16 | Pillow==10.3.0
17 | Requests==2.31.0
18 | sentence_transformers==2.6.1
19 | torch==2.1.2+cu118
20 | tqdm==4.66.2
21 | transformers==4.39.2
22 |
--------------------------------------------------------------------------------
/mllm/IXC2.py:
--------------------------------------------------------------------------------
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# Disable Pillow's decompression-bomb check so very large images can be opened.
Image.MAX_IMAGE_PIXELS = None

torch.manual_seed(1234)

# Inference only: turn off gradient tracking globally.
torch.set_grad_enabled(False)
model = AutoModelForCausalLM.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True)

# '<ImageHere>' is the placeholder InternLM-XComposer2 prompts use to mark the image position.
query = '<ImageHere>' + "What do you see in the picture?"
image = "./test_img.jpg"
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
# Print the response in green.
print("\033[92m" + response + "\033[0m")
--------------------------------------------------------------------------------
/mllm/soda_mllm.py:
--------------------------------------------------------------------------------
from openai import OpenAI
import torch

# Enter your OpenAI API base URL and API key here:
api_base = "***********************************"
api_key = "***********************************"


# Call GPT-4 (gpt-4-0125-preview) through the OpenAI chat completions API
def mllm_openai(query, search_results):
    client = OpenAI(api_key=api_key, base_url=api_base)
    response = client.chat.completions.create(
        model="gpt-4-0125-preview",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Given an input text and web information, please respond to the input text using this information.\n Here is the input text: {query}.\n Here are the materials: {search_results}"},
        ]
    )
    # print(response.choices[0].message.content)
    return response.choices[0].message.content


# Call InternLM-Xcomposer2 (text only: no image is passed to the model)
def mllm_IXC2(IXC2_model, IXC2_tokenizer, query, search_results):
    text_inputs = f"Given an input text and web information, please respond to the input text using this information.\n Here is the input text: {query}.\n Here are the materials: {search_results}"
    with torch.no_grad():
        with torch.cuda.amp.autocast():
            response, _ = IXC2_model.chat(IXC2_tokenizer, query=text_inputs, history=[], do_sample=False)
    # print("\033[92m" + response + "\033[0m")
    return response
--------------------------------------------------------------------------------
/service/utils.py:
--------------------------------------------------------------------------------
import sys
sys.path.append("../")
from mllm.soda_mllm import mllm_openai
from web_search.utils import search, preprocess_search_results, merge_snippet, get_web_contents


### Feed the search-result snippets to the LLM
async def web_search_snippet(query, search_engine='google', search_num=10, search_type=None):
    try:
        results = search(search_engine, query)
        urls, title, snippet = preprocess_search_results(search_engine, results, search_num, search_type)
        web_contents_combined = merge_snippet(snippet)
        answer = mllm_openai(query, web_contents_combined)
        return answer
    except Exception as e:
        # Errors are reported on stdout; the caller receives None.
        print(f"Web search error: {e}")
        return None


### Feed the full text of the top web pages to the LLM
async def web_search_pagehtml(query, search_engine='google', search_num=3, search_type=None):
    try:
        results = search(search_engine, query)
        urls, title, snippet = preprocess_search_results(search_engine, results, search_num, search_type)
        web_contents = await get_web_contents(urls)
        # Number of pages that were fetched
        print(len(web_contents))
        web_contents_combined = " ".join(web_contents)
        answer = mllm_openai(query, web_contents_combined)
        return answer
    except Exception as e:
        # Errors are reported on stdout; the caller receives None.
        print(f"Web search error: {e}")
        return None


### Retrieve from the local database and feed the retrieved text to the LLM
def rag_database(query, text_num, text_collection):
    try:
        ans = text_collection.query(
            query_texts=[query],
            n_results=text_num
        )
        # Results for the first (and only) query text
        ans = ans["documents"][0]
        ans_combined = merge_snippet(ans)
        print("RAG text:")
        print(ans_combined)
        answer = mllm_openai(query, ans_combined)
        return answer
    except Exception as e:
        # Errors are reported on stdout; the caller receives None.
        print(f"RAG database error: {e}")
        return None
--------------------------------------------------------------------------------
/RAG/text_rag.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "id": "54c60bec-0433-4590-b51a-265a31162bd7",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "from utils import preprocess_files, build_text_database"
11 | ]
12 | },
13 | {
14 | "cell_type": "code",
15 | "execution_count": null,
16 | "id": "d382be04-6140-4749-a6f9-071d4a70b524",
17 | "metadata": {},
18 | "outputs": [],
19 | "source": [
20 | "### upload PDF TXT or DOCX file\n",
21 | "docs = preprocess_files(file_path=\"RAR-pr.docx\",chunk_size=1000, chunk_overlap=50)"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "id": "1972207e-2fe1-4346-ad2c-f4dfa77c686c",
28 | "metadata": {},
29 | "outputs": [],
30 | "source": [
31 | "docs"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": null,
37 | "id": "780ef08c-a9d7-4da6-9be5-f7876c2d3b81",
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "### build database\n",
42 | "txt_collection = build_text_database(docs, batch_size=40000, encoder=\"intfloat/multilingual-e5-base\", database_name = \"test\")"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": null,
48 | "id": "a4b7cac3-fdd2-4871-a617-4cff0899af7e",
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "### change query_text to retrieve\n",
53 | "query_text = \"What is the database consisted of\"\n",
54 | "n_text = 10\n",
55 | "ans = txt_collection.query(\n",
56 | " query_texts=[query_text],\n",
57 | " n_results=n_text\n",
58 | ")\n",
59 | "print(ans)"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": null,
65 | "id": "4b98ed5d-2180-45b2-823f-54cd2ce95ec9",
66 | "metadata": {},
67 | "outputs": [],
68 | "source": []
69 | }
70 | ],
71 | "metadata": {
72 | "kernelspec": {
73 | "display_name": "Python 3 (ipykernel)",
74 | "language": "python",
75 | "name": "python3"
76 | },
77 | "language_info": {
78 | "codemirror_mode": {
79 | "name": "ipython",
80 | "version": 3
81 | },
82 | "file_extension": ".py",
83 | "mimetype": "text/x-python",
84 | "name": "python",
85 | "nbconvert_exporter": "python",
86 | "pygments_lexer": "ipython3",
87 | "version": "3.10.13"
88 | }
89 | },
90 | "nbformat": 4,
91 | "nbformat_minor": 5
92 | }
93 |
--------------------------------------------------------------------------------
/service/rerank.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "id": "4875508a",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "\"\"\"\n",
11 | "test code for rerank\n",
12 | "\"\"\""
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": 1,
18 | "id": "0644bef7-478a-4eb3-ab09-0f48ac2953d4",
19 | "metadata": {},
20 | "outputs": [
21 | {
22 | "name": "stderr",
23 | "output_type": "stream",
24 | "text": [
25 | "/mnt/petrelfs/liuziyu/miniconda3/envs/soda/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
26 | " from .autonotebook import tqdm as notebook_tqdm\n",
27 | "/mnt/petrelfs/liuziyu/miniconda3/envs/soda/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n",
28 | " return self.fget.__get__(instance, owner)()\n"
29 | ]
30 | }
31 | ],
32 | "source": [
33 | "from sentence_transformers import CrossEncoder\n",
34 | "cross_encoder_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2', max_length=512)"
35 | ]
36 | },
37 | {
38 | "cell_type": "code",
39 | "execution_count": 7,
40 | "id": "93af7e00-0f27-4b24-bd83-1cf1dbf52552",
41 | "metadata": {},
42 | "outputs": [
43 | {
44 | "name": "stdout",
45 | "output_type": "stream",
46 | "text": [
47 | "[-9.597101 -8.573202 -7.995801 -0.78380984]\n",
48 | "['Thank you', 'you are welcome', 'Good afternoon', 'Today is Friday']\n"
49 | ]
50 | }
51 | ],
52 | "source": [
53 | "import numpy as np\n",
54 | "query_text = \"Thanks\"\n",
55 | "str_list = [\"Today is Friday\", \"Good afternoon\", \"you are welcome\", \"Thank you\"]\n",
56 | "input_cross_encoder = [(query_text, s) for s in str_list]\n",
57 | "rerank_scores = cross_encoder_model.predict(input_cross_encoder)\n",
58 | "print(rerank_scores)\n",
59 | "sorted_str_list = [x for _, x in sorted(zip(rerank_scores, str_list), reverse=True)]\n",
60 | "print(sorted_str_list)"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": null,
66 | "id": "c29a3b38-f975-4c34-ad5c-df16095a1b8c",
67 | "metadata": {},
68 | "outputs": [],
69 | "source": []
70 | }
71 | ],
72 | "metadata": {
73 | "kernelspec": {
74 | "display_name": "Python 3 (ipykernel)",
75 | "language": "python",
76 | "name": "python3"
77 | },
78 | "language_info": {
79 | "codemirror_mode": {
80 | "name": "ipython",
81 | "version": 3
82 | },
83 | "file_extension": ".py",
84 | "mimetype": "text/x-python",
85 | "name": "python",
86 | "nbconvert_exporter": "python",
87 | "pygments_lexer": "ipython3",
88 | "version": "3.10.13"
89 | }
90 | },
91 | "nbformat": 4,
92 | "nbformat_minor": 5
93 | }
94 |
--------------------------------------------------------------------------------
/RAG/image_rag.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "id": "bf037b19-3475-4850-9fd0-d99a713331ce",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "import matplotlib.pyplot as plt\n",
11 | "from PIL import Image\n",
12 | "from utils import build_image_database"
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": null,
18 | "id": "72c0d164-4aa0-40c0-9ecb-60fa731e97fb",
19 | "metadata": {},
20 | "outputs": [],
21 | "source": [
22 | "### give a image fold path,build image database\n",
23 | "img_collection = build_image_database(fold_path = '/mnt/petrelfs/liuziyu/V3Det/Soda/RAG/artwork_img/',database_name = 'test')"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": null,
29 | "id": "4175c116-96ed-4279-a440-4474d4f40c41",
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "### input a image, then retrieve\n",
34 | "test_image_path = \"./test_img.jpg\"\n",
35 | "n_pictures = 1\n",
36 | "ans = img_collection.query(\n",
37 | " query_uris=[test_image_path], # A list of strings representing URIs to data\n",
38 | " n_results=n_pictures\n",
39 | " )"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": null,
45 | "id": "e434db62-c47a-4b56-8784-503e2ed14526",
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "retrieved_image_path = '/mnt/petrelfs/liuziyu/V3Det/Soda/RAG/artwork_img/'+ ans[\"metadatas\"][0][0][\"ID\"]+'.jpg'\n",
50 | "retrieved_image_path"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "id": "154c4c94-c71a-4bba-8135-dd9a443ada5e",
57 | "metadata": {},
58 | "outputs": [],
59 | "source": [
60 | "### show results\n",
61 | "image_path1 = test_image_path\n",
62 | "image_path2 = retrieved_image_path\n",
63 | "\n",
64 | "image1 = Image.open(image_path1)\n",
65 | "image2 = Image.open(image_path2)\n",
66 | "\n",
67 | "fig, axes = plt.subplots(1, 2, figsize=(12, 6))\n",
68 | "\n",
69 | "axes[0].imshow(image1)\n",
70 | "axes[0].set_title('Original Image')\n",
71 | "axes[0].axis('off')\n",
72 | "\n",
73 | "axes[1].imshow(image2)\n",
74 | "axes[1].set_title('Retrieved Image')\n",
75 | "axes[1].axis('off')\n",
76 | "\n",
77 | "plt.show()"
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": null,
83 | "id": "f921cd03-f5d6-4809-b47e-6c2d2da32dc8",
84 | "metadata": {},
85 | "outputs": [],
86 | "source": []
87 | }
88 | ],
89 | "metadata": {
90 | "kernelspec": {
91 | "display_name": "Python 3 (ipykernel)",
92 | "language": "python",
93 | "name": "python3"
94 | },
95 | "language_info": {
96 | "codemirror_mode": {
97 | "name": "ipython",
98 | "version": 3
99 | },
100 | "file_extension": ".py",
101 | "mimetype": "text/x-python",
102 | "name": "python",
103 | "nbconvert_exporter": "python",
104 | "pygments_lexer": "ipython3",
105 | "version": "3.10.13"
106 | }
107 | },
108 | "nbformat": 4,
109 | "nbformat_minor": 5
110 | }
111 |
--------------------------------------------------------------------------------
/README_zh.md:
--------------------------------------------------------------------------------
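SODA: Search, Organize, Discovery Anything

English | 简体中文

🌟 Welcome to my GitHub project! If this project appeals to you, consider giving it a star. The more stars it gets, the faster the updates come and the more fun it is!

## 📣 Introduction
With the emergence and widespread deployment of large language models (LLMs), these advanced systems have shown great potential across many application areas. However, even advanced models such as GPT-4 have limitations: they are not omniscient, and they are prone to the so-called "hallucination problem".

To address these limitations, we developed a cutting-edge information integration tool: **SODA (Search, Organize, Discover Anything)**. With an LLM at its core, SODA flexibly draws data from many channels to respond to user queries, producing detailed and comprehensive answers. Through SODA, users can use an advanced web search mechanism to extract relevant information from the internet. This integrates the LLM's built-in knowledge with external sources, so that the answers provided are both accurate and reliable. SODA also lets users upload personal files to create a private, secure, and powerful local knowledge database. This allows the LLM to absorb new information without pre-training or fine-tuning and to use that knowledge when responding to user input.

Overall, SODA is designed as a secure, reliable, and intelligent tool for collecting and processing information. It lets users process and interpret information obtained from large language models, the web, and their own databases.

## 🔭 SODA Architecture
The architecture of SODA is shown below:

![SODA architecture](figures/soda_architecture.png)

We currently support **web retrieval**, **text retrieval (local database)**, and **image retrieval (local database)**.
For text retrieval we implement a **two-stage** process: the first stage retrieves candidates from the database, and the second stage reranks the retrieved text.
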
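
The two-stage text retrieval described above can be sketched without any dependencies. This is an illustration only, not SODA's implementation: the names `first_stage`, `rerank`, and `overlap_score` are hypothetical, the bag-of-words cosine similarity stands in for the embedding search done by the vector database, and the token-overlap scorer stands in for a cross-encoder such as the one loaded in `service/rerank.ipynb`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def first_stage(query, documents, n_results):
    """Stage 1: cheap similarity search over every document (stand-in for the vector database)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:n_results]]


def rerank(query, candidates, score_fn):
    """Stage 2: rescore only the shortlisted candidates with a more expensive scorer."""
    scores = [score_fn(query, c) for c in candidates]
    # Sort indices by score so that ties never fall back to comparing the texts.
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]


def overlap_score(query, passage):
    """Toy reranker: fraction of query tokens that also appear in the passage."""
    q = set(tokenize(query))
    return len(q & set(tokenize(passage))) / len(q) if q else 0.0


documents = [
    "SODA builds a local text database from uploaded files.",
    "The web search module queries Google, Bing, or Serper.",
    "Uploaded files are split into chunks before they are stored in the database.",
    "CLIP encodes images for image retrieval.",
]
query = "how are uploaded files stored in the database"
shortlist = first_stage(query, documents, n_results=2)
ranked = rerank(query, shortlist, overlap_score)
print(ranked)
```

The point of the split is cost: the first stage touches every document and so has to be cheap, while the second stage only sees the shortlist and can afford a slower, more accurate scorer.
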
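## 📢 New
- 🚀 [04/18/2024] We have open-sourced the first version of SODA. More updates are coming soon!

## 💡 Highlights
- 🔥 **A new technical framework** We developed an LLM-driven information integration tool that provides a technical framework for retrieval-augmented generation (RAG) and guidance for tool use by AI agents.
- 🔥 **Good compatibility** SODA makes it easy to swap components, so it can work with different search engines, vector databases, or LLMs.
- 🔥 **Reliable, with traceable sources** SODA mitigates part of the LLM hallucination problem, providing reliable and accurate answers whose information sources can be traced.
- 🔥 **Data privacy** SODA supports local databases, allowing the model to acquire new knowledge without pre-training or fine-tuning while protecting the privacy of user data.

## 🛠️ Usage

### Contents
- [Installation](#installation)
- [Web Search](#-web-search)
- [Local Database Retrieval](#-local-database-retrieval)
- [Large Language Model](#-large-language-model)

### Installation
To run SODA locally, first clone the project and install its dependencies with the commands below.
```bash
mkdir SODA
cd SODA
git clone https://github.com/Liuziyu77/Soda.git
cd Soda
pip install -r requirements.txt
```
To try all of SODA's features, run our Gradio-based web UI with the commands below. To try a single feature, look for the `.ipynb` files in SODA's subfolders and run them.
```bash
cd web_ui
python web_ui.py
```
Note that you need to change the `base_directory` path in `web_ui.py`. This path is used for temporary storage of intermediate files (for example, databases built from local files). These files are cleaned up automatically at regular intervals. Adjust the code as needed.
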
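
A periodic cleanup of the intermediate files under `base_directory` could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `web_ui.py`: the function name `remove_stale_entries`, the `max_age_seconds` parameter, and the one-hour threshold are all hypothetical.

```python
import os
import shutil
import tempfile
import time


def remove_stale_entries(base_directory, max_age_seconds, now=None):
    """Delete files and folders in base_directory last modified more than max_age_seconds ago.

    Returns the sorted names of the entries that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(base_directory):
        path = os.path.join(base_directory, name)
        if now - os.path.getmtime(path) <= max_age_seconds:
            continue  # still fresh, keep it
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        removed.append(name)
    return sorted(removed)


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as base_directory:
    old_db = os.path.join(base_directory, "old_database")
    new_db = os.path.join(base_directory, "new_database")
    os.mkdir(old_db)
    os.mkdir(new_db)
    # Pretend the first database was last modified two hours ago.
    two_hours_ago = time.time() - 2 * 3600
    os.utime(old_db, (two_hours_ago, two_hours_ago))

    removed = remove_stale_entries(base_directory, max_age_seconds=3600)
    remaining = sorted(os.listdir(base_directory))
    print(removed, remaining)
```

Such a function would typically be called from a timer or at the start of each request. Point it only at a directory dedicated to SODA's temporary files, since everything older than the threshold is deleted.
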
To enable web search and use OpenAI's API, enter the corresponding API keys in the files `./web_search/utils.py` and `./mllm/soda_mllm.py`.

### 🌐 Web Search
The web search code lives in the `web_search` folder. It uses the APIs of several search engines to retrieve information relevant to the user's input, combining multiple search tools to improve the relevance and accuracy of the results.

#### Using the APIs
We currently support the Google, Bing, and Serper APIs. You can test the different search engines by running `./web_search/Google_API.ipynb`, `./web_search/Serper_API.ipynb`, and `./web_search/Bing_API.ipynb`. You first need to obtain the corresponding API keys from the pages linked below.

* Google APIs: [Google API](https://cloud.google.com/apis/docs/overview)
* Bing APIs: [Bing API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api)
* Serper APIs: [Serper API](https://serper.dev/)

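
Each engine returns JSON in a different shape, so a thin normalization step is useful before the results reach the LLM. The sketch below is an assumption-laden illustration, not the repository's `preprocess_search_results`: `normalize_results` is a hypothetical name, the payloads are made up, and the field names (`items` for Google Custom Search, `webPages.value` with `url`/`name` for Bing Web Search, `organic` for Serper) reflect those services' response formats as commonly documented and should be checked against their current documentation.

```python
def normalize_results(engine, response, limit=10):
    """Map one engine's raw JSON response onto a list of {url, title, snippet} dicts."""
    if engine == "google":
        items = response.get("items", [])
        keys = ("link", "title", "snippet")
    elif engine == "bing":
        items = response.get("webPages", {}).get("value", [])
        keys = ("url", "name", "snippet")
    elif engine == "serper":
        items = response.get("organic", [])
        keys = ("link", "title", "snippet")
    else:
        raise ValueError(f"unsupported search engine: {engine}")
    url_key, title_key, snippet_key = keys
    return [
        {
            "url": item.get(url_key, ""),
            "title": item.get(title_key, ""),
            "snippet": item.get(snippet_key, ""),
        }
        for item in items[:limit]
    ]


# Made-up payloads shaped like each engine's response.
google_response = {"items": [{"link": "https://example.com/a", "title": "A", "snippet": "about A"}]}
bing_response = {"webPages": {"value": [{"url": "https://example.com/b", "name": "B", "snippet": "about B"}]}}
serper_response = {"organic": [{"link": "https://example.com/c", "title": "C", "snippet": "about C"}]}

results = (
    normalize_results("google", google_response)
    + normalize_results("bing", bing_response)
    + normalize_results("serper", serper_response)
)
for r in results:
    print(r["title"], r["url"])
```

Using `.get` with defaults means a response with no hits yields an empty list instead of raising `KeyError`.
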
Here is an example of the web search feature: [Web Search Example](figures/web_search.mp4).

https://github.com/Liuziyu77/Soda_Dev/assets/137670115/7bc73223-eaa9-44f5-a379-8bf204d4380c

### 🔎 Local Database Retrieval
The code for retrieval-augmented generation (RAG) over a local database lives in the `RAG` folder. It lets you build your own local database and retrieve information from it. It covers `text-text` retrieval, `image-image` retrieval, and `image-image&text pair` retrieval. You can test retrieval by running the different `.ipynb` files; we provide three of them as examples.

#### 1. Text-to-text retrieval
Run `./RAG/text_rag.ipynb` to build a local text database and retrieve information from it. The only thing you need to change is the path of the uploaded file. SODA currently supports file formats such as TXT, DOCX, and PDF.

SODA uses **Sentence Transformers** as the text encoder. More text encoders will be supported soon!

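
The notebook calls `preprocess_files(file_path=..., chunk_size=1000, chunk_overlap=50)` before building the database. The sketch below shows what these two parameters usually mean for a character-based splitter. It is an assumption, not the repository's code: `split_with_overlap` is a hypothetical helper, and the real `preprocess_files` may split on separators rather than at fixed offsets.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so a sentence that
    straddles a boundary is still seen whole by at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Larger chunks give the encoder more context per entry, while the overlap trades some duplicated storage for fewer facts cut in half at a boundary.
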
Here is an example of text retrieval: [Text Retrieve Example](figures/text_retrieve.mp4).

https://github.com/Liuziyu77/Soda_Dev/assets/137670115/7a2042b9-7c03-44f4-9e36-abb7f244da19

#### 2. Image-to-image retrieval
Run `./RAG/image_rag.ipynb` to build a local image database and retrieve images from it. The only thing you need to change is the path of the uploaded folder.

We use **CLIP-B/32** as the image encoder. More image encoders will be supported soon!

Here is an example of image retrieval: [Image Retrieve Example](figures/image_retrieve.mp4).

https://github.com/Liuziyu77/Soda_Dev/assets/137670115/761e489c-d572-4070-bb29-bb31d891f661

#### 3. Image to image-text pair retrieval
Run `./RAG/multimodal_rag.ipynb` to build a multimodal database and retrieve information from it. Here the user needs to provide a `.tsv` file with the columns `ID`, `PATH`, and `INFO`. An example TSV file is `./RAG/artwork_data.tsv`.

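
A file with the `ID`, `PATH`, and `INFO` columns can be read with the standard `csv` module, as in the sketch below. It assumes the file starts with a header row naming those columns. The helper `load_pairs` is hypothetical, and the sample rows are placeholders rather than the contents of `./RAG/artwork_data.tsv`.

```python
import csv
import io

REQUIRED_COLUMNS = ("ID", "PATH", "INFO")


def load_pairs(tsv_file):
    """Read image-text pairs from a tab-separated file with ID, PATH and INFO columns."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"TSV is missing columns: {missing}")
    return [{c: row[c] for c in REQUIRED_COLUMNS} for row in reader]


# Placeholder content; a real file would be opened with
# open(path, newline="", encoding="utf-8") and passed in the same way.
sample = io.StringIO(
    "ID\tPATH\tINFO\n"
    "1072\t./artwork_img/1072.jpg\tplaceholder description of artwork 1072\n"
    "2436\t./artwork_img/2436.jpg\tplaceholder description of artwork 2436\n"
)
pairs = load_pairs(sample)
print(pairs[0]["PATH"])
```

Checking the header up front gives a clear error for a malformed file instead of a `KeyError` partway through database construction.
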
### 🐑 Large Language Model
We use InternLM-Xcomposer2 (a vision-language model based on InternLM2-7B) or GPT-4 to process the information gathered from the web or the local database and return the result to the user. InternLM-Xcomposer2 runs locally. We will soon support more kinds of large language models as SODA's information-processing brain.

### ✒️ Citation
```
@misc{2024SODA,
    title={SODA: Search, Organize, Discovery Anything},
    author={SODA Team},
    howpublished = {\url{https://github.com/Liuziyu77/Soda}},
    year={2024}
}
```

## 📜 License
**Usage and License Notices**: The data and code are intended and licensed for research use only.

--------------------------------------------------------------------------------
/web_search/Google_API.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "id": "bd73e766",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "\"\"\"\n",
11 | "input your Google API and test web search\n",
12 | "\"\"\""
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": 4,
18 | "id": "a0ea24df-ef76-40a5-9d68-52fef825d7d6",
19 | "metadata": {},
20 | "outputs": [
21 | {
22 | "name": "stdout",
23 | "output_type": "stream",
24 | "text": [
25 | "Enter your search query: llava\n"
26 | ]
27 | }
28 | ],
29 | "source": [
30 | "import requests\n",
31 | "\n",
32 | "def google_search(query):\n",
33 | " api_key = '**************************************'\n",
34 | " cx = '**********************************'\n",
35 | " url = 'https://www.googleapis.com/customsearch/v1'\n",
36 | " params = {'key': api_key, 'cx': cx, 'q': query}\n",
37 | " response = requests.get(url, params=params)\n",
38 | " data = response.json()\n",
39 | " return data['items']\n",
40 | "\n",
41 | "query = input('Enter your search query: ')\n",
42 | "search_results = google_search(query)"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 5,
48 | "id": "4ad27ded-c9af-43d3-93b4-a67b4e78c825",
49 | "metadata": {},
50 | "outputs": [
51 | {
52 | "data": {
53 | "text/html": [
"<table><tr>\n",
"  <td>LLaVA</td>\n",
"  <td>LLaVA Model. We introduce LLaVA (Large Language-and-Vision Assistant) , an end-to-end trained large multimodal model that connects a vision encoder and LLM for ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>haotian-liu/LLaVA: [NeurIPS'23 Oral] Visual Instruction ... - GitHub</td>\n",
"  <td>Chat about images using LLaVA without the need of Gradio interface. It also supports multiple GPUs, 4-bit and 8-bit quantized inference. With 4-bit quantization ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>Visual Instruction Tuning</td>\n",
"  <td>Apr 17, 2023 ... Our early experiments show that LLaVA demonstrates impressive multimodel chat abilities, sometimes exhibiting the behaviors of multimodal ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>LLaVA</td>\n",
"  <td>The service is a research preview intended for non-commercial use only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>LLaVA: Large Language and Vision Assistant - Microsoft Research</td>\n",
"  <td>LLaVA is an open-source project, collaborating with research community to advance the state-of-the-art in AI. LLaVA represents the first end-to-end trained ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>Image descriptions with LLAVA : r/LocalLLaMA</td>\n",
"  <td>Dec 30, 2023 ... Image descriptions with LLAVA ... Here is a \"good\" label: The image features a comic strip with two main characters: an angel and a runner. The ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>LLaVa</td>\n",
"  <td>LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>[llama.cpp] Experimental LLaVA 1.6 Quants (34B and Mistral 7B) : r ...</td>\n",
"  <td>Feb 2, 2024 ... [llama.cpp] Experimental LLaVA 1.6 Quants (34B and Mistral 7B) ... They were prepared through this hacky script and is likely missing some of the ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>How to use llava with huggingface - Transformers - Hugging Face ...</td>\n",
"  <td>Aug 27, 2023 ... Create a Visual Chatbot on AWS EC2 with LLaVA-1.5 and Runhouse. Get started with multimodal conversational models using the open-source LLaVA- ...</td>\n",
"</tr>\n",
"<tr>\n",
"  <td>Has Anyone Encountered Issues with LLaVA 1.6 Models on Ollama ...</td>\n",
"  <td>Feb 9, 2024 ... Make sure you are running latest version of LLaVA, for me it runs superfast, even tho sometimes server can crash, when the response requires ...</td>\n",
"</tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
98 | },
99 | "execution_count": 5,
100 | "metadata": {},
101 | "output_type": "execute_result"
102 | }
103 | ],
104 | "source": [
105 | "from IPython.display import HTML\n",
106 | "\n",
"rows = \"\\n\".join([\"\"\"<tr>\n",
"                       <td><a href=\"{0}\">{1}</a></td>\n",
"                       <td>{2}</td>\n",
"                     </tr>\"\"\".format(v[\"link\"], v[\"title\"], v[\"snippet\"])\n",
"                  for v in search_results])\n",
"HTML(\"<table>{0}</table>\".format(rows))"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "id": "c58aea72-a4a8-4131-b03a-e86d5d7707c9",
119 | "metadata": {},
120 | "outputs": [],
121 | "source": []
122 | }
123 | ],
124 | "metadata": {
125 | "kernelspec": {
126 | "display_name": "Python 3 (ipykernel)",
127 | "language": "python",
128 | "name": "python3"
129 | },
130 | "language_info": {
131 | "codemirror_mode": {
132 | "name": "ipython",
133 | "version": 3
134 | },
135 | "file_extension": ".py",
136 | "mimetype": "text/x-python",
137 | "name": "python",
138 | "nbconvert_exporter": "python",
139 | "pygments_lexer": "ipython3",
140 | "version": "3.10.13"
141 | }
142 | },
143 | "nbformat": 4,
144 | "nbformat_minor": 5
145 | }
146 |
--------------------------------------------------------------------------------
/web_search/utils.py:
--------------------------------------------------------------------------------
1 | import re
2 | import json
3 | import urllib
4 | import urllib.request
5 | import requests
6 | import html2text
7 | import urllib.request
8 | from IPython.display import HTML
9 | import asyncio
10 | import aiohttp
11 |
12 | """
13 | Enter your search engine API key here
14 | """
15 | def google_search(query):
16 |     try:
17 |         api_key = '********************************'
18 |         cx = '****************************'
19 |         url = 'https://www.googleapis.com/customsearch/v1'
20 |         params = {'key': api_key, 'cx': cx, 'q': query}
21 |         response = requests.get(url, params=params)
22 |         data = response.json()
23 |         ### ["link"], ["title"], ["snippet"]
24 |         return data['items']
25 |     except Exception as e:
26 |         print(f"Google API Search Failed: {e}")
27 |         raise e
28 |
29 | """
30 | Enter your search engine API key here
31 | """
32 | def bing_search(query):
33 |     try:
34 |         subscription_key = "*****************************"
35 |         assert subscription_key
36 |         search_url = "https://api.bing.microsoft.com/v7.0/search"
37 |
38 |         search_term = query
39 |
40 |         headers = {"Ocp-Apim-Subscription-Key": subscription_key}
41 |         params = {"q": search_term, "textDecorations": True, "textFormat": "HTML"}
42 |         response = requests.get(search_url, headers=headers, params=params)
43 |         response.raise_for_status()
44 |         search_results = response.json()
45 |
46 |         ### ["url"], ["name"], ["snippet"]
47 |         return search_results["webPages"]["value"]
48 |     except Exception as e:
49 |         print(f"Bing API Search Failed: {e}")
50 |         raise e
51 |
52 | """
53 | Enter your search engine API key here
54 | """
55 | def serper_search(query, search_type):
56 |     try:
57 |         url = f"https://google.serper.dev/{search_type}"
58 |         payload = json.dumps({
59 |             "q": query,
60 |             "num": 10
61 |         })
62 |         headers = {
63 |             'X-API-KEY': '**********************************',
64 |             'Content-Type': 'application/json'
65 |         }
66 |         response = requests.request("POST", url, headers=headers, data=payload)
67 |         search_results = json.loads(response.text)
68 |
69 |         if search_type == "images":
70 |             ### ["link"], ["title"], ["imageUrl"]
71 |             return search_results["images"]
72 |
73 |         elif search_type == "videos":
74 |             ### ["link"], ["title"], ["snippet"], ["imageUrl"]
75 |             return search_results["videos"]
76 |     except Exception as e:
77 |         print(f"Serper API Search Failed: {e}")
78 |         raise e
79 |
80 | """
81 | Web search
82 | """
83 | def search(search_engine, query, search_type=None):
84 |     if search_engine == "google":
85 |         search_results = google_search(query)
86 |         return search_results
87 |     elif search_engine == "bing":
88 |         search_results = bing_search(query)
89 |         return search_results
90 |     elif search_engine == "serper":
91 |         search_results = serper_search(query, search_type)
92 |         return search_results
93 |     else:
94 |         print("Search engine is not supported.")
95 |
96 | def preprocess_search_results(search_engine, search_results, search_num=5, search_type=None):
97 |     if search_engine == "google":
98 |         url = []
99 |         title = []
100 |         snippet = []
101 |         for item in search_results:
102 |             url.append(item["link"])
103 |             title.append(item["title"])
104 |             snippet.append(item["snippet"])
105 |         if len(search_results)>=search_num:
106 |             return url[:search_num], title[:search_num], snippet[:search_num]
107 |         else:
108 |             return url, title, snippet
109 |
110 |     elif search_engine == "bing":
111 |         url = []
112 |         title = []
113 |         snippet = []
114 |         for item in search_results:
115 |             url.append(item["url"])
116 |             title.append(item["name"])
117 |             snippet.append(item["snippet"])
118 |         if len(search_results)>=search_num:
119 |             return url[:search_num], title[:search_num], snippet[:search_num]
120 |         else:
121 |             return url, title, snippet
122 |
123 |     elif search_engine == "serper":
124 |         url = []
125 |         title = []
126 |         image_url = []
127 |         snippet = []
128 |         if search_type=="images":
129 |             for item in search_results:
130 |                 url.append(item["link"])
131 |                 title.append(item["title"])
132 |                 image_url.append(item["imageUrl"])
133 |             if len(search_results)>=search_num:
134 |                 return url[:search_num], title[:search_num], image_url[:search_num]
135 |             else:
136 |                 return url, title, image_url
137 |
138 |         elif search_type == "videos":
139 |             for item in search_results:
140 |                 url.append(item["link"])
141 |                 title.append(item["title"])
142 |                 image_url.append(item["imageUrl"])
143 |                 snippet.append(item["snippet"])
144 |             if len(search_results)>=search_num:
145 |                 return url[:search_num], title[:search_num], image_url[:search_num], snippet[:search_num]
146 |             else:
147 |                 return url, title, image_url, snippet
148 |     else:
149 |         print("Search engine is not supported.")
150 |
151 | """
152 | Fetch web pages: calling this function returns the scraped page contents
153 | """
154 | async def get_web_contents(urls):
155 |     try:
156 |         async with aiohttp.ClientSession(trust_env = True) as session:
157 |             # tasks = [get_web_content(session, url) for url in urls]
158 |             tasks = [read_pageHtml(session, url) for url in urls]
159 |             web_contents = await asyncio.gather(*tasks, return_exceptions=False)
160 |             return web_contents
161 |     except aiohttp.ClientResponseError as e:
162 |         print(f"get web contents failed: {e}")
163 |         return []
164 |
165 | async def read_pageHtml(session, url):
166 |     async with session.get(url) as response:
167 |         try:
168 |             response.raise_for_status()
169 |             print(response)
170 |             try:
171 |                 # aiohttp decodes with the encoding passed to text(); assigning response.encoding has no effect
172 |                 html = await response.text(encoding='utf-8')
173 |             except UnicodeDecodeError:
174 |                 try:
175 |                     # fall back to GBK for pages that are not valid UTF-8
176 |                     html = await response.text(encoding='gbk')
177 |                 except Exception as e:
178 |                     print(f"An error occurred during decode html: {e}")
179 |
180 |             # HTML(html)
181 |             h = html2text.HTML2Text()
182 |             h.ignore_links = True
183 |             h.ignore_images = True
184 |             page_string = h.handle(html)
185 |             page_string = page_string.replace("\n","")
186 |             return page_string
187 |
188 |         except Exception as e:
189 |             print(f"An error occurred during read web pages: {e}")
190 |             return None
191 |
192 | def read_single_pageHtml(url):
193 |     try:
194 |         file = urllib.request.urlopen(url)
195 |         data = file.read()
196 |         # decode to string
197 |         try:
198 |             decoded_html = data.decode('utf-8')
199 |         except UnicodeDecodeError:
200 |             try:
201 |                 decoded_html = data.decode('gbk')
202 |             except Exception as e:
203 |                 print(f"An error occurred during decode html: {e}")
204 |
205 |         # string to html
206 |         HTML(decoded_html)
207 |         # html to markdown
208 |         h = html2text.HTML2Text()
209 |         h.ignore_links = True
210 |         h.ignore_images = True
211 |         page_string = h.handle(decoded_html)
212 |         page_string = page_string.replace("\n","")
213 |         return page_string
214 |
215 |     except Exception as e:
216 |         print(f"An error occurred during read web pages: {e}")
217 |         return None
218 |
219 | def merge_snippet(str_list):
220 |     numbered_str = "\n".join(f"{i+1}. {s}" for i, s in enumerate(str_list))
221 |     return numbered_str
222 |
223 | """
224 | rerank
225 | """
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |

3 | SODA: Search, Organize, Discovery Anything
4 |
5 |
8 |
9 |
12 |
13 |
14 |
15 | English |
16 | 简体中文
17 |
18 |
19 | 🌟 Welcome to my GitHub project! If you like what you see, don't hesitate to hit that star button! More stars, faster updates, more fun!
20 |
21 | ## 📣 Introduction
22 | With the advent and extensive deployment of Large Language Models (LLMs), these sophisticated systems have showcased immense potential in a variety of application domains. Nevertheless, even highly advanced models such as GPT-4 are not without their limitations; they aren't omniscient and are susceptible to the so-called 'hallucination problem'.
23 |
24 | To address these constraints, we developed **SODA (Search, Organize, Discover Anything), an information integration tool** driven by large language models (LLMs). SODA uses an LLM as its core for processing information and draws data from multiple sources in response to user queries, which lets it give more complete answers. Its web search component fetches relevant information from the internet and combines it with the LLM's own knowledge and other external sources, so that answers are better grounded and their sources can be checked. SODA also lets users upload personal files to build a private local knowledge database. The LLM can then draw on this new information when answering queries, without any pre-training or fine-tuning.
25 |
26 | Overall, SODA is designed to be a **secure**, **dependable**, and **intelligently sourced** tool that helps users handle and interpret information drawn from large models, the web, and their own databases.
27 |
28 | ## 🔭 Architecture
29 | SODA's architecture is shown below:
30 |
31 |

32 |
33 |
34 | SODA currently supports **web search**, **text retrieval (local database)**, and **image retrieval (local database)**.
35 | Text retrieval uses a **two-stage** process: an initial retrieval from the database, followed by reranking of the retrieved candidates.
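
The two-stage idea can be sketched in a few lines. This is only an illustration under stated assumptions, not SODA's implementation: the toy 2-d vectors, the function names `first_stage` and `rerank`, and the token-overlap scorer are all hypothetical, and the overlap scorer simply stands in for whatever reranking model is actually used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def first_stage(query_vec, doc_vecs, k):
    """Stage 1: cheap vector search; return indices of the k most similar documents."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

def rerank(query, docs, candidates):
    """Stage 2: rescore only the candidates with a finer-grained (here: toy) scorer."""
    q_tokens = set(query.lower().split())

    def overlap(i):
        return len(q_tokens & set(docs[i].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)

# Made-up documents and embeddings, for illustration only.
docs = ["llava is a multimodal model", "soda searches the web", "clip encodes images"]
doc_vecs = [[1.0, 0.1], [0.2, 1.0], [0.9, 0.3]]
query, query_vec = "which model encodes images", [1.0, 0.2]

candidates = first_stage(query_vec, doc_vecs, k=2)  # indices kept by the vector search
final = rerank(query, docs, candidates)             # same indices, reordered by stage 2
```

The point of the split is cost: the first stage touches every document but uses a cheap similarity, while the more expensive scorer only sees the few candidates that survive.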
36 |
37 | ## 📢 News
38 | - 🚀 [04/18/2024] We have open-sourced the first version of SODA, and more updates will be coming soon!!!
39 |
40 | ## 💡 Highlights
41 | - 🔥 **New technology framework.** We have developed an LLM-driven information integration tool that provides a technical framework for retrieval-augmented generation (RAG) and for tool use by AI agents.
42 | - 🔥 **Good compatibility.** SODA's components are easy to swap: it can work with various search engines, vector databases, and LLMs.
43 | - 🔥 **Reliable & traceable.** SODA mitigates some of the hallucination issues of LLMs, providing more reliable answers with traceable information sources.
44 | - 🔥 **Data privacy.** SODA supports local databases, allowing the model to acquire new knowledge without pretraining or finetuning, while effectively protecting user data privacy.
45 |
46 | ## 🛠️ Usage
47 |
48 | ### Contents
49 | - [Install](#Install)
50 | - [Web Search Pipeline with Various APIs](#Web-Search-Pipeline-with-Various-APIs)
51 | - [Retrieve Pipeline Based on Local Database](#Retrieve-Pipeline-Based-on-Local-Database)
52 | - [LLMs](#LLMs)
53 |
54 | ### Install
55 | To run SODA locally, clone the repository and set up the environment.
56 | ```bash
57 | mkdir SODA
58 | cd SODA
59 | git clone https://github.com/Liuziyu77/Soda.git
60 | cd Soda && pip install -r requirements.txt
61 | ```
62 | To experiment with individual functions of SODA, open the corresponding directory and run its `.ipynb` notebooks. To run the Gradio web UI locally, use the following commands.
63 | ```bash
64 | cd web_ui
65 | python web_ui.py
66 | ```
67 | Please note that you need to **modify the base_directory** path in `web_ui.py`. Intermediate files generated (such as databases built from local files) will be temporarily stored there. These files will be periodically cleaned up. If needed, please adjust the code accordingly.
68 |
69 | To enable web search and utilize OpenAI's API, please enter the corresponding API keys in the `./web_search/utils.py` and `./mllm/soda_mllm.py` files.
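
If you would rather not store keys in source files, one common alternative is to read them from environment variables. The sketch below is an assumption layered on top of the repository, not something SODA does: the helper name `load_api_key` and the variable name `SODA_DEMO_KEY` are hypothetical.

```python
import os

def load_api_key(env_name):
    """Return the key stored in the environment variable `env_name`.

    Raises RuntimeError if the variable is unset, empty, or still a
    masked placeholder made only of '*' characters.
    """
    key = os.environ.get(env_name, "")
    if not key or set(key) == {"*"}:
        raise RuntimeError(f"Set the {env_name} environment variable to a real API key")
    return key

# Demo only: a real key would be exported in the shell, not set in code.
os.environ["SODA_DEMO_KEY"] = "demo-key-123"
demo_key = load_api_key("SODA_DEMO_KEY")
```

A value loaded this way could then replace the masked string literals in the two files mentioned above.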
70 |
71 | ### 🌐 Web Search Pipeline with Various APIs
72 | The code related to web search is stored in the `web_search` folder. It uses several search engine APIs to retrieve information relevant to the user's input, so different search engines can be selected through the same `search` function.
73 |
74 | #### Using API
75 | SODA supports the Google, Bing, and Serper APIs. You can run `./web_search/Google_API.ipynb`, `./web_search/Serper_API.ipynb`, and `./web_search/Bing_API.ipynb` to test each search engine. You will first need an API key; the links below explain how to get one for each engine.
76 |
77 | * You can get Google APIs: [Google API](https://cloud.google.com/apis/docs/overview)
78 | * You can get Bing APIs: [Bing API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api)
79 | * You can get Serper APIs: [Serper API](https://serper.dev/)
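
Each engine returns results under different field names: in `web_search/utils.py`, Google items use `link`/`title`/`snippet` and Bing items use `url`/`name`/`snippet`. A minimal sketch of mapping both onto one shape is shown below; `FIELD_MAP` and `normalize_results` are hypothetical names, and the sample items are made up rather than real API responses.

```python
# Field names per engine, as noted in the comments of web_search/utils.py.
FIELD_MAP = {
    "google": ("link", "title", "snippet"),
    "bing": ("url", "name", "snippet"),
}

def normalize_results(engine, items, limit=5):
    """Return up to `limit` dicts with unified keys: url, title, snippet."""
    try:
        url_key, title_key, snippet_key = FIELD_MAP[engine]
    except KeyError:
        raise ValueError(f"Unsupported search engine: {engine}")
    return [
        {"url": it[url_key], "title": it[title_key], "snippet": it[snippet_key]}
        for it in items[:limit]
    ]

# Made-up items that only mimic the shape of each engine's results.
sample_google = [{"link": "https://example.com/a", "title": "A", "snippet": "first"}]
sample_bing = [{"url": "https://example.com/b", "name": "B", "snippet": "second"}]

merged = normalize_results("google", sample_google) + normalize_results("bing", sample_bing)
```

Raising on an unknown engine, rather than printing and returning `None`, makes a misconfigured engine name fail at the call site.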
80 |
81 | Additionally, we will offer comprehensive search capabilities beyond **text**, including support for both **image and video searches** soon!
82 |
83 | Here is the [Web Search Example](figures/web_search.mp4).
85 |
86 |
87 |
88 | https://github.com/Liuziyu77/Soda_Dev/assets/137670115/7bc73223-eaa9-44f5-a379-8bf204d4380c
89 |
90 |
91 |
92 |
93 |
94 |
95 | ### 🔎 Retrieve Pipeline Based on Local Database
96 | The code for RAG on a local database is stored in the `RAG` folder. It implements building your own local database and retrieving information from it, and covers `text-text`, `image-image`, and `image-image&text pair` retrieval. You can test these retrieval functions by running the `.ipynb` notebooks; three are provided as examples.
97 |
98 | #### 1. Text-text retrieve
99 | Run `./RAG/text_rag.ipynb` to build a text database and retrieve information from it. All you need to provide is the path to a text file. TXT, DOCX, and PDF formats are currently supported.
100 |
101 | We use **Sentence Transformers** as the text encoder. More encoders will be supported soon!
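
Before embedding, `RAG/utils.py` splits documents into chunks controlled by `chunk_size` and `chunk_overlap` (via LangChain's `RecursiveCharacterTextSplitter`). The sketch below shows what those two parameters mean using a much simpler fixed-window splitter; `split_with_overlap` is a hypothetical helper, not the splitter the repository uses.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so a sentence
    that straddles a boundary still appears whole in at least one chunk
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

With these arguments each chunk starts three characters after the previous one, so neighbouring chunks share exactly one character.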
102 |
103 | Here is the [Text Retrieve Example](figures/text_retrieve.mp4).
105 |
106 |
107 |
108 |
109 |
110 | https://github.com/Liuziyu77/Soda_Dev/assets/137670115/7a2042b9-7c03-44f4-9e36-abb7f244da19
111 |
112 |
113 |
114 |
115 |
116 |
117 |
118 | #### 2. Image-Image retrieve
119 | Run `./RAG/image_rag.ipynb` to build an image database and retrieve information from it. All you need to provide is the path to a folder of images.
120 |
121 | We use **CLIP-B/32** as the image encoder. More visual encoders will be supported soon!
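
The first step of building the image database is collecting the image files in the folder. A minimal sketch of that step follows, modelled on the extension filter in `build_image_database`; the helper name `list_images` and the temporary demo files are assumptions made for illustration.

```python
import os
import tempfile

# Same extensions that build_image_database in RAG/utils.py filters on.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")

def list_images(folder):
    """Return sorted paths of files in `folder` whose extension looks like an image."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )

# Demo with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.JPG", "b.png", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in list_images(folder)]
```

Sorting the paths keeps the assigned IDs stable between runs, since `os.listdir` makes no ordering guarantee.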
122 |
123 | Here is the [Image Retrieve Example](figures/image_retrieve.mp4).
125 |
126 |
127 |
128 |
129 |
130 |
131 | https://github.com/Liuziyu77/Soda_Dev/assets/137670115/761e489c-d572-4070-bb29-bb31d891f661
132 |
133 |
134 |
135 |
136 |
137 |
138 |
139 | #### 3. Image-Image&Text retrieve
140 | Run `./RAG/multimodal_rag.ipynb` to build a multimodal database and retrieve information from it. Here you need to provide a `.tsv` file that includes the `ID`, `PATH`, and `INFO` of each data item. An example TSV file is `./RAG/artwork_data.tsv`.
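
To make the expected layout concrete, here is a small sketch that parses a TSV with the three required columns. The rows are invented for illustration (they are not taken from `artwork_data.tsv`), the exact form of the `PATH` values is an assumption, and `read_records` is a hypothetical helper that uses the standard `csv` module rather than the pandas call in `RAG/utils.py`.

```python
import csv
import io

# Invented sample rows; the real file may format PATH and INFO differently.
sample_tsv = (
    "ID\tPATH\tINFO\n"
    "283\t283.jpg\tA portrait in oil\n"
    "466\t466.jpg\tA landscape at dusk\n"
)

def read_records(handle):
    """Parse a tab-separated file and check that ID, PATH and INFO are present."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"ID", "PATH", "INFO"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"TSV is missing columns: {sorted(missing)}")
    return list(reader)

records = read_records(io.StringIO(sample_tsv))
image_paths = [row["PATH"] for row in records]    # images to embed
descriptions = [row["INFO"] for row in records]   # texts to embed
```

Checking the header up front gives a clear error before any embedding work starts.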
141 |
142 | ### 🐑 LLMs
143 | We use InternLM-XComposer2 (a vision-language large model (VLLM) based on InternLM2-7B) or GPT-4 to process the information from the web and the database and to return answers to users. We will soon support more LLMs as the information processing core of SODA.
144 |
145 | ### ✒️ Citation
146 | ```
147 | @misc{2024SODA,
148 | title={SODA: Search, Organize, Discovery Anything},
149 | author={SODA Team},
150 | howpublished = {\url{https://github.com/Liuziyu77/Soda}},
151 | year={2024}
152 | }
153 | ```
154 |
155 | ## 📜 License
156 |   **Usage and License Notices**: The data and code are intended and licensed for research use only.
157 |
--------------------------------------------------------------------------------
/web_search/Bing_API.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "\"\"\"\n",
10 | "Enter your Bing API key and test web search\n",
11 | "\"\"\""
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": 1,
17 | "metadata": {
18 | "executionInfo": {
19 | "elapsed": 1375,
20 | "status": "ok",
21 | "timestamp": 1712481598835,
22 | "user": {
23 | "displayName": "刘某",
24 | "userId": "17997477929294889377"
25 | },
26 | "user_tz": -480
27 | },
28 | "id": "o56ILroftBdi"
29 | },
30 | "outputs": [],
31 | "source": [
32 | "subscription_key = \"********************************\"\n",
33 | "assert subscription_key"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 5,
39 | "metadata": {
40 | "executionInfo": {
41 | "elapsed": 315,
42 | "status": "ok",
43 | "timestamp": 1712481618047,
44 | "user": {
45 | "displayName": "刘某",
46 | "userId": "17997477929294889377"
47 | },
48 | "user_tz": -480
49 | },
50 | "id": "pf8pIZzVulyN"
51 | },
52 | "outputs": [],
53 | "source": [
54 | "search_url = \"https://api.bing.microsoft.com/v7.0/search\"\n",
55 | "search_term = \"LLaVa\""
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": 6,
61 | "metadata": {
62 | "executionInfo": {
63 | "elapsed": 524,
64 | "status": "ok",
65 | "timestamp": 1712481620257,
66 | "user": {
67 | "displayName": "刘某",
68 | "userId": "17997477929294889377"
69 | },
70 | "user_tz": -480
71 | },
72 | "id": "-3Cmm9Gf23fr"
73 | },
74 | "outputs": [],
75 | "source": [
76 | "import requests\n",
77 | "\n",
78 | "headers = {\"Ocp-Apim-Subscription-Key\": subscription_key}\n",
79 | "params = {\"q\": search_term, \"textDecorations\": True, \"textFormat\": \"HTML\"}\n",
80 | "response = requests.get(search_url, headers=headers, params=params)\n",
81 | "response.raise_for_status()\n",
82 | "search_results = response.json()"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 11,
88 | "metadata": {
89 | "colab": {
90 | "base_uri": "https://localhost:8080/",
91 | "height": 436
92 | },
93 | "executionInfo": {
94 | "elapsed": 316,
95 | "status": "ok",
96 | "timestamp": 1712481635040,
97 | "user": {
98 | "displayName": "刘某",
99 | "userId": "17997477929294889377"
100 | },
101 | "user_tz": -480
102 | },
103 | "id": "DcXJq5lf3AKc",
104 | "outputId": "6e8c8bdc-c371-4139-b6c7-18a01a5c5b5a"
105 | },
106 | "outputs": [
107 | {
108 | "data": {
109 | "text/html": [
110 | "\n",
111 | " | haotian-liu/LLaVA: [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. - GitHub | \n",
112 | " [10/26] 🔥 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts, script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA. [10/12] Check out the Korean] | \n",
113 | "
\n",
114 | "\n",
115 | " | LLaVA | \n",
116 | " LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA. | \n",
117 | "
\n",
118 | "\n",
119 | " | LLaVA | \n",
120 | " Image. Drop Image Here - or - Click to Upload. Examples. What is unusual about this image? What are the things I should be cautious about when I visit here? Parameters . LLaVA Chatbot. 👍 Upvote. | \n",
121 | "
\n",
122 | "\n",
123 | " | LLaVA(Large Language and Vision Assistant)大模型 - 知乎 | \n",
124 | " LLaVA(Large Language and Vision Assistant)是一个由威斯康星大学麦迪逊分校、微软研究院和哥伦比亚大学研究者共同发布的多模态大模型。. 该模型展示出了一些接近多模态 GPT-4 的图文理解能力:相对于 GPT-4 获得了 85.1% 的相对得分。. 当在科学问答(Science QA)上进行 ... | \n",
125 | "
\n",
126 | "\n",
127 | " | LLaVA-1.6: Improved reasoning, OCR, and world knowledge | \n",
128 | " Today, we are thrilled to present LLaVA-1.6, with improved reasoning, OCR, and world knowledge. LLaVA-1.6 even exceeds Gemini Pro on several benchmarks. Compared with LLaVA-1.5, LLaVA-1.6 has several improvements: Increasing the input image to 4x | \n",
129 | "
\n",
130 | "\n",
131 | " | LLaVA: Large Language and Vision Assistant - Microsoft Research | \n",
132 | " LLaVA is an open-source project, collaborating with research community to advance the state-of-the-art in AI. LLaVA represents the first end-to-end trained large multimodal model (LMM) that achieves impressive chat capabilities mimicking spirits of the multimodal GPT-4. | \n",
133 | "
\n",
134 | "\n",
135 | " | LLaVA-Interactive | \n",
136 | " LLaVA-Interactive is a system-level synergy of the inference stages of three models, without additional model training. It is surprisingly cheap to build. Checkout our code release on GitHub. For better demo experience, please play LLaVA-Interactive in a seperate tab by clicking me. | \n",
137 | "
\n",
138 | "\n",
139 | " | [2304.08485] Visual Instruction Tuning - arXiv.org | \n",
140 | " By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding.Our early | \n",
141 | "
\n",
142 | "\n",
143 | " | Releases · haotian-liu/LLaVA · GitHub | \n",
144 | " Release v1.1.0. 🔥 LLaVA-1.5 is out! This release supports LLaVA-1.5 model inference and serving. We will release the training scripts, data, and evaluation scripts on benchmarks in the coming week. | \n",
145 | "
\n",
146 | "\n",
147 | " | LLaVA/README.md at main · haotian-liu/LLaVA · GitHub | \n",
148 | " [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. - LLaVA/README.md at main · haotian-liu/LLaVA | \n",
149 | "
"
150 | ],
151 | "text/plain": [
152 | ""
153 | ]
154 | },
155 | "execution_count": 11,
156 | "metadata": {},
157 | "output_type": "execute_result"
158 | }
159 | ],
160 | "source": [
161 | "from IPython.display import HTML\n",
162 | "\n",
163 | "rows = \"\\n\".join([\"\"\"<tr>\n",
164 | "  <td><a href='{0}'>{1}</a></td>\n",
165 | "  <td>{2}</td>\n",
166 | "</tr>\"\"\".format(v[\"url\"], v[\"name\"], v[\"snippet\"])\n",
167 | " for v in search_results[\"webPages\"][\"value\"]])\n",
168 | "HTML(\"<table>{0}</table>\".format(rows))"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": null,
174 | "metadata": {},
175 | "outputs": [],
176 | "source": []
177 | }
178 | ],
179 | "metadata": {
180 | "colab": {
181 | "authorship_tag": "ABX9TyNg5Jz6VfBt+1bKMOC2VI7z",
182 | "provenance": []
183 | },
184 | "kernelspec": {
185 | "display_name": "Python 3 (ipykernel)",
186 | "language": "python",
187 | "name": "python3"
188 | },
189 | "language_info": {
190 | "codemirror_mode": {
191 | "name": "ipython",
192 | "version": 3
193 | },
194 | "file_extension": ".py",
195 | "mimetype": "text/x-python",
196 | "name": "python",
197 | "nbconvert_exporter": "python",
198 | "pygments_lexer": "ipython3",
199 | "version": "3.10.13"
200 | }
201 | },
202 | "nbformat": 4,
203 | "nbformat_minor": 4
204 | }
205 |
--------------------------------------------------------------------------------
/RAG/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import glob
3 | import shutil
4 | import logging
5 | import chromadb
6 | import numpy as np
7 | import pandas as pd
8 | import sentence_transformers
9 | from chromadb.utils import embedding_functions
10 | from chromadb.utils.data_loaders import ImageLoader
11 | from langchain.document_loaders import PyMuPDFLoader
12 | from langchain.embeddings import HuggingFaceEmbeddings
13 | from langchain_community.document_loaders import PyPDFLoader
14 | from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
15 | from langchain.text_splitter import RecursiveCharacterTextSplitter, TokenTextSplitter
16 |
17 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
18 | logging.info("initial")
19 |
20 | """
21 | Load Files Function
22 | """
23 | ### return file format
24 | def identify_file_type(file_path):
25 |     _, file_extension = os.path.splitext(file_path)
26 |     file_extension = file_extension.lower()
27 |
28 |     if file_extension == '.pdf':
29 |         return 'PDF'
30 |     elif file_extension == '.txt':
31 |         return 'Text'
32 |     elif file_extension == '.tsv':
33 |         return 'TSV'
34 |     elif file_extension == '.csv':
35 |         return 'CSV'
36 |     else:
37 |         return None
38 |
39 | ### load PDF,TXT,DOCX
40 | def documents_load_local(doc_path):
41 |     try:
42 |         docs = []
43 |         docs.extend(PyMuPDFLoader(doc_path).load())
44 |         return docs
45 |     except Exception as e:
46 |         print(f"Document loading failed: {e}")
47 |         raise e
48 |
49 | ### load PDF
50 | def documents_load_pdf(pdf_path):
51 |     loader = PyPDFLoader(pdf_path)
52 |     docs = loader.load_and_split()
53 |     return docs
54 |
55 | ### load CSV (read as tab-separated, without a header row)
56 | def documents_load_csv(path):
57 |     docs = pd.read_csv(path, sep='\t', header=None)
58 |     return docs
59 |
60 | """
61 | Preprocess loaded text files
62 | """
63 | ### split documents for different chunk_size
64 | def documents_split(docs,chunk_size, chunk_overlap):
65 |     try:
66 |         text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
67 |         # text_splitter = TokenTextSplitter(encoding_name=encoder, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
68 |         docs_split = text_splitter.split_documents(docs)
69 |         return docs_split
70 |     except Exception as e:
71 |         print(f"Document splitting failed: {e}")
72 |         raise e
73 |
74 | def documents_to_str(docs):
75 |     docs_str = []
76 |     for doc in docs:
77 |         docs_str.append(doc.dict()["page_content"])
78 |     return docs_str
79 |
80 |
81 | def preprocess_files(file_path, chunk_size, chunk_overlap):
82 |     docs = documents_load_local(file_path)
83 |     docs_split = documents_split(docs, chunk_size, chunk_overlap)
84 |     docs_split[0].dict()["page_content"]
85 |     docs_split_str = documents_to_str(docs_split)
86 |     return docs_split_str
87 |
88 | """
89 | Build text database
90 | """
91 | def build_text_database(docs, batch_size, encoder="intfloat/multilingual-e5-base", database_name = "test", database_path = "./database/test"):
92 |     if os.path.exists(database_path):
93 |         print(f"Database '{database_name}' already exists. Choose a different name.")
94 |         return None
95 |     else:
96 |         ### text encoder
97 |         embeddings = HuggingFaceEmbeddings(model_name=encoder, encode_kwargs={'normalize_embeddings': True})
98 |         ### client
99 |         client = chromadb.PersistentClient(path=database_path)
100 |         ### text encoder function
101 |         text_emb_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=encoder, normalize_embeddings = True)
102 |
103 |         ### create collection
104 |         txt_collection = client.create_collection(name=database_name, embedding_function=text_emb_fn, metadata={"hnsw:space": "cosine"})
105 |
106 |         ### load collection
107 |         txt_collection = client.get_collection(name=database_name, embedding_function=text_emb_fn)
108 |
109 |         for start_idx in range(0, len(docs), batch_size):
110 |             end_idx = min(start_idx + batch_size, len(docs))
111 |             batch_docs = docs[start_idx:end_idx]
112 |             ids = [str(num) for num in range(start_idx, end_idx)]
113 |             # embeddings
114 |             logging.info("Begin embedding docs.")
115 |             embed_split = embeddings.embed_documents(batch_docs)
116 |             logging.info("End embedding docs.")
117 |
118 |             # add to database
119 |             metadatas = [{"ID": num} for num in ids]
120 |             logging.info("Begin adding documents.")
121 |             txt_collection.add(
122 |                 documents=batch_docs,
123 |                 embeddings=embed_split,
124 |                 ids=ids,
125 |             )
126 |             logging.info(f"Batch {start_idx // batch_size + 1} done!")
127 |         logging.info("All done!")
128 |
129 |         return txt_collection
130 |
131 |
132 | ### delete database
133 | def delete_folder(folder_path):
134 |     # check that the folder exists
135 |     if os.path.exists(folder_path):
136 |         # delete the folder and everything in it
137 |         shutil.rmtree(folder_path)
138 |         print(f"Folder '{folder_path}' has been deleted.")
139 |     else:
140 |         print(f"Folder '{folder_path}' does not exist.")
141 |
142 |
143 | """
144 | Read a folder of images
145 | """
146 | def read_image_files(folder_path):
147 |     # supported image formats
148 |     image_formats = ["*.jpg", "*.jpeg", "*.png", "*.gif", "*.bmp", "*.tiff"]
149 |     image_paths = []
150 |     for format in image_formats:
151 |         image_paths.extend(glob.glob(os.path.join(folder_path, format)))
152 |
153 |     return image_paths
154 |
155 |
156 |
157 | """
158 | Build image database
159 | """
160 | def build_image_database(fold_path, database_name = "test", database_path = "./database/test"):
161 |
162 |     ### client
163 |     client = chromadb.PersistentClient(path=database_path)
164 |     ### image encoder function
165 |     img_emb_fn = OpenCLIPEmbeddingFunction()
166 |     ### data loader
167 |     data_loader = ImageLoader()
168 |     ### create collection
169 |     img_collection = client.create_collection(name=database_name, embedding_function=img_emb_fn, metadata={"hnsw:space": "cosine"})
170 |     ### load collection
171 |     img_collection = client.get_collection(name=database_name, embedding_function=img_emb_fn, data_loader=data_loader)
172 |
173 |     image_paths = [os.path.join(fold_path, file) for file in os.listdir(fold_path) if file.lower().endswith(('.jpg', '.jpeg', '.png', '.gif', '.bmp'))]
174 |     print("total image number is: "+str(len(image_paths)))
175 |     ids = list(range(0, len(image_paths)))
176 |     ids = [str(num) for num in ids]
177 |     image_names = [os.path.splitext(os.path.basename(url))[0] for url in image_paths]
178 |     metadatas = [{"ID":metadata} for metadata in image_names]
179 |     logging.info("Begin building database")
180 |     img_collection.add(
181 |         ids=ids,
182 |         uris = image_paths,
183 |         metadatas=metadatas,
184 |     )
185 |     logging.info("Done!")
186 |
187 |     return img_collection
188 |
189 |
190 | """
191 | Build multimodal database
192 | """
193 | def build_multimodal_database(tsv_path, image_folder, encoder="intfloat/multilingual-e5-base", database_name="test"):
194 |     ### text encoder
195 |     embeddings = HuggingFaceEmbeddings(model_name=encoder, encode_kwargs={'normalize_embeddings': True})
196 |     ### client
197 |     client = chromadb.PersistentClient(path=f"./database/{database_name}")
198 |     ### text/image embedding functions (the text one honors the `encoder` argument)
199 |     text_emb_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=encoder)
200 |     img_emb_fn = OpenCLIPEmbeddingFunction()
201 |     ### data loader
202 |     data_loader = ImageLoader()
203 |
204 |
205 |     txt_collection = client.create_collection(name=f"{database_name}_text", embedding_function=text_emb_fn, metadata={"hnsw:space": "cosine"})
206 |     img_collection = client.create_collection(name=f"{database_name}_image", embedding_function=img_emb_fn, metadata={"hnsw:space": "cosine"})
207 |
208 |     ### load collection
209 |     txt_collection = client.get_collection(name=f"{database_name}_text", embedding_function=text_emb_fn)
210 |     img_collection = client.get_collection(name=f"{database_name}_image", embedding_function=img_emb_fn, data_loader=data_loader)
211 |
212 |
213 |     ### add data
214 |     ### embed_text [[float],[float],[float],...]; docs_split [str,str,str,...]
215 |     df = pd.read_csv(tsv_path, sep='\t')
216 |
217 |     docs_split = df["INFO"]
218 |     docs_split = [s for s in docs_split]
219 |
220 |     metadatas = df["PATH"]
221 |     metadatas = [s for s in metadatas]
222 |     image_paths = metadatas.copy()
223 |     metadatas = [{"ID":metadata} for metadata in metadatas]
224 |
225 |     ids = list(range(0, len(docs_split)))
226 |     ids = [str(num) for num in ids]
227 |
228 |     logging.info("Number of records: "+str(len(ids)))
229 |
230 |
231 |     logging.info("Begin adding text.")
232 |     txt_collection.add(
233 |         documents = docs_split,
234 |         ids = ids,
235 |         metadatas = metadatas
236 |     )
237 |     logging.info("Finished adding text.")
238 |
239 |     image_paths = [image_folder+image_path for image_path in image_paths]
240 |     logging.info("Begin adding images.")
241 |     img_collection.add(
242 |         ids=ids,
243 |         uris = image_paths,
244 |         metadatas=metadatas
245 |     )
246 |     logging.info("Finished adding images.")
247 |     logging.info("ALL DONE!")
248 |
249 |     return txt_collection, img_collection
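
The data-preparation step in `build_multimodal_database` turns the `INFO` and `PATH` columns of the TSV file into four parallel lists (ids, documents, metadatas, image URIs). The sketch below illustrates that step using only the standard library. `prepare_records` and the sample rows are hypothetical and not part of the repository, and `os.path.join` is swapped in for the plain string concatenation used above so that `image_folder` works with or without a trailing slash.

```python
import csv
import io
import os


def prepare_records(tsv_text, image_folder):
    """Build the parallel lists that a collection's add() call expects.

    Hypothetical helper: it mirrors the INFO/PATH columns used above but
    relies only on the standard library instead of pandas.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    documents = [row["INFO"] for row in rows]
    metadatas = [{"ID": row["PATH"]} for row in rows]
    # os.path.join tolerates an image_folder with or without a trailing slash.
    uris = [os.path.join(image_folder, row["PATH"]) for row in rows]
    # String ids "0", "1", ... keep text and image entries aligned by position.
    ids = [str(i) for i in range(len(rows))]
    return ids, documents, metadatas, uris


# Made-up sample rows, for illustration only.
sample = "INFO\tPATH\nA portrait in oil\t1072.jpg\nA marble statue\t2436.jpg\n"
ids, documents, metadatas, uris = prepare_records(sample, "artwork_img")
print(ids, uris)
```

Because the text and image collections receive the same `ids` and `metadatas`, a hit in either collection can be mapped back to the same record.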
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [Ziyu Liu] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/web_ui/web_ui.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append("../")
3 | import torch
4 | import gradio as gr
5 | from mllm.soda_mllm import mllm_openai, mllm_IXC2
6 | from web_search.utils import search, preprocess_search_results, merge_snippet
7 | from web_search.utils import bing_search
8 | import os
9 | import uuid
10 | os.environ["no_proxy"] = "127.0.0.1,localhost"
11 | os.environ["GRADIO_TEMP_DIR"] = "./database"
12 | import shutil
13 | import time
14 | import threading
15 | import random
16 | from RAG.utils import preprocess_files, build_text_database
17 | from RAG.utils import build_image_database
18 | from service.utils import rag_database
19 | from sentence_transformers import CrossEncoder
20 | from transformers import AutoModel, AutoTokenizer
21 | from transformers import (AutoModelForCausalLM, AutoTokenizer,
22 | StoppingCriteria, StoppingCriteriaList)
23 | from transformers.generation import GenerationConfig
24 | from peft import AutoPeftModelForCausalLM
25 |
26 | torch.manual_seed(1234)
27 |
28 | ### global variables
29 | user_directory = None
30 | image_user_directory = None
31 | txt_collection = None
32 | img_collection = None
33 |
34 | ### init model
35 | ## cross-encoder
36 | cross_encoder_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2', max_length=512)
37 | ## internlm-xcomposer
38 | torch.set_grad_enabled(False)
39 | IXC2_model = AutoModelForCausalLM.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True).cuda().eval()
40 | IXC2_tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True)
41 |
42 | """
43 | set your database path here
44 | """
45 | base_directory = "/mnt/petrelfs/liuziyu/V3Det/Soda_Dev/web_ui/database/"
46 | # maximum age (in seconds) of a session directory before it is deleted
47 | max_file_age = 3600 # 1 hour
48 |
49 | custom_css = """
50 | body { font-family: 'Arial'; background-color: #f4f4f4; color: #333; }
51 | .header { background-color: #5D1049; padding: 20px; color: #fff; text-align: center; }
52 | .header h1 { margin: 0; }
53 | .button, .tab { background-color: #EAC435; color: #5D1049; border: none; border-radius: 5px; }
54 | .button:hover, .tab:hover { background-color: #F3D70B; }
55 | input, textarea, .textbox { border-radius: 5px; }
56 | img { border: none; display: block; margin: 0 auto; }
57 | .intro p { color: #333; font-size: 18px; text-align: center; }
58 | #
59 | """
60 | # h1 {
61 | # color: black;
62 | # font-size: 36px;
63 | # text-align: center;
64 | # }
65 | # p {
66 | # color: #555;
67 | # font-size: 24px;
68 | # text-align: center;
69 | # }
70 |
71 | def clear_old_files():
72 |     while True:
73 |         now = time.time()
74 |         for filename in os.listdir(base_directory):
75 |             file_path = os.path.join(base_directory, filename)
76 |             if os.path.isdir(file_path):
77 |                 creation_time = os.path.getctime(file_path)
78 |                 if now - creation_time > max_file_age:
79 |                     shutil.rmtree(file_path)
80 |         time.sleep(3600) # check once per hour
81 | # start the background cleanup of old session data
82 | threading.Thread(target=clear_old_files, daemon=True).start()
83 |
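
A single cleanup pass like the one in `clear_old_files` can be factored out so that it is testable without the endless loop. The sketch below is one way to do that under stated assumptions: `remove_stale_dirs` is a hypothetical name, and it measures age with the modification time (`os.path.getmtime`) rather than `os.path.getctime` as the code above does, because the modification time can be set explicitly in a demonstration.

```python
import os
import shutil
import tempfile
import time


def remove_stale_dirs(base_directory, max_age_seconds, now=None):
    """Delete sub-directories of base_directory older than max_age_seconds.

    Returns the sorted names of the removed directories. Age is based on the
    modification time, which is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(base_directory):
        path = os.path.join(base_directory, name)
        if os.path.isdir(path) and now - os.path.getmtime(path) > max_age_seconds:
            shutil.rmtree(path, ignore_errors=True)
            removed.append(name)
    return sorted(removed)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "old_session"))
    os.mkdir(os.path.join(base, "new_session"))
    two_hours_ago = time.time() - 7200
    # Backdate one directory so it exceeds the one-hour limit.
    os.utime(os.path.join(base, "old_session"), (two_hours_ago, two_hours_ago))
    removed = remove_stale_dirs(base, max_age_seconds=3600)
    remaining = sorted(os.listdir(base))
```

A background thread could then call such a function once per interval, which keeps the deletion rule separate from the scheduling.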
84 |
85 | def web_search_gradio(query, search_engine, mllm, search_num):
86 |     results = search(search_engine, query)
87 |     print(search_engine)
88 |     urls, title, snippet = preprocess_search_results(search_engine, results, search_num)
89 |
90 |     ### rerank
91 |     input_cross_encoder = [(query, s) for s in snippet]
92 |     rerank_scores = cross_encoder_model.predict(input_cross_encoder)
93 |     sorted_str_list = [x for _, x in sorted(zip(rerank_scores, snippet), reverse=True)]
94 |
95 |     web_contents_combined = merge_snippet(sorted_str_list)
96 |
97 |     if mllm == "GPT4-V":
98 |         answer = mllm_openai(query, web_contents_combined)
99 |     elif mllm =="InternLM-Xcomposer2":
100 |         answer = mllm_IXC2(IXC2_model, IXC2_tokenizer, query, web_contents_combined)
101 |     else:
102 |         answer = f"Unsupported model: {mllm}"
103 |     # format each result as a Markdown link
104 |     links = "\n".join(f"[{t}]({u})" for t, u in zip(title, urls))
105 |     return links, answer
106 |
107 | def process_uploaded_text_file(file_upload):
108 |     global user_directory
109 |     global txt_collection
110 |
111 |     try:
112 |         session_id = f"{int(time.time())}_{random.randint(0, 1000)}"
113 |         user_directory = os.path.join(base_directory, session_id)
114 |         os.makedirs(user_directory, exist_ok=True)
115 |         target_path = os.path.join(user_directory, os.path.basename(file_upload.name))
116 |         shutil.move(file_upload.name, target_path)
117 |
118 |         docs = preprocess_files(file_path = target_path,chunk_size=500, chunk_overlap=50)
119 |         txt_collection = build_text_database(docs, batch_size=40000, encoder="intfloat/multilingual-e5-base", database_name = session_id, database_path = user_directory+"_database")
120 |         return "Database has been built."
121 |     except Exception as e:
122 |         print("Error:", e)
123 |         return "Database built error"
124 |
125 | def text_rag_gradio(query, n_results, mllm):
126 |     global txt_collection
127 |     try:
128 |         ans = txt_collection.query(
129 |             query_texts=[query],
130 |             n_results=n_results
131 |         )
132 |         ans = ans["documents"][0]
133 |
134 |         ### rerank
135 |         input_cross_encoder = [(query, s) for s in ans]
136 |         rerank_scores = cross_encoder_model.predict(input_cross_encoder)
137 |         sorted_str_list = [x for _, x in sorted(zip(rerank_scores, ans), reverse=True)]
138 |
139 |         ans_combined = merge_snippet(sorted_str_list)
140 |         if mllm == "GPT4-V":
141 |             answer = mllm_openai(query, ans_combined)
142 |         elif mllm =="InternLM-Xcomposer2":
143 |             answer = mllm_IXC2(IXC2_model, IXC2_tokenizer, query, ans_combined)
144 |         return ans_combined,answer
145 |     except Exception as e:
146 |         print(f"RAG database error: {e}")
147 |         return f"{e}", "" # one value per Gradio output component
148 |
149 |
150 | def process_uploaded_image_fold_file(images_upload):
151 |     global img_collection
152 |     global image_user_directory
153 |     file_paths = [file.name for file in images_upload]
154 |     session_id = f"{int(time.time())}_{random.randint(0, 1000)}"
155 |     image_user_directory = os.path.join(base_directory, session_id)
156 |     os.makedirs(image_user_directory, exist_ok=True)
157 |
158 |     for file_path in file_paths:
159 |         target_path = os.path.join(image_user_directory, os.path.basename(file_path))
160 |         shutil.move(file_path, target_path)
161 |
162 |     img_collection = build_image_database(fold_path = image_user_directory, database_name = session_id, database_path = image_user_directory+"_database")
163 |
164 |     return "Database has been built."
165 |
166 | def image_rag_gradio(image_upload):
167 |     test_image_path = image_upload.name
168 |     n_pictures = 4
169 |     ans = img_collection.query(
170 |         query_uris=[test_image_path], # A list of strings representing URIs to data
171 |         n_results=n_pictures
172 |     )
173 |     # Show at most three hits; slicing avoids an IndexError when fewer are returned.
174 |     retrieved_image_path = [
175 |         image_user_directory+"/"+meta["ID"]+'.jpg' for meta in ans["metadatas"][0][:3]
176 |     ]
177 |     return [test_image_path],retrieved_image_path
178 |
179 |
180 | """
181 | Gradio main function
182 | """
183 | def main():
184 | # SimplyRetrieve App
185 | with gr.Blocks(title="SODA Agent",css=custom_css) as app:
186 | gr.Markdown("""
187 |
191 | """)
192 | with gr.Row():
193 | gr.Markdown("")
194 | gr.Image(value="./soda_title.png", show_download_button=False,
195 | container=False)
196 | gr.Markdown("")
197 |
198 | gr.Markdown("""
199 |
200 |
201 |
202 | 🚀🚀🚀Welcome to SODA: Search, Organize, Discover Anything. This multi-functional tool helps you search the web, process text and images, and leverage large language models to derive insights. 🚀🚀🚀
203 |
204 |
205 |
206 | Choose a tab to begin using specific functionalities:
207 |
208 | 🌐 **Web Search**: Search the internet using your preferred search engine.
209 |
210 | 🔎 **Text Retrieve**: Upload and process text files, and ask questions based on their content.
211 |
212 | 🌅 **Image Retrieve**: Manage and retrieve images based on uploaded content.
213 |
214 |
215 | Link to our GitHub: [SODA](https://github.com/Liuziyu77/Soda)
216 | """)
217 |
218 | with gr.Tab("Web Search"):
219 | with gr.Row():
220 | # Input section - where users can type their query
221 | web_search_query = gr.Textbox(label="Enter your query here", placeholder="Type your query...", lines=2)
222 | search_engine_dropdown = gr.Dropdown(label="Select Search Engine", choices=["google", "bing"],
223 | value="bing")
224 | web_mllm_dropdown = gr.Dropdown(label="Select mllms", choices=["GPT4-V", "InternLM-Xcomposer2"],
225 | value="GPT4-V")
226 | search_num_slider = gr.Slider(label="Number of results", minimum=0, maximum=10, step=1, value=10)
227 | web_search_button = gr.Button("Search")
228 |
229 | with gr.Row():
230 | # Output section - divided into two parts: search results and LLM response
231 | with gr.Column(scale=1):
232 | web_search_results = gr.Textbox(label="Web Search Results", placeholder="Search results from web will appear here...", lines=10, interactive=False)
233 | with gr.Column(scale=1):
234 | web_search_answer = gr.Textbox(label="LLM Response", placeholder="LLM's response will appear here...", lines=10, interactive=False)
235 |
236 | with gr.Tab("Text Retrieve"):
237 | with gr.Row():
238 | with gr.Column(scale=1):
239 | text_file_upload = gr.components.File(label="Upload a file or folder")
240 | text_file_upload_status = gr.Textbox(label="Database status:", placeholder="No database...")
241 | process_text_file_button = gr.Button("Process File")
242 | with gr.Row():
243 | text_mllm_dropdown = gr.Dropdown(label="Select mllms", choices=["GPT4-V", "InternLM-Xcomposer2"],
244 | value="GPT4-V")
245 | text_rag_n_results_slider = gr.Slider(label="Number of Retrieved Texts", minimum=1, maximum=10,step=1, value=3)
246 | with gr.Column(scale=1):
247 | text_question_input = gr.Textbox(label="Enter your question", placeholder="Type your question here...", lines=2)
248 | text_rag_button = gr.Button("Submit Question")
249 | with gr.Row():
250 | # Display section for search results and answer
251 | with gr.Column(scale=1):
252 | text_rag_results = gr.Textbox(label="Local Database Search Results", placeholder="Search results from local database will appear here...", lines=10, interactive=False)
253 | with gr.Column(scale=1):
254 | text_rag_answer = gr.Textbox(label="LLM Response", placeholder="LLM's response will appear here....", lines=10, interactive=False)
255 |
256 | with gr.Tab("Image Retrieve"):
257 | with gr.Row():
258 | with gr.Column(scale=1):
259 | image_file_input = gr.components.File(label="Upload a file or folder", file_count="multiple")
260 | image_file_input_status = gr.Textbox(label="Database status:", placeholder="No database...")
261 | process_image_file_button = gr.Button("Process Images")
262 | with gr.Column(scale=1):
263 | image_question_upload = gr.components.File(label="Upload a single image for retrieval")
264 | image_rag_button = gr.Button("Retrieve Images")
265 | with gr.Row():
266 | with gr.Column(scale=1):
267 | origin_images = gr.Gallery(label="Original Images", show_label=True, columns=[3], rows=[1], object_fit="contain", height="auto")
268 | with gr.Column(scale=1):
269 | retrieved_images = gr.Gallery(label="Retrieved Images", show_label=True, columns=[3], rows=[1], object_fit="contain", height="auto")
270 |
271 | # Web Search Event
272 | web_search_button.click(
273 | fn=web_search_gradio,
274 | inputs=[web_search_query, search_engine_dropdown, web_mllm_dropdown, search_num_slider],
275 | outputs=[web_search_results, web_search_answer]
276 | )
277 | # Text RAG Event
278 | process_text_file_button.click(process_uploaded_text_file, inputs = text_file_upload, outputs = text_file_upload_status)
279 | text_rag_button.click(
280 | fn=text_rag_gradio,
281 | inputs=[text_question_input, text_rag_n_results_slider, text_mllm_dropdown],
282 | outputs=[text_rag_results, text_rag_answer]
283 | )
284 | # Image RAG Event
285 | process_image_file_button.click(process_uploaded_image_fold_file, inputs = image_file_input, outputs = image_file_input_status)
286 | image_rag_button.click(image_rag_gradio, inputs=image_question_upload, outputs=[origin_images,retrieved_images])
287 |
288 | # App Main Settings
289 | app.queue(max_size=100)
290 | app.launch(share=True, server_name="0.0.0.0", server_port=10078)
291 |
292 | if __name__ == "__main__":
293 | main()
294 |
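
Both `web_search_gradio` and `text_rag_gradio` rerank retrieved passages by scoring `(query, passage)` pairs and sorting on the score. The sketch below isolates that step so it can be run without downloading a model. `rerank` and `overlap_score` are hypothetical names: the toy word-overlap scorer only stands in for `cross_encoder_model.predict` and is not the scoring used by the application. Sorting with an explicit `key` is a small design choice of this sketch, so that tied scores keep their retrieval order instead of falling back to comparing the passage strings.

```python
def rerank(query, passages, score_fn):
    """Return passages ordered from most to least relevant.

    score_fn is a stand-in for a cross-encoder's predict(): it takes a list
    of (query, passage) pairs and returns one score per pair.
    """
    scores = score_fn([(query, passage) for passage in passages])
    # Sort on the score only; Python's sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in ranked]


def overlap_score(pairs):
    """Toy scorer: number of query words that also occur in the passage."""
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]


passages = ["Lions live in prides.", "Paris is in France.", "Male lions have manes."]
ordered = rerank("where do lions live", passages, overlap_score)
print(ordered)
```

With a real cross-encoder, `score_fn` would simply be the model's `predict` method, and the rest of the function would stay unchanged.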
--------------------------------------------------------------------------------
/web_search/Serper_API.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 30,
6 | "id": "3e7cb826-e7da-4bc1-8024-73bce8cbb410",
7 | "metadata": {},
8 | "outputs": [
9 | {
10 | "name": "stdout",
11 | "output_type": "stream",
12 | "text": [
13 | "{\"searchParameters\":{\"q\":\"lions\",\"type\":\"videos\",\"num\":10,\"engine\":\"google\"},\"videos\":[{\"title\":\"Jake Bates DOES IT AGAIN....The Detroit Lions ... - YouTube\",\"link\":\"https://www.youtube.com/watch?v=kLX17H48ios\",\"snippet\":\"... Lions resonates with every fan, \\\"Lions Syndicate\\\" brings you another electrifying update that you simply can't miss. \\\"Jake Bates DOES ...\",\"imageUrl\":\"https://i.ytimg.com/vi/kLX17H48ios/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3mQ8U8YW-sY56CrG1KlKj-m2v9-iw\",\"duration\":\"6:51\",\"source\":\"YouTube\",\"channel\":\"Lions Syndicate\",\"date\":\"10 hours ago\",\"position\":1},{\"title\":\"Detroit Lions News And A Mock Draft - YouTube\",\"link\":\"https://www.youtube.com/watch?v=KvxOKLOkOAQ\",\"snippet\":\"Detroit Lions News And A Mock Draft In this episode of the Detroit Lions Podcast, Chris and Jeff are finally back! Its been a bit since the ...\",\"imageUrl\":\"https://i.ytimg.com/vi/KvxOKLOkOAQ/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3ll1UxiFsfNUmOnZF2k7xnLT3O6bA\",\"duration\":\"1:33:36\",\"source\":\"YouTube\",\"channel\":\"Detroit Lions Podcast\",\"date\":\"1 day ago\",\"position\":2},{\"title\":\"Diving into Detroit Lions 7-Round Mock Draft from Erik Schlitt\",\"link\":\"https://www.youtube.com/watch?v=JpaR3ACFNeE\",\"snippet\":\"Pride of Detroit's very own Erik Schlitt conducted a 7-round mock draft for the Detroit Lions. Tune in to see if Meko and Morgan agree with ...\",\"imageUrl\":\"https://i.ytimg.com/vi/JpaR3ACFNeE/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nTOOIw-4vS-OPilwrELuGG3mJoAg\",\"duration\":\"31:10\",\"source\":\"YouTube\",\"channel\":\"Pride of Detroit\",\"date\":\"1 day ago\",\"position\":3},{\"title\":\"Alim McNeill Is A BEAST For The Lions - YouTube\",\"link\":\"https://www.youtube.com/watch?v=wWF71AtKm7E\",\"snippet\":\"Alim doesn't get a lot of hype but he's a top 10 DT imo. 
I think he's gonna have a huge 2024 season for The Lions with double digit sacks.\",\"imageUrl\":\"https://i.ytimg.com/vi/wWF71AtKm7E/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nsm_9Q3gP0XlJP3xacRtERJRA5Sg\",\"duration\":\"22:15\",\"source\":\"YouTube\",\"channel\":\"Green Light with Chris Long\",\"date\":\"14 hours ago\",\"position\":4},{\"title\":\"Opening Day and the Detroit Lions still get the loudest ovation\",\"link\":\"https://www.youtube.com/watch?v=Difxt8qW0cs\",\"snippet\":\"Opening Day and the Detroit Lions still get the loudest ovation. 1.8K views · 5 hours ago Locked On Lions Podcast ...more. Locked On Lions. 10.7 ...\",\"imageUrl\":\"https://i.ytimg.com/vi/Difxt8qW0cs/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nVHORSmC8Qp8liO59MNEfkPBPkaQ\",\"duration\":\"25:56\",\"source\":\"YouTube\",\"channel\":\"Locked On Lions\",\"date\":\"2 days ago\",\"position\":5},{\"title\":\"Detroit Lions Mailbag Rumors: Sign Xavien Howard & Tyler ...\",\"link\":\"https://www.youtube.com/watch?v=TCUpU3dcZvU\",\"snippet\":\"Detroit Lions rumors begin with the Detroit Lions signing Free Agenct CB Xavien Howard. The Lions need help on defense and get a veteran CB ...\",\"imageUrl\":\"https://i.ytimg.com/vi/TCUpU3dcZvU/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nPek0pKF5q9EJo-Kkc6llMyh5QuA\",\"duration\":\"13:59\",\"source\":\"YouTube\",\"channel\":\"Lions Talk by Chat Sports\",\"date\":\"1 day ago\",\"position\":6},{\"title\":\"How Much BETTER Are the Detroit Lions? 
- YouTube\",\"link\":\"https://www.youtube.com/watch?v=IKruZqpva4s\",\"snippet\":\"Ryan Ermanni, Braylon Edwards and Tom Mazawey discuss the Detroit Lions free agent signings and how the crop of new players will affect the ...\",\"imageUrl\":\"https://i.ytimg.com/vi/IKruZqpva4s/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3mjl4P_pMfRT16kFOXcI1dvmNyq6w\",\"duration\":\"14:32\",\"source\":\"YouTube\",\"channel\":\"WoodwardSports\",\"date\":\"3 weeks ago\",\"position\":7},{\"title\":\"I Don't Think We Realize What The Detroit Lions Just Did..\",\"link\":\"https://www.youtube.com/watch?v=dYSSHgYPERM\",\"snippet\":\"No mention of Amik Robertson coming to the lions from the raiders. I'm excited to see what he can do too he wanted to be a lion baaaad.\",\"imageUrl\":\"https://i.ytimg.com/vi/dYSSHgYPERM/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3ly3aAsp2Ma_rifiEf4PsddliNd0A\",\"duration\":\"8:35\",\"source\":\"YouTube\",\"channel\":\"Football Logic\",\"date\":\"1 week ago\",\"position\":8},{\"title\":\"Detroit Lions Are Setting Up A Huge Draft Move - YouTube\",\"link\":\"https://www.youtube.com/watch?v=Fvxq3w_jR8A\",\"snippet\":\"Detroit Lions Are Setting Up A Huge Draft Move. detroit lions news and rumors. detroit lions are for sure going to trade their pick in the ...\",\"imageUrl\":\"https://i.ytimg.com/vi/Fvxq3w_jR8A/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nrmm2_-p_OJruEODf9H9armk6Btg\",\"duration\":\"8:06\",\"source\":\"YouTube\",\"channel\":\"Sports Talk Detroit\",\"date\":\"7 hours ago\",\"position\":9},{\"title\":\"Zeitler on living up to the standard that's been set - Detroit Lions\",\"link\":\"https://www.detroitlions.com/video/zeitler-on-living-up-to-the-standard-that-s-been-set\",\"snippet\":\"Wood on continued growth throughout the organization. 
President and CEO Rod Wood speaks to the media about the future of the Lions organization at the 2024 NFL ...\",\"imageUrl\":\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSVOSzC6dO3nIo29HBfspJQObbJVNOR7qBiQLeSEZeCcjsp&s\",\"source\":\"Detroit Lions\",\"date\":\"3 weeks ago\",\"position\":10}]}\n"
14 | ]
15 | }
16 | ],
17 | "source": [
18 | "import requests\n",
19 | "import json\n",
20 | "from IPython.display import HTML\n",
21 | "\n",
22 | "### choose between images or videos\n",
23 | "# search_type = \"images\"\n",
24 | "search_type = \"videos\"\n",
25 | "url = f\"https://google.serper.dev/{search_type}\"\n",
26 | "\n",
27 | "payload = json.dumps({\n",
28 | " \"q\": \"lions\",\n",
29 | " \"num\": 10\n",
30 | "})\n",
31 | "headers = {\n",
32 | " 'X-API-KEY': '*************************************',\n",
33 | " 'Content-Type': 'application/json'\n",
34 | "}\n",
35 | "\n",
36 | "response = requests.request(\"POST\", url, headers=headers, data=payload)\n",
37 | "\n",
38 | "print(response.text)"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 31,
44 | "id": "6c70f0ff-3e96-4974-b97e-12a3d5f85813",
45 | "metadata": {},
46 | "outputs": [
47 | {
48 | "name": "stdout",
49 | "output_type": "stream",
50 | "text": [
51 | "<class 'dict'>\n"
52 | ]
53 | }
54 | ],
55 | "source": [
56 | "search_results = json.loads(response.text)\n",
57 | "print(type(search_results))"
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": 32,
63 | "id": "6a90bde5-d68e-409e-8e8a-06b56c6b4afc",
64 | "metadata": {},
65 | "outputs": [
66 | {
67 | "name": "stdout",
68 | "output_type": "stream",
69 | "text": [
70 | "dict_keys(['searchParameters', 'videos'])\n"
71 | ]
72 | }
73 | ],
74 | "source": [
75 | "print(search_results.keys())"
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": 33,
81 | "id": "220491af-2c00-48c5-96be-d52e7ec91f7c",
82 | "metadata": {},
83 | "outputs": [
84 | {
85 | "name": "stdout",
86 | "output_type": "stream",
87 | "text": [
88 | "{'q': 'lions', 'type': 'videos', 'num': 10, 'engine': 'google'}\n",
89 | "10\n",
90 | "[{'title': 'Jake Bates DOES IT AGAIN....The Detroit Lions ... - YouTube', 'link': 'https://www.youtube.com/watch?v=kLX17H48ios', 'snippet': '... Lions resonates with every fan, \"Lions Syndicate\" brings you another electrifying update that you simply can\\'t miss. \"Jake Bates DOES ...', 'imageUrl': 'https://i.ytimg.com/vi/kLX17H48ios/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3mQ8U8YW-sY56CrG1KlKj-m2v9-iw', 'duration': '6:51', 'source': 'YouTube', 'channel': 'Lions Syndicate', 'date': '10 hours ago', 'position': 1}, {'title': 'Detroit Lions News And A Mock Draft - YouTube', 'link': 'https://www.youtube.com/watch?v=KvxOKLOkOAQ', 'snippet': 'Detroit Lions News And A Mock Draft In this episode of the Detroit Lions Podcast, Chris and Jeff are finally back! Its been a bit since the ...', 'imageUrl': 'https://i.ytimg.com/vi/KvxOKLOkOAQ/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3ll1UxiFsfNUmOnZF2k7xnLT3O6bA', 'duration': '1:33:36', 'source': 'YouTube', 'channel': 'Detroit Lions Podcast', 'date': '1 day ago', 'position': 2}, {'title': 'Diving into Detroit Lions 7-Round Mock Draft from Erik Schlitt', 'link': 'https://www.youtube.com/watch?v=JpaR3ACFNeE', 'snippet': \"Pride of Detroit's very own Erik Schlitt conducted a 7-round mock draft for the Detroit Lions. Tune in to see if Meko and Morgan agree with ...\", 'imageUrl': 'https://i.ytimg.com/vi/JpaR3ACFNeE/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nTOOIw-4vS-OPilwrELuGG3mJoAg', 'duration': '31:10', 'source': 'YouTube', 'channel': 'Pride of Detroit', 'date': '1 day ago', 'position': 3}, {'title': 'Alim McNeill Is A BEAST For The Lions - YouTube', 'link': 'https://www.youtube.com/watch?v=wWF71AtKm7E', 'snippet': \"Alim doesn't get a lot of hype but he's a top 10 DT imo. 
I think he's gonna have a huge 2024 season for The Lions with double digit sacks.\", 'imageUrl': 'https://i.ytimg.com/vi/wWF71AtKm7E/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nsm_9Q3gP0XlJP3xacRtERJRA5Sg', 'duration': '22:15', 'source': 'YouTube', 'channel': 'Green Light with Chris Long', 'date': '14 hours ago', 'position': 4}, {'title': 'Opening Day and the Detroit Lions still get the loudest ovation', 'link': 'https://www.youtube.com/watch?v=Difxt8qW0cs', 'snippet': 'Opening Day and the Detroit Lions still get the loudest ovation. 1.8K views · 5 hours ago Locked On Lions Podcast ...more. Locked On Lions. 10.7 ...', 'imageUrl': 'https://i.ytimg.com/vi/Difxt8qW0cs/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nVHORSmC8Qp8liO59MNEfkPBPkaQ', 'duration': '25:56', 'source': 'YouTube', 'channel': 'Locked On Lions', 'date': '2 days ago', 'position': 5}, {'title': 'Detroit Lions Mailbag Rumors: Sign Xavien Howard & Tyler ...', 'link': 'https://www.youtube.com/watch?v=TCUpU3dcZvU', 'snippet': 'Detroit Lions rumors begin with the Detroit Lions signing Free Agenct CB Xavien Howard. The Lions need help on defense and get a veteran CB ...', 'imageUrl': 'https://i.ytimg.com/vi/TCUpU3dcZvU/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nPek0pKF5q9EJo-Kkc6llMyh5QuA', 'duration': '13:59', 'source': 'YouTube', 'channel': 'Lions Talk by Chat Sports', 'date': '1 day ago', 'position': 6}, {'title': 'How Much BETTER Are the Detroit Lions? 
- YouTube', 'link': 'https://www.youtube.com/watch?v=IKruZqpva4s', 'snippet': 'Ryan Ermanni, Braylon Edwards and Tom Mazawey discuss the Detroit Lions free agent signings and how the crop of new players will affect the ...', 'imageUrl': 'https://i.ytimg.com/vi/IKruZqpva4s/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3mjl4P_pMfRT16kFOXcI1dvmNyq6w', 'duration': '14:32', 'source': 'YouTube', 'channel': 'WoodwardSports', 'date': '3 weeks ago', 'position': 7}, {'title': \"I Don't Think We Realize What The Detroit Lions Just Did..\", 'link': 'https://www.youtube.com/watch?v=dYSSHgYPERM', 'snippet': \"No mention of Amik Robertson coming to the lions from the raiders. I'm excited to see what he can do too he wanted to be a lion baaaad.\", 'imageUrl': 'https://i.ytimg.com/vi/dYSSHgYPERM/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3ly3aAsp2Ma_rifiEf4PsddliNd0A', 'duration': '8:35', 'source': 'YouTube', 'channel': 'Football Logic', 'date': '1 week ago', 'position': 8}, {'title': 'Detroit Lions Are Setting Up A Huge Draft Move - YouTube', 'link': 'https://www.youtube.com/watch?v=Fvxq3w_jR8A', 'snippet': 'Detroit Lions Are Setting Up A Huge Draft Move. detroit lions news and rumors. detroit lions are for sure going to trade their pick in the ...', 'imageUrl': 'https://i.ytimg.com/vi/Fvxq3w_jR8A/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nrmm2_-p_OJruEODf9H9armk6Btg', 'duration': '8:06', 'source': 'YouTube', 'channel': 'Sports Talk Detroit', 'date': '7 hours ago', 'position': 9}, {'title': \"Zeitler on living up to the standard that's been set - Detroit Lions\", 'link': 'https://www.detroitlions.com/video/zeitler-on-living-up-to-the-standard-that-s-been-set', 'snippet': 'Wood on continued growth throughout the organization. 
President and CEO Rod Wood speaks to the media about the future of the Lions organization at the 2024 NFL ...', 'imageUrl': 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSVOSzC6dO3nIo29HBfspJQObbJVNOR7qBiQLeSEZeCcjsp&s', 'source': 'Detroit Lions', 'date': '3 weeks ago', 'position': 10}]\n",
91 | "{'title': 'Jake Bates DOES IT AGAIN....The Detroit Lions ... - YouTube', 'link': 'https://www.youtube.com/watch?v=kLX17H48ios', 'snippet': '... Lions resonates with every fan, \"Lions Syndicate\" brings you another electrifying update that you simply can\\'t miss. \"Jake Bates DOES ...', 'imageUrl': 'https://i.ytimg.com/vi/kLX17H48ios/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3mQ8U8YW-sY56CrG1KlKj-m2v9-iw', 'duration': '6:51', 'source': 'YouTube', 'channel': 'Lions Syndicate', 'date': '10 hours ago', 'position': 1}\n"
92 | ]
93 | }
94 | ],
95 | "source": [
96 | "if search_type == \"images\":\n",
97 | " print(search_results[\"searchParameters\"])\n",
98 | " print(len(search_results[\"images\"]))\n",
99 | " print(search_results[\"images\"])\n",
100 | " print(search_results[\"images\"][0])\n",
101 | "elif search_type == \"videos\":\n",
102 | " print(search_results[\"searchParameters\"])\n",
103 | " print(len(search_results[\"videos\"]))\n",
104 | " print(search_results[\"videos\"])\n",
105 | " print(search_results[\"videos\"][0])"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": 38,
111 | "id": "020c7e8c-a5a3-4ee9-af13-7aa3fd8ebb5c",
112 | "metadata": {},
113 | "outputs": [
114 | {
115 | "data": {
116 | "text/html": [
117 | "\n",
118 | " | Jake Bates DOES IT AGAIN....The Detroit Lions ... - YouTube | \n",
119 | " ... Lions resonates with every fan, \"Lions Syndicate\" brings you another electrifying update that you simply can't miss. \"Jake Bates DOES ... | \n",
120 | " https://i.ytimg.com/vi/kLX17H48ios/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3mQ8U8YW-sY56CrG1KlKj-m2v9-iw | \n",
121 | "
\n",
122 | "\n",
123 | " | Detroit Lions News And A Mock Draft - YouTube | \n",
124 | " Detroit Lions News And A Mock Draft In this episode of the Detroit Lions Podcast, Chris and Jeff are finally back! Its been a bit since the ... | \n",
125 | " https://i.ytimg.com/vi/KvxOKLOkOAQ/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3ll1UxiFsfNUmOnZF2k7xnLT3O6bA | \n",
126 | "
\n",
127 | "\n",
128 | " | Diving into Detroit Lions 7-Round Mock Draft from Erik Schlitt | \n",
129 | " Pride of Detroit's very own Erik Schlitt conducted a 7-round mock draft for the Detroit Lions. Tune in to see if Meko and Morgan agree with ... | \n",
130 | " https://i.ytimg.com/vi/JpaR3ACFNeE/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nTOOIw-4vS-OPilwrELuGG3mJoAg | \n",
131 | "
\n",
132 | "\n",
133 | " | Alim McNeill Is A BEAST For The Lions - YouTube | \n",
134 | " Alim doesn't get a lot of hype but he's a top 10 DT imo. I think he's gonna have a huge 2024 season for The Lions with double digit sacks. | \n",
135 | " https://i.ytimg.com/vi/wWF71AtKm7E/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nsm_9Q3gP0XlJP3xacRtERJRA5Sg | \n",
136 | "
\n",
137 | "\n",
138 | " | Opening Day and the Detroit Lions still get the loudest ovation | \n",
139 | " Opening Day and the Detroit Lions still get the loudest ovation. 1.8K views · 5 hours ago Locked On Lions Podcast ...more. Locked On Lions. 10.7 ... | \n",
140 | " https://i.ytimg.com/vi/Difxt8qW0cs/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nVHORSmC8Qp8liO59MNEfkPBPkaQ | \n",
141 | "
\n",
142 | "\n",
143 | " | Detroit Lions Mailbag Rumors: Sign Xavien Howard & Tyler ... | \n",
144 | " Detroit Lions rumors begin with the Detroit Lions signing Free Agenct CB Xavien Howard. The Lions need help on defense and get a veteran CB ... | \n",
145 | " https://i.ytimg.com/vi/TCUpU3dcZvU/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nPek0pKF5q9EJo-Kkc6llMyh5QuA | \n",
146 | "
\n",
147 | "\n",
148 | " | How Much BETTER Are the Detroit Lions? - YouTube | \n",
149 | " Ryan Ermanni, Braylon Edwards and Tom Mazawey discuss the Detroit Lions free agent signings and how the crop of new players will affect the ... | \n",
150 | " https://i.ytimg.com/vi/IKruZqpva4s/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3mjl4P_pMfRT16kFOXcI1dvmNyq6w | \n",
151 | "
\n",
152 | "\n",
153 | " | I Don't Think We Realize What The Detroit Lions Just Did.. | \n",
154 | " No mention of Amik Robertson coming to the lions from the raiders. I'm excited to see what he can do too he wanted to be a lion baaaad. | \n",
155 | " https://i.ytimg.com/vi/dYSSHgYPERM/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3ly3aAsp2Ma_rifiEf4PsddliNd0A | \n",
156 | "
\n",
157 | "\n",
158 | " | Detroit Lions Are Setting Up A Huge Draft Move - YouTube | \n",
159 | " Detroit Lions Are Setting Up A Huge Draft Move. detroit lions news and rumors. detroit lions are for sure going to trade their pick in the ... | \n",
160 | " https://i.ytimg.com/vi/Fvxq3w_jR8A/mqdefault.jpg?sqp=-oaymwEFCJQBEFM&rs=AMzJL3nrmm2_-p_OJruEODf9H9armk6Btg | \n",
161 | "
\n",
162 | "\n",
163 | " | Zeitler on living up to the standard that's been set - Detroit Lions | \n",
164 | " Wood on continued growth throughout the organization. President and CEO Rod Wood speaks to the media about the future of the Lions organization at the 2024 NFL ... | \n",
165 | " https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSVOSzC6dO3nIo29HBfspJQObbJVNOR7qBiQLeSEZeCcjsp&s | \n",
166 | "
"
167 | ],
168 | "text/plain": [
169 | "<IPython.core.display.HTML object>"
170 | ]
171 | },
172 | "metadata": {},
173 | "output_type": "display_data"
174 | }
175 | ],
176 | "source": [
177 | "if search_type == \"images\":\n",
178 | " rows = \"\\n\".join([\"\"\"<tr>\n",
179 | " <td><a href='{0}'>{1}</a></td>\n",
180 | " <td>{2}</td>\n",
181 | " </tr>\"\"\".format(v[\"link\"], v[\"title\"], v[\"imageUrl\"])\n",
182 | " for v in search_results[\"images\"]])\n",
183 | " display(HTML(\"<table>{0}</table>\".format(rows)))\n",
184 | "elif search_type == \"videos\":\n",
185 | " rows = \"\\n\".join([\"\"\"<tr>\n",
186 | " <td><a href='{0}'>{1}</a></td>\n",
187 | " <td>{2}</td>\n",
188 | " <td>{3}</td>\n",
189 | " </tr>\"\"\".format(v[\"link\"], v[\"title\"], v[\"snippet\"], v[\"imageUrl\"])\n",
190 | " for v in search_results[\"videos\"]])\n",
191 | " display(HTML(\"<table>{0}</table>\".format(rows)))"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": null,
197 | "id": "f7e23d70-e969-40de-8a1d-06b44610ccbb",
198 | "metadata": {},
199 | "outputs": [],
200 | "source": []
201 | }
202 | ],
203 | "metadata": {
204 | "kernelspec": {
205 | "display_name": "Python 3 (ipykernel)",
206 | "language": "python",
207 | "name": "python3"
208 | },
209 | "language_info": {
210 | "codemirror_mode": {
211 | "name": "ipython",
212 | "version": 3
213 | },
214 | "file_extension": ".py",
215 | "mimetype": "text/x-python",
216 | "name": "python",
217 | "nbconvert_exporter": "python",
218 | "pygments_lexer": "ipython3",
219 | "version": "3.10.13"
220 | }
221 | },
222 | "nbformat": 4,
223 | "nbformat_minor": 5
224 | }
225 |
--------------------------------------------------------------------------------
/RAG/artwork_data.tsv:
--------------------------------------------------------------------------------
1 | ID PATH INFO
2 | 283 283.jpg AUTHOR: ALLEGRAIN, Christophe-Gabriel, BORN-DIED: (b. 1710, Paris, d. 1795, Paris), TITLE: Venus at Bath, DATE: c. 1767, TECHNIQUE: Marble, height 174 cm, LOCATION: Musée du Louvre, Paris, FORM: sculpture, TYPE: mythological, SCHOOL: French, TIMELINE: 1751-1800
3 | 466 466.jpg AUTHOR: AMMANATI, Bartolomeo, BORN-DIED: (b. 1511, Settignano, d. 1592, Firenze), TITLE: Allegory of Winter, DATE: 1563-65, TECHNIQUE: Stone, LOCATION: Villa Medici, Castello, FORM: sculpture, TYPE: mythological, SCHOOL: Italian, TIMELINE: 1551-1600
4 | 1072 1072.jpg AUTHOR: ANTONIO DA FIRENZE, BORN-DIED: (active in the first half of the 15th century), TITLE: Madonna and Child with Saints, DATE: 1400-50, TECHNIQUE: Tempera on wood, 151 x 85 cm, LOCATION: The Hermitage, St. Petersburg, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1401-1450
5 | 2436 2436.jpg AUTHOR: ARNOLFO DI CAMBIO, BORN-DIED: (b. ca. 1245, Colle di Valdelse, d. ca. 1310, Firenze), TITLE: Presepio (detail), DATE: -, TECHNIQUE: Marble, LOCATION: Santa Maria Maggiore, Rome, FORM: sculpture, TYPE: religious, SCHOOL: Italian, TIMELINE: 1251-1300
6 | 2545 2545.jpg AUTHOR: AST, Balthasar van der, BORN-DIED: (b. 1594, Middelburg, d. 1657, Delft), TITLE: Still-Life with a Basket of Flowers, DATE: after 1632, TECHNIQUE: Oil on canvas, 42 x 62 cm, LOCATION: Nationalmuseum, Stockholm, FORM: painting, TYPE: still-life, SCHOOL: Dutch, TIMELINE: 1601-1650
7 | 2550 2550.jpg AUTHOR: AST, Balthasar van der, BORN-DIED: (b. 1594, Middelburg, d. 1657, Delft), TITLE: Vase of Flowers by a Window, DATE: 1650-57, TECHNIQUE: Oil on panel, 67 x 98 cm, LOCATION: Anhaltische Gemäldegalerie, Dessau, FORM: painting, TYPE: still-life, SCHOOL: Dutch, TIMELINE: 1601-1650
8 | 3100 3100.jpg AUTHOR: BARYE, Antoine-Louis, BORN-DIED: (b. 1796, Paris, d. 1875, Paris), TITLE: Lion Bitten by a Snake, DATE: 1831, TECHNIQUE: Bronze, 135 x 178 cm, LOCATION: Musée du Louvre, Paris, FORM: sculpture, TYPE: genre, SCHOOL: French, TIMELINE: 1801-1850
9 | 3118 3118.jpg AUTHOR: BASAITI, Marco, BORN-DIED: (active 1496-1530 in Venice), TITLE: St Sebastian, DATE: -, TECHNIQUE: Oil on canvas, LOCATION: Santa Maria della Salute, Venice, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1501-1550
10 | 3483 3483.jpg AUTHOR: BELLINI, Giovanni, BORN-DIED: (b. ca. 1426, Venezia, d. 1516, Venezia), TITLE: Madonna and Child, DATE: c. 1455, TECHNIQUE: Tempera on panel, 72 x 46 cm, LOCATION: Metropolitan Museum of Art, New York, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1451-1500
11 | 3764 3764.jpg AUTHOR: BENEFIAL, Marco, BORN-DIED: (b. 1684, Roma, d. 1764, Roma), TITLE: Self-Portrait, DATE: 1731, TECHNIQUE: Red chalk on paper, 354 x 230 mm, LOCATION: J. Paul Getty Museum, Los Angeles, FORM: painting, TYPE: portrait, SCHOOL: Italian, TIMELINE: 1701-1750
12 | 3917 3917.jpg AUTHOR: BERJON, Antoine, BORN-DIED: (b. 1754, Lyon, d. 1843, Lyon), TITLE: Still-Life, DATE: -, TECHNIQUE: Oil on canvas, 46 x 56 cm, LOCATION: Private collection, FORM: painting, TYPE: still-life, SCHOOL: French, TIMELINE: 1801-1850
13 | 3945 3945.jpg AUTHOR: BERNINI, Gian Lorenzo, BORN-DIED: (b. 1598, Napoli, d. 1680, Roma), TITLE: Interior view, DATE: 1658-61, TECHNIQUE: Photo, LOCATION: Sant'Andrea al Quirinale, Rome, FORM: architecture, TYPE: religious, SCHOOL: Italian, TIMELINE: 1601-1650
14 | 4073 4073.jpg AUTHOR: BERRUGUETE, Alonso, BORN-DIED: (b. 1488, Paredes de Nava, d. 1561, Valladolid), TITLE: Adoration of the Magi, DATE: 1526-32, TECHNIQUE: Polychrome wood, LOCATION: National Museum of Religious Carvings, Valladolid, FORM: sculpture, TYPE: religious, SCHOOL: Spanish, TIMELINE: 1501-1550
15 | 4195 4195.jpg AUTHOR: BEZZUOLI, Giuseppe, BORN-DIED: (b. 1784, Firenze, d. 1855, Firenze), TITLE: Venus Crossing the Sea on a Shell, DATE: 1830s, TECHNIQUE: Oil on wood panel, 33 x 41 cm, LOCATION: Private collection, FORM: painting, TYPE: mythological, SCHOOL: Italian, TIMELINE: 1801-1850
16 | 4880 4880.jpg AUTHOR: BOSCH, Hieronymus, BORN-DIED: (b. ca. 1450, 's-Hertogenbosch, d. 1516, 's-Hertogenbosch), TITLE: Triptych of Garden of Earthly Delights (detail), DATE: c. 1500, TECHNIQUE: Oil on panel, LOCATION: Museo del Prado, Madrid, FORM: painting, TYPE: religious, SCHOOL: Netherlandish, TIMELINE: 1451-1500
17 | 5522 5522.jpg AUTHOR: BÖCKLIN, Arnold, BORN-DIED: (b. 1827, Basel, d. 1901, Firenze), TITLE: Campagna Landscape, DATE: 1857-58, TECHNIQUE: Oil on canvas, 88 x 105 cm, LOCATION: Nationalgalerie, Berlin, FORM: painting, TYPE: landscape, SCHOOL: Swiss, TIMELINE: 1851-1900
18 | 5570 5570.jpg AUTHOR: BRAMANTE, Donato, BORN-DIED: (b. 1444, Fermignano, d. 1514, Roma), TITLE: Umbrella vault, DATE: 1493, TECHNIQUE: -, LOCATION: Santa Maria delle Grazie, Milan, FORM: architecture, TYPE: interior, SCHOOL: Italian, TIMELINE: 1451-1500
19 | 7440 7440.jpg AUTHOR: CARPACCIO, Vittore, BORN-DIED: (b. 1472, Venezia, d. 1526, Capodistria), TITLE: The Lion of St Mark (detail), DATE: 1516, TECHNIQUE: Tempera on canvas, LOCATION: Palazzo Ducale, Venice, FORM: painting, TYPE: historical, SCHOOL: Italian, TIMELINE: 1501-1550
20 | 7460 7460.jpg AUTHOR: CARPEAUX, Jean-Baptiste, BORN-DIED: (b. 1827, Valenciennes, d. 1875, Courbevoie), TITLE: Ugolino and His Sons (detail), DATE: 1865-67, TECHNIQUE: Marble, LOCATION: Metropolitan Museum of Art, New York, FORM: sculpture, TYPE: other, SCHOOL: French, TIMELINE: 1851-1900
21 | 7852 7852.jpg AUTHOR: CAVALLINO, Bernardo, BORN-DIED: (b. 1616, Napoli, d. ca. 1656, Napoli), TITLE: Curing of Tobias, DATE: -, TECHNIQUE: Oil on canvas, 76 x 103 cm, LOCATION: Museo del Prado, Madrid, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1601-1650
22 | 8106 8106.jpg AUTHOR: CÉZANNE, Paul, BORN-DIED: (b. 1839, Aix-en-Provence, d. 1906, Aix-en-Provence), TITLE: Mont Sainte-Victoire, DATE: c. 1887, TECHNIQUE: Oil on canvas, 67 x 92 cm, LOCATION: Courtauld Gallery, London, FORM: painting, TYPE: landscape, SCHOOL: French, TIMELINE: 1851-1900
23 | 8154 8154.jpg AUTHOR: CÉZANNE, Paul, BORN-DIED: (b. 1839, Aix-en-Provence, d. 1906, Aix-en-Provence), TITLE: The Large Bathers, DATE: 1898-1905, TECHNIQUE: Oil on canvas, 208 x 249 cm, LOCATION: Museum of Art, Philadelphia, FORM: painting, TYPE: other, SCHOOL: French, TIMELINE: 1851-1900
24 | 8308 8308.jpg AUTHOR: CHAUVEAU, François, BORN-DIED: (b. 1613, Paris, d. 1676, Paris), TITLE: Frontispiece to 'Cabinet de M. de Scudéry', DATE: 1646, TECHNIQUE: Engraving, LOCATION: Bibliothèque Nationale, Paris, FORM: graphics, TYPE: other, SCHOOL: French, TIMELINE: 1651-1700
25 | 8432 8432.jpg AUTHOR: CIMA da Conegliano, BORN-DIED: (b. ca. 1459, Conegliano, d. 1517/18, Conegliano), TITLE: St Christopher with the Infant Christ and St Peter, DATE: 1504-06, TECHNIQUE: Oil on poplar panel, 73 x 56 cm, LOCATION: Private collection, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1451-1500
26 | 8439 8439.jpg AUTHOR: CIMA da Conegliano, BORN-DIED: (b. ca. 1459, Conegliano, d. 1517/18, Conegliano), TITLE: St John the Baptist, DATE: c. 1500, TECHNIQUE: Stained glass, LOCATION: Basilica dei Santi Giovanni e Paolo, Venice, FORM: stained-glass, TYPE: religious, SCHOOL: Italian, TIMELINE: 1451-1500
27 | 8657 8657.jpg AUTHOR: CLEVE, Joos van, BORN-DIED: (b. ca. 1485, Antwerpen, d. 1540, Antwerpen), TITLE: Self-Portrait, DATE: c. 1519, TECHNIQUE: Oil on panel, 38 x 27 cm, LOCATION: Museo Thyssen-Bornemisza, Madrid, FORM: painting, TYPE: portrait, SCHOOL: Flemish, TIMELINE: 1501-1550
28 | 9086 9086.jpg AUTHOR: CORNELISZ VAN OOSTSANEN, Jacob, BORN-DIED: (b. ca. 1472, Oostzan, d. 1533, Amsterdam), TITLE: Mary Magdalen, DATE: 1519, TECHNIQUE: Panel, 49 x 40 cm, LOCATION: Art Museum, Saint Louis, FORM: painting, TYPE: religious, SCHOOL: Netherlandish, TIMELINE: 1501-1550
29 | 9512 9512.jpg AUTHOR: COUSTOU, Nicolas, BORN-DIED: (b. 1658, Lyon, d. 1733, Paris), TITLE: Louis XIII Kneeling, DATE: 1712-28, TECHNIQUE: Marble, LOCATION: Notre-Dame Cathedral, Paris, FORM: sculpture, TYPE: religious, SCHOOL: French, TIMELINE: 1701-1750
30 | 9740 9740.jpg AUTHOR: CRANACH, Lucas the Elder, BORN-DIED: (b. 1472, Kronach, d. 1553, Weimar), TITLE: The Paradise, DATE: 1530, TECHNIQUE: Limewood, 81 x 114 cm, LOCATION: Kunsthistorisches Museum, Vienna, FORM: painting, TYPE: religious, SCHOOL: German, TIMELINE: 1501-1550
31 | 10161 10161.jpg AUTHOR: CUYP, Aelbert, BORN-DIED: (b. 1620, Dordrecht, d. 1691, Dordrecht), TITLE: River Scene with Milking Woman, DATE: c. 1646, TECHNIQUE: Oil on wood, 48,3 x 74,6 cm, LOCATION: Staatliche Kunsthalle, Karlsruhe, FORM: painting, TYPE: landscape, SCHOOL: Dutch, TIMELINE: 1651-1700
32 | 10601 10601.jpg AUTHOR: DECAMPS, Alexandre Gabriel, BORN-DIED: (b. 1803, Paris, d. 1860, Fontainebleau), TITLE: The Defeat of the Cimbri, DATE: 1833, TECHNIQUE: Oil on canvas, 130 x 195 cm, LOCATION: Musée du Louvre, Paris, FORM: painting, TYPE: historical, SCHOOL: French, TIMELINE: 1801-1850
33 | 10671 10671.jpg AUTHOR: DEGAS, Edgar, BORN-DIED: (b. 1834, Paris, d. 1917, Paris), TITLE: The Dance Class, DATE: 1874, TECHNIQUE: Oil on canvas, 84 x 77 cm, LOCATION: Metropolitan Museum of Art, New York, FORM: painting, TYPE: genre, SCHOOL: French, TIMELINE: 1851-1900
34 | 10896 10896.jpg AUTHOR: DELEMER, Jean, BORN-DIED: (active mid 15th century in Tournai and Brussels), TITLE: Female Figure from the Tomb of Isabella of Bourbon, DATE: 1476, TECHNIQUE: Bronze with black laquer patina, height 58 cm, LOCATION: Rijksmuseum, Amsterdam, FORM: sculpture, TYPE: other, SCHOOL: Flemish, TIMELINE: 1401-1450
35 | 12020 12020.jpg AUTHOR: DÜRER, Albrecht, BORN-DIED: (b. 1471, Nürnberg, d. 1528, Nürnberg), TITLE: Adoration of the Magi (detail), DATE: 1504, TECHNIQUE: Oil on wood, LOCATION: Galleria degli Uffizi, Florence, FORM: painting, TYPE: religious, SCHOOL: German, TIMELINE: 1501-1550
36 | 13577 13577.jpg AUTHOR: FRAGONARD, Jean-Honoré, BORN-DIED: (b. 1732, Grasse, d. 1806, Paris), TITLE: Marie-Madeleine Guimard (Fanciful Figure), DATE: 1769, TECHNIQUE: Oil on canvas, 82 x 65 cm, LOCATION: Musée du Louvre, Paris, FORM: painting, TYPE: portrait, SCHOOL: French, TIMELINE: 1751-1800
37 | 13778 13778.jpg AUTHOR: FRANCKEN, Frans II, BORN-DIED: (b. 1581, Antwerpen, d. 1642, Antwerpen), TITLE: Feast of Esther, DATE: -, TECHNIQUE: Oil on copper, 55 x 69 cm, LOCATION: Národní Galerie, Prague, FORM: painting, TYPE: mythological, SCHOOL: Flemish, TIMELINE: 1601-1650
38 | 13939 13939.jpg AUTHOR: FRUEAUF, Rueland the Elder, BORN-DIED: (b. 1440/45, Passau, d. 1507, Passau), TITLE: The Annunciation (detail), DATE: c. 1495, TECHNIQUE: Tempera on pine panel, LOCATION: Szépmûvészeti Múzeum, Budapest, FORM: painting, TYPE: religious, SCHOOL: Austrian, TIMELINE: 1451-1500
39 | 14283 14283.jpg AUTHOR: GAUGUIN, Paul, BORN-DIED: (b. 1848, Paris, d. 1903, Atuona, Hiva Oa, French Polynesia), TITLE: Study of a Nude (Suzanne Sewing), DATE: 1880, TECHNIQUE: Oil on canvas, 115 x 80 cm, LOCATION: Ny Carlsberg Glyptotek, Copenhagen, FORM: painting, TYPE: study, SCHOOL: French, TIMELINE: 1851-1900
40 | 14427 14427.jpg AUTHOR: GAUGUIN, Paul, BORN-DIED: (b. 1848, Paris, d. 1903, Atuona, Hiva Oa, French Polynesia), TITLE: The God Taaroa with One of His Wifes, DATE: 1892-93, TECHNIQUE: Watercolour, 215 x 170 mm, LOCATION: Musée du Louvre, Paris, FORM: graphics, TYPE: other, SCHOOL: French, TIMELINE: 1851-1900
41 | 15167 15167.jpg AUTHOR: GIJSELS, Pieter, BORN-DIED: (b. 1621, Antwerpen, d. 1690, Antwerpen), TITLE: Village Scene, DATE: -, TECHNIQUE: Oil on copper, 13 x 17 cm, LOCATION: Private collection, FORM: painting, TYPE: landscape, SCHOOL: Flemish, TIMELINE: 1651-1700
42 | 15870 15870.jpg AUTHOR: GIOTTO di Bondone, BORN-DIED: (b. 1267, Vespignano, d. 1337, Firenze), TITLE: The Stefaneschi Triptych: Martyrdom of St Paul, DATE: c. 1330, TECHNIQUE: Tempera on panel, LOCATION: Pinacoteca, Vatican, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1301-1350
43 | 16101 16101.jpg AUTHOR: GIROMETTI, Giuseppe, BORN-DIED: (b. 1760, Roma, d. 1851, Roma), TITLE: Nessus Abducting Deianira, DATE: 1815-25, TECHNIQUE: Sardonyx, mounted in gold as a pendant, 405 x 480 mm, LOCATION: Metropolitan Museum of Art, New York, FORM: sculpture, TYPE: mythological, SCHOOL: Italian, TIMELINE: 1801-1850
44 | 16215 16215.jpg AUTHOR: GIUSTI, Antonio, BORN-DIED: (b. 1479, San Martino, d. 1519, Tours), TITLE: Head of St Peter, DATE: 1509, TECHNIQUE: Terracotta, height 30 cm, LOCATION: Musée du Louvre, Paris, FORM: sculpture, TYPE: religious, SCHOOL: Italian, TIMELINE: 1501-1550
45 | 16715 16715.jpg AUTHOR: GOGH, Vincent van, BORN-DIED: (b. 1853, Groot Zundert, d. 1890, Auvers-sur-Oise), TITLE: Wheat Field with Cypresses, DATE: June 1889, Saint-Rémy, TECHNIQUE: Black chalk and pen, 470 x 620 mm, LOCATION: Rijksmuseum Vincent van Gogh, Amsterdam, FORM: graphics, TYPE: landscape, SCHOOL: Dutch, TIMELINE: 1851-1900
46 | 16970 16970.jpg AUTHOR: GOSSART, Jan, BORN-DIED: (b. ca. 1478, Maubeuge, d. 1532, Middelburg), TITLE: The Mocking of Christ or the Man of Sorrows, DATE: c. 1525, TECHNIQUE: Etching on iron, second state, 200 x 148 mm, LOCATION: Museum of Fine Arts, Boston, FORM: graphics, TYPE: religious, SCHOOL: Flemish, TIMELINE: 1501-1550
47 | 17669 17669.jpg AUTHOR: GRECO, El, BORN-DIED: (b. 1541, Candia, d. 1614, Toledo), TITLE: The Crucifixion, DATE: 1596-1600, TECHNIQUE: Oil on canvas, 312 x 169 cm, LOCATION: Museo del Prado, Madrid, FORM: painting, TYPE: religious, SCHOOL: Spanish, TIMELINE: 1551-1600
48 | 17760 17760.jpg AUTHOR: GRECO, El, BORN-DIED: (b. 1541, Candia, d. 1614, Toledo), TITLE: Laocoön (detail), DATE: 1610, TECHNIQUE: Oil on canvas, LOCATION: National Gallery of Art, Washington, FORM: painting, TYPE: mythological, SCHOOL: Spanish, TIMELINE: 1551-1600
49 | 17960 17960.jpg AUTHOR: GRÜNEWALD, Matthias, BORN-DIED: (b. 1470/80, Würzburg, d. 1528, Halle), TITLE: The Annunciation (detail), DATE: c. 1515, TECHNIQUE: Oil on wood, LOCATION: Musée d'Unterlinden, Colmar, FORM: painting, TYPE: religious, SCHOOL: German, TIMELINE: 1501-1550
50 | 18130 18130.jpg AUTHOR: GUARDI, Francesco, BORN-DIED: (b. 1712, Venezia, d. 1793, Venezia), TITLE: The Canal Grande with San Simeone Piccolo, DATE: after 1780, TECHNIQUE: Oil on canvas, 63 x 89 cm, LOCATION: Akademie der bildenden Künste, Vienna, FORM: painting, TYPE: landscape, SCHOOL: Italian, TIMELINE: 1751-1800
51 | 18149 18149.jpg AUTHOR: GUARDI, Gianantonio, BORN-DIED: (b. 1699, Wien, d. 1760, Venezia), TITLE: Madonna and Child with Saints, DATE: 1746-48, TECHNIQUE: Oil on canvas, 234 x 154 cm, LOCATION: Parish Church, Belvedere di Aquileia, FORM: painting, TYPE: landscape, SCHOOL: Italian, TIMELINE: 1701-1750
52 | 19159 19159.jpg AUTHOR: HOLBEIN, Hans the Younger, BORN-DIED: (b. 1497, Augsburg, d. 1543, London), TITLE: Robert Cheseman, DATE: 1533, TECHNIQUE: Oil on oak, 59 x 63 cm, LOCATION: Mauritshuis, The Hague, FORM: painting, TYPE: portrait, SCHOOL: German, TIMELINE: 1501-1550
53 | 19341 19341.jpg AUTHOR: HONTHORST, Gerrit van, BORN-DIED: (b. 1590, Utrecht, d. 1656, Utrecht), TITLE: Portrait of a Gentleman, DATE: 1631, TECHNIQUE: Oil on canvas, 70 x 58 cm, LOCATION: Private collection, FORM: painting, TYPE: portrait, SCHOOL: Dutch, TIMELINE: 1601-1650
54 | 20326 20326.jpg AUTHOR: KNYFF, Jacob, BORN-DIED: (b. 1639, Haarlem, d. 1681, London), TITLE: An English Ship and other Shipping off Castle Cornet, Guernsey, DATE: -, TECHNIQUE: Oil on canvas, 218 x 165 cm, LOCATION: Private collection, FORM: painting, TYPE: landscape, SCHOOL: Dutch, TIMELINE: 1651-1700
55 | 21230 21230.jpg AUTHOR: LEONARDO da Vinci, BORN-DIED: (b. 1452, Vinci, d. 1519, Cloux, near Amboise), TITLE: Landscape near Pisa, DATE: 1502-03, TECHNIQUE: Red chalk on paper, 211 x 150 mm, LOCATION: Biblioteca Nacional, Madrid, FORM: graphics, TYPE: landscape, SCHOOL: Italian, TIMELINE: 1451-1500
56 | 21905 21905.jpg AUTHOR: LISS, Johann, BORN-DIED: (b. ca. 1590, Oldenburg, d. 1631, Verona), TITLE: The Ecstasy of St Paul, DATE: 1628-29, TECHNIQUE: Oil on canvas, 80 x 58 cm, LOCATION: Staatliche Museen, Berlin, FORM: painting, TYPE: religious, SCHOOL: German, TIMELINE: 1601-1650
57 | 22186 22186.jpg AUTHOR: LORENZETTI, Ambrogio, BORN-DIED: (b. ca. 1290, Siena, d. 1348, Siena), TITLE: Nursing Madonna, DATE: c. 1330, TECHNIQUE: Tempera on wood, 90 x 48 cm, LOCATION: Palazzo Arcivescovile, Siena, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1301-1350
58 | 22651 22651.jpg AUTHOR: MACTAGGART, William, BORN-DIED: (b. 1835, Aros, Kintyre, Strathclyde, d. 1910, Broomiknowe, Lothian), TITLE: The Storm, DATE: 1890, TECHNIQUE: Oil on canvas, 122 x 183 cm, LOCATION: National Gallery of Scotland, Edinburgh, FORM: painting, TYPE: landscape, SCHOOL: Scottish, TIMELINE: 1851-1900
59 | 22740 22740.jpg AUTHOR: MAGNASCO, Alessandro, BORN-DIED: (b. 1667, Genova, d. 1749, Genova), TITLE: The Seashore, DATE: 1720s, TECHNIQUE: Oil on canvas, 158 x 211 cm, LOCATION: The Hermitage, St. Petersburg, FORM: painting, TYPE: landscape, SCHOOL: Italian, TIMELINE: 1701-1750
60 | 23438 23438.jpg AUTHOR: MARREL, Jacob, BORN-DIED: (b. ca. 1613, Frankenthal, d. 1681, Frankfurt), TITLE: Still-Life, DATE: 1669, TECHNIQUE: Oil on canvas, 36 x 43 cm, LOCATION: Private collection, FORM: painting, TYPE: still-life, SCHOOL: German, TIMELINE: 1651-1700
61 | 23588 23588.jpg AUTHOR: MASO DI BANCO, BORN-DIED: (active 1320-50 in Firenze), TITLE: Descent of Mary's Girdle to the Apostle Thomas, DATE: c. 1337-39, TECHNIQUE: Panel, LOCATION: Staatliche Museen, Berlin, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1301-1350
62 | 23901 23901.jpg AUTHOR: MASTER of Frankfurt, BORN-DIED: (b. 1460, d. 1533, Antwerpen), TITLE: St Christopher (detail), DATE: c. 1495, TECHNIQUE: Oil on panel, LOCATION: Private collection, FORM: painting, TYPE: religious, SCHOOL: Flemish, TIMELINE: 1501-1550
63 | 24321 24321.jpg AUTHOR: MATHAM, Theodor, BORN-DIED: (b. ca. 1605, Haarlem, d. 1676, Amsterdam), TITLE: Merry Toper, DATE: 1629-30, TECHNIQUE: Engraving, 216 x 170 mm, LOCATION: Graphische Sammlung Albertina, Vienna, FORM: graphics, TYPE: genre, SCHOOL: Dutch, TIMELINE: 1601-1650
64 | 25400 25400.jpg AUTHOR: METSU, Gabriel, BORN-DIED: (b. 1629, Leiden, d. 1667, Amsterdam), TITLE: The Dismissal of Hagar, DATE: 1653-54, TECHNIQUE: Oil on canvas, 112 x 86 cm, LOCATION: Stedelijk Museum De Lakenhal, Leiden, FORM: painting, TYPE: religious, SCHOOL: Dutch, TIMELINE: 1651-1700
65 | 25427 25427.jpg AUTHOR: METSU, Gabriel, BORN-DIED: (b. 1629, Leiden, d. 1667, Amsterdam), TITLE: Visit of the Physician, DATE: 1660-67, TECHNIQUE: Oil on canvas, 61 x 48 cm, LOCATION: The Hermitage, St. Petersburg, FORM: painting, TYPE: genre, SCHOOL: Dutch, TIMELINE: 1651-1700
66 | 25484 25484.jpg AUTHOR: MICHELANGELO Buonarroti, BORN-DIED: (b. 1475, Caprese, d. 1564, Roma), TITLE: Madonna and Child (detail), DATE: 1501-05, TECHNIQUE: Marble, LOCATION: O.L. Vrouwekerk, Bruges, FORM: sculpture, TYPE: religious, SCHOOL: Italian, TIMELINE: 1501-1550
67 | 25609 25609.jpg AUTHOR: MICHELANGELO Buonarroti, BORN-DIED: (b. 1475, Caprese, d. 1564, Roma), TITLE: The third bay of the ceiling, DATE: 1508-12, TECHNIQUE: Fresco, LOCATION: Cappella Sistina, Vatican, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1501-1550
68 | 26587 26587.jpg AUTHOR: MINIATURIST, French, BORN-DIED: (active around 870 in Saint-Denis), TITLE: Codex Aureus, DATE: 870, TECHNIQUE: Manuscript (Clm. 14000), 420 x 330 mm, LOCATION: Bayerische Staatsbibliothek, Munich, FORM: illumination, TYPE: religious, SCHOOL: French, TIMELINE: 0851-0900
69 | 27400 27400.jpg AUTHOR: MISERONI, Ottavio, BORN-DIED: (b. 1567, Milano, d. 1624, Praha), TITLE: Bowl in the Form of a Lion Skin, DATE: 1590s, TECHNIQUE: Cairngorm, 9 x 25 x 10 cm, LOCATION: Kunsthistorisches Museum, Vienna, FORM: metalwork, TYPE: other, SCHOOL: Italian, TIMELINE: 1601-1650
70 | 27493 27493.jpg AUTHOR: MOLENAER, Klaes, BORN-DIED: (b. before 1630, Haarlem, d. 1676, Haarlem), TITLE: Winter Landscape, DATE: -, TECHNIQUE: Oil on oak panel, 37 x 49 cm, LOCATION: Private collection, FORM: painting, TYPE: landscape, SCHOOL: Dutch, TIMELINE: 1651-1700
71 | 27521 27521.jpg AUTHOR: MOMPER, Joos de, BORN-DIED: (b. 1564, Antwerpen, d. 1634/35, Antwerpen), TITLE: Christ Healing the Blind Man, DATE: -, TECHNIQUE: Oil on canvas, 138 x 205 cm, LOCATION: Private collection, FORM: painting, TYPE: religious, SCHOOL: Flemish, TIMELINE: 1601-1650
72 | 27782 27782.jpg AUTHOR: MONTAÑÉS, Juan Martínez, BORN-DIED: (b. 1568, Alcala la Real, d. 1649, Sevilla), TITLE: The Merciful Christ, DATE: c. 1605, TECHNIQUE: Polychrome wood, height 190 cm, LOCATION: Cathedral, Seville, FORM: sculpture, TYPE: religious, SCHOOL: Spanish, TIMELINE: 1601-1650
73 | 28167 28167.jpg AUTHOR: MOSAIC ARTIST, Italian, BORN-DIED: (active 1200-1250 in Lucca), TITLE: Façade, DATE: 1200-50, TECHNIQUE: Mosaic, LOCATION: San Frediano, Lucca, FORM: mosaic, TYPE: religious, SCHOOL: Italian, TIMELINE: 1201-1250
74 | 29519 29519.jpg AUTHOR: PATENIER, Joachim, BORN-DIED: (b. ca. 1480, Bouvignes, d. 1524, Antwerpen), TITLE: Baptism of Christ, DATE: -, TECHNIQUE: Oil on oak, 59,5 x 77 cm, LOCATION: Kunsthistorisches Museum, Vienna, FORM: painting, TYPE: religious, SCHOOL: Flemish, TIMELINE: 1501-1550
75 | 29537 29537.jpg AUTHOR: PATER, Jean Baptiste Joseph, BORN-DIED: (b. 1695, Valenciennes, d. 1736, Paris), TITLE: Concert Champêtre, DATE: 1730-35, TECHNIQUE: Oil on canvas, LOCATION: Musée des Beaux-Arts, Valenciennes, FORM: painting, TYPE: genre, SCHOOL: French, TIMELINE: 1701-1750
76 | 30119 30119.jpg AUTHOR: PIERO DELLA FRANCESCA, BORN-DIED: (b. 1416, Borgo San Sepolcro, d. 1492, Borgo San Sepolcro), TITLE: 6. Torture of the Jew (detail), DATE: 1452-66, TECHNIQUE: Fresco, LOCATION: San Francesco, Arezzo, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1451-1500
77 | 30738 30738.jpg AUTHOR: PITTONI, Giambattista, BORN-DIED: (b. 1687, Venezia, d. 1767, Venezia), TITLE: Hagar in the Desert, DATE: -, TECHNIQUE: Oil on canvas, LOCATION: Santa Maria Gloriosa dei Frari, Venice, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1701-1750
78 | 30878 30878.jpg AUTHOR: POMPEI, Alessandro, Conte, BORN-DIED: (b. 1705, Verona, d. 1772, Garda), TITLE: Villa Pompei Carlotti: Façade, DATE: 1731-37, TECHNIQUE: Photo, LOCATION: Illasi, Verona, FORM: architecture, TYPE: other, SCHOOL: Italian, TIMELINE: 1701-1750
79 | 31182 31182.jpg AUTHOR: POUSSIN, Nicolas, BORN-DIED: (b. 1594, Les Andelys, d. 1665, Roma), TITLE: The Triumph of David, DATE: c. 1630, TECHNIQUE: Oil on canvas, 100 x 130 cm, LOCATION: Museo del Prado, Madrid, FORM: painting, TYPE: mythological, SCHOOL: French, TIMELINE: 1601-1650
80 | 32091 32091.jpg AUTHOR: RAFFAELLO Sanzio, BORN-DIED: (b. 1483, Urbino, d. 1520, Roma), TITLE: Head of an Angel, DATE: 1519-20, TECHNIQUE: Black chalk and brownish charcoal, heightened with white, 308 x 254 mm, LOCATION: Szépművészeti Múzeum, Budapest, FORM: graphics, TYPE: study, SCHOOL: Italian, TIMELINE: 1501-1550
81 | 32275 32275.jpg AUTHOR: REGNAULT, Henri, BORN-DIED: (b. 1843, Paris, d. 1871, Buzenval), TITLE: Salome, DATE: 1870, TECHNIQUE: Oil on canvas, 160 x 103 cm, LOCATION: Metropolitan Museum of Art, New York, FORM: painting, TYPE: religious, SCHOOL: French, TIMELINE: 1851-1900
82 | 32845 32845.jpg AUTHOR: REMBRANDT Harmenszoon van Rijn, BORN-DIED: (b. 1606, Leiden, d. 1669, Amsterdam), TITLE: Man in Armour (Mars?), DATE: 1650s, TECHNIQUE: Oil on canvas, 102 x 91 cm, LOCATION: Metropolitan Museum of Art, New York, FORM: painting, TYPE: portrait, SCHOOL: Dutch, TIMELINE: 1601-1650
83 | 34486 34486.jpg AUTHOR: RUBENS, Peter Paul, BORN-DIED: (b. 1577, Siegen, d. 1640, Antwerpen), TITLE: Portrait of Helene Fourment, DATE: c. 1630, TECHNIQUE: Oil on canvas, LOCATION: Musées Royaux des Beaux-Arts, Brussels, FORM: painting, TYPE: portrait, SCHOOL: Flemish, TIMELINE: 1601-1650
84 | 34602 34602.jpg AUTHOR: RUISDAEL, Jacob Isaackszon van, BORN-DIED: (b. ca. 1628, Haarlem, d. 1682, Amsterdam), TITLE: The Dam Square in Amsterdam (detail), DATE: c. 1670, TECHNIQUE: Oil on canvas, LOCATION: Staatliche Museen, Berlin, FORM: painting, TYPE: landscape, SCHOOL: Dutch, TIMELINE: 1651-1700
85 | 34926 34926.jpg AUTHOR: SALVIATI, Cecchino del, BORN-DIED: (b. 1510, Firenze, d. 1563, Roma), TITLE: Charity, DATE: 1554-58, TECHNIQUE: Oil on wood, 156 x 122 cm, LOCATION: Galleria degli Uffizi, Florence, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1501-1550
86 | 35186 35186.jpg AUTHOR: SARACENI, Carlo, BORN-DIED: (b. 1579, Venezia, d. 1620, Venezia), TITLE: The Rest on the Flight into Egypt, DATE: 1606, TECHNIQUE: Oil on canvas, 180 x 125 cm, LOCATION: Eremo dei Camaldolesi, Frascati, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1601-1650
87 | 35574 35574.jpg AUTHOR: SCHUFFENECKER, Emile, BORN-DIED: (b. 1851, Fresne-Saint-Mames, d. 1934, Paris), TITLE: Female Nude Seated on a Bed, DATE: 1885, TECHNIQUE: Oil on canvas, 65 x 45 cm, LOCATION: Private collection, FORM: painting, TYPE: other, SCHOOL: French, TIMELINE: 1851-1900
88 | 36035 36035.jpg AUTHOR: SIMONE MARTINI, BORN-DIED: (b. 1280/85, Siena, d. 1344, Avignon), TITLE: Maestà (detail), DATE: 1315, TECHNIQUE: Fresco, 99 x 85 cm (size of detail), LOCATION: Palazzo Pubblico, Siena, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1301-1350
89 | 36711 36711.jpg AUTHOR: STEEN, Jan, BORN-DIED: (b. 1626, Leiden, d. 1679, Leiden), TITLE: Self-Portrait as a Lutenist, DATE: 1652-55, TECHNIQUE: Oil on panel, 55 x 44 cm, LOCATION: Museo Thyssen-Bornemisza, Madrid, FORM: painting, TYPE: portrait, SCHOOL: Dutch, TIMELINE: 1651-1700
90 | 37121 37121.jpg AUTHOR: SUSTERMANS, Justus, BORN-DIED: (b. 1597, Antwerpen, d. 1681, Firenze), TITLE: Portrait of Vittoria della Rovere, DATE: 1628-30, TECHNIQUE: Oil on canvas, 114 x 92 cm, LOCATION: Museo di Palazzo Martelli, Florence, FORM: painting, TYPE: portrait, SCHOOL: Flemish, TIMELINE: 1601-1650
91 | 37504 37504.jpg AUTHOR: TETRODE, Willem Danielsz van, BORN-DIED: (b. ca. 1525, Delft, d. ca. 1587, Delft), TITLE: Hercules Pomarius, DATE: 1545-65, TECHNIQUE: Bronze, height 39 cm, LOCATION: Private collection, FORM: sculpture, TYPE: mythological, SCHOOL: Dutch, TIMELINE: 1551-1600
92 | 38682 38682.jpg AUTHOR: TORRITI, Jacopo, BORN-DIED: (active c. 1270-1300), TITLE: Deësis vault, DATE: 1290s, TECHNIQUE: Fresco, LOCATION: Upper Church, San Francesco, Assisi, FORM: painting, TYPE: religious, SCHOOL: Italian, TIMELINE: 1251-1300
93 | 39561 39561.jpg AUTHOR: UNKNOWN MASTER, Dutch, BORN-DIED: (active around 1650), TITLE: Wall Tile Painter, DATE: c. 1650, TECHNIQUE: Black charcoal, LOCATION: Amsterdam Museum, Amsterdam, FORM: graphics, TYPE: study, SCHOOL: Dutch, TIMELINE: 1601-1650
94 | 40334 40334.jpg AUTHOR: UNKNOWN MASTER, Italian, BORN-DIED: (active in 1500-1510 in Padua), TITLE: Toad, DATE: 1500-10, TECHNIQUE: Bronze, diameter cm, LOCATION: Kunsthistorisches Museum, Vienna, FORM: sculpture, TYPE: other, SCHOOL: Italian, TIMELINE: 1501-1550
95 | 40827 40827.jpg AUTHOR: VASARI, Giorgio, BORN-DIED: (b. 1511, Arezzo, d. 1574, Firenze), TITLE: Allegory Related to Alchemy, DATE: c. 1570, TECHNIQUE: Oil on slate, LOCATION: Palazzo Vecchio, Florence, FORM: painting, TYPE: mythological, SCHOOL: Italian, TIMELINE: 1551-1600
96 | 40945 40945.jpg AUTHOR: VELÁZQUEZ, Diego Rodriguez de Silva y, BORN-DIED: (b. 1599, Sevilla, d. 1660, Madrid), TITLE: Young Man, DATE: c. 1629, TECHNIQUE: Oil on canvas, 89 x 69 cm, LOCATION: Alte Pinakothek, Munich, FORM: painting, TYPE: portrait, SCHOOL: Spanish, TIMELINE: 1601-1650
97 | 41303 41303.jpg AUTHOR: VERMEER, Johannes, BORN-DIED: (b. 1632, Delft, d. 1675, Delft), TITLE: View of Delft, DATE: 1659-60, TECHNIQUE: Oil on canvas, 97 x 116 cm, LOCATION: Mauritshuis, The Hague, FORM: painting, TYPE: landscape, SCHOOL: Dutch, TIMELINE: 1651-1700
98 | 41805 41805.jpg AUTHOR: VERWILT, François, BORN-DIED: (b. ca. 1620, Rotterdam, d. 1691, Rotterdam), TITLE: Adoration of the Shepherds, DATE: c. 1660, TECHNIQUE: Oil on panel, 42 x 62 cm, LOCATION: Private collection, FORM: painting, TYPE: religious, SCHOOL: Dutch, TIMELINE: 1651-1700
99 | 42039 42039.jpg AUTHOR: VITTORIA, Alessandro, BORN-DIED: (b. 1525, Trento, d. 1608, Venezia), TITLE: St Sebastian, DATE: 1566, TECHNIQUE: Bronze, height 54 cm, LOCATION: Metropolitan Museum of Art, New York, FORM: sculpture, TYPE: religious, SCHOOL: Italian, TIMELINE: 1551-1600
100 | 43111 43111.jpg AUTHOR: WTEWAEL, Peter, BORN-DIED: (b. 1596, Utrecht, d. 1660, Utrecht), TITLE: Allegory of Love, DATE: -, TECHNIQUE: Oil on canvas, 68 x 106 cm, LOCATION: Private collection, FORM: painting, TYPE: mythological, SCHOOL: Dutch, TIMELINE: 1601-1650
101 | 43121 43121.jpg AUTHOR: WYCK, Jan, BORN-DIED: (b. 1644, Haarlem, d. 1700, Mortlake), TITLE: A Boating Scene, DATE: -, TECHNIQUE: Oil on canvas, 60 x 92 cm, LOCATION: Private collection, FORM: painting, TYPE: landscape, SCHOOL: Dutch, TIMELINE: 1651-1700
102 |
--------------------------------------------------------------------------------