├── HiChunk_train_config.yaml
├── LICENSE
├── README.md
├── assets
    ├── framework.png
    ├── logo.svg
    ├── youtu_lab.png
    └── youtu_mascot.png
├── config
    ├── dataset2maxlen.json
    ├── dataset2prompt.json
    ├── model2maxlen.json
    └── model2path.json
├── cov_mertic.py
├── eval.py
├── metrics.py
├── pipeline
    ├── chunking
    │   ├── HiChunk
    │   │   ├── HiChunk.py
    │   │   └── hi_chunking.sh
    │   ├── LumberChunk
    │   │   ├── lumber_chunk.py
    │   │   └── lumber_chunking.sh
    │   ├── SemanticChunk
    │   │   ├── semantic_chunk.py
    │   │   └── semantic_chunking.sh
    │   └── chunk_result_analysis.py
    ├── embedding
    │   └── generate_passage_embeddings.py
    ├── indexing
    │   ├── indexing.py
    │   └── splitter.py
    ├── mBGE.sh
    ├── merge_output.py
    └── retrieval
    │   └── passage_retrieval.py
├── pred.py
├── process_train_data.ipynb
├── requirements.txt
└── retrieval_algo.py

/HiChunk_train_config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/HiChunk_train_config.yaml
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/README.md
--------------------------------------------------------------------------------
/assets/framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/assets/framework.png
--------------------------------------------------------------------------------
/assets/logo.svg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/assets/logo.svg
--------------------------------------------------------------------------------
/assets/youtu_lab.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/assets/youtu_lab.png
--------------------------------------------------------------------------------
/assets/youtu_mascot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/assets/youtu_mascot.png
--------------------------------------------------------------------------------
/config/dataset2maxlen.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/config/dataset2maxlen.json
--------------------------------------------------------------------------------
/config/dataset2prompt.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/config/dataset2prompt.json
--------------------------------------------------------------------------------
/config/model2maxlen.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/config/model2maxlen.json
--------------------------------------------------------------------------------
/config/model2path.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/config/model2path.json
--------------------------------------------------------------------------------
/cov_mertic.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/cov_mertic.py
--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/eval.py
--------------------------------------------------------------------------------
/metrics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/metrics.py
--------------------------------------------------------------------------------
/pipeline/chunking/HiChunk/HiChunk.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/chunking/HiChunk/HiChunk.py
--------------------------------------------------------------------------------
/pipeline/chunking/HiChunk/hi_chunking.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/chunking/HiChunk/hi_chunking.sh
--------------------------------------------------------------------------------
/pipeline/chunking/LumberChunk/lumber_chunk.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/chunking/LumberChunk/lumber_chunk.py
--------------------------------------------------------------------------------
/pipeline/chunking/LumberChunk/lumber_chunking.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/chunking/LumberChunk/lumber_chunking.sh
--------------------------------------------------------------------------------
/pipeline/chunking/SemanticChunk/semantic_chunk.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/chunking/SemanticChunk/semantic_chunk.py
--------------------------------------------------------------------------------
/pipeline/chunking/SemanticChunk/semantic_chunking.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/chunking/SemanticChunk/semantic_chunking.sh
--------------------------------------------------------------------------------
/pipeline/chunking/chunk_result_analysis.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/chunking/chunk_result_analysis.py
--------------------------------------------------------------------------------
/pipeline/embedding/generate_passage_embeddings.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/embedding/generate_passage_embeddings.py
--------------------------------------------------------------------------------
/pipeline/indexing/indexing.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/indexing/indexing.py
--------------------------------------------------------------------------------
/pipeline/indexing/splitter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/indexing/splitter.py
--------------------------------------------------------------------------------
/pipeline/mBGE.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/mBGE.sh
--------------------------------------------------------------------------------
/pipeline/merge_output.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/merge_output.py
--------------------------------------------------------------------------------
/pipeline/retrieval/passage_retrieval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pipeline/retrieval/passage_retrieval.py
--------------------------------------------------------------------------------
/pred.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/pred.py
--------------------------------------------------------------------------------
/process_train_data.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/process_train_data.ipynb
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/requirements.txt
--------------------------------------------------------------------------------
/retrieval_algo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TencentYoutuResearch/HiChunk/HEAD/retrieval_algo.py
--------------------------------------------------------------------------------