├── README.md
└── assets
    ├── drawio
        ├── Llama2.drawio
        ├── architecture.drawio
        ├── flash-attention-1.drawio
        └── memory-hierarchy.drawio
    ├── pdf
        ├── Llama2.pdf
        ├── architecture.pdf
        ├── flash-attention-1.pdf
        ├── memory-hierarchy.pdf
        ├── mha-gqa-mqa.png
        ├── nvidia-gh100.png
        ├── paged-attention.png
        └── sa_mha.pdf
    └── svg
        ├── Llama2.svg
        └── architecture.svg

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/README.md
--------------------------------------------------------------------------------
/assets/drawio/Llama2.drawio:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/drawio/Llama2.drawio
--------------------------------------------------------------------------------
/assets/drawio/architecture.drawio:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/drawio/architecture.drawio
--------------------------------------------------------------------------------
/assets/drawio/flash-attention-1.drawio:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/drawio/flash-attention-1.drawio
--------------------------------------------------------------------------------
/assets/drawio/memory-hierarchy.drawio:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/drawio/memory-hierarchy.drawio
--------------------------------------------------------------------------------
/assets/pdf/Llama2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/Llama2.pdf
--------------------------------------------------------------------------------
/assets/pdf/architecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/architecture.pdf
--------------------------------------------------------------------------------
/assets/pdf/flash-attention-1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/flash-attention-1.pdf
--------------------------------------------------------------------------------
/assets/pdf/memory-hierarchy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/memory-hierarchy.pdf
--------------------------------------------------------------------------------
/assets/pdf/mha-gqa-mqa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/mha-gqa-mqa.png
--------------------------------------------------------------------------------
/assets/pdf/nvidia-gh100.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/nvidia-gh100.png
--------------------------------------------------------------------------------
/assets/pdf/paged-attention.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/paged-attention.png
--------------------------------------------------------------------------------
/assets/pdf/sa_mha.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/pdf/sa_mha.pdf
--------------------------------------------------------------------------------
/assets/svg/Llama2.svg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/svg/Llama2.svg
--------------------------------------------------------------------------------
/assets/svg/architecture.svg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/godaai/llm-inference/HEAD/assets/svg/architecture.svg
--------------------------------------------------------------------------------