Repository: flash-attention-v2-RDNA3-minimal
(File contents are mirrored at https://raw.githubusercontent.com/Repeerc/flash-attention-v2-RDNA3-minimal/HEAD/<path>)

├── .gitignore
├── GPU_peak_perf_test.cu
├── GPU_peak_perf_test.py
├── LICENSE
├── README.md
├── RGP_Capture.py
├── bench_with_ck.py
├── bench_with_ck_BNHD.py
├── bench_with_ck_bf16_BNHD.py
├── bench_with_sdpa.py
├── bench_with_sdpa_BNHD.py
├── bench_with_sdpa_bf16.py
├── bench_with_sdpa_bf16_BNHD.py
├── bench_with_triton.py
├── bench_with_triton_ck.py
├── bench_with_triton_ck_BNHD.py
├── bench_with_triton_ck_BNHD_linux.py
├── bench_with_triton_ck_linux.py
├── brbcCalc.xlsx
├── ck_fttn
│   ├── ck_fttn_lib.dll
│   └── ck_fttn_pyb.pyd
├── dummy.cpp
├── gemm_test
│   ├── host.cpp
│   ├── kernel.cu
│   ├── kernel_builtin_wmma.cu
│   ├── kernel_builtin_wmma_A@B.cu
│   ├── kernel_builtin_wmma_w64.cu
│   ├── kernel_fp32.cu
│   ├── kernel_hgemm_AB_T.cu
│   ├── test1.py
│   ├── test1_builtin_w32.py
│   ├── test1_builtin_w32_A@B.py
│   ├── test1_fp32.py
│   ├── test1_hgemm_A@BT.py
│   ├── test1_w64.py
│   └── zluda_hijack_torch_hip_ext.py
├── precision_test.py
├── precision_test_fp32ver.py
├── precision_test_triton.py
├── pure_torch_ver.py
├── rocwmma_fattn
│   ├── FlashAttn.py
│   ├── host.cpp
│   ├── kernel_bf16.cu
│   ├── kernel_fp16.cu
│   └── zluda_hijack_torch_hip_ext.py
├── sdpa_test.py
├── test_arrange.py
└── triton_fused_attention.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.vscode/
__pycache__/
build/
tmp_test/
*.png
*.hip

--------------------------------------------------------------------------------
/dummy.cpp:
--------------------------------------------------------------------------------
(empty file)
--------------------------------------------------------------------------------
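
A hypothetical usage sketch (not part of the repository): the rocwmma_fattn directory pairs a host.cpp with HIP WMMA kernels, which suggests they can be JIT-built into a PyTorch extension. The module name, build options, and whatever functions host.cpp actually binds are assumptions here; the repository's real loading logic lives in rocwmma_fattn/FlashAttn.py.

    # Hypothetical sketch -- not taken from the repo. Shows one way the
    # rocwmma_fattn sources could be JIT-compiled with PyTorch's C++ extension
    # loader; the bound function names are unknown here and therefore not called.
    import torch
    from torch.utils.cpp_extension import load

    fattn_ext = load(
        name="rocwmma_fattn_ext",                  # assumed module name
        sources=["rocwmma_fattn/host.cpp",
                 "rocwmma_fattn/kernel_fp16.cu"],  # paths from the tree above
        verbose=True,
    )
    # Entry points registered by host.cpp (e.g. a forward/backward binding)
    # would appear as attributes of fattn_ext; see rocwmma_fattn/FlashAttn.py
    # for the repository's actual API.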