├── LICENSE
├── README.md
├── baseline_selected_layer_attention_maps.png
├── configuration_qwen3.py
├── demo.py
├── figs
│   ├── benchmark_ppl_comparison.pdf
│   ├── lm_loss.pdf
│   ├── plot_compare.ipynb
│   ├── plot_curve.ipynb
│   └── tb_curves
│       ├── 3T-baseline.pt
│       └── 3T-gate.pt
├── gate_elementwise_selected_layer_attention_maps.png
├── gate_headwise_selected_layer_attention_maps.png
└── modeling_qwen3.py

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/LICENSE
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/README.md
--------------------------------------------------------------------------------

/baseline_selected_layer_attention_maps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/baseline_selected_layer_attention_maps.png
--------------------------------------------------------------------------------

/configuration_qwen3.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/configuration_qwen3.py
--------------------------------------------------------------------------------

/demo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/demo.py
--------------------------------------------------------------------------------

/figs/benchmark_ppl_comparison.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/figs/benchmark_ppl_comparison.pdf
--------------------------------------------------------------------------------

/figs/lm_loss.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/figs/lm_loss.pdf
--------------------------------------------------------------------------------

/figs/plot_compare.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/figs/plot_compare.ipynb
--------------------------------------------------------------------------------

/figs/plot_curve.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/figs/plot_curve.ipynb
--------------------------------------------------------------------------------

/figs/tb_curves/3T-baseline.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/figs/tb_curves/3T-baseline.pt
--------------------------------------------------------------------------------

/figs/tb_curves/3T-gate.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/figs/tb_curves/3T-gate.pt
--------------------------------------------------------------------------------

/gate_elementwise_selected_layer_attention_maps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/gate_elementwise_selected_layer_attention_maps.png
--------------------------------------------------------------------------------

/gate_headwise_selected_layer_attention_maps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/gate_headwise_selected_layer_attention_maps.png
--------------------------------------------------------------------------------

/modeling_qwen3.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qiuzh20/gated_attention/HEAD/modeling_qwen3.py
--------------------------------------------------------------------------------