├── CN.png
├── blue.png
├── red.png
├── README.md
└── text_attention.py


/CN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiesutd/Text-Attention-Heatmap-Visualization/HEAD/CN.png


--------------------------------------------------------------------------------
/blue.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiesutd/Text-Attention-Heatmap-Visualization/HEAD/blue.png


--------------------------------------------------------------------------------
/red.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiesutd/Text-Attention-Heatmap-Visualization/HEAD/red.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # TAHV：Text Attention Heatmap Visualization
 2 | 
 3 | This repository provides a simple visualization tool for the attention based NLP tasks. 
 4 | 
 5 | Many attention based NLP tasks visualize the text with attention weights as background. This code takes word list and the corresponding weights as input and generate the Latex code to visualize the attention based text. The Latex code will generates a standalone `.pdf` visulization file. Users can use this `.pdf` visulization file as vector diagram to demonstrate the attention ability of the model in their papers/slides/demos.
 6 | 
 7 | 
 8 | ## Usage
 9 | 
10 | It is very simple to use this code. Feed the word list and weight list in function `generate` with output Latex file directory and color configuration. The Latex file will be generated. Then compile the Latex file, the `.pdf` file will be generated.
11 | 
12 | * Notice the weight range: [0-100]
13 | 
14 | 
15 | ![alt text](red.png "Red demo")
16 | 
17 | 
18 | ![alt text](blue.png "Blue demo")
19 | 
20 | ![alt text](CN.png "Chinese demo")
21 | 
22 | 
23 | ## Citation:   
24 | 
25 | This repository will be part of new [NCRF++](https://github.com/jiesutd/NCRFpp). Please cite our [ACL demo paper](https://arxiv.org/abs/1806.05626) if you use this code.
26 | 
27 |     @inproceedings{yang2018ncrf,  
28 |      title={NCRF++: An Open-source Neural Sequence Labeling Toolkit},  
29 |      author={Yang, Jie and Zhang, Yue},  
30 |      booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
31 |      Url = {http://aclweb.org/anthology/P18-4013},
32 |      year={2018}  
33 |     }
34 | 
35 | 
36 | 
37 | ## Update
38 | * 2019-Apr-12, support Chinese
39 | * 2019-Apr-01, init version
40 | 
41 | 


--------------------------------------------------------------------------------
/text_attention.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8 -*-
 2 | # @Author: Jie Yang
 3 | # @Date:   2019-03-29 16:10:23
 4 | # @Last Modified by:   Jie Yang,     Contact: jieynlp@gmail.com
 5 | # @Last Modified time: 2019-04-12 09:56:12
 6 | 
 7 | 
 8 | ## convert the text/attention list to latex code, which will further generates the text heatmap based on attention weights.
 9 | import numpy as np
10 | 
11 | latex_special_token = ["!@#$%^&*()"]
12 | 
13 | def generate(text_list, attention_list, latex_file, color='red', rescale_value = False):
14 | 	assert(len(text_list) == len(attention_list))
15 | 	if rescale_value:
16 | 		attention_list = rescale(attention_list)
17 | 	word_num = len(text_list)
18 | 	text_list = clean_word(text_list)
19 | 	with open(latex_file,'w') as f:
20 | 		f.write(r'''\documentclass[varwidth]{standalone}
21 | \special{papersize=210mm,297mm}
22 | \usepackage{color}
23 | \usepackage{tcolorbox}
24 | \usepackage{CJK}
25 | \usepackage{adjustbox}
26 | \tcbset{width=0.9\textwidth,boxrule=0pt,colback=red,arc=0pt,auto outer arc,left=0pt,right=0pt,boxsep=5pt}
27 | \begin{document}
28 | \begin{CJK*}{UTF8}{gbsn}'''+'\n')
29 | 		string = r'''{\setlength{\fboxsep}{0pt}\colorbox{white!0}{\parbox{0.9\textwidth}{'''+"\n"
30 | 		for idx in range(word_num):
31 | 			string += "\\colorbox{%s!%s}{"%(color, attention_list[idx])+"\\strut " + text_list[idx]+"} "
32 | 		string += "\n}}}"
33 | 		f.write(string+'\n')
34 | 		f.write(r'''\end{CJK*}
35 | \end{document}''')
36 | 
37 | def rescale(input_list):
38 | 	the_array = np.asarray(input_list)
39 | 	the_max = np.max(the_array)
40 | 	the_min = np.min(the_array)
41 | 	rescale = (the_array - the_min)/(the_max-the_min)*100
42 | 	return rescale.tolist()
43 | 
44 | 
45 | def clean_word(word_list):
46 | 	new_word_list = []
47 | 	for word in word_list:
48 | 		for latex_sensitive in ["\\", "%", "&", "^", "#", "_",  "{", "}"]:
49 | 			if latex_sensitive in word:
50 | 				word = word.replace(latex_sensitive, '\\'+latex_sensitive)
51 | 		new_word_list.append(word)
52 | 	return new_word_list
53 | 
54 | 
55 | if __name__ == '__main__':
56 | 	## This is a demo:
57 | 
58 | 	sent = '''the USS Ronald Reagan - an aircraft carrier docked in Japan - during his tour of the region, vowing to "defeat any attack and meet any use of conventional or nuclear weapons with an overwhelming and effective American response".
59 | North Korea and the US have ratcheted up tensions in recent weeks and the movement of the strike group had raised the question of a pre-emptive strike by the US.
60 | On Wednesday, Mr Pence described the country as the "most dangerous and urgent threat to peace and security" in the Asia-Pacific.'''
61 | 	sent = '''我 回忆 起 我 曾经 在 大学 年代 ， 我们 经常 喜欢 玩 “ Hawaii guitar ” 。 说起 Guitar ， 我 想起 了 西游记 里 的 琵琶精 。
62 | 	今年 下半年 ， 中 美 合拍 的 西游记 即将 正式 开机 ， 我 继续 扮演 美猴王 孙悟空 ， 我 会 用 美猴王 艺术 形象 努力 创造 一 个 正能量 的 形象 ， 文 体 两 开花 ， 弘扬 中华 文化 ， 希望 大家 能 多多 关注 。'''
63 | 	words = sent.split()
64 | 	word_num = len(words)
65 | 	attention = [(x+1.)/word_num*100 for x in range(word_num)]
66 | 	import random
67 | 	random.seed(42)
68 | 	random.shuffle(attention)
69 | 	color = 'red'
70 | 	generate(words, attention, "sample.tex", color)


--------------------------------------------------------------------------------