# Reinforcement-Learning-For-Dialogue-Systems
Reinforcement learning for dialogue systems: a summary of papers and open-source applications.


## Papers
1. End-to-End Task-Completion Neural Dialogue Systems
https://arxiv.org/pdf/1703.01008

2. A User Simulator for Task-Completion Dialogues (2016)
https://arxiv.org/pdf/1612.05688

3. Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning
ICASSP 2018
https://arxiv.org/pdf/1710.11277

4. Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning
https://arxiv.org/pdf/1704.03084

5. Subgoal Discovery for Hierarchical Dialogue Policy Learning
EMNLP 2018
https://arxiv.org/pdf/1804.07855

6. Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning
ACL 2018
https://arxiv.org/pdf/1801.06176

7. Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning
EMNLP 2018
https://arxiv.org/pdf/1808.09442

8. Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
AAAI 2019
https://arxiv.org/pdf/1811.07550

9. Budgeted Policy Learning for Task-Oriented Dialogue Systems
ACL 2019
https://arxiv.org/pdf/1906.00499

10. Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog
EMNLP 2019

11. Su P H, Budzianowski P, Ultes S, et al. Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management. arXiv preprint arXiv:1707.00130, 2017.

12. Weisz G, Budzianowski P, Su P H, et al. Sample efficient deep reinforcement learning for dialogue systems with large action spaces.

13. He J, Chen J, He X, et al. Deep reinforcement learning with a natural language action space. arXiv preprint arXiv:1511.04636, 2015.

14. Casanueva I, Budzianowski P, Su P H, et al. Feudal reinforcement learning for dialogue management in large domains. arXiv preprint arXiv:1803.03232, 2018.

15. Abel D, Salvatier J, Stuhlmüller A, et al. Agent-agnostic human-in-the-loop reinforcement learning. arXiv preprint arXiv:1701.04079, 2017.

16. Ross S, Gordon G, Bagnell D. A reduction of imitation learning and structured prediction to no-regret online learning. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011: 627-635.

17. Chen L, Zhou X, Chang C, et al. Agent-aware dropout DQN for safe and efficient on-line dialogue policy learning. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017: 2454-2464.

18. Dialogue Environments are Different from Games: Investigating Variants of Deep Q-Networks for Dialogue Policy


## Open-source implementations
### Microsoft's open-source end-to-end dialogue system framework ConvLab: https://github.com/ConvLab/ConvLab
### RL algorithms used for dialogue policy in ConvLab:

- DQN: 2013 - Playing Atari with Deep Reinforcement Learning
- REINFORCE: 1992 - Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning. (A minimal sketch follows this list.)
- PPO: 2017 - Proximal Policy Optimization Algorithms. https://arxiv.org/pdf/1707.06347.pdf
- PPO's self-imitation variant: 2018 - Self-Imitation Learning. https://arxiv.org/abs/1806.05635
- HRL: 2017 - Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning
- A2C (on-policy): Asynchronous Methods for Deep Reinforcement Learning
- A2C with an extra SIL loss term: Self-Imitation Learning. https://arxiv.org/abs/1806.05635
- SARSA
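The actual policy implementations live in the ConvLab repository above. As a rough illustration of the simplest algorithm in the list, here is a minimal REINFORCE update for a flat dialogue-act policy in PyTorch. This is a sketch, not ConvLab's code: the state featurization, action inventory, network shape, and reward signal are all hypothetical assumptions.

```python
# Minimal REINFORCE sketch for a dialogue policy (hypothetical, not ConvLab's API).
# Assumption: the dialogue state is a fixed-size feature vector and actions are
# indices into a flat list of system dialogue acts.
import torch
import torch.nn as nn


class DialoguePolicy(nn.Module):
    def __init__(self, state_dim: int, num_acts: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_acts),
        )

    def forward(self, state):
        # Returns unnormalized logits over the dialogue-act inventory.
        return self.net(state)


def reinforce_update(policy, optimizer, episode, gamma=0.99):
    """One policy-gradient step from a single finished dialogue.

    episode: list of (state, action, reward) tuples, where state is a
    1-D float tensor and action is an int index.
    """
    # Compute the discounted return G_t for every turn, back to front.
    returns, g = [], 0.0
    for _, _, r in reversed(episode):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)
    if returns.numel() > 1:  # normalize returns as a cheap variance reduction
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE loss: maximize E[G_t * log pi(a_t | s_t)].
    loss = 0.0
    for (state, action, _), g in zip(episode, returns):
        log_probs = torch.log_softmax(policy(state), dim=-1)
        loss = loss - log_probs[action] * g
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Usage would look like `policy = DialoguePolicy(state_dim=300, num_acts=40)` with `torch.optim.Adam(policy.parameters(), lr=1e-3)` and, say, a terminal reward of +1 for task success; those numbers are placeholders. Normalizing returns per dialogue is the crudest variance-reduction trick; a learned baseline or critic (as in the A2C variants above) is the more common choice in practice.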
### RL algorithms used for dialogue policy in Tsinghua's dialogue system toolkit tatk (https://github.com/thu-coai/tatk):

- Policy Gradient: Simple statistical gradient-following algorithms for connectionist reinforcement learning
- PPO: Proximal Policy Optimization Algorithms (a sketch of the clipped objective follows this list)
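Both ConvLab and tatk include PPO, whose core is the clipped surrogate objective from "Proximal Policy Optimization Algorithms". Below is a minimal sketch of just that loss term, assuming the advantage estimates and old-policy log-probabilities have already been computed elsewhere; it is an illustration, not either toolkit's actual code.

```python
# Clipped surrogate loss from PPO (a sketch under the assumptions stated above).
import torch


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """All arguments are 1-D tensors over a batch of (state, action) pairs:
    log pi_new(a|s), log pi_old(a|s), and advantage estimates A(s, a)."""
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic (element-wise minimum) bound, then negate so that
    # minimizing this loss maximizes the clipped surrogate objective.
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps each update close to the data-collecting policy, which is why PPO tolerates several epochs of minibatch updates on the same batch of dialogues, unlike plain policy gradient.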