Using reinforcement learning for dynamic planning in open-ended conversations
- conversational-ai
- dialogue-management
- dynamic-planning
- natural-language-processing
- reinforcement-learning
Dynamic Planning in Conversations using Reinforcement Learning
Dynamic planning, the ability to adapt and replan based on the flow of conversation, is crucial for creating engaging, open-ended human-to-assistant conversations. In a recent study, an RL-based approach was employed to enable assistants to plan multi-turn conversations towards a goal and to modify that plan dynamically. At the core of the response composition loop is a dialogue manager trained with off-policy RL, which allows the assistant to revise its original plan in real time as the conversation unfolds.
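As a minimal sketch of what such an off-policy dialogue manager could look like, the snippet below scores candidate next-turn actions with a simple linear Q-function and updates it from logged transitions. All names, dimensions, and the feature scheme are hypothetical illustrations, not the published implementation.

```python
# Sketch of an off-policy RL dialogue manager (hypothetical feature scheme).
# The manager scores candidate next-turn actions with a Q-function and is
# updated from logged conversations, letting it revise the plan at every turn.

import random
import numpy as np

STATE_DIM = 16   # assumed size of the conversation-state encoding
ACTION_DIM = 8   # assumed size of a candidate-action encoding

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.01, size=STATE_DIM + ACTION_DIM)  # linear Q-function

def q_value(state, action):
    """Score a candidate action (e.g. pivot to a related animal,
    offer an animal sound) in the current conversation state."""
    return float(weights @ np.concatenate([state, action]))

def select_action(state, candidates, epsilon=0.0):
    """Pick the highest-scoring candidate; epsilon-greedy during data collection."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda a: q_value(state, a))

def q_learning_update(batch, gamma=0.9, lr=0.01):
    """Off-policy Q-learning step on logged transitions
    (state, action, reward, next_state, next_candidates)."""
    global weights
    for state, action, reward, next_state, next_candidates in batch:
        target = reward
        if next_candidates:  # non-terminal turn: bootstrap from the best next action
            target += gamma * max(q_value(next_state, a) for a in next_candidates)
        td_error = target - q_value(state, action)
        weights += lr * td_error * np.concatenate([state, action])
```

Because the update bootstraps from logged transitions rather than from actions chosen by the current policy, the manager can learn from conversations collected under an earlier or supervised policy, which is what makes the training off-policy.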
The RL model was compared with a supervised transformer model in an experiment where Google Assistant conversed with users about animals. The RL model produced an increase in cooperative user responses and explicit positive feedback, along with a reduction in negative feedback, indicating improved user engagement. It also generated conversation plans not seen in the supervised data, such as sub-dialogues about animal sounds and pivoting to a new entity at every turn.
Future work involves equipping LLMs with dynamic planning capabilities through an RL framework to create more engaging conversational experiences.