A while ago, Jon Oberlander wrote a squib for Computational Linguistics called "Do the Right Thing ... but Expect the Unexpected", in which he argued that when people speak, they often succeed in choosing what to say in accordance with the maxim to do the right thing, which in this case means producing the utterance that the listener will find easiest to interpret and make sense of. But they also sometimes fail, and the article points out that reasonable generation algorithms may well do the wrong (i.e. unexpected) thing too, and that this should come as no surprise.
In the intervening years, things have happened, some of them expected, some of them unexpected. Among them was the invention, by Donald Rumsfeld, of a useful meme for dichotomization. He split the unknowns of a situation into the "known unknowns" and the "unknown unknowns". This also works for the "unexpected": we have the "expected unexpected" and the "unexpected unexpected". Both of these turn up in information-seeking natural language dialog systems. The expected exchanges of such a dialog system are things like question-answer pairs. The "expected unexpected" are the points at which knowledge gaps and other misfires create the need for dialog moves, such as the initiators for clarification sub-dialogs, that are there to fix difficulties and get the system and its interlocutor out of the ditch next to the royal road of goal-directed dialog and back onto the smooth, well-maintained tarmac. The "unexpected unexpected" is when something happens that leads the system to believe that its interlocutor is off in the next field climbing a tree, talking to a cow, or even climbing a cow and talking to a tree. The system has no conventional moves for getting things back on track.
At this point the system may be tempted to conclude that it ought to engage in sophisticated reasoning in order to work out what the appropriate repair is, by, for example, recruiting extra knowledge from somewhere until it can work out that its interlocutor is after all doing something rational. This is going to take work, which is scary. Even scarier, this work resembles the effortful, rational, slow thinking that Daniel Kahneman calls "System 2". Kahneman points out that System 2 thought comes less naturally than System 1 thought, which is more automatic. I think that modern dialog systems, especially the ones that work by reinforcement learning, are basically operating in a way that mirrors System 1, choosing dialog moves that, from experience, tend to work out well in moving the interaction along. They actually do a bit more than this, because they can often tell when the dialog is in a ditch, and get it out. Their remedies (such as clarification requests) are conventional, stereotyped and maybe under-informed, but they are a bit flexible, and they usually work in handling the expected unexpected. Maybe they are something like "System 1.5", with a bit of flexibility, but not enough to handle the cases where the dialog seems to be off in the next field. I doubt that there is any hope of learning System 2 thought by reinforcement learning over dialog traces.
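To make the "System 1.5" idea a little more concrete, here is a minimal sketch. Everything in it (the `System15Policy` class, the move names, the confidence thresholds) is hypothetical and invented for illustration, not drawn from any existing system: a policy that answers directly when the exchange looks expected, falls back to a stereotyped clarification request for the expected unexpected, and simply flags the unexpected unexpected rather than attempting anything like System 2 reasoning.

```python
# A minimal, hypothetical sketch of a "System 1.5" dialog policy.
# All names (Situation, Move, System15Policy) are invented for illustration;
# they do not correspond to any real library or system.

from dataclasses import dataclass
from enum import Enum, auto


class Situation(Enum):
    EXPECTED = auto()               # e.g. a well-formed question the system can answer
    EXPECTED_UNEXPECTED = auto()    # a known kind of misfire, e.g. a knowledge gap
    UNEXPECTED_UNEXPECTED = auto()  # the interlocutor appears to be up a cow


@dataclass
class Move:
    kind: str
    content: str


class System15Policy:
    """Pick dialog moves the 'System 1.5' way: learned, conventional, a bit flexible."""

    def classify(self, user_turn: str, confidence: float) -> Situation:
        # Stand-in for whatever learned model scores the turn; thresholds are arbitrary.
        if confidence > 0.8:
            return Situation.EXPECTED
        if confidence > 0.3:
            return Situation.EXPECTED_UNEXPECTED
        return Situation.UNEXPECTED_UNEXPECTED

    def choose_move(self, user_turn: str, confidence: float) -> Move:
        situation = self.classify(user_turn, confidence)
        if situation is Situation.EXPECTED:
            # The royal road: answer directly, as the learned policy would.
            return Move("answer", f"(answer to: {user_turn!r})")
        if situation is Situation.EXPECTED_UNEXPECTED:
            # Conventional, stereotyped repair: initiate a clarification sub-dialog.
            return Move("clarify", "Sorry, could you rephrase that?")
        # Unexpected unexpected: no conventional move exists, so be kind and
        # hand off rather than attempting System 2 style reasoning.
        return Move("fallback", "I'm not sure I can help with that, but let's try again.")
```

The point of the sketch is only that the policy's repertoire is bounded: it has learned moves for the expected, stereotyped repairs for the expected unexpected, and nothing but a polite shrug for the unexpected unexpected.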
That's OK, because there are two quite distinct reasons why the system might think the dialog is off in the next field. Either the situation will become sensible once the system manages to find the chain of reasoning that allows it to understand that the interlocutor is acting reasonably, or, perhaps just as likely, the interlocutor really is up a cow, talking to a tree, and no amount of inference will get the dialog back where it needs to be. This interlocutor is beyond the pale, and the best that they can expect is kindness (which, coincidentally, Don Rumsfeld ... hmm, let's leave that thought unfinished).
So, if I ever get to design a dialog system, it will be called System 1.5, and it will adopt the Rumsfeldian philosophy of expecting the expected unexpected, then doing the conventional thing.