Most AI is developed with human interaction as an afterthought: first comes function and capability, and then we worry about making the AI agent helpful and safe. What would it mean to formulate the problem with humans in mind from the get-go, and how would the algorithms used to generate AI actions and predictions change as a consequence?
First, AI agents assisting us will need to do what we, humans, want them to. Figuring out how to generate optimal actions is half the battle; the other half is figuring out the objective itself. Second, AI agents will need to act interactively with us, from robots moving around us in our homes, to cars coordinating with us on our roads, to personal assistants, say, proposing alternative flight options for us to choose from.
Our group’s research agenda is to formalize and algorithmically solve the problem of AI action not in isolation, but for assistance of and in coordination with people – this is what we call AI action for people, and around people.
Approach. We formalize the problem as a human-AI system. The key to our approach is modeling people as intentional agents, whose behavior is driven (albeit imperfectly) by what they want. We formalize intent – someone’s goals, preferences about the task, and preferences about their interaction with the agent – generally via a reward function whose parameters are unknown to the agent, and thus latent state. AI action for people is then defined by optimality with respect to this reward function. The same function drives human behavior, so AI action around people becomes a response to this behavior. This framework gives us a systematic way to break down interaction into its core components: identifying sources of evidence about the latent state (1), modeling their relation to the latent state to perform estimation (2), generating the agent’s actions with respect to the latent state, in light of the human also acting simultaneously (3), and accounting for the engineer/designer as a human that is part of the human-AI system (4).
Figure 1: A depiction of the human-AI interaction problem using a robot fetching coffee task. The human wants coffee, which the robot does not directly observe (latent state θH). The robot can treat human actions aH and even the state of the world x as observations about θH (1), via a model of how the human acts based on θH (2). The human knows the robot is trying to help, and the human and the robot act simultaneously (3). Here, the fact that the pot has already brewed coffee (x) leaks information that the human values coffee enough to brew it. The human is moving to set the cup down next to the pot (aH), expecting the robot will fill it. The robot uses x and aH to figure out what the human wants, and starts reaching for the pot.
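As a minimal sketch of this formalization (in our own notation, assuming a standard noisily-rational observation model rather than reproducing any single paper), the human's intent is a latent reward parameter $\theta_H$, human actions are evidence about it, and the robot acts under its belief:

$$a_H \sim \pi_H(a_H \mid x, \theta_H) \propto \exp\big(\beta\, Q_H(x, a_H; \theta_H)\big), \qquad b(\theta_H) = P(\theta_H \mid x, a_H),$$
$$a_R^* = \arg\max_{a_R} \; \mathbb{E}_{\theta_H \sim b}\big[\, Q(x, a_R, a_H; \theta_H) \,\big].$$

Here the human's actions (and the state they have shaped) serve as observations (1), the Boltzmann model links them to $\theta_H$ (2), and the robot optimizes under its belief $b$ while the human keeps acting (3).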
To estimate human intent, the agent needs some source of evidence about it – it needs observations (or measurements). Traditionally in AI, these come from the human demonstrating the task on their own, which is both tedious and often insufficient. Instead, we’ve shown that when you step outside of an artificial lab setting, observations beyond demonstrations become readily available:
People leak information about what they want when they physically react to what the robot does (e.g. push it away from them [1, 2]), when they tell it to stop what it’s doing or switch it off [3], etc. And it gets even better: even someone’s environment itself leaks information about their preferences [4], because they have been acting in it according to those preferences!
Imagine someone tells you to clean up a room, and as you walk in you see in the corner an elaborate house of cards. Even though the person didn’t say it, you implicitly know you shouldn’t clean up the house of cards. This is an example of the environment itself implicitly communicating about what the robot should optimize. We’ve contributed algorithms for learning from these sources, as well as from explicit human feedback such as answers to comparison queries [5] or feature queries [6].
Figure 2: Observations about what people want. (a) Physical corrections. Left: the robot is compliant, but does not learn from the human’s intervention. Right: the robot uses the user’s external torque to update its estimate of how the user wants the robot to move. (b) The state of the environment: the user tells the robot to go to the door, but the robot figures out, from the fact that the vase in the middle of the room is still intact, that it should probably go around it.
We’ve also enabled agents to capitalize on all these sources together by contributing a unifying formalism for reward learning from human feedback [7]. The idea started by looking at explicit sources and generalizing from there. For instance, when the person tells the agent they prefer one behavior over another, we know how to interpret that – we see it as a choice that the person is making, with respect to the reward (this is now known as RLHF). When the person gives a demonstration to the agent, they are also making a choice – it’s just that this one is implicit: they are choosing the demonstrated behavior over all the other behaviors they could have demonstrated, but chose not to. Again, this choice is relative to the reward. When the person, say, turns the robot off, we argue they are also making a choice: they could have done nothing and let the robot continue, but, implicitly, they chose not to. Thus, to make sense of such leaked information, as well as explicit feedback like scalar rewards, comparisons, or credit assignment, all in one algorithm,
We proposed that all human feedback, despite its diversity, can be seen as a choice the human makes implicitly – a choice that is based on the reward function, even though the reward of a choice often cannot be directly evaluated.
The trick is that when choices are not agent behaviors, as in turning the robot off, correcting it, or even the state of the environment, they can still be grounded in behavior. Find the grounding, and we know how to link the choice to the reward. For instance, switching the robot off grounds to the trajectory the robot was pursuing, followed by staying stopped for the rest of the time horizon. This lens helped provide conceptual clarity on reward learning, enabled agents to combine different types of feedback together and actively select informative feedback types, and also gave us a recipe for formalizing yet-to-be-invented sources of information.
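As a rough illustration of this "feedback as choice" lens – a simplified Python sketch in which the grounding function, the particle-based belief over reward parameters, and all names are our own, not taken from [7] – the same likelihood and update can handle demonstrations, comparisons, corrections, or the off switch, as long as each option can be grounded to a behavior:

```python
import numpy as np

def choice_likelihood(chosen, alternatives, ground, reward_fn, theta, beta=1.0):
    """P(human picks `chosen` from `alternatives`) under a Boltzmann choice model.
    `ground` maps any feedback option (a demo, a comparison option, "switch off",
    even the current environment state) to the robot behavior it implies."""
    utilities = np.array([beta * reward_fn(ground(c), theta) for c in alternatives])
    utilities -= utilities.max()            # numerical stability
    probs = np.exp(utilities) / np.exp(utilities).sum()
    return probs[alternatives.index(chosen)]

def update_reward_belief(particles, weights, chosen, alternatives, ground, reward_fn):
    """Bayesian update over reward parameters theta (represented by particles),
    reusable across feedback types because they all reduce to choices."""
    new_w = np.array([w * choice_likelihood(chosen, alternatives, ground, reward_fn, th)
                      for th, w in zip(particles, weights)])
    return new_w / new_w.sum()
```

For the off switch, for example, `ground` would map "off" to the trajectory the robot was pursuing followed by staying stopped for the rest of the horizon, and "continue" to the original trajectory.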
How does human behavior, which is observable, relate to the human’s underlying intent? While thinking of human behavior as noisily rational has taken us far, behavioral economics has long warned us that people exhibit systematic deviations from rationality. We’ve shown that such deviations can make intent inference go completely wrong, so the AI agent really needs to account for them. Unfortunately, while behavioral economics has identified a plethora of domain-specific heuristics, attempting to somehow encode these for the agent is not scalable. Instead, we noticed that much of the behavior that appears irrational might actually be derivable from first principles:
Our idea was to treat people as actually rational, but under different assumptions than the agent’s.
We can then leverage data to learn what these assumptions are [8]. For instance, while users operating a complex robot might seem really suboptimal, their actions make perfect sense under their own, internal, dynamics model – people have a notion of intuitive physics that fails to capture all the intricacies of the real system; learn what this internal model is, and that gives us the key to figure out what they want and assist them in spite of the suboptimality. We’ve therefore used this generalized rationality framework to improve robot assistance by modeling people as assuming a different dynamics model [9], a shorter time horizon [10], or as still learning about their own preferences [11]. Further, the noisy-rationality model came from econometrics and discrete choice spaces, and we showed better performance by rederiving it for the continuous spaces of robotics [12].
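A minimal sketch of this generalized-rationality idea (heavily simplified, with hypothetical function names; the actual models in [8–11] are richer): the human is scored as Boltzmann-rational under their own internal dynamics, and we pick the internal model that best rationalizes their observed behavior:

```python
import numpy as np

def human_action_logprob(a_h, x, theta, internal_step, value_fn, actions, beta=1.0):
    """Log P(a_h | x, theta) for a human who is rational under their *own* internal
    dynamics model `internal_step`, which may differ from the true dynamics."""
    logits = beta * np.array([value_fn(internal_step(x, a), theta) for a in actions])
    m = logits.max()
    log_z = m + np.log(np.exp(logits - m).sum())
    return logits[actions.index(a_h)] - log_z

def fit_internal_model(data, candidate_models, theta, value_fn, actions):
    """Learn the human's assumptions from data: pick the candidate internal
    dynamics model that best explains the observed (state, action) pairs."""
    return max(candidate_models,
               key=lambda name: sum(
                   human_action_logprob(a, x, theta, candidate_models[name],
                                        value_fn, actions)
                   for x, a in data))
```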
Figure 3: a-b: The robot accounts for the human as rational under a shorter time horizon, and can better assist by compensating for the human’s myopic behavior. c-d: The robot detects misspecification by estimating the human’s apparent irrationality – as the human is avoiding the coffee spill which the robot does not know about, the human appears irrational, the robot starts making higher variance predictions, and its plans automatically give the person more room.
Nonetheless, no model is ever perfect, which leads agents to infer the wrong reward and, in the robotics case, to get unsafely close to the person due to wrong predictions. We’ve proposed that the agent should estimate the person’s apparent rationality online:
If the person appears irrational under the agent’s assumptions, that simply means the agent has the wrong assumptions about the person.
We have applied this idea to assistance, where the robot detects that the human’s demonstrations or corrections cannot be explained by the set of features it currently has access to [14]. We also applied it to coordination, where this estimation naturally leads to higher variance predictions when the model is wrong [13]. What happens next is particularly exciting: rather than having to somehow intervene and heuristically make the robot more conservative, these higher variance predictions – modeling a person who appears less rational to the robot – automatically lead to plans where the robot gives the human more space, at least until their behavior starts making sense to the robot’s model again.
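A simplified sketch of online apparent-rationality estimation (a discretized toy version of the idea in [13, 14]; the grid over beta and the names are ours): the robot maintains a belief over how rational the human looks under its model, given the model's Q-values `q` (an array, one entry per candidate human action), and its predictions are marginalized over that belief:

```python
import numpy as np

def update_rationality_belief(betas, belief, q, a_h_idx):
    """Bayesian update over the human's apparent rationality coefficient beta,
    given the index of the action the human actually took."""
    for i, beta in enumerate(betas):
        logits = beta * q
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        belief[i] *= probs[a_h_idx]
    return belief / belief.sum()

def predict_human_action(betas, belief, q):
    """Prediction marginalized over beta. When behavior stops making sense under
    the model, mass shifts to low beta and the prediction flattens out (higher
    variance) -- which is what makes the planner give the person more room."""
    pred = np.zeros_like(q, dtype=float)
    for beta, w in zip(betas, belief):
        logits = beta * q
        p = np.exp(logits - logits.max())
        pred += w * p / p.sum()
    return pred
```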
The agent’s actions and even its very existence influence human behavior, because people make different decisions when they interact than when they act in isolation. It is tempting to address coordination by predicting what people would do in isolation, and having robots stay out of the way. But that leads, for instance, to cars failing to merge on a highway because they can’t get into their target lane, unable to realize that their actions can actually influence what people do.
Cars can turn seemingly infeasible plans into feasible ones if they account for the mutual influence between their actions and the humans’.
We realized this mutual influence is best characterized by a general-sum dynamic game, and developed several approximations (static Stackelberg [15], hierarchical decomposition [16], a game-theoretic equivalent of iLQR [17]). These enabled cars to figure out beautiful coordination strategies, like how to negotiate merges, or that backing up at an intersection makes it more likely for the other driver to proceed first.
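For concreteness, here is a heavily simplified nested-optimization version of the Stackelberg-style approximation (the control parameterization and reward signatures are our illustration, not the algorithms of [15–17]): the robot optimizes its controls while, inside that optimization, the human is modeled as best-responding to them:

```python
import numpy as np
from scipy.optimize import minimize

def human_best_response(u_r, x0, human_reward, u_dim):
    """Inner problem: the human controls that maximize the human's reward,
    with the robot's planned controls u_r held fixed."""
    res = minimize(lambda u_h: -human_reward(x0, u_r, u_h), np.zeros(u_dim))
    return res.x

def robot_plan(x0, robot_reward, human_reward, u_dim):
    """Outer problem: the robot picks controls accounting for the human's
    response to them -- this is how influence (e.g. nudging into a lane and
    trusting the other driver to slow down) can emerge from the optimization
    rather than being handcrafted."""
    def objective(u_r):
        u_h = human_best_response(u_r, x0, human_reward, u_dim)
        return -robot_reward(x0, u_r, u_h)
    return minimize(objective, np.zeros(u_dim)).x
```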
Similarly for assistance, people don’t behave as if they are alone:
People are aware that the AI agent is observing them, trying to learn what they want – so they try to be informative! In turn, agents should not be stuck as passive observers, but can leverage their physical actions to speed up learning.
Our idea was to formalize these phenomena via a common payoff game between the human and the robot, in which only the human can access the parameters of the shared utility – we call this the “assistance game” [18]. We’ve shown how to approximate it with an exponential reduction in complexity via a new Bellman update [19, 20], and we’ve also shown how robots can explicitly seek information gain via their physical actions, i.e. make “embodied queries”: we’ve gotten cars to invent strategies like inching forward at intersections or nudging into a lane to probe someone’s driving style [21], which otherwise would be handcrafted; and we’ve gotten manipulators to hold objects in a way such that the person reveals their ergonomic preferences when reaching for them [22].
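A sketch of the "embodied query" idea (a discrete toy version; the belief representation and names are ours, not the algorithms of [21, 22]): the robot scores each candidate physical action by how much the human's anticipated response would tell it about the latent θ, then takes the most informative one:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_info_gain(a_r, belief, thetas, response_model, human_actions):
    """Expected reduction in entropy over theta from taking robot action a_r and
    observing the human's response. `response_model(a_h, a_r, theta)` is the
    probability the human responds with a_h."""
    h_prior, gain = entropy(belief), 0.0
    for a_h in human_actions:
        lik = np.array([response_model(a_h, a_r, th) for th in thetas])
        p_ah = float((belief * lik).sum())
        if p_ah <= 0:
            continue
        posterior = belief * lik / p_ah
        gain += p_ah * (h_prior - entropy(posterior))
    return gain

def choose_embodied_query(robot_actions, belief, thetas, response_model, human_actions):
    """E.g. inching forward vs. waiting at an intersection: pick the physical action
    whose anticipated human response is most informative about their driving style."""
    return max(robot_actions,
               key=lambda a_r: expected_info_gain(a_r, belief, thetas,
                                                  response_model, human_actions))
```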
Figure 4: By treating interaction from a game-theoretic lens and finding approximations that work in real-time, cars leverage their influence on human actions: they merge in front of someone in heavy traffic knowing that they can slow down to let them in, and decide to back up at the intersection to get the human to proceed faster. These strategies emerge automatically out of the robot’s optimization.
Finally, there is a question of who is the human in this framework. Even for robots that are not meant to interact with end-users, a human – the engineer – still needs to specify the robot’s objective (reward, cost, goal, constraints, loss, etc.). Through our experience in both academia and industry on a variety of applications, we’ve come to realize that for any interesting enough problem we have no idea how to specify the right objective. First, it’s always an iterative process. Second, even once we’ve iterated, we still get it wrong – we can never anticipate every single environment the robot will face and make sure the objective incentivizes the right behavior everywhere. With this realization,
We proposed that the specified objective should merely be evidence about, rather than the definition of, the true objective, to be interpreted only within the context it was specified for [23].
Our algorithm enabled robots to learn from the specified objective, but maintain uncertainty about what they should optimize for in new environments and implicitly know what they don’t know. This has led to better test-time performance in arm motion planning [24] and autonomous driving. We have also closed the loop with the designer, leveraging the uncertainty to make queries about what the reward should be in hypothetical synthesized environments, to narrow in on what they actually want.
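A simplified, discretized sketch of treating the specified reward as evidence (in the spirit of [23]; the discretization and the precomputed feature counts are our illustration): the designer probably chose a proxy whose optimal behavior in the training environment does well under the true reward, so the robot can invert that to obtain a posterior over the true reward:

```python
import numpy as np

def true_reward_posterior(spec_idx, candidate_thetas, prior, optimal_features, beta=1.0):
    """Posterior over the true reward given that the designer specified proxy
    `spec_idx`. `optimal_features[i]` holds the feature counts of the behavior that
    is optimal for candidate proxy i *in the training environment* (precomputed)."""
    posterior = np.zeros(len(candidate_thetas))
    for k, theta_true in enumerate(candidate_thetas):
        # How likely is each proxy to have been written down if theta_true were
        # the real objective? Proxies whose optimal training behavior scores well
        # under theta_true are more likely.
        scores = beta * optimal_features @ np.asarray(theta_true)
        scores -= scores.max()
        lik = np.exp(scores) / np.exp(scores).sum()
        posterior[k] = prior[k] * lik[spec_idx]
    return posterior / posterior.sum()
```

The posterior, rather than the proxy itself, is then what the robot carries into new environments.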
Watch a playlist of our research talks here!