
AI Fails To Read Human Social Cues

Despite rapid advances in artificial intelligence, humans still hold a significant edge in understanding social interactions, according to new research from Johns Hopkins University that reveals fundamental limitations in AI's ability to interpret human behavior.

The study, presented at the International Conference on Learning Representations, found that even the most sophisticated AI models consistently fail to grasp the nuances of social dynamics that humans interpret effortlessly – a critical skill for technologies meant to interact with people in the real world.

“AI for a self-driving car, for example, would need to recognize the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street,” said lead author Leyla Isik, an assistant professor of cognitive science at Johns Hopkins University.

The research team, including doctoral student Kathy Garcia, conducted experiments comparing human perception with AI performance. Participants watched short video clips of people either interacting with each other, performing side-by-side activities, or acting independently, and rated features relevant to understanding the social content of each clip on a scale of one to five.

When more than 350 AI models – including language, video, and image systems – were tasked with predicting how humans would judge the same videos, the results showed a stark disconnect. While human participants largely agreed with each other in their assessments, AI models consistently failed to match human perceptions.
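To make that comparison concrete, the sketch below illustrates in Python the general kind of analysis the article describes: correlating a model's per-clip predictions with the average human rating, and comparing that to how well humans agree with one another. All data, numbers, and variable names here are invented for illustration; this is not the study's actual stimuli, models, or evaluation pipeline.

```python
# Hypothetical sketch: compare a model's per-clip ratings against the mean
# human rating, relative to a rough inter-rater "agreement ceiling".
# All values are simulated for illustration only.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

n_clips, n_raters = 200, 20
# Simulated "true" social-interaction scores on a 1-5 scale.
true_scores = rng.uniform(1, 5, size=n_clips)
# Humans: noisy but broadly consistent ratings around the true score.
human_ratings = np.clip(
    true_scores[:, None] + rng.normal(0, 0.5, (n_clips, n_raters)), 1, 5
)
# A hypothetical model whose predictions track the true scores only weakly.
model_predictions = np.clip(0.3 * true_scores + rng.normal(3, 1.0, n_clips), 1, 5)

mean_human = human_ratings.mean(axis=1)

# Inter-rater agreement: correlate each rater with the mean of the others.
agreement = []
for i in range(n_raters):
    others = np.delete(human_ratings, i, axis=1).mean(axis=1)
    agreement.append(spearmanr(human_ratings[:, i], others)[0])

model_score = spearmanr(model_predictions, mean_human)[0]

print(f"human-human agreement (mean Spearman r): {np.mean(agreement):.2f}")
print(f"model-human correlation (Spearman r):    {model_score:.2f}")
```

In a setup like this, a model whose correlation with the mean human rating falls well below the human-human agreement level would show the kind of gap the researchers report.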

“It’s not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in AI. But real life isn’t static. We need AI to understand the story that is unfolding in a scene,” Garcia explained.

The findings highlight a significant gap in AI capabilities that could impact the development of technologies like self-driving cars, assistive robots, and other systems designed to navigate human social environments. While AI has demonstrated remarkable progress in recognizing objects and faces in still images, understanding dynamic social interactions presents a different challenge entirely.

Video models in particular struggled to accurately describe what people were doing in the clips. Even when image models were given a series of still frames to analyze, they could not reliably determine whether people were communicating. Language models came somewhat closer to predicting participants' judgments, while video models showed a stronger correlation with neural activity patterns in the human brain.

Researchers believe this limitation may stem from a fundamental architectural issue. Most AI neural networks were designed based on the structure of brain regions that process static images, rather than the distinct areas responsible for interpreting dynamic social scenes.

“There’s a lot of nuances, but the big takeaway is none of the AI models can match human brain and behavior responses to scenes across the board, like they do for static scenes,” Isik noted. “I think there’s something fundamental about the way humans are processing scenes that these models are missing.”

The research suggests that if AI is to integrate into human society, developers may need to shift away from simply scaling up existing models and toward architectures that better mirror how humans process social information.

For the industries pouring billions into autonomous vehicles and social robots, the study is a sobering reminder that teaching machines to navigate the complexity of human interaction remains one of AI's most significant unsolved challenges.

