Ella: Embodied Social Agents with Lifelong Memory

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses lifelong learning and social evolution for multi-agent systems that operate in open-world environments over extended periods. We propose Ella, an embodied social agent built around a two-part long-term memory system: a name-centric semantic memory that organizes acquired knowledge, and a spatiotemporal episodic memory that encodes event sequences together with their spatial context. By combining this memory with foundation models for language and visual perception, Ella continuously stores, updates, and retrieves multimodal experiences. The architecture supports autonomous daily planning, forming and maintaining social relationships, cooperating with other agents, and leading groups within a 3D open world. In an evaluation where 15 agents coexist for days and are then assessed with unseen controlled tests, Ella influences, leads, and cooperates with other agents to achieve goals.

📝 Abstract
We introduce Ella, an embodied social agent capable of lifelong learning within a community in a 3D open world, where agents accumulate experiences and acquire knowledge through everyday visual observations and social interactions. At the core of Ella's capabilities is a structured, long-term multimodal memory system that stores, updates, and retrieves information effectively. It consists of a name-centric semantic memory for organizing acquired knowledge and a spatiotemporal episodic memory for capturing multimodal experiences. By integrating this lifelong memory system with foundation models, Ella retrieves relevant information for decision-making, plans daily activities, builds social relationships, and evolves autonomously while coexisting with other intelligent beings in the open world. We conduct capability-oriented evaluations in a dynamic 3D open world where 15 agents engage in social activities for days and are assessed with a suite of unseen controlled evaluations. Experimental results show that Ella can influence, lead, and cooperate with other agents well to achieve goals, showcasing its ability to learn effectively through observation and social interaction. Our findings highlight the transformative potential of combining structured memory systems with foundation models for advancing embodied intelligence. More videos can be found at https://umass-embodied-agi.github.io/Ella/.
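The two memory components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `Episode` and `LifelongMemory`, the 2D positions, the text-only episode descriptions, and the distance-plus-time retrieval cost are all assumptions made for illustration. The abstract states only that the semantic memory is name-centric and that the episodic memory is spatiotemporal and multimodal.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Tuple


@dataclass
class Episode:
    """One stored experience: when, where, and what happened (hypothetical schema)."""
    time: float                    # hours since simulation start (assumed unit)
    position: Tuple[float, float]  # 2D location (assumption; the paper's world is 3D)
    description: str               # text stand-in for a multimodal record


@dataclass
class LifelongMemory:
    # Semantic memory: entity name -> facts learned about that entity.
    semantic: Dict[str, List[str]] = field(default_factory=dict)
    # Episodic memory: experiences in the order they were stored.
    episodic: List[Episode] = field(default_factory=list)

    def add_fact(self, name: str, fact: str) -> None:
        """Store a fact under a name, skipping exact duplicates."""
        facts = self.semantic.setdefault(name, [])
        if fact not in facts:
            facts.append(fact)

    def add_episode(self, episode: Episode) -> None:
        self.episodic.append(episode)

    def recall_facts(self, name: str) -> List[str]:
        """Return a copy of everything known about a name."""
        return list(self.semantic.get(name, []))

    def recall_episodes(self, time: float, position: Tuple[float, float],
                        k: int = 2, time_weight: float = 1.0) -> List[Episode]:
        """Return the k episodes closest to the query in space and time.

        The cost (Euclidean distance plus weighted time gap) is an assumed
        heuristic, not the retrieval rule used in the paper.
        """
        def cost(e: Episode) -> float:
            return dist(e.position, position) + time_weight * abs(e.time - time)
        return sorted(self.episodic, key=cost)[:k]


memory = LifelongMemory()
memory.add_fact("Alice", "works at the cafe")
memory.add_fact("Alice", "likes hiking")
memory.add_episode(Episode(9.0, (0.0, 0.0), "met Alice at the cafe"))
memory.add_episode(Episode(18.0, (50.0, 20.0), "saw a concert in the park"))
memory.add_episode(Episode(10.0, (1.0, 1.0), "ordered coffee"))

nearby = memory.recall_episodes(time=9.5, position=(0.0, 0.0), k=2)
print(memory.recall_facts("Alice"))
print([e.description for e in nearby])
```

In a full system, the retrieved facts and episodes would presumably be passed to a foundation model as context for planning or conversation; that step is omitted here because the abstract does not specify how it is done.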
Problem

Research questions and friction points this paper is trying to address.

Developing embodied social agents with lifelong learning in 3D worlds
Integrating multimodal memory for knowledge storage and retrieval
Enabling autonomous decision-making and social relationship building
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured long-term multimodal memory system
Integration with foundation models
Lifelong learning through social interactions
Authors

Hongxin Zhang
University of Massachusetts Amherst
Zheyuan Zhang
Johns Hopkins University
Zeyuan Wang
PhD, The University of Sydney
NLP, Medical Informatics
Zunzhe Zhang
Tsinghua University
Lixing Fang
University of Massachusetts Amherst
Qinhong Zhou
University of Massachusetts Amherst
embodied AI, language models
Chuang Gan
UMass Amherst | MIT-IBM Watson AI Lab
Embodied AGI, Multi-Modal, Robotics, Computer Vision, Cognitive Science