Towards Unconstrained Human-Object Interaction

📅 2026-04-15

🤖 AI Summary
This work addresses the limitations of conventional human-object interaction (HOI) detection, which relies on predefined interaction categories and struggles to generalize to open, dynamic environments. To overcome this constraint, the paper introduces a novel paradigm termed Unconstrained HOI (U-HOI), which for the first time eliminates dependence on fixed vocabularies during both training and inference. By leveraging multimodal large language models (MLLMs), the proposed approach parses free-form textual descriptions at test time and translates them into structured graph representations of interactions. This method demonstrates the potential of MLLMs for understanding interactions in open-world settings, exposes fundamental limitations of existing HOI detectors, and establishes a new framework for scalable, vocabulary-free interaction perception.

📝 Abstract
Human-Object Interaction (HOI) detection is a longstanding computer vision problem concerned with predicting the interactions between humans and objects. Current HOI models rely on a vocabulary of interactions at training and inference time, which restricts them to static environments with a fixed set of interactions. With the advent of Multimodal Large Language Models (MLLMs), it has become feasible to explore more flexible paradigms for interaction recognition. In this work, we revisit HOI detection through the lens of MLLMs and apply them to in-the-wild HOI detection. We define the Unconstrained HOI (U-HOI) task, a novel HOI domain that removes the requirement for a predefined list of interactions at both training and inference. We evaluate a range of MLLMs on this setting and introduce a pipeline that includes test-time inference and language-to-graph conversion to extract structured interactions from free-form text. Our findings highlight the limitations of current HOI detectors and the value of MLLMs for U-HOI. Code will be available at https://github.com/francescotonini/anyhoi
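
To make the language-to-graph step concrete, here is a minimal sketch of turning free-form text into structured interaction triplets. It is not the authors' implementation: the abstract does not describe how the conversion is done, so the rule-based pattern, the `Interaction` type, and the `text_to_graph` function are all hypothetical names chosen for illustration. The sketch assumes the MLLM output contains simple sentences of the form "<subject> is <verb>ing [a/an/the] <object>" and skips anything else.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One edge of the interaction graph (hypothetical structure)."""
    subject: str
    verb: str
    obj: str


# Assumed sentence shape: "<subject> is <verb>ing [on|at|with|to] [a|an|the] <object>".
# This is a toy stand-in for whatever parser a real pipeline would use.
_PATTERN = re.compile(
    r"^(?:the |a |an )?(?P<subject>[\w ]+?) is "
    r"(?P<verb>\w+ing(?: (?:on|at|with|to))?) "
    r"(?:the |a |an )?(?P<obj>[\w ]+)$",
    re.IGNORECASE,
)


def text_to_graph(description: str) -> list[Interaction]:
    """Convert free-form text into a list of (subject, verb, object) edges."""
    edges = []
    # Treat periods, semicolons, and newlines as sentence boundaries.
    for sentence in re.split(r"[.;\n]+", description):
        sentence = sentence.strip()
        if not sentence:
            continue
        match = _PATTERN.match(sentence)
        if match is None:
            # Sentences that do not fit the assumed shape are ignored.
            continue
        edges.append(
            Interaction(
                subject=match["subject"].lower(),
                verb=match["verb"].lower(),
                obj=match["obj"].lower(),
            )
        )
    return edges


if __name__ == "__main__":
    text = "The man is riding a bicycle. A woman is holding an umbrella."
    for edge in text_to_graph(text):
        print(edge)
```

Because no verb list appears anywhere in the sketch, any verb the text contains becomes an edge label, which is the property the U-HOI setting asks for. A real pipeline would need a far more robust parser than a single regular expression.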

Problem

Research questions and friction points this paper is trying to address.

Human-Object Interaction
Unconstrained HOI
Multimodal Large Language Models
Open-vocabulary Detection
In-the-wild Interaction Recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unconstrained HOI
Multimodal Large Language Models
In-the-wild Interaction Detection
Language-to-Graph Conversion
Open-vocabulary HOI