🤖 AI Summary
This work addresses the problem of learning safe and optimal behavioral policies from human feedback, such as pairwise comparisons, rankings, or demonstrations, in safety-critical domains including robotic navigation and Formula 1 racing control. The proposed method introduces a preference-modeling framework based on Weighted Signal Temporal Logic (WSTL), which unifies safety constraints and task objectives in expressive temporal logic specifications. To make optimization scalable, the approach combines structural pruning with a logarithmic transformation that converts the multilinear WSTL constraints into mixed-integer linear programs (MILPs), improving computational efficiency and scalability. Experiments on simulated robotic navigation tasks and real-world F1 telemetry data demonstrate that the method captures fine-grained human preferences, enforces safety requirements, and models complex, dynamic task objectives.
📝 Abstract
Autonomous systems increasingly rely on human feedback, expressed as pairwise comparisons, rankings, or demonstrations, to align their behavior. While existing methods can adapt behaviors, they often fail to guarantee safety in safety-critical domains. We propose a safety-guaranteed, optimal, and efficient approach to learning from preferences, rankings, or demonstrations using Weighted Signal Temporal Logic (WSTL). Implemented naively, WSTL learning problems lead to multilinear constraints in the weights to be learned. By introducing structural pruning and log-transform procedures, we reduce the problem size and recast the problem as a Mixed-Integer Linear Program while preserving safety guarantees. Experiments on robotic navigation and real-world Formula 1 data demonstrate that the method effectively captures nuanced preferences and models complex task objectives.
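The core of the log-transform step can be sketched as follows; this is an illustrative reconstruction, not the paper's implementation, and the function names are hypothetical. For strictly positive weights, a multilinear constraint of the form w1 · w2 · … · wn ≥ c becomes linear in the substituted variables u_i = log(w_i), which is what makes an MILP reformulation possible:

```python
import math

# Illustrative sketch (not the paper's code): a multilinear constraint
#     w1 * w2 * ... * wn >= c,   with all w_i > 0 and c > 0,
# is equivalent, after substituting u_i = log(w_i), to the linear constraint
#     u1 + u2 + ... + un >= log(c).

def multilinear_satisfied(weights, c):
    """Check the original multilinear constraint prod(w_i) >= c."""
    prod = 1.0
    for w in weights:
        prod *= w
    return prod >= c

def log_linear_satisfied(weights, c):
    """Check the equivalent linear constraint sum(log w_i) >= log c."""
    return sum(math.log(w) for w in weights) >= math.log(c)

# The two formulations agree on any positive weight vector:
weights = [0.8, 1.5, 2.0]
print(multilinear_satisfied(weights, 2.0))   # product is 2.4
print(log_linear_satisfied(weights, 2.0))    # sum of logs vs log(2)
```

Because the substituted constraint is linear in the u_i variables, it can be handed directly to a mixed-integer linear solver alongside the integer variables that encode the temporal-logic structure.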