REBAR: Reference Ethical Benchmark for Autonomy Readiness

📅 2026-05-18

🤖 AI Summary
This study addresses the lack of objective, computable metrics for quantifying the ethical and legal compliance of autonomous systems, a gap that has hindered their evaluation and the development of robust accountability mechanisms. To bridge this gap, the authors propose REBAR, a quantitative test and evaluation framework that maps operating metrics onto an interpretable “Autonomy Readiness Level” (ARL) rubric. The framework combines a neuro-symbolic large language model (LLM) approach for calculating and explaining the ethical difficulty of scenarios, LLM-driven test generation at scale, and a photorealistic simulation environment. Evaluating white-box autonomous systems through this pipeline yields an objective, repeatable benchmark score of ethical performance, narrowing the divide between abstract ethical principles and verifiable, accountable behavior.
πŸ“ Abstract
As autonomous systems grow more advanced, objective metrics to evaluate their ethical and legal compliance are critical for informing end users of their limitations and ensuring accountability of those who misuse them. Current ethical embodied AI frameworks remain mostly qualitative, focusing on system design (through safety guardrails or targeted red teaming), and the realized guardrails often directly disallow unsafe behavior without providing the user with an override or interpretable reason. Instead, there is a need for computable metrics through rigorous testing that allow a user to determine the applicability of the system to the task. To address this gap, we introduce the Reference Ethical Benchmark for Autonomy Readiness (REBAR), a quantitative test and evaluation framework for autonomous systems. REBAR maps operating metrics into a computable Autonomy Readiness Level (ARL) rubric that can quantify ethical performance. Key innovations of the framework include a neuro-symbolic Large Language Model (LLM) approach to calculate and explain the ethical difficulty of scenarios, LLM-driven at-scale generation of test instances, and a versatile, photorealistic simulation environment. By evaluating white-box autonomy solutions through this rigorous testing pipeline, REBAR delivers an objective and repeatable benchmark score, bridging the gap between abstract principles and verifiable, accountable autonomy.
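The abstract does not say how operating metrics are turned into an ARL, so the following is only an illustrative sketch of what a "computable rubric" could look like, not the authors' method. Every name here (`ScenarioResult`, `readiness_level`), the difficulty-weighted pass rate, and the threshold values are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScenarioResult:
    """Outcome of one simulated test scenario (hypothetical structure)."""
    difficulty: float  # assumed ethical-difficulty weight in [0, 1]
    passed: bool       # whether the system behaved acceptably


def readiness_level(
    results: Sequence[ScenarioResult],
    thresholds: Sequence[float] = (0.2, 0.4, 0.6, 0.8),  # assumed cut-offs
) -> int:
    """Map scenario outcomes to an integer level from 0 to len(thresholds).

    The score is the difficulty-weighted fraction of passed scenarios;
    the level is the number of thresholds that score meets or exceeds.
    """
    total = sum(r.difficulty for r in results)
    if total == 0:
        return 0  # no weighted evidence, so report the lowest level
    score = sum(r.difficulty for r in results if r.passed) / total
    return sum(score >= t for t in thresholds)


if __name__ == "__main__":
    demo = [ScenarioResult(0.5, True), ScenarioResult(0.5, False)]
    print(readiness_level(demo))
```

Weighting by difficulty means a failure on a hard scenario lowers the score more than a failure on an easy one, which is one plausible reading of why the framework computes scenario difficulty at all.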
Problem

Research questions and friction points this paper is trying to address.

ethical benchmark
autonomy readiness
computable metrics
autonomous systems
ethical compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic LLM
Autonomy Readiness Level
ethical benchmarking
photorealistic simulation
LLM-driven test generation
Jonathan Diller
University of Pennsylvania
David Barnes
University of Westminster
Operations and supply chain management
Rebekah Bogdanoff
Duality Robotics, Inc.
Rhett Collier
Duality Robotics, Inc.
Roddy Collins
Kitware, Inc.
Keith Fieldhouse
Kitware, Inc.
Yonatan Gefen
Kitware, Inc.
Cameron Johnson
Kitware, Inc.
Anuriha Kodali
University of Pennsylvania
Brad Kriel
Duality Robotics, Inc.
Varun Murali
Assistant Professor, Texas A&M University
Decision Making, Computer Vision, Autonomous Systems, Machine Learning, Navigation
James Niehaus
Charles River Analytics
Mish Sukharev
Duality Robotics, Inc.
Joseph VanPelt
Kitware, Inc.
Anthony Hoogs
Senior Director of Computer Vision, Kitware, Inc.
computer vision, machine learning, AI
Vijay Kumar
Professor of Mechanical Engineering and Applied Mechanics, University of Pennsylvania
Robotics
Arslan Basharat
Assistant Director of Computer Vision, Kitware, Inc.
Motion Segmentation, Tracking, Events/Activities Recognition