🤖 AI Summary
This work proposes GRACE, a reason-based neuro-symbolic containment architecture for keeping highly autonomous AI agents both instrumentally effective and aligned with diverse ethical norms in real-world applications. GRACE decouples moral reasoning from instrumental decision-making through three integrated modules (a Moral Module, a Decision-Making Module, and a Guard), enabling coexistence of multiple ethical frameworks, traceable behavior, and formal verifiability. By combining deontic logic, neuro-symbolic reasoning, and large language models, GRACE implements an interpretable, contestable, and formally verifiable alignment mechanism, demonstrated on an LLM therapy assistant case study. The approach lets stakeholders understand, contest, and refine agent behavior, and it supports both formal verification and statistical guarantees of alignment.
📝 Abstract
As AI agents become increasingly autonomous, widely deployed in consequential contexts, and efficacious in bringing about real-world impacts, ensuring that their decisions are not only instrumentally effective but also normatively aligned has become critical. We introduce a neuro-symbolic reason-based containment architecture, Governor for Reason-Aligned ContainmEnt (GRACE), that decouples normative reasoning from instrumental decision-making and can contain AI agents of virtually any design. GRACE restructures decision-making into three modules: a Moral Module (MM) that determines permissible macro actions via deontic logic-based reasoning; a Decision-Making Module (DMM) that encapsulates the target agent while selecting instrumentally optimal primitive actions in accordance with derived macro actions; and a Guard that monitors and enforces moral compliance. The MM uses a reason-based formalism providing a semantic foundation for deontic logic, enabling interpretability, contestability, and justifiability. Its symbolic representation enriches the DMM's informational context and supports formal verification and statistical guarantees of alignment enforced by the Guard. We demonstrate GRACE on an example of an LLM therapy assistant, showing how it enables stakeholders to understand, contest, and refine agent behavior.
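The three-module control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the MM here is reduced to a lookup table of prohibitions standing in for the reason-based deontic formalism, and every name in the code (`Primitive`, `PROHIBITIONS`, `FALLBACK`, `step`, the macro actions, and the therapy-style facts) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Primitive:
    """A primitive action proposed by the wrapped agent (hypothetical type)."""
    name: str
    macro: str      # macro action this primitive realizes
    utility: float  # instrumental score assigned by the agent


# Hypothetical macro actions and prohibition rules for a therapy assistant.
# A real MM would derive these via reason-based deontic reasoning.
MACROS: Set[str] = {"continue_chat", "refer_to_professional"}
PROHIBITIONS: Dict[str, Set[str]] = {"crisis_disclosed": {"continue_chat"}}
FALLBACK = Primitive("safe_default_reply", "refer_to_professional", 0.0)


def moral_module(situation: Set[str]) -> Set[str]:
    """MM stand-in: return the macro actions not forbidden in this situation."""
    forbidden: Set[str] = set()
    for fact, banned in PROHIBITIONS.items():
        if fact in situation:
            forbidden |= banned
    return MACROS - forbidden


def decision_module(candidates: List[Primitive],
                    permitted: Set[str]) -> Optional[Primitive]:
    """DMM stand-in: pick the highest-utility primitive within permitted macros."""
    allowed = [c for c in candidates if c.macro in permitted]
    return max(allowed, key=lambda c: c.utility) if allowed else None


def guard(choice: Optional[Primitive], permitted: Set[str]) -> Primitive:
    """Guard stand-in: pass compliant choices through, otherwise use the fallback."""
    if choice is not None and choice.macro in permitted:
        return choice
    return FALLBACK


def step(situation: Set[str], candidates: List[Primitive]) -> Primitive:
    """One decision cycle: MM constrains, DMM optimizes, Guard enforces."""
    permitted = moral_module(situation)
    return guard(decision_module(candidates, permitted), permitted)


candidates = [
    Primitive("tell_joke", "continue_chat", 0.9),
    Primitive("share_hotline", "refer_to_professional", 0.6),
]
print(step({"crisis_disclosed"}, candidates).name)  # share_hotline
print(step(set(), candidates).name)                 # tell_joke
```

In this sketch the Guard re-checks the DMM's output independently of the DMM's own filtering, which mirrors the abstract's separation between selecting actions and enforcing compliance. How the actual Guard monitors the agent is not specified in the abstract.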