Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing

📅 2025-07-10

🤖 AI Summary

This study addresses the responsible deployment of language models (LMs) in high-stakes decision-making, specifically the processing of public comments under the U.S. National Environmental Policy Act (NEPA). It proposes a framework for statically typed LM subroutines that wraps LM capabilities as type-safe, auditable modules callable from asynchronous code. Every LM-produced artifact is recorded and can be exposed for audit, and sparse human feedback is used to improve each subroutine online. Methodologically, the work combines static typing, asynchronous programming, and human-in-the-loop supervision to support end-to-end transparency, interpretability, and control over bias. The key contribution is the application of static typing principles to LM interface design, which scopes each subroutine's function precisely and makes its behavior verifiable. The resulting CommentNEPA system achieves high agreement with historical labels produced by human experts even without feedback, suggesting that the framework can balance automation efficiency with sustained expert oversight and control in real-world policy analysis.
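The summary does not name the agreement statistic used in the evaluation. As an assumption for illustration only, the sketch below computes Cohen's kappa, a common chance-corrected agreement measure, between system labels and human labels for the same comments. `cohen_kappa` is a hypothetical helper, not part of the authors' library, and the example labels are invented.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # p_o: fraction of items on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from each labeler's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both labelers used one identical label everywhere; agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: does each comment raise a water-quality concern?
system = ["yes", "yes", "no", "no"]
human = ["yes", "no", "no", "no"]
print(cohen_kappa(system, human))  # 0.5
```

Here the labelers agree on 3 of 4 items (p_o = 0.75), chance agreement is p_e = 0.5, so kappa is 0.5; raw accuracy alone would overstate agreement on skewed label sets.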

📝 Abstract

The advent of language models (LMs) has the potential to dramatically accelerate tasks that can be cast as text processing; however, real-world adoption is hindered by concerns regarding safety, explainability, and bias. How can we leverage LMs responsibly and in a transparent, auditable manner, minimizing risk and allowing human experts to focus on informed decision-making rather than data processing or prompt engineering? In this work, we propose a framework for declaring statically typed, LM-powered subroutines (i.e., callable, function-like procedures) for use within conventional asynchronous code, such that sparse feedback from human experts is used to improve the performance of each subroutine online (i.e., during use). In our implementation, all LM-produced artifacts (i.e., prompts, inputs, outputs, and data dependencies) are recorded and exposed for audit on demand. We package this framework as a library to support its adoption and continued development. While the framework may apply to several real-world decision workflows (e.g., in healthcare and legal fields), we evaluate it in the context of public comment processing as mandated by the National Environmental Policy Act of 1969 (NEPA). Specifically, we use this framework to develop "CommentNEPA," an application that compiles, organizes, and summarizes a corpus of public comments submitted in response to a project requiring environmental review. We quantitatively evaluate the application by comparing its outputs (when operating without human feedback) to historical "ground-truth" data labelled by human annotators during the preparation of official environmental impact statements.
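The abstract describes statically typed, LM-powered subroutines that are awaited from ordinary asynchronous code and that record every prompt, input, output, and data dependency for later audit. The sketch below is a minimal illustration under assumptions, not the authors' library: `LMSubroutine`, `AuditRecord`, `parse_topics`, and `fake_backend` are hypothetical names, and the stub backend stands in for a real LM call. The declared return type comes from the `parse` function, so a static checker such as mypy should infer that `extract_topics` returns `list[str]`, while the parser rejects malformed output at run time.

```python
import asyncio
import json
from collections.abc import Awaitable, Callable, Sequence
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")

# Hypothetical backend type: any async function mapping a prompt to a completion.
LMBackend = Callable[[str], Awaitable[str]]


@dataclass
class AuditRecord:
    """One LM call as recorded for audit (hypothetical schema)."""
    subroutine: str
    prompt: str
    inputs: dict
    raw_output: str
    dependencies: list = field(default_factory=list)


@dataclass
class LMSubroutine(Generic[T]):
    """Awaitable wrapper around one prompt template with a declared output type T."""
    name: str
    template: str                 # str.format-style template filled from the inputs
    parse: Callable[[str], T]     # turns raw LM text into a value of type T
    backend: LMBackend
    log: list = field(default_factory=list)

    async def __call__(self, dependencies: Sequence[str] = (), **inputs: str) -> T:
        prompt = self.template.format(**inputs)
        raw = await self.backend(prompt)
        # Record before parsing so that malformed outputs remain auditable.
        self.log.append(AuditRecord(self.name, prompt, dict(inputs), raw, list(dependencies)))
        return self.parse(raw)


def parse_topics(raw: str) -> list[str]:
    """Validate that the LM returned a JSON array of strings."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise TypeError(f"expected a JSON list of strings, got {raw!r}")
    return value


async def fake_backend(prompt: str) -> str:
    # Stand-in for a real LM call; always returns the same completion.
    return '["water quality", "traffic"]'


async def main() -> None:
    extract_topics = LMSubroutine(
        name="extract_topics",
        template="List the topics raised in this public comment as a JSON array:\n{comment}",
        parse=parse_topics,
        backend=fake_backend,
    )
    topics = await extract_topics(
        comment="The new road will increase runoff and traffic.",
        dependencies=("comment-0001",),
    )
    print(topics)                               # ['water quality', 'traffic']
    print(extract_topics.log[0].dependencies)   # ['comment-0001']


if __name__ == "__main__":
    asyncio.run(main())
```

Logging the raw completion before parsing is a deliberate choice in this sketch: an auditor can still inspect a call whose output failed type validation.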

Problem

Research questions and friction points this paper is trying to address.

Ensuring safe and transparent use of language models in real-world applications

Developing auditable LM-powered subroutines for human-in-the-loop decision-making

Optimizing public comment processing via LM automation with verifiable outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework for statically typed LM-powered subroutines

Audit-ready recording of LM artifacts

Online improvement via sparse human feedback
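The listing says sparse human feedback improves each subroutine online but does not say how. One plausible mechanism, assumed here purely for illustration, is to keep expert corrections and replay the most recent ones as few-shot examples in later prompts. `FeedbackStore`, `record`, and `build_prompt` are hypothetical names, not the authors' API.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    """Keeps expert corrections and turns them into few-shot examples (hypothetical)."""
    max_examples: int = 3
    examples: list = field(default_factory=list)

    def record(self, text: str, corrected_output: str) -> None:
        # Sparse feedback: only calls an expert actually reviewed end up here.
        self.examples.append((text, corrected_output))

    def build_prompt(self, instruction: str, text: str) -> str:
        # Use the most recent corrections so newer guidance takes priority.
        shots = self.examples[-self.max_examples:] if self.max_examples > 0 else []
        parts = [instruction]
        for example_text, example_output in shots:
            parts.append(f"Input: {example_text}\nOutput: {example_output}")
        parts.append(f"Input: {text}\nOutput:")
        return "\n\n".join(parts)


store = FeedbackStore(max_examples=2)
store.record("Wells near the site will run dry.", '["groundwater"]')
print(store.build_prompt("Tag the topics in each comment.", "Trucks will block Main St."))
```

Because the store lives outside the model, corrections take effect on the very next call without retraining, which matches the "during use" sense of online improvement; whether CommentNEPA uses this particular mechanism is not stated here.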

🔎 Similar Papers

Semantic Operators: A Declarative Model for Rich, AI-based Data Processing