🤖 AI Summary
This study investigates the safe and effective deployment of large language models (LLMs) to support decision-making in high-stakes public services, specifically child welfare, where misjudging or overlooking cases that require professional expertise can have severe consequences. In collaboration with a major Canadian child welfare agency, the research integrates LocalLLM with BERTopic to analyze case trajectories and identify deviations from standard procedures. Findings indicate that the models successfully detect procedural omissions but exhibit significant blind spots in complex scenarios demanding nuanced social work judgment. The work underscores the necessity of participatory design approaches to co-develop language-based tools aligned with public sector needs and delineates the current limitations and future directions for AI systems in high-risk decision contexts.
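As a rough illustration of how the topic-modeling side of such a tracker might operate (the paper's actual data, preprocessing, and model configuration are not described here), the sketch below uses BERTopic to group synthetic case-note texts into procedural themes and then flags expected steps that never surface for a given case. The note templates, expected-keyword list, and gap heuristic are hypothetical assumptions, not the study's pipeline.

```python
# Hypothetical sketch only: synthetic notes, topic settings, and the gap
# heuristic are illustrative assumptions, not the study's implementation.
from bertopic import BERTopic

# Synthetic corpus of de-identified case notes (a real corpus would contain
# hundreds of notes spanning many cases).
templates = [
    "Referral received and screened for immediate safety concerns for family {i}.",
    "Home visit completed; living conditions and child wellbeing documented for family {i}.",
    "Safety plan drafted and reviewed with the caregiver for family {i}.",
    "Follow-up contact and community service referral scheduled for family {i}.",
]
case_notes = [t.format(i=i) for i in range(12) for t in templates]

# Fit BERTopic to surface recurring procedural themes across the corpus.
topic_model = BERTopic(min_topic_size=4)
topics, _ = topic_model.fit_transform(case_notes)
print(topic_model.get_topic_info())

# Naive deviation check: which expected procedural keywords never appear in the
# topics assigned to one case's notes? (Topic -1 is BERTopic's outlier bucket.)
one_case_topics = set(topics[:3])  # pretend the first three notes form one case
observed_terms = {
    word
    for topic_id in one_case_topics
    if topic_id != -1
    for word, _ in topic_model.get_topic(topic_id)
}
expected_keywords = {"referral", "visit", "safety", "follow"}
print("Possible procedural gaps:", expected_keywords - observed_terms)
```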
📝 Abstract
Governments are the primary providers of essential public services and are responsible for delivering them effectively. In high-stakes decision-making domains such as child welfare (CW), agencies must protect children without unnecessarily prolonging a family's engagement with the system. With growing optimism around AI, governments are pushing for its integration, but concerns about feasibility and harms remain. Through a collaboration with a large Canadian CW agency, we examined how LocalLLM and BERTopic models can track CW case progress. We demonstrate how these tools can potentially assist workers in opportunistically addressing gaps in their work by signaling case progress and deviations. Yet we also show how they fail to detect case trajectories that require discretionary judgments grounded in social work training, precisely the areas where practitioners would want support to pre-emptively address substantive case concerns. We also provide a roadmap of future participatory directions for co-designing language tools for and with the public sector.
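For the local-LLM side of such progress signaling, one way a prototype could work is to prompt a locally hosted instruction-tuned model to judge whether each required procedural step is documented in a case's notes. The sketch below uses the Hugging Face transformers text-generation pipeline; the model ID, prompt wording, and step list are placeholders rather than the configuration used in the study.

```python
# Hypothetical sketch: model choice, prompt, and step list are assumptions,
# not the paper's setup.
from transformers import pipeline

MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder: any locally hosted instruct model

# Load the model locally; device_map="auto" places it on available hardware.
generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")

case_summary = (
    "Referral screened on intake. Home visit completed and living conditions documented. "
    "Follow-up contact scheduled with the caregiver."
)
required_steps = [
    "safety screening at intake",
    "home visit",
    "safety plan drafted with caregiver",
]

for step in required_steps:
    prompt = (
        "You review child-welfare case notes for procedural completeness.\n"
        f"Case notes: {case_summary}\n"
        f"Is the step '{step}' documented in these notes? "
        "Answer YES or NO with one sentence of evidence."
    )
    out = generator(prompt, max_new_tokens=60, do_sample=False, return_full_text=False)
    print(step, "->", out[0]["generated_text"].strip())
```

A design note on this kind of check: keyword- or prompt-based step detection can flag omissions in documentation, which matches the paper's finding that procedural gaps are tractable, but it says nothing about whether a case trajectory is substantively appropriate, the discretionary territory where the abstract reports these tools fall short.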