An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations

📅 2025-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
A semantic gap persists between food-safety regulations and the software systems that implement them, hindering the automated translation of legal provisions into executable software requirements. Method: Through a Grounded Theory study, we develop the first regulatory-compliance-oriented conceptual framework for food safety, systematically linking legal texts to software requirements. Leveraging BERT- and GPT-family models, we empirically evaluate clause-level classification of requirements-related provisions, comparing few-shot prompting with supervised fine-tuning and assessing cross-jurisdictional (Canada-to-U.S.) generalization. Contribution/Results: Fine-tuned GPT-4o achieves 89% precision and 87% recall; few-shot prompting raises recall to 97% at the cost of precision (65%). Both approaches significantly outperform LSTM-based and keyword-extraction baselines. The framework advances the automated derivation of software requirements from regulations, and the findings offer empirical guidance on LLM adaptation strategies for legal compliance engineering.

📝 Abstract
As Industry 4.0 transforms the food industry, the role of software in achieving compliance with food-safety regulations is becoming increasingly critical. Food-safety regulations, like those in many legal domains, have largely been articulated in a technology-independent manner to ensure their longevity and broad applicability. However, this approach leaves a gap between the regulations and the modern systems and software increasingly used to implement them. In this article, we pursue two main goals. First, we conduct a Grounded Theory study of food-safety regulations and develop a conceptual characterization of food-safety concepts that closely relate to systems and software requirements. Second, we examine the effectiveness of two families of large language models (LLMs) -- BERT and GPT -- in automatically classifying legal provisions based on requirements-related food-safety concepts. Our results show that: (a) when fine-tuned, the accuracy differences between the best-performing models in the BERT and GPT families are relatively small. Nevertheless, the most powerful model in our experiments, GPT-4o, still achieves the highest accuracy, with an average Precision of 89% and an average Recall of 87%; (b) few-shot learning with GPT-4o increases Recall to 97% but decreases Precision to 65%, suggesting a trade-off between fine-tuning and few-shot learning; (c) despite our training examples being drawn exclusively from Canadian regulations, LLM-based classification performs consistently well on test provisions from the US, indicating a degree of generalizability across regulatory jurisdictions; and (d) for our classification task, LLMs significantly outperform simpler baselines constructed using long short-term memory (LSTM) networks and automatic keyword extraction.
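The few-shot learning setup described in the abstract can be sketched as a chat-style prompt: a system instruction, a handful of labeled demonstrations, and the provision to classify. The label set, example provisions, and the `build_fewshot_messages` helper below are illustrative assumptions, not the authors' actual prompts.

```python
# Minimal sketch of few-shot prompting for classifying a regulatory
# provision as requirements-related or not. Labels and examples are
# hypothetical; the paper's actual prompts and concept taxonomy differ.

SYSTEM = (
    "You classify food-safety regulatory provisions. "
    "Answer with exactly one label: REQUIREMENT or NON-REQUIREMENT."
)

# Hypothetical labeled demonstrations (in the paper, training examples
# are drawn from Canadian regulations annotated in the Grounded Theory study).
EXAMPLES = [
    ("A food business must maintain temperature records for all "
     "refrigerated storage units.", "REQUIREMENT"),
    ("This Part applies to food imported for commercial purposes.",
     "NON-REQUIREMENT"),
]

def build_fewshot_messages(provision: str) -> list[dict]:
    """Assemble a chat message list: system instruction, labeled
    demonstrations as user/assistant turns, then the target provision."""
    messages = [{"role": "system", "content": SYSTEM}]
    for text, label in EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": provision})
    return messages
```

The resulting message list could then be sent to a GPT-family model, e.g. via the OpenAI SDK's `client.chat.completions.create(model="gpt-4o", messages=...)`; fine-tuning would instead train on such labeled pairs directly.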
Problem

Research questions and friction points this paper is trying to address.

Food Safety Regulations
Modern Software Implementation
Automated Classification Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Legal Text Classification
Few-shot Learning