Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current AI-assisted programming treats code and chat logs as the primary artifacts. System architecture, dependencies, and design decisions become untraceable, and the resulting software is fragile and hard to audit. This work proposes the Agentic Consensus paradigm, in which a typed property graph serving as an operable world model, the consensus layer C, replaces code as the primary engineering artifact. Executable artifacts are derived from C and kept in correspondence with it by the synchronization operators Φ (realize) and Ψ (rehydrate). Because structural commitments are modeled explicitly and linked to evidence, under-specification surfaces as measurable consensus entropy rather than a silent guess. The authors argue that evaluation should shift from code correctness toward alignment fidelity, consensus entropy, and intervention distance, and they propose benchmark task families to measure whether consensus-based workflows reduce human intervention compared to chat-driven baselines.
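This summary does not give the paper's definition of consensus entropy. As a rough illustration only, the sketch below assumes it behaves like Shannon entropy over the candidate resolutions of a single open design decision; the function name `consensus_entropy` and the example decisions are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def consensus_entropy(candidates):
    """Shannon entropy, in bits, over candidate resolutions of one decision.

    Assumption: this is a stand-in measure, not the paper's definition.
    """
    counts = Counter(candidates)
    total = sum(counts.values())
    # Sum p * log2(1/p) over each distinct candidate resolution.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Every proposal names the same storage backend: nothing is left unspecified.
settled = ["postgres", "postgres", "postgres", "postgres"]

# Proposals split evenly between two backends: one bit is still undecided.
unsettled = ["postgres", "sqlite", "postgres", "sqlite"]

settled_bits = consensus_entropy(settled)
unsettled_bits = consensus_entropy(unsettled)
```

Under this assumption, a decision with nonzero entropy is one a human or agent still has to resolve explicitly, instead of letting code generation pick silently.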

📝 Abstract

Vibe coding produces correct, executable code at speed, but leaves no record of the structural commitments, dependencies, or evidence behind it. Reviewers cannot determine what invariants were assumed, what changed, or why a regression occurred. This is not a generation failure but a control failure: the dominant artifact of AI-assisted development (code plus chat history) performs dimension collapse, flattening complex system topology into low-dimensional text and making systems opaque and fragile under change. We propose Agentic Consensus: a paradigm in which the consensus layer C, an operable world model represented as a typed property graph, replaces code as the primary artifact of engineering. Executable artifacts are derived from C and kept in correspondence via synchronization operators Φ (realize) and Ψ (rehydrate). Evidence links directly to structural claims in C, making every commitment auditable and under-specification explicit as measurable consensus entropy rather than a silent guess. Evaluation must move beyond code correctness toward alignment fidelity, consensus entropy, and intervention distance. We propose benchmark task families designed to measure whether consensus-based workflows reduce human intervention compared to chat-driven baselines.
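To make the idea of evidence linked to structural claims concrete, here is a minimal sketch of a typed property graph with an audit query. It is an illustration under stated assumptions, not the authors' schema: the node kinds (`Service`, `Invariant`, `Evidence`), the edge kinds (`COMMITS_TO`, `SUPPORTED_BY`), and all class and method names are hypothetical, and the operators Φ and Ψ are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    kind: str                                  # node type, e.g. "Invariant"
    props: dict = field(default_factory=dict)  # free-form properties


@dataclass
class Edge:
    src: str
    dst: str
    kind: str                                  # edge type, e.g. "SUPPORTED_BY"


@dataclass
class ConsensusGraph:
    """Hypothetical stand-in for the consensus layer C."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, src, dst, kind):
        # Reject edges that point at nodes the graph does not know about.
        for end in (src, dst):
            if end not in self.nodes:
                raise KeyError(f"unknown node: {end}")
        self.edges.append(Edge(src, dst, kind))

    def unsupported_claims(self, claim_kinds=("Invariant",)):
        """Return ids of claim nodes with no outgoing SUPPORTED_BY edge."""
        supported = {e.src for e in self.edges if e.kind == "SUPPORTED_BY"}
        return sorted(
            n.id for n in self.nodes.values()
            if n.kind in claim_kinds and n.id not in supported
        )


c = ConsensusGraph()
c.add_node(Node("orders", "Service"))
c.add_node(Node("inv:idempotent", "Invariant", {"text": "retries are idempotent"}))
c.add_node(Node("inv:auth", "Invariant", {"text": "all routes require auth"}))
c.add_node(Node("test:retry", "Evidence", {"source": "unit test"}))
c.add_edge("orders", "inv:idempotent", "COMMITS_TO")
c.add_edge("orders", "inv:auth", "COMMITS_TO")
c.add_edge("inv:idempotent", "test:retry", "SUPPORTED_BY")

audit = c.unsupported_claims()
```

In this toy graph the audit query flags `inv:auth` as a commitment with no evidence attached, which is the kind of question a reviewer cannot answer from code plus chat history alone.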

Problem

Research questions and friction points this paper is trying to address.

AI-assisted development

governable consensus

system opacity

artifact dimension collapse

consensus entropy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Consensus

typed property graph

consensus entropy