AI Summary
This work argues that scaling model size alone cannot improve physical AI performance under latency, energy, privacy, and reliability constraints, and that sensing and reasoning must be co-optimized. Inspired by biological nervous systems, the paper proposes the Artificial Tripartite Intelligence (ATI) architecture, comprising a Brainstem (L1), a Cerebellum (L2), and a Cerebral Inference Subsystem (L3/L4). Its "sensor-first" modular design enables closed-loop coordination among sensor control, adaptive sensing, and hierarchical reasoning. The architecture supports edge-cloud collaboration and invokes high-level inference only when needed. Evaluated on a mobile camera prototype under dynamic lighting and motion, ATI improves end-to-end accuracy from 53.8% to 88% compared with the default auto-exposure setting, while reducing remote L4 invocations by 43.3%.
Abstract
As AI moves from data centers to robots and wearables, scaling ever-larger models becomes insufficient. Physical AI operates under tight latency, energy, privacy, and reliability constraints, and its performance depends not only on model capacity but also on how signals are acquired through controllable sensors in dynamic environments. We present Artificial Tripartite Intelligence (ATI), a bio-inspired, sensor-first architectural contract for physical AI. ATI is tripartite at the systems level: a Brainstem (L1) provides reflexive safety and signal-integrity control, a Cerebellum (L2) performs continuous sensor calibration, and a Cerebral Inference Subsystem spanning L3/L4 supports routine skill selection and execution, coordination, and deep reasoning. This modular organization allows sensor control, adaptive sensing, edge-cloud execution, and foundation model reasoning to co-evolve within one closed-loop architecture, while keeping time-critical sensing and control on device and invoking higher-level inference only when needed. We instantiate ATI in a mobile camera prototype under dynamic lighting and motion. In our routed evaluation (L3-L4 split inference), compared to the default auto-exposure setting, ATI (L1/L2 adaptive sensing) improves end-to-end accuracy from 53.8% to 88% while reducing remote L4 invocations by 43.3%. These results show the value of co-designing sensing and inference for embodied AI.
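The layered control flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Frame` type, the function names, the brightness bounds, the proportional exposure update, and the confidence threshold are all hypothetical, chosen only to show how an on-device reflex check (L1), a calibration step (L2), and confidence-based routing between local (L3) and remote (L4) inference might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Frame:
    """Hypothetical sensor reading; field names are assumptions."""
    brightness: float  # mean pixel intensity in [0, 1]
    exposure: float    # current exposure setting, arbitrary units


def l1_brainstem(frame: Frame) -> bool:
    """Reflexive signal-integrity check: reject near-black or saturated frames.

    The 0.05 / 0.95 bounds are illustrative, not taken from the paper.
    """
    return 0.05 < frame.brightness < 0.95


def l2_cerebellum(frame: Frame, target: float = 0.5, gain: float = 0.5) -> float:
    """Return a new exposure nudged toward a target brightness.

    A simple proportional update stands in for whatever calibration
    the real system performs.
    """
    return max(0.0, frame.exposure * (1.0 + gain * (target - frame.brightness)))


def route(
    frame: Frame,
    local_model: Callable[[Frame], Tuple[str, float]],
    remote_model: Callable[[Frame], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (label, tier), escalating to the remote tier only when needed."""
    if not l1_brainstem(frame):
        return ("rejected", "L1")       # unusable signal never reaches inference
    label, confidence = local_model(frame)
    if confidence >= threshold:
        return (label, "L3")            # on-device result is trusted
    return (remote_model(frame), "L4")  # low confidence: invoke remote inference
```

In this sketch, a better-exposed frame should raise the local model's confidence, so fewer frames fall through to the remote call. That is one plausible reading of how adaptive sensing could reduce L4 invocations, though the abstract does not specify the routing criterion.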