TorR: Towards Brain-Inspired Task-Oriented Reasoning via Cache-Oriented Algorithm-Architecture Co-design

📅 2026-03-24
🤖 AI Summary
This work addresses the challenge of deploying task-oriented object detection on edge devices, where dense per-window computation and heavy memory traffic hinder real-time, energy-efficient operation. The authors propose an algorithm-architecture co-design that replaces CLIP-style dense alignment with a hyperdimensional computing (HDC) associative reasoner and exploits temporal coherence for reuse. The design combines query caching with per-class score accumulation, exact δ-updates when only a few hypervector bits change, similarity/load-gated bypass, a bit-sliced item memory with bank and precision gating, and a controller that schedules the bypass, δ, and full inference paths. Together these mechanisms expose trade-offs among accuracy, latency, and energy. Synthesized in a TSMC 28 nm process and evaluated with a cycle-accurate simulator, the system reaches a mean AP@0.5 of 44.27% across five task prompts at 30 to 60 FPS, with per-window energy of about 113 mJ at 30 FPS and about 50 mJ at 60 FPS, which the authors report as orders of magnitude lower than strong vision-language model baselines.

📝 Abstract
Task-oriented object detection (TOOD) atop CLIP offers open-vocabulary, prompt-driven semantics, yet dense per-window computation and heavy memory traffic hinder real-time, power-limited edge deployment. We present TorR, a brain-inspired algorithm-architecture co-design that replaces CLIP-style dense alignment with a hyperdimensional (HDC) associative reasoner and turns temporal coherence into reuse. On the algorithm side, TorR reformulates alignment as HDC similarity and graph composition, introducing partial-similarity reuse via (i) query caching with per-class score accumulation, (ii) exact δ-updates when only a small set of hypervector bits change, and (iii) similarity/load-gated bypass under high system load. On the architecture side, TorR instantiates a lane-scalable, bit-sliced item memory with bank/precision gating and a lightweight controller that schedules bypass/δ/full paths to meet RT-30/RT-60 targets as object counts vary. Synthesized in a TSMC 28 nm process and exercised with a cycle-accurate simulator, TorR sustains real-time throughput with millijoule-scale energy per window (≈50 mJ at 60 FPS; ≈113 mJ at 30 FPS) and low latency jitter, while delivering competitive AP@0.5 across five task prompts (mean 44.27%) within a bounded margin to strong VLM baselines, but at orders-of-magnitude lower energy. The design exposes deployment-time configurability (effective dimension D', thresholds, precision) to trade accuracy, latency, and energy for edge budgets.
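The exact δ-update mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary hypervectors packed into Python integers, a bipolar dot product as the similarity measure, and a toy path selector. The names `full_similarity`, `delta_update`, and `choose_path`, the dimension `D = 1024`, the thresholds `delta_max` and `load_max`, and the class labels are all hypothetical. The sketch also gates the bypass path only on an unchanged query or high load, which simplifies the similarity/load gating the abstract describes.

```python
import random

D = 1024  # illustrative hypervector dimension (assumption, not from the paper)


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def full_similarity(query: int, item: int, dim: int = D) -> int:
    """Bipolar dot product of two packed binary hypervectors.

    Matching bits contribute +1 and differing bits -1, so the score
    equals dim - 2 * HammingDistance(query, item).
    """
    return dim - 2 * popcount(query ^ item)


def delta_update(cached: int, old_q: int, new_q: int, item: int) -> int:
    """Exact score for new_q, computed from the cached score for old_q.

    Only the flipped positions are inspected: a bit that disagreed with
    the item before the flip now agrees (+2), and one that agreed now
    disagrees (-2).
    """
    flipped = old_q ^ new_q
    disagreed = popcount((old_q ^ item) & flipped)
    return cached + 4 * disagreed - 2 * popcount(flipped)


def choose_path(n_flipped: int, load: float,
                delta_max: int = 64, load_max: float = 0.9) -> str:
    """Toy scheduler with hypothetical thresholds."""
    if n_flipped == 0 or load > load_max:
        return "bypass"  # reuse cached scores unchanged
    if n_flipped <= delta_max:
        return "delta"   # touch only the changed bits
    return "full"        # recompute against the whole item memory


rng = random.Random(0)
items = {name: rng.getrandbits(D) for name in ("class_a", "class_b", "class_c")}
old_q = rng.getrandbits(D)
cache = {name: full_similarity(old_q, hv) for name, hv in items.items()}

# Simulate a temporally coherent next window: only 8 query bits change.
flip_mask = 0
for pos in rng.sample(range(D), 8):
    flip_mask |= 1 << pos
new_q = old_q ^ flip_mask

path = choose_path(popcount(flip_mask), load=0.5)
updated = {name: delta_update(cache[name], old_q, new_q, hv)
           for name, hv in items.items()}
recomputed = {name: full_similarity(new_q, hv) for name, hv in items.items()}
```

In this sketch `updated` matches `recomputed` exactly, while the δ path inspects only the 8 flipped positions per class instead of all 1024 bits. That is the kind of reuse temporal coherence makes possible when consecutive queries differ in only a few bits.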
Problem

Research questions and friction points this paper is trying to address.

task-oriented object detection
open-vocabulary
real-time edge deployment
memory traffic
energy efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

hyperdimensional computing
algorithm-architecture co-design
task-oriented object detection
associative reasoning
energy-efficient edge AI