Learning Quantized Continuous Controllers for Integer Hardware

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the triple challenge of deploying continuous-control reinforcement learning policies on embedded hardware: ultra-low latency, minimal power consumption, and complete avoidance of floating-point arithmetic. It proposes an end-to-end co-design framework, driven by quantization-aware training, that automatically identifies 2–3-bit integer-only policies and synthesizes them directly onto an Artix-7 FPGA. By jointly optimizing low-bit network compression and hardware-aware implementation, the approach eliminates floating-point dependencies while preserving control robustness. Evaluated on five MuJoCo benchmarks, the quantized policies remain competitive with FP32 baselines, achieve microsecond-scale inference latency, and consume only microjoules per action, comparing favorably to a quantized reference. The core contribution is the first closed-loop pipeline from low-bit policy learning to FPGA synthesis, ensuring accuracy, efficiency, and practical deployability at once.

📝 Abstract
Deploying continuous-control reinforcement learning policies on embedded hardware requires meeting tight latency and power budgets. Small FPGAs can deliver these, but only if costly floating-point pipelines are avoided. We study quantization-aware training (QAT) of policies for integer inference and present a learning-to-hardware pipeline that automatically selects low-bit policies and synthesizes them to an Artix-7 FPGA. Across five MuJoCo tasks, we obtain policy networks that are competitive with full-precision (FP32) policies but require as few as 3, or even only 2, bits per weight and per internal activation, as long as input precision is chosen carefully. On the target hardware, the selected policies achieve inference latencies on the order of microseconds and consume microjoules per action, comparing favorably to a quantized reference. Finally, we observe that the quantized policies exhibit increased robustness to input noise compared to the floating-point baseline.
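The QAT idea in the abstract can be illustrated with fake quantization: during training, weights and activations are rounded onto a low-bit integer grid in the forward pass, while gradients flow through the rounding unchanged (straight-through estimator). Below is a minimal NumPy sketch of symmetric uniform fake quantization; the exact scheme, bit allocation, and scale selection in the paper may differ.

```python
import numpy as np

def fake_quantize(x, num_bits=3, scale=None):
    """Simulate symmetric uniform quantization in the forward pass.

    During QAT the network computes with these quantized values,
    while backprop treats the rounding as identity (straight-through).
    This is an illustrative sketch, not the paper's exact scheme.
    """
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 3 for signed 3-bit
    if scale is None:
        scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)  # snap to integer grid
    return q * scale                               # dequantized "fake" value

# Example: 3-bit quantization of a small weight vector
w = np.array([0.8, -0.31, 0.05, -0.9])
wq = fake_quantize(w, num_bits=3)
```

With 3 bits the grid has only seven levels, which is why the abstract notes that input precision must be chosen carefully for the policy to stay competitive.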
Problem

Research questions and friction points this paper is trying to address.

Deploying continuous-control reinforcement learning policies efficiently on low-power embedded hardware
Reducing policy network precision to 2–3 bits for integer-only inference
Achieving microsecond latency and microjoule energy consumption per action
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantization-aware training for integer inference policies
Automatic low-bit policy selection and FPGA synthesis
Closed-loop pipeline from low-bit policy learning to FPGA synthesis
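Integer-only inference, as targeted by the FPGA deployment, typically replaces floating-point rescaling with a fixed-point multiply and right shift after an integer accumulation. The sketch below shows one such hypothetical linear layer; the function name, data widths, and requantization scheme are illustrative assumptions, not details from the paper.

```python
import numpy as np

def int_linear(x_int, w_int, bias_int, mult, shift):
    """Integer-only linear layer: int32-style accumulation, then
    requantization via fixed-point multiplier and arithmetic shift.
    No floating-point operations occur at inference time.
    Illustrative sketch; not the paper's hardware implementation.
    """
    acc = x_int.astype(np.int32) @ w_int.astype(np.int32).T + bias_int
    # (acc * mult) >> shift approximates multiplying by a float scale.
    out = (acc.astype(np.int64) * mult) >> shift
    return np.clip(out, -128, 127).astype(np.int8)  # saturate to int8

# Example: 2-bit weights restricted to {-1, 0, 1}
x_int = np.array([2, -1, 3], dtype=np.int8)    # quantized observation
w_int = np.array([[1, -1, 1]], dtype=np.int8)  # one output neuron
out = int_linear(x_int, w_int, bias_int=0, mult=1, shift=0)
```

With weights this narrow, the multiply-accumulate reduces to additions and subtractions, which is what makes 2–3-bit policies attractive on a small FPGA fabric.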