🤖 AI Summary
Deploying multilingual neural machine translation (NMT) on resource-constrained devices, such as IoT endpoints, remains challenging, especially for bidirectional translation between Indian and international languages. Method: This paper proposes an algorithm–hardware co-optimized lightweight NMT system, featuring quantization-aware training combined with sub-eight-bit mixed-precision quantization (FP4/INT4/FP8/INT8), tightly integrated with an FPGA architecture for hardware-accelerated inference. Contribution/Results: The system achieves a 4.1× reduction in model size and a 4.2× speedup in inference latency, delivering 66 tokens/s throughput. FPGA resource utilization improves significantly: LUT usage drops by 1.96× and flip-flop usage by 1.65×. Compared to the OPU and HPTA baselines, throughput increases by 2.2–4.6×, enabling real-time performance and resource efficiency under ultra-low power constraints.
📝 Abstract
This paper introduces Bhasha-Rupantarika, a lightweight and efficient multilingual translation system tailored through algorithm–hardware co-design for resource-limited settings. The method investigates model deployment at sub-8-bit precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1× reduction in model size (FP4) and a 4.2× speedup in inference, yielding a throughput of 66 tokens/s (a 4.8× improvement). This underscores the viability of ultra-low-precision quantization for real-time deployment on IoT devices using FPGA accelerators. Our evaluation covers bidirectional translation between Indian and international languages, showcasing adaptability to low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96× reduction in LUTs and a 1.65× reduction in flip-flops (FFs), resulting in a 2.2× throughput improvement over OPU and a 4.6× improvement over HPTA. Overall, the evaluation provides a viable solution combining quantization-aware translation with hardware efficiency for deployable multilingual AI systems. The complete code [https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset are publicly available for reproducibility, facilitating rapid integration and further development by researchers.
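To make the sub-8-bit idea concrete, the sketch below shows symmetric INT4 weight quantization in NumPy: each float is mapped to a 4-bit integer code in [-8, 7] plus a shared scale. This is a minimal illustration only, not the paper's actual quantization-aware training pipeline or FPGA kernel; the function names and the toy weight matrix are invented for the example.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor INT4 quantization: map floats to integer codes in [-8, 7]."""
    scale = np.max(np.abs(w)) / 7.0  # largest magnitude maps to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover an approximate float tensor from INT4 codes and the shared scale."""
    return q.astype(np.float32) * scale

# Toy weight matrix: 4 bits per value instead of 32 (plus one scale),
# with reconstruction error bounded by half the quantization step.
w = np.array([[0.25, -1.4, 0.7], [2.1, -0.05, 1.0]], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
```

In a quantization-aware training setup, a fake-quantize step like `dequantize_int4(*quantize_int4(w))` would be applied in the forward pass so the model learns weights that tolerate the rounding; mixed precision then assigns FP8/INT8 to sensitivity-critical layers and INT4/FP4 elsewhere.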