AI Summary
This work addresses the challenge of securely, efficiently, and accessibly deploying large-scale, customized large language models (LLMs) in multi-tenant academic research environments. We propose a tenant-aware, proxy-based computational network architecture that integrates zero-trust principles, secure sandboxing, role-based fine-grained access control (RBAC), and end-to-end HTTPS/TLS encryption to enable logically unified scheduling of physical resource pools while enforcing strict inter-tenant process and data isolation. Our approach innovatively combines parallel multi-LoRA inference, agent-driven resource orchestration, and an encrypted inference pipeline. Deployed at the University of Kentucky's AI Center, the platform supports secure, cross-disciplinary LLM usage by research teams. Evaluation shows a 37% reduction in inference latency and near-zero inter-tenant resource leakage risk, demonstrating robust security, scalability, and operational efficiency in production academic settings.
Abstract
This paper introduces a user-friendly platform developed by the University of Kentucky Center for Applied AI, designed to make large, customized language models (LLMs) more accessible. By capitalizing on recent advancements in multi-LoRA inference, the system efficiently accommodates custom adapters for a diverse range of users and projects. The paper outlines the system's architecture and key features, encompassing dataset curation, model training, secure inference, and text-based feature extraction. We illustrate the establishment of a tenant-aware computational network using agent-based methods, securely utilizing islands of isolated resources as a unified system. The platform strives to deliver secure LLM services, emphasizing process and data isolation, end-to-end encryption, and role-based resource authentication. This contribution aligns with the overarching goal of enabling simplified access to cutting-edge AI models and technology in support of scientific discovery.
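To make the combination of role-based access and multi-LoRA serving more concrete, the following is a minimal sketch, not the platform's actual implementation. It assumes a hypothetical in-memory registry in which each adapter belongs to one tenant and each user holds a role per tenant. Requests are checked against that registry and then grouped by adapter, which is roughly how a multi-LoRA server could share one base model across tenants. All class, function, and adapter names here are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Adapter:
    name: str    # hypothetical adapter identifier
    tenant: str  # tenant (project) that owns the adapter


@dataclass
class AdapterRegistry:
    # (tenant, user) -> role; the role names are illustrative only.
    roles: dict = field(default_factory=dict)
    # adapter name -> Adapter
    adapters: dict = field(default_factory=dict)

    def register(self, adapter: Adapter) -> None:
        self.adapters[adapter.name] = adapter

    def grant(self, tenant: str, user: str, role: str) -> None:
        self.roles[(tenant, user)] = role

    def resolve(self, user: str, adapter_name: str) -> Adapter:
        """Return the adapter only if the user holds a role in its tenant."""
        adapter = self.adapters.get(adapter_name)
        if adapter is None:
            raise KeyError(adapter_name)
        if (adapter.tenant, user) not in self.roles:
            raise PermissionError(
                f"{user} has no role in tenant {adapter.tenant}"
            )
        return adapter


def group_by_adapter(registry, requests):
    """Group authorized (user, adapter_name, prompt) requests per adapter.

    Each group could then be served by one shared base model with that
    group's LoRA weights applied; the inference step itself is omitted.
    """
    batches = {}
    for user, adapter_name, prompt in requests:
        adapter = registry.resolve(user, adapter_name)  # raises if not allowed
        batches.setdefault(adapter.name, []).append(prompt)
    return batches
```

In this sketch a request for an adapter owned by another tenant fails with `PermissionError` before any batching happens, which mirrors the idea of enforcing isolation ahead of inference rather than inside it.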