Published 'Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models' at KDD 2023
Published 'Defending ChatGPT against Jailbreak Attack via Self-Reminders' in Nature Machine Intelligence 2023
Published 'Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark' at ACL 2023, receiving the Area Chair Award in the NLP Applications Track
Published 'On the Vulnerability of Value Alignment in Open-Access LLMs' in Findings of ACL 2024
Published 'Non-IID always Bad? Semi-Supervised Heterogeneous Federated Learning with Local Knowledge Enhancement' at CIKM 2023
Published 'UA-FedRec: Untargeted Attack on Federated News Recommendation' at KDD 2023
Released 'Control Risk for Potential Misuse of Artificial Intelligence in Science' on arXiv in 2023, proposing the SciGuard system to mitigate AI misuse risks in scientific contexts
Released 'ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models' on arXiv in 2024