Introduced and characterized the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.
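A minimal sketch of one RepE-style primitive: extracting a "reading vector" as the dominant direction of hidden-state differences between contrasting prompts, then scoring new inputs by projection. The model ("gpt2"), layer choice, and prompt pairs are illustrative stand-ins, not the published setup.

```python
# Sketch: derive a candidate "reading vector" from hidden-state differences
# between contrastive prompt pairs, then project new text onto it.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

def hidden(text, layer=-1):
    """Hidden state of the last token at the given layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[layer][0, -1].numpy()

# Toy contrastive pairs; real prompt sets are larger and more careful.
pairs = [("Pretend you are an honest person. The sky is",
          "Pretend you are a dishonest person. The sky is"),
         ("Pretend you are an honest person. Dogs are",
          "Pretend you are a dishonest person. Dogs are")]
diffs = np.stack([hidden(a) - hidden(b) for a, b in pairs])

# Top singular direction of the stacked differences = candidate reading
# vector (sign is arbitrary; more pairs give a more stable direction).
_, _, vt = np.linalg.svd(diffs, full_matrices=False)
reading_vec = vt[0]

score = hidden("I would never lie to you.") @ reading_vec
print(f"projection onto candidate direction: {score:.3f}")
```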
Demonstrated that adversarial attacks on LLMs can be constructed automatically, and that these attacks transfer to closed-source, publicly available chatbots such as ChatGPT, Bard, and Claude.
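The attack in that work uses gradient-guided greedy coordinate search (GCG); the toy sketch below substitutes plain random hill-climbing to illustrate only the objective: find a suffix that raises the probability of a fixed target completion. The model ("gpt2"), prompt, and benign target string are stand-ins.

```python
# Toy illustration of the attack objective: search for a suffix that
# maximizes the log-probability of a fixed target completion.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt, target = "Write a short story.", " Sure, here is a short story"
target_ids = tok(target, return_tensors="pt").input_ids[0]

def target_logprob(suffix_ids):
    """Log-probability of the target given prompt + adversarial suffix."""
    prefix = torch.cat([tok(prompt, return_tensors="pt").input_ids[0],
                        suffix_ids])
    ids = torch.cat([prefix, target_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = model(ids).logits[0]
    # logits[i] predicts token i+1, so these rows score the target tokens.
    logp = torch.log_softmax(logits[len(prefix) - 1:-1], dim=-1)
    return logp[torch.arange(len(target_ids)), target_ids].sum().item()

suffix = torch.randint(0, tok.vocab_size, (8,))  # 8 random suffix tokens
best = target_logprob(suffix)
for _ in range(200):  # hill-climb: mutate one token, keep improvements
    cand = suffix.clone()
    cand[random.randrange(len(cand))] = random.randrange(tok.vocab_size)
    if (score := target_logprob(cand)) > best:
        suffix, best = cand, score
print(tok.decode(suffix), best)
```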
Developed globally-robust neural networks by integrating an efficient differentiable local-robustness verifier into the forward pass of a network during training.
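A simplified sketch of the idea behind that construction: bound the network's Lipschitz constant by the product of per-layer spectral norms, and certify a prediction when its logit margin exceeds what any ε-bounded perturbation could change. The published method uses a tighter per-class bound and differentiates through it during training; this sketch only does inference-time checking on a toy MLP.

```python
# Certify local robustness from a global Lipschitz bound (simplified).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def lipschitz_bound(model):
    """Product of spectral norms of the linear layers (ReLU is 1-Lipschitz)."""
    L = 1.0
    for m in model:
        if isinstance(m, nn.Linear):
            L *= torch.linalg.matrix_norm(m.weight, ord=2).item()
    return L

def certified(x, eps):
    """True if the prediction provably cannot change within an L2 eps-ball."""
    logits = net(x)
    top2 = logits.topk(2).values
    margin = (top2[0] - top2[1]).item()
    # Each logit is L-Lipschitz, so a logit gap can shrink by at most 2*L*eps.
    return margin > 2 * lipschitz_bound(net) * eps

x = torch.randn(784)
print(certified(x, eps=0.1))
```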
Proposed Capture: Centralized Library Management for IoT Devices, which offloads library code to a secure, centralized hub, reducing the burden on vendors of maintaining critical security patches.
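An illustrative-only sketch of the architectural shape: device apps link against a thin stub, while the single, centrally patched copy of the library runs on the hub. Capture's actual design virtualizes native libraries with isolation between devices; the XML-RPC transport, function name, and version string here are purely hypothetical.

```python
# Toy stand-in for the hub/stub split: one patched library copy on the hub.
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hub side: hosts the single library copy that the hub keeps patched.
def parse_record(data: str) -> str:
    return f"library v3.2.1 parsed {len(data)} bytes"

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(parse_record)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Device side: the app calls a thin stub instead of bundling the library,
# so the vendor no longer ships (and must re-patch) its own copy.
hub = xmlrpc.client.ServerProxy("http://localhost:8000")
print(hub.parse_record("hello"))
```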
Presented a fast procedure for checking the local robustness of deep networks using only geometric projections, leading to an efficient, highly parallel GPU implementation.
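A sketch of the geometric core: within a linear region of a ReLU network the logits are affine, so the distance from an input to each decision boundary is a point-to-hyperplane projection, computable in closed form and trivially parallel. A toy affine model stands in here; the full procedure also tracks distances to the region's own facets.

```python
# Closed-form projection distances to pairwise decision boundaries.
import numpy as np

W = np.random.randn(10, 784)   # per-class weights (affine logits: Wx + b)
b = np.random.randn(10)
x = np.random.randn(784)

logits = W @ x + b
c = logits.argmax()

# Distance from x to the boundary {z : (W_c - W_j) z + (b_c - b_j) = 0}
# for every competing class j, via orthogonal projection.
dW = W[c] - np.delete(W, c, axis=0)          # shape (9, 784)
db = b[c] - np.delete(b, c)
dists = np.abs(dW @ x + db) / np.linalg.norm(dW, axis=1)

eps = 0.5
print("certified" if dists.min() > eps else "not certified", dists.min())
```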
Explored the effect that small changes in the composition of training data can have on an individual's outcome under a model, and the implications for the responsible application of deep learning to sensitive decisions.
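A minimal illustration of the question being asked: how much can removing a single training point change the model's decision for one individual? Brute-force leave-one-out retraining with logistic regression on synthetic data stands in for the deep models studied in the work itself.

```python
# Leave-one-out sensitivity of one individual's predicted outcome.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
x_test = X[:1]  # the individual we track (kept in training, for simplicity)

base = LogisticRegression(max_iter=1000).fit(X, y)
p0 = base.predict_proba(x_test)[0, 1]

# Retrain with each point removed; record the largest swing in the
# individual's predicted probability.
swings = []
for i in range(len(X)):
    m = LogisticRegression(max_iter=1000).fit(np.delete(X, i, 0),
                                              np.delete(y, i))
    swings.append(abs(m.predict_proba(x_test)[0, 1] - p0))
print(f"base p={p0:.3f}, max leave-one-out change={max(swings):.3f}")
```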
Presented an AAAI'21 tutorial on how explainability can inform questions about the robustness, privacy, and fairness aspects of model quality, and how methods for improving these properties can in turn lead to better explainability.
Research Experience
Focuses on understanding the unique risks and vulnerabilities that arise from learned components, and developing methods to mitigate them, often with provable guarantees.
Background
Associate Professor at Carnegie Mellon's School of Computer Science and a member of CyLab. His research aims to enable systems that make secure, fair, and reliable use of machine learning.