🤖 AI Summary
Existing black-box extraction attacks against decision tree (DT) models suffer from high query costs, rely on strong assumptions about the tree structure, or depend on rich API output. Trusted execution environments (TEEs) are meant to restrict attackers to exactly this black-box setting. This work proposes TrEEStealer, which combines TEE-specific control-flow side-channel leakage with passive information tracking to reconstruct DTs efficiently and with high fidelity, without strong assumptions about the API output or tree structure. The attack is implemented on AMD SEV, using the SEV-Step framework and performance counters, and on Intel SGX, using a new primitive that reads the branch history of inference runs through the Branch History Register. The authors found vulnerable DT implementations in OpenCV, mlpack, and emlearn. TrEEStealer outperforms prior attacks in both extraction efficiency and fidelity, establishing a new state of the art and showing that TEEs do not protect against control-flow leakage.
📝 Abstract
Today, machine learning is widely applied in sensitive, security-related, and financially lucrative applications. Model extraction attacks undermine current business models in which a model owner sells model access, e.g., via MLaaS APIs. Additionally, stolen models can enable powerful white-box attacks, such as privacy attacks on sensitive training data and model evasion.
In this paper, we focus on Decision Trees (DTs), which are widely deployed in practice. Existing black-box extraction attacks for DTs are either query-intensive, make strong assumptions about the DT structure, or rely on rich API information. To limit attacks to the black-box setting, CPU vendors introduced Trusted Execution Environments (TEEs), which use hardware mechanisms to isolate workloads from external parties, e.g., MLaaS providers. We introduce TrEEStealer, a high-fidelity extraction attack for stealing TEE-protected DTs. TrEEStealer exploits TEE-specific side channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. Its efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. We use two primitives to acquire CFI. For AMD SEV, we follow previous work and use the SEV-Step framework and performance counters. For Intel SGX, we reproduce prior findings on current Xeon 6 CPUs and construct a new primitive that efficiently extracts the branch history of inference runs through the Branch History Register (BHR).
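The abstract does not spell out TrEEStealer's algorithm. As a rough, hypothetical illustration of why control-flow information helps, the Python sketch below models an idealized leak that reveals one bit per visited tree node (which branch was taken), roughly the kind of per-node direction a branch-history readout could expose. The names `Node`, `infer_with_leak`, `find_root_feature`, and `recover_root_threshold` are assumptions for illustration only; the attack shown is a plain binary search on a single node's threshold, not the paper's passive-information-tracking algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical DT node: leaves set `label`; internal nodes test x[feature] <= threshold."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None


def infer_with_leak(root: Node, x: List[float]) -> Tuple[int, List[int]]:
    """Classify x and return (label, trace).

    `trace` models an idealized control-flow leak (an assumption): one bit per
    visited internal node, 1 if the left branch (x[feature] <= threshold) was taken.
    """
    trace: List[int] = []
    node = root
    while node.label is None:
        go_left = x[node.feature] <= node.threshold
        trace.append(1 if go_left else 0)
        node = node.left if go_left else node.right
    return node.label, trace


def find_root_feature(root: Node, base_x: List[float], lo: float, hi: float) -> Optional[int]:
    """Return the feature whose change from lo to hi flips the root's branch bit."""
    for f in range(len(base_x)):
        x_lo, x_hi = list(base_x), list(base_x)
        x_lo[f], x_hi[f] = lo, hi
        if infer_with_leak(root, x_lo)[1][0] != infer_with_leak(root, x_hi)[1][0]:
            return f
    return None


def recover_root_threshold(root: Node, feature: int, base_x: List[float],
                           lo: float, hi: float, eps: float = 1e-6) -> Tuple[float, int]:
    """Binary-search the root threshold using only the first trace bit.

    Assumes lo <= threshold < hi. Returns (estimate, number of queries).
    """
    queries = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        x = list(base_x)
        x[feature] = mid
        _, trace = infer_with_leak(root, x)
        queries += 1
        if trace[0] == 1:   # mid <= threshold, so threshold lies in [mid, hi)
            lo = mid
        else:               # mid > threshold, so threshold lies in [lo, mid)
            hi = mid
    return (lo + hi) / 2, queries


# Hypothetical victim tree: root tests x[1] <= 0.5; its right child tests x[0] <= 2.0.
victim = Node(feature=1, threshold=0.5,
              left=Node(label=0),
              right=Node(feature=0, threshold=2.0,
                         left=Node(label=1), right=Node(label=2)))

f = find_root_feature(victim, [0.0, 0.0], -10.0, 10.0)
t, n = recover_root_threshold(victim, f, [0.0, 0.0], -10.0, 10.0)
print(f, round(t, 4), n)  # prints: 1 0.5 25
```

Each query halves the search interval, so about log2((hi - lo) / eps) queries fix one threshold. The trace bit matters because a returned label alone does not always reveal which side of a node a query fell on (different subtrees can share a label), whereas the leaked branch direction always does.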
We found corresponding vulnerabilities in three popular libraries: OpenCV, mlpack, and emlearn. We show that TrEEStealer achieves superior efficiency and extraction fidelity compared to prior attacks. Our work establishes a new state of the art for DT extraction and confirms that TEEs fail to protect against control-flow leakage.