🤖 AI Summary
This work addresses efficient, privacy-preserving keyword search in the Solid decentralized personal data ecosystem, where user data is distributed across pods governed by user-defined access controls. The authors present ESPRESSO, a framework that builds WebID-scoped indexes within pods and uses privacy-aware metadata for cross-pod source selection and ranking under user-defined visibility policies. The study introduces a formal threat model for ESPRESSO that characterizes the security and privacy risks arising when indexes and metadata are generated, aggregated, and used, and it derives design principles that limit metadata exposure and mitigate unauthorized inference. The threat model provides a foundation for evaluating privacy-preserving decentralized search and for designing systems with stronger privacy guarantees.
📝 Abstract
In decentralized personal data ecosystems built on architectures such as Solid, users retain sovereignty over their data via personal online data stores (pods) hosted on Solid-compliant servers. In such environments, data remains under the control of pod owners, which complicates search because data is distributed across numerous pods and subject to user-specific access constraints. ESPRESSO is a decentralized framework for scalable keyword-based search across distributed Solid pods under user-defined visibility policies. It addresses key challenges of decentralized search by constructing WebID-scoped indexes within pods and employing privacy-aware metadata to enable efficient source selection and ranking across servers. This paper further introduces a formal threat model for ESPRESSO, analysing the security and privacy risks associated with the generation, aggregation, and use of indexes and metadata. These risks include unintended metadata leakage and the potential for adversaries to infer sensitive information about data that resides within personal data stores. The analysis identifies key design principles that limit metadata exposure and mitigate unauthorized inference. The proposed threat model provides a foundation for evaluating privacy-preserving decentralized search and informs the design of systems with stronger privacy guarantees.
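The idea of WebID-scoped indexes combined with server-level metadata for source selection can be illustrated with a minimal Python sketch. It is not ESPRESSO's implementation: the class names `PodIndex` and `ServerMetadata`, the function `keyword_search`, the term-frequency ranking, and all URLs are assumptions made purely for illustration.

```python
from collections import defaultdict


class PodIndex:
    """Hypothetical per-pod index, partitioned by the WebID allowed to read each resource."""

    def __init__(self, pod_url):
        self.pod_url = pod_url
        # webid -> keyword -> {resource_url: term frequency}
        self._index = defaultdict(lambda: defaultdict(dict))

    def add_resource(self, resource_url, text, readers):
        """Index a resource only under the WebIDs that may read it."""
        counts = defaultdict(int)
        for token in text.lower().split():
            counts[token] += 1
        for webid in readers:
            for token, freq in counts.items():
                self._index[webid][token][resource_url] = freq

    def keywords_for(self, webid):
        """Keywords that occur in resources visible to this WebID."""
        return set(self._index.get(webid, {}))

    def search(self, webid, keyword):
        """Return (resource_url, frequency) pairs visible to this WebID."""
        postings = self._index.get(webid, {}).get(keyword.lower(), {})
        return sorted(postings.items(), key=lambda item: (-item[1], item[0]))


class ServerMetadata:
    """Hypothetical server-level metadata: which pods can answer a keyword for a WebID."""

    def __init__(self):
        # (webid, keyword) -> set of pod URLs
        self._pods = defaultdict(set)

    def publish(self, pod_index, webid):
        for keyword in pod_index.keywords_for(webid):
            self._pods[(webid, keyword)].add(pod_index.pod_url)

    def select_sources(self, webid, keyword):
        """Source selection: only pods with matches visible to this WebID."""
        return sorted(self._pods.get((webid, keyword.lower()), set()))


def keyword_search(webid, keyword, metadata, pods):
    """Query only the selected pods, then rank the merged results by frequency."""
    results = []
    for pod_url in metadata.select_sources(webid, keyword):
        results.extend(pods[pod_url].search(webid, keyword))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Illustrative data; every URL here is made up.
alice = "https://alice.example/profile/card#me"
bob = "https://bob.example/profile/card#me"

pod_a = PodIndex("https://pod-a.example/")
pod_a.add_resource("https://pod-a.example/notes/1",
                   "solid pods store personal data", readers=[alice])
pod_b = PodIndex("https://pod-b.example/")
pod_b.add_resource("https://pod-b.example/diary",
                   "personal diary personal thoughts", readers=[alice, bob])

pods = {pod.pod_url: pod for pod in (pod_a, pod_b)}
metadata = ServerMetadata()
for pod in pods.values():
    for webid in (alice, bob):
        metadata.publish(pod, webid)

alice_hits = keyword_search(alice, "personal", metadata, pods)
bob_hits = keyword_search(bob, "solid", metadata, pods)
```

In this sketch, Alice's query for "personal" reaches both pods, while Bob's query for "solid" selects no pod because the only matching resource is not shared with him. Note that the metadata here stores plain keywords per WebID, which is exactly the kind of exposure a threat model for indexes and metadata has to examine; a privacy-aware design would need to limit what this layer reveals.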