Implementation and Privacy Guarantees for Scalable Keyword Search on SOLID-based Decentralized Data with Granular Visibility Constraints

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of enabling efficient, privacy-preserving keyword search in the Solid decentralized personal data ecosystem, where user data is distributed across pods governed by fine-grained access controls. The authors present ESPRESSO, a framework that builds WebID-scoped indexes within pods and uses privacy-aware metadata to support cross-server source selection and ranking while respecting user-defined visibility policies. The study introduces a formal threat model for ESPRESSO that characterizes the security and privacy risks arising from the generation, aggregation, and use of indexes and metadata, including unintended metadata leakage and adversarial inference about pod contents. From this analysis it derives design principles that limit metadata exposure and mitigate unauthorized inference, providing a foundation for evaluating privacy-preserving decentralized search.

📝 Abstract

In decentralized personal data ecosystems grounded in architectures such as Solid, users retain sovereignty over their data via personal online data stores (pods), hosted on Solid-compliant server infrastructures. In such environments, data remains under the control of pod owners, which complicates search due to distribution across numerous pods and user-specific access constraints. ESPRESSO is a decentralized framework for scalable keyword-based search across distributed Solid pods under user-defined visibility policies. It addresses key challenges of decentralized search by constructing WebID-scoped indexes within pods and employing privacy-aware metadata to enable efficient source selection and ranking across servers. This paper further introduces a formal threat model for ESPRESSO, analysing the security and privacy risks associated with the generation, aggregation, and use of indexes and metadata. These risks include unintended metadata leakage and the potential for adversaries to infer sensitive information about data that resides within personal data stores. The analysis identifies key design principles that limit metadata exposure while mitigating unauthorized inference. The proposed threat model provides a foundation for evaluating privacy-preserving decentralized search and informs the design of systems with stronger privacy guarantees.
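
The indexing and source-selection idea described in the abstract can be illustrated with a minimal sketch. This is not ESPRESSO's implementation: the pod URLs, WebIDs, document layout, and the function names `build_webid_scoped_index`, `build_source_metadata`, and `search` are all hypothetical, and the tokenizer is a plain whitespace split. The sketch only assumes that each pod keeps a separate keyword index per authorized WebID, and that a server-level metadata map records which pods might hold hits for a given WebID and keyword.

```python
from collections import defaultdict

# Hypothetical toy data: each pod lists documents and the WebIDs allowed to read them.
PODS = {
    "https://server-a.example/alice/": [
        {"id": "note1", "text": "solid pods keep data decentralized",
         "readers": {"https://id.example/bob#me"}},
        {"id": "note2", "text": "private diary entry", "readers": set()},
    ],
    "https://server-b.example/carol/": [
        {"id": "doc1", "text": "decentralized search over pods",
         "readers": {"https://id.example/bob#me", "https://id.example/dave#me"}},
    ],
}


def build_webid_scoped_index(documents):
    """Build {webid: {keyword: {doc_id}}} for one pod (assumed index layout)."""
    index = defaultdict(lambda: defaultdict(set))
    for doc in documents:
        terms = set(doc["text"].lower().split())
        for webid in doc["readers"]:
            for term in terms:
                index[webid][term].add(doc["id"])
    return index


def build_source_metadata(pod_indexes):
    """Build {webid: {keyword: {pod_url}}} so a query can skip irrelevant pods."""
    metadata = defaultdict(lambda: defaultdict(set))
    for pod_url, index in pod_indexes.items():
        for webid, terms in index.items():
            for term in terms:
                metadata[webid][term].add(pod_url)
    return metadata


def search(webid, keyword, pod_indexes, metadata):
    """Return (pod_url, doc_id) pairs that the given WebID is allowed to see."""
    keyword = keyword.lower()
    results = []
    # Source selection: consult only pods the metadata lists for this WebID and keyword.
    for pod_url in sorted(metadata.get(webid, {}).get(keyword, ())):
        for doc_id in sorted(pod_indexes[pod_url][webid][keyword]):
            results.append((pod_url, doc_id))
    return results


pod_indexes = {url: build_webid_scoped_index(docs) for url, docs in PODS.items()}
metadata = build_source_metadata(pod_indexes)
```

In this toy layout a document with no authorized readers never enters any index, so it cannot surface in results. The metadata map still records which keywords are reachable per WebID on each pod, which is the kind of aggregated information whose exposure the threat model examines.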

Problem

Research questions and friction points this paper is trying to address.

decentralized search

privacy leakage

Solid pods

visibility constraints

metadata inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

decentralized search

privacy-preserving metadata

Solid pods