🤖 AI Summary
This paper addresses sequential hypothesis testing for streaming data under constraints of limited communication bandwidth, scarce computational resources, and stringent privacy requirements. The authors propose the Query/Hit (Q/H) learning framework, wherein a party without direct access to the source performs hypothesis testing solely by issuing symbolic queries and observing response latency, without accessing raw data or transmitting sensitive features. To their knowledge, this is the first formalization of such a query-latency-driven sequential inference paradigm. They design the Dynamic Scout-Sentinel Algorithm (DSSA), which integrates a mutual information neural estimator to enable adaptive, low-overhead query selection. Grounded in sequential analysis, DSSA jointly targets statistical efficiency and privacy preservation. Empirical evaluation on real-world datasets, including mouse trajectories, typesetting patterns, and touch interaction logs, shows reductions in both probability of error and detection delay compared to several baselines.
📝 Abstract
This work introduces the Query/Hit (Q/H) learning model. The setup consists of two agents. One agent, Alice, has access to a streaming source, while the other, Bob, does not have direct access to the source. Communication occurs through sequential Q/H pairs: Bob sends a sequence of source symbols (queries), and Alice responds with the waiting time until each query appears in the source stream (hits). This model is motivated by scenarios with communication, computation, and privacy constraints that limit real-time access to the source. The error exponent for sequential hypothesis testing under the Q/H model is characterized, and a querying strategy, the Dynamic Scout-Sentinel Algorithm (DSSA), is proposed. The strategy employs a mutual information neural estimator to compute the error exponent associated with each query and to select the query with the highest efficiency. Extensive empirical evaluations on both synthetic and real-world datasets -- including mouse movement trajectories, typesetting patterns, and touch-based user interactions -- compare the proposed strategy with baselines in terms of probability of error, query choice, and time-to-detection.
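The Q/H exchange described above can be illustrated with a minimal sketch. This is not the paper's implementation: the i.i.d. toy source, the alphabet, and the helper names (`hit_time`, `alice_stream`) are all assumptions made for illustration. Bob observes only the waiting time until his queried symbol appears, never the stream itself.

```python
import random

def hit_time(stream, query):
    """Bob's observation: the waiting time (number of symbols emitted)
    until `query` first appears in `stream`. Hypothetical helper."""
    for t, symbol in enumerate(stream, start=1):
        if symbol == query:
            return t
    return None  # the query never appeared in the (finite) stream

def alice_stream(probs, rng, length=10_000):
    """Alice's side: a toy i.i.d. streaming source over a finite alphabet.
    `probs` maps each symbol to its emission probability (assumed model)."""
    symbols = list(probs)
    weights = list(probs.values())
    return (rng.choices(symbols, weights=weights)[0] for _ in range(length))

# One Q/H pair: Bob queries the symbol "b"; Alice answers with the hit time.
rng = random.Random(0)
stream = alice_stream({"a": 0.7, "b": 0.3}, rng)
wait = hit_time(stream, "b")
```

Under each hypothesis the source distribution differs, so the distribution of `wait` differs too; a sequential test on the observed hit times is what the Q/H model formalizes, with DSSA choosing which symbol to query next.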