From Natural Language to Executable Properties for Property-based Testing of Mobile Apps

📅 2026-03-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Property-based testing (PBT) is effective for validating mobile apps, but its adoption is hindered by the effort and expertise needed to write executable properties by hand. This work proposes a structured property synthesis approach, implemented in a tool named iPBT, that translates natural language property descriptions into framework-specific executable properties. It first uses multimodal large language models to align UI widgets with their functional semantics, then uses an LLM with in-context learning to generate the property. iPBT achieves 95.2% accuracy (118/124) on 124 property descriptions with both GPT-4o and DeepSeek-V3, and 87.6% accuracy on 1,180 linguistic variants. A user study with 10 participants indicates a 56% reduction in the time needed to write executable properties.

📝 Abstract

Property-based testing (PBT) is a popular software testing methodology and is effective in validating the functionality of mobile applications (apps for short). However, its adoption in practice remains limited, largely due to the manual effort and technical expertise required to specify executable properties. In this experience paper, we propose a novel structured property synthesis approach that automatically translates property descriptions in natural language into executable properties, and implement it in a tool named iPBT. Our approach decomposes the problem into UI semantic grounding and executable property synthesis. It first builds an enriched widget context via multimodal LLMs to align visual elements with their functional semantics, and then uses an LLM with in-context learning to generate framework-specific executable properties. We evaluate iPBT with a closed-source LLM (GPT-4o) and an open-source LLM (DeepSeek-V3) on 124 diverse property descriptions derived from an existing benchmark dataset. iPBT achieves 95.2% (118/124) accuracy on both LLMs. Notably, an ablation study reveals that the enriched widget context contributes an absolute improvement of up to 20.2 percentage points (from 75.0% to 95.2%). A user study with 10 participants demonstrates that iPBT reduces the time required to write executable properties by 56%, suggesting substantially lower manual effort. Furthermore, evaluations on 1,180 linguistically diverse variations demonstrate iPBT's robustness (87.6% accuracy), indicating its capability to handle varied expressions.
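
The two-stage decomposition described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not iPBT's actual implementation: the `Widget` fields, the function names, the prompt layout, and the example property text are all hypothetical, and the multimodal LLM call is replaced by a local stub so the sketch runs without any external service.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    resource_id: str      # identifier taken from the UI hierarchy (hypothetical field)
    widget_class: str     # e.g. "Button", "EditText"
    text: str             # visible label, possibly empty
    semantics: str = ""   # functional description added during grounding


def stub_describe(widget):
    """Stand-in for a multimodal LLM call that would describe the widget's function."""
    kind = widget.widget_class.lower()
    return f"{kind} labelled {widget.text!r}" if widget.text else f"unlabelled {kind}"


def enrich_widgets(widgets, describe):
    """Stage 1 (UI semantic grounding): attach a functional description to each widget."""
    return [
        Widget(w.resource_id, w.widget_class, w.text, describe(w))
        for w in widgets
    ]


def build_prompt(description, widgets, examples):
    """Stage 2 input: assemble an in-context-learning prompt for property synthesis."""
    lines = ["Translate the property description into an executable property.", ""]
    for example_description, example_property in examples:
        lines += [f"Description: {example_description}", f"Property:\n{example_property}", ""]
    lines.append("Widgets:")
    for w in widgets:
        lines.append(f"- {w.resource_id} ({w.widget_class}, text={w.text!r}): {w.semantics}")
    lines += ["", f"Description: {description}", "Property:"]
    return "\n".join(lines)


# Hypothetical widgets and in-context example, for illustration only.
widgets = [Widget("btn_save", "Button", "Save"), Widget("et_title", "EditText", "")]
enriched = enrich_widgets(widgets, stub_describe)
examples = [(
    "After deleting a note, it no longer appears in the list.",
    "# hypothetical framework-specific property code",
)]
prompt = build_prompt("After saving a note, its title appears in the list.", enriched, examples)
print(prompt)
```

In a real pipeline the prompt would be sent to an LLM and the returned property checked against the target testing framework; the sketch stops at prompt construction because the framework and model interfaces are not specified in this summary.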

Problem

Research questions and friction points this paper is trying to address.

Property-based Testing

Mobile Apps

Executable Properties

Natural Language

Software Testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Property-based Testing

Natural Language to Code

Multimodal LLM