🤖 AI Summary
This work addresses the challenge of balancing planning efficiency and execution accuracy in multi-step image editing tasks, such as detecting and recoloring a bench, removing a cat, or repainting a wall. We propose a neurosymbolic agent framework with an adaptive dual-mode architecture: a "fast" mode uses large language models (LLMs) to induce reusable symbolic subroutines from previously successful editing trajectories, enabling intuitive high-level planning, while a "slow" mode applies hierarchical task decomposition coupled with localized A* search to generate low-cost tool-invocation sequences. The two modes coordinate dynamically during editing, autonomously choosing between subroutine invocation and fine-grained search. Experiments demonstrate that the method matches the success rates of state-of-the-art baselines while significantly reducing computational overhead, effectively integrating symbolic subroutine induction with neural planning for image editing.
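To make the fast-slow coordination concrete, here is a minimal, runnable Python sketch of the dispatch loop: try an induced subroutine first, and fall back to per-subtask A* search only on failure. All names here (`SubroutineLibrary`, `astar_toolpath`, the toy tool names) are illustrative assumptions rather than the framework's actual interfaces, and the search and induction steps are stubbed out.

```python
from dataclasses import dataclass, field

@dataclass
class SubroutineLibrary:
    """Cache of induced subroutines, keyed by subtask type (illustrative)."""
    routines: dict = field(default_factory=dict)  # e.g. {"recolor": [tool names]}

    def match(self, subtask_type):
        return self.routines.get(subtask_type)

    def record_success(self, subtask_type, toolpath):
        # FaSTA* uses an LLM to inductively extract/refine subroutines from
        # successful toolpaths; a plain cache stands in for that step here.
        self.routines[subtask_type] = toolpath

def run_toolpath(toolpath, subtask):
    """Stub executor: pretend every tool call succeeds."""
    print(f"{subtask}: running {toolpath}")
    return True

def astar_toolpath(subtask):
    """Stand-in for the slow, local A* search over tool-call sequences."""
    print(f"{subtask}: fast path missing or failed, running A* search")
    return ["detect", "segment", "edit"]

def edit(subtasks, library):
    for kind, target in subtasks:  # subtasks come from LLM planning in FaSTA*
        routine = library.match(kind)                # fast mode: try a subroutine
        if routine is not None and run_toolpath(routine, (kind, target)):
            continue                                 # fast path covered the subtask
        toolpath = astar_toolpath((kind, target))    # slow mode: A* fallback
        run_toolpath(toolpath, (kind, target))
        library.record_success(kind, toolpath)       # feed future fast-path reuse

library = SubroutineLibrary({"recolor": ["segment_mask", "recolor_region"]})
edit([("recolor", "bench"), ("remove", "cat"), ("recolor", "wall")], library)
```

In this toy run, the two recolor subtasks ride the fast path while the unseen removal subtask triggers the slow search once, after which its toolpath is cached for reuse.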
📝 Abstract
We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." It combines fast, high-level subtask planning by large language models (LLMs) with slow, accurate tool use and local A$^*$ search per subtask to find a cost-efficient toolpath, i.e., a sequence of calls to AI tools. To save the cost of A$^*$ on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract and refine frequently used subroutines, which are reused as new tools for future tasks in an adaptive fast-slow planning scheme: the higher-level subroutines are explored first, and only when they fail is the low-level A$^*$ search activated. The reusable symbolic subroutines considerably reduce exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent, "FaSTA$^*$": fast subtask planning followed by rule-based subroutine selection per subtask is attempted by LLMs first and is expected to cover most tasks, while slow A$^*$ search is triggered only for novel and challenging subtasks. Comparing with recent image editing approaches, we demonstrate that FaSTA$^*$ is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate.
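As a rough illustration of the slow mode, the sketch below fleshes out the `astar_toolpath` stub from the earlier snippet with textbook A* over a toy state graph of tool calls under assumed per-call costs. The actual tool set, cost model, and heuristic used by FaSTA$^*$ are not specified here, so everything named below (`TOOL_COSTS`, the state graph, the tool names) is hypothetical.

```python
import heapq

# Assumed per-call costs for a few hypothetical tools.
TOOL_COSTS = {"grounding_dino": 1.0, "sam_segment": 1.5, "recolor": 2.0}

def astar_toolpath(start, is_goal, successors, heuristic):
    """A* over tool-call sequences: returns (cheapest toolpath, its cost)."""
    frontier = [(heuristic(start), 0.0, start, [])]  # (f = g + h, g, state, path)
    visited = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        if state in visited:
            continue
        visited.add(state)
        for tool, nxt in successors(state):
            g2 = g + TOOL_COSTS[tool]
            heapq.heappush(frontier, (g2 + heuristic(nxt), g2, nxt, path + [tool]))
    return None, float("inf")

# Toy state graph for a "recolor the bench to pink" subtask: either recolor
# directly after grounding, or segment first for a cleaner but costlier edit.
GRAPH = {
    "raw":     [("grounding_dino", "located")],
    "located": [("sam_segment", "masked"), ("recolor", "edited")],
    "masked":  [("recolor", "edited")],
}

path, cost = astar_toolpath(
    "raw",
    is_goal=lambda s: s == "edited",
    successors=lambda s: GRAPH.get(s, []),
    heuristic=lambda s: 0.0,  # trivial admissible heuristic (reduces to Dijkstra)
)
print(path, cost)  # ['grounding_dino', 'recolor'] 3.0
```

With tool calls priced this way, the search returns the cheapest sequence of calls that completes the subtask, which is exactly the quantity the induced subroutines let the agent skip recomputing on familiar subtask types.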