SmartSearch: Process Reward-Guided Query Refinement for Search Agents
Abstract
SmartSearch enhances LLM-based search agents through process rewards and query refinement mechanisms that improve intermediate search query quality via a three-stage curriculum learning approach.
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems by incorporating information retrieval capabilities. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. As a result, the generated queries often remain inaccurate, leading to unexpected retrieval results and ultimately limiting search agents' overall effectiveness. To mitigate this issue, we introduce SmartSearch, a framework built upon two key mechanisms: (1) Process rewards, which provide fine-grained supervision for the quality of each intermediate search query through Dual-Level Credit Assessment. (2) Query refinement, which promotes the optimization of query generation by selectively refining low-quality search queries and regenerating subsequent search rounds based on these refinements. To enable the search agent to progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework. This framework guides the agent through a progression from imitation, to alignment, and ultimately to generalization. Experimental results show that SmartSearch consistently surpasses existing baselines, and additional quantitative analyses further confirm its significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.
Community
Some key observations from the paper are:
i. Dual-Level Credit Assessment
This mechanism provides a comprehensive evaluation of query quality through both rule-based and model-based assessments. It allows for fine-grained supervision, helping to identify not just redundancy but also the usefulness of each query in the context of the search process.
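As a rough illustration of how such a dual-level assessment could be combined into a single process reward, here is a minimal sketch. All function names, the token-overlap redundancy rule, and the mixing weight `alpha` are illustrative assumptions, not the paper's actual implementation (which uses an LLM judge for the model-based level):

```python
def rule_based_score(query: str, history: list[str]) -> float:
    """Rule level: penalize queries that are near-duplicates of earlier rounds
    (a simple Jaccard-overlap stand-in for the paper's rule-based checks)."""
    tokens = set(query.lower().split())
    for past in history:
        past_tokens = set(past.lower().split())
        overlap = len(tokens & past_tokens) / max(len(tokens | past_tokens), 1)
        if overlap > 0.8:  # near-duplicate of an earlier query
            return 0.0
    return 1.0

def model_based_score(answer_gain: float) -> float:
    """Model level: stand-in for a model-based usefulness judgment.
    Here we simply clip a hypothetical usefulness signal to [0, 1]."""
    return max(0.0, min(1.0, answer_gain))

def dual_level_credit(query: str, history: list[str], answer_gain: float,
                      alpha: float = 0.5) -> float:
    """Combine both levels into one process reward for this query."""
    return (alpha * rule_based_score(query, history)
            + (1 - alpha) * model_based_score(answer_gain))
```

Under these assumptions, a fresh, useful query scores high, while a query that merely repeats an earlier one is zeroed out at the rule level regardless of its model-level usefulness.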
ii. Process Reward Mechanism
The introduction of process rewards as a guiding signal for training search agents is a novel approach. It shifts the focus from solely final outcomes to the quality of intermediate queries, addressing a significant gap in existing methods that often overlook this aspect.
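One simple way to picture the shift from outcome-only supervision to process supervision is a trajectory return that mixes per-query process rewards with the final answer reward. The weighting scheme below (`beta` and the mean over steps) is an assumed sketch, not the paper's training objective:

```python
def trajectory_return(process_rewards: list[float], outcome_reward: float,
                      beta: float = 0.3) -> float:
    """Outcome-only training would use just `outcome_reward`; process
    supervision additionally credits the quality of intermediate queries."""
    if not process_rewards:
        return outcome_reward
    mean_process = sum(process_rewards) / len(process_rewards)
    return (1 - beta) * outcome_reward + beta * mean_process
```

With this shape, two trajectories that reach the same final answer are no longer indistinguishable: the one with cleaner intermediate queries earns a higher return.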
iii. Query Refinement Strategy
The framework employs a systematic query refinement process that identifies low-quality queries and generates improved versions. This iterative refinement enhances the effectiveness of search trajectories, allowing agents to adaptively improve their queries based on feedback.
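The selective-refinement step described above can be sketched as follows: locate the first query whose process reward falls below a quality bar, refine it, and regenerate everything after it. The threshold value and the `refine_fn`/`regen_fn` callables are hypothetical placeholders for the paper's refinement and regeneration components:

```python
def refine_trajectory(queries, scores, refine_fn, regen_fn, threshold=0.5):
    """Replace the first below-threshold query with a refined version and
    regenerate all subsequent search rounds from that point onward."""
    for i, score in enumerate(scores):
        if score < threshold:
            refined = refine_fn(queries[i])
            # later rounds depend on this query's results, so rebuild them
            return queries[:i] + [refined] + regen_fn(refined, i)
    return queries  # every query already meets the quality bar
```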
iv. Three-Stage Curriculum Learning Framework
SmartSearch introduces a structured curriculum learning approach that progresses from imitation to alignment and finally to generalization. This staged learning process enables search agents to internalize query quality improvement progressively, enhancing their overall performance.
v. Empirical Validation Across Diverse Benchmarks
The paper presents extensive experimental results demonstrating SmartSearch's superior performance across multiple challenging knowledge-intensive tasks and web exploration scenarios. This empirical validation highlights the framework's robustness and effectiveness in real-world applications, showcasing its potential impact on future research in search agents and information retrieval.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LightSearcher: Efficient DeepSearch via Experiential Memory (2025)
- CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic (2025)
- SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning (2025)
- RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning (2025)
- KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering (2025)
- AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search (2026)
- PRInTS: Reward Modeling for Long-Horizon Information Seeking (2025)