S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
Abstract
Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce S-Agent, a spatial tool-use agentic paradigm for understanding and reasoning over continuous multi-view images and videos. By formulating spatial reasoning as spatio-temporal evidence accumulation rather than isolated frame-level prediction, S-Agent reshapes spatial perception into scene-centric understanding beyond frame-centric recognition. Specifically, S-Agent casts the VLM as a semantic planner that decides what evidence is needed, while a hierarchy of spatial tools and experts grounds objects in 2D, lifts them into 3D geometric evidence, and aggregates this evidence into high-level spatial knowledge (e.g., counting, measurement, orientation, and relative position). Additionally, a temporal memory mechanism, including Scene Memory for maintaining the evolving scene state and Agent Memory for accumulating reasoning context, enables evidence integration across frames and reasoning steps. Comprehensive experiments on multi-view and video spatial reasoning benchmarks show that S-Agent consistently improves both open-source and closed-source VLMs in a training-free manner. Beyond inference-time augmentation, supervised fine-tuning (SFT) on S-Agent-generated spatial trajectories (S-300K) yields S-Agent-8B, a compact spatial agent that significantly surpasses similar-scale baselines (e.g., Qwen3-VL-8B) and performs comparably to advanced closed-source models (e.g., GPT-5.4 and Gemini 3).
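The pipeline described in the abstract (planner, then 2D grounding, then 3D lifting, then aggregation, with two memories) can be illustrated with a toy sketch. This is not the authors' implementation: every name below (SceneMemory, AgentMemory, lift_to_3d, count_tool, distance_tool, plan, run_agent) is hypothetical, the "planner" is a keyword rule standing in for a VLM, detections are supplied by hand instead of by a 2D grounding model, and the camera is assumed to be a pinhole with identity rotation and known translation.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class SceneMemory:
    """Hypothetical scene state: object id -> running-mean 3D position."""
    positions: dict = field(default_factory=dict)
    hits: dict = field(default_factory=dict)

    def update(self, obj_id, xyz):
        # Fuse a new observation into the running mean for this object.
        n = self.hits.get(obj_id, 0)
        prev = self.positions.get(obj_id, (0.0, 0.0, 0.0))
        self.positions[obj_id] = tuple(
            (p * n + x) / (n + 1) for p, x in zip(prev, xyz)
        )
        self.hits[obj_id] = n + 1


@dataclass
class AgentMemory:
    """Hypothetical reasoning context: ordered log of tool calls and results."""
    log: list = field(default_factory=list)


def lift_to_3d(u, v, depth, cam_t, f=100.0, cx=50.0, cy=50.0):
    """Pinhole back-projection; assumes identity rotation, known translation."""
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x + cam_t[0], y + cam_t[1], depth + cam_t[2])


def count_tool(scene, label):
    # Count distinct tracked objects whose id starts with "<label>#".
    return sum(1 for obj_id in scene.positions if obj_id.startswith(label + "#"))


def distance_tool(scene, a, b):
    # Euclidean distance between two tracked objects in world coordinates.
    return dist(scene.positions[a], scene.positions[b])


def plan(question):
    """Stand-in for the VLM planner: a keyword rule, not a real model."""
    return "count" if question.startswith("how many") else "distance"


def run_agent(frames, question, args):
    scene, agent = SceneMemory(), AgentMemory()
    # Accumulate evidence across frames instead of answering per frame.
    for frame in frames:
        for obj_id, u, v, depth in frame["detections"]:
            scene.update(obj_id, lift_to_3d(u, v, depth, frame["cam_t"]))
    tool = plan(question)
    if tool == "count":
        result = count_tool(scene, *args)
    else:
        result = distance_tool(scene, *args)
    agent.log.append((tool, args, result))
    return result, scene, agent


# Two toy frames: (object id, pixel u, pixel v, depth in meters).
frames = [
    {"cam_t": (0.0, 0.0, 0.0),
     "detections": [("chair#1", 50, 50, 2.0), ("table#1", 150, 50, 2.0)]},
    {"cam_t": (1.0, 0.0, 0.0),
     "detections": [("chair#1", 0, 50, 2.0), ("chair#2", 50, 50, 4.0)]},
]

n_chairs, _, _ = run_agent(frames, "how many chairs are there?", ("chair",))
gap, _, agent = run_agent(
    frames, "how far is the chair from the table?", ("chair#1", "table#1")
)
print(n_chairs, gap)  # prints: 2 2.0
```

In this toy setup neither frame alone shows both chairs, so the count of 2 only emerges once observations from both frames are merged into the shared scene state. That is the scene-centric versus frame-centric distinction the abstract draws, shown here under heavily simplified assumptions.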
Authors (13)
- Yalun Dai
- Hao Li
- Shulin Tian
- Runmao Yao
- Yuhao Dong
- Fangzhou Hong
- Zhaoxi Chen
- Fangfu Liu
- Baoliang Tian
- Dingwen Zhang
- Tao Wang
- Kim-Hui Yap
- Ziwei Liu