Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

arXiv · PDF · Code not released

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sensing (RS) tasks. However, their ability to comprehend negation remains underexplored. This limits their deployment in real-world applications where models must explicitly identify what is false or absent; for example, emergency responders need to locate non-flooded routes for evacuation. To comprehensively study this limitation, we introduce RS-Neg, the first benchmark to evaluate negation understanding across tasks ranging from region level to scene level. Specifically, we design an automated data generation pipeline for RS imagery that uses LLMs to synthesize diverse negation queries, and we introduce a dynamic visual focus module for verification. Our evaluation reveals that advanced RS MLLMs struggle with negation, exhibiting hallucinations and substantial performance degradation. To close this gap, we propose NeFo, a novel test-time learning method that explicitly incorporates the logical role of negation into model optimization. Remarkably, using only about 5% of the test samples, without labels, NeFo significantly improves the models' negation understanding and shows strong generalization to unseen tasks. Code and data will be released upon acceptance.
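The abstract does not specify RS-Neg's query format or NeFo's objective. The toy sketch here is a hypothetical illustration only, not the authors' LLM-based pipeline: the query templates, `make_queries`, and `negation_blind_model` are all assumptions. It shows why negation is a distinct test. A negated query's correct answer is the logical complement of the affirmative one, so a model that reasons only from object presence and ignores "no" gets every negated query wrong.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    answer: bool  # ground-truth yes/no answer


def make_queries(category: str, present: bool) -> list[Query]:
    """Hypothetical template: an affirmative query plus its negated counterpart.

    The negated query's answer is the logical complement of the affirmative one.
    """
    return [
        Query(f"Is there a {category} in the image?", present),
        Query(f"Is there no {category} in the image?", not present),
    ]


def negation_blind_model(query: str, present: bool) -> bool:
    # Hypothetical failure mode: answers from object presence alone and
    # ignores the word "no" in the query text.
    return present


def accuracy(scene: dict[str, bool]) -> float:
    """Fraction of generated queries the negation-blind model answers correctly."""
    pairs = [(q, p) for cat, p in scene.items() for q in make_queries(cat, p)]
    correct = sum(negation_blind_model(q.text, p) == q.answer for q, p in pairs)
    return correct / len(pairs)


scene = {"flooded road": True, "bridge": False}
print(accuracy(scene))  # 0.5: all affirmative queries right, all negated ones wrong
```

In this toy setting, accuracy stays at 50% no matter how good presence detection is. This mirrors, in simplified form, the gap that a negation-specific benchmark is meant to expose.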

Authors (4)

Haochen Han
Jue Wang
Alex Jinpeng Wang
Fangming Liu
