<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>CV Radar</title><description>Automatically collects new papers from major computer vision conferences and arXiv, and gathers summaries, code availability, and author information in one place.</description><link>https://cv-radar.example.com/</link><item><title>JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising</title><link>https://cv-radar.example.com/papers/2606.20563/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20563/</guid><description>Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches f</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>ECCV 2026</category><category>cs.CV</category></item><item><title>TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living</title><link>https://cv-radar.example.com/papers/2606.20561/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20561/</guid><description>Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse ca</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning</title><link>https://cv-radar.example.com/papers/2606.20559/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20559/</guid><description>Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. 
We argue that a truly expressive egocentric representation must subsume comp</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category><category>cs.LG</category></item><item><title>Thinking in Boxes: 3D Editing in Real Images Made Easy</title><link>https://cv-radar.example.com/papers/2606.20556/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20556/</guid><description>Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing -- particularly under large object motions and camera changes. Prior work has used 3D primitives such as boxes, but only as loose conditioning signals indicating appr</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups</title><link>https://cv-radar.example.com/papers/2606.20547/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20547/</guid><description>We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$ -- a bare transformation, with no feature payload and no external action $ρ(g)$ carrying it. 
To our knowledge this is the first attention construction whose tokens are bare matrix Lie</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.LG</category><category>cs.CV</category><category>cs.GR</category><category>cs.RO</category><category>math.DG</category></item><item><title>Current World Models Lack a Persistent State Core</title><link>https://cv-radar.example.com/papers/2606.20545/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20545/</guid><description>World models are increasingly regarded as a decisive step toward artificial general intelligence, yet modeling the physical world demands more than rendering convincing frames on demand: it requires an internal world state that keeps evolving over time, decoupled from observation</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>SSD: Spatially Speculative Decoding Accelerates Autoregressive Image Generation</title><link>https://cv-radar.example.com/papers/2606.20543/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20543/</guid><description>Autoregressive models excel in visual generation by treating images as 1D sequences of discrete tokens, mirroring language modeling. However, this flattening discards the intrinsic 2D spatial locality of visual signals, creating severe computational bottlenecks during inference. </description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>CalTennis: Large Multi-View Tennis Video Dataset and Benchmark of Monocular-to-3D Pose Estimation</title><link>https://cv-radar.example.com/papers/2606.20542/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20542/</guid><description>The Caltech Tennis Dataset (CalTennis) is a large-scale video benchmark for evaluating monocular-to-3D pose estimation in the wild. 
CalTennis comprises over 11 million frames (51 hours) of tennis practice and match play from 40 players, captured with 2-6 synchronized cameras at 6</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation</title><link>https://cv-radar.example.com/papers/2606.20536/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20536/</guid><description>The Frechet Inception Distance (FID) is the de facto arbiter of image generation, yet most papers report just a single number from a single trained model using a single sampling seed. How reproducible is that number if we retrain the model, or merely resample from it? In this pap</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>VisDom: Sparse Novel View Synthesis with Visible Domain Constraint</title><link>https://cv-radar.example.com/papers/2606.20531/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20531/</guid><description>Sparse novel view synthesis (NVS) remains challenging due to the ambiguity of recovering 3D geometry from few input views. While NeRF- and Gaussian Splatting (GS)-based methods perform well with dense supervision, they often overfit in sparse settings, producing floating artifact</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs</title><link>https://cv-radar.example.com/papers/2606.20527/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20527/</guid><description>Multimodal large language models (MLLMs) are increasingly deployed in personally and societally consequential settings, yet the visual cues that shape how these models judge people remain poorly understood. 
Prior work often compares different (groups of) individuals, making it di</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>ICML 2026</category><category>cs.CL</category><category>cs.CV</category></item><item><title>SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm</title><link>https://cv-radar.example.com/papers/2606.20523/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20523/</guid><description>Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable resources for synthetic aperture radar (SAR) remain limited. Existing SAR-optical datasets largely rely on low-resolution, intensity-only Ground Range Detected (GRD) products an</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category><category>cs.AI</category><category>cs.DB</category></item><item><title>HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining</title><link>https://cv-radar.example.com/papers/2606.20521/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20521/</guid><description>Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, </description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence</title><link>https://cv-radar.example.com/papers/2606.20515/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20515/</guid><description>Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. 
We introduce S-Agent, a spatial tool-use</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining</title><link>https://cv-radar.example.com/papers/2606.20506/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20506/</guid><description>Style-content dual-reference generation aims to synthesize an image that preserves the structure and semantics of a content reference while adopting the style of a separate style reference. Despite recent progress, this setting remains challenging because models must balance conte</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category><category>cs.AI</category></item><item><title>Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation</title><link>https://cv-radar.example.com/papers/2606.20491/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20491/</guid><description>Human visual attention relies on structured scanpaths to efficiently process scenes, yet instilling this behavior into robot autonomy is in its infancy and hindered by the high computational costs of existing predictive models. To address this, we introduce GazeLNN, a computation</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.RO</category><category>cs.CV</category></item><item><title>How Fragile Are Training-Free AI-Generated Image Detectors? A Controlled Audit of Score Direction, Preprocessing, and Compression</title><link>https://cv-radar.example.com/papers/2606.20488/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20488/</guid><description>Training-free detectors of AI-generated images promise generator-agnostic deployment without classifier training, yet their reported numbers are rarely compared under a single controlled protocol. 
We audit two representative training-free scores -- an autoencoder-reconstruction s</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology</title><link>https://cv-radar.example.com/papers/2606.20477/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20477/</guid><description>We study how to train visually grounded vision-language models (VLMs) for radiology without manual spatial annotations. We introduce RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with task-specific VQ</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category><category>cs.CL</category><category>cs.LG</category></item><item><title>PCFootprint: A Large-Scale Dataset and Benchmark for Vectorized Building Footprint Extraction from Aerial LiDAR Point Clouds</title><link>https://cv-radar.example.com/papers/2606.20455/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20455/</guid><description>Building footprint extraction is a fundamental task in photogrammetry, remote sensing, and computer vision. Recent image-based methods have achieved remarkable progress in extracting vectorized footprints from high-resolution optical imagery. 
However, optical imagery inherently s</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>InfantFace: Detecting infant faces in neonatal clinical environments</title><link>https://cv-radar.example.com/papers/2606.20449/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20449/</guid><description>Reliable localisation of the neonatal face is the first step for several video-camera based non-contact assessments such as pain and distress related facial expression analysis, pain scoring, cardiorespiratory signal extraction and cessation of breathing alerts. However, major ch</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>Spectral Query-Key Product Weight Steering for Training-Free VLM Hallucination Mitigation</title><link>https://cv-radar.example.com/papers/2606.20419/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20419/</guid><description>Vision-language models (VLMs) often generate fluent but visually unsupported descriptions, especially by mentioning objects absent from the image. We propose QK Product Steering, a data-free, training-free, and zero-inference-cost weight edit for reducing object hallucination. Th</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>On the Redundancy of Timestep Embeddings in Diffusion Models</title><link>https://cv-radar.example.com/papers/2606.20416/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20416/</guid><description>Diffusion models rely heavily on explicit timestep embeddings to modulate the denoising process across various noise scales. In this work, we challenge the necessity of these temporal signals by analyzing their impact on U-Net and Diffusion Transformer architectures. 
Beyond empir</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.LG</category><category>cs.CV</category></item><item><title>FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows</title><link>https://cv-radar.example.com/papers/2606.20404/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20404/</guid><description>Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator--the depth predictor defi</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>Geometry-Aware Superpixel Graph Transformer with Metadata for Skin Lesion Classification</title><link>https://cv-radar.example.com/papers/2606.20390/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20390/</guid><description>Automated skin cancer classification from dermoscopic images remains challenging due to heterogeneous lesion structure, strong intra-class variability, and subtle visual differences between benign and malignant cases. Existing CNN/ViT pipelines typically rely on global or patch-l</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>Reliability-Aware Prototype Calibration for Frozen Pose-Flow Video Anomaly Detection</title><link>https://cv-radar.example.com/papers/2606.20312/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20312/</guid><description>Pose-flow video anomaly detectors are attractive for one-class surveillance because they provide likelihood-based rankings for tracked skeleton windows. However, a single likelihood score may hide multimodal normal behavior and be sensitive to pose-observation noise. 
We study a f</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>Through the PRISM: Preference Representation in Intermediate States of Video Diffusion Models</title><link>https://cv-radar.example.com/papers/2606.20310/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20310/</guid><description>Evaluating video generation with clean, pixel-based reward models disconnects evaluation from the noisy diffusion process and incurs massive VAE decoding costs. In this paper, we challenge this paradigm by asking a fundamental question: Can a powerful video generator inherently d</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>GEN-Guard: Correcting Generalization Failures for Deployable Federated Surgical AI</title><link>https://cv-radar.example.com/papers/2606.20303/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20303/</guid><description>Federated Learning (FL) in surgical video AI enables collaborative model training without sharing sensitive data. However, standard evaluation practices - selecting the &quot;best&quot; global model based only on validation data from participating hospitals - can lead to suboptimal deploym</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>CUPID: Reconstructing UV Texture Maps for Interpretable Person-of-Interest Deepfake Detection</title><link>https://cv-radar.example.com/papers/2606.20302/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20302/</guid><description>Deepfakes targeting a high-profile individual, known as Person-of-Interest (POI), are a threat to modern democracies and societies. 
Current POI deepfake detection methods still struggle to combine robustness to post-processing, efficiency and interpretability, focal aspects of mo</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>CMDS-AD: Cross-Modal Dual-Stream Decoupling for Few-Shot Anomaly Detection</title><link>https://cv-radar.example.com/papers/2606.20300/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20300/</guid><description>Few-shot anomaly detection remains challenging due to limited training data. Multi-modal anomaly detection (MAD) offers a viable solution, leveraging 3D geometric cues to enrich 2D RGB representations and compensate for this scarcity. However, existing MAD methods apply spatially</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>ECCV 2026</category><category>cs.CV</category></item><item><title>Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision</title><link>https://cv-radar.example.com/papers/2606.20291/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20291/</guid><description>Remote sensing is increasingly relied upon to deliver actionable science for forest and wildfire risk management across large landscapes. Wall-to-wall, annually updated maps are a persistent need for effective forest management. 
Many planning systems and data collections combine </description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.LG</category><category>cs.CV</category></item><item><title>U$^2$Mamba: A Two-level Nested U-structure Mamba for Salient Object Detection</title><link>https://cv-radar.example.com/papers/2606.20282/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20282/</guid><description>Mamba-based models have emerged as a promising alternative for salient object detection (SOD), offering significant advantages in modeling long sequences. However, existing models often fail to explore contextual information and the depth of the entire architecture. This paper in</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>Efficiently Linking Real Scenes with Synthetic Data Generation for AI-based Cognitive Robotics and Computer Vision Applications</title><link>https://cv-radar.example.com/papers/2606.20272/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20272/</guid><description>AI vision models are a driving factor for the potential use case scenarios of cognitive robotics within industry and household applications. A large array of methods from semantic environment analysis towards 6D and grasping pose estimation have been proposed based on the </description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.RO</category><category>cs.CV</category></item><item><title>Single-Stage Hierarchical Rectification for Weakly Supervised Histopathology Segmentation</title><link>https://cv-radar.example.com/papers/2606.20250/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20250/</guid><description>Existing weakly supervised semantic segmentation (WSSS) methods in computational pathology rely on a multi-stage paradigm: class activation map (CAM) generation, offline pseudo-mask refinement, and fully supervised retraining. 
While established, this decoupled approach presents f</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs</title><link>https://cv-radar.example.com/papers/2606.20244/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20244/</guid><description>Vision-language models (VLMs) often underperform on evidence-intensive tasks because decisive visual evidence is small, localized, and easy to overlook, leading to failures in evidence readout even when high-level reasoning is intact. Prior inference-time visual interventions ca</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category><category>cs.AI</category></item><item><title>BAFIS: Dataset + Framework to assess occupational Bias and Human Preference in modern Text-to-image Models</title><link>https://cv-radar.example.com/papers/2606.20241/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20241/</guid><description>Generative artificial intelligence has the potential to improve productivity and transform the production of creative content. However, existing research indicates that image generation models are significantly influenced by biases. This work investigates the inherent biases and </description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>WACV 2026</category><category>cs.CV</category></item><item><title>DeepForestVisionV2: Ecology-Driven Taxonomy Expansion for Camera-Trap Monitoring in African Tropical Forests</title><link>https://cv-radar.example.com/papers/2606.20223/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20223/</guid><description>Camera-trap monitoring in African tropical forests increasingly extends beyond closed-canopy interiors to riverbanks, clearings, and park edges. 
Among available open tools for African forest camera-trap classification, DeepForestVision is the only one providing a matched offline </description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category><category>q-bio.QM</category></item><item><title>Evaluation of Image Matching for Art Skills Assessment</title><link>https://cv-radar.example.com/papers/2606.20199/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20199/</guid><description>While some individuals possess a natural talent for drawing, mastering this skill requires dedicated training and practice. Determining one&apos;s skill in the art of drawing requires proper comprehensive assessment. In this paper, we propose a method to measure drawing skill by ma</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>cs.CV</category></item><item><title>Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation</title><link>https://cv-radar.example.com/papers/2606.20196/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20196/</guid><description>Continual Test-Time Adaptation (CTTA) aims to maintain model performance under evolving target domains by adapting online without labeled data. 
However, practical deployments often cannot retain the source dataset due to privacy or licensing constraints, and purely source-free CT</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>ECCV 2026</category><category>cs.CV</category></item><item><title>HilDA: Hierarchical Distillation with Diffusion for Advancing Self-Supervised LiDAR Pre-trainin</title><link>https://cv-radar.example.com/papers/2606.20189/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20189/</guid><description>Leveraging Vision Foundation Models (VFMs) for camera-to-LiDAR knowledge distillation offers a promising solution to the scarcity of annotated data needed to represent the immense geometric and kinematic diversity of real-world autonomous driving (AD). However, current approaches</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>ECCV 2026</category><category>cs.CV</category><category>cs.AI</category><category>cs.RO</category></item><item><title>Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs</title><link>https://cv-radar.example.com/papers/2606.20177/</link><guid isPermaLink="true">https://cv-radar.example.com/papers/2606.20177/</guid><description>Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sensing (RS) tasks. However, their ability to comprehend negation remains underexplored, limiting deployment in real-world applications where models must explicitly identify what is fa</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>ECCV 2026</category><category>cs.CV</category><category>cs.AI</category></item></channel></rss>