CV Radar

Automatically collects new papers from major computer vision conferences and arXiv, and organizes summaries, code availability, and author information in one place.
https://cv-radar.example.com/

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
https://cv-radar.example.com/papers/2606.20563/
Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches f
Thu, 18 Jun 2026 00:00:00 GMT | ECCV 2026 | cs.CV

TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living
https://cv-radar.example.com/papers/2606.20561/
Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse ca
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning
https://cv-radar.example.com/papers/2606.20559/
Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. We argue that a truly expressive egocentric representation must subsume comp
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV, cs.LG

Thinking in Boxes: 3D Editing in Real Images Made Easy
https://cv-radar.example.com/papers/2606.20556/
Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing -- particularly under large object motions and camera changes.
Prior work has used 3D primitives such as boxes, but only as loose conditioning signals indicating appr
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups
https://cv-radar.example.com/papers/2606.20547/
We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$ -- a bare transformation, with no feature payload and no external action $ρ(g)$ carrying it. To our knowledge this is the first attention construction whose tokens are bare matrix Lie
Thu, 18 Jun 2026 00:00:00 GMT | cs.LG, cs.CV, cs.GR, cs.RO, math.DG

Current World Models Lack a Persistent State Core
https://cv-radar.example.com/papers/2606.20545/
World models are increasingly regarded as a decisive step toward artificial general intelligence, yet modeling the physical world demands more than rendering convincing frames on demand: it requires an internal world state that keeps evolving over time, decoupled from observation
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

SSD: Spatially Speculative Decoding Accelerates Autoregressive Image Generation
https://cv-radar.example.com/papers/2606.20543/
Autoregressive models excel in visual generation by treating images as 1D sequences of discrete tokens, mirroring language modeling. However, this flattening discards the intrinsic 2D spatial locality of visual signals, creating severe computational bottlenecks during inference.
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

CalTennis: Large Multi-View Tennis Video Dataset and Benchmark of Monocular-to-3D Pose Estimation
https://cv-radar.example.com/papers/2606.20542/
The Caltech Tennis Dataset (CalTennis) is a large-scale video benchmark for evaluating monocular-to-3D pose estimation in the wild.
CalTennis comprises over 11 million frames (51 hours) of tennis practice and match play from 40 players, captured with 2-6 synchronized cameras at 6
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation
https://cv-radar.example.com/papers/2606.20536/
The Frechet Inception Distance (FID) is the de facto arbiter of image generation, yet most papers report just a single number from a single trained model using a single sampling seed. How reproducible is that number if we retrain the model, or merely resample from it? In this pap
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

VisDom: Sparse Novel View Synthesis with Visible Domain Constraint
https://cv-radar.example.com/papers/2606.20531/
Sparse novel view synthesis (NVS) remains challenging due to the ambiguity of recovering 3D geometry from few input views. While NeRF- and Gaussian Splatting (GS)-based methods perform well with dense supervision, they often overfit in sparse settings, producing floating artifact
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
https://cv-radar.example.com/papers/2606.20527/
Multimodal large language models (MLLMs) are increasingly deployed in personally and societally consequential settings, yet the visual cues that shape how these models judge people remain poorly understood. Prior work often compares different (groups of) individuals, making it di
Thu, 18 Jun 2026 00:00:00 GMT | ICML 2026 | cs.CL, cs.CV

SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm
https://cv-radar.example.com/papers/2606.20523/
Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable resources for synthetic aperture radar (SAR) remain limited.
Existing SAR-optical datasets largely rely on low-resolution, intensity-only Ground Range Detected (GRD) products an
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV, cs.AI, cs.DB

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining
https://cv-radar.example.com/papers/2606.20521/
Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment,
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
https://cv-radar.example.com/papers/2606.20515/
Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations.
We introduce S-Agent, a spatial tool-use
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
https://cv-radar.example.com/papers/2606.20506/
Style-content dual-reference generation aims to synthesize an image that preserves the structure and semantics of a content reference while adopting the style of a separate style reference. Despite recent progress, this setting remains challenging because models must balance conte
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV, cs.AI

Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation
https://cv-radar.example.com/papers/2606.20491/
Human visual attention relies on structured scanpaths to efficiently process scenes, yet instilling this behavior into robot autonomy is in its infancy and hindered by the high computational costs of existing predictive models. To address this, we introduce GazeLNN, a computation
Thu, 18 Jun 2026 00:00:00 GMT | cs.RO, cs.CV

How Fragile Are Training-Free AI-Generated Image Detectors? A Controlled Audit of Score Direction, Preprocessing, and Compression
https://cv-radar.example.com/papers/2606.20488/
Training-free detectors of AI-generated images promise generator-agnostic deployment without classifier training, yet their reported numbers are rarely compared under a single controlled protocol. We audit two representative training-free scores -- an autoencoder-reconstruction s
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology
https://cv-radar.example.com/papers/2606.20477/
We study how to train visually grounded vision-language models (VLMs) for radiology without manual spatial annotations.
We introduce RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with task-specific VQ
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV, cs.CL, cs.LG

PCFootprint: A Large-Scale Dataset and Benchmark for Vectorized Building Footprint Extraction from Aerial LiDAR Point Clouds
https://cv-radar.example.com/papers/2606.20455/
Building footprint extraction is a fundamental task in photogrammetry, remote sensing, and computer vision. Recent image-based methods have achieved remarkable progress in extracting vectorized footprints from high-resolution optical imagery. However, optical imagery inherently s
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

InfantFace: Detecting infant faces in neonatal clinical environments
https://cv-radar.example.com/papers/2606.20449/
Reliable localisation of the neonatal face is the first step for several video-camera based non-contact assessments such as pain and distress related facial expression analysis, pain scoring, cardiorespiratory signal extraction and cessation of breathing alerts. However, major ch
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

Spectral Query-Key Product Weight Steering for Training-Free VLM Hallucination Mitigation
https://cv-radar.example.com/papers/2606.20419/
Vision-language models (VLMs) often generate fluent but visually unsupported descriptions, especially by mentioning objects absent from the image. We propose QK Product Steering, a data-free, training-free, and zero-inference-cost weight edit for reducing object hallucination. Th
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

On the Redundancy of Timestep Embeddings in Diffusion Models
https://cv-radar.example.com/papers/2606.20416/
Diffusion models rely heavily on explicit timestep embeddings to modulate the denoising process across various noise scales.
In this work, we challenge the necessity of these temporal signals by analyzing their impact on U-Net and Diffusion Transformer architectures. Beyond empir
Thu, 18 Jun 2026 00:00:00 GMT | cs.LG, cs.CV

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows
https://cv-radar.example.com/papers/2606.20404/
Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator -- the depth predictor defi
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

Geometry-Aware Superpixel Graph Transformer with Metadata for Skin Lesion Classification
https://cv-radar.example.com/papers/2606.20390/
Automated skin cancer classification from dermoscopic images remains challenging due to heterogeneous lesion structure, strong intra-class variability, and subtle visual differences between benign and malignant cases. Existing CNN/ViT pipelines typically rely on global or patch-l
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

Reliability-Aware Prototype Calibration for Frozen Pose-Flow Video Anomaly Detection
https://cv-radar.example.com/papers/2606.20312/
Pose-flow video anomaly detectors are attractive for one-class surveillance because they provide likelihood-based rankings for tracked skeleton windows. However, a single likelihood score may hide multimodal normal behavior and be sensitive to pose-observation noise.
We study a f
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

Through the PRISM: Preference Representation in Intermediate States of Video Diffusion Models
https://cv-radar.example.com/papers/2606.20310/
Evaluating video generation with clean, pixel-based reward models disconnects evaluation from the noisy diffusion process and incurs massive VAE decoding costs. In this paper, we challenge this paradigm by asking a fundamental question: Can a powerful video generator inherently d
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

GEN-Guard: Correcting Generalization Failures for Deployable Federated Surgical AI
https://cv-radar.example.com/papers/2606.20303/
Federated Learning (FL) in surgical video AI enables collaborative model training without sharing sensitive data. However, standard evaluation practices - selecting the "best" global model based only on validation data from participating hospitals - can lead to suboptimal deploym
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

CUPID: Reconstructing UV Texture Maps for Interpretable Person-of-Interest Deepfake Detection
https://cv-radar.example.com/papers/2606.20302/
Deepfakes targeting a high-profile individual, known as Person-of-Interest (POI), are a threat to modern democracies and societies. Current POI deepfake detection methods still struggle to combine robustness to post-processing, efficiency and interpretability, focal aspects of mo
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

CMDS-AD: Cross-Modal Dual-Stream Decoupling for Few-Shot Anomaly Detection
https://cv-radar.example.com/papers/2606.20300/
Few-shot anomaly detection remains challenging due to limited training data. Multi-modal anomaly detection (MAD) offers a viable solution, leveraging 3D geometric cues to enrich 2D RGB representations and compensate for this scarcity.
However, existing MAD methods apply spatially
Thu, 18 Jun 2026 00:00:00 GMT | ECCV 2026 | cs.CV

Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision
https://cv-radar.example.com/papers/2606.20291/
Remote sensing is increasingly relied upon to deliver actionable science for forest and wildfire risk management across large landscapes. Wall-to-wall, annually updated maps are a persistent need for effective forest management. Many planning systems and data collections combine
Thu, 18 Jun 2026 00:00:00 GMT | cs.LG, cs.CV

U$^2$Mamba: A Two-level Nested U-structure Mamba for Salient Object Detection
https://cv-radar.example.com/papers/2606.20282/
Mamba-based models have emerged as a promising alternative for salient object detection (SOD), offering significant advantages in modeling long sequences. However, existing models often fail to explore contextual information and the depth of the entire architecture. This paper in
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

Efficiently Linking Real Scenes with Synthetic Data Generation for AI-based Cognitive Robotics and Computer Vision Applications
https://cv-radar.example.com/papers/2606.20272/
AI vision models are a driving factor for the potential use case scenarios of cognitive robotics within the industry and household applications.
A large array of methods from semantic environment analysis towards 6D and grasping pose estimation have been proposed based on the
Thu, 18 Jun 2026 00:00:00 GMT | cs.RO, cs.CV

Single-Stage Hierarchical Rectification for Weakly Supervised Histopathology Segmentation
https://cv-radar.example.com/papers/2606.20250/
Existing weakly supervised semantic segmentation (WSSS) methods in computational pathology rely on a multi-stage paradigm: class activation map (CAM) generation, offline pseudo-mask refinement, and fully supervised retraining. While established, this decoupled approach presents f
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs
https://cv-radar.example.com/papers/2606.20244/
Vision-language models (VLMs) often underperform on evidence-intensive tasks because decisive visual evidence is small, localized, and easy to overlook, leading to failures in evidence readout even when high-level reasoning is intact. Prior inference-time visual interventions ca
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV, cs.AI

BAFIS: Dataset + Framework to assess occupational Bias and Human Preference in modern Text-to-image Models
https://cv-radar.example.com/papers/2606.20241/
Generative artificial intelligence has the potential to improve productivity and transform the production of creative content. However, existing research indicates that image generation models are significantly influenced by biases.
This work investigates the inherent biases and
Thu, 18 Jun 2026 00:00:00 GMT | WACV 2026 | cs.CV

DeepForestVisionV2: Ecology-Driven Taxonomy Expansion for Camera-Trap Monitoring in African Tropical Forests
https://cv-radar.example.com/papers/2606.20223/
Camera-trap monitoring in African tropical forests increasingly extends beyond closed-canopy interiors to riverbanks, clearings, and park edges. Among available open tools for African forest camera-trap classification, DeepForestVision is the only one providing a matched offline
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV, q-bio.QM

Evaluation of Image Matching for Art Skills Assessment
https://cv-radar.example.com/papers/2606.20199/
While some individuals possess a natural talent for drawing, mastering this skill requires dedicated training and practice. Determining one's skill in the art of drawing requires proper comprehensive assessment. In this paper, we propose a method to measure drawing skill by ma
Thu, 18 Jun 2026 00:00:00 GMT | cs.CV

Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation
https://cv-radar.example.com/papers/2606.20196/
Continual Test-Time Adaptation (CTTA) aims to maintain model performance under evolving target domains by adapting online without labeled data.
However, practical deployments often cannot retain the source dataset due to privacy or licensing constraints, and purely source-free CT
Thu, 18 Jun 2026 00:00:00 GMT | ECCV 2026 | cs.CV

HilDA: Hierarchical Distillation with Diffusion for Advancing Self-Supervised LiDAR Pre-trainin
https://cv-radar.example.com/papers/2606.20189/
Leveraging Vision Foundation Models (VFMs) for camera-to-LiDAR knowledge distillation offers a promising solution to the scarcity of annotated data needed to represent the immense geometric and kinematic diversity of real-world autonomous driving (AD). However, current approaches
Thu, 18 Jun 2026 00:00:00 GMT | ECCV 2026 | cs.CV, cs.AI, cs.RO

Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs
https://cv-radar.example.com/papers/2606.20177/
Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sensing (RS) tasks. However, their ability to comprehend negation remains underexplored, limiting deployment in real-world applications where models must explicitly identify what is fa
Thu, 18 Jun 2026 00:00:00 GMT | ECCV 2026 | cs.CV, cs.AI