├── misc
    ├── mask.jpg
    └── CVPR2022-papers.list
├── output
    ├── cvpr_22.png
    ├── keywords.jpg
    ├── wordcloud.png
    └── keyworks-2122.png
├── README.md
└── cvpr22.py


/misc/mask.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/misc/mask.jpg


--------------------------------------------------------------------------------
/output/cvpr_22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/cvpr_22.png


--------------------------------------------------------------------------------
/output/keywords.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/keywords.jpg


--------------------------------------------------------------------------------
/output/wordcloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/wordcloud.png


--------------------------------------------------------------------------------
/output/keyworks-2122.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/keyworks-2122.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
- [CVPR2022-Paper-Statistics](#cvpr2022-paper-statistics)
  - [Wordcloud](#wordcloud)
  - [Acceptance](#acceptance)
  - [Keywords](#keywords)
  - [Code Usage](#code-usage)

# CVPR2022-Paper-Statistics

Statistics and visualization of the main keywords of papers accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition ([CVPR 2022](https://cvpr2022.thecvf.com/)).

Inspired by [CVPR-2021-Paper-Statistics](https://github.com/hoya012/CVPR-2021-Paper-Statistics/)


## Wordcloud


![output](./output/wordcloud.png)

## Acceptance

| Year  | Submissions | Accepted | Acceptance rate |
| :---: | :---------: | :------: | :-------------: |
| 2011  |    1677     |   438    |     26.10%      |
| 2012  |    1933     |   466    |     24.10%      |
| 2013  |    1798     |   472    |     26.20%      |
| 2014  |    1807     |   540    |     29.90%      |
| 2015  |    2123     |   602    |     28.40%      |
| 2016  |    2145     |   643    |     29.90%      |
| 2017  |    2680     |   783    |     29.20%      |
| 2018  |    3359     |   979    |     29.10%      |
| 2019  |    5160     |   1294   |     25.07%      |
| 2020  |    6656     |   1468   |     22.13%      |
| 2021  |    7015     |   1663   |     23.71%      |
| 2022  |    8161     |   2067   |     25.32%      |

![acceptance](./output/cvpr_22.png)

## Keywords

![keywords](./output/keywords.jpg)

- Most of the top keywords are unchanged
  - Image, Object, Detection, 3D, Video, Segmentation
- **Transformer** is about $5\times$ as frequent as before, thanks to [**ViT**](https://arxiv.org/abs/2010.11929)
  - transformer: $35$ -> $177$ 😉


![keywords](./output/keyworks-2122.png)


## Code Usage

```bash
# 1. install packages
pip install matplotlib scipy pillow wordcloud nltk numpy
# 2. run the script from the repository root: it reads misc/mask.jpg and
#    writes keywords.jpg and wordcloud.png to the current directory
python cvpr22.py --list misc/CVPR2022-papers.list
```

--------------------------------------------------------------------------------
/cvpr22.py:
--------------------------------------------------------------------------------
"""
Dependency:
    matplotlib scipy pillow wordcloud nltk numpy
The code is adapted from:
    http://amueller.github.io/word_cloud/
    https://github.com/hoya012/CVPR-2021-Paper-Statistics/
"""
import argparse
from collections import Counter

import matplotlib.pyplot as plt
import nltk
import numpy as np
from nltk.corpus import stopwords
from PIL import Image
from scipy.ndimage import gaussian_gradient_magnitude
from wordcloud import ImageColorGenerator, WordCloud

# Fetch the NLTK stopword corpus; the corpus itself is loaded lazily on first use.
nltk.download("stopwords")


def get_list(args) -> list:
    # One paper title per line; blank lines are skipped.
    # The file is assumed to be UTF-8 (some titles contain curly quotes).
    with open(args.list, encoding="utf-8") as f:
        paper_list = [line.strip() for line in f if line.strip()]
    return paper_list


def gen_keywords_fig(keyword_counter):
    # Show the N most common keywords and their frequencies
    num_keywords = 75
    keywords_counter_vis = keyword_counter.most_common(num_keywords)

    plt.rcdefaults()
    _, ax = plt.subplots(figsize=(8, 18))

    key = [k[0] for k in keywords_counter_vis]
    value = [k[1] for k in keywords_counter_vis]
    y_pos = np.arange(len(key))
    ax.barh(y_pos, value, align="center", color="green", ecolor="black")
    ax.set_yticks(y_pos)
    ax.set_yticklabels(key, rotation=0, fontsize=10)
    ax.invert_yaxis()
    # Print each count just to the right of its bar
    for i, v in enumerate(value):
        ax.text(v + 2, i + 0.25, str(v), color="black", fontsize=10)
    ax.set_title("CVPR 2022 Submission Top {} Keywords".format(num_keywords))

    plt.savefig("keywords.jpg", bbox_inches="tight", dpi=128)


def gen_wordcloud_fig(keyword_list):
    # Show the word cloud formed by the keywords
    parrot_color = np.array(Image.open("misc/mask.jpg"))
    # subsample by factor of 3. Very lossy but for a wordcloud we don't really care.
    parrot_color = parrot_color[::3, ::3]

    # create mask: white is "masked out"
    parrot_mask = parrot_color.copy()
    parrot_mask[parrot_mask.sum(axis=2) == 0] = 255

    # some finesse: we enforce boundaries between colors so they get less washed out.
    # For that we do some edge detection in the image
    edges = np.mean(
        [
            gaussian_gradient_magnitude(parrot_color[:, :, i] / 255.0, 2)
            for i in range(3)
        ],
        axis=0,
    )
    parrot_mask[edges > 0.08] = 255

    # create wordcloud. A bit sluggish, you can subsample more strongly for quicker rendering
    # relative_scaling=0 means the frequencies in the data are reflected less
    # accurately but it makes a better picture
    wc = WordCloud(
        max_words=1024,
        mask=parrot_mask,
        max_font_size=50,
        random_state=42,
        background_color="white",
        relative_scaling=0,
    )
    # generate word cloud
    wc.generate(" ".join(keyword_list))
    # create coloring from image
    image_colors = ImageColorGenerator(parrot_color)
    wc.recolor(color_func=image_colors)
    plt.figure(figsize=(10, 10))
    plt.imshow(wc, interpolation="bilinear")
    wc.to_file("wordcloud.png")


def run(args):
    paper_list = get_list(args)
    print(f" == totals: {len(paper_list)}")

    stopwords_english = set(stopwords.words("english"))
    stopwords_deep_learning = {
        "learning",
        "network",
        "neural",
        "networks",
        "deep",
        "via",
        "using",
        "convolutional",
        "single",
    }

    keyword_list = []

    for idx, title in enumerate(paper_list):

        print(f"{idx} : {title}")

        # Lowercase before de-duplicating so that "Learning" and "learning"
        # in the same title count once; dict.fromkeys keeps the title's word order.
        word_list = list(dict.fromkeys(word.lower() for word in title.split()))
        word_list_cleaned = [
            word
            for word in word_list
            if word not in stopwords_english and word not in stopwords_deep_learning
        ]
        keyword_list.extend(word_list_cleaned)

    keyword_counter = Counter(keyword_list)
    print(keyword_counter)
    print(f"{len(keyword_counter)} different keywords before merging")

    # Merge duplicates: fold a plural such as "cnns" into its singular "cnn"
    duplicates = [k for k in keyword_counter if k + "s" in keyword_counter]
    for k in duplicates:
        keyword_counter[k] += keyword_counter[k + "s"]
        del keyword_counter[k + "s"]
    print(keyword_counter)
    print(f"{len(keyword_counter)} different keywords after merging")
    gen_keywords_fig(keyword_counter)
    gen_wordcloud_fig(keyword_list)


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="CVPR 2022 Paper Statistics.")
    parser.add_argument("--list", type=str, required=True, help="Paper list")
    args = parser.parse_args()
    run(args)

--------------------------------------------------------------------------------
/misc/CVPR2022-papers.list:
--------------------------------------------------------------------------------
1 | Efficient Deep Embedded Subspace Clustering 2 | Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers 3 | CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data 4 | Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning 5 | Active Learning for Open-Set Annotation 6 | Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training 7 | Robust Optimization As Data Augmentation for Large-Scale Graphs 8 | A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty 9 | The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration 10 | Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector 11 | GCR: Gradient Coreset Based Replay Buffer
Selection for Continual Learning 12 | Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning 13 | A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration 14 | Learning To Learn by Jointly Optimizing Neural Architecture and Weights 15 | Learning To Prompt for Continual Learning 16 | Meta-Attention for ViT-Backed Continual Learning 17 | Multi-Frame Self-Supervised Depth With Transformers 18 | Continual Learning With Lifelong Vision Transformer 19 | Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation 20 | Revisiting Random Channel Pruning for Neural Network Compression 21 | Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase 22 | Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning 23 | Towards Robust and Reproducible Active Learning Using Neural Networks 24 | Non-Iterative Recovery From Nonlinear Observations Using Generative Models 25 | Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders 26 | Robust Combination of Distributed Gradients Under Adversarial Perturbations 27 | Do Learned Representations Respect Causal Relationships? 28 | How Much More Data Do I Need? 
Estimating Requirements for Downstream Tasks 29 | Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees 30 | Contrastive Test-Time Adaptation 31 | AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation 32 | Selective-Supervised Contrastive Learning With Noisy Labels 33 | RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks 34 | Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction 35 | Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels 36 | Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design 37 | Learning Structured Gaussians To Approximate Deep Ensembles 38 | Out-of-Distribution Generalization With Causal Invariant Transformations 39 | Split Hierarchical Variational Compression 40 | Implicit Feature Decoupling With Depthwise Quantization 41 | Understanding Uncertainty Maps in Vision With Statistical Testing 42 | A Hybrid Quantum-Classical Algorithm for Robust Fitting 43 | A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching 44 | FastDOG: Fast Discrete Optimization on GPU 45 | Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization 46 | AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks 47 | Training Quantised Neural Networks With STE Variants: The Additive Noise Annealing Algorithm 48 | AME: Attention and Memory Enhancement in Hyper-Parameter Optimization 49 | Accelerating Neural Network Optimization Through an Automated Control Theory Lens 50 | Efficient Maximal Coding Rate Reduction by Variational Forms 51 | A Unified Framework for Implicit Sinkhorn Differentiation 52 | Computing Wasserstein-p Distance Between Images With Linear Cost 53 | An Iterative Quantum Approach for Transformation Estimation From Point Sets 54 | BoosterNet: Improving Domain Generalization of Deep Neural Nets Using 
Culpability-Ranked Features 55 | Pooling Revisited: Your Receptive Field Is Suboptimal 56 | Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis 57 | Online Convolutional Re-Parameterization 58 | RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality 59 | DyRep: Bootstrapping Training With Dynamic Re-Parameterization 60 | Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free 61 | Condensing CNNs With Partial Differential Equations 62 | Deep Equilibrium Optical Flow Estimation 63 | Frame Averaging for Equivariant Shape Space Learning 64 | Dual-Generator Face Reenactment 65 | Convolution of Convolution: Let Kernels Spatially Collaborate 66 | SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention 67 | RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising 68 | Co-Domain Symmetry for Complex-Valued Deep Learning 69 | Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention 70 | Compressing Models With Few Samples: Mimicking Then Replacing 71 | Total Variation Optimization Layers for Computer Vision 72 | AIM: An Auto-Augmenter for Images and Meshes 73 | Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to the Task of Accelerated MRI Reconstruction 74 | Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching 75 | Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images 76 | Delving Into the Estimation Shift of Batch Normalization in a Network 77 | Generalizing Interactive Backpropagating Refinement for Dense Prediction Networks 78 | Brain-Inspired Multilayer Perceptron With Spiking Neurons 79 | Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique 80 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models 81 | On the Integration of Self-Attention and 
Convolution 82 | Hire-MLP: Vision MLP via Hierarchical Rearrangement 83 | Stable Long-Term Recurrent Video Super-Resolution 84 | Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation 85 | Progressive End-to-End Object Detection in Crowded Scenes 86 | Zero-Shot Text-Guided Object Generation With Dream Fields 87 | ISNet: Shape Matters for Infrared Small Target Detection 88 | Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving 89 | CLRNet: Cross Layer Refinement Network for Lane Detection 90 | CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection 91 | Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection 92 | Group Contextualization for Video Recognition 93 | Learning Transferable Human-Object Interaction Detector With Natural Language Supervision 94 | Accelerating DETR Convergence via Semantic-Aligned Matching 95 | Efficient Video Instance Segmentation via Tracklet Query and Proposal 96 | Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation 97 | Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection 98 | C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation 99 | Sketching Without Worrying: Noise-Tolerant Sketch-Based Image Retrieval 100 | AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks 101 | Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection 102 | A Proposal-Based Paradigm for Self-Supervised Sound Source Localization in Videos 103 | SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization 104 | Towards End-to-End Unified Scene Text Detection and Layout Analysis 105 | Clothes-Changing Person Re-Identification With RGB Modality Only 106 | MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object 
Detection 107 | Homography Loss for Monocular 3D Object Detection 108 | TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers 109 | TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation 110 | RBGNet: Ray-Based Grouping for 3D Object Detection 111 | Voxel Field Fusion for 3D Object Detection 112 | Learning To Detect Mobile Objects From LiDAR Scans Without Labels 113 | OccAM’s Laser: Occlusion-Based Attribution Maps for 3D Object Detectors on LiDAR Data 114 | Confidence Propagation Cluster: Unleash Full Potential of Object Detectors 115 | TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization 116 | A Voxel Graph CNN for Object Classification With Event Cameras 117 | OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection 118 | Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes 119 | Category Contrast for Unsupervised Domain Adaptation in Visual Tasks 120 | Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With a Bayesian Model 121 | GANSeg: Learning To Segment by Unsupervised Hierarchical Image Generation 122 | Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation 123 | Deep Hierarchical Semantic Segmentation 124 | Semantic Segmentation by Early Region Proxy 125 | Panoptic, Instance and Semantic Relations: A Relational Context Encoder To Enhance Panoptic Segmentation 126 | Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers 127 | Masked-Attention Mask Transformer for Universal Image Segmentation 128 | FocalClick: Towards Practical Interactive Image Segmentation 129 | High Quality Segmentation for Ultra High-Resolution Images 130 | Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks 131 | Recurrent Dynamic Embedding for Video Object Segmentation 132 | Accelerating Video Object Segmentation With Compressed Video 133 
| Per-Clip Video Object Segmentation 134 | SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization 135 | Neural Recognition of Dashed Curves With Gestalt Law of Continuity 136 | CVNet: Contour Vibration Network for Building Extraction 137 | A Keypoint-Based Global Association Network for Lane Detection 138 | EDTER: Edge Detection With Transformer 139 | Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction 140 | Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration 141 | CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance 142 | FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing 143 | Rotationally Equivariant 3D Object Detection 144 | AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis 145 | Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes 146 | Human Mesh Recovery From Multiple Shots 147 | HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network 148 | Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing 149 | Disentangled3D: Learning a 3D Generative Model With Disentangled Geometry and Appearance From Monocular Images 150 | NeuralHDHair: Automatic High-Fidelity Hair Modeling From a Single Image Using Implicit Neural Representations 151 | Topologically-Aware Deformation Fields for Single-View 3D Reconstruction 152 | Generating Diverse 3D Reconstructions From a Single Occluded Face Image 153 | LOLNerf: Learn From One Look 154 | Learning Local Displacements for Point Cloud Completion 155 | Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation 156 | Dimension Embeddings for Monocular 3D Object Detection 157 | Understanding 3D Object Articulation in Internet Videos 158 | P3Depth: Monocular Depth Estimation With a Piecewise Planarity Prior 159 | Neural Face Identification in a 2D Wireframe 
Projection of a Manifold Object 160 | PanopticDepth: A Unified Framework for Depth-Aware Panoptic Segmentation 161 | Stability-Driven Contact Reconstruction From Monocular Color Images 162 | LGT-Net: Indoor Panoramic Room Layout Estimation With Geometry-Aware Transformer Network 163 | Collaborative Learning for Hand and Object Reconstruction With Attention-Guided Graph Convolution 164 | RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes 165 | Exploring Geometric Consistency for Monocular 3D Object Detection 166 | Learning 3D Object Shape and Layout Without 3D Supervision 167 | Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data 168 | Occluded Human Mesh Recovery 169 | LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints 170 | OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction 171 | Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light 172 | Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection 173 | HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening 174 | Revisiting Near/Remote Sensing With Geospatial Attention 175 | Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening 176 | Mutual Information-Driven Pan-Sharpening 177 | Sparse and Complete Latent Organization for Geospatial Semantic Segmentation 178 | The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions 179 | Oriented RepPoints for Aerial Object Detection 180 | Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction 181 | PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images 182 | Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites 183 | MISF: Multi-Level Interactive Siamese Filtering for 
High-Fidelity Image Inpainting 184 | Iterative Deep Homography Estimation 185 | GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors 186 | Deep Color Consistent Network for Low-Light Image Enhancement 187 | LAR-SR: A Local Autoregressive Model for Image Super-Resolution 188 | Multi-Scale Memory-Based Video Deblurring 189 | Local Texture Estimator for Implicit Representation Function 190 | Chitransformer: Towards Reliable Stereo From Cues 191 | BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras 192 | ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior 193 | IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation 194 | Learning Graph Regularisation for Guided Super-Resolution 195 | Self-Supervised Deep Image Restoration via Adaptive Stochastic Gradient Langevin Dynamics 196 | Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation 197 | Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching 198 | Unpaired Deep Image Deraining Using Dual Contrastive Learning 199 | Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots 200 | Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition 201 | VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution 202 | Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space 203 | Exploring and Evaluating Image Restoration Potential in Dynamic Scenes 204 | GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation 205 | Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method 206 | IDR: Self-Supervised Image Denoising via Iterative Data Refinement 207 | ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo 208 | Texture-Based Error Analysis 
for Image Super-Resolution 209 | Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel 210 | KNN Local Attention for Image Restoration 211 | Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection 212 | Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection 213 | Self-Supervised Keypoint Discovery in Behavioral Videos 214 | Learning To Align Sequential Actions in the Wild 215 | Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination 216 | End-to-End Human-Gaze-Target Detection With Transformers 217 | Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis 218 | MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction 219 | Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction 220 | End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps 221 | Learning Affordance Grounding From Exocentric Images 222 | 3D Scene Painting via Semantic Image Synthesis 223 | Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography 224 | ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection 225 | Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches 226 | Image Disentanglement Autoencoder for Steganography Without Embedding 227 | Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection 228 | Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning 229 | Density-Preserving Deep Point Cloud Compression 230 | Graph-Context Attention Networks for Size-Varied Deep Graph Matching 231 | TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions 232 | ObjectFormer for Image Manipulation Detection and Localization 233 | Sequential Voting With Relational Box Fields for Active Object Detection 234 | Efficient 
Classification of Very Large Images With Tiny Objects 235 | Partially Does It: Towards Scene-Level FG-SBIR With Partial Input 236 | Long-Term Visual Map Sparsification With Heterogeneous GNN 237 | Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association 238 | DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation 239 | Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring 240 | Rethinking Image Cropping: Exploring Diverse Compositions From Global Views 241 | Defensive Patches for Robust Recognition in the Physical World 242 | Semi-Supervised Video Paragraph Grounding With Contrastive Encoder 243 | Large-Scale Pre-Training for Person Re-Identification With Noisy Labels 244 | Meta Distribution Alignment for Generalizable Person Re-Identification 245 | FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction 246 | It’s About Time: Analog Clock Reading in the Wild 247 | Consistency Driven Sequential Transformers Attention Model for Partially Observable Scenes 248 | SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles 249 | Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers 250 | Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving 251 | CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation 252 | Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers 253 | Rethinking Semantic Segmentation: A Prototype View 254 | Semantic-Aware Domain Generalized Segmentation 255 | Adaptive Early-Learning Correction for Segmentation From Noisy Annotations 256 | Pointly-Supervised Instance Segmentation 257 | Joint Forecasting of Panoptic Segmentations With Difference Attention 258 | FocusCut: Diving Into a Focus View in Interactive Segmentation 259 | Human Instance Matting via Mutual Guidance and Multi-Instance 
Refinement 260 | Deformable Sprites for Unsupervised Video Decomposition 261 | Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation 262 | Robust and Accurate Superquadric Recovery: A Probabilistic Approach 263 | Medial Spectral Coordinates for 3D Shape Analysis 264 | Scribble-Supervised LiDAR Semantic Segmentation 265 | SoftGroup for 3D Instance Segmentation on Point Clouds 266 | Accurate 3D Body Shape Regression Using Metric and Semantic Attributes 267 | JIFF: Jointly-Aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction 268 | Tracking People by Predicting 3D Appearance, Location and Pose 269 | ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis 270 | Interacting Attention Graph for Single Image Two-Hand Reconstruction 271 | 3D Human Tongue Reconstruction From Single “In-the-Wild” Images 272 | EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation 273 | Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection 274 | OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion 275 | Gated2Gated: Self-Supervised Depth Estimation From Gated Images 276 | IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes 277 | Egocentric Scene Understanding via Multimodal Spatial Rectifier 278 | Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry 279 | The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement 280 | BANMo: Building Animatable 3D Neural Models From Many Casual Videos 281 | Self-Supervised Video Transformer 282 | Temporally Efficient Vision Transformer for Video Instance Segmentation 283 | VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation 284 | Temporal Alignment Networks for Long-Term Video 285 | Revisiting the “Video” in 
Video-Language Understanding 286 | Invariant Grounding for Video Question Answering 287 | P3IV: Probabilistic Procedure Planning From Instructional Videos With Weak Supervision 288 | FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment 289 | Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition 290 | Revisiting Skeleton-Based Action Recognition 291 | OpenTAL: Towards Open Set Temporal Action Localization 292 | Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition 293 | TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition 294 | Revealing Occlusions With 4D Neural Fields 295 | HODOR: High-Level Object Descriptors for Object Re-Segmentation in Video Learned From Static Images 296 | Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning 297 | UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection 298 | Future Transformer for Long-Term Action Anticipation 299 | MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing 300 | Learning Pixel-Level Distinctions for Video Highlight Detection 301 | DR.VIC: Decomposition and Reasoning for Video Individual Counting 302 | Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation 303 | Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline 304 | Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training 305 | Coarse-To-Fine Feature Mining for Video Semantic Segmentation 306 | Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation 307 | Object-Region Video Transformers 308 | Colar: Effective and Efficient Online Action Detection by Consulting Exemplars 309 | SimVP: Simpler Yet Better Video Prediction 310 | Imposing Consistency for Optical Flow Estimation 311 | Stand-Alone Inter-Frame Attention in Video Models 312 | 
Video Swin Transformer 313 | Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection 314 | Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes 315 | Likert Scoring With Grade Decoupling for Long-Term Action Assessment 316 | Complex Video Action Reasoning via Learnable Markov Logic Network 317 | Learning From Temporal Gradient for Semi-Supervised Action Recognition 318 | Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction 319 | Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation 320 | Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos 321 | Human Hands As Probes for Interactive Object Understanding 322 | LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition 323 | Object-Aware Video-Language Pre-Training for Retrieval 324 | Fast and Unsupervised Action Boundary Detection for Action Segmentation 325 | Multiview Transformers for Video Recognition 326 | Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos 327 | Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection 328 | Comparing Correspondences: Video Prediction With Correspondence-Wise Losses 329 | Sound-Guided Semantic Image Manipulation 330 | Expressive Talking Head Generation With Granular Audio-Visual Control 331 | Depth-Aware Generative Adversarial Network for Talking Head Video Generation 332 | Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans From a Single Camera 333 | Audio-Driven Neural Gesture Reenactment With Video Motion Graphs 334 | Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data 335 | Weakly Supervised High-Fidelity Clothing Model Generation 336 | TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates 337 | Full-Range Virtual Try-On With Recurrent 
Tri-Level Transform 338 | Style-Based Global Appearance Flow for Virtual Try-On 339 | Dressing in the Wild by Watching Dance Videos 340 | A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres 341 | Unpaired Cartoon Image Synthesis via Gated Cycle Mapping 342 | DLFormer: Discrete Latent Transformer for Video Inpainting 343 | ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation 344 | Video Frame Interpolation With Transformer 345 | Long-Term Video Frame Interpolation via Feature Propagation 346 | Many-to-Many Splatting for Efficient Video Frame Interpolation 347 | Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image 348 | Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning 349 | Playable Environments: Video Manipulation in Space and Time 350 | Event-Based Video Reconstruction via Potential-Assisted Spiking Neural Network 351 | Modular Action Concept Grounding in Semantic Video Prediction 352 | Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning 353 | StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2 354 | Structure-Aware Motion Transfer With Deformable Anchor Model 355 | Image Animation With Perturbed Masks 356 | Thin-Plate Spline Motion Model for Image Animation 357 | Controllable Animation of Fluid Elements in Still Images 358 | Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects 359 | Geometric Structure Preserving Warp for Natural Image Stitching 360 | Few-Shot Incremental Learning for Label-to-Image Translation 361 | Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network 362 | SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks 363 | SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage 364 | PILC: 
Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework 365 | Kubric: A Scalable Dataset Generator 366 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation 367 | Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction 368 | DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation 369 | MonoGround: Detecting Monocular 3D Objects From the Ground 370 | 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow 371 | Toward Practical Monocular Indoor Depth Estimation 372 | Focal Length and Object Pose Estimation via Render and Compare 373 | CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields 374 | Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images 375 | Layered Depth Refinement With Mask Guidance 376 | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction 377 | BARC: Learning To Regress 3D Dog Shape From Images by Exploiting Breed Information 378 | Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving 379 | What’s in Your Hands? 
3D Reconstruction of Generic Objects in Hands 380 | 3D Moments From Near-Duplicate Photos 381 | Neural Window Fully-Connected CRFs for Monocular Depth Estimation 382 | PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors 383 | CroMo: Cross-Modal Learning for Monocular Depth Estimation 384 | φ-SfT: Shape-From-Template With a Physics-Based Deformation Model 385 | Human-Aware Object Placement for Visual Environment Reconstruction 386 | AutoRF: Learning 3D Object Radiance Fields From Single View Observations 387 | Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation 388 | MonoScene: Monocular 3D Semantic Scene Completion 389 | GenDR: A Generalized Differentiable Renderer 390 | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer 391 | ROCA: Robust CAD Model Retrieval and Alignment From a Single Image 392 | HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network 393 | Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC 394 | Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning 395 | Enhancing Face Recognition With Self-Supervised 3D Reconstruction 396 | Learning To Learn Across Diverse Data Biases in Deep Face Recognition 397 | An Efficient Training Approach for Very Large Scale Face Recognition 398 | MogFace: Towards a Deeper Appreciation on Face Detection 399 | Exploring Frequency Adversarial Attacks for Face Forgery Detection 400 | End-to-End Reconstruction-Classification Learning for Face Forgery Detection 401 | Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing 402 | Privacy-Preserving Online AutoML for Domain-Specific Face Detection 403 | Simulated Adversarial Testing of Face Recognition Models 404 | Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing 405 | Towards Semi-Supervised Deep Facial
Expression Recognition With an Adaptive Confidence Margin 406 | Towards Accurate Facial Landmark Detection via Cascaded Transformers 407 | PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer 408 | GazeOnce: Real-Time Multi-Person Gaze Estimation 409 | Generalizing Gaze Estimation With Rotation Consistency 410 | Face Relighting With Geometrically Consistent Shadows 411 | HairMapper: Removing Hair From Portraits Using GANs 412 | Learning To Restore 3D Face From In-the-Wild Degraded Images 413 | Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels 414 | Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation 415 | ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation 416 | Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement 417 | Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation 418 | Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation 419 | Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation 420 | Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast 421 | Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds 422 | Novel Class Discovery in Semantic Segmentation 423 | Pin the Memory: Learning To Generalize Semantic Segmentation 424 | ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation 425 | Incremental Learning in Semantic Segmentation From Image Labels 426 | Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers 427 | SharpContour: A Contour-Based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation 428 | Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings 429 | Mask Transfiner for 
High-Quality Instance Segmentation 430 | Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity 431 | Sparse Instance Activation for Real-Time Instance Segmentation 432 | E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation 433 | Hyperbolic Image Segmentation 434 | SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information 435 | CDGNet: Class Distribution Guided Network for Human Parsing 436 | CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation 437 | Sparse Non-Local CRF 438 | Detecting Camouflaged Object in Frequency Domain 439 | Progressive Minimal Path Method With Embedded CNN 440 | Open-Set Text Recognition via Character-Context Decoupling 441 | Neural Collaborative Graph Machines for Table Structure Recognition 442 | Revisiting Document Image Dewarping by Grid Regularization 443 | Syntax-Aware Network for Handwritten Mathematical Expression Recognition 444 | Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection 445 | Fourier Document Restoration for Robust Document Dewarping and Recognition 446 | XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding 447 | SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition 448 | Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer 449 | TableFormer: Table Structure Understanding With Transformers 450 | Knowledge Mining With Scene Text for Fine-Grained Recognition 451 | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents 452 | Focal and Global Knowledge Distillation for Detectors 453 | Speed Up Object Detection on Gigapixel-Level Images With Patch Arrangement 454 | Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer 455 | Learning With Neighbor Consistency for Noisy Labels 456 | Meta 
Convolutional Neural Networks for Single Domain Generalization 457 | Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification 458 | Geometry-Aware Guided Loss for Deep Crack Recognition 459 | Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way 460 | Dynamic Sparse R-CNN 461 | Deep Hybrid Models for Out-of-Distribution Detection 462 | AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification 463 | Feature Erasing and Diffusion Network for Occluded Person Re-Identification 464 | Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss 465 | BoxeR: Box-Attention for 2D and 3D Transformers 466 | Multi-Label Iterated Learning for Image Classification With Label Ambiguity 467 | Vision Transformer With Deformable Attention 468 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection 469 | Dense Learning Based Semi-Supervised Object Detection 470 | R(Det)2: Randomized Decision Routing for Object Detection 471 | GlideNet: Global, Local and Intrinsic Based Dense Embedding NETwork for Multi-Category Attributes Prediction 472 | Self-Supervised Equivariant Learning for Oriented Keypoint Detection 473 | Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification 474 | Object Localization Under Single Coarse Point Supervision 475 | Rethinking Visual Geo-Localization for Large-Scale Applications 476 | Whose Hands Are These? 
Hand Detection and Hand-Body Association in the Wild 477 | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification 478 | Towards Unsupervised Domain Generalization 479 | ViM: Out-of-Distribution With Virtual-Logit Matching 480 | Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space 481 | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation 482 | Align and Prompt: Video-and-Language Pre-Training With Entity Prompts 483 | Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation 484 | Language As Queries for Referring Video Object Segmentation 485 | End-to-End Referring Video Object Segmentation With Multimodal Transformers 486 | Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation 487 | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval 488 | Video-Text Representation Learning via Differentiable Weak Temporal Alignment 489 | MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions 490 | Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions 491 | Measuring Compositional Consistency for Video Question Answering 492 | SimVQA: Exploring Simulated Environments for Visual Question Answering 493 | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering 494 | SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering 495 | MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering 496 | Maintaining Reasoning Consistency in Compositional Visual Question Answering 497 | MLSLT: Towards Multilingual Sign Language Translation 498 | A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation 499 | C2SLR: Consistency-Enhanced Continuous Sign 
Language Recognition 500 | Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production 501 | Generating Diverse and Natural 3D Human Motions From Text 502 | Sub-Word Level Lip Reading With Visual Attention 503 | Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale 504 | ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval 505 | Cross Modal Retrieval With Querybank Normalisation 506 | Prompt Distribution Learning 507 | VALHALLA: Visual Hallucination for Machine Translation 508 | VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks 509 | Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality 510 | MixFormer: Mixing Features Across Windows and Dimensions 511 | Recurrent Glimpse-Based Decoder for Detection With Transformer 512 | Mobile-Former: Bridging MobileNet and Transformer 513 | Unsupervised Domain Generalization by Learning a Bridge Across Domains 514 | SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection 515 | Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection 516 | PNP: Robust Learning From Noisy Labels by Probabilistic Noise Prediction 517 | Few-Shot Object Detection With Fully Cross-Transformer 518 | Task Discrepancy Maximization for Fine-Grained Few-Shot Classification 519 | Leveraging Self-Supervision for Cross-Domain Crowd Counting 520 | What To Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions 521 | AdaMixer: A Fast-Converging Query-Based Object Detector 522 | Correlation Verification for Image Retrieval 523 | Real-Time Object Detection for Streaming Perception 524 | Deep Visual Geo-Localization Benchmark 525 | RendNet: Unified 2D/3D Recognizer With Latent Space Rendering 526 | Sparse Fuse Dense: Towards High Quality 3D Detection With Depth Completion 527 | Focal Sparse Convolutional Networks 
for 3D Object Detection 528 | Point-NeRF: Point-Based Neural Radiance Fields 529 | NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction 530 | Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction 531 | Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields 532 | RegNeRF: Regularizing Neural Radiance Fields for View Synthesis From Sparse Inputs 533 | Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields 534 | Plenoxels: Radiance Fields Without Neural Networks 535 | Neural 3D Scene Reconstruction With the Manhattan-World Assumption 536 | Neural 3D Video Synthesis From Multi-View Video 537 | Learning To Solve Hard Minimal Problems 538 | Learning a Structured Latent Space for Unsupervised Point Cloud Completion 539 | Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes 540 | IRON: Inverse Rendering by Optimizing Neural SDFs and Materials From Photometric Images 541 | Learning Multi-View Aggregation in the Wild for Large-Scale 3D Semantic Segmentation 542 | HyperDet3D: Learning a Scene-Conditioned 3D Object Detector 543 | KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos 544 | SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video 545 | Ditto: Building Digital Twins of Articulated Objects From Interaction 546 | Bijective Mapping Network for Shadow Removal 547 | Toward Fast, Flexible, and Robust Low-Light Image Enhancement 548 | Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements 549 | Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution 550 | Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution 551 | SphereSR: 360° Image Super-Resolution With Arbitrary Projection via Continuous Spherical Image Representation 552 | Learning Trajectory-Aware Transformer for Video Super-Resolution 553 | 
Discrete Cosine Transform Network for Guided Depth Map Super-Resolution 554 | Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations 555 | ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding 556 | Restormer: Efficient Transformer for High-Resolution Image Restoration 557 | Deep Rectangling for Image Stitching: A Learning Baseline 558 | Parametric Scattering Networks 559 | Burst Image Restoration and Enhancement 560 | MAXIM: Multi-Axis MLP for Image Processing 561 | Event-Aided Direct Sparse Odometry 562 | CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation 563 | Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection 564 | Image Dehazing Transformer With Transmission-Aware 3D Position Embedding 565 | Unsupervised Deraining: Where Contrastive Learning Meets Self-Similarity 566 | Towards Multi-Domain Single Image Dehazing via Test-Time Training 567 | Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal 568 | Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment 569 | Practical Learned Lossless JPEG Recompression With Multi-Level Cross-Channel Entropy Model in the DCT Domain 570 | Neural Compression-Based Feature Learning for Video Restoration 571 | Bi-Directional Object-Context Prioritization Learning for Saliency Ranking 572 | Pixel Screening Based Intermediate Correction for Blind Deblurring 573 | URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement 574 | A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution 575 | Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction 576 | Task Decoupled Framework for Reference-Based Super-Resolution 577 | Learning Semantic Associations for Mirror 
Detection 578 | SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches 579 | Investigating Tradeoffs in Real-World Video Super-Resolution 580 | BasicVSR++: Improving Video Super-Resolution With Enhanced Propagation and Alignment 581 | Inertia-Guided Flow Completion and Style Fusion for Video Inpainting 582 | Joint Global and Local Hierarchical Priors for Learned Image Compression 583 | Reflash Dropout in Image Super-Resolution 584 | Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond 585 | Dreaming To Prune Image Deraining Networks 586 | LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network 587 | Exposure Normalization and Compensation for Multiple-Exposure Correction 588 | Revisiting Temporal Alignment for Video Restoration 589 | Learning the Degradation Distribution for Blind Image Super-Resolution 590 | LSVC: A Learning-Based Stereo Video Compression Framework 591 | Learning Based Multi-Modality Image and Video Compression 592 | Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World 593 | Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish 594 | Stereo Depth From Events Cameras: Concentrate and Focus on the Future 595 | Volumetric Bundle Adjustment for Online Photorealistic Scene Capture 596 | Neural Volumetric Object Selection 597 | HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture 598 | NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions 599 | BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion 600 | Input-Level Inductive Biases for 3D Reconstruction 601 | Multi-View Mesh Reconstruction With Neural Deferred Shading 602 | StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions 603 | RGB-Depth Fusion GAN for Indoor Depth Completion 604 | PlanarRecon: Real-Time 3D Plane Detection and 
Reconstruction From Posed Monocular Videos 605 | Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations 606 | ShapeFormer: Transformer-Based Shape Completion via Sparse Representation 607 | GuideFormer: Transformers for Image Guided Depth Completion 608 | Improving Neural Implicit Surfaces Geometry With Patch Warping 609 | Critical Regularizations for Neural Surface Reconstruction in the Wild 610 | Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction 611 | Neural RGB-D Surface Reconstruction 612 | POCO: Point Convolution for Surface Reconstruction 613 | Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors 614 | Surface Reconstruction From Point Clouds by Learning Predictive Context Priors 615 | IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment 616 | Deterministic Point Cloud Registration via Novel Transformation Decomposition 617 | Global-Aware Registration of Less-Overlap RGB-D Scans 618 | Finding Good Configurations of Planar Primitives in Unorganized Point Clouds 619 | Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels 620 | AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception 621 | WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation 622 | Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild 623 | Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture 624 | MotionAug: Augmentation With Physical Correction for Human Motion Prediction 625 | Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction 626 | Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction 627 | Motron: Multimodal Probabilistic Human Motion Forecasting 628 | Human Trajectory Prediction With Momentary Observation 629 
| Non-Probability Sampling Network for Stochastic Human Trajectory Prediction 630 | Remember Intentions: Retrospective-Memory-Based Trajectory Prediction 631 | GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction With Relational Reasoning 632 | Learning Pixel Trajectories With Multiscale Contrastive Random Walks 633 | Adaptive Trajectory Prediction via Transferable GNN 634 | Neural Prior for Trajectory Estimation 635 | M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction 636 | How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting 637 | ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework 638 | Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction 639 | Convolutions for Spatial Interaction Modeling 640 | Style-ERD: Responsive and Coherent Online Motion Style Transfer 641 | Neural Inertial Localization 642 | RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry 643 | CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism 644 | ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses 645 | Projective Manifold Gradient Layer for Deep Rotation Regression 646 | Multimodal Colored Point Cloud to Image Alignment 647 | Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering 648 | REGTR: End-to-End Point Cloud Correspondences With Transformers 649 | Text2Pos: Text-to-Point-Cloud Cross-Modal Localization 650 | BCOT: A Markerless High-Precision 3D Object Tracking Benchmark 651 | SAR-Net: Shape Alignment and Recovery Network for Category-Level 6D Object Pose and Size Estimation 652 | ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework 653 | Coupled Iterative Refinement for 6D Multi-Object Pose Estimation 654 | ZebraPose: Coarse 
To Fine Surface Encoding for 6DoF Object Pose Estimation 655 | SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings 656 | MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision 657 | Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions 658 | GPV-Pose: Category-Level Object Pose Estimation via Geometry-Guided Point-Wise Voting 659 | HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR 660 | OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation 661 | FS6D: Few-Shot 6D Pose Estimation of Novel Objects 662 | OnePose: One-Shot Object Pose Estimation Without CAD Models 663 | OSOP: A Multi-Stage One Shot Object Pose Estimation Framework 664 | DiffPoseNet: Direct Differentiable Camera Pose Estimation 665 | Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects 666 | CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild 667 | Leveraging Equivariant Features for Absolute Pose Regression 668 | The Majority Can Help the Minority: Context-Rich Minority Oversampling for Long-Tailed Classification 669 | Long-Tailed Recognition via Weight Balancing 670 | Balanced Contrastive Learning for Long-Tailed Visual Recognition 671 | Targeted Supervised Contrastive Learning for Long-Tailed Recognition 672 | Long-Tailed Visual Recognition via Gaussian Clouded Logit Adjustment 673 | Long-Tail Recognition via Compositional Knowledge Transfer 674 | Nested Collaborative Learning for Long-Tailed Visual Recognition 675 | Retrieval Augmented Classification for Long-Tail Visual Recognition 676 | Trustworthy Long-Tailed Classification 677 | C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection 678 | Equalized Focal Loss for Dense Long-Tailed Object Detection 679 | Relieving Long-Tailed Instance 
Segmentation via Pairwise Class Balance 680 | iFS-RCNN: An Incremental Few-Shot Instance Segmenter 681 | Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling 682 | SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation 683 | Undoing the Damage of Label Shift for Cross-Domain Semantic Segmentation 684 | Representation Compensation Networks for Continual Semantic Segmentation 685 | Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer 686 | Domain-Agnostic Prior for Transfer Semantic Segmentation 687 | Image Segmentation Using Text and Image Prompts 688 | PCL: Proxy-Based Contrastive Learning for Domain Generalization 689 | Localized Adversarial Domain Generalization 690 | Compound Domain Generalization via Meta-Knowledge Encoding 691 | Style Neophile: Constantly Seeking Novel Styles for Domain Generalization 692 | Slimmable Domain Adaptation 693 | Exploring Domain-Invariant Parameters for Source Free Domain Adaptation 694 | Cross-Domain Few-Shot Learning With Task-Specific Adapters 695 | Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition 696 | Reusing the Task-Specific Classifier as a Discriminator: Discriminator-Free Adversarial Domain Adaptation 697 | Safe Self-Refinement for Transformer-Based Domain Adaptation 698 | Continual Test-Time Domain Adaptation 699 | Source-Free Domain Adaptation via Distribution Estimation 700 | Domain Adaptation on Point Clouds via Geometry-Aware Implicits 701 | Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds 702 | Hyperspherical Consistency Regularization 703 | BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning 704 | Cascade Transformers for End-to-End Person Search 705 | Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts 706 | MPViT: Multi-Path Vision Transformer for Dense Prediction 707 | NFormer: Robust 
Person Re-Identification With Neighbor Transformer 708 | Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification 709 | Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification 710 | Augmented Geometric Distillation for Data-Free Incremental Person ReID 711 | Salient-to-Broad Transition for Video Person Re-Identification 712 | FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification 713 | Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification 714 | Implicit Sample Extension for Unsupervised Person Re-Identification 715 | Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection 716 | Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection 717 | Fine-Grained Object Classification via Self-Supervised Pose Alignment 718 | Hyperbolic Vision Transformers: Combining Improvements in Metric Learning 719 | Non-Isotropy Regularization for Proxy-Based Deep Metric Learning 720 | Self-Taught Metric Learning Without Labels 721 | Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency 722 | Energy-Based Latent Aligner for Incremental Learning 723 | Sketch3T: Test-Time Training for Zero-Shot SBIR 724 | The Devil Is in the Pose: Ambiguity-Free 3D Rotation-Invariant Learning via Pose-Aware Convolution 725 | Finding Badly Drawn Bunnies 726 | Generalized Category Discovery 727 | Recall@k Surrogate Loss With Large Batches and Similarity Mixup 728 | Modeling 3D Layout for Group Re-Identification 729 | Causal Transportability for Visual Recognition 730 | Attributable Visual Similarity Learning 731 | Bi-Level Alignment for Cross-Domain Crowd Counting 732 | Mutual Quantization for Cross-Modal Search With Noisy Labels 733 | Task Adaptive Parameter Sharing for Multi-Task Learning 734 | Simple Multi-Dataset Detection 735 | Cross-Domain Adaptive Teacher for Object Detection 736 | 
Balanced and Hierarchical Relation Learning for One-Shot Object Detection 737 | Semantic-Aligned Fusion Transformer for One-Shot Object Detection 738 | MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning 739 | Robust Region Feature Synthesizer for Zero-Shot Object Detection 740 | Region-Aware Face Swapping 741 | High-Resolution Face Swapping via Latent Semantics Disentanglement 742 | Rethinking Deep Face Restoration 743 | Blind Face Restoration via Integrating Face Shape and Generative Priors 744 | FENeRF: Face Editing in Neural Radiance Fields 745 | TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing 746 | Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer 747 | Self-Supervised Correlation Mining Network for Person Image Generation 748 | Exploring Dual-Task Correlation for Pose Guided Person Image Generation 749 | InsetGAN for Full-Body Image Generation 750 | BodyGAN: General-Purpose Controllable Neural Human Body Generation 751 | HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs 752 | Structure-Aware Flow Generation for Human Body Reshaping 753 | Modeling Image Composition for Complex Scene Generation 754 | Local Attention Pyramid for Scene Image Generation 755 | Interactive Image Synthesis With Panoptic Layout Generation 756 | iPLAN: Interactive and Procedural Layout Planning 757 | E-CIR: Event-Enhanced Continuous Intensity Recovery 758 | Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion 759 | Neural Rays for Occlusion-Aware Image-Based Rendering 760 | Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation 761 | PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models 762 | Commonality in Natural Images Rescues GANs: Pretraining GANs With Generic and Privacy-Free Synthetic Data 763 | Think Twice Before Detecting GAN-Generated Fake Images From Their 
Spectral Domain Imprints 764 | Robust Invertible Image Steganography 765 | Distinguishing Unseen From Seen for Generalized Zero-Shot Learning 766 | Few-Shot Font Generation by Learning Fine-Grained Local Styles 767 | XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation 768 | Learning To Generate Line Drawings That Convey Geometry and Semantics 769 | Balanced MSE for Imbalanced Visual Regression 770 | Transferability Metrics for Selecting Source Model Ensembles 771 | OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization 772 | Robust Fine-Tuning of Zero-Shot Models 773 | Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification 774 | Learning To Learn and Remember Super Long Multi-Domain Task Sequence 775 | Learning Distinctive Margin Toward Active Domain Adaptation 776 | DINE: Domain Adaptation From Single and Multiple Black-Box Predictors 777 | Source-Free Object Detection by Learning To Overlook Domain Style 778 | Towards Principled Disentanglement for Domain Generalization 779 | Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization 780 | Causality Inspired Representation Learning for Domain Generalization 781 | Learning What Not To Segment: A New Perspective on Few-Shot Segmentation 782 | Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation 783 | ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation 784 | MeMOT: Multi-Object Tracking With Memory 785 | Unsupervised Learning of Accurate Siamese Tracking 786 | Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds 787 | GMFlow: Learning Optical Flow via Global Matching 788 | GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking 789 | SNUG: Self-Supervised Neural Dynamic Garments 790 | 
Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction 791 | Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation 792 | Context-Aware Sequence Alignment Using 4D Skeletal Augmentation 793 | Enabling Equivariance for Arbitrary Lie Groups 794 | RAMA: A Rapid Multicut Algorithm on GPU 795 | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks 796 | RCP: Recurrent Closest Point for Point Cloud 797 | Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis 798 | Balanced Multimodal Learning via On-the-Fly Gradient Modulation 799 | Block-NeRF: Scalable Large Scene Neural View Synthesis 800 | SceneSqueezer: Learning To Compress Scene for Camera Relocalization 801 | Light Field Neural Rendering 802 | Extracting Triangular 3D Models, Materials, and Lighting From Images 803 | Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3) 804 | Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models 805 | It’s All in the Teacher: Zero-Shot Quantization Brought Closer to the Teacher 806 | NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks 807 | Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention 808 | Parameter-Free Online Test-Time Adaptation 809 | Patch-Level Representation Learning for Self-Supervised Vision Transformers 810 | Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization 811 | Mixed Differential Privacy in Computer Vision 812 | DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis 813 | Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning 814 | AirObject: A Temporally Evolving Graph Embedding for Object Identification 815 | Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds 816 | SS3D: 
Sparsely-Supervised 3D Object Detection From Point Cloud 817 | Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement 818 | VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention 819 | Embracing Single Stride 3D Object Detector With Sparse Transformer 820 | Point Density-Aware Voxels for LiDAR 3D Object Detection 821 | Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation 822 | Contrastive Boundary Learning for Point Cloud Segmentation 823 | Stratified Transformer for 3D Point Cloud Segmentation 824 | No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models by Fitting Feature-Level Space-Time Surfaces 825 | Point2Seq: Detecting 3D Objects As Sequences 826 | PTTR: Relational 3D Point Cloud Object Tracking With Transformer 827 | A Unified Query-Based Paradigm for Point Cloud Understanding 828 | PointCLIP: Point Cloud Understanding by CLIP 829 | X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning 830 | MVS2D: Efficient Multi-View Stereo via Attention-Driven 2D Convolutions 831 | TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers 832 | RayMVSNet: Learning Ray-Based 1D Implicit Fields for Accurate Multi-View Stereo 833 | IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo 834 | PSMNet: Position-Aware Stereo Merging Network for Room Layout Estimation 835 | Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo 836 | Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering 837 | Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation 838 | Efficient Multi-View Stereo by Iterative Dynamic Cost Volume 839 | PlaneMVS: 3D Plane Reconstruction From Multi-View Stereo 840 | Discrete Time Convolution for Fast Event-Based Stereo 841 | Stereo Magnification With Multi-Layer Images 842 | TransforMatcher: Match-to-Match Attention for 
Semantic Correspondence 843 | Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences 844 | Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning 845 | Transforming Model Prediction for Tracking 846 | Ranking-Based Siamese Visual Tracking 847 | Correlation-Aware Deep Tracking 848 | Global Tracking via Ensemble of Local Trackers 849 | Global Tracking Transformers 850 | Unified Transformer Tracker for Object Tracking 851 | Transformer Tracking With Cyclic Shifting Window Attention 852 | Spiking Transformers for Event-Based Single Object Tracking 853 | Adiabatic Quantum Computing for Multi Object Tracking 854 | HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction 855 | Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking 856 | TrackFormer: Multi-Object Tracking With Transformers 857 | Learning of Global Objective for Network Flow in Multi-Object Tracking 858 | LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking 859 | Multi-Object Tracking Meets Moving UAV 860 | Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline 861 | Unsupervised Domain Adaptation for Nighttime Aerial Tracking 862 | Learning Optical Flow With Kernel Patch Attention 863 | Towards Understanding Adversarial Robustness of Optical Flow Networks 864 | DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow 865 | On the Instability of Relative Pose Estimation and RANSAC’s Role 866 | Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training 867 | Global Sensing and Measurements Reuse for Image Compressed Sensing 868 | Maximum Consensus by Weighted Influences of Monotone Boolean Functions 869 | MS2DG-Net: Progressive Correspondence Learning via Multiple Sparse Semantics Dynamic Graph 870 | Styleformer: Transformer Based Generative Adversarial Networks With Style Vector 871 | Scanline
Homographies for Rolling-Shutter Plane Absolute Pose 872 | Generating Representative Samples for Few-Shot Classification 873 | Matching Feature Sets for Few-Shot Image Classification 874 | Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations 875 | Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection 876 | Forward Compatible Few-Shot Class-Incremental Learning 877 | Constrained Few-Shot Class-Incremental Learning 878 | Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference 879 | EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning 880 | Few-Shot Learning With Noisy Labels 881 | Ranking Distance Calibration for Cross-Domain Few-Shot Learning 882 | Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning 883 | Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning 884 | Learning To Memorize Feature Hallucination for One-Shot Image Generation 885 | A Closer Look at Few-Shot Image Generation 886 | Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition 887 | Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability 888 | Transferability Estimation Using Bhattacharyya Class Separability 889 | Revisiting the Transferability of Supervised Pretraining: An MLP Perspective 890 | Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data 891 | Which Model To Transfer? Finding the Needle in the Growing Haystack 892 | Does Robustness on ImageNet Transfer to Downstream Tasks? 893 | What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors 894 | OW-DETR: Open-World Detection Transformer 895 | Unseen Classes at a Later Time? 
No Problem 896 | Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism 897 | On Generalizing Beyond Domains in Cross-Domain Continual Learning 898 | Online Continual Learning on a Contaminated Data Stream With Blurry Task Boundaries 899 | DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion 900 | Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning 901 | En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning 902 | VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning 903 | Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning 904 | KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning 905 | Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis 906 | WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery 907 | Omni-DETR: Omni-Supervised Object Detection With Transformers 908 | DESTR: Object Detection With Split Transformer 909 | A Dual Weighting Label Assignment Scheme for Object Detection 910 | Entropy-Based Active Learning for Object Detection With Progressive Diversity Constraint 911 | Localization Distillation for Dense Object Detection 912 | Group R-CNN for Weakly Semi-Supervised Object Detection With Points 913 | Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation 914 | CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping 915 | One Loss for Quantization: Deep Hashing With Discrete Wasserstein Distributional Matching 916 | PSTR: End-to-End One-Step Person Search With Transformers 917 | Protecting Celebrities From DeepFake With Identity Consistency Transformer 918 | MDAN: Multi-Level Dependent Attention Network for Visual Emotion Analysis 919 | Contextual Similarity Distillation for Asymmetric Image Retrieval 920 | 
Improving Visual Grounding With Visual-Linguistic Verification and Iterative Reasoning 921 | MPC: Multi-View Probabilistic Clustering 922 | Text Spotting Transformers 923 | Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting 924 | Reflection and Rotation Symmetry Detection via Equivariant Learning 925 | Learning To Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data 926 | A Simple Episodic Linear Probe Improves Visual Recognition in the Wild 927 | Cross Domain Object Detection by Target-Perceived Dual Branch Distillation 928 | Multi-Granularity Alignment Domain Adaptation for Object Detection 929 | Expanding Low-Density Latent Regions for Open-Set Object Detection 930 | Class-Incremental Learning With Strong Pre-Trained Models 931 | ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues 932 | Self-Supervised Models Are Continual Learners 933 | The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization 934 | Beyond Supervised vs. 
Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning 935 | SimMIM: A Simple Framework for Masked Image Modeling 936 | Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning 937 | UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning 938 | Contrastive Conditional Neural Processes 939 | One-Bit Active Query With Contrastive Pairs 940 | HCSC: Hierarchical Contrastive Selective Coding 941 | Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging 942 | Hierarchical Self-Supervised Representation Learning for Movie Understanding 943 | Anomaly Detection via Reverse Distillation From One-Class Embedding 944 | Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning 945 | DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning 946 | Learning To Collaborate in Decentralized Learning of Personalized Models 947 | Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph 948 | DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning 949 | Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning 950 | Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes 951 | Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors 952 | Spectral Unsupervised Domain Adaptation for Visual Recognition 953 | DATA: Domain-Aware and Task-Aware Self-Supervised Learning 954 | Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning 955 | DeepDPM: Deep Clustering With an Unknown Number of Clusters 956 | PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions 957 | Robust Outlier Detection by De-Biasing VAE Likelihoods 958 | Image-to-Lidar Self-Supervised Distillation for Autonomous Driving 
Data 959 | CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding 960 | Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation 961 | DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation 962 | WildNet: Learning Domain Generalized Semantic Segmentation From the Wild 963 | UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation 964 | Semi-Supervised Semantic Segmentation With Error Localization Network 965 | Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation 966 | Integrative Few-Shot Learning for Classification and Segmentation 967 | GanOrCon: Are Generative Models Useful for Few-Shot Segmentation? 968 | SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis 969 | CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs 970 | GradViT: Gradient Inversion of Vision Transformers 971 | Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings 972 | CD2-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning 973 | APRIL: Finding the Achilles’ Heel on Privacy for Vision Transformers 974 | Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning 975 | Robust Federated Learning With Noisy and Heterogeneous Clients 976 | Federated Learning With Position-Aware Neurons 977 | Layer-Wised Model Aggregation for Personalized Federated Learning 978 | FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning 979 | FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction 980 | Differentially Private Federated Learning With Local Regularization and Sparsification 981 | Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage 
982 | Learn From Others and Be Yourself in Heterogeneous Federated Learning 983 | RSCFed: Random Sampling Consensus Federated Semi-Supervised Learning 984 | Federated Class-Incremental Learning 985 | Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning 986 | FedCorr: Multi-Stage Federated Learning for Label Noise Correction 987 | ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning 988 | Cycle-Consistent Counterfactuals by Latent Transformations 989 | Consistent Explanations by Contrastive Learning 990 | Towards Better Understanding Attribution Methods 991 | Proto2Proto: Can You Recognize the Car, the Way I Do? 992 | Do Explanations Explain? Model Knows Best 993 | HINT: Hierarchical Neuron Concept Explainer 994 | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes 995 | What Do Navigation Agents Learn About Their Environment? 996 | A Framework for Learning Ante-Hoc Explainable Models via Concepts 997 | Exploiting Explainable Metrics for Augmented SGD 998 | FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks 999 | Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations 1000 | B-Cos Networks: Alignment Is All We Need for Interpretability 1001 | The Flag Median and FlagIRLS 1002 | Learning Fair Classifiers With Partially Annotated Group Labels 1003 | Estimating Structural Disparities for Face Models 1004 | Estimating Example Difficulty Using Variance of Gradients 1005 | Fairness-Aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models 1006 | Fair Contrastive Learning for Facial Attribute Classification 1007 | Leveraging Adversarial Examples To Quantify Membership Information Leakage 1008 | Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers 1009 | Deep Unlearning via Randomized Conditionally Independent Hessians 
1010 | Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets 1011 | A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models 1012 | Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices? 1013 | Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation 1014 | SEEG: Semantic Energized Co-Speech Gesture Generation 1015 | Mix and Localize: Localizing Sound Sources in Mixtures 1016 | Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation 1017 | IntentVizor: Towards Generic Query Guided Interactive Video Summarization 1018 | M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers 1019 | Finding Fallen Objects via Asynchronous Audio-Visual Integration 1020 | Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory 1021 | Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization 1022 | Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language 1023 | It’s Time for Artistic Correspondence in Music and Video 1024 | Self-Supervised Object Detection From Audio-Visual Correspondence 1025 | More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech 1026 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer 1027 | A Probabilistic Graphical Model Based on Neural-Symbolic Reasoning for Visual Relationship Detection 1028 | Diffusion Autoencoders: Toward a Meaningful and Decodable Representation 1029 | Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps 1030 | Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values 1031 | Ensembling Off-the-Shelf Models for GAN Training 1032 | Marginal Contrastive Correspondence for Guided Image Generation 1033 | GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation 1034 | High-Resolution Image 
Synthesis With Latent Diffusion Models 1035 | Vector Quantized Diffusion Model for Text-to-Image Synthesis 1036 | ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation 1037 | Dataset Distillation by Matching Training Trajectories 1038 | Continual Predictive Learning From Videos 1039 | Motion-Adjustable Neural Implicit Video Representation 1040 | Splicing ViT Features for Semantic Appearance Transfer 1041 | MAT: Mask-Aware Transformer for Large Hole Image Inpainting 1042 | Day-to-Night Image Synthesis for Training Nighttime Neural ISPs 1043 | Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness 1044 | Few-Shot Head Swapping in the Wild 1045 | ClothFormer: Taming Video Virtual Try-On in All Module 1046 | A-ViT: Adaptive Tokens for Efficient Vision Transformer 1047 | MetaFormer Is Actually What You Need for Vision 1048 | Reversible Vision Transformers 1049 | Learned Queries for Efficient Local Attention 1050 | Shunted Self-Attention via Multi-Scale Token Aggregation 1051 | Automatic Relation-Aware Graph Network Proliferation 1052 | β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search 1053 | Distribution Consistent Neural Architecture Search 1054 | Training-Free Transformer Architecture Search 1055 | TeachAugment: Data Augmentation Optimization Using Teacher Knowledge 1056 | Knowledge Distillation via the Target-Aware Transformer 1057 | Knowledge Distillation: A Good Teacher Is Patient and Consistent 1058 | An Image Patch Is a Wave: Phase-Aware Vision MLP 1059 | Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information 1060 | Controllable Dynamic Multi-Task Architectures 1061 | Grounded Language-Image Pre-Training 1062 | ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds 1063 | CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings 1064 | Adversarial Parametric Pose Prior 1065 | Temporal Feature 
Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation 1066 | PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision 1067 | Generalizable Human Pose Triangulation 1068 | GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras 1069 | Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory 1070 | Contextual Instance Decoupling for Robust Multi-Person Pose Estimation 1071 | End-to-End Multi-Person Pose Estimation With Transformers 1072 | Meta Agent Teaming Active Learning for Pose Estimation 1073 | Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation 1074 | Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer 1075 | Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture 1076 | LASER: LAtent SpacE Rendering for 2D Visual Localization 1077 | Learning To Detect Scene Landmarks for Camera Localization 1078 | Geometric Transformer for Fast and Robust Point Cloud Registration 1079 | ARCS: Accurate Rotation and Correspondence Search 1080 | FisherMatch: Semi-Supervised Rotation Regression via Entropy-Based Filtering 1081 | Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation 1082 | OSSGAN: Open-Set Semi-Supervised Image Generation 1083 | Attribute Group Editing for Reliable Few-Shot Image Generation 1084 | Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment 1085 | Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis 1086 | Retrieval-Based Spatially Adaptive Normalization for Semantic Image Synthesis 1087 | Generative Flows With Invertible Attentions 1088 | Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization 1089 | SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and 
Editing 1090 | Manifold Learning Benefits GANs 1091 | DO-GAN: A Double Oracle Framework for Generative Adversarial Networks 1092 | Improving GAN Equilibrium by Raising Spatial Awareness 1093 | Feature Statistics Mixing Regularization for Generative Adversarial Networks 1094 | StyleSwin: Transformer-Based GAN for High-Resolution Image Generation 1095 | MaskGIT: Masked Generative Image Transformer 1096 | StyTr2: Image Style Transfer With Transformers 1097 | Style Transformer for Image Inversion and Editing 1098 | Reduce Information Loss in Transformers for Pluralistic Image Inpainting 1099 | Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding 1100 | UniCoRN: A Unified Conditional Image Repainting Network 1101 | High-Fidelity GAN Inversion for Image Attribute Editing 1102 | HyperInverter: Improving StyleGAN Inversion via Hypernetwork 1103 | Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing 1104 | On Aliased Resizing and Surprising Subtleties in GAN Evaluation 1105 | Dual-Path Image Inpainting With Auxiliary GAN Inversion 1106 | InOut: Diverse Image Outpainting via GAN Inversion 1107 | Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation 1108 | Contextual Outpainting With Object-Level Contrastive Learning 1109 | RePaint: Inpainting Using Denoising Diffusion Probabilistic Models 1110 | Perception Prioritized Training of Diffusion Models 1111 | Dynamic Dual-Output Diffusion Models 1112 | Generating High Fidelity Data From Low-Density Regions Using Diffusion Models 1113 | Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation 1114 | Bridging Global Context Interactions for High-Fidelity Image Completion 1115 | Autoregressive Image Generation Using Residual Quantization 1116 | Arbitrary-Scale Image Synthesis 1117 | Cluster-Guided Image Synthesis With Unconditional Models 1118 | Dynamic Prototype Convolution Network for Few-Shot Semantic 
Segmentation 1119 | Generalized Few-Shot Semantic Segmentation 1120 | Learning Non-Target Knowledge for Few-Shot Semantic Segmentation 1121 | Decoupling Zero-Shot Semantic Segmentation 1122 | Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation 1123 | ContrastMask: Contrastive Learning To Segment Every Thing 1124 | The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference 1125 | AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation 1126 | APES: Articulated Part Extraction From Sprite Sheets 1127 | GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation 1128 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision 1129 | Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images 1130 | C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image 1131 | CRIS: CLIP-Driven Referring Image Segmentation 1132 | MatteFormer: Transformer-Based Image Matting via Prior-Tokens 1133 | Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation 1134 | Pyramid Grafting Network for One-Stage High Resolution Saliency Detection 1135 | Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection 1136 | Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation 1137 | GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings 1138 | Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport 1139 | CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly 1140 | RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures 1141 | Discovering Objects That Can Move 1142 | PatchFormer: An Efficient Point Transformer With Patch Attention 1143 | Panoptic-PHNet: Towards Real-Time 
and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap 1144 | SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation 1145 | An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation 1146 | Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation 1147 | Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders 1148 | Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? 1149 | BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule 1150 | Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search 1151 | Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search 1152 | GreedyNASv2: Greedier Search With a Greedy Path Filter 1153 | Neural Architecture Search With Representation Mutual Information 1154 | Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search 1155 | Knowledge Distillation With the Reused Teacher Classifier 1156 | Self-Distillation From the Last Mini-Batch for Consistency Regularization 1157 | Decoupled Knowledge Distillation 1158 | Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs 1159 | A ConvNet for the 2020s 1160 | Beyond Fixation: Dynamic Window Visual Transformer 1161 | Lite Vision Transformer With Enhanced Self-Attention 1162 | Swin Transformer V2: Scaling Up Capacity and Resolution 1163 | The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy 1164 | MulT: An End-to-End Multitask Learning Transformer 1165 | Towards Robust Vision Transformer 1166 | DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers 1167 | MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens 1168 | NomMer: Nominate 
Synergistic Context in Vision Transformer for Visual Recognition 1169 | TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation 1170 | Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation 1171 | Scaling Vision Transformers 1172 | Bridged Transformer for Vision and Point Cloud 3D Object Detection 1173 | CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows 1174 | TransMix: Attend To Mix for Vision Transformers 1175 | MiniViT: Compressing Vision Transformers With Weight Multiplexing 1176 | Fine-Tuning Image Transformers Using Learnable Memory 1177 | Patch Slimming for Efficient Vision Transformers 1178 | CMT: Convolutional Neural Networks Meet Vision Transformers 1179 | Multimodal Token Fusion for Vision Transformers 1180 | CAFE: Learning To Condense Dataset by Aligning Features 1181 | Lite-MDETR: A Lightweight Multi-Modal Detector 1182 | DeeCap: Dynamic Early Exiting for Efficient Image Captioning 1183 | Searching the Deployable Convolution Neural Networks for GPUs 1184 | Active Learning by Feature Mixing 1185 | When To Prune? A Policy Towards Early Structural Pruning 1186 | Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning 1187 | How Well Do Sparse ImageNet Models Transfer? 
1188 | Rep-Net: Efficient On-Device Learning via Feature Reprogramming 1189 | CHEX: CHannel EXploration for CNN Model Compression 1190 | HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks 1191 | AdaViT: Adaptive Vision Transformers for Efficient Image Recognition 1192 | Cross-Image Relational Knowledge Distillation for Semantic Segmentation 1193 | Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error 1194 | IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization 1195 | DECORE: Deep Compression With Reinforcement Learning 1196 | Towards Efficient and Scalable Sharpness-Aware Minimization 1197 | AEGNN: Asynchronous Event-Based Graph Neural Networks 1198 | DiSparse: Disentangled Sparsification for Multitask Model Compression 1199 | Multi-Modal Extreme Classification 1200 | A Sampling-Based Approach for Efficient Clustering in Large Datasets 1201 | Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction 1202 | Learnable Lookup Table for Neural Network Quantization 1203 | Instance-Aware Dynamic Neural Network Quantization 1204 | Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation 1205 | Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction 1206 | Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation 1207 | PokeBNN: A Binary Pursuit of Lightweight Accuracy 1208 | Automated Progressive Learning for Efficient Training of Vision Transformers 1209 | DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos 1210 | Channel Balancing for Accurate Quantization of Winograd Convolutions 1211 | ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching 1212 | Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs 
1213 | AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation 1214 | TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing 1215 | SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems 1216 | TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed 1217 | DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation 1218 | Universal Photometric Stereo Network Using Global Lighting Contexts 1219 | Uncertainty-Aware Deep Multi-View Photometric Stereo 1220 | Fast Light-Weight Near-Field Photometric Stereo 1221 | Glass Segmentation Using Intensity and Spectral Polarization Cues 1222 | Shape From Polarization for Complex Scenes in the Wild 1223 | Deep Depth From Focus With Differential Focus Volume 1224 | Optimal LED Spectral Multiplexing for NIR2RGB Translation 1225 | Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements 1226 | NAN: Noise-Aware NeRFs for Burst-Denoising 1227 | Estimating Fine-Grained Noise Model via Contrastive Learning 1228 | Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders 1229 | MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution 1230 | PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images 1231 | Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors 1232 | Learning To Anticipate Future With Dynamic Context Removal 1233 | Self-Supervised Spatial Reasoning on Multi-View Line Drawings 1234 | Contextual Debiasing for Visual Recognition With Causal Mechanisms 1235 | Relative Pose From a Calibrated and an Uncalibrated Smartphone Image 1236 | Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation 1237 | NICE-SLAM: Neural Implicit Scalable Encoding for SLAM 1238 | NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning 1239 | ScaleNet: A 
Shallow Architecture for Scale Estimation 1240 | Camera Pose Estimation Using Implicit Distortion Models 1241 | GIFS: Neural Implicit Function for General Shape Representation 1242 | Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds 1243 | SPAMs: Structured Implicit Parametric Models 1244 | Deblur-NeRF: Neural Radiance Fields From Blurry Images 1245 | Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation 1246 | Depth-Supervised NeRF: Fewer Views and Faster Training for Free 1247 | Dense Depth Priors for Neural Radiance Fields From Sparse Input Views 1248 | EfficientNeRF - Efficient Neural Radiance Fields 1249 | InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering 1250 | Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs 1251 | Urban Radiance Fields 1252 | Hallucinated Neural Radiance Fields in the Wild 1253 | Towards Multimodal Depth Estimation From Light Fields 1254 | Degradation-Agnostic Correspondence From Resolution-Asymmetric Stereo 1255 | Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching 1256 | Attention Concatenation Volume for Accurate and Efficient Stereo Matching 1257 | Generalized Binary Search Network for Highly-Efficient Multi-View Stereo 1258 | Revisiting Domain Generalized Stereo Matching Networks From a Feature Consistency Perspective 1259 | GraftNet: Towards Domain Generalized Stereo Matching With a Broad-Spectrum and Task-Oriented Feature 1260 | ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks 1261 | ActiveZero: Mixed Domain Learning for Active Stereovision With Zero Annotation 1262 | FoggyStereo: Stereo Matching With Fog Volume Representation 1263 | Multi-Person Extreme Motion Prediction 1264 | Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation 1265 | AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation
by Learnable Motion Generation 1266 | Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation 1267 | Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation 1268 | Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video 1269 | Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization 1270 | Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation 1271 | Location-Free Human Pose Estimation 1272 | MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation 1273 | Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision 1274 | Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors 1275 | PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound 1276 | Differentiable Dynamics for Articulated 3D Human Motion Reconstruction 1277 | COAP: Compositional Articulated Occupancy of People 1278 | Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video 1279 | SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration 1280 | MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video 1281 | Putting People in Their Place: Monocular Regression of 3D People in Depth 1282 | FLAG: Flow-Based 3D Avatar Generation From Sparse Observations 1283 | GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping 1284 | Capturing and Inferring Dense Full-Body Human-Scene Contact 1285 | BodyMap: Learning Full-Body Dense Correspondence Map 1286 | ICON: Implicit Clothed Humans Obtained From Normals 1287 | Adversarial Texture for Fooling Person Detectors in the Physical World 1288 | Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World 1289 | Enhancing Classifier Conservativeness and Robustness by Polynomiality 1290 | Backdoor Attacks on Self-Supervised Learning 1291 | 
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks 1292 | Few-Shot Backdoor Defense Using Shapley Estimation 1293 | Better Trigger Inversion Optimization in Backdoor Scanning 1294 | Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees 1295 | Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching 1296 | LAS-AT: Adversarial Training With Learnable Attack Strategy 1297 | Subspace Adversarial Training 1298 | Pyramid Adversarial Training Improves ViT Performance 1299 | Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations 1300 | Robust Image Forgery Detection Over Online Social Network Shared Images 1301 | Quantifying Societal Bias Amplification in Image Captioning 1302 | Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models 1303 | GAN-Supervised Dense Visual Alignment 1304 | Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator 1305 | Text2Mesh: Text-Driven Neural Stylization for Meshes 1306 | StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation 1307 | Physical Simulation Layer for Accurate 3D Modeling 1308 | Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time 1309 | Neural Texture Extraction and Distribution for Controllable Person Image Synthesis 1310 | I M Avatar: Implicit Morphable Head Avatars From Videos 1311 | E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations 1312 | RCL: Recurrent Continuous Localization for Temporal Action Detection 1313 | Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection 1314 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition 1315 | TubeR: Tubelet Transformer for Video Action Detection 1316 | MixFormer: End-to-End Tracking With Iterative Mixed 
Attention 1317 | DN-DETR: Accelerate DETR Training by Introducing Query DeNoising 1318 | Proper Reuse of Image Classification Features Improves Object Detection 1319 | Boosting 3D Object Detection by Simulating Multimodality on Point Clouds 1320 | TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation 1321 | Disentangling Visual Embeddings for Attributes and Objects 1322 | QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection 1323 | Unknown-Aware Object Detection: Learning What You Don’t Know From Videos in the Wild 1324 | Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks 1325 | Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective 1326 | Calibrating Deep Neural Networks by Pairwise Constraints 1327 | Lifelong Graph Learning 1328 | OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks 1329 | Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation 1330 | Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches 1331 | Cross-View Transformers for Real-Time Map-View Semantic Segmentation 1332 | UnweaveNet: Unweaving Activity Stories 1333 | Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos 1334 | Audio-Adaptive Activity Recognition Across Video Domains 1335 | Frame-Wise Action Representations for Long Videos via Sequence Contrastive Learning 1336 | Image Based Reconstruction of Liquids From 2D Surface Detections 1337 | Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency 1338 | How Do You Do It? 
Fine-Grained Action Understanding With Pseudo-Adverbs 1339 | Programmatic Concept Learning for Human Motion Description and Synthesis 1340 | Learning To Recognize Procedural Activities With Distant Supervision 1341 | Implicit Motion Handling for Video Camouflaged Object Detection 1342 | Dynamic Scene Graph Generation via Anticipatory Pre-Training 1343 | Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization 1344 | OCSampler: Compressing Videos to One Clip With Single-Step Sampling 1345 | A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting 1346 | TubeFormer-DeepLab: Video Mask Transformer 1347 | ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization 1348 | A Graph Matching Perspective With Transformers on Video Instance Segmentation 1349 | STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction 1350 | Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos 1351 | End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection 1352 | Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision 1353 | Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement 1354 | A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. 
Dynamic Information 1355 | Long-Short Temporal Contrastive Learning of Video Transformers 1356 | Scene Consistency Representation Learning for Video Scene Segmentation 1357 | Unsupervised Pre-Training for Temporal Action Localization Tasks 1358 | Contrastive Learning for Unsupervised Video Highlight Detection 1359 | Deformable Video Transformer 1360 | Recurring the Transformer for Video Action Recognition 1361 | Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation 1362 | Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model 1363 | Sign Language Video Retrieval With Free-Form Textual Queries 1364 | FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback 1365 | Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation 1366 | ESCNet: Gaze Target Detection With the Understanding of 3D Scenes 1367 | Interactive Multi-Class Tiny-Object Detection 1368 | Weakly Supervised Rotation-Invariant Aerial Object Detection Network 1369 | Large Loss Matters in Weakly Supervised Multi-Label Classification 1370 | MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning 1371 | FreeSOLO: Learning To Segment Objects Without Annotations 1372 | Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection 1373 | SIOD: Single Instance Annotated per Category per Image for Object Detection 1374 | Towards Robust Adaptive Object Detection Under Noisy Annotations 1375 | Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection 1376 | Salvage of Supervision in Weakly Supervised Object Detection 1377 | Label, Verify, Correct: A Simple Few Shot Object Detection Method 1378 | Background Activation Suppression for Weakly Supervised Object Localization 1379 | Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization 1380 | Divide and Conquer: Compositional Experts for Generalized Novel Class 
Discovery 1381 | Cloth-Changing Person Re-Identification From a Single Image With Gait Prediction and Regularization 1382 | Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation 1383 | Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification 1384 | Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification 1385 | Towards Total Recall in Industrial Anomaly Detection 1386 | H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection 1387 | Geometric and Textural Augmentation for Domain Gap Reduction 1388 | General Incremental Learning With Domain-Aware Categorical Representations 1389 | DST: Dynamic Substitute Training for Data-Free Black-Box Attack 1390 | ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation 1391 | Label Matching Semi-Supervised Object Detection 1392 | Multidimensional Belief Quantification for Label-Efficient Meta-Learning 1393 | Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples 1394 | Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification 1395 | Class-Aware Contrastive Semi-Supervised Learning 1396 | Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework 1397 | Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo 1398 | Learning Where To Learn in Cross-View Self-Supervised Learning 1399 | Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective 1400 | SimMatch: Semi-Supervised Learning With Similarity Matching 1401 | Active Teacher for Semi-Supervised Object Detection 1402 | Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection 1403 | Self-Supervised Learning of Object Parts for Semantic Segmentation 1404 | MUM: 
Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection 1405 | Scale-Equivalent Distillation for Semi-Supervised Object Detection 1406 | A Self-Supervised Descriptor for Image Copy Detection 1407 | Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut 1408 | CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification 1409 | Semi-Supervised Few-Shot Learning via Multi-Factor Clustering 1410 | CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning 1411 | Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data 1412 | A Simple Data Mixing Prior for Improving Self-Supervised Learning 1413 | DETReg: Unsupervised Pretraining With Region Priors for Object Detection 1414 | Sound and Visual Representation Learning With Multiple Pretraining Tasks 1415 | UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training 1416 | Weakly Supervised Object Localization As Domain Adaption 1417 | Debiased Learning From Naturally Imbalanced Pseudo-Labels 1418 | Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning 1419 | Masked Feature Prediction for Self-Supervised Visual Pre-Training 1420 | Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency 1421 | Id-Free Person Similarity Learning 1422 | End-to-End Semi-Supervised Learning for Video Action Detection 1423 | Probabilistic Representations for Video Contrastive Learning 1424 | Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition 1425 | BEVT: BERT Pretraining of Video Transformers 1426 | Generative Cooperative Learning for Unsupervised Video Anomaly Detection 1427 | When Does Contrastive Visual Representation Learning Work? 1428 | The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization 1429 | What Matters for Meta-Learning Vision Regression Tasks? 
1430 | IFOR: Iterative Flow Minimization for Robotic Object Rearrangement 1431 | TCTrack: Temporal Contexts for Aerial Tracking 1432 | AKB-48: A Real-World Articulated Object Knowledge Base 1433 | 3DAC: Learning Attribute Compression for Point Clouds 1434 | Simple but Effective: CLIP Embeddings for Embodied AI 1435 | Multi-Robot Active Mapping via Neural Bipartite Graph Matching 1436 | Continuous Scene Representations for Embodied AI 1437 | Interactron: Embodied Adaptive Object Detection 1438 | Online Learning of Reusable Abstract Models for Object Goal Navigation 1439 | RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization 1440 | UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation 1441 | Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation 1442 | Upright-Net: Learning Upright Orientation for 3D Point Cloud 1443 | DeepFake Disrupter: The Detector of DeepFake Is My Friend 1444 | HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization 1445 | Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources 1446 | Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection 1447 | Transferable Sparse Adversarial Attack 1448 | Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection 1449 | Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability 1450 | Improving Adversarial Transferability via Neuron Attribution-Based Attacks 1451 | Complex Backdoor Detection by Symmetric Feature Differencing 1452 | Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer 1453 | Zero-Query Transfer Attacks on Context-Aware Object Detectors 1454 | 360-Attack: Distortion-Aware Perturbations From Perspective-Views 1455 | Label-Only Model 
Inversion Attacks via Boundary Repulsion 1456 | Merry Go Round: Rotate a Frame and Fool a DNN 1457 | Cross-Modal Transferable Adversarial Attacks From Images to Videos 1458 | BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning 1459 | Investigating Top-k White-Box and Transferable Black-Box Attack 1460 | Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution 1461 | Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack 1462 | Towards Efficient Data Free Black-Box Adversarial Attack 1463 | Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network 1464 | Certified Patch Robustness via Smoothed Vision Transformers 1465 | Towards Practical Certifiable Patch Defense With Vision Transformer 1466 | On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles 1467 | 3DeformRS: Certifying Spatial Deformations on Point Clouds 1468 | Stereoscopic Universal Perturbations Across Different Architectures and Datasets 1469 | Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations 1470 | Bounded Adversarial Attack on Deep Content Features 1471 | DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints 1472 | Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart 1473 | Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness 1474 | Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input 1475 | Adversarial Eigen Attack on Black-Box Models 1476 | Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond 1477 | Enhancing Adversarial Training With Second-Order Statistics of Weights 1478 | Towards Data-Free Model Stealing in a Hard Label Setting 1479 | Robust Structured Declarative 
Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients 1480 | DTA: Physical Camouflage Attacks Using Differentiable Transformation Network 1481 | Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity 1482 | Enhancing Adversarial Robustness for Deep Metric Learning 1483 | Shape-Invariant 3D Adversarial Point Clouds 1484 | Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon 1485 | Exploring Effective Data for Surrogate Training Towards Black-Box Attack 1486 | NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models 1487 | Dual-Key Multimodal Backdoors for Visual Question Answering 1488 | Proactive Image Manipulation Detection 1489 | ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts 1490 | EnvEdit: Environment Editing for Vision-and-Language Navigation 1491 | HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation 1492 | Less Is More: Generating Grounded Navigation Instructions From Landmarks 1493 | Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation 1494 | Reinforced Structured State-Evolution for Vision-Language Navigation 1495 | Cross-Modal Map Learning for Vision and Language Navigation 1496 | Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation 1497 | One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones 1498 | Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection for Image-Text Multimodal Classification 1499 | Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding 1500 | Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding 1501 | Multi-View Transformer for 3D Visual Grounding 1502 | Multi-Modal Dynamic Graph Transformer for Visual Grounding 1503 | 
Weakly-Supervised Generation and Grounding of Visual Descriptions With Conditional Generative Models 1504 | Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning 1505 | Visual Abductive Reasoning 1506 | Query and Attention Augmentation for Knowledge-Based Explainable Reasoning 1507 | REX: Reasoning-Aware and Grounded Explanation 1508 | Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation 1509 | Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships 1510 | Scene Graph Expansion for Semantics-Guided Image Outpainting 1511 | VisualHow: Multimodal Problem Solving 1512 | FLAVA: A Foundational Language and Vision Alignment Model 1513 | Multi-Modal Alignment Using Representation Codebook 1514 | Negative-Aware Attention Framework for Image-Text Matching 1515 | Vision-Language Pre-Training With Triple Contrastive Learning 1516 | Vision-Language Pre-Training for Boosting Scene Text Detectors 1517 | COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval 1518 | NeurMiPs: Neural Mixture of Planar Experts for View Synthesis 1519 | FWD: Real-Time Novel View Synthesis With Forward Warping and Depth 1520 | SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images 1521 | Fast, Accurate and Memory-Efficient Partial Permutation Synchronization 1522 | Learning To Find Good Models in RANSAC 1523 | Optimizing Elimination Templates by Greedy Parameter Search 1524 | GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision 1525 | HARA: A Hierarchical Approach for Robust Rotation Averaging 1526 | RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging 1527 | A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors 1528 | ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance 1529 | Self-Supervised Neural 
Articulated Shape and Appearance Models 1530 | Virtual Elastic Objects 1531 | Decoupling Makes Weakly Supervised Local Feature Better 1532 | JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints 1533 | ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging 1534 | DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering 1535 | Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis 1536 | Structured Local Radiance Fields for Human Avatar Modeling 1537 | High-Fidelity Human Avatars From a Single RGB Camera 1538 | Forecasting Characteristic 3D Poses of Human Actions 1539 | Virtual Correspondence: Humans as a Cue for Extreme-View Geometry 1540 | BEHAVE: Dataset and Method for Tracking Human Object Interactions 1541 | Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives 1542 | RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation 1543 | NPBG++: Accelerating Neural Point-Based Graphics 1544 | Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows 1545 | Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos 1546 | Masked Autoencoders Are Scalable Vision Learners 1547 | Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision 1548 | Bayesian Invariant Risk Minimization 1549 | Crafting Better Contrastive Views for Siamese Representation Learning 1550 | Rethinking Minimal Sufficient Representation in Contrastive Learning 1551 | Multi-Level Feature Learning for Contrastive Multi-View Clustering 1552 | Point-Level Region Contrast for Object Detection Pre-Training 1553 | Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation 1554 | A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration 1555 | SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos 
1556 | Omnivore: A Single Model for Many Visual Modalities 1557 | DPICT: Deep Progressive Image Compression Using Trit-Planes 1558 | Efficient Geometry-Aware 3D Generative Adversarial Networks 1559 | Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation 1560 | Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning 1561 | Versatile Multi-Modal Pre-Training for Human-Centric Perception 1562 | Bridging Video-Text Retrieval With Multiple Choice Questions 1563 | Integrating Language Guidance Into Vision-Based Deep Metric Learning 1564 | NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images 1565 | DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering 1566 | HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video 1567 | Neural Reflectance for Shape Recovery With Shadow Handling 1568 | Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video 1569 | Dancing Under the Stars: Video Denoising in Starlight 1570 | BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation 1571 | Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation 1572 | 3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image 1573 | BokehMe: When Neural Rendering Meets Classical Rendering 1574 | Deblurring via Stochastic Refinement 1575 | Learning to Deblur Using Light Field Generated and Real Defocus Images 1576 | Towards Layer-Wise Image Vectorization 1577 | Dual-Shutter Optical Vibration Sensing 1578 | Fisher Information Guidance for Learned Time-of-Flight Imaging 1579 | Autofocus for Event Cameras 1580 | Adaptive Gating for Single-Photon 3D Imaging 1581 | LiDAR Snowfall Simulation for Robust 3D Object Detection 1582 | MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound 1583 | Joint Video Summarization and Moment 
Localization by Cross-Task Sample Transfer 1584 | Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture 1585 | Disentangling Visual and Written Concepts in CLIP 1586 | CLIP-Event: Connecting Text and Images With Event Structures 1587 | Robust Cross-Modal Representation Learning With Progressive Self-Distillation 1588 | TubeDETR: Spatio-Temporal Video Grounding With Transformers 1589 | 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection 1590 | 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds 1591 | Globetrotter: Connecting Languages by Connecting Images 1592 | Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment 1593 | WebQA: Multihop and Multimodal QA 1594 | PartGlot: Learning Shape Part Segmentation From Language Reference Games 1595 | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis 1596 | L-Verse: Bidirectional Generation Between Image and Text 1597 | Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation 1598 | LaTr: Layout-Aware Transformer for Scene-Text VQA 1599 | Learning Program Representations for Food Images and Cooking Recipes 1600 | On the Importance of Asymmetry for Siamese Representation Learning 1601 | Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy 1602 | Exploring Set Similarity for Dense Self-Supervised Representation Learning 1603 | Align Representations With Base: A New Approach to Self-Supervised Learning 1604 | Identifying Ambiguous Similarity Conditions via Semantic Matching 1605 | Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization 1606 | Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation 1607 | Unsupervised Visual Representation Learning by Online Constrained K-Means 1608 | Rethinking the Augmentation Module 
in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views 1609 | Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework 1610 | Robust Contrastive Learning Against Noisy Views 1611 | On Learning Contrastive Representations for Learning With Noisy Labels 1612 | Directional Self-Supervised Learning for Heavy Image Augmentations 1613 | Continual Learning for Visual Search With Backward Consistent Feature Embedding 1614 | Probing Representation Forgetting in Supervised and Unsupervised Continual Learning 1615 | Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning 1616 | Bring Evanescent Representations to Life in Lifelong Class Incremental Learning 1617 | Unsupervised Learning of Debiased Representations With Pseudo-Attributes 1618 | A Conservative Approach for Unbiased Learning on Unknown Biases 1619 | Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization 1620 | Co-Advise: Cross Inductive Bias Distillation 1621 | PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures 1622 | RegionCLIP: Region-Based Language-Image Pretraining 1623 | Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks 1624 | Conditional Prompt Learning for Vision-Language Models 1625 | Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation? 
1626 | Partial Class Activation Attention for Semantic Segmentation 1627 | Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers 1628 | Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation 1629 | Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation 1630 | Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation 1631 | L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation 1632 | Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data 1633 | Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation 1634 | Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation 1635 | MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation 1636 | NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night 1637 | Fast Point Transformer 1638 | RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior 1639 | ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes 1640 | DisARM: Displacement Aware Relation Module for 3D Detection 1641 | Learning Object Context for Novel-View Scene Layout Generation 1642 | Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts 1643 | Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image 1644 | Raw High-Definition Radar for Multi-Task Learning 1645 | Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation 1646 | UKPGAN: A General Self-Supervised Keypoint Detector 1647 | Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos 1648 | Rethinking Efficient Lane Detection via Curve Modeling 1649 | Exploiting Temporal Relations on Radar Perception for Autonomous Driving 1650 | Towards Robust and 
Adaptive Motion Forecasting: A Causal Representation Perspective 1651 | BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement 1652 | ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning 1653 | Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion 1654 | Vehicle Trajectory Prediction Works, but Not Everywhere 1655 | LTP: Lane-Based Trajectory Prediction for Autonomous Driving 1656 | ONCE-3DLanes: Building Monocular 3D Lane Detection 1657 | Towards Driving-Oriented Metric for Lane Detection Models 1658 | Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes 1659 | LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection 1660 | DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection 1661 | A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation 1662 | Forecasting From LiDAR via Future Object Detection 1663 | RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding 1664 | Learning From All Vehicles 1665 | Is Mapping Necessary for Realistic PointGoal Navigation? 
1666 | Symmetry-Aware Neural Architecture for Embodied Visual Exploration 1667 | Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles 1668 | Topology Preserving Local Road Network Estimation From Single Onboard Camera Image 1669 | Coupling Vision and Proprioception for Navigation of Legged Robots 1670 | Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation 1671 | 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection 1672 | Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior 1673 | SelfD: Self-Learning Large-Scale Driving Policies From the Web 1674 | Towards Real-World Navigation With Deep Differentiable Planners 1675 | Privacy Preserving Partial Localization 1676 | Efficient Large-Scale Localization by Global Instance Recognition 1677 | CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data 1678 | Bilateral Video Magnification Filter 1679 | Neural Data-Dependent Transform for Learned Image Compression 1680 | Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence 1681 | Deep Generalized Unfolding Networks for Image Restoration 1682 | Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling 1683 | XYDeblur: Divide and Conquer for Single Image Deblurring 1684 | Abandoning the Bayer-Filter To See in the Dark 1685 | RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution 1686 | All-in-One Image Restoration for Unknown Corruption 1687 | Modeling sRGB Camera Noise With Normalizing Flows 1688 | A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift 1689 | Video Frame Interpolation Transformer 1690 | The Devil Is in the Details: Window-Based Attention for Image Compression 1691 | Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction 1692 | RestoreFormer: High-Quality 
Blind Face Restoration From Undegraded Key-Value Pairs 1693 | AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement 1694 | HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging 1695 | HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging 1696 | Learning To Zoom Inside Camera Imaging Pipeline 1697 | Towards an End-to-End Framework for Flow-Guided Video Inpainting 1698 | Context-Aware Video Reconstruction for Rolling Shutter Cameras 1699 | CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image 1700 | Global Matching With Overlapping Attention for Optical Flow Estimation 1701 | CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow 1702 | Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression 1703 | Video Demoiréing With Relation-Based Temporal Consistency 1704 | Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images 1705 | Deep Constrained Least Squares for Blind Image Super-Resolution 1706 | Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model 1707 | Unsupervised Homography Estimation With Coplanarity-Aware GAN 1708 | Attentive Fine-Grained Structured Sparsity for Image Restoration 1709 | Uformer: A General U-Shaped Transformer for Image Restoration 1710 | Bringing Old Films Back to Life 1711 | Learning sRGB-to-Raw-RGB De-Rendering With Content-Aware Metadata 1712 | SNR-Aware Low-Light Image Enhancement 1713 | AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network 1714 | Synthetic Aperture Imaging With Events and Frames 1715 | Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition 1716 | Time Lens++: Event-Based Frame Interpolation With Parametric Non-Linear Flow and Multi-Scale Fusion 1717 | Unifying Motion 
Deblurring and Frame Interpolation With Events 1718 | EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction 1719 | Learning Adaptive Warping for Real-World Rolling Shutter Correction 1720 | Neural Global Shutter: Learn To Restore Video From a Rolling Shutter Camera With Global Reset Feature 1721 | TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation 1722 | Optimizing Video Prediction via Video Frame Interpolation 1723 | Reference-Based Video Super-Resolution Using Multi-Camera Video Triplets 1724 | Memory-Augmented Non-Local Attention for Video Super-Resolution 1725 | Optical Flow Estimation for Spiking Camera 1726 | Compressive Single-Photon 3D Cameras 1727 | Single-Photon Structured Light 1728 | All-Photon Polarimetric Time-of-Flight Imaging 1729 | Holocurtains: Programming Light Curtains via Binary Holography 1730 | Towards Implicit Text-Guided 3D Shape Generation 1731 | Towards Language-Free Training for Text-to-Image Generation 1732 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic 1733 | EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching 1734 | Hierarchical Modular Network for Video Captioning 1735 | SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning 1736 | End-to-End Generative Pretraining for Multimodal Video Captioning 1737 | Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning 1738 | Scaling Up Vision-Language Pre-Training for Image Captioning 1739 | Comprehending and Ordering Semantics for Image Captioning 1740 | NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge 1741 | Injecting Semantic Concepts Into End-to-End Image Captioning 1742 | DIFNet: Boosting Visual Information Flow for Image Captioning 1743 | VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning 1744 | Show, Deconfound and Tell: Image Captioning With 
Causal Inference 1745 | EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval 1746 | CLIPstyler: Image Style Transfer With a Single Text Condition 1747 | HairCLIP: Design Your Hair by Text and Reference Image 1748 | DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting 1749 | On Guiding Visual Attention With Language Specification 1750 | UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog 1751 | Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer 1752 | LiT: Zero-Shot Transfer With Locked-Image Text Tuning 1753 | GroupViT: Semantic Segmentation Emerges From Text Supervision 1754 | ReSTR: Convolution-Free Referring Image Segmentation Using Transformers 1755 | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation 1756 | An Empirical Study of Training End-to-End Vision-and-Language Transformers 1757 | Are Multimodal Transformers Robust to Missing Modality? 1758 | Text to Image Generation With Semantic-Spatial Aware GAN 1759 | StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis 1760 | Blended Diffusion for Text-Driven Editing of Natural Images 1761 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions 1762 | Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model 1763 | A Style-Aware Discriminator for Controllable Image Translation 1764 | Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint 1765 | Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks 1766 | FlexIT: Towards Flexible Semantic Image Translation 1767 | Modulated Contrast for Versatile Image Synthesis 1768 | QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation 1769 | Self-Supervised Dense Consistency Regularization for 
Image-to-Image Translation 1770 | Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation 1771 | InstaFormer: Instance-Aware Image-to-Image Translation With Transformer 1772 | Unsupervised Image-to-Image Translation With Generative Prior 1773 | StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF via 2D-3D Mutual Learning 1774 | NeRF-Editing: Geometry Editing of Neural Radiance Fields 1775 | GeoNeRF: Generalizing NeRF With Geometry Priors 1776 | Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation 1777 | AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields 1778 | HDR-NeRF: High Dynamic Range Neural Radiance Fields 1779 | NeRFReN: Neural Radiance Fields With Reflections 1780 | Neural Point Light Fields 1781 | 3D-Aware Image Synthesis via Learning Structural and Textural Representations 1782 | GIRAFFE HD: A High-Resolution 3D-Aware Generative Model 1783 | Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis 1784 | Bi-Level Doubly Variational Learning for Energy-Based Latent Variable Models 1785 | High-Resolution Image Harmonization via Collaborative Dual Transformations 1786 | Brain-Supervised Image Editing 1787 | De-Rendering 3D Objects in the Wild 1788 | Neural Fields As Learnable Kernels for 3D Reconstruction 1789 | HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing 1790 | 3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies 1791 | Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian 1792 | Deep Image-Based Illumination Harmonization 1793 | GLASS: Geometric Latent Augmentation for Shape Spaces 1794 | PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes 1795 | Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes 1796 | Neural Mesh Simplification 
1797 | SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters 1798 | CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation 1799 | UNIST: Unpaired Neural Implicit Shape Translation Network 1800 | CoNeRF: Controllable Neural Radiance Fields 1801 | Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling 1802 | Modeling Indirect Illumination for Inverse Rendering 1803 | Neural Head Avatars From Monocular RGB Videos 1804 | DeepCurrents: Learning Implicit Representations of Shapes With Boundaries 1805 | Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination 1806 | AnyFace: Free-Style Text-To-Face Synthesis and Manipulation 1807 | General Facial Representation Learning in a Visual-Linguistic Manner 1808 | Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection 1809 | Detecting Deepfakes With Self-Blended Images 1810 | 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces 1811 | Evaluation-Oriented Knowledge Distillation for Deep Face Recognition 1812 | AdaFace: Quality Adaptive Margin for Face Recognition 1813 | Moving Window Regression: A Novel Approach to Ordinal Regression 1814 | FaceFormer: Speech-Driven 3D Facial Animation With Transformers 1815 | Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in “In-the-Wild” Videos 1816 | Deep Decomposition for Stochastic Normal-Abnormal Transport 1817 | DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification 1818 | Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification 1819 | Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations 1820 | VRDFormer: End-to-End Video Visual Relation Detection With Transformers 1821 | Video K-Net: A Simple, Strong, 
and Unified Baseline for Video Segmentation 1822 | Visual Acoustic Matching 1823 | The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation 1824 | Learning Multiple Dense Prediction Tasks From Partially Annotated Data 1825 | PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning 1826 | Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture 1827 | FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation 1828 | Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding 1829 | Equivariant Point Cloud Analysis via Learning Orientations for Message Passing 1830 | Surface Representation for Point Clouds 1831 | Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds 1832 | 3D Common Corruptions and Data Augmentation 1833 | INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation 1834 | How Much Does Input Data Type Impact Final Face Model Accuracy? 
1835 | Ego4D: Around the World in 3,000 Hours of Egocentric Video 1836 | TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting 1837 | Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding 1838 | vCLIMB: A Novel Video Class Incremental Learning Benchmark 1839 | Opening Up Open World Tracking 1840 | Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions 1841 | CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters 1842 | Failure Modes of Domain Generalization Algorithms 1843 | A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes 1844 | Grounding Answers for Visual Questions Asked by Visually Impaired People 1845 | Learning To Answer Questions in Dynamic Audio-Visual Scenarios 1846 | Episodic Memory Question Answering 1847 | ScanQA: 3D Question Answering for Spatial Scene Understanding 1848 | Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles 1849 | BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild 1850 | Unified Contrastive Learning in Image-Text-Label Space 1851 | AlignMixup: Improving Representations by Interpolating Aligned Features 1852 | On the Road to Online Adaptation for Semantic Image Segmentation 1853 | ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation 1854 | Kernelized Few-Shot Object Detection With Efficient Integral Aggregation 1855 | Neural Mean Discrepancy for Efficient Out-of-Distribution Detection 1856 | A Structured Dictionary Perspective on Implicit Neural Representations 1857 | LARGE: Latent-Based Regression Through GAN Semantics 1858 | Rethinking Controllable Variational Autoencoders 1859 | Learning Canonical F-Correlation Projection for Compact Multiview Representation 1860 | Cross-Architecture Self-Supervised Video Representation Learning 1861 | Improving Video Model Transfer 
With Dynamic Representation Learning 1862 | Self-Supervised Image Representation Learning With Geometric Set Consistency 1863 | HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging 1864 | Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling 1865 | DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds 1866 | Neural Convolutional Surfaces 1867 | Representing 3D Shapes With Probabilistic Directed Distance Fields 1868 | H4D: Human 4D Modeling by Learning Neural Compositional Representation 1869 | Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification 1870 | Contrastive Regression for Domain Adaptation on Gaze Estimation 1871 | Forward Compatible Training for Large-Scale Embedding Retrieval Systems 1872 | Improving Subgraph Recognition With Variational Graph Information Bottleneck 1873 | Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss 1874 | Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species 1875 | Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation 1876 | Structured Sparse R-CNN for Direct Scene Graph Generation 1877 | PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation 1878 | RU-Net: Regularized Unrolling Network for Scene Graph Generation 1879 | Fine-Grained Predicates Learning for Scene Graph Generation 1880 | HL-Net: Heterophily Learning Network for Scene Graph Generation 1881 | SGTR: End-to-End Scene Graph Generation With Transformer 1882 | Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs 1883 | RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition 1884 | Spatial Commonsense Graph for Object Localisation in Partial Scenes 1885 | “The Pedestrian Next to the Lamppost” Adaptive Object Graphs for Better 
Instantaneous Mapping 1886 | Category-Aware Transformer Network for Better Human-Object Interaction Detection 1887 | Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection 1888 | Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection 1889 | Human-Object Interaction Detection via Disentangled Transformer 1890 | MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection 1891 | GaTector: A Unified Framework for Gaze Object Prediction 1892 | Rethinking Parsing Branch for Human Densepose Estimation 1893 | STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes 1894 | Crowd Counting in the Frequency Domain 1895 | Boosting Crowd Counting via Multifaceted Attention 1896 | Rethinking Spatial Invariance of Convolutional Networks for Object Counting 1897 | Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing 1898 | Collaborative Transformers for Grounded Situation Recognition 1899 | Deep Stereo Image Compression via Bi-Directional Coding 1900 | RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion 1901 | Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer 1902 | Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels 1903 | SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization 1904 | Automatic Color Image Stitching Using Quaternion Rank-1 Alignment 1905 | SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing 1906 | Degree-of-Linear-Polarization-Based Color Constancy 1907 | Point Cloud Color Constancy 1908 | Boosting View Synthesis With Residual Transfer 1909 | Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection 1910 | Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging 1911 | PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition 1912 | 
Multimodal Material Segmentation 1913 | Occlusion-Aware Cost Constructor for Light Field Depth Estimation 1914 | Learning Neural Light Fields With Ray-Space Embedding 1915 | Acquiring a Dynamic Light Field Through a Single-Shot Coded Image 1916 | Gravitationally Lensed Black Hole Emission Tomography 1917 | Deep Saliency Prior for Reducing Visual Distraction 1918 | Personalized Image Aesthetics Assessment With Rich Attributes 1919 | Artistic Style Discovery With Independent Components 1920 | Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos 1921 | SVIP: Sequence VerIfication for Procedures in Videos 1922 | Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency 1923 | Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization 1924 | GateHUB: Gated History Unit With Background Suppression for Online Action Detection 1925 | E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition 1926 | Hybrid Relation Guided Set Matching for Few-Shot Action Recognition 1927 | Spatio-Temporal Relation Modeling for Few-Shot Action Recognition 1928 | Alignment-Uniformity Aware Representation Learning for Zero-Shot Video Classification 1929 | Cross-Modal Representation Learning for Zero-Shot Action Recognition 1930 | Cross-Modal Background Suppression for Audio-Visual Event Localization 1931 | Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization 1932 | An Empirical Study of End-to-End Temporal Action Detection 1933 | Everything at Once – Multi-Modal Fusion Transformer for Video Retrieval 1934 | DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition 1935 | MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection 1936 | Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition 1937 | AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition 1938 | UBoCo: 
Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection 1939 | Detector-Free Weakly Supervised Group Activity Recognition 1940 | Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading 1941 | Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer 1942 | Interactiveness Field in Human-Object Interactions 1943 | GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection 1944 | Object-Relation Reasoning Graph for Action Recognition 1945 | UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection 1946 | Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition 1947 | SPAct: Self-Supervised Privacy Preservation for Action Recognition 1948 | Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering 1949 | InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition 1950 | Learning Video Representations of Human Motion From Synthetic Data 1951 | Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos 1952 | EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images 1953 | Gait Recognition in the Wild With Dense 3D Representations and a Benchmark 1954 | Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification 1955 | Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition 1956 | DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover’s Distance Improves Out-of-Distribution Face Identification 1957 | Learning Second Order Local Anomaly for General Face Forgery Detection 1958 | PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition 1959 | Face2Exp: Combating Data Biases for Facial Expression Recognition 1960 | Local-Adaptive Face Recognition via 
Graph-Based Meta-Clustering and Regularized Adaptation 1961 | EMOCA: Emotion Driven Monocular Face Capture and Animation 1962 | Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality 1963 | FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset 1964 | ImFace: A Nonlinear 3D Morphable Face Model With Implicit Neural Representations 1965 | Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling 1966 | RigNeRF: Fully Controllable Neural 3D Portraits 1967 | HeadNeRF: A Real-Time NeRF-Based Parametric Head Model 1968 | Sparse to Dense Dynamic 3D Facial Expression Generation 1969 | Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion 1970 | Speech Driven Tongue Animation 1971 | Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition 1972 | gDNA: Towards Generative Detailed Neural Avatars 1973 | GraFormer: Graph-Oriented Transformer for 3D Pose Estimation 1974 | Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation 1975 | Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis 1976 | PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence 1977 | The Wanderings of Odysseus in 3D Scenes 1978 | OSSO: Obtaining Skeletal Shape From Outside 1979 | LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds 1980 | Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression 1981 | Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation 1982 | LISA: Learning Implicit Shape and Appearance of Hands 1983 | MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image 1984 | Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation 1985 | Low-Resource Adaptation for Personalized Co-Speech Gesture Generation 1986 | D-Grasp: Physically Plausible Dynamic Grasp 
Synthesis for Hand-Object Interactions 1987 | Synthetic Generation of Face Videos With Plethysmograph Physiology 1988 | Contour-Hugging Heatmaps for Landmark Detection 1989 | Which Images To Label for Few-Shot Medical Landmark Detection? 1990 | Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography 1991 | Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization 1992 | Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution 1993 | Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations 1994 | Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation 1995 | BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation 1996 | Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis 1997 | Towards Low-Cost and Efficient Malaria Detection 1998 | ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification 1999 | Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification 2000 | M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer 2001 | Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis 2002 | HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet 2003 | DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations 2004 | Clean Implicit 3D Structure From Noisy 2D STEM Images 2005 | Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks 2006 | Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment 2007 | Learning Optimal K-Space Acquisition and Reconstruction Using Physics-Informed Neural Networks 2008 | NODEO: A 
Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration 2009 | SMPL-A: Modeling Person-Specific Deformable Anatomy 2010 | DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis 2011 | Affine Medical Image Registration With Coarse-To-Fine Vision Transformer 2012 | Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow 2013 | Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization 2014 | Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation 2015 | FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis 2016 | Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning 2017 | CellTypeGraph: A New Geometric Computer Vision Benchmark 2018 | ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics 2019 | FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos 2020 | Multi-Dimensional, Nuanced and Subjective – Measuring the Perception of Facial Expressions 2021 | DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image 2022 | OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction 2023 | PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking 2024 | Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification 2025 | JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection 2026 | DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion 2027 | Egocentric Prediction of Action Target in 3D 2028 | HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction 2029 | Amodal Panoptic Segmentation 2030 | Large-Scale Video Panoptic 
Segmentation in the Wild: A Benchmark 2031 | YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset 2032 | The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting 2033 | 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos 2034 | AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval 2035 | A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection 2036 | Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities 2037 | Optimal Correction Cost for Object Detection Evaluation 2038 | GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains 2039 | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding 2040 | Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation 2041 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes 2042 | DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation 2043 | Open Challenges in Deep Stereo: The Booster Dataset 2044 | No-Reference Point Cloud Quality Assessment via Domain Adaptation 2045 | Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network 2046 | How Good Is Aesthetic Ability of a Fashion Model? 
2047 | Instance-Wise Occlusion and Depth Orders in Natural Scenes 2048 | PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects 2049 | Replacing Labeled Real-Image Datasets With Auto-Generated Contours 2050 | V2C: Visual Voice Cloning 2051 | M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining 2052 | It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection 2053 | From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering 2054 | Point Cloud Pre-Training With Natural 3D Structures 2055 | The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift 2056 | AutoMine: An Unmanned Mine Dataset 2057 | SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis 2058 | BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations 2059 | Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task 2060 | Unifying Panoptic Segmentation for Autonomous Driving 2061 | DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection 2062 | SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation 2063 | Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions --------------------------------------------------------------------------------