├── misc
    ├── mask.jpg
    └── CVPR2022-papers.list
├── output
    ├── cvpr_22.png
    ├── keywords.jpg
    ├── wordcloud.png
    └── keyworks-2122.png
├── README.md
└── cvpr22.py


/misc/mask.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/misc/mask.jpg


--------------------------------------------------------------------------------
/output/cvpr_22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/cvpr_22.png


--------------------------------------------------------------------------------
/output/keywords.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/keywords.jpg


--------------------------------------------------------------------------------
/output/wordcloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/wordcloud.png


--------------------------------------------------------------------------------
/output/keyworks-2122.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BIGBALLON/CVPR2022-Paper-Statistics/HEAD/output/keyworks-2122.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
- [CVPR2022-Paper-Statistics](#cvpr2022-paper-statistics)
    - [Wordcloud](#wordcloud)
    - [Acceptance](#acceptance)
    - [Keywords](#keywords)
    - [Code Usage](#code-usage)
# CVPR2022-Paper-Statistics

Statistics and visualization of the main keywords in the titles of accepted papers at the IEEE/CVF Conference on Computer Vision and Pattern Recognition ([CVPR 2022](https://cvpr2022.thecvf.com/)).

Inspired by [CVPR-2021-Paper-Statistics](https://github.com/hoya012/CVPR-2021-Paper-Statistics/).


## Wordcloud


![wordcloud](./output/wordcloud.png)

## Acceptance

| Year  | Submissions | Accepted | Acceptance rate |
| :---: | :---------: | :------: | :-------------: |
| 2011  |    1677     |   438    |     26.10%      |
| 2012  |    1933     |   466    |     24.10%      |
| 2013  |    1798     |   472    |     26.20%      |
| 2014  |    1807     |   540    |     29.90%      |
| 2015  |    2123     |   602    |     28.40%      |
| 2016  |    2145     |   643    |     29.90%      |
| 2017  |    2680     |   783    |     29.20%      |
| 2018  |    3359     |   979    |     29.10%      |
| 2019  |    5160     |   1294   |     25.07%      |
| 2020  |    6656     |   1468   |     22.13%      |
| 2021  |    7015     |   1663   |     23.71%      |
| 2022  |    8161     |   2067   |     25.32%      |

![acceptance](./output/cvpr_22.png)
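The rate column is accepted papers divided by submissions. Below is a minimal sketch of that arithmetic for a few rows copied from the table. The `acceptance_rate` helper is hypothetical and is not part of `cvpr22.py`. Recomputed values can differ from the table in the last digit: for 2022, for example, this gives 25.33% against the listed 25.32%.

```python
# Rows copied from the table above: year -> (submissions, accepted papers)
ROWS = {
    2011: (1677, 438),
    2021: (7015, 1663),
    2022: (8161, 2067),
}


def acceptance_rate(submissions: int, accepted: int) -> float:
    """Acceptance rate in percent: accepted / submissions * 100."""
    return 100.0 * accepted / submissions


for year, (submissions, accepted) in ROWS.items():
    print(f"{year}: {acceptance_rate(submissions, accepted):.2f}%")
```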

## Keywords

![keywords](./output/keywords.jpg)

- Most of the top keywords carried over from CVPR 2021:
    - Image, Object, Detection, 3D, Video, Segmentation
- **Transformer** appears about $5\times$ as often as in 2021, thanks to [**ViT**](https://arxiv.org/abs/2010.11929)
    - transformer: $35$ (2021) -> $177$ (2022) 😉


![keywords-2021-vs-2022](./output/keyworks-2122.png)


## Code Usage

```bash
# 1. install packages
pip install matplotlib scipy pillow wordcloud nltk numpy
# 2. run the script from the repository root (the mask is loaded from misc/mask.jpg);
#    keywords.jpg and wordcloud.png are written to the current directory
python cvpr22.py --list misc/CVPR2022-papers.list
```


--------------------------------------------------------------------------------
/cvpr22.py:
--------------------------------------------------------------------------------
"""
Statistics and visualization of keywords in CVPR 2022 accepted paper titles.

Dependencies:
    matplotlib scipy pillow wordcloud nltk numpy
Provided code was adapted from:
    http://amueller.github.io/word_cloud/
    https://github.com/hoya012/CVPR-2021-Paper-Statistics/
"""
import argparse
from collections import Counter

import matplotlib.pyplot as plt
import nltk
import numpy as np
from nltk.corpus import stopwords
from PIL import Image
from scipy.ndimage import gaussian_gradient_magnitude
from wordcloud import ImageColorGenerator, WordCloud


def get_list(args) -> list:
    """Read one paper title per line, skipping blank lines."""
    # Titles contain non-ASCII characters (e.g. curly quotes), so read as UTF-8
    with open(args.list, encoding="utf-8") as f:
        paper_list = [line.strip() for line in f if line.strip()]
    return paper_list


def gen_keywords_fig(keyword_counter):
    # Show the N most common keywords and their frequencies as a horizontal bar chart
    num_keyword = 75
    keywords_counter_vis = keyword_counter.most_common(num_keyword)

    plt.rcdefaults()
    _, ax = plt.subplots(figsize=(8, 18))

    key = [k[0] for k in keywords_counter_vis]
    value = [k[1] for k in keywords_counter_vis]
    y_pos = np.arange(len(key))
    ax.barh(y_pos, value, align="center", color="green")
    ax.set_yticks(y_pos)
    ax.set_yticklabels(key, rotation=0, fontsize=10)
    ax.invert_yaxis()  # most frequent keyword at the top
    # Annotate each bar with its count
    for i, v in enumerate(value):
        ax.text(v + 2, i + 0.25, str(v), color="black", fontsize=10)
    ax.set_title("CVPR 2022 Accepted Papers: Top {} Keywords".format(num_keyword))

    plt.savefig("keywords.jpg", bbox_inches="tight", dpi=128)


def gen_wordcloud_fig(keyword_list):
    # Render a word cloud of the keywords, shaped and colored by misc/mask.jpg
    parrot_color = np.array(Image.open("misc/mask.jpg").convert("RGB"))
    # Subsample by a factor of 3. Very lossy, but for a word cloud we don't really care.
    parrot_color = parrot_color[::3, ::3]

    # Create the mask: white pixels are "masked out", so black pixels are turned white
    parrot_mask = parrot_color.copy()
    parrot_mask[parrot_mask.sum(axis=2) == 0] = 255

    # Some finesse: enforce boundaries between colors so they get less washed out.
    # For that, run edge detection on each color channel and average the results.
    edges = np.mean(
        [
            gaussian_gradient_magnitude(parrot_color[:, :, i] / 255.0, 2)
            for i in range(3)
        ],
        axis=0,
    )
    parrot_mask[edges > 0.08] = 255

    # Create the word cloud. A bit sluggish; subsample more strongly for quicker rendering.
    # relative_scaling=0 means the frequencies in the data are reflected less
    # accurately, but it makes a better picture.
    wc = WordCloud(
        max_words=1024,
        mask=parrot_mask,
        max_font_size=50,
        random_state=42,
        background_color="white",
        relative_scaling=0,
    )
    # Generate the word cloud from the keyword text
    wc.generate(" ".join(keyword_list))
    # Recolor the words using the colors of the mask image
    image_colors = ImageColorGenerator(parrot_color)
    wc.recolor(color_func=image_colors)
    wc.to_file("wordcloud.png")


def run(args):
    nltk.download("stopwords", quiet=True)
    paper_list = get_list(args)
    print(f"  == totals: {len(paper_list)}")

    # Common English stop words plus generic deep-learning terms that carry little signal
    stopwords_english = set(stopwords.words("english"))
    stopwords_deep_learning = {
        "learning",
        "network",
        "neural",
        "networks",
        "deep",
        "via",
        "using",
        "convolutional",
        "single",
    }

    keyword_list = []

    for idx, title in enumerate(paper_list):
        print(f"{idx} : {title}")

        # Lowercase first, then deduplicate (keeping order) so that each
        # keyword is counted at most once per title
        word_list = dict.fromkeys(word.lower() for word in title.split())
        for word in word_list:
            if word not in stopwords_english and word not in stopwords_deep_learning:
                keyword_list.append(word)

    keyword_counter = Counter(keyword_list)
    print(keyword_counter)
    print(f"{len(keyword_counter)} different keywords before merging")

    # Merge plural forms into singular ones, e.g. "cnns" into "cnn"
    duplicates = [k for k in keyword_counter if k + "s" in keyword_counter]
    for k in duplicates:
        keyword_counter[k] += keyword_counter[k + "s"]
        del keyword_counter[k + "s"]
    print(keyword_counter)
    print(f"{len(keyword_counter)} different keywords after merging")
    gen_keywords_fig(keyword_counter)
    gen_wordcloud_fig(keyword_list)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CVPR 2022 Paper Statistics.")
    parser.add_argument(
        "--list", type=str, required=True, help="Path to the paper list (one title per line)"
    )
    args = parser.parse_args()
    run(args)
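The keyword pipeline in `run()` lowercases each title and counts every word at most once per title. It then drops stop words and folds plural keys such as `cnns` into `cnn`.

The sketch below reproduces these steps using only the standard library. `count_keywords` is a hypothetical helper. `STOPWORDS` is a tiny stand-in for NLTK's English list, so counts on real data would differ from the script's. The sample titles are taken from `misc/CVPR2022-papers.list`.

```python
from collections import Counter

# Stand-in stop words: a tiny hand-picked subset, NOT NLTK's full English list.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}
# Generic deep-learning terms, the same set cvpr22.py filters out
DL_STOPWORDS = {"learning", "network", "neural", "networks", "deep",
                "via", "using", "convolutional", "single"}


def count_keywords(titles):
    """Count each keyword at most once per title, then merge "x" + "s" into "x"."""
    counter = Counter()
    for title in titles:
        words = dict.fromkeys(w.lower() for w in title.split())  # dedupe, keep order
        counter.update(w for w in words if w not in STOPWORDS and w not in DL_STOPWORDS)
    # Fold plural keys into their singular form, e.g. "transformers" -> "transformer"
    for k in [k for k in counter if k + "s" in counter]:
        counter[k] += counter.pop(k + "s")
    return counter


titles = [
    "Multi-Frame Self-Supervised Depth With Transformers",
    "Continual Learning With Lifelong Vision Transformer",
    "EDTER: Edge Detection With Transformer",
]
print(count_keywords(titles).most_common(3))
```

Two limitations of this approach show up in the result. Punctuation stays attached under whitespace splitting, which is why `edter:` survives as its own key. The plural rule is also purely suffix-based, so it only catches plain `-s` plurals.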

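The script splits titles on whitespace only. As a result, tokens such as `EDTER:` or `Need?` keep their punctuation and are counted separately from the bare word. One possible refinement is a regex tokenizer that keeps ASCII letters, digits and inner hyphens. This is not used by `cvpr22.py` and is sketched here as an assumption; `tokenize` and `TOKEN_RE` are hypothetical names.

```python
import re

# Hypothetical refinement (not part of cvpr22.py): runs of ASCII letters/digits,
# optionally joined by inner hyphens, e.g. "co-sne" or "coarse-to-fine".
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(title: str) -> list:
    """Lowercase a title and return its alphanumeric tokens, keeping hyphenated words whole."""
    return TOKEN_RE.findall(title.lower())


print(tokenize("CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data"))
```

Adopting this tokenizer would change the keyword counts reported in the README. Apostrophes and non-ASCII letters would also need extra handling: for example, `OccAM’s` splits into `occam` and `s`.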

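`gen_wordcloud_fig` builds its mask by whitening two kinds of pixels: pure black ones and those on strong color edges. WordCloud treats pure-white mask pixels as masked out, so no words are drawn there.

The sketch below shows just these two steps on a toy image using NumPy. `build_mask` is a hypothetical helper. The edge magnitudes are hand-picked rather than computed with `gaussian_gradient_magnitude`.

```python
import numpy as np


def build_mask(rgb: np.ndarray, edges: np.ndarray, threshold: float = 0.08) -> np.ndarray:
    """Mirror the masking steps in gen_wordcloud_fig: pure-black pixels and
    strong edges become white (255), which WordCloud treats as masked out."""
    mask = rgb.copy()
    mask[mask.sum(axis=2) == 0] = 255  # black background -> white -> no words there
    mask[edges > threshold] = 255      # color boundaries -> white -> sharper shapes
    return mask


# Toy 2x2 RGB image: one black pixel and three colored pixels
rgb = np.array([[[0, 0, 0], [200, 10, 10]],
                [[10, 200, 10], [10, 10, 200]]], dtype=np.uint8)
# Hand-picked edge magnitudes (normally from scipy's gaussian_gradient_magnitude)
edges = np.array([[0.0, 0.0], [0.0, 0.5]])
print(build_mask(rgb, edges)[..., 0])
```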
--------------------------------------------------------------------------------
/misc/CVPR2022-papers.list:
--------------------------------------------------------------------------------
   1 | Efficient Deep Embedded Subspace Clustering
   2 | Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers
   3 | CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data
   4 | Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning
   5 | Active Learning for Open-Set Annotation
   6 | Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training
   7 | Robust Optimization As Data Augmentation for Large-Scale Graphs
   8 | A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty
   9 | The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration
  10 | Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector
  11 | GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning
  12 | Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning
  13 | A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration
  14 | Learning To Learn by Jointly Optimizing Neural Architecture and Weights
  15 | Learning To Prompt for Continual Learning
  16 | Meta-Attention for ViT-Backed Continual Learning
  17 | Multi-Frame Self-Supervised Depth With Transformers
  18 | Continual Learning With Lifelong Vision Transformer
  19 | Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation
  20 | Revisiting Random Channel Pruning for Neural Network Compression
  21 | Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase
  22 | Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning
  23 | Towards Robust and Reproducible Active Learning Using Neural Networks
  24 | Non-Iterative Recovery From Nonlinear Observations Using Generative Models
  25 | Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders
  26 | Robust Combination of Distributed Gradients Under Adversarial Perturbations
  27 | Do Learned Representations Respect Causal Relationships?
  28 | How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
  29 | Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees
  30 | Contrastive Test-Time Adaptation
  31 | AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
  32 | Selective-Supervised Contrastive Learning With Noisy Labels
  33 | RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks
  34 | Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
  35 | Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels
  36 | Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design
  37 | Learning Structured Gaussians To Approximate Deep Ensembles
  38 | Out-of-Distribution Generalization With Causal Invariant Transformations
  39 | Split Hierarchical Variational Compression
  40 | Implicit Feature Decoupling With Depthwise Quantization
  41 | Understanding Uncertainty Maps in Vision With Statistical Testing
  42 | A Hybrid Quantum-Classical Algorithm for Robust Fitting
  43 | A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching
  44 | FastDOG: Fast Discrete Optimization on GPU
  45 | Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization
  46 | AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks
  47 | Training Quantised Neural Networks With STE Variants: The Additive Noise Annealing Algorithm
  48 | AME: Attention and Memory Enhancement in Hyper-Parameter Optimization
  49 | Accelerating Neural Network Optimization Through an Automated Control Theory Lens
  50 | Efficient Maximal Coding Rate Reduction by Variational Forms
  51 | A Unified Framework for Implicit Sinkhorn Differentiation
  52 | Computing Wasserstein-p Distance Between Images With Linear Cost
  53 | An Iterative Quantum Approach for Transformation Estimation From Point Sets
  54 | BoosterNet: Improving Domain Generalization of Deep Neural Nets Using Culpability-Ranked Features
  55 | Pooling Revisited: Your Receptive Field Is Suboptimal
  56 | Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis
  57 | Online Convolutional Re-Parameterization
  58 | RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality
  59 | DyRep: Bootstrapping Training With Dynamic Re-Parameterization
  60 | Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
  61 | Condensing CNNs With Partial Differential Equations
  62 | Deep Equilibrium Optical Flow Estimation
  63 | Frame Averaging for Equivariant Shape Space Learning
  64 | Dual-Generator Face Reenactment
  65 | Convolution of Convolution: Let Kernels Spatially Collaborate
  66 | SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention
  67 | RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising
  68 | Co-Domain Symmetry for Complex-Valued Deep Learning
  69 | Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention
  70 | Compressing Models With Few Samples: Mimicking Then Replacing
  71 | Total Variation Optimization Layers for Computer Vision
  72 | AIM: An Auto-Augmenter for Images and Meshes
  73 | Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to the Task of Accelerated MRI Reconstruction
  74 | Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching
  75 | Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images
  76 | Delving Into the Estimation Shift of Batch Normalization in a Network
  77 | Generalizing Interactive Backpropagating Refinement for Dense Prediction Networks
  78 | Brain-Inspired Multilayer Perceptron With Spiking Neurons
  79 | Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique
  80 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models
  81 | On the Integration of Self-Attention and Convolution
  82 | Hire-MLP: Vision MLP via Hierarchical Rearrangement
  83 | Stable Long-Term Recurrent Video Super-Resolution
  84 | Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation
  85 | Progressive End-to-End Object Detection in Crowded Scenes
  86 | Zero-Shot Text-Guided Object Generation With Dream Fields
  87 | ISNet: Shape Matters for Infrared Small Target Detection
  88 | Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
  89 | CLRNet: Cross Layer Refinement Network for Lane Detection
  90 | CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection
  91 | Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection
  92 | Group Contextualization for Video Recognition
  93 | Learning Transferable Human-Object Interaction Detector With Natural Language Supervision
  94 | Accelerating DETR Convergence via Semantic-Aligned Matching
  95 | Efficient Video Instance Segmentation via Tracklet Query and Proposal
  96 | Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
  97 | Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection
  98 | C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
  99 | Sketching Without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
 100 | AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks
 101 | Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
 102 | A Proposal-Based Paradigm for Self-Supervised Sound Source Localization in Videos
 103 | SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization
 104 | Towards End-to-End Unified Scene Text Detection and Layout Analysis
 105 | Clothes-Changing Person Re-Identification With RGB Modality Only
 106 | MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
 107 | Homography Loss for Monocular 3D Object Detection
 108 | TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers
 109 | TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation
 110 | RBGNet: Ray-Based Grouping for 3D Object Detection
 111 | Voxel Field Fusion for 3D Object Detection
 112 | Learning To Detect Mobile Objects From LiDAR Scans Without Labels
 113 | OccAM’s Laser: Occlusion-Based Attribution Maps for 3D Object Detectors on LiDAR Data
 114 | Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
 115 | TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization
 116 | A Voxel Graph CNN for Object Classification With Event Cameras
 117 | OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection
 118 | Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
 119 | Category Contrast for Unsupervised Domain Adaptation in Visual Tasks
 120 | Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With a Bayesian Model
 121 | GANSeg: Learning To Segment by Unsupervised Hierarchical Image Generation
 122 | Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation
 123 | Deep Hierarchical Semantic Segmentation
 124 | Semantic Segmentation by Early Region Proxy
 125 | Panoptic, Instance and Semantic Relations: A Relational Context Encoder To Enhance Panoptic Segmentation
 126 | Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
 127 | Masked-Attention Mask Transformer for Universal Image Segmentation
 128 | FocalClick: Towards Practical Interactive Image Segmentation
 129 | High Quality Segmentation for Ultra High-Resolution Images
 130 | Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks
 131 | Recurrent Dynamic Embedding for Video Object Segmentation
 132 | Accelerating Video Object Segmentation With Compressed Video
 133 | Per-Clip Video Object Segmentation
 134 | SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization
 135 | Neural Recognition of Dashed Curves With Gestalt Law of Continuity
 136 | CVNet: Contour Vibration Network for Building Extraction
 137 | A Keypoint-Based Global Association Network for Lane Detection
 138 | EDTER: Edge Detection With Transformer
 139 | Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
 140 | Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration
 141 | CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance
 142 | FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing
 143 | Rotationally Equivariant 3D Object Detection
 144 | AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis
 145 | Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes
 146 | Human Mesh Recovery From Multiple Shots
 147 | HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network
 148 | Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
 149 | Disentangled3D: Learning a 3D Generative Model With Disentangled Geometry and Appearance From Monocular Images
 150 | NeuralHDHair: Automatic High-Fidelity Hair Modeling From a Single Image Using Implicit Neural Representations
 151 | Topologically-Aware Deformation Fields for Single-View 3D Reconstruction
 152 | Generating Diverse 3D Reconstructions From a Single Occluded Face Image
 153 | LOLNerf: Learn From One Look
 154 | Learning Local Displacements for Point Cloud Completion
 155 | Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation
 156 | Dimension Embeddings for Monocular 3D Object Detection
 157 | Understanding 3D Object Articulation in Internet Videos
 158 | P3Depth: Monocular Depth Estimation With a Piecewise Planarity Prior
 159 | Neural Face Identification in a 2D Wireframe Projection of a Manifold Object
 160 | PanopticDepth: A Unified Framework for Depth-Aware Panoptic Segmentation
 161 | Stability-Driven Contact Reconstruction From Monocular Color Images
 162 | LGT-Net: Indoor Panoramic Room Layout Estimation With Geometry-Aware Transformer Network
 163 | Collaborative Learning for Hand and Object Reconstruction With Attention-Guided Graph Convolution
 164 | RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes
 165 | Exploring Geometric Consistency for Monocular 3D Object Detection
 166 | Learning 3D Object Shape and Layout Without 3D Supervision
 167 | Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data
 168 | Occluded Human Mesh Recovery
 169 | LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints
 170 | OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction
 171 | Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light
 172 | Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection
 173 | HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
 174 | Revisiting Near/Remote Sensing With Geospatial Attention
 175 | Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening
 176 | Mutual Information-Driven Pan-Sharpening
 177 | Sparse and Complete Latent Organization for Geospatial Semantic Segmentation
 178 | The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions
 179 | Oriented RepPoints for Aerial Object Detection
 180 | Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction
 181 | PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images
 182 | Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites
 183 | MISF: Multi-Level Interactive Siamese Filtering for High-Fidelity Image Inpainting
 184 | Iterative Deep Homography Estimation
 185 | GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors
 186 | Deep Color Consistent Network for Low-Light Image Enhancement
 187 | LAR-SR: A Local Autoregressive Model for Image Super-Resolution
 188 | Multi-Scale Memory-Based Video Deblurring
 189 | Local Texture Estimator for Implicit Representation Function
 190 | Chitransformer: Towards Reliable Stereo From Cues
 191 | BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras
 192 | ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
 193 | IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation
 194 | Learning Graph Regularisation for Guided Super-Resolution
 195 | Self-Supervised Deep Image Restoration via Adaptive Stochastic Gradient Langevin Dynamics
 196 | Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation
 197 | Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching
 198 | Unpaired Deep Image Deraining Using Dual Contrastive Learning
 199 | Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots
 200 | Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition
 201 | VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
 202 | Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space
 203 | Exploring and Evaluating Image Restoration Potential in Dynamic Scenes
 204 | GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation
 205 | Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method
 206 | IDR: Self-Supervised Image Denoising via Iterative Data Refinement
 207 | ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
 208 | Texture-Based Error Analysis for Image Super-Resolution
 209 | Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel
 210 | KNN Local Attention for Image Restoration
 211 | Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection
 212 | Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection
 213 | Self-Supervised Keypoint Discovery in Behavioral Videos
 214 | Learning To Align Sequential Actions in the Wild
 215 | Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination
 216 | End-to-End Human-Gaze-Target Detection With Transformers
 217 | Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis
 218 | MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction
 219 | Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction
 220 | End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps
 221 | Learning Affordance Grounding From Exocentric Images
 222 | 3D Scene Painting via Semantic Image Synthesis
 223 | Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography
 224 | ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection
 225 | Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches
 226 | Image Disentanglement Autoencoder for Steganography Without Embedding
 227 | Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection
 228 | Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning
 229 | Density-Preserving Deep Point Cloud Compression
 230 | Graph-Context Attention Networks for Size-Varied Deep Graph Matching
 231 | TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions
 232 | ObjectFormer for Image Manipulation Detection and Localization
 233 | Sequential Voting With Relational Box Fields for Active Object Detection
 234 | Efficient Classification of Very Large Images With Tiny Objects
 235 | Partially Does It: Towards Scene-Level FG-SBIR With Partial Input
 236 | Long-Term Visual Map Sparsification With Heterogeneous GNN
 237 | Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association
 238 | DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
 239 | Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring
 240 | Rethinking Image Cropping: Exploring Diverse Compositions From Global Views
 241 | Defensive Patches for Robust Recognition in the Physical World
 242 | Semi-Supervised Video Paragraph Grounding With Contrastive Encoder
 243 | Large-Scale Pre-Training for Person Re-Identification With Noisy Labels
 244 | Meta Distribution Alignment for Generalizable Person Re-Identification
 245 | FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction
 246 | It’s About Time: Analog Clock Reading in the Wild
 247 | Consistency Driven Sequential Transformers Attention Model for Partially Observable Scenes
 248 | SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles
 249 | Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers
 250 | Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving
 251 | CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
 252 | Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers
 253 | Rethinking Semantic Segmentation: A Prototype View
 254 | Semantic-Aware Domain Generalized Segmentation
 255 | Adaptive Early-Learning Correction for Segmentation From Noisy Annotations
 256 | Pointly-Supervised Instance Segmentation
 257 | Joint Forecasting of Panoptic Segmentations With Difference Attention
 258 | FocusCut: Diving Into a Focus View in Interactive Segmentation
 259 | Human Instance Matting via Mutual Guidance and Multi-Instance Refinement
 260 | Deformable Sprites for Unsupervised Video Decomposition
 261 | Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation
 262 | Robust and Accurate Superquadric Recovery: A Probabilistic Approach
 263 | Medial Spectral Coordinates for 3D Shape Analysis
 264 | Scribble-Supervised LiDAR Semantic Segmentation
 265 | SoftGroup for 3D Instance Segmentation on Point Clouds
 266 | Accurate 3D Body Shape Regression Using Metric and Semantic Attributes
 267 | JIFF: Jointly-Aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction
 268 | Tracking People by Predicting 3D Appearance, Location and Pose
 269 | ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis
 270 | Interacting Attention Graph for Single Image Two-Hand Reconstruction
 271 | 3D Human Tongue Reconstruction From Single “In-the-Wild” Images
 272 | EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation
 273 | Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection
 274 | OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
 275 | Gated2Gated: Self-Supervised Depth Estimation From Gated Images
 276 | IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
 277 | Egocentric Scene Understanding via Multimodal Spatial Rectifier
 278 | Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry
 279 | The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement
 280 | BANMo: Building Animatable 3D Neural Models From Many Casual Videos
 281 | Self-Supervised Video Transformer
 282 | Temporally Efficient Vision Transformer for Video Instance Segmentation
 283 | VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation
 284 | Temporal Alignment Networks for Long-Term Video
 285 | Revisiting the “Video” in Video-Language Understanding
 286 | Invariant Grounding for Video Question Answering
 287 | P3IV: Probabilistic Procedure Planning From Instructional Videos With Weak Supervision
 288 | FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment
 289 | Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
 290 | Revisiting Skeleton-Based Action Recognition
 291 | OpenTAL: Towards Open Set Temporal Action Localization
 292 | Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition
 293 | TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition
 294 | Revealing Occlusions With 4D Neural Fields
 295 | HODOR: High-Level Object Descriptors for Object Re-Segmentation in Video Learned From Static Images
 296 | Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning
 297 | UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection
 298 | Future Transformer for Long-Term Action Anticipation
 299 | MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing
 300 | Learning Pixel-Level Distinctions for Video Highlight Detection
 301 | DR.VIC: Decomposition and Reasoning for Video Individual Counting
 302 | Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation
 303 | Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
 304 | Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training
 305 | Coarse-To-Fine Feature Mining for Video Semantic Segmentation
 306 | Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
 307 | Object-Region Video Transformers
 308 | Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
 309 | SimVP: Simpler Yet Better Video Prediction
 310 | Imposing Consistency for Optical Flow Estimation
 311 | Stand-Alone Inter-Frame Attention in Video Models
 312 | Video Swin Transformer
 313 | Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
 314 | Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
 315 | Likert Scoring With Grade Decoupling for Long-Term Action Assessment
 316 | Complex Video Action Reasoning via Learnable Markov Logic Network
 317 | Learning From Temporal Gradient for Semi-Supervised Action Recognition
 318 | Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction
 319 | Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
 320 | Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos
 321 | Human Hands As Probes for Interactive Object Understanding
 322 | LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition
 323 | Object-Aware Video-Language Pre-Training for Retrieval
 324 | Fast and Unsupervised Action Boundary Detection for Action Segmentation
 325 | Multiview Transformers for Video Recognition
 326 | Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos
 327 | Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
 328 | Comparing Correspondences: Video Prediction With Correspondence-Wise Losses
 329 | Sound-Guided Semantic Image Manipulation
 330 | Expressive Talking Head Generation With Granular Audio-Visual Control
 331 | Depth-Aware Generative Adversarial Network for Talking Head Video Generation
 332 | Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans From a Single Camera
 333 | Audio-Driven Neural Gesture Reenactment With Video Motion Graphs
 334 | Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data
 335 | Weakly Supervised High-Fidelity Clothing Model Generation
 336 | TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates
 337 | Full-Range Virtual Try-On With Recurrent Tri-Level Transform
 338 | Style-Based Global Appearance Flow for Virtual Try-On
 339 | Dressing in the Wild by Watching Dance Videos
 340 | A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres
 341 | Unpaired Cartoon Image Synthesis via Gated Cycle Mapping
 342 | DLFormer: Discrete Latent Transformer for Video Inpainting
 343 | ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
 344 | Video Frame Interpolation With Transformer
 345 | Long-Term Video Frame Interpolation via Feature Propagation
 346 | Many-to-Many Splatting for Efficient Video Frame Interpolation
 347 | Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image
 348 | Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning
 349 | Playable Environments: Video Manipulation in Space and Time
 350 | Event-Based Video Reconstruction via Potential-Assisted Spiking Neural Network
 351 | Modular Action Concept Grounding in Semantic Video Prediction
 352 | Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
 353 | StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2
 354 | Structure-Aware Motion Transfer With Deformable Anchor Model
 355 | Image Animation With Perturbed Masks
 356 | Thin-Plate Spline Motion Model for Image Animation
 357 | Controllable Animation of Fluid Elements in Still Images
 358 | Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects
 359 | Geometric Structure Preserving Warp for Natural Image Stitching
 360 | Few-Shot Incremental Learning for Label-to-Image Translation
 361 | Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network
 362 | SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks
 363 | SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage
 364 | PILC: Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework
 365 | Kubric: A Scalable Dataset Generator
 366 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation
 367 | Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction
 368 | DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation
 369 | MonoGround: Detecting Monocular 3D Objects From the Ground
 370 | 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow
 371 | Toward Practical Monocular Indoor Depth Estimation
 372 | Focal Length and Object Pose Estimation via Render and Compare
 373 | CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
 374 | Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images
 375 | Layered Depth Refinement With Mask Guidance
 376 | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
 377 | BARC: Learning To Regress 3D Dog Shape From Images by Exploiting Breed Information
 378 | Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving
 379 | What’s in Your Hands? 3D Reconstruction of Generic Objects in Hands
 380 | 3D Moments From Near-Duplicate Photos
 381 | Neural Window Fully-Connected CRFs for Monocular Depth Estimation
 382 | PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors
 383 | CroMo: Cross-Modal Learning for Monocular Depth Estimation
 384 | f-SfT: Shape-From-Template With a Physics-Based Deformation Model
 385 | Human-Aware Object Placement for Visual Environment Reconstruction
 386 | AutoRF: Learning 3D Object Radiance Fields From Single View Observations
 387 | Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
 388 | MonoScene: Monocular 3D Semantic Scene Completion
 389 | GenDR: A Generalized Differentiable Renderer
 390 | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer
 391 | ROCA: Robust CAD Model Retrieval and Alignment From a Single Image
 392 | HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
 393 | Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC
 394 | Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning
 395 | Enhancing Face Recognition With Self-Supervised 3D Reconstruction
 396 | Learning To Learn Across Diverse Data Biases in Deep Face Recognition
 397 | An Efficient Training Approach for Very Large Scale Face Recognition
 398 | MogFace: Towards a Deeper Appreciation on Face Detection
 399 | Exploring Frequency Adversarial Attacks for Face Forgery Detection
 400 | End-to-End Reconstruction-Classification Learning for Face Forgery Detection
 401 | Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing
 402 | Privacy-Preserving Online AutoML for Domain-Specific Face Detection
 403 | Simulated Adversarial Testing of Face Recognition Models
 404 | Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing
 405 | Towards Semi-Supervised Deep Facial Expression Recognition With an Adaptive Confidence Margin
 406 | Towards Accurate Facial Landmark Detection via Cascaded Transformers
 407 | PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer
 408 | GazeOnce: Real-Time Multi-Person Gaze Estimation
 409 | Generalizing Gaze Estimation With Rotation Consistency
 410 | Face Relighting With Geometrically Consistent Shadows
 411 | HairMapper: Removing Hair From Portraits Using GANs
 412 | Learning To Restore 3D Face From In-the-Wild Degraded Images
 413 | Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
 414 | Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation
 415 | ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation
 416 | Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement
 417 | Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation
 418 | Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
 419 | Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation
 420 | Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
 421 | Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds
 422 | Novel Class Discovery in Semantic Segmentation
 423 | Pin the Memory: Learning To Generalize Semantic Segmentation
 424 | ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation
 425 | Incremental Learning in Semantic Segmentation From Image Labels
 426 | Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers
 427 | SharpContour: A Contour-Based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation
 428 | Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings
 429 | Mask Transfiner for High-Quality Instance Segmentation
 430 | Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
 431 | Sparse Instance Activation for Real-Time Instance Segmentation
 432 | E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation
 433 | Hyperbolic Image Segmentation
 434 | SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information
 435 | CDGNet: Class Distribution Guided Network for Human Parsing
 436 | CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
 437 | Sparse Non-Local CRF
 438 | Detecting Camouflaged Object in Frequency Domain
 439 | Progressive Minimal Path Method With Embedded CNN
 440 | Open-Set Text Recognition via Character-Context Decoupling
 441 | Neural Collaborative Graph Machines for Table Structure Recognition
 442 | Revisiting Document Image Dewarping by Grid Regularization
 443 | Syntax-Aware Network for Handwritten Mathematical Expression Recognition
 444 | Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection
 445 | Fourier Document Restoration for Robust Document Dewarping and Recognition
 446 | XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding
 447 | SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition
 448 | Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer
 449 | TableFormer: Table Structure Understanding With Transformers
 450 | Knowledge Mining With Scene Text for Fine-Grained Recognition
 451 | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents
 452 | Focal and Global Knowledge Distillation for Detectors
 453 | Speed Up Object Detection on Gigapixel-Level Images With Patch Arrangement
 454 | Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer
 455 | Learning With Neighbor Consistency for Noisy Labels
 456 | Meta Convolutional Neural Networks for Single Domain Generalization
 457 | Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
 458 | Geometry-Aware Guided Loss for Deep Crack Recognition
 459 | Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way
 460 | Dynamic Sparse R-CNN
 461 | Deep Hybrid Models for Out-of-Distribution Detection
 462 | AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification
 463 | Feature Erasing and Diffusion Network for Occluded Person Re-Identification
 464 | Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss
 465 | BoxeR: Box-Attention for 2D and 3D Transformers
 466 | Multi-Label Iterated Learning for Image Classification With Label Ambiguity
 467 | Vision Transformer With Deformable Attention
 468 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
 469 | Dense Learning Based Semi-Supervised Object Detection
 470 | R(Det)2: Randomized Decision Routing for Object Detection
 471 | GlideNet: Global, Local and Intrinsic Based Dense Embedding NETwork for Multi-Category Attributes Prediction
 472 | Self-Supervised Equivariant Learning for Oriented Keypoint Detection
 473 | Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
 474 | Object Localization Under Single Coarse Point Supervision
 475 | Rethinking Visual Geo-Localization for Large-Scale Applications
 476 | Whose Hands Are These? Hand Detection and Hand-Body Association in the Wild
 477 | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification
 478 | Towards Unsupervised Domain Generalization
 479 | ViM: Out-of-Distribution With Virtual-Logit Matching
 480 | Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
 481 | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
 482 | Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
 483 | Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
 484 | Language As Queries for Referring Video Object Segmentation
 485 | End-to-End Referring Video Object Segmentation With Multimodal Transformers
 486 | Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation
 487 | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
 488 | Video-Text Representation Learning via Differentiable Weak Temporal Alignment
 489 | MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions
 490 | Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions
 491 | Measuring Compositional Consistency for Video Question Answering
 492 | SimVQA: Exploring Simulated Environments for Visual Question Answering
 493 | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering
 494 | SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
 495 | MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
 496 | Maintaining Reasoning Consistency in Compositional Visual Question Answering
 497 | MLSLT: Towards Multilingual Sign Language Translation
 498 | A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
 499 | C2SLR: Consistency-Enhanced Continuous Sign Language Recognition
 500 | Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
 501 | Generating Diverse and Natural 3D Human Motions From Text
 502 | Sub-Word Level Lip Reading With Visual Attention
 503 | Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale
 504 | ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
 505 | Cross Modal Retrieval With Querybank Normalisation
 506 | Prompt Distribution Learning
 507 | VALHALLA: Visual Hallucination for Machine Translation
 508 | VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
 509 | Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
 510 | MixFormer: Mixing Features Across Windows and Dimensions
 511 | Recurrent Glimpse-Based Decoder for Detection With Transformer
 512 | Mobile-Former: Bridging MobileNet and Transformer
 513 | Unsupervised Domain Generalization by Learning a Bridge Across Domains
 514 | SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection
 515 | Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection
 516 | PNP: Robust Learning From Noisy Labels by Probabilistic Noise Prediction
 517 | Few-Shot Object Detection With Fully Cross-Transformer
 518 | Task Discrepancy Maximization for Fine-Grained Few-Shot Classification
 519 | Leveraging Self-Supervision for Cross-Domain Crowd Counting
 520 | What To Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions
 521 | AdaMixer: A Fast-Converging Query-Based Object Detector
 522 | Correlation Verification for Image Retrieval
 523 | Real-Time Object Detection for Streaming Perception
 524 | Deep Visual Geo-Localization Benchmark
 525 | RendNet: Unified 2D/3D Recognizer With Latent Space Rendering
 526 | Sparse Fuse Dense: Towards High Quality 3D Detection With Depth Completion
 527 | Focal Sparse Convolutional Networks for 3D Object Detection
 528 | Point-NeRF: Point-Based Neural Radiance Fields
 529 | NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction
 530 | Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction
 531 | Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
 532 | RegNeRF: Regularizing Neural Radiance Fields for View Synthesis From Sparse Inputs
 533 | Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
 534 | Plenoxels: Radiance Fields Without Neural Networks
 535 | Neural 3D Scene Reconstruction With the Manhattan-World Assumption
 536 | Neural 3D Video Synthesis From Multi-View Video
 537 | Learning To Solve Hard Minimal Problems
 538 | Learning a Structured Latent Space for Unsupervised Point Cloud Completion
 539 | Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes
 540 | IRON: Inverse Rendering by Optimizing Neural SDFs and Materials From Photometric Images
 541 | Learning Multi-View Aggregation in the Wild for Large-Scale 3D Semantic Segmentation
 542 | HyperDet3D: Learning a Scene-Conditioned 3D Object Detector
 543 | KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos
 544 | SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video
 545 | Ditto: Building Digital Twins of Articulated Objects From Interaction
 546 | Bijective Mapping Network for Shadow Removal
 547 | Toward Fast, Flexible, and Robust Low-Light Image Enhancement
 548 | Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements
 549 | Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution
 550 | Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution
 551 | SphereSR: 360° Image Super-Resolution With Arbitrary Projection via Continuous Spherical Image Representation
 552 | Learning Trajectory-Aware Transformer for Video Super-Resolution
 553 | Discrete Cosine Transform Network for Guided Depth Map Super-Resolution
 554 | Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations
 555 | ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding
 556 | Restormer: Efficient Transformer for High-Resolution Image Restoration
 557 | Deep Rectangling for Image Stitching: A Learning Baseline
 558 | Parametric Scattering Networks
 559 | Burst Image Restoration and Enhancement
 560 | MAXIM: Multi-Axis MLP for Image Processing
 561 | Event-Aided Direct Sparse Odometry
 562 | CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
 563 | Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection
 564 | Image Dehazing Transformer With Transmission-Aware 3D Position Embedding
 565 | Unsupervised Deraining: Where Contrastive Learning Meets Self-Similarity
 566 | Towards Multi-Domain Single Image Dehazing via Test-Time Training
 567 | Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal
 568 | Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment
 569 | Practical Learned Lossless JPEG Recompression With Multi-Level Cross-Channel Entropy Model in the DCT Domain
 570 | Neural Compression-Based Feature Learning for Video Restoration
 571 | Bi-Directional Object-Context Prioritization Learning for Saliency Ranking
 572 | Pixel Screening Based Intermediate Correction for Blind Deblurring
 573 | URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement
 574 | A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution
 575 | Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction
 576 | Task Decoupled Framework for Reference-Based Super-Resolution
 577 | Learning Semantic Associations for Mirror Detection
 578 | SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches
 579 | Investigating Tradeoffs in Real-World Video Super-Resolution
 580 | BasicVSR++: Improving Video Super-Resolution With Enhanced Propagation and Alignment
 581 | Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
 582 | Joint Global and Local Hierarchical Priors for Learned Image Compression
 583 | Reflash Dropout in Image Super-Resolution
 584 | Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond
 585 | Dreaming To Prune Image Deraining Networks
 586 | LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network
 587 | Exposure Normalization and Compensation for Multiple-Exposure Correction
 588 | Revisiting Temporal Alignment for Video Restoration
 589 | Learning the Degradation Distribution for Blind Image Super-Resolution
 590 | LSVC: A Learning-Based Stereo Video Compression Framework
 591 | Learning Based Multi-Modality Image and Video Compression
 592 | Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World
 593 | Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish
 594 | Stereo Depth From Events Cameras: Concentrate and Focus on the Future
 595 | Volumetric Bundle Adjustment for Online Photorealistic Scene Capture
 596 | Neural Volumetric Object Selection
 597 | HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture
 598 | NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions
 599 | BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion
 600 | Input-Level Inductive Biases for 3D Reconstruction
 601 | Multi-View Mesh Reconstruction With Neural Deferred Shading
 602 | StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
 603 | RGB-Depth Fusion GAN for Indoor Depth Completion
 604 | PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos
 605 | Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
 606 | ShapeFormer: Transformer-Based Shape Completion via Sparse Representation
 607 | GuideFormer: Transformers for Image Guided Depth Completion
 608 | Improving Neural Implicit Surfaces Geometry With Patch Warping
 609 | Critical Regularizations for Neural Surface Reconstruction in the Wild
 610 | Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction
 611 | Neural RGB-D Surface Reconstruction
 612 | POCO: Point Convolution for Surface Reconstruction
 613 | Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors
 614 | Surface Reconstruction From Point Clouds by Learning Predictive Context Priors
 615 | IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment
 616 | Deterministic Point Cloud Registration via Novel Transformation Decomposition
 617 | Global-Aware Registration of Less-Overlap RGB-D Scans
 618 | Finding Good Configurations of Planar Primitives in Unorganized Point Clouds
 619 | Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels
 620 | AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
 621 | WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
 622 | Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild
 623 | Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture
 624 | MotionAug: Augmentation With Physical Correction for Human Motion Prediction
 625 | Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction
 626 | Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction
 627 | Motron: Multimodal Probabilistic Human Motion Forecasting
 628 | Human Trajectory Prediction With Momentary Observation
 629 | Non-Probability Sampling Network for Stochastic Human Trajectory Prediction
 630 | Remember Intentions: Retrospective-Memory-Based Trajectory Prediction
 631 | GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction With Relational Reasoning
 632 | Learning Pixel Trajectories With Multiscale Contrastive Random Walks
 633 | Adaptive Trajectory Prediction via Transferable GNN
 634 | Neural Prior for Trajectory Estimation
 635 | M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction
 636 | How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting
 637 | ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework
 638 | Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction
 639 | Convolutions for Spatial Interaction Modeling
 640 | Style-ERD: Responsive and Coherent Online Motion Style Transfer
 641 | Neural Inertial Localization
 642 | RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry
 643 | CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism
 644 | ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses
 645 | Projective Manifold Gradient Layer for Deep Rotation Regression
 646 | Multimodal Colored Point Cloud to Image Alignment
 647 | Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering
 648 | REGTR: End-to-End Point Cloud Correspondences With Transformers
 649 | Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
 650 | BCOT: A Markerless High-Precision 3D Object Tracking Benchmark
 651 | SAR-Net: Shape Alignment and Recovery Network for Category-Level 6D Object Pose and Size Estimation
 652 | ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework
 653 | Coupled Iterative Refinement for 6D Multi-Object Pose Estimation
 654 | ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation
 655 | SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings
 656 | MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision
 657 | Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions
 658 | GPV-Pose: Category-Level Object Pose Estimation via Geometry-Guided Point-Wise Voting
 659 | HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR
 660 | OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation
 661 | FS6D: Few-Shot 6D Pose Estimation of Novel Objects
 662 | OnePose: One-Shot Object Pose Estimation Without CAD Models
 663 | OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
 664 | DiffPoseNet: Direct Differentiable Camera Pose Estimation
 665 | Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects
 666 | CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild
 667 | Leveraging Equivariant Features for Absolute Pose Regression
 668 | The Majority Can Help the Minority: Context-Rich Minority Oversampling for Long-Tailed Classification
 669 | Long-Tailed Recognition via Weight Balancing
 670 | Balanced Contrastive Learning for Long-Tailed Visual Recognition
 671 | Targeted Supervised Contrastive Learning for Long-Tailed Recognition
 672 | Long-Tailed Visual Recognition via Gaussian Clouded Logit Adjustment
 673 | Long-Tail Recognition via Compositional Knowledge Transfer
 674 | Nested Collaborative Learning for Long-Tailed Visual Recognition
 675 | Retrieval Augmented Classification for Long-Tail Visual Recognition
 676 | Trustworthy Long-Tailed Classification
 677 | C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection
 678 | Equalized Focal Loss for Dense Long-Tailed Object Detection
 679 | Relieving Long-Tailed Instance Segmentation via Pairwise Class Balance
 680 | iFS-RCNN: An Incremental Few-Shot Instance Segmenter
 681 | Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
 682 | SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation
 683 | Undoing the Damage of Label Shift for Cross-Domain Semantic Segmentation
 684 | Representation Compensation Networks for Continual Semantic Segmentation
 685 | Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer
 686 | Domain-Agnostic Prior for Transfer Semantic Segmentation
 687 | Image Segmentation Using Text and Image Prompts
 688 | PCL: Proxy-Based Contrastive Learning for Domain Generalization
 689 | Localized Adversarial Domain Generalization
 690 | Compound Domain Generalization via Meta-Knowledge Encoding
 691 | Style Neophile: Constantly Seeking Novel Styles for Domain Generalization
 692 | Slimmable Domain Adaptation
 693 | Exploring Domain-Invariant Parameters for Source Free Domain Adaptation
 694 | Cross-Domain Few-Shot Learning With Task-Specific Adapters
 695 | Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition
 696 | Reusing the Task-Specific Classifier as a Discriminator: Discriminator-Free Adversarial Domain Adaptation
 697 | Safe Self-Refinement for Transformer-Based Domain Adaptation
 698 | Continual Test-Time Domain Adaptation
 699 | Source-Free Domain Adaptation via Distribution Estimation
 700 | Domain Adaptation on Point Clouds via Geometry-Aware Implicits
 701 | Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
 702 | Hyperspherical Consistency Regularization
 703 | BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning
 704 | Cascade Transformers for End-to-End Person Search
 705 | Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts
 706 | MPViT: Multi-Path Vision Transformer for Dense Prediction
 707 | NFormer: Robust Person Re-Identification With Neighbor Transformer
 708 | Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification
 709 | Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification
 710 | Augmented Geometric Distillation for Data-Free Incremental Person ReID
 711 | Salient-to-Broad Transition for Video Person Re-Identification
 712 | FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification
 713 | Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification
 714 | Implicit Sample Extension for Unsupervised Person Re-Identification
 715 | Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection
 716 | Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection
 717 | Fine-Grained Object Classification via Self-Supervised Pose Alignment
 718 | Hyperbolic Vision Transformers: Combining Improvements in Metric Learning
 719 | Non-Isotropy Regularization for Proxy-Based Deep Metric Learning
 720 | Self-Taught Metric Learning Without Labels
 721 | Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency
 722 | Energy-Based Latent Aligner for Incremental Learning
 723 | Sketch3T: Test-Time Training for Zero-Shot SBIR
 724 | The Devil Is in the Pose: Ambiguity-Free 3D Rotation-Invariant Learning via Pose-Aware Convolution
 725 | Finding Badly Drawn Bunnies
 726 | Generalized Category Discovery
 727 | Recall@k Surrogate Loss With Large Batches and Similarity Mixup
 728 | Modeling 3D Layout for Group Re-Identification
 729 | Causal Transportability for Visual Recognition
 730 | Attributable Visual Similarity Learning
 731 | Bi-Level Alignment for Cross-Domain Crowd Counting
 732 | Mutual Quantization for Cross-Modal Search With Noisy Labels
 733 | Task Adaptive Parameter Sharing for Multi-Task Learning
 734 | Simple Multi-Dataset Detection
 735 | Cross-Domain Adaptive Teacher for Object Detection
 736 | Balanced and Hierarchical Relation Learning for One-Shot Object Detection
 737 | Semantic-Aligned Fusion Transformer for One-Shot Object Detection
 738 | MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
 739 | Robust Region Feature Synthesizer for Zero-Shot Object Detection
 740 | Region-Aware Face Swapping
 741 | High-Resolution Face Swapping via Latent Semantics Disentanglement
 742 | Rethinking Deep Face Restoration
 743 | Blind Face Restoration via Integrating Face Shape and Generative Priors
 744 | FENeRF: Face Editing in Neural Radiance Fields
 745 | TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
 746 | Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
 747 | Self-Supervised Correlation Mining Network for Person Image Generation
 748 | Exploring Dual-Task Correlation for Pose Guided Person Image Generation
 749 | InsetGAN for Full-Body Image Generation
 750 | BodyGAN: General-Purpose Controllable Neural Human Body Generation
 751 | HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs
 752 | Structure-Aware Flow Generation for Human Body Reshaping
 753 | Modeling Image Composition for Complex Scene Generation
 754 | Local Attention Pyramid for Scene Image Generation
 755 | Interactive Image Synthesis With Panoptic Layout Generation
 756 | iPLAN: Interactive and Procedural Layout Planning
 757 | E-CIR: Event-Enhanced Continuous Intensity Recovery
 758 | Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion
 759 | Neural Rays for Occlusion-Aware Image-Based Rendering
 760 | Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation
 761 | PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models
 762 | Commonality in Natural Images Rescues GANs: Pretraining GANs With Generic and Privacy-Free Synthetic Data
 763 | Think Twice Before Detecting GAN-Generated Fake Images From Their Spectral Domain Imprints
 764 | Robust Invertible Image Steganography
 765 | Distinguishing Unseen From Seen for Generalized Zero-Shot Learning
 766 | Few-Shot Font Generation by Learning Fine-Grained Local Styles
 767 | XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation
 768 | Learning To Generate Line Drawings That Convey Geometry and Semantics
 769 | Balanced MSE for Imbalanced Visual Regression
 770 | Transferability Metrics for Selecting Source Model Ensembles
 771 | OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization
 772 | Robust Fine-Tuning of Zero-Shot Models
 773 | Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
 774 | Learning To Learn and Remember Super Long Multi-Domain Task Sequence
 775 | Learning Distinctive Margin Toward Active Domain Adaptation
 776 | DINE: Domain Adaptation From Single and Multiple Black-Box Predictors
 777 | Source-Free Object Detection by Learning To Overlook Domain Style
 778 | Towards Principled Disentanglement for Domain Generalization
 779 | Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
 780 | Causality Inspired Representation Learning for Domain Generalization
 781 | Learning What Not To Segment: A New Perspective on Few-Shot Segmentation
 782 | Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
 783 | ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation
 784 | MeMOT: Multi-Object Tracking With Memory
 785 | Unsupervised Learning of Accurate Siamese Tracking
 786 | Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
 787 | GMFlow: Learning Optical Flow via Global Matching
 788 | GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking
 789 | SNUG: Self-Supervised Neural Dynamic Garments
 790 | Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction
 791 | Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation
 792 | Context-Aware Sequence Alignment Using 4D Skeletal Augmentation
 793 | Enabling Equivariance for Arbitrary Lie Groups
 794 | RAMA: A Rapid Multicut Algorithm on GPU
 795 | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
 796 | RCP: Recurrent Closest Point for Point Cloud
 797 | Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
 798 | Balanced Multimodal Learning via On-the-Fly Gradient Modulation
 799 | Block-NeRF: Scalable Large Scene Neural View Synthesis
 800 | SceneSqueezer: Learning To Compress Scene for Camera Relocalization
 801 | Light Field Neural Rendering
 802 | Extracting Triangular 3D Models, Materials, and Lighting From Images
 803 | Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3)
 804 | Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
 805 | It’s All in the Teacher: Zero-Shot Quantization Brought Closer to the Teacher
 806 | NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
 807 | Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention
 808 | Parameter-Free Online Test-Time Adaptation
 809 | Patch-Level Representation Learning for Self-Supervised Vision Transformers
 810 | Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
 811 | Mixed Differential Privacy in Computer Vision
 812 | DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis
 813 | Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
 814 | AirObject: A Temporally Evolving Graph Embedding for Object Identification
 815 | Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds
 816 | SS3D: Sparsely-Supervised 3D Object Detection From Point Cloud
 817 | Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement
 818 | VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
 819 | Embracing Single Stride 3D Object Detector With Sparse Transformer
 820 | Point Density-Aware Voxels for LiDAR 3D Object Detection
 821 | Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
 822 | Contrastive Boundary Learning for Point Cloud Segmentation
 823 | Stratified Transformer for 3D Point Cloud Segmentation
 824 | No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models by Fitting Feature-Level Space-Time Surfaces
 825 | Point2Seq: Detecting 3D Objects As Sequences
 826 | PTTR: Relational 3D Point Cloud Object Tracking With Transformer
 827 | A Unified Query-Based Paradigm for Point Cloud Understanding
 828 | PointCLIP: Point Cloud Understanding by CLIP
 829 | X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning
 830 | MVS2D: Efficient Multi-View Stereo via Attention-Driven 2D Convolutions
 831 | TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers
 832 | RayMVSNet: Learning Ray-Based 1D Implicit Fields for Accurate Multi-View Stereo
 833 | IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
 834 | PSMNet: Position-Aware Stereo Merging Network for Room Layout Estimation
 835 | Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo
 836 | Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering
 837 | Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
 838 | Efficient Multi-View Stereo by Iterative Dynamic Cost Volume
 839 | PlaneMVS: 3D Plane Reconstruction From Multi-View Stereo
 840 | Discrete Time Convolution for Fast Event-Based Stereo
 841 | Stereo Magnification With Multi-Layer Images
 842 | TransforMatcher: Match-to-Match Attention for Semantic Correspondence
 843 | Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences
 844 | Locality-Aware Inter– and Intra-Video Reconstruction for Self-Supervised Correspondence Learning
 845 | Transforming Model Prediction for Tracking
 846 | Ranking-Based Siamese Visual Tracking
 847 | Correlation-Aware Deep Tracking
 848 | Global Tracking via Ensemble of Local Trackers
 849 | Global Tracking Transformers
 850 | Unified Transformer Tracker for Object Tracking
 851 | Transformer Tracking With Cyclic Shifting Window Attention
 852 | Spiking Transformers for Event-Based Single Object Tracking
 853 | Adiabatic Quantum Computing for Multi Object Tracking
 854 | HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
 855 | Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking
 856 | TrackFormer: Multi-Object Tracking With Transformers
 857 | Learning of Global Objective for Network Flow in Multi-Object Tracking
 858 | LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking
 859 | Multi-Object Tracking Meets Moving UAV
 860 | Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
 861 | Unsupervised Domain Adaptation for Nighttime Aerial Tracking
 862 | Learning Optical Flow With Kernel Patch Attention
 863 | Towards Understanding Adversarial Robustness of Optical Flow Networks
 864 | DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow
 865 | On the Instability of Relative Pose Estimation and RANSAC’s Role
 866 | Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training
 867 | Global Sensing and Measurements Reuse for Image Compressed Sensing
 868 | Maximum Consensus by Weighted Influences of Monotone Boolean Functions
 869 | MS2DG-Net: Progressive Correspondence Learning via Multiple Sparse Semantics Dynamic Graph
 870 | Styleformer: Transformer Based Generative Adversarial Networks With Style Vector
 871 | Scanline Homographies for Rolling-Shutter Plane Absolute Pose
 872 | Generating Representative Samples for Few-Shot Classification
 873 | Matching Feature Sets for Few-Shot Image Classification
 874 | Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations
 875 | Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection
 876 | Forward Compatible Few-Shot Class-Incremental Learning
 877 | Constrained Few-Shot Class-Incremental Learning
 878 | Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference
 879 | EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning
 880 | Few-Shot Learning With Noisy Labels
 881 | Ranking Distance Calibration for Cross-Domain Few-Shot Learning
 882 | Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
 883 | Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning
 884 | Learning To Memorize Feature Hallucination for One-Shot Image Generation
 885 | A Closer Look at Few-Shot Image Generation
 886 | Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition
 887 | Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability
 888 | Transferability Estimation Using Bhattacharyya Class Separability
 889 | Revisiting the Transferability of Supervised Pretraining: An MLP Perspective
 890 | Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data
 891 | Which Model To Transfer? Finding the Needle in the Growing Haystack
 892 | Does Robustness on ImageNet Transfer to Downstream Tasks?
 893 | What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors
 894 | OW-DETR: Open-World Detection Transformer
 895 | Unseen Classes at a Later Time? No Problem
 896 | Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
 897 | On Generalizing Beyond Domains in Cross-Domain Continual Learning
 898 | Online Continual Learning on a Contaminated Data Stream With Blurry Task Boundaries
 899 | DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion
 900 | Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning
 901 | En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning
 902 | VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning
 903 | Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning
 904 | KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning
 905 | Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis
 906 | WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery
 907 | Omni-DETR: Omni-Supervised Object Detection With Transformers
 908 | DESTR: Object Detection With Split Transformer
 909 | A Dual Weighting Label Assignment Scheme for Object Detection
 910 | Entropy-Based Active Learning for Object Detection With Progressive Diversity Constraint
 911 | Localization Distillation for Dense Object Detection
 912 | Group R-CNN for Weakly Semi-Supervised Object Detection With Points
 913 | Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation
 914 | CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping
 915 | One Loss for Quantization: Deep Hashing With Discrete Wasserstein Distributional Matching
 916 | PSTR: End-to-End One-Step Person Search With Transformers
 917 | Protecting Celebrities From DeepFake With Identity Consistency Transformer
 918 | MDAN: Multi-Level Dependent Attention Network for Visual Emotion Analysis
 919 | Contextual Similarity Distillation for Asymmetric Image Retrieval
 920 | Improving Visual Grounding With Visual-Linguistic Verification and Iterative Reasoning
 921 | MPC: Multi-View Probabilistic Clustering
 922 | Text Spotting Transformers
 923 | Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting
 924 | Reflection and Rotation Symmetry Detection via Equivariant Learning
 925 | Learning To Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data
 926 | A Simple Episodic Linear Probe Improves Visual Recognition in the Wild
 927 | Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
 928 | Multi-Granularity Alignment Domain Adaptation for Object Detection
 929 | Expanding Low-Density Latent Regions for Open-Set Object Detection
 930 | Class-Incremental Learning With Strong Pre-Trained Models
 931 | ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
 932 | Self-Supervised Models Are Continual Learners
 933 | The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization
 934 | Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning
 935 | SimMIM: A Simple Framework for Masked Image Modeling
 936 | Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning
 937 | UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning
 938 | Contrastive Conditional Neural Processes
 939 | One-Bit Active Query With Contrastive Pairs
 940 | HCSC: Hierarchical Contrastive Selective Coding
 941 | Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging
 942 | Hierarchical Self-Supervised Representation Learning for Movie Understanding
 943 | Anomaly Detection via Reverse Distillation From One-Class Embedding
 944 | Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning
 945 | DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning
 946 | Learning To Collaborate in Decentralized Learning of Personalized Models
 947 | Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph
 948 | DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning
 949 | Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning
 950 | Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes
 951 | Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors
 952 | Spectral Unsupervised Domain Adaptation for Visual Recognition
 953 | DATA: Domain-Aware and Task-Aware Self-Supervised Learning
 954 | Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning
 955 | DeepDPM: Deep Clustering With an Unknown Number of Clusters
 956 | PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions
 957 | Robust Outlier Detection by De-Biasing VAE Likelihoods
 958 | Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data
 959 | CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
 960 | Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation
 961 | DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
 962 | WildNet: Learning Domain Generalized Semantic Segmentation From the Wild
 963 | UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation
 964 | Semi-Supervised Semantic Segmentation With Error Localization Network
 965 | Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation
 966 | Integrative Few-Shot Learning for Classification and Segmentation
 967 | GanOrCon: Are Generative Models Useful for Few-Shot Segmentation?
 968 | SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis
 969 | CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs
 970 | GradViT: Gradient Inversion of Vision Transformers
 971 | Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings
 972 | CD2-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning
 973 | APRIL: Finding the Achilles’ Heel on Privacy for Vision Transformers
 974 | Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
 975 | Robust Federated Learning With Noisy and Heterogeneous Clients
 976 | Federated Learning With Position-Aware Neurons
 977 | Layer-Wised Model Aggregation for Personalized Federated Learning
 978 | FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning
 979 | FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction
 980 | Differentially Private Federated Learning With Local Regularization and Sparsification
 981 | Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage
 982 | Learn From Others and Be Yourself in Heterogeneous Federated Learning
 983 | RSCFed: Random Sampling Consensus Federated Semi-Supervised Learning
 984 | Federated Class-Incremental Learning
 985 | Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning
 986 | FedCorr: Multi-Stage Federated Learning for Label Noise Correction
 987 | ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning
 988 | Cycle-Consistent Counterfactuals by Latent Transformations
 989 | Consistent Explanations by Contrastive Learning
 990 | Towards Better Understanding Attribution Methods
 991 | Proto2Proto: Can You Recognize the Car, the Way I Do?
 992 | Do Explanations Explain? Model Knows Best
 993 | HINT: Hierarchical Neuron Concept Explainer
 994 | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
 995 | What Do Navigation Agents Learn About Their Environment?
 996 | A Framework for Learning Ante-Hoc Explainable Models via Concepts
 997 | Exploiting Explainable Metrics for Augmented SGD
 998 | FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks
 999 | Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations
1000 | B-Cos Networks: Alignment Is All We Need for Interpretability
1001 | The Flag Median and FlagIRLS
1002 | Learning Fair Classifiers With Partially Annotated Group Labels
1003 | Estimating Structural Disparities for Face Models
1004 | Estimating Example Difficulty Using Variance of Gradients
1005 | Fairness-Aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models
1006 | Fair Contrastive Learning for Facial Attribute Classification
1007 | Leveraging Adversarial Examples To Quantify Membership Information Leakage
1008 | Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers
1009 | Deep Unlearning via Randomized Conditionally Independent Hessians
1010 | Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets
1011 | A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models
1012 | Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices?
1013 | Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
1014 | SEEG: Semantic Energized Co-Speech Gesture Generation
1015 | Mix and Localize: Localizing Sound Sources in Mixtures
1016 | Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation
1017 | IntentVizor: Towards Generic Query Guided Interactive Video Summarization
1018 | M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers
1019 | Finding Fallen Objects via Asynchronous Audio-Visual Integration
1020 | Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
1021 | Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
1022 | Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language
1023 | It’s Time for Artistic Correspondence in Music and Video
1024 | Self-Supervised Object Detection From Audio-Visual Correspondence
1025 | More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
1026 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
1027 | A Probabilistic Graphical Model Based on Neural-Symbolic Reasoning for Visual Relationship Detection
1028 | Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
1029 | Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps
1030 | Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values
1031 | Ensembling Off-the-Shelf Models for GAN Training
1032 | Marginal Contrastive Correspondence for Guided Image Generation
1033 | GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation
1034 | High-Resolution Image Synthesis With Latent Diffusion Models
1035 | Vector Quantized Diffusion Model for Text-to-Image Synthesis
1036 | ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation
1037 | Dataset Distillation by Matching Training Trajectories
1038 | Continual Predictive Learning From Videos
1039 | Motion-Adjustable Neural Implicit Video Representation
1040 | Splicing ViT Features for Semantic Appearance Transfer
1041 | MAT: Mask-Aware Transformer for Large Hole Image Inpainting
1042 | Day-to-Night Image Synthesis for Training Nighttime Neural ISPs
1043 | Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness
1044 | Few-Shot Head Swapping in the Wild
1045 | ClothFormer: Taming Video Virtual Try-On in All Module
1046 | A-ViT: Adaptive Tokens for Efficient Vision Transformer
1047 | MetaFormer Is Actually What You Need for Vision
1048 | Reversible Vision Transformers
1049 | Learned Queries for Efficient Local Attention
1050 | Shunted Self-Attention via Multi-Scale Token Aggregation
1051 | Automatic Relation-Aware Graph Network Proliferation
1052 | β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
1053 | Distribution Consistent Neural Architecture Search
1054 | Training-Free Transformer Architecture Search
1055 | TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
1056 | Knowledge Distillation via the Target-Aware Transformer
1057 | Knowledge Distillation: A Good Teacher Is Patient and Consistent
1058 | An Image Patch Is a Wave: Phase-Aware Vision MLP
1059 | Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information
1060 | Controllable Dynamic Multi-Task Architectures
1061 | Grounded Language-Image Pre-Training
1062 | ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds
1063 | CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings
1064 | Adversarial Parametric Pose Prior
1065 | Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation
1066 | PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision
1067 | Generalizable Human Pose Triangulation
1068 | GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras
1069 | Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory
1070 | Contextual Instance Decoupling for Robust Multi-Person Pose Estimation
1071 | End-to-End Multi-Person Pose Estimation With Transformers
1072 | Meta Agent Teaming Active Learning for Pose Estimation
1073 | Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
1074 | Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer
1075 | Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture
1076 | LASER: LAtent SpacE Rendering for 2D Visual Localization
1077 | Learning To Detect Scene Landmarks for Camera Localization
1078 | Geometric Transformer for Fast and Robust Point Cloud Registration
1079 | ARCS: Accurate Rotation and Correspondence Search
1080 | FisherMatch: Semi-Supervised Rotation Regression via Entropy-Based Filtering
1081 | Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation
1082 | OSSGAN: Open-Set Semi-Supervised Image Generation
1083 | Attribute Group Editing for Reliable Few-Shot Image Generation
1084 | Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
1085 | Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis
1086 | Retrieval-Based Spatially Adaptive Normalization for Semantic Image Synthesis
1087 | Generative Flows With Invertible Attentions
1088 | Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization
1089 | SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
1090 | Manifold Learning Benefits GANs
1091 | DO-GAN: A Double Oracle Framework for Generative Adversarial Networks
1092 | Improving GAN Equilibrium by Raising Spatial Awareness
1093 | Feature Statistics Mixing Regularization for Generative Adversarial Networks
1094 | StyleSwin: Transformer-Based GAN for High-Resolution Image Generation
1095 | MaskGIT: Masked Generative Image Transformer
1096 | StyTr2: Image Style Transfer With Transformers
1097 | Style Transformer for Image Inversion and Editing
1098 | Reduce Information Loss in Transformers for Pluralistic Image Inpainting
1099 | Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding
1100 | UniCoRN: A Unified Conditional Image Repainting Network
1101 | High-Fidelity GAN Inversion for Image Attribute Editing
1102 | HyperInverter: Improving StyleGAN Inversion via Hypernetwork
1103 | Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
1104 | On Aliased Resizing and Surprising Subtleties in GAN Evaluation
1105 | Dual-Path Image Inpainting With Auxiliary GAN Inversion
1106 | InOut: Diverse Image Outpainting via GAN Inversion
1107 | Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
1108 | Contextual Outpainting With Object-Level Contrastive Learning
1109 | RePaint: Inpainting Using Denoising Diffusion Probabilistic Models
1110 | Perception Prioritized Training of Diffusion Models
1111 | Dynamic Dual-Output Diffusion Models
1112 | Generating High Fidelity Data From Low-Density Regions Using Diffusion Models
1113 | Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation
1114 | Bridging Global Context Interactions for High-Fidelity Image Completion
1115 | Autoregressive Image Generation Using Residual Quantization
1116 | Arbitrary-Scale Image Synthesis
1117 | Cluster-Guided Image Synthesis With Unconditional Models
1118 | Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
1119 | Generalized Few-Shot Semantic Segmentation
1120 | Learning Non-Target Knowledge for Few-Shot Semantic Segmentation
1121 | Decoupling Zero-Shot Semantic Segmentation
1122 | Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
1123 | ContrastMask: Contrastive Learning To Segment Every Thing
1124 | The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference
1125 | AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation
1126 | APES: Articulated Part Extraction From Sprite Sheets
1127 | GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation
1128 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision
1129 | Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images
1130 | C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
1131 | CRIS: CLIP-Driven Referring Image Segmentation
1132 | MatteFormer: Transformer-Based Image Matting via Prior-Tokens
1133 | Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation
1134 | Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
1135 | Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection
1136 | Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation
1137 | GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings
1138 | Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport
1139 | CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly
1140 | RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures
1141 | Discovering Objects That Can Move
1142 | PatchFormer: An Efficient Point Transformer With Patch Attention
1143 | Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap
1144 | SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
1145 | An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation
1146 | Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation
1147 | Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders
1148 | Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training?
1149 | BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule
1150 | Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
1151 | Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search
1152 | GreedyNASv2: Greedier Search With a Greedy Path Filter
1153 | Neural Architecture Search With Representation Mutual Information
1154 | Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search
1155 | Knowledge Distillation With the Reused Teacher Classifier
1156 | Self-Distillation From the Last Mini-Batch for Consistency Regularization
1157 | Decoupled Knowledge Distillation
1158 | Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
1159 | A ConvNet for the 2020s
1160 | Beyond Fixation: Dynamic Window Visual Transformer
1161 | Lite Vision Transformer With Enhanced Self-Attention
1162 | Swin Transformer V2: Scaling Up Capacity and Resolution
1163 | The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
1164 | MulT: An End-to-End Multitask Learning Transformer
1165 | Towards Robust Vision Transformer
1166 | DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
1167 | MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
1168 | NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
1169 | TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
1170 | Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
1171 | Scaling Vision Transformers
1172 | Bridged Transformer for Vision and Point Cloud 3D Object Detection
1173 | CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows
1174 | TransMix: Attend To Mix for Vision Transformers
1175 | MiniViT: Compressing Vision Transformers With Weight Multiplexing
1176 | Fine-Tuning Image Transformers Using Learnable Memory
1177 | Patch Slimming for Efficient Vision Transformers
1178 | CMT: Convolutional Neural Networks Meet Vision Transformers
1179 | Multimodal Token Fusion for Vision Transformers
1180 | CAFE: Learning To Condense Dataset by Aligning Features
1181 | Lite-MDETR: A Lightweight Multi-Modal Detector
1182 | DeeCap: Dynamic Early Exiting for Efficient Image Captioning
1183 | Searching the Deployable Convolution Neural Networks for GPUs
1184 | Active Learning by Feature Mixing
1185 | When To Prune? A Policy Towards Early Structural Pruning
1186 | Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning
1187 | How Well Do Sparse ImageNet Models Transfer?
1188 | Rep-Net: Efficient On-Device Learning via Feature Reprogramming
1189 | CHEX: CHannel EXploration for CNN Model Compression
1190 | HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks
1191 | AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
1192 | Cross-Image Relational Knowledge Distillation for Semantic Segmentation
1193 | Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error
1194 | IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization
1195 | DECORE: Deep Compression With Reinforcement Learning
1196 | Towards Efficient and Scalable Sharpness-Aware Minimization
1197 | AEGNN: Asynchronous Event-Based Graph Neural Networks
1198 | DiSparse: Disentangled Sparsification for Multitask Model Compression
1199 | Multi-Modal Extreme Classification
1200 | A Sampling-Based Approach for Efficient Clustering in Large Datasets
1201 | Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction
1202 | Learnable Lookup Table for Neural Network Quantization
1203 | Instance-Aware Dynamic Neural Network Quantization
1204 | Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation
1205 | Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction
1206 | Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
1207 | PokeBNN: A Binary Pursuit of Lightweight Accuracy
1208 | Automated Progressive Learning for Efficient Training of Vision Transformers
1209 | DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos
1210 | Channel Balancing for Accurate Quantization of Winograd Convolutions
1211 | ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching
1212 | Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs
1213 | AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation
1214 | TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing
1215 | SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems
1216 | TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed
1217 | DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation
1218 | Universal Photometric Stereo Network Using Global Lighting Contexts
1219 | Uncertainty-Aware Deep Multi-View Photometric Stereo
1220 | Fast Light-Weight Near-Field Photometric Stereo
1221 | Glass Segmentation Using Intensity and Spectral Polarization Cues
1222 | Shape From Polarization for Complex Scenes in the Wild
1223 | Deep Depth From Focus With Differential Focus Volume
1224 | Optimal LED Spectral Multiplexing for NIR2RGB Translation
1225 | Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements
1226 | NAN: Noise-Aware NeRFs for Burst-Denoising
1227 | Estimating Fine-Grained Noise Model via Contrastive Learning
1228 | Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders
1229 | MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution
1230 | PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images
1231 | Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors
1232 | Learning To Anticipate Future With Dynamic Context Removal
1233 | Self-Supervised Spatial Reasoning on Multi-View Line Drawings
1234 | Contextual Debiasing for Visual Recognition With Causal Mechanisms
1235 | Relative Pose From a Calibrated and an Uncalibrated Smartphone Image
1236 | Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation
1237 | NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
1238 | NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning
1239 | ScaleNet: A Shallow Architecture for Scale Estimation
1240 | Camera Pose Estimation Using Implicit Distortion Models
1241 | GIFS: Neural Implicit Function for General Shape Representation
1242 | Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds
1243 | SPAMs: Structured Implicit Parametric Models
1244 | Deblur-NeRF: Neural Radiance Fields From Blurry Images
1245 | Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
1246 | Depth-Supervised NeRF: Fewer Views and Faster Training for Free
1247 | Dense Depth Priors for Neural Radiance Fields From Sparse Input Views
1248 | EfficientNeRF - Efficient Neural Radiance Fields
1249 | InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering
1250 | Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
1251 | Urban Radiance Fields
1252 | Hallucinated Neural Radiance Fields in the Wild
1253 | Towards Multimodal Depth Estimation From Light Fields
1254 | Degradation-Agnostic Correspondence From Resolution-Asymmetric Stereo
1255 | Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching
1256 | Attention Concatenation Volume for Accurate and Efficient Stereo Matching
1257 | Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
1258 | Revisiting Domain Generalized Stereo Matching Networks From a Feature Consistency Perspective
1259 | GraftNet: Towards Domain Generalized Stereo Matching With a Broad-Spectrum and Task-Oriented Feature
1260 | ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
1261 | ActiveZero: Mixed Domain Learning for Active Stereovision With Zero Annotation
1262 | FoggyStereo: Stereo Matching With Fog Volume Representation
1263 | Multi-Person Extreme Motion Prediction
1264 | Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
1265 | AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation
1266 | Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation
1267 | Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
1268 | Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video
1269 | Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization
1270 | Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
1271 | Location-Free Human Pose Estimation
1272 | MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
1273 | Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision
1274 | Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors
1275 | PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound
1276 | Differentiable Dynamics for Articulated 3D Human Motion Reconstruction
1277 | COAP: Compositional Articulated Occupancy of People
1278 | Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video
1279 | SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration
1280 | MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
1281 | Putting People in Their Place: Monocular Regression of 3D People in Depth
1282 | FLAG: Flow-Based 3D Avatar Generation From Sparse Observations
1283 | GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
1284 | Capturing and Inferring Dense Full-Body Human-Scene Contact
1285 | BodyMap: Learning Full-Body Dense Correspondence Map
1286 | ICON: Implicit Clothed Humans Obtained From Normals
1287 | Adversarial Texture for Fooling Person Detectors in the Physical World
1288 | Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World
1289 | Enhancing Classifier Conservativeness and Robustness by Polynomiality
1290 | Backdoor Attacks on Self-Supervised Learning
1291 | Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
1292 | Few-Shot Backdoor Defense Using Shapley Estimation
1293 | Better Trigger Inversion Optimization in Backdoor Scanning
1294 | Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees
1295 | Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching
1296 | LAS-AT: Adversarial Training With Learnable Attack Strategy
1297 | Subspace Adversarial Training
1298 | Pyramid Adversarial Training Improves ViT Performance
1299 | Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations
1300 | Robust Image Forgery Detection Over Online Social Network Shared Images
1301 | Quantifying Societal Bias Amplification in Image Captioning
1302 | Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models
1303 | GAN-Supervised Dense Visual Alignment
1304 | Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator
1305 | Text2Mesh: Text-Driven Neural Stylization for Meshes
1306 | StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation
1307 | Physical Simulation Layer for Accurate 3D Modeling
1308 | Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time
1309 | Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
1310 | I M Avatar: Implicit Morphable Head Avatars From Videos
1311 | E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations
1312 | RCL: Recurrent Continuous Localization for Temporal Action Detection
1313 | Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
1314 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
1315 | TubeR: Tubelet Transformer for Video Action Detection
1316 | MixFormer: End-to-End Tracking With Iterative Mixed Attention
1317 | DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
1318 | Proper Reuse of Image Classification Features Improves Object Detection
1319 | Boosting 3D Object Detection by Simulating Multimodality on Point Clouds
1320 | TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation
1321 | Disentangling Visual Embeddings for Attributes and Objects
1322 | QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
1323 | Unknown-Aware Object Detection: Learning What You Don’t Know From Videos in the Wild
1324 | Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks
1325 | Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective
1326 | Calibrating Deep Neural Networks by Pairwise Constraints
1327 | Lifelong Graph Learning
1328 | OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
1329 | Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation
1330 | Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches
1331 | Cross-View Transformers for Real-Time Map-View Semantic Segmentation
1332 | UnweaveNet: Unweaving Activity Stories
1333 | Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
1334 | Audio-Adaptive Activity Recognition Across Video Domains
1335 | Frame-Wise Action Representations for Long Videos via Sequence Contrastive Learning
1336 | Image Based Reconstruction of Liquids From 2D Surface Detections
1337 | Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency
1338 | How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs
1339 | Programmatic Concept Learning for Human Motion Description and Synthesis
1340 | Learning To Recognize Procedural Activities With Distant Supervision
1341 | Implicit Motion Handling for Video Camouflaged Object Detection
1342 | Dynamic Scene Graph Generation via Anticipatory Pre-Training
1343 | Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization
1344 | OCSampler: Compressing Videos to One Clip With Single-Step Sampling
1345 | A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting
1346 | TubeFormer-DeepLab: Video Mask Transformer
1347 | ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization
1348 | A Graph Matching Perspective With Transformers on Video Instance Segmentation
1349 | STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction
1350 | Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
1351 | End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
1352 | Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision
1353 | Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement
1354 | A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
1355 | Long-Short Temporal Contrastive Learning of Video Transformers
1356 | Scene Consistency Representation Learning for Video Scene Segmentation
1357 | Unsupervised Pre-Training for Temporal Action Localization Tasks
1358 | Contrastive Learning for Unsupervised Video Highlight Detection
1359 | Deformable Video Transformer
1360 | Recurring the Transformer for Video Action Recognition
1361 | Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation
1362 | Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model
1363 | Sign Language Video Retrieval With Free-Form Textual Queries
1364 | FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback
1365 | Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation
1366 | ESCNet: Gaze Target Detection With the Understanding of 3D Scenes
1367 | Interactive Multi-Class Tiny-Object Detection
1368 | Weakly Supervised Rotation-Invariant Aerial Object Detection Network
1369 | Large Loss Matters in Weakly Supervised Multi-Label Classification
1370 | MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning
1371 | FreeSOLO: Learning To Segment Objects Without Annotations
1372 | Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection
1373 | SIOD: Single Instance Annotated per Category per Image for Object Detection
1374 | Towards Robust Adaptive Object Detection Under Noisy Annotations
1375 | Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection
1376 | Salvage of Supervision in Weakly Supervised Object Detection
1377 | Label, Verify, Correct: A Simple Few Shot Object Detection Method
1378 | Background Activation Suppression for Weakly Supervised Object Localization
1379 | Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization
1380 | Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery
1381 | Cloth-Changing Person Re-Identification From a Single Image With Gait Prediction and Regularization
1382 | Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation
1383 | Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification
1384 | Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification
1385 | Towards Total Recall in Industrial Anomaly Detection
1386 | H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection
1387 | Geometric and Textural Augmentation for Domain Gap Reduction
1388 | General Incremental Learning With Domain-Aware Categorical Representations
1389 | DST: Dynamic Substitute Training for Data-Free Black-Box Attack
1390 | ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation
1391 | Label Matching Semi-Supervised Object Detection
1392 | Multidimensional Belief Quantification for Label-Efficient Meta-Learning
1393 | Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples
1394 | Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification
1395 | Class-Aware Contrastive Semi-Supervised Learning
1396 | Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework
1397 | Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo
1398 | Learning Where To Learn in Cross-View Self-Supervised Learning
1399 | Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective
1400 | SimMatch: Semi-Supervised Learning With Similarity Matching
1401 | Active Teacher for Semi-Supervised Object Detection
1402 | Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection
1403 | Self-Supervised Learning of Object Parts for Semantic Segmentation
1404 | MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
1405 | Scale-Equivalent Distillation for Semi-Supervised Object Detection
1406 | A Self-Supervised Descriptor for Image Copy Detection
1407 | Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut
1408 | CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification
1409 | Semi-Supervised Few-Shot Learning via Multi-Factor Clustering
1410 | CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning
1411 | Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data
1412 | A Simple Data Mixing Prior for Improving Self-Supervised Learning
1413 | DETReg: Unsupervised Pretraining With Region Priors for Object Detection
1414 | Sound and Visual Representation Learning With Multiple Pretraining Tasks
1415 | UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training
1416 | Weakly Supervised Object Localization As Domain Adaption
1417 | Debiased Learning From Naturally Imbalanced Pseudo-Labels
1418 | Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning
1419 | Masked Feature Prediction for Self-Supervised Visual Pre-Training
1420 | Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency
1421 | Id-Free Person Similarity Learning
1422 | End-to-End Semi-Supervised Learning for Video Action Detection
1423 | Probabilistic Representations for Video Contrastive Learning
1424 | Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition
1425 | BEVT: BERT Pretraining of Video Transformers
1426 | Generative Cooperative Learning for Unsupervised Video Anomaly Detection
1427 | When Does Contrastive Visual Representation Learning Work?
1428 | The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
1429 | What Matters for Meta-Learning Vision Regression Tasks?
1430 | IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
1431 | TCTrack: Temporal Contexts for Aerial Tracking
1432 | AKB-48: A Real-World Articulated Object Knowledge Base
1433 | 3DAC: Learning Attribute Compression for Point Clouds
1434 | Simple but Effective: CLIP Embeddings for Embodied AI
1435 | Multi-Robot Active Mapping via Neural Bipartite Graph Matching
1436 | Continuous Scene Representations for Embodied AI
1437 | Interactron: Embodied Adaptive Object Detection
1438 | Online Learning of Reusable Abstract Models for Object Goal Navigation
1439 | RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization
1440 | UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation
1441 | Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation
1442 | Upright-Net: Learning Upright Orientation for 3D Point Cloud
1443 | DeepFake Disrupter: The Detector of DeepFake Is My Friend
1444 | HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization
1445 | Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources
1446 | Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
1447 | Transferable Sparse Adversarial Attack
1448 | Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection
1449 | Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability
1450 | Improving Adversarial Transferability via Neuron Attribution-Based Attacks
1451 | Complex Backdoor Detection by Symmetric Feature Differencing
1452 | Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer
1453 | Zero-Query Transfer Attacks on Context-Aware Object Detectors
1454 | 360-Attack: Distortion-Aware Perturbations From Perspective-Views
1455 | Label-Only Model Inversion Attacks via Boundary Repulsion
1456 | Merry Go Round: Rotate a Frame and Fool a DNN
1457 | Cross-Modal Transferable Adversarial Attacks From Images to Videos
1458 | BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning
1459 | Investigating Top-k White-Box and Transferable Black-Box Attack
1460 | Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution
1461 | Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack
1462 | Towards Efficient Data Free Black-Box Adversarial Attack
1463 | Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network
1464 | Certified Patch Robustness via Smoothed Vision Transformers
1465 | Towards Practical Certifiable Patch Defense With Vision Transformer
1466 | On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles
1467 | 3DeformRS: Certifying Spatial Deformations on Point Clouds
1468 | Stereoscopic Universal Perturbations Across Different Architectures and Datasets
1469 | Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations
1470 | Bounded Adversarial Attack on Deep Content Features
1471 | DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints
1472 | Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart
1473 | Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
1474 | Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input
1475 | Adversarial Eigen Attack on Black-Box Models
1476 | Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond
1477 | Enhancing Adversarial Training With Second-Order Statistics of Weights
1478 | Towards Data-Free Model Stealing in a Hard Label Setting
1479 | Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients
1480 | DTA: Physical Camouflage Attacks Using Differentiable Transformation Network
1481 | Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity
1482 | Enhancing Adversarial Robustness for Deep Metric Learning
1483 | Shape-Invariant 3D Adversarial Point Clouds
1484 | Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon
1485 | Exploring Effective Data for Surrogate Training Towards Black-Box Attack
1486 | NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
1487 | Dual-Key Multimodal Backdoors for Visual Question Answering
1488 | Proactive Image Manipulation Detection
1489 | ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts
1490 | EnvEdit: Environment Editing for Vision-and-Language Navigation
1491 | HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation
1492 | Less Is More: Generating Grounded Navigation Instructions From Landmarks
1493 | Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
1494 | Reinforced Structured State-Evolution for Vision-Language Navigation
1495 | Cross-Modal Map Learning for Vision and Language Navigation
1496 | Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
1497 | One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones
1498 | Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection for Image-Text Multimodal Classification
1499 | Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
1500 | Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
1501 | Multi-View Transformer for 3D Visual Grounding
1502 | Multi-Modal Dynamic Graph Transformer for Visual Grounding
1503 | Weakly-Supervised Generation and Grounding of Visual Descriptions With Conditional Generative Models
1504 | Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning
1505 | Visual Abductive Reasoning
1506 | Query and Attention Augmentation for Knowledge-Based Explainable Reasoning
1507 | REX: Reasoning-Aware and Grounded Explanation
1508 | Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation
1509 | Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships
1510 | Scene Graph Expansion for Semantics-Guided Image Outpainting
1511 | VisualHow: Multimodal Problem Solving
1512 | FLAVA: A Foundational Language and Vision Alignment Model
1513 | Multi-Modal Alignment Using Representation Codebook
1514 | Negative-Aware Attention Framework for Image-Text Matching
1515 | Vision-Language Pre-Training With Triple Contrastive Learning
1516 | Vision-Language Pre-Training for Boosting Scene Text Detectors
1517 | COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
1518 | NeurMiPs: Neural Mixture of Planar Experts for View Synthesis
1519 | FWD: Real-Time Novel View Synthesis With Forward Warping and Depth
1520 | SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images
1521 | Fast, Accurate and Memory-Efficient Partial Permutation Synchronization
1522 | Learning To Find Good Models in RANSAC
1523 | Optimizing Elimination Templates by Greedy Parameter Search
1524 | GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision
1525 | HARA: A Hierarchical Approach for Robust Rotation Averaging
1526 | RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging
1527 | A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors
1528 | ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance
1529 | Self-Supervised Neural Articulated Shape and Appearance Models
1530 | Virtual Elastic Objects
1531 | Decoupling Makes Weakly Supervised Local Feature Better
1532 | JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints
1533 | ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging
1534 | DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering
1535 | Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis
1536 | Structured Local Radiance Fields for Human Avatar Modeling
1537 | High-Fidelity Human Avatars From a Single RGB Camera
1538 | Forecasting Characteristic 3D Poses of Human Actions
1539 | Virtual Correspondence: Humans as a Cue for Extreme-View Geometry
1540 | BEHAVE: Dataset and Method for Tracking Human Object Interactions
1541 | Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives
1542 | RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
1543 | NPBG++: Accelerating Neural Point-Based Graphics
1544 | Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows
1545 | Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos
1546 | Masked Autoencoders Are Scalable Vision Learners
1547 | Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision
1548 | Bayesian Invariant Risk Minimization
1549 | Crafting Better Contrastive Views for Siamese Representation Learning
1550 | Rethinking Minimal Sufficient Representation in Contrastive Learning
1551 | Multi-Level Feature Learning for Contrastive Multi-View Clustering
1552 | Point-Level Region Contrast for Object Detection Pre-Training
1553 | Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation
1554 | A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration
1555 | SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos
1556 | Omnivore: A Single Model for Many Visual Modalities
1557 | DPICT: Deep Progressive Image Compression Using Trit-Planes
1558 | Efficient Geometry-Aware 3D Generative Adversarial Networks
1559 | Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation
1560 | Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
1561 | Versatile Multi-Modal Pre-Training for Human-Centric Perception
1562 | Bridging Video-Text Retrieval With Multiple Choice Questions
1563 | Integrating Language Guidance Into Vision-Based Deep Metric Learning
1564 | NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images
1565 | DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering
1566 | HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video
1567 | Neural Reflectance for Shape Recovery With Shadow Handling
1568 | Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video
1569 | Dancing Under the Stars: Video Denoising in Starlight
1570 | BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation
1571 | Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation
1572 | 3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image
1573 | BokehMe: When Neural Rendering Meets Classical Rendering
1574 | Deblurring via Stochastic Refinement
1575 | Learning to Deblur Using Light Field Generated and Real Defocus Images
1576 | Towards Layer-Wise Image Vectorization
1577 | Dual-Shutter Optical Vibration Sensing
1578 | Fisher Information Guidance for Learned Time-of-Flight Imaging
1579 | Autofocus for Event Cameras
1580 | Adaptive Gating for Single-Photon 3D Imaging
1581 | LiDAR Snowfall Simulation for Robust 3D Object Detection
1582 | MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
1583 | Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer
1584 | Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
1585 | Disentangling Visual and Written Concepts in CLIP
1586 | CLIP-Event: Connecting Text and Images With Event Structures
1587 | Robust Cross-Modal Representation Learning With Progressive Self-Distillation
1588 | TubeDETR: Spatio-Temporal Video Grounding With Transformers
1589 | 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
1590 | 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
1591 | Globetrotter: Connecting Languages by Connecting Images
1592 | Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment
1593 | WebQA: Multihop and Multimodal QA
1594 | PartGlot: Learning Shape Part Segmentation From Language Reference Games
1595 | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
1596 | L-Verse: Bidirectional Generation Between Image and Text
1597 | Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation
1598 | LaTr: Layout-Aware Transformer for Scene-Text VQA
1599 | Learning Program Representations for Food Images and Cooking Recipes
1600 | On the Importance of Asymmetry for Siamese Representation Learning
1601 | Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy
1602 | Exploring Set Similarity for Dense Self-Supervised Representation Learning
1603 | Align Representations With Base: A New Approach to Self-Supervised Learning
1604 | Identifying Ambiguous Similarity Conditions via Semantic Matching
1605 | Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization
1606 | Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation
1607 | Unsupervised Visual Representation Learning by Online Constrained K-Means
1608 | Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views
1609 | Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework
1610 | Robust Contrastive Learning Against Noisy Views
1611 | On Learning Contrastive Representations for Learning With Noisy Labels
1612 | Directional Self-Supervised Learning for Heavy Image Augmentations
1613 | Continual Learning for Visual Search With Backward Consistent Feature Embedding
1614 | Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
1615 | Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning
1616 | Bring Evanescent Representations to Life in Lifelong Class Incremental Learning
1617 | Unsupervised Learning of Debiased Representations With Pseudo-Attributes
1618 | A Conservative Approach for Unbiased Learning on Unknown Biases
1619 | Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization
1620 | Co-Advise: Cross Inductive Bias Distillation
1621 | PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
1622 | RegionCLIP: Region-Based Language-Image Pretraining
1623 | Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks
1624 | Conditional Prompt Learning for Vision-Language Models
1625 | Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation?
1626 | Partial Class Activation Attention for Semantic Segmentation
1627 | Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers
1628 | Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation
1629 | Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation
1630 | Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation
1631 | L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
1632 | Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data
1633 | Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
1634 | Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation
1635 | MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
1636 | NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night
1637 | Fast Point Transformer
1638 | RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior
1639 | ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes
1640 | DisARM: Displacement Aware Relation Module for 3D Detection
1641 | Learning Object Context for Novel-View Scene Layout Generation
1642 | Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts
1643 | Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image
1644 | Raw High-Definition Radar for Multi-Task Learning
1645 | Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
1646 | UKPGAN: A General Self-Supervised Keypoint Detector
1647 | Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos
1648 | Rethinking Efficient Lane Detection via Curve Modeling
1649 | Exploiting Temporal Relations on Radar Perception for Autonomous Driving
1650 | Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective
1651 | BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement
1652 | ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning
1653 | Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion
1654 | Vehicle Trajectory Prediction Works, but Not Everywhere
1655 | LTP: Lane-Based Trajectory Prediction for Autonomous Driving
1656 | ONCE-3DLanes: Building Monocular 3D Lane Detection
1657 | Towards Driving-Oriented Metric for Lane Detection Models
1658 | Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes
1659 | LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection
1660 | DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
1661 | A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation
1662 | Forecasting From LiDAR via Future Object Detection
1663 | RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding
1664 | Learning From All Vehicles
1665 | Is Mapping Necessary for Realistic PointGoal Navigation?
1666 | Symmetry-Aware Neural Architecture for Embodied Visual Exploration
1667 | Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles
1668 | Topology Preserving Local Road Network Estimation From Single Onboard Camera Image
1669 | Coupling Vision and Proprioception for Navigation of Legged Robots
1670 | Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation
1671 | 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection
1672 | Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior
1673 | SelfD: Self-Learning Large-Scale Driving Policies From the Web
1674 | Towards Real-World Navigation With Deep Differentiable Planners
1675 | Privacy Preserving Partial Localization
1676 | Efficient Large-Scale Localization by Global Instance Recognition
1677 | CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
1678 | Bilateral Video Magnification Filter
1679 | Neural Data-Dependent Transform for Learned Image Compression
1680 | Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
1681 | Deep Generalized Unfolding Networks for Image Restoration
1682 | Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling
1683 | XYDeblur: Divide and Conquer for Single Image Deblurring
1684 | Abandoning the Bayer-Filter To See in the Dark
1685 | RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution
1686 | All-in-One Image Restoration for Unknown Corruption
1687 | Modeling sRGB Camera Noise With Normalizing Flows
1688 | A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift
1689 | Video Frame Interpolation Transformer
1690 | The Devil Is in the Details: Window-Based Attention for Image Compression
1691 | Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction
1692 | RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs
1693 | AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement
1694 | HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging
1695 | HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging
1696 | Learning To Zoom Inside Camera Imaging Pipeline
1697 | Towards an End-to-End Framework for Flow-Guided Video Inpainting
1698 | Context-Aware Video Reconstruction for Rolling Shutter Cameras
1699 | CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image
1700 | Global Matching With Overlapping Attention for Optical Flow Estimation
1701 | CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
1702 | Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
1703 | Video Demoiréing With Relation-Based Temporal Consistency
1704 | Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images
1705 | Deep Constrained Least Squares for Blind Image Super-Resolution
1706 | Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model
1707 | Unsupervised Homography Estimation With Coplanarity-Aware GAN
1708 | Attentive Fine-Grained Structured Sparsity for Image Restoration
1709 | Uformer: A General U-Shaped Transformer for Image Restoration
1710 | Bringing Old Films Back to Life
1711 | Learning sRGB-to-Raw-RGB De-Rendering With Content-Aware Metadata
1712 | SNR-Aware Low-Light Image Enhancement
1713 | AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network
1714 | Synthetic Aperture Imaging With Events and Frames
1715 | Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition
1716 | Time Lens++: Event-Based Frame Interpolation With Parametric Non-Linear Flow and Multi-Scale Fusion
1717 | Unifying Motion Deblurring and Frame Interpolation With Events
1718 | EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction
1719 | Learning Adaptive Warping for Real-World Rolling Shutter Correction
1720 | Neural Global Shutter: Learn To Restore Video From a Rolling Shutter Camera With Global Reset Feature
1721 | TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation
1722 | Optimizing Video Prediction via Video Frame Interpolation
1723 | Reference-Based Video Super-Resolution Using Multi-Camera Video Triplets
1724 | Memory-Augmented Non-Local Attention for Video Super-Resolution
1725 | Optical Flow Estimation for Spiking Camera
1726 | Compressive Single-Photon 3D Cameras
1727 | Single-Photon Structured Light
1728 | All-Photon Polarimetric Time-of-Flight Imaging
1729 | Holocurtains: Programming Light Curtains via Binary Holography
1730 | Towards Implicit Text-Guided 3D Shape Generation
1731 | Towards Language-Free Training for Text-to-Image Generation
1732 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
1733 | EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
1734 | Hierarchical Modular Network for Video Captioning
1735 | SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
1736 | End-to-End Generative Pretraining for Multimodal Video Captioning
1737 | Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
1738 | Scaling Up Vision-Language Pre-Training for Image Captioning
1739 | Comprehending and Ordering Semantics for Image Captioning
1740 | NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge
1741 | Injecting Semantic Concepts Into End-to-End Image Captioning
1742 | DIFNet: Boosting Visual Information Flow for Image Captioning
1743 | VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning
1744 | Show, Deconfound and Tell: Image Captioning With Causal Inference
1745 | EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval
1746 | CLIPstyler: Image Style Transfer With a Single Text Condition
1747 | HairCLIP: Design Your Hair by Text and Reference Image
1748 | DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting
1749 | On Guiding Visual Attention With Language Specification
1750 | UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog
1751 | Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer
1752 | LiT: Zero-Shot Transfer With Locked-Image Text Tuning
1753 | GroupViT: Semantic Segmentation Emerges From Text Supervision
1754 | ReSTR: Convolution-Free Referring Image Segmentation Using Transformers
1755 | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
1756 | An Empirical Study of Training End-to-End Vision-and-Language Transformers
1757 | Are Multimodal Transformers Robust to Missing Modality?
1758 | Text to Image Generation With Semantic-Spatial Aware GAN
1759 | StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
1760 | Blended Diffusion for Text-Driven Editing of Natural Images
1761 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions
1762 | Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
1763 | A Style-Aware Discriminator for Controllable Image Translation
1764 | Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint
1765 | Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
1766 | FlexIT: Towards Flexible Semantic Image Translation
1767 | Modulated Contrast for Versatile Image Synthesis
1768 | QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation
1769 | Self-Supervised Dense Consistency Regularization for Image-to-Image Translation
1770 | Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
1771 | InstaFormer: Instance-Aware Image-to-Image Translation With Transformer
1772 | Unsupervised Image-to-Image Translation With Generative Prior
1773 | StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF via 2D-3D Mutual Learning
1774 | NeRF-Editing: Geometry Editing of Neural Radiance Fields
1775 | GeoNeRF: Generalizing NeRF With Geometry Priors
1776 | Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation
1777 | AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields
1778 | HDR-NeRF: High Dynamic Range Neural Radiance Fields
1779 | NeRFReN: Neural Radiance Fields With Reflections
1780 | Neural Point Light Fields
1781 | 3D-Aware Image Synthesis via Learning Structural and Textural Representations
1782 | GIRAFFE HD: A High-Resolution 3D-Aware Generative Model
1783 | Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis
1784 | Bi-Level Doubly Variational Learning for Energy-Based Latent Variable Models
1785 | High-Resolution Image Harmonization via Collaborative Dual Transformations
1786 | Brain-Supervised Image Editing
1787 | De-Rendering 3D Objects in the Wild
1788 | Neural Fields As Learnable Kernels for 3D Reconstruction
1789 | HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing
1790 | 3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies
1791 | Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian
1792 | Deep Image-Based Illumination Harmonization
1793 | GLASS: Geometric Latent Augmentation for Shape Spaces
1794 | PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes
1795 | Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes
1796 | Neural Mesh Simplification
1797 | SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters
1798 | CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation
1799 | UNIST: Unpaired Neural Implicit Shape Translation Network
1800 | CoNeRF: Controllable Neural Radiance Fields
1801 | Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling
1802 | Modeling Indirect Illumination for Inverse Rendering
1803 | Neural Head Avatars From Monocular RGB Videos
1804 | DeepCurrents: Learning Implicit Representations of Shapes With Boundaries
1805 | Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination
1806 | AnyFace: Free-Style Text-To-Face Synthesis and Manipulation
1807 | General Facial Representation Learning in a Visual-Linguistic Manner
1808 | Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
1809 | Detecting Deepfakes With Self-Blended Images
1810 | 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
1811 | Evaluation-Oriented Knowledge Distillation for Deep Face Recognition
1812 | AdaFace: Quality Adaptive Margin for Face Recognition
1813 | Moving Window Regression: A Novel Approach to Ordinal Regression
1814 | FaceFormer: Speech-Driven 3D Facial Animation With Transformers
1815 | Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in “In-the-Wild” Videos
1816 | Deep Decomposition for Stochastic Normal-Abnormal Transport
1817 | DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification
1818 | Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification
1819 | Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations
1820 | VRDFormer: End-to-End Video Visual Relation Detection With Transformers
1821 | Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
1822 | Visual Acoustic Matching
1823 | The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
1824 | Learning Multiple Dense Prediction Tasks From Partially Annotated Data
1825 | PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning
1826 | Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture
1827 | FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation
1828 | Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding
1829 | Equivariant Point Cloud Analysis via Learning Orientations for Message Passing
1830 | Surface Representation for Point Clouds
1831 | Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds
1832 | 3D Common Corruptions and Data Augmentation
1833 | INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation
1834 | How Much Does Input Data Type Impact Final Face Model Accuracy?
1835 | Ego4D: Around the World in 3,000 Hours of Egocentric Video
1836 | TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting
1837 | Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
1838 | vCLIMB: A Novel Video Class Incremental Learning Benchmark
1839 | Opening Up Open World Tracking
1840 | Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
1841 | CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters
1842 | Failure Modes of Domain Generalization Algorithms
1843 | A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes
1844 | Grounding Answers for Visual Questions Asked by Visually Impaired People
1845 | Learning To Answer Questions in Dynamic Audio-Visual Scenarios
1846 | Episodic Memory Question Answering
1847 | ScanQA: 3D Question Answering for Spatial Scene Understanding
1848 | Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles
1849 | BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild
1850 | Unified Contrastive Learning in Image-Text-Label Space
1851 | AlignMixup: Improving Representations by Interpolating Aligned Features
1852 | On the Road to Online Adaptation for Semantic Image Segmentation
1853 | ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation
1854 | Kernelized Few-Shot Object Detection With Efficient Integral Aggregation
1855 | Neural Mean Discrepancy for Efficient Out-of-Distribution Detection
1856 | A Structured Dictionary Perspective on Implicit Neural Representations
1857 | LARGE: Latent-Based Regression Through GAN Semantics
1858 | Rethinking Controllable Variational Autoencoders
1859 | Learning Canonical F-Correlation Projection for Compact Multiview Representation
1860 | Cross-Architecture Self-Supervised Video Representation Learning
1861 | Improving Video Model Transfer With Dynamic Representation Learning
1862 | Self-Supervised Image Representation Learning With Geometric Set Consistency
1863 | HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging
1864 | Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling
1865 | DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds
1866 | Neural Convolutional Surfaces
1867 | Representing 3D Shapes With Probabilistic Directed Distance Fields
1868 | H4D: Human 4D Modeling by Learning Neural Compositional Representation
1869 | Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification
1870 | Contrastive Regression for Domain Adaptation on Gaze Estimation
1871 | Forward Compatible Training for Large-Scale Embedding Retrieval Systems
1872 | Improving Subgraph Recognition With Variational Graph Information Bottleneck
1873 | Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss
1874 | Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species
1875 | Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
1876 | Structured Sparse R-CNN for Direct Scene Graph Generation
1877 | PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation
1878 | RU-Net: Regularized Unrolling Network for Scene Graph Generation
1879 | Fine-Grained Predicates Learning for Scene Graph Generation
1880 | HL-Net: Heterophily Learning Network for Scene Graph Generation
1881 | SGTR: End-to-End Scene Graph Generation With Transformer
1882 | Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs
1883 | RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
1884 | Spatial Commonsense Graph for Object Localisation in Partial Scenes
1885 | “The Pedestrian Next to the Lamppost” Adaptive Object Graphs for Better Instantaneous Mapping
1886 | Category-Aware Transformer Network for Better Human-Object Interaction Detection
1887 | Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection
1888 | Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection
1889 | Human-Object Interaction Detection via Disentangled Transformer
1890 | MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
1891 | GaTector: A Unified Framework for Gaze Object Prediction
1892 | Rethinking Parsing Branch for Human Densepose Estimation
1893 | STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes
1894 | Crowd Counting in the Frequency Domain
1895 | Boosting Crowd Counting via Multifaceted Attention
1896 | Rethinking Spatial Invariance of Convolutional Networks for Object Counting
1897 | Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing
1898 | Collaborative Transformers for Grounded Situation Recognition
1899 | Deep Stereo Image Compression via Bi-Directional Coding
1900 | RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion
1901 | Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer
1902 | Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels
1903 | SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization
1904 | Automatic Color Image Stitching Using Quaternion Rank-1 Alignment
1905 | SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing
1906 | Degree-of-Linear-Polarization-Based Color Constancy
1907 | Point Cloud Color Constancy
1908 | Boosting View Synthesis With Residual Transfer
1909 | Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection
1910 | Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging
1911 | PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
1912 | Multimodal Material Segmentation
1913 | Occlusion-Aware Cost Constructor for Light Field Depth Estimation
1914 | Learning Neural Light Fields With Ray-Space Embedding
1915 | Acquiring a Dynamic Light Field Through a Single-Shot Coded Image
1916 | Gravitationally Lensed Black Hole Emission Tomography
1917 | Deep Saliency Prior for Reducing Visual Distraction
1918 | Personalized Image Aesthetics Assessment With Rich Attributes
1919 | Artistic Style Discovery With Independent Components
1920 | Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
1921 | SVIP: Sequence VerIfication for Procedures in Videos
1922 | Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency
1923 | Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization
1924 | GateHUB: Gated History Unit With Background Suppression for Online Action Detection
1925 | E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
1926 | Hybrid Relation Guided Set Matching for Few-Shot Action Recognition
1927 | Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
1928 | Alignment-Uniformity Aware Representation Learning for Zero-Shot Video Classification
1929 | Cross-Modal Representation Learning for Zero-Shot Action Recognition
1930 | Cross-Modal Background Suppression for Audio-Visual Event Localization
1931 | Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization
1932 | An Empirical Study of End-to-End Temporal Action Detection
1933 | Everything at Once – Multi-Modal Fusion Transformer for Video Retrieval
1934 | DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
1935 | MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
1936 | Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition
1937 | AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
1938 | UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection
1939 | Detector-Free Weakly Supervised Group Activity Recognition
1940 | Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading
1941 | Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer
1942 | Interactiveness Field in Human-Object Interactions
1943 | GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
1944 | Object-Relation Reasoning Graph for Action Recognition
1945 | UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
1946 | Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition
1947 | SPAct: Self-Supervised Privacy Preservation for Action Recognition
1948 | Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
1949 | InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition
1950 | Learning Video Representations of Human Motion From Synthetic Data
1951 | Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
1952 | EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images
1953 | Gait Recognition in the Wild With Dense 3D Representations and a Benchmark
1954 | Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification
1955 | Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition
1956 | DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover’s Distance Improves Out-of-Distribution Face Identification
1957 | Learning Second Order Local Anomaly for General Face Forgery Detection
1958 | PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition
1959 | Face2Exp: Combating Data Biases for Facial Expression Recognition
1960 | Local-Adaptive Face Recognition via Graph-Based Meta-Clustering and Regularized Adaptation
1961 | EMOCA: Emotion Driven Monocular Face Capture and Animation
1962 | Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality
1963 | FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset
1964 | ImFace: A Nonlinear 3D Morphable Face Model With Implicit Neural Representations
1965 | Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling
1966 | RigNeRF: Fully Controllable Neural 3D Portraits
1967 | HeadNeRF: A Real-Time NeRF-Based Parametric Head Model
1968 | Sparse to Dense Dynamic 3D Facial Expression Generation
1969 | Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion
1970 | Speech Driven Tongue Animation
1971 | Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition
1972 | gDNA: Towards Generative Detailed Neural Avatars
1973 | GraFormer: Graph-Oriented Transformer for 3D Pose Estimation
1974 | Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation
1975 | Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis
1976 | PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence
1977 | The Wanderings of Odysseus in 3D Scenes
1978 | OSSO: Obtaining Skeletal Shape From Outside
1979 | LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds
1980 | Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression
1981 | Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation
1982 | LISA: Learning Implicit Shape and Appearance of Hands
1983 | MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image
1984 | Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation
1985 | Low-Resource Adaptation for Personalized Co-Speech Gesture Generation
1986 | D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
1987 | Synthetic Generation of Face Videos With Plethysmograph Physiology
1988 | Contour-Hugging Heatmaps for Landmark Detection
1989 | Which Images To Label for Few-Shot Medical Landmark Detection?
1990 | Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography
1991 | Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization
1992 | Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution
1993 | Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations
1994 | Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation
1995 | BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation
1996 | Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis
1997 | Towards Low-Cost and Efficient Malaria Detection
1998 | ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification
1999 | Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification
2000 | M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer
2001 | Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
2002 | HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet
2003 | DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations
2004 | Clean Implicit 3D Structure From Noisy 2D STEM Images
2005 | Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks
2006 | Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment
2007 | Learning Optimal K-Space Acquisition and Reconstruction Using Physics-Informed Neural Networks
2008 | NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration
2009 | SMPL-A: Modeling Person-Specific Deformable Anatomy
2010 | DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis
2011 | Affine Medical Image Registration With Coarse-To-Fine Vision Transformer
2012 | Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow
2013 | Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization
2014 | Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation
2015 | FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis
2016 | Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning
2017 | CellTypeGraph: A New Geometric Computer Vision Benchmark
2018 | ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics
2019 | FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
2020 | Multi-Dimensional, Nuanced and Subjective – Measuring the Perception of Facial Expressions
2021 | DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image
2022 | OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction
2023 | PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking
2024 | Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification
2025 | JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection
2026 | DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
2027 | Egocentric Prediction of Action Target in 3D
2028 | HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
2029 | Amodal Panoptic Segmentation
2030 | Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark
2031 | YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset
2032 | The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting
2033 | 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos
2034 | AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
2035 | A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection
2036 | Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
2037 | Optimal Correction Cost for Object Detection Evaluation
2038 | GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains
2039 | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
2040 | Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation
2041 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes
2042 | DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation
2043 | Open Challenges in Deep Stereo: The Booster Dataset
2044 | No-Reference Point Cloud Quality Assessment via Domain Adaptation
2045 | Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network
2046 | How Good Is Aesthetic Ability of a Fashion Model?
2047 | Instance-Wise Occlusion and Depth Orders in Natural Scenes
2048 | PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects
2049 | Replacing Labeled Real-Image Datasets With Auto-Generated Contours
2050 | V2C: Visual Voice Cloning
2051 | M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining
2052 | It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection
2053 | From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering
2054 | Point Cloud Pre-Training With Natural 3D Structures
2055 | The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift
2056 | AutoMine: An Unmanned Mine Dataset
2057 | SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis
2058 | BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations
2059 | Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
2060 | Unifying Panoptic Segmentation for Autonomous Driving
2061 | DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
2062 | SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation
2063 | Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions


--------------------------------------------------------------------------------