CVPR 2021

CVPR 2021 is the premier annual computer vision event, comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides exceptional value for students, academics, and industry researchers.

The 2021 papers have not yet been fully released; this list will be updated as they become available. In the meantime, let's review papers from 2020 and 2019.

Continuously updated on GitHub

Object Detection

  1. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection Paper address:

  2. Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector Paper address:


Image segmentation

  1. Semi-Supervised Semantic Image Segmentation with Self-correcting Networks Paper address:

  2. Deep Snake for Real-Time Instance Segmentation Paper address:

  3. CenterMask: Real-Time Anchor-Free Instance Segmentation Paper address:  Code:

  4. SketchGCN: Semantic Sketch Segmentation with Graph Convolutional Networks Paper address:

  5. PolarMask: Single Shot Instance Segmentation with Polar Representation Paper address:  Code:

  6. xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation Paper address:

  7. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation Paper address:


Face recognition

  1. Towards Universal Representation Learning for Deep Face Recognition paper address:

  2. Suppressing Uncertainties for Large-Scale Facial Expression Recognition Paper address:  Code:

  3. Face X-ray for More General Face Forgery Detection Paper address:


Object Tracking

  1. ROAM: Recurrently Optimizing Tracking Model Paper address:


3D point cloud & reconstruction

  1. PF-Net: Point Fractal Network for 3D Point Cloud Completion Paper address:

  2. PointAugment: an Auto-Augmentation Framework for Point Cloud Classification Paper address:  Code:

  3. Learning multiview 3D point cloud registration Paper address:

  4. C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds Paper address:

  5. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds Paper address:

  6. Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image Paper address:

  7. Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion Paper address:

  8. In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction from 2D Landmarks Paper address:


Pose estimation

  1. VIBE: Video Inference for Human Body Pose and Shape Estimation Paper address:

  2. Distribution-Aware Coordinate Representation for Human Pose Estimation Paper address:

  3. 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras Paper address:

  4. Optimal least-squares solution to the hand-eye calibration problem Paper address:

  5. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry Paper address:

  6. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition Paper address:

  7. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation Paper address:

  8. PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation Paper address:



  1. Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models Paper address:  Code:

  2. MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis Paper address:

  3. Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory Paper address:


Few-shot & zero-shot

  1. Improved Few-Shot Visual Classification Paper address:

  2. Meta-Transfer Learning for Zero-Shot Super-Resolution Paper address:


Weak supervision & unsupervised

  1. Rethinking the Route Towards Weakly Supervised Object Localization Paper address:

  2. NestedVAE: Isolating Common Factors via Weak Supervision Paper address:

  3. Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation Paper address:

  4. Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction Paper address:


Neural Networks

  1. Visual Commonsense R-CNN paper address:

  2. GhostNet: More Features from Cheap Operations Paper address:  Code:

  3. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions Paper address:


Model acceleration

  1. GPU-Accelerated Mobile Multi-view Style Transfer paper address:


Visual common sense

  1. What it Thinks is Important is Important: Robustness Transfers through Input Gradients Paper address:

  2. Attentive Context Normalization for Robust Permutation-Equivariant Learning Paper address:

  3. Bundle Adjustment on a Graph Processor Paper address:

  4. Transferring Dense Pose to Proximal Animal Classes Paper address:

  5. Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs Paper address:

  6. Learning in the Frequency Domain Paper address:

  7. Filter Grafting for Deep Neural Networks Paper address:

  8. ClusterFit: Improving Generalization of Visual Representations Paper address:

  9. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction Paper address:

  10. Auto-Encoding Twin-Bottleneck Hashing Paper address:

  11. Learning Representations by Predicting Bags of Visual Words Paper address:

  12. Holistically-Attracted Wireframe Parsing Paper address:

  13. A General and Adaptive Robust Loss Function Paper address:

  14. A Characteristic Function Approach to Deep Implicit Generative Modeling Paper address:

  15. AdderNet: Do We Really Need Multiplications in Deep Learning? Paper address:

  16. 12-in-1: Multi-Task Vision and Language Representation Learning Paper address:

  17. Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks Paper address:

  18. CARS: Continuous Evolution for Efficient Neural Architecture Search Paper address:  Code:

  19. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training Paper address:  Code:

  1. GhostNet: More Features from Cheap Operations (surpasses the MobileNetV3 architecture) Paper link:  Model (amazing performance on ARM CPUs): https://github.com/iamhankai/

It beats other SOTA lightweight CNNs such as MobileNetV3 and FBNet.

  2. AdderNet: Do We Really Need Multiplications in Deep Learning? (additive neural network) Achieves very good performance on large-scale neural networks and datasets. Paper link:

  3. Frequency Domain Compact 3D Convolutional Neural Networks (3D CNN compression) Paper link:  Open source code:

  4. A Semi-Supervised Assessor of Neural Architectures (NAS)

  5. Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection (NAS for detection) Searches the backbone, neck, and head together as a trinity.

  6. CARS: Continuous Evolution for Efficient Neural Architecture Search (continuously evolved NAS) Efficient, combines the advantages of differentiable and evolutionary search, and can output the Pareto front.

  7. On Positive-Unlabeled Classification in GAN (PU + GAN)

  8. Learning multiview 3D point cloud registration (3D point cloud) Paper link:

  9. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition (fine-grained action recognition) Paper link:

  10. Action Modifiers: Learning from Adverbs in Instructional Video Paper link:

  11. PolarMask: Single Shot Instance Segmentation with Polar Representation (instance segmentation modeling) Paper link:  Paper interpretation:  Open source code: https://github.com/xieenze/PolarMask

  12. Rethinking Performance Estimation in Neural Architecture Search (NAS) Because performance estimation is the truly time-consuming part of block-wise neural architecture search, this paper finds the optimal parameters for block-wise NAS, making it faster and more relevant.

  13. Distribution Aware Coordinate Representation for Human Pose Estimation (human body pose estimation) Paper link:  GitHub:  Author team homepage: coco/



  1. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network Paper address:  Code: https://github.com/aim-uofa/adet


Image classification

  1. Self-training with Noisy Student improves ImageNet classification Paper address:

  2. Image Matching across Wide Baselines: From Paper to Practice Paper address:

  3. Towards Robust Image Classification Using Sequential Attention Models Paper address:


Video analysis

  1. Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications Paper address:

  2. Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs Paper address:

  3. Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning Paper address:

  4. Object Relational Graph with Teacher-Recommended Learning for Video Captioning Paper address:

  5. Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution Paper address:

  6. Blurry Video Frame Interpolation Paper address:

  7. Hierarchical Conditional Relation Networks for Video Question Answering Paper address:

  8. Action Modifiers: Learning from Adverbs in Instructional Video Paper address:


Image Processing

  1. Learning to Shade Hand-drawn Sketches Paper address:

  2. Single Image Reflection Removal through Cascaded Refinement Paper address:

  3. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data Paper address:

  4. Deep Image Harmonization via Domain Verification Paper address:  Code:

  5. RoutedFusion: Learning Real-time Depth Map Fusion Paper address:



  1. Visual Commonsense R-CNN

  2. Out-of-distribution image detection

  3. Blurry Video Frame Interpolation

  4. Meta-transfer learning for zero-shot super-resolution

  5. 3D indoor scene understanding

  6. Unbiased scene graph generation from biased training

  7. Auto-encoding twin-bottleneck hashing

  8. A social spatio-temporal graph convolutional neural network for human trajectory prediction

  9. Towards universal representation learning for deep face recognition

  10. Improving generalization of visual representations

  11. Reducing contextual bias

  12. Unsupervised reinforcement learning of transferable meta-skills

  13. Fast and accurate space-time video super-resolution

  14. Object relational graph with teacher-recommended learning for video captioning

  15. Rethinking the route towards weakly supervised object localization

  16. Towards learning a generic agent for vision-and-language navigation via pre-training

  17. GhostNet: a lightweight neural network

  18. AdderNet: do we really need multiplications in deep learning?

  19. CARS: continuous evolution for efficient neural architecture search

  20. Single image reflection removal through cascaded refinement

  21. Filter grafting for deep neural networks

  22. PolarMask: unifying instance segmentation with FCN

  23. Semi-supervised semantic image segmentation

  24. Defending against universal attacks through selective feature regeneration

  25. On-the-fly fine-grained sketch-based image retrieval

  26. Introspecting VQA models with sub-questions

  27. Learning a neural 3D texture space from 2D exemplars

  28. NestedVAE: isolating common factors via weak supervision

  29. Towards multi-future trajectory prediction

  30. Towards robust image classification using sequential attention models

Source: the official website of CVPR 2021
