Stefan Stojanov

stojanov@stanford.edu

Twitter / GitHub / Google Scholar / LinkedIn / CV

I am an applied scientist at Amazon, where I work on video understanding.

Previously, I was a postdoctoral researcher in the Stanford Vision and Learning Lab and the Stanford NeuroAILab, advised by Jiajun Wu and Dan Yamins. At Stanford, I worked on developing video foundation models and fine-grained object representations. I was fortunate to be supported by a fellowship from Stanford HAI. I received my PhD from the Georgia Institute of Technology, where I was advised by James Rehg and worked on continual, self-supervised, and low-shot learning, as well as 3D object shape reconstruction.

My main research interests are in computer vision and machine learning. I focus on the intersection of 3D vision, self-supervision, data synthesis, and video-based learning. My work explores how these areas can complement each other to develop systems capable of efficiently learning rich, granular representations of the physical world.

Publications & Preprints

Taming Generative Video Models for Zero-shot Optical Flow Extraction

A variety of frozen video models can be counterfactually prompted to extract motion in videos.

Seungwoo Kim*, Khai Loong Aw*, Klemen Kotar*, Cristobal Eyzaguirre, Wanhee Lee, Yunong Liu, Jared Watrous, Stefan Stojanov, Juan Carlos Niebles, Jiajun Wu, Daniel Yamins.

preprint, 2025

arxiv / project page


Discovering and Using Spelke Segments

Spelke segments (image regions that move together) emerge from counterfactual world models.

Rahul Venkatesh, Klemen Kotar, Lilian Naing Chen, Seungwoo Kim, Luca Thomas Wheeler, Jared Watrous, Ashley Xu, Gia Ancone, Wanhee Lee, Honglin Chen, Daniel Bear, Stefan Stojanov, Daniel Yamins.

preprint, 2025

arxiv / project page


Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Optimizing visual prompts enables state-of-the-art motion estimation in videos.

Stefan Stojanov*, David Wendt*, Seungwoo Kim*, Rahul Venkatesh*, Kevin Feigelis, Jiajun Wu, Daniel Yamins.

preprint, 2025

arxiv / project page / code


Weakly-Supervised Learning of Dense Functional Correspondences

Pseudo-labels from VLMs plus spatial contrastive losses enable dense functional correspondences.

Stefan Stojanov*, Linan Zhao*, Yunzhi Zhang, Dan Yamins, Jiajun Wu.

ICCV 2025 – poster

paper / project page


The BabyView Dataset: High-resolution Egocentric Videos of Infants’ and Young Children’s Everyday Experiences

A large egocentric dataset of infants and toddlers for vision and language learning.

Bria Long*, Robert Z. Sparks*, Violet Xiang*, Stefan Stojanov*, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank.

CCN 2025 – poster

arxiv


3 × 2: 3D Object Part Segmentation by 2D Semantic Correspondences

Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, James M. Rehg, Matt Feiszli.

ECCV 2024 – poster

paper


ZeroShape: Regression-based Zero-shot Shape Reconstruction

A state-of-the-art 3D shape reconstructor with high computational efficiency and a low training-data budget.

Zixuan Huang*, Stefan Stojanov*, Anh Thai, Varun Jampani, James M. Rehg

CVPR 2024 – poster

paper / code / project page / demo


Low-shot Object Learning with Mutual Exclusivity Bias

Mutual Exclusivity Bias enables fast learning of objects that generalizes.

Anh Thai, Ahmad Humayun, Stefan Stojanov, Zixuan Huang, Bikram Boote, James M. Rehg

NeurIPS 2023 – Datasets and Benchmarks Track

paper / code / project page


ShapeClipper: Scalable 3D Shape Learning via Geometric and CLIP-based Consistency

CLIP and geometric consistency constraints facilitate scalable learning of object shape reconstruction.

Zixuan Huang, Varun Jampani, Anh Thai, Yuanzhen Li, Stefan Stojanov, James M. Rehg

CVPR 2023 – poster

arxiv / code / project page / video


Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization

Dense feature-level self-supervised learning from multiple camera views without any category labels leads to representations that can generalize to novel categories.

Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg

NeurIPS 2022 – poster

arxiv / code / project page / poster / video


Planes vs. Chairs: Category-guided 3D Shape Learning without any 3D Cues

A 3D-unsupervised model that learns shapes of multiple object categories at once.

Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg

ECCV 2022 – poster

arxiv / code / project page / poster / video


The Surprising Positive Knowledge Transfer in Continual 3D Object Shape Reconstruction

Continual learning of 3D shape reconstruction does not suffer from catastrophic forgetting as much as discriminative learning tasks.

Anh Thai, Stefan Stojanov, Zixuan Huang, James M. Rehg

3DV 2022 – poster

arxiv / code


The Benefits of Depth Information for Head-Mounted Gaze Estimation

Fusing depth and image information improves deep models' robustness to fitment and slip for head-mounted gaze estimation.

Stefan Stojanov, Sachin Talathi, Abhishek Sharma

ETRA 2022 – short paper

pdf


Using Shape to Categorize: Low-Shot Learning with an Explicit Shape Bias

Learning representations to generalize based on 3D shape and then learning to map images into them leads to improved low-shot generalization.

Stefan Stojanov, Anh Thai, James M. Rehg

CVPR 2021 – poster

arxiv / code / dataset / project page


3D Reconstruction of Novel Object Shapes from Single Images

A single-view 3D shape reconstruction method based on an implicit signed distance function (SDF) representation.

Anh Thai*, Stefan Stojanov*, James M. Rehg

3DV 2021 – oral

arxiv / code / project page


Incremental Object Learning from Contiguous Views

Repetition of learned concepts in continual learning ameliorates catastrophic forgetting.

Stefan Stojanov, Anh Thai*, Samarth Mishra*, James M. Rehg

CVPR 2019 – oral – Best Paper Award Finalist

pdf / code / dataset / video


Unsupervised 3D Pose Estimation with Geometric Self-supervision

Utilizing adversarial learning to estimate 3D human pose without 3D supervision.

Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Stefan Stojanov, James M. Rehg

CVPR 2019 – poster

arxiv
