I am an applied scientist at Amazon, where I work on video understanding.
Previously, I was a postdoctoral researcher in the Stanford Vision and Learning Lab and Stanford NeuroAILab, advised by Jiajun Wu and Dan Yamins. At Stanford, I worked on developing video foundation models and fine-grained object representations. I was fortunate to be supported by a fellowship from Stanford HAI. I received my PhD from the Georgia Institute of Technology, advised by James Rehg, where I worked on continual, self-supervised, and low-shot learning, as well as 3D object shape reconstruction.
My main research interests are in computer vision and machine learning. I focus on the intersection of 3D vision, self-supervision, data synthesis, and video-based learning. My work explores how these areas can complement each other to develop systems capable of efficiently learning rich, granular representations of the physical world.
Taming Generative Video Models for Zero-shot Optical Flow Extraction
A variety of frozen video models can be counterfactually prompted to extract motion in videos.
Seungwoo Kim*, Khai Loong Aw*, Klemen Kotar*, Cristobal Eyzaguirre, Wanhee Lee, Yunong Liu, Jared Watrous, Stefan Stojanov, Juan Carlos Niebles, Jiajun Wu, Daniel Yamins.
preprint, 2025
Discovering and Using Spelke Segments
Spelke segments (image regions that move together) emerge from counterfactual world models.
Rahul Venkatesh, Klemen Kotar, Lilian Naing Chen, Seungwoo Kim, Luca Thomas Wheeler, Jared Watrous, Ashley Xu, Gia Ancone, Wanhee Lee, Honglin Chen, Daniel Bear, Stefan Stojanov, Daniel Yamins.
preprint, 2025
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Optimizing visual prompts enables state-of-the-art motion estimation in videos.
Stefan Stojanov*, David Wendt*, Seungwoo Kim*, Rahul Venkatesh*, Kevin Feigelis, Jiajun Wu, Daniel Yamins.
preprint, 2025
arxiv / project page / code
Weakly-Supervised Learning of Dense Functional Correspondences
Pseudo-labels from VLMs plus spatial contrastive losses enable dense functional correspondences.
Stefan Stojanov*, Linan Zhao*, Yunzhi Zhang, Dan Yamins, Jiajun Wu.
ICCV 2025 – poster
The BabyView Dataset: High-resolution Egocentric Videos of Infants’ and Young Children’s Everyday Experiences
A large egocentric dataset of infants and toddlers for vision and language learning.
Bria Long*, Robert Z. Sparks*, Violet Xiang*, Stefan Stojanov*, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank.
CCN 2025 – poster
3 × 2: 3D Object Part Segmentation by 2D Semantic Correspondences
Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, James M. Rehg, Matt Feiszli.
ECCV 2024 – poster
ZeroShape: Regression-based Zero-shot Shape Reconstruction
State-of-the-art 3D shape reconstruction with high computational efficiency and a low training-data budget.
Zixuan Huang*, Stefan Stojanov*, Anh Thai, Varun Jampani, James M. Rehg
CVPR 2024 – poster
paper / code / project page / demo
Low-shot Object Learning with Mutual Exclusivity Bias
Mutual exclusivity bias enables fast, generalizable learning of objects.
Anh Thai, Ahmad Humayun, Stefan Stojanov, Zixuan Huang, Bikram Boote, James M. Rehg
NeurIPS 2023 – Datasets and Benchmarks Track
paper / code / project page
ShapeClipper: Scalable 3D Shape Learning via Geometric and CLIP-based Consistency
CLIP and geometric consistency constraints facilitate scalable learning of object shape reconstruction.
Zixuan Huang, Varun Jampani, Anh Thai, Yuanzhen Li, Stefan Stojanov, James M. Rehg
CVPR 2023 – poster
arxiv / code / project page / video
Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization
Dense feature-level self-supervised learning from multiple camera views without any category labels leads to representations that can generalize to novel categories.
Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg
NeurIPS 2022 – poster
arxiv / code / project page / poster / video
Planes vs. Chairs: Category-guided 3D Shape Learning without any 3D Cues
A 3D-unsupervised model that learns shapes of multiple object categories at once.
Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg
ECCV 2022 – poster
arxiv / code / project page / poster / video
The Surprising Positive Knowledge Transfer in Continual 3D Object Shape Reconstruction
Continual learning of 3D shape reconstruction does not suffer from catastrophic forgetting as much as discriminative learning tasks.
Anh Thai, Stefan Stojanov, Zixuan Huang, James M. Rehg
3DV 2022 – poster
The Benefits of Depth Information for Head-Mounted Gaze Estimation
Fusing depth and image information improves deep models' robustness to fitment and slip for head-mounted gaze estimation.
Stefan Stojanov, Sachin Talathi, Abhishek Sharma
ETRA 2022 – short paper
Using Shape to Categorize: Low-Shot Learning with an Explicit Shape Bias
Learning representations that generalize based on 3D shape, then learning to map images into them, improves low-shot generalization.
Stefan Stojanov, Anh Thai, James M. Rehg
CVPR 2021 – poster
arxiv / code / dataset / project page
3D Reconstruction of Novel Object Shapes from Single Images
An implicit SDF representation-based method for single-view 3D shape reconstruction.
Anh Thai*, Stefan Stojanov*, James M. Rehg
3DV 2021 – oral
arxiv / code / project page
Incremental Object Learning from Contiguous Views
Repetition of learned concepts in continual learning ameliorates catastrophic forgetting.
Stefan Stojanov, Anh Thai*, Samarth Mishra*, James M. Rehg
CVPR 2019 – oral – Best Paper Award Finalist
Unsupervised 3D Pose Estimation with Geometric Self-supervision
Utilizing adversarial learning to estimate 3D human pose without 3D supervision.
Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Stefan Stojanov, James M. Rehg
CVPR 2019 – poster