Hengkai Guo's Home Page

Hengkai Guo

I currently lead a team at ByteDance focused on developing advanced 3D vision and augmented reality (AR) technologies. In this role, I have spearheaded 3D vision projects aimed at enhancing AR interaction, 3D editing, and content creation. Before joining ByteDance, I completed both my bachelor's and master's degrees at Tsinghua University. I have also interned at Google, Hulu, SenseTime, and Microsoft.

Hiring: We are seeking talented individuals (both FTEs and interns) with expertise in generative models and 3D vision. If you're interested, please feel free to email me (Email: guohengkaighk [AT] gmail [DOT] com).

Email / Scholar / Github

Research

My primary interest lies in 3D vision and its practical applications, including 3D localization, perception, reconstruction, understanding, and generation. (* indicates equal contribution, † indicates project lead.)

	Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
	Sili Chen, Hengkai Guo†, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, Bingyi Kang
	CVPR, 2025 (Highlight Presentation)
	project page / code / arXiv
	An accurate, consistent, efficient, and generalizable video depth estimator that supports videos of any length.

	ZeroPlane: Towards In-the-wild 3D Plane Reconstruction from a Single Image
	Jiachen Liu, Rui Yu, Sili Chen, Sharon X Huang, Hengkai Guo†
	CVPR, 2025 (Highlight Presentation)
	code / arXiv
	A zero-shot monocular plane estimator that works across diverse domains.

	MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction
	Wang Zhao, Jiachen Liu, Sheng Zhang, Yishu Li, Sili Chen, Sharon X Huang, Yong-Jin Liu, Hengkai Guo†
	IROS, 2024 (Oral Presentation)
	code (Coming soon) / arXiv
	Leverage pretrained depth and normal models to facilitate zero-shot monocular plane reconstruction.

	Lazy Visual Localization via Motion Averaging
	Siyan Dong, Shaohui Liu, Hengkai Guo, Baoquan Chen, Marc Pollefeys
	arXiv, 2024
	arXiv
	Integrate motion averaging into visual localization to eliminate the need for pre-built 3D maps.

	ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras
	Wang Zhao, Shaohui Liu, Hengkai Guo†, Wenping Wang, Yong-Jin Liu
	ECCV, 2022
	project page / arXiv / code
	A video structure-from-motion system built on dense point trajectories that generalizes well to in-the-wild sequences with dynamic motion.

	A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo
	Wang Zhao, Shaohui Liu, Yi Wei, Hengkai Guo, Yong-Jin Liu
	ICCV, 2021
	project page / arXiv / code
	A differentiable solver for multi-view stereo that iteratively solves for per-view depth and normal maps based on a locally planar assumption.

	Iterative Feature Matching for Self-supervised Indoor Depth Estimation
	Yi Wei, Hengkai Guo†, Jiwen Lu, Jie Zhou
	TCSVT, 2021
	paper
	A self-supervised depth estimation framework for low-texture indoor scenes that replaces PoseNet with iterative feature matching.

	GPO: Global Plane Optimization for Fast and Accurate Monocular SLAM Initialization
	Sicong Du, Hengkai Guo*†, Yao Chen, Yilun Lin, Xiangbing Meng, Linfu Wen, Fei-Yue Wang
	ICRA*, 2020
	arXiv
	A monocular initialization method for SLAM using multi-frame planar homographies.

	Geometric Pretraining for Monocular Depth Estimation
	Kaixuan Wang, Yao Chen, Hengkai Guo, Linfu Wen, Shaojie Shen
	ICRA, 2020
	paper / code
	Use a self-supervised flow pretraining task to improve monocular depth estimation.

	Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation
	Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang
	Neurocomputing, 2019
	arXiv / code
	Enhance RGB-D-based 3D hand pose estimation by fusing features under the guidance of an initial pose estimated by a region ensemble network.

	Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images
	Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang
	VCIP, 2017 (Best Student Paper Award)
	arXiv / dataset
	A two-stream model that regresses the 3D positions of fingertips from binocular images.

	Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation
	Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang
	ICIP, 2017
	arXiv / code
	A state-of-the-art RGB-D-based 3D hand pose model that combines a convolutional network with a regional feature ensemble.

	Motion Feature Augmented Recurrent Neural Network for Skeleton-based Dynamic Hand Gesture Recognition
	Xinghao Chen, Hengkai Guo, Guijin Wang, Cairong Zhang, Li Zhang
	ICIP, 2017
	arXiv
	Use a recurrent neural network with augmented motion features to recognize skeleton-based dynamic hand gestures.

Miscellanea

Competitions
	Detection In the Wild Challenge at CVPR 2019: 2nd place in the Objects365 Full Track
	PoseTrack Challenge at ECCV 2018: 2nd place in human pose tracking without extra datasets

Honors and Awards
	Intel Scholarship, 2016 (5 recipients at Tsinghua University)
	Guanghua Scholarship, 2015
	Excellent Graduate of Tsinghua University, 2014
	Nanxiang Jiang Scholarship, 2013 (Top scholarship, 19 recipients at Tsinghua University)

Design and source code from Jon Barron's website
