Hengkai Guo

I currently lead a team at ByteDance focused on developing advanced 3D vision and augmented reality (AR) technologies. In this role, I have spearheaded 3D vision projects aimed at enhancing AR interaction, 3D editing, and content creation. Before joining ByteDance, I completed both my bachelor's and master's degrees at Tsinghua University. I have also interned at Google, Hulu, SenseTime, and Microsoft.

Hiring: We are seeking talented individuals (both FTEs and interns) with expertise in generative models and 3D vision. If you're interested, please feel free to email me (Email: guohengkaighk [AT] gmail [DOT] com).

Email  /  Scholar  /  Github


Research

My primary interest lies in 3D vision and its practical applications, including 3D localization, perception, reconstruction, understanding, and generation. (* indicates equal contribution, † indicates project lead.)

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Sili Chen, Hengkai Guo†, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, Bingyi Kang
arXiv, 2025
project page / arXiv

An accurate, consistent, efficient, and generalizable video depth estimator, capable of supporting videos of any length.

MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction
Wang Zhao, Jiachen Liu, Sheng Zhang, Yishu Li, Sili Chen, Sharon X Huang, Yong-Jin Liu, Hengkai Guo
IROS, 2024   (Oral Presentation)
code (Coming soon) / arXiv

Leverage pretrained depth and normal models to facilitate zero-shot monocular plane reconstruction.

Lazy Visual Localization via Motion Averaging
Siyan Dong*, Shaohui Liu*, Hengkai Guo, Baoquan Chen, Marc Pollefeys
arXiv, 2024
arXiv

Integrate motion averaging into visual localization to eliminate the need for pre-built 3D maps.

ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras
Wang Zhao, Shaohui Liu, Hengkai Guo†, Wenping Wang, Yong-Jin Liu
ECCV, 2022
project page / arXiv / code

A video structure-from-motion system with dense point trajectories that generalizes well to in-the-wild sequences with dynamic motion.

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo
Wang Zhao*, Shaohui Liu*, Yi Wei, Hengkai Guo, Yong-Jin Liu
ICCV, 2021
project page / arXiv / code

A differentiable solver for multi-view stereo that iteratively solves for per-view depth and normal maps based on the locally planar assumption.

Iterative Feature Matching for Self-supervised Indoor Depth Estimation
Yi Wei, Hengkai Guo†, Jiwen Lu, Jie Zhou
TCSVT, 2021
paper

A self-supervised depth estimation framework designed for low-texture indoor scenes, which operates without PoseNet by utilizing iterative feature matching.

GPO: Global Plane Optimization for Fast and Accurate Monocular SLAM Initialization
Sicong Du*, Hengkai Guo*†, Yao Chen, Yilun Lin, Xiangbing Meng, Linfu Wen, Fei-Yue Wang
ICRA, 2020
arXiv

A monocular initialization method for SLAM using multi-frame planar homographies.

Geometric Pretraining for Monocular Depth Estimation
Kaixuan Wang, Yao Chen, Hengkai Guo, Linfu Wen, Shaojie Shen
ICRA, 2020
paper / code

Use a self-supervised flow pretraining task to improve monocular depth estimation.

Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation
Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang
Neurocomputing, 2019
arXiv / code

Enhance RGB-D-based 3D hand pose estimation by fusing features from initial pose guidance, estimated through a region ensemble network.

Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images
Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang
VCIP, 2017   (Best Student Paper Award)
arXiv / dataset

A two-stream model designed to regress the 3D positions of fingertips from binocular images.

Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation
Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang
ICIP, 2017
arXiv / code

A state-of-the-art RGB-D-based 3D hand pose model that utilizes a convolutional network and regional feature ensemble.

Motion Feature Augmented Recurrent Neural Network for Skeleton-based Dynamic Hand Gesture Recognition
Xinghao Chen, Hengkai Guo, Guijin Wang, Cairong Zhang, Li Zhang
ICIP, 2017
arXiv

Use a recurrent neural network with augmented motion features to recognize skeleton-based dynamic hand gestures.

Miscellanea

Competitions

Detection In the Wild Challenge at CVPR 2019: 2nd place in the Objects365 Full Track
PoseTrack Challenge at ECCV 2018: 2nd place in human pose tracking without extra datasets

Honors and Awards

Intel Scholarship, 2016 (5 people in Tsinghua University)
Guanghua Scholarship, 2015
Excellent Graduate of Tsinghua University, 2014
Nanxiang Jiang Scholarship, 2013 (Top scholarship, 19 people in Tsinghua University)


© Hengkai Guo | Last updated: Feb 2025

Design and source code from Jon Barron's website