Hengkai Guo
I currently lead a team at ByteDance, focusing on the development of advanced 3D vision and augmented reality (AR) technologies. In my role, I have spearheaded 3D vision projects aimed at enhancing AR interaction, 3D editing, and content creation. Prior to joining ByteDance, I completed both my bachelor's and master's degrees at Tsinghua University. I have also interned at Google, Hulu, Sensetime, and Microsoft.
Hiring: We are seeking talented individuals (both FTEs and interns) with expertise in generative models and 3D vision. If you're interested, please feel free to email me at guohengkaighk [AT] gmail [DOT] com.
Email /
Scholar /
GitHub
|
|
Research
My primary interest lies in 3D vision and its practical applications, including 3D localization, perception, reconstruction, understanding, and generation. (* indicates equal contribution, † indicates project lead.)
|
|
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Sili Chen,
Hengkai Guo†,
Shengnan Zhu,
Feihu Zhang,
Zilong Huang,
Jiashi Feng,
Bingyi Kang
arXiv, 2025
project page
/
arXiv
An accurate, consistent, efficient, and generalizable video depth estimator, capable of supporting videos of any length.
|
|
MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction
Wang Zhao,
Jiachen Liu,
Sheng Zhang,
Yishu Li,
Sili Chen,
Sharon X Huang,
Yong-Jin Liu,
Hengkai Guo†
IROS, 2024   (Oral Presentation)
code (Coming soon)
/
arXiv
Leverage pretrained depth and normal models to facilitate zero-shot monocular plane reconstruction.
|
|
Lazy Visual Localization via Motion Averaging
Siyan Dong*,
Shaohui Liu*,
Hengkai Guo,
Baoquan Chen,
Marc Pollefeys
arXiv, 2024
arXiv
Integrate motion averaging into visual localization to eliminate the need for pre-built 3D maps.
|
|
ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras
Wang Zhao,
Shaohui Liu,
Hengkai Guo†,
Wenping Wang,
Yong-Jin Liu
ECCV, 2022
project page
/
arXiv
/
code
A video structure-from-motion system with dense point trajectories that generalizes well to in-the-wild sequences with dynamic motion.
|
|
A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo
Wang Zhao*,
Shaohui Liu*,
Yi Wei,
Hengkai Guo,
Yong-Jin Liu
ICCV, 2021
project page
/
arXiv
/
code
A differentiable solver for multi-view stereo that iteratively solves for per-view depth and normal maps based on the locally planar assumption.
|
|
Iterative Feature Matching for Self-supervised Indoor Depth Estimation
Yi Wei,
Hengkai Guo†,
Jiwen Lu,
Jie Zhou
TCSVT, 2021
paper
A self-supervised depth estimation framework designed for low-texture indoor scenes, which operates without PoseNet by utilizing iterative feature matching.
|
|
GPO: Global Plane Optimization for Fast and Accurate Monocular SLAM Initialization
Sicong Du*,
Hengkai Guo*†,
Yao Chen,
Yilun Lin,
Xiangbing Meng,
Linfu Wen,
Fei-Yue Wang
ICRA, 2020
arXiv
A monocular initialization method for SLAM using multi-frame planar homographies.
|
|
Geometric Pretraining for Monocular Depth Estimation
Kaixuan Wang,
Yao Chen,
Hengkai Guo,
Linfu Wen,
Shaojie Shen
ICRA, 2020
paper
/
code
Use a self-supervised flow pretraining task to improve monocular depth estimation.
|
|
Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation
Xinghao Chen,
Guijin Wang,
Hengkai Guo,
Cairong Zhang
Neurocomputing, 2019
arXiv
/
code
Enhance RGB-D-based 3D hand pose estimation by fusing features from initial pose guidance, estimated through a region ensemble network.
|
|
Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images
Yi Wei,
Guijin Wang,
Cairong Zhang,
Hengkai Guo,
Xinghao Chen,
Huazhong Yang
VCIP, 2017   (Best Student Paper Award)
arXiv
/
dataset
A two-stream model designed to regress the 3D positions of fingertips from binocular images.
|
|
Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation
Hengkai Guo,
Guijin Wang,
Xinghao Chen,
Cairong Zhang,
Fei Qiao,
Huazhong Yang
ICIP, 2017
arXiv
/
code
A state-of-the-art RGB-D-based 3D hand pose model that utilizes a convolutional network and regional feature ensemble.
|
|
Motion Feature Augmented Recurrent Neural Network for Skeleton-based Dynamic Hand Gesture Recognition
Xinghao Chen,
Hengkai Guo,
Guijin Wang,
Cairong Zhang,
Li Zhang
ICIP, 2017
arXiv
Use a recurrent neural network with augmented motion features to recognize skeleton-based dynamic hand gestures.
|
|