Abstract
Accurate monocular depth estimation is a fundamental computer vision challenge with applications in autonomous navigation, robotics, and augmented reality. Projecting a 3D scene onto a 2D image plane discards depth information, creating ambiguities that are inherently difficult to resolve. Furthermore, the scarcity of high-quality ground-truth annotations limits how well supervised methods generalize. This paper introduces SynDepth-Hybrid, a semi-supervised framework that combines synthetic domain adaptation with variational geometric refinement to address these limitations. Our architecture pairs a deep residual convolutional neural network, which produces an initial coarse prediction, with a differentiable refinement module that enforces physical constraints such as piecewise smoothness and boundary consistency. The core innovation is a two-stage training paradigm: pre-training on photorealistic synthetic data, followed by unsupervised domain adaptation to real-world imagery using photometric and geometric constraints. We formulate refinement as a constrained energy minimization problem regularized by anisotropic image gradients and surface smoothness priors. Evaluations on the KITTI, NYU Depth v2, and Make3D benchmarks demonstrate state-of-the-art performance, particularly at object boundaries, where traditional methods suffer from over-smoothing. Results show a 23.7% improvement in RMSE and a 31.2% improvement in boundary-aware metrics over supervised baselines. The mathematical formulation provides theoretical guarantees on the module's convergence and stability, while the architecture remains efficient enough for near real-time applications.
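To make the idea of a smoothness term regularized by anisotropic image gradients concrete, the following is a minimal sketch and not the formulation used in SynDepth-Hybrid. It assumes an exponential edge weighting, a common choice in self-supervised depth estimation; the function name `edge_aware_smoothness` and the parameter `alpha` are hypothetical and introduced only for illustration.

```python
import numpy as np


def edge_aware_smoothness(depth, image, alpha=10.0):
    """Illustrative edge-aware smoothness energy (an assumed form, not the paper's).

    depth: (H, W) float array of predicted depth.
    image: (H, W) grayscale float array, roughly in [0, 1].
    alpha: hypothetical edge-sensitivity parameter.
    """
    # Forward differences along x (columns) and y (rows).
    dx_d = np.diff(depth, axis=1)
    dy_d = np.diff(depth, axis=0)
    dx_i = np.diff(image, axis=1)
    dy_i = np.diff(image, axis=0)

    # Weights shrink where the image gradient is strong, so depth
    # discontinuities are penalized less at likely object boundaries.
    wx = np.exp(-alpha * np.abs(dx_i))
    wy = np.exp(-alpha * np.abs(dy_i))

    # Mean weighted L1 norm of the depth gradient in each direction.
    return float((wx * np.abs(dx_d)).mean() + (wy * np.abs(dy_d)).mean())


# Toy scene: an intensity edge between columns 3 and 4.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Depth step aligned with the image edge.
depth_aligned = 1.0 + image

# Depth step at a location where the image is flat.
depth_misaligned = np.ones((8, 8))
depth_misaligned[:, 2:] = 2.0

energy_aligned = edge_aware_smoothness(depth_aligned, image)
energy_misaligned = edge_aware_smoothness(depth_misaligned, image)
```

Under these assumptions, a depth discontinuity that coincides with an image edge incurs a much smaller penalty than one in a textureless region, which is the behavior needed to avoid over-smoothing at object boundaries.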


