Abstract
Computer vision systems deployed on consumer and handheld sensors (smartphones, low-cost thermal cameras, and compact clinical imaging devices) face three recurring technical challenges: (i) limited labeled data, (ii) domain shift across devices and capture conditions, and (iii) tight constraints on latency, memory, and energy. This paper presents a unified scientific framework for data-efficient image understanding spanning (a) skin-region segmentation and skin-condition classification, (b) palmprint-based authentication on mobile devices, and (c) thermal-image-based attribute classification. We synthesize a consistent training and evaluation protocol combining transfer learning, self-supervised pretraining, strong augmentation, metric learning, calibration, and synthetic data generation using modern generative models. To rigorously quantify the impact of these components under varying data regimes, we provide a formal sensitivity analysis that models the full protocol and demonstrates how sample size, augmentation strength, and pretraining choice affect segmentation overlap, verification error, and calibration. The result is a practical blueprint for building robust consumer-sensor vision systems under data scarcity, alongside a transparent methodology for benchmarking.



![Author ORCID: We display the ORCID iD icon alongside authors names on our website to acknowledge that the ORCiD has been authenticated when entered by the user. To view the users ORCiD record click the icon. [opens in a new tab]](https://www.cambridge.org/engage/assets/public/coe/logo/orcid.png)