Evaluation of SnDA: A Novel Data Preprocessing Technique with Theoretical Foundations and Properties

02 January 2026, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Data preprocessing is a vital step in machine learning and data science, aimed at enhancing data quality and improving model performance. Existing preprocessing techniques focus on different challenges, but effectively mitigating noise in data stands out as a fundamental one. Each method offers distinct advantages and limitations, so its applicability depends on the specific characteristics of the dataset at hand. This paper explores a novel denoising method named Sequential ‘n’ Distance Average (SnDA), introduced as a unified and lightweight approach to preparing data for more accurate analysis and modelling. The core SnDA method operates by sorting the original data, computing the average of successive differences, and reconstructing a smoothed sequence. This approach smooths the data, mitigating noise while preserving the underlying structure. The key advantage of SnDA is that it enhances data quality without distorting its scale or distribution, making it a robust and versatile preprocessing tool for various analytical and predictive tasks. Because datasets differ in their characteristics and preprocessing requirements, two SnDA variants have been developed, namely “Modified SnDA” and “Adaptive SnDA Glide.” Each variant maintains the core principle of SnDA, computing the average of successive differences, but incorporates additional steps or modifications to better suit particular data scenarios. Hence SnDA, along with its variants, can be adapted to meet the unique challenges presented by different datasets.
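
The three core steps named in the abstract (sort, average the successive differences, reconstruct) can be illustrated with a minimal sketch. The abstract does not give the exact formulas or define the role of ‘n’, so everything below is an assumption made for illustration: the function name `snda_sketch` is hypothetical, the average is taken over all successive differences of the sorted data, the smoothed sequence is rebuilt as an evenly spaced progression starting at the minimum, and the result is mapped back to the original positions by rank. It should not be read as the authors' implementation.

```python
import numpy as np


def snda_sketch(values):
    """Illustrative sketch only; assumptions are described in the lead-in."""
    x = np.asarray(values, dtype=float)
    if x.size < 2:
        # Nothing to smooth with fewer than two points.
        return x.copy()

    # Step 1: sort the data, remembering where each value came from.
    order = np.argsort(x, kind="stable")
    sorted_x = x[order]

    # Step 2: average of successive differences. The sum telescopes,
    # so this equals (max - min) / (n - 1).
    mean_step = np.diff(sorted_x).mean()

    # Step 3 (assumed reconstruction): an evenly spaced sequence
    # starting at the minimum, with the mean difference as the step.
    smoothed_sorted = sorted_x[0] + mean_step * np.arange(x.size)

    # Assumed final step: return values in their original positions.
    smoothed = np.empty_like(x)
    smoothed[order] = smoothed_sorted
    return smoothed


data = [3.0, 1.0, 10.0, 2.0]
print(snda_sketch(data))  # smoothed values 7, 1, 10, 4 in the original order
```

Under these assumptions the minimum and maximum of the input are unchanged, which is one reading of the claim that the scale of the data is not distorted. The variants mentioned in the abstract are not covered by this sketch.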

Keywords

Arithmetic Progression
Sequential Difference
Data smoothing
Trend Preservation
Noise reduction
Dynamic Range adjustment
Telescopic Range
Telescopic Mean
Outlier
