Data augmentation techniques for ML models: Enhancing model performance through data variability
Abstract
The performance of machine learning (ML) models is closely tied to the quality, quantity, and diversity of their training datasets. Insufficient data or datasets lacking variability often lead to overfitting, where models excel on training data but fail to generalize to unseen examples. Data augmentation, a technique that artificially expands datasets by applying transformations, has become an indispensable tool for improving ML model performance. This paper explores a range of data augmentation techniques, including traditional methods such as image flipping and rotation, advanced approaches such as GAN-generated synthetic data, and hybrid strategies such as mixup and CutMix. Using an image classification task as a baseline, we demonstrate how these techniques improve model robustness, increase accuracy by 15–20%, and reduce overfitting by 20%. The paper also examines how data augmentation adapts to different data types, including text, images, and videos, and discusses the unique challenges and benefits associated with each. Future directions focus on automating augmentation pipelines to optimize their application across domains, ultimately making ML workflows more robust and scalable.
How to Cite This Article
Cibaca Khandelwal (2021). Data augmentation techniques for ML models: Enhancing model performance through data variability. International Journal of Multidisciplinary Research and Growth Evaluation (IJMRGE), 2(2), 279-283. DOI: https://doi.org/10.54660/.IJMRGE.2021.2.1-279-283