Feature Engineering for Healthcare Big Data: Approaches to Missing Data Imputation, Dimensionality Reduction, and Time-Series Analysis
Abstract
In healthcare analytics, and particularly in work with electronic health records (EHRs), feature engineering is central to turning Big Data into clinically meaningful insight. EHR data at scale carry a wealth of temporal and heterogeneous information, but they are difficult to model because of their high dimensionality, irregular sampling, and large proportions of missing values. This paper reviews key methods for missing data imputation, dimensionality reduction, and time-series modeling. In addition, we construct an empirically grounded feature engineering pipeline informed by real-world experience, including substantial deduplication projects on clinical data. Beyond offering insights, the review and framework serve as a practical guide for researchers and practitioners seeking to make better use of EHR data in predictive modeling, patient stratification, and population health analytics.
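As a minimal illustration of how the three themes named in the abstract can fit together, the sketch below bins irregularly timed observations into fixed windows, imputes the empty windows, and compresses the result with principal component analysis. It is not the pipeline developed in this paper: the helper names (`bin_irregular`, `locf_impute`, `pca`), the one-hour bin width, last-observation-carried-forward (LOCF) imputation with a population-mean fallback, the added missingness mask, and the toy heart-rate values are all illustrative assumptions.

```python
import numpy as np

def bin_irregular(times_h, values, n_bins, bin_width_h=1.0):
    """Average irregularly timed observations into fixed-width time bins.
    Bins with no observation are left as NaN (i.e. missing)."""
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    idx = np.floor(np.asarray(times_h, dtype=float) / bin_width_h).astype(int)
    keep = (idx >= 0) & (idx < n_bins)  # drop observations outside the window
    np.add.at(sums, idx[keep], np.asarray(values, dtype=float)[keep])
    np.add.at(counts, idx[keep], 1)
    out = np.full(n_bins, np.nan)
    observed = counts > 0
    out[observed] = sums[observed] / counts[observed]
    return out

def locf_impute(series, fallback):
    """Carry the last observation forward; leading gaps use `fallback`
    (assumed here to be a population mean). Also returns a 0/1 mask
    marking which bins were originally missing."""
    mask = np.isnan(series).astype(float)
    out = series.copy()
    last = fallback
    for t in range(len(out)):
        if np.isnan(out[t]):
            out[t] = last
        else:
            last = out[t]
    return out, mask

def pca(X, k):
    """Project the rows of X onto their top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Hypothetical toy data: (observation times in hours, heart-rate values) per patient.
patients = [
    ([0.2, 0.7, 3.5], [88, 92, 101]),
    ([1.1, 4.9], [75, 79]),
    ([2.4, 2.6, 5.2], [110, 114, 108]),
]
n_bins = 6
pop_mean = np.mean([v for _, vals in patients for v in vals])

rows = []
for times, vals in patients:
    binned = bin_irregular(times, vals, n_bins)
    imputed, mask = locf_impute(binned, pop_mean)
    rows.append(np.concatenate([imputed, mask]))  # keep missingness as features

X = np.vstack(rows)   # one row per patient: 6 imputed values + 6 mask bits
Z = pca(X, k=2)       # reduced 2-dimensional representation
print(X.shape, Z.shape)  # prints (3, 12) (3, 2)
```

Keeping the missingness mask alongside the imputed values lets a downstream model distinguish measured from filled-in values, which matters in EHRs where the fact that a test was not ordered can itself be informative. In practice the columns would usually be standardized before PCA, since raw vital-sign magnitudes would otherwise dominate the 0/1 mask features.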
How to Cite This Article
Simran Sethi (2020). Feature Engineering for Healthcare Big Data: Approaches to Missing Data Imputation, Dimensionality Reduction, and Time-Series Analysis. International Journal of Multidisciplinary Research and Growth Evaluation (IJMRGE), 1(1), 120-124. DOI: https://doi.org/10.54660/.IJMRGE.2020.1.1.120-124