Best Practice in Handling Cases of Missing or Incomplete Values in Data Analysis: A Guide against Eliminating Other Important Data

  • Create Date August 2, 2018
  • Last Updated August 2, 2018

A common issue in data analysis is the presence of missing values in datasets. Modern statistical techniques require complete data, but statistical packages often default to the least desirable options for handling missing data (excluding cases listwise, pairwise, or analysis by analysis). Letting software packages remove incomplete cases often eliminates a good deal of other important data that contribute to the overall analysis results. Incomplete or missing data affect the precision and validity of the estimates, depending on the extent of 'missingness'. Various methods are available for handling missing values before data analysis. This study compares the results of analysis using complete data, data missing completely at random (MCAR) or missing at random (MAR), mean substitution (MS), strong and weak imputation, and multiple imputation (MI). A random sample of 3,000 examinee responses in UTME Physics was extracted and analyzed, and about 20% of the data were deleted to simulate an MCAR situation. When the Physics scores were correlated with the UTME aggregate scores in the original dataset, the results showed a significant relationship, moderated by the discipline the examinees applied for. Missing data corrections using mean substitution, deletion, and weak or strong imputation were found to be biased, with population parameters being overestimated or underestimated. MI yielded the closest significant estimate of the population parameter (p < 0.05). It is recommended that the MI method be used in handling missing or incomplete data because of its promise of providing unbiased estimates of relationships in MNAR situations.
Missing values or incomplete data should not be discarded in analysis, because doing so reduces the sample size and the degrees of freedom available for analysis and produces biased estimates of the population parameters.

Keywords: UTME, missingness, dataset, multiple imputations, unbiased estimate
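
The comparison described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' data or procedure. The two columns are simulated stand-ins for Physics and aggregate scores, the true correlation of 0.6 is an arbitrary assumption, and all function names are hypothetical. The multiple imputation step is a simplified bootstrap-based stochastic regression imputation, pooled on the Fisher z scale, and is not the output of any particular statistical package.

```python
import numpy as np

def simulate_scores(n=3000, rho=0.6, seed=0):
    """Synthetic stand-in for two correlated score columns (not UTME data)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)

def delete_mcar(x, frac=0.2, seed=1):
    """Set a random fraction of entries to NaN, independent of any values (MCAR)."""
    rng = np.random.default_rng(seed)
    out = x.copy()
    idx = rng.choice(len(x), size=int(frac * len(x)), replace=False)
    out[idx] = np.nan
    return out

def corr_listwise(x, y):
    """Correlation using only the cases where x is observed."""
    keep = ~np.isnan(x)
    return np.corrcoef(x[keep], y[keep])[0, 1]

def corr_mean_substitution(x, y):
    """Correlation after replacing every missing x with the observed mean."""
    filled = np.where(np.isnan(x), np.nanmean(x), x)
    return np.corrcoef(filled, y)[0, 1]

def corr_multiple_imputation(x, y, m=20, seed=2):
    """Simplified MI: m bootstrap regression imputations, pooled via Fisher z."""
    rng = np.random.default_rng(seed)
    obs = np.flatnonzero(~np.isnan(x))
    mis = np.flatnonzero(np.isnan(x))
    z_values = []
    for _ in range(m):
        # Refit the imputation model on a bootstrap sample of the observed
        # cases so that parameter uncertainty carries into the imputations.
        boot = rng.choice(obs, size=obs.size, replace=True)
        slope, intercept = np.polyfit(y[boot], x[boot], 1)
        resid_sd = np.std(x[boot] - (intercept + slope * y[boot]), ddof=2)
        completed = x.copy()
        # Add residual noise so imputed values are not artificially precise.
        completed[mis] = (intercept + slope * y[mis]
                          + rng.normal(0.0, resid_sd, mis.size))
        z_values.append(np.arctanh(np.corrcoef(completed, y)[0, 1]))
    return np.tanh(np.mean(z_values))

data = simulate_scores()
physics, aggregate = data[:, 0], data[:, 1]
physics_missing = delete_mcar(physics)

results = {
    "complete data": np.corrcoef(physics, aggregate)[0, 1],
    "listwise deletion": corr_listwise(physics_missing, aggregate),
    "mean substitution": corr_mean_substitution(physics_missing, aggregate),
    "multiple imputation": corr_multiple_imputation(physics_missing, aggregate),
}
for method, r in results.items():
    print(f"{method:>20}: r = {r:.3f}")
```

In this synthetic setup, mean substitution pulls the correlation toward zero, because the substituted values are constant and add no covariance with the second variable. Results on real examination data, or under a different missingness mechanism, may differ from this illustration.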
