In practice, data sets are rarely complete, so most analyses have to deal with missing values. This tutorial covers three simple imputation techniques in R: replacing missing values with the mean, the median, or the mode of the observed data.
We start with mean imputation, then go over median imputation, and finally cover mode imputation.
1) How to Perform Mean Imputation in R
In our example, we create a vector that contains one missing observation. We locate the missing observation with the is.na() function. We then call mean() with na.rm = TRUE, which computes the mean of the observed values only, and assign the result to the missing position. Here the mean of the observed values is (100 + 200 + 300 + 300) / 4 = 225.
data <- c(100, 200, 300, 300, NA)
data[is.na(data)] <- mean(data, na.rm = TRUE)
data
## [1] 100 200 300 300 225
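The same steps can be wrapped in a small function so they can be reused on other vectors. This is only a minimal sketch: impute_mean is a hypothetical helper name introduced for illustration, not a base R function, and it assumes a numeric vector with at least one observed value.

```r
# Hypothetical helper (not part of base R): replace NAs in a numeric
# vector with the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

data <- c(100, 200, 300, 300, NA)

# Position(s) of the missing observations
missing_pos <- which(is.na(data))

# Imputed copy; the original vector is left unchanged
imputed <- impute_mean(data)
imputed
```

Keeping the original vector and working on an imputed copy makes it easy to check later which values were observed and which were filled in.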
2) How to Perform Median Imputation in R
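In this section, we learn how to perform median imputation in R. We compute the median of the vector with the median() function, again using na.rm = TRUE to leave out the missing observations, and assign it to the missing position. For our example data, the median of the observed values is 250, the average of the two middle values 200 and 300.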
data <- c(100, 200, 300, 300, NA)
data[is.na(data)] <- median(data, na.rm = TRUE)
data
## [1] 100 200 300 300 250
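In practice the missing values usually sit in the columns of a data frame rather than in a single vector. The sketch below applies the same median imputation to every numeric column. The data frame df and its column names are made up for illustration.

```r
# Hypothetical data frame; column names and values are illustrative only.
df <- data.frame(
  height = c(150, 160, NA, 180),
  weight = c(50, NA, 70, 90)
)

# Identify the numeric columns
num_cols <- vapply(df, is.numeric, logical(1))

# Replace NAs in each numeric column with that column's median
df[num_cols] <- lapply(df[num_cols], function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})
df
```

Each column is imputed with its own median, so the missing height becomes 160 and the missing weight becomes 70.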
3) How to Perform Mode Imputation in R
In this part, we go over how to implement mode imputation in R. This type of imputation is generally used for categorical variables. Base R has no function that returns the statistical mode (the mode() function returns the storage mode of an object), so we compute it ourselves. First, we count the frequency of each value with the table() function, which drops NAs by default. Next, we use which.max() to find the position of the highest frequency; if several values are tied, it returns the first one. Then we use names() to extract the corresponding value, which is returned as a character string. Because our example vector is numeric, we convert the result back with as.numeric(). In our example, the mode of the observed values is 300.
data <- c(100, 200, 300, 300, NA)
data[is.na(data)] <- as.numeric(names(which.max(table(data))))
data
## [1] 100 200 300 300 300
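Since mode imputation is mostly used for categorical variables, here is a minimal sketch for a character vector, where the as.numeric() conversion is not needed. impute_mode is a hypothetical helper name and the colors vector is made-up example data.

```r
# Hypothetical helper (not part of base R): replace NAs in a character
# vector with the most frequent observed value.
impute_mode <- function(x) {
  freq <- table(x)                            # NAs are dropped by default
  mode_value <- names(freq)[which.max(freq)]  # first value with the highest count
  x[is.na(x)] <- mode_value
  x
}

colors <- c("red", "blue", "blue", NA, "green")
colors_imputed <- impute_mode(colors)
colors_imputed
```

When several values share the highest frequency, which.max() picks the first one in the order used by table(), which is alphabetical for character data, so the choice among tied values is arbitrary.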