Handling missing values is a crucial step before data analysis. This guide covers the key aspects of handling missing (NA) values and shows how to deal with them in R.

In this guide, we cover five ways of dealing with missing values. First, we learn how to find missing values in R. Second, we count the number of NA values. Third, we go through several ways to remove NA values. Then, we see how to return an error message when NA values exist. Finally, we learn how to leave the data unchanged.

To illustrate, we construct a 4×3 data frame containing two NA values.

# 4x3 data frame filled row by row; NA in row 3 (V2) and row 4 (V3)
data <- as.data.frame(matrix(c(1:7, NA, 8:10, NA), ncol = 3, byrow = TRUE))
data
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6
## 3  7 NA  8
## 4  9 10 NA


1. How to Discover NA in R

First, let’s find which cells of the data are missing by using the is.na() function. Then, we check which rows are complete by using the complete.cases() function: it returns TRUE for rows with no missing values and FALSE for rows that contain at least one NA.

is.na(data)
##         V1    V2    V3
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE  TRUE FALSE
## [4,] FALSE FALSE  TRUE

complete.cases(data)
## [1]  TRUE  TRUE FALSE FALSE
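To get the exact position of each missing cell, base R's which() with arr.ind = TRUE can be applied to the logical matrix returned by is.na(). The sketch below rebuilds the same example data; since complete.cases() returns TRUE for complete rows, negating it gives the indices of the rows that contain NA. This is a small extension of the functions above, not a separate method from the guide.

```r
# Rebuild the example data frame: NA in row 3 (V2) and row 4 (V3)
data <- as.data.frame(matrix(c(1:7, NA, 8:10, NA), ncol = 3, byrow = TRUE))

# Row/column position of every NA cell: (row 3, col 2) and (row 4, col 3)
na_positions <- which(is.na(data), arr.ind = TRUE)
na_positions

# Indices of rows containing at least one NA: 3 and 4
na_rows <- which(!complete.cases(data))
na_rows
```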

2. How to Count NA in R

In this part, we learn three ways of counting NA values in R. First, we count all missing values in the data with the sum() function. Then, we count the missing values in each column with the colSums() function. Finally, we count the NA values in each row with the rowSums() function.

sum(is.na(data))
## [1] 2

colSums(is.na(data))
## V1 V2 V3 
##  0  1  1

rowSums(is.na(data))
## [1] 0 0 1 1
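Because is.na() returns TRUE/FALSE values, which R treats as 1/0 in arithmetic, swapping sum() and colSums() for mean() and colMeans() gives the proportion of missing values instead of the count. A minimal sketch on the same example data:

```r
data <- as.data.frame(matrix(c(1:7, NA, 8:10, NA), ncol = 3, byrow = TRUE))

# Share of missing cells in the whole data frame (2 of 12 cells)
share_total <- mean(is.na(data))
share_total

# Share of missing values per column: V1 = 0, V2 = 0.25, V3 = 0.25
share_by_col <- colMeans(is.na(data))
share_by_col

# Names of the columns that contain at least one NA
cols_with_na <- names(data)[colSums(is.na(data)) > 0]
cols_with_na
```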


3. How to Remove NA in R

In this section, we work on six ways of removing NA values in R. First, we use brackets with the complete.cases() function to keep only complete rows. Second, we drop missing values with the na.omit() function. Third, we get rid of NA values with the na.exclude() function. Then, we remove all rows containing NA with the drop_na() function from the tidyr package (Wickham, 2020). Next, we remove only the rows that are missing in a single column passed to drop_na(). Finally, we remove the rows that are missing in any of several columns passed to drop_na().

data[complete.cases(data),]
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6

na.omit(data)
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6

na.exclude(data)
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6
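On a data frame, na.omit() and na.exclude() return the same rows. The difference appears when they are used as the na.action argument of a modeling function such as lm(): with na.exclude, residuals() pads its output with NA for the dropped rows, so it stays aligned with the original data. A minimal sketch, where regressing V3 on V1 is an arbitrary choice made only for illustration:

```r
data <- as.data.frame(matrix(c(1:7, NA, 8:10, NA), ncol = 3, byrow = TRUE))

# Row 4 has NA in V3, so both fits use rows 1 to 3 only
fit_omit <- lm(V3 ~ V1, data = data, na.action = na.omit)
fit_excl <- lm(V3 ~ V1, data = data, na.action = na.exclude)

# na.omit: one residual per row actually used (3 values)
res_omit <- residuals(fit_omit)
length(res_omit)

# na.exclude: residuals padded back to the original 4 rows, NA in row 4
res_excl <- residuals(fit_excl)
length(res_excl)
res_excl
```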

library(tidyr)
data %>% drop_na()
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6

data %>% drop_na(V2)
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6
## 3  9 10 NA

data %>% drop_na(V2,V3)
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6

4. How to Return an Error Message When NA Values Exist

In this part, we learn how to return an error message when the data contain missing values by using the na.fail() function.

na.fail(data)
## Error in na.fail.default(data) : missing values in object
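Because na.fail() stops execution, it works well as a guard at the start of a script. If you would rather catch the error than halt, tryCatch() can capture it; when the data contain no NA values, na.fail() simply returns them unchanged. A minimal sketch, where check_no_na() is a hypothetical helper name:

```r
data <- as.data.frame(matrix(c(1:7, NA, 8:10, NA), ncol = 3, byrow = TRUE))

# Hypothetical helper: TRUE if na.fail() passes, FALSE if it raises an error
check_no_na <- function(df) {
  tryCatch({
    na.fail(df)
    TRUE
  }, error = function(e) {
    message("Check failed: ", conditionMessage(e))
    FALSE
  })
}

check_no_na(data)           # FALSE, with a message
check_no_na(na.omit(data))  # TRUE
```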


5. How to Leave Data with No Action

To leave the data unchanged, without removing any missing values, we use the na.pass() function.

na.pass(data) 
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6
## 3  7 NA  8
## 4  9 10 NA
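On its own, na.pass() just returns its input. Like na.omit(), na.exclude() and na.fail(), it is mainly useful as the na.action argument of modeling functions. For example, model.frame() drops incomplete rows under na.omit but keeps all of them under na.pass. A minimal sketch:

```r
data <- as.data.frame(matrix(c(1:7, NA, 8:10, NA), ncol = 3, byrow = TRUE))

# With na.omit, rows 3 and 4 (which contain NA) are dropped: 2 rows remain
mf_omit <- model.frame(~ V1 + V2 + V3, data = data, na.action = na.omit)
nrow(mf_omit)

# With na.pass, all 4 rows are kept, NA values included
mf_pass <- model.frame(~ V1 + V2 + V3, data = data, na.action = na.pass)
nrow(mf_pass)
```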

A walkthrough of this code in RStudio is available on our YouTube channel, in the video "How to Handle Missing Values in R Using RStudio".


References

Wickham, H. (2020). tidyr: Tidy Messy Data. R package version 1.1.2.


Dr. Osman Dag