Subsetting, that is, pulling out only the rows and columns you need, is one of the most essential parts of data manipulation in R. This article goes through subsetting a data frame in detail.

In this article, we will work through 6 ways to subset a data frame in R. First, we will subset using brackets by selecting the rows and columns we want. Second, we will subset by excluding the rows and columns we don't want. Third, we will select specific data by using brackets in combination with the which() function. Fourth, we will subset a data frame using the subset() function. Fifth, we will subset using the select() and filter() functions from the dplyr package (Wickham et al., 2020). Finally, we will select a random sample from the data using the sample() function.

In this article, we use the iris data set that ships with R. First, let's check the class and dimensions of the data we will work on.

class(iris)
## [1] "data.frame"

dim(iris)
## [1] 150   5
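Beyond class() and dim(), it can help to look at the column names and types before deciding what to subset. The following is a minimal sketch using only base R functions on the built-in iris data; the variable names col_names and species_levels are introduced here for illustration.

```r
# Column names, useful later when selecting columns by name
col_names <- names(iris)
col_names

# Compact overview of column types and the first few values
str(iris)

# Levels of the Species factor, useful later in row conditions
species_levels <- levels(iris$Species)
species_levels
```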

1. Subset Using Brackets by Selecting Rows and Columns

In this part, we use brackets to select rows and columns. First, we pull the first three rows of the data. Then, we select the first three rows together with the third through fifth columns.

iris[1:3, ]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa

iris[1:3, 3:5]
##   Petal.Length Petal.Width Species
## 1          1.4         0.2  setosa
## 2          1.4         0.2  setosa
## 3          1.3         0.2  setosa
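Brackets also accept column names, which tends to be more robust than positions if the column order ever changes. The sketch below additionally illustrates a common surprise: selecting a single column returns a plain vector unless drop = FALSE is given. The variable names are chosen here for illustration.

```r
# Select rows 1 to 3 and two columns by name instead of by position
by_name <- iris[1:3, c("Petal.Length", "Species")]

# Selecting a single column returns a plain vector by default...
as_vector <- iris[1:3, "Petal.Length"]

# ...unless drop = FALSE is given, which keeps the data frame structure
as_frame <- iris[1:3, "Petal.Length", drop = FALSE]

class(as_vector)  # "numeric"
class(as_frame)   # "data.frame"
```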

2. Subset Using Brackets by Excluding Rows and Columns

We can obtain the same subsets by excluding the rows and columns we don't want, using negative indices.

iris[-(4:150), ]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa

iris[-(4:150), -(1:2)]
##   Petal.Length Petal.Width Species
## 1          1.4         0.2  setosa
## 2          1.4         0.2  setosa
## 3          1.3         0.2  setosa
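Negative indices only work with positions, and positive and negative indices cannot be mixed in the same vector. To exclude columns by name, one option is a logical test on names(), sketched below. The helper vector drop_cols is introduced here for illustration.

```r
# Columns to drop, given by name
drop_cols <- c("Sepal.Length", "Sepal.Width")

# Keep every column whose name is not in drop_cols
kept <- iris[1:3, !(names(iris) %in% drop_cols)]
kept

# Check that this matches excluding positions 1 and 2
all.equal(kept, iris[1:3, -c(1, 2)])
```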

3. Subset Using Brackets with which() Function

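With the which() function, we can select a subset of the data based on a condition. which() returns the positions at which a logical condition is TRUE, and those positions are then used as row indices. For example, let's select the setosa flowers whose sepal length is greater than 5.6. We can also keep only the third through fifth columns of the same subset.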

iris[which(iris$Species == "setosa" & iris$Sepal.Length > 5.6), ]
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 15          5.8         4.0          1.2         0.2  setosa
## 16          5.7         4.4          1.5         0.4  setosa
## 19          5.7         3.8          1.7         0.3  setosa

iris[which(iris$Species == "setosa" & iris$Sepal.Length > 5.6), 3:5]
##    Petal.Length Petal.Width Species
## 15          1.2         0.2  setosa
## 16          1.5         0.4  setosa
## 19          1.7         0.3  setosa
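One practical reason to wrap the condition in which() is the handling of missing values. The sketch below assumes a hypothetical copy of iris, iris_na, with one sepal length set to NA. Indexing with the logical condition alone keeps a row of NAs for that case, while which() drops it.

```r
# Hypothetical copy of iris with one missing sepal length (row 15)
iris_na <- iris
iris_na$Sepal.Length[15] <- NA

# The condition is NA for row 15, because TRUE & NA is NA
cond <- iris_na$Species == "setosa" & iris_na$Sepal.Length > 5.6

# Logical indexing returns an extra all-NA row for the NA condition
nrow(iris_na[cond, ])         # 3

# which() returns only the positions where the condition is TRUE
nrow(iris_na[which(cond), ])  # 2
```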

4. Subset Data with subset() Function

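We can select the same subsets with the subset() function. Its second argument is the row condition, and its third argument (select) specifies the columns to keep.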

subset(iris, Species == "setosa" & Sepal.Length > 5.6)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 15          5.8         4.0          1.2         0.2  setosa
## 16          5.7         4.4          1.5         0.4  setosa
## 19          5.7         3.8          1.7         0.3  setosa

subset(iris, Species == "setosa" & Sepal.Length > 5.6, select = 3:5)
##    Petal.Length Petal.Width Species
## 15          1.2         0.2  setosa
## 16          1.5         0.4  setosa
## 19          1.7         0.3  setosa
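The select argument of subset() also accepts unquoted column names, including name ranges and negated names. The following sketch shows both forms; the variable names by_range and by_drop are chosen here for illustration.

```r
# Keep a range of columns, given by name
by_range <- subset(iris,
                   Species == "setosa" & Sepal.Length > 5.6,
                   select = Petal.Length:Species)

# Drop columns by name with a minus sign
by_drop <- subset(iris,
                  Species == "setosa" & Sepal.Length > 5.6,
                  select = -c(Sepal.Length, Sepal.Width))

# Both approaches should give the same three columns
all.equal(by_range, by_drop)
```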

5. Subset Data with a Combination of select() and filter() Functions

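We can obtain the same subsets using the filter() and select() functions from the dplyr package (Wickham et al., 2020). Note that filter() renumbers the rows of a data frame, so the row names in the output are 1 to 3 rather than 15, 16, and 19.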

library(dplyr)
filter(iris, Species == "setosa" & Sepal.Length > 5.6)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.8         4.0          1.2         0.2  setosa
## 2          5.7         4.4          1.5         0.4  setosa
## 3          5.7         3.8          1.7         0.3  setosa

select(filter(iris, Species == "setosa" & Sepal.Length > 5.6), 3:5)
##   Petal.Length Petal.Width Species
## 1          1.2         0.2  setosa
## 2          1.5         0.4  setosa
## 3          1.7         0.3  setosa
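Nested calls like the one above can become hard to read. A common alternative is the pipe operator %>%, which dplyr makes available, so that the steps read from left to right. The sketch below assumes dplyr is installed; the variable name setosa_long is chosen here for illustration.

```r
library(dplyr)

# Filter rows first, then select columns by name.
# Conditions separated by commas in filter() are combined with AND.
setosa_long <- iris %>%
  filter(Species == "setosa", Sepal.Length > 5.6) %>%
  select(Petal.Length:Species)

setosa_long
```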

6. Subset a Random Sample with sample() Function

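Lastly, we will learn how to draw a random subset of rows from a data frame with the sample() function. Setting a seed makes the draw reproducible, but the rows drawn for a given seed depend on the R version: R 3.6.0 changed the default sampling algorithm, so newer versions may return different rows from those shown here.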

set.seed(123) # Set the seed so the result is reproducible
iris[sample(1:nrow(iris), 3, replace = FALSE), ]
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 44           5.0         3.5          1.6         0.6     setosa
## 118          7.7         3.8          6.7         2.2  virginica
## 61           5.0         2.0          3.5         1.0 versicolor

set.seed(123) # Reset the seed to draw the same rows again
iris[sample(1:nrow(iris), 3, replace = FALSE), 3:5]
##     Petal.Length Petal.Width    Species
## 44           1.6         0.6     setosa
## 118          6.7         2.2  virginica
## 61           3.5         1.0 versicolor
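The same idea extends to drawing a fraction of the rows and keeping the rest, for example to split the data into two parts. The following is a minimal sketch under the assumption of a 10% sample; the names n_sample, picked, sampled, and remaining are introduced here for illustration.

```r
set.seed(123)  # makes the draw repeatable within one R version

# Draw 10% of the row positions without replacement
n_sample <- round(0.1 * nrow(iris))
picked <- sample(seq_len(nrow(iris)), n_sample)

# The sampled rows, and the complement obtained with negative indices
sampled <- iris[picked, ]
remaining <- iris[-picked, ]

nrow(sampled)    # 15
nrow(remaining)  # 135
```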

References

Wickham, H., François, R., Henry, L., & Müller, K. (2020). dplyr: A Grammar of Data Manipulation. R package version 1.0.2.