The Shapiro-Wilk test is one of the best-known tests for assessing the normality of data. This guide shows how to apply the Shapiro-Wilk test in R to check both univariate and multivariate normality.

The Shapiro-Wilk test is a goodness-of-fit test for assessing the normality of data. Razali and Wah (2011) compared the power of the Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests and found the Shapiro-Wilk test to be the most powerful of the four. Yap and Sim (2011) compared a wider set of normality tests: the Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, Cramer-von Mises, Anderson-Darling, D’Agostino-Pearson, Jarque-Bera and chi-squared tests. They reported that the Shapiro-Wilk and D’Agostino tests have better power for symmetric short-tailed distributions, and that the Shapiro-Wilk, Jarque-Bera and D’Agostino tests perform well in terms of power. In the same study, the Shapiro-Wilk test was the most powerful test for asymmetric distributions.
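To make the idea of "power" concrete, the following is a minimal simulation sketch in base R. It is not the design used in the cited studies: the sample size, number of replications, seed and the choice of an exponential alternative are all assumptions made for illustration. It estimates how often shapiro.test() and ks.test() reject normality when the data are in fact skewed.

```r
# Minimal power simulation (illustrative assumptions, not the cited studies' design)
set.seed(1)      # arbitrary seed, for reproducibility only
n_sim <- 1000    # number of simulated samples (assumed value)
n     <- 30      # observations per sample (assumed value)
alpha <- 0.05

reject_sw <- logical(n_sim)
reject_ks <- logical(n_sim)

for (i in seq_len(n_sim)) {
  x <- rexp(n)  # exponential data: clearly non-normal (asymmetric)

  # Shapiro-Wilk test
  reject_sw[i] <- shapiro.test(x)$p.value < alpha

  # Kolmogorov-Smirnov test against a normal distribution whose mean and sd
  # are estimated from the sample; plugging in estimates makes the plain KS
  # test conservative, which is what the Lilliefors correction addresses
  reject_ks[i] <- ks.test(x, "pnorm", mean(x), sd(x))$p.value < alpha
}

# Estimated power = proportion of samples in which normality was rejected
power <- c(shapiro_wilk = mean(reject_sw), kolmogorov_smirnov = mean(reject_ks))
print(power)
```

Under these assumptions the Shapiro-Wilk rejection rate should come out clearly higher than the Kolmogorov-Smirnov rate, in line with the direction of the findings summarized above. The exact figures depend on the seed and the chosen settings.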

In this guide, we cover three ways of testing normality with the Shapiro-Wilk test in R. First, we apply the test to univariate data. Then, we apply it to univariate data within groups. Finally, we check multivariate normality with the Shapiro-Wilk test.

1. Shapiro-Wilk Test for Univariate Normality in R

In this part, we test univariate normality with the Shapiro-Wilk test using the shapiro.test() function. As example data, we use the sepal lengths of the setosa species, which occupy the first 50 rows of the iris data set.

# Sepal lengths of the 50 setosa flowers (rows 1 to 50 of iris)
data <- iris$Sepal.Length[1:50]
shapiro.test(data)
## 
##         Shapiro-Wilk normality test
## 
## data:  data
## W = 0.9777, p-value = 0.4595

According to the result of the Shapiro-Wilk test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.4595) is larger than alpha (0.05).
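The decision above can also be made programmatically rather than by reading the printed output. The sketch below assumes only base R; the variable names (x, res, w_stat, p_value, alpha) are chosen here for illustration. It relies on shapiro.test() returning an object of class "htest", whose statistic and p.value components hold the W statistic and the p-value.

```r
# Sepal lengths of the setosa flowers, selected by species instead of row index
x <- iris$Sepal.Length[iris$Species == "setosa"]

res   <- shapiro.test(x)  # object of class "htest"
alpha <- 0.05             # significance level used throughout this guide

w_stat  <- unname(res$statistic)  # the W statistic
p_value <- res$p.value            # the p-value

if (p_value > alpha) {
  message("Not enough evidence to reject H0 (normality)")
} else {
  message("Reject H0: the data do not appear to be normally distributed")
}
```

Selecting rows with iris$Species == "setosa" gives the same 50 values as iris$Sepal.Length[1:50] but does not depend on the row order of the data set. A Q-Q plot drawn with qqnorm() and qqline() is a useful visual companion to the test.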

2. Shapiro-Wilk Test for Normality of Groups in R

In this section, we look at two ways of testing the normality of grouped data with the Shapiro-Wilk test. First, we use the nor.test() function from the onewaytests package (Dag et al., 2018). Then, we test normality within each group using the base R tapply() function. As example data, we assess the normality of sepal length in each iris species (setosa, versicolor, virginica).

# Shapiro-Wilk test of Sepal.Length within each level of Species
onewaytests::nor.test(Sepal.Length ~ Species, data = iris)
## 
##   Shapiro-Wilk Normality Test (alpha = 0.05) 
## -------------------------------------------------- 
##   data : Sepal.Length and Species 
## 
##        Level Statistic   p.value   Normality
## 1     setosa 0.9776985 0.4595132  Not reject
## 2 versicolor 0.9778357 0.4647370  Not reject
## 3  virginica 0.9711794 0.2583147  Not reject
## -------------------------------------------------- 

# Same tests in base R: apply shapiro.test() to the sepal lengths of each species
tapply(iris$Sepal.Length, iris$Species, shapiro.test)
## $setosa
## 
##         Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9777, p-value = 0.4595
## 
## 
## $versicolor
## 
##         Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.97784, p-value = 0.4647
## 
## 
## $virginica
## 
##         Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.97118, p-value = 0.2583

According to both sets of results, there is not enough evidence to reject the normality of sepal length in any of the three iris species, since all p-values are greater than alpha (0.05).
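The list returned by tapply() is verbose when printed. As a rough sketch using only base R, the W statistics and p-values can be collected into one small table, similar in spirit to the nor.test() output. The names summary_table and reject_h0 are introduced here for illustration and are not part of any package.

```r
# Run shapiro.test() within each species
tests <- tapply(iris$Sepal.Length, iris$Species, shapiro.test)

# Collect the W statistic and p-value of each test into a data frame
summary_table <- data.frame(
  species = names(tests),
  W       = vapply(tests, function(res) unname(res$statistic), numeric(1)),
  p_value = vapply(tests, function(res) res$p.value, numeric(1)),
  row.names = NULL
)

# Flag the groups in which normality is rejected at alpha = 0.05
summary_table$reject_h0 <- summary_table$p_value < 0.05
print(summary_table)
```

Keeping the results in a data frame makes it easy to filter, sort or report them, for example when the grouping variable has many levels.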

3. Shapiro-Wilk Test for Multivariate Normality in R

In this section, we test multivariate normality with the Shapiro-Wilk test using the mshapiro.test() function from the mvnormtest package (Jarek, 2012). As example data, we assess the multivariate normality of sepal length, sepal width, petal length and petal width for the setosa species. Because mshapiro.test() expects variables in rows and observations in columns, the data are transposed with t() before testing.

# Four measurements of the 50 setosa flowers (observations in rows)
data <- iris[1:50, 1:4]
mvnormtest::mshapiro.test(t(data))
## 
##         Shapiro-Wilk normality test
## 
## data:  Z
## W = 0.95878, p-value = 0.07906

According to the result of the multivariate Shapiro-Wilk test, there is not enough evidence to reject the null hypothesis (H0: the data follow a multivariate normal distribution), since the p-value (0.07906) is larger than alpha (0.05).
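The following sketch illustrates the transposition step and complements the multivariate test with a univariate Shapiro-Wilk test on each variable. The names x and univariate_p are introduced here for illustration, and the multivariate call is guarded so that the code still runs if mvnormtest is not installed. Note that normality of every single variable is necessary but not sufficient for multivariate normality, so the univariate tests do not replace the multivariate one.

```r
# Four measurements of the setosa flowers as a numeric matrix
x <- as.matrix(iris[iris$Species == "setosa", 1:4])

dim(x)     # 50 4: observations in rows, variables in columns
dim(t(x))  # 4 50: variables in rows, the layout passed to mshapiro.test()

# Univariate Shapiro-Wilk p-value for each of the four variables
univariate_p <- apply(x, 2, function(v) shapiro.test(v)$p.value)
print(round(univariate_p, 4))

# Multivariate test, run only if the mvnormtest package is available
if (requireNamespace("mvnormtest", quietly = TRUE)) {
  mv <- mvnormtest::mshapiro.test(t(x))
  print(mv$p.value)
}
```

A variable with a small univariate p-value can help explain why the multivariate p-value sits close to alpha, which is a useful starting point for further diagnostics.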

A video walkthrough of this code, "Shapiro-Wilk Test for Univariate and Multivariate Normality in R Using RStudio", is available on our YouTube channel.

References

Dag, O., Dolgun, A., Konar, N.M. (2018). onewaytests: An R Package for One-Way Tests in Independent Groups Designs. R Journal, 10(1), 175-199.

Jarek, S. (2012). mvnormtest: Normality Test for Multivariate Variables. R package version 0.1-9.

Razali, N.M., Wah, Y.B. (2011). Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.

Yap, B.W., Sim, C.H. (2011). Comparisons of Various Types of Normality Tests. Journal of Statistical Computation and Simulation, 81(12), 2141-2155.


Dr. Osman Dag