How to Assess Normality in R

Normality plays an essential role in statistics because many statistical methods assume a normal distribution. This guide covers the main ways of checking that assumption and shows how to assess normality in R.

In this guide, we will cover three ways of assessing normality in R. First, we will look at visual methods for assessing normality. Then, we will present the most well-known normality tests available in R. Finally, we will discuss the case of a large sample size.

In this part, we will use the iris data set available in R. First, let's construct the data we will work on: the sepal length of the setosa type (one of the iris types), which occupies the first 50 rows of iris.

data <- iris$Sepal.Length[1:50]
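The index range 1:50 relies on the setosa rows coming first in iris. A slightly more defensive sketch selects the rows by species label instead. The variable name setosa_length is ours and is used only for illustration.

```r
# Select setosa sepal lengths by species label rather than by row position
setosa_length <- iris$Sepal.Length[iris$Species == "setosa"]

# Both approaches should yield the same 50 values
n_setosa <- length(setosa_length)
same_as_index <- identical(setosa_length, iris$Sepal.Length[1:50])

n_setosa
same_as_index
```

Selecting by label keeps working if the rows are ever reordered, which positional indexing does not.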

Visual Methods for Assessing Normality

In this part, we look at visual methods to assess normality in R. There are two main plots for assessing the shape of a distribution: the density plot and the Q-Q plot.

1. Density Plot

A density plot provides a visual assessment of whether the distribution is bell shaped. We use the ggdensity() function available in the ggpubr package (Kassambara, 2020).

ggpubr::ggdensity(data, fill = "lightgray", add = "mean", xlab = "Sepal Length of Setosa Type")
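If ggpubr is not installed, a similar picture can be sketched with base R only. The example below overlays a normal curve that has the sample mean and standard deviation, which makes departures from the bell shape easier to spot. The names sepal, m, and s are illustrative and not part of the original code.

```r
# Base R sketch: kernel density estimate with a fitted normal curve on top
sepal <- iris$Sepal.Length[1:50]
m <- mean(sepal)
s <- sd(sepal)

dens <- density(sepal)
plot(dens, main = "Density of Setosa Sepal Length",
     xlab = "Sepal Length of Setosa Type")

# Dashed line: normal density with the same mean and standard deviation
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)

# Dotted vertical line at the sample mean
abline(v = m, lty = 3)
```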

2. Q-Q Plot

A Q-Q plot (quantile-quantile plot) compares the quantiles of the given sample with those of a normal distribution. A reference line is also plotted to show how close the sample values are to the normal distribution; points that fall near the line suggest approximate normality. To draw a Q-Q plot, we use the ggqqplot() function available in the ggpubr package (Kassambara, 2020).

ggpubr::ggqqplot(data)
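The same kind of plot can be sketched in base R with qqnorm() and qqline(). As a rough numeric companion to the picture, the correlation between theoretical and sample quantiles is computed; values close to 1 go with points that hug the line. The name qq_cor is ours, and the correlation is only an informal summary, not a formal test.

```r
# Base R sketch of a normal Q-Q plot
sepal <- iris$Sepal.Length[1:50]

# qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(sepal, main = "Normal Q-Q Plot: Setosa Sepal Length")

# qqline() adds a reference line through the first and third quartiles
qqline(sepal, lty = 2)

# Informal summary: correlation between theoretical and sample quantiles
qq_cor <- cor(qq$x, qq$y)
qq_cor
```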

Normality Tests

In this part, we go through seven well-known normality tests in R. The following is the list of these tests.

Shapiro-Wilk test
Jarque-Bera test
Anderson-Darling test
Cramer-von Mises test
Lilliefors (Kolmogorov-Smirnov) test
Pearson chi-square test
Shapiro-Francia test

1. Shapiro-Wilk Test for Normality

We use the shapiro.test() function to check the normality of the data with the Shapiro-Wilk test.

shapiro.test(data)
## 
##         Shapiro-Wilk normality test
## 
## data:  data
## W = 0.9777, p-value = 0.4595

According to the result of the Shapiro-Wilk test, there is not enough evidence to reject the null hypothesis (H0: data are normally distributed) since the p-value (0.4595) is larger than alpha (0.05). Failing to reject H0 does not prove normality, but the test gives no reason to doubt it, so we proceed as if the data are normally distributed.
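The decision rule described above can also be applied programmatically. shapiro.test() returns an object of class "htest", whose statistic and p.value components can be read directly. The names sw, alpha, and reject_h0 are illustrative.

```r
# Run the test and keep the result object
sepal <- iris$Sepal.Length[1:50]
sw <- shapiro.test(sepal)

# Extract the test statistic and the p-value from the htest object
w_stat <- unname(sw$statistic)
p_val <- sw$p.value

# Compare the p-value against the chosen significance level
alpha <- 0.05
reject_h0 <- p_val < alpha

w_stat
p_val
reject_h0
```

The same pattern works for the other tests in this guide, since they also return "htest" objects.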

2. Jarque-Bera Test for Normality

We use the jarque.bera.test() function, available in the tseries package (Trapletti and Hornik, 2019), to check the normality of the data with the Jarque-Bera test.

tseries::jarque.bera.test(data)
## 
##         Jarque Bera Test
## 
## data:  data
## X-squared = 0.36208, df = 2, p-value = 0.8344

The Jarque-Bera test suggests that there is not enough evidence to reject the null hypothesis (H0: data are normally distributed) since the p-value (0.8344) is larger than alpha (0.05). The test therefore gives no reason to doubt normality, and we proceed as if the data are normally distributed.
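To see what the Jarque-Bera statistic measures, it can be reproduced by hand from the sample skewness and kurtosis. The sketch below uses the standard formula JB = n * S^2 / 6 + n * (K - 3)^2 / 24 with moment-based estimates of S and K, and a chi-square reference distribution with 2 degrees of freedom. It is a manual check written for illustration, not the package's own code.

```r
# Manual Jarque-Bera statistic from central moments
sepal <- iris$Sepal.Length[1:50]
n <- length(sepal)
centered <- sepal - mean(sepal)

# Second, third, and fourth central moments (divisor n)
m2 <- mean(centered^2)
m3 <- mean(centered^3)
m4 <- mean(centered^4)

skew <- m3 / m2^1.5   # sample skewness S
kurt <- m4 / m2^2     # sample kurtosis K (3 for a normal distribution)

# JB combines squared skewness and squared excess kurtosis
jb_stat <- n * skew^2 / 6 + n * (kurt - 3)^2 / 24
jb_p <- pchisq(jb_stat, df = 2, lower.tail = FALSE)

jb_stat
jb_p
```

Because the statistic depends only on skewness and kurtosis, the test is sensitive to asymmetry and tail weight but not to other kinds of departure from normality.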

3. Anderson-Darling Test for Normality

To use the Anderson-Darling test for assessing normality in R, we apply the ad.test() function available in the nortest package (Gross and Ligges, 2015).

nortest::ad.test(data)
## 
##         Anderson-Darling normality test
## 
## data:  data
## A = 0.40799, p-value = 0.3352

According to the Anderson-Darling test, there is not enough evidence to reject the null hypothesis (H0: data are normally distributed) since the p-value (0.3352) is larger than alpha (0.05). The test therefore gives no reason to doubt normality, and we proceed as if the data are normally distributed.

4. Cramer-von Mises Test for Normality

Researchers can use the Cramer-von Mises test to assess the normality of the data with the cvm.test() function available in the nortest package (Gross and Ligges, 2015).

nortest::cvm.test(data)
## 
##         Cramer-von Mises normality test
## 
## data:  data
## W = 0.071753, p-value = 0.2597

According to the Cramer-von Mises test, there is not enough evidence to reject the null hypothesis (H0: data are normally distributed) since the p-value (0.2597) is larger than alpha (0.05). The test therefore gives no reason to doubt normality, and we proceed as if the data are normally distributed.

5. Lilliefors (Kolmogorov-Smirnov) Test for Normality

One can assess normality via the Lilliefors test with the lillie.test() function available in the nortest package (Gross and Ligges, 2015).

nortest::lillie.test(data)
## 
##         Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  data
## D = 0.11486, p-value = 0.09693

The Lilliefors test indicates that there is not enough evidence to reject the null hypothesis (H0: data are normally distributed) since the p-value (0.09693) is larger than alpha (0.05). The test therefore gives no reason to doubt normality at the 5% level, and we proceed as if the data are normally distributed.
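The D statistic reported by the Lilliefors test is the Kolmogorov-Smirnov distance between the empirical distribution function and a normal distribution whose mean and standard deviation are estimated from the sample. A minimal sketch of that computation in base R follows; the variable names are ours.

```r
# Manual Kolmogorov-Smirnov distance with estimated normal parameters
sepal <- iris$Sepal.Length[1:50]
n <- length(sepal)

# Fitted normal CDF evaluated at the sorted observations
p <- pnorm(sort(sepal), mean = mean(sepal), sd = sd(sepal))

# Largest gap above and below the fitted CDF
d_plus <- max(seq_len(n) / n - p)
d_minus <- max(p - (seq_len(n) - 1) / n)
d_stat <- max(d_plus, d_minus)

d_stat
```

Only the statistic is reproduced here. Because the parameters are estimated from the same data, the ordinary Kolmogorov-Smirnov p-value is not appropriate, which is why the Lilliefors correction is used for the p-value.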

6. Pearson Chi-square Test for Normality

Researchers can use the Pearson chi-square test for assessing normality with the pearson.test() function available in the nortest package (Gross and Ligges, 2015).

nortest::pearson.test(data)
## 
##         Pearson chi-square normality test
## 
## data:  data
## P = 9.2, p-value = 0.2386

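According to the Pearson chi-square test, there is not enough evidence to reject the null hypothesis (H0: data are normally distributed) since the p-value (0.2386) is larger than alpha (0.05). The test therefore gives no reason to doubt normality, and we proceed as if the data are normally distributed.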
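The idea behind the Pearson chi-square test can be sketched by hand: split the fitted normal distribution into classes of equal probability, count the observations in each class, and compare observed with expected counts. The sketch below assumes ceiling(2 * n^(2/5)) classes and a loss of two degrees of freedom for the estimated mean and standard deviation, which is our understanding of the defaults in nortest; treat both as assumptions rather than a statement of the package's internals.

```r
# Manual Pearson chi-square normality check with equal-probability classes
sepal <- iris$Sepal.Length[1:50]
n <- length(sepal)

# Assumed default number of classes
n_classes <- ceiling(2 * n^(2 / 5))

# Class index of each observation under the fitted normal distribution
class_id <- floor(1 + n_classes * pnorm(sepal, mean(sepal), sd(sepal)))
observed <- tabulate(class_id, nbins = n_classes)

# Every class has the same expected count
expected <- rep(n / n_classes, n_classes)

p_stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: classes - 1, minus 2 estimated parameters
df <- n_classes - 3
p_val <- pchisq(p_stat, df = df, lower.tail = FALSE)

p_stat
p_val
```

The result of this test depends on the number of classes chosen, so conclusions near the significance threshold should be read with care.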

7. Shapiro-Francia Test for Normality

One can assess normality through the Shapiro-Francia test with the sf.test() function available in the nortest package (Gross and Ligges, 2015).

nortest::sf.test(data)
## 
##         Shapiro-Francia normality test
## 
## data:  data
## W = 0.9817, p-value = 0.5357

According to the results of the Shapiro-Francia test, there is not enough evidence to reject the null hypothesis (H0: data are normally distributed) since the p-value (0.5357) is larger than alpha (0.05). The test therefore gives no reason to doubt normality, and we proceed as if the data are normally distributed.
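When several tests are run on the same data, it can be convenient to collect the p-values in one table. The sketch below always runs the Shapiro-Wilk test from base R and adds the nortest and tseries tests only if those packages are installed. The helper names tests, p_values, and summary_table are ours.

```r
# Collect p-values from the available normality tests into one table
sepal <- iris$Sepal.Length[1:50]

# Start with the test that ships with base R
tests <- list("Shapiro-Wilk" = shapiro.test)

# Add tests from optional packages when they are installed
if (requireNamespace("tseries", quietly = TRUE)) {
  tests <- c(tests, list("Jarque-Bera" = tseries::jarque.bera.test))
}
if (requireNamespace("nortest", quietly = TRUE)) {
  tests <- c(tests, list(
    "Anderson-Darling" = nortest::ad.test,
    "Cramer-von Mises" = nortest::cvm.test,
    "Lilliefors"       = nortest::lillie.test,
    "Pearson chi-square" = nortest::pearson.test,
    "Shapiro-Francia"  = nortest::sf.test
  ))
}

# Each function returns an htest object with a p.value component
p_values <- vapply(tests, function(f) unname(f(sepal)$p.value), numeric(1))

summary_table <- data.frame(
  test = names(p_values),
  p_value = round(unname(p_values), 4),
  reject_at_0.05 = unname(p_values) < 0.05
)
summary_table
```

Different tests are sensitive to different departures from normality, so their p-values need not agree; the table is a way to compare them side by side, not to pick the most favorable one.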

Case of Large Sample Size

According to the Central Limit Theorem, whatever the underlying distribution is (provided it has finite variance), the sampling distribution of the mean tends toward a normal distribution as the sample size grows; n ≥ 30 is a common rule of thumb for "large enough". It is important to note that the sample size in each group must be large enough. In such a case, the data themselves do not need to be normally distributed, since the sampling distribution of the mean is approximately normal.
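A small simulation can illustrate this. The sketch below draws from an exponential distribution, which is strongly right skewed, and compares the skewness of the raw values with the skewness of means of samples of size 30. The distribution, the seed, the number of replications, and the helper skewness() are all our choices for illustration.

```r
# Simulation sketch: sample means of skewed data are much closer to normal
set.seed(123)

# Moment-based sample skewness (0 for a symmetric distribution)
skewness <- function(v) {
  centered <- v - mean(v)
  mean(centered^3) / mean(centered^2)^1.5
}

# Raw draws from an exponential distribution (theoretical skewness 2)
raw_values <- rexp(10000, rate = 1)

# Means of 5000 samples of size 30 (theoretical skewness 2 / sqrt(30))
sample_means <- replicate(5000, mean(rexp(30, rate = 1)))

skew_raw <- skewness(raw_values)
skew_means <- skewness(sample_means)

skew_raw
skew_means
```

The means are far less skewed than the raw values but still not perfectly symmetric at n = 30, which is why the threshold is best read as a rule of thumb rather than a guarantee.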

References

Gross, J., Ligges, U. (2015). nortest: Tests for Normality. R package version 1.0-4.

Kassambara, A. (2020). ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version 0.4.0.

Trapletti, A., Hornik, K. (2019). tseries: Time Series Analysis and Computational Finance. R package version 0.10-47.
