Basic Summaries

At the heart of descriptive statistics in the tidyverse is dplyr's summarise() function, which collapses a data frame into one or more summary statistics.

Suppose we want to compute the mean, minimum, and maximum of the YEAR variable in the stock data:

library(dplyr)
stocks %>%
  summarise(
    mean_year = mean(YEAR, na.rm = TRUE),
    min_year  = min(YEAR,  na.rm = TRUE),
    max_year  = max(YEAR,  na.rm = TRUE)
  )
  mean_year min_year max_year
1    1971.5     1928     2015

The code returns a one-row data frame with one column per requested statistic.
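The stocks data exist only inside the webR session, so the minimal sketch below uses a small made-up data frame instead. The name toy_stocks and its five years are hypothetical. It shows that summarise() collapses any number of input rows into a single row:

```r
library(dplyr)

# Hypothetical stand-in for the stocks data: five made-up years
toy_stocks <- data.frame(YEAR = c(2011, 2012, 2013, 2014, 2015))

res <- toy_stocks %>%
  summarise(
    mean_year = mean(YEAR, na.rm = TRUE),
    min_year  = min(YEAR,  na.rm = TRUE),
    max_year  = max(YEAR,  na.rm = TRUE)
  )

print(res)

# Five input rows become one output row, with one column per statistic
nrow(res)   # 1
ncol(res)   # 3
```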

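Notice that we explicitly set na.rm = TRUE. summarise() does not handle missing values itself: it simply calls mean(), min(), and max(), and these functions return NA whenever any input value is missing unless na.rm = TRUE is supplied. By contrast, base R's summary() drops missing values automatically and reports them as a separate NA's count. Requiring na.rm = TRUE makes missing-data handling visible and intentional.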
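The difference is easy to see on a small example. This is a minimal sketch using a made-up vector x with one missing value (x, df, res_na, and res_kept are hypothetical names chosen for illustration):

```r
library(dplyr)

# Hypothetical data with one missing value
x <- c(10, NA, 30)

mean(x)                # NA: a single missing value propagates
mean(x, na.rm = TRUE)  # 20: the NA is dropped before averaging
summary(x)             # statistics on the non-missing values, plus an NA's count

# The same rule applies inside summarise(), which just calls mean()
df <- data.frame(x = x)
res_na   <- df %>% summarise(m = mean(x))
res_kept <- df %>% summarise(m = mean(x, na.rm = TRUE))
```

Because summarise() passes na.rm through to mean() unchanged, res_na$m is NA while res_kept$m is 20.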

Practice

Try computing the mean of SPSTOCK in the stocks dataset using dplyr.

NOTE: The stocks dataset and the dplyr package are already loaded in this webR session.

stocks %>%
  summarise(
    mean_spstocks = mean(SPSTOCK, na.rm = TRUE)
  )
  mean_spstocks
1      11.42045