Summaries Across Multiple Variables

Often, we want to apply the same summary function to several variables. The across() helper lets us do this in a compact and readable way. Its first argument selects the columns; for example, everything() selects all variables in the dataset. Its second argument is the function to apply, commonly written as a formula such as ~ mean(.x, na.rm = TRUE), in which .x stands for each selected column in turn (not for the selection as a whole).
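As a minimal sketch of how .x behaves, the example below uses a small made-up data frame called toy (a hypothetical stand-in, since the stocks data is not reproduced here). across() calls the formula once for column a and once for column b, binding .x to each column in turn.

```r
library(dplyr)

# Hypothetical stand-in for stocks; the values are made up for illustration
toy <- data.frame(
  a = c(1, 2, NA),
  b = c(10, 20, 30)
)

# across() visits each column selected by everything(); inside the formula,
# .x is the column currently being summarised
toy_means <- toy %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

# a = 1.5 (the NA is dropped because na.rm = TRUE), b = 20
toy_means
```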

Practice

Try computing the mean of every variable in the stocks dataset with dplyr.

Solution

library(dplyr)
# Mean of every column, ignoring missing values
stocks %>%
  summarise(across(everything(), 
                    ~ mean(.x, na.rm = TRUE)))

    YEAR   TBONDS  SPSTOCK TBONDS_D SPSTOCK_D
1 1971.5 5.463415 11.42045 0.804878 0.7159091

You can also select a subset of variables explicitly. For example, to compute the means of the bond variable TBONDS_D and the S&P 500 variable SPSTOCK:

library(dplyr)
stocks %>%
  summarise(across(c(TBONDS_D, SPSTOCK), 
                    ~ mean(.x, na.rm = TRUE)))

  TBONDS_D  SPSTOCK
1 0.804878 11.42045

Here, across() selects the columns TBONDS_D and SPSTOCK and applies mean() to each of them. Inside the formula, .x refers to whichever of the two columns is currently being summarised.
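One way to see what across() saves us is to write the same summaries out by hand. The sketch below assumes a small hypothetical data frame toy whose column names mirror the stocks example (the values are made up); both versions should produce the same one-row result.

```r
library(dplyr)

# Hypothetical stand-in data; column names mirror the stocks example
toy <- data.frame(
  TBONDS_D = c(1, 0, 1, 1),
  SPSTOCK  = c(12.5, -3.0, NA, 20.5)
)

# Shorthand: across() applies the formula to each selected column
with_across <- toy %>%
  summarise(across(c(TBONDS_D, SPSTOCK), ~ mean(.x, na.rm = TRUE)))

# Long form: the same summaries written out one column at a time
by_hand <- toy %>%
  summarise(TBONDS_D = mean(TBONDS_D, na.rm = TRUE),
            SPSTOCK  = mean(SPSTOCK,  na.rm = TRUE))

# Compare the two results
all.equal(with_across, by_hand)
```

The long form grows by one line per column, whereas the across() version only needs a longer selection.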

We can also compute several summaries per variable by supplying a named list of functions:

stocks %>%
  summarise(across(c(TBONDS_D, SPSTOCK), 
                    list(mean = ~ mean(.x, na.rm = TRUE),
                         sd = ~ sd(.x, na.rm = TRUE))))

  TBONDS_D_mean TBONDS_D_sd SPSTOCK_mean SPSTOCK_sd
1      0.804878   0.3987333     11.42045   19.84638

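The result contains one column per combination of variable and statistic. By default, each column is named after the variable followed by the name given in the list (for example, TBONDS_D_mean).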
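If the default names do not suit, across() accepts a .names argument, a glue-style template in which {.col} is the column name and {.fn} is the function's name from the list. A minimal sketch, again assuming a made-up toy data frame in place of stocks, that puts the statistic first:

```r
library(dplyr)

# Hypothetical stand-in data; values are made up
toy <- data.frame(
  TBONDS_D = c(1, 0, 1, 1),
  SPSTOCK  = c(12.5, -3.0, NA, 20.5)
)

# The default template is "{.col}_{.fn}"; here we swap the order
stats <- toy %>%
  summarise(across(c(TBONDS_D, SPSTOCK),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"))

# "mean_TBONDS_D" "sd_TBONDS_D" "mean_SPSTOCK" "sd_SPSTOCK"
names(stats)
```

The columns are still grouped by variable; only the naming pattern changes.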

Practice

Try computing the mean and standard deviation of columns 2 and 5 of stocks with dplyr.

Solution

# Columns are selected by position: 2 is TBONDS and 5 is SPSTOCK_D
stocks %>%
  summarise(across(c(2, 5), 
                    list(mean = ~ mean(.x, na.rm = TRUE),
                         sd = ~ sd(.x, na.rm = TRUE))))

  TBONDS_mean TBONDS_sd SPSTOCK_D_mean SPSTOCK_D_sd
1    5.463415   7.99567      0.7159091    0.4535648
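Selecting by position and selecting by name give the same result as long as the column order stays fixed. The sketch below assumes a hypothetical toy data frame whose column order mirrors stocks (values made up) and compares the two approaches; the comment notes why names are usually the safer choice.

```r
library(dplyr)

# Hypothetical data frame whose column order mirrors stocks
toy <- data.frame(
  YEAR      = c(2000, 2001, 2002),
  TBONDS    = c(5.0, 6.0, 7.0),
  SPSTOCK   = c(10.0, -5.0, 4.0),
  TBONDS_D  = c(1, 1, 0),
  SPSTOCK_D = c(1, 0, 1)
)

# Select by position: whatever happens to be the 2nd and 5th columns
by_position <- toy %>%
  summarise(across(c(2, 5), ~ mean(.x, na.rm = TRUE)))

# Select by name: always TBONDS and SPSTOCK_D
by_name <- toy %>%
  summarise(across(c(TBONDS, SPSTOCK_D), ~ mean(.x, na.rm = TRUE)))

# Same result here, but by_position would silently pick different
# columns if the data frame were ever reordered
all.equal(by_position, by_name)
```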