The summary function

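The summary() function from ‘base R’ (i.e. no extra packages needed) produces several summary statistics at once: the minimum, first quartile, median, mean, third quartile and maximum of each variable, plus a count of missing values (NA's) where there are any. You can apply it to your entire dataset in a single call.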

Practice

Try requesting the summary statistics for the entire stocks dataset.

summary(stocks)
      YEAR          TBONDS           SPSTOCK          TBONDS_D     
 Min.   :1928   Min.   :-11.000   Min.   :-44.00   Min.   :0.0000  
 1st Qu.:1950   1st Qu.:  1.000   1st Qu.: -1.00   1st Qu.:1.0000  
 Median :1972   Median :  4.000   Median : 14.00   Median :1.0000  
 Mean   :1972   Mean   :  5.463   Mean   : 11.42   Mean   :0.8049  
 3rd Qu.:1993   3rd Qu.:  9.000   3rd Qu.: 25.25   3rd Qu.:1.0000  
 Max.   :2015   Max.   : 33.000   Max.   : 53.00   Max.   :1.0000  
                NA's   :6                          NA's   :6       
   SPSTOCK_D     
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :1.0000  
 Mean   :0.7159  
 3rd Qu.:1.0000  
 Max.   :1.0000  
                 
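The NA's row in the output above counts the missing values in a column; the other statistics are computed from the non-missing values only. A minimal sketch of this behaviour, using a made-up vector x (hypothetical, not taken from stocks):

```r
# Hypothetical vector with one missing value (values are made up)
x <- c(1, 4, NA, 9)

# summary() reports the usual statistics plus an NA's count
s <- summary(x)
print(s)

# The NA's entry matches a direct count of missing values
n_missing <- sum(is.na(x))
print(n_missing)
```

Columns without missing values, such as YEAR, simply show no NA's entry.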

To retrieve descriptive statistics for several variables, pass the entire data frame to summary(). You can also select a subset of variables (columns): for example, stocks[1:3] selects the first three columns and stocks[c(1,2,4)] selects columns 1, 2 and 4. Try it out!
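
The column selection described here can be sketched on a small stand-in data frame. The data frame toy, its values and the column name UP are assumptions made for illustration only; with the real data you would use stocks instead.

```r
# Hypothetical toy data frame standing in for stocks (values are made up)
toy <- data.frame(
  YEAR    = c(2001, 2002, 2003, 2004),
  TBONDS  = c(5, NA, 15, 4),
  SPSTOCK = c(-12, -22, 28, 11),
  UP      = c(0, 0, 1, 1)
)

# First three columns, selected by position
first_three <- toy[1:3]
print(summary(first_three))

# Columns 1, 2 and 4, selected by position
picked <- toy[c(1, 2, 4)]
print(summary(picked))

# The same columns selected by name, which keeps working if columns are reordered
picked_by_name <- toy[c("YEAR", "TBONDS", "UP")]
print(summary(picked_by_name))
```

Selecting by name is often easier to read than selecting by position, but both forms return a data frame that summary() accepts.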

Practice

Try requesting the summary statistics for columns 1 and 5 of stocks.

summary(stocks[c(1,5)])
      YEAR        SPSTOCK_D     
 Min.   :1928   Min.   :0.0000  
 1st Qu.:1950   1st Qu.:0.0000  
 Median :1972   Median :1.0000  
 Mean   :1972   Mean   :0.7159  
 3rd Qu.:1993   3rd Qu.:1.0000  
 Max.   :2015   Max.   :1.0000  