Calculating summary statistics of variables

In this tutorial, we use a dataset about profits from stocks and bonds. We can take a look at the data using head(), which shows us the first 6 rows of the data frame.

head(stocks)
  YEAR TBONDS SPSTOCK TBONDS_D SPSTOCK_D
1 1928      1      44        1         1
2 1929      4      -8        1         0
3 1930     NA     -25       NA         0
4 1931     -3     -44        0         0
5 1932      9      -9        1         0
6 1933      2      50        1         1

The data set contains five variables: YEAR, TBONDS, SPSTOCK, TBONDS_D, and SPSTOCK_D.

The str() function tells us the class of each variable. You will notice that all five variables in the stocks data are stored as integers.

str(stocks)
'data.frame':   88 obs. of  5 variables:
 $ YEAR     : int  1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 ...
 $ TBONDS   : int  1 4 NA -3 9 2 8 4 5 1 ...
 $ SPSTOCK  : int  44 -8 -25 -44 -9 50 -1 47 32 -35 ...
 $ TBONDS_D : int  1 1 NA 0 1 1 1 1 1 1 ...
 $ SPSTOCK_D: int  1 0 0 0 0 1 0 1 1 0 ...

We will explore this data set further. To start, we want more information about the variables we collected. In most cases, we want to know at least the following statistics: the mean, the standard deviation, the minimum, and the maximum.

For this purpose, R provides the following functions:

mean()
sd()
min()
max()

You can apply each of these functions to one variable at a time. To select a variable, append $ and the variable name to the name of the data frame, for example stocks$SPSTOCK.
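
As a minimal sketch, the snippet below rebuilds only the six rows printed by head() into a small data frame and applies the four functions to SPSTOCK using the $ notation. The name stocks_head is hypothetical, and because the full stocks data has 88 rows, the results here differ from those for the complete data set.

```r
# Hypothetical six-row sample copied from the head(stocks) output above
stocks_head <- data.frame(
  YEAR    = 1928:1933,
  TBONDS  = c(1, 4, NA, -3, 9, 2),
  SPSTOCK = c(44, -8, -25, -44, -9, 50)
)

# Each function takes one variable at a time, selected with $
spstock_mean <- mean(stocks_head$SPSTOCK)  # arithmetic mean
spstock_sd   <- sd(stocks_head$SPSTOCK)    # sample standard deviation
spstock_min  <- min(stocks_head$SPSTOCK)   # smallest value
spstock_max  <- max(stocks_head$SPSTOCK)   # largest value

# Collect the four statistics in one named vector for display
print(c(mean = spstock_mean, sd = spstock_sd,
        min = spstock_min, max = spstock_max))
```

Storing each result in its own object makes it easy to reuse the values later, for example when comparing SPSTOCK with another variable.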

Practice

Try requesting the mean of SPSTOCK from the stocks dataset.

NOTE: The stocks dataset is already loaded in this webR session, so you can use it directly.

mean(stocks$SPSTOCK)
[1] 11.42045
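
SPSTOCK has no missing values, but TBONDS contains at least one (the NA in 1930). By default, mean(), sd(), min(), and max() return NA when the input contains a missing value; the na.rm = TRUE argument tells them to drop missing values first. The sketch below illustrates this using only the six TBONDS values printed by head(). The vector name tbonds_head is hypothetical, and the result is not the mean of the full variable.

```r
# Hypothetical vector holding the six TBONDS values shown by head(stocks)
tbonds_head <- c(1, 4, NA, -3, 9, 2)

# Without na.rm, a single NA makes the result NA
mean_with_na <- mean(tbonds_head)

# With na.rm = TRUE, the NA is dropped before averaging the other five values
mean_without_na <- mean(tbonds_head, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```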