A dataset is considered wide when repeated measurements are spread across multiple columns. For example, imagine we have a small subset of Big Five Personality (BFI) data where each person has multiple personality items stored across separate columns:
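A minimal sketch of what such a wide dataset could look like is shown here. The object name bfi_wide and all values are invented for illustration; only the column names A1, A2, and A3 come from the text.

```r
# Hypothetical wide-format data: one row per person, one column per item.
# The responses are made-up values, not real BFI data.
bfi_wide <- data.frame(
  id = 1:3,
  A1 = c(2, 4, 5),
  A2 = c(4, 4, 3),
  A3 = c(3, 5, 6)
)

bfi_wide
```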
This is wide because the items A1, A2, and A3 are all separate variables, even though they represent repeated measures of the same underlying construct within a person.
To convert these columns to long format, we use pivot_longer() and select the columns we want to pivot. We also name the single new column that will hold the values (values_to) and the new column whose categories indicate which original column each value came from (names_to).
Now each row corresponds to a single item response: the item column tells us which item it was, and the response column gives the value.
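As a rough sketch, the call could look like the following. The data frame bfi_wide and its values are hypothetical; the new column names item and response follow the description in the text.

```r
library(tidyr)

# Hypothetical wide data (invented values): one column per item
bfi_wide <- data.frame(
  id = 1:3,
  A1 = c(2, 4, 5),
  A2 = c(4, 4, 3),
  A3 = c(3, 5, 6)
)

# Stack A1, A2, and A3 into two columns:
#   item     - which original column the value came from (names_to)
#   response - the value itself (values_to)
bfi_long <- pivot_longer(
  bfi_wide,
  cols = A1:A3,
  names_to = "item",
  values_to = "response"
)

bfi_long
```

With three people and three items, the result has nine rows, one per person-item combination.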
This long structure is ideal for plotting, modeling, and group-based summaries.
Pivoting Multiple Columns at Once
In real datasets, several variables may need pivoting simultaneously. For example, suppose we have two versions of a test (A and B), each with “pre” and “post” scores:
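A wide dataset of this kind might be built as follows. This is only a sketch: the column names pre_A, post_A, pre_B, and post_B are an assumption inferred from the "_" separator used when pivoting, and the last two values for the second person are invented.

```r
# Hypothetical wide data: one column per time/item combination.
# Column names follow a "<time>_<item>" pattern (an assumption).
data <- data.frame(
  id     = 1:2,
  pre_A  = c(3, 4),
  post_A = c(4, 5),
  pre_B  = c(2, 3),  # second value is invented
  post_B = c(3, 4)   # second value is invented
)

data
```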
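library(dplyr)  # provides the %>% pipe
library(tidyr)  # provides pivot_longer()

# Pivot every column except id; names such as "pre_A" are split at "_"
# into a time part and an item part
data_long <- data %>%
  pivot_longer(
    cols = -id,
    names_to = c("time", "item"),
    names_sep = "_",
    values_to = "score"
  )

head(data_long)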
# A tibble: 6 × 4
     id time  item  score
  <int> <chr> <chr> <dbl>
1     1 pre   A         3
2     1 post  A         4
3     1 pre   B         2
4     1 post  B         3
5     2 pre   A         4
6     2 post  A         5
Practice
Transform the following wide dataset into long format. The dataset contains measurements of height and weight taken at two different time points (Time1 and Time2).