From Wide to Long

A dataset is considered wide when repeated measurements are spread across multiple columns. For example, imagine we have a small subset of Big Five Personality (BFI) data where each person has multiple personality items stored across separate columns:

library(dplyr)
library(tidyr)
data("bfi", package = "psych")
bfi_small <- bfi %>% 
  mutate(id = row_number()) %>%
  select(id, A1, A2, A3)
head(bfi_small)
      id A1 A2 A3
61617  1  2  4  3
61618  2  2  4  5
61620  3  5  4  5
61621  4  4  4  6
61622  5  2  3  3
61623  6  6  6  5

This is wide because the items A1, A2, and A3 are all separate variables, even though they represent repeated measures of the same underlying construct within a person.

To convert these columns into a long format, we use pivot_longer() and select the variables we want to change from wide to long. Here, we also indicate how we want to name the new (now) single column (values_to), and specify the variable name that will consists of various categories indicating from which individual column the values originate (names_to).

bfi_small_long <- bfi_small %>%
  pivot_longer(
    cols = A1:A3,
    names_to = "item",
    values_to = "response"
  )

head(bfi_small_long)
# A tibble: 6 × 3
     id item  response
  <int> <chr>    <int>
1     1 A1           2
2     1 A2           4
3     1 A3           3
4     2 A1           2
5     2 A2           4
6     2 A3           5

Now each row corresponds to a single item response, the item column tells us which item it was, the response column gives the value.

This long structure is ideal for plotting, modeling, and group-based summaries.

Pivoting Multiple Columns at Once

In real datasets, multiple variables may need pivoting simultaneously. For example, suppose we have two versions of a test, with “pre” and “post” scores:

library(tidyr)
data <- tibble(
  id = 1:3,
  pre_A = c(3, 4, 2),
  post_A = c(4, 5, 3),
  pre_B = c(2, 3, 2),
  post_B = c(3, 4, 3)
)
head(data)
# A tibble: 3 × 5
     id pre_A post_A pre_B post_B
  <int> <dbl>  <dbl> <dbl>  <dbl>
1     1     3      4     2      3
2     2     4      5     3      4
3     3     2      3     2      3

We can take all pre_* and post_* columns long:

library(dplyr)
data_long <- data %>%
  pivot_longer(
    cols = -id,
    names_to = c("time", "item"),
    names_sep = "_",
    values_to = "score"
  )
head(data_long)
# A tibble: 6 × 4
     id time  item  score
  <int> <chr> <chr> <dbl>
1     1 pre   A         3
2     1 post  A         4
3     1 pre   B         2
4     1 post  B         3
5     2 pre   A         4
6     2 post  A         5
Practice

Transform the following wide dataset into long format. The dataset contains measurements of height and weight taken at two different time points (Time1 and Time2).

# A tibble: 4 × 5
     id height_Time1 height_Time2 weight_Time1 weight_Time2
  <int>        <dbl>        <dbl>        <dbl>        <dbl>
1     1          150          152           50           52
2     2          160          162           60           62
3     3          170          172           70           72
4     4          180          182           80           82
long_data <- wide_data %>%
  pivot_longer(
    cols = -id,
    names_to = c("measurement", "time"),
    names_sep = "_",
    values_to = "value"
  )
Back to top