Long Format vs. Wide Format

Choosing between long and wide format depends on what you want to do with your data. Long format is usually better for analysis and visualization, while wide format is often better for inspection and for methods that require matrix-like inputs. Being able to move between the two lets you work in the most convenient form for each task.
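To make the round trip concrete, here is a minimal sketch on a small made-up table rather than the real data. The tibble `toy_wide`, its `id` column, and its values are assumptions introduced only for illustration. The `id` column matters: it tells `pivot_wider()` which long rows belong to the same original row.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from bfi): one row per person, one column per item
toy_wide <- tibble(
  id = 1:3,
  A1 = c(2, 1, 5),
  A2 = c(4, 4, 3)
)

# Wide -> long: one row per person-item pair
toy_long <- toy_wide %>%
  pivot_longer(A1:A2, names_to = "item", values_to = "response")

# Long -> wide: id identifies which rows are reassembled together
toy_wide_again <- toy_long %>%
  pivot_wider(names_from = item, values_from = response)

toy_long
toy_wide_again
```

Without an identifier such as `id`, the long table cannot in general be reshaped back to one row per person, so it is worth creating one before pivoting if you expect to return to wide format.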

Long Format

Long format works well when you want to summarize or visualize repeated measures. Once the item names are stored as values in a single column, with one row per person-item response, grouping and plotting become much simpler.

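library(dplyr)
library(tidyr)

# Load the bfi data (25 personality items plus gender, education, age)
data("bfi", package = "psych")

# Stack the five Agreeableness items (A1 to A5) into item/response pairs,
# then move the two new columns to the front
bfi_long <- bfi %>%
  pivot_longer(A1:A5, names_to = "item", values_to = "response") %>%
  select(item, response, everything())

head(bfi_long)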
# A tibble: 6 × 25
  item  response    C1    C2    C3    C4    C5    E1    E2    E3    E4    E5
  <chr>    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 A1           2     2     3     3     4     4     3     3     3     4     4
2 A2           4     2     3     3     4     4     3     3     3     4     4
3 A3           3     2     3     3     4     4     3     3     3     4     4
4 A4           4     2     3     3     4     4     3     3     3     4     4
5 A5           4     2     3     3     4     4     3     3     3     4     4
6 A1           2     5     4     4     3     4     1     1     6     4     3
# ℹ 13 more variables: N1 <int>, N2 <int>, N3 <int>, N4 <int>, N5 <int>,
#   O1 <int>, O2 <int>, O3 <int>, O4 <int>, O5 <int>, gender <int>,
#   education <int>, age <int>
# Mean response for each item, ignoring missing values
bfi_long %>%
  group_by(item) %>%
  summarise(mean = mean(response, na.rm = TRUE))
# A tibble: 5 × 2
  item   mean
  <chr> <dbl>
1 A1     2.41
2 A2     4.80
3 A3     4.60
4 A4     4.70
5 A5     4.56

Long format also works naturally with ggplot2, because a column such as item can be mapped directly to an aesthetic or used as a faceting variable.
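As a hedged illustration of the faceting point, the sketch below draws one bar chart of response counts per item. It rebuilds `bfi_long` so that it runs on its own and assumes the `bfi` data can be loaded from psych as above. The plot object name `p`, the choice of `geom_bar()`, and the axis labels are illustrative choices, not part of the original analysis.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

data("bfi", package = "psych")

# Same reshaping as above: one row per person-item response
bfi_long <- bfi %>%
  pivot_longer(A1:A5, names_to = "item", values_to = "response")

# One panel per item; the item column drives the facets directly
p <- bfi_long %>%
  filter(!is.na(response)) %>%
  ggplot(aes(x = factor(response))) +
  geom_bar() +
  facet_wrap(~ item) +
  labs(x = "Response", y = "Count")

print(p)
```

In wide format the same figure would need a separate plotting call per column, which is why reshaping first is usually the simpler route.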

Wide Format

Wide format is helpful when each variable needs its own column, for example when a function such as cor() expects a matrix-like input, or when you want to export a readable summary table.

# Correlations among the five items, using all complete pairs of observations
bfi %>%
  select(A1:A5) %>%
  cor(use = "pairwise.complete.obs")
           A1         A2         A3         A4         A5
A1  1.0000000 -0.3401932 -0.2652471 -0.1464245 -0.1814383
A2 -0.3401932  1.0000000  0.4850980  0.3350872  0.3900836
A3 -0.2652471  0.4850980  1.0000000  0.3604283  0.5041411
A4 -0.1464245  0.3350872  0.3604283  1.0000000  0.3075373
A5 -0.1814383  0.3900836  0.5041411  0.3075373  1.0000000

You can also pivot back to wide format after summarizing:

bfi_long %>%
  group_by(item) %>%
  summarise(mean = mean(response, na.rm = TRUE)) %>%
  pivot_wider(names_from = item, values_from = mean)
# A tibble: 1 × 5
     A1    A2    A3    A4    A5
  <dbl> <dbl> <dbl> <dbl> <dbl>
1  2.41  4.80  4.60  4.70  4.56