Description
I think it would be really useful for pivot_longer() to preserve the variable labels of the pivoted columns as value labels on the new names column. Currently this is not possible. To clarify, I don't think the original value labels should be preserved.
I work with survey data that are usually saved as .sav files, which I import into R with the haven package. This gives me both variable labels and value labels. In this case, only the variable labels are relevant. Often, the variable labels are the questions used in the survey and are quite long. When a question is "select all that apply", each response option is split into its own variable. In order to analyze such questions, I use pivot_longer() to combine those variables into one.
The issue I run into is that I would like to maintain the original variable labels as the value labels. Here is an example data frame with variable labels.
library(tidyr)
# create fake data
df <- tibble(
  q1 = haven::labelled(0:1, label = "Question 1"),
  q2_1 = haven::labelled(0:1, label = "Question 2, Response Option 1"),
  q2_2 = haven::labelled(0:1, label = "Question 2, Response Option 2"),
  q2_3 = haven::labelled(0:1, label = "Question 2, Response Option 3"),
  q2_4 = haven::labelled(0:1, label = "Question 2, Response Option 4"),
  q2_5 = haven::labelled(0:1, label = "Question 2, Response Option 5"),
  q2_6 = haven::labelled(0:1, label = "Question 2, Response Option 6"),
  q2_7 = haven::labelled(0:1, label = "Question 2, Response Option 7"),
  q2_8 = haven::labelled(0:1, label = "Question 2, Response Option 8"),
  q3 = haven::labelled(0:1, label = "Question 3")
)
# pivot the data
df_long <- df %>%
  pivot_longer(
    cols = -c(q1, q3),
    names_to = "var",
    values_to = "resp"
  )
df_long
labelled::look_for(df_long)
We can see in df_long that none of the variable labels made it in as value labels for "var". We can set them manually using labelled::set_value_labels() like this:
# value labels are given as c(label = value), so the label text is the name
df_long <- df_long %>%
  labelled::set_value_labels(
    var = c("Question 2, Response Option 1" = "q2_1",
            "Question 2, Response Option 2" = "q2_2",
            "Question 2, Response Option 3" = "q2_3",
            "Question 2, Response Option 4" = "q2_4",
            "Question 2, Response Option 5" = "q2_5",
            "Question 2, Response Option 6" = "q2_6",
            "Question 2, Response Option 7" = "q2_7",
            "Question 2, Response Option 8" = "q2_8")
  )
labelled::look_for(df_long)
We can see now that the "var" variable has value labels. These value labels are the same as the variable labels of the variables that were pivoted. This is the ideal output.
Is it possible to change pivot_longer() so that it provides you the option of using the variable labels as the new value_labels in the "var" column?
The current workaround I have found is the sjlabelled::label_to_colnames() function, as seen below:
df_long <- df %>%
  sjlabelled::label_to_colnames(q2_1:q2_8) %>%
  pivot_longer(
    cols = -c(q1, q3),
    names_to = "var",
    values_to = "resp"
  )
df_long
However, this is really not ideal, as it just renames the variables before pivoting and doesn't add value labels. The "var" column then contains the full label text, which makes subsetting on it incredibly cumbersome.
I also think that this should be possible since each variable has only one variable label, so there shouldn't be any conflicts when pivoting, unlike with value labels.
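To make the request concrete, here is a rough sketch of the behaviour I have in mind, written as a wrapper around pivot_longer(). The helper name pivot_longer_keep_labels() and its defaults are hypothetical and not part of tidyr; it assumes the variable label is stored in the "label" attribute (as haven does) and falls back to the column name when a column has no label.

```r
library(tidyr)

# Hypothetical helper, NOT part of tidyr: pivot as usual, then reuse the
# variable labels of the pivoted columns as value labels on the names column.
pivot_longer_keep_labels <- function(data, cols,
                                     names_to = "var", values_to = "resp") {
  # Resolve the tidyselect expression to the names of the selected columns
  picked <- names(tidyselect::eval_select(rlang::enquo(cols), data))

  # Collect each column's variable label (assumed to live in attr "label");
  # fall back to the column name if no label is present
  var_labels <- vapply(picked, function(nm) {
    lbl <- attr(data[[nm]], "label", exact = TRUE)
    if (is.null(lbl)) nm else as.character(lbl)
  }, character(1))

  long <- pivot_longer(
    data,
    cols = all_of(picked),
    names_to = names_to,
    values_to = values_to
  )

  # haven stores value labels as c(label = value): the label text is the
  # name, the short column name is the value kept in the data
  long[[names_to]] <- haven::labelled(
    long[[names_to]],
    labels = stats::setNames(picked, unname(var_labels))
  )
  long
}

# Small example in the spirit of the data above
df_small <- tibble(
  q1 = haven::labelled(0:1, label = "Question 1"),
  q2_1 = haven::labelled(0:1, label = "Question 2, Response Option 1"),
  q2_2 = haven::labelled(0:1, label = "Question 2, Response Option 2"),
  q3 = haven::labelled(0:1, label = "Question 3")
)

df_small_long <- pivot_longer_keep_labels(df_small, -c(q1, q3))
attr(df_small_long$var, "labels")
```

With something like this, the "var" column would keep the short names (q2_1, q2_2, ...) as its values, so subsetting stays easy, while the long question text is still available as value labels.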