The pipe operator simply passes the output from the upstream function into the first argument of the downstream function. Hence, these two expressions are equivalent:
mean(bfi$age)
[1] 28.78214
bfi$age |> mean()
[1] 28.78214
Create a pipeline to complete the following operations on the mtcars dataset.
mtcars dataset.
mtcars data.
Your solution should be a single pipeline that produces a length-one logical vector.
You can use the following Base R functions in your solution:
scale(): Standardize variables.
colMeans(): Compute the mean of each column in a matrix or data frame.
The data piped into a function fill only the first argument. We’re free to specify any additional arguments in the normal way. Hence, the following two expressions are also equivalent.
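As an illustration, here is a minimal sketch using the built-in mtcars data. The choice of mean() and its trim argument is an assumption made for this example. It only shows that an extra argument can be supplied to the downstream function exactly as it would be in an ordinary call.

```r
# Nested call: the data are the first argument, `trim` is an extra argument.
nested <- mean(mtcars$mpg, trim = 0.1)

# Piped call: the pipe fills the first argument; `trim` is given as usual.
piped <- mtcars$mpg |> mean(trim = 0.1)

# Both calls are the same computation, so the results are identical.
identical(nested, piped)
```

The native pipe |> used here requires R 4.1.0 or later.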
Create a pipeline to complete the following operations in one expression.
Your solution should be a single pipeline that produces a length-one logical vector.
Many R functions, particularly those that use a so-called formula interface, don’t take the input data as their first argument. If you try to include such a function in a normal pipeline, it won’t work. For example, in the following code, we try to use the bfi dataset to estimate a linear regression model wherein age and open predict extra.
bfi |> lm(extra ~ age + open)
Error in `as.data.frame.default()`:
! cannot coerce class '"formula"' to a data.frame
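The error arises because the pipe places the data in the first argument of lm(), which is formula, so the formula is matched to the second argument, data. The sketch below reproduces the same mistake with the built-in mtcars data. The variables mpg and wt are stand-ins chosen for illustration, and tryCatch() is used only so that the script keeps running after the error.

```r
# Piping into lm() without a placeholder: the data land in the first
# argument (formula), and the formula is matched to the second (data).
piped <- tryCatch(
  mtcars |> lm(mpg ~ wt),
  error = function(e) e
)

# The pipeline above is just another way of writing this call.
explicit <- tryCatch(
  lm(mtcars, mpg ~ wt),
  error = function(e) e
)

# Both attempts fail, so both objects are captured error conditions.
inherits(piped, "error")
inherits(explicit, "error")
```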
In these cases, you can use the special placeholder token _ to tell R explicitly where to insert the piped object.
bfi |> lm(extra ~ age + open, data = _)
Call:
lm(formula = extra ~ age + open, data = bfi)
Coefficients:
(Intercept)          age         open
   2.752007     0.004406     0.276073
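The same pattern can be checked on data that ship with R. The following sketch assumes the built-in mtcars data and an arbitrary model (mpg predicted by wt and hp), and compares the piped call with the ordinary one.

```r
# Explicit call: the data go to the named `data` argument.
fit_explicit <- lm(mpg ~ wt + hp, data = mtcars)

# Piped call: `_` marks where the piped object should be inserted.
# The placeholder requires R 4.2.0 or later.
fit_piped <- mtcars |> lm(mpg ~ wt + hp, data = _)

# Both calls estimate the same model, so the coefficients match.
all.equal(coef(fit_explicit), coef(fit_piped))
```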
Note that we must name the argument to which we assign _. The following won’t work.
bfi |> lm(extra ~ age + open, _)
Error in lm(extra ~ age + open, "_"): pipe placeholder can only be used as a named argument (<input>:1:8)
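The location information in this message (<input>:1:8) suggests that R rejects the code while parsing it, before lm() is ever called. A minimal sketch of that behaviour, assuming the built-in mtcars data and using parse() so the syntax error can be captured as an ordinary condition:

```r
# An unnamed placeholder is a syntax error, so parsing alone fails.
bad <- tryCatch(
  parse(text = "mtcars |> lm(mpg ~ wt, _)"),
  error = function(e) e
)

# Naming the argument makes the same code parse without complaint.
good <- tryCatch(
  parse(text = "mtcars |> lm(mpg ~ wt, data = _)"),
  error = function(e) e
)

inherits(bad, "error")
is.expression(good)
```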
This trick makes it possible to use nearly any function in a pipeline, even when the dataset isn’t the first argument.
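Two further illustrations, sketched under the assumption that the built-in mtcars data are available. The functions t.test() and gsub() are examples chosen here, not ones taken from the text above. In both, the piped object belongs somewhere other than the first argument.

```r
# t.test() has a formula interface, so the data go to `data`.
test_result <- mtcars |> t.test(mpg ~ am, data = _)

# gsub() takes the text to modify as its third argument, `x`.
cleaned <- "Mazda RX4" |> gsub(" ", "-", x = _)

class(test_result)
cleaned
```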
Create a pipeline to perform the following operations with the bfi dataset:
Randomly sample 500 rows using dplyr::slice_sample().
Regress agree onto extra, open, and gender using lm().
Compute the sum of the absolute residuals.
library(dplyr)
set.seed(235711)
bfi |>
  slice_sample(n = 500) |>
  lm(agree ~ extra + open + gender, data = _) |>
  resid() |>
  abs() |>
  sum()
[1] 318.6229
NOTES:
If you call set.seed(235711) before your pipeline, you should get exactly the same result, because slice_sample() will use the same sequence of pseudo-random numbers to pick the rows it “randomly” samples.
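The effect of the seed can be seen in a small sketch. To avoid a package dependency it uses base R's sample() in place of slice_sample(), and draw_rows() is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: seed the generator, then draw 5 row indices from mtcars.
draw_rows <- function(seed) {
  set.seed(seed)
  sample(nrow(mtcars), size = 5)
}

first  <- draw_rows(235711)
second <- draw_rows(235711)

# Same seed, same pseudo-random sequence, same rows.
identical(first, second)
```

Without the call to set.seed(), repeated draws would generally select different rows.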