Pipes

The preferred method when using dplyr and other tidyverse package functions is to use pipes.

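The pipe symbol used here is |>, the native pipe built into base R since version 4.1.0. Older tidyverse code uses the magrittr pipe %>%, which behaves the same way in most cases.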

Pipes let you chain multiple functions together in the order they are applied. A big advantage of pipes is that a piped command is generally easier to read than the same functions nested within each other.
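As a minimal sketch of the readability difference, the base R example below applies the same three functions first nested and then piped. The vector values is made up for illustration and is not part of the maples data; the example assumes R 4.1.0 or later, where |> is available.

```r
# Hypothetical example values, not part of the hbr_maples data
values <- c(1, 4, 9, 16)

# Nested: read from the inside out
nested_result <- round(mean(sqrt(values)), 1)

# Piped: read from top to bottom, in the order the functions are applied
piped_result <- values |>
    sqrt() |>
    mean() |>
    round(1)

# Both forms give the same answer
identical(nested_result, piped_result)
```

Both versions compute the same value; only the reading order differs.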

This page will give a brief introduction to pipes using various functions that are covered in other pages of this section. Other pages will demonstrate further examples of pipes.

Dataset

For demonstration we’ll load the hbr_maples data from the lterdatasampler package (hyperlink includes install instructions).

Note: When piping, the piped data/object becomes the first argument of the function after the pipe.

#Load package
library("lterdatasampler")
#hbr_maples tibble for demonstration
maples_tbl <- tibble::as_tibble(lterdatasampler::hbr_maples)
maples_tbl
# A tibble: 359 × 11
    year watershed elevation transect sample stem_length leaf1area leaf2area
   <dbl> <fct>     <fct>     <fct>    <fct>        <dbl>     <dbl>     <dbl>
 1  2003 Reference Low       R1       1             86.9     13.8      12.1 
 2  2003 Reference Low       R1       2            114       14.6      15.3 
 3  2003 Reference Low       R1       3             83.5     12.5       9.73
 4  2003 Reference Low       R1       4             68.1      9.97     10.1 
 5  2003 Reference Low       R1       5             72.1      6.84      5.48
 6  2003 Reference Low       R1       6             77.7      9.66      7.64
 7  2003 Reference Low       R1       7             85.5      8.82      9.23
 8  2003 Reference Low       R1       8             81.6      5.83      6.18
 9  2003 Reference Low       R1       9             92.9      8.11      7.13
10  2003 Reference Low       R1       10            59.6      3.02      3.44
# ℹ 349 more rows
# ℹ 3 more variables: leaf_dry_mass <dbl>, stem_dry_mass <dbl>,
#   corrected_leaf_area <dbl>

One step pipe

Below is a one step pipe command. In it we pipe our tibble maples_tbl into the function slice() to extract rows 1:5.

maples_tbl |> dplyr::slice(1:5)
# A tibble: 5 × 11
   year watershed elevation transect sample stem_length leaf1area leaf2area
  <dbl> <fct>     <fct>     <fct>    <fct>        <dbl>     <dbl>     <dbl>
1  2003 Reference Low       R1       1             86.9     13.8      12.1 
2  2003 Reference Low       R1       2            114       14.6      15.3 
3  2003 Reference Low       R1       3             83.5     12.5       9.73
4  2003 Reference Low       R1       4             68.1      9.97     10.1 
5  2003 Reference Low       R1       5             72.1      6.84      5.48
# ℹ 3 more variables: leaf_dry_mass <dbl>, stem_dry_mass <dbl>,
#   corrected_leaf_area <dbl>

The above command is equivalent to the following call, where maples_tbl is passed directly as the first argument.

dplyr::slice(maples_tbl,1:5)
# A tibble: 5 × 11
   year watershed elevation transect sample stem_length leaf1area leaf2area
  <dbl> <fct>     <fct>     <fct>    <fct>        <dbl>     <dbl>     <dbl>
1  2003 Reference Low       R1       1             86.9     13.8      12.1 
2  2003 Reference Low       R1       2            114       14.6      15.3 
3  2003 Reference Low       R1       3             83.5     12.5       9.73
4  2003 Reference Low       R1       4             68.1      9.97     10.1 
5  2003 Reference Low       R1       5             72.1      6.84      5.48
# ℹ 3 more variables: leaf_dry_mass <dbl>, stem_dry_mass <dbl>,
#   corrected_leaf_area <dbl>
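The two commands match because the pipe places its left-hand side in the first argument of the function on its right, and any arguments written in the call follow it. A small base R check of this, using the built-in mtcars data frame as a stand-in (an assumption for illustration, so that no extra packages are needed):

```r
# mtcars ships with base R; used here only as a stand-in data frame
piped <- mtcars |> head(5)
direct <- head(mtcars, 5)

# Compare the piped and direct calls
identical(piped, direct)
```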

To assign the output of a piped command to an object, use the assignment operator (<-) as usual.

maples_subset_tbl <- maples_tbl |> dplyr::slice(1:7)
maples_subset_tbl
# A tibble: 7 × 11
   year watershed elevation transect sample stem_length leaf1area leaf2area
  <dbl> <fct>     <fct>     <fct>    <fct>        <dbl>     <dbl>     <dbl>
1  2003 Reference Low       R1       1             86.9     13.8      12.1 
2  2003 Reference Low       R1       2            114       14.6      15.3 
3  2003 Reference Low       R1       3             83.5     12.5       9.73
4  2003 Reference Low       R1       4             68.1      9.97     10.1 
5  2003 Reference Low       R1       5             72.1      6.84      5.48
6  2003 Reference Low       R1       6             77.7      9.66      7.64
7  2003 Reference Low       R1       7             85.5      8.82      9.23
# ℹ 3 more variables: leaf_dry_mass <dbl>, stem_dry_mass <dbl>,
#   corrected_leaf_area <dbl>
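Piping does not modify the object on the left of the pipe; only the object named on the left of <- is created or overwritten. A minimal sketch with the built-in mtcars data frame (a stand-in for illustration, not part of this page's dataset; first_seven is a hypothetical name):

```r
# Assign the piped result to a new object
first_seven <- mtcars |> head(7)

# The new object holds the subset
nrow(first_seven)

# The original data frame still has all of its rows
nrow(mtcars)
```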

Multi step pipe

Pipes are great when you need to use multiple functions on one object.

In the below case we:

  • Extract rows 1 to 1000 with dplyr::slice() (maples_tbl only has 359 rows, so all of them are kept)
  • Then filter those rows so we only retain observations/rows where the elevation column value is equal to “Low” with dplyr::filter()
  • Select columns leaf1area and leaf2area with dplyr::select()
  • Summarise the resulting tibble with summary()

Note: Pipes are not limited to tidyverse functions. For example, summary() in the last step is a base R function.

maples_tbl |> 
    #Extract indexes 1 to 1000
    dplyr::slice(1:1000) |>
    #Extract only low elevation samples
    dplyr::filter(elevation == "Low") |>
    #Select the columns leaf1area and leaf2area
    dplyr::select(c(leaf1area,leaf2area)) |>
    #Summarise tibble
    summary()
   leaf1area        leaf2area     
 Min.   : 2.480   Min.   : 3.444  
 1st Qu.: 9.308   1st Qu.: 9.548  
 Median :12.110   Median :11.891  
 Mean   :12.239   Mean   :12.305  
 3rd Qu.:15.405   3rd Qu.:15.588  
 Max.   :26.198   Max.   :24.235  

The same steps written as nested functions are less visually clear.

summary(dplyr::select(dplyr::filter(dplyr::slice(maples_tbl,1:1000),elevation=="Low"),c(leaf1area,leaf2area)))
   leaf1area        leaf2area     
 Min.   : 2.480   Min.   : 3.444  
 1st Qu.: 9.308   1st Qu.: 9.548  
 Median :12.110   Median :11.891  
 Mean   :12.239   Mean   :12.305  
 3rd Qu.:15.405   3rd Qu.:15.588  
 Max.   :26.198   Max.   :24.235
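
The dplyr::slice(1:1000) step in both commands runs without error even though it asks for more rows than exist, because dplyr::slice() silently ignores positive indexes beyond the last row. A small sketch of that behaviour on a made-up five-row tibble (small_tbl and sliced_tbl are hypothetical names), assuming the dplyr and tibble packages are installed:

```r
# Hypothetical five-row tibble for illustration
small_tbl <- tibble::tibble(id = 1:5)

# Indexes 6 to 1000 do not exist and are dropped without a warning
sliced_tbl <- small_tbl |> dplyr::slice(1:1000)

# Number of rows actually returned
nrow(sliced_tbl)
```

If you need a command to fail when fewer rows are present than expected, check nrow() explicitly rather than relying on slice().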