Select

The function dplyr::select() selects columns from a tibble. Columns can be chosen by name, by position, or with helper functions that match on the column names.

When subsetting with dplyr::select(), the result is always a tibble, even when only one column is selected.

Tidyverse reference page

Dataset

For demonstration, we’ll load the mammal_sleep_tbl data from the mgrtibbles package (hyperlink includes install instructions). For easier viewing, we’ll keep only its first 5 rows.

# Load package
library("mgrtibbles")
# Keep the first 5 rows of mammal_sleep_tbl for demonstration
mammal_sleep_tbl <- mgrtibbles::mammal_sleep_tbl |> dplyr::slice(1:5)
mammal_sleep_tbl
# A tibble: 5 × 11
  species body_wt brain_wt non_dreaming dreaming total_sleep life_span gestation
  <chr>     <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>     <dbl>
1 Africa… 6654      5.71           NA       NA           3.3      38.6       645
2 Africa…    1      0.0066          6.3      2           8.3       4.5        42
3 Arctic…    3.38   0.0445         NA       NA          12.5      14          60
4 Arctic…    0.92   0.0057         NA       NA          16.5      NA          25
5 Asiane… 2547      4.60            2.1      1.8         3.9      69         624
# ℹ 3 more variables: predation <fct>, exposure <fct>, danger <fct>

One column

Select one column by name.

mammal_sleep_tbl |> dplyr::select(dreaming)
# A tibble: 5 × 1
  dreaming
     <dbl>
1     NA  
2      2  
3     NA  
4     NA  
5      1.8
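
Because dplyr::select() always returns a tibble, a single selected column is still a one-column tibble rather than a vector. The sketch below illustrates this on a small stand-in tibble (demo_tbl, with made-up values, is an assumption for illustration and is not the mgrtibbles data) and contrasts it with dplyr::pull(), which extracts the column as a vector.

```r
# Hypothetical stand-in tibble, not the real mammal_sleep_tbl
demo_tbl <- tibble::tibble(
  species  = c("a", "b", "c"),
  dreaming = c(NA, 2, 1.8)
)

# select() keeps the tibble structure, even for one column
one_col <- demo_tbl |> dplyr::select(dreaming)
tibble::is_tibble(one_col)

# pull() returns the column itself as a plain numeric vector
one_vec <- demo_tbl |> dplyr::pull(dreaming)
is.numeric(one_vec)
```

If a later step needs a vector (for example mean()), pull() is likely the more convenient choice.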

Multiple columns

Select multiple columns individually.

mammal_sleep_tbl |> dplyr::select(species, dreaming, gestation)
# A tibble: 5 × 3
  species                dreaming gestation
  <chr>                     <dbl>     <dbl>
1 Africanelephant            NA         645
2 Africangiantpouchedrat      2          42
3 ArcticFox                  NA          60
4 Arcticgroundsquirrel       NA          25
5 Asianelephant               1.8       624

Consecutive range of columns

Select a consecutive range of columns using the first and last column names of the range.

mammal_sleep_tbl |> dplyr::select(dreaming:gestation)
# A tibble: 5 × 4
  dreaming total_sleep life_span gestation
     <dbl>       <dbl>     <dbl>     <dbl>
1     NA           3.3      38.6       645
2      2           8.3       4.5        42
3     NA          12.5      14          60
4     NA          16.5      NA          25
5      1.8         3.9      69         624
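
A name range can also be negated to keep every column outside it. The following is a minimal sketch on a one-row stand-in tibble (demo_tbl and its values are hypothetical; only the column names mirror the first eight columns of the real data).

```r
# Hypothetical stand-in with the same first eight column names
demo_tbl <- tibble::tibble(
  species = "x", body_wt = 1, brain_wt = 0.1, non_dreaming = 6,
  dreaming = 2, total_sleep = 8, life_span = 4.5, gestation = 42
)

# Columns from dreaming through gestation, inclusive
ranged <- demo_tbl |> dplyr::select(dreaming:gestation)
names(ranged)

# The complement: every column outside that range
outside <- demo_tbl |> dplyr::select(!(dreaming:gestation))
names(outside)
```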

Numeric indexes

Numeric indexes can be used for column selection.

Select the first column.

mammal_sleep_tbl |> dplyr::select(1)
# A tibble: 5 × 1
  species               
  <chr>                 
1 Africanelephant       
2 Africangiantpouchedrat
3 ArcticFox             
4 Arcticgroundsquirrel  
5 Asianelephant         

Select columns 3:5.

mammal_sleep_tbl |> dplyr::select(3:5)
# A tibble: 5 × 3
  brain_wt non_dreaming dreaming
     <dbl>        <dbl>    <dbl>
1   5.71           NA       NA  
2   0.0066          6.3      2  
3   0.0445         NA       NA  
4   0.0057         NA       NA  
5   4.60            2.1      1.8

Select columns 4, 7, and 2, in that order.

mammal_sleep_tbl |> dplyr::select(c(4, 7, 2))
# A tibble: 5 × 3
  non_dreaming life_span body_wt
         <dbl>     <dbl>   <dbl>
1         NA        38.6 6654   
2          6.3       4.5    1   
3         NA        14      3.38
4         NA        NA      0.92
5          2.1      69   2547   

Select all but the 6th column.

mammal_sleep_tbl |> dplyr::select(-6)
# A tibble: 5 × 10
  species   body_wt brain_wt non_dreaming dreaming life_span gestation predation
  <chr>       <dbl>    <dbl>        <dbl>    <dbl>     <dbl>     <dbl> <fct>    
1 Africane… 6654      5.71           NA       NA        38.6       645 3        
2 Africang…    1      0.0066          6.3      2         4.5        42 3        
3 ArcticFox    3.38   0.0445         NA       NA        14          60 1        
4 Arcticgr…    0.92   0.0057         NA       NA        NA          25 5        
5 Asianele… 2547      4.60            2.1      1.8      69         624 3        
# ℹ 2 more variables: exposure <fct>, danger <fct>
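
Columns can be dropped by name as well as by position. The sketch below, again on a hypothetical one-row stand-in tibble (demo_tbl is not the real data), suggests that the two forms give the same result when the 6th column is total_sleep. Dropping by name is usually less fragile if the column order changes.

```r
# Hypothetical stand-in with the same first eight column names
demo_tbl <- tibble::tibble(
  species = "x", body_wt = 1, brain_wt = 0.1, non_dreaming = 6,
  dreaming = 2, total_sleep = 8, life_span = 4.5, gestation = 42
)

# Drop the 6th column by position
by_position <- demo_tbl |> dplyr::select(-6)

# Drop the same column by name
by_name <- demo_tbl |> dplyr::select(-total_sleep)

# Both approaches keep the same columns in the same order
identical(names(by_position), names(by_name))
```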

Last column

Select the last column with the last_col() helper function.

mammal_sleep_tbl |> dplyr::select(last_col())
# A tibble: 5 × 1
  danger
  <fct> 
1 3     
2 3     
3 1     
4 3     
5 4     

Select the fourth-to-last column, i.e. the column 3 positions before the last one.

mammal_sleep_tbl |> dplyr::select(last_col(3))
# A tibble: 5 × 1
  gestation
      <dbl>
1       645
2        42
3        60
4        25
5       624

Select the last 3 columns.

mammal_sleep_tbl |> dplyr::select(last_col(2):last_col())
# A tibble: 5 × 3
  predation exposure danger
  <fct>     <fct>    <fct> 
1 3         5        3     
2 3         1        3     
3 1         1        1     
4 5         2        3     
5 3         5        4     
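
The argument to last_col() is an offset counted back from the last column, so last_col() is offset 0 and last_col(1) is the second-to-last column. A small sketch on a hypothetical five-column tibble (demo_tbl is made up for illustration) makes the counting easier to follow.

```r
# Hypothetical tibble with columns a to e
demo_tbl <- tibble::tibble(a = 1, b = 2, c = 3, d = 4, e = 5)

# Offset 0 (the default): the last column, "e"
last_one <- demo_tbl |> dplyr::select(last_col())

# Offset 1: one position before the last column, "d"
second_last <- demo_tbl |> dplyr::select(last_col(1))

# A range from offset 2 to the end: "c", "d", "e"
last_three <- demo_tbl |> dplyr::select(last_col(2):last_col())

names(last_one)
names(second_last)
names(last_three)
```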

Starts with

Select columns whose names begin with a given prefix using the starts_with() helper function.

mammal_sleep_tbl |> dplyr::select(starts_with("b"))
# A tibble: 5 × 2
  body_wt brain_wt
    <dbl>    <dbl>
1 6654      5.71  
2    1      0.0066
3    3.38   0.0445
4    0.92   0.0057
5 2547      4.60  

Ends with

Select columns whose names end with a given suffix using the ends_with() helper function.

mammal_sleep_tbl |> dplyr::select(ends_with("dreaming"))
# A tibble: 5 × 2
  non_dreaming dreaming
         <dbl>    <dbl>
1         NA       NA  
2          6.3      2  
3         NA       NA  
4         NA       NA  
5          2.1      1.8
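
Both starts_with() and ends_with() also accept a character vector, in which case a column is selected if its name matches any of the strings. The following sketch uses a hypothetical one-row stand-in tibble (demo_tbl is not the real data) to select columns ending in either "_wt" or "_span".

```r
# Hypothetical stand-in tibble
demo_tbl <- tibble::tibble(
  species = "x", body_wt = 1, brain_wt = 0.1,
  life_span = 4.5, total_sleep = 8
)

# Keep columns whose names end with "_wt" or "_span"
multi_suffix <- demo_tbl |> dplyr::select(ends_with(c("_wt", "_span")))
names(multi_suffix)
```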

Contains

Select columns whose names contain a given string using the contains() helper function.

mammal_sleep_tbl |> dplyr::select(contains("in"))
# A tibble: 5 × 3
  brain_wt non_dreaming dreaming
     <dbl>        <dbl>    <dbl>
1   5.71           NA       NA  
2   0.0066          6.3      2  
3   0.0445         NA       NA  
4   0.0057         NA       NA  
5   4.60            2.1      1.8
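
By default the name-matching helpers ignore case, which can be switched off with ignore.case = FALSE. The sketch below shows the difference on a hypothetical stand-in tibble (demo_tbl is made up for illustration).

```r
# Hypothetical stand-in tibble with lower-case column names
demo_tbl <- tibble::tibble(body_wt = 1, brain_wt = 0.1, dreaming = 2)

# Default: case is ignored, so "WT" matches body_wt and brain_wt
any_case <- demo_tbl |> dplyr::select(contains("WT"))
names(any_case)

# Case-sensitive: no column name contains upper-case "WT",
# so the result is a tibble with zero columns
exact_case <- demo_tbl |> dplyr::select(contains("WT", ignore.case = FALSE))
ncol(exact_case)
```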

Matches

The matches() function is similar to contains(), but it selects columns whose names match a regular expression.

In the example below, [] matches any one of the characters inside the brackets. In other words, [eu]r means “er” or “ur”.

mammal_sleep_tbl |> dplyr::select(matches("[eu]r"))
# A tibble: 5 × 2
  exposure danger
  <fct>    <fct> 
1 5        3     
2 1        3     
3 1        1     
4 2        3     
5 5        4     

Another example, where s[lp] means “sl” or “sp”.

mammal_sleep_tbl |> dplyr::select(matches("s[lp]"))
# A tibble: 5 × 3
  species                total_sleep life_span
  <chr>                        <dbl>     <dbl>
1 Africanelephant                3.3      38.6
2 Africangiantpouchedrat         8.3       4.5
3 ArcticFox                     12.5      14  
4 Arcticgroundsquirrel          16.5      NA  
5 Asianelephant                  3.9      69  
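
To make the difference from contains() concrete: contains() treats its argument as a literal string, while matches() treats it as a regular expression. The sketch below uses a hypothetical tibble whose column names (body.wt, bodyXwt, brain_wt) were invented so that the two helpers give different results.

```r
# Hypothetical tibble with invented column names
demo_tbl <- tibble::tibble(body.wt = 1, bodyXwt = 2, brain_wt = 0.1)

# contains(): "." is a literal dot, so only body.wt is selected
literal <- demo_tbl |> dplyr::select(contains("y.wt"))
names(literal)

# matches(): "." is a regex wildcard for any single character,
# so both body.wt and bodyXwt are selected
regex <- demo_tbl |> dplyr::select(matches("y.wt"))
names(regex)
```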

Combine with c()

You can combine many of the methods above with c().

Select the species column and the last three columns.

mammal_sleep_tbl |> dplyr::select(
    c(species, last_col(2):last_col())
)
# A tibble: 5 × 4
  species                predation exposure danger
  <chr>                  <fct>     <fct>    <fct> 
1 Africanelephant        3         5        3     
2 Africangiantpouchedrat 3         1        3     
3 ArcticFox              1         1        1     
4 Arcticgroundsquirrel   5         2        3     
5 Asianelephant          3         5        4     

Select the species column, the columns containing dreaming, and the 6th and 7th columns.

mammal_sleep_tbl |> dplyr::select(
    c(species, contains("dreaming"), 6:7)
)
# A tibble: 5 × 5
  species                non_dreaming dreaming total_sleep life_span
  <chr>                         <dbl>    <dbl>       <dbl>     <dbl>
1 Africanelephant                NA       NA           3.3      38.6
2 Africangiantpouchedrat          6.3      2           8.3       4.5
3 ArcticFox                      NA       NA          12.5      14  
4 Arcticgroundsquirrel           NA       NA          16.5      NA  
5 Asianelephant                   2.1      1.8         3.9      69
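
The same selections can also be passed to dplyr::select() as separate arguments, without wrapping them in c(). The sketch below compares the two forms on a hypothetical one-row stand-in tibble (demo_tbl is not the real data; only its column names mirror the first eight columns).

```r
# Hypothetical stand-in with the same first eight column names
demo_tbl <- tibble::tibble(
  species = "x", body_wt = 1, brain_wt = 0.1, non_dreaming = 6,
  dreaming = 2, total_sleep = 8, life_span = 4.5, gestation = 42
)

# Selections wrapped in c()
with_c <- demo_tbl |> dplyr::select(
    c(species, contains("dreaming"), 6:7)
)

# The same selections as separate arguments
without_c <- demo_tbl |> dplyr::select(species, contains("dreaming"), 6:7)

# Both forms pick the same columns in the same order
identical(names(with_c), names(without_c))
```

Wrapping in c() is mostly useful when the whole selection needs to be treated as one unit, for example to negate it.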