The function dplyr::select() allows you to select columns from a tibble. There are many different ways to do this with various helper functions.
When subsetting with dplyr::select(), the resulting object will always be a tibble.
Tidyverse reference page
Dataset
For demonstration we’ll load the mammal_sleep_tbl data from the mgrtibbles package (hyperlink includes install instructions). For easier viewing we’ll subset it so it only has 5 rows.
# Load package
library("mgrtibbles")
# mammal_sleep_tbl tibble for demonstration
mammal_sleep_tbl <- mgrtibbles::mammal_sleep_tbl |> dplyr::slice(1:5)
mammal_sleep_tbl
# A tibble: 5 × 11
species body_wt brain_wt non_dreaming dreaming total_sleep life_span gestation
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Africa… 6654 5.71 NA NA 3.3 38.6 645
2 Africa… 1 0.0066 6.3 2 8.3 4.5 42
3 Arctic… 3.38 0.0445 NA NA 12.5 14 60
4 Arctic… 0.92 0.0057 NA NA 16.5 NA 25
5 Asiane… 2547 4.60 2.1 1.8 3.9 69 624
# ℹ 3 more variables: predation <fct>, exposure <fct>, danger <fct>
One column
Select one column.
mammal_sleep_tbl |> dplyr::select(dreaming)
# A tibble: 5 × 1
dreaming
<dbl>
1 NA
2 2
3 NA
4 NA
5 1.8
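To illustrate the earlier point that dplyr::select() keeps the tibble class even when only one column is chosen, the sketch below contrasts it with dplyr::pull(), which extracts the column as a plain vector. The sleep_demo tibble is a hypothetical stand-in built from two rows of the printed output above, so the example runs without mgrtibbles installed.

```r
# Hypothetical stand-in for mammal_sleep_tbl (values copied from the output above)
sleep_demo <- tibble::tibble(
  species  = c("Africanelephant", "Africangiantpouchedrat"),
  dreaming = c(NA, 2)
)

# select() keeps the tibble structure, even with a single column
one_col <- sleep_demo |> dplyr::select(dreaming)
class(one_col)

# pull() returns the underlying vector instead of a tibble
dream_vec <- sleep_demo |> dplyr::pull(dreaming)
class(dream_vec)
```

If a later step needs a bare vector (for example mean()), pull() is usually the more direct choice; select() is for when the result should stay tabular.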
Multiple columns
Select multiple columns individually.
mammal_sleep_tbl |> dplyr::select(species, dreaming, gestation)
# A tibble: 5 × 3
species dreaming gestation
<chr> <dbl> <dbl>
1 Africanelephant NA 645
2 Africangiantpouchedrat 2 42
3 ArcticFox NA 60
4 Arcticgroundsquirrel NA 25
5 Asianelephant 1.8 624
Consecutive range of columns
Select a consecutive range of columns using the first and last column names of the range.
mammal_sleep_tbl |> dplyr::select(dreaming:gestation)
# A tibble: 5 × 4
dreaming total_sleep life_span gestation
<dbl> <dbl> <dbl> <dbl>
1 NA 3.3 38.6 645
2 2 8.3 4.5 42
3 NA 12.5 14 60
4 NA 16.5 NA 25
5 1.8 3.9 69 624
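A range does not have to run left to right. As a minimal sketch, writing the range with the later column first returns the same columns in reverse order. The sleep_demo tibble is a hypothetical stand-in that reuses column names and two rows of values from the output above.

```r
# Hypothetical stand-in with four of the columns shown above
sleep_demo <- tibble::tibble(
  dreaming    = c(NA, 2),
  total_sleep = c(3.3, 8.3),
  life_span   = c(38.6, 4.5),
  gestation   = c(645, 42)
)

# Range written in table order
forward <- sleep_demo |> dplyr::select(dreaming:life_span)
names(forward)

# Same range written backwards: the columns come back reversed
backward <- sleep_demo |> dplyr::select(life_span:dreaming)
names(backward)
```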
Numeric indexes
Numeric indexes can be used for column selection.
Select the first column.
mammal_sleep_tbl |> dplyr::select(1)
# A tibble: 5 × 1
species
<chr>
1 Africanelephant
2 Africangiantpouchedrat
3 ArcticFox
4 Arcticgroundsquirrel
5 Asianelephant
Select columns 3:5.
mammal_sleep_tbl |> dplyr::select(3:5)
# A tibble: 5 × 3
brain_wt non_dreaming dreaming
<dbl> <dbl> <dbl>
1 5.71 NA NA
2 0.0066 6.3 2
3 0.0445 NA NA
4 0.0057 NA NA
5 4.60 2.1 1.8
Select columns 4, 7, and 2.
mammal_sleep_tbl |> dplyr::select(c(4, 7, 2))
# A tibble: 5 × 3
non_dreaming life_span body_wt
<dbl> <dbl> <dbl>
1 NA 38.6 6654
2 6.3 4.5 1
3 NA 14 3.38
4 NA NA 0.92
5 2.1 69 2547
Select all but the 6th column.
mammal_sleep_tbl |> dplyr::select(-6)
# A tibble: 5 × 10
species body_wt brain_wt non_dreaming dreaming life_span gestation predation
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Africane… 6654 5.71 NA NA 38.6 645 3
2 Africang… 1 0.0066 6.3 2 4.5 42 3
3 ArcticFox 3.38 0.0445 NA NA 14 60 1
4 Arcticgr… 0.92 0.0057 NA NA NA 25 5
5 Asianele… 2547 4.60 2.1 1.8 69 624 3
# ℹ 2 more variables: exposure <fct>, danger <fct>
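Exclusion also works with column names, which tends to be more robust than positions if the column order changes. The sketch below assumes a reasonably recent dplyr (1.0.0 or later), where ! negates a selection. sleep_demo is a hypothetical stand-in using values from the output above.

```r
# Hypothetical stand-in with four of the columns shown above
sleep_demo <- tibble::tibble(
  species     = c("Africanelephant", "Africangiantpouchedrat"),
  dreaming    = c(NA, 2),
  total_sleep = c(3.3, 8.3),
  life_span   = c(38.6, 4.5)
)

# Drop the third column by position
by_position <- sleep_demo |> dplyr::select(-3)

# Drop the same column by name
by_name <- sleep_demo |> dplyr::select(!total_sleep)

# Drop several named columns at once
drop_two <- sleep_demo |> dplyr::select(!c(dreaming, total_sleep))
names(drop_two)
```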
Last column
Select the last column with the last_col() helper function.
mammal_sleep_tbl |> dplyr::select(last_col())
# A tibble: 5 × 1
danger
<fct>
1 3
2 3
3 1
4 3
5 4
Select the fourth-to-last column (i.e. offset 3 from the last column).
mammal_sleep_tbl |> dplyr::select(last_col(3))
# A tibble: 5 × 1
gestation
<dbl>
1 645
2 42
3 60
4 25
5 624
Select the last 3 columns.
mammal_sleep_tbl |> dplyr::select(last_col(2):last_col())
# A tibble: 5 × 3
predation exposure danger
<fct> <fct> <fct>
1 3 5 3
2 3 1 3
3 1 1 1
4 5 2 3
5 3 5 4
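The offset passed to last_col() counts back from the final column, so last_col(3) should point at position ncol() - 3. The sketch below checks that relationship on a hypothetical stand-in tibble, sleep_demo, that has the same 11 column names as the data above (values copied from the first two rows of the printed output, rounded as displayed).

```r
# Hypothetical stand-in with the same 11 column names as mammal_sleep_tbl
sleep_demo <- tibble::tibble(
  species      = c("Africanelephant", "Africangiantpouchedrat"),
  body_wt      = c(6654, 1),
  brain_wt     = c(5.71, 0.0066),
  non_dreaming = c(NA, 6.3),
  dreaming     = c(NA, 2),
  total_sleep  = c(3.3, 8.3),
  life_span    = c(38.6, 4.5),
  gestation    = c(645, 42),
  predation    = factor(c(3, 3)),
  exposure     = factor(c(5, 1)),
  danger       = factor(c(3, 3))
)

# Column picked by the helper with an offset of 3
via_helper <- names(sleep_demo |> dplyr::select(dplyr::last_col(3)))

# Same column located by counting back from the number of columns
via_index <- names(sleep_demo)[ncol(sleep_demo) - 3]

identical(via_helper, via_index)
```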
Starts with
Select columns by a prefix of their names with the starts_with() helper function.
mammal_sleep_tbl |> dplyr::select(starts_with("b"))
# A tibble: 5 × 2
body_wt brain_wt
<dbl> <dbl>
1 6654 5.71
2 1 0.0066
3 3.38 0.0445
4 0.92 0.0057
5 2547 4.60
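starts_with() also accepts a character vector of prefixes, in which case the union of the matches is selected. The following is a minimal sketch on a hypothetical stand-in tibble, sleep_demo, using column names and values from the output above.

```r
# Hypothetical stand-in with five of the columns shown above
sleep_demo <- tibble::tibble(
  species  = c("Africanelephant", "Africangiantpouchedrat"),
  body_wt  = c(6654, 1),
  brain_wt = c(5.71, 0.0066),
  dreaming = c(NA, 2),
  danger   = factor(c(3, 3))
)

# Columns whose names start with "b" or with "d"
b_or_d <- sleep_demo |> dplyr::select(dplyr::starts_with(c("b", "d")))
names(b_or_d)
```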
Ends with
Select columns by a suffix of their names with the ends_with() helper function.
mammal_sleep_tbl |> dplyr::select(ends_with("dreaming"))
# A tibble: 5 × 2
non_dreaming dreaming
<dbl> <dbl>
1 NA NA
2 6.3 2
3 NA NA
4 NA NA
5 2.1 1.8
Contains
Select columns whose names contain a given string with the contains() helper function.
mammal_sleep_tbl |> dplyr::select(contains("in"))
# A tibble: 5 × 3
brain_wt non_dreaming dreaming
<dbl> <dbl> <dbl>
1 5.71 NA NA
2 0.0066 6.3 2
3 0.0445 NA NA
4 0.0057 NA NA
5 4.60 2.1 1.8
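One detail worth knowing is that contains() ignores case by default (its ignore.case argument defaults to TRUE). The sketch below, on a hypothetical stand-in tibble named sleep_demo, shows an upper-case pattern still matching, and the same pattern matching nothing once case sensitivity is switched on.

```r
# Hypothetical stand-in with four of the columns shown above
sleep_demo <- tibble::tibble(
  brain_wt     = c(5.71, 0.0066),
  non_dreaming = c(NA, 6.3),
  dreaming     = c(NA, 2),
  gestation    = c(645, 42)
)

# Default behaviour: "IN" matches the lower-case "in" in the names
default_match <- sleep_demo |> dplyr::select(dplyr::contains("IN"))
names(default_match)

# Case-sensitive matching: no column name contains upper-case "IN",
# so the result is a tibble with zero columns rather than an error
strict_match <- sleep_demo |>
  dplyr::select(dplyr::contains("IN", ignore.case = FALSE))
ncol(strict_match)
```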
Matches
The matches() function is similar to the contains() function above, but it selects with regular expressions.
In the example below, [] means any one of the characters inside the brackets. In other words, [eu]r matches “er” or “ur”.
mammal_sleep_tbl |> dplyr::select(matches("[eu]r"))
# A tibble: 5 × 2
exposure danger
<fct> <fct>
1 5 3
2 1 3
3 1 1
4 2 3
5 5 4
In another example, s[lp] matches “sl” or “sp”.
mammal_sleep_tbl |> dplyr::select(matches("s[lp]"))
# A tibble: 5 × 3
species total_sleep life_span
<chr> <dbl> <dbl>
1 Africanelephant 3.3 38.6
2 Africangiantpouchedrat 8.3 4.5
3 ArcticFox 12.5 14
4 Arcticgroundsquirrel 16.5 NA
5 Asianelephant 3.9 69
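Regular expressions can also anchor a pattern to the start (^) or end ($) of a column name, which gives finer control than ends_with(). As a sketch on a hypothetical stand-in tibble, sleep_demo, the anchored pattern below picks dreaming without also picking non_dreaming.

```r
# Hypothetical stand-in with five of the columns shown above
sleep_demo <- tibble::tibble(
  species      = c("Africanelephant", "Africangiantpouchedrat"),
  body_wt      = c(6654, 1),
  brain_wt     = c(5.71, 0.0066),
  non_dreaming = c(NA, 6.3),
  dreaming     = c(NA, 2)
)

# "$" anchors the pattern to the end of the name
weights <- sleep_demo |> dplyr::select(dplyr::matches("wt$"))
names(weights)

# "^" and "$" together require the whole name to match
only_dreaming <- sleep_demo |> dplyr::select(dplyr::matches("^dreaming$"))
names(only_dreaming)
```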
Combine with c()
You can combine many of the methods above with c().
Select the species column and the last three columns.
mammal_sleep_tbl |> dplyr::select(
  c(species, last_col(2):last_col())
)
# A tibble: 5 × 4
species predation exposure danger
<chr> <fct> <fct> <fct>
1 Africanelephant 3 5 3
2 Africangiantpouchedrat 3 1 3
3 ArcticFox 1 1 1
4 Arcticgroundsquirrel 5 2 3
5 Asianelephant 3 5 4
Select the species column, the columns containing dreaming, and the 6th to 7th columns.
mammal_sleep_tbl |> dplyr::select(
  c(species, contains("dreaming"), 6:7)
)
# A tibble: 5 × 5
species non_dreaming dreaming total_sleep life_span
<chr> <dbl> <dbl> <dbl> <dbl>
1 Africanelephant NA NA 3.3 38.6
2 Africangiantpouchedrat 6.3 2 8.3 4.5
3 ArcticFox NA NA 12.5 14
4 Arcticgroundsquirrel NA NA 16.5 NA
5 Asianelephant 2.1 1.8 3.9 69
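Two related behaviours may be useful here. First, the same selections can be passed to dplyr::select() as separate arguments, without wrapping them in c(). Second, a column that is matched more than once appears only once, at its first position, which is what makes the common "move a column to the front" idiom with everything() work. The sketch below uses a hypothetical stand-in tibble, sleep_demo, with the first seven column names from the data above.

```r
# Hypothetical stand-in with the first seven columns of mammal_sleep_tbl
sleep_demo <- tibble::tibble(
  species      = c("Africanelephant", "Africangiantpouchedrat"),
  body_wt      = c(6654, 1),
  brain_wt     = c(5.71, 0.0066),
  non_dreaming = c(NA, 6.3),
  dreaming     = c(NA, 2),
  total_sleep  = c(3.3, 8.3),
  life_span    = c(38.6, 4.5)
)

# Selections combined with c()
with_c <- sleep_demo |>
  dplyr::select(c(species, dplyr::contains("dreaming"), 6:7))

# The same selections passed as separate arguments
without_c <- sleep_demo |>
  dplyr::select(species, dplyr::contains("dreaming"), 6:7)

identical(with_c, without_c)

# dreaming is matched twice (by name and by everything()) but kept once,
# in the first position where it was requested
reordered <- sleep_demo |> dplyr::select(dreaming, dplyr::everything())
names(reordered)
```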