Filter

The dplyr::filter() function can be used to extract rows that meet a certain condition.

This can be a useful method to:

Tidyverse reference page

Dataset

For demonstration we’ll load the knz_bison data from the lterdatasampler package (hyperlink includes install instructions).

#Load package
library("lterdatasampler")
#knz_bison tibble for demonstration
bison_tbl <- lterdatasampler::knz_bison |>
    #Convert to tibble
    tibble::as_tibble()
bison_tbl
# A tibble: 8,325 × 8
   data_code rec_year rec_month rec_day animal_code animal_sex animal_weight
   <chr>        <dbl>     <dbl>   <dbl> <chr>       <chr>              <dbl>
 1 CBH01         1994        11       8 813         F                    890
 2 CBH01         1994        11       8 834         F                   1074
 3 CBH01         1994        11       8 B-301       F                   1060
 4 CBH01         1994        11       8 B-402       F                    989
 5 CBH01         1994        11       8 B-403       F                   1062
 6 CBH01         1994        11       8 B-502       F                    978
 7 CBH01         1994        11       8 B-503       F                   1068
 8 CBH01         1994        11       8 B-504       F                   1024
 9 CBH01         1994        11       8 B-601       F                    978
10 CBH01         1994        11       8 B-602       F                   1188
# ℹ 8,315 more rows
# ℹ 1 more variable: animal_yob <dbl>

Numeric columns

Filter the tibble to only contain rows/observations recorded after the year 2000 (rec_year > 2000), i.e. from 2001 onwards. The year 2000 itself is excluded because > is a strict comparison.

bison_tbl |> dplyr::filter(rec_year > 2000)
# A tibble: 6,939 × 8
   data_code rec_year rec_month rec_day animal_code animal_sex animal_weight
   <chr>        <dbl>     <dbl>   <dbl> <chr>       <chr>              <dbl>
 1 CBH01         2001        11      14 A1          F                    824
 2 CBH01         2001        11      14 A11         F                   1030
 3 CBH01         2001        11      14 A12         F                    984
 4 CBH01         2001        11      14 A13         F                    986
 5 CBH01         2001        11      14 A14         F                    978
 6 CBH01         2001        11      14 A15         F                   1052
 7 CBH01         2001        11      14 A16         F                   1010
 8 CBH01         2001        11      14 A17         F                    992
 9 CBH01         2001        11      14 A18         F                    960
10 CBH01         2001        11      14 A19         F                    960
# ℹ 6,929 more rows
# ℹ 1 more variable: animal_yob <dbl>
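
Whether the boundary value is kept depends on the comparison operator. The sketch below illustrates this on a small made-up tibble (toy_tbl and its values are hypothetical, not part of knz_bison) and assumes the dplyr and tibble packages are installed.

```r
# Hypothetical toy data mimicking two knz_bison columns (values are made up)
toy_tbl <- tibble::tibble(
    rec_year = c(1999, 2000, 2001, 2002),
    animal_weight = c(950, 1200, 880, 1010)
)
# Strictly greater than: the year 2000 itself is excluded
after_2000 <- toy_tbl |> dplyr::filter(rec_year > 2000)
# Greater than or equal to: the year 2000 is included
from_2000 <- toy_tbl |> dplyr::filter(rec_year >= 2000)
# dplyr::between() is inclusive at both ends
in_range <- toy_tbl |> dplyr::filter(dplyr::between(rec_year, 2000, 2001))
```

Here after_2000 keeps the 2001 and 2002 rows, from_2000 also keeps the 2000 row, and in_range keeps the 2000 and 2001 rows.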

String columns

Filter the tibble to only contain male observations.

bison_tbl |> dplyr::filter(animal_sex == "M")
# A tibble: 3,040 × 8
   data_code rec_year rec_month rec_day animal_code animal_sex animal_weight
   <chr>        <dbl>     <dbl>   <dbl> <chr>       <chr>              <dbl>
 1 CBH01         1994        11       8 910         M                    982
 2 CBH01         1994        11       8 91C         M                   1020
 3 CBH01         1994        11       8 91F         M                   1050
 4 CBH01         1994        11       8 91H         M                   1037
 5 CBH01         1994        11       8 91J         M                   1104
 6 CBH01         1994        11       8 91K         M                   1306
 7 CBH01         1994        11       8 91L         M                   1122
 8 CBH01         1994        11       8 91M         M                   1136
 9 CBH01         1994        11       8 91P         M                    996
10 CBH01         1994        11       8 91W         M                    954
# ℹ 3,030 more rows
# ℹ 1 more variable: animal_yob <dbl>
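
Other comparisons work on string columns too. As a minimal sketch on made-up data (toy_tbl is hypothetical), != excludes a value and %in% matches any value in a vector. String matching is case sensitive.

```r
# Hypothetical toy data mimicking two knz_bison columns (values are made up)
toy_tbl <- tibble::tibble(
    animal_code = c("813", "B-301", "910", "91C"),
    animal_sex = c("F", "F", "M", "M")
)
# != keeps rows that do not match the value
not_male <- toy_tbl |> dplyr::filter(animal_sex != "M")
# %in% keeps rows matching any value in the vector
selected <- toy_tbl |> dplyr::filter(animal_code %in% c("813", "91C"))
# Matching is case sensitive, so lowercase "m" matches nothing here
lower_m <- toy_tbl |> dplyr::filter(animal_sex == "m")
```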

Multiple columns

You can filter based on multiple columns with logical operators.

The two main logical operators in R are:

  • &: and (ampersand symbol)
  • |: or (pipe symbol)

Extract females that have a weight greater than or equal to 1000 pounds.

bison_tbl |> dplyr::filter(animal_sex == "F" & animal_weight >= 1000)
# A tibble: 1,215 × 8
   data_code rec_year rec_month rec_day animal_code animal_sex animal_weight
   <chr>        <dbl>     <dbl>   <dbl> <chr>       <chr>              <dbl>
 1 CBH01         1994        11       8 834         F                   1074
 2 CBH01         1994        11       8 B-301       F                   1060
 3 CBH01         1994        11       8 B-403       F                   1062
 4 CBH01         1994        11       8 B-503       F                   1068
 5 CBH01         1994        11       8 B-504       F                   1024
 6 CBH01         1994        11       8 B-602       F                   1188
 7 CBH01         1994        11       8 B-701       F                   1030
 8 CBH01         1994        11       8 B-704       F                   1030
 9 CBH01         1994        11       8 B-706       F                   1108
10 CBH01         1994        11       8 884         F                   1046
# ℹ 1,205 more rows
# ℹ 1 more variable: animal_yob <dbl>

Extract observations that are male or that have a weight of less than 900 pounds (or both).

bison_tbl |> dplyr::filter(animal_sex == "M" | animal_weight < 900)
# A tibble: 5,982 × 8
   data_code rec_year rec_month rec_day animal_code animal_sex animal_weight
   <chr>        <dbl>     <dbl>   <dbl> <chr>       <chr>              <dbl>
 1 CBH01         1994        11       8 813         F                    890
 2 CBH01         1994        11       8 B-905       F                    828
 3 CBH01         1994        11       8 B-909       F                    812
 4 CBH01         1994        11       8 W-008       F                    884
 5 CBH01         1994        11       8 910         M                    982
 6 CBH01         1994        11       8 91C         M                   1020
 7 CBH01         1994        11       8 91F         M                   1050
 8 CBH01         1994        11       8 91H         M                   1037
 9 CBH01         1994        11       8 91J         M                   1104
10 CBH01         1994        11       8 91K         M                   1306
# ℹ 5,972 more rows
# ℹ 1 more variable: animal_yob <dbl>
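
Two further details may be useful when combining conditions. Conditions separated by commas inside dplyr::filter() are combined with &, and in R & binds more tightly than |, so parentheses make the intended grouping explicit. The sketch below uses a made-up tibble (toy_tbl is hypothetical).

```r
# Hypothetical toy data mimicking three knz_bison columns (values are made up)
toy_tbl <- tibble::tibble(
    rec_year = c(1999, 2000, 2001, 2002),
    animal_sex = c("F", "M", "F", "M"),
    animal_weight = c(950, 1200, 880, 1010)
)
# Conditions separated by commas are combined with &
with_comma <- toy_tbl |> dplyr::filter(animal_sex == "F", animal_weight >= 900)
with_amp <- toy_tbl |> dplyr::filter(animal_sex == "F" & animal_weight >= 900)
# Parentheses make the grouping explicit when mixing & and |
grouped <- toy_tbl |>
    dplyr::filter((animal_sex == "M" | animal_weight < 900) & rec_year > 2000)
```

In this sketch with_comma and with_amp contain the same single row, and grouped keeps the 2001 and 2002 rows.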

Pipes

You can use pipes for finer control of multiple comparisons.

The command below carries out the following filtering steps in order:

  1. Extract females.
  2. Extract observations with animal weight between 900 and 1000 (exclusive).
  3. Extract samples from the year 2000.

bison_tbl |>
    #Filter to retain females
    dplyr::filter(animal_sex == "F") |>
    #Filter to retain samples with animal_weight between 900 and 1000 (exclusive)
    dplyr::filter(animal_weight > 900 & animal_weight < 1000) |>
    #Retain samples from year 2000
    dplyr::filter(rec_year == 2000)
# A tibble: 39 × 8
   data_code rec_year rec_month rec_day animal_code animal_sex animal_weight
   <chr>        <dbl>     <dbl>   <dbl> <chr>       <chr>              <dbl>
 1 CBH01         2000        11      16 A11         F                    976
 2 CBH01         2000        11      16 A14         F                    930
 3 CBH01         2000        11      16 A16         F                    952
 4 CBH01         2000        11      16 A17         F                    964
 5 CBH01         2000        11      16 A18         F                    904
 6 CBH01         2000        11      16 A2          F                    910
 7 CBH01         2000        11      16 A22         F                    992
 8 CBH01         2000        11      16 A27         F                    970
 9 CBH01         2000        11      16 A28         F                    902
10 CBH01         2000        11      16 A6          F                    992
# ℹ 29 more rows
# ℹ 1 more variable: animal_yob <dbl>
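
Each filter step in a pipe works on the result of the previous one, so a chain of filters should return the same rows as a single filter with the conditions joined by &. The sketch below checks this on a made-up tibble (toy_tbl is hypothetical). Chaining is mainly a readability choice, since each step can carry its own comment.

```r
# Hypothetical toy data mimicking three knz_bison columns (values are made up)
toy_tbl <- tibble::tibble(
    animal_code = c("A1", "A2", "A3", "A4"),
    animal_sex = c("F", "M", "F", "F"),
    animal_weight = c(950, 960, 880, 1010)
)
# Chained filters: each step narrows the previous result
chained <- toy_tbl |>
    dplyr::filter(animal_sex == "F") |>
    dplyr::filter(animal_weight > 900 & animal_weight < 1000)
# Single filter with all conditions joined by &
single <- toy_tbl |>
    dplyr::filter(animal_sex == "F" & animal_weight > 900 & animal_weight < 1000)
```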

NA removal

For removal of NAs please see the tidyr drop NA page.
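
Note that dplyr::filter() itself drops rows where the condition evaluates to NA. The sketch below shows this on a made-up tibble (toy_tbl is hypothetical), along with one way to keep such rows using is.na().

```r
# Hypothetical toy data with one missing weight (values are made up)
toy_tbl <- tibble::tibble(
    animal_code = c("A1", "A2", "A3"),
    animal_weight = c(950, NA, 1010)
)
# The comparison is NA for the missing weight, so that row is dropped
heavy <- toy_tbl |> dplyr::filter(animal_weight > 900)
# Keep rows with a missing weight explicitly via is.na()
heavy_or_na <- toy_tbl |>
    dplyr::filter(animal_weight > 900 | is.na(animal_weight))
```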