Replace NA

It is common to have NAs in your data. Sometimes, instead of removing them, you may want to replace them. Below are three methods to carry this out.

Tidyverse reference page

Dataset

For demonstration we’ll load the mammal_sleep_tbl data from the mgrtibbles package (hyperlink includes install instructions). Additionally, we’ll subset it to:

  • Keep the first ten rows that contain at least one NA.
  • Keep the columns species to life_span.
#Load package
library("mgrtibbles")
#mammal_sleep_tbl tibble for demonstration
#Subset to only keep rows with at least one NA
mammal_sleep_na_tbl <- mgrtibbles::mammal_sleep_tbl[!complete.cases(mammal_sleep_tbl),] |>
    #Slice to keep first ten rows and select columns species to life_span
    dplyr::slice(1:10) |> dplyr::select(species:life_span)
#View tibble
mammal_sleep_na_tbl
# A tibble: 10 × 7
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant  6654      5.71           NA       NA           3.3      38.6
 2 ArcticFox           3.38   0.0445         NA       NA          12.5      14  
 3 Arcticgroundsqu…    0.92   0.0057         NA       NA          16.5      NA  
 4 Deserthedgehog      0.55   0.0024          7.6      2.7        10.3      NA  
 5 Donkey            187.     0.419          NA       NA           3.1      40  
 6 Genet               1.41   0.0175          4.8      1.3         6.1      34  
 7 Giantarmadillo     60      0.081          12        6.1        18.1       7  
 8 Giraffe           529      0.68           NA        0.3        NA        28  
 9 Gorilla           207      0.406          NA       NA          12        39.3
10 Graywolf           36.3    0.120          NA       NA          13        16.2

Replace with a value

Replace NAs in specified columns with tidyr::replace_na().

When used with a tibble/data.frame the function needs to be provided with a list. The list contains the column names and the replacement value.

One column

Replaces NAs in the dreaming column with 0.

mammal_sleep_na_tbl |>
    tidyr::replace_na(list(dreaming=0))
# A tibble: 10 × 7
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant  6654      5.71           NA        0           3.3      38.6
 2 ArcticFox           3.38   0.0445         NA        0          12.5      14  
 3 Arcticgroundsqu…    0.92   0.0057         NA        0          16.5      NA  
 4 Deserthedgehog      0.55   0.0024          7.6      2.7        10.3      NA  
 5 Donkey            187.     0.419          NA        0           3.1      40  
 6 Genet               1.41   0.0175          4.8      1.3         6.1      34  
 7 Giantarmadillo     60      0.081          12        6.1        18.1       7  
 8 Giraffe           529      0.68           NA        0.3        NA        28  
 9 Gorilla           207      0.406          NA        0          12        39.3
10 Graywolf           36.3    0.120          NA        0          13        16.2

Multiple columns

Replaces:

  • NAs in the dreaming column with 0.
  • NAs in the life_span column with 1.
mammal_sleep_na_tbl |>
    tidyr::replace_na(list(dreaming=0, life_span=1))
# A tibble: 10 × 7
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant  6654      5.71           NA        0           3.3      38.6
 2 ArcticFox           3.38   0.0445         NA        0          12.5      14  
 3 Arcticgroundsqu…    0.92   0.0057         NA        0          16.5       1  
 4 Deserthedgehog      0.55   0.0024          7.6      2.7        10.3       1  
 5 Donkey            187.     0.419          NA        0           3.1      40  
 6 Genet               1.41   0.0175          4.8      1.3         6.1      34  
 7 Giantarmadillo     60      0.081          12        6.1        18.1       7  
 8 Giraffe           529      0.68           NA        0.3        NA        28  
 9 Gorilla           207      0.406          NA        0          12        39.3
10 Graywolf           36.3    0.120          NA        0          13        16.2
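The replacement list can also mix value types, as long as each value matches the type of its column. The following is a minimal sketch on a small, hypothetical data frame (toy_df and its contents are made up for illustration and are not part of mgrtibbles):

```r
# Hypothetical toy data, not taken from the mgrtibbles package
toy_df <- data.frame(
    species = c("Genet", NA, "Giraffe"),
    dreaming = c(1.3, NA, 0.3)
)
# Replace NAs in a character column and a numeric column in one call
toy_filled <- toy_df |>
    tidyr::replace_na(list(species = "Unknown", dreaming = 0))
#View result
toy_filled
```

Columns that are not named in the list are left untouched.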

Vector

The replace_na() function can also be used with vectors. In this case the function only needs the value to replace NAs with.

mammal_sleep_na_tbl |>
    #Pull out non_dreaming column as a vector
    dplyr::pull(non_dreaming) |>
    #Replace NAs with 0
    tidyr::replace_na(0)
 [1]  0.0  0.0  0.0  7.6  0.0  4.8 12.0  0.0  0.0  0.0
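Because replace_na() accepts a vector, it can also be called inside dplyr::mutate() to fill a single column without pulling it out of the table. A minimal sketch, assuming a hypothetical toy data frame (toy_df is illustrative only):

```r
# Hypothetical toy data with NAs in one numeric column
toy_df <- data.frame(non_dreaming = c(NA, 7.6, NA, 4.8))
# Overwrite the column with its NA-filled version
toy_filled <- toy_df |>
    dplyr::mutate(non_dreaming = tidyr::replace_na(non_dreaming, 0))
#View the filled column
toy_filled$non_dreaming
```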

Replace with other column

NAs can be replaced with the corresponding value in another column. This is carried out with the following functions:

  • dplyr::mutate: Mutate columns to create new columns based on existing ones, modify existing columns, and delete columns.
  • dplyr::coalesce: Finds the first non-missing value at each position across the vectors it is given.
mammal_sleep_na_tbl |>
    dplyr::mutate(dreaming = dplyr::coalesce(dreaming, total_sleep))
# A tibble: 10 × 7
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant  6654      5.71           NA        3.3         3.3      38.6
 2 ArcticFox           3.38   0.0445         NA       12.5        12.5      14  
 3 Arcticgroundsqu…    0.92   0.0057         NA       16.5        16.5      NA  
 4 Deserthedgehog      0.55   0.0024          7.6      2.7        10.3      NA  
 5 Donkey            187.     0.419          NA        3.1         3.1      40  
 6 Genet               1.41   0.0175          4.8      1.3         6.1      34  
 7 Giantarmadillo     60      0.081          12        6.1        18.1       7  
 8 Giraffe           529      0.68           NA        0.3        NA        28  
 9 Gorilla           207      0.406          NA       12          12        39.3
10 Graywolf           36.3    0.120          NA       13          13        16.2
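Note that row 8 keeps its original dreaming value, and that an NA would remain wherever both columns are missing. dplyr::coalesce() accepts more than two inputs, so a final fallback value can be supplied for such cases. A minimal sketch with hypothetical vectors (first_choice and second_choice are illustrative names):

```r
# Hypothetical vectors; the third position is NA in both
first_choice <- c(NA, 2.7, NA)
second_choice <- c(3.3, NA, NA)
# Take first_choice where present, then second_choice, then fall back to 0
filled <- dplyr::coalesce(first_choice, second_choice, 0)
filled
```

The length-one fallback is recycled, so it fills any position that is still missing after the earlier vectors have been checked.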

Replace all NAs

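All NAs can be replaced with a single value by indexing with base R’s is.na() function. Note that this assignment overwrites mammal_sleep_na_tbl rather than returning a modified copy.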

mammal_sleep_na_tbl[is.na(mammal_sleep_na_tbl)] <- 0
mammal_sleep_na_tbl
# A tibble: 10 × 7
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant  6654      5.71            0        0           3.3      38.6
 2 ArcticFox           3.38   0.0445          0        0          12.5      14  
 3 Arcticgroundsqu…    0.92   0.0057          0        0          16.5       0  
 4 Deserthedgehog      0.55   0.0024          7.6      2.7        10.3       0  
 5 Donkey            187.     0.419           0        0           3.1      40  
 6 Genet               1.41   0.0175          4.8      1.3         6.1      34  
 7 Giantarmadillo     60      0.081          12        6.1        18.1       7  
 8 Giraffe           529      0.68            0        0.3         0        28  
 9 Gorilla           207      0.406           0        0          12        39.3
10 Graywolf           36.3    0.120           0        0          13        16.2
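The assignment above touches every column, including character columns. If only numeric columns should be filled and the original object should be kept, one possible base R approach is to work on a copy. This is a sketch on a hypothetical toy data frame (toy_df, toy_filled and num_cols are illustrative names):

```r
# Hypothetical toy data with NAs in character and numeric columns
toy_df <- data.frame(
    species = c("Genet", NA),
    dreaming = c(NA, 0.3),
    life_span = c(34, NA)
)
# Work on a copy so the original is left unchanged
toy_filled <- toy_df
# Logical vector marking which columns are numeric
num_cols <- vapply(toy_filled, is.numeric, logical(1))
# Replace NAs with 0 in the numeric columns only
toy_filled[num_cols] <- lapply(toy_filled[num_cols], function(x) replace(x, is.na(x), 0))
#View result
toy_filled
```

Here the NA in species is preserved, while the NAs in dreaming and life_span become 0.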