6 min read

Prepend or Pad Character to String


odometer
How would the above number be imported? (You need to know!)

View raw source for this post

Summary

With the ubiquitous “FIPS” codes, there is no consistency in how they are formatted. It seems like on every project, the merge is dependent upon the format matching in the datasets and then I’ve usually got a mess on my hands. A missing “0” is often the problem.

Table of Contents

Overview

As an illustration, FIPS codes will often look like this where the lengths of the strings are different. Some datasets will have the zero in front, while others will not. Sometimes, the problem isn’t how the data was formatted, but how the analyst imports it.

df <- tibble(fips = c("1000", "01000"))
df
# A tibble: 2 × 1
  fips 
  <chr>
1 1000 
2 01000

This wreaks havoc on your project when you go to merge.

Strategy 1 - Set column type

One strategy to dealing with this problem is to read in the data as character so that the formatting is clear. Otherwise, the function will just drop the leading zeros and convert the column to an integer. For example, in the data.table and utils package, the code might look like:

utils::read.csv(file = file, colClasses = "character")
#or
data.table::fread(input = input, colClasses = "character")

Strategy 2 - the readr package

I tried out the readr package for this problem and it correctly converted the example to a character string and retained the leading zero.

readr::read_csv(file = file)

Strategy 3 - the “Hacky Way”

I’m embarrased to admit that this was some code I routinely used in projects. And I cringe looking at it. The line of code was interrupting a readable magrittr pipe. (Check out the new pipe |>). There is a better, more elegant solution.

df$fips[which(nchar(df$fips)==4)] <- paste0("0", df$fips[which(nchar(df$fips)==4)])
df
# A tibble: 2 × 1
  fips 
  <chr>
1 01000
2 01000

Strategy 4 - the “Elegant Way”

library(stringr)
df %>% 
  mutate(fips = str_pad(fips, width = 5, side = "left", pad = "0"))
# A tibble: 2 × 1
  fips 
  <chr>
1 01000
2 01000

Conclusion

The best strategy is to avoid this problem in the first place. Import the data correctly. Importing all data as character first allows you to see the data as it is without getting transformed by an import package. The next best strategy is to use str_pad in the stringr package. It makes for readable code and describes what occurred. Check the Stackoverflow post, “How to add leading zeros?”, for additional ideas.

Sometimes the hardest part of coding is knowing what the best search term is. I wished I’d used the term “pad” and I could’ve avoided years of hacky code. Ugh.

References

[1]
R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2021 [Online]. Available: https://www.R-project.org/
[2]
Y. Xie, C. Dervieux, and A. Presmanes Hill, Blogdown: Create blogs and websites with r markdown. 2021 [Online]. Available: https://CRAN.R-project.org/package=blogdown
[3]
H. Wickham and J. Hester, Readr: Read rectangular text data. 2021 [Online]. Available: https://CRAN.R-project.org/package=readr
[4]
H. Wickham, Stringr: Simple, consistent wrappers for common string operations. 2019 [Online]. Available: https://CRAN.R-project.org/package=stringr
[5]
H. Wickham, Tidyverse: Easily install and load the tidyverse. 2021 [Online]. Available: https://CRAN.R-project.org/package=tidyverse

Disclaimer

The views, analysis and conclusions presented within this paper represent the author’s alone and not of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article was correct, it will nonetheless contain errors, inaccuracies and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       macOS Catalina 10.15.7      
 system   x86_64, darwin17.0          
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Chicago             
 date     2021-08-29                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
 backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
 blogdown    * 1.4     2021-07-23 [1] CRAN (R 4.1.0)
 bookdown      0.23    2021-08-13 [1] CRAN (R 4.1.0)
 broom         0.7.9   2021-07-27 [1] CRAN (R 4.1.0)
 bslib         0.2.5.1 2021-05-18 [1] CRAN (R 4.1.0)
 cachem        1.0.5   2021-05-15 [1] CRAN (R 4.1.0)
 callr         3.7.0   2021-04-20 [1] CRAN (R 4.1.0)
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.1.0)
 cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
 codetools     0.2-18  2020-11-04 [1] CRAN (R 4.1.0)
 colorspace    2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
 crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
 DBI           1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.1.0)
 desc          1.3.0   2021-03-05 [1] CRAN (R 4.1.0)
 devtools    * 2.4.2   2021-06-07 [1] CRAN (R 4.1.0)
 digest        0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
 dplyr       * 1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
 evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
 fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
 forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.1.0)
 fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
 generics      0.1.0   2020-10-31 [1] CRAN (R 4.1.0)
 ggplot2     * 3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
 ggthemes    * 4.2.4   2021-01-20 [1] CRAN (R 4.1.0)
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
 gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
 haven         2.4.3   2021-08-04 [1] CRAN (R 4.1.0)
 hms           1.1.0   2021-05-17 [1] CRAN (R 4.1.0)
 htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
 httr          1.4.2   2020-07-20 [1] CRAN (R 4.1.0)
 jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.1.0)
 jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.1.0)
 knitr         1.33    2021-04-24 [1] CRAN (R 4.1.0)
 lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.1.0)
 lubridate     1.7.10  2021-02-26 [1] CRAN (R 4.1.0)
 magrittr    * 2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
 memoise       2.0.0   2021-01-26 [1] CRAN (R 4.1.0)
 modelr        0.1.8   2020-05-19 [1] CRAN (R 4.1.0)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
 pillar        1.6.2   2021-07-29 [1] CRAN (R 4.1.0)
 pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
 pkgload       1.2.1   2021-04-06 [1] CRAN (R 4.1.0)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.1.0)
 processx      3.5.2   2021-04-30 [1] CRAN (R 4.1.0)
 prompt      * 1.0.1   2021-03-12 [1] CRAN (R 4.1.0)
 ps            1.6.0   2021-02-28 [1] CRAN (R 4.1.0)
 purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
 R6            2.5.0   2020-10-28 [1] CRAN (R 4.1.0)
 Rcpp          1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
 readr       * 2.0.1   2021-08-10 [1] CRAN (R 4.1.0)
 readxl        1.3.1   2019-03-13 [1] CRAN (R 4.1.0)
 remotes       2.4.0   2021-06-02 [1] CRAN (R 4.1.0)
 reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
 rlang         0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
 rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.1.0)
 roxygen2    * 7.1.1   2020-06-27 [1] CRAN (R 4.1.0)
 rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.1.0)
 rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
 rvest         1.0.1   2021-07-26 [1] CRAN (R 4.1.0)
 sass          0.4.0   2021-05-12 [1] CRAN (R 4.1.0)
 scales        1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
 stringi       1.7.3   2021-07-16 [1] CRAN (R 4.1.0)
 stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
 testthat      3.0.4   2021-07-01 [1] CRAN (R 4.1.0)
 tibble      * 3.1.3   2021-07-23 [1] CRAN (R 4.1.0)
 tidyr       * 1.1.3   2021-03-03 [1] CRAN (R 4.1.0)
 tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
 tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.1.0)
 tzdb          0.1.2   2021-07-20 [1] CRAN (R 4.1.0)
 usethis     * 2.0.1   2021-02-10 [1] CRAN (R 4.1.0)
 utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
 vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
 withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
 xfun          0.25    2021-08-06 [1] CRAN (R 4.1.0)
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.1.0)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)

[1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library