When "Nothing" Becomes a Big Problem in R

Not long ago, I was scraping websites for multiple pieces of information. I was “fishing,” in that any given piece of information might or might not exist on a page. The results were then combined into a data frame. However, I kept getting an error message when the code ran. Because there was a great deal of data and the websites were diverse, it was unclear what about the data frame was producing the error message below:

 "Error in data.frame():  arguments imply differing number of rows: 0, 1"

Once the problem was reduced to its simplest form, the data frame looked like this:

a <- character()
b <- 1
df <- data.frame(a = a, b = b)
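
To see the failure without halting a script, the reduced example can be wrapped in base R's tryCatch. This is only a sketch for illustration; the variable msg is a name introduced here, not part of the original scraping code.

```r
# Reproduce the reduced example and capture the error message
# instead of stopping the script.
a <- character()  # zero-length character vector
b <- 1            # length-one numeric vector

msg <- tryCatch(
  {
    data.frame(a = a, b = b)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
# [1] "arguments imply differing number of rows: 0, 1"
```

The columns have lengths 0 and 1, and data.frame() cannot recycle a zero-length vector to match a length-one vector, hence the "0, 1" in the message.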

The question becomes what character() actually returns. From “An Introduction to R,” the text reads:

Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. (The only apparent exception to this rule is the special “value” listed as NA for quantities not available, but in fact there are several types of NA). Note that a vector can be empty and still have a mode. For example the empty character string vector is listed as character(0) and the empty numeric vector as numeric(0).
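
A few base R calls make the distinction in that passage concrete. The sketch below compares an empty character vector with a missing value and with an empty string; the variable names are chosen here for illustration only.

```r
# An empty vector has a mode but no elements.
empty   <- character()
missing <- NA_character_  # one element whose value is unknown
blank   <- ""             # one element that is a zero-character string

mode(empty)       # "character"
length(empty)     # 0
length(missing)   # 1
length(blank)     # 1
nchar(blank)      # 0

# character() and character(0) are the same object.
identical(empty, character(0))  # TRUE
```

Only the first of the three has length zero, which is why it alone cannot fill a row in a data frame.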

The challenge was that I needed the empty character vector to act as a placeholder so that I could recombine the data. Ideally, NA would be assigned, because it holds the place and communicates that no value was present. The solution was not to catch the error itself but to detect the empty vector before building the data frame and replace it with NA. Once the empty vector was replaced, the data frame could be built.

#The solution: detect the empty vector and replace it with NA
if (purrr::is_empty(a)) { a <- NA }

#Thus the equivalent data frame was
a <- NA
b <- 1
df <- data.frame(a = a, b = b)
df
#    a b
# 1 NA 1
#no error: yay!

The is_empty function is provided by the purrr package on CRAN. It took a couple of hours to find the solution, and it was an example of when “nothing” can become a big problem.
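
The same check can be written without purrr, since an empty vector is simply one whose length is zero. The sketch below is one possible base R version, not the code used in the original project: na_if_empty is a hypothetical helper name, and title and price are stand-ins for scraped fields. It fills with NA_character_ rather than plain NA on the assumption that the column should stay character.

```r
# Hypothetical helper: swap a zero-length vector for a typed missing value.
na_if_empty <- function(x, fill = NA_character_) {
  if (length(x) == 0) fill else x
}

# Stand-ins for scraped fields; `title` came back empty.
title <- character()
price <- "19.99"

df <- data.frame(
  title = na_if_empty(title),
  price = na_if_empty(price),
  stringsAsFactors = FALSE
)

str(df)
```

Wrapping every scraped field this way means each one contributes exactly one value per row, so the pieces can be combined regardless of which fields were found.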

R, technology
Robert Wiederstein