List into Tibble-Part-03


lists into tibbles
Lesson on how to convert a list to a tibble.

Summary

Function return values and data often arrive in list form. Some, like me, prefer to convert them to a table or data frame before performing computations. Lists whose elements have equal lengths are easy to convert to a table, but lists whose elements have unequal lengths can be problematic. The tibble package allows lists to be stored in a column of a data frame.

Overview

This post discusses inserting a list into a data frame or converting a list to a data frame. While this can be done with the traditional data frame from base R, we will use the tibble package to accomplish the task.

Converting a list to a tibble requires some knowledge of three packages: tibble, tidyr, and purrr. Here, the workflow is introduced through the most common approaches and some simple exercises.

More rigorous examples of lists to tibbles are reserved for Rectangling - Part 05 and Use Cases - Part 06. The concept of a “list-column” figures prominently in the discussion. It is a powerful method for storing one or more lists within a single tibble.

According to the tibble package details, “[t]ibble is the central data structure for the set of packages known as the tidyverse.” Tidyverse packages include dplyr, ggplot2, and tidyr. The objective is to have a data frame where a list can be stored in a column, and tibble provides this functionality. (“tibble” and “data frame” are often used interchangeably.)

tibbles

A tibble is a new take on data frames: it is still a data frame, but it changes some default behaviors of R’s older data frame type. A tibble differs from the traditional data frame in the following ways:

  1. it does not change an input’s type;
  2. it makes list-columns easier to use;
  3. variable names are not changed;
  4. it evaluates its arguments lazily and sequentially;
  5. it never uses row.names(); and
  6. it only recycles vectors of length 1
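
A few of the differences listed above can be checked directly. The following is a minimal sketch, assuming only that the tibble package is installed; the column name `my var` and the object names are made up for illustration.

```r
library(tibble)

# Base data frame: a non-syntactic name is altered by default
df <- data.frame(`my var` = c("a", "b"))
names(df)

# tibble keeps the name exactly as given
tb <- tibble(`my var` = c("a", "b"))
names(tb)

# Arguments are evaluated sequentially, so y can refer to x
tb2 <- tibble(x = 1:3, y = x * 2)

# A length-1 vector is recycled to the number of rows ...
recycled <- tibble(x = 1:4, y = 0)

# ... but any other length mismatch is an error (base data.frame() would recycle 1:2)
bad <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) "error")
bad
```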

“R for Data Science” devotes an entire chapter, Chapter 10, to tibbles.

Create via as_tibble()

tibble::as_tibble(iris[1:5, ])
# A tibble: 5 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa 
3          4.7         3.2          1.3         0.2 setosa 
4          4.6         3.1          1.5         0.2 setosa 
5          5           3.6          1.4         0.2 setosa 

Create via tribble()

When entering data by hand, it may be easier to create the table using tribble(), short for “transposed tibble.”

tibble::tribble(
    ~x, ~y, ~z,
    #--|--|--
    "a", 1, 1.1,
    "b", 2, 2.1
)
# A tibble: 2 × 3
  x         y     z
  <chr> <dbl> <dbl>
1 a         1   1.1
2 b         2   2.1
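
Since this post is mostly about list-columns, it may help to see that tribble() can produce one as well. This is a small sketch with made-up column names (`name`, `values`): when a cell holds something other than a single value, the whole column is kept as a list.

```r
library(tibble)

# The second row of `values` holds five numbers, so `values` becomes a list-column
tb <- tribble(
    ~name, ~values,
    "a",   1,
    "b",   1:5
)
tb

# Each cell can be pulled out with [[ ]]
tb$values[[2]]
```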

Key Point

Lists can be described as having elements of equal lengths or of varying lengths. In a list of the form list(a = 1, b = 1), each named element holds a vector of length 1, so the elements are of equal lengths.

list(a = 1, b = 1) |> map_dbl(length)
a b 
1 1 

However, lists are commonly of varying lengths like list(a = 1, b = c(1, 2)). Such a list would be described as having unequal or varying lengths.

list(a = 1, b = c(1, 2)) |> map_dbl(length)
a b 
1 2 
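
The same check can be done in base R with lengths(), without purrr. The sketch below wraps it in a small helper; `has_equal_lengths()` is a hypothetical name introduced here for illustration, not a function from any package.

```r
# Two small lists, as in the text above
equal   <- list(a = 1, b = 1)
unequal <- list(a = 1, b = c(1, 2))

# lengths() returns the length of every element as a named integer vector
lengths(equal)
lengths(unequal)

# Hypothetical helper: TRUE when every element has the same length
has_equal_lengths <- function(x) length(unique(lengths(x))) <= 1

has_equal_lengths(equal)
has_equal_lengths(unequal)
```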

Where each element of a list has as many values as the data frame has rows, the conversion is easy to visualize: each element can be assigned to its own column.

ll <- list(a = c(1, 2), b = c(1, 2))
dt <- tibble(
    names = c("a", "b"),
    a = ll[["a"]],
    b = ll[["b"]]
)
dt
# A tibble: 2 × 3
  names     a     b
  <chr> <dbl> <dbl>
1 a         1     1
2 b         2     2

However, when lists are of unequal lengths, the solution is not as apparent.

ll <- list(a = 1, b = 1:5)
dt <- tibble(
    names = c("a", "b"),
    lists = ll
)
dt
# A tibble: 2 × 2
  names lists       
  <chr> <named list>
1 a     <dbl [1]>   
2 b     <int [5]>   

Rather than adding the values of the list, tibble() stored the list itself in a column, which is its default behavior. This is extremely convenient, but it takes some getting used to. Lists of unequal lengths are the most likely, though not the only, candidates for storage in a tibble list-column.
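
Once the list is stored this way, its values are still reachable. A minimal sketch, rebuilding the same `dt` as above; the added column name `n` is an arbitrary choice for illustration.

```r
library(tibble)

ll <- list(a = 1, b = 1:5)
dt <- tibble(names = c("a", "b"), lists = ll)

# Each cell of the list-column holds a whole vector; [[ ]] pulls one out
second <- dt$lists[[2]]
second

# lengths() reports how many values each cell holds
dt$n <- unname(lengths(dt$lists))
dt
```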

List-columns

Each column of a tibble is usually an atomic vector: all columns share the same length, and each column holds a single data type. However, a tibble can also have a column that contains a list, known as a “list-column”. The cells of a list-column can hold any type of object, which makes list-columns extremely useful within a tibble.

A list-column can be created in several ways. List-columns are addressed in the context of models in R for Data Science–Chapter 25–Many Models and also in an evolving text, Functional Programming.

With tibble()

tibble() is used to construct a data frame.

tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
# A tibble: 3 × 2
      x y         
  <int> <list>    
1     1 <int [5]> 
2     2 <int [10]>
3     3 <int [20]>
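
To get from a list-column back to ordinary rows, one option is tidyr::unnest_longer(). The sketch below assumes tidyr is installed and reuses the same tibble; it gives every value in `y` its own row and repeats `x` as needed.

```r
library(tibble)
library(tidyr)

tb <- tibble(x = 1:3, y = list(1:5, 1:10, 1:20))

# One row per value stored in the list-column: 5 + 10 + 20 rows in total
long <- unnest_longer(tb, y)
nrow(long)
head(long)
```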

With enframe()

enframe() converts named atomic vectors or lists to one- or two-column data frames. Its opposite is deframe().

Example 1

list(1:3) |> enframe(name = NULL)
# A tibble: 1 × 1
  value    
  <list>   
1 <int [3]>

Example 2

list(a = 1, b = 2, c = c(3, 4)) |> enframe(name = "name")
# A tibble: 3 × 2
  name  value    
  <chr> <list>   
1 a     <dbl [1]>
2 b     <dbl [1]>
3 c     <dbl [2]>
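
Because deframe() is described as the opposite of enframe(), a round trip should give back a named list. This is a small sketch of that check, using the same list as Example 2; the object names are arbitrary.

```r
library(tibble)

ll <- list(a = 1, b = 2, c = c(3, 4))

# List -> two-column tibble (name, value)
tb <- enframe(ll)

# Two-column tibble -> named list again
back <- deframe(tb)
names(back)
back$c
```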

With tidyr::nest()

The function nest() creates a list-column of data frames; its opposite is unnest(). In the example below, each column of the input tibble has length 6. (It’s not clear to me why you’d want to nest two columns in this example, but it is nonetheless possible.)

x_data <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) |>
    nest(data = c(y, z))
x_data
# A tibble: 3 × 2
      x data            
  <dbl> <list>          
1     1 <tibble [3 × 2]>
2     2 <tibble [2 × 2]>
3     3 <tibble [1 × 2]>

Unnest list-column

x_data |> unnest(data)
# A tibble: 6 × 3
      x     y     z
  <dbl> <int> <int>
1     1     1     6
2     1     2     5
3     1     3     4
4     2     4     3
5     2     5     2
6     3     6     1
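
One possible reason to nest is to run a function once per group. The following is only a sketch of that idea, assuming purrr is available; the new column names `n_rows` and `y_sum` are invented for illustration.

```r
library(tibble)
library(tidyr)
library(purrr)

x_data <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) |>
    nest(data = c(y, z))

# Each cell of `data` is a small tibble, so a function can be mapped over the groups
x_data$n_rows <- map_int(x_data$data, nrow)
x_data$y_sum  <- map_dbl(x_data$data, ~ sum(.x$y))
x_data
```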

Lists of equal lengths

There are a variety of ways to convert lists of equal lengths to a data frame. Here are three potential solutions. There are many others suggested in the StackOverflow question: Convert a list to a data frame.

# create list
ll <- list(a = c(1, 2), b = c(3, 4), c = c("x", "y"))

Solution 1 - tibble

# as tibble
tibble(
    a = ll[["a"]],
    b = ll[["b"]],
    c = ll$c
)
# A tibble: 2 × 3
      a     b c    
  <dbl> <dbl> <chr>
1     1     3 x    
2     2     4 y    

Solution 2 - dplyr

# as tibble
dplyr::bind_rows(ll)
# A tibble: 2 × 3
      a     b c    
  <dbl> <dbl> <chr>
1     1     3 x    
2     2     4 y    

Solution 3 - purrr

ll |>
    map_dfc(~ as_tibble(.x)) |>
    setNames(c("a", "b", "c"))
# A tibble: 2 × 3
      a     b c    
  <dbl> <dbl> <chr>
1     1     3 x    
2     2     4 y    
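
For a named list whose elements all have the same length, tibble::as_tibble() may be the shortest route of all. A minimal sketch, rebuilding the same `ll` so that it runs on its own:

```r
library(tibble)

ll <- list(a = c(1, 2), b = c(3, 4), c = c("x", "y"))

# Each named element becomes a column of the same name
out <- as_tibble(ll)
out
```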

Prelude to Rectangling

Before leaping into the dense vignette on “rectangling”, I thought we’d ease into the topic with some short, easy-to-inspect data.

JSON List

This JSON snippet represents the contact information for a single person, “Joe Jackson.” Note that the “address” and “phoneNumbers” fields are nested.

jsonList <- list(rjson::fromJSON('
{
   "firstName": "Joe",
   "lastName": "Jackson",
   "gender": "male",
   "age": 28,
   "address": {
       "streetAddress": "101",
       "city": "San Diego",
       "state": "CA"
   },
   "phoneNumbers": [
       { "type": "home", "number": "7349282382" }
   ]
}
'))

First, assign the JSON list to a tibble. The resulting tibble has a single list-column named “contact.”

contacts <- tibble(contact = jsonList)
# tibble with single list column
contacts
# A tibble: 1 × 1
  contact         
  <list>          
1 <named list [6]>

Finally, unnest the different columns. This was accomplished only after several attempts.

contacts |>
    unnest_wider(contact) |>
    unnest_wider(address) |>
    unnest_wider(phoneNumbers) |>
    unnest_wider(`...1`)
# A tibble: 1 × 9
  firstName lastName gender   age streetAddress city      state type  number    
  <chr>     <chr>    <chr>  <dbl> <chr>         <chr>     <chr> <chr> <chr>     
1 Joe       Jackson  male      28 101           San Diego CA    home  7349282382
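
In the chain above, the column named `...1` appears because “phoneNumbers” is an array whose single entry has no name, so unnest_wider() has to generate one. When only a few fields are needed, tidyr::hoist() can reach into the nested list by path instead. The sketch below is an alternative, not the approach used above; to stay self-contained it builds the list by hand (the object `joe` is assumed to mirror what the JSON parser returns) rather than calling rjson.

```r
library(tibble)
library(tidyr)

# Hand-built stand-in for the parsed JSON (an assumption for this sketch)
joe <- list(
    firstName = "Joe", lastName = "Jackson", gender = "male", age = 28,
    address = list(streetAddress = "101", city = "San Diego", state = "CA"),
    phoneNumbers = list(list(type = "home", number = "7349282382"))
)
contacts <- tibble(contact = list(joe))

# Pull selected fields out by name or by path; 1L selects the first phone entry
picked <- hoist(
    contacts, contact,
    firstName = "firstName",
    city      = list("address", "city"),
    number    = list("phoneNumbers", 1L, "number")
)
picked
```

Whatever is not hoisted stays behind in the `contact` list-column, so further fields can be extracted later.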

XML list

This is the same Joe Jackson data as before but in XML.

library(xml2)
xmlList <- as_list(read_xml(
    '<?xml version="1.0" encoding="UTF-8" ?>
        <root>
        <firstName>Joe</firstName>
        <lastName>Jackson</lastName>
        <gender>male</gender>
        <age>28</age>
        <address>
        <streetAddress>101</streetAddress>
        <city>San Diego</city>
        <state>CA</state>
        </address>
        <phoneNumbers>
        <type>home</type>
        <number>7349282382</number>
        </phoneNumbers>
        </root>
        '
) |> xml_ns_strip())
# tibble with single nested column
contacts <- tibble(contact = xmlList)
contacts
# A tibble: 1 × 1
  contact         
  <named list>    
1 <named list [6]>
df <- contacts |>
    unnest_wider(contact) |>
    unnest_wider(address) |>
    unnest_wider(phoneNumbers) |>
    mutate(across(everything(), unlist))
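
The final unlist step is there because xml2::as_list() wraps the text of every node in its own list, so each unnested cell is still a one-element list rather than a plain value. A minimal sketch with a one-field document (reduced from the record above) shows the extra level:

```r
library(xml2)

# A one-field document, reduced from the contact record above
small <- as_list(read_xml("<root><firstName>Joe</firstName></root>"))

# The text "Joe" sits inside a list, one level below the field name
str(small)

first_name <- small$root$firstName[[1]]
first_name
```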

Conclusion

In this post, we learned about the tibble package and how to store a list in, or convert a list to, a column of a tibble. The elements of a list can be of equal or unequal lengths. Where they are of unequal lengths, a simple strategy is to include the list in a tibble as a list-column. We also looked at how JSON and XML data are imported as lists and how to convert such a list to a tibble using the tidyr function unnest_wider().

Disclaimer

The views, analysis and conclusions presented within this paper are the author’s alone and not those of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article is correct, it may nonetheless contain errors, inaccuracies and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.3 (2022-03-10)
 os       macOS Big Sur/Monterey 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2022-04-22
 pandoc   2.14.1 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 blogdown    * 1.9        2022-03-28 [1] CRAN (R 4.1.2)
 bookdown      0.25       2022-03-16 [1] CRAN (R 4.1.2)
 brio          1.1.3      2021-11-30 [1] CRAN (R 4.1.0)
 bslib         0.3.1.9000 2022-03-04 [1] Github (rstudio/bslib@888fbe0)
 cachem        1.0.6      2021-08-19 [1] CRAN (R 4.1.0)
 callr         3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 cli           3.2.0      2022-02-14 [1] CRAN (R 4.1.2)
 colorspace    2.0-3      2022-02-21 [1] CRAN (R 4.1.2)
 crayon        1.5.1      2022-03-26 [1] CRAN (R 4.1.0)
 DBI           1.1.2      2021-12-20 [1] CRAN (R 4.1.0)
 desc          1.4.1      2022-03-06 [1] CRAN (R 4.1.2)
 devtools    * 2.4.3      2021-11-30 [1] CRAN (R 4.1.0)
 digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.0)
 dplyr         1.0.8      2022-02-08 [1] CRAN (R 4.1.2)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.2)
 fansi         1.0.3      2022-03-24 [1] CRAN (R 4.1.2)
 fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 formatR       1.12       2022-03-31 [1] CRAN (R 4.1.2)
 fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.0)
 generics      0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
 ggplot2       3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
 ggthemes    * 4.2.4      2021-01-20 [1] CRAN (R 4.1.0)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
 gtable        0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
 htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
 jquerylib     0.1.4      2021-04-26 [1] CRAN (R 4.1.0)
 jsonlite      1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
 knitr         1.38       2022-03-25 [1] CRAN (R 4.1.0)
 lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.1.2)
 memoise       2.0.1      2021-11-26 [1] CRAN (R 4.1.0)
 munsell       0.5.0.9000 2021-10-19 [1] Github (cwickham/munsell@e539541)
 pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
 pkgbuild      1.3.1      2021-12-20 [1] CRAN (R 4.1.0)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload       1.2.4      2021-11-30 [1] CRAN (R 4.1.0)
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx      3.5.3      2022-03-25 [1] CRAN (R 4.1.0)
 ps            1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr       * 0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
 remotes       2.4.2      2021-11-30 [1] CRAN (R 4.1.0)
 rjson         0.2.21     2022-01-09 [1] CRAN (R 4.1.2)
 rlang         1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
 rmarkdown     2.13       2022-03-10 [1] CRAN (R 4.1.2)
 rprojroot     2.0.3      2022-04-02 [1] CRAN (R 4.1.0)
 rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
 sass          0.4.1      2022-03-23 [1] CRAN (R 4.1.2)
 scales        1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.0)
 stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.0)
 stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
 testthat      3.1.3      2022-03-29 [1] CRAN (R 4.1.2)
 tibble      * 3.1.6      2021-11-07 [1] CRAN (R 4.1.0)
 tidyr       * 1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
 tidyselect    1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
 usethis     * 2.1.5      2021-12-09 [1] CRAN (R 4.1.0)
 utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs         0.4.0      2022-03-30 [1] CRAN (R 4.1.2)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.0)
 xfun          0.30       2022-03-02 [1] CRAN (R 4.1.2)
 xml2        * 1.3.3      2021-11-30 [1] CRAN (R 4.1.0)
 yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)

 [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────