Summary

This post predicts appliance energy usage.

Overview
Background
- Building Energy Usage
- Appliance Energy Use
Current Data Set
Data and model
- Plot
- Split
- Build Models
- Visualize
- Accuracy
- Refit
Results
Conclusion
Acknowledgements
References
Disclaimer
Reproducibility

Overview

The purpose of this post is to mimic the analysis of Candanedo et al. [2], who predict appliance energy usage, with a variety of statistical models built with the tidymodels and modeltime packages in R. The post will (1) provide an overview of energy prediction, (2) construct statistical models, and (3) evaluate their effectiveness.

Background

Building Energy Usage

Energy usage has been explored extensively concerning buildings. “The buildings and building construction sectors consumed 36% of the global final energy and nearly 40% of total CO2 emission in 2018”. [1] Building energy usage may be reduced by improving efficiency. Less energy usage would lower buildings’ environmental and economic burden. [1] Effective prediction techniques could quantify savings and improve conservation. [1] Building designs could be chosen based on the expected energy usage, and future costs could be discounted to present value over the building’s expected useful life. [1]

In a 2020 meta-analysis, Sun and coauthors performed an extensive review of building energy prediction, compiling 105 papers. [1] The effort detailed “the entire data-driven process that includes feature engineering, potential data-driven models and expected outputs.” [1]

Sun grouped energy prediction approaches into three boxes: white, grey, and black. “White box” models predict thermal behavior with numerical equations. [1] “Grey box” or hybrid methods “combine physical models and data-driven approaches to simulate building energy.” [1] “Black box” models use machine learning methods to discover statistical patterns in the data set. [1]

Frequently, data sets contain (1) meteorological/outside, (2) indoor, (3) occupancy, (4) time, (5) building characteristic, (6) socioeconomic, and (7) historical features. [1] Among the available features, meteorological information, historical data, and the time index are the three most important factors for building energy prediction. [1] Popular statistical models are artificial neural networks (“ANN”), support vector regression (“SVR”), and linear regression (“LR”), while less popular models are time series analysis and regression trees (“RT”). [1]

Appliance Energy Use

Appliances represent a significant portion (between 20 and 30%) of the electrical energy demand. [2] Electricity consumption in domestic buildings is explained by two main factors: the type and number of electrical appliances and the use of the appliances by the occupants. [2] Buildings and, specifically, residential buildings are candidates for reducing energy use, as “approximately 74% of this electricity use is from all buildings, and 38% is from residential buildings.” [3]

“Refrigeration and freezer loads tended to be very flat, while cooking, dishwasher, lights and small appliances showed distinct evening peaks.” [4] “Refrigerators have a uniform load profile, while clothes washers, cloth dryers, and dishwashers are very user-dependent and thus vary from household to household and time of day.”[3]

The variability in energy usage is a challenge for power operators. Electric grids are vulnerable to failure at peak load times. A Texas study attributed 75% of the peak demand to buildings, with 50% to residential and 25% to business. [3] Economizing appliance use during peak times has received more attention than HVAC systems because appliances do not affect “the comfort of the indoor environment.” [3]

Current Data Set

The data set originally served as the basis for an article entitled “Data driven prediction models of energy use of appliances in a low-energy house.” [2] Since the publication of the article, the data set has been available at the UCI Machine Learning Repository and can be found here. According to the website,

“the data set is time series measured at 10-minute intervals for 4.5 months. The house temperature and humidity conditions were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions around 3.3 min. Then, the wireless data was averaged for 10 minutes periods. The energy data was logged every 10 minutes with m-bus energy meters. Weather from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru), and merged together with the experimental data sets using the date and time column. Two random variables have been included in the data set for testing the regression models and to filter out non-predictive attributes (parameters). GitHub Repo.”

The data set covers a single residential house. For this post it was simplified to reduce the file size by keeping only the measurements taken at the start of each hour, so the resulting data set contains 1/6 of the rows of the original. An exploratory analysis of the data set was previously conducted here.
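
The hourly filtering described above can be sketched in base R. This is a minimal illustration on synthetic data, not the code used for the post: the `raw` data frame, its values, and its starting timestamp are made up, and only the column names `date_time` and `appliances` follow the modelling code later in the post.

```r
# Synthetic stand-in for the raw 10-minute data (all values are made up)
raw <- data.frame(
  date_time = seq(
    from = as.POSIXct("2016-01-11 17:00:00", tz = "UTC"),
    by = "10 min",
    length.out = 36  # six hours of 10-minute readings
  )
)
raw$appliances <- seq_len(nrow(raw)) * 10

# Keep only the readings whose timestamp falls exactly on the hour
is_top_of_hour <- format(raw$date_time, "%M:%S") == "00:00"
hourly <- raw[is_top_of_hour, ]

nrow(raw)     # 36 rows before filtering
nrow(hourly)  # 6 rows after filtering, i.e. 1/6 of the original
```

With lubridate loaded, the same filter could be written as `dplyr::filter(raw, minute(date_time) == 0)`.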

The 2017 paper used four statistical models which “were trained with repeated cross-validation and evaluated on a testing set: (a) multiple linear regression, (b) support vector machine with radial kernel, (c) random forest and (d) gradient boosting machines (GBM). The best model (GBM) was able to explain 97% of the variance (R2) in the training set and with 57% in the testing set when using all the predictors.” [2]

Data and model

Plot

Split

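# Split data 90/10, preserving time order
library(tidymodels)  # attaches rsample, parsnip, dplyr and friends
# The split itself is deterministic; the seed is kept for later stochastic steps
set.seed(1)
# initial_time_split() comes from the rsample package: the first 90% of rows
# become the training set, so df is assumed to be sorted by date_time
df_splits <- initial_time_split(df, prop = 0.9)
df_train <- training(df_splits)
df_test <- testing(df_splits)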

Build Models

library(tidymodels)  # linear_reg(), set_engine(), fit(), %>%
library(modeltime)   # arima_reg(), arima_boost(), exp_smoothing(), prophet_reg()
library(lubridate)   # hour()

# Model 1: auto_arima with t1 as an external regressor ----
model_fit_arima_no_boost <- arima_reg() %>%
    set_engine(engine = "auto_arima") %>%
    fit(appliances ~ date_time + t1, data = df_train)
# Model 2: arima_boost (ARIMA with XGBoost fitted to the errors) ----
model_fit_arima_boosted <- arima_boost(
    min_n = 2,
    learn_rate = 0.015
) %>%
    set_engine(engine = "auto_arima_xgboost") %>%
    fit(appliances ~ date_time + as.numeric(date_time) + factor(hour(date_time), ordered = FALSE),
        data = df_train)
# Model 3: ets ----
# Exponential smoothing is a univariate method, so only the date index and
# the outcome should matter here even though the formula lists every column
model_fit_ets <- exp_smoothing() %>%
    set_engine(engine = "ets") %>%
    fit(appliances ~ ., data = df_train)
# Model 4: prophet ----
model_fit_prophet <- prophet_reg() %>%
    set_engine(engine = "prophet") %>%
    fit(appliances ~ date_time, data = df_train)
# Model 5: lm (linear trend on the time index) ----
model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(appliances ~ date_time, data = df_train)

# Add fitted models to model table
models_tbl <- modeltime_table(
    model_fit_arima_no_boost,
    model_fit_arima_boosted,
    model_fit_ets,
    model_fit_prophet,
    model_fit_lm
)
# calibrate
calibration_tbl <- models_tbl %>%
    modeltime_calibrate(new_data = testing(df_splits))

Visualize
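
A calibrated modeltime table can be turned into a forecast plot with `modeltime_forecast()` and `plot_modeltime_forecast()`. The block below is a minimal, self-contained sketch rather than the plot produced for this post: the `toy` series is synthetic, the object names `toy`, `splits`, `fit_lm`, `calibration_toy`, and `forecast_tbl` are hypothetical, and a single linear model stands in for the five models fitted above.

```r
library(tidymodels)
library(modeltime)

# Synthetic hourly series (made-up values) standing in for the post's data
set.seed(1)
n <- 24 * 30
toy <- tibble::tibble(
  date_time  = seq(as.POSIXct("2016-01-11 17:00:00", tz = "UTC"),
                   by = "hour", length.out = n),
  appliances = 100 + 30 * sin(2 * pi * seq_len(n) / 24) + rnorm(n, sd = 10)
)

# Same 90/10 time-ordered split as in the post
splits <- initial_time_split(toy, prop = 0.9)

# One linear model on the time index keeps the sketch fast
fit_lm <- linear_reg() %>%
  set_engine("lm") %>%
  fit(appliances ~ as.numeric(date_time), data = training(splits))

calibration_toy <- modeltime_table(fit_lm) %>%
  modeltime_calibrate(new_data = testing(splits))

# Forecast over the test window and keep the actual values for comparison
forecast_tbl <- calibration_toy %>%
  modeltime_forecast(new_data = testing(splits), actual_data = toy)

# Static ggplot; omit `.interactive = FALSE` for the default plotly version
plot_modeltime_forecast(forecast_tbl, .interactive = FALSE)
```

Applied to the post's own objects, the same two calls would take `calibration_tbl`, `testing(df_splits)`, and `df` in place of the toy objects.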

Accuracy

Accuracy Table
.model_id	.model_desc	.type	mae	mape	mase	smape	rmse	rsq
1	REGRESSION WITH ARIMA(2,0,1)(2,0,0)[24] ERRORS	Test	51.90	64.83	1.11	48.86	74.25	0.09
2	ARIMA(2,0,0)(2,0,0)[24] WITH NON-ZERO MEAN W/ XGBOOST ERRORS	Test	46.32	53.67	0.99	43.68	72.73	0.11
3	ETS(M,AD,M)	Test	44.09	42.61	0.94	45.80	74.90	0.12
4	PROPHET	Test	39.95	40.75	0.85	36.45	67.24	0.21
5	LM	Test	45.92	50.35	0.98	43.07	75.41	0.01
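
The six metric columns in the table can be reproduced by hand. The sketch below uses the common textbook definitions in base R; the function name `accuracy_metrics` and the example vectors are hypothetical, and the assumption that modeltime's default metrics follow these conventions (in particular, MASE scaled by a one-step naive forecast on the test values and R² computed as a squared correlation) should be checked against the yardstick documentation.

```r
# Base-R versions of the six accuracy columns (a sketch, not library source)
accuracy_metrics <- function(truth, estimate) {
  err <- truth - estimate
  c(
    mae   = mean(abs(err)),                       # mean absolute error
    mape  = mean(abs(err / truth)) * 100,         # mean absolute percentage error
    # MAE scaled by the MAE of a one-step naive forecast on `truth` itself
    mase  = mean(abs(err)) / mean(abs(diff(truth))),
    # symmetric MAPE: error relative to the average of |truth| and |estimate|
    smape = mean(abs(err) / ((abs(truth) + abs(estimate)) / 2)) * 100,
    rmse  = sqrt(mean(err^2)),                    # root mean squared error
    rsq   = cor(truth, estimate)^2                # squared correlation
  )
}

# Made-up observations and predictions
truth    <- c(100, 200, 300, 400)
estimate <- c(110, 190, 330, 360)
metrics  <- accuracy_metrics(truth, estimate)
round(metrics, 2)
```

Under these definitions, a MASE below 1 means the model's absolute error is smaller than that of the naive benchmark. In the table above, Prophet has the lowest value in every error column and the highest rsq, while the plain ARIMA regression is the only model with a MASE above 1.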

Refit
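
In the usual modeltime workflow, the calibrated models are refit on the full data set before forecasting beyond it. The block below is a minimal sketch of that step on synthetic data, not the refit performed for this post: the `toy` series and the names `fit_lm`, `calibration_toy`, `refit_tbl`, and `future_tbl` are hypothetical, and the 48-hour horizon is an arbitrary choice.

```r
library(tidymodels)
library(modeltime)

# Synthetic hourly series (made-up values) standing in for the post's data
set.seed(1)
n <- 24 * 30
toy <- tibble::tibble(
  date_time  = seq(as.POSIXct("2016-01-11 17:00:00", tz = "UTC"),
                   by = "hour", length.out = n),
  appliances = 100 + 30 * sin(2 * pi * seq_len(n) / 24) + rnorm(n, sd = 10)
)
splits <- initial_time_split(toy, prop = 0.9)

# Fit on the training window and calibrate on the test window
fit_lm <- linear_reg() %>%
  set_engine("lm") %>%
  fit(appliances ~ as.numeric(date_time), data = training(splits))

calibration_toy <- modeltime_table(fit_lm) %>%
  modeltime_calibrate(new_data = testing(splits))

# Refit the same model specification on all rows (train + test)
refit_tbl <- calibration_toy %>%
  modeltime_refit(data = toy)

# Forecast past the end of the observed data
future_tbl <- refit_tbl %>%
  modeltime_forecast(h = "48 hours", actual_data = toy)

plot_modeltime_forecast(future_tbl, .interactive = FALSE)
```

The `h` shortcut only works when every predictor can be derived from the timestamp. A model with an external regressor, such as Model 1 with `t1`, would instead need future regressor values supplied through `new_data`.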

Results

Conclusion

Acknowledgements

This blog post was made possible thanks to:

References

[1] Y. Sun, F. Haghighat, and B. C. M. Fung, “A review of the-state-of-the-art in data-driven approaches for building energy prediction,” Energy and Buildings, vol. 221, p. 110022, Aug. 2020, doi: 10.1016/j.enbuild.2020.110022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0378778819339313. [Accessed: 18-Jan-2023]

[2] L. M. Candanedo, V. Feldheim, and D. Deramaix, “Data driven prediction models of energy use of appliances in a low-energy house,” Energy and Buildings, vol. 140, pp. 81–97, Apr. 2017, doi: 10.1016/j.enbuild.2017.01.083. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0378778816308970. [Accessed: 13-Jan-2023]

[3] K. S. Cetin, “Characterizing large residential appliance peak load reduction potential utilizing a probabilistic approach,” Science and Technology for the Built Environment, vol. 22, no. 6, pp. 720–732, Aug. 2016, doi: 10.1080/23744731.2016.1195660. [Online]. Available: https://doi.org/10.1080/23744731.2016.1195660. [Accessed: 18-Jan-2023]

[4] R. G. Pratt, C. C. Conner, B. A. Cooke, and E. E. Richman, “Metered end-use consumption and load shapes from the ELCAP residential sample of existing homes in the Pacific Northwest,” Energy and Buildings, vol. 19, no. 3, pp. 179–193, Jan. 1993, doi: 10.1016/0378-7788(93)90026-Q. [Online]. Available: https://www.sciencedirect.com/science/article/pii/037877889390026Q. [Accessed: 18-Jan-2023]

[5] R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2022 [Online]. Available: https://www.R-project.org/

[6] Y. Xie, C. Dervieux, and A. Presmanes Hill, Blogdown: Create blogs and websites with R Markdown. 2022 [Online]. Available: https://CRAN.R-project.org/package=blogdown

[7] V. Spinu, G. Grolemund, and H. Wickham, Lubridate: Make dealing with dates a little easier. 2022 [Online]. Available: https://CRAN.R-project.org/package=lubridate

[8] M. Dancho, Modeltime: The tidymodels extension for time series modeling. 2022 [Online]. Available: https://CRAN.R-project.org/package=modeltime

[9] M. Kuhn and H. Wickham, Tidymodels: Easily install and load the tidymodels packages. 2022 [Online]. Available: https://CRAN.R-project.org/package=tidymodels

[10] H. Wickham, Tidyverse: Easily install and load the tidyverse. 2022 [Online]. Available: https://CRAN.R-project.org/package=tidyverse

[11] M. Dancho and D. Vaughan, Timetk: A tool kit for working with time series in R. 2022 [Online]. Available: https://CRAN.R-project.org/package=timetk

Disclaimer

The views, analysis and conclusions presented within this paper are the author’s alone and not those of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article is correct, it may nonetheless contain errors, inaccuracies and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31)
 os       macOS Big Sur ... 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2023-01-18
 pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
 blogdown    * 1.16    2022-12-13 [1] CRAN (R 4.2.0)
 bookdown      0.31    2022-12-13 [1] CRAN (R 4.2.0)
 bslib         0.4.2   2022-12-16 [1] CRAN (R 4.2.0)
 cachem        1.0.6   2021-08-19 [1] CRAN (R 4.2.0)
 callr         3.7.3   2022-11-02 [1] CRAN (R 4.2.0)
 cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.0)
 codetools     0.2-18  2020-11-04 [1] CRAN (R 4.2.2)
 colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
 crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.0)
 DBI           1.1.3   2022-06-18 [1] CRAN (R 4.2.0)
 devtools    * 2.4.5   2022-10-11 [1] CRAN (R 4.2.0)
 digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
 dplyr         1.0.10  2022-09-01 [1] CRAN (R 4.2.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
 evaluate      0.18    2022-11-07 [1] CRAN (R 4.2.0)
 fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
 farver        2.1.1   2022-07-06 [1] CRAN (R 4.2.0)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
 fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
 ggplot2     * 3.4.0   2022-11-04 [1] CRAN (R 4.2.0)
 ggthemes    * 4.2.4   2021-01-20 [1] CRAN (R 4.2.0)
 glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
 gtable        0.3.1   2022-09-01 [1] CRAN (R 4.2.0)
 highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
 htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.0)
 htmlwidgets   1.6.0   2022-12-15 [1] CRAN (R 4.2.0)
 httpuv        1.6.8   2023-01-12 [1] CRAN (R 4.2.2)
 jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
 jsonlite      1.8.4   2022-12-06 [1] CRAN (R 4.2.0)
 knitr         1.41    2022-11-18 [1] CRAN (R 4.2.0)
 labeling      0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
 later         1.3.0   2021-08-18 [1] CRAN (R 4.2.0)
 lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.2)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
 pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.0)
 pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.2.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
 pkgload       1.3.2   2022-11-16 [1] CRAN (R 4.2.0)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.2.0)
 processx      3.8.0   2022-10-26 [1] CRAN (R 4.2.0)
 profvis       0.3.7   2020-11-02 [1] CRAN (R 4.2.0)
 promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.2.0)
 ps            1.7.2   2022-10-26 [1] CRAN (R 4.2.0)
 purrr         0.3.5   2022-10-06 [1] CRAN (R 4.2.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
 Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.2.0)
 remotes       2.4.2   2021-11-30 [1] CRAN (R 4.2.0)
 rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
 rmarkdown     2.18    2022-11-09 [1] CRAN (R 4.2.0)
 rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.0)
 sass          0.4.4   2022-11-24 [1] CRAN (R 4.2.0)
 scales        1.2.1   2022-08-20 [1] CRAN (R 4.2.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
 shiny         1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
 stringi       1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
 stringr       1.5.0   2022-12-02 [1] CRAN (R 4.2.0)
 tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.0)
 tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.0)
 usethis     * 2.1.6   2022-05-25 [1] CRAN (R 4.2.0)
 utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
 vctrs         0.5.1   2022-11-16 [1] CRAN (R 4.2.0)
 withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
 xfun          0.35    2022-11-16 [1] CRAN (R 4.2.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.0)
 yaml          2.3.6   2022-10-18 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Predicting Building Energy Usage