Global Energy Interconnection


      Volume 8, Issue 3, Jun 2025, Pages 486-496

      SP-RF-ARIMA: A sparse random forest and ARIMA hybrid model for electric load forecasting

      Kamran Hassanpouri Baesmat (a), Farhad Shokoohi (b, *), Zeinab Farrokhi (a)
      (a) Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, 4505 S. Maryland Pkwy., Las Vegas, NV 89154, USA
      (b) Department of Mathematical Sciences, University of Nevada, Las Vegas, 4505 S. Maryland Pkwy., Las Vegas, NV 89154, USA

      Keywords

      Abstract

      Accurate Electric Load Forecasting (ELF) is crucial for optimizing production capacity, improving operational efficiency, and managing energy resources effectively. Moreover, precise ELF contributes to a smaller environmental footprint by reducing the risks of disruption, downtime, and waste. However, as energy consumption patterns grow more complex with renewable energy integration and changing consumer behavior, no single approach has emerged as universally effective. In response, this research presents a hybrid modeling framework that combines the strengths of Random Forest (RF) and Autoregressive Integrated Moving Average (ARIMA) models, enhanced with an advanced feature selection method, Minimum Redundancy Maximum Relevancy and Maximum Synergy (MRMRMS), to produce a sparse model. Additionally, residual patterns are analyzed to enhance forecast accuracy. High-resolution weather data from Weather Underground and historical energy consumption data from PJM for Duke Energy Ohio and Kentucky (DEO&K) are used in this application. This methodology, termed SP-RF-ARIMA, is evaluated against existing approaches; it reduces both the mean absolute error and the root mean square error by more than 40% relative to the second-best method.

      0 Introduction

      This study highlights the significant benefits of integrating machine learning with Minimum Redundancy Maximum Relevancy and Maximum Synergy (MRMRMS) in enhancing the accuracy of electrical load predictions, providing a robust foundation for optimizing energy production strategies.

      Electric Load Forecasting (ELF) is critical to the management and operation of power systems. Accurate forecasts are essential for utility companies to ensure the reliability, efficiency, and cost-effectiveness of the electricity supply. These forecasts support operational decision-making processes such as unit commitment, load switching, maintenance scheduling, and infrastructure development. As energy consumption patterns become more complex owing to the growth of renewable energy sources and varying consumer behaviors, accurate ELF becomes significantly more challenging [1-3].

      Traditionally, ELF has relied on statistical methods that use historical load data to predict future consumption. These methods range from simple Moving Average (MA) models to more sophisticated time series models, such as Autoregressive Integrated Moving Average (ARIMA) and exponential smoothing. These techniques are favored for their interpretability and their ability to incorporate seasonal and cyclic patterns in load data. However, Machine Learning (ML) methods [4] have surpassed traditional parametric statistical models in ELF owing to their robust handling of the non-linear relationships and complex data interactions inherent in electricity consumption patterns. ML models significantly improve accuracy, adaptability, and efficiency, especially when managing large datasets and integrating diverse data types. Moreover, they excel in real-time analysis, allowing continuous updates as new data become available. Additionally, these models handle missing data and outliers more effectively, which is essential for maintaining reliable predictive analytics [5-7].

      Although ML provides enhanced predictive capabilities and operational efficiencies, it requires more computational resources and expert knowledge, and it lacks the interpretability of traditional statistical models [8,9]. With the advent of big data and advances in computational power, ML methods have emerged as powerful alternatives. They can model more complex patterns and relationships in the data, resulting in more accurate forecasts [10,11]. Section 1 reviews several ML techniques. However, despite the potential of ML methods, the transition from traditional parametric statistical models is not always straightforward [12].

      Feature selection techniques play a crucial role in improving the forecasting performance of ML models. By focusing on the most relevant variables, feature selection reduces noise and prevents the model from learning patterns based on irrelevant fluctuations in the data. This simplification improves prediction accuracy, accelerates model training, and reduces the risk of overfitting, helping the model perform well on unseen data [13,14]. Additionally, a Sparse (SP) model is easier to interpret, making it more transparent for decision-makers. It also improves computational and storage efficiency, which is vital for the large datasets typical in load forecasting [15]. Furthermore, models streamlined through feature selection adapt better to changes in data patterns, maintaining their effectiveness over time.

      Each ELF method has its advantages and limitations. Statistical methods are well understood and trusted in the industry, and they often require fewer computational resources than ML approaches. Although ML methods can offer greater predictive power, they require substantial amounts of data and considerable tuning to achieve optimal performance. Moreover, they are prone to overfitting and lack interpretability, making them less transparent for decision-makers [16,17].

      Given these differences, a direct comparison of traditional statistical methods and ML approaches in ELF is essential for clarifying their respective strengths and weaknesses, enabling utilities to choose the most appropriate method for their specific conditions and needs.

      First, a detailed comparison is made between the traditional statistical and ML methods used in ELF, focusing on short-term load forecasting, which is critical for the daily operational strategies of power utilities. The techniques are evaluated in terms of methodology, accuracy, computational efficiency, and ease of use. The comparison is based on a systematic analysis of real-world datasets that simulate typical forecasting scenarios faced by utilities.

      Second, an efficient method that integrates a statistical approach with an ML technique for ELF is proposed, leveraging the strengths of both methodologies to optimize load forecasting. This hybrid approach combines the robustness of statistical methods in capturing underlying data trends with the adaptability of ML techniques in handling complex, non-linear interactions. By merging these approaches, the model benefits from the interpretability of statistical analysis and the dynamic learning capabilities of ML, resulting in a more accurate and reliable forecasting tool.

      The remainder of the paper is structured as follows. Section 1 reviews the literature on statistical and ML forecasting techniques. Section 2 describes the proposed method. Section 3 presents a real-world case study, including data preparation, model selection, evaluation criteria, and detailed data analysis. Section 4 discusses the applicability of the proposed model, and Section 5 presents concluding remarks and outlines areas for future research.

      1 Literature review

      Advances in forecasting methodologies have significantly improved the accuracy and reliability of predictions, which are essential for reducing costs and enhancing power system stability [18,19]. In [20], the authors comprehensively reviewed classical models, highlighting their utility in capturing linear relationships in historical load data. However, as grid operations become more complex and the variability of electrical loads increases, particularly with the integration of renewable energy sources, these traditional models struggle to capture nonlinear patterns and stochastic load behavior effectively [11].

      Recently, in response to these limitations, research has shifted toward more sophisticated data-driven approaches such as ML techniques. Support Vector Machines (SVMs) and Neural Networks (NNs) have gained popularity owing to their ability to model nonlinearities and handle large datasets with numerous input variables [21-23]. In [24], researchers demonstrated the superiority of SVM in short-term load forecasting, particularly in handling the nonlinear dynamics and seasonality of load data. Similarly, NNs, particularly Long Short-Term Memory (LSTM) networks, are highly effective for short-term and medium-term load forecasting owing to their capacity to learn long-term dependencies in sequential data [25-27].

      Applying Deep Learning (DL) techniques to ELF represents a significant advance in the field. These models, which use complex structures with multiple hidden layers, are particularly effective at extracting patterns and features from vast amounts of data without requiring manual feature engineering [28]. The ability of DL models to integrate and learn from diverse datasets, including weather conditions, economic indicators, and consumer behavior patterns, has significantly improved forecasting accuracy.

      Hybrid models that combine traditional statistical methods with modern ML techniques have also emerged as a compelling way to enhance forecasting accuracy and robustness. These models leverage the strengths of each method, such as the interpretability of statistical models and the predictive power of ML. For instance, [29] proposed a hybrid model integrating ARIMA and an SVM, harnessing ARIMA’s capability to model linear relationships and SVM’s proficiency in capturing hidden nonlinear patterns. This combination improved the stability and accuracy of predictions compared with using either model independently.

      With the advent of smart grids and real-time data acquisition, research has also shifted toward real-time load forecasting. Using big data analytics and real-time processing capabilities, utilities can now predict load changes minute by minute. Several studies, including [30] and [31], have explored the potential of real-time predictive analytics to dynamically manage grid operations and respond to sudden changes in load, thus enhancing the reliability of power supply systems.

      1.1 Parametric statistical models

      Parametric statistical models have long been the backbone of load forecasting in the energy sector. These methods rely primarily on time series analysis. The underlying assumption is that past patterns of electricity demand will continue, albeit with some adjustments for known variations and trends. This study focuses on the most popular statistical methods:

      ARIMA: This model is a highly regarded statistical tool for predicting future trends in time-series data. Developed primarily in the 1970s by George Box and Gwilym Jenkins, the approach improved on older methods, offering a comprehensive and adaptable framework for examining stochastic time-series data. ARIMA excels in short- to medium-term forecasting, particularly for data with discernible patterns unaffected by external influences [32]. The general form of the ARIMA model is denoted as

      ARIMA(p, d, q), where p is the number of Autoregressive (AR) terms, d is the number of nonseasonal differences needed for stationarity, and q is the number of lagged forecast errors in the prediction equation (MA). The ARIMA(p, d, q) model can be written as:

      $$\phi(B)\,(1-B)^{d}\,y_t = \theta(B)\,\varepsilon_t$$

      where $y_t$ is the time-series value at time $t$; $B$ is the backshift operator, defined as $B y_t = y_{t-1}$; $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ is the AR polynomial of order $p$; $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$ is the MA polynomial of order $q$; and $\varepsilon_t$ is the white noise error term. The critical elements of ARIMA are as follows:

      Autoregressive, AR(p): This part describes changes in the variable using its past values. The parameter p denotes the autoregressive order.

      Integrated, I(d): The data are differenced d times to achieve stationarity, that is, statistical properties such as the mean and variance remain constant over time.

      Moving average, MA(q): This part of the model expresses the prediction as a combination of the error terms from the present and previous time points. The parameter q specifies the number of lagged forecast errors included in the model.
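For illustration only, ARIMA(1, 1, 1) can be expanded into an explicit recursion. This assumes $\phi(B) = 1 - \phi_1 B$ and $\theta(B) = 1 + \theta_1 B$; some texts write the MA polynomial as $1 - \theta_1 B$, which flips the sign of the $\theta_1$ term.

$$\begin{aligned}
(1-\phi_1 B)(1-B)\,y_t &= (1+\theta_1 B)\,\varepsilon_t \\
\bigl(1-(1+\phi_1)B+\phi_1 B^2\bigr)\,y_t &= \varepsilon_t+\theta_1\varepsilon_{t-1} \\
y_t &= (1+\phi_1)\,y_{t-1}-\phi_1\,y_{t-2}+\varepsilon_t+\theta_1\varepsilon_{t-1}
\end{aligned}$$

Equivalently, the once-differenced series $\Delta y_t = y_t - y_{t-1}$ follows an ARMA(1, 1) process, $\Delta y_t = \phi_1 \Delta y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$, which is why d counts the differences needed before an ARMA model applies.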

      ARIMA is particularly useful in economic and financial forecasting, including market trends, sales, and economic conditions. However, its effectiveness diminishes with highly seasonal or heteroskedastic data, where the variance does not remain constant over time. For such scenarios, alternative models such as Seasonal ARIMA [33] or the Autoregressive Conditional Heteroskedasticity (ARCH)/Generalized Autoregressive Conditional Heteroskedasticity (GARCH) family [34] may be better suited.
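A minimal sketch of the AR and I parts, assuming NumPy is available: it differences the series d times, fits the AR coefficients by ordinary least squares, forecasts recursively, and then undoes the differencing. The MA part is omitted (q = 0) because MA terms require maximum-likelihood estimation, which dedicated libraries such as statsmodels provide; the function names are illustrative, and this is not the authors' implementation.

```python
import numpy as np


def fit_ar(z, p):
    """Least-squares estimates [c, phi_1, ..., phi_p] for z_t = c + sum_i phi_i z_{t-i} + e_t."""
    z = np.asarray(z, dtype=float)
    # Each row holds an intercept and the p previous values z_{t-1}, ..., z_{t-p}.
    X = np.array([np.r_[1.0, z[t - p:t][::-1]] for t in range(p, len(z))])
    coef, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return coef


def arima_pd0_forecast(y, p, d, steps):
    """Recursive multi-step forecast from an ARIMA(p, d, 0) model (no MA terms)."""
    if p < 1:
        raise ValueError("this sketch needs at least one AR term")
    y = np.asarray(y, dtype=float)
    levels = [y]                          # levels[k] = series differenced k times, i.e. (1 - B)^k y
    for _ in range(d):
        levels.append(np.diff(levels[-1]))
    coef = fit_ar(levels[-1], p)
    history = list(levels[-1])
    last = [lvl[-1] for lvl in levels[:-1]]  # last value at each lower differencing level
    forecasts = []
    for _ in range(steps):
        lags = history[-p:][::-1]            # z_{t-1}, ..., z_{t-p}
        dz = coef[0] + float(np.dot(coef[1:], lags))
        history.append(dz)
        # Undo the differencing ("integrate") from level d back to the original series.
        value = dz
        for k in range(d - 1, -1, -1):
            value = last[k] + value
            last[k] = value
        forecasts.append(value)
    return np.array(forecasts)
```

For example, `arima_pd0_forecast(y, p=1, d=1, steps=24)` would produce a 24-step-ahead forecast from a hypothetical hourly load series `y`.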

      Linear regression (LR): This model is one of the most widely used statistical techniques in predictive modeling and quantitative analysis. It models the relationship between the independent variables and a dependent variable as follows:

      $$y = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + \varepsilon$$

      where y is the response variable, $\beta_0$ is the intercept, $x_j$ and $\beta_j$ are the features and coefficients, respectively, k is the number of features, and $\varepsilon$ is the error term. LR is conducted as follows:

      Model formulation: The statistical model and candidate variables to be included in the regression are selected.

      Parameter estimation: The regression coefficients β are estimated by least squares, that is, by minimizing the residual sum of squares, after which the fitted model is presented.

      Diagnostic checking: Once the model is fitted, the model assumptions are validated using the residuals.

      Prediction and inference: Statistical inference and prediction are performed, including predicting responses for new observations, testing parameters, and checking linearity.
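The estimation and prediction steps above can be sketched with NumPy's least-squares solver, assuming the features (for example, hourly temperature and wind speed) are already arranged one row per observation. The returned residuals are what the diagnostic-checking step would inspect; the helper names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares estimates (beta_0, ..., beta_k) and residuals for y = beta_0 + sum_j beta_j x_j + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])    # prepend a column of ones for beta_0
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # minimizes the residual sum of squares
    residuals = y - design @ beta
    return beta, residuals


def r_squared(y, residuals):
    """Share of the variance in y explained by the fitted model."""
    y = np.asarray(y, dtype=float)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


def predict_ols(beta, X_new):
    """Predict responses for new observations (one row per observation)."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    return beta[0] + X_new @ beta[1:]
```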

      1.2 Machine learning models

      ML has become increasingly vital in load forecasting owing to its ability to autonomously identify complex patterns in data. ML can capture nonlinear relationships that traditional models typically miss. Below are key ML models frequently applied in load forecasting:

      (I) Support vector machine: SVMs, introduced by Vladimir Vapnik and Alexey Chervonenkis in 1963 [35], are robust ML models suited to linear and nonlinear classification, regression, and even outlier detection (Fig. 1). Their strengths include:

      Fig. 1. Support vector machine, general form.

      Maximizing margin: SVM seeks the hyperplane that maximizes the margin between classes, enhancing generalization to unseen data.

      Kernel trick: For non-linearly separable data, SVM employs the kernel trick, implicitly mapping the data into a higher-dimensional space where they become linearly separable.

      Soft margin classification: By introducing slack variables, SVM tolerates some misclassifications, balancing margin maximization against classification error.

      Support vectors: Only a subset of the training data, called support vectors, determines the SVM’s decision boundary, which keeps prediction computationally efficient.
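To make the margin, slack, and kernel ideas concrete, the following plain-Python sketch evaluates the primal soft-margin objective for a given linear classifier (w, b) and the Gaussian (RBF) kernel commonly used with the kernel trick. It does not solve the SVM optimization problem, which in practice is handled by a dedicated solver; the function names and the default gamma are illustrative assumptions.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: an inner product in an implicit high-dimensional feature space."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision_function(w, b, x):
    """Linear decision score f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def soft_margin_objective(w, b, X, y, C=1.0):
    """Primal soft-margin objective 0.5 * ||w||^2 + C * sum_i xi_i, with labels y_i in {-1, +1}.

    The slack xi_i = max(0, 1 - y_i f(x_i)) is zero for points outside the margin
    and grows as a point moves into the margin or onto the wrong side.
    """
    slacks = [max(0.0, 1.0 - yi * decision_function(w, b, xi)) for xi, yi in zip(X, y)]
    margin_term = 0.5 * sum(wj * wj for wj in w)  # small ||w|| means a wide margin (width 2 / ||w||)
    return margin_term + C * sum(slacks), slacks


def on_or_inside_margin(w, b, X, y, tol=1e-9):
    """Indices with y_i f(x_i) <= 1; at the optimal (w, b), every support vector is in this set."""
    return [i for i, (xi, yi) in enumerate(zip(X, y))
            if yi * decision_function(w, b, xi) <= 1.0 + tol]
```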

      (II) Decision Trees (DTs): DTs are non-parametric models used for classification and regression [36] (Fig. 2). DTs are built by recursively splitting the data based on feature values to form a tree structure, where:

      Nodes represent features, and branches represent the decisions based on feature values.

      Splits are chosen using criteria such as Gini impurity for classification or Mean Squared Error for regression.

      Pruning techniques are applied after training to combat overfitting, improving generalization to new data.
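A minimal regression-tree sketch in plain Python, restricted to a single numeric feature for brevity: each split is the threshold that minimizes the children's total squared error (equivalently, their sample-weighted MSE), and splitting stops at a depth limit or when no split reduces the error. Pruning is not shown; the function names and stopping defaults are illustrative.

```python
def sse(values):
    """Sum of squared errors around the mean (n times the node's MSE)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Threshold on one feature that minimizes the children's total squared error."""
    pairs = sorted(zip(x, y))
    best_cost, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        cost = sse([v for _, v in pairs[:i]]) + sse([v for _, v in pairs[i:]])
        if cost < best_cost:
            best_cost, best_threshold = cost, threshold
    return best_threshold, best_cost


def build_tree(x, y, depth=0, max_depth=3, min_samples=2):
    """Recursively split until max_depth, too few samples, or no error reduction."""
    leaf = sum(y) / len(y)  # a leaf predicts the mean response of its samples
    if depth >= max_depth or len(y) < min_samples:
        return leaf
    threshold, cost = best_split(x, y)
    if threshold is None or cost >= sse(y):
        return leaf
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= threshold]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > threshold]
    return (threshold,
            build_tree([a for a, _ in left], [b for _, b in left], depth + 1, max_depth, min_samples),
            build_tree([a for a, _ in right], [b for _, b in right], depth + 1, max_depth, min_samples))


def predict(tree, xi):
    """Follow the branches (left when xi <= threshold) until reaching a leaf value."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if xi <= threshold else right
    return tree
```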

      (III) Random Forest (RF): RF, an ensemble learning method, builds multiple DTs to enhance prediction accuracy [37] (Fig. 3). The critical components of RF include:

      Bootstrap sampling: Trees are built using bootstrap samples from the original dataset, introducing variability that reduces overfitting.

      Random splits: Each split considers only a random subset of features, leading to diverse, less correlated trees.

      Ensemble prediction: For classification, the forest votes on the most likely class, whereas for regression, it outputs the average prediction across trees.

      Fig. 2. Decision trees, general form.

      Fig. 3. Random forest, general form.
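The three RF components above (bootstrap sampling, a random feature subset at each split, and averaging for regression) can be sketched in plain Python as follows. This is a small illustrative regressor with assumed defaults (25 trees, depth 4), not the RF configuration used in the case study.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def _best_split(X, y, features):
    """Best (cost, feature, threshold) among the candidate features, using squared error."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    return best


def _grow(X, y, rng, max_features, depth, max_depth):
    if depth >= max_depth or len(y) < 2:
        return _mean(y)
    # Random splits: only a random subset of features is considered at this node.
    features = rng.sample(range(len(X[0])), max_features)
    split = _best_split(X, y, features)
    if split is None or split[0] >= _sse(y):
        return _mean(y)
    _, f, t = split

    def child(rows):
        return _grow([X[i] for i in rows], [y[i] for i in rows], rng, max_features, depth + 1, max_depth)

    return (f, t,
            child([i for i, row in enumerate(X) if row[f] <= t]),
            child([i for i, row in enumerate(X) if row[f] > t]))


def _predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_features=1, max_depth=4, seed=0):
    """Fit n_trees regression trees, each on a bootstrap sample (defaults are illustrative)."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw n rows with replacement
        forest.append(_grow([X[i] for i in idx], [y[i] for i in idx], rng, max_features, 0, max_depth))
    return forest


def predict_forest(forest, row):
    """Regression output: average of the individual tree predictions."""
    return sum(_predict_tree(tree, row) for tree in forest) / len(forest)
```

Setting `max_features` to the total number of features disables the random-subset step and reduces the method to bagged trees.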

      (IV) Neural Networks: NNs are computational models inspired by biological neural networks [38] (Fig. 4). Their architecture includes:

      Layers of neurons connected by weighted links, including input, hidden, and output layers.

      Forward propagation: Data flow through the network, with each neuron’s activation determined by functions such as sigmoid or ReLU.

      Backpropagation: Prediction errors are propagated back through the network to determine how each weight should be adjusted, refining predictions over time.

      Optimization is achieved via algorithms such as gradient descent, which minimizes the prediction error.
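A compact NumPy sketch of forward propagation, backpropagation, and one gradient-descent step for a network with one ReLU hidden layer and a linear output trained on mean squared error. The layer sizes, initialization scale, and learning rate are illustrative assumptions.

```python
import numpy as np


def init_params(n_in, n_hidden, seed=0):
    """Small random weights (illustrative scale) for one hidden layer and a single output."""
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(0.0, 0.5, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (1, n_hidden)), "b2": np.zeros(1)}


def forward(params, X):
    """Forward propagation: ReLU hidden layer, linear output (regression)."""
    z1 = X @ params["W1"].T + params["b1"]        # (n, hidden) pre-activations
    h = np.maximum(z1, 0.0)                        # ReLU activations
    y_hat = (h @ params["W2"].T + params["b2"])[:, 0]
    return y_hat, z1, h


def loss_and_grads(params, X, y):
    """Mean squared error and its gradients, computed by backpropagation."""
    y_hat, z1, h = forward(params, X)
    err = y_hat - y
    loss = float(np.mean(err ** 2))
    d_out = (2.0 / len(y)) * err[:, None]          # dLoss/dy_hat, shape (n, 1)
    grads = {"W2": d_out.T @ h, "b2": d_out.sum(axis=0)}
    d_z1 = (d_out @ params["W2"]) * (z1 > 0)       # chain rule through the ReLU
    grads["W1"] = d_z1.T @ X
    grads["b1"] = d_z1.sum(axis=0)
    return loss, grads


def gradient_step(params, grads, lr=0.05):
    """One gradient-descent update: move each parameter against its gradient."""
    return {k: params[k] - lr * grads[k] for k in params}
```

Comparing these gradients with finite differences of the loss is a standard way to check a backpropagation implementation.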

      (V) Deep Learning: DL extends NNs by adding multiple hidden layers, enabling them to model intricate data patterns [39] (Fig. 5). Its critical features are:

      Automated feature learning: DL models learn hierarchical features directly from raw data, reducing the need for manual feature engineering.

      Forward/backpropagation: Similar to NNs, but deeper architectures can capture more complex relationships.

      Optimization algorithms, such as stochastic gradient descent, facilitate learning across vast datasets.
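Stochastic gradient descent differs from full-batch gradient descent only in estimating the gradient from a small random subset of samples at each update. For brevity, the plain-Python sketch below applies mini-batch SGD to a linear model; in a deep network the same update would use gradients obtained by backpropagation. The learning rate, batch size, and epoch count are assumptions.

```python
import random


def sgd_linear(X, y, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Mini-batch SGD on the mean squared error of y ≈ w · x + b (illustrative hyperparameters)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    order = list(range(len(y)))
    for _ in range(epochs):
        rng.shuffle(order)                          # new random mini-batches each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w, grad_b = [0.0] * n_features, 0.0
            for i in batch:
                err = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
                for j in range(n_features):
                    grad_w[j] += 2.0 * err * X[i][j] / len(batch)
                grad_b += 2.0 * err / len(batch)
            # Update using the gradient estimated from this mini-batch only.
            w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b
```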

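      (VI) LSTM: LSTMs are a type of recurrent NN designed to capture long-term dependencies in sequential data, making them well suited to time-series forecasting [40] (Fig. 6). LSTMs incorporate a memory cell regulated by forget, input, and output gates, as described in the legend of Fig. 6.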

      2 Method

      A hybrid approach to ELF that integrates ML and statistical techniques is proposed. Specifically, this method combines RF, renowned for its ability to model sophisticated, non-linear relationships, with ARIMA, a widely used statistical model for time series data. The combination leverages RF’s predictive strengths alongside ARIMA’s capacity to account for time-dependent structure in the data, thereby mitigating the limitations of each individual approach. Merging these methodologies yields a more robust and accurate forecasting model.

      Fig. 4. Neural network, general form.

      Fig. 5. Deep learning, general form.

      Fig. 6. LSTM network, general form.

      (⊙): element-wise multiplication; (+): element-wise addition; σ: sigmoid activation function; tanh: hyperbolic tangent function, used to generate the candidate cell state. Forget gate: determines which past information to discard. Input gate: updates the cell state with relevant new information. Output gate: controls the information passed on to the next step.
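The gate operations listed in the legend of Fig. 6 can be written as one LSTM time step. The NumPy sketch below assumes the common formulation in which each gate reads the previous hidden state and the current input, with the four gate weight blocks stacked in one matrix in an assumed order (forget, input, candidate, output).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4H, H + D) and b has shape (4H,); the rows are stacked in the
    assumed order forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])              # forget gate: what to keep from c_prev
    i = sigmoid(z[H:2 * H])          # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])      # candidate cell state
    o = sigmoid(z[3 * H:4 * H])      # output gate: what to expose as h_t
    c_t = f * c_prev + i * g         # element-wise products and addition
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def run_lstm(xs, W, b, hidden_size):
    """Feed a sequence through the cell and return the final hidden and cell states."""
    h, c = np.zeros(hidden_size), np.zeros(hidden_size)
    for x_t in xs:
        h, c = lstm_step(np.asarray(x_t, dtype=float), h, c, W, b)
    return h, c
```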

      To further enhance the model’s performance, the MRMRMS feature selection method is employed so that only the most relevant features contribute to the forecasting process. This approach reduces noise and enhances model accuracy by selecting variables that optimally balance relevance and redundancy. Optimal feature subset selection in large datasets is NP-hard because the number of candidate feature combinations grows exponentially, which makes identifying a balanced subset challenging. The Minimum Redundancy Maximum Relevance (MRMR) method addresses this by incrementally selecting features that are highly relevant to the target variable while minimizing redundancy among them. The MRMRMS technique extends MRMR by also accounting for synergy, that is, features that are jointly more informative about the target than they are individually, refining the selection of relevant, non-redundant features across multiple scales. This capability makes MRMRMS well suited to complex feature spaces, offering a more robust and informative subset for the proposed hybrid model and thereby enhancing predictive accuracy and interpretability.
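A minimal sketch of greedy MRMR selection, assuming NumPy. Two simplifications are labeled explicitly: the absolute Pearson correlation replaces mutual information as the relevance and redundancy measure, and the synergy term that distinguishes MRMRMS [41] is not modeled, so this illustrates only the MRMR core.

```python
import numpy as np


def abs_corr(a, b):
    """|Pearson correlation|, used here as a simple stand-in for mutual information."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else abs(float(a @ b) / denom)


def mrmr_select(X, y, names, k):
    """Greedy MRMR: maximize relevance to y minus mean redundancy with already-selected features.

    The MRMRMS synergy term is deliberately not modeled in this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    relevance = [abs_corr(X[:, j], y) for j in range(X.shape[1])]
    selected, remaining = [], list(range(X.shape[1]))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = np.mean([abs_corr(X[:, j], X[:, s]) for s in selected])
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [names[j] for j in selected]
```

For instance, `mrmr_select(X, load, ["temperature", "wind speed", "wind gust", "humidity"], k=3)` would rank hypothetical weather columns of `X` against a load series `load`.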

      The proposed methodology thus overcomes the limitations of relying solely on ML or statistical models, yielding a comprehensive and precise forecast. Fig. 7 presents the workflow of the proposed forecasting method. The first step in any forecasting problem is to gather high-quality data. In the present case, these include reliable historical consumption data and weather data, such as temperature and wind speed. To prevent overfitting, the MRMRMS feature selection technique is applied to identify features that balance relevance and redundancy, ensuring that only the most informative variables are retained.

      Following feature selection, the model is built to generate an initial forecast. This preliminary prediction is used to compute residuals and identify their trends, which can then be used to forecast future Prediction Residuals (PR). Combining the consumption forecast with the residual forecast yields a more accurate overall forecast of future consumption.
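The forecast-plus-residual-correction idea can be sketched in plain Python as below. Each model is represented as a hypothetical callable that returns both its in-sample fit and its forecast; two toy models stand in for RF and ARIMA so the example stays self-contained. Residuals are taken as observed minus predicted, so the predicted residual is added to the preliminary forecast.

```python
from statistics import fmean


def average(*series):
    """Element-wise mean of equally long sequences."""
    return [fmean(values) for values in zip(*series)]


def hybrid_forecast(y_train, horizon, base_models, residual_models):
    """Average the base forecasts, then add an averaged forecast of the residuals.

    Every model is a callable (series, horizon) -> (in_sample_fit, forecast).
    """
    fits, forecasts = zip(*(model(y_train, horizon) for model in base_models))
    fit_avg = average(*fits)                  # averaged in-sample prediction
    forecast_avg = average(*forecasts)        # averaged preliminary forecast
    residuals = [obs - fit for obs, fit in zip(y_train, fit_avg)]  # observed minus predicted
    residual_forecast = average(*(model(residuals, horizon)[1] for model in residual_models))
    return [f + r for f, r in zip(forecast_avg, residual_forecast)]


# Hypothetical toy models standing in for RF and ARIMA.
def mean_model(series, horizon):
    m = fmean(series)
    return [m] * len(series), [m] * horizon


def last_value_model(series, horizon):
    fit = [series[0]] + list(series[:-1])     # naive in-sample fit: previous observation
    return fit, [series[-1]] * horizon
```

In the configuration described in Table 1, RF and ARIMA would fill both `base_models` and `residual_models`.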

      Because this hybrid approach achieves sparsity, incorporates RF, and integrates ARIMA, it is termed SP-RF-ARIMA. The proposed method outperforms competing techniques, as demonstrated in the subsequent real data analysis.

      A brief explanation of the proposed method is given in Table 1.

      3 Case study evaluation

      This section assesses the performance of the proposed method using real-world electrical consumption data from the United States. The focus is on forecasting for Duke Energy Ohio and Kentucky (DEO&K), which serves areas such as Cincinnati and northern Kentucky. Forecasting in this region is challenging because of fluctuating consumption patterns and changing weather conditions.

      The official electrical load consumption data from the PJM website covering September 17, 2022, to March 17, 2023, were collected and analyzed.

      The PJM website (https://pjm.com/) provides data from the electric grid across 13 states, making it a reliable source for analysis. In addition, relevant weather data, such as ‘dew point’, ‘humidity’, ‘pressure’, ‘temperature’, and ‘wind speed’, were gathered from the Weather Underground website (https://www.wunderground.com). Fig. 8 shows the behavior of the weather data and electrical load consumption during the study period.

      The MRMRMS feature selection algorithm [41] was applied to ensure optimal performance and prevent overfitting.‘Wind speed’, ‘wind gust’ and ‘temperature’ were identified as the best features for this case study because they exhibited the highest relevance to the electrical load patterns.

      Additionally, the distinct energy consumption patterns on weekends and public holidays were addressed by adjusting the actual load data. Energy usage on these days tends to deviate from regular weekday consumption owing to different levels of residential and commercial activity. The adjustment process included the following steps:

      Fig. 7. Workflow of the proposed electric load forecasting method.

      Table 1 Brief description of the proposed method.

      Step | Description
      Data | Load and clean the data. Several preprocessing methods are applied: the holiday factor (95.8%) is applied to reduce noise; each month is standardized to 31 days; missing values are imputed using the average of adjacent observations.
      Feature selection | Apply MRMRMS to select the best set of features.
      Split data | Split the data into training and test datasets.
      Train model | Train RF and ARIMA separately and compute their predictions, $y_{rf}$ and $y_{ar}$; compute the average prediction $y_{rfar} = (y_{rf} + y_{ar})/2$.
      Residuals | Compute the residuals $r_{rfar} = y_{obs} - y_{rfar}$. Train RF and ARIMA separately on the residuals, compute their predicted residuals, and take the mean predicted residual.
      Final prediction | Add the average prediction and the mean predicted residual to obtain the final prediction.
      Model evaluation | Evaluate the model on the test set.
      Competing methods | Compare the results of the proposed method with several competing methods.
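The imputation step in the Data row can be sketched as follows. The table does not say how "adjacent observations" is defined, so this sketch assumes the nearest observed value on each side of a gap, falling back to the single available neighbour at the ends of the series; missing values are represented as None.

```python
def impute_adjacent_mean(values):
    """Fill missing entries (None) with the mean of the nearest observed values on each side.

    Assumed interpretation of "adjacent observations": at the start or end of the
    series only one neighbour exists, so it is used alone.
    """
    if all(v is None for v in values):
        raise ValueError("cannot impute a series with no observations")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = next((values[j] for j in range(i - 1, -1, -1) if values[j] is not None), None)
        after = next((values[j] for j in range(i + 1, len(values)) if values[j] is not None), None)
        neighbours = [n for n in (before, after) if n is not None]
        filled[i] = sum(neighbours) / len(neighbours)
    return filled
```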

      Fig. 8. Behavior of weather data and electrical load consumption from 9/17/2022 to 3/17/2023.

      Weekends and public holidays were first identified in the dataset owing to their distinct energy consumption behaviors compared to weekdays.

      Next, the average energy usage on these special days was calculated and compared with the average consumption on regular weekdays.

      A ‘holiday coefficient’ was then computed to quantify the deviation in energy usage. In this dataset, average energy usage on weekends and holidays was 95.8% of that on regular weekdays, giving a coefficient of 0.958.

      Finally, the ‘holiday coefficient’ was applied to the actual load data on weekends and public holidays so that the forecast accounts for these irregular consumption patterns.
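The coefficient computation and adjustment can be sketched in plain Python. The text does not specify how the coefficient is applied, so this sketch assumes weekend and holiday loads are divided by it to bring them to a weekday-equivalent level; the function names are hypothetical.

```python
from statistics import fmean


def holiday_coefficient(loads, is_special_day):
    """Mean load on weekends/holidays divided by mean load on regular weekdays."""
    special = [x for x, s in zip(loads, is_special_day) if s]
    regular = [x for x, s in zip(loads, is_special_day) if not s]
    return fmean(special) / fmean(regular)


def adjust_special_days(loads, is_special_day, coefficient):
    """Rescale weekend/holiday loads to a weekday-equivalent level.

    Assumption: the adjustment divides by the coefficient; forecasts for special
    days would then be multiplied by it again.
    """
    return [x / coefficient if s else x for x, s in zip(loads, is_special_day)]
```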

      For forecasting purposes, the dataset was divided into training and test sets. This setup enabled model validation: future consumption was forecast using the training data, and performance was evaluated on the test data. Using the MRMRMS-selected features, the preliminary consumption forecast and the residuals of each model were computed. These residuals were then analyzed for trends to predict future errors. Combining the consumption and error forecasts yields a more accurate overall prediction.

      The study benchmarked six forecasting methods, combining traditional statistical models and contemporary ML techniques to cover a broad spectrum of predictive modeling capabilities. LR was selected for its simplicity and interpretability, providing a straightforward baseline for performance comparison. ARIMA was included for its proficiency in modeling autocorrelated data with trends, making it a standard tool in time-series analysis. From the ML suite, SVM was chosen for its effectiveness in high-dimensional spaces and with non-linear decision boundaries. LSTM was included for its ability to capture long-term dependencies in sequential data. DTs offer a clear, interpretable decision structure that can capture nonlinear patterns simpler models might miss. Lastly, RF was used for its robustness, as ensemble learning enhances prediction accuracy and mitigates overfitting. Together, these methods provide a comprehensive assessment across statistical and ML approaches, highlighting their respective strengths in forecasting.

      Notably, RF achieved its best predictive accuracy with all available features, whereas the other models benefited from feature reduction.

      RF and ARIMA, the most effective of the basic approaches for ELF, were combined. Additional hybrid models were also explored, including ARIMA-DTs, ARIMA-SVM, and ARIMA-LSTM.

      Several performance metrics were used to evaluate each method: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). These metrics are well suited to evaluating the accuracy and reliability of forecasting models, particularly on time-series data. Fig. 9 depicts the comparison results; SP-RF-ARIMA outperformed all competing methods. Table 2 lists 24-hour forecasts from the proposed and competing methods. Table 3 lists the reduction in residuals obtained by adding the residual prediction to the forecast.
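For reference, the four metrics can be computed from paired observations and forecasts as follows (plain Python; MAPE is expressed in percent).

```python
import math


def forecast_metrics(actual, predicted):
    """MSE, MAE, RMSE, and MAPE (in percent) for paired observations and forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so it is undefined when any observation is 0.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "MAPE": mape}
```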

      Fig. 9. Performance comparison of methods in electric load forecasting.

      Table 2 Performance comparison of machine learning and statistical methods in forecasting.

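      Method groups: machine learning (SVM, LSTM, DTs, RF); statistics (LR, ARIMA); hybrid (ARIMA-SVM, ARIMA-DTs, ARIMA-LSTM, SP-RF-ARIMA).

      Hour | Observed | SVM | LSTM | DTs | RF | LR | ARIMA | ARIMA-SVM | ARIMA-DTs | ARIMA-LSTM | SP-RF-ARIMA
      1 | 3.15 | 2.97 | 2.92 | 2.95 | 3.09 | 2.99 | 3.06 | 3.01 | 2.98 | 3.01 | 3.01
      2 | 3.08 | 2.99 | 3.33 | 3.00 | 3.09 | 2.98 | 3.11 | 3.02 | 2.99 | 3.02 | 3.05
      3 | 2.99 | 3.01 | 3.27 | 3.01 | 3.00 | 3.00 | 3.15 | 3.03 | 3.02 | 3.03 | 3.07
      4 | 2.87 | 3.12 | 2.98 | 3.35 | 3.10 | 3.06 | 3.17 | 3.03 | 3.12 | 3.03 | 3.04
      5 | 2.76 | 3.13 | 2.76 | 3.52 | 2.91 | 3.10 | 3.18 | 3.04 | 3.14 | 3.04 | 2.99
      6 | 2.73 | 3.32 | 2.79 | 3.15 | 2.90 | 3.21 | 3.17 | 3.04 | 3.33 | 3.04 | 2.92
      7 | 2.71 | 3.34 | 3.02 | 3.11 | 3.04 | 3.29 | 3.16 | 3.04 | 3.34 | 3.04 | 2.94
      8 | 2.71 | 3.09 | 3.21 | 3.17 | 2.92 | 3.22 | 3.14 | 3.04 | 3.10 | 3.04 | 2.96
      9 | 2.76 | 3.17 | 3.25 | 3.06 | 2.93 | 3.18 | 3.12 | 3.04 | 3.17 | 3.04 | 2.94
      10 | 2.83 | 3.10 | 3.15 | 3.01 | 2.99 | 3.14 | 3.10 | 3.04 | 3.11 | 3.04 | 2.90
      11 | 2.94 | 3.11 | 3.02 | 2.87 | 3.13 | 3.12 | 3.09 | 3.04 | 3.12 | 3.04 | 2.96
      12 | 3.04 | 2.91 | 2.90 | 2.91 | 3.28 | 3.02 | 3.07 | 3.04 | 2.91 | 3.04 | 3.03
      13 | 3.14 | 2.90 | 2.83 | 2.82 | 3.30 | 2.94 | 3.06 | 3.04 | 2.92 | 3.04 | 3.08
      14 | 3.21 | 2.84 | 2.81 | 2.73 | 3.33 | 2.88 | 3.06 | 3.04 | 2.85 | 3.04 | 3.14
      15 | 3.23 | 2.83 | 2.83 | 2.89 | 3.39 | 2.85 | 3.05 | 3.04 | 2.84 | 3.04 | 3.17
      16 | 3.25 | 2.84 | 2.87 | 3.01 | 3.99 | 2.84 | 3.05 | 3.04 | 2.85 | 3.04 | 3.16
      17 | 3.20 | 2.89 | 2.91 | 3.08 | 3.30 | 2.86 | 3.06 | 3.04 | 2.90 | 3.04 | 3.12
      18 | 3.10 | 3.08 | 3.00 | 3.26 | 3.29 | 2.97 | 3.06 | 3.04 | 3.09 | 3.04 | 3.05
      19 | 3.00 | 3.33 | 3.17 | 3.16 | 2.82 | 3.16 | 3.06 | 3.04 | 3.35 | 3.04 | 2.99
      20 | 2.94 | 3.07 | 3.28 | 2.92 | 2.99 | 3.16 | 3.07 | 3.04 | 3.08 | 3.04 | 2.97
      21 | 2.94 | 2.89 | 3.21 | 2.85 | 2.98 | 3.03 | 3.07 | 3.05 | 2.90 | 3.05 | 2.99
      22 | 3.01 | 2.88 | 3.02 | 2.74 | 3.13 | 2.94 | 3.08 | 3.05 | 2.89 | 3.05 | 3.04
      23 | 3.04 | 2.82 | 2.83 | 2.84 | 3.08 | 2.87 | 3.08 | 3.05 | 2.93 | 3.05 | 3.13
      24 | 3.18 | 2.87 | 2.77 | 2.84 | 3.20 | 2.85 | 3.08 | 3.79 | 2.88 | 3.79 | 3.21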

      Table 3 Reduction in predicted residuals after applying prediction error to forecasting.

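      Hour | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
      Value | 0.15 | 0.18 | 0.18 | 0.15 | 0.09 | 0.02 | 0.03 | 0.05

      Hour | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
      Value | 0.04 | 0.01 | 0.07 | 0.14 | 0.2 | 0.24 | 0.26 | 0.24

      Hour | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24
      Value | 0.18 | 0.1 | 0.04 | 0 | 0.01 | 0.05 | 0.11 | 0.18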

      4 Discussion

      The SP-RF-ARIMA model is designed for adaptability across regions and electrical load types. By integrating a sparse random forest, ARIMA, and robust feature selection, it captures essential energy consumption patterns without being region-specific. Its modular framework accommodates varying datasets and operational conditions, supporting reliable forecasting despite differences in data quality and structure. This generalizable approach enables consistent performance and lays the groundwork for future refinements using diverse energy market data.

      In addition to MRMRMS, other feature selection methods, such as the correlation matrix and the adaptive least absolute shrinkage and selection operator (adaptive LASSO), were explored. All of these methods consistently identified the same set of features as MRMRMS (wind speed, wind gust, and temperature).

      The hybrid model directly addresses crucial challenges in energy forecasting, which is critical for optimizing energy distribution and planning on a global scale. The proposed algorithm is simple and efficient, and its strong predictive performance and real-time capabilities make it broadly applicable to modern, large-scale energy systems. This balance between accuracy and practicality supports its usefulness in real-world energy management.

      Cloud computing enables rapid, accurate, real-time load forecasting through dynamic resource allocation, parallel processing, and seamless, scalable performance. Cloud platforms provide reliability, fault tolerance, and continuous operation, even during peak loads. Additionally, their cost-effective, on-demand nature reduces expenses by eliminating upfront hardware investment. Overall, cloud computing enhances speed, resilience, and scalability, making it essential for real-time forecasting.

      This research provides valuable insights into electrical demand forecasting, with practical implications for energy management systems worldwide. As energy demand continues to rise globally, the need for data-driven, precise, and adaptive energy systems becomes increasingly critical, paving the way for smarter and more sustainable energy solutions.

      5 Conclusion

      This study developed and validated a comprehensive framework for forecasting electrical demand by integrating multiple predictive techniques, including statistical models and advanced ML algorithms. The MRMRMS method was employed for feature selection to enhance these models’ accuracy and efficiency. The 7-stage prediction process incorporated explicit modeling of residuals to capture nuances that traditional methods often overlook. This approach improved forecast accuracy and revealed underlying patterns in electrical usage and production inefficiencies.

      Using several criteria, the proposed method was demonstrated to significantly improve forecasting accuracy.A key takeaway from this study is that analyzing residual patterns provides deeper insights into forecast errors and consumption trends.

      Incorporating real-time data analytics and renewable energy sources into future forecasting models is recommended to further improve the responsiveness and sustainability of power management systems.The ultimate objective is to develop a fully adaptive system that predicts demand with high precision and optimizes energy distribution and consumption in real time.

      CRediT authorship contribution statement

      Kamran Hassanpouri Baesmat: Writing - original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Conceptualization. Farhad Shokoohi: Writing - original draft, Visualization, Validation, Supervision, Software, Methodology, Investigation, Funding acquisition, Formal analysis, Conceptualization. Zeinab Farrokhi: Writing - original draft, Visualization, Validation, Software, Methodology, Formal analysis.

      Declaration of competing interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Acknowledgments

      The authors express their gratitude to the editorial team and the anonymous reviewers for their invaluable constructive feedback, which substantially enhanced the manuscript's quality. This work was supported by the Startup Grant (PG18929) awarded to F. Shokoohi.

      Appendix A Supplementary material

      Supplementary data to this article can be found online at https://doi.org/10.1016/j.gloei.2025.04.003.

      References

      [1] K.H. Baesmat, I. Masoudipour, H. Samet, Improving the performance of short-term load forecast using a hybrid artificial neural network and artificial bee colony algorithm, IEEE Can. J. Electr. Comput. Eng. 44 (3) (2021) 275-282, https://doi.org/10.1109/icjece.2021.3056125.

      [2] W. Zha, Y. Ji, C. Liang, Short-term load forecasting method based on secondary decomposition and improved hierarchical clustering, Results Eng. 22 (2024) 101993, https://doi.org/10.1016/j.rineng.2024.101993.

      [3] A. Sharma, S.K. Jain, A novel two-stage framework for mid-term electric load forecasting, IEEE Trans. Ind. Inf. 20 (1) (2024) 247-255, https://doi.org/10.1109/TII.2023.3259445.

      [4] Z. Farrokhi, A. Kalhor, M.T. Masouleh, Implementation and evaluation of object identification techniques on Nao robot platform, Int. J. Mechatron. Electr. Comput. Technol. (ijmec) 8 (29) (2018) 3947-3958, https://aeuso.org/includes/files/articles/Vol8_Iss29_3947-3958_Implementation_and_Evaluation_of_Ob.pdfs.

      [5] A.R. Singh, R.S. Kumar, M. Bajaj, C.B. Khadse, I. Zaitsev, Machine learning-based energy management and power forecasting in grid connected microgrids with multiple distributed energy sources, Sci. Rep. 14 (1) (2024) 19207, https://doi.org/10.1038/s41598-024-70336-3.

      [6] J. Lee, Y. Cho, National-scale electricity peak load forecasting: traditional, machine learning, or hybrid model?, Energy 239 (2022) 122366, https://doi.org/10.1016/j.energy.2021.122366.

      [7] P.-H. Kuo, C.-J. Huang, A high precision artificial neural networks model for short-term energy load forecasting, Energies 11 (1) (2018), https://doi.org/10.3390/en11010213.

      [8] K.H. Baesmat, Impedance analysis of adaptive distance relays using machine learning, in: S. Latifi (Ed.), ITNG 2024: 21st International Conference on Information Technology-New Generations, Springer Nature Switzerland, Cham, 2024, pp. 457-461, https://doi.org/10.1007/978-3-031-56599-1_57.

      [9] E. Vivas, H. Allende-Cid, R. Salas, A systematic review of statistical and machine learning methods for electrical power forecasting with reported MAPE score, Entropy 22 (12) (2020), https://doi.org/10.3390/e22121412.

      [10] M. Jawad, M.S.A. Nadeem, S.-O. Shim, I.R. Khan, A. Shaheen, N. Habib, L. Hussain, W. Aziz, Machine learning based cost effective electricity load forecasting model using correlated meteorological parameters, IEEE Access 8 (2020) 146847-146864, https://doi.org/10.1109/ACCESS.2020.3014086.

      [11] M. Cordeiro-Costas, D. Villanueva, P. Eguía-Oller, M. Martínez-Comesaña, S. Ramos, Load forecasting with machine learning and deep learning methods, Appl. Sci. 13 (13) (2023), https://doi.org/10.3390/app13137933.

      [12] K. Hassanpouri Baesmat, A. Shiri, A new combined method for future energy forecasting in electrical networks ITEES-17-0407.R4, Int. Trans. Electr. Energy Syst. 29 (3) (2019) e2749, https://doi.org/10.1002/etep.2749.

      [13] Y. Liang, D. Niu, W.-C. Hong, Short term load forecasting based on feature extraction and improved general regression neural network model, Energy 166 (2019) 653-663, https://doi.org/10.1016/j.energy.2018.10.119.

      [14] D. Lu, D. Zhao, Z. Li, Short-term nodal load forecasting based on machine learning techniques, Int. Trans. Electr. Energy Syst. 31 (9) (2021) e13016, https://doi.org/10.1002/2050-7038.13016.

      [15] D. Theng, K.K. Bhoyar, Feature selection techniques for machine learning: a survey of more than two decades of research, Knowl. Inf. Syst. 66 (3) (2024) 1575-1637, https://doi.org/10.1007/s10115-023-02010-5.

      [16] K. Shahare, A. Mitra, D. Naware, R. Keshri, H. Suryawanshi, Performance analysis and comparison of various techniques for short-term load forecasting, Energy Rep. 9 (2023) 799-808, https://doi.org/10.1016/j.egyr.2022.11.086, 2022 9th International Conference on Power and Energy Systems Engineering.

      [17] A.K. Shaikh, A. Nazir, N. Khalique, A.S. Shah, N. Adhikari, A new approach to seasonal energy consumption forecasting using temporal convolutional networks, Results Eng. 19 (2023) 101296, https://doi.org/10.1016/j.rineng.2023.101296.

      [18] M. Qureshi, M.A. Arbab, S.U. Rehman, Deep learning-based forecasting of electricity consumption, Sci. Rep. 14 (1) (2024) 6489, https://doi.org/10.1038/s41598-024-56602-4.

      [19] O. Rubasinghe, X. Zhang, T.K. Chau, Y.H. Chow, T. Fernando, H.H.-C. Lu, A novel sequence to sequence data modelling-based CNN-LSTM algorithm for three years ahead monthly peak load forecasting, IEEE Trans. Power Syst. 39 (1) (2024) 1932-1947, https://doi.org/10.1109/TPWRS.2023.3271325.

      [20] F. Dewangan, A.Y. Abdelaziz, M. Biswal, Load forecasting models in smart grid using smart meter information: a review, Energies 16 (3) (2023), https://doi.org/10.3390/en16031404.

      [21] O. Linda, M. Manic, GNG-SVM framework - classifying large datasets with support vector machines using growing neural gas, in: 2009 International Joint Conference on Neural Networks, 2009, pp. 1820-1826, https://doi.org/10.1109/IJCNN.2009.5178713.

      [22] D.-X. Niu, Y.-L. Wang, Support vector machines based on data mining technology in power load forecasting, in: 2007 International Conference on Wireless Communications, Networking and Mobile Computing, 2007, pp. 5373-5376, https://doi.org/10.1109/WICOM.2007.1316.

      [23] T.-N. Do, F. Poulet, Parallel learning of local SVM algorithms for classifying large datasets, in: A. Hameurlain, J. Kung, R. Wagner, T.K. Dang, N. Thoai (Eds.), Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXI, Springer Berlin Heidelberg, Berlin, Heidelberg, 2017, pp. 67-93, https://doi.org/10.1007/978-3-662-54173-9_4.

      [24] Aasim, S. Singh, A. Mohapatra, Data driven day-ahead electrical load forecasting through repeated wavelet transform assisted SVM model, Appl. Soft Comput. 111 (2021) 107730, https://doi.org/10.1016/j.asoc.2021.107730.

      [25] S. Zhang, R. Chen, J. Cao, J. Tan, A CNN and LSTM-based multi-task learning architecture for short and medium-term electricity load forecasting, Electric Power Syst. Res. 222 (2023) 109507, https://doi.org/10.1016/j.epsr.2023.109507.

      [26] X. Wang, F. Fang, X. Zhang, Y. Liu, L. Wei, Y. Shi, LSTM-based short-term load forecasting for building electricity consumption, in: 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), 2019, pp. 1418-1423, https://doi.org/10.1109/ISIE.2019.8781349.

      [27] S. Muzaffar, A. Afshari, Short-term load forecasts using LSTM networks, Energy Proc. 158 (2019) 2922-2927, https://doi.org/10.1016/j.egypro.2019.01.952, Innovative Solutions for Energy Transitions.

      [28] V. Suresh, P. Janik, J.M. Guerrero, Z. Leonowicz, T. Sikorski, Microgrid energy management system with embedded deep learning forecaster and combined optimizer, IEEE Access 8 (2020) 202225-202239, https://doi.org/10.1109/ACCESS.2020.3036131.

      [29] K.H. Baesmat, S. Latifi, A new hybrid method for electrical load forecasting based on deviation correction and MRMRMS, in: H. Selvaraj, G. Chmaj, D. Zydek (Eds.), Advances in Systems Engineering, Springer Nature Switzerland, Cham, 2023, pp. 293-303, https://doi.org/10.1007/978-3-031-40579-2_29.

      [30] P. Vrablecova, A. Bou Ezzeddine, V. Rozinajova, S. Sarik, A.K. Sangaiah, Smart grid load forecasting using online support vector regression, Comput. Electr. Eng. 65 (2018) 102-117, https://doi.org/10.1016/j.compeleceng.2017.07.006.

      [31] H. Jamali, A. Karimi, M. Haghighizadeh, A new method of cloud-based computation model for mobile devices: energy consumption optimization in mobile-to-mobile computation offloading, in: Proceedings of the 6th International Conference on Communications and Broadband Networking, ICCBN '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 32-37, https://doi.org/10.1145/3193092.3193103.

      [32] A. Raza, L. Jingzhao, M. Adnan, I. Ahmad, Optimal load forecasting and scheduling strategies for smart homes peer-to-peer energy networks: a comprehensive survey with critical simulation analysis, Results Eng. 22 (2024) 102188, https://doi.org/10.1016/j.rineng.2024.102188.

      [33] A.K. Abdella Ahmed, A.M. Ibraheem, M.K. Abd-Ellah, Forecasting of municipal solid waste multi-classification by using time-series deep learning depending on the living standard, Results Eng. 16 (2022) 100655, https://doi.org/10.1016/j.rineng.2022.100655.

      [34] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econ. 31 (3) (1986) 307-327, https://doi.org/10.1016/0304-4076(86)90063-1.

      [35] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., 1995, https://doi.org/10.1007/978-1-4757-3264-1.

      [36] F.F. Fadoul, A.A. Hassan, R. Çağlar, Integrating autoencoder and decision tree models for enhanced energy consumption forecasting in microgrids: a meteorological data-driven approach in Djibouti, Results Eng. 5 (2024) 5-55, https://doi.org/10.1016/j.rineng.2024.103033.

      [37] M.S. Islam, M. Minul Alam, A. Ahamed, S.I. Ali Meerza, Prediction of diabetes at early stage using interpretable machine learning, in: SoutheastCon 2023, 2023, pp. 261-265, https://doi.org/10.1109/SoutheastCon51012.2023.10115152.

      [38] G. Békési, L. Barancsuk, B. Hartmann, Deep neural network-based distribution system state estimation using hyperparameter optimization, Results Eng. 24 (2024) 102908, https://doi.org/10.1016/j.rineng.2024.102908.

      [39] L. Jiang, X. Wang, W. Li, L. Wang, X. Yin, L. Jia, Hybrid multitask multi-information fusion deep learning for household short-term load forecasting, IEEE Trans. Smart Grid 12 (6) (2021) 5362-5372, https://doi.org/10.1109/TSG.2021.3091469.

      [40] Z. Zhang, Q. Zhang, H. Liang, B. Gorbani, Optimizing electric load forecasting with support vector regression/LSTM optimized by flexible gorilla troops algorithm and neural networks a case study, Sci. Rep. 14 (1) (2024) 22092, https://doi.org/10.1038/s41598-024-73893-9.

      [41] Z. Farrokhi, K.H. Baesmat, E.E. Regentova, Enhancing urban intelligence energy management: innovative load forecasting techniques for electrical networks, J. Power Energy Eng. 12 (2024) 72-88, https://doi.org/10.4236/jpee.2024.1211005.


      Author

      • Kamran Hassanpouri Baesmat

        Kamran Hassanpouri Baesmat received the master's degree from Yazd University, Yazd, in 2016, and the bachelor's degree from Islamic Azad University, Markazi, in 2012. He is working toward the Ph.D. degree at the University of Nevada, Las Vegas, USA. His research interests include artificial intelligence, human-robot interaction (HRI), electric vehicle (EV) and battery pack design, power grids, smart grids, and energy management.

      • Farhad Shokoohi

        Farhad Shokoohi received the Ph.D. and master's degrees in Statistics from Shahid Beheshti University, Tehran, in 2012 and 2006, respectively. He received the bachelor's degree in Statistics from Razi University, Kermanshah, in 2003. He is an Assistant Professor of Statistics at the University of Nevada, Las Vegas, USA. His research interests include statistics, machine learning, high-dimensional data analysis, and statistical genetics and genomics.

      • Zeinab Farrokhi

        Zeinab Farrokhi received the master's degree from the University of Tehran, Iran, in 2017, and the bachelor's degree from Shahid Rajaee University, Iran, year. She is working toward the Ph.D. degree at the University of Nevada, Las Vegas, USA. Her research interests include machine learning, computer vision, deep learning, and image processing, with applications in SWR analysis and human-robot interaction.

      Publish Info

      Published: 2025-06-25

      Reference: Kamran Hassanpouri Baesmat, Farhad Shokoohi, Zeinab Farrokhi (2025). SP-RF-ARIMA: A sparse random forest and ARIMA hybrid model for electric load forecasting. Global Energy Interconnection, 8(3): 486-496.
