Metadata-Version: 2.1
Name: stlflib
Version: 1.1.0
Summary: Short-Term Forecasting of Regional Electrical Load Based on CatBoost Model
Home-page: https://github.com/caapel/ForecastPowerEnergy
Author: caapel
Author-email: caapel@mail.ru
License: KSPEU License
Project-URL: GitHub, https://github.com/caapel/ForecastPowerEnergy
Keywords: STLF
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: shap==0.47.0
Requires-Dist: ephem
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: tqdm
Requires-Dist: numpy==1.25.2
Requires-Dist: pandas
Requires-Dist: seaborn
Requires-Dist: matplotlib
Requires-Dist: catboost
Requires-Dist: graphviz
Requires-Dist: scikit-learn
Requires-Dist: ipywidgets
Requires-Dist: openpyxl==3.1.2

# Short-Term Load Forecasting Based on CatBoost Model Library (STLFLib)

STLFLib is a Python machine-learning library for short-term load forecasting (STLF), designed for generating energy consumption bids for the DAM (day-ahead market). 
The library is distributed under the KSPEU license ([RU 2025688100](https://new.fips.ru/registers-doc-view/fips_servlet?DB=EVM&DocNumber=2025688100&TypeFile=html)). For commercial use, please contact the author: caapel@mail.ru.

----------

## How to install ##
To install, you can use the command:

    pip install stlflib

Alternatively, download the repository from [GitHub](https://github.com/caapel/ForecastPowerEnergy) (private access).

----------

## Usage ##
The approach behind this project and its library is described in detail in the study [***Short-Term Forecasting of Regional Electrical Load Based on XGBoost Model***](https://doi.org/10.3390/en18195144).
> This file does not contain detailed usage instructions for the library; 
> it only briefly describes each of the basic library modules.

### Dependency ###
The **dependency** module (located in the `dependency.py` file) contains a complete list of dependencies.
The module has only one function:
- *print_dependency()* - prints the versions of installed dependencies

and five global variables (for remote control of GUI program settings):

- *GUI_model* - name of the current working model
- *GUI_max_depth_br3_act* - default max_depth for the br3 model
- *GUI_learn_period_br3_act* - default learn_period for the br3 model
- *GUI_max_depth_br2_act* - default max_depth for the br2 model
- *GUI_learn_period_br2_act* - default learn_period for the br2 model
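
For illustration, here is a minimal sketch of what a version-reporting helper like *print_dependency()* might look like. This is a hypothetical re-implementation, not the library's code; the package names in the default tuple are examples taken from the dependency list above.

```python
from importlib.metadata import version, PackageNotFoundError

def print_dependency(packages=("numpy", "pandas", "catboost", "shap")):
    """Print the installed version of each dependency (sketch)."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            versions[pkg] = "not installed"
        print(f"{pkg}: {versions[pkg]}")
    return versions
```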

### ServiceDB ### 
The **serviceDB** module (located in the `serviceDB.py` file) contains a set of tools for working with the database:
<br>-------------------------------create---------------------------------<br>
- *generate_volume_df(path)* - generate a dataframe with archived energy consumption data from prepared .xls-files located at `path`
- *get_weather(date, verify_HTTPS_request=False, verify_result)* - generate a weather archive/forecast (outside air temperature) for the specified date with a sampling frequency of 1 hour
- *get_br_feature(date)* - load a BEM (Balancing Energy Market) archive/forecast for the specified date
- *get_RSV_rate(date)* - load the unregulated DAM price for the specified date (per month)
- *updating_or_create_df(get_function, filename, start=datetime(2013, 1, 1).date())* - create a new (from the specified date) or replenish an existing database (filename.xlsx) with missing data up to the end of the previous month, returning the resulting dataframe
- *merge_and_export_DB(total_volume_df, df_weather, df_br_feature, path='', filename='DataBase.xlsx', update=True)* - merge dataframes total_volume_df (`Volume.xlsx`), df_weather (`Weather.xlsx`), and df_br_feature (`br_feature.xlsx`) by the 'Date' column into one common database (by default, `DataBase.xlsx`)
- *database_volume_update(total_volume_df, path='', filename='DataBase.csv')* - updates only the 'Volume' column in the selected database (`DataBase.xlsx` by default); a simplified version of the `merge_and_export_DB` function
<br>-------------------------------service--------------------------------<br>
- *get_empty_daily_df(date)* - creates an empty dataframe (25 rows: from 0:00 to 24:00) for the specified date (for full temperature interpolation)
- *add_date_scalar(df)* - adds additional categorical features to the dataframe: Day, Month, Year, WeekDay
- *is_check_DataBase(df)* - checks database integrity
- *act_pred_reverse(df_br_feature)* - replaces missing actual (Act) consumption and BR generation values with planned (Pred) values. Used to generate a forecast for the current day, when the actual `ActCons` and `ActGen` values are not yet available for the entire day
- *get_files_from_path(path='_raw_Data_TatEnergosbyt', remove=True, jupyter_widget=True)* - retrieves operational data from the directory (`/_raw_Data_TatEnergosbyt` by default)
- *update_DataBase(total_oper_df, filename='DataBase.csv', common_check=True)* - updates the database by adding operational data from `total_oper_df`
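
The typical create-or-update flow can be sketched as follows. This is a heavily simplified, hypothetical version of *updating_or_create_df()*: it uses CSV instead of .xlsx for simplicity, and `fetch(d)` stands in for a data-loading callback such as `get_weather`.

```python
from datetime import date, timedelta
import os
import pandas as pd

def updating_or_create_df(fetch, filename, start=date(2013, 1, 1), end=None):
    """Create the file from `start`, or append only the missing dates.

    `fetch(d)` is a hypothetical callback returning a dict of column
    values for day `d` (e.g. a weather or BEM loader)."""
    end = end or date.today()
    if os.path.exists(filename):
        df = pd.read_csv(filename, parse_dates=["Date"])
        start = df["Date"].max().date() + timedelta(days=1)  # resume after last stored day
    else:
        df = pd.DataFrame()
    rows, d = [], start
    while d <= end:
        rows.append({"Date": d, **fetch(d)})
        d += timedelta(days=1)
    df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
    df.to_csv(filename, index=False)
    return df
```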

### Preprocessing ###
The **preprocessing** module (located in the `preprocessing.py` file) contains data preprocessing tools for subsequent transfer of this data to the **core** functions (CatBoostRegressor):
- *get_type_day(df)* - encoding the day type (`TypeDay`) based on the `df.Date` column in DateTime format. The encoding is based on the industrial calendar of the Republic of Tatarstan
- *get_light(df)* - encoding the light interval (`Light`) based on the `df.Date` column in DateTime format. The encoding is based on the geographic location of the city of Kazan
- *get_season(df)* - encoding seasonality based on the `df.Date` column in DateTime format
- *prepareData(df, lag_start=1, lag_end=7)* - data preprocessing function. Preprocessing includes: adding day type, light interval, seasonality, and energy consumption lag (default 1...7 days)
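
The lag-feature step can be sketched like this. Note that this is a hypothetical re-implementation of the idea behind *prepareData()* (calendar features plus daily consumption lags on an hourly series), not the library's actual code; the column names `Date` and `Volume` follow the database description above.

```python
import pandas as pd

def prepare_data(df, lag_start=1, lag_end=7):
    """Add calendar features and daily lags of hourly consumption (sketch)."""
    df = df.copy()
    df["WeekDay"] = df["Date"].dt.weekday
    df["Month"] = df["Date"].dt.month
    for lag in range(lag_start, lag_end + 1):
        # 24 hourly rows per day, so a lag of `lag` days is a shift of 24*lag rows
        df[f"lag_{lag}"] = df["Volume"].shift(24 * lag)
    # rows without a full set of lags cannot be used for training
    return df.dropna().reset_index(drop=True)
```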

### Core ###
The **core** module (located in the `core.py` file) is the main class in the library. It is based on `CatBoostRegressor` and has a number of functions:
- *predict_volume(df_general, df_predict, max_depth, learn_period)* - model training and energy consumption forecasting
- *get_df_predicted(df_general, max_depth, learn_period, model, date_start, date_end, jupyter_widget=True, disable_widget=False)* - generates a data frame with the predicted energy consumption volume for the specified planning horizon
- *date_str_format(df_predicted)* - generates a date string for the exported xlsx file
- *get_DAM_order(df_general, max_depth, learn_period, model, date_start, date_end, jupyter_widget=True, path)* - generates a DAM order and exports it to xlsx format
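
To give an idea of the call shape, here is a heavily simplified stand-in for *predict_volume()*: a naive mean-by-(weekday, hour) baseline replaces `CatBoostRegressor` so the sketch stays self-contained. This is hypothetical illustration code, not the library's model.

```python
import pandas as pd

def predict_volume(df_general, df_predict, learn_period=90):
    """Train on the last `learn_period` days, then predict df_predict.

    A mean-by-(weekday, hour) baseline stands in for CatBoostRegressor."""
    train = df_general.tail(learn_period * 24)  # hourly data: 24 rows per day
    baseline = train.groupby(
        [train["Date"].dt.weekday, train["Date"].dt.hour]
    )["Volume"].mean()
    out = df_predict.copy()
    out["PredVolume"] = [baseline[(d.weekday(), d.hour)] for d in out["Date"]]
    return out
```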

### Validating ###
The **validating** helper module (located in the `validating.py` file) is designed to validate the **core** functions:
- *get_df_val_predicted(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end, jupyter_widget=True, disable_widget=False)* - function for generating a dataframe with predicted energy consumption volumes for the specified planning horizon, adapted for validation calculations (simulating the absence of 'ActCons' and 'ActGen' data after 7 AM, offline access to the weather forecast and BEM data)
- *get_df_validate(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end, logging=True, jupyter_widget=True)* - function for validating the model for the specified time interval. Returns a validation dataframe with predicted values.
- *get_df_validate_with_loss(df_validate_result, df_RSV_vs_BR_rate)* - adds a 'loss' column with BEM losses to the resulting dataframe.
- *draw_diff_predict_vs_fact(df_validate_result, fontsize=12, font='Palatino Linotype')* - outputs validation results in table and graph form (Matplotlib object).
- *Grid_Search(df_general, df_general_date_index, max_depth_grid, learn_period_grid, model, date_start, date_end, jupyter_widget=True)* - grid search for optimal training period and tree depth.
- *search_result_highlighting(df_search_result)* - highlights the search results of the Grid_Search() function.
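
The shape of the grid search can be sketched as follows. This is a hypothetical simplification: `score(max_depth, learn_period)` stands in for a full validation run (e.g. one returning a MAPE), whereas the library's *Grid_Search()* takes the dataframes and date range directly.

```python
from itertools import product
import pandas as pd

def grid_search(score, max_depth_grid, learn_period_grid):
    """Score every (max_depth, learn_period) pair and rank by error (sketch)."""
    rows = [
        {"max_depth": d, "learn_period": p, "MAPE": score(d, p)}
        for d, p in product(max_depth_grid, learn_period_grid)
    ]
    # best (lowest-error) combination first
    return pd.DataFrame(rows).sort_values("MAPE").reset_index(drop=True)
```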

### EDA ###
The **EDA** graphics module (located in the `EDA.py` file) is designed to display the results of exploratory data analysis and includes the following functions:
<br>--------------------------------utility----------------------------------<br>
- *save_fig_to_pdf(plt, path, filename)* - exports the rendered figure to PDF format
- *rus_features(df, order='columns')* - russification of dataframes for charts and graphs
- *cat_codes_data(df_general, model, translation)* - russification of dataframes for charts and graphs
- *build_and_fit_cat_selector(df_general_cat, max_depth)* - creating and training a CatBoost feature selector
- *get_shap_values(selector, df_general_cat)* - computes SHAP values for the CatBoost regressor using SHAP.Explainer
<br>------------------------------visualisation------------------------------<br>
- *draw_learning_curve(df_general, max_depth, model, fontsize, font, save_fig_dir)* - calculates and plots the learning curve
- *graf_full_consumption(df, groupby_months, fontsize, font, save_fig_dir)* - plots year-by-year energy consumption (with or without grouping by month)
- *graf_year_consumption(df, year, groupby_days, fontsize, font, save_fig_dir)* - plots annual energy consumption traced by month (with or without grouping by days)
- *graf_temp_cons_corr(df, year, fontsize, font, save_fig_dir)* - plots average daily electricity consumption against average daily temperature for a given year
- *graf_month_consumption(df, month, year, fontsize, font, save_fig_dir)* - plots monthly energy consumption traced by day of the week
- *draw_features_correlation_heatmaps(df_general, season, fontsize, font, translation, save_fig_dir)* - plots a heat map of Pearson correlations between the target and source features
- *draw_hour_of_day_correlation_heatmaps(df_general, fontsize, font, translation, save_fig_dir)* - plots a heat map of hourly Pearson correlations for the original features
- *draw_bar_CatBoost_feature_importances(selector, df_general_cat, fontsize, font, translation, save_fig_dir)* - plots the feature importance diagram obtained from the built-in selector of the CatBoost regressor
- *draw_bar_SHAP_feature_importances(shap_values, fontsize, font, translation, save_fig_dir)* - plots the feature importance diagram obtained with SHAP.Explainer
- *plot_and_save_shap_scatter(feature_name, df_general_cat, shap_values, interaction_index, fontsize, font, translation, save_fig_dir, filename)* - plots shap.dependence_plot for the selected feature
- *draw_example_decision_tree(df_general_cat, max_depth, tree_idx)* - renders an example decision tree of the CatBoost regressor
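
The data behind a correlation heat map can be computed as below. This is a hypothetical sketch of the computation underlying a function like *draw_features_correlation_heatmaps()*; the seaborn/matplotlib rendering step is deliberately omitted, and the column names are examples.

```python
import pandas as pd

def features_correlation(df, features):
    """Pearson correlation matrix for the selected features (sketch).

    The EDA module would render this matrix as a heat map."""
    return df[features].corr(method="pearson")
```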
