📊 Dataset Overview
High-level summary of the dataset structure and composition.
📝 Data Types & Missing Values
| Column | Data Type | Missing Values |
|---|---|---|
| {{ col_name }} | {{ dtype }} | {% set missing = basic_info.missing_values[col_name] if basic_info.missing_values and col_name in basic_info.missing_values else 0 %}{{ missing }} {% if missing | int > 0 %}⚠️{% else %}✓{% endif %} |
🔍 Missing Values Summary
{{ count }} missing
📈 Summary Statistics
The Summary Statistics table provides a comprehensive overview of key descriptive metrics for each numerical variable, including count, mean, standard deviation, minimum, quartiles, and maximum values. This table helps in understanding the central tendency, dispersion, and overall distribution of the data, serving as a foundational step for exploratory analysis and data quality assessment.
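As a rough illustration of the metrics listed above, the following sketch computes them for one column using only the Python standard library. The helper name `summarize`, the column name `sales`, the sample values, and the statistic keys (`count`, `mean`, `std`, `25%`, and so on) are assumptions made for the example; the template only requires that `basic_info.summary_stats[col_name][stat_name]` resolves to a value.

```python
import statistics

def summarize(values):
    """Descriptive statistics for one numeric column (hypothetical helper)."""
    # Quartiles with linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }

# Hypothetical data; the column name "sales" is an assumption.
columns = {"sales": [2.0, 4.0, 4.0, 6.0, 9.0]}
summary_stats = {name: summarize(values) for name, values in columns.items()}
print(summary_stats["sales"])
```

A nested dictionary keyed first by column and then by statistic matches the lookup order used in the table row, so missing entries fall through to the `-` placeholder.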
| Statistic | {% for col_name in basic_info.column_names %}{{ col_name }} | {% endfor %}
|---|{% for col_name in basic_info.column_names %}---|{% endfor %}
| {{ stat_name }} | {% for col_name in basic_info.column_names %}{% if col_name in basic_info.summary_stats and stat_name in basic_info.summary_stats[col_name] %} {% set val = basic_info.summary_stats[col_name][stat_name] %} {% if val is number %} {{ "%.3f"|format(val) }} {% else %} {{ val }} {% endif %} {% else %} - {% endif %} | {% endfor %}
📈 Variance Inflation Factor (VIF)
The VIF table provides the exact Variance Inflation Factor values for each feature, offering a precise measure of multicollinearity within the dataset. By examining these values, one can identify variables that may be redundant or highly correlated with others. This detailed representation complements the chart by supporting informed decisions on feature elimination, transformation, or regularization to improve model robustness and reliability.
Methodology: Variance Inflation Factor (VIF) Analysis
- Iterative Regression: Regress each feature variable against all other independent variables in the dataset.
- R-Squared Calculation: Determine the coefficient of determination (R²) for each specific regression model.
- VIF Computation: Apply the formula `VIF = 1 / (1 - R²)` to quantify the inflation of variance for each feature.
- Multicollinearity Identification: Evaluate the resulting values to detect high correlation (typically values > 5 or 10).
- Feature Refinement: Utilize the scores to decide on variable elimination, transformation, or the use of regularization techniques.
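The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the pipeline's actual implementation: the helper names (`solve`, `r_squared`, `vif_table`) and the sample features `a`, `b`, `c` are assumptions, and the regression is fitted with ordinary least squares via the normal equations.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(y, predictors):
    """R² of an OLS fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif_table(features):
    """Map each feature name to VIF = 1 / (1 - R²)."""
    result = {}
    for name, column in features.items():
        others = [v for other, v in features.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(column, others))
    return result

# Hypothetical features: "c" is nearly a copy of "a", "b" is unrelated noise.
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "c": [1.1, 1.9, 3.2, 3.9, 5.1, 5.8],
    "b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
vif = vif_table(features)
for name, value in vif.items():
    print(f"{name}: {value:.2f}")
```

In this toy data the near-duplicate pair `a` and `c` receives a VIF far above the usual thresholds, while two exactly uncorrelated features would each score 1. A perfectly collinear feature would make `1 - R²` zero, so production code would need to guard that division.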
| Feature | Variance Inflation Factor (VIF) |
|---|---|
| {{ vif.feature[row] }} | {{ vif.vif_value[row] }} |
📊 ACF & PACF Analysis
The ACF and PACF tables present the numerical values of autocorrelation and partial autocorrelation coefficients for each variable across specified lags. These tabulated values enable precise identification of statistically significant lags and the strength of relationships over time. By complementing the visual insights from the plots, the tables support more detailed analysis and help validate lag selection decisions for modeling and forecasting purposes.
Methodology: ACF & PACF Lag Analysis
- Lag Correlation Calculation: Compute the Pearson correlation between the time series and its own lagged versions (ACF).
- Partial Influence Extraction: Calculate the PACF to isolate the direct relationship between observations at different lags by removing intermediate effects.
- Coefficient Tabulation: Organize the resulting values into a table to provide precise numerical depth beyond the visual plots.
- Significance Testing: Compare coefficients against standard error bounds to identify which lags are statistically meaningful.
- Model Order Selection: Use the significant PACF lags to suggest the autoregressive order p and the significant ACF lags to suggest the moving-average order q for ARIMA or other forecasting models.
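A minimal sketch of these steps, under stated assumptions: the ACF uses the standard sample autocorrelation estimator (overall mean and full-series variance, which is close to but not identical to a Pearson correlation on the overlapping pairs), the PACF is obtained with the Durbin-Levinson recursion, and the significance band is the usual large-sample approximation of ±1.96/√n. The function names, the sample series, and the column name `demand` are hypothetical.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Partial autocorrelation via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

# Hypothetical series and column name.
series = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2]
max_lag = 4
acf_vals = acf(series, max_lag)
pacf_vals = pacf(series, max_lag)

# Approximate 95% band for a white-noise series of this length.
bound = 1.96 / math.sqrt(len(series))
significant = [k for k in range(1, max_lag + 1) if abs(acf_vals[k]) > bound]

# Shape matching the fields the table row reads (item.lags, item.acf, item.pacf).
item = {"column": "demand", "lags": list(range(max_lag + 1)),
        "acf": acf_vals, "pacf": pacf_vals}
print(item, significant)
```

Keeping the three lists aligned by index is what lets the table row read `item.lags[idx]`, `item.acf[idx]`, and `item.pacf[idx]` together.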
{{ item.column }}
| Lag | ACF | PACF |
|---|---|---|
| {{ item.lags[idx] }} | {% if item.acf[idx] is number %} {{ "%.3f" | format(item.acf[idx]) }} {% else %} {{ item.acf[idx] }} {% endif %} | {% if item.pacf[idx] is number %} {{ "%.3f" | format(item.pacf[idx]) }} {% else %} {{ item.pacf[idx] }} {% endif %} |
🧠 Granger Causality Test
The Granger Causality Test table summarizes the presence and characteristics of causal relationships between variables based on statistical testing. It indicates whether a significant causal effect exists, the direction of the relationship (positive or negative), the optimal lag at which the effect is strongest, and the corresponding p-value for significance. Additionally, it reports the number of lags evaluated, providing context for the robustness of the test results.
Methodology: Granger Causality Testing
- Stationarity Verification: Ensure all time-series variables are stationary to prevent spurious regression results.
- Lag Optimization: Test multiple time lags to determine the point where the past values of one variable most significantly predict another.
- Statistical Hypothesis Testing: Perform F-tests to calculate p-values, determining if the predictive power of a feature is statistically significant.
- Directional Analysis: Assess the sign (positive or negative) of the relationship to understand how the feature influences the KPI.
- Robustness Validation: Report the total lags evaluated and significance levels to confirm the reliability of the causal inference.
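The hypothesis-testing step can be illustrated with a small sketch that computes the F-statistic for one lag order by comparing a restricted model (lags of the target only) with an unrestricted model (lags of the target plus lags of the candidate variable). This is an assumption-laden outline rather than the report's implementation: the helper names and the synthetic data are hypothetical, stationarity is taken for granted, and converting the statistic to a p-value requires the F distribution (for example `scipy.stats.f.sf`), which is left out to keep the example dependency-free. The MAPE threshold and causality score shown in the table are not covered here.

```python
import random

def ols_rss(y, rows):
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan)."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def granger_f(y, x, lag):
    """F-statistic for H0: lags of x add nothing to an AR(lag) model of y."""
    n = len(y)
    target = y[lag:]
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)]
                  for t in range(lag, n)]
    unrestricted = [row + [x[t - i] for i in range(1, lag + 1)]
                    for row, t in zip(restricted, range(lag, n))]
    rss_r = ols_rss(target, restricted)
    rss_u = ols_rss(target, unrestricted)
    df_den = len(target) - (2 * lag + 1)  # observations minus fitted parameters
    return ((rss_r - rss_u) / lag) / (rss_u / df_den)

# Synthetic data (assumption): y follows x with a one-step delay plus noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]

f_x_to_y = granger_f(y, x, lag=1)  # does x help predict y?
f_y_to_x = granger_f(x, y, lag=1)  # does y help predict x?
print(f_x_to_y, f_y_to_x)
```

Running the test in both directions, as in the last lines, is a simple way to see that the statistic is large only where past values of one series actually improve the forecast of the other. Repeating the call over several lag values and keeping the one with the smallest p-value would correspond to the lag-optimization step.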
Error Threshold (MAPE) Used: {{ "%.2f"|format(causality_test.error_threshold) }} %
| Variable | Causal | Coefficient Sign | Best Lag | P-value | Causality Score | MAPE (%) | Number of Lags Tested |
|---|---|---|---|---|---|---|---|
| {{ result.variable }} | {% if result.causal %} Yes 👍 {% elif result.causal == None %} None {% else %} No 👎 {% endif %} | {% if result.coefficient_sign == 'positive' %} Positive ↑ {% elif result.coefficient_sign == None %} None {% else %} Negative ↓ {% endif %} | {{ result.best_lag }} | {% if result.p_value is number %}{{ "%.5f"|format(result.p_value) }}{% else %}{{ result.p_value }}{% endif %} | {% if result.score is number %}{{ "%.2f"|format(result.score) }}{% else %}{{ result.score }}{% endif %} | {% if result.mape_score is number %}{{ "%.2f"|format(result.mape_score) }}{% else %}{{ result.mape_score }}{% endif %} | {{ result.number_of_lags_tested }} |
📈 Time Comparison
The year-over-year (YoY) table presents yearly aggregated values for each variable alongside the percentage change from the previous year. This view allows direct comparison of annual performance and growth rates and supports trend analysis.
| Year | {% for col in time_comparison.columns %}{{ col }} | {{ col }} % Change | {% endfor %}
|---|{% for col in time_comparison.columns %}---|---|{% endfor %}
| {{ row.year }} | {% for col in time_comparison.columns %}{{ row[col] }} | {{ row[col + '_pct_change'] }} | {% endfor %}
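For reference, rows in the shape this table reads (`row.year`, `row[col]`, and `row[col + '_pct_change']`) could be produced along the following lines. The function name `yoy_rows`, the column name `sales`, and the sample totals are assumptions; the first year, and any year whose predecessor is zero, gets `None` because there is no baseline to compare against.

```python
def yoy_rows(yearly_totals, columns):
    """Build one row per year with '<col>' and '<col>_pct_change' keys."""
    rows = []
    previous = None
    for year in sorted(yearly_totals):
        row = {"year": year}
        for col in columns:
            value = yearly_totals[year][col]
            row[col] = value
            if previous is None or previous[col] == 0:
                row[col + "_pct_change"] = None  # no baseline to compare against
            else:
                row[col + "_pct_change"] = (
                    100.0 * (value - previous[col]) / previous[col]
                )
        rows.append(row)
        previous = yearly_totals[year]
    return rows

# Hypothetical yearly aggregates.
totals = {2021: {"sales": 100.0}, 2022: {"sales": 120.0}, 2023: {"sales": 90.0}}
rows = yoy_rows(totals, ["sales"])
for row in rows:
    print(row)
```

The change is always measured against the immediately preceding year in the data, so a gap in the years would need explicit handling before these percentages are read as true year-over-year figures.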
🔗 Correlation Matrix
The correlation matrix lists the pairwise correlation coefficient for every pair of variables, showing the strength and direction of each relationship. It enables quick identification of highly correlated variable pairs, patterns, and potential multicollinearity within the dataset.
| Variable | {% for col_name in corr_matrix.keys() %}{{ col_name }} | {% endfor %}
|---|{% for col_name in corr_matrix.keys() %}---|{% endfor %}
| {{ row_name }} | {% for col_name in corr_matrix.keys() %} {% set corr_val = row_values[col_name] if row_values[col_name] is defined else None %}{% if corr_val is not none and corr_val is number %} {{ "%.3f" | format(corr_val) }} {% else %} — {% endif %} | {% endfor %}
⏱️ Lag Correlation
The Lag Correlation table presents the correlation of the target (KPI) variable with its past values across different lag periods (e.g., T vs. T−1, T−2, T−3, ...). It helps identify temporal dependencies and persistence in the series, supporting informed lag selection for time-series modeling.
Methodology: KPI Lag Correlation Analysis
- Lag Generation: Create multiple shifted versions of the target KPI variable for defined intervals (e.g., T−1, T−2, T−3, ...).
- Pairwise Calculation: Compute the correlation coefficient between the current KPI value (T) and each of its historical lags.
- Persistence Measurement: Evaluate the decay rate of correlation values to determine how long past values continue to influence the present.
- Statistical Tabulation: Organize the coefficients into a structured table to allow for precise comparison across different time offsets.
- Feature Engineering: Select the most highly correlated lags to be used as predictive inputs in time-series forecasting models.
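The lag generation and pairwise calculation steps can be sketched as below, assuming the Pearson coefficient is the correlation measure (the text above does not name one). The helper names and the sample KPI series are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def lag_correlations(kpi, max_lag):
    """Correlate the KPI at time T with its value at T-1, T-2, ..., T-max_lag."""
    # kpi[lag:] holds the current values, kpi[:-lag] the values `lag` steps earlier.
    return {lag: pearson(kpi[lag:], kpi[:-lag]) for lag in range(1, max_lag + 1)}

# Hypothetical KPI that alternates each period.
kpi = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
correlations = lag_correlations(kpi, max_lag=3)

# Rank lags by absolute correlation as candidates for lagged features.
ranked = sorted(correlations, key=lambda lag: abs(correlations[lag]), reverse=True)
print(correlations, ranked)
```

Each additional lag shortens the overlapping window by one observation, so coefficients at long lags rest on fewer points and should be read with more caution.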
| Lag Period | Correlation |
|---|---|
| Lag {{ lag_key }} | {% if corr_val is number %} {{ "%.4f" | format(corr_val) }} {% else %} {{ corr_val }} {% endif %} |
📊 Visualizations
Embedded charts and plots for visual data exploration.
{{ chart.title }}
{{ chart.description }}
{% set image_src = chart.image_data or chart.correlation_chart or chart.time_series_chart or chart.outliers_chart or chart.lag_correlation_chart %} {% if image_src %}