{{ generator or "OwlMix EDA Module" }}

{{ header_title or "🦉 OwlMix EDA Report" }}

{{ header_subtitle or "Exploratory Data Analysis for Marketing Mix Modeling" }}

📅 {{ report_date or "Unknown" }}
{% if basic_info %}

📊 Dataset Overview

High-level summary of the dataset structure and composition.

Total Records
{{ basic_info.num_rows | default("N/A") }}
Total Columns
{{ basic_info.num_columns | default("N/A") }}
{% if basic_info.column_names %}
Column Names:
{{ basic_info.column_names | join(", ") }}
{% endif %}
{% if basic_info.data_types %}

📝 Data Types & Missing Values

Column | Data Type | Missing Values
{% for col_name, dtype in basic_info.data_types.items() %}
{% set missing = basic_info.missing_values[col_name] if basic_info.missing_values and col_name in basic_info.missing_values else 0 %}
{{ col_name }} | {{ dtype }} | {{ missing }}{% if missing | int > 0 %} ⚠️{% endif %}
{% endfor %}
{% endif %} {% if basic_info.missing_values %}

🔍 Missing Values Summary

{% for col_name, count in basic_info.missing_values.items() %}
{{ col_name }}
{{ count }} missing
{% endfor %}
{% endif %}
{% if basic_info.summary_stats %}

📈 Summary Statistics

The Summary Statistics table reports descriptive metrics for each numerical variable: count, mean, standard deviation, minimum, quartiles, and maximum. Use it to gauge central tendency, dispersion, and the overall shape of each distribution, and as a first check on data quality.

Statistic{% for col_name in basic_info.column_names %} | {{ col_name }}{% endfor %}
{% set stat_names = basic_info.summary_stats.values() | list | first %}
{% if stat_names %}
{% for stat_name in stat_names.keys() %}
{{ stat_name }}{% for col_name in basic_info.column_names %} | {% if col_name in basic_info.summary_stats and stat_name in basic_info.summary_stats[col_name] %}{% set val = basic_info.summary_stats[col_name][stat_name] %}{% if val is number %}{{ "%.3f"|format(val) }}{% else %}{{ val }}{% endif %}{% else %}-{% endif %}{% endfor %}
{% endfor %}
{% endif %}
{% endif %} {% else %}
⚠️ No basic information available. Ensure report.json contains a 'basic_info' section.
{% endif %} {% if vif %}

📈 Variance Inflation Factor (VIF)

The VIF table lists the Variance Inflation Factor for each feature, a measure of how much the variance of that feature's regression coefficient is inflated by its correlation with the other features. High values flag variables that are redundant or strongly correlated with others. The table complements the chart by giving exact values to support decisions on feature elimination, transformation, or regularization.

Methodology: Variance Inflation Factor (VIF) Analysis

  1. Iterative Regression: Regress each feature variable against all other independent variables in the dataset.
  2. R-Squared Calculation: Determine the coefficient of determination (R²) for each specific regression model.
  3. VIF Computation: Apply the formula VIF = 1 / (1 - R²) to quantify the inflation of variance for each feature.
  4. Multicollinearity Identification: Compare each VIF against a rule-of-thumb threshold; values above 5 are commonly read as a warning sign and values above 10 as severe multicollinearity.
  5. Feature Refinement: Utilize the scores to decide on variable elimination, transformation, or the use of regularization techniques.
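
Steps 1 to 3 above can be sketched in plain Python as follows. This is a minimal illustration, not the OwlMix implementation: the helper names (`solve_linear`, `r_squared`, `vif_table`) and the sample feature names are hypothetical, and the sketch assumes no feature is an exact linear combination of the others.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, predictors):
    """R-squared of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    k = len(cols)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def vif_table(data):
    """Map each feature name to VIF = 1 / (1 - R^2)."""
    result = {}
    for name, values in data.items():
        others = [v for other, v in data.items() if other != name]
        r2 = r_squared(values, others)
        result[name] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return result


# Hypothetical sample data: two strongly correlated spend series.
features = dict(
    tv_spend=[1.0, 2.0, 3.0, 4.0, 5.0],
    radio_spend=[2.0, 4.0, 6.0, 8.0, 11.0],
)
for name, value in vif_table(features).items():
    print(f"{name}: VIF = {value:.1f}")
```

Solving the normal equations directly keeps the sketch dependency-free; for wide or ill-conditioned data a QR- or SVD-based least-squares routine would be the safer choice.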
Feature | Variance Inflation Factor (VIF)
{% for row in range(vif.feature|length) %}
{{ vif.feature[row] }} | {{ vif.vif_value[row] }}
{% endfor %}
{% endif %} {% if acf_pacf %}

📊 ACF & PACF Analysis

The ACF and PACF tables present the numerical values of autocorrelation and partial autocorrelation coefficients for each variable across specified lags. These tabulated values enable precise identification of statistically significant lags and the strength of relationships over time. By complementing the visual insights from the plots, the tables support more detailed analysis and help validate lag selection decisions for modeling and forecasting purposes.

Methodology: ACF & PACF Lag Analysis

  1. Lag Correlation Calculation: Compute the Pearson correlation between the time series and its own lagged versions (ACF).
  2. Partial Influence Extraction: Calculate the PACF to isolate the direct relationship between observations at different lags by removing intermediate effects.
  3. Coefficient Tabulation: Organize the resulting values into a table to provide precise numerical depth beyond the visual plots.
  4. Significance Testing: Compare coefficients against standard error bounds to identify which lags are statistically meaningful.
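  5. Model Order Selection: Use the significant lags to inform ARIMA orders: the lag after which the PACF cuts off suggests the autoregressive order p, and the lag after which the ACF cuts off suggests the moving-average order q.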
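
The calculation behind steps 1, 2, and 4 can be sketched as below. This is an illustrative sketch under stated assumptions, not the module's own code: the function names are hypothetical, the PACF uses the Durbin-Levinson recursion (one of several possible estimators), and significance uses the approximate 95% bound of 1.96 / sqrt(n).

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelation for lags 0..max_lag via Durbin-Levinson."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # phi_prev[j] holds the AR coefficient for lag j + 1
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def significant_lags(coefficients, n_obs):
    """Lags (excluding lag 0) whose coefficient exceeds the ~95% bound."""
    bound = 1.96 / math.sqrt(n_obs)
    return [k for k in range(1, len(coefficients)) if abs(coefficients[k]) > bound]


# Hypothetical sample series.
sales = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0]
acf_values = acf(sales, 3)
pacf_values = pacf(sales, 3)
print("ACF :", [round(v, 3) for v in acf_values])
print("PACF:", [round(v, 3) for v in pacf_values])
print("Significant ACF lags:", significant_lags(acf_values, len(sales)))
```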
{% for item in acf_pacf.data %}

{{ item.column }}

Lag | ACF | PACF
{% for idx in range(item.lags | length) %}
{{ item.lags[idx] }} | {% if item.acf[idx] is number %}{{ "%.3f" | format(item.acf[idx]) }}{% else %}{{ item.acf[idx] }}{% endif %} | {% if item.pacf[idx] is number %}{{ "%.3f" | format(item.pacf[idx]) }}{% else %}{{ item.pacf[idx] }}{% endif %}
{% endfor %}
{% endfor %}
{% endif %} {% if causality_test %}

🧠 Granger Causality Test

The Granger Causality Test table summarizes, for each variable, whether its past values significantly improve prediction of the KPI. Granger causality indicates predictive precedence rather than proof of a causal mechanism. For each variable the table reports whether the effect is significant, the sign of the coefficient (positive or negative), the lag at which the effect is strongest, the corresponding p-value, the causality score, the MAPE, and the number of lags evaluated, which gives context for the robustness of the result.

Methodology: Granger Causality Testing

  1. Stationarity Verification: Ensure all time-series variables are stationary to prevent spurious regression results.
  2. Lag Optimization: Test multiple time lags to determine the point where the past values of one variable most significantly predict another.
  3. Statistical Hypothesis Testing: Perform F-tests to calculate p-values, determining if the predictive power of a feature is statistically significant.
  4. Directional Analysis: Assess the sign (positive or negative) of the relationship to understand how the feature influences the KPI.
  5. Robustness Validation: Report the total lags evaluated and significance levels to confirm the reliability of the causal inference.
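
The F-test in step 3 compares a restricted regression (KPI on its own lags) with an unrestricted one (KPI lags plus feature lags). The sketch below computes only the F-statistic for a single lag order; it is a simplified illustration with hypothetical names and synthetic data, not the procedure this report was generated with. Turning the statistic into a p-value would additionally require the F distribution with (lag, dof) degrees of freedom.

```python
def ols_rss(y, columns):
    """Residual sum of squares of an OLS fit of y on the given columns."""
    n, k = len(y), len(columns)
    # Augmented normal equations: [X'X | X'y]
    m = [
        [sum(columns[i][t] * columns[j][t] for t in range(n)) for j in range(k)]
        + [sum(columns[i][t] * y[t] for t in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (m[i][k] - tail) / m[i][i]
    return sum((y[t] - sum(beta[i] * columns[i][t] for i in range(k))) ** 2
               for t in range(n))


def granger_f_stat(y, x, lag):
    """F-statistic for 'lags of x add predictive power for y' at one lag order."""
    n = len(y)
    target = y[lag:]
    const = [1.0] * (n - lag)
    y_lags = [y[lag - k: n - k] for k in range(1, lag + 1)]
    x_lags = [x[lag - k: n - k] for k in range(1, lag + 1)]
    rss_restricted = ols_rss(target, [const] + y_lags)
    rss_unrestricted = ols_rss(target, [const] + y_lags + x_lags)
    dof = (n - lag) - (2 * lag + 1)  # observations minus fitted parameters
    return ((rss_restricted - rss_unrestricted) / lag) / (rss_unrestricted / dof)


def pseudo_noise(n, seed):
    """Deterministic pseudo-random values in [-0.5, 0.5), for the demo only."""
    values, state = [], seed
    for _ in range(n):
        state = (1103515245 * state + 12345) % 2**31
        values.append(state / 2**31 - 0.5)
    return values


# Synthetic example: the KPI follows the feature with a one-period delay.
x = pseudo_noise(60, 1)
e = pseudo_noise(60, 7)
y = [0.0] + [0.8 * x[t - 1] + 0.1 * e[t] for t in range(1, 60)]
print("F-statistic, x -> y at lag 1:", round(granger_f_stat(y, x, 1), 2))
```

Repeating the calculation over several lag orders and keeping the one with the smallest p-value corresponds to the lag optimization in step 2.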

Error Threshold (MAPE) Used: {% if causality_test.error_threshold is number %}{{ "%.2f"|format(causality_test.error_threshold) }} %{% else %}N/A{% endif %}

Variable | Causal | Coefficient Sign | Best Lag | P-value | Causality Score | MAPE (%) | Number of Lags Tested
{% for result in causality_test.causality_test_results %}
{{ result.variable }} | {% if result.causal %}Yes 👍{% elif result.causal == None %}None{% else %}No 👎{% endif %} | {% if result.coefficient_sign == 'positive' %}Positive ↑{% elif result.coefficient_sign == None %}None{% else %}Negative ↓{% endif %} | {{ result.best_lag }} | {% if result.p_value is number %}{{ "%.5f"|format(result.p_value) }}{% else %}{{ result.p_value }}{% endif %} | {% if result.score is number %}{{ "%.2f"|format(result.score) }}{% else %}{{ result.score }}{% endif %} | {% if result.mape_score is number %}{{ "%.2f"|format(result.mape_score) }}{% else %}{{ result.mape_score }}{% endif %} | {{ result.number_of_lags_tested }}
{% endfor %}
{% endif %} {% if time_comparison %}

📈 Time Comparison

The year-over-year (YoY) table presents the yearly aggregated value of each variable alongside its percentage change from the previous year. This view allows direct comparison of annual performance and growth rates across variables.
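
The percentage change shown for each year can be derived as in the sketch below. It is an assumption-laden illustration: the function name `add_pct_change` and the sample values are hypothetical, and the `_pct_change` key suffix simply mirrors the field this template reads for each column.

```python
def add_pct_change(rows, columns):
    """Return copies of rows, sorted by year, with '<col>_pct_change' added."""
    ordered = sorted(rows, key=lambda r: r["year"])
    out = []
    previous = None
    for row in ordered:
        new_row = dict(row)
        for col in columns:
            key = col + "_pct_change"
            if previous is None or not previous[col]:
                # First year, or a zero base: the change is undefined.
                new_row[key] = None
            else:
                new_row[key] = (row[col] - previous[col]) * 100.0 / previous[col]
        out.append(new_row)
        previous = row
    return out


# Hypothetical yearly aggregates.
yearly = [
    dict(year=2021, sales=100.0),
    dict(year=2022, sales=120.0),
    dict(year=2023, sales=90.0),
]
for row in add_pct_change(yearly, ["sales"]):
    print(row["year"], row["sales"], row["sales_pct_change"])
```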

Year{% for col in time_comparison.columns %} | {{ col }} | {{ col }} % Change{% endfor %}
{% for row in time_comparison.data %}
{{ row.year }}{% for col in time_comparison.columns %} | {{ row[col] }} | {{ row[col + '_pct_change'] }}{% endfor %}
{% endfor %}
{% endif %} {% if corr_matrix %}

🔗 Correlation Matrix

The correlation matrix lists the pairwise correlation coefficient for every pair of variables; the correlation heatmap, where included under Visualizations, encodes the same values as color intensity. Use it to identify strongly correlated pairs, recurring patterns, and potential multicollinearity within the dataset.

Variable{% for col_name in corr_matrix.keys() %} | {{ col_name }}{% endfor %}
{% for row_name, row_values in corr_matrix.items() %}
{{ row_name }}{% for col_name in corr_matrix.keys() %} | {% set corr_val = row_values[col_name] if row_values[col_name] is defined else None %}{% if corr_val is not none and corr_val is number %}{{ "%.3f" | format(corr_val) }}{% else %}—{% endif %}{% endfor %}
{% endfor %}
{% endif %} {% if lag_corr %}

⏱️ Lag Correlation

The Lag Correlation table presents the correlation of the target (KPI) variable with its past values across different lag periods (e.g., T vs. T−1, T−2, T−3, ...). It helps identify temporal dependencies and persistence in the series, supporting informed lag selection for time-series modeling.

Methodology: KPI Lag Correlation Analysis

  1. Lag Generation: Create multiple shifted versions of the target KPI variable for defined intervals (e.g., T−1, T−2, T−3, ...).
  2. Pairwise Calculation: Compute the correlation coefficient between the current KPI value (T) and each of its historical lags.
  3. Persistence Measurement: Evaluate the decay rate of correlation values to determine how long past values continue to influence the present.
  4. Statistical Tabulation: Organize the coefficients into a structured table to allow for precise comparison across different time offsets.
  5. Feature Engineering: Select the most highly correlated lags to be used as predictive inputs in time-series forecasting models.
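
Steps 1 and 2 amount to correlating the KPI with shifted copies of itself, which can be sketched as follows. The function names and sample data are hypothetical, and returning a dictionary keyed by lag is an assumption about the shape of `lag_corr`, inferred from how this template iterates over it.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lag_correlations(kpi, max_lag):
    """Correlation of the KPI at time T with its value at T-1 .. T-max_lag."""
    result = {}
    for lag in range(1, max_lag + 1):
        # kpi[lag:] is the current series; kpi[:-lag] is the same series shifted back.
        result[lag] = pearson(kpi[lag:], kpi[:-lag])
    return result


# Hypothetical weekly KPI values.
kpi = [12.0, 15.0, 14.0, 18.0, 17.0, 21.0, 20.0, 24.0, 23.0, 27.0]
for lag, value in lag_correlations(kpi, 3).items():
    print(f"Lag {lag}: {value:.4f}")
```

Each additional lag drops one observation from the comparison, so correlations at long lags rest on fewer data points and should be read with more caution.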
Lag Period | Correlation
{% for lag_key, corr_val in lag_corr.items() %}
Lag {{ lag_key }} | {% if corr_val is number %}{{ "%.4f" | format(corr_val) }}{% else %}{{ corr_val }}{% endif %}
{% endfor %}
{% endif %} {% if charts %}

📊 Visualizations

Embedded charts and plots for visual data exploration.

{% for chart in charts %}

{{ chart.title }}

{{ chart.description }}

{% set image_src = chart.image_data or chart.correlation_chart or chart.time_series_chart or chart.outliers_chart or chart.lag_correlation_chart %} {% if image_src %} {{ chart.alt_text or chart.title }} {% else %}
No image available for this chart.
{% endif %}
{% endfor %}
{% endif %}