dml.CamlDML
core.dml.CamlDML(self, df, uuid, y, t, X=None, model_y=HistGradientBoostingRegressor(max_depth=3, max_iter=500), model_t=HistGradientBoostingClassifier(max_depth=3, max_iter=500), discrete_treatment=True, discrete_outcome=False, spark=None)
The CamlDML class represents a Double Machine Learning (DML) implementation for estimating… average treatment effects (ATE), conditional average treatment effects (CATE), group average treatment effects (GATE), etc.
This class… TODO
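The idea behind Double Machine Learning can be illustrated without any library: residualize the outcome and the treatment on the covariates with nuisance models, then regress the outcome residuals on the treatment residuals. The sketch below is a minimal illustration under stated assumptions, not how CamlDML or EconML is implemented. It assumes a continuous treatment, a single covariate, ordinary least squares in place of the gradient-boosting nuisance models, a two-fold cross-fit, and synthetic data. The helper names `fit_line` and `dml_ate` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def dml_ate(x, t, y):
    """Two-fold cross-fitted partialling-out estimate of a constant effect."""
    n = len(x)
    folds = [list(range(0, n, 2)), list(range(1, n, 2))]
    res_t, res_y = [], []
    for k, test_idx in enumerate(folds):
        train_idx = folds[1 - k]
        # Nuisance models are fitted on the other fold only (cross-fitting).
        a_t, b_t = fit_line([x[i] for i in train_idx], [t[i] for i in train_idx])
        a_y, b_y = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            res_t.append(t[i] - (a_t + b_t * x[i]))
            res_y.append(y[i] - (a_y + b_y * x[i]))
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return sum(rt * ry for rt, ry in zip(res_t, res_y)) / sum(rt * rt for rt in res_t)


# Synthetic data: x confounds both t and y; the true effect of t on y is 2.0.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
t = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * ti + 1.5 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]

naive = fit_line(t, y)[1]   # ignores the confounder, so it is biased upward
effect = dml_ate(x, t, y)   # should land close to the true effect of 2.0
```

In this synthetic setup the naive slope absorbs part of the confounder's influence, while the residual-on-residual estimate removes it.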
Parameters
Name | Type | Description | Default |
---|---|---|---|
df | pandas.DataFrame \| polars.DataFrame \| pyspark.sql.DataFrame \| Table | The input DataFrame representing the data for the EchoCATE instance. | required |
uuid | str | The column name of the universal identifier code (e.g., ehhn). | required |
y | str | The column name of the outcome variable. | required |
t | str | The column name(s) of the treatment variable(s). | required |
X | str \| List[str] \| None | A single feature name (str) or a list of feature names representing the custom feature set. Defaults to None. | None |
model_y | RegressorMixin \| ClassifierMixin | The nuisance model used to predict the outcome. Defaults to HistGradientBoostingRegressor. | HistGradientBoostingRegressor(max_depth=3, max_iter=500) |
model_t | RegressorMixin \| ClassifierMixin | The nuisance model used to predict the treatment. Defaults to HistGradientBoostingClassifier. | HistGradientBoostingClassifier(max_depth=3, max_iter=500) |
discrete_treatment | bool | Whether the treatment is discrete (True) or continuous (False). Defaults to True. | True |
spark | SparkSession \| None | The SparkSession object used for connecting to Ibis when df is a pyspark.sql.DataFrame. Defaults to None. | None |
Attributes
Name | Type | Description |
---|---|---|
df | pandas.DataFrame \| polars.DataFrame \| pyspark.sql.DataFrame \| Table | The input DataFrame representing the data for the EchoCATE instance. |
uuid | str | The column name of the universal identifier code (e.g., ehhn). |
y | str | The column name of the outcome variable. |
t | str | The column name(s) of the treatment variable(s). |
X | List[str] \| str \| None | A single feature name (str) or a list/tuple of feature names representing the custom feature set. |
model_y | RegressorMixin \| ClassifierMixin | The nuisance model used to predict the outcome. |
model_t | RegressorMixin \| ClassifierMixin | The nuisance model used to predict the treatment. |
discrete_treatment | bool | A boolean indicating whether the treatment is discrete or continuous. |
spark | SparkSession | The SparkSession object used for connecting to Ibis when df is a pyspark.sql.DataFrame. |
_ibis_connection | ibis.client.Client | The Ibis client object representing the backend connection to Ibis. |
_ibis_df | Table | The Ibis table expression representing the DataFrame connected to Ibis. |
_table_name | str | The name of the temporary table/view created for the DataFrame in Ibis. |
_Y | Table | The outcome variable data as an Ibis table. |
_T | Table | The treatment variable data as an Ibis table. |
_X | Table | The feature set data as an Ibis table. |
_estimator | CausalForestDML | The fitted EconML estimator object. |
Methods
Name | Description |
---|---|
fit | Fits the econometric model to learn the CATE function. |
optimize | Optimizes a household's treatment based on CATE predictions. Only applicable when the vector of treatments includes more than one mutually exclusive treatment. |
predict | Predicts the CATE given a feature set. |
rank | Ranks households by estimated CATE, highest first. |
summarize | Provides a population summary of treatment effects, including Average Treatment Effects (ATEs) and Conditional Average Treatment Effects (CATEs). |
fit
core.dml.CamlDML.fit(estimator='CausalForestDML', return_estimator=False, **kwargs)
Fits the econometric model to learn the CATE function.
Sets the _Y, _T, and _X internal attributes to the data of the outcome, treatment, and feature set, respectively. Additionally, sets the _estimator internal attribute to the fitted EconML estimator object.
Parameters
Name | Type | Description | Default |
---|---|---|---|
estimator | str | The estimator to use for fitting the CATE function. Currently, only 'CausalForestDML' is available. Defaults to 'CausalForestDML'. | 'CausalForestDML' |
return_estimator | bool | Set to True to receive the estimator object back after fitting. Defaults to False. | False |
**kwargs | | Additional keyword arguments to pass to the EconML estimator. | {} |
Returns
Type | Description |
---|---|
econml.dml.causal_forest.CausalForestDML | The fitted EconML CausalForestDML estimator object if return_estimator is True. |
optimize
core.dml.CamlDML.optimize()
Optimizes a household's treatment based on CATE predictions. Only applicable when the vector of treatments includes more than one mutually exclusive treatment.
Returns
Type | Description |
---|---|
None |
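The reference does not spell out the selection rule, so the following is only a sketch of one plausible approach: for each household, pick the mutually exclusive treatment with the largest estimated CATE, and fall back to no treatment when no estimate is positive. The helper `best_treatment`, the `"control"` label, the treatment names, and the prediction values are all hypothetical and are not part of CamlDML.

```python
def best_treatment(cate_by_treatment, control="control"):
    """Return the treatment with the largest positive estimated CATE, else control."""
    name, effect = max(cate_by_treatment.items(), key=lambda kv: kv[1])
    return name if effect > 0 else control


# Hypothetical CATE predictions per household, one entry per treatment arm.
predictions = {
    "hh_1": {"coupon": 1.2, "email": 0.4},
    "hh_2": {"coupon": -0.3, "email": 0.1},
    "hh_3": {"coupon": -0.5, "email": -0.2},
}

# Map each household to its preferred arm under the rule above.
assignment = {hh: best_treatment(cates) for hh, cates in predictions.items()}
```

With a single treatment this rule reduces to "treat if the CATE is positive", which is consistent with the method applying only when more than one treatment is present.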
predict
core.dml.CamlDML.predict(out_of_sample_df=None, ci=90, return_predictions=False, append_predictions=False)
Predicts the CATE given a feature set.
Returns
Type | Description |
---|---|
tuple | A tuple containing the predicted CATE, standard errors, lower bound, and upper bound if return_predictions is True. |
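To make the relationship between the returned values concrete, the sketch below shows how a lower and upper bound could be derived from a point estimate and its standard error at `ci=90`. It assumes a two-sided normal-approximation interval. The intervals CamlDML returns come from the underlying EconML estimator and may be computed differently. The helper `normal_interval` and the numbers are hypothetical.

```python
from statistics import NormalDist


def normal_interval(point, std_err, ci=90):
    """Two-sided normal-approximation interval at the given confidence level (percent)."""
    alpha = 1 - ci / 100
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for ci=90
    return point - z * std_err, point + z * std_err


# Hypothetical CATE estimate of 0.50 with a standard error of 0.10.
lower, upper = normal_interval(point=0.50, std_err=0.10, ci=90)
```

A larger `ci` widens the interval, and an interval that excludes zero suggests the estimated effect for that unit is distinguishable from no effect at the chosen level.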
rank
core.dml.CamlDML.rank()
Ranks households by estimated CATE, highest first.
Returns
Type | Description |
---|---|
None |
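As a rough illustration of ranking by estimated CATE, the sketch below sorts households from highest to lowest effect and attaches a 1-based rank. It is an assumption about the general idea, not the method's actual implementation. The helper `rank_by_cate` and the household identifiers and values are hypothetical.

```python
def rank_by_cate(cates):
    """Return (household_id, cate, rank) tuples sorted from highest to lowest CATE."""
    ordered = sorted(cates.items(), key=lambda kv: kv[1], reverse=True)
    return [(hh, cate, rank) for rank, (hh, cate) in enumerate(ordered, start=1)]


# Hypothetical CATE predictions keyed by household identifier.
cates = {"hh_1": 0.2, "hh_2": 1.1, "hh_3": -0.4}
ranking = rank_by_cate(cates)
```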
summarize
core.dml.CamlDML.summarize()
Provides a population summary of treatment effects, including Average Treatment Effects (ATEs) and Conditional Average Treatment Effects (CATEs).
Returns
Type | Description |
---|---|
econml.utilities.Summary | Population summary of the results. |
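The quantities named in this reference are related by averaging: one common way to estimate an ATE is to average the unit-level CATE predictions over the population, and a GATE is the same average restricted to a group. The sketch below illustrates that relationship only. It does not reproduce the EconML summary object, and the helper `average_effects`, the group labels, and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_effects(rows):
    """rows: list of (group, cate) pairs. Returns (ate, {group: gate})."""
    by_group = defaultdict(list)
    for group, cate in rows:
        by_group[group].append(cate)
    ate = mean(cate for _, cate in rows)                      # average over everyone
    gates = {group: mean(values) for group, values in by_group.items()}  # per group
    return ate, gates


# Hypothetical CATE predictions tagged with a grouping variable.
rows = [("urban", 1.0), ("urban", 3.0), ("rural", 0.5), ("rural", 1.5)]
ate, gates = average_effects(rows)
```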