Feature Selection transforms¶
Each header below represents a feature selection transform. These transforms are used in the context of feature_selections.
[[feature_selections]]
input_column = "clean_birthyr"
output_column = "replaced_birthyr"
condition = "case when clean_birthyr is null or clean_birthyr == '' then year - age else clean_birthyr end"
transform = "sql_condition"
There are some additional attributes available for all transforms: checkpoint, override_column_a, override_column_b, set_value_column_a, set_value_column_b.
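As a sketch of how one of these shared attributes might sit alongside a transform's own attributes, the soundex example from this page could be extended with checkpoint. This assumes checkpoint takes a boolean value, which this section does not state. The other four attributes are left out because their value types are not given here.

```toml
[[feature_selections]]
input_column = "namelast_clean"
output_column = "namelast_clean_soundex"
transform = "soundex"
# Assumption: checkpoint is a boolean flag; this section lists the attribute but not its type.
checkpoint = true
```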
bigrams¶
Split the given string column into bigrams.
Attributes:
input_column
- Type: string. Required.
output_column
- Type: string. Required.
no_first_pad
- Type: boolean. Optional. If set to true, don’t prepend a space “ ” to the column before splitting into bigrams. If false or not provided, the space is prepended.
[[feature_selections]]
input_column = "namelast_clean"
output_column = "namelast_clean_bigrams"
transform = "bigrams"
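To turn off the leading-space padding, no_first_pad can be added to the same kind of entry. A minimal sketch based on the example above; the output column name is hypothetical and chosen only for illustration.

```toml
[[feature_selections]]
input_column = "namelast_clean"
# Hypothetical output column name, for illustration only.
output_column = "namelast_clean_bigrams_nopad"
transform = "bigrams"
# Do not prepend " " to namelast_clean before splitting into bigrams.
no_first_pad = true
```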
sql_condition¶
Apply the given SQL.
Attributes:
condition
- Type: string. Required. The SQL condition to apply.
output_column
- Type: string. Required.
[[feature_selections]]
input_column = "clean_birthyr"
output_column = "replaced_birthyr"
condition = "case when clean_birthyr is null or clean_birthyr == '' then year - age else clean_birthyr end"
transform = "sql_condition"
array¶
Combine two input columns into an array output column.
Attributes:
input_columns
- Type: list of strings. Required. The two input columns.
output_column
- Type: string. Required.
[[feature_selections]]
input_columns = ["namelast_clean_bigrams", "namefrst_unstd_bigrams"]
output_column = "namelast_frst_bigrams"
transform = "array"
union¶
Take the set union of two columns that are arrays of strings, returning another array of strings.
Attributes:
input_columns
- Type: list of strings. Required.
output_column
- Type: string. Required.
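A union entry might look like the following sketch. It assumes the two bigram columns named in the bigrams and array examples on this page already exist and are arrays of strings; the output column name is hypothetical.

```toml
[[feature_selections]]
# Both inputs are assumed to be array-of-string columns produced by earlier bigrams transforms.
input_columns = ["namelast_clean_bigrams", "namefrst_unstd_bigrams"]
# Hypothetical output column name.
output_column = "namelast_frst_bigrams_union"
transform = "union"
```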
soundex¶
Compute the soundex encoding of the input column.
Attributes:
input_column
- Type: string. Required.
output_column
- Type: string. Required.
[[feature_selections]]
input_column = "namelast_clean"
output_column = "namelast_clean_soundex"
transform = "soundex"
power¶
Raise the input column to a given power.
Attributes:
input_col
- Type: string. Required.
output_col
- Type: string. Required.
exponent
- Type: int. Required. The power to which to raise the input column.
[[feature_selections]]
input_col = "ncount"
output_col = "ncount2"
transform = "power"
exponent = 2