aws_ddk_core.stages.GlueTransformStage

class aws_ddk_core.stages.GlueTransformStage(*args: Any, **kwargs)

Class that represents a Glue Transform DDK DataStage.

__init__(scope: constructs.Construct, id: str, environment_id: str, job_name: typing.Optional[str] = None, executable: typing.Optional[aws_cdk.aws_glue_alpha.JobExecutable] = None, job_role: typing.Optional[aws_cdk.aws_iam.IRole] = None, crawler_name: typing.Optional[str] = None, database_name: typing.Optional[str] = None, crawler_role: typing.Optional[aws_cdk.aws_iam.IRole] = None, targets: typing.Optional[aws_cdk.aws_glue.CfnCrawler.TargetsProperty] = None, job_args: typing.Optional[typing.Dict[str, typing.Any]] = None, glue_job_args: typing.Optional[typing.Dict[str, typing.Any]] = {}, glue_crawler_args: typing.Optional[typing.Dict[str, typing.Any]] = {}, crawler_allow_failure: typing.Optional[bool] = True, state_machine_input: typing.Optional[typing.Dict[str, typing.Any]] = None, additional_role_policy_statements: typing.Optional[typing.List[aws_cdk.aws_iam.PolicyStatement]] = None, state_machine_retry_max_attempts: typing.Optional[int] = 3, state_machine_retry_backoff_rate: typing.Optional[int] = 2, state_machine_retry_interval: typing.Optional[aws_cdk.Duration] = <aws_cdk.Duration object>, state_machine_failed_executions_alarm_threshold: typing.Optional[int] = 1, state_machine_failed_executions_alarm_evaluation_periods: typing.Optional[int] = 1, state_machine_args: typing.Optional[typing.Dict[str, typing.Any]] = {}, alarms_enabled: typing.Optional[bool] = True) None

DDK Glue Transform stage.

Stage that contains a Step Functions state machine that runs a Glue job and then a Glue crawler. If the Glue job name or crawler name is not supplied, the corresponding resource is created.
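As a rough illustration of the description above, the following sketch shows how the stage might be declared when a Glue job and crawler already exist, so that neither is created. It uses only parameter names from the signature on this page. The helper names (`glue_stage_kwargs`, `build_stage`), the resource names, and the S3 path are hypothetical, and the import of `GlueTransformStage` is deferred so the keyword-argument helper can be inspected without `aws-ddk-core` installed.

```python
from typing import Any, Dict


def glue_stage_kwargs(environment_id: str, job_name: str, crawler_name: str) -> Dict[str, Any]:
    """Collect keyword arguments for a stage that reuses an existing job and crawler."""
    return {
        "environment_id": environment_id,
        "job_name": job_name,          # preexisting Glue job, so no job is created
        "crawler_name": crawler_name,  # preexisting crawler, so no crawler is created
        # Hypothetical job argument; replace with whatever the Glue script expects.
        "job_args": {"--S3_SOURCE_PATH": "s3://example-bucket/raw/"},
        "state_machine_retry_max_attempts": 3,
        "alarms_enabled": True,
    }


def build_stage(scope: Any, stage_id: str = "glue-transform") -> Any:
    """Create the stage inside an existing CDK scope (for example, a stack)."""
    # Imported lazily: this call needs aws-ddk-core and aws-cdk-lib installed.
    from aws_ddk_core.stages import GlueTransformStage

    kwargs = glue_stage_kwargs("dev", "example-job", "example-crawler")
    return GlueTransformStage(scope, id=stage_id, **kwargs)
```

Calling `build_stage(stack)` from within a CDK stack would be the usual entry point. To have the job and crawler created instead, the `job_name` and `crawler_name` entries would be dropped and parameters such as `executable`, `database_name`, and `targets` supplied.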

Parameters
  • scope (Construct) – Scope within which this construct is defined

  • id (str) – Identifier of the stage

  • environment_id (str) – Identifier of the environment

  • job_name (Optional[str]) – The name of a preexisting Glue job to run. If None, a Glue job is created

  • executable (Optional[JobExecutable]) – The job executable properties

  • job_role (Optional[IRole]) – The job execution role

  • crawler_name (Optional[str]) – The name of a preexisting Glue crawler to run. If None, a Glue crawler is created

  • database_name (Optional[str]) – The name of the database in which the crawler’s output is stored

  • crawler_role (Optional[IRole]) – The crawler execution role

  • targets (Optional[TargetsProperty]) – A collection of targets to crawl

  • job_args (Optional[Dict[str, Any]]) – The input arguments to the Glue job

  • glue_job_args (Optional[Dict[str, Any]]) – Additional Glue job properties. For the complete list of properties, refer to the CDK documentation for the Glue Job construct: https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_glue_alpha/Job.html

  • glue_crawler_args (Optional[Dict[str, Any]]) – Additional arguments to pass to CDK L1 Construct: CfnCrawler. See: https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_glue/CfnCrawler.html

  • crawler_allow_failure (Optional[bool]) – Whether the state machine should still succeed when the crawler fails or raises an exception such as Glue.CrawlerRunningException. Defaults to True

  • state_machine_input (Optional[Dict[str, Any]]) – The input dict to the state machine

  • additional_role_policy_statements (Optional[List[PolicyStatement]]) – Additional IAM policy statements to add to the state machine role

  • state_machine_retry_max_attempts (Optional[int]) – How many times to retry this particular error. Defaults to 3

  • state_machine_retry_backoff_rate (Optional[int]) – Multiplier by which the wait interval grows on every retry. Defaults to 2

  • state_machine_retry_interval (Optional[cdk.Duration]) – How many seconds to wait initially before retrying. Defaults to cdk.Duration.seconds(1)

  • state_machine_failed_executions_alarm_threshold (Optional[int]) – The number of failed state machine executions before the CloudWatch alarm is triggered. Defaults to 1

  • state_machine_failed_executions_alarm_evaluation_periods (Optional[int]) – The number of periods over which data is compared to the specified threshold. Defaults to 1

  • state_machine_args (Optional[Dict[str, Any]]) – Additional arguments to pass to State Machine creation. See: https://awslabs.github.io/aws-ddk/release/latest/api/core/stubs/aws_ddk_core.pipelines.StateMachineStage.html#aws_ddk_core.pipelines.StateMachineStage.build_state_machine

  • alarms_enabled (Optional[bool]) – Enable or disable all alarms in the stage. Defaults to True
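The three retry parameters above interact as in a Step Functions retry policy: the first retry waits for the initial interval, and each later wait is multiplied by the backoff rate. The sketch below computes that schedule in plain Python. The function name `retry_wait_schedule` is hypothetical, and the arguments stand in for `state_machine_retry_interval` (in seconds), `state_machine_retry_backoff_rate`, and `state_machine_retry_max_attempts`.

```python
from typing import List


def retry_wait_schedule(
    interval_seconds: float = 1.0,
    backoff_rate: float = 2.0,
    max_attempts: int = 3,
) -> List[float]:
    """Return the wait, in seconds, before each retry attempt."""
    # The wait before retry n (counting from 0) is interval * backoff_rate ** n.
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# With the documented defaults, the waits are 1, 2 and 4 seconds.
print(retry_wait_schedule())  # [1.0, 2.0, 4.0]
```

With the defaults, a step that keeps failing is therefore retried three times over roughly seven seconds of accumulated wait before the failure is propagated.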

Methods

__init__(scope, id, environment_id[, ...])

DDK Glue Transform stage.

add_alarm(alarm_id, alarm_metric[, ...])

Add a CloudWatch alarm for the Data Stage

build_state_machine(id, environment_id, ...)

Build state machine.

get_event_pattern()

Get output event pattern of the stage.

get_targets()

Get input targets of the stage.

is_construct(x)

Checks if x is a construct.

to_string()

Returns a string representation of this construct.

Attributes

cloudwatch_alarms

List[Alarm] List of CloudWatch Alarms linked to the stage

crawler

Optional[CfnCrawler] The Glue crawler

job

Optional[IJob] The Glue job

node

The tree node.

state_machine

StateMachine The state machine

property crawler: Optional[aws_cdk.aws_glue.CfnCrawler]

Optional[CfnCrawler] The Glue crawler


property job: Optional[aws_cdk.aws_glue_alpha.IJob]

Optional[IJob] The Glue job
