2020-09-24 13:56:52,474 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2020-09-24 13:56:52,482 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-09-24 13:57:53,115 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/usr/local/bin/python3.7 -m pip install -r requirements.txt
Collecting transformers==3.0.2
  Downloading transformers-3.0.2-py3-none-any.whl (769 kB)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/site-packages (from transformers==3.0.2->-r requirements.txt (line 1)) (20.4)
Collecting sacremoses
  Downloading sacremoses-0.0.43.tar.gz (883 kB)
Collecting regex!=2019.12.17
  Downloading regex-2020.7.14-cp37-cp37m-manylinux2010_x86_64.whl (660 kB)
Collecting filelock
  Downloading filelock-3.0.12-py3-none-any.whl (7.6 kB)
Collecting sentencepiece!=0.1.92
  Downloading sentencepiece-0.1.91-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB)
Collecting tqdm>=4.27
  Downloading tqdm-4.49.0-py2.py3-none-any.whl (69 kB)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/site-packages (from transformers==3.0.2->-r requirements.txt (line 1)) (1.18.5)
Requirement already satisfied: requests in /usr/local/lib/python3.7/site-packages (from transformers==3.0.2->-r requirements.txt (line 1)) (2.24.0)
Collecting tokenizers==0.8.1.rc1
  Downloading tokenizers-0.8.1rc1-cp37-cp37m-manylinux1_x86_64.whl (3.0 MB)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/site-packages (from packaging->transformers==3.0.2->-r requirements.txt (line 1)) (2.4.7)
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from packaging->transformers==3.0.2->-r requirements.txt (line 1)) (1.15.0)
Collecting click
  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/site-packages (from sacremoses->transformers==3.0.2->-r requirements.txt (line 1)) (0.16.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests->transformers==3.0.2->-r requirements.txt (line 1)) (1.25.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests->transformers==3.0.2->-r requirements.txt (line 1)) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests->transformers==3.0.2->-r requirements.txt (line 1)) (2020.6.20)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests->transformers==3.0.2->-r requirements.txt (line 1)) (2.10)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py): started
  Building wheel for sacremoses (setup.py): finished with status 'done'
  Created wheel for sacremoses: filename=sacremoses-0.0.43-py3-none-any.whl size=893259 sha256=cae9030700abd3491bd21112e6472e2ed738acce2b4c6fe2346bf7496f91317e
  Stored in directory: /root/.cache/pip/wheels/69/09/d1/bf058f7d6fa0ecba2ce7c66be3b8d012beb4bf61a6e0c101c0
Successfully built sacremoses
Installing collected packages: regex, click, tqdm, sacremoses, filelock, sentencepiece, tokenizers, transformers
Successfully installed click-7.1.2 filelock-3.0.12 regex-2020.7.14 sacremoses-0.0.43 sentencepiece-0.1.91 tokenizers-0.8.1rc1 tqdm-4.49.0 transformers-3.0.2
2020-09-24 13:57:57,198 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-09-24 13:57:57,215 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-09-24 13:57:57,230 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-09-24 13:57:57,240 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "task_6_1_state": "/opt/ml/input/data/task_6_1_state",
        "task_6_1_model": "/opt/ml/input/data/task_6_1_model"
    },
    "current_host": "algo-2",
    "framework_module": "sagemaker_tensorflow_container.training:main",
    "hosts": [
        "algo-1",
        "algo-2"
    ],
    "hyperparameters": {
        "model_dir": "s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model",
        "task_type": "2"
    },
    "input_config_dir": "/opt/ml/input/config",
    "input_data_config": {
        "task_6_1_state": {
            "TrainingInputMode": "File",
            "S3DistributionType": "ShardedByS3Key",
            "RecordWrapperType": "None"
        },
        "task_6_1_model": {
            "TrainingInputMode": "File",
            "S3DistributionType": "FullyReplicated",
            "RecordWrapperType": "None"
        }
    },
    "input_dir": "/opt/ml/input",
    "is_master": false,
    "job_name": "cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ",
    "log_level": 20,
    "master_hostname": "algo-1",
    "model_dir": "/opt/ml/model",
    "module_dir": "s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/source/sourcedir.tar.gz",
    "module_name": "worker6",
    "network_interface_name": "eth0",
    "num_cpus": 2,
    "num_gpus": 0,
    "output_data_dir": "/opt/ml/output/data",
    "output_dir": "/opt/ml/output",
    "output_intermediate_dir": "/opt/ml/output/intermediate",
    "resource_config": {
        "current_host": "algo-2",
        "hosts": [
            "algo-1",
            "algo-2"
        ],
        "network_interface_name": "eth0"
    },
    "user_entry_point": "worker6.py"
}

Environment variables:

SM_HOSTS=["algo-1","algo-2"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"model_dir":"s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model","task_type":"2"}
SM_USER_ENTRY_POINT=worker6.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_host":"algo-2","hosts":["algo-1","algo-2"],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={"task_6_1_model":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"task_6_1_state":{"RecordWrapperType":"None","S3DistributionType":"ShardedByS3Key","TrainingInputMode":"File"}}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=["task_6_1_model","task_6_1_state"]
SM_CURRENT_HOST=algo-2
SM_MODULE_NAME=worker6
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_tensorflow_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=2
SM_NUM_GPUS=0
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"task_6_1_model":"/opt/ml/input/data/task_6_1_model","task_6_1_state":"/opt/ml/input/data/task_6_1_state"},"current_host":"algo-2","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1","algo-2"],"hyperparameters":{"model_dir":"s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model","task_type":"2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"task_6_1_model":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"task_6_1_state":{"RecordWrapperType":"None","S3DistributionType":"ShardedByS3Key","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":false,"job_name":"cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/source/sourcedir.tar.gz","module_name":"worker6","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-2","hosts":["algo-1","algo-2"],"network_interface_name":"eth0"},"user_entry_point":"worker6.py"}
SM_USER_ARGS=["--model_dir","s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model","--task_type","2"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TASK_6_1_STATE=/opt/ml/input/data/task_6_1_state
SM_CHANNEL_TASK_6_1_MODEL=/opt/ml/input/data/task_6_1_model
SM_HP_MODEL_DIR=s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model
SM_HP_TASK_TYPE=2
PYTHONPATH=/opt/ml/code:/usr/local/bin:/usr/local/lib/python37.zip:/usr/local/lib/python3.7:/usr/local/lib/python3.7/lib-dynload:/usr/local/lib/python3.7/site-packages

Invoking script with the following command:

/usr/local/bin/python3.7 worker6.py --model_dir s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model --task_type 2


-- Internal Lib2 imported!
INFO:worker_toolkit.worker_lib:Deleting other instances' state
INFO:worker_toolkit.worker_lib:Creating instance specific state dir
INFO:worker_toolkit.worker_lib:Worker config: Namespace(channel_data='', channel_model='', channel_task_6_1_model='/opt/ml/input/data/task_6_1_model', channel_task_6_1_state='/opt/ml/input/data/task_6_1_state', channels=['task_6_1_model', 'task_6_1_state'], current_host='algo-2', hosts=['algo-1', 'algo-2'], hps={'model_dir': 's3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model', 'task_type': '2'}, input_config_dir='/opt/ml/input/config', input_data_config='{"task_6_1_model":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"task_6_1_state":{"RecordWrapperType":"None","S3DistributionType":"ShardedByS3Key","TrainingInputMode":"File"}}', input_dir='/opt/ml/input', instance_state='/state/algo-2', job_name='cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ', model_dir='s3://sagemaker-us-east-1-667232328135/tests/simple-sagemaker-example-cli_2020-09-24-13-43-54_py37/cli-task6-2/cli-task6-2-2020-09-24-13-53-37-UpCaxVPZ/model', network_interface_name='eth0', num_cpus=2, num_gpus=0, output_data_dir='/opt/ml/output/data', output_dir='/opt/ml/output', resource_config='{"current_host":"algo-2","hosts":["algo-1","algo-2"],"network_interface_name":"eth0"}', state='/state')
INFO:__main__:input channel task_6_1_model is at /opt/ml/input/data/task_6_1_model
INFO:__main__:input channel task_6_1_state is at /opt/ml/input/data/task_6_1_state
INFO:__main__:*** START listing files in /opt/ml
INFO:__main__:[File] /opt/ml/code
INFO:__main__:[File] /opt/ml/code/external_dependency
INFO:__main__:[Dir ] /opt/ml/code/external_dependency/lib1.py
INFO:__main__:[File] /opt/ml/code/internal_dependency
INFO:__main__:[Dir ] /opt/ml/code/internal_dependency/lib2.py
INFO:__main__:[Dir ] /opt/ml/code/requirements.txt
INFO:__main__:[Dir ] /opt/ml/code/worker6.py
INFO:__main__:[File] /opt/ml/code/worker_toolkit
INFO:__main__:[Dir ] /opt/ml/code/worker_toolkit/__init__.py
INFO:__main__:[Dir ] /opt/ml/code/worker_toolkit/worker_lib.py
INFO:__main__:[File] /opt/ml/errors
INFO:__main__:[File] /opt/ml/input
INFO:__main__:[File] /opt/ml/input/config
INFO:__main__:[Dir ] /opt/ml/input/config/checkpointconfig.json
INFO:__main__:[Dir ] /opt/ml/input/config/debughookconfig.json
INFO:__main__:[Dir ] /opt/ml/input/config/hyperparameters.json
INFO:__main__:[Dir ] /opt/ml/input/config/init-config.json
INFO:__main__:[Dir ] /opt/ml/input/config/inputdataconfig.json
INFO:__main__:[Dir ] /opt/ml/input/config/metric-definition-regex.json
INFO:__main__:[Dir ] /opt/ml/input/config/resourceconfig.json
INFO:__main__:[Dir ] /opt/ml/input/config/trainingjobconfig.json
INFO:__main__:[Dir ] /opt/ml/input/config/upstreamoutputdataconfig.json
INFO:__main__:[File] /opt/ml/input/data
INFO:__main__:[Dir ] /opt/ml/input/data/checkpoints-manifest
INFO:__main__:[File] /opt/ml/input/data/task_6_1_model
INFO:__main__:[Dir ] /opt/ml/input/data/task_6_1_model/model.tar.gz
INFO:__main__:[Dir ] /opt/ml/input/data/task_6_1_model-manifest
INFO:__main__:[File] /opt/ml/input/data/task_6_1_state
INFO:__main__:[File] /opt/ml/input/data/task_6_1_state/algo-1
INFO:__main__:[Dir ] /opt/ml/input/data/task_6_1_state/algo-1/algo-1
INFO:__main__:[File] /opt/ml/input/data/task_6_1_state/algo-2
INFO:__main__:[Dir ] /opt/ml/input/data/task_6_1_state/algo-2/algo-2
INFO:__main__:[Dir ] /opt/ml/input/data/task_6_1_state-manifest
INFO:__main__:[File] /opt/ml/model
INFO:__main__:[File] /opt/ml/output
INFO:__main__:[File] /opt/ml/output/data
INFO:__main__:[File] /opt/ml/output/metrics
INFO:__main__:[File] /opt/ml/output/metrics/sagemaker
INFO:__main__:[File] /opt/ml/output/profiler
INFO:__main__:[File] /opt/ml/output/tensors
INFO:__main__:*** END file listing /opt/ml
INFO:__main__:*** START listing files in /state
INFO:__main__:[File] /state/algo-2
INFO:__main__:*** END file listing /state
-- External Lib1 imported!
INFO:__main__:Score=10;
INFO:__main__:Score=20;
INFO:worker_toolkit.worker_lib:Marking instance algo-2 completion
INFO:__main__:*** START listing files in /opt/ml
INFO:__main__:[File] /opt/ml/errors
INFO:__main__:[File] /opt/ml/model
INFO:__main__:[File] /opt/ml/output
INFO:__main__:[File] /opt/ml/output/data
INFO:__main__:[File] /opt/ml/output/metrics
INFO:__main__:[File] /opt/ml/output/metrics/sagemaker
INFO:__main__:[File] /opt/ml/output/profiler
INFO:__main__:[File] /opt/ml/output/tensors
INFO:__main__:*** END file listing /opt/ml
INFO:__main__:*** START listing files in /state
INFO:__main__:[File] /state/algo-2
INFO:__main__:[Dir ] /state/algo-2/__COMPLETED__
INFO:__main__:*** END file listing /state
INFO:__main__:finished!
2020-09-24 13:59:00,180 sagemaker_tensorflow_container.training WARNING  No model artifact is saved under path /opt/ml/model. Your training job will not save any model files to S3.
For details of how to construct your training script see:
https://sagemaker.readthedocs.io/en/stable/using_tf.html#adapting-your-local-tensorflow-script
2020-09-24 13:59:00,180 sagemaker-training-toolkit INFO     Reporting training SUCCESS
