/home/kramesh3/.local/lib/python3.9/site-packages/torch/utils/_pytree.py:185: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using `python3 -m pip install --upgrade 'optree>=0.13.0'`.
  warnings.warn(

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to 1 by default, to avoid overloading your system; tune the variable further for optimal performance in your application as needed.
*****************************************
2025-03-24 12:03:04.621649: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1742832184.681829   55155 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1742832184.691472   55155 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-24 12:03:04.744498: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/kramesh3/.local/lib/python3.9/site-packages/transformers/training_args.py:1545: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
03/24/2025 12:03:11:WARNING:--dry_run was specified. Reducing number of training steps to 2 and logging intervals to 1...
Differentially private training...
Path the model is saved to : /home/kramesh3/syntheval//data/generator/models/princeton-nlp_Sheared-LLaMA-1.3B_tab_DP_8
PrivacyArguments(per_sample_max_grad_norm=1.0, noise_multiplier=None, target_epsilon=8, target_delta=1e-05, disable_dp=False)
03/24/2025 12:03:11:WARNING:Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
03/24/2025 12:03:11:INFO:Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=2,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=True,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
dry_run=True,
dry_test_run=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=no,
eval_use_gather_object=False,
evaluation_strategy=steps,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=64,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0003,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/data/projects/syntheval/models/runs/Mar24_12-03-09_oak.wse.jhu.edu,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=4,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=constant,
max_grad_norm=0.0,
max_steps=2,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/data/projects/syntheval/models/,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=/data/projects/syntheval/models/,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=steps,
save_total_limit=2,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_liger_kernel=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.01,
)
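From the dump above, the effective global batch size follows directly; a quick check, assuming the 4 data-parallel ranks this log reports (ranks 0–3):

```python
# Values taken from the TrainingArguments dump above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 64
world_size = 4  # ranks 0-3 appear later in this log

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * world_size)
print(effective_batch_size)  # 256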
03/24/2025 12:03:11:INFO:Privacy parameters PrivacyArguments(per_sample_max_grad_norm=1.0, noise_multiplier=None, target_epsilon=8, target_delta=1e-05, disable_dp=False)
03/24/2025 12:03:11:WARNING:Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False
03/24/2025 12:03:11:WARNING:Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, 16-bits training: False
03/24/2025 12:03:11:WARNING:Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, 16-bits training: False
CPU:  0.2
Model:  princeton-nlp/Sheared-LLaMA-1.3B
[INFO|configuration_utils.py:675] 2025-03-24 12:03:15,196 >> loading configuration file config.json from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/config.json
[INFO|configuration_utils.py:742] 2025-03-24 12:03:15,197 >> Model config LlamaConfig {
  "_name_or_path": "princeton-nlp/Sheared-LLaMA-1.3B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5504,
  "max_position_embeddings": 4096,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.45.2",
  "use_cache": true,
  "vocab_size": 32000
}

[INFO|modeling_utils.py:3732] 2025-03-24 12:03:15,259 >> loading weights file pytorch_model.bin from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/pytorch_model.bin
[INFO|configuration_utils.py:1099] 2025-03-24 12:03:15,278 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0
}

[INFO|modeling_utils.py:4574] 2025-03-24 12:03:15,331 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4582] 2025-03-24 12:03:15,331 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at princeton-nlp/Sheared-LLaMA-1.3B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
CPU:  0.1
Model:  princeton-nlp/Sheared-LLaMA-1.3B
[INFO|configuration_utils.py:1054] 2025-03-24 12:03:15,409 >> loading configuration file generation_config.json from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/generation_config.json
[INFO|configuration_utils.py:1099] 2025-03-24 12:03:15,409 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0
}

[INFO|tokenization_utils_base.py:2206] 2025-03-24 12:03:15,466 >> loading file tokenizer.model from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/tokenizer.model
[INFO|tokenization_utils_base.py:2206] 2025-03-24 12:03:15,466 >> loading file tokenizer.json from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/tokenizer.json
[INFO|tokenization_utils_base.py:2206] 2025-03-24 12:03:15,466 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2206] 2025-03-24 12:03:15,466 >> loading file special_tokens_map.json from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/special_tokens_map.json
[INFO|tokenization_utils_base.py:2206] 2025-03-24 12:03:15,466 >> loading file tokenizer_config.json from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/tokenizer_config.json
[WARNING|logging.py:328] 2025-03-24 12:03:15,507 >> You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
[INFO|modeling_utils.py:2161] 2025-03-24 12:03:15,574 >> You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
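The resize warning above can be addressed by passing `pad_to_multiple_of` to `resize_token_embeddings`; the helper below merely sketches the rounding that option performs (it is illustrative, not part of the Transformers API):

```python
def padded_vocab_size(num_tokens: int, pad_to_multiple_of: int = 8) -> int:
    """Round the embedding row count up to the next multiple, mirroring
    what model.resize_token_embeddings(num_tokens, pad_to_multiple_of=8)
    would do to the embedding matrix."""
    return -(-num_tokens // pad_to_multiple_of) * pad_to_multiple_of

# The warned-about size 32001 would become 32008, keeping matmul
# dimensions friendly to Tensor Cores.
print(padded_vocab_size(32001))  # 32008
```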
03/24/2025 12:03:20:WARNING:Using the latest cached version of the module from /home/kramesh3/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--rouge/b01e0accf3bd6dd24839b769a5fda24e14995071570870922c71970b3a6ed886 (last modified on Fri Feb  2 20:35:49 2024) since it couldn't be found locally at evaluate-metric--rouge, or remotely on the Hugging Face Hub.
03/24/2025 12:03:20:INFO:Process #0 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00000_of_00008.arrow
03/24/2025 12:03:20:INFO:Process #1 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00001_of_00008.arrow
03/24/2025 12:03:20:INFO:Process #2 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00002_of_00008.arrow
03/24/2025 12:03:20:INFO:Process #3 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00003_of_00008.arrow
03/24/2025 12:03:20:INFO:Process #4 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00004_of_00008.arrow
03/24/2025 12:03:20:INFO:Process #5 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00005_of_00008.arrow
03/24/2025 12:03:20:INFO:Process #6 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00006_of_00008.arrow
03/24/2025 12:03:20:INFO:Process #7 will write at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00007_of_00008.arrow
03/24/2025 12:03:20:INFO:Spawning 8 processes
[rank1]:[W324 12:03:22.987275595 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1]  using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank3]:[W324 12:03:22.201037779 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3]  using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank2]:[W324 12:03:22.244900657 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2]  using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
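The barrier warnings above recommend passing a `device_id` at process-group init; a minimal sketch, assuming a torchrun-style launcher that sets `LOCAL_RANK` (untested against this exact setup; the helper name is ours):

```python
import os

import torch
import torch.distributed as dist

def nccl_init_kwargs(local_rank: int) -> dict:
    # Binding the process group to this rank's GPU up front lets
    # barrier() use the right device instead of guessing, which is
    # what the ProcessGroupNCCL warning above complains about.
    return {"backend": "nccl",
            "device_id": torch.device(f"cuda:{local_rank}")}

if __name__ == "__main__" and torch.cuda.is_available():
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(rank)
    dist.init_process_group(**nccl_init_kwargs(rank))
```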
03/24/2025 12:03:23:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00000_of_00008.arrow
03/24/2025 12:03:23:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00002_of_00008.arrow
03/24/2025 12:03:24:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00005_of_00008.arrow
03/24/2025 12:03:24:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00001_of_00008.arrow
03/24/2025 12:03:24:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00003_of_00008.arrow
03/24/2025 12:03:24:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00006_of_00008.arrow
03/24/2025 12:03:25:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00004_of_00008.arrow
03/24/2025 12:03:25:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/train/cache-3ee6dc8d3664daeb_00007_of_00008.arrow
03/24/2025 12:03:25:INFO:Concatenating 8 shards
03/24/2025 12:03:25:INFO:Process #0 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00000_of_00008.arrow
03/24/2025 12:03:25:INFO:Process #1 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00001_of_00008.arrow
03/24/2025 12:03:25:INFO:Process #2 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00002_of_00008.arrow
03/24/2025 12:03:25:INFO:Process #3 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00003_of_00008.arrow
03/24/2025 12:03:25:INFO:Process #4 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00004_of_00008.arrow
03/24/2025 12:03:25:INFO:Process #5 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00005_of_00008.arrow
03/24/2025 12:03:25:INFO:Process #6 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00006_of_00008.arrow
03/24/2025 12:03:25:INFO:Process #7 will write at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00007_of_00008.arrow
03/24/2025 12:03:25:INFO:Spawning 8 processes
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00000_of_00008.arrow
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00001_of_00008.arrow
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00002_of_00008.arrow
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00003_of_00008.arrow
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00004_of_00008.arrow
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00005_of_00008.arrow
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00006_of_00008.arrow
03/24/2025 12:03:26:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/validation/cache-ffbe47634fdf5d52_00007_of_00008.arrow
03/24/2025 12:03:27:INFO:Concatenating 8 shards
03/24/2025 12:03:27:INFO:Process #0 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00000_of_00008.arrow
03/24/2025 12:03:27:INFO:Process #1 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00001_of_00008.arrow
03/24/2025 12:03:27:INFO:Process #2 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00002_of_00008.arrow
03/24/2025 12:03:27:INFO:Process #3 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00003_of_00008.arrow
03/24/2025 12:03:27:INFO:Process #4 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00004_of_00008.arrow
03/24/2025 12:03:27:INFO:Process #5 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00005_of_00008.arrow
03/24/2025 12:03:27:INFO:Process #6 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00006_of_00008.arrow
03/24/2025 12:03:27:INFO:Process #7 will write at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00007_of_00008.arrow
03/24/2025 12:03:27:INFO:Spawning 8 processes
03/24/2025 12:03:30:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00002_of_00008.arrow
03/24/2025 12:03:30:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00007_of_00008.arrow
03/24/2025 12:03:30:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00000_of_00008.arrow
03/24/2025 12:03:31:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00005_of_00008.arrow
03/24/2025 12:03:31:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00001_of_00008.arrow
03/24/2025 12:03:31:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00003_of_00008.arrow
03/24/2025 12:03:31:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00006_of_00008.arrow
03/24/2025 12:03:31:INFO:Caching processed dataset at /home/kramesh3/syntheval/data/generator/data/tab/test/cache-b592543381a89ae8_00004_of_00008.arrow
03/24/2025 12:03:31:INFO:Concatenating 8 shards
[rank0]:[W324 12:03:31.448415003 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
03/24/2025 12:03:32:INFO:Using LoRA
03/24/2025 12:03:32:INFO:Total number of parameters of the model: 1347000320
03/24/2025 12:03:32:INFO:Fine-tuned number of parameters of the model: 1572864
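As an annotation on the parameter counts above: the 1,572,864 trainable parameters are consistent with a LoRA adapter of rank 8 applied to two 2048x2048 projection matrices (e.g. `q_proj` and `v_proj`) in each of the model's 24 layers. The rank and target modules are my inference from the count alone; the log does not state them.

```python
# Sketch: reconstruct the reported trainable-parameter count under the
# assumption of rank-8 LoRA on two attention projections per layer.
hidden, layers, rank = 2048, 24, 8

per_module = 2 * hidden * rank        # A (hidden x r) plus B (r x hidden)
trainable = layers * 2 * per_module   # two adapted modules per layer

print(trainable)  # 1572864, matching "Fine-tuned number of parameters"
```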
Differentially Private Training: True
args
1
4
64
1014
args
1
4
64
1014
0.252465483234714
4
1572864
[WARNING|trainer.py:617] 2025-03-24 12:04:38,339 >> max_steps is given, it will override any value given in num_train_epochs
Trainer initialized.
[INFO|trainer.py:2243] 2025-03-24 12:04:51,888 >> ***** Running training *****
[INFO|trainer.py:2244] 2025-03-24 12:04:51,889 >>   Num examples = 1,014
[INFO|trainer.py:2245] 2025-03-24 12:04:51,889 >>   Num Epochs = 1
[INFO|trainer.py:2246] 2025-03-24 12:04:51,889 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:2249] 2025-03-24 12:04:51,889 >>   Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:2250] 2025-03-24 12:04:51,889 >>   Gradient Accumulation steps = 64
[INFO|trainer.py:2251] 2025-03-24 12:04:51,889 >>   Total optimization steps = 2
[INFO|trainer.py:2252] 2025-03-24 12:04:51,891 >>   Number of trainable parameters = 1,572,864
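The bare numbers in the `args` debug block earlier (1, 4, 64, 1014) line up with this training banner if they are read as per-device batch size, world size, gradient-accumulation steps, and number of training examples; that labeling is my assumption, not stated in the log. Under it, both the effective batch size of 256 and the float 0.252465483234714 (the Poisson sampling rate a DP accountant would use) fall out directly:

```python
# Sanity-check the trainer banner against the raw "args" debug prints.
# Variable names are my own labels for the bare numbers 1, 4, 64, 1014.
per_device_batch = 1
world_size = 4           # assumed from the four [rankN] lines in this log
grad_accum_steps = 64
num_examples = 1014

total_batch = per_device_batch * world_size * grad_accum_steps
print(total_batch)       # 256, matching "Total train batch size ... = 256"

sample_rate = total_batch / num_examples
print(sample_rate)       # ~0.252465483234714, matching the debug float
```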
/home/kramesh3/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1830: FutureWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
  self._maybe_warn_non_full_backward_hook(args, result, grad_fn)
[rank0]:[W324 12:05:21.549462689 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[INFO|trainer.py:3705] 2025-03-24 12:05:53,310 >> Saving model checkpoint to /data/projects/syntheval/models/checkpoint-2
[INFO|trainer.py:3719] 2025-03-24 12:05:53,316 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.
[INFO|tokenization_utils_base.py:2641] 2025-03-24 12:06:05,034 >> tokenizer config file saved in /data/projects/syntheval/models/checkpoint-2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2650] 2025-03-24 12:06:05,034 >> Special tokens file saved in /data/projects/syntheval/models/checkpoint-2/special_tokens_map.json
[INFO|trainer.py:3797] 2025-03-24 12:06:05,174 >> Deleting older checkpoint [/data/projects/syntheval/models/checkpoint-2] due to args.save_total_limit
[INFO|trainer.py:2505] 2025-03-24 12:06:07,627 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 75.7368, 'train_samples_per_second': 6.76, 'train_steps_per_second': 0.026, 'train_loss': 1.5726735591888428, 'epoch': 0.5039370078740157}
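The throughput figures in the metrics line above are internally consistent, assuming "samples" counts optimizer-step samples (2 steps times the effective batch of 256 = 512):

```python
# Cross-check train_samples_per_second and train_steps_per_second from the
# final metrics dict, using values read directly off the log.
train_runtime = 75.7368
steps = 2
effective_batch = 256

samples_per_second = steps * effective_batch / train_runtime
steps_per_second = steps / train_runtime

print(round(samples_per_second, 2))  # 6.76
print(round(steps_per_second, 3))    # 0.026
```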
/home/kramesh3/.local/lib/python3.9/site-packages/opacus/privacy_engine.py:142: UserWarning: Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
  warnings.warn(
{'final_epsilon_prv': 6.327880206825773, 'final_epsilon_rdp': 7.06948937105914, 'epoch': 0.5039370078740157}
/data/kramesh3/envs/sdg-env/lib/python3.9/site-packages/peft/utils/save_and_load.py:160: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
  warnings.warn(
[INFO|configuration_utils.py:675] 2025-03-24 12:06:22,139 >> loading configuration file config.json from cache at /home/kramesh3/.cache/huggingface/hub/models--princeton-nlp--Sheared-LLaMA-1.3B/snapshots/a4b76938edbf571ea7d7d9904861cbdca08809b4/config.json
[INFO|configuration_utils.py:742] 2025-03-24 12:06:22,141 >> Model config LlamaConfig {
  "_name_or_path": "princeton-nlp/Sheared-LLaMA-1.3B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5504,
  "max_position_embeddings": 4096,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.45.2",
  "use_cache": true,
  "vocab_size": 32000
}
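The "Total number of parameters: 1347000320" reported earlier can be reconstructed from this LlamaConfig under two assumptions of mine (neither stated in the log): the embedding layer grew by one token during fine-tuning (vocab 32000 -> 32001, per the `save_embedding_layers` resize warning), and the total includes the 1,572,864 injected LoRA parameters.

```python
# Sketch: rebuild the total parameter count from the printed LlamaConfig.
# vocab is bumped to 32001 on the assumption that one special token was
# added (the peft warning says the embedding layer was resized).
vocab, hidden, inter, layers = 32001, 2048, 5504, 24

embed = vocab * hidden * 2            # input embeddings + untied lm_head
attn = 4 * hidden * hidden            # q, k, v, o projections (no bias)
mlp = 3 * hidden * inter              # gate, up, down projections
norms = 2 * hidden                    # two RMSNorms per layer
per_layer = attn + mlp + norms

base = embed + layers * per_layer + hidden  # + final RMSNorm
total = base + 1_572_864                    # + LoRA adapter weights

print(total)  # 1347000320
```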

[rank0]:[W324 12:06:25.968130737 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
