/home/kramesh3/.local/lib/python3.9/site-packages/torch/utils/_pytree.py:185: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using `python3 -m pip install --upgrade 'optree>=0.13.0'`.
  warnings.warn(
2025-03-21 14:15:20.917415: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1742580920.935449   82422 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1742580920.941042   82422 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-21 14:15:20.959428: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/kramesh3/.local/lib/python3.9/site-packages/transformers/training_args.py:1545: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
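The two deprecation warnings above point at the same fix: pass `eval_strategy` instead of `evaluation_strategy`, and use `report_to` instead of the `WANDB_DISABLED` environment variable. A minimal sketch of the replacement arguments (values are illustrative, apart from the learning rate and epoch count, which match the run logged below):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",            # illustrative path
    eval_strategy="epoch",       # replaces the deprecated `evaluation_strategy`
    report_to="none",            # replaces the deprecated WANDB_DISABLED env var
    learning_rate=3e-4,          # matches the 0.0003 logged below
    num_train_epochs=3,          # matches the 3.0 epochs logged below
)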
03/21/2025 14:15:23:WARNING:Process rank: 0, device: cuda:0, n_gpu: 4, distributed training: True, 16-bits training: False
[WARNING|logging.py:328] 2025-03-21 14:15:27,657 >> You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
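The tokenizer warning above is informational; the legacy behaviour is kept by default. If the new behaviour from huggingface/transformers#24565 is wanted, it can be requested at load time. A minimal sketch, using the model name printed later in this log:

from transformers import AutoTokenizer

# Only set legacy=False after reading the linked PR; the default (legacy) path
# is what this run used.
tokenizer = AutoTokenizer.from_pretrained(
    "princeton-nlp/Sheared-LLaMA-1.3B",
    legacy=False,
)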
03/21/2025 14:15:30:WARNING:Using the latest cached version of the module from /home/kramesh3/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--rouge/b01e0accf3bd6dd24839b769a5fda24e14995071570870922c71970b3a6ed886 (last modified on Fri Feb  2 20:35:49 2024) since it couldn't be found locally at evaluate-metric--rouge, or remotely on the Hugging Face Hub.
CPU:  0.1
Model:  princeton-nlp/Sheared-LLaMA-1.3B
1572864
Differentially Private Training: False
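The bare `1572864` printed above is unlabelled. It is consistent with a LoRA trainable-parameter count for this model, for example rank-8 adapters on `q_proj` and `v_proj` in all 24 layers of Sheared-LLaMA-1.3B (hidden size 2048), but that configuration is an assumption, not something the log states. A sketch of the arithmetic and of how such a count is usually taken:

# Assumed, not confirmed by the log: LoRA r=8 on q_proj and v_proj only.
layers, hidden, r = 24, 2048, 8
lora_params = layers * 2 * (2 * hidden * r)   # 2 target modules, each with an A and a B matrix
print(lora_params)                            # 1572864

# On a real model object the count is simply:
# trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)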
/home/kramesh3/.local/lib/python3.9/site-packages/torch/nn/parallel/_functions.py:70: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn(
{'loss': 1.522, 'learning_rate': 0.0003, 'epoch': 0.0625}
{'loss': 1.4131, 'learning_rate': 0.0003, 'epoch': 0.125}
{'loss': 1.4111, 'learning_rate': 0.0003, 'epoch': 0.1875}
{'loss': 1.3828, 'learning_rate': 0.0003, 'epoch': 0.25}
{'loss': 1.3565, 'learning_rate': 0.0003, 'epoch': 0.3125}
{'loss': 1.2788, 'learning_rate': 0.0003, 'epoch': 0.375}
{'loss': 1.2129, 'learning_rate': 0.0003, 'epoch': 0.4375}
{'loss': 1.2882, 'learning_rate': 0.0003, 'epoch': 0.5}
{'loss': 1.2182, 'learning_rate': 0.0003, 'epoch': 0.5625}
{'loss': 1.2345, 'learning_rate': 0.0003, 'epoch': 0.625}
{'loss': 1.184, 'learning_rate': 0.0003, 'epoch': 0.6875}
{'loss': 1.1806, 'learning_rate': 0.0003, 'epoch': 0.75}
{'loss': 1.2099, 'learning_rate': 0.0003, 'epoch': 0.8125}
{'loss': 1.217, 'learning_rate': 0.0003, 'epoch': 0.875}
{'loss': 1.156, 'learning_rate': 0.0003, 'epoch': 0.9375}
{'loss': 1.1482, 'learning_rate': 0.0003, 'epoch': 1.0}
{'loss': 1.1049, 'learning_rate': 0.0003, 'epoch': 1.0625}
{'loss': 1.1643, 'learning_rate': 0.0003, 'epoch': 1.125}
{'loss': 1.1206, 'learning_rate': 0.0003, 'epoch': 1.1875}
{'loss': 1.1379, 'learning_rate': 0.0003, 'epoch': 1.25}
{'loss': 1.106, 'learning_rate': 0.0003, 'epoch': 1.3125}
{'loss': 1.161, 'learning_rate': 0.0003, 'epoch': 1.375}
{'loss': 1.1164, 'learning_rate': 0.0003, 'epoch': 1.4375}
{'loss': 1.1261, 'learning_rate': 0.0003, 'epoch': 1.5}
{'loss': 1.1113, 'learning_rate': 0.0003, 'epoch': 1.5625}
{'loss': 1.1142, 'learning_rate': 0.0003, 'epoch': 1.625}
{'loss': 1.117, 'learning_rate': 0.0003, 'epoch': 1.6875}
{'loss': 1.0761, 'learning_rate': 0.0003, 'epoch': 1.75}
{'loss': 1.1123, 'learning_rate': 0.0003, 'epoch': 1.8125}
{'loss': 1.0933, 'learning_rate': 0.0003, 'epoch': 1.875}
{'loss': 1.0644, 'learning_rate': 0.0003, 'epoch': 1.9375}
{'loss': 1.0506, 'learning_rate': 0.0003, 'epoch': 2.0}
/data/kramesh3/envs/sdg-env/lib/python3.9/site-packages/peft/utils/save_and_load.py:160: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
  warnings.warn(
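This PEFT warning means the token embedding matrix was resized before fine-tuning (typically after adding a pad or special token), so PEFT stores the embedding weights alongside the LoRA adapter. A minimal sketch of the pattern that triggers it; the added pad token and output path are assumptions:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
tokenizer.add_special_tokens({"pad_token": "[PAD]"})    # assumed cause of the resize

model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
model.resize_token_embeddings(len(tokenizer))           # the resize PEFT detects

model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))
# ... training ...
model.save_pretrained("adapter_out")    # PEFT flips save_embedding_layers to True here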
/home/kramesh3/.local/lib/python3.9/site-packages/transformers/training_args.py:1545: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
[WARNING|integration_utils.py:100] 2025-03-21 14:24:18,034 >> Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
03/21/2025 14:24:24:WARNING:Using the latest cached version of the module from /home/kramesh3/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--rouge/b01e0accf3bd6dd24839b769a5fda24e14995071570870922c71970b3a6ed886 (last modified on Fri Feb  2 20:35:49 2024) since it couldn't be found locally at evaluate-metric--rouge, or remotely on the Hugging Face Hub.
[WARNING|logging.py:313] 2025-03-21 14:24:30,684 >> You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:328] 2025-03-21 14:24:30,744 >> Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
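The last warning is a forward-compatibility note: from Transformers v4.46 onward, inference-time `logits` keep the model's dtype instead of always being float32. Downstream code that assumes float32 logits can cast explicitly; a minimal sketch:

import torch

def float_logits(model, batch):
    """Forward pass returning float32 logits regardless of the model dtype."""
    with torch.no_grad():
        logits = model(**batch).logits   # may be fp16/bf16 from v4.46 onward
    return logits.float()                # cast before softmax/argmax/metrics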
{'loss': 1.0614, 'learning_rate': 0.0003, 'epoch': 2.0625}
{'loss': 1.0629, 'learning_rate': 0.0003, 'epoch': 2.125}
{'loss': 1.0569, 'learning_rate': 0.0003, 'epoch': 2.1875}
{'loss': 1.0481, 'learning_rate': 0.0003, 'epoch': 2.25}
{'loss': 1.05, 'learning_rate': 0.0003, 'epoch': 2.3125}
{'loss': 1.0618, 'learning_rate': 0.0003, 'epoch': 2.375}
{'loss': 1.0341, 'learning_rate': 0.0003, 'epoch': 2.4375}
{'loss': 1.0592, 'learning_rate': 0.0003, 'epoch': 2.5}
{'loss': 1.0602, 'learning_rate': 0.0003, 'epoch': 2.5625}
{'loss': 1.0456, 'learning_rate': 0.0003, 'epoch': 2.625}
{'loss': 1.0509, 'learning_rate': 0.0003, 'epoch': 2.6875}
{'loss': 1.0575, 'learning_rate': 0.0003, 'epoch': 2.75}
{'loss': 1.0122, 'learning_rate': 0.0003, 'epoch': 2.8125}
{'loss': 1.0624, 'learning_rate': 0.0003, 'epoch': 2.875}
{'loss': 0.9834, 'learning_rate': 0.0003, 'epoch': 2.9375}
{'loss': 1.0728, 'learning_rate': 0.0003, 'epoch': 3.0}
{'train_runtime': 517.1759, 'train_samples_per_second': 5.882, 'train_steps_per_second': 0.371, 'train_loss': 1.1451983377337456, 'epoch': 3.0}
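A quick sanity check on the summary line, derived purely from the numbers it reports:

runtime = 517.1759               # train_runtime in seconds
steps   = 0.371 * runtime        # ≈ 192 optimizer steps over 3 epochs, ~64 per epoch
samples = 5.882 * runtime        # ≈ 3042 samples over 3 epochs, ~1014 per epoch
print(round(steps), round(samples))
# The loss above is logged every 0.0625 epoch, i.e. every 4 of those ~64 steps per epoch.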
0
CPU:  0.1
Model:  princeton-nlp/Sheared-LLaMA-1.3B
Using LoRA
Total number of parameters of the model: 1347000320
Fine-tuned number of parameters of the model: 0
Differentially Private Training: False
Non-DP model has been loaded...
Testing for the entire dataset. Number of generations per prompt:  5
Length of test data 127
Saving results to file...
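The evaluation phase described above (5 generations for each of the 127 test prompts, then a dump to disk) is produced by code outside this log. A minimal sketch of that kind of loop with the standard `generate` API; the sampling settings, file name, and field names are assumptions, and in the actual run a LoRA adapter would be loaded on top of the base model:

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "princeton-nlp/Sheared-LLaMA-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

def generate_n(prompt, n=5, max_new_tokens=128):
    """Return n sampled completions for one prompt (5 per prompt, as in the log)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            do_sample=True,              # assumed decoding strategy
            num_return_sequences=n,
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.batch_decode(out, skip_special_tokens=True)

# results = [{"prompt": p, "generations": generate_n(p)} for p in test_prompts]
# with open("results.json", "w") as f:
#     json.dump(results, f)             # "Saving results to file..."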
