Metadata-Version: 2.4
Name: instella-internal-test
Version: 0.1.0
Summary: Open language model trained by the AMD GenAI team. This is a fork version for testing only.
Author: AMD GenAI Team
License: Instella-3B [RESEARCH-ONLY RAIL-MS]
        
        Licensed Artifact(s):
        
        -   Model
        
        -   Source Code
        
        Section I: PREAMBLE
        
        BY ACCESSING, DOWNLOADING, INSTALLING, OR USING THE ARTIFACT, YOU AGREE
        TO BE BOUND BY THIS LICENSE. IF YOU DO NOT AGREE TO ALL OF THE TERMS AND
        CONDITIONS OF THIS LICENSE, DO NOT ACCESS, DOWNLOAD, INSTALL, OR USE THE
        ARTIFACT.
        
        1. Definitions
        
        (a) “Application” refers to a sequence of instructions or statements
            written in machine code language, including object code (that is the
            product of a compiler), binary code (data using a two-symbol system)
            or an intermediate language (such as register transfer language).
        
        (b) “Artifact” refers to a software application (in either binary or
            source code format), Model, and/or Source Code, in accordance with
            what is specified above as the “Licensed Artifact”.
        
        (c) “Contribution” means any work, including any modifications or
            additions to an Artifact, that is intentionally submitted to
            Licensor for inclusion or incorporation in the Artifact directly or
            indirectly by the rights owner. For the purposes of this definition,
            “submitted” means any form of electronic, verbal, or written
            communication sent to the Licensor or its representatives, including
            but not limited to communication on electronic mailing lists, source
            code control systems, and issue tracking systems that are managed
            by, or on behalf of, the Licensor for the purpose of discussing,
            sharing and improving the Artifact, but excluding communication that
            is conspicuously marked or otherwise designated in writing by the
            contributor as “Not a Contribution.”
        
        (d) “Contributor” means Licensor or any other individual or legal entity
            that creates or owns a Contribution that is added to or incorporated
            into an Artifact or its Derivative.
        
        (e) “Data” means a collection of information and/or content extracted
            from the dataset used with a given Model, including to train,
            pretrain, or otherwise evaluate the Model. The Data is not licensed
            under this License.
        
        (f) “Derivative” means a work derived from or based upon an Artifact,
            and includes all modified versions of such Artifact.
        
        (g) “Distribution” means any transmission, reproduction, publication or
            other sharing of an Artifact or Derivative to a Third Party,
            including providing a hosted service incorporating the Artifact,
            which is made available by electronic or other remote means -
            e.g. API-based or web access.
        
        (h) “Harm” includes but is not limited to physical, mental,
            psychological, financial and reputational damage, pain, or loss.
        
        (i) “License” means the terms and conditions for use, reproduction, and
            Distribution as defined in this document.
        
        (j) “Licensor” means the rights owner (by virtue of creation or
            documented transfer of ownership) or entity authorized by the rights
            owner (e.g., exclusive licensee) that is granting the rights in this
            License.
        
        (k) “Model” means any machine-learning based assembly or assemblies
            (including checkpoints), consisting of learnt weights, parameters
            (including optimizer states), corresponding to the model
            architecture as embodied in the Source Code.
        
        (l) “Output” means the results of operating a Model as embodied in
            informational content resulting therefrom.
        
        (m) “Permitted Purpose” means for academic or research purposes only.
        
        (n) “Source Code” means any collection of text written using
            human-readable programming language, including the code and scripts
            used to define, run, load, benchmark or evaluate a Model or any
            component thereof, and/or used to prepare data for training or
            evaluation, if any. Source Code includes any accompanying
            documentation, tutorials, examples, etc, if any. For clarity, the
            term “Source Code” as used in this License includes any and all
            Derivatives of such Source Code.
        
        (o) “Third Parties” means individuals or legal entities that are not
            under common control with Licensor or You.
        
        (p) “Use” includes accessing, using, copying, modifying, and/or
            distributing an Artifact; in connection with a Model as Artifact,
            Use also includes creating content, fine-tuning, updating, running,
            training, evaluating and/or re-parametrizing such Model.
        
        (q) “You” (or “Your”) means an individual or legal entity receiving and
            exercising permissions granted by this License and/or making use of
            the Artifact for permitted purposes and in any permitted field of
            use, including usage of the Artifact in an end-use application -
            e.g. chatbot, translator, image generator, etc.
        
        Section II: INTELLECTUAL PROPERTY RIGHTS
        
        Both copyright and patent grants may apply to the Artifact. The Artifact
        is subject to additional terms and conditions as described in Section III
        below. 
        
        2. Grant of Copyright License. Conditioned upon compliance with Section
        III below and subject to the terms and conditions of this License, each
        Contributor hereby grants to You, only in connection with the Permitted
        Purpose, a worldwide, non-exclusive, royalty-free copyright license to
        reproduce, use, publicly display, publicly perform, sublicense, and
        distribute the Artifact and Derivatives thereof.
        
        3. Grant of Patent License. Conditioned upon compliance with Section III
        below and subject to the terms and conditions of this License, and only
        where and as applicable, each Contributor hereby grants to You, only in
        connection with the Permitted Purpose, a worldwide, non-exclusive,
        royalty-free, irrevocable (except as stated in this paragraph) patent
        license to make, have made, use, sell, offer to sell, import, and
        otherwise transfer the Artifact where such license applies only to those
        patent claims licensable by such Contributor that are necessarily
        infringed by their Contribution(s) alone or by combination of their
        Contribution(s) with the Artifact to which such Contribution(s) was
        submitted. If You institute patent litigation against any entity
        (including a cross-claim or counterclaim in a lawsuit) alleging that the
        Artifact and/or a Contribution incorporated within the Artifact
        constitutes direct or contributory patent infringement, then any patent
        licenses granted to You under this License in connection with the
        Artifact shall terminate as of the date such litigation is asserted or
        filed.
        
        Licensor and Contributor each have the right to grant the licenses
        above.
        
        Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
        
        4. Use-based Restrictions. The restrictions contained in the AMD
        Responsible AI Use Policy set forth in Attachment A are mandatory Use-
        based restrictions. Therefore You may not Use the Artifact in violation
        of such restrictions. You may Use the Artifact only subject to this
        License; if Section II is held unenforceable or inapplicable, this
        Section III will continue to govern any use of the Artifact. You shall
        require all of Your users who Use the Artifact or its Derivative
        to comply with the terms and conditions of this License, including
        those contained in this paragraph, and only for the Permitted Purpose.
        
        5. The Output You Generate with a Model (as Artifact). Except as set
        forth herein, Licensor claims no rights in the Output You generate. You
        are accountable for the Output You generate and its subsequent uses. No
        use of the Output may contravene any provision as stated in this
        License.
        
        6. Distribution and Redistribution. You may host for Third Party remote
        access purposes (e.g. software-as-a-service), reproduce and distribute
        copies of the Artifact or its Derivatives in any medium, with or without
        modifications, provided that You meet the following conditions:
        
        6.1.  Use-based restrictions in paragraph 4 MUST be included as a
              condition precedent to effect any type of legal agreement (e.g. a
              license) governing the use and/or distribution of the Artifact or
              its Derivatives, and You shall give such notice to any subsequent
              Third Party recipients;
        6.2.  You shall give any Third Party recipients of the Artifact or its
              Derivatives a copy of this License;
        6.3.  You shall cause any modified files to carry prominent notices
              stating that You changed the files;
        6.4.  You shall retain all copyright, patent, trademark, and attribution
              notices excluding those notices that do not pertain to any part of
              the Artifact or its Derivatives.
        6.5.  You and any Third Party recipients of the Artifact or its
              Derivative shall adhere to the Permitted Purpose.
        
        You may add Your own copyright statement to Your modifications and may
        provide additional or different license terms and conditions with
        respect to paragraph 6.1., to govern the use, reproduction, or
        Distribution of Your modifications, or for any Derivative, provided that
        Your use, reproduction, and Distribution of the Artifact or its
        Derivative otherwise complies with the conditions stated in this
        License. In other words, the Use-based restrictions in Attachment A form
        the minimum set of terms for You to license to Third Parties any
        Artifact or its Derivative, but You may add more restrictive terms if
        You deem it necessary.
        
        Section IV: OTHER PROVISIONS
        
        7. Updates and Runtime Restrictions. To the maximum extent permitted by
        law, Licensor reserves the right to restrict (remotely or otherwise)
        usage of the Artifact in violation of this License or update the
        Artifact through electronic means.
        
        8. Trademarks and Related. Nothing in this License permits You to make
        use of Licensors’ trademarks, trade names, logos or to otherwise suggest
        endorsement or misrepresent the relationship between the parties; and
        any rights not expressly granted herein are reserved by the Licensors.
        
        9. Disclaimer of Warranty. Unless required by applicable law or agreed
        to in writing, Licensor provides the Artifact (and each Contributor
        provides its Contributions) on an “AS IS” BASIS, WITHOUT WARRANTIES OR
        CONDITIONS OF ANY KIND, either express or implied, including, without
        limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT,
        MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely
        responsible for determining the appropriateness of using the Artifact,
        and assume any risks associated with Your exercise of permissions under
        this License.
        
        10. Limitation of Liability. In no event and under no legal theory,
        whether in tort (including negligence), contract, or otherwise, unless
        required by applicable law (such as deliberate and grossly negligent
        acts) or agreed to in writing, shall any Contributor be liable to You
        for damages, including any direct, indirect, special, incidental, or
        consequential damages of any character arising as a result of this
        License or out of the use or inability to use the Artifact (including
        but not limited to damages for loss of goodwill, work stoppage, computer
        failure or malfunction, or any and all other commercial damages or
        losses), even if such Contributor has been advised of the possibility of
        such damages.
        
        11. If any provision of this License is held to be invalid, illegal or
        unenforceable, the remaining provisions shall be unaffected thereby and
        remain valid as if such provision had not been set forth herein.
        
        12. Term and Termination. The term of this License will commence upon
        the earlier of Your (a) acceptance of this License or (b) accessing the
        Artifact; and will continue in full force and effect until terminated in
        accordance with the terms and conditions herein. Licensor may terminate
        this License if You are in breach of any term or condition of this
        License. Upon termination of this License, all licenses granted to You
        will terminate and You must promptly delete and cease use of the
        Artifact. Sections 1, 7, 8, 9, 10, 11, and 12 survive termination of
        this License.
        
        END OF TERMS AND CONDITIONS
        
        Attachment A
        
        AMD Responsible AI Use Policy
        
        AMD is committed to the responsible use of its Artificial Intelligence
        (AI) products and technologies (“AMD AI”).  AMD AI may include
        artificial intelligence or machine learning technologies that use
        algorithms to analyze data and generate output using predictions based
        on patterns in data.  This policy explains the uses that AMD
        specifically prohibits.
        
        If you use any AMD AI, you are agreeing to use the AMD AI in compliance
        with applicable laws and not for any of the following prohibited uses.
        
        Prohibited Uses:
        
        1) No Illegal Acts.  Do not use AMD AI in violation of any applicable
        national, state, local, or other jurisdictional law, rule, regulation,
        or sanction.
        
        2) No Explicit Content.  Do not use AMD AI to submit (as input),
        generate, or disseminate content depicting violent or sexually explicit
        content or to create sexual chatbots.
        
        3) No Harm.  Do not use AMD AI for any potentially harmful uses,
           including fraud, deception, discrimination, abuse, or harassment,
           including the following:
        
           a) Harm or abuse of a minor, including grooming and child sexual
              exploitation.
        
           b) Impersonation of human beings for purposes of deception.
        
           c) Generation or dissemination of information you know to be false
              for the purpose of harming others.
        
           d) Intentionally defame, disparage, or otherwise harass others.
        
           e) Intentionally attempting to materially distort the behavior of a
              person in a manner that causes or is likely to cause that person
              or another person physical or psychological harm.
        
           f) Providing medical advice or interpretation of medical results that
              is intended to be a substitute for professional medical advice,
              diagnosis, or treatment.
        
           g) Engaging in the unlawful or unauthorized practice of any
              profession, including financial, legal, medical, health, or
              related professional practices.
        
           h) Judgment of, discrimination against, or harm to individuals or
              groups based on legally protected characteristics or categories,
              online or offline social behavior, or known or predicted personal
              or personality characteristics, including any of the foregoing
              uses in social credit systems.
        
        4) No High-Risk Activity.  Do not use AMD AI in any high-risk activities
         or applications that create a risk of personal injury, death, or
        severe property or environmental damage, including in weapons or
        military applications.
        
        5) No Personal Information.  Do not use AMD AI to collect, process, or
        disclose personal data, including health or sensitive personal
        information, without the necessary rights or consents.
        
        6) No Infringement.  Do not use AMD AI to generate or disseminate any
        information that infringes upon or misappropriates the intellectual
        property rights of others, including copyright, trademark, patent, and
        trade secret rights, rights to privacy, and publicity rights.
        
        7) No Malware.  Do not use AMD AI to generate or disseminate malware or
        any other content to be used for the purpose of facilitating unpermitted
        access to, or use of, computer systems or data.
        
        8) No Obfuscation.  Do not inappropriately obfuscate or fail to disclose
        to end users the presence of AI in any application in which AMD AI is
        deployed, along with any known risks or dangers of using AI without
        appropriate safeguards, oversight and human control.
        
        9) No Reliance.  Do not rely on any information generated using AMD AI
        without assessing it for accuracy, potential for harm, or other specific
        risks applicable to the use case.
        
Project-URL: Homepage, https://github.com/AMD-AIG-AIMA/Instella
Project-URL: Repository, https://github.com/AMD-AIG-AIMA/Instella
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICES
Requires-Dist: numpy
Requires-Dist: omegaconf
Requires-Dist: rich
Requires-Dist: boto3
Requires-Dist: google-cloud-storage
Requires-Dist: tokenizers
Requires-Dist: packaging
Requires-Dist: cached_path>=1.6.2
Requires-Dist: transformers
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy<1.4,>=1.0; extra == "dev"
Requires-Dist: black<24.0,>=23.1; extra == "dev"
Requires-Dist: isort<5.13,>=5.12; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-sphinx; extra == "dev"
Requires-Dist: twine>=1.11.0; extra == "dev"
Requires-Dist: setuptools; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Requires-Dist: build; extra == "dev"
Provides-Extra: train
Requires-Dist: wandb; extra == "train"
Requires-Dist: click; extra == "train"
Requires-Dist: torchmetrics; extra == "train"
Requires-Dist: smashed[remote]>=0.21.1; extra == "train"
Requires-Dist: safetensors; extra == "train"
Requires-Dist: datasets; extra == "train"
Requires-Dist: scikit-learn; extra == "train"
Requires-Dist: msgspec>=0.14.0; extra == "train"
Provides-Extra: all
Requires-Dist: instella[dev,train]; extra == "all"
Dynamic: license-file

<div align="center">
  <br>
  <br>
  <h1>Instella✨: Fully Open Language Models with Stellar Performance</h1>
<a href='https://huggingface.co/collections/amd/instella-67c8a2c56e9198c85a97dd08'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a>
<a href='https://rocm.blogs.amd.com/artificial-intelligence/introducing-instella-3B/README.html'><img src='https://img.shields.io/badge/Technical-Blog-red'></a> 
</div>

This is a fork for testing only.

Instella is a family of state-of-the-art open language models trained on AMD Instinct™ MI300X GPUs by the AMD GenAI team. Instella models significantly outperform existing fully open language models of similar size and bridge the gap between fully open and open-weight models by achieving performance competitive with Llama-3.2-3B and Qwen2.5-3B. We provide the model weights, training code, and training data to accelerate the development of open-source language models. For our vision-language models, please check out [Instella-VL](https://github.com/AMD-AIG-AIMA/InstellaVL). For our long-context model, please go to [Instella-Long](https://github.com/AMD-AIG-AIMA/Instella/tree/instella-long).

<div align="center">
<img src="figs/scaling_perf_instruct.png" style="object-fit: contain;"/>
<em><b>Figure 1:</b> Pareto frontier of pre-training tokens vs average benchmark performance for pre-trained and instruct models.</em>

[^1]

</div>

[^1]: Even for instruct models, we compare against pre-training tokens because 1) the exact number of training tokens for open-weight instruct models is unknown, and 2) adding the instruct-model training tokens (in billions) shifts the trends only negligibly.

## Getting Started

### Installation
First install [PyTorch](https://pytorch.org) according to the instructions specific to your operating system. For AMD GPUs, you can also start from a [rocm/pytorch](https://hub.docker.com/r/rocm/pytorch/tags?name=pytorch) Docker image.

To install from source (recommended for training/fine-tuning) run:

```bash
git clone https://github.com/AMD-AIG-AIMA/Instella.git
cd Instella
# install Flash-Attention on MI300X
GPU_ARCH=gfx942 MAX_JOBS=$(nproc) pip install git+https://github.com/Dao-AILab/flash-attention.git -v
# install other dependencies
pip install -e .[all]
```

### Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "amd/Instella-3B-Instruct"

# trust_remote_code=True lets transformers run the model code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)

# Format the conversation with the model's chat template and tokenize it.
prompt = [{"role": "user", "content": "What are the benefits of open-source AI research?"}]
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    return_tensors='pt'
)

# Sample up to 1024 new tokens at temperature 0.8.
tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=1024,
    temperature=0.8,
    do_sample=True
)

# The decoded text contains the prompt followed by the reply, with special tokens kept.
print(tokenizer.decode(tokens[0], skip_special_tokens=False))
```

### Chat in TRL
You can also use the TRL CLI to chat with the model from the terminal:
```bash
pip install trl
trl chat --model_name_or_path amd/Instella-3B-Instruct --trust_remote_code --max_new_tokens 1024

# <root>:
# which is bigger 9.8 or 9.11?

# <amd/Instella-3B-Instruct>:
# 9.8 is bigger than 9.11. The difference between the two numbers is 0.69 (9.8 - 9.11 = 0.69), which indicates that 9.8 is 0.69 units larger than 9.11.  
```


## Pre-Training 

### Data Preparation
We use the [OLMoE-mix-0924](https://huggingface.co/datasets/allenai/OLMoE-mix-0924) dataset for stage 1 pretraining. After downloading the dataset, run the following to tokenize the text data:
```bash
pip install dolma
bash scripts/prepare_pretrain_data_stage1.sh
```

To prepare the second stage training data, download the [dolmino-mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124), [python-edu](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus#downloading-the-data), [dm_math](https://huggingface.co/datasets/LLM360/TxT360/tree/main/data/dm_maths) datasets, and run the data preparation script:
```bash
bash scripts/prepare_pretrain_data_stage2.sh
```

### Training
The configs used to train the Instella-3B models are provided in the [`configs`](https://github.com/AMD-AIG-AIMA/Instella/blob/main/configs/) directory. 

Once you've updated the data paths in the config you can launch a training run via `torchrun`. For example, to launch the 3B model training on a single 8x GPU node, you would run:

```bash
torchrun --nproc_per_node=8 scripts/train.py configs/instella-3b-pretrain-stage1.yaml
```

To resume training from a checkpoint, pass its path to `scripts/train.py` with the `--load_path` argument. For example, to resume from the latest checkpoint of the stage 1 pretraining run:

```bash
torchrun --nproc_per_node=8 scripts/train.py configs/instella-3b-pretrain-stage1.yaml --load_path output/pretrain/Instella-3B-pretrain-stage1/latest
```
To launch multi-node jobs, run the following on each of the nodes:
```bash
torchrun --nnodes=$NUM_NODES --nproc_per_node=8 --rdzv_id=$JOB_ID --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT scripts/train.py configs/instella-3b-pretrain-stage1.yaml
```
where `NUM_NODES` is the total number of nodes, `JOB_ID` is the user-defined job id, `MASTER_ADDR` is the IP address of the master node and `MASTER_PORT` is the port on the `MASTER_ADDR` that can be used to host the C10d TCP store. Please refer to this [documentation](https://pytorch.org/docs/stable/elastic/run.html) for `torchrun` to understand the arguments to configure the rendezvous backend for multi-node training. 
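The single-node, resume, and multi-node commands above differ only in a few flags, so a launcher script can assemble them programmatically. The following is a minimal sketch, not part of this repository: `build_torchrun_command` is a hypothetical helper, and the job id, address, and port in the example are placeholder values. It uses only the `torchrun` flags and the `--load_path` argument already shown above.

```python
import shlex
from typing import List, Optional


def build_torchrun_command(
    config_path: str,
    nproc_per_node: int = 8,
    num_nodes: int = 1,
    job_id: Optional[str] = None,
    master_addr: Optional[str] = None,
    master_port: Optional[int] = None,
    load_path: Optional[str] = None,
) -> List[str]:
    """Assemble the torchrun argument list for scripts/train.py (hypothetical helper)."""
    cmd = ["torchrun"]
    if num_nodes > 1:
        # Multi-node runs need the rendezvous settings described above.
        if job_id is None or master_addr is None or master_port is None:
            raise ValueError("multi-node runs need job_id, master_addr and master_port")
        cmd += [
            f"--nnodes={num_nodes}",
            f"--nproc_per_node={nproc_per_node}",
            f"--rdzv_id={job_id}",
            "--rdzv_backend=c10d",
            f"--rdzv_endpoint={master_addr}:{master_port}",
        ]
    else:
        cmd.append(f"--nproc_per_node={nproc_per_node}")
    cmd += ["scripts/train.py", config_path]
    if load_path is not None:
        # Resume from an existing checkpoint.
        cmd += ["--load_path", load_path]
    return cmd


# Example: a two-node run. The job id, address, and port are placeholders.
cmd = build_torchrun_command(
    "configs/instella-3b-pretrain-stage1.yaml",
    num_nodes=2,
    job_id="instella-demo",
    master_addr="10.0.0.1",
    master_port=29500,
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` on each node; printing it with `shlex.join` first makes it easy to compare against the commands above.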

For the second stage pretraining, we trained the model from the first stage checkpoints with three random seeds (see the configs: [5796](./configs/instella-3b-pretrain-stage2-seed-5796.yaml), [6198](./configs/instella-3b-pretrain-stage2-seed-6198.yaml), and [8915](./configs/instella-3b-pretrain-stage2-seed-8915.yaml)), and then merged the checkpoints with [this script](./scripts/merge_ckpts.py).
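This README does not describe how `scripts/merge_ckpts.py` combines the three seed checkpoints. One common way to merge runs that share an initialization is uniform weight averaging, and the sketch below illustrates that idea only; it is an assumption, not a description of the actual script. Plain Python lists stand in for tensors, and `average_checkpoints` is a hypothetical name.

```python
from typing import Dict, List, Sequence

# Stand-in for a real state dict: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_checkpoints(state_dicts: Sequence[StateDict]) -> StateDict:
    """Uniformly average parameters across checkpoints (assumed merge rule)."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    first = state_dicts[0]
    for sd in state_dicts[1:]:
        if set(sd) != set(first):
            raise ValueError("checkpoints have different parameter names")
        for name in first:
            if len(sd[name]) != len(first[name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")
    n = len(state_dicts)
    merged: StateDict = {}
    for name in first:
        # Average each parameter element-wise across all checkpoints.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [sum(col) / n for col in columns]
    return merged


# Toy example with three "seed" checkpoints holding a single parameter.
seed_runs = [
    {"layer.weight": [1.0, 2.0]},
    {"layer.weight": [2.0, 4.0]},
    {"layer.weight": [3.0, 6.0]},
]
merged = average_checkpoints(seed_runs)
print(merged)  # {'layer.weight': [2.0, 4.0]}
```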

## Supervised Fine-tuning (SFT)

### Data Preparation
Run the following commands to prepare the SFT data:
```bash
bash scripts/prepare_sft_data.sh
```
### Training 
Launch the SFT job with the [SFT config file](./configs/instella-3b-sft.yaml):

```bash
torchrun --nproc_per_node=8 scripts/train.py configs/instella-3b-sft.yaml
```

Note: please make sure to update `load_path` to point to your final pretraining checkpoint.

## Direct Preference Optimization (DPO)
We conduct DPO after SFT using [open-instruct](https://github.com/allenai/open-instruct/tree/main) with this [commit](https://github.com/allenai/open-instruct/tree/bcb991d4d9b297dc301e03ebaaa5d80dd76bb384/). Please follow their instructions to install the package and then run the DPO training:

```bash
accelerate launch \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes 8 \
    --use_deepspeed \
    --deepspeed_config_file configs/ds_stage2.conf \
    scripts/dpo_tune.py \
    configs/instella-3b-dpo.yaml
```

## Evaluation

Please refer to this [folder](./scripts/evals) for detailed instructions for model evaluation.

## Generate GSM8k Synthetic Data

Synthetic data generation for GSM8k is a multi-step process:
1. Original question -> Masked question (The numerical values in the question are replaced by variables).
2. Masked question -> Program (Code to solve the masked question).
3. Program -> Perturbed questions (New questions where the values have been perturbed).
4. Perturbed questions -> Chain of thought solutions. 

Some steps are repeated until we know that the output is correct. Specifically, in steps 2 and 4 we already know the answer, so if the answer from a generated program (or CoT) doesn't match the expected answer, we re-run the previous steps.
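The four steps and the answer check can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the code in `scripts/`: in the real pipeline the masked question, the program, and the chain-of-thought solutions are generated by a language model, whereas here they are hard-coded, and every name below (`masked_question`, `program`, `perturb`, and so on) is hypothetical.

```python
import random
from typing import Callable, Dict

Values = Dict[str, int]

# Step 1 (toy): the numbers in the original question are replaced by variables.
original_values: Values = {"a": 3, "b": 4}
expected_answer = 12
masked_question = "Tom buys {a} boxes with {b} pens each. How many pens does he have?"


# Step 2 (toy): a program that solves the masked question.
def program(values: Values) -> int:
    return values["a"] * values["b"]


def program_is_valid(prog: Callable[[Values], int], values: Values, answer: int) -> bool:
    """Keep a program only if it reproduces the known answer of the original question."""
    return prog(values) == answer


# Step 3 (toy): draw new values for the variables to create a perturbed question.
def perturb(values: Values, rng: random.Random, low: int = 2, high: int = 20) -> Values:
    return {name: rng.randint(low, high) for name in values}


def make_sample(rng: random.Random) -> Dict[str, object]:
    values = perturb(original_values, rng)
    return {
        "question": masked_question.format(**values),
        "answer": program(values),  # ground truth for checking step 4
        "values": values,
    }


# Step 4 (toy): a chain-of-thought solution is kept only if its final answer
# agrees with the program's answer for the perturbed question.
def cot_is_consistent(cot_final_answer: int, sample: Dict[str, object]) -> bool:
    return cot_final_answer == sample["answer"]


assert program_is_valid(program, original_values, expected_answer)
sample = make_sample(random.Random(0))
print(sample["question"], "->", sample["answer"])
```

Because the program supplies a ground-truth answer for every perturbed question, step 4 can be checked automatically, which is what makes the retry loop described above possible.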

For steps 1 and 2, please run the following command:
```bash
python -W ignore scripts/generate_gsm8k_programs.py
```
  
For steps 3 and 4, please run the following command:
```bash
python -W ignore scripts/generate_gsm8k_new_samples.py 0
```

## Additional Resources

### Hugging Face Model Cards

- Pre-trained models:
  - Instella-3B-Stage1: [amd/Instella-3B-Stage1](https://huggingface.co/amd/Instella-3B-Stage1), First stage pre-training checkpoint.
  - Instella-3B: [amd/Instella-3B](https://huggingface.co/amd/Instella-3B), Final pre-training checkpoint.
- Instruction-tuned models:
  - Instella-3B-SFT: [amd/Instella-3B-SFT](https://huggingface.co/amd/Instella-3B-SFT), Supervised fine-tuned checkpoint.
  - Instella-3B-Instruct: [amd/Instella-3B-Instruct](https://huggingface.co/amd/Instella-3B-Instruct), Final Instruction-tuned checkpoint.

### Datasets

Second stage pre-training GSM8k synthetic dataset: [amd/Instella-GSM8K-synthetic](https://huggingface.co/datasets/amd/Instella-GSM8K-synthetic)

- The dataset consists of two splits: “train” and “train_119K”.
- For Instella-3B model second stage pre-training we used the “train_119K” split, which is a subset of the larger “train” split.

Please refer to the following blogs to get started with using these techniques on AMD GPUs:

- [PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm™](https://rocm.blogs.amd.com/artificial-intelligence/fsdp-training-pytorch/README.html)
- [Accelerating Large Language Models with Flash Attention on AMD GPUs](https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html)
- [Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm™](https://rocm.blogs.amd.com/artificial-intelligence/torch_compile/README.html)
- [Introducing the First AMD 1B Language Models: AMD OLMo](https://www.amd.com/en/developer/resources/technical-articles/introducing-the-first-amd-1b-language-model.html)
 
## Acknowledgement
This codebase is built from [OLMo](https://github.com/allenai/OLMo/tree/main).

## License

- The Instella-3B models are licensed for academic and research purposes under a ResearchRAIL license. 
- The [amd/Instella-GSM8K-synthetic](https://huggingface.co/datasets/amd/Instella-GSM8K-synthetic) dataset used in second stage pre-training is built with Qwen2.5-72B-Instruct, and is licensed for academic and research purposes under a ResearchRAIL license. Refer to the [LICENSE](https://huggingface.co/datasets/amd/Instella-GSM8K-synthetic/blob/main/LICENSE) and [NOTICES](https://huggingface.co/datasets/amd/Instella-GSM8K-synthetic/blob/main/NOTICES) in the [amd/Instella-GSM8K-synthetic](https://huggingface.co/datasets/amd/Instella-GSM8K-synthetic) dataset card files for more information.
- Refer to the [LICENSE](./LICENSE) and [NOTICES](./NOTICES) files for more information.

## Citations
Feel free to cite our Instella-3B models and give us a star⭐ if you find our work helpful :)

```text
@misc{Instella,
    title = {Instella: Fully Open Language Models with Stellar Performance},
    url = {https://huggingface.co/amd/Instella-3B},
    author = {Jiang Liu and Jialian Wu and Xiaodong Yu and Prakamya Mishra and Sudhanshu Ranjan and Zicheng Liu and Chaitanya Manem and Yusheng Su and Pratik Prabhanjan Brahma and Gowtham Ramesh and Ximeng Sun and Ze Wang and Emad Barsoum},
    month = {March},
    year = {2025}
}
```
