Metadata-Version: 2.4
Name: ol-openedx-course-translations
Version: 0.6.0
Summary: An Open edX plugin to translate courses
Author: MIT Office of Digital Learning
License-Expression: BSD-3-Clause
License-File: LICENSE.txt
Keywords: Python,edx
Requires-Python: >=3.11
Requires-Dist: deepl>=1.25.0
Requires-Dist: django>=4.0
Requires-Dist: djangorestframework>=3.14.0
Requires-Dist: edx-opaque-keys
Requires-Dist: litellm==1.82.5
Requires-Dist: srt>=3.5.3
Description-Content-Type: text/x-rst

OL Open edX Course Translations
===============================

An Open edX plugin to manage course translations.

Purpose
*******

Translate course content into multiple languages to enhance accessibility for a global audience.

Setup
*****

For detailed installation instructions, please refer to the `plugin installation guide <../../docs#installation-guide>`_.

The plugin must be installed in both:

* Studio (CMS)
* LMS

Configuration
*************

- Add the following configuration values to the Open edX config files. For any release after Juniper, those files are ``/edx/etc/lms.yml`` and ``/edx/etc/cms.yml``; add the values at the top level of each. If you're using ``private.py`` instead, add the equivalent Python assignments (for example, ``COURSE_TRANSLATIONS_BASE_DIR = "/openedx/data/course_translations/"``) to ``lms/envs/private.py`` and ``cms/envs/private.py``. **Ask a fellow developer for these values.**

  .. code-block:: yaml

       # Output directory for translated courses
       # Default: /openedx/data/course_translations/
       COURSE_TRANSLATIONS_BASE_DIR: "/openedx/data/course_translations/"

       # Translation providers configuration
       TRANSLATIONS_PROVIDERS: {
           "default_provider": "mistral",  # Default provider to use
           "deepl": {
               "api_key": "<YOUR_DEEPL_API_KEY>",
           },
           "openai": {
               "api_key": "<YOUR_OPENAI_API_KEY>",
               "default_model": "gpt-5.2",
           },
           "gemini": {
               "api_key": "<YOUR_GEMINI_API_KEY>",
               "default_model": "gemini-3-pro-preview",
           },
           "mistral": {
               "api_key": "<YOUR_MISTRAL_API_KEY>",
               "default_model": "mistral-large-latest",
           },
       }
       LITE_LLM_REQUEST_TIMEOUT: 300  # Timeout for LLM API requests in seconds

- For Tutor installations, these values can also be managed through a `custom Tutor plugin <https://docs.tutor.edly.io/tutorials/plugin.html#plugin-development-tutorial>`_.

Translation Providers
*********************

The plugin supports multiple translation providers:

- DeepL
- OpenAI (GPT models)
- Gemini (Google)
- Mistral

**Configuration**

All providers are configured through the ``TRANSLATIONS_PROVIDERS`` dictionary in your settings:

.. code-block:: python

    TRANSLATIONS_PROVIDERS = {
        "default_provider": "mistral",  # Optional: default provider for commands
        "deepl": {
            "api_key": "<YOUR_DEEPL_API_KEY>",
        },
        "openai": {
            "api_key": "<YOUR_OPENAI_API_KEY>",
            "default_model": "gpt-5.2",  # Optional: used when model not specified
        },
        "gemini": {
            "api_key": "<YOUR_GEMINI_API_KEY>",
            "default_model": "gemini-3-pro-preview",
        },
        "mistral": {
            "api_key": "<YOUR_MISTRAL_API_KEY>",
            "default_model": "mistral-large-latest",
        },
    }

**Important Notes:**

1. **DeepL Configuration**: The DeepL API key must be set in ``TRANSLATIONS_PROVIDERS['deepl']['api_key']``.

2. **DeepL for Subtitle Repair**: DeepL is used as a fallback repair mechanism for subtitle translations when LLM providers fail validation. Even if you use LLM providers for primary translation, you should configure DeepL to enable automatic repair.

3. **Default Models**: The ``default_model`` in each provider's configuration is used when you specify a provider without a model (e.g., ``openai`` instead of ``openai/gpt-5.2``).
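
The notes above can be checked before running a command. The following is a minimal sketch, not part of the plugin: ``check_providers_config`` and ``LLM_PROVIDERS`` are hypothetical names, and the check only inspects a plain dict shaped like ``TRANSLATIONS_PROVIDERS``.

.. code-block:: python

    # Hypothetical pre-flight check for a TRANSLATIONS_PROVIDERS-style dict.
    # Not part of the plugin; it only mirrors the notes above.
    LLM_PROVIDERS = ("openai", "gemini", "mistral")


    def check_providers_config(providers: dict) -> list[str]:
        """Return a list of human-readable problems (empty if none found)."""
        problems: list[str] = []

        # Notes 1 and 2: DeepL needs an API key, which is also used for subtitle repair.
        if not providers.get("deepl", {}).get("api_key"):
            problems.append("deepl: missing api_key (also needed for subtitle repair)")

        # Note 3: default_model is used when only a provider name is given.
        for name in LLM_PROVIDERS:
            cfg = providers.get(name)
            if cfg is None:
                continue  # not configured; only a problem if you select it
            if not cfg.get("api_key"):
                problems.append(f"{name}: missing api_key")
            if not cfg.get("default_model"):
                problems.append(f"{name}: no default_model, so '{name}' alone cannot be resolved")

        default = providers.get("default_provider")
        if default is not None and default not in providers:
            problems.append(f"default_provider '{default}' has no configuration block")

        return problems


    if __name__ == "__main__":
        example = {
            "default_provider": "mistral",
            "deepl": {"api_key": "<YOUR_DEEPL_API_KEY>"},
            "openai": {"api_key": "<YOUR_OPENAI_API_KEY>"},
        }
        for problem in check_providers_config(example):
            print(problem)

For the example dict, the check reports the missing ``default_model`` for ``openai`` and the missing ``mistral`` block referenced by ``default_provider``.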

**Provider Selection**

You can specify providers in three ways:

1. **Provider only** (uses default model from settings):

   .. code-block:: bash

       ./manage.py cms translate_course \
           --target-language ar \
           --course-dir /path/to/course.tar.gz \
           --content-translation-provider openai \
           --srt-translation-provider gemini

2. **Provider with specific model**:

   .. code-block:: bash

       ./manage.py cms translate_course \
           --target-language ar \
           --course-dir /path/to/course.tar.gz \
           --content-translation-provider openai/gpt-5.2 \
           --srt-translation-provider gemini/gemini-3-pro-preview

3. **DeepL** (no model needed):

   .. code-block:: bash

       ./manage.py cms translate_course \
           --target-language ar \
           --course-dir /path/to/course.tar.gz \
           --content-translation-provider deepl \
           --srt-translation-provider deepl

**Note:** If you specify a provider without a model (e.g., ``openai`` instead of ``openai/gpt-5.2``), the system will use the ``default_model`` configured in ``TRANSLATIONS_PROVIDERS`` for that provider.
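
The three accepted forms of a provider value can be summarized as a small resolution function. This is a sketch under assumptions: ``resolve_provider`` is a hypothetical helper, not the plugin's code, and its error handling is illustrative. It splits the value on the first ``/`` and falls back to the provider's ``default_model`` when no model is given.

.. code-block:: python

    # Hypothetical helper illustrating how a provider value such as "openai",
    # "openai/gpt-5.2" or "deepl" maps to a (provider, model) pair.
    def resolve_provider(spec: str, providers: dict) -> tuple[str, str | None]:
        provider, _, model = spec.partition("/")  # split on the first "/" only
        if provider == "deepl":
            return provider, None  # DeepL does not take a model name
        if provider not in providers:
            raise ValueError(f"{provider!r} is not configured in TRANSLATIONS_PROVIDERS")
        if model:
            return provider, model  # an explicit PROVIDER/MODEL wins
        default_model = providers[provider].get("default_model")
        if not default_model:
            raise ValueError(f"no model given and no default_model set for {provider!r}")
        return provider, default_model


    PROVIDERS = {
        "deepl": {"api_key": "<YOUR_DEEPL_API_KEY>"},
        "openai": {"api_key": "<YOUR_OPENAI_API_KEY>", "default_model": "gpt-5.2"},
        "gemini": {"api_key": "<YOUR_GEMINI_API_KEY>", "default_model": "gemini-3-pro-preview"},
    }

    print(resolve_provider("openai", PROVIDERS))                       # ('openai', 'gpt-5.2')
    print(resolve_provider("gemini/gemini-3-pro-preview", PROVIDERS))  # ('gemini', 'gemini-3-pro-preview')
    print(resolve_provider("deepl", PROVIDERS))                        # ('deepl', None)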

Translating a Course
********************

1. Open the course in Studio.
2. Go to Tools -> Export Course.
3. Export the course; Studio downloads it as a ``.tar.gz`` file.
4. Open a shell in the CMS environment.
5. Run the management command to translate the course:

   .. code-block:: bash

        ./manage.py cms translate_course \
            --source-language en \
            --target-language ar \
            --course-dir /path/to/course.tar.gz \
            --content-translation-provider openai \
            --srt-translation-provider gemini \
            --translation-validation-provider openai/gpt-5.2 \
            --content-glossary /path/to/content/glossary \
            --srt-glossary /path/to/srt/glossary

**Command Options:**

- ``--source-language``: Source language code (default: en)
- ``--target-language``: Target language code (required)
- ``--course-dir``: Path to exported course tar.gz file (required)
- ``--content-translation-provider``: Translation provider for content (XML/HTML and text) (required).

  Format:

  - ``deepl`` - uses DeepL (no model needed)
  - ``PROVIDER`` - uses provider with default model from settings (e.g., ``openai``, ``gemini``, ``mistral``)
  - ``PROVIDER/MODEL`` - uses provider with specific model (e.g., ``openai/gpt-5.2``, ``gemini/gemini-3-pro-preview``, ``mistral/mistral-large-latest``)

- ``--srt-translation-provider``: Translation provider for SRT subtitles (required). Same format as ``--content-translation-provider``.
- ``--translation-validation-provider``: Provider used to validate and fix the XML/HTML translations after the initial translation pass (optional)
- ``--content-glossary``: Path to glossary directory for content (XML/HTML and text) translation (optional)
- ``--srt-glossary``: Path to glossary directory for SRT subtitle translation (optional)
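
Django management commands declare their options in ``BaseCommand.add_arguments()``, which receives an ``argparse``-compatible parser. As a sketch of the option surface listed above (the required flags, the ``en`` default, and the optional validation and glossary flags), the options can be mirrored with plain ``argparse``. ``build_parser`` is a hypothetical name, and the plugin's actual ``add_arguments`` may differ in help text, validation, or additional flags.

.. code-block:: python

    import argparse


    # Hypothetical mirror of the documented translate_course options.
    def build_parser() -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(prog="translate_course")
        parser.add_argument("--source-language", default="en")
        parser.add_argument("--target-language", required=True)
        parser.add_argument("--course-dir", required=True, help="exported course .tar.gz")
        parser.add_argument("--content-translation-provider", required=True)
        parser.add_argument("--srt-translation-provider", required=True)
        parser.add_argument("--translation-validation-provider")  # optional
        parser.add_argument("--content-glossary")  # optional glossary directory
        parser.add_argument("--srt-glossary")  # optional glossary directory
        return parser


    args = build_parser().parse_args([
        "--target-language", "ar",
        "--course-dir", "/path/to/course.tar.gz",
        "--content-translation-provider", "openai",
        "--srt-translation-provider", "gemini",
    ])
    print(args.source_language, args.content_glossary)  # en None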

**Examples:**

.. code-block:: bash

    # Use DeepL for both content and subtitles
    ./manage.py cms translate_course \
        --target-language ar \
        --course-dir /path/to/course.tar.gz \
        --content-translation-provider deepl \
        --srt-translation-provider deepl

    # Use OpenAI and Gemini with default models from settings
    ./manage.py cms translate_course \
        --target-language fr \
        --course-dir /path/to/course.tar.gz \
        --content-translation-provider openai \
        --srt-translation-provider gemini

    # Use OpenAI with specific model for content, Gemini with default for subtitles
    ./manage.py cms translate_course \
        --target-language fr \
        --course-dir /path/to/course.tar.gz \
        --content-translation-provider openai/gpt-5.2 \
        --srt-translation-provider gemini

    # Use Mistral with specific model and separate glossaries for content and SRT
    ./manage.py cms translate_course \
        --target-language es \
        --course-dir /path/to/course.tar.gz \
        --content-translation-provider mistral/mistral-large-latest \
        --srt-translation-provider mistral/mistral-large-latest \
        --content-glossary /path/to/content/glossary \
        --srt-glossary /path/to/srt/glossary

    # Use different glossaries for content vs subtitles
    ./manage.py cms translate_course \
        --target-language es \
        --course-dir /path/to/course.tar.gz \
        --content-translation-provider openai \
        --srt-translation-provider gemini \
        --content-glossary /path/to/technical/glossary \
        --srt-glossary /path/to/conversational/glossary

**Glossary Support:**

You can use separate glossaries for content and subtitle translation. This allows you to apply different terminology choices based on context:

- **Content glossary** (``--content-glossary``): Used for XML/HTML content, policy files, and text-based course materials. Typically contains more formal or technical terminology.
- **SRT glossary** (``--srt-glossary``): Used for subtitle translation. Can contain more conversational or context-specific terms appropriate for spoken content.

Create language-specific glossary files in each glossary directory:

.. code-block:: text

    # Content glossary structure
    glossaries/technical/
    ├── ar.txt  # Arabic glossary
    ├── fr.txt  # French glossary
    └── es.txt  # Spanish glossary

    # SRT glossary structure
    glossaries/conversational/
    ├── ar.txt  # Arabic glossary
    ├── fr.txt  # French glossary
    └── es.txt  # Spanish glossary

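Format: one term mapping per line, written as ``- 'source_term' : 'translated_term'``. The file can also contain headings and short guidance text, as in this Spanish (``es.txt``) example: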

.. code-block:: text

    # es HINTS
    ## TERM MAPPINGS
    These are preferred terminology choices for this language. Use them whenever they sound natural; adapt freely if context requires.

    - 'accuracy' : 'exactitud'
    - 'activation function' : 'función de activación'
    - 'artificial intelligence' : 'inteligencia artificial'
    - 'AUC' : 'AUC'

**Note:** Both glossary arguments are optional. If not provided, translation will proceed without glossary terms. You can provide one, both, or neither glossary as needed.
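
How the plugin consumes these files internally is not described here, so the following is only a hypothetical helper for checking a glossary before a run; ``parse_glossary`` and ``load_glossary`` are illustrative names. It assumes files are named ``<target-language>.txt`` as in the directory layout above, and that term lines follow the ``- 'source' : 'target'`` form. Terms that themselves contain a single quote are not handled.

.. code-block:: python

    import re
    from pathlib import Path

    # Matches lines such as: - 'activation function' : 'función de activación'
    TERM_LINE = re.compile(r"^-\s*'(?P<source>[^']+)'\s*:\s*'(?P<target>[^']*)'\s*$")


    def parse_glossary(text: str) -> dict[str, str]:
        """Extract term mappings; headings and free-text hint lines are skipped."""
        terms: dict[str, str] = {}
        for line in text.splitlines():
            match = TERM_LINE.match(line.strip())
            if match:
                terms[match["source"]] = match["target"]
        return terms


    def load_glossary(glossary_dir: str | None, target_language: str) -> dict[str, str]:
        """Read <glossary_dir>/<target_language>.txt; no directory or file means no terms."""
        if glossary_dir is None:
            return {}  # glossary arguments are optional
        path = Path(glossary_dir) / f"{target_language}.txt"
        if not path.is_file():
            return {}
        return parse_glossary(path.read_text(encoding="utf-8"))


    if __name__ == "__main__":
        terms = load_glossary("glossaries/technical", "es")
        print(f"{len(terms)} term mappings loaded")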

Subtitle Translation and Validation
************************************

The course translation system translates subtitle (SRT) files with automatic validation and retry, so that translated subtitles keep the original block structure and timing information.

**Translation Process**

The subtitle translation follows a multi-stage process with built-in quality checks:

1. **Initial Translation**: Subtitles are translated using your configured provider (DeepL or LLM)
2. **Validation**: Timestamps, subtitle count, and content are validated to ensure integrity
3. **Automatic Retry**: If validation fails, the system automatically retries the translation once
4. **Task Failure**: If the retry also fails validation, the translation task fails to prevent corrupted subtitle files
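
The retry policy above amounts to at most two attempts per subtitle file. A minimal sketch, assuming hypothetical ``translate`` and ``validate`` callables (the latter returning a list of error strings), looks like this; it raises ``ValueError``, consistent with the behavior described under **Failure Handling**.

.. code-block:: python

    from typing import Callable

    MAX_RETRIES = 1  # one additional attempt after the initial translation


    def translate_with_retry(
        original_srt: str,
        translate: Callable[[str], str],
        validate: Callable[[str, str], list[str]],
        filename: str,
    ) -> str:
        """Return a translation that passes validation, or raise ValueError."""
        errors: list[str] = []
        for _ in range(1 + MAX_RETRIES):  # initial attempt plus retries
            translated = translate(original_srt)
            errors = validate(original_srt, translated)
            if not errors:
                return translated
        raise ValueError(
            f"{filename}: subtitle translation failed validation after "
            f"{1 + MAX_RETRIES} attempts: {errors}"
        )


    if __name__ == "__main__":
        attempts = iter(["broken", "1\n00:00:01,000 --> 00:00:02,000\nHola\n"])
        result = translate_with_retry(
            "1\n00:00:01,000 --> 00:00:02,000\nHello\n",
            translate=lambda text: next(attempts),  # first attempt fails, retry succeeds
            validate=lambda orig, trans: [] if "-->" in trans else ["timestamps missing"],
            filename="intro.srt",
        )
        print(result)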

**Validation Rules**

The system validates subtitle translations against these criteria:

- **Subtitle Count**: Translated file must have the same number of subtitle blocks as the original
- **Index Matching**: Each subtitle block index must match the original (e.g., if original has blocks 1-100, translation must have blocks 1-100 in the same order)
- **Timestamp Preservation**: Start and end times for each subtitle block must remain unchanged
- **Content Validation**: Non-empty original subtitles must have non-empty translations (blank translations are flagged as errors)
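
The four rules can be expressed as a single check over parsed subtitle blocks. This sketch is hypothetical (``validate_blocks`` and ``Block`` are illustrative names, not the plugin's code). ``Block`` has the same ``index``, ``start``, ``end`` and ``content`` fields as the ``srt.Subtitle`` objects yielded by ``srt.parse()`` from the ``srt`` package the plugin depends on, so parsed files can be passed in directly.

.. code-block:: python

    from dataclasses import dataclass
    from datetime import timedelta


    @dataclass
    class Block:
        """Stand-in with the same fields as srt.Subtitle."""

        index: int
        start: timedelta
        end: timedelta
        content: str


    def validate_blocks(original: list, translated: list) -> list[str]:
        """Return validation errors; an empty list means the translation passes."""
        # Rule 1: same number of subtitle blocks.
        if len(original) != len(translated):
            return [f"block count {len(translated)} != {len(original)}"]

        errors: list[str] = []
        for orig, trans in zip(original, translated):
            # Rule 2: indices must match, in the same order.
            if orig.index != trans.index:
                errors.append(f"block {orig.index}: index changed to {trans.index}")
            # Rule 3: start and end times must be unchanged.
            if (orig.start, orig.end) != (trans.start, trans.end):
                errors.append(f"block {orig.index}: timestamps changed")
            # Rule 4: non-empty source text must not become blank.
            if orig.content.strip() and not trans.content.strip():
                errors.append(f"block {orig.index}: empty translation")
        return errors


    if __name__ == "__main__":
        t = timedelta
        original = [Block(1, t(seconds=1), t(seconds=2), "Hello")]
        shifted = [Block(1, t(seconds=1), t(seconds=2.5), "Hola")]
        print(validate_blocks(original, shifted))  # ['block 1: timestamps changed']

With real files, pass ``list(srt.parse(original_text))`` and ``list(srt.parse(translated_text))``. The count check returns early because pairwise comparisons are meaningless once the blocks are misaligned.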

**Example Validation Process:**

.. code-block:: text

    1. Initial Translation (using OpenAI):
       ✓ 150 subtitle blocks translated
       ✗ Validation failed: 3 blocks have mismatched timestamps

    2. Retry Attempt:
       ✓ 150 subtitle blocks translated
       ✗ Validation failed: 2 blocks still have issues

    3. Task Failure:
       ❌ Translation failed after all retries
       ❌ Task aborted to prevent corrupted subtitle files

**Failure Handling**

If subtitle translation fails after all attempts:

- The translation task will fail with a ``ValueError``
- The entire course translation will be aborted to prevent incomplete translations
- The translated course directory will be automatically cleaned up
- An error message will indicate which subtitle file caused the failure
- No partial or corrupted translation files will be left behind
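
The cleanup guarantee can be implemented with a ``try``/``except`` that deletes the output directory and re-raises. A minimal sketch, with a hypothetical ``run_with_cleanup`` wrapper and a caller-supplied ``translate_course`` callable standing in for the real translation pipeline:

.. code-block:: python

    import shutil
    from pathlib import Path
    from typing import Callable


    def run_with_cleanup(output_dir: Path, translate_course: Callable[[Path], None]) -> None:
        """Run translate_course(output_dir); on failure remove output_dir and re-raise."""
        try:
            translate_course(output_dir)
        except Exception:
            # Delete the partially translated course so no partial files remain,
            # then re-raise so the caller still sees the original error (e.g. ValueError).
            shutil.rmtree(output_dir, ignore_errors=True)
            raise

Re-raising, rather than returning a status flag, keeps the original error message (for example, which subtitle file failed) visible to whoever ran the command.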

License
*******

The code in this repository is licensed under the AGPL 3.0 unless
otherwise noted.

Please see `LICENSE.txt <LICENSE.txt>`_ for details.
