Core API Reference
===================

Main Functions
--------------

process_references()
~~~~~~~~~~~~~~~~~~~~

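The primary entry point for processing citations. It runs the full pipeline on the input and returns the formatted citations together with a processing report.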

**Signature:**

.. code-block:: python

    def process_references(
        input_content: str,
        input_type: str,
        template_name: str,
        output_format: str,
        interactive_callback: Callable[[List[Dict]], int]
    ) -> Dict[str, Any]

**Parameters:**

- ``input_content`` (str): The reference content to process, as plain text or BibTeX depending on ``input_type`` (required)
- ``input_type`` (str): Type of input - "txt" or "bib" (required)
- ``template_name`` (str): Template name to use (e.g., "journal_article_full") (required)
- ``output_format`` (str): Output format - "bibtex", "apa", or "mla" (required)
- ``interactive_callback`` (Callable): Function to handle ambiguous matches. Takes a list of candidate dicts and returns the selected index (0-based), or -1 to skip (required)
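
The callback contract described above (a list of candidate dicts in, a 0-based index or -1 out) can be sketched without touching the library. The function name ``choose_candidate`` is hypothetical, and the sketch deliberately ignores the contents of each candidate dict, since their keys are not specified in this reference.

```python
from typing import Any, Dict, List


def choose_candidate(candidates: List[Dict[str, Any]]) -> int:
    """Return the index of the candidate to accept, or -1 to skip the entry."""
    # Accept only when there is a single, unambiguous candidate.
    if len(candidates) == 1:
        return 0
    # Zero or several candidates: skip rather than guess.
    return -1
```

A function like this could be passed as ``interactive_callback=choose_candidate`` in place of a lambda that always returns 0, so that ambiguous matches are skipped instead of silently accepted.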

**Returns:**

A dictionary with keys:

- ``results`` (List[str]): List of formatted citation strings
- ``report`` (dict): Processing report containing:
  
  - ``total`` (int): Total number of entries processed
  - ``succeeded`` (int): Number of successfully processed entries
  - ``failed_entries`` (List[Dict]): List of failed entries with error details
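
As a minimal sketch of how the documented keys fit together, the helper below builds a one-line summary from a result dictionary. The name ``summarize_result`` and the sample data are illustrative assumptions; the structure of each item in ``failed_entries`` is not specified here, so the sketch only counts them.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a one-line summary from the documented ``report`` keys."""
    report = result["report"]
    failed = len(report["failed_entries"])
    return (
        f"{report['succeeded']}/{report['total']} succeeded, "
        f"{failed} failed, {len(result['results'])} citations returned"
    )


# Hand-built stand-in for a real return value (illustrative data only).
example = {
    "results": ["<formatted citation>"],
    # The keys inside each failed entry are not documented, so it is left empty.
    "report": {"total": 2, "succeeded": 1, "failed_entries": [{}]},
}
print(summarize_result(example))
```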

**Example:**

.. code-block:: python

    from onecite import process_references
    
    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0  # Auto-select first match
    )
    
    # Access results
    for citation in result['results']:
        print(citation)
    
    # Check report
    print(f"Succeeded: {result['report']['succeeded']}/{result['report']['total']}")

Data Classes
------------

RawEntry
~~~~~~~~

A TypedDict representing an unprocessed reference entry (Stage 1).

**Type Definition:**

.. code-block:: python

    class RawEntry(TypedDict, total=False):
        id: int
        raw_text: str
        doi: Optional[str]
        url: Optional[str]
        query_string: Optional[str]
        original_entry: Optional[Dict[str, Any]]

**Attributes:**

- ``id`` (int): Entry identifier
- ``raw_text`` (str): The raw reference text
- ``doi`` (str, optional): Digital Object Identifier if detected
- ``url`` (str, optional): URL if detected
- ``query_string`` (str, optional): Search query string
- ``original_entry`` (dict, optional): Preserved original BibTeX entry fields
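
Because the type is declared with ``total=False``, every key may be omitted when an entry is built. The sketch below repeats the definition locally so that it runs without the package installed; in real code the class would presumably be imported rather than redefined.

```python
from typing import Any, Dict, Optional, TypedDict


class RawEntry(TypedDict, total=False):
    """Local copy of the definition above, so the sketch runs standalone."""

    id: int
    raw_text: str
    doi: Optional[str]
    url: Optional[str]
    query_string: Optional[str]
    original_entry: Optional[Dict[str, Any]]


# total=False means any key may be left out at construction time.
entry: RawEntry = {"id": 1, "raw_text": "10.1038/nature14539"}

# Read optional keys with .get() so a missing key yields None, not KeyError.
doi = entry.get("doi")
print(doi is None)
```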

IdentifiedEntry
~~~~~~~~~~~~~~~

A TypedDict representing an entry after identification from data sources (Stage 2).

**Type Definition:**

.. code-block:: python

    class IdentifiedEntry(TypedDict, total=False):
        id: int
        raw_text: str
        doi: Optional[str]
        arxiv_id: Optional[str]
        url: Optional[str]
        metadata: Optional[Dict[str, Any]]
        status: str

**Attributes:**

- ``id`` (int): Entry identifier
- ``raw_text`` (str): Original raw text
- ``doi`` (str, optional): Digital Object Identifier
- ``arxiv_id`` (str, optional): arXiv identifier
- ``url`` (str, optional): Conference or other URL
- ``metadata`` (dict, optional): Additional metadata from various sources
- ``status`` (str): Status - 'identified' or 'identification_failed'
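
The two ``status`` values make it straightforward to separate entries that were identified from those that were not. The helper name ``split_by_status`` and the sample entries are hypothetical; plain dicts stand in for ``IdentifiedEntry`` so the sketch is self-contained.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def split_by_status(entries: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Partition entries into (identified, identification_failed)."""
    identified = [e for e in entries if e.get("status") == "identified"]
    failed = [e for e in entries if e.get("status") == "identification_failed"]
    return identified, failed


# Illustrative entries using only the documented keys.
entries = [
    {
        "id": 1,
        "raw_text": "10.1038/nature14539",
        "doi": "10.1038/nature14539",
        "status": "identified",
    },
    {"id": 2, "raw_text": "unresolvable text", "status": "identification_failed"},
]

identified, failed = split_by_status(entries)
print(len(identified), len(failed))
```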

CompletedEntry
~~~~~~~~~~~~~~~

A TypedDict representing a fully processed entry with all metadata (Stage 3).

**Type Definition:**

.. code-block:: python

    class CompletedEntry(TypedDict, total=False):
        id: int
        doi: str
        status: str
        bib_key: str
        bib_data: Dict[str, Any]

**Attributes:**

- ``id`` (int): Entry identifier
- ``doi`` (str): Digital Object Identifier
- ``status`` (str): Status - 'completed' or 'enrichment_failed'
- ``bib_key`` (str): BibTeX citation key (e.g., "LeCun2015Deep")
- ``bib_data`` (dict): Complete bibliographic data with all fields

**Note:** CompletedEntry is a TypedDict without methods. Use the ``FormatterModule`` from ``pipeline.py`` to convert entries to different output formats.
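
Since a completed entry is plain data, ordinary dictionary operations are enough to work with it. The sketch below builds a lookup from ``bib_key`` to ``bib_data`` for entries whose status is ``'completed'``. The helper name ``index_by_bib_key`` is hypothetical, and the field names inside ``bib_data`` are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def index_by_bib_key(entries: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map bib_key -> bib_data, keeping only entries marked 'completed'."""
    return {
        e["bib_key"]: e["bib_data"]
        for e in entries
        if e.get("status") == "completed" and "bib_key" in e and "bib_data" in e
    }


# Illustrative entries; the keys inside bib_data are assumed, not documented.
entries = [
    {
        "id": 1,
        "doi": "10.1038/nature14539",
        "status": "completed",
        "bib_key": "LeCun2015Deep",
        "bib_data": {"title": "Deep learning", "year": "2015"},
    },
    {"id": 2, "status": "enrichment_failed"},
]

index = index_by_bib_key(entries)
print(sorted(index))
```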

Classes
-------

TemplateLoader
~~~~~~~~~~~~~~

Manages citation templates by loading YAML template files.

**Constructor:**

.. code-block:: python

    TemplateLoader(templates_dir: Optional[str] = None)

**Parameters:**

- ``templates_dir`` (str, optional): Custom template directory path. If None, uses the built-in ``onecite/templates/`` directory.

**Methods:**

- ``load_template(template_name: str) -> Dict[str, Any]``: Load a YAML template by name. Returns the template dictionary, or a default template if no template with that name is found.

**Example:**

.. code-block:: python

    from onecite import TemplateLoader
    
    # Use default templates directory
    loader = TemplateLoader()
    template = loader.load_template("journal_article_full")
    print(template['name'])
    
    # Use custom templates directory
    custom_loader = TemplateLoader(templates_dir="/path/to/templates")
    custom_template = custom_loader.load_template("my_custom_template")

PipelineController
~~~~~~~~~~~~~~~~~~~

Manages the 4-stage processing pipeline (Parse → Identify → Enrich → Format).

**Constructor:**

.. code-block:: python

    PipelineController(use_google_scholar: bool = False)

**Parameters:**

- ``use_google_scholar`` (bool): Whether to enable Google Scholar as a data source. Default is False.

**Methods:**

- ``process(input_content: str, input_type: str, template_name: str, output_format: str, interactive_callback: Callable) -> Dict[str, Any]``: Execute the complete 4-stage processing pipeline

**Note:** Most users should use the ``process_references()`` function instead, which provides a simpler interface. PipelineController is a lower-level API for advanced use cases.

**Example:**

.. code-block:: python

    from onecite import PipelineController
    
    controller = PipelineController()
    result = controller.process(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )
    
    print(result['results'])

Exceptions
----------

All exceptions inherit from ``OneCiteError``.
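
One consequence of the shared base class is that a single ``except OneCiteError`` clause catches every library-specific error. The sketch below demonstrates this with local stand-in classes that mirror the documented hierarchy; they are not the library's own definitions, and ``run_safely`` and ``failing_task`` are hypothetical helpers.

```python
from typing import Any, Callable


class OneCiteError(Exception):
    """Stand-in for the documented base exception."""


class ValidationError(OneCiteError):
    """Stand-in: entry validation failed."""


class ParseError(OneCiteError):
    """Stand-in: parsing the input failed."""


class ResolverError(OneCiteError):
    """Stand-in: data source resolution failed."""


def run_safely(task: Callable[[], Any]) -> Any:
    """Run task; turn any OneCiteError subclass into a readable message."""
    try:
        return task()
    except OneCiteError as exc:
        return f"{type(exc).__name__}: {exc}"


def failing_task() -> None:
    raise ParseError("could not parse input")


print(run_safely(failing_task))
```

Exceptions outside the hierarchy, such as ``TypeError`` from a missing argument, are not caught by this clause and still propagate.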

OneCiteError
~~~~~~~~~~~~

Base exception for all OneCite errors.

ValidationError
~~~~~~~~~~~~~~~

Raised when entry validation fails.

.. code-block:: python

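    from onecite import ValidationError, process_references  # assumes a top-level export

    try:
        # All five arguments are required; they are passed positionally here
        result = process_references("", "txt", "journal_article_full", "bibtex", lambda candidates: 0)
    except ValidationError as e:
        print(f"Validation failed: {e}")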

ParseError
~~~~~~~~~~

Raised when parsing input fails.

.. code-block:: python

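    from onecite import ParseError, process_references  # assumes a top-level export

    try:
        # input_type="bib" with content that is not valid BibTeX
        result = process_references("invalid input", "bib", "journal_article_full", "bibtex", lambda candidates: 0)
    except ParseError as e:
        print(f"Parse failed: {e}")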

ResolverError
~~~~~~~~~~~~~

Raised when data source resolution fails.

.. code-block:: python

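    from onecite import ResolverError, process_references  # assumes a top-level export

    try:
        # Content that no data source can resolve
        result = process_references("nonexistent doi", "txt", "journal_article_full", "bibtex", lambda candidates: 0)
    except ResolverError as e:
        print(f"Resolver failed: {e}")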

Advanced Usage
--------------

Custom Data Processing
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from onecite import process_references
    
    # For most use cases, use process_references directly
    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )
    
    print('\n\n'.join(result['results']))
    
    # Access the processing report
    report = result['report']
    print(f"Total: {report['total']}, Succeeded: {report['succeeded']}")

Working with Templates
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from onecite import TemplateLoader, process_references
    
    # Load a template to inspect it
    loader = TemplateLoader()
    template = loader.load_template("journal_article_full")
    print(f"Template name: {template['name']}")
    print(f"Entry type: {template['entry_type']}")
    
    # To use a custom template, place it in the onecite/templates/ directory,
    # then reference it by name
    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="your_custom_template",  # without .yaml extension
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )

Next Steps
----------

- See :doc:`../python_api` for usage examples
- Check :doc:`../advanced_usage` for complex scenarios
- Review :doc:`../templates` for custom formatting
