scitex_scholar.metadata_engines

class scitex_scholar.metadata_engines.ScholarEngine(engines=None, config=None, use_cache=True, clear_cache=False)

Bases: object

Queries multiple metadata engines and aggregates their results to enrich paper metadata.

__init__(engines=None, config=None, use_cache=True, clear_cache=False)

_setup_cache(clear_cache=False)

Set up the cache directory and files.

_load_cache()

Load cache from file.

_save_cache()

Save cache to file.
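As a rough picture of how the three cache helpers above could fit together, here is a hypothetical stand-alone JSON file cache. The class name JsonCache, the cache_dir argument, the cache.json filename, and the JSON on-disk format are all assumptions for illustration; this reference does not document where or how ScholarEngine actually stores its cache.

```python
import json
import tempfile
from pathlib import Path


class JsonCache:
    """Hypothetical JSON-file cache mirroring _setup_cache, _load_cache and _save_cache."""

    def __init__(self, cache_dir, clear_cache=False):
        self._setup_cache(cache_dir, clear_cache)
        self.cache = self._load_cache()

    def _setup_cache(self, cache_dir, clear_cache=False):
        # Create the cache directory; optionally discard any existing cache file.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.cache_file = self.cache_dir / "cache.json"  # assumed filename
        if clear_cache and self.cache_file.exists():
            self.cache_file.unlink()

    def _load_cache(self):
        # A missing or corrupt file simply yields an empty cache.
        if not self.cache_file.exists():
            return {}
        try:
            return json.loads(self.cache_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return {}

    def _save_cache(self):
        # Persist the in-memory cache dict to disk as JSON.
        self.cache_file.write_text(json.dumps(self.cache, indent=2), encoding="utf-8")


# Usage: round-trip one entry through a throwaway directory.
demo_dir = tempfile.mkdtemp()
cache = JsonCache(demo_dir)
cache.cache["demo-key"] = {"title": "Example"}
cache._save_cache()
reloaded = JsonCache(demo_dir)
```

Treating a corrupt file as an empty cache keeps a damaged cache from blocking searches, at the cost of silently losing cached entries.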

_get_cache_key(title=None, doi=None, **kwargs)

Generate a cache key from the search parameters.
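One plausible way to build such a key is to serialize the normalized parameters deterministically and hash the result. The sketch below assumes keys are SHA-256 digests of sorted JSON and that string parameters are compared case- and whitespace-insensitively; neither detail is confirmed by this reference, and get_cache_key is a hypothetical name.

```python
import hashlib
import json


def get_cache_key(title=None, doi=None, **kwargs):
    """Hypothetical cache-key builder: hash the normalized search parameters."""
    params = {"title": title, "doi": doi, **kwargs}
    # Drop unset parameters and normalize strings so trivial differences
    # (case, surrounding whitespace) map to the same key.
    normalized = {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in params.items()
        if value is not None
    }
    # sort_keys makes the serialization independent of argument order.
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Hashing keeps keys a fixed length regardless of how long the title is, which is convenient if keys are ever used as filenames.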

async search_async(title=None, doi=None, **kwargs)

Search all engines and return combined results.

Return type: Optional[Dict[str, Dict]]
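A minimal sketch of the fan-out, assuming each engine can be treated as an async callable that takes title and doi and returns a metadata dict or None. The search_all function and the stand-in engines are hypothetical; the sketch only illustrates gathering per-engine results into the documented Optional[Dict[str, Dict]] shape, keyed by engine name.

```python
import asyncio


async def search_all(engines, title=None, doi=None):
    """Hypothetical fan-out: query every engine concurrently and keep the hits.

    `engines` maps an engine name to an async callable(title=..., doi=...)
    that returns a metadata dict or None.
    """
    names = list(engines)
    results = await asyncio.gather(
        *(engines[name](title=title, doi=doi) for name in names),
        return_exceptions=True,  # one failing engine must not sink the others
    )
    combined = {
        name: result
        for name, result in zip(names, results)
        if isinstance(result, dict) and result
    }
    # Mirror the documented Optional[Dict[str, Dict]] return type.
    return combined or None


# Stand-in engines for demonstration only.
async def engine_hit(title=None, doi=None):
    return {"title": title}


async def engine_miss(title=None, doi=None):
    return None


async def engine_down(title=None, doi=None):
    raise RuntimeError("service unavailable")


results = asyncio.run(
    search_all({"hit": engine_hit, "miss": engine_miss, "down": engine_down}, title="Example")
)
```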

async search_batch_async(titles=None, dois=None)

Search for multiple papers in a batch, processing them in parallel.

Return type: List[Optional[Dict[str, Dict]]]
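Batch processing of this kind is commonly done with asyncio.gather plus a semaphore to cap concurrency. The helper below is hypothetical: search_one, max_concurrency, and pairing titles with DOIs by position are assumptions, not the documented behaviour of search_batch_async. It does match the documented return shape of one optional result per paper, in input order.

```python
import asyncio


async def search_batch(search_one, titles=None, dois=None, max_concurrency=5):
    """Hypothetical batch search: run `search_one` for every paper in parallel.

    Titles and DOIs are paired by position and either list may be omitted.
    Results come back in input order, one Optional[dict] per paper.
    """
    titles = list(titles or [])
    dois = list(dois or [])
    count = max(len(titles), len(dois))
    # Pad the shorter list so every paper has a (title, doi) pair.
    titles += [None] * (count - len(titles))
    dois += [None] * (count - len(dois))

    semaphore = asyncio.Semaphore(max_concurrency)  # cap requests in flight

    async def run_one(title, doi):
        async with semaphore:
            return await search_one(title=title, doi=doi)

    # gather returns results in the order its awaitables were passed.
    return await asyncio.gather(*(run_one(t, d) for t, d in zip(titles, dois)))


# Stand-in search function for demonstration only.
async def fake_search(title=None, doi=None):
    await asyncio.sleep(0)
    return {"title": title, "doi": doi} if (title or doi) else None


results = asyncio.run(search_batch(fake_search, titles=["A", "B"], dois=["10.1000/a"]))
```

The semaphore matters when the batch is large: without it, every paper would hit every engine at once, which is an easy way to trip API rate limits.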

async _search_engine_with_timeout(engine, engine_name, title=None, doi=None, timeout=15, **kwargs)

Search a single engine, subject to a timeout (default 15).
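A per-engine timeout maps naturally onto asyncio.wait_for. This hypothetical wrapper assumes timeout is measured in seconds and that a timed-out or failing engine should contribute no result rather than raise; the search callable and the stand-in engines are illustrative only.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def search_engine_with_timeout(search, engine_name, title=None, doi=None, timeout=15):
    """Hypothetical wrapper: await one engine's search, giving up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(search(title=title, doi=doi), timeout=timeout)
    except asyncio.TimeoutError:
        # A slow engine contributes no result instead of stalling the whole search.
        logger.warning("%s timed out after %ss", engine_name, timeout)
        return None
    except Exception as exc:  # any other engine failure is also treated as "no result"
        logger.warning("%s failed: %s", engine_name, exc)
        return None


# Stand-in engines for demonstration only.
async def slow_engine(title=None, doi=None):
    await asyncio.sleep(1)
    return {"title": title}


async def fast_engine(title=None, doi=None):
    return {"title": title}


slow = asyncio.run(search_engine_with_timeout(slow_engine, "slow", title="T", timeout=0.05))
fast = asyncio.run(search_engine_with_timeout(fast_engine, "fast", title="T", timeout=0.05))
```

wait_for cancels the underlying coroutine when the deadline passes, so the slow engine's request does not keep running in the background.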

_extract_identifiers(metadata)

Extract all identifiers from metadata.

Return type: Dict
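The metadata layout is not specified here, so the sketch below assumes a flat dict with the hypothetical keys doi, pmid, arxiv_id, and corpus_id. It normalizes values so that identifiers reported by different engines can be compared directly; stripping resolver prefixes from DOIs is an illustrative choice.

```python
IDENTIFIER_KEYS = ("doi", "pmid", "arxiv_id", "corpus_id")  # hypothetical field names
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def extract_identifiers(metadata):
    """Hypothetical extractor: collect non-empty identifiers, normalized for comparison."""
    ids = {}
    for key in IDENTIFIER_KEYS:
        value = metadata.get(key)
        if value in (None, ""):
            continue
        value = str(value).strip().lower()
        if key == "doi":
            # DOIs are case-insensitive and are often stored as resolver URLs.
            for prefix in DOI_PREFIXES:
                if value.startswith(prefix):
                    value = value[len(prefix):]
                    break
        ids[key] = value
    return ids
```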

_identifiers_match(ids1, ids2)

Check if any identifiers match between two papers.

Return type: bool
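Given identifier dicts whose values are already normalized, the check reduces to comparing the identifier types both papers share. A minimal sketch under that assumption; identifiers_match is a hypothetical stand-in for the private method.

```python
def identifiers_match(ids1, ids2):
    """Hypothetical check: True if any identifier type present in both papers agrees.

    Assumes both dicts map identifier type -> already-normalized value.
    """
    shared_types = ids1.keys() & ids2.keys()
    return any(ids1[key] == ids2[key] for key in shared_types)
```

Comparing only identifiers of the same type avoids false positives such as a PMID that happens to equal an unrelated corpus ID.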

_validate_paper_consistency(metadata_list)

Check whether all metadata entries refer to the same paper, comparing title, exact year, and first author.

Return type: bool
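A sketch of such a three-part check, assuming each record is a flat dict with title, year, and authors keys (authors as a list of name strings). The title normalization (ASCII letters and digits only) and the surname extraction rules are hypothetical simplifications, not the library's actual logic.

```python
import re


def _normalize_title(title):
    # Lowercase and collapse punctuation/whitespace so formatting differences vanish.
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def _first_author_surname(metadata):
    authors = metadata.get("authors") or []
    name = str(authors[0] or "").strip() if authors else ""
    if not name:
        return None
    # Handle both "Family, Given" and "Given Family" forms.
    surname = name.split(",")[0] if "," in name else name.split()[-1]
    return surname.strip().lower()


def validate_paper_consistency(metadata_list):
    """Hypothetical check: do all records agree on title, exact year, and first author?"""
    if len(metadata_list) < 2:
        return True  # nothing to compare against
    reference = metadata_list[0]
    ref_key = (
        _normalize_title(reference.get("title")),
        reference.get("year"),
        _first_author_surname(reference),
    )
    return all(
        (
            _normalize_title(other.get("title")),
            other.get("year"),
            _first_author_surname(other),
        )
        == ref_key
        for other in metadata_list[1:]
    )
```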

_validate_against_query(metadata, query_title)

Validate that the metadata matches the original query, using strict title matching.

Return type: bool
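What counts as strict is not spelled out here. The sketch below assumes a comparison of normalized titles using difflib.SequenceMatcher with a hypothetical similarity threshold of 0.95, and treats a query without a title (such as a DOI-only lookup) as passing. Both choices are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def _normalize(title):
    # Lowercase and collapse punctuation/whitespace before comparing.
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def validate_against_query(metadata, query_title, threshold=0.95):
    """Hypothetical check: accept a hit only if its title nearly equals the query title."""
    if not query_title:
        return True  # assumed: nothing to validate against (e.g. a DOI-only query)
    found = _normalize(metadata.get("title"))
    wanted = _normalize(query_title)
    if not found:
        return False
    return SequenceMatcher(None, found, wanted).ratio() >= threshold
```

A high threshold rejects papers whose titles merely extend the query, such as a follow-up paper that reuses the original title as a prefix.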

_combine_metadata(engine_results)

Combine metadata from multiple engines, validating each result against the original query.

Return type: Optional[Dict]
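One way to read this step: discard engine results whose title does not match the query, then fold the survivors together so that earlier (higher-priority) engines win. The combine_metadata function, the query_title parameter, the flat metadata layout, and the exact normalized-title check are all assumptions for illustration.

```python
import re


def _norm(title):
    # Lowercase and collapse punctuation/whitespace before comparing.
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def combine_metadata(engine_results, query_title=None):
    """Hypothetical combiner: drop off-target hits, then fold the rest together.

    `engine_results` maps engine name -> metadata dict (or None), in priority order.
    """
    accepted = [
        metadata
        for metadata in engine_results.values()
        if metadata
        and (query_title is None or _norm(metadata.get("title")) == _norm(query_title))
    ]
    if not accepted:
        return None  # no engine returned the paper that was asked for
    combined = {}
    for metadata in accepted:
        for key, value in metadata.items():
            # Higher-priority engines come first, so later ones only fill gaps.
            if combined.get(key) in (None, "", []):
                combined[key] = value
    return combined
```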

_merge_metadata_structures(base, additional)

Merge two metadata structures, resolving conflicts by engine priority.

Return type: Dict
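For nested metadata, a recursive merge in which base keeps priority and additional only fills gaps is a natural fit. This is a hypothetical sketch: treating base as the higher-priority engine, and treating None, "", [] and {} as empty, are assumptions rather than documented behaviour.

```python
import copy


def merge_metadata_structures(base, additional):
    """Hypothetical recursive merge: values in `base` (higher-priority engine) win;
    `additional` only fills fields that are missing or empty in `base`.
    """
    merged = copy.deepcopy(base)  # never mutate the caller's dicts
    for key, value in additional.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Descend into nested sections such as identifiers or basic fields.
            merged[key] = merge_metadata_structures(current, value)
        elif current in (None, "", [], {}):
            merged[key] = copy.deepcopy(value)
    return merged
```

Falsy but meaningful values such as 0 or False are kept, because only the listed empty values are treated as gaps.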