scitex_scholar.citation_graph

Citation Graph Module

Build and analyze citation networks for academic papers using CrossRef data.

This module provides tools to:
  • Extract citation relationships
  • Calculate paper similarity (co-citation, bibliographic coupling)
  • Build citation network graphs
  • Export for visualization (D3.js, vis.js, Cytoscape)
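
The two similarity measures named above can be illustrated on toy data. This is a minimal sketch using the standard definitions (bibliographic coupling counts shared references; co-citation counts papers that cite both). The paper labels, the `references` mapping, and both function names are hypothetical and are not part of this module's API.

```python
from typing import Dict, Set

# Toy citation data (hypothetical labels): each paper maps to the set of papers it cites.
references: Dict[str, Set[str]] = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "C": {"A", "B"},
    "D": {"A", "B", "X"},
}


def bibliographic_coupling(p: str, q: str, refs: Dict[str, Set[str]]) -> int:
    """Count the references that papers p and q have in common."""
    return len(refs.get(p, set()) & refs.get(q, set()))


def co_citation(p: str, q: str, refs: Dict[str, Set[str]]) -> int:
    """Count the papers whose reference lists contain both p and q."""
    return sum(1 for cited in refs.values() if p in cited and q in cited)


coupling_ab = bibliographic_coupling("A", "B", references)  # A and B share Y and Z
cocitation_ab = co_citation("A", "B", references)           # C and D both cite A and B
```

Here both `coupling_ab` and `cocitation_ab` evaluate to 2.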

Example (local SQLite):
>>> from scitex_scholar.citation_graph import CitationGraphBuilder
>>> builder = CitationGraphBuilder(db_path="/path/to/crossref.db")
>>> graph = builder.build("10.1038/s41586-020-2008-3", top_n=20)
Example (HTTP via crossref-local):
>>> builder = CitationGraphBuilder(api_url="http://localhost:31291")
>>> graph = builder.build("10.1038/s41586-020-2008-3", top_n=20)
class scitex_scholar.citation_graph.CitationGraphBuilder(db_path=None, api_url=None)

Bases: object

Build citation network graphs for academic papers.

Auto-detects backend via crossref_local.Config (DB → HTTP).

Example (auto-detect):
>>> builder = CitationGraphBuilder()
>>> graph = builder.build("10.1038/s41586-020-2008-3", top_n=20)
Example (explicit SQLite):
>>> builder = CitationGraphBuilder(db_path="/path/to/crossref.db")
Example (explicit HTTP):
>>> builder = CitationGraphBuilder(api_url="http://localhost:31291")
__init__(db_path=None, api_url=None)

Initialize builder with database path, HTTP API URL, or auto-detect.

When no arguments are given, delegates to crossref_local.Config for auto-detection:
  1. CROSSREF_LOCAL_MODE env var (explicit “db” or “http”)
  2. CROSSREF_LOCAL_API_URL env var → HTTP mode
  3. Local DB file existence → DB mode
  4. Fallback to HTTP mode

Parameters:
  • db_path (str) – Path to CrossRef SQLite database (local mode)

  • api_url (str) – URL of crossref-local HTTP API (HTTP mode)

_auto_detect()

Auto-detect backend via crossref_local.Config.
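
The four-step resolution order documented for __init__ can be sketched as a small pure function. This is only an illustration of the documented precedence: the real logic lives in crossref_local.Config, and the function name `resolve_backend`, its `env` and `db_path` arguments, and the handling of unrecognized CROSSREF_LOCAL_MODE values are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_backend(env: Mapping[str, str], db_path: Optional[str] = None) -> str:
    """Return "db" or "http" following the documented precedence (hypothetical helper)."""
    # 1. An explicit mode wins. Ignoring unrecognized values is an assumption.
    mode = env.get("CROSSREF_LOCAL_MODE", "").strip().lower()
    if mode in ("db", "http"):
        return mode
    # 2. A configured API URL selects HTTP mode.
    if env.get("CROSSREF_LOCAL_API_URL"):
        return "http"
    # 3. An existing local database file selects DB mode.
    if db_path and os.path.isfile(db_path):
        return "db"
    # 4. Otherwise fall back to HTTP mode.
    return "http"


# Usage: pass os.environ in real code; a plain dict keeps the example deterministic.
backend = resolve_backend({"CROSSREF_LOCAL_API_URL": "http://localhost:31291"})
```

Taking the environment as a parameter instead of reading os.environ directly keeps the precedence easy to test.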

build(seed_doi, top_n=20, weight_coupling=2.0, weight_cocitation=2.0, weight_direct=1.0)

Build citation network around a seed paper.

Parameters:
  • seed_doi (str) – DOI of the seed paper

  • top_n (int) – Number of most similar papers to include

  • weight_coupling (float) – Weight for bibliographic coupling

  • weight_cocitation (float) – Weight for co-citation

  • weight_direct (float) – Weight for direct citations

Return type:

CitationGraph

Returns:

CitationGraph object with nodes and edges
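
The documentation does not state how build() combines the three weights. A minimal sketch, assuming a plain weighted sum of the three relationship counts followed by a top_n cut, looks like this. The candidate DOIs, the counts, and the `rank_candidates` helper are hypothetical; only the parameter names and defaults come from the signature above.

```python
from typing import Dict, List, Tuple

# Hypothetical counts for each candidate relative to the seed paper:
# (shared references, papers citing both, direct citation links).
candidates: Dict[str, Tuple[int, int, int]] = {
    "10.1000/a": (3, 1, 0),
    "10.1000/b": (0, 2, 1),
    "10.1000/c": (1, 0, 0),
}


def rank_candidates(
    counts: Dict[str, Tuple[int, int, int]],
    top_n: int = 20,
    weight_coupling: float = 2.0,
    weight_cocitation: float = 2.0,
    weight_direct: float = 1.0,
) -> List[Tuple[str, float]]:
    """Score each candidate with an assumed weighted sum and keep the top_n best."""
    scores = {
        doi: weight_coupling * coupling
        + weight_cocitation * cocited
        + weight_direct * direct
        for doi, (coupling, cocited, direct) in counts.items()
    }
    # Highest score first; truncate to the requested number of papers.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


ranked = rank_candidates(candidates, top_n=2)
```

With the default weights, `ranked` is `[("10.1000/a", 8.0), ("10.1000/b", 5.0)]`.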

_create_paper_node(doi, similarity_score)

Create a PaperNode with metadata from database.

Parameters:
  • doi (str) – DOI of the paper

  • similarity_score (float) – Calculated similarity score

Return type:

PaperNode

Returns:

PaperNode object

_build_citation_edges(dois)

Build citation edges between papers in the network.

Parameters:

dois (List[str]) – List of DOIs in the network

Return type:

List[CitationEdge]

Returns:

List of CitationEdge objects
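
One plausible way to derive edges for a fixed set of papers is to keep only the citations whose source and target are both in the network. The sketch below assumes that a `references` mapping (paper to cited papers) is available and that an edge points from the citing paper to the cited one, matching the 'cites' edge type. It returns plain tuples rather than CitationEdge objects, and `citation_edges` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def citation_edges(dois: List[str], references: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where source cites target and both are in dois."""
    in_network = set(dois)
    edges: List[Tuple[str, str]] = []
    for source in dois:
        # Keep only cited papers that belong to the network; sort for stable output.
        for target in sorted(references.get(source, set()) & in_network):
            if target != source:  # ignore self-citations in the data
                edges.append((source, target))
    return edges


# Toy data: "X" is cited by "A" but is outside the network, so no edge is produced for it.
references = {"A": {"B", "X"}, "B": {"C"}, "C": set()}
edges = citation_edges(["A", "B", "C"], references)
```

For this toy input, `edges` is `[("A", "B"), ("B", "C")]`.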

build_from_dois(dois, num_related_per_doi=20, weight_coupling=2.0, weight_cocitation=2.0, weight_direct=1.0)

Build citation network from multiple seed DOIs.

Combines similarity scores from all seeds to find papers related to the entire set, producing a richer connected graph.

Parameters:
  • dois (List[str]) – List of seed DOIs

  • num_related_per_doi (int) – Number of related papers to discover per DOI

  • weight_coupling (float) – Weight for bibliographic coupling

  • weight_cocitation (float) – Weight for co-citation

  • weight_direct (float) – Weight for direct citations

Return type:

CitationGraph

Returns:

CitationGraph with all seeds + related papers + edges
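
How the per-seed similarity scores are combined is not specified here. As an illustration only, the sketch below assumes they are summed per candidate, so that papers related to several seeds rise to the top. The `combine_seed_scores` helper and the toy scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def combine_seed_scores(per_seed: Dict[str, Dict[str, float]], limit: int) -> List[str]:
    """Sum each candidate's scores across seeds and return the best `limit` candidates."""
    seeds = set(per_seed)
    totals: Dict[str, float] = defaultdict(float)
    for scores in per_seed.values():
        for doi, score in scores.items():
            if doi not in seeds:  # seeds are already in the graph, so skip them
                totals[doi] += score
    # Sort by descending total, breaking ties alphabetically for determinism.
    ranked = sorted(totals, key=lambda doi: (-totals[doi], doi))
    return ranked[:limit]


# Toy scores: "p" is related to both seeds, "q" and "r" to one seed each.
per_seed = {
    "seed1": {"p": 4.0, "q": 1.0, "seed2": 3.0},
    "seed2": {"p": 2.0, "r": 5.0},
}
related = combine_seed_scores(per_seed, limit=2)
```

Here `related` is `["p", "r"]`: "p" accumulates 6.0 from both seeds and outranks "r" at 5.0.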

build_from_query(query, num_related_per_doi=20, search_limit=10, weight_coupling=2.0, weight_cocitation=2.0, weight_direct=1.0)

Build citation network from a text query.

Searches local databases, extracts DOIs from results, then delegates to build_from_dois().

Parameters:
  • query (str) – Search query (e.g. “hippocampal sharp wave ripples”)

  • num_related_per_doi (int) – Related papers per seed DOI

  • search_limit (int) – Max papers to fetch from search

  • weight_coupling (float) – Weight for bibliographic coupling

  • weight_cocitation (float) – Weight for co-citation

  • weight_direct (float) – Weight for direct citations

Return type:

CitationGraph

Returns:

CitationGraph with search-discovered seeds + related papers
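
The "extract DOIs from results" step can be sketched as follows. The shape of the search results (a list of dicts with an optional "doi" key) and the `seed_dois_from_results` helper are assumptions made for illustration; the module's actual search backend is not shown in this reference. DOIs are lowercased before deduplication because DOI names are case-insensitive.

```python
from typing import Dict, List


def seed_dois_from_results(results: List[Dict], search_limit: int = 10) -> List[str]:
    """Collect unique DOIs from the first `search_limit` search hits, preserving order."""
    seen = set()
    dois: List[str] = []
    for record in results[:search_limit]:
        doi = (record.get("doi") or "").strip().lower()
        if doi and doi not in seen:  # skip hits without a DOI and repeated DOIs
            seen.add(doi)
            dois.append(doi)
    return dois


# Hypothetical search hits: one has no DOI, one repeats a DOI in different case.
results = [
    {"doi": "10.1000/A", "title": "First hit"},
    {"title": "Hit without a DOI"},
    {"doi": "10.1000/a", "title": "Duplicate of the first hit"},
    {"doi": "10.1000/b", "title": "Second distinct paper"},
]
seeds = seed_dois_from_results(results)
```

The resulting `seeds` list, `["10.1000/a", "10.1000/b"]`, is the kind of input that build_from_dois() accepts.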

export_json(graph, output_path)

Export graph to JSON file for visualization.

Parameters:
  • graph (CitationGraph) – CitationGraph to export

  • output_path (str) – Path to output JSON file
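
The exact JSON layout written by export_json() is not documented here. The sketch below shows one common node-link layout that D3.js-style tools can consume, using stand-in dataclasses with a subset of the documented fields. The `Node` and `Edge` classes, the `export_graph` helper, and the top-level "nodes" and "edges" keys are assumptions, not the library's confirmed output format.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Node:  # hypothetical stand-in for PaperNode (subset of its fields)
    doi: str
    title: str = ""
    is_seed: bool = False
    metadata: Dict = field(default_factory=dict)


@dataclass
class Edge:  # hypothetical stand-in for CitationEdge
    source: str
    target: str
    edge_type: str = "cites"
    weight: float = 1.0


def export_graph(nodes: List[Node], edges: List[Edge], output_path: str) -> None:
    """Write nodes and edges to a JSON file in an assumed node-link layout."""
    payload = {
        "nodes": [asdict(node) for node in nodes],
        "edges": [asdict(edge) for edge in edges],
    }
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


nodes = [Node("10.1000/a", "Seed paper", is_seed=True), Node("10.1000/b", "Related paper")]
edges = [Edge("10.1000/b", "10.1000/a")]

# Write to a temporary directory and read the file back to confirm it round-trips.
output_path = os.path.join(tempfile.mkdtemp(), "graph.json")
export_graph(nodes, edges, output_path)
with open(output_path, encoding="utf-8") as handle:
    loaded = json.load(handle)
```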

get_paper_summary(doi)

Get summary information for a paper.

Parameters:

doi (str) – DOI of the paper

Return type:

Optional[dict]

Returns:

Dictionary with paper summary; may be None, as the return type is Optional[dict]

class scitex_scholar.citation_graph.PaperNode(doi, title='', year=0, authors=<factory>, journal='', citation_count=0, similarity_score=0.0, is_seed=False, metadata=<factory>)

Bases: object

Represents a paper in the citation network.

doi: str
title: str = ''
year: int = 0
authors: List[str]
journal: str = ''
citation_count: int = 0
similarity_score: float = 0.0
is_seed: bool = False
metadata: Dict
to_dict()

Convert to dictionary for JSON export.

Return type:

Dict

__init__(doi, title='', year=0, authors=<factory>, journal='', citation_count=0, similarity_score=0.0, is_seed=False, metadata=<factory>)
class scitex_scholar.citation_graph.CitationEdge(source, target, edge_type='cites', weight=1.0)

Bases: object

Represents a citation relationship between papers.

source: str
target: str
edge_type: str = 'cites'
weight: float = 1.0
to_dict()

Convert to dictionary for JSON export.

Return type:

Dict

__init__(source, target, edge_type='cites', weight=1.0)
class scitex_scholar.citation_graph.CitationGraph(seed_doi, seed_dois=<factory>, nodes=<factory>, edges=<factory>, metadata=<factory>)

Bases: object

Represents a complete citation network.

seed_doi: str
seed_dois: List[str]
nodes: List[PaperNode]
edges: List[CitationEdge]
metadata: Dict
to_dict()

Convert to dictionary for JSON export.

Return type:

Dict

property node_count: int

Number of nodes in graph.

property edge_count: int

Number of edges in graph.

to_networkx()

Convert to NetworkX DiGraph with node attributes.

Returns:

Directed graph with node attributes: title, short_title, year, citations, similarity, journal.

Return type:

networkx.DiGraph

__init__(seed_doi, seed_dois=<factory>, nodes=<factory>, edges=<factory>, metadata=<factory>)
scitex_scholar.citation_graph.plot_citation_graph(graph, backend='auto', output=None, **kwargs)

Visualize a citation graph with pluggable backends.

Parameters:
  • graph (CitationGraph or networkx.DiGraph) – Citation network to visualize. CitationGraph is auto-converted via to_networkx().

  • backend (str) – Rendering backend: ‘auto’, ‘figrecipe’, ‘scitex.plt’, ‘matplotlib’, or ‘pyvis’. Default ‘auto’ picks the best available.

  • output (str, optional) – Output file path. Required for ‘pyvis’ backend (HTML). For static backends, saves the figure to this path.

  • **kwargs – Backend-specific keyword arguments (layout, seed, figsize, etc.).

Returns:

Backend-specific result. Static backends return {'fig', 'ax', 'pos', 'backend'}. Pyvis returns {'output', 'backend'}.

Return type:

dict

scitex_scholar.citation_graph.list_backends()

List available visualization backends.

Returns:

Mapping of backend name to availability.

Return type:

dict
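
A name-to-availability mapping like this, together with the 'auto' choice in plot_citation_graph(), can be sketched with the standard library alone. The backend names come from the plot_citation_graph() documentation; the module names probed for each backend, the preference order, and the `probe_backends` and `pick_backend` helpers are assumptions and may differ from the library's behaviour.

```python
import importlib.util
from typing import Dict, Mapping, Optional, Sequence

# Assumed mapping from backend name to the top-level module that provides it.
BACKEND_MODULES: Dict[str, str] = {
    "figrecipe": "figrecipe",
    "scitex.plt": "scitex",
    "matplotlib": "matplotlib",
    "pyvis": "pyvis",
}


def probe_backends(modules: Mapping[str, str] = BACKEND_MODULES) -> Dict[str, bool]:
    """Map each backend name to whether its module can be located without importing it."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


def pick_backend(
    available: Mapping[str, bool],
    preference: Sequence[str] = ("figrecipe", "scitex.plt", "matplotlib", "pyvis"),
) -> Optional[str]:
    """Return the first available backend in the assumed preference order, or None."""
    for name in preference:
        if available.get(name):
            return name
    return None


available = probe_backends()      # result depends on what is installed locally
chosen = pick_backend(available)  # None when no backend is installed
```

Probing with importlib.util.find_spec avoids importing heavy plotting libraries just to report whether they are installed.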