Metadata-Version: 2.4
Name: science-of-science-pipeline-udt
Version: 0.1.5
Summary: Integrated pipeline and dashboard for tracing conceptual emergence and evolution in semantic space (UDT case study).
Author: Duco Trompert
License-Expression: MIT
Keywords: science-of-science,bibliometrics,science-mapping,networks,word2vec,openalex,dash
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Visualization
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: networkx>=3.1
Requires-Dist: gensim>=4.3
Requires-Dist: plotly>=5.18
Requires-Dist: dash>=2.14
Requires-Dist: dash-cytoscape>=1.0.2
Provides-Extra: dev
Requires-Dist: build>=1.2.2; extra == "dev"
Requires-Dist: twine>=5.1.1; extra == "dev"
Requires-Dist: ruff>=0.5.0; extra == "dev"
Dynamic: license-file

# Science of Science: UDT Concept Evolution Pipeline

This repository contains the Python implementation accompanying the bachelor thesis:

**Duco Trompert (Universiteit van Amsterdam, Jan 23, 2026)**

*Science of Science: An Integrated Pipeline for Tracing Conceptual Emergence and Evolution in Semantic Space*

The project implements an **integrated pipeline** for science mapping that links:
data collection (OpenAlex) → pre-processing → network & embedding representations → analysis → **interactive dashboard**.

## What it does

- **Collects and caches** publication metadata from the **OpenAlex API** for a target concept (default: *"Urban Digital Twin"*).
- Builds **keyword co-occurrence networks** (overall and per-year slices).
- Builds **semantic similarity networks** from **Word2Vec** embeddings trained on titles/abstracts/keywords.
- (Optional) Builds **concept-method bipartite networks** by labelling keywords with an LLM served through Ollama (the Ollama server is not installed with this package).
- Provides an interactive **Dash** dashboard with network visualisations (dash-cytoscape) and time series (plotly).
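A keyword co-occurrence network links two keywords whenever they are assigned to the same work, and weights the edge by the number of such works. The sketch below shows one way to compute those weights for the whole corpus and for per-year slices, using only the standard library. It assumes simplified, hypothetical records of the form `{"year": ..., "keywords": [...]}` rather than raw OpenAlex JSON, and the function names are illustrative, not the package's API.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Iterable


def cooccurrence_counts(keyword_lists: Iterable[list[str]]) -> Counter:
    """Count how many works contain each unordered keyword pair."""
    counts: Counter = Counter()
    for keywords in keyword_lists:
        # Normalise and deduplicate per work so a repeated keyword
        # neither pairs with itself nor double-counts its edges.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorted input makes each pair canonical: (a, b) with a < b.
        counts.update(combinations(unique, 2))
    return counts


def cooccurrence_by_year(works: list[dict]) -> dict[int, Counter]:
    """Build one co-occurrence Counter per publication year (per-year slices)."""
    by_year: defaultdict[int, list[list[str]]] = defaultdict(list)
    for work in works:
        by_year[work["year"]].append(work["keywords"])
    return {year: cooccurrence_counts(kws) for year, kws in sorted(by_year.items())}


if __name__ == "__main__":
    works = [
        {"year": 2020, "keywords": ["Digital twin", "Smart city", "IoT"]},
        {"year": 2021, "keywords": ["digital twin", "Smart City"]},
        {"year": 2021, "keywords": ["digital twin", "BIM", "smart city"]},
    ]
    print(cooccurrence_counts(w["keywords"] for w in works).most_common(3))
    for year, counts in cooccurrence_by_year(works).items():
        print(year, dict(counts))
```

Each `Counter` maps `(keyword_a, keyword_b)` pairs to edge weights, so it can be loaded into a NetworkX graph with `G.add_weighted_edges_from((a, b, w) for (a, b), w in counts.items())`.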

## Installation (Linux)
```
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install --upgrade science-of-science-pipeline-udt
```


## Installation (Windows CMD)
```
python -m venv .venv
.\.venv\Scripts\activate.bat
python -m pip install --upgrade pip
python -m pip install --upgrade science-of-science-pipeline-udt
```


## Run the dashboard
```
udt-dashboard
```

Open http://127.0.0.1:8050/ in your browser.


## Deactivate the virtual environment (when finished)
```
deactivate
```

