Metadata-Version: 2.2
Name: lm-sim
Version: 0.0.6
Summary: Python package to compute similarity between LMs
Author-email: Ilze Amanda Auzina <ilze.amanda.auzina@outlook.com>, Shashwat Goel <shashwatnow@gmail.com>, Joschka Strüber <joschka.strueber@bethgelab.org>
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# LM-Similarity

**lm-sim** is a Python module for computing similarity between Language Models and is distributed under the MIT license. 

## Installation

### Dependencies

**lm-sim** requires:
- Python (>= 3.9)
- NumPy (>= 1.19.5)

### User installation 
If you already have a working installation of NumPy, the easiest way to install lm-sim is using pip:
```bash
pip install lm-sim
```

### Example Usage
Currently, we support the calculation of three similarity metrics in the context of MCQ datasets:
- $\kappa_p$ probabilistic (default)
- $\kappa_p$ discrete
- Error Consistency

#### Compute similarity based on $\kappa_p$

Below is a simple example of how to compute similarity between two models based on $\kappa_p$. The input has to be formatted as follows:
- `output_a`: list[np.array], containing the softmax output probabilities of model a
- `output_b`: list[np.array], containing the softmax output probabilities of model b
- `gt`: list[int], containing the index of the ground-truth answer
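For illustration, inputs matching this format could be constructed as below; the probability values and the three-question setup are made up for this sketch:

```python
import numpy as np

# Hypothetical toy data: three 4-choice questions.
# Each entry is a softmax distribution over the answer options.
output_a = [
    np.array([0.7, 0.1, 0.1, 0.1]),
    np.array([0.2, 0.5, 0.2, 0.1]),
    np.array([0.25, 0.25, 0.25, 0.25]),
]
output_b = [
    np.array([0.6, 0.2, 0.1, 0.1]),
    np.array([0.1, 0.1, 0.7, 0.1]),
    np.array([0.4, 0.3, 0.2, 0.1]),
]
# Index of the correct option for each question.
gt = [0, 1, 0]

# Each distribution should sum to 1, and the lists must be aligned.
assert all(np.isclose(p.sum(), 1.0) for p in output_a + output_b)
assert len(output_a) == len(output_b) == len(gt)
```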

```python
from lmsim.metrics import Kappa_p

kappa_p = Kappa_p()
kappa_p.compute_k(output_a, output_b, gt)
```

For a discrete computation (when output probabilities are not available), set the flag `prob=False`; the inputs must then be formatted as one-hot vectors:
- `output_a`: list[np.array], one-hot vectors of model a
- `output_b`: list[np.array], one-hot vectors of model b
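If only softmax outputs are at hand, they can be turned into one-hot vectors by placing a 1 at each distribution's argmax; the helper `to_one_hot` below is a hypothetical convenience, not part of the lm-sim API:

```python
import numpy as np

def to_one_hot(probs: list[np.ndarray]) -> list[np.ndarray]:
    """Convert softmax distributions to one-hot vectors via argmax."""
    one_hot = []
    for p in probs:
        v = np.zeros_like(p)
        v[np.argmax(p)] = 1.0  # mark the most probable option
        one_hot.append(v)
    return one_hot

output_a = to_one_hot([np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.6, 0.3])])
# output_a[0] → [1., 0., 0.], output_a[1] → [0., 1., 0.]
```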

```python
from lmsim.metrics import Kappa_p

kappa_p = Kappa_p(prob=False)
kappa_p.compute_k(output_a, output_b, gt)
```

#### Compute similarity based on Error Consistency
```python
from lmsim.metrics import EC

ec = EC()
ec.compute_k(output_a, output_b, gt)
```
The implementation supports both softmax output probabilities and one-hot vectors as input.
