Metadata-Version: 2.4
Name: astrodetection
Version: 0.1.9
Summary: A Python library for detecting astroturfing (coordinated inauthentic behavior) in social media posts.
Home-page: https://github.com/savaij/astrodetection
Author: Francesco Savatteri
Author-email: astrodetection_python@proton.me
License: MIT
Project-URL: Bug Tracker, https://github.com/savaij/astrodetection/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: networkx
Requires-Dist: tqdm
Requires-Dist: demoji
Requires-Dist: requests
Requires-Dist: polyleven
Requires-Dist: ipysigma
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: scipy
Provides-Extra: standard
Requires-Dist: faiss-cpu; extra == "standard"
Requires-Dist: fasttext; extra == "standard"
Requires-Dist: gensim; extra == "standard"
Provides-Extra: light
Requires-Dist: scikit-learn; extra == "light"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Astrodetection

Astrodetection is a Python library for detecting signs of astroturfing (coordinated inauthentic behavior) in lists of social media posts (mainly from X so far, but not exclusively).


## Installation

### Pip

```bash
pip install "astrodetection[standard]"
```

or

```bash
pip install "astrodetection[light]"
```
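
The `standard` extra pulls in faiss-cpu, fastText, and gensim, while `light` only adds scikit-learn. To check at runtime which flavor is available, you can probe for the heavy optional dependency. This is a hypothetical helper sketch, not part of the library:

```python
import importlib.util

def has_standard_extras() -> bool:
    """True if faiss (installed via the 'standard' extra) is importable."""
    return importlib.util.find_spec("faiss") is not None

print("standard" if has_standard_extras() else "light")
```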

### Conda

1. Create and activate the environment from the provided YAML file:

   ```bash
   conda create -n astrodetection_env
   conda activate astrodetection_env
   conda env update -f environment_standard.yml
   ```

**Note:** the `environment_standard.yml` configuration file uses the FAISS and fastText libraries for the [VIGINUM D3LTA implementation](https://github.com/VIGINUM-FR/D3lta).

**If you run into compatibility issues, prefer `environment_light.yml` and use the `astrodetection_light` module.**

## Usage

You can import the main functions directly:

```python
from astrodetection import semantic_faiss, prepare_input_data, compute_bot_likelihood_metrics, create_network
```

Or call them through the package namespace:

```python
import pandas as pd
import astrodetection

# Load a single JSON file into a DataFrame
file = "file_path"  # path to your JSON file of posts
df = pd.read_json(file)

# Preprocess the DataFrame
df = df[df['tweet'].str.len() > 100]  # keep only reasonably long posts
df = df[df['username'] != 'grok']     # drop unwanted accounts
df.index = df.index.astype(str)       # string index for d3lta compatibility

# Compute matches and scores
df_filtered, df_emb = astrodetection.prepare_input_data(df, embeddings=df['emb'])

matches, df_cluster = astrodetection.semantic_faiss(
    df_filtered.rename(columns={'tweet': 'original'}),
    min_size_txt=0,
    df_embeddings_use=df_emb,
    threshold_grapheme=0.8,
    threshold_language=0.715,
    threshold_semantic=0.9
)  # function taken from D3LTA

scores = astrodetection.compute_bot_likelihood_metrics(df, matches=matches)

# Create a network
network = astrodetection.create_network(matches, df)
```

## New changes

1. The _`semantic_faiss`_ function can now detect copypastas based on Levenshtein distance alone, ignoring embeddings, when `"skip"` is passed as the _df_embeddings_use_ argument.

2. The _`compute_bot_likelihood_metrics`_ function can now take column names as arguments for more customization.

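The Levenshtein-only mode rests on grapheme similarity between post texts. As a self-contained illustration of the underlying idea (a sketch of the technique, not the library's internal code), a normalized Levenshtein similarity can be computed like this:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def grapheme_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for maximally different ones."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# Two near-identical "copypasta" posts clear a 0.8 threshold
print(grapheme_similarity("buy $COIN now!!!", "buy $COIN now!!"))
```

In practice the library uses `polyleven` for fast Levenshtein computation; the threshold semantics mirror the `threshold_grapheme` parameter shown in the usage example above.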