Metadata-Version: 2.4
Name: sinapsis-bertopic
Version: 0.1.0
Summary: BERTopic model integration for the Sinapsis framework
Project-URL: Homepage, https://sinapsis.tech
Project-URL: Documentation, https://docs.sinapsis.tech/docs
Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-bertopic.git
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: bertopic>=0.17.4
Requires-Dist: kaleido>=1.2.0
Requires-Dist: pillow>=12.1.1
Requires-Dist: sinapsis>=0.2.25
Provides-Extra: wikipedia-reader
Requires-Dist: sinapsis-langchain-readers[langchain-wikipedia-readers]>=0.1.8; extra == "wikipedia-reader"
Provides-Extra: sinapsis-data-writers
Requires-Dist: sinapsis-data-writers[opencv]>=0.1.16; extra == "sinapsis-data-writers"
Provides-Extra: all
Requires-Dist: sinapsis-bertopic[sinapsis-data-writers,wikipedia-reader]; extra == "all"

<h1 align="center">
<br>
<a href="https://sinapsis.tech/">
  <img
    src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
    alt="" width="300">
</a><br>
Sinapsis BERTopic
<br>
</h1>

<h4 align="center">Package for BERTopic </h4>

<p align="center">
<a href="#installation">🐍  Installation</a> •
<a href="#features"> 🚀 Features</a> •
<a href="#documentation">📙 Documentation</a> •
<a href="#license"> 🔍 License </a>
</p>

**Sinapsis BERTopic** integrates BERTopic models into the Sinapsis framework for topic clustering.


<h2 id="installation"> 🐍  Installation </h2>

Install using your package manager of choice. We encourage the use of <code>uv</code>.

This project is private. Make sure you have authorized credentials before proceeding.

**Recommended Method (using `.netrc`):**

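To avoid baking credentials into URLs, add an entry for the package index to your `~/.netrc` file, for example `machine pypi.sinapsis.tech login <username> password <password>`. The host shown is the one used in the install commands in this section; replace the placeholders with your own credentials.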
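
To confirm that the `~/.netrc` entry is in place before installing, a minimal sketch using Python's standard `netrc` module may help. The host name `pypi.sinapsis.tech` is taken from the install commands in this section, and `has_index_credentials` is a hypothetical helper name, not part of Sinapsis.

```python
import netrc
from pathlib import Path

# Assumption: the index host is the one used in --extra-index-url.
INDEX_HOST = "pypi.sinapsis.tech"


def has_index_credentials(netrc_path: Path, host: str = INDEX_HOST) -> bool:
    """Return True if the netrc file holds a login and password for `host`."""
    try:
        entry = netrc.netrc(str(netrc_path)).authenticators(host)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file: treat as "no credentials".
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login and password)
```

Calling `has_index_credentials(Path.home() / ".netrc")` returns `False` when the file is missing or has no entry for the index host.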



Example with <code>uv</code>:

```bash
  uv pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
```
or with raw <code>pip</code>:
```bash
  pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
```
<h2 id="features">🚀 Features</h2>

<h3>Templates Supported</h3>

This package includes the following templates:
- **BERTopicFitModel**: A template class for fitting BERTopic models and saving them to disk.
- **BERTopicPredict**: Template for topic prediction using BERTopic models.
- **BERTopicVisualizeDocuments**: Template for generating and exporting interactive document visualizations.
    It extends BERTopicBase to encode documents using sentence transformers, fit a BERTopic model,
    and produce interactive visualizations of documents in a reduced-dimensional space.
    The visualizations can be saved as HTML files and optionally exported as image arrays.
- **BERTopicVisualizeTopics**: Template for BERTopic topic visualization.
    It extends BERTopicPredict to generate interactive visualizations of the topics discovered by a BERTopic model.
    It produces Plotly-based visual representations of topic relationships and characteristics,
    and saves them as HTML files.

> [!TIP]
> Use CLI command ```sinapsis info --all-template-names``` to show a list with all the available Template names installed with Sinapsis BERTopic.


> [!TIP]
> Use CLI command ```sinapsis info --example-template-config TEMPLATE_NAME``` to produce an example Agent config for the Template specified in ***TEMPLATE_NAME***.

For example, for ***BERTopicFitModel*** use ```sinapsis info --example-template-config BERTopicFitModel``` to produce an example config like:

```yaml
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range: !!python/tuple
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn: !!python/tuple
      - null
      - null
      - null
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
      save_embedding_model: sentence-transformers/all-MiniLM-L6-v2
    root_dir: /root/.cache/sinapsis
    save_path: '`replace_me:<class ''str''>`'
```


<h2 id='example'>📚 Usage example</h2>

Below is an example YAML configuration for fitting a BERTopic model:

<details>
<summary ><strong><span style="font-size: 1.4em;">Config</span></strong></summary>

```yaml
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range: !!python/tuple
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn: !!python/tuple
      - null
      - null
      - null
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
      save_embedding_model: sentence-transformers/all-MiniLM-L6-v2
    root_dir: /root/.cache/sinapsis
    save_path: '`replace_me:<class ''str''>`'
```
</details>

This configuration defines an **agent** and a sequence of **templates** that fit a BERTopic model on incoming data.
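
The generated config still contains `replace_me` placeholders (for example in `save_path`), which are meant to be replaced with real values. As a hedged sketch, the helper below scans a config file's text for such leftovers before the agent is run. `find_placeholders` is a hypothetical name, not part of Sinapsis, and it only does a plain text scan rather than parsing the YAML.

```python
import re

# Matches the placeholder marker that appears in generated example configs,
# e.g. '`replace_me:<class ''str''>`'.
PLACEHOLDER = re.compile(r"replace_me:")


def find_placeholders(config_text: str) -> list[tuple[int, str]]:
    """Return (line_number, key) pairs for lines that still hold a placeholder."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if PLACEHOLDER.search(line):
            # The key is everything before the first colon on the line.
            key = line.split(":", 1)[0].strip()
            hits.append((number, key))
    return hits
```

For the config above, `find_placeholders(Path("name_of_config.yml").read_text())` (with `Path` imported from `pathlib`) would flag `cluster_selection_epsilon_max`, `kwargs`, and `save_path`.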

To run the config, use the CLI:
```bash
sinapsis run name_of_config.yml
```


<h2 id="documentation">📙 Documentation</h2>

Documentation for this and other Sinapsis packages is available on the [Sinapsis website](https://docs.sinapsis.tech/docs).

Tutorials for different projects within Sinapsis are available on the [Sinapsis tutorials page](https://docs.sinapsis.tech/tutorials).

